Convert HTML numbered entities in PHP to Unicode for use on iPhone


I'm creating a web service to transfer JSON to an iPhone app. I'm using json-framework to receive the JSON, which works great because it automatically decodes things like "\u2018". The problem I'm running into is that there doesn't seem to be a comprehensive way to convert all the characters in one fell swoop.

For example, html_entity_decode() gets most things, but it leaves behind stuff like &#8216; (‘). In order to catch these entities and convert them to something json-framework can use (e.g., \u2018), I'm using this code to convert the &# to \u, convert the numbers to hex, and strip the ending semicolon.

    function func($matches) {
        return "\u" . dechex($matches[1]);
    }
    $json = preg_replace_callback("/&#(\d{4});/", "func", $json);

This is working for me at the moment, but it doesn't feel right. It seems I'm surely missing some characters that are going to come back to haunt me later.

Does anyone see any flaws in this approach? Can anyone think of characters this approach will miss?

Any help would be appreciated!

Where are you getting the HTML-encoded input from? If you're scraping a web page, you should be using an HTML parser, which will decode both entity and character references for you. If you're getting them in form input data, you've got a problem with encodings (make sure you serve the page containing the form as UTF-8 to avoid this).
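For the scraping case, here's a minimal sketch of letting a parser do the decoding, using PHP's bundled DOM extension. The markup in $markup is made up for illustration, and it is plain ASCII plus references, so the parser's charset guessing doesn't come into play; a real page would also need its encoding handled properly.

```php
<?php
// Hypothetical scraped markup; only the reference handling matters here.
$markup = '<p>abc &quot; def &#8216;hi&#8217; &#x1234;</p>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

// textContent is UTF-8 with named, decimal and hex references already decoded.
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

// Encode once, at the end, with the built-in encoder.
echo json_encode($text), "\n";
// "abc \" def \u2018hi\u2019 \u1234"
```

The point is that the references never need to be touched by hand: the parser hands back raw text, and json_encode() takes care of the escaping.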

If you really must convert an HTML-encoded stretch of literal text to JSON, you should do it by HTML-decoding first and then JSON-encoding, rather than attempting to go straight to JSON format (which will fail for a bunch of other characters that need escaping). Use the built-in decoder and encoder functions rather than trying to create JSON-encoded characters like \u.... yourself (as there are traps there).

    $html = 'abc &quot; def &#1234; ghi &#x1234; jkl \n mno';
    $raw  = html_entity_decode($html, ENT_COMPAT, 'UTF-8');
    $json = json_encode($raw);

    "abc \" def \u04d2 ghi \u1234 jkl \\n mno"
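To make a few of those traps concrete, here's a rough sketch comparing the two routes. regexConvert() and decodeThenEncode() are names made up for this illustration: the first just wraps the regex code from the question, the second wraps the built-in calls (with ENT_QUOTES | ENT_HTML5 as one reasonable choice of flags, not the only one).

```php
<?php
// Hypothetical wrapper around the regex approach from the question.
function regexConvert(string $s): string {
    return preg_replace_callback('/&#(\d{4});/', function ($m) {
        return '\u' . dechex((int) $m[1]);
    }, $s);
}

// Hypothetical wrapper: HTML-decode to raw UTF-8, then JSON-encode.
function decodeThenEncode(string $html): string {
    $raw = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return json_encode($raw);
}

// Three-digit, hex and six-digit references are not matched at all.
echo regexConvert('&#233; &#x2018; &#128512;'), "\n";   // prints the input unchanged

// Code points below 4096 give fewer than four hex digits: not a valid \u escape.
echo regexConvert('&#1234;'), "\n";                     // \u4d2

// Decoding first handles all of these, plus quoting and surrogate pairs.
echo decodeThenEncode('&#233; &#x2018; &#128512; "q"'), "\n";
// "\u00e9 \u2018 \ud83d\ude00 \"q\""
```

So the regex misses anything that isn't exactly four decimal digits, produces malformed escapes for part of the range it does match, and does nothing about quotes, backslashes or characters outside the Basic Multilingual Plane.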
