Convert HTML numbered entities in PHP to Unicode for use on iPhone
I'm creating a web service to transfer JSON to an iPhone app. I'm using json-framework to receive the JSON, and it works great because it automatically decodes things like "\u2018". The problem I'm running into is that there doesn't seem to be a comprehensive way to convert all the characters in one fell swoop.
For example, html_entity_decode() gets most things, but leaves behind stuff like &#8216; (‘). In order to catch these entities and convert them to something json-framework can use (e.g., \u2018), I'm using this code to convert the &# to \u, convert the numbers to hex, and strip the ending semicolon.
function func($matches) {
    return "\u" . dechex($matches[1]);
}
$json = preg_replace_callback("/&#(\d{4});/", "func", $json);
This is working for me at the moment, but it doesn't feel right. It seems I'm surely missing some characters that are going to come back to haunt me later.
Does anyone see any flaws in this approach? Can anyone think of any characters this approach will miss?
Any help is appreciated!
Where are you getting the HTML-encoded input from? If you're scraping a web page, you should be using an HTML parser, which will decode both entity and character references for you. If you are getting them in form input data, you've got a problem with encodings (make sure you serve the page containing the form as UTF-8 to avoid this).
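As a rough illustration of the parser route, the sketch below uses PHP's DOMDocument, assuming the DOM extension is enabled. The sample markup and the variable names are made up for the example; the point is only that the parser hands back text with named, decimal and hex references already decoded, so it can go straight into json_encode().

```php
<?php
// Hypothetical scraped fragment containing named, decimal and hex references.
$markup = '<p>abc &quot; def &#8216;quoted&#x2019; &amp; &eacute;</p>';

$doc = new DOMDocument();
// loadHTML() parses the fragment and decodes the references while building the tree.
$doc->loadHTML($markup);

// textContent is the decoded text of the first <p> element, as UTF-8.
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

// json_encode() then produces the \uXXXX escapes for the non-ASCII characters.
$json = json_encode($text);
echo $json, "\n";
```

For real pages with non-ASCII bytes in the source, the input encoding also has to be declared to the parser; that is outside the scope of this sketch.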
If you must convert an HTML-encoded stretch of literal text to JSON, you should do it by HTML-decoding first and then JSON-encoding, rather than attempting to go straight to the JSON format (which will fail for a bunch of other characters that need escaping). Use the built-in decoder and encoder functions rather than trying to create JSON-encoded characters like \u.... yourself (as there are traps there).
$html = 'abc &quot; def &#1234; ghi &#x1234; jkl \n mno';
$raw = html_entity_decode($html, ENT_COMPAT, 'UTF-8');
$json = json_encode($raw);
// "abc \" def \u04d2 ghi \u1234 jkl \\n mno"
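To make the "traps" concrete, here is a minimal sketch that wraps the decode-then-encode step in a helper and runs it over inputs that a pattern like /&#(\d{4});/ would skip or mangle: references with fewer or more than four digits, hex references, named entities, characters outside the Basic Multilingual Plane, and plain characters that JSON itself requires to be escaped. The helper name html_text_to_json and the sample strings are hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical helper: decode HTML references to UTF-8 first, then JSON-encode.
function html_text_to_json(string $html): string
{
    // ENT_QUOTES | ENT_HTML5 decodes named, decimal and hex references,
    // including both single- and double-quote entities.
    $raw = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // json_encode() escapes quotes and backslashes, pads \u escapes to four
    // hex digits, and emits surrogate pairs for code points above U+FFFF.
    return json_encode($raw);
}

$samples = [
    '&#8216;',    // four-digit decimal reference (the case the regex handles)
    '&#233;',     // three-digit decimal reference
    '&#x2019;',   // hexadecimal reference
    '&rsquo;',    // named entity
    '&#128512;',  // code point outside the BMP, needs a surrogate pair in JSON
    'say "hi"',   // no entities, but the quotes must be escaped in JSON
];

foreach ($samples as $sample) {
    echo $sample, ' => ', html_text_to_json($sample), "\n";
}
```

A hand-rolled dechex() conversion would turn &#233; into \ue9, which is not a valid JSON escape because \u must be followed by exactly four hex digits; the built-in encoder takes care of that padding and of the surrogate pairs.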