ColdFusion: Strip Image Tag Attributes from an Image Tag


I'm using the YUI Rich Text Editor to format a textarea within a CMS I'm building. The CMS allows the user to upload and include images (it's a blogging application).

Just prior to writing the contents of the textarea to the database, I'd like ColdFusion to locate any image tags and strip out all of the extraneous attributes beyond src.

For example:

<img src="foo.jpg" title="foo!" alt="foo!" height="100" width="100" style="border=1;"> 

Should come out the other end as:

<img src="foo.jpg">

The challenge:

  1. The image attributes could be any, or none, of the ones listed in the example.
  2. The image attributes could be in any order.
  3. The image tag could be located anywhere in the body of text in the textarea.

This ColdFusion exercise would be obviated if there were a way to tell YUI not to allow image attributes, but I'm not sure if that's (easily) possible.

Many thanks in advance!

Best regards,

Kris

The stable and safe way to do this is to load the HTML into a DOM, strip the unwanted bits (or, more securely, strip everything but the wanted bits), and convert the result back to a string.
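
As a rough illustration of that parse, strip, serialize cycle, here is a minimal Java sketch that uses only the JDK's built-in DOM and Transformer APIs. ColdFusion runs on the JVM, so Java classes like these can be reached through CreateObject("java", ...). The sketch assumes the input is already well-formed XHTML, and it keeps only src on every img element, which is the transformation asked for in the question. The class and method names (ImgAttributeStripper, stripImgAttributes) are hypothetical and not part of any library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; assumes well-formed (XHTML) input.
public class ImgAttributeStripper {

    // Parses the markup into a DOM, removes every attribute except "src"
    // from each <img> element, and serializes the DOM back to a string.
    public static String stripImgAttributes(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            NodeList images = doc.getElementsByTagName("img");
            for (int i = 0; i < images.getLength(); i++) {
                Element img = (Element) images.item(i);
                NamedNodeMap attrs = img.getAttributes();
                // Collect names first: the attribute map is live and shrinks on removal.
                List<String> names = new ArrayList<>();
                for (int j = 0; j < attrs.getLength(); j++) {
                    names.add(attrs.item(j).getNodeName());
                }
                // White-list: keep src, drop everything else.
                for (String name : names) {
                    if (!name.equals("src")) {
                        img.removeAttribute(name);
                    }
                }
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process markup", e);
        }
    }

    public static void main(String[] args) {
        String input = "<p>foo <img src=\"foo.jpg\" title=\"foo!\" alt=\"foo!\""
                + " height=\"100\" width=\"100\" style=\"border=1;\"/> bar</p>";
        System.out.println(stripImgAttributes(input));
    }
}
```

Collecting the attribute names before removing any of them matters because a NamedNodeMap is live. Removing entries while indexing forward into it would skip attributes.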

However, to my knowledge ColdFusion does not provide its own DOM parser for HTML (only one for XML), and the YUI Rich Text Editor does not produce XML (i.e. XHTML). This is a bit unfortunate, but not a dead end:

  1. There are plenty of HTML parsers available for Java, and using Java objects from ColdFusion is easy. Include one of them in your project.
  2. You could convert the HTML input to XHTML (via JTidy) and use the built-in XML parser to implement the scrubbing. You can convert the result back to HTML with JTidy when you're done.
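
To see why the XHTML conversion step is needed, the following Java sketch (JDK only; WellFormedCheck and isWellFormedXml are hypothetical names) feeds two variants of the question's markup to the JDK's XML parser. The HTML form, where <img> has no closing tag, is rejected as not well-formed, while the self-closed XHTML form parses fine.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class illustrating HTML vs. XHTML for an XML parser.
public class WellFormedCheck {

    // Returns true if the markup can be parsed by the JDK's XML parser.
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML style: <img> is never closed, which XML does not allow.
        System.out.println(isWellFormedXml("<p>foo <img src=\"foo.jpg\"> bar</p>"));  // false
        // XHTML style: the same element written as self-closing.
        System.out.println(isWellFormedXml("<p>foo <img src=\"foo.jpg\"/> bar</p>")); // true
    }
}
```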

To get you started, I've created a sample strict white-listing solution for HTML elements and attributes, built around ColdFusion's built-in XML parser:

<!--- this string serves as an example of JTidy output --->
<cfset xhtml = XmlParse('
<html xmlns="http://www.w3.org/1999/xhtml">
  foo <img src="foo.jpg" title="foo!" alt="foo!" height="100" width="100" style="border=1;" />
  bar <a href="asdasdad" title="blah" target="baz" onmouseover="doSomethingEvil();">link</a>
  baz <script type="text/javascript">doSomethingEvil();</script>
</html>', true)>

<!--- configurable list of allowed elements and their allowed attributes --->
<cfset whitelist = StructNew()>
<cfset whitelist["html"] = "xmlns">
<cfset whitelist["head"] = "">
<cfset whitelist["body"] = "">
<cfset whitelist["img"]  = "src">
<cfset whitelist["a"]    = "href,title,name">

<!--- delete all attributes that are not white-listed --->
<cfloop collection="#whitelist#" item="tag">
  <cfset nodes = XmlSearch(xhtml, "//*[local-name() = '#tag#']")>
  <cfloop from="1" to="#ArrayLen(nodes)#" index="i">
    <cfset nodeAttrs = nodes[i].XmlAttributes>
    <cfloop list="#StructKeyList(nodeAttrs)#" index="attr">
      <cfif not ListFind(whitelist[tag], attr)>
        <cfset StructDelete(nodeAttrs, attr)>
      </cfif>
    </cfloop>
  </cfloop>
</cfloop>

<!--- delete all elements that are not white-listed --->
<cfset unwantedElements = XmlSearch(xhtml, "//*[not(contains(',#StructKeyList(whitelist)#,', concat(',',local-name(),',')))]")>
<cfloop from="1" to="#ArrayLen(unwantedElements)#" index="i">
  <!--- flag the node, then remove flagged children via its parent --->
  <cfset node = unwantedElements[i]>
  <cfset node.XmlAttributes["x-delete-flag"] = "true">
  <cfset parent = XmlSearch(node, "..")>
  <cfif ArrayLen(parent) eq 1 and StructKeyExists(parent[1], "XmlChildren")>
    <cfset childNodes = parent[1].XmlChildren>
    <!--- loop backwards so deletions don't shift the remaining indexes --->
    <cfloop from="#ArrayLen(childNodes)#" to="1" step="-1" index="k">
      <cfif StructKeyExists(childNodes[k].XmlAttributes, "x-delete-flag")>
        <cfset ArrayDeleteAt(childNodes, k)>
      </cfif>
    </cfloop>
  </cfif>
</cfloop>

When this is done, the contents of the xhtml variable look like this:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  foo <img src="foo.jpg"/>
  bar <a href="asdasdad" title="blah">link</a>
  baz
</html>

A few explanations:

  • There are several descriptions and UDFs around the web on how to get JTidy to work in ColdFusion; search for them.
  • XML DOM handling is cumbersome in ColdFusion. It's neither beautiful nor elegant, but it's still a lot better than trying to use (god forbid) regular expressions to achieve the same effect. I strongly discourage you from using them for this problem.
  • Use <cfdump> to get a feeling for how ColdFusion represents XML documents and to understand what's going on in the code.
  • The second bit (removing non-white-listed elements) is a bit hairy. Apparently there is no more elegant way to delete a node in ColdFusion, since ColdFusion XML nodes expose neither the parentNode nor the removeChild() DOM methods. My implementation is based on Ben Nadel's approach to deleting DOM nodes in CF. It works, but I'm painfully aware that it sucks. Sorry for that. :-\
  • XPath: the expression "//*[not(contains(',#StructKeyList(whitelist)#,', concat(',',local-name(),',')))]" selects all elements whose local name (i.e. the name without its XML namespace) is not contained in the list of allowed names. In detail:
    • // is shorthand for "anywhere in the document".
    • * means "any element node".
    • The square brackets denote a condition. Wrapping both the list and the local name in commas makes sure that only full names match; otherwise contains() would find "a" inside "abbr", for example.
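
For comparison, here is a Java sketch of the element-removal step that runs the same XPath expression through the JDK's javax.xml.xpath API. The answer notes that ColdFusion XML nodes lack parentNode and removeChild(), but the standard Java DOM (org.w3c.dom.Node) provides getParentNode() and removeChild(), so no delete-flag workaround is needed. The whitelist string is assumed to contain plain element names without quotes, and ElementWhitelist / removeUnlistedElements are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; whitelist is a comma-separated list of element names.
public class ElementWhitelist {

    // Removes every element whose local name is not in the whitelist.
    public static String removeUnlistedElements(String xml, String whitelist) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() ignores namespaces
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Commas on both sides: ",a," is not a substring of ",root,abbr,".
            String expr = "//*[not(contains('," + whitelist
                    + ",', concat(',', local-name(), ',')))]";
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);

            // Copy the matches before modifying the tree.
            List<Node> unwanted = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                unwanted.add(matches.item(i));
            }
            // Detach each element (and its subtree) through its parent.
            for (Node node : unwanted) {
                Node parent = node.getParentNode();
                if (parent != null) {
                    parent.removeChild(node);
                }
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process markup", e);
        }
    }

    public static void main(String[] args) {
        String input = "<root><a href=\"x\">link</a><abbr title=\"t\">HTML</abbr>"
                + "<script>evil()</script></root>";
        // "a" is not white-listed, even though it occurs inside "abbr".
        System.out.println(removeUnlistedElements(input, "root,abbr"));
    }
}
```

With the whitelist "root,abbr", the <a> element is removed even though "a" occurs inside "abbr", because the test compares ",a," against ",root,abbr,". Removing an element also removes everything nested inside it, just as ArrayDeleteAt() on XmlChildren does in the ColdFusion version.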
