Python, XPath: Find all links to images


I'm using lxml in Python to parse some HTML, and I want to extract all links to images. The way I do it right now is:

//a[contains(@href,'.jpg') or contains(@href,'.jpeg') or ... (etc)] 

There are a couple of problems with this approach:

  • you have to list all possible image extensions in all cases (both "jpg" and "JPG"), which is not elegant
  • in weird situations, the href may contain .jpg somewhere in the middle, not at the end of the string

I wanted to use a regexp, but I failed:

//a[regx:match(@href,'.*\.(?:png|jpg|jpeg)')] 

This returned all the links every time ...

Does anyone know the right, elegant way to do this, or what is wrong with my regexp approach?

Instead of:

a[contains(@href,'.jpg')] 

use:

a[substring(@href, string-length(@href)-3)='.jpg'] 

(and the same expression pattern for the other possible endings).
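
As a rough sketch of how that pattern could be repeated for several endings in lxml, the helper below builds the expression from a list of extensions. The function name `image_link_xpath`, the extension list, and the sample HTML are made up for illustration. The `translate()` call is an addition that is not part of the expression above: it lowercases the href so that "JPG" and "jpg" are treated alike.

```python
from lxml import html

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def image_link_xpath(extensions=(".jpg", ".jpeg", ".png")):
    """Build an XPath 1.0 expression for <a> elements whose href ends with one of the extensions."""
    # translate() maps A-Z to a-z, so the comparison ignores ASCII case.
    lowered = f"translate(@href, '{UPPER}', '{LOWER}')"
    tests = [
        # substring(s, string-length(s) - (n - 1)) yields the last n characters of s.
        f"substring({lowered}, string-length(@href) - {len(ext) - 1}) = '{ext}'"
        for ext in extensions
    ]
    return "//a[" + " or ".join(tests) + "]"


# Hypothetical sample document, only for demonstration.
SAMPLE = """
<html><body>
  <a href="photo.jpg">a</a>
  <a href="PHOTO.JPG">b</a>
  <a href="pic.jpeg">c</a>
  <a href="logo.png">d</a>
  <a href="page.jpg.html">e</a>
  <a href="index.html">f</a>
</body></html>
"""

doc = html.fromstring(SAMPLE)
image_hrefs = [a.get("href") for a in doc.xpath(image_link_xpath())]
print(image_hrefs)
```

With this sample, `page.jpg.html` is skipped because `.jpg` is not at the end of the href, while `PHOTO.JPG` is still matched thanks to the lowercasing step.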

The above expression is the XPath 1.0 equivalent of the following XPath 2.0 expression:

a[ends-with(@href, '.jpg')] 
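
Note that lxml is built on libxml2, which implements XPath 1.0, so `ends-with()` is not available there. If a regexp is still preferred, lxml does support the EXSLT regular-expression functions once their namespace is bound. The following is a minimal sketch under that assumption; the prefix `re`, the pattern, and the sample HTML are illustrative choices, and it is not meant as a diagnosis of the expression in the question.

```python
from lxml import html

# The EXSLT regular-expressions namespace; the prefix "re" is an arbitrary choice.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical sample document, only for demonstration.
SAMPLE = """
<html><body>
  <a href="photo.jpg">a</a>
  <a href="PHOTO.JPG">b</a>
  <a href="pic.jpeg">c</a>
  <a href="logo.png">d</a>
  <a href="page.jpg.html">e</a>
  <a href="index.html">f</a>
</body></html>
"""

doc = html.fromstring(SAMPLE)

# re:test() returns a boolean; "$" anchors the extension to the end of the href,
# and the "i" flag makes the match case-insensitive.
links = doc.xpath(
    r"//a[re:test(@href, '\.(?:png|jpe?g)$', 'i')]",
    namespaces=EXSLT_RE,
)
regex_hrefs = [a.get("href") for a in links]
print(regex_hrefs)
```

Anchoring with `$` covers the "extension in the middle of the href" case, and the `i` flag avoids listing every extension in both cases.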
