Python, XPath: Find all links to images
I'm using lxml in Python to parse some HTML, and I want to extract all links to images. The way I do it right now is:
//a[contains(@href,'.jpg') or contains(@href,'.jpeg') or ... (etc)]
There are a couple of problems with this approach:
- You have to list all possible image extensions in all cases (both "jpg" and "JPG"), which is not elegant
- In weird situations, the href may contain .jpg somewhere in the middle, not at the end of the string
I wanted to use a regexp, but I failed:
//a[regx:match(@href,'.*\.(?:png|jpg|jpeg)')]
This returned all links, all the time ...
Does anyone know the right, elegant way to do this, or what is wrong with my regexp approach?
Instead of:
a[contains(@href,'.jpg')]
use:
a[substring(@href, string-length(@href)-3)='.jpg']
(and the same expression pattern for the other possible endings).
The above expression is the XPath 1.0 equivalent of the following XPath 2.0 expression:
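As a rough sketch of how the substring pattern could be applied from lxml for several endings at once: the helper name `ends_with`, the `EXTENSIONS` list and the sample markup are made up for illustration, and wrapping `@href` in `translate()` is an addition of mine to cover the "jpg" vs "JPG" problem from the question (it only folds ASCII letters).

```python
from lxml import html

# Hypothetical sample markup, only here to exercise the query.
SAMPLE = """
<html><body>
  <a href="photo.jpg">ends with .jpg</a>
  <a href="PHOTO.JPG">upper-case extension</a>
  <a href="archive.jpg.zip">.jpg in the middle</a>
  <a href="page.html">not an image</a>
</body></html>
"""

def ends_with(ext):
    """Build an XPath 1.0 predicate: @href ends with ext, ignoring ASCII case."""
    # XPath 1.0 has no lower-case(), so translate() maps A-Z onto a-z.
    lowered = ("translate(@href, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
               "'abcdefghijklmnopqrstuvwxyz')")
    # substring() is 1-based: starting at length - (len(ext) - 1)
    # selects the last len(ext) characters of the attribute.
    return f"substring({lowered}, string-length(@href) - {len(ext) - 1}) = '{ext}'"

# Assumed list of endings; extend as needed (lower-case only).
EXTENSIONS = [".jpg", ".jpeg", ".png"]
query = "//a[" + " or ".join(ends_with(e) for e in EXTENSIONS) + "]"

tree = html.fromstring(SAMPLE)
image_links = [a.get("href") for a in tree.xpath(query)]
print(image_links)
```

The list of endings still has to be spelled out, but only once and in one case, and a link such as archive.jpg.zip is no longer matched.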
a[ends-with(@href, '.jpg')]
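Note that lxml is built on libxml2, which implements XPath 1.0 only, so ends-with() is not available there. If a regexp is still preferred, one option that I believe works in lxml is the EXSLT regular-expressions extension, registered through the `namespaces` argument. The snippet below is a sketch under that assumption; the prefix `re`, the pattern and the sample markup are my own choices, not something taken from the question.

```python
from lxml import html

# EXSLT regular-expressions namespace; the prefix "re" is arbitrary.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical sample markup for illustration.
doc = html.fromstring(
    "<html><body>"
    '<a href="a.png">png</a>'
    '<a href="b.JPEG">jpeg</a>'
    '<a href="c.jpg.txt">txt</a>'
    '<a href="d.html">html</a>'
    "</body></html>"
)

# re:test() returns a boolean, so it can be used directly as a predicate.
# The trailing $ anchors the extension at the end of the href and the
# 'i' flag makes the match case-insensitive.
matches = doc.xpath(
    r"//a[re:test(@href, '\.(png|jpe?g)$', 'i')]",
    namespaces=EXSLT_RE,
)
hrefs = [a.get("href") for a in matches]
print(hrefs)
```

This covers both points from the question in one expression: the case of the extension and an extension appearing in the middle of the href.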