java - HtmlUnit download link from DIV
I'm trying to download images from a website; they are stored in a table under div elements. I'm using the Java HtmlUnit library, and this is what I have so far:
_page = (HtmlPage) linkToPicsPage.click();
List<HtmlElement> _divList = _page.getElementsByIdAndOrName("imgcontainer");
int num = 0;
for (HtmlElement el : _divList) {
    // Clicking the div returns whatever page the click leads to, not necessarily the image
    InputStream is = el.click().getWebResponse().getContentAsStream();
    File path = new File(_downloadPath + _car.getRegNumber());
    if (!path.exists()) path.mkdir();
    writeToFile(is, new File(_downloadPath + _car.getRegNumber() + System.getProperty("file.separator") + _car.getRegNumber() + "[" + num + "].jpg"));
    num++;
}
The website's markup looks like this:
<table id="ctl00_contentplacecontenido_gridimagenes" cellspacing="0" border="0" style="border-collapse:collapse;">
  <tr>
    <td>
      <div id="imgcontainer">
        <div class="imgitem">
          <a href="descarga.aspx?idowner=40312&id=598477&action=view">
            <img alt="foto frente izquierda" border="0" src="imgthumb.aspx?idowner=40312&id=598477&action=view"/>
          </a>
          <br />
          foto frente izquierda
        </div>
      </div>
    </td><td>
But I'm downloading the HTML code instead of the images themselves. I don't know how I can get the href attribute from the HtmlDivision elements in "_divList". Any suggestions?
Thanks.
Edit 1:
This is the current code I'm using to download them. The problem with this code is that I'm downloading elements I don't need (I'm downloading everything that has "descarga.aspx" in its href). That's why I want to be more specific and download only the images. As you can see, the HtmlAnchors found by searching for "descarga.aspx" are not redirecting me to a page:
List<HtmlAnchor> picsLinks = new LinkedList<HtmlAnchor>();
picsLinks = _page.getAnchors();
int num = 0;
for (HtmlAnchor currentPic : picsLinks) {
    // Keep only the anchors whose href points at the download handler
    if (currentPic.getHrefAttribute().contains("descarga.aspx")) {
        InputStream is = currentPic.click().getWebResponse().getContentAsStream();
        File path = new File(_downloadPath + _car.getRegNumber());
        if (!path.exists()) path.mkdir();
        writeToFile(is, new File(_downloadPath + _car.getRegNumber() + System.getProperty("file.separator") + _car.getRegNumber() + "[" + num + "].jpg"));
        _log.append("....downloaded picture " + regNumber + num + ".jpg\n");
        num++;
    }
    _log.setCaretPosition(_log.getDocument().getLength());
}
I can't tell without seeing the whole site, but I suspect it's clicking on "imgcontainer", and that contains more than the image. What happens when you manually click on the words "foto frente izquierda" in a browser?
Try clicking on the image directly, using getByXPath and "//div[@class='imgitem']/a" (off the top of my head) instead of getElementsByIdAndOrName.
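To see what that XPath expression selects, here is a minimal sketch that runs it against a copy of the markup from the question. It is an illustration only, not HtmlUnit code: it uses the JDK's own javax.xml.xpath classes so it can run without the library, and the class name ImgItemLinks, the method extractHrefs, and the SAMPLE string are made up for this example. The sample also escapes each '&' as '&amp;' and closes the table, because the JDK XML parser needs well-formed input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HtmlUnit.
class ImgItemLinks {

    // Well-formed copy of the question's markup ('&' written as '&amp;', table closed).
    static final String SAMPLE =
        "<table><tr><td>"
        + "<div id=\"imgcontainer\"><div class=\"imgitem\">"
        + "<a href=\"descarga.aspx?idowner=40312&amp;id=598477&amp;action=view\">"
        + "<img alt=\"foto frente izquierda\" border=\"0\""
        + " src=\"imgthumb.aspx?idowner=40312&amp;id=598477&amp;action=view\"/>"
        + "</a><br/>foto frente izquierda</div></div>"
        + "</td></tr></table>";

    // Returns the href of every <a> that is a direct child of <div class="imgitem">.
    static List<String> extractHrefs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList anchors = (NodeList) xpath.evaluate(
            "//div[@class='imgitem']/a", doc, XPathConstants.NODESET);

        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            hrefs.add(((Element) anchors.item(i)).getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Prints one line per matched anchor.
        for (String href : extractHrefs(SAMPLE)) {
            System.out.println(href);
        }
    }
}
```

The expression matches only the anchor inside each "imgitem" div, so the thumbnail's src and other "descarga.aspx" links elsewhere on the page are left out. In HtmlUnit, passing the same expression to _page.getByXPath(...) should give you the matching HtmlAnchor elements, which you can then click and stream to a file the same way as in your second snippet. I have not tested that against your site.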