php - Using Zend_Dom as a screen scraper

April 15, 2011

How?

More to the point...

This:

$url = 'http://php.net/manual/en/class.domelement.php';
$client = new Zend_Http_Client($url);
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);
$result = $dom->query('div.note');
Zend_Debug::dump($result);

Gives me this:

object(Zend_Dom_Query_Result)#867 (7) {
  ["_count":protected] => NULL
  ["_cssQuery":protected] => string(8) "div.note"
  ["_document":protected] => object(DOMDocument)#79 (0) {
  }
  ["_nodeList":protected] => object(DOMNodeList)#864 (0) {
  }
  ["_position":protected] => int(0)
  ["_xpath":protected] => NULL
  ["_xpathQuery":protected] => string(33) "//div[contains(@class, ' note ')]"
}

And I cannot for the life of me figure out how to do anything with this.

I want to extract various parts of the retrieved data (that being the div with the class "note" and any of the elements inside it, like the text and URLs) but cannot get anything working.

Someone pointed me to the DOMElement class over at php.net, but when I try using some of the methods mentioned, I can't get things to work. How would I grab a chunk of HTML from a page and go through it grabbing the various parts? How do I inspect the object I am getting back so I can at least figure out what is in it?

Help?

The Iterator implementation of Zend_Dom_Query_Result returns a DOMElement object for each iteration:

foreach ($result as $element) {
    var_dump($element instanceof DOMElement); // true
}
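Since var_dump() shows little for DOM objects, as the dump in the question illustrates, it can help to print a few properties by hand. The following is a minimal sketch using only PHP's built-in DOMDocument and DOMXPath classes rather than Zend_Dom_Query. The sample markup, the XPath expression, and the variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<html><body>'
      . '<div class="note" id="n1"><p>First</p></div>'
      . '<div class="note" id="n2"><a href="/x">Second</a></div>'
      . '</body></html>';

$document = new DOMDocument();
$document->loadHTML($html);

// Select every div whose class attribute contains "note" as a whole token.
$xpath = new DOMXPath($document);
$nodes = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' note ')]");

// DOMNodeList exposes its size through the length property.
echo 'matches: ' . $nodes->length . PHP_EOL;

$summary = array();
foreach ($nodes as $element) {
    // The tag name plus the id attribute is usually enough to tell matches apart.
    $summary[] = $element->nodeName . '#' . $element->getAttribute('id');
}
echo implode(', ', $summary) . PHP_EOL;
```

With a Zend_Dom_Query_Result, count($result) should give the equivalent number of matches, since that class implements Countable.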

From the $element variable, you can use any DOMElement method:

foreach ($result as $element) {
    echo 'element id: '.$element->getAttribute('id').PHP_EOL;
    if ($element->hasChildNodes()) {
        echo 'element has child nodes'.PHP_EOL;
    }
    $anodes = $element->getElementsByTagName('a');
    // etc
}

You can also access the document from the element, or you can use Zend_Dom_Query_Result to do so:

$document1 = $element->ownerDocument;
$document2 = $result->getDocument();
var_dump($document1 === $document2); // true
echo $document1->saveHTML();
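Putting the pieces together, here is a rough sketch of pulling the text, the link URLs, and the raw HTML out of each "note" div. It uses plain DOMDocument and DOMXPath so that it runs without Zend Framework. The inline markup, the $notes array layout, and the XPath expression are assumptions for illustration, not part of the original code. Passing a node to saveHTML() needs PHP 5.3.6 or later.

```php
<?php
// Hypothetical stand-in for the fetched page body; not taken from php.net.
$html = <<<'HTML'
<html><body>
<div class="note" id="n1"><p>First note, see <a href="http://example.com/a">link A</a>.</p></div>
<div class="other">Ignored</div>
<div class="user note" id="n2">Second note <a href="/b">link B</a> and <a href="/c">link C</a></div>
</body></html>
HTML;

$document = new DOMDocument();
// Collect libxml warnings internally; real-world markup is rarely valid.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
// Match "note" as a whole class token, roughly what the CSS selector div.note means.
$nodes = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' note ')]");

$notes = array();
foreach ($nodes as $element) {
    // Gather the href of every anchor nested inside this div.
    $links = array();
    foreach ($element->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }
    $notes[] = array(
        'id'    => $element->getAttribute('id'),
        'text'  => trim($element->textContent),   // text of all descendants
        'links' => $links,
        'html'  => $document->saveHTML($element), // markup of this div only
    );
}

print_r($notes);
```

The same inner loop should work unchanged inside foreach ($result as $element) on a Zend_Dom_Query_Result, because each $element there is also a DOMElement.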
