C - Count and parse all the href links out of an HTML file
Following on from my previous question, I have been trying to parse the href strings out of an HTML file in order to send each string to the solution of my previous question.
This is what I have, but it doesn't work...
    void parseurls(char* buffer)
    {
        char *begin = buffer;
        char *end = NULL;
        int total = 0;

        while(strstr(begin, "href=\"") != NULL)
        {
            end = strstr(begin, "</a>");
            if(end != NULL)
            {
                char *url = (char*) malloc (1000 * sizeof(char));
                strncpy(url, begin, 100);
                printf("url = %s\n", url);
                if(url) free(url);
            }
            total++;
            begin++;
        }
        printf("total urls = %d\n", total);
        return;
    }
Basically, I need to extract the string information of the href, like:
<a href="http://www.w3schools.com">visit w3schools</a>
Any help is appreciated.
There are a lot of things wrong with this code.
- You increment begin by 1 each time around the loop. That means you find the same href over and over again. I think you meant to move begin to after end?
- The strncpy will copy 100 characters (as the HTML is longer) and will not NUL-terminate the string. You want url[100] = '\0' somewhere.
- Why do you allocate 1000 characters and use only 100?
- You search for end starting from begin. That means that if there's a </a> before the href=" you'll find that instead.
- You don't use end for anything.
- Why don't you search for the terminating quote at the end of the URL?
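The points above can be sketched as a loop that looks for the terminating quote and then resumes the search after it. This is only a minimal sketch, not the asker's final code: the name count_hrefs is hypothetical, and it assumes every href value is wrapped in double quotes with no whitespace around the = sign.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every href="..." value found in buffer
   and returns how many complete (quote-terminated) values were seen. */
int count_hrefs(const char *buffer)
{
    static const char marker[] = "href=\"";
    const size_t marker_len = sizeof marker - 1;
    const char *begin = buffer;
    int total = 0;

    while ((begin = strstr(begin, marker)) != NULL) {
        const char *start = begin + marker_len;  /* first character of the URL */
        const char *end = strchr(start, '"');    /* terminating quote */
        if (end == NULL)
            break;                               /* unterminated attribute: stop */

        /* %.*s prints exactly end - start characters, so no copy is needed. */
        printf("url = %.*s\n", (int)(end - start), start);
        total++;
        begin = end + 1;                         /* resume after this URL */
    }
    printf("total urls = %d\n", total);
    return total;
}
```

Because begin jumps past the closing quote instead of advancing by one character, each href is reported once.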
Given the above issues (and after adding termination of url), it works OK for me.
Given

    "<a href=\"/email_services.php\">email services</a> "

it prints

    url = <a href="/email_services.php">email services</a> 
    url = a href="/email_services.php">email services</a> 
    url =  href="/email_services.php">email services</a> 
    url = href="/email_services.php">email services</a> 
    total urls = 4
For the allocation of space, I think you should keep the result of the strstr of "href=\"" (call it start); then the size you need is end - start (+1 for the terminating NUL). Allocate that much space, strncpy the text across, add the NUL, and Robert's your parent's male sibling.
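As a rough illustration of that allocation scheme, here is a sketch that sizes the buffer from end - start. The function name next_href and its cursor parameter are made up for this example, and start is moved past the href=" prefix so that only the URL itself is copied, which is an assumption about what is wanted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the next href value at or
   after *cursor, or NULL if none is found. On success *cursor is advanced
   past the closing quote. The caller must free() the result. */
char *next_href(const char **cursor)
{
    static const char marker[] = "href=\"";
    const char *start = strstr(*cursor, marker);
    if (start == NULL)
        return NULL;
    start += sizeof marker - 1;            /* skip the href=" prefix */

    const char *end = strchr(start, '"');  /* terminating quote */
    if (end == NULL)
        return NULL;

    size_t len = (size_t)(end - start);
    char *url = malloc(len + 1);           /* +1 for the terminating NUL */
    if (url == NULL)
        return NULL;

    strncpy(url, start, len);              /* copies exactly len characters */
    url[len] = '\0';                       /* strncpy does not add this here */
    *cursor = end + 1;
    return url;
}
```

Calling it in a loop until it returns NULL yields each URL in turn, with every buffer exactly as large as it needs to be.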
Also, remember that href= isn't unique to anchors. It can appear in other tags too.
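One way to guard against that is to walk back from the attribute to the opening < and check the tag name. The helper below is a hypothetical sketch, not a real HTML parser: it assumes well-formed tags and will be fooled by a > inside a quoted attribute value.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if attr (a pointer into buffer, e.g. the
   result of strstr(buffer, "href=\"")) lies inside an <a ...> tag, else 0. */
int href_in_anchor(const char *buffer, const char *attr)
{
    const char *p = attr;

    /* Walk backwards to the '<' that opens the enclosing tag. */
    while (p > buffer && *p != '<') {
        if (*p == '>')
            return 0;          /* passed the end of an earlier tag */
        p--;
    }
    if (*p != '<')
        return 0;              /* no opening bracket before attr */

    /* The tag name must be exactly "a", followed by whitespace. */
    return tolower((unsigned char)p[1]) == 'a'
        && isspace((unsigned char)p[2]);
}
```

With this check, an href on a link or area tag is skipped, while one on an anchor is counted.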