Parsing the information of a URL out of HTML <a></a> tags in C


My application gets part of its data from a large HTML-formatted file that contains a large number of links, much like what you get if you search on Google, Yahoo, or another search engine: a list of URLs, each with a description or other text.

I've been trying to come up with a function that can parse the URL and the description and save them to a text file, but it's proven hard, at least for me. So, if I have:

<a href="http://www.w3schools.com">visit w3schools</a>

I want to parse out http://www.w3schools.com and visit w3schools, and save them in a file.

Is there any way to achieve this in plain C?
Any help is appreciated.

What you really need is a proper HTML parser, but for something quick and dirty, try:

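#include <stdbool.h>
#include <string.h>

/* Finds the next <a ... href="...">...</a> in *data.
   On success, *url and *desc point into the original buffer (which is
   modified in place) and *data is advanced past the closing </a>. */
bool get_url(char **data, char **url, char **desc)
{
  bool result = false;
  char *ptr = strstr(*data, "<a");

  if(NULL != ptr)
  {
    *data = ptr + 2;

    ptr = strstr(*data, "href=\"");
    if(NULL != ptr)
    {
      *data = ptr + 6;
      *url = *data;

      ptr = strchr(*data, '"');
      if(NULL != ptr)
      {
        *ptr = '\0';            /* terminate the URL at its closing quote */
        *data = ptr + 1;

        ptr = strchr(*data, '>');
        if(NULL != ptr)
        {
          *data = ptr + 1;
          *desc = *data;

          ptr = strstr(*data, "</a>");
          if(NULL != ptr)
          {
            *ptr = '\0';        /* terminate the description at </a> */
            *data = ptr + 4;
            result = true;
          }
        }
      }
    }
  }

  return result;
}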

Note that data gets updated to point beyond the data parsed (it's an in-out parameter), and the string passed in gets modified. I'm feeling too lazy/too busy to write a full solution with memory allocated for the returned strings.
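For what it's worth, here is a rough sketch of what the allocating version might look like. The names get_url_copy and copy_range are made up for this example; they are not part of the function above. It leaves the input string untouched and hands back malloc'd copies that the caller has to free. Unlike the function above, it only advances *data when a complete link was found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the n bytes starting at s into a new
   NUL-terminated string. Returns NULL if malloc fails. */
static char *copy_range(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    if (out != NULL) {
        memcpy(out, s, n);
        out[n] = '\0';
    }
    return out;
}

/* Hypothetical non-destructive variant of get_url. On success, *url and
   *desc receive heap-allocated copies (caller frees both) and *data is
   advanced past the closing </a>. On failure nothing is changed. */
bool get_url_copy(const char **data, char **url, char **desc)
{
    assert(data != NULL && *data != NULL && url != NULL && desc != NULL);

    const char *a = strstr(*data, "<a");
    if (a == NULL) return false;

    const char *href = strstr(a + 2, "href=\"");
    if (href == NULL) return false;

    const char *url_start = href + 6;
    const char *url_end = strchr(url_start, '"');
    if (url_end == NULL) return false;

    const char *gt = strchr(url_end + 1, '>');
    if (gt == NULL) return false;

    const char *desc_start = gt + 1;
    const char *desc_end = strstr(desc_start, "</a>");
    if (desc_end == NULL) return false;

    char *u = copy_range(url_start, (size_t)(url_end - url_start));
    char *d = copy_range(desc_start, (size_t)(desc_end - desc_start));
    if (u == NULL || d == NULL) {
        free(u);   /* free(NULL) is a no-op */
        free(d);
        return false;
    }

    *url = u;
    *desc = d;
    *data = desc_end + 4;
    return true;
}
```

It has the same limitations as the quick and dirty version: it matches any tag that starts with "<a", it expects a lowercase, double-quoted href, and it does not check that the href belongs to the same tag.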

Also, you probably ought to return errors on the cascade of closing scope braces (except the first one), which is partly why I stacked them like that. There are other, neater solutions that can be adapted to be more generic.

So you call the function repeatedly until it returns false.
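A minimal sketch of that loop, writing each link to a file as the question asked. save_links and the tab-separated output format are assumptions made for this example, and get_url is repeated here in a condensed early-return form (same logic) only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Condensed copy of get_url: modifies the buffer in place and
   advances *data past each link it finds. */
static bool get_url(char **data, char **url, char **desc)
{
    char *ptr = strstr(*data, "<a");
    if (ptr == NULL) return false;
    ptr = strstr(ptr + 2, "href=\"");
    if (ptr == NULL) return false;
    *url = ptr + 6;
    ptr = strchr(*url, '"');
    if (ptr == NULL) return false;
    *ptr = '\0';
    ptr = strchr(ptr + 1, '>');
    if (ptr == NULL) return false;
    *desc = ptr + 1;
    ptr = strstr(*desc, "</a>");
    if (ptr == NULL) return false;
    *ptr = '\0';
    *data = ptr + 4;
    return true;
}

/* Hypothetical helper: writes one "url<TAB>description" line per link
   found in html to out and returns the number of links written.
   html must be a writable buffer, because get_url modifies it. */
int save_links(char *html, FILE *out)
{
    assert(html != NULL && out != NULL);

    char *cursor = html;
    char *url;
    char *desc;
    int count = 0;

    while (get_url(&cursor, &url, &desc)) {
        fprintf(out, "%s\t%s\n", url, desc);
        count++;
    }
    return count;
}
```

You would read the HTML file into a writable char buffer, open the output with fopen, and pass both to save_links. Don't pass a string literal as html, since the buffer gets written to.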

