Regex pattern with subpattern exceptions (Python)
I am using BeautifulSoup to extract table data from the td tags of a table. The td's have a class of either 'a', 'u', 'e', 'available-unavailable' or 'unavailable-available'. (Yes, I know, quirky class names...)
Here's an example:
<tr> <td class="u">4</td> <td class="unavailable-available">5</td> <td class="a">6</td> <td class="available-unavailable">7</td> <td class="u">8</td> ...
I've been working with a line that incorporates re.compile():
tab = [int(tag.string) for tag in soup.find('table', {'summary': tablesummary}).findAll("td", attrs={"class": re.compile(r'\Aa')})]
I need to extract only the td's with a class name of 'a' or 'unavailable-available'. I have been trying negative-lookahead assertions without much luck. I'd value it if any regex legends could produce the correct regex...
table.findAll('td', attrs={"class": re.compile(r'(^|\s)(a|unavailable-available)($|\s)')})
This matches the start of the string or whitespace, followed by "a" or "unavailable-available", followed by whitespace or the end of the string. It'll match these sorts of things:
class="a"
class="a ui-xxx"
class="ui-xxx a"
class="ui-xxx a ui-yyy"
class="unavailable-available"
class="unavailable-available foo"
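As a quick sanity check, the pattern can be run against plain class strings with the re module alone, without BeautifulSoup. This is only a sketch: the samples list is made up for illustration from the class names in the question, and it tests the regex itself, not how any particular BeautifulSoup version feeds class attributes to it.

```python
import re

# Pattern from the answer: "a" or "unavailable-available" as a whole
# whitespace-delimited token inside the class attribute.
CLASS_RE = re.compile(r'(^|\s)(a|unavailable-available)($|\s)')

# Illustrative class attribute values (assumed, based on the question).
samples = [
    'a', 'u', 'e',
    'available-unavailable', 'unavailable-available',
    'a ui-xxx', 'ui-xxx a', 'unavailable-available foo',
]

# Keep only the class strings the pattern accepts.
matched = [s for s in samples if CLASS_RE.search(s)]
print(matched)
```

Note that 'available-unavailable' is rejected even though it starts with "a", because the "a" has to be followed by whitespace or the end of the string.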