Regex pattern with subpattern exceptions (Python)

June 15, 2011

I am using BeautifulSoup to extract the table data (td) tags from a table. The td's have a class of either 'a', 'u', 'e', 'available-unavailable' or 'unavailable-available'. (Yes, I know, quirky class names, but hey...)

Here's an example:

  <tr>
    <td class="u">4</td>
    <td class="unavailable-available">5</td>
    <td class="a">6</td>
    <td class="available-unavailable">7</td>
    <td class="u">8</td>
    ...

I've been working on a line that incorporates re.compile():

  tab = [int(tag.string) for tag in soup.find('table', {'summary': tablesummary}).findAll("td", attrs={"class": re.compile('\aa')})]
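One likely reason this finds nothing is the escape sequence. In the pattern `'\aa'`, `\a` is the ASCII bell character (`\x07`), so the pattern looks for a bell character followed by `a` and never for the class name `a`. A word boundary would not fix it either, because the hyphen in the quirky class names counts as a boundary. The following quick check uses only the standard `re` module. The list of class names is taken from the question, and the variable names are illustrative.

```python
import re

class_names = ["a", "u", "e", "available-unavailable", "unavailable-available"]

# In a normal (non-raw) string, '\a' is the BEL control character (\x07),
# so this pattern searches for BEL followed by "a".
bell_pattern = re.compile('\aa')
print(repr(bell_pattern.pattern))  # '\x07a'
print([c for c in class_names if bell_pattern.search(c)])  # []

# A word boundary (\b) is not enough: "-" is a non-word character, so \ba
# also matches at the start of "available" inside the hyphenated names.
boundary_pattern = re.compile(r'\ba')
print([c for c in class_names if boundary_pattern.search(c)])
# ['a', 'available-unavailable', 'unavailable-available']
```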

I need to extract only the td's with a class name of 'a' or 'unavailable-available'. I have been trying negative-lookahead assertions without luck. Would value any regex legends who can produce the correct regex...

  table.findAll('td', attrs={"class": re.compile(r'(^|\s)(a|unavailable-available)($|\s)')})

This matches the start of the string or whitespace, followed by "a" or "unavailable-available", followed by whitespace or the end of the string. It'll match these sorts of things:
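You can try the pattern directly against class attribute strings with `re.search`. BeautifulSoup also tests a compiled pattern with `search` rather than `match`, which is why the explicit anchors matter. The second pattern is an equivalent form written with lookbehind and lookahead assertions, closer to what the question was attempting. `(?<!\S)` and `(?!\S)` mean "not preceded / not followed by a non-whitespace character". The sample strings are illustrative only.

```python
import re

# The answer's pattern: start-of-string or whitespace, then one of the wanted
# class names, then whitespace or end-of-string.
answer_pattern = re.compile(r'(^|\s)(a|unavailable-available)($|\s)')

# Equivalent lookaround form: the wanted name must not be preceded or
# followed by a non-whitespace character.
lookaround_pattern = re.compile(r'(?<!\S)(?:a|unavailable-available)(?!\S)')

samples = [
    "a", "u", "e",
    "available-unavailable", "unavailable-available",
    "a ui-xxx", "ui-xxx a ui-yyy", "ui-xxx ui-yyy", "abc",
]

for s in samples:
    # search() scans the whole string for a match anywhere
    print(f"{s!r:28} {bool(answer_pattern.search(s))} {bool(lookaround_pattern.search(s))}")
```

Both columns should agree for every sample. Only "a", "unavailable-available", and strings containing one of them as a whole space-separated token are accepted.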

  class="a"
  class="a ui-xxx"
  class="ui-xxx a"
  class="ui-xxx a ui-yyy"
  class="unavailable-available"
  class="unavailable-available foo"
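For an end-to-end check without installing BeautifulSoup, you can apply the same filter with the standard library's `html.parser`. This swaps `html.parser.HTMLParser` in for BeautifulSoup and is only a sketch. `ClassFilter` is a hypothetical helper. The sample row is the question's example, with the stray quote in `class="a'` corrected and the trailing cells omitted.

```python
import re
from html.parser import HTMLParser

WANTED = re.compile(r'(^|\s)(a|unavailable-available)($|\s)')


class ClassFilter(HTMLParser):
    """Hypothetical helper: collects integer text of <td> cells whose class matches WANTED."""

    def __init__(self):
        super().__init__()
        self.in_wanted_td = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; value may be None
            cls = dict(attrs).get("class") or ""
            self.in_wanted_td = bool(WANTED.search(cls))

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_wanted_td = False

    def handle_data(self, data):
        if self.in_wanted_td and data.strip():
            self.values.append(int(data))


row = ('<table><tr><td class="u">4</td><td class="unavailable-available">5</td>'
       '<td class="a">6</td><td class="available-unavailable">7</td>'
       '<td class="u">8</td></tr></table>')

parser = ClassFilter()
parser.feed(row)
parser.close()
print(parser.values)  # [5, 6]
```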
