Regex pattern with subpattern exceptions (Python)


I am using BeautifulSoup to extract the table data (td) tags from a table. The td's have a class of either 'a', 'u', 'e', 'available-unavailable' or 'unavailable-available'. (Yes, I know, quirky class names...)

Here's an example:

<tr>
  <td class="u">4</td>
  <td class="unavailable-available">5</td>
  <td class="a">6</td>
  <td class="available-unavailable">7</td>
  <td class="u">8</td>
  ...

I've been working with a line that incorporates re.compile():

  tab = [int(tag.string) for tag in soup.find('table', {'summary': tablesummary}).findAll("td", attrs={"class": re.compile(r'\Aa')})]

I need to extract only the td's with a class name of 'a' or 'unavailable-available'. I have been trying negative-lookahead assertions without much luck. I would value it if one of the regex legends could produce the correct regex...

table.findAll('td', attrs={"class": re.compile(r'(^|\s)(a|unavailable-available)($|\s)')})

This matches the start of the string or whitespace, followed by "a" or "unavailable-available", followed by whitespace or the end of the string. It'll match these sorts of things:

  class="a"
  class="a ui-xxx"
  class="ui-xxx a"
  class="ui-xxx a ui-yyy"
  class="unavailable-available"
  class="unavailable-available foo"
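
To check the pattern without BeautifulSoup in the loop, here is a minimal sketch that runs it against a few class strings using only the standard re module. The sample strings and the variable names (CLASS_RE, PREFIX_RE, samples) are made up for illustration, and the prefix-only pattern r'\Aa' is an assumption about the kind of pattern that over-matches, since both 'a' and 'available-unavailable' start with "a".

```python
import re

# Pattern from the answer: the wanted class name must be a whole
# whitespace-delimited token, not merely a prefix of a longer name.
CLASS_RE = re.compile(r'(^|\s)(a|unavailable-available)($|\s)')

# Assumed prefix-only pattern, shown for comparison: it anchors at the
# start of the string and accepts anything beginning with "a".
PREFIX_RE = re.compile(r'\Aa')

# Hypothetical class attribute values, for illustration only.
samples = [
    "a",
    "u",
    "e",
    "available-unavailable",
    "unavailable-available",
    "ui-xxx a",
    "ui-xxx ui-yyy",
]

matched = [s for s in samples if CLASS_RE.search(s)]
prefix_matched = [s for s in samples if PREFIX_RE.search(s)]

print(matched)         # ['a', 'unavailable-available', 'ui-xxx a']
print(prefix_matched)  # ['a', 'available-unavailable']
```

The token-boundary pattern rejects 'available-unavailable' because the "a" at the start is followed by "v" rather than whitespace or the end of the string, so no lookahead is needed.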
