parsing - ANTLR on a noisy data stream, Part 3


I'm still in the process of learning ANTLR. I have posted two questions about parsing text and extracting information while leaving aside the "unwanted" words or characters. Following an interesting discussion with Bart Kiers on "Parsing a noisy datastream, part 1" and "Parsing a noisy datastream, part 2", I'm ending up with one more problem.

Originally, my grammar looks like this:

    VERB            : 'sleeping' | 'walking';
    SUBJECT         : 'cat' | 'dog' | 'bird';
    INDIRECT_OBJECT : 'car' | 'sofa';
    ANY2            : 'a'..'z'+ {skip();};
    ANY             : . {skip();};

    parse
        :  sentenceparts+ EOF
        ;

    sentenceparts
        :  SUBJECT VERB INDIRECT_OBJECT
        ;

A sentence like "it's 10pm, and the lazy cat is sleeping heavily on the sofa in front of the tv." will produce the following:

[image: parse tree]

This is good, and it is what I want, i.e. extracting only the words cat, sleeping and sofa, leaving aside the other words. Now, for some reason, I need to introduce a new token in the grammar, let's call it OTHER : 'plane'. It will be used later in another rule. I still want my primary rule to work: SUBJECT VERB INDIRECT_OBJECT. Let's say the token 'plane' appears in the sentence:

"it's 10pm, and the lazy cat on the plane is sleeping heavily on the sofa in front of the tv." This will produce the following error (no surprise here, since the lexer now has a clear definition of the 'plane' token):

[image: resulting error]
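To make the failure concrete, here is a rough Python simulation of what the lexer hands to the parser in both cases. It is only a sketch and not ANTLR itself: the `tokenize` and `matches_strict` helpers and the `TOKEN_TYPES` table are hypothetical stand-ins for the lexer rules and the `sentenceparts` rule, and the sample sentence is shortened.

```python
import re

# Hypothetical stand-in for the lexer rules: word -> token type.
TOKEN_TYPES = {
    "sleeping": "VERB", "walking": "VERB",
    "cat": "SUBJECT", "dog": "SUBJECT", "bird": "SUBJECT",
    "car": "INDIRECT_OBJECT", "sofa": "INDIRECT_OBJECT",
}

def tokenize(text, token_types):
    """Emit token types for known words; every other word or character is skipped."""
    words = re.findall(r"[a-z]+", text)
    return [token_types[w] for w in words if w in token_types]

def matches_strict(tokens):
    """Mimic sentenceparts+ EOF with no tolerance for tokens in between."""
    triple = ["SUBJECT", "VERB", "INDIRECT_OBJECT"]
    return (len(tokens) > 0 and len(tokens) % 3 == 0
            and all(tokens[i:i + 3] == triple for i in range(0, len(tokens), 3)))

sentence = "it's 10pm and the lazy cat on the plane is sleeping heavily on the sofa"

# Without a rule for 'plane', the word is skipped like any other noise.
before = tokenize(sentence, TOKEN_TYPES)
# With OTHER : 'plane' defined, the word becomes a real token.
after = tokenize(sentence, {**TOKEN_TYPES, "plane": "OTHER"})

print(before, matches_strict(before))
print(after, matches_strict(after))
```

In this model the first stream is SUBJECT VERB INDIRECT_OBJECT and matches, while the second contains an OTHER token between SUBJECT and VERB, which the strict rule has no place for.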



Is there a way to tell ANTLR that, once I'm entering the rule sentenceparts, I only care about the three tokens I have defined, namely SUBJECT, VERB and INDIRECT_OBJECT, and that if it comes across a different token, it should not take it into account? Would I be able to do this without putting OTHER? everywhere in the rule?

Well, in fact, I might have found a way to do it. Although it's questionable at this point to introduce tokens if I don't want to parse them, this solution works:

    VERB            : 'sleeping' | 'walking';
    SUBJECT         : 'cat' | 'dog' | 'bird';
    INDIRECT_OBJECT : 'car' | 'sofa';
    OTHER           : 'plane';
    OTHER2          : 'beautiful';
    OTHER3          : 'heavilly';
    ANY2            : 'a'..'z'+ {skip();};
    ANY             : . {skip();};

    parse
        :  sentenceparts+ EOF
        ;

    next
        :  ( options {greedy=false;} : . )*
        ;

    sentenceparts
        :  SUBJECT next VERB next INDIRECT_OBJECT
        ;



On the following sentence, "it's 10pm, and the lazy cat on the beautiful plane is sleeping heavilly on the sofa in front of the tv", this will produce the following tree, with the intermediary tokens:

[image: parse tree]
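The effect of the non-greedy wildcard can be pictured with a small Python sketch. This is a simplified model and not how ANTLR decides internally: it assumes `next` simply swallows tokens until the one the rule is waiting for shows up. The `match_sentence_parts` function and the token list are hypothetical and only illustrate the idea.

```python
def match_sentence_parts(tokens):
    """Simplified model of (SUBJECT next VERB next INDIRECT_OBJECT)+ EOF.

    Returns a list of (subject, verb, object) index triples, or None on failure.
    """
    expected = ["SUBJECT", "VERB", "INDIRECT_OBJECT"]
    triples = []
    i = 0
    while i < len(tokens):
        found = []
        for pos, wanted in enumerate(expected):
            if pos > 0:
                # 'next': skip every token that is not the one we are waiting for.
                while i < len(tokens) and tokens[i] != wanted:
                    i += 1
            if i >= len(tokens) or tokens[i] != wanted:
                return None  # the rule cannot be completed
            found.append(i)
            i += 1
        triples.append(tuple(found))
    return triples or None

# Token types for: cat, beautiful, plane, sleeping, heavilly, sofa
stream = ["SUBJECT", "OTHER2", "OTHER", "VERB", "OTHER3", "INDIRECT_OBJECT"]
print(match_sentence_parts(stream))
```

Under this model the three wanted tokens are found at positions 0, 3 and 5, and the OTHER tokens in between are absorbed. Note that the rule has no `next` before SUBJECT, so a stream that starts with an OTHER token is still rejected.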

