ANTLR on a noisy data stream Part 3
I'm still in the process of learning ANTLR. I have posted two questions so far about parsing text and extracting information while leaving aside "unwanted" words or characters. Following the interesting discussions with Bart Kiers in "Parsing noisy datastream part 1" and "Parsing noisy datastream part 2", I've ended up with one more problem...
Originally, my grammar looks like this:
VERB : 'sleeping' | 'walking';
SUBJECT : 'cat'|'dog'|'bird';
INDIRECT_OBJECT : 'car'| 'sofa';
ANY2 : 'a'..'z'+ {skip();};
ANY : . {skip();};
parse : sentenceparts+ EOF ;
sentenceparts : SUBJECT VERB INDIRECT_OBJECT ;
A sentence like
it's 10pm, the lazy cat is sleeping heavily on the sofa in front of the tv.
will produce the following:

This is good... it does what I want, i.e. it extracts the words cat, sleeping and sofa, and leaves aside the other words. Now, for some reason, I need to introduce a new token in the grammar, let's call it OTHER : 'plane'. It will be used later in another rule. I still want my primary rule to work: SUBJECT VERB INDIRECT_OBJECT. Let's say the token 'plane' appears in the sentence:
it's 10pm, the lazy cat on the plane is sleeping heavily on the sofa in front of the tv.
This will produce the following error (no surprise here, since the lexer has a clear definition of the 'plane' token):

Is there any way to tell ANTLR that, when I'm entering the rule sentenceparts, I only care about the three tokens I have defined, namely SUBJECT, VERB and INDIRECT_OBJECT, and that if it comes across a different token, it should not take it into account? Would I be able to do this without putting OTHER? everywhere in the rule?
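To make the problem concrete, here is a rough Python sketch of which tokens reach the parser in each case. It is only a word-level approximation and not ANTLR's actual lexing algorithm: the `TOKEN_TYPES` table and the `visible_tokens` helper are hypothetical names introduced for illustration, and the assumption is that whole lowercase words are either matched against the literals in the grammar or skipped.

```python
import re

# Hypothetical lookup table mirroring the literals in the grammar,
# including the newly introduced OTHER : 'plane'.
TOKEN_TYPES = {
    "sleeping": "VERB", "walking": "VERB",
    "cat": "SUBJECT", "dog": "SUBJECT", "bird": "SUBJECT",
    "car": "INDIRECT_OBJECT", "sofa": "INDIRECT_OBJECT",
    "plane": "OTHER",
}


def visible_tokens(text):
    """Return the token types that survive the skip() rules.

    Simplification: every run of lowercase letters is treated as one word.
    Known words become tokens; unknown words, digits and punctuation are
    dropped, which is roughly what the two skip() rules are meant to do.
    """
    words = re.findall(r"[a-z]+", text)
    return [TOKEN_TYPES[w] for w in words if w in TOKEN_TYPES]


clean = "it's 10pm, the lazy cat is sleeping heavily on the sofa in front of the tv."
noisy = "it's 10pm, the lazy cat on the plane is sleeping heavily on the sofa in front of the tv."

print(visible_tokens(clean))  # ['SUBJECT', 'VERB', 'INDIRECT_OBJECT']
print(visible_tokens(noisy))  # ['SUBJECT', 'OTHER', 'VERB', 'INDIRECT_OBJECT']
```

In the second case an OTHER token sits between SUBJECT and VERB, so a rule that expects the three tokens back to back can no longer match.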
Well, in fact, I might have found a way to do it... Although it's questionable at this point why I would introduce tokens if I don't want to parse them, this solution works:
VERB : 'sleeping' | 'walking';
SUBJECT : 'cat'|'dog'|'bird';
INDIRECT_OBJECT : 'car'| 'sofa';
OTHER : 'plane';
OTHER2 : 'beautiful';
OTHER3 : 'heavilly';
ANY2 : 'a'..'z'+ {skip();};
ANY : . {skip();};
parse : sentenceparts+ EOF ;
next : ( options {greedy=false;}: .)*;
sentenceparts
: SUBJECT next VERB next INDIRECT_OBJECT
;
On the following sentence
it's 10pm, the lazy cat on the beautiful plane is sleeping heavilly on the sofa in front of the tv
it will produce the following tree... including the intermediary tokens.
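As a rough illustration of what the non-greedy `next` rule does, the Python sketch below walks a token list and skips anything that sits between the three wanted tokens. It is a simplified emulation rather than ANTLR's real prediction mechanism; `match_sentences` is a hypothetical helper, and the token list is written out by hand under the assumption that the lexer has already dropped the unknown words.

```python
def match_sentences(tokens):
    """Extract (subject, verb, indirect_object) triples from a token list.

    tokens is a list of (token_type, text) pairs. Each sentence must start
    with a SUBJECT; any tokens between SUBJECT and VERB, and between VERB
    and INDIRECT_OBJECT, are skipped, like the non-greedy `next` rule.
    """
    wanted = ("SUBJECT", "VERB", "INDIRECT_OBJECT")
    if not tokens:
        raise ValueError("expected at least one sentence")
    sentences = []
    pos = 0
    while pos < len(tokens):
        words = []
        for index, kind in enumerate(wanted):
            if index > 0:
                # Emulates `next`: skip tokens until the wanted type shows up.
                while pos < len(tokens) and tokens[pos][0] != kind:
                    pos += 1
            if pos == len(tokens) or tokens[pos][0] != kind:
                raise ValueError("expected " + kind)
            words.append(tokens[pos][1])
            pos += 1
        sentences.append(tuple(words))
    return sentences


# Tokens assumed to survive the lexer for the sentence above.
tokens = [
    ("SUBJECT", "cat"),
    ("OTHER2", "beautiful"),
    ("OTHER", "plane"),
    ("VERB", "sleeping"),
    ("OTHER3", "heavilly"),
    ("INDIRECT_OBJECT", "sofa"),
]

print(match_sentences(tokens))  # [('cat', 'sleeping', 'sofa')]
```

Note that in this sketch, as in the grammar, skipping only happens between the three tokens: an OTHER token before the first SUBJECT or after the last INDIRECT_OBJECT would still be rejected.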