yacc - BNF: input going to wrong nonterminal -
i'm developing bnf chess algebraic notation , ran interesting case, input going wrong non-terminal.
my start bnf rule follows (note intentionally doesn't include castling or notes):
algebraic_notation : piece start_position capture end_position promotion
piece
, start_position
, capture
, , promotion
can empty, allowing move 'd4'. problem when such move entered, input ('d4') taken start_position
, resulting in error b/c there no more input end_position
, cannot empty.
the obvious hack/workaround allow end_position
empty , check see if got input , act accordingly.
this work, know if there way deal this. possible input not go first matching symbol if causes entire expression not match?
another question whether standard behaviour bnf, or problem yaccer i'm using: ply v 3.3.
tried using flex/bison , got same thing. appears not specific ply.
here relevant rules completeness:
algebraic_notation : piece start_position capture end_position promotion piece : king | queen | bishop | knight | rook | pawn pawn : empty start_position : file | number | file number | empty end_position : file number | empty // line hack/workaround capture : capture | empty promotion : equal queen | equal rook | equal knight | equal bishop | empty empty :
the problem you're ignoring shift/reduce conflict parser generator. while yacc/bison (and presumably ply) resolve errors you, resolution might not doing want, , might result in parser parses langauge other 1 trying parse.
whenever shift/reduce (or reduce/reduce) conflict lr parser generator, need understand conflict (and why occurs) know whether can ignore or whether need fix it. lets fix grammar getting rid of 'hack' (which wrong , not want parse), useless 'empty' rule (which confuses things):
%token file number %% algebraic_notation : piece start_position capture end_position promotion piece : 'k' | 'q' | 'b' | 'n' | 'r' | /*pawn*/ start_position : file | number | file number | /*empty*/ end_position : file number capture : 'x' | /*empty*/ promotion : '=' 'q' | '=' 'r' | '=' 'n' | '=' 'b' | /*empty*/
now when run through 'bison -v' (always use -v verbose output file -- i'm not sure ply's equivalent is), message shift/reduce conflict, , if in .output
file can see is:
state 7 1 algebraic_notation: piece . start_position capture end_position promotion file shift, , go state 9 number shift, , go state 10 file [reduce using rule 11 (start_position)] $default reduce using rule 11 (start_position) start_position go state 11
this telling after seeing piece
, when next token file
, doesn't know whether should shift (treating file
(part of) start_position
) or reduce (giving empty start_position
). that's because needs more lookahead see if there's second position use end_position
know do, ignoring conflict result in parser fails parse lots of valid things (basically, empty start_position
, capture
).
the best way solve lookahead-related shift-reduce conflict involving empty production (or pretty conflict involving empty production, really) unfactor grammar -- rid of empty rule , duplicate rule uses non-terminal both , without it. in case, gives rules:
algebraic_notation : piece capture end_position promotion algebraic_notation : piece start_position capture end_position promotion start_position : file | number | file number
(the other rules unchanged) still have shift-reduce conflict:
state 7 1 algebraic_notation: piece . capture end_position promotion 2 | piece . start_position capture end_position promotion file shift, , go state 9 number shift, , go state 10 'x' shift, , go state 11 file [reduce using rule 14 (capture)] start_position go state 12 capture go state 13
basically, we've moved conflict 1 step , have problem empty capture
rule. unfactor well:
algebraic_notation : piece end_position promotion algebraic_notation : piece capture end_position promotion algebraic_notation : piece start_position end_position promotion algebraic_notation : piece start_position capture end_position promotion capture : 'x'
and bison reports no more conflicts, can reasonably confident parse way want. can simplify bit more getting rid of capture
rule , using literal 'x'
in algebraic_notation
rule. prefer this, think clearer avoid unnecessary indirection:
%token file number %% algebraic_notation : piece end_position promotion algebraic_notation : piece 'x' end_position promotion algebraic_notation : piece start_position end_position promotion algebraic_notation : piece start_position 'x' end_position promotion piece : 'k' | 'q' | 'b' | 'n' | 'r' | /*pawn*/ start_position : file | number | file number end_position : file number promotion : '=' 'q' | '=' 'r' | '=' 'n' | '=' 'b' | /*empty*/
Comments
Post a Comment