regex - Regular expression for recognizing in-text citations
I'm trying to create a regular expression to capture in-text citations.
Here are a few example sentences with in-text citations:
... and the reported results in (Nivre et al., 2007) were not representative ...
... two systems used a Markov chain approach (Sagae and Tsujii 2007).
Nivre (2007) showed ...
... attaching and labeling dependencies (Chen et al., 2007; Dredze et al., 2007).
Currently, the regular expression I have is
\(\D*\d\d\d\d\)
which matches examples 1-3, but not example 4. How can I modify this to capture example 4?
Thanks!
I've been using something like this for that purpose lately:
#!/usr/bin/env perl
use 5.010;
use utf8;
use strict;
use autodie;
use warnings qw< FATAL all >;
use open qw< :std IO :utf8 >;

my $citation_rx = qr{
    \( (?:
        \s*

        # optional author list
        (?:
            # has to start capitalized
            \p{Uppercase_Letter}
            # then have a lower case letter, or maybe an apostrophe
            (?= [\p{Lowercase_Letter}\p{Quotation_Mark}] )
            # before a run of letters and admissible punctuation
            [\p{Alphabetic}\p{Dash_Punctuation}\p{Quotation_Mark}\s,.] +
        ) ?  # hook if and only if you want the authors to be optional!!

        # a reasonable year
        \b (18|19|20) \d\d
        # citation series suffix, up to a six-parter
        [a-f] ? \b

        # trailing semicolon to separate multiple citations
        ; ?
        \s*
    ) +
    \)
}x;

while (<DATA>) {
    while (/$citation_rx/gp) {
        say ${^MATCH};
    }
}

__END__
... and the reported results in (Nivré et al., 2007) were not representative ...
... two systems used a Markov chain approach (Sagae and Tsujii 2007).
Nivre (2007) showed ...
... attaching and labelling dependencies (Chen et al., 2007; Dredze et al., 2007).
When run, it produces:
(Nivré et al., 2007)
(Sagae and Tsujii 2007)
(2007)
(Chen et al., 2007; Dredze et al., 2007)
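If the full pattern is more than you need, a smaller change to the expression from the question may already be enough for example 4. The original stops at the first year because it expects the closing parenthesis right after it, so a semicolon-separated second citation never fits. What follows is only a minimal sketch, assuming citations never contain nested parentheses and that a year is always exactly four digits. The names $simple_rx and find_citations are made up for illustration, and the sample sentences use the capitalized spellings of the author names.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Hypothetical simplified pattern (an assumption, not the pattern above):
# one or more "non-digit text, then a 4-digit year" groups separated by
# semicolons, all inside a single pair of parentheses.
my $simple_rx = qr{
    \(
        [^()\d]* \d{4}            # first citation: no parens or digits, then a year
        (?: ; [^()\d]* \d{4} )*   # further citations, each introduced by a semicolon
    \)
}x;

# Return every parenthesized citation group found in the given string.
sub find_citations {
    my ($text) = @_;
    return $text =~ /($simple_rx)/g;
}

my @samples = (
    '... and the reported results in (Nivre et al., 2007) were not representative ...',
    '... two systems used a Markov chain approach (Sagae and Tsujii 2007).',
    'Nivre (2007) showed ...',
    '... attaching and labeling dependencies (Chen et al., 2007; Dredze et al., 2007).',
);

# Print each citation group on its own line.
say for map { find_citations($_) } @samples;
```

This prints the same four groups as the longer script, including (2007) for the third sentence. The trade-off is that it checks neither capitalization nor the year range, so something like (see page 1234) would also match. It also has no allowance for suffixed years such as 2007a.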