Python - replacing all regex matches in a single line
I have a dynamic regexp, and I don't know in advance how many groups it has. I would like to replace all the matches with XML tags.
Example:
re.sub("(this).*(string)", '<markup>\anygroup</markup>', "this string") >> "<markup>this</markup> <markup>string</markup>"
Is that possible in a single line?
For a constant regexp like the one in your example, do:
re.sub("(this)(.*)(string)", r'<markup>\1</markup>\2<markup>\3</markup>', text)
Note that you need to enclose .* in parentheses as well if you don't want to lose it.
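To see why the extra parentheses matter, here is a small sketch that runs both variants on a sample string. The sample text and the variable names `kept` and `lost` are only illustrative.

```python
import re

text = "this is a string"

# Every part of the match is captured, so the replacement can put all of it back.
kept = re.sub(r"(this)(.*)(string)",
              r"<markup>\1</markup>\2<markup>\3</markup>", text)

# With a bare .* the middle text is matched but not captured,
# so the replacement has no way to restore it.
lost = re.sub(r"(this).*(string)",
              r"<markup>\1</markup><markup>\2</markup>", text)

print(kept)  # <markup>this</markup> is a <markup>string</markup>
print(lost)  # <markup>this</markup><markup>string</markup>
```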
Now, if you don't know what the regexp looks like, it's more difficult, but it should be doable.
pattern = "(this)(.*)(string)"
re.sub(pattern, lambda m: ''.join('<markup>%s</markup>' % s if n % 2 == 0 else s for n, s in enumerate(m.groups())), text)
If the first thing matched by your pattern doesn't have to be marked up, use this instead, with the first group optionally matching some prefix text that should be left alone:
pattern = "()(this)(.*)(string)"
re.sub(pattern, lambda m: ''.join('<markup>%s</markup>' % s if n % 2 == 1 else s for n, s in enumerate(m.groups())), text)
You get the idea.
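Both one-liners can be folded into one small helper, sketched below. The name `mark_alternate` and its `first_marked` flag are made up for this illustration, and the sketch assumes the groups are not nested and together cover the whole match. It passes `default=""` to `m.groups()` so that an optional group that did not take part in the match contributes an empty string instead of `None`.

```python
import re

def mark_alternate(pattern, text, first_marked=True):
    """Wrap every second group of each match in <markup> tags.

    Assumes the groups are not nested and together cover the whole match.
    """
    marked_parity = 0 if first_marked else 1

    def repl(m):
        # default="" avoids None for optional groups that did not participate
        return "".join(
            "<markup>%s</markup>" % s if n % 2 == marked_parity else s
            for n, s in enumerate(m.groups(default=""))
        )

    return re.sub(pattern, repl, text)

a = mark_alternate(r"(this)(.*)(string)", "this is a string")
b = mark_alternate(r"(\w+ )?(this)(.*)(string)", "see this is a string",
                   first_marked=False)
c = mark_alternate(r"(\w+ )?(this)(.*)(string)", "this is a string",
                   first_marked=False)

print(a)  # <markup>this</markup> is a <markup>string</markup>
print(b)  # see <markup>this</markup> is a <markup>string</markup>
print(c)  # <markup>this</markup> is a <markup>string</markup>
```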
If your regexps are complicated and you're not sure you can make everything part of a group, so that exactly every second group needs to be marked up, you might do something smarter with a more complicated function:
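pattern = "(this).*(string)"
def replacement(m):
    s = m.group()
    offset = m.start()  # spans are relative to the whole string, s is only the match
    n_groups = len(m.groups())
    # assume groups do not overlap and are listed left-to-right
    for i in range(n_groups, 0, -1):
        lo, hi = m.span(i)
        if lo == -1:  # this group did not participate in the match
            continue
        lo, hi = lo - offset, hi - offset
        s = s[:lo] + '<markup>' + s[lo:hi] + '</markup>' + s[hi:]
    return s
re.sub(pattern, replacement, text)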
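One detail worth checking when slicing `m.group()` with the values from `m.span(i)`: spans are measured from the start of the searched string, not from the start of the match, so they only line up directly when the match begins at index 0. The sample text below is just an illustration.

```python
import re

text = "so this is a string"
m = re.search(r"(this).*(string)", text)

matched = m.group()   # "this is a string"
offset = m.start()    # 3: where the match begins inside text

# Group spans as reported by the match object (whole-string coordinates)
absolute = [m.span(i) for i in range(1, len(m.groups()) + 1)]
# The same spans shifted so they index into the matched text
relative = [(lo - offset, hi - offset) for lo, hi in absolute]

print(absolute)                                   # [(3, 7), (13, 19)]
print(relative)                                   # [(0, 4), (10, 16)]
print([matched[lo:hi] for lo, hi in relative])    # ['this', 'string']
```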
If you need to handle overlapping groups, you're on your own, but it should be doable.
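As a rough sketch of one way to go about it, assuming that "overlapping" means nested groups and that nested tags are the desired output: collect an open and a close position for every group, sort them, and emit the tags in a single left-to-right pass. The function name `mark_groups` is invented for this example. Groups that overlap only partially (possible with groups inside lookarounds) would still produce badly nested markup, and empty groups are skipped.

```python
import re

def mark_groups(m, open_tag="<markup>", close_tag="</markup>"):
    s = m.group()
    offset = m.start()  # spans are in whole-string coordinates
    events = []
    for i in range(1, len(m.groups()) + 1):
        lo, hi = m.span(i)
        if lo == -1 or lo == hi:  # skip unmatched and empty groups
            continue
        # kind 0 (close) sorts before kind 1 (open) at the same position,
        # so adjacent groups give "</markup><markup>" and not the reverse
        events.append((hi - offset, 0, close_tag))
        events.append((lo - offset, 1, open_tag))

    out, prev = [], 0
    for pos, _kind, tag in sorted(events, key=lambda e: e[:2]):
        out.append(s[prev:pos])
        out.append(tag)
        prev = pos
    out.append(s[prev:])
    return "".join(out)

nested = re.sub(r"(this (is)) a (string)", mark_groups, "so this is a string.")
print(nested)
# so <markup>this <markup>is</markup></markup> a <markup>string</markup>.
```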