Replacing all regex matches in a single line


Question

I have a dynamic regexp, and I don't know in advance how many groups it has. I would like to wrap the text matched by each group in XML tags.

Example:

re.sub("(this).*(string)", r'<markup>\anygroup</markup>', "this is my string")
>> "<markup>this</markup> is my <markup>string</markup>"

Is that even possible in a single line?

12/2/2010 5:55:44 PM

Accepted Answer

For a constant regexp like the one in your example, use:

re.sub("(this)(.*)(string)",
       r'<markup>\1</markup>\2<markup>\3</markup>',
       text)

Note that you need to enclose .* in parentheses as well if you don't want to lose it.
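As a quick check, here is a self-contained run of the call above. The value of text is taken from the question; the variable name result is only for illustration.

```python
import re

text = "this is my string"  # sample input from the question

# Group 2 captures the text between the two words so that \2 can
# write it back unchanged; groups 1 and 3 are the ones wrapped in tags.
result = re.sub(r"(this)(.*)(string)",
                r"<markup>\1</markup>\2<markup>\3</markup>",
                text)
print(result)  # <markup>this</markup> is my <markup>string</markup>
```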

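Now if you don't know what the regexp looks like, it's more difficult, but should be doable. The version below assumes that the whole pattern is made of adjacent groups that alternate between text to mark up (even positions in m.groups()) and text to leave alone (odd positions):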

pattern = "(this)(.*)(string)"
re.sub(pattern,
       lambda m: ''.join('<markup>%s</markup>' % s if n % 2 == 0
                         else s for n, s in enumerate(m.groups())),
       text)

If the first thing matched by your pattern doesn't necessarily have to be marked up, use this instead, with the first group optionally matching some prefix text that should be left alone:

pattern = "()(this)(.*)(string)"
re.sub(pattern,
       lambda m: ''.join('<markup>%s</markup>' % s if n % 2 == 1
                         else s for n, s in enumerate(m.groups())),
       text)

You get the idea.
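The two variants above differ only in which parity gets marked up, so they can be folded into one helper. This is a rough sketch, not a definitive implementation: the name wrap_alternating and its parameters first_marked and tag are made up for illustration, and it still assumes the pattern consists entirely of adjacent, non-nested groups.

```python
import re

def wrap_alternating(pattern, text, first_marked=True, tag="markup"):
    """Wrap every second group of each match in <tag>...</tag>.

    Hypothetical helper. Assumes the pattern is a sequence of adjacent,
    non-nested groups alternating between "mark up" and "leave alone".
    """
    marked_parity = 0 if first_marked else 1

    def repl(m):
        parts = []
        # default="" turns groups that did not participate into empty strings
        for n, s in enumerate(m.groups(default="")):
            if n % 2 == marked_parity:
                parts.append("<%s>%s</%s>" % (tag, s, tag))
            else:
                parts.append(s)
        return "".join(parts)

    return re.sub(pattern, repl, text)

print(wrap_alternating("(this)(.*)(string)", "this is my string"))
# <markup>this</markup> is my <markup>string</markup>
print(wrap_alternating("(so )(this)(.*)(string)", "so this is my string",
                       first_marked=False))
# so <markup>this</markup> is my <markup>string</markup>
```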

If your regexps are complicated and you're not sure you can make everything part of a group, where only every second group needs to be marked up, you might do something smarter with a more complicated function:

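pattern = "(this).*(string)"
def replacement(m):
    s = m.group()
    offset = m.start()  # m.span(i) indexes the whole text, not just s
    n_groups = len(m.groups())
    # assume groups do not overlap and are listed left-to-right
    for i in range(n_groups, 0, -1):
        lo, hi = m.span(i)
        if lo == -1:  # group did not participate in the match
            continue
        lo, hi = lo - offset, hi - offset
        s = s[:lo] + '<markup>' + s[lo:hi] + '</markup>' + s[hi:]
    return s
re.sub(pattern, replacement, text)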

If you need to handle overlapping groups, you're on your own, but it should be doable.
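One possible way to handle nested groups is to record where each group opens and closes and then emit the tags in a single left-to-right pass over the match. The sketch below is only one approach under stated assumptions: the name mark_groups is hypothetical, every group gets the same tag, and empty groups, unmatched groups, and groups captured outside the match (for example inside a lookahead) are skipped.

```python
import re

def mark_groups(pattern, text, tag="markup"):
    """Wrap every non-empty group of every match in <tag>...</tag>.

    Hypothetical helper; tolerates nested groups because tags are
    placed by position rather than by slicing one group at a time.
    """
    open_tag, close_tag = "<%s>" % tag, "</%s>" % tag

    def repl(m):
        base = m.start()
        opens, closes = {}, {}  # offset inside the match -> number of tags
        for i in range(1, m.re.groups + 1):
            lo, hi = m.span(i)
            # skip unmatched (-1, -1), empty, or out-of-match groups
            if lo == hi or lo < m.start() or hi > m.end():
                continue
            opens[lo - base] = opens.get(lo - base, 0) + 1
            closes[hi - base] = closes.get(hi - base, 0) + 1
        s = m.group()
        out = []
        for pos in range(len(s) + 1):
            # close before opening so adjacent groups stay well-formed
            out.append(close_tag * closes.get(pos, 0))
            out.append(open_tag * opens.get(pos, 0))
            if pos < len(s):
                out.append(s[pos])
        return "".join(out)

    return re.sub(pattern, repl, text)

print(mark_groups("((this) is).*(string)", "so this is my string"))
# so <markup><markup>this</markup> is</markup> my <markup>string</markup>
```

Because all tags are identical, the order in which nested tags are opened or closed at the same position does not matter; with distinct tags per group the events would need to be sorted by nesting depth.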

12/12/2010 11:46:44 PM

re.sub() replaces every non-overlapping match it finds. If you pass it a function for repl, that function is called once per match, so you can do even more.
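One way to read this suggestion, assuming the dynamic pattern can be expressed as a set of alternatives rather than as groups inside one larger match: let re.sub() find each piece on its own and wrap it with a function. The words list below is a hypothetical stand-in for whatever produces the dynamic pattern.

```python
import re

text = "this is my string"
words = ["this", "string"]  # hypothetical dynamic input

# Build an alternation; re.escape keeps special characters literal.
pattern = "|".join(re.escape(w) for w in words)

# The function receives each match object and returns its replacement.
result = re.sub(pattern, lambda m: "<markup>%s</markup>" % m.group(0), text)
print(result)  # <markup>this</markup> is my <markup>string</markup>
```

Unlike the group-based versions, this marks up every occurrence independently, so it does not require "this" to be followed by "string".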


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow