Match groups in Python


Question

Is there a way in Python to access match groups without explicitly creating a match object (or another way to beautify the example below)?

Here is an example to clarify my motivation for the question:

Following Perl code

if    ($statement =~ /I love (\w+)/) {
  print "He loves $1\n";
}
elsif ($statement =~ /Ich liebe (\w+)/) {
  print "Er liebt $1\n";
}
elsif ($statement =~ /Je t\'aime (\w+)/) {
  print "Il aime $1\n";
}

translated into Python

m = re.search("I love (\w+)", statement)
if m:
  print "He loves",m.group(1)
else:
  m = re.search("Ich liebe (\w+)", statement)
  if m:
    print "Er liebt",m.group(1)
  else:
    m = re.search("Je t'aime (\w+)", statement)
    if m:
      print "Il aime",m.group(1)

looks very awkward (if-else-cascade, match object creation).

1
43
4/29/2019 2:21:55 AM

Accepted Answer

You could create a little class that returns the boolean result of calling match, and retains the matched groups for subsequent retrieval:

import re

class REMatcher(object):
    def __init__(self, matchstring):
        self.matchstring = matchstring

    def match(self,regexp):
        self.rematch = re.match(regexp, self.matchstring)
        return bool(self.rematch)

    def group(self,i):
        return self.rematch.group(i)


for statement in ("I love Mary", 
                  "Ich liebe Margot", 
                  "Je t'aime Marie", 
                  "Te amo Maria"):

    m = REMatcher(statement)

    if m.match(r"I love (\w+)"): 
        print "He loves",m.group(1) 

    elif m.match(r"Ich liebe (\w+)"):
        print "Er liebt",m.group(1) 

    elif m.match(r"Je t'aime (\w+)"):
        print "Il aime",m.group(1) 

    else: 
        print "???"
36
3/31/2010 7:35:51 PM

Less efficient, but simpler-looking:

m0 = re.match("I love (\w+)", statement)
m1 = re.match("Ich liebe (\w+)", statement)
m2 = re.match("Je t'aime (\w+)", statement)
if m0:
  print "He loves",m0.group(1)
elif m1:
  print "Er liebt",m1.group(1)
elif m2:
  print "Il aime",m2.group(1)

The problem with the Perl stuff is the implicit updating of some hidden variable. That's simply hard to achieve in Python because you need to have an assignment statement to actually update any variables.

The version with less repetition (and better efficiency) is this:

pats = [
    ("I love (\w+)", "He Loves {0}" ),
    ("Ich liebe (\w+)", "Er Liebe {0}" ),
    ("Je t'aime (\w+)", "Il aime {0}")
 ]
for p1, p3 in pats:
    m= re.match( p1, statement )
    if m:
        print p3.format( m.group(1) )
        break

A minor variation that some Perl folk prefer:

pats = {
    "I love (\w+)" : "He Loves {0}",
    "Ich liebe (\w+)" : "Er Liebe {0}",
    "Je t'aime (\w+)" : "Il aime {0}",
}
for p1 in pats:
    m= re.match( p1, statement )
    if m:
        print pats[p1].format( m.group(1) )
        break

This is hardly worth mentioning except it does come up sometimes from Perl programmers.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon