Matching multiple regex patterns with the alternation operator?


I ran into a small problem using Python Regex.

Suppose this is the input:

(zyx)bc

What I'm trying to achieve is to get whatever is between parentheses as a single match, and every character outside them as an individual match. The desired result would be along the lines of:

['zyx', 'b', 'c']

The order of matches should be kept.

I've tried obtaining this with Python 3.3, but can't seem to figure out the correct Regex. So far I have:

import re

matches = re.findall(r'\((.*?)\)|\w', '(zyx)bc')

print(matches) yields the following:

['zyx', '', '']

Any ideas what I'm doing wrong?

10/1/2014 1:25:31 PM

Accepted Answer

From the documentation of re.findall:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
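
A quick way to see the rule quoted above is to run the same input through patterns with zero, one, and two groups. This is only a small sketch; the variable names are made up for illustration.

```python
import re

text = '(zyx)bc'

# No groups: findall returns the full text of each match.
no_groups = re.findall(r'\(.*?\)|\w', text)

# One group: findall returns only that group's text, which is an
# empty string whenever the other alternative was the one that matched.
one_group = re.findall(r'\((.*?)\)|\w', text)

# Two groups: findall returns one tuple per match, one slot per group.
two_groups = re.findall(r'\((.*?)\)|(\w)', text)

print(no_groups)   # ['(zyx)', 'b', 'c']
print(one_group)   # ['zyx', '', '']
print(two_groups)  # [('zyx', ''), ('', 'b'), ('', 'c')]
```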

While your regexp matches the string three times, the (.*?) group is empty for the second and third matches, so findall returns an empty string for each of them. If you want the text matched by the other half of the regexp, you can add a second group:

>>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
[('zyx', ''), ('', 'b'), ('', 'c')]

Alternatively, you could remove all the groups to get a simple list of strings again:

>>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
['(zyx)', 'b', 'c']

You would need to manually remove the parentheses though.
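
One possible way to do that cleanup, as a minimal sketch: it assumes that any match starting with '(' came from the parenthesised alternative, which holds here because \w can never match '('. The names full_matches and tokens are just illustrative.

```python
import re

full_matches = re.findall(r'\(.*?\)|\w', '(zyx)bc')

# A match from the first alternative always starts with '(' and ends
# with ')', so slicing off one character on each side is enough.
# Single-character \w matches pass through unchanged.
tokens = [m[1:-1] if m.startswith('(') else m for m in full_matches]

print(tokens)  # ['zyx', 'b', 'c']
```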

1/6/2013 1:00:29 PM

Let's take a look at how your pattern is parsed, using re.DEBUG.

  literal 40 
  subpattern 1 
    min_repeat 0 65535 
      any None 
  literal 41 
    category category_word

Ouch, there's only one subpattern in there, and when a pattern contains any subpattern, re.findall returns only the subpatterns rather than the whole match!

a = re.findall(r'\((.*?)\)|(.)', '(zyx)bc',re.DEBUG); a
[('zyx', ''), ('', 'b'), ('', 'c')]
  literal 40 
  subpattern 1 
    min_repeat 0 65535 
      any None 
  literal 41 
  subpattern 2 
    any None

Better. :)

Now we just have to make this into the format you want.

[i[0] if i[0] != '' else i[1] for i in a]
['zyx', 'b', 'c']
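
If you'd rather skip the intermediate list of tuples, a variation on the same idea is to iterate with re.finditer and pick whichever group took part in each match. This is a sketch, not the only way to do it; tokenize is a hypothetical helper name, and it relies on the fact that both alternatives of this pattern contain a group, so lastindex is never None.

```python
import re

def tokenize(text):
    """Return parenthesised chunks whole and every other character singly."""
    tokens = []
    for m in re.finditer(r'\((.*?)\)|(.)', text):
        # lastindex is the number of the group that took part in the match:
        # 1 for the parenthesised alternative, 2 for the single character.
        tokens.append(m.group(m.lastindex))
    return tokens

print(tokenize('(zyx)bc'))  # ['zyx', 'b', 'c']
```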
