How can I get the start and end positions of all matches using the
re module? For example given the pattern
r'[a-z]' and the string
'a1b2c3d4' I'd want to get the positions where it finds each letter. Ideally, I'd like to get the text of the match back too.
import re p = re.compile("[a-z]") for m in p.finditer('a1b2c3d4'): print(m.start(), m.group())
span() returns both start and end indexes in a single tuple. Since the match method only checks if the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.
>>> p = re.compile('[a-z]+') >>> print p.match('::: message') None >>> m = p.search('::: message') ; print m <re.MatchObject instance at 80c9650> >>> m.group() 'message' >>> m.span() (4, 11)
Combine that with:
In Python 2.2, the finditer() method is also available, returning a sequence of MatchObject instances as an iterator.
>>> p = re.compile( ... ) >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...') >>> iterator <callable-iterator object at 0x401833ac> >>> for match in iterator: ... print match.span() ... (0, 2) (22, 24) (29, 31)
you should be able to do something on the order of
for match in re.finditer(r'[a-z]', 'a1b2c3d4'): print match.span()