How to find all occurrences of a substring?


Question

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print string.find('test') # 0
print string.rfind('test') # 15

#this is the goal
print string.find_all('test') # [0,5,10,15]
1
310
5/1/2019 4:26:54 AM

Accepted Answer

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns a generator, so you could change the [] in the above to () to get a generator instead of a list which will be more efficient if you're only iterating through the results once.

455
5/16/2018 3:30:53 PM

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon