Find the nth occurrence of substring in a string


This seems like it should be pretty trivial, but I am new at Python and want to do it the most Pythonic way.

I want to find the n'th occurrence of a substring in a string.

There's got to be something equivalent to what I WANT to do, which is:

mystring.find("substring", 2nd)

How can you achieve this in Python?

12/10/2009 9:51:55 PM

Accepted Answer

Mark's iterative approach would be the usual way, I think.

Here's an alternative with string-splitting, which can often be useful for search-related tasks:

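def findnth(haystack, needle, n):
    # n is zero-based: n=0 finds the first occurrence.
    # Split on at most n+1 needles, so the last part is whatever follows occurrence n.
    parts = haystack.split(needle, n + 1)
    if len(parts) <= n + 1:
        return -1  # fewer than n+1 occurrences
    # Occurrence n ends where the last part begins; step back over the needle.
    return len(haystack) - len(parts[-1]) - len(needle)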
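
One thing that is easy to miss: as written, findnth counts from zero, so n=0 means the first occurrence. Here's a quick check of that, with the function restated so the snippet runs on its own. The sample string is just the one from the one-liner in this answer.

```python
def findnth(haystack, needle, n):
    # Same logic as the function above; n is zero-based.
    parts = haystack.split(needle, n + 1)
    if len(parts) <= n + 1:
        return -1
    return len(haystack) - len(parts[-1]) - len(needle)


print(findnth("foo bar bar bar", "bar", 0))  # first occurrence
print(findnth("foo bar bar bar", "bar", 2))  # third occurrence
print(findnth("foo bar bar bar", "bar", 3))  # there is no fourth occurrence
```

If you'd rather count from one, like the question's "2nd", call it with n-1.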

And here's a quick one-liner for the second occurrence. It's somewhat dirty, in that you have to choose some chaff that can't match the needle and that has the same length as the needle, so the indexes don't shift:

'foo bar bar bar'.replace('bar', 'XXX', 1).find('bar')
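
The same trick stretches to any n if you mask the first n-1 occurrences instead of just one. This is only a sketch: find_nth_by_masking is a made-up name, n is one-based here, and it assumes the needle is non-empty and contains no NUL characters, so the chaff can never be part of a match.

```python
def find_nth_by_masking(haystack, needle, n):
    # Hypothetical helper, not part of the original answer; n is one-based.
    # Assumption: needle is non-empty and contains no "\0" characters.
    chaff = "\0" * len(needle)  # same length as needle, so indexes don't shift
    # Mask the first n-1 occurrences, then the next find() lands on occurrence n.
    return haystack.replace(needle, chaff, n - 1).find(needle)


print(find_nth_by_masking("foo bar bar bar", "bar", 2))
```

It still builds a modified copy of the whole string, so it is more of a curiosity than a replacement for the iterative approach.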
12/10/2009 9:26:39 PM

Here's a more Pythonic version of the straightforward iterative solution:

def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start


>>> find_nth("foofoofoofoo", "foofoo", 2)
6

If you want to find the nth overlapping occurrence of needle, you can increment by 1 instead of len(needle), like this:

def find_nth_overlapping(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+1)
        n -= 1
    return start


>>> find_nth_overlapping("foofoofoofoo", "foofoo", 2)
3

This is easier to read than Mark's version, and it doesn't require the extra memory of the splitting version or importing the regular expression module. It also adheres to a few of the rules in the Zen of Python, unlike the various re approaches:

  1. Simple is better than complex.
  2. Flat is better than nested.
  3. Readability counts.
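
For comparison, an re-based version might look roughly like the sketch below. It is not taken from any particular answer: find_nth_re is a made-up name, n is one-based and assumed to be at least 1, and re.escape is used so the needle is matched literally rather than as a pattern.

```python
import re
from itertools import islice


def find_nth_re(haystack, needle, n):
    # Hypothetical helper for comparison only; n is one-based (n >= 1).
    # finditer yields non-overlapping matches from left to right.
    matches = re.finditer(re.escape(needle), haystack)
    # Skip the first n-1 matches and take the next one, if there is one.
    match = next(islice(matches, n - 1, None), None)
    return match.start() if match else -1


print(find_nth_re("foofoofoofoo", "foofoo", 2))
```

It gives the same result as find_nth for non-overlapping matches, but it needs two imports and a bit more machinery to say the same thing.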

Licensed under: CC-BY-SA with attribution