# Best way to determine if a sequence is in another sequence in Python

### Question

This is a generalization of the "string contains substring" problem to (more) arbitrary types.

Given an sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:

Example usage (Sequence in Sequence):

``````>>> seq_in_seq([5,6],  [4,'a',3,5,6])
3
>>> seq_in_seq([5,7],  [4,'a',3,5,6])
-1 # or None, or whatever
``````

So far, I just rely on brute force and it seems slow, ugly, and clumsy.

1
24
6/8/2017 10:52:37 PM

I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/

It finds all the correct subsequences in a given sequence, and should be used as an iterator:

``````>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s
3
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s
(nothing)
``````
20
1/8/2009 9:06:56 PM

Here's a brute-force approach `O(n*m)` (similar to @mcella's answer). It might be faster than the Knuth-Morris-Pratt algorithm implementation in pure Python `O(n+m)` (see @Gregg Lind answer) for small input sequences.

``````#!/usr/bin/env python
def index(subseq, seq):
"""Return an index of `subseq`uence in the `seq`uence.

Or `-1` if `subseq` is not a subsequence of the `seq`.

The time complexity of the algorithm is O(n*m), where

n, m = len(seq), len(subseq)

>>> index([1,2], range(5))
1
>>> index(range(1, 6), range(5))
-1
>>> index(range(5), range(5))
0
>>> index([1,2], [0, 1, 0, 1, 2])
3
"""
i, n, m = -1, len(seq), len(subseq)
try:
while True:
i = seq.index(subseq, i + 1, n - m + 1)
if subseq == seq[i:i + m]:
return i
except ValueError:
return -1

if __name__ == '__main__':
import doctest; doctest.testmod()
``````

I wonder how large is the small in this case?