I need a rolling window (aka sliding window) iterable over a sequence/iterator/generator. Default Python iteration can be considered a special case, where the window length is 1. I'm currently using the following code. Does anyone have a more Pythonic, less verbose, or more efficient method for doing this?
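```python
from itertools import islice

def rolling_window(seq, window_size):
    it = iter(seq)
    win = list(islice(it, window_size))  # First window
    if len(win) < window_size:
        return  # Too few items for even one full window
    yield win
    for e in it:  # Subsequent windows
        win[:-1] = win[1:]  # Shift everything left by one
        win[-1] = e         # Put the new item in the last slot
        yield win

if __name__ == "__main__":
    for w in rolling_window(range(6), 3):
        print(w)

"""Example output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
"""
```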
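One property of this generator is worth knowing before comparing alternatives: it yields the same list object every time and mutates it in place, so anything that keeps a reference to an earlier window sees the later contents. The sketch below is a Python 3 restatement of the generator above (written so it runs on its own) that shows the effect and the usual remedy of copying each window as it is produced.

```python
from itertools import islice

def rolling_window(seq, window_size):
    """Python 3 restatement of the question's generator (one list, mutated in place)."""
    it = iter(seq)
    win = list(islice(it, window_size))
    if len(win) < window_size:
        return
    yield win
    for e in it:
        win[:-1] = win[1:]  # shift left by one; still the same list object
        win[-1] = e
        yield win

# Collecting the windows stores four references to ONE list,
# so every entry shows that list's final state.
aliased = list(rolling_window(range(6), 3))

# Copying each window when it is yielded keeps every snapshot.
copied = [list(w) for w in rolling_window(range(6), 3)]

print(aliased)  # [[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
print(copied)   # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```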
There's one in the itertools examples of an old version of the Python docs:
```python
from itertools import islice

def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable.

    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
```
The one from the docs is a little more succinct and uses itertools to greater effect, I imagine.
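For comparison, later versions of the itertools documentation dropped this recipe in favor of a deque-based sliding_window recipe (and Python 3.10 added itertools.pairwise for the n=2 case). The sketch below is modeled on that recipe; check the docs for your Python version for its exact wording. Instead of building a slice plus a concatenated tuple on every step, it appends to a bounded deque and makes one tuple per window.

```python
from collections import deque
from itertools import islice

def sliding_window(iterable, n):
    """Yield overlapping tuples of length n, e.g. 'ABCD', 3 -> ABC, BCD."""
    it = iter(iterable)
    # Pre-load n - 1 items; the first append in the loop completes the first window.
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)  # once full, maxlen makes the deque drop its oldest item
        yield tuple(window)

print(list(sliding_window("ABCD", 3)))  # [('A', 'B', 'C'), ('B', 'C', 'D')]
```

Like the older tuple-slicing recipe, it yields nothing when the input has fewer than n items.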
This seems tailor-made for a collections.deque, since you essentially have a FIFO (add to one end, remove from the other). However, even if you use a list, you shouldn't be slicing twice; instead, you should probably just pop(0) from the list and append() the new item.
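A list that is updated with pop(0) and append() might look like the following sketch; rolling_window_list is a hypothetical name, and this is an illustration rather than code from the original post. Note that list.pop(0) still shifts every remaining element left, so each step costs time proportional to the window size, which is part of why a deque (O(1) at both ends) fits this job better.

```python
from itertools import islice

def rolling_window_list(seq, window_size):
    """Sliding window over seq using one list updated with pop(0)/append()."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    it = iter(seq)
    win = list(islice(it, window_size))
    if len(win) < window_size:
        return  # not enough items for even one full window
    yield win
    for e in it:
        win.pop(0)     # drop the oldest item (shifts the rest left by one)
        win.append(e)  # add the newest item at the right end
        yield win      # same list object each time; copy it if you keep it

for w in rolling_window_list(range(6), 3):
    print(w)
```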
Here is an optimized deque-based implementation patterned after your original:
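```python
from collections import deque

def window(seq, n=2):
    it = iter(seq)
    # First window; padded with None if seq has fewer than n items.
    win = deque((next(it, None) for _ in range(n)), maxlen=n)
    yield win
    append = win.append  # bind the method once to skip repeated attribute lookups
    for e in it:
        append(e)  # maxlen=n makes the deque discard its leftmost item automatically
        yield win
```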
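Two behaviors of this generator are easy to miss, shown in the short usage sketch below (which repeats the function so it runs on its own). First, next(it, None) pads the first window with None when the input has fewer than n items, so a short input still produces one window instead of none. Second, the generator keeps mutating and re-yielding one deque, so convert each window (for example with tuple) if you need to keep it.

```python
from collections import deque

def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in range(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

# Snapshot each window, because the generator keeps mutating one deque.
print([tuple(w) for w in window(range(5), 3)])
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

# A too-short input is padded with None rather than skipped.
print([tuple(w) for w in window([1], 3)])
# [(1, None, None)]
```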
In my tests it handily beats everything else posted here most of the time, though pillmuncher's tee version beats it for large iterables and small windows. On larger windows, the deque pulls ahead again in raw speed.
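pillmuncher's code is not reproduced in this thread, so the sketch below only illustrates the general tee technique as an assumption, not that answer's exact implementation: make n independent copies of the input iterator with itertools.tee, advance the i-th copy by i items, and zip the copies together. In CPython both tee and zip are implemented in C, and zip builds a fresh tuple for every window; window_tee is a hypothetical name.

```python
from itertools import tee

def window_tee(seq, n=2):
    """Yield tuples of n consecutive items using n staggered tee copies."""
    iters = tee(seq, n)
    for i, it in enumerate(iters):
        for _ in range(i):   # advance copy i by i items
            next(it, None)   # None default: stop quietly if the input is short
    return zip(*iters)       # zip stops when the furthest-ahead copy runs out

print(list(window_tee(range(6), 3)))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
```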
Access to individual items in the deque may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w) in the body of my loop; this plays to the deque's strength (iterating from one item to the next is fast), and that loop ran a full 20% faster than the next fastest method, pillmuncher's. When I changed it to individually look up and add items in a window of ten, the tables turned and the tee method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee was still a little faster. Overall, I would estimate that either one is plenty fast for most uses; if you need a little more performance, profile and pick the one that works best.
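To follow the "profile and pick" advice, a small timeit harness along these lines can compare candidates on your own data. The two window functions are compact restatements of the deque and tee approaches discussed above, the two workloads mirror the sum(w) and per-index tests described in the text, and the input size, window size, and repeat count are arbitrary placeholders. No timings are claimed here; results will vary by machine and Python version.

```python
import timeit
from collections import deque
from itertools import tee

def window_deque(seq, n):
    it = iter(seq)
    win = deque((next(it, None) for _ in range(n)), maxlen=n)
    yield win
    for e in it:
        win.append(e)
        yield win

def window_tee(seq, n):
    iters = tee(seq, n)
    for i, it in enumerate(iters):
        for _ in range(i):
            next(it, None)
    return zip(*iters)

def consume_sum(windows):
    """Workload that iterates over each window, like the sum(w) test."""
    total = 0
    for w in windows:
        total += sum(w)
    return total

def consume_index(windows, n):
    """Workload that looks up every item by index instead of iterating."""
    total = 0
    for w in windows:
        for i in range(n):
            total += w[i]
    return total

if __name__ == "__main__":
    data = range(10_000)  # placeholder input; substitute something like your real data
    n = 10                # placeholder window size
    for name, fn in [("deque", window_deque), ("tee", window_tee)]:
        t_sum = timeit.timeit(lambda: consume_sum(fn(data, n)), number=3)
        t_idx = timeit.timeit(lambda: consume_index(fn(data, n), n), number=3)
        print(f"{name}: sum {t_sum:.3f}s, index {t_idx:.3f}s")
```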