# pandas rolling computation with window based on values instead of counts

### Question

I'm looking for a way to do something like the various `rolling_*` functions of `pandas`, but I want the window of the rolling computation to be defined by a range of values (say, a range of values of a column of the DataFrame), not by the number of rows in the window.

As an example, suppose I have this data:

``````>>> print d
RollBasis  ToRoll
0          1       1
1          1       4
2          1      -5
3          2       2
4          3      -4
5          5      -2
6          8       0
7         10     -13
8         12      -2
9         13      -5
``````

If I do something like `rolling_sum(d, 5)`, I get a rolling sum in which each window contains 5 rows. But what I want is a rolling sum in which each window contains a certain range of values of `RollBasis`. That is, I'd like to be able to do something like `d.roll_by(sum, 'RollBasis', 5)`, and get a result where the first window contains all rows whose `RollBasis` is between 1 and 5, then the second window contains all rows whose `RollBasis` is between 2 and 6, then the third window contains all rows whose `RollBasis` is between 3 and 7, etc. The windows will not have equal numbers of rows, but the range of `RollBasis` values selected in each window will be the same. So the output should be like:

``````>>> d.roll_by(sum, 'RollBasis', 5)
1    -4    # sum of elements with 1 <= Rollbasis <= 5
2    -4    # sum of elements with 2 <= Rollbasis <= 6
3    -6    # sum of elements with 3 <= Rollbasis <= 7
4    -2    # sum of elements with 4 <= Rollbasis <= 8
# etc.
``````

I can't do this with `groupby`, because `groupby` always produces disjoint groups. I can't do it with the rolling functions, because their windows always roll by number of rows, not by values. So how can I do it?

1
20
1/13/2013 5:02:41 AM

I think this does what you want:

``````In : df
Out:
RollBasis  ToRoll
0          1       1
1          1       4
2          1      -5
3          2       2
4          3      -4
5          5      -2
6          8       0
7         10     -13
8         12      -2
9         13      -5

In : def f(x):
...:     ser = df.ToRoll[(df.RollBasis >= x) & (df.RollBasis < x+5)]
...:     return ser.sum()
``````

The above function takes a value, in this case RollBasis, and then indexes the data frame column ToRoll based on that value. The returned series consists of ToRoll values that meet the RollBasis + 5 criterion. Finally, that series is summed and returned.

``````In : df['Rolled'] = df.RollBasis.apply(f)

In : df
Out:
RollBasis  ToRoll  Rolled
0          1       1      -4
1          1       4      -4
2          1      -5      -4
3          2       2      -4
4          3      -4      -6
5          5      -2      -2
6          8       0     -15
7         10     -13     -20
8         12      -2      -7
9         13      -5      -5
``````

Code for the toy example DataFrame in case someone else wants to try:

``````In : from pandas import *

In : import io

In : text = """\
...:    RollBasis  ToRoll
...: 0          1       1
...: 1          1       4
...: 2          1      -5
...: 3          2       2
...: 4          3      -4
...: 5          5      -2
...: 6          8       0
...: 7         10     -13
...: 8         12      -2
...: 9         13      -5
...: """

``````
14
1/13/2013 7:49:12 PM

Based on Zelazny7's answer, I created this more general solution:

``````def rollBy(what, basis, window, func):
def applyToWindow(val):
chunk = what[(val<=basis) & (basis<val+window)]
return func(chunk)
return basis.apply(applyToWindow)

>>> rollBy(d.ToRoll, d.RollBasis, 5, sum)
0    -4
1    -4
2    -4
3    -4
4    -6
5    -2
6   -15
7   -20
8    -7
9    -5
Name: RollBasis
``````

It's still not ideal as it is very slow compared to `rolling_apply`, but perhaps this is inevitable.