I'm looking for a way to do something like the various `rolling_*`

functions of `pandas`

, but I want the window of the rolling computation to be defined by a range of values (say, a range of values of a column of the DataFrame), not by the number of rows in the window.

As an example, suppose I have this data:

```
>>> print d
RollBasis ToRoll
0 1 1
1 1 4
2 1 -5
3 2 2
4 3 -4
5 5 -2
6 8 0
7 10 -13
8 12 -2
9 13 -5
```

If I do something like `rolling_sum(d, 5)`

, I get a rolling sum in which each window contains 5 rows. But what I want is a rolling sum in which each window contains a certain range of values of `RollBasis`

. That is, I'd like to be able to do something like `d.roll_by(sum, 'RollBasis', 5)`

, and get a result where the first window contains all rows whose `RollBasis`

is between 1 and 5, then the second window contains all rows whose `RollBasis`

is between 2 and 6, then the third window contains all rows whose `RollBasis`

is between 3 and 7, etc. The windows will not have equal numbers of rows, but the range of `RollBasis`

values selected in each window will be the same. So the output should be like:

```
>>> d.roll_by(sum, 'RollBasis', 5)
1 -4 # sum of elements with 1 <= Rollbasis <= 5
2 -4 # sum of elements with 2 <= Rollbasis <= 6
3 -6 # sum of elements with 3 <= Rollbasis <= 7
4 -2 # sum of elements with 4 <= Rollbasis <= 8
# etc.
```

I can't do this with `groupby`

, because `groupby`

always produces disjoint groups. I can't do it with the rolling functions, because their windows always roll by number of rows, not by values. So how can I do it?

I think this does what you want:

```
In [1]: df
Out[1]:
RollBasis ToRoll
0 1 1
1 1 4
2 1 -5
3 2 2
4 3 -4
5 5 -2
6 8 0
7 10 -13
8 12 -2
9 13 -5
In [2]: def f(x):
...: ser = df.ToRoll[(df.RollBasis >= x) & (df.RollBasis < x+5)]
...: return ser.sum()
```

The above function takes a value, in this case RollBasis, and then indexes the data frame column ToRoll based on that value. The returned series consists of ToRoll values that meet the RollBasis + 5 criterion. Finally, that series is summed and returned.

```
In [3]: df['Rolled'] = df.RollBasis.apply(f)
In [4]: df
Out[4]:
RollBasis ToRoll Rolled
0 1 1 -4
1 1 4 -4
2 1 -5 -4
3 2 2 -4
4 3 -4 -6
5 5 -2 -2
6 8 0 -15
7 10 -13 -20
8 12 -2 -7
9 13 -5 -5
```

Code for the toy example DataFrame in case someone else wants to try:

```
In [1]: from pandas import *
In [2]: import io
In [3]: text = """\
...: RollBasis ToRoll
...: 0 1 1
...: 1 1 4
...: 2 1 -5
...: 3 2 2
...: 4 3 -4
...: 5 5 -2
...: 6 8 0
...: 7 10 -13
...: 8 12 -2
...: 9 13 -5
...: """
In [4]: df = read_csv(io.BytesIO(text), header=0, index_col=0, sep='\s+')
```

Based on Zelazny7's answer, I created this more general solution:

```
def rollBy(what, basis, window, func):
def applyToWindow(val):
chunk = what[(val<=basis) & (basis<val+window)]
return func(chunk)
return basis.apply(applyToWindow)
>>> rollBy(d.ToRoll, d.RollBasis, 5, sum)
0 -4
1 -4
2 -4
3 -4
4 -6
5 -2
6 -15
7 -20
8 -7
9 -5
Name: RollBasis
```

It's still not ideal as it is very slow compared to `rolling_apply`

, but perhaps this is inevitable.

Licensed under: CC-BY-SA with attribution

Not affiliated with: Stack Overflow