I'm trying to wrap my head around Pandas groupby methods. I'd like to write a function that performs some aggregations and then returns a Pandas DataFrame. Here's a grossly simplified example using sum(). I know there are easier ways to do simple sums; in real life my function is more complex:

```
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': [1.0, 2, 3, 4]})

In [3]: df
Out[3]:
  col1  col2
0    A     1
1    A     2
2    B     3
3    B     4

def func2(df):
    dfout = pd.DataFrame({'col1': df['col1'].unique(),
                          'someData': sum(df['col2'])})
    return dfout

t = df.groupby('col1').apply(func2)

In [6]: t
Out[6]:
       col1  someData
col1
A    0    A         3
B    0    B         7
```

I did not expect to have `col1` in there twice, nor did I expect that mystery index-looking thing. I really thought I would just get `col1` & `someData`.

In my real life application I'm grouping by more than one column and really would like to get back a DataFrame and not a Series object.

Any ideas for a solution or an explanation on what Pandas is doing in my example above?

**----- added info -----**

I should have started with this example, I think:

```
In [13]: import pandas as pd
In [14]: df = pd.DataFrame({'col1': ['A', 'A', 'A', 'B', 'B', 'B'], 'col2': ['C', 'D', 'D', 'D', 'C', 'C'], 'col3': [.1, .2, .4, .6, .8, 1]})
In [15]: df
Out[15]:
  col1 col2  col3
0    A    C   0.1
1    A    D   0.2
2    A    D   0.4
3    B    D   0.6
4    B    C   0.8
5    B    C   1.0
In [16]: def func3(df):
   ....:     dfout = sum(df['col3']**2)
   ....:     return dfout
   ....:
In [17]: t = df.groupby(['col1', 'col2']).apply(func3)
In [18]: t
Out[18]:
col1  col2
A     C       0.01
      D       0.20
B     C       1.64
      D       0.36
```

In the above illustration, the result of `apply()` is a Pandas Series, and it lacks the groupby columns from the `df.groupby` (they only show up as the index). The essence of what I'm struggling with is this: how do I create a function that I can apply to a groupby and that returns both the result of the function AND the columns on which it was grouped?

**----- yet another update ------**

It appears that if I then do this:

```
pd.DataFrame(t).reset_index()
```

I get back a dataframe which is really close to what I was after.

The reason you are seeing the column of 0s is that the output of `.unique()` is an **array**, so `func2` returns a one-row DataFrame for each group, and that row carries its own index (0).
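To see this concretely, here is a minimal sketch that rebuilds by hand what `func2` returns for the `'A'` group. The names `group_a` and `piece` are just illustrative, and the group is selected with a boolean mask rather than through `groupby` to keep the example small.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': [1.0, 2, 3, 4]})

# Rows of the 'A' group, selected by hand (illustrative stand-in for one group)
group_a = df[df['col1'] == 'A']

# .unique() gives a length-1 array, so the constructor builds a one-row
# DataFrame with a fresh default index starting at 0.
piece = pd.DataFrame({'col1': group_a['col1'].unique(),
                      'someData': group_a['col2'].sum()})

print(piece.index.tolist())
print(piece.shape)
```

When `apply` stitches these one-row pieces together, it keeps each piece's own index underneath the group label, which is where the extra level comes from.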

The best way to understand how your apply is going to work is to inspect each action group-wise:

```
In [11]: g = df.groupby('col1')
In [12]: g.get_group('A')
Out[12]:
  col1  col2
0    A     1
1    A     2
In [13]: g.get_group('A')['col1'].unique()
Out[13]: array([A], dtype=object)
In [14]: sum(g.get_group('A')['col2'])
Out[14]: 3.0
```
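The same group-wise inspection can be done for every group at once by iterating over the groupby object. This is only a sketch; `group_sums` is a hypothetical name used to collect the per-group results.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': [1.0, 2, 3, 4]})

# Iterating over a GroupBy yields (group label, sub-DataFrame) pairs.
group_sums = {}
for label, group in df.groupby('col1'):
    # group holds only the rows whose col1 equals label
    group_sums[label] = group['col2'].sum()
    print(label, len(group), group_sums[label])
```

Each `group` here is the piece that a function passed to `apply` would work on, so printing it is a quick way to check what that function will receive.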

*The majority of the time you want this to be an aggregated value.*

The output of `grouped.apply` will by default have the group labels as an index (the unique values of 'col1'), so your example construction of `col1` seems redundant to me.
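One way to get a DataFrame out of `apply` without building `col1` yourself is to return a Series of named results from the function. The sketch below assumes that shape of function; `summarize` is a hypothetical name, and `col2` is selected explicitly so the function only sees the data column.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': [1.0, 2, 3, 4]})

def summarize(group):
    # Return one named value per result column; no need to repeat col1 here
    return pd.Series({'someData': group['col2'].sum()})

# Selecting [['col2']] keeps the grouping column out of what summarize receives
out = df.groupby('col1')[['col2']].apply(summarize)

print(out.index.tolist())
print(out.columns.tolist())
```

The group labels end up in the index and the keys of the returned Series become the columns, so there is no duplicated `col1` and no extra index level.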

Note: To pop `'col1'` (the index) back to a column you can call `reset_index`, so in this case:

```
In [15]: g.sum().reset_index()
Out[15]:
  col1  col2
0    A     3
1    B     7
```
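The same idea carries over to the two-column grouping from the question's second example. The following is a sketch rather than the only way to do it: `sum_sq` is a column name chosen for illustration, and the sum of squares is written as a lambda passed to `agg`.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'col2': ['C', 'D', 'D', 'D', 'C', 'C'],
                   'col3': [.1, .2, .4, .6, .8, 1]})

# Aggregate col3 per (col1, col2) pair; the result is a Series with a MultiIndex
sums = df.groupby(['col1', 'col2'])['col3'].agg(lambda s: (s ** 2).sum())

# reset_index turns both index levels into columns and names the value column
out = sums.reset_index(name='sum_sq')

print(out)
```

This gives a plain DataFrame with `col1`, `col2` and the computed value side by side, which is close to what `pd.DataFrame(t).reset_index()` produces but with a readable name for the result column.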

Licensed under: CC-BY-SA with attribution
