Apply function to pandas groupby


I have a pandas dataframe with a column called my_labels which contains strings: 'A', 'B', 'C', 'D', 'E'. I would like to count the number of occurances of each of these strings then divide the number of counts by the sum of all the counts. I'm trying to do this in Pandas like this:

func = lambda x: x.size() / x.sum()
data = frame.groupby('my_labels').apply(func)

This code throws an error, 'DataFrame object has no attribute 'size'. How can I apply a function to calculate this in Pandas?

3/13/2013 12:01:13 AM

Accepted Answer

apply takes a function to apply to each value, not the series, and accepts kwargs. So, the values do not have the .size() method.

Perhaps this would work:

from pandas import *

d = {"my_label": Series(['A','B','A','C','D','D','E'])}
df = DataFrame(d)

def as_perc(value, total):
    return value/float(total)

def get_count(values):
    return len(values)

grouped_count = df.groupby("my_label").my_label.agg(get_count)
data = grouped_count.apply(as_perc, total=df.my_label.count())

The .agg() method here takes a function that is applied to all values of the groupby object.

3/13/2013 8:20:58 AM


g = pd.DataFrame(['A','B','A','C','D','D','E'])

# Group by the contents of column 0 
gg = g.groupby(0)  

# Create a DataFrame with the counts of each letter
histo = gg.apply(lambda x: x.count())

# Add a new column that is the count / total number of elements    
histo[1] = histo.astype(np.float)/len(g) 

print histo


   0         1
A  2  0.285714
B  1  0.142857
C  1  0.142857
D  2  0.285714
E  1  0.142857

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow