Plot key count per unique value count in pandas


I have a set of data from which I want to plot the number of keys per unique id count (x=unique_id_count, y=key_count), and I'm trying to learn how to take advantage of pandas.

In this case:

unique id count 1 = key count 2 (keys "a" and "c" each have one unique id)

unique id count 2 = key count 1 (key "b" has two unique ids)

from collections import defaultdict
from pandas import DataFrame
key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")

df = DataFrame({'keys': key_items, 'ids': id_data})

I've managed to mangle the data into what I want by pulling it out of the dataframe, restructuring it, and building a new dataframe. Done this way, it would probably be simpler to do it all in plain Python without pandas...

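unique_values = defaultdict(list)
# collect every id seen for each key (columns selected explicitly so the order is fixed)
for key, v in df[["keys", "ids"]].itertuples(index=False):
    unique_values[key].append(v)

# count the distinct ids per key
unique_values_count = {}
for k, values in unique_values.items():
    unique_values_count[k] = [len(set(values))]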

# reformat for plotting
key_col = ("a", "b", "c")
id_col = [unique_values_count[k][0] for k in key_col]

df2 = DataFrame({"keys":key_col, "unique_id_count": id_col})

Is there a better way to do this more directly using the initial dataframe?

2/28/2013 3:00:33 AM

Accepted Answer

s = df.groupby("keys").ids.agg(lambda x:len(x.unique()))
2/28/2013 3:25:20 AM
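
As a quick check of the groupby above, here is a minimal sketch run against the question's data. It also uses `nunique()`, which as far as I know gives the same result as the lambda; the variable names `by_lambda` and `by_nunique` are just for illustration.

```python
import pandas as pd

key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# The aggregation from the answer: number of distinct ids per key
by_lambda = df.groupby("keys")["ids"].agg(lambda x: len(x.unique()))

# Built-in shortcut that should produce the same counts
by_nunique = df.groupby("keys")["ids"].nunique()

print(by_nunique.to_dict())  # {'a': 1, 'b': 2, 'c': 1}
```

Note that this gives the unique id count per key, which is only the first half of what the question asks for; the keys still have to be counted per unique id count.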

How about just using value_counts() directly?
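
One way to read that suggestion, as a sketch rather than the poster's exact code: take the per-key unique id counts and call `value_counts()` on them, which counts how many keys share each unique id count. The names `counts` and `df2` and the column labels are my own assumptions, chosen to match the question.

```python
import pandas as pd

key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# Step 1: number of distinct ids for each key
s = df.groupby("keys")["ids"].nunique()

# Step 2: how many keys have each unique id count,
# sorted by the unique id count so the x axis is in order
counts = s.value_counts().sort_index()

# Optional: a tidy frame with the column names used in the question
df2 = counts.rename_axis("unique_id_count").reset_index(name="key_count")

print(counts.to_dict())  # {1: 2, 2: 1}
```

With matplotlib installed, `counts.plot(kind="bar")` should then draw unique id count on the x axis and key count on the y axis.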



Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow