Color by Column Values in Matplotlib


Question

One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and apply color associated with a specific column and I would love to be able to do this with python/pandas/matplotlib. I'm wondering if there are there any convenience functions that people use to map colors to values using pandas dataframes and Matplotlib?

##ggplot scatterplot example with R dataframe, `df`, colored by col3
ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()

##ideal situation with pandas dataframe, 'df', where colors are chosen by col3
df.plot(x=col1,y=col2,color=col3)

EDIT: Thank you for your responses but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I am thinking of will assign colors based on this value.

import pandas as pd
df = pd.DataFrame({'Height':np.random.normal(10),
                   'Weight':np.random.normal(10),
                   'Gender': ["Male","Male","Male","Male","Male",
                              "Female","Female","Female","Female","Female"]})
1
28
3/11/2019 10:50:06 AM

Accepted Answer

Update October 2015

Seaborn handles this use-case splendidly:

import numpy 
import pandas
from  matplotlib import pyplot
import seaborn
seaborn.set(style='ticks')

numpy.random.seed(0)
N = 37
_genders= ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})

fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
fg.map(pyplot.scatter, 'Weight (kg)', 'Height (cm)').add_legend()

Which immediately outputs:

enter image description here

Old Answer

In this case, I would use matplotlib directly.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
    fig, ax = plt.subplots()
    categories = np.unique(df[catcol])
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))  

    df["Color"] = df[catcol].apply(lambda x: colordict[x])
    ax.scatter(df[xcol], df[ycol], c=df.Color)
    return fig

if 1:
    df = pd.DataFrame({'Height':np.random.normal(size=10),
                       'Weight':np.random.normal(size=10),
                       'Gender': ["Male","Male","Unknown","Male","Male",
                                  "Female","Did not respond","Unknown","Female","Female"]})    
    fig = dfScatter(df)
    fig.savefig('fig1.png')

And that gives me:

scalle plot with categorized colors As far as I know, that color column can be any matplotlib compatible color (RBGA tuples, HTML names, hex values, etc).

I'm having trouble getting anything but numerical values to work with the colormaps.

45
10/28/2015 4:46:11 PM

Actually you could use ggplot for python:

from ggplot import *
import numpy as np
import pandas as pd

df = pd.DataFrame({'Height':np.random.randn(10),
                   'Weight':np.random.randn(10),
                   'Gender': ["Male","Male","Male","Male","Male",
                              "Female","Female","Female","Female","Female"]})


ggplot(aes(x='Height', y='Weight', color='Gender'), data=df)  + geom_point()

ggplot in python


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon