Color by Column Values in Matplotlib


One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and apply color associated with a specific column, and I would love to be able to do this with python/pandas/matplotlib. Are there any convenience functions that people use to map colors to values using pandas DataFrames and Matplotlib?

##ggplot scatterplot example with R dataframe, `df`, colored by col3
ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()

##ideal situation with pandas dataframe, 'df', where colors are chosen by col3

EDIT: Thank you for your responses but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I am thinking of will assign colors based on this value.

import numpy as np
import pandas as pd
df = pd.DataFrame({'Height':np.random.normal(size=10),
                   'Gender': ["Male","Male","Male","Male","Male",
3/11/2019 10:50:06 AM

Accepted Answer

Update October 2015

Seaborn handles this use-case splendidly:

import numpy
import pandas
from matplotlib import pyplot
import seaborn

N = 37
_genders = ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})

# One color per gender; map() draws weight (x) against height (y).
fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
fg.map(pyplot.scatter, 'Weight (kg)', 'Height (cm)').add_legend()

Which immediately outputs a scatter plot of height against weight, with one color per gender and a legend.

Old Answer

In this case, I would use matplotlib directly.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
    fig, ax = plt.subplots()
    categories = np.unique(df[catcol])
    # Spread the categories evenly over [0, 1]; scatter maps these
    # numbers to colors through the default colormap.
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))

    # Note: this adds a "Color" column to the caller's DataFrame.
    df["Color"] = df[catcol].map(colordict)
    ax.scatter(df[xcol], df[ycol], c=df["Color"])
    return fig

if __name__ == "__main__":
    df = pd.DataFrame({'Height': np.random.normal(size=10),
                       'Weight': np.random.normal(size=10),
                       'Gender': ["Male", "Male", "Unknown", "Male", "Male",
                                  "Female", "Did not respond", "Unknown", "Female", "Female"]})
    fig = dfScatter(df)

And that gives me a scatter plot with categorized colors.

As far as I know, that color column can be any matplotlib-compatible color (RGBA tuples, HTML names, hex values, etc.).
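
To illustrate the point about explicit colors, here is a minimal sketch that assigns each category a matplotlib color spec and draws one `scatter` call per group, so each category gets its own legend entry. The helper name `scatter_by_category`, the `palette` dictionary, and the random sample data are assumptions made up for this example, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_by_category(df, xcol, ycol, catcol, palette):
    """Scatter ycol vs. xcol with one color per category (hypothetical helper)."""
    fig, ax = plt.subplots()
    # One scatter call per category so every group gets a legend entry.
    for category, group in df.groupby(catcol):
        ax.scatter(group[xcol], group[ycol],
                   color=palette[category], label=category)
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    ax.legend(title=catcol)
    return fig, ax


# Illustrative data only; the numbers are random placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Height": rng.normal(size=10),
    "Weight": rng.normal(size=10),
    "Gender": ["Male", "Male", "Unknown", "Male", "Male",
               "Female", "Did not respond", "Unknown", "Female", "Female"],
})

# Mixed color specs on purpose: named colors, a hex string, an RGBA tuple.
palette = {
    "Male": "tab:blue",
    "Female": "tab:orange",
    "Unknown": "#999999",
    "Did not respond": (0.2, 0.6, 0.2, 1.0),
}

fig, ax = scatter_by_category(df, "Height", "Weight", "Gender", palette)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Unlike `dfScatter`, this version does not add a column to the DataFrame.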

I'm having trouble getting anything but numerical values to work with the colormaps.
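
One possible way around the numeric-only limitation is to convert the labels to integer codes first and pass those codes to the colormap. The sketch below uses `pandas.factorize` for the conversion and `legend_elements()` (available on scatter collections in reasonably recent matplotlib releases) to build the legend. The helper name `scatter_with_colormap`, the choice of `viridis`, and the sample data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_with_colormap(df, xcol, ycol, catcol, cmap_name="viridis"):
    """Color points by a categorical column via a colormap (hypothetical helper)."""
    # factorize returns integer codes 0..k-1 (in order of first appearance)
    # together with the unique labels those codes stand for.
    codes, labels = pd.factorize(df[catcol])
    fig, ax = plt.subplots()
    points = ax.scatter(df[xcol], df[ycol], c=codes, cmap=cmap_name,
                        vmin=0, vmax=max(len(labels) - 1, 1))
    # legend_elements yields one proxy handle per distinct code, in sorted
    # order, which lines up with the order of `labels`.
    handles, _ = points.legend_elements()
    ax.legend(handles, list(labels), title=catcol)
    return fig, ax, codes, labels


# Illustrative data only; the numbers are random placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Height": rng.normal(size=10),
    "Weight": rng.normal(size=10),
    "Gender": ["Male", "Male", "Unknown", "Male", "Male",
               "Female", "Did not respond", "Unknown", "Female", "Female"],
})

fig, ax, codes, labels = scatter_with_colormap(df, "Height", "Weight", "Gender")
```

This sketch assumes the categorical column has no missing values; `factorize` encodes those as `-1`, which would need separate handling.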

10/28/2015 4:46:11 PM

Actually, you could use ggplot for Python:

from ggplot import *
import numpy as np
import pandas as pd

df = pd.DataFrame({'Height':np.random.randn(10),
                   'Gender': ["Male","Male","Male","Male","Male",

ggplot(aes(x='Height', y='Weight', color='Gender'), data=df)  + geom_point()


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow