I have a set of X,Y data points (about 10k) that are easy to plot as a scatter plot but that I would like to represent as a heatmap.
I looked through the examples in Matplotlib, and they all seem to start from precomputed heatmap cell values to generate the image.
Is there a method that converts a bunch of x,y, all different, to a heatmap (where zones with higher frequency of x,y would be "warmer")?
If you don't want hexagons, you can use numpy's histogram2d function:

    import numpy as np
    import matplotlib.pyplot as plt

    # Generate some test data
    x = np.random.randn(8873)
    y = np.random.randn(8873)

    heatmap, xedges, yedges = np.histogram2d(x, y, bins=50)
    # imshow expects extent as [left, right, bottom, top]
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

    plt.clf()
    # histogram2d puts x along the first axis, so transpose for imshow
    plt.imshow(heatmap.T, extent=extent, origin='lower')
    plt.show()
This makes a 50x50 heatmap. If you want, say, 512x384, you can put bins=(512, 384) in the call to histogram2d.
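To see what a non-square `bins` setting does to the returned array, here is a small sketch using only NumPy. The seeded random data is just a stand-in for real points. It shows that the first axis of the result corresponds to x, which is why the heatmap needs transposing before it is handed to `imshow`.

```python
import numpy as np

# Stand-in data; the seed is only there to make the sketch repeatable
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = rng.standard_normal(10_000)

# bins=(nx, ny): 512 bins along x, 384 bins along y
heatmap, xedges, yedges = np.histogram2d(x, y, bins=(512, 384))

print(heatmap.shape)             # (512, 384): first axis is x, second is y
print(heatmap.T.shape)           # (384, 512): rows are y, as imshow expects
print(len(xedges), len(yedges))  # 513 385: one more edge than bins per axis
print(int(heatmap.sum()))        # 10000: every point lands in exactly one bin
```

With this many bins and only about 10k points, most cells will be empty or hold a single point, so a coarser grid may give a more readable image.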
In Matplotlib lexicon, I think you want a hexbin plot.
If you're not familiar with this type of plot, it's just a bivariate histogram in which the xy-plane is tessellated by a regular grid of hexagons.
So, just as with a histogram, you discretize the plotting region into a set of windows (here, hexagons), assign each point to one of these windows, and count the number of points falling in each; finally, map the windows onto a color array, and you've got a hexbin diagram.
Though hexagons are less commonly used than, e.g., circles or squares, it is intuitive that they are a better choice for the geometry of the binning container:
hexagons have nearest-neighbor symmetry (square bins don't: a square's four corner neighbors are farther from its center than its four edge neighbors, whereas all six neighbors of a hexagon are equidistant), and
the hexagon is the highest-n regular polygon that gives a regular plane tessellation (i.e., you can safely re-model your kitchen floor with hexagonal tiles because you won't have any void space between the tiles when you are finished; this is not true of any regular polygon with n >= 7).
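The tessellation claim can be checked with a little arithmetic. A regular n-gon has an interior angle of 180(n-2)/n degrees, so copies of it can fill the space around a vertex only if 360 divided by that angle, which simplifies to 2n/(n-2), is a whole number. The sketch below assumes an edge-to-edge tiling by a single regular polygon; the function name is made up for illustration.

```python
from fractions import Fraction

def tiles_per_vertex(n):
    """360 / interior angle of a regular n-gon, i.e. 2n / (n - 2), kept exact."""
    return Fraction(2 * n, n - 2)

# A regular n-gon tiles the plane edge-to-edge only if a whole number
# of copies fits around each vertex
tiling = [n for n in range(3, 13) if tiles_per_vertex(n).denominator == 1]
print(tiling)  # [3, 4, 6]
```

For n >= 7 the ratio 2n/(n-2) lies strictly between 2 and 3, so it can never be a whole number; triangles, squares, and hexagons are the only options, and the hexagon is the one with the most sides.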
(Matplotlib uses the term hexbin plot; so do (AFAIK) all of the plotting libraries for R. I don't know if this is the generally accepted term for plots of this type, though I suspect it is, given that hexbin is short for hexagonal binning, which describes the essential step in preparing the data for display.)
    from matplotlib import pyplot as PLT
    from matplotlib import cm as CM
    import numpy as NP

    def bivariate_normal(X, Y, sigmax, sigmay, mux, muy):
        # Uncorrelated bivariate Gaussian density; stands in for
        # matplotlib.mlab.bivariate_normal, which recent Matplotlib
        # releases no longer provide
        z = ((X - mux) / sigmax) ** 2 + ((Y - muy) / sigmay) ** 2
        return NP.exp(-z / 2) / (2 * NP.pi * sigmax * sigmay)

    x = y = NP.linspace(-5, 5, 100)
    X, Y = NP.meshgrid(x, y)
    Z1 = bivariate_normal(X, Y, 2, 2, 0, 0)
    Z2 = bivariate_normal(X, Y, 4, 1, 1, 1)
    ZD = Z2 - Z1
    x = X.ravel()
    y = Y.ravel()
    z = ZD.ravel()
    gridsize = 30
    PLT.subplot(111)

    # with 'bins=None', the color of each hexagon corresponds directly to its
    # value: the count, or (when 'C' is given) the mean of the 'C' values in it
    # 'C' is optional--it maps values to x-y coordinates; if 'C' is None (default)
    # then the result is a pure 2D histogram
    PLT.hexbin(x, y, C=z, gridsize=gridsize, cmap=CM.jet, bins=None)
    PLT.axis([x.min(), x.max(), y.min(), y.max()])
    cb = PLT.colorbar()
    cb.set_label('mean value')
    PLT.show()
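The example above passes `C`, so each hexagon shows a mean value. For the situation in the question, where only scattered (x, y) points exist and "warmer" should mean "more points", leaving `C` unset is probably what is wanted. A minimal sketch follows, with seeded random data standing in for the real points and an arbitrary output filename:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the ~10k (x, y) points in the question
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = rng.standard_normal(10_000)

fig, ax = plt.subplots()
# No C argument: each hexagon is colored by the number of points inside it
hb = ax.hexbin(x, y, gridsize=30, cmap="viridis")
fig.colorbar(hb, ax=ax, label="points per hexagon")

counts = hb.get_array()           # one value per hexagon drawn
fig.savefig("hexbin_counts.png")  # or plt.show() in an interactive session
```

A larger `gridsize` gives smaller hexagons and finer detail, at the cost of noisier counts per cell.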