I have often encountered and made long-tailed degree distributions/histograms from complex networks like the figures below. They make the heavy end of these tails, well, very heavy and crowded from many observations:

However, many publications I read have much cleaner degree distributions that don't have this clumpiness at the end of the distribution and the observations are more evenly-spaced.

!

How do you make a chart like this using `NetworkX`

and `matplotlib`

?

Use log binning (see also). Here is code to take a `Counter`

object representing a histogram of degree values and log-bin the distribution to produce a sparser and smoother distribution.

```
import numpy as np
def drop_zeros(a_list):
return [i for i in a_list if i>0]
def log_binning(counter_dict,bin_count=35):
max_x = log10(max(counter_dict.keys()))
max_y = log10(max(counter_dict.values()))
max_base = max([max_x,max_y])
min_x = log10(min(drop_zeros(counter_dict.keys())))
bins = np.logspace(min_x,max_base,num=bin_count)
# Based off of: http://stackoverflow.com/questions/6163334/binning-data-in-python-with-scipy-numpy
bin_means_y = (np.histogram(counter_dict.keys(),bins,weights=counter_dict.values())[0] / np.histogram(counter_dict.keys(),bins)[0])
bin_means_x = (np.histogram(counter_dict.keys(),bins,weights=counter_dict.keys())[0] / np.histogram(counter_dict.keys(),bins)[0])
return bin_means_x,bin_means_y
```

Generating a classic scale-free network in `NetworkX`

and then plotting this:

```
import networkx as nx
ba_g = nx.barabasi_albert_graph(10000,2)
ba_c = nx.degree_centrality(ba_g)
# To convert normalized degrees to raw degrees
#ba_c = {k:int(v*(len(ba_g)-1)) for k,v in ba_c.iteritems()}
ba_c2 = dict(Counter(ba_c.values()))
ba_x,ba_y = log_binning(ba_c2,50)
plt.xscale('log')
plt.yscale('log')
plt.scatter(ba_x,ba_y,c='r',marker='s',s=50)
plt.scatter(ba_c2.keys(),ba_c2.values(),c='b',marker='x')
plt.xlim((1e-4,1e-1))
plt.ylim((.9,1e4))
plt.xlabel('Connections (normalized)')
plt.ylabel('Frequency')
plt.show()
```

Produces the following plot showing the overlap between the "raw" distribution in blue and the "binned" distribution in red.

Thoughts on how to improve this approach or feedback if I've missed something obvious are welcome.

Licensed under: CC-BY-SA with attribution

Not affiliated with: Stack Overflow