I'm hoping to use pandas as the main Trace (series of points in parameter space from MCMC) object.

I have a list of dicts of string->array which I would like to store in pandas. The keys in the dicts are always the same, and for each key the shape of the numpy array is always the same, but the shape may be different for different keys and could have a different number of dimensions.

I had been using `self.append(dict_list, ignore_index = True)`, which seems to work well for 1d values, but for values with more than one dimension pandas stores them as objects, which doesn't allow for nice plotting and other nice things. Any suggestions on how to get better behavior?

**Sample data**

```
from numpy import array

point = {'x': array(-0.47652306228698005),
         'y': array([[-0.41809043],
                     [ 0.48407823]])}
points = 10 * [point]
```

I'd like to be able to do something like

```
df = DataFrame(points)
```

or

```
df = DataFrame()
df = df.append(points, ignore_index=True)  # append returns a new frame
```

and have

```
>>> df['x'][1].shape
()
>>> df['y'][1].shape
(2, 1)
```

The relatively new library *xray*[1] has `Dataset` and `DataArray` structures that do exactly what you ask.

Here is my take on your problem, written as an *IPython* session:

```
>>> import numpy as np
>>> import xray
>>> ## Prepare data:
>>> #
>>> point = {'x': np.array(-0.47652306228698005),
...          'y': np.array([[-0.41809043],
...                         [ 0.48407823]])}
>>> points = 10 * [point]
>>> ## Convert to Xray DataArrays:
>>> #
>>> list_x = [p['x'] for p in points]
>>> list_y = [p['y'] for p in points]
>>> da_x = xray.DataArray(list_x, [('x', range(len(list_x)))])
>>> da_y = xray.DataArray(list_y, [
...     ('x', range(len(list_y))),
...     ('y0', range(2)),
...     ('y1', [0]),
... ])
```
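Passing a list of equally shaped arrays to `DataArray` works because the list is turned into a single array with the sample index as the leading axis. A minimal NumPy-only sketch of that stacking step follows; the helper name `stack_points` is hypothetical and not part of xray or pandas:

```python
import numpy as np

point = {'x': np.array(-0.47652306228698005),
         'y': np.array([[-0.41809043],
                        [ 0.48407823]])}
points = 10 * [point]


def stack_points(points):
    """Stack a list of {name: ndarray} dicts into {name: ndarray}.

    Each result gains a new leading axis of length len(points).
    Assumes every dict has the same keys and per-key shapes.
    """
    return {key: np.stack([np.asarray(p[key]) for p in points])
            for key in points[0]}


stacked = stack_points(points)
print(stacked['x'].shape)  # (10,)
print(stacked['y'].shape)  # (10, 2, 1)
```

The leading axis plays the role of the `x` dimension in the session above; the remaining axes keep each key's own shape.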

These are the two `DataArray` instances we built so far:

```
>>> print(da_x)
<xray.DataArray (x: 10)>
array([-0.47652306, -0.47652306, -0.47652306, -0.47652306, -0.47652306,
       -0.47652306, -0.47652306, -0.47652306, -0.47652306, -0.47652306])
Coordinates:
  * x        (x) int32 0 1 2 3 4 5 6 7 8 9
>>> print(da_y.T)  ## Transposed, to save lines.
<xray.DataArray (y1: 1, y0: 2, x: 10)>
array([[[-0.41809043, -0.41809043, -0.41809043, -0.41809043, -0.41809043,
         -0.41809043, -0.41809043, -0.41809043, -0.41809043, -0.41809043],
        [ 0.48407823,  0.48407823,  0.48407823,  0.48407823,  0.48407823,
          0.48407823,  0.48407823,  0.48407823,  0.48407823,  0.48407823]]])
Coordinates:
  * x        (x) int32 0 1 2 3 4 5 6 7 8 9
  * y0       (y0) int32 0 1
  * y1       (y1) int32 0
```

We can now merge these two `DataArray` instances on their common `x` dimension into a `Dataset`:

```
>>> ds = xray.Dataset({'X':da_x, 'Y':da_y})
>>> print(ds)
<xray.Dataset>
Dimensions:  (x: 10, y0: 2, y1: 1)
Coordinates:
  * x        (x) int32 0 1 2 3 4 5 6 7 8 9
  * y0       (y0) int32 0 1
  * y1       (y1) int32 0
Data variables:
    X        (x) float64 -0.4765 -0.4765 -0.4765 -0.4765 -0.4765 -0.4765 -0.4765 ...
    Y        (x, y0, y1) float64 -0.4181 0.4841 -0.4181 0.4841 -0.4181 0.4841 -0.4181 ...
```
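The same construction can be generalized to any set of keys. The sketch below assumes the package is imported under its later name, `xarray` (the project was renamed from xray), and that every dict shares the same keys and per-key shapes. The helper `points_to_dataset` and the dimension names `sample` and `<key>_dim<i>` are hypothetical choices for illustration, not library conventions:

```python
import numpy as np
import xarray as xr

point = {'x': np.array(-0.47652306228698005),
         'y': np.array([[-0.41809043],
                        [ 0.48407823]])}
points = 10 * [point]


def points_to_dataset(points, sample_dim='sample'):
    """Build one Dataset from a list of {name: ndarray} dicts.

    Every variable shares the leading `sample_dim` dimension; the
    remaining dimensions get per-variable placeholder names.
    """
    data_vars = {}
    for name in points[0]:
        stacked = np.stack([np.asarray(p[name]) for p in points])
        dims = [sample_dim] + ['%s_dim%d' % (name, i)
                               for i in range(stacked.ndim - 1)]
        data_vars[name] = (dims, stacked)
    return xr.Dataset(data_vars)


ds = points_to_dataset(points)
print(ds['x'].dims)  # ('sample',)
print(ds['y'].dims)  # ('sample', 'y_dim0', 'y_dim1')
```

Giving each variable its own trailing dimension names avoids accidental alignment between unrelated axes that happen to have the same length.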

And we can finally access and aggregate data the way you wanted:

```
>>> ds['X'].sum()
<xray.DataArray 'X' ()>
array(-4.765230622869801)
>>> ds['Y'].sum()
<xray.DataArray 'Y' ()>
array(0.659878)
>>> ds['Y'].sum(axis=1)
<xray.DataArray 'Y' (x: 10, y1: 1)>
array([[ 0.0659878],
       [ 0.0659878],
       [ 0.0659878],
       [ 0.0659878],
       [ 0.0659878],
       [ 0.0659878],
       [ 0.0659878],
       [ 0.0659878],
       [ 0.0659878],
       [ 0.0659878]])
Coordinates:
  * x        (x) int32 0 1 2 3 4 5 6 7 8 9
  * y1       (y1) int32 0
>>> np.all(ds['Y'].sum(axis=1) == ds['Y'].sum(dim='y0'))
True
>>> ds['X'].sum(dim='y0')
Traceback (most recent call last):
ValueError: 'y0' not found in array dimensions ('x',)
```
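The question also asked for per-point access, where the value at index 1 keeps its original shape. A short sketch of that, again assuming the renamed `xarray` package and the same dimension names as in the session above, uses positional indexing with `isel`:

```python
import numpy as np
import xarray as xr

point = {'x': np.array(-0.47652306228698005),
         'y': np.array([[-0.41809043],
                        [ 0.48407823]])}
points = 10 * [point]

# Build the Dataset directly from (dims, data) pairs.
ds = xr.Dataset({
    'X': (('x',), np.stack([p['x'] for p in points])),
    'Y': (('x', 'y0', 'y1'), np.stack([p['y'] for p in points])),
})

# Select the second point along the shared 'x' dimension.
x1 = ds['X'].isel(x=1)
y1 = ds['Y'].isel(x=1)
print(x1.shape)  # ()
print(y1.shape)  # (2, 1)
```

Selecting along `x` drops that dimension, so each result has the shape of the array stored for a single point.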

[1] A library for handling N-dimensional data with labels, like pandas does for 2D: http://xray.readthedocs.org/en/stable/data-structures.html#dataset

Licensed under: CC-BY-SA with attribution
