shuffling/permutating a DataFrame in pandas


Question

What's a simple and efficient way to shuffle a dataframe in pandas, by rows or by columns? I.e. how to write a function shuffle(df, n, axis=0) that takes a dataframe, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns) and returns a copy of the dataframe that has been shuffled n times.

Edit: key is to do this without destroying the row/column labels of the dataframe. If you just shuffle df.index that loses all that information. I want the resulting df to be the same as the original except with the order of rows or order of columns different.

Edit2: My question was unclear. When I say shuffle the rows, I mean shuffle each row independently. So if you have two columns a and b, I want each row shuffled on its own, so that you don't have the same associations between a and b as you do if you just re-order each row as a whole. Something like:

for 1...n:
  for each col in df: shuffle column
return new_df

But hopefully more efficient than naive looping. This does not work for me:

def shuffle(df, n, axis=0):
        shuffled_df = df.copy()
        for k in range(n):
            shuffled_df.apply(np.random.shuffle(shuffled_df.values),axis=axis)
        return shuffled_df

df = pandas.DataFrame({'A':range(10), 'B':range(10)})
shuffle(df, 5)
1
66
2/10/2015 4:30:50 PM

Accepted Answer

In [16]: def shuffle(df, n=1, axis=0):     
    ...:     df = df.copy()
    ...:     for _ in range(n):
    ...:         df.apply(np.random.shuffle, axis=axis)
    ...:     return df
    ...:     

In [17]: df = pd.DataFrame({'A':range(10), 'B':range(10)})

In [18]: shuffle(df)

In [19]: df
Out[19]: 
   A  B
0  8  5
1  1  7
2  7  3
3  6  2
4  3  4
5  0  1
6  9  0
7  4  6
8  2  8
9  5  9
32
4/2/2013 7:41:27 PM

Use numpy's random.permuation function:

In [1]: df = pd.DataFrame({'A':range(10), 'B':range(10)})

In [2]: df
Out[2]:
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  8  8
9  9  9


In [3]: df.reindex(np.random.permutation(df.index))
Out[3]:
   A  B
0  0  0
5  5  5
6  6  6
3  3  3
8  8  8
7  7  7
9  9  9
1  1  1
2  2  2
4  4  4

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon