Random row selection in Pandas dataframe


Question

Is there a way to select random rows from a DataFrame in Pandas.

In R, using the car package, there is a useful function some(x, n) which is similar to head but selects, in this example, 10 rows at random from x.

I have also looked at the slicing documentation and there seems to be nothing equivalent.

Update

Now using version 20. There is a sample method.

df.sample(n)

1
130
6/29/2017 6:08:39 PM

Accepted Answer

Something like this?

import random

def some(x, n):
    return x.ix[random.sample(x.index, n)]

Note: As of Pandas v0.20.0, ix has been deprecated in favour of loc for label based indexing.

50
4/28/2019 12:08:03 PM

With pandas version 0.16.1 and up, there is now a DataFrame.sample method built-in:

import pandas

df = pandas.DataFrame(pandas.np.random.random(100))

# Randomly sample 70% of your dataframe
df_percent = df.sample(frac=0.7)

# Randomly sample 7 elements from your dataframe
df_elements = df.sample(n=7)

For either approach above, you can get the rest of the rows by doing:

df_rest = df.loc[~df.index.isin(df_percent.index)]

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon