Is there a way to select random rows from a DataFrame in pandas?
In R, the car package has a useful function, some(x, n), which is similar to head but selects n rows (for example, 10) at random from x.
I have also looked at the slicing documentation and there seems to be nothing equivalent.
Update: I am now using pandas 0.20, which has a sample method.
Something like this?
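import random

def some(x, n):
    # Pick n distinct index labels at random (random.sample needs a
    # real sequence, hence list()), then select those rows by label
    return x.loc[random.sample(list(x.index), n)]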
Note: As of pandas v0.20.0, ix has been deprecated in favour of loc for label-based indexing, and it was removed entirely in pandas 1.0.
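Selecting by label can return more than n rows if the index contains duplicate labels, because each chosen label pulls in every row that carries it. A minimal positional sketch, assuming NumPy's Generator API (NumPy 1.17 or later) is available; some_positional is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd


def some_positional(x, n, seed=None):
    """Return n distinct rows of x chosen uniformly at random, by position.

    Hypothetical helper: it picks integer positions rather than labels,
    so duplicate index labels cannot inflate the result.
    """
    rng = np.random.default_rng(seed)
    # Draw n distinct positions in [0, len(x)) without replacement
    positions = rng.choice(len(x), size=n, replace=False)
    return x.iloc[positions]


# Usage: an index with duplicate labels
df = pd.DataFrame({"value": range(6)}, index=["a", "a", "b", "b", "c", "c"])
print(some_positional(df, 3, seed=0))
```

Positional selection with iloc always returns exactly n rows, whereas the label-based version could return, say, 4 rows for n=3 on this frame.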
With pandas version 0.16.1 and up, there is now a DataFrame.sample method built-in:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(100))

# Randomly sample 70% of your dataframe
df_percent = df.sample(frac=0.7)

# Randomly sample 7 elements from your dataframe
df_elements = df.sample(n=7)
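DataFrame.sample also accepts random_state (for a reproducible draw) and replace (to sample with replacement), and frac=1 returns every row in random order, which is a common way to shuffle a frame. A short sketch using those documented parameters:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(100))

# The same random_state yields the same rows each time
a = df.sample(n=7, random_state=42)
b = df.sample(n=7, random_state=42)
print(a.index.equals(b.index))  # True

# frac=1 draws every row exactly once, i.e. shuffles the frame
shuffled = df.sample(frac=1, random_state=0)

# replace=True lets the same row be drawn more than once,
# so n may exceed len(df) (e.g. for a bootstrap resample)
bootstrap = df.sample(n=150, replace=True, random_state=0)
```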
For either approach above, you can get the rest of the rows by doing:
df_rest = df.loc[~df.index.isin(df_percent.index)]
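Taking the complement this way relies on the index labels being unique; with duplicate labels, a row that was not sampled could still be excluded because it shares a label with one that was. A minimal sketch of a 70/30 split under that assumption, with df.drop shown as an equivalent way to get the remaining rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(100))
assert df.index.is_unique  # both approaches below assume unique labels

df_percent = df.sample(frac=0.7, random_state=1)

# Rows whose label is not in the sample
df_rest = df.loc[~df.index.isin(df_percent.index)]

# Equivalent: drop the sampled labels
df_rest_alt = df.drop(df_percent.index)

print(len(df_percent), len(df_rest))  # 70 30
```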