cartesian product in pandas


Question

I have two pandas dataframes:

from pandas import DataFrame
df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'col3':[5,6]})     

What is the best practice to get their cartesian product (of course without writing it explicitly like me)?

#df1, df2 cartesian product
df_cartesian = DataFrame({'col1':[1,2,1,2],'col2':[3,4,3,4],'col3':[5,5,6,6]})
1
76
11/7/2012 12:54:09 PM

Accepted Answer

If you have a key that is repeated for each row, then you can produce a cartesian product using merge (like you would in SQL).

from pandas import DataFrame, merge
df1 = DataFrame({'key':[1,1], 'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'key':[1,1], 'col3':[5,6]})

merge(df1, df2,on='key')[['col1', 'col2', 'col3']]

Output:

   col1  col2  col3
0     1     3     5
1     1     3     6
2     2     4     5
3     2     4     6

See here for the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra

66
10/24/2018 6:04:15 PM

Use pd.MultiIndex.from_product as an index in an otherwise empty dataframe, then reset its index, and you're done.

a = [1, 2, 3]
b = ["a", "b", "c"]

index = pd.MultiIndex.from_product([a, b], names = ["a", "b"])

pd.DataFrame(index = index).reset_index()

out:

   a  b
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  2  c
6  3  a
7  3  b
8  3  c

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon