Pandas Merge – How to avoid duplicating columns

StackOverflow

I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). In the columns, some columns match between the two (currency, adj date) for example.

What is the best way to merge these by index, but to not take two copies of currency and adj date.

Each data frame is 90 columns, so I am trying to avoid writing everything out by hand.

df:                 currency  adj_date   data_col1 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45
...

df2:                currency  adj_date   data_col2 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45
...

If I do:

dfNew = merge(df, df2, left_index=True, right_index=True, how="outer")

I get

dfNew:              currency_x  adj_date_x   data_col2 ... currency_y adj_date_y
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45             USD         2012-01-03

Thank you! ...

Answer rating: 167

You can work out the columns that are only in one DataFrame and use this to select a subset of columns in the merge.

cols_to_use = df2.columns.difference(df.columns)

Then perform the merge (note this is an index object but it has a handy tolist() method).

dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how="outer")

This will avoid any columns clashing in the merge.

Answer rating: 121

I use the suffixes option in .merge():

dfNew = df.merge(df2, left_index=True, right_index=True,
                 how="outer", suffixes=("", "_y"))
dfNew.drop(dfNew.filter(regex="_y$").columns.tolist(),axis=1, inplace=True)

Thanks @ijoseph





Tutorials