How to check if a column exists in Pandas

StackOverflow

Is there a way to check if a column exists in a Pandas DataFrame?

Suppose that I have the following DataFrame:

>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({"A": [randint(1, 9) for x in range(10)],
...                    "B": [randint(1, 9)*10 for x in range(10)],
...                    "C": [randint(1, 9)*100 for x in range(10)]})
>>> df
   A   B    C
0  3  40  100
1  6  30  200
2  7  70  800
3  3  50  200
4  7  50  400
5  4  10  400
6  3  70  500
7  8  30  200
8  3  40  800
9  6  60  200

and I want to calculate df["sum"] = df["A"] + df["C"]

But first I want to check if df["A"] exists, and if not, I want to calculate df["sum"] = df["B"] + df["C"] instead.

Answer rating: 831

This will work:

if "A" in df:

But for clarity, I'd probably write it as:

if "A" in df.columns:
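
Applied to the question's example, the check slots into an if/else. A minimal sketch, assuming a small fixed DataFrame (illustrative values, not the question's random data) in which column "A" is missing:

```python
import pandas as pd

# Small fixed frame standing in for the question's random data (illustrative values).
df = pd.DataFrame({"B": [40, 30, 70], "C": [100, 200, 800]})

# "A" is absent here, so the fallback branch runs.
if "A" in df.columns:
    df["sum"] = df["A"] + df["C"]
else:
    df["sum"] = df["B"] + df["C"]

print(df["sum"].tolist())  # [140, 230, 870]
```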

Answer rating: 132

To check if one or more columns all exist, you can use set.issubset, as in:

if set(["A", "C"]).issubset(df.columns):
    df["sum"] = df["A"] + df["C"]

As @brianpck points out in a comment, the set can alternatively be written as a literal with curly braces:

if {"A", "C"}.issubset(df.columns):

See this question for a discussion of the curly-braces syntax.

Or, you can use all() with a generator expression, as in:

if all(item in df.columns for item in ["A", "C"]):
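
When the check fails, it can help to know which columns are missing. A rough sketch using set difference, assuming a small illustrative DataFrame; the names `required` and `missing` are hypothetical and not part of the answers above:

```python
import pandas as pd

# Illustrative frame without column "A".
df = pd.DataFrame({"B": [40, 30], "C": [100, 200]})

required = {"A", "C"}                 # hypothetical: the columns the calculation needs
missing = required - set(df.columns)  # set difference: required columns not in df

if missing:
    print(f"Missing columns: {sorted(missing)}")
else:
    df["sum"] = df["A"] + df["C"]
```

An empty `missing` set is equivalent to `required.issubset(df.columns)` being true, so this is the same test with the failing column names kept around.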



