# Use .corr to get the correlation between two columns

StackOverflow

I have the following pandas dataframe `Top15`:

I create a column that estimates the number of citable documents per person:

```python
Top15["PopEst"] = Top15["Energy Supply"] / Top15["Energy Supply per Capita"]
Top15["Citable docs per Capita"] = Top15["Citable documents"] / Top15["PopEst"]
```

I want to know the correlation between the number of citable documents per capita and the energy supply per capita. So I use the `.corr()` method (Pearson's correlation):

```python
data = Top15[["Citable docs per Capita", "Energy Supply per Capita"]]
correlation = data.corr(method="pearson")
```

I want to return a single number, but the result is a correlation matrix instead.

Without the actual data it is hard to answer the question, but I guess you are looking for something like this:

```python
Top15["Citable docs per Capita"].corr(Top15["Energy Supply per Capita"])
```

That calculates the correlation between your two columns `"Citable docs per Capita"` and `"Energy Supply per Capita"`.
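Since the real `Top15` data is not available here, the following is only a minimal sketch of the whole computation on a made-up frame. The column names come from the question; the index labels and every number are hypothetical and chosen only so that the code runs.

```python
import pandas as pd

# Hypothetical stand-in for Top15: the values are invented for illustration.
Top15 = pd.DataFrame(
    {
        "Energy Supply": [1.2e11, 9.0e10, 3.0e10, 1.5e10],
        "Energy Supply per Capita": [90.0, 280.0, 150.0, 60.0],
        "Citable documents": [126767, 94747, 30348, 8668],
    },
    index=["Country A", "Country B", "Country C", "Country D"],
)

# Same derived columns as in the question.
Top15["PopEst"] = Top15["Energy Supply"] / Top15["Energy Supply per Capita"]
Top15["Citable docs per Capita"] = Top15["Citable documents"] / Top15["PopEst"]

# Series.corr compares exactly two columns and returns one number.
r = Top15["Citable docs per Capita"].corr(Top15["Energy Supply per Capita"])
print(type(r), r)
```

Calling `.corr` on one column and passing the other as the argument is what turns the result into a single number rather than a table.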

To give an example:

```python
import pandas as pd

# B is exactly twice A, so the two columns are perfectly correlated.
df = pd.DataFrame({"A": range(4), "B": [2 * i for i in range(4)]})
print(df)
#    A  B
# 0  0  0
# 1  1  2
# 2  2  4
# 3  3  6
```

Then

```python
df["A"].corr(df["B"])
```

gives `1`, as expected, because `B` is exactly twice `A`.

Now, if you change a value, e.g.

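```python
# Cast B to float first: recent pandas versions warn about implicitly
# upcasting an integer column when a non-integer value is assigned.
df["B"] = df["B"].astype(float)
df.loc[2, "B"] = 4.5
print(df)
#    A    B
# 0  0  0.0
# 1  1  2.0
# 2  2  4.5
# 3  3  6.0
```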

the command

```python
df["A"].corr(df["B"])
```

returns

```
0.99586
```

which is still close to 1, as expected.
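To see where that number comes from, here is a small sketch that recomputes Pearson's coefficient by hand (sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations) and compares it with what pandas returns. The variable names are mine, and the frame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same values as the modified example frame above.
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0.0, 2.0, 4.5, 6.0]})

a = df["A"].to_numpy(dtype=float)
b = df["B"].to_numpy(dtype=float)

# Deviations of each column from its own mean.
a_dev = a - a.mean()
b_dev = b - b.mean()

# Pearson's r written out explicitly.
r_manual = (a_dev * b_dev).sum() / np.sqrt((a_dev**2).sum() * (b_dev**2).sum())

# The pandas result for the same two columns.
r_pandas = df["A"].corr(df["B"])

print(r_manual, r_pandas)
```

Both values should agree up to floating-point rounding.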

If you apply `.corr` directly to your dataframe, it will return all pairwise correlations between your columns; that's why you then observe `1`s on the diagonal of your matrix (each column is perfectly correlated with itself).

```python
df.corr()
```

will therefore return

```
          A         B
A  1.000000  0.995862
B  0.995862  1.000000
```

In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).
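If you already have the full matrix and only want one entry, you can also pick it out by label. A minimal sketch, again on the small example frame rather than on your data:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0.0, 2.0, 4.5, 6.0]})

# DataFrame.corr returns a square DataFrame labelled by column name
# on both axes.
corr_matrix = df.corr()

# Select the single off-diagonal entry by row and column label.
r_from_matrix = corr_matrix.loc["A", "B"]

# The same number obtained directly from the two Series.
r_direct = df["A"].corr(df["B"])

print(r_from_matrix, r_direct)
```

Both routes give the same value; the Series version simply skips building the matrix.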

There can be cases where you get `NaN`s in your result; check this post for an example.
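Two situations that I know produce `NaN` (not necessarily the one from the linked post) are a column with zero variance and two columns that share no rows where both values are present. A small sketch with made-up data:

```python
import math

import numpy as np
import pandas as pd

# Case 1: column C is constant, so its standard deviation is zero and
# the correlation with any other column is undefined.
df_const = pd.DataFrame({"A": [0, 1, 2, 3], "C": [5, 5, 5, 5]})
corr_const = df_const.corr()
print(corr_const)

# Case 2: the two Series never have a valid value in the same row, so
# no pairs remain after missing values are dropped.
s1 = pd.Series([1.0, 2.0, np.nan, np.nan])
s2 = pd.Series([np.nan, np.nan, 3.0, 4.0])
r_no_overlap = s1.corr(s2)
print(r_no_overlap)

print(math.isnan(corr_const.loc["A", "C"]), math.isnan(r_no_overlap))
```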

If you want to filter entries above or below a certain threshold, you can check this question. If you want to plot a heatmap of the correlation coefficients, you can check this answer, and if you then run into the issue of overlapping axis labels, check the following post.
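For the threshold case, one possible approach (a sketch of my own, not taken from the linked question) is to keep only the upper triangle of the matrix, so that each pair appears once and the diagonal is excluded, and then filter on the absolute value. The extra column `C` and the threshold of `0.9` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0, 1, 2, 3],
        "B": [0.0, 2.0, 4.5, 6.0],
        "C": [3, 1, 2, 0],  # hypothetical extra column
    }
)

corr = df.corr()

# Boolean mask for the strict upper triangle (k=1 skips the diagonal).
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Turn the masked matrix into a Series indexed by (row, column) pairs
# and drop the masked-out entries.
pairs = corr.where(upper).stack().dropna()

# Keep only the pairs whose absolute correlation exceeds the threshold.
threshold = 0.9
strong = pairs[pairs.abs() > threshold]
print(strong)
```

With this data only the pair `("A", "B")` passes the threshold; lowering `threshold` lets the negatively correlated pairs through as well.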