  Use .corr to get the correlation between two columns

I have a pandas dataframe, Top15. I create a column that estimates the number of citable documents per person:

Top15["PopEst"] = Top15["Energy Supply"] / Top15["Energy Supply per Capita"]
Top15["Citable docs per Capita"] = Top15["Citable documents"] / Top15["PopEst"]

I want to know the correlation between the number of citable documents per capita and the energy supply per capita. So I use the .corr() method (Pearson's correlation):

data = Top15[["Citable docs per Capita","Energy Supply per Capita"]]
correlation = data.corr(method="pearson")

I want to return a single number, but the result is a correlation matrix.

Without actual data it is hard to answer the question, but I guess you are looking for something like this:

Top15["Citable docs per Capita"].corr(Top15["Energy Supply per Capita"])

That calculates the correlation between your two columns "Citable docs per Capita" and "Energy Supply per Capita".
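As a minimal sketch of that call on the question's columns, here is a version with made-up numbers (the values below are purely illustrative and are not the real Top15 data; only the column names come from the question):

```python
import pandas as pd

# Hypothetical values for illustration only; not the real Top15 data.
Top15 = pd.DataFrame({
    "Energy Supply": [1.0e11, 2.0e10, 5.0e10, 8.0e9],
    "Energy Supply per Capita": [90.0, 150.0, 40.0, 200.0],
    "Citable documents": [120000, 30000, 90000, 8000],
})

# Same derived columns as in the question.
Top15["PopEst"] = Top15["Energy Supply"] / Top15["Energy Supply per Capita"]
Top15["Citable docs per Capita"] = Top15["Citable documents"] / Top15["PopEst"]

# Series.corr(other) returns one number (Pearson by default), not a matrix.
r = Top15["Citable docs per Capita"].corr(Top15["Energy Supply per Capita"])
print(r)
```

Calling .corr on one column and passing the other column as the argument is what yields a scalar rather than a dataframe.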

To give an example:

import pandas as pd

df = pd.DataFrame({"A": range(4), "B": [2*i for i in range(4)]})

   A  B
0  0  0
1  1  2
2  2  4
3  3  6

Then

df["A"].corr(df["B"])

gives 1 as expected.

Now, if you change a value, e.g.

df.loc[2, "B"] = 4.5

   A    B
0  0  0.0
1  1  2.0
2  2  4.5
3  3  6.0

the command

df["A"].corr(df["B"])

returns

0.99586

which is still close to 1, as expected.
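If you want to convince yourself of that number, a rough cross-check is to compute Pearson's r from its definition with numpy. The variable names here (da, db, r_manual) are just for this sketch:

```python
import numpy as np
import pandas as pd

# Same data as the modified example (B built as floats directly).
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0.0, 2.0, 4.5, 6.0]})

a = df["A"].to_numpy(dtype=float)
b = df["B"].to_numpy(dtype=float)

# Center both columns, then divide the sum of products
# by the square root of the product of the sums of squares.
da = a - a.mean()
db = b - b.mean()
r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

print(round(r_manual, 6))               # 0.995862
print(round(df["A"].corr(df["B"]), 6))  # same value from pandas
```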

If you apply .corr directly to your dataframe, it will return all pairwise correlations between your columns; that's why you then observe 1s at the diagonal of your matrix (each column is perfectly correlated with itself).

df.corr()

will therefore return

          A         B
A  1.000000  0.995862
B  0.995862  1.000000

In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).
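If you would rather keep the matrix from .corr(), you can still pull the single number out of it by row and column label. A small sketch on the example data (corr_matrix and r_ab are names chosen here):

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0.0, 2.0, 4.5, 6.0]})

# Full pairwise correlation matrix, as DataFrame.corr returns it.
corr_matrix = df.corr(method="pearson")

# The off-diagonal entry is the correlation between A and B as a scalar.
r_ab = corr_matrix.loc["A", "B"]
print(round(r_ab, 6))  # 0.995862
```

With the question's data the same lookup would use the two column names, "Citable docs per Capita" and "Energy Supply per Capita", as the labels.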

