
Data Science - Statistics Correlation Matrix


Correlation Matrix

A matrix is an array of numbers arranged in rows and columns.

A correlation matrix is simply a table showing the correlation coefficients between variables.
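To make the definition concrete, here is a minimal sketch that builds such a table by hand. The numbers are made up for illustration and are not taken from the health data set; the helper name pearson is also just an assumption for this sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only (not the health data set)
data = {
    "Duration": [30, 45, 60, 90, 120],
    "Average_Pulse": [110, 105, 115, 100, 108],
    "Calorie_Burnage": [200, 290, 410, 580, 800],
}

names = list(data)
# Entry [i][j] is the correlation between variable i and variable j
matrix = [[round(pearson(data[a], data[b]), 2) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(f"{name:16}", row)
```

Each variable correlates perfectly with itself, so the diagonal is 1, and the table is symmetric because the correlation between A and B equals the correlation between B and A.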

Here, the variables are represented in the first row, and in the first column:

[Image: correlation matrix table]

The table above uses data from the full health data set.

Observations:

  • Duration and Calorie_Burnage are closely related, with a correlation coefficient of 0.89. This makes sense: the longer we train, the more calories we burn.
  • There is almost no linear relationship between Average_Pulse and Calorie_Burnage (correlation coefficient of 0.02).
  • Can we conclude that Average_Pulse does not affect Calorie_Burnage? No. We will come back to this question later.
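The word "linear" in the observations matters: the correlation coefficient only measures how well a straight line describes the relationship. As a small sketch of that general point (using made-up numbers, not the health data, and not meant as the answer to the question above), the following shows two variables that are perfectly related by a curve yet have a coefficient of essentially zero. The helper name pearson is an assumption for this sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up values: y is fully determined by x, but not by a straight line
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson(x, y)
print(round(r, 2))
```

The positive and negative halves of the curve cancel each other out, so the linear measure sees no trend even though y depends entirely on x.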

Correlation Matrix in Python

We can use the corr() method of a pandas DataFrame to create a correlation matrix. We also use the round() function to round the output to two decimals:

Example

# full_health_data is assumed to be a pandas DataFrame loaded earlier
Corr_Matrix = round(full_health_data.corr(), 2)
print(Corr_Matrix)

Output:

[Image: correlation matrix output]

Using a Heatmap

We can use a heatmap to visualize the correlation between variables:

[Image: correlation heatmap]

The closer the correlation coefficient is to 1, the greener the squares get.

The closer the correlation coefficient is to -1, the browner the squares get.


Use Seaborn to Create a Heatmap

We can use the Seaborn library to create a correlation heatmap (Seaborn is a visualization library based on Matplotlib):

Example

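import matplotlib.pyplot as plt
import seaborn as sns

# full_health_data is assumed to be a pandas DataFrame loaded earlier
correlation_full_health = full_health_data.corr()

# Draw the correlation matrix as a heatmap on a fixed -1 to 1 color scale
axis_corr = sns.heatmap(
    correlation_full_health,
    vmin=-1, vmax=1, center=0,
    cmap=sns.diverging_palette(50, 500, n=500),
    square=True
)

plt.show()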

Example Explained:

  • Import matplotlib.pyplot as plt and the seaborn library as sns.
  • Use the full_health_data set and compute its correlation matrix with corr().
  • Use sns.heatmap() to draw the correlation matrix as a heatmap.
  • Pass the correlation matrix as the first argument. vmin and vmax define the minimal and maximal values of the color scale, and center=0 places 0 at the center.
  • Define the colors with sns.diverging_palette(). n=500 means that the palette contains 500 colors.
  • square=True means that each cell is drawn as a square.
  • plt.show() displays the plot.
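The steps above can be tried end to end with a self-contained sketch. The data below is made up and the file name toy_correlation_heatmap.png is hypothetical. The annot and fmt arguments are additions not used in the example above; they print each coefficient inside its square. The Agg backend is selected so that the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for illustration only (not the health data set)
toy_health_data = pd.DataFrame({
    "Duration": [30, 45, 60, 90, 120],
    "Average_Pulse": [110, 105, 115, 100, 108],
    "Calorie_Burnage": [200, 290, 410, 580, 800],
})

toy_corr = toy_health_data.corr()

ax = sns.heatmap(
    toy_corr,
    vmin=-1, vmax=1, center=0,                    # fixed color scale centered on 0
    cmap=sns.diverging_palette(50, 500, n=500),   # same palette as the example above
    square=True,
    annot=True, fmt=".2f",                        # write each coefficient in its cell
)
ax.set_title("Correlation heatmap (toy data)")

plt.tight_layout()
plt.savefig("toy_correlation_heatmap.png")  # hypothetical output file name
```

Fixing vmin and vmax to -1 and 1 keeps the colors comparable between different data sets, since the scale no longer depends on the values that happen to occur in one matrix.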
