dataset <- data.frame(X = c(1, 2, 3, 4, 5, 6),
                      Y = c(2, NA, 5, NA, 4, 7),
                      Z = c(NA, 4, 6, NA, 8, 8))
dataset
  X  Y  Z
1 1  2 NA
2 2 NA  4
3 3  5  6
4 4 NA NA
5 5  4  8
6 6  7  8
Session 7
2024-03-19
Correlation: the measure of the strength and direction of a relationship between two variables. The relationship can be linear or non-linear.
Example (linear): Income and Education
In many countries, there is a linear correlation between income and education level. On average, individuals with higher levels of education tend to earn more income.
Example (non-linear): Technology Adoption
The adoption of new technologies often follows an S-shaped (sigmoid) curve. Initially, adoption is slow, then it accelerates rapidly, and finally, it slows down again as the technology becomes ubiquitous. This is a classic example of a nonlinear trend.
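The S-shaped pattern described above can be sketched with simulated data. The logistic curve, the 20-year window, and the variable names below are assumptions chosen for illustration, not real adoption figures.

```r
# Simulated (not real) adoption data: share of adopters over 20 years,
# generated from a logistic (S-shaped) curve.
year <- 0:20
adoption <- plogis(0.6 * (year - 10))  # values between 0 and 1

# Year-on-year growth: small at the start, largest in the middle,
# small again at the end.
growth <- diff(adoption)
early  <- mean(growth[1:5])
middle <- mean(growth[8:13])
late   <- mean(growth[16:20])

# Pearson's r captures only the linear part of the relationship, so it stays
# below 1 even though adoption rises at every single step.
r <- cor(year, adoption)

print(round(c(early = early, middle = middle, late = late), 3))
print(r)
```

A straight line fitted to these points would overestimate adoption in the early years and underestimate it in the late years, which is what makes the trend non-linear.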
Pearson correlation, also known as the Pearson correlation coefficient or Pearson’s r, is a statistical measure used to assess the strength and direction of the linear relationship between two continuous variables. It’s widely used in various fields, including economics, social sciences, and data analysis.
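Definition: Pearson correlation quantifies how two variables move together. It provides a number between -1 and 1, where:
r = 1: a perfect positive linear relationship.
r = -1: a perfect negative linear relationship.
r = 0: no linear relationship.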
Assumptions:
Formula: The formula for Pearson correlation between two variables X and Y with n data points is:
\[r = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sqrt{\sum{(X_i - \bar{X})^2}\sum{(Y_i - \bar{Y})^2}}}\]
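Where:
\(X_i\) and \(Y_i\) are the individual data points,
\(\bar{X}\) and \(\bar{Y}\) are the means of X and Y,
and each sum runs over the n data points.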
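As a rough check of the formula, it can be computed step by step and compared with R's built-in `cor()`. The helper name `pearson_r` is hypothetical; the toy values are the four complete X-Y pairs from the session's example dataset.

```r
# Hypothetical helper: Pearson's r computed directly from the formula.
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations of X from its mean
  dy <- y - mean(y)  # deviations of Y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# The four rows of the example dataset where both X and Y are observed.
x <- c(1, 3, 5, 6)
y <- c(2, 5, 4, 7)

r_manual  <- pearson_r(x, y)
r_builtin <- cor(x, y)  # method = "pearson" is the default

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree up to floating-point precision.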
Causality refers to a relationship between two variables where changes in one variable directly influence or cause changes in another variable.
Establishing causality is a more complex endeavor than identifying correlation. While correlated variables might change together, it does not necessarily mean that changes in one variable are causing changes in the other.
To establish causality, researchers often need to conduct controlled experiments or observational studies, or to employ advanced statistical techniques such as causal inference models.
=> Correlation does not necessarily imply causality.
Example: Ice Cream Sales and Drowning Incidents
Imagine you’re a researcher examining data on ice cream sales and the number of drowning incidents at a beach over several months. You notice a strong positive correlation between the two variables, meaning that when ice cream sales go up, the number of drowning incidents tends to increase as well. You might be tempted to conclude that eating more ice cream somehow causes more drownings or vice versa.
However, this is a classic case where correlation does not imply causality. In reality, there is no direct causal relationship between eating ice cream and drowning. The apparent correlation is explained by a hidden third variable: the weather, specifically hot summer weather.
Here’s how it works:
Hot Weather: During the summer months, when the weather is hot, people are more likely to buy ice cream to cool off, and they’re also more likely to go swimming at the beach.
Increased Beach Activity: The hot weather leads to an increase in beach activity, including more people swimming in the water.
Drowning Incidents: With more people swimming, there’s a higher likelihood of drowning incidents occurring simply because there’s a larger pool of individuals exposed to the risk of drowning.
In this scenario, both ice cream sales and drowning incidents are independently influenced by the hot weather. There’s no direct causal link between eating ice cream and drowning; instead, they are correlated because they share a common cause.
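The common-cause mechanism can be illustrated with a small simulation. Everything below is made up for illustration (sample size, coefficients, noise levels, variable names): temperature drives both variables, and neither variable influences the other.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

n <- 200
temperature <- runif(n, min = 10, max = 35)               # daily temperature (degrees Celsius)
ice_cream   <- 20 + 3 * temperature + rnorm(n, sd = 10)   # sales depend on temperature only
drownings   <- 0.2 * temperature + rnorm(n, sd = 1)       # incidents depend on temperature only

# Raw correlation: clearly positive, although neither variable affects the other.
r_raw <- cor(ice_cream, drownings)

# Remove the part of each variable that temperature explains, then correlate
# what is left (a partial correlation controlling for temperature).
res_ice   <- resid(lm(ice_cream ~ temperature))
res_drown <- resid(lm(drownings ~ temperature))
r_partial <- cor(res_ice, res_drown)

print(c(raw = r_raw, controlling_for_temperature = r_partial))
```

In this simulated setting the raw correlation is strong, while the correlation that remains after accounting for temperature is close to zero.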
use = "everything" (default) includes all variables in the correlation matrix; any correlation that involves a missing value is returned as NA.
use = "complete" (short for "complete.obs") excludes every observation (row) that has a missing value in any variable before computing the correlation matrix.
use = "pairwise" (short for "pairwise.complete.obs") maximizes the use of available data: each correlation is computed from all rows that are complete for that pair of variables.
          X         Y         Z
X 1.0000000 0.8304819 0.9534626
Y 0.8304819 1.0000000 0.1889822
Z 0.9534626 0.1889822 1.0000000
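A minimal sketch comparing the three `use` settings on the example dataset. The full option names are spelled out here; the object names (`cor_everything` and so on) are only for illustration.

```r
dataset <- data.frame(X = c(1, 2, 3, 4, 5, 6),
                      Y = c(2, NA, 5, NA, 4, 7),
                      Z = c(NA, 4, 6, NA, 8, 8))

# Default: correlations involving a variable with missing values come back as NA.
cor_everything <- cor(dataset)  # same as use = "everything"

# Listwise deletion: only rows 3, 5 and 6 (complete in X, Y and Z) are used.
cor_complete <- cor(dataset, use = "complete.obs")

# Pairwise deletion: each pair of variables uses every row where both are observed.
cor_pairwise <- cor(dataset, use = "pairwise.complete.obs")

print(cor_everything)
print(cor_complete)
print(cor_pairwise)
```

The pairwise result reproduces the matrix shown above. With listwise deletion the X-Y correlation falls to 0.5, because it is computed from three rows instead of four; the choice of `use` can change the estimate noticeably in small samples.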
          X         Y
X 1.0000000 0.8304819
Y 0.8304819 1.0000000
  X Y
1 1 2
3 3 5
5 5 4
6 6 7
          X         Y
X 1.0000000 0.8304819
Y 0.8304819 1.0000000
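The code that produced the outputs above is not shown. One plausible way to reproduce them, assuming the rows with missing values were dropped with `na.omit()` (an assumption, not necessarily what was used in the session), is:

```r
dataset <- data.frame(X = c(1, 2, 3, 4, 5, 6),
                      Y = c(2, NA, 5, NA, 4, 7),
                      Z = c(NA, 4, 6, NA, 8, 8))

# Keep only X and Y, then drop every row where either value is missing.
xy <- dataset[, c("X", "Y")]
xy_complete <- na.omit(xy)  # rows 2 and 4 are removed

print(xy_complete)

# No missing values remain, so the default use = "everything" is enough.
cor_xy <- cor(xy_complete)
print(cor_xy)
```

Removing the rows by hand and letting `cor()` handle them through `use` give the same X-Y correlation here, because both approaches end up using the same four rows.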
geom_smooth
Linear or Non-linear correlations
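A possible sketch of how `geom_smooth()` can contrast a linear fit with a flexible one. The data are simulated from an S-shaped curve with added noise (an assumption for illustration), and the plot is drawn only if the ggplot2 package is installed.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# Simulated S-shaped relationship with a little random noise.
x <- seq(-6, 6, length.out = 100)
sim <- data.frame(x = x, y = plogis(x) + rnorm(100, sd = 0.05))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sim, aes(x = x, y = y)) +
    geom_point(alpha = 0.5) +
    # Straight line: summarises the linear trend only.
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red") +
    # Local regression: follows the curve and reveals the non-linear shape.
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "blue")
  print(p)
}
```

Where the two lines diverge, a linear correlation is a poor summary of the relationship.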
Finish Exercise 1 (Social democratic capitalism)
1 application exercise about correlation
Handbooks, videos, cheatsheets