[1] -0.01879769 0.29734468 -0.72161879 0.81348481 -0.58172615 -0.05933099
[7] 0.32264855 0.54180014 1.63311884 1.80059136
       25%        50%        75%
-0.7437115  0.1055856  0.7417393
[1] 0.1055856
Session 6
2024-03-12
Exploratory analysis is the process of summarizing multiple values of one or more variables using a set of concise summary statistics. It provides an initial understanding of the data, aiding in decision-making, hypothesis generation, and identifying potential outliers or anomalies.
It aims to uncover patterns, insights, and key characteristics in the data, helping to understand the underlying structure and relationships.
This section introduces the concept of distributions in statistics. A distribution refers to the way data values are spread out or organized. It’s a fundamental concept for understanding the characteristics of data and making informed decisions based on it.
Descriptive statistics are numerical measures that provide insight into the central tendency and variability of a dataset.
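As a minimal sketch of these measures in base R, the following uses the built-in airquality dataset that appears later in this session. The variable name temp is only illustrative.

```r
# Temperature readings (degrees Fahrenheit) from the built-in airquality data
temp <- airquality$Temp

# Central tendency
mean(temp)     # arithmetic mean
median(temp)   # middle value

# Variability
var(temp)      # variance
sd(temp)       # standard deviation (square root of the variance)
range(temp)    # minimum and maximum
IQR(temp)      # interquartile range (75th minus 25th percentile)

# Several of these at once
summary(temp)
```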
Distributions can be visualized with several kinds of plots.
Histograms: A histogram is a graphical representation of the frequency distribution of data. It divides the data into intervals (bins) and displays the number of data points in each bin. Histograms help reveal the shape and spread of the data.
Density Curves: A density curve is a smoothed representation of the distribution of continuous data. It estimates the probability density function and is often used to approximate the shape of a distribution.
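Both plots can be sketched in base R as follows. This is one possible way to draw them, assuming the built-in airquality data; the number of bins and the labels are arbitrary choices.

```r
temp <- airquality$Temp

# Histogram drawn on the density scale (freq = FALSE) so that a density
# curve can share the same vertical axis.
h <- hist(temp,
          breaks = 10,   # suggested number of bins; R may adjust it to get "pretty" breaks
          freq = FALSE,
          main = "Daily temperature",
          xlab = "Temperature (degrees F)")

# The returned object holds the bin edges and the count in each bin
h$breaks
h$counts

# Kernel density estimate (Gaussian kernel, default bandwidth),
# added on top of the histogram as a smoothed curve
d <- density(temp)
lines(d, lwd = 2)
```

Changing breaks shows how sensitive the histogram's shape is to the choice of bins; the density curve depends in a similar way on its bandwidth (the bw argument of density()).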
Quantiles:
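A quantile is the value below which a given share of the data falls. A minimal sketch with simulated data follows; the seed and sample size are arbitrary, so the numbers will not match any output shown in these slides.

```r
set.seed(1)       # arbitrary seed, only for reproducibility
x <- rnorm(100)   # 100 draws from a standard normal distribution

# Quartiles: the values below which 25%, 50% and 75% of the data fall
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q

# The 50% quantile is the median
median(x)
```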
More about distributions:
Factors:
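Factors are how R stores categorical variables. A short sketch, assuming the built-in airquality data; the month labels and the temperature breaks are illustrative choices.

```r
# Month is stored as an integer (5 to 9) in the built-in airquality data
month <- factor(airquality$Month,
                levels = 5:9,
                labels = c("May", "Jun", "Jul", "Aug", "Sep"))

levels(month)   # the categories, in order
table(month)    # number of observations per category

# cut() turns a numeric variable into a factor of intervals
temp_group <- cut(airquality$Temp, breaks = c(50, 70, 80, 100))
levels(temp_group)
```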
Cross-tabulation (1/2)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
          Month
            5  6  7  8  9
  (56,72]  24  3  0  1 10
  (72,79]   5 15  2  9 10
  (79,85]   1  7 19  7  5
  (85,97]   0  5 10 14  5
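A table like the one above can plausibly be produced as follows. This is a reconstruction, assuming the rows are Temp cut at its quartiles; it is not necessarily the code used in the session, and the names temp_group and tab are illustrative.

```r
head(airquality)

# Cut temperature into four groups at its quartiles (56, 72, 79, 85, 97).
# With the defaults, the lowest interval is open on the left, so the single
# observation at the minimum (56) becomes NA and is left out of the table.
temp_group <- cut(airquality$Temp, breaks = quantile(airquality$Temp))

# Counts of temperature group (rows) by month (columns)
tab <- table(temp_group, Month = airquality$Month)
tab
```

Passing include.lowest = TRUE to cut() would keep the observation at the minimum in the first group.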
Cross-tabulation (2/2)
Row proportions (each row sums to 1):
          Month
                    5          6          7          8          9
  (56,72] 0.63157895 0.07894737 0.00000000 0.02631579 0.26315789
  (72,79] 0.12195122 0.36585366 0.04878049 0.21951220 0.24390244
  (79,85] 0.02564103 0.17948718 0.48717949 0.17948718 0.12820513
  (85,97] 0.00000000 0.14705882 0.29411765 0.41176471 0.14705882
Column proportions (each column sums to 1):
          Month
                    5          6          7          8          9
  (56,72] 0.80000000 0.10000000 0.00000000 0.03225806 0.33333333
  (72,79] 0.16666667 0.50000000 0.06451613 0.29032258 0.33333333
  (79,85] 0.03333333 0.23333333 0.61290323 0.22580645 0.16666667
  (85,97] 0.00000000 0.16666667 0.32258065 0.45161290 0.16666667
Overall proportions (the whole table sums to 1):
          Month
                     5           6           7           8           9
  (56,72] 0.157894737 0.019736842 0.000000000 0.006578947 0.065789474
  (72,79] 0.032894737 0.098684211 0.013157895 0.059210526 0.065789474
  (79,85] 0.006578947 0.046052632 0.125000000 0.046052632 0.032894737
  (85,97] 0.000000000 0.032894737 0.065789474 0.092105263 0.032894737
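The three proportion tables above sum to 1 by row, by column, and over the whole table, in that order, which matches the margin argument of prop.table(). A sketch, assuming the counts come from Temp cut at its quartiles (object names are illustrative):

```r
# Rebuild the table of counts: temperature group (rows) by month (columns)
temp_group <- cut(airquality$Temp, breaks = quantile(airquality$Temp))
tab <- table(temp_group, Month = airquality$Month)

row_prop <- prop.table(tab, margin = 1)   # each row sums to 1
col_prop <- prop.table(tab, margin = 2)   # each column sums to 1
all_prop <- prop.table(tab)               # the whole table sums to 1

# The same row proportions expressed as percentages
round(100 * row_prop, 1)
```

Row proportions answer "in which months do days of a given temperature group fall?", while column proportions answer "how are temperatures distributed within a given month?".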
Percentages with group_by() and mutate() are also possible!
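One way this might look with dplyr is sketched below. It assumes the dplyr package is installed; the use of count() and the names temp_group, pct and pct_by_month are illustrative choices, not taken from the session.

```r
library(dplyr)

pct_by_month <- airquality %>%
  mutate(temp_group = cut(Temp, breaks = quantile(Temp))) %>%
  filter(!is.na(temp_group)) %>%        # drop the observation at the minimum
  count(Month, temp_group) %>%          # counts per month and temperature group
  group_by(Month) %>%
  mutate(pct = 100 * n / sum(n)) %>%    # percentage within each month
  ungroup()

pct_by_month
```

Grouping by temp_group instead of Month would give percentages within each temperature group rather than within each month.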
Finish Exercise 1 (Colonialism / democracy & life expectancy)
One application exercise about dataviz (in groups, ungraded)
Handbooks, videos, cheatsheets