DataScience with R
This workshop is about accessing, manipulating, visualizing, and analyzing data with the statistical software R and its RStudio interface.
Working with different kinds of data, you will learn some essential statistical concepts along the way, building up from exploratory data analysis to statistical modeling.
You will be using your computer for most of the course. The goal is for you to become a data scientist for 2 hours each week!
Chapters
| Session | Content | Date | Prepare | Slides | Exercises | Exam |
|---|---|---|---|---|---|---|
| 1. DataScience & Software | Introduction to DataScience and R software setup | Jan 30, 2024 | | 🖥️ | | |
| 2. Workflow | Using `dplyr` functions: importing data, exploring structure, selecting variables... | Feb 6, 2024 | 📖 | 🖥️ | 1️⃣ 2️⃣ | |
| 3. Data 1 | `dplyr` for tidying data: renaming, aggregating... | Feb 13, 2024 | 📖 | 🖥️ | 1️⃣ | |
| 4. Data 2 | More on `dplyr` for tidying data: aggregating, merging... | Feb 27, 2024 | 📖 | 🖥️ | 1️⃣ 2️⃣ | 🎓 |
| 5. Data visualization | Why data visualization matters, with examples, and an introduction to `ggplot2` | Mar 5, 2024 | 📖 | 🖥️ | 1️⃣ 2️⃣ | |
| 6. Univariate exploratory analysis | Exploratory analysis of continuous and qualitative variables | Mar 12, 2024 | | 🖥️ | 1️⃣ | |
| 7. Bivariate exploratory analysis | Correlation and causality | Mar 19, 2024 | 📖 | 🖥️ | 1️⃣ | |
| 8. Statistical inference | Distributions, confidence intervals, and test statistics | Mar 26, 2024 | 📖 | 🖥️ | 1️⃣ | |
| 9. Linear regression | Linear regression models | Apr 2, 2024 | 📖 | 🖥️ | 1️⃣ | |
| 10. Logistic regression | Generalized linear models | Apr 16, 2024 | 📖 | 🖥️ | 1️⃣ | 🎓 |
| 11. Spatial 1 | Spatial analysis and cartography | Apr 17, 2024 | | 🖥️ | 1️⃣ | |
| 12. Spatial 2 | End of spatial analysis and introduction to classification methods | Apr 23, 2024 | | 🖥️ | 1️⃣ 2️⃣ | 🎓 |
Teacher
Kim ANTUNEZ
Learning Outcomes
- Proficiency in exploratory data analysis
- Knowledge of statistical inference and modeling
- Knowledge of the R programming language
- Knowledge of the RStudio software
- Exposure to current data science trends
Professional Skills
Quantitative methods, R and RStudio software, data science skills. After this course, students will be better able to work with scientific professionals such as data scientists.
Language of tuition
English
Workload
- Attendance: 2 hours a week / 24 hours a semester
- Online learning activities: 2 hours a week / 24 hours a semester
- Reading and Preparation for Class: 1 hour a week / 12 hours a semester
- Research and Preparation for Group Work: 2 hours a week / 24 hours a semester
Pre-requisite
Students need a strong curiosity about quantitative methods and the motivation to learn to code for data analysis. Basic computing skills (e.g., unzipping files) and some notions of introductory descriptive statistics are helpful. Each student needs a laptop running a recent version of Windows, macOS, or Linux, with full administrator privileges, the latest versions of R (r-project.org) and RStudio (posit.co) installed, and internet access to install new packages.
Semester
Autumn and Spring 2023-2024
Course validation
There will be exercises to complete between workshop sessions, and possibly group projects to develop throughout the semester.
Pedagogical format
All classes are structured around a slide-based presentation and a ‘demo’ session on the statistical software, followed by a ‘debrief’ email that includes readings and other homework. Feedback on this work is given during the next class.