Week | Topic |
---|---|
1 | Introduction to R, Descriptive Statistics |
2 | Data Manipulation I |
3 | Data Manipulation II, Functional Programming |
4 | Regression, Classification |
5 | Data Visualization I |
6 | Data Visualization II, Tools for Publishing |
7 | Causality |
8 | Tidy Modelling |
9 | Clustering |
10 | Web scraping, Dimension Reduction |
11 | Text Mining |
Syllabus
Instructor
- Marcell Granát
- Infopark I Building
- granat.marcell@nje.hu
- Schedule an appointment
Course details
- on Tuesdays
- September 11–December 2, 2023
- 15:00–18:30
- Neumann Room (computer lab)
Course objectives
In the digital era, an abundance of data is created every day. It contains valuable information about the economy, but handling it properly is not trivial. Data analysis and visualization have become some of the most important skills in business, and in the world of research they are clearly the most important.
This course introduces several statistical and visualization methods that help you work with data, such as advanced inferential statistics and dimension reduction techniques. In addition, we will put a lot of effort into helping you use a programming language properly, so that you can apply these tools in practice.
Topics
This is just a plan! We are still in the early stages of this program. We don’t have much experience with how much preliminary knowledge you have, so the final agenda may change a bit as we progress.
Course materials
Readings that I highly recommend for this course are free, as this course focuses primarily on the technical aspects of data analysis, which are rapidly evolving. The majority of pertinent sources are available online.
Books
Highly recommended (and also free)
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science. O’Reilly Media, Inc.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. Springer.
Kuhn, M., & Silge, J. (2022). Tidy modeling with R. O’Reilly Media, Inc.
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer.
Recommended
Gábor Békés, Gábor Kézdi. 2021. Data Analysis for Business, Economics, and Policy. Cambridge University Press
Cole Nussbaumer Knaflic. 2015. Storytelling with Data: A Data Visualization Guide for Business Professionals. 1st Edition. Wiley
Schedule an appointment
Seriously. Feel free to schedule an appointment for an online meeting (especially if I am your supervisor). I strive to keep this calendar up-to-date, so that the vacant slots are accessible to you. Oh, and it would be greatly appreciated if you could provide a brief overview of the topic we will be discussing, so I can do any necessary preparation beforehand.
Assignments
You can find the assignments on the assignments page.
You may attempt each task multiple times, but for personal reasons I ask you not to keep the hosted version running for too long (I pay the server usage fee, not the university). Alternatively, run it locally: copy and paste the following code into your R console, and you will be all set (you must first install the packages listed in the commented lines):
# install.packages("git2r")
# install.packages("rmarkdown")
# install.packages("learnr")
# install.packages("devtools")
# install.packages("httr")
# devtools::install_github("rundel/learnrhash")
# install.packages("tidyverse")
# devtools::install_github("rstudio/gradethis")
# install.packages("shinyalert")
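# Delete any earlier copy, then clone the exercises into a fresh "repo" folder
unlink("repo", recursive = TRUE)
git2r::clone(url = "https://github.com/MarcellGranat/bigdata2023_learnr.git", local_path = "repo")
# Launch the interactive tutorial locally
rmarkdown::run("repo/learnr.Rmd")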
Once you have completed all the tasks, simply click the generate button to obtain a hash code. If you get a message that your results are saved, then they are saved, and there is nothing more for you to do. If you do not get that message, send the hash code to me via email (the server is on local computers and anything may happen 🤷‍♂️).
This hash code contains your answers as well as other relevant information about your submission. You must complete each problem set by Monday, 23:59 of the following week.
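The package prerequisites from the commented lines above can also be checked programmatically. The following is only a sketch: `missing_pkgs()` is a hypothetical helper, not part of the course materials, and the installation step is guarded so that it only runs in an interactive session.

```r
# CRAN packages named in the commented install lines of the snippet above
cran_pkgs <- c("git2r", "rmarkdown", "learnr", "devtools", "httr",
               "tidyverse", "shinyalert")
# GitHub-only packages, stored as package name = repository
github_pkgs <- c(learnrhash = "rundel/learnrhash",
                 gradethis = "rstudio/gradethis")

# Hypothetical helper: which of `pkgs` are not installed yet?
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Only install when run by hand in the console
if (interactive()) {
  to_install <- missing_pkgs(cran_pkgs)
  if (length(to_install) > 0) install.packages(to_install)

  # devtools may have been installed in the step above
  for (pkg in missing_pkgs(names(github_pkgs))) {
    devtools::install_github(github_pkgs[[pkg]])
  }
}
```

Running this before the clone-and-run snippet avoids reinstalling packages you already have.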
Late work
You will lose 0.5 points for each day a problem set is late (yes, even if it’s only 5 minutes after the deadline). The table with the current points is generated automatically from the server.
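To make the late-work rule concrete, here is a minimal sketch of the arithmetic. `late_penalty()` is a hypothetical helper, not the server's actual code. It assumes that every started day counts as a full day (which matches the "5 minutes" remark), and the deadline used below is only an illustrative date in UTC.

```r
# Hypothetical helper: points deducted for a late problem set
late_penalty <- function(submitted, deadline, per_day = 0.5) {
  days_late <- as.numeric(difftime(submitted, deadline, units = "days"))
  # Assumption: every started day counts in full, on-time work loses nothing
  per_day * pmax(0, ceiling(days_late))
}

# Illustrative deadline: a Monday at 23:59
deadline <- as.POSIXct("2023-09-18 23:59:00", tz = "UTC")

late_penalty(deadline + 5 * 60, deadline)          # 5 minutes late
late_penalty(deadline + 2 * 86400 + 60, deadline)  # 2 days and 1 minute late
```

Under these assumptions the first call deducts 0.5 points and the second deducts 1.5 points.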
Grades
You will earn most of the points on the end-of-semester written exam, but passing the course also requires earning at least 24 of the 48 points on the problem sets assigned throughout the semester.
Assignment | Points | Percent |
---|---|---|
Problem sets | 50 | 20% |
Exam | 200 | 80% |
Total | 250 | 100% |
Grade | Range |
---|---|
5 | 90–100% |
4 | 80–89% |
3 | 66–79% |
2 | 50–65% |
1 | 0–49% |
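The two tables can be combined into a small grade calculator. This is a sketch rather than the official grading procedure: `final_grade()` is a hypothetical helper. It assumes that percentages falling between two bands (for example 89.5%) belong to the lower band, and that missing the 24-point problem-set minimum results in grade 1.

```r
# Hypothetical helper: map earned points to a grade on the 1-5 scale
final_grade <- function(problem_set_points, exam_points,
                        total_points = 250, min_problem_set = 24) {
  pct <- 100 * (problem_set_points + exam_points) / total_points
  # Lower bounds of grades 2, 3, 4 and 5, taken from the table
  grade <- findInterval(pct, c(50, 66, 80, 90)) + 1
  # Assumption: below the problem-set minimum the course is failed
  ifelse(problem_set_points < min_problem_set, 1, grade)
}

final_grade(40, 170)  # 210 of 250 points, i.e. 84%
final_grade(20, 200)  # 88%, but below the problem-set minimum
```

With these assumptions the first student receives a 4, while the second receives a 1 despite the higher percentage.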