Syllabus

Instructor

Course details

  •   Tuesdays
  •   September 11 – December 2, 2023
  •   15:00–18:30
  •   Neumann Room (computer lab)

Course objectives

In the digital era, vast amounts of data are created every day. These data contain valuable information about the economy, but handling them properly is not trivial. Data analysis and visualization have become some of the most important skills in business, and in the world of research they are clearly the most important.

This course introduces several statistical and visualization methods that help you work with data, such as advanced inferential statistics and dimension reduction techniques. In addition, we will put a lot of effort into helping you use a programming language (properly) so that you can apply these tools in practice.

Topics

Caution

This is just a plan! We are still in the early stages of this program, and we do not yet know how much prior knowledge you have, so the final agenda may change a bit as we progress.

Week Topic
1 Introduction to R, Descriptive Statistics
2 Data Manipulation I
3 Data Manipulation II, Functional Programming
4 Regression, Classification
5 Data Visualization I
6 Data Visualization II, Tools for Publishing
7 Causality
8 Tidy Modelling
9 Clustering
10 Web scraping, Dimension Reduction
11 Text Mining

Course materials

The readings I highly recommend for this course are free. The course focuses primarily on the technical aspects of data analysis, which evolve rapidly, so the majority of pertinent sources are available online.

Books

Schedule an appointment

Seriously. Feel free to schedule an appointment for an online meeting (especially if I am your supervisor). I strive to keep this calendar up-to-date, so that the vacant slots are accessible to you. Oh, and it would be greatly appreciated if you could provide a brief overview of the topic we will be discussing, so I can do any necessary preparation beforehand.

Assignments

You can find the assignments on the assignments page.

You may attempt each task multiple times, but for personal reasons I ask you not to keep it running for too long (I pay the server usage fee, not the university). Alternatively, run it locally: simply copy and paste the following code into your R console, and you will be all set (you must first install the packages listed in the commented lines):

# Uncomment and run these lines once to install the required packages
# install.packages("git2r")
# install.packages("rmarkdown")
# install.packages("learnr")
# install.packages("devtools")
# install.packages("httr")
# devtools::install_github("rundel/learnrhash")
# install.packages("tidyverse")
# devtools::install_github("rstudio/gradethis")
# install.packages("shinyalert")

# Delete any earlier copy of the tutorial, clone a fresh one, and launch it
unlink("repo", recursive = TRUE)
git2r::clone(url = "https://github.com/MarcellGranat/bigdata2023_learnr.git", local_path = "repo")
rmarkdown::run("repo/learnr.Rmd")

Once you have completed all the tasks, simply click on the generate button to obtain a hash code. If you get a message that your results are saved, then they are saved, and you have nothing more to do. If you do not get that message, send the hash code to me via email (the server is on local computers and anything may happen 🤷‍♂️).

This hash code contains your answers as well as other relevant information about your submission. You must complete each problem set by Monday, 23:59 of the following week.

Late work

You will lose 0.5 points for each day a problem set is late (yes, even if it’s only 5 minutes after the deadline). The table with the current points is generated automatically from the server.
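The late-work rule can be sketched as a small R function. This is only an illustration, not the server's actual scoring code: the function name `late_penalty`, the example deadline, and the reading that every started 24-hour period after the deadline counts as a full day are all assumptions.

```r
# Hypothetical helper (not part of the course server): points deducted
# for a late problem set, assuming each started day counts in full.
late_penalty <- function(submitted, deadline, per_day = 0.5) {
  hours_late <- as.numeric(difftime(submitted, deadline, units = "hours"))
  days_late <- max(0, ceiling(hours_late / 24))  # on-time work gives 0 days
  days_late * per_day
}

# Illustrative deadline: a Monday at 23:59
deadline <- as.POSIXct("2023-09-25 23:59:00", tz = "UTC")

late_penalty(as.POSIXct("2023-09-26 00:04:00", tz = "UTC"), deadline)  # 5 minutes late: 0.5
late_penalty(as.POSIXct("2023-09-25 20:00:00", tz = "UTC"), deadline)  # on time: 0
```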

Grades

You will earn most of the points on the end-of-semester written exam, but passing the course also requires collecting at least 24 of the 48 points on the ongoing problem set assignments throughout the year.

Assignment     Points   Percent
Problem sets   50       20%
Exam           200      80%
Total          250

Grade   Range
5       90–100%
4       80–89%
3       66–79%
2       50–65%
1       0–49%
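
As a rough sketch of how the two tables above combine, the following R snippet converts points into a final grade. The function name `final_grade` is hypothetical. The snippet assumes that a fractional percentage falls into the lower grade band (for example, 89.6% is still a 4), and it ignores the separate minimum-points requirement on the problem sets.

```r
# Hypothetical helper: map problem-set and exam points to a grade (1-5).
# Assumes the grade is determined only by the share of the 250 total points.
final_grade <- function(problem_set_points, exam_points, total = 250) {
  percent <- 100 * (problem_set_points + exam_points) / total
  lower_bounds <- c(0, 50, 66, 80, 90)  # lower limits of grades 1, 2, 3, 4, 5
  findInterval(percent, lower_bounds)
}

final_grade(40, 150)  # 190 of 250 points = 76%, grade 3
final_grade(50, 175)  # 225 of 250 points = 90%, grade 5
```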