The base R

Readings and class materials for Tuesday, September 12, 2023

Why do we need a programming language?

📈 Handling large datasets: Economic data often consists of thousands or even millions of observations. Programming enables us to organize and process this data effectively. By writing code, we can automate repetitive tasks, explore the data, and perform calculations on a scale that would be impractical or time-consuming with manual methods.
🧹 Data cleaning and preprocessing: Real-world data is often messy and inconsistent. Programming allows us to clean and preprocess the data, removing any errors, inconsistencies, or missing values. By writing code to handle such data cleaning tasks, we can ensure the accuracy and integrity of our analysis.
🔁 Reproducibility: Programming promotes reproducibility in statistical analysis. By documenting and sharing our code, others can replicate our analyses, verify our findings, and build upon our work. This promotes transparency and strengthens the validity of our results.
🛃 Flexibility and customization: Programming languages like R and Python provide a wide range of statistical libraries and packages specifically designed for data analysis. These libraries offer various functions and algorithms to run statistical tests, fit regression models, or apply other techniques. The ability to customize and tailor these tools to specific research questions allows for more precise and detailed analysis.

Programming, like any other skill, requires practice and persistence ⚒️. As Hadley Wickham, a prominent figure in the R community, once said, “The only way to write good code is to write tons of subpar code first. Feeling shame about bad code stops you from getting to good code.” 🚀 This sentiment underscores the importance of perseverance and learning from mistakes in the journey of mastering R.

Why Choose R?

R stands out as free software, widely adopted across various domains such as statistics, data science, economics, and more. Beyond being a tool for data-related tasks, R offers a rich ecosystem. With over 19,000 packages available on CRAN, R extends its basic functionality to cater to diverse needs. Its graphical capabilities are among its greatest strengths, and with the Shiny package, one can craft minimalist web applications or dashboards with little effort. Our journey will encompass data manipulation, analysis, and visualization techniques.

Yeah, it is not the most popular language… But among researchers it is clearly one of the top ones. Besides R, Python, MATLAB, and Julia are the most commonly used languages for data science tasks. All of these languages have their pros and cons, and we will come back to this issue once we have some insight into programming.

TIOBE index of the most popular languages in 2023
Programming Language	Ratings	Compared to 2022
Python	14.16%	-1.58%
C	11.27%	-2.70%
C++	10.65%	+0.90%
Java	9.49%	-2.23%
C#	7.31%	+2.42%
JavaScript	3.30%	+0.48%
Visual Basic	2.22%	-2.18%
PHP	1.55%	-0.13%
Assembly language	1.53%	-0.96%
SQL	1.44%	-0.57%
Fortran	1.28%	+0.26%
Go	1.19%	+0.03%
MATLAB	1.19%	+0.13%
Scratch	1.08%	+0.51%
Delphi/Object Pascal	1.02%	-0.07%
Swift	1.00%	+0.02%
Rust	0.97%	+0.47%
R	0.97%	+0.02%
Ruby	0.95%	+0.30%
Kotlin	0.90%	+0.59%

Python is often seen as the second best language for everything due to its large community and available resources. Julia is much less popular, but once you have mastered R and Python, it is easy to use and it is extremely fast.

R is great because of the ready-to-use extensions that make the work of a researcher much easier. If you are working with many different data sources for academic research or reporting, then R is your best choice! You will see that R’s learning curve is pretty flat at the beginning, but it is going to become pretty steep after a few weeks.

Oh, and it is free 👼🏻. This confers a significant advantage in comparison to other analytical tools. Opting for Stata or SPSS may pose difficulties in terms of personal computer usage due to their costs. Additionally, Python is often inaccessible in research centers, such as the Hungarian Statistical Office or the Central Bank of Hungary, due to privacy concerns. Consequently, it seems that R represents a highly favorable alternative.

A Glimpse of R’s Capabilities

R’s versatility can be showcased through various real-world applications:

COVID Tracker: An application developed using R (utilizing Shiny and Leaflet) provides real-time tracking of COVID cases. View the tracker here.
Real-time Epidemiology of the Hungarian Coronavirus Epidemic: This application, crafted by Tamás Ferenci, offers insights into the progression of the coronavirus epidemic in Hungary. Explore the application here.
Data Visualization: R’s prowess in data visualization can be demonstrated through intricate plots and graphs, which can be generated in a matter of minutes.

Setting Up R and RStudio

To embark on your R journey, you’ll need to install both R and RStudio. RStudio serves as a dedicated Integrated Development Environment (IDE) for R. While it’s possible to write and run R code without RStudio, for example with a plain text editor like Notepad, RStudio offers a more integrated experience tailored for R (and recently Python). It provides features like code snippets, code completion, and rendering capabilities that enhance the coding experience.

A Tour of RStudio

RStudio’s interface is divided into various panes:

Source: This is where we write and save our scripts. Scripts are saved with a .R extension by default, but other formats are also supported. To execute a line of code, use ctrl + enter, and for the entire script, use ctrl+shift+enter.

Console: This pane displays executed code and its output. For instance, typing 2 + 2 and pressing enter will display the result 4.

Tip

We do not write our code here. If you did write something here, you can find it later in the History pane, but recovering it is going to be painful.

Environment: Here, you can view all the variables you’ve created. For instance, after assigning x = 3, you can see the variable x in this pane.

Tip

Go to Preferences -> General -> Advanced and activate the “Show .Last.value in environment listing” option. This will help a lot 😉.

Help: This pane is invaluable when you’re unfamiliar with a function. By typing ?function_name, you can access detailed documentation about the function.

Files: This pane displays files in your current working directory. It’s crucial to set a project directory to streamline your workflow.

Tip

To work faster, you can reference a file by typing quotation marks ("") and the initial characters of its name. RStudio will then offer a list of matching files that you can select by pressing the Tab key.

Packages: From this pane, you can install and activate packages, but only those that are available on CRAN. Registering a package there requires tons of extra work, so you may find many packages online that are not on CRAN (I spent one week developing currr and one extra week fulfilling every requirement to finally publish it). And please, please, please:

Warning

Never leave an install.packages(...) line in your code! This is the worst habit you can have in programming, yet many people do it. If you think it’s useful so that others can run it, it’s not worth it either, since RStudio automatically detects packages that haven’t been installed yet and recommends installing them. If you need an automatic package installation in your workflow, then just use pacman (but this does not belong to a beginner course).

Tip

I suggest you watch the following video for plenty of other useful tips. You can watch it later when you have a better understanding of our goals with R. You may use other IDEs like PyCharm or VSCode in the future, but I highly recommend that you become very familiar with your program as this will make you a productive programmer.

Data Types in R

R supports a variety of data types:

Numeric: These are basic numerical values. For instance, assigning x <- 4 makes x a numeric type.
Character: This data type can contain letters, digits, or whitespace. For example, y <- "blue" assigns a character value to y.
Logical: These are boolean values, either TRUE or FALSE.
Factors: Useful for categorical data, factors can help in sorting and classifying data. Imagine the various categories of sizes. It wouldn’t make much sense to arrange them in alphabetical order. By utilizing factors, you have the ability to determine the sequence of the different values.

Ever unsure what type x is? We can use the class function to answer this.

x <- "2"
class(x)

[1] "character"
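
As a quick sketch, class can be called on one value of each of the types listed above. The variable names below are only illustrative and do not come from the course material.

```r
# One value of each basic type (names are illustrative)
num_value <- 4
chr_value <- "blue"
lgl_value <- TRUE
fct_value <- factor("M", levels = c("S", "M", "L"))

class(num_value) # "numeric"
class(chr_value) # "character"
class(lgl_value) # "logical"
class(fct_value) # "factor"

# A number written in quotes is a character; as.numeric() converts it back
converted <- as.numeric("2")
class(converted) # "numeric"
```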

Data Structures in R

R offers several data structures to store and manipulate data:

Vectors: These are one-dimensional arrays that can store numeric, character, or logical values.

Some examples

v <- c("a", "b", "c", "d") # combine
v

[1] "a" "b" "c" "d"

v[1] # 1st item

[1] "a"

v[2] # 2nd item

[1] "b"

v <- 1:10 # from 1 to 10
v

 [1]  1  2  3  4  5  6  7  8  9 10

v <- c("XL", "M", "L", "S", "L", "other")
sort(v)

[1] "L"     "L"     "M"     "other" "S"     "XL"

fv <- factor(v, ordered = TRUE, levels = c("XS", "S", "M", "L", "XL"))
sort(fv)

[1] S  M  L  L  XL
Levels: XS < S < M < L < XL

NA stands for “Not Available” and is used to represent missing or undefined values in data. It is a special value that indicates the absence of a valid observation or result. NA is commonly used in data analysis and manipulation tasks to handle missing data and perform computations or operations without encountering errors.
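
A minimal sketch of how NA behaves in practice, using a made-up vector of heights: most calculations return NA when any input is missing, unless the missing values are explicitly dropped with na.rm = TRUE.

```r
# A small vector with one missing value (made-up numbers)
heights <- c(172, NA, 181, 165)

is.na(heights)        # FALSE  TRUE FALSE FALSE
sum(is.na(heights))   # 1 missing value

mean_with_na <- mean(heights)                 # NA, because one value is missing
mean_observed <- mean(heights, na.rm = TRUE)  # mean of the three observed values

# Comparing with NA also gives NA, so use is.na() to test for missingness
NA == 1               # NA
```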

Data Frames: These are tables where each column is a variable and each row is an observation.

Example

avengers_df <- data.frame(
  id = 1:4,
  name = c("Captain America", "Hulk", "Groot","Strange"), 
  species = c("human", NA, "Flora colossus", "human")
)

avengers_df

  id            name        species
1  1 Captain America          human
2  2            Hulk           <NA>
3  3           Groot Flora colossus
4  4         Strange          human
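
The following sketch shows a few common ways to look inside a data frame, reusing the same avengers_df (redefined here so the block runs on its own). The object name humans is only illustrative.

```r
# Same data frame as in the example above
avengers_df <- data.frame(
  id = 1:4,
  name = c("Captain America", "Hulk", "Groot", "Strange"),
  species = c("human", NA, "Flora colossus", "human")
)

nrow(avengers_df)          # 4 observations
ncol(avengers_df)          # 3 variables
avengers_df$name           # one column as a vector
avengers_df[2, ]           # 2nd row, all columns
avengers_df[3, "species"]  # a single cell

# Rows where species is known to be "human"; which() skips the NA row
humans <- avengers_df[which(avengers_df$species == "human"), ]
humans$name
```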

Lists: These are ordered collections of objects, which can be of different types and lengths.

To store more complex data, the list function can be used. If you want to use the data.frame function, it is important to have vectors of equal length. However, in cases where you need to store a collection of data.frames, such as a big panel dataset with separate files for each year (e.g., cis_survey2016.csv, cis_survey2017.csv), using a list is a perfect solution. This situation is not uncommon, and I recommend storing your data in a list.

mylist <- list(avengers_df, v)
mylist

[[1]]
  id            name        species
1  1 Captain America          human
2  2            Hulk           <NA>
3  3           Groot Flora colossus
4  4         Strange          human

[[2]]
[1] "XL"    "M"     "L"     "S"     "L"     "other"

Now mylist stores a data.frame and a vector. You can access the elements by [[ ]]. For example, the 2nd element:

mylist[[2]]

[1] "XL"    "M"     "L"     "S"     "L"     "other"
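
To illustrate the panel-data case mentioned above, here is a small sketch of a named list holding one data frame per year. The tables, column names, and values are made up; in practice each element would come from a file such as cis_survey2016.csv.

```r
# Hypothetical yearly survey tables (made-up values standing in for real files)
survey_list <- list(
  "2016" = data.frame(firm_id = 1:3, innovated = c(TRUE, FALSE, TRUE)),
  "2017" = data.frame(firm_id = 1:2, innovated = c(FALSE, FALSE))
)

names(survey_list)           # "2016" "2017"
length(survey_list)          # 2 elements

survey_list[["2017"]]        # access an element by name
survey_list[[1]]$innovated   # a column of the first element
nrow(survey_list[["2016"]])  # 3 rows in the 2016 table
```

Naming the elements makes it harder to mix up years than relying on positions alone.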

Basics of programming

In the following, I will introduce the most basic expressions that may arise during programming. If you already have experience in any language, these will surely not pose a difficulty for you, although syntax varies from language to language. These elements of coding work the same way in any language, and they are essential foundations for transforming our thoughts into code.

Conditional Statements in R

Comparison operators such as <, >, ==, and != can be used to form conditions.

Let’s see some examples!

4 < 5

[1] TRUE

5 <= 5

[1] TRUE

4 > 5

[1] FALSE

(2 + 2) == 4

[1] TRUE

5 >= 4

[1] TRUE

2 == 3 # equal?

[1] FALSE

3 != 3 # not equal?

[1] FALSE

is.na(NA)

[1] TRUE

3 %in% c(1, 2, 3)

[1] TRUE

Combining conditions

2 == 2 & 2 == 3 # and

[1] FALSE

2 == 2 | 2 == 3 # or

[1] TRUE

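Inside an if statement, where each condition is a single TRUE or FALSE, it is preferred to use && and ||, as this forces R to only check the next condition if it can alter the final outcome. For example, if the first condition is FALSE and && is used, the subsequent condition will not be evaluated. The single & and | compare vectors element by element and always evaluate both sides.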
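
The difference between the single and the double operators can be sketched as follows. The variable names are illustrative, and the stop() call is only there to show that the right-hand side is never reached.

```r
x <- c(1, 5, 10)

# & and | work element by element and return a vector
both_conditions <- x > 2 & x < 8     # FALSE  TRUE FALSE
either_condition <- x < 2 | x > 8    #  TRUE FALSE  TRUE

# && and || expect single TRUE/FALSE values and stop as soon as
# the outcome is decided (short-circuit evaluation)
value <- NULL
safe_check <- !is.null(value) && value > 0  # value > 0 is never evaluated
safe_check                                  # FALSE

first_is_small <- x[1] < 2 || stop("this is never reached")
first_is_small                              # TRUE
```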

Conditional evaluation

R supports conditional statements like if, else if, and else to execute code based on specific conditions.

students_at_class = 25

if (students_at_class > 20) {
  print("You R great!")
} else if (students_at_class > 10) {
  print("Would you like some additional assignment?")
} else {
  print("You will all fail!")
}

[1] "You R great!"

Tip

If you want to write an if … else statement in R, I highly recommend that you use the snippet for it. A snippet means that when you type if and press shift + tab, RStudio automatically writes the framework you have to fill in.

Working with vectors

x <- 1:10
x %% 3 == 0

 [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

which(x %% 3 == 0)

[1] 3 6 9

ifelse(x %% 2 == 0, "even", "odd")

 [1] "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even"
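
Building on the examples above, a logical vector can also be used directly to pick elements or to count them. This is a small sketch; divisible_by_3 is just an illustrative name.

```r
x <- 1:10

divisible_by_3 <- x %% 3 == 0   # logical vector of length 10

x[divisible_by_3]               # 3 6 9: keeps elements where the test is TRUE
x[which(divisible_by_3)]        # same result via positions
sum(divisible_by_3)             # 3, because TRUE counts as 1
mean(x %% 2 == 0)               # 0.5, the share of even numbers

# ifelse() can also return numbers, e.g. replace odd values with 0
evens_only <- ifelse(x %% 2 == 0, x, 0)
evens_only                      # 0 2 0 4 0 6 0 8 0 10
```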

Loops

Loops are used to execute a block of code multiple times. R supports loops like for and while.

For

A for loop is used to iterate over the elements of a vector.

# Basic for loop to print numbers from 1 to 5
for (i in 1:5) {
  print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Alternatively, we have a somewhat more sophisticated sequence, the Fibonacci series! Let us examine the first 10 elements of the Fibonacci sequence.

# Initialize the first two Fibonacci numbers
fibonacci <- c(0, 1)

# Generate the next 8 Fibonacci numbers
for (i in 3:10) {
  next_fibonacci <- fibonacci[i-1] + fibonacci[i-2]
  fibonacci <- c(fibonacci, next_fibonacci)
}

# Print the Fibonacci sequence
print(fibonacci)

 [1]  0  1  1  2  3  5  8 13 21 34
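
One common variation, which is not part of the example above, is to create the result vector at its final length first and then fill it in place, instead of growing it with c() in every step. A minimal sketch, with fibonacci_prealloc as an illustrative name:

```r
n <- 10

# Create a numeric vector of the final length up front
fibonacci_prealloc <- numeric(n)
fibonacci_prealloc[1:2] <- c(0, 1)

# Fill positions 3 to n in place instead of appending with c()
for (i in 3:n) {
  fibonacci_prealloc[i] <- fibonacci_prealloc[i - 1] + fibonacci_prealloc[i - 2]
}

fibonacci_prealloc
```

For ten elements the difference is negligible, but for long vectors filling in place avoids copying the vector on every iteration.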

While

A while loop repeats a block of code as long as its condition is true.

# Initialize the first two Fibonacci numbers
fibonacci <- c(0, 1)

# Generate Fibonacci numbers while the last value is less than 50
while (tail(fibonacci, 1) < 50) {
  next_fibonacci <- sum(tail(fibonacci, 2))
  fibonacci <- c(fibonacci, next_fibonacci)
}

# Remove the last number which is greater than or equal to 50
fibonacci <- fibonacci[-length(fibonacci)]

# Print the Fibonacci sequence
print(fibonacci)

 [1]  0  1  1  2  3  5  8 13 21 34
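
As an alternative sketch, the removal step can be avoided by computing the candidate value first and appending it only if it is still below the limit. The names limit, fibonacci_below, and next_value are illustrative.

```r
limit <- 50
fibonacci_below <- c(0, 1)
next_value <- sum(tail(fibonacci_below, 2))

# Append only while the candidate value is still below the limit
while (next_value < limit) {
  fibonacci_below <- c(fibonacci_below, next_value)
  next_value <- sum(tail(fibonacci_below, 2))
}

fibonacci_below
```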

Functions

Functions are blocks of code that can be reused. R provides numerous built-in functions, and users can also define their own. Functions enhance code readability and reduce redundancy.

Tip

Use the FUN 🤗 snippet to generate the framework for a new function.

fibonacci <- function(n) {
  fibonacci_seq <- c(0, 1) # initial = 2 elements
  
  for (i in seq_len(max(0, max(n) - 2))) { # n can be a vector; no iterations if max(n) <= 2
    new_elem <- sum(tail(fibonacci_seq, 2))
    
    fibonacci_seq <- c(fibonacci_seq, new_elem)
  }
  
  fibonacci_seq[n]
}

fibonacci(13:15)

[1] 144 233 377
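
A second, smaller sketch shows two further features of functions: a default argument value and an early input check. The helper rescale01 is hypothetical and only serves as an illustration.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range
rescale01 <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))        # fail early on wrong input
  rng <- range(x, na.rm = na.rm)  # minimum and maximum
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(0, 5, 10))     # 0.0 0.5 1.0
rescale01(c(2, NA, 4, 6))  # 0.0  NA 0.5 1.0
```

Because na.rm has a default value, it can be omitted in the call; the missing value stays NA in the result while the other elements are rescaled.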

Note

Writing good functions can be challenging, but once you’ve got the basics down, here are some tips that could help: https://style.tidyverse.org/functions.html

The apply family

We will not use the functions of the apply family frequently, as there is a more modern and transparent alternative, which we will encounter in the following weeks. However, it is important to be familiar with them as they form the basis for many packages and solutions (for instance, in parallelisation), and we cannot avoid having colleagues who use them.

The essence of these functions is to perform iteration on the input data: the same function is applied to each element independently. The difference between iteration and recursion is that in the latter case the previous result is used in the evaluation of the next step (for example, we saw this in the Fibonacci sequence). This is why a recursive task cannot be split across processor cores that work separately on different parts of the input data, whereas an iterative one can.
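
The distinction can be sketched with two base R functions, sapply and Reduce. They are used here only for illustration and are not necessarily the tools used later in the course.

```r
x <- 1:5

# Iteration: each result depends only on its own input element,
# so the elements could be processed in any order (or in parallel)
squares <- sapply(x, function(value) value^2)
squares        # 1 4 9 16 25

# Recursion-like computation: each step needs the previous result
running_total <- Reduce(`+`, x, accumulate = TRUE)
running_total  # 1 3 6 10 15
```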

Apply

The function apply calls a function on each row or column of a data.frame (or matrix). Its first argument is the data.frame, the second is the MARGIN, and the third is the function to apply:

MARGIN = 2: apply the given function on each of the COLUMNS
MARGIN = 1: apply the given function on each of the ROWS
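
A minimal sketch of both MARGIN values on a toy data frame. The column names AAA and BBB and all numbers are made up for illustration.

```r
# Toy data: 3 rows (years) and 2 columns (made-up country codes)
toy_df <- data.frame(AAA = c(1, 2, 3), BBB = c(10, 20, 30))
rownames(toy_df) <- c("2001", "2002", "2003")

# MARGIN = 2: one value per column
column_means <- apply(toy_df, 2, mean)
column_means  # AAA = 2, BBB = 20

# MARGIN = 1: one value per row
row_sums <- apply(toy_df, 1, sum)
row_sums      # 2001 = 11, 2002 = 22, 2003 = 33

# Extra arguments after the function are passed on to it
column_max <- apply(toy_df, 2, max, na.rm = TRUE)
```

The result keeps the column or row names, which makes it easy to see which value belongs to which country or year.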

Example - Fertility rates by country / by year

Replacement Fertility Rate: Assuming there are no migration flows and that mortality rates remain unchanged, a total fertility rate of 2.1 children per woman keeps the population broadly stable.

above_replacement_prop <- function(x) {
  # proportion of observations at or above the replacement rate
  # NAs are excluded from both the count and the denominator
  sum(x >= 2.1, na.rm = TRUE) / sum(!is.na(x))
}

x <- c(1.3, 4, 2, NA)
above_replacement_prop(x)

[1] 0.3333333

fertility_df

      AUS  AUT  BEL  CAN  CZE  DNK  FIN  FRA  DEU  GRC  HUN  ISL  IRL  ITA  JPN
1960 3.45 2.69 2.54 3.90 2.11 2.54 2.71 2.74 2.37 2.23 2.02 4.26 3.76 2.41 2.00
1961 3.55 2.78 2.63 3.84 2.13 2.55 2.65 2.82 2.44 2.13 1.94 3.88 3.79 2.41 1.96
1962 3.43 2.80 2.59 3.76 2.14 2.54 2.66 2.80 2.44 2.16 1.79 3.98 3.92 2.46 1.98
1963 3.34 2.82 2.68 3.67 2.33 2.64 2.66 2.90 2.51 2.14 1.82 3.98 4.01 2.56 2.00
1964 3.15 2.79 2.71 3.50 2.36 2.60 2.58 2.91 2.53 2.24 1.80 3.86 4.06 2.70 2.05
1965 2.97 2.70 2.61 3.15 2.18 2.61 2.46 2.85 2.50 2.25 1.81 3.71 4.03 2.67 2.14
1966 2.89 2.66 2.52 2.81 2.01 2.62 2.40 2.80 2.51 2.32 1.88 3.58 3.95 2.63 1.58
1967 2.85 2.62 2.41 2.60 1.90 2.35 2.32 2.67 2.45 2.45 2.01 3.28 3.84 2.54 2.23
1968 2.89 2.58 2.31 2.45 1.83 2.12 2.15 2.59 2.36 2.42 2.06 3.07 3.78 2.49 2.13
1969 2.89 2.49 2.27 2.40 1.86 2.00 1.94 2.53 2.21 2.36 2.04 2.99 3.85 2.51 2.13
1970 2.86 2.29 2.25 2.33 1.91 1.95 1.83 2.48 2.03 2.40 1.97 2.81 3.87 2.42 2.13
1971 2.95 2.20 2.21 2.19 1.98 2.04 1.70 2.50 1.97 2.32 1.92 2.92 3.98 2.41 2.16
1972 2.74 2.08 2.09 2.02 2.07 2.03 1.59 2.42 1.74 2.32 1.93 3.09 3.88 2.36 2.14
1973 2.49 1.94 1.95 1.93 2.29 1.92 1.50 2.31 1.56 2.27 1.95 2.95 3.74 2.34 2.14
1974 2.32 1.91 1.83 1.82 2.43 1.90 1.62 2.11 1.53 2.38 2.30 2.66 3.62 2.33 2.05
1975 2.15 1.83 1.74 1.80 2.40 1.92 1.69 1.93 1.48 2.33 2.38 2.65 3.40 2.21 1.91
1976 2.06 1.69 1.73 1.76 2.36 1.75 1.72 1.83 1.51 2.35 2.26 2.52 3.31 2.11 1.85
1977 2.01 1.63 1.71 1.75 2.32 1.66 1.69 1.86 1.51 2.27 2.17 2.31 3.27 1.97 1.80
1978 1.95 1.60 1.69 1.70 2.32 1.67 1.65 1.82 1.50 2.28 2.08 2.35 3.24 1.87 1.79
1979 1.91 1.60 1.69 1.70 2.29 1.60 1.64 1.86 1.50 2.26 2.02 2.49 3.23 1.76 1.77
1980 1.89 1.65 1.68 1.68 2.10 1.55 1.63 1.95 1.56 2.23 1.92 2.48 3.23 1.68 1.75
1981 1.94 1.67 1.66 1.65 2.02 1.44 1.65 1.95 1.53 2.10 1.88 2.33 3.07 1.60 1.74
1982 1.93 1.66 1.61 1.64 2.01 1.43 1.72 1.91 1.51 2.03 1.78 2.26 2.96 1.60 1.77
1983 1.92 1.56 1.57 1.63 1.96 1.38 1.74 1.78 1.43 1.94 1.73 2.24 2.76 1.54 1.80
1984 1.84 1.52 1.54 1.63 1.97 1.40 1.70 1.80 1.39 1.82 1.73 2.08 2.59 1.48 1.81
1985 1.92 1.47 1.51 1.61 1.96 1.45 1.64 1.81 1.37 1.68 1.83 1.93 2.50 1.45 1.76
1986 1.87 1.45 1.54 1.59 1.94 1.48 1.60 1.83 1.41 1.60 1.83 1.93 2.44 1.37 1.72
1987 1.85 1.43 1.54 1.58 1.91 1.50 1.59 1.80 1.43 1.50 1.81 2.07 2.31 1.35 1.69
1988 1.83 1.45 1.57 1.60 1.94 1.56 1.70 1.81 1.46 1.50 1.79 2.27 2.17 1.38 1.66
1989 1.84 1.45 1.58 1.66 1.87 1.62 1.71 1.79 1.42 1.40 1.78 2.20 2.08 1.35 1.57
1990 1.90 1.46 1.62 1.71 1.89 1.67 1.79 1.78 1.45 1.39 1.84 2.31 2.12 1.36 1.54
1991 1.85 1.51 1.66 1.72 1.86 1.68 1.80 1.77 1.33 1.37 1.86 2.19 2.09 1.33 1.53
1992 1.89 1.51 1.65 1.71 1.72 1.76 1.85 1.73 1.29 1.36 1.77 2.21 1.99 1.32 1.50
1993 1.86 1.50 1.61 1.69 1.67 1.75 1.81 1.66 1.28 1.32 1.69 2.22 1.91 1.26 1.46
1994 1.84 1.47 1.56 1.69 1.44 1.81 1.85 1.66 1.24 1.33 1.64 2.14 1.85 1.22 1.50
1995 1.82 1.42 1.56 1.67 1.28 1.81 1.81 1.71 1.25 1.28 1.57 2.08 1.85 1.19 1.42
1996 1.80 1.45 1.59 1.63 1.19 1.75 1.76 1.73 1.32 1.26 1.46 2.12 1.89 1.22 1.43
1997 1.78 1.39 1.60 1.57 1.17 1.76 1.75 1.73 1.37 1.27 1.38 2.04 1.94 1.23 1.39
1998 1.76 1.37 1.60 1.56 1.16 1.73 1.71 1.76 1.36 1.24 1.33 2.05 1.95 1.21 1.38
1999 1.76 1.34 1.62 1.55 1.13 1.74 1.73 1.79 1.36 1.23 1.29 1.99 1.91 1.23 1.34
2000 1.76 1.36 1.67 1.51 1.14 1.77 1.73 1.87 1.38 1.25 1.33 2.08 1.90 1.26 1.36
2001 1.73 1.33 1.67 1.54 1.15 1.75 1.73 1.88 1.35 1.25 1.31 1.95 1.96 1.25 1.33
2002 1.77 1.39 1.65 1.52 1.17 1.72 1.72 1.86 1.34 1.28 1.31 1.93 1.98 1.27 1.32
2003 1.77 1.38 1.67 1.55 1.18 1.76 1.76 1.87 1.34 1.29 1.28 1.99 1.98 1.29 1.29
2004 1.78 1.42 1.72 1.56 1.23 1.79 1.80 1.90 1.36 1.31 1.28 2.03 1.95 1.34 1.29
2005 1.85 1.41 1.76 1.58 1.28 1.80 1.80 1.92 1.34 1.34 1.32 2.05 1.88 1.33 1.26
2006 1.88 1.41 1.80 1.63 1.33 1.85 1.84 1.98 1.33 1.40 1.35 2.07 1.94 1.37 1.32
2007 1.99 1.39 1.82 1.67 1.44 1.84 1.83 1.95 1.37 1.41 1.32 2.09 2.01 1.39 1.34
2008 2.02 1.42 1.85 1.70 1.50 1.89 1.85 1.99 1.38 1.50 1.35 2.14 2.06 1.44 1.37
2009 1.97 1.40 1.84 1.69 1.49 1.84 1.86 1.99 1.36 1.50 1.33 2.22 2.06 1.44 1.37
2010 1.95 1.44 1.86 1.65 1.49 1.87 1.87 2.02 1.39 1.48 1.26 2.20 2.05 1.44 1.39
2011 1.92 1.43 1.81 1.63 1.43 1.75 1.83 2.00 1.39 1.40 1.24 2.02 2.03 1.42 1.39
2012 1.93 1.44 1.80 1.63 1.45 1.73 1.80 1.99 1.41 1.34 1.34 2.04 1.98 1.42 1.41
2013 1.88 1.44 1.76 1.61 1.46 1.67 1.75 1.97 1.42 1.29 1.34 1.93 1.93 1.39 1.43
2014 1.79 1.46 1.74 1.61 1.53 1.69 1.71 1.97 1.47 1.30 1.41 1.93 1.89 1.38 1.42
2015 1.79 1.49 1.70 1.60 1.57 1.71 1.65 1.93 1.50 1.33 1.44 1.81 1.85 1.36 1.45
2016 1.79 1.53 1.68 1.59 1.63 1.79 1.57 1.89 1.59 1.38 1.49 1.75 1.82 1.36 1.44
2017 1.74 1.52 1.65 1.55 1.69 1.75 1.49 1.86 1.57 1.35 1.49 1.71 1.78 1.34 1.43
2018 1.74 1.48 1.62 1.51 1.71 1.73 1.41 1.84 1.57 1.35 1.49 1.71 1.75 1.31 1.42
2019 1.67 1.46 1.60 1.47 1.71 1.70 1.35 1.83 1.54 1.34 1.49 1.75 1.70 1.27 1.36
2020 1.59 1.44 1.55 1.41 1.71 1.67 1.37 1.79 1.53 1.39 1.56 1.72 1.63 1.24 1.33
2021 1.70 1.48 1.60 1.43 1.83 1.72 1.46 1.80 1.58 1.43 1.59 1.82 1.72 1.25 1.30
2022   NA   NA   NA   NA   NA 1.55   NA   NA   NA   NA 1.52   NA   NA   NA   NA
      KOR  LUX  MEX  NLD  NZL  NOR  POL  PRT  SVK  ESP  SWE  CHE  TUR  GBR  USA
1960 6.00 2.28 6.77 3.12 4.24 2.91 2.98 3.10 3.07 2.86 2.20 2.44 6.40 2.72 3.65
1961 5.80 2.33 6.76 3.22 4.31 2.94 2.83 3.16 2.96 2.76 2.23 2.53 6.33 2.80 3.62
1962 5.60 2.35 6.76 3.18 4.19 2.91 2.72 3.21 2.83 2.80 2.26 2.60 6.26 2.88 3.46
1963 5.40 2.33 6.75 3.19 4.05 2.93 2.70 3.11 2.93 2.88 2.34 2.67 6.19 2.92 3.32
1964 5.20 2.38 6.75 3.17 3.80 2.98 2.57 3.21 2.91 3.01 2.48 2.68 6.01 2.97 3.19
1965 5.00 2.42 6.76 3.04 3.54 2.94 2.52 3.14 2.80 2.94 2.42 2.61 5.84 2.89 2.91
1966 4.80 2.37 6.77 2.90 3.41 2.90 2.34 3.12 2.67 2.99 2.36 2.52 5.66 2.79 2.72
1967 4.66 2.25 6.79 2.81 3.35 2.81 2.33 3.08 2.49 3.03 2.27 2.41 5.49 2.69 2.56
1968 4.52 2.13 6.81 2.72 3.34 2.75 2.24 3.00 2.40 2.96 2.07 2.30 5.31 2.60 2.46
1969 4.53 2.02 6.83 2.75 3.28 2.69 2.20 2.95 2.43 2.93 1.93 2.19 5.18 2.51 2.46
1970 4.53 1.98 6.83 2.57 3.17 2.50 2.20 2.83 2.40 2.90 1.94 2.10 5.00 2.43 2.48
1971 4.54 1.96 6.79 2.36 3.18 2.49 2.25 2.78 2.43 2.88 1.96 2.04 5.00 2.40 2.27
1972 4.12 1.75 6.70 2.15 3.00 2.38 2.24 2.69 2.49 2.86 1.91 1.91 5.00 2.20 2.01
1973 4.07 1.58 6.56 1.90 2.76 2.23 2.26 2.65 2.56 2.84 1.86 1.81 5.59 2.04 1.88
1974 3.77 1.58 6.37 1.77 2.58 2.13 2.26 2.60 2.60 2.89 1.87 1.73 5.46 1.92 1.84
1975 3.43 1.55 6.13 1.66 2.37 1.98 2.27 2.58 2.53 2.80 1.77 1.61 5.32 1.81 1.77
1976 3.00 1.48 5.86 1.63 2.27 1.86 2.30 2.58 2.52 2.80 1.68 1.55 5.19 1.74 1.74
1977 2.99 1.49 5.59 1.58 2.21 1.75 2.23 2.48 2.47 2.67 1.64 1.53 4.90 1.69 1.79
1978 2.64 1.47 5.32 1.58 2.07 1.77 2.21 2.28 2.45 2.55 1.60 1.51 5.05 1.75 1.76
1979 2.90 1.47 5.06 1.56 2.12 1.75 2.28 2.17 2.44 2.37 1.66 1.52 4.84 1.86 1.81
1980 2.82 1.50 4.84 1.60 2.03 1.72 2.28 2.18 2.31 2.22 1.68 1.55 4.63 1.90 1.84
1981 2.57 1.55 4.64 1.56 2.01 1.70 2.24 2.13 2.28 2.04 1.63 1.55 4.41 1.82 1.81
1982 2.39 1.49 4.46 1.50 1.95 1.71 2.34 2.07 2.27 1.94 1.62 1.56 4.20 1.78 1.83
1983 2.06 1.44 4.30 1.47 1.92 1.66 2.42 1.95 2.27 1.80 1.61 1.52 4.11 1.77 1.80
1984 1.74 1.42 4.15 1.49 1.93 1.66 2.37 1.90 2.25 1.73 1.65 1.53 3.93 1.77 1.81
1985 1.66 1.38 4.02 1.51 1.93 1.68 2.33 1.72 2.25 1.64 1.73 1.52 3.76 1.79 1.84
1986 1.58 1.44 3.90 1.55 1.96 1.71 2.22 1.66 2.20 1.56 1.79 1.53 3.58 1.78 1.84
1987 1.53 1.39 3.79 1.56 2.03 1.75 2.15 1.62 2.14 1.50 1.84 1.52 3.40 1.81 1.87
1988 1.55 1.51 3.68 1.55 2.10 1.84 2.13 1.62 2.15 1.45 1.96 1.57 3.29 1.82 1.93
1989 1.56 1.52 3.57 1.55 2.12 1.89 2.07 1.58 2.08 1.40 2.02 1.56 3.39 1.79 2.01
1990 1.57 1.62 3.47 1.62 2.18 1.93 1.99 1.56 2.09 1.36 2.14 1.59 3.07 1.83 2.08
1991 1.71 1.60 3.37 1.61 2.09 1.92 1.98 1.56 2.05 1.33 2.12 1.58 3.00 1.82 2.06
1992 1.76 1.67 3.27 1.59 2.06 1.89 1.85 1.53 1.99 1.32 2.09 1.58 2.93 1.79 2.05
1993 1.65 1.69 3.18 1.57 2.04 1.86 1.77 1.51 1.93 1.27 2.00 1.51 2.87 1.76 2.02
1994 1.66 1.72 3.10 1.57 1.98 1.87 1.72 1.44 1.67 1.20 1.89 1.49 2.81 1.74 2.00
1995 1.63 1.67 3.02 1.53 1.98 1.87 1.55 1.41 1.52 1.17 1.74 1.48 2.75 1.71 1.98
1996 1.57 1.76 2.95 1.53 1.96 1.89 1.53 1.44 1.47 1.16 1.61 1.50 2.69 1.73 1.98
1997 1.54 1.71 2.88 1.56 1.96 1.86 1.47 1.47 1.43 1.18 1.53 1.48 2.63 1.72 1.97
1998 1.46 1.67 2.82 1.63 1.89 1.81 1.41 1.48 1.37 1.16 1.51 1.47 2.56 1.71 2.00
1999 1.43 1.71 2.77 1.65 1.97 1.85 1.37 1.51 1.33 1.19 1.50 1.48 2.48 1.68 2.01
2000 1.48 1.78 2.72 1.72 1.98 1.85 1.37 1.56 1.29 1.23 1.55 1.50 2.27 1.64 2.06
2001 1.31 1.66 2.67 1.71 1.97 1.78 1.32 1.46 1.20 1.24 1.57 1.38 2.37 1.63 2.03
2002 1.18 1.63 2.62 1.73 1.89 1.75 1.25 1.47 1.19 1.25 1.65 1.39 2.17 1.63 2.01
2003 1.19 1.62 2.58 1.75 1.93 1.80 1.22 1.44 1.20 1.30 1.72 1.39 2.09 1.70 2.04
2004 1.16 1.66 2.54 1.73 1.98 1.83 1.23 1.41 1.24 1.31 1.75 1.42 2.11 1.75 2.05
2005 1.09 1.62 2.50 1.71 1.97 1.84 1.24 1.42 1.25 1.33 1.77 1.42 2.12 1.76 2.06
2006 1.13 1.64 2.46 1.72 2.01 1.90 1.27 1.38 1.24 1.36 1.85 1.44 2.12 1.82 2.11
2007 1.26 1.61 2.42 1.72 2.18 1.90 1.31 1.35 1.25 1.38 1.88 1.46 2.16 1.86 2.12
2008 1.19 1.60 2.39 1.77 2.19 1.96 1.39 1.40 1.32 1.45 1.91 1.48 2.15 1.91 2.07
2009 1.15 1.59 2.36 1.79 2.13 1.98 1.40 1.35 1.41 1.38 1.94 1.50 2.10 1.89 2.00
2010 1.23 1.63 2.34 1.80 2.17 1.95 1.38 1.39 1.40 1.37 1.98 1.54 2.08 1.92 1.93
2011 1.24 1.51 2.32 1.76 2.09 1.88 1.30 1.35 1.45 1.34 1.90 1.52 2.05 1.91 1.89
2012 1.30 1.57 2.29 1.72 2.10 1.85 1.30 1.29 1.34 1.32 1.91 1.53 2.11 1.92 1.88
2013 1.19 1.55 2.27 1.68 2.01 1.78 1.26 1.21 1.34 1.27 1.89 1.52 2.11 1.83 1.86
2014 1.21 1.50 2.21 1.71 1.92 1.76 1.29 1.23 1.37 1.32 1.88 1.54 2.18 1.81 1.86
2015 1.24 1.47 2.14 1.66 1.99 1.73 1.29 1.31 1.40 1.33 1.85 1.54 2.15 1.80 1.84
2016 1.17 1.41 2.09 1.66 1.87 1.71 1.36 1.36 1.48 1.34 1.85 1.54 2.11 1.79 1.82
2017 1.05 1.39 2.04 1.62 1.81 1.62 1.45 1.38 1.52 1.31 1.78 1.52 2.07 1.74 1.77
2018 0.98 1.38 2.00 1.59 1.71 1.56 1.44 1.42 1.54 1.26 1.75 1.52 1.99 1.68 1.73
2019 0.92 1.34 1.92 1.57 1.72 1.53 1.42 1.43 1.57 1.23 1.70 1.48 1.88 1.63 1.71
2020 0.84 1.36 1.91 1.54 1.61 1.48 1.39 1.41 1.59 1.19 1.66 1.46 1.76 1.56 1.64
2021 0.81 1.38 1.82 1.62 1.64 1.55 1.33 1.35 1.63 1.19 1.67 1.51 1.70 1.53 1.66
2022   NA   NA   NA   NA   NA 1.41   NA   NA   NA   NA   NA   NA   NA   NA   NA
      BRA  CHL  CHN  EST  IND  IDN  ISR  RUS  SVN  ZAF  COL  LVA  LTU  ARG  BGR
1960 6.06 4.70 4.45 1.98 5.91 5.55 3.95 2.52 2.18 6.16 6.74 1.94 2.40 3.11 2.31
1961 6.03 4.66 3.86 1.98 5.90 5.57 3.80 2.45 2.26 6.14 6.71 1.94 2.40 3.10 2.29
1962 5.98 4.60 6.09 1.95 5.89 5.59 3.77 2.36 2.27 6.11 6.66 1.91 2.40 3.09 2.24
1963 5.91 4.54 7.51 1.89 5.88 5.60 3.81 2.27 2.28 6.08 6.58 1.85 2.40 3.08 2.21
1964 5.82 4.46 6.67 1.94 5.86 5.61 3.93 2.18 2.32 6.03 6.48 1.79 2.40 3.07 2.19
1965 5.70 4.36 6.61 1.88 5.83 5.62 3.99 2.13 2.45 5.97 6.33 1.74 2.40 3.06 2.09
1966 5.57 4.26 6.31 1.87 5.79 5.60 3.89 2.10 2.48 5.91 6.16 1.76 2.40 3.05 2.03
1967 5.42 4.14 5.81 1.90 5.75 5.58 3.64 2.04 2.38 5.85 5.96 1.80 2.40 3.05 2.02
1968 5.27 4.03 6.51 2.03 5.70 5.54 3.82 1.99 2.28 5.78 5.74 1.83 2.40 3.05 2.27
1969 5.12 3.90 6.18 2.13 5.65 5.51 3.83 1.97 2.17 5.72 5.51 1.88 2.40 3.06 2.27
1970 4.97 3.78 6.09 2.17 5.59 5.45 3.97 1.99 2.21 5.63 5.28 2.02 2.40 3.08 2.17
1971 4.84 3.65 5.52 2.19 5.52 5.36 3.94 2.03 2.16 5.57 5.06 2.04 2.41 3.11 2.10
1972 4.71 3.53 5.11 2.13 5.44 5.29 3.71 2.04 2.14 5.49 4.86 2.03 2.34 3.15 2.03
1973 4.60 3.41 4.73 2.06 5.36 5.22 3.68 2.01 2.18 5.41 4.68 1.96 2.22 3.20 2.15
1974 4.50 3.29 4.17 2.07 5.28 5.09 3.71 2.00 2.10 5.30 4.53 2.00 2.21 3.25 2.29
1975 4.42 3.18 3.57 2.04 5.19 5.04 3.68 1.98 2.16 5.19 4.40 1.97 2.18 3.30 2.23
1976 4.34 3.08 3.24 2.07 5.11 4.92 3.70 1.97 2.17 5.07 4.29 1.93 2.18 3.34 2.24
1977 4.27 2.98 2.84 2.06 5.03 4.81 3.47 1.95 2.16 4.94 4.18 1.89 2.14 3.36 2.21
1978 4.20 2.89 2.72 2.02 4.96 4.72 3.28 1.92 2.19 4.85 4.07 1.87 2.08 3.36 2.15
1979 4.12 2.81 2.75 2.00 4.89 4.61 3.21 1.90 2.22 4.82 3.97 1.87 2.05 3.34 2.16
1980 4.04 2.74 2.74 2.02 4.83 4.49 3.14 1.89 2.11 4.78 3.86 1.90 1.99 3.30 2.05
1981 3.94 2.69 2.79 2.07 4.77 4.36 3.06 1.91 1.96 4.71 3.74 1.90 1.98 3.25 2.00
1982 3.84 2.65 2.97 2.08 4.70 4.25 3.12 2.04 1.93 4.70 3.63 1.98 1.97 3.20 2.01
1983 3.72 2.62 2.56 2.16 4.64 4.10 3.21 2.11 1.82 4.63 3.53 2.13 2.10 3.16 2.01
1984 3.60 2.60 2.61 2.17 4.56 3.94 3.13 2.06 1.75 4.57 3.43 2.15 2.07 3.12 2.01
1985 3.47 2.59 2.63 2.12 4.48 3.71 3.12 2.05 1.72 4.50 3.34 2.09 2.08 3.10 1.97
1986 3.34 2.59 2.72 2.17 4.40 3.53 3.09 2.15 1.65 4.41 3.27 2.21 2.12 3.08 2.02
1987 3.23 2.59 2.76 2.26 4.31 3.42 3.05 2.22 1.64 4.35 3.21 2.21 2.11 3.06 1.96
1988 3.11 2.59 2.54 2.26 4.22 3.33 3.06 2.12 1.63 4.18 3.16 2.16 2.02 3.04 1.97
1989 3.01 2.59 2.52 2.22 4.13 3.22 3.03 2.01 1.52 3.98 3.12 2.05 1.98 3.02 1.90
1990 2.91 2.58 2.51 2.05 4.05 3.10 3.02 1.89 1.46 3.72 3.08 2.01 2.03 3.00 1.82
1991 2.82 2.56 1.93 1.80 3.96 3.06 2.91 1.73 1.42 3.62 3.05 1.86 2.01 2.97 1.66
1992 2.72 2.53 1.78 1.71 3.88 2.94 2.93 1.55 1.34 3.48 3.01 1.73 1.97 2.93 1.55
1993 2.67 2.48 1.69 1.49 3.80 2.88 2.92 1.39 1.33 3.37 2.97 1.51 1.74 2.88 1.46
1994 2.62 2.43 1.63 1.42 3.72 2.84 2.90 1.40 1.32 3.26 2.92 1.39 1.57 2.83 1.37
1995 2.58 2.37 1.59 1.38 3.65 2.80 2.88 1.34 1.29 3.17 2.86 1.26 1.55 2.77 1.23
1996 2.52 2.31 1.55 1.37 3.58 2.77 2.94 1.27 1.28 2.99 2.80 1.16 1.49 2.72 1.23
1997 2.47 2.24 1.53 1.32 3.51 2.74 2.93 1.22 1.25 2.73 2.74 1.11 1.47 2.67 1.09
1998 2.41 2.17 1.52 1.28 3.45 2.66 2.98 1.23 1.23 2.63 2.68 1.10 1.46 2.62 1.11
1999 2.33 2.11 1.53 1.30 3.38 2.58 2.94 1.16 1.21 2.56 2.63 1.18 1.46 2.58 1.23
2000 2.26 2.06 1.63 1.36 3.35 2.54 2.95 1.20 1.26 2.41 2.57 1.25 1.39 2.54 1.26
2001 2.18 2.01 1.56 1.32 3.30 2.50 2.89 1.22 1.21 2.37 2.52 1.22 1.29 2.51 1.21
2002 2.10 1.97 1.57 1.36 3.22 2.46 2.89 1.29 1.21 2.32 2.46 1.26 1.23 2.49 1.21
2003 2.02 1.94 1.57 1.36 3.12 2.43 2.95 1.32 1.20 2.36 2.40 1.32 1.26 2.46 1.23
2004 2.00 1.92 1.61 1.47 3.05 2.42 2.90 1.34 1.25 2.44 2.33 1.29 1.27 2.44 1.29
2005 1.97 1.91 1.62 1.52 2.96 2.43 2.84 1.29 1.26 2.51 2.26 1.39 1.29 2.42 1.32
2006 1.93 1.90 1.64 1.58 2.86 2.45 2.88 1.31 1.31 2.55 2.20 1.46 1.33 2.40 1.38
2007 1.88 1.90 1.67 1.69 2.78 2.49 2.90 1.42 1.38 2.55 2.14 1.54 1.36 2.38 1.49
2008 1.84 1.90 1.70 1.72 2.72 2.48 2.96 1.50 1.53 2.68 2.08 1.58 1.45 2.37 1.56
2009 1.83 1.89 1.71 1.70 2.67 2.46 2.96 1.54 1.53 2.50 2.03 1.46 1.50 2.36 1.66
2010 1.81 1.88 1.69 1.72 2.60 2.45 3.03 1.57 1.57 2.44 1.99 1.36 1.50 2.35 1.57
2011 1.80 1.82 1.67 1.61 2.54 2.50 3.00 1.58 1.56 2.44 1.96 1.33 1.55 2.34 1.51
2012 1.77 1.80 1.80 1.56 2.47 2.49 3.05 1.69 1.58 2.45 1.93 1.44 1.60 2.33 1.50
2013 1.75 1.79 1.71 1.52 2.41 2.43 3.03 1.71 1.55 2.43 1.91 1.52 1.59 2.32 1.48
2014 1.77 1.77 1.77 1.54 2.31 2.39 3.08 1.75 1.58 2.42 1.88 1.65 1.63 2.31 1.53
2015 1.78 1.74 1.67 1.58 2.29 2.35 3.09 1.78 1.57 2.36 1.86 1.70 1.70 2.30 1.53
2016 1.71 1.68 1.77 1.60 2.27 2.31 3.11 1.76 1.58 2.26 1.84 1.74 1.69 2.24 1.54
2017 1.74 1.60 1.81 1.59 2.20 2.26 3.11 1.62 1.62 2.33 1.82 1.69 1.63 2.17 1.56
2018 1.75 1.56 1.55 1.67 2.18 2.23 3.09 1.58 1.60 2.42 1.79 1.60 1.63 2.04 1.56
2019 1.70 1.55 1.50 1.66 2.11 2.22 3.01 1.50 1.61 2.48 1.77 1.61 1.61 1.99 1.58
2020 1.65 1.54 1.28 1.58 2.05 2.19 2.90 1.51 1.59 2.40 1.74 1.55 1.48 1.91 1.56
2021 1.64 1.54 1.16 1.61 2.03 2.17 3.00 1.49 1.64 2.37 1.72 1.57 1.36 1.89 1.58
2022   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
      HRV  CYP  MLT  ROU  SAU  PER  CRI   EU
1960 2.20   NA   NA   NA 7.63 6.94 6.71 2.62
1961 2.19   NA   NA   NA 7.63 6.92 6.65 2.62
1962 2.17   NA   NA   NA 7.64 6.90 6.54 2.61
1963 2.12   NA   NA   NA 7.65 6.86 6.39 2.65
1964 2.12   NA   NA   NA 7.67 6.81 6.19 2.67
1965 2.21   NA   NA   NA 7.66 6.75 5.96 2.62
1966 2.21   NA   NA   NA 7.66 6.68 5.70 2.58
1967 2.07   NA   NA   NA 7.66 6.60 5.42 2.53
1968 1.99   NA   NA   NA 7.63 6.51 5.03 2.45
1969 1.91   NA   NA   NA 7.60 6.42 4.84 2.41
1970 1.83   NA   NA   NA 7.58 6.32 4.59 2.37
1971 1.95   NA   NA   NA 7.56 6.21 4.36 2.36
1972 1.97   NA   NA   NA 7.54 6.09 4.16 2.29
1973 1.98   NA   NA   NA 7.48 5.97 3.99 2.23
1974 1.95   NA   NA   NA 7.43 5.84 3.89 2.23
1975 1.92   NA   NA 2.59 7.37 5.71 3.80 2.18
1976 1.90   NA   NA 2.54 7.33 5.58 3.75 2.14
1977 1.91   NA   NA 2.57 7.30 5.44 3.70 2.09
1978 1.92   NA   NA 2.52 7.26 5.31 3.66 2.05
1979 1.94   NA   NA 2.49 7.23 5.17 3.65 2.03
1980 1.92   NA 1.99 2.43 7.19 5.04 3.59 2.00
1981 1.91   NA 1.87 2.36 7.13 4.92 3.56 1.95
1982 1.90 2.48 2.04 2.17 7.05 4.80 3.54 1.93
1983 1.88 2.50 1.97 2.06 6.95 4.68 3.53 1.89
1984 1.87 2.52 1.97 2.26 6.84 4.57 3.52 1.87
1985 1.81 2.43 1.99 2.31 6.70 4.46 3.51 1.84
1986 1.76 2.46 1.94 2.39 6.55 4.35 3.46 1.84
1987 1.74 2.38 1.97 2.38 6.36 4.25 3.40 1.82
1988 1.74 2.49 2.10 2.30 6.17 4.14 3.34 1.83
1989 1.67 2.36 2.11 2.22 6.00 4.03 3.27 1.79
1990 1.67 2.41 2.05 1.83 5.83 3.91 3.21 1.78
1991 1.55 2.32 2.10 1.59 5.66 3.79 3.12 1.73
1992 1.39 2.48 2.12 1.51 5.49 3.67 3.04 1.70
1993 1.43 2.24 2.01 1.43 5.32 3.55 2.96 1.62
1994 1.43 2.17 1.89 1.40 5.14 3.43 2.89 1.56
1995 1.50 2.03 1.82 1.33 4.95 3.32 2.80 1.51
1996 1.64 1.95 2.01 1.30 4.77 3.20 2.71 1.50
1997 1.69 1.86 1.95 1.32 4.59 3.10 2.64 1.48
1998 1.45 1.76 1.81 1.32 4.42 3.00 2.53 1.45
1999 1.38 1.67 1.72 1.30 4.25 2.92 2.48 1.45
2000 1.39 1.64 1.69 1.31 4.12 2.85 2.41 1.47
2001 1.46 1.57 1.50 1.27 3.91 2.74 2.33 1.44
2002 1.42 1.49 1.45 1.27 3.71 2.69 2.20 1.43
2003 1.41 1.51 1.48 1.30 3.50 2.66 2.14 1.45
2004 1.43 1.52 1.40 1.33 3.34 2.67 2.08 1.47
2005 1.50 1.48 1.38 1.40 3.24 2.69 2.04 1.48
2006 1.47 1.52 1.36 1.42 3.21 2.69 2.01 1.51
2007 1.48 1.44 1.35 1.45 3.18 2.67 2.01 1.53
2008 1.55 1.48 1.43 1.60 3.06 2.63 2.02 1.59
2009 1.58 1.47 1.42 1.66 2.95 2.61 1.98 1.59
2010 1.55 1.44 1.36 1.59 2.85 2.57 1.93 1.58
2011 1.48 1.35 1.45 1.47 2.81 2.54 1.90 1.54
2012 1.51 1.39 1.42 1.52 2.78 2.49 1.88 1.54
2013 1.46 1.30 1.36 1.41 2.74 2.43 1.84 1.51
2014 1.46 1.31 1.38 1.52 2.69 2.38 1.82 1.54
2015 1.40 1.32 1.37 1.58 2.64 2.34 1.79 1.54
2016 1.42 1.37 1.37 1.64 2.59 2.31 1.75 1.56
2017 1.42 1.32 1.26 1.71 2.58 2.28 1.74 1.55
2018 1.47 1.32 1.23 1.76 2.55 2.26 1.71 1.54
2019 1.47 1.33 1.14 1.77 2.50 2.24 1.63 1.52
2020 1.48 1.36 1.13 1.80 2.47 2.22 1.56 1.50
2021 1.58 1.39 1.13 1.81 2.43 2.19 1.53 1.53
2022   NA   NA   NA   NA   NA   NA   NA   NA

We will discuss what each step does later, and we will also see that much simpler solutions are available.

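# Download the OECD fertility data in long format (one row per country-year)
fertility_df <- readr::read_csv("https://stats.oecd.org/sdmx-json/data/DP_LIVE/.FERTILITY.../OECD?contentType=csv&detail=code&separator=comma&csv-lang=en")
# Keep only the country code, the year, and the fertility rate
fertility_df <- fertility_df[, c("LOCATION", "TIME", "Value")]

# Reshape to wide format: one column per country
fertility_df <- tidyr::pivot_wider(fertility_df, names_from = "LOCATION",
                   values_from = "Value")

# Use the year as row name, then drop column 51 (the OECD average)
fertility_df <- tibble::column_to_rownames(fertility_df, "TIME")
fertility_df <- fertility_df[, -51]

# Make sure the result is a plain data.frame
fertility_df <- data.frame(fertility_df)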

At the end of these steps, we have a table with one row per year and one column per country, so each column shows the time series of that country's fertility rate.
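The same long-to-wide reshaping can be sketched in base R alone. This is only an illustration on made-up values (the `AAA`/`BBB` codes and numbers are hypothetical, not OECD figures), and it is not necessarily the simpler solution discussed later in the course.

```r
# Toy long-format data: one row per country-year (made-up values)
long_df <- data.frame(
  LOCATION = c("AAA", "AAA", "BBB", "BBB"),
  TIME     = c(2000, 2001, 2000, 2001),
  Value    = c(2.5, 2.4, 1.6, 1.5)
)

# tapply() with two grouping variables returns a matrix:
# rows are the TIME values, columns are the LOCATION values.
# Each cell holds exactly one value here, so mean() just returns it.
wide_mat <- with(long_df, tapply(Value, list(TIME, LOCATION), mean))

# Convert to a data.frame; the years become the row names
wide_df <- as.data.frame(wide_mat)
wide_df
```

As in the table above, the years end up as row names and each country gets its own column.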

Apply - MARGIN = 1

apply(fertility_df, MARGIN = 1, above_replacement_prop)

      1960       1961       1962       1963       1964       1965       1966 
0.92000000 0.92000000 0.92000000 0.92000000 0.92000000 0.92000000 0.88000000 
      1967       1968       1969       1970       1971       1972       1973 
0.86000000 0.86000000 0.82000000 0.80000000 0.78000000 0.68000000 0.66000000 
      1974       1975       1976       1977       1978       1979       1980 
0.66000000 0.62745098 0.60784314 0.56862745 0.50980392 0.52941176 0.48076923 
      1981       1982       1983       1984       1985       1986       1987 
0.42307692 0.39622642 0.43396226 0.39622642 0.37735849 0.43396226 0.43396226 
      1988       1989       1990       1991       1992       1993       1994 
0.47169811 0.37735849 0.35849057 0.32075472 0.30188679 0.28301887 0.28301887 
      1995       1996       1997       1998       1999       2000       2001 
0.24528302 0.26415094 0.24528302 0.24528302 0.24528302 0.22641509 0.22641509 
      2002       2003       2004       2005       2006       2007       2008 
0.22641509 0.18867925 0.18867925 0.18867925 0.20754717 0.22641509 0.20754717 
      2009       2010       2011       2012       2013       2014       2015 
0.20754717 0.18867925 0.15094340 0.18867925 0.16981132 0.16981132 0.16981132 
      2016       2017       2018       2019       2020       2021       2022 
0.15094340 0.13207547 0.11320755 0.11320755 0.09433962 0.09433962 0.00000000

If MARGIN = 1, apply runs the function on each row, so here it calculates the proportion of observations above 2.1 in each year. The returned value is a named vector, which means that you can refer to its values by index or by name.

Apply - MARGIN = 1

apply(fertility_df, MARGIN = 1, above_replacement_prop)[1]

1960 
0.92

apply(fertility_df, MARGIN = 1, above_replacement_prop)["2020"]

      2020 
0.09433962
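The row-wise call and the two ways of indexing can be tried on a small toy table. The helper `prop_above` below is a hypothetical stand-in written for this sketch; it is assumed to behave roughly like `above_replacement_prop` (share of non-missing values above 2.1), and the data are made up.

```r
# Hypothetical helper: share of non-missing values above a threshold
prop_above <- function(x, threshold = 2.1) {
  mean(x > threshold, na.rm = TRUE)
}

# Toy table: years as row names, one column per (made-up) country
toy_df <- data.frame(
  AAA = c(2.5, 2.2, 1.9),
  BBB = c(3.0, 2.0, 1.8),
  CCC = c(NA, 2.3, 1.7),
  row.names = c("2000", "2001", "2002")
)

# MARGIN = 1: the function is called once per row
row_props <- apply(toy_df, MARGIN = 1, prop_above)

row_props[1]        # refer to a value by position
row_props["2001"]   # refer to a value by name
```

Storing the result in a variable first avoids recomputing the whole `apply` call every time a single value is needed.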

Apply - MARGIN = 2

apply(fertility_df, MARGIN = 2, above_replacement_prop)

       AUS        AUT        BEL        CAN        CZE        DNK        FIN 
0.25806452 0.19354839 0.19354839 0.19354839 0.22580645 0.14285714 0.14516129 
       FRA        DEU        GRC        HUN        ISL        IRL        ITA 
0.24193548 0.16129032 0.35483871 0.06349206 0.56451613 0.48387097 0.27419355 
       JPN        KOR        LUX        MEX        NLD        NZL        NOR 
0.12903226 0.37096774 0.14516129 0.90322581 0.20967742 0.43548387 0.23809524 
       POL        PRT        SVK        ESP        SWE        CHE        TUR 
0.46774194 0.35483871 0.46774194 0.33870968 0.16129032 0.17741935 0.87096774 
       GBR        USA        BRA        CHL        CHN        EST        IND 
0.20967742 0.22580645 0.69354839 0.64516129 0.50000000 0.17741935 0.96774194 
       IDN        ISR        RUS        SVN        ZAF        COL        LVA 
1.00000000 1.00000000 0.17741935 0.33870968 1.00000000 0.77419355 0.08064516 
       LTU        ARG        BGR        HRV        CYP        MLT        ROU 
0.33870968 0.93548387 0.25806452 0.11290323 0.32500000 0.09523810 0.29787234 
       SAU        PER        CRI         EU 
1.00000000 1.00000000 0.70967742 0.27419355

If MARGIN = 2, apply runs the function on each column, so here it calculates the proportion of observations above 2.1 for each country.
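For this particular calculation, the column-wise result can also be obtained without `apply`, because the mean of a logical comparison is a proportion. The sketch below uses made-up data and a hypothetical helper `prop_above`, assumed to mimic `above_replacement_prop`.

```r
# Hypothetical helper: share of non-missing values above a threshold
prop_above <- function(x, threshold = 2.1) {
  mean(x > threshold, na.rm = TRUE)
}

# Toy table with made-up values
toy_df <- data.frame(
  AAA = c(2.5, 2.2, 1.9),
  BBB = c(3.0, 2.0, 1.8),
  CCC = c(NA, 2.3, 1.7),
  row.names = c("2000", "2001", "2002")
)

# MARGIN = 2: the function is called once per column
col_props <- apply(toy_df, MARGIN = 2, prop_above)

# Same idea in one step: compare every cell, then average each column
col_props_alt <- colMeans(toy_df > 2.1, na.rm = TRUE)

all.equal(col_props, col_props_alt)
```

`rowMeans` plays the same role for the MARGIN = 1 case.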

Lapply

lapply serves the same purpose, but it works on lists and vectors: it applies a function to each element and always returns a list.

mylist

[[1]]
  id            name        species
1  1 Captain America          human
2  2            Hulk           <NA>
3  3           Groot Flora colossus
4  4         Strange          human

[[2]]
[1] "XL"    "M"     "L"     "S"     "L"     "other"

Length of each element (for a data frame, length is the number of columns):

lapply(mylist, length)

[[1]]
[1] 3

[[2]]
[1] 6
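Any function can be passed to `lapply`, including one written on the spot. As a small sketch on a made-up list (`toy_list` is a hypothetical stand-in, not the `mylist` object above), the anonymous function below counts rows for data frames and elements for everything else.

```r
# Toy list: a small data frame and a character vector (made-up values)
toy_list <- list(
  data.frame(id = 1:4, score = c(10, 20, 30, 40)),
  c("XL", "M", "L")
)

# length() of a data frame is its number of columns
col_counts <- lapply(toy_list, length)

# Anonymous function: rows for data frames, length otherwise
n_items <- lapply(toy_list, function(el) {
  if (is.data.frame(el)) nrow(el) else length(el)
})

col_counts
n_items
```

Both results are lists with one entry per element of `toy_list`.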

Sapply

sapply works the same way as lapply, but it simplifies the result to a vector (or matrix) whenever possible (the s stands for simplify).

sapply(mylist, length)

[1] 3 6
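The simplification only happens when the results have a compatible shape. The sketch below, on made-up lists, shows a case where `sapply` returns a vector and one where it falls back to a list; `vapply` is a base R alternative that lets you state the expected result type up front.

```r
# Toy list with made-up values
toy_list <- list(a = 1:3, b = c(2.5, NA, 4), c = letters[1:2])

# Every result has length 1, so sapply() returns a named vector
lens <- sapply(toy_list, length)

# vapply() does the same but checks that each result is a single integer
lens_checked <- vapply(toy_list, length, FUN.VALUE = integer(1))

# Results of different lengths cannot be simplified: sapply() returns a list
uneven <- sapply(list(a = 1:3, b = 4:5), function(x) x[x > 2])

lens
is.list(uneven)
```

Because the return type of `sapply` depends on the data, `vapply` is often the safer choice inside functions.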