A wrapper around purrr::map. Applies a function to each element of a vector and saves the intermediate results after a given number of iterations.

The map functions transform their input by applying a function to each element of a list or atomic vector and returning an object of the same length as the input. The cp_map functions work exactly the same way, but they also create a hidden folder in your current working directory and save the results each time a checkpoint is reached. If you rerun the code, it reads the results from this cache folder and resumes evaluation where it left off.
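The checkpoint-and-resume idea described above can be sketched in base R. This is only a minimal illustration, not how currr is implemented: checkpointed_map, cache_file, every and slow_square are hypothetical names introduced here, and the sketch assumes that .f never returns NULL and that the input is unchanged between runs.

```r
# Hypothetical helper (not part of currr): a checkpointed lapply() in base R.
checkpointed_map <- function(x, f, ..., cache_file, every = 10) {
  # Resume from the cache file if it exists; otherwise start from scratch.
  results <- if (file.exists(cache_file)) readRDS(cache_file) else list()
  start <- length(results) + 1

  for (i in seq_along(x)) {
    if (i < start) next                 # already computed in an earlier run
    results[[i]] <- f(x[[i]], ...)
    # Write a checkpoint after every `every`-th element and at the very end.
    if (i %% every == 0 || i == length(x)) saveRDS(results, cache_file)
  }
  results
}

cache <- tempfile(fileext = ".rds")
calls <- 0
slow_square <- function(v) {
  calls <<- calls + 1                   # count how often the function really runs
  v^2
}

first  <- checkpointed_map(1:5, slow_square, cache_file = cache, every = 2)
second <- checkpointed_map(1:5, slow_square, cache_file = cache, every = 2)
```

The second call finds all five results in the cache file, so slow_square is not evaluated again and calls stays at 5.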

cp_map() always returns a list.
map_lgl(), map_dbl() and map_chr() return an atomic vector of the indicated type (or die trying). For these functions, .f must return a length-1 vector of the appropriate type.

cp_map_dfr(.x, .f, ..., name = NULL, cp_options = list())

Arguments

.x

A list or atomic vector.

.f

A function, specified in one of the following ways:

A named function, e.g. mean.
An anonymous function, e.g. \(x) x + 1 or function(x) x + 1.
A formula, e.g. ~ .x + 1. You must use .x to refer to the first argument. Only recommended if you require backward compatibility with older versions of R.

...

Additional arguments passed on to the mapped function.

name

Name for the subfolder in the cache folder. If you do not specify one, cp_map uses the name of the function combined with the name of .x. This is dangerous, since the generated name can appear multiple times in your code. In addition, changing .x will trigger a rerun of the code, which you may want to avoid. (If a subset of .x matches the cached one and the function is the same, the elements of this subset are not re-evaluated but read from the cache.)

cp_options

Options for the evaluation: wait, n_checkpoint, workers, fill.

wait: An integer specifying after how many iterations the console shows the intermediate results (default 1). If its value is between 0 and 1, it is taken as the proportion of iterations to wait (for example, if the length of .x is 100 and wait is 0.5, you get the result back after 50 iterations). Set it to Inf to get the results back only after the full evaluation. If its value is not Inf, the evaluation runs in a background job.
n_checkpoint: Number of checkpoints at which intermediate results are saved (default 100).
workers: Number of CPU cores to use (the parallel package is called in the background). Set to 1 (default) to avoid parallel computing.
fill: When a result that is not fully evaluated is returned, should its length be the same as that of .x? (default TRUE).

You can also set these options globally with options(), e.g. options(currr.n_checkpoint = 200). Additional options: currr.unchanged_message (TRUE/FALSE), currr.progress_length.
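As a hedged illustration of the two ways of setting these options, the snippet below builds a cp_options list and sets a session-wide default with options(). The values are arbitrary, and the commented-out call is only a placeholder: my_fun and the name "my_run" are hypothetical.

```r
# Per-call settings: collect them in a list and pass it as cp_options.
cp_opts <- list(
  wait = Inf,          # return only after the full evaluation
  n_checkpoint = 20,   # number of checkpoints
  workers = 1,         # no parallel computing
  fill = TRUE
)
# cp_map_dfr(.x = 1:100, .f = my_fun, name = "my_run", cp_options = cp_opts)

# Session-wide default: set once, then restore the previous state afterwards.
old <- options(currr.n_checkpoint = 200)
current <- getOption("currr.n_checkpoint")
options(old)
```

Storing the return value of options() in old makes it easy to restore the earlier setting once the run is finished.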

Value

A tibble.

Examples

# Run them on console!
options(currr.folder = ".currr")

avg_n <- function(.data, .col, x) {
  Sys.sleep(.01)

  .data |>
    dplyr::pull({{ .col }}) |>
    (\(m) mean(m) * x) ()
}


cp_map(.x = 1:10, .f = avg_n, .data = iris, .col = Sepal.Length, name = "iris_mean")
#> ℹ Intermediate result return available only at Rstudio console.
#> |██████████████████████████████████████████████████ | 100% ETA:  0 sec

#> [[1]]
#> [1] 5.843333
#> 
#> [[2]]
#> [1] 11.68667
#> 
#> [[3]]
#> [1] 17.53
#> 
#> [[4]]
#> [1] 23.37333
#> 
#> [[5]]
#> [1] 29.21667
#> 
#> [[6]]
#> [1] 35.06
#> 
#> [[7]]
#> [1] 40.90333
#> 
#> [[8]]
#> [1] 46.74667
#> 
#> [[9]]
#> [1] 52.59
#> 
#> [[10]]
#> [1] 58.43333
#> 

# same function, read from cache
cp_map(.x = 1:10, .f = avg_n, .data = iris, .col = Sepal.Length, name = "iris_mean")
#> ℹ Intermediate result return available only at Rstudio console.
#> ✓ Everything is unchanged. Reading cache.

#> [[1]]
#> [1] 5.843333
#> 
#> [[2]]
#> [1] 11.68667
#> 
#> [[3]]
#> [1] 17.53
#> 
#> [[4]]
#> [1] 23.37333
#> 
#> [[5]]
#> [1] 29.21667
#> 
#> [[6]]
#> [1] 35.06
#> 
#> [[7]]
#> [1] 40.90333
#> 
#> [[8]]
#> [1] 46.74667
#> 
#> [[9]]
#> [1] 52.59
#> 
#> [[10]]
#> [1] 58.43333
#> 

remove_currr_cache()
