purrr::map. Apply a function to each element of a vector, but save the intermediate data after a given number of iterations.
Source: R/cp_map_dfr.R
The map functions transform their input by applying a function to
each element of a list or atomic vector and returning an object of
the same length as the input. The cp_map functions work exactly the
same way, but they create a hidden folder in your current working directory
and save the results each time a checkpoint is reached. This way,
if you rerun the code, it reads the results from the cache folder
and resumes evaluation where it left off.
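The checkpoint-and-resume idea described above can be sketched in base R. This is only an illustration under stated assumptions, not currr's implementation: `checkpointed_lapply`, `cache_file`, and `every` are hypothetical names, and the sketch assumes that the already cached elements form an unchanged prefix of the input.

```r
# Hypothetical helper, NOT part of currr: a checkpointed lapply in base R.
checkpointed_lapply <- function(x, f, cache_file, every = 2) {
  # Resume from an earlier run if a cache file already exists
  results <- if (file.exists(cache_file)) readRDS(cache_file) else list()
  start <- length(results) + 1
  for (i in seq_along(x)) {
    if (i < start) next  # element was computed in a previous run
    results[[i]] <- f(x[[i]])
    # Save after every `every`-th element and after the last one
    if (i %% every == 0 || i == length(x)) saveRDS(results, cache_file)
  }
  results
}

cache_file <- tempfile(fileext = ".rds")
calls <- 0
slow_square <- function(v) {
  calls <<- calls + 1  # count how often the function is really evaluated
  v^2
}

# Simulate an interrupted run that only reached the first four elements
first <- checkpointed_lapply(1:4, slow_square, cache_file)
# Rerun on the full input: only elements 5 and 6 are evaluated
full <- checkpointed_lapply(1:6, slow_square, cache_file)
calls_made <- calls

unlink(cache_file)
```

After both runs, `calls_made` is 6 rather than 10, because the second run reads the first four results from the cache file.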
cp_map_dfr(.x, .f, ..., name = NULL, cp_options = list())
.x: A list or atomic vector.
.f: A function, specified in one of the following ways:
A named function, e.g. mean.
An anonymous function, e.g. \(x) x + 1 or function(x) x + 1.
A formula, e.g. ~ .x + 1. You must use .x to refer to the first argument. Only recommended if you require backward compatibility with older versions of R.
...: Additional arguments passed on to the mapped function.
name: Name for the subfolder in the cache folder. If you do not specify one,
cp_map uses the name of the function combined with the name of .x.
This is risky, since the generated name can appear multiple times in your code.
Also, changing .x will then trigger a rerun of the code, which you may want to avoid.
(If a subset of .x matches the cached one and the function is the same,
the elements of that subset are not re-evaluated but read from the cache.)
cp_options: Options for the evaluation: wait, n_checkpoint, workers, fill.
wait: An integer specifying after how many iterations the console shows the intermediate results (default 1). If its value is between 0 and 1, it is taken as the proportion of iterations to wait (for example, if the length of .x is 100 and you set it to 0.5, you get back the result after 50 iterations). Set it to Inf to get back the results only after full evaluation. If its value is not equal to Inf, the evaluation runs in a background job.
n_checkpoint: Number of checkpoints at which intermediate results are saved (default 100).
workers: Number of CPU cores to use (the parallel package is called in the background). Set to 1 (default) to avoid parallel computing.
fill: When a partially evaluated result is returned, should its length be the same as that of .x? (default TRUE)
You can also set these options globally, e.g. options(currr.n_checkpoint = 200). Additional options: currr.unchanged_message (TRUE/FALSE), currr.progress_length.
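As a small illustration of the options described above, the following base R sketch builds a per-call options list, sets and restores a session-wide default, and reproduces the documented proportional `wait` example. The variable names (`cp_opts`, `wait_fraction`, and so on) are made up for this sketch, and the proportional calculation only mirrors the 0.5 of 100 example from the text; how currr rounds other values is not covered here.

```r
# Per-call options; the element names come from the argument description
cp_opts <- list(wait = Inf, n_checkpoint = 5, workers = 1, fill = TRUE)
# The list would then be passed on as: cp_map_dfr(..., cp_options = cp_opts)

# Session-wide default, set through options() and restored afterwards
old <- options(currr.n_checkpoint = 200)
default_checkpoints <- getOption("currr.n_checkpoint")
options(old)

# Proportional wait: 0.5 of 100 iterations means results come back after 50
wait_fraction <- 0.5
n_iterations <- 100
iterations_before_return <- wait_fraction * n_iterations
```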
A tibble.
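A minimal sketch of how a row-bound result might be obtained, assuming the currr package is installed. It uses the signature shown in the usage line; `square_row` and the cache name "square_rows" are hypothetical and chosen only for this illustration.

```r
# A sketch only: the body runs when the currr package is available.
if (requireNamespace("currr", quietly = TRUE)) {
  # Hypothetical mapped function: returns one row per input element
  square_row <- function(i) data.frame(i = i, square = i^2)

  squares <- currr::cp_map_dfr(
    .x = 1:5,
    .f = square_row,
    name = "square_rows",          # explicit name for the cache subfolder
    cp_options = list(wait = Inf)  # return only after full evaluation
  )
  print(squares)

  # Remove the cache folder again, as the package's own example does
  currr::remove_currr_cache()
}
```

Giving `name` explicitly avoids relying on the automatically generated cache name, which the argument description warns can collide.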
Other map variants: cp_map_chr(), cp_map_dbl(), cp_map_dfc(), cp_map_lgl(), cp_map()
# Run them on console!
options(currr.folder = ".currr")
avg_n <- function(.data, .col, x) {
  Sys.sleep(.01)  # simulate a slow computation
  .data |>
    dplyr::pull({{ .col }}) |>
    (\(m) mean(m) * x)()
}
cp_map(.x = 1:10, .f = avg_n, .data = iris, .col = Sepal.Length, name = "iris_mean")
#> ℹ Intermediate result return available only at Rstudio console.
#> |██████████████████████████████████████████████████ | 10% ETA: 1 sec
|██████████████████████████████████████████████████ | 20% ETA: 1 sec
|██████████████████████████████████████████████████ | 30% ETA: 1 sec
|██████████████████████████████████████████████████ | 40% ETA: 1 sec
|██████████████████████████████████████████████████ | 50% ETA: 1 sec
|██████████████████████████████████████████████████ | 60% ETA: 1 sec
|██████████████████████████████████████████████████ | 70% ETA: 1 sec
|██████████████████████████████████████████████████ | 80% ETA: 1 sec
|██████████████████████████████████████████████████ | 90% ETA: 1 sec
|██████████████████████████████████████████████████ | 100% ETA: 0 sec
#> [[1]]
#> [1] 5.843333
#>
#> [[2]]
#> [1] 11.68667
#>
#> [[3]]
#> [1] 17.53
#>
#> [[4]]
#> [1] 23.37333
#>
#> [[5]]
#> [1] 29.21667
#>
#> [[6]]
#> [1] 35.06
#>
#> [[7]]
#> [1] 40.90333
#>
#> [[8]]
#> [1] 46.74667
#>
#> [[9]]
#> [1] 52.59
#>
#> [[10]]
#> [1] 58.43333
#>
# same function, read from cache
cp_map(.x = 1:10, .f = avg_n, .data = iris, .col = Sepal.Length, name = "iris_mean")
#> ℹ Intermediate result return available only at Rstudio console.
#> ✓ Everything is unchanged. Reading cache.
#> [[1]]
#> [1] 5.843333
#>
#> [[2]]
#> [1] 11.68667
#>
#> [[3]]
#> [1] 17.53
#>
#> [[4]]
#> [1] 23.37333
#>
#> [[5]]
#> [1] 29.21667
#>
#> [[6]]
#> [1] 35.06
#>
#> [[7]]
#> [1] 40.90333
#>
#> [[8]]
#> [1] 46.74667
#>
#> [[9]]
#> [1] 52.59
#>
#> [[10]]
#> [1] 58.43333
#>
remove_currr_cache()