---
title: "Tips to speed up computation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tips to speed up computation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(aorsf)
```

## Go faster

Analyses can slow to a crawl when models need hours to run. In this article you will find a few tricks to prevent this bottleneck when using `orsf()`.

## Don't specify a `control`

The default `control` for `orsf()` is `NULL` because, if unspecified, `orsf()` will pick the fastest possible `control` for you depending on the type of forest being grown. The default `control` run-time compared to other approaches can be striking. For example:


```{r}

time_fast <- system.time(
 expr = orsf(pbc_orsf, 
             formula = time+status~. -id, 
             n_tree = 5)
)

time_net <- system.time(
 expr = orsf(pbc_orsf, 
             formula = time+status~. -id, 
             control = orsf_control_survival(method = 'net'), 
             n_tree = 5)
)

# unspecified control is much faster
time_net['elapsed'] / time_fast['elapsed']

```

## Use `n_thread`

The `n_thread` argument uses multi-threading to run `aorsf` functions in parallel when possible. If you know how many threads you want, e.g. you want exactly 5, set `n_thread = 5`. If you aren't sure how many threads you have available but want to use a feasible amount, using `n_thread = 0` (the default) tells `aorsf` to do that for you. 

```{r}

# automatically pick number of threads based on amount available

orsf(pbc_orsf, 
     formula = time+status~. -id, 
     n_tree = 5,
     n_thread = 0)

```

Note: sometimes multi-threading is not possible. For example, because R is a single threaded language, multi-threading cannot be applied when `orsf()` needs to call R functions from C++, which occurs when a customized R function is used to find linear combination of variables or compute prediction accuracy.

## Do less

There are some inputs in `orsf()` that can be adjusted to make it run faster:

- set `n_retry` to `0` 

- set `oobag_pred_type` to `'none'` 

- set `importance` to `'none'` 

- increase `split_min_events`, `split_min_obs`, `leaf_min_events`, or `leaf_min_obs` to make trees stop growing sooner

- increase `split_min_stat` to enforce more strict requirements for growing deeper trees.

Applying these tips:

```{r}

orsf(pbc_orsf, 
     formula = time+status~., 
     n_thread = 0, 
     n_tree = 5, 
     n_retry = 0,
     oobag_pred_type = 'none', 
     importance = 'none',
     split_min_events = 20, 
     leaf_min_events = 10,
     split_min_stat = 10)

```

While modifying these inputs can make `orsf()` run faster, they can also impact prediction accuracy. 

## Show progress

Setting `verbose_progress = TRUE` doesn't make anything run faster, but it can help make it *feel* like things are running less slow.

```{r}

verbose_fit <- orsf(pbc_orsf, 
                    formula = time+status~. -id, 
                    n_tree = 5, 
                    verbose_progress = TRUE)

```

## Don't wait. Estimate!

Instead of running a model and hoping it will be fast, you can estimate how long a specification of that model will take by using `no_fit = TRUE` in the call to `orsf()`.

```{r}

fit_spec <- orsf(pbc_orsf, 
                 formula = time+status~. -id, 
                 control = orsf_control_survival(method = 'net'), 
                 n_tree = 2000,
                 no_fit = TRUE)

# how much time it takes to estimate training time:
system.time(
 time_est <- orsf_time_to_train(fit_spec, n_tree_subset = 5)
)

# the estimated training time:
time_est

```


