---
title: "Running bhmbasket on HPC"
author: "Stephan Wojciekowski"
date: '`r format(Sys.time(), "%B %d, %Y")`'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Running bhmbasket on HPC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r knitr_options, include = FALSE}
knitr::opts_chunk$set(
  eval     = FALSE,  # en- / disables R code evaluation globally
  cache    = FALSE,  # en- / disables R code caching globally
  collapse = TRUE,
  comment  = "#>"
)
```

This vignette provides a short example on how to use the R package `bhmbasket` in a high performance computing (HPC) environment using the R packages `doFuture` and `future.batchtools`.

```{r setup}
library(bhmbasket)
library(doFuture)
library(future.batchtools)

rng_seed <- 5440
set.seed(rng_seed)
```

## Setup of the parallel backend
The code below provides an example for specifying a parallel backend using the future framework with the SLURM job scheduler.
Kindly see the documentation of the R packages `doFuture` and `future.batchtools` for further options.
This code is to be run on the master node.
```{r SLURM_Setup}
## Adapt the SLURM template to requirements
job_time  <- 1   # time for job in hours
n_workers <- 24  # number of worker nodes
n_cpus    <- 16  # number of cpus per worker node
gb_memory <- 2   # memory [GB] per cpu

slurm <- tweak(batchtools_slurm,
           template  = system.file('templates/slurm-simple.tmpl',
                                   package = 'batchtools'),
           workers   = n_workers,
           resources = list(
             walltime  = 60 * 60 * job_time,
             ncpus     = n_cpus,
             memory    = 1000 * gb_memory))

## Register the parallel backend 
registerDoFuture()

## Specify how the futures should be resolved
plan(list(slurm, multisession))
```

## Running some bhmbasket code on HPC
The R package `bhmbasket` makes use of the foreach framework and runs with every applicable parallel backend. 
With a parallel backend registered as shown above and running the code on the master node, the job scheduler will automatically distribute the jobs to the worker nodes via `plan(slurm)`, and with the nested parallelization built into `performAnalyses()`, each worker node makes use of its CPUs via `plan(multisession)`.

Below is some example code, which was taken from the examples section of `?bhmbasket::getEstimates`.
Due to the foreach framework, no adjustments to the code are necessary.
Kindly note that running this small example on a HPC environment will most likely not result in a performance improvement.
```{r}
scenarios_list <- simulateScenarios(
    n_subjects_list     = list(c(10, 20, 30)),
    response_rates_list = list(c(0.1, 0.2, 3)),
    n_trials            = 10)

  analyses_list <- performAnalyses(
    scenario_list       = scenarios_list,
    target_rates        = c(0.1, 0.1, 0.1),
    calc_differences    = matrix(c(3, 2, 2, 1), ncol = 2),
    n_mcmc_iterations   = 100)

  getEstimates(analyses_list)
```
