---
title: "Modelling uncertainty in R₀"
output:
  bookdown::html_vignette2:
    fig_caption: yes
    code_folding: show
pkgdown:
  as_is: true
bibliography: references.json
link-citations: true
vignette: >
  %\VignetteIndexEntry{Modelling uncertainty in R₀}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

Epidemic final size calculations are sensitive to input data such as the $R_0$ of the infection. Such values can often be uncertain in the early stages of an outbreak. This uncertainty can be included in final size calculations by running `final_size()` for values drawn from a distribution, and summarising the outcomes.

::: {.alert .alert-warning}
**New to _finalsize_?** It may help to read the ["Get started"](finalsize.html), ["Modelling heterogeneous contacts"](varying_contacts.html), or ["Modelling heterogeneous susceptibility"](varying_susceptibility.Rmd) vignettes first!
:::

::: {.alert .alert-primary}
## Use case {-}

The infection **parameter ($R_0$) is uncertain**. We want to know how much variation this could cause in the estimated final size of the epidemic.
:::

::: {.alert .alert-secondary}
### What we have {-}

  1. In addition to the $R_0$, demography data, social contact data, and data on the distribution of susceptibility among age groups;
  2. A measure of the error in the estimated $R_0$ of the infection.

### What we assume {-}

  1. An SIR epidemic, and the complete partitioning of individuals into different demographic and infection risk groups.
:::

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  dpi = 300
)
```

```{r setup, message=FALSE, warning=FALSE, class.source = 'fold-hide'}
# load finalsize
library(finalsize)
library(socialmixr)
library(ggplot2)
```

## Getting $R_0$, contact and demography data, and susceptibility

This example uses social contact data from the POLYMOD project [@mossong2008] to estimate the final size of an epidemic in the U.K. These data are provided with the `socialmixr` package.

These data are handled just as in the ["Get started"](finalsize.html) vignette. This example also considers an infection with an $R_0$ of 1.5.

```{r}
# get UK polymod data from socialmixr
polymod <- socialmixr::polymod
contact_data <- socialmixr::contact_matrix(
  polymod,
  countries = "United Kingdom",
  age.limits = c(0, 5, 18, 40, 65),
  symmetric = TRUE
)

# get the contact matrix and demography data
contact_matrix <- t(contact_data$matrix)
demography_vector <- contact_data$demography$population

# scale the contact matrix so the largest eigenvalue is 1.0
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))

# divide each row of the contact matrix by the corresponding demography
contact_matrix <- contact_matrix / demography_vector

n_demo_grps <- length(demography_vector)
```

```{r}
# mean R0 is 1.5
r0_mean <- 1.5
```

For simplicity, this example considers a scenario in which susceptibility to infection does not vary.

```{r}
# susceptibility is uniform
susc_uniform <- matrix(
  data = 1,
  nrow = n_demo_grps,
  ncol = 1L
)

# p_susceptibility is uniform
p_susc_uniform <- susc_uniform
```

## Running `final_size` over $R_0$ samples

The basic reproduction number $R_0$ of an infection might be uncertain in the early stages of an epidemic. This uncertainty can be modelled by running `final_size()` multiple times for the same contact, demography, and susceptibility data, while sampling $R_0$ values from a distribution.

This example assumes that the $R_0$ estimate, and the uncertainty around that estimate, is provided as the mean and standard deviation of a normal distribution.

This example considers a normal distribution $N(\mu = 1.5, \sigma = 0.1)$, for an $R_0$ of 1.5. We can draw 1,000 $R_0$ samples from this distribution and run `final_size()` on the contact data and demography data for each sample.

This is quick, as `finalsize` is an Rcpp package with a C++ backend.

```{r}
# create an R0 samples vector
r0_samples <- rnorm(n = 1000, mean = r0_mean, sd = 0.1)
```

### Iterate final_size() {.tabset}

#### With base R

```{r}
# run final size on each sample with the same data
final_size_data <- Map(
  r0_samples, seq_along(r0_samples),
  f = function(r0, i) {
    # the i-th final size estimate
    fs <- final_size(
      r0 = r0,
      contact_matrix = contact_matrix,
      demography_vector = demography_vector,
      susceptibility = susc_uniform,
      p_susceptibility = p_susc_uniform
    )

    fs$replicate <- i
    fs$r0_estimate <- r0
    fs
  }
)

# combine data
final_size_data <- Reduce(x = final_size_data, f = rbind)

# order age groups
final_size_data$demo_grp <- factor(
  final_size_data$demo_grp,
  levels = contact_data$demography$age.group
)

# examine some replicates in the data
head(final_size_data, 15)
```

#### With {tidyverse}

```{r}
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(forcats)

final_size_data <-
  # create a dataframe with values from a vector
  tibble(r0 = r0_samples) %>%
  rownames_to_column() %>%
  # map the function final_size() to all the r0 values
  # with the same set of arguments
  # with {purrr}
  mutate(
    temp = map(
      .x = r0,
      .f = final_size,
      contact_matrix = contact_matrix,
      demography_vector = demography_vector,
      susceptibility = susc_uniform,
      p_susceptibility = p_susc_uniform
    )
  ) %>%
  # unnest all the dataframe outputs in temp
  unnest(temp) %>%
  # relevel the factor variable
  mutate(
    demo_grp = fct_relevel(
      demo_grp,
      contact_data %>%
        pluck("demography") %>%
        pluck("age.group")
    )
  )

head(final_size_data, 15)
```

### Visualise uncertainty in final size

```{r class.source = 'fold-hide', class.source = 'fold-hide', fig.cap="Estimated ranges of the final size of a hypothetical SIR epidemic in age groups of the U.K. population, when the $R_0$ is estimated to be 1.5, with a standard deviation around this estimate of 0.1. In this example, relatively low uncertainty in $R_0$ estimates can also lead to uncertainty in the estimated final size of the epidemic. Points represent means, while ranges extend between the 5th and 95th percentiles.", fig.width=5, fig.height=4}
ggplot(final_size_data) +
  stat_summary(
    aes(
      demo_grp, p_infected
    ),
    fun = mean,
    fun.min = function(x) {
      quantile(x, 0.05)
    },
    fun.max = function(x) {
      quantile(x, 0.95)
    }
  ) +
  scale_y_continuous(
    labels = scales::percent,
    limits = c(0.25, 1)
  ) +
  theme_classic() +
  theme(
    legend.position = "top",
    legend.key.height = unit(2, "mm"),
    legend.title = ggtext::element_markdown(
      vjust = 1
    )
  ) +
  coord_cartesian(
    expand = TRUE
  ) +
  labs(
    x = "Age group",
    y = "% Infected"
  )
```
