---
title: "Theoretical background"
author: "Adam Kucharski"
output:
  bookdown::html_vignette2:
    fig_caption: yes
    code_folding: show
pkgdown:
  as_is: true
bibliography: references.json
link-citations: true
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{Theoretical background}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>"
)
```

_finalsize_ provides methods to calculate the final proportion of individuals in different age groups infected during an epidemic, as predicted by an age-stratified SIR model. In this vignette we lay out the mathematical concepts behind the functionality available in the package.

## From SIR to final size with homogeneous mixing

The dynamics of the proportion of susceptible and infectious individuals over time in the standard susceptible-infectious-recovered (SIR) epidemic model are given by the following ordinary differential equations:
$$
\frac{d S(t)}{d t} = -\beta S(t) I(t) \\
\frac{d I(t)}{d t}  =  \beta S(t) I(t) -\gamma I(t)
$$
where $\beta$ is the transmission rate and $\gamma$ is the rate of recovery.

To calculate the proportion infected, we need to derive an expression for the proportion of susceptibles who got infected during the epidemic, i.e. $\phi = 1- S(\infty)/S(0)$. Rather than simulating the model over a very long time period (i.e. $t \rightarrow \infty$), we can instead calculate $dI/dS$ and integrate:

$$
\frac{d I}{d S} = - 1 + \frac{\gamma}{\beta S} \\
I(t) = - S(t) + \frac{\gamma}{\beta} \log S(t) + c
$$
where $c$ depends on initial conditions. Under the assumption the population is fully susceptible initially (i.e. $S(0)=1$), and $I(t) \rightarrow 0$ as $t \rightarrow \infty$, we obtain the following:
$$
\log S(\infty) = - R_0 [1-S(\infty) ] \\
\mathrm{log}~\phi  -R_0 (1-\phi) = 0 = F(\phi)
$$
This equation doesn't have an analytical solution, but there are various methods available to allow it to be solved numerically to find the unique solution in (0,1]. For example, we can use Newton’s method (`solver = "newton"` in the `final_size()` function) to find the final epidemic size:

1. Select an arbitrary initial estimate $x_0$ and tolerance $T$ ;
2. Solve $\frac{dF}{d \phi} (x_0) \Delta x = −F(x_0) ~ \text{for } ~ \Delta x$ ;
3. Set $x_{i+1} = x_i + \Delta x$;
4. Repeat until $x_i$ converges sufficiently (i.e. difference in each step is less than $T$)

## Varying susceptibility

We can extend the above formulation to account for variable susceptibility among individuals, following the methods outlined in @miller2012. First, we define $p(x)$ to be the probability density for a randomly chosen individual to be of susceptibility type $x$. If we define the probability an individual of type $x$ to be ultimately infected as $\phi(x)$, then the overall proportion of the population to be infected is defined as follows, depending on whether $p(x)$ is a continuous or discrete distribution:

$$
\hat\phi = \int_0^\infty \phi(x) p(x) dx \\
\text{or} \\
\hat\phi = \sum_x \phi(x) p(x) dx \\
$$

If we define $x\hat\phi$ to be the expected number of transmissions to a given individual of type $x$, we can therefore calculate the probability that this individual remains susceptible. This gives the following relationship between $\phi$ and $\hat\phi$:

$$
1-\phi(x) = S(x,0) e^{-x \hat\phi } \\
$$
And hence we can combine the above to give:
$$
\hat\phi = \int_0^\infty (1-   S(x,0) e^{-x \hat\phi }) p(x) dx \\
\hat\phi = 1- \int_0^\infty  S(x,0) e^{-x \hat\phi } p(x) dx
$$

Under the assumption that $S(x,0)$ is near 1 initially (i.e. f), we therefore obtain:
$$
\hat\phi = 1- \int_0^\infty p(x) e^{-x \hat\phi } dx
$$

## Final size equation with heterogeneous mixing

We can extend the same derivation for the above homogenous model for populations with multiple age groups, following @andreasen2011. The age-dependent ODEs are defined as follows:

$$
\frac{d S(a,t)}{d t} = -S(a,t) \sum_b \beta_{ab} I(b,t) \\
\frac{d I(a,t)}{d t}  =  S(a,t) \sum_b \beta_{ab} I(b,t) -\gamma I(a,t)
$$
Where $\beta_{a,b}$ is the transmission rate from group $b$ to group $a$. If we define $R_{a,b} =\beta_{a,b}/\gamma$ to be group specific reproduction number, we can rearrange the above and integrate to derive an expression for the final size in each age group:

$$
- \gamma \sum_b R_{a,b} I(b,t)   = \frac{d S(a,t)}{d t} \frac{1}{S(a,t)} \\
- \gamma \sum_b R_{a,b}  \int_0^\infty I(b,t)  ~dt  = \int_0^\infty \frac{d S(a,t)}{d t} \frac{1}{S(a,t)} ~dt = \mathrm{log}~S(a,\infty) -\mathrm{log}~S(a,0) = \mathrm{log}~\frac{S(a,\infty)}{S(a,0)}  \\
 -\gamma \int_0^\infty I(a,t)  ~dt = \int_0^\infty \frac{d S(a,t)}{d t} +\frac{dI(a,t)}{dt} ~dt = S(a,\infty) -S(a,0)  ~.
$$
and hence:

$$
\mathrm{log}~\frac{S(a,\infty)}{S(a,0)}  = - \sum_b R_{a,b} [ S(b,\infty) -S(b,0)]
$$

If we define $\underline\phi = (\phi_1, ... \phi_n)$ to be a vector of final sizes in each of $n$ age group, we can rewrite the above as

$$
\mathrm{log}~\underline\phi  -\mathbf{R} (1-\underline\phi) = 0 = F(\underline\phi)
$$
where $\mathbf{R}$ is a matrix with entries $R_{i,j}$. Once again, we can use Newton's method, this time with $\underline\phi$ as a vector, and hence we need to solve a set of linear equations in step 2.

## References
