---
title: "Using the survregVB Package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{survregVB}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
bibliography: references.bib
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl
---

```{r pre,echo=FALSE,results='hide'}
library(knitr)
opts_chunk$set(warning = FALSE, message = FALSE, cache = FALSE)
```

# Overview of survregVB

The `survregVB` function provides a fast and accessible solution for variational inference in accelerated failure time (AFT) models for right-censored survival times following a log-logistic distribution. It provides an efficient alternative to Markov chain Monte Carlo (MCMC) methods by implementing a mean-field variational Bayes (VB) algorithm for parameter estimation. The VB approach employs a coordinate ascent algorithm and incorporates a piecewise approximation technique when computing expectations to achieve conjugacy [@xian2024]. @Rcpp

## The AFT Model

The log-logistic AFT model without shared frailty is specified as follows for the $i^{th}$ subject in the sample, $i=1,...,n$ , $T_i$:

$$
log(T_i):=Y=X_i^T\beta+bz_i
$$

where $X_i$ is column vector of $p-1$ covariates and a constant one (i.e. $X_i=(1,x_i1,...,x_i(p-1))^T$), $\beta$ is a vector of coefficients for the covariates, $z_i$ is a random variable following a standard logistic distribution with scale parameter $b$.

The `survregVB` function uses a Bayesian framework to obtain the optimal variational densities of parameters $\beta$ and $b$ by maximizing the evidence based lower bound (ELBO). To do so, we assume prior distributions:

-   $\beta\sim\text{MVN}(\mu_0,\sigma_0^2I_{p*p})$, with precision $v_0=1/\sigma^2$, and

-   $b\sim\text{Inverse-Gamma}(\alpha_0,\omega_0)$

where $\mu_0,v_0,\alpha_0$ and $\omega_0$ are known hyperparameters. At the end of the model fitting process, `survregVB` obtains the approximated posterior distributions:

-   $q^*(\beta)$, a $N_p(\mu,\Sigma)$ density function, and

-   $q^*(b)$, an $\text{Inverse-Gamma}(\alpha,\omega)$ density function,

where the parameters $\mu,\Sigma,\alpha$ and $\omega$ are obtained via the VB algorithm [@xian2024].

### The AFT Model With Shared Frailty

We can also use the `survregVB` function is to fit it a shared frailty log-logistic AFT regression model that accounts for intra-cluster correlation through a cluster-specific random intercept. For time $T_{ij}$ of the $j_{th}$ subject from the $i_{th}$ cluster in the sample, in a sample with $i=1,...,K$ clusters and $j=1,...,n_i$ subjects:

$$
\log(T_{ij})=\gamma_i+X_{ij}^T\beta+b\epsilon_{ij}
$$

where $X_{ij}$ is column vector of $p-1$ covariates and a constant one (i.e. $X_{ij}=(1,x_{ij1},...,x_{ij(p-1)})^T$), $\beta$ is a vector of coefficients, $\gamma_i$ is a random intercept for the $i^{th}$ cluster, $\epsilon_{ij}$ is a variable following a standard logistic distribution with scale parameter $b$.

In addition to parameters $\beta$ and $b$, `survregVB` obtains the optimal variational densities of parameters $\sigma^2_\gamma$ (the frailty variance) and $\gamma_i$. In addition to $\beta$ and $b$, we assume prior distributions:

-   $\gamma_i|\sigma^2_\gamma\mathop{\sim}\limits^{\mathrm{iid}}N(0,\sigma^2_\gamma)$, and

-   $\sigma^2\sim\text{Inverse-Gamma}(\lambda_0,\eta_0)$,

where $\mu_0,v_0,\alpha_0$, $\omega_0$, $\lambda_0$ and $\eta_0$ are known hyperparameters. At the end of the model fitting process, `survregVB`obtains the approximated posterior distribution,

-   $q^*(\gamma_i)$, a $N(\tau^*_i,\sigma^{2*}_i)$ density function, and

-   $q^*(\sigma^2_\gamma)$, an $\text{Inverse-Gamma}(\lambda^*,\eta^*)$ density function,

where the parameters $\mu,\Sigma,\alpha,\omega,\tau_i, \sigma_i, \lambda$ and $\eta$ are obtained via the VB algorithm [@xian2024a].

# Getting Started using survregVB

First, we load the survregVB and survival libraries.

```{r loadLibrary}
library(survregVB)
library(survival)
```

## Fitting the Model

For the `dnase` data set included in the package, our goal is to fit it a log-logistic AFT regression model of the form:

$$
\log(T):=Y=\beta_0+\beta_1x_1+\beta_2x_2+bz
$$ where `trt` ($x_1$, treatment, binary) and `fev` ($x_2$, forced expiratory volume, continuous) are the covariates of interest, and the right-censoring indicator is `infect` [@xian2024].

The following fits the model with priors based off previous studies:

```{r}
fit <- survregVB(
  formula = Surv(time, infect) ~ trt + fev,
  data = dnase,
  alpha_0 = 501,
  omega_0 = 500,
  mu_0 = c(4.4, 0.25, 0.04),
  v_0 = 1,
  max_iteration = 10000,
  threshold = 0.0005,
  na.action = na.omit
)
print(fit)
summary(fit)
```

## Fitting a Model with Shared Frailty

We will fit the `simulation_frailty` data set included in the package to a log-logistic AFT regression model with shared frailty. For the $j^{th}$ subject in the $i^{th}$ cluster, $i=1,...,K$ and $j=1,...,n_i$:

$$
\log(T_i):=Y_i=0.5+\beta_1x_{1i}+\beta_2x_{2i}+\gamma_i+b\epsilon_i
$$

The following fits the model with non-informative priors [@xian2024a]:

```{r}
fit_frailty <- survregVB(
  formula = Surv(Time.15, delta.15) ~ x1 + x2,
  data = simulation_frailty,
  alpha_0 = 3,
  omega_0 = 2,
  mu_0 = c(0, 0, 0),
  v_0 = 0.1,
  lambda_0 = 3,
  eta_0 = 2,
  cluster = cluster,
  max_iteration = 100,
  threshold = 0.01
)
print(fit_frailty)
summary(fit_frailty)
```

# Session info

The following package and versions were used in the production of this vignette.

```{r echo=FALSE}
sessionInfo()
```

# References
