---
title: "Evaluate continuous biomarkers with caROC"
author: "Ziyi Li (zli16@mdanderson.org)"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Evaluate continuous biomarkers with caROC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

`caROC` is an R package for assessing continuous biomarkers. The metrics considered are **specificity at a controlled sensitivity level**, **sensitivity at a controlled specificity level**, and the **receiver operating characteristic** (ROC) curve. When a specific sub-population is of interest, all of these statistics can also be computed in a sub-population-specific analysis. Both categorical and continuous covariates can be adjusted for when computing these metrics.
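To make the three metrics concrete, here is a minimal base-R sketch on toy data, without any covariate adjustment. The helpers `spec_at_sens()` and `sens_at_spec()` are hypothetical names used only for illustration and are not part of caROC. The sketch assumes that larger biomarker values indicate disease.

```r
# Toy data (hypothetical values, not caROC output)
set.seed(1)
controls <- rnorm(500, mean = 0, sd = 1)   # biomarker in controls
cases    <- rnorm(500, mean = 1, sd = 1)   # biomarker in cases

# Specificity at a controlled sensitivity level:
# pick the threshold so that a fraction `sens` of cases lies above it,
# then report the fraction of controls at or below that threshold.
spec_at_sens <- function(cases, controls, sens) {
  threshold <- quantile(cases, probs = 1 - sens, names = FALSE)
  mean(controls <= threshold)
}

# Sensitivity at a controlled specificity level:
# pick the threshold so that a fraction `spec` of controls lies at or below it,
# then report the fraction of cases above that threshold.
sens_at_spec <- function(cases, controls, spec) {
  threshold <- quantile(controls, probs = spec, names = FALSE)
  mean(cases > threshold)
}

spec_at_sens(cases, controls, sens = 0.8)
sens_at_spec(cases, controls, spec = 0.8)

# Empirical ROC curve: sensitivity as a function of 1 - specificity
fpr <- seq(0, 1, by = 0.01)
tpr <- sapply(1 - fpr, function(s) sens_at_spec(cases, controls, s))
```

caROC computes covariate-adjusted versions of these quantities, so its estimates will generally differ from this unadjusted sketch.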

## Installation and quick start

### Install caROC

Install `caROC` from GitHub with `devtools`:

```{r install, eval=FALSE, message=FALSE, warning=FALSE}
library(devtools)
install_github("ziyili20/caROC")
```

### How to get help for caROC

Please post any caROC questions to the GitHub Issues page of the
caROC repository: <https://github.com/ziyili20/caROC/issues>

### Quick start on evaluating continuous biomarker with covariates adjusted

```{r quick_start, eval = FALSE, message=FALSE}
library(caROC)
## diseaseData, controlData, and userFormula are defined in the next section
## get specificity at controlled sensitivity levels 0.2, 0.8, 0.9
caROC(diseaseData, controlData, userFormula,
      control_sensitivity = c(0.2, 0.8, 0.9),
      control_specificity = NULL)

## get covariate-adjusted ROC curve with curve-based monotonicity restoration
curveROC <- caROC(diseaseData, controlData, userFormula,
                  mono_resp_method = "ROC",
                  verbose = FALSE)
```

### Illustrating the usage of caROC in detail

The tutorial is based on a simulated dataset:

```{r getdata, eval = TRUE, warning=FALSE, message=FALSE}
library(caROC)
### fix the random seed so the simulated data are reproducible
set.seed(123)
### n1: number of cases
### n0: number of controls
n1 = n0 = 1000

## Z_D and Z_C are the covariates in the disease and control groups
Z_D1 <- rbinom(n1, size = 1, prob = 0.3)
Z_D2 <- rnorm(n1, 0.8, 1)

Z_C1 <- rbinom(n0, size = 1, prob = 0.7)
Z_C2 <- rnorm(n0, 0.8, 1)

Y_C_Z0 <- rnorm(n0, 0.1, 1)
Y_D_Z0 <- rnorm(n1, 1.1, 1)
Y_C_Z1 <- rnorm(n0, 0.2, 1)
Y_D_Z1 <- rnorm(n1, 0.9, 1)

## M0 and M1 are the outcome of interest (biomarker to be evaluated) in the control and disease groups
M0 <- Y_C_Z0 * (Z_C1 == 0) + Y_C_Z1 * (Z_C1 == 1) + Z_C2
M1 <- Y_D_Z0 * (Z_D1 == 0) + Y_D_Z1 * (Z_D1 == 1) + 1.5 * Z_D2

diseaseData <- data.frame(M = M1, Z1 = Z_D1, Z2 = Z_D2)
controlData <- data.frame(M = M0, Z1 = Z_C1, Z2 = Z_C2)

## we are interested in evaluating biomarker M while adjusting for covariates Z1 and Z2
userFormula = "M~Z1+Z2"
```

## 1. Covariate-adjusted sensitivity at controlled specificity level (or the reverse)

### 1.1 Compute pooled sensitivity at controlled specificity level

To compute covariate-adjusted specificity at controlled sensitivity levels, specify `control_sensitivity` and leave `control_specificity` as `NULL`.

`mono_resp_method` selects the monotonicity restoration method, either "none" or "ROC". `whichSE` selects how the standard error is computed, either "bootstrap" or "sample", i.e. bootstrap-based or sample-based SE. Run `?caROC` for more details on these arguments.

```{r controlspec, warning=FALSE, message=FALSE}
caROC(diseaseData,controlData,userFormula,
      control_sensitivity = c(0.2,0.8, 0.9),
      control_specificity = NULL,
      mono_resp_method = "ROC",
      whichSE = "bootstrap",nbootstrap = 100,
      CI_alpha = 0.95, logit_CI = TRUE)
```
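As a rough illustration of the idea behind a bootstrap-based standard error, the sketch below resamples cases and controls and recomputes an unadjusted specificity at a controlled sensitivity level. It uses toy data and a hypothetical helper `boot_se()` that is not part of caROC. The resampling scheme caROC uses internally may differ.

```r
# Toy data (hypothetical values, not caROC output)
set.seed(2)
controls <- rnorm(300, mean = 0, sd = 1)
cases    <- rnorm(300, mean = 1, sd = 1)

# Unadjusted specificity at a controlled sensitivity level
spec_at_sens <- function(cases, controls, sens) {
  threshold <- quantile(cases, probs = 1 - sens, names = FALSE)
  mean(controls <= threshold)
}

# Resample cases and controls separately (with replacement), recompute the
# statistic each time, and take the standard deviation of the replicates.
boot_se <- function(cases, controls, sens, nboot = 200) {
  reps <- replicate(nboot, {
    spec_at_sens(sample(cases, replace = TRUE),
                 sample(controls, replace = TRUE),
                 sens)
  })
  sd(reps)
}

se_hat <- boot_se(cases, controls, sens = 0.8)
se_hat
```

A larger number of bootstrap replicates gives a more stable standard error at the cost of run time, which is the trade-off behind the `nbootstrap` argument.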

To compute covariate-adjusted sensitivity at controlled specificity levels, specify `control_specificity` and leave `control_sensitivity` as `NULL`.

```{r controlsens, warning=FALSE, message=FALSE}
caROC(diseaseData,controlData,userFormula,
      control_sensitivity = NULL,
      control_specificity = c(0.7,0.8, 0.9),
      mono_resp_method = "none",
      whichSE = "sample",nbootstrap = 100,
      CI_alpha = 0.95, logit_CI = TRUE)
```
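The calls above set `logit_CI = TRUE`. This vignette does not spell out how that interval is built. One common construction, sketched here as an assumption and not as caROC's exact method, forms the interval on the logit scale with the delta method and transforms it back, which keeps the limits inside (0, 1). The point estimate, standard error, and the name `level` below are hypothetical.

```r
# Hypothetical point estimate and standard error (not caROC output)
est   <- 0.97
se    <- 0.02
level <- 0.95
z     <- qnorm(1 - (1 - level) / 2)

# Plain Wald interval: the upper limit can exceed 1 near the boundary
wald_ci <- est + c(-1, 1) * z * se

# Logit-scale interval: transform, build the interval, transform back.
# Delta method: se(logit(p)) is approximately se(p) / (p * (1 - p)).
logit_est <- qlogis(est)
logit_se  <- se / (est * (1 - est))
logit_ci  <- plogis(logit_est + c(-1, 1) * z * logit_se)

wald_ci
logit_ci
```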

### 1.2 Sub-population specific sensitivity at controlled specificity level

Given the covariates of a subpopulation, we can also compute specificity at controlled sensitivity levels (or, by setting `control_specificity` instead, sensitivity at controlled specificity levels).
```{r controlspecss, warning=FALSE, message=FALSE}
target_covariates = c(1, 0.7, 0.9)

sscaROC(diseaseData,controlData,
               userFormula = userFormula,
               control_sensitivity = c(0.2,0.8, 0.9),
               target_covariates = target_covariates,
               control_specificity = NULL,
               mono_resp_method = "none",
               whichSE = "sample",nbootstrap = 100,
               CI_alpha = 0.95, logit_CI = TRUE)
```

You can also specify covariates for multiple subpopulations:
```{r multicontrolspecss, message=FALSE, warning=FALSE}
target_covariates = matrix(c(1, 0.7, 0.9,
                      1, 0.8, 0.8), 2, 3, byrow = TRUE)
sscaROC(diseaseData,controlData,
               userFormula = userFormula,
               control_sensitivity = c(0.2,0.8, 0.9),
               target_covariates = target_covariates,
               control_specificity = NULL,
               mono_resp_method = "none",
               whichSE = "sample",nbootstrap = 100,
               CI_alpha = 0.95, logit_CI = TRUE)
```
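In the examples above, each row of `target_covariates` has three entries for the formula `M~Z1+Z2`, which suggests the layout (intercept, Z1, Z2). Assuming that layout, a matrix for several subpopulations can be built from a data frame of covariate values with base R's `model.matrix()`. The data frame name `subpops` is hypothetical.

```r
# Hypothetical subpopulations described by their covariate values
subpops <- data.frame(Z1 = c(0.7, 0.8),
                      Z2 = c(0.9, 0.8))

# model.matrix() prepends the intercept column of 1s and orders the
# columns to match the right-hand side of the formula.
target_covariates <- model.matrix(~ Z1 + Z2, data = subpops)
target_covariates
```

Building the matrix this way keeps the column order tied to the formula, which helps when more covariates are added.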


## 2. Covariate-adjusted ROC curve

### 2.1 Pooled ROC

Obtaining the covariate-adjusted ROC curve, with sensitivity controlled across its whole range, is straightforward. The `mono_resp_method` argument chooses whether monotonicity is restored when constructing the ROC curve: "none" (no monotonicity restoration) or "ROC" (curve-based monotonicity restoration).

```{r ROC, warning=FALSE, message=FALSE}
### ROC with curve-based monotonicity restoration
curveROC <- caROC(diseaseData,controlData,userFormula,
                 mono_resp_method = "ROC", 
                 verbose = FALSE)
```
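To see why monotonicity restoration can be needed, note that an estimated ROC curve may dip locally even though a true ROC curve never decreases. The sketch below shows one simple restoration rule, a running maximum, on hypothetical curve values. It illustrates the general idea only; the rule behind `mono_resp_method = "ROC"` in caROC may be different.

```r
# Hypothetical curve estimate with small local violations of monotonicity
fpr     <- seq(0, 1, by = 0.1)
tpr_raw <- c(0.00, 0.35, 0.33, 0.58, 0.70, 0.68, 0.80, 0.88, 0.93, 0.97, 1.00)

# Running maximum: each point is raised to the largest value seen so far,
# which yields a non-decreasing curve.
tpr_mono <- cummax(tpr_raw)

any(diff(tpr_raw) < 0)     # the raw curve decreases somewhere
all(diff(tpr_mono) >= 0)   # the restored curve never decreases
```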

Plot the ROC curve:

```{r plotROC, warning=FALSE, message=FALSE}
## save only the settable graphical parameters so they can be restored cleanly
oldpar <- par(no.readonly = TRUE)
par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0))
plot_caROC(curveROC)
par(oldpar)
```

Construct a confidence band for the ROC curve:

```{r ROC2, warning=FALSE, message=FALSE}
curveROC_CB <- caROC_CB(diseaseData, controlData,
                        userFormula,
                        mono_resp_method = "ROC",
                        CB_alpha = 0.95,
                        nbin = 100, verbose = FALSE)
```

Plot the confidence band:

```{r plotROCband, warning=FALSE, message=FALSE}
## save only the settable graphical parameters so they can be restored cleanly
oldpar <- par(no.readonly = TRUE)
par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0))
plot_caROC_CB(curveROC_CB, add = FALSE, lty = 2, col = "blue")  
par(oldpar)
```

Or plot the ROC curve and the confidence band on the same plot:

```{r plotROCband2, warning=FALSE, message=FALSE}
## save only the settable graphical parameters so they can be restored cleanly
oldpar <- par(no.readonly = TRUE)
par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0))
plot_caROC(curveROC)
plot_caROC_CB(curveROC_CB, add = TRUE, lty = 2, col = "blue")
par(oldpar)
```
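For intuition about what a band around an ROC curve represents, the sketch below builds a pointwise 95% bootstrap percentile band for an unadjusted empirical ROC curve on toy data. This is a simplified stand-in: the helper `roc_on_grid()` is hypothetical, the band is pointwise and not simultaneous, and `caROC_CB` may construct its band differently.

```r
# Toy data (hypothetical values, not caROC output)
set.seed(3)
controls <- rnorm(200, mean = 0, sd = 1)
cases    <- rnorm(200, mean = 1, sd = 1)
fpr_grid <- seq(0.05, 0.95, by = 0.05)

# Empirical sensitivity at each false positive rate on the grid
roc_on_grid <- function(cases, controls, fpr_grid) {
  thresholds <- quantile(controls, probs = 1 - fpr_grid, names = FALSE)
  sapply(thresholds, function(th) mean(cases > th))
}

# Each column is the ROC curve recomputed on one bootstrap resample
boot_curves <- replicate(200, roc_on_grid(sample(cases, replace = TRUE),
                                          sample(controls, replace = TRUE),
                                          fpr_grid))

# Pointwise 2.5% and 97.5% quantiles across resamples at each grid point
band  <- apply(boot_curves, 1, quantile, probs = c(0.025, 0.975))
lower <- band[1, ]
upper <- band[2, ]
```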

### 2.2 Sub-population specific ROC

The ROC curve for a given subpopulation can be computed in the same way:

```{r ssROC, warning=FALSE, message=FALSE}
target_covariates = c(1, 0.7, 0.9)
myROC <- sscaROC(diseaseData,
                 controlData,
                 userFormula,
                 target_covariates,
                 global_ROC_controlled_by = "sensitivity",
                 mono_resp_method = "none")
## save only the settable graphical parameters so they can be restored cleanly
oldpar <- par(no.readonly = TRUE)
par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0))
plot_sscaROC(myROC, lwd = 1.6)
par(oldpar)
```

A confidence band can also be computed, but it may take roughly 10-20 minutes for a dataset with 2000 samples.

```{r ssROCband, eval=FALSE}
myROCband <- sscaROC_CB(diseaseData,
                        controlData,
                        userFormula,
                        mono_resp_method = "none",
                        target_covariates,
                        global_ROC_controlled_by = "sensitivity",
                        CB_alpha = 0.95,
                        logit_CB = FALSE,
                        nbootstrap = 100,
                        nbin = 100,
                        verbose = FALSE)
## save only the settable graphical parameters so they can be restored cleanly
oldpar <- par(no.readonly = TRUE)
par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0))
plot_sscaROC_CB(myROCband, col = "purple", lty = 2)
par(oldpar)
```


## 3. Threshold at controlled sensitivity/specificity for given covariate values

In clinical settings, it is useful to know the biomarker threshold that achieves a controlled sensitivity or specificity level for given covariate values.

```{r threshold, warning=FALSE, message=FALSE}
### these are the given covariates of interest
new_covariates <- data.frame(M = 1,
                      Z1 = 0.7,
                      Z2 = 0.9)
### controlling sensitivity levels
caThreshold(userFormula, new_covariates,
            diseaseData = diseaseData,
            controlData = NULL,
            control_sensitivity = c(0.7,0.8,0.9),
            control_specificity = NULL)
            
### controlling specificity levels
caThreshold(userFormula,new_covariates,
            diseaseData = NULL,
            controlData = controlData,
            control_sensitivity = NULL,
            control_specificity = c(0.7,0.8,0.9))
```
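To illustrate what a covariate-specific threshold means, the sketch below uses a deliberately simplified location-shift working model in base R: a linear model for the biomarker in controls, plus a quantile of its residuals. This is an illustrative alternative under stated assumptions (errors that do not depend on the covariates, toy data, hypothetical object names) and is not a description of how `caThreshold` works internally.

```r
# Toy control data (hypothetical values, not caROC output)
set.seed(4)
n <- 500
toy_controls <- data.frame(Z1 = rbinom(n, size = 1, prob = 0.7),
                           Z2 = rnorm(n, mean = 0.8, sd = 1))
toy_controls$M <- 0.1 + 0.1 * toy_controls$Z1 + toy_controls$Z2 + rnorm(n)

# Location-shift working model: M = linear predictor + error,
# with the error distribution assumed not to depend on the covariates.
fit <- lm(M ~ Z1 + Z2, data = toy_controls)

toy_covariates <- data.frame(Z1 = 0.7, Z2 = 0.9)
spec_levels    <- c(0.7, 0.8, 0.9)

# Threshold at each specificity level:
# predicted mean at the given covariates + matching residual quantile
thresholds <- as.numeric(predict(fit, newdata = toy_covariates)) +
  quantile(residuals(fit), probs = spec_levels, names = FALSE)
thresholds
```

Under this working model, a control with the given covariate values falls at or below each threshold with probability close to the corresponding specificity level, and higher specificity levels give higher thresholds.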