---
title: "SelectionBias"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SelectionBias}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: "references.bib"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(SelectionBias)
```

# Introduction

Selecting a study population from a larger source population, based on the
research question, is a common procedure, for example in an observational study
with data from a population register. Subjects who fulfill all the selection
criteria are included in the study population, and subjects who do not fulfill
at least one selection criterion are excluded from the study population. These
selections might introduce a systematic error when estimating a causal effect,
commonly referred to as selection bias. Selection bias can also arise if the
selections are involuntary, for example, if there are dropouts or other missing
values for some individuals in the study. In an applied study, it is often of
interest to assess the magnitude of potential biases using a sensitivity
analysis, such as bounding the bias. Different bounds, the SV (Smith and
VanderWeele), sharp, AF (assumption-free), GAF (generalized assumption-free), 
and CAF  (counterfactual assumption-free), for the causal estimand under 
selection bias can be calculated in this R package `SelectionBias`. The content 
is:

* `zika_learner`: a simulated dataset of zika virus and microcephaly inspired
both by data and a previous example [@de2018association;@smith2019bounding]. 
* `sensitivityparametersM()`: a function that calculates the sensitivity
parameters for the SV and GAF bounds for an assumed model following the
M-structure in Figure 1. 
* `SVbound()`: a function that calculates the SV lower and upper bound for the 
risk ratio or risk difference in either the total or subpopulation for 
sensitivity parameters given by the user, or calculated from
`sensitivityparametersM()`.
* `sharpbound()`: a function that calculates the sharp lower and upper bound for
the risk ratio or risk difference in either the total or subpopulation for 
sensitivity parameters given by the user, or calculated from
`sensitivityparametersM()`.
* `AFbound()`: a function that calculates the AF lower and upper bounds for the
risk ratio or risk difference in either the total or subpopulation for a
dataset or probabilities from the data.
* `GAFbound()`: a function that calculates the GAF lower and upper bounds for
the risk ratio or risk difference in either the total or subpopulation for a
dataset or probabilities from the data and sensitivity parameters either given
by the user, or calculated from `sensitivityparametersM()`.
* `CAFbound()`: a function that calculates the CAF lower and upper bounds for
the risk ratio or risk difference in either the total or subpopulation for a
dataset or probabilities from the data and sensitivity parameters.
* `checksharpSVbound()`: a function that evaluates if the SV bound is sharp.

For the formulas of the bounds as well as the theory behind them, we refer to
the original papers [@smith2019bounding;@zetterstrom2022selection;@zetterstrom2023Rpackage;@zetterstrom2024;zetterstrom2025investigations].

![Figure 1. The generalized M-structure.](Figures/genM.png){width=30%}


# R package

## Simulated zika dataset

To illustrate the bounds, a simulated dataset, `zika_learner`, is constructed.
It is inspired by a numerical zika example used in @smith2019bounding together
with a case-control study that investigates the effect of zika virus on
microcephaly [@de2018association]. The variables included are: 

* *Living area* $(V)$
* *Socioeconomic status, SES* $(U)$
* *Zika* $(T)$
* *Microcephaly* $(Y)$
* *Birth* $(S_1)$
* *Public hospital* $(S_2)$

The relationships between the variables are illustrated in Figure 2 and Table 1.
The prevalences of the variables, and strengths of dependencies between them,
are chosen to mimic real data and the assumed values for the sensitivity
parameters in @smith2019bounding. The simulated data mimics a cohort with 5000
observations, even though the original study is a case-control study. For more
details of the variables and the models, see @zetterstrom2023Rpackage.

![Figure 2. Causal model for the `zika_learner` dataset.](Figures/zikaDag.png){width=50%}

The causal dependencies are generated by the logit models described in Table 1. 

Table: Table 1. Data generating process for the dataset `zika_learner`. Models
generating causal dependencies are logistic, $g(X'\theta)$, for predictor
variables $X$ (including a constant 1 for the intercept) and model parameter 
$\theta$ (including the intercept).

| Model                                   | Coefficients ($\theta$)/Proportions | Function argument    |
| :-------------------------------------: | :---------------------------------: | -------------------: |
| $P(V=1)$                                | $0.85$                              | `Vval`               |
| $P(U=1)$                                | $0.50$                              | `Uval`               |
| $P(T=1|V)=g(V'\theta_T)$                | $(-6.20,1.75)$                      | `Tcoef`              |
| $P(Y=1|T,U)=g[(T,U)'\theta_{Y}]$        | $(-5.20,5.00,-1.00)$                | `Ycoef`              |
| $P(S_1=1|V,U,T)=g[(V,U,T)'\theta_{S1}]$ | $(1.20,0.00,2.00,-4.00)$            | `Scoef`              |
| $P(S_2=1|V,U,T)=g[(V,U,T)'\theta_{S2}]$ | $(2.20,0.50,-2.75,0.00)$            | `Scoef`              |

The data was generated in `R`, version 4.2.0, using the package `arm`,
version 1.13-1, with the following code:

```{r eval = FALSE}
# Seed.
set.seed(158118)

# Number of observations.
nObs = 5000

# The unmeasured variable, living area (V).
urban = rbinom(nObs, 1, 0.85)

# The treatment variable, zika.
zika_prob = arm::invlogit(-6.2 + 1.75 * urban)
zika = rbinom(nObs, 1, zika_prob)

# The unmeasured variable, SES (U).
SES = rbinom(nObs, 1, 0.5)

# The outcome variable, microcephaly.
mic_ceph_prob = arm::invlogit(-5.2 + 5 * zika - 1 * SES)
mic_ceph = rbinom(nObs, 1, mic_ceph_prob)

# The first selection variable, birth.
birth_prob = arm::invlogit(1.2 - 4 * zika + 2 * SES)
birth = rbinom(nObs, 1, birth_prob)

# The second selection variable, hospital.
hospital_prob = arm::invlogit(2.2 + 0.5 * urban - 2.75 * SES)
hospital = rbinom(nObs, 1, hospital_prob)

# The selection indicator.
sel_ind = birth * hospital
```
The resulting proportions of the `zika_learner` data, for the total dataset, the
subset with $S_1=1$ and the subset with $S_1=S_2=1$ are seen in Tables 2-4.

```{r echo = FALSE}
zika_learner2 = zika_learner

zika_learner2$zika = ifelse(zika_learner2$zika==1, "Zika infected", "Not zika infected")

table1::label(zika_learner2$mic_ceph) = "Microcephaly"
table1::label(zika_learner2$urban) = "Living area"
table1::label(zika_learner2$SES) = "SES"

my.render.cont <- function(x) {
  with(table1::stats.apply.rounding(table1::stats.default(x), digits=3, rounding.fn = table1::round_pad), c("",
                                                           "Mean "=sprintf("%s", MEAN)))
}

table1::table1(~ mic_ceph + urban + SES | zika, data=zika_learner2, render.continuous = my.render.cont,
           caption = "Table 2. Proportions for the simulated dataset, by treatment status and overall.")

zika_learner2 = subset(zika_learner2,zika_learner2$birth!=0)

table1::label(zika_learner2$mic_ceph) = "Microcephaly"
table1::label(zika_learner2$urban) = "Living area"
table1::label(zika_learner2$SES) = "SES"

table1::table1(~ mic_ceph + urban + SES | zika, data=zika_learner2, render.continuous = my.render.cont,
       caption = "Table 3. Proportions for the simulated dataset, by treatment status and overall, after the first selection.")

zika_learner2 = subset(zika_learner2,zika_learner2$sel_ind!=0)

table1::label(zika_learner2$mic_ceph) = "Microcephaly"
table1::label(zika_learner2$urban) = "Living area"
table1::label(zika_learner2$SES) = "SES"

table1::table1(~ mic_ceph + urban + SES | zika, data=zika_learner2, render.continuous = my.render.cont,
       caption = "Table 4. Proportions for the simulated dataset, by treatment status and overall, after both selections.")
```
The dataset and data generating process (DGP) can be used to test the functions
in `SelectionBias`.


## `sensitivityparametersM()`

The sensitivity parameters for the SV, sharp, and GAF bounds are calculated for 
the generalized M-structure, illustrated in Figure 1. The sensitivity parameters 
are only calculated for an assumed model structure, since they depend on the 
unobserved variable, *U*. However, the observed probabilities of the outcome, 
$P(Y=1|T=t,I_S=1)$, $t=0,1$, are inputs. The code and the output are:
```{r eval = TRUE}
# SV bound
sensparSV = sensitivityparametersM(whichEst = "RR_tot",
                   whichBound = "SV",
                   Vval = matrix(c(1, 0, 0.85, 0.15), ncol = 2),
                   Uval = matrix(c(1, 0, 0.5, 0.5), ncol = 2),
                   Tcoef = c(-6.2, 1.75),
                   Ycoef = c(-5.2, 5.0, -1.0),
                   Scoef = matrix(c(1.2, 2.2, 0.0, 0.5,
                                    2.0, -2.75, -4.0, 0.0),
                                  ncol = 4),
                   Mmodel = "L",
                   pY1_T1_S1 = 0.286,
                   pY1_T0_S1 = 0.004)
print(sensparSV)

# Sharp bound
sensparSharp = sensitivityparametersM(whichEst = "RR_tot",
                   whichBound = "sharp",
                   Vval = matrix(c(1, 0, 0.85, 0.15), ncol = 2),
                   Uval = matrix(c(1, 0, 0.5, 0.5), ncol = 2),
                   Tcoef = c(-6.2, 1.75),
                   Ycoef = c(-5.2, 5.0, -1.0),
                   Scoef = matrix(c(1.2, 2.2, 0.0, 0.5,
                                    2.0, -2.75, -4.0, 0.0),
                                  ncol = 4),
                   Mmodel = "L",
                   pY1_T1_S1 = 0.286,
                   pY1_T0_S1 = 0.004)

print(sensparSharp)

# GAF bound
sensparGAF = sensitivityparametersM(whichEst = "RR_tot",
                   whichBound = "GAF",
                   Vval = matrix(c(1, 0, 0.85, 0.15), ncol = 2),
                   Uval = matrix(c(1, 0, 0.5, 0.5), ncol = 2),
                   Tcoef = c(-6.2, 1.75),
                   Ycoef = c(-5.2, 5.0, -1.0),
                   Scoef = matrix(c(1.2, 2.2, 0.0, 0.5,
                                    2.0, -2.75, -4.0, 0.0),
                                  ncol = 4),
                   Mmodel = "L",
                   pY1_T1_S1 = 0.286,
                   pY1_T0_S1 = 0.004)

print(sensparGAF)
```
The first argument is `whichEst`, where the user inputs the causal estimand of
interest. It must be one of the four `"RR_tot"`, `"RD_tot"`, `"RR_sub"` or 
`"RD_sub"`. The second argument is `whichBound`, where the user inputs the bound
they want to use, either `"SV"`, `"sharp"`, or `"GAF"`. Third, the argument `Vval`
takes the matrix for *V* as input. The first column contains the values that *V* 
can take, and the second column contains the corresponding probabilities. In 
this example, *V* is binary, so the first two elements in the matrix are 1 and 0.
However, any discrete *V* can be used. An approximation of a continuous *V* can 
be used, if it is discretized. The fourth argument is `Uval`, which takes the 
matrix for *U* as input. The matrix *U* has a similar structure as *V*. The 
fifth argument is `Tcoef`, containing the coefficients used in the model for *T*. 
The first entry in `Tcoef` is the intercept of the model, and the second the 
slope for *V*. The sixth argument is `Ycoef`, containing the coefficient vector 
for the outcome model, where the first entry is the intercept, the second the 
slope coefficient for *T* and third is the slope coefficient for *U*. The seventh 
argument is `Scoef`, the coefficient matrix for the selection variables. The 
number of rows is equal to the number of selection variables, and the number of 
columns is equal to four. The columns represent the intercept, and slope 
coefficients for *V*, *U* and *T*, respectively. A summary of the code 
notation is seen in the last column of Table 3. The eighth argument is `Mmodel`, 
which indicates whether the models in the M-structure are probit (`Mmodel = "P"`) 
or logit (`Mmodel = "L"`). The ninth and tenth arguments are `pY1_T1_S1` and
`pY1_T0_S1`. They are the observed probabilities $P(Y=1|T=1,I_S=1)$ and
$P(Y=1|T=0,I_S=1)$. The output is the sensitivity parameters for the chosen 
bound.

In the zika example, the estimand of interest is the risk ratio in the
total population, `whichEst = "RR_tot"`, the DGP is found in Table 1, logistic
models are used in the DGP, and the probabilities are found in Table 4. For the
SV and sharp bounds, the output is $RR_{UY|T=1}=1.9448$, $RR_{UY|T=0}=2.7089$, 
$RR_{SU|11}=1.5566$, $RR_{SU|00}=1.7058$,$RR_{SU|10}=1.9998$, and 
$RR_{SU|01}=1.7998$, which gives $BF_{11}=1.2102$ and $BF_{00}=1.3532$, 
$BF_{10}=1.3208$, and $BF_{01}=1.3895$. For the GAF bound, the output is 
$M_T=0.4502$ and $m_T=0.002$.

## `SVbound()`

The SV bound can be calculated using the function `SVbound()`. The first
argument is `whichEst`, indicating the causal estimand of interest (`"RR_tot"`,
`"RD_tot"`, `"RR_sub"` or `"RD_sub"`). The second argument is `sens`, which is 
either output from `sensitivityparametersM()`, a named list or a data frame with
column names parameter and value, which includes the sensitivity parameters.
This input is optional as the sensitivity parameters can also be specified 
directly in the function with the below arguments. The third and fourth arguments
are the observed conditional probabilities $P(Y=1|T=1,I_S=1)$ and 
$P(Y=1|T=0,I_S=1)$, which are needed to calculate bounds for the causal 
estimands and not the selection bias itself. The optional fifth and sixth 
arguments are the treatment probabilities $P(T=1|I_S=1)$ and $P(T=0|I_S=1)$. 
These are only needed for the alternative SV bound for the causal risk difference 
in the subpopulation. If they are not included, the original SV bound in the
risk difference is calculated. The subsequent arguments are the 
sensitivity parameters provided by the user. The default value for all 
sensitivity parameters are `NULL`, and the user must then specify numeric values
on the sensitivity parameters that are necessary for the bound for the chosen 
estimand. These arguments are also optional in case the argument `sens` is used.
The sensitivity parameters can either be calculated using the function
`sensitivityparametersM()`, or found elsewhere. For sensitivity parameters found
elsewhere, `SVbound()` is not restricted to the generalized M-structure. 
However, the necessary assumptions for the SV bound must still be fulfilled
[@smith2019bounding]. The output is the SV bound. The code and output are:
```{r eval = TRUE}
SVbound(whichEst = "RR_tot",
        pY1_T1_S1 = 0.286,
        pY1_T0_S1 = 0.004,
        RR_UY_T1 = 1.9448,
        RR_UY_T0 = 2.7089,
        RR_SU_11 = 1.5566,
        RR_SU_00 = 1.7058,
        RR_SU_10 = 1.9998,
        RR_SU_01 = 1.7998)
```
As before in the zika example, the causal estimand is the risk ratio in the
total population, `whichEst = "RR_tot"`. The sensitivity parameters are
$RR_{UY|T=1}=1.9448$, $RR_{UY|T=0}=2.7089$, $RR_{SU|11}=1.5566$, 
$RR_{SU|00}=1.7058$,$RR_{SU|10}=1.9998$, and $RR_{SU|01}=1.7998$, calculated 
above in `sensitivityparametersM()`, which gives a lower SV bound equal to 43.66 
and a higher SV bound equal to 131.22. Alternatively, the output from the function
`sensitivityparametersM()` can be used directly as input which gives the code:
```{r eval = TRUE}
SVbound(whichEst = "RR_tot",
        sens = sensparSV,
        pY1_T1_S1 = 0.286,
        pY1_T0_S1 = 0.004)
```
This approach gives the same bounds as specifying the sensitivity parameters
manually.

## `sharpbound()`

The sharp bound can be calculated using the function `sharpbound()`. The input
arguments are very similar to that of `SVbound()`. The first argument is 
`whichEst`, indicating the causal estimand of interest (`"RR_tot"`,
`"RD_tot"`, `"RR_sub"` or `"RD_sub"`). The second argument is `sens`, which is 
either output from `sensitivityparametersM()`, a named list or a data frame with
column names parameter and value, which includes the sensitivity parameters.
This input is optional as the sensitivity parameters can also be specified 
directly in the function with the below arguments. The third and fourth arguments
are the observed conditional probabilities $P(Y=1|T=1,I_S=1)$ and 
$P(Y=1|T=0,I_S=1)$, which are needed to calculate bounds for the causal 
estimands and not the selection bias itself. The fifth and sixth 
arguments are the treatment probabilities $P(T=1|I_S=1)$ and $P(T=0|I_S=1)$. 
These are only needed for the bounds for the causal estimands in the 
subpopulation. The seventh and eighth arguments are the probabilities 
$P(I_S=1|T=1)$ and $P(I_S=1|T=0)$ which are needed for the bounds for the causal 
estimands in the total population. If they are unknown, they can be set to 0
for more conservative bounds. The subsequent arguments are the 
sensitivity parameters provided by the user. The default value for all 
sensitivity parameters are `NULL`, and the user must then specify numeric values
on the sensitivity parameters that are necessary for the bound for the chosen 
estimand. These arguments are also optional in case the argument `sens` is used. 
The sensitivity parameters can either be calculated using the function
`sensitivityparametersM()`, or found elsewhere. For sensitivity parameters found
elsewhere, `sharpbound()` is not restricted to the generalized M-structure. 
However, the necessary assumptions for the sharp bound must still be fulfilled
[@zetterstrom2025investigations]. The output is the sharp bound. The code and 
output with the two options to specify the sensitivity parameters are:
```{r eval = TRUE}
sharpbound(whichEst = "RR_tot",
        pY1_T1_S1 = 0.286,
        pY1_T0_S1 = 0.004,
        pS1_T1 = 0.11,
        pS1_T0 = 0.58,
        RR_UY_T1 = 1.9448,
        RR_UY_T0 = 2.7089,
        RR_SU_11 = 1.5566,
        RR_SU_00 = 1.7058,
        RR_SU_10 = 1.9998,
        RR_SU_01 = 1.7998)

sharpbound(whichEst = "RR_tot",
        sens = sensparSharp,
        pY1_T1_S1 = 0.286,
        pY1_T0_S1 = 0.004,
        pS1_T1 = 0.11,
        pS1_T0 = 0.58,)
```
As before in the zika example, the causal estimand is the risk ratio in the
total population, `whichEst = "RR_tot"`. The observed probabilities come from
the dataset, and the sensitivity parameters are $RR_{UY|T=1}=1.9448$, 
$RR_{UY|T=0}=2.7089$, $RR_{SU|11}=1.5566$, $RR_{SU|00}=1.7058$,
$RR_{SU|10}=1.9998$, and $RR_{SU|01}=1.7998$, calculated above in 
`sensitivityparametersM()`, which gives a lower sharp bound equal to 73.91 and a 
higher sharp bound equal to 78.99.


## `checksharpSVbound()`

The sharpness of the SV bound can be evaluated using `checksharpSVbound()` 
[@zetterstrom2025investigations]. The first argument is 
`whichEst`, indicating the causal estimand of interest (`"RR_tot"`, `"RR_sub"`
or `"RD_sub"`). Note that the SV bound for the risk difference in the total
population is not sharp. The second argument is `sens`, which is either output 
from `sensitivityparametersM()`, a named list or a data frame with
column names parameter and value, which includes the bounding factors ($BF$).
This input is optional as the bounding factors can also be specified 
directly in the function with the argument below. The third argument is `BF`, 
the bounding factors $BF_{00}$ and $BF{10}$ for the total population, and $BF_0$
and $BF_1$ for the subpopulation. The fourth argument is `pY1`, the probabilities
$P(Y=1|T=1,I_S=1)$ and $P(Y=1|T=0,I_S=1)$. Note that the order of the bounding 
factors and probabilities matter. The output is two strings stating whether the 
lower and upper SV bound are sharp or not. Note that the SV bound for the risk 
ratio in the total population and risk difference in the subpopulation can only 
be arbitrarily sharp, see [@zetterstrom2025investigations] for more details. 
The code and output are:
```{r eval = TRUE}
checksharpSVbound(whichEst = "RR_tot",
        sens = sensparSV,
        pY1 = c(0.286, 0.004))
```
Here, both the lower and upper SV bounds are arbitrarily sharp.


## `AFbound()`

The AF bound is calculated using the function `AFbound()`. The first argument is
the causal estimand of interest (`"RR_tot"`, `"RD_tot"`, `"RR_sub"` or
`"RD_sub"`). The second argument is `outcome`, where the user inputs either the
observed numeric vector with the outcome variable or a vector with the
conditional outcome probabilities, $P(Y=1|T=1,I_S=1)$ and $P(Y=1|T=0,I_S=1)$.
The third argument is `treatment`, where the user inputs either the observed
numeric vector with the treatment variable or a vector with the conditional 
treatment probabilities, $P(T=1|I_S=1)$ and $P(T=0|I_S=1)$. The fourth argument
is `selection` where the user can either input the observed selection vector
or the selection probability. Its default value is `NULL` since it is only 
required when the causal estimands in the total population are of interest. If 
the subpopulation is of interest and `selection = NULL`, the outcome and
treatment vectors must only include the selected subjects. The output is the 
lower and upper AF bounds. The code and output are:
```{r eval = TRUE}
attach(zika_learner)

AFbound(whichEst = "RR_tot",
        outcome = mic_ceph[sel_ind == 1],
        treatment = zika[sel_ind == 1],
        selection = mean(sel_ind))
```
Similar to before, `whichEst = "RR_tot"`. Furthermore, the outcome and treatment
variables are microcephaly and zika. The selection probability is specified 
since the other variables are restricted to those subjects with $I_S=1$. The 
output is the lower and upper AF bounds, which are 0 and 454.09 in the zika example.

If the raw data is not available, one can input the conditional probabilities
instead. In this example, these probabilities are:
```{r eval = TRUE}
AFbound(whichEst = "RR_tot",
        outcome = c(0.286, 0.004),
        treatment = c(0.002, 0.998),
        selection = mean(sel_ind))
```
The difference in these two examples comes from rounding errors in the estimated 
probabilities.


## `GAFbound()`

The GAF bound is calculated using the function `GAFbound()`. The first argument
is the causal estimand of interest (`"RR_tot"`, `"RD_tot"`, `"RR_sub"` or
`"RD_sub"`). The second argument is `sens`, which is either output from 
`sensitivityparametersM()`, a named list or a data frame with column names 
"parameter" and "value", which includes the sensitivity parameters. This input 
is optional as the sensitivity parameters can also be specified directly in the 
function with the below arguments. The third and fourth arguments are `M` and 
`m` which are the two sensitivity parameters for the GAF bound. The sensitivity 
parameters can either be calculated using `sensitivityparametersM()`, or found 
elsewhere. For sensitivity parameters found elsewhere, `GAFbound()` is not 
restricted to the generalized M-structure. However, the necessary assumptions 
for the GAF bound must still be fulfilled [@zetterstrom2024]. The fifth argument 
is `outcome`, where the user inputs either the observed numeric vector with the 
outcome variable or a vector with the conditional outcome probabilities, 
$P(Y=1|T=1,I_S=1)$ and $P(Y=1|T=0,I_S=1)$. The fifth argument is `treatment`, 
where the user inputs either the observed numeric vector with the treatment 
variable or a vector with the conditional treatment probabilities, 
$P(T=1|I_S=1)$ and $P(T=0|I_S=1)$. The sixth argument is `selection` where the 
user can either input the observed selection vector or selection probability. 
Its default value is `NULL` since it is only required when the causal estimands 
in the total population are of interest. If the subpopulation is of interest and 
`selection = NULL`, the outcome and treatment vectors must only include the 
selected subjects. The output is the lower and upper GAF bounds. The code and 
output are:
```{r eval = TRUE}
GAFbound(whichEst = "RR_tot",
         M = 0.4502,
         m = 0.002, 
         outcome = mic_ceph[sel_ind == 1],
         treatment = zika[sel_ind == 1],
         selection = mean(sel_ind))

GAFbound(whichEst = "RR_tot",
         sens = sensparGAF, 
         outcome = mic_ceph[sel_ind == 1],
         treatment = zika[sel_ind == 1],
         selection = mean(sel_ind))
```
Similar to before, `whichEst = "RR_tot"`. The sensitivity parameters are the 
output from `sensitivityparametersM()`. Furthermore, the outcome and treatment
variables are microcephaly and zika. The selection probability is specified 
since the other variables are restricted to those subjects with $I_S=1$. The 
output is the lower and upper GAF bounds, which are 0.01 and 147.42 in the 
zika example. If the raw data is not available, one can input the conditional 
probabilities instead, similar to the AF bound.


## `CAFbound()`

The CAF bound is calculated using the function `CAFbound()`. The first argument
is the causal estimand of interest (`"RR_tot"`, `"RD_tot"`, `"RR_sub"` or
`"RD_sub"`). The second and third arguments are `M` and `m` which are the two
sensitivity parameters for the CAF bound. The fourth argument is `outcome`,
where the user inputs either the observed numeric vector with the outcome
variable or a vector with the conditional outcome probabilities,
$P(Y=1|T=1,I_S=1)$ and $P(Y=1|T=0,I_S=1)$. The fifth argument is `treatment`,
where the user inputs either the observed numeric vector with the treatment
variable or a vector with the conditional treatment probabilities, 
$P(T=1|I_S=1)$ and $P(T=0|I_S=1)$. The sixth argument is `selection` where the
user can either input the observed selection vector or selection probability.
Its default value is `NULL` since it is only required when the causal estimands
in the total population are of interest. If the subpopulation is of interest and
`selection = NULL`, the outcome and treatment vectors must only include the
selected subjects. The output is the lower and upper CAF bounds. The code and
output are:
```{r eval = TRUE}
CAFbound(whichEst = "RR_tot",
         M = 0.3,
         m = 0.005, 
         outcome = c(0.286, 0.004),
         treatment = c(0.002, 0.998),
         selection = mean(sel_ind))
```
Similar to before, `whichEst = "RR_tot"`. The sensitivity parameters are chosen 
by the user. Furthermore, the outcome and treatment variables are microcephaly 
and zika. The selection probability is specified since other variables are 
restricted to those subjects with $I_S=1$. The output is the lower and upper 
CAF bounds, which are 0.04 and 67.78 in the zika example. If the raw data is not 
available, one can input the conditional probabilities instead, similar to the 
AF and GAF bounds.


# References
