---
title: "Cross-validation with multiple ML algorithms"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cross-validation with multiple ML algorithms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "../man/figures/README-"
  )

library(dplyr)

load("../data/star.rda")

# specifying the outcome
outcomes <- "g3tlangss"

# specifying the treatment
treatment <- "treatment"

# specifying the data (remove other outcomes)
star_data <- star %>% dplyr::select(-c(g3treadss,g3tmathss))

# specifying the formula
user_formula <- as.formula(
  "g3tlangss ~ treatment + gender + race + birthmonth + 
  birthyear + SCHLURBN + GRDRANGE + GKENRMNT + GKFRLNCH + 
  GKBUSED + GKWHITE ")
```

We can estimate individualized treatment rules (ITRs) with various machine learning (ML) algorithms and then compare the performance of each model. The package supports all ML algorithms available in the `caret` package, plus two additional algorithms: [causal forest](https://grf-labs.github.io/grf/reference/causal_forest.html) and [bartCause](https://CRAN.R-project.org/package=bartCause).

The package also lets us estimate heterogeneous treatment effects at both the individual and the group level. At the individual level, the summary statistics and the AUPEC plot show whether assigning treatment according to an individualized treatment rule may outperform a completely randomized experiment. At the group level, we specify the number of groups through `ngates` and estimate heterogeneous treatment effects across those groups.
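To make the group-level idea concrete, the chunk below is a minimal base-R sketch that does not use `evalITR`. It assumes a simulated randomized experiment (not the STAR data) and a hypothetical helper, `gate_sketch()`, that ranks units by an estimated effect, cuts them into `ngates` equally sized groups, and takes the treated-minus-control difference in means within each group. The package's own group-level estimator (with cross-fitted ML predictions and uncertainty estimates) may differ from this simplified version.

```{r gate_sketch}
# Hypothetical helper (not part of evalITR): group-level effects by
# splitting units into `ngates` equally sized groups of estimated effect.
gate_sketch <- function(y, trt, tau_hat, ngates = 5) {
  # group 1 holds the units with the lowest estimated effects
  group <- cut(rank(tau_hat, ties.method = "first"),
               breaks = ngates, labels = FALSE)
  # difference in means between treated (trt == 1) and control (trt == 0)
  sapply(seq_len(ngates), function(g) {
    in_g <- group == g
    mean(y[in_g & trt == 1]) - mean(y[in_g & trt == 0])
  })
}

# simulated randomized experiment (assumption: not the STAR data)
set.seed(2021)
n <- 1000
x <- rnorm(n)
trt <- rbinom(n, 1, 0.5)        # randomized binary treatment
tau <- 1 + 2 * (x > 0)          # true effect: 1 if x <= 0, 3 if x > 0
y <- x + tau * trt + rnorm(n)

# here x itself serves as the "estimated" effect used for ranking
res <- gate_sketch(y, trt, tau_hat = x, ngates = 2)
res
```

With two groups, the first estimate should be close to 1 and the second close to 3, mirroring the simulated heterogeneity; in practice the ranking would come from an ML model's predictions rather than from a known covariate.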

```{r multiple, message=TRUE, warning=TRUE}
library(evalITR)

# specify the caret resampling scheme: 2-fold cross-validation, repeated twice
fitControl <- caret::trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 2)

# estimate ITR
set.seed(2021)
fit_cv <- estimate_itr(
  treatment = "treatment",
  form = user_formula,
  data = star_data,
  trControl = fitControl,
  algorithms = c(
    "causal_forest",
    # "bartc",
    # "rlasso", # from rlearner
    # "ulasso", # from rlearner
    "lasso" # from caret package
    # "rf" # from caret package
  ),
  budget = 0.2,
  n_folds = 2)

# evaluate ITR
est_cv <- evaluate_itr(fit_cv)

# summarize estimates
summary(est_cv)
```

We plot the estimated Area Under the Prescriptive Effect Curve (AUPEC) for the third-grade language score (`g3tlangss`) across the different ML algorithms.


```{r multiple_plot, fig.width = 8, fig.height = 6, fig.align = "center"}
# plot the AUPEC with different ML algorithms
plot(est_cv)
```
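For intuition about what the AUPEC summarizes, the chunk below is a simplified, oracle-style sketch in base R, not the estimator used by `evalITR`. It assumes simulated data in which the true individual effects `tau` are known, and two hypothetical helpers: `pec_sketch()` treats the top share `p` of units ranked by a score and records the gain over treating a random share `p`, and `aupec_sketch()` integrates that curve with the trapezoidal rule. An informative score should yield a clearly positive area, while a random score should yield an area near zero.

```{r aupec_sketch}
# Hypothetical helper (not part of evalITR): oracle prescriptive effect curve.
# Assumes the true effects `tau` are known, which only holds in a simulation.
pec_sketch <- function(tau, score, grid = seq(0, 1, by = 0.01)) {
  n <- length(tau)
  ord <- order(score, decreasing = TRUE)  # treat the highest-scoring units first
  gain <- sapply(grid, function(p) {
    k <- round(p * n)                     # number of units treated at budget p
    # average effect of treating the top-k units minus treating a random share p
    sum(tau[ord[seq_len(k)]]) / n - p * mean(tau)
  })
  data.frame(budget = grid, gain = gain)
}

# Hypothetical helper: area under the curve via the trapezoidal rule
aupec_sketch <- function(curve) {
  with(curve, sum(diff(budget) * (head(gain, -1) + tail(gain, -1)) / 2))
}

set.seed(2021)
n <- 1000
x <- rnorm(n)
tau <- 1 + 2 * (x > 0)                # simulated true effects
good_score <- x + rnorm(n, sd = 0.5)  # informative but noisy score
random_score <- runif(n)              # uninformative score

aupec_good <- aupec_sketch(pec_sketch(tau, good_score))
aupec_random <- aupec_sketch(pec_sketch(tau, random_score))
c(informative = aupec_good, random = aupec_random)
```

The curve starts and ends at zero (treating nobody or everybody gives the same result as random assignment of that share), so the area measures how much a rule gains by prioritizing the right units at intermediate budgets.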