---
title: "How to use codalm"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to use codalm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

We will start by loading the `codalm` R package.

```{r setup}
library(codalm)
```

We will now access the data from the ggtern package. We will be analyzing how
two different methods (image analysis or microscopic inspection) estimate the composition
of 30 white blood cells. The format that we need is for both compositions to be in matrices,
with one row per observation. We will also normalize the rows of these matrices to ensure 
that they sum to 1, although the `codalm` function would also take care of this for us.

```{r load_data}
data("WhiteCells", package = 'ggtern')
image <- subset(WhiteCells, Experiment == "ImageAnalysis")
image_mat <- as.matrix(image[,c("G", "L", "M")])
microscopic <- subset(WhiteCells, Experiment == "MicroscopicInspection")
microscopic_mat <- as.matrix(microscopic[,c("G", "L", "M")])

image_mat  <- image_mat  / rowSums(image_mat)
microscopic_mat <- microscopic_mat / rowSums(microscopic_mat)
```

To estimate the coefficient matrix B, we can use the `codalm` function.

```{r estimate_B}
B_est <- codalm(y = microscopic_mat, x = image_mat)
B_est
```

To see the interpretation of this matrix, please see [Fiksel et al. (2022)](https://doi.org/10.1111/biom.13465). If all the rows of B_est are exactly the same, it is recommended to set `accelerate = FALSE` as a sensitivity check.

We can also use the bootstrap to estimate 95% confidence intervals. We will
only use 50 bootstrap iterations as an example (nboot = 50), but is recommended to do more.

```{r bootstrap}
B_ci <- codalm_ci(y = microscopic_mat, x = image_mat, nboot = 50,
                   conf = .95)
B_ci$ci_L
B_ci$ci_U
```

These matrices given the lower and upper bounds for the confidence interval for each
element of the coefficient matrix.

You can also take advantage of parallelization, if you have multiple cores available.

```{r bootstrap_parallel}
ncores <- 2
Sys.setenv(R_FUTURE_SUPPORTSMULTICORE_UNSTABLE = "quiet")
B_ci_parallel <- codalm_ci(y = microscopic_mat, x = image_mat, nboot = 50,
                   conf = .95, parallel = TRUE, ncpus = ncores, strategy = 'multisession')
isTRUE(all.equal(B_ci$ci_L, B_ci_parallel$ci_L))
isTRUE(all.equal(B_ci$ci_U, B_ci_parallel$ci_U))
# identical(B_ci$ci_L, B_ci_parallel$ci_L)
# identical(B_ci$ci_U, B_ci_parallel$ci_U)
```

Finally, we will do a permutation test for linear independence. Again, we will only do 50
permutations as an example, but in practice this number should be higher. For demonstration purposes,
we will generate the compositional outcome independently of the  compositional predictor

```{r independence_test}
set.seed(123)
x <- gtools::rdirichlet(100, rep(1, 3))
y <- gtools::rdirichlet(100, rep(1, 3))
indep_test_pval <- codalm_indep_test(y = y, x = x,
                                     nperms = 100, init.seed = 123)
indep_test_pval
```

This function can also be parallelized. Unlike the bootstrapping, there is no need
to differentiate between whether the user is using a Windows or Unix system.

```{r independence_test_parallel}
indep_test_pval_parallel <- codalm_indep_test(y = y, x = x,
                                              nperms = 100, init.seed = 123,
                                              parallel = TRUE, ncpus = ncores,
                                              strategy = 'multisession')
indep_test_pval_parallel
```

