---
title: "Reproduction of manuscript's results"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reproduction of manuscript's results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Introduction

This vignette serves the purpose to reproduce the results shown in the section "Real data example" of the manuscript describing *collaborative graphical lasso* ([Albanese, Kohlen and Behrouzi, 2024](#ref)). The computations carried out by the following code are heavy and could take a considerable amount of time. On our machine (RAM 48 Gb, CPU Intel(R) Xeon(R) W-2235 CPU @ 3.80GHz) it took almost 3 hours. Hence, we refer the user who wants to get started with the package to the introductory vignette: `vignette("coglasso")`. *Collaborative graphical lasso* is an algorithm to reconstruct multi-omics networks based on *graphical lasso* ([Friedman, Hastie and Tibshirani, 2008](#ref)) and *collaborative regression* ([Gross and Tibshirani, 2015](#ref)). The coglasso package implements and R interface to this algorithm. We will use it to reproduce the results obtained in the original manuscript.

Let us first attach coglasso.

```{r setup}
library(coglasso)
```

We will use the multi-omics data set provided by the package, in its largest version: `multi_omics_sd`. Let's inspect its description in the manual.

```{r}
help(multi_omics_sd)
```

We notice that this data set has 162 transcripts and 76 metabolites. For further information on the data set sources and generation, we refer the user to the script [multi_omics_sd.R](https://github.com/DrQuestion/coglasso/blob/main/data-raw/multi_omics_sd.R) on the GitHub repository of the package.

## Reproducing the results

For the multi-omics network reconstruction carried out in the manuscript we explored 30 automatically generated $λ_w$ and $λ_b$ values and 9 given $c$ values.

```{r}
nlambda_w <- 30
nlambda_b <- 30
cs <- c(0.01, 0.05, 0.1, 0.2, 1, 5, 10, 20, 100)
```

We can now build the networks, one per combination of the chosen hyperparameters, with `coglasso()`.

```{r}
cg <- coglasso(multi_omics_sd,
  p = 162,
  nlambda_w = nlambda_w,
  nlambda_b = nlambda_b,
  c = cs
)
```

We proceed now to select the most stable, yet sparse network, using *eXtended StARS* (*XStARS*, [Albanese, Kohlen and Behrouzi, 2024](#ref)), an adaptation to coglasso of *StARS* ([Liu, Roeder and Wasserman, 2010](#ref)). The function implementing *XStARS* is `select_coglasso()`, when setting the argument `method` to "xstars" (which is the default behaviour).

```{r}
set.seed(42)
sel_cg <- select_coglasso(cg, method = "xstars")
```

We display now the selected network using the package `igraph`, reproducing the one displayed in Figure 3 of the manuscript. The orientation of the network may vary.

```{r}
plot(sel_cg)
```

We now plot the subnetwork of the gene *Cirbp* and of its neighborhood. This reproduces the network displayed in Figure 4A.

```{r}
igraph::V(sel_graph)$label<-colnames(sel_cg$data)
cirbp_index <- which(colnames(sel_cg$data) == "Cirbp")
subnetwork <- igraph::subgraph(
  sel_graph,
  c(cirbp_index, igraph::neighbors(sel_graph, cirbp_index))
)
plot(subnetwork)
```

A useful way to extract information from a network is community discovery. This technique allows to simplify a network by breaking it in functional units where nodes share several connections, enough to be recognized as separate communities. In the manuscript we used the greedy algorithm described in [Clauset, Newman and Moore, 2004](#ref), part of the `igraph` toolkit. We focused and inspected the second largest community, shown in Figure 4B. Here is the code to reproduce it.

```{r}
communities <- igraph::cluster_fast_greedy(sel_graph)
community2 <- communities[[2]]
community2_graph<-igraph::subgraph(sel_graph, community2)

# Focusing on nodes of interest
fosjun_erg_AA <- c("Fos", "Fosb", "Junb", "Fosl2", "Egr1", "Egr2", "Egr3", "Ala", "Arg", "Asn","His","Ile","Leu","Lys","Met","Orn","Phe","Ser","Thr","Tyr","Val")
igraph::V(community2_igraph)[!(label %in% fosjun_erg_AA)]$color<-c(rep("#3bd8ff", 29), rep("#ffadad", 5))
igraph::V(community2_igraph)[!(label %in% fosjun_erg_AA)]$frame.color<-NA
igraph::V(community2_igraph)[!(label %in% fosjun_erg_AA)]$label<-NA

lo_community2 <- igraph::layout_with_fr(community2_graph)
plot(community2_graph, layout=lo_community2)
```

## References {#ref}

  Albanese, A., Kohlen, W., & Behrouzi, P. (2024). Collaborative graphical lasso (arXiv:2403.18602). *arXiv* <https://doi.org/10.48550/arXiv.2403.18602>
  
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. *Biostatistics*, 9(3), 432--441. <https://doi.org/10.1093/biostatistics/kxm045>

Gross, S. M., & Tibshirani, R. (2015). Collaborative regression. *Biostatistics*, 16(2), 326--338. <https://doi.org/10.1093/biostatistics/kxu047> 

Liu, H., Roeder, K., & Wasserman, L. (2010). Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models (arXiv:1006.3316). *arXiv.* <https://doi.org/10.48550/arXiv.1006.3316>

Clauset, A., Newman, M. E. J., & Moore, C. (2004). Finding community structure in very large networks. *Physical Review E*, 70(6), 066111. <https://doi.org/10.1103/PhysRevE.70.066111>
