---
title: "Introduction to scLink"
author: "Wei Vivian Li, Rutgers Department of Biostatistics and Epidemiology"
date: "`r Sys.Date()`"
#output: rmarkdown::github_document
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{scLink: Inferring Sparse Gene Co-expression Networks from Single Cell Expression Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to understanding the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell-type-specific manner. Here we introduce the `scLink` package, which uses statistical network modeling to understand the co-expression relationships among genes and to construct sparse gene co-expression networks from single-cell gene expression data. 

Here we demonstrate the functionality of `scLink` using the example data stored in the package.

```{r}
library(scLink)
count = readRDS(system.file("extdata", "example.rds", package = "scLink"))
genes = readRDS(system.file("extdata", "genes.rds", package = "scLink"))
```
The example raw count matrix `count` has 793 **rows** representing different cells and 23,341 **columns** representing different genes.
`genes` is a character vector of 500 genes of interest.


## `sclink_norm`
We use the function `sclink_norm` to process single cell read count for application of the sclink method. The code below will normalize the read count matrix with a library size of $10^6$ and only keep the 500 genes in `genes` for downstream analysis. Note that the normalized count matrix `count.norm` is on the $\log_{10}$ scale.
```{r}
count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = FALSE, gene.names = genes)
```
If users do not have a particular gene list for network inference, they can set `filter.genes=TRUE` to filter for the top $n$ genes with largest average expression values. For example:
```{r eval = FALSE}
count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = TRUE, n = 500)
```

## `sclink_net`
After the pre-processing step, we use the function `sclink_net` to calculate the robust correlation matrix and identifed sparse co-expression network of scLink.
`expr` is the normalized count matrix output by `sclink_norm` or supplied by the users. `lda` is the candidate regularization parameters used in scLink's graphical model. The users can set `ncores` to take advantage of parallel computation.
```{r, message=FALSE}
networks = sclink_net(expr = count.norm, ncores = 1, lda = seq(0.5, 0.1, -0.05))
```

`sclink_net` returns a list of results. The scLink's robust correlation matrix can be retrieved from the `cor` element:
```{r}
networks$cor[1:3,1:3]
```
The gene co-expression networks and summary statistics can be retrieved from the `summary` element, which is a list with the same length as `lda`: each element corresponds to one regularization parameter.
```{r}
net1 = networks$summary[[1]]
names(net1)
### adjacency matrix
net1$adj[1:3,1:3]
### concentration matrix
net1$Sigma[1:3,1:3]
### BIC 
net1$bic
### number of edges
net1$nedge
### regularization parameter lambda
net1$lambda
```

## `sclink_cor`
Since it is very difficult to infer co-expression relationships for lowly expressed genes in single-cell data, we suggest the filtering step as used in `sclink_norm` to select genes. This also reduces the computational burden. However, if the users would like to infer gene networks for a large gene list (e.g., $> 5000$ genes), we suggest that the users first use `sclink_cor` to investigate the correlation structures among these genes.
```{r eval = FALSE}
corr = sclink_cor(expr = count.norm, ncores = 1)
```
If the correlation matrix suggests obvious gene modules, then the users can apply `sclink_net` separately on these modules to reduce computation time and increase overall accuracy.
