---
title: 'Benchmarking SignacFast with single cell flow-sorted data'
author: 'Mathew Chamberlain'
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Benchmarking SignacFast with flow-sorted data}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
---

This vignette shows how to use SignacFast to annotate flow-sorted synovial cells by integrating SignacX with Seurat. We start with raw counts from [this publication](https://www.nature.com/articles/s41590-019-0378-1). 

```{r setup0, include=FALSE}
all_times <- list()  # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      now <<- Sys.time()
    } else {
      res <- difftime(Sys.time(), now, units = "secs")
      all_times[[options$label]] <<- res
    }
  }
}))
knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(width.cutoff = 95),
  message = FALSE,
  warning = FALSE,
  time_it = TRUE
)
#celltypes_fast = readRDS("./fls/celltypes_fast_citeseq.rds")
#celltypes = readRDS("./fls/celltypes_citeseq.rds")
# pbmc = readRDS("fls/pbmcs_signac_citeseq.rds")
celltypes = readRDS(file = "fls/celltypes_amp_synovium.rds")
celltypes_fast = readRDS(file = "fls/celltypes_fast_synovium_celltypes.rds")
```

## Load data

Read the CEL-seq2 data.

```{r read CELSeq2, message = F, eval = F}
ReadCelseq <- function (counts.file, meta.file)
{
  E = suppressWarnings(readr::read_tsv(counts.file));
  gns <- E$gene;
  E = E[,-1]
  E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
  rownames(E) <- gns
  E
}

counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"

E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)
M = suppressWarnings(readr::read_tsv(meta.file))

# filter data based on depth and number of genes detected
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[,kmu > 200 & kmu2 > 500]

# filter by mitochondrial percentage
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik,]) / Matrix::colSums(E) * 100
E = E[,MitoFrac < 20]
```

## Seurat 

Start with the standard pre-processing steps for a Seurat object.

```{r setupSeurat, message = F, eval = F}
library(Seurat)
```

Create a Seurat object, and then perform SCTransform normalization. Note:

* You can use the legacy functions here (i.e., NormalizeData, ScaleData, etc.), use SCTransform or any other normalization method (including no normalization). We did not notice a significant difference in cell type annotations with different normalization methods.
* We think that it is best practice to use SCTransform, but it is not a necessary step. Signac will work fine without it.

```{r Seurat, message = T, eval = F}
# load data
synovium <- CreateSeuratObject(counts = E, project = "FACs")

# run sctransform
synovium <- SCTransform(synovium)
```

Perform dimensionality reduction by PCA and UMAP embedding. Note:

* Signac actually needs these functions since it uses the nearest neighbor graph generated by Seurat.

```{r Seurat 2, message = T, eval = F}
# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)
```

## SignacX

```{r setup2, message = F, eval = F}
library(SignacX)
```

Generate Signac labels for the Seurat object. Note:

* Optionally, you can do parallel computing by setting num.cores > 1 in the Signac function.

```{r Signac, message = T, eval = F}
labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)
```

Sometimes, training the neural networks takes a lot of time. The above classification took 27 minutes. To make a faster method, we implemented SignacFast which uses pre-trained models. Note:

* SignacFast uses an ensemble of 1,800 pre-calculated neural networks using the GenerateModels function together with the training_HPCA reference data set.
* Features that are absent from the single cell data and present in the neural network are set to zero.

```{r SignacFast, message = T, eval = F}
# Run SignacFast
labels_fast <- SignacFast(synovium, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = synovium)
```

Compare results:

Celltypes:
```{r Compare 1, echo = F, eval = T}
knitr::kable(table(Signac = celltypes$CellTypes, SignacFast = celltypes_fast$CellTypes), format = "html")
```

Cellstates:
```{r Compare 2, echo = F, eval = T}
knitr::kable(table(Signac = celltypes$CellStates, SignacFast = celltypes_fast$CellStates), format = "html")
```

Save results
```{r save results, message = F, eval = F}
saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds")
saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds")
```

```{r save.times, include = FALSE, eval = F}
write.csv(x = t(as.data.frame(all_times)), file = "fls/tutorial_times_SignacFast_AMP.csv")
```

<details>
  <summary>**Session Info**</summary>
```{r, echo=FALSE}
sessionInfo()
```
</details>
