---
title: "scCATCH tutorial"
author: "Xin Shao"
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
vignette: >
  %\VignetteIndexEntry{scCATCH tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

scCATCH (single cell Cluster-based auto-Annotation Toolkit for Cellular Heterogeneity) covers the workflow from identifying potential marker genes for each cluster to annotating the clusters. Annotation relies on an evidence-based score, obtained by matching the potential marker genes against known cell markers in CellMatch, a tissue-specific cell taxonomy reference database.

scCATCH provides two main functions, `findmarkergene()` and `findcelltype()`, which together annotate each identified cluster automatically.

scCATCH can annotate scRNA-seq data from tissue with or without cancer.

### General usage
[1] For scRNA-seq data, we suggest revising the gene symbols with `rev_gene()`. `geneinfo` is the system data.frame containing human and mouse gene information from NCBI Gene (updated June 19, 2022). To use your own `geneinfo` data.frame for another species (e.g., rat, zebrafish, Drosophila, C. elegans), please refer to `demo_geneinfo()` to build a new one.

```{r setup, echo=TRUE}
library(scCATCH)
load(system.file("extdata", "mouse_kidney_203.rda", package = "scCATCH"))

# demo_geneinfo
demo_geneinfo()

# revise gene symbols
mouse_kidney_203 <- rev_gene(data = mouse_kidney_203, data_type = "data", species = "Mouse", geneinfo = geneinfo)
```

[2] Create the scCATCH object with `createscCATCH()`. Users need to provide the normalized data and the cluster assignment for each cell.
```{r createscCATCH, echo=TRUE}
obj <- createscCATCH(data = mouse_kidney_203, cluster = mouse_kidney_203_cluster)
```
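
The input requirements can be checked before calling `createscCATCH()`. The sketch below uses a small made-up matrix and assumes that genes are in rows and cells in columns, with one cluster label per cell. `toy_data`, `toy_cluster`, and `input_ok` are hypothetical names used only for illustration and are not part of the package.

```{r sketch_input_check, echo=TRUE}
# Toy stand-in for a normalized expression matrix (assumption: genes x cells)
toy_data <- matrix(
  c(0.0, 1.2, 2.3,
    1.5, 0.0, 0.7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2", "cell3"))
)

# One cluster label per cell, in the same order as the columns
toy_cluster <- c("1", "1", "2")

# Basic sanity checks before building the object
input_ok <- is.numeric(toy_data) &&
  length(toy_cluster) == ncol(toy_data) &&
  !anyNA(toy_cluster)
input_ok
```

If a check like this fails, the mismatch between cells and cluster labels is worth resolving before creating the object.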

[3] Find highly expressed genes with `findmarkergene()`. Users need to provide the species, tissue, or cancer information. `cellmatch` is the system data.frame containing the known markers of human and mouse. To use your own marker data.frame for another species (e.g., rat, zebrafish, Drosophila, C. elegans), please refer to `demo_marker()` to build a new one.
```{r findmarkergene, echo=TRUE}
# demo_marker
demo_marker()

# find highly expressed genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney")
```

[4] Compute the evidence-based score and annotate each cluster with `findcelltype()`.
```{r findcelltype, echo=TRUE}
obj <- findcelltype(object = obj)

# Results are stored in obj
obj@celltype
```

Note: There are two methods for finding marker genes. Set `use_method = "1"` to compare each cluster with every other cluster individually, or `use_method = "2"` to compare each cluster with all other clusters pooled together, like the strategy in `Seurat`. With `use_method = "1"`, users can also set `comp_cluster`, the number of clusters to compare against. By default, each cluster is compared with all other clusters. Set it between 1 and the number of unique clusters. A smaller `comp_cluster` yields more marker genes.

```{r findcelltype_new, echo=TRUE}
# The strictest condition for identifying marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "1")

# The loosest condition for identifying marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "2")

# Other conditions for identifying marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "1", comp_cluster = 1)
```
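
To make the difference between the two comparison strategies concrete, here is a toy illustration in base R. It is not scCATCH's implementation: it only compares the mean expression of one made-up gene and applies none of the package's statistical criteria. All names (`expr`, `cl`, `margin`, `pairwise_ok`, `pooled_ok`) and the margin value are hypothetical.

```{r sketch_compare_modes, echo=TRUE}
# Made-up normalized expression of a single gene in six cells
expr <- c(5, 6, 4, 5, 0, 1)
cl   <- c("A", "A", "B", "B", "C", "C")

target <- "A"
others <- setdiff(unique(cl), target)
target_mean <- mean(expr[cl == target])
margin <- 1  # hypothetical minimum difference in mean expression

# Pairwise strategy (in the spirit of use_method "1"):
# the target cluster must exceed each other cluster separately
pairwise_ok <- all(vapply(
  others,
  function(k) target_mean > mean(expr[cl == k]) + margin,
  logical(1)
))

# Pooled strategy (in the spirit of use_method "2"):
# the target cluster is compared with all other cells taken together
pooled_ok <- target_mean > mean(expr[cl != target]) + margin

c(pairwise = pairwise_ok, pooled = pooled_ok)
```

In this toy case the gene passes the pooled comparison but fails the pairwise one, because cluster B expresses it almost as strongly as cluster A. This is the sense in which the pairwise strategy is the stricter of the two.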

Moreover, users can adjust `cell_min_pct`, `logfc`, and `pvalue` to change which genes are identified as marker genes.
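
As a rough illustration of how such cut-offs interact, the sketch below filters a made-up table of candidate genes. The column names (`pct`, `logfc`, `pvalue`), the threshold values, and the helper `filter_candidates()` are assumptions for this example only; they do not reflect the package's internal data structures or defaults.

```{r sketch_thresholds, echo=TRUE}
# Made-up candidate genes with hypothetical summary statistics
candidates <- data.frame(
  gene   = c("GeneA", "GeneB", "GeneC", "GeneD"),
  pct    = c(0.80, 0.30, 0.10, 0.60),
  logfc  = c(1.20, 0.30, 2.00, 0.20),
  pvalue = c(0.001, 0.04, 0.001, 0.20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep genes that satisfy all three cut-offs
filter_candidates <- function(x, min_pct, min_logfc, max_pvalue) {
  keep <- x$pct >= min_pct & x$logfc >= min_logfc & x$pvalue < max_pvalue
  x$gene[keep]
}

loose  <- filter_candidates(candidates, min_pct = 0.25, min_logfc = 0.25, max_pvalue = 0.05)
strict <- filter_candidates(candidates, min_pct = 0.50, min_logfc = 1.00, max_pvalue = 0.01)

loose
strict
```

Tightening any of the three cut-offs can only shrink the set of retained genes, so it is usually easier to start loose and tighten step by step.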

### Custom usage
Users can supply a custom `cellmatch` for cell type prediction when they want to [1] select a different combination of tissues or cancers for annotation; [2] add more marker genes to `cellmatch` for annotation; or [3] use markers from species other than human and mouse.
In these cases, set `if_use_custom_marker = TRUE` in `findmarkergene()`; `species`, `tissue`, and `cancer` do not need to be set.

[1] Different combinations of tissues or cancers
```{r findcelltype1, echo=TRUE}
# Example
cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)

# Example
cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)

# Example
cellmatch_new <- cellmatch[cellmatch$species == "Mouse", ]
cellmatch_new <- cellmatch_new[cellmatch_new$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer") | cellmatch_new$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
```
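
The subsetting logic can be tried on a small stand-in table first. The sketch below assumes only the three columns used in the examples (`species`, `tissue`, `cancer`) plus a hypothetical `gene` column; `toy_cellmatch` and its rows are invented and far smaller than the real `cellmatch`. Wrapping the tissue-or-cancer condition in parentheses keeps the species filter applied to both alternatives.

```{r sketch_custom_subset, echo=TRUE}
# Invented stand-in for a marker table (not the real cellmatch)
toy_cellmatch <- data.frame(
  species = c("Mouse", "Mouse", "Human", "Mouse"),
  tissue  = c("Kidney", "Lung", "Kidney", "Heart"),
  cancer  = c("Normal", "Lung Cancer", "Normal", "Normal"),
  gene    = c("GeneA", "GeneB", "GeneC", "GeneD"),
  stringsAsFactors = FALSE
)

# Mouse entries that match either the tissue list or the cancer list
keep <- toy_cellmatch$species == "Mouse" &
  (toy_cellmatch$tissue %in% c("Kidney", "Liver") |
     toy_cellmatch$cancer %in% c("Lung Cancer"))

toy_subset <- toy_cellmatch[keep, ]
toy_subset$gene
```

Inspecting the subset with `table()` on the `species`, `tissue`, and `cancer` columns is a quick way to confirm that no unintended rows slipped through.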

[2] Add more marker genes to `cellmatch` for annotation
```{r findcelltype2, echo=TRUE}
# Example

# cellmatch_new is provided by users
# cellmatch_new <- rbind(cellmatch, cellmatch_new)

# Then use the new cellmatch
# a. define the species, tissue, and cancer
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch_new, tissue = "Kidney")
obj <- findcelltype(obj)

# b. directly use custom cellmatch
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
```
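
Combining two marker tables with `rbind()` requires that both have the same columns. The sketch below shows one way to check this and to drop duplicated rows; it uses invented tables with a reduced, hypothetical set of columns (`species`, `tissue`, `celltype`, `gene`), so the real `cellmatch` layout should be taken from `demo_marker()` rather than from this example.

```{r sketch_combine_markers, echo=TRUE}
# Invented marker tables with a reduced, hypothetical column set
base_markers <- data.frame(
  species = "Mouse", tissue = "Kidney", celltype = "TypeX",
  gene = c("GeneA", "GeneB"),
  stringsAsFactors = FALSE
)
extra_markers <- data.frame(
  species = "Mouse", tissue = "Kidney", celltype = "TypeX",
  gene = c("GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# rbind() on data.frames needs matching column names
same_columns <- identical(colnames(base_markers), colnames(extra_markers))

# Stack the tables and remove rows that appear in both
combined <- unique(rbind(base_markers, extra_markers))
combined$gene
```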

[3] Use markers from different species
```{r findcelltype3, echo=TRUE}
# Please refer to demo_marker() to build a marker data.frame (cellmatch_new) for another species, e.g., rat
# Then use the new marker
obj <- findmarkergene(object = obj, species = "Rat", if_use_custom_marker = TRUE, marker = cellmatch_new, tissue = "Kidney")
obj <- findcelltype(obj)
```

### About
Please refer to [scCATCH](https://github.com/ZJUFanLab/scCATCH) on GitHub for more information. For the available tissues and cancers, see the [wiki page](https://github.com/ZJUFanLab/scCATCH/wiki).

### Cite
Shao et al., scCATCH: Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data, iScience, Volume 23, Issue 3, 27 March 2020. [doi: 10.1016/j.isci.2020.100882](https://www.sciencedirect.com/science/article/pii/S2589004220300663). [PMID: 32062421](https://pubmed.ncbi.nlm.nih.gov/32062421/)
