---
title: "Interactive Volcano Plot"
format: 
    html:
        embed-resources: true
        fontsize: 1.1em
        linestretch: 1.7
author: "Guangchuang Yu\\

        School of Basic Medical Sciences, Southern Medical University"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{ivolcano introduction}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r results='hide', echo=FALSE, message=FALSE}
library(yulab.utils)
```


## Example Data and Differential Analysis

```{r}
#| echo: false
#| include: false
library(org.Hs.eg.db)
```

```{r}
#| eval: false
library(airway)
library(DESeq2)
library(org.Hs.eg.db)
library(AnnotationDbi)

data("airway")

counts_mat <- assay(airway)
airway <- airway[rowSums(counts_mat) > 1, ]

dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- DESeq(dds)
res <- results(dds, contrast = c("dex", "trt", "untrt"))

df <- as.data.frame(res)
df$gene_id <- rownames(df)
df$symbol <- mapIds(org.Hs.eg.db, 
                    keys = df$gene_id,
                    column = "SYMBOL", 
                    keytype = "ENSEMBL", 
                    multiVals = "first")

df <- df[!is.na(df$symbol), ]
df <- df[, c("log2FoldChange", "padj", "symbol")]

head(df)
```

```{r}
#| echo: false
f <- system.file('extdata/airway.rds', package='ivolcano')
df <- readRDS(f)
head(df)
```

## Interactive Volcano Plot Visualization

By default, it creates an interactive plot. If you pass the parameter `interactive = FALSE`, it will generate a regular static `ggplot` figure.

In the interactive plot, hovering the mouse will display gene name, logFC, and adjusted P values information.

`onclick_fun` parameter can be used to define custom actions when clicking on data points. Here we use the built-in function `onclick_genecards` to open GeneCards page for the clicked gene.

Another options for redirecting to external websites can be:

+ `onclick_ensembl` to open Ensembl page for the clicked gene.
+ `onclick_ncbi` to open NCBI page for the clicked gene.
+ `onclick_hgnc` to open HGNC page for the clicked gene.
+ `onclick_uniprot` to open UniProt page for the clicked gene.
+ `onclick_pubmed` to open PubMed page for the clicked gene.


```{r}
#| label: ivocano
#| fig-width: 10
#| fig-height: 6

library(ivolcano)

p <- ivolcano(df,
        logFC_col = "log2FoldChange",
        pval_col = "padj",
        gene_col = "symbol",
        top_n = 5,
        onclick_fun=onclick_genecards)

print(p)
```


We can also use the `r CRANpkg("fanyi")` package to retrieve gene information from NCBI and display it on the plot. Here we also translate the gene 'summary' information into Chinese. 

```{r}
library(org.Hs.eg.db)
df$gene_id <- rownames(df)
df$entrez <- mapIds(org.Hs.eg.db, 
                    keys = df$gene_id,
                    column = "ENTREZID", 
                    keytype = "ENSEMBL", 
                    multiVals = "first")

top_eg <- df$entrez[order(df$padj)][1:50]

library(fanyi)
gs <- gene_summary(top_eg)

# not run, as it required api key
# see also https://github.com/YuLab-SMU/fanyi
#
# gs$summary_cn <- tencent_translate(gs$summary)

head(gs)
```

Define the `onclick` function to select multiple columns of information for display. Here we use the gene description (full name) and the summary information. The `onclick_fanyi` function will return a function definition according to our requirements, which can be passed to `ivolcano`. 

```{r}
#| label: ivolcano-fanyi
#| fig-width: 10
#| fig-height: 6

onclick_fun <- onclick_fanyi(gs, c("description", "summary"))

p2 <- ivolcano(df,
        logFC_col="log2FoldChange",
        pval_col="padj",
        gene_col="symbol",
        onclick_fun = onclick_fun)

print(p2)
```

## Point Size Adjustment


The `size_by` can be used to adjust the point size based on a specific variable or column in the data frame. Options include "negLogP" for negative log (adjusted) P-values, "absLogFC" for absolute log fold changes, or a column name in the data frame. 


```{r}
#| label: ivolcano-point-size
#| fig-width: 10
#| fig-height: 6

p3 <- ivolcano(df,
        logFC_col="log2FoldChange",
        pval_col="padj",
        gene_col="symbol",
        size_by = "negLogP")

print(p3)
```



If `size_by` is set to "manual", users can utilize the `point_size` parameter to define the point sizes for different categories. The default setting is `list(base = 2, medium = 4, large = 6)`, where `base` sets the size for non-significant points, `medium` for points meeting both `pval_cutoff` and `logFC_cutoff`, and `large` for points meeting both `pval_cutoff2` and `logFC_cutoff2` if these parameters are provided.


```{r}
#| label: ivolcano-point-size-manual
#| fig-width: 10
#| fig-height: 6

p4 <- ivolcano(df,
        logFC_col="log2FoldChange",
        pval_col="padj",
        gene_col="symbol",
        pval_cutoff = 0.05,
        pval_cutoff2 = 0.01,
        logFC_cutoff = 1,
        logFC_cutoff2 = 2,
        size_by = "manual")

print(p4)
```

## FigureYa Themed Volcano Plot

The `ivolcano` function integrates the FigureYa themed volcano plot, which can generate more aesthetically pleasing academic publication-level volcano plots. This is mainly achieved by specifying more stringent thresholds through the `pval_cutoff2` and `logFC_cutoff2` parameters, and the function will automatically apply the FigureYa color theme.


```{r}
#| label: figureya-basic
#| fig-width: 10
#| fig-height: 6

# load example data
f1 <- system.file('extdata/easy_input_limma.rds', package='ivolcano')
df_limma <- readRDS(f1)

p5 <- ivolcano(df_limma, 
        logFC_col = "logFC",
        pval_col = "P.Value",
        pval_cutoff = 0.05,
        pval_cutoff2 = 0.01,
        logFC_cutoff = 1,
        logFC_cutoff2 = 2,
        gene_col = "X")
print(p5)
```

## Custom Gene Display and Styles

### Highlight Top Significant Genes

The `ivolcano` function provide functionality for custom gene display. You can use `top_n` to specify displaying the top N most significant genes (by default). By default, it shows the top N most significant upregulated genes and top N most significant downregulated genes (total 2N genes). If you set `label_mode = "all"`, it will display the top N most significant genes including both upregulated and downregulated (total N genes).

### Highlight Selected Genes

If you define the `filter` parameter, it will filter genes based on that condition for display. In this case, the `top_n` parameter will not take effect.

You can specify genes of interest for highlighting.


```{r}
#| label: figureya-selected
#| fig-width: 10
#| fig-height: 6

# load selected gene data
f2 <- system.file('extdata/easy_input_selected.rds', package='ivolcano')
selected <- readRDS(f2)
genes <- selected[,1]
genes

# display selected genes
# here X is the column in our volcano plot data frame that contains gene names

p6 <- ivolcano(df_limma, 
        logFC_col = "logFC",
        pval_col = "P.Value",
        pval_cutoff = 0.05,
        pval_cutoff2 = 0.01,
        logFC_cutoff = 1,
        logFC_cutoff2 = 2,
        gene_col = "X",
        filter = 'X %in% genes')
print(p6)
```

You can write judgement conditions to filter genes, for example, you can automatically mark genes with particular large LogFC values.

```{r}
#| label: figureya-extreme
#| fig-width: 10
#| fig-height: 6

# Mark genes with extreme logFC values
p7 <- ivolcano(df_limma, 
        logFC_col = "logFC",
        pval_col = "P.Value",
        pval_cutoff = 0.05,
        pval_cutoff2 = 0.01,
        logFC_cutoff = 1,
        logFC_cutoff2 = 2,
        gene_col = "X",
        filter = 'logFC > 8')
        
print(p7)
```

## Pathway and Volcano Plot Linkage

The `pathway_volcano` function allows you to combine an interactive pathway dotplot (generated by `idotplot`) and an interactive volcano plot (generated by `ivolcano`). Clicking on a pathway in the dotplot will highlight the corresponding genes in the volcano plot.

```{r}
#| label: pathway-volcano-linkage
#| fig-width: 12
#| fig-height: 6
#| message: false
#| warning: false

library(clusterProfiler)
library(ivolcano)

# 1. Perform Enrichment Analysis
# We use the provided airway dataset
# Need to map to ENTREZID for enrichment analysis
library(org.Hs.eg.db)

# Select significant genes
sig_genes <- df$entrez[df$padj < 0.05 & abs(df$log2FoldChange) > 1]
sig_genes <- sig_genes[!is.na(sig_genes)]

# Run GO enrichment
ego <- enrichGO(gene = sig_genes,
                OrgDb = org.Hs.eg.db,
                ont = "BP",
                pAdjustMethod = "BH",
                pvalueCutoff = 0.01,
                qvalueCutoff = 0.05)

# Convert gene ID back to Symbol for matching with volcano plot
ego <- setReadable(ego, OrgDb = org.Hs.eg.db, keyType="ENTREZID")

# 2. Create Interactive Dotplot
p_dot <- idotplot(ego, showCategory = 10)

# 3. Create Interactive Volcano Plot
# Ensure gene_col matches the gene symbols in enrichment result
p_vol <- ivolcano(df, 
                  logFC_col = "log2FoldChange", 
                  pval_col = "padj", 
                  gene_col = "symbol",
                  title = "Volcano Plot",
                  interactive = TRUE)

# 4. Combine and Link
# You can adjust relative widths using the widths parameter
g <- pathway_volcano(p_dot, p_vol, widths = c(1.2, 1))

g
```

