---
title: "Surprisal Analysis Guidelines"
output:
  rmarkdown::html_vignette:
    mathjax: default
vignette: >
  %\VignetteIndexEntry{Surprisal analysis guidelines}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Surprisal Analysis, an R package for information theoretic analysis of gene expression data

```{r}
library(SurprisalAnalysis)
library(ggplot2)
```

Read data and apply Surprisal analysis

```{r}
data <- read.csv(system.file("extdata", "helper_T_cell_0_test.csv.gz",
                             package = "SurprisalAnalysis"),
                 header = TRUE)
results <- surprisal_analysis(data)
transcript_weights <- results[[2]]
percentile_GO <- 0.95 # change based on your preference
lambda_no <- 2 # change based on your preference, lambda #1 is the baseline state
```

Run GO analysis

```{r, eval = FALSE}
GO.results <- GO_analysis_surprisal_analysis(transcript_weights, percentile_GO, lambda_no,
                                             key_type = "SYMBOL", flip = FALSE,
                                             species.db.str = "org.Mm.eg.db",
                                             top_GO_terms = 15)
```

The function `GO_analysis_surprisal_analysis()` runs Gene Ontology (GO) enrichment on the most influential transcripts from a chosen Surprisal pattern. Below are the input arguments:

- `transcript_weights`: A matrix of transcript weights, typically the second element (`[[2]]`) of the list returned by `surprisal_analysis()`.
- `percentile_GO`: A numeric value between 0 and 1 specifying the quantile cutoff for transcript selection. For example, 0.95 means only the top 5% of transcripts (by absolute weight) in the chosen $\lambda$ pattern are used.
- `lambda_no`: An integer specifying which $\lambda$ pattern to analyze. Note that $\lambda_1$ represents the balance state, while higher-order $\lambda$'s capture additional constraints or patterns.
- `key_type`: The type of transcript identifiers used in your data. Options include `"SYMBOL"` (gene symbols, e.g. TP53), `"ENTREZID"` (Entrez gene IDs), `"ENSEMBL"` (Ensembl IDs), and `"PROBEID"` (microarray probe IDs). This must match the ID format in your input dataset.
- `flip`: Logical (`TRUE`/`FALSE`). If `TRUE`, the transcript weights for the selected $\lambda$ are multiplied by -1 before the top quantile is selected. This is useful for keeping the selection consistent with the direction of the $\lambda$ plots.
- `species.db.str`: The organism database to use for gene mapping. Current options are `"org.Hs.eg.db"` for Homo sapiens (human) and `"org.Mm.eg.db"` for Mus musculus (mouse).
- `ont`: The GO ontology branch for enrichment analysis. Options are `"BP"` (Biological Process, the default), `"MF"` (Molecular Function), and `"CC"` (Cellular Component).
- `pAdjustMethod`: The multiple testing correction method. Options include `"BH"` (default), `"bonferroni"`, `"holm"`, `"hochberg"`, `"hommel"`, `"BY"`, and `"none"`.
- `top_GO_terms`: An integer specifying the number of top enriched GO terms to return (default: 15).
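
To make the roles of `percentile_GO`, `lambda_no`, and `flip` more concrete, the following base R sketch mimics a quantile-based selection on a toy weight matrix. It is an illustration only: `select_top_transcripts()` and `toy_weights` are hypothetical names that are not part of the SurprisalAnalysis package, and the sketch assumes the cutoff is applied to signed weights so that `flip` changes the result. The package's internal selection rule may differ.

```r
# Hypothetical helper for illustration; NOT part of SurprisalAnalysis.
# Assumption: the quantile cutoff is applied to signed weights.
select_top_transcripts <- function(weights, percentile, lambda_no, flip = FALSE) {
  stopifnot(is.matrix(weights),
            percentile > 0, percentile < 1,
            lambda_no >= 1, lambda_no <= ncol(weights))
  w <- weights[, lambda_no]   # weights of the chosen lambda pattern
  if (flip) w <- -w           # reverse the sign convention if requested
  cutoff <- quantile(w, probs = percentile, names = FALSE)
  names(w)[w > cutoff]        # transcripts above the quantile cutoff
}

# Toy weight matrix: 20 transcripts (rows) x 3 lambda patterns (columns).
set.seed(1)
toy_weights <- matrix(rnorm(60), nrow = 20,
                      dimnames = list(paste0("gene", 1:20),
                                      paste0("lambda", 1:3)))

# Top 5% of transcripts for lambda 2, without and with the sign flip.
top_up   <- select_top_transcripts(toy_weights, 0.95, lambda_no = 2)
top_down <- select_top_transcripts(toy_weights, 0.95, lambda_no = 2, flip = TRUE)
```

Under this assumption, `flip = TRUE` selects the transcripts at the opposite end of the weight distribution, which is why the argument matters when the sign of a $\lambda$ pattern is reversed relative to its plot.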
