---
title: "Introduction to FracFixR"
author: "Alice Cleynen, Agin Ravindran, Nikolay Shirokikh"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to FracFixR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

FracFixR is a compositional statistical framework for analyzing RNA-seq data from fractionation experiments. It addresses the fundamental challenge where library preparation and sequencing depth obscure the original proportions of RNA fractions.

This vignette demonstrates:

1. Basic usage with polysome profiling data
2. Differential proportion testing
3. Visualization of results
4. Advanced applications

## Installation

```{r installation, eval=FALSE}
# From CRAN
install.packages("FracFixR")

# From GitHub (development version)
devtools::install_github("Arnaroo/FracFixR")
```

## Quick Start Example

Let's simulate a simple polysome profiling experiment:

```{r load_package}
library(FracFixR)

# Set seed for reproducibility
set.seed(123)
```

### Creating Example Data

```{r create_data}
# Simulate count data for 500 genes across 12 samples
n_genes <- 100
n_samples <- 12

# Generate count matrix with varying expression levels
total_counts <- matrix(
  rnbinom(n_genes * 4, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(
    paste0("Gene", 1:n_genes),
    paste0("Sample", 1:4)
  )
)
prob=c(1/4,1/4,1/2)

# distribute counts evenly accross different fractions in Control
multinom_samples <- apply(total_counts, c(1, 2), function(x) rmultinom(1, size = x, prob = prob))
reshaped <- aperm(multinom_samples, c(3, 2, 1))  # now (n, 4, 3)
reshaped_matrix1 <- matrix(reshaped, nrow = n_genes, ncol = 12)
colnames(reshaped_matrix1) <- paste0("V", rep(1:4, each = 3), "_p", 1:3)

counts<-cbind(total_counts[,1:2],reshaped_matrix1[,c(2:3,5:6)],total_counts[,3:4],reshaped_matrix1[,c(8:9,11:12)])

# Create annotation data frame
annotation <- data.frame(
  Sample = colnames(counts),
  Condition = rep(c("Control", "Treatment"), each = 6),
  Type = rep(c("Total", "Total", "Light_Polysome", "Heavy_Polysome","Light_Polysome", "Heavy_Polysome"), 2),
  Replicate = c("Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2), "Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2))
)

print(head(annotation))
```

### Running FracFixR

```{r run_fracfixr}
# Run the main analysis
results <- FracFixR(MatrixCounts = counts, Annotation = annotation)

# View the structure of results
names(results)
```

### Understanding the Output

The FracFixR output contains several components:

```{r explore_output}
# 1. Fraction proportions for each replicate
print(results$Fractions)

# 2. Regression coefficients
print(results$Coefficients)

# 3. Estimated Proportions (first 5 genes, first 6 samples)
print(results$Propestimates[1:5, 1:6])
```

## Visualizing Fraction Proportions

```{r plot_fractions, fig.cap="Fraction proportions across replicates"}
# Create fraction plot
frac_plot <- PlotFractions(results)
print(frac_plot)
```

The plot shows:
- The proportion of RNA in each fraction
- The "Lost" fraction (grey) representing unrecoverable material
- Consistency across replicates

## Differential Proportion Testing

Now let's identify genes with differential polysome association between conditions:

```{r diff_test}
# Test for differential proportion in heavy polysomes
diff_heavy <- DiffPropTest(
  NormObject = results,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  Test = "Logit"
)

# View top differentially associated genes
top_genes <- diff_heavy[order(diff_heavy$padj), ]
print(head(top_genes, 10))
```

### Interpreting Results

- **mean_diff**: Difference in proportion between conditions
- **log2FC**: Log2 fold change of proportions
- **padj**: FDR-adjusted p-value

Positive mean_diff indicates higher proportion in Treatment.

## Creating a Volcano Plot

```{r volcano_plot, fig.cap="Volcano plot of differential polysome association"}
volcano <- PlotComparison(
  diff_heavy,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  cutoff = 20
)
print(volcano)
```

### Data Requirements

For real experiments, ensure your data meets these requirements:

1. **Count Matrix**: Raw counts (not normalized)
2. **Annotation**: Must include:
   - Sample names matching count matrix columns
   - At least one "Total" sample per condition-replicate
   - Consistent fraction names across replicates


## Session Info

```{r session_info}
sessionInfo()
```

## References

Cleynen A, Ravindran A, Shirokikh N (2025). FracFixR: A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data. 

For more examples and advanced usage, see the [FracFixR GitHub repository](https://github.com/Arnaroo/FracFixR).
