---
title: "Getting Started with BioMoR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with BioMoR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

BioMoR: Bioinformatics Modeling with Recursion and Autoencoder-Based Ensembles

BioMoR is an R package for bioinformatics modeling that integrates:

- Recursive Transformer architectures via Mixture-of-Recursions (MoR)
  (Bae et al. 2025, doi:10.48550/arXiv.2507.10524)
- Autoencoder-based representation learning
  (Hinton & Salakhutdinov 2006, doi:10.1126/science.1127647)
- Random Forests for robust tree-based modeling
  (Breiman 2001, doi:10.1023/A:1010933404324)
- XGBoost for efficient gradient boosting
  (Chen & Guestrin 2016, doi:10.1145/2939672.2939785)
- Stacked ensembles that combine diverse models for stronger predictive power.

It is designed as a benchmarking framework for predictive workflows in bioinformatics, enabling consistent cross-validation, calibration, and threshold optimization.
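
Cross-validation results are only comparable across models when every model is scored on the same folds. The vignette does not show how `get_cv_control()` builds its folds, so the chunk below is a base-R sketch of the general idea, not BioMoR code. It uses a hypothetical helper, `make_stratified_folds()`, that assigns each row to one of `k` folds while preserving class proportions, and a fixed seed so the split can be reproduced for every model.

```{r}
# Hypothetical helper (not part of BioMoR): stratified k-fold assignment.
# Note: set.seed() here changes the global RNG state (acceptable for a sketch).
make_stratified_folds <- function(y, k = 3, seed = 42) {
  set.seed(seed)
  y <- as.factor(y)
  fold <- integer(length(y))
  for (lvl in levels(y)) {
    idx <- which(y == lvl)
    # Spread this class's rows as evenly as possible over the k folds,
    # in random order, so each fold keeps roughly the overall class ratio
    fold[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  fold
}

y <- ifelse(iris$Species == "setosa", "Active", "Inactive")
folds <- make_stratified_folds(y, k = 3)
table(fold = folds, class = y)

# Training-row indices per resampling iteration; reuse this same list
# for every model so all comparisons are made on identical splits
train_idx <- lapply(1:3, function(f) which(folds != f))
```

A list of training indices in this shape is what caret's `trainControl()` accepts through its `index` argument; whether `get_cv_control()` exposes such an option is not assumed here.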

## Motivation

Modern bioinformatics involves high-dimensional, noisy data such as genomic, transcriptomic, and proteomic profiles. BioMoR addresses these challenges by:

- Using Mixture-of-Recursions (MoR) for adaptive recursion depth and computational efficiency.
- Learning latent embeddings with autoencoders to improve classifier generalization.
- Leveraging ensemble methods (Random Forests, XGBoost) for robustness.
- Providing a standardized benchmarking interface that evaluates models by ROC-AUC, PR-AUC, F1 score, balanced accuracy, and Brier score, and that supports calibration assessment and decision-threshold optimization.
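
The metrics listed above have standard definitions. The chunk below is a base-R sketch of them on made-up scores, using hypothetical helper names; it is not BioMoR's internal implementation. ROC-AUC uses the Mann-Whitney rank formulation, the threshold search simply keeps the cutoff that maximizes F1 on a fixed grid (one common choice among several), and PR-AUC is omitted for brevity.

```{r}
# Illustrative definitions only (hypothetical helpers, not BioMoR code).
# `y` is a 0/1 vector (1 = positive class); `p` is the predicted P(y = 1).

roc_auc <- function(y, p) {
  # Probability that a random positive outranks a random negative;
  # rank() averages ties, so tied pairs count as 1/2
  r <- rank(p)
  n_pos <- sum(y == 1); n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Brier score: mean squared error of the predicted probabilities
brier <- function(y, p) mean((p - y)^2)

class_metrics <- function(y, p, threshold = 0.5) {
  pred <- as.integer(p >= threshold)
  tp <- sum(pred == 1 & y == 1); fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1); tn <- sum(pred == 0 & y == 0)
  sens <- tp / (tp + fn); spec <- tn / (tn + fp)
  f1 <- if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
  c(f1 = f1, balanced_accuracy = (sens + spec) / 2)
}

# Calibration: mean predicted probability vs. observed positive rate per bin
calibration_table <- function(y, p, bins = 4) {
  bin <- cut(p, breaks = seq(0, 1, length.out = bins + 1), include.lowest = TRUE)
  data.frame(bin = levels(bin),
             mean_pred = as.vector(tapply(p, bin, mean)),
             obs_rate = as.vector(tapply(y, bin, mean)))
}

# Threshold optimization: scan candidate cutoffs, keep the F1 maximizer
best_threshold <- function(y, p, grid = seq(0.05, 0.95, by = 0.05)) {
  f1s <- vapply(grid, function(t) class_metrics(y, p, t)[["f1"]], numeric(1))
  grid[which.max(f1s)]
}

# Toy labels and scores (made up for illustration)
y <- c(1, 1, 1, 0, 0, 0, 1, 0)
p <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1)
roc_auc(y, p)
brier(y, p)
class_metrics(y, p)
calibration_table(y, p)
best_threshold(y, p)
```

Choosing the cutoff on the same data used to report metrics gives optimistic numbers; in practice the threshold is usually selected on validation or out-of-fold predictions.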

## Example Workflow

We illustrate the workflow with the classic `iris` dataset, recoded to a binary outcome (setosa vs. the other two species) for simplicity:

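```{r, message=FALSE}
library(BioMoR)

# Prepare dataset: recode Species into a binary outcome
# (setosa = "Active", versicolor/virginica = "Inactive")
data(iris)
iris$Label <- factor(
  ifelse(iris$Species == "setosa", "Active", "Inactive"),
  levels = c("Active", "Inactive")
)
# Drop Species: it determines Label exactly and would leak the outcome
iris$Species <- NULL

# Cross-validation control (3 folds)
ctrl <- get_cv_control(cv = 3)

# Train a Random Forest
fit <- train_rf(iris, outcome_col = "Label", ctrl = ctrl)

# Benchmark the model; this evaluates on the training data,
# so the resulting metrics are optimistic
results <- biomor_benchmark(fit, iris, outcome_col = "Label")
results
```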

You can further extend this workflow by:

- Replacing `train_rf()` with `train_xgb_caret()` for XGBoost.
- Incorporating autoencoder features via `train_autoencoder()` and `get_embeddings()`.
- Using `train_biomor()` to stack multiple models.
- Benchmarking across models to compare pipelines in one consistent framework.
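
Stacking fits a meta-learner on the out-of-fold predictions of several base models. The interface of `train_biomor()` is not documented in this vignette, so the chunk below is a generic base-R sketch of the stacking idea rather than its implementation. Two logistic regressions on simulated data stand in for the Random Forest, XGBoost, and autoencoder-based models; all variable names are hypothetical.

```{r}
# Generic stacking sketch (not train_biomor()'s implementation).
# Simulated binary data; logistic regressions stand in for real base models.
set.seed(1)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.6 * dat$x2^2 + 0.5))

# Two deliberately different base learners
base_formulas <- list(
  linear    = y ~ x1 + x2,
  quadratic = y ~ x1 + I(x2^2)
)

k <- 5
folds <- sample(rep_len(seq_len(k), n))

# Level-one data: out-of-fold probabilities from each base learner, so the
# meta-learner never sees predictions made on a model's own training rows
oof <- matrix(NA_real_, n, length(base_formulas),
              dimnames = list(NULL, names(base_formulas)))
for (f in seq_len(k)) {
  train <- dat[folds != f, ]
  test  <- dat[folds == f, ]
  for (m in names(base_formulas)) {
    fit_m <- glm(base_formulas[[m]], family = binomial, data = train)
    oof[folds == f, m] <- predict(fit_m, newdata = test, type = "response")
  }
}

# Meta-learner: logistic regression on the base learners' logits
meta_dat <- data.frame(apply(oof, 2, qlogis), y = dat$y)
meta_fit <- glm(y ~ ., family = binomial, data = meta_dat)
coef(meta_fit)

# Scoring new data: refit base learners on all rows, then combine via meta_fit
full_fits <- lapply(base_formulas, glm, family = binomial, data = dat)
new_x <- data.frame(x1 = c(-1, 0, 1), x2 = c(0, 1, -1))
level1 <- sapply(full_fits, function(fm)
  qlogis(predict(fm, newdata = new_x, type = "response")))
stacked <- predict(meta_fit, newdata = as.data.frame(level1), type = "response")
stacked
```

Fitting the meta-learner on out-of-fold rather than in-sample predictions is the key design choice: it keeps the combiner from rewarding base models that merely memorize their training rows.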
