---
title: "SelectBoost.beta algorithms"
shorttitle: "SelectBoost.beta algorithms"
author: 
- name: "Frédéric Bertrand"
  affiliation: 
  - Cedric, Cnam, Paris
  email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SelectBoost.beta algorithms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")

knitr::opts_chunk$set(purl = LOCAL)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressPackageStartupMessages(library(SelectBoost.beta))
set.seed(321)
```

## Motivation

`SelectBoost.beta` re-uses the correlated-resampling machinery introduced by the
original SelectBoost package and combines it with Beta-regression selectors.
This vignette summarises the main routines and presents pseudo-code for their
internal logic. The goal is to make it easy to re-implement or extend the
algorithms in other contexts.

## Building blocks

The following helpers expose the canonical SelectBoost stages.

- `sb_normalize()` centres and \(\ell_2\)-normalises the design matrix columns.
- `sb_compute_corr()` computes a correlation (or user-supplied association)
  matrix from the normalised design.
- `sb_group_variables()` converts the correlation matrix into groups of highly
  associated predictors for a given threshold \(c_0\).
- `sb_resample_groups()` regenerates correlated predictors for each group by
  drawing from a multivariate normal approximation and re-normalising. When all
  groups are singletons, it issues a warning and returns repeated copies of the
  normalised design.
- `sb_apply_selector_manual()` applies a selector to each resampled design and
  collects the resulting coefficient vectors. Set `keep_template = TRUE` (the
  default) to retain the base fit as column `sim0` without recomputing it on the
  first resample.
- `sb_selection_frequency()` converts the matrix of coefficients into selection
  frequencies while respecting the selector's coefficient convention.
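
The first three stages can be illustrated with a few lines of base R. This is
a minimal sketch, not the package's internal code: `normalize_cols()` and
`group_by_threshold()` are hypothetical stand-ins, and the grouping rule
(every predictor whose absolute correlation with variable `j` reaches `c0`)
is assumed here as one simple choice.

```r
# Hypothetical base-R stand-ins; not the package's internal code.
normalize_cols <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))              # centre each column
  sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")      # scale each column to unit l2 norm
}

group_by_threshold <- function(Corr, c0) {
  # Group j holds every predictor whose absolute correlation with j is >= c0.
  lapply(seq_len(ncol(Corr)), function(j) which(abs(Corr[, j]) >= c0))
}

set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),       # x1 and x2 share the signal z
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))                     # x3 is independent noise

X_norm <- normalize_cols(X)
Corr <- crossprod(X_norm)   # equals cor(X) for centred, unit-norm columns
groups <- group_by_threshold(Corr, c0 = 0.9)
```

Because the columns are centred and have unit norm, the cross-product of the
normalised design is already the Pearson correlation matrix, which is why
normalisation comes first.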

## Pseudo-code: manual workflow

The manual SelectBoost workflow follows the same steps regardless of the base
selector. Pseudo-code for producing selection frequencies at a single threshold
is given below.

```text
Procedure ManualSelectBoost(X, Y, selector, c0, B):
  1. X_norm <- sb_normalize(X)
  2. Corr <- sb_compute_corr(X_norm)
  3. Groups <- sb_group_variables(Corr, c0)
  4. Resamples <- sb_resample_groups(X_norm, Groups, B)
  5. CoefMatrix <- sb_apply_selector_manual(X_norm, Resamples, Y, selector)
  6. Frequencies <- sb_selection_frequency(CoefMatrix, version = "glmnet")
  7. Return Frequencies
```

In practice, `sb_resample_groups()` leaves singleton groups untouched: only
groups with two or more predictors receive correlated draws.
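
To make the resampling step concrete, the sketch below redraws one group from
a multivariate normal fitted to its columns and then re-normalises the draw.
It is an illustration under stated assumptions: `resample_group()` is a
hypothetical helper, and the exact approximation used inside
`sb_resample_groups()` may differ.

```r
# Hypothetical helper: redraw the columns of one group from a fitted
# multivariate normal, then centre and rescale them to unit l2 norm.
resample_group <- function(X_norm, idx) {
  if (length(idx) < 2L) return(X_norm)        # singleton: nothing to redraw
  block <- X_norm[, idx, drop = FALSE]
  eig <- eigen(cov(block), symmetric = TRUE)
  # Symmetric square root of the covariance: V diag(sqrt(lambda)) V'
  root <- eig$vectors %*% (sqrt(pmax(eig$values, 0)) * t(eig$vectors))
  draw <- matrix(rnorm(length(block)), nrow(block)) %*% root
  draw <- sweep(draw, 2, colMeans(draw))
  out <- X_norm
  out[, idx] <- sweep(draw, 2, sqrt(colSums(draw^2)), "/")
  out
}

set.seed(42)
n <- 50
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))
Xc <- sweep(X, 2, colMeans(X))
X_norm <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")

# Five perturbed designs: x1 and x2 are redrawn jointly, x3 is left as-is.
resamples <- replicate(5, resample_group(X_norm, idx = c(1, 2)),
                       simplify = FALSE)
```

Each resampled design keeps the correlation structure within the group while
replacing the actual values, which is what lets the selector's stability be
assessed.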

## Pseudo-code: correlation grid driver

`sb_beta()` extends the manual workflow by iterating over a grid of correlation
thresholds. The following pseudo-code matches the behaviour of the exported
function.

```text
Algorithm sb_beta(X, Y, selector, B, step.num, steps.seq, version, squeeze):
  1. If squeeze, transform Y into the open unit interval.
  2. X_norm <- sb_normalize(X)
  3. Corr <- sb_compute_corr(X_norm)
  4. Grid <- {1} ∪ .sb_c0_sequence(Corr, step.num, steps.seq) ∪ {0}
  5. For each c0 in Grid:
       a. Groups <- sb_group_variables(Corr, c0)
       b. If every group has size 1:
            i. CoefMatrix <- selector(X_norm, Y)
          Else:
            i. Resamples <- sb_resample_groups(X_norm, Groups, B)
           ii. For b in 1..B:
                  - CoefMatrix[, b] <- selector(Resamples[b], Y)
       c. Freq[c0, ] <- sb_selection_frequency(CoefMatrix, version)
  6. Attach attributes (B, selector, c0 sequence) and return Freq
```

The selector argument can be any function returning a numeric vector of
coefficients with optional names. When `version = "glmnet"`, the first entry is
interpreted as the intercept and excluded from the selection frequencies.
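
The selector contract can be sketched as follows. Everything here is
hypothetical: `toy_selector()` is a stand-in that uses least squares on the
logit scale rather than a Beta regression, and `selection_frequency()` only
mimics the intercept-first convention described above.

```r
# Toy selector (hypothetical, not part of the package): least squares on the
# logit scale, zeroing every slope whose |t| statistic is below 2.
toy_selector <- function(X, Y) {
  fit <- summary(lm(qlogis(Y) ~ X))$coefficients
  beta <- fit[, "Estimate"]
  keep <- abs(fit[, "t value"]) >= 2
  keep[1] <- TRUE                             # never drop the intercept
  beta[!keep] <- 0
  names(beta) <- c("(Intercept)", colnames(X))
  beta
}

# Intercept-first convention: drop row 1, then count non-zero coefficients.
selection_frequency <- function(CoefMatrix) {
  rowMeans(CoefMatrix[-1, , drop = FALSE] != 0)
}

set.seed(7)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
Y <- plogis(0.5 + 2 * X[, "x1"] + rnorm(n, sd = 0.5))  # strictly inside (0, 1)

# Stand-in for resampled designs: 20 noisy copies of X.
designs <- replicate(20, X + rnorm(length(X), sd = 0.1), simplify = FALSE)
CoefMatrix <- sapply(designs, toy_selector, Y = Y)     # one column per design
freq <- selection_frequency(CoefMatrix)
```

The toy selector assumes a full-rank design with column names. A real
selector would typically wrap a penalised or stepwise Beta-regression fit
and return its coefficients in the same layout.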

The squeezing step applies the usual SelectBoost transformation that pushes all
responses inside `(0, 1)`. Keep it enabled unless the outcome has already been
pre-processed: responses equal to exactly zero or one cause the selectors to
abort.
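
As an illustration, one widely used squeezing rule for Beta regression maps
a response `y` observed on `[0, 1]` to `(y * (n - 1) + 0.5) / n`, where `n`
is the sample size. Whether `sb_beta()` uses exactly this formula is an
assumption; `squeeze_unit()` below is a hypothetical helper written only to
show the effect.

```r
# Hypothetical helper: shrink responses on [0, 1] into the open interval (0, 1).
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0.25, 0.5, 1)
y_squeezed <- squeeze_unit(y)
y_squeezed
# 0.125 0.3125 0.5 0.875
```

The boundary values 0 and 1 move strictly inside the interval, while the
ordering of the responses is preserved.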

## Extending the algorithms

The modular helpers are designed to be recomposed. For example, it is possible
to plug in a custom grouping routine before calling `sb_resample_groups()` or to
supply a selector that implements cross-validation or penalisation strategies.
Because each helper only relies on basic R primitives, the pseudo-code above
translates readily into other languages.
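
A custom grouping routine could, for instance, rely on hierarchical
clustering. The sketch below is only an illustration: `group_by_hclust()` is
a hypothetical function, and it assumes that the resampling step accepts a
list holding one index vector per predictor.

```r
# Hypothetical grouping routine: complete-linkage clustering on 1 - |Corr|,
# cut so that every pair inside a cluster has |correlation| >= c0.
group_by_hclust <- function(Corr, c0) {
  tree <- hclust(as.dist(1 - abs(Corr)), method = "complete")
  labels <- cutree(tree, h = 1 - c0)
  # One entry per predictor: the indices of all members of its cluster.
  lapply(seq_along(labels), function(j) which(labels == labels[j]))
}

vars <- c("x1", "x2", "x3")
Corr <- matrix(c(1.00, 0.95, 0.10,
                 0.95, 1.00, 0.20,
                 0.10, 0.20, 1.00),
               nrow = 3, dimnames = list(vars, vars))
groups <- group_by_hclust(Corr, c0 = 0.9)
```

Unlike a per-variable neighbourhood rule, clustering yields disjoint groups,
so two predictors are either always resampled together or never.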


## Conference communications

The SelectBoost4Beta concepts described here were showcased by Frédéric
Bertrand and Myriam Maumy in 2023 at:

- Joint Statistical Meetings 2023 (Toronto, Canada): "Improving variable
  selection in Beta regression models using correlated resampling".
- BioC2023 (Boston, USA): "SelectBoost4Beta: Improving variable selection in
  Beta regression models".

These communications detailed how correlation-aware resampling strengthens
variable selection performance for Beta regression under strong predictor
dependencies.
