---
title: "SelectBoost for Beta regression"
shorttitle: "SelectBoost for Beta regression"
author: 
- name: "Frédéric Bertrand"
  affiliation: 
  - Cedric, Cnam, Paris
  email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SelectBoost for Beta regression}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")

knitr::opts_chunk$set(purl = LOCAL)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressPackageStartupMessages(library(SelectBoost.beta))
set.seed(321)
```

## Overview

The new `sb_beta()` helper glues the beta-regression selectors provided by this
package to a SelectBoost-style correlated-resampling loop implemented directly
in `SelectBoost.beta`. It takes care of squeezing the response inside the open
unit interval (unless `squeeze = FALSE`) and tagging the output with the
selector that was used.
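
To make the squeezing step concrete, the sketch below applies the transformation proposed by Smithson and Verkuilen (2006), a common way to pull boundary values into the open unit interval. Whether `sb_beta()` uses exactly this rule is an assumption, and `squeeze_unit()` is a hypothetical helper written for this illustration, not a package function.

```r
# Hypothetical helper (not part of SelectBoost.beta): shrink values in [0, 1]
# towards 0.5 so that exact zeros and ones land strictly inside (0, 1).
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y_raw <- c(0, 0.25, 0.5, 1)   # includes both boundary values
y_sq <- squeeze_unit(y_raw)
range(y_sq)
```

The amount of shrinkage decreases with the sample size, so interior observations are barely moved in realistic data sets.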

This vignette walks through two complementary perspectives:

1. Reconstructing the SelectBoost workflow step by step with
   `betareg_step_aic()` to highlight where correlated resampling happens.
2. Calling `sb_beta()` to obtain the same result with a single function call.

Throughout the examples we rely on the built-in simulator to generate correlated
design matrices with a handful of truly associated predictors.

```{r, cache=TRUE, eval=LOCAL}
sim <- simulation_DATA.beta(
  n = 150, p = 6, s = 3, beta_size = c(1, -0.8, 0.6),
  corr = "ar1", rho = 0.25,
  mechanism = "jitter"
)
str(sim$X)
summary(sim$Y)
```
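
For readers who want to see what such a design looks like without the simulator, here is a minimal base-R sketch of an AR(1)-correlated design with a Beta response in the mean-precision parameterisation. The logit link, the precision `phi`, and the choice of the first three columns as active predictors are assumptions made for illustration; the `"jitter"` mechanism of `simulation_DATA.beta()` is not reproduced.

```r
set.seed(1)
n <- 150
p <- 6
rho <- 0.25

# AR(1) correlation: entry (i, j) equals rho^|i - j|
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))

# Correlated Gaussian predictors via the Cholesky factor of Sigma
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
colnames(X) <- paste0("x", seq_len(p))

# Mean-precision Beta response; only the first three predictors are active
# (assumed layout, for illustration only)
beta <- c(1, -0.8, 0.6, 0, 0, 0)
mu <- plogis(drop(X %*% beta))
phi <- 20  # assumed precision
Y <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)

round(Sigma[1, ], 3)
```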

## Manual SelectBoost workflow with beta selectors

The classic SelectBoost algorithm first normalises the design matrix, computes
pairwise correlations, groups variables whose absolute correlation exceeds a
chosen threshold and finally resamples the predictors within each group before
applying the selector. All of those stages are available directly in
`SelectBoost.beta`.

```{r, cache=TRUE, eval=LOCAL}
# Normalise the predictors (centre + L2 scale)
X_norm <- sb_normalize(sim$X)

# Compute correlations
corr_mat <- sb_compute_corr(X_norm)

# Group variables whose absolute correlation exceeds 0.6
raw_groups <- sb_group_variables(corr_mat, c0 = 0.6)

# Draw eight correlated replicas for the grouped variables
X_draws <- sb_resample_groups(X_norm, raw_groups, B = 8, seed = 11)

dim(X_draws[[1]])
```
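
The first three stages can be approximated in a few lines of base R, which may help to see what the helpers compute. This is a sketch on a toy matrix under the assumption that the package helpers follow the description above (centring, L2 scaling, thresholding absolute correlations); it is not their actual implementation, and the resampling stage is not reproduced.

```r
set.seed(2)
X <- matrix(rnorm(100 * 4), 100, 4,
            dimnames = list(NULL, paste0("x", 1:4)))
X[, 2] <- X[, 1] + 0.1 * rnorm(100)  # make x2 strongly correlated with x1

# Centre each column and rescale it to unit Euclidean (L2) norm
X_c <- sweep(X, 2, colMeans(X))
X_n <- sweep(X_c, 2, sqrt(colSums(X_c^2)), "/")

# For centred, unit-norm columns the cross-product is the correlation matrix
R <- crossprod(X_n)

# Group j holds every variable whose absolute correlation with x_j exceeds c0
c0 <- 0.6
groups <- lapply(seq_len(ncol(R)), function(j) which(abs(R[, j]) > c0))
names(groups) <- colnames(X)
groups
```

Every variable belongs to its own group because its correlation with itself is one, so a singleton group means that no other predictor passed the threshold.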

Each element of `X_draws` stores a correlated copy of the normalised design.
Feeding these matrices to `sb_apply_selector_manual()` together with a
beta-regression selector yields coefficient estimates for every resampled data
set.

```{r, cache=TRUE, eval=LOCAL}
coef_path <- sb_apply_selector_manual(
  X_norm, X_draws, sim$Y, selector = betareg_step_aic
)

dim(coef_path)
coef_path[, 1:3]
```

The leading column `sim0` records the coefficients fitted on the original
normalised design, providing a convenient baseline against which the resampled
paths can be compared.
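
The structure of that coefficient path can be mimicked with a short loop. The sketch below swaps in an ordinary least-squares toy selector with a Gaussian response instead of a beta-regression selector, and uses noisy copies of the design as stand-ins for correlated replicas; `toy_selector()` and the column layout (one column per fit, `sim0` first) are assumptions made for illustration.

```r
# Hypothetical selector: OLS fit, coefficients with |t| < 2 are set to zero.
# Returns an intercept-plus-predictors vector, as a glmnet-style selector would.
toy_selector <- function(X, y) {
  tab <- summary(lm(y ~ X))$coefficients
  keep <- abs(tab[, "t value"]) >= 2
  keep[1] <- TRUE  # always keep the intercept
  est <- tab[, "Estimate"] * keep
  setNames(est, c("(Intercept)", colnames(X)))
}

set.seed(3)
X <- matrix(rnorm(80 * 3), 80, 3, dimnames = list(NULL, paste0("x", 1:3)))
y <- 1 + 2 * X[, 1] + rnorm(80)

# Stand-ins for correlated replicas: noisy copies of X (illustration only)
draws <- replicate(4, X + matrix(rnorm(80 * 3, sd = 0.1), 80, 3),
                   simplify = FALSE)

# Baseline design first, then one column of coefficients per replica
designs <- c(list(sim0 = X), setNames(draws, paste0("sim", seq_along(draws))))
coef_mat <- sapply(designs, toy_selector, y = y)
dim(coef_mat)
```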

Finally, the `sb_selection_frequency()` helper counts how often each variable
appears with a non-zero coefficient across the replicates. Because
`betareg_step_aic()` returns a `glmnet`-style coefficient vector (intercept plus
predictors), we set `version = "glmnet"` when computing the selection
frequencies.

```{r, cache=TRUE, eval=LOCAL}
sel_freq <- sb_selection_frequency(coef_path, version = "glmnet")
sel_freq
```
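
Conceptually, the frequency is the share of fits in which a coefficient is non-zero once the intercept row has been dropped. The sketch below shows that computation on a small matrix whose values are invented for illustration; counting the baseline column `sim0` alongside the replicas is an assumption and may differ from what `sb_selection_frequency()` does.

```r
# Toy coefficient matrix: rows are terms, columns are fits (values invented)
coef_mat <- cbind(
  sim0 = c(0.2, 1.1, 0.0, 0.4),
  sim1 = c(0.1, 0.9, 0.0, 0.0),
  sim2 = c(0.3, 1.0, 0.2, 0.0),
  sim3 = c(0.2, 1.2, 0.0, 0.5)
)
rownames(coef_mat) <- c("(Intercept)", "x1", "x2", "x3")

# Drop the intercept row, then average the non-zero indicators per variable
freq <- rowMeans(coef_mat[-1, , drop = FALSE] != 0)
freq
```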

This manual exercise confirms that the correlated resampling loop from the
original SelectBoost package plugs seamlessly into the beta selectors shipped in
`SelectBoost.beta`.

## Running the entire loop with `sb_beta()`

The `sb_beta()` wrapper performs the same steps internally while exposing the
arguments most relevant to beta regression. By default it uses
`betareg_step_aic()` as the base selector, but any of the exported selectors
(`betareg_step_bic()`, `betareg_glmnet()`, etc.) can be passed either by name
or as a function.

```{r, cache=TRUE, eval=LOCAL}
sb <- sb_beta(
  sim$X, sim$Y,
  B = 60,
  step.num = 0.5,
  steps.seq = c(0.9, 0.7, 0.5)
)

class(sb)
attr(sb, "selector")
rownames(sb)
round(sb, 3)
```

The resulting matrix comes with several attributes that document how the
frequencies were generated. `attr(sb, "c0.seq")` returns the correlation
threshold grid, `attr(sb, "B")` stores the number of correlated resamples per
threshold, `attr(sb, "interval")` records whether interval sampling was
activated, and `attr(sb, "resample_diagnostics")` keeps summary statistics on
the cached surrogate draws. These metadata mirror the legacy SelectBoost beta
implementation and are now documented in `?sb_beta`.

Changing the selector is simply a matter of passing a different routine. The
call below switches to the `glmnet`-based penalised selector and asks
`sb_beta()` to forward `choose = "bic"` to the underlying `betareg_glmnet()`
implementation.

```{r, cache=TRUE, eval=LOCAL}
sb_enet <- sb_beta(
  sim$X, sim$Y,
  selector = betareg_glmnet,
  B = 60,
  step.num = 0.5,
  version = "glmnet",
  choose = "bic",
  prestandardize = TRUE
)

attr(sb_enet, "selector")
colMeans(sb_enet)
```

Because the wrapper always builds on the same correlated resamples, results are
directly comparable across selectors as long as they adopt the `glmnet`-style
coefficient convention. Several beta selectors can therefore be compared under
exactly the same resampled design matrices. For interval responses, the same
stability analysis can be run by pairing `sb_beta()` with the convenience
wrapper `sb_beta_interval()` (or the lower-level `fastboost_interval()`).


## Conference communications

The SelectBoost4Beta workflow and its correlated resampling foundations were
presented by Frédéric Bertrand and Myriam Maumy in 2023 at two conferences:

- **Joint Statistical Meetings 2023 (Toronto, Canada)** — "Improving variable
  selection in Beta regression models using correlated resampling".
- **BioC2023 (Boston, USA)** — "SelectBoost4Beta: Improving variable selection
  in Beta regression models".

Both communications emphasised how correlation-aware resampling improves the
recall and precision of variable selection in high-dimensional Beta regression
settings.
