---
title: "Introduction to gradLasso"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to gradLasso}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The `gradLasso` package implements an efficient gradient descent solver for LASSO-penalized regression models. It supports several families including Gaussian, Binomial, Negative Binomial, and Zero-Inflated Negative Binomial (ZINB). It also features built-in stability selection and cross-validation.

This vignette demonstrates the basic usage of the package.

```{r}
library(gradLasso)
```
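
For intuition about what a gradient-based LASSO solver does, here is a minimal base R sketch of proximal gradient descent (ISTA) for the Gaussian case. It is an illustration under simple assumptions (standardized predictors, fixed step size), not the solver `gradLasso` uses internally; `soft_threshold()`, `lasso_ista()`, and the `*_toy` objects are hypothetical names introduced only for this example.

```{r}
# Soft-thresholding operator: the proximal map of the L1 penalty
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Proximal gradient descent (ISTA) for the objective
#   (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1
# Illustrative only; not gradLasso's internal solver.
lasso_ista <- function(X, y, lambda, n_iter = 500) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  # Step size 1 / L, where L is the Lipschitz constant of the smooth gradient
  L <- max(eigen(crossprod(X) / n, symmetric = TRUE, only.values = TRUE)$values)
  for (i in seq_len(n_iter)) {
    grad <- -as.numeric(crossprod(X, y - X %*% beta)) / n
    beta <- soft_threshold(beta - grad / L, lambda / L)
  }
  beta
}

set.seed(1)
X_toy <- scale(matrix(rnorm(100 * 5), 100, 5))
y_toy <- 2 * X_toy[, 1] - 1.5 * X_toy[, 2] + rnorm(100)
y_toy <- y_toy - mean(y_toy)

# Larger lambda values push more coefficients to exactly zero
round(lasso_ista(X_toy, y_toy, lambda = 0.1), 3)
```

The gradient step handles the smooth squared-error term, and the soft-thresholding step is what produces exact zeros in the coefficient vector.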

## 1. Gaussian Regression (Standard LASSO)

We start by simulating simple Gaussian data with correlated predictors.

```{r}
set.seed(42)
# Simulate 200 obs, 20 predictors, 5 active
sim <- simulate_data(n = 200, p = 20, family = "gaussian", k = 5, snr = 3.0)
df <- data.frame(y = sim$y, sim$X)

# Check the first few rows
head(df[, 1:6])
```

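We can fit the model using the standard formula interface. Here `lambda_cv = TRUE` requests cross-validation for the penalty and `boot = TRUE` enables stability selection. By default, `gradLasso` performs 50 bootstraps for stability selection.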

```{r}
fit <- gradLasso(y ~ ., data = df, lambda_cv = TRUE, boot = TRUE, n_boot = 50)

print(fit)
```

We can inspect the selected coefficients using `summary()`. The "Selection_Prob" column shows how often each variable was selected across bootstrap iterations.

```{r}
summary(fit)
```
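
To make the idea of a selection probability concrete, the following base R sketch resamples rows with replacement, refits a sparse model on each resample, and records how often each predictor ends up with a non-zero coefficient. The selector `select_lasso()` is a hypothetical stand-in written for this example (a few ISTA steps at a fixed penalty), so the numbers it produces are not those reported by `summary(fit)`.

```{r}
# Stand-in selector: ISTA for the Gaussian LASSO at a fixed lambda,
# returning TRUE for every coefficient that ends up non-zero.
select_lasso <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  L <- max(eigen(crossprod(X) / n, symmetric = TRUE, only.values = TRUE)$values)
  for (i in seq_len(n_iter)) {
    z <- beta + as.numeric(crossprod(X, y - X %*% beta)) / (n * L)
    beta <- sign(z) * pmax(abs(z) - lambda / L, 0)
  }
  beta != 0
}

set.seed(2)
X_demo <- matrix(rnorm(150 * 8), 150, 8,
                 dimnames = list(NULL, paste0("X", 1:8)))
y_demo <- 1.5 * X_demo[, 1] - 1 * X_demo[, 2] + rnorm(150)

n_boot_demo <- 50
selected <- replicate(n_boot_demo, {
  idx <- sample(nrow(X_demo), replace = TRUE)  # bootstrap resample of rows
  select_lasso(X_demo[idx, ], y_demo[idx] - mean(y_demo[idx]), lambda = 0.2)
})

# Proportion of resamples in which each predictor was selected
selection_prob <- rowMeans(selected)
names(selection_prob) <- colnames(X_demo)
round(selection_prob, 2)
```

Predictors with a real effect should be selected in nearly every resample, while noise predictors tend to appear only occasionally.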

### Diagnostics

We can visualize the stability selection results and the cross-validation deviance curve.

```{r}
# Plot Stability Selection (Plot 1) and CV Deviance (Plot 2)
plot(fit, which = c(1, 2))
```

## 2. Zero-Inflated Negative Binomial (ZINB)

`gradLasso` specializes in complex GLMs like ZINB. We support a pipe syntax (`|`) to specify different predictors for the Count model and the Zero-Inflation model.

### Simulation

We simulate data where the count model depends on different variables than the zero-inflation model.

```{r}
set.seed(456)
sim_zinb <- simulate_data(n = 500, p = 20, family = "zinb",
                          k_mu = 5, k_pi = 5, theta = 2.0)
df_zinb <- data.frame(y = sim_zinb$y, sim_zinb$X)
```
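
For readers unfamiliar with the ZINB data-generating process, the base R sketch below draws from it directly and evaluates the corresponding log-likelihood. It assumes a log link for the count mean and a logit link for the zero-inflation probability, which is the usual parameterization but is an assumption here rather than a statement about how `simulate_data()` works. All `*_demo` objects and `zinb_loglik()` are hypothetical names for this example.

```{r}
set.seed(3)
n_demo <- 1000
x_demo <- rnorm(n_demo)

mu_demo <- exp(0.5 + 0.8 * x_demo)    # count mean (log link, assumed)
pi_demo <- plogis(-1 + 1.0 * x_demo)  # structural-zero probability (logit link, assumed)
theta_demo <- 2                       # negative binomial dispersion ("size")

# Each observation is either a structural zero or a negative binomial draw
is_structural_zero <- rbinom(n_demo, 1, pi_demo) == 1
y_zinb_demo <- ifelse(is_structural_zero, 0,
                      rnbinom(n_demo, size = theta_demo, mu = mu_demo))

# ZINB log-likelihood: zeros can come from either component
zinb_loglik <- function(y, mu, pi, theta) {
  nb <- dnbinom(y, size = theta, mu = mu)
  sum(log(ifelse(y == 0, pi + (1 - pi) * nb, (1 - pi) * nb)))
}

mean(y_zinb_demo == 0)  # share of zeros in the simulated response
zinb_loglik(y_zinb_demo, mu_demo, pi_demo, theta_demo)
```

Because zeros arise from two sources, the count and zero-inflation parts can depend on different predictors, which is what the two-part model below exploits.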

### Model Fitting

We use the pipe syntax: `y ~ predictors_for_count | predictors_for_zero`. Here we use all variables (`.`) for both models.

```{r}
# We use a smaller number of bootstraps for speed in this vignette
fit_zinb <- gradLasso(y ~ . | ., data = df_zinb,
                      family = grad_zinb(),
                      n_boot = 10,
                      lambda = 0.05) # Fixed lambda for demonstration

print(fit_zinb)
```
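
The two-part formula used above is ordinary R syntax: `|` binds more tightly than `~`, so the right-hand side parses as a single call to `|` with two arguments. The base R sketch below shows how such a formula can be split into its count and zero-inflation parts. It illustrates the syntax only and is not necessarily how `gradLasso` parses formulas internally; `f_demo`, `count_terms`, and `zero_terms` are hypothetical names.

```{r}
f_demo <- y ~ x1 + x2 | x3

# A formula is a call: f_demo[[2]] is the response, f_demo[[3]] the right-hand side
rhs <- f_demo[[3]]
identical(rhs[[1]], as.name("|"))

count_terms <- deparse(rhs[[2]])  # predictors for the count model
zero_terms  <- deparse(rhs[[3]])  # predictors for the zero-inflation model

count_terms
zero_terms

# Rebuild two ordinary one-part formulas from the pieces
reformulate(count_terms, response = "y")
reformulate(zero_terms, response = "y")
```

With `y ~ . | .`, both parts are simply `.`, so every column other than the response is a candidate predictor in both components.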

### Inspecting ZINB Coefficients

The summary automatically splits coefficients into "Count", "Zero-Infl", and "Dispersion" components.

```{r}
summary(fit_zinb)
```

## 3. Parallel Processing

For large datasets, `gradLasso` supports parallel execution for both Cross-Validation and Bootstrapping.

```{r, eval = FALSE}
# Not run in this vignette: cross-validation and bootstrapping on 4 cores
fit <- gradLasso(y ~ ., data = df, parallel = TRUE, n_cores = 4)
```
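
Bootstrap replicates are independent of one another, which is why they parallelize well. The sketch below shows the general pattern with the base `parallel` package, using two workers and an ordinary least squares fit as a stand-in for the penalized fit. It is an assumption-laden illustration of the idea, not a description of what `parallel = TRUE` does inside `gradLasso`; `boot_fit()` and the `*_par` objects are hypothetical names.

```{r}
library(parallel)

set.seed(4)
X_par <- matrix(rnorm(200 * 5), 200, 5)
y_par <- X_par[, 1] + rnorm(200)

# One bootstrap replicate: resample rows and refit (OLS as a stand-in fit).
# The replicate index b is unused; it only drives the iteration.
boot_fit <- function(b, X, y) {
  idx <- sample(nrow(X), replace = TRUE)
  lm.fit(cbind(1, X[idx, ]), y[idx])$coefficients
}

cl <- makeCluster(2)                # two worker processes
clusterSetRNGStream(cl, iseed = 4)  # reproducible, independent RNG streams
boot_coefs <- parLapply(cl, 1:20, boot_fit, X = X_par, y = y_par)
stopCluster(cl)

# One row per replicate, one column per coefficient (intercept first)
boot_mat <- do.call(rbind, boot_coefs)
dim(boot_mat)
```

Passing the data through the extra arguments of `parLapply()` keeps the worker function self-contained, so nothing needs to be exported to the workers separately.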

## Conclusion

`gradLasso` provides a unified, tidy interface for sparse regression across multiple GLM families. Its integrated stability selection offers robust variable selection for high-dimensional data.
