---
title: "Clustering Individualized Survival Curves with unsurv"
author: "Imad EL BADISY"
date: "`r format(Sys.Date())`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Clustering Individualized Survival Curves with unsurv}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

The **unsurv** package clusters full survival trajectories rather than baseline covariates. This vignette walks through a minimal workflow: simulate curves, fit the model, choose the number of clusters automatically, predict new samples, and assess stability.

## Simulate individualized survival curves

We create three prognosis groups with different exponential hazards and add noise so the curves are not perfectly smooth.

```{r data}
library(unsurv)

set.seed(2026)
n <- 150
Q <- 60
times <- seq(0, 5, length.out = Q)

group <- sample(1:3, n, TRUE, prob = c(0.35, 0.4, 0.25))
haz   <- c(0.18, 0.45, 0.8)[group]

S <- sapply(times, function(t) exp(-haz * t))
S <- S + matrix(rnorm(n * Q, 0, 0.02), nrow = n)
S[S < 0] <- 0
S[S > 1] <- 1
```

## Fit clustering with automatic K selection

Leaving `K = NULL` lets **unsurv** pick the number of clusters using the mean silhouette over `2:K_max`.

```{r fit}
fit <- unsurv(S, times, K = NULL, K_max = 6, distance = "L2",
              enforce_monotone = TRUE, smooth_median_width = 5,
              standardize_cols = TRUE, eps_jitter = 0.0005)
fit
```

Key slots:

- `K`: chosen cluster count
- `clusters`: assignments for each curve
- `medoids`: representative survival curves
- `silhouette_mean`: average silhouette width

## Visualize medoids

```{r medoids-plot, fig.cap = "Cluster medoid survival curves."}
plot(fit)
```

For a ggplot2 version:

```{r autoplot, fig.cap = "Medoid curves via ggplot2 autoplot."}
library(ggplot2)
autoplot(fit)
```

## Predict cluster membership for new curves

New curves must use the same time grid as the fit. Preprocessing (clamping, monotonicity, smoothing, standardization) is reused automatically.

```{r predict}
new_curves <- S[1:5, ]
predict(fit, new_curves)
```

## Stability assessment

Resampling gives a sense of how stable the clustering is to perturbations.

```{r stability}
stab <- unsurv_stability(
  S, times, fit,
  B = 20, frac = 0.7,
  mode = "subsample",
  jitter_sd = 0.01,
  weight_perturb = 0.05,
  return_distribution = TRUE
  )
stab$mean
```

Higher mean ARI indicates more reproducible clusters.

## Tips and troubleshooting

- Ensure `times` is strictly increasing and matches the number of columns in `S`.
- If curves show small upward wiggles, keep `enforce_monotone = TRUE` or increase `smooth_median_width` to an odd integer ≥ 3.
- Use `distance = "L1"` when robustness to large deviations at a few time points is desired.
- Setting `weights` lets you emphasize clinically important time windows. Weights are normalized internally.

## Reproducibility

Set `seed` inside `unsurv()` for deterministic PAM initialization and silhouette selection. Vignette figures may differ slightly because of noise added to simulated curves.
