---
title: "From raw data to model with tidyILD"
author: "tidyILD authors"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{From raw data to model with tidyILD}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)
```

This vignette walks through the full tidyILD pipeline: prepare data, inspect structure, apply within-between decomposition and lags, fit a mixed-effects model, and run diagnostics and plots.

For **which temporal assumptions** (lags in the mean, residual AR, time-varying effects, state-space) fit your question, see `vignette("temporal-dynamics-model-choice", package = "tidyILD")`.

## Simulate and prepare

```{r prepare}
library(tidyILD)
# Simulate simple ILD
d <- ild_simulate(n_id = 10, n_obs_per = 12, irregular = TRUE, seed = 42)
# Prepare: encode time structure and add .ild_* columns
x <- ild_prepare(d, id = "id", time = "time", gap_threshold = 7200)
```

## Inspect

```{r summary}
ild_summary(x)
ild_spacing_class(x)
```

## Within-person centering and lags

```{r center_lag}
x <- ild_center(x, y)
x <- ild_lag(x, y, mode = "gap_aware", max_gap = 7200)
```

## Fit a model

Without residual autocorrelation (lmer):

```{r lme_no_ar1}
fit0 <- ild_lme(y ~ 1 + (1 | id), data = x, ar1 = FALSE, warn_no_ar1 = FALSE)
```

With AR1 residual correlation (nlme):

```{r lme_ar1}
fit1 <- ild_lme(y ~ 1, data = x, ar1 = TRUE, correlation_class = "CAR1")
```

## Diagnostics and plots

```{r diagnostics}
diag <- ild_diagnostics(fit1, data = x)
names(diag)  # meta, data, stats
names(plot_ild_diagnostics(diag))  # plot names for requested types
# Pooled residual ACF (tibble)
head(diag$stats$acf$pooled)
# By-id ACF when by_id = TRUE: one tibble per person
names(diag$stats$acf$by_id)
head(diag$stats$acf$by_id[[1]])
```

```{r plot_trajectory, fig.alt = "Trajectory plot"}
ild_plot(x, type = "trajectory", var = "y", max_ids = 5)
```

```{r plot_fitted, fig.alt = "Fitted vs observed"}
ild_plot(fit1, type = "fitted")
```

## MSM-style weights (IPTW and IPCW) — optional

For a **marginal structural model**–style sensitivity analysis, you can build **inverse probability of treatment** weights: either **pooled** (`ild_iptw_weights()`) or **sequential MSM** for time-varying `A_t` (`ild_iptw_msm_weights()` after building history with `ild_lag()`), then **inverse probability of censoring** weights for **monotone dropout** (`ild_ipcw_weights()`), multiply into a joint analysis weight with `ild_joint_msm_weights()`, and refit with `ild_ipw_refit()`. This path targets **treatment assignment** and **loss-to-follow-up**; it is distinct from **outcome missingness** IPW (`ild_missing_model()` + `ild_ipw_weights()`). Assumptions (positivity, correct models) are still yours; for **uncertainty** with estimated weights, use **`ild_msm_bootstrap()`** (see `?ild_msm_inference`) instead of trusting default `lmer` SEs alone—**`weight_policy = "reestimate_weights"`** with a **`weights_fn`** that rebuilds IPW on each bootstrap resample is often the appropriate choice when you want first-stage uncertainty reflected; **`fixed_weights`** resamples clusters but keeps the weights attached to resampled rows (faster, approximate).

```{r msm_weights, eval = FALSE}
# Example skeleton (not run in the vignette build)
x2 <- ild_simulate(n_id = 12, n_obs_per = 10, seed = 1)
x2$stress <- rnorm(nrow(x2))
x2$trt <- rbinom(nrow(x2), 1L, 0.45)
x2 <- ild_prepare(x2, id = "id", time = "time")
x2 <- ild_center(x2, y)
x2 <- ild_iptw_weights(x2, treatment = "trt", predictors = "stress")
# Sequential A_t: x2 <- ild_lag(x2, stress); x2 <- ild_lag(x2, trt); ...
# x2 <- ild_iptw_msm_weights(x2, treatment = "trt", history = ~ stress_lag1 + trt_lag1)
x2 <- ild_ipcw_weights(x2, predictors = "stress")
x2 <- ild_joint_msm_weights(x2)
fit_msm <- ild_lme(y ~ y_bp + y_wp + stress + (1 | id), data = x2,
  ar1 = FALSE, warn_no_ar1 = FALSE, warn_uncentered = FALSE)
fit_msm_w <- ild_ipw_refit(fit_msm, data = x2)
```

**Balance and overlap:** After weights are attached, use `ild_msm_balance()` (weighted SMDs), `ild_ipw_ess()` (Kish effective sample size), and `ild_msm_overlap_plot()` (propensity densities by treatment—pooled IPTW or sequential MSM via `attr(x, "ild_iptw_msm_fits")`). Optional: `ild_diagnose(..., balance = TRUE, balance_treatment = "trt", balance_covariates = c("stress"))` fills `causal$balance` and may trigger guardrails `GR_MSM_BALANCE_SMD_HIGH` / `GR_MSM_ESS_LOW`. `ild_autoplot(bundle, section = "causal", type = "overlap", treatment = "trt")` draws overlap when `ggplot2` is available.

For a full assumptions-oriented walkthrough (exchangeability, positivity/overlap, consistency, and weight-model correctness) and simulation-based recovery checks, see `vignette("msm-identification-and-recovery", package = "tidyILD")`. When using `ild_msm_fit()`, check `fit_obj$inference$status`/`reason` for degraded paths, or set `strict_inference = TRUE` to fail fast.

A minimal **bootstrap** illustration (small `n_boot` for speed; increase in real analyses):

```{r msm_bootstrap, eval = requireNamespace("lme4", quietly = TRUE)}
set.seed(3)
xb <- ild_simulate(n_id = 10, n_obs_per = 5, seed = 3)
xb$stress <- rnorm(nrow(xb))
xb <- ild_prepare(xb, id = "id", time = "time")
xb <- ild_center(xb, y)
xb$.ipw <- runif(nrow(xb), 0.85, 1.15)
fb <- ild_lme(y ~ y_bp + y_wp + stress + (1 | id), data = xb,
  ar1 = FALSE, warn_no_ar1 = FALSE, warn_uncentered = FALSE)
fwb <- ild_ipw_refit(fb, data = xb, weights = ".ipw")
bs_fixed <- ild_msm_bootstrap(fwb, n_boot = 20L, weight_policy = "fixed_weights", seed = 3)
tidy_ild_msm_bootstrap(bs_fixed)
# reestimate_weights: weights_fn must return ILD with the weight column, e.g. re-run IPTW pipeline:
bs_re <- ild_msm_bootstrap(fwb, n_boot = 12L, weight_policy = "reestimate_weights",
  seed = 4, weights_fn = function(d) { d$.ipw <- runif(nrow(d), 0.85, 1.15); d })
```

## Reproducibility

Use a fixed seed when simulating or fitting models so results can be recreated. The pipeline is deterministic for a given seed and data. When saving results (e.g. after [ild_lme()] or [ild_diagnostics()]), you can attach a reproducibility manifest and save a single bundle with [ild_manifest()] and [ild_bundle()]:

```{r reproducibility}
# Optional: build a manifest with scenario and seed, then bundle the fit for saving
manifest <- ild_manifest(seed = 42, scenario = ild_summary(x), include_session = FALSE)
bundle <- ild_bundle(fit1, manifest = manifest, label = "model_ar1")
# saveRDS(bundle, "run.rds")  # one file with result + manifest + label
names(bundle)
```

For **simulation-based recovery and power** under the bundled `ild_simulate()` DGP, see `vignette("benchmark-simulation-recovery", package = "tidyILD")`.