---
title: "Using {cmdstanr} with {rixpress}"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using cmdstanr with rixpress}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette details how to effectively use the `{cmdstanr}` package within a
`{rixpress}` pipeline for Bayesian statistical modelling with Stan. For a general
introduction to `{rixpress}` and its core concepts, please refer to
`vignette("intro-concepts")` and `vignette("core-functions")`.

`{cmdstanr}` provides a user-friendly R interface to `cmdstan`, Stan's
command-line interface. While powerful, its reliance on external processes and
file system interactions requires careful handling within the hermetic build
environment of `{rixpress}`.

## Setting up the Environment

As with any `{rixpress}` pipeline, the first step is to define the execution
environment using `{rix}`:

```r
library(rix)

rix(
  date = "2025-04-29",
  r_pkgs = c("readr", "dplyr", "ggplot2"), # Add other R packages as needed
  system_pkgs = "cmdstan", # Crucial: include cmdstan as a system dependency
  git_pkgs = list(
    list(
      package_name = "cmdstanr",
      repo_url = "https://github.com/stan-dev/cmdstanr",
      commit = "79d37792d8e4ffcf3cf721b8d7ee4316a1234b0c" # Pin to a specific commit
    ),
    list(
      package_name = "rixpress",
      repo_url = "https://github.com/ropensci/rixpress",
      commit = "HEAD" # Or pin to a specific commit
    )
  ),
  ide = "none", # Or your preferred IDE
  project_path = ".",
  overwrite = TRUE
)
```

Key points in this environment definition:

-   `cmdstan` is included in `system_pkgs`. This makes the `cmdstan` executables
    available to the pipeline.
-   `{cmdstanr}` is installed from its GitHub repository, as it's not available
    on CRAN. Pinning to a specific commit is recommended for maximum
    reproducibility.

With the environment set up, we can define the pipeline.

## Setting up the Pipeline

The Stan model code itself should reside in a `.stan` file. We use
`rxp_r_file()` with `readLines` to bring its contents into the pipeline as a
character vector, one element per line of Stan code.

```r
rxp_r_file(
  bayesian_linear_regression_model,
  "model.stan",
  readLines
)
```
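
The contents of `model.stan` are not shown in this vignette. As a hedged illustration, the snippet below writes one plausible version from R. The Stan program is an assumption, chosen to match the data list (`N`, `x`, `y`) and the parameters (`alpha`, `beta`, `sigma`) used in the rest of this vignette. It is not necessarily the file the pipeline was developed with.

```r
# Hypothetical contents for model.stan: a simple linear regression
# matching the data list (N, x, y) built later in the pipeline.
stan_code <- c(
  "data {",
  "  int<lower=0> N;",
  "  vector[N] x;",
  "  vector[N] y;",
  "}",
  "parameters {",
  "  real alpha;",
  "  real beta;",
  "  real<lower=0> sigma;",
  "}",
  "model {",
  "  y ~ normal(alpha + beta * x, sigma);",
  "}"
)

# Written to a scratch location here; in a real project this would be
# "model.stan" at the root of the project.
model_path <- file.path(tempdir(), "model.stan")
writeLines(stan_code, model_path)

# readLines() returns one element per line, which is the form the
# pipeline target holds.
model_lines <- readLines(model_path)
```

Because `readLines()` and `writeLines()` round-trip a character vector unchanged, the model code can later be written back to a file inside a sandbox without loss.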


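Next, we define the true parameter values and simulate data from them. These
targets are elements of the list of pipeline steps, hence the indentation and
trailing commas.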

```r
  rxp_r(
    parameters,
    list(
      N = 100,
      alpha = 2,
      beta = -0.5,
      sigma = 1.e-1
    )
  ),
  rxp_r(
    x,
    rnorm(parameters$N, 0, 1)
  ),
  rxp_r(
    y,
    rnorm(
      n = parameters$N,
      mean = parameters$alpha + parameters$beta * x,
      sd = parameters$sigma
    )
  ),
  rxp_r(
    # Prepare the data list for cmdstanr
    inputs,
    list(N = parameters$N, x = x, y = y)
  ),
```
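
Before wiring these targets into a pipeline, it can help to run the same simulation in a plain R session and check that the data carry the intended signal. The sketch below is not pipeline code: the seed and the `lm()` check are illustrative additions, and the object names simply mirror the target names above.

```r
# Plain-R version of the simulation targets (not pipeline code)
set.seed(123) # illustrative seed; the pipeline targets above do not set one

parameters <- list(N = 100, alpha = 2, beta = -0.5, sigma = 1.e-1)
x <- rnorm(parameters$N, 0, 1)
y <- rnorm(
  n = parameters$N,
  mean = parameters$alpha + parameters$beta * x,
  sd = parameters$sigma
)

# Same structure as the `inputs` target passed to cmdstanr
inputs <- list(N = parameters$N, x = x, y = y)

# An ordinary least-squares fit should land close to alpha and beta,
# which gives a rough reference point for the posterior means.
ols <- coef(lm(y ~ x))
```

With a residual standard deviation of 0.1 and 100 observations, the least-squares estimates should sit very close to the true intercept and slope.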

## Compiling and Sampling the Model

Interfacing with `cmdstan` from within `{rixpress}` requires a specific strategy
due to the hermetic nature of Nix sandboxes. We'll use a wrapper function to
handle model compilation and sampling within a single `rxp_r()` step.

First, let's define the wrapper function (e.g., in a `functions.R` file that
we'll include):

```r
# In functions.R
cmdstan_model_wrapper <- function(
  stan_string, # The Stan model code as a character vector
  inputs,      # Data list for the model
  seed,        # Seed for reproducibility
  ...          # Additional arguments forwarded to model$sample()
) {
  # Fail early with a clear message if the model code is missing
  stopifnot(is.character(stan_string), length(stan_string) > 0)

  # Create a temporary .stan file within the sandbox
  stan_file <- tempfile(pattern = "model_", fileext = ".stan")
  writeLines(stan_string, con = stan_file)

  # Compile the Stan model
  # cmdstanr will find cmdstan via the CMDSTAN environment variable
  model <- cmdstanr::cmdstan_model(stan_file = stan_file)

  # Sample from the posterior. `...` is forwarded to $sample() only:
  # passing the same arguments to cmdstan_model() as well would fail for
  # sampling-only arguments such as `chains`.
  fitted_model <- model$sample(
    data = inputs,
    seed = seed,
    ...
  )

  return(fitted_model)
}
```

Now, we use this wrapper in our pipeline:

```r
# ... (continuation of pipeline_steps list)
  rxp_r(
    model, # Target name for the fitted model object
    cmdstan_model_wrapper(
      stan_string = bayesian_linear_regression_model,
      inputs = inputs,
      seed = 22
    ),
    user_functions = "functions.R",
    encoder = "save_model",
    env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan")
  )
```

**Explanation of the Wrapper Approach:**

1.  **`stan_string = bayesian_linear_regression_model`**: We pass the model code
    (read by `rxp_r_file`) as a string to our wrapper.
2.  **`writeLines(stan_string, con = stan_file)`**: Inside the wrapper, the Stan
    code is written to a temporary `.stan` file. This file exists *within the
    sandbox* of the current `rxp_r` step. This is crucial because
    `cmdstan_model` needs a file path. Attempting to pass the original
    `model.stan` path directly via `additional_files` to `cmdstan_model` can
    lead to permission or path issues when `cmdstan` tries to compile it from a
    different working directory or context.
3.  **`cmdstanr::cmdstan_model()`**: Compiles the model from the temporary
    `stan_file`.
4.  **`model$sample()`**: Samples from the compiled model.
5.  **Single Step**: Both compilation and sampling *must* happen within the same
    `rxp_r` step (and thus the same sandbox). This is because the `model` object
    returned by `cmdstan_model()` contains paths to the compiled executable. If
    these were separate steps, the paths from the compilation sandbox wouldn't
    be valid in the sampling sandbox.
6.  **`env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan")`**: This
    sets the `CMDSTAN` environment variable within the sandbox for this specific
    step. `{cmdstanr}` uses this variable to locate the `cmdstan` installation.
    `${defaultPkgs.cmdstan}` is a Nix interpolation that resolves to the path of
    the `cmdstan` package in the Nix store. The `defaultPkgs` prefix comes from
    the name of the environment file, `default.nix`. If `cmdstan` were provided
    by a differently named environment file, for example `cmdstan-env.nix`, you
    would need to use `${cmdstan_envPkgs.cmdstan}` instead.
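
When a step fails because `{cmdstanr}` cannot find `cmdstan`, inspecting the environment variable directly is a quick first diagnostic. The helper below is a hypothetical addition: `check_cmdstan_env()` is not part of `{cmdstanr}` or `{rixpress}`, and it uses only base R, so it can be tried outside the sandbox as well.

```r
# Hypothetical helper: stop with a readable message if CMDSTAN is unset
# or does not point at an existing directory.
check_cmdstan_env <- function() {
  path <- Sys.getenv("CMDSTAN", unset = NA_character_)
  if (is.na(path) || !nzchar(path)) {
    stop("CMDSTAN is not set; check the `env_var` argument of this step.")
  }
  if (!dir.exists(path)) {
    stop("CMDSTAN points to a directory that does not exist: ", path)
  }
  invisible(path)
}
```

One option is to call it as the first line of `cmdstan_model_wrapper()`, so that a misconfigured `env_var` surfaces before compilation starts.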

### Custom Serialisation

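`{cmdstanr}` reads the CSV files written by `cmdstan` lazily, so calling
`saveRDS()` directly on a fitted model object can leave it pointing at
temporary files that no longer exist once the step has finished. The
`$save_object()` method first loads all draws and diagnostics into the object
and then writes it as an RDS file. We define a simple wrapper around it for use
with `{rixpress}`, for example in the same `functions.R` so that
`user_functions` makes it available to the step.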

```r
save_model <- function(fitted_model, path, ...) {
  fitted_model$save_object(file = path, ...)
}
```

Specifying `encoder = "save_model"` in the `rxp_r()` call tells `{rixpress}` to
use this function instead of the default `saveRDS()`. Because `$save_object()`
itself writes an RDS file, the fitted model can then be read using
`rxp_read("model")`, which will internally use `readRDS()`.
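
Judging from the signature of `save_model()`, the encoder receives the object and an output path, and the file written there must be readable later. The sketch below exercises that contract with a stand-in object rather than a real fit, so it runs without `cmdstan`. `fake_draws` and `fake_fit` are purely illustrative and only mimic the `$save_object()` interface.

```r
save_model <- function(fitted_model, path, ...) {
  fitted_model$save_object(file = path, ...)
}

# Stand-in for a fitted model: a list exposing a save_object() method
# that writes an RDS file, mimicking the interface save_model() relies on.
fake_draws <- c(alpha = 2, beta = -0.5)
fake_fit <- list(
  save_object = function(file, ...) saveRDS(fake_draws, file = file, ...)
)

out <- tempfile(fileext = ".rds")
save_model(fake_fit, out)

# The result is a regular RDS file, so readRDS() can load it back
restored <- readRDS(out)
```

With a real fit, the restored object would be the fitted model itself, on which methods such as `$summary()` can be called.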

## Summary

Using `{cmdstanr}` with `{rixpress}` involves these key considerations:

- Include `cmdstan` in `system_pkgs` and `{cmdstanr}` (from Git) in your
  `{rix}` environment definition.

- Read your `.stan` file into the pipeline using `rxp_r_file()`.

- Implement a wrapper function that:
   *   Takes the model code string and writes it to a temporary `.stan` file
       *inside the wrapper*.
   *   Calls `cmdstanr::cmdstan_model()` on this temporary file.
   *   Calls `model$sample()` to fit the model.
   *   Returns the fitted model object.

- Perform model compilation and sampling within the *same* `rxp_r()` call using
  the wrapper.

- Set the `CMDSTAN` environment variable for the `rxp_r()` step that runs the
  wrapper, pointing to the Nix store path of `cmdstan`.

- Use `{cmdstanr}`'s `$save_object()` method via a custom `encoder`
  for robust saving of the fitted model.

This approach ensures that `cmdstan` can operate correctly within the isolated
and reproducible environment provided by `{rixpress}` and Nix.
