---
title: "Persistent Indexes and Lifecycle"
output:
  litedown::html_format:
    meta:
      css: ["@default"]
---

<!--
%\VignetteEngine{litedown::vignette}
%\VignetteIndexEntry{Persistent Indexes and Lifecycle}
%\VignetteEncoding{UTF-8}
-->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(bigANNOY.progress = FALSE)
set.seed(20260326)
```

`bigANNOY` v3 adds explicit index lifecycle support around persisted Annoy
files. That makes it possible to:

- build an index once and reopen it later
- choose whether a reopened index should load eagerly or lazily
- check whether a native handle is currently live
- close loaded handles explicitly in long sessions
- validate the Annoy file against its recorded metadata before reuse

This vignette focuses on those operational workflows rather than on search
quality or benchmark tuning.

## Why Lifecycle Management Matters

Annoy indexes are stored on disk. In practice, that means the useful object is
not just the result of a single build call, but a persisted pair:

- the `.ann` index file
- the `.meta` sidecar metadata file

The `bigannoy_index` object returned by `bigANNOY` is a session-level wrapper
around those files. It remembers the key metadata and can optionally hold a
live native handle for faster repeated searches within the same R session.
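
Because both files matter, a quick pre-flight check can confirm that the pair is present before any reopen attempt. The helper below is a minimal base R sketch: `has_index_pair()` is a hypothetical name, not part of `bigANNOY`, and it assumes the sidecar path is the index path with `.meta` appended.

```r
# Hypothetical helper: report whether an index file and its sidecar exist.
# Assumption: the sidecar path is the index path with ".meta" appended.
has_index_pair <- function(index_path) {
  sidecar_path <- paste0(index_path, ".meta")
  c(
    index = file.exists(index_path),
    sidecar = file.exists(sidecar_path)
  )
}

# Demonstrate with empty stand-in files in a scratch directory.
demo_dir <- tempfile("pair-demo-")
dir.create(demo_dir)
demo_index <- file.path(demo_dir, "demo.ann")
file.create(demo_index)

pair_before <- has_index_pair(demo_index)  # sidecar not created yet
file.create(paste0(demo_index, ".meta"))
pair_after <- has_index_pair(demo_index)   # both files now present

pair_before
pair_after
```

A check like this only confirms presence. Whether the two files still belong together is a separate validation question.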

## Load the Packages

```{r}
library(bigANNOY)
library(bigmemory)
```

## Build an Index in Lazy Mode

We will create a small reference matrix, write the Annoy index into a temporary
directory, and keep the returned object in lazy mode so the first search is
what loads the live handle.

```{r}
artifact_dir <- file.path(tempdir(), "bigannoy-lifecycle")
dir.create(artifact_dir, recursive = TRUE, showWarnings = FALSE)

ref_dense <- matrix(
  c(
    0.0, 0.1, 0.2,
    0.1, 0.0, 0.1,
    0.2, 0.1, 0.0,
    1.0, 1.1, 1.2,
    1.1, 1.0, 1.1,
    1.2, 1.1, 1.0
  ),
  ncol = 3,
  byrow = TRUE
)

ref_big <- as.big.matrix(ref_dense)
index_path <- file.path(artifact_dir, "ref.ann")
metadata_path <- paste0(index_path, ".meta")

index <- annoy_build_bigmatrix(
  ref_big,
  path = index_path,
  n_trees = 25L,
  metric = "euclidean",
  seed = 123L,
  load_mode = "lazy"
)

index
```

The returned object points to the persisted files, but the native handle is
not loaded yet.

```{r}
annoy_is_loaded(index)
file.exists(index$path)
file.exists(index$metadata_path)
```

## Inspect the Sidecar Metadata

The sidecar metadata file supports safe reopen and validation workflows. It
records the metric, dimension, item count, build settings, and a file
signature (size and MD5 checksum) for the persisted Annoy file.

```{r}
metadata <- read.dcf(index$metadata_path)
metadata[, c(
  "index_id",
  "metric",
  "n_dim",
  "n_ref",
  "n_trees",
  "build_seed",
  "build_backend",
  "file_size",
  "file_md5"
)]
```

The important point is not the exact formatting of the metadata file, but that
the persisted index is now self-describing enough to be reopened and checked in
later sessions.
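
To make the idea of a file signature concrete, the sketch below records a size and MD5 checksum for a stand-in file in a DCF sidecar and then compares them with the file on disk. It uses only base R and `tools::md5sum()`. `signature_matches()` is a hypothetical helper, and the package's own validation may check more than these two fields.

```r
# Stand-in "index" file plus a DCF sidecar recording its size and MD5.
sig_file <- tempfile(fileext = ".ann")
writeBin(as.raw(1:64), sig_file)

sig_meta_path <- paste0(sig_file, ".meta")
write.dcf(
  data.frame(
    file_size = as.character(file.size(sig_file)),
    file_md5 = unname(tools::md5sum(sig_file)),
    stringsAsFactors = FALSE
  ),
  file = sig_meta_path
)

# Hypothetical check: compare the recorded signature with the current file.
signature_matches <- function(data_path, meta_path) {
  recorded <- read.dcf(meta_path, fields = c("file_size", "file_md5"))
  current_size <- as.numeric(file.size(data_path))
  current_md5 <- unname(tools::md5sum(data_path))
  c(
    file_size = identical(as.numeric(recorded[1, "file_size"]), current_size),
    file_md5 = identical(unname(recorded[1, "file_md5"]), current_md5)
  )
}

sig_before <- signature_matches(sig_file, sig_meta_path)

# Appending a single byte changes both the size and the checksum.
con <- file(sig_file, open = "ab")
writeBin(as.raw(0), con)
close(con)

sig_after <- signature_matches(sig_file, sig_meta_path)

sig_before
sig_after
```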

## Lazy Loading Versus Eager Loading

There are two lifecycle modes:

- `"lazy"` keeps only metadata in memory until the first search
- `"eager"` loads a native handle immediately when the index object is created
  or reopened

The index we just built is lazy.

```{r}
annoy_is_loaded(index)
```

The first search loads the handle automatically.

```{r}
first_result <- annoy_search_bigmatrix(index, k = 2L, search_k = 100L)

annoy_is_loaded(index)
first_result$index
round(first_result$distance, 3)
```

Once the handle is loaded, repeated searches in the same session can reuse it.

```{r}
second_result <- annoy_search_bigmatrix(index, k = 2L, search_k = 100L)

identical(first_result$index, second_result$index)
all.equal(first_result$distance, second_result$distance)
```
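
The difference between the two modes can be pictured with a toy controller in base R. This is only an illustration of the caching idea, under the assumption that a handle is opened at most once per object. It is not how `bigANNOY` is implemented, and every `toy_*` name is hypothetical.

```r
# Toy controller: an environment that remembers a path and caches a "handle".
new_toy_index <- function(path, load_mode = c("lazy", "eager")) {
  load_mode <- match.arg(load_mode)
  toy <- new.env()
  toy$path <- path
  toy$handle <- NULL
  toy$loads <- 0L
  if (load_mode == "eager") toy_load(toy)  # eager: open right away
  toy
}

toy_load <- function(toy) {
  if (is.null(toy$handle)) {
    # Stands in for an expensive native open.
    toy$handle <- paste("handle for", toy$path)
    toy$loads <- toy$loads + 1L
  }
  invisible(toy)
}

toy_is_loaded <- function(toy) !is.null(toy$handle)

toy_search <- function(toy) {
  toy_load(toy)  # does nothing when the handle is already cached
  toy$handle
}

toy_lazy <- new_toy_index("demo.ann", "lazy")
toy_eager <- new_toy_index("demo.ann", "eager")
loaded_at_creation <- c(
  lazy = toy_is_loaded(toy_lazy),
  eager = toy_is_loaded(toy_eager)
)

toy_search(toy_lazy)
toy_search(toy_lazy)
loads_after_two_searches <- toy_lazy$loads

loaded_at_creation
loads_after_two_searches
```

The load counter stays at one after two searches, which is the reuse behaviour shown above with the real index.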

## Validate Without Loading

Validation and loading are related but distinct. Sometimes you want to confirm
that the recorded metadata and file signature still match the file on disk
without paying the cost of loading the native handle.

```{r}
annoy_close_index(index)
annoy_is_loaded(index)

validation_no_load <- annoy_validate_index(
  index,
  strict = TRUE,
  load = FALSE
)

validation_no_load$valid
validation_no_load$checks[, c("check", "passed", "severity")]
annoy_is_loaded(index)
```

Because `load = FALSE`, the validation report checks the recorded metadata
against the current file without changing the loaded state of the object.

## Validate and Load Explicitly

If you do want validation to also confirm that the Annoy index can be opened
successfully, set `load = TRUE`.

```{r}
validation_with_load <- annoy_validate_index(
  index,
  strict = TRUE,
  load = TRUE
)

validation_with_load$valid
tail(validation_with_load$checks[, c("check", "passed", "severity")], 2L)
annoy_is_loaded(index)
```

This is a useful pattern before long-running queries or before handing a
reopened index to downstream analysis code.

## Close a Loaded Handle Explicitly

Explicit close support is helpful in long R sessions, in tests, and in code
that wants deterministic control over when handles are released.

```{r}
annoy_close_index(index)
annoy_is_loaded(index)
```

The persisted `.ann` file is still there, so the next search can load it
again.

```{r}
reload_result <- annoy_search_bigmatrix(index, k = 2L, search_k = 100L)

annoy_is_loaded(index)
reload_result$index
```
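
For code that needs a deterministic release, one common base R pattern is to register the close call with `on.exit()` so that it runs even when the work fails. The wrapper below is a hedged sketch: `with_closed_index()` is a hypothetical helper, and the stand-in resource exists only so the example runs without a real index.

```r
# Hypothetical wrapper: run `work(index)` and always call `close_fn(index)`
# afterwards, even if `work` signals an error.
with_closed_index <- function(index, work, close_fn) {
  on.exit(close_fn(index), add = TRUE)
  work(index)
}

# Stand-in resource: an environment with an `open` flag.
fake_index <- new.env()
fake_index$open <- TRUE
fake_close <- function(ix) ix$open <- FALSE

ok_value <- with_closed_index(fake_index, function(ix) "done", fake_close)
closed_after_success <- !fake_index$open

# Reopen the stand-in and make the work fail this time.
fake_index$open <- TRUE
err <- tryCatch(
  with_closed_index(fake_index, function(ix) stop("search failed"), fake_close),
  error = function(e) conditionMessage(e)
)
closed_after_error <- !fake_index$open

c(success = closed_after_success, error = closed_after_error)
```

With a real index, the same wrapper could be called as `with_closed_index(index, function(ix) annoy_search_bigmatrix(ix, k = 2L), annoy_close_index)`, assuming `annoy_close_index()` acts on the object it is given, as in the chunks above.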

## Reopen the Same Index in a New Object

The more important persistence workflow is reopening the same files into a new
`bigannoy_index` object. This is what a later R session would typically do.

`annoy_open_index()` and `annoy_load_bigmatrix()` both support this pattern.
The difference is mainly one of naming: `annoy_load_bigmatrix()` reads more
naturally when you are thinking in terms of `bigmemory` workflows, while
`annoy_open_index()` makes the persisted-index lifecycle more explicit.

```{r}
reopened_lazy <- annoy_open_index(
  path = index$path,
  load_mode = "lazy"
)

reopened_eager <- annoy_load_bigmatrix(
  path = index$path,
  load_mode = "eager"
)

annoy_is_loaded(reopened_lazy)
annoy_is_loaded(reopened_eager)
```

The eager reopen path loads immediately. The lazy reopen path waits until first
use.

```{r}
reopened_result <- annoy_search_bigmatrix(
  reopened_lazy,
  k = 2L,
  search_k = 100L
)

annoy_is_loaded(reopened_lazy)
reopened_result$index
```

## Lifecycle State Lives in the Session Object

The persisted files are shared, but loaded-state tracking is per-object and
per-session. Closing one in-memory object does not invalidate another object
that already opened the same index.

```{r}
annoy_close_index(reopened_lazy)
c(
  original = annoy_is_loaded(index),
  reopened_lazy = annoy_is_loaded(reopened_lazy),
  reopened_eager = annoy_is_loaded(reopened_eager)
)
```

This is a useful mental model:

- the `.ann` file is the durable asset
- the `bigannoy_index` object is the session-level controller
- the loaded handle is cached inside that controller only for the current
  session

## What Happens If Validation Fails?

In normal workflows, `annoy_validate_index(..., strict = TRUE)` is the safest
default because it stops with an error when critical checks fail. If you want
a diagnostic report instead of an error, use `strict = FALSE`.

```{r}
report <- annoy_validate_index(
  reopened_eager,
  strict = FALSE,
  load = FALSE
)

report$valid
report$checks[, c("check", "passed", "severity")]
```

That pattern is especially helpful when you are writing higher-level code that
wants to display a validation report before deciding whether to rebuild or
reload an index.
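
As a rough sketch of that higher-level code, the helper below turns a report into a short decision. It runs on a mock report rather than a real one. It assumes `checks` is a data frame with a logical `passed` column, and the check names and severity labels are invented for illustration. `summarise_report()` is a hypothetical name.

```r
# Mock report with the shape used above: a logical `valid` and a `checks`
# table with check/passed/severity columns. Values are illustrative only.
mock_report <- list(
  valid = FALSE,
  checks = data.frame(
    check = c("file_exists", "file_size", "file_md5"),
    passed = c(TRUE, FALSE, FALSE),
    severity = c("critical", "critical", "warning"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: pick an action and list the checks that failed.
summarise_report <- function(report) {
  failed <- report$checks[!report$checks$passed, , drop = FALSE]
  list(
    action = if (isTRUE(report$valid)) "reuse" else "rebuild",
    failed_checks = failed$check
  )
}

decision <- summarise_report(mock_report)
decision
```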

## Recommended Workflow

For most projects, a sensible lifecycle pattern looks like this:

1. build the index once with `annoy_build_bigmatrix()`
2. keep the `.ann` file and the `.meta` file together
3. reopen with `annoy_open_index()` or `annoy_load_bigmatrix()` in later
   sessions
4. run `annoy_validate_index()` before important downstream work
5. use lazy loading for lighter startup or eager loading for repeated search
   sessions
6. call `annoy_close_index()` when you want explicit control over loaded
   handles
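
Steps 3 and 4 can be folded into one small wrapper. The sketch below uses only the calls and arguments shown earlier in this vignette and assumes `library(bigANNOY)` is attached. `open_checked_index()` is a hypothetical convenience name, not a package function.

```r
# Assumes library(bigANNOY) is attached so the annoy_* functions resolve.
# Hypothetical wrapper: reopen a persisted index and validate it strictly
# before handing it to downstream code.
open_checked_index <- function(path, load_mode = "lazy") {
  index <- annoy_open_index(path = path, load_mode = load_mode)
  # With strict = TRUE, a failed critical check stops here with an error.
  annoy_validate_index(index, strict = TRUE, load = FALSE)
  index
}
```

A later session could then start with `index <- open_checked_index(index_path)` and call `annoy_close_index(index)` once the searches are done.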

## Recap

`bigANNOY` v3 turns persisted Annoy files into a more explicit lifecycle:

- build once, reopen later
- choose eager or lazy loading
- test loaded state with `annoy_is_loaded()`
- close handles with `annoy_close_index()`
- validate persisted files with `annoy_validate_index()`

The next vignette to read after this one is usually *File-Backed bigmemory
Workflows*, which focuses on descriptor files, file-backed matrices, and
streamed output destinations.
