---
title: "Benchmarking Recall and Latency"
output:
  litedown::html_format:
    meta:
      css: ["@default"]
---

<!--
%\VignetteEngine{litedown::vignette}
%\VignetteIndexEntry{Benchmarking Recall and Latency}
%\VignetteEncoding{UTF-8}
-->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(bigANNOY.progress = FALSE)
set.seed(20260326)
```

`bigANNOY` includes exported benchmark helpers so you can measure several related
things with the same interface:

- index build time
- search time
- optional Euclidean recall against an exact `bigKNN` baseline
- comparison against direct `RcppAnnoy`
- scaling with data volume and generated index size

This vignette shows how to use those helpers for both quick one-off runs and
small parameter sweeps.

## What the Benchmark Helpers Do

The package currently exports four benchmark functions:

- `benchmark_annoy_bigmatrix()` for one build-and-search configuration
- `benchmark_annoy_recall_suite()` for a grid of `n_trees` and `search_k`
  settings on the same dataset
- `benchmark_annoy_vs_rcppannoy()` for a direct comparison between the
  package's `bigmemory` workflow and a dense `RcppAnnoy` baseline
- `benchmark_annoy_volume_suite()` for scaling studies across larger synthetic
  data sizes

These helpers can work with:

- synthetic data generated on the fly
- user-supplied dense matrices
- `big.matrix` inputs, descriptors, descriptor paths, and external pointers

They can also write summaries to CSV so results can be saved outside the
current R session, and the comparison helpers add byte-oriented fields for the
reference data, query data, Annoy index file, and total persisted artifacts.

## Load the Package

```{r}
library(bigANNOY)
```

## Create a Benchmark Workspace

We will write any temporary benchmark files into a dedicated directory so the
workflow is easy to inspect.

```{r}
bench_dir <- tempfile("bigannoy-benchmark-")
dir.create(bench_dir, recursive = TRUE, showWarnings = FALSE)
bench_dir
```

## A Single Synthetic Benchmark Run

The simplest benchmark call uses synthetic data. This is useful when you want
a quick sense of how build and search times respond to `n_trees`, `search_k`,
and the problem dimensions.

```{r}
single_csv <- file.path(bench_dir, "single.csv")

single <- benchmark_annoy_bigmatrix(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = single_csv,
  load_mode = "eager"
)

single$summary
```

The returned object contains more than just the summary row.

```{r}
names(single)
single$params
single$exact_available
```

Because `exact = FALSE`, the benchmark skips the exact `bigKNN` comparison and
focuses only on the approximate Annoy path.

## Validation Is Part of the Benchmark Workflow

The benchmark helpers also validate the built Annoy index before measuring the
search step. That helps ensure the timing result corresponds to a usable,
reopenable index rather than a partially successful build.

```{r}
single$validation$valid
single$validation$checks[, c("check", "passed", "severity")]
```

The same summary is also written to CSV when `output_path` is supplied.

```{r}
read.csv(single_csv, stringsAsFactors = FALSE)
```

## External-Query Versus Self-Search Benchmarks

One subtle but important detail is how synthetic data generation works:

- if `x = NULL` and `query` is omitted, the benchmark generates a separate
  synthetic query matrix
- if `x = NULL` and `query = NULL` is supplied explicitly, the benchmark runs
  self-search on the reference matrix

That difference is reflected in the `self_search` and `n_query` fields.

```{r}
external_run <- benchmark_annoy_bigmatrix(
  n_ref = 120L,
  n_query = 12L,
  n_dim = 5L,
  k = 3L,
  n_trees = 8L,
  exact = FALSE,
  path_dir = bench_dir
)

self_run <- benchmark_annoy_bigmatrix(
  n_ref = 120L,
  query = NULL,
  n_dim = 5L,
  k = 3L,
  n_trees = 8L,
  exact = FALSE,
  path_dir = bench_dir
)

shape_cols <- c("self_search", "n_ref", "n_query", "k")

rbind(
  external = external_run[["summary"]][, shape_cols],
  self = self_run[["summary"]][, shape_cols]
)
```

That distinction matters when you are benchmarking workflows that mirror either
training-set neighbour search or truly external query traffic.

## Benchmark a Recall Suite Across Parameter Grids

For tuning work, a single benchmark point is usually not enough. The suite
helper runs a grid of `n_trees` and `search_k` values on the same dataset so
you can compare trade-offs more systematically.

```{r}
suite_csv <- file.path(bench_dir, "suite.csv")

suite <- benchmark_annoy_recall_suite(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = c(5L, 10L),
  search_k = c(-1L, 50L),
  exact = FALSE,
  path_dir = bench_dir,
  output_path = suite_csv,
  load_mode = "eager"
)

suite$summary
```

Each row corresponds to one `(n_trees, search_k)` configuration on the same
underlying benchmark dataset.
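
A quick way to rank the grid is to sort the summary by one of the timing columns;
the sketch below (not run) uses `search_elapsed`, and with `exact = TRUE` you could
sort by `recall_at_k` instead.

```r
## Not run: rank the (n_trees, search_k) grid by search latency.
ord <- order(suite$summary$search_elapsed)
suite$summary[ord, c("n_trees", "search_k", "build_elapsed", "search_elapsed")]
```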

The saved CSV contains the same summary table.

```{r}
read.csv(suite_csv, stringsAsFactors = FALSE)
```

## Optional Exact Recall Against bigKNN

For Euclidean workloads, the benchmark helpers can optionally compare Annoy
results against the exact `bigKNN` baseline and report:

- `exact_elapsed`
- `recall_at_k`

That comparison is only available when the runtime package `bigKNN` is
installed.

```{r}
if (length(find.package("bigKNN", quiet = TRUE)) > 0L) {
  exact_run <- benchmark_annoy_bigmatrix(
    n_ref = 150L,
    n_query = 15L,
    n_dim = 5L,
    k = 3L,
    n_trees = 10L,
    search_k = 50L,
    metric = "euclidean",
    exact = TRUE,
    path_dir = bench_dir
  )

  exact_run$exact_available
  exact_run$summary[, c("build_elapsed", "search_elapsed", "exact_elapsed", "recall_at_k")]
} else {
  "Exact baseline example skipped because bigKNN is not installed."
}
```

This is the most direct way to answer the practical question, "How much search
speed am I buying, and what recall do I lose in return?"
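
One way to put a number on that trade-off is to compare `exact_elapsed` with
`search_elapsed` alongside `recall_at_k`. A small sketch (not run), assuming the
exact baseline from the previous chunk actually ran:

```r
## Not run: requires exact_run from the bigKNN-guarded chunk above.
with(exact_run$summary, data.frame(
  speedup_vs_exact = exact_elapsed / search_elapsed,  # how much faster Annoy search was
  recall_at_k      = recall_at_k                      # what recall that speed cost
))
```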

## Benchmark User-Supplied Data

Synthetic data is convenient, but real benchmarking usually needs real data.
Both `benchmark_annoy_bigmatrix()` and `benchmark_annoy_recall_suite()` accept
user-supplied reference and query inputs.

```{r}
ref <- matrix(rnorm(80 * 4), nrow = 80, ncol = 4)
query <- matrix(rnorm(12 * 4), nrow = 12, ncol = 4)

user_run <- benchmark_annoy_bigmatrix(
  x = ref,
  query = query,
  k = 3L,
  n_trees = 12L,
  search_k = 40L,
  exact = FALSE,
  filebacked = TRUE,
  path_dir = bench_dir,
  load_mode = "eager"
)

user_run$summary[, c(
  "filebacked",
  "self_search",
  "n_ref",
  "n_query",
  "n_dim",
  "build_elapsed",
  "search_elapsed"
)]
```

When `filebacked = TRUE`, dense reference inputs are first converted into a
file-backed `big.matrix` before the Annoy build starts. That can be useful
when you want the benchmark workflow to resemble the package's real persisted
data path more closely.
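
If the reference data already lives in a file-backed `big.matrix`, you can skip the
conversion and pass the object (or its descriptor) directly, as listed among the
supported input types earlier. A minimal sketch, not run, assuming a `bigmemory`
backing created in `bench_dir`:

```r
## Not run: benchmark against an existing file-backed big.matrix.
library(bigmemory)

bm <- filebacked.big.matrix(
  nrow = 80, ncol = 4, type = "double",
  backingfile    = "ref.bin",
  descriptorfile = "ref.desc",
  backingpath    = bench_dir
)
bm[, ] <- matrix(rnorm(80 * 4), nrow = 80, ncol = 4)

bm_run <- benchmark_annoy_bigmatrix(
  x = bm,            # describe(bm) or the path to ref.desc should also work
  query = query,
  k = 3L,
  n_trees = 12L,
  exact = FALSE,
  path_dir = bench_dir
)
```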

## Compare bigANNOY with Direct RcppAnnoy

When you want to understand the cost of the `bigmemory`-oriented wrapper
itself, the most useful benchmark is not an exact Euclidean baseline. It is a
direct comparison with plain `RcppAnnoy`, using the same synthetic dataset, the
same metric, the same `n_trees`, and the same `search_k`.

That is what `benchmark_annoy_vs_rcppannoy()` provides.

```{r}
compare_csv <- file.path(bench_dir, "compare.csv")

compare_run <- benchmark_annoy_vs_rcppannoy(
  n_ref = 200L,
  n_query = 20L,
  n_dim = 6L,
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = compare_csv,
  load_mode = "eager"
)

compare_run$summary[, c(
  "implementation",
  "reference_storage",
  "n_ref",
  "n_query",
  "n_dim",
  "total_data_bytes",
  "index_bytes",
  "build_elapsed",
  "search_elapsed"
)]
```

This benchmark is useful for a different question from the earlier exact
baseline:

- `benchmark_annoy_bigmatrix()` asks how approximate Annoy behaves on a given
  dataset and, optionally, how much recall it loses against exact `bigKNN`
- `benchmark_annoy_vs_rcppannoy()` asks how much overhead or benefit comes from
  the package's `bigmemory` and persistence workflow relative to direct
  `RcppAnnoy`

The output also includes data-volume fields:

- `ref_bytes`: estimated bytes in the reference matrix
- `query_bytes`: estimated bytes in the query matrix
- `total_data_bytes`: reference plus effective query volume
- `index_bytes`: bytes in the saved Annoy index
- `metadata_bytes`: bytes in the sidecar metadata file
- `artifact_bytes`: total bytes of the persisted Annoy artifacts written by the
  workflow

The generated CSV contains the same comparison table.

```{r}
read.csv(compare_csv, stringsAsFactors = FALSE)[, c(
  "implementation",
  "ref_bytes",
  "query_bytes",
  "index_bytes",
  "metadata_bytes",
  "artifact_bytes"
)]
```

In practice, the comparison table helps answer two operational questions:

- Is `bigANNOY` close enough to plain `RcppAnnoy` on build and search speed for
  this workload?
- How large is the persisted Annoy index relative to the input data volume?
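
The second question follows directly from the byte-oriented columns; a short
sketch (not run) using the comparison run from above:

```r
## Not run: persisted index size relative to the benchmarked data volume.
with(compare_run$summary, data.frame(
  implementation,
  index_to_data_ratio = index_bytes / total_data_bytes
))
```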

## Benchmark Scaling by Data Volume

A single comparison point is useful, but it does not tell you whether the
wrapper overhead stays modest as the problem gets larger. The volume suite runs
the same `bigANNOY` versus `RcppAnnoy` comparison across a grid of synthetic
data sizes.

```{r}
volume_csv <- file.path(bench_dir, "volume.csv")

volume_run <- benchmark_annoy_volume_suite(
  n_ref = c(200L, 500L),
  n_query = 20L,
  n_dim = c(6L, 12L),
  k = 3L,
  n_trees = 10L,
  search_k = 50L,
  exact = FALSE,
  path_dir = bench_dir,
  output_path = volume_csv,
  load_mode = "eager"
)

volume_run$summary[, c(
  "implementation",
  "n_ref",
  "n_dim",
  "total_data_bytes",
  "index_bytes",
  "build_elapsed",
  "search_elapsed"
)]
```

This kind of table is especially useful when you want to prepare a more formal
benchmark note for a package release or for internal performance regression
tracking:

- it shows how build time changes as reference size grows
- it shows how query time changes as dimension grows
- it shows whether index size scales roughly as expected with data volume
- it makes the `bigANNOY` versus direct `RcppAnnoy` gap visible across more
  than one benchmark point
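
The last point can be made concrete by pairing up the two implementations per
`(n_ref, n_dim)` cell. The sketch below (not run) assumes the `implementation`
column uses the labels `bigANNOY` and `RcppAnnoy` and that each cell appears once
per implementation.

```r
## Not run: per-cell ratios between bigANNOY and direct RcppAnnoy.
vol  <- volume_run$summary
keys <- c("n_ref", "n_dim")
cols <- c(keys, "build_elapsed", "search_elapsed")

cmp <- merge(
  vol[vol$implementation == "bigANNOY",  cols],
  vol[vol$implementation == "RcppAnnoy", cols],
  by = keys, suffixes = c("_big", "_rcpp")
)
cmp$build_ratio  <- cmp$build_elapsed_big  / cmp$build_elapsed_rcpp
cmp$search_ratio <- cmp$search_elapsed_big / cmp$search_elapsed_rcpp
cmp[, c(keys, "build_ratio", "search_ratio")]
```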

## Interpreting the Main Summary Columns

The most useful summary fields are:

- `build_elapsed`: time spent creating the Annoy index
- `search_elapsed`: time spent running the search step
- `exact_elapsed`: time spent on the exact Euclidean baseline, when available
- `recall_at_k`: average overlap with the exact top-`k` neighbours
- `implementation`: whether the row came from `bigANNOY` or direct `RcppAnnoy`
- `n_trees`: index quality/size control at build time
- `search_k`: query effort control at search time
- `self_search`: whether the benchmark searched the reference rows against
  themselves
- `filebacked`: whether dense reference data was converted into a file-backed
  `big.matrix`
- `ref_bytes`, `query_bytes`, and `index_bytes`: the rough data and artifact
  volume associated with the benchmark
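
For reports it is sometimes useful to derive a throughput figure as well; a small
sketch (not run) using the saved single-run CSV from earlier:

```r
## Not run: queries per second derived from a saved summary CSV.
s <- read.csv(single_csv, stringsAsFactors = FALSE)
s$queries_per_sec <- s$n_query / s$search_elapsed
s[, c("n_ref", "n_query", "search_elapsed", "queries_per_sec")]
```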

In practice:

- raise `search_k` first when recall is too low
- increase `n_trees` when higher search budgets alone are not enough
- compare `search_elapsed` and `recall_at_k` together instead of optimizing
  either one in isolation
- use `benchmark_annoy_vs_rcppannoy()` when you want to reason about package
  overhead rather than approximate-versus-exact quality
- use `benchmark_annoy_volume_suite()` when you need a more formal scaling
  table for release notes or internal reports

## Installed Benchmark Runner

The package also installs a command-line benchmark script. That is convenient
when you want to run a benchmark outside an interactive R session or save CSV
output from shell scripts.

The installed path is:

```{r}
system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY")
```

Example single-run command:

```sh
Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=single \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --n_trees=100 \
  --search_k=5000 \
  --load_mode=eager
```

Example suite command:

```sh
Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=suite \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --suite_trees=10,50,100 \
  --suite_search_k=-1,2000,10000 \
  --output_path=/tmp/bigannoy_suite.csv
```

Example direct-comparison command:

```sh
Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=compare \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --n_trees=100 \
  --search_k=5000 \
  --load_mode=eager
```

Example volume-suite command:

```sh
Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=volume \
  --suite_n_ref=2000,5000,10000 \
  --suite_n_query=200 \
  --suite_n_dim=20,50 \
  --k=10 \
  --n_trees=50 \
  --search_k=1000 \
  --output_path=/tmp/bigannoy_volume.csv
```

## Recommended Workflow

A practical tuning workflow usually looks like this:

1. start with a small single benchmark to confirm dimensions and plumbing
2. switch to a suite over a small `n_trees` by `search_k` grid
3. enable exact Euclidean benchmarking when `bigKNN` is available
4. compare recall and latency together
5. repeat the same workflow on user-supplied data before drawing conclusions

## Recap

`bigANNOY`'s benchmark helpers are designed to make performance work part of
the normal package workflow, not a separate ad hoc script:

- `benchmark_annoy_bigmatrix()` for one configuration
- `benchmark_annoy_recall_suite()` for parameter sweeps
- `benchmark_annoy_vs_rcppannoy()` for direct implementation comparison
- `benchmark_annoy_volume_suite()` for speed and size scaling studies
- optional exact recall against `bigKNN`
- CSV output for saved summaries
- support for both synthetic and user-supplied data

The next vignette to read after this one is usually *Metrics and Tuning*,
which goes deeper on how to choose metrics and search/build controls.
