---
title: "Automatic Variable Labeling"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Automatic Variable Labeling}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(gtsummary.print_engine = "gt")
```

```{r setup}
#| eval: false
library(sumExtras)
library(gtsummary)
library(dplyr)

use_jama_theme()
```

```{r setup2}
#| echo: false
#| message: false
#| warning: false
library(sumExtras)
library(gtsummary)
library(dplyr)
library(gt)

use_jama_theme()
```

Raw variable names like `trt`, `marker`, and `grade` don't belong in a publication table. If you're building 20+ tables across an analysis, relabeling the same variables by hand in every `tbl_summary()` call is time-consuming and makes it easy for labels to drift out of sync between tables. `add_auto_labels()` lets you define labels once and apply them everywhere.

## Creating a Data Dictionary

A dictionary is a data frame with two columns: `variable` (the exact variable names) and `description` (the labels you want displayed). Column names are case-insensitive, so `Variable` and `Description` work too.

```{r}
dictionary <- tibble::tribble(
  ~variable,    ~description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade",
  "response",   "Tumor Response",
  "death",      "Patient Died"
)

dictionary
```

In practice, you could load this from a CSV or define it once at the top of your analysis script.
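As a sketch of the CSV route, the chunk below writes a small dictionary to a temporary file and reads it back with base R's `utils::read.csv()`. The temporary path and the name `dictionary_csv` are illustrative only. The capitalized headers `Variable` and `Description` are chosen to exercise the case-insensitive column matching described above.

```{r}
# Write a small dictionary to a temporary CSV (a stand-in for your own file)
csv_path <- tempfile(fileext = ".csv")
utils::write.csv(
  data.frame(
    Variable    = c("age", "marker", "grade"),
    Description = c("Age at Enrollment (years)", "Marker Level (ng/mL)", "Tumor Grade")
  ),
  csv_path,
  row.names = FALSE
)

# Read it back, keeping text columns as character rather than factor
dictionary_csv <- utils::read.csv(csv_path, stringsAsFactors = FALSE)
dictionary_csv
```

You can then pass the result as `add_auto_labels(dictionary = dictionary_csv)`. Using a name other than `dictionary` prevents this chunk from replacing the dictionary that the rest of this vignette relies on.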

## Labeling gtsummary Tables

### Pass the Dictionary Explicitly

```{r}
trial |>
  tbl_summary(by = trt, include = c(age, grade, marker)) |>
  extras() |>
  add_auto_labels(dictionary = dictionary)
```

### Automatic Discovery

If an object named `dictionary` exists in your environment, `add_auto_labels()` finds and uses it automatically, so you don't need to pass it:

```{r}
# dictionary already exists from above
trial |>
  tbl_summary(by = trt, include = c(age, stage, response)) |>
  extras() |>
  add_auto_labels()
```

### Pre-Labeled Data

If your data already has label attributes (e.g., from `haven::read_sas()` or manual assignment), `add_auto_labels()` reads those directly:

```{r}
labeled_trial <- trial
attr(labeled_trial$age, "label") <- "Patient Age at Baseline"
attr(labeled_trial$marker, "label") <- "Biomarker Concentration (ng/mL)"

labeled_trial |>
  tbl_summary(by = trt, include = c(age, marker)) |>
  extras() |>
  add_auto_labels()
```

### Manual Overrides Always Win

Labels set via `label = list(...)` in `tbl_summary()` always take priority over dictionary or attribute labels:

```{r}
trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade, marker),
    label = list(age ~ "Age (from tbl_summary function)")
  ) |>
  extras() |>
  add_auto_labels(dictionary = dictionary)
```

### Regression Tables

`add_auto_labels()` works the same way with `tbl_regression()`. Here, the `dictionary` defined above is again found automatically:

```{r}
lm(marker ~ age + grade + stage, data = trial) |>
  tbl_regression() |>
  add_auto_labels()
```
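The same pattern should carry over to other model types that `tbl_regression()` supports. The sketch below fits a logistic model for `response`; this model choice is purely illustrative and does not come from the package documentation. It passes the dictionary explicitly and reports odds ratios via `tbl_regression(exponentiate = TRUE)`.

```{r}
# Logistic regression for tumor response; glm() drops rows with missing values
tbl_logit <- glm(response ~ age + grade + stage, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE) |>
  add_auto_labels(dictionary = dictionary)

tbl_logit
```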

## Label Priority

By default, labels are resolved in the order below. In particular, when a variable has both an attribute label and a dictionary label, the attribute label wins:

1. **Manual labels** (from `label = list(...)` in `tbl_summary()`) always win
2. **Attribute labels** (from `attr(data$var, "label")`) take priority over dictionary
3. **Dictionary labels** are used as a fallback

We recommend setting `options(sumExtras.prefer_dictionary = TRUE)` so dictionary labels take priority over attribute labels. This is especially useful when your imported data has generic attribute labels but your dictionary has the labels you actually want in publication tables. See `vignette("options")` for details.

```{r}
trial_both <- trial
attr(trial_both$age, "label") <- "Age from Attribute"

dictionary_conflict <- tibble::tribble(
  ~variable, ~description,
  "age", "Age from Dictionary"
)

# Attribute wins over dictionary
trial_both |>
  tbl_summary(by = trt, include = age) |>
  add_auto_labels(dictionary = dictionary_conflict) |>
  extras()
```
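To see the recommended option in action, the chunk below sets `sumExtras.prefer_dictionary = TRUE` temporarily and reruns the same table. Under the priority rules above, "Age from Dictionary" should now appear instead of the attribute label. This assumes the option is read when `add_auto_labels()` runs. Saving and restoring the previous value through `options()` prevents the setting from affecting later chunks.

```{r}
# Temporarily prefer dictionary labels; options() returns the previous values
old_opts <- options(sumExtras.prefer_dictionary = TRUE)

# Dictionary label should now win over the attribute label
trial_both |>
  tbl_summary(by = trt, include = age) |>
  add_auto_labels(dictionary = dictionary_conflict) |>
  extras()

# Restore the previous setting so later chunks keep the default behavior
options(old_opts)
```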

## Automatic Labeling via Options

If you always keep a `dictionary` in your environment, you can skip calling `add_auto_labels()` entirely. Set this once per session (or put it in your `.Rprofile`):

```{r, eval=FALSE}
options(sumExtras.auto_labels = TRUE)
```

Now every `extras()` call picks up the dictionary automatically:

```{r, eval=FALSE}
dictionary <- tibble::tribble(
  ~variable,    ~description,
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "grade",      "Tumor Grade"
)

# No add_auto_labels() needed
trial |>
  tbl_summary(by = trt) |>
  extras()
```

If no dictionary is found and the data has no label attributes, `extras()` continues normally. If labeling fails for any other reason, it issues a warning and continues instead of stopping with an error. You can still call `add_auto_labels()` explicitly whenever you need per-table control.

See `vignette("options")` for more on `.Rprofile` setup.
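For example, a project-level `.Rprofile` that turns on both automatic labeling and the recommended dictionary priority could be as short as the sketch below. Both option names come from this vignette. The sketch assumes sumExtras reads these options when its functions run, rather than resetting them when the package loads.

```{r, eval=FALSE}
# .Rprofile: runs at R startup, before any packages are attached.
# Setting options does not require sumExtras to be loaded.
options(
  sumExtras.auto_labels = TRUE,
  sumExtras.prefer_dictionary = TRUE
)
```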

## More Vignettes

* `vignette("sumExtras-intro")` -- getting started with `extras()`
* `vignette("styling")` -- group headers and advanced formatting
* `vignette("themes")` -- JAMA compact themes for `{gtsummary}` and `{gt}`