---
title: "Multiple-Testing Corrections: Personality Structure in the spi Dataset"
author: "Andreas M. Brandmaier"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Multiple-Testing Corrections: Personality Structure in the spi Dataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

This vignette demonstrates the application of multiple-testing corrections in SEM trees. It shows this with a simple SEM tree based on two personality dimensions from the `spi` dataset included in the `psychTools` package. We will

1. compute two higher-order personality dimensions,
2. specify a bivariate structural equation model in `OpenMx` that captures their variances and covariance,
3. grow a SEM tree with demographic predictors, and
4. illustrate the Bonferroni correction for multiple testing.

First, we load the necessary packages:

```{r setup}
library(psychTools)
library(psych)
library(OpenMx)
library(semtree)
```

## Prepare the spi data

The `psychTools::spi` data frame contains a large set of personality items along with demographic variables such as age, education, and gender. The companion object `spi.keys` holds scoring keys that map items to broad personality dimensions. Using `psych::scoreFast()`, we can quickly compute scale scores while handling reverse-keyed items.

```{r}
data(spi)
data(spi.keys)

# score all available scales and extract two broad dimensions
spi_scored <- scoreFast(keys = spi.keys, items = spi)


selected_traits <- c("Agree-A", "Extra-A")

trait_scores <- spi_scored[, selected_traits, drop = FALSE]

# potential predictors available in the dataset
candidate_covariates <- intersect(c("age", "education", "gender"), names(spi))
spi_analysis <- cbind(trait_scores, spi[, candidate_covariates, drop = FALSE])
spi_analysis <- na.omit(spi_analysis)

selected_traits <- c("Agreeableness", "Extraversion")
names(spi_analysis)[1:2] <- selected_traits

# add some noise
set.seed(104358)
spi_analysis$noise1 <- ordered(sample(c(0,1), nrow(spi_analysis), TRUE))
spi_analysis$noise2 <- ordered(sample(c(1:10), nrow(spi_analysis), TRUE))
spi_analysis$noise3 <- ordered(sample(c(1:10), nrow(spi_analysis), TRUE))
spi_analysis$noise4 <- ordered(sample(c(1:20), nrow(spi_analysis), TRUE))

spi_analysis$education <- ordered(spi_analysis$education)
```

The resulting `spi_analysis` data frame contains the two selected personality scales along with any demographic predictors that are present in the dataset. In addition, we simulate some uninformative _noise_ predictors.

## Specify a two-dimensional OpenMx model

We now specify a concise bivariate SEM that estimates the means, variances, and covariance of the two personality dimensions.

```{r}

two_dim_model <- mxModel(
  "TwoDimensions",
  type = "RAM",
  manifestVars = selected_traits,
  mxData(spi_analysis[, selected_traits, drop = FALSE], type = "raw"),
  # freely estimate the variances and their covariance
  mxPath(
    from = selected_traits,
    arrows = 2,
    connect = "unique.pairs",
    free = TRUE,
    values = c(1, 0.2, 1),
    labels = c("var_trait1", "cov_traits", "var_trait2")
  ),
  # estimate manifest means
  mxPath(
    from = "one",
    to = selected_traits,
    arrows = 1,
    free = TRUE,
    values = 0,
    labels = paste0("mean_", selected_traits)
  )
)

two_dim_fit <- mxRun(two_dim_model)
```

## Grow a SEM tree

With the fitted model in place, we can grow a SEM tree to search for subgroups that differ in their personality means and covariance structure. Any columns in `spi_analysis` not used as manifest variables automatically serve as candidate splitting variables.

```{r message=FALSE, warning=FALSE, results="hide"}
spi_tree <- semtree(
  model = two_dim_fit,
  data = spi_analysis,
  control = semtree_control(method="score")
)

```

The resulting tree captures how demographic variables partition the sample into groups with distinct estimates for the two personality dimensions.

## Applying the Bonferroni correction

To adjust for the inflated Type I error rate that arises from testing multiple candidate split variables, enable the Bonferroni correction in the control settings. This reduces the per-test significance threshold.

```{r message=FALSE, warning=FALSE, results="hide"}
bonf_ctrl <- semtree.control(bonferroni = TRUE, 
                             alpha = 0.05,
                             method = "score")
spi_tree_bonf <- semtree(
  model = two_dim_fit,
  data = spi_analysis,
  control = bonf_ctrl
)
```

You can compare the Bonferroni-corrected tree to the uncorrected version to see how stricter multiplicity control affects the discovered subgroups.

```{r}
plot(spi_tree)
plot(spi_tree_bonf)
```

The Bonferroni-adjusted tree will typically be smaller or include fewer splits because candidate partitions must meet the more conservative significance threshold.
