---
title: "Tutorial: Obtain an overall p-value for a factor variable"
author: "Emily C. Zabor"
date: "Last updated: `r format(Sys.Date(), '%B %d, %Y')`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{Tutorial: Obtain an overall p-value for a factor variable}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(riskclustr)
```


# Context

After using `eh_test_subtype()` to obtain a model fit that includes a factor variable, it is often of interest to obtain an overall p-value that tests for differences across subtypes across all levels of that factor variable simultaneously.

The `posthoc_factor_test()` function allows for post-hoc testing of a factor variable.

# Example

```{r, message = FALSE}
# Load needed packages
library(riskclustr)
library(dplyr)
```

```{r}
# Create a new example dataset that contains a factor variable
factor_data <- 
  subtype_data %>%
  mutate(
    x4 = cut(
      x1,
      breaks = c(-3.4, -0.4, 0.3, 1.1, 3.8),
      include.lowest = TRUE,
      labels = c("1st quart",
                 "2nd quart",
                 "3rd quart",
                 "4th quart")
      )
    )
```
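Before fitting the model, it can help to confirm that every observation was assigned to one of the four levels. The check below is a minimal sketch that assumes `factor_data` was created by the chunk above; any value of `x1` falling outside the break points would appear in an `<NA>` column.

```{r}
# Count observations in each level of x4; an <NA> column would flag values
# of x1 that fall outside the break points passed to cut()
table(factor_data$x4, useNA = "ifany")
```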

```{r}
# Fit the model using x4 in place of x1
mod1 <- eh_test_subtype(
  label = "subtype",
  M = 4,
  factors = list("x4", "x2", "x3"),
  data = factor_data,
  digits = 2
)
```


After we have the model fit, we can obtain the p-value testing all levels of `x4` simultaneously.

```{r}
mypval <- posthoc_factor_test(
  fit = mod1,
  factor = "x4",
  nlevels = 4
)
```
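Rather than hard-coding `nlevels = 4`, the number of levels can also be read from the data with base R's `nlevels()`. This is a sketch that assumes `x4` is stored as a factor with no unused levels, since `nlevels()` counts unused levels too. The names `n_levels_x4` and `mypval_alt` are introduced here for illustration only.

```{r}
# Derive the number of levels from the data (assumes x4 is a factor
# with no unused levels)
n_levels_x4 <- nlevels(factor_data$x4)

# Same call as above, with nlevels taken from the data
mypval_alt <- posthoc_factor_test(
  fit = mod1,
  factor = "x4",
  nlevels = n_levels_x4
)
```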

The function returns both a formatted and an unformatted p-value. The formatted p-value can be accessed as `pval`:

```{r}
mypval$pval
```

The unformatted p-value can be accessed as `pval_raw`:

```{r}
mypval$pval_raw
```
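The unformatted value is useful when a different display format or a comparison against a threshold is needed. The sketch below uses base R's `format.pval()`; the choice of two digits, the `0.001` floor, and the `0.05` threshold are assumptions made for illustration, and `as.numeric()` is applied only as a precaution in case the value is not stored as a plain numeric.

```{r}
# Coerce to a plain numeric value as a precaution
p_raw <- as.numeric(mypval$pval_raw)

# Custom display format: two significant digits, values below 0.001
# are shown as "<0.001"
pval_display <- format.pval(p_raw, digits = 2, eps = 0.001)
pval_display

# Compare against a significance threshold (0.05 is an illustrative choice)
p_raw < 0.05
```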