---
title: "Introduction to hypothesize"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to hypothesize}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hypothesize)
```

## What is hypothesize?

The `hypothesize` package provides a consistent API for hypothesis testing in R.
It is designed around three principles from *Structure and Interpretation of
Computer Programs* (SICP):

1. **Data Abstraction**: Hide implementation details behind a clean interface
2. **Closure Property**: Combining tests yields tests
3. **Higher-Order Functions**: Transform tests into new tests

This vignette introduces the package through examples, building from simple
primitives to more sophisticated compositions.

## The Hypothesis Testing Interface

Every hypothesis test in this package implements the same interface:

- `pval(x)` — extract the p-value
- `test_stat(x)` — extract the test statistic
- `dof(x)` — extract degrees of freedom
- `is_significant_at(x, alpha)` — check significance at level α

This abstraction means you can work with any test the same way, regardless of
its internal implementation.

## Primitive Tests

### The Z-Test: Simplest Case

The z-test is the most basic hypothesis test. It tests whether a population
mean equals a hypothesized value when the population standard deviation is
known:

```{r}
# A manufacturer claims widgets weigh 100g on average
# We test 30 widgets (population SD is known to be 5g from historical data)
set.seed(42)
weights <- rnorm(30, mean = 102, sd = 5)

# Test H0: mu = 100 vs H1: mu != 100
z <- z_test(weights, mu0 = 100, sigma = 5)
z
```

We can extract components:

```{r}
test_stat(z)   # The z-statistic
pval(z)        # The p-value
dof(z)         # Degrees of freedom (Inf for z-test)
```

### The Wald Test: General Parameters

The Wald test generalizes the z-test to any asymptotically normal estimator.
If you have an estimate and its standard error, you can test whether the true
parameter equals a hypothesized value:

```{r}
# Suppose we fit a regression and got coefficient = 2.3 with SE = 0.8
# Test H0: beta = 0 (no effect) vs H1: beta != 0
w <- wald_test(estimate = 2.3, se = 0.8, null_value = 0)
w

is_significant_at(w, 0.05)
```

The Wald statistic is z², which follows a chi-squared distribution with 1 df:

```{r}
w$z            # The z-score
test_stat(w)   # z² (the Wald statistic)
```

### The Likelihood Ratio Test: Comparing Models

The LRT compares nested models by examining the ratio of their likelihoods.
You can pass raw log-likelihood values, or use standard `logLik` objects
from any fitted model:

```{r}
# From raw log-likelihoods (specify dof manually)
test <- lrt(null_loglik = -250, alt_loglik = -240, dof = 3)
test

# Or from fitted models -- dof is derived automatically
set.seed(42)
x <- 1:50
y <- 2 + 3 * x + rnorm(50, sd = 5)
m0 <- lm(y ~ 1)       # intercept only
m1 <- lm(y ~ x)       # add slope
lrt(logLik(m0), logLik(m1))
```

## Duality: Tests and Confidence Intervals

Hypothesis tests and confidence intervals are two sides of the same coin. A 95%
confidence interval contains exactly those parameter values that would *not* be
rejected at the 5% level.

For tests that store their estimates, we can extract confidence intervals:

```{r}
w <- wald_test(estimate = 5.2, se = 1.1)
confint(w)               # 95% CI
confint(w, level = 0.99) # 99% CI
```

Notice the duality: the CI contains 0 if and only if the test is not significant:

```{r}
# Is 0 in the 95% CI?
ci <- confint(w)
ci["lower"] < 0 && 0 < ci["upper"]

# Is the test significant at 5%?
is_significant_at(w, 0.05)
```

## The Closure Property: Combining Tests

A key principle from SICP is the *closure property*: when you combine things,
you get back the same kind of thing. In our case, combining hypothesis tests
yields a hypothesis test.

Fisher's method combines p-values from independent tests:

```{r}
# Three independent studies test the same hypothesis
# None is individually significant, but together...
p1 <- 0.08
p2 <- 0.12
p3 <- 0.06

combined <- fisher_combine(p1, p2, p3)
combined

is_significant_at(combined, 0.05)
```

You can also combine actual test objects:

```{r}
t1 <- wald_test(estimate = 1.2, se = 0.8)
t2 <- wald_test(estimate = 0.9, se = 0.6)
t3 <- wald_test(estimate = 1.5, se = 1.0)

fisher_combine(t1, t2, t3)
```

Because the result is itself a `hypothesis_test`, you can use all the same
methods on it:

```{r}
combined <- fisher_combine(0.05, 0.10, 0.15)
pval(combined)
test_stat(combined)
dof(combined)
```

## Higher-Order Functions: Transforming Tests

When performing multiple tests, we need to adjust p-values to control error
rates. The `adjust_pval()` function transforms tests by adjusting their p-values:

```{r}
# Perform 5 tests
tests <- list(
  wald_test(estimate = 2.5, se = 1.0),
  wald_test(estimate = 1.8, se = 0.9),
  wald_test(estimate = 1.2, se = 0.7),
  wald_test(estimate = 0.9, se = 0.8),
  wald_test(estimate = 2.1, se = 1.1)
)

# Original p-values
vapply(tests, pval, numeric(1))

# Bonferroni-adjusted p-values
adjusted <- adjust_pval(tests, method = "bonferroni")
vapply(adjusted, pval, numeric(1))

# Less conservative: Benjamini-Hochberg (FDR control)
adjusted_bh <- adjust_pval(tests, method = "BH")
vapply(adjusted_bh, pval, numeric(1))
```

The adjusted tests retain all original properties:

```{r}
adj <- adjusted[[1]]
test_stat(adj)           # Same as original
adj$original_pval        # Can access unadjusted p-value
adj$adjustment_method    # Method used
```

Because adjusted tests are still hypothesis tests, they compose naturally:

```{r}
# First adjust, then combine
adjusted <- adjust_pval(tests[1:3], method = "bonferroni")
fisher_combine(adjusted[[1]], adjusted[[2]], adjusted[[3]])
```

## Extending the Package

The `hypothesis_test()` constructor makes it easy to add new test types:

```{r}
# Example: Create a simple chi-squared goodness-of-fit test wrapper
chisq_gof <- function(observed, expected) {
  stat <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1
  p.value <- pchisq(stat, df = df, lower.tail = FALSE)

  hypothesis_test(
    stat = stat,
    p.value = p.value,
    dof = df,
    superclasses = "chisq_gof_test",
    observed = observed,
    expected = expected
  )
}

# Use it
result <- chisq_gof(
  observed = c(45, 35, 20),
  expected = c(40, 40, 20)
)
result

# It works with all the standard functions
pval(result)
is_significant_at(result, 0.05)
```

## Summary

The `hypothesize` package demonstrates that a small set of well-designed
primitives and combinators can provide a powerful, extensible framework for
hypothesis testing:

| Function | Purpose | SICP Principle |
|----------|---------|----------------|
| `hypothesis_test()` | Base constructor | Data abstraction |
| `z_test()`, `wald_test()`, `lrt()`, `score_test()` | Primitive tests | Building blocks |
| `pval()`, `test_stat()`, `dof()` | Accessors | Abstraction barrier |
| `fisher_combine()` | Combine tests | Closure property |
| `intersection_test()`, `union_test()` | Boolean AND/OR | Closure property |
| `adjust_pval()`, `complement_test()` | Transform tests | Higher-order functions |
| `confint()`, `invert_test()` | Extract CI / confidence set | Duality |

This design makes hypothesis testing composable, extensible, and easy to reason
about. See `vignette("boolean-algebra")` for the full Boolean algebra of
hypothesis tests, De Morgan's laws, and the test-confidence duality.