---
title: "Understanding CPI Adjustments in edfinr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Understanding CPI Adjustments in edfinr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  message = FALSE, warning = FALSE, fig.width = 8
)
```

## Introduction

When analyzing education finance data across multiple years, adjusting for inflation can help produce more meaningful comparisons. `edfinr` provides built-in functionality to adjust all dollar-denominated values for inflation using the Consumer Price Index for All Urban Consumers (CPI-U).

```{r setup, include = FALSE}
library(edfinr)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
```

```{r, eval = FALSE}
library(tidyverse)
library(edfinr)
```
## Understanding Nominal vs. Real Dollars

By default, all financial data returned by `get_finance_data()` is in **nominal dollars** - the actual dollar amounts reported in each year without any inflation adjustment. This means that $1,000 in 2012 and $1,000 in 2022 are treated as equal amounts, even though they have different purchasing power.

To make valid comparisons across years, you need to convert to **real dollars** (also called constant dollars) by adjusting for inflation.

## How CPI Adjustment Works

`edfinr` uses the CPI-U index to adjust for inflation. The adjustment is aligned to the school year calendar:

- Each school year's CPI is calculated by averaging:
  - The second half of the first calendar year (July-December).
  - The first half of the second calendar year (January-June).
  
For example, the 2021-22 school year CPI combines:
- July-December 2021 (HALF2 2021).
- January-June 2022 (HALF1 2022).

## Using the cpi_adj Parameter

The `get_finance_data()` function includes a `cpi_adj` parameter to automatically adjust all dollar values:

```{r example-nominal}
# Get nominal (unadjusted) data - this is the default
nominal_data <- get_finance_data(yr = "2015:2022", geo = "KY")

# View the nominal revenue for a specific district
nominal_data |>
  filter(dist_name == "Jefferson County") |>
  select(year, dist_name, rev_total, rev_total_pp)
```

```{r example-adjusted}
# Get data adjusted to 2022 dollars
real_2022_data <- get_finance_data(yr = "2015:2022", geo = "KY", cpi_adj = 2022)

# View the same district with inflation-adjusted values
real_2022_data |>
  filter(dist_name == "Jefferson County") |>
  select(year, dist_name, rev_total, rev_total_pp)
```

## What Gets Adjusted?

When you use `cpi_adj`, the following variables are automatically adjusted for inflation:

- All revenue variables (total, local, state, federal).
- All expenditure variables.
- Median household income.
- Median property values.

Variables that are NOT adjusted include:
- Enrollment counts.
- Demographic percentages.
- Any ratio or percentage variables.

## Working with the CPI Index

Every dataset includes a `cpi_sy12` column that shows the CPI index relative to the 2011-12 school year:

```{r cpi-index}
# Examine the CPI index values
cpi_values <- get_finance_data(yr = "all", geo = "KY") |>
  select(year, cpi_sy12) |>
  distinct() |>
  arrange(year)

print(cpi_values)

# Calculate cumulative inflation since 2012
cpi_values |>
  mutate(
    inflation_since_2012 = (cpi_sy12 - 1) * 100,
    inflation_label = paste0(round(inflation_since_2012, 1), "%")
  )
```

## Practical Example: Tracking Real Spending Over Time

Here's how to analyze whether education revenue has kept pace with inflation:

```{r real-spending-analysis}
# get multiyear data in nominal dollars
ky_nominal <- get_finance_data(yr = "all", geo = "KY", cpi_adj = "none") |> 
  mutate(type = "Nominal dollars")

# get multi-year data adjusted to 2022 dollars
ky_real <- get_finance_data(yr = "all", geo = "KY", cpi_adj = "2022") |> 
  mutate(type = "Real 2022 dollars")

# join data
ky_data <- bind_rows(ky_nominal, ky_real)

# calculate statewide per-pupil revenue trends for real dollars
rev_trends <- ky_data |>
  group_by(type, year) |>
  summarize(
    rev_local = sum(rev_local, na.rm = TRUE),
    rev_state = sum(rev_state, na.rm = TRUE),
    rev_fed = sum(rev_fed, na.rm = TRUE),
    enroll = sum(enroll, na.rm = TRUE)
  ) |> 
  mutate(
    rev_local_pp = rev_local / enroll,
    rev_state_pp = rev_state / enroll,
    rev_fed_pp = rev_fed / enroll
  ) |> 
  select(type, year, rev_local_pp:rev_fed_pp) |>
  pivot_longer(
    cols = rev_local_pp:rev_fed_pp,
    names_to = "var", values_to = "val") |> 
  mutate(
    var = stringr::str_remove_all(var, "rev_"),
    var = stringr::str_remove_all(var, "_pp"),
    var = stringr::str_to_title(var),
    var = str_replace_all(var, "Fed", "Federal")
  )    

# plot trends
ggplot(rev_trends) +
  geom_line(
    aes(x = year, y = val, color = var, group = var)
    ) +
  facet_wrap(~type) +
  scale_x_discrete(breaks = c("2012", "2014", "2016", "2018", "2020", "2022")) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Comparing Nominal and Real Per-Pupil Revenue in Kentucky",
    subtitle = "Statewide average per-pupil revenue by source, 2012-2022",
    x = "Year",
    y = "Per-Pupil Revenue",
    color = "Revenue Source"
  ) +
  theme_minimal()
```

## Choosing a Base Year

You can adjust to any year from 2012 to 2022. Common choices include:

- **Most recent year** (e.g., 2022): Shows all values in current dollar terms.
- **First year of analysis**: Makes it easy to see percentage changes from baseline.
- **Midpoint year**: Minimizes the size of adjustments across the time series.

```{r base-year-comparison}
# select ky district to assess
district_sample <- "Jefferson County"

# get data with nominal dollars and cpi-adjusted for different base years
nominal <- get_finance_data(yr = "2012:2022", geo = "KY") |>
  filter(dist_name == district_sample) |>
  select(year, rev_total_pp) |>
  mutate(type = "Nominal")

adjusted_2012 <- get_finance_data(yr = "2012:2022", geo = "KY", cpi_adj = 2012) |>
  filter(dist_name == district_sample) |>
  select(year, rev_total_pp) |>
  mutate(type = "2012 Dollars")

adjusted_2022 <- get_finance_data(yr = "2012:2022", geo = "KY", cpi_adj = 2022) |>
  filter(dist_name == district_sample) |>
  select(year, rev_total_pp) |>
  mutate(type = "2022 Dollars")

# join and plot data
bind_rows(nominal, adjusted_2012, adjusted_2022) |>
  ggplot(aes(x = year, y = rev_total_pp, color = type, group = type)) +
  geom_line(size = 1.2) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = paste("Per-Pupil Revenue:", district_sample),
    x = "Year",
    y = "Revenue per Pupil",
    color = "CPI Adjustment"
  ) +
  theme_minimal()
```

## Best Practices

1. **Always use inflation adjustment for multi-year analyses**: Comparing nominal dollars across years can be misleading.

2. **Be consistent with your base year**: Use the same `cpi_adj` value for all data in an analysis.

3. **Document your choice**: Always note whether values are nominal or real, and which base year you used.

4. **Consider your audience**: Current dollars (most recent year) are often most intuitive for general audiences.

## Technical Notes

- The CPI data comes from the U.S. Bureau of Labor Statistics CPI-U series.
- School year alignment ensures the index matches the academic calendar.
- All adjustments are applied before any per-pupil calculations.
- The `cpi_sy12` column is always included regardless of adjustment choice .
