---
title: "Basic usage of edfinr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Basic usage of edfinr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  message = FALSE, 
  warning = FALSE,
  eval = TRUE
)
```

## Introduction

The `edfinr` package provides a simple, consistent interface for accessing comprehensive education finance data for U.S. school districts. This vignette will help you get started with the package's core functionality.

```{r setup, include = FALSE}
library(edfinr)
library(dplyr)
library(ggplot2)
```


```{r, eval = FALSE}
library(edfinr)
library(tidyverse)
```

## Core Function: `get_finance_data()`

The primary function in `edfinr` is `get_finance_data()`, which provides access to comprehensive school finance data from school years 2011-12 through 2021-22. The function combines data from multiple sources:

- **Financial data**: Revenue and expenditure data from the National Center for Education Statistics (NCES) version of the F-33 survey.
- **Enrollment**: Student counts from the NCES Common Core of Data (CCD).
- **Demographics**: Poverty estimates from the U.S. Census Bureau's Small Area Income and Poverty Estimates (SAIPE) program.
- **Community characteristics**: Income and education data from the American Community Survey (ACS).
- **Inflation adjustments**: Consumer Price Index for All Urban Consumers (CPI-U) data for constant dollar calculations.
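
To illustrate what a constant-dollar calculation involves, the sketch below rescales a nominal amount by a ratio of price-index values. The index values and the `to_constant_dollars()` helper are hypothetical placeholders, not real CPI-U figures and not part of `edfinr`; the "CPI Adjustments" vignette describes how the package itself handles inflation.

```{r sketch-cpi}
# hypothetical price-index values by year (placeholders, NOT real CPI-U data)
cpi <- c("2016" = 100, "2022" = 125)

# hypothetical helper: convert an amount in `from_yr` dollars to `base_yr` dollars
to_constant_dollars <- function(nominal, from_yr, base_yr, index = cpi) {
  nominal * index[[as.character(base_yr)]] / index[[as.character(from_yr)]]
}

# $10,000 in 2016 dollars expressed in 2022 dollars under the placeholder index
to_constant_dollars(10000, from_yr = 2016, base_yr = 2022)
```

The key design point is that every year is scaled to a single base year, so amounts from different years become directly comparable.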

## Basic Usage

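The simplest way to use `get_finance_data()` is to specify a year and a state. The `yr` argument refers to the year in which the school year ends, so `yr = 2016` requests data for the 2015-16 school year. For example, to get finance data for Kentucky school districts from the 2015-16 school year:

```{r example-1}
# download "skinny" dataset for a single year and a single state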
ky_sy16 <- get_finance_data(yr = 2016, geo = "KY")

# view the structure of the returned data
glimpse(ky_sy16)
```
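
Because `yr` identifies a school year by its ending year in the examples in this vignette, a small labeling helper can make tables and plot titles easier to read. The `sy_label()` function below is a hypothetical convenience, not part of `edfinr`, and it assumes that ending-year convention.

```{r sketch-sy-label}
# hypothetical helper (not part of edfinr): label a `yr` value as a school year
sy_label <- function(yr) {
  yr <- as.integer(yr)
  # the school year starts in the prior calendar year and ends in `yr`
  sprintf("%d-%02d", yr - 1L, yr %% 100L)
}

sy_label(c(2012, 2016, 2022))
```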

## Dataset Types: Skinny vs. Full

By default, `get_finance_data()` returns a "skinny" dataset with 41 essential variables covering:

- District identifiers and characteristics.
- Total revenues by source (local, state, federal).
- Current expenditures.
- Key demographic and economic indicators.

For more detailed analysis, you can request the "full" dataset with 89 variables that includes:

- All skinny dataset variables.
- Detailed expenditure data.
- Data on spending of temporary pandemic-related federal funding.

```{r example-2}
# download the full dataset with detailed expenditure data for a single year/state
ky_full_sy16 <- get_finance_data(yr = "2016", geo = "KY", dataset_type = "full")

# view additional variables in "full" dataset
names(ky_full_sy16)[42:89]
```
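
Indexing columns by position works only as long as the column order and counts stay fixed. As an alternative sketch, the extra columns can be found by comparing names. The toy data frames and the `exp_detail_*` column names below are illustrative stand-ins, not actual `edfinr` output.

```{r sketch-compare-columns}
# toy stand-ins for a "skinny" and a "full" result; column names are illustrative
skinny_toy <- data.frame(dist_name = "A", enroll = 100, rev_total = 5e5)
full_toy <- data.frame(
  dist_name = "A", enroll = 100, rev_total = 5e5,
  exp_detail_1 = 2e5, exp_detail_2 = 1e5
)

# columns that appear only in the larger data frame
extra_cols <- setdiff(names(full_toy), names(skinny_toy))
extra_cols
```

The same comparison could be applied to the objects created earlier with `setdiff(names(ky_full_sy16), names(ky_sy16))`.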

## Multiple Years and States

The `get_finance_data()` function makes it easy to access data across multiple years and states:

```{r example-3}
# get data for multiple states across multiple years
sec_data <- get_finance_data(
  yr = "2018:2022",  # years 2018 through 2022
  geo = "AL,AR,FL,GA,KY,LA,MS,MO,OK,SC,TN,TX"  # comma-separated state codes
)

# get the most recent year of data for all states
us_sy22 <- get_finance_data(yr = 2022, geo = "all")
```
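
When the years and states of interest already live in R vectors, the string formats shown above can be assembled with `paste()`. This is a minimal sketch that assumes a contiguous run of years; the variable names are illustrative.

```{r sketch-build-args}
# vectors of the years and states of interest
yrs <- 2018:2022
states <- c("AL", "AR", "FL", "GA", "KY")

# "start:end" year range (valid only for contiguous years) and
# comma-separated state codes, matching the formats used above
yr_arg <- paste(range(yrs), collapse = ":")
geo_arg <- paste(states, collapse = ",")

yr_arg
geo_arg
```

The resulting strings can then be passed as the `yr` and `geo` arguments of `get_finance_data()`.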


## Working with the Data

Once you've retrieved the data, you can use standard data manipulation tools to analyze it. Here are some common analysis patterns:

### Analyzing Local vs. Total Revenue Per-Pupil

```{r analysis-1}
# download 2022 data for connecticut
ct_sy22 <- get_finance_data(yr = "2022", geo = "CT")

# plot local revenue vs. total revenue w/ urbanicity + enrollment
ggplot(ct_sy22) +
  geom_point(aes(
    x = rev_local_pp, 
    y = rev_total_pp,
    color = urbanicity,
    size = enroll),
    alpha = .6) +
  scale_size_area(
    max_size = 10,
    labels = scales::label_comma()
    ) +    
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_dollar()) +
  labs(
    title = "Connecticut Districts' Local vs. Total Revenue Per-Pupil, SY2021-22",
    x = "Local Revenue Per-Pupil", 
    y = "Total Revenue Per-Pupil", 
    size = "Enrollment", 
    color = "Urbanicity") +
  theme_bw()
```

### Analyzing Revenue Sources by Urbanicity

```{r analysis-2}
# compute each district's revenue shares, then average them (unweighted) by urbanicity
revenue_analysis <- ct_sy22 |>
  mutate(
    pct_local = rev_local / rev_total,
    pct_state = rev_state / rev_total,
    pct_federal = rev_fed / rev_total
  ) |>
  select(dist_name, urbanicity, enroll, pct_local, pct_state, pct_federal) |>
  group_by(urbanicity) |>
  summarize(
    avg_pct_local = mean(pct_local, na.rm = TRUE),
    avg_pct_state = mean(pct_state, na.rm = TRUE),
    avg_pct_federal = mean(pct_federal, na.rm = TRUE),
    n_districts = n(),
    enrollment = sum(enroll, na.rm = TRUE)
  )

print(revenue_analysis)
```
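
The averages above give every district equal weight regardless of size. The sketch below, which uses made-up toy values rather than `edfinr` data, contrasts that unweighted mean with a pooled share in which larger districts count for more.

```{r sketch-weighted-share}
# toy data: three districts across two urbanicity categories (values are made up)
toy <- data.frame(
  urbanicity = c("City", "City", "Rural"),
  rev_local  = c(900, 10, 50),
  rev_total  = c(1000, 100, 100)
)

# unweighted: average of each district's local share
unweighted <- tapply(toy$rev_local / toy$rev_total, toy$urbanicity, mean)

# pooled: total local revenue divided by total revenue within each category
weighted <- tapply(toy$rev_local, toy$urbanicity, sum) /
  tapply(toy$rev_total, toy$urbanicity, sum)

rbind(unweighted, weighted)
```

In the toy "City" group the two measures diverge because one district accounts for most of the revenue; which measure is appropriate depends on whether the question is about the typical district or about the category as a whole.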

## Additional Resources

For more information about the data and methods used in this package:

- Use `list_variables()` to see all available variables and their descriptions.
- Use `get_states()` to see valid state codes.
- See the "CPI Adjustments" vignette for information about inflation adjustments.
- See the "Data Sources and Methods" vignette for detailed methodology.
