---
title: "Alcohol Counts"
output: rmarkdown::html_vignette
fig.width: 8
fig.height: 5
vignette: >
  %\VignetteIndexEntry{Alcohol Counts}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 80
---

Counts involving alcohol are complicated due to a high rate of missing BAC
values. FARS and GESCRSS include the variables `alc_res` (BAC values) and
`dr_drink` (police-reported alcohol involvement). NHTSA also uses [multiple
imputation](https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/809403)
(MI) to estimate missing BAC values in FARS records (*not* GES/CRSS records).
[Traffic Safety
Facts](https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813713) uses
MI to report the number of fatalities from crashes involving a driver with a BAC
of .08 g/dL or higher. Using `alc_res` and `dr_drink` produce smaller counts
than those produced via MI.

`rfars::counts()` counts alcohol-involved crashes, fatalities and injuries using
`alc_res` and `dr_drink`. However, the `flat` object returned by `get_fars()`
includes the MI variables `a1`:`a10` from the ***Midrvacc*** file, and
`p1`:`p10` from the ***Miper*** file. Below we explore the differences and show
how to use the MI variables to replicate NHTSA's reporting. See Appendix E of
the [FARS Analytical User's
Manual](https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813794) for
more information on these files.

```{r, message=F}
library(dplyr)
library(ggplot2)
library(tidyr)
library(rfars)
```

Here we use `rfars::counts()` to count alcohol-involved crashes in 2023:

```{r, results='asis', eval=FALSE}
myFARS <- get_fars(years = 2023, proceed = TRUE)
counts(myFARS, involved = 'alcohol') %>% knitr::kable(format = "html")
```


```{r, results='asis', echo=F}
vignette_data <- rfars:::vignette_data
knitr::kable(vignette_data$alccounts_1, format = "html")
message("Note: rfars::counts() uses the variables alc_res and dr_drink to determine alcohol involvement. NHTSA reports counts using multiple imputation to estimate missing BAC values. See vignette('Alcohol Counts', package = 'rfars') for more information.")
```

And here we use `rfars::counts()` to count the number of alcohol-involved
*fatalities* in the same year:

```{r, results='asis', eval=F}
counts(
  df = myFARS, 
  what = "fatalities",
  involved = 'alcohol'
) %>%
  knitr::kable(format = "html")
```

```{r, results='asis', echo=F}
knitr::kable(vignette_data$alccounts_2, format = "html")
message("Note: rfars::counts() uses the variables alc_res and dr_drink to determine alcohol involvement. NHTSA reports counts using multiple imputation to estimate missing BAC values. See vignette('Alcohol Counts', package = 'rfars') for more information.")
```

The MI process produces 10 separate BAC estimates. The ***Miper*** file provides
`p1`:`p10`, person level BAC estimates for each driver and non-occupant in the
FARS Person data file. The ***Midrvacc*** file provides variables `a1`:`a10`,
crash level BAC estimates derived from driver records in the ***Miper*** file.

To replicate the estimated number of fatalities reported in Traffic Safety
Facts, we implement the procedure below.

**Step 1.** Start with the `flat` object and filter to fatalities. Then generate
30 indicator variables, 10 `FPC_i`, 10 `SPC_i`, and 10 `TPC_i` variables. `i`
corresponds to the 10 `a_i` and `p_i` variables. FPC indicates that BAC = 0.00,
SPC that BAC is between 0.01 and 0.07, and TPC that BAC is \>= 0.08. So if `a_1`
= 5 (representing a BAC of 0.05), then `FPC_1` = 0, `SPC_1` = 1, and `TPC_1` =
0.

```{r, eval=F}
temp <- myFARS$flat %>% 
  select(year:per_no, age, sex, per_typ, inj_sev, alc_res, dr_drink, a1:a10) %>%
  filter(inj_sev == "Fatal Injury (K)")

for(i in 1:10) {
  imputation_col <- paste0("a", i)
  temp[[paste0("FPC", i)]] <- ifelse(temp[[imputation_col]] == 0, 1, 0)  # BAC = 0.00
  temp[[paste0("SPC", i)]] <- ifelse(temp[[imputation_col]] >= 1 & temp[[imputation_col]] <= 7, 1, 0)  # BAC = 0.01-0.07
  temp[[paste0("TPC", i)]] <- ifelse(temp[[imputation_col]] >= 8, 1, 0)  # BAC = 0.08+
}
```

This produces the table below (transposed for easier viewing).

```{r, results='asis', eval=F}
temp %>% 
  select(st_case, a1:a10, starts_with("FPC"), starts_with("SPC"), starts_with("TPC")) %>% 
  slice(1:10) %>% 
  t() %>%
  knitr::kable(format = "html")
```

```{r, results='asis', echo=F}
vignette_data$alccounts_3 %>% 
  t() %>%
  knitr::kable(format = "html")
```



We can examine one crash to see the relation between the `a` and `FPC`, `SPS`,
and `TPC` variables. The table below shows the 10 iterations for `a`, and the
corresponding indicator variables.

```{r, results='asis', eval=F}
temp %>% 
  slice(1) %>% 
  select(st_case, a1:a10, starts_with("FPC"), starts_with("SPC"), starts_with("TPC")) %>%
  pivot_longer(-1) %>%
  mutate(
    iter = gsub("\\D", "", name),
    name = gsub("[^A-Za-z]", "", name)
  ) %>%
  pivot_wider() %>%
  knitr::kable(format = "html")
```

```{r, results='asis', echo=F}
vignette_data$alccounts_4 %>%
  pivot_wider() %>%
  knitr::kable(format = "html")
```

**Step 2.** Sum the indicator variables in each iteration.

```{r, eval=F}
case_results <- list()

for(i in 1:10) {
  fpc_col <- paste0("FPC", i)
  spc_col <- paste0("SPC", i)
  tpc_col <- paste0("TPC", i)
  
  case_results[[i]] <-
    temp %>%
    summarise(
      TOTAL = n(),
      !!paste0("FSBAC", i) := sum(!!sym(fpc_col), na.rm = TRUE),
      !!paste0("SSBAC", i) := sum(!!sym(spc_col), na.rm = TRUE),
      !!paste0("TSBAC", i) := sum(!!sym(tpc_col), na.rm = TRUE),
      .groups = 'drop'
    )
}
```

This produces 10 estimates of the number of fatalities from crashes where BAC =
0 (FSBAC), where BAC was between 0.01 and 0.07 (SSBAC), and where BAC \>= 0.08
(TSBAC). The table below shows all of these estimates.

```{r, results='asis', eval=F}
bind_rows(
  data.frame(case_results[[1]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=1),
  data.frame(case_results[[2]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=2),
  data.frame(case_results[[3]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=3),
  data.frame(case_results[[4]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=4),
  data.frame(case_results[[5]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=5),
  data.frame(case_results[[6]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=6),
  data.frame(case_results[[7]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=7),
  data.frame(case_results[[8]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=8),
  data.frame(case_results[[9]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=9),
  data.frame(case_results[[10]]) %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=10)
  ) %>%
  knitr::kable(format = "html")
```

```{r, results='asis', echo=F}
knitr::kable(vignette_data$alccounts_5, format = "html")
```

**Step 3.** Take the average of each set of estimates.

```{r, eval=F}
calc <- case_results[[1]]

for(i in 2:10) {
  calc <- calc %>% bind_cols(case_results[[i]] %>% select(-TOTAL))
}

calc <-
  calc %>%
  rowwise() %>%
  mutate(
    SBAC0 = round(mean(c_across(starts_with("FSBAC")), na.rm = TRUE)), # BAC 0.00
    SBAC1 = round(mean(c_across(starts_with("SSBAC")), na.rm = TRUE)), # BAC 0.01-0.07
    SBAC2 = round(mean(c_across(starts_with("TSBAC")), na.rm = TRUE))  # BAC 0.08+
  ) %>%
  ungroup()
```

This gives us our answer. The values below give the final estimates of the
number of fatalities from crashes where BAC = 0 (SBAC0), where BAC was between
0.01 and 0.07 (SBAC1), and where BAC \>= 0.08 (SBAC2). The value for SBAC2 is
12,429, which matches *Traffic Safety Facts* for 2023.

```{r, results='asis', eval=F}
select(calc, SBAC0:SBAC2) %>% knitr::kable(format = "html")
```


```{r, results='asis', echo=F}
knitr::kable(vignette_data$alccounts_6, format = "html")
```

**Shortcut.** We can do this more directly if we're only interested in the
number of fatalities from crashes with alcohol-impaired drivers:

```{r, eval=F}
x <-
  myFARS$flat %>% 
  select(year:per_no, age, sex, per_typ, inj_sev, alc_res, dr_drink, a1:a10) %>%
  filter(inj_sev == "Fatal Injury (K)") %>%
  mutate_at(paste0("a", 1:10), function(x) 1*(x>=8)) %>%
  group_by(year) %>%
  summarize_at(paste0("a", 1:10), sum, na.rm=T) %>%
  rowwise() %>%
  mutate(a = round(mean(c_across(a1:a10)))) 

x$a
```

```{r, echo=F}
vignette_data$alccounts_7
```


Please see [Transitioning to Multiple Imputation – A New Method to Impute
Missing BAC values in
FARS](https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/809403) for
more guidance on generating other counts using the MI data.