---
title: "NIH Enrollment Tables in HTML"
author: "Will Beasley and Peter Higgins"
date: '`r Sys.Date()`'
output:
  html_document:
    theme: simplex
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{NIH Enrollment Tables in HTML}
  %\VignetteEncoding{UTF-8}

---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r declare, echo = F}
```

Install `codified` Package
-----------------------------------------------------

First, install the `codified` package if necessary, and then [load it into memory](http://r-pkgs.had.co.nz/package.html#package).

```{r install-codified}
# if( !requireNamespace("codified", quietly = TRUE) )
#   remotes::install_github(repo = "OuhscBbmc/codified")

library(codified)
```

Local Data Source
-----------------------------------------------------

### Establish Datasets

```{r local-establish}
path <- system.file("misc/example-data-1.csv", package = "codified")
col_types <- readr::cols_only(
  record_id = readr::col_integer(),
  name_last = readr::col_character(),
  dob       = readr::col_date(format = ""),
  gender    = readr::col_integer(),
  race      = readr::col_integer(),
  ethnicity = readr::col_integer()
)
ds <- readr::read_csv(path, col_types = col_types) |>
  dplyr::mutate(
    gender     = as.character(gender),
    race       = as.character(race),
    ethnicity  = as.character(ethnicity)
  )
ds |>
  head(10) |>
  knitr::kable(caption = "Observed Dataset (first ten rows)")

ds_lu_gender <- tibble::tribble(
  ~input,   ~displayed            ,
  "0"   ,  "Female"               ,
  "1"   ,  "Male"                 ,
  "U"   ,  "Unknown/Not Reported"
)
knitr::kable(ds_lu_gender, caption = "Gender Mapping")

ds_lu_race <- tibble::tribble(
  ~input , ~displayed                                   ,
  "1"    , "American Indian/Alaska Native"              ,
  "2"    , "Asian"                                      ,
  "3"    , "Native Hawaiian or Other Pacific Islander"  ,
  "4"    , "Black or African American"                  ,
  "5"    , "White"                                      ,
  "M"    , "More than One Race"                         ,
  "6"    , "Unknown or Not Reported"
)
knitr::kable(ds_lu_race, caption = "Race Mapping")

ds_lu_ethnicity <- tibble::tribble(
  ~input,   ~displayed                      ,
  "2"   ,  "Not Hispanic or Latino"         ,
  "1"   ,  "Hispanic or Latino"             ,
  "0"   ,  "Unknown/Not Reported Ethnicity"
)
knitr::kable(ds_lu_ethnicity, caption = "Ethnicity Mapping")
```

### Apply Map to Observed Dataset

```{r local-apply-map}
ds_summary_long <- codified::table_nih_enrollment(
  d              = ds,
  d_lu_gender    = ds_lu_gender,
  d_lu_race      = ds_lu_race,
  d_lu_ethnicity = ds_lu_ethnicity
)

knitr::kable(ds_summary_long, caption = "Counts of Each Subgroup")
```

### Conform to NIH Cosmetics

```{r local-cosmetically-format}
codified::table_nih_enrollment_pretty(
  d              = ds,
  d_lu_gender    = ds_lu_gender,
  d_lu_race      = ds_lu_race,
  d_lu_ethnicity = ds_lu_ethnicity
)
```

REDCap Data Source
-----------------------------------------------------

A hosted (fake) clinical trial dataset demonstrates how to extract demographic data from [REDCap](https://projectredcap.org/) and then present the demographic data in the NIH Inclusion Enrollment Report format.

### Establish Datasets

First, install the `REDCapR` package if necessary, and then [load it into memory](http://r-pkgs.had.co.nz/package.html#package).

```{r install-redcapr}
# if( !requireNamespace("REDCapR", quietly = TRUE) )
#   remotes::install_github("OuhscBbmc/REDCapR")

library(REDCapR)
```

Next, download the data from the REDCap database into the `ds_2` data.frame.  If you're running the most recent version of REDCapR (available on GitHub), the code will be:

```r
ds_2 <- REDCapR::redcap_read(
  redcap_uri = "https://bbmc.ouhsc.edu/redcap/api/",  # URL of REDCap Server.
  token      = "F304DEC3793FECC3B6DEEFF66302CAD3",    # User-specific token/password.
  guess_type = FALSE                                  # Keep all variables as strings/characters.
)$data
```

<!-- readr::write_csv(ds_2, "inst/misc/example-data-2.csv") -->

However [CRAN policy](https://cran.r-project.org/web/packages/policies.html) understandably discourages vignettes from using "Internet resources" so this vignette mimics the code above with this local call.  On your own computer, feel free to call that demonstration REDCap project.

```{r redcap-establish}
ds_2 <-
  readr::read_csv(
    file      = system.file("misc/example-data-2.csv", package = "codified"),
    col_types = readr::cols(.default = readr::col_character())
  )
```

### Conform to NIH Cosmetics

Now, convert these demographic data into a properly formatted NIH enrollment table.  Pass the `ds_lu_gender`, `ds_lu_race`, and `ds_lu_ethnicity` metadata, which was defined above.  As a reminder, these translate values like `1` to `Male` and `3` to `Native Hawaiian or Other Pacific Islander`.

```{r redcap-local-cosmetically-format}
table_nih_enrollment_pretty(
  d               = ds_2,
  d_lu_gender     = ds_lu_gender,
  d_lu_race       = ds_lu_race,
  d_lu_ethnicity  = ds_lu_ethnicity
)
```

Collapsing Levels
-----------------------------------------------------

Many observed datasets may collect race with a different set of levels.  For instance in `ds_3`, the `American Indian` level is separate from the `Alaska Native` level.  In the first and second rows in the metadata below, the two levels are effectively combined into the`American Indian/Alaska Native` level, so it complies with the format of the NIH Enrollment table.

```{r collapsing-levels}
ds_lu_race_3 <- tibble::tribble(
  ~input                      , ~displayed                                  ,
  "American Indian"           , "American Indian/Alaska Native"             , # Combine w/ Alaska Native
  "Alaska Native"             , "American Indian/Alaska Native"             , # Combine w/ American Indian
  "Asian"                     , "Asian"                                     ,
  "Native Hawaiian"           , "Native Hawaiian or Other Pacific Islander" , # Combine w/ Pacific Islanders
  "Pacific Islander"          , "Native Hawaiian or Other Pacific Islander" , # Combine w/ Hawaiian
  "Black or African American" , "Black or African American"                 ,
  "White"                     , "White"                                     ,
  "More than One Race"        , "More than One Race"                        ,
  "Unknown or Not Reported"   , "Unknown or Not Reported"
)

ds_3 <- tibble::tribble(
  ~subject_id,   ~gender  , ~race            ,  ~ethnicity                 ,
           1L,   "Female" , "American Indian",  "Not Hispanic or Latino"   ,
           2L,   "Male"   , "American Indian",  "Not Hispanic or Latino"   ,
           3L,   "Male"   , "American Indian",  "Not Hispanic or Latino"   ,
           4L,   "Female" , "Alaska Native"  ,  "Not Hispanic or Latino"   ,
           5L,   "Male"   , "Alaska Native"  ,  "Not Hispanic or Latino"   ,
           6L,   "Male"   , "Alaska Native"  ,  "Not Hispanic or Latino"   ,
           7L,   "Male"   , "White"          ,  "Not Hispanic or Latino"   ,
           8L,   "Male"   , "White"          ,  "Not Hispanic or Latino"
)

table_nih_enrollment_pretty(
  d               = ds_3,
  d_lu_race       = ds_lu_race_3
)
```

Feedback Welcome
-----------------------------------------------------

Please share your opinions with us by creating an issue at https://github.com/OuhscBbmc/codified/issues.

### Upcoming Features

1. Create tables in a fresh docx file with [flextable](https://davidgohel.github.io/flextable/). [Issue #5](https://github.com/OuhscBbmc/codified/issues/5)
1. Create tables in a fresh pdf file [Issue #5](https://github.com/OuhscBbmc/codified/issues/5).
1. Populate tables in an existing pdf file with [staplr](https://CRAN.R-project.org/package=staplr)?
1. The 'totals' column & row are calculated.  The NIH form does this already, so this is mostly for the purpose of checking. [Issue #4](https://github.com/OuhscBbmc/codified/issues/4).
1. Allow variable names to change (*e.g.*, `Gender` instead of `gender`).  [Issue #3](https://github.com/OuhscBbmc/codified/issues/3).
1. [pkgdown site](https://pkgdown.r-lib.org/), like [REDCapR](https://ouhscbbmc.github.io/REDCapR/).

### Group discussion

* Any holes in the functionality or concept?
* What else (*e.g.*, features or explanation) do you think would be helpful?
* Are people able to run this package?  Is the vignette clear?
* What other standardized tables/graphs/whatever could this be applied to?
  * IRB continuation tables?
  * Graphs?
