---
title: "Datasets for categorical data analysis"
author: "Michael Friendly"
date: "`r Sys.Date()`"
package: vcdExtra
output: 
  rmarkdown::html_vignette:
    fig_caption: yes
bibliography: ["vcd.bib", "vcdExtra.bib"]
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Datasets for categorical data analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.height = 6,
  fig.width = 7,
#  fig.path = "fig/datasets-",
  dev = "png",
  comment = "##"
)

# save some typing
knitr::set_alias(w = "fig.width",
                 h = "fig.height",
                 cap = "fig.cap")
```

The `vcdExtra` package contains `r nrow(vcdExtra::datasets("vcdExtra"))` datasets, taken
from the literature on categorical data analysis, and selected to illustrate various
methods of analysis and data display. These are in addition to the 
`r nrow(vcdExtra::datasets("vcd"))` datasets in the [vcd package](https://cran.r-project.org/package=vcd).

To make it easier to find datasets that illustrate a particular method, those in `vcdExtra` have been classified
using method tags. This vignette creates an "inverse table", listing the datasets that apply to each method.
It also illustrates a general approach to classifying the datasets in an R package.

```{r load}
library(dplyr)
library(tidyr)
library(readxl)
```

## Processing tags

Using the result of `vcdExtra::datasets(package="vcdExtra")`, I created a spreadsheet, `vcdExtra-datasets.xlsx`,
and then added method tags to it, separating multiple tags for a dataset with `;`.

```{r read-datasets}
dsets_tagged <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"), 
                           sheet="vcdExtra-datasets")

dsets_tagged <- dsets_tagged |>
  dplyr::select(-Title, -dim) |>
  dplyr::rename(dataset = Item)

head(dsets_tagged)
```
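
The starting point for such a spreadsheet can be sketched as follows. This is only an illustration under
stated assumptions: `listing` is a made-up stand-in for the result of `vcdExtra::datasets("vcdExtra")`
(only the `Item`, `dim` and `Title` columns used above are mimicked), the file is written as CSV to a
temporary directory rather than as the `.xlsx` file used here, and the file name `datasets-to-tag.csv` is hypothetical.

```{r sketch-export}
# Stand-in for vcdExtra::datasets("vcdExtra"); the rows are invented for illustration
listing <- data.frame(
  Item  = c("DataA", "DataB"),
  dim   = c("2x2x2", "100x4"),
  Title = c("First example dataset", "Second example dataset")
)

# Add an empty column to be filled in by hand, with tags separated by ";"
listing$tags <- ""

# Write to a temporary file (hypothetical name) for editing in a spreadsheet program
out_file <- file.path(tempdir(), "datasets-to-tag.csv")
write.csv(listing, out_file, row.names = FALSE)

# Read it back to confirm the layout
names(read.csv(out_file))
```

Keeping the tags in a single delimited column means one row per dataset, which is easy to edit by hand.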


To invert the table, we need to split the tags into separate observations, then collapse the rows for the same tag.

```{r split-tags}
dset_split <- dsets_tagged |>
  tidyr::separate_longer_delim(tags, delim = ";") |>
  dplyr::mutate(tag = stringr::str_trim(tags)) |>
  dplyr::select(-tags)

# collapse the rows for the same tag
tag_dset <- dset_split |>
  dplyr::arrange(tag) |>
  dplyr::group_by(tag) |>
  dplyr::summarise(datasets = paste(dataset, collapse = "; ")) |>
  dplyr::ungroup()

# get a list of the unique tags
unique(tag_dset$tag)
```
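
To see what the inversion does, here is a minimal sketch of the same two steps on a tiny table.
The dataset names (`DataA`, `DataB`, `DataC`) and their tags are made up for illustration and do not
come from the spreadsheet.

```{r sketch-invert}
# Made-up example: three hypothetical datasets with ";"-separated tags
toy <- data.frame(
  dataset = c("DataA", "DataB", "DataC"),
  tags    = c("loglinear; glm", "glm", "agree;loglinear")
)

# One row per (dataset, tag) pair, with stray whitespace removed
toy_long <- toy |>
  tidyr::separate_longer_delim(tags, delim = ";") |>
  dplyr::mutate(tag = stringr::str_trim(tags)) |>
  dplyr::select(-tags)

# One row per tag, listing the datasets that carry it
toy_inverse <- toy_long |>
  dplyr::arrange(tag) |>
  dplyr::group_by(tag) |>
  dplyr::summarise(datasets = paste(dataset, collapse = "; "))

toy_inverse
```

Trimming after the split matters: without `str_trim()`, `" glm"` and `"glm"` would be treated as different tags.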

## Make this into a nice table

Another sheet in the spreadsheet gives a more descriptive `topic` corresponding to each tag.

```{r read-tags}
tags <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"), 
                   sheet="tags")
head(tags)
```

Now, join this with the `tag_dset` created above.

```{r join-tags}
tag_dset <- tag_dset |>
  dplyr::left_join(tags, by = "tag") |>
  dplyr::relocate(topic, .after = tag)

tag_dset |>
  dplyr::select(-tag) |>
  head()
```
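
A left join keeps every tag, so a tag that is misspelled in one of the sheets ends up with a missing
`topic` rather than raising an error. A minimal check can be sketched with `dplyr::anti_join()`.
The two small tables below are made-up stand-ins, and the names `toy_tag_dset` and `toy_topics` are hypothetical.

```{r sketch-check-tags}
# Made-up stand-ins for the two tables; "glmm" has no entry in toy_topics
toy_tag_dset <- data.frame(
  tag      = c("agree", "glm", "glmm"),
  datasets = c("DataC", "DataA; DataB", "DataD")
)
toy_topics <- data.frame(
  tag   = c("agree", "glm"),
  topic = c("observer agreement", "generalized linear models")
)

# Tags that are used for datasets but have no descriptive topic
missing_topics <- dplyr::anti_join(toy_tag_dset, toy_topics, by = "tag")
missing_topics

# After a left join, the same tags show up with topic = NA
toy_joined <- dplyr::left_join(toy_tag_dset, toy_topics, by = "tag")
toy_joined[is.na(toy_joined$topic), ]
```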

### Add links to `help()`

We're almost there. It would be nice if the dataset names could be linked to their documentation.
The `add_links()` function below is designed to work with the `pkgdown` site. There are different ways this can be
done, but what seems to work is a relative link to `../reference/{dataset}.html`.
Unfortunately, links of this form won't work in the actual vignette.

```{r add-links}
add_links <- function(dsets, 
                      style = c("reference", "help", "rdrr.io"),
                      sep = "; ") {

  style <- match.arg(style)
  # dsets must be a single string; split it into individual dataset names
  names <- stringr::str_split_1(dsets, sep)

  # build one Markdown link per name, in the requested style
  names <- switch(style,
    help      = glue::glue("[{names}](help({names}))"),
    reference = glue::glue("[{names}](../reference/{names}.html)"),
    rdrr.io   = glue::glue("[{names}](https://rdrr.io/cran/vcdExtra/man/{names}.html)")
  )
  # rejoin the links into a single string
  glue::glue_collapse(names, sep = sep)
}
```
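
The link text itself is ordinary Markdown. As a small illustration, with the placeholder names `DataA` and
`DataB` standing in for real dataset names, the `"reference"` style produces strings of the following form.

```{r sketch-links}
# Placeholder names, not actual datasets
nms <- c("DataA", "DataB")

# glue() is vectorized, so each name gets its own Markdown link
links <- glue::glue("[{nms}](../reference/{nms}.html)")
links

# Collapse back into a single "; "-separated string, as add_links() does
collapsed <- glue::glue_collapse(links, sep = "; ")
collapsed
```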

## Make the table {#table}

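Use `purrr::map()` to apply `add_links()` to the string of datasets for each tag.
(`mutate(datasets = add_links(datasets))` by itself doesn't work, because `add_links()` splits its input with
`stringr::str_split_1()`, which accepts only a single string rather than a whole column.)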

```{r kable}
tag_dset |>
  dplyr::select(-tag) |>
  dplyr::mutate(datasets = purrr::map(datasets, add_links)) |>
  knitr::kable()
```

Voila!
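
If a plain character column is preferred over the list column that `purrr::map()` returns, one option is a
helper that is itself vectorized over its input. The sketch below is an alternative under stated assumptions,
not part of `vcdExtra`: `link_all()` is a hypothetical name, it uses only base R, it hard-codes the
`../reference/` link style, and the dataset names are placeholders.

```{r sketch-vectorized}
# Hypothetical helper: turn each "; "-separated string into Markdown links
link_all <- function(dsets, sep = "; ") {
  vapply(
    strsplit(dsets, sep, fixed = TRUE),   # one character vector of names per element
    function(nms) {
      paste0("[", nms, "](../reference/", nms, ".html)", collapse = sep)
    },
    character(1)
  )
}

# Works on a whole character vector at once, so it could be used directly in mutate()
linked <- link_all(c("DataA; DataB", "DataC"))
linked
```

Because `link_all()` returns one string per input element, no `map()` step would be needed around it.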

