---
title: "A guide on how to use the package gglyph"
author: Valentin Velev (University of Konstanz)
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{A guide on how to use the package gglyph}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## The Package
`gglyph` is a package for creating directed network-style graphs for statistical and non-statistical data with custom edges. It builds on `ggplot2` and includes four functions:

* `geom_glyph()`: Create a network-based graph that illustrates pairwise relationships (statistical and non-statistical) using custom edges
* `process_data_statistical()`: Process statistical data (e.g., pairwise t-tests) for plotting
* `process_data_general()`: Process general / non-statistical data (any data with directional relationships) for plotting
* `generate_mock_data()`: Create mock data for experimenting with `geom_glyph()`

The pipeline is as follows:

1. Obtain a dataset with directed pairwise relationships. This can be done using the function `generate_mock_data()` or by using your own dataset (e.g., running pairwise t-tests on your data).
2. Process the dataset using either `process_data_statistical()` or `process_data_general()`.
3. Create a glyph plot with `geom_glyph()`.

The package also includes two datasets:

* **PISA 2022**: The Programme for International Student Assessment (PISA) is a global evaluation study conducted by the Organisation for Economic Co-operation and Development (OECD) that assesses the scholastic performance of 15-year-old students in reading, math, and science. It is held every three years and provides comparative data for countries to understand and improve their education systems.
* **SIPRI Military Expenditure Database**: The Stockholm International Peace Research Institute (SIPRI) Military Expenditure Database comprises panel data on the amount of financial resources dedicated by a state to raising and maintaining the state's armed forces. The database includes data in local currencies, constant (2022) and current US dollars, as a share of gross domestic product (GDP), and per capita.

In the following chapter, I will illustrate how the main function `geom_glyph()` works and how its arguments are related to common `ggplot2` arguments.

## The Plotting Function
### Basics
To begin with, I have created a table showing how the arguments of `geom_glyph()` correspond to common `ggplot2` arguments. <br>

```{r setup_knitr, include=FALSE}
library(knitr)

# Set plot size and quality
knitr::opts_chunk$set(
  fig.height = 6,
  fig.width = 8
)

# Store the previous options and par settings so they can be restored at the end
default_opts <- options(digits = 3)
default_par <- par(mfrow = c(1,2))
```

```{r equivalence_table, echo=FALSE}
library(tibble)
library(kableExtra)

eq_table <- tribble(
  ~`geom_glyph Argument`, ~`ggplot2 Equivalent`, ~Explanation,
  #-------------------------------|---------------------------------|----------------------------------------------------------------------
  "`edge_colour`, `node_colour`",   "`color`",                       "Controls the outline color of the nodes/edges.",
  "`edge_fill`, `node_fill`",       "`fill`",                        "Controls the fill color of the nodes/edges.",
  "`edge_alpha`, `node_alpha`",     "`alpha`",                       "Controls the transparency of the nodes/edges.",
  "`edge_size`, `node_size`",       "`size`",                        "Controls the size of the nodes/edges.",
  "`node_spacing`",                 "N/A",                           "Controls the space between the nodes; not a standard `ggplot2` argument.",
  "`node_shape`",                   "`shape`",                       "Controls the shape of the nodes.",
  "`label_size`",                   "`fontsize` in `grid::gpar()`",  "Controls the font size of the node labels.",
  "`group_label_size`",             "`theme(strip.text)`",           "Controls the font size of the facet labels (group titles).",
  "`legend_title`",                 "`title` in `guides()`",         "Sets the main title text within the legend.",
  "`legend_subtitle`",              "`title` in `guides()`",         "Sets an additional subtitle."
)

kable(eq_table, "html", caption = "<span style='font-size: 0.9em;'>Table 1: Equivalence of geom_glyph and ggplot2 arguments</span>", booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 13)
```

### Some Examples
Now I will set up the vignette:

```{r setup, message=FALSE, warning=FALSE}
# Load packages
library(gglyph)
library(tidyverse)
library(readr)
library(haven)
library(purrr)
library(viridisLite)
library(kableExtra)
library(patchwork)
library(ggthemes)

# Remove scientific notation
options(scipen = 999, digits = 3)

# Set seed for reproducibility
set.seed(42)
```

Next, I will create mock data using the function `generate_mock_data()`. Its arguments are listed in Table 2:

```{r data_generation_func_table, echo=FALSE, results='asis'}
eq_table <- tribble(
  ~Argument,  ~Explanation,
  #---------|---------------------------------------------------------------------------------------
  "`n_nodes`",        "Number of nodes. Default is 5.",
  "`n_edges`",        "Number of edges. Default is 7.",
  "`n_groups`",       "Number of groups. Default is 1 (ungrouped).",
  "`statistical`",    "Boolean indicator for whether to generate statistical data. Default is FALSE.",
  "`p_threshold`",    "Statistical significance threshold. Default is 0.05."
)

cat('<div style="width: 100%;">')
kable(eq_table, "html", caption = "<span style='font-size: 0.9em;'>Table 2: Arguments in `generate_mock_data`</span>", booktabs = TRUE) %>%
  kable_styling(full_width = TRUE, font_size = 13)
cat('</div>')
```

This function is useful if you just want to play around with `geom_glyph()`. Here is how it can be used:

```{r mock_data, warning=FALSE, message=FALSE}
mock_data <- generate_mock_data(n_nodes = 5, n_edges = 10, statistical = TRUE)
mock_data_grouped <- generate_mock_data(n_nodes = 5, n_edges = 10, n_groups = 3, statistical = TRUE)
```

This is what data must look like so that it can be passed directly to `geom_glyph()` (more on this in the chapter on the data wrangling functions): <br>

```{r mock_data_table, echo=FALSE}
kable(mock_data, "html", caption = "<span style='font-size: 0.9em;'>Table 3: Ungrouped data for `geom_glyph`</span>", booktabs = TRUE) %>%
  kable_styling(full_width = TRUE, font_size = 12)

kable(mock_data_grouped, "html", caption = "<span style='font-size: 0.9em;'>Table 4: Grouped data for `geom_glyph`</span>", booktabs = TRUE) %>%
  kable_styling(full_width = TRUE, font_size = 10)
```

With this mock data, we can plot some basic glyphs:

```{r example_glyphs_base}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph()

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph() +
  facet_wrap(~ group)
```

Note that the function works well with up to 9 nodes:

```{r example_glyphs_diff_num_nodes}
plot_list <- list()

for (num_nodes in 3:9) {
  data <- generate_mock_data(n_nodes = num_nodes, n_edges = num_nodes * 5, statistical = TRUE)
  p <- ggplot(data = data) +
    geom_glyph(label_size = 9, node_size = 0.5)
  plot_list[[length(plot_list) + 1]] <- p
}

final_grid <- wrap_plots(plot_list, ncol = 2)
final_grid
```
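
If you plan to plot your own data, a quick check of the node count beforehand can save some trial and error. The sketch below uses base R only and assumes an edge list with columns named `from` and `to`, as in the datasets shipped with the package. The helper `count_nodes()` and the example data are hypothetical and not part of `gglyph`.

```r
# Hypothetical edge list with directed pairwise relationships
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "A"),
  stringsAsFactors = FALSE
)

# Count the distinct nodes that appear as either a start or an end point
count_nodes <- function(data, from = "from", to = "to") {
  length(unique(c(as.character(data[[from]]), as.character(data[[to]]))))
}

n_nodes <- count_nodes(edges)

# Nine is the upper bound mentioned above, not a limit enforced by this helper
if (n_nodes > 9) {
  warning("More than 9 nodes: the glyph may become hard to read.")
}
```

Passing the column names as arguments keeps the helper usable for data whose start and end columns are named differently.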

This style of plots was first used in [this paper](https://doi.org/10.1371/journal.pone.0245100), where the authors investigated the relationship between spokesperson and the likelihood of message resharing during the COVID-19 pandemic using pairwise statistical tests. In that paper, the plots were painstakingly created manually in Photoshop. Now we have a package for that ;).

### Some Prettier Examples... Well, depends on the eye of the beholder
These plots can also be improved aesthetically using the arguments in Table 1. To illustrate, I will use the mock data created earlier.

First, you can change the fill color of the nodes and edges.

Note that if an edge or a node outline colour is provided but not a fill colour, the outline colour is used for both. This also applies if a fill colour is provided but no outline colour.

Furthermore, if you use a colour function such as `viridis` and you do not manually set a `scale_*_manual()` (more on this below), you will always get the default legend (black nodes and grey edges).

```{r example_glyphs_fill}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(node_fill = "purple", edge_fill = "purple")

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph(node_fill = viridis, edge_fill = viridis) +
  facet_wrap(~ group)
```
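
In the grouped example above, `viridis` is passed unquoted and without parentheses, that is, as a function rather than as a vector of colours. The sketch below shows what such a colour function looks like, using base R only. The custom palette `my_palette()` is hypothetical, and whether `geom_glyph()` accepts every function of this form is an assumption on my part.

```r
# A colour function takes a number n and returns n colours
base_cols <- rainbow(3)

# A hypothetical custom palette with the same calling convention
my_palette <- function(n) {
  grDevices::hcl.colors(n, palette = "Dark 3")
}

# Calling it with the number of groups yields one colour per group
cols <- my_palette(3)
```

Under that assumption, `my_palette` could be passed in the same way as `viridis`, e.g. `node_fill = my_palette`.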

Next, you can change the outline color of the nodes and edges:

```{r example_glyphs_outline}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    edge_colour = "black",
    edge_fill = "purple"
  )

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph(
    node_colour = "black",
    node_fill = viridis,
    edge_colour = "black",
    edge_fill = viridis
  ) +
  facet_wrap(~ group)
```

Further, you can change the size of both the nodes and the edges:

```{r example_glyphs_size}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75
  )

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75
  ) +
  facet_wrap(~ group)
```

Then, you can change the transparency of the nodes and the edges as well as the spacing between the nodes:

```{r example_glyphs_alpha}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5
  )

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5
  ) +
  facet_wrap(~ group)
```

The shape of the nodes can also be changed. Click [here](https://ggplot2.tidyverse.org/reference/scale_shape.html) for a list of all `ggplot2` shapes.

```{r example_glyphs_shape}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    node_shape = 24,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5
  )

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    node_shape = 24,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5
  ) +
  facet_wrap(~ group)
```

In addition, the size of the labels can be changed:

```{r example_glyphs_labels}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    node_shape = 24,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5,
    label_size = 14
  )

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    node_shape = 24,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5,
    label_size = 10,
    group_label_size = 15
  ) +
  facet_wrap(~ group)
```

Similarly, the legend title and subtitle can be changed:

```{r example_glyphs_legend}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    node_shape = 24,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5,
    label_size = 14,
    legend_title = "Legend Title",
    legend_subtitle = "Legend Subtitle"
  )

# Grouped
ggplot(data = mock_data_grouped) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    node_shape = 24,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5,
    label_size = 10,
    group_label_size = 15,
    legend_title = "Legend Title",
    legend_subtitle = "Legend Subtitle"
  ) +
  facet_wrap(~ group)
```

Finally, you can use the standard `ggplot2` functions with `+` to change certain aspects of the appearance.

Note that if you would like to use `ggplot2`'s `scale_*_manual()` for a faceted plot, you need to specify a grouping variable in the `mapping` argument in `ggplot()`. Further, `scale_colour_manual()` and `scale_fill_manual()` will apply to the edges and `scale_shape_manual()` to the nodes.

Furthermore, if your data has more than 6 groups and you manually specify a different shape for each group using `scale_shape_manual()`, the following warning will appear:

```
Warning message:
The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes difficult to
discriminate
ℹ you have requested 9 values. Consider specifying shapes manually if you need that many have them. 
```

This warning can safely be ignored.

```{r example_glyphs_additional, warning=FALSE}
# Non-grouped
ggplot(data = mock_data) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    node_shape = 24,
    edge_colour = "black",
    edge_fill = "purple",
    edge_size = 0.75,
    edge_alpha = 0.5,
    label_size = 14,
    legend_title = "Legend Title",
    legend_subtitle = "Legend Subtitle"
  ) +
  labs(title = "Very Creative Title") +
  theme(
    legend.box.margin = margin(l = 20, r = 20),
    strip.background = element_rect(fill = "white", color = "black", linewidth = 0.5)
  )

# Grouped
ggplot(data = mock_data_grouped, aes(colour = group, fill = group, shape = group)) +
  geom_glyph(
    node_colour = "black",
    node_fill = "purple",
    node_size = 0.5,
    node_alpha = 0.5,
    node_spacing = 0.5,
    edge_size = 0.75,
    edge_alpha = 0.5,
    label_size = 10,
    group_label_size = 15,
    legend_title = "Legend Title",
    legend_subtitle = "Legend Subtitle"
  ) +
  facet_wrap(~ group) +
  labs(title = "Very Creative Title") +
  scale_color_manual(values = c("Group 1" = "black", "Group 2" = "green", "Group 3" = "blue")) +
  scale_fill_manual(values = c("Group 1" = "red", "Group 2" = "black", "Group 3" = "yellow")) +
  scale_shape_manual(values = c("Group 1" = 22, "Group 2" = 23, "Group 3" = 24)) +
  theme(
    legend.box.margin = margin(l = 20, r = 20),
    strip.background = element_rect(fill = "white", color = "black", linewidth = 0.5)
  )
```

Please note again that if you manually set the colour, fill, or shape, you should *not* use the corresponding `geom_glyph()` argument.

In the following chapter, I will briefly go over the two functions for data wrangling and demonstrate how they, together with the two datasets, can be used to create glyphs.

## The Data Wrangling Functions
As mentioned above, `gglyph` includes two functions for data wrangling: `process_data_statistical()` and `process_data_general()`. In the table below, I have listed the different arguments for each function. <br>

```{r data_wrangling_func_table, echo=FALSE}
eq_table <- tribble(
  ~Argument,  ~Explanation,
  #---------|---------------------------------------------------------------------------------------
  "`data`",     "A data frame to be processed.",
  "`from`",     "Column name for the start nodes.",
  "`to`",       "Column name for the end nodes.",
  "`group`",    "Column name for the grouping variable.",
  "`sig`*",     "Column name for the significance level.",
  "`thresh`*",  "Significance threshold. Default is 0.05."
)

kable(eq_table, "html", caption = "<span style='font-size: 0.9em;'>Table 5: Arguments in `process_data_statistical` and `process_data_general`</span>", booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 13) %>%
  footnote(symbol = "Argument is only available in `process_data_statistical`.")
```

To illustrate how raw data is processed using `process_data_statistical` and `process_data_general`, I will use the two datasets in `gglyph` and show a "before and after".

First, I will load and wrangle the datasets included in the package (see the first chapter).

For the PISA 2022 dataset, I used the country variable (CNT), the variable indicating the highest level of education attained by either parent (HISCED), and an average score of the math comprehension items (PV*MATH) to conduct pairwise t-tests (with Bonferroni correction).
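
For readers who want to prepare similar input from their own data, here is a minimal sketch of such pairwise tests in base R. It is not the script used to build the packaged dataset. The scores are simulated, the object names are hypothetical, and pointing each edge from the group with the higher mean to the group with the lower mean is an assumption.

```r
# Simulated scores for three hypothetical education levels (not PISA data)
set.seed(1)
scores <- data.frame(
  level = rep(c("Low", "Medium", "High"), each = 50),
  score = c(rnorm(50, 450, 40), rnorm(50, 480, 40), rnorm(50, 520, 40))
)

# Pairwise t-tests with Bonferroni correction (stats package)
tests <- pairwise.t.test(scores$score, scores$level, p.adjust.method = "bonferroni")

# The result holds a lower-triangular matrix of adjusted p-values;
# reshape it into one row per tested pair and drop the empty cells
p_long <- as.data.frame(as.table(tests$p.value), stringsAsFactors = FALSE)
names(p_long) <- c("a", "b", "sig")
p_long <- p_long[!is.na(p_long$sig), ]

# Assumed direction: from the group with the higher mean to the lower one
means <- tapply(scores$score, scores$level, mean)
higher <- unname(means[p_long$a] > means[p_long$b])

raw_stat <- data.frame(
  from = ifelse(higher, p_long$a, p_long$b),
  to   = ifelse(higher, p_long$b, p_long$a),
  sig  = p_long$sig,
  stringsAsFactors = FALSE
)
```

The resulting data frame has the columns `from`, `to`, and `sig`, which are the column names passed to `process_data_statistical()` later in this chapter.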

For the SIPRI dataset, I will use the absolute amount of military expenditures in current US dollars to create higher-lower pairwise relationships.
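
A higher-lower relationship of this kind can be sketched in base R as follows. Again, this is not the script used for the packaged dataset. The figures are made up, the object names are hypothetical, and I assume that each edge points from the higher spender to the lower spender.

```r
# Hypothetical expenditure figures (not SIPRI data)
milex <- data.frame(
  country = c("A", "B", "C"),
  usd     = c(900, 300, 50),
  stringsAsFactors = FALSE
)

# All unordered pairs of countries, one pair per column
pairs <- combn(milex$country, 2)

# Look up the expenditure of both members of each pair
usd <- setNames(milex$usd, milex$country)
first_higher <- unname(usd[pairs[1, ]] > usd[pairs[2, ]])

# Point each edge from the higher spender to the lower spender
raw_general <- data.frame(
  from = ifelse(first_higher, pairs[1, ], pairs[2, ]),
  to   = ifelse(first_higher, pairs[2, ], pairs[1, ]),
  stringsAsFactors = FALSE
)
```

Ties are not treated specially in this sketch: if two countries spend exactly the same amount, the edge points from the second member of the pair to the first, so you may want to drop such pairs beforehand.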

For both, I will use the ready-made datasets included in the package. For more information on how they were created, click [here](https://github.com/valentinsvelev/gglyph/tree/main/data-raw).

```{r load_data_from_pkg}
data(pisa_2022)
data(sipri_milex_1995_2023)
```

This is what the two datasets that I will henceforth work with look like: <br>

```{r, echo=FALSE}
kable(pisa_2022 %>% head(), "html", caption = "<span style='font-size: 0.9em;'>Table 6: Raw statistical data (PISA)</span>", booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 12)

kable(sipri_milex_1995_2023 %>% head(), "html", caption = "<span style='font-size: 0.9em;'>Table 7: Raw non-statistical data (SIPRI MilEx)</span>", booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 12)
```

Next, I will process them using the functions `process_data_statistical()` and `process_data_general()`:

```{r}
# Process the PISA data (statistical data)
## Grouped data
processed_data_pisa_group <- process_data_statistical(
  data = pisa_2022,
  from = "from",
  to = "to",
  sig = "sig",
  group = "group",
  thresh = 0.05
)

## Non-grouped data
processed_data_pisa <- process_data_statistical(
  data = pisa_2022[pisa_2022$group == "Germany",],
  from = "from",
  to = "to",
  sig = "sig",
  thresh = 0.05
)

# Process the SIPRI MilEx data (non-statistical data)
## Grouped data
processed_data_sipri_group <- process_data_general(
  data = sipri_milex_1995_2023,
  from = "from",
  to = "to",
  group = "group"
)

## Non-grouped data
processed_data_sipri <- process_data_general(
  data = sipri_milex_1995_2023[sipri_milex_1995_2023$group == "2023",],
  from = "from",
  to = "to"
)
```

This is what the processed datasets look like:

(Note: I will only show the PISA dataset) <br>

```{r, echo=FALSE}
kable(processed_data_pisa %>% head(), "html", caption = "<span style='font-size: 0.9em;'>Table 8: Processed ungrouped statistical data</span>", booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10)

kable(processed_data_pisa_group %>% head(), "html", caption = "<span style='font-size: 0.9em;'>Table 9: Processed grouped statistical data</span>", booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10)
```

With this data the following plots can be created:

```{r glyphs_pisa_base}
ggplot(data = processed_data_pisa) +
  geom_glyph()

ggplot(data = processed_data_pisa_group) +
  geom_glyph() +
  facet_wrap(~ group)
```

And for the SIPRI dataset:

```{r glyphs_sipri_base}
ggplot(data = processed_data_sipri) +
  geom_glyph()

ggplot(data = processed_data_sipri_group) +
  geom_glyph() +
  facet_wrap(~ group)
```

After a bit of polishing, they can look like this:

```{r glyphs_pisa_polished}
ggplot(data = processed_data_pisa) +
  geom_glyph(
    node_size = 1.175,
    node_colour = "black",
    edge_colour = "orange"
  ) +
  labs(title = "PISA 2022 Parental Education")

ggplot(data = processed_data_pisa_group) +
  geom_glyph(
    node_size = 0.75,
    node_fill = rainbow,
    node_colour = "black",
    edge_fill = rainbow,
    label_size = 3.75,
    group_label_size = 6.75
  ) +
  facet_wrap(~ group) +
  labs(title = "PISA 2022 Parental Education")
```

And for the SIPRI dataset:

```{r glyphs_sipri_polished}
ggplot(data = processed_data_sipri) +
  geom_glyph(
    node_size = 1.175,
    node_colour = "black",
    node_fill = "purple",
    edge_fill = "blue"
  ) +
  labs(title = "SIPRI Military Expenditures")

ggplot(data = processed_data_sipri_group) +
  geom_glyph(
    node_fill = viridis,
    node_colour = "black",
    edge_fill = viridis
  ) +
  facet_wrap(~ group) +
  labs(title = "SIPRI Military Expenditures")
```

## Concluding Remarks
You can save the plot using `ggsave()` from `ggplot2`:

```{r ggsave, eval=FALSE}
ggsave(filename = "plot.pdf", plot = last_plot(), width = 8, height = 6, dpi = 300)
```

Finally, if you find any bugs or if you have any additional features that you would like me to add, please let me know at [valentin.velev@uni-konstanz.de](mailto:valentin.velev@uni-konstanz.de).

```{r reset_params, include=FALSE}
options(default_opts)
par(default_par)
```
