---
title: "Overview of the wcde package"
author: Guy J. Abel, Samir K.C., Michaela Potancokova, Claudia Reiter, Andrea Tamburini and Dilek Yildiz
output:
  html_document:
    fig_caption: false
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Overview of wcde}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

<!-- <img src='https://raw.githubusercontent.com/guyabel/wcde/main/hex/logo_transp.png' align="right" height="200" style="float:right; height:200px;"/> -->

The `wcde` package allows for R users to easily download data from the [Wittgenstein Centre for Demography and Human Capital Data Explorer](https://dataexplorer.wittgensteincentre.org/) as well as containing a number of helpful functions for working with education specific demographic data.

# Installation

You can install the released version of `wcde` from [CRAN](https://CRAN.R-project.org) with:

```{r eval=FALSE}
install.packages("wcde")
```

Install the developmental version with:

```{r eval=FALSE}
library(devtools)
install_github("guyabel/wcde", ref = "main")
```


# Getting data into R

The `get_wcde()` function can be used to download data from the Wittgenstein Centre Human Capital Data Explorer. It requires three user inputs

- `indicator`: a short code for the indicator of interest
- `scenario`: a number referring to a SSP narrative, by default 2 is used (for SSP2)
- `country_code` (or `country_name`): corresponding to the country of interest

```{r, messages = FALSE, message=FALSE}
library(wcde)
# download education specific tfr data
get_wcde(indicator = "etfr",
         country_name = c("Brazil", "Albania"))

# download education specific survivorship rates
get_wcde(indicator = "eassr",
         country_name = c("Niger", "Korea"))
```

## Indicator codes

The indicator input must match the short code from the indicator table. The `find_indicator()` function can be used to look up short codes (given in the first column) from the `wic_indicators` data frame:

```{r}
find_indicator(x = "tfr")
```

## Temporal coverage

By default, `get_wdce()` returns data for all years or available periods or years. The `filter()` function in [dplyr](https://cran.r-project.org/package=dplyr/index.html) can be used to filter data for specific years or periods, for example:

```{r, message=FALSE, warning=FALSE}
library(tidyverse)
get_wcde(indicator = "e0",
         country_name = c("Japan", "Australia")) %>%
  filter(period == "2015-2020")

get_wcde(indicator = "sexratio",
         country_name = c("China", "South Korea")) %>%
  filter(year == 2020)
```

Past data is only available for selected indicators. These can be viewed using the version column:

```{r}
wic_indicators %>%
  filter(`wcde-v3` == "past-available") %>%
  select(1:2)
```

The `filter()` function can also be used to filter specific indicators to specific age, sex or education groups

```{r, messages = FALSE, message=FALSE}
get_wcde(indicator = "sexratio",
         country_name = c("China", "South Korea")) %>%
  filter(year == 2020,
         age == "All")
```


## Country names and codes

Country names are guessed using the [countrycode](https://cran.r-project.org/package=countrycode/index.html) package.

```{r, messages = FALSE, message=FALSE}
get_wcde(indicator = "tfr",
         country_name = c("U.A.E", "Espania", "Österreich"))
```

The `get_wcde()` functions accepts ISO alpha numeric codes for countries via the `country_code` argument:

```{r, messages = FALSE, message=FALSE}
get_wcde(indicator = "etfr", country_code = c(44, 100))
```

A full list of available countries and region aggregates, and their codes, can be found in the `wic_locations` data frame.

```{r}
wic_locations
```


## Scenarios

By default `get_wcde()` returns data for Medium (SSP2) scenario. Results for different SSP scenarios can be returned by passing a different (or multiple) scenario values to the `scenario` argument in `get_data()`.

```{r, messages = FALSE, message=FALSE}
get_wcde(indicator = "growth",
         country_name = c("India", "China"),
         scenario = c(1:3, 22, 23)) %>%
  filter(period == "2095-2100")
```

Set `include_scenario_names = TRUE` to include a columns with the full names of the scenarios

```{r, messages = FALSE, message=FALSE}
get_wcde(indicator = "tfr",
         country_name = c("Kenya", "Nigeria", "Algeria"),
         scenario = 1:3,
         include_scenario_names = TRUE) %>%
  filter(period == "2045-2050")
```

Additional details of the pathways for each scenario numeric code can be found in the `wic_scenarios` object. Further background and links to the corresponding literature are provided in the [Data Explorer](https://dataexplorer.wittgensteincentre.org/)

```{r}
wic_scenarios
```



## All countries data

Data for all countries can be obtained by not setting `country_name` or `country_code`

```{r, messages = FALSE, message=FALSE}
get_wcde(indicator = "mage")
```

## Multiple indicators

The `get_wdce()` function needs to be called multiple times to download multiple indicators. This can be done using the `map()` function in [`purrr`](https://cran.r-project.org/package=purrr/index.html)

```{r, messages = FALSE, message=FALSE}
mi <- tibble(ind = c("odr", "nirate", "ggapedu25")) %>%
  mutate(d = map(.x = ind, .f = ~get_wcde(indicator = .x)))
mi

mi %>%
  filter(ind == "odr") %>%
  select(-ind) %>%
  unnest(cols = d)

mi %>%
  filter(ind == "nirate") %>%
  select(-ind) %>%
  unnest(cols = d)

mi %>%
  filter(ind == "ggapedu25") %>%
  select(-ind) %>%
  unnest(cols = d)
```

## Previous versions

Previous versions of projections from the Wittgenstein Centre for Demography are available using the `version` argument in `get_wdce()`. Set `version` to [`"wcde-v1"`](https://dataexplorer.wittgensteincentre.org/wcde-v1/) or [`"wcde-v2"`](https://dataexplorer.wittgensteincentre.org/wcde-v2/) or  [`"wcde-v3"`](https://dataexplorer.wittgensteincentre.org) (the default since 2024). 
```{r}
get_wcde(indicator = "etfr",
         country_name = c("Brazil", "Albania"),
         version = "wcde-v2")
```
Note, not all indicators and scenarios are available in all versions - see the the `wic_indicators` and `wic_scenarios` objects for further details or see above. 

## Server

If you have trouble with connecting to the IIASA server you can try alternative hosts using the `server` option in `get_wcde()`, which can be set to `"iiasa"` (default) `"github"` and `"1&1"`. 

```{r}
get_wcde(indicator = "etfr",
         country_name = c("Brazil", "Albania"), 
         version = "wcde-v2", server = "github")
```

You may also set `server = "search-available"` to search through the three possible data location to download the data wherever it is available.

# Working with population data

Population data for a range of age-sex-educational attainment combinations can be obtained by setting `indicator = "pop"` in `get_wcde()` and specifying a `pop_age`, `pop_sex` and `pop_edu` arguments. By default each of the three population breakdown arguments are set to "total"

```{r}
get_wcde(indicator = "pop", country_name = "India")
```
The `pop_age` argument can be set to `all` to get population data broken down in five-year age groups. The `pop_sex` argument can be set to `both` to get population data broken down into female and male groups. The `pop_edu` argument can be set to `four`, `six` or `eight` to get population data broken down into education categorizations with different levels of detail.

```{r}
get_wcde(indicator = "pop", country_code = 900, pop_edu = "four")
```

The population breakdown arguments can be used in combination to provide further breakdowns, for example sex and education specific population totals

```{r}
get_wcde(indicator = "pop", country_code = 900, pop_edu = "six", pop_sex = "both")
```

The full age-sex-education specific data can also be obtained by setting `indicator = "epop"` in `get_wcde()`.
<!-- The education population data in the data explorer, obtained by setting `indicator = "epop"` in `get_wcde()`, provide results by up to three different education categorizations (4, 6 and 8 education groups).  -->

<!-- ```{r, messages = FALSE, message=FALSE} -->
<!-- d <- get_wcde(indicator = "epop", country_code = 900) -->
<!-- d -->
<!-- ``` -->

<!-- As the data frame contains multiple groupings, the `edu_group_sum()` function can be used to provide education specific population data. Users can specify the education groupings by setting the `n` argument to 4, 6 or 8. -->

<!-- ```{r, warning=FALSE, message=FALSE} -->
<!-- d %>%  -->
<!--   edu_group_sum(n = 4) %>% -->
<!--   filter(year == 2020) -->

<!-- d %>%  -->
<!--   edu_group_sum(n = 6) %>% -->
<!--   filter(year == 2020, -->
<!--          age == "30--34") -->
<!-- ``` -->

# Population pyramids

Create population pyramids by setting male population values to negative equivalent to allow for divergent columns from the y axis.

```{r}
w <- get_wcde(indicator = "pop", country_code = 900,
              pop_age = "all", pop_sex = "both", pop_edu = "four",
              version = "wcde-v3")
w

w <- w %>%
  mutate(pop_pm = ifelse(test = sex == "Male", yes = -pop, no = pop),
         pop_pm = pop_pm/1e3)
w
```

## Standard plot

Use standard ggplot code to create population pyramid with

-   `scale_x_symmetric()` from the [`lemon`](https://cran.r-project.org/package=lemon/index.html) package to allow for equal male and female x-axis
-   fill colours set to the `wic_col4` object in the wcde package which contains the names of the colours used in the Wittgenstein Centre Human Capital Data Explorer Data Explorer.

Note `wic_col6` and `wic_col8` objects also exist for equivalent plots of population data objects with corresponding numbers of categories of education.

```{r, message=FALSE, warning=FALSE}
library(lemon)

w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_symmetric(labels = abs) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  theme_bw()

```

## Sex label position

Add male and female labels on the x-axis by

- Creating a facet plot with the strips on the bottom with transparent backgrounds and no space between.
- Set the x axis to have zero expansion beyond the values in the data allowing the two sides of the pyramids to meet.
- Add a `geom_blank()` to allow for equal x-axis and additional space at the end of largest columns.

```{r}
w <- w %>%
  mutate(pop_max = ifelse(sex == "Male", -max(pop/1e3), max(pop/1e3)))

w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin( b = 0, t = 0)))
```

## Animate

Animate the pyramid through the past data and projection periods using the `transition_time()` function in the
[`gganimate`](https://cran.r-project.org/package=gganimate/index.html) package

```{r, echo=FALSE, eval=FALSE}
library(gganimate)

g <- ggplot(data = w,
       mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
  transition_time(time = year) +
  labs(x = "Population (millions)", y = "Age",
       title = 'SSP2 World Population {round(frame_time)}')

animate(g, width = 672, height = 520, units = "px", res = 100,
        renderer = gifski_renderer())

anim_save(filename = "../man/figures/world4_ssp2.gif")
```





```{r, eval =FALSE}
library(gganimate)

ggplot(data = w,
       mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
  transition_time(time = year) +
  labs(x = "Population (millions)", y = "Age",
       title = 'SSP2 World Population {round(frame_time)}')
```

<img src='../man/figures/world4_ssp2.gif'/>
