---
title: "Overview of the wcde package"
author: Guy J. Abel, Samir K.C., Michaela Potancokova, Claudia Reiter, Andrea Tamburini and Dilek Yildiz
output:
  html_document:
    fig_caption: false
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Overview of wcde}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The `wcde` package allows R users to easily download data from the [Wittgenstein Centre for Demography and Human Capital Data Explorer](https://dataexplorer.wittgensteincentre.org/). It also contains a number of helpful functions for working with education-specific demographic data.

# Installation

You can install the released version of `wcde` from [CRAN](https://CRAN.R-project.org) with:

```{r eval=FALSE}
install.packages("wcde")
```

Install the development version with:

```{r eval=FALSE}
library(devtools)
install_github("guyabel/wcde", ref = "main")
```

# Getting data into R

The `get_wcde()` function can be used to download data from the Wittgenstein Centre Human Capital Data Explorer. It takes three main inputs:

- `indicator`: a short code for the indicator of interest
- `scenario`: a number referring to an SSP narrative; by default 2 is used (for SSP2)
- `country_code` (or `country_name`): corresponding to the country of interest

```{r, message = FALSE}
library(wcde)
# download education-specific TFR data
get_wcde(indicator = "etfr", country_name = c("Brazil", "Albania"))
# download education-specific survivorship rates
get_wcde(indicator = "eassr", country_name = c("Niger", "Korea"))
```

## Indicator codes

The indicator input must match the short code from the indicator table.
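To make the exact-match requirement concrete, the chunk below is a minimal sketch that checks candidate codes against a small, hand-made excerpt of an indicator table. The `toy_indicators` data frame and the `is_valid_indicator()` helper are hypothetical illustrations, not part of wcde; the package's own table is `wic_indicators`, whose columns and descriptions may differ.

```{r}
# Hypothetical excerpt of an indicator lookup table (not the real
# wic_indicators object; its column names and descriptions are assumptions).
toy_indicators <- data.frame(
  indicator = c("tfr", "etfr", "e0", "sexratio"),
  description = c("Total fertility rate",
                  "Education-specific total fertility rate",
                  "Life expectancy at birth",
                  "Sex ratio"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only when a code is an exact short code in the
# table, so descriptive words such as "fertility" are rejected.
is_valid_indicator <- function(code, table = toy_indicators) {
  code %in% table$indicator
}

is_valid_indicator(c("etfr", "fertility", "e0"))
```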
The `find_indicator()` function can be used to look up short codes (given in the first column) from the `wic_indicators` data frame:

```{r}
find_indicator(x = "tfr")
```

## Temporal coverage

By default, `get_wcde()` returns data for all available years or periods. The `filter()` function in [dplyr](https://cran.r-project.org/package=dplyr/index.html) can be used to filter data for specific years or periods, for example:

```{r, message=FALSE, warning=FALSE}
library(tidyverse)
get_wcde(indicator = "e0", country_name = c("Japan", "Australia")) %>%
  filter(period == "2015-2020")

get_wcde(indicator = "sexratio", country_name = c("China", "South Korea")) %>%
  filter(year == 2020)
```

Past data is only available for selected indicators. These can be viewed using the `wcde-v3` column of `wic_indicators`:

```{r}
wic_indicators %>%
  filter(`wcde-v3` == "past-available") %>%
  select(1:2)
```

The `filter()` function can also be used to restrict indicators to specific age, sex or education groups:

```{r, message = FALSE}
get_wcde(indicator = "sexratio", country_name = c("China", "South Korea")) %>%
  filter(year == 2020, age == "All")
```

## Country names and codes

Country names are guessed using the [countrycode](https://cran.r-project.org/package=countrycode/index.html) package.

```{r, message = FALSE}
get_wcde(indicator = "tfr", country_name = c("U.A.E", "Espania", "Österreich"))
```

The `get_wcde()` function also accepts ISO 3166-1 numeric country codes via the `country_code` argument:

```{r, message = FALSE}
get_wcde(indicator = "etfr", country_code = c(44, 100))
```

A full list of available countries and region aggregates, and their codes, can be found in the `wic_locations` data frame.

```{r}
wic_locations
```

## Scenarios

By default, `get_wcde()` returns data for the Medium (SSP2) scenario. Results for other SSP scenarios can be returned by passing a different scenario value (or several values) to the `scenario` argument in `get_wcde()`.
```{r, message = FALSE}
get_wcde(indicator = "growth", country_name = c("India", "China"),
         scenario = c(1:3, 22, 23)) %>%
  filter(period == "2095-2100")
```

Set `include_scenario_names = TRUE` to include a column with the full names of the scenarios:

```{r, message = FALSE}
get_wcde(indicator = "tfr", country_name = c("Kenya", "Nigeria", "Algeria"),
         scenario = 1:3, include_scenario_names = TRUE) %>%
  filter(period == "2045-2050")
```

Additional details of the pathways for each numeric scenario code can be found in the `wic_scenarios` object. Further background and links to the corresponding literature are provided in the [Data Explorer](https://dataexplorer.wittgensteincentre.org/).

```{r}
wic_scenarios
```

## All countries data

Data for all countries can be obtained by not setting `country_name` or `country_code`:

```{r, message = FALSE}
get_wcde(indicator = "mage")
```

## Multiple indicators

The `get_wcde()` function downloads one indicator per call, so it needs to be called multiple times to download multiple indicators. This can be done using the `map()` function in [`purrr`](https://cran.r-project.org/package=purrr/index.html):

```{r, message = FALSE}
mi <- tibble(ind = c("odr", "nirate", "ggapedu25")) %>%
  mutate(d = map(.x = ind, .f = ~get_wcde(indicator = .x)))
mi

mi %>%
  filter(ind == "odr") %>%
  select(-ind) %>%
  unnest(cols = d)

mi %>%
  filter(ind == "nirate") %>%
  select(-ind) %>%
  unnest(cols = d)

mi %>%
  filter(ind == "ggapedu25") %>%
  select(-ind) %>%
  unnest(cols = d)
```

## Previous versions

Previous versions of projections from the Wittgenstein Centre for Demography are available using the `version` argument in `get_wcde()`. Set `version` to [`"wcde-v1"`](https://dataexplorer.wittgensteincentre.org/wcde-v1/), [`"wcde-v2"`](https://dataexplorer.wittgensteincentre.org/wcde-v2/) or [`"wcde-v3"`](https://dataexplorer.wittgensteincentre.org) (the default since 2024).
```{r}
get_wcde(indicator = "etfr", country_name = c("Brazil", "Albania"),
         version = "wcde-v2")
```

Note that not all indicators and scenarios are available in all versions; see the `wic_indicators` and `wic_scenarios` objects for further details.

## Server

If you have trouble connecting to the IIASA server, you can try alternative hosts using the `server` option in `get_wcde()`, which can be set to `"iiasa"` (default), `"github"` or `"1&1"`.

```{r}
get_wcde(indicator = "etfr", country_name = c("Brazil", "Albania"),
         version = "wcde-v2", server = "github")
```

You may also set `server = "search-available"` to search through the three possible data locations and download the data from wherever it is available.

# Working with population data

Population data for a range of age-sex-educational attainment combinations can be obtained by setting `indicator = "pop"` in `get_wcde()` and specifying the `pop_age`, `pop_sex` and `pop_edu` arguments. By default, each of the three population breakdown arguments is set to `"total"`.

```{r}
get_wcde(indicator = "pop", country_name = "India")
```

The `pop_age` argument can be set to `"all"` to get population data broken down into five-year age groups. The `pop_sex` argument can be set to `"both"` to get population data broken down into female and male groups. The `pop_edu` argument can be set to `"four"`, `"six"` or `"eight"` to get population data broken down into education categorizations with different levels of detail.

```{r}
get_wcde(indicator = "pop", country_code = 900, pop_edu = "four")
```

The population breakdown arguments can be used in combination to provide further breakdowns, for example sex- and education-specific population totals:

```{r}
get_wcde(indicator = "pop", country_code = 900, pop_edu = "six", pop_sex = "both")
```

The full age-sex-education specific data can also be obtained by setting `indicator = "epop"` in `get_wcde()`.
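Because the breakdowns are nested, a more detailed table should add up to a coarser one. The sketch below illustrates this with base R's `aggregate()` on a toy data frame with made-up values; its column names (`age`, `sex`, `education`, `pop`) and category labels are assumptions modelled loosely on the population output, not the exact structure returned by `get_wcde()`.

```{r}
# Toy age-sex-education population table with made-up values (thousands).
# Column names and labels are illustrative assumptions.
toy_epop <- expand.grid(
  age = c("15-19", "20-24"),
  sex = c("Male", "Female"),
  education = c("No Education", "Primary", "Secondary", "Post Secondary"),
  stringsAsFactors = FALSE
)
toy_epop$pop <- seq_len(nrow(toy_epop)) * 10

# Collapse age and education to obtain sex-specific totals; conceptually
# similar to a query with pop_sex = "both" and the other breakdowns at "total".
toy_sex_totals <- aggregate(pop ~ sex, data = toy_epop, FUN = sum)
toy_sex_totals

# Collapsing every breakdown gives a single overall total.
sum(toy_epop$pop)
```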
# Population pyramids

Create population pyramids by setting male population values to their negative equivalents, so that the male and female columns diverge from the y-axis.

```{r}
w <- get_wcde(indicator = "pop", country_code = 900,
              pop_age = "all", pop_sex = "both", pop_edu = "four",
              version = "wcde-v3")
w

w <- w %>%
  mutate(pop_pm = ifelse(test = sex == "Male", yes = -pop, no = pop),
         pop_pm = pop_pm/1e3)
w
```

## Standard plot

Use standard ggplot code to create a population pyramid with

- `scale_x_symmetric()` from the [`lemon`](https://cran.r-project.org/package=lemon/index.html) package to give the male and female sides of the x-axis equal extents
- fill colours set to the `wic_col4` object in the wcde package, which contains the names of the colours used in the Wittgenstein Centre Human Capital Data Explorer. Note that `wic_col6` and `wic_col8` objects also exist for equivalent plots of population data with the corresponding numbers of education categories.

```{r, message=FALSE, warning=FALSE}
library(lemon)
w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_symmetric(labels = abs) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  theme_bw()
```

## Sex label position

Add male and female labels on the x-axis by

- Creating a facet plot with the strips on the bottom, with transparent backgrounds and no space between panels.
- Setting the x-axis to have zero expansion beyond the values in the data, allowing the two sides of the pyramid to meet.
- Adding a `geom_blank()` to allow for an equal x-axis and additional space at the end of the largest columns.
```{r}
w <- w %>%
  mutate(pop_max = ifelse(sex == "Male", -max(pop/1e3), max(pop/1e3)))

w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin(b = 0, t = 0)))
```

## Animate

Animate the pyramid through the past data and projection periods using the `transition_time()` function in the [`gganimate`](https://cran.r-project.org/package=gganimate/index.html) package:

```{r, echo=FALSE, eval=FALSE}
library(gganimate)
g <- ggplot(data = w,
            mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
  transition_time(time = year) +
  labs(x = "Population (millions)", y = "Age",
       title = 'SSP2 World Population {round(frame_time)}')

animate(g, width = 672, height = 520, units = "px", res = 100,
        renderer = gifski_renderer())
anim_save(filename = "../man/figures/world4_ssp2.gif")
```

```{r, eval=FALSE}
library(gganimate)
ggplot(data = w,
       mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
  transition_time(time = year) +
  labs(x = "Population (millions)", y = "Age",
       title = 'SSP2 World Population {round(frame_time)}')
```
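The static and animated pyramids above both rely on two small transformations: male values are negated (`pop_pm`), and a signed maximum (`pop_max`) padded by 10% is drawn with `geom_blank()` so that each facet extends equally far from zero. The base R sketch below reproduces that arithmetic on a made-up `toy_pyr` data frame (a hypothetical stand-in for `w`), so the resulting values can be checked without downloading any data.

```{r}
# Made-up population values (thousands) for two age groups; a stand-in for w.
toy_pyr <- data.frame(
  age = rep(c("0-4", "5-9"), times = 2),
  sex = rep(c("Male", "Female"), each = 2),
  pop = c(1200, 1100, 1150, 1050)
)

# Negate male values and convert thousands to millions, mirroring pop_pm.
toy_pyr$pop_pm <- ifelse(toy_pyr$sex == "Male", -toy_pyr$pop, toy_pyr$pop) / 1e3

# Largest value overall, signed by sex, mirroring pop_max.
largest <- max(toy_pyr$pop / 1e3)
toy_pyr$pop_max <- ifelse(toy_pyr$sex == "Male", -largest, largest)

# Extent covered by the invisible geom_blank() points (10% padding):
# the male facet reaches the negative end, the female facet the positive end.
range(toy_pyr$pop_max * 1.1)
```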