---
title: "A real world example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a-real-world-example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This is essentially the same code as in this blog post of mine:
https://brodrigues.co/posts/2018-11-14-luxairport.html, with minor updates to
reflect the current state of the `{tidyverse}` packages and to add logging with
`{chronicler}`.

Let's first load the required packages, and the `avia` dataset included in the
`{chronicler}` package:


```{r parsermd-chunk-1}
library(chronicler)
library(dplyr)
library(tidyr)
library(lubridate)

data("avia")
```

Next, I define the functions needed for the analysis. To make the log more
informative, I pass `dim()` as the `.g` argument of each function below. This
records how the dimensions of the data change along the pipeline:


```{r parsermd-chunk-2}
# Define required functions
# You can use `record_many()` to avoid having to write everything

r_select <- record(select, .g = dim)
r_pivot_longer <- record(pivot_longer, .g = dim)
r_filter <- record(filter, .g = dim)
r_separate <- record(separate, .g = dim)
r_group_by <- record(group_by, .g = dim)
r_summarise <- record(summarise, .g = dim)

```
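
To see what `.g` does in isolation, here is a minimal sketch on a toy vector
rather than on the `avia` data. The names `r_sqrt` and `toy` are made up for
this illustration, and `length()` stands in for `dim()` because the input is a
plain vector:

```{r sketch-record-g}
library(chronicler)

# Decorate sqrt() and evaluate length() on the result of each call
r_sqrt <- record(sqrt, .g = length)

toy <- r_sqrt(c(1, 4, 9))

# The computed value, without the log
unveil(toy, "value")

# One row per operation, with the output of `.g` in the `g` column
check_g(toy)
```

The same idea applies to the recorded `{dplyr}` and `{tidyr}` functions above,
where `dim()` returns the number of rows and columns after each step.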

We can now start by preparing the data:


```{r parsermd-chunk-3}
avia_clean <- avia %>%
  r_select(1, contains("20")) %>% # select the first column and every column whose name contains "20"
  bind_record(r_pivot_longer,
              -starts_with("freq"),
              names_to = "date",
              values_to = "passengers") %>%
  bind_record(r_separate,
              col = 1,
              into = c("freq", "unit", "tra_meas", "air_pr\\time"),
              sep = ",")
```

Let's take a look at the data:


```{r parsermd-chunk-4}
avia_clean
```

The `passengers` column is a character column, and it contains `":"` where
values are missing instead of `NA`. Let's convert this column to numbers:


```{r parsermd-chunk-5}
r_mutate <- record(mutate, .g = dim)

avia_clean2 <- avia_clean %>%
  bind_record(r_mutate,
              passengers = as.numeric(passengers))
```

Let's look at the data:



```{r parsermd-chunk-6}
avia_clean2
```

What happened? Let's read the log to find out!


```{r parsermd-chunk-7}
read_log(avia_clean2)
```

So what happened is that `as.numeric()` introduced `NA`s by coercion, which
raises a warning. This is what happens when a string that cannot be parsed as a
number gets converted: for example, `as.numeric(":")` returns `NA` with a
warning. Because `mutate()` was recorded with the default value of the `strict`
argument of `record()` (which is `2`), warnings get promoted to errors. This can
be quite useful to avoid problems with silent conversions. But in this case, we
want to ignore the warning: let's record `mutate()` with `strict = 1`, so that
only errors can stop the pipeline:


```{r parsermd-chunk-8}
r_mutate_lenient <- record(mutate, .g = dim, strict = 1)

avia_clean2 <- avia_clean %>%
  bind_record(r_mutate_lenient,
              passengers = as.numeric(passengers)
              )

```

As you can see, the warnings now get printed instead of being captured. We can
now take a look at the data and see that the `":"` characters were successfully
replaced by `NA`s:


```{r parsermd-chunk-9}
avia_clean2
```
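
The effect of `strict` can also be reproduced outside of the pipeline. The
following is a small sketch, assuming nothing more than a character vector with
one unparseable entry; `r_as_numeric`, `r_as_numeric_lenient`, `strict_res` and
`lenient_res` are names made up for this illustration:

```{r sketch-strict}
library(chronicler)

# Default (strict = 2): the coercion warning is captured and the step fails
r_as_numeric <- record(as.numeric)

# strict = 1: only errors are captured, so the warning is simply printed
r_as_numeric_lenient <- record(as.numeric, strict = 1)

strict_res <- r_as_numeric(c("1", ":"))
lenient_res <- r_as_numeric_lenient(c("1", ":"))

# The log of the strict version reports the failed step
read_log(strict_res)

# The lenient version returns the converted vector
unveil(lenient_res, "value")
```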

Let's continue and focus on monthly data:


```{r parsermd-chunk-10}
avia_monthly <- avia_clean2 %>%
  bind_record(r_filter,
              freq == "M",
              tra_meas == "PAS_BRD_ARR",
              !is.na(passengers)) %>%
  bind_record(r_mutate,
              date = paste0(date, "01"),
              date = ymd(date)) %>%
  bind_record(r_select,
              destination = "air_pr\\time", date, passengers)

```

To make sure I only have monthly data, I can count the values of the `date`
column using `dplyr::count()`. Because `avia_monthly` is a `chronicle` and not
a data frame, I would normally need to `record()` the `dplyr::count()` function
first. Since I only need it this once, I can instead use `fmap_record()`, which
makes it possible to apply an undecorated function to a `chronicle` object:


```{r parsermd-chunk-11}
fmap_record(avia_monthly, count, date)
```
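
As a smaller illustration of the same idea, the sketch below applies the
undecorated `sum()` to a toy `chronicle`. The object names (`toy_roots`,
`toy_total`) are made up for this example and are not part of the analysis:

```{r sketch-fmap-record}
library(chronicler)

# Build a small chronicle object from a recorded function
toy_roots <- record(sqrt)(c(1, 4, 9))

# sum() was never passed to record(), so it cannot be used with bind_record();
# fmap_record() applies it to the value stored inside the chronicle instead
toy_total <- fmap_record(toy_roots, sum)

unveil(toy_total, "value")
```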

`avia_monthly` is an object of class `chronicle`, but in essence, it is just a
list, with its own print method:


```{r parsermd-chunk-12}
avia_monthly
```
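
A quick way to check this is to inspect the object with base R functions. This
is only a sketch on a toy object (`toy_chronicle` is a name made up for this
illustration), but the same calls work on `avia_monthly`:

```{r sketch-chronicle-structure}
library(chronicler)

toy_chronicle <- record(sqrt)(c(1, 4, 9))

class(toy_chronicle)    # the S3 class that triggers the custom print method
is.list(toy_chronicle)  # the underlying type is a plain list
names(toy_chronicle)    # the elements stored in that list
```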

Now that the data is clean, we can read the log:


```{r parsermd-chunk-13}
read_log(avia_monthly)
```

This is especially useful if the object `avia_monthly` gets saved using
`saveRDS()`. Anyone who later loads this object can read the log to find out
what happened and reproduce the steps if necessary.
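
A minimal sketch of that round trip, using a temporary file and a toy object
(`toy_saved`, `toy_restored` and `tmp_file` are names made up for this
illustration; in practice the object would be `avia_monthly` and the path a
real file):

```{r sketch-saverds}
library(chronicler)

toy_saved <- record(sqrt)(c(1, 4, 9))

# Write the chronicle to disk, then load it back as a colleague would
tmp_file <- tempfile(fileext = ".rds")
saveRDS(toy_saved, tmp_file)
toy_restored <- readRDS(tmp_file)

# The log travels with the object
read_log(toy_restored)
```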

Let's take a look at the final data set:


```{r parsermd-chunk-14}
avia_monthly %>%
  unveil("value")
```

It is also possible to take a look at the underlying `.log_df` object, which
contains more details, and see the output of the `.g` argument (which was
defined at the beginning as the `dim()` function):


```{r parsermd-chunk-15}
check_g(avia_monthly)
```

```{r parsermd-chunk-16, include = FALSE}
hu <- check_g(avia_monthly)$g
```

After `select()` the data has `r hu[[1]][1]` rows and `r hu[[1]][2]` columns;
after the call to `pivot_longer()`, `r hu[[2]][1]` rows and `r hu[[2]][2]`
columns. `separate()` adds three columns, after `filter()` only
`r hu[[5]][1]` rows remain (`mutate()` does not change the dimensions), and
then `select()` is used to remove three columns.

