---
title: "Forecasting Time Series Groups in the tidyverse"
author: "Matt Dancho"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Forecasting Time Series Groups in the tidyverse}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE, message = FALSE, warning = FALSE}
knitr::opts_chunk$set(
    # message = FALSE,
    # warning = FALSE,
    fig.width = 8, 
    fig.height = 4.5,
    fig.align = 'center',
    out.width='95%', 
    dpi = 200
)

# devtools::load_all() # Travis CI fails on load_all()
```

> Extending `broom` to time series forecasting

One of the most powerful benefits of `sweep` is that it helps forecasting at scale within the "tidyverse". There are two common situations:

1. Applying a model to groups of time series
2. Applying multiple models to a time series

In this vignette we'll review how `sweep` can help the __first situation__: _Applying a model to groups of time series_.

# Prerequisites

Before we get started, load the following packages.

```{r, message = F}
library(dplyr)
library(ggplot2)
library(tidyr)
library(purrr)
library(lubridate)
library(tidyquant)
library(timetk)
library(sweep)
library(forecast)
```

# Bike Sales

We'll use the bike sales data set, `bike_sales`, provided with the `sweep` package for this tutorial. The `bike_sales` data set is a _fictional_ daily order history that spans 2011 through 2015. It simulates a sales database that is typical of a business. The customers are the "bike shops" and the products are the "models".  

```{r}
bike_sales
```

We'll analyse the monthly sales trends for the bicycle manufacturer. Let's transform the data set by aggregating by month.

```{r}
bike_sales_monthly <- bike_sales %>%
    mutate(month = month(order.date, label = TRUE),
           year  = year(order.date)) %>%
    group_by(year, month) %>%
    summarise(total.qty = sum(quantity)) 
bike_sales_monthly
```


We can visualize package with a month plot using the `ggplot2` .

```{r}
bike_sales_monthly %>%
    ggplot(aes(x = month, y = total.qty, group = year)) +
    geom_area(aes(fill = year), position = "stack") +
    labs(title = "Quantity Sold: Month Plot", x = "", y = "Sales",
         subtitle = "March through July tend to be most active") +
    scale_y_continuous() +
    theme_tq()
```

Suppose Manufacturing wants a more granular forecast because the bike components are related to the secondary category. In the next section we discuss how `sweep` can help to perform a forecast on each sub-category.


# Performing Forecasts on Groups

First, we need to get the data organized into groups by month of the year. We'll create a new "order.month" date using `zoo::as.yearmon()` that captures the year and month information from the "order.date" and then passing this to `lubridate::as_date()` to convert to date format. 

```{r}
monthly_qty_by_cat2 <- bike_sales %>%
    mutate(order.month = as_date(as.yearmon(order.date))) %>%
    group_by(category.secondary, order.month) %>%
    summarise(total.qty = sum(quantity))
monthly_qty_by_cat2
```

Next, we use the `nest()` function from the `tidyr` package to consolidate each time series by group. The newly created list-column, "data.tbl", contains the "order.month" and "total.qty" columns by group from the previous step. The `nest()` function just bundles the data together which is very useful for iterative functional programming.

```{r}
monthly_qty_by_cat2_nest <- monthly_qty_by_cat2 %>%
    group_by(category.secondary) %>%
    nest()
monthly_qty_by_cat2_nest
```

## Forecasting Workflow

The forecasting workflow involves a few basic steps:

1. Step 1: Coerce to a `ts` object class.
2. Step 2: Apply a model (or set of models)
3. Step 3: Forecast the models (similar to predict)
4. Step 4: Tidy the forecast

## Step 1: Coerce to a `ts` object class

In this step we map the `tk_ts()` function into a new column "data.ts". The procedure is performed using the combination of `dplyr::mutate()` and `purrr::map()`, which works really well for the data science workflow where analyses are built progressively. As a result, this combination will be used in many of the subsequent steps in this vignette as we build the analysis. 

### mutate and map

The `mutate()` function adds a column, and the `map()` function maps the contents of a list-column (`.x`) to a function (`.f`). In our case, `.x = data.tbl` and `.f = tk_ts`. The arguments `select = -order.month`, `start = 2011`, and `freq = 12` are passed to the `...` parameters in map, which are passed through to the function. The `select` statement is used to drop the "order.month" from the final output so we don't get a bunch of warning messages. We specify `start = 2011` and `freq = 12` to return a monthly frequency.

```{r}
monthly_qty_by_cat2_ts <- monthly_qty_by_cat2_nest %>%
    mutate(data.ts = map(.x       = data, 
                         .f       = tk_ts, 
                         select   = -order.month, 
                         start    = 2011,
                         freq     = 12))
monthly_qty_by_cat2_ts
```


## Step 2: Modeling a time series

Next, we map the Exponential Smoothing ETS (Error, Trend, Seasonal) model function, `ets`, from the `forecast` package. Use the combination of `mutate` to add a column and `map` to interatively apply a function rowwise to a list-column. In this instance, the function to map the `ets` function and the list-column is "data.ts". We rename the resultant column "fit.ets" indicating an ETS model was fit to the time series data. 

```{r}
monthly_qty_by_cat2_fit <- monthly_qty_by_cat2_ts %>%
    mutate(fit.ets = map(data.ts, ets))
monthly_qty_by_cat2_fit
```

At this point, we can do some model inspection with the `sweep` tidiers.

### sw_tidy

To get the model parameters for each nested list, we can combine `sw_tidy` within the `mutate` and `map` combo. The only real difference is now we `unnest` the generated column (named "tidy"). Last, because it's easier to compare the model parameters side by side, we add one additional call to `spread()` from the `tidyr` package.

```{r}
monthly_qty_by_cat2_fit %>%
    mutate(tidy = map(fit.ets, sw_tidy)) %>%
    unnest(tidy) %>%
    spread(key = category.secondary, value = estimate)
```

### sw_glance

We can view the model accuracies also by mapping `sw_glance` within the `mutate` and `map` combo.

```{r}
monthly_qty_by_cat2_fit %>%
    mutate(glance = map(fit.ets, sw_glance)) %>%
    unnest(glance)
```

### sw_augment

The augmented fitted and residual values can be achieved in much the same manner. This returns nine groups data. Note that we pass `timetk_idx = TRUE` to return the date format times as opposed to the regular (yearmon or numeric) time series.  

```{r}
augment_fit_ets <- monthly_qty_by_cat2_fit %>%
    mutate(augment = map(fit.ets, sw_augment, timetk_idx = TRUE, rename_index = "date")) %>%
    unnest(augment)

augment_fit_ets
```

We can plot the residuals for the nine categories like so. Unfortunately we do see some very high residuals (especially with "Fat Bike"). This is often the case with realworld data.

```{r}
augment_fit_ets %>%
    ggplot(aes(x = date, y = .resid, group = category.secondary)) +
    geom_hline(yintercept = 0, color = "grey40") +
    geom_line(color = palette_light()[[2]]) +
    geom_smooth(method = "loess") +
    labs(title = "Bike Quantity Sold By Secondary Category",
         subtitle = "ETS Model Residuals", x = "") + 
    theme_tq() +
    facet_wrap(~ category.secondary, scale = "free_y", ncol = 3) +
    scale_x_date(date_labels = "%Y")
```

### sw_tidy_decomp

We can create decompositions using the same procedure with `sw_tidy_decomp()` and the `mutate` and `map` combo.

```{r}
monthly_qty_by_cat2_fit %>%
    mutate(decomp = map(fit.ets, sw_tidy_decomp, timetk_idx = TRUE, rename_index = "date")) %>%
    unnest(decomp)
```



## Step 3: Forecasting the model

We can also forecast the multiple models again using a very similar approach with the `forecast` function. We want a 12 month forecast so we add the argument for the `h = 12` (refer to `?forecast` for all of the parameters you can add, there's quite a few).

```{r}
monthly_qty_by_cat2_fcast <- monthly_qty_by_cat2_fit %>%
    mutate(fcast.ets = map(fit.ets, forecast, h = 12))
monthly_qty_by_cat2_fcast
```

## Step 4: Tidy the forecast

Next, we can apply `sw_sweep` to get the forecast in a nice "tidy" data frame. We use the argument `fitted = FALSE` to remove the fitted values from the forecast (leave off if fitted values are desired). We set `timetk_idx = TRUE` to use dates instead of numeric values for the index. We'll use `unnest()` to drop the left over list-columns and return an unnested data frame.

```{r}
monthly_qty_by_cat2_fcast_tidy <- monthly_qty_by_cat2_fcast %>%
    mutate(sweep = map(fcast.ets, sw_sweep, fitted = FALSE, timetk_idx = TRUE)) %>%
    unnest(sweep)
monthly_qty_by_cat2_fcast_tidy
```


Visualization is just one final step. 

```{r, fig.height=7}
monthly_qty_by_cat2_fcast_tidy %>%
    ggplot(aes(x = index, y = total.qty, color = key, group = category.secondary)) +
    geom_ribbon(aes(ymin = lo.95, ymax = hi.95), 
                fill = "#D5DBFF", color = NA, linewidth = 0) +
    geom_ribbon(aes(ymin = lo.80, ymax = hi.80, fill = key), 
                fill = "#596DD5", color = NA, linewidth = 0, alpha = 0.8) +
    geom_line() +
    labs(title = "Bike Quantity Sold By Secondary Category",
         subtitle = "ETS Model Forecasts",
         x = "", y = "Units") +
    scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
    scale_color_tq() +
    scale_fill_tq() +
    facet_wrap(~ category.secondary, scales = "free_y", ncol = 3) +
    theme_tq() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
```


# Recap

The `sweep` package has a several tools to analyze grouped time series. In the next vignette we will review how to apply multiple models to a time series.
