---
title: "breakDown plots for the linear models"
author: "Przemyslaw Biecek"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{breakDown plots for the linear model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette uses the white wine quality data (archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv) to demonstrate the `breakDown` package for `lm` models.

```{r}
library("breakDown")
head(wine, 3)
```

Now let's create a linear model for `quality`.

```{r}
model <- lm(quality ~ fixed.acidity + volatile.acidity + citric.acid +
              residual.sugar + chlorides + free.sulfur.dioxide +
              total.sulfur.dioxide + density + pH + sulphates + alcohol,
            data = wine)
```

Common goodness-of-fit measures for an `lm` model are $R^2$, adjusted $R^2$, and the AIC and BIC information criteria.

```{r}
summary(model)$r.squared
summary(model)$adj.r.squared
BIC(model)
```
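
AIC is mentioned above but not computed. As a minimal sketch, both criteria can be rebuilt from the log-likelihood of the fit using only functions from the `stats` package. The model is refitted here so that the chunk runs on its own; the names `ll`, `k`, `n`, `aic_by_hand`, and `bic_by_hand` are introduced for this illustration only.

```{r}
library("breakDown")

# Refit the model from above so that this chunk is self-contained.
model <- lm(quality ~ fixed.acidity + volatile.acidity + citric.acid +
              residual.sugar + chlorides + free.sulfur.dioxide +
              total.sulfur.dioxide + density + pH + sulphates + alcohol,
            data = wine)

ll <- logLik(model)
k  <- attr(ll, "df")  # number of estimated parameters (coefficients and sigma)
n  <- nobs(model)     # number of observations used in the fit

# AIC = -2 * logLik + 2 * k;  BIC = -2 * logLik + log(n) * k
aic_by_hand <- -2 * as.numeric(ll) + 2 * k
bic_by_hand <- -2 * as.numeric(ll) + log(n) * k

# Both comparisons should return TRUE.
all.equal(aic_by_hand, AIC(model))
all.equal(bic_by_hand, BIC(model))
```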

These measures assess the overall quality of the fit. But how can we understand which factors drive the prediction for a single observation?

With the `breakDown` package!

```{r, fig.width=7}
library(breakDown)
library(ggplot2)

new_observation <- wine[1,]
br <- broken(model, new_observation)
br
# different roundings
print(br, digits = 2, rounding_function = signif)
print(br, digits = 6, rounding_function = round)
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
```
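
For a linear model, a decomposition of this kind can be cross-checked with base R alone. The sketch below assumes that each contribution is the centered term effect $\hat\beta_j (x_j - \bar{x}_j)$, which is what `predict()` returns with `type = "terms"`. It illustrates the idea and is not meant as a description of the internals of `broken()`. The model is refitted so that the chunk runs on its own; `terms_pred`, `contributions`, and `mean_prediction` are names introduced here.

```{r}
library("breakDown")

# Refit the model from above so that this chunk is self-contained.
model <- lm(quality ~ fixed.acidity + volatile.acidity + citric.acid +
              residual.sugar + chlorides + free.sulfur.dioxide +
              total.sulfur.dioxide + density + pH + sulphates + alcohol,
            data = wine)
new_observation <- wine[1, ]

# One column per model term; each entry is beta_j * (x_j - mean(x_j)).
terms_pred <- predict(model, newdata = new_observation, type = "terms")
contributions <- terms_pred[1, ]

# The "constant" attribute holds the prediction at the column means.
mean_prediction <- attr(terms_pred, "constant")

# Contributions ordered by absolute size, largest first.
contributions[order(abs(contributions), decreasing = TRUE)]

# The constant plus all contributions reproduces the prediction (TRUE).
all.equal(mean_prediction + sum(contributions),
          unname(predict(model, newdata = new_observation)))
```

The last line checks that nothing is lost in the decomposition: the contributions always add up to the difference between the prediction and the constant.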

Use the `baseline` argument to set the origin of the plot.

```{r, fig.width=7}
br <- broken(model, new_observation, baseline = "Intercept")
br
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
```
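
One way to read this plot, stated as an assumption rather than as documented behaviour of the package: if the `"Intercept"` baseline is the prediction at the average values of the predictors (for a least-squares fit with an intercept this equals the mean of `quality` in the training data), then the prediction for a new observation $x^{*}$ in a model with main effects only splits as

$$
\hat{y}(x^{*}) \;=\; \underbrace{\hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j\,\bar{x}_j}_{\text{baseline}} \;+\; \sum_{j=1}^{p} \hat{\beta}_j\,\bigl(x^{*}_j - \bar{x}_j\bigr),
$$

where $\bar{x}_j$ is the mean of the $j$-th predictor in the training data (notation introduced here). Under this reading, each bar shows one summand of the second sum, so bars to the right of the origin push the prediction above the average and bars to the left pull it below.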

The same approach works for models with interactions.

```{r, fig.width=7}
model <- lm(quality ~ (alcohol + density + residual.sugar)^2,
            data = wine)
new_observation <- wine[1,]

br <- broken(model, new_observation, baseline = "Intercept")
br
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
```
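
To see which terms the `^2` operator in the formula above generates, the formula can be expanded without fitting a model. This is a small base R sketch; the name `f` is introduced here for illustration.

```{r}
# Expand the formula symbolically; no data are needed for this step.
f <- quality ~ (alcohol + density + residual.sugar)^2

# Three main effects followed by all three pairwise interactions.
attr(terms(f), "term.labels")
```

Each of these six terms is a candidate for its own row in the breakDown output, which is why the interaction model has more components than predictors.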