---
title: "How to use shapper for regression"
author: "Alicja Gosiewska"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: true
    toc_float: true
    number_sections: true
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{How to use shapper for regression}
  %\usepackage[UTF-8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)
```

# Introduction

The `shapper` is an R package which ports the `shap` python library in R. For details and examples see [shapper repository on github](https://github.com/ModelOriented/shapper) and [shapper website](https://modeloriented.github.io/shapper/).

SHAP (SHapley Additive exPlanations) is a method to explain predictions of any machine learning model. 
For more details about this method see [shap repository on github](https://github.com/slundberg/shap).

# Install shaper and shap

## R package shapper

```{r eval=FALSE}
library("shapper")
```

## Python library shap

To run shapper python library shap is required. It can be installed both by python or R. To install it throught R, you an use function `install_shap` from the `shapper` package.

```{r, eval = FALSE}
shapper::install_shap()
```


# Load data sets

The example usage is presented on the `titanic` dataset form the R package `DALEX`. 

```{r eval=FALSE}
library("DALEX")
titanic_train <- titanic[,c("survived", "class", "gender", "age", "sibsp", "parch", "fare", "embarked")]
titanic_train$survived <- factor(titanic_train$survived)
titanic_train$gender <- factor(titanic_train$gender)
titanic_train$embarked <- factor(titanic_train$embarked)
titanic_train <- na.omit(titanic_train)
head(titanic_train)
```

# Let's build a model

```{r eval=FALSE}
library("randomForest")
set.seed(123)
model_rf <- randomForest(survived ~ . , data = titanic_train)
model_rf
```

## Prediction to be explained

Let's assume that we want to explain the prediction of a particular observation (male, 8 years old, traveling 1-st class embarked at C, without parents and siblings.

```{r eval=FALSE}
new_passanger <- data.frame(
            class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew", "restaurant staff", "victualling crew")),
            gender = factor("male", levels = c("female", "male")),
            age = 8,
            sibsp = 0,
            parch = 0,
            fare = 72,
            embarked = factor("Cherbourg", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)
```


## Here shapper starts

To use the function `shap()` function (alias for `individual_variable_effect()`) we need four elements

* a model,
* a data set,
* a function that calculated scores (predict function),
* an instance (or instances) to be explained.

The `shap()` function can be used directly with these four arguments, but for the simplicity here we are using the *DALEX* package with preimplemented predict functions.


```{r eval=FALSE}
library("DALEX")
exp_rf <- explain(model_rf, data = titanic_train[,-1], y = as.numeric(titanic_train[,1])-1)
```

The explainer is an object that wraps up a model and meta-data. Meta data consists of, at least, the data set used to fit model and observations to explain. 

And now it's enough to generate SHAP attributions with explainer for RF model.

```{r eval=FALSE}
library("shapper")
ive_rf <- shap(exp_rf, new_observation = new_passanger)
ive_rf
```

# Plotting results

```{r eval=FALSE}
plot(ive_rf)
```
[](https://modeloriented.github.io/shapper/articles/shapper_regression_files/figure-html/unnamed-chunk-8-1.png)
