---
title: "Example of global variable importance"
author: "Anna Kozak"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Example of global variable importance}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Example of global variable importance

In this vignette, we present a global variable importance measure based on Partial Dependence Profiles (PDP) for a random forest regression model.

```{r, include=FALSE, warning=FALSE, error=FALSE, message=FALSE}
library("ggplot2")
```

### 1 Dataset

We use the `apartments` dataset from the `DALEX` package.

```{r, warning = FALSE, echo = FALSE, message = FALSE, include = TRUE}
library("DALEX")
data(apartments)
head(apartments)
```

### 2 Random forest regression model

Now we fit a random forest regression model and wrap it with the `explain()` function from `DALEX`.

```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE}
library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms, data = apartments)
# Columns 2:5 of apartmentsTest hold the four predictors used in the model
explainer_rf <- explain(apartments_rf_model,
                        data = apartmentsTest[,2:5], y = apartmentsTest$m2.price)
```



### 3 Calculate Partial Dependence Profiles

Let's look at the Partial Dependence Profiles calculated with the `DALEX::model_profile()` function. PDPs can also be calculated with `DALEX::variable_profile()` or `ingredients::partial_dependence()`.

```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE}
profiles <- model_profile(explainer_rf)
plot(profiles) 
```
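To make the idea behind a PDP concrete, the profile for one variable can be approximated by hand: fix that variable at a grid value for every observation, predict, and average the predictions. The following is a minimal sketch in base R, not the implementation used by `DALEX`. It uses the built-in `mtcars` data and a linear model so that it runs on its own, and the helper name `manual_pdp()` is hypothetical.

```r
# Minimal sketch of a partial dependence profile computed by hand.
# Assumptions: `manual_pdp()` is a hypothetical helper, and `mtcars` with a
# linear model stands in for the apartments data and the random forest.
manual_pdp <- function(model, data, variable, grid_size = 20) {
  # Evenly spaced grid over the observed range of the variable
  grid <- seq(min(data[[variable]]), max(data[[variable]]),
              length.out = grid_size)
  # For each grid value: set the variable to that value in every row,
  # predict, and average the predictions
  avg_prediction <- vapply(grid, function(value) {
    modified <- data
    modified[[variable]] <- value
    mean(predict(model, newdata = modified))
  }, numeric(1))
  data.frame(x = grid, yhat = avg_prediction)
}

model <- lm(mpg ~ wt + hp, data = mtcars)
pdp_wt <- manual_pdp(model, mtcars, "wt")
head(pdp_wt)
```

For a linear model the resulting profile is a straight line whose slope equals the fitted coefficient, which is a quick sanity check on the helper. Profiles from `model_profile()` may differ in detail, for example in how the grid is chosen.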

### 4 Calculate measure of global variable importance

Now we calculate a measure of global variable importance based on the oscillations of the PDP.

```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE}
library("vivo")
measure <- global_variable_importance(profiles)
```

```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE}
plot(measure)
```

The most important variable is `surface`, followed by `no.rooms`, `floor`, and `construction.year`.
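
The intuition behind an oscillation-based measure is that a flat profile means the variable barely moves the average prediction, while a profile that swings widely marks an influential variable. As a rough illustration, the sketch below scores each profile by its mean absolute deviation from its own average. This particular formula is an assumption chosen for illustration and may differ from what `global_variable_importance()` computes. `profile_oscillation()` is a hypothetical helper, and the built-in `mtcars` data with a linear model replaces the apartments data so that the code runs on its own.

```r
# Sketch of an oscillation-style importance score for a single profile.
# Assumption: mean absolute deviation from the profile's average is used
# here for illustration; vivo's exact formula may differ.
profile_oscillation <- function(yhat) {
  mean(abs(yhat - mean(yhat)))
}

model <- lm(mpg ~ wt + hp, data = mtcars)
variables <- c("wt", "hp")

scores <- vapply(variables, function(variable) {
  grid <- seq(min(mtcars[[variable]]), max(mtcars[[variable]]),
              length.out = 20)
  # Partial dependence: average prediction with the variable fixed
  profile <- vapply(grid, function(value) {
    modified <- mtcars
    modified[[variable]] <- value
    mean(predict(model, newdata = modified))
  }, numeric(1))
  profile_oscillation(profile)
}, numeric(1))

# Larger score = profile varies more = variable matters more to the model
sort(scores, decreasing = TRUE)
```

Because the score is expressed in units of the prediction, scores for different variables of the same model are directly comparable.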


### 5 Comparison of the importance of variables for two or more models

Let's create a linear regression model and its `explain` object.

```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE}
apartments_lm_model <- lm(m2.price ~ construction.year + surface + floor +
                                      no.rooms, data = apartments)
explainer_lm <- explain(apartments_lm_model,
                        data = apartmentsTest[,2:5], y = apartmentsTest$m2.price)
```

We calculate the Partial Dependence Profiles and the importance measure for this model.

```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE}
profiles_lm <- model_profile(explainer_lm)

measure_lm <- global_variable_importance(profiles_lm)
```

```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE}
plot(measure_lm, measure, type = "lines")
```

Now we can compare the order of variable importance across the models.
