---
title: "Exploring keras models with condvis2"
author: "K. Domijan"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Exploring keras models with condvis2}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=5, fig.height=5 ,fig.align="center", eval=FALSE
)
fpath <- ""
```

`keras` is an R based interface to the [Keras](https://keras.io/): the Python Deep Learning library. It uses the [TensorFlow](https://www.tensorflow.org/) backend engine.

The basic workflow is to define a model object of class `keras.engine.training.Model` by initialising it using the `keras_model_sequential` function and then adding layers to it. Function  `fit` trains a Keras model. It requires the predictors (inputs) and responses (targets/labels) to be passed a two separate data objects as vector, matrix, or arrays. 

## Classification:


Use the Diabetes in Pima Indian Women dataset from library `MASS`
```{r}
library(keras)
library(condvis2)
library(MASS)
set.seed(123)
```

Prepare data for Keras and Condvis: 
```{r}
# Training features
Pima.training <- Pima.tr[,1:7]
# Testing features
Pima.testing <- Pima.te[,1:7]

# Scale the data
Pima.training <-as.matrix(scale(Pima.training))
means <- attr(Pima.training,"scaled:center")
sds<- attr(Pima.training,"scaled:scale")
Pima.testing <- as.matrix(scale(Pima.testing, center=means, scale=sds))

# One hot encode training target values
Pima.trainLabels <- to_categorical(as.numeric(Pima.tr[,8]) -1)[, 2]
# One hot encode test target values
Pima.testLabels <- to_categorical(as.numeric(Pima.te[,8]) -1)[, 2]

# Create dataframes for Condvis
dtf <- data.frame(Pima.training)
dtf$Pima.trainLabels <- Pima.tr[,8]

dtf.te <- data.frame(Pima.testing)
dtf.te$Pima.testLabels <- Pima.te[,8]
```


Define and fit the model:
```{r}

model <- keras_model_sequential() # Add layers to the model
model %>%
  layer_dense(units = 8, activation = 'tanh', input_shape = c(7)) %>%
  layer_dense(units = 1, activation = 'sigmoid')

# Print a summary of a model
summary(model)


# Compile the model
model %>% compile(
  loss = 'binary_crossentropy',
  optimizer = 'adam',
  metrics = 'accuracy'
)

# Fit the model
history <-model %>% fit(Pima.training, Pima.trainLabels,
              epochs = 500,
              batch_size = 50,
                  validation_split = 0.2,
              class_weight = as.list(c("0" = 1, "1"=3))
)
```
Condvis uses a generic `CVpredict` to provide a uniform interface to `predict` methods. For classification, the choice of `ptype` allows for output for each observation as:

- predicted class `ptype` = "pred" (default)
- predicted probability for the last class `ptype` = "prob" (e.g. $P(X=1)$ in binary classification).
- matrix of predicted probabilities for each class `ptype` = "probmatrix"

```{r}
kresponse <- "Pima.testLabels"
kpreds <- setdiff(names(dtf.te),kresponse)
CVpredict(model, dtf.te[1:10,], response=kresponse, predictors=kpreds)
CVpredict(model, dtf.te[1:10,], response=kresponse, predictors=kpreds, ptype="prob")
CVpredict(model, dtf.te[1:10,], response=kresponse, predictors=kpreds, ptype="probmatrix")
```


Note that for `keras` models  so one needs to specify the name of response and predictors for `CVpredict`. When creating the Condvis shiny app, arguments for `CVpredict` can be passed in `condvis` using `predictArgs` argument.

Calculate model accuracy from:

```{r}
mean(CVpredict(model, dtf.te, response=kresponse, predictors=kpreds) == dtf.te$Pima.testLabels)
```


Compare to LDA:

```{r}

fit.lda <- lda(Pima.trainLabels~., data = dtf)
mean(CVpredict(fit.lda, dtf.te) == dtf.te$Pima.testLabels)

```

LDA scores higher on accuracy. It is known that a linear model performs best for this dataset.

```{r eval=F}
kresponse <- "Pima.trainLabels"
kArgs1 <-  list(response=kresponse,predictors=kpreds)
condvis(dtf, list(model.keras = model, model.lda = fit.lda), sectionvars = c("bmi", "glu"), response="Pima.trainLabels",predictArgs = list(kArgs1, NULL), pointColor = "Pima.trainLabels")

```

Click the showprobs button to see class probabilities.

```{r echo=FALSE, out.width='100%', eval=T}
knitr::include_graphics(paste0(fpath, "Pima.png"))
```


### Tour controls

To view a tour through the space where the fits differ: select `Choose tour` option `Diff fits` and click on the arrow below `Tour Step` to watch.
You can increase the number of  points via the `Tour Length` slider.


## Regression:

Use the Boston housing data. This example comes from one of the original keras tutorial vignettes, which is no longer available.

Prepare data:
```{r}
boston_housing <- dataset_boston_housing()
c(train_data, train_labels) %<-% boston_housing$train
c(test_data, test_labels) %<-% boston_housing$test

# Normalize training data
train_data <- scale(train_data)

# Use means and standard deviations from training set to normalize test set
col_means_train <- attr(train_data, "scaled:center")
col_stddevs_train <- attr(train_data, "scaled:scale")
test_data <- scale(test_data, center = col_means_train, scale = col_stddevs_train)

```


Fit the model:
```{r}
build_model <- function() {

  model <- keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu",
                input_shape = dim(train_data)[2]) %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_dense(units = 1)

  model %>% compile(
    loss = "mse",
    optimizer = optimizer_rmsprop(),
    metrics = list("mean_absolute_error")
  )

  model
}

model <- build_model()
model %>% summary()


# Display training progress by printing a single dot for each completed epoch.
print_dot_callback <- callback_lambda(
  on_epoch_end = function(epoch, logs) {
    if (epoch %% 80 == 0) cat("\n")
    cat(".")
  }
)

epochs <- 500

# Fit the model 
early_stop <- callback_early_stopping(monitor = "val_loss", patience = 20)

model <- build_model()
history <- model %>% fit(
  train_data,
  train_labels,
  epochs = epochs,
  validation_split = 0.2,
  verbose = 0,
  callbacks = list(early_stop, print_dot_callback)
)

```

Create dataframes for `condvis`:
```{r}

column_names <- c('CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                  'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT')
train_df <- data.frame(train_data)
colnames(train_df) <- column_names
train_df$medv <- as.numeric(train_labels)

test_df <- data.frame(test_data)
colnames(test_df) <- column_names
test_df$medv <- as.numeric(test_labels)
kpreds <- column_names
kresponse <- "medv"
```
Fit some other models for comparison (a random forest and a generalised additive model):

```{r}
suppressMessages(library(mgcv))
gam.model = gam(medv ~ s(LSTAT) + s(RM) + s(CRIM), data=train_df)

suppressMessages(library(randomForest))
rf.model <- randomForest(formula = medv ~ ., data = train_df)

```

Use `CVpredict` to compare RMSE in the scaled data:

```{r}

mean((test_labels - CVpredict(model, test_df, response=kresponse, predictors=kpreds))^2)
mean((test_labels - CVpredict(gam.model, test_df))^2)
mean((test_labels - CVpredict(rf.model, test_df))^2, na.rm=TRUE)
```

RF gives the best fit. 

```{r eval = FALSE}

kArgs <-  list(response=kresponse,predictors=kpreds)
condvis(train_df, list(gam = gam.model, rf = rf.model, kerasmodel = model), sectionvars = c("LSTAT", "RM"),predictArgs = list(NULL, NULL, kArgs) )

```

Ticking `Show 3d surface` shows the 3d-surface of the fit and you can use the `Rotate` slider to rotate them around the z-axis.

```{r echo=FALSE, out.width='100%', eval=T}
knitr::include_graphics(paste0(fpath, "Boston.png"))
```

Random forest gives a blockier fit, compared to the smooth gams and the neural net. The RF fit is more flexible in the areas where there is more data points.