---
title: "Model evaluation"
author: "François Rebaudo, using data from Shi et al. 2016"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{sec01}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette exemplifies the use of the devRate package i) to fit a development rate 
models to empirical datasets with eight species using six nonlinear models, and ii) to 
compare models based on goodness of fit and the trade-off between the model's 
goodness of fit and its structural complexity.

It reproduces some of the results of the study by Shi et al. 2016^[Shi, P. J., Reddy, G. 
V., Chen, L., & Ge, F. (2015). Comparison of thermal performance equations in 
describing temperature-dependent developmental rates of insects:(I) empirical 
models. Annals of the Entomological Society of America, 109(2), 211-215.].

```{r}
require("devRate")
```

## Nonlinear models

__Table 1.__ Models used by Shi et al. 2016

| Model name     | Model name in devRate    | 
|----------------|--------------------------|
| Briere-1       |briere1_99                | 
| Briere-2       |briere2_99                |
| Lactin         |lactin2_95                |
| Performance-2  |perf2_11                  |
| Beta           |beta_16                   |
| Ratkowsky      |ratkowsky_83              |

## Datasets

__Table 2.__ Datasets of temperature-dependent development rates used by Shi et al. 2016 
and method used to retrieve the source dataset.

| Sample species               | Source                        | Method            |
|------------------------------|-------------------------------|-------------------|
|_Helicoverpa armigera_        | Wu et al. (2009)              | IPEC manual       |
|_Kampimodromus aberrans_      | Broufas et al. (2007)         | Table 3           |
|_Toxorhyynchites brevipalpis_ | Trpis (1972)                  | Table 1           |
|_Bactrocera dorsalis_         | Messenger and Flitters (1958) | Table 1           |
|_Aedes aegypti_               | Gilpin and McClelland (1979)  | article not found |
|_Bemisia tabaci_ (B-biotype)  | Xiang et al. (2007)           | article not found |
|_Lipaphis erysimi_            | Liu and Meng (2000)           | Table 1           |
|_Myzus persicae_              | Liu and Meng (1999)           | Table 1           |
|_Epilachna varivestis_        | Shirai and Yara (2001)        | Table 4           |
|_Drosophila buzzatii_         | de Jong (2010)                | Web Plot Digitizer|

Gilpin and McClelland (1979), and Xiang et al. (2007) articles were not found and were 
excluded from this analysis.

The Wu et al. (2009) dataset was retrieved from the IPEC package manual (article not 
found).

```{r}
wuDS <- data.frame(
  temp = seq(from = 15, to = 37, by = 1), 
  devRate = 1/c(41.24, 37.16, 32.47, 26.22, 22.71, 19.01, 16.79, 15.63, 14.27, 12.48, 11.3, 
                 10.56, 9.69, 9.14, 8.24, 8.02, 7.43, 7.27, 7.35, 7.49, 7.63, 7.9, 10.03))
```

The Broufas et al. (2007) dataset was retrieved from Table 3 where the mean development 
period is provided in hours and transformed in the inverse of days.

```{r}
broufasDS <- data.frame(
  temp = c(15.0, 17.0, 20.0, 22.0, 25.0, 27.0, 30.0, 33.0, 35.0), 
  devRate = 1/(c(595.4, 463.0, 260.5, 222.5, 161.8, 153.1, 136.0, 147.8, 182.1)/24))
```

The Trpis et al. (2007) dataset was retrieved from Table 1 where the mean egg development 
time is provided in hours and transformed in the inverse of days.

```{r}
trpisDS <- data.frame(
  temp = c(14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 
           27.0, 28.0, 29.0, 30.0, 31.0, 32.0), 
  devRate = 1/(c(209.0, 174.0, 165.0, 125.0, 102.0, 90.0, 76.0, 62.0, 55.0, 50.0, 48.0, 
                44.0, 41.0, 39.0, 38.0, 37.5, 37.0, 38.0, 39.0)/24))
```

The Messenger and Flitters (1958) dataset was retrieved from Table 1 where the temperature 
is in Fahrenheit and transformed into Celsius and the mean egg development time is provided 
in hours and transformed in the inverse of days.

```{r}
messengerDS <- data.frame(
  temp = (c(55.0, 56.0, 57.0, 58.0, 60.0, 62.5, 65.0, 67.5, 70.0, 75.0, 80.0, 85.0, 87.5, 
            90.0, 92.5, 95.0, 96.0, 97.0, 97.5) - 32)/1.8, 
  devRate = 1/(c(263.0, 232.0, 170.5, 148.0, 121.3, 95.5, 74.0, 62.5, 51.5, 38.0, 30.5, 
                27.0, 25.0, 24.0, 23.5, 25.0, 26.5, 29.3, 34.3)/24))
```

The Liu and Meng (2000) dataset was retrieved from Table 1 (constant temperatures), and for 
the altae form (this may be different from the dataset used by Shi et al. 2016). 

```{r}
liu1DS <- data.frame(
  temp = c(8.3, 11.3, 14.3, 17.3, 20.1, 22.3, 24.7, 26.5, 28.0, 30.0, 33.0, 35.1), 
  devRate = 1/c(47.5, 26.1, 15.8, 11.2, 8.3, 6.7, 6.2, 5.7, 5.4, 5.1, 5.6, 7.1))
```

The Liu and Meng (1999) dataset was retrieved from Table 1 for the altae form (this may be 
different from the dataset used by Shi et al. 2016).

```{r}
liu2DS <- data.frame(
  temp = c(6.2, 8.3, 11.3, 14.3, 17.3, 20.1, 22.3, 24.7, 26.5, 28.0, 30.0), 
  devRate = 1/c(50.4, 33.9, 21.5, 14.3, 11.2, 8.0, 6.8, 6.2, 5.8, 6.1, 6.5))
```

The Shirai and Yara (2001) dataset was retrieved from Table 4 (this may be different from 
the dataset used by Shi et al. 2016).

```{r}
shiraiDS <- data.frame(
  temp = c(12.5, 15.0, 17.5, 20.0, 22.5, 25.0, 27.5, 30.0, 32.5), 
  devRate = 1/c(87.0, 55.5, 41.9, 32.4, 26.9, 21.7, 20.1, 20.6, 20.8))
```

The de Jong (2010) dataset was retrieved using Web Plot Digitizer on Figure 1B (this should 
be different from the dataset used by Shi et al. 2016).

```{r}
deJongDS <- data.frame(
  temp = c(15.1, 16.1, 17.1, 18.0, 21.7, 27.9, 30.1, 31.8), 
  devRate = c(0.0253, 0.0291, 0.0370, 0.0381, 0.0710, 0.1079, 0.1098, 0.0972))
```

## Fitting the six models to the eight empirical datasets

### [1] Briere-1 model

Starting values for the NLS fit were chosen on the basis of available information 
on the package database with the briere1_99 object.

The devRateModel function can be applied to each dataset, or a list 
containing all the datasets can be built to ease the process. In the first case: 

```{r}
wuDS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = wuDS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))

broufasDS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = broufasDS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))

trpisDS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = trpisDS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))

messengerDS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = messengerDS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))

liu1DS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = liu1DS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))

liu2DS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = liu2DS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))

shiraiDS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = shiraiDS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))

deJongDS_briere1_99 <- devRateModel(eq = briere1_99, 
  dfData = deJongDS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40))
```

The results can then be stored in a single list.

```{r}
briere1NLS <- list(wuDS_briere1_99, broufasDS_briere1_99, trpisDS_briere1_99, 
                   messengerDS_briere1_99, liu1DS_briere1_99, liu2DS_briere1_99, 
                   shiraiDS_briere1_99, deJongDS_briere1_99)
```

Or alternatively:

```{r}
listDS <- list(wuDS, broufasDS, trpisDS, messengerDS, liu1DS, liu2DS, shiraiDS, deJongDS)
briere1NLS <- lapply(listDS, function(myDataSet){
  devRateModel(eq = briere1_99, 
    dfData = myDataSet,
    startValues = list(aa = 0.01, Tmin = 10, Tmax = 40)
  )
})
```

### [2] Briere-2 model

Likewise, the briere2_99 model can be fitted to the dataset each one at a time, or 
using the lapply function.

```{r}
broufasDS_briere2_99 <- devRateModel(eq = briere2_99, 
  dfData = broufasDS,
  startValues = list(aa = 0.01, Tmin = 10, Tmax = 40, bb = 2))
### and so on ...
```

```{r}
briere2NLS <- lapply(listDS, function(myDataSet){
  devRateModel(eq = briere2_99, 
    dfData = myDataSet,
    startValues = list(aa = 0.01, Tmin = 10, Tmax = 40, bb = 2)
  )
})
```

### [3] Lactin model

```{r}
broufasDS_lactin2_95 <- devRateModel(eq = lactin2_95, 
  dfData = broufasDS,
  startValues = list(aa = 0.03, Tmax = 30, deltaT = 5.0, bb = -1.5))
### and so on ...
```

```{r}
lactin2NLS <- lapply(listDS, function(myDataSet){
  devRateModel(eq = lactin2_95,
    dfData = myDataSet,
    startValues = list(aa = 0.03, Tmax = 30, deltaT = 5.0, bb = -1.5)
  )
})
```

### [4] Performance-2 Model

```{r}
perf2NLS <- lapply(listDS, function(myDataSet){
  devRateModel(eq = perf2_11, 
    dfData = myDataSet,
    startValues = list(cc = 0.02, T1 = 10, k = 0.5, T2 = 35)
  )
})
```

### [5] Beta model

```{r}
betaNLS <- lapply(listDS, function(myDataSet){
  devRateModel(eq = beta_16, 
    dfData = myDataSet,
    startValues = list(rm = 0.2, T1 = 5, T2 = 40, Tm = 30)
  )
})
```

### [6] Ratkowsky model

```{r}
ratkowskyNLS <- lapply(listDS, function(myDataSet){
  devRateModel(eq = ratkowsky_83, 
    dfData = myDataSet,
    startValues = list(cc = 0.02, T1 = 5, T2 = 40, k = 0.2)
  )
})
```

### Organizing models' results

Model results can be grouped into a single object to ease the data processing.

```{r}
listNLS <- list(briere1NLS, briere2NLS, lactin2NLS, perf2NLS, betaNLS, ratkowskyNLS)
```

## Model evaluation

Shi et al. 2016 used the RSS, R2, R2 adjusted, RMSE, and AIC corrected in their study.

### [1] RSS

The RSS can be computed from the nls objects returned by the devRateModel function, and equation (7) 
from Shi et al. 2016.

\begin{equation}
  RSS = \sum_{i=1}^n (obs_i - pred_i)^2
\end{equation}

```{r}
RSS <- t(sapply(1:6, function(myModel){
  sapply(1:8, function(myDS){
    return(sum((listDS[[myDS]][,2] - predict(listNLS[[myModel]][[myDS]]))^2))
  })
}))
rownames(RSS) <- c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky")
colnames(RSS) <- c("wu", "broufas", "trpis", "messenger", "liu1", "liu2", "shirai", "deJong")
```

__Table 3.__ RSS results
```{r, echo=FALSE, results='asis'}
knitr::kable(RSS) 
```

### [2] R squared

The R2 can be computed using equation (8) from Shi et al. 2016.

\begin{equation}
  R^2 = 1 - \frac{\sum_{i=1}^n (obs_i - pred_i)^2}{\sum_{i=1}^n (obs_i - meanObs_i)^2}
\end{equation}

```{r}
R2 <- t(sapply(1:6, function(myModel){
  sapply(1:8, function(myDS){
    return(1 - sum((listDS[[myDS]][,2] - predict(listNLS[[myModel]][[myDS]]))^2) / 
      sum((listDS[[myDS]][,2] - mean(listDS[[myDS]][,2]))^2))
  })
}))
rownames(R2) <- c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky")
colnames(R2) <- c("wu", "broufas", "trpis", "messenger", "liu1", "liu2", "shirai", "deJong")
```

__Table 4.__ R2 results
```{r, echo=FALSE, results='asis'}
knitr::kable(R2) 
```

### [3] R squared adjusted 

The R2 adjusted can be computed using equation (9) from Shi et al. 2016.

\begin{equation}
  R^2_{adj} = 1 - \frac{n-1}{n-p}(1-R^2)
\end{equation}

```{r}
R2adj <- t(sapply(1:6, function(myModel){
  p <- 1 + length(coef(listNLS[[myModel]][[1]]))
  sapply(1:8, function(myDS){
    n <- length(listDS[[myDS]][,2])
    Rsq <- 1 - sum((listDS[[myDS]][,2] - predict(listNLS[[myModel]][[myDS]]))^2) / 
      sum((listDS[[myDS]][,2] - mean(listDS[[myDS]][,2]))^2)
    return(1 - (n - 1)/(n - p) * (1 - Rsq))
  })
}))
rownames(R2adj) <- c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky")
colnames(R2adj) <- c("wu", "broufas", "trpis", "messenger", "liu1", "liu2", "shirai", "deJong")
```

__Table 5.__ R2 adjusted results
```{r, echo=FALSE, results='asis'}
knitr::kable(R2adj) 
```

### [4] RMSE

The RMSE can be computed using equation (10) from Shi et al. 2016.

\begin{equation}
  RMSE = \sqrt{RSS/(n-p+1)}
\end{equation}

```{r}
RMSE <- t(sapply(1:6, function(myModel){
  p <- 1 + length(coef(listNLS[[myModel]][[1]]))
  sapply(1:8, function(myDS){
    n <- length(listDS[[myDS]][,2])
    RSS <- sum((listDS[[myDS]][,2] - predict(listNLS[[myModel]][[myDS]]))^2)
    return(sqrt(RSS / (n - p + 1)))
  })
}))
rownames(RMSE) <- c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky")
colnames(RMSE) <- c("wu", "broufas", "trpis", "messenger", "liu1", "liu2", "shirai", "deJong")
```

__Table 6.__ RMSE results
```{r, echo=FALSE, results='asis'}
knitr::kable(RMSE) 
```

### [5] AIC corrected

The AICc can be computed using equations (11) and (12) from Shi et al. 2016.

\begin{equation}
  AIC_c = -2L+2pn/(n-p-1)
\end{equation}

\begin{equation}
  L = -\frac{n}{2}ln\frac{RSS}{n}
\end{equation}

```{r}
AICc <- t(sapply(1:6, function(myModel){
  p <- 1 + length(coef(listNLS[[myModel]][[1]]))
  sapply(1:8, function(myDS){
    n <- length(listDS[[myDS]][,2])
    RSS <- sum((listDS[[myDS]][,2] - predict(listNLS[[myModel]][[myDS]]))^2)
    L <- -n/2 * log(RSS / n)
    return(-2 * L + 2 * p * n / (n - p - 1))
  })
}))
# number of parameters + 1
p <- sapply(1:6, function(myModel){
  p <- 1 + length(coef(listNLS[[myModel]][[1]]))
  return(p)
})
# sample size
n <- sapply(1:8, function(myDS){
  n <- length(listDS[[myDS]][,2])
  return(n)
})
rownames(AICc) <- paste0(c("Briere-1", "Briere-2", "Lactin", 
      "Perf-2", "Beta", "Ratkowsky"), " (", p, ")")
colnames(AICc) <- paste0(c("wu", "broufas", "trpis", "messenger", 
      "liu1", "liu2", "shirai", "deJong"), " (", n, ")")
```

__Table 7.__ AICc results with the number of parameters + 1 in parenthesis next to the model names 
and the sample size in parenthesis next to the sample names.
```{r, echo=FALSE, results='asis'}
knitr::kable(AICc) 
```

### [6] AIC

The AIC can also be computed with the AIC function in the stats package.

```{r}
AIC <- t(sapply(1:6, function(myModel){
  sapply(1:8, function(myDS){
    return(AIC(listNLS[[myModel]][[myDS]]))
  })
}))
rownames(AIC) <- c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky")
colnames(AIC) <- c("wu", "broufas", "trpis", "messenger", "liu1", "liu2", "shirai", "deJong")
```

__Table 8.__ AIC results 
```{r, echo=FALSE, results='asis'}
knitr::kable(AIC) 
```

## Visualization

The fitted models can be plotted using the devRatePlot function from the devRate package, or 
alternatively they can be plotted using the predict function from the stats package. In the 
following code, listDS[[4]] refers to the fourth dataset from Messenger and Flitters (1958) and 
listNLS[[i]][[4]] refers to the corresponding model fit.  

```{r, fig.width = 6, fig.height = 6}
plot(x = listDS[[4]][,1], 
  y = listDS[[4]][,2], 
  ylim = c(0, 1), ylab = "Development rate",
  xlim = c(5, 40), xlab = "Temperature", type = "n")
for(i in 1:6){
  points(x = seq(from = 0, to = 50, by = 0.1), 
    y = predict(listNLS[[i]][[4]], newdata = list(T = seq(from = 0, to = 50, by = 0.1))), 
    type = 'l', lwd = 2, col = i)
}
points(x = listDS[[4]][,1], y = listDS[[4]][,2], pch = 16, cex = 1.5)
legend("topleft", col = 1:6, lwd = 2, legend = c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky"))
```

__Figure 1.__ The six models fitted to the Messenger and Flitters (1958) dataset.

```{r, fig.width = 6, fig.height = 6}
plot(x = listDS[[3]][,1], 
  y = listDS[[3]][,2], 
  ylim = c(0, 0.7), ylab = "Development rate",
  xlim = c(5, 40), xlab = "Temperature", type = "n")
for(i in 1:6){
  points(x = seq(from = 0, to = 50, by = 0.1), 
    y = predict(listNLS[[i]][[3]], newdata = list(T = seq(from = 0, to = 50, by = 0.1))), 
    type = 'l', lwd = 2, col = i)
}
points(x = listDS[[3]][,1], y = listDS[[3]][,2], pch = 16, cex = 1.5)
legend("topleft", col = 1:6, lwd = 2, legend = c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky"))
```

__Figure 2.__ The six models fitted to the Trpis (1972) dataset.

```{r, fig.width = 7, fig.height = 6}
getPlot <- function(datasetNumber, ...){
  plot(x = listDS[[datasetNumber]][,1], 
    y = listDS[[datasetNumber]][,2], 
    ylab = "Development rate",
    xlab = "Temperature", type = "n", ...)
  for(i in 1:6){
    points(x = seq(from = 0, to = 50, by = 0.1), 
      y = predict(listNLS[[i]][[datasetNumber]], newdata = list(T = seq(from = 0, to = 50, by = 0.1))), 
      type = 'l', lwd = 2, col = i)
  }
  points(x = listDS[[datasetNumber]][,1], y = listDS[[datasetNumber]][,2], pch = 16, cex = 1.5)
  legend("topleft", col = 1:6, lwd = 2, 
         legend = c("Briere-1", "Briere-2", "Lactin", "Perf-2", "Beta", "Ratkowsky"))
}
par(mfrow = c(3, 2), mar = c(4, 4, 1, 1))
getPlot(datasetNumber = 1, ylim = c(0, 0.20), xlim = c(5, 40))
text(x = 40, y = 0.20, labels = "A", cex = 2, pos = 1)
getPlot(datasetNumber = 2, ylim = c(0, 0.25), xlim = c(5, 40))
text(x = 40, y = 0.25, labels = "B", cex = 2, pos = 1)
getPlot(datasetNumber = 5, ylim = c(0, 0.20), xlim = c(0, 40))
text(x = 40, y = 0.20, labels = "C", cex = 2, pos = 1)
getPlot(datasetNumber = 6, ylim = c(0, 0.20), xlim = c(0, 35))
text(x = 35, y = 0.20, labels = "D", cex = 2, pos = 1)
getPlot(datasetNumber = 7, ylim = c(0, 0.08), xlim = c(5, 40))
text(x = 40, y = 0.08, labels = "E", cex = 2, pos = 1)
getPlot(datasetNumber = 8, ylim = c(0, 0.15), xlim = c(5, 35))
text(x = 35, y = 0.15, labels = "F", cex = 2, pos = 1)
```

__Figure 3.__ The six models fitted to the other datasets: A: Wu et al. (2009), B: Broufas et al. (2007), 
C: Liu and Meng (2000), D: Liu and Meng (1999), E: Shirai and Yara (2001), and F: de Jong (2010).

## Conclusion

The first four datasets gave the same results as in Shi et al. 2016, and the last four different results, 
probably because the datasets retrieved from the source may differ (e.g., altae or apterae aphid data in 
Liu et al. 1999; 2000). This vignette exemplifies the use of the devRate package to compare models. For 
interpretations, see the extensive literature on model evaluation with RSS, R2, R2 adjusted, RMSE, AIC, 
AIC corrected available online, and the study by Shi et al. 2016.