---
title: "Main functions: abundance data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Main functions: abundance data}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/7.0.1/css/all.min.css" integrity="sha512-2SwdPD6INVrV/lHTZbO2nodKhrnDdJK9/kg2XD1r9uGqPo1cUbujc+IYdlYdEErWNu69gVcYgdxlmVmzTWnetw==" crossorigin="anonymous" referrerpolicy="no-referrer" />


## <i class="fa-solid fa-code"></i> Complete code example

Here are presented, in a full and complete example, all main functions (starting with `BIOMOD_[...]`) of `biomod2`.

The data set used is the [DataSTOC](../reference/DataSTOC.html) containing abundance data. <br/> Similar examples are presented for binary data on the [Main functions (bin) webpage](examples_1_mainFunctions_BIN.html).

<br/><br/>


### <i class="fa-solid fa-truck-ramp-box"></i> Load dataset and variables

```R
library(biomod2)
library(terra)

# Load species abundances (20 species available)
data('DataSTOC')
head(DataSTOC)

# Divide into calibration/validation and evaluation
period1 <- which(DataSTOC[, 'period'] == '2006-2011')
period2 <- which(DataSTOC[, 'period'] == '2012-2017')

# Select the name of the studied species
myRespName <- 'Periparus.ater'

# Get corresponding count data
myResp <- as.numeric(DataSTOC[, myRespName])

# Get corresponding XY coordinates
myRespXY <- DataSTOC[, c('X_WGS84', 'Y_WGS84')]

# Get corresponding environmental variables
myExpl <- DataSTOC[, c('temp', 'precip', 'sdiv_hab',
                       'cover_agri', 'cover_water', 'cover_wet')]
```


<br/><br/>


### <i class="fa-solid fa-list-check"></i> Prepare data & parameters

#### <i class="fa-solid fa-spell-check"></i> Format data (observations & explanatory variables)

Main difference when providing abundance data instead of binary is to define the `data.type` parameter. <br/> The distribution used in the selected algorithms afterwards will be set accordingly, for example :

| Type          | Data                                          | Distribution   |
| -------------- | ---------------------------------------------- | --------------- |
| `binary`      | Only 1, and 0 or NA                           | `binomial`     |
| `count`       | Positive integer values                       | `poisson`      |
| `multiclass`  | Categorical classes                           | classification |
| `ordinal`     | Ordered categorical classes                   | classification |
| `relative`    | Continuous values between 0 and 1             | beta           |
| `abundance`   | Positive continuous values                    | `gaussian`     |


```R
# Format data with count data type
myBiomodData <- BIOMOD_FormatingData(resp.name = myRespName, 
                                     resp.var = myResp[period1],
                                     resp.xy = myRespXY[period1, ],
                                     expl.var = myExpl[period1, ],
                                     eval.resp.var = myResp[period2],
                                     eval.resp.xy = myRespXY[period2, ],
                                     eval.expl.var = myExpl[period2, ],
                                     data.type = 'count')
myBiomodData
summary(myBiomodData)
plot(myBiomodData, plot.eval = TRUE)
```

*Note that pseudo-absence selection is only possible when `data.type = 'binary'`.*


#### <i class="fa-solid fa-scissors"></i> Cross-validation datasets

Several cross-validation methods are available and can be selected with the [`BIOMOD_Modeling`](../reference/BIOMOD_Modeling.html) function, which calls the [`bm_CrossValidation`](../reference/bm_CrossValidation.html) function to do so. More examples are presented on the [Auxiliary functions webpage](examples_2_auxiliaryFunctions.html).

When using the `balance` parameter with `multiclass` or `ordinal` data, proportions are balanced within each class as much as possible.

```R
# # k-fold selection
# cv.k <- bm_CrossValidation(bm.format = myBiomodData,
#                            strategy = 'kfold',
#                            nb.rep = 2,
#                            k = 3)
# 
# # stratified selection (geographic)
# cv.s <- bm_CrossValidation(bm.format = myBiomodData,
#                            strategy = 'strat',
#                            k = 2,
#                            balance = 'presences',
#                            strat = 'x')
# head(cv.k)
# head(cv.s)
```

#### <i class="fa-solid fa-rectangle-list"></i> Retrieve modeling options

Modeling options are automatically retrieved from selected models within the [`BIOMOD_Modeling`](../reference/BIOMOD_Modeling.html) function, which calls the [`bm_ModelingOptions`](../reference/bm_ModelingOptions.html) function to do so. Model parameters can also be automatically tuned to a specific dataset, by calling the [`bm_Tuning`](../reference/bm_Tuning.html) function, however it can be quite long. More examples are presented on the [Auxiliary functions webpage](examples_2_auxiliaryFunctions.html).

```R
# # bigboss parameters
# opt.b <- bm_ModelingOptions(data.type = 'count',
#                             models = c('DNN', 'XGBOOST'),
#                             strategy = 'bigboss')
# 
# # tuned parameters with formated data
# opt.t <- bm_ModelingOptions(data.type = 'count',
#                             models = c('DNN', 'XGBOOST'),
#                             strategy = 'tuned',
#                             bm.format = myBiomodData)
# 
# opt.b
# opt.t
```

<br/><br/>


### <i class="fa-solid fa-gear"></i> Run modeling

Not all algorithms are available when it comes to data type other than binary.

|   | ANN | CTA | DNN | FDA | GAM | GBM | GLM | MARS | MAXENT | MAXNET | RF | RFd | SRE | XGBOOST |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| binary     | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
| multiclass |  | x | x | x |  |  |  | x |  |  | x |  |  | x |
| ordinal    |  | x | x | x | x |  | x | x |  |  | x |  |  | x |
| count / relative / abundance |  | x | x |  | x | x | x | x |  |  | x |  |  | x |

New evaluation metrics have been added :

- `Accuracy`, `Recall`, `Precision` and `F1` for multiclass and ordinal
- `RMSE`, `MSE`, `MAE`, `Max_error`, `Rsquared` and `Rsquared_aj` for count, relative and abundance

as well as new ensemble models :

- `EMmode` and `EMfreq` for multiclass and ordinal

<i class="fas fa-exclamation-triangle"></i> The selection of single models for the ensemble modeling is different for `RMSE`, `MSE`, `MAE` and `Max_error` metrics : are kept all models whose evaluation value is < (best value + the given threshold).


#### <i class="fa-solid fa-virus"></i> Single models

```R
# Model single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'AllModels',
                                    models = c('DNN', 'GBM', 'GLM', 'MARS', 'RF', 'XGBOOST'),
                                    CV.strategy = 'block',
                                    OPT.strategy = 'bigboss',
                                    metric.eval = c('Rsquared', 'Rsquared_aj', 'RMSE'),
                                    var.import = 3)
                                    # seed.val = 123)
                                    # nb.cpu = 8)
myBiomodModelOut

# Get evaluation scores & variables importance
get_evaluations(myBiomodModelOut)
get_variables_importance(myBiomodModelOut)

# Represent evaluation scores & variables importance
bm_PlotEvalMean(bm.out = myBiomodModelOut, dataset = 'calibration')
bm_PlotEvalMean(bm.out = myBiomodModelOut, dataset = 'validation')
bm_PlotEvalBoxplot(bm.out = myBiomodModelOut, dataset = 'calibration', group.by = c('algo', 'algo'))
bm_PlotEvalBoxplot(bm.out = myBiomodModelOut, dataset = 'calibration', group.by = c('algo', 'run'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'algo'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'run'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('algo', 'expl.var', 'run'))

# Represent response curves
bm_PlotResponseCurves(bm.out = myBiomodModelOut, 
                      models.chosen = get_built_models(myBiomodModelOut)[c(1:3, 7:9)],
                      fixed.var = 'median')
bm_PlotResponseCurves(bm.out = myBiomodModelOut, 
                      models.chosen = get_built_models(myBiomodModelOut)[c(1:3, 7:9)],
                      fixed.var = 'min')
bm_PlotResponseCurves(bm.out = myBiomodModelOut, 
                      models.chosen = get_built_models(myBiomodModelOut)[3],
                      fixed.var = 'median',
                      do.bivariate = TRUE)
                      
# Explore models' outliers & residuals
bm_ModelAnalysis(bm.mod = myBiomodModelOut,
                 models.chosen = get_built_models(myBiomodModelOut)[c(1:3, 7:9)])
```


#### <i class="fa-solid fa-viruses"></i> Ensemble models

```R
# Model ensemble models
myBiomodEM <- BIOMOD_EnsembleModeling(bm.mod = myBiomodModelOut,
                                      models.chosen = 'all',
                                      em.by = 'all',
                                      em.algo = c('EMmedian', 'EMmean', 'EMwmean'),
                                      metric.select = c('Rsquared_aj', 'RMSE'),
                                      metric.select.thresh = c(0.4, 2),
                                      metric.eval = c('Rsquared', 'Rsquared_aj', 'RMSE'),
                                      var.import = 3)
myBiomodEM

# Get evaluation scores & variables importance
get_evaluations(myBiomodEM)
get_variables_importance(myBiomodEM)

# Represent evaluation scores & variables importance
bm_PlotEvalMean(bm.out = myBiomodEM, dataset = 'calibration', group.by = 'full.name')
bm_PlotEvalBoxplot(bm.out = myBiomodEM, dataset = 'calibration', group.by = c('full.name', 'full.name'))
bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('expl.var', 'full.name', 'full.name'))
bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('expl.var', 'algo', 'merged.by.run'))
bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('algo', 'expl.var', 'merged.by.run'))

# Represent response curves
bm_PlotResponseCurves(bm.out = myBiomodEM, 
                      models.chosen = get_built_models(myBiomodEM)[c(2, 5)],
                      fixed.var = 'median')
bm_PlotResponseCurves(bm.out = myBiomodEM, 
                      models.chosen = get_built_models(myBiomodEM)[c(2, 5)],
                      fixed.var = 'min')
bm_PlotResponseCurves(bm.out = myBiomodEM, 
                      models.chosen = get_built_models(myBiomodEM)[5],
                      fixed.var = 'median',
                      do.bivariate = TRUE)
```

<br/><br/>


### <i class="fa-solid fa-earth-europe"></i> Project models

The `digits` parameter is used to define the number of digits after the decimal point for the predicted values. <br/> For relative data, the `on_0_1000` parameter can be used in the same way than with binary data.

*Note that `integer` values are lighter when saved in memory than `float` (decimal values).*


#### <i class="fa-solid fa-virus"></i> Single models

```R
# Project single models
myBiomodProj <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                  proj.name = 'Period1',
                                  new.env = myExpl[period1, ],
                                  new.env.xy = myRespXY[period1, ],
                                  models.chosen = 'all',
                                  build.clamping.mask = TRUE, 
                                  digits = 1)
myBiomodProj
plot(myBiomodProj, coord = myRespXY[period1, ])
```

#### <i class="fa-solid fa-viruses"></i> Ensemble models

```R
# Project ensemble models (from single projections)
myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = myBiomodEM, 
                                             bm.proj = myBiomodProj,
                                             models.chosen = 'all')

# Project ensemble models (building single projections)
myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = myBiomodEM,
                                             proj.name = 'Period1EM',
                                             new.env = myExpl[period1, ],
                                             models.chosen = 'all')
myBiomodEMProj
plot(myBiomodEMProj, coord = myRespXY[period1, ])
```


<br/><br/>


### <i class="fa-solid fa-ruler-combined"></i> Compare range sizes

```R
# Project onto future conditions
myBiomodProjectionFuture <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                              proj.name = 'Period2',
                                              new.env = myExpl[period2, ],
                                              new.env.xy = myRespXY[period2, ],
                                              models.chosen = 'all',
                                              build.clamping.mask = TRUE)

# Compute differences
myBiomodRangeSize <- BIOMOD_RangeSize(proj.current = myBiomodProj, 
                                      proj.future = myBiomodProjectionFuture,
                                      metric.binary = 'TSS')

myBiomodRangeSize@Compt.By.Models
```

<br/><br/>


### <i class="fa-solid fa-file-export"></i> Export a report

```R
# Get a summary report
BIOMOD_Report(bm.out = myBiomodEM, strategy = 'report')

# Get a pre-filled ODMAP
BIOMOD_Report(bm.out = myBiomodEM, strategy = 'ODMAP')

# Get a code report
BIOMOD_Report(bm.out = myBiomodEM, strategy = 'code')
```

<br/><br/>
