--- 
title: "sdmpredictors quickstart guide" 
author: "Samuel Bosch" 
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette 
vignette: > 
  %\VignetteIndexEntry{sdmpredictors quickstart guide} 
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8} 
---

The goal of sdmpredictors is to make environmental data, commonly used for 
species distribution modelling (SDM), also called ecological niche modelling 
(ENM) or habitat suitability modelling, easy to use in R. It contains methods 
for getting metadata about the available environmental data for the current 
climate but also for future and paleo climatic conditions. A way to download the
rasters and load them into R and some general statistics about the different 
layers.

## Getting the metadata

Different list_* functions are available in order to find out which datasets and
environmental layers can be downloaded.

### list_datasets

With the *list_datasets* function you can view all the available datasets. If you 
only want terrestrial datasets then you have to set the marine parameter to 
FALSE and vice versa.

```{r, message=F, warning=F}
library(sdmpredictors)

# exploring the marine datasets 
datasets <- list_datasets(terrestrial = FALSE, marine = TRUE)
```

```{r, echo = FALSE}
knitr::kable(datasets, row.names = FALSE)
```

### list_layers

Using the *list_layers* function we can view all layer information based on datasets, 
terrestrial (TRUE/FALSE), marine (TRUE/FALSE) and/or whether it should be 
monthly data. The table only shows the first 4 columns of the first 3 layers.

```{r} 
# exploring the marine layers 
layers <- list_layers(datasets)
``` 
```{r, echo = FALSE} 
knitr::kable(layers[1:3,1:4], row.names = FALSE)
```

## Citing data

With the *dataset_citations* and *layer_citations* functions you can fetch plain text or bibentries for the datasets and layers used, allowing for proper citation of the data.

```{r}
# print the Bio-ORACLE citation
print(dataset_citations("Bio-ORACLE"))

# print the citation for ENVIREM as Bibtex
print(lapply(dataset_citations("WorldClim", astext = FALSE), toBibtex))

# print the citation for a MARSPEC paleo layer
print(layer_citations("MS_biogeo02_aspect_NS_21kya"))
```

## Loading the data

### load_layers

To be able to use the layers you want in R you have to call the *load_layers* 
function with

```{r, eval = FALSE}
# download pH and Salinity to the temporary directory
load_layers(layers[layers$name %in% c("pH", "Salinity") & 
                     layers$dataset_code == "Bio-ORACLE",], datadir = tempdir())

# set a default datadir, preferably something different from tempdir()
options(sdmpredictors_datadir= tempdir())

# (down)load specific layers 
specific <- load_layers(c("BO_ph", "BO_salinity"))

# equal area data (Behrmann equal area projection) 
equalarea <- load_layers("BO_ph", equalarea = TRUE)
```

## Loading future and paleo data

Similarly to the current climate layers

```{r}
# exploring the available future marine layers 
future <- list_layers_future(terrestrial = FALSE) 
# available scenarios 
unique(future$scenario) 
unique(future$year)

paleo <- list_layers_paleo(terrestrial = FALSE)
unique(paleo$epoch) 
unique(paleo$model_name) 
```

Other functions related to layers metadata and future and paleo layers are:

```{r} 
get_layers_info(c("BO_calcite","BO_B1_2100_sstmax","MS_bathy_21kya"))$common

# functions to get the equivalent future layer code for a current climate layer 
get_future_layers(c("BO_sstmax", "BO_salinity"), 
                  scenario = "B1", 
                  year = 2100)$layer_code 

# functions to get the equivalent paleo layer code for a current climate layer 
get_paleo_layers(c("MS_bathy_5m", "MS_biogeo13_sst_mean_5m"), 
                 model_name = c("21kya_geophysical", "21kya_ensemble_adjCCSM"), 
                 years_ago = 21000)$layer_code 
```

## Statistics

Two types of statistics are available for the current climate layers:

- individual layer statistics 
    - summary statistics (minimum, q1, median, q3, maximum, mad, mean and sd)
    - spatial autocorrelation (Moran's I and Geary's C) 
- Pearson correlation coefficient between the different layers and their 
quadratic

```{r, message=F, warning=F} 
# looking up statistics and correlations for marine annual layers
datasets <- list_datasets(terrestrial = FALSE, marine = TRUE)
layers <- list_layers(datasets)

# filter out monthly layers
layers <- layers[is.na(layers$month),]

layer_stats(layers)[1:2,]

correlations <- layers_correlation(layers)

# create groups of layers where no layers in one group 
# have a correlation > 0.7 with a layer from another group 
groups <- correlation_groups(correlations, max_correlation=0.7)

# group lengths
sapply(groups, length)

for(group in groups) {
  if(length(group) > 1) {
    cat(paste(group, collapse =", "))
    cat("\n")
  }
}

# plot correlations (requires ggplot2)
plot_correlation(correlations)
```