---
title: "5. Design Principles for fundiversity"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{5. Design Principles for fundiversity}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This package is built with a clear set of design principles based on current
best practices. These design principles have been established before writing any
code to make sure obstacles along the way would not lower our expectations.

## Scope

fundiversity aims to provide a reliable tool to compute functional diversity
indices. As some of the most used functional diversity indices are defined in
Villéger et al. (2008), fundiversity adopts this framework.
Rao's Quadratic Entropy is quite popular as an additional functional diversity
index, which makes it a good addition to the panel of indices computable with
fundiversity.

## Dependencies

We aim at having as few as dependencies as possible, unless they remain 
relatively lightweight **and** provide a large speed boost.

As CRAN packages are now automatically archived at the same time of any of their
strong dependencies, dependencies for fundiversity should be well established,
have a good track record at remaining on CRAN, and ideally already have a large
number of reverse-dependencies.

Based on these guidelines, some acceptable dependencies are:

- [Rccp](https://cran.r-project.org/package=Rcpp)
- [geometry](https://davidcsterratt.github.io/geometry/)

Additionally, special care is taken with packages that rely on external
libraries, as [their installation might be an issue on shared computing 
platforms where users don't have super-user privileges](https://twitter.com/groundwalkergmb/status/1350336844022521857).

fundiversity does however depend on 
[vegan](https://cran.r-project.org/package=vegan) which brings quite a number of
other dependencies. vegan is still heavy developed and as such shouldn't be 
archived by CRAN without notice. fundiversity users probably already have vegan
installed as they are probably also interested in other community ecology
analyses provided by vegan. Thus depending on vegan is not a dependency nor
an installation burden.

## Functions

### Each index should be computed in its own separate function

Putting each index in its own function improves maintainability by having 
shorter, more easily readable functions, with less control-flow.

Additionally, it speeds up computation (versus the case where all indices 
would be computed each time) and keep the number of columns in the output 
constant (as opposed to the case where an argument would control which index
is returned by a single function with all indices).

### Input data should not be transformed without any explicit action from the user

Some packages in functional ecology transform the data before processing it.
One common such transformation is the use of dimensionality reduction 
techniques.

Such transformations should only be done by the user if they wish to do so but
never by the package itself. Even when documented in the function help, it is
easily overlooked by users and may lead to misinterpreted results.

One acceptable exception is when function need dissimilarity matrix as input,
such as for the computation of FEve or Rao's Quadratic Entropy. Because other
functions in fundiversity accept raw trait data, for the sake of coherence
the functions that need dissimilarity matrix as input should also accept
raw trait data and compute dissimilarity internally. To do so, they should
still make minimal assumptions regarding the input dataset that all traits are
quantitative. If it's not the case, then it's the user's responsibility
to choose the adapted dissimilarity metric she wants to use.

## Inputs

Two inputs are acceptable in fundiversity functions:

- tabular data (such as a `matrix`, or a `tibble`, but with a preference for
`data.frame`s)  with individuals as rows and characteristics as columns.
- a distance matrix between individuals.

The rationale is that providing `data.frame`s with characteristics is more 
user-friendly, as they are a common format in functional ecology, and 
`data.frame`s are a familiar object in R.

However, only allowing `data.frame`s doesn't provide enough flexibility. In
particular, advanced users may want to compute distances with a custom function 
(instead of Euclidean distances).

## Outputs

### Functions should output `data.frame`s

The functions should return `data.frame`s whenever possible for several reasons:

- `data.frame`s are one of the most common object in the R ecosystem meaning 
they are familiar to beginners and there exist many S3 methods for them.
- `data.frame`s enable a pipe-able workflow, which is important it is already 
prevalent in the tidyverse and will be part of base R 4.1

### The outputs of functions should be similar in structure

To avoid further steps of data wrangling after running analyses, all functions
should output similar structured data. When computing functional diversity
indices most users want to compute site-level metric. To make our functions
easy to use and flexible, they should only output the absolutely necessary data:

- a column named `site` containing the site names as given by the user 
(guessed from the row names in the input site-species matrix),
- a column named following the computed functional index (`FDiv`, `FRic`, etc.)
that contains the values of the indices.

That way our functions only outputs unambiguous data that can easily be reused
and merged with other data at the site-level.