\documentclass[11pt, A4]{article}
%\usepackage[brazil]{babel}
\usepackage{graphicx}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{url}
\usepackage{Sweave}
\usepackage{natbib}
\usepackage{framed, color}
\usepackage{xspace}
\definecolor{shadecolor}{rgb}{0.9, 0.9, 0.9}
\setlength{\parindent}{0pt}
\setlength{\hoffset}{-0.5in}
\setlength{\textwidth}{6in}
\setlength{\voffset}{-0.1in}
%\pdfpagewidth=\paperwidth
%\pdfpageheight=\paperheight
\newcommand{\R}{\textnormal{\sffamily\bfseries R}\xspace}
% altered bc \sf is obsolete, see:
% http://tex.stackexchange.com/questions/74478/latex-command-incantation-for-r 
\newcommand{\code}[1]{\texttt{#1}}
\SweaveOpts{eval=TRUE, keep.source=TRUE, echo=TRUE}
%\VignetteIndexEntry{Introduction to sads}

\begin{document}

\title{Fitting species abundance models with maximum likelihood \\ Quick reference for \code{sads} package}
\author{Paulo In\'acio Prado,  Murilo Dantas Miranda and Andre Chalom \\ Theoretical Ecology Lab \\ LAGE at the Dep of Ecology, USP, Brazil \\ 
  %\url{http://ecologia.ib.usp.br/let/} \\
  \url{prado@ib.usp.br}}

\date{\today}

\maketitle

@ 
<<R setup, echo=FALSE, >>=
options(width=60, continue=" ")
@ %def 

\section{Introduction}

Species abundance distributions (SADs) are one of the basic patterns
of ecological communities \citep{McGill2007}. 
The empirical distributions are
traditionally modeled through probability distributions. Hence, the
maximum likelihood method can be used to fit and compare competing
models for SADs. 
The package \code{sads} provides functions to fit the most used models
to empirical SADs. The resulting objects have methods to evaluate fits and compare competing
models. The package also allows the simulation of SADs expected from communities' 
samples, with and without aggregation of individuals of the same species.


\section{Installation}

The package is available on CRAN and can be installed in \R with the command:

@ 
<<installation, eval=FALSE>>=
install.packages('sads')
@ %def 

then loaded by

<<load-sads, eval=TRUE>>=
library(sads)
@ %def 

\subsection{Developer version}
\label{sec:developer-version}

The current developer version can be installed from GitHub with:

@
<<install-dev-version, eval=FALSE >>=
library(devtools)
install_github(repo = 'piLaboratory/sads', ref= 'dev', build_vignettes = TRUE)
@ 

And then load the package:

@ 
<<load-sads-package >>=
library(sads)
@ %def 

\section{Exploratory analyses}
\label{sec:analise-exploratoria}

Throughout this document we'll use two data sets of abundances from the sads
package. For more information on these data please refer to their help
pages: 

@ 
<<Loading datasets>>=
data(moths)# William's moth data used by Fisher et al (1943)
data(ARN82.eB.apr77)# Arntz et al. benthos data
data(birds)# Bird census used by Preston (1948)
@ %def 

\subsection{Octaves}
\label{sec:oitavas}

Function \code{octav} tabulates the number of species in classes
of logarithm of abundances at base 2 (Preston's octaves) and returns a data frame 
\footnote{actually an object of class \emph{octav} which inherits from class \emph{dataframe}}:

@ 
<<Tabulating species in octaves>>=
(moths.oc <- octav(moths))
(arn.oc <- octav(ARN82.eB.apr77))
@ %def 

A logical argument \code{preston} allows smoothing the numbers as proposed by \citet{Preston1948}. 

The octave number is the upper limit of the class in log2 scale. 
Hence, for abundance values smaller than one (\emph{e.g.} biomass data) the octave numbers are negative.
A Preston plot is a histogram of this table, obtainable by applying the function \code{plot} to the data frame:

\setkeys{Gin}{width=0.65\textwidth}

{\centering
@ 
<<Ploting-octaves, fig=TRUE, height=5,width=5>>=
plot(moths.oc)
@ def 
}

{\centering
@ 
<<Biomass-octave-plot, fig=TRUE, height=5,width=5>>=
plot(arn.oc)
@ %def
}

The plot method for objects of class \code{octav} has 
a logical argument \code{prop} that rescales the y-axis to relative frequencies of species in each octave,
which can be used to compare different data sets:

@ 
<<octaves-relative-frequencies>>=
plot(moths.oc, prop = TRUE, border=NA, col=NA)
lines(octav(birds), mid = FALSE, prop = TRUE, col="red")
lines(octav(moths), mid = FALSE, prop = TRUE)
legend("topright", c("Preston's birds", "Fisher's moths"), col=c("red", "blue"), lty=1, bty="n")
@ %def 


\subsection{Rank-abundance plots}
\label{sec:rank_abund}
Function \code{rad} returns a data frame of sorted abundances and their ranks 
\footnote{actually an object of class \emph{rad} which inherits from class \emph{dataframe}}:

@ 
<<Rank-abundance tables>>=
head(moths.rad <- rad(moths))
head(arn.rad <- rad(ARN82.eB.apr77))
@ %def 

To get the rank-abundance or Whitaker's plot apply the function \code{plot} on the data frame:

\setkeys{Gin}{width=0.65\textwidth}
{\centering % centers BOTH of the next plots
@ 
<<radplot1, fig=TRUE, height=5, width=5>>=
plot(moths.rad, ylab="Number of individuals")
@ %def 
}

{\centering
@ 
<<radplots, fig=TRUE, height=5, width=5>>=
plot(arn.rad, ylab="Biomass")
@ %def 
}

Again, the plot method for \code{rad}  has a logical argument \code{prop} 
rescales the y-axis to depict relative abundances:

@ 
<<rads-relative-abundances>>=
plot(moths.rad, prop = TRUE, type="n")
lines(rad(birds), prop = TRUE, col="red")
lines(rad(moths), prop = TRUE)
legend("topright", c("Preston's birds", "Fisher's moths"), col=c("red", "blue"), lty=1, bty="n")
@ %def 

\section{Model fitting}
\label{sec:ajuste-e-selecao}
The \emph{sads} package provides maximum-likelihood fits of many
probability distributions to empirical sads. The working horses are the
functions \code{fitsad} for fitting species abundance distributions
and \code{fitrad} for fitting rank-abundance distributions. The first
argument of these functions is the vector of observed abundances 
\footnote{\code{fitrad} also accepts a rank-abundance table returned by function \code{rad} as its first argument.} 
The second argument is the name of the model to be fitted.
Please refer to the help
page of the functions for details on the models. For more information
on the fitting procedure see also the vignette of
the \emph{bbmle} package, on top of which the package \emph{sads} is built.

To fit a log-series distribution use the argument \code{sad='ls'}:

@ 
<<Fitting a log-series model>>=
(moths.ls <- fitsad(moths,'ls'))
@ %def 

The resulting model object inherits from \emph{mle2}
\citep{Bolkerbbmle}, and has all usual methods for model objects, such as
summaries, log-likelihood, and AIC values:
@ 
<<Operations on fitsad object>>=
summary(moths.ls)
coef(moths.ls)
logLik(moths.ls)
AIC(moths.ls)
@ %def 

On the above examples, notice that the \code{print} method\footnote{Or, equivalently, the \code{show} method}
displays some statistics on the input data and 
fitting function used - number of species, number of individuals, truncation point for the probability 
distribution (when used, see below) and whether we are fitting a discrete or
continuous sad or rad - while the \code{summary} method displays information more associated with the
fitting \emph{per se}: standard errors and significance codes for each parameter. Also, notice that
the input data is displayed by both methods, but the \code{print} method only shows the first values, as the
complete list can be quite large.

\subsection{Model diagnostics}
\label{sec:model-diagnostics}
Many other diagnostic and functions are available for sad and rad models. 
To get likelihood profiles, likelihood intervals and confidence intervals use:

\setkeys{Gin}{width=\textwidth}

@ 
<<Profiling and intervals>>=
moths.ls.prf <- profile(moths.ls)
likelregions(moths.ls.prf) #likelihood intervals
confint(moths.ls.prf)
@ %def 

Then use \code{plotprofmle} to plot likelihood profiles at the original scale (relative negative log-likelihood)
and function \code{plot} to get plots at chi-square scale (square-root of twice the relative log-likelihood):

\setkeys{Gin}{width=\textwidth}

@ 
<<Ploting-profiles, fig=TRUE, height=5, width=10>>=
par(mfrow=c(1,2))
plotprofmle(moths.ls.prf)# log-likelihood profile
plot(moths.ls.prf)# z-transformed profile
par(mfrow=c(1,1))
@ %def

\begin{shaded}
  \textbf{Likelihood intervals and confindence intervals:} \hfill
  Likelihood intervals include all values of the parameters that have up to a given 
  log-likelihood absolute difference to the maximum likelihood estimate.
  This difference is the log-likelihood ratio and is set with the argument \code{ratio} 
  of function \code{likelregions}. The default value of \code{ratio} is log(8), and thus
  in the example above the likelihood interval encloses all values of the parameter 
  that are up to 8 times as plausible
  as the estimated value of $\alpha = $ \Sexpr{unname(round(coef(moths.ls)[2],2))}.
  
  Likelihood intervals at log(8) converge to the value of confidence intervals at 95\%
  as sample size increases. In most cases even for moderate sample sizes the limits of
  confidence and likelihood intervals are very close. Discrepancies occur only when the likelihood profile
  is highly asymmetric or have local {\em minima}. But in this kind of profile usually indicates an ill-behaved
  fit, and so the intervals may not be meaningful anyway.
\end{shaded}


When applied on a sad model object, the function \code{plot} returns four diagnostic plots:
@ 
<<Plot-of-predicted-values, fig=TRUE>>=
par(mfrow=c(2,2))
plot(moths.ls)
par(mfrow=c(1,1))
@ %def 


The first two plots (top right and left) are the octave and rank-abundance plots with the predicted values 
of number of species in each octave 
and of each species' abundance. The two last plots (bottom) are quantile-quantile and percentile-percentile graphs of 
the observed vs. predicted abundances. The straight line indicates the expected relation in case of perfect fit.

\subsection{SADs vs RADs}

Species-abundance models assign a probability for each abundance
value. Thus, these models are probability density functions (PDFs) of
abundances of species. 
Rank-abundance models assign a probability for each
\textbf{abundance rank}. They are PDFs for rankings of species. The models are
interchangeable \citep{May1975}, but currently only four rad models
are available in package sads through the argument \code{rad} of
function \code{fitrad}:

\begin{itemize}
\item ``gs'': geometric series (which is NOT geometric PDF, available
  in \code{fitsad} as ``geom'')
\item ``rbs'': broken-stick model \citep{macarthur1957, May1975}
\item ``zipf'': Zipf power-law distribution
\item ``mand'': Zipf-Mandelbrot power-law distribution
\end{itemize}

\begin{shaded}
  \textbf{Comparison to \code{radfit} from \emph{vegan} package:} \hfill
  
  fits by \code{fitsad}, \code{fitrad} and \code{radfit} of \emph{vegan} 
  package provide similar estimates of model coefficients 
  but not comparable likelihood values. The reason for this is the fact each function fits models that assign 
  probability values to data in different ways. Function \code{fitsad} fits PDFs to observed abundances and \code{fitrad} fits PDFs 
  to the ranks of the abundances. Finally, \code{radfit} of \emph{vegan} fits a Poisson generalized linear model 
  to the \emph{expected abundances} deduced 
  from rank-abundance relationships from the corresponding sads and rads models \citep{wilson1991}. 
  See also the help page of \code{radfit}. 
  Therefore \textbf{likelihoods obtained from these three functions are not comparable}.
\end{shaded}

\section{Model selection}

It's possible to fit other models to the same data set, such as the Poisson-lognormal and a truncated lognormal:
@ 
<<Fitting two other models>>=
(moths.pl <- fitsad(x=moths, sad="poilog"))#default is zero-truncated
(moths.ln <- fitsad(x=moths, sad="lnorm", trunc=0.5)) # lognormal truncated at 0.5
@ %def 

moreover, the function \code{AICtab} and friends from the \emph{bbmle} package can be used to get a model selection table:

@ 
<<Model selection table>>=
AICtab(moths.ls, moths.pl, moths.ln, base=TRUE)
@ %def 

\begin{shaded}
  \textbf{Flawed model comparisons} \hfill The information criterion
  methods do not differentiate between objects of \code{fitsad},
  \code{fitrad} or \code{fitsadC} classes. Because of this, it is
  possible to include \code{fitsad} and \code{fitrad} objects in the
  same IC-table without generating an error, but the result will be
  meaningless.
\end{shaded}

To compare visually fits first get octave tables:

@ 
<<Predicted values for octaves>>=
head(moths.ls.oc <- octavpred(moths.ls))
head(moths.pl.oc <- octavpred(moths.pl))
head(moths.ln.oc <- octavpred(moths.ln))
@ %def 

then use \code{lines} to superimpose the predicted values on the octave plot:

\setkeys{Gin}{width=0.75\textwidth}
@ 
<<Octaves-plot, fig=TRUE>>=
plot(moths.oc)
lines(moths.ls.oc, col="blue")
lines(moths.pl.oc, col="red")
lines(moths.ln.oc, col="green")
legend("topright", 
       c("Logseries", "Poisson-lognormal", "Truncated lognormal"), 
       lty=1, col=c("blue","red", "green"))
@ %def 

To do the same with rank-abundance plots get the rank-abundance objects:

@ 
<<Predicted values - radplots>>=
head(moths.ls.rad <- radpred(moths.ls)) 
head(moths.pl.rad <- radpred(moths.pl))
head(moths.ln.rad <- radpred(moths.ln))
@ %def 

then plot observed and predicted values:

@ 
<<Rad-plots, fig=TRUE>>=
plot(moths.rad)
lines(moths.ls.rad, col="blue")
lines(moths.pl.rad, col="red")
lines(moths.ln.rad, col="green")
legend("topright", 
       c("Logseries", "Poisson-lognormal", "Truncated lognormal"), 
       lty=1, col=c("blue","red", "green"))
@ %def 

\subsection{Abundance class data}

For ecological communities, the representation of species abundances
typically occurs through categorization. For instance, when assessing
the abundance of sessile organisms, it is common practice to utilize a
scale based on the coverage of sampling areas, because direct enumeration
of individuals or estimation of biomass are extremely laborious.

The package \code{sads} has a specific class for fitting continuous
distributions for this kind of data. We will show the use of this
class using the data from \citet{Vieira2020}, who provide the coverage
class of each plant species in plots set in grasslands in
Southern Brazil. The object \code{grasslands} has the data from plot
'CA8', which has the largest number of species recorded in this study.

@ 
<<grasslands dataset>>=
head(grasslands)
@ %def

The vector \code{cover} in this data frame has the cover class for
each plant species, that is, the  proportion of the area of the
plot covered by all individuals of each species. Coverage is expressed
in a scale with 13 intervals, as defined by the breakpoints below:

@ 
<<grasslands breakpoints>>=
(grass.brk <- c(0,1,3,5,seq(15,100, by=10),100) )
@ %def

We can tally the number of species in each cover class using a histogram:

@ 
<<grasslands histogram>>=
 grass.h <- hist(grasslands$mids, breaks = grass.brk, plot = FALSE)
@ %def

The resulting object of the class \code{histogram} has the number of
species in each cover class, as well as the classes midpoints:

@
<<grasslands histogram data>>=
 data.frame(midpoint = grass.h$mids, N.spp = grass.h$counts) 
@ %def


The function \code{fitsadC} fits continuous distributions usually
applied to describe this kind of data. The commands below fit all
models currently available for abundance class data in the package
\code{sads}. Note that the data is provided to this function through an object
of the class \code{histogram}:

@
<<grasslands model fit >>=
grass.e <- fitsadC(grass.h, 'exp') # Exponential
grass.g <- fitsadC(grass.h, 'gamma') # Pareto
grass.l <- fitsadC(grass.h, 'lnorm') # Log-normal
grass.p <- fitsadC(grass.h, 'pareto') # Pareto
grass.w <- fitsadC(grass.h, 'weibull') # Weibull
@ %def

The fitted models are of class \code{fitsadC}, for which most of the
methods for diagnostics and model comparison showed in the previous sections are
available. For instance, the fitted models can be compared in the
usual way:

@
<<grasslands model selection >>=
AICctab(grass.e, grass.g, grass.l,
        grass.p, grass.w,
        weights = TRUE, base = TRUE)
@ %def

A histogram with observed values can ploted simply with

@
<<grasslands_hist_plot, fig = TRUE >>=
plot(grass.h, main = "", xlab = "Abundance class")
@ %def

By default, R plots histograms with classes of unequal size using a
density scale. The function \code{coverpred} provides the number of
species expected by a fited model, in frequency, relative frequency
and density scales.

@
<<grasslands coverpred >>=
## Predicted by each model
grass.e.p <- coverpred(grass.e)
grass.g.p <- coverpred(grass.g)
grass.l.p <- coverpred(grass.l)
grass.p.p <- coverpred(grass.p)
grass.w.p <- coverpred(grass.w)
@ %def

We can then use this object to add the densities
values predicted by each model:

@
<<grasslands_coverpred_plot, fig = TRUE>>=
## Plot
plot(grass.h, main = "", xlab = "Abundance class", xlim = c(0,40))
## Adds predicted points
points(grass.e.p, col = 1)
points(grass.g.p, col = 2)
points(grass.l.p, col = 3)
points(grass.p.p, col = 4)
points(grass.w.p, col = 5)
legend("topright",
       legend = c("Exponential", "Gamma", "Log-normal", "Pareto", "Weibull"),
       col = 1:5, bty = "n", 
       lty = 1 , pch = 1)
@ %def

Please refer to the man pages of \code{fitsadC} and
\code{fitsaC-class} for further methods and usage.

\section{Simulations}

The function \code{rsad} returns random samples of a community with
$S$ species. The mean abundances of the species in the communities are
independent identically distributed (\emph{iid}) variables that follow
a given probability distribution. The sample simulates a given number
of draws of a fraction $a$ from the total number of individuals in the
community. For instance, to simulate two Poisson samples of 10\% of a
community with 10 species that follows a lognormal distribution with
parameters $\mu=3$ and $\sigma=1.5$ use:

@ 
<<rsad-example1>>=
set.seed(42)# fix random seed to make example reproducible
(samp1 <- rsad(S = 10, frac = 0.1, sad = "lnorm", 
               coef=list(meanlog = 3, sdlog = 1.5),
               zeroes=TRUE, ssize = 2))
@ 

The function returns a data frame with a sample numeric label,
species' numeric label and species' abundance in each sample. By
default, \code{rsad} returns a vector of abundances of single Poisson
sample with zeroes omitted:

@ 
<<rsad-example2>>=
(samp2 <- rsad(S = 100, frac=0.1, sad="lnorm", 
               list(meanlog=5, sdlog=2)))
@ 

Since this is a Poisson sample of a lognormal community, the abundances
in the sample should follow a Poisson-lognormal distribution with
parameters $\mu + \log a $ and $\sigma$
\citep{grotan2008}. We can check this by fitting a Poisson-lognormal
model to the sample:

@ 
<<rsad-poilog-fit>>=
(samp2.pl <- fitsad(samp2, "poilog"))
## checking correspondence of parameter mu
coef(samp2.pl)[1] - log(0.1)
@ %def 

Not bad. By repeating the sampling and the fit many times it's possible to
evaluate the bias and variance of the maximum likelihood estimates:

\setkeys{Gin}{width=\textwidth}
@ 
<<rsad-repeated-samples>>=
results <- matrix(nrow=75,ncol=2)
for(i in 1:75){
    x <- rsad(S = 100, frac=0.1, sad="lnorm", 
              list(meanlog=5, sdlog=2))
    y <- fitsad(x, "poilog")
    results[i,] <- coef(y)
}
results[,1] <- results[,1]-log(0.1)
@ 

Bias is estimated as the difference between the mean of estimates and
the value of parameters:

@ 
<<rsads_bias>>=
##Mean of estimates
apply(results,2,mean)
## relative bias
(c(5,2)-apply(results,2,mean))/c(5,2)
@ %def 

And the precision of the estimates are their standard deviations

@ 
<<rsads_bias>>=
##Mean of estimates
apply(results,2,sd)
## relative precision
apply(results,2,sd)/apply(results,2,mean)
@ 

Finally, a density plot with lines indicating the mean of estimates and the values of parameters:
@ 
<<rsads-bias-plots, fig=TRUE, height=5, width=10>>=
par(mfrow=c(1,2))
plot(density(results[,1]), main=expression(paste("Density of ",mu)))
abline(v=c(mean(results[,1]),5), col=2:3)
plot(density(results[,2]), main=expression(paste("Density of ",sigma)))
abline(v=c(mean(results[,2]), 2), col=2:3)
par(mfrow=c(1,1))
@ %def 

Increasing the number of simulations improves these estimators.

\section{Bugs and issues}
\label{sec:bugs-issues}

The package project is hosted on GitHub (\url{https://github.com/piLaboratory/sads/}). 
Please report bugs and issues and give us your feedback at 
\url{https://github.com/piLaboratory/sads/issues}.

\bibliographystyle{ecology}
\bibliography{sads_intro}
\end{document}
