---
title: "Overview of unmarked: an R Package for the Analysis of Data from Unmarked Animals"
author:
- name: Ian Fiske
- name: Richard Chandler
date: February 5, 2019
bibliography: unmarked.bib
csl: ecology.csl
output: 
  rmarkdown::html_vignette:
    fig_width: 5
    fig_height: 3.5
    number_sections: true
    toc: true
vignette: >
  %\VignetteIndexEntry{Overview of unmarked}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r,echo=FALSE}
options(rmarkdown.html_vignette.check_title = FALSE)
```

# Abstract

`unmarked` aims to be a complete environment for the
statistical analysis of data from surveys of unmarked
animals. Currently, the focus is on hierarchical models that
separately model a latent state (or states) and an observation
process. This vignette provides a brief overview of the package -
for a more thorough treatment see @fiskeChandler_2011 and @kellner2023.

# Overview of unmarked

Unmarked provides methods to estimate site occupancy, abundance, and
density of animals (or possibly other organisms/objects) that cannot be
detected with certainty. Numerous models are available that correspond
to specialized survey methods such as temporally replicated surveys,
distance sampling, removal sampling, and double observer
sampling. These data are often associated with metadata related to the
design of the study. For example, in distance sampling, the study
design (line- or point-transect), distance class break points,
transect lengths, and units of measurement need to be accounted for in
the analysis. Unmarked uses S4 classes to store data and metadata in a
way that allows for easy data manipulation, summarization, and model
specification. Table 1 lists the currently implemented models and
their associated fitting functions and data classes.

```{r, echo=FALSE}
tab1 <- data.frame(
  Model=c("Occupancy", "Royle-Nichols", "Point Count", "Distance-sampling",
          "Generalized distance-sampling", "Arbitrary multinomial-Poisson",
          "Colonization-extinction", "Generalized multinomial-mixture"),
  `Fitting Function`=c("occu","occuRN","pcount","distsamp","gdistsamp",
                       "multinomPois","colext","gmultmix"),
  Data=c("unmarkedFrameOccu","unmarkedFrameOccu","unmarkedFramePCount",
         "unmarkedFrameDS","unmarkedFrameGDS","unmarkedFrameMPois",
         "unmarkedMultFrame","unmarkedFrameGMM"),
  Citation=c("@mackenzie_estimating_2002","@royle_estimating_2003",
             "@royle_n-mixture_2004","@royle_modeling_2004",
             "@chandlerEA_2011","@royle_generalized_2004",
             "@mackenzie_estimating_2003","@royle_generalized_2004"),
        check.names=FALSE)

knitr::kable(tab1, format='markdown', align="lccc",
             caption="Table 1. Models handled by unmarked.")
```

Each data class can be created with a call to the constructor function
of the same name as described in the examples below.

# Typical unmarked session

The first step is to import the data into R, which we do below using
the `read.csv` function. Next, the data need to be formatted for
use with a specific model fitting function. This can be accomplished
with a call to the appropriate type of `unmarkedFrame`. For
example, to prepare the data for a single-season site-occupancy
analysis, the function `unmarkedFrameOccu` is used.


## Importing and formatting data

Two examples of reading in data from a CSV and formatting for unmarked.
First, from a CSV in "wide" format:

```{r}
library(unmarked)
wt <- read.csv(system.file("csv","widewt.csv", package="unmarked"))
y <- wt[,2:4]
siteCovs <-  wt[,c("elev", "forest", "length")]
obsCovs <- list(date=wt[,c("date.1", "date.2", "date.3")],
    ivel=wt[,c("ivel.1",  "ivel.2", "ivel.3")])
wt <- unmarkedFrameOccu(y = y, siteCovs = siteCovs, obsCovs = obsCovs)
summary(wt)
```

Next, a CSV in "long" format:

```{r}
pcru_raw <- read.csv(system.file("csv","frog2001pcru.csv", package="unmarked"))

sites <- unique(pcru_raw$RouteNumStopNum)
M <- length(sites)
J <- max(table(pcru_raw$RouteNumStopNum))
y <- JulianDate <- MinAfterSunset <- Wind <- Sky <- Temperature <- matrix(NA, M, J)
for (i in 1:length(sites)){
  datsub <- pcru_raw[pcru_raw$RouteNumStopNum == sites[i],]
  n <- nrow(datsub)
  y[i,1:n] <- datsub$Pcru
  JulianDate[i,1:n] <- datsub$JulianDate
  MinAfterSunset[i,1:n] <- datsub$MinAfterSunset
  Wind[i,1:n] <- datsub$Wind
  Sky[i,1:n] <- datsub$Sky
  Temperature[i,1:n] <- datsub$Temperature
}
obscovs <- list(JulianDate = JulianDate, MinAfterSunset = MinAfterSunset,
                Wind = Wind, Sky = Sky, Temperature = Temperature)
pcru <- unmarkedFrameOccu(y = y, obsCovs = obscovs)
```

## Fitting models

Occupancy models can then be fit with the occu() function.
To help stabilize the numerical optimization algorithm, we recommend standardizing covariates.
The best way to do this is to use the `scale()` function on covariates inside your formulas.

```{r}
fm1 <- occu(~1 ~1, pcru)
fm2 <- occu(~ scale(MinAfterSunset) + scale(Temperature) ~ 1, pcru)
fm2
```

Here, we have specified that the detection process is modeled with the
`MinAfterSunset` and `Temperature` covariates.  No covariates are
specified for occupancy here.  See `?occu` for more details.


## Back-transforming parameter estimates

`unmarked` fitting functions return `unmarkedFit` objects which can be
queried to investigate the model fit.  Variables can be
back-transformed to the unconstrained scale using `predict`, which also
returns a 95% confidence interval.
Since there are no occupancy covariates, all sites have the same occupancy
estimate, and we can look at just the first row of the output.

```{r}
predict(fm2, 'state')[1,]
```

The expected probability that a site was
occupied is 0.823. This estimate applies to the hypothetical
population of all possible sites, not the sites found in our sample.
For a good discussion of population-level vs finite-sample inference,
see @royle_dorazio:2008 page 117. Note also that finite-sample
quantities can be computed in `unmarked` using empirical Bayes
methods as demonstrated at the end of this document.

Back-transforming the estimate of $\psi$ was easy because there were
no covariates. Because the detection component was modeled with
covariates, $p$ is a function, not just a scalar quantity, and so we
need to be provide values of our covariates to obtain an
estimate of $p$. Here, we request
the probability of detection given a site is occupied and all
covariates are set to 0.

```{r}
nd <- data.frame(MinAfterSunset = 0, Temperature = 0)
round(predict(fm2, type = 'det', newdata = nd, appendData = TRUE), 2)
```

Thus, we can say that the expected probability of detection was 0.552
when time of day and temperature are fixed at their mean value. The
`predict` metho can also be used to obtain estimates of
parameters at a range of specific covariate values.

```{r}
nd <- data.frame(MinAfterSunset = 0, Temperature = -2:2)
round(predict(fm2, type = 'det', newdata = nd, appendData=TRUE), 2)
```

Confidence intervals are requested with `confint`, using either the
asymptotic normal approximation or profiling.

```{r, eval=FALSE}
confint(fm2, type='det')
confint(fm2, type='det', method = "profile")
```

```{r, echo=FALSE}
confint(fm2, type='det')
nul <- capture.output(ci <- confint(fm2, type='det', method = "profile"))
ci
```

## Model selection and model fit

Model selection and multi-model inference can be implemented after
organizing models using the `fitList` function.

```{r}
fms <- fitList('psi(.)p(.)' = fm1, 'psi(.)p(Time+Temp)' = fm2)
modSel(fms)
predict(fms, type='det', newdata = nd)
```


The parametric bootstrap can be used to check the adequacy of model fit.
Here we use a $\chi^2$ statistic appropriate for binary data.

```{r, warning=FALSE}
chisq <- function(fm) {
    umf <- fm@data
    y <- umf@y
    y[y>1] <- 1
    fv <- fitted(fm)
    sum((y-fv)^2/(fv*(1-fv)), na.rm=TRUE)
    }

set.seed(1)
(pb <- parboot(fm2, statistic=chisq, nsim=100, parallel=FALSE))
```

We fail to reject the null hypothesis, and conclude that the model fit
is adequate.

## Derived parameters and empirical Bayes methods

The `parboot` function can be also be used to compute confidence
intervals for estimates of derived parameters, such as the proportion
of $N$ sites occupied $\mbox{PAO} = \frac{\sum_i z_i}{N}$ where $z_i$ is the true
occurrence state at site $i$, which is unknown at sites where no individuals
were detected. The `colext` vignette shows examples of using
`parboot` to obtain confidence intervals for such derived
quantities. An alternative way achieving this goal is to use empirical Bayes
methods, which were introduced in `unmarked` version 0.9-5. These methods estimate
the posterior distribution of the latent variable given the data and
the estimates of the fixed effects (the MLEs). The mean or the mode of
the estimated posterior distibution is referred to as the empirical
best unbiased predictor (EBUP), which in `unmarked` can be
obtained by applying the `bup` function to the estimates of the
posterior distributions returned by the `ranef` function. The
following code returns an estimate of PAO using EBUP.

```{r}
re <- ranef(fm2)
EBUP <- bup(re, stat="mode")
sum(EBUP) / numSites(pcru)
```

Note that this is similar, but slightly lower than the
population-level estimate of $\psi$ obtained above.

A plot method also exists for objects returned by `ranef`, but
distributions of binary variables are not so pretty. Try it out on a
fitted abundance model instead.

# References
