---
title: "Getting started with the robis package"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{getting-started}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(
  purl = NOT_CRAN,
  eval = NOT_CRAN,
  collapse = TRUE,
  comment = "#>"
)
```

This package is a client for the [OBIS API](https://api.obis.org/). It includes functions for data access, as well as a few helper functions for visualizing occurrence data and extracting nested MeasurementOrFact or DNADerivedData records.

First some packages:

```{r setup}
library(robis)
library(dplyr)
library(ggplot2)
```

## Occurrence data

The `occurrence()` function provides access to raw occurrence data. For example, to fetch all occurrences by scientific name:

```{r message = FALSE}
occ <- occurrence("Abra aequalis")
occ
```

```{r message = FALSE}
ggplot(occ) +
  geom_bar(aes(date_year), stat = "count", width = 1)
```

Alternatively, occurrences can be fetched by AphiaID:

```{r message = FALSE}
occurrence(taxonid = 293683)
```

Other parameters include `geometry`, which accepts polygons in WKT format:

```{r message = FALSE}
occurrence("Abra alba", geometry = "POLYGON ((2.59689 51.16772, 2.62436 51.14059, 2.76066 51.19225, 2.73216 51.20946, 2.59689 51.16772))")
```

WKT strings can be created by drawing on a map using the `get_geometry()` function.

A convenience function `map_leaflet()` is provided to visualize occurrences on an interactive map:

```{r eval = FALSE}
map_leaflet(occurrence("Abra sibogai"))
```

## Checklists

The `checklist()` function returns all taxa observed for a given set of filters.

```{r message = FALSE}
cl <- checklist("Semelidae")
cl
```

```{r message = FALSE}
ggplot(cl %>% filter(!is.na(genus))) +
  geom_bar(aes(genus)) +
  coord_flip() +
  ylab("species count")
```

Just like the `occurrence()` function, `checklist()` accepts WKT geometries:

```{r message = FALSE}
checklist(geometry = "POLYGON ((2.59689 51.16772, 2.62436 51.14059, 2.76066 51.19225, 2.73216 51.20946, 2.59689 51.16772))")
```

## MeasurementOrFact records

The package also provides access to MeasurementOrFact records associated with occurrences. When calling `occurrence()`, MeasurementOrFact records can be included by setting `mof = true`.

```{r message = FALSE}
occ <- occurrence("Abra tenuis", mof = TRUE)
```

MeasurementOrFact records are nested in the occurrence, but the `unnest_extension()` function allows you to extract them to a flat data frame. Use the `fields` parameter to indicate which occurrence fields need to be preserved in the measurements table.

```{r message = FALSE}
mof <- unnest_extension(occ, extension = "MeasurementOrFact", fields = c("scientificName", "decimalLongitude", "decimalLatitude"))
mof
```

Note that the MeasurementOrFact fields can be used as parameters to the `occurrence()` function. For example, to only get occurrences with associated biomass measurements:

```{r message = FALSE}
library(dplyr)

occurrence("Abra tenuis", mof = TRUE, measurementtype = "biomass") %>%
  unnest_extension(extension = "MeasurementOrFact")
```

## DNADerivedData records

Just like MeasurementOrFact records, nested DNADerivedData records can be extracted from the occurrence results.

```{r message = FALSE}
occ <- occurrence("Prymnesiophyceae", datasetid = "62b97724-da17-4ca7-9b26-b2a22aeaab51", dna = TRUE)
occ
```

```{r message = FALSE}
dna <- unnest_extension(occ, extension = "DNADerivedData", fields = c("scientificName"))

dna %>%
  select(scientificName, target_gene, DNA_sequence)
```
