---
title: "Introduction to MazamaTimeSeries"
author: "Mazama Science"
date: "2024-03-08"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to MazamaTimeSeries}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo=FALSE}
knitr::opts_chunk$set(fig.width=7, fig.height=5)
```

## Background

This package supports data management activities associated with environmental 
time series collected at fixed locations in space. The motivating fields include 
both air and water quality monitoring where fixed sensors report at regular time 
intervals.

## Data Model

The most compact format for time series data collected at fixed locations is a
list including two tables. **MazamaTimeSeries** stores time series measurements in a
`data` table where each row is a synoptic record containing all measurements
associated with a particular UTC time stamp and each 
column contains data measured by a single sensor (aka "device"). Any time invariant 
metadata associated with a sensor at a known location (aka a "device-deployment")
is stored in a separate `meta` table. A unique `deviceDeploymentID` connects the two tables. 
In the language of relational databases, this "normalizes" the database and can 
greatly reduce the disk space and memory needed to store and work with the data.

### Single Time Series

Time series data from a single environmental sensor typically consists of 
multiple parameters measured at successive times. This data is stored in an R 
list containing two dataframes. The package refers to this structure as an `sts` object
for **S**ingle**T**ime**S**eries:

`sts$meta` -- 1 row = unique device-deployment; cols = device/location metadata

`sts$data` -- rows = UTC times; cols = measured parameters (plus an additional `datetime` column)

`sts` objects can support the following types of time series data:

* stationary device-deployments only (no "mobile" sensors)
* single sensor only
* regular or irregular time axes
* multiple parameters

Raw, "engineering data" containing uncalibrated measurements, instrument voltages 
and QC flags may be stored in this format. This format is also appropriate for 
processed and QC'ed data whenever multiple parameters are measured by a single
device.

_**Note:**_ The `sts` object time axis specified in `data$datetime` reflects device 
measurement times and is not required to have uniform spacing. (It _may_ be 
regular but it need not be.) It _is_ guaranteed to be monotonically increasing.

### Multiple Time Series

Working with timeseries data from multiple sensors at once is often challenging
because of the amount of memory required to store all the data from each 
sensor. However, a common situation is to have time series that share a 
common time axis -- _e.g._ hourly measurements. In this case, it is possible to
create single-parameter `data` dataframes that contain all data for all 
sensors for a single parameter of interest. In air quality applications, common
parameters of interest include PM~2.5~ and Ozone.

Multi-sensor, single-parameter time series data is 
stored in an R list with two dataframes. The package refers to this structure as 
an `mts` object for **M**ultiple**T**ime**S**eries:

`mts$meta` -- N rows = unique device-deployments; cols = device/location metadata

`mts$data` -- rows = UTC times; N cols = device-deployments (plus an additional `datetime` column)

A key feature of `mts` objects is the use of the `deviceDeploymentID` as a
"foreign key" that allows sensor `data` columns to be mapped onto the associated
spatial and sensor metadata in a `meta` row. The following will always be true:

```
identical(names(mts$data), c('datetime', mts$meta$deviceDeploymentID))
```

`mts` objects can support the following types of time series data:

* stationary device-deployments only (no "mobile" sensors)
* multiple sensors
* regular (shared) hourly time axes only
* single parameter only

Each column of `mts$data` represents a timeseries associated with a particular
device-deployment while each row represents a _synoptic_ snap shot of all
measurements made at a particular time. 

In this manner, software can create both timeseries plots and maps from a single
`mts` object in memory.

_**Note:**_ The `mts` object time axis specified in `data$datetime` is 
guaranteed to be a regularly spaced, monotonic axis with no gaps.

## Example Usage

_See usage examples in the function documentation._

----

This project is supported by the [USFS AirFire](https://www.airfire.org) team.

