---
title: "Tour de trackeR"
author: "[Hannah Frick](https://www.frick.ws) and [Ioannis Kosmidis](https://ikosmidis.com)"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: trackeR.bib
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Tour de trackeR}
  %\VignetteEncoding{UTF-8}
---


The **trackeR** package provides infrastructure for handling running and cycling
data from GPS-enabled tracking devices. A short demonstration of its functionality is
provided below, based on data from running activities. A more comprehensive introduction
to the package can be found in the vignette "Infrastructure for Running and
Cycling Data", which can be accessed by typing

```{r, vignette, eval = FALSE}
vignette("trackeR", package = "trackeR")
```

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 7
)
```


## Reading data

**trackeR** can currently import files in the Training Centre XML (TCX) format and
.db3 files (SQLite databases, used, for example, by devices from GPSports) through the
corresponding functions *readTCX*() and *readDB3*(). It also offers support
for JSON files from [Golden Cheetah](http://www.goldencheetah.org/) via *readJSON*().

```{r, runDF, message = FALSE}
library("trackeR")
filepath <- system.file("extdata/tcx/", "2013-06-01-183220.TCX.gz", package = "trackeR")
runDF <- readTCX(file = filepath, timezone = "GMT")
```

These read functions return a `data.frame` of the following structure

```{r, str_runDF}
str(runDF)
```

That `data.frame` can be used as an input to the constructor function for **trackeR**'s
*trackeRdata* class, to produce a session-based and unit-aware object that can be used
for further analyses.

```{r, runTr0}
runTr0 <- trackeRdata(runDF)
```

The *read_container*() function combines the two steps of importing the data and constructing
the *trackeRdata* object.

```{r, runTr1}
runTr1 <- read_container(filepath, type = "tcx", timezone = "GMT")
identical(runTr0, runTr1)
```

The *read_directory*() function can be used to read `all` supported files in a directory and
produce the corresponding *trackeRdata* objects.



## Visualisations

The package includes an example data set which can be accessed through

```{r, runs}
data("runs", package = "trackeR")
```

The default behaviour of the *plot* method for *trackeRdata* objects
is to show how heart rate and pace evolve over the session.

```{r, plot_runs, fig.width = 7, fig.height = 5}
plot(runs, session = 1:7)
```

The elevation profile of a training session is also accessible, here along with the pace.

```{r, plot_runs_2, fig.width = 7, fig.height = 5}
plot(runs, session = 8, what = c("altitude", "pace"))
```

The route taken during a training session can also be plotted on maps from various sources
e.g., from Google or OpenStreetMap. This can be done either on a static map

```{r, plotRoute, message = FALSE, warning = FALSE, fig.width = 7, fig.height = 7}
tryCatch(plot_route(runs, session = 1, source = "stamen"),
         error = function(x) "Failed to donwload map data")
```

or on an interactive map.

```{r, leafletRoute, fig.width = 7, fig.height = 7}
tryCatch(leaflet_route(runs, session = c(1, 6, 12)),
         error = function(x) "Failed to donwload map data")
```



## Session summaries

The summary of sessions includes basic statistics like duration,
time spent moving, average speed, pace, and heart rate. The speed threshold used to
distinguish moving from resting can be set by the argument *moving_threshold*.

```{r, summary}
summary(runs, session = 1, moving_threshold = c(cycling = 2, running = 1, swimming = 0.5))
```

It is usually desirable to visualise summaries from multiple sessions. This can be done
using the *plot* method for summary objects. Below, we produce such a plot for average
heart rate, average speed, distance, and duration.

```{r, plot_summary, fig.width = 7, fig.height = 7}
runs_summary <- summary(runs)
plot(runs_summary, group = c("total", "moving"),
     what = c("avgSpeed", "distance", "duration", "avgHeartRate"))
```

The timeline plot is useful to visualise the date and time that the sessions took place and
provide information of their relative duration.

```{r, timeline, fig.width = 7, fig.height = 7}
timeline(runs_summary)
```



## Time in zones

The time spent training in certain zones, e.g., speed zones, can also be calculated and
visualised.

```{r, plot_zones, fig.width = 7, fig.height = 5}
run_zones <- zones(runs[1:4], what = "speed", breaks = c(0, 2:6, 12.5))
plot(run_zones)
```



## Quantifying work capacity via W' (W prime)

*trackeR* can also be used to calculate and visualise the work capacity W' (pronounced
as `W prime`). The comprehensive vignette "Infrastructure for Running and Cycling Data"
provides the definition of work capacity and details on the *version* and *quantity* arguments.

```{r, plot_Wprime, fig.width = 7, fig.height = 5}
wexp <- Wprime(runs, session = 11, quantity = "expended", cp = 4, version = "2012")
plot(wexp, scaled = TRUE)
```



## Distribution and concentration profiles

@trackeR:Kosmidis+Passfield:2015 introduce the concept of distribution and concentration
profiles for which **trackeR** provides an implementation. These profiles are motivated
by the need to compare sessions and use information on such variables as heart rate or
speed during a session for further modelling.

The distribution profile for a variable such as speed or heart rate describes the time
exercising above a (speed or heart rate) threshold.

Here, the distribution profiles for the first 4 sessions are calculated for speed with
thresholds ranging from 0 to 12.5 m/s in increments of 0.05 m/s.

```{r, distProfile, fig.width = 7, fig.height = 5}
d_profile <- distribution_profile(runs, session = 1:4, what = "speed",
                                  grid = list(speed = seq(0, 12.5, by = 0.05)))
plot(d_profile, multiple = TRUE)
```

Sessions 4 and 1 are longer than session 2 and 3, as visible by the
higher amount of time spent exercising above 0 m/s. Sessions 3 and 4
show a larger amount of time spent exercising above 4 m/s than the
other sessions. This is easier to spot in the concentration profiles
which are the negative derivative of the distribution profiles.  The
concentration profile for session 3 has a mode at around 3.5 meters
per second and another one above 4 meters per second, showing that
this session involved training at a combination of low and high
speeds.

```{r, conProfile, fig.width = 7, fig.height = 5}
c_profile <- concentrationProfile(d_profile, what = "speed")
plot(c_profile, multiple = TRUE, smooth = TRUE)
```

More details on distribution and concentration profiles can be found in the comprehensive
vignette "Infrastructure for Running and Cycling Data".



## Functional principal components analysis

The distribution and concentration can be used for further analysis such as a functional
principal components analysis (PCA) to describe the differences between the profiles.

The concentration profiles for all session

```{r, prep_profiles, fig.width = 7, fig.height = 5}
runsT <- threshold(runs)
dp_runs <- distribution_profile(runsT, what = "speed")
dp_runs_S <- smoother(dp_runs)
cp_runs <- concentration_profile(dp_runs_S)
plot(cp_runs, multiple = TRUE, smooth = FALSE)
```

vary in their shape (unimodal or multimodal), height, and location (revealing
concentrations at higher or lower speeds). The function *funPCA*() can be used
to fit a functional PCA, here with 4 principal components.

```{r, funPCA}
cpPCA <- funPCA(cp_runs, what = "speed", nharm = 4)
```

For the speed concentration profiles here, the first two components cover
`r round(sum(cpPCA$varprop[1:2]*100))`% of the variability in the profiles.

```{r, plot_fPCA, fig.width = 7, fig.height = 7}
round(cpPCA$varprop, 2)
plot(cpPCA, harm = 1:2)
```

A plot of the first two principal components reveals that the profiles here differ mostly
in the general height of the curves and the location. These two aspects of the profiles
can be captured by two univariate measures of the sessions, the time spent moving and the
average speed moving.

```{r, plot_scores, fig.show = 'hold', fig.width = 7, fig.height = 5}
## plot scores vs summary statistics
scoresSP <- data.frame(cpPCA$scores)
names(scoresSP) <- paste0("speed_pc", 1:4)
d <- cbind(runs_summary, scoresSP)

library("ggplot2")
## pc1 ~ session duration (moving)
ggplot(d) + geom_point(aes(x = as.numeric(durationMoving), y = speed_pc1)) + theme_bw()
## pc2 ~ avg speed (moving)
ggplot(d) + geom_point(aes(x = avgSpeedMoving, y = speed_pc2)) + theme_bw()
```


## References
