---
title: "Fitted-Model-Based Annotations :: Cheat Sheet"
subtitle: "'ggpmisc' `r packageVersion('ggpmisc')`"
author: "Pedro J. Aphalo"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Fitted-Model-Based Annotations :: Cheat Sheet}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

## Basics

**ggpmisc** is based on the **grammar of graphics** implemented in **ggplot2**, the idea that you can build every graph from the same components: a **data** set, a **coordinate system**, and **geoms**---visual marks that represent data points. If you are not already familiar with this grammar and **ggplot2** you should visit the [**ggplot2** Cheat Sheet](https://rstudio.github.io/cheatsheets/html/data-visualization.html) first, and afterwards come back to this Cheat Sheet.

Unlike **ggplot2**, **ggpmisc** provides no geometries that use the new stats as their default. The plot layers described here are always added with a _stat_ and, when necessary, its default `geom` argument can be overridden. The default _geoms_ for the statistics described below come from packages **ggplot2** and **ggpp**.

```{r, eval=FALSE}
library(ggpmisc)
```

Most of the layer functions in **ggpmisc** make it easier to add to plots information derived from model fitting, tests of significance and some summaries. All layer functions work as expected with groups and facets.

## Correlation

* `stat_correlation()` computes the parametric correlation coefficient, Pearson's $r$, or a non-parametric one, Kendall's $\tau$ or Spearman's $\rho$, and optionally its confidence interval, $P$, and $n$, the number of observations, flexibly adding an annotation to the plot.
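
As a minimal sketch, assuming the `mpg` data set from **ggplot2** as stand-in data, the default call annotates the plot with Pearson's $r$, the `method` argument selects a non-parametric coefficient, and `use_label()` assembles the label from several estimates.

```{r, eval=FALSE}
library(ggpmisc)

# scatterplot used as base for the annotations (illustrative data)
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# default: Pearson's r
p + stat_correlation()

# Spearman's rho, with the P-value and n added to the label
p + stat_correlation(method = "spearman",
                     mapping = use_label(c("R", "P", "n")))
```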

## Fitted models

The statistics for fitted models come in matched pairs, one that adds a plot layer with one or more curves and confidence band(s), and one that annotates the plot with the fitted model equation and/or other parameter estimates. The estimates available depend on the type of fitted model and include $R^2$, $F$, $P$, $AIC$, $BIC$, and $n$. The curve plotting stats are similar to `ggplot2::stat_smooth()` but the ones for textual annotations have no equivalent in 'ggplot2'.

* `stat_poly_line()` and `stat_poly_eq()` are the pair supporting the broadest set of model fit functions: e.g., linear models (OLS, resistant and robust), linear splines, generalised least squares (gls), major axis (MA) and standardised major axis (SMA) regression, etc.

* `stat_quant_line()`, `stat_quant_band()` and `stat_quant_eq()` support quantile regression (using 'quantreg').

* `stat_ma_line()` and `stat_ma_eq()` support major axis (MA), standardised major axis (SMA) and ranged major axis (RMA) regression (using 'lmodel2').

* `stat_fit_augment()` works with model fit functions supported by `broom::augment()` methods including non-linear models.

* `stat_fit_tidy()` works with model fit functions supported by `broom::tidy()` methods including non-linear models.

* `stat_fit_fitted()` and `stat_fit_deviations()` can be used to highlight the fitted values and their distance to the observations in a scatterplot in combination with the statistics above.

* `stat_fit_residuals()` can be used to create consistent plots of residuals for many different model fit functions.

* `stat_distrmix_line()` and `stat_distrmix_eq()` support univariate Normal distribution mixture models.
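
The pairing of a line stat with an equation stat can be sketched as follows. The `mpg` data and the second-degree polynomial are assumptions chosen only for illustration; the point is that the same `formula` is passed to both members of a pair.

```{r, eval=FALSE}
library(ggpmisc)

# illustrative model formula, shared by both stats of the pair
my.formula <- y ~ poly(x, 2, raw = TRUE)

# polynomial fitted with lm(): curve with band plus equation, R2 and n
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = my.formula) +
  stat_poly_eq(mapping = use_label(c("eq", "R2", "n")),
               formula = my.formula)

# quantile regression: one line and one equation per quantile
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_quant_line(formula = y ~ x, quantiles = c(0.25, 0.5, 0.75)) +
  stat_quant_eq(formula = y ~ x, quantiles = c(0.25, 0.5, 0.75))
```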

## ANOVA or summary tables

* `stat_fit_tb()` fits any model supported by a `broom::tidy()` method and adds an ANOVA or summary table to the plot. The user can set which columns are included and how they are named.
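
A minimal sketch of a table annotation, assuming the `mtcars` data, a simple linear regression and that package 'broom' is installed; the table position at the top right corner is an arbitrary choice.

```{r, eval=FALSE}
library(ggpmisc)
library(broom) # supplies the tidy() methods used to build the table

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  stat_fit_tb(method = "lm",
              method.args = list(formula = y ~ x),
              tb.type = "fit.summary", # or "fit.anova", "fit.coefs"
              label.x = "right", label.y = "top")
p
```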

## Multiple comparisons

* `stat_multcomp()` fits a model, computes ANOVA and subsequently calls functions from package 'multcomp' to test the significance of Tukey, Dunnett or arbitrary sets of pairwise contrasts, with a choice of the adjustment method for the _P_-values. Significance of differences can be indicated with letters, asterisks or _P_-values. Sizes of differences are also computed and available for user-assembled labels.
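
As a sketch under the assumption that the `mpg` data with `cyl` converted into a factor is a suitable stand-in, the default call tests Tukey contrasts and labels bars connecting the groups compared, while `label.type = "letters"` summarises the same tests with letters.

```{r, eval=FALSE}
library(ggpmisc)

# one box per level of the factor (illustrative data)
p <- ggplot(mpg, aes(factor(cyl), hwy)) +
  geom_boxplot(width = 0.33)

# default: Tukey contrasts shown as labelled bars
p + stat_multcomp()

# same contrasts, indicated with letters
p + stat_multcomp(label.type = "letters")
```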

## Peaks and valleys

* `stat_peaks()` finds and labels peaks (= global or local maxima).

* `stat_valleys()` finds and labels valleys (= global or local minima).
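
A short sketch using the `lynx` time series from base R, converted here into a data frame named `lynx.df` (a name introduced only for this example). The `span` argument sets the width, in number of observations, of the window within which a local maximum or minimum is searched for.

```{r, eval=FALSE}
library(ggpmisc)

# lynx is a time series shipped with R; convert it to a data frame
lynx.df <- data.frame(year = as.numeric(time(lynx)),
                      lynx = as.numeric(lynx))

p <- ggplot(lynx.df, aes(year, lynx)) +
  geom_line() +
  # highlight peaks with points and label them with their x value
  stat_peaks(colour = "red", span = 5) +
  stat_peaks(geom = "text", colour = "red", vjust = -0.5, span = 5) +
  # highlight valleys with points
  stat_valleys(colour = "blue", span = 5)
p
```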

## Volcano and quadrant plots

These plots are frequently used with gene expression data, with each of the many genes labelled based on the ternary outcome from a statistical test. In addition, the data are usually transformed. 'ggpmisc' provides several variations on continuous, colour, fill and shape scales, with defaults set as needed. Scales support log fold-change (`logFC`), false discovery rate (`FDR`), _P_-value (`Pvalue`) and binary or ternary test outcomes (`outcome`).
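
A volcano plot can be sketched as below. The data frame `my.df` holds simulated values generated only for illustration, with outcomes encoded as -1, 0 and +1; `outcome2factor()` converts this encoding into a factor suitable for `scale_colour_outcome()`.

```{r, eval=FALSE}
library(ggpmisc)

# simulated values, for illustration only
set.seed(4321)
my.df <- data.frame(logFC = rnorm(200, sd = 2),
                    PValue = 10^-runif(200, min = 0, max = 10),
                    outcome = sample(c(-1, 0, 1), 200, replace = TRUE))

p <- ggplot(my.df, aes(logFC, PValue, colour = outcome2factor(outcome))) +
  geom_point() +
  scale_x_logFC() +        # x axis for log fold-change
  scale_y_Pvalue() +       # transformed y axis for P-values
  scale_colour_outcome()   # colours for the ternary outcome
p
```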

## Utility functions

Most of the functions used to generate formatted labels in layers and scales are also exported.

------------------------------------------------------------------------

Learn more at [docs.r4photobiology.info/ggpmisc/](https://docs.r4photobiology.info/ggpmisc/).

------------------------------------------------------------------------