---
title: "ggmatplot"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ggmatplot}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

`ggmatplot` is a quick and easy way of plotting the columns of two matrices or data frames against each other using [`ggplot2`](https://ggplot2.tidyverse.org/).

`ggmatplot` is built upon [`ggplot2`](https://ggplot2.tidyverse.org/), and its functionality is inspired by [`matplot`](https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/matplot). `ggmatplot` can therefore be considered a `ggplot2` version of [`matplot`](https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/matplot).

## What does `ggmatplot` do?

Similar to [`matplot`](https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/matplot), `ggmatplot` plots a vector against the columns of a matrix, or the columns of two matrices against each other, or a vector/matrix on its own. However, unlike [`matplot`](https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/matplot), `ggmatplot` returns a `ggplot` object.

Suppose we have a covariate vector `x` and a matrix `z` whose two columns hold the response `y` (column `actual`) and the fitted values `fit.y` (column `fitted`).

```{r defining-data}
# fix the seed so the simulated data is reproducible
set.seed(123)

# vector x
x <- rnorm(100, sd = 2)

head(x)

# matrix z
y <- x * 0.5 + rnorm(100, sd = 1)
fit.y <- fitted(lm(y ~ x))
z <- cbind(actual = y,
           fitted = fit.y)

head(z)
```

`ggmatplot` plots vector `x` against each column of matrix `z` using the default `plot_type = "point"`. This will be represented on the resulting plot as two groups, identified using different shapes and colors.

```{r point-plot}
library(ggmatplot)
library(ggplot2) # loaded explicitly for theme_bw(), ggplot() and other add-ons used below

ggmatplot(x, z)
```
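
Because the result is a regular `ggplot` object, it can be stored in a variable and inspected or modified later. The following is a minimal sketch on made-up data (`x_demo` and `z_demo` are hypothetical names introduced only for this illustration); it also shows the single-argument form, where a matrix is plotted on its own.

```{r ggplot-object-sketch}
library(ggmatplot)

# small illustrative data; x_demo and z_demo are not part of the example above
set.seed(1)
x_demo <- rnorm(20)
z_demo <- cbind(a = x_demo + rnorm(20), b = 2 * x_demo)

# the plot is returned as an object, so it can be assigned without being drawn
p <- ggmatplot(x_demo, z_demo)
inherits(p, "ggplot")

# a matrix can also be passed on its own
p_single <- ggmatplot(z_demo)
inherits(p_single, "ggplot")
```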

The default aesthetics used to differentiate the two groups can be changed using `ggmatplot()` arguments. Since the two groups in this example are differentiated by shape and color, the `shape` and `color` parameters control their appearance. If we want the points in both groups to have the same shape, we can set the `shape` parameter to a single value. If we want the groups to be differentiated by specific colors, we can pass a vector of colors as the `color` parameter, making sure that the number of colors matches the number of groups.

```{r point-plot-w-parameters}
ggmatplot(x, z,
          shape = "circle", # using a single shape over both groups
          color = c("blue","purple") # assigning two colors to the two groups
          )
```
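
One way to keep the number of colors in step with the number of groups is to derive the color vector from the number of columns being plotted. The sketch below assumes the base R function `grDevices::hcl.colors()` (available in R 3.6.0 and later) and uses hypothetical names (`x_demo`, `z_demo`, `group_colors`) on made-up data.

```{r color-vector-sketch}
library(ggmatplot)

# made-up data with three columns, so three groups
set.seed(2)
x_demo <- rnorm(30)
z_demo <- cbind(first = x_demo, second = x_demo^2, third = -x_demo)

# one color per column of z_demo, so the lengths always match
group_colors <- hcl.colors(ncol(z_demo), palette = "Dark 3")
length(group_colors) == ncol(z_demo)

ggmatplot(x_demo, z_demo, shape = "circle", color = group_colors)
```

If a column is later added to or removed from `z_demo`, the color vector changes length with it, so no manual update is needed.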

Since `ggmatplot` is built upon [`ggplot2`](https://ggplot2.tidyverse.org/) and creates a `ggplot` object, [`ggplot2` add-ons](https://ggplot2.tidyverse.org/reference/index.html) such as scales, faceting specifications, coordinate systems, and themes can also be added to plots created using `ggmatplot`.

Each `plot_type` allowed by the `ggmatplot()` function is built upon a `ggplot2` geom (geometric object), as listed [here](../reference/ggmatplot.html#plot-types). Therefore, `ggmatplot()` supports additional parameters specific to each plot type. These are often aesthetics set to a fixed value, like `size = 2` or `alpha = 0.5`, but they can also be other parameters specific to the underlying geom.


```{r point-plot-w-theme}
ggmatplot(x, z,
          shape = "circle",
          color = c("blue","purple"),
          size = 2,
          alpha = 0.5
          ) +
  theme_bw()
```
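
Other `ggplot2` components can be chained in the same way. As a rough sketch on made-up data (`x_demo`, `z_demo` and `p_addons` are hypothetical names), the following adds a title and axis labels with `labs()`, restricts the visible x range with `coord_cartesian()`, and moves the legend with `theme()`; all three are standard `ggplot2` functions.

```{r add-ons-sketch}
library(ggmatplot)
library(ggplot2)

# made-up data for illustration only
set.seed(3)
x_demo <- rnorm(50)
z_demo <- cbind(actual = x_demo + rnorm(50), fitted = x_demo)

p_addons <- ggmatplot(x_demo, z_demo) +
  labs(title = "Actual and fitted values", x = "x", y = "value") +
  coord_cartesian(xlim = c(-2, 2)) + # zoom without dropping data
  theme_bw() +
  theme(legend.position = "bottom")

p_addons
```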

<!-- [This list of examples](../index.html#examples) includes other types of plots we can create using `ggmatplot`. -->

## When can we use `ggmatplot` over `ggplot2`?

[`ggplot2`](https://ggplot2.tidyverse.org/) generally expects long format data, so wide format data has to be wrangled into long format before plotting, which can be cumbersome for simple plots. The motivation for `ggmatplot` is to accept wide format data directly and build the [`ggplot2`](https://ggplot2.tidyverse.org/) plot for us. Although `ggmatplot` doesn't provide the same flexibility as [`ggplot2`](https://ggplot2.tidyverse.org/), it is a convenient shortcut when we want a simple plot without first reshaping the data.

Suppose we want to use the `iris` dataset to plot the distributions of its numeric variables individually.

```{r iris-data}
library(tidyr)
library(dplyr)

iris_numeric <- iris %>%
  select(where(is.numeric)) # predicates must be wrapped in where()

head(iris_numeric)
```

If we were to plot this data using [`ggplot2`](https://ggplot2.tidyverse.org/), we'd have to wrangle the data into long format before plotting.

```{r iris-ggplot2-density-plot}
iris_numeric_long <- iris_numeric %>%
  pivot_longer(cols = everything(),
               names_to = "Feature",
               values_to = "Measurement") 

head(iris_numeric_long)

iris_numeric_long %>%
  ggplot(aes(x = Measurement,
             color = Feature)) +
  geom_density()
```

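But the wide format data can be used directly with `ggmatplot` to achieve the same result. Setting `alpha = 0` makes the fill of each density fully transparent, leaving only the outlines, as in the `ggplot2` version above. Note that the order of the categories in the legend follows the column order in the original dataset.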

```{r iris-ggmatplot-density-plot}
ggmatplot(iris_numeric, plot_type = "density", alpha = 0)
```
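
Because the legend follows the column order, reordering the columns before plotting is one way to reorder the legend. A minimal sketch, using base R column indexing on the built-in `iris` data (`iris_reordered` is a name introduced here for illustration):

```{r legend-order-sketch}
library(ggmatplot)

# keep the four numeric columns, with the petal measurements first
iris_reordered <- iris[, c(
  "Petal.Length", "Petal.Width",
  "Sepal.Length", "Sepal.Width"
)]

ggmatplot(iris_reordered, plot_type = "density", alpha = 0)
```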

Suppose we also have the following dataset of the monthly totals of international airline passengers (in thousands) from January 1949 to December 1960.

```{r airline-data}
AirPassengers <- matrix(AirPassengers,
  ncol = 12, byrow = FALSE,
  dimnames = list(month.abb, as.character(1949:1960))
)

AirPassengers
```
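
The series stores its 144 monthly values in time order, so filling a 12-row matrix column by column (`byrow = FALSE`) places the 12 months of one year in each column. The check below is a small sketch of that reasoning; it reads the original series explicitly from `datasets::AirPassengers`, and the names `ap_ts`, `ap_mat` and `first_year` are introduced here only for illustration.

```{r airline-reshape-check}
ap_ts <- datasets::AirPassengers # original monthly time series

ap_mat <- matrix(ap_ts,
  ncol = 12, byrow = FALSE,
  dimnames = list(month.abb, as.character(1949:1960))
)

# the first column should hold the 12 monthly values of 1949
first_year <- window(ap_ts, start = c(1949, 1), end = c(1949, 12))
all(ap_mat[, "1949"] == as.numeric(first_year))

# 12 months (rows) by 12 years (columns)
dim(ap_mat)
```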

If we want to plot the trend of the number of passengers over the years using `ggplot2`, we'd have to wrangle the data into long format. But we can use `ggmatplot` as a workaround.

First, we can split the data into a vector and a matrix as follows:

`months`: a vector containing the list of months  
`nPassengers`: a matrix of passenger numbers with each column representing a year

```{r airline-data-split}
months <- rownames(AirPassengers)
nPassengers <- AirPassengers[, 1:12]
```

Then we can use `ggmatplot()` to plot the `months` vector against each column of the `nPassengers` matrix, which can be more simply understood as grouping the plot by each column (year) of the `nPassengers` matrix.

```{r line-plot}
ggmatplot(
  x = months,
  y = nPassengers,
  plot_type = "line",
  size = 1,
  legend_label = 1949:1960,
  xlab = "Month",
  ylab = "Total airline passengers (in thousands)",
  legend_title = "Year"
) +
  theme_minimal()
```
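
For comparison, a rough `ggplot2`-only version of a similar plot needs the matrix converted into a long data frame first. The sketch below rebuilds the matrix from `datasets::AirPassengers` so that it is self-contained; the names `ap_mat`, `ap_wide` and `ap_long`, and the column names `Month`, `Year` and `Passengers`, are choices made for this example.

```{r airline-ggplot2-sketch}
library(tidyr)
library(ggplot2)

ap_mat <- matrix(datasets::AirPassengers,
  ncol = 12,
  dimnames = list(month.abb, as.character(1949:1960))
)

# add the month as a column; the factor levels keep calendar order on the axis
ap_wide <- data.frame(
  Month = factor(rownames(ap_mat), levels = month.abb),
  ap_mat,
  check.names = FALSE # keep "1949", "1950", ... as column names
)

# wide -> long: one row per month-year pair
ap_long <- pivot_longer(ap_wide,
  cols = -Month,
  names_to = "Year",
  values_to = "Passengers"
)

ggplot(ap_long, aes(x = Month, y = Passengers, color = Year, group = Year)) +
  geom_line() +
  theme_minimal()
```

The reshaping step and the explicit `group` aesthetic are the parts that `ggmatplot()` handles internally when given the wide matrix.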
