---
title: "Introduction to forestploter"
author: "Alimu Dayimu"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Introduction to forestploter}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  dpi = 300,
  comment = "#>"
)
```

# Introduction

The `forestploter` package provides a flexible way to draw forest plots. The layout of the plot is determined by the dataset, and all plot elements are placed in cells, making it easy to edit any element by specifying its row and column. The graphical parameters of each element can be further customized within its respective cell. This vignette will demonstrate how to create a simple forest plot.

The plotting steps demonstrated in this vignette may not be optimal. Other R packages may be better suited for the plots shown here. Please choose the one that best suits your needs. The final plot is shown below:

![Final Plot](https://raw.githubusercontent.com/adayim/forestploter/main/inst/extdata/forestploter-intro-final.png)

[Forest plots](https://en.wikipedia.org/wiki/Forest_plot) are commonly used in medical research publications, especially in [meta-analysis](https://en.wikipedia.org/wiki/Meta-analysis). They can also be used to report the coefficients and confidence intervals (CIs) of regression models.

There are many packages available for drawing forest plots. The most popular one is [forestplot](https://CRAN.R-project.org/package=forestplot). Other packages specialized for meta-analysis include [meta](https://CRAN.R-project.org/package=meta), [metafor](https://CRAN.R-project.org/package=metafor), and [rmeta](https://CRAN.R-project.org/package=rmeta). Some packages, like [ggforestplot](https://nightingalehealth.github.io/ggforestplot/index.html), use [ggplot2](https://CRAN.R-project.org/package=ggplot2) to draw forest plots, though ggforestplot is not yet available on CRAN.

The main differences between `forestploter` and other packages are:

* Focuses specifically on forest plots.
* Treats the forest plot as a table, where elements are aligned in rows and columns. Users have full control over what and how to display the forest plot contents.
* Controls graphical parameters with a theme.
* Allows post-hoc plot editing.
* Supports CIs in multiple columns and by groups.

# Basic Forest Plot

The layout of the forest plot is determined by the dataset provided. Please refer to the other vignette for instructions on changing text or background, adding or inserting text, adding borders to cells, and editing the color of the CI in specific cells.

## Simple Forest Plot

The first step is to prepare a `data.frame` that will serve as the basic layout of the forest plot. The column names of the data will be drawn as the header, and the content will be displayed in the forest plot body. One or more blank columns should be provided to draw the confidence intervals (CIs). **The width of the CI is determined by the width of its corresponding column. To provide more space for the CI, increase the number of spaces in the blank column.**

First, we need to prepare the data for plotting.

```{r prepare-data}
library(grid)
library(forestploter)

# Read the example data shipped with the package
dt <- read.csv(system.file("extdata", "example_data.csv", package = "forestploter"))

# Keep needed columns
dt <- dt[, 1:6]

# Indent the subgroup if there is a number in the placebo column
dt$Subgroup <- ifelse(is.na(dt$Placebo), 
                      dt$Subgroup,
                      paste0("   ", dt$Subgroup))

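# Replace NA with a blank string; otherwise "NA" would be printed in the plot
dt$Treatment <- ifelse(is.na(dt$Treatment), "", dt$Treatment)
dt$Placebo <- ifelse(is.na(dt$Placebo), "", dt$Placebo)
# Standard error on the log scale, recovered from the upper 95% limit;
# used below for the size of the point estimate boxes
dt$se <- (log(dt$hi) - log(dt$est)) / 1.96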

# Add a blank column for the forest plot to display CI
# Adjust the column width with spaces; increase the number of spaces below 
# to provide a larger area for drawing the CI
dt$` ` <- paste(rep(" ", 20), collapse = " ")

# Create a confidence interval column to display
dt$`HR (95% CI)` <- ifelse(is.na(dt$se), "",
                             sprintf("%.2f (%.2f to %.2f)",
                                     dt$est, dt$low, dt$hi))
head(dt)
```
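
As a quick check of the spacing advice above, the following base R sketch compares the string widths produced by different space counts. The helper name `blank_col` is hypothetical and not part of `forestploter`; it only wraps the `paste(rep(" ", n), collapse = " ")` idiom used in the chunk above.

```{r sketch-blank-width}
# Hypothetical helper (not part of forestploter): builds the blank string
# used as a CI column, with the same idiom as the chunk above.
blank_col <- function(n) paste(rep(" ", n), collapse = " ")

# n single spaces joined by n - 1 separator spaces give 2 * n - 1 characters
nchar(blank_col(20))
nchar(blank_col(30))
```

A larger `n` gives a longer string and therefore a wider column for the CI.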

The data prepared above will serve as the basic layout of the forest plot. The example below demonstrates how to draw a simple forest plot, with a footnote added for demonstration.

```{r simple-plot, out.width="80%", fig.width = 8, fig.height = 6}
p <- forest(dt[, c(1:3, 8:9)],
            est = dt$est,
            lower = dt$low, 
            upper = dt$hi,
            sizes = dt$se,
            ci_column = 4,
            ref_line = 1,
            arrow_lab = c("Placebo Better", "Treatment Better"),
            xlim = c(0, 4),
            ticks_at = c(0.5, 1, 2, 3),
            footnote = "This is the demo data. Please feel free to change\nanything you want.")

# Print plot
plot(p)
```

## Changing the Theme

We will now use the same data as above but add a summary point. Additionally, we will change the graphical parameters for the confidence interval and other parts of the plot. The theme of the forest plot can be adjusted with the `forest_theme` function. Refer to the manual for more details.

```{r simple-plot-theme, out.width="80%", fig.width = 7, fig.height = 3.3}
dt_tmp <- rbind(dt[-1, ], dt[1, ])
dt_tmp[nrow(dt_tmp), 1] <- "Overall"
dt_tmp <- dt_tmp[1:11, ]

# Define theme
tm <- forest_theme(base_size = 10,
                   # Confidence interval point shape, line type/color/width
                   ci_pch = 15,
                   ci_col = "#762a83",
                   ci_fill = "black",
                   ci_alpha = 0.8,
                   ci_lty = 1,
                   ci_lwd = 1.5,
                   ci_Theight = 0.2, # Height of the T-shaped ends of the CI
                   # Reference line width/type/color
                   refline_gp = gpar(lwd = 1, lty = "dashed", col = "grey20"),
                   # Vertical line width/type/color
                   vertline_lwd = 1,
                   vertline_lty = "dashed",
                   vertline_col = "grey20",
                   # Change summary color for filling and borders
                   summary_fill = "#4575b4",
                   summary_col = "#4575b4",
                   # Footnote font size/face/color
                   footnote_gp = gpar(cex = 0.6, fontface = "italic", col = "blue"))

pt <- forest(dt_tmp[, c(1:3, 8:9)],
             est = dt_tmp$est,
             lower = dt_tmp$low, 
             upper = dt_tmp$hi,
             sizes = dt_tmp$se,
             is_summary = c(rep(FALSE, nrow(dt_tmp) - 1), TRUE),
             ci_column = 4,
             ref_line = 1,
             arrow_lab = c("Placebo Better", "Treatment Better"),
             xlim = c(0, 4),
             ticks_at = c(0.5, 1, 2, 3),
             footnote = "This is the demo data. Please feel free to change\nanything you want.",
             theme = tm)

# Print plot
plot(pt)
```

## Text Justification and Background

By default, all cells are left-aligned. However, it is possible to justify any cell in the forest plot by setting parameters in `forest_theme`. For example, `core = list(fg_params = list(hjust = 0, x = 0))` left-aligns the content, while `colhead = list(fg_params = list(hjust = 0.5, x = 0.5))` centers the header. To right-align text, set `hjust = 1` and `x = 0.9`. **You can also change the text justification with `edit_plot`, as detailed in another vignette.**

The same rule applies to changing the background color. This can be done by setting `core = list(bg_params = list(fill = c("#edf8e9", "#c7e9c0", "#a1d99b")))`. Modify settings in `core` to change the graphical parameters of the plot's content, and use `colhead` for the header. To modify the text, adjust the settings in `fg_params` (see `textGrob()` in the `grid` package), and for the background, change `bg_params` (see `gpar()` in the `grid` package). Parameters should be passed as a list. More details can be found [here](https://CRAN.R-project.org/package=gridExtra/vignettes/tableGrob.html).

Provide a single value for uniform justification across all cells or a vector for varied justification. As shown in the second example, text is justified by row using the provided vector, which will be recycled as needed.

```{r text-justification, out.width="80%", fig.width = 7, fig.height = 2}
dt <- dt[1:4, ]

# Header center and content right
tm <- forest_theme(core = list(fg_params = list(hjust = 1, x = 0.9),
                               bg_params = list(fill = c("#edf8e9", "#c7e9c0", "#a1d99b"))),
                   colhead = list(fg_params = list(hjust = 0.5, x = 0.5)))

p <- forest(dt[, c(1:3, 8:9)],
            est = dt$est,
            lower = dt$low, 
            upper = dt$hi,
            sizes = dt$se,
            ci_column = 4,
            title = "Header center content right",
            theme = tm)

# Print plot
plot(p)

# Mixed justification
tm <- forest_theme(core = list(fg_params = list(hjust = c(1, 0, 0, 0.5),
                                                x = c(0.9, 0.1, 0, 0.5)),
                               bg_params = list(fill = c("#f6eff7", "#d0d1e6", "#a6bddb", "#67a9cf"))),
                   colhead = list(fg_params = list(hjust = c(1, 0, 0, 0, 0.5),
                                                   x = c(0.9, 0.1, 0, 0, 0.5))))

p <- forest(dt[, c(1:3, 8:9)],
            est = dt$est,
            lower = dt$low, 
            upper = dt$hi,
            sizes = dt$se,
            ci_column = 4,
            title = "Mixed justification",
            theme = tm)
plot(p)
```
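
The recycling mentioned above can be pictured with base R alone. This is only an illustration of recycling a short vector over table rows, assuming one value per row; it is not the code that `forestploter` runs internally, and `n_rows` and `row_fill` are names introduced here.

```{r sketch-recycle}
fills <- c("#edf8e9", "#c7e9c0", "#a1d99b")
n_rows <- 4  # the examples above use the first four rows of the data

# Recycle three colours over four rows: the fourth row reuses the first colour
row_fill <- rep_len(fills, n_rows)
data.frame(row = seq_len(n_rows), fill = row_fill)
```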

## Text Parsing

Similar to text justification, you can parse text in any cell. However, parsing all text will remove blanks from the data, which will also affect the blank columns used for drawing the whiskers.

```{r text-parsing, out.width="80%", fig.width = 7, fig.height = 2}
# See `?plotmath` for the syntax of math expressions.
dt <- data.frame(
  Study = c("Study ~1^a", "Study ~2^b", "NO[2]"),
  low = c(0.2, -0.03, 1.11),
  est = c(0.71, 0.35, 1.79),
  hi = c(1.22, 0.74, 2.47)
)

dt$SMD <- sprintf("%.2f (%.2f, %.2f)", dt$est, dt$low, dt$hi)
dt$` ` <- paste(rep(" ", 20), collapse = " ")

fig_dt <- dt[, c(1, 5:6)]

# Get a matrix of which row and columns to parse
parse_mat <- matrix(FALSE, 
                    nrow = nrow(fig_dt),
                    ncol = ncol(fig_dt))

# Parse the first column only; adjust this to suit your needs.
parse_mat[, 1] <- TRUE  

# Remove this if you don't want to parse the column head.
tm <- forest_theme(colhead = list(fg_params = list(parse = TRUE)), 
                   core = list(fg_params = list(parse = parse_mat)))

p <- forest(fig_dt,
            est = dt$est,
            lower = dt$low,
            upper = dt$hi,
            ci_column = 3,
            theme = tm)

# Add customized footnote.
# Because of a limitation of textGrob, parsed text that contains a line break
# does not render reliably, so a rich-text grob is used here instead.
txt <- "<sup>a</sup> This is study A<br><sup>b</sup> This is study B"

add_grob(p, 
         row = 4, 
         col = 1:2,
         order = "background",
         gb_fn = gridtext::richtext_grob,
         text = txt,
         gp = gpar(fontsize = 8),
         hjust = 0, vjust = 1, halign = 0, valign = 1,
         x = unit(0, "npc"), y = unit(1, "npc"))
```
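
Because parsed cells must be valid `plotmath` input, it can help to test labels before plotting. The sketch below uses only base R; `can_parse` is a hypothetical helper, not a `forestploter` function, and it checks syntax only, so it does not guarantee that the rendered label looks as intended.

```{r sketch-parse-check}
# Hypothetical helper: TRUE if the string can be parsed as an R expression
can_parse <- function(x) {
  vapply(x, function(s) {
    tryCatch({
      parse(text = s)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
}

labels <- c("Study ~1^a", "NO[2]", "Study 1")
data.frame(label = labels, parses = can_parse(labels))
```

A plain label such as `"Study 1"` fails because two tokens sit next to each other without an operator, which is why the example above writes `"Study ~1^a"`.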

# Multiple CI Columns

You may want to have multiple CI columns, with each representing a different outcome. To achieve this, pass a vector of column positions to `ci_column`. If the number of CI columns matches the number of `est` elements, one CI is drawn in each specified column. If there are more `est` elements than CI columns, the elements are cycled through the CI columns and each pass forms a group, so several CIs are drawn in a single cell. The number of groups is the number of `est` elements divided by the number of CI columns. In the example below, the CIs are drawn in columns 4 and 7, with the first and second elements of `est`, `lower`, and `upper` corresponding to columns 4 and 7, respectively.

In an example with multiple groups, two or more CIs can be displayed in one cell. The solution is to provide all values sequentially to `est`, `lower`, and `upper`. This means that the first `n` elements in `est`, `lower`, and `upper` are treated as the same group, and the same applies to the next `n` elements, where `n` is the number of CI columns given in `ci_column`. As demonstrated in the example below, `est_gp1` and `est_gp2` are drawn in columns 4 and 7 as **group 1**, while `est_gp3` and `est_gp4` are drawn in the same columns as **group 2**.
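
The bookkeeping described above can be sketched with base R. The code only mirrors the rule stated in the text (the first `n` elements form group 1, the next `n` form group 2, where `n` is the number of CI columns). It is not taken from the package source, and the names `map_ci` and `n_est` are introduced here for illustration.

```{r sketch-ci-mapping}
# Hypothetical helper: for each element of `est`, report its group and CI column
map_ci <- function(n_est, ci_column) {
  n_col <- length(ci_column)
  # Assumption of this sketch: the est elements fill whole groups
  stopifnot(n_est %% n_col == 0)
  idx <- seq_len(n_est)
  data.frame(
    est_index = idx,
    group     = (idx - 1) %/% n_col + 1,
    column    = ci_column[(idx - 1) %% n_col + 1]
  )
}

# Four sets of estimates drawn in two CI columns give two groups,
# matching `ci_column = c(4, 7)` in the example that follows
map_ci(n_est = 4, ci_column = c(4, 7))
```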

This is an example of multiple CI columns and groups:

```{r multiple-group, out.width="80%", fig.width = 8, fig.height = 5}
dt <- read.csv(system.file("extdata", "example_data.csv", package = "forestploter"))
dt <- dt[1:7, ]
# Indent the subgroup if there is a number in the placebo column
dt$Subgroup <- ifelse(is.na(dt$Placebo), 
                      dt$Subgroup,
                      paste0("   ", dt$Subgroup))

# Replace NA with a blank string; otherwise "NA" would be printed in the plot
dt$n1 <- ifelse(is.na(dt$Treatment), "", dt$Treatment)
dt$n2 <- ifelse(is.na(dt$Placebo), "", dt$Placebo)

# Add two blank columns for CI
dt$`CVD outcome` <- paste(rep(" ", 20), collapse = " ")
dt$`COPD outcome` <- paste(rep(" ", 20), collapse = " ")

# Generate point estimates and 95% CIs. Paste two CIs together, separated by a line break.
dt$ci1 <- paste(sprintf("%.1f (%.1f, %.1f)", dt$est_gp1, dt$low_gp1, dt$hi_gp1),
                sprintf("%.1f (%.1f, %.1f)", dt$est_gp3, dt$low_gp3, dt$hi_gp3),
                sep = "\n")
dt$ci1[grepl("NA", dt$ci1)] <- "" # Any NA to blank

dt$ci2 <- paste(sprintf("%.1f (%.1f, %.1f)", dt$est_gp2, dt$low_gp2, dt$hi_gp2),
                sprintf("%.1f (%.1f, %.1f)", dt$est_gp4, dt$low_gp4, dt$hi_gp4),
                sep = "\n")
dt$ci2[grepl("NA", dt$ci2)] <- ""

# Set-up theme
tm <- forest_theme(base_size = 10,
                   refline_gp = gpar(lty = "solid"),
                   ci_pch = c(15, 18),
                   ci_col = c("#377eb8", "#4daf4a"),
                   footnote_gp = gpar(col = "blue"),
                   legend_name = "Group",
                   legend_value = c("Trt 1", "Trt 2"),
                   vertline_lty = c("dashed", "dotted"),
                   vertline_col = c("#d6604d", "#bababa"),
                   # Table cell padding, width 4 and heights 3
                   core = list(padding = unit(c(4, 3), "mm")))

p <- forest(dt[, c(1, 19, 23, 21, 20, 24, 22)],
            est = list(dt$est_gp1,
                       dt$est_gp2,
                       dt$est_gp3,
                       dt$est_gp4),
            lower = list(dt$low_gp1,
                         dt$low_gp2,
                         dt$low_gp3,
                         dt$low_gp4), 
            upper = list(dt$hi_gp1,
                         dt$hi_gp2,
                         dt$hi_gp3,
                         dt$hi_gp4),
            ci_column = c(4, 7),
            ref_line = 1,
            vert_line = c(0.5, 2),
            nudge_y = 0.4,
            theme = tm)

plot(p)
```

It is clear that `forest` uses the provided data as the skeleton for the forest plot. You can use your imagination to place any content in a cell, including line breaks. Please refer to the other vignette for instructions on how to modify text alignment.

# Different Parameters for Different CI Columns

When a forest plot has multiple columns, you may want to apply different settings to each one. For example, different CI columns can have distinct `xlim`, x-axis ticks, x-axis labels, `x_trans` transformations, reference lines, vertical lines, or arrow labels. This can be easily achieved by providing a list or a vector. Use a list for `xlim`, `vert_line`, `arrow_lab`, and `ticks_at`, and an atomic vector for `xlab`, `x_trans`, and `ref_line`. See the example below for a demonstration.

```{r multiple-param, out.width="70%", fig.width = 10, fig.height = 6.5}
dt$`HR (95% CI)` <- ifelse(is.na(dt$est_gp1), "",
                             sprintf("%.2f (%.2f to %.2f)",
                                     dt$est_gp1, dt$low_gp1, dt$hi_gp1))
dt$`Beta (95% CI)` <- ifelse(is.na(dt$est_gp2), "",
                             sprintf("%.2f (%.2f to %.2f)",
                                     dt$est_gp2, dt$low_gp2, dt$hi_gp2))

tm <- forest_theme(arrow_type = "closed",
                   arrow_label_just = "end")

# Columns: Subgroup, blank CI column, HR text, blank CI column, Beta text
p <- forest(dt[, c(1, 21, 25, 22, 26)],
            est = list(dt$est_gp1,
                       dt$est_gp2),
            lower = list(dt$low_gp1,
                         dt$low_gp2), 
            upper = list(dt$hi_gp1,
                         dt$hi_gp2),
            ci_column = c(2, 4),
            ref_line = c(1, 0),
            vert_line = list(c(0.3, 1.4), c(0.6, 2)),
            x_trans = c("log", "none"),
            arrow_lab = list(c("L1", "R1"), c("L2", "R2")),
            xlim = list(c(0, 3), c(-1, 3)),
            ticks_at = list(c(0.1, 0.5, 1, 2.5), c(-1, 0, 2)),
            xlab = c("OR", "Beta"),
            nudge_y = 0.2,
            theme = tm)

plot(p)
```

# Custom CIs

It is possible to pass a custom CI drawing function to `forest`. The `fn_ci` argument accepts a CI drawing function for normal confidence intervals, while `fn_summary` is used for summary CIs. Other parameters for these functions can be passed via `forest`. If you need to pass row values such as `est` and `lower` to these functions, you must define the names of the parameters you have passed in `index_args`. This is an advanced technique, and this vignette does not cover how to create a CI drawing function. However, you can find tutorials [here](https://www.stat.auckland.ac.nz/~paul/RG3e/chapter8.html) if you are interested. Below is an example of how to use a box plot CI with the built-in `make_boxplot` function.

```{r custom-ci, out.width="70%", fig.width = 3, fig.height = 3}
# Function to calculate Box plot values
box_func <- function(x){
  iqr <- IQR(x)
  q3 <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c("min" = q3[1] - 1.5 * iqr, "q1" = q3[1], "med" = q3[2],
    "q3" = q3[3], "max" = q3[3] + 1.5 * iqr)
}
# Prepare data
val <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))
val <- lapply(val, box_func)

dat <- do.call(rbind, val)
dat <- data.frame(Dose = row.names(dat),
                  dat, row.names = NULL)

dat$Box <- paste(rep(" ", 20), collapse = " ")

# Draw a single group box plot
tm <- forest_theme(ci_Theight = 0.2)

p <- forest(dat[, c(1, 7)],
            est = dat$med,
            lower = dat$min,
            upper = dat$max,
            # sizes = sizes,
            fn_ci = make_boxplot,
            ci_column = 2,
            lowhinge = dat$q1, 
            uphinge = dat$q3,
            hinge_height = 0.2,
            # values of the lowhinge and uphinge will be used as row values
            index_args = c("lowhinge", "uphinge"), 
            gp_box = gpar(fill = "black", alpha = 0.4),
            theme = tm
)
p
```
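
To see what `box_func` passes on to `make_boxplot`, here is a small base R check on a toy vector. `box_func_demo` repeats the definition above under a different name so that the chunk is self-contained; the input `1:9` is an arbitrary illustration.

```{r sketch-box-values}
# Same computation as box_func above, renamed to keep this chunk standalone
box_func_demo <- function(x) {
  iqr <- IQR(x)
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(min = q[1] - 1.5 * iqr, q1 = q[1], med = q[2],
    q3 = q[3], max = q[3] + 1.5 * iqr)
}

box_func_demo(1:9)
```

Note that `min` and `max` here are the quartiles minus or plus 1.5 times the IQR, not the smallest and largest observations, so the whiskers can extend beyond the range of the data.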

# Saving the Plot

You can use either the base R method or the `ggsave` function to save the plot. When using `ggsave`, be sure to specify the `plot` parameter. The width and height should be adjusted to achieve the desired output. Alternatively, you can set `autofit = TRUE` in the `print` or `plot` function to automatically fit the plot, though this may result in a layout that is not as compact as desired.

```{r eval=FALSE}
# Base method
png('rplot.png', res = 300, width = 7.5, height = 7.5, units = "in")
plot(p) # explicit call, so this also works when the script is sourced
dev.off()

# ggsave function
ggplot2::ggsave(filename = "rplot.png", plot = p,
                dpi = 300,
                width = 7.5, height = 7.5, units = "in")
```

Alternatively, you can retrieve the width and height of the forest plot using `get_wh` and use these dimensions when saving.

```{r eval=FALSE}
# Get width and height
p_wh <- get_wh(plot = p, unit = "in")
png('rplot.png', res = 300, width = p_wh[1], height = p_wh[2], units = "in")
plot(p) # explicit call, so this also works when the script is sourced
dev.off()

# Or get scale
get_scale <- function(plot,
                      width_wanted,
                      height_wanted,
                      unit = "in"){
  h <- convertHeight(sum(plot$heights), unit, TRUE)
  w <- convertWidth(sum(plot$widths), unit, TRUE)
  max(c(w / width_wanted,  h / height_wanted))
}
p_sc <- get_scale(plot = p, width_wanted = 6, height_wanted = 4, unit = "in")
ggplot2::ggsave(filename = "rplot.png", 
                plot = p,
                dpi = 300,
                width = 6, 
                height = 4,
                units = "in",
                scale = p_sc)
```
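
The arithmetic inside `get_scale` can be followed with plain numbers. The natural size used below (8 by 5 inches) is an assumed example value, not a measurement of any plot in this vignette.

```{r sketch-scale}
natural <- c(width = 8, height = 5)  # assumed natural plot size, in inches
wanted  <- c(width = 6, height = 4)  # requested output size, in inches

# Same rule as get_scale(): the larger of the two ratios
sc <- max(natural / wanted)
sc

# ggsave() multiplies the requested size by `scale`
wanted * sc
```

Taking the larger ratio means the scaled device is at least as large as the plot in both directions.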

# FAQs

**Q: The whisker/CI plot area is too narrow. What should I do?**

**A:** Increase the number of blank spaces in the column where the CI is drawn; the width of the CI area follows the width of that column. The first example in this vignette shows how to do this.

**Q: Can I modify the width and height of each row and column?**

**A:** Yes. Although the data's content determines the initial dimensions of the rows and columns, you can modify them after plotting. For details, see the discussion [here](https://github.com/adayim/forestploter/issues/30#issuecomment-1459038988). You can also add padding to each cell by using `core = list(padding = unit(c(4, 3), "mm"))` in `forest_theme`.

**Q: How should I use weights for sizes?**

**A:** The `forest` function uses the `sizes` argument as is, without any transformation. If you need to weight the sizes yourself, you can find some options discussed [here](https://github.com/adayim/forestploter/issues/37#issuecomment-1450208581).
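
As one possible approach (an assumption of this sketch, not a recommendation from the package), sizes can be derived from inverse-variance weights and rescaled to a convenient range before being passed to `sizes`. The standard errors and the range limits below are made-up illustration values, and `rescale_sizes` is a hypothetical helper.

```{r sketch-sizes}
# Hypothetical helper: map weights linearly onto the interval [lo, hi]
rescale_sizes <- function(w, lo = 0.4, hi = 1) {
  rng <- range(w, na.rm = TRUE)
  # All weights equal: fall back to the midpoint of the interval
  if (diff(rng) == 0) return(rep((lo + hi) / 2, length(w)))
  lo + (w - rng[1]) / diff(rng) * (hi - lo)
}

se_demo <- c(0.10, 0.20, 0.40)  # made-up standard errors
w_demo  <- 1 / se_demo^2        # inverse-variance weights
rescale_sizes(w_demo)           # most precise estimate gets the largest size
```

The result could then be supplied as `sizes = rescale_sizes(w_demo)` in a call to `forest`.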

**Q: How can I create a grouped forest plot?**

**A:** You can indicate group breaks by leaving a few blank lines in your data. Alternatively, you can combine multiple forest plots using `arrangeGrob` from the `gridExtra` package or `wrap_elements` from `patchwork`.
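
One way to read the first suggestion is to build the layout data with a heading row per group whose estimates are `NA`, much like the subgroup headings in the example data. The sketch below uses base R only; the numbers are made up for illustration, and the names `res_demo` and `layout_demo` as well as the column names are assumptions.

```{r sketch-group-rows}
# Made-up results in long format: one row per subgroup level
res_demo <- data.frame(
  Group = c("Sex", "Sex", "Age", "Age"),
  Level = c("Male", "Female", "<65", ">=65"),
  est   = c(1.10, 0.90, 1.30, 0.80),
  low   = c(0.80, 0.60, 0.90, 0.50),
  hi    = c(1.50, 1.30, 1.80, 1.20),
  stringsAsFactors = FALSE
)

# For each group, add a heading row with NA estimates and indent its levels
rows <- lapply(unique(res_demo$Group), function(g) {
  body <- res_demo[res_demo$Group == g, ]
  data.frame(
    Subgroup = c(g, paste0("   ", body$Level)),
    est = c(NA_real_, body$est),
    low = c(NA_real_, body$low),
    hi  = c(NA_real_, body$hi),
    stringsAsFactors = FALSE
  )
})
layout_demo <- do.call(rbind, rows)
layout_demo
```

After adding a blank CI column as in the first example, a data frame shaped like this can serve as the layout passed to `forest`.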


