---
title: "irrCAC-benchmarking"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{irrCAC-benchmarking}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(irrCAC)
```

# Abstract

The **irrCAC** is an R package that provides several functions for calculating various chance-corrected agreement coefficients (see [overview.html](overview.html) for ageneral overview), their weighted versions using various weights (see [weighting.html](weighting.html) for a more detailed discusssion on the weighting of agreement coefficients). In this document, I like to show you how you would implement the benchmarking approach discussed in chapter 6 of Gwet (2014). This package closely follows the general framework of inter-rater reliability assessment presented by Gwet (2014).

In a nutshell, the problem consists of qualifying the magnitude of a given agreement coeffficient as *poor*, *good*, *very good*, or something else.  There are essentially 3 bechmarking scales that have been mentioned in the inter-rater reliability, and which are covered in this package. These are the **Landis-Koch**, **Altman**, and **Fleiss** bechmarking scales defined as follows:
```{r}
landis.koch
altman
fleiss
```
These are data frames that become available to you as soon as you you install the irrCAC package.  

#
# Interpreting the magnitude of agreement coeeficients

Suppose that you computed Gwet's $\mbox{AC}_1$ coefficient using raw ratings from the dataset **cac.raw4raters**. You now want to qualify the magnitude of this coefficient using one of the benchmarking scales. Although you would normally choose one of the 3 benchmarking scales, I will use all 3 for illustration purposes.  You would proceed as follows:
```{r}
  ac1 <- gwet.ac1.raw(cac.raw4raters)$est
  data.frame(ac1$coeff.val, ac1$coeff.se)
  landis.koch.bf(ac1$coeff.val, ac1$coeff.se)
  altman.bf(ac1$coeff.val, ac1$coeff.se)
  fleiss.bf(ac1$coeff.val, ac1$coeff.se)
```
Each of the functions *landis.koch.bf(ac1\$coeff.val, ac1\$coeff.se)*, *altman.bf(ac1\$coeff.val, ac1\$coeff.se)*, and *fleiss.bf(ac1\$coeff.val, ac1\$coeff.se)* produces 2 columns: the agreement strength level, and the associated cumulative membership probability (*CumProb*). *CumProb* represents the probability that the true agreement strength level is the one associated with the probability or one that is better. I recommended in Gwet (2014) to retain an agreement strength level that is associated with *ComProb* that exceeds 0.95. 
* Landis-Koch Scale
  Since $\mbox{AC}_1=0.775$ then according to the Landoch-Koch benchmarking scale, this 
  agreement coefficient is deemed *Moderate* since the associated *CumProb* exceeds the 
  thereshold of 0.95. It cannot be qualified as *substantial* due to its standard error 
  of 0.14295 being high. See chapter 4 of Gwet (2014) for a more detailed discussion on this topic. 
  
* Altman Scale
  According to Altman's benchmarking scale, the $\mbox{AC}_1$ agreement coefficient of 0.775 is
  *Moderate* as well. 
  
* Fleiss Scale
  According to Fleiss' benchmarking scale, the $\mbox{AC}_1$ agreement coefficient of 0.775 is
  *Intermediate to Good*. 

#
# References:
1. Gwet, K.L. (2014). "*Handbook of Inter-Rater Reliability*," 4th Edition. Advanced Analytics, LLC