---
title: "Biclustering and Ranklustering"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Biclustering and Ranklustering}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  out.width = "100%"
)
```

> **Note**: Some computationally intensive examples below are shown with `eval=FALSE` to keep CRAN build times short. For full rendered output, see the [pkgdown site](https://kosugitti.github.io/exametrika/articles/biclustering.html).

```{r library, message=FALSE, warning=FALSE}
library(exametrika)
```

## Biclustering and Ranklustering

Biclustering and Ranklustering simultaneously cluster items into fields and examinees into classes/ranks. The difference is specified via the `method` option:

- `method = "B"`: Biclustering (no filtering matrix)
- `method = "R"`: Ranklustering (with filtering matrix for ordered ranks)

### Biclustering

```{r model-biclustering, message=FALSE, warning=FALSE}
Biclustering(J35S515, nfld = 5, ncls = 6, method = "B")
```

### Ranklustering

```{r model-ranklustering, message=FALSE, warning=FALSE}
result.Ranklustering <- Biclustering(J35S515, nfld = 5, ncls = 6, method = "R")
```

```{r plot-ranklustering, fig.width=7, fig.height=5, message=FALSE, warning=FALSE}
plot(result.Ranklustering, type = "Array")
plot(result.Ranklustering, type = "FRP", nc = 2, nr = 3)
plot(result.Ranklustering, type = "RRV")
plot(result.Ranklustering, type = "RMP", students = 1:9, nc = 3, nr = 3)
plot(result.Ranklustering, type = "LRD")
```

## Finding Optimal Number of Classes and Fields

### Grid Search

`GridSearch()` systematically evaluates multiple parameter combinations and selects the best-fitting model:

```{r grid-search, eval=FALSE}
result <- GridSearch(J35S515, method = "R", max_ncls = 10, max_nfld = 10, index = "BIC")
result$optimal_ncls
result$optimal_nfld
result$optimal_result
```

### Infinite Relational Model (IRM)

The IRM uses the Chinese Restaurant Process to automatically determine the optimal number of fields and classes:

```{r model-irm, eval=FALSE}
result.IRM <- Biclustering_IRM(J35S515, gamma_c = 1, gamma_f = 1, verbose = TRUE)
plot(result.IRM, type = "Array")
plot(result.IRM, type = "FRP", nc = 3)
plot(result.IRM, type = "TRP")
```

## Biclustering for Polytomous Data

### Ordinal Data

```{r bic-ord, eval=FALSE}
result.B.ord <- Biclustering(J35S500, ncls = 5, nfld = 5, method = "R")
result.B.ord
plot(result.B.ord, type = "Array")
```

FRP (Field Reference Profile) shows the expected score per field across latent ranks:

```{r bic-ord-frp, eval=FALSE}
plot(result.B.ord, type = "FRP", nc = 3, nr = 2)
```

FCRP (Field Category Response Profile) shows category probabilities across ranks. The `style` parameter can be `"line"` or `"bar"`:

```{r bic-ord-fcrp, eval=FALSE}
plot(result.B.ord, type = "FCRP", nc = 3, nr = 2)
plot(result.B.ord, type = "FCRP", style = "bar", nc = 3, nr = 2)
```

FCBR (Field Cumulative Boundary Reference) shows cumulative boundary probabilities (ordinal only):

```{r bic-ord-fcbr, eval=FALSE}
plot(result.B.ord, type = "FCBR", nc = 3, nr = 2)
```

ScoreField and RRV plots:

```{r bic-ord-scorefield, eval=FALSE}
plot(result.B.ord, type = "ScoreField")
plot(result.B.ord, type = "RRV")
```

### Nominal Data

```{r bic-nom, eval=FALSE}
result.B.nom <- Biclustering(J20S600, ncls = 5, nfld = 4)
result.B.nom
plot(result.B.nom, type = "Array")
```

Nominal Biclustering supports FRP, FCRP, ScoreField, and RRV (but not FCBR):

```{r bic-nom-plots, eval=FALSE}
plot(result.B.nom, type = "FRP", nc = 2, nr = 2)
plot(result.B.nom, type = "FCRP", nc = 2, nr = 2)
plot(result.B.nom, type = "FCRP", style = "bar", nc = 2, nr = 2)
plot(result.B.nom, type = "ScoreField")
plot(result.B.nom, type = "RRV")
```

### Rated Data (Multiple-Choice with Correct Answers)

Rated data has both multiple response categories and correct answers. `Biclustering.rated()` internally runs nominal Biclustering, then sorts classes by correct response rate:

```{r bic-rated, eval=FALSE}
result.B.rated <- Biclustering(J21S300, ncls = 5, nfld = 3, method = "R", maxiter = 300)
result.B.rated
plot(result.B.rated, type = "Array")
plot(result.B.rated, type = "FRP", nc = 3, nr = 1)
```

Two layers of fit indices are reported: binary (with CFI/RMSEA) and nominal (AIC/BIC/CAIC only). Access them via `result$TestFitIndices` and `result$TestFitIndices_nominal`.

### Rated IRM

The IRM also supports rated data:

```{r irm-rated, eval=FALSE}
result.IRM.rated <- Biclustering_IRM(J21S300, gamma_c = 1, gamma_f = 1, verbose = TRUE)
plot(result.IRM.rated, type = "Array")
plot(result.IRM.rated, type = "FRP", nc = 3, nr = 1)
```

## Distractor Analysis

`DistractorAnalysis()` examines how examinees in each rank/class respond to each item's categories. It computes observed frequencies, proportions, chi-square statistics, p-values, and Cramer's V (effect size) for each item-by-rank cell.

```{r distractor, eval=FALSE}
result.B.rated <- Biclustering(J21S300, ncls = 5, nfld = 3, method = "R", maxiter = 300)
da <- DistractorAnalysis(result.B.rated)

# Full output (grouped by field for Biclustering)
print(da)

# Filter by items and/or ranks
print(da, items = 1:7, ranks = c(1, 5))

# Plot distractor bar charts
plot(da, items = 1:6, nc = 3, nr = 2)
```

`DistractorAnalysis()` also works with `LRA()` results for rated data:

```{r distractor-lra, eval=FALSE}
result.LRA.rated <- LRA(J21S300, nrank = 5, mic = TRUE)
da_lra <- DistractorAnalysis(result.LRA.rated)
print(da_lra, items = 1:3)
plot(da_lra, items = 1:6, nc = 3, nr = 2)
```

## Reference

Shojima, K. (2022). *Test Data Engineering*. Springer.
