---
title: "MacKay's ITILA Examples"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MacKay's ITILA Examples}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 6,
  fig.height = 6
)
```

```{r setup, message = FALSE}
library(gghinton)
library(ggplot2)
```

I first came across Hinton diagrams in David MacKay's excellent book _Information Theory, Inference, and Learning Algorithms_ ([ITILA](https://www.inference.org.uk/mackay/itila/book.html), Cambridge University Press, 2003). 
Here we recreate some examples from Chapter 2, visualising discrete probability distributions over characters and character pairs in English text.


## The MacKay colour scheme

MacKay's convention in this chapter is the inverse of the default
`gghinton` style: white squares on a black background, with square area
proportional to probability. It's simple enough to change this by updating the theme.

MacKay's figures use unsigned data (all probabilities are non-negative), so
`scale_fill_hinton(values = c(unsigned = "white"))` combined with a black panel
background reproduces his style:

```{r theme-mackay}
theme_mackay <- function() {
  theme_hinton() +
    theme(
      panel.background = element_rect(fill = "black", colour = NA),
      panel.border = element_rect(colour = "grey30", fill = NA,
                                  linewidth = 0.4), 
      axis.text = element_text(size = 12, family = "mono")
    )
}
```

---

## Figure 2.1: discrete unigram probabilities

MacKay's Figure 2.1 gives the unigram (single-character) probabilities,
estimated from the Linux FAQ, which can be reproduced directly:

```{r unigram}
chars27     <- c(letters, " ")
axis_labels <- c(letters, "_")

# Probabilities from MacKay ITILA Table / Figure 2.1
p_char <- c(
  a = 0.0575, b = 0.0128, c = 0.0263, d = 0.0285, e = 0.0913,
  f = 0.0173, g = 0.0133, h = 0.0313, i = 0.0599, j = 0.0006,
  k = 0.0084, l = 0.0335, m = 0.0235, n = 0.0596, o = 0.0689,
  p = 0.0192, q = 0.0008, r = 0.0508, s = 0.0567, t = 0.0706,
  u = 0.0334, v = 0.0069, w = 0.0119, x = 0.0073, y = 0.0164,
  z = 0.0007, ` ` = 0.1928
)

# Display as a single-column Hinton diagram (27x1 matrix, one column)
unigram_mat <- matrix(p_char, nrow = length(p_char), ncol = 1,
                      dimnames = list(chars27, "p"))
df_uni <- matrix_to_hinton(unigram_mat)

ggplot(df_uni, aes(x = col, y = row, weight = weight)) +
  geom_hinton() +
  scale_fill_hinton(values = c(unsigned = "white")) +
  scale_y_continuous(breaks = seq_along(chars27),
                     labels = rev(axis_labels),
                     expand = c(0.02, 0.02)) +
  scale_x_continuous(breaks = NULL) +
  coord_fixed() +
  theme_mackay() +
  theme(axis.text.y = element_text(size = 8, family = "mono")) +
  labs(
    x        = NULL,
    y        = NULL
  )
```

![ITILA fig 2.1 original](../img/itila-fig2.1.png)

---

## Figure 2.2: English letter bigrams

MacKay's Figure 2.2 shows the joint probability distribution $P(x, y)$ over the
$27 \times 27 = 729$ possible bigrams (letter pairs) in English text -- the 26
letters plus space (shown as `_`).  The source in the book is _The Frequently
Asked Questions Manual for Linux_; we use the full text of
*Alice's Adventures in Wonderland* (Lewis Carroll, 1865; Project Gutenberg
item 11, public domain) instead, shipped as the `alice_bigrams` dataset in this package.
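For reference, a bigram count table of this shape can be built from raw text in
a few lines of base R. This is an illustrative sketch only -- not necessarily
how `alice_bigrams` was generated:

```{r bigram-build-demo}
# Map text to the 27-character alphabet, then tally adjacent pairs
count_bigrams <- function(txt) {
  chars <- c(letters, " ")
  txt   <- gsub("[^a-z]+", " ", tolower(txt))   # collapse punctuation/digits to spaces
  ch    <- strsplit(txt, "")[[1]]
  m     <- matrix(0L, 27, 27, dimnames = list(chars, chars))
  for (i in seq_len(length(ch) - 1)) {
    m[ch[i], ch[i + 1]] <- m[ch[i], ch[i + 1]] + 1L
  }
  m
}

demo <- count_bigrams("Alice was beginning to get very tired")
demo["t", "i"]   # 't' followed by 'i' (once, in "tired")
sum(demo)        # one bigram per adjacent character pair
```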

```{r bigram-compute}
# alice_bigrams[x, y] = count of character x immediately followed by y
bg_prob <- alice_bigrams / sum(alice_bigrams)

# Axis labels: a-z then "_" for space (MacKay's convention)
chars27     <- c(letters, " ")
axis_labels <- c(letters, "_")

df_bg <- matrix_to_hinton(bg_prob)
```

```{r bigram-plot, fig.width = 7.5, fig.height = 7.5}
ggplot(df_bg, aes(x = col, y = row, weight = weight)) +
  geom_hinton() +
  scale_fill_hinton(values = c(unsigned = "white")) +
  # x: column 1 = 'a', column 27 = '_' (space)
  scale_x_continuous(
    breaks = seq_along(chars27),
    labels = axis_labels,
    expand = c(0.02, 0.02)
  ) +
  # y: row 1 (matrix row 'a') maps to highest y; labels reversed so 'a' is at top
  scale_y_continuous(
    breaks = seq_along(chars27),
    labels = rev(axis_labels),
    expand = c(0.02, 0.02)
  ) +
  coord_fixed() +
  theme_mackay() +
  labs(
    title    = "English letter bigrams: joint probability P(x, y)",
    subtitle = "Recreating MacKay ITILA Figure 2.2 (source: Alice in Wonderland)",
    x        = "y (second character)",
    y        = "x (first character)"
  )
```

![ITILA fig 2.2 original](../img/itila-fig2.2.png)

```{r corpus-size}
# Fraction of the 729 cells with at least one observed bigram
mean(alice_bigrams > 0)
# Total bigrams observed
sum(alice_bigrams)
```

---

## Figure 2.3: Conditional probability distributions

Normalising each row of the joint bigram matrix by its row sum gives
$P(y \mid x)$ -- the distribution over second characters given the first.
Normalising each column by its column sum gives $P(x \mid y)$ -- the
distribution over first characters given the second. MacKay's Figure 2.3
displays both as Hinton diagrams side by side.
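Both normalisations can be checked on a toy count matrix, independent of the
`alice_bigrams` data (a small base-R sketch):

```{r cond-toy-check}
# 3x3 toy counts standing in for the bigram table
counts <- matrix(c(4, 1, 0,
                   2, 6, 2,
                   0, 3, 7), nrow = 3, byrow = TRUE)

toy_yx <- sweep(counts, 1, rowSums(counts), "/")  # P(y|x): rows sum to 1
toy_xy <- sweep(counts, 2, colSums(counts), "/")  # P(x|y): columns sum to 1

rowSums(toy_yx)
colSums(toy_xy)
```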

```{r cond-compute}
# P(y|x): row-normalise -- each row sums to 1
row_sums <- rowSums(alice_bigrams)
cond_yx  <- sweep(alice_bigrams, 1, row_sums, "/")  # M[x, y] = P(y | first = x)

# P(x|y): column-normalise -- each column sums to 1
col_sums <- colSums(alice_bigrams)
cond_xy  <- sweep(alice_bigrams, 2, col_sums, "/")  # M[x, y] = P(x | second = y)

# Combine into one data frame for faceting
df_yx <- matrix_to_hinton(cond_yx)
df_xy <- matrix_to_hinton(cond_xy)
df_yx$panel <- "(a) P(y | x)"
df_xy$panel <- "(b) P(x | y)"
df_cond <- rbind(df_yx, df_xy)
```

```{r cond-plot, fig.width = 15, fig.height = 7.5}
ggplot(df_cond, aes(x = col, y = row, weight = weight)) +
  geom_hinton() +
  scale_fill_hinton(values = c(unsigned = "white")) +
  scale_x_continuous(breaks = seq_along(chars27), labels = axis_labels,
                     expand = c(0.02, 0.02)) +
  scale_y_continuous(breaks = seq_along(chars27), labels = rev(axis_labels),
                     expand = c(0.02, 0.02)) +
  coord_fixed() +
  facet_wrap(~ panel, ncol = 2) +
  theme_mackay() +
  labs(
    title    = "English letter bigrams: conditional probabilities P(y|x) and P(x|y)",
    subtitle = "Recreating MacKay ITILA Figure 2.3",
    x        = "y (second character)",
    y        = "x (first character)"
  )
```

![ITILA fig 2.3 original](../img/itila-fig2.3.png)

---

## Figure 2.5: Bill and Fred's urn problem

MacKay introduces this joint distribution to illustrate Bayesian inference
(ITILA Exercise 2.3).

Setup: An urn contains $N = 10$ balls.  Fred draws $u$, the number of black
balls, from a uniform prior $P(u) = 1/11$ for $u = 0, 1, \ldots, 10$.  Bill then
draws $N = 10$ balls with replacement and observes $n_B$ black balls.  The joint
distribution is:

$$P(u, n_B) = P(u) \, P(n_B \mid u, N), \qquad P(n_B \mid u, N) = \mathrm{Binomial}(n_B;\; N = 10,\; p = u/10)$$
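The conditional factor is just the binomial pmf. As a quick self-contained
check (with an arbitrary $(u, n_B)$ pair), `dbinom` matches the formula written
out by hand:

```{r binom-check}
# Binomial pmf by hand vs. dbinom, for one arbitrary (u, nB) pair
N <- 10; u <- 4; nB <- 3
p <- u / N
manual <- choose(N, nB) * p^nB * (1 - p)^(N - nB)
all.equal(manual, dbinom(nB, size = N, prob = p))
```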

```{r urn-compute}
N       <- 10L
u_vals  <- 0:N   # number of black balls in the urn (Fred's choice)
nB_vals <- 0:N   # number of black balls observed in N draws (Bill's data)

# Rows = u (0..10), columns = n_B (0..10)
joint_mat <- outer(u_vals, nB_vals, function(u, nB) {
  (1 / (N + 1)) * dbinom(nB, size = N, prob = u / N)
})
rownames(joint_mat) <- u_vals
colnames(joint_mat) <- nB_vals

df_urn <- matrix_to_hinton(joint_mat)
```

```{r urn-plot, fig.width = 6, fig.height = 6}
ggplot(df_urn, aes(x = col, y = row, weight = weight)) +
  geom_hinton() +
  scale_fill_hinton(values = c(unsigned = "white")) +
  # row 1 of the matrix (u = 0) maps to the highest y, so labels are reversed
  scale_x_continuous(breaks = 1:(N + 1L), labels = nB_vals,
                     expand = c(0.04, 0.04)) +
  scale_y_continuous(breaks = 1:(N + 1L), labels = rev(u_vals),
                     expand = c(0.04, 0.04)) +
  coord_fixed() +
  theme_mackay() +
  labs(
    title    = "Joint probability P(u, n_B | N = 10)",
    subtitle = "Recreating MacKay ITILA Figure 2.5",
    x        = expression(n[B]~~"(observed black balls)"),
    y        = expression(u~~"(black balls in urn)")
  )
```

The dominant diagonal reflects that $n_B$ is most probable near $u$; the corner
cells are deterministic, since $u = 0$ forces $n_B = 0$ and $u = 10$ forces
$n_B = 10$.  This structure is immediately legible in the Hinton diagram but
would be hard to read in a table of 121 numbers.
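Conditioning the same joint table on an observed column gives Bill's posterior
over $u$. A base-R sketch, rebuilding the table so the chunk stands alone, with
$n_B = 3$ chosen purely for illustration:

```{r urn-posterior}
# Rebuild the joint table: rows u = 0..10, columns nB = 0..10
N <- 10
joint <- outer(0:N, 0:N, function(u, nB)
  dbinom(nB, size = N, prob = u / N) / (N + 1))
sum(joint)                                   # the joint distribution sums to 1

# Posterior P(u | nB = 3): normalise the nB = 3 column (column index 4)
posterior_u <- joint[, 4] / sum(joint[, 4])
round(posterior_u, 3)
```

The posterior peaks at $u = 3$, as expected: having seen 3 black balls in 10
draws, the most probable urn composition is 3 black balls.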

![ITILA fig 2.5 original](../img/itila-fig2.5.png)
