---
title: "Public perception of climate change: the `icons` dataset in the `hyper2` package"
author: "Robin K. S. Hankin"
bibliography: hyper2.bib
link-citations: true
output: html_vignette
vignette: |
  %\VignetteIndexEntry{icons}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE}
library("hyper2",quietly=TRUE)
set.seed(0)
knitr::opts_chunk$set(echo = TRUE)
```

```{r label="hexsticker",out.width='20%', out.extra='style="float:right; padding:10px"',echo=FALSE}
knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))
```

To cite the `hyper2` package in publications, please use @hankin2017_rmd.
This short document analyses the climate change dataset introduced by
@oneill2008 and discussed in the `hyperdirichlet` R package
[@hankin2010], but using the improved `hyper2` package instead.
@oneill2009 observe that lay perception of climate change is a
complex and interesting process, and here we assess the engagement of
non-experts by the use of "icons" (this word is standard in this
context.  An icon is a "representative symbol") that illustrate
different impacts of climate change.

Here we analyse results of one experiment [@oneill2008], in which
subjects were presented with a set of icons of climate change and
asked to identify which of them they found most concerning.  Six icons
were used: PB [polar bears, which face extinction through loss of ice
floe hunting grounds], NB [the Norfolk Broads, which flood due to
intense rainfall events], L [London flooding, as a result of sea level
rise], THC [the thermo-haline circulation, which may slow or stop as a
result of anthropogenic modification of the water cycle], OA [oceanic
acidification as a result of anthropogenic emissions of ${\mathrm
C}{\mathrm O}_2$], and WAIS [the West Antarctic Ice Sheet, which is
rapidly calving as a result of climate change].  Methodological
constraints dictated that each respondent could be presented with a
maximum of four icons.  The R idiom below (dataset `icons` in the
package) shows the experimental results.

```{r label=loadlib}
library("hyper2",quietly=TRUE)
M <- icons_table # saves typing
M
```

(M is called `icons_table` in the package).  Each row of `M`
corresponds to a particular cue given to respondents.  The first row,
for example, means that a total of $5+3+4+3=15$ people were shown
icons NB, L, THC, WAIS [column names of the the non-NA entries]; 5
people chose NB as most concerning, 3 chose L, and so on.  The dataset
is more fully described in the package.  The builtin `icons`
likelihood function in the `hyper2` package may be created by using
the `saffy()` function:

```{r showicons}
icons
icons == saffy(icons_table)  # should be TRUE
```

At this point, the `icons` object as created above is mathematically
identical to the `icons` object in the `hyperdirichlet` package (and
indeed the `hyper2` package), but the terms might appear in a
different order due to `disordR` discipline.

## Analysis of the icons dataset

The first step is to find the maximum
likelihood estimate for the `icons` likelihood:

```{r estmaxlike,cache=TRUE}
options("digits" = 4)
(mic <- maxp(icons))
dotchart(mic,pch=16)
```

We also need the log-likelihood at an unconstrained maximum:

```{r calclog}
L1 <- loglik(indep(mic),icons)
options(digits=9)
L1
```

We see agreement to 4 decimal places with the value given in the
`hyperdirichlet` package.  The next step is to assess a number of
hypotheses concerning the relative sizes of $p_1$ through $p_6$.

# Hypothesis testing.

Below, I investigate a number of inequality hypotheses about
$p_1,\ldots, p_6$.  The general analysis proceeds as follows; we can
use $H_0\colon p_1=1/6$ as an example but the method applies to any
observation.  I consider the general case first, then modifications
for one-sided hypotheses.

* We have an observation ${\mathcal O}$  and 
wish to quantify the strength of evidence to support this.
* Translate our observation into a hypothesis phrased in terms as a
 restriction on parameter space.
* State a null hypothesis and alternative.  For example, we might
  have $H_0\colon p_1=1/6$ and $H_A\colon p_1\neq 1/6$, and $H_A$ is the complement of $H_0$.
  We sometimes write $H_A\colon\sum p_i=1$, ignoring the (measure zero) $H_0$.
* We then attempt to reject $H_0$ and to do this we perform two
  likelihood maximizations: one unconstrained, corresponding
  to $H_A$, and one constrained, corresponding to $H_0$.
* We calculate ${\mathcal L}_{H_0} :=\operatorname{argmax}_{p\in
  H_0}{\mathcal L}(p)$ and ${\mathcal L}_{H_A} :=
  \operatorname{argmax}_{p\in H_A}{\mathcal L}(p)$ and observe that, in general,
  ${\mathcal O}$ implies the strict inequality
  ${\mathcal L}_{H_A} > {\mathcal L}_{H_0}$.
* Calculate the likelihood ratio
  $R={\mathcal L}_{H_A}/{\mathcal L}_{H_0}>1$ or equivalently the
  _support_ for $H_0$, $\Lambda = \log R>0$.
* We may either use Edwards's criterion of two units of support per
  degree of freedom, or Wilks's theorem:
  $H_0\longrightarrow 2\Lambda\sim\chi^2_d$.  If we are
  considering inequality hypotheses, we have $d=1$. 
* If the criterion is met we may reject $H_0$ and infer
  that $H_A$ is the case.

As another example, consider $H_0\colon p_1=p_2=p_3$.  The vector
${\mathbf p}\in \left\{\left.\left(p_1,\ldots,p_6\right)\right| \sum
p_i=1\right\}$ is a point in a 5-dimensional manifold, and $H_0$
asserts that ${\mathbf p}\in
\left\{\left.\left(p_1,p_1,p_1,p_4,p_5,p_6\right)\right|
3p_1+p_4+p_5+p_6=1\right\}$, that is, a 3-dimensional manifold.  The
null imposes a loss of two degrees of freedom.

The logic above operates for one-sided tests.  We might observe that
$\hat{p_1}$ is the largest and wish to test a null of $p_1\leqslant
1/6$ against an alternative of $p_1>1/6$.  Note that the one-sided
p-value and likelihood ratio statistic are the same as the two-sided
values.

Below I will rework some of the hypotheses tested in the
hyperdirichlet package, with consistent labelling of null and
alternative hypotheses, and renumbering

## Equality of strengths

The most straightforward null would be the hypothesis of player
equality, specifically:

\[
H_E\colon p_1=p_2=\cdots=p_n=\frac{1}{n}.
\]

("E" for equal).  This was not carried out in @hankin2010, because I
had not thought of it.  Testing $H_E$ is implemented in the package as
`test.equalp()`.  Note that because $H_E$ is a point hypothesis we
would have $n-1$ degrees of freedom (because $H_0$ has $n-1$ df).

```{r doequalp,cache=TRUE}
equalp.test(icons)
```	

## Hypothesis 1: $p_1 = 1/6$

Following the analysis in @hankin2010, and restated above, we first
observe that NB [the Norfolk Broads] is the icon with the largest
estimated probability with $\hat{p_1}\simeq 0.25$ (O'Neill gives a
number of theoretical reasons to expect $p_1$ to be large).  This
would suggest that $p_1$ is in fact large, in some sense, and here I
show how to assess this statement statistically.  Consider the
hypothesis $H_0\colon p_1=1/6$, and as per the protocol above we will
try to reject it.

To that end, we perform a constrained optimization, with (active)
constraint that $p_1\leqslant 1/6$ (note that the inequality
constraint allows us to use fast `maxp()`, which has access to
derivatives).  We note the support at the evaluate and then compare
this support with the support at the unconstrained evaluate; if the
difference in support is large then this constitutes strong evidence
for $H_A$ and then would conclude that $p_1 > 1/6$.  In package idiom,
the optimization is implemented by the `specificp.test()` suite of
functions; these work by imposing additional constraints to the
`maxp()` function via the `fcm` and `fcv` arguments.  Using the
defaults we have:

```{r dospecificp,cache=TRUE}
specificp.test(icons,1)
```	

which tests a null of $p_1\leqslant\frac{1}{6}$; see how the evaluate
under the null is on the boundary and we have $\hat{p_1}=\frac{1}{6}$.
Compare the support of 2.607 with 2.608181 from the `hyperdirichlet`
package.  This exceeds Edwards's two-units-of-support criterion; the
$p$-value is obtained by applying Wilks's theorem on the asymptotic
distribution of 2\log\Lambda.

Both these criteria indicate that we may reject that hypothesis that
$p_1\leqslant 1/6$ and thereby infer $p_1>\frac{1}{6}$.

## Hypothesis 2: $p_1\geqslant\max\left(p_2,\ldots,p_6\right)$

We observe that NB (the Norfolk Broads) has large strength, and
hypothesise that it is in fact stronger than any other icon.  This is
another constrained likelihood maximization, although this one is not
possible with convex constraints.  In the language of the generic
procedure given above, we would have

\[
H_0\colon 
(p_1\leqslant p_2)\cup
(p_1\leqslant p_3)\cup
(p_1\leqslant p_4)\cup
(p_1\leqslant p_5)\cup
(p_1\leqslant p_6)
\]

\[
\overline{H_0}\colon 
(p_1 > p_2)\cap
(p_1 > p_3)\cap
(p_1 > p_4)\cap
(p_1 > p_5)\cap
(p_1 > p_6)
\]

(although note that $H_0$ has nonzero measure).  In words, $H_0$
states that $p_1$ is smaller than $p_2$, or $p_1$ is smaller than
$p_3$, etc; while $\overline{H_0}$ states that $p_1$ is larger than
$p_2$, and is larger than $p_3$, and so on.  Considering the evaluate,
we see that ${\mathcal L}_{H_0} < {\mathcal L}_{\overline{H_0}}$, so
optimizing over $\overline{H_0}$ is equivalent to unconstrained
optimization [of course, the intrinsic constraints $p_i\geqslant 0,
\sum p_i\leqslant 1$ have to be respected].  Here, $\overline{H_0}$ is
regions of $(p_1,\ldots,p_6)$ with $p_1$ being greater than *at least
one* of $p_2,\ldots,p_6$.  The union of convex sets is not necessarily
convex (e.g. a two-way Venn diagram).  As far as I can see, the only
way to do it is to perform a sequence of five constrained
optimizations: $p_1\leqslant p_2, p_1\leqslant p_3, p_1\leqslant p_4,
p_1\leqslant p_5$.  The fillup constraint would be $p_1\leqslant
p_6\longrightarrow 2p_1+p_2+\cdots +p_5\leqslant 1$.  We then choose
the largest likelihood from the five.

```{r largestoffive,cache=TRUE}
o <- function(Ul,Cl,startp,give=FALSE){
    small <- 1e-4  #  ensure start at an interior point
    if(missing(startp)){startp <- small*(1:5)+rep(0.1,5)}			
    out <- maxp(icons, startp=small*(1:5)+rep(0.1,5), give=TRUE, fcm=Ul,fcv=Cl)
    if(give){
        return(out)
    }else{
        return(out$value)
    }
}

p2max <- o(c(-1, 1, 0, 0, 0), 0)  # p1 <= p2
p3max <- o(c(-1, 0, 1, 0, 0), 0)  # p1 <= p3
p4max <- o(c(-1, 0, 0, 1, 0), 0)  # p1 <= p4
p5max <- o(c(-1, 0, 0, 0, 1), 0)  # p1 <= p5
p6max <- o(c(-2,-1,-1,-1,-1),-1)  # p1 <= p6 (fillup)
```

(the final line is different because $p_6$ is the fillup value).

```{r pnmax}
likes <- c(p2max,p3max,p4max,p5max,p6max)
likes
ml <- max(likes) 
ml
```

Thus the first element of `likes` corresponds to the maximum
likelihood, constrained so that $p_1\leqslant p_2$; the second element
corresponds to the constraint that $p_1\leqslant p_3$, and so on.  The
largest likelihood is the easiest constraint to break, in this case
$p_1\leqslant p_3$: this makes sense because $p_3$ has the second
highest MLE after $p_1$.  The extra likelihood is given by

```{r extralike}
L1-ml
```

(the `hyperdirichlet` package gives 0.0853 here, a surprisingly small
discrepancy given the difficulties of optimizing over a nonconvex
region).  We conclude that there is no evidence for
$p_1\geqslant\max\left(p_2,\ldots,p_6\right)$.

It's worth looking at the evaluate too:

```{r evalworth,cache=TRUE}
o2 <- function(Ul,Cl){
  jj <-o(Ul,Cl,give=TRUE)
  out <- c(jj[[1]],1-sum(jj[[1]]),jj[[2]])
  names(out) <- c("p1","p2","p3","p4","p5","p6","support")
  return(out)
}
rbind(
o2(c(-1, 1, 0, 0, 0), 0),  # p1 <= p2
o2(c(-1, 0, 1, 0, 0), 0),  # p1 <= p3
o2(c(-1, 0, 0, 1, 0), 0),  # p1 <= p4
o2(c(-1, 0, 0, 0, 1), 0),  # p1 <= p5
o2(c(-2,-1,-1,-1,-1),-1)   # p1 <= p6
)
```

In the above, the evaluate is the first five columns ($p_6$ being the
fillup) and the final column is the log-likelihood at the evaluate.
See how the constraint is active in each line: `M[1,] == M[1:5,2:6]`.
Also note that the largest log-likelihood is the second row: if we
were to violate any of the constraints, it would be $p_1<p_3$,
consistent with the fact that $p_3$ (polar bears) has the second
highest strength, after $p_1$.

## Low frequency responses

The next hypothesis follows from the observed smallness of $\hat{p_5}$
(ocean acidification) and $\hat{p_6}$ (West Antarctic Ice Sheet) at
0.111 and 0.069 respectively.  These two strengths correspond to
"distant" concerns and O'Neill had reason to consider their sum (which
she argued would be small).  Thus we specify $H_0\colon
p_5+p_6\leqslant \frac{1}{3}$.

The optimizing constraint of $p_5+p_6\geqslant\frac{1}{3}$ translates
to an operational constraint of
$-p_1-p_2-p_3-p_4\geqslant-\frac{2}{3}$ (because
$p_5+p_6=1-p_1-p_2-p_3-p_4$):

```{r lowfreq,cache=TRUE} 
jj <- o(c(-1,-1,-1,-1,0) , -2/3, give=TRUE,start=indep((1:6)/21))$value
jj
```

then the extra support is 

```{r extralowfreq}
L1-jj
```

(compare 7.711396 in `hyperdirichlet`, not sure why the discrepancy is
so large).

## Final example

The final example is motivated by the fact that both the distant
icons $p_5$ and $p_6$ had lower strength than any of the local icons
$p_1,\ldots, p_4$.  Thus we would have
$H_0\colon\max\left\{p_5,p_6\right\}\geqslant\min\left\{p_1,p_2,p_3,p_4\right\}$.
This means the null optimization is constrained so that *at least one*
of $\left\{p_5,p_6\right\}$ exceeds *at least one* of
$\left\{p_1,p_2,p_3,p_4\right\}$.  So we have the union of the various
possibilities:

\begin{equation}\label{eq:Habar}H_0=
\overline{H_A}\colon
\bigcup_{j\in\left\{5,6\right\}\atop k\in\left\{1,2,3,4\right\}}
\left\{\left(p_1,p_2,p_3,p_4,p_5,p_6\right)\left|\sum p_i=1,
p_j\geqslant p_k\right.\right\}
\end{equation}

The fillup value $p_6$ behaves differently in this context and
$p_6\geqslant p_2$, say, translates to $-p_1-2p_2-p_3-p_4-p_5\geqslant
-1$.

```{r ofcourse,cache=TRUE}
small <- 1e-4
start <- indep(c(small,small,small,small,0.5-2*small,0.5-2*small))
jj <- c(
   o(c(-1, 0, 0, 0, 1), 0,start=start),  # p1 >= p5
   o(c( 0,-1, 0, 0, 1), 0,start=start),  # p2 >= p5
   o(c( 0, 0,-1, 0, 1), 0,start=start),  # p3 >= p5
   o(c( 0, 0, 0,-1, 1), 0,start=start),  # p4 >= p5

   o(c(-2,-1,-1,-1,-1),-1,start=start),  # p1 >= p6
   o(c(-1,-2,-1,-1,-1),-1,start=start),  # p2 >= p6
   o(c(-1,-1,-2,-1,-1),-1,start=start),  # p3 >= p6
   o(c(-1,-1,-1,-2,-1),-1,start=start)   # p4 >= p6
   )
jj
```

Above, the elements of vector `jj` are the maximum likelihoods for
each of the separate components of the parameter space allowed under
$H_0$.  Note that these components are not disjoint.  So the maximum
likelihood for the whole of allowable parameters under $H_0$ would be
the maximum of these maxima:

```{r maxjj}
max(jj)
```

This corresponds to the fourth component of $H_0$, viz $p_4=p_5$.  The
extra support is thus

```{r extraofcourse}
L1-max(jj)
```

(compare `hyperdirichlet` which gives 3.16, not sure why the
difference although if pressed I would point to `hyper2` using
multiple walkers in its optimization [defaulting to $n=10$] ).  We
should look at the maximum value:

```{r lookatmax,cache=TRUE}
o(c( 0, 0, 0,-1, 1), 0,give=TRUE,start=start)
```

So the evaluate is at the boundary, for $p_4=p_5$; `THC` and `OA` have
the same Bradley-Terry strength.  The small amount of extra support
given by allowing an unconstrained optimization would suggest that
there is no strong evidence for the contention investigated, viz "both
the distant icons $p_5$ and $p_6$ have lower strength than any of the
local icons $p_1,\ldots, p_4$".


## References