---
title: "Reproducing Goeyvaerts et al. (2018) using `ergm.multi`"
author: "Pavel N. Krivitsky"
date: "`ergm.multi` version `r packageVersion('ergm.multi')` (`r Sys.Date()`)"
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{Reproducing Goeyvaerts et al. (2018) using ergm.multi}
---

```{css, echo=FALSE}
pre {
  font-size: 85%
}
```

```{r, echo=FALSE, cache=FALSE, eval=TRUE}
library(knitr)
library(rmarkdown)
options(rmarkdown.html_vignette.check_title = FALSE)
opts_chunk$set(message=FALSE, echo=TRUE, cache=TRUE, autodep=TRUE,
concordance=TRUE, error=FALSE, fig.width=7, fig.height=7)
options(width=160)
```

```{r, message=FALSE}
library(ergm.multi)
library(dplyr)
library(purrr)
library(tibble)
library(ggplot2)
```

# Obtaining data

The list of networks studied by @GoSa18h is included in this package:
```{r}
data(Goeyvaerts)
length(Goeyvaerts)
```
An explanation of the networks, including a list of their
network (`%n%`) and vertex (`%v%`) attributes, can be obtained via `?Goeyvaerts`.
A total of `r length(Goeyvaerts)` complete networks were collected, then two were excluded due to "nonstandard" family composition:
```{r}
Goeyvaerts %>% discard(`%n%`, "included") %>% map(as_tibble, unit="vertices")
```

To reproduce the analysis, exclude them as well:
```{r}
G <- Goeyvaerts %>% keep(`%n%`, "included")
```

# Data summaries

Obtain weekday indicator, network size, and density for each network, and summarize them as in @GoSa18h Table 1:
```{r}
G %>% map(~list(weekday = . %n% "weekday",
                n = network.size(.),
                d = network.density(.))) %>% bind_rows() %>%
  group_by(weekday, n = cut(n, c(1,2,3,4,5,9))) %>%
  summarize(nnets = n(), p1 = mean(d==1), m = mean(d)) %>% kable()
```

# Reproducing ERGM fits

We now reproduce the ERGM fits. First, we extract the weekday networks:
```{r}
G.wd <- G %>% keep(`%n%`, "weekday")
length(G.wd)
```

Next, we specify the multi-network model using the `N(formula, lm)` operator. This operator will evaluate the `ergm` formula `formula` on each network, weighted by the predictors passed in the one-sided `lm` formula, which is interpreted the same way as that passed to the built-in `lm()` function, with its "`data`" being the table of network attributes.

Since different networks may have different compositions, to have a consistent model, we specify a consistent list of family roles.
```{r}
roleset <- sort(unique(unlist(lapply(G.wd, `%v%`, "role"))))
```

We now construct the formula object, which will be passed directly to `ergm()`:
```{r}
# Networks() function tells ergm() to model these networks jointly.
f.wd <- Networks(G.wd) ~
  # This N() operator adds three edge counts:
  N(~edges,
    ~ # one total for all networks  (intercept implicit as in lm),
      I(n<=3)+ # one total for only small households, and
      I(n>=5) # one total for only large households.
    ) +

  # This N() construct evaluates each of its terms on each network,
  # then sums each statistic over the networks:
  N(
      # First, mixing statistics among household roles, including only
      # father-mother, father-child, and mother-child counts.
      # Since tail < head in an undirected network, in the
      # levels2 specification, it is important that tail levels (rows)
      # come before head levels (columns). In this case, since
      # "Child" < "Father" < "Mother" in alphabetical order, the
      # row= and col= categories must be sorted accordingly.
    ~mm("role", levels = I(roleset),
        levels2=~.%in%list(list(row="Father",col="Mother"),
                           list(row="Child",col="Father"),
                           list(row="Child",col="Mother"))) +
      # Second, the nodal covariate effect of age, but only for
      # edges between children.
      F(~nodecov("age"), ~nodematch("role", levels=I("Child"))) +
      # Third, 2-stars.
      kstar(2)
  ) +
  
  # This N() adds one triangle count, totalled over all households
  # with at least 6 members.
  N(~triangles, ~I(n>=6))
```
See `ergmTerm?mm` for documentation on the `mm` term used above.
Now, we can fit the model:
```{r}
# (Set seed for predictable run time.)
fit.wd <- ergm(f.wd, control=snctrl(seed=123))
```
```{r}
summary(fit.wd)
```


Similarly, we can extract the weekend network, and fit it to a smaller model. We only need one `N()` operator, since all statistics are applied to the same set
of networks, namely, all of them.
```{r}
G.we <- G %>% discard(`%n%`, "weekday")
fit.we <- ergm(Networks(G.we) ~
                 N(~edges +
                     mm("role", levels=I(roleset),
                        levels2=~.%in%list(list(row="Father",col="Mother"),
                                           list(row="Child",col="Father"),
                                           list(row="Child",col="Mother"))) +
                     F(~nodecov("age"), ~nodematch("role", levels=I("Child"))) +
                     kstar(2) +
                     triangles), control=snctrl(seed=123))
```
```{r}
summary(fit.we)
```

# Diagnostics

Perform diagnostic simulation [@KrCo22t], summarize the residuals, and make residuals vs. fitted and scale-location plots:
```{r}
gof.wd <- gofN(fit.wd, GOF = ~ edges + kstar(2) + triangles)
summary(gof.wd)
```

Variances of Pearson residuals substantially greater than 1 suggest unaccounted-for heterogeneity.

```{r}
autoplot(gof.wd)
```

The plots don't look unreasonable.


Also make plots of residuals vs. square root of fitted and vs. network size:
```{r}
autoplot(gof.wd, against=sqrt(.fitted))
autoplot(gof.wd, against=ordered(n))
```

It looks like network-size effects are probably accounted for.

# References