---
title: "Introduction to spnaf"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to spnaf}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The _spnaf_ package calculates spatial network autocorrelation for flow data. Its functions are designed specifically to evaluate how networks are spatially clustered, using the **$G_{ij}$ statistic** presented in the [paper](https://link.springer.com/article/10.1007/s101090050013) by Berglund and Karlström.


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup, echo = FALSE}
library(spnaf)
knitr::opts_chunk$set(warning = FALSE, message = FALSE) 
```


## Data: CA
The package includes a dataset called **CA**, which stands for California, US. It contains migration flows among CA counties in 2019, recorded as the origin and destination of each residential flow.
```{r}
dim(CA)
head(CA)
```

## Data: CA_polygon
The package also includes **CA_polygon**, an *sf* object that represents the boundaries of CA counties. It has an id column and a geometry column, and it can be plotted after attaching the *sf* package. The polygons can be joined with **CA** because the id column matches the county codes in **CA**. You can learn more about working with spatial objects at https://r-spatial.github.io/sf/.

```{r}
library(sf)
plot(CA_polygon, col = 'white', main = 'CA polygon')
```
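
The join mentioned above can be sketched with base R's `merge()`. This is a toy illustration only: `toy_zones`, `toy_flows`, and their column names are hypothetical stand-ins, not the actual columns of **CA** or **CA_polygon**, so the `by.x` and `by.y` arguments would need to be adapted to the real data.

```{r}
# Toy sketch only: hypothetical zone ids and flows, not the CA data.
toy_zones <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
toy_flows <- data.frame(
  origin      = c(1, 1, 2, 3),
  destination = c(2, 3, 3, 1),
  flow        = c(40, 15, 25, 5)
)

# Attach zone attributes to each flow by matching the origin code to the zone id
toy_joined <- merge(toy_flows, toy_zones, by.x = "origin", by.y = "id")
toy_joined
```

Each flow row now carries the attributes of its origin zone; a second merge on the destination code would add the destination attributes in the same way.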

## Function: Gij.flow
The *spnaf* package aims to measure the spatial density of networks, which have origins (starting points) and destinations (ending points). The main function of *spnaf* is **Gij.flow**. Its first main input is **df**, OD data in data.frame form that must contain the columns "oid", "did", and "n" (please refer to the help document), such as CA above once its columns are renamed. The second important input is **shape**, the corresponding polygon object of class *sf*. The function also takes parameters used to build spatial neighbors with [_spdep_](https://r-spatial.github.io/spdep/), such as **method** (for example 'queen') and **snap**. The parameter **OD** is one of c("t", "o", "d"), which stand for total, origins only, and destinations only, respectively (please check [this paper](https://link.springer.com/article/10.1007/s10109-008-0068-2) for more information about these options). The last parameter, **R**, is the number of times the individual statistic is resampled by bootstrapping permutation to generate a non-parametric distribution, because the assumption of normality can be violated when a spatial statistic is calculated with polygons (see how the authors discuss this in [this paper](https://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1992.tb00261.x)). This step is needed to assess the statistical significance of the statistic.

```{r}
args(Gij.flow)
```
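
The resampling idea behind the last parameter can be sketched in base R. The example below is only an illustration with made-up values: `toy_values`, `toy_weights`, and `toy_local_sum` are hypothetical names, the statistic is a plain neighbor sum rather than $G_{ij}$, and the code is not how *spnaf* computes its results internally.

```{r}
# Toy sketch of permutation-based significance; not spnaf's internal code.
set.seed(42)

toy_values  <- c(120, 95, 110, 8, 5, 12, 7, 9, 6, 10)  # hypothetical flow volumes
toy_weights <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)         # hypothetical: first three flows are neighbors

# Simplified local statistic: sum of the values that fall in the neighborhood
toy_local_sum <- function(values, weights) sum(weights * values)

toy_observed <- toy_local_sum(toy_values, toy_weights)

# Shuffle the values across flows many times to build a reference distribution
toy_n_perm    <- 999
toy_simulated <- replicate(toy_n_perm, toy_local_sum(sample(toy_values), toy_weights))

# Pseudo p-value: share of permutations at least as large as the observed value
toy_p <- (sum(toy_simulated >= toy_observed) + 1) / (toy_n_perm + 1)
toy_p
```

Because the three largest values sit together in the neighborhood, few random shuffles reach the observed sum, so the pseudo p-value is small. No normality assumption is needed, since the reference distribution comes from the data themselves.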

### How to execute
```{r, warning = FALSE}
# Data manipulation
CA <- spnaf::CA
OD <- cbind(CA$FIPS.County.Code.of.Geography.B, CA$FIPS.County.Code.of.Geography.A)
OD <- cbind(OD, CA$Flow.from.Geography.B.to.Geography.A)
OD <- data.frame(OD)
names(OD) <- c("oid", "did", "n")
OD$n <- as.numeric(OD$n)
OD <- OD[order(OD[,1], OD[,2]),]
head(OD) # check the input df's format

# Load sf polygon
CA_polygon <- spnaf::CA_polygon
head(CA_polygon) # it has geometry column

# Execution of Gij.flow with data above and given parameters
result <- Gij.flow(df = OD, shape = CA_polygon, method = 'queen', snap = 1, OD = 't', R = 1000)
```
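
Since **df** must contain the columns "oid", "did", and "n", a quick check of the prepared table can catch naming or type problems before the function is called. The helper below is a minimal sketch: `check_od` and `toy_od` are hypothetical names that are not part of *spnaf*, and the checks only cover the column layout described in this vignette.

```{r}
# Hypothetical helper, not part of spnaf: checks the OD layout described above.
check_od <- function(df) {
  required     <- c("oid", "did", "n")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df$n)) {
    stop("column 'n' must be numeric")
  }
  invisible(TRUE)
}

# Toy OD table with made-up ids and flow counts
toy_od <- data.frame(
  oid = c("a", "a", "b"),
  did = c("b", "c", "c"),
  n   = c(10, 3, 7)
)
check_od(toy_od)  # passes silently when the layout is as expected
```

The same call could be applied to the `OD` object built above; it stops with an error message if a column is missing or if `n` is not numeric.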

### Interpretation of the result
The metric, an extension of the $G_{i}^{*}$ statistic of Getis and Ord (1992), shares the intuition of hotspot analysis with static data: a high and significant value for a flow indicates spatial clustering of flows with high values. It can be interpreted as a Z-value suitable for statistical tests, since the metric inherits the characteristics of $G_{i}^{*}$. If bootstrapping is run 1,000 times as above, flows with a value greater than the 50th largest value of the distribution (i.e., at the significance level of 0.05) can be defined as positive clusters.

```{r, eval = TRUE}
# positive clusters at the significance level of 0.05
head(result[[1]][result[[1]]$pval < 0.05,])
# positive clusters at the significance level of 0.05 in lines class
head(result[[2]][result[[2]]$pval < 0.05,])
```
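
The "50th largest value" rule can be made concrete with a toy reference distribution. This is a sketch under stated assumptions: `toy_sim`, `toy_cutoff`, and `toy_stat` are hypothetical, the simulated values are random normal draws rather than output from **Gij.flow**, and the cutoff logic simply mirrors the description above.

```{r}
# Toy sketch: a simulated reference distribution, not output from Gij.flow.
set.seed(1)
toy_sim <- rnorm(1000)  # hypothetical resampled statistics (1,000 draws)

# Cutoff at the 0.05 level: the 50th largest of the 1,000 simulated values
toy_cutoff <- sort(toy_sim, decreasing = TRUE)[50]

toy_stat <- 2.5          # hypothetical observed statistic for one flow
toy_stat > toy_cutoff    # TRUE would mark the flow as a positive cluster
```

Only 49 of the 1,000 simulated values exceed the cutoff, so a statistic above it lies in the upper 5% of the reference distribution.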

### Visualization of all flows and significant flows (<0.05) only
```{r, warning = FALSE, fig.show = "hold", out.width = "45%"}
library(tmap)
# plot all flows with the polygon (left)
tm_shape(CA_polygon) +
  tm_polygons()+
  tm_shape(result[[2]]) +
  tm_lines()
# plot significant flows only with the polygon (right)
tm_shape(CA_polygon) +
  tm_polygons()+
  tm_shape(result[[2]][result[[2]]$pval < 0.05,]) +
  tm_lines(col='pval')

```

### References
* Berglund, S. & Karlström, A. (1999). Identifying local spatial association in flow data. *Journal of Geographical Systems*, 1(3), 219–236. https://doi.org/10.1007/s101090050013
* Getis, A. & Ord, J. K. (1992). The Analysis of Spatial Association by Use of Distance Statistics. *Geographical Analysis*, 24(3), 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x
* Chun, Y. (2008). Modeling network autocorrelation within migration flows by eigenvector spatial filtering. *Journal of Geographical Systems*, 10, 317–344. https://doi.org/10.1007/s10109-008-0068-2
* Lee, Y., Park, S., Kim, K., Ha, H., & Lee, J. (2021). Discovering Millennials' Migration Clusters in Seoul, South Korea: A Local Spatial Network Autocorrelation Approach. *Findings*, November. https://doi.org/10.32866/001c.29523