---
title: "4. Manipulating Simple Features"
author: "Edzer Pebesma"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{4. Manipulating Simple Features}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

**For a better version of the sf vignettes see** https://r-spatial.github.io/sf/articles/

```{r echo=FALSE, include=FALSE}
knitr::opts_chunk$set(fig.height = 4.5)
knitr::opts_chunk$set(fig.width = 6)
knitr::opts_chunk$set(collapse = TRUE)
```
This vignette describes how simple features, i.e. records that come with a geometry, can be manipulated when the manipulation involves the geometries. Manipulations include:

* aggregating feature sets
* summarising feature sets
* joining two feature sets based on feature geometry

Features are represented by records in an `sf` object, and have feature attributes (all non-geometry fields) and feature geometry. Since `sf` objects are a subclass of `data.frame` or `tbl_df`, operations on feature attributes work identically to how they work on `data.frame`s, e.g.

```{r}
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, 2264)
nc[1,]
``` 
prints the first record.

Many of the tidyverse/dplyr verbs have methods for `sf` objects. This means that if both `sf` and `dplyr` are loaded, manipulations such as selecting a single attribute will return an `sf` object:
```{r}
library(dplyr)
nc |> select(NWBIR74) |> head(2)
```
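which implies that the geometry is sticky, and gets added automatically. If we want to drop the geometry, we can coerce to `data.frame` first. This removes the `sf` class, so the geometry list-column is no longer sticky and `select()` drops it like any other column that is not selected: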
```{r}
nc |> as.data.frame() |> select(NWBIR74) |> head(2)
```
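
An alternative that does not rely on coercion is `st_drop_geometry()`, which returns the attribute table without the geometry column. The chunk below is a minimal sketch; the object name `nc_attr` is introduced here for illustration only, and the file is read again so that the chunk stands on its own.

```{r}
library(sf)
# read the example file again (quiet = TRUE suppresses the driver message)
nc_attr <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# st_drop_geometry() removes the geometry column together with the sf class
nc_attr <- st_drop_geometry(nc_attr)
class(nc_attr)
head(nc_attr[, "NWBIR74", drop = FALSE], 2)
```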

## Subsetting feature sets

We can subset feature sets by using the square bracket notation:

```{r}
nc[1, "NWBIR74"]
```
and use the `drop` argument to drop geometries:
```{r}
nc[1, "NWBIR74", drop = TRUE]
```
but we can also use a spatial object as the row selector, to select features that intersect with another spatial feature:
```{r}
Ashe = nc[nc$NAME == "Ashe",]
class(Ashe)
nc[Ashe,]
```
We see that in the result set `Ashe` is included, as the default value for argument `op` in `[.sf` is `st_intersects()`, and `Ashe` intersects with itself. We could exclude self-intersection by using predicate `st_touches()` (overlapping features don't touch):
```{r}
Ashe = nc[nc$NAME == "Ashe",]
nc[Ashe, op = st_touches]
```
Using `dplyr`, we can do the same by calling the predicate directly:
```{r}
nc |> filter(lengths(st_touches(nc, Ashe)) > 0)
```
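
The `lengths()` call works because binary predicates such as `st_touches()` return, by default, a sparse list with one element per feature of the first argument, holding the indices of the matching features of the second. The sketch below compares this with the dense logical matrix obtained with `sparse = FALSE`; the names `nc_p`, `ashe_p`, `touch_sparse` and `touch_dense` are hypothetical and only used here.

```{r}
library(sf)
# same data and projection as used above, under separate names
nc_p <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_p <- st_transform(nc_p, 2264)
ashe_p <- nc_p[nc_p$NAME == "Ashe", ]

# sparse result: a list with one integer vector per county
touch_sparse <- st_touches(nc_p, ashe_p)
# dense result: a logical matrix with one row per county, one column for Ashe
touch_dense <- st_touches(nc_p, ashe_p, sparse = FALSE)

# both forms select the same counties
all((lengths(touch_sparse) > 0) == touch_dense[, 1])
nc_p$NAME[touch_dense[, 1]]
```

The sparse form is usually preferable for large feature sets, since the dense matrix has one cell for every pair of features.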

## Aggregating or summarizing feature sets

Suppose we want to compare the 1974 fraction of SID (sudden infant death) cases per birth in the counties that intersect with `Ashe` with that in the remaining counties. We can do this by:
```{r}
a <- aggregate(nc[, c("SID74", "BIR74")], list(Ashe_nb = lengths(st_intersects(nc, Ashe)) > 0), sum)
a <- a |> mutate(frac74 = SID74 / BIR74) |> select(frac74)
plot(a[2], col = c(grey(.8), grey(.5)))
plot(st_geometry(Ashe), border = '#ff8888', add = TRUE, lwd = 2)
```
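
A comparable result can be obtained with the dplyr verbs `group_by()` and `summarise()`, whose `sf` methods union the geometries within each group. This is a sketch under the assumption that a grouping column is first added to the data; the names `nc_g`, `ashe_g`, `Ashe_nb` and `by_nb` are chosen here for illustration.

```{r}
library(sf)
library(dplyr)
nc_g <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_g <- st_transform(nc_g, 2264)
ashe_g <- nc_g[nc_g$NAME == "Ashe", ]

# flag the counties that intersect Ashe (including Ashe itself)
nc_g$Ashe_nb <- lengths(st_intersects(nc_g, ashe_g)) > 0

# sum counts per group; geometries are unioned per group
by_nb <- nc_g |>
  group_by(Ashe_nb) |>
  summarise(SID74 = sum(SID74), BIR74 = sum(BIR74)) |>
  mutate(frac74 = SID74 / BIR74)
by_nb
```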

## Joining two feature sets based on attributes

The usual join verbs of base R (`merge()`) and of dplyr (`left_join()`, etc.) work for `sf` objects as well; the joining takes place on attributes (ignoring geometries). Where a record has no matching geometry, an empty geometry is substituted. The second argument should be a `data.frame` (or similar), not an `sf` object:

```{r}
x = st_sf(a = 1:2, geom = st_sfc(st_point(c(0,0)), st_point(c(1,1))))
y = data.frame(a = 2:3)
merge(x, y)
merge(x, y, all = TRUE)
right_join(x, y)
```
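
The substituted empty geometries can be detected with `st_is_empty()`. The sketch below repeats the right join with an explicit `by` argument and an extra attribute column; the names `pts`, `tab`, `label` and `joined` are hypothetical.

```{r}
library(sf)
library(dplyr)
pts <- st_sf(a = 1:2, geom = st_sfc(st_point(c(0, 0)), st_point(c(1, 1))))
tab <- data.frame(a = 2:3, label = c("two", "three"))

# keep all records of tab; a = 3 has no geometry in pts
joined <- right_join(pts, tab, by = "a")
# TRUE marks the records that received an empty geometry
st_is_empty(joined)
```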

## Joining two feature sets based on geometries

For joining based on spatial intersections (of any kind), `st_join()` is used:

```{r fig=TRUE}
x = st_sf(a = 1:3, geom = st_sfc(st_point(c(1,1)), st_point(c(2,2)), st_point(c(3,3))))
y = st_buffer(x, 0.1)
x = x[1:2,]
y = y[2:3,]
plot(st_geometry(x), xlim = c(.5, 3.5))
plot(st_geometry(y), add = TRUE)
```

The join method is a left join, retaining all records of the first argument:

```{r}
st_join(x, y)
st_join(y, x)
```

and the geometry retained is that of the first argument.
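
To keep only the records that have a match (an inner join), `st_join()` takes the argument `left = FALSE`. A minimal sketch, rebuilding the points and circles from above under the hypothetical names `pts_j` and `circ_j`:

```{r}
library(sf)
pts_j <- st_sf(a = 1:3, geom = st_sfc(st_point(c(1, 1)), st_point(c(2, 2)), st_point(c(3, 3))))
circ_j <- st_buffer(pts_j, 0.1)
pts_j <- pts_j[1:2, ]
circ_j <- circ_j[2:3, ]

# inner join: only the point at (2, 2) falls inside one of the circles
inner <- st_join(pts_j, circ_j, left = FALSE)
inner
```

Since both objects carry an attribute named `a`, the result distinguishes them with the default suffixes `.x` and `.y`.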

The spatial join predicate can be controlled with any function compatible with `st_intersects()` (the default), e.g.

```{r}
st_join(x, y, join = st_covers) # no matching y records: points don't cover circles
st_join(y, x, join = st_covers) # matches for those circles covering a point
```
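
Predicates that need additional arguments can be used as well, since further arguments to `st_join()` are passed on to the join function. The sketch below joins on distance with `st_is_within_distance()` and its `dist` argument; the objects `pts_a`, `pts_b` and `near`, and the threshold of 1.5, are assumptions made for illustration.

```{r}
library(sf)
pts_a <- st_sf(a = 1:2, geom = st_sfc(st_point(c(1, 1)), st_point(c(2, 2))))
pts_b <- st_sf(b = 2:3, geom = st_sfc(st_point(c(2, 2)), st_point(c(3, 3))))

# match every point of pts_a to all points of pts_b within 1.5 units;
# neighbouring points on the diagonal are sqrt(2) apart
near <- st_join(pts_a, pts_b, join = st_is_within_distance, dist = 1.5)
near
```

A record of the first argument that matches several records of the second is repeated, once for each match.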
