---
title: "Introduction to geobr (R)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to geobr (R)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = identical(tolower(Sys.getenv("NOT_CRAN")), "true"),
  out.width = "100%"
)


```


The [**geobr**](https://github.com/ipeaGIT/geobr) package provides quick and easy access to official spatial data sets of Brazil. All **geobr** functions share the same simple syntax, which lets users download a wide variety of data sets with updated geometries and with attributes and geographic projections harmonized across geographies and years. This vignette presents a quick introduction to **geobr**.


## Installation

You can install geobr from CRAN or the development version to use the latest features.

```{r eval=FALSE, message=FALSE, warning=FALSE}
# From CRAN
install.packages("geobr")

# Development version
utils::remove.packages('geobr')
devtools::install_github("ipeaGIT/geobr", subdir = "r-package")

```

Now let's load the libraries we'll use in this vignette.

```{r message=FALSE, warning=FALSE, results='hide'}
library(geobr)
library(sf)
library(dplyr)
library(ggplot2)
```



## General usage

### Available data sets

The geobr package covers 27 spatial data sets, including a variety of political-administrative and statistical areas used in Brazil. You can view what data sets are available using the `list_geobr()` function.

```{r message=FALSE, warning=FALSE}
# Available data sets
datasets <- list_geobr()

head(datasets)

```


## Download spatial data as `sf` objects

The syntax of all **geobr** functions operates on the same logic, so the code to download the data becomes intuitive for the user. Here are a few examples.

Download a specific geographic area at a given year:
```{r message=FALSE, warning=FALSE}
# State of Sergipe
state <- read_state(
  code_state="SE",
  year=2018,
  showProgress = FALSE
  )

# Municipality of Sao Paulo
muni <- read_municipality(
  code_muni = 3550308, 
  year=2010, 
  showProgress = FALSE
  )

ggplot() + 
  geom_sf(data = muni, color=NA, fill = '#1ba185') +
  theme_void()
```


Download all geographic areas within a state at a given year
```{r message=FALSE, warning=FALSE, results='hide'}
# All municipalities in the state of Minas Gerais
muni <- read_municipality(code_muni = "MG", 
                          year = 2007,
                          showProgress = FALSE)

# All census tracts in the state of Rio de Janeiro
cntr <- read_census_tract(
  code_tract = "RJ", 
  year = 2010,
  showProgress = FALSE
  )

head(muni)
```

If the `code_*` argument (for example `code_state` or `code_muni`) is not passed to the function, geobr returns the data for the whole country by default.

```{r message=FALSE, warning=FALSE}
# read all intermediate regions
inter <- read_intermediate_region(
  year = 2017,
  showProgress = FALSE
  )

# read all states
states <- read_state(
  year = 2019, 
  showProgress = FALSE
  )

head(states)
```


## Important note about data resolution

All functions to download polygon data such as states, municipalities etc. have a `simplified` argument. When `simplified = FALSE`, geobr will return the original data set with high resolution at detailed geographic scale (see documentation). By default, however, `simplified = TRUE` and geobr returns data set geometries with simplified borders to improve speed of downloading and plotting the data.
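
As a minimal sketch of the difference, the chunk below downloads the same state twice and counts the vertices in each geometry. It assumes an internet connection and uses only `read_state()` and `sf::st_coordinates()`; the object names `se_simple` and `se_full` are illustrative.

```{r message=FALSE, warning=FALSE}
library(geobr)
library(sf)

# Same state and year, with simplified (default) and original borders
se_simple <- read_state(code_state = "SE", year = 2018,
                        simplified = TRUE, showProgress = FALSE)
se_full   <- read_state(code_state = "SE", year = 2018,
                        simplified = FALSE, showProgress = FALSE)

# Number of vertices stored in each geometry
n_simple <- nrow(sf::st_coordinates(se_simple))
n_full   <- nrow(sf::st_coordinates(se_full))

c(simplified = n_simple, original = n_full)
```

The original geometry should carry more vertices, which is why it takes longer to download and plot.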




## Plot the data

Once you've downloaded the data, it is really simple to plot maps using `ggplot2`.

```{r message=FALSE, warning=FALSE, fig.height = 8, fig.width = 8, fig.align = "center"}
# Remove plot axis
no_axis <- theme(axis.title=element_blank(),
                 axis.text=element_blank(),
                 axis.ticks=element_blank())

# Plot all Brazilian states
ggplot() +
  geom_sf(data=states, fill="#2D3E50", color="#FEBF57", size=.15, show.legend = FALSE) +
  labs(subtitle="States", size=8) +
  theme_minimal() +
  no_axis

```



Plot all the municipalities of a particular state, such as Rio de Janeiro:

```{r message=FALSE, warning=FALSE, fig.height = 8, fig.width = 8, fig.align = "center"}

# Download all municipalities of Rio
all_muni <- read_municipality(
  code_muni = "RJ", 
  year= 2010,
  showProgress = FALSE
  )

# plot
ggplot() +
  geom_sf(data=all_muni, fill="#2D3E50", color="#FEBF57", size=.15, show.legend = FALSE) +
  labs(subtitle="Municipalities of Rio de Janeiro, 2010", size=8) +
  theme_minimal() +
  no_axis

```


## Thematic maps

The next step is to combine data from the **geobr** package with other data sets to create thematic maps. In this first example, we will use data from the Atlas of Human Development (by Ipea/FJP and UNDP) to create a choropleth map showing the spatial variation of **Life Expectancy at birth** across Brazilian states.

#### Merge external data

First, we need a `data.frame` with estimates of Life Expectancy, which we merge with our spatial data. The state name, converted to lower case on both sides, is the key column used to join the two data sets.

```{r message=FALSE, warning=FALSE, results='hide'}
# Read data.frame with life expectancy data
df <- utils::read.csv(system.file("extdata/br_states_lifexpect2017.csv", package = "geobr"), encoding = "UTF-8")

states$name_state <- tolower(states$name_state)
df$uf <- tolower(df$uf)

# join the databases
states <- dplyr::left_join(states, df, by = c("name_state" = "uf"))

```
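
Before plotting, it can help to check that every area found a match in the external table, since unmatched rows end up with `NA` values on the map. The sketch below illustrates the idea with `dplyr::anti_join()` on two toy tables; `toy_states`, `toy_df` and their values are made up for illustration and are not part of geobr.

```{r message=FALSE, warning=FALSE}
library(dplyr)

# Toy stand-ins for the spatial table and the external table (made-up values)
toy_states <- data.frame(name_state = c("Sergipe", "Bahia", "Acre"))
toy_df     <- data.frame(uf = c("SERGIPE", "BAHIA"), value = c(1, 2))

# Normalize the key on both sides, as done above
toy_states$name_state <- tolower(toy_states$name_state)
toy_df$uf <- tolower(toy_df$uf)

# Rows of the left table with no match in the right table
unmatched <- dplyr::anti_join(toy_states, toy_df, by = c("name_state" = "uf"))
unmatched$name_state
```

An empty result means every row of the left table has a counterpart in the external data.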


#### Plot thematic map

```{r message=FALSE, warning=FALSE, fig.height = 8, fig.width = 8, fig.align = "center" }
ggplot() +
  geom_sf(data=states, aes(fill=ESPVIDA2017), color= NA, size=.15) +
  labs(subtitle="Life Expectancy at birth, Brazilian States, 2017", size=8) +
  scale_fill_distiller(palette = "Blues", name="Life Expectancy", limits = c(65,80)) +
  theme_minimal() +
  no_axis

```

### Using **geobr** together with **censobr**

Following the same steps as above, we can use **geobr** together with its sister package [**censobr**](https://ipeagit.github.io/censobr/index.html) to map the proportion of households connected to a sewage network in Brazilian municipalities.

First, we need to download households data from the Brazilian census using the `read_households()` function.



```{r}
library(censobr)
library(arrow)

hs <- read_households(year = 2010, 
                      showProgress = FALSE)

```

Now we're going to (a) group observations by municipality, (b) get the number of households connected to a sewage network, (c) calculate the proportion of households connected, and (d) collect the results.

```{r, warning = FALSE}
esg <- hs |> 
        collect() |>
        group_by(code_muni) |>                                             # (a)
        summarize(rede = sum(V0010[which(V0207=='1')]),                    # (b)
                  total = sum(V0010)) |>                                   # (b)
        mutate(cobertura = rede / total) |>                                # (c)
        collect()                                                          # (d)

head(esg)
```
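
To make steps (a) to (c) concrete, here is the same calculation on a toy table. The values are made up; the sketch assumes, as in the code above, that `V0010` acts as the household weight and that `V0207 == '1'` marks a connection to the sewage network. The names `toy_hs` and `toy_esg` are illustrative.

```{r message=FALSE, warning=FALSE}
library(dplyr)

# Toy household records for two municipalities (made-up values)
toy_hs <- data.frame(
  code_muni = c(1, 1, 1, 2, 2),
  V0010     = c(10, 20, 10, 5, 15),
  V0207     = c("1", "2", "1", "2", "2")
)

toy_esg <- toy_hs |>
  group_by(code_muni) |>                          # (a)
  summarize(rede  = sum(V0010[V0207 == "1"]),     # (b) weighted households connected
            total = sum(V0010)) |>                # (b) weighted households in total
  mutate(cobertura = rede / total)                # (c)

# Municipality 1: (10 + 10) / 40; municipality 2: 0 / 20
toy_esg$cobertura
```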

Now we only need to download the geometries of Brazilian municipalities from **geobr**, merge the spatial data with our estimates and map the results.

```{r, warning = FALSE}
# download municipality geometries
muni_sf <- geobr::read_municipality(year = 2010,
                                    showProgress = FALSE)

# merge data
esg_sf <- left_join(muni_sf, esg, by = 'code_muni')

# plot map
ggplot() +
  geom_sf(data = esg_sf, aes(fill = cobertura), color=NA) +
  labs(title = "Share of households connected to a sewage network") +
  scale_fill_distiller(palette = "Greens", direction = 1, 
                       name='Share of\nhouseholds', 
                       labels = scales::percent) +
  theme_void()

```
