---
title: <br><center>Datasets in `"sdam"` package</center>
date:  "August 2022" 
author: 
  - name: <center>Antonio Rivero Ostoic</center><div style="margin-bottom:-20px;"> </div>
    affiliation: <center>Aarhus University</center><div style="margin-bottom:-20px;"> </div>
    email: <center>jaro@cas.au.dk</center>
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Datasets in `"sdam"` package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

<style type="text/css">
h1.title {
  text-align: center; font-size: 20pt; color: #400040; 
}
h2 {
  font-size: 16pt; color: #400040;
}
h3 {
  font-size: 14pt; color: #400080;
}
h4 {
  text-align: center; color: DarkRed;
}
</style>


```{r setup, echo=FALSE, message=FALSE}
knitr::opts_chunk$set(echo=TRUE,error=TRUE)
knitr::opts_chunk$set(comment = "")
library("sdam")
```

```{r set-options, echo=FALSE, cache=FALSE}
options(width = 96)
```


<div style="margin-bottom:20px;"> </div>


## Preliminaries
Install and load one version of `"sdam"` package.

```{r, echo=TRUE, eval=FALSE}
install.packages("sdam") # from CRAN
devtools::install_github("sdam-au/sdam") # development version
devtools::install_github("mplex/cedhar", subdir="pkg/sdam") # a legacy version R 3.6.x
```
<div style="margin-bottom:10px;"> </div>
```{r}
# load and check version
library(sdam)
packageVersion("sdam")
```

<div style="margin-bottom:40px;"> </div>


## Built-in datasets
Package `"sdam"` comes with a suite of datasets and external data to execute different functions available in the package 
and to perform analysis.

For a list of built-in datasets in `"sdam"` use the `"utils"` function `data()` or `utils::data()` with the `'package'` argument. 

The CRAN distribution has four built-in datasets, while the development and legacy distributions add three more built-in datasets.

<div style="margin-bottom:10px;"> </div>

```{r, echo=TRUE, eval=FALSE}
# pop-up a new window
data(package="sdam")
```
```{r, echo=FALSE, eval=TRUE}
print(data(package="sdam"))
```

```{r}
# Data sets in package 'sdam':
# 
# rp        Roman province names and acronyms as in EDH
# rpcp      Roman provinces chronological periods
# rpd       Roman provinces dates from EDH
# rpmcd     Caption maps and affiliation dates of Roman provinces
```

```{r}
# Additional built-in datasets in 'sdam':
#
# EDH       Epigraphic Database Heidelberg Dataset
# rpmp      Maps of ancient Roman provinces and Italian regions
# retn      Roman Empire transport network and Mediterranean sea
```


<div style="margin-bottom:20px;"> </div>

A description of each dataset is available in the manual that from the R console is accessible as e.g. 
the `EDH` dataset in a non-CRAN distribution.

```{r, echo=TRUE, eval=FALSE}
# Epigraphic Database Heidelberg Dataset help
?EDH
```

<div style="margin-bottom:40px;"> </div>


### Ancient Mediterranean built-in datasets

The `EDH` dataset in `"sdam"` has information about Latin epigraphy retrieved from the Epigraphic Database Heidelberg API repository from the Roman world during the antiquity period.

A list of Roman provinces and regions in this dataset is available in dataset `"rp"`, and use again function `data()` to load this built-in dataset 
to look at its internal structure with `utils::str()` function.

<div style="margin-bottom:30px;"> </div>

* Dataset `"rp"` is a named list with Roman provinces and regions with acronyms according to the Epigraphic Database Heidelberg. 


```{r, echo=TRUE, eval=TRUE}
# load dataset
data("rp")

# obtain object structure
str(rp)
```


<div style="margin-bottom:50px;"> </div>



#### `edhw()` interface with `"rp"` dataset

* Function `edhw()` is a wrapper to extract and transform the records in the `EDH` dataset that invokes `"rp"` dataset 
to retrieve the records from a specific Roman province or region in `EDH`.

```{r}
# Armenian records in 'EDH'
edhw(province="Arm")[1]
```

The `Warning` messages from `edhw()` are first because there is not an explicit input in `x`, it is assumed that the input data is from the `EDH` dataset. 
The second warning message just tells the type object to return is always a list for argument `province` alone.


<div style="margin-bottom:20px;"> </div>

#### `EDH` in data frames

All records in the `EDH` dataset have a list format and it is possible to transform this information into a dataframe format 
with the wrapper function `edhw()`. 
For instance, displaying the first record from `Arm` as a data frame in argument `'as'` is made by the record `'id'` number.

```{r, echo=TRUE, eval=FALSE}
# record HD015521
edhw(id="15521", as="df")
```

<div style="margin-bottom:30px;"> </div>

However, it is easier to visualise in the screen only the variables related to people.

```{r, echo=TRUE, eval=FALSE}
# record HD015521 with explicit variables
edhw(id="15521", vars="people", as="df")
```
```{r, echo=FALSE, eval=TRUE}
suppressWarnings(edhw(id="15521", vars="people", as="df"))
```

<div style="margin-bottom:30px;"> </div>

```{r, echo=TRUE, eval=FALSE}
# record HD015521 with more explicit variables
edhw(id="15521", vars=c("people", "province_label"), as="df")
```
```{r, echo=FALSE, eval=TRUE}
suppressWarnings(edhw(id="15521", vars=c("people", "province_label"), as="df"))
```






<div style="margin-bottom:60px;"> </div>



### Obtaining  all `people` variables

Start by looking at the `people` variables in the `EDH` dataset for the Roman province of **Armenia**. 

<div style="margin-bottom:20px;"> </div>


#### Armenia


```{r, echo=FALSE, eval=TRUE, out.width="25%", fig.align="center", fig.cap="Roman province of Armenia (ca 117 AD)."}
plot.map("Arm", cap=TRUE, name=FALSE)
```

<div style="margin-bottom:20px;"> </div>

Transformation of the entire province from the `EDH` dataset requires extracting first a list with the province content. 
Function `edhw()` is to obtain available inscriptions per province from `EDH` and all data attributes from `people` variable. 
The default outputs are a list and a dataframe for the first and the second instance of the function. 

```{r, echo=TRUE, eval=FALSE}
# people in Armenia
edhw(province="Arm") |> 
  edhw(vars="people")
```
```{r, echo=FALSE, eval=TRUE}
edhw(province="Arm") |> 
  suppressWarnings() |> 
  edhw(vars="people") |> 
  suppressWarnings()
```


<div style="margin-bottom:30px;"> </div>

People attribute variables in inscriptions for `Armenia` are 
`age: years`, `cognomen`, `gender`, `name`, `nomen`, `person_id`, `praenomen`, and, `status`, 
but any inscription with `tribus` or `origo` as in the case of other provinces. 


For `Armenia`, two inscriptions have people variables and all people scripted are `male`, 
where record `HD015524` spans two rows because there are two persons where one have `nomen`, `cognomen`, and `name` ineligible. 



<div style="margin-bottom:60px;"> </div>



### Datasets for cartographical maps
The plotting of the Roman province in the previous section requires other datasets. 
Apart from `"rp"`. In `"sdam"`, there are other three datasets invoked for plotting cartographical maps related to the Roman Empire and the 
Mediterranean basin, which are `"rpmp"`, `"rpmcd"`, and `"retn"`. 

Function `plot.map()` calls dataset `"rpmp"` for the shapes and colours in the plotting of the cartographical maps 
of different regions of the Roman Empire. For the caption and province dates with this function shapes and colours are in dataset `"rpmcd"`. 

<div style="margin-bottom:30px;"> </div>

* Dataset `"retn"` bears the shapes of places and routes of an ancient transportation system in the Mediterranean region and political 
divisions of the Roman Empire. 
It also has it contours and parts of the European continent.  


```{r}
# land contour around Mediterranean
plot.map(type="plain")
```

<div style="margin-bottom:30px;"> </div>


```{r, echo=TRUE, eval=FALSE}
# display settlements and shipping routes
plot.map(type="plain", settl=TRUE, shipr=TRUE)
```

<div style="margin-bottom:30px;"> </div>

Vignette [Cartographical maps and networks](../doc/Maps.html) has more about 
transportation networks in the ancient Mediterranean.


<div style="margin-bottom:60px;"> </div>


### Datasets with dates

There are built-in datasets in `"sdam"` related to dates as well that are either displayed in a cartographical map or used for other computations. 

<div style="margin-bottom:30px;"> </div>


* Dataset `"rpd"` that has dates for provinces from the `EDH` dataset. 
It serves for performing a restricted imputation on data subsets in `EDH` or in another dataset.

```{r, echo=TRUE, eval=TRUE}
# dates from EDH
data("rpd")

# three provinces in object structure
str(rpd[1:3])
```


From this set of three Roman provinces in the `EDH`, the longest timespan is for `Aem`, and on average `Ach` 
has the oldest incriptions, while `Aeg` has incriptions with the newest dates.



<div style="margin-bottom:30px;"> </div>


* Dataset `"rpcp"` with chronological periods for regions with early and later Roman influence 
 per province.

```{r, echo=TRUE, eval=TRUE}
# periods for Roman provinces
data("rpcp")

# object structure
str(rpcp)
```

The early and later Roman influence in the 45 ancient provinces and regions are timespans with a 
*terminus ante quem* and a *terminus post quem*.

<div style="margin-bottom:30px;"> </div>

Vignette [Dates and missing dating data](../doc/Dates.html) has the visualisation of these 
and other dates.


<div style="margin-bottom:50px;"> </div>



### External data
Apart from the built-in datasets, it is attached as external data the semi-colon separated file `StraussShipwrecks.csv` 
with the Shipwrecks dataset for performing analyses: 
Reference and documentation in

Strauss, J. (2013). *Shipwrecks Database*. Version 1.0. 
Accessed (07-12-2021) from oxrep.classics.ox.ac.uk/databases/shipwrecks_database/

<div style="margin-bottom:20px;"> </div>

Built from Parker, A.J.  *Ancient Shipwrecks of the Mediterranean and the Roman Provinces* (Oxford: BAR International Series 580, 1992)


<div style="margin-bottom:30px;"> </div>

Details about the access to the database are in:

- [Shipwrecks network in the Mediterranean Basin (23-June-2022)](https://htmlpreview.github.io/?https://github.com/sdam-au/R_code/blob/master/HTML/Shipwrecks%20Network%20in%20the%20Mediterranean%20Basin.html)

- Vignettes [Dates and missing dating data](../doc/Dates.html) and 
            [Cartographical maps and networks](../doc/Maps.html) also use the Shipwrecks dataset.




<div style="margin-bottom:60px;"> </div>


### See also

#### Vignettes

* [Re-encoding `people` in the `EDH` dataset](../doc/Encoding.html)

* [Dates and missing dating data](../doc/Dates.html)

* [Cartographical maps and networks](../doc/Maps.html)

<div style="margin-bottom:30px;"> </div>

#### Manuals

* [sdam: Digital Tools for the SDAM Project at Aarhus University](../html/sdam-package.html)
* [`"sdam"` manual](https://github.com/mplex/cedhar/blob/master/typesetting/reports/sdam.pdf)

<div style="margin-bottom:30px;"> </div>

#### Project

* [Release candidate version](https://github.com/sdam-au/sdam)
* [Code snippets using `"sdam"`](https://github.com/sdam-au/R_code)
* [Social Dynamics and complexity in the Ancient Mediterranean project](https://sdam-au.github.io/sdam-au/)

<div style="margin-bottom:60px;"> </div>

&nbsp;
