---
title: <br><center>Dates and missing dating data in `"sdam"`</center>
date:  "August 2022" 
author: 
  - name: <center>Antonio Rivero Ostoic</center><div style="margin-bottom:-20px;"> </div>
    affiliation: <center>Aarhus University</center><div style="margin-bottom:-20px;"> </div>
    email: <center>jaro@cas.au.dk</center>
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Dates and missing dating data in `"sdam"`}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

<style type="text/css">
h1.title {
  text-align: center; font-size: 20pt; color: #400040; 
}
h2 {
  font-size: 18pt; color: #400040;
}
h3 {
  font-size: 14pt; color: #400080;
}
h4 {
  text-align: center; color: DarkRed;
}
</style>


```{r setup, echo=FALSE, message=FALSE}
knitr::opts_chunk$set(echo=TRUE,error=TRUE)
knitr::opts_chunk$set(comment = "")
library("sdam")
```

```{r set-options, echo=FALSE, cache=FALSE}
options(width = 96)
```


<div style="margin-bottom:20px;"> </div>


### Preliminaries
Install and load a version of `"sdam"` package.

<div style="margin-bottom:30px;"> </div>


```{r, echo=TRUE, eval=FALSE}
install.packages("sdam") # from CRAN
devtools::install_github("sdam-au/sdam") # development version
devtools::install_github("mplex/cedhar", subdir="pkg/sdam") # legacy version R 3.6.x
```
<div style="margin-bottom:10px;"> </div>
```{r}
# load and check versions
library(sdam)
packageVersion("sdam")
```

<div style="margin-bottom:40px;"> </div>


## Dating data
Temporal data is significant when it comes to analysing the history of archaeological 
artefacts like written markers from the Ancient Mediterranean. 
In the `EDH` dataset, for example, dates for inscriptions are plausible timespans of existence 
with the endpoints in variables `not_before` and `not_after` that, from the perspective of the timespan, 
are the *terminus ante quem* (TAQ) and *terminus post quem* (TPQ) of the time segment. 
However, not all inscriptions have these two variables filled by domain experts and replacing missing dating data constitutes a challenge. 

Besides `EDH`, other datasets with `"sdam"` the package and related functions involve dating data in
the ancient Mediterranean like displaying dates and time segments in a plot, by organising dates within Roman provinces, 
and by performed imputation techniques for missing dating data.



<div style="margin-bottom:60px;"> </div>


### Plotting temporal data


#### Shipwrecks dataset dating data

An example of plotting dates is with the Shipwrecks external dataset, which is a semicolon separated file of different variables.

<div style="margin-bottom:40px;"> </div>

References for shipwrecks data are in 

- Vignette [Datasets in `"sdam"` package](../doc/Intro.html) 

<div style="margin-bottom:40px;"> </div>


When reading the shipwrecks external dataset with ` read.csv` make sure to use the right separator in `sep` and leave untouched the names of the variables. 


```{r}
# load shipwrecks external dataset
sw <- system.file("extdata","StraussShipwrecks.csv",package="sdam") |> 
  read.csv(sep=";", check.names=FALSE)
```

```{r}
# variables in shipwrecks dataset
colnames(sw)
```

Plot the time segments with function `plot.dates()` and a customized `'id'` where variables 15 to 16 in `sw` have timespans of existence as `'taq'` and `'tpq'`. 

```{r, echo=TRUE, eval=TRUE, fig.width=4, fig.height=4, fig.align="center", fig.cap="Range of timespans in Shipwrecks dataset"}
# shipwrecks dates with Wreck ID
plot.dates(sw, id="Wreck ID", type="rg", taq="Earliest date", tpq="Latest date", col=4)
```

<div style="margin-bottom:60px;"> </div>


#### Mid points and range of timespan

The mid points and range of shipwrecks data are explicitly computed by function `prex()` with the `mp` option in the `'type'` argument. 
`'vars'` stands for the variables that in this case are TAQ and TPQ, and the `'keep'` option allows maintaining the rest of the variables 
in the output that for `prex()` with mid points is a data frame. 


```{r}
# add mid points and range to shipwrecks data
prex(sw[c(1,7,15:16)], type="mp", vars=c("Earliest date", "Latest date"), keep=TRUE) |> 
  tail()
```

<div style="margin-bottom:30px;"> </div>


The default `'type'` option and chronological phase in `prex()` are the aoristic sum with a five periods bin or `bin5`.


```{r}
# aoristic sum shipwrecks
prex(sw[c(1,7,15:16)], vars=c("Earliest date", "Latest date"))
```


<div style="margin-bottom:30px;"> </div>


For an eight chronological periods bin in the shipwrecks dataset

```{r}
# aoristic sum shipwrecks 8 bin
prex(sw[c(1,7,15:16)], vars=c("Earliest date", "Latest date"), cp="bin8")
```


<div style="margin-bottom:60px;"> </div>


For aoristic sum algorithm, cf. [Temporal Uncertainty](https://mplex.github.io/cedhar/Uncertainty.html).


<div style="margin-bottom:60px;"> </div>



## Dating data in the Roman world

Many functions and datasets in `"sdam"` are related to temporal information of the Roman world, 
particularly from the Roman Empire during the classical ancient period.


<div style="margin-bottom:30px;"> </div>

Function `plot.map()` is to depict cartographical maps per Roman province or region, and it has a `'date'` argument to display dates within the caption. Dates in this case are one or two years either for the consolidation of the Italian peninsula or the affiliation of the region to the Roman Empire. 


```{r, echo=TRUE, eval=FALSE}
# silhouette of Italian peninsula
plot.map(x="Ita", date=TRUE)
## not run
```

<div style="margin-bottom:30px;"> </div>

* The built-in dataset `rpmcd` has the shapes and colours used in the cartographical maps with `plot.map()`, and some 
dates related to provinces as well. 


```{r}
# 59 provinces dates, colors, and shapes
data("rpmcd")

# province acronyms as in EDH
names(rpmcd)
```

<div style="margin-bottom:60px;"> </div>



### Roman provinces establishment dates

The establishment dates of Roman provinces used in the cartographical map captions are in the second 
component of `rpmcd`.


```{r}
# pipe dataset for dates in second component
rpmcd |> 
  lapply(function (x) x[[2]]) |> 
  head()
```


<div style="margin-bottom:40px;"> </div>

A vector of establishment dates in years from the `"rpmcd"` dataset is recorded in object `est` that 
allow making a chronology of the Roman provinces.


```{r, echo=-5}
# second component in dataset
est <- rpmcd |> 
  lapply(function (x) x[[2]]) |> 
  unlist(use.names=FALSE)
est
```

<div style="margin-bottom:60px;"> </div>

### Formatting dates
The establishment dates of Roman provinces and regions are in vector `est`, and these dates can become 
more standard with the function `cln()` for further processing. 
This is a cleaning function where, for instance, level `9` removes all content after the first parenthesis 
in the input while the other levels are for specific needs.


```{r}
# clean levels are 0-9
cln(est, level=9)
```

<div style="margin-bottom:30px;"> </div>

After this transformation of the data in `est`, is possible to format dates 
as numerical data with function `dts()`, which takes the first 
value when there are two competing dates in the input; unless the opposite is specified 
in the `'last'` argument. 


```{r}
# update object with establishment dates
est <- est |> 
  cln(level=9) |> 
  dts()
```
```{r}
est
```


<div style="margin-bottom:60px;"> </div>


### Chronology of Roman provinces
Object `est` has a chronology for the establishment dates of Mediterranean regions and territories as 
Roman provinces that corresponds to the provinces in `"rpmcd"` dataset. 
The union of the names of provinces and dates of establishment as a Roman province is a data frame object 
`rpde` that better displays without the row names.



```{r}
# Roman province dates of establishement (strings still strings)
rpde <- cbind(names(rpmcd),dts(est)) |>
  as.data.frame(stringsAsFactors=FALSE)
```
```{r}
rownames(rpde) <- NULL
head(rpde)
```

<div style="margin-bottom:30px;"> </div>

Because the dates have a numerical format from function `dts()`, the data frame allows producing a chronology of affiliation dates for the provinces and regions to the Roman Empire by ordering the second variable in `rpde`. 



```{r}
# order of affiliation of provinces
rpde[order(as.numeric(rpde$V2)),1]
```

<div style="margin-bottom:30px;"> </div>

The regions in the Italian peninsula have the earliest affiliation dates, and Mesopotamia has the latest affiliation date to the Roman Empire.


<div style="margin-bottom:50px;"> </div>


### Roman influence periods

* Dataset `"rpcp"` has influence periods of the Roman Empire.

```{r, echo=TRUE, eval=TRUE}
# list with 45 early and late influence dates provinces
data("rpcp")
```

```{r}
# look at data internal structure
str(rpcp)
```

<div style="margin-bottom:50px;"> </div>


#### Early period of Roman influence

Visualize time intervals of early Roman influence in provinces and regions.

```{r, echo=TRUE, eval=FALSE}
# early influence dates are in first list of 'rpcp'
plot.dates(x=rpcp[[1]], taq="EarInf", tpq="OffPrv", main="Early period", ylab="province")
```
```{r, echo=FALSE, eval=TRUE, fig.width=4, fig.height=4, fig.align="center"}
plot.dates(x=rpcp[[1]], taq="EarInf", tpq="OffPrv", main="Early period", ylab="province", yaxt="n")
```

<div style="margin-bottom:50px;"> </div>


#### Late period and fall from the Roman Empire

Time intervals of late Roman influence in provinces and regions depicted with mid points and 
range interval if longer than one.

```{r, echo=TRUE, eval=FALSE}
# late influence dates are in second list of 'rpcp'
plot.dates(x=rpcp[[2]], type="mp", taq="LateInf", tpq="Fall", lwd=5, col="red", 
           main="Late period", ylab="province")
```
```{r, echo=FALSE, eval=TRUE, fig.width=4, fig.height=4, fig.align="center"}
plot.dates(x=rpcp[[2]], type="mp", taq="LateInf", tpq="Fall", lwd=5, col="red", main="Late period", ylab="province", yaxt="n")
```


<div style="margin-bottom:80px;"> </div>



## Restricted imputation of missing dating data

* Dataset `rpd` has time intervals for `"not_before"` and `"not_after"` that corresponds to the dating data 
in the `EDH` dataset.


```{r, echo=TRUE, eval=TRUE}
# Roman provinces dates from EDH
data("rpd")
```

```{r}
# Rome
summary(rpd$Rom)
```

```{r}
# Aegyptus
summary(rpd$Aeg)
```

<div style="margin-bottom:10px;"> </div>

These intervals are the basis for a restricted imputation of missing dating data in `EDH`


<div style="margin-bottom:40px;"> </div>

### Imputation of dates by province
Function `edhwpd()` constructs, for a chosen province, a list of data frames with the 
components made of its inscriptions related by attribute co-occurrences. 
The replacement of missing dates occurs in this setting with function `rmids()` that stand for 
*restricted multiple imputation on data subsets*.

An example of restricted multiple imputations is the province of **Armenia** which has the fewest inscriptions in the `EDH` dataset. Dataset `rpd` is a list where each component corresponds to a province and where the component class provides the `HD` `ids` of inscriptions.



```{r}
# Armenia
rpd$Arm
```


<div style="margin-bottom:40px;"> </div>

#### Imputation of inscriptions by similarity

Imputation from similarities of attribute variables per province and dates is organised with wrapper function `edhwpd()` having different argument options.

```{r}
# list with arguments
formals(edhwpd)
```

<div style="margin-bottom:40px;"> </div>

By default, the input data for this function is the `EDH` dataset and the organisation is based on characteristics of the 
artefacts in `vars`. 


<div style="margin-bottom:10px;"> </div>


```{r}
# characteristics of inscriptions
vars = c("findspot_ancient", "type_of_inscription", "type_of_monument", "language")
```
<div style="margin-bottom:10px;"> </div>

Function `rmids()` performs the multiple imputation of missing dating data in `EDH` by default or in another dataset as input.
In the case of `Arm`, record `HD015521` has censored data in dates while the other two records have complete missing dating data.


```{r, echo=TRUE, eval=TRUE}
# Armenia: restricted imputation of dates
edhwpd(vars=vars, province="Arm") |> 
  rmids()
```

<div style="margin-bottom:30px;"> </div>


The warnings tell us that the imputation values are taken from the respective province in the `rpd` dataset 
where `avg len TS` stands for *average length of timespan*, `min TAQ` is the minimum value of `not_before`, and
`max TPQ` is the maximum value of `not_after`.


<div style="margin-bottom:50px;"> </div>


### Pooling results

Since there are multiple imputations of missing dating data, one next step is to combine the data by pooling rules of the *m* results from function `rmids()` into final point estimates plus standard error.


Pooling options for time intervals are take:

* average time-span with `avg len TS`
* `min TAQ` and `max TPQ`
* `max TAQ` and `min TPQ`

With these options, there is a single imputed value per variable with implied consequences.


<div style="margin-bottom:60px;"> </div>

&uml;

<div style="margin-bottom:60px;"> </div>



<div style="margin-bottom:60px;"> </div>

### See also

#### Vignettes

* [Datasets in `"sdam"` package](../doc/Intro.html)

* [Re-encoding `people` in the `EDH` dataset](../doc/Encoding.html)

* [Cartographical maps and networks](../doc/Maps.html)

<div style="margin-bottom:30px;"> </div>


#### Reference Manual

* [sdam: Digital Tools for the SDAM Project at Aarhus University](../html/sdam-package.html)
* [`"sdam"` manual](https://github.com/mplex/cedhar/blob/master/typesetting/reports/sdam.pdf)

<div style="margin-bottom:30px;"> </div>

#### Project

* [Release candidate version](https://github.com/sdam-au/sdam)
* [Code snippets using `"sdam"`](https://github.com/sdam-au/R_code)
* [Social Dynamics and complexity in the Ancient Mediterranean project](https://sdam-au.github.io/sdam-au/)

<div style="margin-bottom:60px;"> </div>

&nbsp;
