---
title: "Data details"
author: "Chi Zhang"
date: "2023-05-18"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data details}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette, we provide more detailed information on the data included in `covidnor` and demonstrate how to extract the data you need.


## Covid data outcomes

The Covid data are organised into the groups of outcomes described below, each available for specific combinations of time and geographic granularity.

There are two types of **time granularity** in the data: day (`date`) and week (`isoyearweek`). For **geo granularity**, there are country (`nation`), county (`county`) and municipality (`municip`). Note that not all outcomes are available at municipality level.
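Because availability differs between outcomes, it can help to check which time/geo combinations actually carry values for a given variable. The chunk below is a minimal sketch: `check_availability()` is a hypothetical helper (not part of `covidnor`), and the small table it is tried on contains made-up numbers, not real Covid data. It assumes that a combination without data shows up either as missing rows or as `NA` values; the helper handles both cases.

```{r}
library(data.table)

# Hypothetical helper (not part of covidnor): count rows with a non-missing
# value of `outcome` for each time/geo granularity combination
check_availability <- function(dt, outcome) {
  dt[!is.na(dt[[outcome]]), .N, keyby = .(granularity_time, granularity_geo)]
}

# Made-up toy data in the same long format (values are NOT real)
toy_avail <- data.table(
  granularity_time    = c("date", "date", "isoyearweek", "date"),
  granularity_geo     = c("nation", "municip", "nation", "municip"),
  cases_by_testdate_n = c(10L, NA, 70L, NA),
  cases_by_regdate_n  = c(12L, 3L, 75L, 2L)
)

# In this toy table, only nation-level rows carry test-date cases
check_availability(toy_avail, "cases_by_testdate_n")
```

The same call on the real data, for example `check_availability(covidnor::total_b2020, "cases_by_regdate_n")`, lists the combinations available for that outcome.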


Population data are included so that the number of cases **per 100.000 population** can be reported for each location.
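A rate per 100.000 population is the count multiplied by 100.000 and divided by the population. The sketch below shows that arithmetic on made-up numbers; `cases_n` and `pop` are hypothetical column names, since this vignette does not name the population column in `covidnor`.

```{r}
library(data.table)

# Made-up toy data; `cases_n` and `pop` are hypothetical column names
toy_pop <- data.table(
  location = c("loc_a", "loc_b"),
  cases_n  = c(50L, 30L),
  pop      = c(200000L, 60000L)
)

# Cases per 100.000 population = cases * 100000 / population
toy_pop[, cases_vs_pop_pr100000 := cases_n * 100000 / pop]
toy_pop
```

Scaling by population makes locations of different size comparable: in the toy table, `loc_b` has fewer cases than `loc_a` but the higher rate (50 versus 25 per 100.000).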

This dataset only contains the **total** age group and the **total** sex group; it is not broken down by age or sex.



### Cases

**Cases** are Covid cases confirmed by a PCR test. We have the following variables:

* number of cases by date of PCR test, as counts (`cases_by_testdate_n`) and per 100.000 population (`cases_by_testdate_vs_pop_pr100000`)
* number of cases by date of registration, as counts (`cases_by_regdate_n`) and per 100.000 population (`cases_by_regdate_vs_pop_pr100000`)

Available for 

* granularity_time: date, isoyearweek
* granularity_geo: nation, county, municip (only for registration date)

### Tests

**Tests** are the number of testing events. We have the following variables: 

* number of testing events (all, positive, negative), `testevents_all_n`, `testevents_pos_n`, `testevents_neg_n`
* percentage of testing events that are positive, `testevents_pos_vs_all_pr100`

Available for 

* granularity_time: date, isoyearweek
* granularity_geo: nation
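Judging by its name, `testevents_pos_vs_all_pr100` is the number of positive testing events divided by all testing events, times 100. The sketch below reproduces that assumed relationship on made-up counts; the package may treat edge cases (such as days with zero tests) differently.

```{r}
library(data.table)

# Made-up toy counts (NOT real data), using the documented column names
toy_tests <- data.table(
  date             = as.Date(c("2022-01-03", "2022-01-04")),
  testevents_all_n = c(200L, 400L),
  testevents_pos_n = c(30L, 100L)
)

# Assumed definition: percentage of testing events that were positive
toy_tests[, testevents_pos_vs_all_pr100 := 100 * testevents_pos_n / testevents_all_n]
toy_tests
```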


### Hospital admission

We provide two variables related to hospital admissions:

* hospital admissions with Covid as the main cause, `hospital_admissions_main_cause_n`;
* intensive care unit (ICU) admissions, `icu_admissions_n`.

Available for 

* granularity_time: date, isoyearweek
* granularity_geo: nation

### Vaccination

For vaccination, we provide data by two reference dates: vaccination date and registration date. The data cover four doses, and for each dose the following variables are available (shown here for dose 1):

* number of vaccinations given on the date, `vax_dose_1_by_vaxdate_n`
* number of vaccinations registered in SYSVAK (the Norwegian Immunisation Registry) on the date, `vax_dose_1_by_regdate_n`
* cumulative number of vaccinations given up to the date, `vax_dose_1_by_vaxdate_sum0_999999_n`
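The cumulative variable can be thought of as a running total of the daily counts within each location. The sketch below illustrates that idea on made-up daily dose-1 counts; it is an assumption that `vax_dose_1_by_vaxdate_sum0_999999_n` is built exactly this way, and `vax_dose_1_cumulative` is a hypothetical column name.

```{r}
library(data.table)

# Made-up daily dose-1 counts for two hypothetical locations (NOT real data)
toy_vax <- data.table(
  location_code           = rep(c("loc_a", "loc_b"), each = 3),
  date                    = rep(as.Date("2021-01-01") + 0:2, times = 2),
  vax_dose_1_by_vaxdate_n = c(5L, 0L, 12L, 2L, 4L, 1L)
)

# Sort by location and date, then take a running total within each location.
# Assumption: this mirrors how vax_dose_1_by_vaxdate_sum0_999999_n is built.
setorder(toy_vax, location_code, date)
toy_vax[, vax_dose_1_cumulative := cumsum(vax_dose_1_by_vaxdate_n), by = location_code]
toy_vax
```

On the real data, comparing such a running total with `vax_dose_1_by_vaxdate_sum0_999999_n` is one way to check this assumption.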


Available for 

* granularity_time: date, isoyearweek
* granularity_geo: nation, county, municip (only for registration date)






## Subsetting data 

Rather than working with the full dataset, you will usually want a specific combination of **time, location and outcome**. We recommend using [data.table](https://CRAN.R-project.org/package=data.table) syntax for filtering and subsetting.

The way we organize time and location codes is documented in more detail in another csverse package, [cstidy](https://www.csids.no/cstidy/index.html). We highly recommend you read through this [vignette](https://www.csids.no/cstidy/articles/csfmt_rts_data_v1.html#context-specific-columns)!



### Based on `granularity_time` and `granularity_geo`

To get **weekly** Covid cases and hospital admissions with Covid as the main cause for Norway (nation level):

```{r}
# load the full dataset (419k rows)
totaldata <- covidnor::total_b2020

# weekly confirmed cases (by test date) and hospital admissions
# with Covid as main cause, for Norway as a whole
case_hosp <- totaldata[granularity_time == 'isoyearweek' &
                         granularity_geo == 'nation',
                       .(isoyearweek,
                         date,
                         location_name,
                         cases = cases_by_testdate_n,
                         hospital_adm = hospital_admissions_main_cause_n)]
head(case_hosp)
```
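If you repeat this kind of subset often, you could wrap it in a small function. `get_outcomes()` below is a hypothetical helper, not a `covidnor` function; it is tried on a made-up table with the same column layout, and passing `totaldata` instead applies it to the real data.

```{r}
library(data.table)

# Hypothetical helper (not part of covidnor): keep one time/geo granularity
# and return identifying columns plus the requested outcome columns
get_outcomes <- function(dt, gran_time, gran_geo, outcomes) {
  keep <- c("date", "location_code", outcomes)
  dt[granularity_time == gran_time & granularity_geo == gran_geo, ..keep]
}

# Made-up toy rows (values are NOT real)
toy_long <- data.table(
  granularity_time    = c("isoyearweek", "date", "isoyearweek"),
  granularity_geo     = c("nation", "nation", "county"),
  date                = as.Date("2022-01-09"),
  location_code       = c("nation_nor", "nation_nor", "county_nor03"),
  cases_by_testdate_n = c(100L, 20L, 30L),
  icu_admissions_n    = c(2L, 0L, 1L)
)

weekly_nation <- get_outcomes(toy_long, "isoyearweek", "nation", "cases_by_testdate_n")
weekly_nation
```

The `..keep` prefix tells data.table to look up `keep` as a variable holding column names rather than as a column.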


### Based on specific dates and locations

To get daily data for a specific date and set of locations (here, two counties on 10 December 2021):

```{r}
totaldata[granularity_time == 'date' &
            date == '2021-12-10' &
            location_code %in% c('county_nor03', 'county_nor15'), 
          .(date, location_name, 
            cases = cases_by_testdate_n, 
            vax_1 = vax_dose_1_by_vaxdate_n, 
            vaxcum1 = vax_dose_1_by_vaxdate_sum0_999999_n)]
```
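To select the same variable for all doses at once instead of typing each name, you can match column names with a regular expression. A sketch on a made-up table follows; on the real data, `names(totaldata)` would take the place of `names(toy_doses)`. The pattern assumes that doses 2 to 4 follow the same naming scheme as dose 1, and its `_by_vaxdate_n$` ending leaves out the registration-date and cumulative columns.

```{r}
library(data.table)

# Made-up toy table with dose columns named after the documented pattern
toy_doses <- data.table(
  date                    = as.Date("2021-12-10"),
  location_code           = c("loc_a", "loc_b"),
  vax_dose_1_by_vaxdate_n = c(10L, 4L),
  vax_dose_2_by_vaxdate_n = c(8L, 3L),
  vax_dose_1_by_regdate_n = c(9L, 5L)
)

# Pick every "doses by vaccination date" column by name pattern
vax_cols <- grep("^vax_dose_[0-9]+_by_vaxdate_n$", names(toy_doses), value = TRUE)
dose_subset <- toy_doses[, c("date", "location_code", vax_cols), with = FALSE]
dose_subset
```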

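You can also get daily data for a whole calendar month, such as April 2022. Filtering on `granularity_time` as well keeps any weekly rows from the same month out of the result:

```{r}
totaldata[granularity_time == 'date' &
            calyearmonth == '2022-M04' &
            location_code == 'county_nor03', 
          .(date, location_name, 
            cases = cases_by_testdate_n)]
```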
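For a period that is not a whole month, data.table's `between()` can filter on `date` directly (both ends inclusive). A sketch on made-up rows follows. Weekly rows also have a `date`, so adding `granularity_time == "date"` keeps them from being mixed in with daily ones.

```{r}
library(data.table)

# Made-up daily and weekly rows (values are NOT real)
toy_range <- data.table(
  granularity_time    = c("date", "date", "date", "isoyearweek"),
  date                = as.Date(c("2022-04-10", "2022-04-14", "2022-04-20", "2022-04-17")),
  cases_by_testdate_n = c(5L, 7L, 9L, 40L)
)

# Daily rows from 10 to 18 April 2022, inclusive
april_subset <- toy_range[granularity_time == "date" &
                            between(date, as.Date("2022-04-10"), as.Date("2022-04-18"))]
april_subset
```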



