---
title: "Getting UK tax data with hmrc"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting UK tax data with hmrc}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

The `hmrc` package provides tidy access to statistical data published by HM Revenue
and Customs (HMRC) on GOV.UK. All functions resolve download URLs at runtime via
the GOV.UK Content API and cache files locally between sessions. Every result is
returned as an `hmrc_tbl`, a subclass of `data.frame` carrying provenance metadata
(source URL, fetch time, vintage, cell methods) for reproducible fiscal research.

```{r load}
library(hmrc)
```
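Because an `hmrc_tbl` is a `data.frame` subclass, it works with ordinary data frame tools. As a rough illustration of the general idea only, a data frame subclass can carry metadata in an attribute. This is not the package's implementation: `toy_tbl()` and the attribute name `toy_meta` are hypothetical.

```{r provenance-sketch}
# Hypothetical constructor: NOT part of hmrc, for illustration only
toy_tbl <- function(df, meta) {
  stopifnot(is.data.frame(df), is.list(meta))
  attr(df, "toy_meta") <- meta
  class(df) <- c("toy_tbl", class(df))
  df
}

x <- toy_tbl(
  data.frame(date = as.Date("2020-01-01"), value = 1),
  list(source_url = "https://example.org/data.csv", fetched_at = Sys.time())
)

inherits(x, "data.frame")        # the subclass is still a data frame
attr(x, "toy_meta")$source_url   # metadata travels with the object
```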

## Discovery

`hmrc_search()` searches the dataset catalogue by keyword. `hmrc_publications()`
returns a tidy index of implemented and planned datasets.

```{r discover}
# Anything in the catalogue mentioning capital gains
hmrc_search("capital gains")

# Only annual datasets already implemented
hmrc_search(implemented = TRUE, frequency = "annual")
```

## Monthly tax receipts

`hmrc_tax_receipts()` downloads the monthly HMRC Tax Receipts and National Insurance
Contributions bulletin, covering 41 tax heads from April 2016 to the most recent
published month.

```{r receipts-basic}
# All 41 tax heads
receipts <- hmrc_tax_receipts()
head(receipts)
#>         date   tax_head          description receipts_gbp_m
#>   2016-04-01 income_tax Income Tax (PAYE...          17423
#>   2016-05-01 income_tax Income Tax (PAYE...          11847
```

Use `hmrc_list_tax_heads()` to see all available identifiers without downloading
data:

```{r list-heads}
hmrc_list_tax_heads()
```

Filter to specific heads and date ranges:

```{r receipts-filter}
big_three <- hmrc_tax_receipts(
  tax   = c("income_tax", "vat", "nics_total"),
  start = "2020-01"
)
```

Inspect the provenance metadata on any result:

```{r meta}
hmrc_meta(big_three)
#> $dataset
#> [1] "tax_receipts_monthly"
#> $source_url
#> [1] "https://www.gov.uk/government/statistics/hmrc-tax-and-nics-receipts-for-the-uk"
#> $cell_methods
#> [1] "cash"
#> $frequency
#> [1] "monthly"
#> $fetched_at
#> [1] "2026-04-26 09:00:00 UTC"
```

The result is an ordinary data frame, so it can be passed straight to `ggplot2`:

```{r receipts-plot, fig.width = 7, fig.height = 4}
library(ggplot2)

ggplot(big_three, aes(x = date, y = receipts_gbp_m / 1000, colour = description)) +
  geom_line(linewidth = 0.8) +
  scale_y_continuous(labels = scales::label_comma(suffix = "bn")) +
  labs(
    title   = "UK monthly tax receipts",
    x       = NULL,
    y       = "GBP billions",
    colour  = NULL,
    caption = "Source: HMRC Tax Receipts and NICs bulletin"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")
```
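Monthly cash receipts can vary a lot from month to month, so a trailing 12-month total is sometimes easier to read than the raw series. A minimal base R sketch on made-up numbers follows; the `toy_receipts` data frame is hypothetical and only mimics the `date` and `receipts_gbp_m` columns.

```{r rolling-sketch}
# Made-up monthly series: 24 months, for illustration only
toy_receipts <- data.frame(
  date           = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  receipts_gbp_m = rep(c(10, 20), times = 12)
)

# Trailing 12-month sum; the first 11 months are NA (incomplete window)
toy_receipts$rolling_12m <- as.numeric(
  stats::filter(toy_receipts$receipts_gbp_m, rep(1, 12), sides = 1)
)

tail(toy_receipts, 3)
```

For real data with several tax heads, the same calculation would need to be applied within each `tax_head` separately.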

## VAT

`hmrc_vat()` covers monthly VAT receipts from April 1973, broken into payments,
repayments, import VAT, and home VAT.

```{r vat}
# Total VAT and repayments from January 2015
vat <- hmrc_vat(measure = c("total", "repayments"), start = "2015-01")

# Repayments are recorded as negative (money flowing out of HMRC)
head(vat[vat$measure == "repayments", c("date", "receipts_gbp_m")])
```
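When comparing measures side by side, it can help to reshape the long result into one column per measure. The sketch below uses made-up numbers and assumes the result has `date`, `measure`, and `receipts_gbp_m` columns; the `toy_vat` values are hypothetical.

```{r vat-wide-sketch}
# Made-up long-format data, for illustration only
toy_vat <- data.frame(
  date           = rep(as.Date(c("2015-01-01", "2015-02-01")), each = 2),
  measure        = rep(c("total", "repayments"), times = 2),
  receipts_gbp_m = c(100, -30, 110, -35)
)

# One row per date, one column per measure
wide <- reshape(toy_vat, idvar = "date", timevar = "measure",
                direction = "wide")
names(wide) <- sub("^receipts_gbp_m\\.", "", names(wide))

# Repayments as a positive magnitude, since they are stored as negatives
wide$repayments_abs <- abs(wide$repayments)
wide
```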

## Fuel duties

`hmrc_fuel_duties()` covers monthly hydrocarbon oil duty receipts from January 1990,
broken down into petrol, diesel, other, and total.

```{r fuel}
fuel <- hmrc_fuel_duties(fuel = "total", start = "2010-01")

# Calendar-year totals (the latest year may be incomplete)
fuel$year <- format(fuel$date, "%Y")
aggregate(receipts_gbp_m ~ year, data = fuel, FUN = sum)
```
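The aggregation above groups by calendar year. HMRC's annual statistics are labelled by financial year (for example 2019-20), which for monthly data means April to March. A sketch of that grouping on made-up numbers follows; the helper `tax_year_label()` is hypothetical, not an `hmrc` function.

```{r fy-sketch}
# Hypothetical helper: label each date with its April-to-March year
tax_year_label <- function(date) {
  yr    <- as.integer(format(date, "%Y"))
  mon   <- as.integer(format(date, "%m"))
  start <- ifelse(mon >= 4, yr, yr - 1L)
  sprintf("%d-%02d", start, (start + 1L) %% 100L)
}

# Made-up monthly series, for illustration only
toy_fuel <- data.frame(
  date           = seq(as.Date("2010-01-01"), by = "month", length.out = 18),
  receipts_gbp_m = 1
)
toy_fuel$tax_year <- tax_year_label(toy_fuel$date)

fy_totals <- aggregate(receipts_gbp_m ~ tax_year, data = toy_fuel, FUN = sum)
fy_totals
```

Partial years at either end of the series show up as smaller totals, so it is worth checking how many months each group contains before comparing years.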

## Tobacco duties

`hmrc_tobacco_duties()` covers monthly tobacco duty receipts from January 1991, by
product: cigarettes, cigars, hand-rolling tobacco, other, and total.

```{r tobacco}
tobacco <- hmrc_tobacco_duties(product = c("cigarettes", "hand_rolling"),
                               start   = "2015-01")
```

## Corporation Tax

`hmrc_corporation_tax()` returns annual Corporation Tax receipts broken down by levy
type: onshore CT, offshore CT, Bank Levy, Bank Surcharge, Residential Property
Developer Tax (RPDT), Energy Profits Levy (EPL), and Electricity Generators Levy
(EGL). The series covers 2019-20 to the most recent financial year.

```{r ct}
ct <- hmrc_corporation_tax()
ct[ct$type == "total_ct", c("tax_year", "receipts_gbp_m")]
```

## Stamp duty

`hmrc_stamp_duty()` returns annual stamp duty receipts by type from 2003-04: SDLT
on property, SDLT on new leases, SDRT on shares, and stamp duty on documents.

```{r stamp}
sd <- hmrc_stamp_duty(type = "sdlt_total")
tail(sd[, c("tax_year", "receipts_gbp_m")], 5)
```

## R&D tax credits

`hmrc_rd_credits()` returns annual statistics on R&D tax credit claims and their
cost by scheme (SME R&D Relief and RDEC) from 2000-01.

```{r rd}
# Cost of R&D credits: SME vs RDEC
rd <- hmrc_rd_credits(measure = "amount_gbp_m")
rd[rd$tax_year == "2023-24", c("scheme", "description", "value")]
```

## Capital Gains Tax

`hmrc_capital_gains()` returns annual estimates of CGT taxpayers, gains, and tax
liabilities from 1987-88 (HMRC CGT Table 1).

```{r cgt}
# Total CGT liabilities over time
cgt <- hmrc_capital_gains(measure = "tax_total_gbp_m")
tail(cgt[, c("tax_year", "value")], 6)
```

## Inheritance Tax

`hmrc_inheritance_tax()` returns IHT estate counts, tax due, average tax, and
effective tax rates by net-estate band for the latest published year of death
(HMRC IHT Table 12.1a). The publication carries a roughly three-year
administrative lag.

```{r iht}
iht <- hmrc_inheritance_tax()
iht[iht$measure == "number_taxed" & iht$estate_band != "Total",
    c("estate_band", "value")]
```

## Patent Box

`hmrc_patent_box()` returns the annual count of companies electing into the
Patent Box and total relief claimed (HMRC Patent Box Table 1) from 2013-14
onwards.

```{r patent-box}
hmrc_patent_box()
```

## Creative Industries reliefs

`hmrc_creative_industries()` returns annual statistics for the eight creative
industries tax reliefs (film, high-end TV, animation, children's TV, video games,
theatre, orchestra, museums and galleries).

```{r creative}
# Film tax relief over time
hmrc_creative_industries(sector = "film")

# All eight sectors in the latest year
hmrc_creative_industries(tax_year = "2023-24")
```

## Tax gap

`hmrc_tax_gap()` returns the most recent cross-sectional tax gap estimates,
broken down by tax type, taxpayer group, and behaviour component (evasion,
error, avoidance, etc.).

```{r taxgap}
gap <- hmrc_tax_gap()

# Largest gaps first (gap in GBP billions, descending)
gap[order(-gap$gap_gbp_bn),
    c("tax", "component", "gap_gbp_bn", "uncertainty")]
```

## Income Tax liabilities

`hmrc_income_tax_stats()` returns annual Income Tax liabilities by income range
(HMRC Table 2.5), including taxpayer counts, total income, tax liabilities, and
average tax rates.

```{r income-tax}
it <- hmrc_income_tax_stats(tax_year = "2023-24")
it[, c("income_range", "taxpayers_thousands", "tax_liability_gbp_m", "average_rate_pct")]
```
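The column names encode their units (thousands of taxpayers, GBP millions), which matters when deriving per-taxpayer figures. A worked sketch on made-up numbers follows; the `toy_it` values are hypothetical and do not come from Table 2.5.

```{r income-tax-units-sketch}
# Made-up figures, for illustration only
toy_it <- data.frame(
  income_range        = c("band_a", "band_b"),
  taxpayers_thousands = c(2000, 500),
  tax_liability_gbp_m = c(4000, 10000)
)

# GBP millions / thousands of taxpayers = GBP thousands per taxpayer,
# so multiply by 1000 to get GBP per taxpayer
toy_it$avg_tax_gbp <- toy_it$tax_liability_gbp_m /
  toy_it$taxpayers_thousands * 1000

toy_it
```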

## Property transactions

`hmrc_property_transactions()` returns monthly counts of residential and
non-residential property transactions by UK nation from April 2005.

```{r property}
prop <- hmrc_property_transactions(
  type   = "residential",
  nation = "uk",
  start  = "2018-01"
)
```

Plot the monthly counts:

```{r property-plot, fig.width = 7, fig.height = 4}
ggplot(prop, aes(x = date, y = transactions / 1000)) +
  geom_line(colour = "#3B82F6", linewidth = 0.8) +
  scale_y_continuous(labels = scales::label_comma(suffix = "k")) +
  labs(
    title   = "UK residential property transactions",
    x       = NULL,
    y       = "Transactions (thousands)",
    caption = "Source: HMRC Monthly Property Transactions bulletin"
  ) +
  theme_minimal(base_size = 12)
```

## Caching

All downloads are cached locally in your user cache directory. Subsequent calls
reuse the cached file instead of downloading it again.

```{r cache}
# Inspect the cache
hmrc_cache_info()

# Remove files older than 30 days
hmrc_clear_cache(max_age_days = 30)

# Remove everything and start fresh
hmrc_clear_cache()
```
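To make age-based clearing concrete, here is a rough base R sketch of the general idea of removing files older than a cutoff. It is not the package's implementation; the helper `clear_old_files()` and the temporary directory are hypothetical stand-ins.

```{r cache-sketch}
# Hypothetical helper: delete files in `dir` last modified more than
# `max_age_days` ago; returns the paths that were removed
clear_old_files <- function(dir, max_age_days) {
  files <- list.files(dir, full.names = TRUE)
  age   <- difftime(Sys.time(), file.info(files)$mtime, units = "days")
  old   <- files[as.numeric(age) > max_age_days]
  unlink(old)
  invisible(old)
}

# Demo in a throwaway directory, with one file back-dated by 60 days
demo_dir <- file.path(tempdir(), "toy_cache")
dir.create(demo_dir, showWarnings = FALSE)
old_file <- file.path(demo_dir, "old.csv")
new_file <- file.path(demo_dir, "new.csv")
file.create(old_file, new_file)
Sys.setFileTime(old_file, Sys.time() - 60 * 24 * 3600)

clear_old_files(demo_dir, max_age_days = 30)
file.exists(c(old_file, new_file))
```

Only the back-dated file is removed; the freshly created one is younger than the cutoff and stays in place.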
