\documentclass{article}
\usepackage{enumerate}
\usepackage{listings}

\newcommand{\rcode}[1]{\lstinline[language=R,basicstyle=\normalsize\ttfamily]!#1!}

%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Using fitPS to fit number of groups data}

\title{Fitting a zeta distribution to a {P} survey---number of groups data}

\begin{document}

<<R0-setup, echo=FALSE>>=
options(width=70)
knitr::opts_chunk$set(message = FALSE,
                      #background = opcolour,
                      comment = "",
                      highlight = TRUE,
                      prompt = TRUE,
                      tidy = TRUE,
                      tidy.opts = list(arrow = FALSE),
                      warning = FALSE)
suppressWarnings(suppressPackageStartupMessages({library(fitPS)}))
@

In this example we will learn how to use \rcode{fitPS} to fit a zeta distribution to some data from a survey where the number of groups of glass found is recorded. The data in this example comes from @roux2001 who surveyed the footwear of 776 individuals in south-eastern Australia, and is summarised in the table below.
<<R-01, results='asis', echo=FALSE>>=
library(xtable)

tbl = Psurveys$roux$data
colnames(tbl) = c("$n$", "$r_n$")
tbl = xtable(tbl, align = "rrr", digits = 0)
print(tbl, type="latex", , include.rownames = FALSE, sanitize.text = function(x){x})
@

This data set is built into the package and can be accessed from the \rcode{Psurveys} object. That is, we can type:
<<R-02>>=
data("Psurveys")
roux = Psurveys$roux
@
The data is stored as an object of class \rcode{psData}. This probably will not be of importance to most users. If you are interested in the details, then these can be found in the \textbf{Value} section of the help page for \rcode{readData}. There is an S3 \rcode{print} method for objects of time, meaning that if we print the object---either by typing its name at the command prompt, or by explicitly calling \rcode{print}---then we will get formatted printing of the information contained within the object. Specifically:
\begin{enumerate}[-]
  \item the values of $n$ and $r_n$ will be printed out in tabular format (where $n$ is the number of groups or the size of the groups, and $r_n$ is the number of times $n$ has been observed in the survey),
  \item the type of survey will be printed---either "Number of Groups" or "Group Size",
  \item and if the object has a reference or notes attached to it, then these will be
  printed as well
\end{enumerate}
For example
<<R-03>>=
roux
@
It is very simple to fit a zeta distribution to this data set. We do this using the \rcode{fitDist} function.
<<R-04>>=
fit = fitDist(roux)
@
The function returns an object of class \rcode{psFit}, the details of which can be found in the help page for \rcode{fitDist}. There are both S3 \rcode{print} and S3 \rcode{plot} methods for objects of this class. The \rcode{print} method method displays an estimate of the shape parameter $s$, an estimate of the standard deviation---the standard error---of the estimate of $s$ ($\widehat{\mathrm{sd}}(\hat{s})=\mathrm{se}(\hat{s})$). **NOTE**: it is important to understand that the value of the shape parameter that is displayed, and the value that is stored in the fitted object differ by 1. That is, $s$ is shown,m and $s^\prime = s - 1$ is stored. This is done because the package which assists in the fitting, \rcode{VGAM}, is parameterised in terms of $s^\prime$. This difference only has consequences if the fitted value is being used in conjuction with other \rcode{VGAM} functions.   The \rcode{print} method also displays the first 10 fitted probabilities from the model by default.
<<R-05>>=
fit
@
The package provides a \rcode{confint} method for the fitted value. The method returns both a Wald confidence interval and profile likelihood interval. The Wald interval takes the usual the form where the lower and upper bound are given by $\hat{s} \pm z^*(1-\alpha/2)\times se(\hat{s}))$. The profile likelihood interval finds the end-points of the interval that satisfies
\[
-2\left[\ell(\hat{s};\mathbf{x})-\ell(s;\mathbf{x})\right] \le \chi^2_1(\alpha)
\]
where $\ell(s;\mathbf{x})$ is value of the log-likelihood given shape parameter $s$. The two intervals are returned as elements of a \rcode{list} named \rcode{wald} and \rcode{prof} respectively.
<<R-06>>=
ci = confint(fit)
ci$wald
ci$prof
@
You will notice that neither of these intervals contain the value shown in the previous output. However, this simply because they are confidence intervals on $s^\prime$ and not $s$. This can be remedied by adding one to each interval:
<<R-07>>=
ci$wald + 1
ci$prof + 1
@
The reason for not \emph{correcting} these intervals is that the method mostly exists to feed into other parts of the package, especially the \rcode{plot} method.
\end{document}
