\documentclass[nojss]{jss}
\usepackage{Sweave}

\author{Tomasz G\'orecki\\Adam Mickiewicz University \And
  \L ukasz Smaga\\Adam Mickiewicz University}
  \title{\pkg{fdANOVA}: An \proglang{R}~Software Package for\\ Analysis of Variance for Univariate and Multivariate Functional Data}

  \Plainauthor{Tomasz G\'orecki, \L ukasz Smaga}
  \Plaintitle{fdANOVA: An R Software Package for\\ Analysis of Variance for Univariate and Multivariate Functional Data}
  \Shorttitle{\pkg{fdANOVA}: Analysis of Variance for Functional Data in \proglang{R}}

  \Abstract{
Functional data, i.e., observations represented by curves or functions, frequently arise in various fields. The theory and practice of statistical methods for such data is referred to as functional data analysis (FDA), which is one of the major research fields in statistics. The practical use of FDA methods is made possible thanks to the availability of specialized and usually free software. In particular, a number of \proglang{R}~packages are devoted to these methods. In this vignette, we introduce a new \proglang{R}~package \pkg{fdANOVA}, which provides access to a broad range of global analysis of variance methods for univariate and multivariate functional data. The implemented testing procedures, mainly for the homoscedastic case, are briefly reviewed and illustrated by examples on a well-known functional data set. To reduce the computation time, a parallel implementation is also provided.
}
  \Keywords{analysis of variance, functional data, \pkg{fdANOVA}, \proglang{R}}
  \Plainkeywords{analysis of variance, functional data, fdANOVA, R} %% without formatting

  \Address{
  Tomasz G\'orecki, \L ukasz Smaga\\
  Faculty of Mathematics and Computer Science\\
  Adam Mickiewicz University\\
  Umultowska 87, 61-614 Pozna\'n, Poland\\
  E-mail: \email{tomasz.gorecki@amu.edu.pl, ls@amu.edu.pl}}
  %% for those who use Sweave please include the following line (with % symbols):
  %% need no \usepackage{Sweave.sty}

  \usepackage{amsmath}
  \usepackage{booktabs}

  \newcommand{\vect}[1]{\mbox{\boldmath${#1}$}}

  \begin{document}
\SweaveOpts{concordance=TRUE}

\SweaveOpts{engine=R,eps=FALSE}
%\VignetteIndexEntry{fdANOVA: An R Software Package for Analysis of Variance for Univariate and Multivariate Functional Data}
%\VignetteDepends{}
%\VignetteKeywords{analysis of variance, functional data, fdANOVA, R}
%\VignettePackage{fdANOVA}

<<preliminaries,echo=FALSE,results=hide>>=
options(prompt = "R> ", continue = "+  ", width = 70, useFancyQuotes = FALSE)
@

\section[Introduction]{Introduction}
\label{sec1}
In recent years, considerable attention has been devoted to the analysis of so-called functional data. Functional data are represented by functions or curves which are observations of a random variable (or random variables) taken over a continuous interval or on a dense discretization of it. Sets of functional observations are particular examples of high-dimensional data, where the number of variables significantly exceeds the number of observations. Such data are often gathered automatically due to advances in modern technology, including computing environments. Functional data are naturally collected in the agricultural sciences, behavioral sciences, chemometrics, economics, medicine, meteorology, spectroscopy, and many other fields. The main purpose of functional data analysis (FDA) is to provide tools for statistically describing and modelling sets of functions or curves. The monographs by \cite{RamsaySilverman2005}, \cite{FerratyVieu2006}, \cite{HorvathKokoszka2012} and \cite{Zhang2013} present a broad perspective on FDA solutions. The following problems for functional data are commonly studied \citep[see also the review papers of][]{Cuevas2014,Wangetal2016}: analysis of variance \citep{Faraway1997, CFF2004, ShenFaraway2004, ZhangChen2007, CAFB2010, Zhang2011, Zhang2013,
ZhangLiang2014, GoreckiSmaga2015, Zhangetal2013}, canonical correlation analysis \citep{KW2013, KeserKocakoc2015}, classification \citep{JamesHastie2001, DelaigleHall2012, DelaigleHall2013, Changetal2014}, cluster analysis \citep{Giacofcietal2013, Coffeyetal2014, YamamotoTerada2014}, outlier detection \citep{FBetal2008}, principal component analysis \citep{Boenteetal2014, Fremdtetal2014}, regression analysis \citep{Chenetal2011, Hilgertetal2013, Morris2015, Robbianoetal2015, Pengetal2016}, repeated measures analysis \citep{MCC2011,Smaga2017}, simultaneous confidence bands \citep{Degras2011, Caoetal2012, Maetal2012}. The above references concern univariate functional data. However, some methods have their multivariate counterparts, e.g., analysis of variance \citep{GoreckiSmaga2017}, canonical correlation analysis \citep{Goreckietal2016a}, classification \citep{Goreckietal2015, Goreckietal2016a, Goreckietal2016b}, cluster analysis \citep{Tokushigeetal2007, JacquesPreda2014, ParkAhn2016}, linear regression and prediction \citep{Chiouetal2016, KrzyskoSmaga2017}, principal component analysis \citep{Berrenderoetal2011, Chiouetal2014}, statistical modelling \citep{ChiouMuller2014}. Some examples of applications of functional data analysis can be found in \cite{Ogdenetal2002}, \cite{Pfeifferetal2002}, \cite{Rossietal2002}, \cite{JankShmueli2006}, \cite{FBetal2008}, \cite{Bobelynetal2010}, \cite{TarrioSaavedraetal2011} and \cite{Longetal2012}.

Many methods for functional data analysis have already been implemented in the \proglang{R}~software \citep{RCoreTeam2017}. The packages \pkg{fda} \citep{Ramsayetal2009,Ramsayetal2014} and \pkg{fda.usc} \citep{FBOF2012} are the biggest and probably the most commonly used ones. The first package includes techniques for functional data in the Hilbert space $L_2(I)$ of square integrable functions over an interval $I=[a,b]$. On the other hand, in the second one, many of the implemented methods do not need such an assumption and use only the values of functions evaluated on a grid of discretization points (also non-equally spaced). There is also a broad range of \proglang{R}~packages containing solutions for more particular functional data problems. The widely used principal component analysis for functional data implemented in the packages \pkg{fda} and \pkg{fda.usc} is also available in the packages \pkg{fdapace} \citep{Daietal2016}, \pkg{fpca} \citep{PengPaul2011}, \pkg{fdasrvf} \citep{fdasrvf2017} and \pkg{MFPCA} \citep{MFPCA2017, HappGreven2016}. For classification and cluster analysis, the packages \pkg{classiFunc} \citep{classiFunc2017}, \pkg{fdakma} \citep{Parodietal2015}, \pkg{Funclustering} \citep{Soueidatt2014}, \pkg{funcy} \citep{Yassouridis2016} and \pkg{GPFDA} \citep{GPFDA2014} can be used. In the packages \pkg{fdaPDE} \citep{Lilaetal2016}, \pkg{FDboost} \citep{Brockhausetal2015, BrockhausRugamer2016}, \pkg{flars} \citep{flars2016}, \pkg{funreg} \citep{DziakShiyko2016} and \pkg{refund} \citep{Goldsmithetal2016}, a lot of work was done to develop regression analysis for functional data. The packages \pkg{freqdom} \citep{freqdom2015}, \pkg{ftsa} \citep{Shang2013, HyndmanShang2016}, \pkg{ftsspec} \citep{Tavakoli2015} and \pkg{pcdpca} \citep{Kidzinskietal2016} provide implementations of various methods of functional time series analysis.
Methods for the robust analysis of univariate and multivariate functional data are provided in the \pkg{roahd} package \citep{roahd2017}. Interval testing procedures for functional data in different frameworks (i.e., one- or two-population frameworks, functional linear models) are implemented in the \pkg{fdatest} package \citep{fdatest2015}. Functional data analysis in a mixed model framework is implemented in the package \pkg{fdaMixed} \citep{Markussen2014}. Functional observations can be visualized by many of the plots implemented in the packages \pkg{GOplot} \citep{Walteretal2015}, \pkg{rainbow} \citep{ShangHyndman2016} and \pkg{refund.shiny} \citep{WrobelGoldsmith2016}. The packages \pkg{fds} \citep{ShangHyndman2013} and \pkg{mfds} \citep{GoreckiSmaga2017mfds} contain some interesting functional data sets.

Despite so many \proglang{R}~packages for functional data analysis, a broad range of tests for the widely applicable analysis of variance problem for functional data was implemented only recently, in the package \pkg{fdANOVA}. Earlier, only the testing procedures of \cite{CFF2004} and \cite{CAFB2010} were available, in the package \pkg{fda.usc}. The package \pkg{fdANOVA} is available from the Comprehensive \proglang{R}~Archive Network at \url{http://CRAN.R-project.org/package=fdANOVA}. The aim of this package is to provide a few functions implementing most of the known analysis of variance tests for univariate and multivariate functional data, mainly for the homoscedastic case. Most of them are based on the bootstrap, permutations or projections, which may be time-consuming. For this reason, the package also contains parallel implementations, which make it possible to reduce the computation time significantly, as shown in an empirical evaluation.

The rest of the vignette is organized in the following manner. In Section~\ref{sec2}, the problems of the analysis of variance for univariate and multivariate functional data are introduced. A review of most of the known solutions to these problems is also presented there. Some of the testing procedures are slightly generalized. Since it was not an easy task to implement many different tests in a few functions, their usage may not be easy at first. Thus, Section~\ref{sec3} contains a detailed description of the (possible) preparation of the data and of the package functionality, as well as a package demonstration on a commonly used real data set.

\section{Analysis of variance for functional data}
\label{sec2}
In this section, we briefly describe most of the known testing procedures for the analysis of variance problem for functional data in the univariate and multivariate cases. All of them are implemented in the package \pkg{fdANOVA}.

\subsection{Univariate case}
\label{sec21}
We consider $l$ groups of independent random functions $X_{ij}(t)$, $i=1,\dots,l,$ $j=1,\dots,n_i$ defined over a closed and bounded interval $I=[a,b]$. Let $n=n_1+\dots+n_l$. These groups may differ in mean functions, i.e., we assume that $X_{ij}(t)$, $j=1,\dots,n_i$ are stochastic processes with mean function $\mu_i(t)$, $t\in I$ and covariance function $\gamma(s, t)$, $s,t\in I$, for $i=1,\dots,l$. We are interested in testing the following null hypothesis:
\begin{equation}
\label{H0u}
H_0:\mu_1(t)=\dots=\mu_l(t),\ t\in I.
\end{equation}
The alternative is the negation of the null hypothesis. The problem of testing this hypothesis is known as the one-way analysis of variance problem for functional data (FANOVA).

Many of the tests for (\ref{H0u}) are based on the pointwise $F$~test statistic \citep{RamsaySilverman2005} given by the formula
$$F_n(t)=\frac{\mathrm{SSR}_n(t)/(l-1)}{\mathrm{SSE}_n(t)/(n-l)},$$
where $$\mathrm{SSR}_n(t)=\sum_{i=1}^l n_i(\bar{X}_i(t)-\bar{X}(t))^{2},\quad\mathrm{SSE}_n(t)=\sum_{i=1}^l\sum_{j=1}^{n_i}(X_{ij}(t)-\bar{X}_i(t))^{2}$$ are the pointwise between-subject and within-subject variations, respectively, and $\bar{X}(t)=(1/n)\sum_{i=1}^l\sum_{j=1}^{n_i}X_{ij}(t)$ and $\bar{X}_i(t)=(1/n_i)\sum_{j=1}^{n_i}X_{ij}(t),$ $i=1,\dots,l$, are respectively the sample grand mean function and the sample group mean functions.
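As an illustration, the pointwise quantities above can be computed directly in base \proglang{R} on a common grid of design time points. The following sketch (with simulated data and illustrative object names, not taken from the package implementation) also evaluates the global statistics underlying the tests discussed in this section, approximating the integrals by Riemann sums.

```r
## Minimal sketch: pointwise SSR_n(t), SSE_n(t), F_n(t) and the resulting
## global statistics for simulated data with l = 3 groups on I = [0, 1].
set.seed(1)
tt <- seq(0, 1, length.out = 50)       # common grid of design time points
n.i <- c(10, 12, 15)                   # group sizes n_1, ..., n_l
l <- length(n.i); n <- sum(n.i)
group <- rep(seq_len(l), n.i)
## x: T x n matrix, each column a discretized function (all groups equal in mean)
x <- sapply(seq_len(n), function(j) sin(2 * pi * tt) + rnorm(50, sd = 0.5))

grand.mean <- rowMeans(x)
group.means <- sapply(seq_len(l), function(i) rowMeans(x[, group == i]))
dev <- sweep(group.means, 1, grand.mean)           # group mean deviations
SSR <- as.vector(dev^2 %*% n.i)                    # pointwise between-subject
SSE <- rowSums((x - group.means[, group])^2)       # pointwise within-subject
Fn <- (SSR / (l - 1)) / (SSE / (n - l))            # pointwise F statistic

dt <- tt[2] - tt[1]                                # Riemann sum step
L2.stat <- sum(SSR) * dt                           # L2-norm-based statistic
F.stat <- (sum(SSR) / (l - 1)) / (sum(SSE) / (n - l))  # F-type statistic
GPF.stat <- sum(Fn) * dt                           # GPF statistic
Fmax.stat <- max(Fn)                               # Fmax statistic
```

The sketch also makes visible the pointwise decomposition $\sum_{i,j}(X_{ij}(t)-\bar{X}(t))^2=\mathrm{SSR}_n(t)+\mathrm{SSE}_n(t)$, which can be used as a numerical sanity check.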

\cite{Faraway1997} and \cite{ZhangChen2007} proposed to use only the pointwise between-subject variation and considered the test statistic $\int_I\mathrm{SSR}_n(t)dt$. Tests based on it are called the $L^2$-norm-based tests. The distribution of this test statistic can be approximated by that of $\beta\chi_d^2$ by matching the first two moments of these random variables. The parameters $\beta$ and $d$ are estimated by the naive and bias-reduced methods \citep[see, for instance,][for more details]{GoreckiSmaga2015}. Thus we have the $L^2$-norm-based tests with the naive and bias-reduced methods of estimation of these parameters (the $\mathrm{L}^{2}$N and $\mathrm{L}^{2}$B tests for short). In the case of non-Gaussian samples or small sample sizes, the bootstrap $L^2$-norm-based test is also considered (the $\mathrm{L}^{2}$b test for short).

A slightly different $L^2$-norm-based test was proposed by \cite{CFF2004}. Namely, they considered $\sum_{1\leq i<j\leq l}n_i\int_I(\bar{X}_i(t)-\bar{X}_j(t))^{2}dt$ as a test statistic and approximated its null distribution by a parametric bootstrap method via re-sampling the Gaussian processes involved in the limit random expression of their test statistic under $H_0$. \cite{CFF2004} investigated two testing procedures (the CH and CS tests for short) under the homoscedastic and heteroscedastic cases. The tests differ in how the covariance functions are estimated.

The next test, which uses both the pointwise between-subject and within-subject variations, is known as the $F$-type test. More precisely, the test statistic is of the form
\begin{equation}
\label{Fstat}
\frac{\int_I\mathrm{SSR}_n(t)dt/(l-1)}{\int_I\mathrm{SSE}_n(t)dt/(n-l)}.
\end{equation}
Tests of this type were considered by \cite{ShenFaraway2004} and \cite{Zhang2011}. The null distribution of the above test statistic is approximated by the $F_{(l-1)\kappa,(n-l)\kappa}$~distribution, and depending on the method of estimation of the parameter $\kappa$, the $F$-type tests based on the naive and bias-reduced methods of estimation are considered (the FN and FB tests for short). For the same reasons as for the $L^2$-norm-based tests, the bootstrap $F$-type test (the Fb test for short) is also investigated.

The next testing procedure, which also uses the test statistic (\ref{Fstat}), is a slightly modified version of the procedure proposed by \cite{GoreckiSmaga2015}. Assume that $X_{ij}\in L_2(I)$, $i=1,\dots,l,$ $j=1,\dots,n_i$, where $L_2(I)$ is the Hilbert space of square integrable functions on $I$, equipped with the inner product $\langle f,g\rangle=\int_{I}f(t)g(t)dt$. We represent the functional observations by a finite number of basis functions $\varphi_m\in L_2(I)$, $m=1,2,\dots$, i.e.,
\begin{equation}
\label{Xbfr}
X_{ij}(t)\approx\sum_{m=1}^{K}c_{ijm}\varphi_m(t),\quad t\in I,
\end{equation}
where $c_{ijm}$, $m=1,\dots,K$, are random variables such that $\mathrm{Var}(c_{ijm})<\infty$ and $K$ is sufficiently large \citep[see][for more details about basis function representation]{RamsaySilverman2005}. By the results of \cite{Goreckietal2015}, the commonly used least squares method \citep[see, for example,][]{KW2013} seems to be one of the best for estimating the coefficients $c_{ijm}$, so we use only that method for this purpose. The value of $K$ can be selected for each observation $X_{ij}(t)$ using an information criterion, e.g., BIC, eBIC, AIC or AICc. From the values of $K$ corresponding to all observations, the modal, minimum, maximum or mean value is selected as the common value for all processes $X_{ij}(t)$. By using (\ref{Xbfr}) and easy modifications of the results obtained by \cite{GoreckiSmaga2015}, we proved that the test statistic (\ref{Fstat}) is approximately equal to
\begin{equation}
\label{FGS}
\frac{(a-b)/(l-1)}{(c-a)/(n-l)},
\end{equation}
where $$a=\sum_{i=1}^l\frac{1}{n_i}\mathbf{1}_{n_{i}}^{\top}\mathbf{C}_{i}^{\top}\mathbf{J}_{\vect{\varphi}}\mathbf{C}_{i}\mathbf{1}_{n_{i}},\ b=\frac{1}{n}\sum_{i=1}^l\sum_{j=1}^l\mathbf{1}_{n_{i}}^{\top}\mathbf{C}_{i}^{\top}\mathbf{J}_{\vect{\varphi}}\mathbf{C}_{j}\mathbf{1}_{n_{j}},\ c=\sum_{i=1}^l\mathrm{trace}(\mathbf{C}_i^{\top}\mathbf{J}_{\vect{\varphi}}\mathbf{C}_i),$$ $\mathbf{1}_a$ is the $a\times 1$ vector of ones, $\mathbf{C}_i=(c_{ijm})_{m=1,\dots,K;\,j=1,\dots,n_i}$ are $K\times n_i$ matrices of coefficients, $i=1,\dots,l$, and $\mathbf{J}_{\vect{\varphi}}:=\int_I\vect{\varphi}(t)\vect{\varphi}^{\top}(t)dt$ is the $K\times K$ cross product matrix corresponding to the vector $\vect{\varphi}(t)=(\varphi_1(t),\dots,\varphi_K(t))^{\top}$. Hence, the statistic (\ref{Fstat}) can be calculated based only on the coefficients $c_{ijm}$ and the matrix $\mathbf{J}_{\vect{\varphi}}$, which can be approximated by using the function \code{inprod()} from the \proglang{R}~package \pkg{fda} \citep{Ramsayetal2009,Ramsayetal2014} (for an orthonormal basis, $\mathbf{J}_{\vect{\varphi}}$ is the identity matrix). Moreover, observe that any permutation of the observations leaves the values of the sums $b$ and $c$ unchanged. For this reason, \cite{GoreckiSmaga2015} considered the permutation test based on (\ref{FGS}). We refer to this test as the FP test. The simulation studies of \cite{GoreckiSmaga2015} suggest that the FP test has better finite sample properties than the $F$-type and $L^2$-norm-based tests. Moreover, for short functional data (i.e., data observed on a sparse grid of design time points), it may also be better than the GPF test described in the following paragraph.
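The computation of the FP statistic from the coefficient matrices alone can be sketched in base \proglang{R} as follows (a minimal illustration with simulated coefficients, assuming an orthonormal basis so that $\mathbf{J}_{\vect{\varphi}}$ is the identity; this is not the package implementation). Since $b$ and $c$ are permutation-invariant, only $a$ needs to be recomputed for each permutation.

```r
## Minimal sketch of the FP statistic and its permutation test, computed
## solely from the K x n_i coefficient matrices C_i; object names are ours.
set.seed(2)
K <- 5; n.i <- c(8, 10, 9); l <- length(n.i); n <- sum(n.i)
C <- lapply(n.i, function(ni) matrix(rnorm(K * ni), K, ni))  # C_1, ..., C_l
J <- diag(K)                         # cross product matrix (orthonormal basis)

s.i <- sapply(C, rowSums)            # column i contains C_i %*% 1_{n_i}
a <- sum(sapply(seq_len(l),
                function(i) crossprod(s.i[, i], J %*% s.i[, i]) / n.i[i]))
s <- rowSums(s.i)
b <- as.numeric(crossprod(s, J %*% s)) / n
cc <- sum(sapply(C, function(Ci) sum(diag(crossprod(Ci, J %*% Ci)))))  # "c"
FP.stat <- ((a - b) / (l - 1)) / ((cc - a) / (n - l))

## Permutation test: b and cc are invariant, so only a is recomputed
Call <- do.call(cbind, C)
group <- rep(seq_len(l), n.i)
perm.stat <- function(perm) {
  a.p <- sum(sapply(seq_len(l), function(i) {
    si <- rowSums(Call[, perm[group == i], drop = FALSE])
    crossprod(si, J %*% si) / n.i[i]
  }))
  ((a.p - b) / (l - 1)) / ((cc - a.p) / (n - l))
}
stats <- replicate(200, perm.stat(sample(n)))
p.value <- mean(stats >= FP.stat)    # permutation p value
```

Here $a-b$ and $c-a$ play the roles of the integrated between-subject and within-subject variations, so both are nonnegative for an orthonormal basis.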

In the above test statistics, $\mathrm{SSR}_n(t)$ and $\mathrm{SSE}_n(t)$ were integrated separately. However, the simulation results of \cite{GoreckiSmaga2015} show that, for example, integrating the whole quotient $\mathrm{SSR}_n(t)/\mathrm{SSE}_n(t)$ is more powerful in many situations. Such a test statistic, of the form $\int_I F_n(t)dt$, was considered by \cite{ZhangLiang2014}. They proposed the globalizing pointwise $F$ test (the GPF test) based on this test statistic, with the critical value approximated similarly as for the $L^2$-norm-based test. Although integration seems to be a natural operation on $F_n(t)$ or its part, in some situations other uses of $F_n(t)$ may be better in terms of power, as was shown by \cite{Zhangetal2013}. Instead of the integral of $F_n(t)$, they simply used $\sup_{t\in I}F_n(t)$ as a test statistic and simulated the critical value of the resulting Fmaxb test via bootstrapping. By intensive simulation studies, \cite{Zhangetal2013} found that the Fmaxb (resp. GPF) test generally has higher power than the GPF (resp. Fmaxb) test when the functional data are moderately or highly (resp. less) correlated.

A different approach to testing the null hypothesis (\ref{H0u}) was proposed by \cite{CAFB2010}. Their tests are based on the analysis of randomly chosen projections. Suppose that $\mu_i$, $i=1,\dots,l$ belong to a separable Hilbert space $\mathcal{H}$ endowed with a scalar product $\langle\cdot,\cdot\rangle$, and let $\xi$ be a Gaussian distribution on this space such that each of its one-dimensional projections is nondegenerate. Let $h$ be an element of $\mathcal{H}$ chosen randomly according to the distribution $\xi$. When the null hypothesis $H_0$ holds, we can easily observe that for every $h\in\mathcal{H}$, the following null hypothesis
\begin{equation}
\label{H0p}
H_0^h:\langle\mu_1,h\rangle=\dots=\langle\mu_l,h\rangle
\end{equation}
also holds. Moreover, \cite{CAFB2010} showed that for $\xi$-almost every $h$, $H_0^h$ fails whenever $H_0$ fails. Therefore, a testing procedure for $H_0^h$ can be used to test $H_0$. Assuming that $X_{ij}\in L_2(I)$, $i=1,\dots,l,$ $j=1,\dots,n_i$, \cite{CAFB2010} proposed the following testing procedure, in which $k$ random projections are used:

\begin{enumerate}
    \item Choose, according to the Gaussian distribution $\xi$, functions $h_r\in L_2(I)$, $r=1,\dots,k$.
	 \item Compute the projections $P_{ij}^r=\int_I X_{ij}(t)h_r(t)dt/\left(\int_I h_r^2(t)dt\right)^{1/2}$
for $i=1,\dots,l$, $j=1,\dots,n_{i}$, $r=1,\dots,k$.
	 \item For each $r\in\{1,\dots,k\}$,  apply the appropriate ANOVA test for $P_{ij}^r$, $i=1,\dots,l$, $j=1,\dots,n_i$. Let $p_1,\dots,p_k$ denote the resulting $p$~values.
	 \item Compute the final $p$~value for $H_0$ by the formula $\inf\{kp_{(r)}/r,r=1,\dots,k\}$, where $p_{(1)}\leq\dots\leq p_{(k)}$ are the ordered $p$~values obtained in step 3.
\end{enumerate}
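The four steps above can be sketched in base \proglang{R} as follows. This is an illustration under assumptions of our own choosing: standard Brownian motion paths play the role of the Gaussian distribution in step 1 (one of several possible choices), the classical one-way ANOVA $F$~test is used in step 3, and all object names are illustrative.

```r
## Minimal sketch of the random projection procedure for FANOVA.
set.seed(3)
tt <- seq(0, 1, length.out = 100); dt <- tt[2] - tt[1]
n.i <- c(15, 15, 15); group <- factor(rep(1:3, n.i)); n <- sum(n.i)
## x: T x n matrix of discretized functions (all groups equal in mean)
x <- sapply(seq_len(n), function(j) cos(pi * tt) + rnorm(100, sd = 0.4))

k <- 30                                            # number of projections
p.values <- replicate(k, {
  ## Step 1: a Brownian motion path on the grid as the random direction
  h <- cumsum(rnorm(length(tt), sd = sqrt(dt)))
  ## Step 2: normalized projections P_ij (integrals as Riemann sums)
  P <- as.vector(crossprod(x, h)) * dt / sqrt(sum(h^2) * dt)
  ## Step 3: classical one-way ANOVA on the projected data
  anova(lm(P ~ group))[["Pr(>F)"]][1]
})
## Step 4: final p value, inf{ k * p_(r) / r }
p.final <- min(k * sort(p.values) / seq_len(k))
```

Note that the minimum in step 4 includes the term $kp_{(k)}/k=p_{(k)}\leq 1$, so the final $p$~value never exceeds one.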

The tests based on the above procedure are referred to as the tests based on random projections. \cite{CAFB2010} suggested using $k$ near 30, which was confirmed by the results of \cite{GoreckiSmaga2017}. However, in the case of unconvincing results of the test, a higher number of projections should be used. We also have to choose the Gaussian distribution and the ANOVA test appearing in steps 1 and 3 of the above procedure, respectively. This can be done in many ways, and some of them are implemented in the package \pkg{fdANOVA} (see Section \ref{sec3}). The final $p$~value in step 4 is based on the Benjamini and Hochberg procedure \citep{BenjaminiHochberg1995, BenjaminiYekutieli2001}; other corrections, such as the Bonferroni correction, could be used in step 4 instead. However, according to our experience, the test with the corrected $p$~value given in step 4 behaves best under finite samples, so we use only it. The procedure of \cite{CAFB2010} can also handle the heteroscedastic case; it only requires that such a testing procedure exists for the projected data.

\subsection{Multivariate case}
\label{sec22}
Now, we study the multivariate version of the ANOVA problem for functional data, as well as extensions of certain methods presented in the previous section to this problem. The results of this section were mainly obtained by \cite{GoreckiSmaga2017}.

Instead of single functions, we consider independent vectors of random functions $\mathbf{X}_{ij}(t)=(X_{ij1}(t),\dots,X_{ijp}(t))^{\top}\in SP_p(\vect{\mu}_i,\vect{\Gamma})$, $i=1,\dots,l,$ $j=1,\dots,n_{i}$ defined over the interval $I$, where $SP_{p}(\vect{\mu},\vect{\Gamma})$ is the set of $p$-dimensional stochastic processes with mean vector $\vect{\mu}(t)$, $t\in I$ and covariance function $\vect{\Gamma}(s, t)$, $s,t\in I$. In the multivariate analysis of variance problem for functional data (FMANOVA), we have to test the following null hypothesis:
\begin{equation}
\label{H0m}
H_0:\vect{\mu}_1(t)=\dots=\vect{\mu}_l(t),\ t\in I.
\end{equation}

The first (permutation) tests are based on a basis function representation of the components of the vectors $\mathbf{X}_{ij}(t)$, $i=1,\dots,l$, $j=1,\dots,n_i$, similarly as in the FP test. For this purpose, we assume that $\mathbf{X}_{ij}$ belong to $L_{2}^{p}(I)$ -- a Hilbert space of $p$-dimensional vectors of square integrable functions on the interval $I$, equipped with the inner product: $\langle\mathbf{x},\mathbf{y}\rangle=\int_I\mathbf{x}^{\top}(t)\mathbf{y}(t)dt$. Hence we can represent the components of $\mathbf{X}_{ij}(t)$ in a similar way as in (\ref{Xbfr}). Then the vectors are represented as follows
\begin{equation}
\label{XXbfr}
\mathbf{X}_{ij}(t)\approx\left(\begin{array}{c}
\mathbf{c}_{ij1}\\
\vdots\\
\mathbf{c}_{ijp}\\
\end{array}\right)\vect{\varphi}(t)=\mathbf{c}_{ij}\vect{\varphi}(t),
\end{equation}
where $\mathbf{c}_{ijm}=(c_{ijm1},\dots,c_{ijmK_{m}},0,\dots,0)\in R^{KM}$, $\vect{\varphi}(t)=(\varphi_1(t),\dots,\varphi_{KM}(t))^{\top},$ $t\in I$  and $i=1,\dots,l$, $j=1,\dots,n_{i}$, $m=1,\dots,p$, $KM=\max\{K_{1},\dots,K_{p}\}$. The coefficients in $\mathbf{c}_{ij}$ and values of $K_m$ are estimated separately for each feature by using the same methods as described in Section \ref{sec21}. Similarly to MANOVA \citep[see][]{Anderson2003}, the following matrices were used in constructing test statistics for FMANOVA problem:
\begin{flalign*}
\mathbf{E}&=\sum_{i=1}^l\sum_{j=1}^{n_{i}}\int_I \left(\mathbf{X}_{ij}(t)-\bar{\mathbf{X}}_{i}(t)\right)\left(\mathbf{X}_{ij}(t)-\bar{\mathbf{X}}_{i}(t)\right)^{\top} dt,\\
\mathbf{H}&=\sum_{i=1}^l n_{i}\int_I \left(\bar{\mathbf{X}}_{i}(t)-\bar{\mathbf{X}}(t)\right)\left(\bar{\mathbf{X}}_{i}(t)-\bar{\mathbf{X}}(t)\right)^{\top} dt,
\end{flalign*}
where $\bar{\mathbf{X}}_i(t)=(1/n_i)\sum_{j=1}^{n_i}\mathbf{X}_{ij}(t),$ $i=1,\dots,l$ and $\bar{\mathbf{X}}(t)=(1/n)\sum_{i=1}^l\sum_{j=1}^{n_i}\mathbf{X}_{ij}(t)$, $t\in I$. Modifying the results of \cite{GoreckiSmaga2017}, we showed that these matrices can be computed solely from the coefficient matrices $\mathbf{c}_{ij}$ and the appropriate cross product matrix, i.e., $\mathbf{E}\approx\mathbf{A}-\mathbf{B}$ and $\mathbf{H}\approx\mathbf{B}-\mathbf{C}$, where
\[\mathbf{A}=\sum_{i=1}^l\sum_{j=1}^{n_{i}}\mathbf{c}_{ij}\mathbf{J}_{\vect{\varphi}}\mathbf{c}_{ij}^{\top},\
\mathbf{B}=\sum_{i=1}^l\frac{1}{n_{i}}\sum_{j=1}^{n_{i}}\sum_{m=1}^{n_{i}}\mathbf{c}_{ij}\mathbf{J}_{\vect{\varphi}}\mathbf{c}_{im}^{\top},\ \mathbf{C}=\frac{1}{n}\sum_{i=1}^l\sum_{j=1}^{n_{i}}\sum_{t=1}^l\sum_{u=1}^{n_{t}}\mathbf{c}_{ij}\mathbf{J}_{\vect{\varphi}}\mathbf{c}_{tu}^{\top},\]
and $\mathbf{J}_{\vect{\varphi}}$ is the $KM\times KM$ cross product matrix corresponding to $\vect{\varphi}$. The following test statistics for (\ref{H0m}) are constructed based on those appearing in MANOVA \citep{Anderson2003}: Wilks' lambda $\mathrm{W}=\det(\mathbf{E})/\det(\mathbf{E}+\mathbf{H})$, the Lawley-Hotelling trace $\mathrm{LH}=\mathrm{trace}(\mathbf{H}\mathbf{E}^{-1})$, the Pillai trace $\mathrm{P}=\mathrm{trace}(\mathbf{H}(\mathbf{H}+\mathbf{E})^{-1})$ and Roy's maximum root $\mathrm{R}=\lambda_{\max}(\mathbf{H}\mathbf{E}^{-1})$, where $\lambda_{\max}(\mathbf{M})$ is the maximum eigenvalue of a matrix $\mathbf{M}$. We consider the permutation testing procedures based on these test statistics and refer to them as the W, LH, P and R tests, respectively. Generally, we refer to them as the permutation tests based on a basis function representation for the FMANOVA problem. A quite fast implementation of these permutation tests was obtained by observing that the matrices $\mathbf{A}$ and $\mathbf{C}$ are not changed by any permutation of the data.
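Given $\mathbf{E}$ and $\mathbf{H}$, the four test statistics are straightforward to compute. The following base \proglang{R} sketch (illustrative, not the package code) builds them for plain multivariate observations, omitting the integrals over $I$ for brevity, since the statistics themselves depend only on the two $p\times p$ matrices. With $l=2$ groups, $\mathbf{H}$ has rank one, so all four statistics are simple functions of a single eigenvalue, which provides a convenient sanity check.

```r
## Minimal sketch of the W, LH, P and R statistics from matrices E and H.
set.seed(4)
p <- 3; n.i <- c(20, 25); group <- rep(1:2, n.i)
X <- matrix(rnorm(sum(n.i) * p), ncol = p)       # n x p data matrix
m <- colMeans(X)                                 # grand mean vector
m.i <- rbind(colMeans(X[group == 1, ]),          # group mean vectors
             colMeans(X[group == 2, ]))
E <- crossprod(X - m.i[group, ])                 # within-group cross products
D <- sweep(m.i, 2, m)                            # group mean deviations
H <- crossprod(sqrt(n.i) * D)                    # sum_i n_i d_i d_i^T

W  <- det(E) / det(E + H)                        # Wilks' lambda
LH <- sum(diag(H %*% solve(E)))                  # Lawley-Hotelling trace
P  <- sum(diag(H %*% solve(H + E)))              # Pillai trace
R  <- max(Re(eigen(H %*% solve(E))$values))      # Roy's maximum root
```

For $l=2$, if $\lambda$ denotes the single nonzero eigenvalue of $\mathbf{H}\mathbf{E}^{-1}$, then $\mathrm{LH}=\mathrm{R}=\lambda$, $\mathrm{W}=1/(1+\lambda)$ and $\mathrm{P}=\lambda/(1+\lambda)$.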

The second group of testing procedures for (\ref{H0m}) is based on random projections similarly as in the FANOVA test based on random projections. Let a space $\mathcal{H}$ and a distribution $\xi$ be defined as in Section \ref{sec21}. Assume that $\mu_{ij}\in\mathcal{H}$, where $\mu_{ij}$ are the components of the vectors $\vect{\mu}_i$, $i=1,\dots,l$, $j=1,\dots,p$. If the null hypothesis in (\ref{H0m}) holds, then for every $\mathbf{H}=(h_1,\dots,h_p)^{\top}\in \mathcal{H}\times\dots\times\mathcal{H}$,
%$$H_0^{\mathbf{H}}:\left(\begin{array}{c}
%\langle\mu_{11},h_1\rangle\\
%\vdots\\
%\langle\mu_{1p},h_p\rangle\\
%\end{array}\right)=\dots=\left(\begin{array}{c}
%\langle\mu_{l1},h_1\rangle\\
%\vdots\\
%\langle\mu_{lp},h_p\rangle\\
%\end{array}\right)$$
\begin{equation}
\label{H0pm}
H_0^{\mathbf{H}}:
(\langle\mu_{11},h_1\rangle,\dots,\langle\mu_{1p},h_p\rangle)^{\top}=\dots=
(\langle\mu_{l1},h_1\rangle,\dots,
\langle\mu_{lp},h_p\rangle)^{\top}
\end{equation}
also holds; furthermore, when $H_0$ fails, $H_0^{\mathbf{H}}$ also fails for $(\xi\times\dots\times\xi)$-almost every $\mathbf{H}\in\mathcal{H}\times\dots\times\mathcal{H}$ \citep[see Theorem 2.2 in][]{GoreckiSmaga2017}. Thus, a test for the MANOVA problem can be used to test the FMANOVA one by applying it to the null hypothesis (\ref{H0pm}). For this reason, \cite{GoreckiSmaga2017} investigated a testing procedure based on $k$ random projections, similar to that described in Section \ref{sec21}, whose first three steps are now as follows:
\begin{enumerate}
    \item Choose, according to the Gaussian distribution $\xi$, functions $h_{mr}\in L_2(I)$, $m=1,\dots,p$, $r=1,\dots,k$.
	 \item Compute the projections $P_{ijm}^r=\int_I X_{ijm}(t)h_{mr}(t)dt/\left(\int_I h_{mr}^2(t)dt\right)^{1/2}$
for $i=1,\dots,l$, $j=1,\dots,n_{i}$, $m=1,\dots,p$, $r=1,\dots,k$.
	 \item For each $r\in\{1,\dots,k\}$, apply the appropriate MANOVA test for $\mathbf{P}_{ij}^r=(P_{ij1}^r,\dots,P_{ijp}^r)^{\top}$, $i=1,\dots,l$, $j=1,\dots,n_i$. Let $p_1,\dots,p_k$ denote the resulting $p$~values.
\end{enumerate}
In step 3 of this procedure, the well-known MANOVA tests were applied, namely Wilks' lambda test (the Wp test), the Lawley-Hotelling trace test (the LHp test), the Pillai trace test (the Pp test) and Roy's maximum root test (the Rp test). Their permutation versions are also investigated.
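A sketch of this multivariate projection procedure for $p=2$ features can be written in base \proglang{R} using the \code{manova()} function with Wilks' lambda in step 3. As before, this is an illustration under our own assumptions (Brownian motion paths as the random directions, illustrative data and object names), not the package implementation.

```r
## Minimal sketch of the projection tests for FMANOVA with p = 2 features.
set.seed(5)
tt <- seq(0, 1, length.out = 60); dt <- tt[2] - tt[1]
n.i <- c(12, 12, 12); group <- factor(rep(1:3, n.i)); n <- sum(n.i)
## One T x n matrix of discretized curves per feature (equal group means)
x1 <- sapply(seq_len(n), function(j) sin(pi * tt) + rnorm(60, sd = 0.3))
x2 <- sapply(seq_len(n), function(j) tt^2 + rnorm(60, sd = 0.3))

project <- function(x) {                         # steps 1-2 for one feature
  h <- cumsum(rnorm(length(tt), sd = sqrt(dt)))  # Brownian motion path
  as.vector(crossprod(x, h)) * dt / sqrt(sum(h^2) * dt)
}
k <- 30
p.values <- replicate(k, {
  P <- cbind(project(x1), project(x2))           # n x p projected data
  fit <- manova(P ~ group)                       # step 3: MANOVA
  summary(fit, test = "Wilks")$stats[1, "Pr(>F)"]
})
p.final <- min(k * sort(p.values) / seq_len(k))  # final corrected p value
```

Replacing \code{test = "Wilks"} by \code{"Hotelling-Lawley"}, \code{"Pillai"} or \code{"Roy"} in \code{summary()} yields the analogues of the LHp, Pp and Rp tests in this sketch.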

The extensive Monte Carlo simulation studies of \cite{GoreckiSmaga2017} show that the finite-sample performance of all tests considered, except the Rp test, is very satisfactory. Unfortunately, the Rp test does not control the nominal type-I error level, and hence it cannot be recommended. The other testing procedures do not perform equally well, and there is no single method performing best.

\section[R implementation]{\proglang{R}~implementation}
\label{sec3}
In this section, we present the \proglang{R}~package \pkg{fdANOVA} and illustrate its usage step by step on a real data set. First, however, we discuss the possible preparation of functional data in \proglang{R} needed to use the functions of our package properly.

\subsection{Preparation of the data}
\label{sec31}
In practice, functional samples are not continuously observed, i.e., each function is usually observed on a grid of design time points. In our implementations of the FANOVA and FMANOVA tests in the \proglang{R}~programming language, all functions are observed on a common grid of design time points equally spaced in the interval $I=[a,b]$. In the case where the design time points are different for different individual functions or not equally spaced in $I$, we follow the methodology proposed by \cite{Zhang2013}. First, we have to reconstruct the functional samples from the observed discrete functional samples using a smoothing technique such as regression splines, smoothing splines, P-splines or local polynomial smoothing \citep[see][Chapters 2-3]{Zhang2013}. For this purpose, in \proglang{R} we can use the function \code{smooth.spline()} from the \pkg{stats} package \citep{RCoreTeam2017} or functions given in the packages \pkg{splines} \citep{RCoreTeam2017}, \pkg{bigsplines} \citep{Helwig2016}, \pkg{pspline} \citep{pspline} and \pkg{locpol} \citep{OjedaCabrera2012}. After that, we discretize each individual function of the reconstructed functional samples on a common grid of $\mathcal{T}$ time points equally spaced in $I$, and then the implementations of the tests can be applied to the discretized samples.
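For example, the reconstruction and discretization steps can be sketched as follows using only the \pkg{stats} package (illustrative data: two curves observed at different, irregular time points, smoothed with \code{smooth.spline()} and then evaluated on a common equally spaced grid).

```r
## Minimal sketch: reconstruct curves observed on irregular grids and
## discretize them on a common grid of T = 50 equally spaced points in [0, 1].
set.seed(6)
t1 <- sort(runif(80)); y1 <- sin(2 * pi * t1) + rnorm(80, sd = 0.2)
t2 <- sort(runif(65)); y2 <- cos(2 * pi * t2) + rnorm(65, sd = 0.2)

grid <- seq(0, 1, length.out = 50)               # common grid of T points
x <- cbind(predict(smooth.spline(t1, y1), grid)$y,
           predict(smooth.spline(t2, y2), grid)$y)
## x is now a T x n matrix whose columns are discretized smoothed curves,
## in the format expected by the argument 'x' of fanova.tests()
```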

\subsection{Package functionality}
\label{sec32}
Now, we describe the implementation of the tests for the analysis of variance problem for univariate and multivariate functional data in the \proglang{R}~package \pkg{fdANOVA}. As we will see, many of the implemented tests may be performed with different values of their parameters. However, according to the simulation and real data examples presented in the previous papers (see Section~\ref{sec2}), satisfactory results are usually obtained by using the default values of these parameters. Nevertheless, when the results are unconvincing (e.g., the $p$~values are close to the significance level), we have the opportunity to use other options provided by the functions of the package.

All tests for FANOVA problem presented in Section \ref{sec21} are implemented in the function \code{fanova.tests()}:
\begin{CodeChunk}
\begin{CodeInput}
R> library("fdANOVA")
R> str(fanova.tests)
\end{CodeInput}
\begin{CodeOutput}
function(x = NULL, group.label, test = "ALL", params = NULL,
         parallel = FALSE, nslaves = NULL)
\end{CodeOutput}
\end{CodeChunk}
which is controlled by the following parameters:
\begin{itemize}
\item \code{x} -- a $\mathcal{T}\times n$ matrix of data, whose columns are discretized versions of functions and whose rows correspond to design time points. Its default value is \code{NULL}, since if only the FP test is used, we can give a basis representation of the data instead of raw observations (see the list \code{paramFP} below). For any of the other testing procedures, the raw data are needed.

\item \code{group.label} -- a vector containing group labels.

\item \code{test} -- an indicator establishing the choice of FANOVA tests to be performed. Its default value means that all testing procedures of Section \ref{sec21} will be used. When we want to use only some of the tests, the parameter \code{test} is an appropriate subvector of the following vector of test labels (see Section \ref{sec21}):
\begin{Code}
c("FP", "CH", "CS", "L2N", "L2B", "L2b", "FN", "FB", "Fb", "GPF",
  "Fmaxb", "TRP")
\end{Code}

\item \code{params} -- a list of additional parameters for the FP, CH, CS, L$^2$b, Fb, Fmaxb tests and the test based on random projections. Its possible elements and their default values are described below. The default value of this parameter means that these tests are performed with their default values.

\item \code{parallel} -- a logical indicating whether to use parallelization.

\item \code{nslaves} -- if \code{parallel = TRUE}, the number of slave processes. Its default value means that it will be equal to the number of logical processors of the computer used.
\end{itemize}

The list \code{params} can contain all or a part of the elements \code{paramFP}, \code{paramCH}, \code{paramCS}, \code{paramL2b}, \code{paramFb}, \code{paramFmaxb} and \code{paramTRP} for passing the parameters for the FP, CH, CS, L$^2$b, Fb, Fmaxb tests and the test based on random projections, respectively, to the function \code{fanova.tests()}. They are described as follows. The list
\begin{Code}
paramFP = list(int, B.FP = 1000, basis = c("Fourier", "b-spline", "own"),
               own.basis, own.cross.prod.mat,
               criterion = c("BIC", "eBIC", "AIC", "AICc", "NO"),
               commonK = c("mode", "min", "max", "mean"),
               minK = NULL, maxK = NULL, norder = 4, gamma.eBIC = 0.5)
\end{Code}
contains the parameters of the FP test and their default values, where:
\begin{itemize}
\item \code{int} -- a vector of two elements representing the interval $I=[a,b]$. When it is not specified, it is determined by the number of design time points.

\item \code{B.FP} -- the number of permutation replicates.

\item \code{basis} -- a choice of basis of functions used in the basis function representation of the data.

\item \code{own.basis} -- if \code{basis = "own"}, a $K\times n$ matrix with columns containing the coefficients of the basis function representation of the observations.

\item \code{own.cross.prod.mat} -- if \code{basis = "own"}, a $K\times K$ cross product matrix corresponding to a basis used to obtain the matrix \code{own.basis}.

\item \code{criterion} -- a choice of information criterion for selecting the optimum value of $K$. \code{criterion = "NO"} means that $K$ is equal to the parameter \code{maxK} defined below. By (\ref{Xbfr}), we have
$\text{\code{BIC}}(X_{ij})=\mathcal{T}\log(\mathbf{e}_{ij}^{\top}\mathbf{e}_{ij}/\mathcal{T})+K\log\mathcal{T}$, $\text{\code{eBIC}}(X_{ij})=\mathcal{T}\log(\mathbf{e}_{ij}^{\top}\mathbf{e}_{ij}/\mathcal{T})+K[\log\mathcal{T}+2\gamma\log(K_{\max})]$, $\text{\code{AIC}}(X_{ij})=\mathcal{T}\log(\mathbf{e}_{ij}^{\top}\mathbf{e}_{ij}/\mathcal{T})+2K$ and $\text{\code{AICc}}(X_{ij})=\text{\code{AIC}}(X_{ij})+2K(K + 1)/(n-K-1)$, where $\mathbf{e}_{ij}=(e_{ij1},\dots,e_{ij\mathcal{T}})^{\top}$, $e_{ijr}=X_{ij}(t_r)-\sum_{m=1}^K\hat{c}_{ijm}\varphi_m(t_r)$, $t_1,\dots,t_{\mathcal{T}}$ are the design time points, $\gamma\in[0,1]$, $K_{\max}$ is a maximum $K$ considered and $\log$ denotes the natural logarithm.

\item \code{commonK} -- a choice of method for selecting the common value for all observations from the values of $K$ corresponding to all processes.

\item \code{minK} (resp. \code{maxK}) -- a minimum (resp. maximum) value of $K$. When \code{basis = "Fourier"}, they have to be odd numbers. If $\text{\code{minK}}=\text{\code{NULL}}$ and $\text{\code{maxK}}=\text{\code{NULL}}$, we take $\text{\code{minK}}=3$ and \code{maxK} equal to the largest odd number smaller than the number of design time points. If \code{maxK} is greater than or equal to the number of design time points, \code{maxK} is taken as above. For \code{basis = "b-spline"}, \code{minK} (resp. \code{maxK}) has to be greater (resp. smaller) than or equal to \code{norder} defined below (resp. $\mathcal{T}$). If $\text{\code{minK}}=\text{\code{NULL}}$ or $\text{\code{minK}}<\text{\code{norder}}$ and $\text{\code{maxK}}=\text{\code{NULL}}$ or $\text{\code{maxK}}>\mathcal{T}$, then we take $\text{\code{minK}}=\text{\code{norder}}$ and $\text{\code{maxK}}=\mathcal{T}$.

\item \code{norder} -- if \code{basis = "b-spline"}, an integer specifying the order of b-splines.

\item \code{gamma.eBIC} -- a $\gamma\in[0,1]$ parameter in the eBIC.
\end{itemize}
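For concreteness, the four information criteria above can be sketched in base \proglang{R} for a single discretized observation. The function \code{fp.criteria()} below is a hypothetical helper (it is not part of \pkg{fdANOVA}, which computes the criteria internally); it takes the discretized curve, a $\mathcal{T}\times K$ matrix of basis function values at the design time points, $K_{\max}$, $n$ and $\gamma$:

```r
# Hypothetical helper (not part of fdANOVA): information criteria of the
# FP test for a single discretized curve x, given a TT x K matrix Phi of
# basis function values evaluated at the design time points.
fp.criteria <- function(x, Phi, K.max, n, gamma = 0.5) {
  TT <- length(x)
  K <- ncol(Phi)
  c.hat <- qr.solve(Phi, x)            # least squares coefficients c_ijm
  rss <- sum((x - Phi %*% c.hat)^2)    # e_ij' e_ij
  AIC <- TT * log(rss / TT) + 2 * K
  c(BIC  = TT * log(rss / TT) + K * log(TT),
    eBIC = TT * log(rss / TT) + K * (log(TT) + 2 * gamma * log(K.max)),
    AIC  = AIC,
    AICc = AIC + 2 * K * (K + 1) / (n - K - 1))
}
```

For instance, with a Fourier-type basis of size $K=3$, the function returns a named vector of the four criteria values for the given curve.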

It should be noted that the AICc may choose the final model with a number $K$ of coefficients close to the number of observations $n$, when $K_{\max}$ is greater than $n$. Such a selection usually differs from the choices suggested by the other criteria, but it seems that this does not have much impact on the testing results.

For the CH and CS (resp. L$^2$b, Fb and Fmaxb) tests, the parameters \code{paramCH} and \code{paramCS} (resp. \code{paramL2b}, \code{paramFb} and \code{paramFmaxb}) denote the numbers of discretized artificial trajectories for certain Gaussian processes (resp. bootstrap samples) used to approximate the null distributions of their test statistics. The default value of each of these parameters is 10,000. The parameters of the test based on random projections and their default values are contained in a list
\begin{Code}
paramTRP = list(k = 30, projection = c("GAUSS", "BM"),
                permutation = FALSE, B.TRP = 10000,
                independent.projection.tests = TRUE)
\end{Code}
where:
\begin{itemize}
\item \code{k} -- a vector of numbers of projections.

\item \code{projection} -- a method of generating Gaussian processes in step 1 of the testing procedure based on random projections presented in Section \ref{sec21}. If \code{projection = "GAUSS"}, Gaussian white noise is generated as in the function \code{anova.RPm()} from the \proglang{R}~package \pkg{fda.usc} \citep{FBOF2012}. If \code{projection = "BM"}, the Brownian motion is generated.

\item \code{permutation} -- a logical indicating whether to compute $p$~values by permutation method.

\item \code{B.TRP} -- the number of permutation replicates.

\item \code{independent.projection.tests} -- a logical indicating whether to generate the random projections independently or dependently for different elements of vector \code{k}. In the first case, the random projections for each element of vector \code{k} are generated separately, while in the second one, they are generated as chained subsets, e.g., for \code{k = c(5, 10)}, the first 5 projections are a subset of the last 10. The second way of generating random projections is faster than the first one.
\end{itemize}

A Brownian motion on $[a,b]$ has small variance near $a$ and larger variance close to $b$. This means that the tests based on random projections and the Brownian motion may be better able to detect differences among the group mean functions when those differences are located much closer to $b$ than to $a$.
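This behavior can be illustrated by a minimal base \proglang{R} sketch of such a projection, in which the Brownian motion trajectory is simulated by cumulative sums and the inner products $\langle X_i, z\rangle$ are approximated by Riemann sums (an illustrative construction; the internal generator of the package may differ in details):

```r
# Illustrative sketch (not fdANOVA internals): projecting discretized
# curves onto a simulated Brownian motion trajectory on [0, 1].
set.seed(1)
TT <- 100; n <- 10
tt <- seq(0, 1, length.out = TT)
x <- replicate(n, sin(2 * pi * tt) + rnorm(TT, sd = 0.1))  # TT x n data
z <- cumsum(rnorm(TT, sd = sqrt(1 / TT)))  # Brownian motion, Var z(t) = t
proj <- as.vector(t(x) %*% z / TT)         # Riemann sums <X_i, z>
```

Since the variance of $z(t)$ grows linearly in $t$, the behavior of the curves near the right end of the interval dominates the projections.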

To perform step 3 of the procedure based on random projections given in Section \ref{sec21}, in the package, we use five testing procedures: the standard (\code{paramTRP$permutation = FALSE}) and permutation (\code{paramTRP$permutation = TRUE}) tests based on ANOVA F~test statistic and ANOVA-type statistic (ATS) proposed by \cite{Brunneretal1997}, as well as the testing procedure based on Wald-type permutation statistic (WTPS) of \cite{Paulyetal2015}.
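As an illustration of the permutation variant, the ANOVA F~test statistic can be recomputed over random relabelings of the projected (scalar) observations. A base \proglang{R} sketch for a single projection direction and synthetic two-group data (the ATS and WTPS variants proceed analogously with different statistics):

```r
# Sketch of the permutation test based on the ANOVA F test statistic,
# applied to scalar projections of the curves (synthetic data).
set.seed(1)
proj <- c(rnorm(10), rnorm(10, mean = 1))  # projected observations
g <- factor(rep(1:2, each = 10))           # group labels
F.stat <- function(y, lab) summary(aov(y ~ lab))[[1]][["F value"]][1]
F.obs <- F.stat(proj, g)                   # observed F statistic
F.perm <- replicate(999, F.stat(proj, sample(g)))
p.value <- mean(c(F.perm, F.obs) >= F.obs) # permutation p-value
```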

The function \code{fanova.tests()} returns a list of class \code{fanovatests}. This list contains the values of the test statistics, the $p$~values and the parameters used. The results for a given test are given in a list (being an element of the output list) named the same as the indicator of that test in the vector \code{test}. Additional outputs, such as the chosen optimal length of basis expansion (the FP test), the values of the estimators used in the approximations of the null distributions of the test statistics (the L$^2$N, L$^2$B, FN, FB, GPF tests) and the projections of the data (the tests based on random projections), are contained in appropriate lists. If \code{independent.projection.tests = TRUE}, the projections of the data are contained in a list of length equal to the length of the vector \code{k}, whose $i$th element is an $n\times \text{\code{k[i]}}$ matrix with columns being projections of the data. When \code{independent.projection.tests = FALSE}, the projections of the data are contained in a single $n\times \max(\text{\code{k}})$ matrix with columns being projections of the data.

The permutation tests based on a basis function representation for FMANOVA problem, i.e., the W, LH, P and R tests are implemented in the function \code{fmanova.ptbfr()}:
\begin{CodeChunk}
\begin{CodeInput}
R> library("fdANOVA")
R> str(fmanova.ptbfr)
\end{CodeInput}
\begin{CodeOutput}
function(x = NULL, group.label, int, B = 1000,
         parallel = FALSE, nslaves = NULL,
         basis = c("Fourier", "b-spline", "own"),
         own.basis, own.cross.prod.mat,
         criterion = c("BIC", "eBIC", "AIC", "AICc", "NO"),
         commonK = c("mode", "min", "max", "mean"),
         minK = NULL, maxK = NULL, norder = 4, gamma.eBIC = 0.5)
\end{CodeOutput}
\end{CodeChunk}
The parameters \code{group.label}, \code{int}, \code{B}, \code{parallel}, \code{nslaves}, \code{basis}, \code{norder} and \code{gamma.eBIC} are the same as in the function \code{fanova.tests()} (\code{B} corresponds to \code{B.FP}). The other arguments of \code{fmanova.ptbfr()} are described as follows:
\begin{itemize}
\item \code{x} -- a list of $\mathcal{T}\times n$ matrices of data, whose columns are discretized versions of functions and whose rows correspond to design time points. The $m$th element of this list contains the data for the $m$th feature, $m=1,\dots,p$. Its default value is \code{NULL}, because a basis representation of the data can be given instead of the raw observations (see the parameter \code{own.basis} below).

\item \code{own.basis} -- if \code{basis = "own"}, a list of length $p$, whose elements are $K_m\times n$ matrices ($m=1,\dots,p$) with columns containing the coefficients of the basis function representation of the observations.

\item \code{own.cross.prod.mat} -- if \code{basis = "own"}, a $KM\times KM$ cross product matrix corresponding to a basis used to obtain the list \code{own.basis}.

\item \code{criterion} -- a choice of information criterion for selecting the optimum value of $K_m$, $m=1,\dots,p$. \code{criterion = "NO"} means that $K_m$ are equal to the parameter \code{maxK} defined below.

\item \code{commonK} -- a choice of method for selecting the common value for all observations from the values of $K_{m}$ corresponding to all processes.

\item \code{minK} (resp. \code{maxK}) -- a minimum (resp. maximum) value of $K_m$. Further remarks about these arguments are the same as for the function \code{fanova.tests()}.
\end{itemize}
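The expected structure of \code{own.basis} and \code{own.cross.prod.mat} can be mimicked in base \proglang{R}. The sketch below builds, for $p=2$ features, the list of $K\times n$ coefficient matrices by least squares in an orthonormal Fourier-type basis of common size $K$ (an illustrative construction only; in practice, the \pkg{fda} package provides such basis representations):

```r
# Illustrative construction of own.basis (a list of K x n coefficient
# matrices, one per feature) and of the cross product matrix for an
# orthonormal Fourier-type basis of common size K = 3 (not fdANOVA code).
set.seed(1)
TT <- 50; n <- 8; K <- 3
tt <- seq(0, 1, length.out = TT)
Phi <- cbind(1, sqrt(2) * sin(2 * pi * tt), sqrt(2) * cos(2 * pi * tt))
x <- list(matrix(rnorm(TT * n), TT, n),   # feature 1
          matrix(rnorm(TT * n), TT, n))   # feature 2
own.basis <- lapply(x, function(xm) qr.solve(Phi, xm))  # K x n matrices
own.cross.prod.mat <- diag(K)  # identity for an orthonormal basis
```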

The function \code{fmanova.ptbfr()} returns a list of class \code{fmanovaptbfr} containing the values of the test statistics (\code{W}, \code{LH}, \code{P}, \code{R}), the $p$~values (\code{pvalueW}, \code{pvalueLH}, \code{pvalueP}, \code{pvalueR}), the chosen optimal values of $K_m$ and the parameters used. This function uses the \proglang{R}~package \pkg{fda} \citep{Ramsayetal2014}.

The function \code{fmanova.trp()} performs the testing procedures based on random projections for FMANOVA problem (the Wp, LHp, Pp and Rp tests):
\begin{CodeChunk}
\begin{CodeInput}
R> library("fdANOVA")
R> str(fmanova.trp)
\end{CodeInput}
\begin{CodeOutput}
function(x, group.label, k = 30, projection = c("GAUSS", "BM"),
         permutation = FALSE, B = 1000,
         independent.projection.tests = TRUE,
         parallel = FALSE, nslaves = NULL)
\end{CodeOutput}
\end{CodeChunk}
The first two parameters of this function as well as the arguments \code{parallel}, \code{nslaves} are the same as in the function \code{fmanova.ptbfr()}. The other ones have the same meaning as in the parameter list \code{paramTRP} of the function \code{fanova.tests()} (\code{B} corresponds to \code{B.TRP}). The function \code{fmanova.trp()} returns a list of class \code{fmanovatrp} containing the parameters and the following elements ($|\text{\code{k}}|$ denotes the length of vector \code{k}): \code{pvalues} -- a $4\times |\text{\code{k}}|$ matrix of $p$~values of the tests; \code{data.projections} -- if \code{independent.projection.tests = TRUE}, a list of length $|\text{\code{k}}|$, whose elements are lists of $n\times p$ matrices of projections of the observations, while when \code{independent.projection.tests = FALSE}, a list of length $\max(\text{\code{k}})$, whose elements are $n\times p$ matrices of projections of the observations.

Selecting the optimum length of the basis expansion by an information criterion and executing the bootstrap, permutation and projection loops are the most time-consuming steps of the testing procedures under consideration. To reduce their computational cost, these steps are parallelized when the parameter \code{parallel} is set to \code{TRUE}. The parallel execution is handled by the \pkg{doParallel} package \citep{RASW2015}.
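The idea behind parallelizing such loops can be sketched with the base \pkg{parallel} package (for illustration only; \pkg{fdANOVA} itself delegates this work to \pkg{doParallel} and \pkg{foreach}):

```r
# Illustration of parallelizing a permutation loop over slave processes
# with the base parallel package (fdANOVA uses doParallel/foreach instead).
library(parallel)
set.seed(1)
x <- c(rnorm(10), rnorm(10, mean = 2))  # synthetic two-group data
g <- rep(1:2, each = 10)
stat <- function(lab) abs(mean(x[lab == 1]) - mean(x[lab == 2]))
cl <- makeCluster(2)                    # corresponds to nslaves = 2
clusterExport(cl, c("x", "g", "stat"))
perm <- parSapply(cl, 1:500, function(i) stat(sample(g)))
stopCluster(cl)
p.value <- mean(c(perm, stat(g)) >= stat(g))
```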

A number of auxiliary functions are also contained in the package. The $p$~values of the tests based on random projections for the FANOVA problem are plotted against the number of projections by the function \code{plot.fanovatests()} using the package \pkg{ggplot2} \citep{Wickham2009}, which is controlled by the following parameters: \code{x} -- a \code{fanovatests} object, more precisely, a result of the function \code{fanova.tests()} for the standard tests based on random projections; \code{y} -- a \code{fanovatests} object, more precisely, a result of the function \code{fanova.tests()} for the permutation tests based on random projections. Similarly, the $p$~values of the Wp, LHp, Pp and Rp tests are plotted by the function \code{plot.fmanovatrp()}. The arguments of this function are as follows: \code{x} -- a \code{fmanovatrp} object, more precisely, a result of the function \code{fmanova.trp()} for the standard tests; \code{y} -- a \code{fmanovatrp} object, more precisely, a result of the function \code{fmanova.trp()} for the permutation tests; \code{withoutRoy} -- a logical indicating whether to omit the $p$~values of the Rp test from the plot. We can use only one of the arguments \code{x} and \code{y}, or both simultaneously.

Using the package \pkg{ggplot2} \citep{Wickham2009}, the function \code{plotFANOVA()}:
\begin{CodeChunk}
\begin{CodeInput}
R> library("fdANOVA")
R> str(plotFANOVA)
\end{CodeInput}
\begin{CodeOutput}
function(x, group.label = NULL, int = NULL, separately = FALSE,
         means = FALSE, smooth = FALSE, ...)
\end{CodeOutput}
\end{CodeChunk}
produces a plot showing univariate functional observations with or without indication of groups as well as mean functions of samples. The following parameters control this function:
\begin{itemize}
\item \code{x} -- a $\mathcal{T}\times n$ matrix of data, whose columns are discretized versions of functions and whose rows correspond to design time points.

\item \code{group.label} -- a character vector containing group labels. Its default value means that all functional observations are drawn without division into groups.

\item \code{int} -- this parameter is the same as in the function \code{fanova.tests()}.

\item \code{separately} -- a logical indicating how groups are drawn. If \code{separately = FALSE}, groups are drawn on one plot by different colors. When \code{separately = TRUE}, they are depicted in different panels.

\item \code{means} -- a logical indicating whether to plot only group mean functions.

\item \code{smooth} -- a logical indicating whether to plot reconstructed data via smoothing splines instead of raw data.
\end{itemize}
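The reconstruction behind \code{smooth = TRUE} can be sketched with base \proglang{R}'s \code{smooth.spline()} (the internal smoother of \code{plotFANOVA()} may use different settings):

```r
# Sketch of smoothing-spline reconstruction of discretized curves, as in
# smooth = TRUE (base R smooth.spline; plotFANOVA's settings may differ).
set.seed(1)
TT <- 50
tt <- seq(0, 1, length.out = TT)
x <- cbind(sin(2 * pi * tt), cos(2 * pi * tt)) + rnorm(2 * TT, sd = 0.2)
x.smooth <- apply(x, 2, function(xi) fitted(smooth.spline(tt, xi)))
```

Each column of \code{x.smooth} contains the reconstructed values of the corresponding curve at the design time points.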

The functions \code{print.fanovatests()}, \code{print.fmanovaptbfr()} and \code{print.fmanovatrp()} print out the $p$~values and values of the test statistics for the implemented testing procedures. Additionally, the functions \code{summary.fanovatests()}, \code{summary.fmanovaptbfr()} and \code{summary.fmanovatrp()} print out information about the data and parameters of the methods.

When calling the functions of the \pkg{fdANOVA} package, the software will check for the presence of the \pkg{doBy}, \pkg{doParallel}, \pkg{ggplot2}, \pkg{fda}, \pkg{foreach}, \pkg{magic}, \pkg{MASS} and \pkg{parallel} packages if necessary \citep{doBy2016, RASW2015, Wickham2009, Ramsayetal2014, magic2005, MASS2002}. If the required packages are not installed, an error message will be displayed.

It is worth mentioning that fifteen labeled multivariate functional data sets are available in the \pkg{mfds} package \citep{GoreckiSmaga2017mfds}, which is a kind of supplement to the \pkg{fdANOVA} package. Table~\ref{Tab1} presents brief information about these data sets, i.e., the numbers of variables, design time points, groups and observations. The data sets were created from the multivariate time series data available in \cite{Chenetal2015}, \cite{Leebetal2007}, \cite{Lichman2013} and \cite{Olszewski2001} by extending all variables to the same length as in \cite{Rodriguezetal2005}. They originate from different domains, including handwriting recognition, medicine and robotics. The data sets can be used for illustrating and evaluating the practical efficiency of classification and statistical inference methods \citep[see, for example,][]{GoreckiSmaga2015,GoreckiSmaga2017}.

\begin{table}
\begin{center}
\begin{tabular}{lcccc}
\hline
Data sets & \#Variables & \#Time points & \#Groups & \#Observations \\
\hline
Arabic digits & 13 & 93 & 10 & 8800 \\
Australian language & 22 & 136 & 95 & 2565 \\
Character trajectories & 3 & 205 & 20 & 2858 \\
ECG & 2 & 152 & 2 & 200 \\
Graz & 3 & 1152 & 2 & 140 \\
Japanese vowels & 12 & 29 & 9 & 640 \\
Libras & 2 & 45 & 15 & 360 \\
Pen digits & 2 & 8 & 10 & 10992 \\
Robot failure LP1 & 6 & 15 & 4 & 88 \\
Robot failure LP2 & 6 & 15 & 5 & 47 \\
Robot failure LP3 & 6 & 15 & 4 & 47 \\
Robot failure LP4 & 6 & 15 & 3 & 117 \\
Robot failure LP5 & 6 & 15 & 5 & 164 \\
uWaveGestureLibrary & 3 & 315 & 8 & 4478 \\
Wafer & 6 & 198 & 2 & 1194 \\\hline
\end{tabular}
\end{center}
\caption{Brief information about data sets available in the \pkg{mfds} package \citep{GoreckiSmaga2017mfds}.}
\label{Tab1}
\end{table}

\subsection{Package demonstration on real data example}
\label{sec33}
In this section, we provide examples that illustrate how the functions of the \proglang{R}~package \pkg{fdANOVA} can be used to analyze real data. For this purpose, we use the popular \textit{gait} data set available in the \pkg{fda} package. This data set consists of the angles formed by the hip and knee of each of 39 children over each child's gait cycle. The simultaneous variations of the hip and knee angles for children are observed at 20 equally spaced time points in $[0.025, 0.975]$. So, in this data set, we have two functional features, which we put in the list \code{x.gait} of length two, as presented below.
\begin{CodeChunk}
\begin{CodeInput}
R> library("fda")
R> gait.data.frame <- as.data.frame(gait)
R> x.gait <- vector("list", 2)
R> x.gait[[1]] <- as.matrix(gait.data.frame[, 1:39])
R> x.gait[[2]] <- as.matrix(gait.data.frame[, 40:78])
\end{CodeInput}
\end{CodeChunk}
Similarly to \cite{GoreckiSmaga2017}, for illustrative purposes, the functional observations are divided into three samples. Namely, the first sample consists of the functions for the first 13 children, the second sample of the functions for the next 13 children, and the third sample of the functions for the remaining children. The sample labels are contained in the vector \code{group.label.gait}:
\begin{CodeChunk}
\begin{CodeInput}
R> group.label.gait <- rep(1:3, each = 13)
\end{CodeInput}
\end{CodeChunk}
We can plot the functional data by using the function \code{plotFANOVA()}. For example, we plot the observations for the first functional feature without (Figure~\ref{Fig1} (a)) and with indication of the samples (Figure~\ref{Fig1} (b) and (c)) as well as the group mean functions (Figure~\ref{Fig1} (d)).
\begin{CodeChunk}
\begin{CodeInput}
R> library("fdANOVA")
R> plotFANOVA(x = x.gait[[1]], int = c(0.025, 0.975))
R> plotFANOVA(x = x.gait[[1]], group.label = as.character(group.label.gait),
+             int = c(0.025, 0.975))
R> plotFANOVA(x = x.gait[[1]], group.label = as.character(group.label.gait),
+             int = c(0.025, 0.975), separately = TRUE)
R> plotFANOVA(x = x.gait[[1]], group.label = as.character(group.label.gait),
+             int = c(0.025, 0.975), means = TRUE)
\end{CodeInput}
\end{CodeChunk}

\begin{figure}[!t]
\includegraphics[width=0.99\textwidth,
height=0.45\textwidth]{Fig1.pdf}
		\caption{The first functional feature of the \textit{gait} data without (Panel (a)) and with indication of the samples (Panels (b) and (c)). Panel (d) depicts the group mean functions.}
	\label{Fig1}
\end{figure}

From Figure~\ref{Fig1}, it seems that the mean functions of the three samples do not differ significantly. To confirm this statistically, we use the FANOVA tests implemented in the \code{fanova.tests()} function. First, we use the default values of the parameters of this function:
\begin{CodeChunk}
\begin{CodeInput}
R> set.seed(123)
R> (fanova <- fanova.tests(x = x.gait[[1]], group.label = group.label.gait))
\end{CodeInput}
\begin{CodeOutput}
     Analysis of Variance for Functional Data

FP test - permutation test based on a basis function representation
Test statistic = 1.468218  p-value = 0.198

CH test - L2-norm-based parametric bootstrap test for homoscedastic samples
Test statistic = 7911.385  p-value = 0.2247

CS test - L2-norm-based parametric bootstrap test for heteroscedastic samples
Test statistic = 7911.385  p-value = 0.1944

L2N test - L2-norm-based test with naive method of estimation
Test statistic = 2637.128  p-value = 0.2106562

L2B test - L2-norm-based test with bias-reduced method of estimation
Test statistic = 2637.128  p-value = 0.1957646

L2b test - L2-norm-based bootstrap test
Test statistic = 2637.128  p-value = 0.2169

FN test - F-type test with naive method of estimation
Test statistic = 1.46698  p-value = 0.2226683

FB test - F-type test with bias-reduced method of estimation
Test statistic = 1.46698  p-value = 0.2198691

Fb test - F-type bootstrap test
Test statistic = 1.46698  p-value = 0.2704

GPF test - globalizing the pointwise F-test
Test statistic = 1.363179  p-value = 0.2691363

Fmaxb test - Fmax bootstrap test
Test statistic = 3.752671  p-value = 0.1815

TRP - tests based on k = 30 random projections
p-value ANOVA = 0.4026718
p-value ATS   = 0.3422311
p-value WTPS  = 0.509
\end{CodeOutput}
\end{CodeChunk}
Besides the $p$~values displayed above, the list of matrices of projections of the data may be of practical interest for users of the tests based on random projections. The reason for this is that we can check the assumptions of the tests used in step 3 of the procedure based on random projections (see Section~\ref{sec21}), e.g., the normality assumption of the ANOVA F~test. Such an inspection may result in choosing the appropriate test for this step. This is especially important when the tests based on random projections differ in their decisions.
\begin{CodeChunk}
\begin{CodeInput}
R> fanova$TRP$data.projections
\end{CodeInput}
\begin{CodeOutput}
[[1]]
          [,1]     [,2]     [,3]      [,4]          [,30]
[1,]  56.27204 42.95853 4.717162  2.128967 ...  -6.055347
[2,]  59.79175 47.34407 4.081016  5.029328 ... -15.867838
[3,]  59.56066 50.29226 5.867918  4.284960 ... -17.816042
...
[39,] 82.65391 65.15959 12.971629 7.695403 ...  -7.070380
\end{CodeOutput}
\end{CodeChunk}
As expected, none of the FANOVA tests rejects the null hypothesis. Now, we show how particular tests can be chosen and how the parameters of these tests can be changed. For the FP test, we use a predefined basis function representation of the data. For this purpose, we expand the data in the b-spline basis by using the functions from the \pkg{fda} package. They return the coefficients of the expansion as well as the cross product matrix corresponding to the basis functions. For control, we choose the GPF test, which does not need any additional parameters. The Fmaxb test is performed with 1000 bootstrap samples. For the tests based on random projections, 10 and 15 projections are generated by using the Brownian motion, and the $p$~values are computed by the permutation method.
\begin{CodeChunk}
\begin{CodeInput}
R> fbasis <- create.bspline.basis(rangeval = c(0.025, 0.975), 19, norder = 4)
R> own.basis <- Data2fd(seq(0.025, 0.975, len = 20), x.gait[[1]], fbasis)$coefs
R> own.cross.prod.mat <- inprod(fbasis, fbasis)
R> set.seed(123)
R> fanova.tests(x.gait[[1]], group.label.gait,
+               test = c("FP", "GPF", "Fmaxb", "TRP"),
+               params = list(paramFP = list(B.FP = 1000, basis = "own",
+                                            own.basis = own.basis,
+                                            own.cross.prod.mat =
+                                               own.cross.prod.mat),
+                             paramFmaxb = 1000,
+                             paramTRP = list(k = c(10, 15),
+                                             projection = "BM",
+                                             permutation = TRUE,
+                                             B.TRP = 1000)))
\end{CodeInput}
\begin{CodeOutput}
     Analysis of Variance for Functional Data

FP test - permutation test based on a basis function representation
Test statistic = 1.468105  p-value = 0.193

GPF test - globalizing the pointwise F-test
Test statistic = 1.363179  p-value = 0.2691363

Fmaxb test - Fmax bootstrap test
Test statistic = 3.752671  p-value = 0.177

TRP - tests based on k = 10 random projections
p-value ANOVA = 0.3583333
p-value ATS   = 0.3871429
p-value WTPS  = 0.465

TRP - tests based on k = 15 random projections
p-value ANOVA = 0.504
p-value ATS   = 0.507
p-value WTPS  = 0.345
\end{CodeOutput}
\end{CodeChunk}
The above examples concern only the first functional feature of the \textit{gait} data set. A similar analysis can be performed for the second one. However, both features can be investigated simultaneously by using the FMANOVA tests described in Section~\ref{sec22}. First, we consider the permutation tests based on a basis function representation implemented in the function \code{fmanova.ptbfr()}. We apply this function to the whole data set, specifying non-default values of most parameters. Here, we also show how to use the function \code{summary.fmanovaptbfr()} to additionally obtain a summary of the data and the test parameters. Observe that the results are consistent with those obtained by the FANOVA tests.
\begin{CodeChunk}
\begin{CodeInput}
R> set.seed(123)
R> fmanova <- fmanova.ptbfr(x.gait, group.label.gait, int = c(0.025, 0.975),
+                           B = 5000, basis = "b-spline", criterion = "eBIC",
+                           commonK = "mean", minK = 5, maxK = 20, norder = 4,
+                           gamma.eBIC = 0.7)
R> summary(fmanova)
\end{CodeInput}
\begin{CodeOutput}
 FMANOVA - Permutation Tests based on a Basis Function Representation

 Data summary

Number of observations = 39
Number of features = 2
Number of time points = 20
Number of groups = 3
Group labels: 1 2 3
Group sizes: 13 13 13
Range of data = [0.025 , 0.975]

 Testing results

W  = 0.9077424   p-value = 0.5322
LH = 0.1003732   p-value = 0.5286
P  = 0.09340229  p-value = 0.5366
R  = 0.08565056  p-value = 0.3852

 Parameters of test

Number of permutations = 5000
Basis: b-spline (norder = 4)
Criterion: eBIC (gamma.eBIC = 0.7)
CommonK: mean
Km = 20 20 KM = 20 minK = 5 maxK = 20
\end{CodeOutput}
\end{CodeChunk}
Finally, we apply the tests based on random projections for the FMANOVA problem in the \textit{gait} data set. In the following, these tests are performed with $k=1,5,10,15,20$ projections, standard and permutation methods as well as the random projections generated in independent and dependent ways. The resulting $p$~values are visualized by the \code{plot.fmanovatrp()} function:
\begin{CodeChunk}
\begin{CodeInput}
R> set.seed(123)
R> fmanova1 <- fmanova.trp(x.gait, group.label.gait, k = c(1, 5, 10, 15, 20))
R> fmanova2 <- fmanova.trp(x.gait, group.label.gait, k = c(1, 5, 10, 15, 20),
+                          permutation = TRUE)
R> plot(x = fmanova1, y = fmanova2)
R> plot(x = fmanova1, y = fmanova2, withoutRoy = TRUE)
R> set.seed(123)
R> fmanova3 <- fmanova.trp(x.gait, group.label.gait, k = c(1, 5, 10, 15, 20),
+                          independent.projection.tests = FALSE)
R> fmanova4 <- fmanova.trp(x.gait, group.label.gait, k = c(1, 5, 10, 15, 20),
+                          permutation = TRUE,
+                          independent.projection.tests = FALSE)
R> plot(x = fmanova3, y = fmanova4)
R> plot(x = fmanova3, y = fmanova4, withoutRoy = TRUE)
\end{CodeInput}
\end{CodeChunk}

\begin{figure}[!t]
\includegraphics[width=0.99\textwidth,
height=0.55\textwidth]{Fig2.pdf}
		\caption{P~values of the tests based on random projections for the FMANOVA problem in the \textit{gait} data set. The Wpp, LHpp, Ppp and Rpp tests are the permutation versions of the Wp, LHp, Pp and Rp tests. In the first (resp. second) row, the random projections were generated in independent (resp. dependent) way.}
		\label{Fig2}
\end{figure}

The obtained plots are shown in Figure \ref{Fig2}. As we can observe, except for the standard Rp test, all testing procedures behave similarly and do not reject the null hypothesis. The standard Rp test does not keep the pre-assigned type-I error rate, as \cite{GoreckiSmaga2017} showed in simulations. More precisely, this test is usually too liberal, which explains why its $p$~values are much smaller than those of the other testing procedures. That is why the function has an option not to plot the $p$~values of this test.

\newpage

\bibliography{fdANOVA}
%\bibliographystyle{jss}

  \end{document}
