1 Introduction

This interesting paper builds on the authors' long and outstanding record in robust statistics and, more specifically, in robust functional data analysis. We congratulate Mia Hubert, Peter Rousseeuw and Pieter Segaert for this important contribution.

As the authors point out, most of the literature to date on functional outlier detection deals with univariate functional data (one curve observed per individual). This work considers the case of p-variate functional data (p curves observed per individual). The paper carefully discusses the problem of outlier detection in this setting and starts by establishing a classification of the different outlying behaviours. Then, several p-variate functional depths and distance measures are defined by integrating over time their existing or newly defined multivariate pointwise counterparts. Finally, by combining these measures, several graphical diagnostic tools are proposed.

We would like to contribute to the discussion by focusing on two aspects. Firstly, we will compare the proposed taxonomy of functional outliers with the classification currently adopted in the literature. Secondly, we will comment on the differences between the proposed collection of methods and the outliergram (Arribas-Gil and Romo 2014), a recent procedure to detect shape outliers, and we will compare the two approaches in several examples.

2 Taxonomy of functional outliers

The proposed taxonomy of functional outliers (see Fig. 1, top) is mainly based on the way curves are generated. If we assume that the observed curves are i.i.d. realizations of a common process \(\{Y(t), t\in U\}\) on \(\mathbb {R}^p\) with expectation f(t) characterizing the shape of the curves, outliers can be thought of as curves drawn from a different process with expectation \(g(t)\ne f(t)\), and the relation between g and f generates the classification proposed in the article. This provides a detailed description of functional outlying behaviour and differs significantly from the usual classification in the functional data analysis literature, which is mostly based on how outliers can be detected. This is one of the interesting contributions of the paper.

Most authors consider as magnitude outliers those curves lying outside the range of the vast majority of the data [see, e.g., Hyndman and Shang (2010)]. That is, a magnitude outlier is generally thought of as a curve that visually stands out from the rest of the sample, either because it lies persistently outside the range of the remaining curves (vertically shifted or amplified with respect to a common pattern) or because it has one or several spikes or bumps at places where the rest of the curves do not. These are typically the curves that would be detected as outliers by the functional boxplot of Sun and Genton (2011). However, in the proposed taxonomy only some of the curves of the first kind (those amplified from a common pattern) are considered magnitude outliers, whereas curves of the second kind are defined as isolated outliers. While we acknowledge that the distinction between isolated and persistent outliers is very interesting, we think it could perhaps be made at a second level, as suggested in Fig. 1 (bottom).

Fig. 1
figure 1

Proposed taxonomy (top) and usual taxonomy (bottom)

With respect to the notion of shift outliers, defined by the authors as curves with the same shape as the majority but moved away from it, it would perhaps be convenient to clarify whether this includes both vertical and horizontal shifts or only vertical shifts (all the examples given in the article consider only vertical shift outliers). There could be some overlap with previous concepts, since most authors would consider vertical shift outliers to be magnitude outliers (the ones that are easily identified by visual inspection of the data and can be detected with the functional boxplot). As for horizontal shift outliers, one should distinguish two cases. In the first, horizontal shifting does not carry significant information about the process of interest: typically all the curves present horizontal variation, this variation can be removed by time warping, and no outliers remain after registration. In the second, only a few curves present horizontal variation with respect to the rest, this variation is understood as an important difference in behaviour from one individual to another, and these horizontal shift outliers can be considered a particular case of shape outliers.

3 Some differences with the outliergram

The graphical diagnostic tools presented in the article are the halfspace heatmap, the bagdistance heatmap, the skew-adjusted projection heatmap, the adjusted outlyingness heatmap and the centrality-stability plot. The heatmaps provide a graphical representation of the data as a coloured matrix where rows represent curves and columns represent time points. Rows are sorted according to the functional depth or distance value of each curve, and cells are coloured according to the depth or distance value of the curve indicated by the row, evaluated at the time point indicated by the column. For univariate functional data, these heatmaps can be seen as an alternative to a representation of the data matrix containing the curves in which the cells are coloured according to the values of the functions and the rows are sorted according to some ranking [for instance the one provided by the functional epigraph index, which ranks the curves of a sample from bottom to top (López-Pintado and Romo 2011)]. Such a representation for the octane data set is presented in Fig. 2. In the p-variate functional case, however, these heatmaps summarize in a single display the information on the p dimensions that would otherwise have to be obtained by inspecting and combining p separate data matrices. They thus provide a valuable way of visually inspecting the sample.
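To fix ideas, the heatmap construction just described can be sketched for univariate curves as follows. This is our own illustrative implementation, not the authors' code: we use the univariate halfspace (Tukey) depth of each curve's value within the cross-section at each time point as a stand-in for the depths used in the paper, and all function names are ours.

```python
import numpy as np

def pointwise_depth(curves):
    # univariate halfspace (Tukey) depth of each curve's value, at each
    # time point, within the cross-section of the sample of n curves
    n = curves.shape[0]
    below = (curves[None, :, :] <= curves[:, None, :]).sum(axis=1)
    above = (curves[None, :, :] >= curves[:, None, :]).sum(axis=1)
    return np.minimum(below, above) / n          # shape (n, T)

def depth_heatmap(curves):
    # heatmap matrix: rows are curves sorted from deepest (top) to most
    # outlying (bottom), columns are time points, cell values are depths
    d = pointwise_depth(curves)
    order = np.argsort(-d.mean(axis=1))          # deepest curve first
    return d[order], order
```

Passing the resulting matrix to any image plotter (e.g. `matplotlib.pyplot.imshow`) then gives a heatmap in the spirit of the ones in the article, with outlying curves concentrated in the bottom rows.
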

Fig. 2
figure 2

Left octane data set. Right matrix data for the octane data set where rows (curves) are sorted by the epigraph index and cells are coloured according to the curve values (color figure online)

The centrality-stability plot is somewhat similar to the outliergram, since it provides a two-dimensional representation of functional data based on the combination of two depth-related measures and exploits their relationship to characterize curves whose outlyingness changes over time, which includes what we understand by shape outliers.

The essential difference between the tools defined in the article and the outliergram is that the latter is only defined for univariate functional data. However, one can make the comparison either in the case \(p=1\) or by applying the outliergram to the marginals, as is done in the article. For instance, in the writing data set, applying the outliergram to the horizontal and vertical coordinates yields the results presented in Fig. 3. We can see that the outliers detected are different from those identified with the centrality-stability plot in the article, except for curve 41, which is detected by both techniques. The remaining ones correspond to letters i that are atypically open (80), atypically closed and curvy (146), atypically short (39) or atypically rounded (77). The fact that the two methods yield different results suggests that they are complementary.
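As a reminder of how the outliergram works: it plots, for each curve, the modified band depth (MBD) against the modified epigraph index (MEI). For curves that do not cross any other curve in the sample, MBD is an exact quadratic function of MEI, so the distance of each point below that parabola measures how much the curve crosses the others, i.e., its shape outlyingness. The sketch below is our own discretized implementation of this idea (not the authors' code), with the parabola coefficients as we recall them from Arribas-Gil and Romo (2014).

```python
import numpy as np

def mei(curves):
    # modified epigraph index: average over time of the fraction of
    # sample curves lying at or above each curve (self included)
    return (curves[None, :, :] >= curves[:, None, :]).mean(axis=(1, 2))

def mbd(curves):
    # modified band depth, with bands defined by all pairs of curves
    n = curves.shape[0]
    depth = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            lo = np.minimum(curves[i], curves[j])
            hi = np.maximum(curves[i], curves[j])
            depth += ((curves >= lo) & (curves <= hi)).mean(axis=1)
    return depth / (n * (n - 1) / 2)

def outliergram_gap(curves):
    # distance of each curve below the MBD-MEI parabola; large values
    # indicate shape outliers (curves that cross many others)
    n = curves.shape[0]
    e, d = mei(curves), mbd(curves)
    a0 = a2 = -2.0 / (n * (n - 1))
    a1 = 2.0 * (n + 1) / (n - 1)
    return a0 + a1 * e + a2 * n**2 * e**2 - d
```

In the outliergram, curves whose gap exceeds a (possibly adaptive, as in the FDR-based rule used in our figures) cutoff are flagged as shape outliers.
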

Fig. 3
figure 3

Left writing data set with outliers in color. Right outliergrams on the horizontal and vertical coordinates (the adaptive detection rule has been applied with \({ FDR}=0.02\), see (Arribas-Gil and Romo 2014) for details) (color figure online)

Another difference is that the outliergram provides an outlier detection rule on top of the graphical display. Although visual inspection of different representations of the data and selection of potential outliers by hand is preferable in many cases, a rule is convenient for very large data sets or when automation is required. Regarding the heatmaps, a straightforward rule could be to flag as outliers all curves whose value of the corresponding functional depth or distance falls below or above a certain threshold. In the case of the centrality-stability plot, some kind of detection rule could also be useful. In this sense, the centrality-stability plot and outliergram results on the tablets data set presented in Section 5.2 of Hubert et al. (2015) are of a different nature: for the outliergram the solution is the one provided by the automatic detection rule, whereas the outliers in the centrality-stability plot are selected by hand after visual inspection. If we take a closer look at the outliergram on the baseline-corrected spectra and their derivatives (Fig. 4), we can see that the outlying curves appear at the bottom of the plot, and if we adjust the detection rule (estimating the cutoff from the data) we can separate them from the rest of the sample. Of course, the separation is not as clear as the one obtained with the centrality-stability plot, since here we only consider one source of information at a time, namely the information contained in the marginals. Moreover, we do not exploit all the marginals, since we lose the information on the levels of the functions (analysing the baseline functions with the outliergram would not provide any conclusions, since they are all parallel and thus carry no shape-outlying information). Also, the outliergram is not sensitive to small shape variations, such as the roughness of curves 71–90 around wavelength 8170. That is why the outliergram on the baseline-corrected spectra provides a worse separation than the one on the derivatives (where these small variations are amplified). However, what can be seen as a drawback in this particular example makes the outliergram robust to the presence of noise and to smoothing of the data.
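A data-driven threshold of the kind mentioned above could, for instance, take a boxplot-type form: flag a curve when its outlyingness score (a functional distance, or the distance below the outliergram parabola) exceeds the third quartile of the scores by some multiple of the interquartile range. This is only one generic possibility, sketched here for illustration; it is neither the authors' rule nor the adaptive FDR-based rule used in our figures.

```python
import numpy as np

def boxplot_cutoff_flags(scores, factor=1.5):
    # flag observations whose outlyingness score exceeds Q3 + factor * IQR;
    # 'scores' should be larger for more outlying curves
    q1, q3 = np.percentile(scores, [25, 75])
    return scores > q3 + factor * (q3 - q1)
```

Such a rule is crude but automatic, and the factor could itself be calibrated from the data, in the spirit of the adjusted cutoffs used throughout the robust statistics literature.
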

Fig. 4
figure 4

Left outliergram of the baseline-corrected spectra of the tablets data set. Right outliergram of the derivatives of the tablets data set (the adaptive detection rule has been applied with \({ FDR}=0.15\), see Arribas-Gil and Romo (2014) for details)