Introduction

The study of evolutionary and developmental changes in morphology has undergone a significant renovation over the last two decades, particularly due to the use of methods based on Cartesian coordinates of points or landmarks, which capture the geometric information of phenotypic structures (Rohlf and Marcus 1993; Adams et al. 2004; Mitteroecker and Gunz 2009). Because a large proportion of phenotypic variation among individuals and/or species involves spatial changes in specific anatomical points, landmark-based analyses are shedding light on morphometric aspects not previously addressed (Zelditch et al. 2004; Mitteroecker et al. 2005; Hallgrimsson and Lieberman 2008).

To compare a set of specimens and extract information about the spatial relationships among their landmarks, a common reference frame or coordinate system must first be established. A central issue in coordinate-based methods is therefore which criterion should be used to align these configurations of points, since the shape differences found among specimens depend strongly on those alignments (Richtsmeier et al. 2002; Perez et al. 2006; Theobald and Wuttke 2006; Catalano and Goloboff 2012). An optimal superimposition exhibiting shape differences can only be achieved once differences due to translation, scaling and rotation (or reflection) have been filtered out, because these transformations do not alter shape. The least-squares (LS) or classical Procrustes superimposition has become the preferred alignment criterion within the morphometric synthesis (Bookstein 1996; Adams et al. 2004, 2013; Zelditch et al. 2004; Mitteroecker and Gunz 2009). This method minimizes the sum of squared Euclidean distances between the Cartesian coordinates of corresponding landmarks after superimposition, and the magnitude of shape differences between any two configurations of landmarks is estimated from this sum of squared Euclidean distances, traditionally named Procrustes distance. Procrustes distances are afterwards analyzed by statistical techniques that preserve their geometric properties, such as PCA or multivariate regression (Adams et al. 2004; Zelditch et al. 2004; Mitteroecker and Gunz 2009), and shape changes are finally depicted as relative shifts of landmark positions, or by thin-plate spline interpolating functions through deformation grids and 3D morphing (Mitteroecker and Gunz 2009; Klingenberg 2013).
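As a point of reference for the resistant method developed below, the LS superimposition of one configuration onto another can be sketched in a few lines of Python with NumPy. This is a minimal sketch, assuming the row-vector convention ρ X₁R + t used later in the text and the standard singular value decomposition solution of the orthogonal Procrustes problem; the function name `ls_procrustes` and its `allow_reflection` flag are illustrative choices, not part of any published toolbox.

```python
import numpy as np

def ls_procrustes(X1, X2, allow_reflection=True):
    """Least-squares fit of X1 onto X2 as rho * X1 @ R + t (row-vector convention).

    Returns the fitted copy of X1 and the residual sum of squared distances.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    c1, c2 = X1.mean(axis=0), X2.mean(axis=0)
    A, B = X1 - c1, X2 - c2                  # remove translation
    U, s, Vt = np.linalg.svd(A.T @ B)        # A'B = U diag(s) V'
    if not allow_reflection and np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1                       # flip to obtain a proper rotation (det = +1)
        s[-1] *= -1
    R = U @ Vt                               # LS-optimal orthogonal matrix
    rho = s.sum() / (A ** 2).sum()           # LS-optimal isotropic scale
    t = c2 - rho * c1 @ R                    # translation term
    fitted = rho * X1 @ R + t
    return fitted, float(((fitted - X2) ** 2).sum())

# Usage: recover a known similarity transformation.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(8, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                            # make Q a proper rotation
X2 = 2.0 * X1 @ Q + np.array([1.0, -2.0, 0.5])
fitted, ss = ls_procrustes(X1, X2)
```

Because every residual enters squared, a single badly matching landmark pulls the whole fit towards it, which is the behaviour discussed next.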

Despite being the most commonly used method, LS Procrustes alignment has some limitations, particularly when shape differences are not homogeneous across landmarks (Slice 1996; Richtsmeier et al. 2002; Theobald and Wuttke 2006; Van der Linde and Houle 2009). In such cases the resulting superimposition can be misleading: because the LS Procrustes optimal matching averages the overall lack of fit (Rohlf and Slice 1990), relatively large differences in only a few landmarks tend to be spread over all of them. As a consequence, relative shape changes of different parts of a structure or an organism may not be accurately depicted and/or measured, and hence may remain unrevealed (Fig. 1).

Fig. 1

LS (a) and resistant fit (b) Procrustes superimpositions of the 8 vertices (landmarks) of a standard cube and of a similar cube obtained by distorting the upper face, moving the two top posterior landmarks downwards. Non-squared absolute residuals for each landmark following superimposition are shown in (c).

Although several alternatives have been proposed to overcome this drawback (Rohlf 1990; Rohlf and Slice 1990; Zelditch et al. 2004; Van der Linde and Houle 2009), the robust or resistant Procrustes fit (Siegel and Benson 1982) is probably the most elegant and efficient among them. The resistant alignment between two configurations of landmarks is intuitively achieved by perfectly superimposing the unchanged landmarks. True shape differences are then clearly exhibited through the lack of fit of the remaining landmarks and their correspondingly large residuals (Fig. 1). This parsimonious matching strategy is a realization of the statistical property of resistance: whenever a few points in a data set deviate from the trend of shape change followed by most of the points, a resistant procedure does not allow them to have a major influence on the resulting fit. It is worth mentioning that whenever more than half of the landmarks change, the notion of change itself (its measurement and/or recognition) becomes ambiguous. The same strategy is also followed in other biological areas, such as the alignment of DNA sequences (Lemey et al. 2009). The resistant fit (RF) uses a repeated-medians calculation to estimate the alignment parameters, so that the best possible superimposition for most of the landmarks is obtained, unaffected by large displacements in just a few points. This desirable property comes from the fact that if fewer than half of the values in a set vary, their nested median remains the same, or resists. Accordingly, shape differences resulting from a resistant superimposition typically reflect localized differences among configurations more accurately. This facilitates the biological understanding and interpretation of phenotypic variation in many situations (Siegel and Benson 1982; Slice 1996).
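The resistance of the repeated median can be illustrated with a short Python sketch. The hypothetical helper `repeated_median` computes med_i(med_{j≠i} E_ij) for an n × n array of pairwise estimates, which is the form the RF applies to each alignment parameter; in the toy example every pair involving one of three "moved" landmarks yields an arbitrary value, while all other pairs agree on the true value.

```python
import numpy as np

def repeated_median(E):
    """Repeated median med_i( med_{j != i} E[i, j] ) of an n x n array of pairwise estimates.

    The diagonal (i == j) is ignored.
    """
    E = np.asarray(E, dtype=float)
    n = E.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    inner = [np.median(E[i, off_diagonal[i]]) for i in range(n)]
    return float(np.median(inner))

# Toy example: 10 landmarks, 3 of which were "moved"; every pairwise estimate that
# involves a moved landmark is arbitrary, all others agree on the true value 2.0.
rng = np.random.default_rng(1)
n, true_value, moved = 10, 2.0, [0, 4, 7]
E = np.full((n, n), true_value)
E[moved, :] = rng.uniform(-50.0, 50.0, size=(len(moved), n))
E[:, moved] = rng.uniform(-50.0, 50.0, size=(n, len(moved)))

resistant = repeated_median(E)              # stays at the true value
naive = E[~np.eye(n, dtype=bool)].mean()    # plain mean, dragged by the moved landmarks
```

Each unmoved landmark agrees with more than half of its partners, so its inner median is exact, and more than half of the inner medians are exact, so the outer median is exact as well.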

In this article we first develop an alternative extension of the resistant Procrustes method to handle two configurations of three-dimensional landmarks (i.e., a new ordinary resistant Procrustes superimposition, ORP). This novel formulation avoids the skew-symmetric matrices and triplets of points used by Slice (1996) in a previous version to estimate the rotation matrix. Next, we extend this method to compare more than two configurations of landmarks (i.e., a new generalized resistant Procrustes superimposition, GRP), matching them all iteratively to a pivotal or resistant consensus configuration: the configuration whose landmarks are, respectively, the 3D spatial medians of the corresponding landmarks, which can be obtained with a well-known algorithm (Weiszfeld 1937). Third, following the resistant superimposition, we propose a coherent distance to estimate shape differences between configurations and discuss its properties. Finally, we use this distance as input for exploratory multivariate analyses based on both the resistant version of the universal Multidimensional Scaling (Agarwal et al. 2010; rMDS in the reference) and the non-metric Multidimensional Scaling (Taguchi and Oono 2004; nmMDS in the reference), leading to a coherent display of shape differences among configurations. We compare the ordinations generated by rMDS and nmMDS based on the proposed distance with the corresponding ordinations generated by the classical version of the universal Multidimensional Scaling (named fMDS in the reference) and by Principal Coordinates based on the Procrustes distance. Examples include both simulated and real data, the latter from phylogenetic and ontogenetic contexts.

Procrustes Superimposition Methods

Procrustes superimposition methods are based upon measurements of n homologous points, named landmarks, which are identified in every configuration being studied, generally in two or three dimensions. Let \( \mathbf{l}_{i}^{(k)} \) be the Cartesian coordinates of landmark i (i = 1, … , n) from configuration k (k = 1, 2), stored in row i of the n × p (p = 2 or 3) matrix \( \mathbf{X}_{k} \). A combination of an isotropic scale factor ρ > 0, a rotation/reflection matrix R and a (row) translation term t is sought such that the transformed landmarks from matrix \( \mathbf{X}_{1} \):

$$ \rho {\mathbf{l}}_{i}^{(1)} {\mathbf{R}}{ + }{\mathbf{t}} $$

achieve maximal agreement with the corresponding landmarks \( \mathbf{l}_{i}^{(2)} \) of matrix \( \mathbf{X}_{2} \). Which matrix remains fixed and which is transformed is an arbitrary choice; for simplicity, it is assumed in the following that \( \mathbf{X}_{2} \) remains fixed while \( \mathbf{X}_{1} \) is transformed. Both LS and resistant Procrustes superimpositions are special cases of orthogonal Procrustes analysis; as mentioned, this work focuses on the latter, and detailed descriptions of the former can be found in the literature (Gower 1970, 1975; Rohlf and Slice 1990).

Resistant Procrustes Superimposition in 3D

Ordinary Resistant Procrustes (ORP) Superimposition

The resistant Procrustes superimposition was originally stated for two-dimensional landmark configurations (Siegel and Benson 1982), but some steps of that procedure can be carried over to three dimensions with the proper adaptations: this is the case for the scale factor and translation term estimates. The estimation of the resistant rotation matrix, however, requires a slightly different approach. Following an unpublished paper by Siegel and Pinkerton, Slice (1996) suggested the use of skew-symmetric matrices to parameterize rotations as a preliminary step to applying repeated medians. Although it has since been widely used for 3D data, Slice’s formulation turns out to be rather cumbersome and computationally expensive, due to the number of initial estimates of the rotation matrix required before applying a triply repeated median to all of them. Moreover, no proof of the algorithm's validity is given (something Siegel and Benson carefully provided in their pioneering work), which leaves reasonable doubts about its average performance. Our new formulation, presented next (see also the Appendix), solves these two drawbacks. A LS Procrustes superimposition is typically assumed to have been performed beforehand, to account for any reflections needed and to facilitate the estimation of the rotation angle.

The Resistant Rotation

Consider two sets of homologous points in 3D. Unlike in 2D, the alignment of two points in 3D is not a well-defined problem from a mathematical point of view: there are infinitely many solutions, and triplets of points should instead be considered for a correct formulation. However, it is not possible in general to map the coordinates of three arbitrary landmarks of one 3D configuration exactly onto the corresponding landmarks of another configuration, even when rotation, scaling and translation are allowed. Overcoming this intrinsic limitation constitutes a major challenge when attempting to extend the resistant Procrustes method from 2D to 3D data.

Every unordered triplet of landmarks from a 3D configuration describes a planar triangle, and three pairs of corresponding landmarks will therefore describe two homologous triangles. From a geometrical point of view, the alignment of a pair of homologous triangles in 3D requires two conditions to be fulfilled:

  1.

    The planes containing the triangles must be parallel, and

  2.

    one pair of corresponding sides of the triangles must also be aligned.

To estimate the rotation matrix, Slice (1996) considered triplets of landmarks of the form:

$$ \left\{ {{\mathbf{l}}_{i}^{(k)} ,{\mathbf{l}}_{j}^{(k)} ,{\mathbf{l}}_{m}^{(k)} } \right\} \, \left( {k = \,1,2} \right) $$

which have three degrees of freedom and generate \( \left( {\begin{array}{*{20}c} {n} \\ 3 \\ \end{array} } \right) = \tfrac{{{n}({n} - 1)({n} - 2)}}{6} \) combinations during the estimation process. In contrast, triplets of the form:

$$ \left\{ {{\mathbf{l}}_{i}^{{(k)}} ,\,{\mathbf{l}}_{j}^{(k)} ,\,{\mathbf{0}}} \right\}({k} = 1,2) $$

are now chosen. They retain only two degrees of freedom, as in 2D, and involve significantly fewer combinations of landmarks, \( \left( {\begin{array}{*{20}c} n \\ 2 \\ \end{array} } \right) = \tfrac{{{n}({n} - 1)}}{2} \), which simplifies the formulation and saves computation time. Here \( \mathbf{0} \) denotes the zero vector, which geometrically corresponds to the origin of the 3D coordinate system.
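The saving can be made concrete with a two-line computation using Python's standard `math.comb`; the landmark count of 35 is taken from the simulated skull examples in this article, and `combination_counts` is a hypothetical helper name.

```python
from math import comb

def combination_counts(n):
    """Landmark combinations per estimation: Slice (1996) triplets vs. origin-anchored pairs."""
    return comb(n, 3), comb(n, 2)

triplets, pairs = combination_counts(35)   # 35 landmarks, as in the skull examples
```

For 35 landmarks this gives 6545 triplets against 595 pairs, an (n − 2)/3 = 11-fold reduction.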

The estimation of the rotation matrix R may now be approached. For every pair of homologous triplets:

$$ \left\{ {{\mathbf{l}}_{i}^{{(k)}} ,\,{\mathbf{l}}_{j}^{{(k)}} ,\,{\mathbf{0}}} \right\}(k = 1,2) $$

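describing homologous triangles whose sides are given by the vectors \( \mathbf{l}_{i}^{(k)} \), \( \mathbf{l}_{j}^{(k)} \) and \( \mathbf{l}_{j}^{(k)} - \mathbf{l}_{i}^{(k)} \), an initial estimate \( \mathbf{R}_{ij} \) of the rotation matrix is required to align the side \( \mathbf{l}_{j}^{(k)} - \mathbf{l}_{i}^{(k)} \) of both triangles through the equation: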

$$ \left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right){\mathbf{R}}_{ij} = \left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) $$
(1)

in view of condition 2 above. (Note that the vectors \( \mathbf{l}_{j}^{(k)} - \mathbf{l}_{i}^{(k)} \) must first be made of the same length, because rotation matrices preserve lengths. It is typical to normalize them to unit length, which is always possible for non-zero vectors.)

Assume for now that the vectors \( \mathbf{l}_{i}^{(k)} \) and \( \mathbf{l}_{j}^{(k)} \) are not collinear (i.e., they are linearly independent) in any of the configurations: the triangle they describe together with the origin therefore has non-null area, and the spatial orientation of that triangle, or of the plane containing it, can be determined simply by the orientation of a vector orthogonal to it. The cross product of two non-parallel vectors lying on a plane produces a vector orthogonal to both of them and to the plane; thus, any two of the triangle sides serve this purpose. The choice:

$$ \left( {{\mathbf{l}}_{j}^{{(k)}} - {\mathbf{l}}_{i}^{{(k)}} } \right) \times {\mathbf{l}}_{j}^{{(k)}} $$

(where × denotes the cross product of 3D vectors) will aid in solving the matrix equation for \( \mathbf{R}_{ij} \). Making two planes parallel is equivalent to, and easier than, making their corresponding normal vectors parallel. Thus, by condition 1 above, the initial rotation matrix estimate \( \mathbf{R}_{ij} \) should also satisfy the equation:

$$ \left[ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times {\mathbf{l}}_{j}^{(1)} } \right]{\mathbf{R}}_{ij} = \left[ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times {\mathbf{l}}_{j}^{(2)} } \right] , $$
(2)

where, reasoning as before, both cross-product vectors have been normalized to unit length. Finally, rotation matrices in 3D are compatible with the cross product (Gantmacher 1959), in the sense that they preserve orientation and satisfy:

$$ \left( {{\mathbf{u}} \times {\mathbf{v}}} \right){\mathbf{R}} = {\mathbf{uR}} \times {\mathbf{vR}} $$

for every pair of vectors u and v in 3D. Combining this property with Eqs. (1) and (2) above provides a third and final equation:

$$ \left\{ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times \left[ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times {\mathbf{l}}_{j}^{(1)} } \right]} \right\}{\mathbf{R}}_{ij} = \left\{ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times \left[ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times {\mathbf{l}}_{j}^{(2)} } \right]} \right\} $$
(3)

where again the vectors involved are assumed to be of unit length. All these conditions are summarized in the following matrix equation for the initial rotation matrix estimate \( \mathbf{R}_{ij} \):

$$ \left[ {\begin{array}{*{20}c} {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \\ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times {\mathbf{l}}_{j}^{(1)} } \\ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times \left[ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times {\mathbf{l}}_{j}^{(1)} } \right]} \\ \end{array} } \right]{\mathbf{R}}_{ij} = \left[ {\begin{array}{*{20}c} {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \\ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times {\mathbf{l}}_{j}^{(2)} } \\ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times \left[ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times {\mathbf{l}}_{j}^{(2)} } \right]} \\ \end{array} } \right] $$
(4)

where the matrices in brackets are orthogonal, since their rows are pairwise orthogonal and of unit length by construction (under the assumption that \( \mathbf{l}_{i}^{(k)} \) and \( \mathbf{l}_{j}^{(k)} \) are not collinear). Solving this equation is immediate, because the inverse of an orthogonal matrix is its transpose:

$$ {\mathbf{R}}_{ij} = \left[ {\begin{array}{*{20}c} {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \\ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times {\mathbf{l}}_{j}^{(1)} } \\ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times \left[ {\left( {{\mathbf{l}}_{j}^{(1)} - {\mathbf{l}}_{i}^{(1)} } \right) \times {\mathbf{l}}_{j}^{(1)} } \right]} \\ \end{array} } \right]^{\prime } \left[ {\begin{array}{*{20}c} {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \\ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times {\mathbf{l}}_{j}^{(2)} } \\ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times \left[ {\left( {{\mathbf{l}}_{j}^{(2)} - {\mathbf{l}}_{i}^{(2)} } \right) \times {\mathbf{l}}_{j}^{(2)} } \right]} \\ \end{array} } \right] $$
(5)

where “′” denotes the matrix transpose. In this way, an initial estimate \( \mathbf{R}_{ij} \) of the rotation matrix is obtained for every pair of landmarks i ≠ j, using the corresponding landmarks \( \mathbf{l}_{i}^{(k)} \) and \( \mathbf{l}_{j}^{(k)} \) (k = 1, 2).
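Equations (4) and (5) translate directly into code. The following Python/NumPy sketch builds the two orthonormal frames row by row and returns \( \mathbf{R}_{ij} \) as the transposed first frame times the second; it also applies the identity fallback that the text prescribes for linearly dependent pairs. The function names and the relative tolerance `rel_tol` used to decide (near-)collinearity are our own assumptions.

```python
import numpy as np

def _unit(v):
    """Scale a non-zero 3D vector to unit length."""
    return v / np.linalg.norm(v)

def frame(li, lj):
    """Rows of a bracketed matrix in Eq. (4): a, a x l_j and a x (a x l_j), all of unit length."""
    a = _unit(lj - li)
    b = _unit(np.cross(a, lj))
    c = _unit(np.cross(a, b))
    return np.vstack([a, b, c])

def initial_rotation(l1i, l1j, l2i, l2j, rel_tol=1e-12):
    """Initial estimate R_ij of Eq. (5) in row-vector convention (x -> x @ R).

    Returns the identity when l_i and l_j are (nearly) linearly dependent in either
    configuration, i.e. when one of the triangles {l_i, l_j, 0} collapses.
    """
    pairs = [(np.asarray(l1i, float), np.asarray(l1j, float)),
             (np.asarray(l2i, float), np.asarray(l2j, float))]
    for li, lj in pairs:
        if np.linalg.norm(np.cross(li, lj)) <= rel_tol * np.linalg.norm(li) * np.linalg.norm(lj):
            return np.eye(3)
    F1, F2 = frame(*pairs[0]), frame(*pairs[1])
    return F1.T @ F2            # F1 is orthogonal, so its inverse is its transpose

# Usage: when configuration 2 is an exact rotation of configuration 1, R_ij recovers it.
rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                          # proper rotation
l1i, l1j = rng.normal(size=3), rng.normal(size=3)
R = initial_rotation(l1i, l1j, l1i @ Q, l1j @ Q)
```

Both frames have determinant +1 by construction, so \( \mathbf{R}_{ij} \) is always a proper rotation; reflections must be handled by the preliminary LS fit.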

According to results from matrix theory (Gantmacher 1959), every 3D rotation matrix can be expressed in a simple, canonical way by means of a unit (row) eigenvector v and an angle θ. The eigenvector v gives the direction of the rotation axis, a straight line left fixed by the rotation, while the angle θ measures the amount of circular displacement around the axis, in radians. A correspondence between 3D rotation matrices R and pairs of the form (v, θ) thus arises naturally. Moreover, this link can be made unique by selecting a fixed range for the angle θ and establishing some criterion to pick v between the two opposite unit vectors lying along the same straight line in three dimensions. The range [−π, π] is a natural choice for θ if a preliminary LS Procrustes superimposition has been performed, and the uniqueness of v can be achieved if, for instance, its first coefficient is chosen to be positive. This bijective relationship is denoted \( \mathbf{R}_{(\mathbf{v},\theta)} \) in the following. Once the initial rotation matrix estimate \( \mathbf{R}_{ij} \) is obtained, the associated rotation axis \( \mathbf{v}_{ij} \) and rotation angle \( \theta_{ij} \) can be computed by standard matrix decomposition techniques. The repeated median is then applied componentwise to all these vectors:

$$ \mathop {\text{med} }\limits_{i} \,(\mathop {\text{med} }\limits_{{{j} \ne {i}}} \, {\mathbf{v}}_{ij} ) = \widetilde{{\mathbf{v}}} $$
(6)

and to their corresponding angles:

$$ \mathop {\text{med} }\limits_{i} \,(\mathop {\text{med} }\limits_{{{j} \ne {i}}} \, \theta_{ij} ) = \,\tilde{\theta } $$
(7)

producing axis and angle final estimates \( \left( {{\tilde{\mathbf{v}}}, \, \tilde{\theta }} \right) \) on which the final rotation matrix estimate \( {\tilde{\mathbf{R}}}_{{\left( {{\tilde{\mathbf{v}}}, \, \tilde{\theta }} \right)}} = {\tilde{\mathbf{R}}} \) is based.
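The conversion between a rotation matrix and its (v, θ) pair, together with the componentwise repeated medians of Eqs. (6) and (7), can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the row-vector convention of Eq. (5), reads the sign of θ by the right-hand rule about v, picks the representative with a positive first coefficient as suggested in the text (ties with v[0] = 0 are left unresolved), and renormalizes the componentwise median axis to unit length before rebuilding the matrix, a step the text does not specify. All function names are hypothetical.

```python
import numpy as np

def from_axis_angle(v, theta):
    """Row-convention rotation (x -> x @ R) by angle theta about axis v (right-hand rule)."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])            # K @ x equals the cross product v x x
    Q = np.cos(theta) * np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * np.outer(v, v)
    return Q.T                                    # column-convention Q -> row-convention R

def to_axis_angle(R, tol=1e-12):
    """(v, theta) with theta in [-pi, pi] and v[0] >= 0; the identity maps to ([1, 0, 0], 0)."""
    Q = np.asarray(R, dtype=float).T
    cos_t = np.clip((np.trace(Q) - 1.0) / 2.0, -1.0, 1.0)
    w = np.array([Q[2, 1] - Q[1, 2], Q[0, 2] - Q[2, 0], Q[1, 0] - Q[0, 1]])  # 2 sin(theta) v
    if cos_t > 1.0 - tol and np.linalg.norm(w) < tol:
        return np.array([1.0, 0.0, 0.0]), 0.0      # no rotation: default axis, zero angle
    vals, vecs = np.linalg.eig(Q)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])   # axis: eigenvector for eigenvalue 1
    v = v / np.linalg.norm(v)
    if v[0] < 0:
        v = -v                                    # (v, theta) and (-v, -theta) are the same rotation
    return v, float(np.arctan2(0.5 * v @ w, cos_t))

def resistant_rotation(R_pairs):
    """Repeated medians of axes (Eq. 6) and angles (Eq. 7) of R_pairs[i][j], i != j."""
    n = len(R_pairs)
    inner_v, inner_t = [], []
    for i in range(n):
        aa = [to_axis_angle(R_pairs[i][j]) for j in range(n) if j != i]
        inner_v.append(np.median([a[0] for a in aa], axis=0))   # componentwise
        inner_t.append(np.median([a[1] for a in aa]))
    v = np.median(inner_v, axis=0)
    theta = float(np.median(inner_t))
    return from_axis_angle(v, theta)              # renormalizes v (our assumption)
```

If more than half of the landmarks follow one rigid rotation, most \( \mathbf{R}_{ij} \) coincide and the repeated medians return that rotation, whatever the remaining estimates are.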

If the assumption of linear independence fails for a pair of landmarks \( \mathbf{l}_{i}^{(k)} \) and \( \mathbf{l}_{j}^{(k)} \) in any of the configurations, two situations may arise: the vectors may be collinear in one configuration but not in the other, or collinear in both. In either case, the collapsed area of at least one of the two triangles described by the triplets \( \{ \mathbf{l}_{i}^{(k)}, \mathbf{l}_{j}^{(k)}, \mathbf{0} \} \) makes the desired alignment impossible. The parsimony principle then suggests not rotating, and setting by default the trivial initial estimate:

$$ {\mathbf{R}}_{ij} = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} } \right]\,\,(\text{the \, identity \, matrix}) $$

of the rotation matrix, associated with rotation angle θ = 0 and rotation axis:

$$ {\mathbf{v}}_{ij} = \left[ {\begin{array}{*{20}c} 1 & 0 & 0\\ \end{array} } \right]. $$

The Scale Factor and the Translation Term

To complete the algorithm, the estimation of both the scale factor ρ > 0 and the translation term t is considered. As mentioned before, both extend straightforwardly from their 2D analogues.

For every pair of different landmarks \( \mathbf{l}_{i}^{(k)} \) and \( \mathbf{l}_{j}^{(k)} \) (1 ≤ i ≠ j ≤ n), an initial estimate \( \rho_{ij} \) of ρ is defined as the scale factor by which the segment connecting these two points in configuration \( \mathbf{X}_{1} \) must be multiplied to have the same length as the corresponding segment in \( \mathbf{X}_{2} \):

$$ \rho_{ij} = \frac{\left\| \mathbf{l}_{j}^{(2)} - \mathbf{l}_{i}^{(2)} \right\|}{\left\| \mathbf{l}_{j}^{(1)} - \mathbf{l}_{i}^{(1)} \right\|}. $$
(8)

The final scale estimate is naturally obtained by taking the doubly-repeated median:

$$ \mathop {\text{med} }\limits_{i} \, (\mathop {\text{med} }\limits_{{{j} \ne {i}}} \, \rho_{ij} ) = \widetilde{\rho}. $$
(9)
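Equations (8) and (9) can be sketched in Python as follows; `resistant_scale` is a hypothetical name, and the sketch assumes that no two landmarks coincide in \( \mathbf{X}_{1} \), since the ratio would otherwise be undefined.

```python
import numpy as np

def pairwise_scale_ratios(X1, X2):
    """rho_ij of Eq. (8): interlandmark distance in X2 over the same distance in X1."""
    D1 = np.linalg.norm(X1[:, None, :] - X1[None, :, :], axis=-1)
    D2 = np.linalg.norm(X2[:, None, :] - X2[None, :, :], axis=-1)
    np.fill_diagonal(D1, 1.0)       # i == j is undefined; those entries are skipped below
    return D2 / D1

def resistant_scale(X1, X2):
    """Repeated-median scale estimate of Eq. (9)."""
    E = pairwise_scale_ratios(np.asarray(X1, float), np.asarray(X2, float))
    n = E.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    return float(np.median([np.median(E[i, off_diagonal[i]]) for i in range(n)]))

# Usage: X2 is a rotated, translated copy of X1 scaled by 1.5, with 3 of 10 landmarks moved.
rng = np.random.default_rng(2)
X1 = rng.normal(size=(10, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X2 = 1.5 * X1 @ Q + np.array([2.0, 0.0, -1.0])
X2[[1, 5, 8]] += rng.normal(scale=3.0, size=(3, 3))
rho = resistant_scale(X1, X2)       # recovers 1.5 up to rounding
```

Since interlandmark distances are unaffected by rotation and translation, the scale can be estimated before the other two parameters.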

Once the final rotation matrix and scale factor estimates \( {\tilde{\mathbf{R}}} \) and \( \tilde{\rho } \) have been independently obtained, the translation term t is initially estimated by every row \( \mathbf{t}_{i} \) (i = 1, … , n) of the residual matrix:

$$ {\mathbf{X}}_{2} - \widetilde{\rho}{\mathbf{X}}_{1} \widetilde{{\mathbf{R}}}{ = }{\mathbf{T}}{ = }\left[ {\begin{array}{*{20}c} {{\mathbf{t}}_{1} } \\ \vdots \\ {{\mathbf{t}}_{n} } \\ \end{array} } \right] $$
(10)

The single componentwise median along each column of matrix T produces the final estimate:

$$ \mathop {\text{med} }\limits_{i} \, {\mathbf{t}}_{i} = \widetilde{{\mathbf{t}}} $$
(11)

of the translation term.
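Given \( \tilde{\rho} \) and \( \tilde{\mathbf{R}} \), Eqs. (10) and (11) amount to a column-wise median of the residual rows. A minimal Python sketch, with a hypothetical function name and the rotation and scale assumed known:

```python
import numpy as np

def resistant_translation(X1, X2, rho, R):
    """Eqs. (10)-(11): column-wise median of the rows t_i of T = X2 - rho * X1 @ R."""
    T = np.asarray(X2, float) - rho * np.asarray(X1, float) @ R
    return np.median(T, axis=0)

# Usage with known rho and R: 3 of 10 landmarks are moved, the rest follow exactly.
rng = np.random.default_rng(3)
X1 = rng.normal(size=(10, 3))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t_true = np.array([2.0, 0.0, -1.0])
X2 = 1.5 * X1 @ R + t_true
X2[[1, 5, 8]] += rng.normal(scale=3.0, size=(3, 3))
t_est = resistant_translation(X1, X2, 1.5, R)
```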

Generalized Resistant Procrustes (GRP) Superimposition

To generalize the proposed ORP method to the optimal superimposition of r > 2 configurations of landmarks \( \mathbf{X}_{1}, \mathbf{X}_{2}, \ldots, \mathbf{X}_{r} \) in an iterative procedure, both a consensus configuration and a stopping criterion are needed.

Two alternatives for a consensus configuration Y seem to be in line with the appealing features of the resistant method: the componentwise median configuration of the n × 3 matrices \( \mathbf{X}_{1}, \mathbf{X}_{2}, \ldots, \mathbf{X}_{r} \) (Rohlf and Slice 1990), or their spatial median. For the latter, the consensus configuration Y is chosen to be the matrix whose rows (landmarks) are, respectively, the spatial medians (Weiszfeld 1937) of the corresponding rows (landmarks) of the matrices \( \mathbf{X}_{1}, \mathbf{X}_{2}, \ldots, \mathbf{X}_{r} \). Unlike the componentwise median, the spatial median is rotation equivariant (rotating all configurations rotates their spatial median in the same way) and thus seems more appropriate in the context of shape analysis.
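A minimal sketch of the spatial-median consensus, using the classical Weiszfeld iteration started at the mean. To keep it short, the sketch floors the distances at a small constant when an iterate falls on a data point, a simplification of the exact treatment of that case; the function names, tolerance and iteration cap are our assumptions. The usage lines also check numerically the robustness and rotation equivariance mentioned above.

```python
import numpy as np

def spatial_median(points, tol=1e-10, max_iter=10_000):
    """Weiszfeld iteration for the spatial (geometric) median of the rows of `points`."""
    points = np.asarray(points, dtype=float)
    y = points.mean(axis=0)                                      # start at the centroid
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(points - y, axis=1), 1e-12)   # floor avoids 1/0
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()        # weighted mean
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def consensus(configs):
    """Landmark-wise spatial median of an (r, n, 3) stack of configurations."""
    configs = np.asarray(configs, dtype=float)
    return np.array([spatial_median(configs[:, i, :]) for i in range(configs.shape[1])])

# Usage: one far-away point barely moves the spatial median, unlike the mean.
pts = np.array([[1, 1, 0], [1, -1, 0], [-1, 1, 0], [-1, -1, 0], [100, 100, 100]], float)
m = spatial_median(pts)

# Rotation equivariance: the median of rotated points is the rotated median.
rng = np.random.default_rng(4)
P = rng.normal(size=(7, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
gap = np.abs(spatial_median(P @ Q) - spatial_median(P) @ Q).max()
```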

In either case, monotonic convergence of the iterative procedure is not guaranteed, as no explicit criterion is being optimized. The median residual (i.e., the median of the non-squared Euclidean distances between corresponding landmarks) between the consensus configurations of successive iterations:

$$ \mathop {\text{med} }\limits_{i} \left\| {{\mathbf{y}}_{i} - {\tilde{\mathbf{y}}}_{i}} \right\| $$

can be used as the stopping criterion (Rohlf and Slice 1990), and this is our choice. In addition, it was combined with an efficiency-based configuration update criterion to aid convergence: each configuration \( \mathbf{X}_{k} \) (k = 1, 2, … , r) is transformed only when its median residual is confirmed to decrease after transformation, meaning that the lower 50 % of its residuals will then be spread over a shorter range (see steps 5 and 6 below).

The algorithm steps to obtain the new GRP superimposition of r arbitrary n × 3 matrices \( \mathbf{X}_{1}, \mathbf{X}_{2}, \ldots, \mathbf{X}_{r} \) are as follows:

  1.

    Perform a preliminary LS generalized Procrustes superimposition. This achieves an initial alignment that takes into account reflections if needed, and usually enables a better estimation of the resistant rotation angle.

  2.

    To aid in reaching convergence, scale each of the configuration matrices \( \mathbf{X}_{k} \) (k = 1, 2, … , r) to a common size, producing either a unit median interlandmark distance:

    $$ \mathop {\text{med} }\limits_{{{j} \ne {i}}} \left\| {{\mathbf{l}}_{j}^{{({k})}} - {\mathbf{l}}_{i}^{{(k)}} } \right\|\, = \,1 $$

    or a unit median landmark length:

    $$ \mathop {\text{med} }\limits_{i} \left\| {{\mathbf{l}}_{i}^{{(k)}} } \right\|\, = \,1. $$

  3.

    Set Y, the initial consensus matrix, as the matrix whose ith row \( \mathbf{y}_{i} \) is the spatial median of the corresponding rows of the current matrices \( \mathbf{X}_{k} \) (k = 1, 2, … , r), computed with the Weiszfeld algorithm.

  4.

    For every k = 1, … , r evaluate the initial residuals:

    $$ \left\| {{\mathbf{l}}_{i}^{{({k})}} - {\mathbf{y}}_{i} } \right\|\,(i\, = \,1,\,\ldots,\,{n}) $$

    between all pairs of corresponding landmarks of the current matrix \( \mathbf{X}_{k} \) and the current consensus matrix Y, and then evaluate the corresponding median residual:

    $$ \mathop {\text{med} }\limits_{i} \left\| {{\mathbf{l}}_{i}^{{(k)}} - {\mathbf{y}}_{i} } \right\|. $$

    (Note that whenever more than half of the landmarks of \( \mathbf{X}_{k} \) are perfectly superimposed onto the corresponding landmarks of the consensus Y, this median residual is zero.)

  5.

    For every k = 1, … , r, perform a tentative ORP superimposition of the current matrix \( \mathbf{X}_{k} \) onto the current consensus matrix Y, obtaining the corresponding new, fitted matrix \( {\tilde{\mathbf{X}}}_{k} \). Compute the new residuals:

    $$ \left\| {{\tilde{\mathbf{l}}}_{i}^{{({k})}} - {\mathbf{y}}_{i} } \right\|\,({i}\, = \,1,\,\ldots ,\,{n}) $$

    and the corresponding new median residual:

    $$ \mathop {\text{med} }\limits_{i} \left\| {{\tilde{\mathbf{l}}}_{i}^{{(k)}} - {\mathbf{y}}_{i} } \right\|. $$

  6.

    If the new median residual is lower than the previous one, confirm the tentative fit and replace the current matrix \( \mathbf{X}_{k} \) by the corresponding fitted matrix: \( {\mathbf{X}}_{k} = {\tilde{\mathbf{X}}}_{k} \). Otherwise, leave \( \mathbf{X}_{k} \) unchanged. Afterwards, calculate the new consensus matrix \( {\tilde{\mathbf{Y}}} \) as the landmark-wise spatial median of the updated matrices.

  7.

    If the median difference between the new and the previous consensus matrices:

    $$ \mathop {\text{med} }\limits_{i} \left\| {{\mathbf{y}}_{i}^{{}} - {\tilde{\mathbf{y}}}_{i}} \right\| $$

    is greater than a specified tolerance, set Y = \( {\tilde{\mathbf{Y}}} \) and repeat from step 5 above. Otherwise, the iteration is finished.
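The control flow of these steps can be organized into a short loop. The sketch below is not the authors' Scilab implementation: step 1 (the preliminary LS fit) is omitted, step 2 uses the unit median interlandmark distance option, and the ORP superimposition of step 5 is passed in as a user-supplied function `fit(X, Y)` so that the loop can be shown on its own. It recomputes the step-4 median residual against the current consensus on every pass, which is our reading of "repeat from step 5". All names are hypothetical.

```python
import numpy as np

def spatial_median(points, tol=1e-10, max_iter=10_000):
    """Weiszfeld iteration (distances floored to avoid division by zero)."""
    y = points.mean(axis=0)
    for _ in range(max_iter):
        w = 1.0 / np.maximum(np.linalg.norm(points - y, axis=1), 1e-12)
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def consensus(X):
    """Landmark-wise spatial median of an (r, n, 3) stack (steps 3 and 6)."""
    return np.array([spatial_median(X[:, i, :]) for i in range(X.shape[1])])

def median_residual(A, B):
    """Median of the non-squared distances between corresponding landmarks."""
    return float(np.median(np.linalg.norm(A - B, axis=1)))

def unit_median_interlandmark(X):
    """Step 2: rescale X so that its median interlandmark distance equals 1."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return X / np.median(D[~np.eye(len(X), dtype=bool)])

def grp(configs, fit, tol=1e-8, max_iter=100):
    """Steps 2-7; fit(X, Y) is a stand-in for the ORP superimposition of X onto Y."""
    X = np.array([unit_median_interlandmark(np.asarray(c, float)) for c in configs])
    Y = consensus(X)                                     # step 3
    for _ in range(max_iter):
        for k in range(len(X)):
            previous = median_residual(X[k], Y)          # step 4
            X_fit = fit(X[k], Y)                         # step 5 (tentative fit)
            if median_residual(X_fit, Y) < previous:     # step 6: keep only if it improves
                X[k] = X_fit
        Y_new = consensus(X)                             # step 6: updated consensus
        if median_residual(Y, Y_new) <= tol:             # step 7: stopping criterion
            return X, Y_new
        Y = Y_new
    return X, Y

def shift_fit(A, Y):
    """Illustrative translation-only stand-in for ORP: median residual shift."""
    return A + np.median(Y - A, axis=0)

rng = np.random.default_rng(5)
base = rng.normal(size=(6, 3))
configs = [base + rng.normal(scale=2.0, size=3) for _ in range(5)]
X, Y = grp(configs, shift_fit)
```

With five translated copies of one configuration, the translation-only stand-in brings all of them onto the consensus in a single pass; a real ORP fit would also estimate scale and rotation.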

Resistant distance and exploratory multivariate analysis

Once the iterative procedure is completed and a GRP superimposition of configurations \( \mathbf{X}_{1}, \mathbf{X}_{2}, \ldots, \mathbf{X}_{r} \) has been obtained, a resistant distance \( d_{kj} \) is computed for each pair of configurations \( \mathbf{X}_{k} \), \( \mathbf{X}_{j} \) as the sum of the (non-squared) residuals over all pairs of corresponding landmarks:

$$ d_{kj} = \text{distance}\left( \mathbf{X}_{k}, \mathbf{X}_{j} \right) = \sum_{i = 1}^{n} \left\| \mathbf{l}_{i}^{(k)} - \mathbf{l}_{i}^{(j)} \right\| \quad (1 \le k, j \le r) $$
(12)

In the context of morphometrics, this measure has frequently been ruled out because it is less tractable from a mathematical point of view: it is neither differentiable everywhere nor directly associated with an inner product, which would link distances with lengths and angles. Unlike the Procrustes distance, however, it is less affected by large displacements in just a few landmarks, which makes it an appropriate tool for measuring the shape differences detected by a RF. In addition, whenever more than 50 % of the corresponding landmarks of matrices \( \mathbf{X}_{k} \) and \( \mathbf{X}_{j} \) can be perfectly superimposed, the RF performs that superimposition and makes each of the corresponding terms in (12) equal to 0. For all these reasons, the proposed distance is a reasonable choice for our approach.
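Eq. (12) and the sum of squared residuals underlying the Procrustes distance can be computed side by side for a stack of configurations. In this minimal sketch (function names hypothetical), the configurations are assumed to be already superimposed and stored as an (r, n, 3) array.

```python
import numpy as np

def resistant_distance_matrix(configs):
    """d_kj of Eq. (12): sum over landmarks of non-squared residuals, for an (r, n, 3) stack."""
    X = np.asarray(configs, dtype=float)
    diff = X[:, None, :, :] - X[None, :, :, :]            # shape (r, r, n, 3)
    return np.linalg.norm(diff, axis=-1).sum(axis=-1)     # shape (r, r)

def squared_residual_matrix(configs):
    """Sum of squared residuals (the LS Procrustes criterion), for comparison."""
    X = np.asarray(configs, dtype=float)
    diff = X[:, None, :, :] - X[None, :, :, :]
    return (diff ** 2).sum(axis=(-2, -1))

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3))
B = A.copy()
B[0] += np.array([3.0, 0.0, 0.0])          # one landmark moved by 3
C = A + np.array([0.6, 0.0, 0.0])          # all five landmarks moved by 0.6
d = resistant_distance_matrix([A, B, C])
ss = squared_residual_matrix([A, B, C])
```

Moving one landmark by 3 units, or all five by 0.6 units, gives the same resistant distance (3), whereas the sums of squares are 9 and 1.8: the squared criterion weighs one large localized displacement five times more than the same total displacement spread evenly.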

To visualize shape differences following a generalized Procrustes superimposition, the universal multidimensional scaling framework (Agarwal et al. 2010) was adopted. This versatile algorithm is convergent and accurate, and allows several input distances to be combined with different cost functions. The classical version, fMDS, in which the generated distances are (non-squared) Euclidean and the cost function is the sum of squared differences between the original and the generated distances, was used for low-dimensional visualization of the generalized LS Procrustes results. Correspondingly, the resistant version, rMDS, in which the generated distances are (non-squared) Euclidean and the cost function is the sum of absolute differences between the original and the generated distances, was used for visualization of the GRP results. The rMDS takes as input a reference set of 3D points or configurations \( \mathbf{X}_{1}, \mathbf{X}_{2}, \ldots, \mathbf{X}_{r} \); the corresponding r × r distance or similarity matrix \( \mathbf{D} = [d_{kj}] \) (in our case, the matrix of resistant distances defined in (12)); and a parameter m (typically m = 2) giving the dimension in which the distances in D are to be depicted. The rMDS approach then obtains, through an iterative procedure, a corresponding set of points \( \mathbf{Z}_{1}, \mathbf{Z}_{2}, \ldots, \mathbf{Z}_{r} \) in m-dimensional space such that the sum of absolute differences (cost function) between the original distances and the (non-squared) Euclidean distances between the generated points is minimized:

$$ \min_{\mathbf{Z}_{1}, \ldots, \mathbf{Z}_{r} \in \mathbb{R}^{m}} \sum_{1 \le k, j \le r} \left| d_{kj} - \left\| \mathbf{Z}_{k} - \mathbf{Z}_{j} \right\| \right| $$
(13)

This resistant version, rMDS, was first suggested by Cayton and Dasgupta (2006); it produces a lower-dimensional representation that best recovers most of the original distances and is less sensitive than a LS cost function to the presence of outliers. Once again, this is the typical purpose of resistant tools.
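The cost in Eq. (13) is easy to evaluate, and for small examples it can be decreased by plain subgradient descent. The sketch below is only that: it is not the universal MDS algorithm of Agarwal et al. (2010), and the random initialization, diminishing step-size schedule and function names are our own assumptions.

```python
import numpy as np

def l1_stress(Z, D):
    """Cost of Eq. (13): sum over all k, j of | d_kj - ||Z_k - Z_j|| |."""
    G = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return float(np.abs(D - G).sum())

def l1_mds(D, m=2, steps=2000, lr=0.01, seed=0):
    """Decrease the L1 cost by subgradient descent; returns the best points found and their cost."""
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(seed)
    Z = rng.normal(scale=D.mean(), size=(D.shape[0], m))     # random start (assumption)
    best_Z, best_cost = Z.copy(), l1_stress(Z, D)
    for t in range(steps):
        diff = Z[:, None, :] - Z[None, :, :]                  # Z_k - Z_j, shape (r, r, m)
        G = np.maximum(np.linalg.norm(diff, axis=-1), 1e-12)
        S = np.sign(G - D)                                    # subgradient of |d - g| w.r.t. g
        grad = 2.0 * ((S / G)[:, :, None] * diff).sum(axis=1) # each pair appears twice in the sum
        Z = Z - lr / np.sqrt(t + 1.0) * grad                  # diminishing step size
        cost = l1_stress(Z, D)
        if cost < best_cost:
            best_Z, best_cost = Z.copy(), cost
    return best_Z, best_cost

P = np.array([[0, 0], [3, 0], [0, 4], [3, 4], [1.5, 2]], float)
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Z, cost = l1_mds(D)
```

Because each distance enters through an absolute value, a few badly fitting pairs contribute linearly rather than quadratically to the cost.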

For comparison, we also employed a principal coordinates analysis (PCo; Davis 1986) to visualize the Procrustes distance, and a non-metric multidimensional scaling (nmMDS; Taguchi and Oono 2004) to visualize the resistant distance. PCo analysis finds the eigenvalues and eigenvectors of the double-centered matrix derived from the Procrustes distances between all pairs of landmark configurations (Davis 1986). The eigenvectors, scaled by the square roots of their eigenvalues (the PCo scores), are then used to plot the differences among these configurations. The nmMDS analysis, in turn, attempts to place the configurations in a two- or three-dimensional coordinate system such that the rank order of the resistant distances between all pairs is preserved (Taguchi and Oono 2004). The PCo and nmMDS analyses were performed in PAST (Hammer et al. 2001).
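Classical principal coordinates (Gower's double-centering construction) can be sketched as follows. The sketch assumes that the input matrix holds the distances themselves; if Procrustes distances are stored as sums of squared differences, their square roots should be supplied instead. The name `principal_coordinates` is illustrative, and the sketch does not reproduce PAST's implementation.

```python
import numpy as np

def principal_coordinates(D, m=2):
    """Classical PCo: double-center -D**2/2 and scale the top-m eigenvectors."""
    D = np.asarray(D, dtype=float)
    r = D.shape[0]
    J = np.eye(r) - np.ones((r, r)) / r          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:m]
    scores = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    return scores, vals[top]

P = np.array([[0, 0], [3, 0], [0, 4], [3, 4], [1.5, 2]], float)
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
scores, eigenvalues = principal_coordinates(D, m=2)
```

For distances that are exactly Euclidean in m dimensions, as in this toy example, the scores reproduce the input distances up to a rotation or reflection of the plot.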

Examples

All simulated and real examples were analyzed with routines written in Scilab; pseudocode versions are available upon request. A flexible, integrated software toolbox including all the techniques mentioned is under preparation.

Simulated Data

Different simulation experiments were designed to evaluate the performance of the resistant Procrustes superimposition under an increasing amount of variation in a growing subset of landmarks. Based on a single skull selected from a real sample of 152 Alouatta caraya specimens, 5 data sets of 10 simulated primate skulls each were computer-generated. The target skull consisted of 35 cranial landmarks in 3D, grouped into face, vault and base.

For the first set, 5 landmarks among a total of 26 from the face and cranial vault (Fig. 2) were randomly chosen and 10 individuals were simulated; as the variability of the cranial base was known to be lower in the dataset, landmarks from this structure were not modified. In all simulated individuals, the same 5 chosen landmarks were perturbed by centered, normally distributed noise whose standard deviation was set to half the distance separating that landmark from its nearest neighbor, in order to keep 95 % of the simulated landmarks within this minimal distance. A similar procedure was performed by perturbing 10, 15, 20 and 25 randomly chosen landmarks among the 26 landmarks from the face and cranial vault. For simplicity, landmarks from the cranial base were constrained to present zero variation in all the simulations. The pattern of shape variation was thus designed to reproduce the known changes of these structures during the ontogeny of mammalian skulls, in which the cranial base reaches adult proportions at a younger age than the face and vault (Cheverud 1995; Sperber 2001).
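The perturbation scheme can be sketched as below. The real target skull is not reproduced here, so a random 35-landmark configuration stands in for it; we also assume, as one reading of the text, that the stated standard deviation applies to each coordinate independently, and that indices 0 to 25 denote the face and vault landmarks. The function name and seeds are hypothetical.

```python
import numpy as np

def simulate_skulls(target, candidates, n_perturbed, n_individuals=10, seed=0):
    """Perturb the same randomly chosen landmarks in every simulated copy of `target`.

    Each chosen landmark receives centered normal noise whose per-coordinate standard
    deviation is half the distance to its nearest landmark (our reading of the text).
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    chosen = np.sort(rng.choice(candidates, size=n_perturbed, replace=False))
    D = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    sd = 0.5 * D.min(axis=1)                      # half the nearest-neighbor distance
    sims = np.repeat(target[None, :, :], n_individuals, axis=0)
    for i in chosen:
        sims[:, i, :] += rng.normal(scale=sd[i], size=(n_individuals, 3))
    return sims, chosen

# Hypothetical stand-in for the 35-landmark skull; indices 0 to 25 play the face and vault.
target = np.random.default_rng(42).normal(size=(35, 3))
sims, chosen = simulate_skulls(target, candidates=np.arange(26), n_perturbed=5)
```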

Fig. 2

Cranial landmarks recorded from New World monkeys using a 3D Microscribe G2X digitizer. The wireframes used later in Figs. 10 and 12 are also displayed

Two main conclusions were drawn from these simulations. First, following the resistant Procrustes superimposition, the residual between each perturbed landmark of any generated skull and the corresponding landmark of the target (real Alouatta caraya) skull was always larger than the same residual calculated following an LS Procrustes superimposition (Fig. 3). This result confirmed that the RF detects and depicts the true localized shape differences more effectively than LS.

Fig. 3

Plots showing the non-squared absolute residuals between target and selected simulated skulls for each landmark following LS and resistant fit (RF) Procrustes superimpositions

Second, regarding the proportion of the total shape distance between the simulated skulls and the target skull (the Procrustes distance after an LS fit, and the proposed resistant distance after a resistant superimposition) that was accumulated solely by the perturbed landmarks (those capturing the true shape differences by design), the RF exhibited much higher accuracy whenever up to half of the landmarks were perturbed: on average, between 87 and 95 % of the total resistant distance was associated with the true shape differences, whereas only between 60 and 78 % of the total Procrustes distance was (Fig. 4). This confirmed that a resistant superimposition enables a more accurate measurement of shape differences: almost all the shape distance was concentrated on those landmarks where localized deformation had truly taken place. The LS fit, on the other hand, added artificial variation to many landmarks not involved in those partial deformations. Whenever more than half of the landmarks were perturbed (20 and 25 landmarks in our simulations, no longer a pattern of local deformation), the percentage of total shape distance recovered by them was similar for both methods (Fig. 4).
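The proportion compared in Fig. 4 can be computed as in the sketch below, assuming that the two configurations have already been superimposed (by LS or by the resistant fit) and that the indices of the perturbed landmarks are known. Following the definitions used here, the Procrustes distance sums squared landmark residuals while the resistant distance sums non-squared ones. The function name `perturbed_share` and the residual values in the usage lines are hypothetical.

```python
import numpy as np


def perturbed_share(target, specimen, perturbed, squared):
    """Fraction of the total distance contributed by the perturbed landmarks.

    target, specimen: (p, k) configurations, assumed already superimposed.
    perturbed: indices of the landmarks perturbed by design.
    squared: True sums squared residuals (Procrustes distance, after an LS fit);
             False sums non-squared residuals (resistant distance, after an RF).
    """
    resid = np.linalg.norm(np.asarray(specimen, float) - np.asarray(target, float), axis=1)
    if squared:
        resid = resid ** 2
    mask = np.zeros(resid.shape[0], dtype=bool)
    mask[perturbed] = True
    return resid[mask].sum() / resid.sum()


# Hypothetical residuals: only landmark 0 truly changed. After LS the change is
# spread (residuals 2, 1, 1, 1); after a resistant fit it stays put (3, 0, 0, 0)
target = np.zeros((4, 3))
after_ls = target.copy()
after_ls[:, 0] = [2.0, 1.0, 1.0, 1.0]
after_rf = target.copy()
after_rf[0, 0] = 3.0
share_ls = perturbed_share(target, after_ls, [0], squared=True)    # 4 / 7
share_rf = perturbed_share(target, after_rf, [0], squared=False)   # 3 / 3
```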

Fig. 4

Percentages of LS (gray) and resistant (white) distances between target and simulated skulls due to true shape differences. Dotted line separates simulated datasets with less (left) and more (right) than 50 % of landmarks perturbed

Ontogeny of Human Mandible

The postnatal ontogeny of the human mandible is an example of moderate shape variation in which change is not expected to be particularly concentrated in a few landmarks. An ontogenetic series including individuals of both sexes, with ages ranging from 7 to 45 years, was analyzed. These specimens belong to the collection of Identified Skeletons of the University of Coimbra (Rocha 1995). To describe shape changes throughout ontogeny, 16 landmarks were digitized in 3D using a Microscribe G2X (Fig. 5). Both generalized LS and resistant Procrustes superimpositions were performed on this dataset. The resulting distances between individuals were represented in a low-dimensional space by means of the classical (fMDS) and resistant (rMDS) versions of the universal MDS, as well as by the PCo and nmMDS implemented in PAST.

Fig. 5

Landmarks recorded from the human mandible using a 3D Microscribe G2X digitizer. The wireframes used later in Fig. 8 are also displayed

The ordinations of specimens along the fMDS and nmMDS axes were analogous to those obtained by PCo and rMDS, respectively; therefore, only one ordination per superimposition method is shown (Fig. 6). Both the LS (Fig. 6a) and resistant Procrustes (Fig. 6b) superimpositions revealed shape differences between the distributions of adults and subadults. The correlation between the respective distance matrices was high and significant (Spearman correlation R = 0.94; Pearson correlation r = 0.96; regression fit r² = 92 %; Fig. 7a). Superimposing two extreme configurations by both methods also showed the same pattern of shape change with age (Fig. 8): the alveolar region became relatively shorter and the angle between the alveolar region and the ascending ramus was narrower in older individuals (visualization techniques other than wireframes are also available and could be used to explore the pattern of shape change depicted by each superimposition method; see Klingenberg 2013). There were also striking changes on the posterior side of the ascending ramus: compared with young individuals, adults have the condylar process placed upward and the angular process placed downward.
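The three statistics reported for the distance matrices can be sketched as follows, assuming two symmetric matrices of LS and resistant distances over the same specimens. Each pair of specimens is counted once, Spearman's R is computed as the Pearson correlation of ranks (this simple ranking assumes no ties), and the regression r² is taken as the square of Pearson's r, which holds for a simple linear regression and matches the reported values (0.96² ≈ 0.92). The function names are illustrative, and significance testing is not included.

```python
import numpy as np


def _ranks(x):
    """Ranks 1..n of the entries of x (assumes no tied values)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r


def compare_distance_matrices(D_ls, D_rf):
    """Spearman R, Pearson r and regression r^2 between two distance matrices."""
    D_ls, D_rf = np.asarray(D_ls, dtype=float), np.asarray(D_rf, dtype=float)
    # Each pair of specimens appears once: keep the upper triangle only
    iu = np.triu_indices_from(D_ls, k=1)
    a, b = D_ls[iu], D_rf[iu]
    pearson = np.corrcoef(a, b)[0, 1]
    spearman = np.corrcoef(_ranks(a), _ranks(b))[0, 1]
    # For a simple linear regression, the coefficient of determination is r^2
    return spearman, pearson, pearson ** 2


# Usage with hypothetical matrices related monotonically but not linearly
rng = np.random.default_rng(1)
P = rng.normal(size=(8, 3))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
R, r, r2 = compare_distance_matrices(D, D ** 2)
```

In the usage lines the two matrices have identical rank orders, so R is 1 while r stays below 1, which is how a rank-based and a linear coefficient can diverge.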

Fig. 6

MDS ordinations showing adult and sub-adult mandibles following generalized LS (a) and resistant (b) Procrustes superimpositions. Dots represent individuals

Fig. 7

Scatter-plot of estimated LS versus resistant shape distance matrices for the mandible (a), cranial ontogeny (b) and cranial inter-specific (c) datasets. Dots represent the distance values; Spearman rank (R) and Pearson (r) correlation coefficients are indicated

Fig. 8

Wireframes showing mandible shape changes between adult (gray line) and sub-adult (black line) individuals resulting from LS and resistant fit (RF) Procrustes superimpositions. Non-squared absolute residuals for each landmark are also shown

Cranial Ontogeny in Species of New World Monkeys

A third example focused on shape changes during the postnatal ontogeny of the skull in three platyrrhine species: Cebus apella, Callithrix jacchus and Alouatta caraya. These species exhibit relatively large shape variation, which is expected to be concentrated in particular landmarks. We analysed the three ontogenetic series, which include individuals of both sexes and different ages. These specimens are housed in the Museo Argentino de Ciencias Naturales “Bernardino Rivadavia” (Argentina), the Museo de Ciencias Naturales de La Plata (Argentina) and the Museu Nacional de Rio de Janeiro (Brazil). To describe shape changes throughout ontogeny, 35 landmarks were digitized in 3D using a Microscribe G2X (Fig. 2). The generalized versions of both LS and resistant Procrustes superimpositions were performed, and the resulting distances were plotted using universal fMDS and nmMDS.

Both the scatter-plot of LS versus resistant distances for all pairs of individuals from the three species (Fig. 7b) and the ordinations of specimens along the MDS axes based on LS and resistant distances (Fig. 9) revealed differences between the two approaches. Using LS and fMDS, the ontogenetic trajectories of the three species shared the same direction, and the Alouatta trajectory was an extension of the trajectory followed by Callithrix (Fig. 9a). Moreover, the trajectories of Cebus and Alouatta had a similar extent. Conversely, when the specimens were analysed through the resistant approach, the ontogenetic trajectory of Alouatta differed in orientation and extent from those of the remaining species (Fig. 9b). The scatter-plot in Fig. 7b clearly indicates that the two distance matrices differ considerably, with a Spearman correlation of 0.81, a Pearson correlation of 0.75 and a regression fit of only 56 %. As expected, superimposing two extreme configurations of Alouatta by both methods (Fig. 10) revealed differences in the pattern of shape change with age. The LS method distributed the variation evenly across the skull, whereas the resistant method suggested that most of the variation was concentrated in the facial region. In this way, the resistant Procrustes superimposition better represented the pattern of primate cranial growth during postnatal ontogeny, which is characterized by the extended growth of facial structures and the associated allometric shape changes (Cheverud 1995; Hallgrimsson and Lieberman 2008).

Fig. 9

MDS ordinations displaying ontogenetic trajectories of three platyrrhine species following generalized LS (a) and resistant (b) superimpositions. Dots represent specimens from each species

Fig. 10

Wireframes depicting cranial shape changes between extreme adult (gray line) and sub-adult (black line) Alouatta specimens resulting from LS and resistant fit (RF) Procrustes superimpositions. Non-squared absolute residuals for each landmark are also shown

Cranial Variation among Species of New World Monkeys

New World monkeys were also used to investigate the inter-specific pattern of shape variation at macroevolutionary scales. Owing to their large variation in cranial shape (Perez et al. 2011), they are an excellent system for assessing the performance of the resistant superimposition. The variation among 29 species belonging to 5 main clades (Aotus, Cebinae, Atelidae, Pitheciidae, Callitrichinae) was analyzed; the sample included 221 adult individuals of both sexes (see Perez et al. 2011 for details on the sample composition). These specimens are deposited in the Museo Argentino de Ciencias Naturales “Bernardino Rivadavia” (Argentina), the Museo de Ciencias Naturales de La Plata (Argentina), the Museu Nacional de Rio de Janeiro (Brazil) and the Museu de Zoologia of the Universidade de Sao Paulo (Brazil). To describe shape changes in 3D, the same set of 35 cranial landmarks used in the ontogenetic analysis was employed (Fig. 2).

The landmark coordinates within each species were first superimposed by LS, and a consensus configuration was then estimated for each species. These consensus configurations were in turn aligned through a second LS fit, and the resulting distances between the superimposed consensus configurations were represented using fMDS and PCo. In the resistant approach, a GRP fit followed the LS superimposition within each species, yielding a resistant consensus for each species. An LS fit followed by a GRP fit was then applied to all the resistant consensus configurations, and the resulting distances were depicted using both rMDS and nmMDS.
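The LS stage of this two-step procedure can be sketched as a standard generalized LS Procrustes analysis. This minimal Python/NumPy version assumes configurations stored as an (m, p, k) array, scales each one to unit centroid size, and allows reflections in the orthogonal fit. The function names are illustrative, and the subsequent GRP step proposed in this work is not reproduced. The consensus is the mean of the aligned configurations, and the LS (Procrustes) distance between two aligned configurations is their sum of squared landmark differences.

```python
import numpy as np


def _center_scale(X):
    """Translate to the origin and scale to unit centroid size."""
    X = X - X.mean(axis=0)
    return X / np.linalg.norm(X)


def _rotate_onto(X, Y):
    """Apply the orthogonal matrix R minimizing ||X R - Y|| (reflections allowed)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)


def gpa(configs, tol=1e-10, max_iter=100):
    """Generalized least-squares Procrustes superimposition.

    configs: (m, p, k) array with m configurations of p landmarks in k dimensions.
    Returns the aligned configurations and their consensus (mean) configuration.
    """
    Xs = np.array([_center_scale(np.asarray(X, dtype=float)) for X in configs])
    mean = Xs[0]
    for _ in range(max_iter):
        Xs = np.array([_rotate_onto(X, mean) for X in Xs])
        new_mean = _center_scale(Xs.mean(axis=0))
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return Xs, mean


# Usage: three copies of one random configuration, each rotated (or reflected),
# rescaled and translated, are brought back into coincidence
rng = np.random.default_rng(2)
base = rng.normal(size=(10, 3))
configs = []
for s in (1.0, 2.5, 0.7):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
    configs.append(s * base @ Q + rng.normal(size=3))
aligned, consensus = gpa(np.array(configs))
d_ls = ((aligned[0] - aligned[1]) ** 2).sum()  # squared-distance (Procrustes) measure
```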

The results showed that the ordination of the species consensus configurations depends both on the superimposition method and on the method used to represent the corresponding distances. While LS seemed to cluster the species by clade (Fig. 11a), the RF showed greater resemblance among three genera (Ateles, Chiropotes and Cebus) that exhibit convergent morphologies (Fig. 11b), mainly related to a higher degree of encephalization (Hartwig et al. 2011). The scatter-plot in Fig. 7c shows that the two distance matrices differ, but less than the cranial ontogenetic matrices do: a Spearman correlation of 0.81, a Pearson correlation of 0.86 and a regression fit of 74 % were obtained. To visualize the patterns of shape change, we additionally superimposed, by both methods, two consensus configurations representing extreme species along the MDS axes (Fig. 12). The results resembled those obtained for the ontogenetic analysis, with the resistant method displaying the largest variation in the facial and some vault landmarks.

Fig. 11

MDS ordinations of the platyrrhine species following LS (a) and resistant (b) generalized Procrustes superimpositions. Dots represent consensus configurations for each species

Fig. 12

Wireframes showing cranial shape changes between extreme primate species resulting from LS and resistant fit (RF) Procrustes superimpositions. Non-squared absolute residuals for each landmark are also shown

Discussion

Both LS and resistant Procrustes fits are based on landmarks: specific points capturing the geometry of the structures being studied. Because they are homologous points, landmarks enable a rather complete structural understanding of shape variation patterns. The purpose of this work is to offer an integrated resistant approach to landmark-based shape comparisons in 3D: we present both a new ORP method and a corresponding GRP extension for the resistant superimposition of two, or more than two, configurations of landmarks, respectively. In the process, a resistant consensus configuration and a corresponding resistant distance are also presented.

Regarding the ORP method, the new algorithm differs from Slice’s (1996) in two main features: (1) Slice’s formulation is greatly simplified by using pairs instead of triplets of homologous points, which results in substantial savings in computation time; (2) an optimality theorem is given (see Appendix), establishing the optimal performance of this algorithm whenever localized shape variation or partial deformation takes place. It must be acknowledged that, in comparison to LS, the repeated-medians calculation is computationally more expensive: obtaining the median of n values (n being the number of landmarks in this case) takes an average time proportional to n, because each value has to be examined. Slice’s RF requires a processing time proportional to n³, whereas the new algorithm keeps the processing cost at the level of a two-dimensional superimposition, that is, proportional to n².
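To illustrate the repeated-medians idea and its quadratic cost, the sketch below implements a two-dimensional repeated-median similarity fit in the spirit of Siegel (1982) and Rohlf and Slice (1990). Each landmark is treated as a complex number, every pair of landmarks yields one estimate of scale (modulus) and rotation (argument), and medians are taken first over partners and then over landmarks. It is not the 3D ORP algorithm proposed here; it assumes a preliminary LS fit so that the pairwise angles do not straddle ±π, and the function name is hypothetical.

```python
import numpy as np


def repeated_median_fit_2d(X, Y):
    """Repeated-median similarity fit of a 2D configuration X onto Y.

    X, Y: (n, 2) arrays of corresponding, distinct landmarks.
    Returns scale, rotation angle (radians), translation (complex) and the fitted X.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    x = X[:, 0] + 1j * X[:, 1]
    y = Y[:, 0] + 1j * Y[:, 1]
    n = len(x)
    inner_scale = np.empty(n)
    inner_angle = np.empty(n)
    for i in range(n):
        others = np.arange(n) != i
        # One ratio per pair (i, j): modulus = scale change, argument = rotation
        ratios = (y[others] - y[i]) / (x[others] - x[i])
        inner_scale[i] = np.median(np.abs(ratios))
        inner_angle[i] = np.median(np.angle(ratios))
    # Outer medians over landmarks: n inner medians, each over n - 1 pairs (O(n^2) ratios)
    scale = np.median(inner_scale)
    angle = np.median(inner_angle)
    rotated = scale * np.exp(1j * angle) * x
    # Translation: componentwise median of the remaining offsets
    offsets = y - rotated
    shift = np.median(offsets.real) + 1j * np.median(offsets.imag)
    fitted = rotated + shift
    return scale, angle, shift, np.column_stack([fitted.real, fitted.imag])


# Usage: a known similarity transform plus one strongly displaced landmark
X = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.5], [1.0, 2.5], [-1.0, 1.0]])
z = X[:, 0] + 1j * X[:, 1]
w = 1.5 * np.exp(1j * 0.3) * z + (2.0 - 1.0j)
Y = np.column_stack([w.real, w.imag])
Y[2] += [5.0, -4.0]
s, a, t, aligned = repeated_median_fit_2d(X, Y)
```

Because every landmark is compared with every other, the number of pairwise estimates grows as n². In the usage lines a single landmark is displaced strongly, yet scale, rotation and translation are recovered and the remaining landmarks fit exactly.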

The new GRP, in turn, replaces the componentwise median (Rohlf and Slice 1990; Slice 1996) with a consensus configuration whose landmarks (rows) are the spatial medians of the corresponding landmarks. An appealing feature of this choice is that, unlike the componentwise median, the spatial median is rotationally (and translationally) invariant, just as shape is. A resistant distance is also presented: the overall sum of non-squared Euclidean distances across all pairs of corresponding landmarks. Although this distance is not explicitly minimized, its terms become zero for the landmarks that can be perfectly superimposed whenever these represent more than 50 % of the landmarks.
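The spatial-median consensus and the resistant distance can be sketched as follows. Here the spatial median is computed with the classical Weiszfeld iteration, an algorithm assumed for illustration (the computation used in this work may differ); the sketch simply stops if an iterate coincides with a data point, a simplification that a full implementation would refine. Configurations are assumed to be stored as an (m, p, k) array of already superimposed specimens, and all function names are hypothetical.

```python
import numpy as np


def spatial_median(P, tol=1e-10, max_iter=1000):
    """Weiszfeld iteration for the point minimizing the summed distance to the rows of P."""
    P = np.asarray(P, dtype=float)
    m = P.mean(axis=0)  # starting point
    for _ in range(max_iter):
        d = np.linalg.norm(P - m, axis=1)
        if np.any(d < tol):
            # Simplification: stop if the iterate coincides with a data point
            return m
        w = 1.0 / d
        m_new = (w[:, None] * P).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def spatial_median_consensus(configs):
    """Consensus whose landmark l is the spatial median of landmark l across configurations."""
    configs = np.asarray(configs, dtype=float)  # shape (m, p, k)
    return np.array([spatial_median(configs[:, l, :]) for l in range(configs.shape[1])])


def resistant_distance(A, B):
    """Sum of non-squared Euclidean distances between corresponding landmarks."""
    return np.linalg.norm(np.asarray(A, float) - np.asarray(B, float), axis=1).sum()


# Usage: rotating the data rotates the spatial median, but not the componentwise median
rng = np.random.default_rng(3)
P = rng.normal(size=(7, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
sm, sm_rot = spatial_median(P), spatial_median(P @ Q)
cm, cm_rot = np.median(P, axis=0), np.median(P @ Q, axis=0)

A = np.zeros((4, 3))
B = A.copy()
B[0] = [3.0, 4.0, 0.0]
d_res = resistant_distance(A, B)  # 5.0: only landmark 0 contributes
consensus = spatial_median_consensus(np.stack([B, B, B]))
```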

We have shown through the simulations that the resistant superimposition better detects and measures localized shape variation, that is, shape change located in up to 50 % of the landmarks. In particular, we demonstrated the ability of the resistant superimposition, when compared with LS, to highlight localized shape change, generally producing greater residuals for those landmarks capturing partial deformation and lower residuals for those landmarks that did not change. We also showed that, following a resistant superimposition, the proportion of the total shape distance accumulated by the landmarks truly capturing the shape change was above 85 % and typically much higher than its LS analogue, confirming both the accuracy and the efficiency of the resistant approach in measuring real shape variation. Similar results were obtained by Walker (2000) when comparing generalized LS and resistant Procrustes alignments for estimating known covariance matrices: the latter performed better when less than 25 % of the landmarks had excessive variance, while both methods performed similarly when more than 75 % of the landmarks had excessive variance.

The analysis of the three real data sets produced dissimilar results. Firstly, the mandible example showed that when shape change is small and homogeneously distributed, superimpositions by LS and RF do not differ greatly and yield a similar pattern of shape change. This example represents a real case in which more than half of the landmarks were perturbed, that is, changed during ontogenetic or phylogenetic evolution. Conversely, the phylogenetic and ontogenetic cranial data presented a very different scenario: whenever a moderate-to-large, non-homogeneously distributed shape change was suspected, superimpositions by LS and RF methods revealed different patterns of shape change.

The LS fit has typically been favoured as “the” Procrustes method for optimal superimposition mainly because it is based on the Euclidean distance, the distance we are all used to. Besides, the sum of squared differences is mathematically more tractable than many alternative distances: it is differentiable and has a direct link to the inner product, which enables the measurement of vector lengths and angles. Additionally, it has been suggested that LS has a theoretical advantage because it is grounded in Kendall’s (1984) geometric theory of shape (Zelditch et al. 2004). Slice (2001), however, showed that the LS methods used in biological studies are only an approximation to Kendall’s shape space. Lastly, the LS superimposition is considered the only way to obtain the so-called shape variables following landmark digitization (Zelditch et al. 2004; Mitteroecker and Gunz 2009). It thus seems reasonable to conclude that the adoption of the LS superimposition as a standard is more a matter of consensus, relying on familiarity and mathematical or theoretical convenience, than a decision grounded on biological reasons.

A relevant feature of the LS Procrustes method is that it spreads variability homogeneously across all landmarks (Richtsmeier et al. 2002). This is a drawback, admissible only if it is known in advance that the variation at each point is isotropic (Bookstein 1991), as we suspected was the case for the late ontogeny in the mandible example. However, many morphometric studies expect shape variability among specimens and/or species to be concentrated at specific points of the structures (Cheverud 1995; Slice 1996; Zelditch et al. 2004; Hallgrimsson and Lieberman 2008). Recently, Van der Linde and Houle (2009) proposed a modification of the traditional LS Procrustes superimposition based on prior biological knowledge about the variation in form of the structures under study. The method progressively discards landmarks from a dataset if a generalized LS Procrustes superimposition (GLS) excluding those landmarks results in a significant reduction in the Procrustes residuals. Alternative superimposition methods of this sort, like the resistant approach proposed here, would therefore be preferable to GLS whenever local shape changes and non-isotropic variation are expected.

Due to its mathematical formulation, the RF typically requires a preliminary LS superimposition to perform reflections, if necessary. The subsequent RF does not worsen the results; most of the time, it provides insight into where shape differences are specifically located. When local shape changes do not take place, an RF superimposition does not differ greatly from that obtained by LS.

In view of the previous considerations, morphometric studies may face the question: should shape differences between two objects, following an estimated optimal superimposition, be depicted (and therefore perceived) as homogeneously distributed when, on the basis of complementary information, patterns of localized shape variation would be expected? A quantitative answer can be sought. The breakdown value (Donoho and Huber 1983; Hampel et al. 1986) of an estimation method is a measure describing the percentage of data that can be arbitrarily changed or perturbed before the resulting estimate can be driven arbitrarily far from its original value. The breakdown value of the LS superimposition is 0 %, since a single arbitrarily perturbed landmark is enough to change the estimate by an arbitrary amount. The breakdown value of the repeated-medians superimposition is, instead, nearly 50 % (the maximum possible; Siegel 1982): when the remaining landmarks match exactly, the estimate remains the same even if up to half but one of the points vary. To put it plainly: in the context of shape analysis, if differences between two configurations of landmarks were located, say, in a single landmark, the LS Procrustes superimposition would produce different fits depending on the particular location of that landmark, compelling the remaining n−1 points that did not change (an undisputed majority) to fall apart in order to reduce the sum of squared differences. As a consequence, artificial shape variation is often introduced in most of the landmarks when using LS, which in turn may mask the real shape differences.
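The breakdown idea can be illustrated with the two simplest location estimators, the mean (an LS estimate) and the median. In this hypothetical Python example, some of 11 coordinate values are replaced by an arbitrarily large number: a single contaminated value drags the mean without bound, while the median stays bounded until more than half of the values are contaminated. It illustrates the concept only and is not a superimposition; the function name is hypothetical.

```python
import numpy as np


def estimates_under_contamination(values, n_bad, magnitude):
    """Replace the first n_bad values with `magnitude` and return (mean, median)."""
    v = np.array(values, dtype=float)
    v[:n_bad] = magnitude
    return v.mean(), np.median(v)


x = np.arange(11, dtype=float)  # 11 coordinate values: 0, 1, ..., 10

# One contaminated value: the mean grows with the magnitude, the median stays at 6
for mag in (1e2, 1e4, 1e6):
    print(mag, estimates_under_contamination(x, 1, mag))

# With 5 of 11 values contaminated the median is still 10 (bounded);
# with 6 of 11 it equals the contaminating magnitude (breakdown)
print(estimates_under_contamination(x, 5, 1e6)[1], estimates_under_contamination(x, 6, 1e6)[1])
```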

The resistant Procrustes fit, instead, is designed to add no artificial variation when a relative displacement is present not only in one but in up to half but one of the landmarks. Being based on parsimony, its results tend to agree more closely with biological expectations; the adoption of procedures incorporating these biological assumptions might dramatically improve the quality of inferences on shape variation patterns.

In the context of shape analysis, the need for superimposition methods that are not only mathematically sound but also, and perhaps mainly, biologically meaningful has been pointed out before (Richtsmeier et al. 2002; Van der Linde and Houle 2009; Catalano and Goloboff 2012). Since no consensus has yet been reached, methodological contributions in the near future will continue to challenge the adequacy of traditional LS techniques, while aiming to establish the performance gains that alternative methods can bring to specific biological problems.