Examining the Variations in the Results of the Hotelling T² Test in Case of Changing Baseline Landmarks in the Bookstein Coordinates

Ercan, Ilker; Sigirli, Deniz; Ozkaya, Guven

doi:10.1007/s12539-015-0025-y

Original Research Article
Published: 22 July 2015

Volume 7, pages 186–193, (2015)

Interdisciplinary Sciences: Computational Life Sciences

Abstract

In many biological and biomedical investigations, the most effective way to analyze the forms of whole biological organs or organisms is by recording the geometric locations of landmark points. To compare shapes, individuals must be translated, rotated and scaled so that they all lie in a standard, centered position. Bookstein carried out this process by choosing two landmarks as reference landmarks; each individual is translated, rotated and scaled according to these reference landmarks. The aim of the present study was to examine the change in the p values when different baseline landmarks are chosen for the Hotelling T² test, which is commonly used to compare two sample shape configurations based on Bookstein coordinates. For this purpose, the changes in the p values were investigated in shape configurations composed of different numbers of landmarks, by taking all of the possible paired landmark combinations at different variance levels and sample sizes. It was observed that as the number of landmarks increases, the number of possible baseline landmark combinations also increases and, for this reason, substantial variations occur in the p values. Therefore, it is important to decide which landmarks should be taken as reference landmarks when using Bookstein coordinates.

1 Introduction

Many studies in medicine concern the geometrical properties of an organ or organism. Traditionally, the datasets used for statistical analysis in these studies have consisted of quantitative or qualitative measurements; with the development of imaging techniques, the appearance or shape of an organ or organism is now itself treated as input data [1].

In many biological and biomedical investigations, the most effective way to analyze the forms of whole biological organs or organisms is to record the geometric locations of landmark points [2]. Several different morphometric algorithms are available for analyzing landmark data. All of these methods use the same raw data: the coordinate values of landmarks in two or three dimensions. Current landmark-based morphometric techniques include finite-element scaling analysis (FESA), thin-plate splines (TPS), Procrustes analysis and Euclidean distance matrix analysis (EDMA) [3].

When digitizing landmarks from a number of specimens, it is in general impossible to ensure that each specimen is measured in the same position and orientation. If we want to compare shapes, it is, therefore, necessary to translate and rotate the specimens into a standardized position and orientation. In addition, we would like to scale the specimens to a standard size. Bookstein suggested performing this task by selecting two landmarks to form a baseline [4, 5].

Different analyses can be performed on the shape coordinates suggested by Bookstein. Some of these analyses compare two samples to examine the differences between their shape configurations, whereas others examine the shape configuration of a single individual or compare two individuals with each other [4, 6].

One of the tests applied to Bookstein shape coordinates to compare the shape configurations of two samples is the Hotelling T² test. This test assumes that the samples are independent, that the Bookstein shape coordinates have a multivariate normal distribution, and that the samples are drawn from populations that have the same covariance matrix [7].

The aim of the present study is to examine the change in p values when different baseline landmarks are chosen for the Hotelling T² test, which is commonly used to compare two sample shape configurations based on Bookstein coordinates. For this purpose, the changes in p values were investigated in shape configurations that comprise different numbers of landmarks, by taking all of the possible paired landmark combinations at different variance levels and sample sizes.

2 Methods

2.1 Bookstein Coordinates

Let $(x_{j}, y_{j}),\,j= 1, \ldots , k$, be $k\ge 3$ landmarks in a plane ($m=2$ dimensions). Bookstein suggests removing the similarity transformations by translating, rotating and rescaling such that landmarks 1 and 2 are sent to a fixed position. If landmark 1 is sent to (0, 0) and landmark 2 is sent to (1, 0), then suitable shape variables are the coordinates of the remaining $k-2$ landmarks after these operations. To preserve symmetry, we consider the coordinate system in which the baseline landmarks are sent to $(-0.5, 0)$ and $(0.5, 0)$.

Bookstein coordinates $(u_j^B, v_j^B)^T$, $j=3, {\ldots }, k$, are the remaining coordinates of an object after translating, rotating and rescaling the baseline to $(-0.5, 0)$ and (0.5, 0), which results in the following:

$$u_j^B = \frac{(x_2 - x_1)(x_j - x_1) + (y_2 - y_1)(y_j - y_1)}{D_{12}^2} - \frac{1}{2}, \tag{1}$$

$$v_j^B = \frac{(x_2 - x_1)(y_j - y_1) - (y_2 - y_1)(x_j - x_1)}{D_{12}^2}, \tag{2}$$

where $j=3, \ldots , k$, $D_{12}^2 =(x_2 -x_1)^2+(y_2 -y_1)^2>0$ and $-\infty < u_j^B, v_j^B < \infty$. Here $(u^B_j, v^B_j)^T$ are the Bookstein coordinates and $T$ denotes the transpose.

If the baseline is taken as (0, 0) and (1, 0), then there is no $-0.5$ in the equation for $u_j^B$ [8].
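Equations (1) and (2) can be sketched in Python for an arbitrary baseline pair, which is what changing the baseline amounts to. This is a minimal illustration, not the authors' R code; the function name `bookstein_coordinates` and its parameters `base` and `centered` are assumptions introduced here, and landmark indices are zero-based.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bookstein_coordinates(
    landmarks: Sequence[Point],
    base: Tuple[int, int] = (0, 1),
    centered: bool = True,
) -> List[Point]:
    """Return the Bookstein coordinates of all non-baseline landmarks.

    `base` holds the zero-based indices of the two baseline landmarks
    (hypothetical parameter; the text uses landmarks 1 and 2).
    With centered=True the baseline is sent to (-0.5, 0) and (0.5, 0);
    otherwise it is sent to (0, 0) and (1, 0).
    """
    a, b = base
    x1, y1 = landmarks[a]
    x2, y2 = landmarks[b]
    dx, dy = x2 - x1, y2 - y1
    d2 = dx * dx + dy * dy  # squared baseline length, D_12^2
    if d2 <= 0:
        raise ValueError("baseline landmarks must be distinct")
    shift = 0.5 if centered else 0.0

    coords: List[Point] = []
    for j, (xj, yj) in enumerate(landmarks):
        if j in (a, b):
            continue  # baseline landmarks carry no shape information
        u = (dx * (xj - x1) + dy * (yj - y1)) / d2 - shift  # Eq. (1)
        v = (dx * (yj - y1) - dy * (xj - x1)) / d2          # Eq. (2)
        coords.append((u, v))
    return coords


triangle = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
print(bookstein_coordinates(triangle, base=(0, 1)))  # [(0.0, 0.5)]
print(bookstein_coordinates(triangle, base=(0, 2)))  # [(0.5, -1.0)]
```

The two printed results describe the same triangle, yet the shape variables differ once the baseline changes; this is the dependence that the present study examines.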

2.2 Two Independent Sample Hotelling’s T² Test

Consider two random samples $X_1, X_2, \ldots , X_{n_1}$ and $Y_1, Y_2, \ldots , Y_{n_2}$ from independent populations with mean shapes $[\mu _1]$ and $[\mu _2]$. To test $H_0:\, [\mu _1]=[\mu _2]$ against $H_1:\, [\mu _1]\ne [\mu _2]$, Hotelling’s $T^2$ test can be performed.

Hotelling’s $T^2$ two-sample test based on Bookstein coordinates has been explicitly described. Consider two samples of shapes with sizes $n_1$ and $n_2$. The multivariate normal model is proposed for Bookstein coordinates (M-vectors) $v_1, \ldots , v_{n_1}$ and $w_1, \ldots , w_{n_2}$:

$$\begin{aligned}&v_i \sim N_M (\xi _1, \Sigma ),\quad w_j \sim N_M (\xi _2, \Sigma ),\\&\quad i = 1, {\ldots }, n_1;\,\, j = 1, {\ldots }, n_2, \end{aligned}$$

and the $v_i$ and $w_j$ are all mutually independent; a common covariance matrix $\Sigma$ is assumed, and $M$ is the dimension of the shape space (which is $2k-4$ for planar data). We write $\bar{v}$, $\bar{w}$ and $S_v$, $S_w$ for the sample means and sample covariance matrices (with divisors $n_1$ and $n_2$) in each group. The squared Mahalanobis distance between $\bar{v}$ and $\bar{w}$ is

$$D^2=(\bar{v}-\bar{w})^T S_u^{-1} (\bar{v}-\bar{w}), \tag{3}$$

where $S_u =(n_1 S_v +n_2 S_w)/(n_1 +n_2 -2)$ and $S_u^{-1}$ is the inverse of $S_u$, which will usually exist. Under $H_0$, $\xi _1 =\xi _2$, and the test statistic given below is used.

$$F=\frac{n_1 n_2 (n_1 +n_2 -M-1)}{(n_1 +n_2)(n_1 +n_2 -2)M}D^2 \tag{4}$$

The test statistic has an $F_{M, n_1 +n_2 -M-1}$ distribution under $H_0$. Hence, $H_0$ is rejected for large values of F [8].
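As an illustration of Eqs. (3) and (4), the following dependency-free Python sketch computes $D^2$, $F$ and the degrees of freedom from two samples of $M$-vectors. The function names and the small Gaussian-elimination helper are assumptions made for this example and are not taken from the original study.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _mean(sample: Sequence[Vector]) -> List[float]:
    n, m = len(sample), len(sample[0])
    return [sum(row[c] for row in sample) / n for c in range(m)]


def _scatter(sample: Sequence[Vector], mean: Vector) -> List[List[float]]:
    """Sum of outer products of deviations, i.e. n * S with divisor n."""
    m = len(mean)
    s = [[0.0] * m for _ in range(m)]
    for row in sample:
        d = [row[c] - mean[c] for c in range(m)]
        for r in range(m):
            for c in range(m):
                s[r][c] += d[r] * d[c]
    return s


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def hotelling_two_sample(
    v: Sequence[Vector], w: Sequence[Vector]
) -> Tuple[float, float, Tuple[int, int]]:
    """Return (D^2, F, (M, n1 + n2 - M - 1)) following Eqs. (3) and (4)."""
    n1, n2, m = len(v), len(w), len(v[0])
    if n1 + n2 - m - 1 <= 0:
        raise ValueError("need n1 + n2 > M + 1")
    vbar, wbar = _mean(v), _mean(w)
    sv, sw = _scatter(v, vbar), _scatter(w, wbar)
    # S_u = (n1 S_v + n2 S_w) / (n1 + n2 - 2)
    su = [[(sv[r][c] + sw[r][c]) / (n1 + n2 - 2) for c in range(m)]
          for r in range(m)]
    diff = [vbar[c] - wbar[c] for c in range(m)]
    d2 = sum(dc * zc for dc, zc in zip(diff, _solve(su, diff)))  # Eq. (3)
    f = n1 * n2 * (n1 + n2 - m - 1) / ((n1 + n2) * (n1 + n2 - 2) * m) * d2
    return d2, f, (m, n1 + n2 - m - 1)
```

The p value is then the upper-tail probability of the $F_{M, n_1+n_2-M-1}$ distribution at the observed $F$; a statistics library would normally supply it (for example `scipy.stats.f.sf`, which is left out here to keep the sketch free of dependencies).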

2.3 Sample

To investigate the effect of the baseline choice on the results of the Hotelling $T^{2}$ test, a total of six landmarks, marked on the nose images of 100 subjects, were included in the present study (Fig. 1). The landmarks were chosen from among the anatomical landmarks of the nose described in [9]. They were selected so that the effects of landmarks located both near to and far from each other could be shown. To examine the effect of the baseline choice for different numbers of landmarks, separate evaluations were performed for 3 (landmarks 1, 2 and 3), 4 (landmarks 1, 2, 3 and 4), 5 (landmarks 1, 2, 3, 4 and 5) and 6 (landmarks 1, 2, 3, 4, 5 and 6) landmarks.

2.4 Design of Simulation

In the present study, simulations were performed according to an isotropic model for four different shape configurations, which comprise 3, 4, 5 and 6 landmarks, respectively. In the simulation study, the mean vectors of landmarks were obtained from a real dataset as below.

$$X=[586.19, 667.34, 624.90, 440.70, 451.41, 416.55] \tag{5}$$

Data were generated from a multivariate normal distribution for the variances 0.01 and 0.05. Additionally, to investigate high variance levels, data were generated for the minimum and maximum variances (737 and 2949) obtained from the real dataset.
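A data-generation step of this kind might look like the sketch below. It assumes that the isotropic model means adding independent normal noise with the same variance to every landmark coordinate; the mean configuration shown is a made-up triangle (not the study's mean vector), and the function name, seed and sample size are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def simulate_configurations(
    mean_shape: Sequence[Point],
    variance: float,
    n: int,
    rng: random.Random,
) -> List[List[Point]]:
    """Draw n landmark configurations around mean_shape.

    Assumption: isotropic noise, i.e. each x and y coordinate receives
    independent N(0, variance) noise.
    """
    sd = variance ** 0.5
    return [
        [(rng.gauss(mx, sd), rng.gauss(my, sd)) for mx, my in mean_shape]
        for _ in range(n)
    ]


# Hypothetical mean configuration of three landmarks (illustrative values).
mean_shape = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
rng = random.Random(2015)  # fixed seed so the draw is reproducible
sample = simulate_configurations(mean_shape, variance=0.01, n=25, rng=rng)
print(len(sample), len(sample[0]))  # 25 3
```

Each simulated configuration can then be converted to Bookstein coordinates before testing.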

After the data were generated, the Hotelling T² test was performed on the Bookstein coordinates. For the shape configuration consisting of three landmarks, these coordinates were obtained from the same dataset by taking, in turn, landmarks 1 and 2, landmarks 1 and 3, and landmarks 2 and 3 as the baseline landmarks (Fig. 2).

The same application was performed for the other shape configurations, which consist of 4, 5 and 6 landmarks, by taking all of the possible landmark pairs as baseline landmarks.
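The number of possible baseline pairs grows as $k(k-1)/2$ with the number of landmarks $k$. A short Python sketch (the helper name `baseline_pairs` is an assumption for illustration) enumerates them for the four configurations studied:

```python
from itertools import combinations
from typing import List, Tuple


def baseline_pairs(k: int) -> List[Tuple[int, int]]:
    """All unordered pairs of landmarks 1..k that can serve as a baseline."""
    return list(combinations(range(1, k + 1), 2))


for k in (3, 4, 5, 6):
    print(k, len(baseline_pairs(k)))
# 3 3
# 4 6
# 5 10
# 6 15
```

With six landmarks the same dataset is therefore tested 15 times, once per baseline pair.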

Maximum change rates were calculated as the percentage change between the maximum and the minimum p values obtained from the tests performed on the same individual (shape) with different landmark combinations taken as baseline landmarks. The change rates obtained for all of the individuals (shapes) were compared according to the landmark number at different sample sizes and different variance levels. The Kruskal–Wallis test and the Mann–Whitney U test were used in these comparisons. Bonferroni correction was applied to the post hoc comparison results ($\alpha ^{*}=\alpha /k=0.05/6\approx 0.008$, where $k=6$ here is the number of pairwise comparisons among the four landmark numbers, not the number of landmarks).
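The text does not give the exact formula for the maximum change rate. The sketch below assumes it is the difference between the largest and smallest p values expressed as a percentage of the smallest one; that choice of denominator, like the function names, is an assumption made for this example. The Bonferroni threshold follows from the six pairwise comparisons among four groups.

```python
from math import comb
from typing import Sequence


def max_change_rate(p_values: Sequence[float]) -> float:
    """Percentage change from the minimum to the maximum p value.

    Assumption: the change is taken relative to the minimum p value.
    """
    lo, hi = min(p_values), max(p_values)
    if lo <= 0:
        raise ValueError("p values must be positive")
    return 100.0 * (hi - lo) / lo


def bonferroni_alpha(alpha: float, n_groups: int) -> float:
    """Per-comparison level for all pairwise comparisons among n_groups."""
    return alpha / comb(n_groups, 2)


# Hypothetical p values from three baseline choices on one dataset.
print(round(max_change_rate([0.04, 0.05, 0.06]), 6))  # 50.0
print(round(bonferroni_alpha(0.05, 4), 4))            # 0.0083
```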

In the simulation study, 1,000,000 repetitions were performed for each of the sample sizes $n=25$, 50, 100, 250, 500 and 1000. Simulations were performed on an HP Z800 Workstation with a 2.40 GHz dual processor and 16 GB of RAM, using R software.

3 Results

Simulation results are given in Table 1 for $\sigma ^{2}=0.01$, in Table 2 for $\sigma ^2=0.05$, in Table 3 for $\sigma ^{2}=737$ and in Table 4 for $\sigma ^2=2949$.

Table 1 Statistical comparison of the maximum percentage change in the p values according to the number of landmarks, for a variance of 0.01

Table 2 Statistical comparison of the maximum percentage change in the p values according to the number of landmarks, for a variance of 0.05

Table 3 Statistical comparison of the maximum percentage change in the p values according to the number of landmarks, for a variance of 737

Table 4 Statistical comparison of the maximum percentage change in the p values according to the number of landmarks, for a variance of 2949

In both large and small sample sizes, the changes in the p values increase as the landmark number increases. An increase in the variance also increases the changes in the p values. In the present study, the smallest change in the p values was obtained for the lowest variance level ($\sigma ^2=0.01$) with three landmarks, whereas the largest change was obtained for the highest variance level, taken from the real dataset ($\sigma ^2=2949$), with six landmarks.

4 Discussion

Morphometrics is a field concerned with the variation and change in the form (size and shape) of objects and organisms. Various methods are used for the analysis of shapes. These methods can be classified as traditional methods, which use linear distances, ratios or angles, or as modern geometric morphometric methods, which include statistical shape analysis applications based on landmarks [1, 10, 11]. One of the approaches in modern geometric morphometrics is the use of Bookstein coordinates. Bookstein coordinates are the coordinates of the remaining $k-2$ landmarks after removing the similarity transformations by translating, rotating and rescaling such that landmarks 1 and 2 are sent to a fixed position. The choice of baseline landmarks is debated when two groups are compared with the Hotelling $T^{2}$ test based on Bookstein coordinates.

No studies similar to the present one were found in the literature, but Rohlf [12] stated that different choices for the base (reference) landmarks of the shape lead to somewhat different results. The present study was conducted to investigate this issue. For this reason, we performed a simulation study with different landmark numbers and changing baseline landmarks, for low and high variance levels and for small, moderate and large sample sizes.

In the present study, significant differences were observed in terms of the variations in the p values derived from the changes in the baseline landmarks, according to the landmark numbers. These differences were observed in both high and low variance levels and in small, moderate and large sample sizes. It was observed that the variations in the p values increased with an increase in the landmark number. When the effect of the increment of the landmark number on the p values was examined in the low and high variance levels, major variations were observed in the p values, especially at high variance levels.

5 Conclusion

Although working with a small number of landmarks reduces the effect of baseline selection on the p values, this approach can be insufficient for describing the shape. The more landmarks are used, the greater the power to describe the shape. However, the results of the present study, which is based on Bookstein coordinates, show that as the landmark number increases, the number of possible baseline landmark combinations also increases and, for that reason, substantial variations occur in the p values. Therefore, it is important to determine which landmarks should be taken as reference landmarks when using Bookstein coordinates.

References

1. Ercan I, Ocakoglu G, Sigirli D, Ozkaya G (2012) Statistical shape analysis and usage in medical sciences. Turk Klin J Biostat 4:27–35
2. Bookstein FL (1991) Morphometric tools for landmark data: geometry and biology. Cambridge University Press, Cambridge
3. Richtsmeier JT, Cheverud JM, Lele S (1992) Advances in anthropological morphometrics. Annu Rev Anthropol 21:283–305
4. Bookstein FL (1984) A statistical method for biological shape comparisons. J Theor Biol 107:475–520
5. Hammer Ø, Harper DAT (2005) Paleontological data analysis. Wiley, Massachusetts
6. Baskan EB, Ercan I, Ozdemir ST, Ozkaya G, Yilmaz M, Ozgenel Y, Saricaoglu H (2013) A case of Parry–Romberg syndrome: examining efficacy of lipofilling by statistical shape analysis. Turk Klin J Med Sci 33:284–289
7. Rohlf FJ (2000) Statistical power comparisons among alternative morphometric methods. Am J Phys Anthropol 111:463–478
8. Dryden I, Mardia KV (1998) Statistical shape analysis. Wiley, Chichester
9. Ercan I, Etoz A, Guney I, Ocakoglu G, Ozdemir ST, Kan I, Kahveci R (2007) Statistical shape analysis of nose in Turkish young adults. J Craniofac Surg 18(1):219–224
10. Ocakoglu G, Ercan I (2013) Traditional and modern morphometrics. Turk Klin J Biostat 5:37–41
11. Sigirli D, Ercan I (2013) Growth and allometry in modern morphometrics. Turk Klin J Biostat 5:42–48
12. Rohlf FJ (1999) Shape statistics: Procrustes superimpositions and tangent spaces. J Classif 16:197–223

Acknowledgments

This study was supported by Uludag University Scientific Research Project Committee UAP(T)-2010/34.

Author information

Authors and Affiliations

Department of Biostatistics, Faculty of Medicine, Uludag University, Gorukle Campus, 16059, Bursa, Turkey
Ilker Ercan, Deniz Sigirli & Guven Ozkaya

Corresponding author

Correspondence to Ilker Ercan.

Ercan, I., Sigirli, D. & Ozkaya, G. Examining the Variations in the Results of the Hotelling T ² Test in Case of Changing Baseline Landmarks in the Bookstein Coordinates. Interdiscip Sci Comput Life Sci 7, 186–193 (2015). https://doi.org/10.1007/s12539-015-0025-y

Received: 20 March 2014
Revised: 15 August 2014
Accepted: 24 October 2014
Published: 22 July 2015
Issue Date: June 2015
DOI: https://doi.org/10.1007/s12539-015-0025-y
