1 Introduction

The skull, as an important biological feature of the human body, is not easily altered by external factors such as facial expression and lighting. After the skull reaches maturity (at about \( 16\sim 25 \) years of age), its features generally no longer change; they are stable and not easily destroyed. At the same time, the skull is unique and hard to forge. Therefore, similarity results based on the skull are more stable and secure than those based on other biological features.

In the process of craniofacial reconstruction, similarity comparison of skulls can be used to improve the accuracy of the reconstruction results. That is, according to the features of the skull to be reconstructed, a group of skulls similar to the unknown skull is found by similarity calculation, and the statistical craniofacial model is then built from the corresponding skins, so that improved craniofacial reconstruction results are obtained.

In actual criminal investigation, skull similarity comparison can be used to find the most similar skull, or to exclude the most dissimilar skulls, in a database. This assists craniofacial reconstruction and identification by experts, witnesses and the families of the victims, and thus plays an active role in investigating a case. In archaeology, finding the corresponding features of a similar skull to guide craniofacial reconstruction also helps to reconstruct the appearance of an ancient person from his or her skull. Skull similarity comparison methods likewise play an active role in medical cosmetology, biology and other fields.

In this paper, we propose a new method of skull similarity comparison based on SPCA. We extract skull features with SPCA and compare the similarity of these features using the mean square error (MSE) and the normalized inner product. Compared with PCA, SPCA can not only effectively reduce the data dimension but also produce sparse principal components that are easy to interpret. SPCA makes the loadings sparse and thus highlights the main part of each principal component, which addresses the difficulty of interpreting PCA results. The comparison results show that skull similarity comparison based on SPCA is more accurate and stable than comparison based on PCA, which provides an effective direction for improving the accuracy of craniofacial reconstruction and obtaining better reconstruction results.

2 Related works

At present, research on computer-assisted craniofacial reconstruction mainly focuses on the reconstruction methods themselves. There are few studies on the similarity evaluation of skulls, and so far there is no uniform standard for judging the accuracy of a skull similarity evaluation method.

2.1 Skull similarity

Because of the complex topological structure of a skull, calculating the similarity between two skulls is a difficult problem. Sun et al. [20] proposed a two-dimensional skull recognition technique based on CT images, in which the relative distance is used as the feature to identify the eyelids and the polar-radius invariant moment is used for horizontal and vertical comparison before the similarity of the skulls is evaluated. Zhao et al. used multiple fitting [40], wavelet analysis [41], wavelet analysis of skull gray-level features [42], quadratic rational Bezier curve fitting [43] and the Radon transform [46] for skull similarity analysis of X-ray images. These methods perform skull recognition on two-dimensional images, so they lose some of the original information of the three-dimensional skull and their evaluation results lack comprehensiveness. Alessio et al. [27] proposed an objective method for comparing skulls with face photographs. Jin et al. [17] put forward a method to evaluate the similarity between a skull and a skin surface but did not compare the similarity between two skulls.

Subjective methods, in which similar skulls are chosen through human observation, are mostly used when selecting skulls for craniofacial reconstruction. However, such methods cannot ensure the objectivity of the evaluation results, and they are time-consuming and laborious. A few scholars have used objective methods to calculate skull similarity. Faisan [7] used a hidden Markov statistical model to represent the location of skull-related regions and compared skull abnormalities; this method is applied to cranio-maxillofacial plastic surgery and only considers the locations of the regions, not their shapes. Shui [28] defined a geometric skull similarity based on skull feature points, using the Euclidean distance between the two skulls' feature matrices as the basis for judging similarity; the similarity values calculated by this method depend on the selected feature points. The objective calculation of skull similarity therefore needs further study.

2.2 Craniofacial similarity

Subjective methods are also used in the similarity evaluation of craniofacial reconstruction results. Snow [29] invited more than 200 participants, and Stephan et al. [30] invited 37 participants, to perform subjective similarity evaluations of craniofacial reconstruction results; the latter study showed that male participants were significantly more accurate than female participants. Helmer [11], Claes [3] and Vanezis et al. [32] also invited respondents directly to evaluate craniofacial reconstruction results. Although the conclusions of such evaluations can agree with general human cognition, the accuracy of the results is subject to human factors, and the evaluations cost a great deal of labor, material resources and time.

Both the skull and the skin are three-dimensional models. From this point of view, some scholars have compared the differences between two craniofacial surfaces as a comparison of three-dimensional models in order to evaluate craniofacial reconstruction results. Feng et al. [9] used the relative angle histogram to compare the data of two reconstructed craniofacial regions and thus obtain a similarity evaluation between the reconstructed craniofacial surfaces. Li [23] combined an improved ICP algorithm with the relative angle histogram method to evaluate reconstructed craniofacial similarity. Ip et al. [16] used the extended Gaussian image (EGI) to extract 3D craniofacial features and map them into a matrix in order to measure craniofacial features and obtain similarity evaluation results. Wong et al. [34] also used the Gaussian graph to evaluate the geometric features of the reconstructed craniofacial surface. Zhai [37] used the Euclidean distance matrix analysis method (EDMA) to describe the morphological features of an object and carry out similarity evaluation; the coordinate measurement is not restricted by the choice of coordinate axes, which reduces the error caused by fixed coordinates during measurement. Li et al. [22, 24] proposed a method for evaluating the similarity of reconstructed craniofacial surfaces based on iso-geodesic stripes. Zhao et al. [39] proposed local and global craniofacial reconstruction evaluation methods based on a geodesic grid. Zhu et al. [44] used principal warps to evaluate craniofacial geometric similarity by comparing the bending deformation between a craniofacial surface and the other craniofacial surfaces in the database; the mapping is established with the thin-plate spline function, so that the corresponding bending deformation matrix can be calculated. Liang et al. [21] automatically marked feature points on the skull to obtain a good reconstruction effect and used global and local feature similarity evaluation methods on the reconstructed craniofacial surface to improve the accuracy of evaluation.

2.3 SPCA

At present, some scholars have applied statistical methods to facial similarity evaluation. Yuan [36] used PCA to normalize the three-dimensional shape extracted from two-dimensional texture and three-dimensional face images, and then used fuzzy clustering and a neural network to recognize the face. Theodoros et al. [26] used registration and PCA to evaluate the results of 3D face recognition: the face is first registered to obtain a standardized template, and PCA is then used to build a model that reduces the dimension of the face space and computes face similarity. These classical PCA methods can automatically extract the important information, reduce the dimension of the original data and simplify the calculation, but a single principal component remains difficult to interpret.

Sparse principal component analysis (SPCA) is an algorithm introduced to solve the problem that a single principal component of PCA is difficult to interpret. The main idea is to add a penalty function or sparsity constraint to PCA, forcing some loadings of the principal components to be close or equal to zero and thus obtaining sparse principal components. These sparse principal components highlight the important parts of the principal components while maintaining PCA's variance-maximization property. At the same time, the PCA problem can be transformed into an optimization problem. In recent years, many scholars have studied the algorithms and applications of sparse principal component analysis, and many valuable research results have been obtained; different constraints and penalty functions are used in the different sparse principal component analysis methods. In 1995, Cadima et al. [2] proposed improving the PCA algorithm by setting to zero the loadings whose absolute values are less than a threshold, thus obtaining the original sparse principal components; this method simplifies the principal components and increases their interpretability, but the error is large. In 1995, Jolliffe [18] proposed a heuristic SPCA method that obtains sparse loadings by rotating the principal components; the interpretability of the principal components is strengthened, but the variance of the principal components is not diminished. In 1996, Tibshirani [31] proposed a new variable selection method called Lasso, which uses the absolute values of the model coefficients as a penalty function to compress the coefficients with smaller absolute values to zero, thereby selecting the main variables and estimating the corresponding parameters to produce a precise and sparse model. In 2000, Vines et al. [33] proposed a simple principal component method that directly restricts the loadings to a discrete set, such as \( \left \{ -1,0,1 \right \}\). In 2003, Zou and Hastie [45] proposed the elastic net, which introduces an L2 penalty on the basis of Lasso; it can effectively select the model and handle the case in which the number of independent variables is larger than the sample size. In 2003, Jolliffe et al. [19] applied Lasso's idea to PCA in the SCoTLASS algorithm, the first optimization-based sparse PCA algorithm, which adds L1-norm constraints to the loadings of PCA; exact zero elements are obtained through rigorous calculation to produce sparse principal components, which overcomes the shortcomings of rotated principal components, but the method lacks effective algorithmic support. In 2004, Efron, Hastie et al. [6] proposed the least angle regression algorithm (LARS), which shows that the Lasso estimate is a piecewise linear function and solves the whole Lasso solution path with the same order of computation as a single least squares fit, thus solving the computational problem of Lasso. In 2006, Zou et al. [15] put forward the concept of SPCA based on elastic net regularization and variable selection [14], showing that the PCA problem can be transformed into a regression-type optimization problem with a quadratic penalty, whose sparse loadings are obtained by adding a Lasso or elastic net penalty to the regression coefficients. This algorithm is sensitive to the number of extracted principal components and to the selection of the sparsity parameters.

The studies in the related works either compare skull similarity on images, which ignores the depth information of the skull despite its complex topology, or rely on subjective evaluation, which may be affected by external factors such as experience and professional knowledge and therefore cannot guarantee the objectivity of the comparison results. Considering the feature-extraction capability of PCA and SPCA, this paper proposes a method for comparing skull similarity based on 3D point cloud skull data. SPCA is used to reduce the dimensionality of the skull data, and skull similarity is compared using the dimension-reduced data. SPCA can not only effectively reduce the dimension of the data but also generate sparse principal components that facilitate interpretation of the results.

3 Materials

3.1 Skull data acquisition and preprocessing

The study was conducted on 208 CT scans of skulls of Han Chinese volunteers in northern China, with participants aged 19 to 75. The skull data were obtained with the clinical multi-layer EDS scan system (Siemens Sensation 16). The scanned data of each subject are stored in DICOM format, and the size of each slice image is 512 × 512. After filtering the noise, the Sobel operator is used to process the original CT slice images (as shown in Fig. 1) and extract the skull contour (as shown in Fig. 2) [5]. The 3D skull surfaces are reconstructed by the marching cubes algorithm [25] and represented as triangular meshes with approximately 150,000 to 220,000 vertices (as shown in Fig. 3). All the heads are essentially complete; specifically, each skull contains all the bones from the cranium to the jaw, with a complete set of teeth and no missing facial parts. A mesh editing step is also included to refine the skull surfaces, such as removing background noise and fixing topological or geometric errors. For more details on the data processing, see reference [4]. (The marching cubes algorithm and the Sobel operator are generally applicable to other scans as well.)
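For readers who wish to reproduce this preprocessing step, the sketch below outlines slice-wise Sobel edge extraction followed by marching cubes surface reconstruction with SciPy and scikit-image. It is a minimal illustration, assuming the DICOM series has already been loaded into a 3D intensity array `volume`; the bone threshold and function names are illustrative, not the authors' exact pipeline.

```python
import numpy as np
from scipy import ndimage
from skimage import measure

def extract_skull_mesh(volume, bone_threshold=300.0, spacing=(1.0, 1.0, 1.0)):
    """Sobel edge enhancement of CT slices, then marching cubes reconstruction."""
    vol = volume.astype(float)
    # Slice-wise Sobel gradients emphasize the skull contour in each CT image
    gx = ndimage.sobel(vol, axis=2)
    gy = ndimage.sobel(vol, axis=1)
    edges = np.hypot(gx, gy)

    # Keep voxels that are bright enough to be bone and lie on a strong edge
    mask = (vol > bone_threshold) & (edges > edges.mean())

    # Marching cubes turns the binary volume into a triangular surface mesh
    verts, faces, normals, values = measure.marching_cubes(
        mask.astype(float), level=0.5, spacing=spacing)
    return verts, faces
```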

Fig. 1 A head slice image captured by CT scanner

Fig. 2 The extracted skull contour

Fig. 3 The reconstructed 3D skull triangle mesh models

3.2 Unified coordinate system establishment and data standardization

In order to eliminate inconsistencies of position, pose and scale in data acquisition, all skulls are transformed into a unified coordinate system. The unified coordinate system is determined by four skull landmarks: the left porion, the right porion, the left (or right) orbitale and the glabella (denoted LP, RP, LO and G). The Frankfurt plane is identified from the three points LP, RP and LO [13, 5]. The coordinate origin (denoted O) is the intersection of the line LPRP with the plane that contains the point G and intersects the line LPRP orthogonally. We take the line ORP as the x-axis. The z-axis passes through the point O in the normal direction of the Frankfurt plane, and the y-axis is obtained as the cross product of the z-axis and the x-axis. Once the unified coordinate system (as shown in Fig. 4) is established, all the prototypic skulls are transformed into it through a simple rotation. Finally, the distance between LP and RP is set to one unit, i.e. each vertex \( \left (x,y,z \right ) \) of the skull is divided by \(\left \|L_{P}-R_{P} \right \|\), so that the scale of all samples is standardized.

Fig. 4 A skull in the uniform coordinate system
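As an illustration of this normalization, the following sketch constructs the unified frame from the four landmarks with NumPy. The landmark arrays and the projection of G onto the line LPRP are assumptions consistent with the description above, not code from the original study.

```python
import numpy as np

def to_unified_frame(vertices, LP, RP, LO, G):
    """Rotate, translate and scale skull vertices into the unified coordinate system."""
    # Normal of the Frankfurt plane defined by LP, RP and LO
    normal = np.cross(RP - LP, LO - LP)
    normal /= np.linalg.norm(normal)

    # Origin O: orthogonal projection of G onto the line through LP and RP
    d = (RP - LP) / np.linalg.norm(RP - LP)
    O = LP + np.dot(G - LP, d) * d

    x_axis = (RP - O) / np.linalg.norm(RP - O)   # x-axis along O -> RP
    z_axis = normal                              # z-axis: Frankfurt plane normal
    y_axis = np.cross(z_axis, x_axis)            # y-axis completes the frame

    R = np.vstack([x_axis, y_axis, z_axis])      # rows are the new axes
    scale = np.linalg.norm(LP - RP)              # inter-porion distance = 1 unit
    return (vertices - O) @ R.T / scale
```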

3.3 Skull registration

The original skull meshes have different numbers of vertices, different triangulations and different topological relationships, so the data are difficult to use directly in statistical learning and registration must be performed first. Because of the complex non-rigid deformations between different skull objects, we first choose one skull, cut off its back (as shown in Fig. 5), and use it as a reference; the reference has 41059 vertices. We then use the classic thin plate spline (TPS) transformation for skull registration.

Fig. 5 The reference skull for registration

The basic idea of the TPS registration algorithm is to use a smaller set of corresponding points from the original data as control points to establish the TPS transformation, and then apply the transformation to all data points to realize the shape transformation of the object. The thin plate spline function f is composed of two parts: the first part is an elastic transformation represented by radial basis functions (RBF), and the second part is a global affine transformation. The formula is as follows:

$$ f\left (x,y,z \right )= R_{s}\left (x,y,z \right )+\phi_{s}\left (x,y,z \right ) =\sum\limits_{i=1}^{n}\alpha_{i}U\left (\left \| {{P}_{i}^{t}}-\left (x,y,z \right ) \right \| \right )+\beta_{1}+\beta_{2}x+\beta_{3}y+\beta_{4}z $$
(1)

Where n is the number of feature points, \(U\left (\left \| {{P}_{i}^{t}}-\left (x,y,z \right ) \right \| \right )= \left \| {{P}_{i}^{t}}-\left (x,y,z \right ) \right \| \) is the Euclidean distance between the feature point \({P_{i}^{t}}=\left (x_{i},y_{i},z_{i} \right )\) and the point \(\left (x,y,z \right )\), and \(\alpha _{i}\left (i=1,{\cdots } ,n \right )\) and \(\beta _{i}\left (i=1,2,3,4\right )\) are the weights to be calculated.

For the elastic transformation part, there are four additional boundary conditions, which are respectively expressed as follows:

$$ \sum\limits_{i=1}^{n}\alpha_{i}=0,\sum\limits_{i=1}^{n}\alpha_{i}x_{i}=0,\sum\limits_{i=1}^{n}\alpha_{i}y_{i}=0,\sum\limits_{i=1}^{n}\alpha_{i}z_{i}=0 $$
(2)

Once the unknown variables \(\alpha _{i}\left (i=1,{\cdots } ,n \right )\) and \(\beta _{i}\left (i=1,2,3,4\right )\) are determined, the TPS function is obtained and can be applied to the reference point set. Then, applying the TPS transformation iteratively makes the source 3D skull model gradually deform toward the target model. Finally, we register the skull surface by finding the corresponding point of each reference skull vertex on the target skull.

The above TPS algorithm can generally be used for non-rigid surface registration. In this study, we manually marked 17 anatomical feature points as the control points of the TPS transformation and achieved good registration results. A TPS skull registration method based on automatically marked feature points can be realized according to reference [12]; more details on the method are described in references [12, 13]. After registration, all skulls have the same number of points as the reference skull.
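A self-contained numerical sketch of the TPS transformation of Eqs. (1)-(2) is given below, assuming `src` and `dst` are matching n × 3 arrays of control points (e.g. the 17 anatomical landmarks) and using the kernel U(r) = r from above; it is illustrative and not the registration code used in this study.

```python
import numpy as np

def fit_tps(src, dst):
    """Solve Eqs. (1)-(2) for the RBF weights alpha (n x 3) and affine weights beta (4 x 3)."""
    n = src.shape[0]
    K = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)  # U = pairwise distances
    P = np.hstack([np.ones((n, 1)), src])                           # affine basis [1, x, y, z]
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T                                                 # boundary conditions, Eq. (2)
    rhs = np.vstack([dst, np.zeros((4, 3))])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]

def apply_tps(points, src, alpha, beta):
    """Warp arbitrary points with the fitted thin plate spline, Eq. (1)."""
    U = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ alpha + P @ beta
```

Iterating `fit_tps`/`apply_tps` with progressively refined correspondences, as described above, deforms the source skull toward the target before the final point correspondence is established.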

4 Methods

SPCA is an improved algorithm based on PCA that can be written as a regression optimization problem and can integrate the Lasso or elastic net penalty directly into the regression criterion. Because PCA's principal component loadings can be recovered from a regression optimization problem, sparse loadings can be obtained by constraining the regression coefficients with the Lasso or the elastic net. The method used in this paper proceeds as follows. First, all the skull data are preprocessed into standard skull data. Second, SPCA is used to reduce the dimension of the three-dimensional skull training set and obtain the sparse principal components, and the skull test set data are projected into the sparse principal component space to obtain the dimension-reduced test set data. Third, the distance between two skulls in the dimension-reduced test set is calculated with the MSE formula or the normalized inner product formula. Finally, according to the distances obtained in the third step, the most similar skull is found. The entire process is shown in Fig. 6.

Fig. 6 Method overview

4.1 PCA

PCA uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Suppose X = (x1,x2,⋯ ,xp) is the data of the original variables (the original data have p indicators), and Y = (y1,y2,⋯ ,yp) is a set of new features formed by linear combinations of the original variables. The PCA algorithm uses the equation Y = XV to transform the original variables into the new variables Y, where the k-th principal component can be expressed as:

$$ y_{k}=x_{1}v_{1k}+x_{2}v_{2k}+\cdots+x_{p}v_{pk} $$
(3)

The vik in (3) is called the loading of the i-th index xi on the k-th principal component yk. The loadings vik, i = 1,2,⋯ ,p constitute the loading vector vk of the k-th principal component yk, that is, the k-th column of the mapping matrix V. The mapping matrix V is also called the loading matrix, and (3) can be written as:

$$ y_{k}=X{\mathbf{v_k}} $$
(4)

The principal components obtained by PCA are linear combinations of the original variables, and the principal components are uncorrelated; that is, the principal components retain the information of the original variables to the greatest extent, and the amount of information they contain can be expressed by the variance contribution rate. Normally, all the loadings are non-zero, which means each principal component depends on all the original variables. This leads to PCA's difficulty in explaining a single principal component. To address this problem, it is necessary to obtain sparse principal components that are easy to interpret.
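For concreteness, the following fragment shows how the projection Y = XV of Eq. (4) is typically obtained with scikit-learn's PCA; the row-wise layout (one flattened skull per row) and the random placeholder matrix are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(108, 123177)        # placeholder: one flattened skull per row
pca = PCA(n_components=60)
Y = pca.fit_transform(X)               # scores y_k = X v_k, Eq. (4)
V = pca.components_.T                  # loading matrix V (one v_k per column)
print(np.count_nonzero(V) / V.size)    # loadings are essentially all non-zero
```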

4.2 SPCA

SPCA is an improved algorithm based on PCA that can be written as a regression optimization problem and can integrate the Lasso or elastic net penalty directly into the regression criterion. Because PCA's principal component loadings can be recovered from a regression optimization problem, sparse loadings can be obtained by constraining the regression coefficients with the Lasso or the elastic net.

Sparse principal components are required not only to keep as much of the original information as possible, but also to keep the loadings sparse.

Consider a general linear regression model, assuming we have data (Xi,yi), i = 1,2,⋯ ,N, where Xi = (xi1,xi2,⋯ ,xip)T and yi, i = 1,2,⋯ ,N, are the predictors and response variables, respectively. The ordinary least squares method can be written as:

$$ b_{OLS}=arg \min_{b}\left \| y-Xb \right \|^{2} $$
(5)

However, the ordinary least squares estimate often has high variance, its prediction accuracy is limited, and it is not easy to interpret. With a large number of predictors, we hope to determine a minimal subset that reflects the most prominent information in the original data. To improve on the defects of least squares, ridge regression was introduced; it is a linear regression method that continuously shrinks the coefficients by penalizing their size. The ridge regression method [35][8] can be written as:

$$ b_{ridge}=arg \min_{b}\left \| y-Xb \right \|^{2}+\lambda \left \| b \right \|^{2} $$
(6)

Where λ is a parameter that controls the amount of shrinkage: the larger the value of λ, the greater the shrinkage and the closer the coefficients move toward zero. Although ridge regression is relatively stable, it does not set any coefficients exactly to zero, only close to zero, which makes the model difficult to interpret.

Breiman [1] proposed an improvement of the least squares method that starts from the least squares estimate and constrains a shrinkage factor to achieve shrinkage and thus an interpretable model; it is a new method of variable selection and shrinkage. Frank et al. [10] proposed constraining the Lq norm of the parameters with a constant, where q ≥ 0 is a constant; Lasso regression corresponds to q = 1. Lasso can not only continuously shrink the coefficients but also automatically select variables. The Lasso regression method can be written as:

$$ b_{lasso}=arg \min_{b}\left \| y-Xb \right \|^{2}+\delta \left \| b \right \|_{1} $$
(7)

Where δ > 0 controls the sparsity of the loadings: the larger the value of δ, the greater the shrinkage, and when δ is large enough some coefficients shrink exactly to zero. This makes it easy to understand the relationship between the dependent variable and each predictor and solves the interpretability problem of ridge regression. However, when P > n (where n is the number of samples and P is the number of predictors per sample), the prediction accuracy and the construction of the model are affected.

Subsequently, Zou [45] proposed an improved algorithm for Lasso regression, which can be written as:

$$ b_{EN}=arg \min_{b}\left \| y-Xb \right \|^{2}+\lambda \left \| b \right \|^{2}+\delta \left \| b \right \|_{1} $$
(8)

Where δ > 0 is used to control the sparsity of the loadings. The parameter λ can be adapted flexibly to different types of data sets: for data sets with P > n, set λ > 0; for data sets with P ≤ n, λ can be set to 0.

The elastic net regression method is a new method of regularization and variable selection. It is a combination of ridge regression and Lasso; that is, the L2 norm penalty is added to the Lasso objective, so the L1 and L2 norms are used simultaneously. It can solve both the problems of instability and inconvenient interpretation of the model and the case of data sets with P > n.
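The difference between the three penalties of Eqs. (6)-(8) can be seen in a few lines with scikit-learn; the regularization strengths below are illustrative values, and scikit-learn's objective scales the penalty terms slightly differently from the formulas above.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))                 # P > n setting handled by the elastic net
b_true = np.zeros(200)
b_true[:5] = 3.0
y = X @ b_true + 0.1 * rng.standard_normal(50)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # Ridge shrinks but keeps all coefficients; Lasso and the elastic net zero most of them
    print(type(model).__name__, np.count_nonzero(model.coef_))
```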

According to the SPCA criterion proposed by Zou et al. [45], the PCA problem can be transformed into the following optimization problem:

$$ \left (\widehat{A},\widehat{B} \right )=\arg\min_{A,B}\sum\limits_{i=1}^{n}\left \| x_{i}-AB^{T}x_{i} \right \|^{2}+\lambda \sum\limits_{j=1}^{k}\left \| b_{j} \right \|^{2}+\sum\limits_{j=1}^{k}\delta_{j}\left \| b_{j} \right \|_{1},\quad A^{T}A=I_{k} $$
(9)

Where λ takes the same non-negative value for every principal component in order to adapt to different types of data sets, while δ can be set to a different non-negative value for each principal component to control the sparsity of its loadings. The values of λ and δ must be set in advance.

Although ridge regression can fit the model to a certain extent, it can easily lead to distorted regression results, while Lasso regression can describe the reality represented by the model but the resulting model may be too simple. Elastic net regression, on the one hand, retains the stability of ridge regression when selecting important features and, on the other hand, like Lasso regression, removes the features that have little influence on the dependent variable, so this paper finally chooses the elastic net regression method.

4.3 Similarity comparison

When estimating the similarity between different samples, the “distance” between samples is usually used. In this paper, the mean square error and normalized inner product distance formula are used to compare the similarity between different skulls:

$$ S_{M}(y_{i},y_{j})=\frac{1}{n}\sum\limits_{m=1}^{n}(y_{im}-y_{jm})^{2} $$
(10)
$$ S_{N}(y_{i},y_{j})=\cos\left \langle y_{i},y_{j}\right \rangle=\frac{y_{i}\cdot y_{j}}{\left \| y_{i} \right \|\left \| y_{j} \right \|} $$
(11)

According to formula (10), the greater the mean square error between two skulls, the greater the difference between them, and the smaller the value, the more similar the two skulls are. Formula (11) is the normalized inner product, which uses the cosine of the angle between the two vectors to evaluate the similarity between the two skulls: the smaller the cosine value, the greater the difference between the two skulls, and the larger the value, the more similar they are.
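Both measures translate directly into code; the sketch below assumes y_i and y_j are the L-dimensional projections of two skulls after dimension reduction.

```python
import numpy as np

def mse_similarity(y_i, y_j):
    # Eq. (10): smaller value means more similar skulls
    return np.mean((y_i - y_j) ** 2)

def nip_similarity(y_i, y_j):
    # Eq. (11): cosine of the angle; larger value means more similar skulls
    return np.dot(y_i, y_j) / (np.linalg.norm(y_i) * np.linalg.norm(y_j))
```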

5 Skull similarity comparison based on SPCA

Before using SPCA to compare skull similarity, it must first be ensured that the skull data have been preprocessed, including model repair, unification of the coordinate system, normalization, model cutting and registration, to generate standard skull data. The skull point cloud data are then used in the SPCA-based skull similarity comparison method. A skull is a point cloud composed of p points (in our experiments, p = 41059), each with 3 coordinates x, y, z. To store and process the data easily, the point cloud data of each skull are flattened into a one-dimensional vector containing P (P = 3p = 123177) variables. N skulls are provided for the training set (in our experiments, N = 108). The stored training samples thus form a P × N matrix, and each column of the matrix is the one-dimensional vector of point cloud data of one skull.

Firstly, 108 skulls are used as the training set, and the matrix C composed of the principal component loading vectors is obtained by the PCA algorithm.

Secondly, we use general SPCA algorithm to get sparse principal components.

Step 1: Let A = C(:, 1:k), the loadings of the first k ordinary principal components.

Step 2: Given a fixed A = [a1,a2,⋯ ,ak], solve the following naive elastic net problem for j = 1,2,⋯ ,k:

$$ b_{j}=arg\min_{b}\left (a_{j}-b \right )^{T}X^{T}X\left (a_{j}-b \right )+\lambda\left \| b \right \|^{2}+\delta\left \| b \right \|_{1} $$
(12)

Step 3: For each fixed B = [b1,b2,⋯ ,bk], do the SVD of XTXB = UDVT, then update A = UVT.

Step 4: Repeat Steps 2–3 until B converges.

Step 5: Normalization,\( V_{j}=\frac {b_{j}}{\left | b_{j} \right |} , j=1,2,\cdots ,k \).
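A compact sketch of Steps 1–5 is given below. It exploits the identity (a_j − b)^T X^T X (a_j − b) = ||X a_j − X b||^2, so each b_j in Step 2 is an elastic net regression of X a_j on X. The mapping of (λ, δ) onto scikit-learn's (alpha, l1_ratio) and the fixed iteration count are assumptions, not the exact implementation used in the experiments.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def sparse_pca(X, k=60, alpha=0.01, l1_ratio=0.9, n_iter=50):
    """X: centered N x P data matrix. Returns the P x k sparse loading matrix V."""
    # Step 1: initialize A with the loadings of the first k ordinary principal components
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    A = Vt[:k].T
    B = np.zeros_like(A)
    for _ in range(n_iter):                          # Step 4: iterate until B converges
        # Step 2: elastic net regression of X a_j on X for each component j
        for j in range(k):
            enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                              fit_intercept=False, max_iter=5000)
            enet.fit(X, X @ A[:, j])
            B[:, j] = enet.coef_
        # Step 3: SVD of X^T X B = U D W^T, then update A = U W^T
        U, _, Wt = np.linalg.svd(X.T @ (X @ B), full_matrices=False)
        A = U @ Wt
    # Step 5: normalize each sparse loading vector
    norms = np.linalg.norm(B, axis=0)
    norms[norms == 0] = 1.0
    return B / norms
```

With such a loading matrix V, the test skulls are projected as Y = X_test V before the similarity measures of Section 4.3 are applied, as described next.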

By extracting features with SPCA, L (in our experiments, L = 60) sparse principal components are obtained. The test set matrix X, composed of T skulls, is then projected onto the directions of the sparse principal components, which yields the test set dimensionality reduction matrix Y (T × L), where yi and yj represent the projections of the test skulls xi and xj on the sparse principal components, respectively.

Thirdly, according to the matrix Y, the mean square error formula (10) and the normalized inner product formula (11), we calculate the mean square error value and the normalized inner product value between any two skulls to obtain the similarity matrix S (T × T).

In the mean square error evaluation, we calculate the difference between the two vectors yi and yj in each dimension and use the mean of the squared differences as the result; that is, the mean square error between any two skulls is calculated using formula (10), and the similarity matrix SM is obtained. In the normalized inner product evaluation, we use the cosine of the angle between the two vectors yi and yj as the result; that is, formula (11) is used to calculate the normalized inner product between any two skulls, yielding the similarity matrix SN.

Finally, we determine the skull similarities based on SPCA and compare the results from SM and SN, which give the numbers of the most similar skulls and form the matrix O.

We set \( S_{M} (i,i) = + \infty ,i\in \left \{ 1,2,{\cdots } ,100 \right \} \) in the matrix SM to exclude the comparison of a skull with itself, and then take, for each skull, the skull number that gives the minimum mean square error value to form the matrix OM. Similarly, we set \( S_{N}(i,i)=0,i\in \left \{ 1,2,{\cdots } ,100 \right \} \) and take the skull number that gives the maximum normalized inner product value, forming the matrix ON. In this way the most similar skull to each skull under the SPCA-based similarity comparison is obtained.
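The look-up described above reduces to masking the diagonal and taking a row-wise argmin/argmax; a small sketch, under the assumption that SM and SN are the T × T matrices computed earlier:

```python
import numpy as np

def most_similar(SM, SN):
    SM = SM.copy()
    SN = SN.copy()
    np.fill_diagonal(SM, np.inf)    # exclude the comparison of a skull with itself
    np.fill_diagonal(SN, 0.0)
    OM = np.argmin(SM, axis=1)      # smallest MSE           -> most similar skull
    ON = np.argmax(SN, axis=1)      # largest inner product  -> most similar skull
    return OM, ON
```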

In SPCA, the matrix V obtained from the training set is the sparse principal component matrix, each column of which is a sparse principal component. By adding a sparse principal component to the average skull obtained from the training set and comparing the result with the average skull, the specific region reflected by that sparse principal component can be seen.

Since the sparse principal components obtained by SPCA are linearly correlated, the method of calculating the variance contribution rate in PCA is not suitable for SPCA. Instead, the contribution rate of each sparse principal component in the similarity comparison is calculated; three kinds of importance measures based on the mean square error formula were proposed by Zhao et al. [38]. One of them computes the similarity between the i-th skull and the j-th skull from the reduced-dimension matrix Y using the mean square error formula (10). The proportion of each sparse principal component in the skull similarity comparison can then be calculated according to formula (13), forming the matrix R.

$$ R_{k}=\frac{{\sum}_{i=1}^{T}{\sum}_{j=1}^{T}\left (y_{ik}- y_{jk}\right )^{2}}{{\sum}_{k=1}^{L}{\sum}_{i=1}^{T}{\sum}_{j=1}^{T}\left (y_{ik}- y_{jk}\right )^{2}} $$
(13)

Where Rk is the proportion of the k-th sparse principal component in the similarity comparison. The numerator is the sum of the similarity comparison values of the k-th sparse principal component computed with the mean square error, and the denominator is the corresponding sum over all the sparse principal components. The matrix R is sorted and the position numbers are recorded to form the matrix OR; in this way the proportion of each sparse principal component, i.e. its contribution rate, is obtained when the mean square error is used to measure the degree of similarity.
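Formula (13) vectorizes as follows, assuming Y is the T × L matrix of projections onto the sparse principal components; the final sort giving OR is included as a comment.

```python
import numpy as np

def contribution_rates(Y):
    # Pairwise squared differences in every sparse principal component
    diffs = (Y[:, None, :] - Y[None, :, :]) ** 2     # shape: T x T x L
    per_component = diffs.sum(axis=(0, 1))           # numerator of Eq. (13) for each k
    return per_component / per_component.sum()       # R_k, summing to 1 over k

# R = contribution_rates(Y); OR = np.argsort(R)[::-1] ranks components by contribution
```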

6 Skull similarity comparison experiments and discussion

In our experiments, 108 skulls were used as the training set, and PCA and SPCA were used to obtain the principal component space and the sparse principal component space, respectively. Each skull has 41059 vertices.

6.1 The principal components of PCA and SPCA

Using PCA and SPCA on the 108 training skulls, the principal components and the sparse principal components were obtained, respectively, added to the average skull, and then compared with the average skull to determine the regions they reflect. The results are shown in Table 1.

Table 1 Principal components reflection area

Table 1 shows that the regions reflected by the principal components obtained with the PCA algorithm are large, unconcentrated and too scattered on the average skull, so the principal components extracted by PCA fail to explain the specific areas they reflect. It also shows that the regions reflected by the components of the SPCA algorithm are smaller and more concentrated on the average skull than those of PCA. This demonstrates that the sparse principal components extracted by SPCA correspond to specific areas, and that the specific area reflected by a single sparse principal component is easy to explain.

According to the importance measure method based on the mean square error formula (13), the contribution rate of the sparse principal component is calculated. The results are shown in Figs. 7 and 8.

Fig. 7 PCA principal component contribution rate

Fig. 8 SPCA sparse principal component contribution rate

Figure 7 shows that the principal components obtained by PCA are sorted by variance, so the contribution rate of each principal component decreases in sequence, and the contribution rates of the first few principal components are too large, indicating that they contain a great deal of information; they cannot reflect small areas and are inconvenient for explaining the results. Figure 8 shows that each sparse principal component obtained by SPCA has a small contribution rate, indicating that each sparse principal component contains little information and is convenient for explaining the results.

Therefore, the sparse principal components of SPCA are superior to the principal components of PCA in explaining the results.

6.2 MSE values of PCA and SPCA

Two original skulls M1 and M2 were used, and three additional skulls were generated by the interpolation Fi = (1 − α)M1 + αM2, where α takes the values 0.2, 0.5 and 0.9, respectively. These 5 skulls were used as the test set. From the way the skulls are generated, the following conclusions can be drawn: the similarity between the skulls with labels 1 and 2 is the highest, the similarity between the skulls with labels 1 and 5 is the lowest, and the similarity between the skulls with labels 1 and 2 is higher than that between the skulls with labels 1 and 3. The test set skulls are shown in Table 2.
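Since all skulls share the same vertex correspondence after registration, the interpolated test skulls can be generated directly on the flattened vertex vectors; a short sketch, with M1 and M2 assumed to be such vectors:

```python
import numpy as np

def interpolate_skulls(M1, M2, alphas=(0.2, 0.5, 0.9)):
    """Generate the intermediate test skulls F_i = (1 - alpha) * M1 + alpha * M2."""
    return [(1.0 - a) * np.asarray(M1) + a * np.asarray(M2) for a in alphas]
```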

Table 2 Generated skull data set

The mean square error values are calculated according to formula (10) to compare the similarities of any two skulls. The calculation results are shown in Tables 3 and 4 and Figs. 9 and 10.

Table 3 MSE of PCA
Table 4 MSE of SPCA
Fig. 9 Comparison results on MSE of PCA

Fig. 10 Comparison results on MSE of SPCA

Tables 3 and 4 show that the experimental conclusions are consistent with the theoretical ones: the labels of the most similar skulls agree with the theory, so the similarity comparison results in this experiment are correct. The skull similarities computed with the mean square error after PCA or SPCA are both consistent with the theory and give correct similarity comparison results, but the ranges of their magnitudes differ greatly.

Figure 9 shows that the range of mean square error values obtained with PCA is [0, 1.2 × 10^5], which is too large. Figure 10 shows that the range of mean square error values obtained with SPCA is [0, 7], which is moderate; even if its ordinate axis is stretched by a factor of 3, this range is still far smaller than [0, 1.2 × 10^5]. It can still be clearly seen that the curves in Fig. 10 follow the same trend as those in Fig. 9, but their range is far smaller.

This shows that the data after dimension reduction by PCA are much more complex than the data after dimension reduction by SPCA, and that the similarity comparison results computed with the mean square error after SPCA dimensionality reduction meet our requirements.

Therefore, the data processed by the SPCA algorithm are superior to the data processed by the PCA algorithm in terms of complexity.

6.3 Normalized inner product values of PCA and SPCA

The test set was the same as that in the previous experiment (Section 6.2), with 5 skulls. The similarity between two skulls is compared by calculating their normalized inner product according to formula (11). The results are shown in Tables 5 and 6 and Fig. 11.

Table 5 Normalized inner product of PCA
Table 6 Normalized inner product of SPCA
Fig. 11 Comparison results of normalized inner product similarity between PCA and SPCA

Tables 5 and 6 show that the normalized inner product values of the data after PCA and SPCA dimension reduction are consistent with the theoretical results and are correct. To distinguish the similarity between skulls, five decimal digits must be preserved in the normalized inner product values of PCA, whereas the normalized inner product values of SPCA can distinguish the similarity between skulls when only four decimal digits are preserved.

Figure 11 shows that, with the same ordinate range [0.985, 1], the variation range of the normalized inner product values of PCA is significantly smaller than that of SPCA. The normalized inner product values of PCA are concentrated in the upper part of the vertical axis, while those of SPCA spread over almost the whole vertical axis. The normalized inner product values of PCA are close together and hard to distinguish, whereas those of SPCA span a larger range and are clearer and more distinct. This shows that the discriminability of the normalized inner product values of PCA is obviously lower than that of SPCA.

Therefore, the data processed by the SPCA algorithm are superior to the data processed by the PCA algorithm in terms of discriminability.

6.4 Consistency of PCA and SPCA results

100 skulls were used as the test set, and their mean square error values and normalized inner product values were calculated according to formulas (10) and (11). The consistency between the mean square error and normalized inner product results for the data after PCA and SPCA dimension reduction is shown in Figs. 12, 13 and 14.

Fig. 12 Consistency of PCA

Fig. 13 Consistency of SPCA

Fig. 14 Consistency ratios of PCA and SPCA

Figure 12 shows that, for the PCA-based similarity, the most similar skull numbers obtained from the mean square error and from the normalized inner product are mostly inconsistent and differ greatly. Figure 13 shows that most of the two SPCA-based similarity comparisons coincide. In Fig. 14, the coincidence rate of the most similar skull numbers obtained with PCA is 38%, while that obtained with SPCA is 80%; the result of SPCA is obviously higher than that of PCA. This shows that the dimension-reduced data from SPCA maintain good consistency between formulas (10) and (11), and the comparison results are stable.

Therefore, the data processed by the SPCA algorithm are superior to the data processed by the PCA algorithm in terms of stability.

6.5 Running time of similarity algorithms

The 100 skulls of the test set were divided into 10 groups of 10 skulls each. In the same program, with the operating environment kept completely consistent, the data were reduced using the principal component space of PCA and the sparse principal component space of SPCA, respectively, the similarity was compared with the mean square error and the normalized inner product, and the final results were obtained. The running times of the 10 groups are shown in Table 7 and Fig. 15.

Table 7 Running time
Fig. 15 Comparison of the running time of 10 groups of test sets

Table 7 shows that when the similarity comparison is performed on the SPCA dimension-reduced data, the running time is almost half of that with PCA. Figure 15 clearly shows that the data after SPCA dimension reduction are more convenient and faster to process than the data after PCA dimension reduction.

That is to say, the similarity comparison based on SPCA is computed significantly faster than that based on PCA. At the same time, this conclusion confirms, from another perspective, the conclusion drawn in Section 6.2 that the complexity of the SPCA dimension-reduced data is low.

6.6 Subjective evaluation results

100 skulls were used as the test set, and the similarity results based on the mean square error and the normalized inner product were obtained for both PCA and SPCA. Table 8 shows the comparison results for the first ten skull similarities. In the first ten similarity comparison results, the results of the SPCA mean square error (MSE) and normalized inner product (NIP) similarities are exactly the same.

Table 8 The questionnaire of skull similarity evaluation

A subjective survey of the similarity evaluation results was conducted with questionnaires. A total of 80 people participated, and each of the two questionnaires was completed by 40 of them. A total of 47 comparisons were made for the similarity comparison results of PCA and SPCA based on the normalized inner product. The questionnaire results are shown in Figs. 16 and 17.

Fig. 16 PCA and SPCA's questionnaire about MSE

Fig. 17 PCA and SPCA's questionnaire about normalized inner product

Figure 16 shows the questionnaire results for the comparison of mean square error similarity between PCA and SPCA. This questionnaire excluded comparisons for which the two methods gave the same result, and contained 59 groups of comparisons in 40 questionnaires. Of the survey participants, 72% thought the SPCA results were more similar than the PCA results, and 28% thought the PCA results were more similar. Figure 17 shows the questionnaire results for the comparison of normalized inner product similarity between PCA and SPCA. This questionnaire also excluded comparisons with identical results and contained 47 groups of comparisons in 40 questionnaires. Of the survey participants, 75% thought the SPCA results were more similar than the PCA results, and 25% thought the PCA results were more similar. This shows that the similarity comparison results of SPCA agree better with people's cognition than those of PCA.

Therefore, the similarity evaluation using SPCA is superior to PCA in the evaluation effect.

7 Conclusion

From the results of the objective and subjective comparison experiments with PCA and SPCA, it can be concluded that both PCA and SPCA reduce the dimension of the original data while maintaining the main information, so both can be used for evaluating skull similarity, reducing dimensions and simplifying data. However, when the similarity obtained from the dimension-reduced data is compared using the mean square error and the normalized inner product, the data reduced by SPCA are clearly superior to those reduced by PCA in terms of complexity, discriminability, stability and evaluation effect. More importantly, the sparse principal components of SPCA correspond to specific areas and provide results that are easy to explain. Skull similarity comparison results based on SPCA are more accurate and stable than those based on PCA, and can provide an effective direction for improving craniofacial reconstruction, increasing the accuracy of craniofacial reconstruction and obtaining better reconstruction results.