
1 Introduction

In comprehensive evaluation, the most important problem to address is how to set the weight of each evaluation indicator. Weighting methods fall into two categories: subjective weighting, such as AHP [1, 2], and objective weighting, such as the least squares method and principal component analysis [3, 4]. Each type has its own advantages and disadvantages. In subjective weighting, the weights are given by experts and may be arbitrary in some cases, whereas objective weighting cannot reflect the experience or preferences of experts. Attribute coordinate comprehensive evaluation belongs to the former category. Its characteristic is that it constructs the corresponding psychological preference curve from evaluators' ratings of sample data according to their own experience or preferences, and it has made some progress in both theory and practice [5,6,7,8,9,10,11,12,13]. However, when there are many indicators, it is difficult for experts to accurately distinguish satisfactory samples from unsatisfactory ones, which may lead to arbitrary ratings of some samples. To overcome this obstacle, we use principal component analysis to reduce the number of indicators and assign meanings to the new indicators, which makes it easier for experts to rate samples expressed in the new indicators.

This paper first introduces the steps for simplifying indicators by principal component analysis, then explains the core idea of the attribute coordinate comprehensive evaluation method, and finally describes how the two methods are combined, using a simulation to compare the results before and after the model is improved.

2 Reduction of Indicators by Principal Component Analysis

Principal component analysis is a mathematical method of dimensionality reduction. Its basic idea is to recombine the original indicators X1, X2, …, Xp (p indicators in total) into a smaller set of mutually uncorrelated comprehensive indicators F1, F2, …, Fm (m < p). The specific steps of principal component analysis are as follows:

(1) Calculate the covariance matrix

Calculate the covariance matrix \( S = (s_{ij})_{p \times p} \) of the sample data:

$$ s_{ij} = \frac{1}{n - 1}\sum\limits_{k = 1}^{n} {(x_{ki} - \bar{x}_{i} )(x_{kj} - \bar{x}_{j} )} \quad i,j = 1,2, \ldots ,p $$
(1)

where \( s_{ij} \) (i, j = 1, 2, …, p) is the covariance between the original variables \( x_{i} \) and \( x_{j} \) (it equals their correlation coefficient when the data are standardized), p is the number of indicators, n is the number of samples, \( \bar{x}_{i} \) and \( \bar{x}_{j} \) are the means of indicators i and j, respectively, and \( x_{ki} \) and \( x_{kj} \) are the values of indicators i and j of the kth sample.
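A minimal NumPy sketch of Eq. (1), assuming the sample data are held in an n × p array `X` (one row per sample, one column per indicator). The function name `covariance_matrix` and the grades are hypothetical. The result is compared with `numpy.cov`, which uses the same n − 1 denominator by default.

```python
import numpy as np


def covariance_matrix(X: np.ndarray) -> np.ndarray:
    """Sample covariance matrix S = (s_ij), p x p, as in Eq. (1).

    X is an n x p array: n samples (rows) and p indicators (columns).
    """
    n = X.shape[0]
    centered = X - X.mean(axis=0)            # x_ki minus the mean of indicator i
    return centered.T @ centered / (n - 1)   # sum over samples k, divided by n - 1


# Hypothetical grades of 4 samples on 3 indicators (illustration only).
X = np.array([[90.0, 85.0, 70.0],
              [80.0, 95.0, 75.0],
              [70.0, 60.0, 65.0],
              [85.0, 90.0, 80.0]])
S = covariance_matrix(X)
print(np.allclose(S, np.cov(X, rowvar=False)))  # True
```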

(2) Calculate the eigenvalues \( \lambda_{i} \) of S and the corresponding orthogonal unit eigenvectors \( a_{i} \).

The m largest eigenvalues of S, λ1 ≥ λ2 ≥ … ≥ λm > 0, are the variances of the first m principal components. The unit eigenvector \( a_{i} = (a_{i1}, a_{i2}, \ldots, a_{ip})^{\mathrm{T}} \) corresponding to \( \lambda_{i} \) gives the coefficients of the principal component Fi, so with \( X = (X_{1}, \ldots, X_{p})^{\mathrm{T}} \) the ith principal component is:

$$ F_{i} = a_{i}^{\mathrm{T}} X $$
(2)

The variance (information) contribution rate \( \gamma_{i} \) of the ith principal component reflects how much information it carries:

$$ \gamma_{i} = \lambda_{i} / \sum\limits_{k = 1}^{p} {\lambda_{k} } $$
(3)
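Step (2) and Eq. (3) can be sketched with `numpy.linalg.eigh`, which is intended for symmetric matrices such as S and returns eigenvalues in ascending order, so they are reversed here. The helper names and the 2 × 2 matrix are illustrative assumptions.

```python
import numpy as np


def principal_axes(S: np.ndarray):
    """Eigenvalues in descending order and the matching unit eigenvectors of S.

    Column i of the returned matrix is a_i, the coefficient vector of F_i in Eq. (2).
    """
    eigvals, eigvecs = np.linalg.eigh(S)   # ascending order for a symmetric matrix
    order = np.argsort(eigvals)[::-1]      # reorder so that lambda_1 >= lambda_2 >= ...
    return eigvals[order], eigvecs[:, order]


def contribution_rates(eigvals: np.ndarray) -> np.ndarray:
    """gamma_i = lambda_i / (lambda_1 + ... + lambda_p), as in Eq. (3)."""
    return eigvals / eigvals.sum()


S = np.array([[4.0, 2.0],
              [2.0, 3.0]])                 # hypothetical covariance matrix
lam, A = principal_axes(S)
gamma = contribution_rates(lam)
```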

(3) Determine the principal components

The principal components finally retained are F1, F2, …, Fm, where m is determined by the cumulative variance contribution rate G(m):

$$ G(m) = \sum\limits_{i = 1}^{m} {\lambda_{i} } /\sum\limits_{k = 1}^{p} {\lambda_{k} } $$
(4)

When the cumulative contribution rate exceeds 85%, the first m principal components are considered to retain enough information of the original variables, and these m components are extracted.
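Step (3) amounts to finding the smallest m whose cumulative rate G(m) is greater than the threshold. The sketch below assumes the eigenvalues are already sorted in descending order; the function name `choose_m` and the example eigenvalues are hypothetical.

```python
import numpy as np


def choose_m(eigvals, threshold=0.85):
    """Smallest m with G(m) > threshold, where G(m) follows Eq. (4).

    eigvals must be sorted in descending order and threshold should be below 1.
    Returns m (a 1-based count) and the array G(1), ..., G(p).
    """
    eigvals = np.asarray(eigvals, dtype=float)
    G = np.cumsum(eigvals) / eigvals.sum()
    # side="right" finds the first position whose G is strictly greater than threshold
    m = int(np.searchsorted(G, threshold, side="right")) + 1
    return m, G


m, G = choose_m([5.0, 3.0, 1.0, 0.5, 0.5])   # G = 0.5, 0.8, 0.9, 0.95, 1.0
print(m)  # 3
```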

(4) Calculate the loads of the principal components

The principal component load reflects the degree of correlation between the principal component Fi and the original variable Xj. The load \( l_{ij} \) of the original variable Xj (j = 1, 2, …, p) on the principal component Fi (i = 1, 2, …, m) is given by (5); when S is the correlation matrix of standardized data, \( l_{ij} \) equals the correlation coefficient between Fi and Xj:

$$ l ( {\text{F}}_{i} ,X_{j} )= \sqrt {\lambda_{i} } a_{ij} (i = 1,2, \ldots ,m;j = 1,2, \ldots ,p) $$
(5)
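A sketch of Eq. (5), assuming the eigen-decomposition is done on the correlation matrix of standardized data, the case in which the load equals the correlation between Fi and Xj. The data are randomly generated for illustration, and the helper name `loads` is hypothetical.

```python
import numpy as np


def loads(eigvals, eigvecs, m):
    """Loads l_ij = sqrt(lambda_i) * a_ij, Eq. (5), for the first m components.

    eigvecs holds the unit eigenvectors a_i as columns, ordered like eigvals.
    Returns an m x p array whose row i contains the loads of X_1..X_p on F_i.
    """
    return np.sqrt(eigvals[:m])[:, None] * eigvecs[:, :m].T


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # hypothetical correlated data
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)         # standardize every indicator
R = np.cov(Z, rowvar=False)                               # correlation matrix of X
lam, A = np.linalg.eigh(R)
lam, A = lam[::-1], A[:, ::-1]                            # descending eigenvalues
L = loads(lam, A, m=2)
F = Z @ A[:, :2]                                          # scores of the first two components
```

Because the sign of an eigenvector is arbitrary, a load and the corresponding score flip sign together, so their correlation still matches.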

(5) Calculate the scores of the principal components

The scores of a sample on the m principal components are:

$$ F_{i} = a_{i1} X_{1} + a_{i2} X_{2} + \cdots + a_{ip} X_{p} , \quad i = 1, 2, \ldots, m $$
(6)
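Eq. (6) is a matrix product of the sample data with the first m eigenvectors. The sketch below applies the coefficients to the data exactly as given; if S was computed from centered or standardized data, the same preprocessing would normally be applied to X first. The function name and the rotation used as eigenvectors are hypothetical.

```python
import numpy as np


def pc_scores(X, eigvecs, m):
    """Scores F_i = a_i1 X_1 + ... + a_ip X_p, Eq. (6), for i = 1..m.

    X: n x p data, one row per sample.
    eigvecs: p x p matrix with the unit eigenvectors a_i as columns.
    Returns an n x m array of scores.
    """
    return np.asarray(X, dtype=float) @ eigvecs[:, :m]


r = 1 / np.sqrt(2)
A = np.array([[r,  r, 0.0],
              [r, -r, 0.0],
              [0.0, 0.0, 1.0]])   # hypothetical orthonormal eigenvectors (columns)
scores = pc_scores([[3.0, 1.0, 5.0]], A, m=2)
```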

(6) Select the principal components and give them new meanings

Give each new evaluation indicator Fi (i = 1, 2, …, m) a meaning so that experts can rate the samples expressed in the new indicators.

3 Attribute Coordinate Comprehensive Evaluation Model

3.1 Explore Barycentric Coordinates Reflecting Evaluators’ Preference Weight

The attribute coordinate comprehensive evaluation method combines machine learning with experts' ratings of sample data. Let T0 be the critical total score and Tmax the largest total score. According to the requirements of curve fitting, several total scores T1, T2, …, Tn−1 are selected evenly from (T0, Tmax). For each total score Ti (i = 1, 2, …, n − 1), some samples with that total score are selected and rated by experts according to their preferences or experience; this rating serves as the sample-learning process, from which the barycentric coordinate for Ti is obtained by (7).

$$ {\text{b}}\left( {\{ {\text{f}}^{\text{h}} \left( {\text{z}} \right)\} } \right) = \left( {\frac{{\sum\limits_{h = 1}^{t} {v_{1}^{h} f_{1}^{h} } }}{{\sum\limits_{h = 1}^{t} {v_{1}^{h} } }}, \ldots ,\frac{{\sum\limits_{h = 1}^{t} {v_{m}^{h} f_{m}^{h} } }}{{\sum\limits_{h = 1}^{t} {v_{m}^{h} } }}} \right) $$
(7)

where \( \{f^{k}, k = 1, \ldots, s\} \subseteq S_{T} \cap F \) is the set of samples whose total score equals T. In (7), \( b(\{f^{h}(z)\}) \) is the barycentric coordinate of \( \{f^{h}(z)\} \); \( \{f^{h}, h = 1, \ldots, t\} \) are the indicator values of the t samples that evaluator z selects from \( \{f^{k}\} \), with \( f_{j}^{h} \) the value of indicator j of sample \( f^{h} \); and \( \{v^{h}(z)\} \) are the ratings (taken as weights) that the evaluator gives to these samples.
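Eq. (7) is a rating-weighted mean of the selected samples, computed indicator by indicator. In this sketch the ratings may be given once per sample (the same rating then weights every indicator, an assumption that matches a single rating column) or per indicator; the function name and the numbers are hypothetical.

```python
import numpy as np


def barycentric_coordinate(samples, ratings):
    """Barycentric coordinate of rated samples on one total-score plane, Eq. (7).

    samples: t x m array; row h holds the indicator values f^h_1..f^h_m.
    ratings: length-t array (one rating v^h per sample, reused for every
             indicator) or t x m array (rating v^h_j per sample and indicator).
    """
    f = np.asarray(samples, dtype=float)
    v = np.asarray(ratings, dtype=float)
    if v.ndim == 1:
        v = np.repeat(v[:, None], f.shape[1], axis=1)  # v^h_j = v^h for all j
    return (v * f).sum(axis=0) / v.sum(axis=0)


# Three hypothetical samples, all with total score 200, and their ratings.
samples = [[90, 60, 50],
           [60, 90, 50],
           [70, 70, 60]]
b = barycentric_coordinate(samples, [1, 3, 2])
```

With one rating per sample, the coordinates of `b` add up to the same total score as the samples, so the barycenter stays on the evaluation plane.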

3.2 Calculate the Most Satisfactory Solution

Using the interpolation formula \( G_{j}(T) = a_{0j} + a_{1j} T + a_{2j} T^{2} + \cdots + a_{n+1,j} T^{n+1} \) and the barycentric coordinates obtained above, we fit a curve for each indicator j to construct the psychological barycentric line (or most satisfactory local solution line) \( L(b(\{f^{h}(z)\})) \). We then calculate the global satisfaction degree of each evaluated object according to (8) and sort the objects in descending order of satisfaction to obtain the most satisfactory solution.

$$ sat(f,Z) = \left( \frac{\sum\limits_{j = 1}^{m} f_{ij}}{\sum\limits_{j = 1}^{m} F_{j}} \right)^{\frac{\sum\limits_{i = 1}^{m} f_{j}}{3\sum\limits_{j = 1}^{m} f_{ij}}} \cdot \exp \left( - \frac{\sum\limits_{j = 1}^{m} w_{j} \left| f_{j} - b(f^{h}(z_{j})) \right|}{\sum\limits_{j = 1}^{m} w_{j} \delta_{j}} \right) $$
(8)

where sat(f, Z) is the satisfaction of evaluator Z with the evaluated object f, expected to lie between 0 and 1; \( f_{j} \) is the value of indicator j; \( \left| f_{j} - b(f^{h}(z_{j})) \right| \) measures the difference between each attribute value and the corresponding barycentric value; \( w_{j} \) and \( \delta_{j} \) are adjustable factors used to make the satisfaction values comparable when the original results are not satisfactory; \( \sum_{j=1}^{m} F_{j} \) is the sum of the full-score values \( F_{j} \) of the indicators; and \( \sum_{j=1}^{m} f_{ij} \) is the sum of the values \( f_{ij} \) of all indicators of the evaluated object \( f_{i} \).
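The curve-fitting step of Sect. 3.2 can be sketched with NumPy's power-series helpers in `numpy.polynomial.polynomial`, whose coefficients come out in the same low-to-high order as a0j, a1j, … in Gj(T). As an assumption, the sketch uses exact interpolation (degree = number of points − 1) rather than the degree n + 1 stated above, and the (T, barycentric value) pairs and the function name are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_barycentric_curve(totals, bary_values):
    """Coefficients a_0j, a_1j, ... of a polynomial G_j(T) through the points.

    totals: total scores T_1..T_k; bary_values: barycentric value of
    indicator j at each T. Uses degree k - 1, i.e. exact interpolation.
    """
    totals = np.asarray(totals, dtype=float)
    return P.polyfit(totals, bary_values, deg=len(totals) - 1)


T = [300.0, 400.0, 500.0, 600.0]           # hypothetical total-score planes
b_j = [60.0, 70.0, 78.0, 85.0]             # hypothetical barycentric values of indicator j
coef = fit_barycentric_curve(T, b_j)
curve = P.polyval(np.linspace(300, 600, 7), coef)   # points on the fitted curve G_j(T)
```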

4 Simulation Experiment

To verify the effectiveness of the improved method, we used the final-exam grades of 2008 students in nine courses at a high school as the experimental data, taking the nine courses (Chinese, mathematics, English, physics, chemistry, politics, history, geography and biology) as nine indicators. The sample data are shown in Table 1.

Table 1. Sample data of nine courses

First, we use the attribute coordinate comprehensive evaluation method without principal component analysis to construct the psychological barycentric curves of several courses. We then improve the method by first simplifying the indicators with principal component analysis and then applying attribute coordinate comprehensive evaluation to construct the psychological barycentric lines of the new indicators.

We also compare the global satisfaction degrees of two students obtained before and after the improvement.

4.1 Attribute Coordinate Comprehensive Evaluation Without Using Principal Component Analysis

We choose total scores of 1000, 701 and 620 as the three evaluation planes and select some samples on each plane for the experts to rate. The last column (Rating) of Tables 2 and 3 gives the ratings for total scores 701 and 620, respectively.

Table 2. The samples and ratings for total score 701
Table 3. The samples and ratings for total score 620

According to (7), the barycentric coordinates on (Chinese, math, geography) for total scores 701 and 620 are (88.65625, 92.625, 76.4375) and (88.79069767, 83.93023256, 67.51162791), respectively.

Next, according to the interpolation theorem, we calculate the barycentric curves of Chinese, mathematics and geography (Figs. 1, 2 and 3, respectively). The barycentric curve of Chinese is clearly unreasonable: it should increase monotonically, yet its value at a total score of 650 is even lower than at 600. Figs. 2 and 3 show that the barycentric curves of mathematics and geography are almost the same, so it is hard to tell whether the expert puts more weight on liberal arts or on science.

Fig. 1. The barycenter curve of Chinese

Fig. 2. The barycenter curve of Math

Fig. 3. The barycenter curve of Geography

The most likely reason for this result is that, with nine indicators, it is difficult for experts to accurately distinguish good samples from bad ones, which can lead to arbitrary ratings.

4.2 Attribute Coordinate Comprehensive Evaluation with Principal Component Analysis

We apply the improved algorithm, first carrying out principal component analysis to reduce the number of indicators.

(1) Calculate the covariance matrix S of the indicators and its orthonormal eigenvectors.

The matrix below lists the unit eigenvectors of S: row \( x_{j} \) gives the coefficients of the original variable \( x_{j} \), and column \( a_{i} \) gives the coefficient vector of the ith principal component.

$$ \begin{array}{c|rrrrrrrrr} & a_{1} & a_{2} & a_{3} & a_{4} & a_{5} & a_{6} & a_{7} & a_{8} & a_{9} \\ \hline x_{1} & 0.1725 & 0.3276 & -0.2918 & 0.5982 & 0.0915 & -0.3520 & -0.3481 & 0.2933 & -0.2836 \\ x_{2} & 0.5151 & -0.3978 & 0.5263 & 0.4750 & 0.0890 & 0.2451 & -0.0012 & -0.0697 & -0.0318 \\ x_{3} & 0.2543 & 0.7953 & 0.4751 & -0.1837 & -0.0851 & 0.1512 & 0.0638 & 0.0963 & 0.0034 \\ x_{4} & 0.4274 & -0.1859 & 0.0203 & -0.2104 & -0.6264 & -0.5471 & 0.0743 & 0.1566 & 0.1282 \\ x_{5} & 0.3474 & -0.0728 & -0.0053 & -0.3652 & 0.7509 & -0.3622 & 0.1624 & 0.1384 & -0.0019 \\ x_{6} & 0.1596 & 0.1882 & -0.2401 & 0.2653 & 0.0964 & -0.0190 & 0.0561 & -0.2374 & 0.8614 \\ x_{7} & 0.2310 & 0.1357 & -0.3541 & 0.1628 & -0.0799 & 0.0730 & 0.7306 & -0.3300 & -0.3492 \\ x_{8} & 0.3152 & -0.0823 & -0.4071 & -0.1534 & -0.0618 & 0.5785 & -0.0036 & 0.5999 & 0.0786 \\ x_{9} & 0.3983 & 0.0389 & -0.2510 & -0.2912 & -0.0266 & 0.1494 & -0.5532 & -0.5754 & -0.1788 \end{array} $$
(2) Calculate the eigenvalues of S (in descending order)

(1.5315, 0.2945, 0.2291, 0.1658, 0.1331, 0.1170, 0.1006, 0.0881, 0.0778)

(3) Calculate the principal component contribution rate vector \( \gamma \) and the cumulative contribution rate G(m).

The contribution rate vector (in %) is \( \gamma \) = (55.9456, 10.7591, 8.3684, 6.0553, 4.8631, 4.2733, 3.6763, 3.2172, 2.8418).

The cumulative contribution rate of the first three principal components is G(3) = 75.0731%. Although this is below the 85% criterion and some information is lost, the loss is not large enough to affect the overall picture.
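The reported contribution rates can be recomputed from the eigenvalues listed in step (2) with Eqs. (3) and (4). This is only a consistency check; small differences in the last digits come from the eigenvalues being rounded to four decimals.

```python
import numpy as np

# Eigenvalues reported in step (2).
lam = np.array([1.5315, 0.2945, 0.2291, 0.1658, 0.1331,
                0.1170, 0.1006, 0.0881, 0.0778])
gamma = 100 * lam / lam.sum()   # contribution rates in %, Eq. (3)
G = np.cumsum(gamma)            # cumulative contribution rates G(m) in %, Eq. (4)
print(np.round(gamma[:3], 2), np.round(G[2], 2))
```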

From the first three columns of the coefficient matrix, the first three principal components f1, f2 and f3 are:

  • f1 = 0.1725x1 + 0.5151x2 + 0.2543x3 + 0.4274x4 + 0.3474x5 + 0.1596x6 + 0.231x7 + 0.3152x8 + 0.3983x9

  • f2 = 0.3276x1 − 0.3978x2 + 0.7953x3 − 0.1859x4 − 0.0728x5 + 0.1882x6 + 0.1357x7 − 0.0823x8 + 0.0389x9

  • f3 = −0.2918x1 + 0.5263x2 + 0.4751x3 + 0.0203x4 − 0.0053x5 − 0.2401x6 − 0.3541x7 − 0.4071x8 − 0.2510x9

Here x1, x2, …, x9 denote Chinese, math, English, physics, chemistry, politics, history, geography and biology, respectively.

The first principal component f1 has positive loads on all variables, indicating that it represents a student's comprehensive level.

The second principal component f2 decreases as x2 (math), x4 (physics) and x5 (chemistry) increase, and increases as x1 (Chinese), x3 (English), x6 (politics), x7 (history) and x9 (biology) increase, which indicates that f2 reflects a student's level in liberal arts.

The third principal component f3 increases as x2 (math), x3 (English) and x4 (physics) increase, and decreases as x1 (Chinese), x6 (politics), x7 (history), x8 (geography) and x9 (biology) increase, which indicates that f3 reflects a student's level in science.

In this way, the nine indicators are reduced to three: f1, f2 and f3. Students' scores can then be calculated in the new indicator system; Table 4 shows the sample data with the new indicators.
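The three expressions above form a linear map from the nine course grades to (f1, f2, f3). The sketch below assumes the coefficients are applied directly to raw grades, as the expressions are written; whether the grades behind Table 4 were centered or rescaled first is not stated, so the outputs are illustrative only. The function name `new_indicators` and the example grades are hypothetical.

```python
import numpy as np

# Coefficients of f1, f2, f3 (one column each); rows follow x1..x9 =
# Chinese, math, English, physics, chemistry, politics, history, geography, biology.
COEFFS = np.array([
    [0.1725,  0.3276, -0.2918],
    [0.5151, -0.3978,  0.5263],
    [0.2543,  0.7953,  0.4751],
    [0.4274, -0.1859,  0.0203],
    [0.3474, -0.0728, -0.0053],
    [0.1596,  0.1882, -0.2401],
    [0.2310,  0.1357, -0.3541],
    [0.3152, -0.0823, -0.4071],
    [0.3983,  0.0389, -0.2510],
])


def new_indicators(grades):
    """Map nine course grades to (f1, f2, f3) with the linear expressions above."""
    return np.asarray(grades, dtype=float) @ COEFFS


all_100 = new_indicators([100] * 9)
# Hypothetical science-leaning and arts-leaning students with the same total grade.
science = new_indicators([80, 95, 80, 95, 95, 80, 80, 80, 80])
arts = new_indicators([95, 80, 80, 80, 80, 95, 95, 80, 80])
print(science[2] > arts[2])  # True: f3 is higher for the science-leaning student
```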

Table 4. Sample data with the new indicators

(4) Attribute coordinate comprehensive evaluation

We provide three total-score planes, 460, 345 and 311, for the expert to rate. The samples and ratings for the last two are shown in Tables 5 and 6, respectively, and the expert's preference can be seen directly from the ratings (the last column). When the total score is higher, the expert pays more attention to the students' comprehensive level; when the total score is relatively low, the expert values science scores more. Rating samples in this way is easier than without principal component analysis.

Table 5. The samples and ratings for total score around 345
Table 6. The samples and ratings for total score around 311

We obtain the barycentric coordinates (f1, f2, f3) for total scores 460, 345 and 311: (268.2157, 92.1146, 100), (223.98, 51.46642, 69.22313) and (200.2936, 44.12158, 66.05498), respectively. We then draw the barycentric curves of indicators f1, f2 and f3 (Figs. 4, 5 and 6, respectively). All three curves increase monotonically and are therefore more reasonable than those obtained with the old model.
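As a rough check of the monotonicity claim, one can interpolate a quadratic through the three reported (total score, barycentric value) pairs of f1 and of f2 and inspect its derivative on [311, 460]. This is an assumption-laden sketch: the curves in Figs. 4, 5 and 6 may have been fitted with additional points or a different degree.

```python
import numpy as np
from numpy.polynomial import polynomial as P

T = np.array([311.0, 345.0, 460.0])
bary = {
    "f1": [200.2936, 223.98, 268.2157],
    "f2": [44.12158, 51.46642, 92.1146],
}
grid = np.linspace(311.0, 460.0, 200)
increasing = {}
for name, values in bary.items():
    coef = P.polyfit(T, values, deg=2)          # quadratic through the three points
    slope = P.polyval(grid, P.polyder(coef))    # derivative of the fitted curve on the grid
    increasing[name] = bool(np.all(slope > 0))
print(increasing)
```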

Fig. 4. The barycentric curve of indicator f1

Fig. 5. The barycentric curve of indicator f2

Fig. 6. The barycentric curve of indicator f3

Table 7. The comparison of satisfaction degrees using the unimproved method

(5) Comparison of satisfaction degrees before and after the improvement

Finally, we compare the satisfaction degrees obtained with the two models for two students, No. 466 and No. 196. Their total scores are almost the same, but No. 196 is clearly better at science than No. 466. Under the condition that the evaluator values science scores more, the satisfaction degree of No. 196 should therefore be greater than that of No. 466. With the unimproved method, however, the result is the opposite, which is unreasonable (Table 7). The improved algorithm fixes this flaw and yields a reasonable result that better reflects the evaluator's preference (Table 8).

Table 8. The comparison of satisfaction degrees using the improved method

5 Conclusion

The improved method integrates principal component analysis into the original method to reduce the number of indicators, making the experts' rating process simpler and more effective. The simulation compares the results obtained before and after applying principal component analysis and shows that, with the improved method, the barycentric curves are more reasonable and the satisfaction degrees of the evaluated objects reflect the experts' preferences and experience more accurately.