1 Introduction

User experience (UE or UX) was first proposed by the designer Donald Norman (1997) and became widely recognized in the mid-1990s. According to ISO 9241-210 (1998), user experience covers all the emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviors, and sense of accomplishment that arise while users use a product or service [1]. In other words, it is the set of physical and psychological feelings that users have when interacting with a product, system, or service. In recent years, user experience design has been widely applied in game design. Gaming experience has been studied from many perspectives, which has led to a corresponding diversity of evaluation methods. In general, there are two kinds of methods: direct evaluation, and evaluation that first constructs a user experience evaluation model. Direct evaluation is usually qualitative: a series of physiological data measured from the player with an instrument is analyzed to evaluate the level of user experience. Commonly used techniques include eye-movement and electroencephalogram (EEG) experiments. Model construction, by contrast, is usually quantitative or combines quantitative and qualitative analysis. It mainly uses mathematical and statistical methods, such as multiple regression analysis, to establish a model of the relationship between the elements of user experience and the ultimate goal. Because the concept of gaming experience is abstract and ambiguous and contains many elements, a single method can hardly reflect the real state of user experience. It therefore becomes necessary to explore the dimensions along which gaming experience can be evaluated and to make that experience concrete. The basic idea is to decompose a complex system into multiple indicators that together reflect the overall situation.
Bai (2019) applied Kansei engineering theory and used the semantic differential method and one-way ANOVA to study how each system in card games influences players’ perception [2]. Using League of Legends as an example, Wang and Zhao (2014) studied gaming experience from the perspectives of senses, operation, and interaction [3]. These studies suggest that it is practicable to use quantitative methods to build a model for evaluating gaming experience and to provide scientific and reliable data in support of game design.

In recent years, battle royale games such as PUBG and APEX have been very popular among players. A fast pace and intense competition are the basic characteristics of such games. Whether new players can quickly understand and master the basic operations therefore largely determines the quality of their gaming experience, and the novice tutorial plays a significant role in user retention. During product design, game UX designers need a method for evaluating and comparing all proposals so that the best design plan can be screened out. People are central to evaluating game UX design. Users’ perceptions of a UX design, obtained through qualitative and quantitative research methods, can help designers establish an evaluation system that guides their work and prompts them to reflect on it. However, not all factors in gaming experience can be measured. Gaming experience is a complex system with many unknown factors: its exterior may seem clear while its interior is vague. This study therefore uses grey system theory and the analytic hierarchy process to explore the construction of a model for evaluating user experience in battle royale games. Proposed by Prof. Deng Julong (2017) of China, grey system theory is a research method grounded in mathematical theory. Its main object of study is usually a system with few samples and little information. By quantifying uncertain grey information, it solves problems that contain unknown factors [4]. The object of this study is the user experience of novice tutorials in battle royale games. It contains both quantifiable and non-quantifiable information and is therefore a typical grey system. In such a complex system, the factors that affect the player’s experience do not all carry equal weight.

The analytic hierarchy process (AHP) is a decision-making method proposed by Thomas Saaty (1980). It treats decision making as a system and decomposes it into three layers: target, criterion, and plan. It is effective for complex decision-making problems and for choosing among multiple plans. The grey AHP method combines grey system theory with AHP and absorbs the strengths of both, so it fits the object of this study well.

2 Collection of Emotional Words

By emotional words, we mean words that convey emotional connotations; they are typically adjectives. For example, adjectives such as exquisite and smooth are emotional words we might use to describe porcelain. In user experience evaluation, the emotional words used to describe a product typically serve to assess the emotional dimensions of users’ responses to it. In principle, an emotional vocabulary can contain single adjectives as well as pairs of adjectives with opposite meanings. However, because many adjectives are similar to one another, researchers usually choose pairs of opposites when constructing user experience evaluation models. For an evaluation model built with the grey AHP method, the selection of emotional vocabulary matters, because it strongly affects the accuracy of the evaluation results. Both the appropriateness and the number of the emotional words influence the results. Too many words increase the burden on participants, while too few may describe the evaluation object insufficiently. Appropriateness is of two types: (a) appropriateness to the evaluation object, that is, whether the emotional words can accurately describe the object being evaluated; and (b) appropriateness to the user, that is, how frequently users use these words and how familiar they are with them. Researchers therefore need to choose an appropriate number of emotional words based on their knowledge and experience. In the collection process, we need to select adjectives that describe the cognitive characteristics of the object in every aspect. This study collected words in two ways: (a) from websites and forums about battle royale games, and (b) from interviews with players of battle royale games.

Through these two approaches, the study team initially collected 33 pairs of emotional words. After the words used less frequently in daily life were filtered out, the remaining 30 pairs were adopted; they are listed in Table 1.

Table 1. Selected emotional words

To screen out the most representative words, we further classified and reduced the words above. The study team recruited 11 participants who had a certain understanding of battle royale games. There were 9 males and 2 females, and 8 of the participants were design practitioners. The participants were asked to classify the 30 pairs of emotional adjectives initially collected. The number of categories was not limited, and each category could contain at most 4 word pairs. By counting how many times each two word pairs were placed in the same category, we obtained a 30 × 30 matrix. Multidimensional scaling (MDS), a method of multivariate analysis commonly applied in sociology, psychology, marketing, and other fields, was used to analyze this matrix, with the number of dimensions set from 2 to 6. The research team used MDS to fit the data in a lower-dimensional space and obtain the coordinates of each emotional word in the spatial map. The goodness of fit of the spatial map was judged by the empirical standard proposed by Kruskal (1964), shown in Table 2. As shown in Table 3, the stress value of the two-dimensional model is 0.071, which indicates that the model fits the observed data well. In this way, we obtained the coordinates of each emotional word in the two-dimensional space.

Table 2. Relationship between Kruskal stress coefficient and goodness of fit
Table 3. Emotional vocabulary coefficient
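
The counting step that produces the 30 × 30 matrix can be sketched as follows. This is a minimal illustration, not the study team's actual procedure: the function name `cooccurrence_matrix`, the data layout, and the toy sortings are all assumptions made for the example.

```python
from itertools import combinations

def cooccurrence_matrix(sortings, n_items):
    """Count how often each pair of items was placed in the same group.

    sortings: one entry per participant; each entry is a list of groups,
              and each group is a list of item indices (0-based).
    """
    counts = [[0] * n_items for _ in range(n_items)]
    for groups in sortings:
        for group in groups:
            # Every unordered pair inside one group co-occurs once.
            for i, j in combinations(sorted(set(group)), 2):
                counts[i][j] += 1
                counts[j][i] += 1  # keep the matrix symmetric
    return counts

# Hypothetical toy data: 3 participants sorting 5 word pairs (indices 0..4).
toy_sortings = [
    [[0, 1], [2, 3, 4]],
    [[0, 1, 2], [3, 4]],
    [[0], [1, 2], [3, 4]],
]
similarity = cooccurrence_matrix(toy_sortings, n_items=5)
```

MDS is normally run on dissimilarities, so the counts would still need to be converted, for example by subtracting each count from the number of participants. The text does not state which conversion was used.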

Next, a cluster analysis was performed on the two-dimensional coordinates of the emotional words. First, a hierarchical (systematic) cluster analysis was conducted, and a line chart was drawn with the number of categories on the abscissa and the aggregation coefficient on the ordinate, as shown in Fig. 1. The curve flattened once the number of categories reached 11, so the number of categories was set to 11.

Fig. 1. Aggregation coefficient

A K-means cluster analysis was then conducted on the two-dimensional coordinates of the emotional words, with K set to 11 according to the aggregation coefficient obtained above. The calculation yielded the group to which each pair of emotional words belongs and the distance of each pair from its cluster center. In each group, the pair closest to the cluster center was taken as the most representative one. The final grouping of the emotional words is shown in Table 4. The study team renumbered the 11 representative adjective pairs (E1–E11); the results are shown in Table 5.

Table 4. Categories of emotional words
Table 5. Representative emotional words
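
The rule of picking the word pair closest to the cluster center can be sketched as below. The sketch assumes that cluster labels are already available (for instance from a K-means run) and takes the mean of the member coordinates as the center. The names `representatives`, `toy_coords`, and `toy_labels` and all coordinates are hypothetical.

```python
import math

def representatives(coords, labels):
    """Return, for each cluster label, the item closest to the cluster centroid.

    coords: dict mapping item name -> (x, y) position on the MDS map.
    labels: dict mapping item name -> cluster label.
    """
    clusters = {}
    for name, label in labels.items():
        clusters.setdefault(label, []).append(name)

    result = {}
    for label, names in clusters.items():
        # Centroid = mean of the member coordinates.
        cx = sum(coords[n][0] for n in names) / len(names)
        cy = sum(coords[n][1] for n in names) / len(names)
        result[label] = min(names, key=lambda n: math.dist(coords[n], (cx, cy)))
    return result

# Hypothetical coordinates for six word pairs in two clusters.
toy_coords = {"w1": (0.0, 0.0), "w2": (0.2, 0.0), "w3": (1.0, 0.0),
              "w4": (5.0, 5.0), "w5": (5.0, 6.0), "w6": (5.0, 5.4)}
toy_labels = {"w1": 0, "w2": 0, "w3": 0, "w4": 1, "w5": 1, "w6": 1}
reps = representatives(toy_coords, toy_labels)
```

In this toy data `w2` and `w6` lie nearest to their respective centroids, so they would be kept as the representative pairs.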

The interpretation of each representative emotional word is as follows.

  • E1 Safe: The experience of the novice tutorial makes players feel safe.

  • E2 Understandable: The introduction of the gameplay is easy for players to understand.

  • E3 Active: The teaching method of the novice tutorial is not rigid but lively.

  • E4 Responsive: Players get feedback after completing an operation or achieving a goal.

  • E5 Smooth: Players will not get stuck when experiencing the novice tutorial.

  • E6 Clear: The interface guide of the novice tutorial is clear and straightforward.

  • E7 Beautiful: The interface and scene design of the novice tutorial can meet the aesthetic needs of players.

  • E8 Fault-tolerant: Players are allowed to make mistakes in the novice tutorial, and the cost of mistakes is small.

  • E9 Natural: The interaction of the novice tutorial is natural and fits players’ interaction habits.

  • E10 Accomplished: Players can gain a certain sense of accomplishment after achieving the goals set in the novice tutorial.

  • E11 Orderly: The arrangement of the teaching modules of the novice tutorial is orderly.

3 Establishment of the Evaluation Hierarchy

The analytic hierarchy process usually consists of three layers from top to bottom. The uppermost layer is the target layer, which is the decision goal. The second is the criterion layer, which contains the criteria considered in the decision-making process and may have multiple sub-layers. The third is the plan layer, which contains the alternative plans for the decision. Drawing on related research on gaming experience, this study took the user experience of novice tutorials in battle royale games as the target layer. The study team then divided the user experience into three criteria: sensory experience, behavioral experience, and value experience. The emotional words obtained above were assigned to the sub-layers under these criteria. Since no design plans were involved in this study, the plan layer was not included in the analysis. The final evaluation hierarchy is shown schematically in Fig. 2.

Fig. 2. Evaluation hierarchy of the user experience of novice tutorials in battle royale games

4 Perceptual Evaluation Experiment and Calculation of Subjective Weights

Before users’ perceptual needs could be acquired through the perceptual experiment, samples and perceptual evaluation questionnaires based on the above emotional words had to be prepared. The purpose of this experiment was to obtain users’ perceptual needs for the novice tutorials of battle royale games, so the novice tutorials of three battle royale games were selected as samples. To control variables, all three games were FPS games: PUBG, APEX, and CODwarzone, denoted D1, D2, and D3. Participants were first asked to experience the novice tutorials of the three games and then to fill in semantic differential questionnaires. All participants had a certain understanding of battle royale games or were target players of those games. There were 22 participants in total, of whom 17 were male and 5 were female, aged 20–25. A sample of the perceptual needs questionnaire is shown in Table 6.

Table 6. Perceptual questionnaires sample

After the average score of each sample on every emotional word pair was calculated, a line chart of users’ expected target results was obtained, as shown in Fig. 3.

Fig. 3. Line chart of users’ perceptual needs
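
The averaging behind such a line chart can be sketched as follows. The ratings are invented for illustration, the 7-point scale is an assumption (the scale is not stated in the text), and the name `mean_scores` is hypothetical.

```python
from statistics import mean

def mean_scores(responses):
    """Average semantic differential ratings per sample and word pair.

    responses: list of dicts, one per participant, shaped as
               {sample_id: {word_id: rating}}.
    """
    result = {}
    for sample in responses[0]:
        result[sample] = {
            word: mean(r[sample][word] for r in responses)
            for word in responses[0][sample]
        }
    return result

# Hypothetical ratings from two participants on an assumed 7-point scale.
toy_responses = [
    {"D1": {"E1": 5, "E2": 6}, "D2": {"E1": 4, "E2": 7}},
    {"D1": {"E1": 7, "E2": 4}, "D2": {"E1": 6, "E2": 5}},
]
averages = mean_scores(toy_responses)
```

Each inner dictionary of `averages` corresponds to one line of the chart: one sample, with one mean score per emotional word pair.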

After obtaining users’ perceptual needs for novice tutorials in battle royale games, the study team calculated the weight of each evaluation index based on the evaluation hierarchy. With reference to the users’ perceptual needs, the team compared the elements at the same level pairwise and produced a comparison matrix. The scale proposed by Professor Saaty (1980) was used to score the comparison results [5], as shown in Table 7.

Table 7. Scoring scale

Suppose that the layer below the target C contains n criteria \(a_{1}, a_{2}, \ldots, a_{n}\). Taking C as the standard, the importance of these criteria is compared pairwise to produce the comparison matrix A, which can be expressed as follows.

$$ A = \left( a_{ij} \right)_{n \times n} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{array} \right],\;\; i, j = 1, 2, \ldots, n $$
(1)

Here, \(a_{ij}\) is the score obtained by comparing the importance of factor i with that of factor j.
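
A comparison matrix of this form can be assembled from the upper-triangle judgments alone. The sketch below assumes the usual reciprocal convention of Saaty's scale, \(a_{ji} = 1/a_{ij}\) and \(a_{ii} = 1\), which the text does not state explicitly. The function name and the judgments are hypothetical.

```python
def comparison_matrix(n, judgments):
    """Build an n x n pairwise comparison matrix from upper-triangle scores.

    judgments: dict mapping (i, j) with i < j to the score a_ij,
               i.e. how much more important factor i is than factor j.
    Assumes the reciprocal convention a_ji = 1 / a_ij and a_ii = 1.
    """
    A = [[1.0] * n for _ in range(n)]
    for (i, j), score in judgments.items():
        if not 0 <= i < j < n:
            raise ValueError(f"expected 0 <= i < j < {n}, got {(i, j)}")
        A[i][j] = float(score)
        A[j][i] = 1.0 / score
    return A

# Hypothetical judgments for three criteria (0-based indices).
A = comparison_matrix(3, {(0, 1): 3, (0, 2): 5, (1, 2): 2})
```

Only n(n − 1)/2 judgments have to be collected; the diagonal and the lower triangle follow from the convention.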

After the matrix is obtained, a consistency test is performed to compute the consistency ratio (CR), which is calculated as follows.

$$\begin{aligned} \mathrm{CR} &= \frac{\mathrm{CI}}{\mathrm{RI}}, \\ \mathrm{CI} &= \frac{\lambda - n}{n - 1}, \\ \lambda &= \frac{1}{n}\sum\nolimits_{i = 1}^{n} \frac{\left( AW \right)_{i}}{w_{i}} \end{aligned}$$
(2)

Here \(\lambda\) is the largest characteristic root (eigenvalue) of the matrix A, \(W = (w_{1}, \ldots, w_{n})^{T}\) is the weight vector, and RI is the random consistency index.

The values of RI, taken from the literature [6], are listed in Table 8.

Table 8. Value of RI

If CR = 0, the matrix is perfectly consistent, and the larger CR is, the less consistent the judgments are; a CR close to 1 means the judgments are about as inconsistent as those of a randomly filled matrix. Normally, the consistency of the comparison matrix is considered satisfactory if CR < 0.1. When the matrix deviates too far from consistency, the accuracy of the index weights cannot be guaranteed. Based on the above method, the consistency ratio in this experiment was CR = 0.043 < 0.1, which indicates that the matrix was acceptable.
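
The consistency test of Eq. (2) can be sketched as follows. Two points are assumptions rather than part of the study. First, the weights are approximated by normalized row geometric means, which is one common choice and not necessarily what Yaahp computes. Second, the RI values are the commonly cited ones for n = 1 to 9, used here because Table 8 is not reproduced in this text. The example matrices are hypothetical.

```python
import math

# Commonly cited random index values for n = 1..9 (an assumption here,
# since Table 8 is not reproduced in this text).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    """Approximate the weight vector by normalized row geometric means."""
    n = len(A)
    gm = [math.prod(row) ** (1.0 / n) for row in A]
    total = sum(gm)
    return [g / total for g in gm]

def consistency_ratio(A, w):
    """Return (lambda, CI, CR) following Eq. (2)."""
    n = len(A)
    if n < 3:  # 1 x 1 and 2 x 2 reciprocal matrices are always consistent
        return float(n), 0.0, 0.0
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    return lam, ci, ci / RI[n]

# A perfectly consistent hypothetical matrix (a_ik = a_ij * a_jk).
consistent = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
w_consistent = ahp_weights(consistent)
lam_c, ci_c, cr_c = consistency_ratio(consistent, w_consistent)

# A slightly inconsistent hypothetical matrix.
noisy = [[1.0, 3.0, 5.0], [1 / 3, 1.0, 2.0], [0.2, 0.5, 1.0]]
w_noisy = ahp_weights(noisy)
lam_n, ci_n, cr_n = consistency_ratio(noisy, w_noisy)
```

For the consistent matrix the estimate of \(\lambda\) equals n up to rounding, so CR is zero. For the second matrix CR is small but positive and stays below the 0.1 threshold.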

Finally, the software Yaahp was used to process the data and calculate the weight of each element relative to the layer above. The resulting weights, which reflect the players’ perceptual needs, are listed in Table 9.

Table 9. The weight of each evaluation index

5 Grey Relational Analysis

After obtaining the weight of each evaluation index, this paper used grey relational analysis to improve the hierarchical analysis model. The main purpose was to turn the data from grey to white, that is, from uncertain information into definite information. As mentioned above, the evaluation of gaming experience is a system containing much fuzzy information, and the cognitive biases of different evaluators may also distort the experimental results. The average weight of each index obtained above is therefore not yet an ideal result, and grey relational analysis is needed to further optimize the evaluation model.

  1. (1)

    First, we determined the reference sequence and the comparison sequences. The reference sequence reflects the characteristics of the system; here it consists of the best score on each indicator of the user experience of novice tutorials in battle royale games. The comparison sequences are the data that affect the system and need to be ranked; here they are the experimental data for the novice tutorials of the three battle royale games. The comparison sequences are denoted \(x_{i}\), where i is the serial number of the plan, and \(x_{i}(k)\) is the evaluation score of the i-th plan on index k.

    $$ x_{i} = \left( x_{i}(1), x_{i}(2), \ldots, x_{i}(n) \right),\;\; i = 1, 2, \ldots, m $$
    (3)
  1. (2)

    Second, the comparison sequences were made dimensionless. Since the evaluation indexes may be measured in different dimensions, all indexes must be made dimensionless before the comparison sequences are analyzed. Each score \(x_{i}(k)\) is divided by the adjustment value \(A(k)\) of the k-th index so that the evaluation scores of all sequences fall in (0, 1). Because the matrix obtained in this experiment was already dimensionless, this step was unnecessary here.

    $$ f\left( x_{i}(k) \right) = \frac{x_{i}(k)}{A(k)},\;\; k = 1, 2, \ldots, n,\;\; i = 1, 2, \ldots, m $$
    (4)
  1. (3)

    Third, the grey relational coefficient (GRC) was calculated. The GRC \(\xi_{i}(k)\) measures the difference between the geometric shapes of the reference and comparison curves at index k; the smaller the difference between the curves, the higher the degree of correlation. It is calculated as follows, where i is the serial number of the plan and k is the index number.

    $$ \xi_{i}(k) = \frac{\min\limits_{i}\min\limits_{k} \left| x_{0}(k) - x_{i}(k) \right| + \rho \max\limits_{i}\max\limits_{k} \left| x_{0}(k) - x_{i}(k) \right|}{\left| x_{0}(k) - x_{i}(k) \right| + \rho \max\limits_{i}\max\limits_{k} \left| x_{0}(k) - x_{i}(k) \right|} $$
    (5)

Here, \(\rho \) is the discrimination coefficient, which controls how clearly the GRCs are distinguished from one another: the smaller \(\rho \) is, the greater the discrimination. \(\rho \) generally takes values in (0, 1) and is commonly set to 0.5. Following related research, \(\rho \) = 0.5 is adopted in this study as well.

  1. (4)

    Fourth, the Grey interconnect degree was calculated. The relational coefficient gives the degree of association between the comparison and reference sequences at each index, that is, at each point of the curve, so it has more than one value. To keep the information from being too scattered, the relational coefficients at all points are aggregated into a single value, the Grey interconnect degree \(r_{i}\), which makes an overall comparison feasible. It is calculated as follows, where \(w_{k}^{G}\) is the weight assigned to index k.

    $$ r_{i} = \sum\nolimits_{k = 1}^{n} w_{k}^{G}\, \xi_{i}(k),\;\; i = 1, 2, \ldots, m $$
    (6)
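
Steps (3) and (4) above can be sketched in a few lines. The scores below are invented and do not reproduce Tables 10 or 11. The function names are hypothetical, and equal weights are assumed whenever no weight vector is supplied.

```python
def grey_relational_coefficients(reference, sequences, rho=0.5):
    """Grey relational coefficients xi_i(k) following Eq. (5).

    reference: list x_0(k) of ideal scores, one per index.
    sequences: list of comparison sequences x_i(k), one per plan.
    """
    deltas = [[abs(r - x) for r, x in zip(reference, seq)] for seq in sequences]
    d_min = min(min(row) for row in deltas)  # min over plans and indexes
    d_max = max(max(row) for row in deltas)  # max over plans and indexes
    if d_max == 0:  # every plan matches the reference exactly
        return [[1.0] * len(reference) for _ in sequences]
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row]
            for row in deltas]

def grey_degree(coefficients, weights=None):
    """Weighted sum of the coefficients per plan, Eq. (6)."""
    n = len(coefficients[0])
    if weights is None:  # assumption: equal weights when none are given
        weights = [1.0 / n] * n
    return [sum(w * c for w, c in zip(weights, row)) for row in coefficients]

# Hypothetical scores of three plans on three indexes; reference = best score.
reference = [7, 7, 7]
plans = [[6, 5, 7], [4, 6, 5], [7, 7, 6]]

xi = grey_relational_coefficients(reference, plans, rho=0.5)
plain = grey_degree(xi)
weighted = grey_degree(xi, weights=[0.5, 0.3, 0.2])
best = max(range(len(plans)), key=lambda i: weighted[i])
```

With these made-up numbers the third plan has the largest degree under both weightings, so it would be selected. Passing the AHP weights as `weights` gives the weighted variant of the score.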

Design plans are compared with one another according to their Grey interconnect degree. If \(r_{i} \ge r_{j}\), plan \(x_{i}\) is at least as good as plan \(x_{j}\). In this way, we compared the three design plans in this study and selected the one with the largest Grey interconnect degree as the best plan. The Grey interconnect degree scores of the three design plans are shown in Table 10.

Table 10. Score of the Grey interconnect degree

The weights of the evaluation indexes obtained above were then applied, and the weighted Grey interconnect degree scores of the three design plans were calculated, as shown in Table 11.

Table 11. Score of Weighted Grey interconnect degree

6 Conclusions and Limitations

  1. (1)

    This paper discusses a method for constructing a system that evaluates the user experience of novice tutorials in battle royale games during game design. The method can help designers understand users’ degree of satisfaction in a systematic way during game development and select the best design plan. It also provides a reference for the direction of further iterations. For the overall evaluation of the design plans, the analytic hierarchy process and grey relational analysis were applied to calculate the weighted interconnect degree between each evaluation target and the ideal plan, so that the plan with the highest interconnect degree could be screened out. This supports the auxiliary role of grey AHP theory in evaluating game experience. The ideas and methods used here to build the evaluation model can be extended to game systems other than the novice tutorial and to other kinds of games, which is of practical significance for game design.

  2. (2)

    Due to limited time and resources, only the novice tutorials of three recently popular battle royale games were selected for evaluation in this study. Follow-up research may adopt more samples, or design plans from our own game projects, to improve the accuracy of the model. As for research methods, this study used online questionnaires to conduct the perceptual experiment because of physical constraints. The study team will consider using eye-tracking equipment or other perceptual measurement methods in further research.