1 Introduction

Shield tunnelling is increasingly being utilised in infrastructure development owing to its efficiency and environmental benefits. However, the construction process involved is complex and poses various safety risks, particularly when deployed in soil-rock mixed strata [7, 8]. Geological uncertainties, potential ground settlement, groundwater influx, shield cutter wear, shield attitude control, and ground condition variations are the main risks associated with soil-rock mixed strata [32]. These risks must be evaluated judiciously to ensure the stability and safety of the tunnelling process [2, 29]. A safety risk assessment must therefore be performed to identify the potential risks in these strata and to ensure safe and efficient construction.

The risk assessment of shield construction has progressed considerably in recent years. However, existing research primarily focuses on individual risk sources in shield construction, such as collapse [9, 12, 19, 28], water inflow and inrush [26, 30], tool wear [5], and delay risk [13], whereas complex, interacting risk sources are rarely investigated and evaluated comprehensively. Based on our literature review, numerous theories and methods have been proposed to address the risk of shield construction, including empirical [15], analytical [33], experimental [16], and numerical methods [14, 22, 35]. However, many current risk assessment methods rely on experience [11, 12, 28]. Meanwhile, owing to the development of artificial intelligence (AI), significant efforts have been expended to develop AI-based risk assessment methods [10, 20, 25, 31], where mining information from construction and monitoring data is essential [21]. Nevertheless, the data used in existing studies are incomplete and lack systematicity. For example, Zhou et al. [36] used only tunnelling machine data from a shield acquisition system for clustering, whereas Ge et al. [6] used data from both geological conditions and shield tunnelling parameters. Compared with construction data [1], monitoring data (e.g., settlement monitoring data) are more accessible and can reflect the environmental effect of tunnel construction. Furthermore, vibration data can be acquired to reflect the uncertainty of geological conditions [23]. Uncertainty or fuzziness in the data obtained during shield construction is inevitable. In this regard, fuzzy C-means (FCM) clustering is an effective tool for addressing uncertainty or fuzziness in data [4], thus rendering it a promising option for risk assessment in shield tunnel construction, where data may be complex and not clearly defined.

The objective of this study is to present an integrated risk assessment method for shield tunnelling in soil-rock mixed strata while considering shield tunnelling parameters and monitoring data. A novel model using fuzzy set pair analysis (FSPA) and FCM clustering is developed to analyse and evaluate risks during shield tunnelling based on construction data. Eight parameters of tunnelling machine data are selected. Additionally, monitoring data pertaining to ground settlement and vibrations at the tunnel face are utilised. The mutual information (MI) method is employed for feature selection, and a risk assessment index system is established by combining practical engineering experience and MI scores. Based on construction data, the connection numbers are calculated using the FSPA method, and the criteria importance through intercriteria correlation (CRITIC) method is adopted to weight the indicators. Subsequently, the results are classified by FCM clustering with a modified objective function to obtain clustering results that incorporate the importance of the risk indicators, such that the risk level of each ring can be derived in real time. This novel model is a practical option for guiding engineering-risk decisions during tunnel construction. The key innovations of this study include the following: i) the development of an integrated risk assessment method based on FSPA and FCM, including the consideration of raw data pre-processing and risk level clustering prediction; ii) a modified objective function in the FCM algorithm to consider the risk factor distribution; and iii) a scientific index system based on data pertaining to the tunnelling machine, deformation, and vibration, with feature selection according to the MI method and practical tunnelling engineering.

2 Materials and methodology

2.1 Risk evaluation framework

Figure 1 illustrates the framework of the proposed novel model for risk evaluation, which can be described in three stages involving six steps: (i) acquisition of construction and monitoring data, (ii) establishment of an index system, (iii) weight assignments for the indices, (iv) calculation of the connection number, (v) FCM clustering analysis considering the weight, and (vi) determination of risk levels. Steps (i) and (ii) are presented in the case study (Sect. 3), and the methods used in steps (ii) to (v) are introduced in detail in Sects. 2.3 to 2.6.

Fig. 1 Flowchart for risk assessment of shield tunnelling in soil-rock mixed strata

2.2 Data sources

Two types of data are involved in this study: internal tunnelling performance data, i.e. the shield tunnelling parameters, such as the thrust force, tunnelling speed, and cutter head torque; and external environmental data, i.e. monitoring parameters, including deformation and vibration information. The internal tunnelling performance parameters were obtained by the shield machine data acquisition system, whereas the external environmental data were available from monitoring equipment. Specifically, deformation information involving ground settlement, building settlement, vault settlement, and convergence was measured using electronic levels and total stations. Moreover, vibration data were acquired using accelerometers installed on the back of the soil chamber wall inside the shield [23].

2.3 Feature selection (MI method)

Feature selection is crucial in refining datasets for analysis by identifying the most relevant attributes. This strategic process can enhance the performance and interpretability of the risk assessment model, while concurrently reducing the computational complexity of extensive datasets. MI algorithms are widely used for feature selection in data mining. In this algorithm, the dependence between two random variables is measured based on the concept of entropy derived from information theory [27]. The MI method is suitable for high-dimensional datasets because it can capture complex feature dependencies. Feature selection is performed by calculating the MI score of each feature relative to the target variable. The MI score is calculated as follows:

$$MI(X;Y) = \sum\limits_{x,y} p(x,y)\log \frac{p(x,y)}{p(x)p(y)}$$
(1)

where X and Y are the sets of values for two variables; p(x, y) is the joint probability distribution of X and Y; and p(x) and p(y) are the marginal probability distributions of X and Y, respectively. The MI score is non-negative and equals zero when the two variables are independent, with higher values indicating stronger associations between the two variables.
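As a minimal illustration of Eq. 1, the sketch below estimates the MI score for two discrete variables by replacing the probabilities with relative frequencies. The function name `mutual_information` and the toy feature and label lists are hypothetical and are not taken from the case study; continuous tunnelling parameters would first need to be discretised or handled by a different estimator.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Eq. 1 with probabilities estimated as relative frequencies (natural log)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical discretised feature and two hypothetical target labels.
feature = ["low", "low", "high", "high"]
label_dependent = [0, 0, 1, 1]     # fully determined by the feature
label_independent = [0, 1, 0, 1]   # unrelated to the feature

mi_dep = mutual_information(feature, label_dependent)    # equals log(2)
mi_ind = mutual_information(feature, label_independent)  # equals 0
print(mi_dep, mi_ind)
```

A feature that fully determines the label attains the largest score available for a binary label, whereas an unrelated feature scores zero, which is the basis for ranking the features.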

2.4 Weights assignment of risk factor (CRITIC method)

CRITIC is an objective weighting method [3] that determines weights based on the comparative strength and the conflict degree, as well as comprehensively considers the correlation and variability of indicators. The main steps of CRITIC are as follows: i. Standardisation of indicator data (Eq. 2); ii. calculation of comparative strength, where the standard deviation S is utilised to illustrate the volatility (the greater the volatility, the higher is the weight); iii. calculation of conflict degree, where the correlation coefficient R is used to indicate the conflict (for two indicators with a strong positive correlation between them, a lower level of conflict signifies a lower weight); iv. determination of the objective weights (Eq. 3).

$$x^{\prime} = \begin{cases} \dfrac{x - x_{\min}}{x_{\max} - x_{\min}} & (\text{positive}) \\ \dfrac{x_{\max} - x}{x_{\max} - x_{\min}} & (\text{negative}) \end{cases}$$
(2)
$$W_{k} = \frac{S_{k} \times R_{k}}{\sum\limits_{k = 1}^{n} (S_{k} \times R_{k})}$$
(3)
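The four CRITIC steps can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the code used in this study: the conflict degree is taken as R_k = Σ_l (1 − r_kl), with r_kl the Pearson correlation between indicators k and l, which is the commonly used CRITIC form but is not written out above; the function name `critic_weights` and the toy data are hypothetical.

```python
import math

def critic_weights(data, positive):
    """data: rows are samples (rings), columns are indicators.
    positive[k] selects the first (True) or second (False) branch of Eq. 2."""
    n, p = len(data), len(data[0])
    cols = [[row[k] for row in data] for k in range(p)]
    # Step i: min-max standardisation (Eq. 2)
    norm = []
    for k, col in enumerate(cols):
        lo, hi = min(col), max(col)
        span = hi - lo  # a constant indicator (span == 0) is not handled here
        if positive[k]:
            norm.append([(v - lo) / span for v in col])
        else:
            norm.append([(hi - v) / span for v in col])
    mean = [sum(c) / n for c in norm]
    # Step ii: comparative strength S_k (sample standard deviation)
    S = [math.sqrt(sum((v - mean[k]) ** 2 for v in norm[k]) / (n - 1))
         for k in range(p)]

    def pearson(a, b):
        cov = sum((norm[a][t] - mean[a]) * (norm[b][t] - mean[b])
                  for t in range(n)) / (n - 1)
        return cov / (S[a] * S[b])

    # Step iii: conflict degree R_k = sum_l (1 - r_kl)  (assumed form)
    R = [sum(1.0 - pearson(k, l) for l in range(p)) for k in range(p)]
    # Step iv: objective weights (Eq. 3)
    info = [S[k] * R[k] for k in range(p)]
    total = sum(info)
    return [v / total for v in info]

# Hypothetical data: indicators 0 and 1 are perfectly correlated.
data = [
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 1.0],
]
weights = critic_weights(data, positive=[True, True, False])
print(weights)
```

In this toy case the two perfectly correlated indicators receive equal weights, and the third indicator, which conflicts more with the others and varies more, receives the largest weight.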

2.5 Connection number distribution (FSPA method)

To analyse the uncertain relationship between the two sets in a set pair, set pair analysis [34] is performed. This method addresses uncertain systems based on the concept of a connection number μ:

$$\mu = a + b_{1} i_{1} + b_{2} i_{2} + \cdots + b_{n - 2} i_{n - 2} + cj$$
(4)

where a + b1 + b2 + … + bn-2 + c = 1; a, b, and c are the degrees of identity, difference, and opposition of the set pair, respectively; b1, b2, …, bn-2 are the difference degree components, which can describe the ambiguity in two sets; i1, i2, …, in-2 are the difference degree component coefficients, whose value range is [−1, 1]; and j is the coefficient of opposition, whose value is typically set as −1. Fuzzy numbers [17, 18] are used to analyse the uncertainty of the difference coefficient i. For the risk assessment problem, μ reflects the relationship between the measured value and the safety standard interval.
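To make Eq. 4 concrete, the following sketch evaluates a connection number for given components. It is a simplification for illustration only: the difference coefficients are fixed at evenly spaced values (an assumption), whereas this study treats them with fuzzy numbers; the function name `connection_number` and all numerical components are hypothetical.

```python
def connection_number(a, b, c, i_coeffs, j=-1.0):
    """Evaluate Eq. 4: mu = a + sum(b_k * i_k) + c * j."""
    if len(b) != len(i_coeffs):
        raise ValueError("need one coefficient i_k per difference component b_k")
    if abs(a + sum(b) + c - 1.0) > 1e-9:
        raise ValueError("a + b_1 + ... + b_(n-2) + c must equal 1")
    if any(not -1.0 <= i <= 1.0 for i in i_coeffs):
        raise ValueError("each i_k must lie in [-1, 1]")
    return a + sum(bk * ik for bk, ik in zip(b, i_coeffs)) + c * j

# Hypothetical five-element case (n = 5, so three difference components).
# Evenly spaced coefficients are an assumption made for this sketch.
i_even = [0.5, 0.0, -0.5]

mu_identity = connection_number(1.0, [0.0, 0.0, 0.0], 0.0, i_even)    # 1.0
mu_mixed = connection_number(0.2, [0.5, 0.2, 0.1], 0.0, i_even)       # 0.4
mu_opposition = connection_number(0.0, [0.0, 0.0, 0.0], 1.0, i_even)  # -1.0
print(mu_identity, mu_mixed, mu_opposition)
```

With j = −1 and every i_k in [−1, 1], μ always lies in [−1, 1], with full identity mapping to 1 and full opposition to −1.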

2.6 Risk level determination (FCM method)

FCM, which was proposed by Dunn (1973) and Bezdek (1981), is a widely used fuzzy clustering algorithm based on the fuzzy set theory and k-means algorithm. The membership degree is employed in the FCM method to indicate the extent to which a sample belongs to a specific cluster [24]. The objective function is the sum, over all clusters and samples, of the squared distance to the cluster centre weighted by the membership degree raised to the power m. Assuming that the data set X (x1, x2, …, xn) is partitioned into c clusters, and the membership degree of sample xj with respect to cluster centre ci is uij, the objective function can be expressed as:

$$J = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{n} {u_{ij}^{m} } } \left\| {x_{j} - c_{i} } \right\|^{2} = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{n} {u_{ij}^{m} } } d_{ij}^{2}$$
(5)

where m is the membership factor, which is generally set as 2; and dij is the Euclidean distance from xj to the cluster centre ci. Additionally, each sample xj comprises p features, and the weight W is combined to obtain the distance indicator Dij, which is expressed as:

$$D_{ij} = \sqrt {\sum\limits_{k = 1}^{p} {W_{k} \left( {x_{jk} - c_{ik} } \right)^{2} } }$$
(6)

Using Dij instead of the original dij and substituting it into Eq. 5, the objective function that considers the weights can be expressed as:

$$J = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{n} {\sum\limits_{k = 1}^{p} {u_{ij}^{m} \cdot W_{k} \left( {x_{jk} - c_{ik} } \right)^{2} } } }$$
(7)

The objective function is optimised using an iterative method to obtain the membership degree. Thus, the optimal cluster centre ci and fuzzy membership matrix can be output to achieve an automatic classification of the samples based on the maximum membership principle.
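The iterative optimisation of Eq. 7 can be sketched as below. This is a minimal illustration and not the implementation used in this study: it assumes the standard alternating FCM updates with d_ij replaced by D_ij from Eq. 6, random initial memberships, and a hypothetical function name `weighted_fcm` and toy data. Because each W_k multiplies a single feature dimension, it cancels in the centre update, so the weights act only through the membership update.

```python
import random

def weighted_fcm(X, W, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Minimise Eq. 7 by alternating updates.
    X: n samples x p features; W: p feature weights.
    Returns (centres, U) with U[i][j] the membership of sample j in cluster i."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Random initial memberships; each sample's memberships sum to 1.
    U = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= s
    centres = []
    for _ in range(iters):
        # Centre update: mean of the samples weighted by u_ij^m.
        centres = []
        for i in range(c):
            um = [U[i][j] ** m for j in range(n)]
            tot = sum(um)
            centres.append([sum(um[j] * X[j][k] for j in range(n)) / tot
                            for k in range(p)])
        # Squared weighted distances D_ij^2 (Eq. 6), floored to avoid 1/0.
        D2 = [[max(sum(W[k] * (X[j][k] - centres[i][k]) ** 2
                       for k in range(p)), 1e-12)
               for j in range(n)] for i in range(c)]
        # Membership update: u_ij proportional to (D_ij^2)^(-1/(m-1)).
        shift = 0.0
        for j in range(n):
            inv = [D2[i][j] ** (-1.0 / (m - 1.0)) for i in range(c)]
            s = sum(inv)
            for i in range(c):
                new = inv[i] / s
                shift = max(shift, abs(new - U[i][j]))
                U[i][j] = new
        if shift < tol:
            break
    return centres, U

# Hypothetical connection numbers for six samples and two indicators.
X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.85],
     [-0.7, -0.8], [-0.8, -0.7], [-0.75, -0.9]]
W = [0.6, 0.4]
centres, U = weighted_fcm(X, W, c=2)
# Maximum membership principle: each sample takes its highest-membership cluster.
labels = [max(range(2), key=lambda i: U[i][j]) for j in range(len(X))]
print(labels)
```

The cluster indices returned by such a procedure are arbitrary, so they still need to be mapped to risk levels afterwards.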

3 Case study

3.1 Project overview

The proposed method was applied to a case study conducted in Guangzhou, China. The construction of a 21-ring shield tunnelling interval on the left line of the Guangzhou–Foshan Intercity Railway was investigated, as illustrated in Fig. 2. Specifically, Rings 1560 to 1580 (ZDK 31653.546–ZDK 31691.42) were investigated, where the shield was tunnelling in soil-rock mixed strata. For more details regarding this tunnel project, please refer to Zhang et al. [32] and Shen et al. [23].

Fig. 2 Location of construction site

The geological conditions in the study area were extremely challenging and were characterised by soft and hard unevenness, which rendered construction susceptible to large ground settlements, cutter wear, and low excavation efficiency. Figure 3 shows the tunnelling progress from 5 June 2022, during which numerous challenges were encountered; the tunnelling became even less efficient from 24 June 2022, which coincided with the start of construction on Ring 1572. As shown in Fig. 3, ten soil chamber openings under pressure were carried out to accommodate tool changes during the construction of these 21 rings; during the construction of Ring 1578 alone, the chamber was opened twice. The ground was reinforced using advanced grouting during the construction of Rings 1574 and 1579.

Fig. 3 Tunnelling process during research interval from Ring 1560 to 1580

3.2 Data acquisition and pre-processing

In this study, Rings 1560 to 1580 on the left line of the shield tunnel were investigated. The database of the proposed model comprised shield tunnelling parameters, deformation information, and vibration data. The shield tunnelling parameters, which were manually regulated by experienced shield operators, were acquired directly through the shield machine data acquisition system. Figure 4 shows the variation in eight shield tunnelling parameters: the total thrust force (F), tunnelling speed (V), cutter head rotation speed (RSP), cutter head torque (T), soil pressure (Ps), grouting pressure (Pg), shield horizontal attitude (Trx), and shield vertical attitude (Trv). These parameters can provide insights into the performance and efficiency of the shield machine and offer a comprehensive characterisation of the challenges encountered in tunnelling. For instance, as shown in Fig. 4a, the range of hard rock excavation broadened from Ring 1568 to 1570, thus resulting in higher values of F and T for overcoming the ground resistance; the RSP reached its minimum value at Ring 1579 (see Fig. 4b), implying that the cutting ability of the cutter head was weakest at this point and that the cutter head parameter settings merit review.

Fig. 4 Shield tunnelling parameters data from Ring 1560 to 1580: a F and T; b V and RSP; c Ps and Pg; d Trx and Trv

Deformation data reflect the effect of the construction on the surrounding environment and assembled tunnels. The data comprise five parameters: the cumulative settlement above the tunnel face (TFSsum, mm), maximum ground settlement rate (GSRmax, mm/d), maximum building settlement rate (BSRmax, mm/d), maximum vault settlement rate (VSRmax, mm/d), and maximum clearance convergence rate (CCRmax, mm/d). Figure 5 presents the variations in these five parameters, which provide valuable information regarding changes in the environment and structures during tunnelling. Notably, as shown in Fig. 5a, a clear transition occurred between Rings 1571 and 1572, where TFSsum shifted from approximately +10 mm to −50 mm, thus signifying a transition from ground uplift to ground settlement. Such a transformation can indicate the presence of an underlying geological structure or condition that affects the interaction between the tunnelling process and the surrounding ground, such as a geological fault or a change in the geological strata. In risk analysis, this change serves as a clear reminder of the effect of construction activities on both environmental safety and structural integrity.

Fig. 5 Deformation data from Ring 1560 to 1580: a TFSsum; b GSRmax, BSRmax, VSRmax, and CCRmax

Vibration data are closely related to the geological characteristics of the tunnelling environment. Vibration is generated by the cutting action during tunnelling as it interacts with the surrounding geotechnical materials, and its data are acquired using accelerometers positioned on the back of the soil chamber. Two types of vibration indicators were obtained from the accelerometers installed above and below the soil chamber: upper and lower. The effective value of the vibration, which is also known as the root mean square (RMS), signifies the energy strength and stability of the vibration signal. It is calculated as follows:

$$RMS = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {x_{i}^{2} } }$$
(8)

In addition, the vibration margin (CL), which is the ratio of the absolute value of the peak to the square root amplitude (Eq. 9), reflects the shock characteristics of a vibration signal. These two parameters are shown in Fig. 6. As illustrated in Fig. 6a, RMS data from Ring 1572 to 1575 show an increase in energy intensity, thus indicating that this particular excavation phase can feature challenging geological conditions accompanied by more substantial cutter wear. A comparison between the datasets acquired from the upper and lower accelerometers showed minimal divergence (see Fig. 6), except for a clear fluctuation in the CL value at Ring 1563. This reflects the greater effect of the surrounding geotechnical materials on the lower cutter head at this position.

$$CL = \frac{\max (|x|)}{\left( \frac{1}{N}\sum\limits_{i = 1}^{N} \sqrt{\left| x_{i} \right|} \right)^{2}}$$
(9)

Fig. 6 Vibration data from Ring 1560 to 1580: a RMS and b CL
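The two vibration indicators in Eqs. 8 and 9 can be computed directly from a sampled signal, as in the short sketch below. The function names and the two toy signals are hypothetical and serve only to contrast a steady signal with an impulsive one.

```python
import math

def rms(x):
    """Root mean square of the signal (Eq. 8)."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def vibration_margin(x):
    """Peak absolute value divided by the square root amplitude (Eq. 9)."""
    square_root_amplitude = (sum(math.sqrt(abs(v)) for v in x) / len(x)) ** 2
    return max(abs(v) for v in x) / square_root_amplitude

# Hypothetical signals: constant amplitude versus a single shock.
steady = [1.0, -1.0, 1.0, -1.0]
spiky = [0.0, 0.0, 0.0, 4.0]

print(rms(steady), vibration_margin(steady))  # 1.0 1.0
print(rms(spiky), vibration_margin(spiky))    # 2.0 16.0
```

The impulsive signal only doubles the RMS but raises CL sixteenfold, which illustrates why CL is the more sensitive indicator of shock characteristics.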

3.3 Establishment of risk assessment model

The proposed model was developed to conduct a comprehensive evaluation of the overall risk inherent in the tunnelling process, which encompasses various risks. This approach diverges from conventional isolated risk assessment practices in that it strives to capture the intricate relationship between multiple risk sources that collectively affect construction safety and stability. The model provides a new assessment index system comprising the selected features. In this study, the MI method was utilised to perform feature selection on a dataset comprising the 17 features obtained in Sect. 3.2. To ensure the stability and reliability of feature selection, a cross-validation strategy was employed, in which the dataset was split five times using the StratifiedShuffleSplit class from the scikit-learn library. The MI scores between each feature and the target variable were computed, and the top nine features were selected. As shown in Table 1, an index system of risk assessment was built based on the selected features and the field engineering situation.

Table 1 Index system of risk assessment

The risk evaluation grade of the shield construction was classified into five levels: safe (Level I), relatively safe (Level II), low-risk (Level III), relatively high-risk (Level IV), and high-risk (Level V). Referring to previous engineering experience, the nine risk assessment indicators were quantified based on the conditions of the current project, and the results are listed in Table 2. Specifically, the classification criteria for groups C11–C14 were inspired by the foundational risk assessment standards outlined in previous studies [17]. These criteria were then adjusted to align with the mixed geological conditions by incorporating relevant findings from the literature [8] and engineering insights from the case study [32]. Similarly, the classification standards for groups C21–C23 were established based on an amalgamation of empirical observations from the case study site and the literature [32]. Regarding groups C31–C32, the vibration data for Rings 1557, 1558, 1562, 1563, 1570, and 1578 from the case study were analysed comprehensively in a previous study [23]. When integrated with the practical field conditions, these findings can contribute to the formulation of vibration-based classification standards. In this case study, determining the risk level is an involved process that entails a holistic evaluation of the nine indicators above. This underscores the importance of the weights assigned to each indicator in the assessment framework.

Table 2 Classification standard for evaluation index of shield tunnel construction

3.4 Risk level prediction

After developing the assessment model, the weight of each evaluation indicator was determined. In accordance with the proposed methodology, the CRITIC method, which employs a meticulous evaluation process to determine weights based on both the comparative strength and degree of conflict, was employed to assign weights. The standard deviation (S) and correlation coefficient (R) were calculated for each indicator to capture the variability degree and quantify the conflicts between the indicator pairs. Using Eq. 3, the study arrived at a weight vector of nine indicators for the index system, which is represented as W = (0.121, 0.092, 0.104, 0.077, 0.131, 0.135, 0.124, 0.116, 0.100). The weights assigned via this method reflect the significance of each indicator in the overall assessment process, providing a basis for objective evaluation and decision making.

In addition, the FSPA method was utilised to compute the connection number, and the results are presented in Fig. 7. The connection numbers were then fed into the FCM for clustering, with the weights incorporated into the distances, to obtain the membership degree of each ring for each risk level. In this case study, c = 5 (i = 1, 2, …, 5), n = 21 (j = 1, 2, …, 21), and p = 9 (k = 1, 2, …, 9). Nine features were considered in the calculation of Dij, namely, the distance between xj (xj1, xj2, …, xj9) and ci (ci1, ci2, …, ci9) in a nine-dimensional space. The membership degree was subsequently calculated, as illustrated in Fig. 8.

Fig. 7 Connection number of nine indexes from Ring 1560 to 1580

Fig. 8 Membership degree of Ring 1560 to 1580 for each risk level: a Level I; b Level II; c Level III; d Level IV; e Level V

The membership degree applied in the clustering analysis provides a comprehensive assessment of the distribution of risk levels across the tunnel rings. The membership degrees of Rings 1560 to 1580 for risk Levels I to V are shown in Fig. 8a to e, respectively. Based on the principle of maximum membership, the risk level of each ring was determined, as denoted by the red circles in Fig. 8. For instance, the membership degrees of Rings 1565 and 1569 for Level I risk reached peak values of 0.30 and 0.77, respectively, thus indicating their alignment with safety standards (Level I), as illustrated in Fig. 8a. As shown in Fig. 8c, the maximum membership of the eight rings (Ring 1562, 1564, 1566, 1568, 1570, 1578, and 1580) was assigned to Level III risk, which constituted the largest proportion. Notably, Ring 1575 possessed a membership degree of 0.85 for Level V risk, thus signifying a significantly high risk level and considerable construction challenges. Figure 8 thus illustrates the distribution of risk levels across the tunnel rings, which provides valuable insights into the overall risk of the construction process.
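The maximum membership principle amounts to selecting, for each ring, the level with the largest membership degree. A minimal sketch is given below; the function name `assign_level` is hypothetical, and in the example vector only the value 0.85 for Level V (Ring 1575) is taken from the text, whereas the remaining entries are made-up placeholders chosen so that the vector sums to one. The vector is assumed to be ordered from Level I to Level V already.

```python
LEVELS = ["I", "II", "III", "IV", "V"]

def assign_level(memberships):
    """Return the risk level with the largest membership degree."""
    if len(memberships) != len(LEVELS):
        raise ValueError("expected one membership degree per risk level")
    best = max(range(len(LEVELS)), key=lambda i: memberships[i])
    return LEVELS[best]

# Only the last entry (0.85, Ring 1575, Level V) comes from the text;
# the other four values are hypothetical placeholders.
ring_1575 = [0.02, 0.03, 0.04, 0.06, 0.85]
level = assign_level(ring_1575)
print(level)  # V
```

A low peak value, such as the 0.30 reported for Ring 1565, still determines the assigned level but indicates a less clear-cut classification than a peak of 0.85.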

4 Results and discussions

4.1 Risk level of tunnelling

The results of the risk-level assessment using the proposed model are summarised in Fig. 10a, which indicates a high level of risk fluctuation from Ring 1560 to 1580. From Ring 1572 onwards, the risk of tunnelling was significantly higher. Combining the results shown in Fig. 8, Rings 1565 and 1569 exhibited the highest membership degrees at Level I, indicating the lowest risk levels for these two rings. By contrast, Rings 1575 and 1576 indicated the highest degree of membership at Level V, thus highlighting the necessity for appropriate risk management strategies to mitigate the potential hazards associated with these rings. The remaining rings showed varying degrees of membership at Levels II, III, and IV, thus reflecting the different levels of risk for each ring. These membership degrees translate into a distinct risk profile for each ring, thereby reflecting the complex relationship among the geological, structural, and construction factors. The risk assessment results can serve as valuable references for tunnel risk management, facilitating safe and efficient construction.

4.2 Risk identification

Risk identification is essential for effective risk management. In this study, the risk levels of the nine sub-indicators were analysed to comprehensively assess the risk of each tunnel ring, which is crucial in risk identification. Based on the connection number shown in Fig. 7, the risk level of each indicator for each ring can be calculated, as illustrated in Fig. 9.

Fig. 9 Construction risk of nine indexes from Ring 1560 to 1580

As shown in Fig. 9, C11 and C14 exhibited consistently high risk levels from Ring 1560 to 1580. Therefore, the shield tunnelling parameters (B1), particularly the thrust force and earth pressure, should be regulated closely during the construction of this project. Indicator C21 in the deformation parameter (B2) demands specific attention as its risk level changes abruptly from I to V between Rings 1571 and 1572. Similarly, the vibration parameter (B3) requires additional attention during risk assessment, owing to its significance for tunnelling in soil-rock mixed strata.

Specifically, for Ring 1575, which exhibited an overall risk level of V, five indicators (C11, C14, C21, C23, and C31) reached a risk level of at least IV. Notably, C21 exhibited a risk level of V, reflecting the significant effect of shield tunnelling on the ground surface. The high settlement data above the tunnel face at this location, as depicted in Fig. 5a, support this finding. By contrast, for Ring 1576, which shares a similarly high risk level, the indicators with the highest risk levels were C14, C21, and C32. This suggests that the geological conditions at this location may be challenging, thus resulting in higher earth pressure values and vibration effects.

4.3 Model reliability

To assess the validity of the proposed method, the results were compared with those of three other methods, as shown in Fig. 10. The three methods were as follows: Method 1, which uses a modified FCM algorithm with weights to analyse the raw data; Method 2, which combines the original FCM and FSPA methods and uses the original FCM to analyse the connection numbers; and Method 3, which directly uses the original FCM method to analyse the raw data. Notably, the proposed model outperformed the other methods, with a risk level closest to the field situation. Among the other methods, Method 1 showed considerable differences in the raw data analysis, whereas Method 2 did not consider the indicator weights. In Method 3, the original FCM was applied directly to the raw data, which yielded relatively inconsistent results. As shown in Fig. 10, all four methods indicated a similar upward trend in risk level. In the following, we demonstrate that, compared with the other methods, the proposed method is superior in terms of its accuracy and volatility, which renders it more sensitive to risk identification.

Fig. 10 Construction risk levels from Ring 1560 to 1580 using four methods: a proposed method; b Method 1 (modified FCM); c Method 2 (FCM with FSPA); d Method 3 (FCM)

To verify the reliability and feasibility of the proposed model, three widely used evaluation metrics were employed: the silhouette coefficient, Calinski–Harabasz index, and Davies–Bouldin index. These three metrics measure, respectively, the similarity of each data point to its own cluster in comparison to other clusters, the ratio of dispersion between and within clusters, and the average similarity between each cluster and its most similar cluster. Higher values of the first two metrics signify clearer clustering results and better separation between the clusters. Conversely, a lower Davies–Bouldin index indicates better clustering results. As presented in Table 3, the evaluation results revealed favourable metric values for the proposed model, thus indicating its high reliability and feasibility for risk assessment in tunnel construction. Notably, one should not rely solely on internal evaluation indicators. For instance, if the normalisation of the raw data is not performed, then Method 3 yields a silhouette coefficient of 0.887; however, the evaluation result may not accurately reflect the actual risk situation. Thus, the reliability of the model results must be assessed with respect to actual construction risk situations.

Table 3 Evaluation metrics results for the four clustering methods
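For reference, the silhouette coefficient can be sketched in a few lines. The version below assumes the standard definition with Euclidean distances and hard cluster labels, so it is not necessarily identical to the routine used to produce Table 3; the function name and toy points are hypothetical.

```python
import math

def silhouette_coefficient(X, labels):
    """Mean silhouette value over all samples (Euclidean distance).
    Samples in singleton clusters contribute 0, following the usual convention."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for j, point in enumerate(X):
        own = clusters[labels[j]]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, X[t]) for t in own if t != j) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(point, X[t]) for t in members) / len(members)
                for lab, members in clusters.items() if lab != labels[j])
        total += (b - a) / max(a, b)
    return total / len(X)

# Hypothetical points forming two well-separated pairs.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette_coefficient(X, [0, 0, 1, 1])  # pairs kept together
bad = silhouette_coefficient(X, [0, 1, 0, 1])   # pairs split apart
print(good, bad)
```

Because the score depends only on the geometry of the input space, a high value obtained on unnormalised raw data can be dominated by the indicator with the largest numerical range, which is consistent with the caution expressed above.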

In practice, faster and more efficient construction was observed for Rings 1560 to 1571 (see Fig. 3), which is consistent with the lower risk levels assessed for these rings. However, the construction speed does not accurately reflect the level of construction risk. For example, Ring 1570 was constructed rapidly, but exhibited a risk level of III. Rings 1575 and 1576, which exhibited a risk level of V, required a week to excavate but demonstrated poor settlement control. For Ring 1574, whose risk level was IV, two weeks were required to complete the construction because of the application of advanced grouting and the extended waiting time for concrete consolidation. In general, the assessment results of the case study agreed well with the actual engineering situation.

4.4 Model limitation

Despite its advantages, the proposed model presents some limitations. In this study, the clustering of construction data into five classes may have resulted in different classification labels every time the program was executed. Therefore, additional research is required to associate these five classes with their corresponding risk levels. In practical engineering, risk-level clustering can be learned after determining the risk of the construction area, which allows the model to achieve a more direct risk classification. Additionally, the risk level was generated for each tunnel ring, as the settlement data were obtained from daily monitoring reports. However, considering that the tunnelling parameters and vibration data were recorded every minute, continuous settlement data should be collected in the future to obtain more data for time-series analysis. Furthermore, a feature-selection algorithm was adopted during the construction of the assessment index system. The removed features may be less influential, or strongly correlated with other features, and are therefore not necessary for the clustering methods. In other numerical risk assessment methods, multiple input parameters enable a more comprehensive consideration of risk. Although the application of the proposed model reveals its effectiveness in the case study, the adaptability and generalisation of its results across diverse tunnelling projects warrant further investigation. Future validations employing datasets from various projects would provide a more comprehensive verification of the research outcomes.

5 Conclusions

A novel FSPA-FCM-based model was developed in this study for risk assessment during shield tunnelling in soil-rock mixed strata. The following conclusions can be drawn:

1) An integrated risk assessment method based on FSPA and FCM was developed, in which FSPA transforms the raw data into connection numbers within [−1, 1] for each risk level and FCM then clusters the risk levels. This method not only solves the problem wherein the membership degree is affected by the value domain when predicting risk levels using AI clustering algorithms, but also overcomes the disadvantage of conventional data normalisation, which does not consider the physical risk significance.

2) The FCM algorithm was modified to incorporate the weights of individual indicators with the original Euclidean distance in the objective function, thus achieving balance among different features and improving clustering accuracy. Owing to this modification, the effects of different features were taken into account, resulting in a more realistic clustering after processing the original data.

3) A scientific and systematic risk assessment index system was developed. Tunnelling machine, deformation, and vibration data were utilised to comprehensively consider the construction, environmental, and geological conditions, respectively. Moreover, to ensure the scientific nature of the index system, feature selection was performed using the MI algorithm in combination with the actual engineering situation.

4) The proposed method was applied to Rings 1560–1580 in Guangzhou, China. The results showed that the construction from Ring 1572 onwards exhibited a significantly higher risk, with a high level of risk fluctuation from Ring 1560 to 1580. Findings from risk identification suggest that the thrust force and earth pressure must be regulated carefully during tunnelling. The assessment results of the case agreed well with the actual engineering situation, thus indicating that the proposed model can provide a promising solution for risk assessment during shield tunnelling in soil-rock mixed strata.