Introduction

With the rapid development of society and economy, geological hazards caused by human engineering construction have increased significantly (Froude and Petley 2018; Zhao et al. 2021). Geological hazards are characterized by complexity, suddenness and inevitability (Yang and Hu 2019). Geological hazards in China are widely distributed and occur frequently, bringing huge economic losses every year. Therefore, it is of great significance to construct a more accurate geological hazard susceptibility model and to partition the study area into geological hazard susceptibility zones in a more reasonable way, so as to reduce the hazard losses and protect the ecological environment (Jiang et al. 2017).

In geological hazard susceptibility assessment, empirical models (such as the analytic hierarchy process model) were used first (Pawluszek and Borkowski 2017) and achieved a certain prediction effect. However, this type of model is susceptible to subjective factors, which reduces its evaluation accuracy. With the development of GIS and other technologies, statistical analysis models (such as the information value and frequency ratio models) have been applied to geological hazard susceptibility assessment (Ba et al. 2017; Du et al. 2019; Khan et al. 2019; Nicu 2017; Sarda and Pandey 2019) and have led to a certain improvement in evaluation accuracy. However, this type of model lacks uniform standards and criteria for the hierarchical classification of causative factors, and the prediction results vary considerably (Aditian et al. 2018; Wang et al. 2014). With the rapid development of computers, machine learning models (such as the random forest and support vector machine models) have become widely used in the evaluation of geological hazard susceptibility (Akinci et al. 2020; Huang and Zhao 2018; Zhao et al. 2020), and their predictive accuracy is generally higher than that of the first two types of models. At the same time, researchers have found that the most suitable evaluation model differs between study areas. Therefore, it has become a trend to compare two or more models in the evaluation of geological hazard susceptibility. As machine learning models have continued to be applied, hybrid models that combine several single models have become a new trend in geological hazard susceptibility evaluation (Chen et al. 2019), such as FT (fractal theory)-IV-RF (Zhao et al. 2021), ANN-Bayes (He et al. 2012) and RS-SVM (Chang and Wan 2015).
Machine learning models require both positive and negative sample data. Previous studies have mostly improved prediction accuracy by improving the machine learning models themselves, and few have examined the positive and negative samples. Accurate positive samples can be obtained through remote sensing image interpretation or field surveys (Conoscenti et al. 2016; Kalantar et al. 2018; Mondini and Chang 2014; Posner and Georgakakos 2015). Negative sample acquisition methods vary, including random selection within the study area (Polykretis and Chalkias 2018; Pourghasemi and Rahmati 2018), selection in low-slope areas (Kavzoglu et al. 2014), special methods (Chen et al. 2021; Liu et al. 2021; Su et al. 2022) and selection outside a buffer around the positive samples (Peng et al. 2014; Su et al. 2017). All of these selection methods have yielded good prediction results. However, studies that randomly select samples outside the positive sample buffer are few and, more importantly, each sets only one buffer distance; multiple buffer distances are not compared. Other studies based on machine learning models likewise do not examine the buffer distance. Park et al. (2019) evaluated the susceptibility of Woomyeon Mountain in Korea with a buffer distance of 100 m. Wang et al. (2020) set the buffer distance to 500 m and evaluated the susceptibility of geological hazards in Yunyang County (Chongqing, China). Wang et al. (2021b) conducted a multi-model susceptibility evaluation for Helong City (Jilin Province, China) with a buffer distance of 1000 m. However, the attributes of samples selected outside different buffer distances vary greatly, which may have a large impact on model accuracy.
To address this problem, this paper divides the buffer distance into several levels according to the extent of the study area and the experience of previous researchers and analyzes them in detail to determine whether the buffer distance affects model accuracy, how large the effect is, how model accuracy changes with buffer distance and what the optimal buffer distance is. The main objectives of this study are (1) to map geological hazard susceptibility using RF and FR-RF models with different buffer distances and analyze the distribution of susceptibility zones of different grades in the study area; (2) to analyze whether different buffer distances affect the results of the susceptibility evaluation, how the ROC curves and confusion matrices change with buffer distance, and the reasons for such differences; (3) to determine, through model tests and economic and environmental losses, whether the impact on model accuracy is significant and what the optimal buffer distance in the study area is; and (4) to identify the main causative factors leading to the occurrence of geological hazards in the study area.

The western part of Shanxi Province is the part of the province where geological hazards occur most frequently. Linfen City is one of the more seriously affected areas in western Shanxi Province (Zhao et al. 2016), and Pu County is the part of Linfen City where geological hazards are more numerous and more typical (Zhao 2017). Geological hazards pose a great threat to local infrastructure and to the safety of people’s lives and property. According to the “14th Five-Year Plan for Geological Hazards Prevention and Control in Shanxi Province”, a more accurate evaluation of the susceptibility to geological hazards is required than in the past. Therefore, the analysis of buffer distance in this area can not only improve the accuracy of the susceptibility model but also provide a theoretical basis for the selection of negative samples in neighbouring areas with similar terrain.

In summary, taking Pu County as an example, the buffer distance was divided into 100 m, 500 m, 1000 m and 2000 m, and the RF and FR-RF models were used to evaluate the susceptibility to geological hazards at each buffer distance. The results can provide a theoretical basis for more accurate hazard prevention and mitigation in the study area. They can also provide a new idea for negative sample selection and a scientific basis for choosing the buffer distance in future machine-learning-based geological hazard susceptibility evaluations. The research process is shown in Fig. 1.

Fig. 1 Flow chart of this study

Materials and methods

Study area

Pu County is located in the southwestern part of Shanxi Province, at the southern end of the Lvliang Mountain Range and on the east bank of the middle reaches of the Yellow River. The county has a warm-temperate continental climate, with an average annual temperature between 8.6 and 12.6 °C and an average annual rainfall of 586 mm. Elevation is high in the northeast, east and south and low in the west. The main geomorphological units are the folded fault-block denuded high-middle mountain area, the middle mountain area, the loess plateau and hilly area and the intermountain valley area. The loess plateau and hilly area contains a large number of slopes, which have the free-face conditions needed for avalanches and landslide deformation. The terrain of the river valley area is relatively flat, and geological hazards are not developed there, but the valley slopes on both sides, in the transition to other landforms, are steep because of river erosion and incision, and most of them are composed of sub-soft and semi-hard rocks. River erosion and undercutting of these valleys and slopes provide favourable conditions for the occurrence of geological hazards. Through field survey and data collection, 148 geological hazards were identified (Fig. 2). Based on ArcGIS, the study area was divided into raster units of 30 × 30 m (Liu et al. 2018).

Fig. 2 Geographical location map of Pu County

Causative factors

Based on the geological environmental conditions, the development mechanism of geological hazards and related assessment work in the study area, ten factors were selected for correlation analysis and geological hazard susceptibility evaluation: elevation, slope, aspect, curvature, lithology, distance from fault (DFF), distance from roads (DFR1), distance from rivers (DFR2), average annual rainfall (AAR) and normalized difference vegetation index (NDVI).

Elevation is the most commonly used index in geological hazard susceptibility evaluation. In areas with greater elevation changes, potential energy is more likely to be converted into kinetic energy and geological hazards are more likely to occur (Xiong et al. 2017). Slope expresses the steepness of a local area and is one of the key indexes in the evaluation of geological hazard susceptibility (Bordoni et al. 2020; Zhang et al. 2020). The duration and intensity of sunlight and the infiltration of rainfall differ between aspects (Erener and Duzgun 2013). Curvature describes the topographic form of the slope, which has an indirect influence on the development range of geological hazards (Pourghasemi et al. 2018).

The basic properties of the lithology determine the initial state of a slope and its ability to resist weathering and erosion, so lithology is also the most important factor for geological hazards (Lin et al. 2021; Sun et al. 2018). Geological structure movement deteriorates the mechanical properties of the rock and soil mass, directly or indirectly leading to the occurrence of geological hazards (Hong et al. 2015). Human engineering construction has a great influence on slopes; frequent disturbance may lead to slope instability (Kanwal et al. 2017). Geological hazards are concentrated close to roads, and their number decreases rapidly as the distance from roads increases (Wang et al. 2021b).

Rainfall not only erodes slopes but also penetrates and softens rocks and soil, providing sufficient hydrodynamic conditions for the occurrence of geological hazards (Li et al. 2020). The erosion of slope surface rock and soil by water system is one of the important causes of geological hazards (Meinhardt et al. 2015). Vegetation can prevent soil erosion and help strengthen the stability of the slope. Generally, the more lush the vegetation, the less prone to slope instability (Chen et al. 2017).

Methods

Pearson correlation coefficient

Pearson’s correlation coefficient, usually expressed as r, evaluates the overall linear relationship between two variables (Biswas and Si 2011). Its sign indicates whether the correlation is positive or negative (Hu et al. 2021). In general, the higher the correlation between variables, the closer the absolute value of r is to 1. The degree of correlation is divided into weak, low, significant and strong correlation, corresponding to |r| < 0.3, 0.3 ≤ |r| < 0.5, 0.5 ≤ |r| < 0.8 and |r| ≥ 0.8, respectively (Guo 2020).

$$r=\frac{\sum_{\textrm{i}=1}^{\textrm{k}}\left({X}_{\textrm{i}}-\overline{X}\right)\left({Y}_{\textrm{i}}-\overline{Y}\right)}{\sqrt{\sum_{\textrm{i}=1}^{\textrm{k}}{\left({X}_{\textrm{i}}-\overline{X}\right)}^2}\sqrt{\sum_{\textrm{i}=1}^{\textrm{k}}{\left({Y}_{\textrm{i}}-\overline{Y}\right)}^2}}$$
(1)

where k is the number of samples, \(X_{\textrm{i}}\) and \(Y_{\textrm{i}}\) are the observed values of the sample variables, and \(\overline{X}\) and \(\overline{Y}\) are the sample means.
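As an illustration of Eq. (1) and the correlation bands quoted above, the following minimal Python sketch computes r for two factors and assigns the band. The function names `pearson_r` and `correlation_level` and the sample values are assumptions made for this example only; they are not data or code from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (Eq. 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


def correlation_level(r):
    """Map |r| to the four bands used in the text."""
    a = abs(r)
    if a < 0.3:
        return "weak"
    if a < 0.5:
        return "low"
    if a < 0.8:
        return "significant"
    return "strong"


# Hypothetical values for two factors sampled at the same five grid cells.
elevation = [900.0, 1100.0, 1300.0, 1500.0, 1700.0]
rainfall = [520.0, 540.0, 575.0, 590.0, 610.0]
r = pearson_r(elevation, rainfall)
print(round(r, 3), correlation_level(r))
```

A factor pair that falls in the "strong" band is a candidate for removing one of the two factors before modelling.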

FR model

The FR model is a statistical analysis method; the magnitude of the FR value represents the degree to which each class of a causative factor contributes to the occurrence of hazards (Ma et al. 2022). The sum of the frequency ratios of all subcategories of causative factors is the final probability index of hazard occurrence (Aditian et al. 2018). The calculation method has been explained in a previous article (Chen et al. 2023).

$${\textrm{FR}}_{\textrm{ij}}=\frac{N_{\textrm{ij}}}{N_r}\div \frac{A_{\textrm{ij}}}{A_r}$$
(2)

where \({\textrm{FR}}_{\textrm{ij}}\) is the frequency ratio; \(N_{\textrm{ij}}\) is the number of geological hazard grid cells in the jth class of the ith influencing factor; \(N_r\) is the total number of geological hazard grid cells in the study area; \(A_{\textrm{ij}}\) is the number of grid cells in the jth class of the ith influencing factor; and \(A_r\) is the total number of grid cells in the study area.
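Eq. (2) can be sketched in a few lines of Python. The function name `frequency_ratio`, the class labels and the cell counts below are hypothetical, and the sketch assumes that the classes of a factor partition the study area, so that \(N_r\) and \(A_r\) can be obtained by summing over the classes.

```python
def frequency_ratio(hazard_cells, class_cells):
    """FR for every class j of one factor i, following Eq. (2).

    hazard_cells: dict mapping class label -> N_ij (hazard grid cells in the class)
    class_cells:  dict mapping class label -> A_ij (all grid cells in the class)
    """
    n_r = sum(hazard_cells.values())  # total hazard cells (assumed: classes cover the area)
    a_r = sum(class_cells.values())   # total cells in the study area
    if n_r == 0 or a_r == 0:
        raise ValueError("totals must be positive")
    fr = {}
    for label, a_ij in class_cells.items():
        if a_ij == 0:
            raise ValueError(f"class {label!r} has no grid cells")
        n_ij = hazard_cells.get(label, 0)
        fr[label] = (n_ij / n_r) / (a_ij / a_r)
    return fr


# Hypothetical counts for three slope classes (degrees).
hazards = {"0-10": 10, "10-20": 30, "20-30": 60}
cells = {"0-10": 5000, "10-20": 3000, "20-30": 2000}
for label, value in frequency_ratio(hazards, cells).items():
    print(label, round(value, 2))
```

An FR above 1 means the class holds a larger share of hazards than its share of the area; an FR below 1 means the opposite.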

RF model

The random forest (RF), originally developed by Breiman (Breiman 2001; Nun et al. 2014), is a multi-variable classification machine learning model (Catani et al. 2013). Because of variance and other problems, a single decision tree (DT) shows a weak prediction effect (Taalab et al. 2018). Therefore, the RF model combines multiple DTs for classification and prediction, which greatly improves the accuracy of prediction. The calculation method has been explained in a previous article (Chen et al. 2023).

Hybrid model

The FR model can re-quantify the attributes of each causal factor and input them as initial variables in the form of frequency ratios into the RF model to construct a new model (Li et al. 2021).
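The re-quantification step can be illustrated with a short Python sketch that replaces each raw attribute value with the FR value of the class it falls into. The names `classify` and `fr_encode`, the class bounds and the FR tables are all hypothetical, and the RF training step itself is omitted; the encoded values would simply be passed to whichever RF implementation is used.

```python
import bisect


def classify(value, upper_bounds):
    """Index of the first class whose upper bound is >= value.

    Values above the last bound are placed in the last class.
    """
    return min(bisect.bisect_left(upper_bounds, value), len(upper_bounds) - 1)


def fr_encode(sample, class_bounds, fr_tables):
    """Replace each raw attribute with the FR value of its class."""
    encoded = {}
    for factor, value in sample.items():
        j = classify(value, class_bounds[factor])
        encoded[factor] = fr_tables[factor][j]
    return encoded


# Hypothetical class upper bounds and FR values for two factors.
class_bounds = {"slope": [10.0, 20.0, 30.0], "DFR1": [200.0, 500.0, 1000.0]}
fr_tables = {"slope": [0.2, 1.0, 3.0], "DFR1": [2.5, 1.2, 0.4]}

raw_sample = {"slope": 24.0, "DFR1": 150.0}  # input of the plain RF model
print(fr_encode(raw_sample, class_bounds, fr_tables))  # input of the FR-RF model
```

The only difference between the two models in this sketch is the input: the RF model receives `raw_sample`, while the FR-RF model receives the encoded dictionary.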

Model validation method

Confusion matrix

Because geological hazard and non-hazard samples are extremely imbalanced, judging the prediction accuracy of a model by statistical methods alone is poorly applicable (Yu et al. 2021). Previous researchers have used confusion matrices for geological hazard susceptibility evaluation (Frattini et al. 2010).

With reference to Table 1, the accuracy (ACC) is calculated as follows:

$$\textrm{ACC}=\frac{F_{11}+{F}_{22}}{F_{11}+{F}_{12}+{F}_{21}+{F}_{22}}$$
(3)
Table 1 Confusion matrix
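Eq. (3) can be sketched in Python as follows. The layout of the matrix is an assumption made for this example: F11 counts hazards predicted as hazards, F12 hazards predicted as non-hazards, F21 non-hazards predicted as hazards and F22 non-hazards predicted as non-hazards. ACC depends only on the diagonal and the total, so it is unaffected by how the two off-diagonal cells are assigned. The labels below are illustrative, not results from the study.

```python
def confusion_matrix(actual, predicted):
    """Return (F11, F12, F21, F22) for binary labels (1 = hazard, 0 = non-hazard)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    f11 = f12 = f21 = f22 = 0
    for a, p in zip(actual, predicted):
        if a not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if a == 1 and p == 1:
            f11 += 1  # hazard correctly predicted
        elif a == 1 and p == 0:
            f12 += 1  # hazard missed
        elif a == 0 and p == 1:
            f21 += 1  # false alarm
        else:
            f22 += 1  # non-hazard correctly predicted
    return f11, f12, f21, f22


def accuracy(f11, f12, f21, f22):
    """ACC = (F11 + F22) / (F11 + F12 + F21 + F22), as in Eq. (3)."""
    total = f11 + f12 + f21 + f22
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (f11 + f22) / total


# Hypothetical test-set labels and model predictions.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
cells = confusion_matrix(actual, predicted)
print(cells, accuracy(*cells))
```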

ROC curve

The ROC curve of each model can be plotted and the AUC value (generally between 0.5 and 1) can be calculated by Python language. Many studies evaluate the accuracy of the model through them (Chen et al. 2020). An AUC value greater than 0.7 indicates good prediction effect, and an AUC value greater than 0.8 indicates excellent prediction effect (Wang et al. 2021a). The calculation method has been explained in previous articles (Chen et al. 2023).
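A minimal pure-Python sketch of how ROC points and the AUC value can be obtained from predicted probabilities is given below. It is an illustration under simple assumptions (binary labels, trapezoidal integration, one threshold per distinct score) and is not the code used in the study; the function names and sample values are hypothetical.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold over each distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def rating(auc_value):
    """Thresholds quoted in the text: > 0.7 good, > 0.8 excellent."""
    if auc_value > 0.8:
        return "excellent"
    if auc_value > 0.7:
        return "good"
    return "below the 0.7 threshold"


# Hypothetical labels (1 = hazard) and predicted susceptibility probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
value = auc(labels, scores)
print(round(value, 3), rating(value))
```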

Results and discussion

Screening of causative factors

The factor correlation coefficients (Table 2) were obtained by Pearson correlation analysis. The table shows that AAR is strongly correlated with elevation. Fig. 3 also shows that the trend of rainfall is almost consistent with that of elevation. Apart from its strong correlation with AAR, the correlation between elevation and each of the remaining factors is below 0.5, i.e. low or weak. In contrast, the correlation between AAR and DFF is higher than 0.5, i.e. significant. Therefore, the AAR factor is excluded in this paper.

Table 2 Correlation coefficients of the causative factors
Fig. 3 Causative factors attribute value classification: A elevation; B slope; C aspect; D curvature; E lithology; F DFF; G DFR1; H DFR2; I AAR; J NDVI

Susceptibility map of geological hazard based on the RF and FR-RF model

Although the RF model has been well evaluated in previous studies, it is unreasonable to use only one method to evaluate the susceptibility of geological hazards in the study area (Wang et al. 2022). In this paper, the RF and FR-RF models are used to model the susceptibility of geological hazards. The ratio of positive to negative samples is 1:1 (Jiang et al. 2017; Tsangaratos and Ilia 2016; Zhang et al. 2023). If negative samples are selected too close to each other, their attribute values are too similar, which reduces model accuracy. Therefore, the negative samples are selected in ArcGIS so that the distance between them is greater than 500 m. RF is a machine learning algorithm that classifies on multiple variables, so it requires initial input variables. Based on the ArcGIS platform, the attribute values of each causative factor at the positive and negative samples are extracted as the input variables of the RF model. The frequency ratio of each causative factor class (Table 3) is used as the input variable of the FR-RF model. According to previous research experience, the samples are divided into a training set and a test set at a ratio of 7:3 (Hussain et al. 2021; Sahin et al. 2020; Zhou et al. 2021). Similarly, the attribute values and FR values of the causative factors are assigned to the 30 × 30 m grids of the study area and input into each trained model as initial variables to obtain the susceptibility probability of each grid. The natural breaks method is then used to divide the susceptibility probability of the study area into 5 levels and draw the susceptibility zoning maps (Fig. 4).
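The sampling procedure described above was carried out in ArcGIS; the Python sketch below only illustrates the idea under simplifying assumptions. The study area is replaced by a rectangle, coordinates are in metres, the 500 m spacing is applied between negative samples, and the function names, the seed and the point coordinates are hypothetical.

```python
import math
import random


def sample_negatives(positives, bounds, buffer_m, n,
                     min_spacing_m=500.0, seed=0, max_tries=100000):
    """Draw n random points outside the buffer around every positive point.

    Candidates closer than buffer_m to any positive sample, or closer than
    min_spacing_m to an already accepted negative sample, are rejected.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    negatives = []
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        cand = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if any(math.dist(cand, p) < buffer_m for p in positives):
            continue  # inside the positive-sample buffer
        if any(math.dist(cand, q) < min_spacing_m for q in negatives):
            continue  # too close to another negative sample
        negatives.append(cand)
    if len(negatives) < n:
        raise RuntimeError("could not place enough negative samples")
    return negatives


def split_7_3(samples, seed=0):
    """Shuffle and split samples into training and test sets at 7:3."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical hazard locations in a 10 km x 10 km rectangle.
positives = [(2000.0, 2000.0), (5000.0, 5000.0), (8000.0, 3000.0)]
for buffer_m in (100.0, 500.0, 1000.0, 2000.0):
    # 1:1 ratio of positive to negative samples.
    negs = sample_negatives(positives, (0.0, 0.0, 10000.0, 10000.0),
                            buffer_m, n=len(positives))
    print(buffer_m, [(round(x), round(y)) for x, y in negs])
```

Repeating the sampling for each buffer distance, as in the loop at the end, yields one training set per distance, so that the models can be compared on otherwise identical inputs.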

Fig. 4 Geological hazard susceptibility zoning maps

Table 3 Classification of causative factors and FR values

Model validation

Confusion matrix

The confusion matrix is used to observe the performance of the model on each category. It gives the accuracy of the model for each category, making the categories easier to distinguish. Based on Python, F11, F12, F21 and F22 are calculated for each model, and the ACC values are calculated according to Table 1. Table 4 shows that, as the buffer distance increases, the ACC values of the RF and FR-RF models follow the same trend, first increasing and then decreasing, and both reach their maximum at a buffer distance of 1000 m. The ACC values of the RF models constructed with different buffer distances are always higher than those of the FR-RF models. The results of the confusion matrix indicate that the RF model constructed with negative samples randomly selected outside a buffer distance of 1000 m is the most accurate.

Table 4 Different buffer distance ACC values

ROC curves

According to Fig. 5, the models constructed with all of the buffer distances predict well. The AUC values of the RF and FR-RF models follow the same trend as the ACC values, first rising and then declining, and reach their maximum at a buffer distance of 1000 m. The ROC curves show that the accuracy of the model constructed by RF is higher when the buffer distance is 1000 m.

Fig. 5 ROC curve and AUC value: A ROC curves and AUC values under RF modeling, B ROC curves and AUC values under FR-RF modeling, C ROC curves and AUC values at a buffer distance of 3000 m

Model evaluation

Evaluation of susceptibility zoning maps of different buffer distance

According to Fig. 4, the spatial distribution of the susceptibility zones is basically consistent with the actual investigation. Very low susceptibility zones are widely distributed along the southern border of the study area, near Cao Village in the southeast and in the very high altitude area in the northeast. The low susceptibility zone is widely distributed in the area near the centre of Pu County within the very low susceptibility zone in the south and northeast. Moderate susceptibility zones are mainly distributed in the south, on both sides of the Nanchuan River and in the north-west. The high susceptibility zone is mainly distributed on both sides of rivers and roads in the study area. Very high susceptibility zones are mainly distributed along the Xinshui River, Nanchuan River and Miangou River and on both sides of the S329 road. According to Fig. 6, the proportion of very high susceptibility zones increases with buffer distance and is highest at a buffer distance of 2000 m. The proportion of high susceptibility zones fluctuates; the RF and FR-RF models reach their highest values at buffer distances of 1000 m and 100 m, respectively. The proportion of moderate susceptibility zones shows an overall trend of increasing and then decreasing and is highest at a buffer distance of 500 m. In the RF model, the proportion of low susceptibility zones shows a decreasing trend, with its maximum at a buffer distance of 100 m, whereas in the FR-RF model it fluctuates, with its maximum at a buffer distance of 500 m. The proportion of very low susceptibility zones is highest at a buffer distance of 100 m and lowest at 500 m. At a buffer distance of 100 m, the proportions of very low and low susceptibility zones of the RF model are higher than those of the FR-RF model, and the proportions of moderate, high and very high susceptibility zones are lower than those of the FR-RF model. At a buffer distance of 500 m, the proportions of very low, low and high susceptibility zones of the RF model are lower than those of the FR-RF model, and the proportions of moderate and very high susceptibility zones are higher. At a buffer distance of 1000 m, the proportion of very low susceptibility zones of the RF model is higher than that of the FR-RF model, and the proportions of the remaining susceptibility zones are lower. At a buffer distance of 2000 m, the proportions of very low, high and very high susceptibility zones of the RF model are higher than those of the FR-RF model, and the proportions of low and moderate susceptibility zones are lower.

Fig. 6 The proportion of susceptibility zoning of different models

Analysis of reasons for differences in evaluation results

According to Figs. 4 and 5 and Table 1, there are large differences in the results of geological hazard susceptibility when negative samples are selected outside different buffer distances. Since the RF model is the result of comprehensive decision-making by multiple decision trees, when the spatial location of the sample changes, its initial attribute values change along with it, and the results produced by the features selected by each decision tree will change. When all the decision results are aggregated together, there are differences in the judgement of the susceptibility level within each grid. In terms of buffer distance variation, the smaller the buffer distance, the closer the randomly created negative samples are to the positive samples. This results in the negative sample being more likely to have similar or consistent values for certain attributes as the positive sample. At this point, the negative samples are not representative. When the RF model is trained, it will confuse the features of the positive and negative samples, especially the features with higher weights, and therefore, the accuracy will decrease. As the buffer distance becomes larger, there are fewer cases where the attributes of the positive and negative samples are similar, in which case the negative samples selected are more representative. When buffer distances are particularly large, the randomly selected negative sample attribute values become more “extreme”. For example, the DFR1 is generally long. Therefore, when constructing the model, areas that do not meet these characteristics are considered to be highly vulnerable to geological hazards. Therefore, if negative samples are selected outside the buffer distance of 2000 m for model construction, the number of high- and very high–susceptibility zones will increase, and the ACC and AUC values will decrease, which is not in line with the actual situation. 
The difference between RF and FR-RF models is mainly because the FR-RF model inputs are calculated FR values, while the RF model inputs the real attribute values of each causal factor.

To further verify whether the samples become more “extreme” and less “representative” as the buffer distance increases, we use 3000 m as a new buffer distance to construct and test the RF and FR-RF models. Fig. 5C and Table 4 show that at a buffer distance of 3000 m the models are less effective: the ACC value of both the RF and FR-RF models is 0.77, and the AUC values are 0.846 and 0.865, respectively, which are smaller than those at 2000 m. Therefore, it can be inferred that as the buffer distance continues to increase, the negative samples become more “extreme” and less “representative”, and the model accuracy continues to decrease.

Weight evaluation of causative factors

The weights of the causative factors were analyzed using the RF-1000m model (Fig. 7). DFR1, DFR2, NDVI and elevation are the most important factors leading to the occurrence of geological hazards in the study area. According to the geological hazard survey, 124 geological hazards were related to human activities, accounting for 83.78% of the total, and most of them were distributed on both sides of roads and rivers, especially the S329 road and the Xinshui River. For historical reasons and because of topographic and geomorphological constraints, local villagers in the loess area are accustomed to building houses and digging cave dwellings at the foot of slopes and to cutting into mountains to build roads. Such engineering activities often form high, steep slopes and break the natural equilibrium of the slope; over time and under rainfall scouring, the slope soil becomes broken and loses its original stability, forming landslides, collapses and unstable slopes. This type of engineering activity often occurs on both sides of roads. Rivers are widely distributed in the study area. On both sides of the Xinshui River, Nanchuan River and Miangou River, bedrock is mostly exposed, downward erosion by the river is blocked and lateral erosion is strong, which has an obvious influence on the stability of the valley slopes and has triggered many landslides in the past. In some gullies, mainly loess gullies, both downward and lateral erosion by flowing water occur. The valley slopes on both sides are steep and are still being eroded by flowing water, making them prone to landslides and avalanches. Vegetation protects slopes and prevents soil erosion, which has a certain influence on the evolution and stability of slopes. Fig. 3H and J show that the vegetation cover on both sides of the rivers is generally low, which, together with river erosion and rainfall scouring, makes these areas a high-incidence area for geological hazards. The north-eastern and southern parts of the area have higher vegetation cover, and geological hazards are fewest there. Geological hazards are mainly distributed in areas of lower elevation, with almost none in areas of very high elevation. This is mainly because there is less engineering construction in the high-altitude areas and erosion by the water system is not serious. Meanwhile, although rainfall is high there, it is mostly absorbed by the vegetation or flows to the low-altitude areas, so geological hazards are unlikely to occur. In contrast, in the low-altitude areas the lithology is more fragile (mostly loess and clastic rocks), the water system is widely distributed, human engineering is frequent and the vegetation cover is low, so the combination of these factors leads to geological hazards.

Fig. 7 Weight proportion of causative factors

The mechanism of geological hazards is mainly related to the stability of the slope body. When the shear strength of the slope body is less than the shear stress acting on it, landslides, collapses, debris flows and other slope-based geological hazards occur. The shear strength of the slope body is closely related to the mechanical properties of the geotechnical body, slope, aspect, moisture content and other factors. Therefore, when these factors change in a direction that reduces the stability of the slope body, geological hazards occur, especially when induced by rainfall and other factors. The susceptibility zoning and the weighting analysis of causative factors above show that the hazard-prone areas in the study area are mainly on both sides of rivers and roads. River erosion of the slopes on both banks degrades the physical and chemical properties of the geotechnical body, increases its water content and reduces the shear strength of the slope, making it more prone to geological hazards. In addition, as society develops, people are forced to carry out infrastructure construction in these gullies and ravines, such as cutting slopes to build houses and cutting into mountains to build roads. These activities physically destabilize originally stable slopes, and geological hazards then occur under the influence of river erosion and other factors. Rivers and human activities are the most important factors, and NDVI, slope and other factors also play an important role in the occurrence of geological hazards.

Economic and environmental loss assessment

Research shows that land use and GDP indicators are very important for maintaining environmental safety and sustainable economic development (Wang et al. 2021b; Zhao et al. 2006). The more accurate the geological hazard susceptibility model is, the more reasonable the zoning will be, and the lower the economic loss and treatment cost will be. HMC and loss rates (expressed as GDP/HMC in this article) can be used as objective indicators to assess the extent of damage (Yum et al. 2020). The calculation method has been explained in a previous article (Chen et al. 2023).

The economic and loss rate indicators of the models constructed with different buffer distances (Table 5) show that the HMC and GDP loss ratio of RF-1000m are lower than those of the other models, and its economic benefits are better (worse only than RF-100m and FR-RF-500m). RF-1000m is also the model with the highest prediction accuracy in the study area. Compared with the other models, the governance cost of RF-1000m is 3.55% lower on average, which indicates that the RF model with negative samples drawn outside a 1000 m buffer distance can effectively reduce the cost of hazard management.

Table 5 Economic and loss rate indexes

Conclusion

Accurate and scientific geological hazard susceptibility analysis and a scientific and reliable buffer distance are key to improving the accuracy of susceptibility zoning. This paper takes Pu County as the research object and constructs susceptibility models with different buffer distances. The study shows that, for both the RF and FR-RF models, changing the buffer distance affects the accuracy of the model. Therefore, buffer distance is a necessary consideration when using machine learning methods to construct highly accurate geological hazard susceptibility models. Each area should have a buffer distance that is most suitable for it. We found that a buffer distance that is too large or too small makes the negative samples less “representative”, so an intermediate value should be used. The weighting analysis of causative factors shows that DFR1, DFR2 and NDVI are the main factors leading to the occurrence of hazards in the study area. The comparison of economic and environmental losses shows that the average cost of hazard management was 3.55% higher in the other models than in the RF-1000m model. This study is of great significance for promoting the sustainable development of the economy and ecological environment in geological hazard-prone areas and also provides a scientific basis for the selection of buffer distance in future evaluations of geological hazard susceptibility in Pu County and the surrounding areas.