1 Introduction

1.1 Brief introduction

The major role of the insurance industry is undeniable: it not only mitigates risk but also mobilizes funds for the financial industry, promoting investment and growth. The importance of insurance in both domestic and international economies was recognized as early as the 1960s, when the United Nations Conference on Trade and Development (UNCTAD) acknowledged that both insurance and reinsurance play key roles in economic growth, according to Outreville (2013). Since then, numerous studies have examined the relationship between various aspects of insurance, such as insurance density, insurance size, penetration rate, and a country's financial system, with a specific focus on economic growth. Extensive research has also analyzed the effect of financial variables, including the insurance industry, on economic growth. However, fewer studies have considered the reverse direction: how the insurance industry itself is affected by financial variables and economic growth.

1.2 Insurance and classical methods

Among the various insurance indicators, this paper focuses on life, non-life, and total insurance penetration rates, defined as the ratio of premiums paid to gross domestic product (GDP). The Insurance Penetration Rate (IPR) indicates the level of development of the insurance sector and is considered an essential indicator of a country's financial development. Before the introduction of Machine Learning (ML) and Deep Learning (DL), numerous studies employed methods such as vector autoregression, the vector error correction model, the generalized method of moments, and linear regression to analyze the relationship between insurance and financial variables. For instance, studies by Flores (2021), Sharku (2021), Pradhan (2016, 2017), Hasan (2018), and Olarewaju (2021) examined the relationship between the insurance industry and different financial variables. These studies indicate that the relationship between insurance and financial variables other than economic growth and the banking industry can be positive or negative, depending on the circumstances of each country; insurance, economic growth, and banking, by contrast, are positively related to one another. Thus, the IPR is a key factor in measuring and predicting economic growth trends. Classical methods, however, struggle to predict spatio-temporal data with high accuracy, which is why ML approaches are increasingly used in the literature.

1.3 ML in financial time series, economics, and insurance

Financial time series prediction has become a major topic, with various methods of artificial intelligence (AI) developed to improve performance.

Masini (2021) employed various ML methods for economic and financial time series, using both linear (LASSO, Ridge) and non-linear models, with a special focus on tree-based methods. Their results indicated that when a large dataset exists, non-linear models can be useful for economic prediction. Another study supporting ML algorithms for financial time series is Araujo (2023), which adopted 50 different classical and ML models to predict the inflation rate in Brazil. The results indicate that ML models outperformed traditional statistical and benchmark methods in terms of accuracy, a finding also supported by Claveria (2016). Additionally, ML models can handle noisy, large, and nonlinear datasets, and tree-based algorithms, such as decision tree, random forest, and XGBoost, perform quite well compared to other ML algorithms. Another study on inflation prediction was conducted by Rodríguez-Vargas (2020), which used various ML algorithms together with Long Short-Term Memory (LSTM) networks and showed that although LSTM is the best method for inflation forecasting, random forest and XGBoost are also extremely useful for prediction. Moreover, Kanaparthi (2024) used non-linear ML algorithms such as XGBoost and random forest to predict the inflation rate in the USA; the results indicated that XGBoost performed better, especially during periods of high inflation.

Numerous studies have also focused on forecasting GDP and economic growth using ML algorithms (see Papadimitriou and Mertzanis (2021), Soni and Kumar (2023), Srinivas and Asokan (2020), Tang et al. (2022), Tian et al. (2022), and Zhang et al. (2021)). Yoon (2021) utilized random forest and XGBoost to forecast Japan's economic growth from 2001 to 2018, finding that ML methods outperformed traditional ones, with gradient boosting models proving more accurate. Conversely, Amman Hossain Mahmudul (2021) predicted Bangladesh's GDP growth with ML algorithms and found that random forest has higher accuracy than the gradient boosting model. In another study, Thilaka (2024) predicted GDP based on CO2 emissions, the Human Development Index, and life expectancy, and the results showed that the decision tree reached a near-perfect score of 1.

Although there is extensive literature on financial time series prediction, few studies have considered IPRs. Most insurance-related predictions deal with fraud and sales, less with Insurance Density (ID) or IPRs. Lim (2023) used ML algorithms to predict travel insurance sales after COVID-19 and found that K-nearest neighbors had the highest accuracy, about 82%, compared to Support Vector Machine (SVM), logistic regression, and random forest. In terms of insurance fraud, Aslam (2022) applied ML models to auto insurance fraud detection, with SVM achieving the highest accuracy among all models. For health insurance fraud, Akbar (2020) used both decision tree and XGBoost and showed that XGBoost predicts fraud better than the decision tree, with an accuracy of 87% versus 81%. In addition, Poufinas (2023) used SVM, random forest, decision tree, and XGBoost for insurance claims, and their results showed that random forest and XGBoost performed best among the algorithms. Orji and Ukwandu (2024) and Ar et al. (2020) predicted medical insurance costs using random forest, XGBoost, and gradient boosting machine, and their results indicated that XGBoost had the best performance. These studies hold significant importance, particularly for policymakers and insurers, as they aim to enhance decision-making and better meet their needs.

Based on the literature review, most studies in the insurance field concern claims, sales, and fraud (Reinhart (2021), Wang et al. (2020)), while only a few consider IPRs or ID. Given the proven effectiveness of ML algorithms (Li et al. 2020), especially decision tree, random forest, and XGBoost, which perform well on complex, non-linear, and noisy data, this study predicts life, non-life, and total insurance penetration rates using tree-based methods and compares their accuracy.

1.4 Aim of the paper

The paper aims to design and compare three different tree-based models, namely decision tree, random forest, and XGBoost, to predict the life, non-life, and total insurance penetration rates, and to identify the best method for each country. The study covers the period from 2000 to 2021, focusing on 30 OECD countries. It scrutinizes seven financial features: trade openness, rule of law, financial development index, foreign direct investment, economic growth, inflation rate, and domestic credit to the private sector, each of which various studies have shown to be positively or negatively related to the insurance sector.

1.5 Novelty

In this research, ML models are developed to predict life, non-life, and total insurance penetration rates for each country, enabling policymakers to anticipate changes and trends in the financial industry and economic growth, since insurance, financial development, and economic growth are highly correlated, according to Haiss (2008) and Apergis (2020). Informed decision-making by policymakers can therefore shape economic outcomes and enhance financial development. Furthermore, all the features in this study have been shown to have either a positive or negative impact on the insurance industry across various countries, which enhances prediction accuracy. Moreover, the 2000-2021 data cover two important global events: the global financial crisis (both pre- and post-crisis years) and the COVID-19 pandemic. These events introduce large variation into the data, so if an unexpected phenomenon affecting the financial indicators occurs in the future, the model's accuracy should remain acceptable. The authors thus offer valuable insight into the effect of these two events on insurance, the financial industry, and economic growth. More importantly, based on the literature reviewed, ML methods have been applied to most aspects of insurance except the level of insurance development, which is the focus of this study.

1.6 Limitations

Data availability poses a significant limitation: the authors aimed to collect data from 2000 to 2023, which is not fully available, and the dataset primarily covers high- and middle-income countries. This limitation introduces a significant bottleneck in ML, as missing data can lead to model underfitting. Addressing these biases is important to improve the reliability of the models. Furthermore, data quality influences prediction outcomes; inaccurate or incomplete data may weaken the predictions and conclusions. Hence, ensuring high-quality data is critical to the effectiveness of the ML models.

ML algorithms also face several other challenges. One classical problem is overfitting (Srivastava et al. 2014). Scalability is another issue: large and complex datasets require high memory capacity, processing power, and storage (Bottou 2010). Finally, no single ML model suits all problems; to identify the best model, pre-processing and visualization are required to recognize the distribution and patterns of the data (Riggs and Tariq 2023). Therefore, further research should focus on large-scale datasets with more complex relationships and advanced algorithms, such as DL methods, to address the impact of data and model limitations on performance.

The remainder of this article is structured as follows: Sect. 2 covers data and variables; Sect. 3 discusses the methodology; Sect. 4 presents the empirical results and ML structure; and Sect. 5 provides a conclusion.

2 Data and variables

2.1 Data gathering and definition

This study covers 30 OECD countries with relatively high levels of income over the period 2000 to 2021. Ten indicators are examined: three dependent variables obtained from the World Bank (n.d.) and seven features sourced from the International Monetary Fund (n.d.), each defined in Table 1. All the features extracted for this study have a positive or negative relationship with insurance, which can aid accurate prediction. The supplementary material provides the data sources.

Table 1 Definition of data

Due to data unavailability, certain data points are missing for a few countries. Given the relatively low variance of the data and the limited proportion of missing values, the linear interpolation method is used to create a complete dataset, effectively addressing the issue of missing values.
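As an illustration, a minimal pandas sketch of this interpolation step is given below; the file and column names are hypothetical stand-ins, and gaps are filled within each country only.

```python
import pandas as pd

# Hypothetical long-format panel: one row per country-year; the file name
# and column names are illustrative, not the paper's actual ones.
df = pd.read_csv("oecd_panel.csv").sort_values(["country", "year"])

# Interpolate gaps linearly within each country so that values from
# one country never leak into another.
value_cols = df.columns.difference(["country", "year"])
df[value_cols] = (
    df.groupby("country")[value_cols]
      .transform(lambda s: s.interpolate(method="linear", limit_direction="both"))
)
```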

In this section of the paper, data engineering and exploratory analysis are applied to gain a better understanding of trends and changes over the years in each country. The dependent variables are considered first, followed by the features.

Table 2 presents comprehensive summary statistics. As can be seen, all financial indicators show a relatively large range and variance. This may indicate that each country pursues different policies based on its situation and available capacity; in other words, policymakers apply different policies to boost their financial development. Visualization and analysis of the data trends for both features and targets are provided in the supplementary material.

Table 2 Summary statistics
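A Table 2-style summary can be reproduced in one line, continuing the hypothetical DataFrame from the interpolation sketch above.

```python
# Range, mean, and variance per indicator, in the spirit of Table 2.
summary = df[value_cols].agg(["min", "max", "mean", "std", "var"]).T
print(summary.round(2))
```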

2.2 Spatio-temporal data

This research uses a spatio-temporal dataset. Spatio-temporal sequences are usually complex because both time and location must be taken into account: to predict such a sequence, time continuity and the spatial correlations between different regions, which themselves change over time, should be considered. Hence, it is difficult to determine accurate features, according to Zhu and Chen (2023). In the past, statistical methods treated spatio-temporal data as multiple time series. However, because it is challenging to identify non-linear spatio-temporal patterns and spatial correlations, statistical methods do not perform accurately, according to Fang (2021).

In this regard, ML algorithms can cope well with spatio-temporal sequence data, although in some cases DL methods predict spatio-temporal datasets more accurately because they can extract features (Nguyen et al. 2018). Additionally, with enormous sequences of data, deep learning methods can train and predict better than ML algorithms. Numerous studies have been conducted on spatio-temporal data using ML algorithms, particularly decision tree, random forest, and XGBoost.

Sorel (2010), Hossain (2020), Nieto et al. (2021), and Liu (2022) used decision tree algorithms on spatio-temporal data for prediction. The random forest algorithm has also been used for spatio-temporal prediction, according to Bagherzadeh (2022) and Zhan (2018). Finally, many studies, such as Sun (2022) and Dong (2022), used the XGBoost algorithm for spatio-temporal prediction. Predictions of financial variables using ML approaches on time series and spatio-temporal data can be found in Wu (2022), Chen (2021), Medeiros (2021), and Akbari (2021). Also, in the field of economics, Yoon (2021), Martin (2019), and Richardson (2018) indicate that ML models achieve higher accuracy than benchmark forecasts for GDP prediction.

Based on the studies mentioned and the dataset of this study, the decision tree, random forest, and XGBoost algorithms are used to predict the life, non-life, and total insurance penetration rates and are compared to each other. Figure 1 presents the workflow of the study.

Fig. 1 Diagram of the study

3 Methodology

In this section, three tree-based ML models are developed to predict the IPR for each country. The methods are presented in the following:

3.1 Decision tree

Although there are many methodologies for constructing decision trees, the classification and regression tree (CART) is the most well-known algorithm. In a decision tree, the training data is split into homogeneous subgroups, and a simple constant is fitted to each subgroup. Subgroups, or nodes, are formed based on the answer to a simple yes-no question, and this process continues until the stopping criteria are satisfied or the maximum tree depth is reached. Once the nodes are fully formed, the model predicts outputs as the average response value of all observations in the subgroup, according to Wu (2022).
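The following minimal scikit-learn sketch illustrates such a regression tree on synthetic data; the features, target, and hyperparameter values are illustrative stand-ins, not the paper's actual configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Toy stand-ins for the real panel: 7 financial features and one IPR-like
# target; the synthetic relationship below is purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 7))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=600)

# 70/30 split, mirroring Sect. 3.4.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# max_depth caps the recursive yes-no splitting; each leaf then predicts
# the mean response of the training observations that fall into it.
tree = DecisionTreeRegressor(max_depth=5, min_samples_leaf=5, random_state=0)
tree.fit(X_train, y_train)
print("test R^2:", tree.score(X_test, y_test))
```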

3.2 Random forest

Due to the drawbacks of decision trees, including complex calculations on large datasets, instability with noisy data, and sensitivity to data changes, the random forest was introduced as one of the most successful ML algorithms, according to Breiman (2001). A random forest is an ensemble of decision trees generated using two sources of randomization: first, bootstrap samples are randomly drawn from the dataset, and second, attributes are randomly sampled at each split. The decision trees are then built, and their outputs are aggregated (voting for classification, averaging for regression) to produce the prediction, as described in the equations below. Additionally, random forest models provide feature importances, illustrating the contribution of each feature to prediction accuracy and offering valuable insight into which features most influence the model's predictions (Bagherzadeh 2022; Heddam 2021).

$$\bar{h}_{K} \left( x \right) = \frac{1}{K}\mathop \sum \limits_{k = 1}^{K} h\left( {x;\theta_{k} } \right)$$
(1)
$$E_{X,Y} \left( {Y - \bar{h}_{K} \left( X \right)} \right)^{2} \to E_{X,Y} \left( {Y - E_{\theta } h\left( {X;\theta } \right)} \right)^{2}$$
(2)

where, in Eq. (1), \(h(x;{\theta }_{k})\) denotes a collection of tree predictors, X is a random input vector of length p, and the \({\theta }_{k}\) are independent and identically distributed random vectors. As the number of trees K goes to infinity, the law of large numbers yields the convergence in Eq. (2).

The random forest algorithm is more beneficial than a single decision tree, as it reduces overfitting by averaging the outcomes (Orji 2024). However, it may still incur a generalization error, bounded within certain limits, according to Dai (2018).
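A corresponding scikit-learn sketch, reusing the toy split from the decision-tree example above, shows the two randomization sources and the averaged prediction; the hyperparameter values are illustrative only.

```python
from sklearn.ensemble import RandomForestRegressor

# Each tree is grown on a bootstrap sample of the rows, with a random
# subset of features considered at each split (the two sources of
# randomization); the forest prediction averages the K trees, as in Eq. (1).
forest = RandomForestRegressor(
    n_estimators=300,      # K in Eq. (1)
    max_features="sqrt",   # attribute sampling at each split
    bootstrap=True,        # row sampling per tree
    random_state=0,
)
forest.fit(X_train, y_train)
print("test R^2:", forest.score(X_test, y_test))

# Impurity-based feature importances, used later in Sect. 4.4.
print(forest.feature_importances_)
```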

3.3 XGBoost

The extreme gradient boosting algorithm (XGBoost) is employed to optimize the training of gradient boosting trees. Like other tree-based models, it can be used for both classification and regression. However, compared to other tree-based models, it is highly flexible and efficient and provides better results in terms of accuracy, sensitivity, specificity, and precision. Also, by utilizing advanced regularization, it alleviates the problem of model generalization (Dalal et al. 2022).

XGBoost has many advantages. Firstly, the model trains faster and requires less storage space by reducing tree complexity. Secondly, randomization techniques are applied to increase training speed and reduce overfitting. Furthermore, the algorithm reduces the computational complexity of finding the best split, the most time-consuming part of tree-based models. The split-finding algorithm considers all possible splits and identifies the one with the highest gain, which requires a linear scan over the sorted values of each attribute at each node. To avoid repeated sorting, XGBoost uses a compressed column-based structure in which the data is pre-sorted, so the data needs to be sorted only once; this structure also allows the best split for each node to be found in parallel (Bentéjac 2020).

Given a dataset including n samples and m features, XGBoost builds t trees in order to predict \({\widehat{y}}_{i}^{(t)}\). Equations (3) to (6) show how the prediction is built up: at each iteration, a new tree \({f}_{k}\left({x}_{i}\right)\) is generated, and the predicted value is the sum of the previous prediction and the output of the tree added in that round, as shown in Eq. (6) (Li 2020).

$$\hat{y}_{i}^{\left( 0 \right)} = 0$$
(3)
$$\hat{y}_{i}^{\left( 1 \right)} = f_{1} \left( {x_{i} } \right) = \hat{y}_{i}^{\left( 0 \right)} + f_{1} \left( {x_{i} } \right)$$
(4)
$$\hat{y}_{i}^{\left( 2 \right)} = f_{1} \left( {x_{i} } \right) + f_{2} \left( {x_{i} } \right) = \hat{y}_{i}^{\left( 1 \right)} + f_{2} \left( {x_{i} } \right)$$
(5)
$$\hat{y}_{i}^{\left( t \right)} = \mathop \sum \limits_{k = 1}^{t} f_{k} \left( {x_{i} } \right) = \hat{y}_{i}^{{\left( {t - 1} \right)}} + f_{t} \left( {x_{i} } \right)$$
(6)

where \({x}_{i}\), \({y}_{i}\), and \({\widehat{y}}_{i}^{(t)}\) denote the feature vector, the actual value, and the predicted value of sample i after t trees, respectively.
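The additive scheme of Eq. (6) corresponds to the following sketch with the xgboost library, again reusing the toy split from above; the hyperparameter values are illustrative, not those of Table 6.

```python
from xgboost import XGBRegressor

# Boosting is additive, as in Eq. (6): each of the n_estimators trees f_k
# is fitted to the current residuals and its (shrunken) output is added
# to the running prediction.
booster = XGBRegressor(
    n_estimators=400,
    learning_rate=0.05,    # shrinks each f_k(x_i) before it is added
    max_depth=4,
    subsample=0.8,         # row randomization
    colsample_bytree=0.8,  # column randomization
    reg_lambda=1.0,        # L2 regularization on leaf weights
    random_state=0,
)
booster.fit(X_train, y_train)
print("test R^2:", booster.score(X_test, y_test))
```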

3.4 Tuning the ML models

As demonstrated in Fig. 1, the ML workflow comprises several important steps, including data preparation, splitting the data into training and test sets, and training the model. To find the best model, with high performance and the lowest loss, the hyperparameters are tuned and evaluated on a test dataset. The dataset was split into a 70% training set and a 30% test set.

To find the best model for each algorithm, hyperparameters are tuned using k-fold cross-validation. For all three models predicting life, non-life, and total insurance penetration rates, the ranges and optimal values of the hyperparameters are provided. Table 3 presents all the hyperparameters tuned in this study.

Table 3 Hyperparameters
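A minimal sketch of this tuning step, using scikit-learn's grid search with k-fold cross-validation on the toy decision tree from Sect. 3.1, is shown below; the grid is illustrative, not the grid of Table 3.

```python
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Illustrative grid only; the ranges actually searched are those of Table 3.
param_grid = {
    "max_depth": [3, 5, 7, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",  # the lowest-loss model wins
)
search.fit(X_train, y_train)
print(search.best_params_)
```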

3.5 Evaluation criteria

To evaluate all three models in this paper, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-Squared (\({R}^{2}\)) are used, as defined in Eqs. (7) to (10), respectively.

$$MSE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - \hat{y}_{i} } \right)^{2}$$
(7)
$$RMSE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - \hat{y}_{i} } \right)^{2} }$$
(8)
$$MAE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\hat{y}_{i} - x_{i} } \right|$$
(9)
$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \hat{y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \overline{y}} \right)^{2} }}$$
(10)

where \({x}_{i}\) is the actual observation in the data, \({\widehat{y}}_{i}\) is the prediction of \({x}_{i}\), \(\overline{y }\) is the sample mean, and n is the number of instances. For the first three criteria (MSE, RMSE, MAE), values closer to 0 indicate better model performance, while the last criterion (R-squared) should be closer to 1 to demonstrate acceptable model performance.
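For completeness, the four criteria can be computed as follows, reusing the tuned model and toy split from the previous sketches; this is an illustration, not the paper's evaluation script.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred = search.best_estimator_.predict(X_test)

mse = mean_squared_error(y_test, y_pred)    # Eq. (7)
rmse = np.sqrt(mse)                         # Eq. (8)
mae = mean_absolute_error(y_test, y_pred)   # Eq. (9)
r2 = r2_score(y_test, y_pred)               # Eq. (10)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```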

4 Result and discussion

This study develops and evaluates well-performing ML methods for predicting life, non-life, and total insurance penetration rates for 30 OECD countries. In this section, the prediction of the LIPR is discussed. Three ML models—namely, decision tree, random forest, and XGBoost—were developed and compared to each other.

4.1 LIPR prediction

4.1.1 LIPR prediction with regression decision tree

In this section, a decision tree model was developed to predict the LIPR. The inputs to the ML model are the seven financial features presented in Table 1. K-fold cross-validation was applied during training to find the best-performing decision tree model. The ranges and chosen values of the hyperparameters are presented in Table 4. The supplementary material (Fig. S7-a) includes graphs of the errors for each model during tuning.

Table 4 Search grid point for finding the hyperparameters for decision tree

The best hyperparameters of the decision tree model were identified when it reached its lowest error metrics: an RMSE of 0.59, MAE of 0.28, MSE of 0.35, and an R-squared value of 0.98 on the training dataset. The optimum decision tree model yielded corresponding metrics of 2.50, 0.28, 6.27, and 0.81 on the test dataset. Figure 2 compares actual and predicted values for both the training and test datasets for the decision tree algorithm.

Fig. 2 The relationship between actual and predicted values: a training dataset, b test dataset

4.1.2 LIPR prediction with random forest

Following the decision tree, a random forest algorithm is developed as an advancement of the decision tree, offering the crucial advantage of providing feature importances. The ranges and optimal hyperparameters for the random forest model are provided in Table 5. The supplementary material (Fig. S7-b) contains graphs showing tuning errors for each model.

Table 5 Search grid point for finding the hyperparameters for random forest

The best-performing random forest model achieves an RMSE of 0.9, MAE of 0.35, and MSE of 0.81, with an R-squared value of 0.97 on the training dataset. For the test dataset, the RMSE, MAE, and MSE are 1.8, 1.2, and 3.2, respectively, and the R-squared value is 0.9, illustrating that the random forest model performs better on the test dataset. Nevertheless, during training, the decision tree shows greater robustness. Figure 3 demonstrates the accuracy of the random forest model by comparing actual and predicted values for both the training and test datasets.

Fig. 3 The relationship between actual and predicted values: a training dataset, b test dataset

4.1.3 LIPR prediction with XGBOOST

XGBoost is a specific implementation of the gradient boosting algorithms, which have been widely used in ML studies (Chen and Guestrin 2016). As the XGBoost algorithm is one of the most efficient methods in ML prediction, it is the final model used in this paper. Firstly, Table 6 describes the range and optimal values of hyperparameters. Supplementary material (Fig. S7-c) provides graphs of tuning errors across different models.

Table 6 Search grid point for finding the hyperparameters for XGBoost

In the well-performing XGBoost model, the RMSE is 0.33, the MAE is 0.31, the MSE is 0.1, and the R-squared value is 0.99 on the training dataset. The corresponding test-set metrics are 1.98, 1.29, 3.9, and 0.88. Based on these results, XGBoost is the best-performing model on the training dataset; on the test dataset, however, random forest performs better than XGBoost, though the difference is not significant. Lastly, Fig. 4 compares actual and predicted values for both the training and test datasets.

Fig. 4 The relationship between actual and predicted values: a training dataset, b test dataset

4.2 NLIPR prediction

After developing and comparing tree-based ML algorithms for LIPR, ML methods are also developed and compared for NLIPR. Three ML models—decision tree, random forest, and XGBoost—were developed for this purpose.

4.2.1 NLIPR prediction with regression decision tree

Initially, a decision tree model is developed using the same financial features. Table 7 indicates the range and the optimal value of hyperparameters. Graphs illustrating tuning errors for each model are available in the supplementary material (Fig. S8-a).

Table 7 Search grid point for finding the hyperparameters for decision tree

The ideal hyperparameters are identified when the error metrics reach their lowest values during cross-validation (an RMSE of 0.46, MAE of 0.33, and MSE of 0.21) and R-squared reaches its highest value, 0.85, on the training dataset. For the test dataset, the R-squared value is 0.61, with an RMSE of 0.99, MAE of 0.62, and MSE of 0.99. Figure 5 compares actual and predicted values of the NLIPR for both the training and test datasets in the decision tree model.

Fig. 5 The relationship between actual and predicted values: a training dataset, b test dataset

4.2.2 NLIPR prediction with random forest

Another ML model used is the random forest method. Table 8 illustrates the range and chosen hyperparameters for the random forest algorithm. Error graphs for model tuning are provided in the supplementary material (Fig. S8-b).

Table 8 Search grid point for finding the hyperparameters for random forest

Based on these chosen hyperparameters, the error metrics were calculated. For the training dataset, the RMSE is 0.56, the MAE is 0.36, the MSE is 0.31, and the R-squared reaches 0.78. For the test dataset, the metrics are an RMSE of 1.05, MAE of 0.66, MSE of 1.1, and R-squared of 0.57. Figure 6 shows the relationship between the actual and predicted values of the NLIPR for the random forest.

Fig. 6 The relationship between actual and predicted values: a training dataset, b test dataset

The results show that for both the training and test datasets, the decision tree has a higher accuracy compared to the random forest.

4.2.3 NLIPR prediction with XGBoost

The final algorithm for predicting the NLIPR is XGBoost. Table 9 demonstrates the range and ideal hyperparameters after tuning the XGBoost model. The supplementary material (Fig. S8-c) contains graphs showing tuning errors for each model.

Table 9 Search grid point for finding the hyperparameters for XGBoost

With these optimal hyperparameter values, the error metrics are as follows: an RMSE of 0.35, MAE of 0.23, MSE of 0.12, and R-squared of 0.91 for the training dataset; for the test dataset, the values are 0.92, 0.57, 0.85, and 0.67, respectively. Figure 7 compares predicted and actual values for the XGBoost method.

Fig. 7 The relationship between actual and predicted values: a training dataset, b test dataset

The results indicate that XGBoost is the best-performing model for the NLIPR compared to the decision tree and random forest, as its error metrics are lower and its R-squared value is higher.

4.3 TIPR prediction

In the last part of the empirical result, decision tree, random forest, and XGBoost models are developed for predicting the TIPR to identify the best-performing algorithm.

4.3.1 TIPR prediction with regression decision tree

Firstly, a decision tree model is developed for predicting the TIPR. Table 10 represents the range and the optimal values of hyperparameters. Supplementary material (Fig. S9-a) includes graphs of errors for each model during tuning.

Table 10 Search grid point for finding the hyperparameters for decision tree

When the optimal hyperparameter values are determined, the error metrics reach their lowest values. For the training dataset, the RMSE is 0.57, the MAE is 0.23, the MSE is 0.32, and the R-squared value is 0.99. For the test dataset, an RMSE of 2.32, MAE of 0.23, MSE of 5.4, and R-squared of 0.86 are obtained. Figure 8 compares actual and predicted values for the decision tree model.

Fig. 8 The relationship between actual and predicted values: a training dataset, b test dataset

4.3.2 TIPR prediction with random forest

The range and the ideal values of hyperparameters for the random forest model are described in Table 11. Graphs illustrating tuning errors for each model are available in the supplementary material (Fig. S9-b).

Table 11 Search grid point for finding the hyperparameters for random forest

For the random forest model, an RMSE of 0.25, MAE of 0.09, MSE of 0.06, and R-squared of 0.99 are obtained for the training dataset. For the test dataset, the values are 1.1, 0.8, 1.4, and 0.96, respectively. Figure 9 compares the actual and predicted values.

Fig. 9 The relationship between actual and predicted values: a training dataset, b test dataset

The results indicate that the random forest performs strongly on both the training and test datasets.

4.3.3 TIPR prediction with XGBoost

XGBoost, the final algorithm, is discussed in this section for TIPR prediction. The range and optimal hyperparameters for XGBoost are presented in Table 12. Error graphs for model tuning are provided in the supplementary material (Fig. S9-c).

Table 12 Search grid point for finding the hyperparameters for XGBoost

After tuning, the error metrics reached their lowest values in k-fold cross-validation: an RMSE of 0.35, MAE of 0.25, MSE of 0.12, and R-squared of 0.99 for the training dataset. For the test dataset, the values are 1.46, 1.06, 2.14, and 0.95, respectively. Figure 10 compares the actual and predicted values of the TIPR for the XGBoost model.

Fig. 10 The relationship between actual and predicted values: a training dataset, b test dataset

The results indicate that the random forest is the best-performing model on both the training and test datasets, although the difference from XGBoost is not significant.

4.4 Feature importance

One of the significant benefits of the random forest method is its ability to measure feature importance, and many studies have used this capability to highlight the importance of various indicators (see Saarela and Jauhiainen 2021; Molnar et al. 2023; Orji and Ukwandu 2024). Figure 11 illustrates the feature importances for the life, non-life, and total insurance penetration rates. As can be seen, for the LIPR and TIPR, trade openness is the most crucial feature, whereas economic growth is the least important. For the NLIPR, rule of law and economic growth have the highest and lowest impact, respectively.

Fig. 11 Feature importance by random forest
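A sketch of how such a chart can be produced from a fitted random forest is given below; the feature labels are hypothetical stand-ins for the Table 1 indicators, and the forest is the one fitted in the Sect. 3.2 sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical labels for the seven features of Table 1.
features = np.array(["trade openness", "rule of law", "financial dev. index",
                     "FDI", "economic growth", "inflation", "domestic credit"])

# Impurity-based importances from the fitted forest, sorted for plotting.
importances = forest.feature_importances_
order = np.argsort(importances)
plt.barh(features[order], importances[order])
plt.xlabel("Importance")
plt.tight_layout()
plt.show()
```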

4.5 Discussion

In the previous parts of the paper, decision tree, random forest, and XGBoost models were developed to predict the life, non-life, and total insurance penetration rates, and the best-performing algorithm for each was evaluated. Table 13 compares accuracy, in terms of RMSE and R-squared, for both the training and test datasets. Based on the results, all three models performed well on the training dataset, especially for total and life insurance, where the R-squared values all exceed 0.97. Although XGBoost is the best-performing model for the LIPR and TIPR on the training dataset, the random forest algorithm has higher accuracy for the NLIPR. On the test dataset, XGBoost is the best-performing model for the NLIPR, and the random forest has the highest accuracy for both the LIPR and TIPR, although the difference from XGBoost is not significant.

Table 13 Model Selection for Prediction IPR based on R2 and RMSE

Figure 12 compares actual and predicted values for the life, non-life, and total insurance penetration rates. As can be seen, all models show high accuracy; however, as Fig. 12b shows, due to noisy data, the NLIPR prediction is not as good as those for the LIPR and TIPR.

Fig. 12 Prediction versus real values: a LIPR, b NLIPR, c TIPR

The scenario may differ for each country. Table 14 reports the absolute error for selected countries under all developed methods and shows that the best model can vary by country. For instance, for the LIPR, the decision tree algorithm performs best for Hungary, Iceland, Turkey, and Switzerland, whereas the XGBoost model has higher accuracy for Ireland, the United States, and Germany. Additionally, the random forest model outperformed the decision tree and XGBoost in predicting the NLIPR in the United States and Denmark, as well as the TIPR in Hungary, Mexico, and Japan. Therefore, based on these results, an appropriate algorithm should be chosen for each country.

Table 14 Countries’ absolute error
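This per-country comparison can be sketched as follows, reusing the three toy models fitted in Sect. 3; the country labels are randomly assigned here purely for illustration, whereas the paper aligns them with the actual panel rows.

```python
import numpy as np
import pandas as pd

# Hypothetical country labels aligned row-by-row with X_test.
countries = np.random.default_rng(1).choice(["HUN", "ISL", "TUR", "USA"],
                                            size=len(X_test))

preds = {"tree": tree.predict(X_test),
         "forest": forest.predict(X_test),
         "xgboost": booster.predict(X_test)}
errors = pd.DataFrame({m: np.abs(y_test - p) for m, p in preds.items()})
errors["country"] = countries

# Mean absolute error per country and model, and the winner per country.
per_country = errors.groupby("country").mean()
print(per_country.idxmin(axis=1))
```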

Figures 13, 14, and 15 compare the average actual and predicted values of the LIPR, NLIPR, and TIPR, respectively, on the test dataset.

Fig. 13 Actual versus predicted values of LIPR

Fig. 14 Actual versus predicted values of NLIPR

Fig. 15 Actual versus predicted values of TIPR

5 Conclusion

Insurance plays a crucial role in the economic growth of each country. Predicting the IPR provides deep insights into different countries' economic conditions, financial variables, and insurance industries. Therefore, predicting IPRs using financial variables can offer valuable information about each country's insurance sector and financial industry.

To identify the most robust and effective model overall, as well as the most effective method for each country, three ML models were developed and compared to each other. The study covers 30 OECD countries from 2000 to 2021.

In addition, the algorithms were evaluated using RMSE, MSE, MAE, and R-squared. All hyperparameters were tuned using k-fold cross-validation, and the optimal value of each hyperparameter was chosen where the loss functions reached their minimum and the R-squared values their maximum.

For the LIPR and TIPR on unseen data, random forest and XGBoost are the best-performing models overall, with no significant difference between them, although random forest is slightly more accurate. However, results can vary by country: for instance, in Iceland, the decision tree predicted the LIPR better, with an average absolute error of 0.27, compared to 0.61 for random forest and 0.49 for XGBoost. Meanwhile, for non-life insurance, XGBoost performed most accurately, with an RMSE of 0.35, lower than that of the decision tree and random forest.

Since the insurance industry is a significant sector of the financial industry in each country, policymakers can use these findings to enhance decision-making, improve their insurance industries, and subsequently develop their financial sectors. Despite differences in model accuracy, policymakers should choose the best algorithm for their own country. Further studies should collect data from a larger number of countries, especially those with low levels of income, to account for greater variation and noisier data. Hence, advanced models, such as DL methods, are suggested for further research to compare their accuracy with that of ML algorithms.