1 Introduction

1.1 Brief introduction

The major role of the insurance industry is undeniable: it not only mitigates risk but also mobilizes funds for the financial industry, promoting investment and growth. The importance of insurance in both domestic and international economies was recognized as early as the 1960s, when the United Nations Conference on Trade and Development (UNCTAD) acknowledged that both insurance and reinsurance play key roles in economic growth, according to Outreville (2013). Since then, numerous studies have examined the relationship between various aspects of insurance, such as insurance density, insurance size, penetration rate, and a country's financial system, with a specific focus on economic growth. Extensive research has also analyzed the effect of financial variables, including the insurance industry, on economic growth. However, fewer studies have considered the reverse direction: how the insurance industry itself is affected by financial variables and economic growth.

1.2 Insurance and classical methods

Among the various insurance indicators, this paper focuses on life, non-life, and total insurance penetration rates, defined as the ratio of premiums paid to gross domestic product (GDP). The Insurance Penetration Rate (IPR) indicates the level of development of the insurance sector and is considered an essential indicator of a country's financial development. Before the introduction of Machine Learning (ML) and Deep Learning (DL), numerous studies employed methods such as vector autoregression, the vector error correction model, the generalized method of moments, and linear regression to analyze the relationship between insurance and financial variables. For instance, studies by Flores (2021), Sharku (2021), Pradhan (2016, 2017), Hasan (2018), and Olarewaju (2021) examined the relationship between the insurance industry and different financial variables. These studies indicate that the relationship between insurance and financial variables other than economic growth and the banking industry can be positive or negative, depending on the circumstances of each country; insurance, economic growth, and banking, by contrast, are positively related to one another. Thus, the IPR is a key factor in measuring and predicting economic growth trends. Classical methods, however, struggle to predict spatio-temporal data with high accuracy, which is why ML approaches are increasingly used in the literature.

1.3 ML in financial time series, economics, and insurance

Financial time series prediction has become a major topic, with various methods of artificial intelligence (AI) developed to improve performance.

Masini (2021) employed various ML methods for economic and financial time series, using both linear (LASSO, Ridge) and non-linear models, with a special focus on tree-based methods. Their results indicated that when a large dataset exists, non-linear models can be useful for economic prediction. Another study supporting ML algorithms for financial time series is Araujo (2023), which adopted 50 different classical and ML models to predict the inflation rate in Brazil. The results indicate that ML models outperformed traditional statistical and benchmark methods in terms of accuracy, a finding also supported by Claveria (2016). Additionally, ML models can handle noisy, large, and nonlinear datasets, and tree-based algorithms, such as decision tree, random forest, and XGBoost, perform quite well compared to other ML algorithms. Another study on inflation prediction was conducted by Rodríguez-Vargas (2020), which used various ML algorithms together with Long Short-Term Memory (LSTM) networks and showed that although LSTM is the best method for inflation forecasting, random forest and XGBoost are also extremely useful for prediction. Moreover, Kanaparthi (2024) used non-linear ML algorithms such as XGBoost and random forest to predict the inflation rate in the USA; the results indicated that XGBoost performed better, especially during periods of high inflation.

Numerous studies have also focused on forecasting GDP and economic growth using ML algorithms (see Papadimitriou and Mertzanis (2021), Soni and Kumar (2023), Srinivas and Asokan (2020), Tang et al. (2022), Tian et al. (2022), and Zhang et al. (2021)). Yoon (2021) utilized random forest and XGBoost to forecast Japan's economic growth from 2001 to 2018, finding that ML methods outperformed traditional ones, with gradient boosting models proving more accurate. Conversely, Amman Hossain Mahmudul (2021) predicted Bangladesh's GDP growth with ML algorithms and found that random forest has higher accuracy than the gradient boosting model. In another study, Thilaka (2024) predicted GDP based on CO2 emissions, the Human Development Index, and life expectancy, and the results showed that the decision tree reached a near-perfect score of 1.

Although there is extensive literature on financial time series prediction, few studies have considered IPRs. Most insurance-related predictions deal with fraud and sales, less with Insurance Density (ID) or IPRs. Lim (2023) used ML algorithms to predict travel insurance sales after COVID-19 and found that K-nearest neighbors had the highest accuracy, about 82%, compared to Support Vector Machine (SVM), logistic regression, and random forest. In terms of insurance fraud, Aslam (2022) applied ML models to auto insurance fraud detection, with SVM achieving the highest accuracy among all models. For health insurance fraud, Akbar (2020) used both decision tree and XGBoost and showed that XGBoost predicts fraud better than the decision tree, with an accuracy of 87% versus 81%. In addition, Poufinas (2023) used SVM, random forest, decision tree, and XGBoost for insurance claims, and their results showed that random forest and XGBoost performed best among the algorithms. Orji and Ukwandu (2024) and Ar et al. (2020) predicted medical insurance costs using random forest, XGBoost, and gradient boosting machine, and their results indicated that XGBoost had the best performance. These studies hold significant importance, particularly for policymakers and insurers, as they aim to enhance decision-making and better meet their needs.

Based on the literature review, most studies in the insurance field concern claims, sales, and fraud (Reinhart (2021), Wang et al. (2020)), while only a few consider IPRs or ID. Given the proven effectiveness of ML algorithms (Li et al. 2020), especially decision tree, random forest, and XGBoost, which perform well on complex, non-linear, and noisy data, this study predicts life, non-life, and total insurance penetration rates using tree-based methods and compares their accuracy.

1.4 Aim of the paper

The paper aims to design and compare three different tree-based models, namely decision tree, random forest, and XGBoost, to predict the life, non-life, and total insurance penetration rates, and to identify the best method for each country. The study covers the period from 2000 to 2021, focusing on 30 OECD countries. It scrutinizes seven financial features: trade openness, rule of law, financial development index, foreign direct investment, economic growth, inflation rate, and domestic credit to the private sector, each of which various studies have shown to be positively or negatively related to the insurance sector.

1.5 Novelty

In this research, ML models are developed to predict life, non-life, and total insurance penetration rates for each country, enabling policymakers to anticipate changes and trends in the financial industry and economic growth, since insurance, financial development, and economic growth are highly correlated, according to Haiss (2008) and Apergis (2020). Informed decision-making by policymakers can therefore shape economic outcomes and enhance financial development. Furthermore, all the features in this study have been shown to have either a positive or negative impact on the insurance industry across various countries, which enhances prediction accuracy. Moreover, the 2000-2021 data cover two important global events: the global financial crisis (both pre- and post-crisis years) and the COVID-19 pandemic. These events introduce large variation into the data, so if an unexpected phenomenon affecting the financial indicators occurs in the future, the model's accuracy should remain acceptable. The authors thus offer valuable insight into the effect of these two events on insurance, the financial industry, and economic growth. More importantly, based on the literature reviewed, ML methods have been applied to most aspects of insurance except the level of insurance development, which is the focus of this study.

1.6 Limitations

Data availability poses a significant limitation: the authors aimed to collect data from 2000 to 2023, which is not fully available, and the dataset primarily covers high- and middle-income countries. This limitation introduces a significant bottleneck in ML, as missing data can lead to model underfitting. Addressing these biases is important to improve the reliability of the models. Furthermore, data quality influences prediction outcomes; inaccurate or incomplete data may weaken the predictions and conclusions. Hence, ensuring high-quality data is critical to the effectiveness of the ML models.

ML algorithms also face several other challenges. One classical problem is overfitting (Srivastava et al. 2014). Scalability is another issue: large and complex datasets require high memory capacity, processing power, and storage (Bottou 2010). Finally, no single ML model suits all problems; to identify the best model, pre-processing and visualization are required to recognize the distribution and patterns of the data (Riggs and Tariq 2023). Therefore, further research should focus on large-scale datasets with more complex relationships and advanced algorithms, such as DL methods, to address the impact of data and model limitations on performance.

The remainder of this article is structured as follows: Sect. 2 covers data and variables; Sect. 3 discusses the methodology; Sect. 4 presents the empirical results and ML structure; and Sect. 5 provides a conclusion.

2 Data and variables

2.1 Data gathering and definition

This study covers 30 OECD countries with relatively high levels of income over the period 2000 to 2021. Ten indicators are examined: three dependent variables obtained from the World Bank (n.d.) and seven features sourced from the International Monetary Fund (n.d.), each defined in Table 1. All the features extracted for this study have a positive or negative relationship with insurance, which can aid accurate prediction. The supplementary material provides the data sources.

Table 1 Definition of data

Due to data unavailability, certain data points are missing for a few countries. Given the relatively low variance of the data and the limited proportion of missing values, the linear interpolation method is used to create a complete dataset, effectively addressing the issue of missing values.
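As an illustration, a minimal pandas sketch of this interpolation step is given below; the file and column names are hypothetical stand-ins, and gaps are filled within each country only.

```python
import pandas as pd

# Hypothetical long-format panel: one row per country-year; the file name
# and column names are illustrative, not the paper's actual ones.
df = pd.read_csv("oecd_panel.csv").sort_values(["country", "year"])

# Interpolate gaps linearly within each country so that values from
# one country never leak into another.
value_cols = df.columns.difference(["country", "year"])
df[value_cols] = (
    df.groupby("country")[value_cols]
      .transform(lambda s: s.interpolate(method="linear", limit_direction="both"))
)
```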

In this section of the paper, data engineering and exploratory analysis are applied to gain a better understanding of trends and changes over the years in each country. The dependent variables are considered first, followed by the features.

Table 2 presents comprehensive summary statistics. As can be seen, all financial indicators show a relatively large range and variance. This may indicate that each country pursues different policies based on its situation and available capacity; in other words, policymakers apply different policies to boost their financial development. Visualization and analysis of the data trends for both features and targets are provided in the supplementary material.

Table 2 Summary statistics
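A Table 2-style summary can be reproduced in one line, continuing the hypothetical DataFrame from the interpolation sketch above.

```python
# Range, mean, and variance per indicator, in the spirit of Table 2.
summary = df[value_cols].agg(["min", "max", "mean", "std", "var"]).T
print(summary.round(2))
```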

2.2 Spatio-temporal data

This research uses a spatio-temporal dataset. Spatio-temporal sequences are usually complex because both time and location must be taken into account: to predict such a sequence, time continuity and the spatial correlations between different regions, which themselves change over time, should be considered. Hence, it is difficult to determine accurate features, according to Zhu and Chen (2023). In the past, statistical methods treated spatio-temporal data as multiple time series. However, because it is challenging to identify non-linear spatio-temporal patterns and spatial correlations, statistical methods do not perform accurately, according to Fang (2021).

In this regard, ML algorithms can cope well with spatio-temporal sequence data, although in some cases DL methods predict spatio-temporal datasets more accurately because they can extract features (Nguyen et al. 2018). Additionally, with enormous sequences of data, deep learning methods can train and predict better than ML algorithms. Numerous studies have been conducted on spatio-temporal data using ML algorithms, particularly decision tree, random forest, and XGBoost.

Sorel (2010), Hossain (2020), Nieto et al. (2021), and Liu (2022) used decision tree algorithms on spatio-temporal data for prediction. The random forest algorithm has also been used for spatio-temporal prediction, according to Bagherzadeh (2022) and Zhan (2018). Finally, many studies, such as Sun (2022) and Dong (2022), used the XGBoost algorithm for spatio-temporal prediction. Predictions of financial variables using ML approaches on time series and spatio-temporal data can be found in Wu (2022), Chen (2021), Medeiros (2021), and Akbari (2021). Also, in the field of economics, Yoon (2021), Martin (2019), and Richardson (2018) indicate that ML models achieve higher accuracy than benchmark forecasts for GDP prediction.

Based on the studies mentioned and the dataset of this study, the decision tree, random forest, and XGBoost algorithms are used to predict the life, non-life, and total insurance penetration rates and are compared to each other. Figure 1 presents the workflow of the study.

Fig. 1 Diagram of the study

3 Methodology

In this section, three tree-based ML models are developed to predict the IPR for each country. The methods are presented in the following:

3.1 Decision tree

Although there are many methodologies for constructing decision trees, the classification and regression tree (CART) is the most well-known algorithm. In a decision tree, the training data is split into homogeneous subgroups, and a simple constant is fitted to each subgroup. Subgroups, or nodes, are formed based on the answer to a simple yes-no question, and this process continues until the stopping criteria are satisfied or the maximum tree depth is reached. Once the nodes are fully formed, the model predicts outputs as the average response value of all observations in the subgroup, according to Wu (2022).
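The following minimal scikit-learn sketch illustrates such a regression tree on synthetic data; the features, target, and hyperparameter values are illustrative stand-ins, not the paper's actual configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Toy stand-ins for the real panel: 7 financial features and one IPR-like
# target; the synthetic relationship below is purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 7))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=600)

# 70/30 split, mirroring Sect. 3.4.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# max_depth caps the recursive yes-no splitting; each leaf then predicts
# the mean response of the training observations that fall into it.
tree = DecisionTreeRegressor(max_depth=5, min_samples_leaf=5, random_state=0)
tree.fit(X_train, y_train)
print("test R^2:", tree.score(X_test, y_test))
```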

3.2 Random forest

Due to the drawbacks of decision trees, including complex calculations on large datasets, instability with noisy data, and sensitivity to data changes, the random forest was introduced as one of the most successful ML algorithms, according to Breiman (2001). A random forest is an ensemble of decision trees generated using two sources of randomization: first, bootstrap samples are randomly drawn from the dataset, and second, attributes are randomly sampled at each split. The decision trees are then built, and their outputs are aggregated (voting for classification, averaging for regression) to produce the prediction, as described in the equations below. Additionally, random forest models provide feature importances, illustrating the contribution of each feature to prediction accuracy and offering valuable insight into which features most influence the model's predictions (Bagherzadeh 2022; Heddam 2021).

$$\bar{h}_{K} \left( x \right) = \frac{1}{K}\mathop \sum \limits_{k = 1}^{K} h\left( {x;\theta_{k} } \right)$$
(1)
$$E_{X,Y} \left( {Y - \bar{h}_{K} \left( X \right)} \right)^{2} \to E_{X,Y} \left( {Y - E_{\theta } h\left( {X;\theta } \right)} \right)^{2}$$
(2)

where, in Eq. (1), \(h(x;{\theta }_{k})\) denotes a collection of tree predictors, X is a random input vector of length p, and the \({\theta }_{k}\) are independent and identically distributed random vectors. As the number of trees K goes to infinity, the law of large numbers yields the convergence in Eq. (2).

The random forest algorithm is more beneficial than a single decision tree, as it reduces overfitting by averaging the outcomes (Orji 2024). However, it may still incur a generalization error, bounded within certain limits, according to Dai (2018).
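A corresponding scikit-learn sketch, reusing the toy split from the decision-tree example above, shows the two randomization sources and the averaged prediction; the hyperparameter values are illustrative only.

```python
from sklearn.ensemble import RandomForestRegressor

# Each tree is grown on a bootstrap sample of the rows, with a random
# subset of features considered at each split (the two sources of
# randomization); the forest prediction averages the K trees, as in Eq. (1).
forest = RandomForestRegressor(
    n_estimators=300,      # K in Eq. (1)
    max_features="sqrt",   # attribute sampling at each split
    bootstrap=True,        # row sampling per tree
    random_state=0,
)
forest.fit(X_train, y_train)
print("test R^2:", forest.score(X_test, y_test))

# Impurity-based feature importances, used later in Sect. 4.4.
print(forest.feature_importances_)
```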

3.3 XGBoost

The extreme gradient boosting algorithm (XGBoost) is employed to optimize the training of gradient boosting trees. Like other tree-based models, it can be used for both classification and regression. However, compared to other tree-based models, it is highly flexible and efficient and provides better results in terms of accuracy, sensitivity, specificity, and precision. Also, by utilizing advanced regularization, it alleviates the problem of model generalization (Dalal et al. 2022).

XGBoost has many advantages. Firstly, the model trains faster and requires less storage space by reducing tree complexity. Secondly, randomization techniques are applied to increase training speed and reduce overfitting. Furthermore, the algorithm reduces the computational complexity of finding the best split, the most time-consuming part of tree-based models. The split-finding algorithm considers all possible splits and identifies the one with the highest gain, which requires a linear scan over the sorted values of each attribute at each node. To avoid repeated sorting, XGBoost uses a compressed column-based structure in which the data is pre-sorted, so the data needs to be sorted only once; this structure also allows the best split for each node to be found in parallel (Bentéjac 2020).

Given a dataset including n samples and m features, XGBoost builds t trees in order to predict \({\widehat{y}}_{i}^{(t)}\). Equations (3) to (6) show how the prediction is built up: at each iteration, a new tree \({f}_{k}\left({x}_{i}\right)\) is generated, and the predicted value is the sum of the previous prediction and the output of the tree added in that round, as shown in Eq. (6) (Li 2020).

$$\hat{y}_{i}^{\left( 0 \right)} = 0$$
(3)
$$\hat{y}_{i}^{\left( 1 \right)} = f_{1} \left( {x_{i} } \right) = \hat{y}_{i}^{\left( 0 \right)} + f_{1} \left( {x_{i} } \right)$$
(4)
$$\hat{y}_{i}^{\left( 2 \right)} = f_{1} \left( {x_{i} } \right) + f_{2} \left( {x_{i} } \right) = \hat{y}_{i}^{\left( 1 \right)} + f_{2} \left( {x_{i} } \right)$$
(5)
$$\hat{y}_{i}^{\left( t \right)} = \mathop \sum \limits_{k = 1}^{t} f_{k} \left( {x_{i} } \right) = \hat{y}_{i}^{{\left( {t - 1} \right)}} + f_{t} \left( {x_{i} } \right)$$
(6)

where \({x}_{i}\), \({y}_{i}\), and \({\widehat{y}}_{i}^{(t)}\) denote the feature vector, the actual value, and the predicted value of sample i after t trees, respectively.
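The additive scheme of Eq. (6) corresponds to the following sketch with the xgboost library, again reusing the toy split from above; the hyperparameter values are illustrative, not those of Table 6.

```python
from xgboost import XGBRegressor

# Boosting is additive, as in Eq. (6): each of the n_estimators trees f_k
# is fitted to the current residuals and its (shrunken) output is added
# to the running prediction.
booster = XGBRegressor(
    n_estimators=400,
    learning_rate=0.05,    # shrinks each f_k(x_i) before it is added
    max_depth=4,
    subsample=0.8,         # row randomization
    colsample_bytree=0.8,  # column randomization
    reg_lambda=1.0,        # L2 regularization on leaf weights
    random_state=0,
)
booster.fit(X_train, y_train)
print("test R^2:", booster.score(X_test, y_test))
```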

3.4 Tuning the ML models

As demonstrated in Fig. 1, the ML workflow comprises several important steps, including data preparation, splitting the data into training and test sets, and training the model. To find the best model, with high performance and the lowest loss, the hyperparameters are tuned and evaluated on a test dataset. The dataset was split into a 70% training set and a 30% test set.

To find the best model for each algorithm, hyperparameters are tuned using k-fold cross-validation. For all three models predicting life, non-life, and total insurance penetration rates, the ranges and optimal values of the hyperparameters are provided. Table 3 presents all the hyperparameters tuned in this study.

Table 3 Hyperparameters
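A minimal sketch of this tuning step, using scikit-learn's grid search with k-fold cross-validation on the toy decision tree from Sect. 3.1, is shown below; the grid is illustrative, not the grid of Table 3.

```python
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Illustrative grid only; the ranges actually searched are those of Table 3.
param_grid = {
    "max_depth": [3, 5, 7, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",  # the lowest-loss model wins
)
search.fit(X_train, y_train)
print(search.best_params_)
```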

3.5 Evaluation criteria

To evaluate all three models in this paper, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-Squared (\({R}^{2}\)) are used, as defined in Eqs. (7) to (10), respectively.

$$MSE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - \hat{y}_{i} } \right)^{2}$$
(7)
$$RMSE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - \hat{y}_{i} } \right)^{2} }$$
(8)
$$MAE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\hat{y}_{i} - x_{i} } \right|$$
(9)
$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \hat{y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \overline{y}} \right)^{2} }}$$
(10)

where \({x}_{i}\) is the actual observation in the data, \({\widehat{y}}_{i}\) is the prediction of \({x}_{i}\), \(\overline{y }\) is the sample mean, and n is the number of instances. For the first three criteria (MSE, RMSE, MAE), values closer to 0 indicate better model performance, while the last criterion (R-squared) should be closer to 1 to demonstrate acceptable model performance.
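For completeness, the four criteria can be computed as follows, reusing the tuned model and toy split from the previous sketches; this is an illustration, not the paper's evaluation script.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred = search.best_estimator_.predict(X_test)

mse = mean_squared_error(y_test, y_pred)    # Eq. (7)
rmse = np.sqrt(mse)                         # Eq. (8)
mae = mean_absolute_error(y_test, y_pred)   # Eq. (9)
r2 = r2_score(y_test, y_pred)               # Eq. (10)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```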

4 Result and discussion

This study develops and evaluates well-performing ML methods for predicting life, non-life, and total insurance penetration rates for 30 OECD countries. In this section, the prediction of the LIPR is discussed. Three ML models—namely, decision tree, random forest, and XGBoost—were developed and compared to each other.

4.1 LIPR prediction

4.1.1 LIPR prediction with regression decision tree

In this section, a decision tree model was developed to predict the LIPR. The inputs to the ML model are the seven financial features presented in Table 1. K-fold cross-validation was applied during training to find the best-performing decision tree model. The ranges and chosen values of the hyperparameters are presented in Table 4. The supplementary material (Fig. S7-a) includes graphs of the errors for each model during tuning.

Table 4 Search grid point for finding the hyperparameters for decision tree

The best hyperparameters of the decision tree model were identified when it reached its lowest error metrics: an RMSE of 0.59, MAE of 0.28, MSE of 0.35, and an R-squared value of 0.98 on the training dataset. The optimum decision tree model yielded corresponding metrics of 2.50, 0.28, 6.27, and 0.81 on the test dataset. Figure 2 compares actual and predicted values for both the training and test datasets for the decision tree algorithm.

Fig. 2 The relationship between actual and predicted values: a training dataset, b test dataset

4.1.2 LIPR prediction with random forest

Following the decision tree, a random forest algorithm is developed as an advancement of the decision tree, offering the crucial advantage of providing feature importances. The ranges and optimal hyperparameters for the random forest model are provided in Table 5. The supplementary material (Fig. S7-b) contains graphs showing tuning errors for each model.

Table 5 Search grid point for finding the hyperparameters for random forest

The best-performing random forest model achieves an RMSE of 0.9, MAE of 0.35, and MSE of 0.81, with an R-squared value of 0.97 on the training dataset. For the test dataset, the RMSE, MAE, and MSE are 1.8, 1.2, and 3.2, respectively, and the R-squared value is 0.9, illustrating that the random forest model performs better on the test dataset. Nevertheless, during training, the decision tree shows greater robustness. Figure 3 demonstrates the accuracy of the random forest model by comparing actual and predicted values for both the training and test datasets.

Fig. 3 The relationship between actual and predicted values: a training dataset, b test dataset

4.1.3 LIPR prediction with XGBOOST

XGBoost is a specific implementation of the gradient boosting algorithms, which have been widely used in ML studies (Chen and Guestrin 2016). As the XGBoost algorithm is one of the most efficient methods in ML prediction, it is the final model used in this paper. Firstly, Table 6 describes the range and optimal values of hyperparameters. Supplementary material (Fig. S7-c) provides graphs of tuning errors across different models.

Table 6 Search grid point for finding the hyperparameters for XGBoost

In the well-performing XGBoost model, the RMSE is 0.33, the MAE is 0.31, the MSE is 0.1, and the R-squared value is 0.99 on the training dataset. The corresponding test-set metrics are 1.98, 1.29, 3.9, and 0.88. Based on these results, XGBoost is the best-performing model on the training dataset; on the test dataset, however, random forest performs better than XGBoost, though the difference is not significant. Lastly, Fig. 4 compares actual and predicted values for both the training and test datasets.

Fig. 4 The relationship between actual and predicted values: a training dataset, b test dataset

4.2 NLIPR prediction

After developing and comparing tree-based ML algorithms for LIPR, ML methods are also developed and compared for NLIPR. Three ML models—decision tree, random forest, and XGBoost—were developed for this purpose.

4.2.1 NLIPR prediction with regression decision tree

Initially, a decision tree model is developed using the same financial features. Table 7 indicates the range and the optimal value of hyperparameters. Graphs illustrating tuning errors for each model are available in the supplementary material (Fig. S8-a).

Table 7 Search grid point for finding the hyperparameters for decision tree

The ideal hyperparameters are identified when the error metrics reach their lowest values during cross-validation (an RMSE of 0.46, MAE of 0.33, and MSE of 0.21) and R-squared reaches its highest value, 0.85, on the training dataset. For the test dataset, the R-squared value is 0.61, with an RMSE of 0.99, MAE of 0.62, and MSE of 0.99. Figure 5 compares actual and predicted values of the NLIPR for both the training and test datasets in the decision tree model.

Fig. 5 The relationship between actual and predicted values: a training dataset, b test dataset

4.2.2 NLIPR prediction with random forest

Another ML model used is the random forest method. Table 8 illustrates the range and chosen hyperparameters for the random forest algorithm. Error graphs for model tuning are provided in the supplementary material (Fig. S8-b).

Table 8 Search grid point for finding the hyperparameters for random forest

Based on these chosen hyperparameters, the error metrics were calculated. For the training dataset, the RMSE is 0.56, the MAE is 0.36, the MSE is 0.31, and the R-squared reaches 0.78. For the test dataset, the metrics are an RMSE of 1.05, MAE of 0.66, MSE of 1.1, and R-squared of 0.57. Figure 6 shows the relationship between the actual and predicted values of the NLIPR for the random forest.

Fig. 6 The relationship between actual and predicted values: a training dataset, b test dataset

The results show that for both the training and test datasets, the decision tree has a higher accuracy compared to the random forest.

4.2.3 NLIPR prediction with XGBoost

The final algorithm for predicting the NLIPR is XGBoost. Table 9 demonstrates the range and ideal hyperparameters after tuning the XGBoost model. The supplementary material (Fig. S8-c) contains graphs showing tuning errors for each model.

Table 9 Search grid point for finding the hyperparameters for XGBoost

With these optimal hyperparameter values, the error metrics are as follows: an RMSE of 0.35, MAE of 0.23, MSE of 0.12, and R-squared of 0.91 for the training dataset; for the test dataset, the values are 0.92, 0.57, 0.85, and 0.67, respectively. Figure 7 compares predicted and actual values for the XGBoost method.

Fig. 7 The relationship between actual and predicted values: a training dataset, b test dataset

The results indicate that XGBoost is the best-performing model for the NLIPR compared to the decision tree and random forest, as its error metrics are lower and its R-squared value is higher.

4.3 TIPR prediction

In the last part of the empirical result, decision tree, random forest, and XGBoost models are developed for predicting the TIPR to identify the best-performing algorithm.

4.3.1 TIPR prediction with regression decision tree

Firstly, a decision tree model is developed for predicting the TIPR. Table 10 represents the range and the optimal values of hyperparameters. Supplementary material (Fig. S9-a) includes graphs of errors for each model during tuning.

Table 10 Search grid point for finding the hyperparameters for decision tree

When the optimal hyperparameter values are determined, the error metrics reach their lowest values. For the training dataset, the RMSE is 0.57, the MAE is 0.23, the MSE is 0.32, and the R-squared value is 0.99. For the test dataset, an RMSE of 2.32, MAE of 0.23, MSE of 5.4, and R-squared of 0.86 are obtained. Figure 8 compares actual and predicted values for the decision tree model.

Fig. 8 The relationship between actual and predicted values: a training dataset, b test dataset

4.3.2 TIPR prediction with random forest

The range and the ideal values of hyperparameters for the random forest model are described in Table 11. Graphs illustrating tuning errors for each model are available in the supplementary material (Fig. S9-b).

Table 11 Search grid point for finding the hyperparameters for random forest

For the random forest model, an RMSE of 0.25, MAE of 0.09, MSE of 0.06, and R-squared of 0.99 are obtained for the training dataset. For the test dataset, the values are 1.1, 0.8, 1.4, and 0.96, respectively. Figure 9 compares the actual and predicted values.

Fig. 9 The relationship between actual and predicted values: a training dataset, b test dataset

The results indicate that the random forest performs strongly on both the training and test datasets.

4.3.3 TIPR prediction with XGBoost

XGBoost, the final algorithm, is discussed in this section for TIPR prediction. The range and optimal hyperparameters for XGBoost are presented in Table 12. Error graphs for model tuning are provided in the supplementary material (Fig. S9-c).

Table 12 Search grid point for finding the hyperparameters for XGBoost

After tuning, the error metrics reached their lowest values in k-fold cross-validation: an RMSE of 0.35, MAE of 0.25, MSE of 0.12, and R-squared of 0.99 for the training dataset. For the test dataset, the values are 1.46, 1.06, 2.14, and 0.95, respectively. Figure 10 compares the actual and predicted values of the TIPR for the XGBoost model.

Fig. 10 The relationship between actual and predicted values: a training dataset, b test dataset

The results indicate that the random forest is the best-performing model on both the training and test datasets, although the difference from XGBoost is not significant.

4.4 Feature importance

One of the significant benefits of the random forest method is its ability to measure feature importance, and many studies have used this capability to highlight the importance of various indicators (see Saarela and Jauhiainen 2021; Molnar et al. 2023; Orji and Ukwandu 2024). Figure 11 illustrates the feature importances for the life, non-life, and total insurance penetration rates. As can be seen, for the LIPR and TIPR, trade openness is the most crucial feature, whereas economic growth is the least important. For the NLIPR, rule of law and economic growth have the highest and lowest impact, respectively.

Fig. 11 Feature importance by random forest
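A sketch of how such a chart can be produced from a fitted random forest is given below; the feature labels are hypothetical stand-ins for the Table 1 indicators, and the forest is the one fitted in the Sect. 3.2 sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical labels for the seven features of Table 1.
features = np.array(["trade openness", "rule of law", "financial dev. index",
                     "FDI", "economic growth", "inflation", "domestic credit"])

# Impurity-based importances from the fitted forest, sorted for plotting.
importances = forest.feature_importances_
order = np.argsort(importances)
plt.barh(features[order], importances[order])
plt.xlabel("Importance")
plt.tight_layout()
plt.show()
```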

4.5 Discussion

In the previous parts of the paper, decision tree, random forest, and XGBoost models were developed to predict the life, non-life, and total insurance penetration rates, and the best-performing algorithm for each was evaluated. Table 13 compares accuracy, in terms of RMSE and R-squared, for both the training and test datasets. Based on the results, all three models performed well on the training dataset, especially for total and life insurance, where the R-squared values all exceed 0.97. Although XGBoost is the best-performing model for the LIPR and TIPR on the training dataset, the random forest algorithm has higher accuracy for the NLIPR. On the test dataset, XGBoost is the best-performing model for the NLIPR, and the random forest has the highest accuracy for both the LIPR and TIPR, although the difference from XGBoost is not significant.

Table 13 Model Selection for Prediction IPR based on R2 and RMSE

Figure 12 compares actual and predicted values for the life, non-life, and total insurance penetration rates. As can be seen, all models show high accuracy; however, as Fig. 12b shows, due to noisy data, the NLIPR prediction is not as good as those for the LIPR and TIPR.

Fig. 12 Prediction versus real values: a LIPR, b NLIPR, c TIPR

The scenario may differ for each country. Table 14 reports the absolute error for selected countries under all developed methods and shows that the best model can vary by country. For instance, for the LIPR, the decision tree algorithm performs best for Hungary, Iceland, Turkey, and Switzerland, whereas the XGBoost model has higher accuracy for Ireland, the United States, and Germany. Additionally, the random forest model outperformed the decision tree and XGBoost in predicting the NLIPR in the United States and Denmark, as well as the TIPR in Hungary, Mexico, and Japan. Therefore, based on these results, an appropriate algorithm should be chosen for each country.

Table 14 Countries’ absolute error
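This per-country comparison can be sketched as follows, reusing the three toy models fitted in Sect. 3; the country labels are randomly assigned here purely for illustration, whereas the paper aligns them with the actual panel rows.

```python
import numpy as np
import pandas as pd

# Hypothetical country labels aligned row-by-row with X_test.
countries = np.random.default_rng(1).choice(["HUN", "ISL", "TUR", "USA"],
                                            size=len(X_test))

preds = {"tree": tree.predict(X_test),
         "forest": forest.predict(X_test),
         "xgboost": booster.predict(X_test)}
errors = pd.DataFrame({m: np.abs(y_test - p) for m, p in preds.items()})
errors["country"] = countries

# Mean absolute error per country and model, and the winner per country.
per_country = errors.groupby("country").mean()
print(per_country.idxmin(axis=1))
```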

Figures 13, 14, and 15 compare the average actual and predicted values of the LIPR, NLIPR, and TIPR, respectively, on the test dataset.

Fig. 13 Actual versus predicted values of LIPR

Fig. 14 Actual versus predicted values of NLIPR

Fig. 15 Actual versus predicted values of TIPR

5 Conclusion

Insurance plays a crucial role in the economic growth of each country. Predicting the IPR provides deep insights into different countries' economic conditions, financial variables, and insurance industries. Therefore, predicting IPRs using financial variables can offer valuable information about each country's insurance sector and financial industry.

To identify the most robust and effective model overall, as well as the most effective method for each country, three ML models were developed and compared to each other. The study covers 30 OECD countries from 2000 to 2021.

In addition, the algorithms were evaluated using RMSE, MSE, MAE, and R-squared. All hyperparameters were tuned using k-fold cross-validation, and the optimal value of each hyperparameter was chosen where the loss functions reached their minimum and the R-squared values their maximum.

For the LIPR and TIPR on unseen data, random forest and XGBoost are the best-performing models overall, with no significant difference between them, although random forest is slightly more accurate. However, results can vary by country: for instance, in Iceland, the decision tree predicted the LIPR better, with an average absolute error of 0.27, compared to 0.61 for random forest and 0.49 for XGBoost. Meanwhile, for non-life insurance, XGBoost performed most accurately, with an RMSE of 0.35, lower than that of the decision tree and random forest.

Since the insurance industry is a significant sector of the financial industry in each country, policymakers can use these findings to enhance decision-making, improve their insurance industries, and subsequently develop their financial sectors. Despite differences in model accuracy, policymakers should choose the best algorithm for their own country. Further studies should collect data from a larger number of countries, especially those with low levels of income, to account for greater variation and noisier data. Hence, advanced models, such as DL methods, are suggested for further research to compare their accuracy with that of ML algorithms.