Introduction

Customer churn is one of the key factors affecting the healthy development of industries and enterprises, and it remains a challenging research topic in both academia and industry1,2,3. For information-industry companies that rely on subscription or order-purchase business models, customer churn, and especially the churn of key customers, can be fatal. Reducing the customer churn rate by 5% can increase profits by 25–125%2. Unfortunately, identifying churn risk usually requires substantial manual data analysis, and by the time the risk is recognized it is often too late to act. To retain existing customers, especially key customers, many companies have tried to distinguish churned from non-churned customers so that they can retain those at risk, but in practice the results have been poor. The loss of existing customers affects not only revenue but also the company's ability to attract new customers. In addition, acquiring a new customer often costs much more (almost 5–6 times as much) than retaining an existing one4,5. Two questions therefore arise. First, can machine learning algorithms, combined with the actual needs of the industry, be used to build efficient customer churn prediction models? Second, to help decision makers without a background in algorithms decide quickly and efficiently, can an intelligent, convenient, and efficient early warning system be developed that detects or predicts customer churn in a timely manner, so that enterprises can act to retain key customers as soon as a churn risk appears and thereby minimize their losses?
The Related work section describes the theoretical basis of the Gradient Boosting6,7, Bayesian8,9, Support Vector Machine10,11,12,13,14,15, Random Forest16, K-Nearest Neighbors17,18, Logistic Regression19,20, Decision Tree21,22,23,24, and Neural Network25,26,27,28,29 algorithms and discusses research on applying them to customer churn prediction. Each of these studies argues for the superiority of the single algorithm it uses. Our analysis of this literature shows that the performance of each algorithm depends strongly on the characteristics of the dataset, so no single algorithm can solve every problem across all practical application scenarios. To address this shortcoming of traditional single-algorithm approaches, this paper proposes a model based on Ensemble-Fusion (integrated learning fusion) that is intended to generalize across complex scenarios and to provide academia and industry with a broadly applicable and efficient customer churn prediction solution. Specifically, this paper first proposes a customer churn prediction algorithm based on the Ensemble-Fusion model and then an efficient churn solution built on that model. Finally, to help the information industry make efficient churn decisions, it puts theory into practice by developing a real-time intelligent early warning system for customer churn. The system monitors customer dynamics in real time, helps enterprises identify potential churners in advance, and immediately alerts the sales team or the Customer Success Management (CSM) team to take proactive action to retain at-risk customers, thus reducing the risk of a serious blow to the enterprise caused by customer churn.

With these goals in mind, this paper studies customer churn prediction using machine learning theories and algorithms. It first presents a solution for handling the large and complex datasets found in industry, then proposes the Ensemble-Fusion (integrated learning fusion) model for customer churn prediction, and finally puts the theory into practice so that enterprises can act quickly and efficiently to retain customers, especially key customers. Drawing on many years of industry experience, we developed an end-to-end real-time intelligent early warning system for customer churn. The system not only predicts customer churn in an organization's production environment but also sends early warnings to relevant personnel, such as the sales and customer success teams, so that they can act immediately to retain customers who are about to churn.

Building this system required solving several difficult problems that arose during research and development. First, real production data are structurally complex: the relevant data are often spread across departments in different regions of the world and stored in databases with different data structures, which makes data collection very difficult. Restrictions on sensitive information and related agreements make it harder still to collect all the relevant data. The data collection problem therefore becomes how to construct an effective model from a limited dataset. Second, the collected data contain a great deal of noise and, owing to business complexity, are highly imbalanced30,31,32,33,34,35,36,37,38; moreover, there are no labels indicating whether a customer has churned, so substantial preparatory work and business knowledge are required before the data can be collected and processed. To address these issues in customer churn prediction, the main contributions of this paper are as follows:

  1. This paper proposes a novel model named Ensemble-Fusion, based on ML (machine learning) theories and algorithms, to predict customer churn in SAAS36 production environments (Software-as-a-Service, SAAS, is a cloud-based software delivery model in which the cloud provider develops and maintains the application software). The work focuses on the exceptionally complex collection, processing, and application of data on real production lines and presents a detailed architecture diagram for customer churn prediction data processing (see Sect. “Customer churn prediction solution based on Ensemble-Fusion model”). The proposed solution has been deployed in an actual production environment with good results.

  2. This paper uses 17 machine learning algorithms in 9 categories as baseline classifiers, including support vector machine, random forest, K-nearest neighbors, gradient boosting, logistic regression, Bayesian, decision tree, and neural network algorithms, to propose a customer churn data processing architecture and a customer churn prediction model based on integrated learning fusion (Ensemble-Fusion). The high accuracy and effectiveness of the prediction model are verified with key machine learning evaluation metrics: precision, recall, accuracy, AUC37 (area under the ROC38 curve, which measures the entire two-dimensional area under the ROC curve and thus aggregates performance across all possible classification thresholds), and F1-score39,40 (a metric commonly used to evaluate classification models, defined as the harmonic mean of the model's precision and recall).

  3. To improve industrial productivity by linking theory to practice, this paper also designs and develops an intelligent early warning system based on the Ensemble-Fusion model. The system helps enterprises predict customer churn, especially the churn of important customers, quickly and effectively, so that they can retain at-risk customers and avoid the serious damage that churn can cause. It not only presents important customers with a high probability of churn but also automatically provides relevant information based on the prediction results, reminding the relevant personnel to take proactive action to retain important customers who are about to churn and thereby reduce losses.

This paper not only provides a theoretical solution to the important problem of customer churn but also turns that theory into a concrete intelligent early warning system. The system helps enterprises, and in particular decision makers and leaders without a machine learning background, make effective decisions about customer churn easily, so that they can retain key customers and increase the enterprise's competitiveness.

The rest of this paper is organized as follows. Section “A research approach to customer churn prediction based on Ensemble-Fusion model” introduces the theory, methodology, solution, and overall architecture of the machine learning-based intelligent customer churn system, together with the proposed customer churn prediction algorithm based on the Ensemble-Fusion model. Section “Experiment and result” validates the proposed algorithm and verifies the accuracy and effectiveness of the churn prediction model using key machine learning evaluation metrics: precision, recall, accuracy, AUC, and F1-score37,38,39,40. Section “Intelligent early warning system for customer churn prediction based on Ensemble-Fusion model” describes the main functions of the intelligent early warning system for customer churn prediction and details its use cases. Section “Related work” reviews relevant research on customer churn. Finally, Section “Conclusions and future work” summarizes the conclusions and outlook.

A research approach to customer churn prediction based on Ensemble-Fusion model

This section proposes a solution for customer churn prediction based on the Ensemble-Fusion model. It first outlines the specific customer churn scenarios to be addressed and presents, top-down, the ideas and feasible solutions for the problem. It then describes the design and implementation of an end-to-end intelligent customer churn prediction system in three parts, each with a detailed process: the collection and processing of complex datasets, the construction of the prediction models, and the intelligent system platform. Next, it analyzes the machine learning models for customer churn prediction in depth, and finally it proposes a new customer churn prediction model together with a concrete implementation algorithm.

Customer churn prediction solution based on Ensemble-Fusion model

This section proposes a solution based on the Ensemble-Fusion model to predict customer churn and help organizations reduce it. As shown in Fig. 1, the solution consists of two main parts: offline training and online inference. During offline training, data preprocessing30,31,32,33 is first required to clean the input data and annotate each record as churn or non-churn. Relevant features are then extracted based on business knowledge. For example, the feature “Trend of meetings compared to last year” compares the number of meetings a customer booked in the current year with the number booked in the previous year; a decline in booked meetings can indicate that the customer is about to churn. Similarly, the feature “Trend in meeting duration compared to last year” compares the total meeting duration in the current year with that of the previous year and can likewise be used to predict churn. Such features effectively reflect an imminent or significant churn trend. The model features are described in Table 1; the training data come from actual production line usage data.
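The two trend features above can be computed, for instance, as a ratio of current-year to previous-year usage per account. The sketch below assumes a pandas table with one row per account and year; the column names (`account_id`, `year`, `meeting_count`, `meeting_minutes`) and the ratio form of the trend are illustrative assumptions, not the schema of Table 1.

```python
import pandas as pd


def yoy_trend_features(usage: pd.DataFrame, current_year: int) -> pd.DataFrame:
    """Year-over-year trend features per account (hypothetical schema).

    Expects one row per (account_id, year) with numeric 'meeting_count'
    and 'meeting_minutes' columns.
    """
    cur = usage[usage["year"] == current_year].set_index("account_id")
    # Align previous-year rows to the accounts present in the current year.
    prev = usage[usage["year"] == current_year - 1].set_index("account_id").reindex(cur.index)
    out = pd.DataFrame(index=cur.index)
    for col in ["meeting_count", "meeting_minutes"]:
        # Ratio < 1 means usage shrank versus last year, a possible churn signal.
        # Zero or missing previous-year usage becomes NaN instead of dividing by zero.
        out[f"{col}_trend"] = cur[col] / prev[col].where(prev[col] != 0)
    return out.reset_index()


usage = pd.DataFrame({
    "account_id": ["A", "A", "B", "B"],
    "year": [2021, 2022, 2021, 2022],
    "meeting_count": [100, 40, 50, 60],
    "meeting_minutes": [3000, 900, 1500, 1800],
})
features = yoy_trend_features(usage, current_year=2022)
```

Here account A's meeting count fell to 0.4 of the previous year's level, while account B's grew to 1.2, so only A shows a churn-like trend.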

Fig. 1
figure 1

Customer Churn Solution Flowchart.

Table 1 Detailed description of characteristics related to customer churn.

The customer churn prediction process and the data flows between its stages are detailed in Fig. 2. Because churned customers make up only a small fraction of the data, the dataset must be balanced before training. The engineered features are then used to train and validate the machine learning model iteratively until it performs well enough to be deployed to the production environment, where the rigorously validated model predicts the likelihood of customer churn in real time.
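The text does not specify which balancing method is used. One simple option, shown here only as an assumption, is random oversampling of the minority (churn) class with scikit-learn's `resample` utility; SMOTE from the imbalanced-learn package is a common alternative. The helper name `oversample_minority` is hypothetical, and balancing should be applied to the training split only.

```python
import numpy as np
from sklearn.utils import resample


def oversample_minority(X: np.ndarray, y: np.ndarray, random_state: int = 0):
    """Randomly duplicate minority-class rows until both classes have equal size.

    Assumes a binary label vector y. Apply to training data only, so that
    duplicated rows never leak into the test set.
    """
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min, y_min = X[y == minority], y[y == minority]
    # Sample minority rows with replacement up to the majority-class count.
    X_up, y_up = resample(X_min, y_min, replace=True,
                          n_samples=int(counts.max()), random_state=random_state)
    X_bal = np.vstack([X[y != minority], X_up])
    y_bal = np.concatenate([y[y != minority], y_up])
    return X_bal, y_bal


X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)  # 8 non-churn rows, 2 churn rows
X_bal, y_bal = oversample_minority(X, y)
```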

Fig. 2
figure 2

Architecture diagram of customer churn prediction data processing.

For online inference, data cleaning and feature engineering35,36,37 are likewise required, here to construct the inference dataset. This dataset contains no labels, because the target to be predicted, whether a customer will churn in the following months, has not yet occurred at inference time. The cleaned, unlabeled data are fed into the trained model to obtain the final predictions. Information about the customers that the validated model predicts to be at high risk of churn is then displayed in the intelligent churn prediction system and sent to project stakeholders in real time via email, instant messaging, and other channels, so that they can act proactively to minimize churn losses.
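The online step can be sketched as scoring unlabeled accounts with the trained model and keeping the highest-risk ones for the platform and its alerts. The function name `flag_high_risk`, the 0.5 probability threshold, the top-100 cut-off (echoing the “Top 100” list), and the one-feature logistic-regression model used for the demo are all illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression


def flag_high_risk(model, features: pd.DataFrame, account_ids, top_n: int = 100,
                   threshold: float = 0.5) -> pd.DataFrame:
    """Rank unlabeled accounts by predicted churn probability (class 1 = churn)."""
    churn_col = list(model.classes_).index(1)
    proba = model.predict_proba(features)[:, churn_col]
    scored = pd.DataFrame({"account_id": list(account_ids), "churn_probability": proba})
    # Keep only accounts at or above the alert threshold, highest risk first.
    scored = scored[scored["churn_probability"] >= threshold]
    return (scored.sort_values("churn_probability", ascending=False)
                  .head(top_n).reset_index(drop=True))


# Demo: a model trained offline on labeled history (1 = churned).
train = pd.DataFrame({"meeting_count_trend": [0.2, 0.3, 0.4, 1.1, 1.2, 1.3]})
labels = [1, 1, 1, 0, 0, 0]
model = LogisticRegression().fit(train, labels)

# Current features for accounts whose outcome is not yet known.
current = pd.DataFrame({"meeting_count_trend": [0.25, 1.25, 0.35]})
flagged = flag_high_risk(model, current, account_ids=["A", "B", "C"])
```

The returned table is what a dashboard list or an alert e-mail would be built from; the cleaning and feature steps applied to `current` must match those used for training.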

Customer churn data prediction algorithm based on Ensemble-Fusion model

To support the study of customer churn, Section “Related work” presents the theoretical basis of the Support Vector Machine, Random Forest, K-Nearest Neighbors, Gradient Boosting, Logistic Regression, Bayesian, Decision Tree, and Neural Network algorithms and discusses research on applying them to customer churn prediction. Each study in this literature argues for the superiority of the single algorithm it uses; our analysis shows, however, that the performance of each algorithm depends strongly on the characteristics of the dataset, so no single algorithm can solve every problem in all practical application scenarios. To address this shortcoming, this paper proposes a model based on Ensemble-Fusion (integrated learning fusion) that is intended to generalize across complex scenarios and to give academia and industry a broadly applicable and efficient customer churn prediction solution.

This subsection describes the construction of the customer churn prediction method based on the Ensemble-Fusion model, which is detailed in Algorithm 1. In Section “Experiment and result”, the model is compared with 17 machine learning algorithms to validate its high accuracy, strong robustness, and ease of scaling.
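Algorithm 1 itself is given only as a figure, so its exact fusion rule is not reproduced here. As a hedged illustration of ensemble fusion in general, the sketch below stacks three of the baseline families (gradient boosting, random forest, RBF-kernel SVM) under a logistic-regression meta-learner using scikit-learn's `StackingClassifier`. The choice of base learners, the meta-learner, the helper name `build_fusion_model`, and the synthetic imbalanced dataset are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_fusion_model(random_state: int = 0) -> StackingClassifier:
    base_learners = [
        ("gb", GradientBoostingClassifier(random_state=random_state)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("svm_rbf", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ]
    # The meta-learner is fit on out-of-fold predictions of the base learners
    # (cv=5), so it never sees predictions made on a base learner's own training rows.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000),
                              cv=5)


# Synthetic stand-in for churn data: roughly 15% positive (churn) class.
X, y = make_classification(n_samples=600, n_features=12, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
fusion = build_fusion_model().fit(X_tr, y_tr)
pred = fusion.predict(X_te)
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
```

Stacking is only one fusion strategy; soft voting (`VotingClassifier` with `voting="soft"`) is a simpler alternative when the base learners expose probabilities.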

End-to-end customer churn prediction real-time intelligent early warning system design

To further help organizations reduce customer churn, this subsection designs and develops an intelligent customer churn prediction system. The system consists of three main parts. The first part collects and processes datasets from different business areas in four major processes. The first process is the ingestion of heterogeneous data: data sources in a real production environment are unusually complex and mainly include system application data, Billing (financial billing) customer data, product transaction data, product discount data, product sales data, cross-departmental transaction data, reconciliation data, and posting data.

figure a

Customer Churn Prediction Algorithm Based on Ensemble-Fusion Model

In a large multinational group of companies, each system has its own technical architecture, so data arrive in different formats, typically JSON, XML, plain text files, and others. These formats must be unified: ETL (Extract, Transform, Load) retrieves the data from heterogeneous databases of different types (e.g., MySQL, Oracle, MongoDB, and Redis) and stores them uniformly in a MySQL database. The second process structures the data through database management to construct the training and testing datasets for the machine learning models. The third process builds the machine learning model for customer churn prediction from the formatted, unified dataset obtained in the previous step (details are given in Sect. “AUC results and analysis”). The fourth process passes business logic through a standardized RESTful API and ultimately displays the relevant information on the front-end pages, mainly the customer churn information, the customer churn heat map, the customer churn management platform, and the 360-degree customer churn analysis; this process is illustrated in Fig. 2 (Customer Churn Prediction Data Processing Architecture Diagram). The second part of the system is the ML (machine learning) modeling system, which covers data acquisition, feature engineering, and model training; it is elaborated in subsection 2.3. The third part is the visualization and presentation platform, which displays customer churn information and is described in Section “Experiment and result”.

The system architecture is shown in Fig. 3. It consists of three parts. The first part is data collection: for Fortune 500 multinational corporations, whose businesses are spread around the world, collecting data is complex and time-consuming. The second part performs data processing, such as feature engineering, on the collected data, then trains and validates the machine learning models and selects the model with the highest accuracy for use in the customer churn prediction system. The third part is the platform display, which presents multi-dimensional warning information and real-time forecasts of churn for specific customers; its functions are elaborated in Section “Experiment and result”.
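The format-unification step of the first process can be sketched in a few lines of standard-library Python. This is a toy under explicit assumptions: the record layout (`account_id`, `amount`), the `billing` table, the normalization rules, and the use of an in-memory SQLite database as a stand-in for the MySQL target are all hypothetical; a production pipeline would use real database drivers and ETL tooling.

```python
import csv
import io
import json
import sqlite3
import xml.etree.ElementTree as ET


# Extract: one parser per source format, each yielding raw (account_id, amount) pairs.
def from_json(text):
    return [(r["account_id"], r["amount"]) for r in json.loads(text)]


def from_xml(text):
    root = ET.fromstring(text)
    return [(rec.findtext("account_id"), rec.findtext("amount")) for rec in root.iter("record")]


def from_csv(text):
    return [(row["account_id"], row["amount"]) for row in csv.DictReader(io.StringIO(text))]


# Transform: normalize identifiers and cast amounts to a single numeric type.
def transform(records):
    return [(acc.strip().upper(), round(float(amt), 2)) for acc, amt in records]


# Load: write everything into one unified table.
def load(conn, records):
    conn.execute("CREATE TABLE IF NOT EXISTS billing (account_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO billing VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL target
load(conn, transform(from_json('[{"account_id": " a ", "amount": "120.5"}]')))
load(conn, transform(from_xml(
    "<billing><record><account_id>B</account_id><amount>80</amount></record></billing>")))
load(conn, transform(from_csv("account_id,amount\nC,42\n")))
rows = conn.execute("SELECT account_id, amount FROM billing ORDER BY account_id").fetchall()
```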

Fig. 3
figure 3

Architecture diagram of customer churn intelligent early warning system.

Fig. 4 illustrates concrete use cases of the intelligent system. As shown in Fig. 4, the sales layer and the leadership layer are the two key target roles of the platform. For sales, the system displays customers with a high churn risk together with relevant details, and it sends regular alert emails, instant messages, and other early warnings to notify project stakeholders so that they can intervene proactively in impending churn. Salespeople can also send feedback on the forecasts, which helps continuously improve and optimize the proposed machine learning model. For leadership, tracking churn across the global customer base matters more than tracking individual customers. The real-time intelligent alert system therefore includes a dashboard module that shows leadership the overall churn trend from a global perspective, helping decision makers make efficient decisions promptly.

Fig. 4
figure 4

Use case diagram for a customer churn platform.

Experiment and result

This section compares the experimental results of the proposed Ensemble-Fusion-based machine learning model for customer churn prediction with those of 17 classical machine learning algorithms in 9 categories. The experiments use a private dataset from the Company's customer production line system covering 2015 to 2022, with 80% of the data used for training and 20% for testing; K-fold cross-validation is used to assess the accuracy of the model.
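Because the Company's dataset is private, the evaluation protocol can only be sketched on synthetic data. The example below, assuming scikit-learn, holds out a stratified 20% test set and runs K-fold cross-validation on the remaining 80%. K = 5 is an assumption (the text does not state K here), and the random forest is only a placeholder estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic, imbalanced stand-in for the private churn dataset.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9], random_state=42)

# 80/20 split; stratify so the rare churn class appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# K-fold cross-validation (K=5 here) on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Final check on the untouched 20% test set.
test_accuracy = model.fit(X_train, y_train).score(X_test, y_test)
```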

Model evaluation indicators

To evaluate the performance of machine learning models, the widely recognized metrics of precision, recall, accuracy, and F1-score38,39,40,41 are used; they measure how well a model predicts customer churn. True positives and false positives are denoted TP and FP, respectively42, and true negatives and false negatives are denoted TN and FN, respectively43. TP is the number of customers whose actual label is churn and whose predicted label is churn; FP is the number of customers whose actual label is non-churn but whose predicted label is churn; FN is the number of customers whose actual label is churn but whose predicted label is non-churn; and TN is the number of customers whose actual label is non-churn and whose predicted label is non-churn. Precision, recall, accuracy, and F1-score are then defined as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$
(1)
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}, \qquad F1\text{-}Score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
(2)
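Equations (1) and (2) translate directly into code. The counts in the example are hypothetical, chosen only to show the arithmetic: with TP = 90, FP = 10, TN = 880, and FN = 20, precision is 90/100 = 0.90, recall is 90/110 ≈ 0.818, accuracy is 970/1000 = 0.97, and F1 is 180/210 ≈ 0.857. Accuracy stays high even though about 18% of actual churners (20 of 110) are missed, which is why recall and F1 matter on imbalanced churn data.

```python
def churn_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, recall, accuracy and F1-score from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0, a common convention.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


m = churn_metrics(tp=90, fp=10, tn=880, fn=20)
```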

Results of model indicators related to customer churn prediction

To evaluate the proposed Ensemble-Fusion-based customer churn prediction algorithm, churn prediction is performed both with the proposed model and with 17 classical machine learning algorithms in 9 major categories, and their precision, recall, accuracy, and F1-score38,39,40,41 are compared; the detailed results are given in Table 2. Among the 17 classical algorithms, the gradient boosting classifier and random forest achieve accuracies of 95.32% and 94.29%, respectively, and the F1-score of the gradient boosting classifier reaches 96.3%, better than the other classical classifiers. The proposed integrated learning fusion model achieves an accuracy of 95.35% and an F1-score of 96.96%, higher than the classical baseline classifiers. The precision, recall, accuracy, and F1-score of the 17 classical algorithms are compared in Figs. 5, 6, 7 and 8.

Table 2 Comparison of results of customer churn prediction algorithm metrics.
Fig. 5
figure 5

Comparison of algorithm precision.

Fig. 6
figure 6

Comparison of algorithm recall.

Fig. 7
figure 7

Algorithm Accuracy comparison chart.

Fig. 8
figure 8

Algorithm F1-Score Comparison chart.

AUC results and analysis

To further evaluate model performance, this section also uses the AUC13 metric; a higher AUC indicates better model performance. Fivefold cross-validation14 is used to compute the ROC curves, and the proposed integrated learning fusion model obtains the highest AUC. The detailed comparison is given in Table 3, and the ROC15 curves of the machine learning algorithms are shown in Figs. 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 and 27.
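The per-fold and average AUC comparison (as in Figs. 26 and 27) can be sketched with scikit-learn's `roc_curve` and `auc`. The synthetic dataset and the gradient boosting classifier are stand-ins, since the production data are private; any estimator with `predict_proba` can be substituted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=800, n_features=10, weights=[0.85], random_state=7)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

fold_aucs = []
for train_idx, test_idx in cv.split(X, y):
    model = GradientBoostingClassifier(random_state=7).fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[test_idx])[:, 1]   # probability of class 1 (churn)
    fpr, tpr, _ = roc_curve(y[test_idx], scores)      # one ROC curve per fold
    fold_aucs.append(auc(fpr, tpr))                   # area under that fold's curve

mean_auc = float(np.mean(fold_aucs))
```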

Table 3 Comparison of AUC scores of customer churn prediction algorithms.
Fig. 9
figure 9

SVM (RBF) algorithm ROC and AUC.

Fig. 10
figure 10

SVM (RBF) algorithm ROC and AUC.

Fig. 11
figure 11

SVM (Poly) algorithm AUC.

Fig. 12
figure 12

SVM (Sigmoid) algorithm AUC.

Fig. 13
figure 13

Random Forest algorithm AUC.

Fig. 14
figure 14

KNN algorithm AUC.

Fig. 15
figure 15

Random Forest algorithm AUC.

Fig. 16
figure 16

LR algorithm AUC.

Fig. 17
figure 17

MLP (Algorithm 16) AUC.

Fig. 18
figure 18

MLP (Algorithm 17) AUC.

Fig. 19
figure 19

MultinomialNB algorithm AUC.

Fig. 20
figure 20

BernoulliNB algorithm AUC.

Fig. 21
figure 21

GaussianNB algorithm AUC.

Fig. 22
figure 22

DT(CART) algorithm AUC.

Fig. 23
figure 23

ID3 algorithm AUC.

Fig. 24
figure 24

ExtraTrees algorithm AUC.

Fig. 25
figure 25

AdaBoost algorithm AUC.

Fig. 26
figure 26

Comparison of K-fold AUC for each algorithm.

Fig. 27
figure 27

Comparison of average AUC by algorithm.

Intelligent early warning system for customer churn prediction based on Ensemble-Fusion model

This section describes the main functions of the real-time intelligent early warning system for customer churn prediction based on the Ensemble-Fusion model.

Information relevant to predicting customer churn

Figure 28 shows the top five of the “Top 100” accounts with the highest churn risk, with details such as account name and account ID displayed in the table. If a prediction is incorrect, the user can report it through the corresponding action in the system. Clicking an account ID opens the detailed prediction page, which is described in Section “Demonstration of the intelligent system of customer churn prediction”.

Fig. 28
figure 28

Example display of customer churn information.

Demonstration of the intelligent system of customer churn prediction

Figs. 29 and 30 show the detail page of the real-time intelligent customer churn prediction system, which consists of two parts. The upper half of the page displays basic information about the current churn prediction, such as the user's ID, name, and platform type. The lower half gives the reasons for the predicted churn and a multi-dimensional analysis of them, helping stakeholders and staff in the relevant departments analyze the account's current billing and usage trends, identify churn trends in time, and take effective action.

Fig. 29
figure 29

Example display of lost customer details.

Fig. 30
figure 30

Example display of user and account trends.

Dashboard for an intelligent system for customer churn prediction

For the dashboard designed for leadership decision makers, the results of the predictive analysis of customer churn data are presented in Figs. 31, 32, 33 and 34. The real-time intelligent alert dashboard consists of five sections. The first section shows the overall customer churn trend, covering three items: average churn rate, fully renewed accounts, and new onboarding contracts. The second section presents customer churn as a key driver to support the leadership team's decisions. The third section is the churn heat map, which displays churn rates for selected regions and also provides a top correlation analysis and a top correlation forecast for the next six months.

Fig. 31
figure 31

Leadership Decision Panel Design—Generalized Information.

Fig. 32
figure 32

Leadership Decision Panel Design—Churn Heat Map44(We developed a customer churn intelligent early warning system using open source pyecharts, https://github.com/pyecharts/pyecharts).

Fig. 33
figure 33

Leadership decision panel design—correlation coefficient analysis.

Fig. 34
figure 34

Leadership decision panel design—360 degree information analysis presentation.

Customer churn prediction intelligent system evaluation module

To evaluate the performance of the model in the Ensemble-Fusion-based intelligent early warning system for customer churn, this subsection tests it on 2018 production line data. Figure 35 shows the evaluation results: testing and validation give a model accuracy above 95.8%. Higher accuracy means the model's churn and non-churn predictions are more often correct, so retention efforts can be focused on customers who are genuinely likely to churn; this helps reduce churn, improve customer retention, and thus lower the risk of serious damage to the organization caused by customer churn.

Fig. 35
figure 35

Customer churn prediction model evaluation page.

Related work

To identify the best model for customer churn prediction, this section provides a theoretical analysis of related machine learning algorithms and models. It first describes 17 machine learning algorithms in 9 categories; then, in the third part, a customer churn prediction model based on the Ensemble-Fusion model is proposed and verified in 17 sets of experiments to perform well and to be robust and easy to extend.

Support vector machines

Support vector machines (SVM)10,11 are a set of supervised learning methods used for classification, regression, and outlier detection12. SVMs are effective in high-dimensional spaces and remain effective even when the number of dimensions exceeds the number of samples. For the hard-margin SVM, the Lagrangian of the training objective is:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{n} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{n} \alpha_{i}$$
(3)

Here, x_i are the n training samples, y_i ∈ {+1, −1} are their class labels, and α_i ≥ 0 are the Lagrange multipliers. In customer churn prediction, the SVM splits the predictions into two classes: the positive class (customer churn) and the negative class (customer non-churn). Commonly used kernels are the linear, polynomial (poly), and radial basis function (RBF) kernels.
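The following Python sketch illustrates the linear-kernel case only. Instead of solving the Lagrangian in Eq. (3) directly, it minimizes the related soft-margin primal objective, λ/2·‖w‖² plus the average hinge loss, with a Pegasos-style stochastic subgradient method. The bias is absorbed as a constant feature (and is therefore also regularized), labels follow the convention above (+1 churn, −1 non-churn), and the function names and toy data are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the soft-margin primal
        lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i)),
    not on the Lagrangian of Eq. (3). A constant 1.0 is appended to each
    sample so that the last weight acts as the bias b."""
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    rng = random.Random(seed)
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                       # decreasing step size
            margin = y[i] * dot(w, Xb[i])
            w = [(1.0 - eta * lam) * wj for wj in w]    # regulariser shrink step
            if margin < 1.0:                            # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """+1 (churn) if x lies on the positive side of the hyperplane, else -1."""
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Hypothetical 2-D customer features; +1 = churn, -1 = non-churn.
X = [(2.0, 1.5), (2.5, 2.0), (1.5, 2.5), (3.0, 3.0),
     (-2.0, -1.5), (-2.5, -2.0), (-1.5, -2.5), (-3.0, -3.0)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])   # [1, 1, 1, 1, -1, -1, -1, -1]
```

The poly and RBF kernels replace the dot product with a kernel function and are usually trained in the dual form; library implementations such as scikit-learn's `SVC(kernel="rbf")` handle that case.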

Random forests

A random forest is an ensemble of decision trees16: each tree is trained on a random bootstrap sample of the training data, and each split considers only a random subset of the variables. The basic concept is that a group of “weak learners” can come together to build a “strong learner”. The trees are constructed at training time, and for classification the forest outputs the class that is the mode of the classes predicted by the individual trees (for regression, the mean of their predictions). Each tree is grown independently and depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest, so the trees form a parallel ensemble rather than a sequence. Combining bootstrap aggregation (bagging) with random feature selection reduces the variance of the individual trees, which makes the random forest one of the best-performing classification algorithms and allows it to classify large and complex datasets accurately. RF models have also been shown to be robust predictors for both small sample sizes and high-dimensional data.
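To make bagging, random feature selection, and voting concrete, here is a deliberately simplified Python sketch in which every tree is a depth-1 decision stump chosen by weighted Gini impurity. Real random forests grow much deeper trees and draw a fresh feature subset at every split; the function names, the `max_features=1` setting, and the toy churn data (two hypothetical features: monthly logins and support tickets) are assumptions for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single split (feature, threshold) among `features` by weighted Gini."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left), majority(right))
    if best is None:                        # no usable split: constant leaf
        label = majority(y)
        return (features[0], float("inf"), label, label)
    return best[1:]                         # (feature, threshold, left, right)


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), max_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict_forest(forest, x):
    votes = [left if x[f] <= thr else right for f, thr, left, right in forest]
    return majority(votes)                  # mode of the trees' predictions


# Hypothetical features: (monthly logins, support tickets); 1 = churn.
X = [(2, 5), (1, 6), (3, 4), (2, 7), (9, 1), (8, 0), (10, 2), (7, 1)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
```

For real workloads, a library implementation such as scikit-learn's `RandomForestClassifier` (configured through `n_estimators` and `max_features`) is the usual choice.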

K-nearest-neighbors

The k-nearest-neighbors algorithm (KNN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 195117. It is simple, easy to implement, and can be used for both classification and regression. In both cases, the input consists of the k closest training examples in the data set, and the output depends on whether KNN18 is used for classification or regression. The training examples are vectors in a multidimensional feature space, each with a class label, and the training phase of the algorithm consists only of storing these feature vectors and class labels. The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point and predict its label from them. The number of samples can be a user-defined constant (k-nearest neighbor learning) or vary with the local density of points (radius-based neighbor learning). The distance can, in general, be any metric; standard Euclidean distance is the most common choice. Neighbors-based methods are known as non-generalizing machine learning methods because they simply “remember” all of their training data. KNN makes no assumptions about the underlying data, and it is also called a lazy learner because it does not learn from the training set immediately: it stores the data set and defers the work until classification time, when a new sample is assigned to the category most common among its k nearest neighbors.
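The lazy-learning behaviour described above fits in a few lines. In this Python sketch, "training" is simply keeping the labelled samples, and prediction is a majority vote among the k nearest samples by Euclidean distance; the function name and the two-feature toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples nearest to x.

    There is no training step: the stored samples are the model.
    """
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], x))  # Euclidean
    k_labels = [label for _, label in by_distance[:k]]
    return Counter(k_labels).most_common(1)[0][0]


# Hypothetical, already-scaled features; labels are "churn" / "stay".
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["churn", "churn", "churn", "stay", "stay", "stay"]
print(knn_predict(train_X, train_y, (2, 2)))   # churn
print(knn_predict(train_X, train_y, (8, 7)))   # stay
```

Because KNN is distance-based, features on very different scales (for example, tenure in months versus monthly charges) should normally be standardized first; otherwise the larger-scale feature dominates the distance.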

Gradient boosting classifier

Gradient boosting34,35 produces a prediction model in the form of an ensemble of weak prediction models, usually decision trees. The gradient boosting classifier has many advantages, such as high predictive accuracy, the ability to model non-linear data, and flexible handling of various types of data. Predictions are made not by a majority vote but by summing the weak learners' outputs, each scaled by a learning rate (shrinkage factor); for classification, this sum is a score, such as a log-odds, that is converted into a class probability. Gradient boosting machines (GBMs) are an extremely popular machine learning algorithm that has proven successful across many domains. A simple GBM contains two categories of hyper-parameters: boosting hyper-parameters (such as the number of trees and the learning rate) and tree-specific hyper-parameters (such as tree depth and the minimum number of samples per leaf). Gradient boosting recasts boosting as a numerical optimization problem in which the objective is to minimize the loss function of the model by adding weak learners using gradient descent: each new learner is fitted to the negative gradient of the loss with respect to the current predictions. Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. Because gradient boosting is based on minimizing a loss function, different loss functions can be used, making it a flexible technique that can be applied to regression as well as binary and multi-class classification.
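The sketch below shows this mechanism for binary classification with the logistic (log) loss on a single numeric feature: the model keeps a log-odds score F(x), each round fits a least-squares regression stump to the negative gradient y − sigmoid(F), and the stump's output, scaled by the learning rate, is added to F. It is a simplified illustration (for example, it omits the Newton-step leaf values used by many implementations), and the function names, hyper-parameter values, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regression_stump(x, r):
    """Least-squares stump on one numeric feature: (threshold, left, right)."""
    best = None
    for thr in sorted(set(x))[:-1]:          # keep both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]


def fit_gb(x, y, n_rounds=50, lr=0.3):
    """Gradient boosting for the binary log-loss with regression stumps."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))             # initial log-odds
    F = [f0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss w.r.t. F is y - sigmoid(F).
        residuals = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]
        thr, lv, rv = fit_regression_stump(x, residuals)
        stumps.append((thr, lv, rv))
        # Add the new weak learner, scaled by the learning rate.
        F = [Fi + lr * (lv if xi <= thr else rv) for xi, Fi in zip(x, F)]
    return f0, stumps, lr


def predict_proba(model, xi):
    f0, stumps, lr = model
    F = f0 + sum(lr * (lv if xi <= thr else rv) for thr, lv, rv in stumps)
    return sigmoid(F)                        # probability of class 1 (churn)


x = [1, 2, 3, 4, 10, 11, 12, 13]   # hypothetical: months since last order
y = [0, 0, 0, 0, 1, 1, 1, 1]       # 1 = churned
model = fit_gb(x, y)
```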

Theoretical analysis of customer churn rate prediction based on logistic regression

Logistic regression19,20 is a generalized linear model; despite its name, it is a classification algorithm used in supervised learning. Because it performs well, it is often used for binary classification and can also be extended to multi-class problems19. In customer churn rate prediction, logistic regression can be applied as a binary classification problem, for example by labeling customer churn as 0 and non-churn as 1. Then, for each set of input data, according to the Sigmoid function20

$$g(z) = \frac{1}{1 + e^{-z}}$$
(4)

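in logistic regression, the predicted value is mapped to the interval [0, 1] and interpreted as the probability that the sample belongs to class 1. Here z is the linear score, a weighted sum of the input features plus a bias term. If g(z) ≥ 0.5, the sample is assigned to class 1 (non-churn); otherwise, it is assigned to class 0 (churn).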
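A minimal Python sketch of this procedure: the weights are fitted by batch gradient descent on the average log-loss, and the 0.5 threshold is applied to g(z), which estimates the probability of label 1, using the labelling above (0 = churn, 1 = non-churn). The function names, learning rate, number of epochs, and the two standardized toy features are hypothetical.

```python
import math


def sigmoid(z):
    """Eq. (4): g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; labels y are 0 or 1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # The gradient of the log-loss w.r.t. z is simply g(z) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(w, b, x):
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)   # P(label = 1)
    return 1 if p >= 0.5 else 0       # 1 = non-churn, 0 = churn


# Hypothetical standardized features: (login frequency, support tickets).
X = [[-1.5, 1.0], [-1.0, 0.5], [-0.5, 1.5],    # churned customers (label 0)
     [1.5, -1.0], [1.0, -0.5], [0.5, -1.5]]    # retained customers (label 1)
y = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(X, y)
```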

Theoretical analysis of customer churn rate prediction based on Bayesian theory

Research on Bayesian approaches to customer churn prediction has so far largely been limited to applying Naive Bayes8. The basic idea of the Naive Bayes algorithm9 is as follows: for a given item to be classified, compute the probability of each category conditioned on the item's features, assuming that the features are conditionally independent given the category; the category with the highest probability is taken as the category to which the item belongs.
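The following Python sketch implements a categorical Naive Bayes classifier with Laplace (add-one) smoothing, which avoids zero probabilities for feature values never seen in a class. Log-probabilities are summed instead of multiplying raw probabilities, to avoid numerical underflow when there are many features. The feature names (contract type, support-call level) and the data are hypothetical.

```python
import math
from collections import Counter


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    # counts[c][j][v] = how often feature j takes value v in class c
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    values = [set() for _ in range(n_features)]      # distinct values per feature
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha, len(labels)


def predict_naive_bayes(model, row):
    """Return the class with the highest (log) posterior score."""
    class_counts, counts, values, alpha, n = model
    best_class, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)                     # log prior P(c)
        for j, v in enumerate(row):
            # Smoothed log P(x_j = v | c); features assumed independent given c.
            score += math.log((counts[c][j][v] + alpha)
                              / (n_c + alpha * len(values[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical data: (contract type, support-call level) -> churn / stay.
rows = [("monthly", "high"), ("monthly", "high"), ("monthly", "low"),
        ("yearly", "low"), ("yearly", "low"), ("yearly", "high")]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]
model = fit_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("monthly", "high")))   # churn
```

With this toy data, a monthly contract with a high support-call level scores 0.5 × 0.8 × 0.6 = 0.24 for churn versus 0.5 × 0.2 × 0.4 = 0.04 for stay, so it is classified as churn.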

Theoretical analysis of customer churn rate prediction based on decision tree

In the research on customer churn prediction, a few studies use decision tree algorithms21. The decision tree22 is a supervised learning algorithm that can be used to solve classification and regression problems. It follows a top-down divide-and-conquer strategy: starting from the root node, the algorithm recursively splits nodes until it reaches the leaf nodes, and each split is chosen according to a splitting criterion, generally information gain, gain ratio, or the Gini index23. The main decision tree algorithms are ID3, which uses information gain; C4.5, which uses the gain ratio; and CART, which uses the Gini index24.
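The three splitting criteria can be compared on a toy split. This Python sketch computes the information gain (ID3), gain ratio (C4.5), and weighted Gini impurity (CART) for a candidate split of a node's labels into child nodes; a higher gain or gain ratio, or a lower weighted Gini impurity, indicates a better split. The function names and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Information gain (ID3), gain ratio (C4.5) and weighted Gini (CART)
    for splitting the `parent` labels into the label lists in `children`."""
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    info_gain = entropy(parent) - sum(w * entropy(ch)
                                      for w, ch in zip(weights, children))
    # Split information penalizes splits into many small branches.
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    weighted_gini = sum(w * gini(ch) for w, ch in zip(weights, children))
    return info_gain, gain_ratio, weighted_gini


parent = ["churn"] * 4 + ["stay"] * 4
perfect = split_scores(parent, [["churn"] * 4, ["stay"] * 4])
mixed = split_scores(parent, [["churn"] * 3 + ["stay"], ["churn"] + ["stay"] * 3])
# perfect: gain 1.0, gain ratio 1.0, weighted Gini 0.0
# mixed:   gain about 0.189, weighted Gini 0.375
```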

Theoretical analysis of customer churn rate prediction based on neural network

In recent years, deep learning has been widely used to solve complex problems, and it has also been applied to customer churn rate prediction25. The BP neural network was proposed by a group of scientists led by Rumelhart and McClelland in the 1986 book “Parallel Distributed Processing”, which described in detail the error back-propagation algorithm for multilayer perceptrons with nonlinear continuous transfer functions, realizing Minsky’s vision of a multilayer network26. A back-propagation (BP) neural network26 is a multilayer feedforward network trained with this error back-propagation algorithm. The standard BP neural network has three layers, namely the input layer, the hidden layer, and the output layer, as shown in Fig. 36.

Fig. 36

The structure of three-layer BP neural network.

The neural network algorithm works in two stages. (1) Forward propagation (FP): data enter at the input layer, pass through the hidden layer under the mapping of the activation function, and reach the output layer; the error between the expected output and the actual output is then used to construct the cost (loss) function. (2) Back propagation (BP): starting from the output layer, the error is propagated back through each hidden layer to correct the weights and biases layer by layer, finally correcting the weights and biases between the input layer and the hidden layer, which yields the trained neural network model. With enough hidden units, neural networks can approximate any continuous nonlinear function to arbitrary accuracy. Because of their simple structure and easy implementation, they have been widely used in time series analysis and nonlinear function regression estimation. However, their development has been limited by the difficulty of determining the network structure, the risk of overfitting, and the tendency to fall into local minima. In this paper, we apply the BP neural network to customer churn prediction with the aim of obtaining good results.
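The two stages can be sketched in plain Python for the three-layer structure of Fig. 36 with a single sigmoid output unit. The sketch uses the squared-error loss ½(y − ŷ)² and per-sample (stochastic) gradient descent; these choices, the class name `ThreeLayerBP`, the hidden-layer size, the learning rate, and the toy churn data are assumptions for illustration rather than the configuration used in the experiments.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerBP:
    """Input layer -> sigmoid hidden layer -> single sigmoid output unit,
    trained by error back-propagation on the loss 0.5 * (y - out) ** 2."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random weights break the symmetry between hidden units.
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Stage 1 (FP): input layer -> hidden layer -> output layer.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, out

    def train_step(self, x, y, lr):
        h, out = self.forward(x)
        # Stage 2 (BP): error signal at the output unit ...
        delta_out = (out - y) * out * (1.0 - out)
        # ... propagated back to each hidden unit (using the current W2).
        delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.W2, h)]
        # Correct the hidden -> output weights and bias.
        self.W2 = [w - lr * delta_out * hj for w, hj in zip(self.W2, h)]
        self.b2 -= lr * delta_out
        # Correct the input -> hidden weights and biases.
        for j, dj in enumerate(delta_h):
            self.W1[j] = [w - lr * dj * xi for w, xi in zip(self.W1[j], x)]
            self.b1[j] -= lr * dj
        return 0.5 * (y - out) ** 2


# Hypothetical normalised features (usage rate, complaint rate); 1 = churn.
X = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.9, 0.1), (0.8, 0.2), (0.7, 0.3)]
y = [1, 1, 1, 0, 0, 0]

net = ThreeLayerBP(n_in=2, n_hidden=3)
first_loss = sum(net.train_step(x, t, lr=0.5) for x, t in zip(X, y))
for _ in range(2000):
    last_loss = sum(net.train_step(x, t, lr=0.5) for x, t in zip(X, y))
predictions = [1 if net.forward(x)[1] >= 0.5 else 0 for x in X]
```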

Conclusions and future work

In this paper, we proposed a novel model named Ensemble-Fusion that uses 17 machine learning algorithms from 9 categories as baseline classifiers. Experiments show that the Ensemble-Fusion model (our model) reaches 95.35%, with an AUC score of 91% and an F1-score of 96.96%, and that its prediction accuracy outperforms that of the other benchmark algorithms. This paper first explains the important role of customer churn research in today’s information industry and states its main contributions. It then focuses on customer churn prediction based on the Ensemble-Fusion (integrated learning fusion) model, covering a customer churn prediction solution based on this model, the design of a real-time intelligent early warning system for customer churn, and a comparison between existing machine learning algorithms for customer churn prediction and the newly proposed model, for which the specific implementation algorithm is given. Next, the paper validates the proposed churn prediction algorithm experimentally and evaluates its robustness using metrics such as precision, recall, accuracy, F1-score, and AUC. Finally, it describes in detail the main functions of the customer churn intelligent early warning system developed from this theory and practice, which aims to help the information industry improve its productivity and excel in today’s globally competitive environment. The study presented in this paper is not free of limitations. First, it is challenging to gather all relevant data on customer churn because of sensitive information and related protocol issues. Therefore, how to construct an effective model from a limited dataset becomes a bottleneck in customer churn prediction research.
Another limitation of the study is that the collected data still contain a lot of noise and lack labels marking customer churn, so considerable time is needed to organize the data and learn the relevant business knowledge before data collection and processing. Finally, customer churn is a multidisciplinary issue involving fields such as psychology, sociology, and economics, but current research may lack an interdisciplinary perspective and approach. Concerning future research, we intend to develop a similar ensemble-fusion classification algorithm that replaces the baseline classifiers with algorithms based on reinforcement learning models. The primary aim is to construct an ensemble classifier that can more easily be applied to complex data structures such as multi-source heterogeneous data. To study customer churn in more depth, there are several potential directions for further research. The first is to obtain more data from industry, for example by combining different feature data. Another interesting direction is to relax strict algorithmic constraints to support compact and dense feature representations, which could be explored in areas such as fast symmetric decomposition techniques.