1 Introduction

As a new business model of the twenty-first century, e-commerce has broad development prospects. E-commerce development has reached a new economic growth point [1, 2]. On November 11, 2019, Alibaba announced that the transaction volume of the Tmall shopping platform had reached RMB 268.4 billion. Furthermore, the transaction volume of the Jingdong Shopping Platform was reported to be RMB 204.4 billion. The number of people shopping online has increased sharply. According to statistics from the China Internet Network Information Center, by the end of 2018, China’s Internet penetration rate was 59.6%, with 829 million Internet users. In 2018, the e-commerce transaction volume in China exceeded RMB 31 trillion, a year-on-year increase of 8.5%. E-commerce, which is developing rapidly, promotes China’s social development (intelligence, digitalization, and networking), which in turn promotes the transformation and upgrading of traditional industries in China. E-commerce transaction volume is an indicator of the development of e-commerce. The prediction of e-commerce transaction volume can provide reliable data support for the development of government and enterprise policies [3, 4].

To date, e-commerce development research has focused on two aspects, namely e-commerce index construction and transaction volume prediction [5, 6]. Two types of e-commerce transaction volume prediction models have been devised, namely statistical regression models and machine learning models. Compared with statistical regression models, machine learning models exhibit strong adaptability to data and strong nonlinear fitting ability and are thus popular. Ji et al. [7] used the C-a-XGBoost model to predict e-commerce transaction volume. This model combines a statistical regression model with a machine learning model: the linear part of the e-commerce transaction series is predicted using the autoregressive integrated moving average (ARIMA) model, the nonlinear part is predicted using the C-XGBoost model, and the final prediction is a weighted combination of the two. Although the prediction accuracy of the model is high, the model is complex and its generalizability is low. Chang et al. [8] forecast transaction volume by establishing a fuzzy neural network. In this method, k-means clustering is first used to cluster historical data, and sales volume is then predicted using the fuzzy neural network. The k-means clustering method requires constant adjustment of the sample classification, which is time consuming and hinders model development. Chen and Lu [9] proposed a hybrid model, which combines clustering technology with machine learning models, to predict e-commerce transaction volume. Compared with a single prediction model, the hybrid model had higher prediction accuracy. Di Pillo et al. [10] used the support vector machine (SVM) model to predict transaction value. Compared with the statistical regression ARIMA model, the SVM model has stronger regression ability for irregular data. However, the SVM model is sensitive to the number of samples: when the number of samples is large, its predictive ability is limited.
Zhang [11] developed a back propagation (BP) neural network model to predict e-commerce transaction volume. To escape from local optima, the particle swarm optimization (PSO) algorithm was used to optimize the network’s parameters. Li et al. [12] proposed a hybrid model, combining a nonlinear autoregressive neural network with ARIMA, to predict the e-commerce transaction volume of China. The hybrid model obtained superior prediction results to the single prediction models.

The heuristic optimization algorithm can address high-dimension and multiple-extremum problems in complex optimization and has been widely used in many fields, such as economic scheduling, model parameter optimization, and signal processing. Classical algorithms—such as the PSO, differential evolution, and genetic optimization algorithms—are widely used [13,14,15]. More recent algorithms include the ant colony, gray wolf, whale, cuckoo search, and ant lion algorithms [16,17,18,19]. These algorithms simulate animal foraging, flight, or crawling behavior. They focus on only one optimal solution during the iteration process, which limits their local development and global exploration abilities. The moth–flame optimization (MFO) algorithm differs from these algorithms in that it can track multiple optimal solutions simultaneously. Aziz et al. [20] applied the MFO algorithm to solve the optimal threshold determination problem in image segmentation. In the case of multilevel thresholds, the MFO algorithm is used to determine the optimal thresholds. The results demonstrated that the MFO algorithm was superior to the whale algorithm. Li et al. [21] used the MFO algorithm to optimize the parameters of a least squares SVM model and predicted the annual power load. Zhang et al. [22] used the MFO algorithm for facial expression recognition and achieved excellent results.

The extreme learning machine (ELM) model is widely used because of its excellent nonlinear mapping ability, strong generalization ability, and simple iterative process. Liu et al. [23] forecast photovoltaic power generation by using the ELM model and employed an intelligent optimizer to enhance the regression ability of the ELM model. Li et al. [24] evaluated the aging of insulated gate bipolar transistor modules by using the ELM model. Baliarsingh and Vipsita [25] used the ELM model to classify cancers and achieved high classification accuracy. Because of the shallow structure of the ELM, even if many hidden nodes exist, the effect of using the ELM for feature learning is weak. To address this problem, Tang et al. [26] proposed a layered learning framework based on the ELM. Test experiments revealed that this method had high learning efficiency. Suresh et al. [27] used the ELM to measure the visual quality of JPEG encoded images. To improve the generalization performance of the ELM algorithm, the real-coded genetic algorithm and k-fold selection scheme were used to select the optimal input weights and deviation values. Wan et al. [28] used the ELM model to directly determine the optimal prediction interval for wind power generation. Experimental results demonstrated that the method had high efficiency and reliability. This method provided a novel general framework for probabilistic wind power forecasting, with high reliability and flexibility. Zhang et al. [29] used the ELM model to address the multiclass classification problem in the field of cancer diagnosis. The ELM model had high classification accuracy with reduced training time and implementation complexity.

Using data provided by the China Internet Network Information Center, this study established a forecast model of e-commerce transaction volume. First, to address the MFO algorithm’s difficulty in escaping from local extrema, the hybrid MFO (HMFO) algorithm was proposed. Second, the factors that affect e-commerce transaction volume were analyzed, and the inputs of the e-commerce transaction volume forecast model were determined using the correlation coefficient. Finally, the ELM-HMFO model was used to predict the e-commerce transaction value of China. The main findings and contributions of this study are as follows:

  (1) The HMFO algorithm was proposed and applied to e-commerce.

  (2) The ELM-HMFO model was developed to predict the e-commerce transaction volume in China and achieved excellent results.

  (3) Through e-commerce transaction volume forecasting, the transaction trend of the market was determined, and decision information for enterprises was provided.

The paper is organized as follows. Section 2 introduces the modeling process of the e-commerce transaction volume prediction model. Section 3 analyzes the influencing factors and forecast results. The conclusions of this paper are presented in Section 4.

2 Model for forecasting e-commerce transaction volume

2.1 MFO

MFO, which is an intelligent heuristic algorithm, simulates the flight behavior of moth species [30]. The algorithm imitates the flight pattern of the moth at night. Moths mainly rely on moonlight to adjust their position, ensuring that the angle between their flight direction and the moonlight remains unchanged during the flight. Because the distance between the moon and moth is large, the moth flies in a relatively straight line. However, when the moth approaches artificial light sources, the flight path of the moth is disturbed from a straight line to a spiral path [31,32,33].

In the MFO algorithm, the candidate solutions are the moths, and the position of a moth represents the decision variables. By changing its position, a moth can fly in a multidimensional space. Let the position matrix of the moths be POS [21]:

$$ {P}_{OS}=\begin{bmatrix} {pos}_{1,1} & {pos}_{1,2} & \cdots & {pos}_{1,D}\\ {pos}_{2,1} & {pos}_{2,2} & \cdots & {pos}_{2,D}\\ \vdots & \vdots & \ddots & \vdots \\ {pos}_{m,1} & {pos}_{m,2} & \cdots & {pos}_{m,D} \end{bmatrix} $$
(1)

where D is the variable dimension and m is the number of moths.

By inputting the position variables of the moths into the fitness function, the objective value corresponding to each moth is obtained. The objective values are stored in the array POSfit as follows:

$$ {P}_{OS} fit=\begin{bmatrix} {P}_{OS}{fit}_1\\ {P}_{OS}{fit}_2\\ \vdots \\ {P}_{OS}{fit}_m \end{bmatrix} $$
(2)

The moth requires the position information of the flame to update its flight path. The number of variables in the position matrix of the flames is the same as that of the moths. The position matrix of the flames is represented by Fir as follows:

$$ Fir=\begin{bmatrix} {fir}_{1,1} & {fir}_{1,2} & \cdots & {fir}_{1,D}\\ {fir}_{2,1} & {fir}_{2,2} & \cdots & {fir}_{2,D}\\ \vdots & \vdots & \ddots & \vdots \\ {fir}_{m,1} & {fir}_{m,2} & \cdots & {fir}_{m,D} \end{bmatrix} $$
(3)

Similarly, the position variables of the flames are input to the fitness function to obtain the fitness value corresponding to each flame. The fitness values of the flames are expressed by Firfit as follows:

$$ Firfit=\begin{bmatrix} {Firfit}_1\\ {Firfit}_2\\ \vdots \\ {Firfit}_m \end{bmatrix} $$
(4)

During the iterative process of the MFO algorithm, both the flame and moth positions are solutions to the optimization problem, but the positions of the moths and flames are updated differently. Moths are the search individuals that move through the solution space, whereas flames are the best positions found so far by the moths. Each moth searches around its corresponding flame, and whenever a better position is found, it becomes the flame position in the next iteration. The positions of flames and moths are thus constantly updated under this search mechanism. The optimization process of the MFO algorithm consists of the following three parts:

$$ MFO=\left( Init, FuncP, DecT\right) $$
(5)

Init is the initialization function. The initial population position matrix and the corresponding fitness matrix of the MFO algorithm are obtained using Init.

$$ Init:\varphi \to \left\{{P}_{OS},{P}_{OS} fit\right\} $$
(6)

FuncP represents the moth position update function. FuncP receives the moth position matrix POS and returns the updated moth position matrix POS.

$$ FuncP:{P}_{OS}\to {P}_{OS} $$
(7)

DecT is the termination decision function. When the algorithm satisfies the termination condition, function DecT returns T; when the algorithm does not satisfy the termination condition, function DecT returns F.

$$ DecT:{P}_{OS}\to \left\{T,F\right\} $$
(8)

The MFO algorithm uses the function Init to generate the initial positions and corresponding fitness values by sampling each position uniformly at random within the variable bounds,

where U = [U1, U2, …, UD] and L = [L1, L2, …, LD] are the upper and lower bounds of the position variables, respectively.
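As an illustration of Init, the following Python (NumPy) sketch samples the position matrix uniformly within the bounds and evaluates a user-supplied fitness function. The function name `init` and the sphere-function example are illustrative, not from the original:

```python
import numpy as np

def init(fitness, m, D, L, U, rng=None):
    """Initialize m moth positions uniformly in [L, U] and evaluate them.

    fitness : callable mapping a length-D position vector to a scalar
    L, U    : length-D arrays of lower/upper bounds
    Returns the position matrix P_OS (m x D) and fitness array P_OS_fit (m,).
    """
    rng = np.random.default_rng(rng)
    pos = L + (U - L) * rng.random((m, D))          # uniform sample within bounds
    pos_fit = np.apply_along_axis(fitness, 1, pos)  # one fitness value per moth
    return pos, pos_fit

# Example: 5 moths in 2-D on the sphere function
pos, fit = init(lambda x: np.sum(x**2), m=5, D=2,
                L=np.array([-10.0, -10.0]), U=np.array([10.0, 10.0]), rng=0)
```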

Considering the influence of flames, the position of a moth during flight is updated as follows:

$$ {P}_{OS_i}=S\left({P}_{OS_i}, {Fir}_j\right) $$
(9)

where \( {P}_{OS_i} \) is the position of moth i, S is the spiral function, and \( {Fir}_j \) is flame j.

The spiral function is as follows:

$$ S\left({P}_{OS_i},{Fir}_j\right)={Dis}_i\cdot {e}^{al}\cdot \cos \left(2\pi l\right)+{Fir}_j $$
(10)

where l ∈ [−1, 1] is the spiral path parameter, a is a constant that defines the spiral shape, and \( {Dis}_i=\left|{Fir}_j-{P}_{OS_i}\right| \) is the distance between flame j and moth i.

The position of the moth relative to the flame is determined by the parameter l. The moth is as far from the flame as possible when l = 1 and as close to the flame as possible when l = −1. After each flame position is updated, the flame fitness values are sorted and the moth’s position is updated using the sorted flames. Therefore, the position of the first moth in the population is always updated according to the best flame position, and the positions of the subsequent moths are updated according to the corresponding flame positions. This mechanism prevents moths from being attracted by the same flame, thereby expanding the search range of the moths.
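The spiral update of Eqs. 9 and 10 can be sketched as follows. The helper name, the default spiral constant a = 1, and the option to fix l (normally sampled randomly from [−1, 1]) are illustrative choices:

```python
import numpy as np

def spiral_update(pos_i, fir_j, a=1.0, l=None, rng=None):
    """Spiral flight of Eq. 10: move moth i around flame j.

    pos_i, fir_j : length-D position vectors
    a            : spiral-shape constant
    l            : spiral parameter in [-1, 1]; sampled randomly when not given
    """
    if l is None:
        l = np.random.default_rng(rng).uniform(-1.0, 1.0)
    dis = np.abs(np.asarray(fir_j) - np.asarray(pos_i))   # distance Dis_i
    return dis * np.exp(a * l) * np.cos(2.0 * np.pi * l) + fir_j
```

With l = −1 the exponential factor is smallest, pulling the moth close to the flame; with l = 1 the moth swings farthest away, matching the description above.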

If a fixed number of m flames is retained throughout the search, the local search capability of the MFO algorithm may be poor. The MFO algorithm therefore uses an adaptive mechanism to adjust the number of flames: during the iterations, the number of flames is decreased gradually, while ensuring that every moth can still be assigned a flame. At the end of the iterations, the moths’ positions are updated according to the best flame position only [34, 35]. The specific adjustment method is as follows:

$$ {Fir}_{num}= round\left({Init}_{num}-t\cdot \frac{Init_{num}-1}{T}\right) $$
(11)

where t is the current number of iterations, round is the rounding function, Initnum is the number of first-generation flames, and T is the total number of iterations.

Based on the aforementioned principles, the operation process of the MFO algorithm during the iterations is as follows:

  (1) Initialize the positions of the moths and calculate the corresponding objective values.

  (2) Take the optimal moth positions as the initial flame positions.

  (3) Update the moth and flame positions.

  (4) Update each moth position according to its corresponding flame position.

  (5) Change the number of flames on the basis of the number of iterations.

  (6) Determine whether to terminate the iteration.
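The six steps above can be combined into a compact MFO loop. The NumPy sketch below is a minimal illustration, not the authors' implementation: the flame set is the m best positions found so far, the flame count shrinks per Eq. 11, and surplus moths share the worst retained flame:

```python
import numpy as np

def mfo(fitness, m=30, D=2, lb=-10.0, ub=10.0, T=200, a=1.0, seed=0):
    """Minimal MFO sketch; returns the best flame position and its fitness."""
    rng = np.random.default_rng(seed)
    pos = lb + (ub - lb) * rng.random((m, D))              # (1) initialize moths
    fir = fir_fit = None
    for t in range(1, T + 1):
        fit = np.array([fitness(x) for x in pos])
        if fir is None:                                    # (2) initial flames
            order = np.argsort(fit)
            fir, fir_fit = pos[order].copy(), fit[order]
        else:                                              # (3) merge, keep m best
            allp = np.vstack([fir, pos])
            allf = np.concatenate([fir_fit, fit])
            order = np.argsort(allf)[:m]
            fir, fir_fit = allp[order].copy(), allf[order]
        fir_num = round(m - t * (m - 1) / T)               # (5) Eq. 11
        for i in range(m):                                 # (4) spiral flight
            j = min(i, fir_num - 1)                        # surplus moths share last flame
            l = rng.uniform(-1.0, 1.0)
            dis = np.abs(fir[j] - pos[i])
            pos[i] = dis * np.exp(a * l) * np.cos(2 * np.pi * l) + fir[j]
        pos = np.clip(pos, lb, ub)
    return fir[0], fir_fit[0]                              # (6) best solution found

best_x, best_f = mfo(lambda x: np.sum(x**2))
```

Because the flame set keeps the best positions seen so far, the best flame's fitness is non-increasing over the iterations.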

2.2 MFO algorithm based on the hybrid strategy (HMFO)

A limitation of the traditional MFO algorithm is its poor ability to escape from a local solution. In later iterations, as the numbers of moths and flames decrease, the local development capability of the MFO algorithm becomes limited because the moths approach the remaining flames along spiral flight paths. The spiral flight enlarges the value space around the flame and also increases the flight time of the moth. To overcome these limitations of the MFO algorithm, a hybrid strategy is used.

First, the Levy flight strategy is used to improve the moth update strategy. The convergence ability of the MFO algorithm is limited because the numbers of flames and moths decrease in later iterations. The short-distance walk characteristic of the Levy flight strategy strengthens the local convergence ability of the MFO algorithm, whereas the variable direction of its long-distance jumps enhances the algorithm’s ability to escape local solutions and locate the best solution during the iterative process. The random walk trajectory of the Levy flight strategy is depicted in Fig. 1.

Fig. 1

Levy strategy path

The random search path of Levy flight is depicted as follows:

$$ Levy\sim \frac{\omega }{{\left|\rho \right|}^{\frac{1}{\alpha }}}\left(1<\alpha <3\right) $$
(12)

where α = 1.5; ω and ρ follow normal distributions:

$$ \left\{\begin{array}{c}\omega \sim N\left(0,{\sigma}_{\omega}^2\right)\\ {}\rho \sim N\left(0,{\sigma}_{\rho}^2\right)\end{array}\right. $$
(13)

The normal distribution values are as follows:

$$ \left\{\begin{array}{l}{\sigma}_{\omega }={\left[\frac{\Gamma \left(1+\alpha \right)\cdot \sin \left(\frac{\pi \alpha}{2}\right)}{\Gamma \left(\frac{1+\alpha }{2}\right)\cdot \alpha \cdot {2}^{\frac{\alpha -1}{2}}}\right]}^{\frac{1}{\alpha }}\\ {}{\sigma}_{\rho }=1\end{array}\right. $$
(14)

The moth’s position in flight is updated according to the Levy strategy as follows:

$$ {P}_{OS_i}\left(t+1\right)={P}_{OS_i}(t)+\zeta \oplus Levy\left(\alpha \right) $$
(15)

where ζ is the random step size and ⊕ denotes element-wise multiplication.
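Equations 12–14 correspond to the Mantegna sampling scheme for Levy-distributed steps, and Eq. 15 applies the step to a moth position. A possible NumPy sketch, with illustrative defaults (α = 1.5, step scale ζ = 0.01):

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(D, alpha=1.5, rng=None):
    """Draw a D-dimensional Levy-flight step via Mantegna sampling (Eqs. 12-14)."""
    rng = np.random.default_rng(rng)
    sigma_w = (gamma(1 + alpha) * sin(pi * alpha / 2)
               / (gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    w = rng.normal(0.0, sigma_w, D)        # omega ~ N(0, sigma_w^2)
    rho = rng.normal(0.0, 1.0, D)          # rho   ~ N(0, 1)
    return w / np.abs(rho) ** (1 / alpha)  # Eq. 12

def levy_update(pos_i, zeta=0.01, alpha=1.5, rng=None):
    """Perturb a moth position with a scaled Levy step (Eq. 15)."""
    return pos_i + zeta * levy_step(np.size(pos_i), alpha, rng)
```

Most steps are short (local refinement), while the heavy tail of the Levy distribution occasionally produces a long jump that helps the moth leave a local solution.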

In the MFO algorithm, moths search through spiral flight. To improve the global searching ability of moths in early flight and the local searching ability in subsequent iterations, the sine coefficient is introduced. The optimization efficiency of the moth is improved by the sine coefficient as follows:

$$ SC={SC}_{min}+\left({SC}_{max}-{SC}_{min}\right)\cdot \sin \left(\frac{t}{T}\pi \right) $$
(16)

where SCmin and SCmax are the minimum and maximum values of the sine coefficient, respectively. The curve of the sine coefficient is depicted in Fig. 2.

Fig. 2

Sine coefficient

The curve depicted in Fig. 2 varies according to the sine law, so the MFO algorithm starts with local search at the beginning of the iterations, shifts to global search as the coefficient increases, and returns to local search in the final stage. This coefficient strengthens the convergence capability of the MFO algorithm.
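A minimal sketch of Eq. 16; the SCmin and SCmax defaults are illustrative choices, not values from the paper:

```python
import numpy as np

def sine_coefficient(t, T, sc_min=0.2, sc_max=1.0):
    """Sine coefficient of Eq. 16: equal to sc_min at the start and end of the
    run and sc_max at the midpoint, shifting the search from local to global
    and back again over the T iterations."""
    return sc_min + (sc_max - sc_min) * np.sin(np.pi * t / T)
```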

2.3 Algorithm performance comparison

Standard functions were used to test the feasibility of the HMFO algorithm. These standard functions consisted of three unimodal functions and three multimodal functions. The MFO, PSO, and ant lion optimizer (ALO) algorithms were used for comparison. The ALO algorithm imitates the hunting behavior of the ant lion [36]. The expressions, variable value ranges, and optimal solutions of the unimodal and multimodal functions are listed in Tables 1 and 2, respectively [37, 38].

Table 1 Unimodal functions
Table 2 Multimodal functions

The HMFO, MFO, ALO, and PSO algorithms were each tested 15 times on the six test functions. The maximum number of iterations was 500, and the population size of each algorithm was 30. The same test platform was used for all simulations: an i5 processor, 16 GB of memory, a 512 GB hard disk, and MATLAB 2016a simulation software. Table 3 presents the test results for the algorithms.

Table 3 Test results

For the unimodal test functions, the average test result of the HMFO algorithm was the best. The convergence results of the MFO and ALO algorithms were similar. The test result of the PSO algorithm was poor. Similarly, for the convergence interval obtained using 15 repeated tests, the convergence interval of the HMFO algorithm was the smallest. The PSO algorithm exhibited the largest convergence interval. The difference between the convergence intervals of the MFO and ALO algorithms was small. Therefore, for the unimodal test function, the test results for the HMFO algorithm were superior; the convergence effects of the MFO and ALO algorithms were similar but inferior to that of the HMFO algorithm. The PSO algorithm exhibited poor performance.

For the multimodal functions, the HMFO algorithm exhibited strong convergence ability. For the S4 and S5 functions, the HMFO algorithm converged to 0. For the S6 function, although HMFO did not converge to 0, its convergence effect was more accurate than those of the other three algorithms. For the S4 and S5 functions, the convergence results of the MFO, ALO, and PSO algorithms were similar. For the S6 function, the average convergence accuracy of the MFO and ALO algorithms was higher than that of the PSO algorithm. For the multimodal test functions, the convergence result of the HMFO algorithm was more accurate than those of the other three algorithms. The convergence effects of the PSO, ALO, and MFO algorithms were similar.

From the convergence data presented in Table 3, the HMFO algorithm was found to have strong convergence ability for both unimodal and multimodal functions. The MFO, ALO, and PSO algorithms exhibited poor ability to avoid the local solution. For the multimodal functions, the test results of the three algorithms were imprecise. Through the sine coefficient and Levy flight, the HMFO algorithm exhibited strong local and global convergence capabilities. Especially for multimodal functions, the HMFO algorithm exhibited strong convergence ability.

The ALO algorithm required the longest execution time. The HMFO and MFO algorithms required the least execution time. The execution times of the HMFO and MFO algorithms were nearly identical, indicating that use of the HMFO algorithm does not considerably increase the calculation cost.

2.4 ELM

ELM is an improved feed-forward neural network. As a novel single-hidden-layer neural network, the ELM model is applied in various fields, such as data mining, fault classification, and life prediction. The most crucial feature of the ELM model is that the model’s weights and thresholds are initialized randomly [39,40,41]. The weights and thresholds need not be updated during the learning process, which speeds up calculation. The ELM model is simple, has strong generalizability, and learns quickly. Thus, the ELM model solves the complex structure and adjustable parameter problems of the traditional model. The topology of the ELM model is depicted in Fig. 3.

Fig. 3

Topology of the ELM model

The ELM model consists of a three-layer network (input, hidden, and output layers), as depicted in Fig. 3. The input and hidden layers are connected by neurons, as are the hidden and output layers. The numbers of neurons in the hidden, output, and input layers are herein denoted m, k, and n, respectively [42, 43]. The input matrix is A, and the output matrix is B. The total number of samples is N.

$$ A={\begin{bmatrix} {a}_{1,1} & {a}_{1,2} & \cdots & {a}_{1,N}\\ {a}_{2,1} & {a}_{2,2} & \cdots & {a}_{2,N}\\ \vdots & \vdots & \ddots & \vdots \\ {a}_{n,1} & {a}_{n,2} & \cdots & {a}_{n,N} \end{bmatrix}}_{n\times N}\quad B={\begin{bmatrix} {b}_{1,1} & {b}_{1,2} & \cdots & {b}_{1,N}\\ {b}_{2,1} & {b}_{2,2} & \cdots & {b}_{2,N}\\ \vdots & \vdots & \ddots & \vdots \\ {b}_{k,1} & {b}_{k,2} & \cdots & {b}_{k,N} \end{bmatrix}}_{k\times N} $$
(17)

Suppose C is the weight between the input and hidden layers and D is the weight between the output and hidden layers.

$$ C={\begin{bmatrix} {c}_{1,1} & {c}_{1,2} & \cdots & {c}_{1,n}\\ {c}_{2,1} & {c}_{2,2} & \cdots & {c}_{2,n}\\ \vdots & \vdots & \ddots & \vdots \\ {c}_{m,1} & {c}_{m,2} & \cdots & {c}_{m,n} \end{bmatrix}}_{m\times n}\quad D={\begin{bmatrix} {d}_{1,1} & {d}_{1,2} & \cdots & {d}_{1,k}\\ {d}_{2,1} & {d}_{2,2} & \cdots & {d}_{2,k}\\ \vdots & \vdots & \ddots & \vdots \\ {d}_{m,1} & {d}_{m,2} & \cdots & {d}_{m,k} \end{bmatrix}}_{m\times k} $$
(18)

The hidden layer threshold of the ELM model is as follows:

$$ \beta ={\left[{\beta}_1,{\beta}_2,\dots, {\beta}_m\right]}^T $$
(19)

When the matrix A is input to the ELM model, the prediction output matrix Q is obtained:

$$ Q={\left[{q}_1,{q}_2,\dots ,{q}_N\right]}_{k\times N},\quad {q}_j={\begin{bmatrix} {q}_{1j}\\ {q}_{2j}\\ \vdots \\ {q}_{kj} \end{bmatrix}}_{k\times 1}={\begin{bmatrix} \sum \limits_{i=1}^m{d}_{i1}E\left({c}_i{a}_j+{\beta}_i\right)\\ \sum \limits_{i=1}^m{d}_{i2}E\left({c}_i{a}_j+{\beta}_i\right)\\ \vdots \\ \sum \limits_{i=1}^m{d}_{ik}E\left({c}_i{a}_j+{\beta}_i\right) \end{bmatrix}}_{k\times 1} $$
(20)

where E(·) is the activation function, which is infinitely differentiable.

Equation 20 can be represented as follows:

$$ FD={Q}^T,\quad F={\begin{bmatrix} E\left({c}_1{a}_1+{\beta}_1\right) & E\left({c}_2{a}_1+{\beta}_2\right) & \cdots & E\left({c}_m{a}_1+{\beta}_m\right)\\ E\left({c}_1{a}_2+{\beta}_1\right) & E\left({c}_2{a}_2+{\beta}_2\right) & \cdots & E\left({c}_m{a}_2+{\beta}_m\right)\\ \vdots & \vdots & \ddots & \vdots \\ E\left({c}_1{a}_N+{\beta}_1\right) & E\left({c}_2{a}_N+{\beta}_2\right) & \cdots & E\left({c}_m{a}_N+{\beta}_m\right) \end{bmatrix}}_{N\times m} $$
(21)

where F is the output matrix of the hidden layer and QT is the transpose of Q.

In the ELM model, C and β are determined randomly. The connection weight matrix D is obtained as the least squares solution of

$$ \underset{D}{\min}\left\Vert FD-{Q}^T\right\Vert $$
(22)

The least square solution is as follows:

$$ D={F}^{+}{Q}^T $$
(23)

where F+ is the Moore-Penrose generalized inverse of F.
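Equations 17–23 translate into a few lines of NumPy. The sketch below uses a sigmoid activation for E and the Moore-Penrose pseudoinverse for Eq. 23; the hidden-layer size, weight ranges, and the sine-fitting example are illustrative choices, not values from the paper:

```python
import numpy as np

def elm_train(A, B, m=20, seed=0):
    """Train an ELM (Eqs. 17-23): A is the n x N input matrix, B the k x N
    target matrix, and m the number of hidden nodes. C and beta are random;
    only D is solved, via the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    C = rng.uniform(-2.0, 2.0, (m, n))           # random input weights (illustrative range)
    beta = rng.uniform(-1.0, 1.0, (m, 1))        # random hidden thresholds
    F = 1.0 / (1.0 + np.exp(-(C @ A + beta)))    # hidden outputs, sigmoid activation E
    D = np.linalg.pinv(F.T) @ B.T                # D = F^+ Q^T  (m x k), Eq. 23
    return C, beta, D

def elm_predict(A, C, beta, D):
    """Forward pass of Eq. 20, returning the k x N prediction matrix."""
    F = 1.0 / (1.0 + np.exp(-(C @ A + beta)))
    return (F.T @ D).T
```

Because only D is solved, training reduces to one pseudoinverse computation, which is why the ELM learns quickly despite its random hidden layer.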

2.5 E-commerce transaction volume forecast using the ELM-HMFO model

The deviation (bias) values and input weights of the ELM model are determined randomly. Although this reduces the number of parameters that must be set during iteration, the random parameters increase the number of hidden neurons required, which reduces the resource utilization of the model. Random parameters can also reduce the prediction stability of the model and cause a large regression error. To improve the forecast stability and accuracy of the ELM model, the HMFO algorithm was used to solve the random-parameter problem.

The convergence analysis in Section 2.3 revealed that the HMFO algorithm had strong convergence ability and exhibited an excellent convergence effect for multimodal functions. By combining the HMFO algorithm with the ELM model, the ELM-HMFO model was established to predict e-commerce transaction volume. The prediction process of the ELM-HMFO model was as follows:

  (1) Divide the sample and determine the input and output of the model.

  (2) Initialize the population of the HMFO algorithm and the parameters of the ELM model.

  (3) Normalize the sample data.

  (4) Use the training set to train the e-commerce transaction volume prediction model.

  (5) Search for the optimal parameters using the HMFO algorithm.

  (6) Input the optimal parameters to the ELM model.

  (7) Use the ELM-HMFO model to predict the e-commerce transaction volume.

  (8) Denormalize the forecast results of e-commerce transaction volume and analyze the forecast effect.
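The eight steps can be outlined in code. In the illustrative sketch below, the data are synthetic placeholders rather than the CNNIC series, and a plain random search stands in for the HMFO algorithm; only the structure of the pipeline follows the steps above:

```python
import numpy as np

def elm_rmse_and_forecast(params, A_tr, B_tr, A_te, m):
    """Build an ELM from a flat parameter vector (C entries, then beta),
    solve D by least squares, and return training RMSE plus test output."""
    n = A_tr.shape[0]
    C = params[:m * n].reshape(m, n)
    beta = params[m * n:].reshape(m, 1)
    F_tr = 1.0 / (1.0 + np.exp(-(C @ A_tr + beta)))
    D = np.linalg.pinv(F_tr.T) @ B_tr.T
    rmse = np.sqrt(np.mean((F_tr.T @ D - B_tr.T) ** 2))
    F_te = 1.0 / (1.0 + np.exp(-(C @ A_te + beta)))
    return rmse, (F_te.T @ D).T

# (1)-(3): synthetic stand-in data, already scaled to [0, 1]
rng = np.random.default_rng(0)
A = rng.random((4, 16))                          # 4 input factors x 16 years
B = (0.5 * A.sum(axis=0, keepdims=True)) ** 2    # placeholder "transaction volume"
A_tr, B_tr, A_te = A[:, :10], B[:, :10], A[:, 10:]

# (4)-(6): parameter search; plain random search stands in for HMFO here
m, n = 8, 4
best_rmse, best_params = np.inf, None
for _ in range(200):
    params = rng.uniform(-1.0, 1.0, m * n + m)
    rmse, _ = elm_rmse_and_forecast(params, A_tr, B_tr, A_te, m)
    if rmse < best_rmse:
        best_rmse, best_params = rmse, params

# (7)-(8): forecast the held-out years with the best parameters found
_, forecast = elm_rmse_and_forecast(best_params, A_tr, B_tr, A_te, m)
```

In the full method, the random-search loop is replaced by the HMFO population update, with the training RMSE serving as the fitness function for each candidate parameter vector.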

The flowchart of the e-commerce transaction volume prediction model is depicted in Fig. 4.

Fig. 4

E-commerce transaction volume forecast flow

3 Data analysis and simulation experiment

3.1 Analysis of e-commerce transaction volume

The e-commerce transaction system is a complex and dynamic system. To predict e-commerce transaction volume, the dynamics, nonlinearity, and volatility of e-commerce data should be considered. The data used in this study were obtained from the China Internet Network Information Center. We present statistics on the e-commerce data from 2004 to 2019. The e-commerce transaction volume is depicted in Fig. 5.

Fig. 5

E-commerce turnover

Figure 5 reveals that the e-commerce transaction volume increased every year, but the growth was nonlinear. Before 2011, the growth rate of the e-commerce transaction volume of China was slow, with trading volume less than RMB 10 trillion; after 2012, the e-commerce transaction volume grew rapidly and exceeded RMB 34 trillion by the end of 2019. Factors such as the number of CN domain names, number of Internet users, number of websites, Internet penetration, export bandwidth, and mobile phone penetration affect the e-commerce transaction volume of China. This study mainly analyzed the following factors influencing the e-commerce transaction volume:

  (1) The number of Internet users directly affects Internet demand and e-commerce transaction volume.

  (2) Websites are the platform for e-commerce and directly affect the scale of e-commerce transaction volume.

  (3) CN domain names are China’s top-level domain names and are critical for promoting use of the Internet. The greater the number of CN domain names, the more Internet information service companies there are. The number of CN domain names thus affects the volume of e-commerce transactions to a certain extent.

  (4) The Internet penetration rate is closely related to the number of Internet users. A higher Internet penetration rate indicates a higher degree of integration of the Internet information platform and is also a factor affecting e-commerce transaction volume.

  (5) Export bandwidth affects the information communication between countries as well as users’ Internet access quality and experience. High export bandwidth promotes the development of e-commerce platforms, which affects e-commerce transactions.

The correlation coefficient C was used to reflect the degree of correlation between the five factors and e-commerce transaction volume. The correlation coefficient C was calculated as follows:

$$ C=\frac{m\sum op-\sum o\sum p}{\sqrt{m\sum {o}^2-{\left(\sum o\right)}^2}\sqrt{m\sum {p}^2-{\left(\sum p\right)}^2}} $$
(24)

where m is the number of samples and C ∈ [0, 1]. When C is 0, the two variables are unrelated; when C is 1, the two variables are strongly correlated.
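Eq. 24 is the Pearson correlation coefficient written in raw-moment form; an illustrative NumPy helper:

```python
import numpy as np

def correlation(o, p):
    """Correlation coefficient of Eq. 24 between series o and p (m samples each)."""
    o, p = np.asarray(o, float), np.asarray(p, float)
    m = o.size
    num = m * np.sum(o * p) - np.sum(o) * np.sum(p)
    den = (np.sqrt(m * np.sum(o**2) - np.sum(o)**2)
           * np.sqrt(m * np.sum(p**2) - np.sum(p)**2))
    return num / den
```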

The coefficients of correlation between influencing factors and transaction value are presented in Table 4.

Table 4 Correlation analysis

Table 4 indicates that the number of Internet users, number of websites, and export bandwidth were strongly correlated with e-commerce transaction volume; for these factors, the correlation coefficient C was higher than 0.9. The correlation between export bandwidth and e-commerce transaction volume was the strongest, with a correlation coefficient of 0.98. The correlation between e-commerce transaction volume and the number of CN domain names was the weakest, with a correlation coefficient of 0.87.

To reduce the computational load of the e-commerce transaction volume forecast model, the model inputs were selected on the basis of the correlation coefficients. Table 4 reveals that the coefficients of correlation of e-commerce transaction volume with the number of websites, number of Internet users, Internet penetration rate, and export bandwidth were high. Therefore, the number of websites, Internet penetration rate, export bandwidth, and number of Internet users were used as the input variables of the model, and the e-commerce transaction volume was used as its output variable.
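The input-selection step above amounts to keeping the factors whose correlation with transaction volume exceeds a cut-off. A hedged sketch follows; only the 0.98 (export bandwidth) and 0.87 (CN domain names) coefficients are stated in the text, so the remaining values and the 0.9 threshold are illustrative assumptions:

```python
# Correlation coefficients of each factor with transaction volume.
coeffs = {
    "internet_users": 0.95,      # placeholder (text only says > 0.9)
    "websites": 0.94,            # placeholder (text only says > 0.9)
    "cn_domains": 0.87,          # stated in the Table 4 discussion
    "penetration_rate": 0.93,    # placeholder
    "export_bandwidth": 0.98,    # stated in the Table 4 discussion
}
THRESHOLD = 0.9  # assumed cut-off consistent with the selection in the text

# Keep factors that clear the threshold as model inputs.
inputs = [name for name, c in coeffs.items() if c > THRESHOLD]
print(inputs)  # cn_domains is excluded; the other four factors remain
```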

3.2 E-commerce transaction volume forecast

The e-commerce transaction data published on the website of the China Internet Network Information Center were used as the training and test data of the prediction model. The number of Internet users, number of websites, network penetration rate, and export bandwidth from 2004 to 2019 were used as the input samples of the model, and the e-commerce transaction volume over the same period was used as the output samples. The ELM-HMFO model was used to predict the e-commerce transaction volume, with the ELM-MFO and SVM models selected for comparison. The relative error (AE), root mean square error (RMSE), and coefficient of determination (R2) were employed to analyze the prediction results:

$$ AE=\frac{p-a}{a}\times 100\% $$
(25)
$$ RMSE=\sqrt{\frac{1}{s}\sum \limits_{i=1}^s{\left({p}_i-{a}_i\right)}^2} $$
(26)
$$ {R}^2=1-\frac{\sum \limits_{i=1}^s{\left({p}_i-{a}_i\right)}^2}{\sum \limits_{i=1}^s{\left({a}_i-\overline{p}\right)}^2} $$
(27)

where a is the actual transaction volume, p is the predicted transaction volume, \( \overline{p} \) is the mean of the predicted transaction volumes, and s is the number of samples.
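The three metrics in Eqs. (25)–(27) can be sketched as follows, using the variable names defined in the text (a for actual values, p for predictions). Note that Eq. (27) as written uses the mean of the predicted values in the denominator, which the sketch reproduces:

```python
import math

def ae(p, a):
    """Relative error of a single prediction, as a percentage (Eq. 25)."""
    return (p - a) / a * 100.0

def rmse(pred, actual):
    """Root mean square error over s samples (Eq. 26)."""
    s = len(actual)
    return math.sqrt(sum((pi - ai) ** 2 for pi, ai in zip(pred, actual)) / s)

def r_squared(pred, actual):
    """Coefficient of determination (Eq. 27); the denominator centers the
    actual values on the mean of the predictions, as in the text."""
    p_bar = sum(pred) / len(pred)
    ss_res = sum((pi - ai) ** 2 for pi, ai in zip(pred, actual))
    ss_tot = sum((ai - p_bar) ** 2 for ai in actual)
    return 1 - ss_res / ss_tot

print(ae(105.0, 100.0))  # → 5.0 (prediction 5% above the actual value)
```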

First, the number of Internet users, number of websites, network penetration rate, and export bandwidth from 2004 to 2010 were used as the training input samples, and the e-commerce transaction volume over the same period was used as the training output samples. The corresponding data from 2011 to 2019 were used as the test input and output samples. The ELM-HMFO, SVM, and ELM-MFO models were then used to forecast the e-commerce transaction volume from 2011 to 2019. The regression curve of each model is depicted in Fig. 6.
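The year-based split described above can be sketched as follows. The feature and output values here are placeholders standing in for the published CNNIC statistics, which are not reproduced in this section:

```python
years = list(range(2004, 2020))

# features[year] = [internet_users, websites, penetration_rate, export_bandwidth]
features = {y: [0.0, 0.0, 0.0, 0.0] for y in years}  # placeholder values
volume = {y: 0.0 for y in years}                      # placeholder outputs

# Training covers 2004-2010; testing covers 2011-2019.
train_years = [y for y in years if y <= 2010]
test_years = [y for y in years if y >= 2011]

X_train = [features[y] for y in train_years]
y_train = [volume[y] for y in train_years]
X_test = [features[y] for y in test_years]
y_test = [volume[y] for y in test_years]

print(len(X_train), len(X_test))  # → 7 9
```

The second group of simulations uses the same pattern with the boundary moved to 2008/2009.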

Fig. 6
figure 6

Forecast curves of the e-commerce transactions from 2011 to 2019

Figure 6(a) shows the regression results of the SVM, ELM-MFO, and ELM-HMFO models, and Fig. 6(b) depicts the relative error curves. The regression curves reflect the upward trend of the real e-commerce transaction volume from 2011 to 2019. A comparison of the error curves reveals that the AE values of the three models were within [−6%, 6%]. The fluctuation of the AE curve of the ELM-HMFO model was the smallest, indicating that the ELM-HMFO model produced superior predictions. On the basis of the overall prediction curves of the three models, the RMSE and R2 of the models were calculated. Table 5 lists the fitting effects of the SVM, ELM-MFO, and ELM-HMFO models.

Table 5 Analysis of the forecast results of e-commerce transaction volume from 2011 to 2019

The AE interval of the ELM-HMFO model was the smallest, which indicated superior prediction stability. The minimum RMSE of the ELM-HMFO model was 0.58, which was 51.26% smaller than that of the SVM model and 21.62% smaller than that of the ELM-MFO model; thus, the ELM-HMFO model exhibited the smallest prediction error. The R2 of the SVM, ELM-MFO, and ELM-HMFO models was higher than 0.98, indicating a close regression fit. Figure 6(a) reveals that the regression results reflect the fluctuation in the real e-commerce transaction volume. The execution time of the SVM model was the shortest. The execution times of the ELM-MFO and ELM-HMFO models were similar because the MFO and HMFO algorithms, which form part of these models, optimize the random parameters of the ELM model during training; consequently, their execution times were longer than that of the SVM model.

Next, the number of Internet users, number of websites, network penetration rate, and export bandwidth from 2004 to 2008 were used as the training input samples, and the e-commerce transaction volume over the same period was used as the training output samples. The corresponding data from 2009 to 2019 were used as the test input and output samples. The ELM-HMFO, SVM, and ELM-MFO models were used to forecast the e-commerce transaction volume from 2009 to 2019. The regression curves are depicted in Fig. 7.

Fig. 7
figure 7

Forecast curves of the e-commerce transaction volume from 2009 to 2019

The predicted values of the e-commerce transaction volume obtained using the SVM, ELM-MFO, and ELM-HMFO models are illustrated in Fig. 7(a). The relative error curves are shown in Fig. 7(b). The predicted values of the ELM-HMFO and ELM-MFO models in Fig. 7(a) are close to the true values. Comparing the forecast errors in Fig. 7(b), the AE errors of the ELM-HMFO and ELM-MFO models were within 8%. The maximum AE error of the SVM model exceeded 25%. The analysis results of the ELM-HMFO, ELM-MFO, and SVM models for the e-commerce transaction volume from 2009 to 2019 are presented in Table 6.

Table 6 Analysis of the forecast results of e-commerce transactions from 2009 to 2019

The AE interval of the ELM-HMFO model was [−7.92%, 4.48%], which indicated excellent prediction stability. The AE interval of the SVM model was [−4.28%, 29.74%]; its maximum AE of nearly 30% indicated a relatively large prediction error. The minimum RMSE of the ELM-HMFO model was 0.76, which was 31.53% smaller than that of the SVM model and 28.97% smaller than that of the ELM-MFO model. The R2 of the SVM, ELM-MFO, and ELM-HMFO models was higher than 0.99, indicating that the regression results reflected the trend in actual values. The execution times of the three models followed the same pattern as in the first group of simulations: the SVM model was the fastest, and the execution times of the ELM-MFO and ELM-HMFO models were within 10 s.

Figure 8 depicts the regression effect analysis of the e-commerce transaction volume predicted using the SVM, ELM-MFO, and ELM-HMFO models.

Fig. 8
figure 8

Prediction and evaluation

The RMSE and R2 of the three models for the e-commerce transaction volume forecasts from 2011 to 2019 and from 2009 to 2019 are depicted in Fig. 8. The prediction accuracy of the ELM-HMFO model was high, with an RMSE lower than 0.8. The fitting effects of the three models were similar, with R2 higher than 0.98 in all cases. The comprehensive analysis showed that both the fitting effect and the regression error of the ELM-HMFO model were satisfactory.

4 Conclusion

As an emerging industry, e-commerce has driven industrial transformation and promoted the development of the manufacturing, logistics, and service fields. E-commerce has become a driving force behind China’s economic development. E-commerce transaction volume reflects the development level of e-commerce. The government can consider the trend in e-commerce transaction volume when formulating planning policies, and the development trend of e-commerce transaction volume can provide data support for enterprise investment decisions. Therefore, analyzing the development trend of e-commerce transaction volume is necessary. For this reason, this paper proposes an efficient ELM-HMFO model for predicting e-commerce transaction volume, and the model achieved favorable prediction results. The main conclusions and contributions obtained in this study are as follows:

  1. (1)

    A novel HMFO algorithm was proposed. The performance of the MFO algorithm was improved by introducing a sine coefficient and the Levy strategy.

  2. (2)

    For the multipeak test functions S4 and S5, the HMFO algorithm converged to the optimal value 0, which demonstrated strong optimization ability.

  3. (3)

    The ELM-HMFO model was proposed to accurately forecast the e-commerce transaction volume. The RMSE of the ELM-HMFO model was 51.26% smaller than that of the SVM model and 21.62% smaller than that of the ELM-MFO model for the e-commerce transaction volume from 2011 to 2019; the RMSE of ELM-HMFO model was 31.53% smaller than that of the SVM model and 28.97% smaller than that of the ELM-MFO model for the e-commerce transaction volume from 2009 to 2019.

  4. (4)

    The ELM-HMFO model can be used by the government to formulate planning policies and assist the investment decisions of enterprises. The model can play a critical role in evaluating the development trend of e-commerce.

The prediction results of the ELM-HMFO model for e-commerce transaction volume were compared with those of the SVM and ELM-MFO models, showing that the proposed model exhibited higher prediction accuracy. However, the proposed model also has certain limitations. In future research, the MFO algorithm should be enhanced to reduce the execution time and improve the convergence accuracy. Moreover, the structure of the proposed model is relatively simple; we will therefore focus on developing a hybrid model with higher prediction accuracy.