1 Introduction

Machine learning (ML) algorithms have progressed substantially over the past two decades, from a laboratory curiosity to a practical technology in widespread commercial use. Artificial neural networks (ANNs), support vector machines (SVMs), fuzzy logic (FL) and neuro-fuzzy systems are well-known ML techniques that have become preferred methods for solving a broad range of problems in many areas. The research interest in ANNs stems from their attractive properties, such as adaptability, learning capacity and the ability to generalise. As a result of this interest, ANNs have been used extensively to solve a wide set of problems, such as colour recognition Al-Azzeh et al. 2019, classification problems Arunkumar et al. 2020 and many other applications Chang et al. 2019. Indeed, classification using ML methods is one of the most widespread applications in varied fields of study, such as decision-making tasks that arise in human activity. For instance, ANNs have been used effectively to solve a large assortment of real-world classification problems in industry Wang et al. 2012, engineering Pirdashti et al. 2013 and medicine Saritas and Yasar 2019. An SVM is essentially a binary classifier that builds a linear separating hyperplane to categorise data instances. SVMs have been widely used for classification Wang et al. 2012 as well as speech recognition Jain et al. 2020 owing to their appealing features in various applications Khosravi et al. 2018. FL, as one of the popular ML techniques, has gained considerable popularity among researchers because of benefits such as effective performance, ease of implementation and easy interpretation of the outcomes. Its impact has been felt widely across scientific and engineering fields and across a range of industries concerned with data-intensive issues, such as classification of faults in a mechanical differential Asadi Asad Abad et al. 2015 and modelling of industrial systems Sheta et al. 2009.
Neuro-fuzzy computing, which combines FL and ANNs, enables the construction of more intelligent decision-making systems. Such a system integrates the general features of ANNs, such as robustness and adaptive learning, with the strengths of FL, with the aim of compensating for the weaknesses of each technique. The two techniques have been applied together to solve several types of engineering problems Babuška and Verbruggen 2003; Sheta et al. 2013 and have achieved good results in medical imaging Omotosho et al. 2018 and classification problems Rajasekaran and Sri Meena 2012. An outstanding contribution of soft computing and neuro-fuzzy research is the adaptive neuro-fuzzy inference system (ANFIS) Ansari and Gupta 2011, which has been applied to several applications in engineering and industrial fields Khosravi et al. 2018 and to prediction problems Naderpour and Mirrashid 2019.

As industrial process plants have evolved, controlling them has become more complicated and burdensome. In this context, increased global competition in the control and monitoring of industrial processes has driven the development of new approaches for the design and analysis of industrial processes Wang et al. 2015. The aim is to meet the high demand for product quality, process safety and the productivity of all working equipment. Thus, industrial modelling techniques are gaining acceptance and have seen notable development in both practical settings and the academic community Schlei-Peters et al. 2018.

Numerous applications in engineering and scientific fields have been proposed to control and monitor industrial processes based on the inference of mathematical models Sheta et al. 2013. The rotor system is one of the basic components of web conveyance systems and has a significant impact on their dynamics. Many models have been proposed in the literature to control and monitor the operation of rotor systems, which have been thoroughly explored in many research works and play a crucial role in various industrial machines. As one comprehensive and precise approach, Liu and Shao (2018) presented a thorough investigation of dynamic modelling and simulation for predicting the vibration characteristics of rolling element bearings with and without local and distributed flaws. El-Thalji and Jantunen (2015) provided a detailed review of predictive health monitoring methods, together with their capabilities, drawbacks and merits in detecting and identifying localised and distributed flaws in rolling element bearings. However, for some industrial processes, creating such mathematical models is a challenge. Developing mathematical models is essential to automate complex industrial operations and make them easier to simulate. This creates a strong demand for high-quality models of industrial systems to assure productivity and quality of service. There are two common ways to establish a relationship between the specified input and output variables of a nonlinear system.

  • Traditional empirical modelling methods, which depend on building a physical model automatically from raw data recorded from an experimental or industrial system. In some circumstances, traditional modelling methods require a long time to complete a task and may lead to unfavourable results. This happens when precise or systematic knowledge about the system is lacking or when the experimental data are subject to a high degree of uncertainty. In many situations, these models produce imprecise outcomes that do not represent the data well. Sometimes, the developed models are not comprehensible and cannot be readily interpreted Ljung 1987.

  • Model-based methods, which derive a mathematical relationship between the observed data and the true data of the system, with the primary aim of reducing the deviation between the target and the original datasets Braik et al. 2008; Sheta et al. 2009. In practice, it is difficult to build a model of a challenging industrial system that is adequate enough to reproduce the measured data efficiently. Extracting an appropriate model for an industrial system is essential for model-based systems. The best design in industry is often associated with multiple design goals under nonlinear constraints. Diverse objectives often conflict with each other, and occasionally no truly optimal solution exists, so some concessions and approximations are often needed Liu and Shao 2018.

Many mathematical models have been presented in the literature to model plant-wide industrial processes. Among these processes, the real winding process (WP) Nozari et al. 2012 is one of the challenging processes used extensively in research for both academic and practical purposes. It has been widely used for assessing and exploring the efficiency of new mathematical models. To provide a clear and in-depth understanding of modelling industrial processes, we present a model-based method to simulate the WP using ANNs trained with a set of bio-inspired algorithms (BIAs). The winding process Nozari et al. 2012 is an important and suitable benchmark process in many disciplines of system theory, including comparative studies, verification of new algorithms and evaluation of control systems. However, modelling the WP is a challenge because it is a nonlinear process with varying conditions. Thus, extracting an appropriate model for the WP is necessary for model-based control and diagnosis experiments. Here, the particle swarm optimisation (PSO) algorithm Kennedy 2011, the grasshopper optimisation algorithm (GOA) Saremi et al. 2017 and the grey wolf optimiser (GWO) Mirjalili et al. 2014 are used as a new type of learning approach to adjust the weights of the ANN.

The rest of the paper is organised as follows: In Sect. 2, several methods of modelling manufacturing processes are reviewed. Section 3 describes the linear and nonlinear procedures of modelling the dynamic systems. In Sect. 4, we briefly present the evolutionary optimisation algorithms of our interest. Then, in Sect. 5, we briefly describe the winding process case study. After that, Sect. 6 presents a description of data used in our experimentation followed by a discussion of the preliminary operations performed on the data. The proposed system identification procedure is presented in Sect. 7. The evaluation metrics are given in Sect. 8. The simulation and evaluation outcomes are given in Sect. 9, and finally, the paper is concluded in Sect. 10 accompanied with insights for future research directions.

2 Previous works

Today, the modelling, monitoring and identification of industrial processes are developing swiftly. A substantial number of methods have been reported in the literature for modelling many types of manufacturing processes, including the winding process Sheta et al. 2019; Chang et al. 2019; Naderpour and Mirrashid 2019. However, it is not feasible to study and analyse all the research methods used for modelling and monitoring the WP. Besides the studies that have used empirical methods to model winding machines Sievers et al. 1988; Parant et al. 1992, many studies have used data-based methods to develop suitable models for industrial systems Babuška and Verbruggen 2003; Sheta et al. 2013; Khosravi et al. 2018.

Earlier, genetic programming (GP) was adopted to create a nonlinear model for a winding machine Hussian et al. 2000, and a nonparametric ANN model was suggested for modelling an industrial winding process Hussian et al. 2001. A linear model was created for fault diagnosis of a winding machine using an Auto-Regressive with eXternal inputs (ARX) structure Noura et al. 2009. Sheta et al. (2013) used ANFIS with the Takagi–Sugeno technique to create models for the dynamics of a hot rolling industrial process from datasets gathered from the Eregli Iron and Steel factory in Turkey. Nozari et al. (2012) presented a technique for modelling an industrial winding process. Their approach used an incremental tree-based algorithm, referred to as LOLIMOT, to train a recurrent local linear neuro-fuzzy (RLLNF) network. The performance of the RLLNF modelling approach was compared with the least square error (LSE) method, multilayer perceptron (MLP) and radial basis function (RBF) identifiers.

The sharp increase in empirical data across industries provides unrivalled opportunities for data-driven decision making, where data-driven models can use experimental data to control, monitor and improve industry performance. Sadati et al. (2018) presented a modelling method that uses data from synthetic experiments to identify control variables while simultaneously simplifying process parameters. The method was tested on a real case from a tire manufacturing company. Torres et al. (2018) used probabilistic boolean networks (PBNs) to model a manufacturing system and studied the relationship between components of a real industrial machine. The machine was modelled as a PBN by specifying the regulatory nodes. The predictors, selection probabilities, simulation and property verification were used to assess the accuracy of the PBN model. The simulation results were used to create the data needed for inferential statistical tests that determine the degree of correspondence between the forecasts and real machine data.

Currently, many researchers tend to use ANNs or their variants to model industrial processes owing to their generalisation potential and fitting adaptability Chang et al. 2019; Sheta et al. 2019; Wang et al. 2020. However, the fitting ability of an ANN is typically affected by its configuration, particularly the number of hidden neurons, the number of input parameters and the learning algorithm used to fine-tune its weights Crone and Kourentzes 2009. An iterative process of adding hidden neurons and inputs to the ANN model should result in a systematic decrease in the modelling error. The learning algorithm is fundamental for deducing the ANN model parameters that fit the specified data set. This has encouraged the development of many learning algorithms, including gradient-based techniques such as the Levenberg–Marquardt (LM) Ranganathan 2004 and back propagation (BP) Li et al. 2012 algorithms. Many researchers have preferred these classical algorithms because of merits such as efficient implementation, low computational burden and good performance in tuning the weights of the ANN model. In addition, rapid convergence is a key feature of gradient-based techniques, as adequate exploitation of gradient information can significantly increase the convergence speed compared with a technique that does not calculate gradients. However, these algorithms are highly dependent on the initial weights when searching for optimal weights, are prone to falling into local optima with undesirable solutions, may fail to explore multimodal and non-continuous surfaces and sometimes lead to poor performance. Another potential weakness of these classical methods is that they are comparatively intolerant of difficulties such as noisy objective function spaces Zingg et al. 2008.

Many BIAs have been used satisfactorily for modelling different types of industrial systems. Nikabadi and Naderi (2016) proposed a hybrid of simulated annealing (SA) and the genetic algorithm (GA) for scheduling unconnected parallel machines with sequence-based setup times, standby times, variable due dates and some antecedence connections between the jobs. Ayough and Khorshidvand (2019) developed a model for a cellular manufacturing process to reduce the costs associated with a limited number of cells to be built by allocating the workforce using PSO and SA. Mousavi et al. (2015) used GA, SA and a combination of SA and GA to simulate how generators strive in the spot electricity market to increase profits according to the work plans of other generators. In another application of BIAs in industry, the cuckoo search algorithm (CSA) was applied to a vehicle routing problem Santillan et al. 2018. In Dixit et al. (2019), GA and ANN were used in a modelling and optimisation study on dimensional irregularities of square-shaped microgrooves in laser microfabrication of aluminium oxide ceramic. Because assembly forms a large fraction of the cost of any product in a modern production system, the study in Dao et al. (2017) used GA to find a globally optimal solution for modelling a virtual computer-integrated manufacturing system. A combination of the clonal selection algorithm (CSA) and SA, referred to as CSA-SA, was used together with a mathematical model based on quadratic assignment to address a stochastic dynamic facility problem Moslemipour 2018. A notable use of BIAs in industry was made by Yıldız (2008), who presented a hybrid harmony search method to handle many manufacturing and engineering design problems.

Although reasonable performance has been achieved in the modelling of industrial systems using ANNs and BIAs, the modelling of industrial systems that are subject to variable conditions and uncontrolled factors still lacks high precision and is not convincing, whether modelled using ANNs Nozari et al. 2012 or BIAs Nikabadi and Naderi 2016. Therefore, developing reliable and compact models to simulate an industrial process is essential to generate a sensible estimate of the process parameters. In this paper, we turn our attention to BIAs as learning algorithms for ANNs, looking for a suitable source of inspiration for training ANNs that can overcome the handicaps of classical methods in the modelling of an industrial system. Here, three BIAs, namely PSO Kennedy 2011, GOA Saremi et al. 2017 and GWO Mirjalili et al. 2014, are used to train an ANN model of an industrial winding machine Nozari et al. 2012. The aim of these algorithms is to find an optimal set of weights for the ANN model, and thereby an optimal model. These optimisation algorithms have strong exploratory search features and can avoid getting trapped in local minima by exploring the search space, which makes them well suited for adjusting the weights of ANNs and obtaining an optimal model structure for the winding process under study.

2.1 Challenges of the winding system

Winding systems are key elements of a broad range of industrial plants, for instance steel rolling mills Babuška and Verbruggen 2003 and plants involving web conveyance, such as paper-making, coating and polymer film extrusion Braatz et al. 1996. Some researchers have considered limiting the computational effort associated with the analysis, design and active fault-tolerant control in web conveyance systems Sievers et al. 1988, steel industries Parant et al. 1992 and film and sheet processes Braatz et al. 1996. Developing accurate and reliable models to describe complex industrial processes is essential to create a robust approximation of the output parameters. This is a specific requirement for creating useful industrial process models. Winding systems have been widely used by the control community as a data source to compare various approaches and evaluate the adequacy of process control methods Haddad et al. 2017. In summary, a great number of published articles have addressed process modelling, performance control and winding process identification. Nevertheless, these works fall short in accuracy and lack an in-depth study of the many uncontrolled factors that occur in the actual winding process. Because of these uncontrollable factors, methods for modelling and monitoring the winding process have encountered difficulties. The variations in the parameters of this process are a further challenge. Furthermore, the performance achieved in modelling and control of the winding process needs to be considered. We therefore believe that there is still scope for improvement. This paper considers how to select a suitable system architecture, together with an appropriate algorithm for estimating system parameters and appropriate assessment criteria.

The originality and relevance of this work are related to: (1) the development of three intelligent models of a real winding process based on ANNs trained with PSO, GOA and GWO, with the aim of meeting the expected theoretical behaviour of the presented models and (2) the assessment of theoretical aspects of the proposed models, such as error reduction and performance improvement as process parameters are added. Adding more hidden neurons to the proposed models to achieve the desired degree of performance is a sensible process. The primary objective of this work is to obtain a specific ANN structure with optimum values of the weights. In addition, as a second objective, it is anticipated that this will make it practicable to compare the alternative models of the case study.

3 Dynamic system identification

There are two popular procedures for the identification of the dynamic multi-input–single-output (MISO) systems from input/output sequences:

3.1 Linear dynamic system identification

To build a model for a plant-wide process, the simplest solutions should be tried at the outset, following the theory of system identification, particularly when no prior knowledge of its fundamental behaviour is available. To arrive at a linear model, a sequence of the input and output parameters observed with a constant sampling interval is taken into account. Generally, the problem can be stated in terms of a set of input patterns, \(u(t)= \left\{ u_1,u_2,\ldots ,u_p \right\} , u \in \mathbb {R}^p\) and a set of desired patterns, \(y(t)= \left\{ y _1, y _2,\ldots , y _r \right\} \) , \(y \in \mathbb {R}^r\), where t is the sampling time and p and r are the numbers of input and output variables, respectively. It is then possible to find a suitable mathematical model that represents the data based on a relationship between input and output patterns, such that the error between the true data and the estimated data is reduced. In the case of dynamic linear relationships between input and output variables, the relationship can be described by the following model Nozari et al. 2012:

$$\begin{aligned} y_M(k)= \sum _{l=1}^{n} a_l y_M(k-l)+\sum _{i=1}^{p} \sum _{l=1}^{m_i} b_{il}u_i(k-l) \end{aligned}$$
(1)

where \(a_l\) and \(b_{il}\) are scale parameters, \(m_i\) represents the order of the numerator of the \(i{\mathrm{th}}\) input and n represents the denominator order.
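As a minimal sketch of how Eq. 1 can be evaluated, the following Python function simulates the linear MISO model for given coefficients. The function name, the argument layout and the zero initial conditions for samples before the start of the record are assumptions made here for illustration; they are not taken from Nozari et al. 2012.

```python
from typing import List, Sequence


def simulate_linear_miso(
    a: Sequence[float],
    b: Sequence[Sequence[float]],
    u: Sequence[Sequence[float]],
) -> List[float]:
    """Simulate y_M(k) = sum_l a_l*y_M(k-l) + sum_i sum_l b_il*u_i(k-l).

    a    : [a_1, ..., a_n], coefficients on past model outputs
    b    : b[i] = [b_i1, ..., b_i,m_i], coefficients on past values of input i
    u    : u[i][k] is the k-th sample of input i (all inputs of equal length)
    Samples with a negative time index are treated as zero (an assumption).
    """
    n_steps = len(u[0])
    y = [0.0] * n_steps
    for k in range(n_steps):
        acc = 0.0
        # Autoregressive part: past model outputs
        for lag, a_l in enumerate(a, start=1):
            if k - lag >= 0:
                acc += a_l * y[k - lag]
        # Exogenous part: past samples of every input
        for i, b_i in enumerate(b):
            for lag, b_il in enumerate(b_i, start=1):
                if k - lag >= 0:
                    acc += b_il * u[i][k - lag]
        y[k] = acc
    return y
```

For example, with a single input, `a = [0.5]` and `b = [[1.0]]`, a unit impulse at the first sample produces an output that appears one step later and then halves at every subsequent step.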

Equation 1 represents linear MISO discrete-time systems and can simply be converted to the corresponding discrete-time transfer function. The parameters in Eq. 1 can be computed by the LSE technique because the prediction error is linear in the parameters Ljung 1987. Choosing an appropriate model structure is an important step in modelling any system. The purpose of order selection is to choose a model that fits a particular data set. This issue is so important that deficiencies in the selection process may result in poor accuracy in some key procedures of the modelling scheme. The Akaike Information Criterion (AIC) Ljung 1987 is an information criterion that can be used for this purpose. The orders obtained on the basis of linear models can then be tested on nonlinear approximators in the case of nonlinear identification. In this order selection strategy, the order of the linear model in Eq. 1 is increased and the AIC index is computed at each step. Finally, after some iterations, the order corresponding to the lowest AIC value is chosen as the model order:

$$\begin{aligned} J_{AIC}= \log (SSE)+\frac{2P}{Q} \end{aligned}$$
(2)

where Q is the number of samples used to calculate the AIC index, P denotes the number of parameters, 2P/Q is a penalty term and SSE is the sum of squared errors, given in Nozari et al. (2012) as:

$$\begin{aligned} SSE= \sum _{N=1}^{Q} (y_p(N)-y_M(N))^2 \end{aligned}$$
(3)

where \(y_p\) and \(y_M\) are the output parameters of the process and model, respectively.
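The order selection strategy based on Eqs. 2 and 3 can be sketched as follows. The function names and the layout of the candidate list are hypothetical, and the natural logarithm is assumed for the log term in Eq. 2; a perfect fit (SSE equal to zero) is not handled because the logarithm is undefined there.

```python
import math
from typing import List, Sequence, Tuple


def sse(y_p: Sequence[float], y_m: Sequence[float]) -> float:
    """Sum of squared errors between process and model outputs (Eq. 3)."""
    return sum((p - m) ** 2 for p, m in zip(y_p, y_m))


def aic_index(y_p: Sequence[float], y_m: Sequence[float], n_params: int) -> float:
    """AIC index of Eq. 2, with Q taken as the number of samples in y_p."""
    q = len(y_p)
    return math.log(sse(y_p, y_m)) + 2.0 * n_params / q


def select_order(
    y_p: Sequence[float],
    candidates: List[Tuple[int, Sequence[float], int]],
) -> int:
    """Return the model order with the lowest AIC index.

    Each candidate is (order, model_output, number_of_parameters).
    """
    best = min(candidates, key=lambda c: aic_index(y_p, c[1], c[2]))
    return best[0]
```

In use, each candidate order would be fitted first (for instance with the LSE technique), and its simulated output passed to `select_order` together with its parameter count.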

3.2 Nonlinear dynamic system identification

The identification of a nonlinear system is more appropriate when linear methods do not provide satisfactory outcomes in the modelling of physical systems. Further, nonlinear regression techniques offer the adaptability, generality and capability to represent a nonlinear relationship between input and output parameters. The choice of algorithm does not affect the value of the fit statistics of the model. In many regression models, the change in the output, Y, depends on the input, X, in a way that makes the relationship between X and Y nonlinear although the model is linear in its estimated parameters Chang et al. 2010. Such a model is still considered a linear regression model because it is linear in the regression coefficients, \(\beta _1, \beta _2, \ldots ,\beta _i\). One way to represent such a relationship is a polynomial regression model, in which the relationship between the independent variable X and the dependent variable Y is modelled as an \(m{\mathrm{th}}\) degree polynomial in X. Such a model for a single regressor, X, is:

$$\begin{aligned} Y=\beta _0+\beta _1X^1+\beta _2X^2+\cdots +\beta _{m}X^m+\Upsilon \end{aligned}$$
(4)

where m stands for the degree of the polynomial and \(\Upsilon \) represents a vector of model errors.

The computational problems of polynomial regression can be entirely handled using multiple nonlinear regression (MNLR) models. This can be achieved using \(X^1, X^2,\ldots ,X^m\) as distinct independent parameters Chang et al. 2010. The polynomial regression model can be described as:

$$\begin{aligned} y_i=\beta _0+\beta _1x_i+\beta _2x_i^2+\cdots +\beta _{m}x_i^m+\epsilon _i \end{aligned}$$
(5)

where \(i=1,2,\ldots ,n\) and n is the number of observations.

Equation 5 can be expressed by a matrix \(\mathbf{X} \) of input values, a response vector \(\mathbf {y}\) of output values, a coefficient vector \(\mathbf {\beta }\), and a vector \(\mathbf {\Upsilon }\) of model errors. To sum up, the regression model can be represented as shown in Eq. 6:

$$\begin{aligned} \mathbf {y}=\mathbf{X} \mathbf {\beta }+\mathbf {\Upsilon } \end{aligned}$$
(6)
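To illustrate how the matrix form in Eq. 6 leads to coefficient estimates, the sketch below builds the design matrix from the powers of x and solves the normal equations \(\mathbf{X}^T\mathbf{X}\mathbf {\beta }=\mathbf{X}^T\mathbf {y}\) by Gaussian elimination. The use of the normal equations and all function names are assumptions for illustration; the paper does not state which solver is used, and for high polynomial degrees a numerically more stable method would be preferable.

```python
from typing import List, Sequence


def design_matrix(x: Sequence[float], m: int) -> List[List[float]]:
    """Rows [1, x_i, x_i^2, ..., x_i^m], one row per observation."""
    return [[xi ** p for p in range(m + 1)] for xi in x]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit_polynomial(x: Sequence[float], y: Sequence[float], m: int) -> List[float]:
    """Least-squares estimate of [beta_0, ..., beta_m] via the normal equations."""
    X = design_matrix(x, m)
    rows = range(len(X))
    xtx = [[sum(X[k][i] * X[k][j] for k in rows) for j in range(m + 1)]
           for i in range(m + 1)]
    xty = [sum(X[k][i] * y[k] for k in rows) for i in range(m + 1)]
    return solve_linear(xtx, xty)
```

Because the powers of x are treated as separate columns of \(\mathbf{X}\), the fit is linear in the coefficients even though the fitted curve is nonlinear in x.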

The reality that the behaviour of most dynamical systems is nonlinear has made ANNs suitable for the identification tasks. MLP neural networks are generic nonlinear model structures that have proved to be adequate for black-box modelling and control problems. Several studies have recommended that MLP neural networks are a proper selection for defining a nonlinear system and that a variety of ANN architectures and training algorithms have been suggested for control problems Braik et al. 2008. Thus, the process of handling non-linearity as well as the dynamism of a WP was addressed on the basis of ANNs trained with BIAs as described below.
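When an ANN is trained with a BIA, each search agent can be read as a flat vector of network weights, and its fitness as the modelling error of the resulting network. The sketch below shows one way this could look for an MLP. The single hidden layer, the tanh activation, the linear output neuron, the weight layout and the mean squared error fitness are all assumptions made for illustration and are not specified in this part of the paper.

```python
import math
from typing import Sequence


def n_weights(n_inputs: int, n_hidden: int) -> int:
    """Length of the flat weight vector for a one-hidden-layer MLP with biases."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)


def mlp_output(weights: Sequence[float], x: Sequence[float], n_hidden: int) -> float:
    """Forward pass: tanh hidden layer followed by one linear output neuron.

    Layout of `weights` (assumed): for each hidden neuron, its input weights
    followed by its bias; then the output weights followed by the output bias.
    """
    n_in = len(x)
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        z = weights[idx + n_in]  # bias of this hidden neuron
        for j in range(n_in):
            z += weights[idx + j] * x[j]
        hidden.append(math.tanh(z))
        idx += n_in + 1
    out = weights[idx + n_hidden]  # output bias
    for h in range(n_hidden):
        out += weights[idx + h] * hidden[h]
    return out


def mse_fitness(
    weights: Sequence[float],
    inputs: Sequence[Sequence[float]],
    targets: Sequence[float],
    n_hidden: int,
) -> float:
    """Mean squared error of the MLP over a data set; lower is better."""
    errors = [(mlp_output(weights, x, n_hidden) - t) ** 2
              for x, t in zip(inputs, targets)]
    return sum(errors) / len(errors)
```

An optimiser such as PSO, GOA or GWO would then search over vectors of length `n_weights(n_inputs, n_hidden)` to minimise `mse_fitness`.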

4 Evolutionary optimisation algorithms

We have chosen the well-known evolutionary algorithm, PSO, and two recent evolutionary algorithms, GOA and GWO, to train ANNs. These swarm intelligence (SI)-based algorithms are well accepted by the artificial intelligence (AI) community for their optimisation power in solving complex real-world problems. A brief background on each of these algorithms is given in the following subsections.

4.1 Particle swarm optimisation

The PSO was proposed by Kennedy (2011) to optimise continuous nonlinear functions by modelling the social behaviour of swarms of animals like bird flocking and fish schooling. The term “particle” was adopted because particles experience velocities and accelerations. In addition, it also indicates diffuse objects such as clouds. Basically speaking, PSO is mathematically simple and computationally inexpensive. In PSO, particles explore probable solutions of the hyperspace and accelerate towards optimum solutions. The PSO algorithm adjusts velocities and positions of the swarm particles as per the following equations, respectively Kennedy 2011:

$$\begin{aligned} V_{i}^{(k+1)}(j)= & {} \omega \times V_{i}^{(k)}(j) \nonumber \\&+\alpha \times r_{1} \times (pbest_{i}^{(k)}(j)-X_{i}^{(k)}(j))\nonumber \\&+\beta \times r_{2} \times (gbest_{i}^{(k)}(j)-X_{i}^{(k)}(j)) \end{aligned}$$
(7)
$$\begin{aligned} X_{i}^{(k+1)}(j)= & {} X_{i}^{(k)}(j) + V_{i}^{(k+1)}(j) \end{aligned}$$
(8)

where \(i=1, 2, \ldots , P\), is the particle’s index from a population of size P. \(pbest_{i}\) is the \(i{\mathrm{th}}\) particle’s best known position and \(gbest_{i}\) is the best position known to the swarm. \(\omega \) is the inertia weight. \(\alpha \) and \(\beta \) are two positive acceleration constants, called the cognitive and social parameters, respectively, which control the influence of \(pbest_{i}\) and \(gbest_{i}\) on the search process. \(r_{1}\) and \(r_{2}\) are two random numbers uniformly distributed within the range [0, 1]. \( X_{i}^{(k)}(j)\) identifies the \(j{\mathrm{th}}\) element of the current position of the \(i{\mathrm{th}}\) particle in the \(k{\mathrm{th}}\) step. \( X_{i}^{(k+1)}(j)\) identifies the \(j{\mathrm{th}}\) element of the new position of the \(i{\mathrm{th}}\) particle in the \((k+1){\mathrm{th}}\) step. \( V_{i}^{(k)}(j)\) identifies the \(j{\mathrm{th}}\) element of the current velocity vector of the \(i{\mathrm{th}}\) particle in the \(k{\mathrm{th}}\) step and \(V_{i}^{(k+1)}(j)\) stands for the \(j{\mathrm{th}}\) element of the new velocity vector of the \(i{\mathrm{th}}\) particle in the \((k+1){\mathrm{th}}\) step of the PSO model.
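A single PSO iteration following Eqs. 7 and 8 can be sketched as below. The function name and argument layout are hypothetical. Drawing fresh values of \(r_{1}\) and \(r_{2}\) for every element of every particle is an assumption, since the equations do not say how often they are redrawn, and the update of pbest and gbest after the move is left out.

```python
import random
from typing import List, Sequence, Tuple


def pso_step(
    positions: Sequence[Sequence[float]],
    velocities: Sequence[Sequence[float]],
    pbest: Sequence[Sequence[float]],
    gbest: Sequence[float],
    omega: float,
    alpha: float,
    beta: float,
    rng: random.Random,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Apply Eq. 7 (velocity) and Eq. 8 (position) to every particle once."""
    new_positions: List[List[float]] = []
    new_velocities: List[List[float]] = []
    for x_i, v_i, p_i in zip(positions, velocities, pbest):
        v_new: List[float] = []
        x_new: List[float] = []
        for j in range(len(x_i)):
            # rng.random() is uniform on [0, 1)
            r1, r2 = rng.random(), rng.random()
            v = (omega * v_i[j]
                 + alpha * r1 * (p_i[j] - x_i[j])   # cognitive term
                 + beta * r2 * (gbest[j] - x_i[j]))  # social term
            v_new.append(v)
            x_new.append(x_i[j] + v)
        new_velocities.append(v_new)
        new_positions.append(x_new)
    return new_positions, new_velocities
```

A particle that already sits on both its personal best and the global best is moved only by the inertia term, so its velocity simply shrinks by the factor \(\omega \) at each step.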

PSO is well regarded in the AI community because it is robust in finding optimum or near-optimum solutions. Based on an intuitive understanding of the PSO model, part of the PSO research community supports the use of PSO for training the weights of ANNs. This is due to its robust exploration capabilities, its reduced susceptibility to being trapped in local minima and because it does not suffer from early convergence, as is the case with the global best (gbest).

4.2 Grasshopper optimisation algorithm

The GOA, proposed by Saremi et al. (2017), is a gradient-free, nature-inspired stochastic optimisation algorithm that has a strong ability to avoid local optima and treats the optimisation problem as a black box. GOA mimics grasshopper swarms (the attraction and repulsion forces between grasshoppers in the swarm) with a mathematical model represented as Saremi et al. 2017:

$$\begin{aligned} X_{i}=r_{1} \times S_{i}+r_{2} \times G_{i}+r_{3} \times A_{i} , \end{aligned}$$
(9)

where the subscript i is the grasshopper’s index from a population of size N. \(X_{i}\) stands for the position. \(S_{i}\) represents the social interaction, which is a function of distance between grasshoppers. \(G_{i}\) represents the force caused by gravity and is directed towards the Earth’s centre. \(A_{i}\) represents wind advection and follows the direction of the wind, and \(r_{1}\), \(r_{2}\) and \(r_{3}\) are random numbers \(\in [0,1]\). Accordingly, as shown in Saremi et al. (2017), Eq. 9 can be rewritten as:

$$\begin{aligned} X_{i}=\sum _{j=1, j\not =i}^{N}s(|\chi _{j}-\chi _{i}|) \frac{\chi _{j}-\chi _{i}}{d_{ij}}-g \hat{e_{g}}+u \hat{e_{w}} \end{aligned}$$
(10)

where g represents the gravitational constant, \(\hat{e_{g}}\) represents a unit vector towards the centre of the earth, u represents a constant drift, \(\hat{e_{w}}\) represents a unit vector in the direction of the wind, \(d_{ij}\) is the distance between the ith and jth grasshoppers and s is a social forces function defined as \(s(r) = fe^{\frac{-r}{l}} -e^{-r}\), where f is the intensity of attraction and l is the attractive length scale.

The authors then show that Eq. 10 becomes Saremi et al. 2017:

$$\begin{aligned} X_{i}^{d}=c \left( \sum _{j=1, j\not =i}^{N}c \frac{ub_{d}-lb_{d}}{2}s(|\chi _{j}^{d}-\chi _{i}^{d}|)\frac{\chi _{j}-\chi _{i}}{d_{ij}} \right) +\hat{T_{d}} \end{aligned}$$
(11)

where \(ub_{d}\) and \(lb_{d}\) are the upper and lower limits in the dth dimension, \(\hat{T_{d}}\) represents the value of the dth dimension in the best target solution and c is a lessening factor to shrink the comfort area, repulsion area and attraction area.
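The position update of Eq. 11 can be sketched as follows, with all new positions computed from the current ones. The function names are hypothetical, and the values f = 0.5 and l = 1.5 in the social forces function are illustrative values assumed for this sketch. Skipping pairs of coincident grasshoppers (zero distance) and omitting any clipping of the result to the search bounds are further simplifying assumptions.

```python
import math
from typing import List, Sequence


def social_force(r: float, f: float = 0.5, l: float = 1.5) -> float:
    """s(r) = f*exp(-r/l) - exp(-r); negative values mean repulsion."""
    return f * math.exp(-r / l) - math.exp(-r)


def goa_update(
    positions: Sequence[Sequence[float]],
    target: Sequence[float],
    lb: Sequence[float],
    ub: Sequence[float],
    c: float,
    f: float = 0.5,
    l: float = 1.5,
) -> List[List[float]]:
    """Return the new positions of all grasshoppers according to Eq. 11."""
    n = len(positions)
    dim = len(target)
    new_positions: List[List[float]] = []
    for i in range(n):
        x_new: List[float] = []
        for d in range(dim):
            total = 0.0
            for j in range(n):
                if j == i:
                    continue
                d_ij = math.dist(positions[i], positions[j])
                if d_ij == 0.0:
                    continue  # coincident agents contribute nothing (assumption)
                gap = positions[j][d] - positions[i][d]
                total += (c * (ub[d] - lb[d]) / 2.0
                          * social_force(abs(gap), f, l) * gap / d_ij)
            # Outer c scales the swarm term; the target pulls the agent along
            x_new.append(c * total + target[d])
        new_positions.append(x_new)
    return new_positions
```

Every grasshopper contributes to every other grasshopper's update, so one iteration costs on the order of \(N^2\) pairwise evaluations per dimension in this straightforward form.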

GOA differs from PSO in that GOA requires all search agents to contribute to the calculation of the position of every search agent. Moreover, GOA keeps a good balance between exploration and exploitation by adopting the so-called adaptive comfort zone. At the beginning, grasshoppers experience a high repulsion rate, which leads to high exploration and thus helps avoid local optima. Then, as the search proceeds and approaches its final steps, the attraction between the grasshoppers in the swarm dominates and drives exploitation, resulting in better search accuracy. In Saremi et al. (2017), it was shown that GOA performs well in solving challenging real problems with unknown search spaces. Given these appealing exploration and exploitation capabilities, GOA is used here to adjust the weights of ANNs during training to model the industrial case study process.

4.3 Grey wolf optimisation

The GWO is a flexible metaheuristic optimisation method that avoids stagnation in local optima of the search space. This algorithm was proposed by Mirjalili et al. (2014); it imitates the social behaviour of grey wolves in terms of their hierarchical leadership and hunting manoeuvres. Mathematically, the algorithm models the leadership hierarchy of grey wolves by categorising them into four sets according to their rank from top to bottom as \(\alpha \), \(\beta \), \(\delta \) and \(\omega \). In terms of hunting, the algorithm models the encircling of the prey (target) by the hierarchical grey wolf pack as Mirjalili et al. 2014; Masadeh et al. 2018:

$$\begin{aligned} \mathbf {D}= & {} |\mathbf {C}.\mathbf {X_{p}}(t)-\mathbf {X}(t)| \end{aligned}$$
(12)
$$\begin{aligned} \mathbf {X}(t+1)= & {} \mathbf {X}_{p}(t)-\mathbf {A}.\mathbf {D} \end{aligned}$$
(13)

where t defines the current iteration, \(\mathbf {A}\) and \(\mathbf {C}\) define coefficient vectors and \(\mathbf {X}(t)\) and \(\mathbf {X}_{p}(t)\) are the position vectors of the grey wolf and prey, respectively.

The vectors \(\mathbf {A}\) and \(\mathbf {C}\) are computed as follows:

$$\begin{aligned} \mathbf {A}= & {} 2\mathbf {a}.\mathbf {r_{1}}-\mathbf {a} \end{aligned}$$
(14)
$$\begin{aligned} \mathbf {C}= & {} 2.\mathbf {r_{2}} \end{aligned}$$
(15)

where the components of \(\mathbf {a}\) are linearly decreased from 2 to 0 over the course of the iterations and \(\mathbf {r_{1}}\) and \(\mathbf {r_{2}}\) are random vectors within the range from 0 to 1.

Then, the following equations are used for hunting and attacking the prey Mirjalili et al. 2014; Azizivahed et al. 2018:

$$\begin{aligned} \begin{aligned} \mathbf {X}_{1}=&\mathbf {X_{\alpha }}(t)-\mathbf {A_{1}}\cdot \mathbf {D_{\alpha }}\\ \mathbf {X}_{2}=&\mathbf {X_{\beta }}(t)-\mathbf {A_{2}}\cdot \mathbf {D_{\beta }}\\ \mathbf {X}_{3}=&\mathbf {X_{\delta }}(t)-\mathbf {A_{3}}\cdot \mathbf {D_{\delta }} \end{aligned} \end{aligned}$$
(16)

where \(\mathbf {X_{\alpha }}\), \(\mathbf {X_{\beta }}\) and \(\mathbf {X_{\delta }}\) are the best, second best and third best search agents, respectively, and \(\mathbf {D_{\alpha }}\), \(\mathbf {D_{\beta }}\) and \(\mathbf {D_{\delta }}\) are calculated using Eq. (17).

$$\begin{aligned} \begin{aligned} \mathbf {D_{\alpha }}=&\left| \mathbf {C_{1}}\cdot \mathbf {X_{\alpha }}-\mathbf {X}\right| \\ \mathbf {D_{\beta }}=&\left| \mathbf {C_{2}}\cdot \mathbf {X_{\beta }}-\mathbf {X}\right| \\ \mathbf {D_{\delta }}=&\left| \mathbf {C_{3}}\cdot \mathbf {X_{\delta }}-\mathbf {X}\right| \end{aligned} \end{aligned}$$
(17)

As \(\mathbf {X}_\alpha \), \(\mathbf {X}_\beta \) and \(\mathbf {X}_\delta \) identify the three best solutions obtained so far Gupta and Deep 2018; Mirjalili et al. 2014, the other wolves in the population update their positions as shown in Eq. (18):

$$\begin{aligned} \mathbf {X}(t+1)=\frac{\mathbf {X_{1}}+\mathbf {X_{2}}+\mathbf {X_{3}}}{3} \end{aligned}$$
(18)

Fig. 1 Schematic diagram of a winding process with control sensors

GWO has achieved competitive results on several types of complex real problems with unknown search spaces Mirjalili et al. 2014; Masadeh et al. 2018. The statistical analysis presented in Mirjalili et al. (2014) indicates that GWO performs well in terms of exploration, convergence and solution accuracy, and that these properties hold in general and not only for particular problem sets. Therefore, GWO was used here to train the weights of ANNs: it has only a few parameters to set and works well across a wide range of applications, including applications with clearly defined requirements.
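One iteration of Eqs. (12) to (18) can be sketched as below. This is a minimal illustration under stated assumptions and not the implementation used in this study: the coefficients A and C are drawn independently per dimension, no bound handling is applied, lower fitness is taken as better, and the names `gwo_step` and `gwo_minimise` are hypothetical.

```python
import random


def gwo_step(wolves, fitness, a, rng=random):
    """Move every wolf according to Eqs. (14)-(18); lower fitness is better."""
    leaders = sorted(wolves, key=fitness)[:3]      # alpha, beta and delta
    new_wolves = []
    for x in wolves:
        new_x = []
        for d in range(len(x)):
            candidates = []
            for leader in leaders:
                A = 2.0 * a * rng.random() - a     # Eq. (14)
                C = 2.0 * rng.random()             # Eq. (15)
                D = abs(C * leader[d] - x[d])      # Eq. (17)
                candidates.append(leader[d] - A * D)   # Eq. (16)
            new_x.append(sum(candidates) / 3.0)    # Eq. (18)
        new_wolves.append(new_x)
    return new_wolves


def gwo_minimise(fitness, dim, n_wolves=10, n_iter=50, lb=-1.0, ub=1.0, seed=0):
    """Run GWO with a decreasing linearly from 2 towards 0; return the best wolf seen."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter
        wolves = gwo_step(wolves, fitness, a, rng)
        candidate = min(wolves, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best
```

For ANN training, each wolf would be a flat vector of network weights and `fitness` would return the modelling error for that vector.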

5 Winding process

The winding process is a test setup of a real industrial winding plant process of the kind often encountered in web conveyance systems Bastogne et al. 1998. This process is a well-established benchmark problem for analysis and control design in the control community. Figure 1 shows the schematic diagram of the winding process targeted in this work. The process consists of multivariate and correlated systems whose parameters vary during operation. The key task is to control the web transport so as to avoid the effects of sliding and friction. The solution relies on maintaining a traction effort on the strip and monitoring the tension at various points along the web Braatz et al. 1996. The winding machine considered in this study is composed of a plastic strip, three reels, referred to as reels 1, 2 and 3 or as the unwinding reel, traction reel and rewinding reel, respectively, and gear reduction connected with the three reels. The reels are driven by three DC motors denoted as M\(_1\), M\(_2\), and M\(_3\). Motor M\(_1\) corresponds to the unwinding reel, M\(_2\) to the traction reel and M\(_3\) to the rewinding reel. Reels 1 and 3 are coupled to DC motors driven via the set-point currents \(I_{1}\) and \(I_{3}\). Moreover, tension meters measure the strip tension in the web between reels 1 and 2, referred to as \(\hbox {T}_{1}\), and between reels 2 and 3, referred to as \(\hbox {T}_{3}\), in addition to the dynamo tachometer that measures the angular speed of each reel (\(S_{1}\), \(S_{2}\) and \(S_{3}\)). Each motor is run by a local controller, as displayed in Fig. 1. Motor M\(_2\) is under speed control, while motors M\(_1\) and M\(_3\) are under torque control; the angular velocity of motor M\(_2\) (\(\Omega _2\)) is measured using a dynamo tachometer. The data measurements of this process comprise the process input variables and the controlled output variables.
The measured data were used to generate the training patterns for the developed model structures. The key process variables were measured by sensors at pre-selected points of the process, and the data were recorded in the monitoring system at a sampling interval of 0.1 s. The input and output parameters of the WP for this case study are described in Table 1.

Table 1 Input and output variables of the winding process

The output parameters for this case study describe the behaviour of the WP and show how it responds to various input sets. To estimate the future response of the process at any instant, it is essential to use both input and output parameters. The parameter variations in this process are due to the variation in the reel radius during unwinding. This non-measurable change of the reel radius markedly alters the dynamic behaviour of the system throughout the unwinding process Noura et al. 2009. Moreover, the performance and reliability of the monitoring system can be significantly affected by the stiffness of the system supports, such as the bearing, housing and rotor Liu and Shao 2017. Interfacial frictional moments are further parameters that affect the accuracy of control and monitoring of dynamical systems. Thus, many researchers have considered the effect of housing support stiffness and interfacial friction in the construction of their dynamic models Liu et al. 2014; Zhang et al. 2016. As a result of the uncontrolled influences to which the WP is subject, this type of process poses a challenge in modelling, identification and control. To overcome these issues, non-analytical techniques, as presented below, are applied experimentally to identify the winding process models.

6 Model preparation process

The model development process requires several preparation stages that must be completed beforehand to yield a good model. These involve data collection and data pre-processing.

6.1 Training and test datasets

Typically, the success of modelling an industrial process depends first on the amount of data, which must be large enough to capture the behaviour of the process well. A large dataset, however, implies a long computational time during training. Thus, it is useful to use an adequate number of data measurements to train the model until a high-quality model is obtained. Here, the measurements for each input variable and output variable of the WP consisted of 2500 data samples, which are publicly available (see Footnote 1). The dataset for each web tension was divided equally into a training set to train the model and a test set to evaluate the performance of the developed model. Consequently, each input parameter and output parameter has 1250 samples in the training process and 1250 samples in the testing process. This number of data values has the potential to yield an effective modelling process, which could improve the performance of the identification process. The input variables of the WP in this study are \(S_{1}(t)\), \(S_{2}(t)\), \(S_{3}(t)\), \(I_{1}(t)\) and \(I_{3}(t)\). The models \(\hbox {T}_{1}\)(t) and \(\hbox {T}_{3}\)(t) of the winding problem, as defined in Sect.  5, correspond to the tension in the web between reels 1 and 2 and between reels 2 and 3, respectively. The measurements \(I_{1}(t)\) and \(I_{3}(t)\) were included as the fourth and fifth input variables to improve the performance of the developed models in estimating the tension in the web between the unwinding and traction reels and between the traction and rewinding reels.

6.2 Data pre-processing

The data used to build a model of a nonlinear system must be selected carefully to ensure that they are rich enough to avoid overfitting or premature convergence. Data pre-processing must be carried out before the learning process to improve the quality of the training patterns and so support a sound modelling scheme. In essence, ANNs can perform an arbitrary nonlinear mapping between input and output data values. The simplest pre-processing strategies are data filtering and scaling.

6.2.1 Data filtration

Homogenisation and smoothing of abrupt changes need to be applied to the raw data measurements to extract reliable input and output variables from the empirical datasets. In this paper, the collected raw data are passed through a first-order digital Butterworth low-pass filter with a sampling rate of 3 Hz and a bandwidth of 0.3 Hz.
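A first-order digital Butterworth low-pass filter of this kind can be sketched as follows. The sketch assumes that the filter is obtained from the analogue prototype by the bilinear transform with frequency pre-warping, that the stated bandwidth is the cutoff frequency, and that the filter state starts in steady state at the first sample. The function names are hypothetical, and this is not necessarily how the filter was realised in this study.

```python
import math


def butter1_lowpass_coeffs(cutoff_hz, fs_hz):
    """Coefficients of y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)   # pre-warped cutoff
    b0 = k / (1.0 + k)
    a1 = (k - 1.0) / (k + 1.0)
    return b0, b0, a1                           # b1 equals b0 for this filter


def butter1_lowpass(signal, cutoff_hz=0.3, fs_hz=3.0):
    """Filter a sequence of samples and return the smoothed sequence."""
    if not signal:
        return []
    b0, b1, a1 = butter1_lowpass_coeffs(cutoff_hz, fs_hz)
    # Assumption: start from steady state at the first sample to avoid a start-up transient
    x_prev = y_prev = signal[0]
    filtered = []
    for x in signal:
        y = b0 * x + b1 * x_prev - a1 * y_prev
        filtered.append(y)
        x_prev, y_prev = x, y
    return filtered
```

The filter has unit gain at zero frequency, so a constant signal passes through unchanged while rapid fluctuations are attenuated.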

6.2.2 Data scaling

The input and output datasets of the winding process have diverse ranges, which may lead to an unsatisfactory modelling process with relatively high error rates in evaluation. Furthermore, raw datasets are not usually used directly to create models of industrial systems, as in many cases the magnitudes of the system variables differ. Thus, data scaling is important for high-performance system identification. The data should be scaled to a fixed range to prevent variables with larger magnitudes from dominating those with smaller magnitudes and impeding the early learning process. In this paper, the input and output data are scaled in the range between 0.1 and 0.9. The original data X were mapped to the scaled data \(X'\) as given in Eq. (19):

$$\begin{aligned} X'=\frac{X-X_{min}}{X_{max}-X_{min}}\times s+o \end{aligned}$$
(19)

where \(X_{max}\) and \(X_{min}\) are, respectively, the maximum and minimum values of the original data X, s is the scale parameter, which is equal to 0.9, and o is the offset parameter, \(o=0.1\).
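Eq. (19) can be sketched as a small function. Because the mapping sends \(X_{min}\) to o and \(X_{max}\) to \(o+s\), the scale and offset are left as parameters: the defaults follow the stated values \(s=0.9\) and \(o=0.1\), while \(s=0.8\) would place the maximum exactly at 0.9. The function name `scale` is hypothetical.

```python
def scale(values, s=0.9, o=0.1):
    """Min-max scaling of Eq. (19): maps min(values) to o and max(values) to o + s."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("cannot scale a constant signal")
    return [(x - x_min) / span * s + o for x in values]
```

For example, `scale([0.0, 5.0, 10.0], s=0.8)` maps the three values to approximately 0.1, 0.5 and 0.9.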

The effects of the Butterworth filter and the subsequent scaling for the inputs of the winding process are shown in Fig. 2. The raw inputs are shown in blue, and the filtered and scaled inputs are shown in red and green colours, respectively.

Fig. 2 Effects of applying the Butterworth filter on the original signals, followed by scaling. Raw signals are in blue, filtered signals are in red, and scaled signals are in green (color figure online)

Figure 3 shows the filtering of the angular speed of reel 2, followed by scaling.

Fig. 3 Filtering and scaling the angular speed of reel 2 to the range between 0.1 and 0.9

7 Proposed system identification procedure

System identification often uses statistical methods to construct mathematical models of dynamical systems from sets of measured data. In addition to the model preparation process described above, three further key issues must be addressed in the design of models for nonlinear industrial processes: model structure selection, model training and model validation. These issues should be handled carefully throughout the entire procedure to achieve high performance with a minimum modelling error. The mathematical structure of the proposed model for the web tensions of the winding process and the training algorithms of the ANNs are described below:

  • Proposed model structure: choosing a model structure with a limited number of parameters for the WP is difficult because of its nonlinearity. The general class of model structure used to predict the target values of the web tensions of the winding process, \(T_{1}(t) \) and \(T_{3}(t) \), follows the pattern given in Eq. (20).

    $$\begin{aligned} y(t)=\phi (S_{1}(t), S_{2}(t), S_{3}(t), I_{1}(t), I_{3}(t),\theta )^{T} \end{aligned}$$
    (20)

    where y(t) represents the web tension \(\hbox {T}_{1}\) or \(\hbox {T}_{3}\), \(\phi \) represents the system model, \(\theta \) denotes the model parameters and t is the time instant.

    To create \(\hbox {T}_1\) and \(\hbox {T}_3\) models of the WP with explainable structures that link the input and output parameters, we use an ANN trained by PSO, GOA and GWO. These \(\hbox {T}_1\) and \(\hbox {T}_3\) models represent the web tensions between reels 1 and 2 and between reels 2 and 3, respectively. The ANN was organised in three layers: an input layer, an output layer and one hidden layer. The ANN is then expanded by adding hidden neurons to the hidden layer, one by one, until the ANN model achieves its function with a minimal realisable error. The required size depends primarily on the complexity of the modelling problem, which is associated with the complexity of the dataset.

    The optimum size of the ANN was obtained in this case study through a trial-and-error process, in which the appropriate number of neurons was identified by an adaptive procedure that added or deleted neurons as required during training. In the feedforward (FF) process, the external inputs are first fed to the neurons of the input layer; the outputs of the input neurons are fed to the neurons of the hidden layer; and finally the outputs of the hidden layer are fed to the neurons of the output layer. The first step in training the ANNs is to initialise the weight parameters \(\mathbf {w}\). Then, during the FF computation, the ANN weights, \(\mathbf {w}\), are optimised using the proposed BIAs. The learning process consists of adjusting the synaptic weights \(\mathbf {w}\) until the network exhibits the desired behaviour. The output is evaluated to measure the ANN performance; if the output is not as desired, the weights are adapted according to the input patterns. Here, supervised learning is applied, where the goal is to approximate the desired outputs for a set of p input–output samples as described in Eq. (21):

    $$\begin{aligned} \mathbf {T}^k =\left\{ \left( x^k \in \mathbb {R}^N,\ d^k \in \mathbb {R}^M\right) \right\} , \ \ k=1,2,\ldots ,p \end{aligned}$$
    (21)

    where \(\mathbf {T}^k\) represents the training sample set, \(x^k\) represents the input patterns, \(d^k\) is the desired response, and N and M are the dimensions of the input and output patterns, respectively.

    The requirement is to design and calculate the NN parameters so that the actual output \(y^k\) of the NN due to \(x^k\) is statistically close, to the required degree, to \(d^k\) for all k. The classic BP algorithm is not used here to adjust the weights of the ANN because this algorithm, like other traditional algorithms, is based on the gradient descent technique, which can become stuck in a local minimum. Also, a BP algorithm cannot solve non-continuous problems. For this reason, other techniques that can address non-continuous and nonlinear problems are needed to improve the performance of the ANN and solve complex problems. Here, PSO, GOA and GWO, as described below, were used to adjust the synaptic weights of the ANN to obtain a minimum error.

  • Proposed training methods: the capability of an ANN model is affected by its configuration, particularly the numbers of hidden neurons and of input and output variables; as the number of model parameters rises, the network has more capacity to learn, and hence the fit improves. In principle, adding hidden neurons to the models systematically should result in regular reductions in the fit error. A proper training process for estimating the ANN parameters is the starting point for defining the model. Prior to feeding the data to the models, an ANN design was determined as described above. This is followed by a selection process to estimate the model parameters. An FF-NN was configured using the input and output datasets of the industrial WP, with the number of neurons identified through a trial-and-error process. These design matters are described above, and the behaviour of the models under the trial-and-error process, which resulted in a sensible number of hidden neurons, is presented in the results section. The trial-and-error process appears feasible for developing a model that achieves the desired degree of performance. During training, \(\mathbf {w}\) is updated by the BIAs until the mean square error (MSE) function defined in Eq. (22) is small enough.

    $$\begin{aligned} e =\frac{1}{M\cdot p}\sum _{k=1}^{p}\sum _{i=1}^{M}\left( y_i\left( x_k,\mathbf {w}\right) -d_{ik}\right) ^2 \end{aligned}$$
    (22)

    where \(y_i\), the output value at the \(i{\mathrm{th}}\) index, is evaluated over all p pattern samples and \(d_{ik}\) is the desired result.

    While the effectiveness of GOA and GWO in reliably solving many complex engineering design problems has been confirmed, there is still a lack of research on their value for optimising the weights of ANNs, particularly in the modelling of complex nonlinear industrial problems. In this work, we explored their use for optimising the weights of ANNs in the modelling of the winding process.

    The PSO, GOA and GWO were proposed as learning search algorithms to train the ANN in order to (1) prevent the error function from oscillating around a set of weights without any improvement, (2) estimate the model parameters and (3) obtain the optimal weights from which the optimal model structures for the winding problem could be created. These optimisation algorithms used the MSE as a fitness function to measure the closeness of the estimated output to the actual output. Modelling with an ANN trained by PSO, GOA or GWO has the merit of generating models with an interpretable structure that relate the input and output variables of a given dataset without the key parameters having to be specified. Further, each proposed modelling scheme is expected to produce a robust model and to achieve a high level of performance when modelling a nonlinear system for a real industrial process. In the training phase, the input set given to the particular modelling system is fed to the ANN and the learning algorithms. The output of the system, which acts as the target pattern, is also supplied to the ANN and the BIAs. The connection weights of all interconnections between the neurons of the ANN are updated by the proposed optimisation algorithms, namely PSO, GOA and GWO, until a predetermined number of iterations is reached or the MSE criterion is met, at which point the specified inputs produce the desired outputs. The estimated output vector of each model and the target vector are compared iteratively during training to compute the modelling error. This value is used by PSO, GOA and GWO at each iteration to reduce the MSE by adjusting the ANN weights according to the parameters of each optimisation algorithm.

    Through these activities, the structured ANN-BIAs models learn the behaviour of the output response, where each independent model, \(\hbox {T}_1\) and \(\hbox {T}_3\), of each web tension of the WP is learned with a compact set of parameters. The proposed BIAs are expected to evade local minima by exploring a large area of the search domain. Each algorithm has exploratory search features, which make it suitable for optimising the weights of ANNs and improve the chance of reaching an optimum solution. The procedures for optimising the weight parameters of the ANN using PSO, GOA and GWO are summarised by the iterative procedural codes in Algorithms  1,  2 and  3, respectively.

    [Algorithms 1, 2 and 3: iterative procedures for optimising the ANN weights with PSO, GOA and GWO, respectively]

    The final optimal solutions of each optimisation algorithm, as described in Algorithms 1, 2 and 3, were used as the optimum weights of the final model structures of the web tensions, \(T_1\) and \(T_3\), of the winding process. During the evolution cycle of each optimisation algorithm, the connection weights of the ANN are updated so that the trained model can identify unseen data in the verification process. In a nutshell, the modelling procedures in Algorithms  1,  2 and  3 are iterated until a predefined maximum number of iterations is reached or the improvement in the MSE between the target output and the estimated output, as measured by Eq. (22), falls below a defined threshold value d. The model of each proposed algorithm was trained several times to reduce the risk of ending in a local minimum. In these iterative procedures, the MSE was evaluated at each iteration to find the ANN weights whose estimated web tension values best matched the measured web tension of the WP.

    In this case study, the goal is not only to generate models that approximate the values of the underlying web tensions of the winding system, but also to give insight into the behaviour of these web tensions. The PSO, GOA and GWO summarise the interaction between the input and output variables of the WP and also identify the significant variables in this process, because these variables survive and appear in the best individuals at the end of the evolutionary process. The \(\hbox {T}_1\) and \(\hbox {T}_3\) models were designed to capture the main characteristics of the correct output responses of the WP along with the variability of the tuned weight parameters. The results of these iterative algorithms are the models that determine the output vectors representing the web tensions \(\hbox {T}_1\) and \(\hbox {T}_3\) of the winding process. A schematic flowchart of the proposed modelling method for the winding process is shown in Fig. 4. The first step in Fig. 4 is to read the collected data of the winding process. The second step is data pre-processing, a crucial step before building the model, which consists of data filtration and normalisation. Data filtering aims to improve the quality of the data and transform them into a more suitable form. Data normalisation aims to remove the scale differences between the variables of the process. The third step, before model creation, is identifying the training dataset, which involves sample and variable selection. In this step, half of the dataset was used for training, and all process variables were chosen to create the model. After the dataset is prepared, the next step is building and configuring the ANN model, where PSO, GOA and GWO are used to update the weights of the ANN model. The model parameters are then updated iteratively until a maximum number of iterations is reached or the improvement in the error measure computed by Eq. (22) is small enough. The green arrow represents the feedback of the modelling method. This results in the models ANN-PSO, ANN-GOA and ANN-GWO, whose validity and suitability are tested using a test dataset independent of the training dataset. The last step is to evaluate the performance of the modelling method on both the training and test datasets.

    Given the pre-processing phase, the proposed model structure and the training methods used to optimise the ANN weights, the developed models, ANN-PSO, ANN-GOA and ANN-GWO, are expected to be effective in describing and simulating the dynamic behaviour of the winding process. It is anticipated that these models can achieve very low training and evaluation errors. This result is expected because of the features of the adopted algorithms, PSO, GOA and GWO, which can largely avoid local solutions when searching for the optimal weights of the ANN model. Other attractive features of these optimisation algorithms include (1) their ability to train ANNs much faster than traditional learning algorithms such as the BP and LM methods and (2) their efficiency in locating the global optimum and obtaining the optimal set of weights for the ANNs, from which an optimal model structure is obtained. In sum, an ANN model trained by PSO, GOA and GWO offers several benefits, including robustness to noise and disturbing signals, rapid training and evaluation, avoidance of trapping in an undesirable solution and a high level of reliability in understanding and describing the process under study. The use of PSO, GOA and GWO provides a stable optimisation of the weights of the ANN model, with fast convergence and very low modelling error rates. This is because these algorithms (1) search for optimal weights without relying on the initial weights and (2) enhance the diversity of solutions and are less susceptible to converging at local optima. These models are therefore powerful in optimising the ANN weights and have a high potential for capturing the important characteristics of the web tensions of the winding process with relatively simple mathematical formulas.
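The fitness evaluation described above, in which a candidate weight vector is decoded into a three-layer feedforward network and scored by the MSE of Eq. (22), can be sketched as follows. This is a minimal illustration, not the implementation used in this study. It assumes a sigmoid hidden activation, a single linear output neuron (M = 1, one model per web tension) and a particular flat layout of the weight vector; the names `n_weights`, `forward` and `mse_fitness` are hypothetical.

```python
import math


def n_weights(n_in, n_hidden):
    """Length of the flat weight vector: hidden weights and biases, output weights and bias."""
    return n_hidden * (n_in + 1) + n_hidden + 1


def forward(x, w, n_hidden):
    """Feedforward pass for one input pattern x and a flat weight vector w."""
    n_in = len(x)
    if len(w) != n_weights(n_in, n_hidden):
        raise ValueError("weight vector has the wrong length")
    out_start = n_hidden * (n_in + 1)            # index of the first output weight
    y = w[-1]                                    # output bias
    for h in range(n_hidden):
        row = w[h * (n_in + 1):(h + 1) * (n_in + 1)]
        # row[-1] is the hidden bias; zip stops after the n_in input weights
        z = row[-1] + sum(wi * xi for wi, xi in zip(row, x))
        z = max(-60.0, min(60.0, z))             # keep exp() within floating-point range
        y += w[out_start + h] / (1.0 + math.exp(-z))   # sigmoid activation (assumed)
    return y


def mse_fitness(w, inputs, targets, n_hidden):
    """Eq. (22) with M = 1: mean squared error over all p training patterns."""
    errors = [(forward(x, w, n_hidden) - d) ** 2 for x, d in zip(inputs, targets)]
    return sum(errors) / len(errors)
```

A swarm optimiser would treat each search agent as one such weight vector and call `mse_fitness` on the scaled training patterns, with the five inputs \(S_1\), \(S_2\), \(S_3\), \(I_1\) and \(I_3\) forming x and the measured tension forming the target.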

Fig. 4 A scientific diagram showing the key procedures of the proposed modelling approach (color figure online)

8 Model acceptance and evaluation

Model selection with a suitable evaluation measure is critical for judging how the model performs during the evaluation process. This process is required to validate the model and illustrate the power of the model-based approach to identify \(\hbox {T}_1\) and \(\hbox {T}_{3}\) data. The capability of the developed models to characterise the behaviour of the web tensions of the WP was verified by computing several metrics. If the identified models fail to reach a predefined accuracy with an acceptable degree of performance, the modelling process returns to its training phase. Therefore, performance criteria are needed to quantify the level of similarity between the data generated by real experiments and the data produced by the developed models.

The performance of the developed models of the sub-systems, \(T_1\) and \(T_3\), of the winding problem was assessed based upon several criteria as defined below.

  1. The mean absolute percentage error (MAPE), as defined below, measures the percentage prediction accuracy of the developed models.

    $$\begin{aligned} MAPE = \frac{1}{n}\sum _{i=1}^{n}\left| \frac{y_i- \hat{y}_i}{y_i}\right| \times 100\% \end{aligned}$$
    (23)

    where n stands for the number of data values used in the experiments and y and \(\hat{y}\) represent the n observed and modelled values, respectively.

  2. The MSE, denoted E and defined below, measures the mean squared difference between the actual data produced by the measurement tools and the corresponding predicted data obtained from the models.

    $$\begin{aligned} E = \frac{1}{n}{\sum _{i=1}^{n}(y_i - \hat{y}_i)^2} \end{aligned}$$
    (24)

    where \(y_{i}\) and \(\hat{y}_i\) are, for observation i, the web tension value measured in the experiment and the web tension value estimated by the modelling procedure, respectively. The estimation is considered accurate when the difference between the expected and the measured values is within tolerance.

  3. The Pearson product–moment correlation coefficient (R), as described below, was used to assess the linear agreement between the measured outputs and the outputs of the developed models:

    $$\begin{aligned} R = \frac{\sum _{i=1}^{n} (y_i - \bar{y}) (\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum _{i=1}^{n}(y_i - \bar{y})^2 {\sum _{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}}} \end{aligned}$$
    (25)

    where \(y_i\) is the \(i{\mathrm{th}}\) actual value of the web tension output measured in a real experiment, \(\hat{y}_i\) is the \(i{\mathrm{th}}\) predicted web tension value generated by the model, and \(\bar{y}\) and \(\bar{\hat{y}}\) are the mean values of the actual and predicted web tension outputs, y and \(\hat{y}\), respectively.

  4. The coefficient of determination (\(R^2\)), as given below, measures the proportion of the variance of the observed data that is explained by the model.

    $$\begin{aligned} R^2 = 1-\frac{\sum _{i=1}^{n} (y_i- \hat{y}_i)^2}{\sum _{i=1}^{n} (y_i - \bar{y})^2 } \end{aligned}$$
    (26)

  5. The variance-accounted-for (VAF) measure, as defined in Eq. (27), measures how closely the web tension values estimated by the developed models match the web tension values measured in real experiments.

    $$\begin{aligned} VAF = \left[ 1 - \frac{var(y - \hat{y})}{var(y)} \right] \times 100\% \end{aligned}$$
    (27)

    where var(y) is the variance of the actual web tension data and \(var(y-\hat{y})\) is the variance of the difference between the actual web tension data (y) and the model data (\(\hat{y}\)).
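The five criteria of Eqs. (23) to (27) can be computed as in the sketch below. The function names are hypothetical, and the population variance is assumed for VAF; since VAF is a ratio of two variances over the same number of samples, the choice between population and sample variance does not change its value. MAPE is undefined when any measured value is zero, and R is undefined when either series is constant.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _var(v):
    m = _mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)   # population variance (assumed)


def mape(y, y_hat):
    """Eq. (23): mean absolute percentage error, in percent."""
    return 100.0 * _mean([abs((a - b) / a) for a, b in zip(y, y_hat)])


def mse(y, y_hat):
    """Eq. (24): mean squared error."""
    return _mean([(a - b) ** 2 for a, b in zip(y, y_hat)])


def pearson_r(y, y_hat):
    """Eq. (25): Pearson product-moment correlation coefficient."""
    my, mh = _mean(y), _mean(y_hat)
    num = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    den = math.sqrt(sum((a - my) ** 2 for a in y) * sum((b - mh) ** 2 for b in y_hat))
    return num / den


def r_squared(y, y_hat):
    """Eq. (26): coefficient of determination."""
    my = _mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def vaf(y, y_hat):
    """Eq. (27): variance accounted for, in percent."""
    residual = [a - b for a, b in zip(y, y_hat)]
    return (1.0 - _var(residual) / _var(y)) * 100.0
```

A prediction that differs from the measurement by a constant offset illustrates why several criteria are reported together: R and VAF remain at their best values because they ignore a constant bias, whereas MSE, MAPE and \(R^2\) all degrade.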

The aforementioned evaluation measures are appropriate for quantifying the accuracy of the developed models. These criteria help to assess the potential of the developed models for the prediction of \(\hbox {T}_1\) and \(\hbox {T}_{3}\) and to quantify the degree of agreement between the real output data and the estimated output data. A statistical analysis was performed using the Friedman statistical test Friedman 1937 to verify the efficiency of each presented model. This test ranks each model alongside other conventional and state-of-the-art models.

9 Experimental results and discussion

9.1 Machine and software specifications

Computation speed is influenced by the volume of data to be processed, the time taken to compute each model and the complexity of the algorithm. Here, the developed models are implemented in Java on the Microsoft Windows 10 platform. All experiments were run on an Intel machine with a Core i5 processor running at 2.50 GHz and 6.0 GB of RAM. The experiments were designed to assess the effect of tuning the weights of the ANN using PSO, GOA and GWO. Each experiment was repeated 10 times. The results of the proposed modelling methods are recorded in terms of the performance criteria defined above and compared with a set of popular traditional and state-of-the-art methods reported in the literature. The programming language, software and hardware specifications are given to indicate the efficiency of the developed models in terms of computational burden. The average computation times over the 10 experiments using ANN-PSO, ANN-GOA and ANN-GWO are about 79.453, 72.283 and 59.172 seconds, respectively, each with the controlling parameters given in Table 2. The average computational time of the developed modelling schemes is thus relatively low, which indicates that the computational effort of each developed model is acceptable and that the proposed modelling schemes are computationally efficient for the WP. The computational time is intended only to give future researchers an indication of the time requirements. In short, the elapsed computational time shows that the proposed models can handle the WP quickly on a computer with modest specifications such as the one described above.

9.2 Experimental setup of ANN-based algorithms

The feedforward ANN was configured with the number of neurons identified as described, together with the input and output data sets of the WP. The ANN weights were tuned by each of PSO, GOA and GWO to obtain the optimised models for the \(\hbox {T}_1\) and \(\hbox {T}_3\) tensions. The control parameters of these optimisation algorithms are given in Table 2.
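To make the idea of tuning ANN weights with a swarm algorithm concrete, the following is a minimal sketch of PSO searching the weight vector of a single-hidden-layer network with the MSE as fitness. It is written in Python for brevity rather than in the Java used for the experiments. The network layout (tanh hidden units, linear output), the function names and all parameter values are assumptions made for illustration; they are not the settings of Table 2.

```python
import math
import random


def n_weights(n_in, n_hidden):
    """Hidden weights and biases, output weights, one output bias."""
    return n_hidden * (n_in + 1) + n_hidden + 1


def forward(w, x, n_hidden):
    """Single hidden layer with tanh units and a linear output (assumed layout)."""
    n_in = len(x)
    out_start = n_hidden * (n_in + 1)
    y = w[-1]  # output bias
    for h in range(n_hidden):
        base = h * (n_in + 1)
        s = w[base + n_in] + sum(w[base + i] * x[i] for i in range(n_in))
        y += w[out_start + h] * math.tanh(s)
    return y


def fitness(w, inputs, targets, n_hidden):
    """MSE of the network output over the data set."""
    return sum((forward(w, x, n_hidden) - t) ** 2
               for x, t in zip(inputs, targets)) / len(targets)


def pso_train(inputs, targets, n_hidden=4, n_particles=20, n_iter=100,
              inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over the flattened weight vector (illustrative values)."""
    rng = random.Random(seed)
    dim = n_weights(len(inputs[0]), n_hidden)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, inputs, targets, n_hidden) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]  # best MSE found so far, one entry per iteration
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            f = fitness(pos[k], inputs, targets, n_hidden)
            if f < pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
        history.append(gbest_fit)
    return gbest, history
```

GOA and GWO would replace only the position-update rule inside the loop; the weight encoding and the MSE fitness stay the same, which is why the three optimisers can be compared on an equal footing.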

Table 2 Values of the controlling parameters of the tuned algorithms

The control parameters in Table 2 were adjusted using design of experiments to fit each swarm-based algorithm to the underlying nature of the modelling problem, with a trial-and-error process used to find the best parameters for each algorithm. A reasonably wide range was initially defined for each control parameter and the experiments were run systematically. This avoids wasting time on untargeted experiments, gives an idea of the behaviour of the optimisation algorithms under different settings and provides a basis for subsequent fine-tuning. The trends in the parameter values were examined to determine whether the best values lie within the range or whether further experiments are needed. The parameter values were varied several times until a sensible solution was obtained. However, such a process most often yields good settings rather than the "best" settings.

9.3 Simulation of the developed WP models

The proposed modelling schemes are expected to create models for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) of the WP with explainable structures that link inputs and outputs through the selection of key parameters. The main objective is to produce models \(\hat{T_{1}}(t)\) and \(\hat{T_{3}}(t)\) that closely resemble the actual \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\), respectively. Three models were developed for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) using ANN-PSO, ANN-GOA and ANN-GWO. Each developed model was trained on a data set of 1250 samples of each input and output parameter and evaluated on a separate data set of 1250 previously unseen samples. The simulation results for the training and verification stages are shown below to illustrate the capability of the modelling schemes.

9.3.1 Simulation result-based regression

Multiple nonlinear regression (MNLR) was used to build a regression model for the WP. Least squares estimation was used to estimate the model parameters. The resulting models for the web tensions \(\hbox {T}_1\) and \(\hbox {T}_3\) are given in Eqs. (28) and (29), respectively.

$$\begin{aligned} T_1 &= 0.04365 - 0.00217\,S_1 + 0.00356\,S_2 \\ &\quad - 0.00852\,S_3 - 0.00742\,I_1 + 0.00169\,I_2 \end{aligned}$$
(28)
$$\begin{aligned} T_3 &= 0.04158 - 0.00019\,S_1 - 0.00219\,S_2 \\ &\quad - 0.00050\,S_3 - 0.00322\,I_1 + 0.00729\,I_2 \end{aligned}$$
(29)
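Equations (28) and (29) can be evaluated directly once the five inputs are available. The short sketch below encodes the printed coefficients; the function and constant names are hypothetical, and the inputs are assumed to be in whatever units and scaling the original data set uses.

```python
# Coefficients copied from Eqs. (28) and (29): intercept, S1, S2, S3, I1, I2.
T1_COEFFS = (0.04365, -0.00217, 0.00356, -0.00852, -0.00742, 0.00169)
T3_COEFFS = (0.04158, -0.00019, -0.00219, -0.00050, -0.00322, 0.00729)


def predict_tension(coeffs, s1, s2, s3, i1, i2):
    """Evaluate a regression model of the form used in Eqs. (28) and (29)."""
    intercept, *slopes = coeffs
    return intercept + sum(c * v for c, v in zip(slopes, (s1, s2, s3, i1, i2)))
```

For example, `predict_tension(T1_COEFFS, 0, 0, 0, 0, 0)` returns the intercept 0.04365, which is the tension the model predicts when all inputs are zero.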

The actual and predicted web tension values obtained with MNLR in both the training and testing cases are shown in Figs. 5 and 6 for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\), respectively.

Fig. 5
figure 5

Actual and predicted \(\hbox {T}_{1}\) in training and testing cases based on multiple nonlinear regression

Fig. 6
figure 6

Actual and predicted \(\hbox {T}_{3}\) in training and testing cases based on multiple nonlinear regression

The study of residuals is essential to the MNLR modelling process and to judging the performance of the proposed models. The computed correlation coefficient (R) values over the training and testing data for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) are shown in Figs. 7 and 8, respectively.

Fig. 7
figure 7

Multiple nonlinear regression: computed R over training and testing data of \(\hbox {T}_{1}\)

Fig. 8
figure 8

Multiple nonlinear regression: computed R over training and testing data of \(\hbox {T}_{3}\)

9.3.2 Simulation result-based ANN-PSO model

The actual and predicted web tension between reels 1 and 2 and web tension between reels 2 and 3 in both training and testing cases are shown in Figs. 9 and 10, respectively, where the models were created based on ANN-PSO.

Fig. 9
figure 9

Actual and predicted results of \(\hbox {T}_{1}\) in training and testing cases based on ANN-PSO (color figure online)

Fig. 10
figure 10

Actual and predicted results of \(\hbox {T}_{3}\) in training and testing cases based on ANN-PSO (color figure online)

The correlation coefficient (R) values obtained with the ANN-PSO models on the training and testing data for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) are shown in Figs. 11 and 12, respectively.

Fig. 11
figure 11

Computed R over training and testing data of \(\hbox {T}_{1}\) using ANN-PSO

Fig. 12
figure 12

Computed R over training and testing data of \(\hbox {T}_{3}\) using ANN-PSO

The modelling and identification results in Figs. 9 and 10 show that, judged visually, the predicted output (red) fits the actual output (blue) closely for each subsystem of the winding process. These results are notable given that the model was trained on a relatively large data set. They further underline the validity and feasibility of the ANN-PSO model in describing the dynamic behaviour of the winding process, and suggest that the approach could be applied to other industrial processes. This in turn suggests that refinement and data normalisation procedures are valuable for retaining the significant aspects of the data sets.

9.3.3 Simulation result-based ANN-GOA model

The actual and predicted web tensions between reels 1 and 2 and between reels 2 and 3, in both training and testing, based on the final ANN-GOA models for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) of the winding process are shown in Figs. 13 and 14, respectively. The top part of each figure gives the training results, while the test results are given in the bottom part. The actual web tension values are shown in red, and the predicted web tension values are shown by a blue line.

Fig. 13
figure 13

Actual and predicted results of \(\hbox {T}_{1}\) in training and testing cases based on ANN-GOA (color figure online)

Fig. 14
figure 14

Actual and predicted results of \(\hbox {T}_{3}\) in training and testing cases based on ANN-GOA (color figure online)

The computed correlation coefficient (R) values of the ANN-GOA models in the training and testing cases for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) are shown in Figs. 15 and 16, respectively.

Fig. 15
figure 15

Computed R over training and testing data of \(\hbox {T}_{1}\) using ANN-GOA

Fig. 16
figure 16

Computed R over training and testing data of \(\hbox {T}_{3}\) using ANN-GOA

Figure 13 shows that the simulation results for \(\hbox {T}_{1}\) are visually a little better than those for \(\hbox {T}_{3}\) in Fig. 14 for the training and testing data (shown in red and blue, respectively), and both results are reasonable. A modest error can be observed between the target and the predicted output in the testing cases, but it is not significant. These simulation results affirm the effectiveness of the ANN-GOA-based approach for creating a model-based system. The errors in Figs. 13 and 14 arise because the predicted web tensions between reels 1 and 2 and between reels 2 and 3 do not follow the corresponding actual web tensions exactly, so that some target values are not captured well.

9.3.4 Simulation result-based ANN-GWO model

The actual and predicted \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) values based on the developed ANN-GWO models are shown in Figs. 17 and 18 for both training and testing cases. The actual values are shown in blue and the predicted values by a red line.

Fig. 17
figure 17

Actual and predicted results of \(\hbox {T}_{1}\) in training and testing cases using ANN-GWO (color figure online)

Fig. 18
figure 18

Actual and predicted results of \(\hbox {T}_{3}\) in training and testing cases using ANN-GWO (color figure online)

The correlation coefficient (R) values obtained with the ANN-GWO models in the training and testing cases for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) are shown in Figs. 19 and 20, respectively.

Fig. 19
figure 19

Computed R over training and testing data of \(\hbox {T}_{1}\) using ANN-GWO

Fig. 20
figure 20

Computed R over training and testing data of \(\hbox {T}_{3}\) using ANN-GWO

The simulation results shown in Figs. 17 and 18 support the validity and appropriateness of the ANN-GWO-based model for modelling the winding process in both training and testing cases. Visually, the ANN-GWO-based modelling approach gave better results for \(\hbox {T}_{1}\) than for \(\hbox {T}_{3}\).

In short, the correlation coefficient (R) results obtained with the ANN-PSO, ANN-GWO and ANN-GOA modelling approaches in both training and testing cases demonstrate that a high degree of performance and consistent results were obtained over the entire sampling period.

The results illustrated in Figs. 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18 show the ability and reliability of the proposed modelling approaches in modelling the web tensions of the WP. The plots of the actual and predicted web tensions between reels 1 and 2 and between reels 2 and 3 in the training and testing cases are almost replicas of each other: visually, there is little deviation between corresponding points of the plots. These results therefore indicate a highly convincing level of modelling and identification in training and testing for both web tensions. It is also observed from the same figures that the ANN-PSO-, ANN-GOA- and ANN-GWO-based models perform at a high level, such that the estimated \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) outputs closely resemble the actual \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) outputs, respectively.

We can observe from all these figures that the estimated winding process output is almost identical to the actual winding process output. This demonstrates the efficacy of each of PSO, GOA and GWO in training ANNs. These results further show superior performance of the proposed modelling methods compared to previous methods in the literature such as the MNLR method.

9.4 Convergence

The convergence curves that identify the performance of the ANN-PSO, ANN-GOA and ANN-GWO models, for modelling the web tension \(\hbox {T}_1\) of the WP, are presented for up to 1200 iterations in Fig. 21.

Fig. 21
figure 21

Best convergence curves over ten experiments for the ANN-PSO, ANN-GOA and ANN-GWO models for the web tension \(T_1\) at different MSE values

The values plotted in Fig. 21 represent the mean of the squared errors between the estimated data values produced by the presented modelling schemes and the corresponding real data measurements. The vertical error bars on the convergence curves represent one standard error of the mean.
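As a small illustration of how such error bars can be obtained, the sketch below computes, for each iteration, the mean error over repeated runs and the standard error of that mean. The function name and the data layout (one list of per-iteration MSE values per run) are assumptions, and the sample variance with divisor n - 1 is the usual choice rather than one stated in the paper.

```python
import math


def mean_and_sem(runs):
    """runs: list of per-run MSE histories, all of the same length.

    Returns the per-iteration mean and standard error of the mean (SEM).
    """
    n = len(runs)
    means, sems = [], []
    for values in zip(*runs):  # one tuple of n values per iteration
        m = sum(values) / n
        sample_var = sum((v - m) ** 2 for v in values) / (n - 1)
        means.append(m)
        sems.append(math.sqrt(sample_var / n))
    return means, sems
```

When the error bars of two curves do not overlap over a range of iterations, the difference between the corresponding means is unlikely to be explained by run-to-run variation alone.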

The standard error of the mean was used to judge whether the differences between the mean errors of the models are statistically significant. The convergence curves show that the ANN models trained by PSO, GOA and GWO converge rapidly towards a low error. There is a consistent difference between the ANN-PSO results and those of ANN-GOA and ANN-GWO: the ANN-PSO curve is clearly separated from the other two, and its error bars do not overlap theirs. By contrast, the error bars of the curves generated by ANN-GOA and ANN-GWO overlap at many points. In sum, the convergence curves in Fig. 21 indicate that the ANN-PSO, ANN-GOA and ANN-GWO models have reached a high degree of performance in modelling \(\hbox {T}_1\).

9.5 Evaluation results

To obtain a quantitative assessment of the performance of the proposed models (ANN-PSO, ANN-GOA and ANN-GWO), a set of measures was calculated to identify the degree of similarity between the actual output data and the corresponding estimated data. The performance results for modelling the web tensions \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) are shown in Tables 3 and 4, respectively.

Table 3 Training and testing results of web tension \(\hbox {T}_{1}\) arrived at by the proposed models
Table 4 Modelling and evaluation results of web tension \(\hbox {T}_{3}\) based on the proposed models

Tables 3 and 4 show that the modelling and evaluation results are reasonable in both training and testing. There is a relatively high correlation between the predicted and true web tension data sets, with a minimum correlation R of 99.707 reported for the ANN-GWO model in testing. Interestingly, the ANN-PSO models reported better results than the ANN-GOA and ANN-GWO models. The ANN-GOA models reported correlation rates in training and testing for both \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) comparable to those reported by the ANN-PSO model. The ANN-GWO model did not perform as well as the other models on either the \(\hbox {T}_{1}\) or the \(\hbox {T}_{3}\) data set. However, its margin to the corresponding ANN-GOA results is relatively small.

The VAF results for a series of experiments evaluating the \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) models of all proposed approaches, with a different number of hidden neurons in each experiment, are shown in Tables 5 and 6, respectively.

Table 5 Web tension \(\hbox {T}_{1}\): Evaluation VAF results of the proposed models
Table 6 Web tension \(\hbox {T}_{3}\): Evaluation VAF results of the proposed models

It is observed from Tables 5 and 6 that the VAF improves gradually as the number of hidden neurons increases. The increase in VAF is very small when the number of neurons rises from 18 to 24: the difference is less than 0.02 on average, which is small and not statistically significant. The choice between 18 and 24 hidden neurons is therefore not critical to performance, and a model with fewer neurons will run faster. The average difference in VAF between the smallest and largest numbers of neurons is approximately 0.04, whereas the difference in computational burden is considerable.

To sum up, the evaluation results in Tables 5 and 6 indicate that the ANN-PSO approach achieved a higher level of accuracy in training and testing than ANN-GOA and ANN-GWO, and that there is no significant difference between ANN-GOA and ANN-GWO in this study. Further, the results show that the ANN-PSO, ANN-GOA and ANN-GWO models all have a considerable modelling ability, with the ANN-PSO model being the most accurate.
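The link between the number of hidden neurons and the computational burden can be made explicit by counting the weights that the optimiser has to search. The sketch below assumes a fully connected network with one hidden layer, bias terms, five inputs (the \(S_1\), \(S_2\), \(S_3\), \(I_1\), \(I_2\) of Eqs. (28) and (29)) and a single output per tension model; the function name and this layout are illustrative assumptions, not details confirmed by the paper.

```python
def weight_count(n_inputs, n_hidden, n_outputs=1):
    """Number of trainable weights and biases in a one-hidden-layer network."""
    hidden_params = n_hidden * (n_inputs + 1)   # input weights plus hidden biases
    output_params = n_outputs * (n_hidden + 1)  # output weights plus output biases
    return hidden_params + output_params


# Search-space dimension for the two hidden-layer sizes discussed in the text.
dims = {h: weight_count(5, h) for h in (18, 24)}
```

Under these assumptions the search space grows from 127 to 169 dimensions when going from 18 to 24 hidden neurons, and every fitness evaluation becomes proportionally more expensive, while the gain in VAF stays below 0.02.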

Table 7 A comparison in terms of the MSE between the proposed modelling approaches and other reported modelling approaches for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) of the WP for a set of ten experiments

The application of the proposed models to the winding process gave very satisfactory results: adding more hidden neurons to the ANN consistently improved the fit. In addition, performance levels close to unity were reached, indicating that the desired level of fit can be attained even when the performance target is unity. This confirms that the proposed models achieved a high level of performance in modelling the winding process.

While the proposed models are highly reliable in modelling the winding process, in some evaluation cases the number of hidden neurons needs to be increased to raise the performance of the developed models. Another limitation of the proposed models is their relatively high computational cost. The computational cost of the proposed modelling method depends on the parameters of both the ANN model and the optimisation algorithms used to optimise the ANN weights, as the optimisation algorithms require enough search agents and generations to obtain a high level of performance.

9.6 A comparison with other reported models

The aforementioned performance criteria are used to obtain a quantitative evaluation of the performance of the proposed modelling schemes. The models were trained ten times on a data set of 1250 samples for each input variable of the WP. The models were then tested on a different data set of 1250 samples for each input variable of the WP. The performance of the proposed ANN-PSO, ANN-GOA and ANN-GWO models is compared to that of conventional and state-of-the-art approaches reported in the literature that model the same process. The conventional approaches include MNLR and LSE. The state-of-the-art approaches are the ANN-based RBF and MLP as well as the RLLNF reviewed in the literature. The performances of the developed models for the web tensions between reels 1 and 2 (\(\hbox {T}_{1}\)) and between reels 2 and 3 (\(\hbox {T}_{3}\)) in terms of the MSE are given in Table 7 for both the training and testing data sets. Table 7 also gives the number of neurons of each neural network.

The evaluation results presented in Table 7 show the appropriateness of the proposed model-based methods for modelling \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\). A high level of performance was achieved using the proposed ANN-PSO model in both training and testing. Besides, the ANN-GOA-based approach achieved a slightly better degree of performance for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) than the ANN-GWO model-based RMLP approach. As Table 7 demonstrates, the proposed models showed superior performance compared to the other traditional and state-of-the-art methods.

The training and testing results of the presented modelling systems for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) are summarised in terms of the VAF criterion in Table 8.

Table 8 Variance-accounted-for (VAF) values for \(\hbox {T}_{1}\) and \(\hbox {T}_{3}\) of the WP using the developed models over ten experiments

Table 8 compares the performance accuracy of the models presented in this paper with that of four modelling methods for the same industrial process. A high level of performance is obtained using the proposed models. Moreover, the VAF results of the proposed models are much better than those of the conventional models and slightly better than those of a state-of-the-art model Nozari et al. 2012. This illustrates that the presented models represent the WP web tensions well. The results in Tables 7 and 8 reveal that the PSO, GOA and GWO algorithms are effective in adjusting the weights of ANNs and in reaching very good MSE and VAF values. The overall results indicate that the proposed modelling techniques have produced valuable and adaptable models of the behaviour of the WP web tensions, and suggest that they are capable of producing good mathematical models for other complex industrial and control problems.

9.7 Statistical analysis

A statistical analysis was performed using the Friedman statistical test Friedman 1937, at a significance level of \(\alpha = 0.05\), to verify the efficiency of each model presented and to compare the accuracy of the proposed models with that of conventional and state-of-the-art models. The test ranks the presented models alongside the conventional and state-of-the-art models. It also identifies the model that yielded the best performance, which is defined as the control model. Holm’s procedure Holm 1979 was then applied to the Friedman test results to identify the degree of significance of the differences between the control model and the other models. Holm’s procedure yields the adjusted p values for each comparison between the control model and another model, and determines whether the null hypothesis of comparable performance between the models is rejected. Table 9 shows the ranking produced by the Friedman test for the presented models.

Table 9 Average rankings of the presented models (Friedman)

The ranking results in Table 9 show that the ANN-PSO model is ranked first, followed in order by the ANN-GOA, LLNF, ANN-GWO, RBF, MLP and LSE models. The adjusted p values of Holm’s procedure in Table 10 were computed with ANN-PSO as the control model because it obtained the best ranking. The p value computed by the Friedman test is 0.00651, which is less than \(\alpha =0.05\). This means that there is a statistically significant difference among the results of the evaluated models. Thus, Holm’s procedure was applied to determine whether the difference between the control model and each of the other models is statistically significant.
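The average rankings used by the Friedman test can be sketched as follows. Each row holds the error of every model on one data set or measurement (lower is better), models are ranked within each row with ties sharing the mean rank, and the ranks are averaged over the rows. The function names and the data layout are assumptions for illustration, the statistic is the standard chi-square form without tie correction, and turning it into a p value additionally requires the chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(scores):
    """scores[i][j]: error of model j on block i (lower is better)."""
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=row.__getitem__)
        pos = 0
        while pos < n_models:
            end = pos
            # Extend over a run of tied values so they share one rank.
            while end + 1 < n_models and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2.0 + 1.0
            for k in range(pos, end + 1):
                totals[order[k]] += shared_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(avg_ranks, n_blocks):
    """Friedman chi-square statistic from average ranks (no tie correction)."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12.0 * n_blocks / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)
```

The model with the lowest average rank is the one taken as the control model in the subsequent post hoc comparison.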

Table 10 Adjusted p value and the null hypothesis based on the Holm’s procedure with \(\alpha =0.05\) (Friedman)

Holm’s procedure rejects those hypotheses that have a p value \(\le 0.01\). The results of Holm’s procedure, presented in Table 10, show that the ANN-PSO model reported statistically significantly better results than the classical and state-of-the-art models for the winding process data. Further, the results in Table 10 confirm that the ANN-GOA and ANN-GWO models also exhibited a considerable level of performance.

To recapitulate, the sound performance of the ANN-based PSO, GOA and GWO models demonstrates their generalisation capabilities and supports the use of these models in modelling the case study.
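For reference, the step-down adjustment at the heart of Holm’s procedure can be sketched as below. The unadjusted p values of the comparisons against the control model are sorted in ascending order, the i-th smallest is multiplied by the number of hypotheses not yet examined, and monotonicity is enforced so that adjusted values never decrease. The function name and the example threshold are assumptions; the input p values would come from the pairwise comparisons, which are not reproduced here.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down adjusted p values and rejection decisions."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    adjusted = [0.0] * m
    running_max = 0.0
    for step, idx in enumerate(order):
        candidate = min(1.0, (m - step) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    rejected = [a <= alpha for a in adjusted]
    return adjusted, rejected
```

A hypothesis of comparable performance is rejected when its adjusted p value does not exceed the chosen threshold, which is passed in as `alpha`.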

The developed models have learned to characterise the behaviour of the winding process precisely because the PSO, GOA and GWO algorithms found well-tuned weights for the proposed ANN model. As a result, the developed models have achieved high accuracy values and very low error rates. This high degree of performance is considered a major advantage of the proposed modelling method. However, because PSO, GOA and GWO are used to train the ANN, the ANN-PSO, ANN-GOA and ANN-GWO models are subject to a relatively high computational burden. Additionally, increasing the number of neurons in the ANN model will raise the cost.

The nonlinear simulator models presented in this work, ANN-PSO, ANN-GOA and ANN-GWO, can simulate the behaviour of the winding plant after training without requiring real plant output data, simply by being supplied with the inputs. This feature allows the proposed models to be used to design and simulate model-based controllers over the full operating range of a nonlinear winding system. In short, the ANN-PSO, ANN-GOA and ANN-GWO models are highly efficient in modelling the winding process and could further be used to model other complex industrial processes.

10 Conclusion and future work

The proposed work has demonstrated the use of three bio-inspired algorithms (BIAs) to train artificial neural networks (ANNs) and so derive three intelligent models for an industrial winding process (WP). These BIAs are particle swarm optimisation (PSO), the grasshopper optimisation algorithm (GOA) and the grey wolf optimiser (GWO). The resulting models are referred to as ANN-PSO, ANN-GOA and ANN-GWO, respectively. To create an optimal model structure for the WP, the mean square error (MSE) was used as the fitness function to evaluate each solution. The performance of the proposed models was assessed through several experiments using the MSE, the Variance-Accounted-For (VAF) and the correlation criteria. All evaluation measures reported a high level of agreement between modelled and observed WP output data, suggesting that the presented models are highly accurate. On all criteria, a very high degree of performance was achieved by the proposed intelligent models, with a correlation coefficient of more than 99% in all training and testing processes. In the simulation results, the proposed ANN-PSO, ANN-GOA and ANN-GWO models produced a faithful replica of their corresponding WP output data. These findings show the potential of PSO, GOA and GWO as promising learning algorithms for the optimisation of ANN weights. The proposed models outperformed state-of-the-art models and performed considerably better than the conventional Least Square Error (LSE) method. Consequently, the proposed modelling method can be used as a basis for modelling other nonlinear industrial systems and can offer a way to compare improvements between competing modelling methods. The use of a new model-based identification system to model the WP is a useful direction and an interesting trend for future research and experiments.
This model-based simulator approach could be further used to design a model predictive controller for this or other winding systems with a set of variable parameters. Further work is needed to assess the suitability of the proposed modelling approach for other real industrial benchmark data sets.