1 Introduction

In recent years, machine learning techniques for handling multimodal dynamic systems have attracted a flurry of research and evolved rapidly. Dynamic analysis of multimodal systems is crucial in engineering structures [1]. Most scientific problems are inherently non-linear, so traditional numerical methods sometimes fail to find parametric solutions of the governing differential equations. Moreover, existing traditional methods may be problem-dependent and require repeated simulations. In contrast, a neural network-based approach provides an alternative, closed-form solution of the differential equations (DE), although the resulting machine learning prediction behaves like a black box.

In 1943, McCulloch and Pitts proposed the first elementary model of a neural network [2]. However, it became popular among mathematicians in the 1990s after Lee and Kang [3] developed a Hopfield neural network to solve first-order DEs. DEs have been extensively investigated by researchers in the past few years [4,5,6,7,8]. Jiang et al. [9] did substantial work on modeling the behaviour of dynamical systems using DEs with respect to time series functions. Micro-level modeling of DEs using machine learning algorithms has shown considerable growth [10]. Deep learning solutions of stochastic DEs are frequently unstable and get stuck in local optima [11], particularly for higher-order DEs. Sirignano and Spiliopoulos [12] gave a meshfree deep neural network algorithm for approximating PDEs, which converges as the number of hidden units tends to infinity. Meade and Fernandez [13] introduced B1 splines as basis functions in a feed-forward neural network (FFNN). This algorithm was employed to solve linear and non-linear ODEs numerically. Mall and Chakraverty [14] developed a single-layer Chebyshev orthogonal polynomial functional link neural model with regression-based weights to handle different non-linear DEs. In the last few decades, many optimization algorithms have emerged in both research and application sectors. Verma and Kumar [15] designed a multilayer network for solving dynamic models of mathematical physics using the BFGS optimization algorithm. Rizk-Allah and Hassanien [16] developed a neural model using a hybrid Harris hawks-Nelder-Mead optimization (HHO-NM) algorithm for solving spring-mass system problems.

In a pioneering work, Mosta and Sibanda [17] proposed a linearization technique for solving the Van der Pol bimodal equation. An active control approach for the double-well Duffing–Van der Pol oscillator was developed by Njah and Vincent [18]. Ibsen et al. [19] employed the homotopy analysis approach to solve the double-well and double-hump Van der Pol–Duffing oscillator equations. Nourazar and Mirzabeigy [20] used modified transformation techniques to obtain an approximate solution of the Van der Pol equation. Stability analysis of the Van der Pol equation was explored by Hu and Chung [21]. An algebraic method was presented by Akbari et al. [22] to find the solution of the Duffing equation. Chaotic behaviours of the Van der Pol-Mathieu-Duffing oscillator (VdPM-DO) system under different forms of excitation were presented by Kimiaeifar [23]. A few other techniques have also been developed by the research community in recent years to solve these celebrated non-linear oscillator equations [24,25,26,27]. Whatever numerical technique is employed, it requires stringent step sizes and repeated iterations to attain numerical precision, which involves a significant amount of computing cost.

To overcome these limitations, ANN models have also been developed in recent years to approximate the solutions of dynamical problems. Different ODE and PDE solvers have been developed to predict neural solutions of boundary and initial value problems [28, 29]. Mall and Chakraverty [7] designed a single-layer Hermite functional link ANN model to handle the Van der Pol-Duffing oscillator (VdP-DO) equations. Raja et al. [30] studied the power of intelligent computing to compute the solution of Mathieu's systems. Wang et al. [31] used a long short-term memory (LSTM) neural network to detect weak signals in a strongly coupled Duffing-Van der Pol oscillator. Yin et al. [32] introduced a combination of fractional FLANN filters for solving the VdP-DO equation. In recent research, Bukhari et al. [33] introduced non-linear autoregressive radial basis function (NAR-RBFs) neural network filters for solving the Van der Pol-Mathieu-Duffing oscillator (VdPM-DO) equations.

Motivated by the above considerations, it is natural to propose new and efficient ANN algorithms to understand the dynamical behaviour of linear/non-linear oscillator systems. In this regard, an unsupervised multi-layer neural model, viz. the Symplectic Artificial Neural Network (SANN), has been developed for solving the VdPM-DO equations for different excitation functions. SANN is a novel model which guarantees that the network conserves energy while solving the DE [34]. It is a more numerically precise and robust method for solving dynamical equations than standard NNs and traditional numerical methods. The main contribution of this paper is the Symplectic Artificial Neural Network for solving the Van der Pol-Mathieu-Duffing equations for different values of the excitation function.

The remainder of this paper is organized as follows: Sect. 2 is devoted to the formulation of the multi-layer neural network and also describes the dynamics of the VdPM-DO equations. The main result of this paper, the SANN algorithm, is provided in Sect. 3. Section 4 presents several examples to illustrate the performance of our model. Lastly, concluding remarks are provided in Sect. 5.

2 Preliminaries

In this section, some preliminaries related to the architecture of the multilayer feed-forward neural network, the governing equation, and the significance of the Van der Pol-Mathieu-Duffing oscillator equations are discussed.

2.1 Feed-forward neural networks: an overview

ANN is an exciting form of Artificial Intelligence (AI) that mimics the human brain's learning process to predict patterns from given historical data. Neural networks are processing devices built from mathematical algorithms that may be implemented in any programming language [35].

Various learning procedures and parameters are required for modeling a neural network [36,37,38]. A neural network is made up of layers, and layers are made up of several neurons/nodes. Neurons of the ith and (i + 1)th layers are interconnected by links carrying synaptic weights, which are numerical values allocated to each link. Signals received by the input layer are multiplied by the weights, summed, and sent to one or more hidden layers. The output layer receives links from the hidden layers. Each node receives its net input and processes it with an activation function. The input/output relation is given by

$$O_{q} = \nabla\left(\mathrm{net}(q)\right),$$
(2.1)
$$\mathrm{net}(q) = \sum\limits_{i = 1}^{n} \left(w_{qi}\, x_{i}\right) + \tilde{b}_{q}, \quad q = 1,2,3,\ldots,N,$$
(2.2)
$$\nabla\left( x \right) = \frac{x}{1 + \exp ( - ax)},$$
(2.3)

where \(\nabla (x)\) is the swish activation function, \(x_{i}\) denotes the given input, \(w_{qi}\) is the weight from input unit \(i\) to hidden unit \(q\), and \(\tilde{b}_{q}\) is the bias of hidden unit \(q\).

An ANN analyses data in much the same way as the human brain: it processes the training data and updates the weights to improve the accuracy of the neural forecasted values. An epoch refers to one complete cycle of passing the training data through the algorithm and updating the weights. The error is minimized over the required number of epochs, and the accuracy of the neural network is further improved through hyperparameter tuning.
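For concreteness, the forward pass of Eqs. (2.1)–(2.3) can be sketched in a few lines of Python; the function and variable names below are illustrative only and not part of the original formulation.

```python
import numpy as np

def swish(x, a=1.0):
    """Swish activation of Eq. (2.3): x / (1 + exp(-a*x))."""
    return x / (1.0 + np.exp(-a * x))

def layer_output(x, W, b, a=1.0):
    """Input/output relation of Eqs. (2.1)-(2.2).

    x : (n,)   input signals x_i
    W : (N, n) synaptic weights w_qi
    b : (N,)   biases b_q
    Returns O_q = swish(net(q)) for q = 1, ..., N.
    """
    net = W @ x + b       # Eq. (2.2): weighted sum plus bias
    return swish(net, a)  # Eq. (2.1): activation applied node-wise
```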

2.2 Van der Pol-Mathieu-Duffing oscillator equations

The governing equation of the Van der Pol-Mathieu-Duffing oscillator can be written as [23]

$$\left\{\begin{aligned} &\gamma^{\prime\prime}(t) + \varepsilon\left[\left(\mu - \alpha\gamma^{2}\right)\gamma^{\prime}(t) + \vartheta\gamma\cos\left(\Omega t\right)\right] - \varpi^{2}\gamma + \beta\gamma^{3} - \varepsilon\sigma\gamma\zeta(t) = 0, \quad \alpha > 0, \\ &\gamma(0) = \overline{a}, \quad \gamma^{\prime}(0) = \overline{b}. \end{aligned}\right.$$
(2.4)

It is noted that \(\mu, \alpha,\) and \(\sigma\) in the above equation are constants, while the other notations represent different physical properties of the titled equation. This non-linear equation has been extensively used in various scientific problems such as non-linear dusty plasma systems [33], plasma physics models [30], the inverted pendulum [39], the radio frequency quadrupole [40], early mechanical failure signals [41], diffraction [42], weak signal detection [43], etc. This system may also be used to study the dynamical behaviour of limit cycles [44].
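For later reference, the residual (left-hand side) of Eq. (2.4) translates directly into a small helper function; the name, signature and the use of `theta` for \(\vartheta\) are our own illustrative choices.

```python
import numpy as np

def vdpm_do_residual(t, g, dg, ddg, zeta, *, eps, mu, alpha, theta,
                     varpi, beta, sigma, Omega):
    """Left-hand side of Eq. (2.4); it vanishes when gamma(t) is a solution.

    g, dg, ddg : gamma(t), gamma'(t), gamma''(t) evaluated at time t
    zeta       : callable parametric excitation function zeta(t)
    """
    return (ddg
            + eps * ((mu - alpha * g**2) * dg + theta * g * np.cos(Omega * t))
            - varpi**2 * g
            + beta * g**3
            - eps * sigma * g * zeta(t))
```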

3 Neural formulation for differential equations

A general system of nth-order ODEs can be written as

$$\eta\left(\xi, \upsilon(\xi), \upsilon^{\prime}(\xi), \upsilon^{\prime\prime}(\xi), \ldots, \upsilon^{(n)}(\xi)\right) = 0, \quad \xi = \left(\xi_{1}, \xi_{2}, \ldots, \xi_{n}\right) \in \psi \subset R^{n},$$
(3.1)

where \(\upsilon \left( \xi \right)\) is the solution to be determined and \(\psi\) is the discretized domain.

Let \(\upsilon_{\iota } \left( {\xi ,\delta } \right)\) denote the ANN approximate solution with adjustable parameters \(\delta\) (weights and biases). Then Eq. (3.1) can be rewritten as

$$\eta\left(\xi_{i}, \upsilon_{\iota}(\xi_{i},\delta), \upsilon^{\prime}_{\iota}(\xi_{i},\delta), \upsilon^{\prime\prime}_{\iota}(\xi_{i},\delta), \ldots, \upsilon^{(n)}_{\iota}(\xi_{i},\delta)\right) = 0.$$
(3.2)

The corresponding cost function of ANN can be obtained by converting Eq. (3.2) to an unconstrained optimization problem. It can be written as

$$E(\delta) = \min\sum\limits_{\xi_{i} \in \psi} \left(\eta\left(\xi_{i}, \upsilon_{\iota}(\xi_{i},\delta), \upsilon^{\prime}_{\iota}(\xi_{i},\delta), \upsilon^{\prime\prime}_{\iota}(\xi_{i},\delta), \ldots, \upsilon^{(n)}_{\iota}(\xi_{i},\delta)\right)\right)^{2}.$$
(3.3)

The ANN approximate solution \(\upsilon_{\iota } \left( {\xi_{i} ,\delta } \right)\) can be written as the sum of two terms

$$\upsilon_{\iota } \left( {\xi_{i} ,\delta } \right)\, = \,\alpha + \,f\,\left( {\xi ,NN(\xi ,\delta )} \right),$$
(3.4)

where the first part of the approximate solution, \(\alpha\), is associated with the initial or boundary conditions and contains no adjustable parameters. The second part involves \(NN(\xi ,\delta )\), the output of an FFNN that consists of \(M\) hidden layers, with \(k\) neurons in each hidden layer and a linear output node, for a given input \(\xi \in R^{n}\). The weights and biases of \(NN(\xi ,\delta )\) are adjusted to minimize the cost function. The FFNN output \(NN(\xi ,\delta )\) can be written as:

$$NN(\xi ,\delta ) = \,\sum\limits_{j = 1}^{k} {v_{j} \,\frac{{g_{j} }}{{1 + \exp ( - ag_{j} )}}}$$
(3.5)

such that,

$$g_{j\,} \, = \sum\limits_{i = 1}^{n} {\ell_{ji} } \xi_{i} \, + \,\tilde{b}_{j} ,$$
(3.6)

where \(\ell_{ji}\) and \(v_{j}\) represent the synaptic weights from input unit \(i\) to hidden unit \(j\) and from hidden unit \(j\) to the output unit respectively, and \(k\) is the number of hidden neurons.
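A minimal sketch of Eqs. (3.5)–(3.6) for a single hidden layer is given below; the identifiers are ours and simply mirror the notation above.

```python
import numpy as np

def nn_output(xi, L, b_tilde, v, a=1.0):
    """FFNN output NN(xi, delta) of Eqs. (3.5)-(3.6) for one hidden layer.

    xi      : (n,)   input point
    L       : (k, n) input-to-hidden weights  l_ji
    b_tilde : (k,)   hidden-layer biases
    v       : (k,)   hidden-to-output weights v_j
    """
    g = L @ xi + b_tilde                     # Eq. (3.6)
    return v @ (g / (1.0 + np.exp(-a * g)))  # Eq. (3.5): swish-weighted sum
```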

3.1 Symplectic neural network algorithm

A symplectic neural network is a type of artificial neural network inspired by the mathematical structure of symplectic geometry used to describe the dynamics of physical problems. SANNs are more efficient than traditional neural networks because they use fewer parameters and require less training data, which makes them well-suited for problems in physics and other fields where the dynamics of systems are of interest. In contrast, standard neural networks [4] do not have this constraint and are applied to tasks such as prediction and classification. Practically, the SANN is constructed in such a way that it conserves energy [34].

Let us consider the first-order ODE as

$$\left\{\begin{aligned} &\gamma^{\prime} = f(\xi, \gamma), \quad \xi \in [a, b], \\ &\gamma(a) = \hat{\alpha}. \end{aligned}\right.$$
(3.7)

Consequently, the neural approximate solution can be written as

$$\upsilon_{\iota}(\xi_{i},\delta) = \hat{\alpha} + \left(1 - e^{-(\xi - a)}\right)NN(\xi,\delta),$$
(3.8)

where \(NN(\xi ,\delta )\) is the output of the \(M^{\text{th}}\) hidden layer for a single input \(\xi\) and parameters \(\delta\). By construction, the ANN approximate solution \(\upsilon_{\iota } \left( {\xi_{i} ,\delta } \right)\) satisfies the initial condition of the given DE. In order to form the loss function, we need to calculate the gradient of the network \(NN(\xi ,\delta )\), which can be computed as follows:

$$D^{m} \,NN(\xi ,\delta ) = \,\sum\limits_{j = 1}^{k} {v_{j} \,\ell_{ji}^{m} \,\nabla^{m} \left( {g_{j} } \right)}$$
(3.9)

Differentiating Eq. (3.8) we have

$$\begin{aligned} D^{m}\upsilon_{\iota}(\xi_{i},\delta) & = D^{m}\left[\left(1 - e^{-(\xi - a)}\right) NN(\xi,\delta)\right] \\ & = D^{m}\left[\left(1 - e^{-(\xi - a)}\right)\sum\limits_{j = 1}^{k} v_{j}\,\frac{g_{j}}{1 + \exp(-a\,g_{j})}\right], \qquad g_{j} = \sum\limits_{i = 1}^{n}\ell_{ji}\xi_{i} + \tilde{b}_{j}. \end{aligned}$$
(3.10)

Thus the loss function for Eq. (3.7) is calculated as

$$E(\delta ) = \sum\limits_{i = 1}^{n} {\left( {D\upsilon_{\iota } \left( {\xi_{i} ,\delta } \right) - f\left( {\xi_{i} ,\upsilon_{\iota } \left( {\xi_{i} ,\delta } \right)} \right)} \right)^{2} } .$$
(3.11)
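As a concrete illustration, the trial solution of Eq. (3.8) and the loss of Eq. (3.11) can be assembled with automatic differentiation. The sketch below is ours: PyTorch is an assumption (the paper does not name a framework), and so is the toy problem \(\gamma' = -\gamma\), \(\gamma(0) = 1\) on \([0, 1]\).

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 50), torch.nn.SiLU(),
                          torch.nn.Linear(50, 1))  # SiLU == swish with a = 1
a, alpha_hat = 0.0, 1.0                            # interval start and gamma(a)

def trial(xi):
    """Trial solution of Eq. (3.8); satisfies gamma(a) = alpha_hat exactly."""
    return alpha_hat + (1.0 - torch.exp(-(xi - a))) * net(xi)

def loss(xi):
    """Unsupervised loss of Eq. (3.11) for the toy problem gamma' = -gamma."""
    xi = xi.requires_grad_(True)
    u = trial(xi)
    du = torch.autograd.grad(u, xi, torch.ones_like(u), create_graph=True)[0]
    return ((du - (-u)) ** 2).sum()

xi = torch.linspace(0.0, 1.0, 50).reshape(-1, 1)
print(loss(xi).item())   # residual error of the untrained network
```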

Now, consider the second-order ODE as

$$\left\{\begin{aligned} &\gamma^{\prime\prime} = f(\xi, \gamma, \gamma^{\prime}), \quad \xi \in [a, b], \\ &\gamma(a) = \hat{\alpha}, \quad \gamma^{\prime}(a) = \hat{\beta}. \end{aligned}\right.$$
(3.12)

Here, the neural approximate solution is written as

$$\upsilon_{\iota}(\xi_{i},\delta) = \hat{\alpha} + \hat{\beta}(\xi - a) + \left(1 - e^{-(\xi - a)}\right)^{2} NN(\xi,\delta).$$
(3.13)

Weights associated with this multi-layer perceptron are updated by the back-propagation learning algorithm through optimizing the following loss function

$$E(\delta ) = \sum\limits_{i = 1}^{n} {\left( {D^{2} \,\upsilon_{\iota } (\xi_{i} ,\delta ) - f\left( {\xi_{i} ,\upsilon_{\iota } (\xi_{i} ,\delta ),\,D\,\upsilon_{\iota } } \right)} \right)^{2} } .$$
(3.14)

During the training of the neural network, the epochs continue until the error is minimized. In each epoch, the Adam optimizer updates the parameters of the network using per-parameter adaptive learning rates in order to find the optimum weights. A graphical abstract of the SANN model for solving the VdPM-DO equation is delineated in Fig. 1.
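Putting Eqs. (3.13) and (3.14) together with the Adam update, one obtains the unsupervised training loop sketched below. This is a minimal PyTorch sketch under our own naming; the right-hand side used here is the hypothetical linear oscillator \(\gamma'' = -\gamma\), not the VdPM-DO system treated in Sect. 4, while the three-hidden-layer, 50-node architecture, the learning rate of 0.001 and the 10,000 epochs mirror the settings reported there.

```python
import torch

a, alpha_hat, beta_hat = 0.0, 1.0, 0.0         # gamma(a) and gamma'(a)

def f(xi, g, dg):
    """Illustrative right-hand side gamma'' = f(xi, gamma, gamma')."""
    return -g

net = torch.nn.Sequential(                      # three hidden layers of 50 nodes
    torch.nn.Linear(1, 50), torch.nn.SiLU(),
    torch.nn.Linear(50, 50), torch.nn.SiLU(),
    torch.nn.Linear(50, 50), torch.nn.SiLU(),
    torch.nn.Linear(50, 1),
)

def trial(xi):
    """Trial solution of Eq. (3.13); satisfies both initial conditions."""
    return (alpha_hat + beta_hat * (xi - a)
            + (1.0 - torch.exp(-(xi - a))) ** 2 * net(xi))

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(10_000):
    xi = torch.linspace(0.0, 3.0, 150).reshape(-1, 1).requires_grad_(True)
    g = trial(xi)
    dg = torch.autograd.grad(g, xi, torch.ones_like(g), create_graph=True)[0]
    ddg = torch.autograd.grad(dg, xi, torch.ones_like(dg), create_graph=True)[0]
    loss = ((ddg - f(xi, g, dg)) ** 2).sum()    # loss function of Eq. (3.14)
    opt.zero_grad()
    loss.backward()
    opt.step()
```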

Fig. 1 Framework of SANN model for solving VdPM-DOE

4 Robustness analysis in diverse scenarios and discussions

In this section, the titled problem is addressed with different excitation functions for three cases, and the robustness of the model is discussed in detail.

4.1 Experimental settings

To show the efficacy of the SANN model, the VdPM-DO equations are solved for two separate excitation functions under varied initial conditions. The accuracy of the suggested algorithm is demonstrated in the tables and graphs. Neural network training techniques are typically iterative; therefore, the user must designate a starting point for the iterations. Furthermore, training neural models is a challenging process, and the initial guess has a significant impact on most approaches. The initial guess can decide whether or not the method converges at all, with certain initial points being so unstable that the algorithm encounters numerical difficulties during training and fails altogether. One of the most prominent strategies for parameter initialization is known as "break the symmetry" [45, 46], whereby the user starts with arbitrary, small and distinct real numbers as the initial weights of the network. As such, arbitrary real numbers in \(\left[ { - 1,1} \right]\backslash \left\{ 0 \right\}\) are taken as the initial weights in this experiment.
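A hedged sketch of such an initialization is given below; the helper name and the exact resampling rule are our own illustration, not a prescription from [45, 46].

```python
import torch

def break_symmetry_init(module):
    """Initialise Linear layers with small, distinct, non-zero values in [-1, 1]."""
    if isinstance(module, torch.nn.Linear):
        with torch.no_grad():
            module.weight.uniform_(-1.0, 1.0)
            module.weight[module.weight == 0.0] = 1e-3   # enforce [-1, 1] \ {0}
            module.bias.uniform_(-1.0, 1.0)

net = torch.nn.Sequential(torch.nn.Linear(1, 50), torch.nn.SiLU(),
                          torch.nn.Linear(50, 1))
net.apply(break_symmetry_init)
```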

In the next step, a training method is employed to tune the adjustable parameters of the SANN, which are embedded in the approximate solution. The SANN is trained to predict the solutions of the VdPM-DO equations at any point inside the given domain by unsupervised training, where the adjustable parameters are updated to minimize the cost function. We have used Adam optimization with a learning rate of 0.001, as it converges quickly and is relatively stable. We have trained the network for 10,000 epochs for each problem solved here.

On the other hand, we observed that when the model converges, the weights also converge to a minimum in the vicinity of the initial set of weights, whereas when the network diverges, the updated weights move away from the initial weight configuration. Once this degree of divergence is reached, further training cannot reduce the loss. In addition, Eq. (2.3) is taken as the activation function [47] during the training of the model. After selecting the basic framework [29], the optimal number of hidden layers and the number of nodes in each hidden layer are chosen after several experiments with different numbers of hidden layers and nodes.

4.2 Simulation results

This section presents solutions of the VdPM-DO equation with different excitation functions obtained through the SANN model. The following experiments are conducted in Python 3 under the Jupyter notebook environment. Moreover, to evaluate the performance of the model, different statistical measures such as MAE, MSE, RMSE and TIC have been calculated with respect to the existing HAM [23] and RK4 [23] solutions, as follows:

$${\text{MAE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} \left| X(t_{i}) - \hat{X}(t_{i}) \right|$$
$${\text{MSE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} \left( X(t_{i}) - \hat{X}(t_{i}) \right)^{2}$$
$${\text{RMSE}} = \sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} \left( X(t_{i}) - \hat{X}(t_{i}) \right)^{2}}$$
$${\text{TIC}} = \frac{\sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} \left( X(t_{i}) - \hat{X}(t_{i}) \right)^{2}}}{\sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} X(t_{i})^{2}} + \sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} \hat{X}(t_{i})^{2}}}$$

Here \(X(t)\) and \(\hat{X}(t)\) are the numerical (HAM/RK4) solutions and the obtained SANN solutions respectively, evaluated at the \(N\) testing points \(t_{i}\).
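For reference, the four measures above can be computed with a short helper function (an illustrative sketch; `X` holds the HAM/RK4 reference values and `X_hat` the SANN predictions at the same testing points).

```python
import numpy as np

def error_metrics(X, X_hat):
    """MAE, MSE, RMSE and TIC between a reference solution and the SANN output."""
    X, X_hat = np.asarray(X, dtype=float), np.asarray(X_hat, dtype=float)
    err = X - X_hat
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    tic = rmse / (np.sqrt(np.mean(X ** 2)) + np.sqrt(np.mean(X_hat ** 2)))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "TIC": tic}
```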

Case-1 (For excitation function \(\zeta (t)\, = 1\)).

In Eq. (2.4), by setting the parametric excitation function \(\zeta (t) = 1\) and the initial conditions \(\overline{a} = 1, \overline{b} = 0\), and rewriting the equation, we have [23]

$$\left\{\begin{aligned} &\gamma^{\prime\prime}(t) + \varepsilon\left[\left(\mu - \alpha\gamma^{2}\right)\gamma^{\prime}(t) + \vartheta\gamma\cos\left(\Omega t\right)\right] - \varpi^{2}\gamma + \beta\gamma^{3} - \varepsilon\sigma\gamma = 0, \quad \alpha > 0, \\ &\gamma(0) = 1, \quad \gamma^{\prime}(0) = 0. \end{aligned}\right.$$

As discussed above, the neural approximate solution can be computed as follows

$$\varphi_{\tau } \left( {t,p} \right) = 1 + (1 - e^{ - t} )^{2} NN(t,p)|_{{\gamma \to \hat{\alpha } = 1,\,\gamma^{\prime} \to \hat{\beta } = 0}} .$$

To solve this non-linear system, we construct a multi-layer network with a single input, a single output, and three hidden layers, with \(k = 50\) nodes in each hidden layer. The network has been trained on 150 equidistant points from 0 to 3 s, which are randomly selected at the beginning of each epoch [44].

We have taken the values of the parameters from the literature [23], as shown in the nomenclature: \(\alpha = 0.1,\beta = 0.1,\varepsilon = 0.1,\mu = 0.5,\vartheta = 0.1,\varpi = 1,\sigma = 1,\Omega = 1\). Table 1 shows the comparison among the existing RK4 [23], HAM [23] and ANN solutions and the obtained SANN solutions. From the tabulated values, the MSE of SANN is found to be 0.0003393.

Table 1 Comparison among RK4 [23], HAM [23], ANN and SANN solutions (Case-1)

The time series plot of the SANN solutions is delineated in Fig. 2. Figure 3 demonstrates the curves of training and validation loss over 10,000 epochs for the given model. The accuracy of SANN can be visualized from the graph of training loss and validation loss against epochs; the loss function is minimized as training progresses. The absolute errors between the SANN and HAM results, as well as between the SANN and RK4 results, are compared graphically in Fig. 4. It is worth mentioning that the CPU time for training and computation is 521.79 s.

Fig. 2 Time series plot at testing points by SANN model (Case-1)

Fig. 3 Training and validation loss during learning process of SANN (Case-1)

Fig. 4 Plot of absolute error between SANN and HAM, SANN and RK4 solutions (Case-1)

Case-2 (For excitation function \(\zeta (t)\, = 1\)).

Here we take the same equation as in Case-1 with different initial conditions, \(\overline{a} = 0, \overline{b} = 1\):

$$\left\{\begin{aligned} &\gamma^{\prime\prime}(t) + \varepsilon\left[\left(\mu - \alpha\gamma^{2}\right)\gamma^{\prime}(t) + \vartheta\gamma\cos\left(\Omega t\right)\right] - \varpi^{2}\gamma + \beta\gamma^{3} - \varepsilon\sigma\gamma = 0, \quad \alpha > 0, \\ &\gamma(0) = 0, \quad \gamma^{\prime}(0) = 1. \end{aligned}\right.$$

In this case, the values of the parameters \(\alpha = 0.1,\,\beta = 0.1,\,\varepsilon = 0.1,\mu = 0.5,\vartheta = 0.1,\,\varpi = 1,\sigma = 1,\,\Omega = 1\) are kept fixed, and accordingly, we get the neural approximate solution as follows:

$$\varphi_{\tau } \left( {t,p} \right) = t + (1 - e^{ - t} )^{2} NN(t,p)|_{{\gamma \to \hat{\alpha } = 0,\,\,\gamma^{\prime} \to \hat{\beta } = 1}} .$$

Here we have designed a multi-layer network having a single input node and a single output node, with three hidden layers of \(k = 50\) neurons per layer. The network has then been trained on 150 equispaced points for \(t\, \in \left[ {0,3} \right]\).

The training and validation loss against epochs for the neural network is delineated in Fig. 5, and the time series of the neural solutions is plotted in Fig. 6. The robustness of the model increases as the value of the loss function decreases. Tabular results for each of the above cases at different testing points are given in Table 2. Furthermore, the error made using the current approach is calculated between the obtained SANN results and the RK4 results. It may be observed from the box plot of absolute errors (Fig. 7) that the AE ranges from 1E − 05 to 5E − 02. The CPU time for training and computation is 524.65 s.

Fig. 5 Training and validation loss during learning process of SANN (Case-2)

Fig. 6 Time series plot at testing points by SANN model (Case-2)

Table 2 SANN solutions at different testing points \(t \in [0,3]\), \(\Delta t = 0.1\) (Case-1 and Case-2)
Fig. 7 Box plot of absolute error between SANN and RK4 solutions (Case-2)

Case-3 (For excitation function \(\zeta (t)\, = t\)).

We conclude this simulation section by considering the following non-linear VdPM-DO equation with a different parametric excitation function, and the equation is of the form [23]:

$$\left\{\begin{aligned} &\gamma^{\prime\prime}(t) + \varepsilon\left[\left(\mu - \alpha\gamma^{2}\right)\gamma^{\prime}(t) + \vartheta\gamma\cos\left(\Omega t\right)\right] - \varpi^{2}\gamma + \beta\gamma^{3} - \varepsilon\sigma\gamma\zeta(t) = 0, \quad \alpha > 0, \\ &\gamma(0) = 1, \quad \gamma^{\prime}(0) = 0. \end{aligned}\right.$$

We have fixed the values of the parameters as in the literature [23]: \(\alpha = 0.2,\beta = 0.5,\varepsilon = 0.1,\mu = 0.5,\) \(\vartheta = 0.1,\,\varpi = 0.1,\sigma = 0.1,\Omega = 0.5\). Then the neural approximate solution of Eq. (3.13) reduces to the following

$$\varphi_{\tau } \left( {t,p} \right) = 1 + (1 - e^{ - t} )^{2} NN(t,p)|_{{\gamma \to \hat{\alpha } = 1,\,\,\gamma^{\prime} \to \hat{\beta } = 0}} .$$

A multi-layer network with a single input, a single output, and three hidden layers, with \(k = 50\) nodes in each hidden layer, has been designed to illustrate the effectiveness of the proposed method. We have trained the network on 150 equidistant points from 0 to 5 s.

A comparison among the results obtained by the RK4 [23], HAM [23], ANN and SANN algorithms is given in Table 3. The training and validation loss is contemplated in Fig. 8. The time series of the SANN results is portrayed in Fig. 9, and the absolute errors between SANN and HAM as well as between SANN and RK4 are compared graphically in Fig. 10. It may be noted that the MSE of SANN and of the traditional ANN are found to be 0.0072385 and 0.0081525 respectively. The CPU time for training and computation is 557.06 s.

Table 3 Comparison among RK4 [23], HAM [23], ANN and SANN solutions (Case-3)
Fig. 8 Training and validation loss during learning process of SANN (Case-3)

Fig. 9 Time series plot at testing points by SANN model (Case-3)

Fig. 10 Plot of absolute error between SANN and HAM, SANN and RK4 solutions (Case-3)

4.3 Discussion and analysis

In the proposed methodology, the independent variables of the DE are used as the NN input. A feed-forward pass of the network gives the value of the dependent variable evaluated at that particular point. Since the NN is differentiable, we can compute the derivatives of the dependent variable (output) with respect to the independent variables (inputs) in order to obtain the various derivatives that appear in the original DE. An unsupervised training method has been adopted for solving the DE without knowing any solution in advance. In this regard, an error function has been derived from the given DE, the approximate function and their derivatives. It may be noted that the NN output is indeed a solution of the DE when the loss function tends to zero.

Different statistical measures such as MAE, TIC and RMSE are computed for the titled problem, and the outcomes are graphically portrayed in Fig. 11. The accuracy of SANN is witnessed through the comparative assessment of the SANN and existing numerical solutions. One may observe that the mode of the AE values lies around 1E − 04, 1E − 02 and 1E − 03 for Case-1, Case-2 and Case-3 respectively. Moreover, the MSE values of Case-1, Case-2 and Case-3 lie in the close vicinity of 3E − 04, 7E − 04 and 7E − 03, which indicates the correctness, precision and efficacy of the SANN model.

Fig. 11 Performance indices based on statistical measures MAE, RMSE and TIC for a Case-1, b Case-3

It is well known that after training, a neural model may be utilized as a black box to obtain numerical results at any arbitrary point in the given domain. In this experiment, we have considered three hidden layers with 50 neurons in each hidden layer for modeling the network. One may consider a larger number of hidden layers to construct a network, but it has been observed that by increasing the number of hidden layers and training a network for a long time, it loses its capacity to generalize. The error calculations, the displayed graphs, and the CPU processing times together provide a complete robustness study of the SANN model. Based on these results, it is clear that the present unsupervised model is well validated.

5 Conclusion

In recent decades, the VdPM-DO equation has garnered a lot of attention from academics and scientists because of its various applications in mathematical physics and engineering. This investigation shows that the presented SANN algorithm for solving the titled DE is promising. The accuracy of our model is demonstrated by the remarkable agreement between the SANN findings and the existing numerical techniques HAM and RK4. As the graphs of training loss and validation loss converge, it can be concluded that the obtained model is robust. Finally, it is worth mentioning that SANN is a reliable, computationally efficient and generic model, and it can be considered a powerful tool for the computation of non-linear oscillator problems.

The algorithm addressed in this article can be useful for solving different real-life problems and engineering applications, such as modeling the boundary of the corneal model for eye surgery [48], astrophysical events [49], Stuxnet virus propagation [50], the nervous stomach TFM model [51] and Bagley–Torvik fractional-order systems [52]. It can also be extended to a functional link SANN as an alternative computing paradigm for solving dynamic models.