1 Introduction

Intelligent control strategies admit a very flexible description from an automation point of view. The concept is implemented dynamically, in real time, while the system performs critical functions, which distinguishes it from traditional feedback. Thus, if the control law is continuously updated, classical adaptive control can be regarded as intelligent; from a classification perspective, such systems are borderline cases [1]. Tracing intelligent control approaches back to their sources for mechatronics leads to the analysis and processing of big data, the evolution of mathematically grounded methods, and programming. The exponential growth of this research area had begun by the early 2010s and has shown no sign of decline since [2].

Various artificial intelligence methods and areas are utilized in mechatronics and robotics, including artificial neural networks (ANNs), machine learning, evolutionary computing algorithms, and fuzzy logic. Machine learning comprises deep learning, reinforcement learning, classical learning (unsupervised and supervised), neural networks (NNs), and ensemble methods. Intelligent control algorithms analyze large data sets and exploit beneficial patterns in the actions taken, using a variety of probabilistic, statistical, and optimization methods [2]. Reinforcement learning is particularly practical in the automatic control field. Ensemble strategies and classical learning are also used for classifying and processing data sets, in contrast to neural networks [3,4,5]. Neural networks, in turn, are practical for dealing with unlabeled features and complex data.

A critical challenge for autonomous systems that interact with the real world is the safety and reliability of intelligent control approaches. This problem is the subject of a review article [6], in which an asymptotic analysis of the convergence of intelligent control approaches is conducted.

This chapter discusses modern intelligent control approaches in mechatronics to recognize open issues and trends in intelligent control.

The chapter is constructed as follows: Sect. 2 recalls multiple intelligent control approaches. The applications of intelligent approaches in engineering control problems are presented in Sect. 3. The chapter is concluded in Sect. 4.

2 Intelligent Control Methods

Intelligent control is a discipline in its own right, but it requires applying new concepts, such as neural networks, to control loops while drawing on scientific approaches built on automatic control theory. As such a combined approach, intelligent control can continually improve its performance by learning from previous experience.

Within the framework of intelligent approaches, we consider the most general methods of modern control theory. This chapter pays the most attention to machine learning; methods that are already well known are not explained further.

2.1 Adaptive Control Methods

Like optimal control, adaptive control rests on a well-developed mathematical and theoretical foundation [2]. It became the initial step toward intelligent control, since integrating adaptive controllers within the framework of classical automatic control theory maintains the quality of system operation when the parameters of the object and the properties of the external environment are unknown or change unpredictably. The adaptation principle can be considered the heart of intelligent control, evolving from self-optimizing controllers to adaptive learning systems [7].

Adaptive control algorithms for discrete-time systems were applied to artificial intelligence by Yakubovich, who derived several algorithms for training linear classification models [2, 8,9,10]. The Stripe Algorithm (SA) proposed by Yakubovich has shown acceptable performance in machine learning in a recently published study [11]. Numerical analysis shows that SA outperforms traditional linear learning methods in online machine learning, making it suitable for this field. Lipkovich [12] provides various strategies for reducing loss-optimization problems to systems of inequalities, covering both regression and classification problems. In reference [11], a comparative analysis is conducted between SA and modern linear analogs, including logistic and linear regression. Complex non-linear models have the potential to outperform SA; however, SA retains the usual advantages of linear models, such as explainability and ease of development and implementation [12]. It should be noted that the existence of a hypothesis does not guarantee practical applicability, especially under the experimental conditions of control systems [12].

2.2 Optimization Techniques

Historically, optimization techniques emerged before machine learning and were used to find the extrema of a function [13]. Most machine learning problems rest on optimality theory and can generally be formulated as the minimization of a functional \(J\) with respect to a set of parameters \(\theta\): \(J(\theta )\to \underset{\theta \in X}{\mathrm{min}}\). The quantity being minimized is determined by the machine learning approach: for example, the prediction error on the available sample is minimized in a regression or classification problem, whereas in reinforcement learning the reward obtained from the plant's actions is maximized. The minimization can be accomplished by any search algorithm; many types, methods, and applications of mathematical optimization exist.
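As an illustration, the following sketch minimizes an assumed quadratic cost \(J(\theta)\) by plain gradient descent; the cost function, step size, and iteration budget are placeholders rather than a prescription of any specific method from [13].

```python
import numpy as np

TARGET = np.array([1.0, -2.0])   # assumed minimizer of the illustrative cost

def J(theta):
    # Illustrative quadratic cost; in practice J is defined by the learning task
    # (prediction error for regression/classification, negative reward for RL).
    return np.sum((theta - TARGET) ** 2)

def grad_J(theta):
    # Analytic gradient of the quadratic cost above.
    return 2.0 * (theta - TARGET)

theta = np.zeros(2)          # initial parameter guess
alpha = 0.1                  # step size (learning rate)
for _ in range(200):         # fixed iteration budget
    theta -= alpha * grad_J(theta)

print(theta, J(theta))       # theta converges toward TARGET, J toward its minimum
```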

2.3 ANNs

ANNs are powerful artificial intelligence tools inspired by biological neural networks. An ANN imitates the neural network constituting the human brain so that a computer can learn and make decisions in a human-like way. The structure of an ANN consists of an input layer, one or more hidden layers, and an output layer, each built from numerous simple computational components called nodes or neurons. A simple ANN with two inputs and two hidden layers is presented in Fig. 1. The structure additionally includes connections between neurons in consecutive layers through weights. These weights scale the signal sent from one node to another, increasing or decreasing the influence of a particular connection. Each hidden-layer neuron receives the weighted inputs from the neurons of the previous layer plus a bias. The output of a neuron is determined by its activation function, as shown below.

Fig. 1 A simple ANN structure

$$Y=f\left(\sum_{i=1}^{n}{w}_{i}{v}_{i}-b\right)$$
(1)

In Eq. 1, \(f\) denotes the activation function, \({v}_{i}\) and \({w}_{i}\) denote the input values and the neuron weights, respectively, \(Y\) is the network's output, \(b\) is the bias, and \(n\) is the number of neurons in the hidden layers. The model's performance is enhanced by updating the network weights during the training phase. The neuron weights of an ANN determine how the input data affect the output data; the initial weights are selected randomly [14].
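A minimal sketch of the neuron response of Eq. 1, assuming a tanh activation and arbitrary example values for the inputs, weights, and bias:

```python
import numpy as np

def neuron_output(v, w, b, f=np.tanh):
    """Single-neuron response Y = f(sum_i w_i * v_i - b), as in Eq. 1."""
    return f(np.dot(w, v) - b)

v = np.array([0.5, -1.2])    # input values v_i
w = np.array([0.8, 0.3])     # connection weights w_i (chosen randomly in practice)
b = 0.1                      # bias
print(neuron_output(v, w, b))
```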

The network's internal weights are tuned using a learning algorithm. Backpropagation (BP) is today's most common form of ANN training. Optimization methods such as the genetic algorithm and particle swarm optimization are also practical for optimizing an ANN [14].

Using NNs is effective for controlling noisy and non-linear systems and provides adaptability for the system. After training, an NN can work in real time. The constant advancement of NN properties and structures aims to overcome the existing shortcomings. Heuristic learning methods for NNs can lead to deadlocks and ambiguous solutions and require a training sample to be prepared. Long training times are the principal disadvantage of NNs in robotics, increasing the risk of inappropriate control of expensive equipment; further drawbacks are the lack of uniformity of training results for prediction and the fact that current NN implementations can only be realized in very-large-scale integration circuit form.

2.4 Fuzzy Logic Method

Zadeh proposed fuzzy logic based on the fuzzy set concept, in which an element's membership in a set is described by a membership function taking values in the range [0, 1] [15, 16]. It turns out that fuzzy inference can be expressed in NN form, with the membership functions playing the role of neuron activation functions and the connections between neurons treated as signal transmission. Many fuzzy neural networks, broadly described as a general class of approximators, have been developed [17].
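The sketch below illustrates the idea with an assumed triangular membership function and a zero-order Sugeno-style rule aggregation; the fuzzy sets, rule consequents, and input value are illustrative only.

```python
import numpy as np

def tri_mf(x, a, b, c):
    """Triangular membership function on [a, c] with peak at b; values lie in [0, 1]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Two fuzzy sets over an error signal, e.g. "negative" and "positive" (assumed ranges).
e = 0.3
mu_neg = tri_mf(e, -1.0, -0.5, 0.0)
mu_pos = tri_mf(e, 0.0, 0.5, 1.0)

# Zero-order Sugeno-style inference: weighted average of assumed rule consequents.
u_neg, u_pos = -1.0, 1.0
u = (mu_neg * u_neg + mu_pos * u_pos) / (mu_neg + mu_pos + 1e-12)
print(u)   # defuzzified control action
```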

2.5 Reinforcement Learning

Reinforcement learning is a machine learning approach for handling hybrid optimization problems: a framework in which the agent learns online how to perform sequential decision-making tasks through interaction with the environment. In reinforcement learning, planning is driven by feedback on the outcome of the choices made, given as a reward or punishment, without specifying how the outcome should be achieved. The procedure is as follows: the agent first observes the situation in the environment, chooses an action from a limited set of possible actions, and performs it. The agent then receives a predefined signal from the environment, indicating the quality of its action as a reward or punishment. Next, the agent moves to a new environmental state based on the current state [18,19,20]. In this approach, the agent interacts with the environment by performing a series of actions to find solutions [21, 22]. A Markov decision process (MDP) provides a widely used mathematical framework for modeling such problems and consists of four components [21,22,23]; a minimal sketch in code follows the list:

  1. A set of states, \(S = \left\{{s}_{1}, {s}_{2}, \ldots, {s}_{d}\right\}\)

  2. A set of actions, \(A = \left\{{a}_{1}, {a}_{2}, \ldots, {a}_{m}\right\}\)

  3. A state transition function \(T(s^{\prime} \mid s, a)\), the probability distribution of reaching state \(s^{\prime}\) from a given state \(s\) under action \(a\)

  4. A reward function \(R: S \times A \times S \to {\mathbb{R}}\) that gives an immediate reward when the agent performs an action and moves from state \(s\) to state \(s^{\prime}\)
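A minimal sketch of such an MDP with assumed transition probabilities and rewards, evaluated by repeated Bellman optimality backups (value iteration); all numbers are illustrative.

```python
import numpy as np

# A toy MDP with the four components listed above (all values are assumptions).
S = [0, 1]                                   # states s_1, s_2
A = [0, 1]                                   # actions a_1, a_2
# T[s, a, s'] = probability of moving to s' from s under action a.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
# R[s, a, s'] = immediate reward for that transition.
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [0.0, 1.0]]])
gamma = 0.9                                  # discount factor

# Value iteration: Q[s, a] = sum_{s'} T[s, a, s'] * (R[s, a, s'] + gamma * V[s']).
V = np.zeros(len(S))
for _ in range(100):
    Q = np.einsum('ijk,ijk->ij', T, R + gamma * V[None, None, :])
    V = Q.max(axis=1)
print(V)   # converged state values of the toy MDP
```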

When a Markov chain is used in reinforcement learning, the agent's choice of action is governed by a policy that determines the probability of choosing each action in a given state. In other words, the policy determines the effect of the action in each state in such a way that the reinforcement learning agent learns to maximize the sum of all future rewards [24, 25].

$$R_t = r_{t + 1} + \gamma r_{t + 2} + \gamma^2 r_{t + 3} + \cdots = \sum_{k = 0}^{\infty } \gamma^k r_{t + k + 1} ,$$
(2)

where \(t\) is the time step, \({r}_{t+1}\), \({r}_{t+2}\), \({r}_{t+3},\) … is the sequence of rewards after time step \(t\), and \(\gamma \in \left[ {0, \ 1} \right]\) is a discount factor that weighs the significance of immediate rewards against future rewards and prevents the return from growing to infinity.

One of the best techniques for solving the Markov decision problem is the temporal difference (TD) technique, which is notable for its good performance, low computational cost, and plain interpretation. The value of a state or action is estimated using the values of other states or actions [26, 27]. Since the technique discussed here is based on temporal differences, the TD update is expressed as follows:

$$V\left( {S_t } \right) \leftarrow V\left( {S_t } \right) + \alpha \left[ {r_{t + 1} + \gamma V\left( {S_{t + 1} } \right) - V\left( {S_t } \right)} \right]$$
(3)

where the parameter \(\alpha\) is the learning rate, which determines how much of the error is accepted at each step, and \(\gamma\) is the discount rate, which characterizes the influence of the subsequent state. The value inside the brackets is the TD error: the difference between the current value estimate \(V({S}_{t})\) and the bootstrapped target formed by the next reward and the estimate of the next state, \({r}_{t+1} + \gamma V({S}_{t+1})\). The agent tries to minimize this error at every step.
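A minimal sketch of the TD(0) update of Eq. 3, with an assumed three-state value table, learning rate, and discount rate:

```python
import numpy as np

def td0_update(V, s, r_next, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step of Eq. 3: V(S_t) <- V(S_t) + alpha*[r_{t+1} + gamma*V(S_{t+1}) - V(S_t)]."""
    td_error = r_next + gamma * V[s_next] - V[s]   # the bracketed TD error
    V[s] += alpha * td_error
    return V

V = np.zeros(3)                                    # value table for a 3-state example
V = td0_update(V, s=0, r_next=1.0, s_next=1)       # one observed transition (assumed data)
print(V)
```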

3 Application of Intelligent Approaches in Engineering Control Problems

In this section, we will discuss the applications of intelligent approaches in engineering control problems by reviewing a few works in the literature.

3.1 Stabilization and Program Control Problems

Program control and stabilization problems require feedback in the loop. In general, the full system state vector is not available for measurement, so the control strategy must be defined from the available measured outputs. Standard tasks in robotics and mechatronics, such as speed or trajectory tracking and stabilization, involve variables that can easily be measured at the system output. Reference [28] presents a machine learning method for quadcopters. The article considers a policy \({\pi }_{\theta }\) with parameter vector \(\theta\) that is differentiable with respect to its parameters. The objective function \(J({\pi }_{\theta })\) is differentiable with respect to \(\theta\), so the optimization can be carried out, for example, by a gradient method. For this purpose, a probabilistic estimate of the policy parameters and the formula for the gradient of the mean reward are used. The most common form of the gradient estimate is the following:

$$\hat{g}=\widehat{{\mathbb{E}}_{t}}\left[{\nabla }_{\theta }\,\mathrm{log}\,{\pi }_{\theta }\left({a}_{t}|{s}_{t}\right){\hat{A}}_{t}\right]$$
(4)

where \(\widehat{{\mathbb{E}}_{t}}\) is the empirical mean over a finite set of samples, \(\hat{A}\left({s}_{t},{a}_{t}\right)=Q\left({s}_{t},{a}_{t}\right)-\hat{V}\left({s}_{t}\right)\) is the value of the advantage function at time t obtained by changing the sample generation process, and \({\pi }_{\theta }\) is the policy being improved. In this reinforcement learning problem, the dynamic model may be non-differentiable or unknown; the model must therefore be learned, which increases the variance of the gradient estimates. For policy optimization, the cited article proposes a robust approach that incrementally improves agent performance. Instead of directly optimizing the objective whose gradient is given by Eq. 4, the following clipped objective function is used:

$$J\left( \theta \right) = \widehat{{\rm{\mathbb{E}}_t }}\left[ {\min \left( {r\left( \theta \right)\hat{A}_t ,\;{\text{clip}}\left( {r\left( \theta \right),1 - \varepsilon ,1 + \varepsilon } \right)\hat{A}_t } \right)} \right]$$
(5)

where \(r\left(\theta \right)={\pi }_{\theta }/{\pi }_{{\theta }_{old}}\) and \(\varepsilon\) is a hyperparameter.
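A minimal numpy sketch of the clipped objective in Eq. 5; the probability ratios and advantage estimates are illustrative placeholders, and in practice the gradient of this expression is obtained with an automatic-differentiation framework rather than by hand.

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of Eq. 5 evaluated on a batch of samples.

    ratio     : r(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t), per sample
    advantage : advantage estimate A_hat_t, per sample
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))   # empirical expectation E_t[...]

ratio = np.array([0.9, 1.4, 1.05])       # assumed batch of probability ratios
advantage = np.array([0.5, -0.2, 1.0])   # assumed advantage estimates
print(clipped_surrogate(ratio, advantage))
```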

By differentiation of Eq. 5, gradient \(\hat{g}\) is obtained. The reward function is formulated as follows:

$${r}_{t}\left({e}_{xt},{e}_{yt},{e}_{zt}\right)=\alpha -\sqrt{{e}_{xt}^{2}+{e}_{yt}^{2}+{e}_{zt}^{2}}$$
(6)

where \(\alpha\) is a constant to make sure that each quadcopter is assigned a reward, and \({e}_{xt},{e}_{yt},{e}_{zt}\) are the coordinates of the quadcopter.

In reference [29], entropy-based reinforcement learning with a soft vibrating-membrane drive is considered for an ultra-fast tripod robot gait. This type of problem requires data collection for learning and the development of a feedback controller. The controller is defined as a Gaussian policy \({f}_{\varphi }\left({s}_{t}\right)=\left({\mu }_{t},{\sigma }_{t}\right)\), where \(\varphi\) is the controller parameter vector and \({\mu }_{t}\) and \({\sigma }_{t}\) are the mean and standard deviation, respectively. Actions are sampled as \(N\left({a}_{t};{f}_{\varphi }\left({s}_{t}\right)\right)\), and the function \({f}_{\varphi }\) is constructed as a neural network. The reward function is presented as follows:

$$r\left({s}_{t}\right)= -{d}_{t}-\delta {\theta }_{t}+c$$
(7)

where \({d}_{t}\) is the root-mean-square error between the robot's current position and its final position, \(c\) is a coefficient, and \(\delta {\theta }_{t}\) is the angular difference between the current and desired orientation. Under the maximum-entropy formulation, the optimal policy is given by:

$$\pi _\alpha ^* = \arg \mathop {\max }\limits_\pi\,{\mathbb{E}}_{\tau \sim P,\pi } \left[ {\sum_{t = 0}^\infty {\gamma ^t } \left( {\hat{r}\left( {s_t ,a_t } \right) + \alpha H\left( {\pi \left( { \cdot |s_t } \right)} \right)} \right)} \right]$$
(8)
$$H\left({\pi }_{\varphi }\left(\cdot|{s}_{t}\right)\right)= {\mathbb{E}}_{a\sim {\pi }_{\varphi }}\left[-\log {\pi }_{\varphi }\left(a|{s}_{t}\right)\right]$$
(9)

where \(\alpha\) is the entropy temperature in the range \([0,\infty )\) and \({\hat{r}} \left( {s_t ,a_t } \right) = {\mathbb{E}}_{s^{\prime}{\sim}P\left( { \cdot |s,a} \right)} \left[ r \left( {s^{\prime}} \right)\right]\). The resulting loss is minimized by stochastic gradient descent.
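A minimal sketch of the Gaussian policy and entropy bonus described above, assuming a linear stand-in for the neural network \(f_{\varphi}\) and illustrative values for the temperature and reward; for this policy class the closed-form Gaussian entropy used below equals the expectation in Eq. 9.

```python
import numpy as np

def gaussian_policy(state, phi):
    """Placeholder for f_phi(s_t) = (mu_t, sigma_t); in [29] this is a neural network."""
    mu = phi['W'] @ state + phi['b']          # assumed linear features for illustration
    sigma = np.exp(phi['log_sigma'])          # positive standard deviation
    return mu, sigma

def entropy_term(mu, sigma):
    """Closed-form entropy of a diagonal Gaussian policy, H(pi(.|s_t)) in Eq. 9."""
    return np.sum(0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2))

phi = {'W': 0.1 * np.ones((1, 2)), 'b': np.zeros(1), 'log_sigma': np.zeros(1)}
state = np.array([0.4, -0.2])
mu, sigma = gaussian_policy(state, phi)
action = np.random.normal(mu, sigma)          # sample a_t ~ N(mu_t, sigma_t)

alpha_T = 0.2                                 # entropy temperature alpha (assumed)
r_hat = -0.5                                  # assumed reward value r_hat(s_t, a_t)
augmented = r_hat + alpha_T * entropy_term(mu, sigma)   # entropy-augmented return term
print(action, augmented)
```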

If the control goal changes repeatedly, the reinforcement learning method described above is not applicable. To solve this problem, one can use a set of state-action-reward tuples, each of which can be trained to mimic a specific objective. This solution is presented by Puccetti et al. [30] and tested on a speed control system.

Bayesian statistical methods are very effective in intelligent systems [31]. Recent data from human brain research have led to the hypothesis that, in certain types of sensorimotor learning, the brain uses Bayesian internal models to optimize performance on specific tasks. The resulting activity of a particular neuronal population can be modeled as a coordinated Bayesian process. This concept of neural signal processing can be utilized in a variety of applications, from rehabilitation engineering to artificial intelligence.

3.2 Controller Tuning

Fuzzy, adaptive, and PID controllers are widely used in industry. In adaptive control schemes, both the controller parameters and the controller structure can be changed in response to changes in the disturbances or the controlled object. Reference [7] provides a historical overview of learning methods and adaptive control. In many cases, the structure of the controller is fixed and only its parameters need to be tuned. It is known how to tune a controller from a description of the system dynamics; however, such a description is not easy to obtain in practice, requiring deep system knowledge and potentially open-loop, large-scale measurements. The first algorithms proposed in this area tune a PID controller from the speed of the system's response and from the amplitude and period of the natural oscillation of the closed control loop [32, 33]. Subsequently, an adaptive PID controller and a discriminative adaptive control algorithm were proposed, in which model parameter estimates are used to adjust the coefficients [34, 35]. In some cases, especially if the system is unstable, only feedback measurements are possible. Manual re-tuning becomes cumbersome and wasteful as the operating conditions of the system change. Practice therefore relies on automated methods that can rapidly determine the controller parameters for a given task without human intervention. Self-tuning structures are employed in this area.

In reference [36], a multi-parameter self-tuning controller is proposed to control an injection molding machine. The system dynamics are described by the following discrete-time stochastic model.

$$A\left({q}^{-1}\right)y(t)= B\left({q}^{-1}\right)u(t-d-1)+C\left({q}^{-1}\right)e(t)$$
(10)

where \(y(t)\) is an output vector of dimension \(p\), \(u(t)\) is an input vector of dimension \(s\), \({q}^{-1}\) is the backward shift operator so that \({q}^{-1} y(t)=y(t-1)\), \(e(t)\) is white Gaussian noise of dimension \(p\), and \(d\) is the unit time delay. Based on the model in Eq. 10, the self-tuning strategy forms the k-step-ahead recursive prediction as follows:

$$\hat{y}\left( {t + k|t} \right) = \mathop \sum \limits_{i = 1}^{n_a } \hat{A}_i \hat{y}\left( {t + k - i|t} \right) + \mathop \sum \limits_{i = d}^{n_b } \hat{B}_i u\left( {t + k - i} \right) + \mathop \sum \limits_{i = d}^{n_c } \hat{C}_i \hat{e}\left( {t + k - i} \right)$$
(11)

where \({\hat{A}}_{i},{\hat{B}}_{i},{\hat{C}}_{i}\) denote the estimated matrices of Eq. 10 and \(k = 1, 2, \ldots, d\). The optimization problem is then reformulated as follows:

$$J = \left\| {D_0 \hat{B}_d u\left( t \right) + \hat{L}\left( t \right)} \right\|_{Q_1 }^2 + \left\| {G_0 u\left( t \right) + \mathop \sum \limits_{i = 1}^r G_i u\left( {t - i} \right)} \right\|_{Q_2 }^2$$
(12)

This turns the stochastic optimization problem into a deterministic one; setting \(\partial J/\partial u(t)=0\) yields the output \(u(t)\) of the self-tuning controller:

$$\begin{aligned} u\left( t \right) & = - \left[ {\left( {D_0 \hat{B}_d } \right)^T Q_1 D_0 \hat{B}_d + G_0^T Q_2 G_0 } \right]^{ - 1} \\ & \quad \quad \left[ {\left( {D_0 \hat{B}_d } \right)^T Q_1 \hat{L}\left( t \right) + G_0^T Q_2 \mathop \sum \limits_{i = 1}^r G_i u\left( {t - i} \right)} \right] \\ \end{aligned}$$
(13)
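A minimal sketch of the control law of Eq. 13 with placeholder weighting matrices and estimates; the dimensions and numerical values are illustrative, not those of the injection molding machine in [36].

```python
import numpy as np

def self_tuning_control(D0, B_d_hat, G, Q1, Q2, L_hat, u_hist):
    """Control law of Eq. 13: u(t) that minimizes the deterministic cost J of Eq. 12.

    G      : list [G_0, G_1, ..., G_r] of weighting matrices
    u_hist : list of past controls [u(t-1), ..., u(t-r)]
    """
    DB = D0 @ B_d_hat
    M = DB.T @ Q1 @ DB + G[0].T @ Q2 @ G[0]                 # bracketed matrix to invert
    past = sum(Gi @ ui for Gi, ui in zip(G[1:], u_hist))    # sum of G_i u(t-i)
    return -np.linalg.solve(M, DB.T @ Q1 @ L_hat + G[0].T @ Q2 @ past)

# Illustrative 2-input / 2-output example with identity weighting matrices.
I = np.eye(2)
u = self_tuning_control(D0=I, B_d_hat=0.5 * I, G=[I, 0.1 * I],
                        Q1=I, Q2=I, L_hat=np.array([0.2, -0.1]),
                        u_hist=[np.zeros(2)])
print(u)
```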

The simulation results indicate that the structure-adjustment capability and the additional control provided by a learning controller must be utilized to meet the needs of more complex machines.

In a study dedicated to self-tuning controllers [37], algorithms combining least squares estimation (LSE) with minimum-variance tuning based on the dynamics model were obtained and analyzed. The main results are two theorems that assume convergence of the parameter estimates and characterize the closed-loop system. The first theorem shows that, under mild assumptions on the regulator, certain cross-covariances of the output and the control variable vanish. The second theorem assumes that the controlled system is a general linear stochastic system of order n and shows that, once the parameter estimation has converged, the resulting control law is the minimum-variance control law that could have been computed with known parameters.

3.3 Identification Problems

In reference [38], a method using the bee swarm algorithm is proposed for identifying linear systems described by differential equations. The identification problem is posed as an optimization problem: find a model and a parameter set that minimize the prediction error between the model output and the real plant. The operation of the algorithm is demonstrated on a DC motor model.

In reference [39], a heuristic colony competition algorithm is used to adjust the parameters of the PID controller of an evaporator control system while minimizing the integral of the squared tracking error. Comparison with the genetic algorithm and the Ziegler–Nichols method demonstrates this algorithm's effectiveness.

To determine the parameters of a permanent magnet synchronous motor model in real time, Rahimi et al. [40] used a heuristic competition algorithm. For this, the mean squared error of the controlled system state vector is minimized.

3.4 Optimization Problems

Gradient-free search algorithms are widely used for optimization problems of all kinds due to their versatility. This also applies to NNs, since such algorithms do not use the gradient of the function and do not require it to be differentiable [41]. Their characteristic feature is that the solution found is acceptable but not necessarily optimal. Recently, various biomimetic methods that borrow ideas from nature have been gaining popularity [42, 43]. These include population-based algorithms [44], swarm and colony algorithms [45,46,47], evolutionary algorithms, and others. The bat algorithm [44] is also well known and belongs to echolocation-based swarm intelligence. The cuckoo search algorithm has been used to tune the PID controller in thyristor series compensation [48] and DC motor control systems [49]; in the former case it proved more efficient than the swarm-based heuristic algorithm.
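As an illustration of gradient-free tuning, the sketch below runs a simple (1+1) evolution strategy over PID gains on an assumed first-order plant; it is not any particular swarm or colony algorithm from [44,45,46,47,48,49].

```python
import numpy as np

def step_response_cost(gains, steps=200, dt=0.01):
    """Integral squared error of a PID loop around an assumed first-order plant dy/dt = -y + u."""
    kp, ki, kd = gains
    y, integ, prev_e, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        e = 1.0 - y                                   # unit-step reference
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - prev_e) / dt
        y += dt * (-y + u)                            # explicit Euler step of the plant
        prev_e = e
        cost += e * e * dt
    return cost

# (1+1) evolution strategy: mutate the gains and keep the candidate if it lowers the cost.
rng = np.random.default_rng(0)
gains = np.array([1.0, 0.1, 0.01])
best = step_response_cost(gains)
for _ in range(300):
    cand = np.abs(gains + 0.1 * rng.standard_normal(3))   # keep gains nonnegative
    c = step_response_cost(cand)
    if c < best:
        gains, best = cand, c
print(gains, best)
```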

In reference [50], using support vector algorithms, an optimal control approach is proposed to minimize the bipedal robot's power consumption under a small data sample size and an unknown system dynamics model. The new controller has been integrated into the optimal controller, constraining the robot's joint angles to minimize the energy-related cost function. The energy functional is

$${J}_{EE}= \int\limits_{0}^{T}\frac{1}{2}{\tau }^{T}\tau dt, \tau =g\left(\Theta \right)$$
(14)

where \(g(\cdot)\) is parameterized by an NN and \(\Theta\) is the vector of generalized coordinates. The quadratic-form support vector machine quality function is

$${J}_{SVM}= \min\frac{1}{2}{w}^{T}w+\frac{1}{2}C\sum_{i=1}^{N}{\xi }_{i}^{2} \quad \text{s.t.}\quad {\tau }_{i}={w}^{T}\varphi \left({\Theta }_{i}\right)+{\xi }_{i}$$
(15)

where \({\xi }_{i}\) is a slack variable, \(w\) is the weight vector, \(C\) is a penalty factor, \(N\) is the number of training instances, and \(\varphi (\cdot)\) is the mapping from the input space to a higher-dimensional feature space. The resulting functional is the aggregation of \({J}_{EE}\) and \({J}_{SVM}\).
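A rough sketch of the two ingredients, using an off-the-shelf support vector regressor as a stand-in for the least-squares SVM of Eq. 15 and a discrete approximation of the energy cost of Eq. 14; the training data and trajectory are synthetic assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Assumed training data: joint angles Theta_i and corresponding torques tau_i.
rng = np.random.default_rng(1)
Theta = rng.uniform(-1.0, 1.0, size=(50, 2))
tau = np.sin(Theta[:, 0]) + 0.5 * Theta[:, 1] + 0.05 * rng.standard_normal(50)

# Support-vector regression as a stand-in for the tau = w^T phi(Theta) + xi model of Eq. 15.
model = SVR(C=10.0, epsilon=0.01).fit(Theta, tau)

# Energy-like cost of Eq. 14, J_EE = int 0.5 * tau^T tau dt, approximated on an assumed trajectory.
t = np.linspace(0.0, 1.0, 100)
traj = np.column_stack([np.sin(t), np.cos(t)])        # assumed joint trajectory Theta(t)
tau_pred = model.predict(traj)
J_EE = np.sum(0.5 * tau_pred ** 2) * (t[1] - t[0])    # rectangle-rule approximation of the integral
print(J_EE)
```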

In Ref. [51], an improved "learning–learning" search algorithm is used for multi-objective optimization of PID controller parameters; the improvement prevents the search from getting stuck in local minima. To this end, the scheme includes not only the two learner phases but also an additional state. In addition, parameters that are driven toward inconsistent targets are handled by a blocking phase in which they are frozen, ensuring that the objectives cannot conflict with one another [52]. Comparative studies were carried out on optimizing the PID controller parameters of a DC motor control system using the particle swarm method, the honey bee colony algorithm, and the learning–learning algorithm; the last one showed the best results.

3.5 Problems of Iterative Learning

Machine learning is a form of artificial intelligence in which systems are trained from data stored in memory rather than being explicitly programmed [53]. By processing the training data set, a progressively more accurate model is constructed. This allows a model to be trained in advance as well as on an ongoing basis: the iterative training process continuously refines the relationships between data items, no matter how complex or large the data. Training can then continue in real time, starting from models trained offline.

In Ref. [54], a fault-tolerant control approach based on iterative learning control of the current loop is proposed for recovering the performance of polyphase permanent magnet drives under open-circuit conditions. Its main advantage is that it does not require diagnostics and troubleshooting; torque measurements are sufficient. Iterative learning control thus provides reliability in the presence of model and system uncertainty. In [55], a flexible trajectory-tracking control scheme based on iterative learning control was developed for a wheeled robot system that must move along a given trajectory and transport cargo simultaneously, and a system stability analysis was performed. In [56], a human-led iterative learning framework is presented for a trajectory-tracking task in which the controller receives input from the actions of a human operator. Hladowski et al. [57] considered the influence of noise and obtained new results on the dynamic enhancement of iterative learning control laws.
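The following sketch shows a generic first-order iterative learning update, \(u_{k+1}(t)=u_k(t)+L\,e_k(t+1)\), applied to an assumed first-order plant; it illustrates the principle only and is not the specific scheme of [54,55,56,57].

```python
import numpy as np

def ilc_update(u_k, e_k, L=0.5):
    """Generic first-order ILC law: u_{k+1}(t) = u_k(t) + L * e_k(t+1).

    u_k : control input over one trial (length N)
    e_k : tracking error of that trial (length N+1, including the initial sample)
    """
    return u_k + L * e_k[1:]

def run_trial(u, y0=0.0, a=0.8, b=1.0):
    """Assumed first-order discrete plant y(t+1) = a*y(t) + b*u(t)."""
    y = [y0]
    for ut in u:
        y.append(a * y[-1] + b * ut)
    return np.array(y)

ref = np.ones(21)                      # reference trajectory over one trial
u = np.zeros(20)                       # initial control guess
for _ in range(30):                    # repeated trials (iterations)
    y = run_trial(u)
    e = ref - y
    u = ilc_update(u, e)
print(np.max(np.abs(ref[1:] - run_trial(u)[1:])))   # residual tracking error after learning
```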

Reference [3] presents an iterative procedure for planning the milling process. Forming the machine-tool trajectory requires knowledge of the machine's technical parameters and the geometry of the parts. Tool deviation is a severe problem that forces constant review and re-planning of the milling process. Dittrich et al. [3] presented a solution that reduces machining errors by up to 50% by predicting, with machine learning methods, the error between the modeled machined shape and actual surface measurements; a statistical support vector machine uses the data set of previous processes as the training data set.

4 Conclusions

The global trend toward advanced types of production organization is reflected in scientific publications on intelligent control methods for electromechanical systems. Using AI methods, it is possible to solve previously intractable problems of controlling mechatronic systems while increasing ease of implementation and computational efficiency. Control tasks for multi-agent systems are inherently non-linear, uncertain, or influenced by the external environment, and they require individual approaches to specific problems, for which many tools have been proposed. The effectiveness of these learning algorithms on complex systems can only be assessed through actual experiments. Algorithm development aims not only to increase the accuracy and speed of learning but also to increase independence from goals and learning strategies strictly set by humans. Developers try to recreate the behavior of living organisms by using ideas from nature in their algorithms. Future research poses a task usually referred to as "learning to learn," in which agents must select learning strategies and tune meta-parameters themselves.