1 Introduction

A wide variety of experiments are performed using SR from Indus-2 at experiment stations located at the commissioned beamlines (BL). These experiments are sensitive to the electron beam position and angle at the source point [1]. The dependence is such that a change of a few hundred µm in electron beam position produces a millimetre-level change at the target sample. In advanced SRS, the orbit stability requirement is more stringent, typically 10% of the beam size, which comes out to the micron to sub-micron level. Schemes such as Slow Orbit Feedback (SOFB) and Fast Orbit Feedback (FOFB) systems are employed to achieve these stability requirements [2]. This stability is maintained with reference to a predefined orbit, but a need sometimes arises to change this orbit altogether or locally at some place. Until now, SOFB was used interactively for this purpose, with BL users consulted continuously after each modification of the orbit. After some iterative exercises, a beam orbit is attained that satisfies the user requirement. Machine time is generally taken specifically for this activity, as the disturbance propagates throughout the ring while a feasible orbit is being found; hence machine downtime increases. This iterative exercise is proposed to be replaced by a self-learning application which, according to operating conditions or when commanded, can adjust the orbit locally. The application is particularly useful when one or two BL users are not getting the beam and a local correction is required without affecting the other BL users.

Intelligent agents are software entities placed in a dynamic environment with a well-defined goal, which they try to accomplish by dynamically interacting with the environment. While interacting autonomously, these entities update their beliefs based on the results of their actions. When their performance deteriorates and the old skill is no longer good enough to achieve the assigned goal, they learn new skills. Agents are generally modeled to perceive their environment through properties such as autonomy, intentions, beliefs, reactivity, and proactivity [3, 4]. Different types of agents have been proposed and used by researchers and engineers for their applications. Multi-agent-based schemes have also been explored for the control of large distributed plants, including earlier work on minimization of the closed orbit distortion (COD) of the Indus-2 electron orbit in a simulated environment [5]. In those simulation studies, different agents were deployed at different layers of control to leverage the computational capability of each layer while handling most control tasks locally. The work presented here extends that study: for ease of development and deployment, the agents are first deployed only at the application layer for initial test trials and scheme qualification for the local orbit bump and COD of the Indus-2 machine. This paper discusses the deployment of an intelligent algorithm core into the already existing SOFB infrastructure for COD correction and control of a local bump for any particular BL. It also examines reliability and the incorporation of constraints into the algorithm for overall system stability.

2 Intelligent Agent-Based Orbit Control for Local Bump

Local bumps in the electron orbit of an SRS are generally provided for alignment of SR at the BL experiment stations. Corrector (steering) magnets are used on top of the other magnet optics; the same set of correctors is used by SOFB. Figure 1 shows the three-corrector scheme: corrector one (CV1) opens the bump, CV2 applies a kick toward CV3, and CV3 closes it. The relation between them that closes the bump without residual orbit distortion [6] is given by Eq. (1). BPI1 and BPI2 measure the beam position inside this bump. Any mismatch in the corrector ratios gives rise to bump leakage and orbit distortion all over the ring. The skill set is defined as the scaled corrector ratio λ:C1:C2:C3, where λ is a scaling factor for a 1 mm bump at the BPIs. Application of a local bump involves some amount of bump leakage; however, efforts are made to minimize this leakage by iteratively fine-trimming the corrector values.

Fig. 1 Three-kicker bump scheme

Equation (1) is used to theoretically calculate the corrector strengths (\(\theta_{1}, \theta_{2}, \theta_{3}\)) at the three correctors for a closed bump. β is the Twiss parameter at the location of the corresponding corrector, and ψi,j denotes the betatron phase advance between correctors i and j. In practice, this ratio is calculated by measuring the system response to step excitation of all correctors to generate the response matrix (RM).

$$\frac{{\theta_{1} \sqrt {\beta_{1} } }}{{\sin \psi_{2,3} }} = \frac{{ - \theta_{2} \sqrt {\beta_{2} } }}{{\sin \psi_{1,3} }} = \frac{{\theta_{3} \sqrt {\beta_{3} } }}{{\sin \psi_{1,2} }}$$
(1)
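As an illustration of Eq. (1), the sketch below computes the corrector-strength ratios for a closed bump from the Twiss parameters; the function name and the normalization to θ1 = 1 are choices made here, not part of the original system.

```python
import numpy as np

def closed_bump_ratios(beta, psi):
    """Corrector-strength ratios (theta2/theta1, theta3/theta1) from Eq. (1).

    beta : (3,) Twiss beta functions at the three correctors
    psi  : dict of betatron phase advances psi[(i, j)] between correctors i and j (rad)
    """
    t1 = 1.0  # normalize to unit strength at the first corrector
    # Eq. (1): theta1*sqrt(beta1)/sin(psi_23) = -theta2*sqrt(beta2)/sin(psi_13)
    t2 = -t1 * np.sqrt(beta[0] / beta[1]) * np.sin(psi[(1, 3)]) / np.sin(psi[(2, 3)])
    # Eq. (1): theta1*sqrt(beta1)/sin(psi_23) =  theta3*sqrt(beta3)/sin(psi_12)
    t3 = t1 * np.sqrt(beta[0] / beta[2]) * np.sin(psi[(1, 2)]) / np.sin(psi[(2, 3)])
    return np.array([t1, t2, t3])
```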

Three- and four-kicker schemes are used for local control of the electron orbit. While the three-kicker scheme controls only the position at the BPIs placed between the three correctors, the four-kicker scheme controls the angle at them as well. Until now, the local bump was provided using the SOFB system. Predictive facilities were provided to the operator in the SOFB application for easy visualization of the effects of changing the reference orbit [7]. However, the process is iterative, and the need for a local, automatic, and intelligent algorithm was felt. The agents are developed with the aim of minimizing orbit bump leakage outside the bump zone while achieving the desired bump height. This involves updating their existing skill sets whenever the bump leakage is found to be outside the defined tolerance band. The next sections describe the bump control system overview, the proposed application, the COD agents and their training, and the system constraints.

2.1 Bump Control System Overview

This section gives an overview of the Indus-2 orbit system and its control. The electron beam is made to circulate in the Indus-2 vacuum chamber using various magnets while its position is continuously monitored at various locations through beam position indicators (BPIs). The positions are acquired from the BPIs through the data acquisition system and passed to the orbit control system. The control system calculates the correction, in the form of steering magnet current settings, required to mitigate disturbances in the beam. The corrections are passed to the steering magnets, which correct the beam position variations. Current settings for SOFB are calculated using the inverse response matrix (RM), which is evaluated from the RM using singular value decomposition (SVD). The SOFB control system is implemented through a state machine-based algorithm in the LabVIEW real-time environment. For local bump correction, the same controller is used with an additional state: the corrections for the local correctors are received from the application layer and passed to the power supply (PS) interface for application to the corrector magnets. Figure 2 shows the block diagram of the local bump control scheme, highlighting the different components of the control system.
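The SVD-based inversion of the RM can be sketched as follows; the truncation count n_sv and the gain are illustrative tuning knobs, not values from the Indus-2 system.

```python
import numpy as np

def corrector_update(rm, orbit_error, n_sv=None, gain=0.5):
    """Least-squares corrector increments from BPI readings via SVD.

    rm          : response matrix, shape (n_bpi, n_corr)
    orbit_error : measured minus reference orbit at the BPIs, shape (n_bpi,)
    n_sv        : number of singular values retained (regularization)
    gain        : fraction of the full correction applied per iteration
    """
    u, s, vt = np.linalg.svd(rm, full_matrices=False)
    if n_sv is not None:
        # Truncating small singular values avoids amplifying BPI noise.
        u, s, vt = u[:, :n_sv], s[:n_sv], vt[:n_sv, :]
    # Pseudo-inverse of RM maps the orbit error back to corrector space.
    return -gain * (vt.T @ ((u.T @ orbit_error) / s))
```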

Fig. 2 Block diagram of the local bump scheme

2.2 Application Overview

The new orbit control application is embedded inside the existing SOFB application by inserting a new state. The system model for the local bump controller is the RM of the SOFB system itself, as the same hardware is used. The new state has the facility to communicate with the client GUI over the Ethernet network for sending and receiving the corrector values of the steering coils and the BPI data. The local bump control application incorporating the agent-based control algorithm is deployed and integrated with the SOFB client GUI in the control room. All actions are calculated in this application, and the corrector offsets are passed to the new state of the SOFB controller. A handshaking mechanism is adopted for reliable communication between the RT application and the client application.
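A minimal sketch of such a handshake is given below, assuming a JSON message with a sequence number echoed back as the acknowledgement; the message layout and field names are hypothetical, not the actual Indus-2 protocol.

```python
import json
import socket

def send_request(sock: socket.socket, offsets, seq: int, timeout: float = 2.0):
    """Send corrector offsets and block until the RT controller acknowledges."""
    sock.settimeout(timeout)
    sock.sendall(json.dumps({"seq": seq, "offsets": offsets}).encode() + b"\n")
    reply = json.loads(sock.recv(4096).decode())  # blocking read of the ack message
    if reply.get("ack") != seq:                   # stale or missing acknowledgement
        raise RuntimeError("handshake failed: correction not applied")
    return reply.get("bpi")                       # fresh BPI readings, if provided
```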

2.3 COD Agents, Skill Learning, and Algorithm

To achieve the common goal of closed orbit distortion (COD) minimization, the orbit correction agent works to find the best corrector settings using a hill-climbing algorithm; the initial skill set is calculated by a genetic algorithm. The agent finds new settings by evaluating the cost of applying incremental changes to each of the three correctors separately and then applying the most effective step, i.e., the one with the least cost. Here, the cost is the bump leakage, measured as the rms orbit deviation outside the bump. The agent also updates the skill set after applying the most effective step. A three-corrector scheme is used for providing the local bump. Training the agent involves iteratively finding a corrector set, by applying a small delta step in the positive and negative direction to each of the three correctors one by one, which minimizes the deviation outside the local bump while achieving the desired bump in the bump region. Training continues until a corrector set is reached that keeps the leakage deviations within a tolerance band, named the stop learning band. Training starts again when the leakage grows beyond a second tolerance band, named the start learning band. Figure 3 highlights the states of a typical learning cycle at different leakage costs. The numerical values of these bands are chosen such that the stop learn threshold is lower than the start learn threshold; this hysteresis ensures that the learned skills are better than before and avoids entering learn mode too frequently. Several iterative exercises were performed to finalize the values of the stop learn and start learn bands at 6 and 8 µm, respectively.
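In code, the leakage cost and the band hysteresis described above could look like the following sketch; the band values are taken from the text, while the function names and the BPI partitioning are illustrative.

```python
import numpy as np

STOP_LEARN_UM = 6.0    # leave learn mode below this leakage (µm rms)
START_LEARN_UM = 8.0   # re-enter learn mode above this leakage (µm rms)

def leakage_cost(orbit_um, bump_bpi_idx):
    """RMS orbit deviation (µm) over all BPIs outside the bump zone."""
    outside = np.delete(orbit_um, bump_bpi_idx)
    return float(np.sqrt(np.mean(outside ** 2)))

def learn_state(cost_um, learning):
    """Hysteresis between the start-learning and stop-learning bands."""
    if learning:
        return cost_um > STOP_LEARN_UM   # keep learning until leakage drops below 6 µm
    return cost_um > START_LEARN_UM      # start learning only beyond 8 µm
```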

Fig. 3 Learning state at different bump leakage levels

Applying a large corrector step while achieving a bump (not in learn mode) may lead to a high leakage bump even when the corrections use a correct skill set, so this has to be avoided. Too small a step, on the other hand, makes bump application impractically slow, so a compromise has to be made. The noise floor in the beam position measurement is 2–3 µm rms, and a peak-to-peak deviation of 30 µm is tolerable for the BL users. System measurements also show that a maximum orbit deviation of 1.2–1.5 mm is possible for a 1 A change in corrector current; hence a current step of 15 mA is taken for interactive learning with actuator excitation. While in learn mode, the corrector excitation should exceed the noise floor of the beam position but should not affect the orbit much, i.e., the COD should remain within tolerable limits even in the worst case. The stop learn cost should be such that the COD becomes minimal yet stays above the noise floor, so that the system does not oscillate due to noise dominance; it should also be selected such that the BL users are not affected by the perturbation caused in the beam by skill learning. The start learn value should be higher than the stop learn band so that learning starts only when the leakage is beyond the learned skill limit. The delta step for applying corrections using the skill set, the delta step for the learn cycle, and the cost thresholds for the start learn and stop learn bands are crucial and must be chosen carefully. Table 1 gives the final values of the system parameters used for implementation of the agents.

Table 1 System parameters

The flow graph of the COD agent for control of the bump in the “Apply Request Interactively” mode of operation is shown in Fig. 4. This mode is the core of the application. Initially, a simulation request is raised with the offset requirement at the BL sample end. This request uses a genetic algorithm to find the current settings for the designated corrector set. The user may initialize the skills with this set of corrector currents or may keep a previously learned skill set. In the first step, corrections are applied based on this skill set and the cost of the applied step is evaluated. If the cost is greater than a predefined value, MaxCostApply, learning mode is activated; otherwise the bump achievement condition is checked. If the bump is achieved, the agent stops; otherwise it again applies a correction with the same skill set. In learn mode, the hill-climbing algorithm is used to find the corrector that gives the minimum cost on excitation: each corrector is excited in both directions with a small current (ΔLearn) and the cost is recorded. The step with the minimum cost is applied and the skill set is updated. The cost is then compared with the stop learn cost (MinCostLearn); if it is less than this value, the agent leaves learn mode and checks the bump achievement condition, otherwise the process is repeated.
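A compact sketch of this flow is given below, with the machine interface abstracted behind callables; the trial-and-revert handling of the ΔLearn excitations and the skill-update rule are plausible stand-ins rather than the exact Fig. 4 implementation.

```python
import numpy as np

def apply_request_interactively(apply_step, measure_cost, bump_achieved, skill,
                                delta_apply, delta_learn,
                                max_cost_apply, min_cost_learn, max_iter=200):
    """Sketch of the 'Apply Request Interactively' flow of Fig. 4.

    apply_step(d_i) : apply corrector current increments d_i (A) and settle
    measure_cost()  : bump leakage outside the bump zone (µm rms)
    bump_achieved() : True once the requested bump height is reached
    skill           : (3,) corrector ratios, updated in place while learning
    """
    for _ in range(max_iter):
        apply_step(delta_apply * skill)          # step toward the bump with current skills
        if measure_cost() <= max_cost_apply:     # leakage acceptable: no learning needed
            if bump_achieved():
                return skill
            continue                             # keep stepping with the same skill set
        # Learn mode: hill climbing over +/- DeltaLearn excitations of each corrector.
        while measure_cost() > min_cost_learn:
            best_cost, best_step = np.inf, None
            for i in range(3):
                for sign in (+1.0, -1.0):
                    trial = np.zeros(3)
                    trial[i] = sign * delta_learn
                    apply_step(trial)            # excite one corrector
                    cost = measure_cost()
                    apply_step(-trial)           # revert the trial excitation
                    if cost < best_cost:
                        best_cost, best_step = cost, trial
            apply_step(best_step)                # apply the most effective step
            skill += best_step / delta_apply     # plausible skill-set update rule
    return skill
```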

Fig. 4 Algorithm for the Apply Request Interactively mode of operation

2.4 System Constraints and Validation Criteria

The application is equipped with system constraints so that no instability arises; the constraints are embedded into the controller. The application of a corrector setting is permitted only when the final value of the corrector does not exceed its safe limit, which is kept at 6 A for the initial trials and will finally be kept at 7 A (the correctors are capable of sustaining 10 A). In the manual and quick application modes, the maximum rate at which the simulated corrector current may be applied is kept at 0.1 A/s, below the corrector PS rate limit of 0.13 A/s; this prevents the system from entering the nonlinear zone. Sufficient time is given for the response to settle before the applied step is evaluated for cost. While evaluating the incremental apply and learn steps to the correctors, it was ensured that the perturbation introduced by learning and by the application of corrections remains within the tolerance band of 30 µm.
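These two constraints amount to a clamp and a slew-rate limiter on every corrector setpoint, as in the sketch below; the limit values are taken from the text, while the function itself is illustrative.

```python
import numpy as np

SAFE_LIMIT_A = 6.0   # per-corrector safe limit for initial trials (7 A planned)
MAX_RATE_A_S = 0.1   # kept below the 0.13 A/s power-supply rate limit

def constrained_setpoint(current_a, target_a, dt_s):
    """Clamp a corrector setpoint to the safe limit and the slew-rate limit."""
    target_a = np.clip(target_a, -SAFE_LIMIT_A, SAFE_LIMIT_A)
    max_step = MAX_RATE_A_S * dt_s                # largest allowed change this cycle
    return current_a + np.clip(target_a - current_a, -max_step, max_step)
```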

3 Results of Initial Trials of the Scheme

The proposed scheme has been implemented and tested for skill learning and the overall functionality of the application. Some of the implementation results are presented next, in the form of the application interface GUI and its skill learning capability.

Figure 5 is a snapshot of the client application GUI highlighting its various parameters. For the selected BL, i.e., BL 12, the skill set graph, local corrector current graph, bump and leakage graph, and the interface buttons for activating the various modes of operation are highlighted. By simulating the offset requirement at the BL sample, the corrector current requirements and the effects on the BPIs can be seen. The three modes of operation for applying the simulated request are apply request interactively (with learning, if required), apply request quickly, and apply request manually (step by step). The quick and manual modes are used only when the skill set is known to be good, as these modes do not involve learning new skills; the correctors are incremented gradually with the previously known skill set. The apply request interactively mode works on the algorithm explained earlier, which applies the bump while ascertaining that the leakage is minimal; if it is not, the algorithm goes on to learn new skills. Figure 5 shows that the local bump is achieved with a bump leakage of 6.3 µm rms for a bump height of around 250 µm, with corrector values within the constraint boundary of ±6 A. As per the system model, a position change of 640 µm was required at the BL 12 sample point, which was achieved by the application without any learn cycle. The RM for the system model had been measured recently, so it was possible to achieve the bump without learning new skills. In Fig. 5, the center graph shows the skill set: blue is the initial and red is the newly learned skill set. Since no learning took place, the red dot was not updated and remains where it was in the previous bump condition. To verify the functionality, the initial skill set was deliberately degraded and a bump request was then placed. Figure 6 shows the track made by the skill set while achieving the bump at BL 1. The α and β axes on the skill graph are the corrector ratios C2:C1 and C3:C1, respectively.

Fig. 5 GUI snapshot of the client application

Fig. 6 COD agent in the process of learning a new skill set while providing a bump for BL 1

Since the initial skills, taken from the system model, were good enough, and the skills were altered only to observe the movement of the learned skill set, the learned skill set tries to return to the same point while achieving the desired goal of bump application. A similar procedure was repeated for BL 6, giving the response shown in Fig. 7. The skill learning pattern also agrees with the pattern obtained in the simulation studies.

Fig. 7 COD agent in the process of learning a new skill set while providing a bump for BL 6

4 Conclusions

The proposed scheme has been validated, and initial trials were performed in Indus-2 to evaluate the various parameters, the skill learning, and the overall functionality of the application. With the chosen parameters and an intentionally degraded skill set, the application achieves the desired bump while learning new skill sets that approach the earlier known good skill set. Along with the implementation of system constraints in the application, its parameters were fine-trimmed for optimum system performance. The application is able to apply a bump locally at any BL with tolerable disturbances at other locations. The initial results are in line with the simulation studies carried out for the same system model. A GUI has been developed for remote operation from the control room.