
1 Introduction

For several decades, the traditional mindset for controlling large-scale power systems has been limited to local output feedback control: controllers installed within the operating region of any utility company typically use measurements available only from inside that region, and, more commonly, only from the vicinity of the controller location. Examples of such controllers include Automatic Voltage Regulators (AVR), Power System Stabilizers (PSS), Automatic Generation Control (AGC), FACTS controllers, and HVDC links. However, the US Northeast blackout of 2003, followed by the timely emergence of sophisticated GPS-synchronized digital instrumentation technologies such as Wide-Area Measurement Systems (WAMS), led utility owners to understand how the interconnected nature of the grid topology essentially couples their controller performance with that of others, and thereby forced them to look beyond this myopic approach of local feedback and instead use wide-area measurement feedback [1]. Over the past few years, several researchers have investigated such data-driven wide-area control designs using \(H_\infty \) control [2,3,4], LMIs and conic programming [5], wide-area protection [6], model reduction and control inversion [7], adaptive control [8], and LQR-based optimal control [9,10,11,12], complemented by insightful case studies of controller implementation for real power systems such as the US west coast grid [13], Hydro Quebec [14], the Nordic system [15, 16], and power systems in China [17], Australia [18], and Mexico [19]. A tutorial on the ongoing practices for wide-area control has recently been presented in [20], while cyber-physical implementation architectures for realizing these controls have been proposed in [21, 22].

One of the biggest roadblocks to implementing wide-area control in a practical grid, however, is that the current power grid IT infrastructure is rigid and has low capacity, as it is mostly based on closed, mission-specific architectures. First, the current push to adopt the existing TCP/IP-based open Internet and high-performance computing technologies such as the NASPInet [23] may not be enough to meet the requirement of collecting and processing the very large volumes of real-time data produced by thousands of PMUs. Second, the impact of an unreliable and insecure communication and computation infrastructure, especially long delays and packet loss uncertainties over wide-area networks, on the development of new WAMS applications is not well understood. For example, uncontrolled delays in a network can easily destabilize distributed estimation algorithms for wide-area oscillation monitoring using PMU data from geographically dispersed locations. Third, and most importantly, very few studies have been conducted on leveraging emerging IT technologies, such as cloud computing, Software-Defined Networking (SDN), and Network Function Virtualization (NFV), to accelerate the development of WAMS [24]. Another major challenge is the privacy of PMU data, as utility companies are often reluctant to share data from a large number of observable points within their operating regions with other companies. Equally important is the cybersecurity of the data, as even the slightest tampering with synchrophasor data, whether through denial-of-service attacks or data manipulation attacks, can cause catastrophic instabilities in the grid. What we need is a cyber-physical architecture that explicitly brings out potential solutions to all of these concerns, answering how data from multitudes of geographically dispersed PMUs can be shared across a large grid via a secure communication medium for successful execution of critical transmission system operations, how the various binding factors in this distributed communication system can pose bottlenecks, and how these bottlenecks can be mitigated to guarantee the stability and performance of the grid.

Motivated by these challenges, in this tutorial we review the current state-of-the-art practice for wide-area communication and control from a Cyber-Physical System (CPS) viewpoint. We study the ways in which these practices pose limitations against optimal grid performance, how modern communication technologies such as SDN, NFV, and cloud computing can be used to overcome these limitations, and how new ideas of distributed control and optimization need to be integrated with various operational protocols and wide ranges of uncertainties of these wide-area networks to ensure that the closed-loop grid operates in a stable, reliable, robust, and efficient way. We also discuss the need for advanced modeling, simulation, and control of wide-area communication to ensure cybersecurity and resilience of wide-area controllers.

The remainder of the chapter is organized as follows. In Sect. 2 we recall standard differential-algebraic models of power system networks, followed by wide-area control designs in Sect. 3. Section 4 presents potential cyber-physical architectures for implementing these wide-area controllers over a distributed communication network. Section 5 highlights research directions on how these controllers can be made aware of the various operational uncertainties in the communication network, such as delays. Section 6 summarizes the potential benefits of modern communication technologies such as SDN, NFV, and cloud computing in realizing the envisioned CPS architecture. Section 7 highlights the importance of making wide-area controllers secure against cyberattacks. Section 8 presents the current challenges facing simulation of extreme-scale wide-area control, while Sect. 9 describes the importance of hardware-in-loop CPS testbeds that can be used for testing, verification, and validation of such emulated control loops before they are deployed in the field. Section 10 concludes the chapter.

2 Power System Models

Consider a power system network with n synchronous generators. Each generator is modeled by a flux-decay model, assuming that the d- and q-axis flux dynamics are fast enough to be neglected, that the rotor frequency stays close to the normalized constant synchronous speed, and that the amortisseur effects are negligible. The model of the ith generator can then be written as [25]:

$$\begin{aligned} \dot{\delta }_i&= \omega _i-\omega _s \end{aligned}$$
(1)
$$\begin{aligned} M_i\dot{\omega }_i&= P_{mi} - (V_i I_{qi}\cos (\delta _i-\theta _i)+V_i I_{di}\sin (\delta _i-\theta _i)) - d_i(\omega _i-\omega _s) \end{aligned}$$
(2)
$$\begin{aligned} T_{qi}\,\dot{E}'_{qi}&= -E'_{qi} + (x_{di}-x'_{di})I_{di}+E_{fdi} \end{aligned}$$
(3)
$$\begin{aligned} T_{di}\,\dot{E}'_{di}&= -E'_{di}+(x_{qi}-x'_{qi})I_{qi} \end{aligned}$$
(4)
$$\begin{aligned} T_{Ai}\,\dot{E}_{fdi}&= -E_{fdi}+K_{Ai}(V_{ref,i}-V_i)+ u_i(t). \end{aligned}$$
(5)

for \(i=1, \ldots , n\). Equations (1)–(2) are referred to as the swing equations, while (3)–(5) are referred to as the excitation equations. The states \(\delta _i\), \(\omega _i\), \(E'_{qi}\), \(E'_{di}\), and \(E_{fdi}\) respectively denote the phase angle (radians), rotor velocity, quadrature-axis internal emf, direct-axis internal emf, and field excitation voltage of the ith generator. The voltage at the generator terminal bus is denoted in polar representation as \(\tilde{V}_i(t)=V_i(t)\angle {\theta _i(t)}\). \(V_{ref,i}\) is the constant setpoint for \(V_i\). The generator current in complex phasor form is written as \(I_{di} +j I_{qi}=I_i\angle {\phi _i}\). \(\omega _s\) is the synchronous frequency, equal to \(120\pi \) rad/s for a 60-Hz power system. \(M_i\) is the generator inertia, \(d_i\) is the generator damping, and \(P_{mi}\) is the mechanical power input from the ith turbine, all of which are considered to be constant. \(T_{di}\), \(T_{qi}\), and \(T_{Ai}\) are the excitation time constants; \(K_{Ai}\) is the constant voltage regulator gain; \(x_{di}\), \(x'_{di}\), \(x_{qi}\), and \(x'_{qi}\) are the direct-axis and quadrature-axis synchronous and transient reactances, respectively. All variables, except for the phase angles (radians), are expressed in per unit. Equations (1)–(5) can be written in compact form as

$$\begin{aligned} \dot{x}_i(t)=g(x_i(t),z_i(t),u_i(t),\alpha _i) \end{aligned}$$
(6)

where \(x_i=[\delta _i \; \omega _i \; E'_{qi}\;E'_{di}\; E_{fdi}]' \in \mathbb R^5\) denotes the vector of state variables, \(z_i=[V_i\; \theta _i \; I_{di} \; I_{qi}]'\in \mathbb R^4\) denotes the vector of algebraic variables, \(u_i \in \mathbb R\) is the control input, and \(\alpha _i\) is the vector of the constant parameters \(P_{mi}\), \(\omega _s\), \(d_i\), \(T_{qi}\), \(T_{di}\), \(T_{Ai}\), \(M_i\), \(K_{Ai}\), \(V_{ref,i}\), \(x_{di}\), \(x_{qi}\), \(x'_{di}\), and \(x'_{qi}\), all of which are assumed to be known. The definition of the nonlinear function \(g(\cdot )\) follows from (1)–(5).
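
To make the notation concrete, the following minimal Python sketch implements the right-hand side \(g(\cdot )\) of (6) directly from (1)–(5). The dictionary keys used for the parameters in \(\alpha _i\) are our own naming conventions, chosen to mirror the symbols defined above.

```python
import numpy as np

def g(x, z, u, alpha):
    """Right-hand side of the flux-decay model (1)-(5) for one generator.

    x     : [delta, omega, Eq_prime, Ed_prime, Efd]  (state vector x_i)
    z     : [V, theta, Id, Iq]                       (algebraic vector z_i)
    u     : scalar excitation control input u_i
    alpha : dict of the constant parameters listed in the text
    """
    delta, omega, Eq, Ed, Efd = x
    V, theta, Id, Iq = z
    a = alpha

    # Electrical power produced by the generator (the bracketed term in (2))
    Pe = V * Id * np.sin(delta - theta) + V * Iq * np.cos(delta - theta)

    d_delta = omega - a["omega_s"]                                        # (1)
    d_omega = (a["Pm"] - Pe - a["d"] * (omega - a["omega_s"])) / a["M"]   # (2)
    d_Eq = (-Eq + (a["xd"] - a["xd_p"]) * Id + Efd) / a["Tq"]             # (3)
    d_Ed = (-Ed + (a["xq"] - a["xq_p"]) * Iq) / a["Td"]                   # (4)
    d_Efd = (-Efd + a["KA"] * (a["Vref"] - V) + u) / a["TA"]              # (5)

    return np.array([d_delta, d_omega, d_Eq, d_Ed, d_Efd])
```

A decentralized estimator such as (7) would propagate exactly this function, driven by the noisy measurements \(\tilde{z}_i(t)\) instead of \(z_i(t)\).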

The model (6) is completely decentralized since it is driven by variables belonging to the ith generator only. It is, however, not a state-space model, as it contains the auxiliary variables \(z_i\). The states \(x_i\) can be estimated for this model in a completely decentralized way if one has access to \(z_i(t)\) at every instant of time. This can be ensured by placing PMUs within each utility area such that the generator buses inside that area become geometrically observable, measuring the voltages and currents at the PMU buses, and thereafter computing the generator bus voltage \(V_i\angle {\theta _i}\) and current \(I_i\angle {\phi _i}\) (or equivalently, \(I_{di}\) and \(I_{qi}\)) from those measurements. As the PMU measurements will be corrupted with noise, the estimated values of \(z_i(t)\) will not be perfect. They should rather be denoted as \(\tilde{z}_i(t)=z_i(t)+n_i(t)\), where \(n_i(t)\) is Gaussian noise. An unscented Kalman filter (UKF) is next designed as

$$\begin{aligned} \dot{\hat{x}}_i(t)=g(\hat{x}_i(t), \tilde{z}_i(t), u_i(t), \alpha _i), \;\; \hat{x}_i(0)=\hat{x}_{i0} \end{aligned}$$
(7)

producing the state estimates \(\hat{x}_i(t)\) for the ith generator at any instant of time t. For details of the construction of this UKF, please see [26]. The estimator can be installed directly at the generation site to minimize the communication of signals, and made to run continuously before and after any disturbance.

The network equations that couple \((x_i,\,z_i)\) of the ith generator in (6) to the rest of the network can be written as

$$\begin{aligned} 0&= I_{di}(t)V_i(t)\sin (\delta _i(t)-\theta _i(t))+I_{qi}(t)V_i(t)\cos (\delta _i(t)-\theta _i(t))+P_{Li}(t) \nonumber \\&\quad -\sum _{j=1}^m V_i(t)V_j(t)(G_{ij}\cos (\theta _{ij}(t))+B_{ij}\sin (\theta _{ij}(t))) \end{aligned}$$
(8)
$$\begin{aligned} 0&= I_{di}(t)V_i(t)\cos (\delta _i(t)-\theta _i(t))-I_{qi}(t)V_i(t)\sin (\delta _i(t)-\theta _i(t))+Q_{Li}(t)\nonumber \\&\quad -\sum _{j=1}^m V_i(t)V_j(t)(G_{ij}\sin (\theta _{ij}(t))-B_{ij}\cos (\theta _{ij}(t))), \end{aligned}$$
(9)

where m is the total number of buses in the network. Here, \(\theta _{ij}=\theta _i-\theta _j\); \(P_{Li}\) and \(Q_{Li}\) are the active and reactive power load demands at bus i; and \(G_{ij}\) and \(B_{ij}\) are the conductance and susceptance of the line joining buses i and j. As shown in [25], the variables \(z_i(t)\) in (6) can be eliminated using (8)–(9) by a process called Kron reduction. The resulting dynamic model is linearized about a given operating point, and the small-signal model for the power system is written as

$$\begin{aligned} \dot{x}(t)&= A_c\,x(t) + B_c\,u(t) \end{aligned}$$
(10)
$$\begin{aligned} y(t)&:= \Delta \omega (t) = C\,x(t). \end{aligned}$$
(11)

In this model, \(x(t)\in \mathbb R^{5n}\) and \(u(t) \in \mathbb R^n\) now represent the small-signal deviations of the actual states and excitation inputs of the n generators from their pre-disturbance equilibrium. We define \(y(t) \in \mathbb R^n\) as a performance variable, based on which the closed-loop performance of the overall system as well as the saturation limits on u(t) can be judged. Ideally, this can be chosen as the vector of all electromechanical states. For simplicity, y is often chosen as the vector of small-signal generator frequencies \(\Delta \omega (t)\), as frequency is the most effective electromechanical state for evaluating damping. The input u(t) is commonly used for designing feedback controllers such as Power System Stabilizers (PSS), which take local feedback from the generator speed and pass it through a lead-lag controller to produce damping effects on the oscillations in phase angle and frequency. PSSs, however, are most effective in adding damping to the fast oscillation modes in the system, and perform poorly in adding damping to the slow or inter-area oscillation modes [20]. Our goal is to design a supplementary controller u(t) for the model (10)–(11) on top of the local PSS by using state feedback from either all or selected sets of other generators. If the state vector of every generator is fed back to every other generator, then the underlying communication network must form a complete graph, i.e., an all-to-all connection topology. If the state vectors of only a selected few generators are fed back to the controllers of some other generators, then the communication network may have a sparse structure. In either case, since long-distance data transfer is involved, we refer to this controller as a wide-area controller.

3 Controller Design

Several papers such as [9, 10] have posed the wide-area control problem as a constrained optimal control problem of the form:

$$\begin{aligned} \min _{K}\;J&= \frac{1}{2}\int _0^\infty (x^T(t)Qx(t)+u^T(t)Ru(t))dt \nonumber \\ \text {s.t.}\;\; \dot{x}(t)&= Ax(t)+Bu(t) \nonumber \\ u(t)&= K\hat{x}(t), \;\; K \in \mathcal S, \; Q\ge 0,\,R>0, \end{aligned}$$
(12)

where the state estimate \(\hat{x}(t)\) follows from (7). The choice of the objective function J depends on the goal of wide-area control. For power oscillation damping, this function is often chosen simply as the total energy contained in the states and inputs, as in (12). For wide-area voltage control, it can be chosen as the setpoint regulation error for the voltages at desired buses [27], while for wide-area protection, it can be chosen as the total amount of time taken to trigger relays so that fault currents do not exceed their maximum values [6]. The set \(\mathcal S\) is a structure set that determines the topology of the underlying communication network. In the current state of the art, most synchronous generators operate under completely decentralized feedback from their own speed measurements only. Thus, in today's power grid, the structure set \(\mathcal S\) is reflected in K as

$$\begin{aligned} K=\left[ \begin{array}{ccccc} K_1 & 0 & 0 &\cdots & 0 \\ 0 & K_2 & 0 & \cdots & 0\\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 &\cdots & K_n \end{array}\right] ,\;\; K_i=\left[ \begin{array}{ccccc} 0&1&0&0&0 \end{array} \right] \end{aligned}$$
(13)

where \(K_i\) is the PSS controller gain for the ith generator, \(i=1,\ldots ,n\), whose state vector follows from (6). However, as pointed out before, decentralized feedback can damp only the high-frequency oscillations in the power flows. Its impact on inter-area oscillations, which typically fall in the range of 0.1–2 Hz, is usually small [20]. Inter-area oscillations arise when the various coherent clusters in the power network oscillate against each other following a disturbance. If left undamped, they can result in catastrophic failures in the grid. In fact, both the 1996 blackout on the west coast and the 2003 blackout on the eastern grid of the United States were caused in large part by the lack of communication between generators resulting from the decentralized structure of K in (13). Triggered by the outcomes of these events, over the past few years, utility companies have gradually started moving away from the structure in (13) to a slightly more global structure of the form

$$\begin{aligned} K=\left[ \begin{array}{ccccc} K^1 & 0 & 0 &\cdots & 0 \\ 0 & K^2 & 0 & \cdots & 0\\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 &\cdots & K^r \end{array}\right] \end{aligned}$$
(14)

where \(K^i \in \mathbb R^{n_i \times 5n_i}\), \(n_i\) being the number of generators within the operating region of the ith utility company, and r being the total number of such companies. All the elements of the block matrix \(K^i\) may be nonzero, meaning that the generators inside an area communicate their state information with each other to compute their control signals. The resulting controller K is block diagonal and, therefore, a better wide-area controller than (13), since here at least the local generators are allowed to interact. An ideal wide-area controller, however, would be one whose off-block-diagonal entries are nonzero as well, meaning that generators across the operating areas of different companies are allowed to exchange state information. In the worst case, K can be a standard LQG controller and, therefore, a dense matrix with every element nonzero, which means that the communication topology is a complete graph. In reality, however, all-to-all communication can be quite expensive due to the cost of renting communication links in the cloud [28], if not altogether unnecessary. Papers such as [10, 29] have proposed various graph sparsification algorithms based on \(l_1\)-optimization to develop controllers that require far fewer communication links without losing any significant closed-loop performance. Papers such as [9], on the other hand, have proposed the use of modal participation factors between generator states and the inter-area oscillation modes to promote network sparsity in a more structured way. Papers such as [30,31,32] have proposed various projection and decomposition-based control designs by which a significant portion of the communication network admits a broadcast-type architecture instead of peer-to-peer connectivity, thereby saving on the number of links. The main challenge in implementing these types of wide-area controllers, however, is the fact that utilities are still reluctant to share their PMU measurements and state information with other utilities, and therefore prefer to stick to the block-decentralized structure of (14). Besides these network-centric approaches, other control designs based on more traditional approaches such as adaptive control [7], robust control [3], and hybrid control [8] have also been proposed for wide-area oscillation damping in the presence of various model and operational uncertainties.
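
As a concrete illustration of the role of the structure set \(\mathcal S\), the following Python sketch computes the dense LQR gain for (12) on a toy random system and then projects it onto a two-area block-diagonal mask in the spirit of (14). The system matrices, block sizes, and the naive truncation step are all illustrative assumptions; the sparsity-promoting algorithms of [10, 29] solve the structured problem far more carefully.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)
n, m = 10, 2                               # toy sizes: two areas of 5 states, 1 input each
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)

# Dense (all-to-all) LQR gain, u = -Kx
P = solve_continuous_are(A, B, Q, R)
K_dense = np.linalg.solve(R, B.T @ P)

# Block-diagonal structure in the spirit of (14): each input sees only its own area
mask = np.zeros_like(K_dense)
mask[0, :5] = 1.0                          # input of area 1 uses states of area 1
mask[1, 5:] = 1.0                          # input of area 2 uses states of area 2
K_block = K_dense * mask                   # naive projection onto the structure set

for name, K in (("dense", K_dense), ("block-diagonal", K_block)):
    worst = np.linalg.eigvals(A - B @ K).real.max()
    print(f"{name:15s} max Re(closed-loop eig) = {worst:+.3f}")
```

Enforcing structure by simple truncation can destabilize the closed loop, which is precisely why the structured design must be posed as a constrained optimization over \(\mathcal S\) rather than as a post-hoc projection.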

Several questions arise naturally from the design problem stated in (12). For example,

  1.

    In practice, given the size and complexity of any typical grid, the exact values of the matrices A and B are highly unlikely to be known perfectly. Moreover, the entries of these matrices can change from one event to another. Therefore, a single dedicated model created from a one-time system identification may not be suitable. Thus, one pertinent question is—how can one extend the model-based design in (12) to a more measurement-based approach where online PMU measurements from different parts of the grid are used to estimate the small-signal model of the grid, preferably in a recursive way, based on which the control signals can be updated accordingly? This estimation should preferably also be carried out in a distributed way over the sparse communication topology generated from the controller. While traditional notions of adaptive control can be highly useful here, the speed of estimation may suffer if the entire model needs to be estimated. Newer ideas from reinforcement learning, adaptive dynamic programming, and Q-learning can be useful alternatives in this case, as they tend to optimize the objective function directly, bypassing the need to estimate the model.

  2.

    Another question is—is it necessary to base the design of the wide-area controller u(t) on the entire state-space model (10)–(11), or does it suffice to design it using a reduced-order model only? For example, recent papers such as [30] have shown that one can make use of singular perturbation based model reduction for designing u(t) in a highly scalable way for consensus networks using a hierarchical control architecture. This is especially true if the grid exhibits spatial clustering of generators due to coherency. Therefore, following [30], one idea can be to estimate a reduced-order network model, where every network node represents an equivalent generator standing in for an entire cluster, design an aggregated control input for each equivalent generator, and then broadcast the respective inputs to every generator inside that cluster for actuation. Open questions for these types of designs include proofs of stability, sensitivity of closed-loop performance to estimation and model reduction errors, derivation of numerical bounds for closed-loop performance as a function of the granularity of model reduction, and so on.

  3.

    The third question is—how can one robustify the wide-area controller (14) against the typical uncertainties in both the physical layer and the communication layer? Uncertainties in the physical grid model, for example, can easily arise from the lack of knowledge of the most up-to-date model parameters in the set \(\alpha _i\) in (6). With increasing renewable penetration and the associated intermittency in generation profiles, operational uncertainties are gradually increasing in today's grid models [33]. Similarly, significant uncertainties will exist in the cyber-layer models as well, including uncertainties due to delays, congestion, queuing, routing, data loss, synchronization loss, quantization, etc. A natural concern, therefore, is—how can the design in (14) be made aware of these uncertainties so as to optimize the closed-loop performance of the entire CPS grid?

Many other CPS-centric design and implementation questions related to scalability, centralized versus distributed implementation, speed of computation, big data analytics in the loop, and codependence of (14) on other state estimation and control loops (for example, those using SCADA) can also arise. All of these questions deserve dedicated attention from researchers with backgrounds in control theory, power systems, signal processing, machine learning, computer science, communication engineering, economics, and information theory. In the following sections, we highlight several of these CPS research challenges for wide-area control.

4 Cyber-Physical Implementation of Wide-Area Controllers

Once designed, wide-area controllers such as (12) need to be implemented in a distributed way by transmitting outputs measured by PMUs over hundreds of miles across the grid to designated controllers at the generation sites. Depending on the number of PMUs, the rate of data transfer can easily become as high as hundreds of terabytes per second. The timescale associated with taking these control actions can be in fractions of seconds. Therefore, controlling latencies and data quality, and maintaining high reliability of communication are extremely important for these applications. The media used for wide-area communications are typically longer-range, high-power radios or Ethernet IP-based solutions. Common options include microwave and 900 MHz radio solutions, as well as T1 lines, digital subscriber lines (DSL), broadband connections, fiber networks, and Ethernet radio. The North American Synchrophasor Initiative (NASPI), which is a collaborative effort between the U.S. Department of Energy (DOE), the North American Electric Reliability Corporation (NERC), and various electric utilities, vendors, consultants, federal and private researchers, and academics, is currently developing an industrial-grade, secure, standardized, distributed, and scalable data communications infrastructure called NASPInet to support synchrophasor applications in North America. NASPInet is based on an IP Multicast subscription-based model. In this model, the network routing elements are responsible for handling the subscription requests from potential PMU data receivers as well as the actual optimal path computation, optimization, and recomputation and rerouting when network failures happen. A schematic diagram of NASPInet is shown in Fig. 1. An excellent survey of NASPInet can be found in [34].

Fig. 1 Architecture of NASPInet [23]

The implementation of the state estimator (7) and the controller (12) in NASPInet may be done in the following way. PMU measurements \(y_i(t)\) from each utility company i, as shown in Fig. 1, are first gathered in a local Phasor Data Concentrator (PDC) at the local substation using a local-area network, assuming that the PDC is located at a reasonably close geographical distance from all the PMUs in that service area. The generator bus variables \(z_i(t)\) are computed from \(y_i(t)\) using Kirchhoff's laws at this PDC, and the decentralized state estimator (7) may be run to generate \(\hat{x}_i(t)\). This state estimate then enters the NASPInet data bus through a Phasor Gateway (PGW). The PGW has high levels of security encryption so as to prevent the flow of any malicious data between the local PDC and the data bus. The state estimates are then communicated via standard Internet protocols to the designated controllers of other generators so that they can be used for computing the control signal \(u_j(t)\) for the jth generator, which finally gets actuated using the excitation control system located at that generator.

Fig. 2 Wide-area control using a cloud-in-the-loop architecture

A slightly different cyber-physical architecture for implementing these types of controllers has recently been proposed in [21]. A similar distributed communication architecture for open-loop oscillation estimation using PMU data is also presented in [35]. This architecture, shown in Fig. 2, is very similar to NASPInet except that here the estimation of the states and the computation of the control signals are done neither at the PDC nor at the generator, but entirely inside a cloud computing network. For example, PMU measurements from inside a service area are still gathered at its local PDC, but this PDC does not generate the state estimate. It rather ensures that all the measurements are properly synchronized with respect to each other, that their measurement noise is within acceptable limits, and that the measurements do not contain any bad data due to GPS errors or errors in the phase-locked loops inside the PMUs. The PDC then relays all the measurements to a local service-based private cloud owned by the utility company, wherein they are gathered in a virtual computer or Virtual Machine (VM), created on the fly using the available computation resources in the cloud. The measurements are digitally represented as a periodic stream of data points. The geographical location of the VMs can be chosen close to that of the generators in that area so that the latency of PDC-to-cloud communication is small. The state estimator (7) can then be employed in this VM. The local clouds themselves are connected to each other through an Internet of clouds, as shown in the figure. VMs in every local cloud, depending on the sparsity structure demanded by the controller \(u(t)=K\hat{x}(t)\), are connected to other remote VMs through an advanced, secure, third-party wide-area communication network such as SDN, an example of which can be Internet2. The VMs can then exchange their estimated states \(\hat{x}_i(t)\) and compute their control signals \(u_i(t)\) through this network in a completely distributed way.

In reality, depending on the number of PMUs inside any service territory, the local cloud of a utility company may have multiple VMs, each of which receives a designated chunk of the local PMU measurements from the local PDC, as shown in the figure. These VMs communicate with their neighboring VMs inside the cloud as well as with those across other clouds for exchanging PMU data, and for computing control signals via predetermined sparse feedback control laws such as (14). The control signals are thereafter transmitted back from the local cloud to the actuators of the corresponding generators inside their respective service regions. The resulting system is referred to as a cloud-in-the-loop wide-area control system. End-to-end delay specifications for wide-area control using this type of wide-area communication have been presented in [22]. Two main advantages of this architecture compared to NASPInet are that all the computations are done inside a third-party cloud, thereby preserving the privacy of PMU data from direct exchange between the utilities, and that the communication between the clouds no longer has to be based on standard Internet protocols but can rather use more advanced networking technologies such as SDN with additional layers of network controllability.

For either architecture, the shared communication network will have delays arising from routing and queuing, besides the usual transmission delays due to the geographical distance between the VMs. Three classes of delays may be defined, namely—small delays \(\tau _s\) that arise due to message queuing inside any virtual machine, thereby delaying the availability of the measured state variable assigned to that machine for computing the corresponding control signal; medium delays \(\tau _m\) that arise due to communication between two virtual machines that are part of the same local cloud; and large delays \(\tau _l\) that arise due to communication between any two virtual machines that are located in two different local clouds. The stochastic end-to-end delay experienced by messages in an Internet-based wide-area communication link can be modeled using three components:

  1.

    The minimum deterministic delay, say denoted by m,

  2.

    The Internet traffic delay with Probability Density Function (PDF), say denoted by \(\phi _{1}\), and

  3.

    The router processing delay with PDF, say denoted by \(\phi _{2}\).

The PDF of the total delay at any time t can then be written in terms of these three components as

$$\begin{aligned} \phi (t) = p\, \phi _{2}(t)+(1-p)\,\int _0^t \phi _{2}(u)\phi _{1}(t-u)du,~~~t \ge 0. \end{aligned}$$
(15)

Here, p is the probability that the path is in an open period with no Internet traffic. The router processing delay can be approximated by a Gaussian density function

$$\begin{aligned} \phi _{2}(t) = \frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{(t-\mu )^2}{2\sigma ^{2}}}, \end{aligned}$$
(16)

where \(\mu > m\). The Internet traffic delay is modeled by an alternating renewal process, with an exponentially distributed closure period when the Internet traffic is on. The PDF of this delay is given by

$$\begin{aligned} \phi _{1}(t) = \lambda e^{-\lambda t}, \end{aligned}$$
(17)

where \(\lambda ^{-1}\) models the mean length of the closure period. The Cumulative Distribution Function (CDF) of this delay model can then be derived as

$$\begin{aligned} P(t)&= \int _{-\infty }^{t}\phi (s)ds = \frac{1}{2}\left[ \mathrm {erf}\left( \frac{\mu }{\sqrt{2}\sigma }\right) +\mathrm {erf}\left( \frac{t-\mu }{\sqrt{2}\sigma }\right) \right] \nonumber \\&\quad + \frac{(p-1)}{2}e^{(\frac{1}{2}\lambda ^2\sigma ^2+\mu \lambda )}e^{-\lambda t}\left[ \mathrm {erf}\left( \frac{\lambda \sigma ^2+\mu }{\sqrt{2}\sigma }\right) +\mathrm {erf}\left( \frac{t-\lambda \sigma ^2-\mu }{\sqrt{2}\sigma }\right) \right] . \end{aligned}$$
(18)

Random numbers drawn from this CDF can be used for simulating delays of the form \(\tau _s\), \(\tau _m\), and \(\tau _l\), as defined above. The challenge, however, is to translate these single-path models to multipath, multi-hop, shared network models where background traffic due to other applications may pose serious limitations on latencies. Recent references such as [36, 37] have provided interesting theoretical tools such as Markov jump processes, Poisson processes, multi-fractal models, and Gaussian fractional sum-difference models for modeling delay, packet loss, queuing, routing, load balancing, and traffic patterns in such multichannel communication networks.
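
Since (15) is a mixture of the router delay alone (with probability p) and the router delay plus an exponential traffic delay, random delays following the CDF (18) can be drawn by sampling the mixture directly, as in the sketch below. The numerical parameter values are illustrative assumptions.

```python
import numpy as np

def sample_delays(size, p, mu, sigma, lam, m, rng=None):
    """Draw end-to-end delays from the mixture model (15)-(17).

    With probability p the delay is the Gaussian router delay (16) alone;
    otherwise an exponential Internet traffic delay (17) is added to it.
    Samples are floored at the minimum deterministic delay m as a guard.
    """
    rng = rng or np.random.default_rng()
    router = rng.normal(mu, sigma, size)          # phi_2, router processing delay
    traffic = rng.exponential(1.0 / lam, size)    # phi_1, Internet traffic delay
    open_path = rng.random(size) < p              # open period: no traffic on the path
    delay = np.where(open_path, router, router + traffic)
    return np.maximum(delay, m)

# Illustrative numbers, in milliseconds: could represent a "medium" delay tau_m
tau = sample_delays(100000, p=0.3, mu=20.0, sigma=4.0, lam=0.1, m=5.0)
print(f"mean = {tau.mean():.1f} ms, 99th percentile = {np.percentile(tau, 99):.1f} ms")
```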

5 Cyber-Physical Codesigns

The next question is—how can the controller in (12) be codesigned with the information about \(\tau _s\), \(\tau _m\), and \(\tau _l\)? The conventional approach is to hold the controller update until all the messages arrive at the end of the cycle. However, this approach may result in poor closed-loop performance, especially for damping of the inter-area oscillations. In recent literature, several researchers have looked into delay mitigation in wide-area control loops [38,39,40,41], including the seminal work of Vittal and coauthors in [3], where \(\mathcal H_\infty \) controllers were designed for redundancy and delay insensitivity. All of these designs are, however, based on worst-case delays, which makes the controller unnecessarily conservative and may degrade closed-loop performance. In [21], this problem was addressed by proposing a delay-aware wide-area controller, where the feedback gain matrix K was made an explicit function of the delays using ideas from arbitrated network control theory [42].

For example, considering the three delays \(\tau _s\), \(\tau _m\), and \(\tau _l\) defined in the previous section, one may update the control input at any VM as new information arrives instead of waiting till the end of the cycle. If tweaking the protocols is difficult, then an alternative strategy is to estimate upper bounds for the delays using real-time calculus [43, 44]. This approach is referred to as arbitration, which is an emerging topic of interest in networked control systems. Based on how message deadlines are handled, one may define two modes for the delays—namely, nominal and overrun. If the messages meet their intended deadlines, they are denoted as nominal. If they do not arrive by that deadline, they are referred to as overruns. Defining two parameters \(\tau _{th1}\) and \(\tau _{th2}\) such that \(\tau _{th1} \le \tau _{th2}\), one may define nominal, skip, and abort cases for computing the wide-area control signal as follows:

  • If the message has a delay less than \(\tau _{th1}\), we consider the message as the nominal message of the system and no overrun strategy will be activated.

  • If the message suffers a delay greater than \(\tau _{th1}\) and less than \(\tau _{th2}\), the message will be computed; however, the computations of the following message will be skipped.

  • If the message suffers a delay greater than \(\tau _{th2}\), the computation of the message will be aborted, and the message is dropped. This strategy is motivated by the assumption that such messages are significantly delayed and no longer useful.

Accordingly, a feasible way to formulate the execution rules is: (1) Abort and Skip: if \(\tau _{th1} \le \tau _{th2} \le \tau _{wcet}\), where \(\tau _{wcet}\) is the worst-case delay in the network, both abort and skip can happen; (2) Abort Only: if \(\tau _{th1}=\tau _{th2} < \tau _{wcet}\), messages will be dropped if they miss their first deadline; and (3) Skip Only: if \(\tau _{th1} \le \tau _{wcet}\) and \(\tau _{th2} \ge \tau _{wcet}\). One idea is to set \(\tau _{th2} = \tau _{wcet}\) and develop a constructive strategy to determine \(\tau _{th1}\). This step can be a significant improvement over conventional network control designs in terms of both closed-loop performance and resource utilization.
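
A minimal sketch of this nominal/skip/abort arbitration logic is shown below; the message representation and the threshold values are assumptions made purely for illustration.

```python
from enum import Enum

class Action(Enum):
    NOMINAL = "compute normally"
    SKIP = "compute, then skip the next message"
    ABORT = "drop the message"

def arbitrate(delay, tau_th1, tau_th2):
    """Classify a message by its measured delay, following the rules above."""
    assert tau_th1 <= tau_th2, "thresholds must satisfy tau_th1 <= tau_th2"
    if delay < tau_th1:
        return Action.NOMINAL
    if delay < tau_th2:
        return Action.SKIP
    return Action.ABORT

# Illustrative thresholds in ms; setting tau_th2 = tau_wcet, as suggested above,
# leaves tau_th1 as the single design knob.
for d in (8.0, 25.0, 80.0):
    print(f"delay {d:5.1f} ms -> {arbitrate(d, tau_th1=15.0, tau_th2=50.0).value}")
```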

Fig. 3 Discrete-time delays \(\tau _1<\tau _2<\tau _3\). The vector indicated next to the arrow represents the state used to compute input \(u_{ij}\) (reproduced from [21] ©2014 IEEE and used with permission)

To justify this approach, we cite an example from the control design presented recently in [21]. The sampling interval h of the PMUs between two consecutive control updates was broken down into three smaller intervals at which the inputs were updated as new measurements arrived, as shown in Fig. 3. If any state is unavailable, it is replaced by its predicted value. For the first generator, for example, the input \(u_1(k)\) is further divided into \(\begin{bmatrix}u_{11}(k)&u_{12}(k)&u_{13}(k)\end{bmatrix}\), where \(u_{ij}(k)\) denotes the input of the ith generator adjusted using the measurements of the jth generator. Repeating the same logic for all generators, it was shown in [21] that the discrete-time model of the system can be written as

$$\begin{aligned} x(k+1) = A x(k) + \sum _{i=1}^n \sum _{j=1}^{m(i)} B^i_{j1} u_{ij} (k)+ \sum _{i=1}^n B^i_{i2} u_{i k(i)} (k-1), \end{aligned}$$
(19)

where m(i) denotes the number of times that the inputs are updated in each generator, and k(i) is the index of the largest delay, or equivalently as

$$\begin{aligned} x(k+1)= A x(k) + B_2 u(k-1) + B_1 u(k). \end{aligned}$$
(20)

In other words, the excitation controller needs feedback from the current state samples as well as the past input samples to stabilize the closed-loop swing dynamics under communication delays. Open problems are to derive the equivalent expressions of u(k) for the various typical protocols used for wide-area communication, to develop tractable and scalable ways of tuning the control gains to guarantee closed-loop stability and performance while promoting sparsity in the network structure as indicated by the set \(\mathcal S\) in (14), and, most importantly, to validate these communication and control laws using realistic cyber-physical testbeds. These points are explained in the following sections.
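
For tuning and stability checks, the delayed model (20) can be recast in delay-free form with the standard augmentation \(z(k) = [x(k);\, u(k-1)]\), as in the sketch below. The matrices, dimensions, and gain are illustrative assumptions, not the design of [21].

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2                                   # toy dimensions
A = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
B1 = 0.1 * rng.standard_normal((n, m))        # effect of the current input u(k)
B2 = 0.1 * rng.standard_normal((n, m))        # effect of the delayed input u(k-1)
K = 0.1 * rng.standard_normal((m, n))         # an arbitrary static gain, u(k) = K x(k)

# Augmentation: z(k) = [x(k); u(k-1)] turns (20) into z(k+1) = A_cl z(k)
A_aug = np.block([[A, B2], [np.zeros((m, n)), np.zeros((m, m))]])
B_aug = np.vstack([B1, np.eye(m)])
K_aug = np.hstack([K, np.zeros((m, m))])      # a full design could also weight u(k-1)
A_cl = A_aug + B_aug @ K_aug                  # closed loop under u(k) = K x(k)

rho = max(abs(np.linalg.eigvals(A_cl)))
print(f"closed-loop spectral radius = {rho:.3f} ({'stable' if rho < 1 else 'unstable'})")
```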

6 SDN, NFV, and Cloud Computing

As mentioned earlier, the Internet typically cannot provide the required latency and packet loss performance for grid operation under high data rates. Moreover, the network performance is highly random and, therefore, difficult to model accurately. That is why the cloud-based WAMS architecture proposed in Fig. 2 is currently garnering a lot of attention from power system engineers. However, limited studies have been conducted so far to leverage all possible benefits of cloud computing, Software-Defined Networking (SDN), and Network Function Virtualization (NFV) to accelerate this development [24]. With the recent revolution in networking technology, these new communication mechanisms can open up several degrees of freedom in programmability and virtualization for computation and communication platforms. However, customized SDN control and protocols, and sufficient experimental validation using realistic testbeds, are still missing in almost all wide-area control applications.

Latency and data loss rate are important factors in the performance of all wide-area control and protection applications. Software packages such as the Real-Time Dynamics Monitoring System (RTDMS), Phasor Grid Data Analyzer (PGDA), and GridSim are used for online oscillation monitoring using synchrophasors. A list of related open-source software can be found in [45]. These simulation engines need to be integrated with executable actions so that results from the monitoring algorithms can be exported to a custom SQL database that can be set to trigger alerts or alarms whenever damping levels of oscillatory modes fall below prespecified thresholds. These alarm signals need to be communicated to the operator through a reliable communication network so that the operator can take manual actions to bring the damping back to acceptable levels [46]. In recent years, simulation platforms such as ExoGENI-WAMS [47] have been developed to emulate such communication platforms. Here, the computation and communication planes are shifted entirely away from the physical infrastructure, similar to the architecture proposed in Fig. 2. Another example of a CPS simulator is GridSim [48]. The data delivery component in this simulation platform, also referred to as GridStat, is a publish-subscribe middleware that allows for encrypted multicast delivery of data. GridStat is designed to meet the requirements of emerging control and protection applications that require data delivery latencies on the order of 10–20 ms over hundreds of miles with extremely high availability.

Similar to GridStat, the VMs in any cloud-in-the-loop CPS simulator may be organized into two communication planes, namely a data plane and a management plane. The data plane is a collection of Forwarding Engines (FEs) designed to quickly route received messages on to the next VMs. The FEs are entirely dedicated to delivering messages from publishers to subscribers. Routing configuration information is delivered to the FEs from the management plane. The forwarding latency through an FE implemented in software is generally on the order of \(100\,\upmu \text {s}\), and with network processor hardware, it is less than \(10\,\upmu \text {s}\). The management plane, on the other hand, is a set of controllers, called Quality-of-Service (QoS) brokers, that manage the FEs of the data plane for every VM. The QoS brokers can be organized in a hierarchy to reflect the natural hierarchy in the physical infrastructure of the grid model. When a subscriber wishes to receive data from a publisher, it communicates with a QoS broker, which designs a route for the data and delivers the routing information to the relevant FEs and VMs, creating the subscription.
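
To make the data-plane/management-plane split concrete, the toy publish-subscribe sketch below shows a QoS broker installing a route into a chain of forwarding engines. All class and method names here are our own illustrative constructions, not GridStat's actual API.

```python
from collections import defaultdict

class ForwardingEngine:
    """Data plane: relays each topic's messages to its configured next hops."""
    def __init__(self, name):
        self.name = name
        self.routes = defaultdict(list)       # topic -> list of next hops

    def deliver(self, topic, payload):
        for hop in self.routes[topic]:
            hop.deliver(topic, payload)

class Subscriber:
    def __init__(self, name):
        self.name = name

    def deliver(self, topic, payload):
        print(f"{self.name} received {topic}: {payload}")

class QoSBroker:
    """Management plane: computes a route and installs it into the FEs."""
    def subscribe(self, topic, path, subscriber):
        for fe, nxt in zip(path, path[1:] + [subscriber]):
            fe.routes[topic].append(nxt)

# Illustrative subscription: PMU data for area 1 flows through two FEs to a VM
fe1, fe2 = ForwardingEngine("FE1"), ForwardingEngine("FE2")
vm = Subscriber("wide-area controller VM")
QoSBroker().subscribe("pmu/area1", [fe1, fe2], vm)
fe1.deliver("pmu/area1", {"freq_dev_pu": 2e-4})
```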

However, simulation platforms such as GridStat are just starting points for research. Much more advanced cloud computing and SDN protocols, as well as their emulation software, need to be developed for the future grid. One outstanding challenge is to develop reliable communication software that will enable timely delivery, high reliability, and secure networking for these emulators. Timeliness of messages requires guaranteed upper bounds on the end-to-end latencies of packets. Legacy networking devices do not provide such guarantees, neither for commodity Internet connections nor for contemporary proprietary IP-based networks that power providers may operate on. Moreover, direct communication lacks rerouting capabilities under real-time constraints, and resorts to historical data when communication links fail. A promising solution for both timeliness and link failures can be the idea of Distributed Hash Tables (DHT), which was recently introduced for wide-area control applications in [49].

7 Cybersecurity

Another critical CPS challenge for both wide-area monitoring and control is the issue of cybersecurity. While security is a universally growing concern for applications at every layer of the grid, ranging from distribution grids to power markets, the challenge for WAMS is especially critical since the stakes here are much higher. The integration of cyber components with the physical grid introduces new entry points for malicious attackers. These points are remotely accessible at relatively low risk to attackers compared to physical intrusions or attacks on substations. They can be used to mount coordinated attacks that cause severe damage to the grid, resulting in catastrophic blackouts with billions of dollars worth of economic loss. One eye-opening example of such an attack was Stuxnet [50]. Attacks can also originate in the cyber-layer, triggering cascading events that damage physical facilities and lead to major outages. While mathematical models have been developed to describe electrical faults and device failures, there are far fewer reliable ways of modeling and simulating realistic scenarios for the different types of cyberattacks that can happen in a power grid. Several universities and national laboratories have only recently started developing simulation testbeds to emulate these vulnerability scenarios. Research demonstration events such as Cyber Security Awareness Month, a student-run cybersecurity event in the US, have been introduced by the Department of Homeland Security [51]. Many organizations are working on the development of smart grid security requirements, including the NERC Critical Infrastructure Protection (NERC-CIP) plan, the National Infrastructure Protection Plan (NIPP), and the National Institute of Standards and Technology (NIST). The goal is for power system operators to work with these standards organizations to develop simulation software that can model, detect, localize, and mitigate cyber vulnerabilities in the grid as quickly as possible. A more detailed overview of these methods will be provided in the forthcoming chapter by Wang.

Given the size, complexity, and enormous number of devices present in a typical grid, developing one unique solution for securing the grid from cyberattacks is probably impossible. The attack space alone can easily become huge, ranging from Denial-of-Service (DoS) attacks on communication links, to disabling of physical assets, to data corruption, GPS spoofing, and eavesdropping. Every physical application, whether it be state estimation, oscillation monitoring, Automatic Generation Control (AGC), or wide-area damping control, would need its own way of dealing with each of these attacks. One common solution to make all of these applications more resilient is to switch from centralized to distributed implementation, as alluded to in the previous sections. Although distributed communication opens up more points for an attacker to enter through, it also provides resilience through redundancy. For example, the distributed architecture shown in Fig. 4 was recently used in [52] for the purpose of wide-area oscillation monitoring. The idea was to carry out distributed estimation of the eigenvalues of the small-signal model of a power system using multiple VMs following the cloud-in-the-loop architecture of Fig. 2. If any of the VMs in a local cloud is disabled by a cyberattack, then one option is to quickly assign another estimator to take up the role of the disabled one. An alternative option is to run distributed localization algorithms such as those proposed in [52] to identify the faulty VMs and eliminate them from the cloud.

Fig. 4 Cyberattacks on wide-area monitoring and control (reproduced from [52] ©2018 IEEE and used with permission)

So far, most attack mitigation and localization methods in the literature are geared towards open-loop applications such as state estimation or oscillation estimation. Much more work is needed to extend these methods to closed-loop wide-area control, such as the design in (14). For example, an LQR wide-area controller was designed in [53] using a sparse communication graph \(\mathcal G\). The example was also cited in [54]. The nonlinear model of the IEEE 39-bus power system was simulated with this sparse state-feedback controller using the Power System Toolbox (PST) in Matlab. A fault was induced at \(t=0\), and the small-signal speed deviations of the synchronous generators were recorded, as shown in Fig. 5. The closed-loop system behavior is observed to be stable. At \(t=10\) s, a DoS attack was induced on the communication link connecting generators 1 and 8, which means that these two generators are no longer capable of exchanging state information for their control actions. Instability is noticed immediately, with the frequency swings diverging with increasing amplitude at a frequency of roughly 0.05 Hz, as can be seen in Fig. 5 from \(t=10\) s onwards. At \(t=60\) s, the communication links connecting generators (2, 6) and (3, 6) are added, and the corresponding control gains are recomputed. The system is seen to regain stability, indicating that the attack has been successfully mitigated. The frequency deviations are all seen to converge to zero over time. This example shows the importance of developing formal recovery rules for wide-area controllers in the face of attacks. It also highlights the need for developing effective simulation packages for investigating the impact of attack scenarios. Simulations, in fact, can reveal the most important pairs of generators that must communicate to maintain a stable operating condition before and after an attack, as well as the less important pairs that, either due to large geographical separation or weak dynamic coupling, do not add any significant contribution to stability. Software packages for illustrating other types of attack scenarios such as data manipulation attacks, jamming, eavesdropping, and GPS spoofing also need to be developed [55, 56].

Fig. 5 Time response of generator speeds before, during, and after the DoS attack (reproduced from [54] ©2017 IEEE and used with permission)
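
The qualitative behavior in Fig. 5 can be reproduced in miniature with a small-signal experiment like the one sketched below: compute a nominal gain, zero out the entries carried by the attacked link, check the closed-loop eigenvalues, and then recompute a gain that respects the lost link. The toy system, the choice of zeroed entries, and the crude reweighting used for recovery are all illustrative assumptions, not the IEEE 39-bus design of [53].

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)            # u = -Kx

def max_re_eig(A, B, K):
    return np.linalg.eigvals(A - B @ K).real.max()

rng = np.random.default_rng(2)
n, m = 6, 3                                       # toy system standing in for a grid model
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
K = lqr_gain(A, B, np.eye(n), np.eye(m))
print(f"nominal        : max Re(eig) = {max_re_eig(A, B, K):+.3f}")

# DoS attack: the link carrying states 0-1 to controller 2 goes down
K_attacked = K.copy()
K_attacked[2, 0:2] = 0.0
print(f"after attack   : max Re(eig) = {max_re_eig(A, B, K_attacked):+.3f}")

# Crude mitigation: redesign with heavier state weighting, still without the lost link
K_rec = lqr_gain(A, B, 10.0 * np.eye(n), np.eye(m))
K_rec[2, 0:2] = 0.0
print(f"after recovery : max Re(eig) = {max_re_eig(A, B, K_rec):+.3f}")
```

Whether the truncated or redesigned gain remains stabilizing depends on the plant, which is exactly the kind of question such simulation packages must answer systematically.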

8 Simulation of Wide-Area Controllers

Besides design and implementation, an equally important challenge for WAMS engineers is to develop simulation platforms where extreme-scale power grid models and their associated wide-area monitoring and control algorithms can be simulated for various contingencies against wide ranges of system-level uncertainties. As described in the introduction, the grid itself is changing because of the proliferation of new technologies. New sources of renewable generation that are intermittent and have no rotating inertia are being added rapidly. New types of loads such as electric vehicles and smart buildings are also proliferating. Power electronic converters and controllers are being introduced to connect these new generation sources, loads, and storage devices to the grid. The grid is being overlaid with more PMUs, communications, computers, information processors, and controllers. The challenge is that to fully showcase the capability of any wide-area controller, all these new technologies must be modeled and simulated together. Growth in computing power shows no signs of slowing down, so we do not foresee any limitations on model size or algorithm speed. However, the ability to utilize such powerful simulations depends on how accessible they can be made to engineers. Moreover, power systems today are gradually becoming an integral part of other interconnected infrastructures such as gas, transportation, communication, water, economic, and food chain networks. Thus, interoperability of simulation programs will become key to minimizing the manual effort needed to set up and run co-simulations of these systems with PMU-assisted monitoring and control loops in grid models. We highlight two potential ways by which these co-simulations can be handled.

8.1 Parallel Computing

An obvious approach to speeding up simulations is to utilize parallel or distributed computing. Although specially written programs for particular parallel architectures can provide high speed-ups, the rapidly changing hardware and software make it impractical to keep modifying the simulation programs to keep up. The trend today is to use multiprocessor computers with compilers that can distribute the computation optimally across multiple processors. For wide-area monitoring and control simulations, some applications are much more amenable to parallelization than others. For example, any study that requires running many contingencies can run these hundreds of contingency cases on separate processors, as sketched below. The dynamics of individual generators can be run in parallel, but the network that connects the generators has to be solved simultaneously. It turns out, however, that the algebraic equations representing the network cannot be parallelized very efficiently and become the main bottleneck for speeding up power system simulations. Parallel computing can also be used for gain-scheduling of robust wide-area controllers.
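
Contingency screening is embarrassingly parallel, as noted above. The following sketch distributes cases over local processors with Python's multiprocessing module; the per-case "simulation" is a placeholder random linear model, since in practice each worker would call a dynamic simulator such as PST.

```python
import multiprocessing as mp
import numpy as np

def run_contingency(case_id):
    """Placeholder for one dynamic simulation run (e.g., one line outage)."""
    rng = np.random.default_rng(case_id)                   # deterministic per case
    A = rng.standard_normal((20, 20)) - 1.5 * np.eye(20)   # toy linearized model
    return case_id, np.linalg.eigvals(A).real.max()        # crude stability metric

if __name__ == "__main__":
    with mp.Pool() as pool:                                # one worker per core by default
        results = pool.map(run_contingency, range(200))
    unstable = [c for c, worst in results if worst > 0]
    print(f"{len(unstable)} of {len(results)} contingencies flagged as unstable")
```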

8.2 Hybrid Simulations

In order to implement fast wide-area control strategies, it is highly desirable to have a faster than real-time simulation of the grid model in hand. One promising solution is hybrid mixed-signal hardware emulation. In these emulations, hardware-based digital simulation and analog simulation can be used in an integrative way to achieve a massively parallel and scale-insensitive emulation architecture. There have been several attempts at building such hardware accelerators in the past [57, 58]. For example, grid models can be emulated in hardware by a coupled set of oscillators, resistors, capacitors, and active inductors (a software analog of this oscillator-network idea is sketched after the list below). A higher frequency can be used so as to suit the scale of on-chip elements and permit faster than real-time operation. These elements then need to be built on chip and connected with a customizable switch matrix, allowing a large portion of the grid to be modeled in real time. Researchers have proposed the use of Verilog-AMS models, a hardware modeling language that includes features for Analog and Mixed Signal (AMS) elements [59]. It can incorporate equations to model analog subcomponents. The AMS model can be designed to emulate open-loop models of very large-scale power systems with tens of thousands of buses, with built-in equations for AC power flow, electromagnetic and electromechanical dynamics of synchronous generators and induction generators, AC load models, power oscillation damping control, voltage control, droop control, AGC, and PSS. Several challenges that still stand in the way of developing at-scale faster than real-time simulations are:

  1.

    How to synthesize the transmission network, without unnecessary and unrealistic assumptions and approximations, using state-of-the-art microelectronics design technology?

  2.

    How to develop a scalable mixed-signal emulation architecture capable of emulating large power systems with tens of thousands of nodes?

  3.

    How to design configurable units of emulation on a chip so that any large power system with realistic transmission connections can be realized via software-based configuration?
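
As a software analog of the oscillator-network idea above, the sketch below integrates a linearized network of coupled swing equations, which is essentially the dynamics a mixed-signal chip would emulate at a scaled-up frequency. The ring topology, machine parameters, and initial condition are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy 4-machine ring: M_i * delta_i'' = -d_i * delta_i' - sum_j k (delta_i - delta_j)
n, k = 4, 1.5                                  # machines and uniform coupling strength
M, d = np.ones(n), 0.3 * np.ones(n)
L = 2.0 * k * np.eye(n)                        # graph Laplacian of the ring
for i in range(n):
    L[i, (i - 1) % n] -= k
    L[i, (i + 1) % n] -= k

def rhs(t, y):
    delta, omega = y[:n], y[n:]
    return np.concatenate([omega, (-d * omega - L @ delta) / M])

y0 = np.concatenate([0.1 * np.random.default_rng(3).standard_normal(n), np.zeros(n)])
sol = solve_ivp(rhs, (0.0, 30.0), y0, max_step=0.05)
spread = np.ptp(sol.y[:n, -1])                 # angle spread -> 0 as machines resync
print(f"final angle spread = {spread:.2e} rad")
```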

Research activities are underway on building VLSI chips that can scale to large grid models for faster than real-time emulators, especially directed towards transient wide-area AC emulations. In [58], for example, predictive simulations were shown to be very useful for real-time path rating.

9 Simulation Testbeds

Gaining access to realistic grid models and PMU data owned by utility companies can be difficult due to privacy and nondisclosure issues. More importantly, in many circumstances, even if real data are obtained, they may not be sufficient for studying the detailed operation of the entire system because of their limited coverage. To resolve this problem, several smart grid simulation testbeds have recently been developed to facilitate hardware-in-loop simulation of different grid applications without the need for gaining access to real data. Selected examples of such testbeds in the United States include the CPS testbeds at Washington State University using GridStat [48], the cloud-assisted wide-area control testbed at North Carolina State University using ExoGENI [47], the cybersecurity testbeds at Iowa State University [60], TCIPG at the University of Illinois [61], the DETER-lab testbed at the University of Southern California [62], CPS testbeds at Idaho National Lab, Cornell University, and Pacific Northwest National Labs, and a big data hub at Texas A&M [63]. A comprehensive list of many other smart grid testbeds and their CPS capabilities was recently presented in the survey paper [64]. Two key questions that most of these testbeds are trying to answer are—(1) is it possible to design sufficiently general CPS standards and protocols to support a mass plug-and-play deployment of a wide-area grid without sacrificing reliability, data privacy, and cybersecurity, and (2) if so, then what standards and protocols are required to transform today's grid into an end-to-end enabler of electric energy services?

9.1 Hardware Components

Generally speaking, the physical components of these testbeds comprise Real-Time Digital Simulators (RTDS) and Opal-RT simulators. These are power system simulation tools that allow real-time simulation of both transmission and distribution models with a time-step of \(50\,\upmu \text {s}\). The RTDS comes with its own proprietary software known as RSCAD, which allows the user to develop detailed dynamic models of various components in prototype power systems. The RTDS also comes with digital cards that allow external hardware to interface with the simulation. For example, the Gigabit Transceiver Analog Output (GTAO) card allows the user to view low-level signals proportional to the voltages and currents at different buses of the system in real time. The GTAO card generates voltage and current waveforms, and communicates them to sensors such as relays, circuit breakers, and PMUs. The PMUs measure these signals, and send the resulting digitized phasor data calculations to the PDCs. The PDC time-stamps and collects the data from all the PMUs, and sends them to the server for display and archival, when requested. The hardware and software layers of these testbeds are integrated with each other to create a substation-like environment within the confines of research laboratories. The two layers symbiotically capture power system dynamic simulations as if the measurements were made by real sensors installed at the high-voltage buses of a real transmission substation.

9.2 Cyber and Software Components

The cyber layer, on the other hand, is generally emulated by either a local-area network or a local cloud service. The ExoGENI-WAMS testbed at NC State [47], for example, is connected to a state-funded, metro-scale, multilayered advanced dynamic optical network testbed called the Breakable Experimental Network (BEN), which connects distributed cloud resources in local universities [65]. It allows one to set up dynamic multilayer connections of up to 10 Gbps between different sites. One may simulate different types of disturbance events in power system models in RTDS, collect the emulated grid responses via PMUs and other sensors, communicate these data streams via BEN, and run virtual computing nodes at various sites in the ExoGENI cloud overlaid on top of BEN to execute distributed estimation and control algorithms. Open questions for these types of setups include, for example, where to deploy the computing facilities to better facilitate data collection and processing, and how to design better communication topologies.
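
To give a flavor of the distributed computations such a setup can host, the sketch below runs an average-consensus iteration of the kind used in distributed oscillation monitoring: each virtual computing node holds a noisy local estimate of an inter-area mode frequency and repeatedly averages it with its neighbors' values. The four-node ring topology, the estimate values, and the Metropolis weighting are illustrative assumptions, not a description of the actual ExoGENI-WAMS algorithms.

```python
import numpy as np

# Hypothetical: four cloud nodes, each with a noisy local estimate (Hz)
# of the same inter-area oscillation mode, computed from regional PMUs.
local_estimates = np.array([0.72, 0.69, 0.74, 0.70])

# Communication topology among the virtual nodes (a ring), as adjacency.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# Metropolis weights: doubly stochastic, so the iteration converges to
# the average of the initial estimates on any connected topology.
deg = A.sum(axis=1)
W = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        if A[i, j]:
            W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

x = local_estimates.copy()
for _ in range(50):       # each iteration is one round of messaging
    x = W @ x             # neighbors exchange and average estimates

print(x)                  # all nodes agree near the average, 0.7125 Hz
```

In a real deployment each multiplication by W would be realized by message exchanges over BEN, so communication delays and packet losses directly shape the convergence of the estimate.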

9.3 A Network of Remote Testbeds

One pertinent question is whether these different simulation testbeds at different locations should be conjoined with each other to create a much bigger nationwide network of CPS testbeds. If so, what are the most common challenges for such remote testbed federation? Two important examples are developing usage protocols for diverse sets of users and managing potential safety hazards. Researchers are also contemplating opening their testbeds to the public to accelerate research in the power and CPS community, but a robust economic and ethics model for sharing access to private resources still needs to be developed. Should there be a common centralized simulation testbed for accessing power system models and data, one must also resolve standardization issues, communication issues, maintenance costs, and strategies for sustainability.

9.4 Interoperability: Databases and User Interfaces

One of the major challenges for the users of the hundreds of existing simulation programs is that none of them is compatible with the others. Using two different simulators requires maintaining two different databases of input data and being familiar with two different sets of graphical outputs. This is true not only for different types of applications but also for the same application marketed by different vendors. Thus, in the current state of the art, it is practically impossible to integrate these different simulation programs. The easiest way to encourage interoperability is to standardize the databases. The data that go into these databases are proprietary to the utilities, and if the utilities can agree on using a standard database, the simulation vendors will have to adopt it. In the USA, the National Institute of Standards and Technology (NIST) has been tasked with developing such standards. An earlier standard called the Common Information Model (CIM) is now an IEC standard, and is being adopted at different rates in different countries. A similar effort should be made to standardize user interfaces for wide-area monitoring and control. If the databases and user interfaces were standardized in this way, integrating different simulators, as mentioned above, would become much simpler.
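
As a small illustration of why a common model helps, the snippet below parses a toy CIM RDF/XML fragment with Python's standard library; any simulator that understood the same schema could ingest the same file unchanged. The fragment and namespace URI are simplified illustrations rather than an exact excerpt of the IEC standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIM RDF/XML fragment describing one substation breaker.
# Namespace URIs vary by CIM schema version; these are illustrative.
CIM_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://iec.ch/TC57/CIM-schema-cim#">
  <cim:Breaker rdf:ID="BRK_42">
    <cim:IdentifiedObject.name>Line 7 breaker</cim:IdentifiedObject.name>
    <cim:Switch.normalOpen>false</cim:Switch.normalOpen>
  </cim:Breaker>
</rdf:RDF>"""

NS = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "cim": "http://iec.ch/TC57/CIM-schema-cim#"}

root = ET.fromstring(CIM_XML)
for brk in root.findall("cim:Breaker", NS):
    rid = brk.get(f"{{{NS['rdf']}}}ID")
    name = brk.findtext("cim:IdentifiedObject.name", namespaces=NS)
    print(rid, "->", name)   # BRK_42 -> Line 7 breaker
```

Because every object and attribute name is fixed by the schema, the burden of translation shifts from each pair of simulators to a single shared standard, which is exactly the interoperability argument made above.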

10 Summary and Conclusions

This tutorial serves as a technical invitation for engineers to enter the challenging and attractive research field of wide-area control of power systems. We presented an overview of the main research ideas related to the cyber-physical aspects of this topic, and established the strong dependence of control on various properties of the underlying communication, computing, and cybersecurity infrastructure. Evidently, several challenges must be surmounted to implement the proposed designs, requiring a strong knowledge of stochastic modeling, estimation theory, distributed control, system identification, model reduction, robust control, optimization, and related topics in signal processing and computer science. It is, therefore, our hope that budding and established control theorists alike will view this topic as a rewarding opportunity. We hope that the compelling societal importance of power and energy systems will serve as additional motivation to enter this endeavor.

While this tutorial surveyed the general research landscape of WAMS, the following three chapters present a more detailed treatment of some specific research challenges associated with this technology. A preview of these chapters is as follows.

The chapter by Chakrabortty lists six research challenges that need to be resolved before wide-area control can transition from a concept to a reality. These challenges include: scalability of modeling and control; system identification and online learning of models from streaming or stored PMU data, followed by model validation; wide-area communication and its associated uncertainties and architectural bottlenecks; cost allocation strategies for renting links in wide-area communication networks, where game-theoretic algorithms can play a significant role; cybersecurity of WAMS; and, finally, WAMS simulation testbeds in which real-time interaction between hardware PMUs and software emulations of communication networks can be tested with high fidelity.

The chapter by Wang highlights several research problems on the signal processing aspects of WAMS. It presents an overview of the data-centric challenges that the power industry currently faces as it transitions from SCADA-based monitoring and control to wide-area monitoring and control using gigantic volumes of PMU data. Data quality is an unavoidable concern when incorporating these data into the control room. Because communication networks were not originally designed to carry high-speed PMU data, and because many early-generation PMUs remain in service, data losses and quality degradations happen quite often in practice, and different applications impose diverse requirements on data quality. This chapter provides a detailed list of challenges associated with data quality, and also offers recommendations on how trust scores can be assigned to the data before control actions are taken.

The chapter by Sun addresses the topic of wide-area protection and mitigation of cascading failures using PMU data. In the current state of the art, cascading failures in the grid are prevented by separating the grid into disconnected islands. The chapter proposes controlled islanding based on real-time PMU data, by which system efficiency can still be maintained at an acceptable level. It specifically highlights the three main challenges of when to island, where to island, and how to island. For the last of these, it proposes advanced numerical algorithms for predicting transient instability using PMU data, and lists a number of open challenges on how these predictions can be used to take appropriate control and decision-making actions.