1 Introduction

The power system is one of the most critical infrastructures in modern society, and it has been a target of cyber and physical attacks over the past two decades. It has many interdependent components that work in a coordinated manner for seamless operation. Hence, when any major component fails, the failure can cascade to other components. For better monitoring and security, smart meters, sensors and IoT devices are being integrated into the grid. An objective of the so-called “smart grid” is to use more information in a smarter way to optimize power systems [107]. This integration, however, has made these cyber-physical systems vulnerable on both the physical and cyber sides, which makes security one of their important challenges [124]. Following the attacks that have occurred and been recognized, anomaly or intrusion detection systems are now in demand in the power domain [9]. Anomaly detection systems (ADS) identify events or observations that appear suspicious when compared with the normal behavior of the data.

In this survey, we review a collection of 190 papers covering the power grid architecture, its vulnerable points, and primarily, the range of anomaly detection techniques that have been proposed in the domain to detect exploits. The systematization led us to a range of challenges that are promising avenues for future research to enhance the efficacy of anomaly detection methods while aiding traditional attack detection and response platforms.

Figure 1 shows an overview of this survey’s organization. We begin in Sect. 2 with a discussion of the methodology for carrying out the systematization of knowledge. We then provide relevant background information in Sect. 3, regarding the power grid architecture and its processes (Sect. 3.1), the different attack types and targets in the power grid (Sect. 3.2), and the different locations in the power grid where anomaly detection can be applied, as well as the types of anomalies that can be detected (Sect. 3.3). Next, we motivate the need for detection methods in Sect. 4, by discussing some of the past attacks that have taken place (Sect. 4.1), the new demand manipulation attack (MAD) and how it can be carried out using IoT devices (Sect. 4.2), and lastly the impact of attacks on the power grid (Sect. 4.3). We then dive deeper into the anomaly detection methods proposed for the power grid and discuss the various methods identified in the literature in Sect. 5. Next, in Sect. 6, we discuss the current challenges in the area of anomaly detection in power grids, and highlight potential future directions of research. Lastly, we conclude the paper in Sect. 7.

Fig. 1 Overview of the organization in this survey

Table 1 Comparison of this survey with existing survey papers in the last five years

1.1 Related work and contribution

In this work, we focus on assimilating the different types of anomaly detection methods used in the power grid in a single place. One of the most comprehensive surveys on anomaly detection was done by Chandola et al. which focuses on all types of detection techniques applied to any application domain [29]. Cyber security for smart grid infrastructure and the issues in the smart grid are discussed in [10] and [226]. There are several open threats and attacks on the smart grid that have been identified, with potential solutions including access control, security of network protocols and attack detection methods [67, 94, 98, 148]. With the increasing usage of IoT devices for collecting readings and monitoring the grid states, the security of the smart grid that is aided by IoT devices is discussed in [172].

The papers that survey or review detection techniques in the power grid context typically restrict themselves to either a single technique or a single power grid application. For example, some surveys focus on detection techniques for consumption data [56, 73], multimodal image data [241] or time series data in smart grids [233], whereas we consider any type of data. Mohammadi Rouzbahani et al. focus on machine learning techniques for detection in cyber-physical systems [56]; we consider any kind of detection technique, which may be statistical, rule-based, or graph-based, among others. Unlike [190], where the thematic analysis is carried out on the process data and the features considered for detection, we also consider the paper type, power grid application, attack type and detection type. We compare this survey with similar surveys from the last five years in Table 1. While reviews and surveys have been published in this area, this is, to our knowledge, the first paper that includes a comprehensive review of detection methods across the entire power grid considering different types of data. To this end, our main contributions in this work are as follows: (i) we provide an overview across the different themes identified during the systematic review, (ii) we provide a detailed taxonomy of the detection techniques used in the power grid domain, and (iii) we point out the gaps and challenges that currently exist in this domain.

2 Methodology

The research works used in this review are obtained and organized through a four-step process: (i) database search, (ii) title screening, (iii) abstract screening and (iv) thematic analysis. In this section, we discuss the methodology in detail.

2.1 Database and keyword search

We search Google Scholar for three key phrases and retain the first 100 results for each. The key phrases used are (i) “anomaly detection power consumption”, (ii) “anomaly detection power grid”, and (iii) “anomaly detection smart grid”. We select the option to sort by relevance and to obtain papers from any time range. At the end of this search, we have 300 papers, including work in progress (WiP), journal articles and conference papers. Fifty unique papers occur multiple times across the keyword searches, and each is merged into a single entry. After removing the duplicates, 238 papers remain for the next step of the analysis.

2.2 Inclusion and exclusion criteria

The selection criteria that are applied to the 238 papers are: (i) the paper should be related to any of the power grid processes, (ii) the paper should include anomaly detection or attack detection approaches in the power grid, (iii) the full text should be available through institutional access to the published conferences, journals or workshops, and (iv) the paper is not a WiP work. First, a title screening is done based on these criteria, after which 23 papers are excluded. Next, a thorough abstract screening is performed after which 21 papers are excluded. There are 4 papers that we cannot view through institutional access. A total of 48 papers are excluded leaving us with 190 papers for the analysis.
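The counts reported in Sects. 2.1 and 2.2 can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative and are not part of the original methodology.

```python
# Counts as reported in Sects. 2.1 and 2.2; names are illustrative only.
retrieved = 3 * 100        # three key phrases, first 100 results each
after_dedup = 238          # papers remaining once duplicates are merged
duplicates_removed = retrieved - after_dedup

excluded = {
    "title screening": 23,
    "abstract screening": 21,
    "no institutional access": 4,
}
total_excluded = sum(excluded.values())
final_corpus = after_dedup - total_excluded

print(duplicates_removed, total_excluded, final_corpus)
```

The 62 removed entries are the redundant copies of the 50 papers that appeared in more than one keyword search, and the remaining corpus matches the 190 papers analyzed in this survey.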

2.3 Thematic analysis

We follow a deductive approach to the thematic analysis and use a list of preconceived themes to find the codes in each theme. The themes, sub-themes and codes are listed in Table 2. The themes are identified before starting the analysis so that each paper can be categorized with a code from every theme. In total, 95 codes are created. Once the codes are finalized, sub-themes are identified to provide a hierarchical view of the different types of codes that are observed. For the “Detection technique” theme specifically, the codes are further divided into sub-codes, where each sub-code represents the method used. This is covered in detail in Sect. 5 with a taxonomy of the methods identified. In this section, we discuss the themes and sub-themes. The number of papers categorized into each sub-theme is given in Table 2. The list of works under a specific code is available in Table 3.
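As a rough illustration of the theme, sub-theme and code hierarchy, the following sketch stores one (sub-theme, code) pair per theme for each paper and counts papers per sub-theme, as in the per-sub-theme counts of Table 2. The paper identifiers and the specific code assignments are hypothetical; only the theme and sub-theme names are taken from the text.

```python
from collections import Counter

# Hypothetical coded papers: theme -> (sub_theme, code).
# The identifiers and the "clustering"/"classification" codes are made up.
coded_papers = {
    "paper_A": {"Paper type": ("Methodology", "feature handling"),
                "Detection": ("Unsupervised", "clustering")},
    "paper_B": {"Paper type": ("Methodology", "big data"),
                "Detection": ("Supervised", "classification")},
    "paper_C": {"Paper type": ("Survey", "survey"),
                "Detection": ("Unsupervised", "clustering")},
}

def count_by_subtheme(papers, theme):
    """Count how many papers fall under each sub-theme of the given theme."""
    return Counter(
        coding[theme][0] for coding in papers.values() if theme in coding
    )

paper_type_counts = count_by_subtheme(coded_papers, "Paper type")
detection_counts = count_by_subtheme(coded_papers, "Detection")
```

A theme in which a paper may carry several codes, such as Focus Area, would need a list of pairs per theme rather than a single pair.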

Table 2 Summary of themes, sub-themes, paper count per sub-theme and codes identified in this work; Table 3 provides the list of works under a specific code

2.3.1 Paper type

The Paper Type theme is used to recognize what type of research is conducted in each paper. This theme consists of five sub-themes that are mutually exclusive. The Review, Survey and Evaluation types include papers that conduct a thorough review, survey or evaluation of existing detection techniques, possibly focused on a single area such as machine learning techniques, deep learning algorithms, techniques used for power consumption data, or communication network data. Papers that propose a new method or a variation of existing methods to solve a detection problem in any power grid application are placed under the Methodology type. The Framework sub-theme covers papers that propose a new architecture that is not necessarily implemented.

Within the Methodology sub-theme, the code feature handling is for papers that focus on the treatment and extraction of features from the raw data for anomaly detection [174]; big data is for techniques that are proposed for big data applications; and visual analysis is for methods that focus on detection using visualization techniques.

2.3.2 Focus area

The Focus Area theme is for recognizing the area of contribution of the paper in the power grid domain. A paper may be categorized into multiple focus areas based on the proposed methodology. Targeted Detection is used to categorize papers that are focused on contributing a detection technique while optionally considering other aspects like concept drift, early detection, collective and contextual anomalies that may improve the detection process. When papers consider distributed computing and other edge detection concepts, we categorize them into Big Data and distributed computing sub-themes. As the names suggest, Profiling and Privacy sub-themes are for papers that analyze behavior profiles and propose privacy-preserving techniques respectively. Wide Area Protection is used to represent research that applies to multiple sensors that are located across different power grid processes. Lastly, Blockchain is used to identify works that utilize blockchain technology for detection.

2.3.3 Data type

Different types of data features are used for the detection system depending on the application in the power grid. This theme is used to categorize the papers based on the type of data that is used for the detection process. Generally, streaming data from a single device is used to detect anomalous behavior, but multiple features may also be used when the anomalous behavior is only recognizable from specific combinations of feature states. Additionally, image and video data may be used to monitor the grid and identify faults.

2.3.4 Power grid process

We categorize the detection methods based on where they are applied in the power grid. There are four main processes in the power grid, namely Generation, Transmission, Distribution and Utility. Any detection method that does not apply to one of these four processes is placed under Other, which consists of detection in energy management systems (EMS), software, sensors and supervisory control and data acquisition (SCADA). Within each of the power grid processes, we identify which devices are used for the detection mechanism and categorize the papers under the respective code.

2.3.5 Attack type

The papers here are categorized by the attack type that the detection system is targeted to identify. These attacks can be performed by an adversary or can be operational faults and unknown anomalous behavior. The codes for the attack types are shown in Table 2 and denote the attack names.

2.3.6 Detection

The Detection theme is used to categorize the paper based on the type of detection that is carried out. This can be supervised, semi-supervised, unsupervised or any other category that is not covered by the previous three. The codes specify what type of method is used for the detection process. The details of each detection technique are covered in Sect. 5. A single research paper may use multiple detection techniques, possibly spanning several sub-themes. The list of works utilizing a specific detection technique is available in Table 4.

3 Background

In this section, we discuss the power grid architecture, attack targets in the power grid, and potential application points of anomaly detection at different areas in the power grid.

3.1 Power grid architecture

The power grid is a complex and highly engineered network that coordinates between the generation and distribution of electricity to its customers. Modern power systems have grown into a sophisticated cyber-physical system due to the expansion of their electrical infrastructure and the consequential application of diverse communication and information protocols. The modern power grid consists of two tightly coupled layers: physical and cyber layers. The physical layer is responsible for carrying electricity end-to-end and is the core of the power system. It consists of four major domains: generation, transmission, distribution and consumption, as shown in Fig. 2. Electric power is produced at the generating station and is then transmitted through a high voltage transmission network. From there, it is distributed to the end users which can be industrial or residential customers.

Fig. 2 Physical layer of the power grid; reproduced from [214] under CC BY 3.0

Fig. 3 Types of attacks on the power grid

Over the years, power systems have become more heterogeneous in terms of all these domains. There have been major upgrades such as renewable energy plants, microgrids, and electricity storage. This brings great challenges for the operation of the power system to coordinate between different system components. Hence, efficient schemes are incorporated in the cyber layer that are responsible for ensuring security, protection and monitoring of the power grid.

The cyber layer consists of secondary devices and schemes that are capable of communication, data collection, storage, processing, and decision-making. For modern power systems, the three most critical cyber layer systems are supervisory control and data acquisition (SCADA), the wide area measurement system (WAMS) and the advanced metering infrastructure (AMI). A remote terminal unit (RTU) is used to merge data from local secondary devices like sensors and meters. SCADA systems are industrial control systems responsible for distributed monitoring, control, collection and analysis of real-time data from RTUs. WAMS is an alternative to SCADA that collects data at a higher sampling rate from phasor measurement units (PMUs), which measure and estimate the voltage and current magnitudes and the relative phase angles. AMI is a system placed in the distribution sector that monitors, collects and analyzes the energy usage data of consumers. A large number of smart meters are deployed to collect real-time energy usage data, from which appropriate demand-side control is derived for reliable operations [205].

The smart grid network architecture includes Home Area Network (HAN), Neighborhood Area Network (NAN) and Wide Area Network (WAN) [224]. HAN and NAN are used by the Advanced Metering Infrastructure that includes smart meters, data concentrators and other metering components. Customers can also access their consumption portal using the HAN. NAN is built over the HAN and ensures communication between HANs, data centers and generation sources. On the other hand, WAN is used by wide area monitoring and controlling applications like SCADA and WAMS with limited bandwidth and capacity in closed networks.

3.2 Attack targets

Attacks can target either the physical layer or the cyber layer. Attacks can also target humans, such as power system personnel, or even be performed by them as a result of threats or bribery from attackers. We therefore distinguish three types of attacks on the power grid: physical attacks, cyber attacks and human attacks. Figure 3 summarizes the different attacks possible on a power grid.

3.2.1 Physical attacks

Physical attacks are targeted at the physical components of the power grid. These attacks can be performed in any power grid domain, i.e. generation, transmission, or distribution. The generation process can be affected when an adversary disconnects the generators by remotely switching them off, removing supporting cords from outlets or hindering the connection between the generator and other supporting devices. A transformer failure can be caused by lightning strikes, degradation of electrical insulation, power overload or direct incursion on the transformer. An adversary can interfere with current-carrying devices in the field, which causes lines to trip, or can steal energy by doing so. Lastly, an adversary can manipulate the demand by controlling consumers’ devices, which can result in a chain reaction of the above physical attacks.

3.2.2 Cyber attacks

Cyber attacks are classified, based on the basic requirements of a general cyber network, into attacks against availability, integrity and confidentiality. An attack against availability can cause loss of control of local devices or a delayed response. For example, an attacker can disrupt the communication network by launching a DoS attack and hence delay the operation commands sent to local devices or the measurements sent to the control center. Injecting malware into the network can also affect the availability of the system by taking it offline or damaging existing files.

An attack against confidentiality can cause the leakage of critical information. SCADA devices comprise remote terminal units (RTUs), programmable logic controllers (PLCs) and intelligent electronic devices (IEDs), which do not incorporate authentication or encryption mechanisms and hence are at high risk of being exploited. Also, DNP3, the communication protocol used by most North American utilities, still lacks security features like encryption or authentication. Due to these vulnerabilities, an adversary can gain access to a device or the network and analyze the traffic packets and information exchanged on the network.

An attack against integrity can compromise the data and information communicated in the cyber network, which can severely affect the normal operation of the power grid. While all cyber attacks have negative impacts on the power grid, attacks against integrity are relatively more severe. They are therefore further divided into attacks against measurements and attacks against commands. In an attack against measurements, the attacker manipulates a set of measurements to change the state estimation outcome and mislead the operator into making non-optimal or wrong dispatch decisions. This is achieved either by altering sensor data, leading to a false data injection attack, or by injecting false measurements in transmission and distribution buses, leading to a load redistribution attack. In an attack against commands, fabricated control commands are sent to power grid devices to hinder their operation. This can be achieved by guessing passwords to control systems, replaying network traffic messages to perform specific actions, spoofing, and unauthorized access to remote devices.
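To make the attack against measurements concrete, the following minimal sketch uses a toy model that is an assumption of this example and is not described above: a single scalar state measured by three redundant meters, a least-squares estimator, and a residual check for bad data. Under this assumption, an attacker who shifts all meters consistently (an attack vector of the form a = Hc, where H maps states to measurements) changes the estimate while leaving the residual unchanged, whereas tampering with one meter alone inflates the residual. All readings are made-up values.

```python
def estimate(z):
    """Least-squares estimate of a scalar state x from z_i = x + e_i.

    With H a column of ones, the least-squares solution is the mean.
    """
    return sum(z) / len(z)

def residual_norm(z):
    """Euclidean norm of the measurement residual z - H * x_hat."""
    x_hat = estimate(z)
    return sum((zi - x_hat) ** 2 for zi in z) ** 0.5

z = [1.02, 0.98, 1.00]             # clean readings (illustrative values)
c = 0.5                            # shift the attacker wants in the state

stealthy = [zi + c for zi in z]    # a = H * c: every meter shifted alike
naive = [z[0] + c, z[1], z[2]]     # only the first meter is altered

print(estimate(z), estimate(stealthy))
print(residual_norm(z), residual_norm(stealthy), residual_norm(naive))
```

In this toy setting a detector that thresholds the residual norm would flag the naive manipulation but not the consistent one, which is one reason detection approaches that look beyond the estimator residual are of interest.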

3.2.3 Human attacks

Human attacks occur when power grid personnel intentionally or unintentionally leak critical information or are forced to take detrimental actions. Consumers may also be misled, through false alerts sent to their phones, into performing an action such as switching devices in their houses on or off. Such actions are difficult to prevent and can have a large impact on power grid operation.

3.2.4 Coordinated attacks

Coordinated attacks involve multiple attacks at the same time, or a combination of any of the above three categories of attacks, to cause severe failure, and thus have a high attack success rate. For example, there can be coordinated physical attacks on multiple lines, coordinated cyber attacks against multiple substations, or the physical tripping of a line combined with a DDoS attack on the SCADA network to delay the operator’s response, all of which may lead to a severe failure [218]. A coordinated attack was performed during the 2015 cyber attack on the Ukrainian power grid (detailed in Sect. 4.1.2).

3.3 Anomaly detection

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior [29]. In the power grid domain, anomaly detection is applied to a variety of tasks such as finding abnormal consumption behaviors, identifying compromised field devices, detecting anomalous grid states and line overloads, and detecting attacks. The techniques vary based on the area of application or the objective of the detection. Figure 4 shows the different types of detection systems that are applied based on the detection area in the grid [66, 205]. This model does not imply that a detection technique is applied at a particular physical location. Rather, the distinction is made to understand the different views that the grid operator can combine to detect different anomalous instances in power systems. We discuss each of these types and the assumed threat models below.

Fig. 4 Abstract architecture of a power grid with possible applications of anomaly detection at different levels of data flow

3.3.1 Field anomaly detection

Field anomaly detection focuses on the field components that are important for running the power grid. These components differ in functionality depending on which of the essential power grid processes they are placed in. In the following, we discuss the different components that require anomaly and/or intrusion detection systems and list the types of attacks that such a system is expected to detect.

Remote Terminal Units (RTUs) interface the different physical components, mainly Intelligent Electronic Devices (IEDs), which include transformers, PLCs and circuit breakers, with the SCADA master station. They transmit the data collected from these devices to a data collection system using communication protocols that can be serial or Ethernet-based. Phasor Measurement Units (PMUs) are devices that measure and report the phasor angle and magnitude of the AC voltage or current at a specific location on a power line. These measurements are used for monitoring and analysis of the grid states. They help improve the accuracy of modeling system conditions, predict and detect stress and instability of the grid, predict and manage line congestion, and identify field inefficiencies. PMUs are used to replace traditional SCADA devices, with the benefit that they can provide up to 60 measurements per second compared with one measurement every 2 to 4 s [53]. Phasor Data Concentrators (PDCs) collect data from different PMUs, aggregate and time-synchronize it, and send it to the synchrophasor applications that use the data [135]. Smart meters are used in the distribution and consumption processes to record information like voltage, current, consumption and power factor. They communicate this information to consumers for greater clarity of consumption behavior, as well as to electricity suppliers for system monitoring and customer billing.
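At 60 measurements per second versus one every 2 to 4 s, a PMU delivers roughly 120 to 240 times as many samples as a traditional SCADA device, so aligning the streams matters. The sketch below illustrates the aggregation and time-alignment role of a PDC in a simplified form: readings are grouped by timestamp, and only timestamps for which every expected PMU has reported are kept. The tuple layout, function name and sample values are assumptions of this example; real concentrators also handle wait times, clock tolerance and missing data, which are omitted here.

```python
from collections import defaultdict

def align_by_timestamp(readings, expected_pmus):
    """Group (pmu_id, timestamp, value) readings into per-timestamp frames.

    Only timestamps for which every expected PMU reported are returned.
    """
    expected = set(expected_pmus)
    frames = defaultdict(dict)
    for pmu_id, timestamp, value in readings:
        frames[timestamp][pmu_id] = value
    return {
        ts: frame
        for ts, frame in sorted(frames.items())
        if expected.issubset(frame)
    }

# Illustrative readings; the second timestamp is missing pmu2.
sample = [
    ("pmu1", 0.0, 1.00),
    ("pmu2", 0.0, 1.01),
    ("pmu1", 1 / 60, 0.99),
]
aligned = align_by_timestamp(sample, ["pmu1", "pmu2"])
```

Here only the frame at timestamp 0.0 is complete, so it is the only one passed on to downstream applications.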

All of these devices, which are used at different stages from electricity generation to delivery, play an important role in the reliable transmission and delivery of electricity from end to end. They are crucial for collecting the information required for estimation, operation and monitoring, so an attacker who alters the data coming from them compromises the integrity of that information. Therefore, the detection task in this category is to identify bad measurements and compromised devices at the field level by using the normal behavior patterns of these devices when they are not under attack.

Threat In SCADA systems, PLCs manage specific equipment and run automation tasks in the power grid and hence are a common target for attackers. As PLCs are lightweight devices that perform a specific task based on the code they run, attackers take advantage of their weaker security and change the code to perform malicious tasks. One such attack was accomplished using the Stuxnet worm (discussed in Sect. 4.1.1). IEDs, which are used to manage automation tasks in the power grid, can be manipulated similarly. Inherent vulnerabilities in the encryption and authentication mechanisms of RTUs and PMUs are often exploited by attackers to perform False Data Injection (FDI) attacks. FDI attacks aimed at the physical layer allow attackers to change the measurements that these devices transmit to the monitoring systems. This corrupts the results of state estimation, which can lead the control center to misjudge the grid as being in an emergency state and to take incorrect control actions, thus damaging the economic benefits, monitoring capability and safe operation of the power system [207].

Types of attacks that can be detected False data injection attacks, load redistribution attacks, demand manipulation attacks.
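The field-level task described in this subsection, flagging bad measurements by comparing them with a device's normal behavior, can be sketched with a simple robust statistic. The example below is not a method from the surveyed literature; it assumes a baseline of readings collected when the device is not under attack and uses the median and the median absolute deviation (MAD) to compute a modified z-score, with the commonly used constant 0.6745 and cutoff 3.5. The voltage values are made up.

```python
import statistics

def fit_baseline(normal_readings):
    """Summarize attack-free readings by their median and MAD."""
    med = statistics.median(normal_readings)
    mad = statistics.median([abs(r - med) for r in normal_readings])
    return med, mad

def is_anomalous(reading, baseline, threshold=3.5):
    """Flag a reading whose modified z-score exceeds the threshold."""
    med, mad = baseline
    if mad == 0:
        # Degenerate baseline: any deviation from the median is flagged.
        return reading != med
    score = 0.6745 * abs(reading - med) / mad
    return score > threshold

# Illustrative voltage readings (volts) from a device in normal operation.
normal = [230.1, 229.8, 230.0, 230.3, 229.9, 230.2, 230.0]
baseline = fit_baseline(normal)

flag_typical = is_anomalous(230.2, baseline)   # within normal spread
flag_injected = is_anomalous(245.0, baseline)  # far outside normal spread
```

A per-device threshold of this kind only captures point anomalies; it would not, by itself, catch manipulations that stay within the normal range of each individual device.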

3.3.2 Network anomaly detection

Communication networks are used by SCADA, WAMS and AMI to obtain data from the physical components and transfer it to the management and control centers. Denial of service (DoS) attacks are the most common attacks used by intruders in a network. In fact, a DoS attack was one of many attacks performed in a coordinated manner during the 2016 attack on the Ukrainian power grid. SCADA communication networks are increasingly interconnected with corporate information technology (IT) networks for the collection and processing of data in real time, thus providing greater opportunities for intrusion [144]. With the advent of WAMS and AMI, numerous devices and sensors are connected to this communication network and hence there is a higher chance of intrusion. Since the monitoring systems have permission to change the control codes or the behavior of the physical devices, this is a very critical layer in power grid operations. With AMI, the privacy of the data collected from smart meters can also be compromised. Hence, conventional cyber intrusion detection systems can be applied in this networking environment.

Threat Attackers can intrude into the network and stay undetected for a while in order to gain information about the grid operations, obtain data from the monitoring systems, lock power system personnel out of their systems, guess passwords, perform replay and spoofing attacks, reprogram or send false commands to the underlying devices, or send incorrect information to the management centers.

Types of attacks that can be detected Insider attacks, intrusion attacks, replay attacks, spoofing attacks, DoS/DDoS attacks, information theft.
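As a minimal illustration of network-level detection, the sketch below flags a possible DoS condition when the number of messages observed within a sliding time window exceeds a fixed limit. The class name, window length and limit are hypothetical choices for this example; a practical intrusion detection system would combine many more traffic features.

```python
from collections import deque

class RateMonitor:
    """Sliding-window message counter for a single network endpoint."""

    def __init__(self, window_s, max_messages):
        self.window_s = window_s          # window length in seconds
        self.max_messages = max_messages  # allowed messages per window
        self.times = deque()

    def observe(self, t):
        """Record a message at time t; return True if the limit is exceeded."""
        self.times.append(t)
        # Drop timestamps that have fallen out of the window ending at t.
        while self.times and self.times[0] <= t - self.window_s:
            self.times.popleft()
        return len(self.times) > self.max_messages

monitor = RateMonitor(window_s=1.0, max_messages=3)
flags = [monitor.observe(t) for t in (0.0, 0.2, 0.4, 0.5, 5.0)]
```

In this run the fourth message pushes the one-second window over its limit, while the late message at 5.0 s arrives after the window has emptied.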

3.3.3 Centralized anomaly detection

Control and management centers receive the data that has been collected from the physical components and also have access to global system features. Hence, anomaly detection can utilize numerous features to find general anomalous system behavior, consumption demands from the consumers and bad data measurements from the physical devices. This layer uses a combination of both field and network detection systems, but gives a centralized view of the entire power system to take control decisions. This layer aids in the detection of coordinated attacks on the power grid. Though such attacks are challenging to detect in practice, there is a possibility to detect these attacks by analyzing multiple layers at the same time using the centralized view.

Threat The threat is a combination of field and network threats, which must be analyzed together.

Types of attacks that can be detected Coordinated attacks.
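One simple way to picture the centralized view is as a correlation of alerts from the field and network layers. The sketch below pairs field and network alerts whose timestamps fall within a chosen time window, on the assumption that near-simultaneous alerts in both layers deserve joint inspection as a possible coordinated attack. The alert format, the 60 s window and the example alerts are hypothetical.

```python
def correlate(field_alerts, network_alerts, window_s=60.0):
    """Pair field and network alerts raised within window_s of each other.

    Each alert is a (timestamp_seconds, description) tuple.
    """
    pairs = []
    for f_time, f_desc in field_alerts:
        for n_time, n_desc in network_alerts:
            if abs(f_time - n_time) <= window_s:
                pairs.append((f_desc, n_desc))
    return pairs

# Illustrative alerts from the two lower detection layers.
field = [(100.0, "bad measurement at substation A"),
         (900.0, "bad measurement at substation B")]
network = [(130.0, "traffic burst on SCADA link"),
           (5000.0, "failed login at control center")]

suspected = correlate(field, network)
```

Only the first field alert has a network alert within the window, so a single pair is reported for the operator to examine.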

4 Attacks and impact

There have been numerous attacks targeting the power grid in the past decade. The goal of these attacks has been not just to disrupt the power grid but also to steal electricity and to access confidential documents from the communication networks. Other attacks are possible, such as false data injection attacks, which disrupt the state estimation of the grid. In 2014, an adversary stole and posted plans for two nuclear reactors, as well as the data of 10,000 employees, from the South Korean nuclear and hydroelectric company Korea Hydro and Nuclear Power (KHNP). The Ukrainian power grid was taken down in 2015, cutting power to more than 200,000 households. It was attacked again in 2016, when an electricity substation was disabled, cutting power to customers for an hour. In 2019, the US power grid was attacked for a 10-hour period during which the attacker(s) exploited known vulnerabilities in an internet-connected firewall [198]. The US Department of Energy (DoE) reported 150 successful attacks between 2010 and 2014 that targeted systems holding information regarding electricity grids [110].

SCADA workstations and PLCs have also been a target for successful attacks on industrial control systems that use these components [35, 119]. There have been attacks involving the Stuxnet worm which sabotages industrial equipment controlled by a specific Siemens PLC by modifying PLC code and then hiding changes using rootkits. Such worms can cause severe damage to the underlying physical system [144].

The failure of a power grid can cause direct, permanent damage to equipment. It also causes other infrastructure that depends on the grid to fail. Hence, power grid security has attracted considerable research attention aimed at addressing vulnerabilities and proposing new mechanisms to detect intrusions. We discuss attacks that took place in the past as well as some that have been newly proposed, which together motivate the creation of new anomaly and attack detection methods. Figure 5 shows an overview of the attacks discussed in this section.

Fig. 5 Overview of the attacks discussed in Sect. 4

4.1 Past attacks

In this section, we discuss two cyber attacks that took down targeted critical infrastructure in 2010 and 2015. These attacks were initiated through computers and were aimed at the physical components of the infrastructure.

4.1.1 Stuxnet worm attack

Computer worms are well known for spreading to as many targets as quickly as possible. They are aimed at computer systems to exploit vulnerabilities, like Blaster, which exploited the remote procedure call (RPC) of Windows computers. Some were used to physically impact the system, like Sobig, which flooded mail servers with copies of itself [32]. However, a new worm called Stuxnet was discovered by VirusBlokAda in Iran’s nuclear power plant in July 2010. Unlike traditional worms, which targeted computer systems, Stuxnet was developed to take control of critical physical infrastructure. It is known to have infected approximately 100,000 hosts, with more than 60,000 in Iran and more than 20,000 in Indonesia, India and the USA, according to a Symantec report [55]. This worm gained a lot of attention due to the stealth of the attack, and is known to have been one of the most complex threats. The following steps were carried out to perform the attack using Stuxnet [55]:

  • The attackers conducted reconnaissance; as each PLC is configured uniquely, they would first need the Industrial Control System (ICS) schematics. The documents required to attain knowledge of the computing environment in the facility were either stolen by an insider or retrieved by any other previous version of Stuxnet or other malicious binary.

  • Using the information attained, the attackers would now develop the latest version of Stuxnet with the final goal of potentially sabotaging the ICS.

  • Once the code was ready, the attackers probably used a setup mirroring the target environment to test it.

  • As the malicious binaries consisted of driver files that were needed to be digitally signed, they compromised two digital certificates to achieve this task and appear legitimate.

  • The initial infection vector was a USB stick used to infect a computer within the organization. It was perhaps introduced into the target by a compromised staff member who had access to the facility. The USB stick keeps count and allows only three infections, and the infection running on a target system attempts to spread for only 21 days. These limits were perhaps enforced to maintain the stealth of the attack [32].

  • As soon as a single computer is infected, the worm searches for other Field PGs and copies and executes itself on computers running a WinCC database server. Field PGs are computers used to program or interact with the PLCs and are typically Windows systems. Most of these computers are non-networked and hence cannot be remotely controlled over the Internet. Stuxnet would therefore first try to spread on the LAN through a zero-day vulnerability that allowed infecting Step 7 projects through removable devices. The WinCC/Step 7 software is used by programmers to connect to the PLC and to access its memory contents, reconfigure it, and download or debug a program.

  • All the functionality required to sabotage a system was embedded directly in the Stuxnet executable, which made it complex and unusually large for malware: almost half a megabyte, written in multiple languages. Once it found a suitable computer running Step 7, it modified the code on the PLC. These modifications could not be detected as rogue PLC code because Stuxnet hid them using a rootkit.

4.1.2 Ukraine power grid attack

On December 23, 2015, a Ukrainian regional electricity distribution company reported outages to customers. The outages were due to a third party’s illegal entry into the company’s computer and SCADA systems. Approximately 225,000 customers were affected and lost power due to this incident. Three Ukrainian power suppliers experienced coordinated cyber attacks that were executed within 30 min of each other. Due to the large-scale impact, the suppliers were required to move to manual operations in response to the attack. Several consolidated technical components were used to successfully perform the attack. These are the steps carried out to perform the attack [26]:

  • The attack phases are explained using the Industrial Control System (ICS) Cyber Kill Chain, which details the steps an adversary must follow to perform a high-confidence attack on the ICS processes and cause physical damage to equipment in a predictable and controllable way. It consists of two stages: intrusion and attack.

  • The attackers intruded into the system by weaponizing Microsoft Office documents with embedded BlackEnergy 3 malware and sending them out by email. When a document was opened, a popup encouraged the user to enable macros, which allowed the malware to exploit the macro functionality and install BlackEnergy 3 on the victim’s system.

  • Upon installation, the malware connected to command and control (C2) IP addresses, enabling the adversary to communicate with the malware and the infected systems. These pathways allowed the adversary to gather information from the environment and enable access. The adversary was also able to gain access to the remainder of the systems, including the SCADA dispatch workstations and servers, and to extract the data necessary to formulate a plan for the second stage (ICS attack).

  • For the second stage, the attackers learned how to interact with the distribution management system environments using the native control present in the system and operator screens. The adversary then finished installing malicious software, identified as a modified version of KillDisk, across the environment.

  • The last act of modification was for the adversaries to take control of the operator workstations and thereby lock the operators out of their systems. The adversaries used the Human Machine Interfaces (HMIs) in the SCADA environment to open the breakers. At least 27 substations were taken offline across the three energy companies, impacting roughly 225,000 customers.

  • Simultaneously, the attackers uploaded malicious firmware to the serial-to-Ethernet gateway devices. This ensured that even if the operator workstations were recovered, remote commands could not be issued to bring the substations back online.

  • During this same period, the attackers also launched a remote telephonic denial-of-service attack on the energy company’s call center, flooding it with thousands of calls to ensure that impacted customers could not report outages.

4.2 Demand manipulation attacks

While smart grids are often considered more vulnerable to cyber attacks than traditional power grids, Dabrowski et al. show that a non-smart grid is also vulnerable to coordinated load-changing attacks [41]. This attack is performed by controlling a botnet of devices that can modulate the power demand much faster than power plants can react. Demand manipulation attacks take place at the utility side, where an attacker can manipulate the consumption and operation of consumer devices, most of which are IoT devices that can be remotely controlled [187]. The attacker can also influence the behavior of consumers by sending false messages that mimic the demand response program before, during or after a peak consumption state [154, 155]. An advantage to attackers is that there are multiple entry points for carrying out a demand manipulation attack. Unlike attacks on SCADA systems, a variety of options can be used to manipulate the demand: IoT devices, energy theft, and social engineering as a tool to make people perform a task. Attackers also do not need access to system operations or an understanding of how the grid works. The attack follows a black-box approach that requires no power grid domain knowledge, which makes it an option for a variety of adversaries.

Demand manipulation attacks are the newest kind of attack; they do not test the security of the power grid systems but affect the power grid from the utility side by changing the demand of the consumers. Successful DDoS attacks using IoT devices and botnets have been carried out before (e.g. the Mirai botnet attack), and there are open security issues in IoT devices and protocols. Since power grid devices and consumers cannot be removed from the power grid scenario, yet can be used as tools to perform attacks, they remain an advanced persistent threat to the power grid. Though communication networks and power grid devices are being enhanced in terms of security, the threat from consumers will exist until host-based anomaly and intrusion detection systems are established in each home. We discuss two attack examples, proposed in [187] and [154].

4.2.1 Attack Example 1: Demand manipulation using IoT devices

The poor security measures and ubiquity of IoT devices have been an advantage to attackers for creating botnets to perform DDoS attacks. Access to compromised high-wattage devices can allow an adversary to disrupt the power grid’s normal operation by manipulating the total power demand. Since there are many types of IoT devices in a household, a common question is whether it is possible to get access to all of them. However, by gaining access to home assistants such as Amazon Echo or Google Home, control of such devices is possible [187]. The attack model is described below.

  • The adversary obtains access to various high-wattage IoT devices such as air conditioners, space heaters, and electric ovens from multiple households in the same geographical location.

  • If the target of the attack is the generators, an abrupt increase or decrease in the power demand causes frequency instability, which results in the generators tripping. This is achieved by synchronously switching many high-wattage devices on or off. A black start, the process of restarting a system after a blackout, can also be disrupted by causing frequency instability in the system.

  • When a frequency instability occurs that does not have a significant effect, the primary controller of the generators stabilizes the system frequency. Line failures and cascading failures can still be caused at this stage by increasing the demand: an increase of only 1% can cause an outage in 86% of the loads when simulated on the 2008 Summer peak Polish grid. These lines are sensitive because power transmission follows Kirchhoff’s laws and the grid operator has almost no control after the response of the primary controllers. Line failures can also be caused by redistributing demand, that is, by increasing demand in some places and decreasing it in others.

  • Operating costs increase when the demand rises above the predicted demand, because the operator then needs to purchase additional electric power from reserve generators. Simulations show that a 5% increase in power demand during peak hours results in a 20% increase in power generation costs.

4.2.2 Attack Example 2: Demand manipulation using social engineering

Research in behavioral psychology predicts that people who are lulled into a sense of cognitive ease do not question the validity of information unless it differs significantly from that of previous events [154]. Based on this expectation, Raman et al. report an unconventional mode of attack in which an attacker manipulates consumer behavior through false communications, which could significantly impact system reliability [154]. The attack model is described below.

  • The adversary obtains access to some form of media such as SMS, e-mail or other platforms that could be used to send legitimate communications to the residents.

  • The adversary invites the consumers to participate in an upcoming Demand Response (DR) event, with the time, duration and task specified. At the same time, the attacker also needs to block communications from the utility company.

  • The consumer who receives the message decides whether to accept or reject the DR event request. If the consumer accepts the event, then they take steps either manually or using an automated home energy management system (HEMS) to reduce consumption during the specified event period.

  • The attack can be performed to fake a DR event or be timed before a real DR event takes place. The resulting overshoot in demand due to the fake event would reduce the effect of the consumer response to the actual event. The messages can also fake maintenance shutdown alerts, suggesting that consumers use appliances during high-stress time periods, or follow a legitimate DR message sent by a utility to declare that the event was canceled.

  • For example, the consumers might be asked to schedule washing machines, dryers and dishwashers during a specific time window, such as 7 PM to 9 PM. If these high-wattage appliances are switched on right after a peak demand time, the tap-changing regulators have to respond to the sudden change in system voltage, which affects their performance. Prolonged operation of such appliances can also shorten the life of these transformers, which have to work under high loading conditions. Simulations show that if 50% of consumers believe the fake message, the attacker could alter the system’s daily peak demand by more than 2%, which is quite significant to the utility.

Fig. 6 Taxonomy of anomaly detection algorithms discussed in this work; Table 4 provides the list of works under a specific detection technique

4.3 Attack impact

When attacks affect the physical components, power grid operation is hindered and the consequences are likely to be catastrophic. Any attack that collapses the power grid also has cascading effects on other critical infrastructure, such as water treatment plants and food production industries. Many people will be left without the basic elements necessary to sustain life in urban and suburban communities.

Some believe that the cyber threat to critical infrastructure is exaggerated, because cyber threats to national infrastructure have never caused the loss of human life, injured a person or damaged a building [164]. However, damage to physical equipment requires bringing in new or reserve equipment, which raises operational costs. Power outages cost between $18 and $33 billion per year in the United States. The 2021 Texas power crisis, which was caused by winter storms, resulted in $20.4 billion in excess expenditure, making it the most costly winter event in the US; it affected 10 million people and resulted in more than 150 deaths [129]. Therefore, cyber attacks combined with climate and other external factors can lead to heavy damage and losses to the power grid.

The energy industry is far behind most other industries when it comes to security best practices and system maintenance, as its industrial systems rely on 1970s-era technology that is not upgraded because doing so would interrupt service [140]. The root causes of the increasing number of blackouts are aging infrastructure and a lack of investment and of clear policy to modernize the grid. In addition, electricity demand has grown 10 percent over the last decade, even though there are more energy-efficient products and buildings than ever [130]. Hence, the problem is only getting worse with this combination of legacy systems and increasing demand. The electric power industry spends over $1.4 billion annually to replace electromechanical systems and devices that involve manual operation with new SCADA equipment [213]. This shows that investment in physical infrastructure and security is consistent every year, but progress is slow, which gives attackers time to take advantage of persistent vulnerabilities in the power grid.

4.3.1 Physical impact

As power needs to be delivered to a large number of consumers, it is impractical and costly to provide electric power storage for them. Therefore, stable operation of the power grid relies on a balance between power supply and demand. The demand is usually estimated by the operator based on weather data and historical consumption data. This allows the operator to deploy enough generators to meet the demand beforehand without overloading any power lines. An unexpected increase or decrease in demand changes the frequency of the system, which cannot be tolerated for long since frequencies lower than the nominal value cause severe damage to the generators. An unpredicted supply and demand setting may also result in electric power overload on some of the power lines. Once a line is overloaded, it may be tripped by the protective relay, or it may break due to overheating, which the relay is meant to prevent. Another issue is voltage instability, caused when the generated power becomes inadequate. In such cases, the power flow equations become infeasible, which forces the grid operator to perform load shedding to recover the system from the voltage collapse and make the equations feasible again. This causes outages around the grid due to failures in a few lines. Failure of components such as generators, lines or transformers has a high impact on the operation of the grid. It might even result in a blackout, and when the system tries to recover from a blackout (black start), it is weak and more vulnerable to a repeated attack, which can have adverse effects on the operating frequency.

5 Anomaly detection methods in power grid

Anomaly detection methods are broadly classified into three approaches: supervised, semisupervised and unsupervised [29]. In this summary, each of these categories is further divided into sub-categories based on the detection technique. An overview of the detection techniques discussed is shown in Fig. 6. The list of works utilizing a specific detection technique is available in Table 4.

5.1 Supervised approaches

Supervised anomaly detection is a technique that assumes the availability of labeled training data for both normal and anomalous classes. A typical approach is to build a predictive model and compare unseen data against the model to determine the class it belongs to [29]. In this subsection, we categorize supervised techniques into classification-based and neural network-based models.

5.1.1 Classification

Classification is used to learn a model (classifier) from a set of labeled data instances (training) and then classify a test instance into one of the classes using the learned model (testing) [29]. Several traditional classifiers are used in power grid applications, namely Logistic Regression (LR) [133], Gradient Boosting, Gaussian Naive Bayes (GaussianNB) [2, 37, 142], AdaBoost, XGBoost, Decision Tree (DT) [2, 68], boosting with Decision Tree as base learner (DTBoost), K-Nearest Neighbors (KNN) [72, 159], Multi-layer Perceptron classifier (MLP), Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM) [2, 42, 54, 95, 173], bagging ensemble classifier with decision trees as base learner (DTBagg) and Random Forest classifier (RForest) [2, 95, 137, 142]. The classification techniques are applied to different detection tasks, such as anomalous power consumption, anomalous network traffic, denial-of-service attacks on communication networks, electricity theft, false data injection, intrusion and tripping attacks.
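As a minimal sketch of the train-then-test workflow described above, the following pure-Python k-nearest-neighbors classifier labels a consumption sample as normal or anomalous. The two features (mean and peak load in kW), the sample values and the function name `knn_predict` are illustrative assumptions and are not taken from any of the cited works.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled neighbors.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort the labeled instances by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest instances.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (mean load in kW, peak load in kW) -> class label.
train = [
    ((1.0, 2.0), "normal"),
    ((1.2, 2.1), "normal"),
    ((0.9, 1.8), "normal"),
    ((4.0, 9.0), "anomalous"),
    ((4.2, 8.5), "anomalous"),
    ((3.9, 9.3), "anomalous"),
]

print(knn_predict(train, (1.1, 2.0)))
print(knn_predict(train, (4.1, 9.1)))
```

The sketch needs labeled examples of both classes, which is the defining requirement of the supervised setting; in practice the features would usually be scaled before distances are computed.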

Traditional classifiers are used by Xu et al. to detect anomalies in the running power consumption data of the Distribution Terminal Unit (DTU), as its consumption is strongly correlated with the programs it runs [220]. If the DTU is attacked, the running program is modified slightly, which changes the consumption as well. Two other works also compare different classifiers and different sampling techniques to solve the imbalanced data problem, i.e. the unequal distribution between the classes [109, 149]. Wang et al. perform a similar comparison and propose a novel model in which random forest is used as the base classifier of AdaBoost, followed by weighted voting on the predicted labels to decide the final class [203]. AdaBoost is also used by Qu et al. for detecting electricity theft attacks in residential areas using power consumption data [152]. To provide wide-area protection of the smart grid, Singh and Govindarasu evaluate different classification techniques on sensor data to detect multiple attacks such as FDI, DoS, line tripping and cyber events [181].

Classification can be performed on features that are extracted or modified for better performance of the classifier. Al-Abassi et al. pass the voltage data to a stacked autoencoder to obtain a different representation of the input data and apply random forest for classifying the anomalous data [5]. Ouyang et al. propose a hierarchical time series feature extraction algorithm and an ensemble classification model that uses XGBoost, RForest, and LR [138, 139]. The extracted time-series features are mainly used to find the abnormal sample distribution rules as well as the information from normal power consumption activities. Li et al. propose a blockchain-based anomaly detection approach to detect anomalous consumption in smart grids using KNN [102]. They use data from different smart grid sensors that record environmental factors, and combine it with smart meter data to analyze the power consumption.

Support Vector Machine (SVM) is used in several works from the literature in combination with other methods. Zhang et al. first use an unsupervised approach, Gaussian Mixture Model Linear Discriminant Analysis (GMM-LDA), to label the consumption data, and then use the labeled data to train a Particle Swarm Optimization Support Vector Machine (PSO-SVM). The test data is then classified by the SVM model as normal or abnormal [236]. Another work also uses PSO and one-class SVM (OC-SVM) for the detection process [211]. Wang et al. propose an efficient home power anomaly detection method using SVM and a Hidden Markov Model, with improved monitoring of electricity usage as well as of changes in the daily living activities of residents via the provision of detailed feedback [209]. Wang et al. propose a supervised learning algorithm named Support Vector Machine embedded Layered Decision Tree [206]. First, it segregates the training data set into subsets based on all nominal features, which reduces the dimensionality of the feature space. Then, for each subspace, only the numeric features are considered to run a Decision Tree-based Support Vector Machine. SVM is applied recursively to the tree to detect the anomalies. This detection process is carried out by each agent separately, but the final decision depends on the consensus among all interconnected agents.

5.1.2 Neural network

Artificial and deep neural network models can be useful in learning patterns and generalizing from past data to identify normal and anomalous instances [73]. Feedforward and recurrent neural networks are used for prediction and classification tasks in power grid anomaly detection.

Yuan and Jia detect anomalies in smart meter data in a distributed manner, implementing an IoT-based distributed structure to execute the data interaction [231]. They propose a deep learning approach that uses a stacked sparse autoencoder, a multi-layer neural network consisting of several basic sparse autoencoders followed by a softmax layer for classification. Reuter et al. use a deep feed-forward neural network for classification and a deep autoencoder for the detection of anomalous data in SCADA communication systems [163]. Wang and Ahn use time series modeling in combination with an artificial neural network, SVM and KNN to yield accurate results for the detection of anomalies in residential applications [208]. The artificial neural network is used for consumption prediction and to compensate for the non-linearity that traditional time series models like ARIMA fail to capture.

Wavelet transform (WT) provides a unified framework for signal processing applications. WT can decompose a signal into coefficients, and it can localize anomalous behavior in both the time and frequency domains at different scales. Hence, Ghanbari et al. use WT followed by variance fractal dimension (VFD) to measure the complexity of the coefficients extracted during WT [61]. Additionally, they use a feed-forward artificial neural network to enhance the detection rate of anomalous behaviors over a short attack duration.
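To illustrate how wavelet coefficients localize an abrupt change in time, the sketch below applies one level of the Haar wavelet transform and reports the sample pairs whose detail coefficient is large. The choice of the Haar wavelet, the single decomposition level, the threshold and the sample signal are all assumptions made for illustration; the sketch does not reproduce the VFD or neural network stages of [61].

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists.
    `signal` must have an even number of samples.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def localize_change(signal, threshold):
    """Return (start, end) sample index pairs whose detail coefficient
    magnitude exceeds `threshold`."""
    _, detail = haar_dwt(signal)
    return [(2 * j, 2 * j + 1) for j, d in enumerate(detail) if abs(d) > threshold]


# Hypothetical load signal: flat at 1.0 with a short spike at sample 5.
signal = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0]
print(localize_change(signal, threshold=0.5))
```

Only the pair of samples containing the spike produces a non-zero detail coefficient, which is the time localization property the text refers to; repeating the transform on the approximation coefficients would give the coarser scales.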

Some attacks involve hijacking the controller code that controls the actuators of the physical system, making the control behave abnormally. The PLCs contain the control code that performs a specific task in the power grid. They are relatively stable, as the code running on the controller changes infrequently. The key idea for the detection used by He et al. is that the normal behavior of the controller is predictable using a temporal deep learning model and low-cost Hardware Performance Counters (HPC) features [70]. LSTM and Conditional Restricted Boltzmann Machine (CRBM) are used to predict normal controller behavior, and a reconstruction error distribution of HPCs is used to detect controller anomalies. The squared error between the actual and the predicted behavior is used to indicate the anomalies. LSTM is also used for predicting power consumption data in a supervised manner by training with normal and abnormal samples [199].

Efstathopoulos et al. use operational data and examine whether smart grid attacks can be detected by analyzing them [48]. Operational data is generated by the physical devices that are used to run the power plant, e.g. the temperature of a cooling system. They apply and compare different techniques such as PCA, Isolation Forests (Iforest), Angle-based outlier detection (ABOD), one-class SVM, Stochastic Outlier Selection (SOS) and deep fully connected autoencoders. Wilson et al. use a stacked autoencoder to develop machine learning features against transmission SCADA attacks [215]. The network is first pretrained with a greedy layer-wise pre-training procedure, after which the parameters of the whole deep network are initialized by the corresponding parameters learned. After the training phase, a classifier layer is added to the deep neural network model and the fine-tuning takes place in a supervised manner.

Convolutional neural networks are used for image anomaly detection by actively monitoring fire breakout and working personnel [77]. Supervised classification is also achieved by adding a classifier layer to a neural network [171]. Such a method is used in [22, 43, 234] to identify anomalies in voltage data from the distribution power lines and traffic packets in communication networks using a CNN. Moreover, [234] uses multi-headed attention before classification to capture the multi-dimensional relationship between each packet from the traffic cluster.

5.2 Semisupervised approaches

Semisupervised techniques assume that the training data has labeled instances only for the normal class. Since they do not require labels for the anomaly class, they are more widely applicable. The typical approach used in such techniques is to build a model for the class corresponding to normal behavior and use the model to identify anomalies in the test data [29]. We categorize semisupervised techniques into neural network-based, regression-based, distance-based and statistical models.

5.2.1 Neural network

Pattern matching is the ability to store known patterns of information flow within a given network and to rapidly compare real-time information with the correlations in previously stored sub-patterns. The correlation between localized device readings is viewed as a pattern. As power grid networks contain resource-constrained devices, Baig proposes an in-network, lightweight decentralized pattern recognition algorithm that forms an associative memory structure by interconnecting individual SGI device readings in a graph-like structure called the GN (Graph Neuron) array [19]. The GN-based pattern recognition process compares SGI device readings at any given point in time against normal readings to establish conformance.

An extreme learning machine (ELM)-based anomaly detection technique is used in edge devices to enable on-device learning and detection [197]. LSTM and Gated Recurrent Unit (GRU) are used for predicting the power consumption, and the loss between the predicted and actual values is used for the anomaly detection process [58, 62, 113, 131, 143, 194, 210, 219, 242]. A spectral residual CNN is used by Oprea et al. to detect anomalous usage in residential consumption data [136].

5.2.2 Regression

The basic regression model-based anomaly detection technique consists of two steps. In the first step, a regression model is fitted to the data. In the second step, the residual of each test instance is used to determine its anomaly score. The residual is the part of the instance that is not explained by the regression model, and how it is turned into an anomaly score can be chosen based on the use case.
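The two steps can be sketched as follows, assuming a single explanatory variable, an ordinary least-squares line as the regression model and the absolute residual as the anomaly score. The temperature and load values and the function names `fit_line` and `anomaly_score` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def anomaly_score(model, x, y):
    """Absolute residual of the observation (x, y) under the fitted model."""
    slope, intercept = model
    return abs(y - (slope * x + intercept))


# Step 1: fit on hypothetical normal data (temperature in deg C -> load in kW).
temps = [20.0, 22.0, 24.0, 26.0, 28.0]
loads = [2.0, 2.4, 2.8, 3.2, 3.6]
model = fit_line(temps, loads)

# Step 2: score test instances by their residuals.
print(anomaly_score(model, 25.0, 3.0))  # lies on the fitted line
print(anomaly_score(model, 25.0, 6.0))  # far above the fitted line
```

Because the model is fitted on normal data only, a large residual indicates behavior that the model of normal operation cannot explain; other choices, such as a squared or normalized residual, would fit the same two-step structure.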

Demand response programs are designed to reduce energy consumption for relatively short time periods and are widely recognized to help meet both reliability and market needs. However, it is critical to predict the reduction in energy during events and the increase due to the rebound effect after events. Zhang et al. focus on predicting the consumption accurately, specifically for demand response programs [238]. A piece-wise linear regression is performed using the correlations between temperature and consumption to predict the consumption accurately. An instance is then classified as an anomaly if the ratio of the predicted and observed consumption values is less than a threshold that depends on the historical consumption data of the user.

Badrinath Krishna et al. use ARIMA for power consumption prediction, followed by computing time-window statistics of mean and standard deviation for detecting electricity theft [18]. Janetzko et al. use a prediction-based anomaly detection method that applies weighted average prediction based on daily seasonality and computes a normalized anomaly score for the detection process [87]. A higher anomaly score denotes a higher chance of the instance being an anomaly. Aligholian et al. use support vector regression (SVR) as a load prediction method, after which the difference between the real and predicted data is characterized using a distribution function to detect the outliers with a 3-sigma rule [7]. Hosseinzadehtaher et al. propose a condition monitoring vector (CMV) equipped with a learned ultra-short-term demand forecasting (USTDF) mechanism for detecting anomalies in AMI readings and smart inverter data [76]. The USTDF is based on consumption data and temperatures, and the model is built using multi-dimensional regression (MDR). Jaiswal et al. use four different types of regression techniques, namely linear regression, random forest regression (RFR), support vector regression (SVR) and gradient boosting regression (GBR), on consumption data, followed by a 2-standard-deviation threshold for tagging anomalies [85].
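Several of the works above threshold the prediction residuals at a multiple of their standard deviation. A minimal sketch of such a k-sigma rule is given below, assuming the mean and standard deviation are estimated from residuals on normal training data. The residual values and the function name `k_sigma_flags` are hypothetical, and the forecasting model that produces the residuals is left out.

```python
import statistics


def k_sigma_flags(train_residuals, test_residuals, k=3.0):
    """Flag test residuals that lie more than k standard deviations
    away from the mean of the training residuals."""
    mu = statistics.mean(train_residuals)
    sigma = statistics.stdev(train_residuals)
    return [abs(r - mu) > k * sigma for r in test_residuals]


# Hypothetical residuals (observed minus predicted load, in kW).
train_residuals = [0.1, -0.2, 0.0, 0.15, -0.05, 0.05, -0.1, 0.05]
test_residuals = [0.1, -0.3, 1.2]

print(k_sigma_flags(train_residuals, test_residuals, k=3.0))
print(k_sigma_flags(train_residuals, test_residuals, k=2.0))
```

With these values the 2-sigma rule flags the second test residual while the 3-sigma rule does not, which shows how the choice of k trades detection rate against false alarms.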

Hybrid models are usually developed when different models perform better in different time windows. Cui et al. propose a detection system for school electricity data that combines polynomial regression for weekends and a Gaussian distribution for weekdays [39]. Polynomial regression with a Hampel identifier is used by Huang et al. for detecting anomalous consumption values in a research facility [79]. Kosek and Gehrke use an ensemble of non-linear artificial neural network models to detect anomalies in distributed energy resources (DERs) in a power grid that can be accessed and controlled remotely [97]. This model uses contextual parameters such as the hour of the day and meteorological data for the training and detection process.
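For readers unfamiliar with the Hampel identifier mentioned above, the sketch below shows a common sliding-window form: a sample is flagged when it deviates from the window median by more than a multiple of the scaled median absolute deviation (MAD). The window size, the multiplier and the data are assumptions, and the sketch is applied to raw values rather than to the residuals of a polynomial regression as in [79].

```python
import statistics


def hampel_outliers(values, half_window=3, n_sigmas=3.0):
    """Return the indices flagged by a sliding-window Hampel identifier."""
    scale = 1.4826  # makes the MAD a consistent sigma estimate for Gaussian data
    flagged = []
    for i, x in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = statistics.median(window)
        mad = statistics.median([abs(v - med) for v in window])
        if abs(x - med) > n_sigmas * scale * mad:
            flagged.append(i)
    return flagged


# Hypothetical consumption readings with one abnormal value at index 4.
readings = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.3, 10.1, 9.9]
print(hampel_outliers(readings))
```

Because both the center and the spread are estimated with medians, the single abnormal reading does not inflate the threshold of its neighbors, which is the main advantage over a mean and standard deviation rule.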

5.2.3 Distance-based techniques

Chen et al. use a Multi-Layer Perceptron (MLP) and a Mahalanobis distance-based statistical approach to find anomalies in power consumption data [30]. Yijia and Hang propose a detection method for identifying abnormal electricity users by combining line loss and power analysis [228]. They use the Slope Extract Edge Point (SEEP) algorithm to extract the edge point sequence of power consumption and then apply the vector space cosine similarity to measure the similarity between the extracted sequence and the consumption sequence of the users. Cosine similarity matching is applied after a Kalman filter estimation in [160] for detecting FDI attacks in smart grid communication systems.
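The cosine similarity used in these works compares the shape of two sequences independently of their scale. A minimal sketch follows; the reference pattern and the two user sequences are hypothetical, and the SEEP and Kalman filter stages of the cited methods are not reproduced.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical sequences: a reference pattern and two users' consumption.
reference = [1.0, 2.0, 3.0, 2.0, 1.0]
similar_user = [2.0, 4.0, 6.0, 4.0, 2.0]    # same shape, different scale
different_user = [3.0, 1.0, 0.5, 1.0, 3.0]  # inverted shape

print(cosine_similarity(reference, similar_user))
print(cosine_similarity(reference, different_user))
```

A user whose similarity to the expected pattern falls below a chosen cut-off would be treated as a candidate anomaly; the cut-off itself has to be tuned on data.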

Valenzuela et al. use principal component analysis (PCA) on the power flow data of the power grid transmission system and use principal components in the new subspace as well as the original values to construct an anomaly score [201].

5.2.4 Statistical models

Several works use matrix-based methods to find the changes between states as the basis for detecting anomalies in the target network [193, 237]. Zhang et al. use random matrix theory to detect anomalies in big data, which can include a large volume of real-time operational data such as power consumption, voltage, current, active power, and reactive power [237]. These data are usually random due to disturbances from external factors like climate or electrical factors like technical failures. Hence, a random matrix is used to denote these random variables. After constructing a high-dimensional random matrix, the sample covariance matrix is determined by approximating it using maximum likelihood estimation, subjecting it to a unitary singularization treatment and calculating its eigenvalues. Mean Spectral Radius (MSR) is used as the metric to represent the mean distance between all of the eigenvalues and the center on the complex plane. The single ring law is used to determine the extent of random deviation of the data in a random matrix. To detect anomalous behavior in transmission lines and buses, Moslemi et al. use covariance matrices for voltage, current and frequency data [126].

Replay attacks aim to intercept authentication information. In the smart grid, replay attacks intercept the usage patterns of various smart meters and replay this data to carry out an undetected intrusion. Kalman filtering along with a chi-square detector is used to detect replay attacks for the system controller of the smart grid [240]. A Kalman filter-based technique is also used in [90, 160].
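
The general mechanics of a Kalman filter paired with a chi-square residual test can be sketched as follows. This is a scalar illustration under stated assumptions (a random-walk state model, hypothetical noise variances `q` and `r`, and the 99% chi-square critical value for one degree of freedom, 6.635); it is not the specific scheme of [240].

```python
def chi_square_alarms(measurements, q=1e-4, r=0.01, x0=0.0, p0=1.0,
                      threshold=6.635):
    """Return indices whose normalized squared innovation exceeds the threshold.

    Assumed model: x_k = x_{k-1} + w (variance q), z_k = x_k + v (variance r).
    """
    x, p = x0, p0
    alarms = []
    for k, z in enumerate(measurements):
        # Predict: the state is carried forward and its uncertainty grows.
        x_pred = x
        p_pred = p + q
        # Innovation (residual) and its variance.
        innovation = z - x_pred
        s = p_pred + r
        g = innovation * innovation / s  # chi-square statistic, 1 degree of freedom
        if g > threshold:
            alarms.append(k)
            # Design choice of this sketch: skip the update so a suspicious
            # sample does not contaminate the estimate.
            x, p = x_pred, p_pred
            continue
        # Update with the accepted measurement.
        gain = p_pred / s
        x = x_pred + gain * innovation
        p = (1.0 - gain) * p_pred
    return alarms
```

For a steady signal with a single injected spike, for example `chi_square_alarms([1.0] * 20 + [3.0] + [1.0] * 5, x0=1.0)`, only the spike's index is reported.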

Yilmaz et al. consider a hierarchical data collection smart grid infrastructure and propose a general and scalable mitigation approach called Minimally Invasive Attack Mitigation via Detection Isolation and Localization (MIAMI-DIL) [232]. The authors use Geometric Entropy Minimization (GEM) with cumulative sum (CUSUM) to learn the minimum volume sets that represent the nominal probability distribution of the expected data instances. If a data instance falls outside this distribution, it is an outlier. Entropy-based metrics, such as normalized entropy and Shannon entropy, can be calculated on the selected features. An attack can be identified based on the value of the calculated entropy [80, 88]. For finding anomalies in load data, the extreme studentized deviate (ESD) test is used in [212].
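
To illustrate the entropy-based metrics, the sketch below computes the Shannon entropy and a normalized entropy over the values of one categorical feature within a window (for instance, the destination ports seen in a batch of packets, which is an assumed example feature). Normalizing by the logarithm of the number of distinct observed values is one common convention and is an assumption here; [80, 88] may normalize differently.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(symbols):
    """Entropy divided by its maximum for the observed alphabet, in [0, 1]."""
    distinct = len(set(symbols))
    if distinct <= 1:
        return 0.0  # a single repeated value carries no uncertainty
    return shannon_entropy(symbols) / math.log2(distinct)
```

A detector built on this would compare the entropy of each window against the range seen under normal operation; a sharp drop or rise relative to that baseline is then treated as suspicious.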

Nasr and Varjani propose a statistical anomaly detection method that uses the mean and standard deviation to learn the behavior of the system over time and then applies a threshold to differentiate between normal and anomalous data in SCADA systems [128]. Kwon et al. also use the mean and standard deviation to detect DoS and other communication network attacks [100, 101]. Ishimaki et al. use a detector based on the ratio of the harmonic mean to the arithmetic mean to identify false data, and preserve the privacy of the consumers using a fully homomorphic encryption (FHE) scheme called the Cheon-Kim-Kim-Song (CKKS) scheme [83].
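
A minimal sketch of a mean and standard deviation detector is given below. The baseline is learned from data assumed to be normal, and a value is flagged when it lies more than `k` standard deviations from the mean. The multiplier `k=3.0` and the function names are illustrative assumptions rather than parameters reported in [128] or [100, 101].

```python
import statistics


def fit_baseline(normal_values):
    """Learn the mean and sample standard deviation from normal behavior."""
    return statistics.fmean(normal_values), statistics.stdev(normal_values)


def flag_anomalies(values, mean, std, k=3.0):
    """Return one boolean per value: True if it deviates more than k * std."""
    return [abs(v - mean) > k * std for v in values]
```

For example, with a baseline learned from `[10, 11, 9, 10, 10, 11, 9, 10]`, the values `[10, 12, 20]` yield `[False, False, True]`.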

Karimipour et al. use a detection method based on the statistical correlation between measurements [91]. Marino et al. propose a detection system that models the communication network using Poisson distributions while using data to learn the model parameters [116]. Matthews and Leger use the Fano factor to tag anomalies in streaming PMU data in the distribution network [121].
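
The Fano factor is the ratio of the variance to the mean of a signal over a window. The sketch below computes it over non-overlapping windows and tags windows whose value exceeds a limit. The window layout, the use of the population variance, and the `max_fano` limit are assumptions of this illustration and are not taken from [121].

```python
import statistics


def fano_factor(window):
    """Variance-to-mean ratio of a window with a positive mean."""
    mean = statistics.fmean(window)
    if mean <= 0:
        raise ValueError("Fano factor requires a positive mean")
    return statistics.pvariance(window) / mean


def tag_windows(stream, window_size, max_fano):
    """Tag each non-overlapping window as anomalous (True) or not (False)."""
    tags = []
    for start in range(0, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        tags.append(fano_factor(window) > max_fano)
    return tags
```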

5.2.5 Rule-based techniques

Rashid et al. propose a rule-based system to detect anomalies at the application level [158]. Most detection mechanisms focus on meter-level detection, which does not identify the anomaly-causing appliance. The authors use both submetered and non-intrusive load monitoring (NILM) data and perform post-processing on the NILM data to improve the performance of the detection process. Azizi et al. also use a rule-based approach with non-intrusive load monitoring to detect anomalous usage in household consumption data [17].

A hierarchical architecture is used for monitoring micro-phasor measurement unit data by providing a set of rules based on different events and using correlation matrices to examine the state of the grid [86]. Zhang et al. propose a time series anomaly detection model that is applied to the network-level packets in the power grid communications between the SCADA system’s HMI and PLCs [235]. It is based on the discrete Fourier transform and the periodicity of network packets, which are checked for abnormality based on deviations in their time period.
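
One way to picture the DFT-based periodicity check is the sketch below, which finds the dominant period of a sampled traffic signal with a naive discrete Fourier transform and compares it with an expected period. The tolerance value, the function names, and the choice of per-interval packet counts as the input signal are assumptions of this illustration; the model in [235] may differ in all of these respects.

```python
import cmath
import math


def dominant_period(samples):
    """Period (in samples) of the strongest non-DC frequency component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC component
    best_k, best_mag = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    if best_k is None:
        raise ValueError("no periodic component found")
    return n / best_k


def deviates_from_period(samples, expected_period, tolerance=0.1):
    """True if the dominant period differs from the expected one by more
    than the given fraction (tolerance is a hypothetical parameter)."""
    observed = dominant_period(samples)
    return abs(observed - expected_period) > tolerance * expected_period
```

The naive transform costs O(n^2) operations; a production system would use an FFT, which computes the same coefficients faster.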

5.3 Unsupervised detection

Unsupervised anomaly detection does not require labeled training data, which makes it the most widely applicable approach. It is based on the assumption that normal instances are far more frequent than anomalous ones. If this assumption does not hold, the technique suffers from a high false positive rate [29]. For building models that do not have prior knowledge about anomalous consumption, the model is trained using the normal consumption behavior along with a criterion for classifying consumption values as abnormal or normal [73].

5.3.1 Clustering

Clustering is a machine learning scheme used to categorize unlabelled consumption data into various clusters, mainly normal and abnormal clusters. Clustering can also be done on normal data, in which case the data points that lie far away from the formed normal clusters are usually tagged as anomalies [233]. Some of the common clustering methods used in the context of power grids are K-means, Partitioning Around Medoids (PAM) [14, 169, 185] and Density-based spatial clustering of applications with noise (DBSCAN) [60, 62, 137, 243].

K-means clustering is a popular unsupervised approach that partitions the observations into k clusters, assigning each observation to the cluster with the nearest mean (the cluster’s centroid). This method has been used for detecting anomalous power consumption in residential buildings [49, 117], flooding of UDP packets, ICMP packets and Ping of Death in communication networks [122] and anomalous behavior in a distributed framework on network and PMU data [188]. Two methods are also combined to achieve better detection performance: K-means and Isolation Forest (IForest) are used together in communication networks [177] and distribution PMU networks [93]. Rahimi et al. use a genetic algorithm to obtain an optimal value of k for K-means to find anomalous usage in power consumption data [153]. A combination of clustering and prediction methods, specifically K-means and LSTM, is used in [27, 28, 59] to cluster the consumption observations to detect anomalies as well as to find anomalies in advance by using the prediction method (i.e. LSTM) to forecast future values. Chou and Telaga follow a similar approach by using K-means followed by a combination of neural networks and autoregression (NNAR) [34].
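
A minimal sketch of K-means-based scoring is shown below: centroids are fitted on data assumed to be normal, and a new observation is scored by its distance to the nearest centroid. The farthest-first initialization is a simplification chosen to keep the example deterministic, and the function names and fixed iteration count are assumptions; none of this is taken from the cited works.

```python
import math


def fit_kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of equal-length tuples; returns k centroids."""
    # Farthest-first initialization: start from the first point, then keep
    # adding the point that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if the cluster happens to be empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def distance_to_nearest_centroid(point, centroids):
    """Anomaly score: larger values mean the point is farther from all clusters."""
    return min(math.dist(point, c) for c in centroids)
```

A threshold on this score then separates normal from anomalous observations. Note that fitting on data that already contains a far outlier can cause the outlier to receive its own cluster and a score of zero, which is one reason to fit on normal data only.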

In an attempt to detect contextual and collective anomalies, Rossi et al. propose an approach based on frequent itemset mining by encoding the different event types streamed from smart meters, applying segmentation of the data and using categorical clustering for the evaluation of the collective data and detection of unexpected patterns [165]. Contextual information is also used in [14, 169, 185] to detect anomalies in consumption by computing an anomaly score for each user based on historical consumption data. The anomaly score for a user is then adjusted by analyzing other contextual variables such as seasonal variation, day of the week and other users with the same historical pattern. All three works use models based on Partitioning Around Medoids (PAM). PAM is also used for anomaly detection in solar farms on voltage and current data collected from micro-PMUs [44].

Commercial buildings consume a lot of energy and hence motivate research to improve building energy efficiency. Bellala et al. propose an anomaly detection mechanism for power meter data that detects anomalous points using an unsupervised cluster-based algorithm and models the occupancy using a Hidden Markov Model [24]. This algorithm takes as input the power time series of a meter over several days and outputs the probability of a particular day being anomalous. The values are computed through a KNN density estimation algorithm. The probability scores can be used to rank the days in terms of anomalousness, providing a building administrator with a prioritized list of data points that require further inspection. Janetzko et al. use the same clustering-based approach to detect anomalies in power consumption data and provide various time series visualization schemes, which help in analyzing and understanding the energy consumption behavior [87].

DBSCAN is used for detecting anomalous behavior in IoT devices [60], phasor data from PMUs [243], streaming consumption values [62] and communication networks [137]. Zhang et al. propose an anomaly detection method for smart meter data which is based on Gaussian Mixture Model Linear Discriminant Analysis (GMM-LDA) clustering used for feature learning [236]. A Self-Organizing Map (SOM) is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. SOMs are used in [200] and [196] for detecting faults and FDI attacks using consumption data.

Visual analysis makes it easier to identify anomalies on a large scale. When analyzing data streaming from multiple sensors, it is important to identify similar patterns among them, so that the sensors can be grouped based on their behavior. This grouping can be done by using similarity measures between the different data streams [189]. This aids in the initial analysis of the sensor data coming from a power grid, which can then be used for creating anomaly detection models that represent the underlying patterns. Similarly, dissimilarity measures are also used to cluster similar groups together [65].

5.3.2 Neural network

Neural networks are used in anomaly detection techniques to let the network itself discover the patterns and features in the input data, as well as the relation between the input data and the output.

Recurrent neural networks (RNNs) are used for time series prediction. Xu et al. apply an RNN to power consumption data, followed by quantile regression, to build probabilistic power consumption forecasting models; an instance that falls outside a chosen quantile interval is flagged as an anomaly [222].

Autoencoders are used to reconstruct the input data, with the reconstruction error serving as the anomaly score that gives the extent to which the reconstructed output differs from the input. The input can consist of multiple features that are contextual or behavioral (day of the year, season, month, etc.) along with power consumption [12, 13]. Other neural networks can be combined with autoencoders, such as a variational autoencoder combined with an RNN and attention [127, 146] for power consumption data or a CNN with an autoencoder [40] for smart home sensor data. Autoencoders are also used for anomaly detection in the power generation process, specifically for wind turbine fault detection [239] and inverter sensors in solar power plants [82]. They are also used for PMU data in the power grid distribution network [3].

Generative adversarial networks (GANs) consist of a generator that generates adversarial samples and a discriminator that is trained to differentiate anomalous data from normal data. GANs are used for detecting intrusion attacks in communication networks that can occur in any smart grid infrastructure consisting of communicating IoT devices [36, 184], as well as in a federated setting [1].

A Restricted Boltzmann Machine (RBM) is used for detecting large-scale attacks in the transmission lines using the voltage and current data [92].

5.3.3 Nearest neighbor and density-based techniques

Principal component analysis (PCA) is used in [114, 151] for selecting features that represent trend, variability, volatility and other statistically representative data of the consumption. Qui et al. use the first two principal components to represent the power consumption pattern of each user [151]. Local Outlier Factor (LOF) is then used to quantify the extent to which each user’s point is an anomaly. LOF relies on a concept of local density that is computed from the distances between each data instance and its k nearest neighbors. Large distances indicate low-density regions, so anomalous data instances have a lower local density than their neighbors, unlike normal data instances. LOF is also used in communication networks for identifying anomalous traffic packets [64] and in commercial and residential buildings [156, 157]. Shylendra et al. use kernel density estimation (KDE) for detecting anomalous usage in power consumption data for wide sensor networks [178].
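
The local density idea behind LOF can be sketched in a few lines using the standard definitions of k-distance, reachability distance and local reachability density. This illustration assumes that all points are distinct and ignores distance ties when selecting neighbors; it is a brute-force version intended for small examples, not the implementation used in the cited works.

```python
import math


def local_outlier_factors(points, k):
    """Return the LOF score of every point (values near 1 indicate inliers)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    neighbors, k_distance = [], []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = ranked[:k]
        neighbors.append([j for _, j in nearest])
        k_distance.append(nearest[-1][0])  # distance to the k-th neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i, p in enumerate(points):
        reach = [
            max(k_distance[j], math.dist(p, points[j])) for j in neighbors[i]
        ]
        lrd.append(k / sum(reach))
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)
    ]
```

For four points on a unit square plus one distant point, the square's corners receive a score of 1 while the distant point receives a score well above 1, reflecting its much lower local density.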

5.3.4 Tree-based techniques

Isolation forest is an unsupervised technique based on the decision tree algorithm. It is used for detecting anomalies in communication networks [64, 177], household power consumption [7, 114], power grid parameters in SCADA systems [175], inverter sensors in solar power plants [82], PMU data [93] and micro-PMU data in distribution networks [50].

5.3.5 One class classification

One-class SVM (OC-SVM) is an unsupervised approach that uses only normal instances for training and categorizes the data that deviate from the estimated model as anomalies. Though the performance of OC-SVM is similar to that of a binary class SVM, OC-SVM is preferred as it is trained only on normal data and detects anomalies in new data by comparing it to the normal behavior [45]. It is used for identifying attacks in power consumption data of IoT devices [72, 211], communication networks [64] and transmission lines [51, 217].

5.4 Other techniques

Other techniques that have been used in smart grids include hierarchical temporal memory (HTM) [21] for anomalous behavior of micro-PMU data, an online model-free reinforcement learning approach which does not require attack models [23, 99] and an evolutionary technique based on fuzzy learning for improving clustering performance [105]. Passerini et al. use power-line communication signals to identify and localize faults in the distribution network such as electrical faults, impaired cables and unexpected impedance changes [145]. They utilize a two-part algorithm: the first part detects and tracks the evolution of faults over time, while the second uses information about the network topology to localize the faults identified by the first.

5.4.1 Graph-based techniques

By using nodes and edges in a graph to represent the buses and branches in the electric network, graphs are constructed using the topological information of the network. The power grid is represented as a graph and the topology change is observed over a time period [104]. Anwar and Mahmood propose a graph matching approach to detect anomalies that exist in an electric topological and configuration database [11]. A query graph is anomalous if it has a different number of nodes or edges than the reference graph. Graph-based techniques are also used to represent edge devices [221], sensors in smart homes [125] and transmission lines [75]. Chen et al. propose a graph technique that uses correlation grouping [31] and a graph convolution network with attention [33] to learn graph structures of the sensor data in power grids.
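
The node and edge count criterion can be sketched as follows, with buses as nodes and branches as undirected edges. The edge-list representation and the function names are assumptions of this illustration; the full graph matching approach in [11] is richer than this count comparison.

```python
def graph_size(edges):
    """Return (number of nodes, number of undirected edges) of an edge list.

    Isolated nodes cannot be expressed in an edge list and are ignored here.
    """
    nodes = {node for edge in edges for node in edge}
    undirected = {frozenset(edge) for edge in edges}
    return len(nodes), len(undirected)


def topology_is_anomalous(reference_edges, query_edges):
    """True if the query graph differs from the reference in node or edge count."""
    return graph_size(reference_edges) != graph_size(query_edges)
```

Comparing counts alone cannot detect a rewiring that preserves both counts, so this check is best read as a first, inexpensive filter.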

5.4.2 Big data

For data-driven anomaly detection, the processing of big data often becomes a challenge [204]. Lipcak et al. showcase the application of big data platforms using Apache Flink, Storm and Spark and compare the performance of the three while using a weighted average prediction based on the previous three days of consumption data and the temperature [106]. Apache Spark and Spark Streaming are also used for creating distributed computing frameworks [47, 108]. Chen et al. aim at implementing an online real-time detection algorithm for handling huge amounts of data using a Storm and Hadoop-based framework [30]. For processing large-scale data from smart meters, Moghaddas and Wang propose a hierarchical framework that uses smart meter event data rather than consumption [123]. Matthews and Leger leverage MapReduce to process millions of data points from the PMUs and detect anomalies [120].
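
A weighted average prediction over the previous three days can be sketched as below. The weights `(0.5, 0.3, 0.2)`, the relative tolerance, and the omission of the temperature term are all assumptions made for illustration; they are not the values or the exact model used in [106].

```python
def predict_from_previous_days(history, weights=(0.5, 0.3, 0.2)):
    """Weighted average of the same time slot on previous days.

    history lists the consumption values with the most recent day first.
    """
    if len(history) != len(weights):
        raise ValueError("history and weights must have the same length")
    return sum(w * v for w, v in zip(weights, history)) / sum(weights)


def consumption_is_anomalous(actual, history, tolerance=0.25):
    """Flag a reading that deviates from the prediction by more than the
    given fraction of the predicted value (tolerance is hypothetical)."""
    predicted = predict_from_previous_days(history)
    return abs(actual - predicted) > tolerance * abs(predicted)
```

Since each prediction depends only on a handful of past values for the same meter, the computation parallelizes naturally across meters, which is what makes it a convenient workload for comparing streaming platforms.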

6 Future challenges

Our extensive review highlights that anomaly detection in the power grid is an active area of research, and continues to see novel explorations. At the same time, this broad look at the variety of work performed helped us identify recurring challenges in the domain. In this section, we discuss those pressing challenges that persist in the area despite advances in the complexity of adopted methods.

6.1 Detection speed and accuracy

For critical infrastructure like the power grid, real-time monitoring is essential and plays an important role in the grid operator’s decisions. The importance of applying quick detection approaches so that alarms are not missed during an active attack is also addressed in the literature [38]. However, when it comes to the detection of attacks, there is always a question as to when to raise an alarm. The earlier the anomaly is detected and reported, the sooner appropriate actions to mitigate the impact can be undertaken [4]. With many threshold-based approaches used in detection systems, a threshold that is too small raises false alarms and one that is too large might miss anomalous instances. The selection of a threshold therefore plays a crucial role in anomaly detection. As such, it is important to study the trade-off between detection reliability and detection speed and to operate on the optimal trade-off curve. In [112], it is shown that as the threshold is increased for a power consumption anomaly detection system, the detection rate (true positive rate) becomes worse while the false alarm rate (false positive rate) improves. During this analysis, it is observed that even when obvious anomalies with very high wattage are injected, the detection system is still unable to capture them. Even when an optimal threshold is chosen, the undetected attack configurations still lead to a successful attack on the power grid. Therefore, considering the detection accuracy alone leads to an inevitable trade-off between the true positive and false positive rates, which must be assessed before deploying a model.
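
The trade-off described above can be made concrete by computing the true positive and false positive rates at several thresholds on labeled anomaly scores. The sketch below uses made-up scores and labels purely for illustration; it is not data from [112].

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (true positive rate, false positive rate) when every score
    strictly above the threshold is flagged as anomalous.

    labels holds 1 for anomalous instances and 0 for normal ones.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Illustrative (made-up) scores and ground-truth labels.
example_scores = [0.1, 0.2, 0.3, 0.6, 0.8, 0.9]
example_labels = [0, 0, 0, 1, 0, 1]
```

With these values, a threshold of 0.25 gives a true positive rate of 1.0 and a false positive rate of 0.5, while raising the threshold to 0.7 gives 0.5 and 0.25: fewer false alarms, at the cost of a missed anomaly.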

6.2 Concept drift and evolving attacks

Detection model updates over time are inevitable given that a consumer’s energy usage changes throughout the year. For example, when a new appliance is added or the number of members in the household changes, the consumption pattern is also affected [238]. Therefore, handling concept drift and distribution shifts in the data is an important aspect of detection models. However, this is not commonly discussed when proposing new techniques.

Supervised detection algorithms perform better when detecting known attack signatures. Due to the unavailability of labeled data, detection becomes challenging when new attacks come to the surface. Robust mechanisms that can be updated to detect different types of evolving attacks are required [16]. Though a semi-supervised or unsupervised mechanism can be deployed for such a case of varying attacks, it is possible for an attack to be constructed such that it works within the tolerance levels of a model and stays undetected. Adversarial attacks on machine learning models have been shown for image data and time series data, and they remain an advanced persistent threat to detection models.

Many power grid attacks begin with a compromise in the communication network. Therefore, using state-of-the-art countermeasures specifically for monitoring and detecting intrusions in the network will help in the early detection of an attack. The application of software-defined networking (SDN) to smart grids is known to enhance SCADA system resilience [46]. SDN is a networking paradigm that separates the control and data planes, allowing the controller to configure the network operations dynamically. If the network fails due to an attack, SDNs can be leveraged to dynamically establish a faster route via the internet as an emergency response. SDNs are also capable of dynamically filtering out unwanted and potentially malicious traffic. This improves the efficiency, monitoring and resiliency of smart grid communication networks. Another promising countermeasure in industrial environments is the use of honeypots. In the context of smart grid networks, a honeypot simulates the normal operation of a device, such as a smart meter, to attract, deceive and analyze an attacker’s behavior [134]. Several proofs of concept have been shown for the use of honeypots in smart grids [134, 147, 183]. The use of honeypots for privacy-preserving federated learning environments is discussed in [6]. A game-based honeypot selection has also been proposed, which characterizes the essence and objective of the defender to support the choice of the honeypot type [25]. This helps in studying evolving attacks on the devices and the underlying network.

6.3 Limited data

When using supervised approaches, there is an imbalance between normal and anomalous data which makes it difficult to capture the characteristics of anomalous data points. Imbalanced data refers to the unequal distribution of the data instances across classes, usually with the anomalous class as the minority and the normal class as the majority. This arises because the data that is collected from the system is typically associated with normal behavior and not disturbances or attacks. Several studies have shown that some classifiers achieve better overall performance with a sampled and balanced dataset [149]. This poses a challenge as to how well the detection technique can perform with such limited availability of anomalous data.

Unlabeled data poses a challenge for classification-based anomaly detection as there is no clear indication of whether a specific point is normal or anomalous. There is a significant lack of labeled data, and new types of attacks may behave differently from the training data [24], which is why unsupervised detection approaches are preferred over supervised ones. However, unsupervised approaches usually end up raising many false alarms, which might in turn lead the grid operator to take the wrong countermeasures.

Transfer learning has become an effective approach as data and knowledge of older systems with richer power consumption records can be utilized [223]. Using this technique helps with the challenge of limited data, but there are other factors to be considered, such as device upgrades, changes in the consumption behavior of the consumers, new attack scenarios and threats, among others.

6.4 Datasets for benchmarking

Standard datasets that mimic the operation of the power grid in different locations, such as consumption at the utility level, or voltages and currents of the transmission lines, along with common anomalies or attacks that occur with such data will make it easier to benchmark methods for each target application. Though IEEE bus systems are used to generate synthetic data to mimic such operations, the artificial anomalies that are generated and the changes that are performed to the load still differ between research papers. Therefore, datasets that capture different anomalous events that are observed in the real power grid will help in obtaining a better validation of the constructed detection model.

6.5 Deployment and application in real world

While most of the detection mechanisms are trained and tested on smaller datasets, deploying the same model in the real world will be different. This is because the model begins to receive streaming data which it must handle, making it a real-time and online model. The efficiency of the model when deployed is not necessarily discussed in research papers. In fact, as the model must be trained using the historic data of its target deployment, it will then be validated on data that has different proportions of normal and anomalous instances. In most cases, the data is even unlabelled, making the detection performance heavily dependent on the false positives alone. This changes the way a model perceives anomalies in the real world. To bridge the gap between development and deployment of detection mechanisms, Digital Twin technology can be used for rigorous testing and for studying the detection performance under various attacks. A Digital Twin is a digital model of a physical system that reflects its behavior by applying platforms and two-way interactions of data in real-time [84]. It provides a virtual environment to manipulate costly grid devices, allowing the development of standardized models for smart grids [15, 134].

6.6 Anomaly or attack source

An anomaly can be detected based on the observed data; however, finding the source of the anomaly is rather challenging [158]. Most detection systems are unable to find the attack source or the anomalous resources directly. In addition to quantifying an anomaly, it is also important to locate it, especially as we are moving towards distributed and federated learning environments. As the number of devices that are managed by a single detection algorithm increases, identifying the source of an anomaly should also be considered.

6.7 Thresholding in detection systems

When we use a score-based system, threshold selection becomes an important step while deploying the model. The majority of research papers use a fixed threshold that has been personalized for a specific scenario [108]. How and when to change the threshold based on the underlying behavior of the data then becomes an interesting problem. In fact, fixed thresholds have been shown to have a significant impact on the false positive rate of state-of-the-art neural networks designed for anomaly detection in the power grid [112], and leave room for exploits irrespective of the choice made. Dynamic thresholding techniques for time series data have been proposed before. However, the use or proposal of such techniques is less commonly observed in the power grid domain [111]. As the power grid is evolving every day, dynamic or recent window-based thresholding techniques also play an important role in the performance of the detection system.
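
One simple form of recent window-based thresholding is sketched below: the threshold for each new anomaly score is the mean plus `k` standard deviations of the most recent scores. The window length, the multiplier `k`, and the choice to let flagged scores enter the window are assumptions of this illustration, not a technique taken from [111] or [112].

```python
import statistics
from collections import deque


def rolling_threshold_flags(scores, window=5, k=3.0):
    """Flag each score that exceeds mean + k * std of the preceding window.

    Scores seen before the window is full are never flagged (warm-up).
    """
    recent = deque(maxlen=window)
    flags = []
    for score in scores:
        if len(recent) == window:
            limit = statistics.fmean(recent) + k * statistics.pstdev(recent)
            flags.append(score > limit)
        else:
            flags.append(False)
        # In this sketch every score, flagged or not, enters the window.
        recent.append(score)
    return flags
```

Letting flagged scores enter the window means a burst of anomalies inflates the threshold for the scores that follow; excluding them avoids this but makes the threshold slower to adapt to a genuine change in behavior.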

6.8 Scalability

Anomaly detection techniques proposed by different researchers aim to find anomalies in a specific dataset, which usually consists of a single target variable such as the power consumption, edge weights in graph-based approaches, or number of alarms in behavior-based approaches. However, numerous features are used to aid the detection process, helping the model decide whether to flag an instance as an anomaly. For example, these features can be external temperature variables, sensors or devices in the power grid, or the number of personnel in an insider attack. With a higher frequency of data being collected for better monitoring of the grid, for example with 30 to 60 observations per second in the case of PMUs, the scope of applying the models to high frequency data needs to be evaluated for the methods in the literature. Most publicly available data have a one-minute sampling interval, and many papers are based on such data. However, the performance of the method in terms of timing may affect the detection results when tested on high frequency data. It is also observed that the frequency of data does affect the detection of short duration anomalies like sudden increases or decreases in demand [202]. This means that there will be an increase in the frequency of the data for better detection capabilities. However, scalability is often less discussed in terms of the practicality of the detection model, especially with growing numbers of sensors and consumers in the power grid every day.

6.9 Distributed and edge computing

As the number of sensors and devices in the power grid is increasing, detection mechanisms are moving towards a distributed approach in which they run on-device. Classical machine learning-based methods are more suitable for lightweight devices than deep learning-based methods [241]. However, they are not robust to noisy data and their performance may saturate for a large amount of data. This calls for simple methods that can be used on a single device while being able to handle streaming data without being susceptible to high errors.

Moreover, when we consider fog computing that involves fog nodes to collect and run the detection algorithm on the data from the underlying devices, it requires efficient algorithms that can handle such high dimensional data [232]. We also need to take the storage, computation and communication overhead into account when using a distributed architecture [31].

6.10 Visualization techniques for large scale data

When monitoring the power grid at a large scale, for example, monitoring the demand requirements from multiple consumers in an area, visualization makes the task easier than listing the anomaly scores. Though [87, 189] propose visual analytics to better visualize anomalies, such techniques are not observed in other research works. Selecting and constructing specific metrics and visualizations related to anomaly detection is also a problem of interest. Choosing a combination of visualizations and aggregate metrics that condense the anomaly metrics calculated for multiple devices will aid the monitoring personnel in quick detection and localization of the anomalies.

7 Conclusion

In the past few years, the attack surface of the power grid has increased with the advent of internet-based devices. In this work, we saw that past attacks on the power grid were caused by infecting the ICS with malware that propagated over the network. We also discussed how the availability of consumer IoT devices has made it easier for an attacker to alter the load of the power grid, leading to severe damage. These attacks and any faults that occur in the power grid can be identified by using anomaly detection mechanisms. In this work, we provide a systematic review of anomaly detection systems in the power grid while categorizing the collected 190 papers into different codes for each theme. We present a detailed taxonomy of methods that are categorized based on the availability of labeled data into supervised, semi-supervised and unsupervised techniques. Lastly, we describe the current challenges for the detection mechanisms, highlighting the existing gaps and limitations, to promote further research efforts addressing the mentioned issues.