
1 Introduction

To meet the needs of the national economy, the scale of China’s power grid is constantly expanding, and its structure and operation modes are becoming increasingly complicated [1]. The supervisory control and data acquisition (SCADA) system has been widely used in power networks. During measurement or transmission, however, various force majeure factors can cause the system to record or transmit abnormal values, that is, bad data [2]. To improve the reliability of power system state estimation by detecting and eliminating the small amount of bad data that occasionally appears in SCADA measurement samples, many scholars at home and abroad have conducted in-depth research on bad data mining techniques. Across these methods, however, the accuracy, speed, and comprehensiveness of bad data detection and identification remain major problems for electric power workers [3, 4].

At this stage, the protection and control system of the power grid has achieved a high degree of automation, which places higher requirements on the accuracy of the system data [5]. Once the data received by a substation automation system or dispatch automation system contains bad data, these erroneous messages interfere with the dispatcher’s judgment, may lead the dispatcher to make wrong control decisions, and may even cause protection and control devices to malfunction, seriously affecting the safety of the power grid [6].

The focus of this paper is the detection and identification of bad data before state estimation. Association rules are obtained by mining historical data samples collected by the SCADA system, for the case in which the topology and operating status of the power system network are unclear. The work is intended to provide a theoretical and practical basis for related fields and to contribute to the improvement of China’s power system security.

2 Method

2.1 Data Mining

Data mining is not the arbitrary application of existing analysis techniques to a specific problem [7, 8], but a systematic way of analyzing and solving problems. The whole process of data mining is shown in Fig. 1.

Fig. 1. Data mining process

2.2 Improvement of Association Rule Algorithm

This paper proposes the CARM2 algorithm, which reduces the number of data subsets that need to be counted when computing the periodic support and thereby reduces the time complexity of the algorithm. The specific improvement steps are as follows:

Assume that the number of data subsets contained in period (l, o) is |db (l, o)|,

$$ \left| {db(l,o)} \right| = \left| {(n - o)/l} \right| $$
(1)

Let the minimum periodic support be minCycle. The |db(l, o)| data subsets are divided into two parts, whose sizes are:

$$ \left| {db(l,o)} \right| \times (1 - \hbox{min} \;Cycle) $$
(2)
$$ \left| {db(l,o)} \right| \times \hbox{min} \;Cycle - 1 $$
(3)

Then only the association rules found in the first part of the data subsets need their periodic support counted: a rule that appears only in the second part can hold in at most |db(l, o)| × minCycle − 1 data subsets, which is below the minimum periodic support, so it cannot become a periodic association rule. Furthermore, suppose the first m data subsets of |db(l, o)| have been processed. If the periodic support count of an association rule in these m data subsets is less than:

$$ m - \left| {db(l,o)} \right| \times (1 - \hbox{min} \;Cycle) $$
(4)

Then this rule cannot become a periodic association rule.

Proof: The number of data subsets contained in period (l, o) is |db(l, o)| and the minimum periodic support specified by the user is minCycle, so for an association rule to become a periodic association rule, its periodic support count must be at least \( \left| {db(l,o)} \right| \times \hbox{min} \;Cycle \). After the first m data subsets of |db(l, o)| have been processed, |db(l, o)| − m data subsets remain. Even if the rule holds in every remaining subset, its count in the first m data subsets must therefore be at least:

$$ \left| {db(l,o)} \right| \times \hbox{min} \;Cycle - (\left| {db(l,o)} \right| - m) = m - \left| {db(l,o)} \right| \times (1 - \hbox{min} \;Cycle) $$
(5)

Only if this count is reached can the association rule become a periodic association rule. This pruning condition is the improvement made in the CARM2 algorithm.
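The pruning condition of Eqs. (4) and (5) can be illustrated with a minimal sketch. The function names `prune_threshold` and `scan_with_pruning`, and the representation of a cycle as a list of booleans (one per data subset, true where the rule holds), are assumptions made for illustration and are not taken from the CARM2 implementation. Exact fractions are used so that the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction


def prune_threshold(total_subsets, processed, min_cycle):
    """Eq. (4): m - |db(l,o)| * (1 - minCycle), computed exactly."""
    mc = Fraction(str(min_cycle))  # e.g. 0.8 -> 4/5
    return processed - total_subsets * (1 - mc)


def scan_with_pruning(holds, min_cycle):
    """Scan the data subsets of one cycle and stop as soon as Eq. (4) fails.

    holds: list of bool, one entry per data subset of the cycle
           (hypothetical input format).
    Returns (is_periodic, number_of_subsets_examined).
    """
    total = len(holds)
    hits = 0
    for m, rule_holds in enumerate(holds, start=1):
        hits += int(rule_holds)
        # Too few hits so far: even if every remaining subset supports
        # the rule, the minimum periodic support cannot be reached.
        if hits < prune_threshold(total, m, min_cycle):
            return False, m
    required = total * Fraction(str(min_cycle))
    return hits >= required, total


if __name__ == "__main__":
    # A rule that misses the first three of ten subsets, minCycle = 0.8.
    print(scan_with_pruning([False] * 3 + [True] * 7, 0.8))
    # A rule that holds in the first eight of ten subsets.
    print(scan_with_pruning([True] * 8 + [False] * 2, 0.8))
```

In this sketch the scan stops at the first m for which the bound is violated, so the remaining |db(l, o)| − m data subsets are never counted for that rule.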

3 Experiment

3.1 Data

In this paper, the current, active power, and reactive power data of one line, collected by the SCADA system of a dispatch center over the five months from May to September, are used as sample data. Each daily active power, reactive power, and current curve has 96 sampling points, with a sampling interval of 15 min. The sample data used in this paper has already been screened and cleaned: all of it is good data, and no values are missing.
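The figure of 96 sampling points follows from the 15-min interval (24 × 60 / 15 = 96). As a small sketch, a time of day can be mapped to its sampling-point number as shown here. The 1-based numbering and the choice of 00:00 as the first point are assumptions, since the source does not state how the points are indexed, and `sampling_index` is a hypothetical helper name.

```python
from datetime import time

INTERVAL_MIN = 15
POINTS_PER_DAY = 24 * 60 // INTERVAL_MIN  # 96 points per day


def sampling_index(t):
    """Return the 1-based sampling-point number for a time of day.

    Assumes point 1 is 00:00 and point 96 is 23:45 (not stated in the source).
    """
    minutes = t.hour * 60 + t.minute
    if minutes % INTERVAL_MIN != 0:
        raise ValueError("time is not on a 15-min sampling boundary")
    return minutes // INTERVAL_MIN + 1


if __name__ == "__main__":
    print(POINTS_PER_DAY, sampling_index(time(2, 15)))
```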

3.2 Association Rule Mining

The selected historical data covers five months (150 days) of current and power distribution at a sampling interval of 15 min, so the time attribute has 96 timestamps per day. The five months are divided by month into five periods, I-V, giving a total of 480 time units. The original database storage unit is shown in Table 1. The minimum support is set to 0.05, and periodic association rule mining is performed on the data subset of each period to obtain the periodic frequent itemsets at each moment, which are then used to summarize the current and power distribution rules at that moment.

Table 1. Raw database storage unit
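The mining step described above can be sketched with the standard definitions of support and confidence for rules of the form "timestamp → (active power level, reactive power level, current level)". This is a simplified illustration only: it works on a single flat list of discretized records, does not reproduce the division into periods or the periodic support, and the function name `mine_time_rules` and the toy records are hypothetical.

```python
from collections import Counter


def mine_time_rules(records, min_support):
    """Mine rules T -> (P, Q, I) from discretized records.

    records: list of (t, p, q, i) tuples of level labels (assumed format).
    Returns {t: [((p, q, i), support, confidence), ...]}.
    """
    n = len(records)
    count_by_time = Counter(r[0] for r in records)
    count_by_record = Counter(records)
    rules = {}
    for (t, p, q, i), count in count_by_record.items():
        support = count / n                    # share of all records
        confidence = count / count_by_time[t]  # share of records at time t
        if support >= min_support:
            rules.setdefault(t, []).append(((p, q, i), support, confidence))
    return rules


if __name__ == "__main__":
    # Toy records for illustration only, not data from the paper.
    toy = (
        [("T10", "P7", "Q6", "I7")] * 3
        + [("T10", "P7", "Q6", "I6")]
        + [("T11", "P6", "Q6", "I7")] * 4
    )
    for t, found in sorted(mine_time_rules(toy, 0.05).items()):
        print(t, sorted(found))
```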

4 Discussion

4.1 Detection and Identification of Single Bad Data

Sample No. 2 is randomly selected, and the active power at sampling point No. 10 is set as bad data by increasing it by 10% relative to the original normal value. The original data of this sampling point are: active power 489.15 MW, reactive power −32.47 Mvar, current 580.1 A. The form after discretization is: T10, P7, Q6, I7. After the 10% increase, the active power becomes 538.06 MW, and its level changes from P7 to P3. All the modified data of sample No. 2 are discretized and stored as a new data source. The bad data detection process is shown in Fig. 2.

Fig. 2. Bad data detection process

The test results are shown in Table 2. The data record (T10, P3, Q6, I7) was extracted into the suspicious set of bad data. Because the record carries the timestamp T10, the bad data can be located in the 10th sampling record. The corresponding association rules obtained from the sample data mining in Sect. 3.2 are: T10 → P7, Q6, I7 (Sup = 0.17, Conf = 0.68) and T10 → P7, Q6, I6 (Sup = 0.08, Conf = 0.32). Normal data at the 10th sampling point therefore takes only these two forms, both with active power level P7, so the active power of the record can be identified as bad data.

Table 2. Single bad data detection result
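The detection and identification steps in this example can be sketched as follows. A record is treated as suspicious when its level combination matches none of the rules mined for its timestamp, and the bad quantity is then taken to be the field that differs from the closest rule. The closest-rule comparison is one plausible reading of the identification step, not a procedure spelled out in the paper, and the function names and the rule dictionary format are hypothetical.

```python
def find_suspicious(records, rules):
    """Return the records whose (P, Q, I) levels match no rule for their timestamp.

    records: list of (t, p, q, i) tuples (assumed format).
    rules:   {t: [(p, q, i), ...]} mined from historical data (assumed format).
    """
    return [
        (t, p, q, i)
        for t, p, q, i in records
        if (p, q, i) not in rules.get(t, [])
    ]


def locate_bad_fields(record, rules):
    """Name the fields that differ from the closest rule for the record's timestamp."""
    t, *levels = record
    names = ("P", "Q", "I")
    candidates = rules.get(t, [])
    if not candidates:
        return list(names)  # nothing to compare against
    # Closest rule = fewest differing fields.
    best = min(candidates, key=lambda rule: sum(a != b for a, b in zip(rule, levels)))
    return [name for name, a, b in zip(names, best, levels) if a != b]


if __name__ == "__main__":
    rules = {"T10": [("P7", "Q6", "I7"), ("P7", "Q6", "I6")]}
    modified = [("T10", "P3", "Q6", "I7")]
    for rec in find_suspicious(modified, rules):
        print(rec, locate_bad_fields(rec, rules))
```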

4.2 Detection and Identification of Multiple Bad Data

Sample No. 18 was randomly selected, and the active power at the 35th sampling point and the reactive power at the 65th sampling point were set as bad data: the active power was reduced by 10% and the reactive power by 20%. The original data of the 35th sampling point are: active power 527.72 MW, reactive power −52.77 Mvar, current 623.46 A. After discretization: T35, P4, Q8, I4. Reducing the active power by 10% gives 474.95 MW, and the level becomes P8. The original data of the 65th sampling point are: active power 553.09 MW, reactive power −33.49 Mvar, current 655.1 A. After discretization: T65, P2, Q6, I2. Reducing the reactive power by 20% gives −26.54 Mvar, and the level becomes Q3. The test results are shown in Table 3. According to the table, the data records (T35, P8, Q8, I4) and (T65, P2, Q3, I2) were extracted into the suspicious set of bad data, which shows that bad data were present in the 35th and 65th sampling records [9, 10].

Table 3. Multiple bad data detection result

5 Conclusion

In this paper, association rules in data mining and the detection and identification of bad data in power systems are studied in depth, and association rules are introduced into a model for detecting and identifying bad data in order to derive information with practical application value. The information obtained from the historical data of the power system using association rules helps to characterize the measured and predicted quantities at each moment, so that decisions have a scientific basis.