Abstract
Traditional bad data detection methods are estimation-based algorithms that require repeated state estimation. Besides the heavy computation this involves, they may also suffer from “residual flooding” or “residual pollution”. Ideally, bad data would be detected and identified before the estimation, and the bad data detection and identification method based on association rule mining studied in this paper addresses these problems to a certain extent. This paper first analyzes the traditional bad data detection and identification methods and then introduces data mining technology. Second, it examines the classic Apriori algorithm for association rules and its improvements, and studies the basic algorithm for periodic association rule mining and its improvement. Finally, the improved algorithm is applied: the current, active power, and reactive power data of a line, collected by the SCADA system of a dispatching center over the five months from May to September, are used as sample data to verify the feasibility and effectiveness of the method.
1 Introduction
To meet the needs of the national economy, the scale of China’s power grid is constantly expanding, and its structure and operation modes are becoming increasingly complicated [1]. The supervisory control and data acquisition (SCADA) system has been widely used in power networks. However, the system may fail to measure or transmit data correctly due to various force majeure factors, which produces abnormal data, that is, bad data [2]. To improve the reliability of power system state estimation and to screen out the small amount of bad data that occasionally appears in SCADA measurement samples, many scholars at home and abroad have conducted in-depth research on bad data mining techniques. Across all of these methods, however, the accuracy, speed, and comprehensiveness of bad data detection and identification remain major problems for electric power workers [3, 4].
At this stage, the protection and control system of the power grid has achieved a high degree of automation, which places higher requirements on the accuracy of the system data [5]. Once the data received by a substation automation system or dispatch automation system are bad data, the erroneous information interferes with the dispatcher’s judgment and may lead the dispatcher to make wrong control decisions. It may even cause protection and control devices to malfunction, seriously affecting the safety of the power grid [6].
The focus of this paper is to obtain association rules by mining historical data samples collected by the SCADA system, and to use them to detect and identify bad data before state estimation, even when the topology and operating status of the power system network are unclear. The work aims to provide a theoretical and practical basis for related fields and to contribute to the improvement of China’s power system security.
2 Method
2.1 Data Mining
Data mining is not the arbitrary application of existing or known analysis techniques to a specific problem [7, 8], but a systematic way of analyzing and solving problems. The whole process of data mining is shown in Fig. 1.
2.2 Improvement of Association Rule Algorithm
This paper reduces the number of data subsets that need to be counted when computing the periodic support and proposes the CARM2 algorithm, which reduces the time complexity of the mining. The specific improvement steps are as follows:
Assume that the number of data subsets contained in period (l, o) is |db(l, o)| and that the minimum periodic support is minCycle. If the |db(l, o)| data subsets are divided into two parts, they are:
Then only the periodic support count of the association rules of the first part of the data subsets needs to be computed, because the association rules of the second part cannot meet the minimum periodic support condition and therefore cannot become periodic association rules. On the other hand, assume that the first m data subsets of |db(l, o)| have been processed. If the periodic support count of an association rule in these m data subsets is less than:
\( \left| {db(l,o)} \right| \times \mathit{minCycle} - \left( \left| {db(l,o)} \right| - m \right) \)
Then this rule cannot become a periodic association rule.
Proof: The number of data subsets contained in period (l, o) is |db(l, o)|, and the minimum periodic support specified by the user is minCycle. For an association rule to become a periodic association rule, its periodic support count must be at least \( \left| {db(l,o)} \right| \times \mathit{minCycle} \). Once the first m data subsets of |db(l, o)| have been processed, |db(l, o)| − m data subsets remain. Even if the rule holds in every remaining subset, it must therefore have appeared in the first m data subsets at least:
\( \left| {db(l,o)} \right| \times \mathit{minCycle} - \left( \left| {db(l,o)} \right| - m \right) \)
times. Only then can this association rule become a periodic association rule. This pruning condition is the basis of the improved CARM2 algorithm.
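The pruning condition from the proof can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's CARM2 implementation: the function names `min_required_so_far` and `count_with_pruning`, and the representation of each data subset as a set of rule identifiers, are hypothetical choices made for the example.

```python
import math


def min_required_so_far(total_subsets: int, min_cycle: float, m: int) -> int:
    """Smallest count a rule needs in the first m subsets to stay viable.

    The rule needs ceil(total_subsets * min_cycle) hits overall, and at most
    (total_subsets - m) hits can still come from the subsets not yet scanned.
    """
    needed_overall = math.ceil(total_subsets * min_cycle)
    return max(0, needed_overall - (total_subsets - m))


def count_with_pruning(subsets, rule, min_cycle):
    """Count the subsets in which `rule` holds, stopping early when hopeless.

    `subsets` is a list of sets of rule identifiers (an assumed layout).
    Returns (is_periodic, subsets_scanned).
    """
    total = len(subsets)
    hits = 0
    for m, rules_in_subset in enumerate(subsets, start=1):
        if rule in rules_in_subset:
            hits += 1
        # Prune as soon as the remaining subsets cannot close the gap.
        if hits < min_required_so_far(total, min_cycle, m):
            return False, m
    return hits >= math.ceil(total * min_cycle), total
```

With ten data subsets and minCycle = 0.5, a rule that is absent from the first six subsets is discarded after the sixth, because the four remaining subsets cannot supply the five occurrences it needs.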
3 Experiment
3.1 Data
In this paper, the current, active power, and reactive power data of a line, collected by the SCADA system of a dispatch center over the five months from May to September, are used as sample data. Each daily active power, reactive power, and current curve has 96 sampling points, with a sampling interval of 15 min. The sample data used in this paper have already been screened and cleaned, so all of them are good data and none are missing.
3.2 Association Rule Mining
The selected historical data cover five months (150 days) of current and power distribution with a sampling interval of 15 min, so the time attribute takes 96 timestamps per day. The five months are divided by month into five periods, I to V, giving a total of 480 time units. The storage layout of the original database is shown in Table 1. The minimum support is set to 0.05, and periodic association rule mining is performed on the data subset of each period to obtain the periodic frequent itemsets at each timestamp, which are then summarized into the current and power distribution rules for that timestamp.
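The per-timestamp mining step can be sketched roughly as below. This is an illustration only: it assumes records are already discretized into `(timestamp, P level, Q level, I level)` tuples, and it uses the relative frequency of each level combination among records sharing a timestamp as a stand-in for support, since the paper does not state the denominator it uses. The name `mine_timestamp_rules` and the parameter `min_freq` are hypothetical.

```python
from collections import Counter, defaultdict


def mine_timestamp_rules(records, min_freq=0.05):
    """Return {timestamp: {(p, q, i): relative_frequency}} for frequent combos.

    `records` is an iterable of (timestamp, p_level, q_level, i_level) tuples
    taken from one period (for example, one month of daily curves).
    """
    by_timestamp = defaultdict(Counter)
    for timestamp, p_level, q_level, i_level in records:
        by_timestamp[timestamp][(p_level, q_level, i_level)] += 1

    rules = {}
    for timestamp, counts in by_timestamp.items():
        total = sum(counts.values())
        # Keep only combinations that occur often enough at this timestamp.
        frequent = {
            combo: count / total
            for combo, count in counts.items()
            if count / total >= min_freq
        }
        if frequent:
            rules[timestamp] = frequent
    return rules
```

Each surviving entry can be read as a rule of the form timestamp → (P level, Q level, I level), which is the shape of the rules used for detection in the next section.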
4 Discussion
4.1 Detection and Identification of Single Bad Data
Sample No. 2 is randomly selected, and the active power at sampling point No. 10 is set as bad data by increasing it by 10% relative to the original normal value. The original data at this sampling point are: active power 489.15 MW, reactive power −32.47 Mvar, current 580.1 A. After discretization, the record is (T10, P7, Q6, I7). Increasing the active power by 10% gives 538.06 MW, and the level changes from P7 to P3. All of the modified data of sample No. 2 are discretized and stored as a new data source. The bad data detection process is shown in Fig. 2.
The test results are shown in Table 2. The results show that the data record (T10, P3, Q6, I7) was extracted into the suspicious set of bad data. The record carries the timestamp T10, so the bad data can be located in the 10th sampling record. The corresponding association rules obtained from the sample data mining in the previous section are as follows: T10 → P7, Q6, I7 (Sup = 0.17, Conf = 0.68), T10 → P7, Q6, I6 (Sup = 0.08, Conf = 0.32). Only these two cases of normal data occur at the 10th sampling point, and both have active power level P7, while the reactive power and current levels of the suspicious record agree with the first rule. It can therefore be determined that the active power of the record is bad data.
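The identification step for this example can be sketched as follows. It is a simplified illustration, not the procedure of Fig. 2: the function `detect_bad_record` is hypothetical, rules are assumed to be stored as a list of level combinations per timestamp, and the current levels are written as I6 and I7 (an assumed reading of the source notation).

```python
def detect_bad_record(record, rules):
    """Flag a record whose levels match no mined rule for its timestamp.

    Returns (is_suspicious, suspect_fields). suspect_fields lists the
    attributes whose level disagrees with every rule for that timestamp;
    if no rule is known for the timestamp, all three attributes are listed.
    """
    timestamp, *levels = record
    known = rules.get(timestamp, [])
    if tuple(levels) in known:
        return False, []
    names = ("P", "Q", "I")
    suspects = [
        name
        for idx, name in enumerate(names)
        if all(combo[idx] != levels[idx] for combo in known)
    ]
    return True, suspects


# Rules for T10 as reported in the text (support and confidence omitted).
rules = {"T10": [("P7", "Q6", "I7"), ("P7", "Q6", "I6")]}
print(detect_bad_record(("T10", "P3", "Q6", "I7"), rules))
```

For the modified record the active power level P3 disagrees with both rules while Q6 and I7 each match at least one, so only the active power is reported as suspect.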
4.2 Detection and Identification of Multiple Bad Data
Sample No. 18 was randomly selected, and the active power at the 35th sampling point and the reactive power at the 65th sampling point were set as bad data: the active power was reduced by 10% and the reactive power by 20%. The original data at the 35th sampling point are: active power 527.72 MW, reactive power −52.77 Mvar, current 623.46 A. After discretization, the record is (T35, P4, Q8, I4). Reducing the active power by 10% gives 474.95 MW, and the level becomes P8. The original data at the 65th sampling point are: active power 553.09 MW, reactive power −33.49 Mvar, current 655.1 A. After discretization, the record is (T65, P2, Q6, I2). Reducing the reactive power by 20% gives −26.54 Mvar, and the level becomes Q3. The test results are shown in Table 3. According to the table, the data records (T35, P8, Q8, I4) and (T65, P2, Q3, I2) were extracted into the suspicious set of bad data, so bad data were present in the 35th and 65th sampling records [9, 10].
5 Conclusion
This paper studies association rules in data mining together with the detection and identification of bad data in power systems, and introduces association rules into bad data detection and identification. A detection and identification model is built to derive information with practical application value. The information that association rules extract from the historical data of the power system helps to obtain the measured and predicted quantities at each moment, so that decisions have a scientific basis.
References
Khan, Z., Razali, R.B., Daud, H.: Bad data detection in power system state estimation based on generalized likelihood ratio test. Int. J. Energy Stat. 04(4), 1650016 (2016)
Deng, S., Zhou, A., Yue, D.: Distributed intrusion detection based on hybrid gene expression programming and cloud computing in cyber physical power system. IET Control Theory Appl. 11(11), 1822–1829 (2017)
Jiang, X., Sheng, G.: Research and application of big data analysis of power equipment condition. High Volt. Eng. 44(4), 1041–1050 (2018)
Zhou, W.: Research and application of data mining algorithm based on fuzzy neural network for nonlinear problems in large data environment. J. Comput. Theor. Nanosci. 13(7), 4735–4738 (2016)
Falkenthal, M., Barzen, J., Breitenbücher, U.: Pattern research in the digital humanities: how data mining techniques support the identification of costume patterns. Comput. Sci. – Res. Dev. 32(3–4), 1–11 (2016)
Fan, S.-K.S., Lin, S.-C., Tsai, P.-F.: Wafer fault detection and key step identification for semiconductor manufacturing using principal component analysis, AdaBoost and decision tree. J. Chin. Inst. Ind. Eng. 33(3), 151–168 (2016)
Fatima, B., Ramzan, H., Asghar, S.: Session identification techniques used in web usage mining: a systematic mapping of scholarly literature. Online Inf. Rev. 40(7), 1033–1053 (2016)
Yu, H., Du, Y., Ma, C.: Survey of compressed sensing technology for signal and data of power system. Yi Qi Yi Biao Xue Bao/Chin. J. Sci. Instr. 38(8), 1943–1953 (2017)
Zhu, Y., Xing, N., Ji, Y.: Fault location algorithm of integrated data network for power system based on interactive active detection. Autom. Electr. Power Syst. 41(4), 35–40 (2017)
Fernandes, E.R., Ghiocel, S.G., Chow, J.H.: Application of a phasor-only state estimator to a large power system using real PMU data. IEEE Trans. Power Syst. 32(1), 1 (2016)
Acknowledgments
This work was supported by the Academic Funding Project for Outstanding Talents of Universities and Colleges (Professional) in Anhui Province in 2018 (Project Number: gxbjZD57).
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, H. (2020). Data Mining Technology in Detection and Identification of Bad Data in Power System. In: Yuan, X., Elhoseny, M., Shi, J. (eds) Urban Intelligence and Applications. ICUIA 2020. Communications in Computer and Information Science, vol 1319. Springer, Singapore. https://doi.org/10.1007/978-981-33-4601-7_6
DOI: https://doi.org/10.1007/978-981-33-4601-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4600-0
Online ISBN: 978-981-33-4601-7
eBook Packages: Computer Science, Computer Science (R0)