1 Introduction

Since the 1960s, the modernization of distributed control systems (DCS) at power plants has changed the traditional way of operating, supervising, and diagnosing the units. Technologists, suppliers, and alarm system experts believed that incorporating very large numbers of alarms into the DCS would improve unit operation, so alarm counts grew from a conventional range of tens or hundreds (80–250) to thousands (4000–20,000). This supposed “progress” turned out to be a myth, as alarm management practice has since confirmed.

Over the last decades, it has been necessary to pay attention to risk situations, accidents, incidents, and other aspects related to the safety of resources at power plants.

The number of unit trips increased alarmingly, without a convincing explanation of why this occurred despite better instrumentation, a larger number of supervised signals, faster and more timely delivery of information to the operator, and improved communications, operator interfaces, and equipment, among others. Hence the need to diagnose the performance level of alarm systems before and after carrying out suitable alarm management.

2 Background

Since the 1990s, following the creation of the ASM (Abnormal Situation Management) consortium, the EEMUA 191 (Engineering Equipment and Materials Users Association) guide, the NAMUR 102 (Alarm management) recommendation, and the review of the existing ANSI/ISA 18.2 standard [1, 2], there has been growing concern with reviewing alarm systems in order to improve the operation of generating units.

Additionally, diagnosis programs and systems have been developed that allow experts and consultants to identify where the problems originate. Where does the problem start? Why does the problem exist? How can the number of trips per unit be minimized? What tools can help improve the operation of the generation process?

All these questions converge on the need for an urgent diagnosis that identifies the elements or areas of improvement that contribute substantially to better operation of the units by the operator, who is fundamental to control room operation at power plants.

Diagnosing the operative state of the units not only identifies anomalous situations; it also reveals the areas of opportunity that help the operator run the power plant more efficiently. This ultimately leads to a better use of resources of every kind: human, financial, equipment, raw material, etc.

In most cases, there is confusion about what an alarm is. Before examining alarm management best practices, it is important to understand the complete concept of an alarm. Whatever the process concerned, an alarm has the following purposes.

  1. To alert the operator to an abnormal change

  2. To communicate the nature of the change as well as its possible causes

  3. To direct the operator to take proper corrective action

In best practice, the most important characteristic of an alarm is that it requires an action from the operator.

As a preamble, it is important to mention that the Instituto de Investigaciones Eléctricas (IIE, http://vmwl1.iie.org.mx/sitioIIE/site/indice.php), as part of an alarm management project initiated in October 2010, has developed diagnosis software tools for the power plants of the CFE (Federal Commission of Electricity), the only company for generation, transmission, and distribution of electricity in Mexico. The initial work was carried out in a thermoelectric power plant. Once the first alarm rationalization objectives were met, the changes were incorporated into the databases of two units, the first one off line and the second one on line with the DCS running, yielding the first results and increasing the safety of the plant systems and the operative reliability of the plant [3]. Figure 1 shows the distribution of available findings on related analysis and monitoring systems that have contributed to improving alarm systems on a global scale.

Fig. 1 Benchmark of system diagnosis around the world

3 Related Works

After the advent of the DCS, introduced about 1975–80 by firms such as Honeywell®, Mitsubishi®, Yokogawa®, Siemens®, and ABB®, among others, the shift to modern computers with large databases, high-resolution interfaces, and fast communications brought a big change in alarm management. Commercial applications exist for the diagnosis of alarm systems in the petrochemical and gas area [4] as well as in the electrical sector. For the latter, one example is the Matrikon® Alarm Manager’s Advanced Analysis [5], which automatically generates a unit performance report in accordance with parameters established by the EEMUA and ANSI/ISA standards; it identifies redundancy in the alarm configuration, chattering alarms, priority distribution, alarm occurrence statistics, etc. A similar system designed to identify alarm-related problems is Y-Plant Alert™ from Yokogawa, which detects alarm events and alarm avalanches and displays on screen the alarm states and the events within given intervals, providing the alarm manager with useful information on the state of the unit. The IIE has developed a generic tool for the off-line analysis of the alarm systems of any generating unit at a power plant. Its purpose is to be put into practice and extended as an evaluation support and a complement to the guidelines suggested by the international standards for alarm system rationalization activities.

All of these tools, with a certain degree of automation, are based on the guidelines of the reference standards as well as on the PAS alarm management guidelines [6].

4 ASARHE® Diagnosis System

The IIE diagnosed the alarm systems in the electrical generation sector at six pilot power plants chosen by technology type: hydroelectric, coal, combined cycle, diesel, geothermal, and conventional steam thermoelectric. The diagnosis identified the absence of procedures or guides, based on international reference standards, for alarm documentation and classification, for alarm design or redesign, and for maintenance of the alarm systems. Based on the results, a proprietary system named ASARHE® (Analysis of Signs of Alarms based on Historical Events Records) was developed to diagnose units [7, 8]. In addition, an alarm philosophy was prepared for every power plant; it specifies the requirements that establish the criteria for managing alarm systems and how they are applied in other areas.

Because different commercial DCSs are in use, there are also diverse ways of presenting the monitored variables, so there is no standardization in how alarms are displayed at the operator station. This has led to alarms being mixed with signals related to maintenance and to communications between software and hardware that serve to control and maintain the unit within normal operating conditions.

This variety of DCSs makes the interpretation of data for alarm system diagnosis laborious.

4.1 Initial Interface

One of the most important aspects in determining alarm system performance is the quantitative analysis of the alarm historical record, in which each alarm is stored with its tag, name or description, priority level (critical, warning, or tolerance alarm), set point, and date and time of occurrence.

Figure 2 shows the main interface of ASARHE, which is used to set up the sequence of tasks to be carried out and the parameters for analyzing the alarm historical files, through to the results and their interpretation.
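The fields of one historical record can be pictured as a small data structure. The following is only a sketch under stated assumptions: the class and field names, the example tag `FT-101`, and the sample values are hypothetical and are not taken from ASARHE, which the text describes only at the level of the stored fields.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Priority(Enum):
    """The three priority levels named in the text."""
    CRITICAL = "critical"
    WARNING = "warning"
    TOLERANCE = "tolerance"


@dataclass(frozen=True)
class AlarmRecord:
    """One entry of the alarm historical record (field names are assumed)."""
    tag: str               # instrument tag
    description: str       # alarm name or description
    priority: Priority     # critical, warning, or tolerance
    set_point: float       # configured alarm set point
    occurred_at: datetime  # date and time of occurrence


# Hypothetical example values, for illustration only.
record = AlarmRecord(
    tag="FT-101",
    description="Feedwater flow low",
    priority=Priority.WARNING,
    set_point=120.0,
    occurred_at=datetime(2012, 3, 1, 14, 25, 0),
)
```

Keeping the timestamp as a `datetime` rather than as text makes the later groupings by month, shift, hour, and 10 min interval straightforward.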

Fig. 2 Initial interface of ASARHE

The tasks are as follows.

  1. Cleaning of previous historical records

  2. Selection of the power plant to be analyzed

  3. Display of the units at the power plant

  4. DCS type

  5. Technology description of the power plant

  6. Selection of the historical record directory

  7. Display of the historical record

  8. Download of the information to ASARHE

  9. Exit to process the information and display the graphs

4.2 Process Flow of the Proposed System

Historical data are stored at the engineering station in the DCS's own format, so they must be converted to a format that ASARHE can read before the information is processed and the analysis results are delivered to the user. Figure 3 shows a schematic representation of the data conversion, the data processing, and the alarm distribution graphs.

Fig. 3 Flow chart of ASARHE system

4.3 Data Format Conversion

Conversion starts from the information in the alarm historical record, which is stored on tape, on floppy disks, or in the most modern DCSs, and from which the most appropriate data are selected. This information is kept in a format defined by the vendor and is later used by the chief analysts to check the sequence of operations, alarm occurrences, operator responses, etc., prior to a departure from normal operating conditions. A typical conversion required by ASARHE appears in Fig. 4, which shows how the order of the fields changes and how the alarms are separated into daily periods over one month.

Fig. 4 Data conversion first input to ASARHE
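The conversion step can be illustrated with a short sketch. The vendor layout assumed here (semicolon-separated fields in the order timestamp, tag, description, priority, set point, with a `dd/mm/yyyy hh:mm:ss` timestamp) is hypothetical, as are the function name `convert` and the sample tags; real DCS exports differ by vendor, and the actual ASARHE module is written in Visual Basic rather than Python.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def convert(raw_text):
    """Parse an assumed vendor export and group the alarms by calendar day."""
    by_day = defaultdict(list)
    for row in csv.reader(io.StringIO(raw_text), delimiter=";"):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        stamp, tag, description, priority, set_point = (f.strip() for f in row)
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        by_day[when.date().isoformat()].append({
            "time": when.time().isoformat(),
            "tag": tag,
            "description": description,
            "priority": priority.lower(),
            "set_point": float(set_point),
        })
    return dict(by_day)


# Hypothetical sample export, for illustration only.
SAMPLE = (
    "01/03/2012 14:25:10;FT-101;Feedwater flow low;WARNING;120.0\n"
    "02/03/2012 00:05:00;PT-205;Drum pressure high;CRITICAL;16.5\n"
    "01/03/2012 23:59:59;FT-101;Feedwater flow low;WARNING;120.0\n"
)
converted = convert(SAMPLE)
```

Grouping by ISO date gives one bucket per day, which mirrors the separation of alarms into daily periods described for Fig. 4.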

A code segment of one data conversion module, written in Visual Basic, is shown in Fig. 5.

Fig. 5 Code segment of the conversion module for the 20 most frequent alarms

4.4 Identification of Alarm Types and References

As soon as the information is converted to the ASARHE format, graphs are generated with statistics of the number of alarms per month, per operator shift, per hour, and per 10 min interval. The ANSI/ISA standard establishes that alarms must occur as follows. See Figs. 6 and 7.

Fig. 6 Type of alarms per period: monthly

Fig. 7 Type of alarms per period: every 10 min

  1. In normal situations, about one alarm occurs every 10 min.

  2. In a disturbance, during the first 10 min, there should be a maximum of 10 alarms.
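The two reference values above can be checked against a list of alarm timestamps. This is a minimal sketch, assuming fixed 10 min windows aligned to the clock (a sliding window would be an equally valid choice); the function names and the three labels are illustrative and do not come from ASARHE or from the standard.

```python
from collections import Counter
from datetime import datetime, timedelta

NORMAL_MAX = 1        # about one alarm per 10 min in normal operation
DISTURBANCE_MAX = 10  # at most 10 alarms in the first 10 min of a disturbance


def window_start(ts):
    """Round a timestamp down to the start of its 10 min window."""
    return ts - timedelta(minutes=ts.minute % 10,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def alarms_per_window(timestamps):
    """Count the alarms falling in each 10 min window."""
    return Counter(window_start(ts) for ts in timestamps)


def classify(count):
    """Label a window count against the two reference values (labels assumed)."""
    if count <= NORMAL_MAX:
        return "normal"
    if count <= DISTURBANCE_MAX:
        return "disturbance"
    return "over limit"


# Hypothetical data: one alarm at 08:03, then twelve alarms just after 08:10.
base = datetime(2012, 3, 1, 8, 0, 0)
stamps = [base + timedelta(minutes=3)]
stamps += [base + timedelta(minutes=10, seconds=5 * i) for i in range(12)]
counts = alarms_per_window(stamps)
```

Here `counts` maps each window start to its alarm count, so `classify(counts[w])` labels window `w`.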

4.5 Results Interpretation, Findings, and Performance Level Determination

Among the results generated by ASARHE are the 20 most frequent alarms, known as bad actors. See Fig. 8. Each of these alarms carries a tag that identifies the instrument, and bad actors generally coincide with badly calibrated or aged instrumentation that may need adjustment of its set point.

Fig. 8 Bad actors and nuisance alarms
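Ranking the most frequent alarms is a simple frequency count over the tags in the historical record. The sketch below is an assumption about how such a ranking could be computed, not the ASARHE implementation; the function name `bad_actors` and the sample tags are hypothetical.

```python
from collections import Counter


def bad_actors(tags, top_n=20):
    """Return (tag, count, share of all alarms) for the most frequent tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return [(tag, n, n / total) for tag, n in counts.most_common(top_n)]


# Hypothetical sequence of alarm tags, for illustration only.
sample_tags = ["FT-101"] * 5 + ["PT-205"] * 3 + ["LT-300"]
ranking = bad_actors(sample_tags, top_n=2)
```

Reporting the share alongside the count shows how much of the total alarm load a few tags account for, which helps decide where set point adjustment or recalibration would pay off first.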

Nuisance alarms may be duplicate alarms in the system. This situation is typical of the overloaded and reactive performance levels. Identifying these alarms and adjusting set points can make the difference between a reactive and a stable system, and it is often useful in practice during plant upsets.

Fig. 9 identifies the number of alarms per day, which is a good indicator of the overall health of the alarm system. In this graph, most daily counts are below the maximum manageable (300) and acceptable (150) limits, except during the last seven days of the month, when a disturbance occurred in a generating process. Generally speaking, the performance of the alarm system was acceptable.

Fig. 9 Number of critical, warning, and tolerance alarms per day
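The daily check against the 150 and 300 alarms-per-day limits quoted above can be sketched as follows. The constant names, the status labels, and the choice to treat a count exactly equal to a limit as within it are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

LOWER_LIMIT = 150  # lower daily limit quoted in the text
UPPER_LIMIT = 300  # upper daily limit quoted in the text


def daily_report(days):
    """Map each day to (alarm count, status against the two limits)."""
    counts = Counter(days)
    report = {}
    for day in sorted(counts):
        n = counts[day]
        if n <= LOWER_LIMIT:
            status = "within lower limit"
        elif n <= UPPER_LIMIT:
            status = "within upper limit"
        else:
            status = "above upper limit"
        report[day] = (n, status)
    return report


# Hypothetical input: one date per alarm occurrence.
days = ([date(2012, 3, 1)] * 120
        + [date(2012, 3, 2)] * 200
        + [date(2012, 3, 3)] * 450)
report = daily_report(days)
```

A run of consecutive days above the upper limit, such as the disturbance at the end of the month described for Fig. 9, stands out immediately in such a report.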

The distribution of alarm occurrences by priority is shown in Fig. 10.

Fig. 10 Distribution of alarms by priority per day

These last two graphs determine the performance level in accordance with the limits established in the reference standard, as indicated in Fig. 11.

Fig. 11 Performance level of alarm system

Performance levels are [1, 2]: (1) Overloaded: important alarms are very difficult to distinguish from less important ones; (2) Reactive: operators react more to the rate of alarm generation than to the purpose of the alarms themselves; (3) Stable: all alarms are meaningful and have a specific response; (4) Robust: operators strongly trust the alarm system and have time to attend to all alarms; and (5) Predictive: the alarm system is completely stable and provides the operator with timely, accurate information. The EEMUA 191 performance level model, second edition (2007), recommended this last level as the optimum. For most power plants, reaching it depends on the state of the art of the technology at the time. The third edition of EEMUA 191 (2013) considers the robust level as the top performance level. This level requires early and adequate fault detection, anticipation of process trends, and the incorporation of artificial intelligence techniques, among others.

4.6 Preparation for Alarms Rationalization

Common problems concern the excessive number of alarms presented to the operator, the identification of chattering alarms, the distinction between alarms and events, and the determination of the state of the alarm system.

As previously observed, immediately after DCS modernization, problems with the administration and proper handling of alarm systems created disturbances in the control rooms and hence in the power plants, whatever the process being controlled and monitored. For this reason, since modernization there has been a concern to restore alarm systems, in a sense a “regression” to the days when alarms were displayed on light box annunciators and appropriate legends allowed the operator to keep the process in its normal state (those days are gone). The next task is to carry out the steps of the alarm management life cycle of the ANSI/ISA 18.2 standard, whose stages, shown in Fig. 12, are the following.

Fig. 12 Alarm management life cycle stages

Philosophy ①: basic design of the alarm system; Identification ②: collection point for potential alarms; Rationalization ③: application of prioritization requirements; Design/re-design ④: basic alarm design, HMI design, and design of advanced alarming techniques; Implementation ⑤: installation of the alarm system and operator training; Operation ⑥: confirmation of the alarm philosophy and the purpose of each alarm; Maintenance ⑦: testing and adjustment when an alarm is not working properly; Monitoring ⑧: continuous monitoring of overall performance; Change management ⑨: identification of problem alarms for maintenance; and Audit ⑩: continuous improvement, closing the alarm management life cycle.

The operator's alarm interface is shown in Fig. 13, which presents different types of alarms during normal operation of a modernized 350 MW unit of a thermal power plant.

Fig. 13 Operator's alarm interface

5 Alarm Rationalization

Once the unit has been evaluated, the steps of the alarm management life cycle (Fig. 12) must be applied to reduce substantially the number of alarms presented to the operator at the interface. This work is generally carried out by expert engineers and by operators experienced in the operation of the power plant.

  1. To prepare the alarm philosophy: definitions and terminology to use, rationalization criteria, definition of alarm priorities, HMI display and prioritization criteria, monitoring, maintenance plan, testing, and operator training. The optimum alarm distribution criteria must be as shown in Fig. 14.

     Fig. 14 Operator’s response time

  2. To identify the information in the DCS database. It is necessary to understand its complete content.

  3. To analyze bad actors by monitoring the current alarm system, to identify the system performance and the alarm occurrences, and to separate alarms from events.

  4. To document the alarm book, containing the tag, set point, priority, and a clear description of (1) Cause of the alarm: why did the alarm occur? (2) Action: what must the operator do to restore the process to its normal condition? and (3) Consequence: what happens if the alarm is not attended to?

  5. To focus on the priority and the possible modification of every alarm, which must be reviewed by the person in charge of the alarm system together with expert engineers in operation, electrical, safety, faults, and other related areas.

  6. To implement suitable alarm administration in real time, that is, to establish a methodology that guarantees that every alarm is kept up to date.

  7. To keep control of program changes to every alarm.

The first three steps described are always necessary. The last four represent the most arduous and important work of alarm rationalization, and the performance and optimal operation of the power plant depend on them.
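The alarm book described in step 4 lends itself to a simple record with a completeness check, so that no alarm is left without its cause, action, and consequence. This is a sketch under assumptions: the class name, field names, helper function, and example values are hypothetical and are not taken from the IIE alarm books.

```python
from dataclasses import dataclass, fields


@dataclass
class AlarmBookEntry:
    """One alarm book entry with the fields listed in step 4 (names assumed)."""
    tag: str
    set_point: str
    priority: str
    cause: str        # why did the alarm occur?
    action: str       # what must the operator do?
    consequence: str  # what happens if the alarm is not attended to?


def missing_fields(entry):
    """Return the names of the fields that are still empty."""
    return [f.name for f in fields(entry)
            if not str(getattr(entry, f.name)).strip()]


# Hypothetical entry whose operator action has not been documented yet.
draft = AlarmBookEntry(
    tag="PT-205",
    set_point="16.5 MPa",
    priority="critical",
    cause="Drum pressure above set point",
    action="",
    consequence="Unit trip on high drum pressure",
)
incomplete = missing_fields(draft)
```

Running such a check over every entry before the rationalized database is loaded into the DCS is one way to enforce the rule that each alarm must have a defined operator action.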

6 Conclusion and Future Work

The diagnosis before alarm rationalization is fundamental, since improving the performance level of the alarm system in process plants can avoid accidents, production losses, and unnecessary unit trips, which in turn heavily affect the economic resources of power plants as well as their reliability, a relevant aspect in safety terms [9, 10].

The diagnosis uses a systematic, validated, standardized, and highly advisable methodology that is comparable on a global scale. In Mexico no reference exists, and this represents a major challenge for the IIE, since database systems that operate in real power plants are re-designed and any mistake can have unimaginable consequences.

Up to this moment, there is no other technologist in the country that can diagnose the units of power plants. Nevertheless, the rationalization activities that the IIE is applying to the alarm systems of CFE power plants can be compared with those of companies worldwide that are also applying suitable management, rationalization, and administration of alarm systems, such as the proprietary analysis products of Matrikon and Emerson (United States and Canada), ABB (Switzerland, Finland, and Sweden), Siemens (Germany), and Yokogawa (Japan).

From 2010 to 2013, 148 alarm systems were rationalized in 50 different power plants. Alarm books were prepared for every unit. In all units where the rationalized alarms were implemented, the alarm system moved from the OVERLOADED to the STABLE performance level.

The ASARHE system will continue to be applied to power plants that modernize their DCS and that adopt alarm management as an integral solution within the electrical sector, as part of their daily activities and of a new culture of continuous improvement.

In the future, advanced alarm management techniques for optimizing the alarm system will be included. This will help generating units reach the ROBUST or PREDICTIVE performance level, according to process type and operation mode.