1 Introduction

Of the hundreds of machinery-related accidents occurring every year in the mining industry, incidents involving stationary machinery at surface mines continue to be among the most frequent [1]. A National Institute for Occupational Safety and Health (NIOSH) study showed that the majority of fatal accidents involving stationary machinery at surface mines occurred at sand and gravel (38%) and stone (26%) operations. Of these accidents, entanglement in conveyor components was the most common cause (48%) [2]. The same study stated that one-third of these accidents involved improper lockout/tagout (LOTO) procedures as a contributing factor. The U.S. Mine Safety and Health Administration (MSHA) acknowledged this problem, stating in a recent request for information (RFI) [3]: “Since 2007, there have been 17 fatalities related to working near or around belt conveyors, of which 76% were related to miners becoming entangled in belt drives, belt rollers, and discharge points. Factors that contribute to entanglement hazards include inadequate or missing guards, inadequate or an insufficient number of crossovers in strategic locations, and/or inappropriate lock out/tag out procedures. Systems that can sense a miner’s presence in hazardous locations; ensure that machine guards are properly secured in place; and/or ensure machines are properly locked out and tagged out during maintenance would reduce fatalities [3].”

In response to this problem, NIOSH’s Spokane Mining Research Division (SMRD) is exploring the potential application of Internet of Things (IoT) technologies to provide cost-effective intelligent machine monitoring systems for improved worker safety [4, 5]. For phase I of this project, SMRD partnered with Central Pre-Mix CRH Company (Central Pre-Mix) to develop and install a proof-of-concept wireless IoT solution to monitor machinery and conveyors during operation and maintenance. The primary goals of the system were to provide real-time monitoring of access points and facilitate the planning and execution of LOTO procedures. Phase II, which is presently underway, considers expansion of the system in scope and functionality.

This paper describes the design and field deployment of phase I of this system, which represents a vital first step towards widespread adoption of intelligent safety monitoring systems. Current and future work for phase II, which is intended to deliver a final, comprehensive system for use by industry, is also described.

2 Phase I

The first phase in developing the intelligent monitoring system at Central Pre-Mix involved monitoring equipment and concrete batch temperatures, managing lockout/tagout, and tracking confined space entry. The development and field installation at Central Pre-Mix served as proof-of-concept for an IoT solution to machine safety while addressing specific safety concerns at the batch plant. This system is informational: it provides real-time data on intrusions, safety status, and LOTO, but it is not intended to automate these functions.

2.1 Monitoring, LOTO, and Confined Space Requirements

Machinery monitoring requirements for this study were driven by Central Pre-Mix’s concrete batch plant daily operational and maintenance practices. High priority needs included monitoring access to the mixing area, measurement of concrete batch temperatures, measurement of the temperature of the concrete mixing drum main support bearings, and end-of-shift maintenance of the concrete mixer drum.

As the mixing area is considered hazardous, monitoring access was deemed critical. While access for workers is not restricted, monitoring the access door allows batch plant operators to be aware of any workers entering or leaving the vicinity.

Temperature is monitored at two high-priority locations: the mixing drum main support bearings and the concrete batch. Knowing the concrete batch temperature is critical to ensure quality and long-term integrity of the concrete. The batch temperature is currently measured by a worker using a handheld infrared (IR) thermometer, but it was desirable to halt this procedure since workers could become entangled in the mixer drum while taking measurements. The temperature of the concrete mixing drum main support bearings is checked regularly as a predictor of unexpected bearing failures. Predicting failure of the main bearings can reduce the hazards associated with a catastrophic failure or with hastened repairs, as well as avoiding the higher costs incurred due to catastrophic component failure.

The principal confined space restriction managed in phase I is that surrounding the mixer drum. Daily maintenance of the mixer drum requires a worker to crawl inside the drum and remove accumulations of hardened concrete using a pneumatic rotary hammer (mixer drum chipping), as shown in Fig. 1. Access to the mixer drum entry is through a gate labeled with a confined space warning sign. Before entering the drum, electrical power is isolated from the drum, charge belt, mixer feed conveyor, and hydraulic pump motors using four disconnect switches. The daily cleaning procedure thus entails both LOTO and confined space protocols.

Fig. 1. Entering mixer drum to perform maintenance

During daily mixer maintenance, the four disconnects mentioned above are locked in the OFF position (lockout) using a long bar held in place by a single padlock (Fig. 2). A tag belonging to the worker(s) involved is placed on the lock (tagout), then verification of electrical isolation is performed by attempting to start each motor (testout), at which point the LOTO is complete and the worker may enter the drum.
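The lockout, tagout, and testout sequence described above can be expressed as a simple completeness check. The following is a minimal sketch, not the deployed software: the `LotoState` structure, the disconnect names, and the function name are hypothetical, and in keeping with the informational nature of the system the check only reports status rather than controlling any equipment.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the four disconnects named in the text.
DISCONNECTS = ("drum", "charge_belt", "mixer_feed_conveyor", "hydraulic_pump")


@dataclass
class LotoState:
    disconnects_off: dict   # disconnect name -> True if the switch is OFF
    padlock_applied: bool   # lockout: bar and padlock in place
    tag_attached: bool      # tagout: worker's tag placed on the lock
    testout_passed: bool    # testout: start attempt on each motor did nothing


def loto_complete(state: LotoState) -> bool:
    """Return True only if lockout, tagout, and testout are all satisfied."""
    all_off = all(state.disconnects_off.get(name, False) for name in DISCONNECTS)
    return (all_off and state.padlock_applied
            and state.tag_attached and state.testout_passed)


# Example: every switch is OFF and locked, but the tag is missing.
state = LotoState({name: True for name in DISCONNECTS}, True, False, True)
print(loto_complete(state))  # False
```

A disconnect that is absent from the reported data is treated as not isolated, so missing sensor data never reads as a completed LOTO.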

Fig. 2. Locked out disconnect panel

Before entry, a confined space permit must be filled out. Both the worker who is performing the chipping and an attendant who will remain outside the drum must sign the form. Once the mixer drum chipping is complete, the confined space entry permit is indicated as complete by writing the word “canceled” in big letters across the form. The last steps are filing the completed permit onsite and reversing the LOTO process before startup.

2.2 System Design Considerations and Hardware Selection

The system designed for phase I was proof-of-concept. The central aim was to determine the viability of IoT in providing intelligent machine monitoring and assisted LOTO. System design considerations included wired versus wireless technologies, sensor types, sensor node power requirements, sensor node network topology, and data transport method.

One early decision for the project was the selection of battery-powered nodes, as opposed to wired. As the eventual goal is installation in large-scale surface stone, sand, and gravel (SSG) mine machinery and conveyor systems, wired sensors were deemed impractical. A wired solution would severely increase installation complexity and maintenance costs, along with slowing deployment. Further, some sites would have limited access to mains AC power. It was with these factors in mind that battery-powered sensor nodes were selected.

The appropriate network topology for the system also posed an interesting problem. There are many options, such as bus, star, tree, and mesh, each with their particular pros and cons. The bus, star, and tree topologies, for example, offer an easier installation than a mesh network, but there is a greater risk of cutting off segments of the network if there is a failure at a critical point such as the central hub (for a star) or along the linear bus (for a bus network). The mixer area presented a challenging environment for a wireless network as it is filled with numerous metal objects, e.g., mixer drum, mixer drum support structure, conveyor assemblies, electrical panels, and conduits. Also, only three of the proposed sensor node locations had a line-of-sight to the IoT node coordinator (gateway). Given the difficult environment, it was decided that a self-configuring, self-healing mesh network topology would offer the best chance for robust wireless links [6, 7].
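The trade-off between topologies can be illustrated by checking which nodes remain reachable from the gateway after one node fails. The sketch below uses two hypothetical five-node layouts (the node names and links are assumptions for illustration, not the installed network) and a plain breadth-first search; it does not model radio behavior or the self-healing protocol itself.

```python
from collections import deque


def reachable(links, start, failed=frozenset()):
    """Breadth-first search over undirected links, skipping failed nodes."""
    if start in failed:
        return set()
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen and neighbor not in failed:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical layouts; "gw" is the gateway.
tree = [("gw", "n1"), ("n1", "n2"), ("n1", "n3"), ("gw", "n4")]
mesh = tree + [("n2", "n4"), ("n3", "n4")]  # same nodes, two extra links

print(sorted(reachable(tree, "gw", failed={"n1"})))  # ['gw', 'n4']
print(sorted(reachable(mesh, "gw", failed={"n1"})))  # ['gw', 'n2', 'n3', 'n4']
```

In the tree layout the loss of `n1` cuts off `n2` and `n3`, whereas the extra links in the mesh layout give both nodes an alternate route through `n4`.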

In order to process data independent of Central Pre-Mix’s information technology (IT) network, a cellular data transport (backhaul) was determined to be the best choice to transport sensor data to cloud storage [8]. Cellular data transport does not place a burden on the site’s network infrastructure, and it avoids any concerns regarding confidential data. The usage of cloud storage and computing is discussed in greater detail in Section 2.4 below.

With the system design criteria defined, a review of various wireless IoT technologies and manufacturers was conducted, seeking products that were commensurate with our requirements.

Some wireless protocols under consideration included LoRaWAN, ZigBee, and Bluetooth 5 LE. LoRaWAN is a low-power, wide-area network that uses the LoRa (long range) layer to deliver messages to a gateway without the multiple hops typically seen in a mesh topology [9]. ZigBee offers a much shorter range, typically under 100 m, but in contrast it is capable of a data transfer rate five times higher [10]. Bluetooth 5 Low Energy is a relatively new revision that modifies the existing Bluetooth stack to support low-power scenarios such as IoT. It is capable of a high data rate (as high as 2 Mb/s), and a recent addition to the specifications added support for mesh topologies [11].

The final option, and the one selected for the first stage prototype, was the Wzzard platform by Advantech (Ottawa, IL). It is a turnkey solution in the form of rugged nodes rated IP67 (no dust ingress, and able to withstand immersion in water to a depth of 1 m), ideal for system prototyping in an industrial environment. Additionally, these nodes have multiple analog and digital sensor inputs and create a self-forming and self-healing mesh network topology. This solution is scalable to hundreds of nodes and offers a cellular gateway for ease of data transport to cloud storage. Phase I therefore utilized the Advantech SmartSwarm gateway and Wzzard nodes (Fig. 3). These sensors utilize the message queuing telemetry transport (MQTT) protocol, which is a publish/subscribe protocol requiring a publisher (the sensor), a broker (the gateway), and a subscriber (the webpages). The Wzzard platform is time synchronized, which helps to ensure low power by reducing the duty cycle. However, this also means that data is received no more often than once every 10 s.
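The publish/subscribe roles described above can be sketched with a toy in-memory broker. This is an illustration of the pattern only: it is not an MQTT implementation (no network transport, quality-of-service levels, or topic wildcards), and the `ToyBroker` class and topic names are hypothetical.

```python
class ToyBroker:
    """In-memory stand-in for a broker: routes payloads by exact topic match."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        # Deliver to every callback registered for this topic, if any.
        for callback in self._subscribers.get(topic, []):
            callback(topic, payload)


received = []
broker = ToyBroker()

# Subscriber (e.g., a status webpage) registers interest in the gate topic.
broker.subscribe("plant/mixer/gate", lambda t, p: received.append((t, p)))

# Publishers (sensor nodes) send readings without knowing who is listening.
broker.publish("plant/mixer/gate", "open")
broker.publish("plant/mixer/bearing_temp", "41.5")  # no subscriber, not delivered

print(received)  # [('plant/mixer/gate', 'open')]
```

The decoupling shown here is what allows additional displays or alarm handlers to be added later without reconfiguring the sensor nodes.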

Fig. 3. A SmartSwarm cellular gateway (left), industrial-grade Wzzard node (middle), and commercial-grade Wzzard node (right)

2.3 Installation, Configuration, and Commissioning

Field installation began with the setup of the system’s sensors. For the access door and entry gate, magnetically activated reed switch sensors were selected. With this method, the opening of the gate/door is detected while ensuring there are no false readings from vibration [5]. For temperature measurements on the bearings, contact thermocouples were installed under mounting bolts with thermal grease. Finally, the batch temperature was measured using a self-powered IR thermocouple, obviating the need for a probe that would be quickly worn if placed in the material flow.

Once sensors were in place, each was connected to its dedicated node. The roller temperature nodes were set to a publish rate of 2 min, whereas the door, gate, and safety disconnects were set to publish every 10 s. Publishing every 10 s will greatly reduce battery life, but was deemed necessary to provide timely worker location data.
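The effect of publish rate on battery life can be estimated with simple arithmetic. In the sketch below only the two publish intervals (10 s and 2 min) come from the text; the battery capacity, idle current, and charge per publish are hypothetical placeholder values and should be replaced with measured figures for a real node.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def publishes_per_day(interval_s):
    """Number of publish events per day for a given publish interval."""
    return SECONDS_PER_DAY / interval_s


def battery_life_days(capacity_mah, idle_ma, charge_per_publish_mc, interval_s):
    """Estimate battery life from idle draw plus a fixed charge per publish."""
    capacity_mc = capacity_mah * 3600            # 1 mAh = 3600 mC
    idle_mc_per_day = idle_ma * SECONDS_PER_DAY  # mA x s = mC
    publish_mc_per_day = charge_per_publish_mc * publishes_per_day(interval_s)
    return capacity_mc / (idle_mc_per_day + publish_mc_per_day)


print(publishes_per_day(10))    # 8640.0 publishes per day (door, gate, disconnects)
print(publishes_per_day(120))   # 720.0 publishes per day (temperature nodes)

# Hypothetical node: 2400 mAh battery, 0.01 mA idle draw, 0.5 mC per publish.
fast = battery_life_days(2400, 0.01, 0.5, interval_s=10)
slow = battery_life_days(2400, 0.01, 0.5, interval_s=120)
print(fast < slow)  # True
```

A node publishing every 10 s transmits twelve times as often as one publishing every 2 min, which is why the faster rate was reserved for the safety-critical door, gate, and disconnect sensors.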

After the hardware installation, all of the nodes were configured to communicate with a gateway on which scripting software (Node-RED) was used to parse the data and create webpages that display the sensor data in a web browser. Additionally, the gateway collects the data from sensors and sends it in the form of encrypted MQTT messages to the cloud, where scripting software parses the data and relays it to webpages for remote viewing. The redundant local viewing (store and forward configuration) provides a safeguard in the (yet to be encountered) case where the cellular network fails. The local display consisted of a 19″ touch screen and a Raspberry Pi essentially acting as a thin client, modified to operate in kiosk mode. Remote or local viewing is possible on any device with a current web browser, as shown in Figs. 4 and 5.
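One common reading of a store and forward configuration is that messages are held locally while the uplink is unavailable and forwarded in order once it returns. The sketch below illustrates that idea under this assumption; the `StoreAndForward` class, the `send` callable, and the buffer size are hypothetical and do not describe the gateway's actual firmware.

```python
from collections import deque


class StoreAndForward:
    """Buffer messages locally and forward them when the uplink is available."""

    def __init__(self, send, max_buffered=1000):
        self._send = send  # callable that returns True when delivery succeeds
        self._pending = deque(maxlen=max_buffered)  # oldest entries drop when full

    def submit(self, message):
        self._pending.append(message)
        self.flush()

    def flush(self):
        # Forward in order; stop at the first failure and keep the rest queued.
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()

    def pending(self):
        return list(self._pending)


uplink = {"up": False}   # simulated cellular link state
delivered = []


def send(message):
    if uplink["up"]:
        delivered.append(message)
        return True
    return False


buffer = StoreAndForward(send)
buffer.submit("gate=open")
buffer.submit("gate=closed")   # both remain queued while the link is down
uplink["up"] = True
buffer.flush()                 # link restored: both are forwarded in order
print(delivered)               # ['gate=open', 'gate=closed']
```

The bounded queue trades completeness for predictable memory use: if an outage outlasts the buffer, the oldest readings are discarded first.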

Fig. 4. Web browser view of disconnect switches and gate/door statuses

Fig. 5. Web browser view of online confined space entry form

In addition to viewing data, the prototype system allows the worker to use a tablet or cell phone to populate forms that are required for LOTO or confined space procedures (typically done using paper and pencil). While this currently does not supplant paper forms, the digital submission and archiving of such forms provides useful data that can easily be referenced later. One possible use of such data is to verify that the LOTO process was done correctly, which will in turn reduce the likelihood of worker circumvention. Further, the mobile planning option has great potential to reduce the burden of the LOTO process.

The two key challenges to the implementation of this system were the battery life of the nodes and the configuration of the network to ensure all nodes had a robust connection. These two issues are closely related in that one of the main causes of reduced battery life is nodes repeatedly seeking and failing to connect with each other. This can be mitigated by using better antennas (e.g., larger, externally mounted) or by installing “repeater nodes” in cases where neighboring nodes are failing to connect properly.

The “network health reports” were consulted to address the above difficulties and troubleshoot connectivity problems. Perhaps the most useful information contained in the health reports provided by Advantech nodes (and common among others) is the received signal strength indication (RSSI) for each node and its neighbors. This data was vital in fine-tuning the system to prevent data loss. Additionally, nodes which are having to repeatedly rejoin the network can be detected in order to troubleshoot and conserve power. The health reports also include dropped and received packets, the former of which indicates potential interference and battery loss, while the latter is useful for determining the flow of traffic through a mesh network. The node’s battery voltage and cumulative charge consumption (in millicoulombs) are also provided and can be used to determine, over time, faults in the node or network configuration.
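The troubleshooting described above amounts to screening each health report against a few thresholds. The following sketch assumes a simplified report structure; the `HealthReport` fields and the threshold values (−85 dBm, 5% packet loss, two rejoins) are hypothetical choices for illustration, not values taken from the Advantech reports or from this deployment.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    node_id: str
    rssi_dbm: float        # received signal strength to the best neighbor
    packets_received: int
    packets_dropped: int
    rejoin_count: int      # times the node rejoined the network in the period


def flag_nodes(reports, min_rssi_dbm=-85.0, max_drop_ratio=0.05, max_rejoins=2):
    """Return {node_id: [reasons]} for nodes that exceed any threshold."""
    flagged = {}
    for r in reports:
        reasons = []
        total = r.packets_received + r.packets_dropped
        drop_ratio = r.packets_dropped / total if total else 0.0
        if r.rssi_dbm < min_rssi_dbm:
            reasons.append("weak signal")
        if drop_ratio > max_drop_ratio:
            reasons.append("high packet loss")
        if r.rejoin_count > max_rejoins:
            reasons.append("frequent rejoins")
        if reasons:
            flagged[r.node_id] = reasons
    return flagged


reports = [
    HealthReport("n1", -60.0, 1000, 5, 0),    # healthy node
    HealthReport("n2", -92.0, 800, 200, 4),   # candidate for a repeater or antenna
]
print(flag_nodes(reports))
# {'n2': ['weak signal', 'high packet loss', 'frequent rejoins']}
```

Nodes flagged for several reasons at once are the natural first candidates for a better antenna or a nearby repeater node.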

A final feature of the safety monitoring and planning system is the use of electronic forms, which are compared to the sensor data. For example, if the gate is opened and no form has been submitted to plan the chipping of the mixer, an alarm is sent via text message and email. This not only helps ensure that LOTO protocol is followed, but also allows alarms to be sent in the event the chipping of the mixer takes unusually long.
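The comparison of form data against sensor data can be sketched as two rules applied to a gate-open event. The function name, the alarm wording, and the two-hour duration limit below are hypothetical; the sketch only returns alarm messages and leaves delivery by text message or email to other code.

```python
from datetime import datetime, timedelta


def check_gate_event(gate_opened_at, form_submitted_at, now,
                     max_duration=timedelta(hours=2)):
    """Return alarm messages for a mixer gate opening that is still in progress."""
    alarms = []
    # Rule 1: a chipping plan must have been submitted before the gate opened.
    if form_submitted_at is None or form_submitted_at > gate_opened_at:
        alarms.append("gate opened without a submitted chipping plan")
    # Rule 2: the job should not run longer than the expected duration.
    if now - gate_opened_at > max_duration:
        alarms.append("chipping has exceeded the expected duration")
    return alarms


opened = datetime(2020, 1, 6, 15, 0)  # illustrative timestamp
print(check_gate_event(opened, None, opened + timedelta(minutes=1)))
# ['gate opened without a submitted chipping plan']
```

Keeping the rules as a pure function of timestamps makes them easy to test against archived sensor and form data.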

2.4 Cloud Data Storage

The sensor data is sent to the cloud from the local mesh network using a cellular gateway, creating a network independent from the site’s existing infrastructure. This provides additional security by segregating the collected data from sensitive material on the site’s existing network. Additionally, the use of a cellular backhaul provides a quick installation, which can be easily upgraded, replaced, or relocated at the site.

The monitoring system uses the cloud for data storage, which has become common for enterprise applications, as it offers many benefits over traditional local storage. Foremost, the monitoring data can be easily viewed remotely, providing personnel with real-time data off-site which can be viewed from a PC or mobile device. A safety officer, management, or foreman will receive alerts concerning LOTO violations or impending equipment failure and can readily check current conditions in the plant. Cloud storage also offers scalability. Should the monitoring system expand, local network and storage could be placed under a heavier load and require upgrades and maintenance. Cloud storage removes this problem through horizontal scaling, as additional resources can be allocated to meet rising demand. Additionally, all monitored data can be stored, if desired, to facilitate long-term analysis or to leverage machine learning to offer new insights regarding plant operations.

One concern about cloud storage is security, namely data leaks. Future software revisions will implement rigorous encryption for cloud data, along with an option to bypass the cloud and opt for local data storage. It should be noted that many cloud servers will be quite secure given most providers will implement security patches as needed.

3 Phase II

3.1 Expansion of System and Increased Functionality

Researchers at NIOSH have already begun work on the second phase of this project. The first goal is to expand the system by scaling up the network. This means a greater number of nodes covering a wider field and will, naturally, increase the network complexity. The second aim is to add functionality to the system, such as predictive failure analysis and proximity detection. Throughout development, there will also be hardware and software revisions in order to provide the best fit solution for each site. The third and possibly most important task in phase II will be to evaluate the impact of the system on a worker’s situational awareness (SA). This will include refining the user interface, filtering data in order to provide the most relevant information, and finally testing to make certain our system is improving SA.

The initial step in scaling up the system will be conducted at Central Pre-Mix. As the current site is operating efficiently, this will make an excellent testbed and allow for faster troubleshooting of the network. The expansion plans will triple the number of sensors and add an additional building and conveyor. Parallel to this work, NIOSH will be investigating alternative hardware and software solutions. The hardware will include various radios and protocols, such as Bluetooth 5 LE, LoRaWAN, and ZigBee. Data will be gathered regarding energy consumption and evaluated to determine best fit for a given scenario. For example, in order to service remote installations, systems will have the option to use a satellite gateway in lieu of cellular. Additionally, researchers will evaluate the merits of a pure web-based solution against the most common HMIs used by industry. Software evaluations will take into consideration industry adoption, cost, latency, security, and free access to data. Phase II will also include a locally hosted version of the software. Although cloud storage has been the preferred method during development, allowing sites to host the software on a local network will also enable remote locations to make use of the system.

Phase II will also incorporate a number of new functions. The system will add predictive failure analysis by analyzing historic data in order to anticipate imminent equipment failure (e.g., bearing failure, which can be predicted through high operating temperatures). Machine learning will be brought to bear on these problems as well, with the development of novel algorithms to offer new insights into the causes of component failure. This will reduce the potential for accidents by notifying operators of necessary maintenance prior to catastrophic equipment failure. Proximity detection will also be considered, beginning with a study of turnkey options such as video analysis or RFID. It is assumed that any use of proximity detection will be limited to an alarm system at this stage, as opposed to an automatic equipment shutdown. Further, the system will include a component to measure and reduce dust exposure through improved transfer points and real-time monitoring. The system will use sensor data to estimate air flow at potential hazard zones and compensate to restrict airborne particles. Should there be hazardous exposure levels, alerts will be sent to users via text/email. In order to ensure the safe and reliable expansion of the system, a reliability analysis will be conducted by NIOSH researchers and interested stakeholders. This is anticipated to take the form of a failure mode and effects analysis (FMEA) or fault tree analysis (FTA).
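As a simple illustration of the predictive failure analysis mentioned at the start of this section, bearing temperature can be screened against a baseline using a short moving average. This is a hedged sketch of one elementary approach, not the planned phase II algorithm: the function name, the 15 °C margin, and the five-reading window are hypothetical.

```python
from statistics import mean


def overheating(readings_c, baseline_c, margin_c=15.0, window=5):
    """True if the mean of the last `window` readings exceeds baseline + margin."""
    if len(readings_c) < window:
        return False  # not enough data to judge
    return mean(readings_c[-window:]) > baseline_c + margin_c


normal = [40, 41, 40, 42, 41]
rising = [40, 41, 42, 60, 62, 63, 64, 65]
print(overheating(normal, baseline_c=40))  # False
print(overheating(rising, baseline_c=40))  # True
```

Averaging over a window suppresses single spurious readings, at the cost of a short delay before a genuine rise is reported.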

As an additional measure, SMRD is developing an inspection web app that can work independently or in concert with the monitoring system. The app will guide a user through a pre-shift inspection, in compliance with MSHA’s new regulation [12], and archive the results for future reference. Additionally, the app will track outstanding maintenance issues, which can be updated as they are addressed. Although electronic records have not supplanted paper yet, the eventual goal is to see paper replaced with electronically signed documents. This app will greatly reduce the time required for pre-shift inspections and record keeping.

The final stage of the project will measure worker situational awareness. For the purposes of our system’s development, SA can be broadly understood as a worker’s perception and understanding of their environment, along with the ability to predict events based on that information. It is a central aim of this project that the informational system improve SA. To that end, the user interface will be refined through a multi-stage, iterative process. Feedback will be solicited from stakeholders, test sites, and experts in human factors and then applied to a series of software revisions. Additionally, the data presented to the end user will be filtered to avoid extraneous details or distractions, with the goal being to display critical information by default and allow customization as desired. Finally, testing will be conducted to determine the impact of the system on worker SA. This will take the form of an analysis of the accumulated data in concert with live volunteer testing.

4 Conclusion

Initial testing of the IoT monitoring system, in collaboration with Central Pre-Mix, has successfully demonstrated a method for electronically tracking and confirming lockout/tagout. Additionally, the system has proven capable of reliably archiving lockout/tagout and confined space entry data. Early work has also provided equipment and batch temperature readings, both of which are available in real time to workers. The mesh network employed has proven viable in an extreme environment, and results indicate a cloud-based solution will meet the system’s needs. Phase I provides a clear example of the promise in bringing IoT to the mining industry, reliably and at a low cost.

Early work on phase II of the project is promising. NIOSH researchers are currently expanding on the existing intelligent monitoring system, in terms of scope and functionality. The final, comprehensive system is expected to include predictive failure analysis using historic data archived in phase I, as well as additional sensors to monitor, for example, worker proximity to hazards or localized environmental conditions. Further, the final system will be scalable to larger installations, with hundreds of sensors in operation. This is expected to save time, improve workers’ situational awareness, and reduce accidents in the workplace. Development will strive to maintain low cost, accessibility, and ease of use, in order to ensure wide adoption.

The proliferation of IoT devices, in the business and consumer spheres, illustrates that the public is ready to adopt these new solutions. The mining sector in particular will benefit enormously from the ability to remotely view safety-related data in real time and to receive alarms when safeguards are potentially failing. There is a preponderance of evidence that the operation and maintenance of equipment, especially conveyors, is hazardous to workers, and NIOSH will continue working to reduce these accidents through leveraging emerging technologies such as IoT.