1 Introduction

Noise pollution affects millions of individuals worldwide. Explosive population growth, urbanisation, human leisure activities, construction, and industry, as well as transportation (motorways, railways, and air traffic), one of the primary sources of environmental noise, all contribute to the problem [1, 2]. Indoor noise pollution, by contrast, tends to be more localised, arising in libraries, offices, courts, schools, and hospitals [3, 4]. Since noise pollution has become intolerable to human beings, numerous efforts have been made to evaluate noise levels in urban cities [5,6,7], industrial cities [8, 9], and traffic in general [10,11,12,13,14]. Some studies focussed on environmental noise monitoring and geo-spatial mapping analysis [15,16,17], while others performed noise source identification and classification using various techniques [18,19,20,21] to pinpoint the most problematic and disruptive noise sources.

In this context, Intensive Care Units (ICUs) are environments where patients encounter a range of acoustic disturbances originating from various sources, including the medical equipment essential for patient care and the activities of healthcare personnel [22]. The impact of this noise extends to both patients and healthcare staff, affecting them psychologically and physiologically [22]. Research indicates that patients often recall noise as a prominent and sometimes distressing aspect of their ICU stay [23,24,25]. Several studies have identified noise as a major environmental hazard to human health, contributing to an increased risk of cardiovascular issues, high blood pressure, and stress. Sleep disruption has been found to be the most common negative health effect of prolonged noise exposure, as it can lead to serious problems such as delirium [26, 27]. Studies have also shown that noise can significantly impair the performance of medical staff, including reduced cognitive abilities, altered social behaviours, impaired sustained memory, impaired speech communication, and decreased attention and work skills [28,29,30,31]. These factors can compromise task execution and, consequently, increase the potential for errors and wrong decisions during working days [32, 33].

Over the years, with the advancement of technology and the addition of more equipment in ICUs, the levels of sensory stimuli in these environments have increased significantly, leading to higher noise levels [34]. Given the significant consequences of acoustic noise, the WHO recommends that noise levels in ICUs should not exceed 35 dBA during the day and 30 dBA at night, with no peaks exceeding 45 dBA [35]. However, the available evidence indicates that the noise levels measured in ICUs are much higher than recommended [36,37,38]. These extreme and unmanageable noise levels create an unfamiliar environment for patients requiring care. Consequently, reducing noise levels in hospitals, and in ICUs in particular, has become a pressing issue, and significant attention has been given to mitigation efforts. Several passive control programmes have been designed and implemented in an attempt to reduce noise levels, such as altering sound-absorbing materials or redesigning internal construction [39], using earplugs or headphones [40], and providing staff training for behaviour modification [41]. However, none has yet satisfied the recommended guidelines, as outcomes have shown that these programmes did not significantly reduce noise levels.

In many studies, monitoring and analysing sound levels have been conducted using high-precision sound level meters (SLMs) [42,43,44,45]. The study presented in [46] offered a graphic depiction of the typical "noisiness" map for the ICU layout, highlighting locations with high sound levels. The study emphasised the importance of pinpointing the main noise sources to gain a better understanding of the nature of these sounds, ultimately improving patient experiences and outcomes, and suggested the use of machine learning techniques to identify and categorise the various noise sources accurately. An Artificial Neural Network (ANN) was employed in [47] to characterise the types of sounds in a Neonatal Intensive Care Unit (NICU). The study conducted a short-term (24 h) noise measurement, with different sound level parameters recorded every second, and found that equipment alarms and staff conversations significantly influence the acoustic environment in the NICU. The lack of uniformity in the sound spectra, together with the simultaneity and concomitance of noise sources, hindered the unequivocal interpretation of some results of the classification model owing to the limited characteristics of its inputs. Other studies, however, have recently developed cost-effective IoT-based systems as an alternative approach. Marques et al. [48] proposed an IoT-based system linked to mobile computing technology. The system comprised an ESP8266 microcontroller and a Gravity Analogue Sound level meter for real-time noise monitoring and data collection, with the data stored in a SQL Server database via Web Services. The proposed system provides advanced data analysis and visualisation and is compatible with common domestic devices despite being designed using low-cost components. Jose et al. [49] introduced a low-cost system deployed in Linares, Spain, for continuously monitoring real-time and spatial noise data. The results indicate its effectiveness in generating accurate noise maps despite being constructed from affordable components. Nevertheless, relying solely on sound pressure levels (SPLs) is insufficient to provide a comprehensive representation of the acoustic environment and accurately identify the characteristics and nature of the major noise sources.

Only a limited number of studies have investigated the sources of noise in ICUs, either by relying on surveys of staff and patients to identify and assess sources of disruptive noise [50,51,52] or by placing observers in the patient's area to document instances of disruptive sound [53,54,55,56]. The presence of a human observer can itself alter the acoustic environment by affecting the actions of nursing personnel, and such techniques can therefore introduce measurement biases. Very few studies have recorded audio events to determine noise sources in ICUs. For instance, Park et al. [57] conducted continuous soundscape recording for three days in a single room, but only the first 24 h of recording were used for analysis; as a result, the depiction of the sound environment was limited and inaccurate. Additionally, it required a team of six research assistants to listen to the audio files for manual segmentation and annotation, which took a total of 350 h to complete. The work in [58] presented preliminary results of an acoustic analysis of sounds in a NICU, including the establishment of an annotated audio database, and focussed on automatic vocalisation detection using Gaussian Mixture Models (GMM) with the produced database. Although ten acoustic settings were used to generate audio datasets, only 18 files were utilised, and the paper acknowledged the need for further research, including a larger dataset and a more thorough analysis of the spectro-temporal characteristics of acoustic events. Later, Shield et al. [59] conducted a comprehensive noise survey of five general inpatient hospital wards in the UK, involving continuous noise monitoring over several days as well as recording short sound files whenever the maximum noise level exceeded a pre-determined threshold, set to 70 dB. Sources of noise were manually identified by listening to the sound files. The study presented in [60] characterised an ICU soundscape. The recorded acoustic data were processed, revealing ten noise sources, with nurses' conversations and the alarms produced by EKG monitors and ventilators identified as the main sound sources; however, there is a lack of information about whether the identification was performed manually or using another detection technique. He et al. [61] used a WENSN® WS1361 meter to measure noise levels and sound events in an ICU ward. The study recognised twenty noise sources, although the methods used to identify these sources were not reported, and the authors noted that it was challenging to detect noise sources independently because several noise sources were often present at once. The high level of systematic effort required, including the allocation of significant human resources for manual segmentation, annotation, and identification, makes replicating this methodology challenging when dealing with a large dataset.

Given these considerations, there is a compelling need for a more intelligent approach to identifying ICU noise sources, motivated by the acknowledged challenges and gaps in scientific research. This paper aims to pinpoint the major sources of acoustic noise and quantify their contribution to the acoustic environment through the classification of recorded events with the help of Deep Neural Networks (DNNs). To accomplish this, an accurate, intelligent, and cost-effective IoT-based measurement system was developed and deployed in three different hospitals in Babel, Iraq. This system not only measures and records key environmental parameters within the ICU, including SPLs, temperature, humidity, and light intensity, but also automatically captures sounds exceeding predefined SPL thresholds. An Acoustic Event Classification (AEC) approach, integrated with YAMNet, a sound-trained Convolutional Neural Network (CNN), was then applied to detect and classify the recorded sound events. Further clustering analysis was then implemented on the classified data to separate the acoustic sources. Consequently, this approach can help to avoid such noisy scenarios, where feasible, leading to reduced noise levels, ultimately avoiding risks and potentially improving the well-being of ICU patients.

The paper is organised as follows: Sect. 2 describes the methodological approach used to tackle the ICU noise problem. Section 3 presents the results of the AEC. Section 4 discusses the findings, research challenges, and limitations. Finally, Sect. 5 provides some conclusions with final remarks and identifies the potential for future work.

2 Methods

2.1 Description of Measurement System

The measurement system developed for this study serves as a compact and portable SLM with dimensions of 15 × 10 × 6 cm. In addition to measuring the SPL, expressed as the equivalent continuous level \(L_{A\text{eq}}\), it measures other environmental parameters, including temperature, humidity, and light intensity, at a high resolution of 4 measurements per second. A key feature of the system, crucial for the intended measurement campaign, is its ability to automatically record sound sequences at a sampling frequency (\(F_{s}\)) of 16 kHz once the monitored SPL exceeds a preset threshold (\(\text{dB}_{\text{thr}}\)). The SPL threshold and the recording length can be easily adjusted. The recorded segments are stored locally on an SD memory card, while the other measured data are stored locally or transmitted over the Internet and saved on a data management platform. This approach enables remote and continuous monitoring of the ICU soundscape over an extended period. The developed system comprises a MEMS microphone with a frequency response conforming to IEC 61672–1 Class 1, mounted on a pre-amplifier. Prior to deployment, all components were checked and calibrated using a B&K 4231 calibrator. Thanks to its low cost (£55) and lightweight design, the system can be readily replicated and deployed in a large number of ICUs.
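A minimal sketch of the threshold-triggered recording logic is given below. It assumes a Python host with the sounddevice and soundfile packages and a fixed calibration offset (CAL_DB, hypothetical); the deployed device realises the same behaviour in its own firmware, and no A-weighting is applied in this simplified version.

```python
# Sketch only: poll the microphone, estimate SPL, and record a fixed-length
# segment whenever the preset threshold is exceeded. CAL_DB is a hypothetical
# calibration constant mapping digital full scale to dB SPL.
import numpy as np
import sounddevice as sd
import soundfile as sf

FS = 16_000          # sampling frequency (Hz), as used by the system
DB_THR = 45.0        # SPL trigger threshold (dB)
REC_LEN_S = 10       # length of each recorded segment (s)
CAL_DB = 120.0       # hypothetical calibration offset

def spl_estimate(block: np.ndarray) -> float:
    """Rough SPL estimate from an audio block (no frequency weighting)."""
    rms = np.sqrt(np.mean(block ** 2)) + 1e-12
    return 20 * np.log10(rms) + CAL_DB

def monitor(polls_per_second: int = 4) -> None:
    """Poll the input 4 times per second and record when SPL > threshold."""
    block_len = FS // polls_per_second
    with sd.InputStream(samplerate=FS, channels=1, blocksize=block_len) as stream:
        while True:
            block, _ = stream.read(block_len)
            if spl_estimate(block[:, 0]) > DB_THR:
                segment, _ = stream.read(FS * REC_LEN_S)
                sf.write("segment.wav", segment, FS)  # store the triggered segment
```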

2.2 ICUs General Environment, System Deployment, and Measurements

For this study, the system was installed in two Respiratory Intensive Care Units (RICUs) and one Neonatal Intensive Care Unit (NICU) across three different hospitals in Babel, Iraq. The system was mounted on baskets, approximately 1 m away from the patient bed, without interfering with patient-care activities, as depicted in Fig. 1. Each patient bed is equipped with various life-supporting devices positioned behind and beside the bed. Additionally, the units contain equipment, such as air conditioning and ventilation systems, that contributes to noise levels. The number of staff members, including nurses and physicians, in the ICUs varies daily depending on the number of patients and their specific needs, though shift handovers remain consistent. Throughout the recording periods in all ICUs, staff reported typical patient-care activities, such as patient monitoring and data recording. They also noted that the time between a patient's discharge and the admission of a new patient was typically less than half an hour, and that the units were consistently fully occupied. A summary of the ICUs' general characteristics and measurements is tabulated in Table 1.

Fig. 1
figure 1

Deployment of the developed system: a enclosure box, and b deployed in NICU

Table 1 ICUs general characteristics and measurements

The recording of audio segments was triggered by the excessive SPLs constantly measured in the ICUs. Each recorded segment in the dataset is 10 s long. Figure 2 illustrates the raw SPL data recorded in all ICUs over four days. To improve readability, the data were downsampled by a factor of 240, resulting in one sample per minute.

Fig. 2
figure 2

Raw recorded SPLs in dB conducted over 4 days in three Iraqi Hospitals: a RICU1, b RICU2, and c NICU

It is evident from Fig. 2 that the measured SPLs exceeded acceptable limits for a significant portion of the measurement duration. Notably, RICU2 exhibited considerably higher noise levels compared to the other ICUs, with the minimum recorded SPL reaching 41 dB and the maximum nearing 82 dB. Moreover, the number of recorded segments in RICU2 was greater, totalling 14,630, despite the shorter recording period of four days. However, it is worth mentioning that the NICU, despite its open bay structure, exhibited the lowest overall noise levels.

2.3 The Proposed Approach (AEC and Clustering) for Data Analysis

The YAMNet model has recently been developed for sound event detection. It consists of 28 learning layers (27 CNN layers and a fully connected layer), employs the MobileNet-v1 architecture, and encompasses 3.7 million parameters. It was extensively trained on Google datasets with the same characteristics as the collected datasets. The model's training data are drawn from the expansive AudioSet, which comprises 521 audio classes and stands as the most extensive dataset for audio deep learning [62, 63]. The input features to this network are image-based representations, specifically log Mel-spectrograms, which use a Mel scale to mimic human auditory perception [64].
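One common way to access a pretrained YAMNet is through TensorFlow Hub; the sketch below runs inference on a mono 16 kHz waveform and retrieves the AudioSet class names shipped with the model. This is an illustration of the model interface, not necessarily the exact wrapper used in this study.

```python
# Sketch: load YAMNet from TensorFlow Hub and return the top class per frame.
import csv
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

# The 521 AudioSet class names are shipped with the model as a CSV file.
class_map_path = yamnet.class_map_path().numpy().decode("utf-8")
with tf.io.gfile.GFile(class_map_path) as f:
    class_names = [row["display_name"] for row in csv.DictReader(f)]

def classify(waveform: np.ndarray) -> list[str]:
    """Return the top class label for each analysis frame of the input."""
    waveform = waveform.astype(np.float32)          # mono, 16 kHz, in [-1, 1]
    scores, embeddings, log_mel = yamnet(waveform)  # scores: [frames, 521]
    return [class_names[i] for i in np.argmax(scores.numpy(), axis=1)]
```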

The framework of the overall system model used in this work, including the event classification and clustering processes, is depicted in Fig. 3, in which YAMNet is integrated. The pre-processing block comprises two steps: DC offset removal and normalisation of the peak amplitude to fall within the range −1 to 1. The process of obtaining Mel-spectrograms involves segmenting each audio segment in the datasets into 0.98 s segments with an overlap of 0.8575 s. The one-sided Short Time Fourier Transform (STFT) is then computed for each segment by applying a 25 ms periodic Hanning window, with an overlap size of 15 ms and a hop size of 10 ms. A 512-point Fast Fourier Transform (FFT) is then applied, and the obtained magnitude spectra are passed through 64 Mel frequency filter banks spanning the range 125–7500 Hz. Finally, the logarithmic scale is applied to obtain the log Mel-spectrograms. Each Mel-spectrogram is represented by a 96 × 64 matrix, where 96 and 64 correspond to the number of 10 ms frame spectra and the total number of Mel bands, respectively. The number of Mel-spectrograms, or L arrays, depends on the input length and the overlap percentage in the segmentation process. These multi-dimensional Mel-spectrograms are now a valid input for the YAMNet model.
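The following sketch reproduces this front end with the parameters stated above (25 ms Hann window, 10 ms hop, 512-point FFT, 64 Mel bands over 125–7500 Hz, 96-frame patches with 0.8575 s overlap). It assumes librosa and a logarithm offset of 0.001; the exact implementation used in this work may differ.

```python
# Sketch of the pre-processing and log Mel-spectrogram patch extraction.
import numpy as np
import librosa

FS = 16_000
WIN = int(0.025 * FS)      # 25 ms Hann window -> 400 samples
HOP = int(0.010 * FS)      # 10 ms hop -> 160 samples
N_FFT = 512
N_MELS = 64
PATCH_FRAMES = 96          # 96 x 10 ms frames per ~0.98 s patch
PATCH_HOP = int(round((0.98 - 0.8575) / 0.010))   # frames between patch starts

def log_mel_patches(x: np.ndarray) -> np.ndarray:
    """Return an L x 96 x 64 array of log Mel patches (input >= 0.98 s assumed)."""
    x = x - np.mean(x)                                  # DC offset removal
    x = x / (np.max(np.abs(x)) + 1e-12)                 # peak normalisation to [-1, 1]
    mel = librosa.feature.melspectrogram(
        y=x, sr=FS, n_fft=N_FFT, win_length=WIN, hop_length=HOP,
        window="hann", n_mels=N_MELS, fmin=125, fmax=7500, power=2.0)
    log_mel = np.log(mel + 1e-3).T                      # shape: frames x 64
    starts = range(0, len(log_mel) - PATCH_FRAMES + 1, PATCH_HOP)
    return np.stack([log_mel[s:s + PATCH_FRAMES] for s in starts])
```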

Fig. 3
figure 3

Framework of the overall system model: a AEC, and b clustering process

The model output has the advantage of returning (i) the sounds identified over time in the audio input as string arrays, with the sound classes described by the AudioSet ontology listed in chronological order, and (ii) a k-by-2 matrix H of time stamps in seconds for the corresponding identified sounds. The number of rows k in H is the number of detected sound regions, and the start and finish times of the identified sound regions are listed in the first and second columns of H, respectively. These outcomes from (i) and (ii) enable simultaneous automatic segmentation and labelling of sound events, eliminating the need for additional research assistants to annotate them. In the post-processing step, the output returned by YAMNet is exploited by finding the indices of the time stamps and capturing the related sounds classified in the previous step. The last two blocks in Fig. 3b are repeated for all possible classification outcomes obtained in Fig. 3a, leading to acoustic source clustering.
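As an illustration of this post-processing step, the sketch below cuts the detected regions out of the waveform and groups them into one folder per class, as in Fig. 3b. The detector is assumed to return the class labels and the k-by-2 matrix H described above; the 16 kHz rate, file naming, and folder layout are illustrative assumptions only.

```python
# Sketch: slice detected regions from the waveform and write them to class folders.
import os
import numpy as np
import soundfile as sf

FS = 16_000

def cluster_segments(waveform: np.ndarray, labels: list[str], H: np.ndarray,
                     out_dir: str = "clusters") -> None:
    for i, (label, (t_start, t_stop)) in enumerate(zip(labels, H)):
        start, stop = int(t_start * FS), int(t_stop * FS)
        class_dir = os.path.join(out_dir, label.replace(" ", "_"))
        os.makedirs(class_dir, exist_ok=True)
        sf.write(os.path.join(class_dir, f"event_{i:04d}.wav"),
                 waveform[start:stop], FS)
```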

For acoustic detection, a minimum time interval of 0.2 s between adjacent regions of the same detected sound is set; regions separated by less than 0.2 s are merged. Additionally, the minimum duration for identified sound regions is set at 0.3 s, and regions shorter than this are discarded.
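A minimal sketch of this merge-and-filter rule is given below, assuming the detected regions arrive as (label, start, stop) tuples in seconds and in chronological order.

```python
# Sketch: merge same-class regions separated by < 0.2 s, drop regions < 0.3 s.
MIN_GAP_S = 0.2
MIN_LEN_S = 0.3

def merge_and_filter(regions):
    merged = []
    for label, start, stop in regions:
        if merged and merged[-1][0] == label and start - merged[-1][2] < MIN_GAP_S:
            merged[-1] = (label, merged[-1][1], stop)   # extend the previous region
        else:
            merged.append((label, start, stop))
    return [r for r in merged if r[2] - r[1] >= MIN_LEN_S]
```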

2.4 Ethical Considerations

Consent was initially sought and obtained from the heads of all ICUs in which the developed system was deployed. Additionally, reasonable steps were taken to ensure that staff and patients were appropriately informed upon admission to the ICU, explaining what data the system can record and how the recorded data would be used (thesis research). Oral consent was obtained, either from the patients directly if they were awake or on their behalf. Furthermore, it was made clear that participation in this activity was entirely voluntary and that individuals were free to withdraw at any time without facing any consequences. Choosing not to participate or deciding to withdraw would not have any impact on their career if they were healthcare workers or on the quality of care they received if they were patients.

The audio signals collected by the developed recording system may include speech; therefore, speech segments were the only anticipated ethical concern. All individuals present during the recording periods, including staff, patients, and any other people, remained anonymous, and no personal information or other related data about them was recorded or collected.

For confidentiality reasons, speech/non-speech discrimination was performed using Speech Activity Detection (SAD) to remove speech frames from the collected audio datasets, following the completion of the recording and before the main analysis was carried out. The process was carried out in the presence of professional medical staff at their computers. The DNN, YAMNet, was used to detect speech frames, which were subsequently eliminated while the rest of the audio data was preserved. Although the speech segments were removed, their overall number and total duration were determined and noted. It is also important to mention that the original datasets were entirely deleted upon completion of the SAD process by the professional medical staff, and the datasets resulting from the SAD process were the only ones handed over. These datasets are stored and secured in the Google Team Drive associated with our official university account.
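A minimal sketch of this SAD step is shown below, assuming YAMNet frame scores are available (see the inference sketch in Sect. 2.3): frames whose top class is "Speech" are dropped and the remaining audio is concatenated, while only the cumulative speech duration is retained. The frame timing and bookkeeping are simplified for illustration.

```python
# Sketch: drop frames classified as "Speech" and keep the rest of the audio.
import numpy as np

FS = 16_000
FRAME_S = 0.48          # approximate hop between scored YAMNet frames (s)

def remove_speech(waveform: np.ndarray, scores: np.ndarray,
                  class_names: list[str]) -> tuple[np.ndarray, float]:
    speech_idx = class_names.index("Speech")
    keep, speech_time = [], 0.0
    for k, frame_scores in enumerate(scores):
        start = int(k * FRAME_S * FS)
        stop = int((k + 1) * FRAME_S * FS)
        if np.argmax(frame_scores) == speech_idx:
            speech_time += FRAME_S                # note the duration, keep no audio
        else:
            keep.append(waveform[start:stop])
    cleaned = np.concatenate(keep) if keep else np.empty(0, dtype=waveform.dtype)
    return cleaned, speech_time
```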

Based on the procedures outlined above, this research adheres to the university's research ethics policy and maintains integrity by complying with all relevant laws, codes, guidance, policies, and procedures. The ethical considerations for the project design and data collection were reviewed by the XXX Faculty Ethics Committee, and Reference Number TECH2023-A.A-01 was granted.

3 Analysis of Recorded Datasets

In this section, the model presented in Fig. 3 was applied to the collected datasets to detect, classify, and cluster the main noise sources and to establish the contribution of each noise source. The authors spent considerable time checking individual .wav files in the obtained class folders, ensuring that audio files belonging to the same class were grouped into the correct class folder; any audio file that was clustered incorrectly was moved to the correct folder. As a result, distribution, correctly clustered, and final distribution columns are provided in the subsequent tables. The classification results were then discussed with the ICU medical personnel to identify the names of the noise sources in each ICU.

For a better understanding of the results presented in the rest of this section, it is helpful to define the abbreviations DST, which stands for the distribution of audio files, and CDL, which represents the cumulative duration length and shows the real-time contribution of the main acoustic sources. The range of intensity levels in dB was estimated by identifying the lowest and highest peak magnitudes in the spectra and computing \(20\log_{10}\!\left(\text{magnitude}/(20\times 10^{-6})\right)\). Furthermore, the AEC rate indicated in the last column of the subsequent tables was measured for each detected class individually. The AEC rate is the precision metric of the model output and is calculated in our scenario as \(\text{precision}_{\text{class}} = \text{correctly clustered}/\text{distribution}\).
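The short worked example below illustrates the two quantities defined above using made-up numbers (the magnitude and file counts are purely hypothetical, not values from the tables).

```python
# Worked illustration of the level estimate and the per-class AEC rate.
import numpy as np

P_REF = 20e-6                    # reference pressure, 20 micro-Pa

peak_magnitude = 0.5             # hypothetical spectral peak
level_db = 20 * np.log10(peak_magnitude / P_REF)        # ~88 dB

distribution = 1200              # hypothetical files assigned to one class
correctly_clustered = 1140
precision_per_class = correctly_clustered / distribution   # 0.95
```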

3.1 Results Based on Measurements in RICU1

The model efficiently identified and categorised different acoustic sources in RICU1. As indicated in Table 2, eight noise classes, including six medical devices, speech (conversations), and multiple devices working simultaneously, were found to be the main noise sources in RICU1. The resulting dataset consisted of 10,180 files and showed that speech was a significant contributor, as illustrated in the pie chart depicted in Fig. 4. The results also revealed that the patient monitor produces higher SPLs than the other medical devices. Figure 5 shows the time series of the extracted noise sources along with the matching Mel-spectrograms, demonstrating that sources 2, 3, 5, and 6 produce sounds with harmonic characteristics. However, sources 2 and 3 might be the most annoying and inconvenient since they have high-frequency tones exceeding 3 kHz.

Table 2 Distribution, real-time contribution, and estimated SPLs of the main acoustic sources in RICU1 as well as AEC accuracy rates
Fig. 4
figure 4

Percentage contribution of the main acoustic sources in the RICU1: a percentage of DST, and b percentage of CDL

Fig. 5
figure 5

Time series and Mel-spectrogram of the extracted noise sources found in RICU1: a–f medical devices

3.2 Results Based on Measurements in RICU2

The analysis of RICU2 revealed 15,732 audio files distributed in nine noise classes, including six medical devices, speech, mobile ringing, and concurrent operation of multiple devices, as depicted in Table 3. The pie chart plotted in Fig. 6 illustrates that the conversation and patient monitor were found to be the most dominant noise categories. Figure 7 displays the patterns of the extracted noise sources along with the related Mel-spectrograms.

Table 3 Distribution, real-time contribution, and estimated SPLs of the main acoustic sources in RICU2 as well as AEC accuracy rates
Fig. 6
figure 6

Percentage contribution of the main acoustic sources in the RICU2: a percentage of DST, and b percentage of CDL

Fig. 7
figure 7

Time series and Mel-spectrogram of the extracted noise sources found in RICU2: a–f medical devices

The acquired classes are slightly different, even though this ICU is of the same type as the previous one. New noise sources, such as oxygen and suction pumps, are present in this ICU, and they produce the highest range of sound levels. The Mel-spectrograms illustrate that sources 2 and 3 produce low-frequency signals, while sources 1 and 4 produce harmonic signals. The ventilator, source 5, might be the most annoying source as it generates frequency tones ranging from 400 Hz to 7.5 kHz. Source 6, which produces a low range of sound levels, sounds like a background alarm; according to the nursing staff, it is generated outside the ICU ward.

3.3 Results Based on Measurements in NICU

During the seven days of data collection in the NICU, a total of 7709 sound segments were captured; nevertheless, the clustering analysis revealed 8138 audio files produced by eleven noise sources. As illustrated in Table 4, conversation accounts for the largest proportion of segments, approximately 5225 files, followed by the patient monitor and incubator alarms as the second and third largest contributors, respectively.

Table 4 Distribution, real-time contribution, and estimated SPLs of the main acoustic sources in NICU as well as AEC accuracy rates

Figure 8 illustrates the proportional contribution of the primary sources of noise. The baby cry and the incubator alarm, which produce high ranges of sound levels, are new noise sources in this ICU. It is also worth noting that sources 5 and 6 are ventilator alarms produced when the \({\text{O}}_{2}\) pressure is low; they originate, however, from two ventilators made by different manufacturing companies (dmc). It can be noticed from Table 4 that the noise levels produced by the various sources, except the ventilator (dmc) and incubator alarms, are low compared to those of other ICU sources.

Fig. 8
figure 8

Percentage contribution of the main acoustic sources in the NICU: a percentage of DST, and b percentage of CDL

The time series of the extracted noise sources and the related Mel-spectrograms for the new NICU dataset are depicted in Fig. 9. Based on the Mel-spectrograms, the oxygen pump generates a frequency of around 1.46 kHz, while the suction pump produces a high frequency of about 3 kHz. Sources 1 and 4 in this ICU have characteristics similar to those of sources found in RICU1, indicating that they are related to the same devices and were probably made by the same manufacturing company. A multi-device source is also included, exhibiting a disrupted spectrogram.

Fig. 9
figure 9

Time series and Mel-spectrogram of the extracted noise sources found in NICU: a–h medical devices

4 Discussion

4.1 AEC Model Performance

According to the results presented in Tables 2, 3, and 4, the suggested framework demonstrated excellent performance in the acoustic event detection and classification of various noise sources in typical ICUs. The high performance achieved is attributed to the combination of resilience techniques employed during the pre-processing phase, including DC offset elimination and signal normalisation. Additionally, utilising the log Mel-spectrogram as an input feature provides several advantages, representing frequency components more distinctly and capturing essential acoustic characteristics effectively. The logarithmic scaling compresses the spectrogram's dynamic range, increasing the visibility of subtle details while preventing extreme values from dominating the representation. This compression is valuable for conserving memory and computational processing power when analysing massive datasets in real-time applications. Furthermore, logarithmic scaling is more compatible with the human auditory perception system, providing an additional advantage by emphasising the lower and mid-frequency components in the spectrograms. Moreover, logarithmic scaling helps separate significant signal information from background noise, given that the noise is compressed relative to the signal. Therefore, the model exhibits robustness against heavy background noise, which, in our case, is the baseline noise, as the ICUs are equipped with many sources that are inductive loads. These pre-processing techniques, with their advantages, directly impact the performance of the proposed framework, in particular the AEC rates. In addition, Figs. 5, 7, and 9 demonstrate that the noise sources essentially exhibit distinct time series patterns, producing unique corresponding Mel-spectrograms; consequently, the model can easily differentiate between these patterns, resulting in stronger performance.

The highest AEC rate was achieved when the model processed speech, highlighting its speech recognition capabilities, given that the YAMNet model was extensively trained on Google datasets, which include speech data. It was observed that the performance of the suggested model degraded when detected sounds of the same class had a larger separation between consecutive regions. For example, in RICU1, the model's recognition rate for Source 4, which has long-separated regions, was 91.05%. Conversely, performance improved when consecutive regions of the same detected sound had a shorter separation; this was observed for Source 6 in RICU1, whose signal pattern exhibited a very short separation between consecutive regions, resulting in a recognition rate of 100%. The worst performance was obtained when the input feature consisted of a continuous periodic single tone, as depicted in the Mel-spectrogram of Source 2 (the oxygen pump) found in RICU2 and the NICU. The model still achieved an adequate recognition rate, but lower than for the other input noise sources.

4.2 Analysed Results

This study confirms that ICUs are generally extremely noisy environments, in line with other published studies on this topic. As illustrated in Fig. 2, the SPLs in all monitored ICUs are higher than the advised values over most of the observed periods. The temporal patterns of noise levels vary across days and ICUs, pointing to an unstable and unpredictable acoustic environment. Compared to some previous works [65], this study proposes a framework for a more comprehensive understanding of the ICU environment and offers insight into the individual sound sources and their relative contribution to the noise in the ICUs. This information could potentially help in identifying the acoustic sources most amenable to modification and control.

The data analysis clearly shows the significant impact of staff and visitors' conversations, especially in RICU2 and the NICU. Tables 2, 3, and 4 indicate that conversations emerge as the most pervasive non-patient noise source, occurring frequently and having the longest cumulative duration in RICU1, RICU2, and the NICU. The medical equipment alarms also contributed significantly to the overall acoustic energy, with the ventilator identified as the top contributor. The Mel-spectrogram figures suggest that alarm sounds generated by life-supporting devices, with their multiple frequency components, may be more disturbing to patients than other types of sounds or speech.

Moreover, some alarms exhibit harmonic characteristics, with each alarm pulse frequently consisting of two or more tonal components. Such alarms are known to be more irritating than non-tonal ones at the same SPL. Other sources, such as suction pumps, had shorter overall contributed durations and less frequent occurrences, accounting for less than 1% of the total durations and occurrences. However, these short-term events are likely to disturb patients since they are directly related to patient care. The final noise class in each analysed ICU results from two or more sources operating simultaneously, which can undoubtedly bother patients due to the highly transient nature of unstructured tonal sounds.

This work also finds that some medical devices may produce a number of different sounds depending on the patient's condition. A ventilator can emit two different alarms: one generated during normal functioning and another triggered when the oxygen (\({\text{O}}_{2}\)) pressure is low. It is also important to note that ventilators and other equipment from different manufacturers may produce entirely different alarms despite being related to the same patient case; this variation was observed in the NICU.

Furthermore, ICU patients may find alarm sounds to be more frustrating or inconvenient compared to staff-generated noises, which are somewhat predictable in the presence of nursing staff. Staff-generated noise can, however, reassure patients that they are being taken care of and attended to, whereas medical device alarms, with their unclear meanings, can cause patients to worry more about their health condition.

4.3 Research Challenges with Limitations

ICUs are sensitive areas to which only professionals and nurses have access to provide patient care. Installing new devices in these wards requires both clinical and administrative approval, often with specific restrictions. Consequently, the developed measurement system was granted limited access to the ICUs for a set number of days with the consent of qualified staff. However, the impact of such a system on the behaviour of ICU staff and patients remains comparatively unknown.

Although consistent data-processing protocols were applied uniformly across all ICUs in this study, making a direct comparison between the three units or with existing literature proves challenging. This challenge stems from the varying durations of measurements and the highly variable acoustic environments within ICUs. These environments are influenced by a multitude of factors, including the layout of buildings, the ICU ward’s year of construction, the type of ICU, the number of patients and their conditions, the number of nursing staff, the quantity of medical support devices, and the caregiving protocols. These variables inevitably lead to discrepancies in the analysis outcomes of acoustic environments.

5 Conclusion and Future Work

This paper proposed a new, cost-effective, and efficient method for understanding and identifying the origins of noise within typical ICUs by implementing an Acoustic Event Classification (AEC) system supported by a Deep Learning (DL) model. The AEC model considerably reduces the extensive human effort required in traditional audio identification and classification approaches, which mostly rely on manual segmentation and annotation of audio events. Notably, the proposed AEC system demonstrated strong performance with low complexity and minimal time consumption, benefiting from not requiring the model to be trained from scratch.

An in-house Internet of Things (IoT)-based prototype system was initially developed, capable of measuring Sound Pressure Levels (SPLs) with the necessary resolution. It automatically recorded audio signals when SPLs exceeded the preset threshold of 45 dB, focussing on capturing the loudest and most significant sounds. Deployed in three hospitals in Babel, Iraq, the system showcased satisfactory performance, bridging the accuracy gap between low-cost and commercially expensive alternatives.

The acoustic source analysis presented in this work provides valuable insights into the origins of ICU noise and their contributions to the overall ICU soundscape. The analysis led to the creation of three efficient acoustic datasets for further intelligent inspection, offering a manageable number of audio segments relevant to ICU noise sources. The data analysis revealed that staff, visitors, and accompanying individuals were the primary contributors to the majority of sounds, constituting 58.28%, 52.90%, and 68.46% of the cumulative acoustic noise duration for RICU1, RICU2, and the NICU, respectively. Notably, relocating staff interactions within ICU wards could significantly reduce sound events, as conversations are a primary noise source.

In conclusion, the precise measurements and comprehensive intelligent analysis presented in this study shed light on the most disruptive noise sources in ICUs, opening the possibility for effective noise mitigation strategies. One idea to explore in the continuation of this work is the investigation of the Active Noise Control (ANC) method for reducing some of the noise present in the ICUs. The sounds captured in this work will be used to design and simulate an ANC system and explore its effectiveness for acoustic noise cancellation. Plans involve developing a real-time ANC system, followed by a clinical trial to assess its effectiveness in reducing noise levels in ICUs and ultimately improving patients' health outcomes.