Abstract
Endangered wildlife is protected in remote areas where human entry is restricted. However, intrusions by poachers and illegal loggers still occur because surveillance cannot cover such vast areas of land. Camera-based surveillance offers limited stealth because of the camera's restricted angle of view, and maintenance such as changing batteries and memory cards has been reported as troublesome by the Wildlife Conservation Society (WCS), Malaysia. In remote locations with no cellular network access, transmitting video data is difficult. Rangers need a system that lets them react to intrusions in time. This paper addresses the development of audio event recognition for intrusion detection based on vehicle engine sounds, wildlife environmental noise and chainsaw activity. Random Forest classification and Linear Predictive Coding feature extraction were employed. The training and testing data sets were obtained from the Wildlife Conservation Society Malaysia. The findings demonstrate accuracy rates of up to 86% for indicating an intrusion via audio recognition. This is a promising result for a preliminary study on classifying a real data set of intruders. Such intrusion detection will benefit wildlife protection agencies in maintaining security, as it consumes less power than the current camera-trapping surveillance technique.
Keywords
- Audio classification
- Feature extraction
- Linear Predictive Coding
- Random forest
- Wildlife Conservation Society
1 Introduction
The protection of wildlife is becoming increasingly important as wildlife populations shrink every year. This is evident from poaching activities [1]. Wildlife Department officers pursue people involved in poaching in Semporna, Sabah [1]. The decline can also be due to legal loggers who sometimes break the rules on entering wildlife zones [2]. To overcome this issue, the Sabah Forestry Department favors setting up a dedicated wildlife enforcement team, as intruders have become more daring in forests and reserve areas [3]. Even though protection initiatives have been undertaken, the numbers of wildlife species have declined, and some species that reside in the sanctuary are near extinction. Many approaches have been used to protect wildlife, and they have faced many challenges. A recent study stressed the urgent need for new or combined approaches to enable better protection against poaching in wildlife zones [4]. One of the challenges is implementing security in remote areas. It requires special equipment such as camera traps, which must be designed to endure rainforest conditions. Cameras require high maintenance because such locations have no power grid, so the cameras rely on batteries for surveillance, and they have a high probability of being spotted by intruders [5]. The equipment and cameras can be stolen or destroyed by trespassers (WCS, 2017). Camera-trap surveillance by the Wildlife Conservation Society (WCS), Malaysia required a large amount of memory for data storage and suffered from fog and blockages of the camera view. The stealth ability of the camera is low because of its limited angle of view, and maintenance such as changing batteries and memory cards is troublesome. In addition, transmitting video data from remote locations with no cellular network access is difficult.
A better solution is needed to overcome this issue while taking maintenance cost and security into account. Low investment in protection in Southeast Asia has been a reason for the lack of wildlife protection [6]. Thus, a solution with lower power consumption can be considered for less frequent maintenance and cost savings. Computing solutions have been proposed for detecting intruders, mainly through acoustic surveillance. They detect sound signals in the wildlife zone and classify them into two types: intrusion and non-intrusion. In this case, the Fast Fourier Transform (FFT) spectrum of the signal is used to extract information, and a similarity threshold is calculated to classify the intrusion.
Much research has focused on signal classification for several types of applications, including acoustic classification [7,8,9,10,11,12,13,14,15]. Machine learning methods are still used in acoustic signal solutions even though recent methods such as Convolutional Neural Networks and deep learning have been applied to acoustic classification [16, 17]. Quadratic discriminant analysis that classifies audio signals of passing vehicles using features based on short-time energy, average zero-crossing rate, and pitch frequency of periodic segments of signals has demonstrated acceptable accuracy compared with some methods in previous studies [18]. In addition, feature extraction is a task of prime importance for determining the features of audio signals. For instance, two feature extraction methods, one based on spectrum distribution and the second on the wavelet packet transform, have shown different performance with the K-nearest neighbor algorithm and a support vector machine classifier [19]. This paper aims to identify a technique that is efficient in identifying audio signals of intrusion events involving vehicle engines, environmental noise and chainsaw activity in wildlife reserves, and to evaluate audio intrusion detection using data sets from WCS Malaysia.
2 Related Work
2.1 Signal Processing
An audio recording is a waveform whose frequency range is audible to humans. Collections of audio signals are used to define the various data formats of stimulus audio signals [20]. Classification systems are used to outline the output signal and to analyze the stimulus signal and the audio signal, which helps capture any variation of speech [21]. Prior to classification, features are extracted from the audio signal to reduce the amount of data [22]. Feature extraction produces a numerical representation that can later be used to characterize a segment of an audio signal. The valuable features can be used in the design of the classifier [23]. Audio signal features that can be extracted include Mel Frequency Cepstral Coefficients (MFCC), pitch and sampling frequency [22].
MFCC represents audio signals in units of the Mel scale [24]. These features can be used for speech signals. MFCC is calculated by mapping the STFT coefficients of each frame into sets of 40 coefficients using a set of 40 weighting curves that simulate the frequency sensing capability of humans. The Mel scale relates the perceived frequency of a pure tone to its actual measured frequency.
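The relation between measured frequency and the Mel scale can be illustrated with a short sketch. The conversion formula below, m = 2595 log10(1 + f/700), is one commonly used variant and is an assumption here, since the text does not state which formula was used. The function names and the 0 to 22050 Hz range (half of a 44100 Hz sampling rate) are likewise illustrative choices.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (assumed formula, one common variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min_hz, f_max_hz, n_bands):
    """Return the n_bands + 2 edge frequencies (Hz) of filters that are
    evenly spaced on the Mel scale between f_min_hz and f_max_hz."""
    lo = hz_to_mel(f_min_hz)
    hi = hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


# 40 bands, matching the 40 weighting curves mentioned in the text.
edges = mel_band_edges(0.0, 22050.0, 40)
```

Because the spacing is uniform in mels, the resulting bands are narrow at low frequencies and wide at high frequencies, which mirrors the frequency resolution of human hearing.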
Pitch determination is important for speech transformation algorithms [25]. Pitch is the quality of a sound that correlates mainly with the rate of the vibration generating it, that is, the lowness or highness of the tone. The sound that comes from the vocal cords starts at the larynx and ends at the mouth. When unvoiced sounds are produced, the vocal cords are open and do not vibrate, whereas when voiced sounds are produced, the vocal cords vibrate and generate pulses known as glottal pulses [24].
2.2 Feature Extraction
Linear Predictive Coding (LPC) is one of the techniques used in audio signal processing and speech processing. It is frequently used to extract the spectral envelope of a digital audio signal in a compact form by applying the information of a linear predictive model. LPC provides very accurate speech parameter estimates for speech analysis [25]. The LPC coefficient representation is normally used to extract features that take account of the spectral envelope of signals in analog format [26]. Linear prediction is a mathematical operation in which the upcoming values of a discrete-time signal are estimated as a linear function of previous samples. LPC is known as a subset of filter theory in digital signal processing. LPC applies a mathematical operation such as the autocorrelation method of autoregressive modeling to find the filter coefficients. LPC feature extraction is quite sufficient for acoustic event detection tasks.
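The autocorrelation method mentioned above can be sketched in a few lines. The following is a minimal illustration using the Levinson-Durbin recursion, not the implementation used in this work; the function names are hypothetical. The coefficients follow the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, so the predicted sample is -(a1 x[n-1] + ... + ap x[n-p]).

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Returns (a, err), where a = [1, a1, ..., a_order] and err is the
    final prediction error energy. Solved by Levinson-Durbin recursion.
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:
            break  # signal is perfectly predictable (or all zeros)
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= (1.0 - k * k)
    return a, err


# Illustrative input: a decaying exponential, x[n] = 0.9 * x[n-1].
signal = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(signal, 2)
```

For this synthetic input the first coefficient comes out close to -0.9 and the second close to zero, as expected for a first-order process. With `order=10`, the ten values `coeffs[1:]` would form a ten-element feature vector; whether this matches the exact feature layout used in the experiments is an assumption.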
Feature selection is important for obtaining the optimal values from a set of features [27]. Selecting features from a large set of available features allows a more scalable approach. These features are then used to determine the nature of the audio signal or for classification purposes. Selection is used to choose the optimal values that maintain accuracy and performance while minimizing computational cost. If no optimal features are selected, accuracy is drastically affected and more computation is required [28]. Reducing the number of features can improve prediction accuracy and may be a necessary, embedded step of the prediction algorithm [29].
2.3 Random Forest Algorithm
Random forests are a type of ensemble method that makes predictions by averaging over the predictions of several independent base models [30]. The independent model is a tree, as many trees make up a forest [31]. Random forests are built by combining the predictions of trees that are trained separately [32]. The construction of a random tree involves three choices [33]:
- Method for splitting the leaves.
- Type of predictor to use in each leaf.
- Method for injecting randomness into the trees.
The trees in a random forest are randomized base regression trees. Their combination forms an aggregated regression estimate at the end [34]. The ensemble size, or the number of trees generated by the random forest algorithm, is an important factor to consider, as its effect differs across situations [35]. In past implementations of the random forest algorithm, the ensemble size has had a major effect on accuracy. A Bag of Features is the input data for predictions [36]. In this case, ensemble sizes show slightly better accuracy when the number of trees is set to a large number [37].
3 Development of an Audio Event Recognition for Intrusion Detection
3.1 System Architecture
Development of the audio event recognition system for intrusion detection starts with identifying the system architecture. Figure 1 shows the system architecture and the main components of the system in block diagram form. The system should be able to classify the audio as intrusion or non-intrusion so that accurate intrusion alarms notify rangers. Figure 2 shows the system flow diagram, consisting of a loop of real-time audio recording and classification.
3.2 Data Acquisition and Preparation
This section explains the data processing and feature extraction processes. A dataset of recordings was provided by WCS Malaysia. The recordings consist of 60 s of ambient audio of the rainforest environment and of a vehicle engine revving towards the recording unit in the rainforest.
Since the raw data are unstructured and unsuitable for machine learning, they must be converted into a standard form so that the system can learn from this source. A standardized form has been formulated to make the problem more tractable. The parameters are mono-channel waveform audio files, 5 s in duration, sampled at 44100 Hz. Two 5 s segments from the raw audio files are combined using Sony Vegas, an application for audio and video manipulation, and resynthesized into training data. Independent audio files of vehicle engines and the rainforest background environment are overlapped in various combinations, as described in the scenarios below. The vehicle audio level is lowered to simulate various distances between the vehicle and the device. To produce a long-distance scenario, the vehicle audio is reduced by 5 dB up to 20 dB. The composed audio is then verified by human listening to validate it in terms of audibility and classification. Figs. 3, 4, 5 and 6 visualize the 4 scenarios of resynthesizing 2 layers of audio signals, where the upper layer is the natural environment and the lower layer is the vehicle engine audio segment.
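The effect of lowering the vehicle layer by a given number of decibels before overlapping it with the background can be sketched as follows. This is an illustration only: the layering in this work was done in an audio editor, and the function names and the synthetic tones below are hypothetical. An amplitude reduction of d dB corresponds to multiplying the samples by 10^(-d/20).

```python
import math


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix_with_attenuation(background, vehicle, attenuation_db):
    """Overlay `vehicle` on `background` after lowering the vehicle layer
    by `attenuation_db` decibels. Both inputs are equal-length sample lists.
    Clipping is not handled in this sketch."""
    if len(background) != len(vehicle):
        raise ValueError("layers must have the same number of samples")
    gain = db_to_gain(-attenuation_db)
    return [b + gain * v for b, v in zip(background, vehicle)]


# Synthetic stand-ins for the two layers (assumed, not the WCS recordings).
fs = 44100
n_samples = 1000
background = [0.1 * math.sin(2 * math.pi * 3000 * n / fs) for n in range(n_samples)]
vehicle = [0.8 * math.sin(2 * math.pi * 120 * n / fs) for n in range(n_samples)]

# One mix per attenuation step in the 5 dB to 20 dB range.
mixes = {db: mix_with_attenuation(background, vehicle, db) for db in (5, 10, 15, 20)}
```

At 20 dB of attenuation the vehicle layer is scaled to one tenth of its original amplitude, which gives a sense of how faint the most distant scenario is relative to the background.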
The resynthesized audio files created as training data are divided into three separate audio events. The acquired recordings are edited using software to extract various five-second segments of audio indicating vehicle or chainsaw activity and typical rainforest conditions. Data on vehicle audio activity consist of 4 × 4 vehicles moving. Since machine learning requires data in numerical form, the training audio data are not yet ready for modelling. The next step is to extract the LPC features from the audio files created earlier. Feature extraction from the waveform audio files is done using the LPC function of the MATLAB R2017b digital signal processing toolbox.
4 Results and Discussion
4.1 Audio Data Analysis Using Welch Power Spectral Density Estimate
To examine the waveform audio files further, they are converted from the time domain to the frequency domain. Using the Welch Power Spectral Density Estimate function in MATLAB R2017b, Figs. 7a–f show the different scenarios with the audio represented as power spectral density estimate graphs.
The composition is constructed from two environmental audio layers overlapped with the engine activity attenuated by 20 dB. This audio file was validated by human listening, and the listeners reported no presence of vehicle activity. This shows that even humans cannot hear the vehicle at this level. This finding suggests that machines are capable of performing surveillance accurately.
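The idea behind a Welch power spectral density estimate can be sketched in plain Python: the signal is cut into overlapping segments, each segment is windowed and transformed, and the resulting periodograms are averaged. The sketch below is an illustration under stated assumptions (a periodic Hann window, 50% overlap, no detrending, a direct DFT rather than an FFT) and is not the MATLAB function used for Figs. 7a–f; the function names and the test tone are hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def welch_psd(x, fs, nperseg, overlap=0.5):
    """One-sided Welch PSD estimate. Returns (freqs_hz, psd)."""
    step = max(1, int(nperseg * (1.0 - overlap)))
    window = hann(nperseg)
    scale = 1.0 / (fs * sum(w * w for w in window))
    n_bins = nperseg // 2 + 1
    psd = [0.0] * n_bins
    n_segments = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = [x[start + i] * window[i] for i in range(nperseg)]
        for k in range(n_bins):
            # Direct DFT of the windowed segment at bin k.
            coef = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / nperseg)
                       for i in range(nperseg))
            psd[k] += abs(coef) ** 2 * scale
        n_segments += 1
    if n_segments == 0:
        raise ValueError("signal is shorter than one segment")
    psd = [p / n_segments for p in psd]
    # Fold negative frequencies in: double every bin except DC and Nyquist.
    last = n_bins - 1 if nperseg % 2 == 0 else n_bins
    for k in range(1, last):
        psd[k] *= 2.0
    freqs = [k * fs / nperseg for k in range(n_bins)]
    return freqs, psd


# Small synthetic example: an 8 Hz tone sampled at 64 Hz.
fs = 64.0
tone = [math.sin(2.0 * math.pi * 8.0 * n / fs) for n in range(256)]
freqs, psd = welch_psd(tone, fs, nperseg=32)
peak_freq = freqs[psd.index(max(psd))]
```

In this example the largest PSD value falls in the 8 Hz bin. For real 44100 Hz recordings an FFT-based library routine would be far faster than the direct DFT used here.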
4.2 Results of a Random Forest Simulation
The random forest simulation used “sklearn”, a Python machine learning library, and “Graphviz”, a visualization library, to create the decision trees. The simulation produces 4 trees created from several subsets of the entire dataset. The Gini index or entropy is normally used to create decision trees on each subset with random parameters. All 4 trees are tested with the same input data to find the output on which most trees agree. Random forest tree generation is a series of random selections from the main training dataset into smaller subsets that consist of evenly classed data [27]. In this case, the dataset is broken up into 2 subsets, and each subset is used to generate trees with the Gini index and entropy methods. This indicates that an ensemble of 4 trees can be used for prediction with the random forest technique. Figure 8 displays the random forest dataset selection process and tree generation process.
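The two splitting criteria named above can be written out directly. The sketch below is a minimal illustration of how the Gini index and entropy score a set of class labels and a candidate split; it is not the library's internal implementation, and the function names and example labels are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of samples belonging to each class present in `labels`."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_index(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    if not labels:
        return 0.0
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node."""
    if not labels:
        return 0.0
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def split_impurity(left, right, criterion):
    """Size-weighted impurity of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)


# Hypothetical labels at a node holding two classes in equal numbers.
node = ["vehicle", "vehicle", "nature", "nature"]
node_gini = gini_index(node)
node_entropy = entropy(node)

# A split that separates the classes perfectly has zero impurity.
perfect = split_impurity(["vehicle", "vehicle"], ["nature", "nature"], gini_index)
```

A tree builder evaluates candidate thresholds on the features and keeps the split with the lowest weighted impurity, which is why trees grown with the two criteria on the same subset can differ.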
Test sets A, B and C are features extracted from audio of the vehicle, nature and chainsaw, respectively. Variables L1, L2, L3, L4, L5, L6, L7, L8, L9 and L10 are the LPC features extracted from the audio files. Table 1 shows the test inputs for the experiment. Figures 9 and 10 show examples of the generated and visualized trees.
Each of test sets A, B and C is tested in all 4 generated trees. The majority class is the result that occurs most often among the tree results. Table 2 shows the results for each tree and test set. The results show that although individual trees can produce false results, the whole ensemble allows a better interpretation of the overall prediction. This shows that a cumulative majority result helps avoid false positives.
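The majority decision over the tree outputs can be sketched as below. The predictions shown are made-up values for illustration and are not taken from Table 2; the function name is hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return (winning_label, share_of_votes) for a list of per-tree labels."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)


# Hypothetical outputs of 4 trees for one test input: one tree disagrees.
predictions = ["vehicle", "vehicle", "nature", "vehicle"]
label, share = majority_vote(predictions)
```

Here a single wrong tree is outvoted by the other three. With an even number of trees a 2 to 2 tie is possible; in this sketch the label that appears first in the list wins such a tie, so an odd ensemble size or an explicit tie rule may be preferable in practice.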
Table 3 shows the results of the MATLAB 2017b TreeBagger classifier in the Classification Learner App and of a series of tests done on the WEKA platform. Both platforms show an average of 86% positive prediction based on the 10 LPC feature variables. The results obtained are promising, given that the training data are limited.
The Classification Learner App in MATLAB 2017b allows many classifiers to be run. It is found that the Linear Discriminant method is more accurate in predicting from LPC features of audio files that contain events such as vehicles, chainsaws and natural acoustic events. Basic decision tree results may differ based on the maximum number of splits, which can be controlled to produce diverse results. The performance of each type of tree is assessed on the entire data set. A Fine tree is defined by increasing the maximum number of splits allowed in the generation process. A Medium tree lies between a fine tree and a coarse tree, with a moderate maximum number of splits allowed. A Coarse tree allows a low total number of splits. Table 4 shows the results of all basic decision trees generated with their respective parameters.
5 Conclusions
The Random Forest technique with Linear Predictive Coding feature extraction has been found to be efficient. The combination of linear predictive coding feature extraction and random forest classification was found to be the best combination in comparison with past studies. However, the current study achieved only 86% accuracy. This is believed to be connected to the variance and amount of data collected for training the model. Thus, it can be concluded that implementing random forest requires a sizeable data set for training to achieve better results. LPC extraction and classification of audio signals require very little computing power. In future work, other techniques such as deep learning and different types of signal datasets can be evaluated for a better solution.
References
Wildlife.gov.my: Latar Belakang PERHILITAN. http://www.wildlife.gov.my/index.php/2016-04-11-03-50-17/2016-04-11-03-57-37/latar-belakang. Accessed 30 Apr 2018
Pei, L.G.: Southeast Asia marks progress in combating illegal timber trade. http://www.flegt.org/news/content/viewItem/southeast-asia-marks-progress-in-combating-illegal-timber-trade/04-01-2017/75. Accessed 30 Apr 2018
Inus, K.: Special armed wildlife enforcement team to be set up to counter poachers, 05 November 2017. https://www.nst.com.my/news/nation/2017/10/294584/special-armed-wildlife-enforcement-team-be-set-counter-poachers. Accessed 30 June 2018
Kamminga, J., Ayele, E., Meratnia, N., Havinga, P.: Poaching detection technologies—a survey. Sensors 18(5), 1474 (2018)
Ariffin, M.: Enforcement against wildlife crimes in west Malaysia: the challenges. J. Sustain. Sci. Manag. 10(1), 19–26 (2015)
Davis, D., Lisiewski, B.: U.S. Patent Application No. 15/296,136 (2018)
Davis, E.: New Study Shows Over a Third of Protected Areas Surveyed are Severely at Risk of Losing Tigers, 04 April (2018). https://www.worldwildlife.org/press-releases/new-study-shows-over-a-third-of-protected-areas-surveyed-are-severely-at-risk-of-losing-tigers. Accessed 30 June 2018
Mac Aodha, O., et al.: Bat detective—deep learning tools for bat acoustic signal detection. PLoS computational Biol. 14(3), e1005995 (2018)
Maijala, P., Shuyang, Z., Heittola, T., Virtanen, T.: Environmental noise monitoring using source classification in sensors. Appl. Acoust. 129, 258–267 (2018)
Zhu, B., Xu, K., Wang, D., Zhang, L., Li, B., Peng, Y.: Environmental Sound Classification Based on Multi-temporal Resolution CNN Network Combining with Multi-level Features. arXiv preprint arXiv:1805.09752 (2018)
Valada, A., Spinello, L., Burgard, W.: Deep feature learning for acoustics-based terrain classification. In: Bicchi, A., Burgard, W. (eds.) Robotics Research. SPAR, vol. 3, pp. 21–37. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-60916-4_2
Heittola, T., Çakır, E., Virtanen, T.: The machine learning approach for analysis of sound scenes and events. In: Virtanen, T., Plumbley, M., Ellis, D. (eds.) Computational Analysis of Sound Scenes and Events, pp. 13–40. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-63450-0_2
Hamzah, R., Jamil, N., Seman, N., Ardi, N., Doraisamy, S.C.: Impact of acoustical voice activity detection on spontaneous filled pause classification. In: Open Systems (ICOS), pp. 1–6. IEEE (2014)
Seman, N., Roslan, R., Jamil, N., Ardi, N.: Bimodality streams integration for audio-visual speech recognition systems. In: Abraham, A., Han, S.Y., Al-Sharhan, S.A., Liu, H. (eds.) Hybrid Intelligent Systems. AISC, vol. 420, pp. 127–139. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27221-4_11
Seman, N., Jusoff, K.: Acoustic pronunciation variations modeling for standard Malay speech recognition. Comput. Inf. Sci. 1(4), 112 (2008)
Dlir, A., Beheshti, A.A., Masoom, M.H.: Classification of vehicles based on audio signals using quadratic discriminant analysis and high energy feature vectors. arXiv preprint arXiv:1804.01212 (2018)
Aljaafreh, A., Dong, L.: An evaluation of feature extraction methods for vehicle classification based on acoustic signals. In: 2010 International Conference on Networking, Sensing and Control (ICNSC), pp. 570–575. IEEE (2010)
Baelde, M., Biernacki, C., Greff, R.: A mixture model-based real-time audio sources classification method. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2427–2431. IEEE (2017)
Dilber, D.: Feature Selection and Extraction of Audio, pp. 3148–3155 (2016). https://doi.org/10.15680/IJIRSET.2016.0503064. Accessed 30 Apr 2018
Xia, X., Togneri, R., Sokel, F., Huang, D.: Random forest classification based acoustic event detection. In: 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 163–168. IEEE (2017)
Lu, L., Jiang, H., Zhang, H.: A robust audio classification and segmentation method. In: Proceedings of the Ninth ACM International Conference on Multimedia, pp. 203–211. ACM (2001)
Anselam, A.S., Pillai, S.S.: Performance evaluation of code excited linear prediction speech coders at various bit rates. In: 2014 International Conference on Computation of Power, Energy, Information and Communication (ICCPEIC), April 2014, pp. 93–98. IEEE (2014)
Chamoli, A., Semwal, A., Saikia, N.: Detection of emotion in analysis of speech using linear predictive coding techniques (LPC). In: 2017 International Conference on Inventive Systems and Control (ICISC), pp. 1–4. IEEE (2017)
Grama, L., Buhuş, E.R., Rusu, C.: Acoustic classification using linear predictive coding for wildlife detection systems. In: 2017 International Symposium on Signals, Circuits and Systems (ISSCS), pp. 1–4. IEEE (2017)
Homburg, H., Mierswa, I., Möller, B., Morik, K., Wurst, M.: A benchmark dataset for audio classification and clustering. In: ISMIR, September 2005, vol. 2005, pp. 528–531 (2005)
Jaiswal, J.K., Samikannu, R.: Application of random forest algorithm on feature subset selection and classification and regression. In: 2017 World Congress on Computing and Communication Technologies (WCCCT), pp. 65–68. IEEE (2017)
Kumar, S.S., Shaikh, T.: Empirical evaluation of the performance of feature selection approaches on random forest. In: 2017 International Conference on Computer and Applications (ICCA), pp. 227–231. IEEE (2017)
Tang, Y., Liu, Q., Wang, W., Cox, T.J.: A non-intrusive method for estimating binaural speech intelligibility from noise-corrupted signals captured by a pair of microphones. Speech Commun. 96, 116–128 (2018)
Balili, C.C., Sobrepena, M.C.C., Naval, P.C.: Classification of heart sounds using discrete and continuous wavelet transform and random forests. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp. 655–659. IEEE (2015)
Denil, M., Matheson, D., De Freitas, N.: Narrowing the gap: random forests in theory and in practice. In: International Conference on Machine Learning, January 2014, pp. 665–673 (2014)
Behnamian, A., Millard, K., Banks, S.N., White, L., Richardson, M., Pasher, J.: A systematic approach for variable selection with random forests: achieving stable variable importance values. IEEE Geosci. Remote Sens. Lett. 14(11), 1988–1992 (2017)
Biau, G.L., Curie, M., Bo, P.V.I., Cedex, P., Yu, B.: Analysis of a random forests model. J. Mach. Learn. Res. 13, 1063–1095 (2012)
Phan, H., et al.: Random regression forests for acoustic event detection and classification. IEEE/ACM Trans. Audio Speech Lang. Process. 23(1), 20–31 (2015)
Xu, Y.: Research and implementation of improved random forest algorithm based on Spark. In: 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), pp. 499–503. IEEE (2017)
Zhang, Z., Li, Y., Zhu, X., Lin, Y.: A method for modulation recognition based on entropy features and random forest. In: IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), pp. 243–246. IEEE (2017)
Abuella, M., Chowdhury, B.: Random forest ensemble of support vector regression models for solar power forecasting. In: Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), pp. 1–5. IEEE (2017)
Manzoor, M.A., Morgan, Y.: Vehicle make and model recognition using random forest classification for intelligent transportation systems. In: 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), pp. 148–154. IEEE (2018)
Acknowledgement
The authors express their deep appreciation to the Ministry of Education, Malaysia for the grant 600-RMI/FRGS 5/3 (0002/2016), and to the Institute of Research and Innovation, Universiti Teknologi MARA and the Information System Department, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia for providing essential support and knowledge for the work.
© 2019 Springer Nature Singapore Pte Ltd.
Yusoff, M., Md. Afendi, A.S. (2019). Acoustic Surveillance Intrusion Detection with Linear Predictive Coding and Random Forest. In: Yap, B., Mohamed, A., Berry, M. (eds) Soft Computing in Data Science. SCDS 2018. Communications in Computer and Information Science, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-3441-2_6
DOI: https://doi.org/10.1007/978-981-13-3441-2_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3440-5
Online ISBN: 978-981-13-3441-2
eBook Packages: Computer Science (R0)