1 Introduction

A traditional method has long been used to assess the density of railway ballast. In this method, a hammer with a wooden head is tapped against the rail to produce an acoustical impact, and an experienced person listens carefully to the sound to distinguish good ballast density from bad ballast density [1]. The authors of the present work were inspired by this traditional method to develop a new method for determining the density of aggregate used in roads, railways, and similar applications. In this research, an acoustical impact was created and then analyzed using speech recognition technology through wavelet transforms and an artificial neural network (ANN).

2 Review

This review takes several steps: (1) comprehensive comparison of domains; (2) extraction of journals and databases; (3) sieving and collecting appropriate articles; and (4) database analysis.

Table 1 summarizes the existing abbreviations found in the literature. Several domains have been proposed by scientists, and these techniques are classified in Fig. 1. The latest studies on AE methods, shown in Fig. 1, cover the periods 2012–2015 and 2015–2018 (Fig. 2).

Table 1 The acoustic methods of non-destructive testing of building structures
Fig. 1

The steps of review task

Fig. 2

Comprehensive comparison of domains by researchers in recent years in the field of AE

In this section, we present a brief review of the methods proposed in the literature. From Fig. 1, it can be concluded that Engineering (with more than 70%) and Materials (with 26%) are used more than the other domains. This also suggests that AE is an effective method for solving engineering problems.

As shown in Fig. 3, articles and conference papers are the most common publication classes. The two groups occur with roughly equal frequency, and together they account for more than 80% of the total publications in AE. About 2% of the current approaches are reviews, while the remainder are mostly books or book chapters.

Fig. 3

Comprehensive comparison of recent publications in the field of AE

The main step is to identify the journals and databases that cover material related to this review. Figure 4 displays statistics on CiteScore by publication year for the academic journals searched in the review process.

Fig. 4

Comprehensive comparison of CiteScore Publication by year

In this step, we broke the topic down into subtopics that fall under the term “AE”. These subtopics include AE engineering (AEEng.), AE materials (AEM), and AE computing (AEco.). Table 1 contains the collected information on the studied references.

2.1 A Review on the Use of Acoustic Analysis in Quality Evaluation

Food sorting using acoustical impact technology is used in the food industry. For example, there is a traditional method for distinguishing high-quality watermelons from low-quality ones: an experienced person listens carefully to the sound produced by tapping the fruit with a finger. Recently, automatic food sorting using speech recognition technology has received considerable attention [51]. Foods such as tomatoes, walnuts, beans, and pistachio nuts are dropped onto a surface, and automatic sorting is then performed using acoustical impact technology [52, 53].

Another important application of acoustic analysis is acoustic emission (AE) monitoring. AE monitoring is a procedure used to detect and locate damage in mechanically loaded structures, materials, and components [54]. Usually, piezoelectric sensors are attached to the surface of the material to record the elastic waves generated by cracking events in the material. These sensors provide valuable information on the failure process from an early stage, well before the fracture becomes apparent as visible macro-cracks [55].

Recently, acoustics has played an important role in automatic assessment systems, and scientists have paid increasing attention to this field.

Acoustic emission testing is applied to inspect and monitor pipelines [56], storage tanks and pressure vessels [57], bucket trucks, bridges, aircraft [58], and a variety of composite and ceramic components [59]. Other acoustic methods of non-destructive testing of building structures are presented in Table 2 [60].

Table 2 The acoustic methods of non-destructive testing of building structures [60]

2.2 A Review on the Speech Recognition Technology

Automatic speech recognition is a process in which speech signals are converted into a sequence of words as linguistic units [61]. The process is generally divided into two phases: the first phase is the “learning” or “training” process; the second phase is the “recognition” or “test” phase. After pre-processing the unknown voice data and extracting its features, the information is compared with speech templates in a model trained using certain decision criteria. By analyzing and identifying the unknown speech signal against the established speech model, the final recognition result can be obtained [62]. Signal processing and feature extraction are the main parts of a speech recognition system, and feature extraction is considered its heart: this component extracts from the input signal the features that help the system identify the speaker [62]. There are many conventional and new signal processing and feature extraction techniques; some of the new ones are presented in the following.
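The two-phase structure described above can be illustrated with a deliberately minimal sketch: the training phase stores one template (mean feature vector) per class, and the recognition phase compares an unknown feature vector against the stored templates. The feature vectors, class names, and nearest-template decision rule below are hypothetical stand-ins, not the systems surveyed here.

```python
import numpy as np

# Training phase: store one "template" (mean feature vector) per word class.
def train_templates(features, labels):
    """features: (n_samples, n_features) array; labels: list of class names."""
    templates = {}
    for label in set(labels):
        rows = features[[i for i, l in enumerate(labels) if l == label]]
        templates[label] = rows.mean(axis=0)
    return templates

# Recognition phase: compare an unknown feature vector with each template and
# return the closest one (Euclidean distance as the decision criterion).
def recognize(templates, x):
    return min(templates, key=lambda label: np.linalg.norm(x - templates[label]))

# Hypothetical feature clusters standing in for two word classes.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (5, 4)), rng.normal(1.0, 0.1, (5, 4))])
labels = ["yes"] * 5 + ["no"] * 5
templates = train_templates(feats, labels)
print(recognize(templates, np.full(4, 0.95)))  # close to the "no" cluster
```

Real systems replace the mean-vector template with trained acoustic models, but the train/recognize split is the same.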

Recently, a new feature extraction method has been proposed to improve the robustness of speech recognition systems. This method combines phase autocorrelation with the bark wavelet transform. The results show a word recognition rate of 60% for the combined method, compared with 41.35% for the conventional feature extraction method [63]. Other new methods for speaker feature extraction are based on formants, wavelet entropy, and neural networks. In contrast to conventional speaker recognition methods that extract features from non-vowel signals, the proposed method extracts features from vowels; the results were compared to those of classical speaker recognition algorithms and found to be superior [64]. In another recent study, a novel feature extraction with dimension reduction was introduced using combined signal processing and statistical approaches, namely the discrete wavelet transform and multidimensional scaling; a support vector machine was used to classify the nonlinear heterogeneous dataset [65].

3 Method

An acoustical system was developed to determine aggregate density. The system includes three microphones, a steel plate, a sinker, an isolated body, and digital signal processing hardware. The steel plate is placed on aggregate, followed by releasing the sinker on the steel plate. Next, the acoustic impact is analyzed using speech recognition technology. Figures 5, 6, 7 and 8 present the proposed acoustical system.

Fig. 5

A schematic view of the acoustical system: (1) isolated body, (2) releaser, (3) isolated box, (4) sinker, (5) steel plate, (6) microphone, (7) hole, (8) aggregate, (9) Patch Cable, and (10) Laptop set

Fig. 6

Three microphones are placed inside the isolated black boxes. There is a hole next to each microphone

Fig. 7

The sinker and the steel plate

Fig. 8

The steel plate is placed on aggregate. Then, the sinker is released on the steel plate

The acoustic system has three microphones. At the beginning, the microphones were placed inside the main body of the system, but it was observed that the system then produced random sound signals. To correct this problem, the microphones were moved outside the main isolated body and covered with an isolated box (Fig. 6).

Figure 7 shows the sinker and the steel plate. The thickness of the steel plate is 2 mm. An inadequate plate thickness allows the plate to shake, causing the sound signal to be random; our observations showed that a thickness of 2 mm is sufficient. Table 3 presents the specifications of the acoustic system used in this work.

Table 3 The acoustic system details

Because of the low friction between the plate and the sinker, the sinker shakes on the plate and random sound signals are generated. To correct this problem, the underside of the sinker was covered with a plastic sheet. Figure 9 shows the random signals recorded before the acoustic system was modified.

Fig. 9

The random signals before modifying the acoustic system

We suspect that the bulk density of the aggregate affects the sound signals for the following reasons:

  • Changing aggregate grading in the upper layer: vibratory compaction reduces the fine sand content on the upper surface of the aggregate, which can affect the sound transmission.

  • Increasing the empty space below the plate: vibratory compaction increases the empty space between the coarse grains, so the empty space below the plate increases. An increasing gap between the sound source and a porous mass increases the sound absorption at low frequencies.

  • Porosity reduction: reducing porosity increases the sound absorption at high frequencies. Moreover, porosity reduction increases the velocity of sound in the aggregate.

4 Aggregate Samples Preparation

Aggregate samples at different densities were created according to ASTM Standard C29. This test method covers the determination of the bulk density of aggregate in either compacted or loose condition and calculates the voids between particles in fine, coarse, or mixed aggregates based on the same procedure. It is applicable to aggregates not exceeding 125 mm [5 in.] in nominal maximum size. Figure 10 illustrates the aggregate used to create the samples; its nominal maximum size is 40 mm.
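For orientation, the two quantities the standard determines can be sketched as follows, using the standard's bulk-density and voids relations. The measured masses, the measure volume, and the bulk specific gravity below are hypothetical illustration values, not measurements from this study.

```python
def bulk_density(G, T, V):
    """Bulk density M = (G - T) / V, where G is the mass of the aggregate plus
    the measure, T the mass of the empty measure, and V the measure volume."""
    return (G - T) / V

def voids_percent(M, S, W=998.0):
    """Void content between particles, in percent; S is the bulk specific
    gravity (dry basis) and W the density of water in kg/m^3."""
    return 100.0 * (S * W - M) / (S * W)

# Hypothetical measurement with a 14 L (0.014 m^3) measure, masses in kg.
M = bulk_density(G=38.2, T=16.9, V=0.014)          # kg/m^3
v = voids_percent(M, S=2.65)
print(round(M, 1), round(v, 1))
```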

Fig. 10

Depot for preparing aggregate samples

Figure 11 shows the metal container recommended by ASTM Standard C29. The standard container size was determined according to the nominal maximum size of the aggregate (40 mm).

Fig. 11

Metal container according to ASTM Standard C29

The shoveling procedure for loose bulk density shall be used only when specifically stipulated. Otherwise, the compacted bulk density shall be obtained by either the rodding procedure, for aggregates having a nominal maximum size of 37.5 mm (1 1/2 in.) or less, or the jigging procedure, for aggregates having a nominal maximum size greater than 37.5 mm (1 1/2 in.) and not exceeding 125 mm (5 in.). As the aggregates used to create the samples have a nominal maximum size greater than 37.5 mm, the compacted bulk density was obtained by the jigging procedure.

The measure was filled in three approximately equal layers. Each layer was compacted by placing the measure on a firm base, raising the opposite sides alternately about 50 mm (2 in.), and allowing the measure to drop so as to hit with a sharp, slapping blow; this procedure arranges the aggregate particles in a densely compacted condition. According to ASTM Standard C29, each layer shall be compacted by dropping the measure 50 times in the manner described, 25 times on each side. However, our observations showed that dropping the measure 52 times, 13 times on each of the four sides of the metal container, is closer to vibratory compaction at the construction site, so we did likewise. In the present research, we treated the loose condition as the low-density condition. To create a medium density of aggregate, each layer was compacted by dropping the measure 12 times, 3 times on each of the four sides (Fig. 12).

Fig. 12

Some aggregate samples at different densities

5 Sound Data Collection

Audio signals were collected using the acoustical system described in Sect. 3. Figure 13 shows the collection of audio signals using the system. In total, 318 sound recordings (106 for each condition) were collected from the surface of aggregate samples at different bulk densities. Before placing the steel plate, the surface of the aggregate was leveled with the fingers or a straightedge, since the slope of the steel plate must be almost zero: a sloped plate causes the released sinker to slip, which affects the sound.

Fig. 13

Collecting audio signals using the acoustical system

6 Feature Extraction

Sound curves of the collected data were drawn using MATLAB software (Fig. 14). Each signal consists of three parts, as shown in Fig. 15. Parts 1 and 3 were removed from the original signals, so only part 2 was processed. The signals were decomposed at level 5 using one-dimensional discrete wavelet analysis; Fig. 16 shows the decomposition of a recorded signal at level 5. Six features (the mean and standard deviation of the wavelet coefficients and the energies at low and high frequencies) were extracted from each sound recording. To achieve the desired results, it is necessary to limit the data to a specific range by normalization. In this study, the following formula was used for data normalization [66]:

$$a_{i} = 0.1 + 0.8\left( {\frac{{A_{i} - A_{min} }}{{A_{max} - A_{min} }}} \right)$$
(1)

where \(A_{i}\) is an original value, \(a_{i}\) is the normalized value, \(A_{max}\) is the maximum value, and \(A_{min}\) is the minimum value. Before normalization, the data were passed through the absolute-value function.
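As a rough sketch of this feature pipeline, the following substitutes a hand-rolled Haar transform for the one-dimensional wavelet analysis actually performed in MATLAB. The feature choices mirror the six features described above but are an illustrative assumption, not the authors' exact implementation; Eq. (1) is implemented as written.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform:
    returns (approximation, detail) coefficient arrays."""
    x = x[: len(x) // 2 * 2]                    # truncate to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)      # low-frequency part
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)      # high-frequency part
    return a, d

def wavelet_features(signal, level=5):
    """Decompose to the given level and collect six simple statistics:
    mean/std of low- and high-frequency coefficients, plus both energies."""
    a = np.asarray(signal, dtype=float)
    details = []
    for _ in range(level):
        a, d = haar_dwt(a)
        details.append(d)
    high = np.concatenate(details)
    return np.array([a.mean(), a.std(), high.mean(), high.std(),
                     np.sum(a ** 2), np.sum(high ** 2)])

def normalize(A):
    """Eq. (1): map absolute values into the range [0.1, 0.9]."""
    A = np.abs(np.asarray(A, dtype=float))
    return 0.1 + 0.8 * (A - A.min()) / (A.max() - A.min())

sig = np.sin(np.linspace(0, 20 * np.pi, 1024))  # stand-in for part 2 of a signal
f = wavelet_features(sig)
n = normalize(f)
print(n.min(), n.max())                          # bounded by 0.1 and 0.9
```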

Fig. 14

Sound curves of some of the collected data

Fig. 15

Three parts of a recorded sound signal

Fig. 16

Decomposition a recorded signal at level 5

7 Developing the Neural Networks

Neural network (NN) models are well suited to domains where large labeled datasets are available, since their capacity can easily be increased by adding more layers or more units per layer. However, big networks with millions or billions of parameters can easily overfit even the largest datasets [67].

7.1 Pattern Recognition Problem 1

A neural network was developed to classify the sound data into two aggregate densities (good density and low density), posed as a pattern recognition problem. The goal is to choose the structure of the neural network that achieves the desired input/output relationship. The input matrix has 6 rows (the number of features) and 212 columns (the number of sound recordings), and the target is a zero–one matrix with 2 rows (the number of conditions) and 212 columns. A backpropagation learning algorithm was employed for training in MATLAB. First, a perceptron neural network with a 6-25-2 structure was considered, and the training function was varied as shown in Fig. 17. The method used to update the weight and bias values in each training function is presented in Table 4.

Fig. 17

The classification percent error for the training functions

Table 4 The method to update weight and bias values in the training functions

Next, the number of hidden-layer neurons was varied from 2 to 35 to examine the sensitivity of the results. Figure 17 shows the classification percent error for the training functions, while Fig. 18 presents the classification error according to the number of hidden-layer neurons. Finally, the number of hidden layers was varied between 1 and 2. Based on a trial-and-error approach, a 6-30-2 neural network with the traincgp training function was chosen to solve the pattern recognition problem.
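The pattern recognition setup can be sketched with a small backpropagation network of the same 6-30-2 layout. Plain batch gradient descent on hypothetical two-cluster data stands in for MATLAB's traincgp and the real sound features, so this shows the input/target structure only, not the paper's training run.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, T, hidden=30, lr=2.0, epochs=3000):
    """X: (n_features, n_samples); T: zero-one targets (n_classes, n_samples)."""
    W1 = rng.normal(0.0, 0.5, (hidden, X.shape[0]))
    W2 = rng.normal(0.0, 0.5, (T.shape[0], hidden))
    n = X.shape[1]
    for _ in range(epochs):
        H = sigmoid(W1 @ X)             # hidden-layer activations
        Y = sigmoid(W2 @ H)             # network outputs
        d2 = (Y - T) * Y * (1 - Y)      # output-layer error term
        d1 = (W2.T @ d2) * H * (1 - H)  # backpropagated hidden error
        W2 -= lr * (d2 @ H.T) / n
        W1 -= lr * (d1 @ X.T) / n
    return W1, W2

def predict(W1, W2, X):
    return np.argmax(sigmoid(W2 @ sigmoid(W1 @ X)), axis=0)

# Hypothetical data: two 6-feature clusters standing in for the density classes.
X = np.hstack([rng.normal(0.2, 0.05, (6, 40)), rng.normal(0.8, 0.05, (6, 40))])
T = np.zeros((2, 80))
T[0, :40] = 1
T[1, 40:] = 1
W1, W2 = train(X, T)
print((predict(W1, W2, X) == np.r_[np.zeros(40), np.ones(40)]).mean())  # fraction correct
```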

Fig. 18

The classification percent error with the number of hidden layer neurons

7.2 Pattern Recognition Problem 2

Another neural network was developed to classify the sound data into three aggregate densities (good density, medium density, and low density), again posed as a pattern recognition problem with the goal of choosing a network structure that achieves the desired input/output relationship. The input matrix has 6 rows (the number of features) and 318 columns (the number of sound recordings), and the target is a zero–one matrix with 3 rows (the number of conditions) and 318 columns. A backpropagation learning algorithm was employed for training in MATLAB. First, a perceptron neural network with a 6-25-3 structure was considered, and the training function was varied as shown in Fig. 19. Then the number of hidden-layer neurons was varied from 3 to 40 to examine the sensitivity of the results.

Fig. 19

The classification percent error according to the training functions

Figure 19 shows the classification percent error for the training functions, while Fig. 20 shows the classification MSE according to the number of hidden-layer neurons. After varying the number of hidden layers between 1 and 2, a 6-35-35-3 neural network with the trainbr training function was ultimately chosen to solve the pattern recognition problem.

Fig. 20

The classification MSE according to the number of hidden layer neurons

7.3 Classifier Performance Evaluation

Classification results can be displayed in a confusion matrix. Table 5 shows a confusion matrix with the following entries:

Table 5 The confusion matrix for a two-class classification problem

  • TP is the number of correct positive predictions;

  • FP is the number of incorrect positive predictions;

  • TN is the number of correct negative predictions; and

  • FN is the number of incorrect negative predictions.

In this paper, we have used prediction accuracy, classification precision, and the MCC to evaluate the model for classification problems 1 and 2. These performance measures can be calculated directly from the confusion matrix as follows:

$${\text{Accuracy}} = \frac{TP + TN}{TP + FP + TN + FN}$$
(2)
$${\text{Precision}} = \frac{TP}{TP + FP}$$
(3)
$$MCC = \frac{TP \times TN - FP \times FN}{{\sqrt {\left( {TP + FP} \right)\left( {TP + FN} \right)\left( {TN + FP} \right)\left( {TN + FN} \right)} }}.$$
(4)
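Equations (2)–(4) translate directly into code; the confusion counts below are hypothetical values for illustration.

```python
import math

def accuracy(TP, FP, TN, FN):
    """Eq. (2): fraction of all predictions that are correct."""
    return (TP + TN) / (TP + FP + TN + FN)

def precision(TP, FP):
    """Eq. (3): fraction of positive predictions that are correct."""
    return TP / (TP + FP)

def mcc(TP, FP, TN, FN):
    """Eq. (4): Matthews correlation coefficient, in [-1, +1]."""
    denom = math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    return (TP * TN - FP * FN) / denom if denom else 0.0

TP, FP, TN, FN = 95, 10, 90, 5  # hypothetical confusion-matrix counts
print(accuracy(TP, FP, TN, FN), precision(TP, FP), round(mcc(TP, FP, TN, FN), 3))
```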

8 Results and Discussion

As described in Sect. 7, a 6-30-2 neural network with the traincgp function was developed to solve pattern recognition problem 1, while a 6-35-35-3 neural network with the trainbr function was developed to solve pattern recognition problem 2. An error histogram was used to validate the networks' performance. The error histogram can give an indication of outliers, i.e., data points where the fit is significantly worse than for the majority of the data. It is a good idea to check the outliers to determine whether the data are of poor quality or whether those data points simply differ from the rest of the dataset. If the outliers are valid data points but are unlike the rest of the data, then the network is extrapolating for these points, and collecting more data can be helpful. Figure 21 shows the error histograms for the chosen neural networks for problem 1 (a) and problem 2 (b). The blue, green, and red bars represent the training, validation, and testing data, respectively. As these histograms show, the number of outliers is not significant.

Fig. 21

Error histogram for problem 1 (a) problem 2 (b)

The MSE curve is used to obtain additional verification of the network performance. Figure 22b shows the mean squared errors for the training, testing, and validation data. Training stopped when the validation error increased, which occurred at iteration 25. As shown in Fig. 22b, the result is reasonable for the following reasons:

Fig. 22

Training, validation, and test performance (MSE)

  • The final mean-square error is small.

  • The test set error and the validation set error have similar characteristics.

  • No significant overfitting occurred by iteration 19 (where the best validation performance occurs).
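The stopping behaviour described above (best validation performance at one iteration, training halted several iterations later once the validation error keeps rising) can be sketched as a simple patience rule. The patience value and the validation-error sequence below are hypothetical illustrations, not the settings used in this study.

```python
def early_stop(val_errors, patience=6):
    """Return (best_iteration, stop_iteration): training halts once the
    validation error has failed to improve for `patience` consecutive checks."""
    best_i, fails = 0, 0
    for i in range(1, len(val_errors)):
        if val_errors[i] < val_errors[best_i]:
            best_i, fails = i, 0      # new best: reset the patience counter
        else:
            fails += 1
            if fails >= patience:
                return best_i, i      # stop; keep weights from best_i
    return best_i, len(val_errors) - 1

# Hypothetical validation-error curve: improves, then steadily worsens.
errors = [0.9, 0.6, 0.4, 0.3, 0.25, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26]
print(early_stop(errors))
```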

Classification results are shown in a confusion matrix, which contains information about the actual and predicted classifications made by a classification system. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. Accuracy and precision were used to evaluate the model for classification problems 1 and 2. Classification accuracy is the number of correct predictions divided by the total number of predictions; precision is the number of true positives divided by the sum of true positives and false positives.
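A confusion matrix with this convention (rows for actual classes, columns for predicted classes) can be assembled directly from label lists; the labels below are hypothetical.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Rows index the actual class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

actual    = [0, 0, 0, 1, 1, 1, 2, 2]   # hypothetical class labels
predicted = [0, 0, 1, 1, 1, 1, 2, 0]
cm = confusion_matrix(actual, predicted, 3)
print(cm)
print(cm.trace() / cm.sum())           # overall accuracy: correct / total
```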

The MCC (Matthews correlation coefficient) is another performance measure used to evaluate the model for the classification problems. This coefficient measures the quality of the classifications; it takes into account true and false positives and negatives and is generally regarded as a balanced measure.

Figure 23 illustrates the confusion matrices for the training, testing, and validation data, and for the three kinds of data combined, for classification problem 1. According to these matrices, the prediction accuracy is 89.9% on the training set, 87.5% on the validation set, 90.6% on the test set, and 89.6% for all datasets. Figure 24 presents the test confusion matrix and the all-data confusion matrix for classification problem 2. According to the matrices shown in Fig. 24, the neural network's prediction accuracy is 77.1% on the test set and 96.5% for all datasets. The other performance measures (the MCC and the precision) are listed in Table 6. These measures show that the classification of good density and low density was successful. The third row of Table 6 shows the MCC values. The MCC is, in essence, a correlation coefficient between the observed and predicted classifications, returning a value between −1 and +1: a coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and −1 total disagreement between prediction and observation. As shown in Table 6, the MCC values for classification problems 1 and 2 are in the range 0.8–0.98, which is acceptable.

Fig. 23

Training, validation, and test confusion matrices (problem 1)

Fig. 24

The test confusion matrix and all-data confusion matrix (problem 2)

Table 6 The performance measures to evaluate the model for the classification of good density and low density

As described earlier, the accuracy of the classification of three density conditions in problem 2 is lower than that of two density conditions in problem 1, but the classification of good density and low density in problem 2 is almost as accurate as in problem 1. Hence, we can state that adding the samples with medium density neither increased nor decreased the accuracy of the classification of good density and low density.

9 Conclusion

An acoustical system was developed to determine the bulk density of aggregate. The system is placed on the surface of the aggregate and produces sound data, as described in Sect. 3. The sound data are analyzed for feature extraction in order to define a pattern recognition problem. An artificial neural network (ANN), as one approach to pattern recognition, was used to develop a system that classifies sound data into three aggregate densities (good density, medium density, and low density); the classification accuracy was 77.1% on the test set and 96.5% for all datasets. Another neural network was developed to classify sound data into two aggregate densities (good density and low density); the classification accuracy was 90.6% on the test set and 89.6% for all datasets.