1 Introduction

Infrastructure maintenance is paramount for sustainable urbanization. With the population residing in urban areas growing every year [15], the consequent degradation of infrastructure is inevitable. Unplanned urbanization exerts negative impacts on a country’s economic growth, social welfare, and overall quality of life [8]. The smart city concept has emerged as an approach to solving the problems arising from rapid urbanization with sustainability in mind. Broadly defined, a smart city creates new ways for individuals to collaborate in improving city governance through the use of information and communication technology (ICT) [30]. Innovations in ICT and its expansion expedite the growth of smart cities.

Among the various types of infrastructure, the road network plays a vital role in sustainable urban development. Technical defects and poor road conditions result in poor ride quality and additional costs due to increased fuel consumption, vehicle damage, and additional travel time. As fuel consumption is directly related to a vehicle’s CO2 emissions [18], poor roads also contribute to environmental pollution. Road conditions should therefore be monitored frequently to ensure they comply with the relevant standards. Traditional methods of road quality monitoring, such as dipstick profilers, bump integrators, and visual inspection, are labour-intensive, expensive, and require expert supervision. Developing countries with emerging economies struggle to find such a workforce or such precise technologies to maintain their rapidly expanding road systems. In Sri Lanka, too, the road sector has expanded over the years, making it the country with the highest road density in South Asia [27]. However, around 25% of this road network suffers from neglect and lack of maintenance, especially in rural areas [3].

Previous research has shown that smartphone-based solutions can serve as low-cost and robust techniques for monitoring road quality [6, 16, 17]. These applications use the accelerometers and magnetometers embedded in modern smartphones to collect road data. As the number of smartphone users increases rapidly every year, these methods create a great opportunity to collect and update road quality status frequently through crowdsourcing. However, crowdsourcing can introduce errors and inconsistencies in the calculated values, as the collected data can vary with the vehicle type, the smartphone model, and user-specific habits of handling the phone. This research aims to provide a solution that minimizes these errors and employs crowdsourcing techniques efficiently.

The main contributions of this paper are as follows.

  • A novel solution for measuring road surface quality is presented. It is less susceptible to the factors that cause the predicted roughness level to deviate from the actual level, and it places minimal requirements on the user during data collection.

  • A segmentation algorithm is presented that segments any road using vehicle journey routes, without requiring predefined segments, and identifies segments repeated across multiple journeys.

  • An updating algorithm is introduced to calculate a balanced roughness value from the data acquired on different journeys while penalizing possible errors. Moreover, a dashboard with a virtual map of roads is developed using Google Map APIs to display the calculated values.

2 Related Work

A variety of methods for assessing road surface quality have been explored in the literature. In recent years, a multitude of studies have been conducted on smartphone-based sensing approaches. Roadroid [21] is one of the earliest software tools developed to monitor road conditions using smartphone sensors. According to its authors, smartphone-based monitoring falls into class 4, which covers subjective ratings and uncalibrated measures, and is a powerful means of data collection even though it does not provide results as precise as those of expensive precision profilers. They experimented on the influence of different vehicle types and road types, and developed a model to make the calculations less susceptible to these factors. Their two options for roughness calculation, eIRI (estimated IRI) and cIRI (calculated IRI), were based on peak and root mean square analysis of vibrations and on quarter-car simulation.

Several studies have investigated the use of machine learning techniques to predict IRI. Asfault, a mobile application and crowdsourced platform that monitors road conditions in real time using machine learning algorithms, was developed in [36]. Its authors treated the evaluation of road condition as a 5-class classification problem (good, average, fair, poor, and occurrences of obstacles). An SVM (Support Vector Machine) classifier was used for the automated evaluation of road quality, with an average accuracy of 92%. However, data collection was only available to the developers or authorized experts. RoadSense [7] is an Android application that makes use of both accelerometer and gyroscope sensors. Its authors experimented with three machine learning algorithms: C4.5 decision tree, SVM, and Naive Bayes.

In [34] the authors have provided a comprehensive review of the existing literature on smartphone sensor-based approaches for road quality monitoring. The conclusions drawn from the review include the necessity of developing a fully automated application without the involvement of user interaction, providing adaptability to variations observed under different vehicle types and road conditions, and applying effective crowdsourcing techniques.

Experimental studies to identify and evaluate the factors that affect smartphone-based road quality measurements have been carried out in [5, 29, 40]. They investigated the influence of speed, the mobile phone model, its placement, the vehicle type, and the driving style. The results suggest that without proper calibration the obtained values can vary significantly. However, by applying suitable techniques and analyzing data from all sources together, the difference can be minimized.

Crowdsourced techniques help to overcome the problems of insufficient data coverage and can provide more frequent updates, presumably improving the accuracy of the observations. In [24] the authors have proposed an optimized crowdsensing solution to assess road quality. Reorientation mechanisms to align the smartphone accelerometer with the vehicle axes were applied through the use of Euler Angles. An IRI proxy value was calculated using the Root Mean Square of the z-axis acceleration, the number of values collected in the 50m segment, and vehicle speed. Data from multiple users were integrated by simply merging it to the nearest existing point using a distance-based weighting scheme.

A road condition monitoring system called RCT (Road Condition Tool) is proposed in [38] and its performance was analyzed in [37]. The objective of the application was to support supply chain links in identifying appropriate routes. A set of parameters including the vehicle type and weight along with acceleration and GPS data were collected. The repeatability of the calculated road pavement condition assessment index was analyzed in terms of vehicle running speed and vehicle category. Except for the category of heavy goods vehicles, the other vehicle categories and the vehicle speed had shown a low correlation with the index.

The main drawbacks of these previously proposed solutions are that they require user interaction when collecting vehicle journey data and that they do not handle smartphone sensor noise and errors efficiently enough to be properly crowdsourced. Moreover, they do not propose proper techniques to recognize repeated road segments or to merge the values collected from different sources accordingly. The majority of the previous solutions also lack validation across varying conditions. Table 1 presents a summary of the previous literature on smartphone-based road condition monitoring.

Table 1 A comparison of previous road condition monitoring smartphone applications

3 Methodology

3.1 Overview of the Data Collecting Mobile Application

An Android application, “iRoads X”, which is an updated version of “iRoads” [4], is used to collect GPS and acceleration data. The collected data, which are anonymised to protect the privacy of the user, are filtered, preprocessed, stored, and transferred to a database. Figure 1 shows two user interfaces of the iRoads X application. The rest of this section describes the techniques used in the iRoads X application for data collection, preprocessing, and storage.

Fig. 1 User interfaces of the iRoads X application

3.2 Speed Calculation

iRoads X uses GPS to calculate the speed of the vehicle. It incorporates the haversine formula [14] given in Eq. 1, to get the distance between two GPS coordinates, and then that distance is divided by the time taken to travel the distance.

$$d=2r\arcsin\left( \sqrt{\sin^{2}\left( \frac{\phi_{2} -\phi_{1}}{2}\right) +\cos(\phi_{1})\cos(\phi_{2})\sin^{2}\left( \frac{\lambda_{2} -\lambda_{1}}{2}\right) }\right)$$
(1)
  • r is the radius of the Earth (i.e. 6371 km)

  • ϕ1, ϕ2 are the latitudes of point 1 and point 2 (in radians)

  • λ1, λ2 are the longitudes of point 1 and point 2 (in radians)
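The distance and speed computation described above can be sketched as follows. This is a minimal Python illustration of Eq. 1, not the iRoads X implementation; the function names `haversine_m` and `speed_kmh`, the `(latitude, longitude, time)` tuple layout, and the choice of km/h as the output unit are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # r in Eq. 1 (6371 km), expressed in metres


def haversine_m(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance in metres between two GPS points (Eq. 1)."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def speed_kmh(fix_a, fix_b):
    """Average speed between two fixes given as (lat_deg, lon_deg, time_s).

    Returns None when the time difference is not positive (hypothetical
    handling; the source does not say how such fixes are treated).
    """
    dt = fix_b[2] - fix_a[2]
    if dt <= 0:
        return None
    distance = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1])
    return distance / dt * 3.6  # m/s to km/h
```

For example, two fixes one degree of latitude apart and one hour apart give a speed of roughly 111 km/h.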

3.3 Noise Filtering

When smartphone sensor data are collected by crowdsourcing, the acquired signals may contain various types of noise. iRoads X uses the SMA (Simple Moving Average) filter to reduce noise in the 3-axis accelerometer readings, such as that caused by hand movements and vehicle maneuvers. SMA is an arithmetic moving average calculated by averaging a series of data points. It is given by Eq. 2, where A_i is the data point in the i-th period and n is the total number of periods.

$$\text{SMA}=\frac{A_{1} +A_{2} +\dots +A_{n}}{n}$$
(2)
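A simple moving average over a stream of accelerometer samples can be sketched as below. This is an illustrative Python sketch of Eq. 2 only. The window size is a free parameter here because the text does not state the value used by iRoads X, and averaging over fewer samples while the window is still filling is an assumption of this example.

```python
from collections import deque


def simple_moving_average(samples, window):
    """Return the running SMA (Eq. 2) of `samples` over the last `window` values.

    While fewer than `window` samples have been seen, the average is taken
    over the samples available so far (an assumption of this sketch).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buffer = deque(maxlen=window)
    total = 0.0
    filtered = []
    for value in samples:
        if len(buffer) == window:
            total -= buffer[0]  # the oldest value is about to be evicted
        buffer.append(value)
        total += value
        filtered.append(total / len(buffer))
    return filtered
```

The same function would be applied independently to each of the three accelerometer axes.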

The filtered data are then put through a reorientation process, which is explained in the following section. To remove constant noise such as engine vibrations, the system maintains a queue of the average-filtered values for each axis of the accelerometer. A second queue holds the reoriented sensor data collected while the vehicle is stable. The mean value of this second queue is then subtracted from the mean value of the first queue, which removes the constant noise from the average-filtered accelerometer readings.

3.4 Re-Orientation

iRoads X uses the 3-axis accelerometer as the main sensor to collect road profile data. A reorientation mechanism is required because users can keep the mobile phone in any position while the accelerometer data are collected. The iRoads X application offers two configurable methods for this reorientation process. The first method calculates Euler angles [31]. Equations 3 to 13 give the mathematical functions involved in this method. Here x, y, and z are the initial acceleration values along the x, y, and z axes, and x_reoriented, y_reoriented, and z_reoriented are the accelerations after reorientation.

$$\theta =\arccos\left( \frac{y}{9.800}\right)$$
(3)
$$\phi =\arctan\left( \frac{z}{x}\right)$$
(4)
$$x_{\phi } =x\times \cos(\phi ) -z\times \sin(\phi )$$
(5)
$$y_{\phi } =y$$
(6)
$$z_{\phi } =x\times \sin(\phi ) +z\times \cos(\phi )$$
(7)
$$x_{\theta } =x_{\phi } \times \cos(\theta ) +y_{\phi } \times \sin(\theta )$$
(8)
$$z_{\theta } =z_{\phi }$$
(9)
$$\alpha =\arctan\left( \frac{x_{\phi }}{z_{\phi }}\right)$$
(10)
$$x_{reoriented} =x_{\theta } \times \cos(\alpha ) -z_{\theta } \times \sin(\alpha )$$
(11)
$$y_{reoriented} =-x_{\phi } \times \sin(\theta ) +y_{\phi } \times \cos(\theta )$$
(12)
$$z_{reoriented} =x_{\theta } \times \sin(\alpha ) +z_{\theta } \times \cos(\alpha )$$
(13)
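Equations 3 to 13 can be transcribed directly into code, as in the sketch below. It follows the equations as written and is not taken from the iRoads X source. The helper `_atan_ratio`, which guards against a zero denominator in Eqs. 4 and 10, and the clamping of the arccos argument in Eq. 3 are assumptions added so that the example runs on arbitrary input.

```python
import math

G = 9.8  # gravitational acceleration used in Eq. 3, in m/s^2


def _atan_ratio(numerator, denominator):
    """arctan(numerator / denominator) with a zero-denominator guard.

    The guard is an assumption of this sketch; the equations do not
    specify what happens when the denominator is zero.
    """
    if denominator == 0.0:
        if numerator == 0.0:
            return 0.0
        return math.copysign(math.pi / 2, numerator)
    return math.atan(numerator / denominator)


def reorient(x, y, z):
    """Apply Eqs. 3 to 13 to one accelerometer sample (x, y, z)."""
    theta = math.acos(max(-1.0, min(1.0, y / G)))               # Eq. 3 (clamped)
    phi = _atan_ratio(z, x)                                     # Eq. 4
    x_phi = x * math.cos(phi) - z * math.sin(phi)               # Eq. 5
    y_phi = y                                                   # Eq. 6
    z_phi = x * math.sin(phi) + z * math.cos(phi)               # Eq. 7
    x_theta = x_phi * math.cos(theta) + y_phi * math.sin(theta) # Eq. 8
    z_theta = z_phi                                             # Eq. 9
    alpha = _atan_ratio(x_phi, z_phi)                           # Eq. 10
    x_re = x_theta * math.cos(alpha) - z_theta * math.sin(alpha)   # Eq. 11
    y_re = -x_phi * math.sin(theta) + y_phi * math.cos(theta)      # Eq. 12
    z_re = x_theta * math.sin(alpha) + z_theta * math.cos(alpha)   # Eq. 13
    return x_re, y_re, z_re
```

Each step is a planar rotation, so the magnitude of the acceleration vector is unchanged by the transformation; only its decomposition along the axes changes.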

The other method is by using the magnetic vector and the gravity vector which can be found using the magnetometer and the accelerometer in the smartphone [9]. In this method, the reorientation can be performed without waiting for braking events. First, the accelerometer sensor data is transformed from the smartphone coordinate system to the geometric coordinate system and then transformed to the vehicle’s coordinate system. Figure 2 demonstrates this reorientation process.

Fig. 2 Reorientation of the mobile phone

3.5 Identifying a Journey Without User Input

Another feature that iRoads X utilizes to support crowdsourcing is automatically identifying the time to collect data; i.e., collecting data only when the user is travelling in a vehicle. For this, iRoads X uses a third-party Software Development Kit (SDK) called pathsense [2] to identify the start and the end of a vehicle journey. This feature can minimize the user interactions required for the data collection process, which makes iRoads X more suitable and user-friendly for crowdsourcing.

3.6 Database

iRoads X uses Couchbase Lite [1] as the local database for storing the collected data. The stored data are synchronized with the database server through a sync gateway [32]. An additional advantage of this design is that an active internet connection is not required while collecting data: users can connect later and sync the data in the local database with the database server. The overall architecture of the application is presented in Fig. 3.

Fig. 3 The overall architecture of the iRoads X mobile application

3.7 Segmentation of Roads

A segmentation algorithm is implemented to segment any road and to identify, using GPS coordinates, the segments that are repeatedly travelled by vehicles. The roads do not need predefined segments; they are segmented automatically using vehicle routes. This section presents the implementation methodology of the segmentation algorithm.

As the initial step, the collected journey data, sorted by timestamp, are retrieved from the Couchbase server. Then the distance between each pair of consecutive GPS coordinates is calculated using Eq. 1.

The points amounting to the desired segment size (500 m) are saved as a new segment, provided that the road did not change and no previously recorded segment overlapped with the new segment before the desired distance was reached. A change of road is detected using reverse geocoding. If the road changed, a new segment is started at that point and the points before it are discarded, as shown in Fig. 4.

Fig. 4 Starting a new segment when the road changes
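The basic accumulation step, without the overlap handling discussed next, can be sketched as follows. This is a simplified Python illustration under several assumptions: each point already carries a road name (standing in for the result of reverse geocoding), the function name `split_into_segments` and the tuple layout are hypothetical, and a new segment is started from the last point of the one just saved.

```python
import math

SEGMENT_LENGTH_M = 500.0      # desired segment size from the text
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Distance in metres between points whose first two fields are (lat, lon)."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    d_phi = phi2 - phi1
    d_lambda = math.radians(q[1] - p[1])
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def split_into_segments(points, segment_length_m=SEGMENT_LENGTH_M):
    """Split a journey into segments of at least `segment_length_m` metres.

    `points` is a list of (lat_deg, lon_deg, road_name) tuples sorted by
    timestamp. Overlap with previously saved segments is NOT handled here.
    """
    segments = []
    current = []
    travelled = 0.0
    for point in points:
        if current and point[2] != current[-1][2]:
            # Road changed: discard the unfinished run and restart here.
            current, travelled = [], 0.0
        if current:
            travelled += haversine_m(current[-1], point)
        current.append(point)
        if travelled >= segment_length_m:
            segments.append(current)
            current, travelled = [point], 0.0
    return segments
```

Points left over at the end of a journey that do not reach the segment length are simply dropped in this sketch.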

Overlap with an existing segment can happen in two ways. The new journey can start at an earlier point on the road and then begin to overlap with an old segment, as shown in Fig. 5, or the new journey can start in the middle of an existing segment, as shown in Fig. 6. In the former case, the existing segment is detected by searching for the starting coordinate of an already saved segment that lies close to the current GPS coordinate of the journey. In the latter case, an ending coordinate is searched for instead. The flow chart of the searching function is given in Fig. 7. If such a segment exists, the points of the current journey up to that overlapping point are discarded.

Fig. 5 New journey starting at an earlier point in the road and then overlapping with an old segment

Fig. 6 New journey starting at the middle of an existing segment

Fig. 7 The flow chart for the function to search existing matching segments

The old segment is then compared with the new journey to confirm that they overlap from the starting point to the ending point. This is done by checking that the difference between the bearing angles of the two journeys does not exceed a predefined limit. The comparison is required because the new journey can make a U-turn in the middle of the old segment without reaching the segment end. On rare occasions, the two journey directions can differ significantly for other reasons even though the two segments overlap completely. A second function is applied to handle such cases. Instead of checking the bearing angle of every sub-segment, it checks the distance between the ending point of the old segment and that of the new segment, as shown in the flow chart in Fig. 8.

Fig. 8 The flowchart of the second algorithm to confirm that the new segment overlaps with the old segment
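The bearing-based check can be illustrated with the sketch below. It uses the standard initial-bearing formula for two GPS points. The pairwise comparison of sub-segments, the function names, and the 30 degree limit are assumptions for illustration; the text only states that a predefined limit is used.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    y = math.sin(d_lambda) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda))
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def bearing_difference_deg(b1, b2):
    """Smallest absolute angle between two bearings, in [0, 180]."""
    diff = abs(b1 - b2) % 360.0
    return min(diff, 360.0 - diff)


def same_direction(old_points, new_points, limit_deg=30.0):
    """Compare the bearings of corresponding sub-segments of two tracks.

    Each track is a list of (lat_deg, lon_deg) tuples. `limit_deg` is a
    hypothetical value, not the limit used in the paper.
    """
    old_pairs = zip(old_points, old_points[1:])
    new_pairs = zip(new_points, new_points[1:])
    for (a0, a1), (b0, b1) in zip(old_pairs, new_pairs):
        old_bearing = initial_bearing_deg(a0[0], a0[1], a1[0], a1[1])
        new_bearing = initial_bearing_deg(b0[0], b0[1], b1[0], b1[1])
        if bearing_difference_deg(old_bearing, new_bearing) > limit_deg:
            return False
    return True
```

A U-turn shows up as a bearing difference close to 180 degrees on the sub-segment where the new journey reverses, so the check fails at that point.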

3.8 Road Roughness Prediction

When properly trained, machine learning models are capable of handling complex tasks and of predicting more accurate values for a wider range of inputs than threshold-based algorithms [12, 26]. Therefore, an XGBoost-based machine learning model is implemented to predict the road roughness level. The predicted roughness level is based on the IRI (International Roughness Index) and is referred to as alt-IRI (alternative IRI) in the following sections. XGBoost, which stands for “Extreme Gradient Boosting”, is a highly scalable, memory-efficient, and flexible supervised learning technique based on decision tree ensembles [11]. It attempts to predict an accurate value for the target variable by optimizing the loss function and applying regularization.

The features selected to predict the alt-IRI of each segment are the mean, the maximum, and the standard deviation of the acceleration signals and the number of spikes (pulses) in the x, y, and z directions, along with the average speed of the vehicle. The model is trained on a dataset collected through crowdsourcing from roads amounting to 56 km. The dataset is labelled using the values collected by running a ROMDAS (Road Measurement Data Acquisition System) on the same road segments. The road quality category is then determined by classifying the predicted value into four classes, as given in Table 2.
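The per-segment feature vector described above could be assembled along the following lines. This sketch covers feature extraction only, not model training. The definition of a spike (an upward crossing of the absolute acceleration over a threshold), the threshold value, the use of the population standard deviation, and all function and key names are assumptions, since the text does not define them.

```python
import statistics


def count_spikes(signal, threshold):
    """Count how many times |signal| rises above `threshold`.

    This spike definition is a hypothetical stand-in; the paper does not
    state how spikes (pulses) are detected.
    """
    spikes = 0
    was_above = False
    for value in signal:
        is_above = abs(value) > threshold
        if is_above and not was_above:
            spikes += 1
        was_above = is_above
    return spikes


def segment_features(acc_x, acc_y, acc_z, speeds_kmh, spike_threshold=2.0):
    """Build the 13 features for one segment: 4 per axis plus average speed."""
    features = {}
    for axis, signal in (("x", acc_x), ("y", acc_y), ("z", acc_z)):
        features[f"mean_{axis}"] = statistics.fmean(signal)
        features[f"max_{axis}"] = max(signal)
        features[f"std_{axis}"] = statistics.pstdev(signal)
        features[f"spikes_{axis}"] = count_spikes(signal, spike_threshold)
    features["avg_speed"] = statistics.fmean(speeds_kmh)
    return features
```

A dictionary of this shape, one per segment, can then be converted to a row of the training matrix for a gradient-boosted regressor.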

Table 2 The road quality category according to the alt-IRI

3.9 Updating the Values

If a new segment is identified, the predicted alt-IRI for the segment is saved along with the segment data and the timestamp. If the segment is repeated, an updated value is calculated as given in Algorithm 1.

Algorithm 1

The updating algorithm ensures that the alt-IRI values are updated frequently while penalizing possible errors due to crowdsourced data collection. Parts of a road that are shorter than 500 m, and may therefore not be included in any segment, are assigned an alt-IRI value by averaging the values of the adjacent segments.

3.10 Visualizing the Calculated Values

As the next step, a virtual dashboard is developed using AngularJS to visualize the collected alt-IRI of roads in Sri Lanka on a map. This dashboard can be used by the public and road development authorities to easily identify the road conditions in the country. Figure 9 presents the interface of the dashboard displaying calculated alt-IRI.

Fig. 9 The interface of the dashboard displaying calculated alt-IRI

3.11 Overall System Overview

The complete system architecture is depicted in Fig. 10. This diagram depicts how the aforementioned components communicate with one another in order to achieve the final objective.

Fig. 10 Overall architecture diagram of the system

Figure 11 shows a conceptual diagram that explains the overall system. The 3-axis accelerometer readings are the independent variables in this system, and the IRI of roads is the dependent variable. The moderator variables are the type of vehicle, the type of smartphone, and the positioning of the smartphone. Crowdsourcing techniques are used to manage these variables in such a way that their effect on the system’s outcomes is reduced.

Fig. 11 Conceptual diagram of the overall system

Table 3 compares the techniques utilized in this system to those used in existing similar systems. Based on the advantages and limitations of prior studies mentioned in Table 3, it can be concluded that the proposed techniques in this paper are more suitable for crowdsourcing compared to the previous approaches.

Table 3 A comparison of techniques used in this paper and in previous work

4 Experiments

To evaluate the developed system, a set of field tests was carried out. The field tests were arranged to analyze the effects of variations in passenger car type, road condition, smartphone model, and the position of the smartphone inside the vehicle.

The car type, smartphone position, and smartphone model were chosen because the degree of noise, such as engine vibrations, can vary depending on these factors. The smartphone orientation can also change with the position of the smartphone. To analyze the impact of the reorientation, noise filtering, and roughness prediction techniques used in the proposed solution, experiments were carried out under different controlled conditions.

In the first experiment, data were collected on a selected road segment while the conditional factors varied. The selected segment is shown in Fig. 12. The vehicle type and smartphone placement were changed on each journey, and two journeys were conducted for each combination of vehicle type and smartphone placement. With 3 vehicle types and 5 smartphone placements, 30 journeys were conducted altogether on the selected road segment. The specifications of the selected vehicles are given in Table 4. In 10 of the journeys, data were collected from 2 smartphone models under identical conditions, i.e. on the same road segment, with the same smartphone placement and vehicle type. A summary of the experiment is given in Table 5.

The purpose of this experiment is to evaluate the system performance under different data collection environments. When collecting data by crowdsourcing, it is necessary to ensure that the values predicted by the system are not affected by the different conditional factors. Because the experiment is done on the same road segment, the system is expected to provide similar roughness values under each condition.

Fig. 12 The selected road segment

Additionally, to evaluate the system performance under different road conditions, a set of roads in poor, fair, good, and very good condition was selected based on visual inspection by experts. The alt-IRI values were then predicted for each road segment using the developed system and classified into the 4 classes based on the predicted value. The goal of this experiment is to evaluate the performance under different road conditions (very good, good, fair, poor) while the conditional factors are not controlled. Here the system is expected to predict different roughness values according to the road condition.

The results of these experiments are discussed in the following section.

Table 4 The specifications of the selected vehicles
Table 5 A summary of the first experiment

5 Results and Discussion

The acceleration signals collected on each journey in the first experiment were analyzed to identify how the varying factors affect the collected data. Figures 13a and 13b show two example acceleration datasets collected under different conditions on the same road segment. Even though the raw acceleration signals differ significantly, the differences appear to be negligible after the reorientation and noise filtering techniques are applied.

Fig. 13 Example acceleration signals collected from the first experiment. (a) Raw and processed acceleration signals collected from Samsung SM-M022G placed near the gearshift in C200. (b) Raw and processed acceleration signals collected from Samsung SM-M022G held in the hand in C200

Table 6 presents the predicted alt-IRI for the selected road segment. The predicted values do not appear to vary significantly across the varying conditions. It can therefore be concluded that differences in passenger car type or smartphone placement have not had a significant effect on the final predictions, despite the differences shown in the raw acceleration signals.

Table 6 The predicted alt-IRI for the selected road segment

Table 7 summarizes the final predicted alt-IRI from the data collected by the two models, Samsung SM-M022G (Model 1) and Samsung SM-A920F (Model 2), under identical conditions. The values predicted from Model 2 are slightly higher than those from Model 1, presumably because of differences in the sensitivity of the sensors, but no significant variance is observed when all the predictions are compared.

Table 7 The predicted alt-IRI from the data collected from the two smartphone models

The results suggest that the proposed techniques have successfully reduced the conditional effects. Removing the constant noise has helped in minimizing the effect of the car type and the smartphone type and applying the simple moving average filtering has minimized the effects of hand movements and other high-frequency noises. The effect of the placement of the smartphone is reduced by applying the reorientation as it reorients the smartphone coordinate system to the vehicle coordinate system regardless of the differences in the placement inside the vehicle.

After applying the updating algorithm to the predicted values for the segment in Fig. 12, an average value of 3.350 is achieved. Figure 14 demonstrates the value distribution, applied weights, and the removed values by the algorithm. It is evident from the results that the possible anomalous predictions are removed by the updating algorithm and a final averaged and more accurate value is obtained.

Fig. 14 A visualization of the alt-IRI distribution, applied weights and the removed values by the updating algorithm

One reason for the slight differences observed in the calculated values is that the vehicle does not follow the same line on different rides, and different lines on the same road may have slightly different roughness. After averaging, however, the updated value is expected to be the most representative value for the complete road segment. Another factor that affects the z-axis acceleration is the vehicle speed [25]. The predicted alt-IRI values were therefore analyzed against the average vehicle speed to determine whether speed influences the predicted alt-IRI. The obtained results are shown in Fig. 15. The two variables do not demonstrate a significant correlation. This is expected, as the average speed is one of the input features of the machine learning model.

Fig. 15 The correlation between the average speed and the predicted alt-IRI

Table 8 The results obtained from the second field test

The processing techniques and the machine learning model yield similar values for the same segment under varying conditions. When the segment changes and the roughness level changes, however, the system must be able to predict different alt-IRI values and the corresponding classes. The results of the field tests that evaluated the system on road segments with different roughness levels demonstrate this capability, as shown in Table 8. Even after the acceleration signals had gone through the processing steps, the changes were captured and the machine learning model predicted the alt-IRI values accordingly.

The results from this second experiment highlight the advantage of using machine learning models to predict the IRI values, as they give the ability to predict accurate values for a larger range of complex inputs. Furthermore, the overall results indicate that the segmentation algorithm segments the routes accurately, identifies the overlapping segments, and enables the updating algorithm to calculate an averaged value.

The authors acknowledge that the scope of the conducted field tests is limited, but the results show the proposed system’s capability to provide accurate predictions of road conditions by crowdsourcing. As the target vehicle class of the developed system is passenger cars, the system has not been trained or tested for other vehicle classes, and the predictions may deviate from the actual value if data are collected using a different vehicle class. Another major limitation is that smartphone GPS has a significant positioning error of about 10 m, while many functions in the proposed solution, such as the vehicle speed calculation and the segmentation algorithm, depend on the GPS coordinates. Furthermore, the accuracy of the segmentation algorithm depends on the accuracy of OpenStreetMap. If there are any unidentified or incorrectly named roads in OpenStreetMap, the route segmentation process may also be affected.

6 Conclusions and Future Work

This study attempts to develop a crowdsourced road surface quality monitoring application that requires minimal user interaction, places minimal constraints on the user when collecting data, and is capable of reducing the effects of the factors that can cause the calculated road roughness to deviate from the actual value. A set of experiments was conducted to verify the system performance under diverse conditions. The first experiment was carried out on a selected route using multiple car types, smartphone models, and placements of the smartphone inside the vehicle. The raw acceleration signals obtained from the experiment showed considerable variation across the journeys, but the deviations were minimized after the signal processing steps were applied. Moreover, the developed machine learning model was able to predict accurate roughness levels while further coping with the varying conditions. In the second experiment, the system was tested on different road conditions, and it was observed that the system is capable of accurately determining the road quality category. When more data are collected frequently for the same road segment, the final value obtained from the updating algorithm tends to be a more precise, balanced value.

Complexities can arise when thousands of mobile phone users connect to the system at the same time and a massive amount of data is collected every second. Many recent studies have proposed efficient and scalable solutions for big data analysis and high-performance computing [10, 28]. As the proposed system is developed using the microservices architecture [13], it is possible to enhance the system performance by incorporating these techniques.