1 Introduction

Traffic density congestion refers to the increased volume of vehicles on roadways, resulting in reduced flow and speed, longer travel times and increased frustration for commuters [1]. As cities grow and economies flourish, traffic congestion becomes an unavoidable challenge. The main cause of Traffic Density (TD) congestion is the ever-growing number of automobiles on the roads. With the rise in personal car ownership and the convenience it offers, the demand for private transportation has skyrocketed [2, 3]. As a result, road networks have become overwhelmed, leading to congestion during peak hours. Outdated or poorly planned infrastructure contributes significantly to traffic congestion. Roads that cannot accommodate the increasing number of vehicles exacerbate the problem. Additionally, insufficient traffic management systems lead to bottlenecks and gridlocks [4, 5]. As more people migrate to urban areas in search of better opportunities, the population in cities increases. This rapid urbanization places immense pressure on transportation systems, resulting in congestion on roadways [6].

In this rapidly advancing era of transportation, managing traffic congestion and ensuring road safety are critical challenges faced by governments and urban planners worldwide [7, 8]. Traditional traffic management systems often fall short in handling the complexities of modern urban environments [8, 9]. However, with the integration of technology, the development of advanced Recommendation Systems (RS) has become possible. Modern RS utilize Machine Learning (ML) algorithms to examine huge volumes of real-time traffic information, historical patterns, weather conditions and events to provide intelligent insights [10, 11]. Many approaches to route recommendation have focused on segmenting roads, but they often encounter difficulties. Previous approaches typically overlook the importance of vehicle size as a critical factor for enhancing driving convenience and safety. Additionally, these approaches face challenges that extend beyond basic attributes, necessitating the integration of real-time road conditions. Accurate assessment of road capacity is essential for identifying the most appropriate road segments to provide optimal route recommendations. RS suggest the most efficient routes to avoid congested areas, thereby reducing travel duration and fuel consumption. Users receive real-time updates on traffic conditions, accidents and construction activities, enabling them to make informed decisions.

Road condition classification involves assessing the quality and safety of roads. Identifying and categorizing road conditions are essential for planning maintenance activities and prioritizing repairs to ensure the safety of motorists. The EGJ fusion based RS has emerged as a promising approach for optimizing Traffic Congestion (TC) and classifying road conditions effectively. The main contributions of the proposed approach are:

  • Enhanced preprocessing enables more accurate object detection, leading to better traffic flow analysis and monitoring. The contrast enhancement and denoising techniques improve the visibility of objects in the images, aiding in better decision making.

  • ECNN learns hierarchical features and patterns from preprocessed images, making it robust to variations in lighting, weather and road conditions, resulting in reliable traffic flow classification.

  • ADGAN enhances model generalization, reducing overfitting on the training data and improving performance on unseen data. It learns robust feature representations through adversarial training, enabling better separation of different label categories in the latent space.

  • EGJ system's recommendations lead to smoother traffic movement, reducing congestion and improving overall transportation efficiency. By accurately classifying road conditions, the system aids in proactive road maintenance and enhances road safety.

The remainder of the document is structured as follows: Sect. 2 discusses the literature survey, Sect. 3 outlines the introduced methods, Sect. 4 describes the outcomes and explanations of the experiment and Sect. 5 provides the conclusion and outlines the future scope of this manuscript.

2 Literature survey

Several studies in the literature focus on classifying TD congestion using various methods and perspectives. This section discusses several recent approaches, their outcomes and associated challenges.

Mane et al. [12] suggested a road TD classification system using a Convolutional Neural Network (CNN). It demonstrated promising results in improving the overall traffic system. By leveraging CNN's ability to learn complex features from images, they achieved accurate Vehicle Detection (VD) and density classification. Humayun et al. in [13] addressed the problem of multi-scale VD in changing climate environments using YOLOv4 and a spatial Pyramid Pooling (PP) network. It showed promising results in detecting vehicles at different scales and under diverse weather conditions. Jilani et al. in [14] introduced a novel 5-layer CNN model for TC classification. To address the challenge of limited real-world data, they used Generative Adversarial Network (GAN) based synthetic data augmentation. This approach enhanced the model's performance and achieved competitive results. Chetouane et al. in [15] focused on vision-based VD for road TC classification, utilizing visual information to classify TC levels accurately. Rafique et al. in [16] developed a rapid road traffic monitoring scheme employing Pyramid Pooling Vehicle Detection (PPVD) and filter-based tracking on floating images. The designed technique demonstrated effective traffic monitoring capabilities. Mehdi et al. in [17] suggested an Entropy based Traffic Flow Labeling (ETFL) approach for CNN-based TC detection. The method effectively predicted TC based on meta parameters. Putra et al. in [18] introduced a road and traffic conditions monitoring system using information extracted from Twitter. While the method provided valuable insights into traffic conditions, it heavily depended on the availability of relevant tweets, which could limit its effectiveness in areas with low social media activity or unreliable data sources. Kamble and Kounte in [19] applied an ML approach to develop a TC monitoring scheme in the Internet of Vehicles (IoV), using ML techniques to improve traffic management and enhance traffic flow in IoV environments. Saleem et al. in [20] developed a Fusion based intelligent TC Control (FTCC) scheme for vehicular systems in smart cities, leveraging ML techniques. It emphasized the potential of ML to contribute to the development of smart cities with efficient transportation systems. Singh and Verma in [21] presented a method for providing accident and congestion alerts in VANET using random forest and cloud load balancing, integrating cloud computing and artificial intelligence tools to enhance communication and safety in vehicular networks. Mondal and Rehena in [22] focused on the classification of road segments in an Intelligent Traffic Management System (ITMS), emphasizing the importance of accurate road segment classification for effective traffic control and planning. Zang et al. in [23] introduced a method for identifying TC patterns in urban highway networks based on the Traffic Performance Index (TPI). It demonstrated the significance of data-driven approaches and performance indices in understanding traffic patterns in urban environments.

Kaul and Altaf in [24] introduced a Traffic Safety Management Approach (TSMA) for smart road transport in Vehicular Adhoc NETworks (VANET). The system focused on enhancing traffic safety and management in VANETs. The research highlighted how intelligent approaches can be leveraged to create safer road transportation systems. Rath et al. in [25] suggested a mobile agent based enhanced traffic control framework in VANET. It aims to enhance traffic control mechanisms by using mobile agents, which are software entities capable of autonomous actions. The method showcased the potential of mobile agents in optimizing traffic flow and improving overall efficiency in VANETs. Prakash et al. in [26] focused on fuel consumption and delay aware traffic scheduling in the VANET setting; the main function of this method is to optimize traffic scheduling to reduce fuel consumption and minimize delays in vehicular communications. Chandramohan et al. [27] in 2020 focused on energy distribution for green vehicle transport through Multi-Metric Cooperative Clustering Based Routing for Energy Efficient Data Dissemination (2M2C-R2ED). They applied this approach across diverse areas and scenarios, centrally analyzing data from roadside units and moving vehicles to assess transport resources, traffic conditions and congestion causes. This system achieved an average 2.8216% frequency in node identification and process optimization. Tu et al. in [28] developed a traffic congestion prediction approach called SG-CNN that predicts congestion by grouping road segments. The authors aimed to improve prediction accuracy and stability by leveraging the relationships between grouped segments. The model demonstrated superior performance, particularly during peak traffic periods, and showed robustness in handling dynamic traffic data. Table 1 presents the summary of reviewed articles, including references, methods, objectives and limitations.

Table 1 Summary of existing approaches

The table summarizes the existing approaches to TD recognition based on weather conditions, light conditions and various classifications. It shows various limitations, such as difficulty identifying TD in complex traffic, complexity in implementing and managing mobile agents, and introduced latency.

Problem statement

The analyzed methods aim to classify TD based on image attributes, offering a novel approach to traffic recognition. However, both homogeneous and heterogeneous TD, accident rates and weather conditions are not accurately recognized. Additionally, there is no recommendation system in place. Consequently, the existing approaches [12–28] exhibit low accuracy in TD recognition, which serves as the motivation for this research work. In this research, a novel approach is presented that addresses these limitations. The proposed EGJ method introduces a robust classification system for both homogeneous and heterogeneous TD, as well as accurate accident and weather condition detection. Moreover, it incorporates a recommendation system that estimates various parameters, including TD levels (heavy, medium and light), speed limits, weather conditions, urban or rural areas, light conditions and road surface details. By integrating these advancements, the proposed approach aims to significantly improve TD recognition and provide a more comprehensive and precise evaluation of traffic conditions for enhanced traffic management and safety.

3 Proposed Enhanced Hybrid Golden Jackal Method

The EGJ fusion based RS has been introduced to achieve optimal TC management and accurate classification of road conditions. Figure 1 shows the architecture of the proposed technique.

Fig. 1
figure 1

Proposed EGJ architecture

From Fig. 1, the process unfolds in multiple phases. In the initial phase, images sourced from a dataset of road vehicle images are subjected to preprocessing through the EGF method. These images then undergo classification using the ECNN approach. The second phase involves the preprocessing of textual data collected from datasets. This preprocessing involves crisp data conversion, data splitting and normalization methodologies. The classification task is accomplished utilizing the ADGAN approach. In the final phase, an EGJ fusion technique is employed to merge the outputs obtained from the training processes of ECNN and ADGAN. This fusion methodology effectively combines the insights gathered from both image and text data classifications. Ultimately, the results of this comprehensive process are harnessed and integrated into the recommendation system.

3.1 Classification of image data

The effectiveness of neural networks depends significantly on the quality of input data. Preprocessing plays a crucial role in improving data quality and, consequently, the overall performance of classification models.

3.1.1 Preprocessing with enhanced geodesic filtering

Traffic surveillance cameras capture images in various environmental conditions, such as different lighting, weather and traffic situations. Preprocessing aims to enhance the images by reducing noise, improving contrast and extracting essential features for better analysis [32]. Enhanced Geodesic Filtering (EGF) is a mathematical morphological operation used for image processing. Image denoising and contrast enhancement are the key components of EGF to improve image quality.

Noise in images can significantly degrade the performance of classification algorithms by introducing irrelevant variations. EGF computes geodesic distances between pixels, considering both spatial proximity and intensity differences. This differentiation helps in distinguishing between noise and actual image features. Also, by applying smoothing operations based on geodesic distances, EGF selectively reduces noise while preserving important structural details. This adaptive approach ensures that noise is effectively minimized without blurring significant features. EGF maintains sharp intensity transitions by focusing on preserving gradients during filtering, ensuring that edges remain well-defined and prominent. Moreover, by utilizing geodesic information, EGF applies more smoothing in homogeneous regions while preserving sharp transitions at edges. Finally, EGF takes into account the local context of each pixel, ensuring that significant features, like textures and patterns, are retained. It also dynamically adjusts the filtering scale based on local image characteristics.

The core concept behind GF is the geodesic distance between two points in an image. The geodesic distance between two pixels is the length of the shortest path between them, considering the image intensity values as a cost function. Equation (1) expresses the geodesic distance \(t\) between pixels \(a\) and \(b\) mathematically.

$$t(a,b)=\text{min}[G(p)]$$
(1)

where, \(G(p)\) represents the cost function of the path \(p\) which is usually defined based on the intensity difference between pixels.

The geodesic filtering process is summarized in the following steps:

  • Select an initial marker set \((M)\) containing pixels with known desired values.

  • Calculate the geodesic distance of each pixel to the nearest marker in \((M)\).

  • Replace the intensity value of each pixel with the minimum intensity value along its geodesic distance to its nearest marker.

This results in higher classification accuracy, as the cleaner, more detailed images provided by EGF directly translate to better performance in road condition classification.
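The steps above amount to a marker-based shortest-path computation over the pixel grid. The following is a minimal illustrative sketch, not the authors' implementation, in which the path cost of Eq. (1) is assumed to be one spatial step plus a weighted intensity difference, and Dijkstra's algorithm propagates distances from the marker set \(M\):

```python
import heapq
import numpy as np

def geodesic_distance(img, markers, alpha=1.0):
    """Dijkstra-based geodesic distance from every pixel to the nearest
    marker. Path cost = spatial step + alpha * intensity difference,
    a hypothetical instantiation of the cost function G(p) in Eq. (1)."""
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for (r, c) in markers:          # markers M: pixels with known values
        dist[r, c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:          # stale heap entry
            continue
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + 1.0 + alpha * abs(float(img[nr, nc]) - float(img[r, c]))
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist
```

Because an intensity edge inflates the path cost, pixels separated by a sharp transition end up geodesically far apart even when spatially adjacent, which is what lets the filter smooth within regions while preserving edges.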

3.1.2 Image classification with enhanced consolidated convolutional neural network

ECNN [29] is applied to classify preprocessed traffic images into heterogeneous and homogeneous traffic flows, specifically distinguishing between light, medium and heavy traffic conditions. Heterogeneous traffic flow includes diverse vehicle types and speeds, while homogeneous traffic flow consists of vehicles moving at similar speeds and densities. It is a hybrid approach that integrates preprocessing layers, feature selection layers and CNN layers. This architecture enables the ECNN to make informed decisions based on the enhanced and relevant features obtained through preprocessing. To train the ECNN model, the preprocessed traffic images, along with their corresponding traffic flow labels, are fed into the network. The model learns to extract relevant patterns and features during the training process.

The ECNN efficiently extracts hierarchical features from raw images by employing convolutional layers. These layers are designed to detect low-level features such as edges and textures and progressively learn higher-level attributes, thereby improving the representation and comprehension of image content. ECNN integrates powerful preprocessing techniques, such as denoising, normalization and contrast enhancement, to enhance the quality of input data before feeding it to the neural network. This ensures the network focuses on essential features during training, leading to improved performance. ECNN employs a consolidated architecture that combines preprocessing layers, feature selection layers and standard CNN layers. The consolidation enables better interaction between the preprocessing and CNN components, optimizing the flow of information throughout the network.

A typical ECNN contains several convolutional, pooling and fully connected layers. The convolutional layers employ learnable filters to extract hierarchical attributes from the input images. In a convolutional layer, the output attribute maps \((F\_out)\) are obtained by convolving the input attribute maps \((F\_in)\) with learnable filters \(L\). Equation (2) shows the formula for the convolutional layer.

$$F\_out=Con(F\_in,L)+e$$
(2)

where, \(Con(F\_in,L)\) denotes the convolution of the input features with the filters and \(e\) denotes the bias term. The pooling layer decreases the spatial dimensions of the feature maps while retaining the most important information. Non-linear activation functions like the Rectified Linear Unit \((RLU)\) are applied after the convolutional and pooling layers to introduce non-linearity into the model. Equation (3) denotes the mathematical formula of RLU.

$$RLU(s)=\text{max}(0,s)$$
(3)

where, \(s\) denotes the input feature value and \(\text{max}(0,s)\) denotes the element-wise maximum of zero and the input. The fully connected layer links every neuron in the current layer to all the neurons in the previous layer. The output of this layer is generated through matrix multiplication using weight matrices, combined with the inclusion of bias terms.
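The layer operations of Eqs. (2) and (3), together with max pooling, can be illustrated with a minimal single-channel NumPy sketch (a toy example, not the full ECNN architecture):

```python
import numpy as np

def conv2d(F_in, L, e=0.0):
    """Valid 2-D convolution of one feature map with one filter, Eq. (2):
    F_out = Con(F_in, L) + e, with bias term e."""
    kh, kw = L.shape
    h, w = F_in.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(F_in[i:i + kh, j:j + kw] * L) + e
    return out

def relu(s):
    """Rectified Linear Unit, Eq. (3): max(0, s) element-wise."""
    return np.maximum(0, s)

def max_pool(F, k=2):
    """k x k max pooling: keeps the strongest response in each block."""
    h, w = F.shape
    return F[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))
```

Stacking `conv2d` → `relu` → `max_pool` repeatedly, then flattening into a fully connected layer, mirrors the hierarchical feature extraction described above.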

3.2 Classification of text data

In this phase, text data is collected from road safety datasets [31]. The text data includes information about various aspects of vehicles and accident details. Data preprocessing and classification are described in this section.

3.2.1 Data collection and preprocessing

The information is preprocessed through crisp data conversion, splitting and normalization. These steps ensure the text data is reliable, optimized and appropriate for classification. Crisp data conversion, data splitting and data normalization are the three preprocessing steps that take place.

  • Crisp data conversion

Crisp data conversion is a process of converting raw or continuous data into discrete categories or classes. It is commonly used when dealing with continuous numerical data that needs to be categorized into specific groups or intervals. Crisp data conversion is particularly useful for simplifying complex data and facilitating data analysis and modeling. Figure 2 shows the steps involved in crisp data conversion.

Fig. 2
figure 2

Crisp data conversion steps

Defining Categories: The first step is to define the categories or intervals into which the data will be converted. These categories are often determined based on the domain knowledge or specific requirements of the analysis.

Data Binning: The data is then divided into the defined categories or intervals, a process known as binning. This involves assigning each data point to the appropriate category based on its value. Common methods include equal-width binning (bins span equal ranges) and equal-frequency binning (each bin contains an equal number of data points).

Labeling Categories: Once the data is binned, each category is assigned a label or a representative value. This label is used to represent all data points within that category during subsequent analysis.

Crisp data conversion is especially useful when dealing with datasets that have a large number of continuous variables or when working with algorithms that require categorical inputs.
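The three steps above (defining categories, binning, labeling) can be sketched as follows; the speed values, category edges and labels are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical continuous attribute: vehicle speeds in km/h
speeds = np.array([12, 35, 58, 80, 44, 97])

# Step 1 - define categories (edges chosen from assumed domain knowledge)
labels = ["slow", "moderate", "fast"]

# Step 2 - bin each value: index 0 for <30, 1 for 30-59, 2 for >=60
idx = np.digitize(speeds, [30, 60])

# Step 3 - label each bin to obtain crisp categories
crisp = [labels[i] for i in idx]
```

Equal-frequency binning could be obtained the same way by choosing the edges from the data's quantiles instead of fixed values.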

  • Data splitting

In ML and data analysis, a pivotal stage involves evaluating a proposed approach's performance on data it hasn't encountered before. This process involves partitioning the dataset into distinct subsets: training, validation and testing. The training data is employed to teach the model, the validation data aids in refining hyperparameters and assessing model performance during training, while the test data gauges the ultimate performance of the model on new and unseen data.
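A shuffled split into the three subsets can be sketched as below; the 70/15/15 ratios are an assumption for illustration, as the text does not specify the proportions used:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                       # hypothetical dataset size
idx = rng.permutation(n)      # shuffle indices so the split is random

# Assumed 70% train / 15% validation / 15% test partition
train_idx, val_idx, test_idx = np.split(idx, [int(0.7 * n), int(0.85 * n)])
```

Shuffling before splitting prevents any ordering in the collected records (e.g. by road segment or date) from leaking into a single subset.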

  • Data normalization

Data normalization is also known as feature scaling. It is a preprocessing technique utilized to scale the values of different attributes in the dataset to a similar range. Normalization ensures that all attributes contribute equally to the model during training, preventing features with larger values from dominating the learning process. Common techniques for data normalization include Min_Max scaling and Z_Score standardization. The Min–Max scaling method scales the data to a specific range, typically between 0 and 1. Equation (4) shows the Min–Max scaling function.

$$N\_val=(y-\text{min}\_val)/(\text{max}\_val-\text{min}\_val)$$
(4)

where, \(N\_val\) denotes the normalized value; \(y\) denotes the original value; \(\text{max}\_val\) represents the maximum value and \(\text{min}\_val\) represents the minimum value. The process of Z-score standardization rescales the data to achieve a mean of 0 and a standard deviation of 1. Data normalization is essential, especially when features in the dataset have different scales or units.
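Equation (4) and Z-score standardization can be sketched directly in NumPy:

```python
import numpy as np

def min_max(y):
    """Min-Max scaling to [0, 1], Eq. (4):
    N_val = (y - min_val) / (max_val - min_val)."""
    return (y - y.min()) / (y.max() - y.min())

def z_score(y):
    """Z-score standardization: zero mean, unit standard deviation."""
    return (y - y.mean()) / y.std()
```

Min–Max scaling preserves the shape of the original distribution within a fixed range, whereas Z-score is less sensitive to a single extreme value shifting the whole scale.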

These preprocessing techniques play a vital role in ensuring data quality and the successful application of ML algorithms to various datasets for accurate classification.

3.2.2 Text Classification with Adaptive Drop block enhanced Generative Adversarial Networks

The preprocessed text data is then classified using ADGAN [30], a powerful model for text classification tasks. It leverages adaptive drop block regularization to enhance model generalization and efficiency. ADGAN categorizes the text data into different classes, such as weather conditions, vehicle speed, highway conditions, rural/urban classifications and light conditions. This classification process provides meaningful insights into various factors related to traffic and road conditions.

The construction of the ADGAN is shown in Fig. 3. The initial input for the text classification task consists of multiple features, and there is a considerable amount of overlap among them. Developing a robust classifier \(C\) becomes challenging because it struggles to accurately classify real data when faced with high levels of redundancy. To address this, the number of input features is reduced to three key components using Principal Component Analysis (PCA). PCA condenses the feature information to a more manageable scale, which is a crucial step. This not only decreases the computational complexity significantly but also aids in training a resilient classifier \(C\).
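The reduction to three principal components can be sketched with a standard SVD-based projection (a generic PCA illustration, not the authors' implementation):

```python
import numpy as np

def pca_reduce(X, k=3):
    """Project an (n_samples, n_features) matrix onto its top-k
    principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # scores on the k components
```

The projected columns are ordered by explained variance, so the first three components retain as much of the redundant, overlapping feature information as any three linear combinations can.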

Fig. 3
figure 3

ADGAN architecture

From Fig. 3 it is evident that the input for the generator \(C(x,d)\) encompasses both a noise factor denoted as \(x\) and class identifiers represented by \(d\). In contrast, the discriminator \(E\) is fed with text segments \({Z}_{real}\) accompanied by their corresponding class labels \(d\), in addition to synthetic segments \({Z}_{fake}\) created by \(C\) using noise \(x\) and class label \(d\). It is important to highlight that in the ADGAN setup, the discriminator \(E\) generates a single output that signifies either a fake label or a specific class \(d\). As a result, the training procedure involves directing the generator \(C\) to generate text segments that correspond to the targeted class label. This is achieved by training the discriminator \(E\) to maximize the log-likelihood given in Eq. (5).

$${A}_{E}=H[\text{log}Q(D=d|{Z}_{real})]+H[\text{log}Q(D=fake|{Z}_{fake})]$$
(5)

where, \(H\) denotes the expectation over the data, \(Q\) represents the probability computed from the PCA components, \(D\) represents the data label and \({A}_{E}\) denotes the loss function of the discriminator. Equation (6) is the formula to maximize the log-likelihood (probability) of the trained generator.

$${A}_{C}=H[\text{log}Q(D=d|{Z}_{fake})]$$
(6)

where, \({A}_{C}\) represents the objective of the trained generator \(C\). The primary term of the loss function \({A}_{E}\) guides the discriminator \(E\) to accurately label genuine text samples, while the secondary term focuses on assigning \(fake\) labels to the text samples generated by the generator. In contrast, the main objective of generator \(C\) is to create text samples that correspond to the intended class. Through adversarial training, generator \(C\) learns to capture the authentic data distribution of the desired text class [33]. The discriminator has two outputs: one distinguishes between real and generated text, while the other classifies the input text based on its class label. The generator aims to produce text samples that belong to a specific class at training time. Consequently, the generator's parameters are fine-tuned to enhance the alignment of two factors. The first factor involves the logarithmic probability of producing text that the discriminator recognizes as authentic. The second factor involves the logarithmic probability of generating text that the discriminator categorizes as a member of a specific class. Nonetheless, a difficulty emerges, especially concerning minority classes. When a generated text from a minority class is presented to the discriminator, it often receives a fake classification due to the scarcity of minority class samples in the training dataset. In its pursuit of minimizing loss, the discriminator often categorizes text from the minority class as fake. This results in a contradiction between the two components of the generator's objective, making it challenging to optimize both simultaneously. This conflict leads to a decline in the quality of generated text, significantly hindering the effectiveness of GAN based approaches for text classification. Before providing drivers with recommended routes, the system predicts the traffic conditions for each detected road segment.
Each road segment includes attributes like Day of the week (D), Weather (W), Rush hour (R), Temperature (Temp) and Traffic Condition (T).
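Treating the bracketed terms of Eqs. (5) and (6) as averages of log-probabilities, the two objectives can be sketched as minimization losses. This is a simplified numerical illustration, not the training code; the `q_*` arguments are hypothetical probability arrays produced by the discriminator:

```python
import numpy as np

def discriminator_loss(q_real_class, q_fake_label):
    """Eq. (5) as a loss to minimize: negative mean log-probability of
    the correct class on real text plus the 'fake' label on generated text."""
    return -(np.log(q_real_class).mean() + np.log(q_fake_label).mean())

def generator_loss(q_fake_as_class):
    """Eq. (6) as a loss to minimize: the generator wants its samples
    assigned the target class with high probability."""
    return -np.log(q_fake_as_class).mean()
```

A confident, correct discriminator drives `discriminator_loss` toward zero, while the generator lowers its own loss only by producing samples the discriminator assigns to the intended class, which is exactly the tension described above for minority classes.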

In the ADGAN framework for text classification, the discriminator \(E\) possesses a single output that indicates either a specific class label \(d\) or a fabricated label. The training process of discriminator \(E\) centers on associating authentic text samples with their corresponding class labels. Simultaneously, \(E\) strives to associate the text samples generated by the generator \(C\) with the fabricated label. In contrast, the primary focus of the generator's training is to avoid generating text with the fabricated label and instead produce samples that align with the intended class. This strategy aims to create a certain equilibrium within the training dataset. The noise \(x\) is transformed to match the dimensions of the real input data, now comprising the necessary features. Subsequently, discriminator \(E\) receives both artificially generated text samples and real text samples as input. The output of \(E\) provides the probability that the input sample belongs to a class or is a fabricated sample. Over numerous iterations, both \(C\) and \(E\) converge to refined outcomes. Specifically, \(C\) becomes proficient at generating synthetic text that aligns with the distribution of real text, making it difficult for \(E\) to distinguish between them. To maintain a balanced relationship between the generator and discriminator during GAN training, it is crucial to avoid an imbalance that leads to issues like overfitting or mode collapse. In sparse GAN training, this balance becomes even more critical since the densities of generators and discriminators can differ significantly, exacerbating potential issues.
To address this, the concept of the Balance Ratio (BR) is introduced, a metric for assessing the balance between sparse generators and discriminators. During each training iteration, random noise \(x\) is drawn from a multivariate normal distribution and real data \(H\) are sampled from the training set. Denote the discriminator after a gradient descent update as \(E({\theta }_{E})\), and the generator before and after training as \({C}_{pre}({\theta }_{C})\) and \({C}_{post}({\theta }_{C}{\prime})\), respectively. The balance ratio is then defined by Eq. (7).

$$BR=\frac{E({C}_{post}({\theta }_{C}{\prime}))-E({C}_{pre}({\theta }_{C}))}{E({\theta }_{E})-E({C}_{pre}({\theta }_{C}))}$$
(7)

When the \(BR\) is low (< 30%), it indicates that the generator is too weak to deceive the discriminator, as the generated data are still recognized as fake. Conversely, a high \(BR\) (> 80%) suggests that the discriminator is too weak and fails to provide useful feedback to the generator. Table 2 presents the list of attributes used for determining the road condition along with their data ranges.
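Once the three discriminator scores are averaged to scalars, Eq. (7) and the two thresholds above reduce to a one-line computation; the score values in the usage below are hypothetical:

```python
def balance_ratio(e_c_pre, e_c_post, e_real):
    """Balance Ratio, Eq. (7): how far the generator update moved the
    discriminator's score on generated data toward its score on real data.
    e_c_pre  - E(C_pre(theta_C)):  score on data from the pre-update generator
    e_c_post - E(C_post(theta_C')): score on data from the post-update generator
    e_real   - E(theta_E):          score on real data"""
    return (e_c_post - e_c_pre) / (e_real - e_c_pre)
```

For example, `balance_ratio(0.2, 0.25, 0.9)` is about 0.07, i.e. below the 30% threshold (generator too weak), while `balance_ratio(0.2, 0.8, 0.9)` is about 0.86, above the 80% threshold (discriminator too weak).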

Table 2 Road condition features and their data range for road segments

The traffic condition is classified using datasets for each observed road segment. This classification includes attributes like \(D\), \(R\), \(W\), \(Temp\) and \(T\). This approach provides traffic condition predictions based on observations collected from 18 road segments. The classification of traffic conditions is divided into three categories that describe the road situation: 1 (Light traffic), 2 (Medium traffic) and 3 (Heavy traffic) [34]. The probability of each class of traffic condition is computed by Eq. (8).

$$TC_{(T|D,R,W,Temp)}=\frac{TC_{(T,D,R,W,Temp)}}{TC_{(D,R,W,Temp)}}$$
(8)

where \(TC\) signifies the traffic condition classification. The classification result is determined by selecting the category with the highest probability at a given time \(d\). This is calculated using Eq. (9), which identifies the traffic condition class \(i\) with the highest probability based on the specified features.

$$Traffic\;condition=\text{arg}\underset{T}{\text{max}}\;TC_{d}({T}_{i}\mid {D}_{d}=\alpha ,Temp_{d}=g)$$
(9)

where \(\alpha\) signifies the day with respect to the specified time \(d\) and \(g\) signifies the temperature with respect to the specified time \(d\).
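Equations (8) and (9) amount to picking the traffic class with the highest conditional frequency among observations that match the given features. A minimal sketch on hypothetical records (the observation tuples below are invented for illustration):

```python
from collections import Counter

# Hypothetical observations: (day D, rush hour R, weather W, temp band, traffic T)
obs = [
    ("Mon", 1, "rain",  "mild", "heavy"),
    ("Mon", 1, "rain",  "mild", "heavy"),
    ("Mon", 1, "rain",  "mild", "medium"),
    ("Sun", 0, "clear", "hot",  "light"),
]

def predict(day, rush, weather, temp):
    """Eq. (8): conditional frequency of each class T given the features;
    Eq. (9): return the arg-max class, or None if no observation matches."""
    joint = Counter(t for d, r, w, tmp, t in obs
                    if (d, r, w, tmp) == (day, rush, weather, temp))
    total = sum(joint.values())   # count with matching (D, R, W, Temp)
    if total == 0:
        return None
    return max(joint, key=lambda t: joint[t] / total)
```

For instance, `predict("Mon", 1, "rain", "mild")` returns `"heavy"`, since two of the three matching observations are heavy traffic.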

This dynamic interaction between the two networks contributes to improved performance in text classification. After that, the outputs of ECNN and ADGAN are fused for the recommendation process. In this research, the static attribute refers to road infrastructure, which includes information about road segments such as junction locations and road connectivity, as well as the length and width of the roads. These attributes are manually collected and measured using Google Maps for each road segment.

3.3 Data fusion with enhanced hybrid golden jackal approach

In this phase, the outputs obtained from the ECNN (image classification) and ADGAN (text classification) and the additional historical information are combined using an EGJ fusion method [31]. It is a powerful technique that integrates the predictions from both image and text classifiers. By fusing the outputs, the system achieves a more comprehensive and accurate analysis of traffic conditions. The fusion process enhances the decision making process by considering multiple sources of information.

The golden jackal algorithm belongs to the category of swarm intelligence algorithms, drawing inspiration from the cooperative hunting practices observed in the natural habitat of golden jackals. These jackals frequently collaborate in hunting, with participation from both males and females. The hunting behavior of golden jackals can be divided into three primary stages:

  • Searching and Approaching Prey: This phase involves the jackals searching for prey and moving closer to it.

  • Enclosing and Provoking Prey: Once the prey is located, the jackals encircle it and provoke it until it ceases to move.

  • Precise Attack: After the prey is immobilized, the jackals launch a coordinated attack to capture it.

During the initialization stage of the algorithm, a set of matrices representing the positions of prey is generated with a random distribution. This process is carried out using Eq. (10).

$$\begin{bmatrix}Z_{1,1}&\dots &Z_{1,i}&\dots &Z_{1,m}\\Z_{2,1}&\dots &Z_{2,i}&\dots &Z_{2,m}\\\vdots & &\vdots & &\vdots \\Z_{M-1,1}&\dots &Z_{M-1,i}&\dots &Z_{M-1,m}\\Z_{M,1}&\dots &Z_{M,i}&\dots &Z_{M,m}\end{bmatrix}$$
(10)

where \(Z\) represents a golden jackal, \(M\) represents the size of the prey population and \(m\) signifies the number of dimensions. Equations (11) and (12) give the mathematical depiction of the hunting behavior of the golden jackal when \((|B|>1)\):

$${Z}_{1}(s)={Z}_{N}(s)-B\cdot |{Z}_{N}(s)-gl\cdot prey(s)|$$
(11)
$${Z}_{2}(s)={Z}_{GN}(s)-B\cdot |{Z}_{GN}(s)-gl\cdot prey(s)|$$
(12)

where \(s\) represents the present iteration, \({Z}_{N}(s)\) corresponds to the male golden jackal position, \({Z}_{GN}(s)\) denotes the female golden jackal position and \(prey(s)\) signifies the position vector of the prey. \({Z}_{1}(s)\) and \({Z}_{2}(s)\) indicate the revised positions of the male and female golden jackals, respectively. The parameter \(B\) stands for the energy involved in prey evasion and is computed according to Eqs. (13) and (14).

$$B={B}_{1}\cdot {B}_{0}$$
(13)
$${B}_{1}={d}_{1}\cdot (1-(s/S))$$
(14)

where \({B}_{0}\) is a randomly generated value within the range [-1, 1], representing the initial energy of the prey, \(S\) signifies the maximum number of iterations, \({d}_{1}\) is a constant set to 1.5 and \({B}_{1}\) stands for the diminishing energy of the prey. In Eqs. (11) and (12), \({Z}_{N}(s)-gl\cdot prey(s)\) denotes the distance between the golden jackal and the prey.
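The evading-energy computation of Eqs. (13) and (14) can be sketched as follows; the function and variable names are illustrative, not taken from the paper's implementation.

```python
import random

def evading_energy(s, S, d1=1.5):
    """Prey evading energy B = B1 * B0 (Eqs. 13-14).

    B0 is drawn uniformly from [-1, 1] (initial prey energy) and
    B1 = d1 * (1 - s / S) decays linearly from d1 to 0 over the iterations.
    """
    B0 = random.uniform(-1.0, 1.0)   # initial energy of the prey
    B1 = d1 * (1.0 - s / S)          # diminishing energy of the prey
    return B1 * B0

# |B| > 1 selects the searching update of Eqs. (11)-(12);
# |B| <= 1 selects the encircling/attacking update of Eqs. (19)-(20).
```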

Here, \(gl\) signifies a vector of random numbers calculated using the Levy flight function, scaled by the constant 0.05 as shown in Eq. (15).

$$gl=0.05\cdot LF(x)$$
(15)

Equation (16) defines \(LF(x)\):

$$LF(x)=0.01\times (\mu \times \sigma )/(|{\nu }^{(1/\beta )}|)$$
(16)

Equation (17) defines \(\sigma\), the scale parameter of the Levy flight step:

$$\sigma ={\{\frac{\Gamma (1+\beta )\times \text{sin}(\pi \beta /2)}{\Gamma (\frac{1+\beta }{2})\times \beta \times ({2}^{\beta -1})}\}}^{1/\beta }$$
(17)

where \(\mu\) and \(\nu\) represent random values within the range (0, 1) and \(\beta\) is a constant set to 1.5. \(\Gamma\) denotes the gamma function, which interpolates factorial values for non-integer arguments and extends the factorial to real and complex numbers. Equation (18) gives the updated position of the prey \(Z(s+1)\).

$$Z(s+1)=\frac{{Z}_{1}(s)+{Z}_{2}(s)}{2}$$
(18)

The updated prey position is determined by the interactions with the female and male golden jackals. When the golden jackals engage with the prey, its evading energy is reduced. Equations (19) and (20) give the mathematical representation of the golden jackals encircling and consuming the prey \(\left(\left|B\right|\le 1\right)\):

$${Z}_{1}(s)={Z}_{N}(s)-B\cdot |gl\cdot {Z}_{N}(s)-prey(s)|$$
(19)
$${Z}_{2}(s)={Z}_{GN}(s)-B\cdot |gl\cdot {Z}_{GN}(s)-gl\cdot prey(s)|$$
(20)
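One full position update of the golden jackal search, Eqs. (11)-(12) and (15)-(20), can be sketched as below. This is a minimal illustration under the stated constants (\(\beta = 1.5\), Levy scale 0.01, \(gl\) scale 0.05); function names are assumptions, not the paper's code.

```python
import math
import numpy as np

def levy_flight(dim, beta=1.5):
    """Levy flight step LF(x) of Eqs. (16)-(17)."""
    sigma = ((math.gamma(1 + beta) * math.sin(math.pi * beta / 2))
             / (math.gamma((1 + beta) / 2) * beta * 2 ** (beta - 1))) ** (1 / beta)
    mu = np.random.uniform(0, 1, dim)
    nu = np.random.uniform(0, 1, dim)
    return 0.01 * (mu * sigma) / np.abs(nu) ** (1 / beta)

def update_position(Z_N, Z_GN, prey, B):
    """One golden-jackal update given male/female positions and prey."""
    gl = 0.05 * levy_flight(len(prey))            # Eq. (15)
    if abs(B) > 1:                                # searching: Eqs. (11)-(12)
        Z1 = Z_N - B * np.abs(Z_N - gl * prey)
        Z2 = Z_GN - B * np.abs(Z_GN - gl * prey)
    else:                                         # encircling/attacking: Eqs. (19)-(20)
        Z1 = Z_N - B * np.abs(gl * Z_N - prey)
        Z2 = Z_GN - B * np.abs(gl * Z_GN - gl * prey)
    return (Z1 + Z2) / 2                          # averaged position, Eq. (18)
```

In a full run, \(B\) is recomputed each iteration from Eqs. (13)-(14), so the search naturally shifts from exploration to the encircling phase as the energy decays.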

Table 3 presents the pseudocode for the EGJ approach.

Table 3 EGJ fusion algorithm

In Table 3, fitness function\((p)\) represents the fitness value of a given solution \(p\), \(i\) represents the iteration and \({w}_{i}\) represents the weight parameter. The outputs from image, text and historical information are combined, incorporating the hunting-inspired behavior of the golden jackal algorithm. These combined insights create a comprehensive understanding of the situation at hand, leading to more refined and accurate recommendations. In the proposed fusion model, alongside forecasting traffic conditions, the system gathers the historical data patterns necessary for calculating the road condition value. Some attributes, such as weather information and traffic heterogeneity, change dynamically over time; therefore, the system continuously updates these values using several matching sources, including OpenWeather, CCTV footage and TomTom digital maps. Meanwhile, static attributes, such as road infrastructure, remain constant and are pre-determined. These combined attributes are used to define the road condition for each road segment accurately, which also helps to improve prediction accuracy.
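As an illustrative sketch of the fusion step, the three classifier outputs can be combined with the weights \({w}_{i}\) as below. The exact fitness formula of Table 3 is not reproduced here, so the weighted-average form and all names are assumptions; in the proposed system the weights would be the parameters tuned by the EGJ search.

```python
import numpy as np

def fuse_outputs(p_image, p_text, p_hist, weights):
    """Weighted fusion of ECNN (image), ADGAN (text) and historical-pattern
    class probabilities; weights w_i are normalised before combining."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalise weights
    stacked = np.stack([np.asarray(p_image, float),
                        np.asarray(p_text, float),
                        np.asarray(p_hist, float)])  # shape (3, n_classes)
    return w @ stacked                               # fused class probabilities

# Illustrative probabilities over three traffic-condition classes
fused = fuse_outputs([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1],
                     weights=[0.4, 0.3, 0.3])
print(fused.argmax())  # index of the recommended traffic-condition class
```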

For instance, considering both image- and text-derived insights, the system provides recommendations such as adjusting vehicle speed based on observed traffic patterns, suggesting appropriate routes given the current weather conditions, or advising suitable actions for specific highway or urban settings. By integrating the dynamic strategies of the golden jackal algorithm into the RS, the overall accuracy and relevance of the suggestions are significantly improved, resulting in a more effective and reliable system.

4 Results and discussions

This section describes the classification of both text and images using the ECNN and ADGAN techniques. Subsequently, an enhanced fusion approach is employed to elevate the performance of the RS. Lastly, a comparative experiment is conducted to highlight the benefits of the proposed classification method. Implementation is conducted on a personal computer featuring a 3.40 GHz Intel Core i7-6700 CPU and 16 GB RAM, using Anaconda and Python 3.10.9.

4.1 Dataset description

The input images are sourced from a comprehensive road vehicle image dataset (https://www.kaggle.com/datasets/ashfakyeafi/road-vehicle-images-dataset), which is structured into distinct folders: one dedicated to training images and the other to validation images. This dataset holds immense value for the advancement of autonomous vehicles and initiatives related to traffic management. In addition, the input text data originates from a road safety dataset (https://www.kaggle.com/datasets/tsiaras/uk-road-safety-accidents-and-vehicles?select=Accident_Information.csv) encompassing a diverse range of information and data points. These datasets are designed to provide a comprehensive collection of relevant details related to vehicles and accidents, contributing to a more complete understanding of these domains.

4.2 Experimental outcome of classified image

From the input data, the EGF method classifies the images into homogeneous and heterogeneous traffic.

Figures 4(a-l) illustrate the effectiveness of the EGF method across various texture categories. For homogeneous textures, the method significantly smooths and unifies heavy, light and medium textures. Similarly, for heterogeneous textures, it reduces variability and enhances uniformity in heavy, light and medium textures. Overall, EGF effectively reduces noise and variability, producing smoother and more consistent images across different texture types and intensity levels.

Fig.4
figure 4figure 4

Preprocessing results of proposed approach (a) Original homogeneous heavy (b) Preprocessed homogeneous heavy (c) Original homogeneous light (d) Preprocessed homogeneous light (e) Original homogeneous medium (f) Preprocessed homogeneous medium (g) Original heterogeneous heavy (h) Preprocessed heterogeneous heavy (i) Original heterogeneous light (j) Preprocessed heterogeneous light (k) Original heterogeneous medium (l) Preprocessed heterogeneous medium
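The smoothing behaviour of the preprocessing step can be sketched as follows. EGF's internals are not detailed in this section, so a plain Gaussian smoothing pass stands in for it here purely to illustrate the noise-reduction effect on a traffic frame; the function name and parameters are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess_frame(frame, sigma=1.5):
    """Smooth a grayscale traffic frame to suppress texture noise
    (stand-in for the EGF preprocessing step)."""
    return gaussian_filter(frame.astype(float), sigma=sigma)

# Smoothing a noisy frame reduces pixel-to-pixel variability
noisy = np.random.rand(64, 64)
smoothed = preprocess_frame(noisy)
```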

The results presented in Fig. 5 underscore the model's proficiency in accurately categorizing TD levels.

Fig. 5
figure 5

Homogeneous traffic (a) Medium (b) Light (c) Heavy

Figure 5a depicts homogeneous medium traffic, Fig. 5b depicts homogeneous light traffic and Fig. 5c depicts homogeneous heavy traffic. Homogeneous traffic refers to a situation where vehicles on the road share similar characteristics, such as speed and size. A homogeneous light traffic scenario reflects a situation where traffic flow is consistent and vehicles are sparsely distributed. Medium traffic maintains a moderate and balanced flow. Heavy TD reaches a high level, resulting in congestion and slower vehicle movement.

Figure 6 underscores the model's precision by effectively categorizing TD levels within the intricate framework of heterogeneous traffic conditions. Figure 6a depicts heterogeneous medium traffic, Fig. 6b depicts heterogeneous light traffic and Fig. 6c depicts heterogeneous heavy traffic. Heterogeneous traffic refers to a situation in which the flow of vehicles on the road is characterized by variations in speed, density and types of vehicles. This discriminating accuracy enhances decision making for traffic management, route optimization and real-time urban mobility updates, ultimately contributing to smoother and more organized vehicular movement.

Fig. 6
figure 6

Heterogeneous traffic (a) Light (b) Medium (c) Heavy

The implementation of EGF and ECNN based preprocessing significantly enhances the outcomes of image classification tasks, particularly in the context of traffic images. The reduction of noise, improvement in feature extraction and resultant accurate classifications contribute to more informed decision making and improved real-world applications.

4.3 Experimental outcome of classified data

The results of text data classification sourced from the vehicleinfo.csv and accident.csv datasets underscore the model's proficiency in extracting meaningful insights from unstructured textual data. These insights empower decision makers to implement effective strategies, enhance road safety and streamline urban mobility management for the overall benefit of commuters and transportation systems.

Figure 7 illustrates the rate of serious accidents spanning the years 2006 to 2016. The dataset, accident.csv, encompasses a total of 286,339 instances of serious accidents. Additionally, the graph provides insights into the monthly range of total fatalities, along with a 10-month moving average for a more comprehensive perspective.

Fig. 7
figure 7

Serious accident rate

Figure 8 illustrates the rate of slight accidents spanning the years 2006 to 2016. The dataset, accident.csv, encompasses a total of 1,734,548 instances of slight accidents. Additionally, the graph provides insights into the monthly range of total fatalities, along with a 10-month moving average for a more comprehensive perspective.

Fig. 8
figure 8

Slight accident rate

Figure 9 illustrates the rate of fatal accidents spanning the years 2006 to 2016. The dataset, accident.csv, encompasses a total of 26,369 instances of fatal accidents. Additionally, the graph provides insights into the monthly range of total fatalities, along with a 10-month moving average for a more comprehensive perspective.

Fig. 9
figure 9

Fatal accident rate

The implementation of ADGAN preprocessing significantly enhances the outcomes of data classification tasks, particularly in the context of vehicle data. ADGAN improves the overall model generalization by mitigating overfitting concerns on the training data and elevating performance on previously unseen data. This is achieved through the acquisition of robust feature representations via adversarial training, fostering enhanced differentiation between distinct label categories within the latent space. The dynamic interplay between the two networks notably amplifies the efficacy of text classification, leading to enhanced performance.

4.4 Experimental outcome of recommended system

By harnessing the combined outputs of ECNN and ADGAN, the RS equips traffic management authorities with a holistic understanding of traffic flow, density and conditions. This informed approach enhances the effectiveness of implemented strategies, leading to smoother traffic flow and reduced congestion.

Figure 10 outlines the recommendation data framework. This framework encompasses critical parameters such as speed rate, weather conditions and light conditions. When a user uploads an image while specifying attributes like speed rate, weather conditions, urban or rural setting and light conditions, the system generates recommendations for specific areas relevant to the provided data. The resulting recommended data output is illustrated in Fig. 11.

Fig. 10
figure 10

Recommendation data framework

Fig. 11
figure 11

Outcome of recommended data

The outcomes of the RS, which utilizes the innovative EGJ fusion technique, clearly demonstrate its crucial contribution to the transformation of traffic management strategies. By merging ECNN and ADGAN outputs, the system advances towards a smarter, safer and more efficient urban mobility landscape, benefitting individuals and the community at large.

4.5 Performance evaluation of proposed approach

The analysis incorporates four widely used metrics: overall accuracy, precision, recall and F1-Score. In this investigation, overall accuracy signifies the correctness of classifications across all samples within the test set. Precision evaluates the proportion of samples assigned to a specific image or text category that truly belong to it. Recall measures the proportion of samples from a specific image or text category that are correctly identified. The F1-Score integrates precision and recall by calculating their harmonic mean. As such, the evaluation employs overall accuracy and the F1-Score as the primary performance metrics in this work.
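The four metrics above can be computed from predicted and true label arrays as sketched below; this is a generic macro-averaged implementation for illustration, not the paper's evaluation code.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Overall accuracy and macro-averaged precision, recall and F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = (y_true == y_pred).mean()
    precisions, recalls = [], []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision, recall = np.mean(precisions), np.mean(recalls)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1
```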

Figures 12 and 13 show the performance evaluation of the homogeneous and heterogeneous traffic classification method. The EGJ fusion technique attains 99.5%, 99.6% and 99.8% accuracy for homogeneous heavy, medium and light TD, respectively; for heterogeneous heavy, medium and light traffic it attains 99.7%, 99.6% and 99.8%, respectively.

Fig. 12
figure 12

Performance evaluation of classification approach (Homogeneous traffic)

Fig. 13
figure 13

Performance evaluation of classification approach (Heterogeneous traffic)

Figure 14a-c analyzes the error performance of road segment prediction, focusing on Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). MAE quantifies the average magnitude of errors in predictions, disregarding their direction; the proposed method achieved an MAE of 0.043. RMSE computes the square root of the average squared differences between predicted and actual values, yielding an RMSE of 0.05 for the proposed approach. MAPE represents errors as a percentage relative to actual values, measuring the average absolute percentage difference between predicted and actual values; the proposed approach achieved a MAPE of 0.148, indicating better accuracy, particularly when predicting smaller values. These results demonstrate that EGF, following grouping-based road segmentation, exhibits smaller prediction errors. Conversely, ITMS [22] and SG-CNN [28] show larger prediction errors, underscoring the effective adaptability of the ADGAN under the road segment grouping model to the data.
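The three error metrics can be computed as follows; the values passed in are illustrative placeholders, not results from the paper's experiments.

```python
import numpy as np

def error_metrics(actual, predicted):
    """MAE, RMSE and MAPE for road-segment predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    mae = np.mean(np.abs(err))                # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))         # root mean square error
    mape = np.mean(np.abs(err / actual))      # fraction; multiply by 100 for %
    return mae, rmse, mape
```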

Fig. 14
figure 14

Error performance of road segments prediction (a) MAE (b) RMSE (c) MAPE

Figure 15 evaluates the performance of road segment grouping pre-training across various time periods. The results illustrate that the prediction outcomes from both SG-CNN [28] and ITMS [22] exhibit significant unpredictability. This is particularly evident during early peak periods, where the ADGAN model clearly outperforms the other approaches, attaining MAE values between 0.007 and 0.574. Furthermore, the proposed method demonstrated greater stability across all time epochs, indicating that the approach presented in this research more effectively handles data with substantial fluctuations.

Fig. 15
figure 15

Performance in MAE across varying time intervals for road segment grouping prediction

Figure 16 presents the performance evaluation of the proposed approach. The EGJ technique achieves high accuracy, precision, recall and F1-Score of 98%, 99.1%, 99.5% and 98.2%, respectively.

Fig. 16
figure 16

Performance evaluation of proposed method

Figure 17 displays the visualization of training and testing accuracy for the proposed fusion method. The results reveal that the EGJ fusion approach achieves high accuracy levels across various epochs: Fig. 17a shows the accuracy for 200 epochs, Fig. 17b for 300 epochs and Fig. 17c for 400 epochs. Upon analysis, it is evident that the introduced approach attains remarkable accuracy while maintaining a minimal loss rate. Figure 18 depicts the visualization of training and testing loss for the EGJ fusion method. The outcomes indicate that the introduced approach achieves a notably low loss rate across various epochs: Fig. 18a shows the loss rate for 200 epochs, Fig. 18b for 300 epochs and Fig. 18c for 400 epochs.

Fig. 17
figure 17

Accuracy of proposed method with various epoch (a) 200 epochs (b) 300 epochs (c) 400 epochs

Fig. 18
figure 18

Loss rate of proposed method with various epoch (a) 200 epochs (b) 300 epochs (c) 400 epochs

Figure 19 illustrates the confusion matrix generated by the EGJ fusion technique. This matrix serves as a critical tool for evaluating the performance of classification models by displaying how accurately predictions align with the actual ground truth across various classes. In the matrix, rows depict the true classes, while columns represent the predicted classes. Table 4 presents the comparison of existing classification methods with the proposed approach. ADGAN attains higher accuracy than CNN, YOLOv4 and PPVD, obtaining 2.9% higher accuracy, 6.6% higher precision, 12.1% higher recall and 10% higher F1-Score than the PPVD approach.

Fig. 19
figure 19

Confusion matrix of proposed approach

Table 4 Comparison of proposed classifier with the existing classification methods

The outcomes from the EGJ fusion method are used as input for the recommendation system, which uses the fused data to make informed decisions and provide recommendations for traffic management, such as optimizing traffic flow, road conditions and safety measures.

Various CNN architectures combined with the proposed approach are tested to find the most effective model, as shown in Table 5. EGJ + ResNet-50 achieves high accuracy (98.4%) and precision (98.7%) with moderate computational cost (7.6 GFLOPs), making it a strong final-model candidate due to its effective feature extraction capabilities. EGJ + VGG-16 offers high precision (90.2%) but is less computationally efficient (19.6 GFLOPs) with moderate accuracy (88.5%), making it less suitable for resource-limited scenarios. EGJ + Inception-v3 shows an excellent balance, with high accuracy (97.8%) and efficiency (5.2 GFLOPs). EGJ + MobileNet-v2, despite lower accuracy (86.7%) and precision (85.9%), has very low computational cost (0.5 GFLOPs), making it ideal for mobile applications. EGJ + DenseNet-121 achieves the highest accuracy (99.9%) and recall (99.5%) with efficient computational usage (6.0 GFLOPs), leveraging feature reuse for strong performance. EGJ + AlexNet, the oldest architecture, performs the worst, with 78.4% accuracy, 82.1% precision and moderate efficiency (1.1 GFLOPs), making it less competitive. Integrating the strengths of ResNet-50, Inception-v3 and DenseNet-121 enhances the proposed model's capability for robust performance, significantly improving the research outcomes.

Table 5 Experimentation with Various CNN Architectures with proposed approach

4.6 Case study

In this research, historical traffic patterns cover trends observed in an urban area, encompassing each day of the year from 2005 to 2010. These historical patterns specifically include both heterogeneous and homogeneous scenarios. A geospatial dashboard integrates data from three main sources: city traffic sensors, traffic model simulations based on sensor data and historical records. The traffic model simulations provide comprehensive traffic insights across the urban and highway areas, aiding in the analysis of critical traffic conditions, congestion hotspots and traffic trends throughout different times and seasons. Figure 20 illustrates the congestion level of vehicles across different time frames.

Fig. 20
figure 20

Congestion level of vehicles (a) For each day of March 2007 (b) For each year between 2005–2010

Case study 1 Urban traffic management in a single year (2007)

In this case study, the historical traffic patterns are analyzed specifically for each day of March 2007. The objective is to understand the daily congestion levels of vehicles throughout the month. This analysis involves collecting and processing data such as traffic volume, speed and congestion catalogues for each day of March 2007, as shown in Fig. 20a. By examining these patterns, traffic management authorities identify peak congestion periods, trends in traffic flow and factors influencing daily traffic patterns. For example, specific days exhibit higher congestion due to events, weather conditions or holidays, while others show consistent traffic patterns. On March 5, 2007, analysis of the historical data reveals significant morning rush-hour congestion due to inclement weather affecting road conditions. Similarly, on March 15, 2007, the historical traffic data indicates heightened congestion during the evening hours, attributed to a major sporting event held nearby.

Case Study 2: Historical traffic patterns for each year between 2005 and 2010

In this case study, historical traffic patterns are examined over a broader timeframe, spanning each year from 2005 to 2010, as shown in Fig. 20b. The aim is to analyze long-term trends in traffic congestion levels, identifying patterns and changes over multiple years. This analysis helps in understanding seasonal variations, annual trends and the impact of infrastructure developments or policy changes on traffic flow and congestion. For example, in 2008, a comprehensive analysis shows a gradual increase in traffic congestion levels throughout the year, correlating with population growth and economic activity. In 2010, the historical data highlights a decrease in congestion during the summer months, attributed to the implementation of new traffic management strategies and road expansions.

Moreover, by leveraging historical patterns, the EGJ system predicts and manages traffic congestion effectively, optimizing traffic flow and reducing traffic jams. Integrating historical data enables accurate classification of road conditions, aiding in proactive maintenance and timely interventions.

5 Conclusion and future work

In this manuscript, the introduced EGJ method has been proficiently applied to recognize TD. The implementation was carried out successfully on the Python platform. To assess the performance of the ECNN technique, a dataset comprising road vehicle images was employed. Additionally, the performance evaluation of the ADGAN method utilized the vehicleinfo.csv and accident.csv datasets. The proposed method demonstrates precise classification of both heterogeneous and homogeneous TD scenarios. The entire process involves collecting, preprocessing and classifying both image and text data to analyze traffic flow and road conditions. By fusing the outcomes from the image and text classifiers, the system gains valuable insights for making informed decisions and providing recommendations in traffic management scenarios. The proposed system demonstrates outstanding performance in TD and road condition classification tasks, achieving 98% accuracy, 99.1% precision and a 98.2% F1-Score. Moreover, error performance metrics, including an MAE of 0.043, an RMSE of 0.05 and a MAPE of 0.148, further underscore the robustness and accuracy of the proposed approach. These results validate its efficacy in handling complex traffic data scenarios with precision and reliability. In future work, the EGJ fusion method can further improve the precision and efficiency of the recommendation system, contributing to better traffic flow optimization and road condition classification. The rise of autonomous vehicles holds promise for revolutionizing transportation: self-driving cars optimize traffic flow and reduce human errors, potentially reducing congestion significantly. The transition to electric vehicles can also have a positive impact on TD congestion, as electric cars produce fewer emissions, contributing to cleaner air and a more sustainable future.