1 Introduction

Deep learning, the basis of modern AI technology, has made significant strides in recent years. Target detection technology provides basic technical support for many tasks, including target tracking, semantic segmentation, and unmanned driving. It also plays an essential part in systems where authorization is required, such as parking management and toll payment processing systems [1]. Automatic driving, a recent high-tech innovation, depends on the ability to detect automobiles. Traffic violations, car accidents, and thefts occur often in urban areas and are captured on video by CCTV systems. To support such systems, the vehicle detector in a traffic surveillance system must be fast, precise, and dependable [2]. Vehicle detectors are often evaluated on their real-time detection capability and on their accuracy in detecting traffic objects in bad weather [3, 23].

As the number of motor vehicles on the road grows, automatic vehicle traffic monitoring is becoming an increasingly important part of traffic control and violation detection. Supervising traffic in large, crowded cities is particularly difficult [4]. Modern surveillance systems typically rely on traffic flow data that account for significant factors such as vehicle speed, size, trajectory, and type. In addition, modern vision-based sensors are used to track and record particular traffic patterns [5].

Deep neural networks (DNNs), a sophisticated subset of machine learning (ML), are particularly effective at addressing problems in complex models that are difficult to explain with conventional statistical techniques [6]. They can recognize a wide range of objects, including automobiles, people, and license plates. Convolutional neural networks (CNNs), in particular, can extract key features automatically after training, without human involvement [7, 24]. However, for devices with constrained storage and processing capability, the computational load of processing images is still too great. Deep learning (DL)-based algorithms are well known as powerful image identification tools, and among the many vehicle identification algorithms, CNN-based techniques have become increasingly popular [8].

YOLO treats vehicle perception as a regression problem and achieves accurate vehicle detection with a CNN. YOLO networks can accelerate detection, instantly recognize motion-blurred cars, and provide each object's position, category, and confidence level [9, 25]. In regression-based YOLO methods, a single neural network (SNN) predicts the bounding boxes and class probabilities. The YOLO model was developed to speed up the process of recognizing an object and locating it in an image. YOLO-based approaches, including YOLOv3, YOLOv4 [10], and YOLOv5, have been shown to retain their superiority in processing time and accuracy [18]. In this paper, a novel Logistic Vehicle speed detection using YOLO (LV-YOLO) method is introduced to detect logistics vehicles and their speed.

The main contributions of our method are as follows:

  • In the image acquisition layer, a CCTV camera first captures the input highway traffic video, and the collected videos are then converted into frames.

  • In the segmentation layer, each video frame is segmented using U-Net to isolate the vehicles.

  • The detection layer performs logistics vehicle detection and speed detection on the segmented frames using LV-YOLO, based on the Boxy Vehicle dataset.

  • Finally, the AI system provides detailed information on truck speeds to the traffic police so that they can take immediate action to control the speed of trucks.

The remainder of this paper is organized as follows. Section 2 reviews the literature on logistics vehicle speed detection. Section 3 presents the proposed LV-YOLO method, and Section 4 reports experiments that examine its viability. Section 5 concludes the paper.

2 Literature survey

Researchers have recently introduced many deep learning and machine learning (ML) methods, especially to increase the precision of vehicle speed detection. This section provides an overview of some new and advanced techniques.

In 2022, Hussain et al. [11] presented a hybrid phantom approach that promotes privacy while reducing energy use. The results show an average consistency value of 4.2, a consistency index of 0.066, an energy usage of 1.211 J, and an average safety ratio of 59.41%.

In 2023, Farid [3] designed a vehicle identification and classification method based on the YOLO-v5 network using freely available datasets. The redesigned YOLO-v5 framework adapts to challenging traffic patterns. However, the method has difficulty with hazy images, in which visibility is very limited.

In 2020, Fachrie [12] introduced a straightforward vehicle classification and counting mechanism to assist humans. The system uses DL techniques to count vehicles instead of tracking the movement of each car, and using YOLOv3 improves system performance and reduces processing time. The recommended model counted the vehicles in video with 97.72% accuracy.

In 2020, Kim [13] presented a real-time vehicle recognition method based on DL algorithms for tunnel images. Noise elimination and brightness smoothing are used to locate vehicles in the tunnel environment. After the training images are created, the vehicle region is learned from ground-truth data. Across various tunnel road settings, the method's detection accuracy is around 94%.

In 2020, Sudha and Priyadarshini [14] proposed combining an updated YOLOv3 DL model with an improved visual background extractor to identify the types and numbers of vehicles in an input video. The trial achieved an average accuracy of 98.6% on multiple high-definition input videos captured with a monocular camera.

In 2023, Chen et al. [15] proposed an enhanced, edge-intelligence-based YOLOv4 vehicle detector that uses ECA and HRNet to improve vehicle detection performance and combines the original backbone network with MobileNetv2 to improve segmentation precision. The results show that the strategy raises vehicle detection accuracy from 82.03% to 86.22% and segmentation quality from 73.32% to 75.63%.

In 2023, Zaman et al. [16] combined different classification networks into an ensemble CNN-based model for improved driver facial expression recognition (DFER). An R-CNN model detects the drivers' faces reliably in both real-time and offline video. The model achieves better accuracy on face detection and DFER datasets.

In 2023, Azhar et al. [17] introduced a DL accident prediction model that incorporates extended features such as weather, geo-coded locations, and time information. This raises accident detection accuracy by 8%, bringing the test accuracy to 94%. However, because data are only provided when an accident happens, the model does not rely on a decreasing map architectural process.

The literature above shows that various DL and ML techniques address vehicle detection and speed detection. However, existing techniques are time-consuming, achieve lower performance, and have higher training loss; their detection models need improvement in terms of mAP; and predicting quantities such as speed and movement remains the most difficult task. The proposed LV-YOLO method is therefore designed to detect logistics vehicles and their speed accurately in a short time.

3 LV-YOLO methodology

This section introduces the Logistic Vehicle speed detection using YOLO (LV-YOLO) method for highway truck detection and truck speed calculation. The proposed framework is divided into three layers: layer 1 is the image acquisition layer, layer 2 is the segmentation layer, and layer 3 is the detection layer. In the image acquisition layer, a CCTV camera first captures the input highway traffic video, and the collected video is converted into frames. In the segmentation layer, each video frame is segmented using U-Net to isolate the vehicles. The detection layer performs logistics vehicle detection and speed detection on the segmented frames using LV-YOLO, based on the Boxy Vehicle dataset. Finally, the system provides detailed information on truck speeds to the traffic police. Fig. 1 depicts the general flow of the proposed LV-YOLO method.
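The three-layer data flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_pipeline`, `segment`, and `detect` are hypothetical names, and the stand-in callables replace the real U-Net and LV-YOLO models.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# (x1, y1, x2, y2) pixel coordinates of one detected truck
Box = Tuple[float, float, float, float]


def run_pipeline(
    frames: Iterable[object],                 # layer 1: frames decoded from CCTV video
    segment: Callable[[object], object],      # layer 2: frame -> segmented frame (e.g. U-Net)
    detect: Callable[[object], List[Box]],    # layer 3: segmented frame -> truck boxes
) -> Iterator[Tuple[int, List[Box], int]]:
    """Yield (frame_index, truck_boxes, truck_count) for each frame, lazily."""
    for index, frame in enumerate(frames):
        segmented = segment(frame)
        boxes = detect(segmented)
        yield index, boxes, len(boxes)


# Hypothetical stand-ins for the real models, for illustration only.
def identity_segment(frame: object) -> object:
    return frame


def one_truck_detector(frame: object) -> List[Box]:
    return [(10.0, 20.0, 110.0, 80.0)]


results = list(run_pipeline(["frame0", "frame1"], identity_segment, one_truck_detector))
```

Keeping the models as injected callables lets each layer be swapped or tested on its own; speed estimation would then consume the box centroids across frames.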

Fig. 1 The overall workflow of the proposed LV-YOLO

3.1 Image acquisition layer

Trucks on the highway are found using the logistics vehicle detection approach shown in Fig. 1. Depending on where the camera is installed, the road area is divided into a remote region and a proximal region.
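One way to use this region split is to pick a separate calibration factor (Section 3.4) per region, based on where an object's centroid lies in the frame. The sketch assumes the camera looks along the road so that image rows above a hypothetical `boundary_row` belong to the remote region; the boundary, the helper names, and both factor values are illustrative and would have to be measured for a real camera.

```python
def region_of(centroid_y: float, boundary_row: float) -> str:
    """Classify a centroid as 'remote' (above the boundary row) or 'proximal'."""
    # Image rows grow downward, so smaller y values are farther from the camera.
    return "remote" if centroid_y < boundary_row else "proximal"


def calibration_for(centroid_y: float, boundary_row: float, factors: dict) -> float:
    """Return the calibration factor (real distance per pixel) for the centroid's region."""
    return factors[region_of(centroid_y, boundary_row)]


# Hypothetical factors in km/pixel: far-away pixels cover more road than near ones.
FACTORS = {"remote": 8e-5, "proximal": 3e-5}
```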

Real-time highway videos from the CCTV cameras are first collected and then converted into frames. A variety of vehicles travel along the highway route, including cars, trucks, buses, motorbikes, and bicycles.
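Frame conversion can be sketched as a decoding loop plus a sampler. This assumes OpenCV (`opencv-python`) for decoding, which the paper does not name; `read_video` and `sample_frames` are hypothetical helpers. The default step of 15 matches the every-15th-frame centroid sampling used later for speed, and `step=1` keeps every frame.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame], step: int = 15) -> Iterator[Tuple[int, Frame]]:
    """Yield (frame_index, frame) for every `step`-th frame, starting at frame 0."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame


def read_video(path: str) -> Iterator[object]:
    """Decode a video file frame by frame (requires the opencv-python package)."""
    import cv2  # imported lazily so that sample_frames works without OpenCV

    capture = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = capture.read()  # ok is False once the stream is exhausted
            if not ok:
                break
            yield frame
    finally:
        capture.release()


# Usage (hypothetical file name):
# for index, frame in sample_frames(read_video("highway.mp4"), step=15): ...
```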

3.2 Segmentation layer

U-Net is used in this layer to segment the frames. Segmentation is used to understand the content of an image at the pixel level. It provides detailed information about the image as well as the vehicles' shapes and boundaries. The output of image segmentation is a mask in which each element denotes the class of the corresponding pixel. This approach can be used in traffic control systems and has shown encouraging results on real imagery. Figure 2 shows the down-sampling and up-sampling paths that make up U-Net.
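Turning the network's per-pixel output into a mask, and the mask into a segmented frame, can be sketched with NumPy. The sketch assumes two score channels ordered as in the U-Net description that follows (channel 0 = foreground/vehicle, channel 1 = background); `scores_to_mask` and `apply_mask` are hypothetical names.

```python
import numpy as np


def scores_to_mask(scores: np.ndarray) -> np.ndarray:
    """Turn (2, H, W) per-pixel scores into a boolean vehicle mask of shape (H, W).

    Assumed channel order: 0 = foreground (vehicle), 1 = background.
    Ties are assigned to the background.
    """
    if scores.ndim != 3 or scores.shape[0] != 2:
        raise ValueError("expected scores of shape (2, H, W)")
    return scores[0] > scores[1]


def apply_mask(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero every background pixel of an (H, W) or (H, W, C) frame."""
    if frame.ndim == 3:
        mask = mask[..., np.newaxis]  # broadcast the mask over colour channels
    return np.where(mask, frame, 0).astype(frame.dtype)
```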

Fig. 2 Architecture of U-Net

The down-sampling path is made up of five convolutional blocks, each containing two convolutional layers; across the blocks, the number of feature maps increases from 1 (the input) to 1024. Except after the last block, downsampling is performed with max pooling, which reduces the feature map size from 240 × 240 to 15 × 15. Each up-sampling block starts with a deconvolution layer; together, these layers increase the feature map size from 15 × 15 back to 240 × 240 while reducing the number of feature maps. Each up-sampling block also combines its deconvolutional feature map with the corresponding feature map from the encoding path. Finally, a 1 × 1 convolutional layer leaves just two feature maps, representing the foreground and background segmentation, respectively.
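The size and channel arithmetic of this paragraph can be checked with a short shape trace. The intermediate channel widths (64 to 1024, doubling per block) follow the original U-Net design and are an assumption here, since the text only gives the end points; size-preserving (padded) convolutions are also assumed, because 240 × 240 reaches 15 × 15 after exactly four 2 × 2 poolings only if the convolutions do not shrink the maps. `unet_shapes` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def unet_shapes(
    input_size: int = 240,
    widths: Sequence[int] = (64, 128, 256, 512, 1024),
    num_outputs: int = 2,
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]], Tuple[int, int]]:
    """Trace (channels, side length) through a padded U-Net.

    Assumes 3x3 convolutions with padding 1 (spatial size preserved inside a block),
    2x2 max pooling between encoder blocks, and 2x2 stride-2 transposed convolutions
    in the decoder.
    """
    encoder, size = [], input_size
    for i, width in enumerate(widths):
        encoder.append((width, size))            # output of the block's two convolutions
        if i < len(widths) - 1:                  # the last block is not pooled
            if size % 2:
                raise ValueError("input size must be divisible by 2 at every pooling step")
            size //= 2                           # 2x2 max pooling halves height and width
    decoder = []
    for width in reversed(widths[:-1]):
        size *= 2                                # transposed convolution doubles height and width
        decoder.append((width, size))            # after merging the skip connection and two convolutions
    return encoder, decoder, (num_outputs, size)  # final 1x1 convolution -> 2 maps
```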

3.3 Detection layer

In recent years, with the development of DL in object detection, many large deep detection models have been proposed. YOLO represents a seminal advancement in object detection within computer vision: it takes a different approach by framing object detection as a regression problem. In the proposed method, LV-YOLO detects logistics vehicles and vehicle speed based on previous data from the Boxy Vehicle dataset.

3.3.1 Boxy vehicle dataset

The Boxy vehicle dataset was used to train LV-YOLO for image-based vehicle detection. Most of the images in the dataset show traffic scenes and vehicles on roadways. Because the input to our system is CCTV footage of traffic and road scenarios, this dataset suits our use case well. The dataset includes 1,990,806 vehicles labeled with 3D-like and 2D bounding boxes in over 200,000 images, covering sunny and rainy conditions during daytime, dawn, and dusk. During the distillation training process, the dataset's 2D ground-truth annotations are used as hard labels.
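Before training a YOLOv5-style detector, each 2D box is usually converted to the YOLO text-label format: one line `class cx cy w h` per object, with centre and size normalized to [0, 1] by the image dimensions. The helper below is a hedged sketch: it assumes boxes arrive as pixel corners `(x1, y1, x2, y2)` and does not reproduce the Boxy annotation file format; `to_yolo_label` and the truck class id 0 are hypothetical.

```python
from typing import Tuple


def to_yolo_label(box: Tuple[float, float, float, float],
                  image_width: float, image_height: float,
                  class_id: int = 0) -> str:
    """Convert a pixel-corner box (x1, y1, x2, y2) into a YOLO label line.

    The line format is "class cx cy w h", with centre and size normalized by the
    image dimensions. class_id 0 standing for "truck" is a hypothetical choice.
    """
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= image_width and 0 <= y1 < y2 <= image_height):
        raise ValueError("box must lie inside the image with x1 < x2 and y1 < y2")
    cx = (x1 + x2) / 2 / image_width
    cy = (y1 + y2) / 2 / image_height
    w = (x2 - x1) / image_width
    h = (y2 - y1) / image_height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```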

3.3.2 LV-YOLO for truck detection

LV-YOLO performs truck detection, speed detection, and truck counting. The proposed method uses the fifth version of YOLO (YOLOv5). The LV-YOLO network is trained using the UFPR-ALPR dataset. LV-YOLO is an advanced object detection network made up of three modules: Backbone, Neck, and Head. As Fig. 3 shows, LV-YOLO uses the CSPDarknet53 architecture with an SPP layer as the backbone, PANet as the neck, and the LV-YOLO detection head. This CNN recognizes objects accurately in real time. The technique divides the entire image into grid cells, evaluates them with a single neural network, and predicts bounding boxes and class probabilities for each cell. The bounding boxes are weighted by the predicted probabilities. Because the neural network needs only one forward pass before providing predictions, the technique "looks once" at the image. The detected objects are output after non-max suppression.
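The final non-max suppression step can be illustrated with a plain-Python greedy implementation. This is a generic, class-agnostic sketch rather than LV-YOLO's own code; the IoU threshold of 0.45 is an assumed value.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes on the same truck (IoU 0.81) collapse to the higher-scoring one, while a box on a separate truck survives.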

Fig. 3 Architecture of LV-YOLO

3.3.2.1 Backbone

As shown in Fig. 3, the input includes three parts: truck detection, speed detection, and truck counting. LV-YOLO uses a CSPDarknet53 backbone network to detect trucks on the highway based on the Boxy vehicle dataset. The backbone processes the input image and extracts hierarchical features from it. It is made up of convolutional layers, pooling layers, and other architectural components designed to capture features at various scales; these layers progressively shrink the spatial dimensions of the input while deepening the feature maps. CSPDarknet53 is created by adding cross-stage partial (CSP) connections to Darknet53 to facilitate information flow. The backbone generates a hierarchy of feature maps, each capturing features at a specific scale, which are passed on to subsequent components for further processing.
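The cross-stage partial idea can be shown with a toy NumPy function: the channels are split, only one part passes through a block, and the two parts are concatenated again. This is a conceptual sketch only; real CSPDarknet53 stages wrap this idea in convolutions and a transition layer, which are omitted, and `csp_stage` is a hypothetical name.

```python
from typing import Callable

import numpy as np


def csp_stage(x: np.ndarray, block: Callable[[np.ndarray], np.ndarray],
              split: float = 0.5) -> np.ndarray:
    """Toy cross-stage partial (CSP) stage on a (C, H, W) feature map.

    One part of the channels bypasses `block`, the other passes through it,
    and the results are concatenated along the channel axis.
    """
    k = int(x.shape[0] * split)
    bypass, processed = x[:k], x[k:]
    out = block(processed)
    if out.shape[1:] != processed.shape[1:]:
        raise ValueError("block must preserve spatial size")
    return np.concatenate([bypass, out], axis=0)
```

Because only part of the channels is processed, the computation in the block shrinks while the bypassed features still reach the next stage.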

3.3.2.2 Neck

The neck is an intermediate component placed between the backbone and the detection head. LV-YOLO employs PANet as its neck structure. By combining truck features at multiple scales from different backbone stages, PANet improves the model's capacity to identify speed. This step, which is further divided into the calibration factor (Section 3.4) and speed calculation (Section 3.5), determines the detected truck's speed.
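The multi-scale fusion performed by a PANet-style neck can be sketched at the level of tensor shapes. In this hedged NumPy toy, nearest-neighbour upsampling and stride-2 subsampling stand in for the real learned layers, and channels are only concatenated (real necks apply convolutions after each fusion); `panet_fuse` and its helpers are hypothetical names.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x: np.ndarray) -> np.ndarray:
    """Stride-2 subsampling, a stand-in for PANet's stride-2 convolution."""
    return x[:, ::2, ::2]


def panet_fuse(p3: np.ndarray, p4: np.ndarray, p5: np.ndarray):
    """Fuse three backbone levels (fine to coarse) in a PANet-like way."""
    # Top-down path (as in FPN): coarse semantics flow to finer levels.
    td4 = np.concatenate([p4, upsample2x(p5)], axis=0)
    td3 = np.concatenate([p3, upsample2x(td4)], axis=0)
    # Bottom-up path added by PANet: fine localization flows back to coarser levels.
    out4 = np.concatenate([td4, downsample2x(td3)], axis=0)
    out5 = np.concatenate([p5, downsample2x(out4)], axis=0)
    return td3, out4, out5
```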

3.4 Calibration factor

Camera calibration is a crucial component of speed calculation. The calibration factor is the ratio of real-world distance to pixel distance. Because a vehicle stays on the road surface (it can neither rise into the air nor go below the ground), the mapping can be treated as a 2D-to-2D conversion between the road plane and the image plane. Dividing the real length of a known object by the length in pixels of the same object in the image yields the calibration factor, as in Eq. 1.

$$ C = \frac{\text{length of the object in the real world (cm)}}{\text{length of the object in the frame (pixels)}} $$
(1)
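Eq. 1 yields C in cm/pixel, while the speed formula in Section 3.5 expects km/pixel, so a unit conversion (1 km = 10^5 cm) is needed. A minimal sketch, with hypothetical function names and a hypothetical 300 cm reference object spanning 60 pixels:

```python
CM_PER_KM = 100_000  # 1 km = 1,000 m = 100,000 cm


def calibration_factor(real_length_cm: float, pixel_length: float) -> float:
    """Eq. (1): real-world length per pixel, in cm/pixel."""
    if pixel_length <= 0:
        raise ValueError("pixel_length must be positive")
    return real_length_cm / pixel_length


def to_km_per_pixel(cm_per_pixel: float) -> float:
    """Convert a calibration factor from cm/pixel to the km/pixel unit of Eq. (3)."""
    return cm_per_pixel / CM_PER_KM


# Hypothetical reference: a 300 cm road marking that spans 60 pixels in the frame.
c_km = to_km_per_pixel(calibration_factor(300.0, 60.0))
```

Here C = 300 / 60 = 5 cm/pixel, or 5 × 10^-5 km/pixel.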

3.5 Speed calculation

The final speed, in km/h, is calculated using two traffic signals. Assuming the video plays at 30 frames per second, the speed is updated every 0.5 s (15 frames at 30 fps), using the object centroids stored in an array for every 15th frame. The distance traveled by each object is then determined using Eq. 2.

$$ D = \sqrt {\left( {a - e} \right)^{2} + \left( {b - f} \right)^{2} } $$
(2)

where (a, b) are the centroid coordinates of an object in frame i, and (e, f) are the centroid coordinates of the same object in frame i - 15. The time taken by these 15 frames follows from the frame rate. The truck's final speed is given by Eq. 3.

$$ V = C \frac{D}{T} $$
(3)

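where T is the time spanned by the 15 frames, in hours; D is the distance moved by the object in pixels, from Eq. 2; and C is the calibration factor for that particular region, in km/pixel. Because Eq. 1 gives C in cm/pixel, it must be divided by \(10^{5}\) (1 km = \(10^{5}\) cm) before it is used in Eq. 3.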
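Eqs. 2 and 3 combine into one small function. This sketch assumes the 30 fps rate and 15-frame gap stated above and a calibration factor already in km/pixel; `speed_kmh` is a hypothetical name.

```python
import math
from typing import Tuple


def speed_kmh(centroid_now: Tuple[float, float], centroid_before: Tuple[float, float],
              calib_km_per_px: float, fps: float = 30.0, frame_gap: int = 15) -> float:
    """Speed from two centroids `frame_gap` frames apart (Eqs. 2 and 3)."""
    if fps <= 0 or frame_gap <= 0:
        raise ValueError("fps and frame_gap must be positive")
    a, b = centroid_now       # centroid in frame i
    e, f = centroid_before    # centroid in frame i - frame_gap
    d_px = math.hypot(a - e, b - f)           # Eq. 2: Euclidean distance in pixels
    t_hours = frame_gap / fps / 3600.0        # 15 frames at 30 fps = 0.5 s
    return calib_km_per_px * d_px / t_hours   # Eq. 3: V = C * D / T, in km/h
```

For example, a centroid that moves from (100, 100) to (130, 140) travels 50 pixels; with a hypothetical C = 5 × 10^-5 km/pixel that is 0.0025 km in 0.5 s, or 18 km/h.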

3.6 Head

The head is the last part of the object detection model; it is responsible for the predictions related to the objects seen in the image, for speed calculation, and for counting the vehicles. The head of LV-YOLO consists of detection heads at multiple scales (e.g., the standard YOLOv5 models, from YOLOv5s to YOLOv5x, predict at three scales). Each detection head predicts the object's bounding box, objectness score, and class probabilities at its scale. The head typically uses anchor boxes, which support the prediction of object size and position, so the model can effectively detect objects of various sizes. LV-YOLO is also well suited to a variety of applications, including real-time object recognition in video streams, robotics, and autonomous cars. Truck detection and speed detection using LV-YOLO are shown in Fig. 4.
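How a YOLOv5-style head turns raw outputs into boxes can be sketched as follows. The paper does not state LV-YOLO's anchors or decoding, so this assumes YOLOv5's default anchors and box parameterization with a 640 × 640 input; `num_predictions` and `decode` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple

# YOLOv5's default anchors (width, height) in pixels, keyed by output stride.
DEFAULT_ANCHORS: Dict[int, List[Tuple[int, int]]] = {
    8: [(10, 13), (16, 30), (33, 23)],
    16: [(30, 61), (62, 45), (59, 119)],
    32: [(116, 90), (156, 198), (373, 326)],
}


def num_predictions(input_size: int = 640,
                    anchors: Dict[int, List[Tuple[int, int]]] = DEFAULT_ANCHORS) -> int:
    """Total raw predictions: one per anchor per grid cell, summed over scales."""
    return sum(len(a) * (input_size // stride) ** 2 for stride, a in anchors.items())


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode(t: Tuple[float, float, float, float], grid_xy: Tuple[int, int],
           stride: int, anchor: Tuple[int, int]) -> Tuple[float, float, float, float]:
    """Map raw outputs (tx, ty, tw, th) of one anchor to a box (cx, cy, w, h) in pixels."""
    tx, ty, tw, th = t
    gx, gy = grid_xy
    cx = (2.0 * sigmoid(tx) - 0.5 + gx) * stride   # centre offset lies in (-0.5, 1.5) cells
    cy = (2.0 * sigmoid(ty) - 0.5 + gy) * stride
    w = (2.0 * sigmoid(tw)) ** 2 * anchor[0]       # size lies in (0, 4) times the anchor
    h = (2.0 * sigmoid(th)) ** 2 * anchor[1]
    return cx, cy, w, h
```

At 640 × 640 the three grids are 80 × 80, 40 × 40, and 20 × 20, giving 3 × (6400 + 1600 + 400) = 25,200 candidate boxes before confidence filtering and non-max suppression.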

Fig. 4 Truck detection and speed detection using LV-YOLO

4 Result and discussion

This section assesses and analyzes the effectiveness of the LV-YOLO approach. The proposed method is implemented in MATLAB R2020b on a Windows 10 PC with an Intel Core i3 CPU clocked at 2.10 GHz and 8 GB of RAM. Figure 5 shows the simulation results of truck detection, speed detection, and truck counting. The logistics vehicle speed is first determined in pixels moved per second and then converted into kilometers per hour, and logistics vehicles are detected using LV-YOLO based on the Boxy Vehicle dataset.

Fig. 5 Simulation results of the proposed LV-YOLO method

4.1 Performance analysis

The effectiveness of the LV-YOLO method was measured using mean average precision (mAP, Eq. 4), frames per second (FPS, Eq. 5), and mean squared error (MSE, Eq. 6), calculated as follows.

$$ mAP = \frac{1}{m}\sum_{k=1}^{m}\int_{0}^{1} p_{k}(r)\,dr $$
(4)
$$ fps = \frac{1}{T} $$
(5)
$$ MSE = \frac{1}{T}\sum_{p=1}^{T}\left(X^{p} - \hat{X}^{p}\right)^{2} $$
(6)

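where m is the number of object classes and \(p_{k}(r)\) is the precision of class k at recall r; T in Eq. 5 is the average time, in seconds, needed to process one frame; and \(X^{p}\) and \(\hat{X}^{p}\) are the ground-truth and predicted vehicle speeds at the \(p^{th}\) future second, with T in Eq. 6 the number of seconds evaluated. On the Boxy Vehicle dataset, LV-YOLO achieves an overall mAP of 99.42%.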
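The three metrics can be computed as sketched below. This is a hedged illustration with hypothetical function names: it assumes all-point interpolation (PASCAL VOC style) for the area under each class's precision-recall curve, which the paper does not specify, and it assumes detections have already been matched to ground truth to obtain the precision/recall points.

```python
from typing import Sequence


def average_precision(recall: Sequence[float], precision: Sequence[float]) -> float:
    """Area under one class's precision-recall curve, all-point interpolation.

    `recall` must be non-decreasing, as it is when detections are sorted by confidence.
    """
    mrec = [0.0, *recall, 1.0]
    mpre = [0.0, *precision, 0.0]
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas; steps where recall does not change contribute zero.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """Eq. (4): mean of the per-class APs over the m classes."""
    return sum(per_class_ap) / len(per_class_ap)


def frames_per_second(seconds_per_frame: float) -> float:
    """Eq. (5): throughput from the average processing time of one frame."""
    return 1.0 / seconds_per_frame


def mean_squared_error(true_speeds: Sequence[float], predicted_speeds: Sequence[float]) -> float:
    """Eq. (6): mean squared difference between ground-truth and predicted speeds."""
    if len(true_speeds) != len(predicted_speeds) or not true_speeds:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((x - x_hat) ** 2 for x, x_hat in zip(true_speeds, predicted_speeds)) / len(true_speeds)
```

COCO-style evaluation interpolates at 101 recall points instead, so its AP values can differ slightly from this all-point version.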

As Figs. 6 and 7 show, LV-YOLO achieves its highest mAP in both training and testing; Fig. 7 also displays the loss. The proposed model reaches an mAP of 99.42%, which shows that the LV-YOLO network improves detection mAP.

Fig. 6 Performance curve of the LV-YOLO method

Fig. 7 Loss curve of the LV-YOLO method

4.2 Comparative analysis

The proposed method is more successful than previously employed strategies. Figure 8 compares the detection results of the proposed LV-YOLO network with those of existing networks, YOLOv3 and YOLOv4, and clearly shows that LV-YOLO outperforms them on the Boxy vehicle dataset.

Fig. 8 Simulation results of truck detection by existing techniques and LV-YOLO

In urban areas, deep-learning-powered truck detection can help with traffic control and city planning. Truck detection using deep learning may be used for security and surveillance in a variety of scenarios, including airports, seaports, and border crossings. Urban planners and policymakers may use truck detection technologies to collect information on the movement of products and materials inside cities. This data may help drive infrastructure development decisions such as road maintenance, the building of new transportation hubs, and the application of truck-specific rules to alleviate congestion and environmental impact [23,24,25].

Figure 9 compares the proposed LV-YOLO network with the YOLOv3 and YOLOv4 networks, both of which achieve lower mAP and FPS. LV-YOLO achieves a high mAP of 99.42%, whereas YOLOv3 and YOLOv4 obtain 96.32% and 97.36%, respectively. The FPS values obtained by YOLOv3, YOLOv4, and LV-YOLO are 20, 65, and 89, respectively.
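Read through Eq. 5, these FPS figures correspond to per-frame processing times, and the mAP differences can be stated in percentage points. The short calculation below uses only the numbers reported in this paragraph; `summarize` is a hypothetical helper.

```python
# mAP (%) and FPS as reported in the comparison above.
REPORTED = {"YOLOv3": (96.32, 20), "YOLOv4": (97.36, 65), "LV-YOLO": (99.42, 89)}


def summarize(results: dict, reference: str = "LV-YOLO") -> dict:
    """Per model: (milliseconds per frame, mAP gap to the reference in percentage points)."""
    ref_map = results[reference][0]
    return {
        name: (round(1000.0 / fps, 1), round(ref_map - map_pct, 2))
        for name, (map_pct, fps) in results.items()
    }


summary = summarize(REPORTED)
```

This gives about 50.0 ms, 15.4 ms, and 11.2 ms per frame for YOLOv3, YOLOv4, and LV-YOLO, and mAP gains of 3.10 and 2.06 percentage points for LV-YOLO over YOLOv3 and YOLOv4.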

Fig. 9 Comparative analysis of existing DL models

Table 1 compares the vehicle detection and speed detection performance of LV-YOLO with that of existing methods. LV-YOLO maintains a high mAP of 99.42%. For vehicle detection, compared with CNN-TCAM, CNN, 1D-CNN, and EMD-Informer, LV-YOLO improves overall mAP by 1.72%, 5.42%, 0.82%, and 0.96%, respectively. According to Table 1, LV-YOLO achieves this mAP with low MSE and an inference time of 1.28 s per signal, measured from the edge vehicle signal to the output of the deep learning algorithm. This inference time is better than that of all the existing techniques, and overall the proposed method yields higher mAP and lower MSE than the existing methods.

Table 1 Performance comparison between proposed and existing models

5 Conclusion

This research presented the LV-YOLO method for logistics vehicle detection and speed detection. The collected highway video is converted into frames, and each frame is segmented using U-Net to isolate the vehicles. The detection layer then performs logistics vehicle detection and speed detection using LV-YOLO based on the Boxy Vehicle dataset. The LV-YOLO model was evaluated in terms of mAP and FPS: it achieves an mAP of 99.42% and 89 FPS, whereas YOLOv3 and YOLOv4 obtain mAPs of 96.32% and 97.36% and 20 and 65 FPS, respectively. LV-YOLO also improves overall mAP by 1.72%, 5.42%, and 0.82% over the simple vehicle counting system, real-time detection, and advanced YOLOv3 model, respectively. The simulation results show that LV-YOLO successfully detects trucks and their speed. LV-YOLO can therefore be an effective method for enhancing both logistics vehicle recognition and speed detection with a similar number of model parameters and similar computational complexity. Currently, the proposed method can only detect vehicles, not classify them. Future work will use an advanced YOLO network to detect vehicle speeds on district roads and to count and classify vehicles into various categories with more accurate results.