1 Introduction

Most computer vision research relies on images or video captured in good weather. Fog, rain, and snow reduce video visibility, which makes all-weather monitoring difficult. In outdoor video surveillance it is therefore important to determine the current weather quickly and to obtain high-quality video images. Conventional weather identification relies on sensors or radar [1], but such equipment is limited in coverage and expensive. Most research on weather identification instead uses cameras to capture footage and computers to recognize the weather. Image-based weather recognition algorithms fall into three main types: those that train classifiers on hand-crafted weather image features, those that use deep learning, and those that use fused features.

The first type trains a weather recognition classifier on spatial information extracted from weather images, which requires hand-crafted, spatially based image features. Dictionary learning and multi-kernel learning have been used to classify weather images. This approach extracts image characteristics from spatial data, such as the sky, shadows, rain streaks, snowflakes, the dark channel, contrast, and saturation. A multi-kernel learning algorithm then learns the optimal weights of these features, which are used to classify four types of weather images: fog, rain, snow, and clear. According to [2], the reported accuracy is 71%. Traditional feature extraction of this kind is spatially oriented: the extracted features capture the surface-level information in the image well, but they do not adequately capture the intricate spatial relationships and underlying semantic meaning embedded within the image. As a result, the recognition performance of the first type of method is usually poor where spatial differences matter.

The second type performs end-to-end recognition based on deep learning. One study created the RFS (Rain Fog Snow) dataset of rain, fog, and snow photographs and analysed it with superpixel masks; although the RFS dataset is private, data enhancement and ten kinds of convolutional neural networks achieved recognition accuracy above 70% [3]. Another work labelled nine categories of weather photographs and, by integrating deep residual and densely connected convolutional networks, reached a weather recognition accuracy of 80% [4]. Fang et al. proposed a weather classification approach based on an improved SqueezeNet that reduces the parameter count to about 1/50 of AlexNet while performing worse than ResNet and VGG16 [5]. Another work proposed a classification approach for outdoor transmission-line weather images (foggy, rainy, snowy, and sunny days) based on image blocking and a voting strategy, fine-tuning ResNet50 to extract weather features; the method identifies weather effectively but lacks an expression of the shallow, low-level information in the weather images [6]. The features of an end-to-end deep neural network express the abstract and intrinsic information of an image [7]. When trained on large amounts of weather image data, the second type of method improves recognition accuracy.

The third type combines the first two by incorporating spatial information: the low-level features extracted by traditional methods are integrated with deep learning features to train the classifier, taking the spatial relationships between features into account. Guo et al. proposed a feature-fusion-based outdoor weather image classification approach that includes sky, contrast, and saturation features; according to [8], combining the dark channel with deep features extracted by the AlexNet network yields a recognition accuracy above 90%. By combining the low-level and deep features of weather images, the third type of method can express image information in greater detail from different spatial perspectives, and compared with the first two types, the added spatial information further improves weather identification accuracy.

To address the limited availability of public datasets for outdoor image weather recognition, and the mismatch between existing datasets and the specific application focus of this article, a new dataset called the Multiclass Weather Image Blocks Dataset (MWIBD) was created. It contains outdoor images of fog, rain, snow, and sunny weather. This study presents an approach to recognising outdoor weather conditions based on image blocking and feature fusion. It improves on existing weather recognition methods and employs a transferred VGG16 network model to extract deep features from weather images. These deep features are combined with shallow features, namely the average gradient, contrast, saturation, and dark channel, to train a SoftMax classifier that can accurately identify fog, rain, snow, and clear weather conditions.

The significance of this work is to improve the usability of weather-degraded images for spatial information systems, which intelligent transportation and video surveillance require. The main goal is to develop a robust weather detection algorithm that can quickly and reliably identify the weather in various areas. Standard image processing retrieves surface characteristics, while a pre-trained VGG16 model extracts deep spatial features. A SoftMax classifier is trained on the fused features of the multicategory weather image block dataset to distinguish clear, rain, snow, and fog. The ultimate aim is that surveillance footage remains usable in bad weather while providing accurate and reliable location data. Experimental validation shows the practical utility of the algorithm: a weather recognition accuracy of 99.26% suggests that it could be effective in adaptive video image clarification systems based on spatial information.

This study offers a new perspective on the impact of inclement weather on geographical data used in intelligent transportation and video surveillance systems. The main contributions are a weather recognition system that takes geographical context into account and a collection of weather images organized by location. By combining deep spatial information with conventional image processing techniques, the system improves its accuracy in distinguishing four weather conditions. The accuracy of the algorithm, determined by testing, is 99.26%. The results show that the algorithm improves video clarity even in challenging conditions, and the approach improves the dependability of geographic data during adverse weather.

The study proposes a complete method to mitigate the influence of unfavourable weather on GIS. Section 2 describes the methods: Sect. 2.1 introduces the multicategory weather image block dataset that supports model training and testing in the absence of suitable public datasets; Sect. 2.2 covers feature extraction for weather images, where deep features are obtained by transfer learning and shallow features by standard techniques; Sect. 2.3 introduces the strategy of merging shallow and deep features, which is critical for the SoftMax classifier used for weather classification; and Sect. 2.4 explains how the weather image recognition model distinguishes clear, snow, rain, and fog. Section 3 presents the experimental results and analysis, where the system achieves 99.26% accuracy in weather identification. Section 4 concludes by highlighting the algorithm's role in adaptive video image clarification for spatial data integrity in bad weather and discussing future research.

2 Methods

2.1 Weather Image Chunking dataset

The currently available public weather image datasets are very limited in spatial coverage. Chu et al. constructed the Image2Weather dataset, which contains 183,798 images in six weather categories: clear, cloudy, snow, rain, fog, and other [9]. The Multiclass Weather Dataset (MWD) uses spatial data and contains 65,000 photos in six categories: sunny, cloudy, rain, snow, fog, and thunder [10]. Although both public datasets contain six weather categories, cloudy conditions do not significantly affect the recognition of targets in video images, and thunder-and-lightning weather, in which targets appear blurred in the absence of cover such as trees or buildings, is relatively rare. Therefore, the weather classified and identified in this article comprises fog, rain, snow, and clear weather occurring in various locations.

This article selected images that meet its needs from the Image2Weather dataset and the MWD, and collected additional public images, to construct a weather image dataset. The sources of the dataset are listed in Table 1.

Table 1 Source distribution of weather image datasets

The dataset includes four categories of images (fog, rain, snow, and clear), with 1,000 images in each category. In line with the intended application of the weather recognition model in this article, the rainy and snowy images in the dataset are ones in which rain or snow is actually falling. Because the dataset is small and different categories of weather images contain many identical targets and features, such as roads, trees, vehicles, and pedestrians, these would seriously interfere with recognizing the weather in outdoor images. Therefore, this article crops and flips the images and filters out those without weather characteristics, thereby reducing interference and effectively improving weather recognition. The dataset is processed in this way to construct the multi-category weather image block dataset MWIBD; the processing pipeline is shown in Fig. 1. The dataset contains 4,000 images, and each image is randomly cropped into 10 patches of size 224 × 224.

Fig. 1
figure 1

Data Set Processing Process

First, each image in the dataset is randomly cropped into 10 patches of 224 × 224 pixels. Some cropped patches may contain no weather features, such as the close-up area of a foggy image in which the fog is very light and resembles a sunny-day image. It is therefore necessary to filter out patches without weather features (fog, raindrops, snowflakes) or with unclear weather features. The remaining patches are then flipped left and right, finally forming the multicategory weather image block dataset MWIBD; a minimal sketch of the cropping-and-flipping step is given below, and Fig. 2 shows sample images from the MWIBD.
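The cropping-and-flipping step can be sketched as follows. This is a minimal illustration rather than the exact preprocessing code used for MWIBD; the function names and the fixed random seed are assumptions, and the manual filtering of patches without weather features is treated as a separate step.

```python
import numpy as np
from PIL import Image

PATCH, N_CROPS = 224, 10  # patch size and number of random crops per source image

def random_crops(img_path, rng=np.random.default_rng(0)):
    """Randomly crop one source image (assumed larger than 224 x 224) into N_CROPS patches."""
    img = np.asarray(Image.open(img_path).convert("RGB"))
    h, w = img.shape[:2]
    patches = []
    for _ in range(N_CROPS):
        top = int(rng.integers(0, h - PATCH + 1))
        left = int(rng.integers(0, w - PATCH + 1))
        patches.append(img[top:top + PATCH, left:left + PATCH])
    return patches

def flip_augment(kept_patches):
    """Left-right flip of the patches kept after filtering, doubling the number of blocks."""
    return kept_patches + [np.fliplr(p) for p in kept_patches]
```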

Fig. 2
figure 2

Example of MWIBD Dataset

The MWIBD weather image block dataset includes four categories of weather images: fog, rain, snow, and clear. Each category has 10,000 images, and each image is 224 × 224 pixels. 80% of the images in the MWIBD dataset are used for training and 20% for testing.

2.2 Weather Image feature extraction

This paper extracts shallow features from weather images to express the spatial information of the image, and extracts deep features to express the abstract and intrinsic spatial information of weather images.

The weather recognition model is composed of three modules: decision recognition, feature fusion, and feature extraction. The feature extraction module includes both shallow and deep feature extraction. The shallow features are mainly the average gradient, contrast, saturation, and dark channel derived from the spatial information of the image. The deep features are mainly extracted from the fully connected layers FC1 and FC2 of VGG16-TL, located within the model's architecture.

2.2.1 Shallow-feature extraction

2.2.1.1 Average gradient

According to [11], the average gradient of an image can indicate its clarity: images with higher average gradients contain more edge information and are sharper. The average gradient of a grayscale image is given by Eq. (1):

$$G=\frac{\sum _{i=1}^{w} \sum _{j=1}^{h} \sqrt{\frac{{\left({A}_{ij}-{A}_{(i+1)j}\right)}^{2}+{\left({A}_{ij}-{A}_{i(j+1)}\right)}^{2}}{2}}}{w*h}$$
(1)

where \({A}_{ij}\) denotes the gray value of the pixel at \((i,j)\), and \(w\) and \(h\) are the width and height of the image.
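A minimal NumPy sketch of Eq. (1) is given below; the boundary differences that would index outside the image are dropped, which is one common reading of the summation limits.

```python
import numpy as np

def average_gradient(gray):
    """Average gradient of a grayscale image A (Eq. 1); larger values indicate sharper edges."""
    A = gray.astype(np.float64)
    dx = A[:-1, :-1] - A[1:, :-1]   # A_ij - A_(i+1)j
    dy = A[:-1, :-1] - A[:-1, 1:]   # A_ij - A_i(j+1)
    return float(np.sqrt((dx ** 2 + dy ** 2) / 2.0).mean())
```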

2.2.1.2 Contrast

According to [11], contrast describes the variation of pixel values across the image space; degraded images have less contrast than clean ones. The contrast differs considerably under different weather conditions, so it can be used as a feature to distinguish weather images. It is computed as shown in Eq. (2):

$$C=\sqrt{\frac{V}{\sqrt{\left[\sum _{k=1}^{255} (k-G{)}^{4}*\frac{{N}_{k}}{w*h}\right]/{V}^{2}}}}$$
(2)

where \(G\) is the average gradient of the image, \(V\) is the standard deviation of the image, \({N}_{k}\) is the number of pixels with gray value \(k\), and \(k\) represents a gray value of the input image, \(k\in [0,255]\). The standard deviation \(V\) of the image is given by Eq. (3):

$$V=\sqrt{\left[\sum _{i=1}^{w} \sum _{j=1}^{h} {\left({A}_{ij}-{\stackrel{-}{A}}_{ij}\right)}^{2}\right]/\left(w*h\right)}$$
(3)

where \({\stackrel{-}{A}}_{ij}\) represents the mean value of \({A}_{ij}\) over the image.
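The sketch below transcribes Eqs. (2) and (3) as printed, with \(G\) taken to be the average gradient of Eq. (1) as stated in the text; it illustrates the computation rather than serving as a verified reference implementation.

```python
import numpy as np

def contrast(gray):
    """Contrast feature following Eqs. (2)-(3) as printed."""
    A = gray.astype(np.float64)
    V = np.sqrt(((A - A.mean()) ** 2).mean())                       # Eq. (3): standard deviation
    dx = A[:-1, :-1] - A[1:, :-1]
    dy = A[:-1, :-1] - A[:-1, 1:]
    G = np.sqrt((dx ** 2 + dy ** 2) / 2.0).mean()                   # Eq. (1): average gradient
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))          # N_k: pixel count per gray level
    fourth = np.sum((np.arange(256) - G) ** 4 * hist) / gray.size   # fourth-moment term of Eq. (2)
    return float(np.sqrt(V / np.sqrt(fourth / V ** 2)))
```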

2.2.1.3 Saturation

According to [2], saturation is not affected by lighting, so it can describe images captured under different lighting conditions. The normalized saturation of each pixel of image \(I\) is given by Eq. (4):

$${S}_{i,j}=\frac{{S}_{i,j}-min\left({S}_{I}\right)}{max\left({S}_{I}\right)-min\left({S}_{I}\right)}$$
(4)

Among them, \({S}_{i,j}\) represents the saturation of pixel point \((i,j)\), \(max\left({S}_{I}\right)\) is the maximum saturation value of image \(I\), and \(min\left({S}_{I}\right)\) is the minimum saturation value of image \(I\).
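A minimal sketch of the min-max normalization of Eq. (4) is shown below. The paper does not specify how the per-pixel saturation is obtained; converting to HSV with OpenCV is an assumption of this sketch.

```python
import cv2
import numpy as np

def normalized_saturation(bgr):
    """Min-max normalized saturation of every pixel of an image (Eq. 4)."""
    s = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[:, :, 1].astype(np.float64)
    s_min, s_max = s.min(), s.max()
    return (s - s_min) / (s_max - s_min + 1e-12)   # epsilon guards against a constant-saturation image
```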

2.2.1.4 Dark channel

The work in [12] proposed image dehazing based on dark channel prior theory. According to the dark channel prior, in most local patches of outdoor haze-free images there are pixels whose value in at least one color channel is very low, approaching 0. Therefore, the dark channel characterizes hazy weather. The dark channel of image \(I\) is given by Eq. (5):

$${I}_{d}\left(x\right)=\underset{c\in \left\{R,G,B\right\}}{min} \left({I}_{c}\left(x\right)\right)$$
(5)

where \({I}_{c}\) represents one of the R, G, and B color channels of the image.
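Equation (5) amounts to a per-pixel minimum over the three color channels. A short sketch is given below, together with the 256-bin histogram used later as a shallow feature; the histogram normalization is an assumption of this sketch.

```python
import numpy as np

def dark_channel(rgb):
    """Dark channel of Eq. (5): per-pixel minimum over the R, G, B channels."""
    return rgb.astype(np.float64).min(axis=2)

def dark_channel_histogram(rgb):
    """256-bin histogram of the dark channel, usable as a 256-dimensional shallow feature."""
    hist, _ = np.histogram(dark_channel(rgb), bins=256, range=(0, 256))
    return hist / hist.sum()
```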

2.2.2 Deep feature extraction

2.2.2.1 Transfer learning

Traditional machine learning and data mining work best when the training and test sets lie in the same feature space and follow the same distribution. Transfer learning is gaining popularity because collecting a new dataset for every task is costly [13]. Transfer learning speeds up model training by transferring the parameters of a trained model to a new model, and because most data and tasks are correlated, it improves training results [14]. This article fine-tunes the VGG16 network model using transfer learning.

2.2.2.2 VGG16 network model

VGG16 was built to address the 1000-category image classification and localization tasks of ImageNet [15]. Compared with AlexNet [16], VGG16 adopts a network topology with smaller 3 × 3 and 1 × 1 convolution kernels and 2 × 2 pooling kernels, which increases network depth while reducing parameters. Figure 3 shows the structure of the VGG16 network model. VGG16 has 13 convolutional layers, 5 pooling layers, and 3 fully connected layers.

Fig. 3
figure 3

VGG16 Structure Diagram

2.2.2.3 Deep feature extraction based on transfer learning

The VGG16 network model has very good generalization capabilities when migrating the network to new tasks. The model migration based on VGG16 is shown in Fig. 4.

Fig. 4
figure 4

Migrate VGG16 Model

The migration idea based on the VGG16 model is as follows: freeze the 13 convolutional layers, remove all fully connected layers of the pre-trained model, and add 2 newly designed fully connected layers, named \({FC}_{1}\) and \({FC}_{2}\) respectively, with 2048 and 1024 output neurons.
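A minimal Keras sketch of this migration is given below. The frozen VGG16 base, the flattening step, and the sizes of FC1 and FC2 follow the description above; the ReLU activations and the four-class softmax head are assumptions made for illustration.

```python
import tensorflow as tf

def build_vgg16_tl(num_classes=4):
    """VGG16-TL sketch: frozen convolutional base plus two new fully connected layers FC1 and FC2."""
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    base.trainable = False                       # freeze the 13 convolutional layers
    x = tf.keras.layers.Flatten()(base.output)   # 7 x 7 x 512 feature map flattened to 25,088 values
    fc1 = tf.keras.layers.Dense(2048, activation="relu", name="FC1")(x)
    fc2 = tf.keras.layers.Dense(1024, activation="relu", name="FC2")(fc1)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(fc2)
    return tf.keras.Model(base.input, out)
```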

The training parameters of the three fully connected layers FC-4096, FC-4096, and FC-1000 in the original VGG16 network model are 7 × 7 × 512 × 4096 = 102,760,448, 4096 × 4096 = 16,777,216, and 4096 × 1000 = 4,096,000. The training parameters of the two fully connected layers FC1 and FC2 of the VGG16 migration model, VGG16-TL, are 7 × 7 × 512 × 2048 = 51,380,224 and 2048 × 1024 = 2,097,152. After fine-tuning VGG16 in this way, the number of training parameters is greatly reduced, making training more efficient.

The optimizer used in this article is Adam [17], which combines the advantages of the \(AdaGrad\) and \(RMSProp\) optimization algorithms. The update step size in Adam is not determined directly by the current gradient; instead, it is adaptively adjusted using two components: a bias-corrected estimate of the first-order moment of the gradient and a bias-corrected estimate of the second-order moment. Parameter updates with Adam are not affected by rescaling of the gradient, and the learning rate is adjusted automatically. The relevant formulas for Adam are as follows.

The objective function \(J\left(\theta \right)\) is differentiated with respect to θ to obtain the gradient \({g}_{t}\):

$${g}_{t}={\nabla }_{\theta }J\left({\theta }_{t-1}\right)$$
(6)

where \(\theta\) denotes the parameters of the network (weights and biases), and \(J\left(\theta \right)\) is the objective function to be optimized with respect to \(\theta\).

The first-order moment estimates \({m}_{t}\) of the gradient \({g}_{t}\) is shown in Eq. (7):

$${m}_{t}={\beta }_{1}{m}_{t-1}+\left(1-{\beta }_{1}\right){g}_{t}$$
(7)

Among them, \({\beta }_{1}\) is the first-order moment attenuation coefficient.

The second-order moment estimate \({v}_{t}\) of gradient \({g}_{t}\) is shown in Eq. (8):

$${v}_{t}={\beta }_{2}{v}_{t-1}+\left(1-{\beta }_{2}\right){g}_{t}^{2}$$
(8)

Among them, \({\beta }_{2}\) is the second-order moment attenuation coefficient.

Since the initial value of \({m}_{t}\) is 0, it is biased towards 0 in the early stage of training, so \({m}_{t}\) needs a bias correction, as shown in Eq. (9):

$${\stackrel{\wedge }{m}}_{t}=\frac{{m}_{t}}{\left(1-{\beta }_{1}^{t}\right)}$$
(9)

Among them, \({\beta }_{1}^{t}\) is the tth power of \({\beta }_{1}\).

Since the initial value of \({v}_{t}\) is 0, it is biased towards 0 in the early stage of training, so \({v}_{t}\) needs a bias correction, as shown in Eq. (10):

$${\widehat{v}}_{t}=\frac{{v}_{t}}{\left(1-{\beta }_{2}^{t}\right)}$$
(10)

Among them, \({\beta }_{2}^{t}\) is the tth power of \({\beta }_{2}\).

The parameter update for \({\theta }_{t+1}\) is shown in Eq. (11):

$${\theta }_{t+1}={\theta }_{t}-\alpha \frac{1}{\sqrt{{\widehat{v}}_{t}}+\epsilon }{\widehat{m}}_{t}$$
(11)

The original work suggested setting \(\epsilon ={10}^{-8}\) based on experiments, while the learning rate \(\alpha\) can be adjusted to the specific situation [17]; this article sets it to 0.0001 on the basis of experiments.
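With these settings, the optimizer configuration reduces to a single call; the sketch below assumes the Keras defaults \(\beta_1 = 0.9\) and \(\beta_2 = 0.999\), which the paper does not state explicitly.

```python
import tensorflow as tf

# Adam with the hyperparameters used in this article: learning rate 1e-4, epsilon 1e-8.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4,
                                     beta_1=0.9, beta_2=0.999, epsilon=1e-8)

# Hypothetical usage with the VGG16-TL sketch above:
# model = build_vgg16_tl()
# model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
```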

The features of the fully connected layers \({\text{F}\text{C}}_{1}\) and \({\text{F}\text{C}}_{2}\) in the VGG16-based transfer learning model express the deep information of weather images well, and because fully connected layer features are one-dimensional they are easy to fuse with traditional features. Therefore, this article extracts the features of the \({\text{F}\text{C}}_{1}\) and \({\text{F}\text{C}}_{2}\) layers of VGG16-TL as the deep features of weather images.

2.3 Weather Image Feature Fusion

2.3.1 Feature Fusion Method

This article analyses the fusion methods for different types of features, drawing on deep feature fusion methods such as additive fusion, maximum fusion, and cascade (concatenation) fusion [18].

This paper adopts the idea of cascade fusion to fuse the shallow features and deep features of weather images. The feature fusion method is shown in Eq. (12):

$$F=\left[{F}_{G},{F}_{C},{F}_{S},{F}_{I},{F}_{F{C}_{1}},{F}_{F{C}_{2}}\right]$$
(12)

Among them, \({F}_{G}\) represents the average gradient feature, \({F}_{C}\) represents the contrast feature, \({F}_{S}\) represents the saturation feature, \({F}_{I}\) represents the dark channel feature, \({F}_{F{C}_{1}}\) and \({F}_{F{C}_{2}}\) represent the deep features extracted from the fully connected layers \({\text{F}\text{C}}_{1},{\text{F}\text{C}}_{2}\) of VGG16-TL respectively.

2.3.2 Weather Image Feature Fusion

The shallow features are the histograms of the average gradient, contrast, saturation, and dark channel. Each has 256 dimensions, so the shallow features total 1024 dimensions. The deep features are extracted from the fully connected layers \({\text{F}\text{C}}_{1}\) and \({\text{F}\text{C}}_{2}\) of VGG16-TL and total 3072 dimensions. All features are then cascaded and fused to form 4096-dimensional weather image features.

In the feature fusion module, this article uses TensorFlow: the \(tf.concat\left(\right)\) function implements the cascade fusion of the average gradient, contrast, saturation, dark channel, \({\text{F}\text{C}}_{1}\), and \({\text{F}\text{C}}_{2}\) features.
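A minimal sketch of this cascade fusion is given below; the argument names correspond to the six feature vectors of Eq. (12) and are assumed to be rank-2 tensors of shape (batch, feature_dim).

```python
import tensorflow as tf

def fuse_features(f_g, f_c, f_s, f_i, f_fc1, f_fc2):
    """Cascade fusion of Eq. (12): 4 x 256 shallow histograms plus FC1 (2048) and FC2 (1024)
    are concatenated into one 4096-dimensional feature per image."""
    return tf.concat([f_g, f_c, f_s, f_i, f_fc1, f_fc2], axis=-1)
```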

2.4 Weather image recognition model

The outdoor image weather recognition model designed in this article based on image segmentation and feature fusion is shown in Fig. 5.

Fig. 5
figure 5

Weather Recognition Model

The weather recognition model consists of three modules: feature extraction, feature fusion, and decision recognition. The feature extraction module comprises both deep and shallow feature extraction. The shallow features are the average gradient, contrast, saturation, and dark channel extracted from the spatial information of the image; the deep features are extracted from the fully connected layers FC1 and FC2 of VGG16-TL, located within the architecture of the model. Sect. 2.2 provides the specific details of the feature extraction module.

The VGG16-TL transfer learning model, which is based on VGG16, has fully connected layers FC1 and FC2 whose features describe the deep information of weather photos well. Furthermore, the features of a fully connected layer are one-dimensional and simple to combine with traditional features. Accordingly, the features of the FC1 and FC2 layers of VGG16-TL are used as the deep features of the weather image.

The feature fusion module cascades and fuses the extracted deep features with the combined shallow features; see Sect. 2.3 for the specific content of the feature fusion module. The decision recognition module trains a SoftMax classifier on the fused features, which realizes the recognition of four weather conditions: fog, rain, snow, and clear weather.
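A minimal sketch of the decision recognition module is given below: a single SoftMax layer trained on the 4096-dimensional fused features. The optimizer, loss, and the variable names in the commented usage are assumptions made for illustration.

```python
import tensorflow as tf

def build_decision_module(feature_dim=4096, num_classes=4):
    """Decision recognition sketch: a SoftMax classifier over the fused weather features."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(num_classes, activation="softmax",
                              input_shape=(feature_dim,)),
    ])

# Hypothetical usage, assuming fused_train and labels_train have been prepared:
# clf = build_decision_module()
# clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# clf.fit(fused_train, labels_train, batch_size=32, epochs=100)
```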

3 Results and discussion

3.1 Experimental environment

The experiments in this article were all conducted on a 64-bit Windows 10 system with an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz, 8 GB RAM, and an NVIDIA GeForce GTX 1660 Ti GPU, using Python 3.6 and TensorFlow 2.2.0 in a GPU environment.

3.2 Evaluation index

The confusion matrix is an indicator of model performance and can be used to evaluate the weather recognition model designed in this article [8]. The confusion matrix is listed in Table 2. In Table 2, \({T}_{P}\) indicates that the image is labeled as a positive sample and the classification result is also positive; \({F}_{P}\) means that the image is labeled as a negative sample but classified as positive; \({F}_{N}\) means that the image is labeled as a positive sample but classified as negative; \({T}_{N}\) means that the image is labeled as a negative sample and the classification result is also negative.

Table 2 Confusion Matrix


Precision(P): Represents the proportion of correctly classified samples among the samples identified as positive categories:

$$\text{P}\text{r}\text{e}\text{c}\text{i}\text{s}\text{i}\text{o}\text{n}=\frac{{T}_{P}}{{T}_{P}+{F}_{P}}$$
(13)

Recall (R): Represents the proportion of correct predictions among all positive category samples:

$$Recall=\frac{{T}_{P}}{{T}_{P}+{F}_{N}}$$
(14)

Accuracy: Indicates the proportion of correctly predicted samples among the total number of samples:

$$Accuracy=\frac{{T}_{P}+{T}_{N}}{{T}_{P}+{F}_{P}+{T}_{N}+{F}_{N}}$$
(15)
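Equations (13)-(15) follow directly from the confusion-matrix counts; a short worked sketch with made-up counts is shown below.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, and accuracy of Eqs. (13)-(15) from one class's confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Illustrative counts only: 95 TP, 5 FP, 3 FN, 97 TN
print(binary_metrics(95, 5, 3, 97))   # -> (0.95, 0.9693..., 0.96)
```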

3.3 Experiment 1: shallow feature experiment

Experiment 1 is designed to compare the weather recognition effect of the shallow features Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), brightness, average gradient, contrast, saturation, and dark channel.

The local binary pattern (LBP) technique uses the gray value of the center pixel of a local image region as a threshold and compares the neighbouring pixels to this threshold. If a neighbourhood pixel value is greater than the central pixel value, its position is recorded as 1; if it is smaller, the position is recorded as 0. The resulting binary number, read as a decimal, is the LBP value of the center pixel, and the histogram of all local LBP values forms the image feature; a minimal LBP sketch is given after this paragraph. HOG builds a feature by computing gradient orientation histograms over local regions of the image. The experimental results of the shallow feature comparison are listed in Table 3.
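A minimal sketch of the LBP histogram feature is given below; the neighbourhood parameters (8 neighbours, radius 1) and the use of scikit-image are assumptions, since Experiment 1 does not state its exact settings.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray):
    """256-bin histogram of classic 8-neighbour LBP codes, used as a baseline shallow feature."""
    codes = local_binary_pattern(gray, P=8, R=1, method="default")  # LBP codes in [0, 255]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()
```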

Table 3 Shallow Feature Comparison

As shown in Table 3, the recognition effect of a single shallow feature is poor, since a feature may be better at recognizing one kind of image than another. For example, the dark channel can effectively identify foggy images, but its classification effect on rainy and snowy days is poor; contrast can effectively distinguish foggy, sunny, and rainy images, but has difficulty distinguishing rainy from snowy images.

Although the classification effect of a single shallow feature is not good, fusing the four shallow features of average gradient, contrast, saturation, and dark channel achieves a good classification effect, with a recognition accuracy of 74.66%.

Comparing the 7 shallow features, the accuracy of HOG, LBP, and brightness is relatively low. This article therefore uses four shallow features: average gradient, contrast, saturation, and dark channel.

3.4 Experiment 2: deep-feature experiment

Experiment 2 compares the recognition effects of 4 network models, \(AlexNet\) [16], \(GoogLeNet\) [19], ResNet50 [20], and VGG16 [21], and the migration models of these 4 networks: AlexNet-TL [22], \(GoogLeNet-TL\) [23], ResNet50-TL [24], and VGG16-TL [25].

The training parameter settings of all models are consistent: the batch size \(batch\_size\) is set to 32, the optimizer is Adam, the learning rate is set to 0.0001, and the number of training epochs is set to 100. The loss and recognition accuracy curves of the 4 network models \(AlexNet\), \(GoogLeNet\), \(ResNet50\), and \(VGG16\) over different training epochs are shown in Fig. 6(a) and 6(b).

Fig. 6
figure 6

Recognition loss and accuracy of the 4 network models in different iteration cycles. a Loss changes of the network models, b Accuracy changes of the network models

As can be seen in Fig. 6, \(GoogLeNet\) has the smallest training loss among the four fully trained network models; after 20 epochs the training loss is essentially below 0.1, and the classification accuracy is high, up to 96.20%.

Comparing the four fully trained models: first, \(AlexNet\) uses large convolution kernels of size \(11\times 11\) and \(5\times 5\), while VGG16 uses smaller kernels of size \(3\times 3\) and \(1\times 1\); replacing large convolution kernels with small ones deepens the network and reduces the number of parameters, so \(VGG16\) performs better than \(AlexNet\). Secondly, \(AlexNet\) and \(VGG16\) both consist of convolutional, pooling, and fully connected layers, whereas \(ResNet50\) uses a residual structure and \(GoogLeNet\) uses the Inception module and converts dense connections to sparse connections [19]. Therefore, \(GoogLeNet\) not only effectively controls the number of parameters and the amount of computation, but also achieves better recognition results. After comparing the 4 fully trained models, the migration models of these 4 networks, \(AlexNet-TL\), \(GoogLeNet-TL\), \(ResNet50-TL\), and \(VGG16-TL\), are compared. The classification loss and accuracy curves of these 4 migration models in different training epochs are shown in Fig. 7(a) and 7(b). As Fig. 7 shows, the convergence speed of the four migration models is faster than that of the fully trained network models, but the training loss of \(GoogLeNet-TL\) and \(ResNet50-TL\) is larger than that of the fully trained \(GoogLeNet\) and \(ResNet50\). Among the four migration models, the training loss of the \(VGG16-TL\) model is the smallest; after 10 epochs the training loss is already below 0.1, and the classification accuracy is high, up to 97.29%.

Fig. 7
figure 7

Classification loss and accuracy of the 4 migration models in different iteration cycles. (a) Loss changes of the transfer network models, (b) Accuracy changes of the transfer network models

Although \(GoogLeNet\) achieves better recognition results among the four fully trained models, when performing transfer learning \(VGG16-TL\) has better generalization ability and achieves a good transfer effect. Therefore, \(VGG16-TL\) is the preferred CNN model [26] for extracting deep features from weather images.

Comparing the training loss and accuracy curves of the full training models and the transfer models (8 models in total), the \(VGG16-TL\) model has a relatively small loss and the fastest training convergence. The specific recognition accuracy of the 4 network models and the 4 migration network models is compared in Table 4.

Table 4 Recognition Accuracy of Network Model

It can be seen in Table 4 that the recognition accuracy of the VGG16-TL model [27] is the highest.

Taken together, the VGG16-TL model has smaller losses [28], the fastest training convergence, and the highest weather recognition accuracy. Therefore, this article chooses the VGG16-TL model to extract deep features of weather images [29].

3.5 Experiment 3: comparison of different methods

To verify the effectiveness and superiority of the proposed model, experiment 3 is designed to compare the proposed method with previous methods presented in articles [30], [31], and [32]. This article compares the recognition effects of four methods on four types of images: fog, rain, snow, and clear. The experimental results are listed in Table 5.

Table 5 Comparison of 4 Methods

3.6 Discussion

As can be seen in Table 5, the overall recognition accuracy of the model in this paper reaches 99.22%. Its recognition of fog, rain, snow, and clear weather, as well as its overall accuracy, is better than that of the other models; in particular, the recognition of rain and snow is greatly improved compared with other methods. The network structure used in Guo et al.'s method is \(AlexNet\), and Experiment 2 also shows that the weather recognition effect of fully trained \(AlexNet\) is poor. Wang et al.'s method can extract deep features of images and express their deep semantic information, but it lacks an expression of the shallow information of images; therefore, although its recognition accuracy on the MWIBD dataset in this paper exceeds 90%, its recognition effect still leaves room for improvement. That method easily identifies foggy days, but its recognition effect on rainy and snowy days is poor. Goswami has shown through experiments that it is difficult to improve the weather recognition effect of a network model simply by increasing the number of convolutional layers. Therefore, this paper integrates the deep image features extracted by \(VGG16-TL\) with shallow image features for weather recognition, which achieves better results.

Taken together, the weather recognition model proposed in this article takes into account both the shallow and deep features of weather images and achieves better recognition results for the four weather types: foggy, rainy, snowy, and sunny. However, the model also produces misidentifications: its recognition of rainy and snowy days is not as good as that of foggy and sunny days, because the degradation caused by rain and snow is relatively similar and easily confused.

The outdoor image weather recognition model designed in this article is intended to be applied in an adaptive video image sharpening system. When the current weather of the video image is recognized, the corresponding sharpening method is automatically invoked to improve the visibility of the degraded video image; for example, if the current weather is recognized as rain, the video rain-removal method is applied in real time to improve the visibility of video surveillance in a timely manner.

4 Conclusion

In this study, we created a weather image block dataset, MWIBD, for outdoor weather image recognition; it includes weather photographs from a wide variety of categories. We also propose a weather recognition model based on image blocking and feature fusion. The model classifies weather images by combining shallow and deep image features through spatial information, capturing both surface-level information, such as how the image appears, and deeper semantic information, such as the meaning that lies behind the image. The model improves the accuracy of weather identification by using geographic data at every stage of the process. The proposed weather recognition model not only works well in a variety of geographical contexts, but also converges rapidly and has a low training loss. In the future, the method will be improved to suit applications involving the management of spatial information, such as adaptive video image sharpening.