1 Introduction

According to the equipment type, existing automatic parking methods can be roughly divided into two categories: radar-based and camera-based. The most widely used radar-based methods include laser radars [13, 21, 29], ultrasonic radars [20, 22], and short-wave radars. Though these methods are good at detecting vehicles and obstacles, or at planning and tracking routes, they can neither judge the type of a detected object well nor obtain the information needed for parking guidance. In contrast, camera-based methods can grasp the ground guidance information better and place lower demands on hardware and image quality. Currently, the four most popular car cameras are optical flow cameras, depth cameras, stereo cameras [26], and fish-eye cameras [2]. However, these approaches also only pay attention to space calculation, without fully utilizing the ground information. Thus, Zhang et al. [8, 30] proposed a parking spot detection algorithm that uses the vertex angle as the detection object for determining a parking spot, which spends much time on vertex angle classification and is not universal enough.

Fig. 1

A schematic diagram of the core content. a is a processed ground image; b is the result of corner detection; c–f are, respectively, the generative results using sideline clues, occlusion clues, edge clues, and domain clues; g is the final result with detected parking spots

In this paper, we propose a Generative Parking Spot Detection (GPSD) algorithm using a multi-clue recovery model. First, we design an illumination balance algorithm for ensuring detection accuracy, which splits the original image into multiple areas and specifies the balance strategy according to the illumination difference of each area. Then, we propose a micro-target detection algorithm, which strengthens the weight of the underlying semantics and expands the spatial pyramid pooling layer. Finally, we propose the multi-clue model, which uses the detected corners to recover the parking spot and adds several clues to correct the recovery result. The result of each stage of the proposed algorithm is shown in Fig. 1.

Our contributions can be summarized as follows:

  • Illumination balance We gradually split the original image with complex scene information into multiple layers, in order to enhance locally blurred targets and ensure global illumination continuity more conveniently.

  • Micro-target detection We adjust the network by strengthening the weight of underlying semantics and expanding the spatial pyramid pooling layer to enhance its detection ability.

  • Parking spot location We geometrically dismantle the objects within the original image into several meta-elements as the input of the detection process, and, to eliminate the interference of complex scene information, we use the multi-clue model to recover the parking spot and correct the final result.

Fig. 2

A schematic diagram of GPSD. The proposed algorithm consists of three main modules: an Illumination Balance Module, a Micro-Target Detection Module, and a Generative Parking Spot (GPS) Location Module. During the illumination balance processing, the original image is transformed into a single-channel one to weaken the influence of the color dimension and is then separated into several areas, where the occlusion area is regarded as Occlusion Clues (OCs) in the GPS Location Module. Both the Line Detector and the Micro-Target Detection Module take the balanced image as input; the sidelines detected by the former are regarded as Sideline Clues (SCs) in the GPS Location Module, while the result of the latter is used to construct a fully pairing map. Finally, several clues, namely SCs, OCs, Edge Clues (ECs), and Domain Clues (DCs), are applied to correct the fully pairing map and to identify the real parking spots in the GPS Location Module

2 Related works

Automatic Parking Assistance System (APAS) The early APAS tries to find an empty space for parking and guide routes by sensors or cameras. Song et al. [21] proposed a laser-based Simultaneous Localization and Mapping (SLAM) automatic parallel parking and tracking control scheme. Scheunert et al. [18] used a photonic mixer-type depth camera to collect the spatial parking lot information. Suhr et al. [25] designed a three-dimensional point cloud reconstruction based on motion stereo. However, these methods ignore the ground parking spot lines and may cause wrong parking. With deep learning developing rapidly, the new APAS starts using ground images for detecting parking spots, which are captured by four-way fish-eye cameras and finally transformed into an around-view image [9, 10, 32]. In addition, Yamamoto et al. [28] proposed a parking control system using only a monocular camera, while Athira et al. [1] presented an image processing approach based on Optical Character Recognition (OCR). Besides, the Internet of Things (IoT) has also been used for smart parking lot management [5] and real-time information exchange [6].

Parking spot detection (PSD) The existing PSD methods mainly detect the vertex angle [8, 23, 24, 27, 30] or the sideline [7, 19]. Zhang et al. [30] proposed a spot detection algorithm based on a deep convolutional neural network and built a large-scale labeled dataset. Inspired by their work, Suhr and Jung [24] proposed an end-to-end trainable one-stage parking slot detection method, and Wu et al. [27] annotated and released the large-scale benchmark dataset PSDD. Sedighi and Kuhnert [18] presented a parking strategy for vision-based autonomous parking systems in which the ego-vehicle can complete its auto-park with one maneuver, or up to a maximum of three required maneuvers. Although the existing methods can catch the ground information well, they take much time in the classification of vertex angles or sidelines, and taking different recovery strategies in different situations increases the algorithm complexity. Based on the existing car camera system and inspired by YOLO [3, 14,15,16], we propose a parking spot detection algorithm using corners as the detection target.

3 Overview

Different from existing algorithms that use sidelines [7, 19] or vertex angles [8, 30] as the describable characteristic of a parking spot, we focus on the corner, which is a more basic yet key constituent element of parking spots, and propose a generative parking spot detection algorithm based on a multi-clue recovery model. As shown in Fig. 2, the proposed algorithm contains three main modules: an illumination balance module for image preprocessing, a micro-target detection module for corner detection, and a generative parking spot location module for parking spot recovery. Each captured ground image is first transformed into a single-channel picture in a dimension reduction module, then filtered, and finally separated into several areas, where the view area is passed into the balancer module to complete the illumination balance. After that, the preprocessed image is passed into the micro-target detection module and a line detector. In the former, we use a designed CNN to detect the positions of corners and use them to construct a fully pairing map. In the generative parking spot location module, we design a multi-clue recovery model which takes various clues, such as sideline clues, occlusion clues, edge clues, and domain clues, to correct the pairing result and locate the real parking spots.

4 Layered analytical illumination balance

In this section, in order to solve the problems of unbalanced illumination areas, partially missing information, and various definitions of parking spot lines, we propose a Layered Analytical Illumination Balance (LAIB) method for image preprocessing.

4.1 Layered analytical model

Color dimension reduction The multiple color types of parking spot lines may cause unnecessary interference during detection. Since there is always an obvious color difference between the lines and the ground, it is easy to remove the color feature by reducing the color dimension. In our experiment, we simply transform the RGB image into a grayscale one.

Area separation As shown in Fig. 3, we first design a Common Area (CA) Extractor to catch the occlusion from several reference images. Because the occlusion is caused by the same camera, it is consistent across all ground images, and we select the reference images following the random principle. Then, we use the extracted occlusion to separate the view, which mainly contains the parking spot lines and the ground, from the target image in a View-Occlusion (VO) Separator. Finally, we further separate the view into the line area with lights and shadows, and the ground area with lights and shadows, in a Line-Ground (LG) Separator.
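To make the CA Extractor and VO Separator concrete, the following minimal Python/NumPy sketch marks as occlusion the pixels whose gray value barely changes across the randomly chosen reference images; the tolerance `tol`, the use of OpenCV, and the zero-filling of occluded pixels are illustrative assumptions rather than the exact implementation.

```python
import cv2
import numpy as np

def extract_common_area(reference_paths, tol=8):
    """Rough Common Area (CA) Extractor sketch: pixels whose gray value
    barely changes across all reference images are assumed to belong to
    the camera's own occlusion area."""
    grays = []
    for path in reference_paths:
        img = cv2.imread(path)                                   # BGR ground image
        grays.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32))
    stack = np.stack(grays, axis=0)                              # (num_refs, H, W)
    spread = stack.max(axis=0) - stack.min(axis=0)               # per-pixel variation
    return (spread < tol).astype(np.uint8)                       # 1 = common/occluded

def separate_view(target_gray, occlusion_mask):
    """View-Occlusion (VO) Separator sketch: blank out the occluded pixels."""
    view = target_gray.copy()
    view[occlusion_mask == 1] = 0
    return view
```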

Fig. 3

A frame analysis diagram of Layered Analytical Illumination Balance (LAIB). In this figure, several ground images are randomly selected from the original dataset as the reference images, and they are used to extract the common area in the Common Area (CA) Extractor. The extracted common area is the occlusion area caused by the view of a car camera itself and will be regarded as the reference to separate the occlusion from the view for each target image in the View-Occlusion (VO) Separator. The separated view area is transported into the Line-Ground (LG) Separator to catch the independent parking spot line and ground, which are used to calculate the illumination value, separately

Fig. 4

A design schematic diagram of parking spot types. Four categories are selected from numerous vehicle samples and adjusted to the top view. At the same time, four common PS categories are summarized from various parking spot examples. From the top-down perspective, all vehicles can be regarded as several approximate rectangles, and this feature has affected the design of parking spot types

Fig. 5

A schematic diagram of corner classification. In this figure, the reliable corners, \(P_1 \sim P_9\), are marked in red; the occlusion hypothetical corners, \(P_2(g), P_7(g), P_{10}(g)\), are marked in green; and the edge hypothetical corners, \(P_1(b), P_8(b), P_9(b)\), are marked in blue

4.2 Illumination balance strategy

Filter design We improve the original Gaussian filter function by increasing the pixel utilization to reduce the loss of image sharpness:

$$\begin{aligned} F(p_{x,y},\delta ) = \lfloor f(p_{x,y},k,\delta ) + {\varDelta }f \rfloor , \end{aligned}$$
(1)

where \(F(p_{x,y},\delta )\) is the improved Gaussian filter function, consisting of the body \(f(p_{x,y},k,\delta )\) and a relevance item \({\varDelta }f\) for increasing the correlation of surrounding pixels. In \(f(p_{x,y},k,\delta )\), \(p_{x,y}\) represents a pixel at location (x, y) in ground images; \(\delta \) is a retracting coefficient for adjusting the conversion ratio of pixel values and \(\delta \in [0.5,1.0]\):

$$\begin{aligned} \begin{aligned} f(p_{x,y},k,\delta ) = \frac{\sum \sum \frac{p_{i,j}}{|k^2 \cdot \log (\frac{p_{i,j}}{255}+\delta ) - \sum \sum \log (\frac{p_{i,j}}{255}+\delta )|}}{\sum \sum \frac{1}{|k^2 \cdot \log (\frac{p_{i,j}}{255} + \delta ) - \sum \sum \log (\frac{p_{i,j}}{255}+\delta )|}},\\ \end{aligned} \end{aligned}$$
(2)

where k is the Gaussian kernel size, satisfying \(k>1\). For each pixel \(p_{i,j}\) at location (i, j), it satisfies \(1-k \le 2 \cdot (i-x), 2 \cdot (j-y) \le k-1\). And \({\varDelta }f\) is defined as follows:

$$\begin{aligned} \begin{aligned} {\varDelta }f = {\left\{ \begin{array}{ll} \frac{\sum \{f-p^{'} | p^{'}<f\}}{\sum 1}, \frac{\sum \{p^{'}-f | p^{'}>f\}}{\sum \{f-p^{'} | p^{'}<f\}} \le 0.2\\ \frac{\sum \{p^{'}-f | p^{'}>f\}}{\sum 1}, \frac{\sum \{f-p^{'} | p^{'}<f\}}{\sum \{p^{'}-f | p^{'}>f\}} \le 0.2\\ \varepsilon _{-} \cdot \frac{\sum f - p^{'}}{\sum 1} + \varepsilon _{+} \cdot \frac{\sum p^{'}-f}{\sum 1}, \mathrm{else} \end{array}\right. }, \end{aligned} \end{aligned}$$
(3)

where \(\varepsilon _+\) and \(\varepsilon _-\) are both proportional coefficients, satisfying \(\varepsilon _++\varepsilon _-=1\). For simplicity, \(f(p_{x,y},k,\delta )\) is abbreviated to f. \(p^{'}\) represents the surrounding pixels, defined as follows:

$$\begin{aligned} p^{'} \in \left\{ \log \left( \frac{p_{x+i,y+j}}{255} + \delta \right) | i,j \ne 0 \right\} . \end{aligned}$$
(4)
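The following single-pixel Python sketch shows one possible reading of Eqs. (1)–(4); the parameter values, the handling of empty comparison sets, and the small epsilon added to the weights are our own safeguards and are not prescribed by the formulas.

```python
import numpy as np

def improved_gaussian_filter(img, x, y, k=3, delta=0.8, eps_plus=0.5, eps_minus=0.5):
    """Sketch of Eqs. (1)-(4) for a single pixel (x, y) of a grayscale
    image in [0, 255]; k is the (odd) window size, delta the retracting
    coefficient, and eps_plus + eps_minus = 1 (illustrative values)."""
    r = (k - 1) // 2
    window = img[x - r:x + r + 1, y - r:y + r + 1].astype(np.float64)

    # Eq. (2): weighted mean with weights built from the log-compressed window.
    logs = np.log(window / 255.0 + delta)
    weights = 1.0 / (np.abs(k * k * logs - logs.sum()) + 1e-12)
    f = float((window * weights).sum() / weights.sum())

    # Eq. (4): log-compressed surrounding pixels (centre excluded).
    mask = np.ones_like(window, dtype=bool)
    mask[r, r] = False
    p_prime = logs[mask]

    # Eq. (3): relevance item Delta f, compared against the body value f.
    below = f - p_prime[p_prime < f]
    above = p_prime[p_prime > f] - f
    mean_below = below.mean() if below.size else 0.0
    mean_above = above.mean() if above.size else 0.0
    if above.sum() <= 0.2 * below.sum():
        df = mean_below
    elif below.sum() <= 0.2 * above.sum():
        df = mean_above
    else:
        df = eps_minus * mean_below + eps_plus * mean_above

    # Eq. (1): floor of body plus relevance item.
    return int(np.floor(f + df))
```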

Balance strategy After the filter processing, the ground image is separated to get the view, which is further divided into four parts: the parking spot line within shadow areas \(R_s^l\) and light areas \(R_h^l\), and the ground within shadow areas \(R_s^g\) and light areas \(R_h^g\), according to several thresholds \(G_1\), \(G_0\), and \(G_2\), sequentially:

$$\begin{aligned} G_0(V,N,{\varGamma }_0)= & {} \sum \left( \frac{v_i \cdot \log \frac{n_i}{\sum n_i}}{\sum \log \frac{n_i}{\sum n_i}} + \frac{B \cdot \log (B^2 + \mu _0)}{\sum \log (B^2 + \mu _0)}\right) \nonumber \\ B= & {} \frac{v_i \cdot \sum n_i}{\sum (v_i \times n_i)} - 1, \end{aligned}$$
(5)

where \(G_0(V,N,{\varGamma }_0)\) is the calculation function. For each pixel i in the view, its pixel value is \(v_i\), and the value set of all pixels is V, satisfying \(v_i \in V\); for each \(v_i\), its corresponding pixel number is \(n_i\), and the amount set of \(v_i\) is N, satisfying \(n_i \in N\); and the size of V is equal to the number of distinct pixel values \({\varGamma }_0\), where \(1 \le i \le {\varGamma }_0\). \(\mu _0\) is an adjustment parameter and satisfies \(\mu _0 \le 1\). Since the definitions of \(G_1\) and \(G_2\) are like that of \(G_0\), they are not repeated for simplicity. In order to find the illumination effect on the parking spot line and the ground, we calculate, respectively, the ground one from \(R_s^g\) and \(R_h^g\), and the line one from \(R_s^l\) and \(R_h^l\). Specifically, when the illumination is simple or single, we just obtain the median of \(R_s^g\) and \(R_h^g\), and the illumination effect is \((R_s^g + R_h^g)/2\); if the illumination is complex, we need to subdivide the original area and extract the average illumination value of each sub-area to decide a multi-level illumination balance strategy.
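A minimal sketch of how the threshold \(G_0\) of Eq. (5) can be evaluated from the gray-level histogram of the view area is given below; the masking of occluded (zero-valued) pixels and the exact summation layout are our assumptions where the notation is terse.

```python
import numpy as np

def illumination_threshold(gray_view, mu0=1.0):
    """Sketch of the threshold G_0 of Eq. (5).  V collects the distinct
    pixel values of the view, N their counts, and mu0 <= 1 is the
    adjustment parameter; zero-valued (occluded) pixels are skipped."""
    values, counts = np.unique(gray_view[gray_view > 0], return_counts=True)
    v = values.astype(np.float64)
    n = counts.astype(np.float64)

    # First term: pixel values weighted by the log-frequency of their counts.
    log_freq = np.log(n / n.sum())
    term1 = v * log_freq / log_freq.sum()

    # Second term: deviation B of each value from the mean pixel value.
    b = v * n.sum() / (v * n).sum() - 1.0
    log_b = np.log(b * b + mu0)
    term2 = b * log_b / log_b.sum()

    return float((term1 + term2).sum())
```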

Table 1 A comparison result table of corner detection experiments based on HERV 2018 dataset

5 Fast micro-target detection for corners

The existing parking spot detection algorithms focus on identifying the type of detected sidelines or vertex angles. These various types not only require large numbers of training cases for a deep convolutional neural network, but also make the subsequent identification cumbersome. In order to solve this problem, we consider using corners as the alternative and propose a Fast Micro-Target Detection (FMTD) algorithm.

5.1 Corner properties

As shown in Fig. 4, though vehicles have various appearances, their ground projections can always be abstracted to some rectangles, which makes the common parking spot a rectangle or a parallelogram. For a regular parking spot, Zhang et al. [30] used the vertex angle as the detection target, but had to pay much attention to its different types. Considering that the corner is the basis of sidelines and vertex angles, we choose it as the alternative.

5.2 Corner selection

Although it is difficult to catch the position of missing corners outside the view, we can find some alternatives: when these missing corners approach the view boundary along the sideline they belong to, they will eventually intersect with the boundary. We call the real corner on the parking spot line the reliable corner and the intersection the hypothetical corner, which can be further subdivided into the occlusion one and the edge one according to their positions in the ground image. The occlusion hypothetical corner is not only caused by the view limitation of car cameras, but also by the occlusions of vehicles or other obstacles. In Fig. 5, we show several samples of the above corners. By introducing hypothetical corners, we improve the single-image utilization, though some parking spots, like \(P_1(b) P_2(g) P_3 P_4\), are deformed.
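As a simple illustration of how an edge hypothetical corner can be obtained, the following sketch extends a sideline through two visible corners until it leaves the view rectangle; the helper name and the ray-casting formulation are ours and only one of several possible constructions.

```python
def boundary_hypothetical_corner(p, q, width, height):
    """Illustrative helper (not the paper's exact procedure): extend the
    sideline through the visible corners p = (px, py) and q = (qx, qy)
    beyond q and return where it first leaves the width x height view,
    i.e. a candidate edge hypothetical corner."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    ts = []                              # ray parameters at which a border is hit
    if dx > 0:
        ts.append((width - 1 - qx) / dx)
    elif dx < 0:
        ts.append(-qx / dx)
    if dy > 0:
        ts.append((height - 1 - qy) / dy)
    elif dy < 0:
        ts.append(-qy / dy)
    t = min(t for t in ts if t >= 0) if ts else 0.0
    return (qx + t * dx, qy + t * dy)
```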

5.3 Corner detection

Since corners are relatively micro targets compared with the whole parking spot, it is necessary to improve the detection ability of the detection model for micro-targets. Thus, we propose the following improvement strategy: we add a spatial pyramid pooling layer with a five-window structure. In this way, we can use more local small images to train our network, so that the network's learning ability for local features can be greatly enhanced. At the same time, we also use the pooling results obtained by different windows for convolution processing.
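A possible realization of the five-window spatial pyramid pooling block is sketched below in PyTorch; the window sizes, channel handling, and 1x1 fusion convolution are illustrative choices, since only the number of windows and the subsequent convolution processing are fixed above.

```python
import torch
import torch.nn as nn

class FiveWindowSPP(nn.Module):
    """Sketch of a spatial pyramid pooling block with five pooling windows."""

    def __init__(self, channels, windows=(1, 3, 5, 7, 9)):
        super().__init__()
        # Stride-1 max pooling with matching padding keeps the spatial size.
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=w, stride=1, padding=w // 2) for w in windows]
        )
        # 1x1 convolution fuses the original features and the five pooled maps.
        self.fuse = nn.Conv2d(channels * (len(windows) + 1), channels, kernel_size=1)

    def forward(self, x):
        pooled = [pool(x) for pool in self.pools]
        return self.fuse(torch.cat([x] + pooled, dim=1))
```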

6 Generative parking spot location

After the corner detection, we propose a Generative Parking Spot Location (GPSL) method to recover the effective parking spots, which constructs a fully pairing map using a pairwise pairing method and extracts useful information from the original image as clues for correcting the result.
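The fully pairing map itself is straightforward: every detected corner is tentatively connected with every other one, and the clues of Sects. 6.1–6.4 then prune the wrong connections. A minimal sketch:

```python
from itertools import combinations

def fully_pairing_map(corners):
    """Connect every detected corner with every other one; the clue tests
    below decide which connections survive.  corners: list of (x, y)."""
    return list(combinations(range(len(corners)), 2))
```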

6.1 Sideline clue

We extract parking spot sidelines from the captured ground image and use them to depict parking spots. Specifically, we compare the connection of each corner pair (i, j), defined as \(K^s(i,j)\), with the group of parking spot sidelines \(Q^s\):

$$\begin{aligned} \begin{aligned} D^s(i,j,\rho ,b)&= \omega ^s + \left\| \gamma _1^s \cdot \left| \log \frac{{\varDelta }\rho }{\xi ^s}\right| \right. \\&\quad \left. + \gamma _2^s \cdot \sum _{i,j} \log \frac{\rho \cdot x + b}{y}\right\| _{-\infty } \end{aligned}, \end{aligned}$$
(6)

where \(D^s(i,j,\rho ,b)\) is the coincidence degree between \(K^s(i,j)\) and \(Q^s\). \(\rho \) and b are the parameters of \(Q^s\), satisfying \(y=\rho \cdot x + b\). \(\omega ^s\) is a coincidence degree threshold and \(\omega ^s <1\). \(\gamma _1^s\) and \(\gamma _2^s\) are, respectively, the line proportion and the point proportion, satisfying \(0<\gamma _1^s,\gamma _2^s <1\). \({\varDelta }\rho \) is the slope difference between \(K^s(i,j)\) and \(Q^s\). \(\xi ^s\) is a basic threshold for controlling the line coincidence degree and \(\xi ^s <1\). If and only if \(D^s(i,j,\rho ,b) \le 0\), we consider the connection \(K^s(i,j)\) to be a part of one reliable sideline.
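The following sketch evaluates Eq. (6) for one corner pair under our reading that the minus-infinity norm selects the best-matching sideline in \(Q^s\); all parameter values and the small epsilons guarding the logarithms are illustrative.

```python
import numpy as np

def sideline_clue(ci, cj, sidelines, omega_s=-0.5, gamma1=0.4, gamma2=0.6, xi_s=0.5):
    """Sketch of the Sideline Clue test of Eq. (6) for one corner pair.
    ci, cj: (x, y) corners; sidelines: detected lines as (rho, b) with
    y = rho * x + b.  Returns True when the connection is kept."""
    if not sidelines:
        return False
    # Slope of the candidate connection K^s(i, j).
    k_rho = (cj[1] - ci[1]) / (cj[0] - ci[0] + 1e-9)

    degrees = []
    for rho, b in sidelines:
        slope_term = gamma1 * abs(np.log(abs(k_rho - rho) / xi_s + 1e-9))
        # How far the two corner points sit from the line y = rho * x + b.
        point_term = gamma2 * sum(
            np.log(abs((rho * x + b) / (y + 1e-9)) + 1e-9) for x, y in (ci, cj)
        )
        degrees.append(abs(slope_term + point_term))

    d_s = omega_s + min(degrees)          # best-matching sideline in Q^s
    return d_s <= 0                        # True: part of a reliable sideline
```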

6.2 Occlusion clue

Since the boundary of occluded areas is usually irregular, we directly scan each pixel on the connection of an occlusion hypothetical corner pair \(K^o(i,j)\):

$$\begin{aligned} \begin{aligned} D^o(i,j)&= \omega ^o + \gamma _1^o \cdot \frac{{\varDelta }L}{\xi ^o}\\&\quad + \gamma _2^o \cdot \frac{\sum _{i,j} \left\{ 1 | \log \left( \frac{v}{255}+q\right) \le 0 \right\} }{\sum _{i,j} 1} \end{aligned}, \end{aligned}$$
(7)

where \(D^o(i,j)\) is the confidence of the connection \(K^o(i,j)\). \(\omega ^o\) is a confidence threshold and \(\omega ^o <0\). \(\gamma _1^o\) and \(\gamma _2^o\) are, respectively, the line length proportion and the pixel value proportion, satisfying \(0<\gamma _1^o,\gamma _2^o <1\). \({\varDelta }L\) is the Euclidean distance between the occlusion hypothetical corners i and j. \(\xi ^o\) is a basic threshold for controlling the line length confidence and \(\xi ^o > 1\). The pixel value is expressed as \(\log (v/255 + q)\), where \(v \in [0,255]\) and q is a preset threshold determined by the occlusion area's type. If and only if \(D^o(i,j) \le 0\), we consider the connection \(K^o(i,j)\) to be a part of the occlusion area boundary.
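A corresponding sketch of the occlusion clue test of Eq. (7) is given below; the rasterization of the connection and the parameter values are illustrative assumptions.

```python
import numpy as np

def occlusion_clue(ci, cj, gray, q=0.6, omega_o=-1.0, gamma1=0.3, gamma2=0.7, xi_o=200.0):
    """Sketch of the Occlusion Clue test of Eq. (7) for one pair of
    occlusion hypothetical corners.  gray is the balanced single-channel
    image; returns True when the connection lies on the occlusion boundary."""
    (x0, y0), (x1, y1) = ci, cj
    length = float(np.hypot(x1 - x0, y1 - y0))          # Delta L

    # Scan every pixel on the straight connection between the two corners.
    steps = max(int(length), 1)
    xs = np.linspace(x0, x1, steps).round().astype(int)
    ys = np.linspace(y0, y1, steps).round().astype(int)
    values = gray[ys, xs].astype(np.float64)

    # Fraction of scanned pixels whose log-compressed value is non-positive.
    dark_ratio = float(np.mean(np.log(values / 255.0 + q) <= 0.0))

    d_o = omega_o + gamma1 * length / xi_o + gamma2 * dark_ratio
    return d_o <= 0
```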

Fig. 6

A schematic diagram of the rating benchmark. In this figure, different distribution areas are marked with various colors, and each one has its own score. It is noted that the score is allowed to be negative. Three samples have been shown in the right column

Fig. 7

A comparison result diagram of different corner detection algorithms on the HERV 2018 dataset. In this figure, columns 1 to 5 are the results of a ATSS [31], b Faster R-CNN [17], c Retina Net [11], d SSD [12], and e our FMTD, respectively

Fig. 8

A statistical graph of comparative experimental results on HERV 2018 dataset. In this figure, a is the FD rate, b is the precision rate, c is the recall rate, and d is the quality

6.3 Edge clue

For the connection problem of an edge hypothetical corner pair \(K^e(i,j)\), we use the following equation:

$$\begin{aligned} \begin{aligned} D^e(i,j,\rho ) = \omega ^e + \left\| \log \left( \left| \frac{{\varDelta }\rho }{\xi ^e}\right| \right) \right\| _{-\infty }, \end{aligned} \end{aligned}$$
(8)

where \(D^e(i,j,\rho )\) is the coincidence degree between the connection of \(K^e(i,j)\) and the image edge group \(Q^e\). \(\rho \) is the slope of \(Q^e\). \(\omega ^e\) is a coincidence degree threshold and \(\omega ^e <1\). \({\varDelta }\rho \) is the slope difference between \(K^e(i,j)\) and \(Q^e\). \(\xi ^e\) is a basic threshold for controlling the line coincidence degree and \(\xi ^e <1\).

As for the connection problem between the two types of hypothetical corners, we add the marginal assistance point and the vertex for correcting the deformation, where the former is the intersection of occlusion areas and the view. Specifically, there are two cases: when the targets are edge hypothetical corners, vertices, and marginal assistance points, we just adjust the type of i and j in Eq. (8); and when the targets are occlusion hypothetical corners and marginal assistance points, we can replace i or j with the marginal assistance point in Eq. (7). In the former case, the marginal assistance point is also an occlusion hypothetical corner.

Fig. 9

A comparison result diagram of the proposed algorithm on the HERV 2019 dataset. In this figure, a is the original ground image; b is the detection result of corners; rows 3–6 are the recovery results using c SC, d OC, e EC, and f DC, respectively; g is the final parking spot marking map

6.4 Domain clue

In order to select the effective parking spot from the polygonal domains divided by the above clues, we take the levels of points, lines, and areas into consideration:

$$\begin{aligned} D^d(P^n,L^m,S) = \min \{ D_p^d(P^n), D_l^d(L^m), D_a^d(S) \}, \end{aligned}$$
(9)

where \(D^d(P^n,L^m,S)\) is the final discriminator containing three sub-discriminators: the point one \(D_p^d(P^n)\), the line one \(D_l^d(L^m)\), and the area one \(D_a^d(S)\).

In \(D_p^d(P^n)\), \(P^n\) is the set of n vertices. For each \(p_i\), its type j can be \(\{ 1,2,3 \}\), respectively representing reliable corners, hypothetical corners, and assistance points; the corresponding weight \(\eta _j\) is preset and \(\sum \eta _j = 1\), while the corresponding non-negative score \(v_j\) is decreasing in j and no more than 1:

$$\begin{aligned} D_p^d(P^n) = \frac{1}{n} \cdot \sum _{i=1}^n \{ \eta _j \cdot v_j | p_i = j, j=1,2,3 \}. \end{aligned}$$
(10)

In \(D_l^d(L^m)\), \(L^m\) is the set of m sidelines. \(l_i\) is a sideline of \(L^m\) and \(1 \le i \le m\). \(\xi _1^d\) is a standard ratio of long and short sidelines according to the real parking spot type and \(\xi _1^d \ge 1.5\):

$$\begin{aligned} D_l^d(L^m) = \frac{1}{\xi _1^d} \cdot \left| \mathrm{minlog}\left( \frac{l_i}{l_{i \pm 1}}\right) \right| . \end{aligned}$$
(11)

In \(D_a^d(S)\), S is the area and \(\xi _2^d\) is a standard value according to the real parking spot type and \(\xi _2^d \ge 1.2\):

$$\begin{aligned} D_a^d(S) = \log \left( \root 4 \of {1/S}\right) /\xi _2^d. \end{aligned}$$
(12)
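The three sub-discriminators of Eqs. (10)–(12) and their combination in Eq. (9) can be sketched as follows; the weights \(\eta _j\), the per-type scores, and the standard ratios are illustrative values within the stated constraints.

```python
import numpy as np

def domain_clue(vertex_types, sideline_lengths, area,
                eta=(0.5, 0.3, 0.2), score=(1.0, 0.7, 0.4),
                xi1=2.5, xi2=1.2):
    """Sketch of the Domain Clue discriminator of Eqs. (9)-(12).
    vertex_types: 1 (reliable corner), 2 (hypothetical corner) or
    3 (assistance point); sideline_lengths and area describe the polygon."""
    # Eq. (10): point-level term, weighted score averaged over the vertices.
    d_p = float(np.mean([eta[t - 1] * score[t - 1] for t in vertex_types]))

    # Eq. (11): line-level term, smallest log-ratio of neighbouring sidelines.
    lengths = np.asarray(sideline_lengths, dtype=np.float64)
    ratios = np.log(lengths / np.roll(lengths, -1))
    d_l = abs(ratios.min()) / xi1

    # Eq. (12): area-level term.
    d_a = np.log((1.0 / area) ** 0.25) / xi2

    # Eq. (9): the smallest of the three sub-discriminators decides.
    return min(d_p, d_l, d_a)
```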

7 Experimental results

Our algorithm is proposed in the design phase of a real project and is fully verified during the implementation process. Datasets used for comparison experiments are HERV 2018 and HERV 2019, both of which are provided by the foundation engineering supported by the Huayu Automotive Systems Co., Limited (HASCO) and the East China University of Science and Technology (ECUST). Specifically, the HERV 2018 dataset contains more than 800 processed bird’s-eye views with a size of \(360 \times 240\); the HERV 2019 dataset contains 440 processed fish-eye camera images with a size of \(900 \times 350\).

7.1 Corner detection

In Table 1, we show the data comparison of our proposed FMTD algorithm with several classic object detection algorithms, namely ATSS [31], Faster R-CNN [17], Retina Net [11], and SSD [12], on the HERV 2018 dataset. We randomly split the original dataset into 8 groups with 100 images in each group. Each time, we use 7 groups as the training set, which also contains the validation set, and 1 group as the testing set. We choose the False Detection rate (FD rate), the precision rate, the recall rate, and the quality as the comparison items. The quality is designed to describe the average distance between the detected corners and the real corners, which is abstracted as a score, and the rating rule is shown in Fig. 6.

In Fig. 7, we show the comparison results of the above five algorithms. As shown in this figure, there are 12 samples, and for each sample, we select several domains to compare the performance in detail: blue boxes are used to show error detection results, while green and orange boxes are for the results that ought to be detected. According to this figure, it is obvious that our method can greatly reduce the probability of error detections when compared with ATSS, Faster R-CNN, and Retina Net, while compared with Faster R-CNN and SSD, our method has a higher accuracy. Besides, our method can also achieve a higher score in the precision of the detected corner position. In Fig. 8, we show the corresponding statistical graph of the comparison results in Table 1. As shown in this figure, our FMTD method always achieves better performance than the other algorithms.

Fig. 10

A comparison result diagram of different parking spot detection algorithms on the HERV 2018 dataset. In this figure, rows 1 to 4 are, respectively, a the original image, b the ground truth, c the result of Yolact [4], and d the result of our GPSL

7.2 Parking spot detection

In Fig. 9, we show the experimental results of the proposed multi-clue recovery processing on the HERV 2019 dataset. In this figure, (b) is the result of FMTD without the fully pairing processing, and it is transported into the multi-clue recovery model as the input. The next four rows are, respectively, the results of recovery processing using sideline clues, occlusion clues, edge clues, and domain clues. Specifically, the order of edge clue processing and occlusion clue processing can be exchanged. As shown in this figure, our multi-clue recovery model can effectively locate parking spots and mark them in the map. At the same time, it can also deal well with the deformation of the parking spot and ensure that the recovered parking spot meets the intuitive perception of human eyes.

In Fig. 10, we compare the proposed GPSL algorithm with the classic instance segmentation algorithm Yolact [4]. The testing dataset for the comparison experiments is the HERV 2018 dataset. In this figure, there are a total of 5 different samples, and for each sample, we show the original top-view image in row (a), the ground truth in row (b), the location result of Yolact in row (c), and that of our GPSL in row (d). It is obvious that our result is closer to the given ground truth, and our method achieves a lower miss detection rate in all samples. Although our method filters out some parking spots that appear in the marginal area, this strategy fits the principle of proximity, and at the same time guarantees to a certain extent the accuracy and availability of the input used in the vehicle parking guidance algorithm.

Table 2 A comparison result table of parking spot location experiments based on HERV 2018 dataset
Fig. 11

A statistical graph of comparative experimental results on HERV 2018 dataset. In this figure, a is the precision rate, b is the recall rate, and c is the score

In Table 2, we show the comparison results of the parking spot location experiment on the HERV 2018 dataset. We choose the precision rate, the recall rate, and the score to explain the performance difference between Yolact [4] and our GPSL method. The calculation rule of the score is:

$$\begin{aligned} \#\mathrm{Score} = \sum _i \left( \frac{s_i}{\sum _i s_i} \cdot (\beta \cdot \sum _j p_{ij} + (1 - \beta ) \cdot \sum _r l_{ir})\right) , \end{aligned}$$
(13)

where we use the area ratio \(s_i / (\sum _i s_i)\) to indicate the importance of each parking spot, which is scored by its points and sidelines, and \(\beta \) is used to adjust the ratio of the point score \(p_{ij}\) and the sideline score \(l_{ir}\). The final score has been normalized. In Fig. 11, we show the corresponding statistical graph of the comparison results in Table 2. As shown in this figure, our proposed GPSL method achieves a satisfactory performance.
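For reference, a minimal sketch of Eq. (13) is given below; the input layout and the choice of \(\beta \) are illustrative, and the normalization of the final value is assumed to happen elsewhere.

```python
import numpy as np

def location_score(areas, point_scores, sideline_scores, beta=0.5):
    """Sketch of the score of Eq. (13).  areas[i] is s_i for spot i,
    point_scores[i] the list of its per-point scores p_ij, and
    sideline_scores[i] the list of its per-sideline scores l_ir."""
    areas = np.asarray(areas, dtype=np.float64)
    weights = areas / areas.sum()                     # importance of each spot
    per_spot = [beta * sum(p) + (1.0 - beta) * sum(l)
                for p, l in zip(point_scores, sideline_scores)]
    return float((weights * np.asarray(per_spot)).sum())
```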

Besides, in Tables 3 and 4, we show the comparison results of the parking spot detection experiment on two public parking spot datasets, PS 2.0 [30] and PPSD [27]. We also choose the precision rate, the recall rate, and the score to evaluate the performance of each method. Since the around-view images in these two datasets contain several areas with various light intensities, we have to increase the range of illumination balance, which increases the algorithm complexity; at the same time, because the area of parking spots in the around-view image is small, we need to adjust some parameters of the multi-clue recovery model to adapt to the need of generating a small parking spot; in addition, the clarity of the around-view image also poses a great challenge to our algorithm. As shown in Tables 3 and 4, our GPSD method has the most balanced performance in all aspects.

Table 3 The performance of different parking spot detection algorithm on the PS 2.0 dataset
Table 4 The performance of different parking spot detection algorithm on the PPSD dataset

8 Conclusion and future work

Different from existing methods, this paper proposes a Generative Parking Spot Detection algorithm which focuses on using corners to recover parking spots. To improve the accuracy of corner detection, we proposed a layered analytical illumination balance method and designed a fast micro-target detection network, and we use the multi-clue model to correct the result of the fully pairing processing. According to the experimental results, our method achieves a higher score both in corner detection and in parking spot location. Because our proposed algorithm is aimed at the detection task of common parallelogram parking spots, it is very sensitive to the deformation of parking spots. In addition, the sample number of the used datasets is small, and the scene type is single. So in the future, we will both improve the parking spot generation strategy to strengthen the algorithm robustness, and extend the datasets by adding more scene types and surrounding environmental factors.