1 Introduction

Image stitching technology is widely used. An ordinary camera's limited shooting angle cannot capture all elements of a scene, while a panoramic camera is objectively expensive. To broaden the perspective of ordinary cameras, researchers have studied image stitching in depth, achieved solid results, and applied them in many fields, including autonomous driving [1, 2], geoscience [3,4,5], electrical engineering [6], virtual reality, and others [7].

Since consistent exposure across the original images is difficult to achieve, stitched results suffer from discontinuous boundaries, misalignment, ghosting, and many other artifacts. The image fusion process is the key to solving these problems. Image fusion algorithms fall into two categories: methods based on smooth transitions [8,9,10] and methods that find the optimal seamline. The former eliminates artifacts by aligning the images as closely as possible; methods in this class generally divide the image into areas and calculate a corresponding homography matrix for each, then apply spatially varying warps to these areas to align the overlapping regions, significantly reducing artifacts. The latter achieves the final result by optimizing the costs associated with the seams and sewing together the areas on either side of the seamline. The optimal seamline approach avoids complex algorithms and mitigates blurring, image misalignment, and other problems [11, 12], but stitching performance drops sharply when the number of feature matches is very small [13].

Many scholars have studied the optimal seamline method. Kerschner proposed the "twin snakes" technique [14], defining the energy as the sum of the mismatched pixels along the line. Li et al. fused the color, gradient magnitude, and texture complexity information of the images into the data term and used a new multi-frame joint optimization strategy to find the seamlines in multiple overlapping images at once [15]. Li et al. proposed an optimal seamline detection method based on CNN-based semantic image segmentation and a graph-cut energy minimization framework [16]. Hejazifar et al. proposed FARSE, a fast and robust seam estimation method that avoids visible seams and ghosting by defining grayscale weighted distances and gradient-domain difference regions [17]. Lin et al. introduced a new structure-preserving warping method to improve the stitching of images with large parallax [18]. Li et al. generated large-scale orthophotos by mosaicking multiple orthophotos, enabling a high-quality seamline network with fewer artifacts [19]. Zhang et al. found the optimal seamline for unmanned aerial vehicle (UAV) images by introducing optical flow into an improved energy function [20]. These methods have achieved good results in different fields. Beyond optimizing stitching quality itself, the human visual system (HVS) is of great significance in evaluating the stitching effect. As early as 2017, the HVS was shown to be useful for assessing the quality of image stitching [21]. Li et al. proposed a human-perception-based stitching method [22] that models the nonlinearity and inhomogeneity of human perception as an energy minimization, finding seams more compatible with the HVS.

We propose an image stitching method based on the HVS and the scale-invariant feature transform (SIFT) algorithm [23], which finds the optimal seamline to complete the stitching task. In our experiments, we evaluate the stitching performance; the results show that our method conforms to the characteristics of the HVS and greatly improves stitching quality. The main contributions of our work are summarized as follows.

  • We propose an optimal-seamline-based image stitching method that combines the SIFT algorithm with an HVS-based quantization of the preprocessed images to find the optimal seamline, and finally applies a multi-scale fusion algorithm to make the seamline almost invisible.

  • We build an attribute relationship model based on the HVS that connects the properties of the HVS, making them better suited to our method.

The remaining structure of this paper is organized as follows: Sect. 2 describes the related work, Sect. 3 introduces the methods used in this paper, Sect. 4 discusses and analyzes the experimental results, and Sect. 5 summarizes the whole paper.

2 Related work

2.1 Image stitching

The core goal of image stitching is to align the overlapping areas of images [10]. Image stitching consists of three parts: image preprocessing, image registration, and image fusion [24]. Many scholars have optimized the stitching process [25,26,27,28,29]. Recently, Jia et al. used the consistency of lines and points to preserve the linear structure in the stitching process to improve the image stitching quality [30]. Liao and Li proposed two types of single-perspective warping for natural-image stitching to reduce projection distortion [31].

The quality of image stitching is also closely related to the quality of the images themselves [32]. External disturbances such as lens distortion, photorefractive effects, exposure, and brightness differences are inevitable when taking pictures [33]. Under these factors, feature point detection and matching produce many mismatches, which lowers the accuracy of the homography matrix used for stitching and causes many image discontinuities [34]. These discontinuities are precisely where human visual attention falls, so avoiding seam problems to improve stitching quality is the key research question.

2.2 Human visual system

The nervous system regulates human eye activity, and human perception of images is influenced by both physiological and psychological factors. The HVS, as an image processing system, perceives images non-uniformly and nonlinearly. For images, the main characteristics of the HVS are generally expressed in three aspects: brightness, frequency domain, and image type characteristics. The brightness characteristic is one of the most fundamental: it concerns the sensitivity of the human eye to changes in brightness. The human eye is less sensitive to noise in regions of high luminance; when an image area is relatively bright, the eye is insensitive to changes in gray value [35], so people easily ignore details in the background [36]. If the discontinuous edges of a stitched image fall in areas of high visual attention, image quality suffers; if they fall in a masked background, the eye can hardly observe the undesirable areas, and perceived image quality improves.

3 Method

The process of image stitching is shown in Fig. 1. First, we compensate image brightness, contrast, and saturation to remove significant brightness differences that could affect image registration and fusion. Then, features are extracted from the processed images and registered using the SIFT algorithm and the random sample consensus (RANSAC) algorithm to obtain the corresponding homography matrix [37]. The overlapping images are visually quantized over four characteristics: brightness characteristic, brightness difference, masking characteristic, and visual attention. On this basis, we establish the attribute relationship model and consult the edge detection result to find the candidate area of the optimal seamline. We adjust the parameters of the attribute relationship model to select the optimal seamline and finally fuse the images to obtain a seamless stitched result. These steps are described in detail in the following sections.

Fig. 1 Proposed image stitching method workflow

3.1 Image preprocessing

To avoid large color differences in the stitched image, during preprocessing we analyze the brightness, saturation, and contrast of the two images and compensate the one with the poorer visual appearance, so that the brightness, saturation, and contrast of the two images appear basically consistent to the human eye without affecting the salient information of the image.

Let the original pixel grayscale be \(f(i,j)\) and the transformed pixel grayscale be \(g(i,j)\); the linear transformation of Eq. (1) adjusts brightness and contrast, where the coefficient \(\mathrm{con}\) controls the image contrast and the coefficient \(\mathrm{lumin}\) controls the brightness.

$$ g\left( {i,j} \right) = {\text{con}}*f\left( {i,j} \right) + {\text{lumin}} $$
(1)
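For illustration, a minimal NumPy sketch of Eq. (1) follows; the clipping back to the 8-bit range is an implementation detail not stated in the text.

```python
import numpy as np

def adjust_brightness_contrast(f, con=1.0, lumin=0.0):
    """Linear transform of Eq. (1): g(i,j) = con * f(i,j) + lumin."""
    g = con * f.astype(np.float32) + lumin       # con: contrast, lumin: brightness
    return np.clip(g, 0, 255).astype(np.uint8)   # clip back to the 8-bit range
```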

We adjust the parameters of one image with reference to the other until the brightness and contrast of their overlapping parts are basically the same. The higher the saturation, the fuller the color; we fine-tune the saturation of the two images as follows.

First, set a parameter \(p\) that takes values from −100 to 100 and normalize it to between −1 and 1. For each pixel, calculate the maximum value \(L{\text{Max}}\) and the minimum value \(L{\text{Min}}\) over the three RGB channels to obtain \({\text{Para}}1\) and \({\text{Para}}2\), as shown in Eqs. (2) and (3).

$$ {\text{Para}}1 = \left( {L{\text{Max}} - L{\text{Min}}} \right)/255 $$
(2)
$$ {\text{Para}}2 = \left( {L{\text{Max}} + L{\text{Min}}} \right)/255 $$
(3)

If \(L{\text{Max}}\) and \(L{\text{Min}}\) are equal, the pixel is gray, and we move on to the next pixel. The \(L\) and \(S\) values of the hue, saturation, lightness (HSL) color model are calculated from \({\text{Para}}1\) and \({\text{Para}}2\), as shown in Eqs. (4) and (5).

$$ L = \left( {L{\text{Max}} + L{\text{Min}}} \right)/510 $$
(4)
$$ S = \left\{ {\begin{array}{*{20}l} {\frac{{{\text{Para}}1}}{{{\text{Para}}2}},} \hfill & {L < 0.5} \hfill \\ {\frac{{{\text{Para}}1}}{{2 - {\text{Para}}2}},} \hfill & {L \ge 0.5} \hfill \\ \end{array} } \right. $$
(5)

Two cases are distinguished according to the value of \(p\). When \(p \ge 0\), the color saturation is increased, and \({\text{Para}}3\) is obtained from Eq. (6). Let \(K\) denote the value \(L*255\). The adjusted RGB channel value \(M\) is then calculated from Eq. (7).

$$ {\text{Para}}3 = \left\{ {\begin{array}{*{20}l} {S,} \hfill & {p + S \ge 1} \hfill \\ {1 - p,} \hfill & {p + S < 1} \hfill \\ \end{array} } \right. $$
(6)
$$ M = M + \left( {M - K} \right)*{\text{Para}}3 $$
(7)

If \(p < 0\), the color saturation is reduced; in this case \({\text{Para}}3 = p\), and the adjusted RGB value \(M\) is computed as shown in Eq. (8).

$$ M = K + \left( {M - K} \right)*\left( {1 + {\text{Para}}3} \right) $$
(8)
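The whole saturation adjustment of Eqs. (2)–(8) can then be sketched as follows; the vectorized formulation and the small epsilon guarding the divisions of Eq. (5) are implementation choices rather than part of the original description.

```python
import numpy as np

def adjust_saturation(img, p):
    """Saturation fine-tuning following Eqs. (2)-(8).

    img : H x W x 3 uint8 RGB image; p : user parameter in [-100, 100].
    """
    p = p / 100.0                                 # normalize p to [-1, 1]
    rgb = img.astype(np.float32)
    lmax = rgb.max(axis=2)                        # LMax over the RGB channels
    lmin = rgb.min(axis=2)                        # LMin over the RGB channels
    gray = lmax == lmin                           # gray pixels are left unchanged

    para1 = (lmax - lmin) / 255.0                 # Eq. (2)
    para2 = (lmax + lmin) / 255.0                 # Eq. (3)
    L = (lmax + lmin) / 510.0                     # Eq. (4), i.e. Para2 / 2
    S = np.where(L < 0.5,
                 para1 / np.maximum(para2, 1e-6),         # Eq. (5), L < 0.5
                 para1 / np.maximum(2.0 - para2, 1e-6))   # Eq. (5), L >= 0.5

    K = (L * 255.0)[..., None]                    # K = L * 255, per pixel
    if p >= 0:
        para3 = np.where(p + S >= 1.0, S, 1.0 - p)        # Eq. (6)
        out = rgb + (rgb - K) * para3[..., None]          # Eq. (7)
    else:
        out = K + (rgb - K) * (1.0 + p)                   # Eq. (8) with Para3 = p
    out[gray] = rgb[gray]
    return np.clip(out, 0, 255).astype(np.uint8)
```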

The compensation results for brightness, contrast, and saturation can be seen in Fig. 1.

3.2 SIFT and RANSAC feature extraction and registration

To determine the homography matrix, we use the SIFT and RANSAC algorithms to find and match feature points between the images. The RANSAC algorithm is widely used in computer vision and mathematics, for example in line fitting, plane fitting, and computing transformation matrices between images or point clouds. Feature extraction and registration include the following steps: build the scale space and detect key points; determine the dominant orientation of each key point and compute its descriptor; match key points and eliminate mismatched pairs. Figure 2 shows the results of feature extraction and matching.

Fig. 2 Results of SIFT and RANSAC feature extraction: a key point detection, b SIFT key point matching, c RANSAC removing mismatched key points
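As a minimal OpenCV sketch of this pipeline (the file names and the 0.75 ratio-test threshold are assumptions, since the paper does not state its matching parameters):

```python
import cv2
import numpy as np

# Detect SIFT key points, match them, and estimate the homography with RANSAC.
img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file names
img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test discards ambiguous matches before RANSAC.
matcher = cv2.BFMatcher()
matches = [m for m, n in matcher.knnMatch(des1, des2, k=2)
           if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC rejects the remaining mismatched pairs and returns the homography H.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```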

3.3 Human visual system characteristic

This section introduces how the four HVS concepts of brightness characteristic, brightness difference, masking characteristic, and visual attention relate to image stitching quality.

3.3.1 Brightness characteristic

Human visual perception is poor at judging the absolute brightness of objects but extremely sensitive to relative differences in brightness, a property known as high contrast sensitivity. There are two common definitions of the luminance contrast between an object and its surrounding background: Weber contrast [38] and Michelson contrast. Weber contrast \(C_{{{\text{web}}}}\) is defined in Eq. (9), where \(L\) and \(L_{b}\) are the brightness of the object and of the background, respectively.

$$ C_{{{\text{web}}}} = \frac{{L - L_{b} }}{L} $$
(9)

The brightness characteristic of the human eye describes the relationship between the physical brightness of an object and the brightness perceived subjectively. Existing research has shown that human perception of image and object brightness follows a logarithmic function. According to Weber's law, the subjective perception of brightness is related to the brightness \(L\) of the image as shown in Eq. (10), where \(T\) is a constant related to the average brightness of the entire image, \(t^{\prime} = T\ln 10\), \(T_{0}\) is a constant, and \(L\) is the objective brightness value. The human eye discriminates discontinuities far better in dark regions than in bright ones, so positioning the seamline in a bright area can significantly improve the stitching effect. We define the brightness characteristic as \({\text{BC}}\).

$$ {\text{BC}} = T\ln L + T_{0} = t^{\prime}\lg L + T_{0} $$
(10)
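For illustration, Eq. (10) can be computed per pixel as follows; the values of \(T\) and \(T_{0}\) are placeholders, since the paper only states that \(T\) depends on the mean image brightness.

```python
import numpy as np

def brightness_characteristic(L, T=50.0, T0=0.0):
    """Eq. (10): BC = T * ln(L) + T0 (equivalently t' * lg(L) + T0, t' = T ln 10)."""
    L = np.maximum(np.asarray(L, dtype=np.float64), 1.0)  # avoid log(0)
    return T * np.log(L) + T0
```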

3.3.2 Brightness difference

Differentiating Eq. (10) yields Eq. (11), which shows that the perceived brightness difference varies linearly with the relative change in actual brightness. Here, \(d\left( {{\text{BC}}} \right)\) and \({\text{d}}L\) are the differentials of subjective and objective brightness, respectively. We define the brightness difference \({\text{BD}}\) as shown in Eq. (12).

$$ d\left( {{\text{BC}}} \right) = T\frac{{{\text{d}}L}}{L} $$
(11)
$$ {\text{BD}} = 255 - \left| {{\text{average}}\left( {L_{1} } \right) - {\text{average}}\left( {L_{2} } \right)} \right| $$
(12)

where \(L_{1}\) and \(L_{2}\) are the brightness values of pixels with the same coordinates in the overlapping areas of the two images to be stitched, and the \({\text{average}}\) function computes the mean brightness of the pixels adjacent to the current pixel. From the preceding analysis, stitching in a low brightness difference area (high \({\text{BD}}\)) yields better quality than stitching in a high brightness difference area.
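A sketch of Eq. (12) follows, assuming a small box filter implements the \({\text{average}}\) neighborhood; the window size is not specified in the paper.

```python
import cv2
import numpy as np

def brightness_difference(L1, L2, ksize=5):
    """Brightness difference BD of Eq. (12) for two aligned overlap regions.

    L1, L2 : uint8 grayscale overlap regions of the same shape
    ksize  : neighborhood size of the average() function (an assumption)
    """
    m1 = cv2.blur(L1.astype(np.float32), (ksize, ksize))  # average() of Eq. (12)
    m2 = cv2.blur(L2.astype(np.float32), (ksize, ksize))
    return 255.0 - np.abs(m1 - m2)   # larger BD = smaller brightness difference
```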

3.3.3 Masking characteristics

Masking is an important characteristic of the HVS and plays an important role in image processing. When multiple stimuli are present, their interaction can make some of them imperceptible, especially when a stimulus resembles its surroundings [39]; this is the masking characteristic. Visual masking is generally related to the spatial frequency, orientation, and position of the stimuli.

Image regions can be divided into textured, smooth, and other regions. Mismatches in smooth regions have little effect on the results, while the more cluttered the edge information in a textured region, the more it helps to improve the quality of the stitched image. The masking quantization formula is shown in Eq. (13), where \({\text{MC}}\) is the masking value of each pixel and \(x_{\max } - x_{\min }\) is the difference between the maximum and minimum grayscale values in the neighborhood window. \(\alpha\) and \(\beta\) are constants, set to 0.8 and 1 following the experience of Cao et al. Outside these two cases, the \({\text{MC}}\) of pixels in other regions is 0.

$$ {\text{MC}} = \left\{ {\begin{array}{*{20}l} {\alpha *\left( {255 - x_{\max } + x_{\min } } \right),} \hfill & {\quad \text{pixels in smooth regions}} \hfill \\ {\beta *H,} \hfill & {\quad \text{pixels in texture regions}} \hfill \\ \end{array} } \right. $$
(13)

Textured regions can mask discontinuous edges during image stitching, and the complexity of each region is determined by Eq. (14), where \(H\) is the local entropy, \(l\) and \(w\) are the length and width of the window around the pixel being processed, \(n_{ij}\) denotes a grayscale value within the window, and \(l*w\) is the size of the pixel's neighborhood window.

$$ H = - \mathop \sum \limits_{i = 0}^{l - 1} \mathop \sum \limits_{j = 0}^{w - 1} \frac{{n_{ij} }}{l*w}*\left( {\log n_{ij} - \log \left( {l*w} \right)} \right) $$
(14)
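The sketch below computes the local entropy of Eq. (14) and the \({\text{MC}}\) map of Eq. (13). The window size, the local-range threshold separating smooth from textured regions, and the treatment of every non-smooth pixel as textured (the paper's "other" regions with \({\text{MC}} = 0\) are not characterized further) are all assumptions; the loop-free clarity is favored over speed.

```python
import numpy as np
from scipy.ndimage import generic_filter

ALPHA, BETA = 0.8, 1.0        # constants from the text (after Cao et al.)

def local_entropy(window):
    """Eq. (14): entropy of the gray-level counts n_ij inside an l x w window."""
    counts = np.bincount(window.astype(np.int64), minlength=256)
    counts = counts[counts > 0].astype(np.float64)
    n = window.size                       # l * w
    return float(-np.sum(counts / n * (np.log(counts) - np.log(n))))

def masking(gray, win=7, smooth_thresh=20):
    """Masking value MC of Eq. (13) for a uint8 grayscale image."""
    g = gray.astype(np.float64)
    xmax = generic_filter(g, np.max, size=win)
    xmin = generic_filter(g, np.min, size=win)
    H = generic_filter(g, local_entropy, size=win)
    mc = np.zeros_like(g)
    smooth = (xmax - xmin) < smooth_thresh          # low local range -> smooth
    mc[smooth] = ALPHA * (255.0 - xmax[smooth] + xmin[smooth])
    mc[~smooth] = BETA * H[~smooth]                 # the rest treated as textured
    return mc
```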

3.3.4 Visual attention

When people observe images, psychological factors lead them to divide the image into different areas, treat these areas separately, and sometimes focus on only part of them. Distortion in the areas the human eye focuses on is more noticeable than distortion elsewhere. Every pixel of an image has a saliency value, and pixels with higher saliency have a greater impact on image quality. If the seamline passes through an area of high visual attention, it will degrade the quality of the stitched image.

We choose the SDSP model proposed by Zhang et al. [40] to calculate the visual attention value \({\text{VA}}\) of each pixel; the definition is shown in Eqs. (15)–(18).

$$ {\text{VA}}\left( x \right) = V_{F} \left( x \right)*V_{D} \left( x \right)*V_{C} \left( x \right) $$
(15)

\(V_{F} \left( x \right)\) is a saliency map modeled by band-pass filtering: the image \(p\left( x \right)\) is converted to the \({\text{CIEL}}^{*} a^{*} b^{*}\) opponent color space, and the resulting three channels are denoted \(p_{L} \left( x \right)\), \(p_{a} \left( x \right)\), and \(p_{b} \left( x \right)\). \(f\) is the transfer function of the log-Gabor filter.

$$ V_{F} \left( x \right) = \sqrt[2]{{\left( {p_{L} {*}f} \right)^{2} + \left( {p_{a} {*}f} \right)^{2} + \left( {p_{b} {*}f} \right)^{2} }} $$
(16)

\(V_{C} \left( x \right)\) is the color saliency, \(\theta_{C}\) is a parameter, and \(p_{an} \left( x \right)\) and \(p_{bn} \left( x \right)\) are linear mappings of \(p_{a} \left( x \right)\) and \(p_{b} \left( x \right)\), respectively.

$$ V_{C} \left( x \right) = 1 - e^{{ - \left( {\frac{{p_{an}^{2} \left( x \right) + p_{bn}^{2} \left( x \right)}}{{\theta_{C} }}} \right)}} $$
(17)

\(V_{D} \left( x \right)\) is the location saliency; studies have shown that objects near the center of an image attract more attention. \({\text{center}}\) is the center of the image \(p\left( x \right)\), and \(\theta_{D}\) is a parameter.

$$ V_{D} \left( x \right) = e^{{\left( { - \frac{{\left| {\left| {x - {\text{center}}} \right|} \right|_{2}^{2} }}{{\theta_{D} }}} \right)}} $$
(18)

Selecting nonsalient areas according to the VA model when searching for the optimal seamline leads to better results.
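A simplified sketch of Eqs. (15)–(18) follows. A difference of Gaussians stands in for the log-Gabor transfer function \(f\) of the original SDSP model, and both \(\theta\) values are placeholders, so the released SDSP implementation should be preferred when faithful saliency values are needed.

```python
import cv2
import numpy as np

def sdsp_saliency(img, theta_d=13000.0, theta_c=0.25):
    """Simplified sketch of Eqs. (15)-(18) for a uint8 BGR image."""
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB).astype(np.float32)
    pL, pa, pb = cv2.split(lab)

    def bandpass(ch):                 # stand-in for the log-Gabor filtering in Eq. (16)
        return cv2.GaussianBlur(ch, (0, 0), 1.0) - cv2.GaussianBlur(ch, (0, 0), 8.0)

    vf = np.sqrt(bandpass(pL) ** 2 + bandpass(pa) ** 2 + bandpass(pb) ** 2)  # Eq. (16)

    h, w = pL.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    vd = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / theta_d)                # Eq. (18)

    def norm01(ch):                   # linear mappings p_an, p_bn used in Eq. (17)
        return (ch - ch.min()) / (ch.max() - ch.min() + 1e-6)

    pan, pbn = norm01(pa), norm01(pb)
    vc = 1.0 - np.exp(-(pan ** 2 + pbn ** 2) / theta_c)                      # Eq. (17)

    return vf * vd * vc               # Eq. (15): VA = V_F * V_D * V_C
```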

3.4 Establish attribute relationship model

Image preprocessing yields two images with similar brightness: image1 and transformed image2. We find feature points with the SIFT algorithm, use the RANSAC algorithm to optimize the feature point pairs and filter out mismatches, and denote the output homography matrix by \(H\).

According to the analysis of Sect. 3.3, brightness characteristics, brightness difference, masking characteristics, and visual attention all affect the quality of image stitching. We build an attribute relationship model that weights these four attributes by their magnitude of influence, as shown in Eq. (19), where \(\mu_{1}\), \(\mu_{2}\), \(\mu_{3}\) are constants and \(\mu_{1} + \mu_{2} + \mu_{3} = 0.9\).

$$\begin{aligned} {\text{ARM}} = \mu_{1} *{\text{BC}} + 0.1*{\text{BD}} + \mu_{2} *{\text{MC}} + \mu_{3} *\left( {255 - {\text{VA}}} \right) \end{aligned} $$
(19)

Since we mainly rely on the comprehensive feature of the SDSP model, \(\mu_{3}\) is set higher than \(\mu_{1}\) and \(\mu_{2}\). After image preprocessing the two images have similar brightness, so the brightness difference is not obvious and its weight is fixed at 0.1. The value of \(\mu_{1}\) is kept between 0.1 and 0.2, and the seamline is positioned by adjusting \(\mu_{2}\) and \(\mu_{3}\). The main steps are shown in Algorithm 1. We define P as the area where image1 and transformed image2 overlap.

Algorithm 1
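Since Algorithm 1 appears as an image, the following one-function sketch only restates Eq. (19); it assumes all four attribute maps have been scaled to [0, 255] over the overlap area P, so that larger ARM values mark better seamline candidates (bright, well-matched, masked, low-attention pixels). The default weights are the best setting found in Sect. 4.1.

```python
import numpy as np

def attribute_relationship_model(bc, bd, mc, va, mu1=0.1, mu2=0.25, mu3=0.55):
    """Eq. (19): ARM = mu1*BC + 0.1*BD + mu2*MC + mu3*(255 - VA)."""
    assert abs(mu1 + mu2 + mu3 - 0.9) < 1e-9     # constraint from the text
    return mu1 * bc + 0.1 * bd + mu2 * mc + mu3 * (255.0 - va)
```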

3.5 Optimal seamline selection and image fusion

According to the previous analysis, the seamline has the least impact on image quality when it lies in smooth or textured regions. We apply an edge detection algorithm to the image and use the resulting edge map to decide whether edge pixels are seamline candidates. If the optimal seamline crosses only a small number of strong edges, and its discontinuous weak edges fall in areas of low attention, the seamline is almost invisible.

The main process of the optimal seamline selection algorithm is given in Algorithm 2. Once the two images are aligned, the seamline has one intersection point with the top edge of the overlap and one with the bottom edge; these intersection points, start and end, are the two endpoints of the optimal seamline. The Find_edistance function returns the reciprocal of the Euclidean distance between its two arguments, point is the pixel currently being processed, next_point is the next candidate point, and the optimal seamline is a point set called optimal_seamline.

Algorithm 2
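Since Algorithm 2 also appears as an image, the greedy sketch below only mirrors its description in the text: the walk begins at start, moves one row down per step, and terminates at the bottom edge; the weight balancing the ARM value against the Find_edistance pull toward end is an assumption.

```python
import numpy as np

def find_edistance(p, q):
    """Reciprocal of the Euclidean distance between two points (Find_edistance)."""
    return 1.0 / (np.hypot(p[0] - q[0], p[1] - q[1]) + 1e-6)

def select_seamline(arm, start_col, end_col, w_dist=50.0):
    """Greedy walk over the ARM map from (0, start_col) to (h-1, end_col)."""
    h, w = arm.shape
    end = (h - 1, end_col)
    point = (0, start_col)                        # start: top-edge intersection
    optimal_seamline = [point]
    for row in range(1, h):
        # next_point candidates: the three pixels directly below the current one.
        candidates = [(row, c) for c in (point[1] - 1, point[1], point[1] + 1)
                      if 0 <= c < w]
        # Score = ARM at the candidate + weighted pull toward the end point.
        point = max(candidates,
                    key=lambda q: arm[q] + w_dist * find_edistance(q, end))
        optimal_seamline.append(point)
    return optimal_seamline
```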

After finding the optimal seamline, the overlapping area of the two images is divided into two parts along the seamline, and each part is filled from one of the two images. The images are then fused with the Laplacian pyramid method. The main steps are as follows: compute the Gaussian pyramid and the Laplacian pyramid of each input image; merge the Laplacian pyramids level by level; expand the upper levels of the merged Laplacian pyramid to the resolution of the original image, overlaying the levels one after another; finally, obtain the output image.
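A sketch of this fusion step, assuming float32 inputs of identical shape and a mask that is 1 on image1's side of the seamline and 0 on image2's side:

```python
import cv2
import numpy as np

def laplacian_pyramid_blend(img1, img2, mask, levels=5):
    """Fuse img1 and img2 across the seamline encoded by mask (same shape)."""
    g1, g2, gm = [img1], [img2], [mask]
    for _ in range(levels):                       # Gaussian pyramids
        g1.append(cv2.pyrDown(g1[-1]))
        g2.append(cv2.pyrDown(g2[-1]))
        gm.append(cv2.pyrDown(gm[-1]))

    blended = []
    for i in range(levels):                       # merge same-level Laplacians
        size = (g1[i].shape[1], g1[i].shape[0])
        l1 = g1[i] - cv2.pyrUp(g1[i + 1], dstsize=size)   # Laplacian level of img1
        l2 = g2[i] - cv2.pyrUp(g2[i + 1], dstsize=size)   # Laplacian level of img2
        blended.append(l1 * gm[i] + l2 * (1.0 - gm[i]))
    out = g1[levels] * gm[levels] + g2[levels] * (1.0 - gm[levels])

    for i in range(levels - 1, -1, -1):           # expand and overlay level by level
        size = (blended[i].shape[1], blended[i].shape[0])
        out = cv2.pyrUp(out, dstsize=size) + blended[i]
    return np.clip(out, 0, 255)
```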

4 Experimental results and analysis

The platform used for this method is Windows 10 on a PC with a 3.33 GHz CPU and 16 GB RAM; the program was written in MATLAB. The algorithm proposed in this paper is an image stitching method based on the HVS and the SIFT algorithm. To verify its effectiveness, we evaluate stitching quality through a series of experiments, whose results show that the proposed method improves the quality of image stitching. We experimented with visually representative images, all taken of natural scenes. We first compare the results under different parameters to determine their influence on the position of the optimal seamline. Second, we compare our results with the methods of Li et al. and Cao et al.; the results show that our method is more effective. We also add a subjective evaluation of image quality. Finally, we analyze the limitations of our method. To ensure a fair comparison, all test methods use the same matching data and run on the same host.

4.1 Self-comparison experiments

Before the comparative experiments, we first introduce the images of the four HVS attributes produced during the experiment.

In Fig. 3, we show the HVS attribute relationship graphs, all computed on the overlap of image1 and transformed image2. In the brightness difference graph of Fig. 3d, white pixels indicate a small brightness difference at that position; the image is preprocessed according to this graph, and a small brightness difference near the seamline is expected for a better stitching effect. In Fig. 3e, the brightest parts of the image are the visual attention points, which people notice first when looking at the image. Figure 3f shows the attribute relationship graph obtained by applying the attribute relationship model to the four preceding graphs.

Fig. 3 HVS attribute graphs: a two images directly stitched, b brightness characteristic, c masking characteristics, d brightness difference, e visual attention, f attribute relationship graph

Next, we search for the optimal seamline on the established attribute relationship graph. Figure 4 shows how the seamline position differs when the three parameters are assigned differently.

Fig. 4 Influence of the parameters on the seamline position: a \(\mu_{3}\), \(\mu_{2}\), \(\mu_{1}\) = 0.55, 0.25, 0.1; b \(\mu_{3}\), \(\mu_{2}\), \(\mu_{1}\) = 0.5, 0.2, 0.2; c \(\mu_{3}\), \(\mu_{2}\), \(\mu_{1}\) = 0.5, 0.3, 0.1; d \(\mu_{2}\) = 0.9; e \(\mu_{3}\) = 0.9; f \(\mu_{1}\) = 0.9

In Fig. 4, the most eye-catching elements are the tall building in the middle and the sign on it, and the seamline position clearly changes with the attribute relationship model. When only \({\text{MC}}\) or \({\text{BC}}\) is considered, the seamline passes through the central building and its sign, leading to poor stitching. When only \({\text{VA}}\) is considered, the seamline runs close to the central building. Comparing Fig. 4a–c, the seamline of Fig. 4b passes through the central building, that of Fig. 4a avoids it, and that of Fig. 4c approaches it. Fusing along the different seamlines shows that Fig. 4a gives the best stitching effect; its stitched result can be seen in Fig. 7.

The blind/referenceless image spatial quality evaluator (BRISQUE) is a no-reference spatial-domain image quality assessment algorithm; the larger the score, the worse the image quality. We used BRISQUE to evaluate the images produced under the three attribute relationship graphs: Fig. 4a scored 25.52, which is 1.15 lower than the score of Fig. 4b and 0.75 lower than that of Fig. 4c.

In summary, the attribute relationship model influences the final stitching effect, and finding the optimal seamline is the key to good experimental results.

4.2 Comparative experiment

We used multi-scale fusion for the final fusion of the image. To show more clearly the influence of image preprocessing and image fusion on the final result, we used the BRISQUE algorithm to compare stitching results with and without preprocessing, and direct seamline-based fusion with multi-scale fusion. Table 1 shows that both preprocessing and multi-scale fusion improve the experimental results.

Table 1 Ablation study of the image quality modules

Figures 5, 6, 7, 8, and 9 show the stitching results of direct SIFT-based fusion, the method of Cao et al., the method of Li et al., and our method on five datasets. The highlighted regions show that SIFT stitching with direct fusion produces an obvious dividing line; the method of Cao et al. suffers from misalignment and ghosting; and the method of Li et al. may introduce brightness differences and misalignment, harming the visual effect. Our method effectively avoids seams, ghosting, and misalignment. In summary, our stitching effect is better.

Fig. 5 Stitching effect on the bike dataset

Fig. 6 Stitching effect on the stone dataset

Fig. 7 Stitching effect on the building dataset

Fig. 8 Stitching effect on the stage dataset

Fig. 9 Stitching effect on the sunset dataset

We score the five groups of images in Figs. 5, 6, 7, 8, and 9 with the BRISQUE algorithm; the results are shown in Table 2.

Table 2 BRISQUE algorithm scoring results

To illustrate our results further, Fig. 10 provides nine additional sets of experiments.

Fig. 10 More results of our method

As seen in Table 2, our algorithm produces the best-quality stitching results. Figure 11 shows that the score line of our method lies below those of the other two methods, i.e., the best result. Scoring the images in the dataset with the BRISQUE algorithm (excluding failure cases), the average score of our method is lower than those of the methods of Li et al. and Cao et al., which shows that our stitching quality is higher and our algorithm outperforms the other two.

Fig. 11 BRISQUE algorithm score lines

In this paper we focus on image stitching under the HVS, and subjective evaluation is an essential part of image quality assessment, so we also introduce a subjective evaluation. We conducted two user studies comparing our stitching results with those of the other two methods. We invited 20 participants, including 10 researchers with computer vision backgrounds, to evaluate the unlabeled stitching results. We displayed the two input images, our result, and a competitor's result on a large screen, placing our result each time in random order relative to the other. The options available to each user were: (1) A is better, (2) B is better, (3) both good, (4) both bad. The evaluation results are shown in Fig. 12; our results are clearly preferred by users.

Fig. 12 User study on visual quality. The numbers are percentages averaged over 20 participants

4.3 Limitations

Seamline-based image stitching requires the images to have an overlapping area in which the optimal seamline can be found. If the two input images have particularly large parallax, or the salient structure of the image is very complex, no optimal seamline may exist in the established attribute relationship graph. Figure 13 shows a failure case: the images have large parallax, high-quality alignment is impossible during the alignment phase, and this directly affects the final result.

Fig. 13 A failure example. The red rectangle indicates the unsatisfactory stitched areas

5 Conclusion

The image stitching method proposed in this paper, combining the SIFT algorithm with the HVS, is a good solution for improving image quality as perceived by human vision. The method quantifies human visual characteristics to position the seamline of the two images to be stitched, avoiding high-perception areas as much as possible. Before using the SIFT algorithm to obtain the homography matrix, preprocessing minimizes the brightness difference between the two images and fine-tunes their saturation and contrast to make them more suitable for subsequent processing. We then build the attribute relationship model, determine the optimal seamline, and perform multi-scale fusion to obtain the final result. Analysis of a series of comparative experiments shows that our method achieves a superior visual effect and a good stitching effect under human vision. In future work, we will develop more flexible adaptive brightness preprocessing methods to eliminate the need for manual parameter adjustment during image preprocessing, and we will introduce machine learning methods to further improve efficiency.