Introduction. Digital photo and video cameras are now essential elements of measurement technology and control systems and are used in various fields of science and technology, such as physics and astronomy [1], biology [2], and medicine [3]. Cameras are also used for studying the relief of other planets [4], analyzing the ozone layer [5], microscopy [6], analyzing oils [7], restoring three-dimensional targets [8], videogrammetry [9], coding [10], measuring microreliefs [11], and measuring the distribution of the velocity fields of air swirls [12]. One of the current problems of modern photographic technology is the characterization of cameras. Characterization refers to the determination of the noise characteristics of the camera, namely, the dark temporal noise, the light (photo response non-uniformity, PRNU) and dark (dark signal non-uniformity, DSNU) spatial noise components, the light temporal noise (obtained by plotting the dependence of the temporal noise on the signal level), and the conversion factor of electrons into digital units [13]. Most manufacturers do not specify some of the listed characteristics, even for cameras used in scientific research. However, when choosing a camera for a particular scientific task, the noise characteristics should be known so that their influence on the image quality can be reduced during computer processing [14].

Standard, accurate camera characterization methods (e.g., the method specified in the European Machine Vision Association standard EMVA 1288 [13]) require a series of light images at different illumination levels and a series of dark frames. Depending on the accuracy required, a series can contain a different number of frames (the minimum possible number is 2). However, according to the standard, the curve of the temporal noise dependence should contain at least 50 points distributed over all possible levels of the photosensor signal. Thus, to satisfy the minimum requirements of the standard method, at least 100 images of various uniform targets must be taken, from which the camera characteristics are calculated. Moreover, possible shading defects [15] and vignetting [16] must be suppressed, as disregarding them significantly decreases the accuracy of the results.

Thus, the standard method for measuring the noise parameters of cameras has a noticeable disadvantage, namely, its long implementation time. To accelerate the determination of the noise characteristics, some works [17, 18] proposed several methods for the fast characterization of cameras based on the segmentation of recorded targets. These methods make it possible to estimate the camera noise much faster than the standard method [19]. Depending on the required time and accuracy of the noise estimate, the error of the results obtained may correspond to the standard method error or remain at an acceptable level.

The present study aims to perform a comparative analysis of the measurement accuracy of the main noise component (temporal noise) and of the implementation time for photosensors of cameras designed for various purposes, using the standard method (EMVA 1288) and the automatic segmentation of non-uniform target (ASNT) method.

Description of the noise model. The EMVA 1288 standard proposes a simple model for the classification and description of the noise of digital camera photosensors. All noise is divided into dark and light components. Dark noise mostly appears at low signal levels and describes the random production of electrons and dark currents. Noise can also be temporal or spatial. Temporal noise includes fluctuations of signals in pixels caused by random processes, whereas spatial noise includes inhomogeneities of the registration system that appear as a result of minor differences in the system elements. For the light component, spatial noise is the PRNU, and for the dark component, it is the DSNU. The total noise at a given signal level is presented as the mean-square deviation (MSD). The MSD of the total noise, which combines all noise components, is

$$ {\sigma}_{\mathrm{t}}=\sqrt{S/K+{S}^2{PRNU}^2+{DSNU}^2+{\sigma}_{\mathrm{d}0}^2+{\upmu}_{I_{\mathrm{ref}}}{2}^{\left(T-{T}_{\mathrm{ref}}\right)/{T}_{\mathrm{d}}}{t}_e}, $$
(1)

where σt is the total noise; S is the signal in digital units; K is the conversion factor of generated electrons into digital units; PRNU and DSNU are the inhomogeneities of the photosensitivity of pixels and the dark signal, respectively; σd0 is the dark temporal noise; Tref is the reference temperature at which the noise parameters were estimated; μIref is the dark current at the reference temperature; Td is the temperature corresponding to the dark current doubling; T is the effective temperature value; and te is the exposure time.

Because the light component of the temporal noise is actually photon shot noise, it is approximated by the Poisson distribution [15]. Therefore, its MSD is the square root of the pixel signal expressed in the number of generated electrons, which is related to the signal in digital units through the conversion factor K, which depends on the dynamic range and digit capacity of the analog-to-digital converter (ADC). The last term in Eq. (1) presents the dependence of the dark temporal noise on the temperature and time of exposure.
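To make the structure of Eq. (1) easier to follow, the sum under the root can be evaluated term by term. The sketch below is only an illustration: the function name `total_noise` and all numerical values are hypothetical and are not taken from the cameras studied here.

```python
import math

def total_noise(S, K, prnu, dsnu, sigma_d0, mu_i_ref, T, T_ref, T_d, t_e):
    """Total noise MSD (digital units) as the root of the summed terms of Eq. (1)."""
    shot_var = S / K                                   # light temporal (shot) noise
    prnu_var = (S * prnu) ** 2                         # light spatial noise
    dsnu_var = dsnu ** 2                               # dark spatial noise
    dark_var = sigma_d0 ** 2                           # dark temporal noise
    thermal_var = mu_i_ref * 2.0 ** ((T - T_ref) / T_d) * t_e  # dark-current term
    return math.sqrt(shot_var + prnu_var + dsnu_var + dark_var + thermal_var)

# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
sigma_total = total_noise(S=1000.0, K=4.0, prnu=0.01, dsnu=2.0, sigma_d0=3.0,
                          mu_i_ref=0.5, T=30.0, T_ref=20.0, T_d=10.0, t_e=2.0)
print(f"total noise: {sigma_total:.2f} digital units")
```

With these assumed values, the five terms under the root are 250, 100, 4, 9, and 2, so the shot-noise and PRNU terms dominate at this signal level.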

Description of the standard method. The standard method for characterizing cameras involves obtaining the temporal and spatial components of the noise of a photosensor [13]. It includes registering at least 50 series of images at various illumination levels, including the minimum level (with the digital camera cover closed). Each series must consist of two or more frames. After the registration of the series, the noise is determined.

The following equations are used to calculate the temporal noise at each illumination level:

$$ {\upsigma}_d^E=\sqrt{\frac{\sum_{m,n}{\left({D}_1\left[m,n\right]-{D}_2\left[m,n\right]\right)}^2}{2 MN}}; $$
(2)
$$ {\upsigma}_l^E=\sqrt{\frac{\sum_{m,n}{\left({L}_1\left[m,n\right]-{L}_2\left[m,n\right]\right)}^2}{2 MN}-{\left({\upsigma}_d^E\right)}^2}, $$
(3)

where \( {\upsigma}_d^E \) and \( {\upsigma}_l^E \) are the dark and light temporal noises, respectively; D1[m, n] and D2[m, n] are the signal values in the matrices of two dark images; L1[m, n] and L2[m, n] are the signal values in the matrices of two light images with the same brightness; M and N are the horizontal and vertical dimensions of the signal matrix; and m and n are the row and column numbers, respectively.

Then, the dependences of the temporal noise σ on the signal level S are plotted, and the conversion factor K is determined by their approximation.
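The two-frame estimates of Eqs. (2) and (3) can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic frames, not the processing code used in the study: the function names, the frame size, and the noise levels are assumptions, and the clamp at zero is an added guard against a negative difference of variances.

```python
import numpy as np

def pair_noise(frame1, frame2):
    """MSD of temporal noise from two frames of the same scene (form of Eq. (2))."""
    diff = frame1.astype(np.float64) - frame2.astype(np.float64)
    return np.sqrt(np.sum(diff ** 2) / (2 * diff.size))

def light_noise(light1, light2, sigma_d):
    """Light temporal noise, Eq. (3): pair variance minus the dark variance."""
    var = pair_noise(light1, light2) ** 2 - sigma_d ** 2
    return np.sqrt(max(var, 0.0))  # guard (assumption): clamp small negative values

# Synthetic frames with hypothetical noise levels (dark MSD 3, shot variance 250).
rng = np.random.default_rng(0)
shape = (256, 256)
d1 = 100.0 + rng.normal(0.0, 3.0, shape)
d2 = 100.0 + rng.normal(0.0, 3.0, shape)
l1 = 1000.0 + rng.normal(0.0, np.sqrt(9.0 + 250.0), shape)
l2 = 1000.0 + rng.normal(0.0, np.sqrt(9.0 + 250.0), shape)

sigma_d = pair_noise(d1, d2)
sigma_l = light_noise(l1, l2, sigma_d)
print(f"dark: {sigma_d:.2f}, light: {sigma_l:.2f}")
```

Taking the difference of two frames cancels the fixed spatial pattern, which is why only temporal noise remains in the estimate.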

Spatial noises are estimated at the minimum signal level for the dark component and at the average signal level for the light component according to the following equations:

$$ {DSNU}^E=\sqrt{\frac{\sum_{m,n}{\left(D\left[m,n\right]-\overline{D}\right)}^2}{MN-1}-{\left({\upsigma}_d^E\right)}^2}; $$
(4)
$$ {PRNU}^E=\sqrt{\frac{\sum_{m,n}{\left(L\left[m,n\right]-\overline{L}\right)}^2}{MN-1}-{\left({\upsigma}^E\right)}^2}/\left(\overline{L}-\overline{D}\right), $$
(5)

where \( \overline{L} \) is the average value of the signal of a uniformly illuminated light image L[m, n]; \( \overline{D} \) is the average value of the signal of the dark image D[m, n]; and \( {\upsigma}^E \) is the MSD of the temporal and dark spatial noises.
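A minimal sketch of the spatial noise estimates, assuming that Eqs. (4) and (5) subtract the known noise variance from the sample variance of a single frame, is given below. The function names and the synthetic fixed-pattern parameters are hypothetical, and the noise MSDs passed to the functions are the known simulation values rather than measured ones.

```python
import numpy as np

def dsnu_estimate(dark, sigma_d):
    """Eq. (4): spatial variance of a dark frame minus the dark temporal variance."""
    var = dark.astype(np.float64).var(ddof=1) - sigma_d ** 2
    return np.sqrt(max(var, 0.0))  # guard (assumption): clamp small negative values

def prnu_estimate(light, dark, sigma):
    """Eq. (5): residual spatial MSD of a light frame relative to the mean signal."""
    lt = light.astype(np.float64)
    var = lt.var(ddof=1) - sigma ** 2
    return np.sqrt(max(var, 0.0)) / (lt.mean() - dark.astype(np.float64).mean())

# Synthetic sensor with hypothetical parameters: DSNU of 2 digital units, PRNU of 1%.
rng = np.random.default_rng(0)
shape = (256, 256)
offset = rng.normal(0.0, 2.0, shape)   # fixed dark pattern
gain = rng.normal(1.0, 0.01, shape)    # fixed sensitivity pattern
dark = 100.0 + offset + rng.normal(0.0, 3.0, shape)
light = 100.0 + offset + 900.0 * gain + rng.normal(0.0, 16.0, shape)

dsnu = dsnu_estimate(dark, sigma_d=3.0)
# The subtracted MSD combines the temporal noise (16) and the dark spatial noise (2).
prnu = prnu_estimate(light, dark, sigma=np.sqrt(16.0 ** 2 + 2.0 ** 2))
print(f"DSNU: {dsnu:.2f} digital units, PRNU: {100 * prnu:.2f}%")
```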

The standard method provides high accuracy in estimating temporal and spatial noises, which can be significantly increased by increasing the number of frames in a series.

The main disadvantages of the standard method, in addition to the duration of implementation, are the additional steps required to obtain images of sufficient uniformity, which is usually complicated by the vignetting [16] and shading effects [15] and the inhomogeneity of the light flux used.

Vignetting refers to the decrease in image brightness at the edges of the field of view because the optical system limits the incidence of oblique light beams. Shading refers to the darkening of image areas due to uneven lighting and dust or dirt on the surfaces of the lenses or the sensor itself. Both vignetting and shading produce dark areas in the images of uniform targets, which introduces a significant error when conducting experiments.

Different methods of dealing with such defects for color and grayscale images have been proposed [16, 20]. Among them, the image correction method needs only one photograph to compensate for vignetting and shading. The essence of this method is the creation of a so-called smoothed image and the determination of its difference from the initial image. For smoothing, the Fourier transform of the original image is determined, and a low-pass filter is applied to it, followed by an inverse Fourier transform. Thus, the smoothed image contains the smooth (slowly varying in brightness) components of the original image, and the difference between the initial and smoothed images is free of these defects. To obtain a corrected image, a constant is added to the difference between the original and smoothed images, which brightens the shaded areas:

$$ \hat{A}\left[m,n\right]=C\left[m,n\right]- LowPass\left\{C\left[m,n\right]\right\}+\mathrm{const}, $$
(6)

where \( \hat{A}\left[m,n\right] \) is a matrix of corrected image signals, C[m, n] is a matrix of signals of the original image, LowPass{C[m, n]} is a matrix of smoothed image signals, and const is an arbitrary constant.

The constant is selected by taking into account the desired dynamic range. There are various types of low-pass filters. The choice of the filter is different in each case and depends on the spatial frequencies in the Fourier domain at which shading dominates.
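The correction of Eq. (6) can be sketched as follows. Since the text leaves the filter type open, this example assumes a Gaussian low-pass filter in the Fourier domain with a hypothetical cutoff (in cycles per pixel) and takes the mean of the original image as the constant; none of these choices is prescribed by the method itself.

```python
import numpy as np

def correct_shading(image, cutoff=0.05, const=None):
    """Eq. (6): subtract a low-pass (smoothed) copy of the image and add a constant."""
    img = image.astype(np.float64)
    fy = np.fft.fftfreq(img.shape[0])[:, None]   # vertical spatial frequencies
    fx = np.fft.fftfreq(img.shape[1])[None, :]   # horizontal spatial frequencies
    lowpass = np.exp(-(fx ** 2 + fy ** 2) / (2.0 * cutoff ** 2))  # Gaussian (assumed)
    smoothed = np.fft.ifft2(np.fft.fft2(img) * lowpass).real
    if const is None:
        const = img.mean()                       # assumed choice of the constant
    return img - smoothed + const

# Synthetic uniform target with hypothetical vignetting (30% falloff) and noise.
rng = np.random.default_rng(0)
y, x = np.mgrid[-1:1:128j, -1:1:128j]
image = 1000.0 * (1.0 - 0.3 * (x ** 2 + y ** 2)) + rng.normal(0.0, 5.0, x.shape)
corrected = correct_shading(image)
print(f"spread before: {image.std():.1f}, after: {corrected.std():.1f}")
```

A lower cutoff removes only the slowest brightness variations, whereas a higher cutoff also removes part of the noise that is being measured, so the value has to be chosen for the shading actually present.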

Description of the ASNT method. The application of the ASNT method requires only two images of the gradient target and provides all information about the system temporal noise [17]. In this case, the target dynamic range must exceed the dynamic range of the cameras, so that all possible levels of the camera signal can be obtained. The method implementation process can be divided into several steps:

  1. Registration of two images G1[m, n] and G2[m, n] of a gradient target that satisfies the dynamic range condition.

  2. Determination of the matrices of mean values \( \overline{G}\left[m,n\right] \) and MSDs σG[m, n] based on the unbiased estimate over the two frames:

$$ \overline{G}\left[m,n\right]=\left({G}_1\left[m,n\right]+{G}_2\left[m,n\right]\right)/2; $$
(7)
$$ {\upsigma}_G\left[m,n\right]=\sqrt{\sum \limits_{i=1}^2{\left({G}_i\left[m,n\right]-\overline{G}\left[m,n\right]\right)}^2}. $$
(8)

  3. Segmentation of the image \( \overline{G}\left[m,n\right] \) into groups of pixels according to the signal in them. For each group, the corresponding pixels are selected from the matrix σG[m, n] to determine the average variance for the selected signal level.

  4. Plotting the dependence of the temporal noise on the signal level using all the groups segmented by the signal level.

  5. Determination of the conversion factor K by approximating the experimental dependence with the equation

$$ {\upsigma}^{ASNT}(S)=\sqrt{{\left({\upsigma}_d^{ASNT}\right)}^2+S/K}, $$
(9)

where σASNT(S) is the MSD of the temporal noise, and \( {\upsigma}_d^{ASNT} \) is the MSD of the dark temporal noise.
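The five steps above can be sketched on synthetic data as follows. This is a simplified illustration under stated assumptions, not the implementation of [17]: the segmentation uses equal-width signal intervals of a hypothetical width `step`, sparse groups are skipped via a hypothetical threshold `min_pixels`, the fit of Eq. (9) is done as an ordinary linear fit of the variance against the signal, and the simulated gradient target and camera parameters are invented for the example.

```python
import numpy as np

def asnt_curve(g1, g2, step=16.0, min_pixels=50):
    """Temporal noise versus signal level from two frames of a gradient target."""
    g1 = g1.astype(np.float64)
    g2 = g2.astype(np.float64)
    mean = (g1 + g2) / 2.0                                   # Eq. (7)
    var = (g1 - mean) ** 2 + (g2 - mean) ** 2                # square of Eq. (8)
    # Segmentation: pixels whose mean signal falls in the same interval of
    # width `step` form one group (negative means are assigned to group 0).
    groups = np.floor(np.maximum(mean, 0.0) / step).astype(np.int64).ravel()
    counts = np.bincount(groups)
    keep = counts >= min_pixels                              # skip sparse groups
    signal = np.bincount(groups, weights=mean.ravel())[keep] / counts[keep]
    sigma = np.sqrt(np.bincount(groups, weights=var.ravel())[keep] / counts[keep])
    return signal, sigma

def fit_conversion_factor(signal, sigma):
    """Fit Eq. (9): sigma^2 = sigma_d^2 + S / K is linear in S."""
    slope, intercept = np.polyfit(signal, sigma ** 2, 1)
    return 1.0 / slope, np.sqrt(max(intercept, 0.0))

# Simulated camera with hypothetical parameters: K = 4, dark noise MSD = 3.
rng = np.random.default_rng(0)
K_true, sigma_d_true = 4.0, 3.0
electrons_mean = np.tile(np.linspace(0.0, 20000.0, 512), (256, 1))  # gradient target

def shoot():
    """One simulated frame: Poisson shot noise plus Gaussian dark noise."""
    shot = rng.poisson(electrons_mean) / K_true
    return shot + rng.normal(0.0, sigma_d_true, electrons_mean.shape)

g1, g2 = shoot(), shoot()
signal, sigma = asnt_curve(g1, g2)
K_est, sigma_d_est = fit_conversion_factor(signal, sigma)
print(f"K = {K_est:.2f}, dark noise = {sigma_d_est:.2f}")
```

In this simulation, the slope of the fitted line, and hence K, is constrained by all signal levels, whereas the intercept depends on the few groups near zero signal, so the dark noise estimate is the less reliable of the two outputs.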

This method makes it possible to determine the photosensor temporal noise quickly and accurately. If the number of images is increased, then the noise estimation accuracy can be improved, and the time to obtain estimates can be varied by changing the segmentation step. The disadvantage of this method is the complexity of assessing spatial noise due to the high heterogeneity of the target. However, the temporal noise of camera photosensors is usually several times higher than the spatial noise [13]. Therefore, in a first approximation, only temporal noise can be used to assess the quality of images and to suppress the noise [17].

Description of the experimental assembly for measuring the temporal noise of camera photosensors. In this study, a laser was used as a source of uniform radiation. A lens (microscope objective) and diaphragm were used to create a homogeneous laser beam, and a rotating matte diffuser was used to reduce the spatial coherence of radiation. A gradient transparency was used as the test image. The shooting process was controlled by a computer. The assembly was used to measure the noise characteristics of cameras of various types, namely, machine vision camera PixeLink PL-B781F, scientific camera Retiga R6, and amateur camera Canon EOS M100. Shading and vignetting effects were suppressed according to the EMVA 1288 standard. Table 1 presents the main technical characteristics of the cameras under study.

Table 1 Technical Characteristics of the Digital Cameras under Study

PixeLink PL-B781F camera. This camera has a complementary metal–oxide–semiconductor (CMOS) structure. When shooting in the raw data format (RAW format), it multiplies the signal in each pixel by 64. Thus, the image format is 16 bits instead of the actual 10 bits, and the signal range increases from 1024 to 65536 digital units. Therefore, in processing, it is sufficient to divide the resulting two-dimensional signal array (a 16-bit image) by 64.

Canon EOS M100 camera. Nominally, the digit capacity of the ADC of the CMOS camera Canon EOS M100 is 16 bits. However, a significant portion of the signal values within this range never occur. For example, the signal of 33 digital units is followed by a signal of 36 digital units, and the signal of 886 digital units is followed by a signal of 894 digital units. The minimum and maximum differences between sequential signal values are 1 and 8 digital units, respectively, and the most common difference is 7 digital units. Therefore, for further work with the image, the recorded array was normalized by a number calculated by taking into account the differences between unique signal values and the frequency of their use by the manufacturer when processing the images. As a result, the digit capacity of the image in RAW format became 13.5 bits instead of 16 bits, and the maximum signal became 11732 digital units instead of 65535 digital units. Because the Canon EOS M100 is a color camera, only pixels corresponding to one color channel were used.
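The digit capacities quoted for the two cameras can be checked with a short calculation. The helper `effective_bits` is a hypothetical name and assumes that the digit capacity is the base-2 logarithm of the number of signal levels from zero to the maximum signal.

```python
import math

def effective_bits(max_signal):
    """Bits needed to represent the signal levels 0..max_signal (assumed definition)."""
    return math.log2(max_signal + 1)

# PixeLink PL-B781F: the 16-bit range divided by the factor of 64 gives 1024 levels.
pixelink_bits = effective_bits(65536 // 64 - 1)
# Canon EOS M100: maximum signal of 11732 digital units after normalization.
canon_bits = effective_bits(11732)
print(f"PixeLink: {pixelink_bits:.1f} bits, Canon: {canon_bits:.1f} bits")
```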

Retiga R6 camera. This charge-coupled device camera is monochrome and is designed to solve scientific problems.

Comparison of the measurement results by the ASNT method and the standard method in terms of accuracy, measurement time, and the possibility of increasing the accuracy. Several light series of frames and one dark series of frames were taken for each of the devices under study to implement the standard method. Examples of light and dark images are presented in Fig. 1. A series of images similar to those presented in Fig. 1 were used to obtain the temporal noise values. These values and the approximated dependences of the temporal noise on the signal level are presented in Fig. 1 for the Retiga R6 camera and in Fig. 2 for the PixeLink PL-B781F and Canon EOS M100 cameras. The temporal noise dependences on the signal level obtained by the ASNT method for two images of the same target are presented in Fig. 2 for the PixeLink PL-B781F and Canon EOS M100 cameras and in Fig. 3 for the Retiga R6 camera.

Fig. 1. Measurement results of the camera photosensor noise according to the EMVA 1288 standard.

Fig. 2. Dependencies of the temporal noise σ on the signal value S for the PixeLink PL-B781F (a) and Canon EOS M100 (b) cameras: 1) ASNT method; 2, 3) standard method and approximated curve.

Fig. 3. Dependence of the Retiga R6 camera photosensor temporal noise on the signal value measured using the ASNT method.

The results of measuring the temporal noise and the specification data are presented in Table 2. To compare the accuracy of noise measurement by the methods under consideration, the errors of noise parameters were calculated as MSD for different series of measurements for each camera.

Table 2 Temporal Noise Measurement Results of the Cameras by the ASNT Method and Standard Method EMVA 1288

Table 2 shows that the camera manufacturer data on temporal noise are either completely absent or incomplete, and the overwhelming majority of the noise measurement results obtained by both methods coincide within the error, which indicates the veracity of the results obtained. For both methods, the errors of the conversion factor K coincide in order of magnitude, and the errors in measuring the dark temporal noise differ because of the small number of pixels with a dark signal level when shooting a gradient target using the ASNT method (in the EMVA 1288 standard, these data are obtained from images taken with the digital camera cover closed). Therefore, the standard method makes it possible to determine the dark temporal noise with a high accuracy. However, when measuring noise using the ASNT method, only two frames of one target were used. With a larger number of frames, the measurement error decreases by a factor of several or even by an order of magnitude [17].

To compare the speed of the methods under study, the time of the computer processing of images recorded by the cameras was measured. The results are presented in Table 3. The time measurements were conducted in a MATLAB environment on a Lenovo device (Intel(R) Core(TM) i7-4510U CPU, 2.00 GHz/2.6 GHz, 6 GB RAM).

Table 3 Time (sec) of Determining the Noise Characteristics of Digital Cameras Using the Two Methods

The standard method is much more time consuming than the ASNT method. In this case, only the calculation time was taken into account. However, the time for selecting the exposure for each series of images should also be taken into account to ensure the equidistance of the points when plotting the dependence of the temporal noise on the signal level. The difference in the calculation times by the ASNT method for different cameras was mainly associated with the different digit capacities of the camera ADC (the segmentation duration is directly proportional to the number of digits) and with different camera resolutions (the more pixels the photosensor has, the longer the evaluation takes). The second circumstance also determines the difference in the calculation times by the standard method.

Since the Canon EOS M100 is a color camera, the computation time was increased by the time spent extracting one of the color channels from each registered image (not specified in Table 3).

Taking into account all factors, the measurement of temporal noise, from installing the assembly to plotting the dependence of the temporal noise on the signal level, takes no more than 3 h by the ASNT method and 9 h by the standard method. Thus, the ASNT method is about three times faster than the standard method.

To consider the possibility of increasing the accuracy of the methods under study, the conversion factors K were measured for a different number of images in series, i.e., one series of shooting a gradient target using the ASNT method and many series of shooting quasi-uniform targets using the standard method. The graphs of the dependences of the temporal noise on the signal level are presented in Fig. 4 for 2, 4, 8, and 16 images in series, taken with the Retiga R6 camera. The conversion factors K obtained are presented in Table 4.

Fig. 4. Dependencies of the temporal noise σ on the signal value S for the Retiga R6 camera using the EMVA 1288 standard (a) (dots and approximated curve) and ASNT method (b) for 2, 4, 8, and 16 images in each series; experimental dots for the EMVA 1288 standard coincide for all image quantities under consideration.

Table 4 Conversion Factors for Electrons to Digital Units Depending on the Number of Images in a Series

Figure 4 and Table 4 demonstrate that when using the EMVA 1288 standard, the accuracy of measuring the conversion factor does not increase with an increase in the number of images in a series, and the points on the graph almost coincide. This result can be attributed to the large number of pixels in the images used: because the signal is quasi-uniform, even a series of two images yields a large statistical array, and adding supplementary data to such an array no longer leads to significant changes in the statistical values. For the ASNT method, the spread of noise at each signal level decreases with an increase in the number of frames in a series, which reduces the error in determining the conversion factor. The reason is that each signal level in a gradient image is represented by a small number of pixels, and additional frames significantly increase the statistics. This leads to significant changes in the dependence of the temporal noise on the signal level and a decrease in the error in determining the conversion factor.

Thus, an increase in the number of images in each series does not always lead to an increase in the accuracy of determining the temporal noise. However, the EMVA 1288 standard also makes it possible to estimate the spatial noises, whose measurement error is significantly influenced by the number of frames in a series. When the images are averaged, the temporal noise decreases as the square root of the number of images, and the accuracy of the spatial noise estimation increases.
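The square-root law for averaging can be illustrated with a short simulation. The frame size, signal level, and noise MSD below are hypothetical, and the frames contain temporal noise only, so the residual spread of the averaged frame is expected to be close to the initial MSD divided by the square root of the number of frames.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_t = 10.0                                   # hypothetical temporal noise MSD
frames = rng.normal(100.0, sigma_t, size=(16, 128, 128))

# Residual temporal noise after averaging the first n frames of the series.
residual = {n: frames[:n].mean(axis=0).std() for n in (1, 2, 4, 8, 16)}
for n, value in residual.items():
    print(f"{n:2d} frames: {value:.2f} (expected about {sigma_t / np.sqrt(n):.2f})")
```

Because the temporal noise in the averaged frame shrinks while the fixed spatial pattern does not, a larger series leaves the spatial noise as the dominant contribution to the averaged image.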

From a practical point of view, when measuring noise, registering more images in each series than the required minimum for the method is advisable because the histogram of the brightness of a gradient image can be very uneven. At the same time, the maximum measurement accuracy is ensured with a uniform histogram. Therefore, an increase in the number of frames can significantly reduce the error in determining the temporal noise for each signal level.

Conclusion. The results of the comparative analysis of the implementation rate and accuracy of the two methods (the standard method and the ASNT method) for characterizing the light and dark temporal noises of the photosensors of three cameras designed for different purposes reveal that the temporal noise values obtained by both methods coincide within the measurement error. The potential for improving the accuracy of the methods by increasing the number of frames in a series of images was also compared. Although the accuracy of the ASNT method for two frames is lower than that of the standard method, an increase in the number of frames reduces the error of ASNT measurements to values comparable to or even lower than the error of the standard method. Thus, with an increase in the number of frames in a series of images to 16, the ASNT method error decreases by a factor of five, whereas the error of the standard method remains unchanged, which is explained by the peculiarities of the methods. The performance speed of the two methods is different for all the cameras under study, as software processing by the ASNT method requires considerably less time than processing by the standard method.

Thus, to quickly determine the temporal noises of various cameras, it is advisable to use the ASNT method, which is significantly less time consuming than the standard method and hence can be useful for improving the image quality and for assessing the attainable signal-to-noise ratio in specific optical–digital systems.