1 Introduction

Synoptic observations at the Kislovodsk Solar Station have been carried out since 1947. The station observes sunspots and low-latitude faculae in white light as well as prominences in the Hα line and in the 5303 Å and 6374 Å coronal spectral lines. It also observes the solar disk in the Ca II K and Hα lines, polar faculae, and radio emission at 5-cm and 3-cm wavelengths (Gnevyshev 1983). Since 2002 all observations have been performed with a CCD camera, except for the white-light observations, which were recorded on photographic film until mid-2010 because the optical resolution of the CCD was insufficient compared with film. Between January 2009 and May 2010, white-light observations were performed with both the CCD camera and film, and during that period we tested and debugged the image-processing methods. At present, we use a CCD camera with a spatial resolution of 0.7 arcsec. In 2009 – 2010 the white-light observations were processed both in the usual manual way (using a film projector) and in a semi-automatic mode, which we applied to scanned film and CCD images. The semi-automatic method includes a program for identifying and outlining the sunspots, sunspot umbrae, and faculae, but the operator can change the parameters during processing. As a rule, ground-based observations were processed by the operator in semi-automatic mode because of varying atmospheric conditions, contamination of the detector, and other factors. Spacecraft observations are more homogeneous and can therefore be processed in an automatic mode.

As a result, enormous archives of observations have been accumulated, including a large number of photographic films and plates. All data are analyzed daily, and the resulting observational series are stored in databases. One of the most important requirements during the transition to computerized processing was to preserve the continuity of the data analysis. Specifically, we had to ensure the best possible consistency between the areas, numbers, and other parameters of the identified active regions measured with computerized methods and those obtained earlier with manual procedures. At the same time, computerized methods facilitate image analysis and make it possible to measure indices that were previously too labor-intensive to measure.

Therefore we created algorithms and software tools that allowed the daily image analysis to proceed semi-automatically, but the extraction of the active elements was kept under the observer’s control. Long-term observations can also be processed automatically. At Kislovodsk Station, we analyzed the synoptic observations of solar activity and then produced a series of the previously summarized observations. Within the sunspot data sequences, the coordinates of spot-groups are presented, along with their area, the number of sunspots, and the sunspot group index [Wolf number: W]. These data can be viewed at www.solarstation.ru .

We also present some of the image-processing algorithms and provide comparative results of synoptic solar-activity observations obtained with automatic methods. We compare the manually processed synoptic observations of spots and faculae in white light from Kislovodsk with the automatically processed spacecraft observations from the Michelson Doppler Imager onboard the Solar and Heliospheric Observatory (SOHO/MDI) for Solar Cycle 23. SOHO/MDI observations represent homogeneous data that cover more than one activity cycle. The processing of this set of data is a good test for the sunspot extraction algorithm. The MDI data for the period 1996 – 2004 were analyzed with edge-detection methods by Zharkov, Zharkova, and Ipson (2005). However, this method yields lower sunspot values than those obtained from manual processing, both for Kislovodsk and for NOAA data. A Bayesian technique for active region and sunspot detection and labeling was developed by Turmon, Pap, and Mukhtar (2002). This method detects the features more accurately, but is computationally quite expensive. Another approach to measuring the sunspot area uses edge detection and the boundary intensity gradient, and was applied to high-resolution observations of individual sunspot groups and/or disk segments by Győri (1998). This procedure is very precise if applied to data with a sufficiently high resolution. However, it is not suitable for automatic sunspot detection on full-disk images with the low and moderate resolutions that are available in most archives.

We here describe the automated analysis of sunspots, umbra, and faculae from white-light images (from the entire Solar Cycle 23) as well as the magnetic flux of these activity elements, derived from SOHO/MDI observations.

2 Solar Disk-Center and Radius Measurements

To determine the radius of the Sun, we need to identify points that belong to the solar limb. Typically, the edge of the Sun in ground-based observations has no sharp boundary, and on photographic plates its intensity may vary between different parts of the image. Using a single threshold intensity for the complete disk is therefore often difficult. We used a refinement procedure that allows the threshold intensity to be changed for different parts of the image.

To define the edge of the disk within the image, we divided the image into four quadrants centered on the solar disk. Then we determined the distribution function of the pixel intensity values with a histogram-based method. As a rule, the distribution function had two maxima, corresponding to the points of the solar disk and of the background. The minimum of the distribution between the two maxima was taken, to first approximation, as the intensity corresponding to the solar limb [\(I_{\mathrm{limb}}\)]. The irregularity of the solar exposure and the irregular background were taken into account by interpolating between the average values of \(I_{\mathrm{limb}}\) in the different quadrants. The lists of chosen points were implemented with the C++ Standard Template Library [STL]. The points that were surrounded in all directions by adjoining points from the list were considered to be inner points and were excluded from the procedure.
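The histogram step above can be sketched as follows. This is a minimal illustration, not the station's software: the function name `limb_threshold`, the number of bins, and the rule of searching for the background peak in the lower half of the intensity range and the disk peak in the upper half are all assumptions made for the example. The function would be applied to the pixels of each quadrant separately.

```python
def limb_threshold(pixels, n_bins=64):
    """Estimate the limb intensity as the histogram minimum between two peaks.

    `pixels` is a flat sequence of intensities from one image quadrant.
    Assumption: the background peak lies in the lower half of the intensity
    range and the solar-disk peak in the upper half.
    """
    lo, hi = min(pixels), max(pixels)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for flat input

    # Build the intensity histogram.
    hist = [0] * n_bins
    for p in pixels:
        idx = min(int((p - lo) / width), n_bins - 1)
        hist[idx] += 1

    # Locate the two maxima (background and disk).
    half = n_bins // 2
    peak_bg = max(range(half), key=lambda i: hist[i])
    peak_disk = max(range(half, n_bins), key=lambda i: hist[i])

    # The first minimum-count bin between the peaks gives the threshold.
    valley = min(range(peak_bg, peak_disk + 1), key=lambda i: hist[i])
    return lo + (valley + 0.5) * width
```

The four per-quadrant values returned by such a function could then be interpolated across the image, as described in the text.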

To refine the limb intensity, we also used a gradient map. The gradient was computed with the Sobel algorithm (Tennenbaum et al. 1969). As a rule, the position of the highest gradient corresponds to the position of the limb boundary points, and the intensity there gives the limb intensity [\(I^{\prime}_{\mathrm{limb}}\)]. Using the detected set of points, we determined the preliminary location of the solar-disk center and the radius. Then we searched for the points of maximal gradient along each line joining the center with a previously chosen limb point, within about 1 % of the preliminary radius. We retained the points where the direction of the gradient differed by no more than 30° from the direction to the expected disk center. This allows false points to be rejected.
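The gradient computation and the direction test can be illustrated with a short sketch. It assumes the image is a list of rows of intensities and that the disk is brighter than the sky, so that the gradient at the limb points inward; the function names `sobel` and `points_toward_center` are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for the column (x) and row (y) derivatives.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel(image, row, col):
    """Return the Sobel gradient (gx, gy) at an interior pixel."""
    gx = gy = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * value
            gy += SOBEL_Y[dr + 1][dc + 1] * value
    return gx, gy


def points_toward_center(gx, gy, row, col, center_row, center_col,
                         max_angle_deg=30.0):
    """True if the gradient lies within max_angle_deg of the direction
    from the pixel to the expected disk center."""
    to_center = (center_col - col, center_row - row)
    norm = math.hypot(gx, gy) * math.hypot(*to_center)
    if norm == 0.0:
        return False  # no gradient, or the pixel is the center itself
    cos_angle = (gx * to_center[0] + gy * to_center[1]) / norm
    return cos_angle >= math.cos(math.radians(max_angle_deg))
```

Candidate limb points that fail the direction test would be discarded before the circle fit.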

The selected boundary points were used to locate the limb by iteratively fitting a circle. The points located farther than a critical distance from the fitted boundary were eliminated and the fit was repeated. Specifically, we excluded the points whose deviation exceeded 3σ, where σ is the root-mean-square deviation.

For the final fit to the limb points selected as described above, we used the gradient-weighted and Levenberg–Marquardt methods (Chernov 2010) and chose the one that gave the smallest error.
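The iterative fit with 3σ rejection can be sketched as follows. Note that the sketch substitutes a simple algebraic least-squares circle fit for the gradient-weighted and Levenberg–Marquardt fits cited above, purely to keep the example short; the function names and the iteration limit are assumptions.

```python
import math


def fit_circle(points):
    """Algebraic least-squares circle fit; returns (cx, cy, r)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    suu = svv = suv = suuu = svvv = suvv = svuu = 0.0
    for x, y in points:
        u, v = x - mx, y - my  # coordinates relative to the centroid
        suu += u * u
        svv += v * v
        suv += u * v
        suuu += u ** 3
        svvv += v ** 3
        suvv += u * v * v
        svuu += v * u * u
    # Solve the 2x2 normal equations for the center offset (uc, vc).
    det = suu * svv - suv * suv
    rhs1 = 0.5 * (suuu + suvv)
    rhs2 = 0.5 * (svvv + svuu)
    uc = (rhs1 * svv - rhs2 * suv) / det
    vc = (rhs2 * suu - rhs1 * suv) / det
    r = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mx + uc, my + vc, r


def fit_circle_clipped(points, n_sigma=3.0, max_iter=10):
    """Refit repeatedly, dropping points farther than n_sigma * rms."""
    pts = list(points)
    for _ in range(max_iter):
        cx, cy, r = fit_circle(pts)
        resid = [math.hypot(x - cx, y - cy) - r for x, y in pts]
        sigma = math.sqrt(sum(d * d for d in resid) / len(resid))
        kept = [p for p, d in zip(pts, resid) if abs(d) <= n_sigma * sigma]
        if len(kept) == len(pts) or len(kept) < 3:
            break  # converged, or too few points left to fit
        pts = kept
    return fit_circle(pts)
```

In practice the fitted center and radius from this step would serve as the starting values for a more accurate geometric fit.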

3 Calculating the Sunspot Parameters

The next step in our image processing was to create a sequence of calibrated images. We based our calibration method on the assumption that the center-to-limb variation of the quiet Sun (i.e. areas of the Sun without bright floccules) corresponds to a standard curve. The local quiet-Sun level [LQSL] is used to remove nonuniform intensities across the solar disk, to calibrate the image, and to search for active elements. To determine it, we segmented the solar disk and found, in each segment, the intensity at which the distribution of pixel intensities has its maximum. This intensity corresponds to the level of the quiet Sun [\(I_{\mathrm{QS}}(\theta,\varphi)\)]. The technique for determining the level of the quiet Sun is described by Tlatov, Pevtsov, and Singh (2009).
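As an illustration of this step, the quiet-Sun level of one disk segment can be estimated as the most populated bin of its intensity histogram. The sketch below is a simplified stand-in for the procedure of Tlatov, Pevtsov, and Singh (2009); the function name and the bin width are assumptions.

```python
def quiet_sun_level(pixels, bin_width=1.0):
    """Return the center of the most populated intensity bin of a segment."""
    counts = {}
    for p in pixels:
        b = int(p // bin_width)  # index of the histogram bin
        counts[b] = counts.get(b, 0) + 1
    best = max(counts, key=counts.get)
    return (best + 0.5) * bin_width
```

Dividing each pixel by the level found for its segment would then give an image in which the quiet Sun has an intensity close to unity.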

After calculating the quiet-Sun intensity [\(I_{\mathrm{QS}}\)], the active elements can be identified by specifying, for example, a threshold level relative to \(I_{\mathrm{QS}}\). There are two main methods for defining sunspots: the threshold method and the border method. Threshold methods define the sunspot boundary by a cutoff relative to the LQSL, using either a fixed value (Chapman et al. 1989) or a value derived from the histogram of the intensity distribution (Steinegger et al. 1996). These methods allow sunspots to be identified within a specified contrast interval and thus reject false areas, such as dust and scratches in digital images. Border methods that rely on gradient values are widely used (Győri 1998; Zharkova et al. 2005; Zharkov, Zharkova, and Ipson 2005; Benkhalil et al. 2006), although they can give sunspot areas that differ from those of the threshold method, depending on the chosen threshold value. This complicates their application for obtaining long-term, homogeneous series. Border methods identify sunspot boundaries, but these boundaries are often incomplete, so that sunspots or other active features cannot be fully contoured, which leads to errors.

To identify the sunspots, we combined these two methods. At the same time, instead of the intensity contrast we used the modified contrast, which contains both the value of the relative intensity and the gradient value, \(\Delta=\Delta I+k\,\Delta g\), where \(\Delta I=|I_{\mathrm{QS}}-I|/I_{\mathrm{QS}}\) is the contrast of the image and \(\Delta g=(g-G)/G\), with \(\Delta g=0\) for \(g<G\). Here \(g\) and \(G\) are the gradient at the given point and the average gradient across the solar disk, approximated by the Sobel function. The modified contrast allows the boundaries of the allocated active regions to be defined more reliably.
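The modified contrast is the sum of the relative intensity contrast, |I_QS − I|/I_QS, and k times the relative gradient excess, (g − G)/G, where the second term is set to zero when the local gradient g is below the disk average G. A minimal sketch with hypothetical argument names:

```python
def modified_contrast(intensity, i_qs, grad, grad_mean, k=1.0):
    """Relative intensity contrast plus k times the relative gradient excess."""
    delta_i = abs(i_qs - intensity) / i_qs
    # The gradient term contributes only where the local gradient
    # exceeds the disk-averaged gradient.
    delta_g = (grad - grad_mean) / grad_mean if grad > grad_mean else 0.0
    return delta_i + k * delta_g
```

For example, a pixel 10 % darker than the quiet Sun with a gradient 50 % above the disk average has a modified contrast of 0.1 + 0.5 = 0.6 for k = 1.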

To allocate the spots (\(I<I_{\mathrm{QS}}\)) we used a growing procedure, which selects the points whose modified contrast exceeds a threshold value, \(\Delta>\Delta_{1}\), provided that the contrast itself satisfies the condition \(\Delta I>0.7\Delta_{1}\). The factor 0.7 is a compromise found empirically for sunspots: it gives a clear-cut boundary while ensuring a continuous allocation of the inner sunspot regions. The results are lists of all points for each spot with a common boundary, from which we found the average coordinates, intensity, and other parameters.

After defining the penumbra–photosphere boundary, we estimated the umbra–penumbra boundary with intensity histograms and gradient distributions for the points in each list corresponding to a given spot. We took the intensity at the umbra–penumbra boundary to be the intensity at the steepest gradient within the spot, provided that the gradient value exceeded some fraction of the average gradient at the umbra–penumbra boundary, \(I_{\mathrm{U}}=f(g_{\max})\).

Spots, faculae, and other tracers were treated as separate objects, each being a group of pixels with a common boundary. The objects were implemented as STL lists. To build an object, we found a point that satisfied the selection criteria described above and then examined the adjoining points, adding those that also satisfied the criteria. This was iterated until no new points were added at the next step. These lists allowed us to handle a selected area with its common boundary as a single object. One can find parameters such as the average and highest intensity, the area, and the perimeter, and it is also possible to transfer these objects to other images with a transformation matrix, in order to measure characteristics in other types of observations or to produce general activity charts.
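The growing procedure can be sketched as a flood fill over precomputed maps of the modified contrast and of the intensity contrast. The sketch uses Python lists in place of the STL lists mentioned above, and the use of 8-connected neighbors is an assumption, since the connectivity is not specified in the text.

```python
from collections import deque


def grow_regions(delta, delta_i, threshold, ratio=0.7):
    """Group selected pixels into connected objects.

    `delta` and `delta_i` are 2D lists with the modified contrast and the
    intensity contrast. A pixel is selected if delta > threshold and
    delta_i > ratio * threshold. Returns a list of pixel lists.
    """
    rows, cols = len(delta), len(delta[0])

    def selected(r, c):
        return delta[r][c] > threshold and delta_i[r][c] > ratio * threshold

    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or not selected(r0, c0):
                continue
            # Start a new object from this seed and grow it until
            # no further adjoining points satisfy the criteria.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            region = []
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc] and selected(rr, cc)):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            regions.append(region)
    return regions
```

Each returned pixel list can then be reduced to object parameters such as the mean position, the mean intensity, or the pixel count.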

Examples of the sunspot–photosphere and umbra–penumbra boundary locations are shown in Figure 1.

Figure 1

An example of identifying a sunspot group in NOAA Active Region 11166 (6 March 2011, 06:06 UT) from Kislovodsk observations. The spatial extent of the image is 225×150 arcsec. The heliographic longitude and latitude of the center of the sunspot group are θ=10.1°, ϕ=90.1°.

4 Comparative Results of the White-Light Analysis of Solar Cycle 23

To date, the basic data on long-term solar activity have been obtained by manual processing. As a rule, the resulting solar-activity indices are difficult to reproduce or confirm because no information is kept about the processing procedure and the geometry of the identified active elements. The development of automatic methods allows one to preserve not only quantitative but also vector information about the selected active elements, and one can measure more parameters of the data in question. This section gives the results of the automatic processing of the white-light observations of the Sun during Solar Cycle 23.

The stability of the automatic methods for detecting solar activity can be tested by means of available long-term series of observational data. The white-light solar data observed by SOHO/MDI (Scherrer et al. 1995) are the most solid set of data for checking the algorithms and software tools for extracting solar active regions, allowing the selection of both sunspots and solar faculae. For this study we used the set of full-disk (level 1.8) calibrated synoptic daily continuum and line-of-sight magnetogram observations. The data almost continuously cover the time period from 1996 until 2010 with a cadence of 4 continuum and 15 magnetogram observations per day.

We give the results of automatic extraction of the sunspots, sunspot umbra, and faculae in white light and compare these parameters with the results of manual extraction from the observation at Kislovodsk Solar Station. We used a threshold level of Δ>9 %, k=1.0 as the sunspot allocation parameters.

Figure 2 shows monthly averages of the sunspot areas during Solar Cycle 23. In the automatic mode, a total of 31 988 sunspots were located. The differences that are most noticeable in 1998 and at the beginning of 1999 arise because there are no MDI data for that time. The correlation analysis with the manually processed data gives the following relationships: \(S_{\mathrm{MDI}}=-20(\pm15)+0.945(\pm0.013)\,S_{\mathrm{Kisl}}\), R=0.99 for the Kislovodsk Solar Station data and \(S_{\mathrm{MDI}}=-9(\pm13)+1.058(\pm0.012)\,S_{\mathrm{NOAA}}\), R=0.98 for the US Air Force/NOAA Data Center data ( solarscience.msfc.nasa.gov/greenwch/sunspotarea.txt ). The suitability of the chosen parameters is shown by the high correlation coefficients and by the slopes of the linear regressions, which are close to unity.
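Relationships of this form, an intercept, a slope, and a correlation coefficient R, follow from an ordinary least-squares fit of one monthly series against the other. The sketch below shows the calculation in general terms; it does not reproduce the uncertainty estimates quoted above, and the function name is an assumption.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b*x.

    Returns (intercept, slope, correlation coefficient).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r
```

Here `x` would hold the manually measured monthly areas and `y` the automatically measured ones for the same months.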

Figure 2

Monthly averages of the sunspot areas from automatic processing of SOHO/MDI observational data (lower graph) compared with the results of manual processing of the sunspot area from Kislovodsk Solar Station data (upper graph). Areas are given in millionths of the solar hemisphere.

Information about the inner structure of sunspots is important for determining different characteristics, such as the dynamics of the sunspots, the definition of the total solar irradiance, and the development of the model structure of the sunspots.

A number of researchers have published values of \(r_{\mathrm{U}}=S_{\mathrm{U}}/S_{\mathrm{S}}\), the ratio of umbral to total area of the sunspot. Tandberg-Hanssen (1956) found \(r_{\mathrm{U}}=0.17\) for sunspots with a total area larger than \(150~{\mathrm{Mm^{2}}}\). Jensen, Nordø, and Ringnes (1955) found \(r_{\mathrm{U}}=0.189\) around the maximum of the sunspot cycle and \(r_{\mathrm{U}}=0.159\) around the minimum. Gokhale and Zwaan (1972) obtained \(r_{\mathrm{U}}=0.169\) from the results of earlier investigations. Antalova (1991) gave \(r_{\mathrm{U}}=0.175\) and Beck and Chapman (1993) obtained \(r_{\mathrm{U}}=0.2\). Part of the difference between these results is probably due to the different techniques used to measure the umbral and penumbral areas and to the sensitivity of these techniques to the seeing conditions (Steinegger et al. 1996; Solanki 2003).

We extracted the sunspot umbra with the techniques described in Section 3. Figure 3 shows monthly averages of the umbra area for the period from 1996 to 2008. The upper panel of Figure 3 shows monthly averages of the umbra area derived from SOHO/MDI data, while the lower panel shows variations in the ratio of umbral to total sunspot area \(S_{\mathrm{U}}/S_{\mathrm{S}}\) covering the same time period. A linear fit to the data gives \(S_{\mathrm{U}}=4.99(\pm2.0)+0.171(\pm0.002)\,S_{\mathrm{S}}\), R=0.993. The average value of \(r_{\mathrm{U}}\approx0.19\) confirms previous determinations.

Figure 3

(Top panel) Monthly averages of the sunspot umbra area from the automatic processing of SOHO/MDI observations (in millionths of the solar hemisphere). (Lower panel) The ratio of umbral to total sunspot areas.

One of the most important questions concerning the evaluation of the changing solar radiance is the correlation between the area of sunspots and plages (Foukal and Lean 1988; Fröhlich 1994). To extract faculae in white light we used the techniques described in Section 3 with the parameters \(I>I_{\mathrm{QS}}\), Δ>5 %, k=1. Figure 4 presents monthly averages of the facular areas obtained by manual processing at Kislovodsk Solar Station and by automatic processing of SOHO/MDI data. The correlation between these data is given by \(S^{\mathrm {MDI}}_{\mathrm{Faculae}}=0.2(\pm0.3)+2.94(\pm0.1) S^{\mathrm {Kisl}} _{\mathrm{Faculae}} \) with the correlation coefficient R=0.91. The facular area in white light derived from automatic processing is almost three times larger than the facular area obtained from manual processing because the manual measurements considered only bright faculae near the edge of the disk (r/R≳0.7). The correlation between the spot area and facular area is \(S^{\mathrm{MDI}} _{\mathrm{Faculae}}=1.3(\pm0.3)+5.94(\pm 0.2)S^{\mathrm{NOAA}} _{\mathrm{Spot}} \), R=0.88.

Figure 4

(Top panel) Monthly averages of the facular area in white light derived from Kislovodsk Solar Station data. (Lower panel) Facular area derived from automatic processing of SOHO/MDI white-light observations. The areas are given in thousandths of the solar hemisphere.

The strongest contrast of faculae in white light is observed near the solar limb. The chosen parameters of the automatic detection method allow faculae to be located in the distance range 0.3<r/R<1.0. Therefore, as a rule, the facular area in white light is smaller than the plage area registered in spectral lines. For the plage area in the Ca II K line derived from the Kodaikanal (India) observations (Tlatov, Pevtsov, and Singh 2009), the correlation is \(S^{\mathrm{Ca\,\textsc{ii}\,K}}_{\mathrm{Plage}}=8.5(\pm0.3)+15(\pm0.25) S^{\mathrm{NOAA}} _{\mathrm{Spot}} \), R=0.88. Comparing this with the corresponding relationship for white-light faculae, we find that the facular area in white light is approximately three times smaller than the calcium plage area.

This difference can be explained by the lower contrast of faculae in white light, where they can be reliably registered only near the limb. The Ca II K observations were processed in an automatic mode. Before that, good-quality images were selected, and a special calibration procedure allowed us to obtain a homogeneous set.

To extract values of the magnetic field inside the detected sunspot areas and faculae, the white-light images and magnetograms were synchronized by rotating a solar magnetogram image to the time and point of view corresponding to the continuum image to allow pixel-by-pixel comparisons of both images. A detailed analysis using similar methods was performed by Zharkov, Zharkova, and Ipson (2005). Information about the magnetic field allows an analysis of sunspot polarities to be performed. Figure 5 shows the ratio of sunspot areas for following and preceding polarities [S f/S p]. The average ratio was 0.45 for the northern hemisphere and 0.38 for the southern hemisphere.

Figure 5

Monthly averages of the ratio of sunspot areas with following [S f] and preceding [S p] polarities.

Observations made with SOHO/MDI allow one to measure the magnetic field in different active regions that are visible in the intensity image. Averages of magnetic flux from sunspots as well as the umbra and faculae are presented in Figure 6. We used the line-of-sight magnetic-field data to calculate the magnetic flux. The data were taken from the daily magnetograms that were the closest in time to the white-light images. Magnetic-field saturation effects (Ulrich et al. 2009) were not considered. The absolute values of the magnetic flux from the sunspots and faculae are related by \(\Phi_{\mathrm{Faculae}}=14.1(\pm5)+0.75(\pm0.03)\,\Phi_{\mathrm{Spot}}\), R=0.9. The correlation between the sunspot areas, expressed in millionths of the solar hemisphere, and the magnetic flux in units of \(10^{20}\) Mx is \(\Phi_{\mathrm{Spot}}=2.1(\pm3)+0.15(\pm0.02)\,S_{\mathrm{Spot}}\), R=0.98. The magnetic flux from the sunspot umbra and across the whole sunspot are linked by the relation \(\Phi_{\mathrm{Umbra}}=0.03(\pm1.1)+0.35(\pm0.06)\,\Phi_{\mathrm{Spot}}\), R=0.98.
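As a rough illustration of how an absolute flux can be obtained from the magnetogram pixels that fall inside a detected object, the sketch below sums the unsigned line-of-sight field and multiplies by the pixel area. It is a simplified assumption rather than the procedure actually used: the pixel area is a hypothetical input parameter, no projection correction is applied, and saturation is ignored, as in the text.

```python
def magnetic_flux_1e20_mx(b_los_gauss, pixel_area_cm2):
    """Unsigned line-of-sight flux of one object, in units of 1e20 Mx.

    `b_los_gauss` holds the field values (in gauss) of the magnetogram
    pixels inside the object; `pixel_area_cm2` is the area on the Sun
    covered by one pixel (hypothetical input, in square centimeters).
    """
    total_mx = sum(abs(b) for b in b_los_gauss) * pixel_area_cm2
    return total_mx / 1.0e20
```

With three pixels of 1000 G, −1500 G, and 500 G and an illustrative pixel area of 2×10^16 cm², the function returns 0.6, i.e. 6×10^19 Mx.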

Figure 6

The absolute values of the magnetic flux obtained by superposing the identified active elements onto the SOHO/MDI magnetograms. (Top panel) The magnetic flux of white-light faculae. (Middle panel) The magnetic flux of the sunspots. (Lower panel) The magnetic flux of the sunspot umbra.

Ringnes and Jensen (1960) found the strongest correlation between the logarithm of area and the field strength. Here, we used this known relationship to investigate the changes in the magnetic-field strength by employing the area of sunspot umbra as a proxy for the magnetic field. Figure 7 shows the dependence of the maximal field strength of the sunspot umbrae on the logarithm of their area. We found a good correlation between the (logarithm of) umbra areas and the magnetic-field strength: \(B_{\max}=760(\pm13)+1102(\pm12)\,\lg S_{\mathrm{U}}\), R=0.67.

Figure 7

Magnetic-field strength (from SOHO/MDI observations) vs. the logarithm of sunspot umbra area for Solar Cycle 23. The line is a straight-line fit to the data.

5 Discussion

We briefly described the automated procedures for detecting solar features such as sunspots, sunspot umbrae, and active regions in full-disk solar images. The original images were automatically standardized in shape and intensity, and the feature-detection techniques were then applied with local intensity thresholding. The main goal of our methods is to preserve the stability of the processing over a long series of observations (Balmaceda et al. 2009). Sunspot areas were calculated over the entire Solar Cycle 23 both automatically, using data from SOHO/MDI, and manually, using data from Kislovodsk Solar Station and NOAA. Comparing the results of the two approaches, we found a high degree of correlation and close agreement in the absolute values. The application of automatic and semi-automatic computer methods provides an opportunity to considerably widen the list of measured parameters and to cross-reference image-data analyses over a variety of observations and data products.

Comparing the results of automatic and manual processing of white-light observations allowed us to develop new methods that are now being used at Kislovodsk Solar Station. The main aim of these methods is to preserve the stability of the sunspot-processing system, which has been maintained for 50 years (Balmaceda et al. 2009). These methods can also be applied to the automatic processing of long-term observations. The high correlation coefficient and the nearly complete agreement of the monthly average areas calculated in manual and automatic mode indicate that computer-processing methods can be used in general. In the future we plan to process long-term solar-activity observations in automatic mode with visual inspection and correction.

Our analysis showed that long-term series of ground-based sunspot observations can be successfully continued with spacecraft observations (SOHO/MDI; SDO/HMI) using automated extraction methods. Such observations also make it possible to measure the magnetic characteristics of sunspots and faculae at the same time, which allows databases of individual sunspots and faculae, including their magnetic characteristics, to be created.