1 Introduction

Solar flares are among the most significant phenomena related to solar activity. Since the high-energy particles produced by flare events interact with the interplanetary medium and the terrestrial environment, solar flares influence interplanetary space and significantly affect the Earth’s ionosphere (Benz, 2017). Space weather variations may damage space-borne and ground-based technological systems, as well as biological systems. Therefore, reliable flare alerting and forecasting systems have practical significance (Veronig et al., 2000). Extracting features from each detected flare event is essential for flare forecasting systems. In general, these features consist of the start, peak, and end time, location, light curves, and detection images (Kraaikamp and Verbeeck, 2015).

Observing solar flares is of great importance not only for space weather forecasting and alerting, but also for studying solar physics. Magnetic reconnection provides the energy of eruptive solar events, but it occurs in the corona, which hinders direct observational study. As an alternative, it can be studied indirectly by observing and analyzing the motion of flare ribbons and loops, which are among the clearest signatures of magnetic reconnection at easily observable wavelengths (Sui, Holman, and Dennis, 2004). Moreover, the particle acceleration, energy partition, and transport processes that trigger solar flares remain debated. To build a better understanding of these processes, it is necessary to accumulate massive high-resolution flare observations in multiple energy ranges over the whole course of a flare (Bloomfield et al., 2016).

Statistical analysis is a useful tool for investigating the relationships among solar activity phenomena. The large number of events needed for a statistical analysis can only be processed with an automated flare detection program (Qu et al., 2004). Although statistical analyses have been conducted for H\(\upalpha\) and X-ray flares (Temmer et al., 2001; Hannah et al., 2011), only simple features were analyzed, including duration, event asymmetries, and spatial distribution on the solar disk. If the detection method were conceived on pixels rather than macropixels and if the flare region were segmented accurately, morphological studies of solar flares would be possible (Kraaikamp and Verbeeck, 2015).

Before extracting the flaring region, it is necessary to determine the start and end time of a solar flare to lighten the computing load. Veronig et al. (2000) considered that the flare starts at the time when the maximum intensity within the image exceeds twice the intensity of the quiet Sun. Pötzi et al. (2015) determined the start of a solar flare as the time when the brightness enhancement is higher than the faint flare level for three consecutive images. Some methods (Borda et al., 2002; Qu et al., 2003) extracted image features and trained models within image databases using machine-learning algorithms, e.g. neural networks and support vector machines, to classify flare and non-flare images.

Semi-automated methods for segmenting the flare regions (Saba, Gaeng, and Tarbell, 2006; Gill, Fletcher, and Marshall, 2010) were not able to handle huge amounts of image data. Previous works mostly focused on the H\(\upalpha\) band, while some exceptions (Caballero and Aranda, 2014; Kraaikamp and Verbeeck, 2015) concentrated on the extreme ultraviolet (EUV) wavelengths. Veronig et al. (2000) implemented a region-growing algorithm by selecting some pixels with high intensities as the seed points, and the region-growing process stopped when the intensity was lower than twice the intensity of the quiet Sun or a region border determined by the Canny (1986) method was encountered. Qu et al. (2004) detected edges adaptively using the second-order derivative within sub-images, adopted region-growing by taking advantage of the edges, and applied morphological operations to derive the segmentation results. Maurya and Ambastha (2010) and Kirk et al. (2013) mainly adopted the gray threshold method, setting the thresholds to 70% of the maximum intensity and 1.35 times the quiet-Sun intensity, respectively. By using manually tuned parameters, Piazzesi et al. (2012) regarded the region of interest with ascending intensities as the flare region. Pötzi et al. (2015) built a feature vector for each pixel and trained a Gaussian mixture model within an expert-annotated image database to classify each pixel into flare, sunspot, filament, or background, followed by a denoising procedure based on total variation.

All of these detection methods in the literature are designed for full-disk images. Local high-resolution observations reveal fine structures of solar flares and could be used to derive more accurate solar flare segmentation results. However, large flares usually exceed the field of view of a local solar image, and full-disk images are still essential for providing the overall information. Furthermore, previous methods in the literature used empirically determined gray thresholds, or the thresholds were based on a single data source and thus lacked flexibility for different data sources. The flare region is the brightest part of the image, and it forms a local maximum in the gray histogram, which could be used to derive a proper gray threshold. Therefore, an automated solar flare detection method applicable to both full-disk and local high-resolution H\(\upalpha\) images is proposed in this article. In this method, an adaptive gray threshold based on the gray histogram and an area threshold are used to segment the flare region. Then some features of each detected H\(\upalpha\) solar flare are extracted, e.g. the start, peak, and end time, the importance class, and the brightness class.

The article is organized as follows: the data sources are presented in Section 2. Section 3 describes the proposed method in detail, including image pre-processing, flare segmentation, and the feature extraction process. In Section 4, experimental results on full-disk and high-resolution observations are reported to validate the effectiveness of the segmentation method. Additionally, the extracted flare features are compared with official reports. Section 5 summarizes the method and concludes the article.

2 Data Sources

Figure 1

(a) Full-disk H\(\upalpha\) image obtained by KSO at 07:08 UT on 20 August 2017, (b) full-disk H\(\upalpha\) image obtained by BBSO at 17:47 UT on 22 June 2015, and (c) local high-resolution H\(\upalpha\) line center image obtained by GST/VIS at 16:26 UT on 22 June 2015.

The full-disk H\(\upalpha\) observations with \(2048\times 2048\) pixels and 12-bit depth in this article are provided by Big Bear Solar Observatory (BBSO) (Denker et al., 1999) and Kanzelhöhe Observatory for Solar and Environmental Research (KSO). Data from KSO are publicly available via the online KSO data archive at http://kanzelhohe.uni-graz.at/ (Pötzi, Polanec, and Temmer, 2013). Data from BBSO are publicly available via the online FTP Data Archive at http://bbsoweb.bbso.njit.edu/pub/archive, and the images with a smaller data gap can be requested via the Automated Data Request at http://www.bbso.njit.edu/cgi-bin/HaReqForm/.

The local high-resolution H\(\upalpha\) line center observations in this article are provided by the 1.6 m Goode Solar Telescope (GST) with the Visible Imaging Spectrometer (VIS) (Cao et al., 2010) at BBSO. GST/VIS observations can achieve a pixel size as small as \(0''.03\), and its field of view is \(57''\times 64''\). The images with \(1800\times 1700\) pixels processed in this article were cropped from the original images with \(2155\times 2555\) pixels and 16-bit depth. Data from GST/VIS can be requested via the online GST Data Request Form at http://www.bbso.njit.edu/~vayur/nst_requests/.

3 Methods

Figure 2 shows the schematic diagram of the flare detection and feature extraction method. On a regular PC, the whole process takes about 5 s per high-resolution image and about 1 s per full-disk image, which allows near real-time processing. The method mainly consists of three steps: pre-processing, flare segmentation, and post-processing. In the following, the three steps are described in detail.

Figure 2

Schematic diagram of the flare detection and feature extraction method.

3.1 Image Pre-processing

The goal of pre-processing full-disk images is mainly to remove limb-darkening, as done in Zharkova et al. (2003). First, the mask defining the solar disk is determined, as shown in Figure 3b. Second, to prevent the dark background from affecting the result, the pixels outside the disk are set to the average intensity within the solar disk. Third, a severe blur is applied to the resulting image iteratively to remove the image content and extract the intensity inhomogeneity. Finally, the pre-processed image, i.e. Figure 3d, is derived as

$$ {\mathrm{Img}}_{d}=\mathrm{Img}_{a}/{\mathrm{Img}}_{c} \times \mathrm{Img}_{b}, $$
(1)

where \(\mathrm{Img}_{a}\) refers to the image in Figure 3a, and \(\mathrm{Img}_{b}\), \(\mathrm{Img}_{c}\), and \(\mathrm{Img}_{d}\) are named in the same way, i.e. they refer to the images in Figures 3b, 3c, and 3d.
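The steps leading to Equation 1 can be sketched as follows. This is a minimal illustration rather than the implementation used in this article: the repeated box blur stands in for the unspecified "severe blur", and the function names, the blur radius, and the number of iterations are assumptions chosen for the example.

```python
import numpy as np

def box_blur(img, radius):
    """Separable mean filter with edge padding (assumed stand-in for the severe blur)."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; "valid" restores the original shape.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def remove_limb_darkening(img_a, mask_b, radius=50, iterations=3):
    """Equation 1: Img_d = Img_a / Img_c * Img_b (radius and iterations are assumptions)."""
    img_a = np.asarray(img_a, dtype=float)
    on_disk = np.asarray(mask_b) > 0
    mask = on_disk.astype(float)
    # Replace off-disk pixels by the mean on-disk intensity before blurring.
    img_c = np.where(on_disk, img_a, img_a[on_disk].mean())
    for _ in range(iterations):
        img_c = box_blur(img_c, radius)
    # Guard against division by zero in degenerate (all-dark) inputs.
    img_c = np.maximum(img_c, np.finfo(float).tiny)
    return img_a / img_c * mask
```

With this normalization the quiet-Sun level inside the disk is close to 1 and the off-disk pixels are 0, so a later gray threshold is not biased by the center-to-limb intensity gradient.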

Figure 3

Demonstration of the subsequent phases of full-disk image pre-processing. (a) Same image as in Figure 1b, (b) mask defining the solar disk, (c) blurred image with limb-darkening and large-scale variations, and (d) the pre-processed image.

The pre-processing of local high-resolution images aims to remove dark points within the flare region. This can be achieved by implementing

$$ {\mathrm{Img}}'(x)=\left \{ \textstyle\begin{array}{l@{\quad}l} {\mathrm{Img}}(x), & \max({\mathrm{Img}_{5\times5}}(x))-\mathrm{Img}(x)\le T \\ \max({\mathrm{Img}_{5\times5}}(x)), & \max({\mathrm{Img}_{5\times5}}(x))-\mathrm{Img}(x)>T \end{array}\displaystyle \right . $$
(2)

iteratively, where \(\mathrm{Img}\) is the original image, \(x\) denotes a pixel in \(\mathrm{Img}\), \(\mathrm{Img}'\) is the pre-processed image, and \(\max({\mathrm{Img}_{5\times5}}(x))\) is the maximum intensity within the \(5\times 5\) window in the neighborhood of \(x\). In Equation 2, the pre-processing results are not sensitive to the threshold \(T\), which is set to 5000 empirically for 16-bit images. Figure 4 shows the original image and the pre-processed image.
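A possible reading of Equation 2 is sketched below. The text only states that the rule is applied iteratively, so the stopping rule (stop when nothing changes, or after `max_iter` passes) and the function names are assumptions made for illustration.

```python
import numpy as np

def local_max_5x5(img):
    """Maximum intensity in the 5x5 window around each pixel (edges are replicated)."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")
    out = img.copy()
    for dy in range(5):
        for dx in range(5):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def fill_dark_points(img, T=5000, max_iter=10):
    """Apply Equation 2 repeatedly; max_iter is an assumed safety limit."""
    cur = np.asarray(img).astype(np.int64)  # avoid unsigned wrap-around in the difference
    for _ in range(max_iter):
        local_max = local_max_5x5(cur)
        # Pixels much darker than their neighborhood maximum are raised to that maximum.
        nxt = np.where(local_max - cur > T, local_max, cur)
        if np.array_equal(nxt, cur):
            break
        cur = nxt
    return cur
```

Pixels that differ from their neighborhood maximum by at most \(T\) are left unchanged, so only isolated dark points and sharp dark gaps are filled.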

Figure 4

Left panel: Original high-resolution image. Right panel: Pre-processed image.

3.2 Flare Segmentation

A solar flare is the brightest part of the image. If the threshold is chosen properly, the simple gray threshold method can yield good segmentation results. To improve robustness, the histogram with 100 bins is smoothed according to Equation 3, where \(i\) is the index of the grayscale, \(x(i)\) is the corresponding grayscale, and \(h(x(i))\) is the amplitude of the histogram:

$$ h\bigl(x(i)\bigr)=\left \{ \textstyle\begin{array}{l@{\quad}l} { ( h(x(i-1))+h(x(i))+h(x(i+1)) ) }/{3}, & 2\le i\le99 \\ h(x(i)), & i=1 \mbox{ or } 100. \end{array}\displaystyle \right . $$
(3)

In this article, the threshold Th is the location of the first local minimum on the right side of the gray histogram, as indicated with the black arrow in Figure 5. It can be calculated as

$$ {\mathrm{Th}}=\min\bigl(x({{i}_{\mathrm{m}}}),x(100)\bigr). $$
(4)

In Equation 4, the local minimum must meet the requirements that \(h(x({{i}_{\mathrm{m}}}))\le h(x({{i}_{\mathrm{m}}}-1))\), \(h(x({{i}_{\mathrm{m}}}))\le h(x({{i}_{\mathrm{m}}}+1))\), and \({{i}_{\mathrm{m}}}>50\). If no local minimum is found, no solar flare is considered to occur, and thus \(\mathrm{Th}=x(100)\).
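Equations 3 and 4 can be sketched as follows. The sketch assumes that \(x(i)\) is the center of the \(i\)-th histogram bin and that the "first" local minimum is the one with the smallest index \(i_{\mathrm{m}}>50\); both choices, as well as the function names, are interpretations made for illustration.

```python
import numpy as np

def smooth_histogram(h):
    """Equation 3: three-point moving average, end bins left unchanged."""
    h = np.asarray(h, dtype=float)
    s = h.copy()
    s[1:-1] = (h[:-2] + h[1:-1] + h[2:]) / 3.0
    return s

def gray_threshold(img, bins=100):
    """Equation 4: grayscale of the first local minimum with i_m > 50."""
    counts, edges = np.histogram(img, bins=bins)
    x = 0.5 * (edges[:-1] + edges[1:])  # assumed: x(i) is the bin center
    h = smooth_histogram(counts)
    # 1-based i_m > 50 corresponds to 0-based k >= 50; k + 1 must stay in range.
    for k in range(50, bins - 1):
        if h[k] <= h[k - 1] and h[k] <= h[k + 1]:
            return x[k]
    return x[-1]  # no local minimum: no flare, Th = x(100)
```

Pixels brighter than the returned threshold form the candidate flare region. When the histogram decreases monotonically on its right side, the threshold falls in the last bin and essentially nothing is segmented.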

Figure 5

Histogram and its smoothed curve of (a) the cropped image from the full-disk image and (b) the high-resolution image. The black arrows show the location of the gray threshold.

Then, an area threshold is set to remove small flare parts. It can also be used to determine the start and end time of each flare event, since each flare event has a minimum flare area. However, different solar observatories set different lower limits for the subflare area in their reports (Švestka, 1976). According to the observations of microflares, their typical size is several arcseconds (Berkebile-Stoiser et al., 2009), so it is reasonable to set the lower limit of subflare size to 10 arcseconds. A circle with 10 arcsec diameter has an area of \(78.5~\mbox{arcsec}^{2}\), i.e. \(13.5~\upmu\mbox{hem}\) (millionths of the solar hemisphere). If the segmented flare area is smaller than \(13.5~\upmu\mbox{hem}\), no flare is considered to occur.
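The conversion behind the area threshold can be checked with a few lines of arithmetic. The sketch below assumes a mean apparent solar radius of 960 arcsec and ignores foreshortening away from the disk center; with these assumptions the result is about \(13.6~\upmu\mbox{hem}\), close to the quoted \(13.5~\upmu\mbox{hem}\), and the exact value depends on the radius adopted.

```python
import math

SOLAR_RADIUS_ARCSEC = 960.0  # assumed mean apparent solar radius

def arcsec2_to_microhem(area_arcsec2, solar_radius_arcsec=SOLAR_RADIUS_ARCSEC):
    """Convert a projected area near disk center to millionths of the solar hemisphere."""
    hemisphere_arcsec2 = 2.0 * math.pi * solar_radius_arcsec ** 2
    return area_arcsec2 / hemisphere_arcsec2 * 1e6

# Circle with a diameter of 10 arcsec.
min_area_arcsec2 = math.pi * (10.0 / 2.0) ** 2
min_area_microhem = arcsec2_to_microhem(min_area_arcsec2)

def passes_area_threshold(area_microhem, limit_microhem=13.5):
    """Area criterion from the text: smaller regions are not counted as a flare."""
    return area_microhem >= limit_microhem
```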

3.3 Post-processing

In this processing step, the flare start, peak, and end time, the importance class, and the brightness class are extracted.

In each solar flare event, the brightness usually increases rapidly to its maximum within several minutes, and then decreases slowly. Taking advantage of this typical evolution yields more reliable start and end times. According to the Space Weather Prediction Center (SWPC), the start time of an X-ray flare event is defined as the first minute, in a sequence of 4 minutes, of a steep monotonic increase in the soft X-ray flux. The end time is the time when the flux level decays to a point halfway between the maximum flux and the pre-flare background level. Similarly, if the flux is replaced by the segmented flare area, the start and end time of an H\(\upalpha\) solar flare event can be defined. The peak time is defined as the time when the highest intensity during the flare event is reached, which follows the definition in Pötzi et al. (2015).
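The start and end criteria, with the flux replaced by the segmented flare area, might be sketched as follows. Several simplifications are assumed: one area sample per minute, no explicit steepness criterion beyond a strictly increasing run, and the area at the start time taken as the pre-flare background. The function name is hypothetical, and the maximum used here is that of the area, which serves only to locate the end time.

```python
def flare_start_end(times, areas, run=4):
    """Return (start_time, end_time), or None if no rising run is found."""
    n = len(areas)
    start = None
    for i in range(n - run + 1):
        window = areas[i:i + run]
        # First sample of `run` consecutive, strictly increasing area values.
        if all(b > a for a, b in zip(window, window[1:])):
            start = i
            break
    if start is None:
        return None
    peak = max(range(start, n), key=lambda k: areas[k])
    # Halfway between the maximum area and the assumed background level.
    half = 0.5 * (areas[peak] + areas[start])
    for k in range(peak + 1, n):
        if areas[k] <= half:
            return times[start], times[k]
    return times[start], None  # the decay is not yet covered by the data
```

An end time of `None` corresponds to the case where the observations stop before the area has decayed to the halfway level, for example because of a data gap.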

To eliminate false detections, which have high brightness without an obvious eruptive process, it is necessary to label each flare component, that is, to cluster nearby components and determine their start and end time as a whole. For the full-disk images, components are labeled with the same number if the distance between them is smaller than a threshold, which is 150 arcsec near the disk center, following Pötzi et al. (2015), and diminishes accordingly closer to the limb.
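The labeling of nearby components can be sketched with a small union-find structure. The sketch assumes that each component is represented by its centroid in arcsec and uses a constant distance threshold of 150 arcsec; the reduction of the threshold toward the limb is not specified in detail here and is therefore omitted. The function name is hypothetical.

```python
import math

def label_components(centroids, max_dist=150.0):
    """Give the same label to components closer than max_dist, directly or via a chain."""
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dx = centroids[i][0] - centroids[j][0]
            dy = centroids[i][1] - centroids[j][1]
            if math.hypot(dx, dy) < max_dist:
                parent[find(i)] = find(j)

    roots = {}
    labels = []
    for i in range(n):
        root = find(i)
        labels.append(roots.setdefault(root, len(roots) + 1))
    return labels
```

Components that are linked through a chain of close neighbors end up in the same group even if the two outermost components are farther apart than the threshold.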

In the H\(\upalpha\) band, the brightness of a flare can be classified as F (faint), N (normal), or B (bright). The brightness class used to be determined through the subjective estimation of the flare intensity by the observer (Temmer et al., 2001), as well as through the brightness enhancement (Wittman, 2012) and line width (see http://www.sws.bom.gov.au/Educational/2/4/2). Apart from the subjective estimation, the method based on brightness enhancement calculates the ratio of the highest intensity to the background intensity, but the intensities are strongly affected by the H\(\upalpha\) filter employed. The method based on the line width requires tuning the H\(\upalpha\) filter off-band. In this article, the H\(\upalpha\) flare brightness class can be derived from

$$ {\mathrm{Bri}}=\frac{\max(\mathrm{flare})-\min(\mathrm{flare})}{ \mathrm{rms}(\mathrm{flare})} $$
(5)

and Table 1 in an objective manner, without tuning the filter. This method described in Equation 5 is based on the assumption that brighter flares have wider intensity distributions. The H\(\upalpha\) flare importance class depends on the flare area, and their relationship is shown in Table 2. We note that the flare areas in this article are in units of \(\upmu\mbox{hem}\).
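Equation 5 can be evaluated directly from the pixels inside the segmented flare region. The sketch below assumes that rms(flare) denotes the root mean square of the flare pixel intensities; the function name is hypothetical, and the mapping from Bri to the classes F, N, and B follows Table 1 and is not reproduced here.

```python
import numpy as np

def brightness_index(img, flare_mask):
    """Equation 5: (max - min) / rms over the pixels of the segmented flare region."""
    flare = np.asarray(img, dtype=float)[np.asarray(flare_mask, dtype=bool)]
    if flare.size == 0:
        return None  # no flare pixels were segmented
    rms = np.sqrt(np.mean(flare ** 2))  # assumed definition of rms(flare)
    if rms == 0:
        return None
    return (flare.max() - flare.min()) / rms
```

Because the intensity range is divided by the rms, Bri is unchanged when all intensities are multiplied by a common factor, which makes it less dependent on the overall intensity scale of the data than an absolute enhancement.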

Table 1 Relationship between the H\(\upalpha\) flare brightness class and the Bri.
Table 2 Relationship between the H\(\upalpha\) flare importance class and the flare area.

4 Results

BBSO and GST observations have the same observing time and site, so it is convenient to compare full-disk and high-resolution flare detection results using their observations.

Comparison with official data is essential to validate the detection method. The SWPC publishes an event report every day at http://www.swpc.noaa.gov/products/solar-and-geophysical-event-reports/. The report contains features of optical flares observed in H\(\upalpha\) by several observatories, including Culgoora in Australia, Holloman in the USA, and San Vito in Italy. KSO extracts H\(\upalpha\) flare features using the method proposed by Pötzi et al. (2015), and publishes the results at http://ceasr.kso.ac.at/flare_data/kh_flares-query.php. To facilitate comparison with the features extracted by KSO and to validate the reliability on different databases of our method, experiments on KSO observations are also carried out.

In this section, experimental results consist of the segmented flare region and the comparison with SWPC and KSO reports on brightness class, importance class, and start and end time of each flare event.

4.1 Flare Segmentation

To validate the effectiveness of our method, we compared the results with those of Qu et al. (2004), Maurya and Ambastha (2010), and Kirk et al. (2013). In Figures 6 and 7, the full-disk H\(\upalpha\) observations of different flare phases from BBSO and KSO are used to verify the reliability of the methods on different databases. The corresponding observation times are given at the top, and the applied methods are labeled on the right. From a subjective point of view, in Figure 6, our method and the method employed by Maurya and Ambastha (2010) obtain satisfactory results, while the results derived with the method of Kirk et al. (2013) are less stable between different flare phases, and the results derived with the method of Qu et al. (2004) tend to overestimate the flare area. In Figure 7, our method and that of Kirk et al. (2013) obtain satisfactory results, while the results derived with the methods of Maurya and Ambastha (2010) and Qu et al. (2004) tend to overestimate and underestimate the flare area, respectively.

Figure 6

Segmentation results of the M6.5 solar flare on 22 June 2015, using the BBSO full-disk H\(\upalpha\) images. The images cropped from the original images are shown in the first row. The segmentation results obtained with our method and with the methods of Maurya and Ambastha (2010), Kirk et al. (2013), and Qu et al. (2004) are shown in the following rows, respectively. The red areas in these images are the detected solar flare.

Figure 7

Corresponding segmentation results of the M5.3 solar flare on 2 April 2017, using the KSO full-disk H\(\upalpha\) images.

To validate the accuracy of the segmentation results more objectively, we compare the solar flare area and the importance class between SWPC reports and the methods we tested in Table 3. We conclude that the flare areas calculated by our method and that of Kirk et al. (2013) correlate better with SWPC reports than the other tested methods. In summary, our method obtains more stable and accurate segmentation results than previous works.

Table 3 Comparison of the flare areas and importance classes of the M6.5 flare on 22 June 2015 and the M5.3 flare on 2 April 2017, as reported by SWPC and as calculated with our method and those of Maurya and Ambastha (2010), Kirk et al. (2013), and Qu et al. (2004).

Furthermore, to test the performance of our method on high-resolution images, the segmentation results for the same period as in Figure 6 using the GST high-resolution H\(\upalpha\) line center images are shown in Figure 8. We conclude that our method derives satisfactory solar flare segmentation results at any phase of the flare eruption in both full-disk and high-resolution images.

Figure 8

Segmentation results of the M6.5 solar flare on 22 June 2015, using the GST high-resolution H\(\upalpha\) line center images. The pre-processed images are shown in the top row, the corresponding times are plotted in the middle, and the red areas in the images in the bottom row are the detected solar flare.

4.2 Flare Classification

The extracted features of three solar flare events using the full-disk images of KSO are given in Tables 4, 5, and 6. We note that a single letter can follow a start, peak, or end time: \(\mbox{A} = \mbox{after}\), \(\mbox{B} = \mbox{before}\), and \(\mbox{U} = \mbox{uncertain}\). For example, the start time 07:29 B means that the event began before 07:29.

Table 4 Comparison of the C1.1 solar flare features on 28 March 2017, extracted with the method in this article and as given by SWPC and KSO.
Table 5 Comparison of the M5.3 solar flare features on 2 April 2017, extracted with the method in this article and as given by SWPC and KSO.
Table 6 Comparison of the X2.2 solar flare features on 6 September 2017, extracted with the method in this article and as given by SWPC and KSO.

The experimental results verify that the derived importance class and brightness class correlate well with the data given by SWPC and KSO. The start, peak, and end times given by our method and by KSO also correlate well. Although there are some discrepancies between these times and those given by SWPC, they might be explained by data gaps and by the differences between data sources. For the C1.1 flare on 28 March 2017, there is a data gap between 07:09 and 07:29 for KSO. For the M5.3 flare on 2 April 2017, no KSO data before 08:22 and after 09:09 are available. For the X2.2 flare on 6 September 2017, SWPC identifies several H\(\upalpha\) flare events using observations from San Vito in Italy and Holloman in the USA: one event between 08:30 and 08:44, one event between 08:52 and 15:53, and one event between 13:54 and 17:52.

Using the high-resolution solar images in the H\(\upalpha\) line center obtained by GST on 22 June 2015, the variation in gray threshold and flare area is shown in the left diagram of Figure 9. Between the start and end time, the derived gray threshold remains steady, and the variations in flare area are consistent with the typical flare development process, i.e. a dramatic increase followed by a slow decay. This verifies that the gray threshold derived with our method is reliable for high-resolution images. The right diagram in Figure 9 shows the corresponding results for full-disk images. The derived gray threshold is less steady than in the left diagram, and it varies with the maximum intensity within the flare region.

Figure 9

Derived gray threshold and flare area changes as a function of time, according to the full-disk solar H\(\upalpha\) images and the high-resolution solar images in the H\(\upalpha\) line center for the M6.5 class flare on 22 June 2015. The left diagram corresponds to the result of high-resolution images, and the right diagram corresponds to the result of full-disk images. The green vertical lines denote the start and end time of the flare.

5 Conclusion

We developed an automated solar flare segmentation and feature extraction method that we applied to high-resolution and full-disk H\(\upalpha\) images. The segmentation procedure is a combination of an adaptive gray threshold and an area threshold. The start, peak, and end time of each flare are computed from the time when the flare area exceeds a threshold, the time when the flare reaches its maximum intensity, and the time when the flare area falls below a threshold, respectively. At the same time, H\(\upalpha\) flare features such as the importance class and brightness class are also extracted, which are based on the maximum flare area and the intensity distribution within the flare region, respectively. Experimental results have verified that the segmentation results on full-disk images derived with our method are more stable and accurate than those of previous works. In addition, the segmented flare region in high-resolution images is consistent with subjective evaluations, and the extracted flare features correlate well with the data given by KSO. When abundant features from observations of different telescopes can be obtained in a consistent way, our understanding of H\(\upalpha\) solar flares may be extended through more sophisticated statistical analyses.