1 Introduction

A solar flare is observed at the solar surface as a strong and extended emission of energy and light. Flares are often associated with plasma ejections from the solar atmosphere, called coronal mass ejections (CMEs). Flares also eject electrons, ions, and atoms from the Sun into interplanetary space. Thus, solar flares are very important for space-weather forecasting. Many authors have studied the relations between flare occurrence and several solar-activity parameters such as McIntosh group, Mount Wilson classification, sunspot area, and magnetic properties (Smith and Howard, 1968; McIntosh, 1990; Sammis, Tang, and Zirin, 2000; Zharkov and Zharkova, 2006; Barnes et al., 2007; Barnes and Leka, 2008). In addition, there are close relations between previous flares and future flaring, such as homologous flares, which occur at a similar location with a similar shape (e.g. Zirin and Werner, 1967; Sui, Holman, and Dennis, 2004; Xu et al., 2014). Such flares are observed in complex active regions, which may produce multiple strong flares on consecutive days.

In recent years, the prediction of solar flares has been studied using two representative methods. One is the multiple-regression method, which assumes a linear relation between input and output parameters (Bornmann and Shaw, 1993; Lee et al., 2007; Song et al., 2009; Yuan et al., 2010). The other is the machine-learning method, typically an artificial neural network (ANN) or a support vector machine (SVM). The SVM method classifies data into two groups by maximizing the margin between them (Li et al., 2007, 2008; Qahwaji and Colak, 2007; Yuan et al., 2010). The ANN method is inspired by the structure of biological neural systems and has been applied to flare-forecast models (Qahwaji and Colak, 2007; Colak and Qahwaji, 2009; Wang et al., 2008; Li and Zhu, 2013; Ahmed et al., 2013).

Several flare-forecast models have been developed based on sunspot-group information: the Automated Solar Activity Prediction (ASAP: Colak and Qahwaji, 2009) and the magnetic-free-energy model (MAG4: Falconer et al., 2014). Another flare-forecast model is operated at the Space Weather Prediction Center of the National Oceanic and Atmospheric Administration (NOAA/SWPC). This NOAA model predicts flare-occurrence probabilities based on both McIntosh sunspot properties (e.g. McIntosh classification and sunspot area) and the expertise of the forecaster (Crown, 2012).

Most researchers have made flare-probability forecasts based on Poisson statistics (Gallagher, Moon, and Wang, 2002; Lee et al., 2012; Bloomfield et al., 2012). Recently, several authors have studied flaring and non-flaring forecasts using magnetograms from the Solar and Heliospheric Observatory (SOHO: Domingo, Fleck, and Poland, 1995)/Michelson Doppler Imager (MDI: Scherrer et al., 1995) and the Solar Dynamics Observatory (SDO: Pesnell, Thompson, and Chamberlin, 2012)/Helioseismic and Magnetic Imager (HMI: Schou et al., 2012) with support vector machine (Bobra and Couvidat, 2015) and neural-network (Ahmed et al., 2013) methods.

In this study, we focus on forecasting the maximum flare flux of strong flares, which is a more direct forecast of flare strength than flare probability or flaring/non-flaring forecasting. We develop daily maximum flare-flux models using the McIntosh classification, sunspot area, Mount Wilson magnetic classification, and flare fluxes of the previous day. We train our models using the same number of flares in each class (C-, M-, and X-class), since previous forecast models have shown relatively lower performance for strong flares (M- and X-class) than for weak flares (C-class), which might be caused by the larger number of weak flares considered in training. To give more weight to strong flares, we introduce a weighted mean flare rate [WMFR] and a weighted total flare flux [\(F_{\mathrm{t}}\)] as input parameters. The WMFRs are evaluated from historical flare data for the McIntosh and Mount Wilson classifications by converting them into occurrence rates with weights for each flare class; these rates are used as numerical input values. \(F_{\mathrm{t}}\) is the total flare flux of the previous day, weighted by flare class. We use multiple linear regressive and artificial neural network methods to develop the models, and we find the best models by evaluating their performance with various statistical parameters.

This article is organized as follows: We introduce the data in Section 2. We describe forecast methods and statistical parameters for evaluation in Section 3. In Section 4, we present the results and compare them with results of other articles. Finally, we present the summary and conclusion in Section 5.

2 Data

We used the NOAA Solar Region Summary (SRS) for solar active-region (AR) information ( www.swpc.noaa.gov/ftpmenu/warehouse.html ) from January 1996 to November 2013. For information about the flares, we used the Geostationary Operational Environmental Satellite (GOES) solar X-ray flare catalog of the National Geophysical Data Center (NGDC) ( ftp://ftp.ngdc.noaa.gov/STP/SOLAR_DATA/SOLAR_FLARES/FLARES_XRAY ) from January 1996 to December 2012 and the Lockheed-Martin latest solar events ( www.lmsal.com/solarsoft/latest_events_archive.html ) from January 2013 to November 2013.

To generate a set of data for each AR, we took the McIntosh classification (McIntosh, 1990), sunspot area, and Mount Wilson magnetic classification (Smith and Howard, 1968) from the SRS. We used all ARs for a given day from the SRS data. For the flare-flux parameter of a specific AR on a given day, we chose the highest flare flux among the multiple X-ray flares of that AR. Then we took the daily X-ray flare fluxes of all C-, M-, and X-class flares for a given AR. It is well known that most X-class flares are accompanied by previous flare activity (Xu et al., 2014) and that information on X-ray flares of the previous day is an important input for flare forecasting (Lee et al., 2007). Thus, we considered the 92 X-class flare events out of 105 X-class flares that had X-ray flares during the previous day. Of the remaining 13 X-class flares, 12 occurred near the limb, so we were unable to obtain their AR information from the SRS data or the NGDC X-ray flare catalog. We did not consider the last remaining event since its AR information from the SRS is inconsistent with that from the NGDC flare catalog. Therefore, we used 92 X-class flare events: 61 for training and 31 for testing.

We only considered flares that had X-ray flares on the previous day. The training data consist of flaring ARs with C-, M-, and X-class flares. The training dataset covers January 1996 to December 2004, and the number of flares in each class is 61; we randomly selected 61 events each for the C- and M-class flares. After the training was completed, we used a testing dataset from January 2005 to November 2013, which consists of all ARs that had flares.

2.1 Weighted Mean Flare Rate of McIntosh and Mount Wilson Magnetic Classifications

We defined the weighted mean flare rate (WMFR) to give greater weight to strong flares as follows:

$$\begin{aligned} \mathrm{WMFR} =& \frac{{N_{\mathrm{C}}} + 10 {N_{\mathrm{M}}} + 100 {N_{\mathrm{X}}}}{{\mbox{Total number of active regions for each group}}}, \end{aligned}$$
(1)

where \({N_{\mathrm{C}}}, {N_{\mathrm{M}}}\), and \({N_{\mathrm{X}}}\) are the numbers of C-, M-, and X-class flares for each McIntosh and Mount Wilson class.
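As a minimal sketch (in Python; the function and variable names are ours, not from the article), Equation (1) can be evaluated for one classification group as follows:

```python
def wmfr(n_c, n_m, n_x, n_regions):
    """Weighted mean flare rate (Equation 1): C-, M-, and X-class
    flare counts weighted by 1, 10, and 100, divided by the total
    number of active regions in the classification group."""
    return (n_c + 10 * n_m + 100 * n_x) / n_regions

# Hypothetical group: 40 C-, 5 M-, and 1 X-class flare over 200 regions
rate = wmfr(40, 5, 1, 200)  # (40 + 50 + 100) / 200 = 0.95
```

Each McIntosh and Mount Wilson class thus receives one numerical rate, which serves as the input value for that class.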

Figures 1 and 2 show the top ten WMFRs of the 60 McIntosh classifications and the WMFR values of each Mount Wilson magnetic classification calculated using data from January 1996 to December 2004. Stronger and more complex McIntosh classifications have higher WMFR values. Most of the complex sunspots of the Mount Wilson magnetic classification that are characterized by \(\gamma\) and/or \(\delta\) show higher WMFR values.

Figure 1

WMFR distribution for the McIntosh sunspot classification.

Figure 2

WMFR distribution for the Mount Wilson magnetic classification.

2.2 Weighted Total Flare Flux \(F_{\mathrm{t}}\)

To consider the X-ray maximum flare flux, we generated two parameters. One was the base-ten logarithm of the strongest X-ray maximum flare flux in each AR, normalized by the C1.0 flux. For example, C2.1, M5.7, and X1.1 were converted to \(\log_{10}{2.1}\), \(\log_{10}{57}\), and \(\log_{10}{110}\), respectively. To give more weight to the stronger flare classes (M- and X-class), we introduced the weighted total flare flux [\(F_{\mathrm{t}}\)] as follows:

$$\begin{aligned} {F_{\mathrm{t}}} = \log_{10} \biggl[ \sum_{i}{F_{\mathrm{C}}}+10 \sum_{j}{F_{\mathrm{M}}}+100 \sum_{k}{F_{\mathrm{X}}} \biggr], \end{aligned}$$
(2)

where \({F_{\mathrm{C}}}\), \({F_{\mathrm{M}}}\), and \({F_{\mathrm{X}}}\) are the fluxes of the C-, M-, and X-class flares of each AR, respectively, in C1.0 units. For instance, if one active region generates C5.4, M3.7, M2.1, and X1.2 flares on a specific day, then the weighted total flux is given by

$$\begin{aligned} {F_{\mathrm{t}}} = \log_{10} [ 5.4+10 (37+21)+100 \times 120 ] \approx 4.10. \end{aligned}$$
(3)

After the maximum flare flux and \(F_{\mathrm{t}}\) were evaluated, we used them as the input parameters of the forecasting models.
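A short sketch of Equation (2), assuming the flare fluxes are already expressed in C1.0 units (the function name is ours):

```python
import math

def weighted_total_flux(f_c, f_m, f_x):
    """Weighted total flare flux F_t (Equation 2).
    f_c, f_m, f_x: lists of C-, M-, and X-class fluxes in C1.0 units,
    weighted by 1, 10, and 100, respectively."""
    return math.log10(sum(f_c) + 10 * sum(f_m) + 100 * sum(f_x))

# Worked example from Equation 3: C5.4, M3.7, M2.1, and X1.2 flares
ft = weighted_total_flux([5.4], [37, 21], [120])
# log10(5.4 + 580 + 12000) = log10(12585.4), approximately 4.10
```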

2.3 Input and Output

We specified five input parameters for the forecasting models. We used the WMFRs of the McIntosh and Mount Wilson classifications and the maximum flare flux and \(F_{\mathrm{t}}\) of the previous day, as explained in Sections 2.1 and 2.2. In addition, we used the area of a given active region since it is closely related to the flare flux (e.g. Sammis, Tang, and Zirin, 2000). All input parameters were normalized by their maximum values so that they range from 0 to 1. The output is the maximum flare flux that occurs within 24 hours for each AR on a given day. We summarize the input and output parameters in Table 1.

Table 1 Summary of input parameters.

3 Method

Here we apply multiple linear regressive and artificial neural network methods to historical solar AR and flare data to forecast the daily maximum flare flux. We tried various combinations of the five input parameters proposed in Section 2 to find the best estimates.

A comparison of these methods is useful to determine which is appropriate for a given dataset. The linear regressive method simply assumes a linear relation between the input and output parameters. In contrast, neural-network algorithms train a system by trial and error and can capture nonlinear behavior, which may yield better estimates.

3.1 Multiple Linear Regressive (MLR) Method

A multiple linear regression (MLR) method typically solves a linear system \(Ax=b\) that has more equations than unknowns. As an example of one-dimensional regression, we estimate the parameters \(a\) and \(b\) in the linear equation \(y = a x + b\) from a dataset \((x_{j}, y_{j})\), where \(j = 1, 2, \ldots, n\). To evaluate \(a\) and \(b\), the MLR uses the least-squares method, which minimizes the sum of squared distances between the given data and the regression line.
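The one-dimensional least-squares fit can be sketched as follows (a Python illustration with the closed-form formulas; the function name and example data are ours, while for the full five-parameter model we used the IDL function regress):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed-form solution)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope: covariance of x and y over variance of x
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx  # intercept passes through the mean point
    return a, b

# points lying exactly on y = 2x + 1 are recovered exactly
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # a = 2.0, b = 1.0
```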

We first set up the multi-linear equation using the input parameters in Table 1. At a given time [\(t\)], the following equation estimates the flare flux at a time \(t+1\):

$$\begin{aligned} Y_{t+1} = \sum_{j = 1 }^{5}{a_{j} (X_{t})_{j}} + b, \end{aligned}$$
(4)

where \(Y_{t+1}\) is the predicted flare flux and the \(X_{t}\) are the normalized input parameters. \(a_{1}, a_{2}, \ldots, a_{5}\) and \(b\) are the coefficients and the constant term, respectively, which were obtained using the IDL function regress.

We trained the system on a dataset of C-, M-, and X-class flares. Since the number of X-class flares with flare information from the previous day is 61, we chose the same number of C- and M-class flares for the training dataset. If we had used all flares, Equation (4) would likely have favored the weaker flares, since the stronger flares (M- and X-class) are much rarer than the weaker flares (C-class).

3.2 Artificial Neural Network (ANN) Method

An artificial neural network (ANN) was first proposed in the field of artificial intelligence, where it mimics biological neural systems. The ANN method has been applied to various scientific problems and has succeeded in recognizing patterns in a variety of datasets. In the field of space-weather forecasting, ANNs are frequently used to forecast flares by constructing suitable networks (e.g. Qahwaji and Colak, 2007; Colak and Qahwaji, 2009; Wang et al., 2008).

A typical ANN consists of three layers, namely the input, hidden, and output layers, as described in Figure 3. The layers are connected by weights that determine how strongly a signal is transferred to the next neuron through an activation function [\(\sigma(w,x)\)]. The activation function [\(\sigma(w,x)\)] was chosen as a sigmoid or Gaussian function in this study.

Figure 3

Simple artificial neural network (ANN) process with three layers.

A mathematical representation of Figure 3 can be written as

$$\begin{aligned} y_{k} (x,w)=\sigma \Biggl( \sum_{j=1}^{M}w_{kj}^{(2)} \sigma \Biggl(\sum_{i=1}^{D} w_{ji}^{(1)} x_{i}+w_{j0}^{(1)} \Biggr) + w_{k0}^{(2)} \Biggr), \end{aligned}$$
(5)

where \(M\) is the number of nodes in the hidden layer and \(D\) is the number of input parameters. Here, the input parameter is \(x_{i}\) and the output variable is \(y_{k}\). There are several ways to evaluate the unknown weights [\(w\)] of Equation (5). We calculated \(w\) using the backward-propagation and resilient-propagation methods (Graupe, 2007).
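The forward pass of Equation (5) for a single hidden layer can be sketched as follows (a Python illustration with a sigmoid activation; the toy weights are arbitrary illustrations, not trained values):

```python
import math

def sigmoid(z):
    """Logistic activation function, one of the choices for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Forward pass of Equation (5): y_k = sigma( sum_j w2[k][j] *
    sigma( sum_i w1[j][i]*x[i] + b1[j] ) + b2[k] )."""
    hidden = [sigmoid(sum(wji * xi for wji, xi in zip(row, x)) + bj)
              for row, bj in zip(w1, b1)]
    return [sigmoid(sum(wkj * hj for wkj, hj in zip(row, hidden)) + bk)
            for row, bk in zip(w2, b2)]

# Toy network: 2 inputs, 2 hidden nodes, 1 output (weights are arbitrary)
y = forward([0.3, 0.7],
            [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1],
            [[1.0, -1.0]], [0.0])
# y[0] is approximately 0.475
```

Training then amounts to adjusting the weights \(w\) so that the network output matches the observed flare fluxes, which is what the backward-propagation and RPROP methods do iteratively.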

The number of hidden layers affects the forecasting performance. It can be chosen in various ways depending on the problem, so empirical trials may suggest a suitable number. We selected two hidden layers because two layers tended to show the best predictions in many trials. We considered from \(1\times1\) to \(10\times10\) nodes for the two hidden layers.

The input parameters were trained by the ANN using the open-source computer vision library (OpenCV: opencv.org ), which includes various machine-learning modules (Bradski, 2000). Training and prediction programs were compiled with the library using Visual C++. The ANN uses the resilient-propagation (RPROP) algorithm for multilayer feed-forward neural networks. RPROP is a first-order optimization algorithm and one of the fastest error-propagation algorithms (Riedmiller and Braun, 1993). To prevent overfitting in the training process, we used both an iteration number and a training error as termination criteria. We used the Gaussian function as the activation function in the ANN.

3.3 Evaluation

A contingency table was used to measure the performance of the forecasting models. This table contains the numbers of yes/no outcomes between observed and predicted events, as shown in Table 2. A, B, C, and D are the numbers of hits, false alarms, misses, and correct nulls, respectively.

Table 2 The contingency table for prediction and observation.

The output value of our model is a flare flux. We have two ways to evaluate it: the contingency table and the hitting fraction. To convert our results into a contingency table, we considered each flare class as follows. Taking C-class flares as an example, the number of hits (A in Table 2) was counted when the observed and forecast flare fluxes were both C-class. The number of false alarms (B) was counted when the observed flare flux was not C-class but the predicted flare flux was. The number of misses (C) was counted when the observed flare flux was C-class but the predicted flare flux was not. Lastly, the number of correct nulls (D) was counted when neither the observed nor the predicted flare flux was C-class. In addition, we considered a hitting fraction, described in Section 4.3. For example, when the flare flux of AR 10 715 was of class M1.2 on 31 December 2014 and the predicted flare flux was M1.4, a hit (A) was counted in the M-class contingency table. Using this contingency table, we derive statistical parameters that measure the forecasting performance.

There are various statistical measures of forecasting performance based on contingency tables. In Table 3, we specifically chose the probability of detection (POD), critical success index (CSI), false-alarm ratio (FAR), and the Heidke skill score (HSS). POD is the ratio of the number of correctly forecast events to the number of observed events; it ranges from 0 to 1, where unity is the best score. CSI additionally accounts for false alarms, which results in a more balanced score; as for POD, the best CSI score is 1. FAR measures the rate of incorrect forecasts as the ratio of false alarms to the total number of forecast events; it also ranges between 0 and 1, and the best score is 0. HSS indicates the improvement of the forecast over a standard forecast, usually the number of forecasts correct by chance. HSS ranges from \(-\infty\) to 1: a value below 0 indicates that the forecasting model has no skill, whereas 1 corresponds to a perfect forecast.

Table 3 Statistical parameters from the contingency table.

Because the HSS is known to be a good statistical parameter for rare-event forecasts (Doswell, Davies-Jones, and Keller, 1990), we mainly used the HSS, which is appropriate for measuring performances for strong-flare (M- and X-class) forecasts.
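The statistical parameters of Table 3, together with the true skill statistic used later in Section 4.3, can be computed directly from the contingency-table entries. A sketch with the standard definitions (the example counts below are hypothetical):

```python
def skill_scores(a, b, c, d):
    """Verification measures from a 2x2 contingency table:
    a = hits, b = false alarms, c = misses, d = correct nulls."""
    n = a + b + c + d
    pod = a / (a + c)                 # probability of detection
    far = b / (a + b)                 # false-alarm ratio
    csi = a / (a + b + c)             # critical success index
    # expected number of correct forecasts by chance
    exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    hss = (a + d - exp) / (n - exp)   # Heidke skill score
    tss = a / (a + c) - b / (b + d)   # true skill statistic
    return pod, far, csi, hss, tss

# Hypothetical table: A=20 hits, B=10 false alarms, C=5 misses, D=65 nulls
pod, far, csi, hss, tss = skill_scores(20, 10, 5, 65)  # hss = 0.625
```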

4 Results and Discussion

We used various combinations of input parameters for each method. We estimated the performance of each model using the statistical parameters described in Section 3.3. The forecast of M- and X-class flares is more important for space weather than that of weak flares. Therefore we focused on evaluating the performance of models for M- and X-class flares using HSS. From the results, we chose the best model for each forecasting method and compared these models.

4.1 Multiple Linear Regressive (MLR) Method

The development of a daily maximum flare-flux forecast model by MLR has two stages. First, we evaluate the coefficients [\(a_{j}\)] and constant [\(b\)] in Equation (4) using the training dataset. Second, the testing dataset is applied to Equation (4) with the previously determined coefficients. We tried many combinations of input parameters and then chose the best four models with the highest averaged HSS for all flares (\(\mathrm{M}{+}\mathrm{X}\)-class).

Table 4 shows the input parameters and their coefficients for each of the four MLR models. We note that all models include the area [\({X_{3}}\)] and a flare-flux parameter [\({X_{2}}\) or \({X_{4}}\)]. Since all input parameters are normalized, the coefficients (slopes) of the input parameters indicate their relative contributions to the output parameter. Table 4 shows that \(a_{2}\) and \(a_{4}\) are larger than the other coefficients, which means that the daily maximum flare flux depends most strongly on the flare flux of the previous day.

Table 4 Best four models for the MLR.

Table 6 shows the statistical parameters of the four models with high HSS values for M- and X-class flares. The best model, with the highest averaged HSS value, is MLR1, which uses the input parameters \(X_{2}\) and \(X_{3}\). For M- and X-class flares, the best scores of HSS and of the other statistical parameters are found in MLR2 and MLR1, respectively (boldface in Table 6). Table 5 shows the contingency tables for the best two models, MLR1 and MLR2. The area [\({X_{3}}\)] was used as an input parameter for all models. This result confirms earlier studies showing that the flare flux is closely related to the sunspot area as a proxy of the magnetic flux (Sammis, Tang, and Zirin, 2000; Lee et al., 2012).

Table 5 The contingency table for MLR1 and MLR2.

4.2 Artificial Neural Network (ANN)

To implement the daily maximum flare-flux forecast model using ANN, we first trained a neural network and then used a test dataset for flare forecasting. The training and testing datasets were the same as those of the MLR method. To choose the best models, we empirically ran the models with many combinations of input parameters and then found the four best models according to the averaged HSS values of all flares (\(\mathrm{M}{+}\mathrm{X}\)-class) as described in Tables 7 and 9.

The highest value of the averaged HSS of all flares (\(\mathrm{M}{+}\mathrm{X}\)-class) is 0.316 for the ANN1 model, which uses the input parameters \(X_{3}\), \(X_{4}\), and \(X_{5}\) in Table 7. For M- and X-class flares, the best scores of HSS and the other statistical parameters are found in ANN2 and ANN1, respectively (boldface in Table 9). Table 8 shows the contingency table for the best two models, ANN1 and ANN2.

It is noted that the area [\({X_{3}}\)], the weighted total flare flux [\({X_{4}}\)], and the Mount Wilson magnetic classification [\({X_{5}}\)] are included as input parameters for all models except for ANN2. Thus, these are the significant parameters for strong flare-flux forecasting with the ANN method.

4.3 Comparison of the Models and Discussion

By comparing the averaged HSS of the MLR and ANN models in Tables 6 and 9, we find that all ANN models outperform all MLR models. While the POD values of the MLR models are higher than those of the ANN models, the ANN models score better on the other four parameters. These results imply that the ANN models, based on learning neural networks, perform better for strong-flare forecasting than the MLR models, which simply assume a linear relation between input and output parameters.

Table 6 Statistical parameters of the four best MLR models for M- and X-class flares. Boldface indicates the best score for each parameter among the four MLR models.
Table 7 Best four models for the ANN.
Table 8 The contingency table for ANN1 and ANN2.
Table 9 Statistical parameters of the best four ANN models for M- and X-class flares. Boldface indicates the best score for each parameter among the four ANN models.

Table 10 shows a comparison of our models with the NOAA look-up table (Crown, 2012).

Table 10 Comparison of the TSS with the NOAA model: averaged TSS values over ten random simulations, with standard deviations in parentheses.

Although HSS is a good parameter for measuring the performance of rare-event forecasts, its restriction is that it can only be compared across datasets with the same ratio of non-flaring to flaring observations. Bloomfield et al. (2012) suggested an alternative parameter, the true skill statistic (TSS), which does not depend on this ratio. Therefore, we list TSS values to compare our models with the NOAA model in Table 10. To verify the consistency of our best MLR and ANN models, we additionally list standard deviations of the TSS in parentheses in Table 10. These standard deviations are derived from ten random realizations of the training dataset, each containing 61 randomly chosen C- and M-class flares. We note that the MLR method has relatively lower standard deviations than the ANN method, since it is less prone to overfitting than a highly flexible algorithm such as the ANN. The MLR method fits linear behavior, whereas the ANN method is more appropriate for nonlinear fitting, which generally depends on the input dataset; this implies that the ANN is more sensitive to the selection of the training dataset. The best averaged TSS value for all flares (\(\mathrm{M}{+}\mathrm{X}\)-class) among our models is 0.371 for MLR1, which is significantly higher than that of the NOAA model. The best models with the highest TSS values for M- and X-class flares are ANN2 (0.432) and MLR1 (0.377), respectively.

In addition, to evaluate the performance of the maximum flare-flux forecasts, we defined a hitting fraction (HF) for each flare class as the fraction of events satisfying the following criterion:

$$\begin{aligned} \big|\log_{10}\mbox{(observed flux)} - \log_{10}\mbox{(predicted flux)} \big|\le 0.5 \end{aligned}$$
(6)

A hit is counted when the predicted and observed fluxes differ by less than 0.5 in base-ten logarithm, i.e. by less than a factor of about 3.2. The HF indicates how accurately our daily maximum flare-flux forecast models estimate the observed maximum flare flux.
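A sketch of the HF computation under the criterion of Equation (6); the example fluxes below are hypothetical:

```python
import math

def hitting_fraction(observed, predicted):
    """Fraction of events whose predicted flux lies within 0.5
    in base-ten logarithm of the observed flux (Equation 6)."""
    hits = sum(1 for o, p in zip(observed, predicted)
               if abs(math.log10(o) - math.log10(p)) <= 0.5)
    return hits / len(observed)

# Fluxes in W/m^2: an M1.2 flare predicted as M1.4 counts as a hit,
# while a C5.0 flare predicted as M8.0 misses the 0.5-dex criterion
hf = hitting_fraction([1.2e-5, 5.0e-6], [1.4e-5, 8.0e-5])  # hf = 0.5
```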

Table 11 shows the hitting fraction (HF) of the MLR1, ANN1, and ANN2 models. The evaluated hitting fractions are in the range of 0.549 – 0.702, suggesting that the best models successfully forecast the maximum flare flux for about two-thirds of the events within this criterion. We also found that the hitting fractions of the ANN1 model are higher than those of the ANN2 model, which is most noticeable for X-class flares.

Table 11 Hitting fraction (HF) of each forecasting model.

5 Summary and Conclusions

To automatically forecast the daily maximum flare flux, we applied multiple linear regressive (MLR) and artificial neural network (ANN) methods to several input parameters, such as the weighted mean flare rates of the McIntosh and Mount Wilson classifications, sunspot area, previous maximum flare flux, and weighted total flare flux, from January 1996 to November 2013. We constructed the training dataset with the same number (61) of events for each of the C-, M-, and X-classes to avoid giving more weight to weak flares, since strong flares occur much less frequently than weak flares. We selected the best four models of each method. Our main results are as follows.

  i)

    We found that the active-region area [\({X_{3}}\)] and the weighted total flare flux [\({X_{4}}\)] of the previous day are the most widely used input parameters of the models. This implies that the magnetic flux of an active region and the flare information of the previous day are particularly important for maximum flare-flux forecasting.

  ii)

    To compare our models with previous ones, we used the TSS because it does not depend on the ratio of non-flaring to flaring observations. A comparison of the TSS of our MLR1 model with that of the NOAA model shows that the X-class flare forecasting of our model (MLR1: 0.377) is significantly better than that of the NOAA model (NOAA: 0.243).

  iii)

    We defined a hitting fraction (HF), which measures the fraction of successfully forecast events. The HFs of our models show that they can successfully forecast the maximum flare flux of about two-thirds of the events for strong flares.

To our knowledge, this study is the first attempt to forecast the daily maximum flare flux of a given active region using a neural-network method, in contrast to most other flare-forecasting models, which predict flare-occurrence probabilities for each flare class. The two types of flare forecast are complementary, in that users are interested both in the probabilities of flares of each class and in the daily maximum flare flux. The models developed in this study should be useful for automatic space-weather forecasting and should be extended with additional input parameters such as magnetic parameters.