1 Introduction

Ensemble modeling has been employed in weather forecasting in order to quantify prediction uncertainties and determine forecast confidence (Sivillo, Ahlquist, and Toth 1997). Individual forecasts that constitute an ensemble forecast represent possible scenarios that approximate a probability distribution that reflects forecasting uncertainties. Such uncertainties include those associated with initial conditions (such as observational uncertainties), techniques, and models. Different forecasts in the ensemble can start from different initial conditions and/or be based on different forecasting models/procedures. In the simplest application, the ensemble mean or a weighted mean can be taken as a single forecast. The ensemble mean should perform better than individual ensemble members by emphasizing systematic features found in all members. However, an ensemble also contains additional information about possible scenarios and their probabilities and thus provides a probabilistic forecast. For example, ensemble modeling provides a quantitative description of the forecast probability that an event will occur by giving event-occurrence predictions as a percentage of ensemble size. This conveys the level of uncertainty in a given forecast in contrast to a categorical yes/no forecast. Additionally, all ensemble-forecast members can be plotted together to allow visualization of the uncertainty among ensemble members, and their clustering distribution. An example of such a visualization is hurricane-track “plume” maps in weather forecasting. Regions where members tend to coincide/cluster can be taken to have a higher forecast confidence.

To understand the uncertainties in space-weather forecasting, ensemble coronal mass ejection (CME) forecasting efforts have now begun in space-weather models of the heliosphere. Fry et al. (2003), McKenna-Lawlor et al. (2006), and Smith et al. (2009) compared the performance of real-time shock-arrival-time forecasts following solar events (since 1997) from the three “Fearless Forecast” models: Shock Time of Arrival (STOA: Dryer 1974), Interplanetary Shock Propagation Model (ISPM: Smith and Dryer 1990), and Hakamada–Akasofu–Fry (HAFv.2: Dryer et al. 2001). While there are many models predicting the evolution of CMEs (see Zhao and Dryer 2014 and references therein), only the Wang–Sheeley–Arge (WSA) coronal model (Arge and Pizzo 2000; Arge et al. 2004) coupled with the global heliospheric ENLIL solar-wind model (Odstrčil 2003) has been used extensively in space-weather operations world-wide. The first effort in using this model for ensemble forecasting of CME propagation was reported by Pulkkinen et al. (2011). Emmons et al. (2013) performed WSA–ENLIL ensemble CME modeling using 100 ensemble members for 15 historical events with automatically determined cone-model CME parameters (Pulkkinen, Oates, and Taktakishvili 2010). They found that the observed CME arrival was within the ensemble prediction spread for 8 out of the 15 events. Lee et al. (2013) discussed ensemble modeling of CME propagation with WSA–ENLIL for an event study, using eight ensemble members and various synoptic background maps. Differences found in the predicted arrival time of each individual simulation were mostly due to CME initial speed and the time at which the CME was inserted at the WSA–ENLIL inner boundary, resulting in propagation through a different background solar wind. They used National Solar Observatory Global Oscillation Network Group (GONG: Harvey et al. 1996) synoptic magnetograms and Air Force Data Assimilative Photospheric flux Transport (ADAPT) maps (Arge et al. 2010; Henney et al. 2012). For their CME event, they showed that when using ADAPT maps, the WSA–ENLIL model values agreed better with in-situ observations, and the arrival-time predictions were improved as a result of the more accurate background solar-wind representation. However, the overall spread in CME arrival times did not change significantly.

This article describes the WSA–ENLIL+Cone ensemble-modeling system installed at the Community Coordinated Modeling Center (CCMC) and results from the past 1.5 years of real-time execution at the CCMC/Space Weather Research Center (SWRC). This is the first ensemble space-weather prediction system for CME propagation of its kind employed in a real-time environment. The current version of the system evaluates the sensitivity of CME arrival-time predictions from the WSA–ENLIL+Cone model to initial CME parameters. The CCMC, located at NASA Goddard Space Flight Center, is an interagency partnership to facilitate community research and accelerate implementation of progress in research into space-weather operations. The SWRC is a CCMC sub-team that provides space-weather services to NASA robotic mission operators and science campaigns and prototypes new models, forecasting techniques, and procedures. The CCMC also hosts the CME Scoreboard website (kauai.ccmc.gsfc.nasa.gov/CMEscoreboard), where members of the research community may submit CME arrival-time predictions in real time for a variety of forecasting methods. The website facilitates model validation under real-time conditions and enables collaboration. For every CME event table on the site, the average of all submitted forecasts is automatically computed, which itself provides a world-wide ensemble-mean CME arrival-time forecast from a variety of models/methods.

In Section 2 a brief description of the WSA–ENLIL+Cone model is given. The triangulation algorithm for determining CME parameters for the ENLIL model is described in Section 3. The real-time ensemble-modeling methodology is explained in Section 4, followed by an example of an ensemble simulation given in Section 5. Results and the evaluation of the first 1.5 years of simulations are described in Section 6. In Section 7 we discuss a parametric event case study of the sensitivity of the CME arrival-time prediction to model free parameters for the CME and ambient solar wind. Finally, a summary and discussion are presented in Section 8.

2 WSA–ENLIL+Cone Model Description

The global 3D MHD WSA–ENLIL model provides a time-dependent description of the background solar-wind plasma and magnetic field into which a CME can be inserted (Odstrčil, Smith, and Dryer 1996; Odstrčil and Pizzo 1999a, 1999b; Odstrčil 2003; Odstrčil, Riley, and Zhao 2004). This modeling system does not simulate CME initiation but uses kinematic properties of CMEs inferred from coronagraphs to launch a CME-like hydrodynamic structure into the solar wind and interplanetary magnetic field computed from the WSA coronal model (Arge and Pizzo 2000; Arge et al. 2004). A common method to estimate the 3D CME kinematic and geometric parameters is to assume that the geometrical CME properties are approximated by the Cone model (Zhao, Plunkett, and Liu 2002; Xie, Ofman, and Lawrence 2004) which assumes isotropic expansion, radial propagation, and constant CME cone angular width. Generally, a CME disturbance is inserted in the WSA–ENLIL model as slices of a homogeneous spherical plasma cloud with uniform velocity, density, and temperature as a time-dependent inner boundary condition at 21.5 solar radii [\(R_{\odot}\)] with an unchanged background magnetic field. While the simplest geometrical case is employed in this work, the WSA–ENLIL+Cone model can also support an elliptical geometry including tilt, an elongated spheroid or ellipsoid, and leading- and trailing-edge velocities. Measurements derived from coronagraphs (described in Section 3.1) determine the cloud velocity, location, and width. The CME cloud density [dcld] is a free parameter that by default is four times larger than typical mean values in the ambient fast wind, providing a pressure four times higher than that in the ambient fast wind. The cloud temperature is taken to be equal to the ambient fast-wind temperature. Another ENLIL free CME parameter is the cavity ratio, which allows the CME to be represented by a spherical shell of plasma and is based on coronagraph observations of CME cavities. The cavity ratio [radcav] is defined as the ratio of the radial CME cavity width to the CME width, with the default being no cavity [radcav=0].

WSA–ENLIL+Cone runs performed for research and operations have shown that accurate descriptions of the heliosphere and transients are achieved only when the background solar wind is well-reproduced and if coronagraph observations from multiple views, for example from the SOlar and Heliospheric Observatory (SOHO) spacecraft at L1 ahead of Earth (Domingo, Fleck, and Poland 1995) and the Solar TErrestrial RElations Observatory (STEREO) spacecraft (Kaiser et al. 2008), are used to derive CME parameters (Lee et al. 2013; Millward et al. 2013). WSA coronal maps provide the magnetic field and solar-wind speed at the boundary between the coronal and heliospheric models, usually at 21.5 \(R_{\odot}\), and they are generated from synoptic magnetograms. Small latitudinal shifts in the magnetogram-derived coronal maps caused by inaccuracies in solar magnetic-field observations, particularly in the polar regions, can cause large longitudinal shifts in the solar-wind structure, for example in characterizing high-speed stream arrival times (e.g. MacNeice 2009; Jian et al. 2011, 2015). Other coronal models, such as MAS (MHD around a Sphere: Riley, Linker, and Mikić 2001) or heliospheric tomography from interplanetary scintillation (IPS: Jackson et al. 2011) can also provide the background solar wind and have been coupled with ENLIL heliospheric simulations.

CCMC/SWRC has been carrying out routine WSA–ENLIL+Cone simulations for several years using solar magnetic synoptic maps and CME geometric and kinematic properties inferred from coronagraph observations (Zheng et al. 2013). Each ENLIL run uses a WSA model synoptic map computed from the single GONG daily updated synoptic magnetogram (see, e.g., Arge and Pizzo 2000) closest to the time that the simulation is executed. These low-resolution real-time simulations complete in ≈ 20 minutes, running on two nodes with 16 processors per node on a spherical grid size of 256×30×90 (r,θ,ϕ) with a five- to ten-minute output cadence at locations of interest. The simulation range is 0.1 to 2 AU in radius [r], −60° to +60° in latitude [θ], and 0° to 360° in longitude [ϕ]. CME parameters are derived using real-time coronagraph observations from spacecraft and a geometric triangulation algorithm. The measurements are an approximation of the true 3D speed and width of the CME at 21.5 \(R_{\odot}\) (ENLIL inner boundary). However, the coronagraph-derived measurements are often inferred from just a few data points, and some CMEs may be missed as a result of real-time data gaps. CME parameters derived in real time and simulation graphical outputs are publicly available from the CCMC Space Weather Database Of Notifications, Knowledge, Information (DONKI) (kauai.ccmc.gsfc.nasa.gov/DONKI).

3 Ensemble CME Parameters

3.1 StereoCAT Triangulation Algorithm for Determining CME Parameters

CME parameters are determined using the Stereoscopic CME Analysis Tool (StereoCAT), developed by the CCMC for real-time CME analysis carried out by the CCMC/SWRC forecasting team. The goal was to develop a tool that can be used quickly, yet reliably in a real-time environment with any possible combination of spacecraft available for analysis. It was also required that the tool be intuitive and simple enough to be employed by a wide variety of users such as space-weather forecasters, scientists, students, and citizen scientists. The basic methodology of the tool, i.e. tracking of CME kinematic properties from two different fields of view, is similar to that of the NOAA Space Weather Prediction Center CME Analysis Tool (CAT) developed by Millward et al. (2013) and the geometric localization developed by Pizzo and Biesecker (2004). However, StereoCAT does not attempt to capture the volumetric structure of CMEs, but is based on tracking specific CME features. The algorithm is most similar to the CME geometric triangulation method of Liu et al. (2010). For a more detailed discussion of different CME analysis techniques in the context of cone-model-based CME simulations, see Pulkkinen, Oates, and Taktakishvili (2010) and Millward et al. (2013). Other stereoscopic methods for determining the kinematic properties of CMEs include those by Thernisien, Howard, and Vourlidas (2006), Lugaz et al. (2010), and Davies et al. (2013).

StereoCAT is based on triangulation of transient CME features from two different coronagraph fields of view. We call these planes of sky A and B, which may designate, for example, fields of view of the Sun Earth Connection Coronal and Heliospheric Investigation (SECCHI) COR2 instruments onboard the STEREO-A and STEREO-B spacecraft (Howard et al. 2008). The tool is used to manually identify the same CME features in two consecutive images that are then used to calculate the plane-of-sky velocities for A and B, \(\boldsymbol {v}'_{A}\) and \(\boldsymbol {v}''_{B}\), respectively. Note that these velocities are in local plane-of-sky coordinates indicated by ′ and ″. These data need to be brought into the same coordinate system (Heliocentric Earth Equatorial (HEEQ) coordinates in this case), which can be accomplished by rotations:

$$\begin{aligned} \boldsymbol {v}_A =& \boldsymbol {R}_A \cdot \boldsymbol {v}'_A \end{aligned}$$
(1)
$$\begin{aligned} \boldsymbol {v}_B =& \boldsymbol {R}_B \cdot \boldsymbol {v}''_B, \end{aligned}$$
(2)

where the operators \(\boldsymbol{R}_{A}\) and \(\boldsymbol{R}_{B}\) carry out transformations from the A and B plane-of-sky coordinates, respectively, into a common basis such as HEEQ.

We then define two projection matrices as

$$\begin{aligned} \boldsymbol {P}_A =& \boldsymbol {1} - \boldsymbol {e}_A \boldsymbol {e}_A^T \end{aligned}$$
(3)
$$\begin{aligned} \boldsymbol {P}_B =& \boldsymbol {1} - \boldsymbol {e}_B \boldsymbol {e}_B^T , \end{aligned}$$
(4)

where \(\boldsymbol{1}\) is the 3×3 identity matrix. The unit vectors normal to the planes of sky of coronagraphs A and B are \(\boldsymbol{e}_{A}\) and \(\boldsymbol{e}_{B}\), and \(\boldsymbol {e}_{A}^{T}\) is the transpose of \(\boldsymbol{e}_{A}\). The matrices \(\boldsymbol{P}_{A}\) and \(\boldsymbol{P}_{B}\) project any vector onto the planes of sky of A and B, respectively. Consequently, the plane-of-sky velocities can be expressed as

$$\begin{aligned} \boldsymbol {v}_A =& \boldsymbol {P}_A \cdot \boldsymbol {v} \end{aligned}$$
(5)
$$\begin{aligned} \boldsymbol {v}_B =& \boldsymbol {P}_B \cdot \boldsymbol {v}, \end{aligned}$$
(6)

where \(\boldsymbol{v}\) is the three-dimensional CME velocity vector, which points along the propagation direction of the CME. While the individual projection matrices are not invertible, we can combine Equations (5) and (6) to obtain

$$ (\boldsymbol {P}_A + \boldsymbol {P}_B) \cdot \boldsymbol {v} = \boldsymbol {v}_A + \boldsymbol {v}_B, $$
(7)

from which we can solve

$$ \boldsymbol {v} = (\boldsymbol {P}_A + \boldsymbol {P}_B)^{-1} \cdot ( \boldsymbol {v}_A + \boldsymbol {v}_B ). $$
(8)

Importantly, \((\boldsymbol{P}_{A}+\boldsymbol{P}_{B})^{-1}\) exists as long as the planes of sky A and B are different, i.e. when \(\boldsymbol{e}_{A}\) and \(\boldsymbol{e}_{B}\) are not co-linear (parallel or anti-parallel to each other). Therefore, large triangulation errors occur when the spacecraft separation angle is very small or close to 180°.
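As a minimal numerical sketch of Equations (3) to (8), the Python code below builds the two projection matrices, projects a known velocity onto two planes of sky, and recovers it by solving with \(\boldsymbol{P}_{A}+\boldsymbol{P}_{B}\). The function names, the viewing longitudes, and the test velocity are all hypothetical choices made for illustration; this is not the StereoCAT implementation.

```python
import math

def unit_from_longitude(lon_deg):
    """Hypothetical helper: unit plane-of-sky normal lying in the equatorial plane."""
    lon = math.radians(lon_deg)
    return [math.cos(lon), math.sin(lon), 0.0]

def projection_matrix(e):
    """P = 1 - e e^T for a unit vector e normal to a plane of sky (Eqs. 3-4)."""
    return [[(1.0 if i == j else 0.0) - e[i] * e[j] for j in range(3)]
            for i in range(3)]

def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def det3(a):
    """Determinant of a 3x3 matrix."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def solve3(m, b):
    """Solve m x = b with Cramer's rule; fails when m is singular."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("P_A + P_B is singular: plane-of-sky normals are co-linear")
    solution = []
    for k in range(3):
        # Replace column k of m by b and take the determinant ratio.
        mk = [[b[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(mk) / d)
    return solution

def triangulate(v_a, v_b, e_a, e_b):
    """Equation (8): v = (P_A + P_B)^-1 (v_A + v_B)."""
    p_a = projection_matrix(e_a)
    p_b = projection_matrix(e_b)
    p_sum = [[p_a[i][j] + p_b[i][j] for j in range(3)] for i in range(3)]
    rhs = [v_a[i] + v_b[i] for i in range(3)]
    return solve3(p_sum, rhs)

# Hypothetical geometry and velocity (km/s), used only to exercise the algebra.
e_a = unit_from_longitude(60.0)
e_b = unit_from_longitude(-60.0)
v_true = [1000.0, 200.0, -300.0]
v_a = mat_vec(projection_matrix(e_a), v_true)   # Equation (5)
v_b = mat_vec(projection_matrix(e_b), v_true)   # Equation (6)
v_recovered = triangulate(v_a, v_b, e_a, e_b)
print(v_recovered)
```

For unit normals separated by an angle \(\gamma\), the determinant of \(\boldsymbol{P}_{A}+\boldsymbol{P}_{B}\) is \(2\sin^{2}\gamma\), so the solve becomes ill-conditioned as \(\gamma\) approaches 0° or 180°, in line with the remark above about small and near-180° separation angles.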

A similar approach can be used to track the three-dimensional location [\(\boldsymbol{r}\)] of a feature from plane-of-sky measurements [\(\boldsymbol{r}_{A}\) and \(\boldsymbol{r}_{B}\)] as

$$ \boldsymbol {r} = (\boldsymbol {P}_A + \boldsymbol {P}_B)^{-1} \cdot ( \boldsymbol {r}_A + \boldsymbol {r}_B ). $$
(9)

Often the time stamps of coronagraph imagery from spacecraft A and B do not match exactly. This is handled in StereoCAT by propagating the tracked feature in A with speed \(\boldsymbol{v}_{A}\) to a new \(\boldsymbol{r}_{A}\) that matches the B time stamp. Consequently, matching time stamps are used for \(\boldsymbol{r}_{A}\) and \(\boldsymbol{r}_{B}\) in Equation (9).

The angular size of a CME is estimated in StereoCAT simply by manually selecting the two outer edges of the CME. The two lines that connect these edges to the center of the Sun are then used to compute the opening angle of the CME. This process does not take into account the projection of the outer CME edges onto the spacecraft plane of sky, and is therefore a measurement of the projected CME width. While this is not an issue if the CME propagation direction is not too far away from the plane of sky of the spacecraft that is used to measure the opening angle, one needs to be very careful with events with propagation directions substantially away from the plane of sky, as in such cases the opening angle can be overestimated. This issue will be addressed in future versions of StereoCAT.

Other limitations of StereoCAT arise from the user’s ability to reliably identify the same structures in images from both spacecraft due to ambiguities from the different viewing angles. It may at times be difficult or impossible to track the same structure since different sections of the CME contribute most strongly to images in different planes of sky (Howard and DeForest 2012). Consequently, StereoCAT is unsuitable to be used with coronagraph data in which the CME appears as a halo, since the CME leading edge is not visible.

3.2 Performing CME Measurements with StereoCAT

StereoCAT has three modes: two-timepoint, ensemble, and frame series, and is available online via a web interface ( ccmc.gsfc.nasa.gov/analysis/stereo : LaSota 2013). Available coronagraphs include the Large Angle and Spectrometric Coronagraph Experiment (LASCO)-C2 and -C3 instruments onboard the SOHO spacecraft (Brueckner et al. 1995), and the SECCHI-COR2 instruments on the STEREO-A and -B spacecraft. All three modes are based on the same triangulation algorithm, described in Section 3.1. In the basic two-timepoint mode, the user manually measures the CME leading-edge height for two different times in each coronagraph image for two different coronagraph viewpoints. The plane-of-sky speed for each viewpoint is calculated, from which the triangulated speed and direction are computed using the algorithm described in Section 3.1. The user also manually measures the CME opening angle in each coronagraph view. Because this is a projected-width measurement, both widths and their average are displayed for the user.

In ensemble mode, the user manually repeats the same procedure as for the two-timepoint mode, by measuring the same feature for the same pair of coronagraphs at two different times. Between each two-timepoint measurement, the display is fully reset such that the user is forced to carefully remeasure the CME leading-edge height and opening angle. This series of repeated measurements leads to a range of CME parameters that can be used to initialize an ensemble simulation. For every \(m\) two-timepoint measurements made, \(n=m^{2}\) ensemble CME parameter members are automatically generated by combining different spacecraft-measurement pairs. For example, for \(m=2\) two-timepoint measurements, there are \(n=2^{2}=4\) ways to combine the first and second time-step height measurements in viewpoints A and B to triangulate the CME. Since the two projected-width measurements made for each measurement \(m\) are not triangulated, they are randomly assigned to each ensemble member. An example screenshot of \(m=6\) two-timepoint measurements performed in ensemble mode using StereoCAT is shown in Figure 1. Two image pairs are shown from the SECCHI-COR2 instruments from STEREO-B (top row) and STEREO-A (bottom row) for two different times separated by 30 minutes in the left and right columns. The white circles indicate the six individual two-timepoint plane-of-sky leading-edge height measurements (near the center of the CME front) and the width measurements are marked by the green circles (near the CME edges). The green lines in panel c of Figure 1 illustrate the CME opening-angle measurements for one of the coronagraph images. In this example the six individual two-timepoint measurements were combined by the algorithm to create \(6^{2}=36\) ensemble members.
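One plausible reading of the \(n=m^{2}\) pairing can be sketched in Python as a Cartesian product of the viewpoint-A and viewpoint-B measurements, with a projected width drawn at random for each member. The function name, the string labels, and the fixed random seed are hypothetical; the actual StereoCAT bookkeeping may differ.

```python
import itertools
import random

def build_ensemble(meas_a, meas_b, widths, seed=0):
    """Pair every viewpoint-A measurement with every viewpoint-B measurement.

    meas_a, meas_b: lists of m two-timepoint plane-of-sky measurements.
    widths: projected-width measurements, assigned at random to each member.
    Returns a list of m * m tuples (a, b, width).
    """
    rng = random.Random(seed)  # seed fixed only so the sketch is repeatable
    return [(a, b, rng.choice(widths))
            for a, b in itertools.product(meas_a, meas_b)]

# Placeholder labels stand in for real measurements.
meas_a = ["A%d" % i for i in range(1, 7)]
meas_b = ["B%d" % i for i in range(1, 7)]
widths = ["wA%d" % i for i in range(1, 7)] + ["wB%d" % i for i in range(1, 7)]
members = build_ensemble(meas_a, meas_b, widths)
print(len(members))
```

Each pair (a, b) would then be passed to the triangulation of Section 3.1 to obtain one set of CME input parameters.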

Figure 1

Example screenshot of \(m=6\) two-timepoint measurements performed for the 18 April 2014 CME using StereoCAT in ensemble mode. Two image pairs are shown from the SECCHI-COR2 instruments from STEREO-B (a – b; top row) and STEREO-A (c – d; bottom row), for two different time steps, 18 April 2014 13:54 UT (a, c; left column) and 18 April 2014 14:24 UT (b, d; right column). The white circles indicate the six individual two-timepoint plane-of-sky leading-edge height measurements (near the center of the CME front), and the width measurements are marked by the green circles (near the CME edges). The green lines in panel c illustrate the CME opening angle measurements for one of the coronagraph images. The plane-of-sky leading-edge measurements (central white circles) are later combined together using the triangulation algorithm discussed in Sections 3.1 – 3.2 to generate \(6^{2}=36\) ensemble members. The distribution of the resulting CME parameters that are used as initial conditions for 36 WSA–ENLIL+Cone simulations is shown in Figure 3.

After completing the measurements, the user may inspect histograms of their CME parameters. The web interface allows the user to remove any ensemble members and add any custom members. Generally, members are removed when they have nearly identical parameters, or when triangulation appears unreliable. Custom members can be measurements from different image time pairs, from plane-of-sky estimates that incorporate the source location, or from any other CME measurement technique. The same procedure can be applied to create n individual ensemble measurements for x CMEs for a series of events, which are then combined one-to-one to be simulated together such that there are n ensemble members containing x CMEs each.

In frame-series mode, the user can measure a series of different frames (times) for each spacecraft, which are then triangulated to create a CME height–time profile. The user selects a range of time and steps through the images available from each instrument, measuring the CME in as many images as they choose. The software chooses time pairs of measurements for triangulation based on a user-specified maximum allowed time difference. From these measurements, plane-of-sky and triangulated height–time, velocity, latitude, and longitude profiles of the CME are generated. Triangulations made with different spacecraft pairs are shown as separate height–time profiles. Several methods are used to calculate the CME speed, acceleration, the time the CME passes 21.5 \(R_{\odot}\) (ENLIL inner boundary), and the time it erupts from the Sun. These include least-squares linear and quadratic fits, averages over selected data points, and averages from only the first and last data points. Results for each method are reported separately, allowing the user to choose the most appropriate fitting technique depending on the acceleration profile of the CME. Plane-of-sky values are also reported, which can be used when coronagraph projection effects make this triangulation method unreliable. This can occur if the CME is very wide, appears as a halo, or is heavily projected in the coronagraph data. In these cases the user will not be able to identify the same CME leading-edge feature in the data from two coronagraphs. The user can inspect the triangulated height values directly on the height–time plot to evaluate triangulation accuracy in these cases.
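The least-squares linear fit mentioned above can be illustrated with a short Python sketch that fits a straight line to a height–time profile and extrapolates to the 21.5 \(R_{\odot}\) inner boundary. The function names and the sample heights are hypothetical, the nominal solar radius of 695 700 km is an assumed conversion constant, and the quadratic and averaging methods are not shown.

```python
R_SUN_KM = 695_700.0  # assumed nominal solar radius in km

def linear_fit(times_s, heights_rsun):
    """Ordinary least-squares fit h(t) = intercept + slope * t."""
    n = len(times_s)
    if n < 2:
        raise ValueError("need at least two height-time points")
    t_mean = sum(times_s) / n
    h_mean = sum(heights_rsun) / n
    sxx = sum((t - t_mean) ** 2 for t in times_s)
    sxy = sum((t - t_mean) * (h - h_mean)
              for t, h in zip(times_s, heights_rsun))
    slope = sxy / sxx                      # solar radii per second
    intercept = h_mean - slope * t_mean    # height at t = 0 in solar radii
    return slope, intercept

def speed_and_boundary_time(times_s, heights_rsun, boundary_rsun=21.5):
    """Return (speed in km/s, time in s at which the fit reaches the boundary)."""
    slope, intercept = linear_fit(times_s, heights_rsun)
    speed_km_s = slope * R_SUN_KM
    t_boundary_s = (boundary_rsun - intercept) / slope
    return speed_km_s, t_boundary_s

# Hypothetical triangulated heights (solar radii) at 30-minute cadence.
times_s = [0.0, 1800.0, 3600.0, 5400.0]
heights_rsun = [5.0, 8.0, 11.0, 14.0]
speed, t_boundary = speed_and_boundary_time(times_s, heights_rsun)
print(speed, t_boundary)
```

A constant-speed fit of this kind is only appropriate when the CME shows little acceleration within the coronagraph field of view, which is why the tool reports several methods side by side.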

4 Ensemble Modeling with WSA–ENLIL+Cone

The current implementation of this ensemble-modeling method evaluates the sensitivity of WSA–ENLIL+Cone model simulations of CME propagation to initial CME parameters. As described in Section 3.1, StereoCAT is used to create an ensemble of n CME parameters that are used as input to n WSA–ENLIL+Cone simulations. We have observed that n≈36 to 48 provides an adequate spread of input parameters, but this number can be increased if necessary. For n=48 a typical run takes 130 minutes to complete on 24 nodes with four processors per node on the initial development system. We estimate that the same run will take ≈ 80 minutes on the CCMC production system, which has 16 processors per node.

The simulations provide n profiles of MHD quantities (density, velocity, temperature, and magnetic-field components) and a distribution of n predicted arrival times at locations of interest within the computational domain. Currently, ensemble modeling is performed for spacecraft at the following locations: Mercury (MErcury Surface, Space ENvironment, GEochemistry, and Ranging (MESSENGER)), Venus (Venus Express (VEX)), Earth (Advanced Composition Explorer (ACE), Wind, SOHO, and orbiting spacecraft), Mars (Mars Science Laboratory (MSL), Mars Atmosphere and Volatile Evolution (MAVEN), Mars Express (MEX)), Spitzer Space Telescope, STEREO-A and -B. The CME-associated disturbance/shock arrival time is then automatically computed in post-processing from any sharp increases in the modeled solar-wind dynamic pressure at a given location. In this work, we focus on the ensemble results of the Earth-directed events.
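The text does not specify how a "sharp increase" in dynamic pressure is detected, so the following Python sketch is only one possible criterion: it flags the first output step at which the dynamic pressure rises by at least a chosen factor relative to the previous step. The threshold `min_ratio`, the function names, and the sample time series are hypothetical and are not the CCMC post-processing algorithm.

```python
def dynamic_pressure_npa(density_cm3, speed_km_s):
    """Proton dynamic pressure m_p * n * v^2 in nPa (n in cm^-3, v in km/s)."""
    return 1.6726e-6 * density_cm3 * speed_km_s ** 2

def first_sharp_increase(times, pressures, min_ratio=2.0):
    """Return the time of the first step-to-step pressure jump >= min_ratio.

    min_ratio is an assumed threshold chosen for illustration only.
    Returns None when no such jump is present.
    """
    for i in range(1, len(pressures)):
        previous = pressures[i - 1]
        if previous > 0 and pressures[i] / previous >= min_ratio:
            return times[i]
    return None

# Hypothetical modeled series: (hours, density in cm^-3, speed in km/s).
series = [(0, 5.0, 400.0), (1, 5.2, 405.0), (2, 5.1, 400.0),
          (3, 12.0, 550.0), (4, 11.0, 540.0)]
times = [t for t, _, _ in series]
pressures = [dynamic_pressure_npa(n, v) for _, n, v in series]
print(first_sharp_increase(times, pressures))
```

Applying such a detector to each of the n simulations yields the distribution of predicted arrival times discussed above.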

For Earth-directed CMEs, the CCMC/SWRC also computes n estimates of the geomagnetic \(K_{\mathrm{P}}\) index using the WSA–ENLIL+Cone model plasma parameters at Earth. The geomagnetic three-hour planetary K index [\(K_{\mathrm{P}}\)] is a measure of general planetary-wide geomagnetic disturbances at mid-latitudes based on ground-based magnetic observations (Bartels, Heck, and Johnston 1939; Rostoker 1972; Menvielle and Berthelier 1991). The \(K_{\mathrm{P}}\) index is created from standardized K indices from individual stations, which measure the magnitude of horizontal geomagnetic field disturbances (not including daily variations). \(K_{\mathrm{P}}\) is a quasi-logarithmic index ranging from 0 to 9. Real-time estimated planetary \(K_{\mathrm{P}}\) indices are available from NOAA using real-time data from a limited number of geomagnetic observatories, and the final definitive \(K_{\mathrm{P}}\) is from the Helmholtz Center Potsdam GFZ German Research Centre for Geosciences.

The \(K_{\mathrm{P}}\) estimate is made using the Newell et al. (2007) coupling function, which arose from their correlation of 20 candidate coupling functions with geomagnetic indices. The function, which represents the rate \(\frac {{\mathrm{d}} \Phi_{\mathrm{MP}}}{{\mathrm{d}} t}\) at which magnetic flux is opened at the magnetopause and which correlated best with nine out of ten indices, is given as

$$ \frac {{\mathrm{d}} \Phi_\mathrm{MP}}{{\mathrm{d}} t} = {v_\mathrm{bulk}}^{4/3} B_\mathrm{T}^{2/3}{\sin}^{8/3}\biggl(\frac{\theta_\mathrm{C}}{2} \biggr), $$
(10)

where \(v_{\mathrm{bulk}}\) is the bulk solar-wind speed, the interplanetary magnetic field (IMF) clock angle \(\theta_{\mathrm{C}}\) is given by \(\tan^{-1}(B_{y}/B_{z})\), and the perpendicular component of the magnetic field is given by \(B_{\mathrm{T}}=(B_{y}^{2}+B_{z}^{2})^{1/2}\) (in GSM coordinates). An exponential fit to the correlation of this coupling function with the \(K_{\mathrm{P}}\) index yields the following relation used for the estimate:

$$ K_\mathrm{P} = 9.5 - \mathrm{e}^{2.17676-5.2001 (\frac {{\mathrm{d}} \Phi_\mathrm{MP}}{{\mathrm{d}} t})}. $$
(11)

Emmons et al. (2013) showed for their sample of 15 events that \(K_{\mathrm{P}}\) predictions using Equation (11) computed directly from in-situ solar-wind observations had a mean absolute error of 0.5. Because ENLIL-modeled CMEs do not contain an internal magnetic field and the magnetic-field amplification is caused mostly by plasma compression, only the magnetic-field magnitude is used and three magnetic-field clock-angle scenarios of 90° (westward), 135° (southwestward), and 180° (southward) are assumed. This provides a simple estimate of three possible maximum values that the \(K_{\mathrm{P}}\) index might reach following arrival of the predicted CME shock/sheath. For the forecast, the \(K_{\mathrm{P}}\) estimates are rounded to the nearest whole number.
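A minimal Python sketch of Equations (10) and (11) for the three clock-angle scenarios follows. This section does not state the units or normalization in which Equation (11) expects the coupling function, so the `scale` argument is a placeholder that the reader must supply, not a value from the source. The function names and the rounding of ties upward are likewise assumptions.

```python
import math

CLOCK_ANGLES_DEG = (90.0, 135.0, 180.0)  # scenarios named in the text

def newell_coupling(v_bulk, b_t, clock_angle_deg):
    """Equation (10): v^(4/3) * B_T^(2/3) * sin^(8/3)(theta_C / 2)."""
    half_angle = math.radians(clock_angle_deg) / 2.0
    return (v_bulk ** (4.0 / 3.0) * b_t ** (2.0 / 3.0)
            * abs(math.sin(half_angle)) ** (8.0 / 3.0))

def kp_from_coupling(coupling):
    """Equation (11); coupling must be in the units assumed by that fit."""
    return 9.5 - math.exp(2.17676 - 5.2001 * coupling)

def kp_scenarios(v_bulk, b_mag, scale):
    """Kp for each assumed clock angle, using the field magnitude as B_T.

    scale: assumed normalization converting Equation (10) into the units of
    Equation (11); it is not given in this section and must be supplied.
    Returns {angle: (unrounded Kp, Kp rounded to nearest whole number)}.
    """
    results = {}
    for angle in CLOCK_ANGLES_DEG:
        coupling = scale * newell_coupling(v_bulk, b_mag, angle)
        kp = kp_from_coupling(coupling)
        results[angle] = (kp, math.floor(kp + 0.5))  # ties rounded up (assumed)
    return results
```

Because \(\sin^{8/3}(\theta_{\mathrm{C}}/2)\) grows from \(2^{-4/3}\approx 0.40\) at 90° to 1 at 180°, the southward scenario always gives the largest of the three estimates for a given speed and field magnitude.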

Another commonly used activity index is the Dst (disturbance storm time) index, which is a measure of magnetospheric storm activity, primarily reflecting the strength of the ring current. The index is obtained from the measurement of the perturbations in the horizontal component of the Earth’s magnetic field from ground-based observatories that are sufficiently distant from the auroral and equatorial electrojets, are located at approximately ±20° geomagnetic latitude, and are evenly distributed in longitude (Sugiura 1964). Although the ring current makes the largest contribution to the Dst, all magnetospheric current systems contribute, such as the Chapman–Ferraro magnetopause current, which is strengthened during a sudden storm commencement (SSC), increasing the Earth’s surface field and producing a sudden positive jump in Dst. Currently, ENLIL model results are not used to predict the Dst, but in principle this can be computed in a similar manner to the \(K_{\mathrm{P}}\) index by using the Newell et al. (2007) Dst relation.

5 Example Ensemble: 18 April 2014 CME

In this section we describe the real-time ensemble modeling of an Earth-directed partial halo CME that was first observed at 13:09 UT on 18 April 2014 by SECCHI/COR2-A. Figure 2 shows this CME as viewed from SOHO/LASCO-C2 and -C3, STEREO/SECCHI-COR2-A, and -B near 14:50 UT. This CME was associated with an M7.3 class solar flare from Active Region (AR) 12036 located at S18W29 with peak at 13:03 UT. The eruption and a coronal wave were visible south of the active region in SDO/AIA 193 Å and a nearby filament eruption was visible in AIA 304 Å. Subsequently, starting at 13:35 UT, an increase in solar energetic particle proton flux above 0.1 pfu MeV\(^{-1}\) [1 pfu = 1 particle cm\(^{-2}\) sr\(^{-1}\) s\(^{-1}\)] was observed by the GOES-13 Energetic Proton, Electron, and Alpha Detector (EPEAD) (15 – 40 MeV energy range) in Earth orbit.

Figure 2

Coronagraph observations of the 18 April 2014 CME with an onset time at 13:09 UT as viewed from (a) STEREO/SECCHI-COR2-B, (b) SOHO/LASCO-C2 and -C3, and (c) STEREO/SECCHI-COR2-A all near 14:50 UT. The fields of view of LASCO-C2, -C3, and SECCHI-COR2 are 2.2 – 6 \(R_{\odot}\), 2.8 – 32 \(R_{\odot}\) (shown here cropped to 17 \(R_{\odot}\)), and 2.5 – 15 \(R_{\odot}\) respectively. Images from Helioviewer (helioviewer.org) (Müller et al. 2009).

Figure 1 shows StereoCAT measurements for the 18 April 2014 CME. As discussed above, the central white circles indicate the individual leading-edge measurements, and the green outer circles near the CME edges are the projected width measurements. The six leading-edge measurements are combined together using the triangulation algorithm discussed in Sections 3.1 – 3.2 to generate \(6^{2}=36\) ensemble members. The distribution of the resulting CME parameters that are used as initial conditions for n=36 WSA–ENLIL+Cone simulations is shown in Figure 3 in (a) the equatorial plane (latitude = 0°) and (b) meridional plane (longitude = 0°). The figures show the CME velocity vectors in spherical HEEQ coordinates with the grids showing the degrees longitude (a) and latitude (b), and the radial coordinate showing the speed in km s\(^{-1}\). The Sun–Earth line is along 0° longitude and latitude. The arrow directions on the grid indicate the CME central longitude and latitude respectively, with CME half-width indicated by the color of the vector. The arrow lengths correspond to the CME speed. CME propagation directions are clustered between −30° and −40° latitude, and around 10° west of the Sun–Earth line in longitude, while CME speeds range from ≈ 1300 to 1600 km s\(^{-1}\). Median CME parameters are: speed of 1394 km s\(^{-1}\), direction of 9° longitude, −35° latitude, and a half-width of 46°.

Figure 3

Distribution of the 18 April 2014 CME input parameters shown in (a) the equatorial plane (latitude = 0°) and (b) meridional plane (longitude = 0°). The plots show the CME speed vectors in spherical HEEQ coordinates with the grids showing the degrees longitude (a) and latitude (b), and the radial coordinate showing the speed in km s\(^{-1}\). The Sun–Earth line is along 0° longitude and latitude. The arrow directions on the grid indicate the CME central longitude and latitude, respectively, with the CME half-width indicated by the color of the vector. The arrow lengths correspond to the CME speed. CME propagation directions are clustered between −30° and −40° latitude, and around 10° west of the Sun–Earth line in longitude, while CME speeds range from ≈ 1300 to 1600 km s\(^{-1}\). Median CME parameters are: speed of 1394 km s\(^{-1}\), direction of 9° longitude, −35° latitude, and a half-width of 46°.

Model results for the 36-member ensemble WSA–ENLIL+Cone run for this CME are shown in Figures 4 – 5. For the ensemble member with median CME input parameters, Figure 4 shows a scaled velocity contour plot for the (a) constant Earth latitude plane, (b) meridional plane of Earth, and (c) 1 AU sphere in cylindrical projection on 20 April at 06:00 UT. Panel d shows the measured (red) and simulated (blue) radial velocity profiles at Earth, with the simulated CME duration shown in yellow. This simulation figure shows the northeastern portion of the CME impacting Earth. Figure 5 shows the modeled magnetic field, velocity, density, and temperature profiles at Earth plotted as color traces for all 36 ensemble members, along with the observed in-situ L1 observations from ACE, plotted in black. The model traces are color coded by CME input speed such that slow to faster input speeds are colored from light green to dark blue. The arrival of the CME-associated shock was observed by Wind and ACE on 20 April 2014 at around 10:20 UT, and energetic storm particles were observed by ACE. The provisional SYM-H index (≈ one minute Dst) shows a sudden storm commencement of +25 nT at 11:01 UT. The observations in Figure 5 show clear signatures of the arrival of an interplanetary coronal mass ejection (ICME), including a leading shock (abrupt increase in all the solar-wind parameters at around 10:20 UT) with enhanced post-shock temperatures, enhanced magnetic field with rotations in direction, and declining solar-wind speed. This CME was predicted to arrive at Earth and also at Mars for all of the 36 runs. The mean predicted arrival at Earth was on 20 April 2014 at 05:07 UT with arrival times from individual runs ranging from 20 April 2014 at 01:08 to 11:16 UT. A histogram showing the distribution of arrival times at Earth is shown in Figure 6 with individual arrivals marked by the blue arrows. 
This figure shows a normal distribution with 50 % of the predicted arrivals within one hour of the mean. The prediction error for the mean predicted CME arrival time was −5.2 hours, and the observed arrival time was just within the ensemble predicted spread. The spread in ensemble member predictions can also be seen in Figure 5 compared to the observations, showing that most of the predictions are earlier than the observed arrival with a few after. From the CME input parameters plotted in Figure 3, the ensemble members with arrival times closest to the observed time had CME input speeds in the range of 1200 – 1400 km s−1, latitudes near −40°, and half-widths around 35° – 40°. This suggests that the early arrival-time predictions for this event could be due to overestimations of the CME input speed and half-width.
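The ensemble statistics quoted above (mean arrival, range, prediction error, and whether the observation falls inside the spread) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the paper: the function name `summarize_arrivals` and the three member times are hypothetical, and only the observed shock time (20 April 2014, 10:20 UT) is taken from the text.

```python
from datetime import datetime, timedelta


def summarize_arrivals(predicted, observed):
    """Summarize ensemble arrival-time predictions against one observed arrival.

    predicted: list of datetime objects, one per ensemble member predicting a hit.
    observed:  datetime of the observed shock/disturbance arrival.
    """
    earliest, latest = min(predicted), max(predicted)
    # Average the offsets from the earliest member to obtain the mean arrival time.
    offsets = [(t - earliest).total_seconds() for t in predicted]
    mean_time = earliest + timedelta(seconds=sum(offsets) / len(offsets))
    # Prediction error: predicted minus observed, in hours (negative means early).
    error_hours = (mean_time - observed).total_seconds() / 3600.0
    return {
        "mean": mean_time,
        "earliest": earliest,
        "latest": latest,
        "error_hours": error_hours,
        "observed_within_spread": earliest <= observed <= latest,
    }


# Illustrative three-member ensemble (made-up member times, not the paper's data).
members = [
    datetime(2014, 4, 20, 1, 0),
    datetime(2014, 4, 20, 5, 0),
    datetime(2014, 4, 20, 11, 0),
]
observed = datetime(2014, 4, 20, 10, 20)  # shock arrival time quoted in the text
summary = summarize_arrivals(members, observed)
```

Passing the quoted mean arrival of 05:07 UT as a single "member" gives an error of about −5.2 hours against the 10:20 UT observation, consistent with the value stated in the text.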

Figure 4

Global view of the 18 April 2014 CME on 20 April at 06:00 UT: WSA–ENLIL+Cone scaled velocity contour plot for the (a) constant Earth latitude plane, (b) meridional plane of Earth, and (c) 1 AU sphere in cylindrical projection, for the ensemble member with median CME input parameters (speed of 1394 km s−1, direction of 9° longitude, −35° latitude, and a half-width of 46°). Panel d shows the measured (red) and simulated (blue) radial velocity profiles at Earth, with the simulated CME duration shown in yellow.

Figure 5

18 April 2014 CME ensemble: Model-calculated density, velocity, magnetic field, and temperature profiles at Earth for all 36 ensemble members plotted as color traces along with the in-situ L1 observations from ACE plotted in black (red for B z ). The model traces are color coded by CME input speed such that slow to faster input speeds are colored from light green to dark blue. The observations show clear signatures of the arrival of an ICME, including a leading shock (abrupt increase in all the solar-wind parameters at around 10:20 UT) with enhanced post-shock temperatures, enhanced magnetic field with rotations in direction, and declining solar-wind speed. The spread in the color traces shows that most of the predictions are earlier than the observed arrival, with a mean predicted arrival at Earth of 20 April 2014 at 05:07 UT and a range from 20 April 2014 at 01:08 UT to 11:16 UT.

Figure 6

18 April 2014 CME: Histogram distribution of arrival-time predictions at Earth (bin size of one hour) with individual arrivals marked by the blue arrows. This figure shows a normal distribution where 50 % of the predicted arrivals are within one hour of the mean. The prediction error for the mean predicted CME arrival time is −5.2 hours, and the observed arrival time was within the ensemble predicted spread.

The NOAA real-time observed K P index (and the Potsdam final K P) reached 5 during the synoptic period 12:00 – 15:00 UT on 20 April associated with the CME shock arrival. The Dst reached a minimum of −24 nT at 15:00 UT on 21 April, and thus, based on Dst, this CME only resulted in very weak geomagnetic activity. As discussed in Section 4, Equation (11) can be used to forecast the maximum K P index from maximum ENLIL predicted quantities at CME shock/sheath arrival at Earth (colored traces shown in Figure 5). Figure 7 shows the predicted probability distribution of K P for three clock-angle scenarios: θ C = 90° (green), 135° (purple), and 180° (orange). The figure also shows, in black, the overall K P forecast probability distribution calculated for all three angles combined (90° – 180°), assuming each scenario is equally likely. The standard deviation of the overall K P forecast probability distribution is 1.1, with 84 % of the forecasts falling between K P=5 to 7. The most likely forecast is for K P=7 at 41 %, followed by K P=5 at 27 % and K P=6 at 16 % likelihood of occurrence. Using the most likely forecast of K P=7, the K P prediction error for this event is ΔK P err = K P predicted − K P observed = 2 (overprediction). The overprediction of K P may be related to the overestimation of the CME input speed. In Sections 6.1 – 6.2 and 7 we discuss various factors that can contribute to early arrival-time predictions and K P overpredictions.
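The step of combining the three clock-angle scenarios into one overall K P distribution amounts to a weighted mixture. The sketch below assumes each scenario's forecast is available as a mapping from K P bin to probability; the function name `combine_scenarios`, the optional `weights` argument, and the per-scenario numbers are hypothetical and are not values from Figure 7.

```python
def combine_scenarios(scenario_probs, weights=None):
    """Mix per-scenario KP probability distributions into one overall distribution.

    scenario_probs: dict mapping clock angle (degrees) -> {KP bin: probability}.
    weights:        optional dict mapping clock angle -> relative weight;
                    defaults to equal weights for all scenarios.
    """
    angles = list(scenario_probs)
    if weights is None:
        weights = {angle: 1.0 for angle in angles}
    total = sum(weights[angle] for angle in angles)
    bins = sorted({b for probs in scenario_probs.values() for b in probs})
    return {
        b: sum(weights[a] * scenario_probs[a].get(b, 0.0) for a in angles) / total
        for b in bins
    }


# Made-up per-scenario distributions, for illustration only.
scenarios = {
    90: {4: 0.5, 5: 0.5},
    135: {5: 0.5, 6: 0.5},
    180: {6: 0.5, 7: 0.5},
}
overall = combine_scenarios(scenarios)
```

With equal weights, each scenario contributes one third of the overall distribution. Passing unequal `weights` would express a different assumption about how likely each clock angle is.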

Figure 7

Distribution of K P probability forecast using ENLIL-predicted solar-wind quantities at Earth for three clock-angle scenarios: θ C = 90° (green), 135° (purple), 180° (orange), and all three angles combined, 90° – 180° (black) (assuming equal likelihood). The standard deviation of the overall K P forecast probability distribution is 1.1, with 84 % of the forecasts falling between K P=5 to 7. The most likely forecast is for K P=7 at 41 %, followed by K P=5 at 27 % and K P=6 at 16 % likelihood of occurrence. The NOAA real-time observed K P index (and the Potsdam final K P) reached 5 during the synoptic period 12:00 – 15:00 UT on 20 April associated with the CME disturbance arrival.

6 Real-Time Ensemble Modeling: First Results

For 35 Earth-directed CME events from January 2013 through June 2014, real-time ensemble modeling was carried out by the CCMC/SWRC team following the methods described in Sections 4 – 5. In Table 1 we list a summary of the ensemble simulation results for these 35 CME events. The first and second columns give the CME onset date and time based on the first appearance in C2 or COR2. Generally, if two CMEs occur within a day of each other, they will both be included in the same simulation as separate CMEs that may or may not merge during their propagation. A few of the ensemble simulations listed in the table contain two CMEs as part of a single run. In these cases, CMEs that were simulated together with the CME listed in the previous row are indicated by the marker “a”. The third column lists (for 2013) the second-order plane-of-the-sky (POS) speed at 20 R⊙ reported in the SOHO/LASCO CDAW CME catalog ( cdaw.gsfc.nasa.gov/CME_list ) (Yashiro et al. 2004; Gopalswamy et al. 2009). If measurements were not made to 20 R⊙, the second-order POS speed at the time of last observation is used. The next four columns provide the median ensemble CME input parameters of v, latitude, longitude (HEEQ), and half-width [w/2] measured using StereoCAT. In columns 8, 9, and 10, we list the mean predicted arrival time of all n tot ensemble members, followed by the spread in arrival times in hours relative to the mean. The next column (11) shows n predicted hits, the number of ensemble members that predict that the CME will arrive at Earth, out of n tot, the total number of ensemble members. The ratio p = n predicted hits / n tot gives a forecast probability and conveys the forecast uncertainty about the likelihood that the CME will arrive. Columns 12, 13, and 14 list the actual arrival time of the CME-associated shock or disturbance observed in situ at the Wind spacecraft, followed by the total in-situ observed CME transit time relative to the CME start time.
In the last column the prediction error [Δt err] is calculated for predictions indicating hits. The prediction error is defined as Δt err = t predicted − t observed, which is negative when ENLIL predictions are earlier than the observed CME arrival time and positive when they are later. When possible, ICME and magnetic-cloud catalogs were used to help assess whether the CME did arrive at Earth. These included the Richardson and Cane (2010) ICME catalog ( www.srl.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm ), and the Wind ICME catalog ( wind.nasa.gov/index_WI_ICME_list.htm ) with circular flux-rope model fitting (based on Hidalgo et al. 2000). Shocks identified by the SOHO/CELIAS/MTOF/PM “shockspotter” program were also used in arrival-time assessment. Determining the measured in-situ arrival time of the CME-associated shock or disturbance can be subjective and can therefore be a source of error in the prediction error calculation. Taking this into consideration, in-situ signatures that could not be unambiguously identified as the arrival of the CME-related disturbance are indicated by the marker “c”, and these five ensembles are not included in the following forecast verification. This reduces the sample size from 35 to 30 ensembles.

Table 1 Summary of the ensemble simulation results for 35 CME events (January 2013 – June 2014). Columns 1 – 2: CME onset date and time. Column 3: SOHO/LASCO CME Catalog plane-of-sky (POS) speed intentionally left blank. Columns 4 – 7: median ensemble CME input parameters of v, latitude, longitude (HEEQ), and half-width [w/2]. Columns 8 – 10: mean predicted arrival time of all n tot ensemble members, and the spread in arrival times in hours relative to the mean. Column 11: n predicted hits, the number of ensemble members predicting that the CME will arrive at Earth out of n tot, the total number of ensemble members. Columns 12 – 14: actual arrival time observed in situ, and the observed CME transit time relative to the CME start time. Column 15: prediction error Δt err = t predicted − t observed for hits; otherwise CR or FA, which stand for correct rejection and false alarm.

In the following subsections we discuss ensemble CME arrival and K P forecast verification inspired by methods used in ensemble weather forecasting and applied here for the first time.

6.1 CME Arrival Forecast Verification

To begin with a simple forecast evaluation of CME arrival time, the ensemble mean can be taken as a single forecast. Using the prediction error Δt err = t predicted − t observed (last column of Table 1), the mean absolute error (MAE) is 12.3 hours, the root-mean-square error (RMSE) is 13.9 hours, and the mean error (ME) is −5.8 hours (early) for all 17 ensembles containing hits. Considering the sample size in this study, these errors are comparable to CME arrival-time prediction errors (a RMSE of ≈ ten hours) reported by others (Millward et al. 2013; Romano et al. 2013; Vršnak et al. 2014; Mays et al. 2015). Similarly, Colaninno, Vourlidas, and Wu (2013) used a variety of methods to evaluate CME arrival-time predictions (not real-time) based on imaging-data analysis alone and found an error of ± 6 hours for seven out of nine CMEs, and ± 13 hours for their full sample of nine CMEs. The CME arrival-time prediction error is inevitably related to the CME propagation speed, thus it is useful to consider the input speed and in-situ observed transit time relative to the prediction error. For this sample, the average in-situ observed transit time was 66 hours. In Figure 8a the CME arrival-time prediction error is plotted versus the CME input speed, and in Figure 8b the prediction error as a percentage of the CME transit time is plotted versus the CME input speed. The error bars are computed using the predicted ensemble range as listed in column 10 of Table 1. The dashed horizontal line indicates the mean arrival-time prediction error (a) and the mean of the prediction error/transit time percentage (b). These figures show a nearly consistent negative prediction error for fast CMEs above ≈ 1000 km s−1 such that these fast CMEs are generally predicted to arrive earlier than they are observed. This could be a sign of the modeled CME having too much momentum as defined by a combination of the input speed and half-width (which is related to the modeled CME mass).
The overestimation of the modeled CME velocity compared to in-situ observed values is also due to the modeled CME having a lower magnetic pressure than is observed in typical magnetic clouds.
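For reference, the three summary statistics used above (ME, MAE, RMSE) can be computed from a list of per-ensemble prediction errors as in the following sketch. The function name `error_stats` and the two example values are hypothetical; the Table 1 errors themselves are not reproduced here.

```python
import math


def error_stats(errors):
    """Return (mean error, mean absolute error, root-mean-square error).

    errors: prediction errors (predicted minus observed), e.g. in hours.
    """
    n = len(errors)
    me = sum(errors) / n                             # signed bias: negative means early
    mae = sum(abs(e) for e in errors) / n            # typical error magnitude
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large errors more
    return me, mae, rmse


# Two made-up errors in hours, for illustration only.
me, mae, rmse = error_stats([-3.0, 4.0])
```

The ME keeps the sign and therefore reveals a systematic early or late bias, while the MAE and RMSE measure the size of the errors regardless of sign.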

Figure 8

(a) CME arrival-time prediction error plotted against the CME input speed. (b) Prediction error as a percentage of the CME transit time, plotted against the CME input speed. The error bars are computed using the predicted ensemble range as listed in Table 1.

Ensemble modeling produces a probabilistic forecast [p i ] of the likelihood of CME arrival for each ensemble [i], but we begin with a simpler forecast evaluation by binning the probability [p i ] into a categorical yes/no forecast. Categorical forecasts only have two probabilities, zero and one, so the probability forecast [p] is binned into two categories: “yes”, the CME will arrive, and “no”, the CME will not arrive. In the signal detection theory model of weather forecasting, event-forecasting performance can be evaluated in terms of a 2×2 contingency table, as shown in Table 2 (Harvey et al. 1992; Weigel et al. 2006; Jolliffe and Stephenson 2011). For CME arrival prediction, the “event” is taken as the “CME arrival”. Hits are then defined as CME arrivals that were both predicted and observed to occur. Misses are defined as CME arrivals that were not predicted, but were observed to occur. False alarms (FA) are defined as CME arrivals that were predicted to occur, but were observed not to occur. Correct rejections (CR) are CME arrivals that were not predicted, and were observed not to occur. To bin each ensemble’s probabilistic forecast, correct rejections were identified when the criterion of the forecast probability p i = n predicted hits / n total members < 15 % was met; i.e. fewer than 15 % of the total predictions in the ensemble indicated CME arrival. Similarly, the inverse criterion is used to identify hits. Table 2 shows the contingency table definitions and values for this 30-event sample: 17 hits, 8 correct rejections, 5 false alarms, and 0 misses (see Table 1 for specific CR and FA events). For this sample, zero misses indicates that no observed CME arrival went unpredicted by the ensemble simulations. There were 8 correct rejections and 5 false alarms among the 13 events that were not observed in situ, giving a correct-rejection rate of 62 % (8/13) and a false-alarm rate of 38 % (5/13).
The correct-alarm ratio, defined as the number of hits divided by the number of hits and false alarms, is 77 % and the false-alarm ratio is 23 %.
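The categorical binning and the resulting rates can be checked with a short sketch. The function names are hypothetical; the 15 % threshold and the counts (17 hits, 0 misses, 5 false alarms, 8 correct rejections) are taken from the text. The hit rate is a standard contingency-table score that is included for completeness but is not quoted in the text.

```python
def categorize(p, arrived, threshold=0.15):
    """Map an ensemble arrival probability and the observed outcome to a category."""
    forecast_yes = p >= threshold  # "no" forecast when p < threshold
    if forecast_yes:
        return "hit" if arrived else "false alarm"
    return "miss" if arrived else "correct rejection"


def contingency_scores(hits, misses, false_alarms, correct_rejections):
    """Compute rates and ratios from the four cells of a 2x2 contingency table."""
    non_events = false_alarms + correct_rejections  # CME not observed in situ
    yes_forecasts = hits + false_alarms             # CME forecast to arrive
    return {
        "hit_rate": hits / (hits + misses),
        "correct_rejection_rate": correct_rejections / non_events,
        "false_alarm_rate": false_alarms / non_events,
        "correct_alarm_ratio": hits / yes_forecasts,
        "false_alarm_ratio": false_alarms / yes_forecasts,
    }


scores = contingency_scores(hits=17, misses=0, false_alarms=5, correct_rejections=8)
```

Note that the false-alarm rate is conditioned on the events that were not observed (5/13), whereas the false-alarm ratio is conditioned on the "yes" forecasts (5/22), which is why the two values differ.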

Table 2 Forecast performance contingency table for 30 ensembles.

We now consider a more nuanced technique to evaluate the probabilistic forecast without partitioning it into a categorical forecast with only two probabilities as described above. A method defining the magnitude of probability forecast errors is the Brier Score (BS) (Brier 1950; Murphy 1973; Wilks 1995), defined as

$$ \mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_{i} - o_{i})^{2}, $$
(12)

where N is the number of events, p i is the forecast probability of occurrence for event i, and o i is 1 if the event was observed to occur and 0 if it did not occur. For CME arrival prediction, the “event” here is taken as the “CME arrival” and p i is listed in column 11 of Table 1 for each ensemble. This score is a probability mean-square error that weights larger errors more than small ones and ranges from 0 to 1, with 0 being a perfect forecast. The BS computed from all N=30 ensemble CME arrival probabilities (Table 1, column 11) is 0.15, which indicates that in this sample, on average, the probability [p] of the CME arriving is fairly accurate. However, such verification scores reduce the problem to a single measure that can only consider one dimension, whereas there are many dimensions to the system. For example, consider the aspect of forecast reliability. Reliable forecasts are those where the observed frequencies of events agree with the forecast probabilities. To evaluate the reliability of probabilistic ensemble forecasts, a set of probabilistic forecasts [p i ] must be evaluated using observations that demonstrate that those events either occurred or did not occur. Multiple forecasts must be evaluated because a single probabilistic forecast cannot be simply assessed as correct or incorrect. For example, if a forecast suggests a 30 % chance of CME arrival, and the CME does arrive, the forecast is not clearly either correct or incorrect. Therefore, to provide forecast verification for a p=30 % chance of CME arrival, one would need to compile the statistics of observed CME arrivals for a set of forecasts that predicted a 30 % chance of arrival. In this way, a reliability diagram can be constructed to determine how well the predicted probabilities of an event correspond to their observed frequencies (Wilks 1995; Jolliffe and Stephenson 2011). 
Figure 9a shows the reliability diagram of the likelihood of CME arrival forecast for the 30 event sample, with the reliability for this sample shown as the black line with points and the perfect reliability diagonal as a dotted line. The line of perfect reliability is diagonal because, for example, when a 60 % probability forecast is made, it is considered perfectly reliable if the event is observed to occur 60 % of the time over multiple ensemble forecasts. The number of events used in each calculation is shown next to each point, and the sample size is smaller than needed for a robust diagram. Nevertheless, the diagram shows that overall ensemble modeling is underforecasting in the forecast bins between 20 – 80 % and slightly overforecasting in the 1 – 20 % and 80 – 100 % forecast bins. Overforecasting is when the forecast chance of CME arrival (forecast probability) is higher than is actually observed; i.e. the CME is observed to arrive less often than is predicted. Similarly, underforecasting is when the chance of CME arrival is lower than is actually observed; i.e. the CME is observed to arrive more often than is predicted.
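A minimal sketch of the Brier Score of Equation (12) and of the points that make up a reliability diagram is given below. The function names, the bin edges, and the four example forecasts are hypothetical choices for illustration; they are not the bins or the data behind Figure 9a.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def reliability_points(probs, outcomes, edges):
    """Return (mean forecast probability, observed frequency, count) per bin.

    Bins are [edges[k], edges[k+1]), except the last bin, which also includes
    its upper edge. Empty bins are skipped.
    """
    points = []
    last = len(edges) - 2
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        members = [
            (p, o) for p, o in zip(probs, outcomes)
            if lo <= p < hi or (k == last and p == hi)
        ]
        if members:
            n = len(members)
            mean_p = sum(p for p, _ in members) / n
            freq = sum(o for _, o in members) / n
            points.append((mean_p, freq, n))
    return points


# Made-up forecasts and outcomes, for illustration only.
probs = [0.0, 1.0, 0.5, 0.5]
outcomes = [0, 1, 1, 0]
bs = brier_score(probs, outcomes)
points = reliability_points(probs, outcomes, edges=[0.0, 0.2, 0.8, 1.0])
```

A point above the diagonal (observed frequency higher than the mean forecast probability) corresponds to underforecasting in that bin, and a point below it to overforecasting.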

Figure 9

CME arrival-time forecast verification: (a) Reliability diagram of the forecast probability of CME arrival for the 30 ensemble sample, with the ensemble results shown as the solid line with points and the diagonal perfect reliability as a dotted black line. The number of ensembles used in each calculation is shown next to each point. The diagram indicates underforecasting in the forecast bins between 20 – 80 % and slight overforecasting in the 1 – 20 % and 80 – 100 % forecast bins. Overforecasting is when the forecast probability of CME arrival is higher than observed, i.e. the CME is observed to arrive less often than is predicted. Similarly, underforecasting is when the CME arrival forecast probability is lower than observed, i.e. the CME is observed to arrive more often than is predicted. (b) The rank histogram for the 17 ensembles containing hits indicates undervariability of initial conditions.

Another aspect of forecast reliability is to assess how well the ensemble spread of the forecast represents the true variability of the observations. For 8 out of 17 of the ensemble runs containing hits, the observed CME arrival was within the spread of ensemble arrival-time predictions. This indicates that roughly half of the observations fall outside of the extremes of the predicted ensemble spread. However, one aspect of a reliable forecast is that the set of ensemble member forecast values for a given event and observations should be considered as random samples from the same probability distribution. This reliability then implies that if an n-member ensemble and the observation are sorted from earliest to latest arrival times, the observation is equally likely to occur in each of the n+1 possible “ranks”. Therefore a histogram of the rank of the observation, “rank histogram”, tallied over many events should be uniform (flat) (Anderson 1996; Hamill and Colucci 1997; Talagrand, Vautard, and Strauss 1997). While more samples would be desirable, it is still instructive to examine the rank histogram for the CME arrival-time predictions from the 17 ensembles containing hits in this sample, shown in Figure 9(b). Since each ensemble run in our sample does not have the same number of members, the rank has been normalized to 10 (nine-member ensemble). To construct this rank histogram, the CME arrival-time predictions of each ensemble are sorted from earliest to latest and the rank of where the observed arrival falls among the predicted times is noted. For example, an ensemble with a rank of 8 has the meaning that seven arrival-time predictions fall before the observed arrival, a rank of 10 would mean that all nine predictions occur before the observation, and a rank of 1 means that the observation occurs before all of the predictions. 
The nonuniform U-shape of this histogram partly illustrates that roughly half of the observed arrivals are outside the spread of predictions (ranks 1 and 10), with a tendency for an overall early spread of predictions (rank=10) compared to observations (also quantified by mean arrival time error of −7.0 hours). U-shaped rank histograms can indicate lack of variability in the ensemble, but can also be a sign of a combination of conditional biases in the model (Hamill 2001). However, when evaluating the WSA–ENLIL+Cone model in this sample of ensembles, and > 70 regular runs containing hits performed by SWRC (Romano et al. 2013), an overall negative bias (early predictions) was found, with a weaker bias for CME input speeds below ≈ 1000 km s−1. Therefore, it is unlikely that a combination of positive and negative model biases within the ensembles contributed to the U-shaped rank histogram for our sample. Most likely, the U-shape suggests undervariability, indicating that these ensembles do not sample a wide enough spread in CME input parameters.
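The rank-histogram construction can be sketched as follows. The text does not specify how ranks from ensembles of different sizes were normalized to a nine-member ensemble, so the mapping used here is only one possible choice and should be read as an assumption: observations outside the ensemble spread keep the extreme ranks (1 and 10), and interior positions are scaled proportionally onto ranks 2 to 9. The function names are hypothetical.

```python
from collections import Counter


def normalized_rank(predictions, observation, target_members=9):
    """Rank of the observation among the predictions, scaled to target_members + 1 ranks.

    Assumed normalization (not taken from the paper): rank 1 if the observation
    precedes every prediction, rank target_members + 1 if it follows every
    prediction, and a proportional interior rank otherwise.
    """
    n = len(predictions)
    if n < 2:
        raise ValueError("need at least two ensemble members")
    before = sum(1 for t in predictions if t < observation)
    if before == 0:
        return 1
    if before == n:
        return target_members + 1
    # Map 1..n-1 predictions before the observation onto ranks 2..target_members.
    return 2 + ((before - 1) * (target_members - 1)) // (n - 1)


def rank_histogram(ensembles, target_members=9):
    """Tally normalized ranks over (predictions, observation) pairs."""
    return Counter(
        normalized_rank(preds, obs, target_members) for preds, obs in ensembles
    )


# Made-up arrival times in hours, for illustration only.
nine = list(range(1, 10))
thirty_six = list(range(36))
hist = rank_histogram([(nine, 7.5), (thirty_six, 100.0), (thirty_six, -1.0)])
```

For a nine-member ensemble the mapping reduces to the plain rank, so seven predictions before the observation give a rank of 8, as in the example in the text.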

6.2 K P Forecast Verification

For each event for which a hit is predicted in Table 1, ensemble modeling provides a probabilistic K P forecast (see Section 4) for three magnetic-field clock-angle scenarios of 90° (westward), 135° (southwestward), and 180° (southward). An overall probabilistic K P forecast can then be obtained by making the simple assumption that each clock-angle is equally likely to occur. Table 3 lists the overall probabilistic K P forecast p(K P=b) for each K P bin b (e.g. the distribution shown in Figure 7 in black) for these 17 events. The observed K P, sudden storm commencement (SSC) and minimum Dst indices are also shown. The mean predicted K P is listed in column 12, along with the overall predicted K P spread (using plus or minus notation). Underlined K P probabilities indicate that the NOAA real-time observation falls within this bin, and the final definitive K P values are listed in column 13. The Dst values are from the real-time (quicklook) Dst index provided by the World Data Center for Geomagnetism in Kyoto, Japan. To estimate the reliability of the probabilistic K P forecast, the Brier Score is calculated for each K P bin and listed in the last line of the table.

Table 3 Summary of K P prediction results for 17 ensemble runs containing hits. Columns 1 – 2: CME start date and time. Columns 3 – 11: overall probabilistic K P forecast for each K P bin assuming equal likelihood of three clock-angle scenarios. Underlined K P probabilities indicate that the NOAA real-time K P observation falls in this bin, and the observed definitive K P is listed in column 13. The mean predicted K P is listed in column 12, along with the overall predicted K P spread (using plus or minus notation). The Brier Score (BS) is calculated for each K P bin and listed in the last line of the table. The Dst sudden storm commencement and minimum values are listed in the last two columns.

To evaluate forecast performance, a single categorical predicted K P forecast can be derived from the probabilistic K P forecast p(K P=b) distribution. For example, the single categorical forecast \(K_{\mathrm{P}_{\mathrm{predicted}}}\) can be taken as the mean predicted K P, or the most probable K P value. This allows a K P prediction error to be computed as \(\Delta K_{\mathrm{P}_{\mathrm{err}}}=K_{\mathrm{P}_{\mathrm{predicted}}}-K_{\mathrm{P}_{\mathrm{observed}}}\) for each ensemble, where positive values of \(\Delta K_{\mathrm{P}_{\mathrm{err}}}\) indicate an overprediction of the K P index and negative values indicate that K P has been underpredicted. If the categorical \(K_{\mathrm{P}_{\mathrm{predicted}}}\) is taken as the K P bin b that has the highest likelihood in the probabilistic K P forecast p(K P=b) for each ensemble, the prediction errors are calculated to give a mean absolute error (MAE) of 1.9, root mean square error (RMSE) of 2.5, and mean error (ME) of +1.4. However, if the categorical \(K_{\mathrm{P}_{\mathrm{predicted}}}\) is taken as the mean predicted K P in each ensemble (last column of Table 3), these errors are reduced to MAE=1.5, RMSE=2.0, and ME=+0.6. Consequently, using the ensemble mean K P yields a more accurate forecast in this sample; however, both forecast choices show an overall tendency for the overprediction of K P. Given that the modeled CMEs do not have an internal magnetic-field structure, the Newell et al. (2007) K P coupling function using ENLIL results as input performs surprisingly well. For comparison, using ACE solar-wind data as input to the coupling function for this sample gives K P prediction errors of MAE=0.67, RMSE=0.77, and ME=+0.22.
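The two ways of reducing a probabilistic K P forecast to a single categorical value can be sketched as below. The function names and the example distribution are hypothetical; the distribution is not a row of Table 3.

```python
def most_probable_kp(dist):
    """KP bin with the highest forecast probability (first one wins on ties)."""
    return max(dist, key=dist.get)


def mean_kp(dist):
    """Probability-weighted mean KP of the forecast distribution."""
    total = sum(dist.values())
    return sum(b * p for b, p in dist.items()) / total


def kp_error(predicted, observed):
    """Prediction error: positive for overprediction, negative for underprediction."""
    return predicted - observed


# Made-up forecast distribution {KP bin: probability}, for illustration only.
dist = {5: 0.25, 6: 0.25, 7: 0.5}
observed_kp = 5
err_mode = kp_error(most_probable_kp(dist), observed_kp)
err_mean = kp_error(mean_kp(dist), observed_kp)
```

In this made-up case the most probable bin overpredicts by 2, while the mean overpredicts by 1.25, which illustrates why the mean can be the less extreme choice when the distribution has a long tail toward lower K P.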

In Figure 10 the K P prediction error (from the ensemble mean K P) is compared to the CME input speed; the error bars show the ensemble K P prediction spread. This figure shows that K P is usually overpredicted when CME input speeds are above ≈ 1000 km s−1. This bias is also apparent in the K P predictions made from a sample of > 70 regular WSA–ENLIL+Cone runs reported by Romano et al. (2013). The K P overprediction is most likely due to an overestimation of the CME dynamic pressure at Earth by the WSA–ENLIL+Cone model, because the CME has a lower magnetic pressure than is observed in typical magnetic clouds. In addition, since the CME dynamic pressure is proportional to the density and to the square of the velocity, it is particularly sensitive to higher CME input speeds, which produce higher in-situ speeds than those measured. Another factor in the higher CME dynamic pressure can arise from the approximation of the CME as a cloud with homogeneous density in the model.

Figure 10

K P prediction error (computed from the ensemble mean K P) compared to the CME input speed shows an overprediction of the K P value for CME input speeds above ≈ 1000 km s−1. Error bars indicate the ensemble K P prediction spread listed in Table 3.

Other factors contributing to K P overprediction may include the magnetic-field direction – two out of the three field configurations assumed produce persistent southward fields (135° and 180°), so there is a bias toward geoeffective field configurations. Examining the distribution of north–south magnetic fields associated with the ICMEs of Richardson and Cane (2010) and the associated sheaths, in only 2 % of cases are southward fields completely absent, so the bias towards geoeffective field configurations is consistent with observations. However, both small and large maximum southward fields are observed relatively infrequently (e.g. maximum southward fields are < 4 nT in 17 % of events, and > 15 nT in 16 %), suggesting that the weighting of the 90° and 180° clock angles should be reduced. In particular, reducing the 180° clock-angle weight would be expected to reduce the K P overprediction.

The last line of Table 3 lists the Brier Score calculated for each K P bin. Here, the BS is a measure of the magnitude of error in the K P probability forecast (how likely it is that a given K P bin will occur) in each bin. The BS values indicate that in this sample, the K P probability forecast is reliable for the K P=5 and 6 bins (BS=0.17 for both), and less so for the K P=3 and 4 bins (BS=0.27 and 0.19). Although the scores also indicate that the forecast is most reliable for the smallest and largest K P bins, most of the observations in this sample did not fall in these extreme bins, hence a larger sample is needed to verify the forecast reliability for these bins. Figure 11a shows the overall observed K P distribution and the forecast K P probability distribution for the events in Table 3 used to calculate the BS.
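A per-bin Brier Score of this kind treats "the observed K P falls in bin b" as the event for each bin in turn. The sketch below assumes each forecast is a mapping from K P bin to probability; the function name, the choice of bins, and the two example forecasts are hypothetical and are not rows of Table 3.

```python
def brier_score_per_bin(forecasts, observed_kp, bins):
    """Brier Score for each KP bin over a set of probabilistic forecasts.

    forecasts:   list of {KP bin: probability} dicts, one per ensemble.
    observed_kp: list of observed KP values, aligned with forecasts.
    bins:        iterable of KP bins to score.
    """
    n = len(forecasts)
    scores = {}
    for b in bins:
        total = 0.0
        for dist, obs in zip(forecasts, observed_kp):
            p = dist.get(b, 0.0)        # forecast probability for this bin
            o = 1.0 if obs == b else 0.0  # did the observation fall in this bin?
            total += (p - o) ** 2
        scores[b] = total / n
    return scores


# Two made-up forecasts and observations, for illustration only.
forecasts = [{5: 0.5, 6: 0.5}, {5: 1.0}]
observed = [5, 5]
per_bin = brier_score_per_bin(forecasts, observed, bins=(5, 6, 7))
```

A bin that is rarely forecast and rarely observed scores close to zero almost by construction, which matches the caution above about interpreting the scores of the smallest and largest K P bins.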

Figure 11

K P forecast verification: (a) Histogram of the observed K P values (solid) and the forecast K P probability distribution (hatched) for this sample (see Table 3). (b) Rank histogram of K P predictions for all ensembles.

To further evaluate K P probability forecast reliability, we compare the observed K P to the spread in ensemble predictions. For most (12 out of 17) of the ensembles, the observed K P was within the overall predicted K P spread (column 11). The observed K P was also within the predicted mean K P±1 for 11 out of 17 of the ensembles. A rank histogram was also constructed for the K P predictions for all ensembles and is shown in Figure 11b, again normalized to an ensemble size of 9. To construct this rank histogram, the K P predictions are sorted from smallest to largest and the rank of where the observed K P value falls among the predicted K P values is noted. For example, an ensemble with a rank of 6 has the meaning that five K P predictions are lower than the observed K P value, a rank of 10 would mean that all nine of the K P predictions are lower than the observed K P (underprediction), and a rank of 1 means that the observed K P value is lower than all of the K P predictions (overprediction). The histogram has an overall flat shape, with more occurrences at rank 1 (the observed K P was lower than the predicted range) and fewer occurrences in the higher ranks, which shows the bias for K P overprediction (mean error=+0.6). Note that the rank histogram does not indicate the quality of forecasts, but only measures whether the observed probability distribution is well represented by the ensemble. Therefore, a uniform, flat rank histogram is a necessary but not sufficient condition for determining the reliability of ensembles (Hamill 2001).

7 ENLIL Parameter Sensitivity: 11 April 2013 Event Case Study

In the current configuration, other than the measured CME speed, direction, and size, the real-time WSA–ENLIL+Cone ensemble simulations use the default values for the model CME free parameters. In this section, we present a case study that examines the effect of changing these model free parameters on the ensemble modeling. The CME starting on 11 April 2013 at 07:24 UT was chosen for this study because of the large early arrival-time prediction error obtained for all members of the model ensemble. Taktakishvili, MacNeice, and Odstrčil (2010) studied the dependence of arrival-time predictions on the uncertainty in CME input parameters (speed, width, density ratio) for three Earth-directed CME events of varying speeds. A similar procedure was adopted for this case study, and by employing the ensemble-modeling technique, the parameter space can be sampled more systematically.

The original set of simulations performed in real time was chosen as the base ensemble (ensemble I). Subsequently, ten ensemble runs (ensembles II – XI), each containing 36 members for 360 total simulations, were performed to assess the sensitivity of the CME arrival-time prediction to changes in the model free parameters and ambient solar-wind model, while keeping the CME speed and direction input parameters fixed. The ENLIL-model free parameters considered in this study include the CME half-width, CME density ratio, CME cavity ratio, and ambient solar-wind reduction factor. The CME density ratio [dcld] is a free parameter, which by default is a set factor of four times larger than typical mean values in the ambient fast wind, providing a pressure four times higher than that of the ambient fast wind. The cavity ratio [radcav] is defined as the ratio of the radial CME cavity width to the CME width, with the default being no cavity [radcav=0]. The ambient speed reduction factor [vred] reduces the solar-wind speed provided by the WSA coronal map to account for expansion of the solar wind from the WSA boundary to 1 AU since WSA is calibrated against 1 AU in-situ observations.
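As a bookkeeping aid, the ten sensitivity ensembles can be enumerated as one-parameter departures from the base ensemble. The sketch below uses the parameter values reported later in this section (half-width 55° reduced to 45°, dcld of 2 and 3, radcav of 0.1 to 0.7, vred of 50 and 75 km s−1). The dictionary keys and function name are hypothetical and do not reproduce actual ENLIL input syntax, and the ensembles are labeled by group because the text does not state which individual ensemble number carries which value.

```python
# Base-ensemble values quoted in this section; the dictionary keys are
# hypothetical names, not actual ENLIL input-file syntax.
BASE = {"half_width_deg": 55, "dcld": 4, "radcav": 0.0, "vred_km_s": 25}
MEMBERS_PER_ENSEMBLE = 36

# One entry per group of sensitivity ensembles: (labels, parameter, values).
SWEEPS = [
    ("II", "half_width_deg", [45]),
    ("III-IV", "dcld", [2, 3]),
    ("V-IX", "radcav", [0.1, 0.3, 0.5, 0.6, 0.7]),
    ("X-XI", "vred_km_s", [50, 75]),
]

def sensitivity_configs(base, sweeps):
    """Return one (group label, config) pair per ensemble.

    Each config changes exactly one free parameter relative to the base.
    """
    configs = []
    for labels, parameter, values in sweeps:
        for value in values:
            config = dict(base)  # copy so the base values stay untouched
            config[parameter] = value
            configs.append((labels, config))
    return configs

configs = sensitivity_configs(BASE, SWEEPS)
total_simulations = len(configs) * MEMBERS_PER_ENSEMBLE
print(len(configs), total_simulations)
```

Keeping every ensemble as a single-parameter change from the base is what allows the arrival-time differences in this section to be attributed to one parameter at a time.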

Figure 12 shows the CME starting on 11 April 2013 at 07:24 UT as viewed from SOHO/LASCO-C3, STEREO/SECCHI-COR2-A and -B near 09:55 UT. On this date, the STEREO-B spacecraft was located at −142° and STEREO-A was at 133° in HEEQ coordinates. This CME was associated with an M6.5 class flare from AR 11719 located at N07E13 with a peak intensity at 07:16 UT. The eruption, coronal dimming, and wave were visible mostly southeast of the active region in SDO/AIA 193 Å. Additionally, an increase in solar-energetic-particle proton flux was observed starting at around 07:40 UT by the SOHO/Comprehensive Suprathermal and Energetic Particle Analyzer (COSTEP) (reaching 1 pfu MeV−1, in the 16 – 40 MeV energy range), ACE/Electron, Proton, and Alpha Monitor (EPAM) (100 pfu MeV−1, 1.22 – 4.94 MeV), and GOES-13 EPEAD (5 pfu MeV−1, 15 – 40 MeV energy range) instruments starting at 08:00 UT, and by the IMPACT HET instruments onboard STEREO-B (5 pfu MeV−1, 24 – 41 MeV energy range) and -A (0.001 pfu MeV−1, 24 – 41 MeV energy range). This solar-energetic-particle event and its longitudinal extent were studied in detail by Cohen et al. (2014) and Lario et al. (2014).

Figure 12

Coronagraph observations of the 11 April 2013 07:24 UT CME as viewed from (a) STEREO/SECCHI-COR2-B, (b) SOHO/LASCO, and (c) STEREO/SECCHI-COR2-A all near 09:55 UT.

Because concurrent real-time coronagraph images were not available, triangulation of CME parameters with the StereoCAT ensemble-mode method was not possible for this CME. Therefore, the ensemble was composed of custom members. The CME parameters for each member were derived from plane-of-sky CME speed measurements combined with the source location at the Sun. The distribution of CME input parameters for the 36 ensemble members is shown in Figure 13. Median CME parameters are: speed of 1000 km s−1, direction of −15° longitude, 0° latitude, and a half-width of 55°. Figure 13 shows that custom ensemble members were chosen with speeds of 850, 900, 1000, 1100, and 1200 km s−1, between ±10° latitude, −10° to −25° longitude with a half-width of 55°. Subsequent re-analysis of the CME height–time evolution gives average plane-of-sky speeds of ≈ 800 km s−1 and ≈ 700 km s−1 for SECCHI-COR2-B and LASCO-C3, respectively, yielding a triangulated speed of 850±200 km s−1, −5±5° latitude, −15±10° longitude, and 50±5° half-width, which is represented within the ensemble members derived in real time.

Figure 13

Distribution of the 11 April 2013 CME input parameters shown in (a) the equatorial plane (latitude=0°) and (b) meridional plane (longitude=0°). The plots show the CME speed vectors in spherical HEEQ coordinates with the grids showing the degrees longitude (a) and latitude (b), and the radial coordinate showing the speed in km s−1. The Sun–Earth line is along 0° longitude and latitude. The arrow directions on the grid indicate the CME central longitude and latitude, respectively, and all CME half-widths are 55°. The arrow lengths correspond to the CME speed. Median CME parameters are: speed of 1000 km s−1, direction of −15° longitude, 0° latitude, and a half-width of 55°. This figure shows that custom ensemble members were chosen with speeds of 850, 900, 1000, 1100, and 1200 km s−1, between ±10° latitude, −10° to −25° longitude with a half-width of 55°.

The WSA–ENLIL+Cone model scaled-velocity contour plot is shown in Figure 14 on 13 April at 06:00 UT for the ensemble member with median CME input parameters. This simulation figure shows a nearly direct CME impact at Earth, slightly eastward. Figure 15 shows the base ensemble WSA–ENLIL+Cone modeled quantities for all 36 ensemble members (color traces) at Earth along with in-situ ACE (black) and Wind (gray) observations (when there are ACE data gaps). The model traces are color coded by CME input speed such that slow to faster input speeds are colored from light green to dark blue. All 36 of the ensemble members predicted that the CME would arrive (100 %) and the mean predicted arrival at Earth was 13 April 06:14 UT (range from 13 April 00:47 to 12:20 UT). The histogram of the distribution of arrival times is shown in Figure 16. The clustering of predicted arrival times in this histogram (and also in Figure 15) reflects the limited number of discrete CME input speeds represented in the ensemble (see Figure 13), with faster CMEs arriving first. The CME-associated shock was observed to arrive at ACE and Wind on 13 April at 22:13 UT, giving an average prediction error of −16 hours. Clear ICME signatures including an enhanced low-variability magnetic field, declining solar-wind speed, and low proton temperatures, start at around 16:45 UT on 14 April through about 18:30 UT on 15 April. The overall spread in arrival-time predictions of all of the members in the base ensemble (including the clustering by CME input speed) can also be seen in Figure 15 as the color traces increase ahead of the observed arrival. The traces also show that the velocity, density, and temperature are overpredicted, while the maximum magnetic-field strength is similar to that actually observed. The passage of this CME did not produce a geomagnetic storm because of an almost persistently northward magnetic field, shown in red in the top panel of Figure 15. The NOAA real-time observed K P index reached 3 during the synoptic period 21 – 24:00 UT on 13 April, while the Potsdam final K P was 3+. The Dst index shows a sudden storm commencement of +21 nT at 23:00 UT on 13 April and reached a minimum of only −7 nT at 11:00 UT on 15 April.

Figure 14

Global view of the 11 April 2013 CME on 13 April at 06:00 UT: WSA–ENLIL+Cone scaled velocity contour plot for the (a) constant Earth latitude plane, (b) meridional plane of Earth, and (c) 1 AU sphere in cylindrical projection, for the ensemble member with median CME input parameters (speed of 1000 km s−1, direction of −15° longitude, 0° latitude, and a half-width of 55°).

Figure 15

11 April 2013 CME base ensemble: Model-calculated magnetic field, velocity, density, and temperature profiles at Earth for each ensemble member along with the observed in-situ L1 observations from ACE in black (red for B z ). Wind density observations are plotted in gray because of missing ACE values. The model traces are color coded by CME input speed such that slow to faster input speeds are colored from light green to dark blue. The CME-associated shock was observed to arrive at ACE and Wind on 13 April at around 22:13 UT with clear ICME signatures starting around 16:45 UT on 14 April through about 18:30 UT on 15 April. All of the arrival times indicated by the model results are earlier than the observed shock arrival and are clustered by CME input speed. The mean predicted arrival time at Earth is 13 April 06:14 UT, with a range from 13 April 00:47 to 12:20 UT.

Figure 16

11 April 2013 base ensemble: Histogram of distribution of arrival-time predictions at Earth (one-hour bin size). The actual arrival was observed on 13 April at around 22:13 UT by Wind. The clustering of predicted arrival times reflects the limited number of different CME input speeds represented in the ensemble (see Figure 13), with faster CMEs arriving first.

In Figure 17 the arrival-time prediction error [$\Delta t_{\mathrm{err}} = t_{\mathrm{predicted}} - t_{\mathrm{observed}}$] for the members in the base ensemble is plotted against the CME input speed for different CME input propagation directions (grayscale coded) and a fixed half-width of 55° (full angular width of 110°). On 11 April 2013, Earth was located at −5.9° latitude and 0° longitude in HEEQ coordinates, thus the input propagation direction of −10° latitude and −10° longitude (black, and dark blue in subsequent figures) represents the members with the most direct impact. This figure shows that the arrival-time prediction error ranges from −9.9 hours to −21.4 hours and increases in magnitude with initial CME speed.
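As a worked check of the sign convention (predicted minus observed, so early predictions give negative errors), the base-ensemble numbers quoted in this section can be reproduced with a short Python sketch. The times are those given in the text; the function and variable names are illustrative only.

```python
from datetime import datetime

def error_hours(predicted, observed):
    """Arrival-time prediction error in hours: predicted minus observed."""
    return (predicted - observed).total_seconds() / 3600.0

# Times (UT) quoted in the text for the 11 April 2013 event.
observed_arrival = datetime(2013, 4, 13, 22, 13)
mean_predicted = datetime(2013, 4, 13, 6, 14)
earliest_predicted = datetime(2013, 4, 13, 0, 47)
latest_predicted = datetime(2013, 4, 13, 12, 20)

mean_error = error_hours(mean_predicted, observed_arrival)
earliest_error = error_hours(earliest_predicted, observed_arrival)
latest_error = error_hours(latest_predicted, observed_arrival)

print(round(mean_error, 1), round(earliest_error, 1), round(latest_error, 1))
```

The rounded values are −16.0, −21.4, and −9.9 hours, matching the average error and the error range stated for the base ensemble.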

Figure 17

11 April 2013 base ensemble: CME arrival-time prediction error [$\Delta t_{\mathrm{err}} = t_{\mathrm{predicted}} - t_{\mathrm{observed}}$] for the ensemble members plotted versus the CME input speed for different CME input propagation directions (grayscale coded).

Considering that one source of the prediction error might be the uncertainty in the CME width, an identical ensemble (II) simulation was performed with the same input conditions, but decreasing the half-width by 10° (full angular width decreased from 110° to 90°). Figure 18 shows the difference from the original predicted arrival times from the base ensemble against the CME input speed for different propagation directions (as shown in Figure 17) when the full angular width decreased from 110° to 90°. In this figure (and those subsequent), the new CME arrival-time prediction error is shown in hours relative to the original base-ensemble prediction error. Compared to the original arrival-time estimates, the overall prediction error decreases by 0.2 to 1.8 hours with increasing initial CME speed. Since all of the predictions in the base ensemble were too early (negative prediction error), a decrease in prediction error means that the new predictions are shifted to later times, closer to the observed arrival time. Nevertheless, the improvement is small compared to the prediction error.

Figure 18

11 April 2013 ensemble II: Difference from the base ensemble (as shown in Figure 17) in hours when the CME input half-width is decreased by 10° against the CME input speed for different propagation directions.

Next, the dependence of the prediction on the input CME density ratio [dcld] was considered. Two ensembles were performed (III and IV) for which all parameters of the base ensemble were held fixed, but the CME density ratio was adjusted from four (default) to two and three. The results are shown in Figure 19, which shows the difference from the predicted arrival time for the base ensemble as a function of CME speed for the two different density ratios. The prediction error decreases by 3.3 to 4.3 hours for a CME density ratio of two and by 1.3 to 1.7 hours for a density ratio of three, as a function of increasing initial CME speed. Hence, reducing the density ratio from four to two or three improves the arrival-time prediction by around 3.5 or 1.5 hours, respectively.

Figure 19

11 April 2013 ensembles III and IV: Difference from the base ensemble (as shown in Figure 17) in hours when the CME density ratio dcld is decreased to dcld=3 and dcld=2 (default dcld=4) against the CME input speed for different propagation directions.

Another ENLIL CME parameter is the cavity ratio, which allows the CME to be represented by a spherical shell of plasma, based on coronagraph observations of CME cavities. The cavity ratio [radcav] is defined as the ratio of the radial CME cavity width to the CME width, with the default being no cavity [radcav=0]. Figure 20 shows the results of five ensembles (V – IX) with the CME cavity ratio adjusted to 0.1, 0.3, 0.5, 0.6, and 0.7, i.e. the CME is modeled as a progressively thinner shell as the ratio increases, with all other parameters fixed at their base-ensemble values. Specifically, the differences from the arrival times obtained for the base ensemble are plotted as a function of CME speed and direction (indicated by the symbol/line type) for each of these ensembles (indicated by the line color). For a cavity ratio of 0.1, the prediction remains largely unchanged compared to the base ensemble (within 0.15 hours). For the other cavity ratios, increasing differences in Figure 20 as the cavity ratio increases correspond to the predicted arrival moving to later times, reducing the prediction error. Furthermore, for each cavity ratio there is a spread (one to three hours) in prediction-time difference (compared to the base ensemble) for the different CME input directions, with the more Earth-directed inputs showing the largest difference from the base ensemble. Overall, the prediction error decreases by 0 – 1.6 hours, 0.9 – 3.9 hours, 1.8 – 4.7 hours, and 2.4 – 5.6 hours for cavity ratios of 0.3, 0.5, 0.6, and 0.7, respectively.

Figure 20

11 April 2013 ensembles V – IX: Difference from the base ensemble (as shown in Figure 17) in hours when the CME cavity ratio is increased (radcav=0.1, 0.3, 0.5, 0.6, and 0.7) against the CME input speed for different propagation directions. Different CME input directions are indicated by the symbol/line type and each ensemble is indicated by a different line color. The cavity ratio [radcav] is defined as the ratio of the radial CME cavity width to the CME width, and the default is no cavity [radcav=0].

Considering now the influence of the ENLIL ambient solar-wind solution, the in-situ data-model comparison (Figure 15) for the base ensemble indicates that the modeled background solar-wind speed is ≈ 125 km s−1 higher than the observed in-situ values, whereas the default value of the ambient solar-wind reduction factor vred is 25 km s−1. To examine the role of the speed reduction factor in the prediction error, vred was increased to 50 km s−1 and 75 km s−1 for two ensembles (X and XI). This factor reduces the speed provided by the WSA coronal map to account for expansion of the solar wind from the WSA boundary to 1 AU since WSA is calibrated against 1 AU in-situ observations. Figure 21 shows the prediction time difference from the base ensemble for these two ensembles, which show differences of 1 – 1.3 hours and 2.2 – 2.8 hours for vred of 50 km s−1 and 75 km s−1, respectively. Since the differences are positive, this indicates that the predicted arrival times are moved later, reducing the error relative to the observed arrival time. As might be expected, the modeled CME propagates more slowly when the ambient solar wind is slower. Figure 22 illustrates that the modeled background solar-wind speed better matches the observed speed before CME arrival when the ambient speed-reduction factor [vred] is increased to 75 km s−1 in ensemble XI.

Figure 21

11 April 2013 ensembles X and XI: Difference from the base ensemble (as shown in Figure 17) in hours when the ENLIL ambient-speed reduction factor vred is increased to 50 km s−1 and 75 km s−1 (from the default value of 25 km s−1) against the CME input speed for different propagation directions.

Figure 22

11 April 2013 ensemble XI: The predicted velocities (color traces) better match the observed in-situ values at ACE (black) when vred is increased to 75 km s−1 compared to the default vred=25 km s−1 results shown in Figure 15. The second peaks in the predicted solar-wind speed are artifacts of the ENLIL modeled CME as a spherical cloud.

Overall, this parametric case study shows that after the CME input speed, the cavity ratio and density ratio assumed in ENLIL have the greatest effects on the predicted CME arrival time, each changing this time by about three hours on average. Their effect is also more noticeable with higher CME input speeds. The CME input speed, cavity, and density ratios define the CME momentum that defines the CME deceleration. In addition to using the default values, new ensemble runs could be performed with changes to the CME cavity ratio and density ratio, as informed by coronagraph measurements of the CME. Here, we only examined the effect of changing the ad-hoc ambient-speed reduction factor in ENLIL, but we could also produce an ensemble of ambient-solar-wind WSA–ENLIL simulations using different ambient speed reduction factors that can be compared to in-situ measurements to determine the most suitable factor to be used in subsequent CME simulations. An ensemble forecast reflecting uncertainties in the background solar wind could also be produced by using a variety of magnetograms (from different observatories or processed using different techniques) as input to the WSA or WSA-ADAPT models.

8 Summary and Discussion

This study evaluated the first ensemble CME prediction system of its kind employed in a real-time environment, providing unique space-weather information. The ensemble prediction approach provides a probabilistic forecast that includes an estimation of arrival-time uncertainty from the spread in predictions and a forecast confidence in the likelihood of CME arrival. The current implementation explores the sensitivity of CME arrival-time predictions from the WSA–ENLIL+Cone model to initial CME parameters. First results give a mean absolute arrival-time error of 12.3 hours, a RMSE of 13.9 hours, and mean error of −5.8 hours (early bias), based on a sample of 30 CME events for which ensemble simulations were performed. The arrival time is generally based on the arrival of the CME-generated shock at Earth. The ensemble mean absolute error and RMSE are both comparable with other CME arrival-time prediction errors reported in the literature.
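The three summary statistics quoted above (mean error, mean absolute error, and RMSE) follow standard definitions, sketched below in Python. The error values in the example are invented for illustration and are not the per-event errors from this study; the function name is likewise an assumption.

```python
import math

def error_summary(errors_hours):
    """Mean error (bias), mean absolute error, and root-mean-square error."""
    n = len(errors_hours)
    mean_error = sum(errors_hours) / n
    mean_abs_error = sum(abs(e) for e in errors_hours) / n
    rmse = math.sqrt(sum(e * e for e in errors_hours) / n)
    return {"ME": mean_error, "MAE": mean_abs_error, "RMSE": rmse}

# Hypothetical per-ensemble mean arrival-time errors in hours
# (predicted minus observed; negative values are early predictions).
sample_errors = [-16.0, -4.0, 6.0, -10.0]
summary = error_summary(sample_errors)
print(summary)
```

A negative mean error with a larger mean absolute error, as in this made-up sample, is the pattern described in the text: an overall early bias with individual errors of both signs.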

When considering the overall performance of CME arrival prediction, it was found that the correct-rejection rate is 62 %, the false-alarm rate is 38 %, the correct-alarm ratio is 77 %, and the false-alarm ratio is 23 %. Each ensemble CME arrival-time forecast includes a forecast probability [$p = n_{\mathrm{predicted\ hits}}/n_{\mathrm{tot}}$] that the CME will arrive; this probability conveys the forecast uncertainty and can be compared with observations to determine forecast reliability. The Brier Score (BS) of 0.15 for all 30 ensemble CME arrival probabilities indicates that in this sample the predicted probability of the CME arriving is fairly accurate on average. (A BS of 0 on a range of 0 to 1 is a perfect forecast.) However, the reliability diagram (Figure 9a) shows that the ensemble simulations are underforecasting the likelihood that the CME will arrive in the forecast bins between 20 – 80 % and are slightly overforecasting in the 1 – 20 % and 80 – 100 % forecast bins. Overforecasting is when the forecast chance of CME arrival is higher than is actually observed; i.e. the CME is observed to arrive less often than is predicted. More ensemble simulations are needed for a more robust forecast verification of these probabilistic CME arrival-time forecasts.
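The forecast probability and Brier Score can be illustrated with a short Python sketch. It assumes the standard Brier Score definition (the mean squared difference between forecast probability and a binary outcome, 1 for an observed arrival and 0 otherwise); the member counts and outcomes below are invented for the example and the function names are hypothetical.

```python
def arrival_probability(n_predicted_hits, n_total):
    """Fraction of ensemble members that predict a CME arrival."""
    return n_predicted_hits / n_total

def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    outcomes are 1 (CME observed to arrive) or 0 (no arrival observed).
    A score of 0 is a perfect forecast; 1 is the worst possible.
    """
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Hypothetical ensembles: (members predicting a hit, ensemble size).
probabilities = [
    arrival_probability(36, 36),  # every member predicts arrival
    arrival_probability(12, 48),  # a quarter of members predict arrival
    arrival_probability(24, 48),  # half of members predict arrival
]
outcomes = [1, 0, 1]  # made-up observations
bs = brier_score(probabilities, outcomes)
print(probabilities, bs)
```

Because the score averages over all ensembles, a low overall value can coexist with the bin-by-bin over- and underforecasting seen in a reliability diagram.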

For 8 out of 17 of the ensemble runs containing hits, the observed CME arrival was within the spread of ensemble arrival-time predictions. The initial distribution of CME input parameters was shown to be an important influence on the accuracy of CME arrival-time predictions. Particularly, the median and spread of the input distribution should accurately represent the range of CME parameters derived from observations. This is seen in the rank histogram (Figure 9b), which illustrates that roughly half of the observed arrivals are outside the spread of predictions and also suggests undervariability in initial conditions; i.e. these ensembles do not sample a wide enough spread in CME input parameters.

Each set of ensemble simulations also provides a probabilistic K P forecast [p(K P=b)] for each K P bin [b], which can be compared with observations to determine forecast reliability. The Brier Score (BS) for the probabilistic K P forecast bins indicates reasonable reliability for the K P=5 and 6 bins (BS=0.17 for both), and somewhat less for the K P=3 and 4 bins (BS=0.27 and 0.19). If choosing a single categorical K P forecast value, the mean predicted K P was found to have smaller prediction errors compared to using the K P bin with the highest likelihood from the probabilistic K P forecast. The observed K P was within ± 1 of the predicted mean K P for 11 out of 17 of the ensembles. The K P prediction errors computed from the mean predicted K P show a mean absolute error of 1.4, an RMSE of 1.8, and a mean error of +0.4. There is a known overall tendency for the overprediction of K P, generally found for CME input speeds above 800 – 1000 km s−1. Again, more ensemble simulations are needed to provide better forecast verification and to calibrate the K P forecast.
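The two ways of reducing an ensemble to a K P forecast, the per-bin probabilities and a single categorical value, can be sketched as follows. This is an illustration under simplifying assumptions: the member predictions are taken to be integer K P values already assigned to bins, the values themselves are made up, and the function name is hypothetical.

```python
from collections import Counter

def kp_bin_probabilities(kp_predictions):
    """Fraction of ensemble members falling in each K_P bin."""
    counts = Counter(kp_predictions)
    n = len(kp_predictions)
    return {kp_bin: count / n for kp_bin, count in counts.items()}

# Hypothetical nine-member ensemble of integer K_P predictions.
members = [4, 5, 5, 6, 6, 6, 6, 7, 8]

probabilities = kp_bin_probabilities(members)

# Two candidate single-value forecasts discussed in the text:
mean_kp = sum(members) / len(members)             # ensemble mean
most_likely_kp = Counter(members).most_common(1)[0][0]  # most populated bin

print(probabilities, round(mean_kp, 2), most_likely_kp)
```

For a skewed or multi-peaked set of predictions the two single-value choices can differ, which is why the text compares their prediction errors separately.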

This article focused on the forecast verification of the ensemble-modeling aspect of CME arrival and K P predictions. More events, as well as a comparison of results using different CME propagation models, are needed for more comprehensive forecast verification. These aspects are being investigated in a separate verification study that evaluates > 400 single WSA–ENLIL+Cone simulations (of which there are > 70 simulations containing CME arrivals) performed at the CCMC/SWRC.

The parameter sensitivity studied in Section 7 suggests future directions for this ensemble system. In addition to using the default model values, new ensemble runs could be performed with changes to the CME cavity ratio and density ratio as informed by coronagraph measurements of the CME. As discussed in Section 2, an accurate representation of the background solar wind is necessary for simulating transients, and prediction errors arising from background characterization and other model limitations should be considered. An ensemble forecast reflecting uncertainties in the background solar wind could be produced by using a variety of magnetograms (from different observatories or processed using different techniques) as input to the WSA or WSA-ADAPT models. From these results one can produce an ensemble of ambient solar wind WSA–ENLIL model outputs that can be compared to in-situ measurements to determine the most suitable coronal maps/model instance. These subselected WSA or WSA-ADAPT maps could then be used for a series of ensemble WSA–ENLIL+Cone CME simulations. Such an improved ensemble forecast would produce predictions that also reflect the uncertainties in the WSA–ENLIL modeled background solar wind in addition to the uncertainties in CME input parameters (as considered in this work).

Another improvement could be the use of real-time interplanetary scintillation (IPS) observations by the Ooty Radio Telescope (Manoharan 2006). These data can provide crucial information about the CME propagation and interaction with the surrounding solar wind, which could be used to provide updated information on CME parameters as the CME moves out from the Sun. This information could then be used to refine model predictions of the propagation of the CME. The STEREO/Heliospheric Imagers also provide CME propagation information out to 1 AU. However, it is not always possible to extract this information from real-time data, and the imagers do not always have an optimal viewing angle for Earth-directed CMEs. Comparisons of CME propagation from WSA–ENLIL with near real-time observations of the CME location inferred from IPS, the STEREO heliospheric imagers, or some other source can be used to select ensemble members with the best agreement using quantitative and visual inspection employing advanced visualization techniques such as 3D volumetric rendering (Bock et al. 2014).

Finally, the forecasting of CME arrival would benefit from the use of other propagation models, in addition to WSA–ENLIL, each with its own set of independently assessed input parameters, leading to a community-wide ensemble-prediction capability. A first step to such a capability is provided by the CME Scoreboard, described in Section 1, where anyone is invited to post their estimate of the arrival time of a recently observed CME in real-time.