1 Introduction

Tropical cyclones (TCs) are extreme weather events that cause significant damage to life and property. This damage can be reduced by timely and accurate prediction of a TC’s location and intensity. The North Indian Ocean (NIO), comprising the Bay of Bengal (BoB) and the Arabian Sea (AS), experiences the maximum number of TCs during the post-monsoon (October–December) season, followed by the pre-monsoon (March–May) season (Osuri et al. 2013). An increasing trend in the intensity and frequency of TCs over the NIO has been observed in recent decades (Singh et al. 2001; Balaji et al. 2018; Mani et al. 2018; Ashrit et al. 2021). It is, therefore, necessary to accurately predict the track, intensity, associated winds, and rainfall during the TC’s passage well in advance, with a lead time of at least 48–72 hrs. In recent decades, there has been a substantial improvement in the prediction of TCs up to a lead time of 120 hrs using numerical weather prediction models, which is attributed to advancements in resolution, data assimilation techniques and physics of the models (Mohapatra et al. 2013a, b, c; Bhate et al. 2021; Pattanaik and Mohapatra 2021; Prasad et al. 2021). Numerous efforts have been carried out globally to improve TC prediction (Cheung et al. 2018; Mehra et al. 2018; Hendricks et al. 2019). Furthermore, model verification provides important information on forecast quality and systematic errors, which in turn helps to improve the models. Chen et al. (2018) evaluated TC rainfall forecasts using the contiguous rain area (CRA) method and found an overestimation of heavy rainfall close to the TC centre. Yu et al. (2020) investigated the source of forecast rainfall errors using the CRA verification method and showed that, for light to moderate rain, errors come mostly from the rain pattern, followed by displacement errors, the latter being particularly important for heavy rain. Their results also suggested that improving track prediction will further improve TC rainfall prediction. Therefore, verification provides useful insights into model performance by determining the source and magnitude of forecast errors.

There have been a number of studies carried out on the prediction of TC track and intensity (Mohapatra et al. 2013a, b, 2021; Osuri et al. 2013; Mohapatra 2014; Mohanty et al. 2019; Ashrit et al. 2021). Mohapatra et al. (2013b) evaluated the TC track forecast issued by India Meteorological Department (IMD) during 2003–2011 by calculating direct position error (DPE) and skill in the track forecast. They showed that type of track and intensity of TC play an important role in determining the track errors. Their results also suggested that DPE over NIO decreased yearly from 2003 to 2011. Osuri et al. (2013) found that higher resolution predictions yielded an improvement in track and intensity errors over the North Indian Ocean. Though the verification of TC track and intensity is carried out routinely, there is still a lack of sufficient attention on the verification of TC precipitation forecasts over NIO.

Verification of TC rainfall is commonly carried out by calculating bias, absolute error, and skill score metrics such as probability of detection (POD), false alarm ratio (FAR), equitable threat score (ETS) and frequency bias based on certain thresholds (Mohapatra 2014). TC rainfall associated with landfall poses a severe threat to life and property. Forecasting rainfall at the time of landfall is challenging, since a TC can intensify or weaken rapidly near the coast just before landfall (Ray et al. 2022). Hence it is necessary to accurately determine the source of rainfall errors at the time of landfall. To better understand the source of rainfall errors, object-based verification techniques such as the contiguous rain area (CRA) method (Ebert and McBride 2000; Yu et al. 2020; Dube et al. 2022), the method for object-based diagnostic evaluation (MODE; Davis et al. 2009), and the structure-amplitude-location (SAL; Wernli et al. 2008) method have been used in many studies. For the present study, TC rainfall verification has been carried out using the CRA method. In this object-based approach, rain objects are treated as contiguous regions where the rain rate exceeds a user-specified threshold.

Chen et al. (2018) estimated the systematic errors in the location and intensity of Australian Community Climate and Earth Simulator (ACCESS) TC rainfall forecasts using the CRA method and found that the forecasts tend to produce more rainfall in the regions closer to the TC centre. Yu et al. (2020) verified rainfall forecasts for landfalling TCs over China during 2012–2015. Their results showed that pattern error was the major contributor to the rainfall forecast error for light to moderate rainfall, followed by displacement error, the latter being particularly important for heavy rainfall, while event verification indicated that forecast ability decreases as the rainfall amount increases. A similar study by Osuri et al. (2020) over NIO also revealed that pattern error contributes most to the total rainfall error, followed by displacement error and volume error. Their results also suggested that the model is less skilful for heavy rainfall, particularly when initialized at higher intensity stages. The primary objective of the present study is to examine the skill of the WRF model over NIO for forecasts of up to 5 days using 24 TCs during 2016–2020. Section 2 presents the datasets and methodology, section 3 discusses the experimental results, and section 4 presents the conclusions of the study.

2 Data and methodology

2.1 Data for validation

The IMD’s best track data is used for the validation of forecast track and intensity. It includes position (lat./lon.), sustained maximum wind speed, estimated central pressure, pressure drop at the centre, and stage of intensity. The best track estimates are prepared by considering all available surface and upper air observations from land and ocean, together with satellite and radar observations. The best-track estimation procedure for the TCs over NIO is reviewed in Mohapatra et al. (2012). IMD/RSMC Delhi publicly shares this data. The Global Precipitation Measurement (GPM) mission is a network of satellites that provides next-generation global observations of rain and snow. The GPM core observatory, launched in February 2014, carries two instruments, namely a Ku/Ka-band dual-frequency precipitation radar and a multi-channel GPM Microwave Imager. The Integrated Multi-Satellite Retrievals for GPM (IMERG) provides precipitation estimates at half-hourly intervals and 0.1° × 0.1° resolution. The IMERG best precipitation estimates are obtained by calibration against the merged precipitation product from the above two instruments. The IMERG products are produced at three different latencies: the ‘Early’ run has a 6-hr latency, the ‘Late’ run has a 16-hr latency, and the ‘Final’ run has a 3-month latency. For the present study, the Final run IMERG half-hourly dataset is used for the rainfall forecast verification.

2.2 Methodology

For the present study, 24 TCs over NIO during the 2016–2020 period (figure 1) are considered (table 1). Of these 24 cyclones, 15 formed over BoB and 9 formed over AS. The total forecast sample size (for all the simulated cyclones) is 2414 forecasts. Most of the TCs formed over BoB and in the post-monsoon season. The two highest-intensity TCs were Amphan and Kyarr, which formed over BoB and AS, respectively. In terms of maximum intensity, the largest number of storms (10) reached the very severe cyclonic storm (VSCS) stage, followed by cyclonic storm (CS) (6), severe cyclonic storm (SCS) (3), extremely severe cyclonic storm (ESCS) (3), and super cyclonic storm (SuCS) (2). This study focuses on the evaluation of TC track, intensity and rainfall errors using traditional verification methods such as DPE, skill scores, root mean square error (RMSE) and bias. As the majority of the TCs in this study dissipated over land (19 out of 24), the CRA method is used to evaluate the systematic errors in the TC rainfall during landfall.

Figure 1

Observed (IMD) best tracks of TCs during 2016–2020 over (a) BOB, (b) AS, and (c) model domain for simulations.

Table 1 Details of the TC considered in this study.

2.2.1 Model configuration and data assimilation

The present study uses the Advanced Research version of the Weather Research and Forecasting (WRF) mesoscale model, version 4.2 (ARW; Skamarock et al. 2019), to simulate the tropical cyclones over NIO. The parent domain is South Asia, and the nested domain extends over the equatorial Indian Ocean and BOB. The horizontal resolution is set to 18 × 18 km² for the outermost domain and 6 × 6 km² for the inner domain. The integration timestep is set to 60 and 20 s for the two domains, respectively. The model used initial and boundary conditions from the 3-hourly GFS forecast with a horizontal resolution of 0.5° × 0.5°, as it represents the evolution of TCs better (Malakar et al. 2020). For these simulations, the model is initialized with a 00 UTC GFS forecast from 5 days before the cyclone formation date. The forecasts were generated for 124 hrs and saved at hourly intervals. For the present work, the land surface was represented by the Unified NOAH land surface model, PBL physics by the MYJ TKE scheme (Janjić 1994), longwave radiation by the RRTM scheme (Mlawer et al. 1997), and shortwave radiation by the Dudhia scheme (Dudhia 1989). In addition, the model’s microphysics and cumulus physics were represented by the Thompson scheme (Thompson et al. 2008) and the revised Kain–Fritsch scheme (Kain 2004), respectively. The model forecast at 18 km resolution has been analyzed for forecast verification.

The model initial condition (IC) is improved using the En3DVar data assimilation technique in the WRF data assimilation (WRFDA) framework. A detailed description of En3DVar is provided in Wang et al. (2008a, b). En3DVar combines the strengths of the variational and ensemble Kalman filter (EnKF) techniques. The variational assimilation technique assumes that the background forecast-error covariances are static and nearly homogeneous (Wang et al. 2008a, b). However, in real data cases, the error covariances change with time. The EnKF provides an alternative to variational data assimilation systems: the background-error covariance for a given IC is estimated from an ensemble of short-term forecasts (Wang et al. 2008a), which yields a flow-dependent estimate of the background error covariances. The En3DVar data assimilation system combines the flow-dependent and static background error covariances, and the resulting background error covariance is used in the variational cost function. The static background error covariances are calculated using the National Meteorological Center (NMC) method (Parrish and Derber 1992) and one-month simulations with the ARW model. Our earlier contributions (Rajasree et al. 2016; Malakar et al. 2020; Bhate et al. 2021, 2023; Munsi et al. 2021) have shown that the En3DVar data assimilation technique generates initial conditions that provide better skill, and that it is more efficient than other data assimilation techniques. The assimilation window is ±3 hrs around the IC time. The present study carried out the hybrid experiment by incorporating in-situ data, aircraft observations, satellite-derived winds, and satellite radiances provided by the Global Data Assimilation System. In the present study, a 50-member ensemble of initial conditions is generated. Here, the ensemble-generated error covariances are given a higher weight (75%) than the static background error covariances. The satellite radiances are from the AMSUA, MHS, Suomi-NPP, AIRS, and HIRS sensors.
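The weighting of static and ensemble covariances described above can be summarized by an effective hybrid background error covariance. This is a hedged sketch, not the exact WRFDA formulation. The symbols $\mathbf{B}_{s}$ (static covariance), $\mathbf{P}^{e}$ (ensemble covariance), $\mathbf{C}$ (a localization matrix, which is an assumption since localization is not described in the text) and the weights $w_{s}$, $w_{e}$ are introduced here for illustration only:

$$\mathbf{B}_{hybrid} = w_{s}\mathbf{B}_{s} + w_{e}(\mathbf{P}^{e} \circ \mathbf{C}), \quad w_{s} + w_{e} = 1,$$

where $\circ$ denotes the element-wise (Schur) product. Under this reading, the 75% ensemble weight stated above would correspond to $w_{e} = 0.75$ and $w_{s} = 0.25$.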

2.2.2 Error calculation of forecast parameters

DPE is the most commonly used error statistic for the verification of TC track forecasts. It is defined as the great circle distance between the TC’s forecast position and observed position at the forecast verification time. For this study, the 6-hourly IMD best track data is taken as the observed position of the TC. The TC’s forecast position is estimated from the location of the minimum sea level pressure (MSLP). To determine the initial forecast position, the MSLP minimum is searched for within a radius of about 3° of the observed position at the verification time. For each subsequent 6-hr interval, the MSLP minimum is searched for within a 3° radius around the forecast position of the previous 6-hr interval. The mean track error for all the cyclones is given by

$$DPE = (i_{{1}} x_{{1}} + i_{{2}} x_{{2}} + ... + i_{n} x_{n} )/(i_{{1}} + i_{{2}} + ... + i_{n} ),$$
(1)

where i_1, i_2, ..., i_n are the number of forecasts and x_1, x_2, ..., x_n are the DPE for TCs 1, 2, ..., n. DPE alone gives no information on the directional bias of the forecast, i.e., whether the forecast is left/right of, or ahead of/behind, the observed track. To account for this, error components across the track (cross track error, CTE) and along the track (along track error, ATE) are also calculated. Figure 2(c) gives the graphical representation of the track error components. CTE is the error component perpendicular to the observed track, i.e., the great circle distance between the forecast point and the foot of the perpendicular dropped from the forecast point onto the extrapolated observed track. Positive (negative) values of CTE indicate that the forecast position is to the right (left) of the extrapolated observed track. ATE is the error component along the observed track. It is the great circle distance between the current observed point and the foot of that perpendicular on the extrapolated observed track. Positive (negative) values of ATE indicate that this point lies ahead of (behind) the observed position, which indicates a fast (slow) bias of the forecast track. The detailed procedure for calculating these errors is described in Heming (2017). The intensity (wind) errors of the TC forecasts are obtained by evaluating the model forecast against the IMD maximum sustained wind (MSW) estimates. These errors are calculated in terms of bias and absolute error, where the bias indicates under- or over-estimation of the winds, and the absolute error gives the magnitude of the errors.
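The track error definitions above can be illustrated with a minimal Python sketch. It is not the procedure of Heming (2017): the function names, the spherical Earth radius of 6371 km, and the local flat-earth approximation used for ATE and CTE are assumptions made here for illustration, and a stationary observed track (identical previous and current positions) is not handled.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great circle distance (haversine formula) between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_mean_dpe(counts, dpes):
    """Equation (1): per-TC mean DPE values weighted by the number of forecasts."""
    return sum(i * x for i, x in zip(counts, dpes)) / sum(counts)


def along_cross_track_km(prev_obs, cur_obs, fcst):
    """Approximate (ATE, CTE) in km on a local tangent plane at the observed position.

    Each argument is a (lat, lon) tuple in degrees. ATE > 0 means the forecast
    is ahead of the observed position; CTE > 0 means it is right of the track.
    """
    lat0, lon0 = cur_obs
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0

    def to_xy(lat, lon):
        # East and north offsets (km) relative to the current observed position
        east = (lon - lon0) * km_per_deg * math.cos(math.radians(lat0))
        north = (lat - lat0) * km_per_deg
        return east, north

    px, py = to_xy(*prev_obs)
    fx, fy = to_xy(*fcst)
    # Unit vector of the observed track direction (previous -> current position)
    norm = math.hypot(px, py)
    tx, ty = -px / norm, -py / norm
    ate = fx * tx + fy * ty  # projection onto the track direction
    cte = fx * ty - fy * tx  # projection onto the right-hand normal
    return ate, cte
```

For example, `weighted_mean_dpe([2, 6], [100.0, 60.0])` returns 70.0, since the TC with six forecasts carries three times the weight of the TC with two.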

Figure 2

(a) Mean magnitude errors (direct position error (DPE), cross track error (CTE), and along track error (ATE)). (b) Mean bias (cross track bias (CTB) and along track bias (ATB)) over NIO in km. (c) Forecast error components.

The TC forecast rainfall during the TC period is verified by calculating categorical scores such as POD, FAR, ETS, and frequency bias (Fb) based on the IMD rainfall classification (very light rain: 0.1–2.5 mm day−1; light rain: 2.5–7.5 mm day−1; moderate rain: 7.5–35.5 mm day−1; rather heavy rain: 35.5–64.5 mm day−1; heavy rain: 64.5–124.5 mm day−1; very heavy rain: >124.5 mm day−1). Here, rainfall within a search radius of 6° around the TC centre is considered for the calculation of the scores.
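The categorical scores named above can be sketched in Python from a 2 × 2 contingency table. The formulas are the standard textbook definitions of POD, FAR, ETS and frequency bias, and they are assumed to match those used in this study. The function names and the use of a "greater than or equal to" threshold test are illustrative choices, and empty categories (zero denominators) are not handled.

```python
def contingency(fcst, obs, threshold):
    """Count hits, misses, false alarms and correct negatives for one threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(fcst, obs):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def categorical_scores(hits, misses, false_alarms, correct_negatives):
    """Return POD, FAR, ETS and frequency bias (standard definitions)."""
    total = hits + misses + false_alarms + correct_negatives
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    fbias = (hits + false_alarms) / (hits + misses)
    # Hits expected by random chance, used to make the threat score equitable
    hits_random = (hits + misses) * (hits + false_alarms) / total
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return {"POD": pod, "FAR": far, "ETS": ets, "Fb": fbias}
```

A typical use would be `categorical_scores(*contingency(model_rain, imerg_rain, 64.5))`, where `model_rain` and `imerg_rain` are hypothetical co-located daily rainfall values within the 6° search radius.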

2.2.3 CRA methodology

The verification of rainfall forecasts during the landfall of the TCs is carried out using the CRA method, an object-based spatial verification approach that examines the properties of rainfall forecasts in terms of their location, size, intensity, and fine-scale pattern. In this method, the observed and forecast fields must be at the same resolution; if they are not, they are brought to a common resolution by interpolation. The observed and forecast fields are then merged by taking, at each grid point, the maximum of the observed and forecast values. The CRA is identified based on the rainfall threshold (e.g., 1 mm, 5 mm, 10 mm, ...), the search distance, and the best-fit criterion (minimum squared error (Ebert and McBride 2000), maximum correlation coefficient (Grams et al. 2006), or maximum overlap (Ebert et al. 2004)). In the present study, based on the IMD rainfall classification, we have defined thresholds of 7.5, 24.5, 35.5, 64.5, 95.5, and 124.5 mm day−1, which span the moderate, rather heavy, heavy, very heavy and extremely heavy rain categories. The search distance is 600 km around the TC centre. The CRA is defined as a region in the merged field bounded by the user-specified threshold value. Next, a pattern-matching technique is used to find the best match between the observed and forecast entities within the CRA. In the present analysis, the best-fit criterion is the maximum spatial correlation coefficient between the forecast and observed fields. The location error is then the vector displacement of the forecast. Finally, the mean squared error (MSE) of the original forecast is decomposed into displacement, volume, and pattern error components

$${MSE_{total}} = {MSE_{displacement}} + {MSE_{pattern}} + {MSE_{volume}}.$$
(2)

For the maximum correlation coefficient best-fit criterion, the component errors are formulated as:

$${MSE_{displacement}} = {2}S_{F} S_{X} (r_{opt} - r),$$
(3)
$$MSE_{volume} = \, (\overline{F} - \overline{X} )^{{2}},$$
(4)
$$MSE_{pattern} = { 2}S_{F} S_{X} ({1} - r_{{{{opt}}}} ) + (S_{F} - S_{X} )^{{2}},$$
(5)

where $S_{F}$ and $S_{X}$ are the standard deviations of the original forecast and observed fields, respectively, $r$ is the spatial correlation coefficient between the original forecast and observed fields, $r_{opt}$ is the maximum correlation coefficient between the observed and shifted forecast fields, and $\overline{F}$ and $\overline{X}$ are the forecast and observed field means after the shift. The displacement error is sensitive to the search radius, and the total error is sensitive to the best-fit criterion.
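Equations (3)–(5) can be illustrated with a simplified Python sketch. It searches integer grid shifts of the forecast for the maximum correlation and then evaluates the three error components. Several simplifications are assumptions made here, not part of the study's method: the fields are small nested lists, the whole grid is used instead of the thresholded CRA region, points shifted in from outside the domain are filled with zero rain, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def flatten(field):
    """Flatten a 2D list of rainfall values into a 1D list."""
    return [v for row in field for v in row]


def correlation(f, x):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    fm, xm = fmean(f), fmean(x)
    sf, sx = pstdev(f), pstdev(x)
    if sf == 0 or sx == 0:
        return 0.0
    return fmean([(a - fm) * (b - xm) for a, b in zip(f, x)]) / (sf * sx)


def shift_field(field, dy, dx, fill=0.0):
    """Shift a 2D field by (dy, dx) grid cells, filling vacated cells with `fill`."""
    ny, nx = len(field), len(field[0])
    out = [[fill] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            jj, ii = j + dy, i + dx
            if 0 <= jj < ny and 0 <= ii < nx:
                out[jj][ii] = field[j][i]
    return out


def cra_decomposition(fcst, obs, max_shift):
    """Decompose the forecast error into displacement, volume and pattern parts."""
    f, x = flatten(fcst), flatten(obs)
    s_f, s_x = pstdev(f), pstdev(x)  # standard deviations of the original fields
    r = correlation(f, x)            # correlation before any shift
    best_r, best_shift, best_field = r, (0, 0), f
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = flatten(shift_field(fcst, dy, dx))
            r_s = correlation(shifted, x)
            if r_s > best_r:
                best_r, best_shift, best_field = r_s, (dy, dx), shifted
    return {
        "shift": best_shift,
        "r": r,
        "r_opt": best_r,
        "displacement": 2 * s_f * s_x * (best_r - r),                  # eq. (3)
        "volume": (fmean(best_field) - fmean(x)) ** 2,                 # eq. (4)
        "pattern": 2 * s_f * s_x * (1 - best_r) + (s_f - s_x) ** 2,    # eq. (5)
    }
```

In this sketch a forecast that is an exact copy of the observed rain area, displaced and scaled in amplitude, yields a best-fit correlation of one, so the pattern term reduces to the squared difference of the two standard deviations.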

3 Results and discussion

3.1 Track errors

The mean track error (magnitude and bias) and its components at different lead times with respect to the IMD best track are shown in figure 2. DPE increases gradually with forecast lead time, and CTE and ATE behave similarly. Compared to ATE, CTE has higher values for most lead times before 72 hrs and lower values at later lead times. Figure 2(b) shows a positive cross-track (CT) bias, i.e., the forecast track lies to the east of the observed track, up to a 72-hr lead time, and a negative CT bias, i.e., the forecast track lies to the west of the observed track, at later lead times. The along-track (AT) bias is negative (the forecast track is behind the observed track) up to 72 hrs and positive (the forecast track is ahead of the observed track) at later lead times. The DPE values over NIO for 12-, 24-, 48- and 72-hr lead times are 68.12, 91.79, 149.8 and 232.36 km, respectively. Correspondingly, the CTE values are 38.67, 46.7, 92.97 and 143.17 km and the ATE values are 36.87, 56.94, 77.76 and 131.243 km.

Osuri et al. (2013) showed that the mean forecast DPE of the model at 18 km resolution over NIO for 12-, 24-, 48- and 72-hr lead times was 106, 129, 222 and 359 km, respectively, during 2007–2011. The mean DPE values over NIO in the present study are lower than those of Osuri et al. (2013) at all of these lead times. Overall, the forecast track is to the right of and behind the observed track up to a lead time of 72 hr, which was also observed by Mohapatra et al. (2013b) and Osuri et al. (2013), and to the left of and ahead of the observed track at later lead times. Comparing the magnitudes of CTE and ATE, CTE dominates for most lead times before 72 hr and ATE dominates at later lead times.

3.2 Landfall errors

Mohapatra et al. (2013c, 2015, 2021) and Mohapatra and Sharma (2019) analyzed the landfall of cyclones generated over NIO, based on the period 1961–2018. They found that, among the cyclones formed over BoB, 48% strike different parts of the east coast of India, prominently Andhra Pradesh, Odisha and Tamil Nadu, and 26% strike Bangladesh and Myanmar, while 18% of the cyclones formed over AS strike the Gujarat coast. The percentage of cyclones dissipating over AS is higher (63%) than that over BoB (21%). In the present study, 19 out of 24 cyclones made landfall, of which 13 formed over BoB and 6 over AS. The majority of cyclones formed over NIO strike the east coast of India, followed by the other coastlines along NIO, in agreement with the earlier studies. Of the 13 cyclones that formed over BoB, five dissipated over Tamil Nadu, two over West Bengal, two over Andhra Pradesh, two over Odisha, one over Myanmar and one over Bangladesh. The six cyclones formed over AS made landfall over the Gulf of Aden (1), the Oman coast (3), Somalia (1) and Maharashtra (1). Thus, four of the six AS cyclones made landfall over the Oman coast or the Gulf of Aden.

The mean landfall position error and MSW error for NIO, BOB and AS at different lead times are shown in figure 3. It was observed that average landfall position error over NIO for 24 hr is 127 km, which is higher than the average landfall position error of 67 km for 24-hr forecast during 2009–2013 (as seen in the study by Mohapatra et al. 2015). The landfall errors are lower over BoB as compared to AS, which is also observed in Mohapatra et al. (2015). The average MSW error at the time of landfall is 4 ms−1 for 24-hr forecast lead time over NIO (figure 3b). The track errors for all the landfalling cyclones at the time of landfall are listed in table 2, and the intensity errors of the same cyclones are listed in table 3 for 24-hr lead time. It is seen that the average AT bias is negative, indicating delayed landfall response (table 2). It is found that, in general, the track errors for AS cyclones are more than for BoB cyclones. Also, the simulations for the cyclones with a life period of less than 5 days and which intensify to cyclonic storm stage exhibit significant errors in track and intensity compared to higher-stage intensity cyclones.

Figure 3

(a) Mean landfall position error (km) and (b) mean absolute maximum sustained windspeed error (ms−1) (MSWE) during landfall time over NIO, BOB, and AS.

Table 2 Track errors with respect to IMD best track data for all the landfalling cyclones at the time of landfall.
Table 3 Absolute maximum sustained wind error with respect to IMD best track data at the landfall time.

3.3 Intensity errors

The mean error and bias in the intensity forecast (10 m MSW) with respect to the IMD maximum sustained wind are shown in figure 4(a, b). The error increases gradually with forecast length for all the basins. The maximum intensity error over NIO increases from 4 to 12 ms−1. The mean error for the TCs over BOB is less than that over AS at short lead times (before 48 hrs), where it ranges from 4 to 9 ms−1; at later lead times, the error over BOB is predominantly higher than that over AS. The mean error for BoB after 48 hrs is in the range 9–12 ms−1. The absolute error in maximum intensity over NIO is 6.2, 9.1 and 10.1 ms−1 for 24-, 48- and 72-hr lead times, respectively, which is consistent with Mohapatra et al. (2013a, b, c), who reported 5.5, 7 and 10 ms−1 for the same lead times. The mean bias (figure 4b) tends to become positive with increasing forecast length for NIO as a whole and for the BOB basin, i.e., the 10 m MSW is overestimated at longer lead times. For the AS basin, the bias is positive at intermediate lead times and negative at the remaining lead times. The bias in maximum intensity is between 0 and 5 ms−1 over NIO and between 0 and 8 ms−1 over BoB, while over AS it ranges from 0 to –3 ms−1.

Figure 4

(a) Mean absolute error, (b) mean bias of 10-m maximum sustained wind for the TCs over North Indian Ocean (NIO), Bay of Bengal (BOB), and Arabian Sea (AS), and (c) percentage correct (overall accuracy) of forecast events within the same category, and within the same or ±1 category, at different lead times.

To understand the percentage of forecasts predicted in the correct intensity category with respect to the IMD best track dataset, a multi-categorical verification of TC intensity has been carried out. Figure 4(c) shows the percentage of correct forecasts within the same category (red bars) and within ±1 category (green bars), the latter also including the same category. The percentage of correct forecasts generally decreases with an increase in lead time. The percentage correct values for 24-, 48-, 72-, 96- and 120-hr lead times are 36, 28, 32, 26 and 22%, respectively, while the percentage correct within the same and ±1 category is 76, 60, 64, 56 and 60%, respectively. Furthermore, whether the model overforecasts or underforecasts the intensity at different TC stages is analyzed for all the cyclones (figure 5). Across the forecast lead times, the model correctly forecasts the intensity in about 25–35%, 5–20%, 12–30%, 10–30%, 35–85% and 10–25% of cases at the depression (D: 8.7–13.9 ms−1), deep depression (DD: 15–18 ms−1), cyclonic storm (CS: 18–25 ms−1), severe cyclonic storm (SCS: 25–32 ms−1), very severe cyclonic storm (VSCS: 32–46 ms−1) and extremely severe cyclonic storm (ESCS: 46–62 ms−1) stages, respectively. The model also predicts the VSCS stage better than the other stages, except at the 120-hr lead time. Across the forecast lead times, the model underestimates (overestimates) the intensity in 0% (55–75%), 10–30% (55–80%), 15–25% (40–65%), 35–45% (25–35%), 15–60% (0), 60–80% (0) and 100% (0) of cases at the D, DD, CS, SCS, VSCS, ESCS, and SuCS stages, respectively. The percentage of underestimation increases with TC stage at all forecast lead times, which was also observed by Routray et al. (2019). Overall, the model forecasts mostly tend to overestimate the intensity at the D, DD, and CS stages and underestimate it at the ESCS and SuCS stages, while the intensity at the VSCS stage is well predicted.
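The multi-categorical verification described above can be sketched in Python as follows. The stage lower bounds are taken from the wind ranges quoted in the text, with 62 ms−1 assumed as the lower bound of the SuCS stage. Assigning winds between 13.9 and 15 ms−1 to the depression stage is an assumption made here to close the gap between the quoted ranges, and the function names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (ms-1) of the D, DD, CS, SCS, VSCS, ESCS and SuCS stages
STAGE_LOWER_BOUNDS = [8.7, 15.0, 18.0, 25.0, 32.0, 46.0, 62.0]
STAGE_NAMES = ["D", "DD", "CS", "SCS", "VSCS", "ESCS", "SuCS"]


def stage_index(msw):
    """Index into STAGE_NAMES for a wind speed; -1 means below depression strength."""
    return bisect_right(STAGE_LOWER_BOUNDS, msw) - 1


def percent_correct(fcst_msw, obs_msw):
    """Percentage of forecasts in the same stage, and within one stage, of the observation."""
    pairs = [(stage_index(f), stage_index(o)) for f, o in zip(fcst_msw, obs_msw)]
    n = len(pairs)
    same = sum(f == o for f, o in pairs)
    within_one = sum(abs(f - o) <= 1 for f, o in pairs)
    return 100.0 * same / n, 100.0 * within_one / n


def over_under_percent(fcst_msw, obs_msw):
    """Percentage of forecasts placed in a higher (over) or lower (under) stage."""
    pairs = [(stage_index(f), stage_index(o)) for f, o in zip(fcst_msw, obs_msw)]
    n = len(pairs)
    over = sum(f > o for f, o in pairs)
    under = sum(f < o for f, o in pairs)
    return 100.0 * over / n, 100.0 * under / n
```

Applying `over_under_percent` separately to the forecasts valid at each observed stage would give stage-wise percentages of the kind summarized in figure 5.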

Figure 5

Percentage of forecast events at TC intensity stages for 24–120 hr lead times: (a) 24 hours, (b) 48 hours, (c) 72 hours, (d) 96 hours, and (e) 120 hours forecast.

3.4 Rainfall verification

3.4.1 Error statistics of rainfall forecast

Figure 6(a–c) depicts the scatter plots of IMERG rainfall vs. model estimated rainfall for the lead time 24, 48 and 72 hrs. The bias, RMSE and correlation coefficient (CC) also are computed for daily accumulated rainfall forecast. The bias for 24-, 48-, and 72-hr forecasts is 2.64, 3.18 and 3.35 mm, respectively. The rainfall is averaged over a region of 6° around TC centre for all the TCs during the study period at different forecast lead times. The RMSE and correlation coefficient of 24-, 48- and 72-hr are 46.5, 53.8 and 59.9 mm, and 0.43, 0.304, 0.207, respectively. The results indicated that the model rainfall errors are lower for 24-hr forecast time as compared to higher forecast times (48–72 hrs). Furthermore, the model bias is positive for all the forecast lead times, indicating overestimation of the rainfall.

Figure 6

Verification of forecast rainfall for (a) 24 hr, (b) 48 hr, (c) 72 hr lead times against IMERG rainfall, and (d) skill scores for the occurrence of the rain event.

The forecast skill for the dichotomous (yes/no) rainfall forecast is assessed using skill scores such as POD, FAR, ETS and Fb at different thresholds for 24-, 48- and 72-hr lead times (figure 6d). The formulae for the skill scores are listed in the Appendix. The POD is above 90% and the FAR is below 10% for the occurrence of a rain event. The equitable threat score (ETS) measures the fraction of observed and/or forecast events that were correctly predicted, adjusted for hits associated with random chance. The ideal score for ETS is one. The ETS for the forecasts under study is in the range 0.6–0.7, with the highest ETS for the 24-hr forecast, followed by the 48- and 72-hr forecasts. The frequency bias is the ratio of the frequency of forecast ‘yes’ events to that of observed ‘yes’ events. Fb less than one indicates that forecast yes events are less frequent than observed yes events, and vice versa. The frequency bias in the present study is close to one, indicating that the model forecasts rain events about as often as they are observed.

The dichotomous forecast does not provide an in-depth understanding of forecast failure. Hence the skill scores are calculated for various rainfall categories (figure 7). The categories considered are moderate (7.6–35.5 mm), rather heavy (35.6–64.4 mm), heavy (64.5–124.4 mm) and very heavy rainfall (≥124.5 mm), based on the IMD criteria for daily accumulated rainfall. Figure 7(a) shows the POD for the categories 7.5–24.4, 24.5–35.4, 35.5–55.4, 55.5–64.4, 64.5–95.4, 95.5–124.4 and >124.5 mm at lead times of 24, 48 and 72 hrs. The POD is maximum for the very heavy rainfall category, followed by 7.5–24.4 mm (moderate rainfall), and is least for 55.5–64.5 mm, i.e., rather heavy rainfall. The POD for very heavy rainfall is ~40% and that for other categories is in the range 10–40%. The POD is always highest for the 24-hr forecast, followed by the 48- and 72-hr forecasts, in agreement with the POD for the dichotomous rainfall forecast. Figure 7(b) shows the FAR for the multi-categorical forecast. The FAR for all the categories is above 60%; the lowest values are about 70% for very heavy rainfall events (≥124.5 mm) and 70–75% for the moderate rainfall category. The ETS for all the rainfall categories is shown in figure 7(c). Here also, the very heavy rainfall category shows the maximum ETS, i.e., 0.08–0.2. The ETS for the other categories is in the range 0.01–0.05. The Fb for the multi-categorical forecast is shown in figure 7(d). The Fb is greater than one for all categories above 55.5 mm rainfall. Thus, rainfall of 55.5 mm and above is forecast more frequently than it is observed, i.e., the model overestimates the rainfall for these thresholds. For all the categories below 55.5 mm, the forecast frequency nearly matches the observed frequency. Thus, the model shows better skill for very heavy rainfall (≥124.5 mm) and moderate rainfall (7.5–24.4 mm) than for the other categories. The forecast error of each of these categories is further analyzed using the CRA method.

Figure 7

Skill scores at different thresholds for 24-, 48-, and 72-hr lead times, (a) POD, (b) FAR, (c) ETS, and (d) Fbias.

3.4.2 CRA verification

There are several techniques to evaluate the rainfall forecast. In the past few years, object-based techniques have been developed which compare the location, size, shape, intensity, and other attributes of the forecast and observation objects (Ebert and McBride 2000; Ebert and Gallus 2009). CRA (Ebert and McBride 2000; Sharma et al. 2015, 2019, 2020; Dube et al. 2022) and Method for Object-based Diagnostic Evaluation (MODE) (Davis et al. 2006) are two object-based verification methods.

CRA is a feature-based method. First, an entity associated with the observed and forecast fields is identified. Here, the entity to be evaluated is the rainfall associated with the TC on the landfall day, located within 600 km of the TC centre. As described in the methodology section, the CRA method requires the observations and forecasts to be on the same spatial resolution. The resolution of the observations, i.e., IMERG data, is 0.1° and that of the forecast is 18 km; hence, the forecast is brought to 0.1° spatial resolution. The observations and forecast are then merged to obtain the combined field, in which the maximum of the observed and forecast rainfall is assigned at each grid point. The CRA is defined as the region comprising all grid points with a value greater than 64.5 mm in the merged field. The best-fit criterion used is the spatial correlation coefficient. The position with the maximum correlation is searched for iteratively by shifting the forecast grid points within the search radius. The location error (x, y) is then the vector displacement of the forecast. The displacement, volume and pattern errors are calculated using equations (3), (4) and (5). Figure 8 shows the CRA verification results for the model forecast and IMERG 24-hr accumulated rainfall during the landfall day of TC Amphan. Figure 8(a and b) shows the IMERG and model forecast (shifted) rainfall, and the black contour indicates the region where the merged IMERG and model forecast rainfall field exceeds the CRA threshold of 64.5 mm. The mean and maximum rainfall are higher for the model than for the IMERG rainfall data. This overestimation of rainfall can also be seen in the scatter plot (figure 8c). The displacement error is 196 km (x) and 126 km (y), indicating that the longitudinal error is larger than the latitudinal error. Given the overestimation noted above, the model rainfall bias needs to be corrected to better match the IMERG data. The CRA adjustment improves the correlation coefficient from –0.05 to 0.43 and reduces the RMSE from 106.2 to 89.19 mm. The error decomposition indicates that the pattern error makes the largest contribution to the total error, followed by the displacement and volume errors.
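The CRA steps described above (merging, thresholding, shifting for the best correlation, and decomposing the error) can be illustrated with a minimal Python sketch. It assumes the commonly used decomposition of Ebert and McBride (2000), in which the total mean squared error is split into displacement, volume and pattern parts; equations (3)–(5) are not reproduced in this section, so the exact form used in this study may differ. The function names `shift2d` and `cra_decompose`, the square search window `max_shift` (in grid points) and the synthetic rainfall fields are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def shift2d(field, dy, dx):
    """Shift a 2-D field by (dy, dx) grid points; vacated cells become NaN."""
    out = np.full(field.shape, np.nan)
    ny, nx = field.shape
    src_y = slice(max(0, -dy), min(ny, ny - dy))
    dst_y = slice(max(0, dy), min(ny, ny + dy))
    src_x = slice(max(0, -dx), min(nx, nx - dx))
    dst_x = slice(max(0, dx), min(nx, nx + dx))
    out[dst_y, dst_x] = field[src_y, src_x]
    return out


def cra_decompose(obs, fcst, threshold=64.5, max_shift=5):
    """Sketch of a CRA error decomposition on a common grid (values in mm).

    max_shift is a hypothetical square search window in grid points.
    """
    # CRA: grid points where the merged (pointwise maximum) field exceeds the threshold.
    mask = np.maximum(obs, fcst) > threshold
    if mask.sum() < 2:
        raise ValueError("CRA contains too few grid points")

    best_cc, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = shift2d(fcst, dy, dx)
            ok = mask & np.isfinite(shifted)
            if ok.sum() < 0.5 * mask.sum():
                continue  # too much of the CRA lost at the domain edge
            f, o = shifted[ok], obs[ok]
            if f.std() == 0 or o.std() == 0:
                continue  # correlation undefined for a constant field
            cc = np.corrcoef(f, o)[0, 1]
            if cc > best_cc:  # best-fit criterion: maximum spatial correlation
                best_cc, best_shift = cc, (dy, dx)
    if not np.isfinite(best_cc):
        raise ValueError("no valid shift found")

    shifted = shift2d(fcst, *best_shift)
    ok = mask & np.isfinite(shifted)
    o, f0, f1 = obs[ok], fcst[ok], shifted[ok]
    mse_total = np.mean((f0 - o) ** 2)
    mse_shifted = np.mean((f1 - o) ** 2)
    volume = (f1.mean() - o.mean()) ** 2      # mean bias after the shift
    return {
        "shift": best_shift,                   # (dy, dx) applied to the forecast
        "cc_before": np.corrcoef(f0, o)[0, 1],
        "cc_after": best_cc,
        "total": mse_total,
        "displacement": mse_total - mse_shifted,
        "volume": volume,
        "pattern": mse_shifted - volume,
    }


# Synthetic example: the forecast blob is displaced and 20% too wet.
yy, xx = np.mgrid[0:40, 0:40]
obs = 150.0 * np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / 18.0)
fcst = 180.0 * np.exp(-((yy - 23) ** 2 + (xx - 18) ** 2) / 18.0)
result = cra_decompose(obs, fcst)
print(result["shift"], result["cc_before"], result["cc_after"])
```

In this sketch the returned shift is the correction applied to the forecast, so the forecast location error is its negative; converting it to kilometres would require the grid spacing (about 11 km per grid point at 0.1°). Dividing each component by the total gives percentage contributions of the kind quoted for TC Amphan.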

Figure 8

CRA verification for 24-hr accumulated rainfall of TC Amphan (24-hr lead time). CRA: contiguous rain area, CC: correlation coefficient and RMSE: Root mean square error. The black contour indicates the region of rainfall exceeding the CRA threshold in the merged field.

The CRA methodology is applied to all the landfalling TCs during the study period to quantify the error components of the rainfall forecast on the landfall day only. Figure 9 shows box-and-whisker plots of the error decomposition components for different CRA thresholds at 24-, 48- and 72-hr forecast lead times. For the lower CRA thresholds, viz., 7.5 and 35.5 mm, the pattern error is the largest, followed by the displacement and volume errors, which could be due to improper prediction of the TC structure (Yu et al. 2020). As the CRA threshold increases, i.e., for 64.5 and 124.5 mm, the contribution from the pattern error decreases gradually, while the contributions of the displacement and volume errors to the total rainfall error increase. Similar behaviour is observed at the 48- and 72-hr lead times. For the mean errors at the extremely heavy rainfall CRA threshold, the volume error contribution is the largest at the 24-hr lead time, whereas the displacement error contribution is slightly larger at the 48- and 72-hr lead times, as the track errors are larger for longer lead times. Overall, the pattern error dominates at low CRA thresholds, and the displacement and volume error contributions increase gradually as the CRA threshold increases, as was also observed by Chen et al. (2018). Therefore, the pattern, displacement and volume errors can all contribute significantly to the total rainfall error, depending on the CRA threshold and forecast lead time.

Figure 9

Error decomposition for 24-, 48-, and 72-hr forecast lead time at different CRA thresholds, (a) 7.5 mm, (b) 35.5 mm, (c) 64.5 mm, and (d) 124.5 mm. D: Displacement error, V: volume error, P: pattern error. The dot represents the mean error.

The CC for the 24-hr forecast rainfall is higher than that for the 48- and 72-hr lead times (figure 10a). Even after CRA adjustment, the 24-hr forecast still performs better than the 48- and 72-hr forecasts, i.e., it has a higher CC, obtained as the sum of the CC before the shift and the CC difference (figure 10). However, the 48- and 72-hr forecasts are improved the most through CRA adjustment, especially for extremely heavy rainfall thresholds, for which the CC differences are larger (figure 10b). To understand the reason for this, the rainfall forecast from the inner core region (0–100 km from the TC centre) to the TC environment region (200–400 km from the TC centre) is analyzed (figure 10c, d), since TC track errors directly affect the location of the rain area (Chen et al. 2018; Yu et al. 2020).

Figure 10

Correlation coefficient (CC) for different rainfall thresholds at 24-, 48- and 72-hr lead times (a) before the shift and (b) difference (CC after the shift – CC before the shift); variation of (c) rainfall and (d) rainfall error with distance from the tropical cyclone centre for IMERG and the 24-, 48- and 72-hr lead times.

The model tends to produce a higher frequency of heavy rain events than of low to moderate rain events within the inner core region (not shown), with an average rain amount of 77 mm (IMERG), 114 mm (24-hr forecast), 107 mm (48-hr forecast), and 96 mm (72-hr forecast) (figure 10c). Even though the model captured the decrease in rainfall from the inner core to the TC environment region seen in IMERG, it produced excessive rainfall within the inner core (higher positive rainfall error) compared to the TC environment (figure 10d). As the heavy rain is located near the TC inner core (0–100 km), errors in the rain may be related to track errors. Hence, the forecasts at longer lead times, especially for heavy rainfall thresholds, improve greatly through CRA adjustment. Therefore, TC structure (associated with rain pattern) and TC track (associated with rain area location) need to be improved for better rainfall predictions (Yu et al. 2020). Further, the landfall of SUCS Amphan is discussed to better understand the error statistics for the model rainfall.
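The radial analysis described above (rainfall averaged as a function of distance from the TC centre) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names, the 100-km annulus edges, the haversine distance on a spherical Earth and the sample points are all hypothetical, and the simple unweighted mean ignores any variation of grid-cell area with latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_rain_by_annulus(points, centre, edges_km=(0, 100, 200, 300, 400)):
    """Average rainfall in distance annuli around the TC centre.

    points: iterable of (lat, lon, rain_mm); centre: (lat, lon).
    Returns one mean per annulus, or None where an annulus has no points.
    """
    n_bins = len(edges_km) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for lat, lon, rain in points:
        dist = great_circle_km(lat, lon, centre[0], centre[1])
        for i in range(n_bins):
            if edges_km[i] <= dist < edges_km[i + 1]:
                sums[i] += rain
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical sample: four grid points north of a centre at 20N, 88E.
centre = (20.0, 88.0)
points = [
    (20.0, 88.0, 100.0),  # at the centre
    (20.5, 88.0, 80.0),   # about 56 km away
    (21.5, 88.0, 40.0),   # about 167 km away
    (23.0, 88.0, 10.0),   # about 334 km away
]
profile = mean_rain_by_annulus(points, centre)
print(profile)
```

Applying the same function to the observed and forecast fields and differencing the two profiles gives a rainfall error as a function of distance, analogous in spirit to figure 10(d).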

3.4.3 SUCS Amphan (16–21 May 2020)

Super cyclonic storm ‘Amphan’ formed over the Bay of Bengal (BoB). It concentrated into a depression at 00 UTC on 16th May, underwent rapid intensification from noon on the 17th to the early morning of the 19th, and made landfall over the West Bengal–Bangladesh coast as a VSCS between 1000 and 1200 UTC on the 20th. After landfall, the system weakened rapidly from noon on the 20th to the morning of the 21st. The track and the errors in track, central pressure and MSW during the Amphan life cycle are shown in figure 11. The track error is larger on the day the system made landfall (20th May 2020) (figure 11b). During the initial stage of the TC, the model central pressure drop is larger than the observed one. Moreover, the bias in central pressure and MSW is larger during the initial TC stage than during the later stages (figure 11c, d). A sudden increase in central pressure and MSW bias is also observed just after landfall. From the CRA verification analysis, the average forecast rainfall (104 mm) is higher than the observed rainfall (68.2 mm), and the maximum forecast rainfall (423.1 mm) is much higher than the observed maximum rainfall (182.8 mm) (figure 8). The pattern error makes the largest contribution (59%) to the total rain error, which might be due to improper representation of the TC during the initial stages, followed by the displacement error (29%), which could be due to the significant track error on 20th May 2020, and the volume error (11%) (figure 8). Figure 12 shows the daily accumulated GPM-IMERG and model forecast rainfall during the cyclone period. During the initial stages of the TC, a difference in spatial pattern is clearly seen between the observed (figure 12a, c, e, g, i) and forecast rainfall (figure 12b, d, f, h, j), while at later stages, a significant shift in rainfall location as well as a pattern difference is observed. Hence, TC intensity and track predictions need to be further improved for better rainfall prediction.

Figure 11

(a) Amphan track, (b) direct position error (DPE), (c) maximum sustained wind speed bias, and (d) minimum sea level pressure bias during the Amphan cyclone period.

Figure 12

GPM accumulated rain and 24-hr forecast accumulated rain for (a, b) 16th May, (c, d) 17th May, (e, f) 18th May, (g, h) 19th May, and (i, j) 20th May.

4 Conclusions

Track, intensity, and rainfall predictions are crucial components of TC prediction. They are important from a disaster management as well as a socio-economic point of view. Therefore, forecast verification of these components is required to improve the model prediction. The present study examines the skill of the model by analyzing the track, intensity, and rainfall forecasts for 24 TCs over the North Indian Ocean during the 2016–2020 period. The main conclusions are as follows.

  • The track errors are evaluated in terms of DPE, ATE, and CTE. The DPE of TCs over NIO rises with increasing lead time and ranges from 68 to 232 km for 12–72 hr forecast lead times. The ATE and CTE evaluation shows that the forecast track is southeastward of the observed track up to 72 hrs and northwestward at later lead times.

  • The landfall DPE for NIO TCs is 120 km for the 24-hr forecast and increases with lead time. The landfall errors are larger for AS-originated cyclones than for BoB-originated cyclones. For the 24-hr lead time, BoB cyclones have the smallest landfall error, of around 100 km, while cyclones formed over AS show a landfall DPE of the order of 180 km.

  • The intensity error for TC forecasts over NIO grows with forecast lead time, from 4 to 12 m s−1. The intensity is overestimated for BoB cyclones and underestimated for AS cyclones. The VSCS stage of the TC is captured best by the model forecast, followed by the SCS stage, at all the forecast lead times.

  • The dichotomous (yes/no) rainfall forecast verification indicates a POD of 90% and a FAR of less than 10%. The multicategory forecast verification exhibits the highest POD for the very heavy rainfall (≥124.5 mm) and moderate rainfall (7.5–15.4 mm) categories.

  • The CRA method decomposes the spatial forecast error into pattern, displacement and volume errors. For the moderate rainfall thresholds, viz., 7.5 and 35.5 mm, the pattern error is prominent compared to the displacement and volume errors.

  • For the heavy rainfall threshold, viz., 64.5 mm, the displacement error is larger than the volume and pattern errors. For the very heavy rainfall threshold, viz., 124.5 mm, the pattern error is the smallest compared to the volume and displacement errors. TC structure (associated with rain pattern) and TC track (associated with rain area location) need to be improved for better rainfall predictions.
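As an illustration of the dichotomous scores summarized in the conclusions above, the following minimal Python sketch computes the probability of detection (POD) and false alarm ratio (FAR) from paired observed and forecast rainfall values. It uses the standard contingency-table definitions, POD = hits/(hits + misses) and FAR = false alarms/(hits + false alarms); the function name, the 2.5 mm rain/no-rain threshold and the sample values are assumptions made only for this example.

```python
def dichotomous_scores(obs, fcst, threshold):
    """Return (POD, FAR) for a yes/no rainfall event defined by `threshold` (mm)."""
    hits = misses = false_alarms = 0
    for o, f in zip(obs, fcst):
        observed = o >= threshold
        forecast = f >= threshold
        if observed and forecast:
            hits += 1
        elif observed:
            misses += 1
        elif forecast:
            false_alarms += 1
    # Scores are undefined (NaN) when their denominator is zero.
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


# Hypothetical paired values (mm): 3 hits, 1 miss, 1 false alarm, 1 correct negative.
obs_mm = [0.0, 5.0, 12.0, 70.0, 0.5, 30.0]
fcst_mm = [0.0, 8.0, 0.0, 65.0, 3.0, 40.0]
pod, far = dichotomous_scores(obs_mm, fcst_mm, threshold=2.5)
print(pod, far)
```

Repeating the calculation with category bounds instead of a single threshold would give the multicategory scores.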

Data assimilation, the dynamical core and model physics are the three major components of numerical weather prediction. Hence, to improve the prediction of the TC parameters, viz., track, intensity and rainfall, these three components need to be examined. The future scope of this analysis is to explore methods such as vortex relocation to improve the vortex initialization in the model (Hsiao et al. 2010; Gao et al. 2014; Nadimpalli et al. 2021), to incorporate more observations over oceanic regions, and to use artificial intelligence (AI)/machine learning (ML) techniques to reduce the model forecast biases. These steps are expected to lead to more accurate predictions of tropical cyclones.