1 Introduction

Changes in atmospheric circulation and dynamics directly affect fluxes at the air-sea interface, sea-level pressure, and wind-waves. Because climate change can alter wind-waves, a proper understanding of their evolution, together with quantification and evaluation of their variability, is essential and has significant practical applications. The present study focuses on the Indian Ocean region, which is bordered by a highly vulnerable coastline and islands directly impacted by sea-level rise, wave-induced flooding, and extreme weather events (Church et al. 2006). These effects call for adaptive planning that can help coastal communities cope with the risks associated with future wind-wave climate change (Morim et al. 2018). High-quality data are therefore needed to obtain the best possible estimate of changes in ocean surface waves.

In a broader perspective, wave measurements from long-term records of Voluntary Observing Ships (1900–2000) have established negative trends (−11 cm/decade) for the South Indian Ocean (SIO) (Trenberth et al. 2007). Calibrated and validated datasets from satellite altimeters are widely used to interpret global trends in wind and wave climate (Woolf et al. 2002; Meucci et al. 2020; Young et al. 2011; Young and Ribal 2019). A recent study (Gupta et al. 2015) reported that the latitudinal band 40° S–55° S shows the highest impact of climate change in the Indian Ocean (IO) region. Using similar datasets for 28 years, Sreelakshmi and Bhaskaran (2020b) reported an increasing trend in extreme wind-waves for the Extra-Tropical South Indian Ocean belt since 2011.

Among the reanalysis products, ERA5 has been recommended for its advances in data assimilation techniques (Meucci et al. 2020). Studies have reported excellent agreement between ERA5 data and both observations and models at global and regional scales (Stefanakos 2019; Dullaart et al. 2020; Tarek et al. 2020; Rivas and Stoffelen 2019). Utilizing 41 years of ERA5 data, Sreelakshmi and Bhaskaran (2020c) stated that the Arabian Sea (AS) and the head Bay of Bengal (BoB) show decreased wind-sea activity. Recently, Bruno et al. (2020) evaluated the performance of both the wind-sea and swell components for the western AS. Naseef and Kumar (2020) noticed an increasing trend in the maximum significant wave height (SWH) of 0.73 cm/year for the IO. A consistent rate of increase in extreme wind speed (0.8–1.2 cm/s per year) and wave height (0.42–0.88 cm per year) has been reported for the south and central AS (Aboobacker et al. 2021).

Climate models developed under the Intergovernmental Panel on Climate Change (IPCC) have attracted considerable attention for their ability to represent historical and future changes in various parameters under climate change scenarios. Ocean wave parameters are not available under the Coupled Model Intercomparison Project (CMIP); instead, the models simulate wind field, sea-ice concentration, and sea-level information. Prior studies have generated wave projections using input parameters from the CMIP3 and CMIP5 projects, using both statistical methods (Wang and Swart 2018) and dynamical methods (Hemer et al. 2013a) to develop projections of wind-wave parameters. Future wave conditions have been projected and evaluated across the global oceans (Hemer et al. 2013; Morim et al. 2019) and regional basins (Wang and Swart 2018; Bricheno and Wolf 2018; Gallagher et al. 2016).

A regional study over the Northeast Atlantic Ocean by Perez et al. (2014) analysed the performance of CMIP5 models in simulating the wind speed. They reported that the models ACCESS1.0, EC-Earth, HadGEM2-ES, HadGEM2-CC, and CMCC-CM performed well compared to the NCEP-NCAR, ERA-40C, and NOAA-CIRES datasets. Zappa et al. (2013) shortlisted EC-Earth, GFDL-CM3, HadGEM2-ES, and MRI-CGCM3 as the best-performing GCMs in reproducing North Atlantic extra-tropical cyclones. For the European region, the wind speed simulated by EC-Earth, MIROC, and HadGEM2 correlated well during the winter season (Masato et al. 2013). A recent study by Morim et al. (2020) indicated that the MRI-CGCM3 and MRI-ESM1 models overestimated the mean and extreme wind speeds due to considerable inter-model uncertainty. Their study indicated a negative bias in wind speed for most of the global oceans, with the exception of the equatorial regions. Among the 19 CMIP5 GCMs, the EC-Earth model was reported as the top-performing model for the North-East Atlantic (Hazeleger et al. 2012). A seasonal difference of 5–10% in SWH simulations over the North Atlantic Ocean was found when the WAVEWATCH III (WWIII) model was forced with ERA-Interim and EC-Earth wind data (Gallagher et al. 2016).

In the context of the Indian Ocean, seven EC-Earth ensemble members were used to force the WAM model and generate wave parameters. Those simulations were validated against 72 in situ measurements and the ERA-Interim and CFSR datasets (Semedo et al. 2018). Historical SWH data show the highest difference in the North Indian Ocean (NIO) and the lowest bias for the extra-tropical SIO. The simulated zonal wind stress over the equatorial IO is weaker than in the QuikScat, NCEP-1, and ERA-Interim datasets (Lee et al. 2013). Wave height simulations produced by WWIII forced with CMIP5 models exhibited a significant positive bias for the IO region (Casas-Prat et al. 2018). The overestimation of SWH in some regions is attributed to the drawbacks of the SMC grid in representing remote island locations. Wang et al. (2015) reported that CSIRO-Mk3.6.0 and EC-Earth exhibited substantial climate change signals in the annual mean SWH for the Eastern Tropical and NIO regions. The WAM model forced with the EC-Earth wind field simulated historical (1950–2010) wind speed and wave height trends for the IO of 0.13 × 10⁻² m/s per decade and 0.78 × 10⁻² m per decade, respectively (Dobrynin et al. 2012). A study using the MRI-AGCM3.2S model (Kamranzad and Mori 2018; Kamranzad et al. 2017) indicated a decrease in SWH for the NIO and the central SIO regions. Compared to satellite measurements, their study noted considerable overestimation in the regions near Antarctica due to the absence of ice cover in the SWAN model. Thereafter, Kamranzad and Mori (2019) demonstrated that SWAN outperforms the WWIII model in simulating SWH against satellite data for the IO region. Many regions in the IO and the west coast of Maldives are prone to unstable wind-wave climate under the extreme emission scenario. Another study (Chowdhury et al. 2019) used CMIP5 wind information to force the MIKE21 model and indicated increased SWH for the Indian coast and, for the east coast of India, a rise of 30% in wave period under the RCP 4.5 scenario. A recent study (Srinivas et al. 2020) on the teleconnection between SWH and the Indian Ocean Dipole reported remarkable skill for GFDL-CM3 and MRI-CGCM3.

The Coordinated Ocean-Wave Climate Projections (COWCLIP) project has developed climate model projections by forcing WWIII with CMIP5 surface winds (Hemer et al. 2010). The team has evaluated the performance of wave projections for the global oceans by comparing them with ERA40C, ERA-Interim, and NCEP-CFSR data (Hemer and Trenham 2016). Projected changes in SWH from CMIP3 models (Hemer et al. 2013) indicated increased wind activity over the eastern equatorial IO region. They found a consistent projected decrease in SWH and wind speed over the North and the East IO regions. The multi-model ensemble indicated a projected decrease in annual mean SWH over 25.8% of the global ocean area (Hemer et al. 2013). The studies mentioned above have explored and presented future wave climate projections for global and regional domains. However, studies that evaluate CMIP5 model based SWH in the IO domain using multiple skill analysis methods are few.

Therefore, the present study uses the COWCLIP datasets developed by the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia (Hemer et al. 2012). The goal is to evaluate the skill of eight GCMs in reproducing the historical wind-wave climate (1979–2005) for the IO region. Model performance is evaluated by comparing the simulations with the latest ERA5 wave data. The IO sector is divided into sub-domains based on wind-wave activity to differentiate the model performance in each region. Comparison over the historical period provides a basis for judging which models are suitable for future projections. The best-performing models can be used to construct an ensemble mean to assess the projected changes in wind-wave climate for the IO, which will be carried out in a separate study.

2 Data and methodology

2.1 Datasets

2.1.1 COWCLIP-CMIP5 GCM forced wave simulations

The COWCLIP project employs the spectral wave model WAVEWATCH III (version 3.14) (Tolman 2009) to generate global simulations at 1° × 1° spatial resolution (Hemer et al. 2013). The sensitivity of WAVEWATCH III simulations for the Indian Ocean region is documented in many studies (Seemanth et al. 2016; Remya et al. 2020; Swain and Umesh 2018). The model grid is generated using the DBDB2 v3.0 bathymetry and the GSHHS shoreline database to define the obstruction grid for unresolved boundaries. The model wave spectra were discretized by non-linear frequency bands ranging between 0.04 and 0.5 Hz with a directional resolution of 15° and 25° (Hemer et al. 2013). Near-surface wind speed and sea-ice area fraction at a temporal resolution of 3 h serve as input for generating the wave parameters. The historical wave simulations covering 26 years (1979–2005) are generated using each of the 8 GCM simulated datasets. The datasets were extracted from the CSIRO archive at http://data-cbr.csiro.au/thredds/catalog/catch_all/CMAR_CAWCRWave_archive/Global_wave_projections/HISTORICAL/CMIP5/catalog.html. This study utilized the SWH data forced by eight individual GCMs at monthly resolution for the IO domain (Table 1).

Table 1 Details of CMIP5 models used in the COWCLIP project for wave simulations

2.1.2 ERA5-reference datasets

The fifth-generation ECMWF Reanalysis product (ERA5) replaces ERA-Interim and combines model data with global observations using data assimilation. The ERA5 dataset can be used to assess the effectiveness of CMIP5 GCM forced wave simulations. Altimeter products, available only from 1993, are not sufficient for long-term climate model evaluations. Therefore, the reanalysis product serves as the best available wave dataset to represent the historical wind-wave climate. The ERA5 data span 1979 to date with a spatial resolution of 0.25° globally (Hersbach et al. 2020). ERA5 datasets are produced through a 4DVAR data assimilation scheme in CY41R2 of ECMWF’s Integrated Forecast System (IFS). The data assimilation method incorporates observations from satellites (infrared and microwave radiances, retrievals from radiance data, GPS radio occultation data, scatterometer data, altimeter data), in situ platforms (buoys, ships, wind profilers, radar), and snow data (land stations, satellite). The ERA5 products are superior to ERA-Interim in terms of higher spatial and temporal resolution and better representation of precipitation, evaporation, sea surface temperature, and sea ice (Rivas and Stoffelen 2019; He et al. 2021; Tarek et al. 2020; Gleixner et al. 2020). The ERA5 wave model product employs a wave spectrum with 24 directions and 30 frequency bands, along with ETOPO2 bathymetry. A revised unresolved bathymetry scheme and wave advection scheme are included to improve the representation of coastlines and unresolved islands (Bidlot 2012). Additional output parameters define the wave-modified fluxes, swell components, and freak waves in the ocean (Janssen and Bidlot 2009). Considering these benefits, the present study uses monthly SWH data from ERA5, interpolated to a 1° × 1° grid as recommended in the COWCLIP project (Wang et al. 2015). Linear interpolation is used, which keeps the ERA5 reference consistent with the grid of the GCM forced wave simulations.
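
The regridding step can be illustrated with a minimal sketch of bilinear interpolation in pure Python. The function names `bilinear` and `regrid`, the nested-list field layout, and the regular ascending latitude and longitude axes are assumptions made for illustration only; the study does not describe its implementation, and the handling of land points or missing values is omitted here.

```python
from bisect import bisect_right


def bilinear(field, lats, lons, lat, lon):
    """Bilinearly interpolate field[i][j], given at (lats[i], lons[j]), to (lat, lon).

    Assumes lats and lons are ascending and the target point lies inside the grid.
    """
    # South-west corner of the enclosing source cell, clamped so i+1 and j+1 exist.
    i = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    # Fractional position of the target inside the cell (0 to 1 in each direction).
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on both latitude rows, then along latitude.
    south = (1 - tx) * field[i][j] + tx * field[i][j + 1]
    north = (1 - tx) * field[i + 1][j] + tx * field[i + 1][j + 1]
    return (1 - ty) * south + ty * north


def regrid(field, lats, lons, new_lats, new_lons):
    """Interpolate a fine-grid field (e.g. 0.25 degree) onto a coarser target grid."""
    return [[bilinear(field, lats, lons, la, lo) for lo in new_lons]
            for la in new_lats]
```

Because bilinear interpolation reproduces any field that is linear in latitude and longitude exactly, a synthetic linear field is a convenient check before applying such a routine to reanalysis data.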

2.2 Methodology

The GCM forced wave simulations produced under the COWCLIP project are evaluated through multiple performance metrics. The entire IO domain is divided into four sub-domains based on wind-wave activity: the Arabian Sea (AS), Bay of Bengal (BoB), Tropical Indian Ocean (TIO), and South Indian Ocean (SIO). Geographical coordinates corresponding to each sub-domain are provided in Table 2 and Fig. 1. The performance metrics are calculated for individual sub-domains. In addition to the spatial mean, standard deviation, and bias measures, the study incorporates several methods to evaluate model skill in simulating the historical wave climate. The chosen skill assessment metrics are widely applied in climate model evaluation for various ocean and atmospheric parameters. Metrics such as the M-score, Taylor skill score, Model Climate Performance Index (MCPI), and Model Variability Index (MVI) enable testing of the sensitivity of model performance. The Taylor skill score has been used to evaluate CMIP5 and CMIP6 models (Chen et al. 2021; Fan et al. 2020; Mohan and Bhaskaran 2019; Hirota and Takayabu 2013; Ito et al. 2020; Kusunoki and Arakawa 2015). The M-score is another important metric for differentiating climate models based on their performance skill score (Hemer and Trenham 2016; Katzfey et al. 2016; Bador et al. 2015; Elguindi et al. 2014). The MCPI and MVI represent the relative error in the models by comparing them with the reference data (Gleckler et al. 2008; Chen and Sun 2015; Werner 2011; Díaz-Esteban et al. 2020; Luo et al. 2020).

Table 2 Description of the study area
Fig. 1 Study area with sub-domains considered for the study

2.2.1 Mielke measure (M-Score)

The Mielke measure or M-score (Mielke Jr 1991; Watterson 1996; Watterson et al. 2014; Watterson 2015) is a non-dimensional metric used to represent the skill level of models. It is widely applied to climate variables such as wave parameters (Hemer and Trenham 2016), sea surface temperature, precipitation, and mean sea-level pressure (Katzfey et al. 2016; Bador et al. 2015; Elguindi et al. 2014). The M-score is calculated using Eq. 1 as:

$$ M=\left(\frac{2}{\pi}\right)\arcsin\left(1-\frac{MSE}{V_x+V_y+{\left(G_x-G_y\right)}^2}\right)\times 1000 $$
(1)

where x corresponds to the modelled (GCM) field, y is the observed field (ERA5 reanalysis), V signifies the spatial variance, and G is the spatial mean of the variable (SWH) over the domain considered. The M-score is based on the mean square error (MSE), which is non-dimensionalized by the spatial variance of the fields. Because of the arcsin transformation, for skilful simulations the score varies approximately with the square root of the MSE rather than with the MSE itself. This is particularly useful when the correlation coefficient is close to one. Following Eq. 1, the skill score reaches 1000 for perfect agreement, whereas zero indicates no skill. After interpolating the x and y fields to the same grid, the scores were calculated for the 8 GCM forced wave simulation outputs compared with the ERA5 data. The historical period 1979–2005 is used as a fixed duration for all statistical measures.
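
As an illustration of Eq. 1, the following minimal sketch computes the M-score for two fields that are already on the same grid. The function name `m_score`, the flat-list input, and the use of unweighted (not area-weighted) spatial statistics are assumptions made here for brevity and are not taken from the study.

```python
import math


def m_score(model, reference):
    """M-score (Eq. 1) for two equally sized lists of grid-point values.

    Assumption: all grid points carry equal weight (no area weighting).
    """
    n = len(model)
    if n == 0 or n != len(reference):
        raise ValueError("fields must be non-empty and of equal length")
    gx = sum(model) / n                      # spatial mean of the model field
    gy = sum(reference) / n                  # spatial mean of the reference field
    vx = sum((x - gx) ** 2 for x in model) / n        # spatial variance, model
    vy = sum((y - gy) ** 2 for y in reference) / n    # spatial variance, reference
    mse = sum((x - y) ** 2 for x, y in zip(model, reference)) / n
    denom = vx + vy + (gx - gy) ** 2
    if denom == 0:
        raise ValueError("both fields are constant and identical; score undefined")
    # Clamp to [-1, 1] to guard against floating-point round-off before arcsin.
    arg = max(-1.0, min(1.0, 1.0 - mse / denom))
    return (2.0 / math.pi) * math.asin(arg) * 1000.0
```

With this definition, identical fields give a score of 1000 and a field that is the mirror image of the reference about a common mean gives −1000, which is a quick sanity check on any implementation.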

2.2.2 Taylor skill score

Taylor skill score (Taylor 2001) relates the correlation coefficient and standard deviations of the models to observations. This score is a beneficial tool in the climate model evaluation as documented in many studies (Mohan and Bhaskaran 2019; Hirota and Takayabu 2013; Ito et al. 2020; Kusunoki and Arakawa 2015). The Taylor skill score is calculated as follows:

$$ SS=\frac{4{\left(1+R\right)}^4}{{\left( SDR+\frac{1}{SDR}\right)}^2{\left(1+{R}_0\right)}^4} $$
(2)

where R represents the correlation coefficient of each model with the reference, and R0 is the maximum attainable correlation coefficient (set to 1 for this analysis). SDR is the ratio of the standard deviation of each model to that of the observed values. A Taylor skill score close to 1 indicates better model skill. We present Taylor diagrams to show the performance of the GCMs in each sub-domain (Krishnan and Bhaskaran 2020). In the Taylor diagram, the reference dataset (ERA5) lies on the abscissa. The azimuthal angle shows the correlation between the models and the reference dataset, and the radial distance from the origin represents the standard deviation. The root mean square error is proportional to the distance between each GCM and the reference (Ogata et al. 2014).
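
Equation 2 can be sketched in a few lines of Python. The function name `taylor_skill`, the flat-list inputs, and the unweighted statistics are illustrative assumptions; the default `r0=1.0` follows the setting stated above.

```python
import math


def taylor_skill(model, reference, r0=1.0):
    """Taylor skill score (Eq. 2) for two equally sized lists of values.

    Assumption: unweighted statistics; both inputs must have non-zero variance.
    """
    n = len(model)
    mx = sum(model) / n
    my = sum(reference) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in model) / n)      # model STD
    sy = math.sqrt(sum((y - my) ** 2 for y in reference) / n)  # reference STD
    cov = sum((x - mx) * (y - my) for x, y in zip(model, reference)) / n
    r = cov / (sx * sy)          # correlation coefficient R
    sdr = sx / sy                # ratio of standard deviations, SDR
    return 4.0 * (1.0 + r) ** 4 / ((sdr + 1.0 / sdr) ** 2 * (1.0 + r0) ** 4)
```

For example, a model that is perfectly correlated with the reference but has twice its standard deviation scores 4 / 2.5² = 0.64, which shows that the score penalizes amplitude errors even when the pattern is correct.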

2.2.3 Model Climate Performance Index (MCPI)

The MCPI index emphasizes the models’ overall performance, which is estimated by averaging the relative errors across the fields and domains of the study (Gleckler et al. 2008; Chen and Sun 2015; Werner 2011). To calculate the MCPI, we estimate the root mean square difference (RMSD) between each model and reference dataset as follows:

$$ {E}^2=\frac{1}{W}{\sum}_i{\sum}_j{\sum}_t{W}_{ijt}{\left({F}_{ijt}-{R}_{ijt}\right)}^2 $$
(3)

where F is the simulated field; R is the reference field; i, j, and t index the longitude, latitude, and time; \( W_{ijt} \) are the weights; and W is the sum of the weights. The relative error is then calculated by relating the RMSD of each wave simulation (\( E_{mfr} \), for model m, field f, and reference r) to the median of all RMSD values (\( {\overline{E}}_{fr} \)), as follows:

$$ {E}_{mfr}^{\prime }=\frac{E_{mfr}-{\overline{E}}_{fr}}{{\overline{E}}_{fr}} $$
(4)

The median of the RMSD values is used instead of the mean to reduce the influence of large errors on the results (Gleckler et al. 2008). Relative errors are calculated for the eight GCM forced wave simulations against ERA5 data in the four sub-domains. Smaller MCPI values indicate better agreement with the reference data, and a negative value indicates better skill than the typical (median) model (Chen and Sun 2015).
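
Equations 3 and 4 can be illustrated with the short sketch below. The function names `rmsd` and `relative_errors`, the flattened one-dimensional inputs, and the default of equal weights are assumptions for illustration. Since only one field (SWH) is evaluated here, the relative error of a model in a sub-domain is taken to be its MCPI in that sub-domain, which is an interpretation rather than a statement from the study.

```python
import math
from statistics import median


def rmsd(model, reference, weights=None):
    """Weighted root mean square difference (Eq. 3) over flattened (i, j, t) values."""
    if weights is None:
        weights = [1.0] * len(model)      # assumption: equal weights by default
    w_sum = sum(weights)                  # W, the sum of the weights
    e2 = sum(w * (f - r) ** 2
             for w, f, r in zip(weights, model, reference)) / w_sum
    return math.sqrt(e2)


def relative_errors(rmsd_by_model):
    """Relative error (Eq. 4): each model's RMSD normalized by the median RMSD."""
    e_med = median(rmsd_by_model.values())
    return {name: (e - e_med) / e_med for name, e in rmsd_by_model.items()}
```

As a usage example with hypothetical RMSD values, `relative_errors({"A": 0.2, "B": 0.4, "C": 0.8})` assigns −0.5 to model A (better than the median model), 0 to model B, and 1.0 to model C.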

2.2.4 Model Variability Index (MVI)

The Model Variability Index denotes the ratio of simulated to observed variance of the datasets considered (Gleckler et al. 2008). The MVI is calculated as

$$ {MVI}_{mr}={\sum}_{f=1}^F{\left[{\beta}_{mr f}-\frac{1}{\beta_{mr f}}\right]}^2 $$
(5)

where \( \beta^2 \) is the ratio of the model variance to the ERA5 variance and F is the total number of variables. A model that replicates the observed variability well has an MVI value close to zero. This method addresses model issues such as excessively large or small inter-annual variability (Bao et al. 2014). MVI is useful for evaluating the difference between the model and observation (Díaz-Esteban et al. 2020; Luo et al. 2020).
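
A minimal sketch of Eq. 5 is given below. The function name `mvi` and the list-of-variances interface are assumptions for illustration; with a single variable (SWH), both lists contain one element.

```python
import math


def mvi(model_variances, reference_variances):
    """Model Variability Index (Eq. 5), summed over the variables considered.

    Each pair holds the model variance and the reference (ERA5) variance
    of one variable; both must be positive.
    """
    total = 0.0
    for var_model, var_ref in zip(model_variances, reference_variances):
        beta = math.sqrt(var_model / var_ref)   # beta^2 is the variance ratio
        total += (beta - 1.0 / beta) ** 2
    return total
```

The index is symmetric in over- and underestimation: a model with four times the reference variance (β = 2) and one with a quarter of it (β = 0.5) both give an MVI of 2.25, while a perfect match gives zero.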

A single method has limitations when used alone to assess model skill. The M-Score does not indicate whether the bias is positive or negative, as the score is based on squared differences (Gu et al. 2015). The MCPI, on the other hand, does not reveal the errors of an individual model, because it is a residual of the large, variable-specific spread in model performance. The MVI provides an overall performance index for the representation of inter-annual variability (Radić and Clarke 2011). For variables with a larger degree of variation, the error shown by the normalized Taylor diagrams will be smaller than the actual error (Gleckler et al. 2008). Therefore, we considered four different metrics to evaluate the performance of each model. A multi-model ensemble mean (MMM) is constructed using the five best-performing models identified from these analyses. The performance of the MMM and the individual models is analyzed and discussed in the subsequent sections.
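
The construction of a multi-model mean can be sketched as a grid-point average over the selected models. The function name `multi_model_mean`, the dictionary input, and the equal weighting of models are assumptions for illustration; the study does not state how the members are weighted.

```python
def multi_model_mean(fields):
    """Equal-weight multi-model mean of fields that share the same grid.

    fields maps a model name to a flat list of grid-point values.
    """
    members = list(fields.values())
    if not members:
        raise ValueError("at least one model field is required")
    n = len(members[0])
    if any(len(m) != n for m in members):
        raise ValueError("all model fields must be on the same grid")
    # Average the members point by point.
    return [sum(m[k] for m in members) / len(members) for k in range(n)]
```

For instance, averaging two hypothetical fields `[1.0, 2.0]` and `[3.0, 4.0]` yields `[2.0, 3.0]`; errors of opposite sign in the members partly cancel in the mean.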

3 Results and discussions

3.1 Spatial analysis of GCM forced wave simulations

The monthly mean of GCM simulated SWH is compared with the ERA5 reanalysis for 26 years (1979–2005). Figure 2 illustrates that the wave simulations forced by MRI-CGCM3 and INMCM4 follow a spatial pattern of SWH similar to that of ERA5. In addition, HadGEM2-ES and ACCESS1.0 reproduce the maximum SWH seen over the Eastern regions of the SIO. The CNRM-CM5 forced simulation grossly underestimates SWH across the IO compared to ERA5. In the ERA5 data, wave heights range between 0 and 2 m for the North Indian Ocean (NIO), 2 and 3.5 m for the Tropical Indian Ocean (TIO), and 3.5 and 5 m for the South Indian Ocean (SIO). The MMM constructed from the best-performing models reproduces a pattern similar to the reference dataset.

Fig. 2 The mean SWH simulated by the GCMs and ERA5 over the Indian Ocean during 1979–2005

The standard deviation (STD) of a variable represents its dispersion from the mean (Dobrynin et al. 2012; Kumar et al. 2020). In Fig. 3, the models show similar patterns in locations with high STD values (North-Western AS and SIO). The model ACCESS1.0 replicates an STD distribution close to that of the ERA5 data. On the other hand, MIROC5 simulates a maximum STD value of around 1.2 m over the North-Western AS, an overestimation of about 0.2 m. As with the mean field, the CNRM-CM5 forced SWH data show a larger underestimation over the SIO region.

Fig. 3 Distribution of spatial standard deviation for SWH simulated by the GCMs and ERA5 over the Indian Ocean during 1979–2005

Bias in climate model simulations has been discussed widely (Krishnan and Bhaskaran 2019a; Xu et al. 2014). Figure 4 shows that most of the GCM forced simulations overestimate SWH relative to ERA5, by up to 1 m. The models show the highest bias over the AS, equatorial IO, and the South-Western areas of the SIO. Slightly positive bias values over the Western TIO are attributed to the waves generated by trade winds, which do not have a relationship with model resolution (Hemer and Trenham 2016). An underestimation in SWH ranging between 1 and 1.5 m is seen over the Eastern SIO. Compared to the individual models, the MMM shows a smaller bias over the IO as a whole. The accumulated bias in wave simulations is attributed to the wave model’s uncertainty, along with the input GCM fields (Hemer and Trenham 2016). The CMIP5 GCMs are reported to carry biases caused by the model parametrizations and model physics (Ma et al. 2014). The bias errors located in any specific region can be linked with phenomena occurring at distant locations (Wang et al. 2014). Nayak et al. (2013) reported that remotely forced long waves generated in the Southern Ocean influence the East coast of India. The wave climate of the SIO comprises both swells and wind-sea generated by the trade wind system. The biases accumulated in these waves can be dispersed by swell waves (Lee et al. 2013). Variations in the SWH values over the SIO can be attributed to the lower skill of climate models in resolving this region’s complexity.

Fig. 4 Spatial variation of bias errors in GCM forced wave simulations over the Indian Ocean during 1979–2005

The Taylor skill scores of each GCM forced wave simulation and of the MMM are shown in Fig. 5. The Taylor skill score reflects how well the models reproduce the variability of SWH over the IO domain (Mohan and Bhaskaran 2019; Davini and Cagnazzo 2014; Hirota and Takayabu 2013). A common feature among the models is that higher skill (0.8–1.0) is seen over the AS, BoB, and the equatorial IO. MIROC5 exhibits skill close to zero over the Eastern and Western regions of the TIO and the SIO. The benefit of ensemble averaging is reflected in the MMM, which exhibits the highest Taylor skill score. The lowest skill, over the Western TIO region, is seen in both the individual models and the MMM.

Fig. 5 Taylor’s skill score (calculated over the period 1979–2005) obtained from individual GCMs and MMM

3.2 Performance evaluation metrics (sub-domain analysis)

In addition to the spatial variability analysis, a detailed investigation of the performance of individual GCMs over each sub-domain is required. A recent study by Sreelakshmi and Bhaskaran (2020a) established that the highest wave activity exists over the extra-tropical SIO and the lowest over the AS. The wave period data from CMIP5 and CORDEX datasets showed an overestimation for the SIO region (Chowdhury and Behera 2019). Therefore, the specific behaviour of the wave climate in the IO domain demands verification of model competency over each sub-domain. The bias error and Taylor skill analyses show that the differences between the GCM forced SWH and ERA5 vary across the AS, BoB, TIO, and SIO regions. Accordingly, performance metrics such as the Taylor diagram, M-Score, MCPI, and MVI are calculated for each IO sub-domain.

First, we evaluated the Mielke measure or M-Score to rank the GCM forced wave simulations. Hemer and Trenham (2016) reported the largest M-Score (765) for the full global domain, which indicates that the wave field simulated by the COWCLIP models reproduces the global structure relatively well. A detailed analysis of the individual sectors provides additional confidence in choosing the best model. The M-Score is calculated for the 8 GCM forced wave simulations and the MMM over the four sub-domains (represented in Fig. 1). In Fig. 6, the BoB shows the highest similarity (M-Score of 768) between MMM and ERA5 among the four sub-domains. In the three domains other than the AS, the MMM shows higher skill than any individual model. For the AS domain, the wave simulations by the best-performing models, HadGEM2-ES, BCC-CSM1.1, and ACCESS1.0, show considerable skill compared to the MMM. These models show M-Scores greater than 600 for the AS, BoB, and SIO, whereas they present low skill over the TIO. The MMM showed comparatively low M-Scores for the TIO (633) and SIO (640) basins. The model CNRM-CM5 (lowest M-Score of 179, for the TIO) underperforms consistently in the analysis. A similar observation was made for the global simulations (Hemer and Trenham 2016). In contrast, CNRM-CM5 performs moderately well for the AS (M-Score of 576). The model MIROC5 also showed a notably high score for the AS compared to the other three domains. The remarkable performance of the MMM in the four sub-domains is reflected in the M-Score analysis. The analysis suggests that the GCM forced simulations reproduce the wave field structure of the IO reasonably well.

Fig. 6 M-Score containing the scores calculated for 8 GCMs forced wave simulation across each of four sub-regions

After determining the M-Score, independent checks were performed on the GCM forced wave simulations to ascertain the wave data quality. The Model Climate Performance Index (MCPI) is a measure of the relative error linked with the root mean square errors between the simulated and observed fields. In addition, the Model Variability Index (MVI) evaluates the variance of the fields. Figure 7 shows the MCPI and MVI values, demonstrating the spread of the model performance and model variability for each domain. The smaller the values of MCPI and MVI, the better the model performance (Gleckler et al. 2008). Analogous to the M-Score, the BoB showed the best agreement in terms of MCPI and MVI, with smaller values for both indices. A substantial degree of skill is visible for HadGEM2-ES, BCC-CSM1.1, and ACCESS1.0 for the AS domain. The wave simulations by INMCM4 are weaker for the AS and BoB. The model CNRM-CM5 underperforms in the TIO and SIO domains. For the BoB, TIO, and SIO domains, the MMM outperforms the individual models. The models GFDL-CM3, BCC-CSM1.1, and ACCESS1.0 display the lowest skill for the SIO domain. The MMM holds higher MVI values than the best individual models for the AS, TIO, and SIO regions. The negative MCPI values of the MMM over the BoB, TIO, and SIO regions indicate better skill than the typical model (Chen and Sun 2015).

Fig. 7 Model Climate Performance Index (MCPI) versus Model Variability Index (MVI) for the four sub-domains

The Taylor diagram (Taylor 2001) represents the spread of models in terms of normalized correlation coefficient (CC), root mean square error (RMSE), and standard deviation (STD). Taylor diagrams are widely used to analyse CMIP5 model skill (Krishnan and Bhaskaran 2019b; Miao et al. 2014; Semedo et al. 2018). The MMM for the BoB domain showed better results (higher CC and lower RMSE) than the individual models (Fig. 8). The standard deviation of the MMM is closer to that of ERA5. For the SIO domain, the MMM exhibits a higher STD value than the reference. The correlation coefficient is highest in the BoB and lowest in the TIO and SIO domains. The Taylor skill score of the MMM in the AS domain lies between the scores of the other three domains. The RMSE is lowest in the BoB and TIO regions compared to the other two areas. The better performance of the MMM over the individual GCM forced simulations is evident from the analysis.

Fig. 8 Taylor diagram representing the skill of 8 CMIP5 GCMs and MMM for the four sub-domains

The performance evaluation metrics (M-Score, MCPI, MVI, and Taylor skill) are summarized in Fig. 9. The performance index is standardized to 0–1 for easier comparison of the skill of the GCM forced simulations. Performance scores vary with domain and model, although a few models perform well in all the domains. The M-Score is highest for HadGEM2-ES in the AS domain, whereas the MMM dominates in the other three domains. The Taylor skill score is highest for the MMM, consistently across the four domains. In terms of MCPI, GFDL-CM3 outperforms the MMM for the AS and BoB. Similarly, the models ACCESS1.0, MRI-CGCM3, INMCM4, and BCC-CSM1.1 are better than the MMM for the TIO domain. In the SIO region, the MMM shows notably higher skill than the individual ensemble members. In terms of MVI, HadGEM2-ES, ACCESS1.0, and BCC-CSM1.1 show higher skill than the MMM in the AS and SIO.

Fig. 9 Performance metrics for the SWH datasets simulated under the COWCLIP project for the IO and various sub-domains

Figure 10a, b summarizes the Total Performance Index (TPI) for each of the models in the four sub-domains. The TPI values represent the average skill of each model in reproducing SWH on monthly and seasonal scales. For the BoB domain, the MMM outperforms the individual models in simulating seasonal SWH, as it does at the monthly scale. Analysis of monthly data reveals better skill for BCC-CSM1.1 and MIROC5 in the AS domain, whereas ACCESS1.0 dominates in the seasonal analysis. Unlike the other domains, the TIO and SIO favour INMCM4 and MMM for the monthly mean and HadGEM2-ES and ACCESS1.0 for the seasonal mean values. In the seasonal analysis of SWH data, HadGEM2-ES, ACCESS1.0, and MMM perform best among the available models.

Fig. 10 a Total Performance Index (TPI) for the SWH datasets (monthly scale) simulated under the COWCLIP project for the IO and various sub-domains. b Total Performance Index (TPI) for the SWH datasets (seasonal scale) simulated under the COWCLIP project for the IO and various sub-domains

There are few models (HadGEM2-ES, ACCESS1.0, and BCC-CSM1.1) which show outstanding performance in the study. The study also recommends the competency of MMM constructed using five selected models (MRI-CGCM3, ACCESS1.0, INMCM4, HadGEM2-ES, and BCC-CSM1.1) over the IO. The Fourth Assessment Report of IPCC (Randall et al. 2007) has mentioned that the MMM reduces the biases of individual models, retaining only the pervasive errors. The GCM forced wave simulations produced by the wave models are controlled mainly by the quality of the input wind forcing. The input field requires a sufficient resolution to represent the characteristics of storm systems, which causes the generation of surface waves (Hemer and Trenham 2016). Therefore, the variability in the wind input can be accounted as the significant factor influencing the differences in the skill of each wave simulation. Inadequate representation of atmospheric components in the GCM directly influences the near-surface wind speed (Morim et al. 2020). Model sensitivity in terms of uncertainty in the inputs was reported by Krishnan and Bhaskaran 2019a, 2019b, 2020). The analysis reported that near-surface wind speed simulated by HadGEM2-ES, ACCESS1.0, and MIROC5 is found as the best-performing GCMs for the BoB. A similar improved performance was also noticed in the wave model output created by the same GCM inputs. Correspondingly, wave simulations follow the same skill for the mentioned models, which are not valid for all IO sub-domains. Model performance can also vary when the models and observations agree on the external natural influences (e.g. extreme events, ENSO) (De Winter et al. 2013). In this context, Krishnan and Bhaskaran (2019b) noticed that during few cyclone cases in the BoB, maximum wind speed simulated by the GCMs underestimated the in situ records of RAMA buoys. 
The primary accumulation of bias in the GCM forced wave simulations over the SIO can be attributed to the modelling of sea-swell systems in that region. Swells generated in the SIO propagate towards the North Indian Ocean (NIO) and affect the wind-wave climate of the AS and BoB (Young 1999; Alves 2006; Sabique et al. 2012). The wind-seas formed by local winds in any region are modified by interaction with the swells (Hanson and Phillips 1999). Vethamony et al. (2013) reported that the swells generated at 40° S propagate directly to the BoB without affecting the AS. Following that, Nayak et al. (2013) presented the propagation of swells from the SIO and their role in modifying the local wind-waves along the East coast of India. Apart from the swells originating in the SIO, the Shamal swells formed in the Arabian Peninsula reach the West coast of India and modify that region’s wave climate (Aboobacker et al. 2011a, 2011b). Another wave system, known as the Makran swells, propagates towards the Eastern and Western parts of the Arabian Sea (Anoop et al. 2020). All these swell systems, their propagation, and their interaction with local wave conditions determine the resultant wind-wave climate of the IO region. Therefore, the weaknesses of the GCM forced wave simulations can be attributed to various causes, such as the inability of the GCM forcing to reproduce these conditions, the limitations of the wave model in simulating the wave fields, and the intricate complexities of the study area.

4 Conclusions

The historical wave climate of the Indian Ocean was evaluated using the simulations produced under the COWCLIP project. Performance evaluation methodologies such as the Taylor skill score, M-Score, MCPI, and MVI were employed for the analysis. The GCM forced wave simulations were compared with the wave heights from the ERA5 reanalysis. This study demonstrates that the GCM forced wave simulations show variable skill depending on the region. Higher Taylor skill (0.8–1.0) was evident over the AS, BoB, and the equatorial IO. The BoB showed the highest similarity (M-Score of 768) between MMM and ERA5 among the four sub-domains. Based on MCPI and MVI, the models HadGEM2-ES, ACCESS1.0, and BCC-CSM1.1 outperformed the MMM for the AS and SIO. The model CNRM-CM5 consistently underperformed in the analysis. Based on the skill statistics, the multi-model mean was constructed using the five best-performing models (MRI-CGCM3, ACCESS1.0, INMCM4, HadGEM2-ES, and BCC-CSM1.1). The largest variations among the models were noticed over the SIO, which is known as one of the swell generation areas in the IO. The differences in SWH over the SIO can be attributed to the lower skill of climate models in resolving the complexity of this region. The biases in the models can be attributed to weaknesses of the GCM forcing in model physics, parametrization, and resolution. The present study provides a first-order analysis of the skill of each GCM forced wave simulation, leveraging the advantage of the multi-model mean. A sound understanding of the historical wave climate would provide greater confidence in employing these simulations for future wave climate studies. The wave simulations produced under the COWCLIP project include only a limited number of ensemble members. Further studies will incorporate the additional ensemble members available and the corresponding wave height projections for the IO region.