1 Introduction

The ocean is the dominant reservoir of the global water cycle, which is inextricably linked to climate change. Seawater temperature is a widely used ocean variable in global climate change analysis because it reflects the combined effects of ocean thermal and dynamical processes and air-sea interactions (Wu et al. 2013). The Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC) notes that each of the last four decades has been successively warmer than any decade that preceded it since 1850 (Masson-Delmotte et al. 2021). Cheng et al. (2019) pointed out that the average temperature of the upper ocean (0–2000 m) has increased by about 0.13 °C since 1960, with the warming accelerating since 1990. Seawater salinity is likewise an essential indicator of the global water cycle (Du et al. 2019). Over the past half-century, the global water cycle has accelerated due to continuous climate warming (Yu et al. 2020), which in turn affects seawater salinity. Both observations (Skliris et al. 2014) and numerical simulations (Held and Soden 2006) show a “fresh gets fresher, salty gets saltier” pattern in ocean surface salinity on multidecadal time scales.

To reconstruct historical ocean data and predict future environmental trends, a series of ocean survey programs (Liu et al. 2017) has been carried out, and the corresponding data products have been released. Data assimilation systems are used to reconcile and analyze the various historical observations and so produce reanalysis datasets (Chi et al. 2018; Storto et al. 2019), which are increasingly used in ocean climate change studies. To date, dozens of global reanalysis datasets have been publicly released, including the simple ocean data assimilation (SODA) series developed by the University of Maryland (Carton et al. 2018), the ocean reanalysis system (ORAS) series (Balmaseda et al. 2013) released by the European Centre for Medium-Range Weather Forecasts (ECMWF), the ECMWF reanalysis (ERA) series (Patra et al. 2020), the global ocean reanalysis system (GLORYS) series (Verezemskaya et al. 2021) established by Mercator Ocean, France, and the China ocean reanalysis (CORA) series (Han et al. 2011, 2013a, 2013b) developed by the National Marine Data and Information Service, Ministry of Natural Resources of the People’s Republic of China.

However, ocean temperature and salinity differ among reanalyses, and the biases vary by region (Bengtsson et al. 2004). Overall, the trends of global mean temperature and salinity may differ by as much as 30% to 40% among datasets (Carton et al. 2019). In the tropical Indian Ocean, Karmakar et al. (2018) found lower sea surface temperature (SST) in the global ocean data assimilation system (GODAS) and higher SST in ensemble coupled data assimilation (ECDA), SODA, and ORAS4. In the tropical Pacific Ocean, ocean salinity deviates only slightly among reanalysis datasets. By contrast, the largest divergence and errors were found in the Southern Ocean and other regions with large frontal variations, which may be caused by the lack of observational data (Balmaseda et al. 2015; Shi et al. 2017). Previous studies therefore suggest that no reanalysis dataset is likely to perform well in all regions or all periods.

In recent years, the applicability of reanalysis data in the seas adjacent to China has been assessed against observations from stations and satellites. The regional reanalysis product CORAv1.0 has attracted considerable attention. Wu et al. (2013) compared the seasonal and interannual variability of SST near the China seas between CORAv1.0 and SODA2 and, using Advanced Very High Resolution Radiometer (AVHRR) satellite data as the reference, showed that CORAv1.0 had the smaller error. Zhang et al. (2016) also found that CORAv1.0 captured the intraseasonal variability of SST better than the Estimating the Circulation and Climate of the Ocean, Phase II (ECCO2) product and SODA2, but that it needed further improvement in the subsurface layer. Chao et al. (2021) calculated the errors of CORAv1.0 in the northwest Pacific and reported root mean square errors (RMSEs) of 0.61 °C and 0.08 psu for 0–2000 m temperature and salinity, respectively. In addition, Gao et al. (2015) evaluated the applicability of the ERA-Interim dataset for SST in the Bohai Sea. Overall, assessments of reanalysis datasets in Chinese waters have been narrow in scope, each comparing only one or two reanalysis datasets (Wu et al. 2013; Gao et al. 2015; Zhang et al. 2016; Chao et al. 2021). Comprehensive studies of the spatiotemporal variability of ocean temperature and salinity across multiple reanalysis datasets are still lacking.

The marine economy of the coastal provinces along the Yellow and Bohai seas (YBS) has grown rapidly. According to the China Marine Economic Statistical Yearbook 2019 (Ministry of Natural Resources of the People’s Republic of China 2021), the total marine economy of the six provinces along the YBS, namely, Liaoning, Hebei, Tianjin, Shandong, Jiangsu, and Shanghai, amounted to 4,291,360 million yuan in 2019, approximately 15.27% of the gross regional product. Due to global climate change, physical ocean parameters such as temperature and salinity in the YBS have changed significantly in recent decades (Lin et al. 2001; Park et al. 2015; Wei et al. 2020). The increase in seawater temperature and salinity will significantly affect the future regional climate, ocean environment, and aquatic ecology of the YBS. However, the differences among reanalysis datasets are too large to be neglected in such assessments, so it is crucial to select an appropriate reanalysis dataset. The reliability and accuracy of seawater temperature and salinity in reanalysis datasets over the YBS therefore need to be evaluated, as they are essential to many climate applications such as the reconstruction of the historical marine environment and the initialization of seasonal and decadal forecasts.

In this study, we make a first attempt to evaluate multiple global reanalysis products in the YBS. The paper is organized as follows: the data and methods are described in Section 2, the comparison results are presented in Section 3, the discussion is given in Section 4, and the conclusions are summarized in Section 5.

2 Data and method

2.1 Data sources

2.1.1 Reanalysis data

In this paper, we select eight reanalysis datasets for assessing temperature and salinity changes in the YBS: (1) the fifth generation of the ECMWF reanalysis (ERA5, http://apps.ecmwf.int/data-catalogues/era5/?class=ea); (2) version 3.4.2 of SODA (SODA3.4.2) developed by the University of Maryland (UM), USA (https://www2.atmos.umd.edu/~ocean/index_files/soda3.4.2_mn_download_b.htm); (3) version 2 of the global ocean reanalysis ensemble product (GREPv2) released by Mercator Ocean (https://resources.marine.copernicus.eu/products0); (4) member datasets of the GREPv2 ensemble, including the ocean reanalysis system 5 (ORAS5) published by the ECMWF, the second generation, version 4 of GLORYS (GLORYS2v4) distributed by the Copernicus Marine Environment Monitoring Service (CMEMS), and version 7 of the Euro-Mediterranean Center on Climate Change (CMCC) global ocean reanalysis system (C-GLORSv7); and (5) the first and second generations of CORA released by the National Marine Data Center (CORAv1.0, CORAv2.0, http://mds.nmdis.org.cn/pages/dataViewDetail.html?dataSetId=48). The resolution, assimilation methods, and assimilated ocean variables of these reanalysis datasets are listed in Table 1. Because daily data are unavailable for SODA3.4.2 and CORAv1.0, whose highest temporal resolutions are 5 days and 1 month, respectively, all reanalysis data used in this study were obtained at monthly resolution for subsequent analysis.

Table 1 The introduction of eight reanalysis datasets

2.1.2 Observation data from ocean stations

The observed temperature and salinity data from six ocean stations (shown in Fig. 1) used in this paper can be downloaded from the National Marine Data Center of China (http://mds.nmdis.org.cn/pages/dataViewDetail.html?dataSetId=9). They include the daily SST and the daily average sea surface salinity (SSS) from January 1996 to December 2020 at three ocean stations, namely Shidao (36.5° N, 122.3° E), Xiaomaidao (36.0° N, 120.3° E), and Lianyungang (34.5° N, 119.3° E), together with hourly SST from January 2011 to December 2020 at three further stations, namely Laohutan (38.9° N, 121.7° E), Zhifudao (37.6° N, 121.4° E), and Lvsi (32.1° N, 121.6° E). Daily means are computed as the arithmetic mean of the hourly SST, and the monthly, seasonal (March to May for spring, June to August for summer, September to November for autumn, and December to February of the following year for winter), and annual averages are then calculated from the daily average SST and SSS.
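The averaging chain described above (hourly to daily, then daily to monthly, seasonal, and annual) can be sketched with the Python standard library. This is a minimal illustration, not the authors' processing code: the function names are hypothetical, records are assumed to be `(timestamp, value)` pairs with missing values already removed, and labeling each winter by the year of its December is an assumption made here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

# Month-to-season mapping as defined in the text.
SEASONS = {3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn",
           12: "winter", 1: "winter", 2: "winter"}


def group_mean(records, key):
    """Average the values of (timestamp, value) records grouped by key(timestamp)."""
    buckets = defaultdict(list)
    for stamp, value in records:
        buckets[key(stamp)].append(value)
    return {k: fmean(v) for k, v in sorted(buckets.items())}


def daily_means(hourly):
    """Arithmetic mean of the hourly values of each calendar day, as (date, value) pairs."""
    return list(group_mean(hourly, lambda t: t.date()).items())


def monthly_means(daily):
    return group_mean(daily, lambda d: (d.year, d.month))


def seasonal_means(daily):
    def season_key(d):
        # Assumption: January and February belong to the winter that
        # started in December of the previous year.
        year = d.year - 1 if d.month in (1, 2) else d.year
        return (year, SEASONS[d.month])
    return group_mean(daily, season_key)


def annual_means(daily):
    return group_mean(daily, lambda d: d.year)


# Made-up hourly SST for two days, for illustration only.
hourly = [(datetime(2011, 1, 1, h), float(h)) for h in range(24)]
hourly += [(datetime(2011, 1, 2, h), 10.0) for h in range(24)]
daily = daily_means(hourly)
print(monthly_means(daily), seasonal_means(daily), annual_means(daily))
```

Monthly, seasonal, and annual values are all derived from the daily means rather than from the raw hourly records, which mirrors the order of operations stated in the text.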

Fig. 1 Location map of the study area

2.1.3 Multi-observation ocean temperature and salinity data

The global multi-observation ocean 3D temperature and salinity (ARMOR3D) dataset is used in this study; it is available through the CMEMS implemented by Mercator Ocean (https://data.marine.copernicus.eu/product/MULTIOBS_GLO_PHY_TSUV_3D_MYNRT_015_012). ARMOR3D uses statistical methods to combine remote sensing observations (SST, sea level anomalies, and geostrophic surface currents) with in situ vertical profiles of temperature and salinity obtained primarily from the Argo network (Guinehut et al. 2012; Verbrugge et al. 2017). The dataset has a spatial resolution of 0.25°, weekly and monthly temporal averaging, and 50 vertical levels from 0 to 5500 m. The main variables include temperature, salinity, and sea surface height. ARMOR3D is synthesized in two main steps. The first step derives a synthetic temperature field from satellite altimeter data and in situ observations using multiple/simple linear regression, and a synthetic salinity field from satellite altimeter data. The second step combines the synthetic fields with in situ temperature and salinity profiles using optimal interpolation.

The ARMOR3D dataset is robust, as evidenced by its low RMSE values (Guinehut et al. 2012): the RMSE of seawater temperature is 0.80 °C at a depth of 100 m and 0.20 °C at 1000 m, and the RMSE of salinity is 0.10 psu at 100 m and 0.05 psu at 1000 m. The observation-based ARMOR3D dataset has therefore become widely used as an independent benchmark for satellite-derived products (Su et al. 2021; Hu and Zhao 2022), numerical results (Kaurkin et al. 2016), and assimilated reanalysis datasets (Cipollone et al. 2017; Iakovleva and Bashmachnikov 2021).

2.2 Data processing

To facilitate comparison with the ocean station measurements, the grid point closest to the latitude and longitude of each ocean station is selected to represent that station. If the deviation is large, the four nearest surrounding grid values are selected and combined by distance-weighted linear interpolation (Gao et al. 2015). To compare directly with the 0.25° ARMOR3D multi-observation dataset, all reanalysis datasets are regridded to a 0.25° × 0.25° horizontal grid using bilinear interpolation (Carton et al. 2019). Although some information might be lost in the regridding (Li et al. 2020), previous studies that used similar methods showed that data quality was not substantially affected (Carton et al. 2019; Arshad et al. 2021).
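The two station-matching options described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function names are hypothetical, the grid axes are assumed to be ascending with the station inside the grid, distances are taken in degrees for simplicity, and the weights are assumed to be proportional to the inverse of the distance.

```python
import math


def nearest_index(axis, value):
    """Index of the grid coordinate closest to value."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def nearest_value(lats, lons, field, lat0, lon0):
    """Value at the grid point closest to the station; field[i][j] sits at (lats[i], lons[j])."""
    return field[nearest_index(lats, lat0)][nearest_index(lons, lon0)]


def idw_four_neighbours(lats, lons, field, lat0, lon0, power=1.0):
    """Inverse-distance-weighted mean of the four grid points surrounding the station."""
    # Lower-left corner of the grid cell that contains the station.
    i = max(k for k in range(len(lats) - 1) if lats[k] <= lat0)
    j = max(k for k in range(len(lons) - 1) if lons[k] <= lon0)
    num = den = 0.0
    for ii in (i, i + 1):
        for jj in (j, j + 1):
            value = field[ii][jj]
            if value is None or math.isnan(value):
                continue  # skip land or missing cells
            dist = math.hypot(lats[ii] - lat0, lons[jj] - lon0)
            if dist == 0.0:
                return value  # station coincides with a grid point
            weight = 1.0 / dist ** power
            num += weight * value
            den += weight
    return num / den if den else math.nan


# Made-up 0.25° grid and SST values, for illustration only.
lats = [36.0, 36.25, 36.5, 36.75]
lons = [122.0, 122.25, 122.5]
sst = [[1.0, 2.0, 9.0], [3.0, 4.0, 9.0], [5.0, 6.0, 9.0], [7.0, 8.0, 9.0]]
print(nearest_value(lats, lons, sst, 36.5, 122.3))
print(idw_four_neighbours(lats, lons, sst, 36.125, 122.125))
```

Skipping missing cells matters near the coast, where one or more of the four neighbours may be land in a coarse reanalysis grid.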

Four statistics, namely the linear correlation coefficient (COR), standard deviation (SD), centered root mean square error (CRMSE), and bias ratio relative to observations (BR), are calculated to evaluate the applicability of the eight reanalysis datasets. The formulae are as follows:

$$\mathrm{COR}=\frac{\sum_{i=1}^{n}\left(a_{i}-\overline{a}\right)\left(o_{i}-\overline{o}\right)}{\sqrt{\left[\sum_{i=1}^{n}\left(a_{i}-\overline{a}\right)^{2}\right]\left[\sum_{i=1}^{n}\left(o_{i}-\overline{o}\right)^{2}\right]}}$$
(1)
$$\mathrm{SD}=\sqrt{\frac{\sum_{i=1}^{n}\left(a_{i}-\overline{a}\right)^{2}}{n}}$$
(2)
$$\mathrm{CRMSE}=\sqrt{\frac{\sum_{i=1}^{n}\left[\left(a_{i}-\overline{a}\right)-\left(o_{i}-\overline{o}\right)\right]^{2}}{n}}$$
(3)
$$\mathrm{BR}=\frac{\sum_{i=1}^{n}\left(a_{i}-o_{i}\right)/n}{\sum_{i=1}^{n}o_{i}/n}\times 100\%$$
(4)

where n is the number of values in each series; \(\{a_{i}\}\) is the reanalysis (estimated) time series; \(\{o_{i}\}\) is the observed time series; \(\overline{a}\) is the mean of \(a_{1},\dots,a_{n}\); and \(\overline{o}\) is the mean of \(o_{1},\dots,o_{n}\).
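As a minimal sketch, Eqs. (1) to (4) can be computed in a few lines of standard-library Python. The function name `evaluation_metrics` is hypothetical, and the code takes `a` to be the reanalysis series and `o` the station or ARMOR3D reference, which is the reading under which BR is a bias relative to observations.

```python
import math


def evaluation_metrics(a, o):
    """COR, SD, CRMSE and BR (%) for an estimated series a against observations o."""
    n = len(a)
    if n == 0 or n != len(o):
        raise ValueError("a and o must be non-empty and of equal length")
    a_bar = sum(a) / n
    o_bar = sum(o) / n
    da = [x - a_bar for x in a]          # anomalies of the estimated series
    do = [x - o_bar for x in o]          # anomalies of the observed series
    ss_a = sum(x * x for x in da)
    ss_o = sum(x * x for x in do)
    cov = sum(x * y for x, y in zip(da, do))
    return {
        "COR": cov / math.sqrt(ss_a * ss_o),                                # Eq. (1)
        "SD": math.sqrt(ss_a / n),                                          # Eq. (2)
        "CRMSE": math.sqrt(sum((x - y) ** 2 for x, y in zip(da, do)) / n),  # Eq. (3)
        "BR": (a_bar - o_bar) / o_bar * 100.0,                              # Eq. (4)
    }


# Made-up values, for illustration only.
print(evaluation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Because the mean of the differences equals the difference of the means, BR reduces to the relative difference between the two series means. COR is undefined when either series is constant, since its sum of squared anomalies is zero.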

Furthermore, Taylor diagrams (Taylor 2001) are used to summarize and compare the performance of the reanalysis datasets in terms of the COR, SD, and CRMSE values defined above.
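A Taylor diagram can show all three statistics in a single plot because they are linked by a law-of-cosines relation (Taylor 2001). Writing \(\mathrm{SD}_{a}\) and \(\mathrm{SD}_{o}\) for Eq. (2) evaluated on the series \(\{a_{i}\}\) and \(\{o_{i}\}\), respectively (the subscripts are notation introduced here for illustration), the relation is

$$\mathrm{CRMSE}^{2}=\mathrm{SD}_{a}^{2}+\mathrm{SD}_{o}^{2}-2\,\mathrm{SD}_{a}\,\mathrm{SD}_{o}\,\mathrm{COR}$$

so a dataset plotted at radius \(\mathrm{SD}_{a}\) and polar angle \(\arccos(\mathrm{COR})\) lies at a distance CRMSE from the reference point at radius \(\mathrm{SD}_{o}\) on the horizontal axis.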

3 Results

3.1 Comparative analysis with ocean station observations

3.1.1 Interannual variation

The annual mean SST and SSS are calculated from the daily temperature and salinity observations at the coastal ocean stations (Shidao, Xiaomaidao, Lianyungang, Laohutan, Zhifudao, Lvsi) in the study area from 1996 to 2020. Eight reanalysis datasets, including ERA5, GREPv2, C-GLORSv7, GLORYS2v4, ORAS5, CORAv1.0, CORAv2.0, and SODA3.4.2, are compared with the ocean station observations. As shown in Fig. 2, the annual mean SST varies consistently between the reanalysis datasets and the ocean station observations. There is no significant trend (p > 0.05) in the annual mean SST at Shidao, Xiaomaidao, and Lianyungang from 1996 to 2010. Since 2011, however, SST shows a significant warming trend (p < 0.05) at all six ocean stations. The average warming rate of the reanalysis data is 0.17 °C/yr, slightly lower than the 0.19 °C/yr observed at the ocean stations, and the reanalysis datasets show positive deviations. Notably, the average SST deviations at Shidao, Xiaomaidao, and Lianyungang for four reanalysis datasets (GREPv2, GLORYS2v4, ORAS5, and CORAv2.0) are smaller during 2011–2020 than during 1996–2010. The average reduction in absolute deviation over the past 10 years ranges from 12% to 20%, indicating that the accuracy of SST in these four reanalysis datasets has improved during the last decade.
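The text reports trend rates and p-values but does not state which significance test was used. The sketch below is therefore an assumption: it estimates the rate with an ordinary least-squares slope and tests for a monotonic trend with the Mann-Kendall test (normal approximation, with tie correction), a common choice for short climate series. The function names and the sample values are hypothetical.

```python
import math
from collections import Counter


def ols_slope(years, values):
    """Least-squares trend of values against years (units per year)."""
    n = len(years)
    y_bar = sum(years) / n
    v_bar = sum(values) / n
    sxx = sum((y - y_bar) ** 2 for y in years)
    sxy = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    return sxy / sxx


def mann_kendall_p(values):
    """Two-sided p-value of the Mann-Kendall trend test (normal approximation)."""
    n = len(values)
    # S statistic: number of increasing pairs minus number of decreasing pairs.
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i in range(n - 1) for j in range(i + 1, n))
    # Variance of S with the standard correction for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s == 0 or var_s == 0:
        return 1.0
    z = (s - math.copysign(1, s)) / math.sqrt(var_s)  # continuity correction
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up annual mean SST (°C) for 2011-2020, for illustration only.
years = list(range(2011, 2021))
warming = [13.0 + 0.19 * k for k in range(10)]
flat = [13.0 if k % 2 == 0 else 13.2 for k in range(10)]
for name, series in (("warming", warming), ("flat", flat)):
    print(name, round(ols_slope(years, series), 3), mann_kendall_p(series) < 0.05)
```

With only ten annual values per station in 2011–2020, the normal approximation is at the lower end of its usual range of validity, so borderline p-values from such a test should be read with caution.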

Fig. 2 Comparison of annual mean SST variations between ocean stations, eight reanalysis datasets, and the ARMOR3D multi-observation dataset in the YBS (1996–2020: a Shidao, b Xiaomaidao, and c Lianyungang; 2011–2020: d Laohutan, e Zhifudao, and f Lvsi)

Figure 3 compares the annual mean SSS variation between ocean stations and reanalysis datasets from 1996 to 2020. The deviations of the annual mean SSS of the reanalysis datasets range from −5.06 to 4.60 psu, and negative deviations predominate. There is no significant trend (p > 0.05) in the interannual variability of SSS at Shidao and Xiaomaidao, whereas SSS at Lianyungang shows a decreasing trend that is significant at the 95% confidence level. The rate of decrease in the reanalysis datasets (−0.03 psu/yr) is weaker than that in the observations (−0.12 psu/yr). The CORA series (CORAv1.0 and CORAv2.0) is closest to the observations at Shidao and Xiaomaidao, with mean absolute deviations of 0.45 psu and 0.59 psu, respectively. The deviations of GLORYS2v4 and SODA3.4.2 are relatively large, with maximum negative deviations of −5.06 psu and −4.03 psu, respectively. At Lianyungang, however, the picture is quite different: the CORA series exhibits a large positive deviation, while the SODA3.4.2 and C-GLORSv7 datasets have relatively small deviations.

Fig. 3 Comparison of the annual mean SSS variations between ocean stations, eight reanalysis datasets, and the ARMOR3D multi-observation dataset in the YBS during 1996–2020 (a Shidao, b Xiaomaidao, and c Lianyungang)

Positive deviations in SST are observed at the Shidao, Xiaomaidao, and Laohutan stations. The largest deviations occur at Shidao, where BR ranges from 4.08% to 18.67% across the eight reanalysis datasets. In contrast, negative deviations in SST are observed at the Lianyungang and Lvsi stations. Figure 4 presents Taylor diagrams comparing the annual mean SST of the eight reanalysis datasets and the ARMOR3D multi-observation dataset with the ocean station observations in the YBS. Compared with the eight reanalysis datasets, ARMOR3D exhibits the highest COR and the lowest CRMSE at the Shidao, Xiaomaidao, and Zhifudao stations, and it also performs well at the remaining three stations. These findings indicate that ARMOR3D represents SST in the YBS robustly. Among the eight reanalysis datasets, GREPv2 and ORAS5 exhibit the highest COR and the lowest CRMSE. The SST deviations of CORAv1.0, however, are notably larger than those of the other reanalysis datasets, particularly at the Shidao, Xiaomaidao, and Lvsi stations (Fig. 4). The correlation coefficient also varies among stations: Laohutan has the highest correlation coefficients, all exceeding 0.90, whereas Lvsi has the lowest.

Fig. 4 Taylor diagram of the annual mean SST of eight reanalysis datasets and the ARMOR3D multi-observation dataset compared to the ocean station observations in the YBS (a Shidao, b Xiaomaidao, c Lianyungang, d Laohutan, e Zhifudao, and f Lvsi)

Similarly, the Taylor diagrams of the annual mean SSS of the reanalysis datasets are presented in Fig. 5. Overall, the correlations between the reanalysis datasets and the observations are low, typically below 0.50 in absolute value. The mean CRMSE of ARMOR3D over the three ocean stations is 0.68 psu, lower than that of any of the seven reanalysis datasets (0.82 to 1.15 psu). Among these seven reanalysis datasets, GREPv2 has the smallest mean CRMSE (0.82 psu) and SODA3.4.2 the largest (1.15 psu). Furthermore, compared with the ocean station observations, the reanalysis datasets consistently show a negative deviation in SSS. Notably, the maximum BR is observed at Xiaomaidao, indicating a substantial deviation at this location.

Fig. 5 Taylor diagram of the annual mean SSS of eight reanalysis datasets and the ARMOR3D multi-observation dataset compared to the ocean station observations in the YBS (a Shidao, b Xiaomaidao, and c Lianyungang)

3.1.2 Intrayear variation

Figure 6 shows the intrayear variation of SST in the eight reanalysis datasets, the ARMOR3D multi-observation dataset, and the ocean station observations. The overall seasonal cycles of the reanalysis datasets and the observations are consistent. The deviations of seasonal SST of the eight reanalysis datasets and ARMOR3D from the ocean station observations are shown in Fig. 7. Overall, the SST deviation is largest in summer, when the Shidao, Xiaomaidao, and Laohutan stations all show obvious positive deviations; at Shidao the positive deviation exceeds 4.00 °C. Winter SST is generally overestimated at the six stations. The deviations in spring and autumn are smaller than those in summer and winter. Compared with the eight reanalysis datasets, ARMOR3D generally deviates less from the observations. Among the reanalysis datasets, the deviations of ORAS5 in autumn and winter are generally smaller than those of the other reanalysis datasets, while the CORA series exhibits larger deviations than the other datasets. Furthermore, the seasonal SST deviation of the GREPv2 ensemble reanalysis dataset is 15% smaller than that of GLORYS2v4 and C-GLORSv7.
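The text does not spell out how the percentage reduction in seasonal deviation is computed. One plausible reading, sketched below, compares the mean absolute deviation of the ensemble product with that of a single member. The function names and all numbers are hypothetical and serve only to show the arithmetic.

```python
def mean_abs_deviation(estimates, observations):
    """Mean absolute deviation of estimates from observations."""
    return sum(abs(e - o) for e, o in zip(estimates, observations)) / len(observations)


def percent_reduction(reference_dev, candidate_dev):
    """Relative reduction (%) of candidate_dev with respect to reference_dev."""
    return (reference_dev - candidate_dev) / reference_dev * 100.0


# Made-up seasonal mean SST (°C): spring, summer, autumn, winter.
station = [9.0, 22.0, 17.0, 5.0]
member = [9.6, 23.4, 17.5, 6.1]      # a single reanalysis, made-up values
ensemble = [9.5, 23.2, 17.4, 5.9]    # an ensemble product, made-up values

dev_member = mean_abs_deviation(member, station)
dev_ensemble = mean_abs_deviation(ensemble, station)
print(f"{percent_reduction(dev_member, dev_ensemble):.1f}% smaller deviation")
```

The same two functions apply to the comparison between periods in Section 3.1.1, with the 1996–2010 deviation as the reference and the 2011–2020 deviation as the candidate.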

Fig. 6 Comparison of intrayear SST changes between eight reanalysis datasets, the ARMOR3D multi-observation dataset, and the ocean station observations in the YBS (1996–2020: a Shidao, b Xiaomaidao, and c Lianyungang; 2011–2020: d Laohutan, e Zhifudao, and f Lvsi)

Fig. 7 Deviations of seasonal SST between eight reanalysis datasets, the ARMOR3D multi-observation dataset, and the ocean station observations in the YBS (1996–2020: a Shidao, b Xiaomaidao, and c Lianyungang; 2011–2020: d Laohutan, e Zhifudao, and f Lvsi)

The deviations of seasonal SSS of seven reanalysis datasets and the ARMOR3D multi-observation dataset from the ocean station observations are shown in Fig. 8. The SSS of the reanalysis datasets is generally lower than observed at the three ocean stations, except for the CORA series. Comparing the seasonal differences of all the reanalysis datasets, the seasonal SSS deviations of C-GLORSv7 and SODA3.4.2 are the smallest. However, the negative deviations of GLORYS2v4 are larger than those of the other reanalysis datasets in all seasons, with the largest deviation of −4.86 psu in autumn at Xiaomaidao station. In addition, the deviation of the CORA series varies notably among stations: the deviations at Shidao and Xiaomaidao are considerably smaller than those of the other reanalysis datasets, while there is a positive deviation at Lianyungang. For the ARMOR3D multi-observation dataset, the salinity deviation at Shidao and Xiaomaidao is the smallest in comparison with the seven reanalysis datasets. However, a significant positive deviation is observed at Lianyungang, which can potentially be attributed to the substantial difference in the location of the sampling points. The center points of the sampling grid cells in ARMOR3D and CORAv1.0 are noticeably distant from the Lianyungang ocean station, approximately 0.37° and 0.74° away, respectively. In contrast, the corresponding distances for the other reanalysis datasets are less than 0.24°.

Fig. 8 Deviations of seasonal SSS between seven reanalysis datasets, the ARMOR3D multi-observation dataset, and the ocean station observations in the YBS from 1996–2020 (a Shidao, b Xiaomaidao, and c Lianyungang)

3.2 Horizontal variations of temperature and salinity

3.2.1 Annual mean variation

The spatial variation of regional mean annual SST and SSS in the YBS for eight reanalysis datasets, including ERA5, SODA3.4.2, GREPv2, C-GLORSv7, GLORYS2v4, ORAS5, CORAv1.0, and CORAv2.0, is compared with that of the ARMOR3D multi-observation dataset. Figure 9 shows the deviations of regional multi-year mean SST in the YBS on a scale from −2.1 to 2.1 °C. Over the region as a whole, the C-GLORSv7, GLORYS2v4, and CORAv2.0 datasets show large positive deviations, and the ERA5, ORAS5, and GREPv2 datasets show relatively small positive deviations. However, the SODA3.4.2 and CORAv1.0 datasets have obvious zones of negative SST deviation, found primarily in the central part of the North Yellow Sea, the eastern Yellow Sea along the Korean coast, and the boundary between the southern Yellow Sea and the East China Sea. The regional mean annual SST is overestimated by all datasets relative to ARMOR3D, with relative deviations ranging from 0.88% to 3.15%. In the northeastern Yellow Sea, the reanalysis datasets generally show positive deviations, especially SODA3.4.2, with deviations of 1.4 to 2.1 °C. In the Bohai Sea, SST is overestimated by all reanalysis datasets. In addition, the deviations of the GREPv2 ensemble dataset are lower than those of C-GLORSv7 and GLORYS2v4, indicating that the ensemble improves regional SST quality relative to the individual datasets. The results show that ERA5, ORAS5, and GREPv2 are relatively well suited to the regional multi-year mean SST in the YBS.

Fig. 9 Spatial distribution of the deviations of regional multi-year mean SST in the YBS (a ERA5 (1993–2020), b SODA3.4.2 (1993–2019), c GREPv2 (1993–2019), d C-GLORSv7 (1993–2019), e GLORYS2v4 (1993–2019), f ORAS5 (1993–2019), g CORAv1.0 (1993–2019), and h CORAv2.0 (1993–2019))

Similarly, the deviations of regional multi-year mean SSS for different reanalysis datasets in the YBS are shown in Fig. 10. The SSS deviations differ markedly among reanalysis datasets, ranging from −12.9 to 2.1 psu. Except for the CORAv2.0 dataset, the regional mean annual SSS is generally underestimated, with BR between −6.66% and −0.58%. The SODA3.4.2, GREPv2, C-GLORSv7, GLORYS2v4, and ORAS5 datasets mainly have negative deviations. However, both CORAv2.0 and SODA3.4.2 have positive deviations in the Bohai Sea, with SODA3.4.2 reaching a maximum deviation of 2.1 psu. Overall, the CORA series shows smaller SSS deviations than the other reanalysis datasets. In particular, CORAv1.0 stands out with the smallest BR in magnitude, −0.58%.

Fig. 10 Spatial distribution of the deviations of regional multi-year mean SSS in the YBS (a SODA3.4.2 (1993–2019), b GREPv2 (1993–2019), c C-GLORSv7 (1993–2019), d GLORYS2v4 (1993–2019), e ORAS5 (1993–2019), f CORAv1.0 (1993–2018), and g CORAv2.0 (1993–2019))

3.2.2 Intrayear variation

The intrayear variation of SST in the reanalysis datasets and the ARMOR3D multi-observation dataset is shown in Fig. 11a. The monthly SST in the reanalysis datasets follows a seasonal cycle similar to that of the observations. All reanalysis datasets overestimate SST in winter and spring, with positive deviations of 0.23 to 0.77 °C; ERA5 and ORAS5 have the smallest deviations (Fig. 11b). Except for CORAv1.0, the deviations of the reanalysis datasets decrease to < 0.25 °C in summer; in particular, the deviation of GREPv2 is only 0.04 °C. The SST of CORAv1.0 shows a negative deviation in summer, with a maximum deviation of −0.78 °C. Positive SST deviations are found for most reanalysis datasets in autumn. Consistent with the intrayear comparison at the ocean stations (Section 3.1.2), the seasonal SST deviation of the GREPv2 ensemble dataset is smaller than that of GLORYS2v4 and C-GLORSv7. In general, the ERA5 dataset has relatively good applicability for monthly SST variation.

Fig. 11 Monthly mean SST change (a) and deviations of seasonal mean SST between the reanalysis datasets and the ARMOR3D multi-observation dataset (b) in the YBS during 1993–2020

Figure 12a shows the intrayear variation of SSS in the reanalysis datasets and the ARMOR3D multi-observation dataset. The SSS deviation from the observations is large in summer, reaching a maximum in August. Six reanalysis datasets, including SODA3.4.2, GREPv2, C-GLORSv7, GLORYS2v4, ORAS5, and CORAv1.0, generally underestimate the SSS in all four seasons (Fig. 12b). Among them, CORAv1.0 has the smallest deviation, ranging from −0.31 to −0.01 psu, while GLORYS2v4 has the largest, with a maximum deviation of −3.46 psu. In contrast, CORAv2.0 generally overestimates the SSS throughout the year, with deviations of 0.24 to 0.62 psu. Seasonally, the SSS deviation of the reanalysis datasets is largest in summer and smallest in winter. Overall, CORAv1.0 performs relatively well in the above evaluation of monthly SSS variation.

Fig. 12 Similar to Fig. 11 but for SSS

3.3 Vertical variations of temperature and salinity

In this study, the ARMOR3D multi-observation dataset is also used as the reference to evaluate the vertical profiles of seven reanalysis datasets in the YBS from 1993 to 2020, namely SODA3.4.2, GREPv2, C-GLORSv7, GLORYS2v4, ORAS5, CORAv1.0, and CORAv2.0. No vertical ocean profile is available from ERA5 because it is an atmospheric reanalysis that provides only sea surface temperature. Because the YBS is shallow, with an average depth of only 18 m in the Bohai Sea and 44 m in the Yellow Sea and depths generally less than 80 m (Liu et al. 2007), the vertical temperature and salinity data were extracted for the upper 80 m for further analysis.
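Comparing profiles level by level requires the reanalysis and reference values at the same depths. The text does not say how differing vertical grids were reconciled, so the sketch below assumes linear interpolation onto common target depths within the upper 80 m, without extrapolation. The function names, depths, and temperatures are hypothetical.

```python
from bisect import bisect_left


def interp_profile(depths, values, target_depths):
    """Linearly interpolate a profile (depths ascending, in metres) onto target depths.

    Targets outside the sampled depth range yield None instead of being extrapolated.
    """
    out = []
    for z in target_depths:
        if z < depths[0] or z > depths[-1]:
            out.append(None)
            continue
        k = bisect_left(depths, z)
        if depths[k] == z:
            out.append(values[k])
            continue
        z0, z1 = depths[k - 1], depths[k]
        w = (z - z0) / (z1 - z0)
        out.append(values[k - 1] + w * (values[k] - values[k - 1]))
    return out


def deviation_profile(reanalysis, reference):
    """Level-by-level deviation, skipping levels where either value is missing."""
    return [None if r is None or o is None else r - o
            for r, o in zip(reanalysis, reference)]


# Made-up temperature profiles (°C), for illustration only.
targets = [z for z in (0, 10, 30, 60, 80, 100) if z <= 80]
rean = interp_profile([0, 20, 40, 80], [20.0, 18.0, 12.0, 8.0], targets)
ref = interp_profile([0, 10, 30, 60, 80], [19.5, 18.5, 14.0, 9.0, 7.5], targets)
print(deviation_profile(rean, ref))
```

Returning `None` rather than extrapolating keeps levels below the local sea floor out of the statistics, which matters in a basin as shallow as the YBS.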

3.3.1 Interannual variation

Deviations of annual mean vertical temperature between the reanalysis datasets and the ARMOR3D multi-observation dataset in the YBS during 1993–2020 are shown in Fig. 13a. The ocean temperature deviation in the upper 20 m is less than 1.0 °C for six reanalysis datasets, the exception being CORAv2.0. With increasing depth, the CORA series, GREPv2, C-GLORSv7, GLORYS2v4, and ORAS5 exhibit a positive bias, with a maximum deviation at 60 m depth. In contrast, SODA3.4.2 has the largest negative deviation, with a maximum value of −5.0 °C. The statistics of vertical temperature for the reanalysis datasets relative to ARMOR3D are given in Table 2. The correlation coefficients of five reanalysis datasets (CORAv1.0, GREPv2, C-GLORSv7, GLORYS2v4, and ORAS5) exceed 0.97, indicating strong agreement. However, the correlation coefficients of SODA3.4.2 and CORAv2.0 are lower, below 0.83. Overall, SODA3.4.2 has a large negative deviation, with a BR of −11.13%, while the other reanalysis datasets have positive deviations. In terms of CRMSE, C-GLORSv7 and GREPv2 have low values of 0.35 °C and 0.45 °C, respectively, whereas SODA3.4.2 has the largest CRMSE, 1.71 °C. Therefore, C-GLORSv7 and GREPv2 are relatively well suited to the mean vertical temperature in the YBS, while the deviation of SODA3.4.2 is considerably large.

Fig. 13 Deviations of annual mean vertical temperature (a) and vertical salinity (b) between the reanalysis datasets and the ARMOR3D multi-observation dataset in the YBS during 1993–2020

Table 2 The statistical analysis of annual mean vertical temperature and salinity between the reanalysis datasets and the ARMOR3D multi-observation dataset in the YBS during 1993 − 2020

Figure 13b shows the deviations of annual mean vertical salinity between the reanalysis datasets and ARMOR3D in the YBS during 1993–2020. The salinity deviations of the CORA series in the upper 20 m are relatively small, while the other five reanalysis datasets have relatively large negative deviations. The CORA series also remains applicable for vertical salinity at greater depths. Table 2 also gives the statistics of vertical salinity for the reanalysis datasets relative to ARMOR3D. Strong positive correlations of vertical salinity are found between the seven reanalysis datasets and ARMOR3D, with the CORA series having the highest values. However, GLORYS2v4 has the largest CRMSE, 0.95 psu.

3.3.2 Intrayear variation

Figure 14 shows the deviations of seasonal mean vertical temperature between the reanalysis datasets and ARMOR3D in the YBS during 1993 − 2020. Overall, the deviations in spring and winter are small for the six reanalysis datasets other than SODA3.4.2, generally less than 2 ℃. The deviations of vertical temperature increase in summer and autumn, and the larger deviations occur at 50 ~ 60 m depth, corresponding to the seasonal thermocline. The deviations of seasonal mean vertical salinity between the reanalysis datasets and ARMOR3D in the YBS during 1993 − 2020 are shown in Fig. 15. The salinity deviations in the upper 20 m vary greatly with season and are larger in summer than in the other seasons. Specifically, five reanalysis datasets (SODA3.4.2, GREPv2, C-GLORSv7, GLORYS2v4, and ORAS5) underestimate the salinity in the upper 20 m. In contrast, the CORA series agrees well with the ARMOR3D data in the YBS.

Fig. 14 Deviations of seasonal mean vertical temperature between the reanalysis datasets and the ARMOR3D multi-observation dataset in the YBS during 1993 − 2020 (a spring, b summer, c autumn, and d winter)

Fig. 15 Similar to Fig. 14 but for seasonal mean vertical salinity
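Seasonal mean profiles such as those compared in Figs. 14 and 15 can be obtained by averaging monthly profiles within each season. The sketch below assumes the conventional Northern Hemisphere grouping (spring March to May, summer June to August, autumn September to November, winter December to February); the season definition actually used in this study is not restated here, and the function name and input values are hypothetical.

```python
# Assumed season definition (months numbered 1..12); not taken from the paper.
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "winter": (12, 1, 2),
}


def seasonal_mean_profiles(monthly):
    """Average monthly profiles into seasonal profiles.

    monthly: dict mapping month number -> list of values, one per depth level.
    Returns a dict mapping season name -> list of seasonal means per depth level.
    """
    result = {}
    for season, months in SEASONS.items():
        profiles = [monthly[m] for m in months]
        # zip(*profiles) groups the values of the three months level by level.
        result[season] = [sum(level) / len(level) for level in zip(*profiles)]
    return result


# Hypothetical climatological monthly profiles with two depth levels each.
monthly = {m: [float(m), 2.0 * m] for m in range(1, 13)}
seasonal = seasonal_mean_profiles(monthly)
print(seasonal["summer"], seasonal["winter"])
```

Applying the same averaging to both the reanalysis and the reference profiles before differencing keeps the seasonal deviations comparable across datasets.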

The statistical analysis of monthly mean vertical temperature and salinity between the reanalysis datasets and the ARMOR3D multi-observation dataset in the YBS during 1993 − 2020 is shown in Table 3. The vertical temperature of all seven reanalysis datasets is strongly and positively correlated with ARMOR3D. The C-GLORSv7 dataset agrees best with the observed data, with the smallest CRMSE (1.05 ℃) and BR (2.75%). However, the deviations of SODA3.4.2 and CORAv2.0 are large, with bias ratios of − 11.25% and 13.17%, respectively. For monthly mean vertical salinity, only SODA3.4.2 has a relatively low correlation coefficient (0.57), while the coefficients of the remaining six reanalysis datasets exceed 0.85. In terms of vertical salinity deviations, the CORA series has the smallest CRMSE values, while GLORYS2v4 and SODA3.4.2 have considerably larger ones. Overall, the correlation and deviation analyses show that C-GLORSv7 and the CORA series have better applicability for vertical temperature and vertical salinity, respectively.

Table 3 The statistical analysis of monthly mean vertical temperature and salinity between the reanalysis datasets and the ARMOR3D multi-observation dataset in the YBS during 1993 − 2020

4 Discussion

In this study, we focus on the applicability of the ocean temperature and salinity of eight reanalysis datasets in the YBS and extend the assessment to several recent reanalysis datasets, including SODA3.4.2, ERA5, and CORAv2.0. Notably, we compare the ensemble product GREPv2 with individual reanalysis datasets in the YBS and examine the differences between the two versions of the CORA series (v1.0 and v2.0), both for the first time. As such, this study provides a more comprehensive assessment than previous studies.
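One general reason an ensemble product may compare favorably with its individual members is that errors of different sign partly cancel when the members are averaged. The following sketch illustrates this with made-up numbers; the member names and values are hypothetical and are not taken from GREPv2 or any dataset evaluated here.

```python
import math


def rmse(model, obs):
    """Root mean square error between two equally sampled series."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


# Hypothetical observations and three hypothetical ensemble members (deg C).
obs = [10.0, 12.0, 14.0, 16.0]
members = {
    "member_a": [10.8, 12.5, 13.6, 16.9],
    "member_b": [9.5, 11.2, 14.7, 15.4],
    "member_c": [10.3, 12.9, 14.2, 16.1],
}

# Unweighted ensemble mean, computed point by point.
ensemble_mean = [sum(vals) / len(vals) for vals in zip(*members.values())]

member_rmse = {name: rmse(vals, obs) for name, vals in members.items()}
ensemble_rmse = rmse(ensemble_mean, obs)

for name, value in member_rmse.items():
    print(f"{name}: RMSE = {value:.3f}")
print(f"ensemble mean: RMSE = {ensemble_rmse:.3f}")
```

The RMSE of an unweighted ensemble mean can never exceed the average RMSE of its members, but it beats the best single member only when the member errors are sufficiently independent, so the benefit depends on how different the underlying systems are.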

All reanalysis datasets reproduce the interannual variability of SST well, with correlation coefficients generally higher than 0.80. The SST in the YBS has shown a warming trend since 2011 (Fig. 2). In particular, the quality of SST in the ERA5 and ORAS5 reanalysis datasets has improved in the last decade. However, compared to SST, the deviation of SSS is relatively large, which is related to the late start of satellite observations of SSS (Lee and Gentemann 2018). It is worth mentioning that the quality of SSS in the CORA series has improved in the YBS over the last 14 years, benefiting from the larger number of salinity observations assimilated into the CORA series in recent years (Wu et al. 2013). Therefore, the acquisition of observational data for assimilation into reanalysis datasets should be strengthened to improve the quality of SSS.

Pronounced seasonal differences in the temperature and salinity deviations were also found in the YBS. The largest SST deviation generally occurs in winter (Figs. 7 and 11), because SST varies strongly with latitude in winter owing to differences in solar radiation, whereas intense solar radiation makes the SST more uniform in summer (Qiao et al. 2004), reducing the SST deviations among the reanalysis datasets. However, the largest deviation in vertical temperature occurs in summer (Fig. 14). A seasonal thermocline in the Yellow Sea appears in spring, reaches its maximum in summer, and gradually weakens in autumn until it disappears in winter, which corresponds closely to the seasonal variation of the deviation. The considerable positive deviation reflects that the reanalysis datasets underestimate thermocline depth. The large temperature gradient near the thermocline increases the temperature uncertainty, which leads to larger vertical temperature deviations of the reanalysis datasets in summer and autumn (Qiao et al. 2004). In addition, the seasonal variation of the salinity deviation (Figs. 12 and 15) shows that larger deviations occur in summer and autumn than in spring and winter. The largest salinity deviation occurs in summer, which is possibly related to the large amount of runoff entering the YBS in summer (Shi et al. 2017; Xie et al. 2019). The freshwater flux into the YBS is much larger in summer than in other seasons; thus, the uncertainty of the freshwater change leads to a larger salinity deviation in summer, especially in the upper 25 m of the vertical profiles.
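The sensitivity of temperature deviations to the thermocline can be illustrated with an idealized profile. In the sketch below, a smooth two-layer profile is compared with a copy whose thermocline is displaced vertically by 5 m; the hyperbolic-tangent shape and all parameter values (surface and bottom temperatures, thermocline depth and thickness, displacement) are hypothetical choices for illustration and are not fitted to the YBS data.

```python
import math


def idealized_profile(depths, center, surface=24.0, bottom=9.0, width=10.0):
    """Smooth two-layer temperature profile (deg C) with a thermocline at `center` (m)."""
    return [
        bottom + 0.5 * (surface - bottom) * (1.0 - math.tanh((z - center) / width))
        for z in depths
    ]


depths = [5.0 * k for k in range(17)]                 # 0, 5, ..., 80 m
reference = idealized_profile(depths, center=50.0)
displaced = idealized_profile(depths, center=55.0)    # thermocline moved by 5 m

# Magnitude of the temperature difference at each depth level.
abs_dev = [abs(d - r) for d, r in zip(displaced, reference)]
max_dev = max(abs_dev)
depth_of_max = depths[abs_dev.index(max_dev)]
surface_dev = abs_dev[0]

print(f"largest |deviation| = {max_dev:.2f} deg C at {depth_of_max:.0f} m")
print(f"|deviation| at the surface = {surface_dev:.4f} deg C")
```

A displacement of only a few metres produces a deviation of several degrees where the vertical gradient is steep and almost none in the well-mixed layers, which is consistent with the deviations being concentrated near the seasonal thermocline in summer and autumn.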

The SST and SSS deviations in the reanalysis datasets are spatially heterogeneous (Figs. 9 and 10), and the regions with high deviations generally lie in coastal waters. For example, there are large positive deviations of SST and large negative deviations of SSS on the west coast of the Bohai Sea and the east coast of the Yellow Sea, respectively. A similar situation has been observed in New Zealand waters, where four reanalysis datasets show large errors at the sea-land boundary (de Souza et al. 2021). This is possibly due to the lack of observational data in nearshore waters and to uneven interpolation (Hu et al. 2015). In addition, most reanalysis datasets have positive deviations of SSS extending from south to north in the central Yellow Sea. It is worth mentioning that this region partially overlaps with the Yellow Sea Warm Current (YSWC), which is characterized by high temperature and salinity. Therefore, the salinity in the YSWC is generally overestimated in the reanalysis datasets.

This paper also compares the applicability of temperature and salinity in the YBS across successive versions of a reanalysis dataset, namely CORAv1.0 and CORAv2.0. The spatial resolution of CORAv1.0 is 1/4°, and the newly released CORAv2.0 improves the spatial resolution to 1/10°. The results show that the SST of CORAv2.0 is closer to the observed values in the YBS. However, the quality of CORAv2.0 has not improved for vertical temperature and ocean salinity. Specifically, the CRMSE values of monthly mean vertical temperature and monthly mean vertical salinity of CORAv2.0 are 2.09 ℃ and 0.39 psu, respectively, both higher than those of CORAv1.0. This may be partially related to the different ocean models and assimilation methods used in the two reanalysis datasets (Table 1). Moreover, some studies suggest that, for certain parameters, an earlier version of a reanalysis can perform better than the newly published version. For example, He and Zhao (2018) pointed out that the CFSR dataset has a larger error in the daily mean air temperature over central China than the older NCEP-2 dataset. Hersbach et al. (2020) also reported that ERA5 has a larger cold bias in the lower stratospheric air temperature than the older ERA-Interim data. The outcome of a comparison between different versions of a reanalysis dataset therefore depends closely on the specific parameter considered. Moreover, beyond the CORA series, higher-resolution versions of the products examined in this study, such as GLORYS12 (Jean-Michel et al. 2021), should be included in future comparisons.

This study also has some limitations. The bilinear interpolation method introduces some errors at the coastal boundary (Zhang et al. 2021), which increases the uncertainty of the ocean temperature and salinity data. Furthermore, the reasons for the differences among the reanalysis datasets, such as the ocean models and the assimilation methods used, should be investigated in greater depth in future research.
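The coastal limitation of bilinear interpolation can be seen in a minimal sketch. Bilinear interpolation needs valid values at all four corners of the enclosing grid cell, so a target point next to the coast, where one or more corners fall on land, must either be left undefined or filled by some additional rule. The function below simply returns `None` in that case; how land points were actually treated in this study is not specified here, and the coordinates and values are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Bilinear interpolation inside the grid cell [x0, x1] x [y0, y1].

    q00 is the value at (x0, y0), q10 at (x1, y0), q01 at (x0, y1),
    and q11 at (x1, y1). Returns None if any corner value is missing
    (for example, a land point), which is an illustrative choice only.
    """
    if any(q is None for q in (q00, q10, q01, q11)):
        return None
    tx = (x - x0) / (x1 - x0)          # fractional position along x
    ty = (y - y0) / (y1 - y0)          # fractional position along y
    lower = q00 * (1.0 - tx) + q10 * tx
    upper = q01 * (1.0 - tx) + q11 * tx
    return lower * (1.0 - ty) + upper * ty


# Hypothetical cell spanning 120-121 E and 35-36 N, evaluated at its center.
open_sea = bilinear(120.5, 35.5, 120.0, 121.0, 35.0, 36.0, 10.0, 12.0, 14.0, 16.0)
coastal = bilinear(120.5, 35.5, 120.0, 121.0, 35.0, 36.0, 10.0, None, 14.0, 16.0)
print(open_sea, coastal)
```

Any rule used to fill the missing corner (nearest-neighbor filling, extrapolation, or renormalized weights) introduces an additional error that is confined to nearshore points, which is one plausible contribution to the larger coastal deviations.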

5 Conclusion

In this paper, we use ocean station observations and the ARMOR3D multi-observation dataset to comprehensively evaluate the applicability of eight reanalysis datasets (ERA5, SODA3.4.2, GREPv2, C-GLORSv7, GLORYS2v4, ORAS5, CORAv1.0, and CORAv2.0) in the YBS. The main conclusions are as follows:

  1. The ERA5, ORAS5, and GREPv2 reanalysis datasets reproduce the SST well, while C-GLORSv7 and GREPv2 better reflect the vertical ocean temperature variation. In contrast, the SST of the CORA series and the vertical temperature of SODA3.4.2 deviate noticeably from the observed data. Except for GLORYS2v4 and SODA3.4.2, the other six reanalysis datasets reflect the changes in ocean salinity to some extent. Overall, the GREPv2 dataset agrees best with the temperature observations, while the CORA series agrees well with the observed salinity data in the YBS.

  2. The quality of ocean temperature and salinity from GREPv2 is better than that of its individual member datasets (C-GLORSv7, GLORYS2v4, and ORAS5), which shows that a multi-model ensemble can reduce the deviations of individual datasets to some extent.

  3. Most reanalysis datasets reproduce the interannual variation of SST in the YBS well, with some improvement in performance over the last decade. However, future work should focus on improving the quality of regional ocean salinity in reanalysis datasets, in particular by strengthening the acquisition of observational data.

  4. Large deviations of SST and SSS usually occur in coastal waters, such as the western coast of the Bohai Sea and the eastern coast of the Yellow Sea. The accuracy of reanalysis datasets in the nearshore region should be further improved.