1 Introduction

The Mediterranean is a semi-enclosed sea, connected to the Atlantic Ocean through the Strait of Gibraltar in the west and to the Sea of Marmara and the Black Sea through the Dardanelles Strait in the northeast (Fig. 1). Its pelagic ecosystem may be characterized as oligotrophic, exhibiting a well-defined eastward decreasing trend in primary productivity (Moutin and Raimbault 2002) that is related to the anti-estuarine circulation at Gibraltar, with inflowing nutrient-poor surface Atlantic waters and outflowing subsurface Mediterranean waters. The eastern basin, separated by the shallow Sicily Strait (~500 m), is recognized as one of the most oligotrophic areas in the world (Azov 1991). Phosphorus is the limiting nutrient for phytoplankton and bacterial growth, with decreasing concentrations from west to east (Krom et al. 2004). Primary production is mainly controlled by vertical mixing processes that supply the euphotic zone with deep water nutrients, reaching its maximum between December and April and its minimum between June and September. The seasonal cycle is stronger in areas characterized by deep water formation, such as the Gulf of Lions in the northwestern Mediterranean, which is one of the most productive areas in the Mediterranean (Morel et al. 1991; Bosc et al. 2004). Relatively increased productivity is also found in areas receiving river nutrient inputs (Fig. 1), such as the Northern Adriatic, the North Aegean, and the Gulf of Lions.

Fig. 1 Mediterranean model domain and bathymetry (m). Major rivers and straits are indicated, along with the Gulf of Lions (black box), Adriatic Sea (red box), and Levantine basin (green box), where averages are calculated in Figs. 5, 7, and 8

Numerical models are now routinely used to simulate the dynamics of marine ecosystems, which are subject to changes from climate and human pressures, providing a valuable tool for the management of marine resources. However, the models' ability to accurately simulate the space-time variability of the marine environment is limited by various sources of uncertainty, such as model structure and parameterization, as well as the quality of initial conditions and meteorological forcing. In this respect, the pronounced characteristics of the Mediterranean ecosystem, such as the P-limited oligotrophy and the role of physics as the main driver of ecosystem processes, significantly add to the complexity. Data assimilation is a process of nudging the model simulations toward available observations to reduce the uncertainties in the model outputs. This approach is widely used in atmospheric and ocean sciences and has also become popular in biogeochemical ocean applications in recent decades (see review in Edwards et al. 2015). In the Mediterranean Sea in particular, biological data assimilation has been applied regionally in the Ligurian Sea (Lenartz et al. 2007; Kalaroni et al. 2016), the Northwest coast of Spain (Torres et al. 2006), the Cretan Sea (Allen et al. 2003; Hoteit et al. 2003, 2004; Triantafyllou et al. 2003, 2007, 2012; Kalaroni et al. 2016), and at basin scale (Hoteit et al. 2005; Teruzzi et al. 2014).

Ensemble Kalman filters (EnKFs) are currently among the most popular data assimilation techniques due to their efficiency and robustness in dealing with large-scale nonlinear systems and their reasonable computational cost. Different types of EnKFs are now commonly used for physical (e.g., Hoteit et al. 2002; Evensen 2003; Nerger et al. 2006; Xu et al. 2013; Hoteit et al. 2013; Hoteit et al. 2005) and biochemical (Natvik and Evensen 2003; Triantafyllou et al. 2003; Nerger and Gregg 2007; Simon and Bertino 2009; Ciavatta et al. 2011, 2014; Hu et al. 2012) ocean applications.

In an EnKF, the distribution of the state of the system, conditioned on available observations, is represented by a set of state vectors called an ensemble. The standard forecast-update steps of the Kalman filter are then implemented as follows. The ensemble members are first propagated in time with the model to estimate the forecast and its error covariance as the sample mean and covariance of the forecasted members. These are then used in the Kalman correction step to update the forecast with the incoming observations. In practice, a sufficiently large, but computationally demanding, ensemble is needed to describe the state distribution well (Pham 2001; Song et al. 2010). A small ensemble may result in an underestimation of the state variance and a degenerate analysis, characterized by an ensemble collapse (Whitaker and Hamill 2002). This limitation is often mitigated by introducing appropriate inflation and localization (Edwards et al. 2015). Another approach that has proven efficient for enhancing the performance of the EnKF when implemented with small ensembles is the so-called hybrid scheme (Hamill and Snyder 2000). It combines the error covariance estimated from an EnKF ensemble with a pre-selected static background error covariance representing the climatology of the system statistics (Hamill and Snyder 2000; Song et al. 2010; Lui et al. 2016). The hybrid method was found particularly efficient when the filter is implemented with a small ensemble and also in the presence of model error (Wang et al. 2008; Counillon et al. 2009; Song et al. 2010).

Hybrid assimilation schemes have been successfully applied in meteorology (e.g., Hamill and Snyder 2000; Etherton and Bishop 2004; Wang et al. 2008) and more recently in ocean forecasting (Counillon et al. 2009) and sub-surface flows (Gharamti et al. 2014). In the present study, a hybrid scheme was developed and implemented for efficient data assimilation into a marine ecosystem model of the Mediterranean Sea. The scheme combines the ensemble-based forecast error covariance of the singular evolutive interpolated Kalman (SEIK, Pham 2001) filter with a static background covariance built from a set of empirical orthogonal functions (EOFs), as in the singular fixed extended Kalman (SFEK, Hoteit et al. 2002) filter. The hybrid algorithm was implemented and tested for assimilation of satellite Chl-a data into a three-dimensional ecosystem model of the Mediterranean. Data assimilation for marine ecosystems can be quite challenging, given the large number of state variables and the required computational cost, the short time scales, and the nonlinear nature of biogeochemical processes (Edwards et al. 2015). In particular, for the shelf and regional areas of Mediterranean ecosystems, these shortcomings have been analyzed and discussed in the works of Triantafyllou et al. (2007) and Triantafyllou et al. (2005), supporting the development of a more efficient Kalman scheme. The main aim of this study is to assess the performance of the new hybrid scheme for data assimilation and to test its efficiency and robustness before implementing it operationally as part of the POSEIDON forecasting system (www.poseidon.hcmr.gr).

The paper is organized as follows. Section 2 briefly describes the coupled hydrodynamic/biogeochemical model. Section 3 presents the ensemble hybrid data assimilation scheme. Section 4 assesses the performance of the new hybrid scheme against SEIK and SFEK, first with regard to the assimilated data (Chl-a) and then with regard to non-assimilated model variables, such as dissolved inorganic nutrients. In section 5, the performance of the assimilation schemes is discussed in relation to the ensemble spread and forecast update. The performance of the hybrid scheme is also further analyzed through a series of sensitivity experiments, to investigate the effect of the dynamic ensemble size and the blending parameter. Concluding remarks are offered in section 6.

2 Materials and methods

2.1 Model description

A three-dimensional coupled hydrodynamic/biogeochemical model is implemented at Mediterranean basin scale (Fig. 1). The coupled model comprises the Princeton Ocean Model (POM, Blumberg and Mellor 1983) and the European Regional Seas Ecosystem Model (ERSEM, Baretta et al. 1995). POM is a primitive equation, free-surface, sigma-coordinate model that employs a level-2.5 turbulence closure scheme (Mellor and Yamada 1982) to compute vertical viscosity/diffusivity. It is a widely used community model (www.ccpo.odu.edu/POMWEB) with numerous applications in coastal and open ocean studies, including the Mediterranean (Zavatarelli and Mellor 1995; Horton et al. 1997; Korres and Lascaratos 2003). ERSEM is a comprehensive generic biogeochemical model that has been successfully implemented in various coastal and open sea ecosystems, such as the North Sea (Pätsch and Radach 1997), the oligotrophic Mediterranean (Allen et al. 2002; Petihakis et al. 2002), and the Arabian Seas (Blackford and Burkill 2002; Triantafyllou et al. 2014), among others. It follows the functional group approach, describing the pelagic plankton food web with four phytoplankton groups (diatoms, nanoplankton, picoplankton, and dinoflagellates), three zooplankton groups (heterotrophic nanoflagellates, microzooplankton, and mesozooplankton), and bacteria. Its variables also include dissolved and particulate organic matter, along with dissolved inorganic nutrients (phosphate, nitrate, ammonium, and silicate), while carbon dynamics are loosely coupled with nitrogen and phosphorus dynamics, as plankton groups have dynamically varying C:N:P internal pools (in total, 38 pelagic variables are prognostically computed). We refer the reader to Petihakis et al. (2002), Tsiaras et al. (2014), and Petihakis et al. (2014) for more details on the biogeochemical model description and implementation.

The coupled hydrodynamic/biogeochemical model is currently operational in the Mediterranean, providing 4-day forecasts (without data assimilation) for dissolved inorganic nutrients, plankton biomass, and production, as part of the POSEIDON forecasting system (www.poseidon.hcmr.gr; Korres et al. 2007; Tsiaras et al. 2010). It successfully reproduces the main features of the Mediterranean ecosystem, such as the east-west gradient of productivity and phosphorus limitation, the seasonal cycle of plankton production that is controlled by vertical mixing processes, and the increase of productivity in coastal areas receiving river nutrient inputs (Tsiaras et al. 2010).

In this study, simulations were performed for the year 2000, and the atmospheric forcing was obtained from the POSEIDON operational weather forecast (Papadopoulos et al. 2002). Freshwater discharge and nutrient inputs from 25 major rivers in the Mediterranean were obtained from hydrological/nutrient emission modeling of the Mediterranean drainage basin (Ludwig et al. 2009). Open boundary conditions near the Strait of Gibraltar (Fig. 1) were obtained from available climatologies for dissolved inorganic nutrients (MEDATLAS 2002, www.ifremer.fr/medar/) and temperature/salinity (MODB-MED4), while radiation conditions were adopted for current velocities. The Dardanelles water exchange, an important mechanism for the ecosystem of the Aegean Sea (Petihakis et al. 2014), is parameterized through a two-layer open boundary condition (Nittis et al. 2006), with prescribed climatological data of seasonally varying water inflow and dissolved inorganic nutrients (Tugrul et al. 2002).

2.2 Observational dataset

Assimilated observations consist of remote sensing Chl-a, retrieved by SeaWiFS (Sea-viewing Wide Field-of-view Sensor) and processed using the Ocean Chlorophyll 4-version 4 (OC4-v4) algorithm (O’Reilly et al. 1998). The SeaWiFS data are 8-day composite products (9 × 9 km² resolution) for the year 2000. In addition to assessing the performance of the different filters in improving the model Chl-a estimate, we also evaluated the filters in terms of their impact on non-assimilated variables, such as dissolved inorganic nutrients, which are primary constituents in the biogeochemical model dynamics. For that purpose, given the limited data availability for specific years, a seasonal “climatology” of near-surface (0–50 m) in situ data was constructed by aggregating available observations in the Mediterranean by season over the 1990–1999 period, obtained from the SeaDataNet database (www.seadatanet.org). The simulated seasonal mean phosphate and nitrate concentrations, extracted at the data locations, were compared against the observations for each season. The model performance, particularly the efficiency of data assimilation in reproducing the spatial/seasonal variability of nutrients, was assessed by calculating the following model skill indexes over all model (M)/data (D) pairs (n) (e.g., Stow et al. 2009; Jolliff et al. 2009):

$$ \bullet \kern1em Percentage\ model\ bias\; PBIAS=\frac{\sum \left( D- M\right)}{\sum D}\times 100, $$

which computes the percentage normalized difference between the model and data mean. It basically evaluates whether the model systematically underestimates or overestimates the observations.

$$ \bullet \kern1em Pearson\ correlation\ coefficient\; PCC=\frac{\sum \left( D-\overline{D}\right)\left( M-\overline{M}\right)}{\sqrt{\sum {\left( D-\overline{D}\right)}^2\cdot \sum {\left( M-\overline{M}\right)}^2}}, $$

where over-bar denotes a mean value. This provides a measure of whether the model is able to reproduce the observed spatial variability. PCC = 1 indicates a perfect correlation.

$$ \bullet \kern1em RMS\ model\ error\; RMSE=\sqrt{\sum {\left( D- M\right)}^2/ n}, $$

which gives the overall goodness of fit between model and data values, with RMSE = 0 indicating a perfect fit.

$$ \bullet \kern1em Normalized\ standard\ deviation\; NSTD=\sigma (M)/\sigma (D), $$

which measures whether the model exhibits a similar overall variability to the observed (NSTD = 1).
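As an illustration only, the four skill indexes can be sketched in a few lines of Python. The function name `skill_indexes` and the sample values are hypothetical and not taken from the study; NSTD is computed here as the ratio of standard deviations, following the definition given in the Table 2 caption, STD(model)/STD(data).

```python
import math

def skill_indexes(data, model):
    """Return PBIAS (%), PCC, RMSE and NSTD for matched data (D) / model (M) pairs."""
    n = len(data)
    if n != len(model) or n < 2:
        raise ValueError("need at least two matched model/data pairs")
    d_mean = sum(data) / n
    m_mean = sum(model) / n
    # Sums of squared anomalies and of cross products
    ss_d = sum((d - d_mean) ** 2 for d in data)
    ss_m = sum((m - m_mean) ** 2 for m in model)
    ss_dm = sum((d - d_mean) * (m - m_mean) for d, m in zip(data, model))
    pbias = 100.0 * sum(d - m for d, m in zip(data, model)) / sum(data)
    pcc = ss_dm / math.sqrt(ss_d * ss_m)
    rmse = math.sqrt(sum((d - m) ** 2 for d, m in zip(data, model)) / n)
    nstd = math.sqrt(ss_m / ss_d)  # ratio of standard deviations (assumption, see text)
    return {"PBIAS": pbias, "PCC": pcc, "RMSE": rmse, "NSTD": nstd}

# Hypothetical example: the model reproduces the pattern but at half the amplitude
obs = [0.10, 0.20, 0.30, 0.40]
sim = [0.05, 0.10, 0.15, 0.20]
scores = skill_indexes(obs, sim)
```

In this made-up case the model is perfectly correlated with the data (PCC = 1) while underestimating it, which gives a positive PBIAS and NSTD < 1. This shows why the indexes need to be read together.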

3 The hybrid SEIK filter

The hybrid ensemble scheme proposed in this study uses a combination of covariances from the singular evolutive interpolated Kalman (SEIK; Pham 2001) and the singular fixed extended Kalman (SFEK, Hoteit et al. 2002; Hoteit and Pham 2004) filters in the Kalman update step. SFEK uses a static background covariance, built from a set of EOFs, while SEIK employs a flow-dependent error covariance, estimated from a stochastically generated ensemble (Hoteit et al. 2012). HYBRID bears some similarities with the singular semi-evolutive interpolated Kalman (SSEIK, Hoteit et al. 2002) filter, in the sense that its algorithm also involves static and flow-dependent covariances. In SSEIK, however, only a “part” of the filter covariance is integrated in time with the model during the forecast step, through a well-chosen reduced ensemble that is sampled at every analysis step based on certain criteria (for example, propagating the part of the covariance that represents the dominant error modes as extracted by a singular value decomposition, SVD).

3.1 SEIK filter

SEIK operates as a succession of three consecutive steps: a sampling step to generate the ensemble from the filter’s estimate and its covariance, a forecast step to integrate the ensemble forward with the model, and a Kalman update step of the forecast ensemble mean and covariance with the incoming observations.

3.1.1 Sampling step

Starting from an available analysis state \( X^a(t_k) \) and a low-rank (r) error covariance \( P^a(t_k) = L_k U_k L_k^T \) at a given time \( t_k \), an ensemble of N = r + 1 states \( {X}_1^a\left({t}_k\right),\dots, {X}_N^a\left({t}_k\right) \) is randomly drawn after every assimilation cycle in such a way that their sample mean and covariance exactly match \( X^a(t_k) \) and \( P^a(t_k) \) (note that N = r + 1 is the smallest ensemble that can describe a rank-r covariance matrix; Pham 2001), i.e.,

$$ {X}^a\left({t}_k\right)=\frac{1}{N}\sum_{i=1}^N{X}_i^a\left({t}_k\right), $$
(1)
$$ {P}^a\left({t}_k\right)=\frac{1}{N}\sum_{i=1}^N\left[{X}_i^a\left({t}_k\right)-{X}^a\left({t}_k\right)\right]\kern0.5em {\left[{X}_i^a\left({t}_k\right)-{X}^a\left({t}_k\right)\right]}^T $$
(2)

\( U_k \) is an r × r matrix and \( L_k \) is the so-called filter correction basis of dimension n × r, with n being the size of the system state \( X(t_k) \). To generate \( {X}_i^a\left({t}_k\right) \), one may use the second-order exact sampling technique (Pham 2001; Hoteit et al. 2002), in which the i-th ensemble member is computed as

$$ {X}_i^a\left({t}_k\right)={X}^a\left({t}_k\right)+\sqrt{r+1}\kern0.1em {L}_k{\left({\Omega}_{k, i}{C}_k^{-1}\right)}^T, $$
(3)

where \( C_k \) is a square-root matrix of \( U_k^{-1} \), i.e., \( C_k C_k^T = U_k^{-1} \) (it can be obtained by Cholesky decomposition), and \( \Omega_{k,i} \) denotes the i-th row of a randomly generated matrix \( \Omega_k \), whose columns are orthonormal and orthogonal to the vector \( [1 \cdots 1]^T \), so that the \( {X}_i^a\left({t}_k\right) \) satisfy (1) and (2).
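The sampling step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. It assumes that \( C_k \) satisfies \( C_k C_k^T = U_k^{-1} \), which is what Eqs. 2 and 3 jointly require, and it builds \( \Omega_k \) by centering a random Gaussian matrix and orthonormalizing it with a QR decomposition, which is one of several possible constructions. The function name and the dimensions are hypothetical.

```python
import numpy as np

def second_order_exact_sample(x_a, L, U, rng):
    """Draw N = r + 1 members whose sample mean is x_a and covariance is L U L^T."""
    n, r = L.shape
    N = r + 1
    # Random N x r matrix with orthonormal columns orthogonal to [1 ... 1]^T:
    # centering removes the component along the ones vector, QR orthonormalizes.
    G = rng.standard_normal((N, r))
    G -= G.mean(axis=0)
    Omega, _ = np.linalg.qr(G)
    # Lower-triangular C with C C^T = U^{-1} (assumption stated in the text)
    C = np.linalg.cholesky(np.linalg.inv(U))
    # Column i of the anomalies is sqrt(N) * L (Omega_i C^{-1})^T, as in Eq. 3
    anomalies = np.sqrt(N) * L @ np.linalg.solve(C.T, Omega.T)
    return x_a[:, None] + anomalies

# Hypothetical small problem: state size n = 6, rank r = 3
rng = np.random.default_rng(0)
n, r = 6, 3
x_a = rng.standard_normal(n)
L = rng.standard_normal((n, r))
B = rng.standard_normal((r, r))
U = B @ B.T + r * np.eye(r)          # symmetric positive definite
ens = second_order_exact_sample(x_a, L, U, rng)

mean = ens.mean(axis=1)
dev = ens - mean[:, None]
P_sample = dev @ dev.T / ens.shape[1]  # 1/N normalization, as in Eq. 2
```

With this construction, `mean` and `P_sample` reproduce `x_a` and `L @ U @ L.T` to rounding error, whatever random matrix is drawn. This is the sense in which the sampling is "second-order exact".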

3.1.2 Forecast step

The generated ensemble members \( X_i^a(t_k) \) are propagated forward with the model (M) to compute the forecast ensemble \( {X}_i^f\left({t}_{k+1}\right)= M\left({t}_{k+1},{t}_k\right){X}_{i}^a\left({t}_k\right) \). The forecast state \( X^f(t_{k+1}) \) and its error covariance matrix \( P^f(t_{k+1}) \) are then estimated as the sample mean and covariance of the \( {X}_i^f\left({t}_{k+1}\right) \), respectively. One can then decompose \( P^f(t_{k+1}) \) as

$$ {P}^f\left({t}_{k+1}\right)={L}_{k+1}{\left[ N{T}^T T\right]}^{-1}{L_{k+1}}^T, $$
(4)

with

$$ {L}_{k}=\left[{X}_1^f\left({t}_{k}\right),\kern0.5em \dots, \kern0.5em {X}_{r+1}^f\left({t}_{k}\right)\right]\cdot T $$
(5)

and T is a (r + 1) × r orthogonal matrix with zero column sums (Hoteit et al. 2002).
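The decomposition can be checked numerically with a short sketch. The reduced matrix is taken as \( {[N T^T T]}^{-1} \), as it also appears in Eq. 12, and T is one simple full-rank choice with zero column sums (an identity block minus 1/N). This particular T is an assumption made for illustration and is not necessarily the matrix used by the authors; the ensemble values are random placeholders.

```python
import numpy as np

def seik_forecast_factors(X_f):
    """Return L, the reduced forecast covariance and T for an n x N forecast ensemble."""
    n, N = X_f.shape
    r = N - 1
    # (r + 1) x r matrix with zero column sums (illustrative choice)
    T = np.vstack([np.eye(r), np.zeros((1, r))]) - 1.0 / N
    L = X_f @ T                          # Eq. 5: correction basis
    U_f = np.linalg.inv(N * (T.T @ T))   # reduced covariance [N T^T T]^{-1}
    return L, U_f, T

# Hypothetical forecast ensemble: state size 5, N = 4 members
rng = np.random.default_rng(1)
X_f = rng.standard_normal((5, 4))
L, U_f, T = seik_forecast_factors(X_f)

# Sample covariance with 1/N normalization, as in Eq. 2
dev = X_f - X_f.mean(axis=1, keepdims=True)
P_sample = dev @ dev.T / X_f.shape[1]
P_lowrank = L @ U_f @ L.T
```

Because the columns of T sum to zero, multiplying the ensemble by T removes the ensemble mean implicitly, so `P_lowrank` matches `P_sample` without the full n × n covariance ever being needed in the filter.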

3.1.3 Analysis step

Once a new observation \( y_{k+1} \) becomes available, the forecast and its error covariance matrix are updated exactly as in the Kalman filter, as

$$ {X}^a\left({t}_{k+1}\right)={X}^f\left({t}_{k+1}\right)+{K}_{k+1}\left[{y}_{k+1}-{H}_{k+1}\left({X}^f\left({t}_{k+1}\right)\right)\right] $$
(6)
$$ {P}^a\left({t}_{k+1}\right)={L}_{k+1}{U}_{k+1}{L_{k+1}}^T, $$
(7)

where \( K_{k+1} \) is the so-called Kalman gain, given by

$$ {K}_{k+1}={L}_{k+1}{U}_{k+1}{(HL)}_{k+1}^T{R}_{k+1}^{-1} $$
(8)
$$ {(HL)}_{k+1}=\left[{H}_{k+1}\left({X_1}^f\left({t}_{k+1}\right)\right),\dots, {H}_{k+1}\left({X_{r+1}}^f\left({t}_{k+1}\right)\right)\right]\cdot T, $$
(9)

with \( H_{k+1} \) the observational operator at time \( t_{k+1} \), which in practice computes the predictions of the observations by the forecast members. \( R_{k+1} \) is the observational error covariance, and \( U_{k+1} \) is computed from

$$ {U}_{k+1}^{-1}=\rho \frac{1}{N}{\left({T}^T T\right)}^{-1}+{(HL)}_{k+1}^T{R}_{k+1}^{-1}{(HL)}_{k+1}. $$
(10)

ρ is the so-called “forgetting” factor, taking values between 0 and 1. It is used to inflate the error covariance (by 1/ρ) to account for the various sources of errors, assigning more or less confidence to the observations relative to the model forecast (Hoteit et al. 2002).

Localization is applied as described by Nerger et al. (2006) to filter out long-range spurious correlations and to allow more degrees of freedom to fit the data. This is implemented in practice by updating the model state variables at each grid cell using only observations within a specified cutoff radius.
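The observation selection underlying this localization can be illustrated with a short sketch. The helper names, the use of great-circle (haversine) distance, and the sample coordinates are assumptions made for illustration; the 30 km cutoff is the value adopted in Section 4.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def local_observations(cell, observations, cutoff_km):
    """Indices of observations lying within cutoff_km of a grid cell (lat, lon)."""
    lat, lon = cell
    return [i for i, (olat, olon, _value) in enumerate(observations)
            if great_circle_km(lat, lon, olat, olon) <= cutoff_km]

# Hypothetical grid cell and (lat, lon, Chl-a) observations
cell = (35.0, 25.0)
obs = [(35.0, 25.1, 0.08), (35.2, 25.0, 0.07), (36.0, 25.0, 0.12)]
selected = local_observations(cell, obs, cutoff_km=30.0)
```

Here the first two observations lie roughly 9 km and 22 km from the cell and are retained, while the third, about 111 km away, is excluded from the local update.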

3.2 SFEK filter

The update step of the SEIK filter is only applied along the directions of \( L_k \), which is hence called the correction basis of the filter (Pham 2001). \( L_k \) is updated in time to follow changes in the system dynamics, but this can be computationally demanding. Noticing that the filter generally behaves well when the update step is always applied along an initial correction basis \( L_0 \) described by a set of EOFs computed from a historical model trajectory, Hoteit et al. (2002) suggested keeping \( L_0 \) invariant in time to drastically reduce the computational load. In the resulting SFEK filter, the forecast step only computes the forecast state, by integrating the analysis state with the model, \( X^f(t_{k+1}) = M(t_{k+1}, t_k) X^a(t_k) \). \( L_k \) is kept invariant, equal to \( L_0 \), and \( U_k \) is still updated as in Eq. 10. The analysis step is then identical to that of the SEIK filter (Eqs. 6–8).

3.3 HYBRID filter

The HYBRID scheme uses a weighted combination of the (low-rank and already scaled/inflated) covariances of SEIK and SFEK, following the formulation of Hamill and Snyder (2000). More specifically, HYBRID uses

$$ {P_{k+1}}^{Hybrid}=\left(1-\alpha \right)\cdot {P_{k+1}}^{SEIK}+\alpha \cdot {P_{k+1}}^{SFEK} $$
(11)

as background covariance in the SEIK filter update step, where α is the weighting factor between 0 and 1. Based on the low-rank decomposition of \( P^{SEIK} \) and \( P^{SFEK} \), one can write:

$$ \begin{aligned}{P_{k+1}}^{Hybrid}&=\left(1-\alpha \right)\cdot {L_{k+1}}^{SEIK}{\left[ N{T}^T T\right]}^{-1}{\left({L_{k+1}}^{SEIK}\right)}^T+\alpha \cdot {L}^{SFEK}{U_k}^{SFEK}{\left({L}^{SFEK}\right)}^T\\ &=\left[{L_{k+1}}^{SEIK}|{L}^{SFEK}\right]\cdot \left[\begin{array}{cc}\left(1-\alpha \right)\cdot {\left( N{T}^T T\right)}^{-1} & 0\\ 0 & \alpha \cdot {U_k}^{SFEK}\end{array}\right]\cdot {\left[{L_{k+1}}^{SEIK}|{L}^{SFEK}\right]}^T\\ &={L_{k+1}}^{Hybrid}{U_k}^{Hybrid}{\left({L_{k+1}}^{Hybrid}\right)}^T,\end{aligned} $$
(12)

which can then be directly used in Eqs. 6–10 to update the forecast with a blended correction basis of flow-dependent and climatological (static) directions.
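The block-matrix form of Eq. 12 can be verified with a small sketch. The function name `hybrid_factors`, the state size, the random matrices, and the value α = 0.5 are hypothetical; the ranks follow the configuration described in Section 4 (a dynamic ensemble of 10 members, hence rank 9, and a static basis of rank 16).

```python
import numpy as np

def hybrid_factors(L_dyn, U_dyn, L_static, U_static, alpha):
    """Concatenated basis and block-diagonal reduced covariance of Eq. 12."""
    r_d, r_s = U_dyn.shape[0], U_static.shape[0]
    L_h = np.hstack([L_dyn, L_static])            # [L^SEIK | L^SFEK]
    U_h = np.zeros((r_d + r_s, r_d + r_s))
    U_h[:r_d, :r_d] = (1.0 - alpha) * U_dyn       # flow-dependent block
    U_h[r_d:, r_d:] = alpha * U_static            # static (EOF-based) block
    return L_h, U_h

def random_spd(rng, r):
    """Random symmetric positive definite r x r matrix (placeholder data)."""
    A = rng.standard_normal((r, r))
    return A @ A.T + np.eye(r)

rng = np.random.default_rng(2)
n, r_dyn, r_static, alpha = 30, 9, 16, 0.5        # alpha is illustrative only
L_dyn = rng.standard_normal((n, r_dyn))
L_static = rng.standard_normal((n, r_static))
U_dyn, U_static = random_spd(rng, r_dyn), random_spd(rng, r_static)

L_h, U_h = hybrid_factors(L_dyn, U_dyn, L_static, U_static, alpha)
P_blend = (1 - alpha) * L_dyn @ U_dyn @ L_dyn.T + alpha * L_static @ U_static @ L_static.T
P_block = L_h @ U_h @ L_h.T
```

The two covariances agree to rounding error, so the hybrid update can reuse the SEIK analysis machinery with a correction basis of 9 + 16 = 25 directions, while only the 10 dynamic members have to be integrated with the model.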

For resampling, we follow Wang et al. (2008) and generate the analysis ensemble from the analysis covariance of the flow-dependent part only; that is, we use Eq. 3 with \( L_k = L_k^{SEIK} \) and \( C_k^{-1} \) obtained from the Cholesky decomposition of \( U_k^{SEIK} \), with:

$$ {\left({U}_{k+1}^{SEIK}\right)}^{-1}=\rho \frac{1}{N}{\left({T}^T T\right)}^{-1}+{\left({HL}^{SEIK}\right)}_{k+1}^T{R}_{k+1}^{-1}{\left({HL}^{SEIK}\right)}_{k+1}. $$
(13)

This ensemble is then integrated with the model as in SEIK to compute the forecast state as the mean of the forecasted members and \( P_{k+1}^{SEIK} \) as their sample covariance. In terms of computational load, the size of the flow-dependent ensemble in the hybrid scheme can be significantly smaller than that of the SEIK filter, which yields important savings in computing hours.

4 Experimental design and results

The model state vector consists of all 38 pelagic variables of the biogeochemical model (see the model description in Section 2.1). The initial/static correction basis \( L_0 \) was generated by performing an EOF analysis on a long sequence of (bi-daily) model outputs, simulated over a 2-year period (January 1997–December 1998). A set of 25 EOF modes was retained, as described by Hoteit et al. (2001), to form the initial correction basis, explaining about 84% of the system’s variance.

The observation error was estimated as 20% of the SeaWiFS Chl-a, i.e., \( R(i,j) = {\left[0.2 \times \text{Chl-a}_{\text{SeaWiFS}}(i,j)\right]}^2 \), assuming a 20% measurement error. This is lower than the SeaWiFS errors that are usually used in the literature (~35%). A recent study that sampled a wide variety of Atlantic waters (including temperate environments) reported an accuracy of 16% (Brewin et al. 2016). In this preliminary study, which tests the efficiency of the proposed ensemble hybrid scheme for data assimilation into a marine ecosystem model, the specified 20% mean error was considered a compromise between coastal (higher error) and open sea (lower error) areas. More sophisticated approaches for estimating the observation error have been recently proposed to tackle this challenging issue in data assimilation (e.g., Luo and Bhakta 2016; Miyoshi et al. 2013). Moreover, advanced techniques have been suggested for the online estimation of the observation noise variance as hyper-parameters during the filtering process (e.g., Ueno and Nakamura 2014; Dreano et al. 2017; Li et al. 2009). These would, however, still require limiting the spatial variability of the observational error variance through some kind of parameterization in order to reduce the number of hyper-parameters to be estimated.
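To make the scaling of this error model concrete, the sketch below evaluates the observation error variance for two Chl-a values. The function name and the two concentrations are hypothetical and chosen only to contrast oligotrophic and productive waters.

```python
def observation_error_variance(chl, relative_error=0.2):
    """Observation error variance R = (relative_error * Chl-a)^2, in (mg/m3)^2."""
    if chl < 0:
        raise ValueError("Chl-a concentration must be non-negative")
    return (relative_error * chl) ** 2

# Illustrative values (mg/m3), not taken from the SeaWiFS dataset
r_open_sea = observation_error_variance(0.05)   # oligotrophic open sea
r_coastal = observation_error_variance(1.0)     # productive coastal water
ratio = r_coastal / r_open_sea
```

Since the variance grows with the square of the concentration, a 20-fold increase in Chl-a gives a 400-fold increase in the assigned error variance in absolute terms.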

A cutoff radius of ~30 km was adopted for the localization of the filter update. This was chosen by trial and error, after a series of sensitivity experiments testing the impact of various values on the performance of the filtering schemes. Consistent with Hamill et al. (2001), we found that a small cutoff radius was more suitable when the filter is implemented with small ensembles. SEIK performance was slightly improved with a slightly larger (~60 km) cutoff radius. The ensemble size of SEIK is N = 26, while the size of the HYBRID flow-dependent ensemble is \( N_D = 10 \). The rank of the static background covariance in HYBRID and SFEK is \( r_1 = 16 \) and \( r_0 = 25 \), respectively (see Table 1). The SeaWiFS 8-day average Chl-a is assimilated at the middle of the 8-day assimilation window (end of day 4). The filter performance in improving the model Chl-a estimate is evaluated by comparing the model 8-day average Chl-a of analysis and forecast states against the respective assimilated SeaWiFS 8-day average Chl-a (Figs. 2–5). This model 8-day average is computed as the average of the analyzed Chl-a at day 4 and the forecasted Chl-a on the remaining 7 days, and is considered the best estimate of the 8-day average Chl-a data over the assimilation window.

Table 1 Attributes of different assimilation experiments

Fig. 2 Model simulated Chl-a without assimilation (FREE) and with the different filter schemes (HYBRID, SEIK, SFEK) against SeaWiFS Chl-a for 21-28/04/2000 (left) and 19-27/08/2000 (right)

Various sensitivity experiments were performed to set the SEIK/HYBRID ensemble ranks (\( r_k \), \( r_0 \)), forgetting factors (ρ), and the covariance weighting factor (α). The best parameters were chosen based on the filters' performance in estimating Chl-a, as well as the system robustness and behavior with non-assimilated variables (dissolved inorganic nutrients). The attributes of the selected data assimilation experiments are summarized in Table 1. The performance of HYBRID was compared with that of SEIK and SFEK. Results of sensitivity experiments to test the HYBRID performance with different values of the weighting factor (α) and the size of the dynamic ensemble (see Table 1) are also presented and discussed.

4.1 Impact of data assimilation on Chl-a

The performance of the HYBRID scheme is first evaluated against the assimilated SeaWiFS Chl-a, and in comparison with SEIK, SFEK, and a model free-run without assimilation. Two examples illustrating the impact of data assimilation with the HYBRID scheme on the model Chl-a are shown in Fig. 2 for spring and summer periods. During April, the model free-run underestimates the phytoplankton spring bloom in the Gulf of Lions (see Fig. 1). This bloom is triggered by increased vertical mixing in this dense water formation site, which results in the entrainment of deep water nutrients into the euphotic zone (e.g., Marty et al. 2002). The model free-run partly reproduces such a bloom in early March (not shown), but underestimates the late April bloom that is captured by SeaWiFS. This underestimation is significantly reduced after assimilation, particularly using HYBRID and SEIK, and somewhat less with SFEK. This is better depicted in Fig. 3, which plots the relative differences between the simulated and SeaWiFS Chl-a. Another area of model deficiency is the overestimation of Chl-a in the Eastern Levantine, Ionian, and Balearic Seas (see Fig. 1), which is mostly related to the hydrodynamic model overestimation of winter-spring vertical mixing. Again, this is significantly improved in HYBRID and SEIK, and to a lesser degree in SFEK. During August, the free-run Chl-a is underestimated, mainly in the Adriatic and N. Aegean, the Gulf of Lions, and some areas in the Ionian and Eastern Levantine. This bias is practically corrected (Figs. 2 and 3) over most of the domain by the HYBRID scheme. The few exceptions are some coastal areas in the Northern Adriatic and the Gulf of Gabes, where the Chl-a retrieved by the satellite bio-optical algorithm may be overestimated due to direct bottom reflection in shallow waters (Barale et al. 2008), and the Egyptian coast, which is influenced by nutrient inputs from the River Nile (Fig. 1). SEIK achieves a slightly better performance in the Adriatic, as compared to HYBRID. It also behaves better than SFEK in the Gulf of Lions and the N. Aegean Sea, but is slightly less efficient in some other areas in the South Ionian and the Eastern Levantine.

Fig. 3 Model simulated Chl-a relative error (model-SeaWiFS)/SeaWiFS, without assimilation (FREE) and with the different filter schemes (HYBRID, SEIK, SFEK), during 21-28/04/2000 (left) and 19-27/08/2000 (right)

Figure 4 plots the annual mean fractional change of HYBRID, SEIK, and SFEK Chl-a relative error with respect to the free-run error, illustrating the overall performance of the different schemes throughout the Mediterranean. HYBRID reduces the estimation error by more than 40% in most areas compared to the free-run. Slightly less improvement is achieved in more productive areas that receive important lateral nutrient inputs, such as the Northern Adriatic (River Po), the Gulf of Lions (River Rhone), the Northern Aegean (river and Black Sea Water inputs), and the Nile area, or that exhibit strong space-time variability, such as the Alboran Sea. The latter is characterized by important dynamical variations of the circulation driven by the Atlantic water inflow and local atmospheric forcing (e.g., Macias et al. 2014). The SFEK reduction of the estimation error exhibits a similar pattern, but to a much lesser degree than HYBRID. This could be attributed to the spatial variability of the assigned observation error, which was estimated as a function of satellite Chl-a, resulting in a relatively weaker correction in more productive areas, where observations are considered more uncertain and thus receive less weight in the filter update. SEIK exhibits a somewhat different behavior, being more efficient in more productive areas, such as the Adriatic, the Gulf of Lions, and the North Aegean, and less efficient in others, mostly open-sea areas in the Eastern Levantine and the southwestern Mediterranean. This is due to the nature of its fully flow-dependent ensemble, which exhibits more spread in the more productive areas. Overall, SEIK provides a mixed performance, behaving better than SFEK in some areas and worse in others, while both perform significantly worse than HYBRID in most areas.

Fig. 4 Fractional change of the annual mean Chl-a relative error (|data-model|/data) against the SeaWiFS Chl-a, over the free run simulation (ASSIM/FREE-1) for the simulations adopting the HYBRID (top), SEIK (middle), and SFEK (bottom) assimilation filter schemes
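The quantity mapped in Fig. 4 can be written out as a short sketch for a single grid point. The function names and the sample values are hypothetical; the relative error and the ASSIM/FREE-1 ratio follow the definitions given in the figure caption.

```python
def mean_relative_error(data, model):
    """Mean of |data - model| / data over matched values (e.g., 8-day composites)."""
    if len(data) != len(model) or not data:
        raise ValueError("need matched, non-empty data and model series")
    return sum(abs(d - m) / d for d, m in zip(data, model)) / len(data)

def fractional_change(data, free, assim):
    """ASSIM/FREE - 1 of the mean relative errors; negative values mean improvement."""
    return mean_relative_error(data, assim) / mean_relative_error(data, free) - 1.0

# Hypothetical Chl-a series (mg/m3) at one grid point
seawifs = [0.20, 0.40]
free_run = [0.10, 0.20]      # relative error 0.5 at both times
assimilated = [0.16, 0.32]   # relative error 0.2 at both times
change = fractional_change(seawifs, free_run, assimilated)
```

In this made-up case the value is -0.6, i.e., a 60% reduction of the free-run relative error, comparable in sign and meaning to the "more than 40%" reduction reported for HYBRID.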

The seasonal variability of the relative Chl-a error over different areas of the Mediterranean is shown in Fig. 5. We focus on the Adriatic Sea, because it represents a river-influenced productive area; the Gulf of Lions, because it is a dynamically varying area that exhibits a strong spring bloom; and the Levantine basin, because it is a more oligotrophic area. HYBRID outperforms both SEIK and SFEK throughout the year in all areas, except in the Adriatic, where SEIK performs better over most of the year. SEIK is also slightly better than SFEK in the Gulf of Lions, but worse in the Levantine area during the summer-autumn period. Overall, SEIK appears to perform better than SFEK during the winter-spring period, which is characterized by stronger variability, while SFEK is slightly better during the calmer summer-autumn period. SEIK provides a slightly lower mean 8-day estimation error (0.34) in the Mediterranean, as compared to SFEK (0.343). Despite this, SEIK leads to slightly worse Mediterranean-average analysis and forecast errors (Fig. 5) than SFEK. This is related to the stronger impact of SEIK on non-assimilated variables, such as dissolved inorganic nutrients, which results in a more efficient decrease of the Chl-a error over the 8-day period (see discussion in Section 5.1). Furthermore, HYBRID appears very efficient in the forecast update, as depicted by its very low analysis error compared to both SEIK and SFEK. This is not reflected in its forecast error, which is much closer to that of the other schemes, but it gives HYBRID an important “head start” over the other schemes, which results in a significantly lower error over the 8-day period.

Fig. 5

Seasonal variability of the mean 8-day Chl-a relative error (|data-model|/data) against the SeaWiFS Chl-a, averaged in the Adriatic Sea (top left), Gulf of Lions (middle left), Levantine basin (bottom left), Mediterranean (bottom right), and mean analysis (top right) and forecast (middle right) Chl-a relative error, averaged over the Mediterranean, for the simulations adopting the HYBRID (red line), SEIK (black line), SFEK (green line) assimilation filter schemes and the one without assimilation (FREE, blue line)

Comparing other skill indexes (Table 2), SEIK achieves better scores in terms of overall percentage bias, RMS error and standard deviation, as compared to both HYBRID and SFEK, which is mainly due to its stronger impact in the most productive Adriatic area. On the other hand, HYBRID estimates correlate better with the data.

Table 2 Percentage bias (PBIAS), Pearson correlation coefficient (PCC), normalized standard deviation [STDN = STD(model)/STD(data)], and root mean square error (RMSE) of model simulated Chl (mg/m3), NO3 (mmol/m3), and PO4 (mmol/m3) against SeaWiFS Chl-a and available in situ data over the 1990–2009 period, collated from the SeaDataNet database (www.seadatanet.org). The correlation coefficient is calculated on the logarithm of model and data values.

4.2 Impact on non-assimilated variables

The impact of data assimilation on model predicted dissolved inorganic nutrients (phosphate, nitrate) in terms of bias, correlation, standard deviation, and RMS error, against observations obtained from the SeaDataNet seasonal climatology (1990–1999), is outlined in Table 2. The free-run shows an overall underestimation (positive percentage model bias, PBIAS) for both nitrates and phosphates, and also a weaker phosphate variability (normalized STD < 1) than observed. SEIK achieves the strongest decrease of PBIAS and increase of STD for phosphate concentration. In contrast, SEIK has the highest RMS error and the lowest Pearson correlation (PCC). This is mostly related to SEIK’s stronger update in the productive Adriatic region (Figs. 4 and 5), where nutrient concentrations are highest. This stronger update results in a significant increase of nutrients that has a positive effect on PBIAS and STD, but a negative one on model correlation and RMS error, suggesting a potential deterioration in reproducing the observed variability. When the Adriatic Sea is excluded, the SEIK phosphate PBIAS is still better than that of HYBRID and SFEK, but the differences are much smaller. SEIK then also has the lowest RMS error, indicating its good performance with phosphate. On the other hand, HYBRID leads to a higher correlation coefficient (with or without the Adriatic) and a lower RMS error, which are better indicators of the model performance, despite the weaker improvement in PBIAS and STD. In the case of nitrates, HYBRID improves both correlation and PBIAS, but slightly overestimates STD and has a higher RMS error. The simulation of nitrate is significantly deteriorated using SEIK, mainly due to its strong update in the Adriatic. In most of the Mediterranean, phosphate is the main limiting factor for phytoplankton growth (Krom et al. 2004).
This phosphate limitation is particularly noticeable in the Adriatic, which receives river nutrient loads characterized by high N:P ratios (Ludwig et al. 2009). Moreover, in this area, the model exhibits a systematic negative bias (Fig. 3). This is presumably due to an underestimation of phosphate river loads or to an overestimation of the retrieved Chl-a by the satellite bio-optical algorithm, as this may be influenced by land inputs of colored matter and suspended solids (Gregg and Casey 2004). Given this negative bias, the filter update results in an increase of nutrient concentrations, along with Chl-a. The addition of excess nitrogen that cannot be consumed by phytoplankton, which is limited by phosphate, may result in a nitrate build-up in the model, as observed in the results of SEIK (see discussion below). The impact of SFEK on nitrate and phosphate is somewhat similar to HYBRID, as indicated by its skill indexes lying between HYBRID and the free-run.
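The skill indexes discussed above can be computed as in the following minimal sketch. The PBIAS sign convention (positive when the model underestimates the data) is inferred from the text, the population standard deviation is an assumption, and the correlation is taken on log-transformed values as stated in the Table 2 caption. The function name and the phosphate values are hypothetical.

```python
import numpy as np

def skill_indexes(model, data):
    """Return PBIAS (%), PCC on log10 values, normalized STD, and RMSE."""
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    # Assumed convention: positive PBIAS means the model underestimates the data
    pbias = 100.0 * np.sum(data - model) / np.sum(data)
    # Pearson correlation on the logarithm of model and data values
    pcc = np.corrcoef(np.log10(model), np.log10(data))[0, 1]
    # Normalized standard deviation, STD(model)/STD(data)
    stdn = np.std(model) / np.std(data)
    rmse = np.sqrt(np.mean((model - data) ** 2))
    return {"PBIAS": pbias, "PCC": pcc, "STDN": stdn, "RMSE": rmse}

# Hypothetical phosphate values (mmol/m3): the model is half the observed value everywhere
obs = np.array([0.1, 0.2, 0.4, 0.8])
sim = 0.5 * obs
scores = skill_indexes(sim, obs)
print(scores)
```

The example illustrates why the indexes can disagree: a model that is uniformly too low is perfectly correlated with the data on a log scale, yet has a large positive PBIAS and a normalized STD below 1.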

5 Analysis of filters’ forecast update/spread and HYBRID sensitivity experiments

5.1 Forecast update

The impact of the different filtering schemes on Chl-a and dissolved inorganic nutrients is further investigated by examining the fractional change imposed by the filters at the analysis step [FC = (analysis − forecast)/forecast] for Chl-a, phosphate, and nitrate concentrations (Figs. 6 and 7). HYBRID leads to a much larger mean annual absolute fractional change (Fig. 6), particularly for Chl-a, indicating a more pronounced update as compared to SEIK and SFEK. It is noticeable that in the Adriatic, SEIK leads to a stronger change for nitrate and phosphate, as compared to HYBRID, despite the weaker correction it imposes on Chl-a. The more pronounced impact of SEIK on nitrates results in the overestimation mentioned above (Table 2). We should note, however, that nitrate updates are more significant than phosphate updates in all schemes. Areas showing a stronger update are those characterized by a larger model bias, such as the Adriatic, or those characterized by strong seasonal variability, such as the Gulf of Lions. SFEK imposes a relatively weaker correction than both HYBRID and SEIK in the more productive areas, such as the Adriatic and the Gulf of Lions, but a slightly stronger correction than SEIK in more oligotrophic open-sea areas, such as the Levantine basin. This difference in the filters’ performance is related to the nature of their covariances. SEIK is more efficient in more productive and variable areas, such as the Adriatic and Gulf of Lions, due to its flow-dependent covariance, while SFEK shows a good performance in less variable areas, such as the Levantine, as its EOF-based covariance retains the spread of the dominant climatological modes of the system. HYBRID appears to perform well in all areas thanks to its “blended” covariance, estimated as a weighted average of flow-dependent and “smoothed” climatological covariances.
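The role of the blended covariance in projecting a Chl-a correction onto a non-assimilated variable can be illustrated with a toy two-variable example. This is a simplified sketch, not the reduced-rank formulation used by the filters: it assumes a full covariance matrix, a standard Kalman analysis step, and a blend of the form αP_static + (1 − α)P_flow. All numerical values and function names are hypothetical.

```python
import numpy as np

def blended_covariance(p_static, p_flow, alpha=0.95):
    """Weighted average of static and flow-dependent covariances (assumed form)."""
    return alpha * p_static + (1.0 - alpha) * p_flow

def analysis_update(x_f, p, h, r, y):
    """Standard Kalman analysis step: x_a = x_f + K (y - H x_f)."""
    gain = p @ h.T @ np.linalg.inv(h @ p @ h.T + r)
    return x_f + gain @ (y - h @ x_f)

# Toy state [Chl-a, PO4]; only Chl-a is observed
x_f = np.array([0.20, 0.05])
h = np.array([[1.0, 0.0]])
r = np.array([[0.01]])
y = np.array([0.30])

# Hypothetical covariances: the static one has no Chl-a/PO4 cross term,
# the flow-dependent one carries a positive cross-covariance
p_static = np.array([[0.010, 0.000], [0.000, 0.001]])
p_flow = np.array([[0.010, 0.002], [0.002, 0.001]])

x_a_static = analysis_update(x_f, p_static, h, r, y)
x_a_hybrid = analysis_update(x_f, blended_covariance(p_static, p_flow), h, r, y)

# Fractional change FC = (analysis - forecast) / forecast for each variable
fc_hybrid = (x_a_hybrid - x_f) / x_f
print(x_a_static, x_a_hybrid, fc_hybrid)
```

In this toy case the static covariance leaves phosphate untouched, whereas even a 5% flow-dependent contribution transfers part of the Chl-a innovation to phosphate through the cross-covariance.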

Fig. 6

Annual mean absolute fractional change (|Analysis-Forecast|/Forecast) of the Chl-a (left column), phosphates (middle column), and nitrates (right column) assimilation correction for HYBRID (top), SEIK (middle), and SFEK (bottom)

Fig. 7

Seasonal variability of the Chl-a (blue line), phosphates (red line), and nitrates (black line) assimilation correction fractional change (Analysis/Forecast-1) for HYBRID (left column), SEIK (middle column), and SFEK (right column), averaged over the Adriatic (top), Gulf of Lions (middle), and Levantine (bottom) areas

Figure 7 shows the different impacts of the filters on Chl-a, nitrates, and phosphates in the three focus areas: Adriatic, Gulf of Lions, and Levantine. In the Adriatic, SFEK mostly improves Chl-a, while SEIK imposes a more dynamically evolving update, based on its flow-dependent correction subspace, on all variables, in many cases most pronounced on nitrates. The HYBRID improvement lies somewhere in between, leading to a stronger update of Chl-a, as in SFEK, but with a more dynamically evolving update and a stronger impact on other variables, as in SEIK. SEIK’s stronger impact on phosphates (Figs. 6 and 7) results in a lower Chl-a error in the Adriatic, as compared to HYBRID (Fig. 5), despite its weaker impact on Chl-a. As nutrients are the driving fuel of phytoplankton, it appears to be more efficient for the filter to change those in order to achieve a better sustained change in phytoplankton. This information emerges from the statistics of the flow-dependent covariance, as estimated from the evolving ensembles. A shortcoming of this dynamical behavior is that an inappropriate nutrient correction may lead to instabilities, as in the case of nitrates for SEIK. However, this cannot be entirely attributed to the filter. As explained above, phytoplankton growth is constrained by the most limiting nutrient, which in most cases is phosphate. The biogeochemical model provides a feedback that minimizes the effect of an inappropriate correction of this limiting nutrient through its own dynamics, which is not the case for nitrates, which are found in excess and are therefore susceptible to unconstrained instabilities. In that respect, HYBRID appears more robust than SEIK, as its dynamically based covariance is smoothed by a static one that was built from time-“smoothed” EOFs.
In the Gulf of Lions, HYBRID imposes a stronger and more dynamically evolving update on all variables, as compared to SEIK and SFEK, showing peaks during the spring bloom events, which explains its more efficient error decrease (Figs. 3 and 5). In the Levantine, the filters impose a negative correction during the winter-spring period, to correct the positive model bias in this area (Figs. 2 and 3). This is most pronounced in HYBRID and comparable in SEIK and SFEK, as is the decrease of the respective errors (Fig. 5). During the summer-autumn period, SFEK moderately changes Chl-a, while SEIK’s update is quite weak for both Chl-a and nutrients, resulting in a weak error decrease (Figs. 3 and 5).

5.2 Ensemble spread

The poor performance of SEIK in the Eastern Levantine during the summer-autumn period (Figs. 5 and 7) can be related to the spread of the forecast ensemble, δX^a(t_k), which is calculated as the standard deviation of its N = 26 ensemble members:

$$ \delta X^a(t_k)=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[X_i^a(t_k)-\overline{X^a}(t_k)\right]^2} $$
(14)

In the case of SFEK, the spread of the static ensemble is the standard deviation of the initial ensemble members, δX^a(t_0), while in the case of HYBRID, the spread is calculated as a weighted combination (α = 0.95) of the variance of the dynamic (N_D = 10) and static (r_1 = 16) ensembles.
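The spread calculation can be sketched as follows. The first function implements Eq. 14 directly (standard deviation with 1/N normalization). The second assumes that the HYBRID spread is the square root of the α-weighted sum of the static and dynamic ensemble variances, with α weighting the static part; the exact form is not given in the text, so this is an illustrative assumption. The ensemble values are randomly generated and hypothetical.

```python
import numpy as np

def ensemble_spread(members):
    """Eq. 14: standard deviation over ensemble members (1/N normalization).

    members has shape (N, n_points): one row per ensemble member.
    """
    members = np.asarray(members, dtype=float)
    anomalies = members - members.mean(axis=0)
    return np.sqrt(np.mean(anomalies ** 2, axis=0))

def hybrid_spread(static_members, dynamic_members, alpha=0.95):
    """Assumed form: sqrt of the weighted sum of static and dynamic variances."""
    var_static = ensemble_spread(static_members) ** 2
    var_dynamic = ensemble_spread(dynamic_members) ** 2
    return np.sqrt(alpha * var_static + (1.0 - alpha) * var_dynamic)

# Hypothetical Chl-a ensembles at four grid points
rng = np.random.default_rng(0)
static_ens = rng.normal(0.2, 0.05, size=(16, 4))   # 16 static members
dynamic_ens = rng.normal(0.2, 0.01, size=(10, 4))  # 10 flow-dependent members

spread_static = ensemble_spread(static_ens)
spread_hybrid = hybrid_spread(static_ens, dynamic_ens)
print(spread_static, spread_hybrid)
```

With α = 0.95, the combined spread stays close to the static one even when the dynamic ensemble has a much smaller spread.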

As shown in Fig. 8, the spread of the SEIK ensemble for Chl-a in the Eastern Levantine significantly decreases after June, as does the correction imposed by the filter (Fig. 7). The opposite occurs in the Adriatic, where the SEIK ensemble exhibits a larger Chl-a spread than SFEK/HYBRID (Fig. 8), consistent with the smaller SEIK Chl-a error (Fig. 5). The HYBRID ensemble spread is quite close to the SFEK spread, due to the relatively small (1-α = 0.05) contribution of the flow-dependent ensemble (this is why the HYBRID ensemble spread is plotted on a different axis in Fig. 8, to better show its variability in time). In the Gulf of Lions, SEIK exhibits a slightly lower spread than SFEK/HYBRID, which is, however, maintained throughout the year, in contrast with the Eastern Levantine. The close relation between the SEIK ensemble spread and its ability to reduce the Chl-a estimation error can also be clearly identified by comparing the annual mean spread (Fig. 9) with the Chl-a relative error decrease (Fig. 4). One may notice that the areas where SEIK is more efficient are those characterized by a larger ensemble spread, such as the Adriatic, the North Aegean, and the Gulf of Lions. In contrast, areas where SEIK exhibits a poor performance, such as the coastal areas in the Eastern Levantine or the Alboran Sea, are those characterized by a smaller ensemble spread. In the Eastern Levantine, the small spread of the ensemble, particularly during summer-autumn periods (Fig. 8), can be attributed to the very low productivity (Fig. 2), as surface nutrients are depleted and Chl-a approaches very small values. On the other hand, the Alboran Sea is a productive and dynamic area, where one would expect a better performance for SEIK. The SEIK ensemble has enough spread in this area during the early assimilation window (not shown). However, the ensemble appears to have lost most of the initial spread after only a few forecasting cycles with the model (Fig. 9). This may be attributed to the strong effect of circulation in this area, controlled by Atlantic water inflow and local atmospheric forcing, which apparently drives the ensemble members closer to the mean state. One way to alleviate this could be to include stochastic perturbations in the model internal dynamics and/or the atmospheric forcing. A smaller forgetting factor (i.e., larger inflation rate) should also result in a larger spread, but this would affect the entire domain. Given that in some areas, such as the Adriatic, SEIK already exhibits a sufficiently large spread and corrections, a spatially varying forgetting factor that would inflate the spread only in areas where it is low might be more appropriate (Anderson 2009).
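The idea of a spatially varying forgetting factor can be illustrated with a simple heuristic sketch. It is not the adaptive algorithm of Anderson (2009), nor the way the forgetting factor enters the SEIK analysis step. It assumes that a forgetting factor ρ is equivalent to scaling the ensemble anomalies by 1/√ρ, and it chooses ρ locally so that the spread is raised toward a prescribed target wherever it is too low. The target spread, the lower bound on ρ, and the function names are hypothetical.

```python
import numpy as np

def local_forgetting_factor(spread, target, rho_min=0.5):
    """Heuristic: rho = 1 where spread >= target, smaller rho where spread is low."""
    spread = np.asarray(spread, dtype=float)
    rho = np.ones_like(spread)
    low = spread < target
    # Choose rho so that spread / sqrt(rho) reaches the target, bounded below by rho_min
    rho[low] = np.maximum(rho_min, (spread[low] / target) ** 2)
    return rho

def inflate(members, rho):
    """Scale anomalies about the ensemble mean by 1/sqrt(rho) at each grid point."""
    members = np.asarray(members, dtype=float)
    mean = members.mean(axis=0)
    return mean + (members - mean) / np.sqrt(rho)

# Two-member toy ensemble at three grid points with spreads 0.10, 0.04, 0.01
members = np.array([[0.90, 0.96, 0.99],
                    [1.10, 1.04, 1.01]])
spread = members.std(axis=0)

rho = local_forgetting_factor(spread, target=0.05)
inflated = inflate(members, rho)
new_spread = inflated.std(axis=0)
print(rho, new_spread)
```

In this toy case the first point is left unchanged, the second is inflated up to the target, and the third is inflated only as far as the lower bound on ρ allows, while the ensemble mean is preserved everywhere.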

Fig. 8

Spread of Chl-a forecast ensemble (see Eq. 14) for SEIK (black line, left axis), SFEK (green line, left axis), and HYBRID (red line, right axis), averaged over the Adriatic, Gulf of Lions, and Levantine areas (Fig. 1). The HYBRID spread (red line) is very close to SFEK and is also plotted on a different (right) axis in order to point out its variability with time

Fig. 9

Annual mean spread of Chl-a forecast ensemble (see Eq. 14) for SFEK (left), SEIK (middle), and HYBRID flow-dependent part (right). (The HYBRID total ensemble spread (Eq. 14) is very similar to that of SFEK)

A relatively small contribution (1-α = 0.05) from the flow-dependent covariance is adopted in HYBRID (Eq. 11), resulting in an ensemble spread that is effectively close to that of the SFEK static ensemble (Figs. 8 and 9). Including this small (5%) contribution from the flow-dependent covariance in HYBRID seems to significantly enhance the filter performance as compared to both SFEK and SEIK. As discussed in the next section, increasing the weight of the flow-dependent covariance deteriorated the HYBRID behavior, which started to resemble the SEIK behavior more closely, leading to an increased error for nitrates, particularly in the Adriatic (Table 2, see discussion in section 4.2). The relatively small (1-α = 0.05) flow-dependent contribution in HYBRID, which was found to be optimal in our case, is consistent with Counillon et al. (2009), who also found an optimum α = 0.95 for their HYBRID scheme as compared to a 10-member ensemble Kalman filter and an Optimal Interpolation scheme. As mentioned above (section 4.2, Figs. 6–7), HYBRID appears to impose a significantly stronger correction (Analysis-Forecast, Eq. 6) on the forecast, as compared to SFEK, despite their similar spread. Given that the HYBRID forecast error is always lower than that of SFEK (not shown), this stronger correction may be attributed to the more efficient representation of the error growth directions in the HYBRID covariance, which includes a flow-dependent component, allowing it to track changes in the system dynamics (Pham 2001; Hoteit et al. 2002; Hoteit et al. 2004). Another important attribute of HYBRID is its ability to maintain enough variance for its update step through the contribution of the static covariance. In this way, HYBRID remains efficient in areas, such as the Levantine, where SEIK ensemble members are driven to a similar state during summer, resulting in a poorer performance.

5.3 HYBRID sensitivity experiments

Another set of sensitivity experiments (see Table 1) was performed to investigate the sensitivity of the HYBRID scheme to the weighting factor and the size of the dynamically varying ensemble. In the reference HYBRID simulation (Table 1), the weighting parameter is α = 0.95. When this is decreased to 0.7 and 0.5, i.e., assigning more weight to the flow-dependent ensemble covariance (see Eq. 11) in the analysis step, the Chl-a relative error is also decreased (Table 3), suggesting a more effective correction. However, in this case, HYBRID starts to resemble the SEIK behavior more closely, increasing the nitrate correction in the Adriatic, which results in a significant increase in the nitrate model bias.

Table 3 Annual mean of 8-day average, analysis, and forecast relative Chl-a error (|data-model|/data), nitrates correction fractional change (FC, |analysis − forecast|/forecast), nitrates concentration in the Adriatic and nitrate overall model PBIAS, for different HYBRID runs (see also Table 1)

The reference HYBRID simulation used a flow-dependent ensemble of 10 members (Table 1). HYBRID was also tested with a flow-dependent ensemble of 5 (HYBRID_5), 15 (HYBRID_15), and 20 (HYBRID_20) members. The Chl-a mean relative error is slightly larger in the HYBRID_5 simulation, but seems to quickly level off with an increasing flow-dependent ensemble size (Table 3). However, one may clearly notice that the larger flow-dependent ensemble results in an improvement of the Chl-a forecast, suggesting a more efficient projection of the Chl-a information onto the non-assimilated variables. A larger flow-dependent ensemble is therefore beneficial and is recommended, depending on the availability of computational resources. HYBRID was not found to be sensitive to the rank of its static background covariance, providing very similar results to the reference HYBRID simulation (not shown), which is consistent with previous sensitivity experiments performed with SFEK (Hoteit et al. 2004).

6 Conclusions

A hybrid ensemble scheme, combining a flow-dependent ensemble covariance with a static background covariance, was developed and successfully implemented for assimilating satellite (SeaWiFS) Chl-a data into a marine ecosystem model of the Mediterranean. The performance of the new hybrid scheme (HYBRID) was assessed against a model free-run (without assimilation), and SEIK and SFEK schemes, with regard to the assimilated data (Chl-a) and non-assimilated model variables, such as dissolved inorganic nutrients (nitrates, phosphates).

The Chl-a estimation error against SeaWiFS significantly decreased after assimilation, particularly with the HYBRID scheme, correcting the most important model deficiencies. HYBRID reduced the error by more than 40% in most areas, as compared to the free-run, showing a slightly less pronounced improvement in the more productive areas, which could be explained by the relatively high observational error that has been adopted in these areas. The SFEK estimation error exhibits similar patterns to that of HYBRID, but is generally higher. SEIK leads to mixed performances, being more efficient than SFEK (and in some cases HYBRID) in more productive areas, and less efficient in others, mostly open-sea oligotrophic areas. This is related to the flow-dependent SEIK ensemble, which efficiently identifies the directions of error growth in highly productive areas. In oligotrophic, nutrient-depleted areas during the summer-autumn period, characterized by very low Chl-a, or in areas that are strongly controlled by circulation, the ensemble members are driven to a similar regime with small spread, leading to the poor performance of SEIK.

The impact of data assimilation with the different filtering schemes on dissolved inorganic nutrients (phosphate, nitrate) was evaluated in terms of various skill indexes (bias, correlation, standard deviation, and RMS error), against a seasonal climatology (1990–1999) of in situ observations in the Mediterranean (SeaDataNet). Data assimilation had a positive overall impact on nutrients (as compared to the free-run) with regard to most skill indexes. An exception was the deterioration of the nitrate simulation by SEIK in the Adriatic, which was related to its strong update in this productive area, and also to the phosphate limitation of phytoplankton that led to a build-up of excess nitrates. SEIK, on the other hand, exhibited a good performance, particularly in terms of mean model bias, with phosphate, which is the main limiting nutrient in the Mediterranean and thus controls primary production. HYBRID achieved the best and most robust performance, improving phosphate, particularly in terms of correlation and RMS error, as well as nitrates, in terms of correlation and bias. The impact of SFEK on nitrate and phosphate was somewhat similar to HYBRID.

The filters’ different behavior was illustrated by the space-time variability of the forecast update on Chl-a and dissolved inorganic nutrients. SEIK imposed a more dynamically evolving correction that was in some cases stronger on nutrients, as compared to Chl-a. This emerges from the SEIK flow-dependent covariance, as nutrients effectively drive phytoplankton variability, and their change may result in a better sustained change in phytoplankton. A shortcoming of this behavior is that dynamical inconsistencies in the correction of nutrients may lead to instabilities, as in the case of nitrates in the Adriatic. SFEK appeared to have a stronger effect on Chl-a, while HYBRID in a way combines SFEK and SEIK attributes, leading to a stronger update for Chl-a, but with a more dynamically evolving update, based on its flow-dependent correction subspace, and a stronger effect on other variables.

Adopting an increased contribution (weighting factor) from the flow-dependent covariance resulted in a more effective correction of Chl-a, but also in a deterioration of nitrates, similar to SEIK. An improvement in the HYBRID Chl-a forecast was also obtained with increased flow-dependent ensemble size, but this improvement in the filter performance quickly leveled off.

Overall, SEIK was found to be more efficient in more productive and variable areas, due to its flow-dependent covariance, while SFEK showed a better performance than SEIK in less variable areas, as its EOF-based covariance retains the main modes of the system’s long-term variability. HYBRID was more efficient than both SFEK and SEIK, as it appears to perform well in both types of areas, due to its “blended” covariance. Its flow-dependent component may track changes in the system dynamics, allowing for a more efficient representation of the error growth directions, which results in a better filter correction, even though the spread of its ensemble is largely determined by the static ensemble. Another important attribute of HYBRID is its ability to maintain a sufficient spread in its dynamic ensemble, thanks to its static covariance that mitigates the inbreeding of the ensemble.

Among the main challenges of assimilating satellite Chl-a in marine ecosystem models are the dynamically consistent projection of the Chl-a information onto non-assimilated variables, respecting the ecosystem dynamics, and the increased computational cost, given the large number of state variables. The developed hybrid assimilation scheme was found to be particularly robust and efficient in reducing the model Chl-a error, while also showing a positive effect and dynamically consistent updates on non-assimilated variables. Moreover, its good performance with a relatively small flow-dependent ensemble size translates to a significantly reduced computational load, which is particularly important for its operational implementation within the 3-D biogeochemical forecasting model of the Mediterranean. Future research, besides an end-to-end ecosystem simulation analysis, will also consider stochastic perturbations of the physical forcing and the ecosystem parameters, eventually developing and implementing filtering schemes for state-parameter estimation.