1 Introduction

Phytoplankton communities are among the most important photosynthetic groups on Earth, being at the bottom of the marine food chain, and responsible for approximately half the global primary production (Field et al. 1998). Their contribution to ecosystem functions is only matched by their contribution to biodiversity. Indeed, phytoplankton communities are characterized by a surprisingly high number of species. For example, a single sample as small as a few mL can contain up to seventy species (REPHY 2017; Widdicombe and Harbour 2021). This observation is usually called the “paradox of the plankton” (a term coined by Hutchinson 1961), which refers to the conflict between the observed diversity of species competing for similar resources in a seemingly homogeneous environment, and models predicting that only a few species will persist by outcompeting the others (MacArthur and Levins 1964; Huisman and Weissing 1999; Schippers et al. 2001). Phytoplankton models for coexistence are now almost as diverse as their model organisms (Record et al. 2014), but they often describe only a handful of species, which does not correspond to the diversity observed in the field. When modeling rich communities (> 10 species), classical answers to the plankton paradox involving temporal fluctuations (e.g., Li and Chesson 2016; Chesson 2018) are not sufficient to maintain a realistic diversity. For instance, we found that a phytoplankton community dynamics model with environmental fluctuations and storage effect still requires extra niche differentiation for coexistence, which manifests in stronger intraspecific than interspecific interactions (Picoche and Barraquand 2019). However, it is not clear that we should resort to hidden niches to explain phytoplankton coexistence, as most models also make hidden simplifying assumptions that could be relaxed. One that we relax here is mean-field dynamics at the microscale. 
Indeed, field observations have revealed phytoplankton patchiness for decades, with early records in the past centuries (Bainbridge 1957; Stocker 2012), from the macro- to the micro-scale (Leonard et al. 2001; Doubell et al. 2006; Font-Muñoz et al. 2017).

Phytoplankton patchiness can at least be partly explained by the hydrodynamics of their environment: the size of these organisms is mostly below the size of the smallest eddy (i.e., the Kolmogorov scale). In a typical aquatic environment such as the ocean, phytoplankton individuals are embedded in viscous micro-structures (Peters and Marrasé 2000) while phytoplankton populations are displaced by a turbulent flow at slightly larger scales (Martin 2003; Prairie et al. 2012). Phytoplankton organisms therefore live in an environment where fluid viscosity dominates at the scale of an individual but turbulent dispersion dominates on length scales characteristic of a small population of those individuals (Estrada et al. 1987; Prairie et al. 2012).

This leads us to consider demography in the context of this environmental variation created by hydrodynamic processes. Individual-based models provide a convenient depiction of population dynamics and movement at the microscale (Hellweger and Bucci 2009). In this framework, population growth is a result of individual births and deaths. Aggregation of individuals can emerge from local reproduction coupled with limited dispersal, which can happen in a fluid where turbulence and diffusion are not strong enough to disperse kin aggregates (Young et al. 2001). The resulting local aggregation can then affect the community dynamics at larger spatial scales, even when all competitors are equivalent (i.e., with equal interaction strengths irrespective of species identity). Indeed, the combination of local dispersal after reproduction and local interactions leads to stronger intraspecific interactions than interspecific interactions at the population level (Detto and Muller-Landau 2016). This mechanism stabilizes the community, as a high intra-to-interspecific interaction strength ratio makes a species control its abundance more than it controls the abundance of other species, which is associated with coexistence in theoretical models (Levine and HilleRisLambers 2009; Barabás et al. 2017) and often observed in the field at the population level (Adler et al. 2018; Picoche and Barraquand 2020). Therefore, the microscale spatial distribution of individuals likely affects the interaction structure within a community, and may sustain diversity (Haegeman and Rapaport 2008).

Existing models of phytoplankton populations near the Kolmogorov scale (between 1 mm and 1 cm in an oceanic environment; Barton et al. 2014) focus on a single species and the clustering of its individuals (Young et al. 2001; Birch and Young 2006; Bouderbala et al. 2018; Breier et al. 2018). These models share similarities with dynamic point process models (Law et al. 2003; Bolker and Pacala 1999; Plank and Law 2015) developed initially with larger organisms in mind. When phytoplankton individual-based models consider multiple types of organisms, they have so far focused on how organisms with opposite characteristics (e.g., increase versus decrease in density with turbulence in Borgnino et al. 2019; Arrieta et al. 2020) segregate spatially, or on the coexistence of species that have contrasting trait values (e.g., size in Benczik et al. 2006). Such models are useful as an explanation of how species with marked differences might coexist. The difficulty of the coexistence problem, however, is that we also have to explain how closely related species or genera (e.g., within diatoms), many of which have similar size, buoyancy, chemical composition, etc., manage to coexist within a single trophic level. This requires modelling similar species in a spatially realistic environment and objectively quantifying whether they aggregate or segregate in space.

To do so, we build a multispecies version of the Brownian Bug Model (BBM) of Young et al. (2001), an individual-based model which includes an advection process mimicking a turbulent fluid flow, passive diffusion of organisms, as well as stochastic birth and death processes. The initial version of this model (Young et al. 2001) coupled limited dispersal and local reproduction with ocean-like microscale hydrodynamics, and showed spatial clusters of individuals of the same species. The original BBM was limited to a single species and was illustrated with two-dimensional simulations. The model was not strongly quantitative (Picoche et al. 2022) in the sense that parameters were not informed by current knowledge on phytoplankton biology (numbers of cells per liter, diffusion characteristics, etc.). As phytoplankton organisms live in a three-dimensional environment, informing the model with more realistic parameters requires us to shift to three dimensions. We also extend the model to multiple species, and consider two size classes for our phytoplankton communities, which are either made of nanophytoplankton (3 µm diameter, \(\approx 10^{6}\) cells L\(^{-1}\)) or microphytoplankton (50 µm, \(\approx 10^{4}\) cells L\(^{-1}\)). We populate each community with 3 to 10 different species.

The Brownian Bug Model (in its original single-species form as in the multispecies version considered here) is related to spatial branching processes. Without advection, it combines a continuous-time, discrete-state model for population growth and a continuous-time, continuous-space Brownian motion for particle diffusion (Birch and Young 2006). A turbulent flow adds a further layer of complexity in Young et al. (2001) and Picoche et al. (2022), as well as here. In spite of this complexity, it remains possible to derive the dynamics of pair density functions, which quantify the degree of intra- and interspecific clustering of organisms, via correlations between positions of organisms (see next section). Thus we can understand emergent spatial structures in analytic detail and compare these predictions to the results from three-dimensional simulations. Furthermore, because we do not consider direct interactions between organisms, the multispecies spatial point process that represents the stable state of the BBM is a random superposition of spatial point processes for each species (Illian et al. 2008). This enables us to derive, in addition to pair correlation functions, analytical formulas for the species composition in the neighbourhood of an individual, which are more readily ecologically interpreted than pair density or correlation functions.

2 Model and spatial statistics

2.1 Brownian Bug Model

The Brownian Bug Model (BBM) describes the dynamics of individuals in a turbulent and viscous environment, including demographic processes. The model is continuous in space and time. Here we extend the mostly two-dimensional, monospecific version of Young et al. (2001) to three dimensions and S species.

Each individual is characterized by its species identity i and its position \({\textbf{x}}^{T}=(x,\,y,\,z)\). The population dynamics are modelled by a linear birth-death process with birth rate \(\lambda _{i}\) and death rate \(\mu _{i}\). Each individual independently follows a Brownian motion with diffusivity \(D_{i}\), and is advected by a common stochastic and chaotic flow modelling turbulence. The model applies in the Batchelor regime, which means that the separation \(s(t)=|{\varvec{x}}_{k}-{\varvec{x}}_{l}|(t)\) between two individuals k and l grows exponentially with time with stretching parameter \(\gamma \), i.e., \(\ln \left( s(t)\right) \propto 3\gamma t\) (Kraichnan 1974; Young et al. 2001).

Within a given community (the set of all individuals of the S species), all species share the same parameters: \(\lambda _{i}\), \(\mu _{i}\) and \(D_{i}\) values can change between communities, as we later consider small and large phytoplankton, but are set to common values within a community. On the contrary, \(\gamma \) describes the environment and is not community-specific, i.e., all individuals are displaced by the same turbulent stirring. For numerical simulations, time needs to be discretized (this is required for diffusion and advection modelling). The approximated model advances through time in small steps of duration \(\tau \). During each interval, events unfold as follows:

  1. Demography: each individual can either reproduce with probability \(p_{i}=\lambda _{i}\tau \) (forming a new individual of the same species i at the same position \({\textbf{x}}\) as the parent), die with probability \(q_{i}=\mu _{i}\tau \), or remain unchanged with probability \(1-p_{i}-q_{i}\).

  2. Diffusion: each individual moves to a new position \({\textbf{x}}(t')={\textbf{x}}(t)+\delta {\textbf{x}}(t)\), with \(t<t'<t+\tau \). The random displacement \(\delta {\textbf{x}}(t)\) is drawn from a Gaussian distribution \({\mathcal {N}}(0,\Delta _{i}^{2})\) with \(D_{i}=\Delta _{i}^{2}/2\tau \) the diffusivity. This diffusive step separates the initially coincident pairs produced by reproduction in step 1 above.

  3. Turbulence: each individual is displaced by a turbulent flow, modelled with the Pierrehumbert map (Pierrehumbert 1994), adapted to three dimensions following Ngan and Vanneste (2011). Thus given the position at time \(t'\) the updated position at time \(t+\tau \) is

$$\begin{aligned} x(t+\tau )= & {} x(t')+\frac{U\tau }{3}\cos \left( ky(t')+\phi (t)\right) \nonumber \\ y(t+\tau )= & {} y(t')+\frac{U\tau }{3}\cos \left( kz(t')+\theta (t)\right) \nonumber \\ z(t+\tau )= & {} z(t')+\frac{U\tau }{3}\cos \left( kx(t+\tau )+\psi (t)\right) . \end{aligned}$$
(1)

Above, U is the velocity of the flow, \(k=2\pi /L_{s}\) is the wavenumber for the flow at the length scale \(L_{s}\) (see below) and \(\phi (t)\), \(\theta (t)\), \(\psi (t)\) are random phases drawn from a uniform distribution between 0 and \(2\pi \); these phases remain constant during the interval between t and \(t+\tau \). The shift from continuous to discrete-time turbulence modelling is described in Section S1 in the Supplementary Information. The velocity U is related to \(\gamma \). As the separation between two points grows exponentially with parameter \(3\gamma \) due to turbulence, the exponent \(\gamma \) can be estimated as the slope of \(1/3\left\langle \ln (s(t))\right\rangle =f(t)\) in the absence of diffusion and demography (Young et al. 2001; Picoche et al. 2022).

Individuals are distributed in a cube of side length L, with periodic boundary conditions. The cube dimensions are determined to balance computing costs and realistic concentrations of individuals; they represent the accumulation of a few volumes of scale \(L_{s}\).
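The three events of a time step can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the simulation code used for the results: the function name `bbm_step` and its arguments are hypothetical, the Gaussian displacement is assumed to be drawn independently for each coordinate, and the advection follows Eq. 1 as printed (including the use of the updated x coordinate in the z update).

```python
import numpy as np

def bbm_step(pos, species, lam, mu, delta, U, k, tau, L, rng):
    """One time step of duration tau for positions (n, 3) and species labels (n,).

    lam, mu and delta are shared by all species of the community (assumption
    taken from the text); L is the side length of the periodic cube.
    """
    n = pos.shape[0]
    # 1. Demography: birth with probability lam*tau, death with probability mu*tau.
    u = rng.random(n)
    born = u < lam * tau
    dead = (u >= lam * tau) & (u < (lam + mu) * tau)
    keep = ~dead
    # Offspring are created at the position of their parent, with the same species.
    pos = np.concatenate([pos[keep], pos[born]])
    species = np.concatenate([species[keep], species[born]])
    # 2. Diffusion: Gaussian displacement of standard deviation delta per coordinate.
    pos = pos + rng.normal(0.0, delta, size=pos.shape)
    # 3. Turbulence: three-dimensional map of Eq. 1 with fresh random phases.
    phi, theta, psi = rng.uniform(0.0, 2.0 * np.pi, size=3)
    x, y, z = pos[:, 0], pos[:, 1], pos[:, 2]
    x_new = x + U * tau / 3 * np.cos(k * y + phi)
    y_new = y + U * tau / 3 * np.cos(k * z + theta)
    z_new = z + U * tau / 3 * np.cos(k * x_new + psi)
    # Periodic boundary conditions in the cube of side L.
    pos = np.stack([x_new, y_new, z_new], axis=1) % L
    return pos, species
```

Setting `U = 0` in this sketch gives the purely diffusive case, since the third step then leaves positions unchanged apart from the periodic wrapping.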

2.2 Characterization of the spatial distribution

Let W be the observation window (in our case, the whole cube, which we never subsample hereafter). The state of the system at time t can be described as a collection of S populations, where the population of species i is made of \(n_{i}\) individuals randomly distributed in W, with positions \({\varvec{X}}_{i}(t)=[{\varvec{x}}_{1,i}(t),{\varvec{x}}_{2,i}(t),...,{\varvec{x}}_{n_{i},i}(t)]\). \({\varvec{X}}(t)~=~[{\varvec{X}}_{1}(t),\ldots ,{\varvec{X}}_{S}(t)]\) arises from a stochastic and spatial individual-based model changing through time, but can also be analyzed as a spatial point process at time t. We note that the point distributions remain the same for all spatial translations \(\varvec{\xi } \) (i.e., the point process described by the set \({\varvec{X}}=[{\varvec{x}}_{1},{\varvec{x}}_{2},...,{\varvec{x}}_{k}]\) is the same as \(\varvec{X_{\xi }}=[{\varvec{x}}_{1}+\varvec{\xi } ,{\varvec{x}}_{2}+\varvec{\xi } ,...,{\varvec{x}}_{k}+\varvec{\xi } ])\): the process is stationary.

A useful method to characterize a spatial point process is the use of spatial moments (illustrated in Section S2 of the SI for simple spatial point processes). These can be theoretically derived and used to check simulations. The spatial moments of a process are, however, merely statistical indicators which then need to be related to more easily ecologically interpretable quantities. This is the role of the dominance index, which we present below.

2.2.1 Spatial moments

The first-order moment is the intensity of the process, or mean concentration of individuals, whose empirical estimate is \(C_{i}=\frac{\widehat{N_{i}(W)}}{V(W)}\), where \(\widehat{N_{i}(W)}\) is the empirical number of individuals of species i in the cube W and \(V(W)=L^{3}\) is the volume of the cube; it does not give any information regarding the spatial distribution of individuals, and their spatial correlations.

The second-order product density, or pair density G(r, t), is the expected density of pairs of points separated by a distance r (Law et al. 2003). A similar statistic can be used for marked spatial point processes. In our case, the marks are the species’ identities, and we can define \(G_{ij}(r,t)\), so that \(G_{ij}(r,t)d{\textbf{x}}_{A}d{\textbf{x}}_{B}\) is the probability of finding an individual of species i in volume \(d{\textbf{x}}_{A}\) and an individual of species j in volume \(d{\textbf{x}}_{B}\), with the distance between the centers of \(d{\textbf{x}}_{A}\) and \(d{\textbf{x}}_{B}\) equal to r (pages 219 and 325 in Illian et al. 2008). We define \(\varvec{\xi }\) as the vector connecting the center of \(d{\textbf{x}}_{A}\) to the center of \(d{\textbf{x}}_{B}\), while \(r=|\varvec{\xi }|\) is the radial distance. We show in Picoche et al. (2022) that the intraspecific pair density \(G_{ii}(r,t)\), in three dimensions, is a solution of

$$\begin{aligned} \frac{\partial G_{ii}}{\partial t}(r,t)=\frac{2D_{i}}{r^{2}}\frac{\partial }{\partial r}\left( r^{2}\frac{\partial G_{ii}}{\partial r}\right) +\frac{\gamma }{r^{2}}\frac{\partial }{\partial r}\left( r^{4}\frac{\partial G_{ii}}{\partial r}\right) +2(\lambda _{i}-\mu _{i})G_{ii}+2\lambda _{i}C_{i}\delta (\varvec{\xi }). \nonumber \\ \end{aligned}$$
(2)

The pair correlation function \(g_{ij}(r,t)\), or pcf, can be derived from the pair density and is defined as

$$\begin{aligned} g_{ij}(r,t)=\frac{G_{ij}(r,t)}{C_{i}C_{j}}. \end{aligned}$$
(3)

The pcf is equal to one when the spatial distribution of species i individuals is random relative to species j individuals. To compute the intraspecific pcf \(g_{ii}(r,t)\) at steady state, considering a population at equilibrium, we integrate Eq. 2 (see Appendices, Eqs. 19-30) with \(\lambda _{i}=\mu _{i}\) and obtain

$$\begin{aligned} g_{ii}(r)=1+\frac{\lambda _{i}}{4\pi D_{i}C_{i}\ell _{\textrm{B},i}}\left( \frac{\ell _{\textrm{B},i}}{r}+\arctan \left( \frac{r}{\ell _{\textrm{B},i}}\right) -\frac{\pi }{2}\right) , \end{aligned}$$
(4)

where \(\ell _{\textrm{B},i}=\sqrt{2D_{i}/\gamma }\) approximates the Batchelor scale for species i.

The system converges rapidly to the solution in Eq. 4 in the presence of advection. However, when there is no turbulent advection, convergence is much slower, to the point that an equilibrium assumption requires unrealistically long timeframes (see Section S3 in the SI). We therefore need a time-dependent formula for the pcf in the absence of advection, which can be obtained in the case where \(\gamma =0\) using a Green’s function (see derivation in the Appendices, Eqs. 31-37),

$$\begin{aligned} g_{ii}(r,t)=1+\frac{\lambda _{i}}{4\pi rD_{i}C_{i}}\left\{ 1-{{\,\textrm{erf}\,}}\left( \frac{r}{\sqrt{8D_{i}t}}\right) \right\} . \end{aligned}$$
(5)

The above equations match when \(\gamma \rightarrow 0\) and \(t\rightarrow +\infty \).
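The agreement between Eqs. 4 and 5 in this double limit can be checked numerically. The short sketch below evaluates both formulas; the function names and the parameter values in the usage lines are illustrative assumptions (orders of magnitude only), not values taken from Table 1.

```python
from math import atan, erf, pi, sqrt

def pcf_steady(r, lam, D, C, gamma):
    """Steady-state intraspecific pcf with advection (Eq. 4)."""
    ell = sqrt(2 * D / gamma)  # approximate Batchelor scale
    return 1 + lam / (4 * pi * D * C * ell) * (ell / r + atan(r / ell) - pi / 2)

def pcf_no_advection(r, t, lam, D, C):
    """Time-dependent intraspecific pcf without advection (Eq. 5)."""
    return 1 + lam / (4 * pi * r * D * C) * (1 - erf(r / sqrt(8 * D * t)))

# Illustrative values: rates in d^-1, D in cm^2 d^-1, C in cells cm^-3, r in cm.
r, lam, D, C = 0.01, 2.5, 1.2e-4, 1e3
common_limit = 1 + lam / (4 * pi * r * D * C)
print(pcf_steady(r, lam, D, C, gamma=1e-12), pcf_no_advection(r, 1e12, lam, D, C), common_limit)
```

Both expressions tend to \(1+\lambda _{i}/(4\pi rD_{i}C_{i})\), which follows from Eq. 4 because \(\ell _{\textrm{B},i}\rightarrow \infty \) when \(\gamma \rightarrow 0\), and from Eq. 5 because the error function vanishes when \(t\rightarrow +\infty \).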

As populations of different species do not directly interact, each population is an independent realization of a point process, which means that the distribution of all individuals within the community at time t is a random superposition of stationary point processes and thus \(g_{ij}(r,t)~=~1\) if \(i\ne j\) (Illian et al. 2008, p. 326, eq. 5.3.13).

Related to the pair correlation function is Ripley’s K-function K(r). Using its marked version, \(C_{j}K_{ij}(r)\) is the average number of points of species j surrounding an individual of species i within a sphere of radius r (Illian et al. 2008), i.e.,

$$\begin{aligned} \forall r\ge 0, K_{ij}(r)=\frac{1}{C_{j}}{\mathbb {E}}_{i}\left( N_{j} \left( b(o,r)\backslash \{o\}\right) \right) , \end{aligned}$$
(6)

where \({\mathbb {E}}_{i}\) is the expectation with respect to individuals of species i and \(N_{j}\left( b(o,r)\backslash \{o\}\right) \) is the number of individuals of species j in the sphere of radius r centered on individual o, not counting individual o itself. \(K_{ij}(r)\) is related to \(g_{ij}(r)\) as

$$\begin{aligned} g_{ij}(r)=\frac{K_{ij}'(r)}{4\pi r^{2}}. \end{aligned}$$
(7)

Combining Eq. 7 and, when \(U>0\), Eq. 4, we can show that (see Appendices, Eqs. 38-44)

$$\begin{aligned} K_{ii}(r)=\frac{4}{3}\pi r^{3}+\frac{\lambda _{i}r^{3}}{3D_{i}C_{i}\ell _{\textrm{B},i}}\left( \frac{\ell _{\textrm{B},i}}{r} +\frac{\ell _{\textrm{B},i}^{3}\log \left( \frac{r^{2}}{\ell _{\textrm{B},i}^{2}}+1\right) }{2r^{3}} +\arctan \left( \frac{r}{\ell _{\textrm{B},i}}\right) -\frac{\pi }{2}\right) . \end{aligned}$$
(8)

When \(U=0\), we need a time-dependent solution corresponding to our simulation duration, i.e. (see Appendices, Eqs. 46-51)

$$\begin{aligned} K_{ii}(r,t)=\frac{4}{3}\pi r^{3}+\frac{\lambda _{i}r^{2}}{C_{i}D_{i}}\left( \frac{1}{2}-\frac{1}{2} {{\,\textrm{erf}\,}}\left( \frac{r}{\sqrt{8D_{i}t}}\right) \left( 1-\frac{4D_{i}}{r^{2}}t\right) -\frac{\sqrt{2D_{i}t}}{\sqrt{\pi }r}e^{-\frac{r^{2}}{8D_{i}t}}\right) . \end{aligned}$$
(9)

For random superposition of stationary point processes, \(K_{ij}(r,t)=\frac{4}{3}\pi r^{3}\) if \(i\ne j\) (Illian et al. 2008, p. 324, eq. 5.3.5).
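As a sanity check on Eqs. 8 and 9, the closed-form K-functions can be compared with a direct numerical integration of Eq. 7, i.e., \(K_{ii}(r)=\int _{0}^{r}4\pi r'^{2}g_{ii}(r')dr'\). The sketch below does this with a simple trapezoidal rule; function names, the grid size and the parameter values are assumptions made for illustration only.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function without SciPy

def pcf_steady(r, lam, D, C, gamma):
    """Eq. 4."""
    ell = np.sqrt(2 * D / gamma)
    return 1 + lam / (4 * np.pi * D * C * ell) * (ell / r + np.arctan(r / ell) - np.pi / 2)

def pcf_no_advection(r, t, lam, D, C):
    """Eq. 5."""
    return 1 + lam / (4 * np.pi * r * D * C) * (1 - _erf(r / np.sqrt(8 * D * t)))

def K_steady(r, lam, D, C, gamma):
    """Eq. 8."""
    ell = np.sqrt(2 * D / gamma)
    bracket = (ell / r + ell**3 * np.log(r**2 / ell**2 + 1) / (2 * r**3)
               + np.arctan(r / ell) - np.pi / 2)
    return 4 / 3 * np.pi * r**3 + lam * r**3 / (3 * D * C * ell) * bracket

def K_no_advection(r, t, lam, D, C):
    """Eq. 9."""
    bracket = (0.5 - 0.5 * _erf(r / np.sqrt(8 * D * t)) * (1 - 4 * D * t / r**2)
               - np.sqrt(2 * D * t) / (np.sqrt(np.pi) * r) * np.exp(-r**2 / (8 * D * t)))
    return 4 / 3 * np.pi * r**3 + lam * r**2 / (C * D) * bracket

def K_from_pcf(g, r, n=200_001):
    """Trapezoidal integration of 4*pi*x^2*g(x) on (0, r], from Eq. 7."""
    x = np.linspace(r * 1e-9, r, n)  # start just above 0 to avoid dividing by zero
    f = 4 * np.pi * x**2 * g(x)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# Illustrative values: rates in d^-1, D in cm^2 d^-1, C in cells cm^-3, r in cm.
lam, D, C, gamma, t = 1.0, 1e-4, 1e3, 1e3, 0.25
print(K_steady(1e-3, lam, D, C, gamma),
      K_from_pcf(lambda x: pcf_steady(x, lam, D, C, gamma), 1e-3))
print(K_no_advection(1e-2, t, lam, D, C),
      K_from_pcf(lambda x: pcf_no_advection(x, t, lam, D, C), 1e-2))
```

In both cases the integrand stays finite near the origin, because the \(1/r\) divergence of the pcf is compensated by the \(r^{2}\) volume factor.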

2.2.2 Dominance index

The dominance index (defined in Table S1 in the Supporting Information of Wiegand et al. 2007) is the ratio between the number of conspecifics and the number of individuals of all species surrounding a given individual.

Let \(M_{ij}(r)\) be the average number of individuals of species j within a sphere of radius r around an individual of species i, which can also be written with Ripley’s K-function as \(M_{ij}(r)=C_{j}K_{ij}(r)\). \(M_{ii}(r)\) corresponds to the conspecific neighbourhood and \(M_{io}(r)=\sum _{j=1,j\ne i}^{S}M_{ij}(r)\) corresponds to individuals of all other species. We can then define \({\mathcal {D}}_{i}\) as

$$\begin{aligned} \begin{array}{ccc} {\mathcal {D}}_{i}(r) &{} = &{} \frac{M_{ii}(r)}{M_{ii}(r)+M_{io}(r)}\\ &{} = &{} \frac{C_{i}K_{ii}(r)}{\sum _{j=1}^{S}C_{j}K_{ij}(r)}. \end{array} \end{aligned}$$
(10)

When individuals of the same species i tend to cluster, \({\mathcal {D}}_{i}(r)\) tends to 1 while it tends to the proportion of individuals of species i in the whole community when the distribution is uniform (Section S2 of the SI).

Using Eqs. 8 and 10, we obtain the formula for the dominance index in the presence of advection as

$$\begin{aligned} {\mathcal {D}}_{i}(r)=\frac{\frac{\lambda _{i}}{3D_{i}\ell _{\textrm{B},i}}\left( \frac{\ell _{\textrm{B},i}}{r} +\frac{\ell _{\textrm{B},i}^{3}\log \left( \frac{r^{2}}{\ell _{\textrm{B},i}^{2}}+1\right) }{2r^{3}}+\arctan \left( \frac{r}{\ell _{\textrm{B},i}}\right) -\frac{\pi }{2}\right) +\frac{4}{3}\pi C_{i}}{\frac{\lambda _{i}}{3D_{i}\ell _{\textrm{B},i}}\left( \frac{\ell _{\textrm{B},i}}{r}+\frac{\ell _{\textrm{B},i}^{3}\log \left( \frac{r^{2}}{\ell _{\textrm{B},i}^{2}}+1\right) }{2r^{3}}+\arctan \left( \frac{r}{\ell _{\textrm{B},i}}\right) -\frac{\pi }{2}\right) +\sum _{j=1}^{S}\frac{4}{3}\pi C_{j}}.\nonumber \\ \end{aligned}$$
(11)

In the absence of advection (\(U=0,\gamma =0\)), we use the time-dependent dominance index, computed similarly:

$$\begin{aligned} {\mathcal {D}}_{i}(r,t)=\frac{\frac{\lambda _{i}}{D_{i}r}\left( \frac{1}{2} -\frac{1}{2}{{\,\textrm{erf}\,}}\left( \frac{r}{\sqrt{8D_{i}t}}\right) \left( 1-\frac{4D_{i}}{r^{2}}t\right) -\frac{\sqrt{2D_{i}t}}{\sqrt{\pi }r}e^{-\frac{r^{2}}{8D_{i}t}} \right) +\frac{4}{3}\pi C_{i}}{\frac{\lambda _{i}}{D_{i}r}\left( \frac{1}{2} -\frac{1}{2}{{\,\textrm{erf}\,}}\left( \frac{r}{\sqrt{8D_{i}t}}\right) \left( 1-\frac{4D_{i}}{r^{2}}t\right) -\frac{\sqrt{2D_{i}t}}{\sqrt{\pi }r}e^{-\frac{r^{2}}{8D_{i}t}} \right) +\sum _{j=1}^{S}\frac{4}{3}\pi C_{j}}.\nonumber \\ \end{aligned}$$
(12)
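The passage from Eq. 10 to Eq. 11 can be verified numerically by computing the dominance index both ways. In the sketch below, `dominance_eq10` plugs Eq. 8 and the interspecific value \(K_{ij}=\frac{4}{3}\pi r^{3}\) into Eq. 10, while `dominance_eq11` evaluates Eq. 11 directly; the function names and parameter values are illustrative assumptions.

```python
import numpy as np

def bracket(r, ell):
    """Term in parentheses shared by Eqs. 8 and 11."""
    return (ell / r + ell**3 * np.log(r**2 / ell**2 + 1) / (2 * r**3)
            + np.arctan(r / ell) - np.pi / 2)

def dominance_eq10(i, r, conc, lam, D, gamma):
    """Eq. 10, with K_ii from Eq. 8 and K_ij = 4/3*pi*r^3 for j != i."""
    ell = np.sqrt(2 * D / gamma)
    poisson = 4 / 3 * np.pi * r**3
    K_ii = poisson + lam * r**3 / (3 * D * conc[i] * ell) * bracket(r, ell)
    return conc[i] * K_ii / (conc[i] * K_ii + (conc.sum() - conc[i]) * poisson)

def dominance_eq11(i, r, conc, lam, D, gamma):
    """Eq. 11, written in terms of the excess conspecific term."""
    ell = np.sqrt(2 * D / gamma)
    excess = lam / (3 * D * ell) * bracket(r, ell)
    return (excess + 4 / 3 * np.pi * conc[i]) / (excess + 4 / 3 * np.pi * conc.sum())

# Illustrative 3-species community with even concentrations (cells cm^-3).
conc = np.array([1e3, 1e3, 1e3])
for r in (1e-5, 1e-3, 1.0):
    print(r, dominance_eq10(0, r, conc, 1.0, 1e-4, 1e3),
          dominance_eq11(0, r, conc, 1.0, 1e-4, 1e3))
```

With these illustrative values the index is close to 1 at the smallest distance and approaches the proportion of the species in the community (here 1/3) at the largest one, which is the qualitative behaviour described for \({\mathcal {D}}_{i}(r)\) above.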

2.3 Parameters

We model two types of organisms: microphytoplankton (defined by a diameter between 20 and 200 µm, here 50 µm) and nanophytoplankton (defined by a diameter between 2 and 20 µm, here 3 µm). These two groups are characterized respectively by a low diffusivity, slow growth and lower concentration vs. high diffusivity, fast growth and higher concentration. Organisms are displaced by a turbulent fluid whose velocity defines the time scale of the discretized model: we give here the reasoning behind parameter values, keeping in mind that our model parameters are only approximate. Main parameter definitions and values are given in Table 1.

2.3.1 Advection

We first consider the advection process, due to the turbulence of the environment. We only consider the Batchelor-Kolmogorov regime, i.e., \(L_{s}\) is below the size of the smallest eddy, but above the smallest length scale of fluctuations in nutrient concentrations. The defining scale of the environment therefore corresponds to a Reynolds number

$$\begin{aligned} \text {Re}=\frac{U}{k\nu }\approx 1 \end{aligned}$$
(13)

where \(\nu =10^{-6}\) m\(^{2}\) s\(^{-1}\) is the kinematic viscosity for water. The smallest wavenumber k corresponds to the largest length scale \(L_{s}\) (Kolmogorov scale), i.e., \(k=2\pi /L_{s},\) with \(L_{s}\approx 1\) cm in the ocean (Barton et al. 2014). The definition of the Reynolds number leads to

$$\begin{aligned} \begin{array}{cc} 1 &{} \approx \;\frac{UL_{s}}{2\pi \nu }\\ \Leftrightarrow U &{} \approx \;\frac{2\pi \nu }{L_{s}}. \end{array} \end{aligned}$$
(14)

This means that \(U=6.3\times 10^{-4}\) m s\(^{-1}\) = \(5.4\times 10^{3}\) cm d\(^{-1}\). Using \(U\tau /3=0.5\) cm as in Young et al. (2001), we have \(\tau =2.8\times 10^{-4}\) d \(=24\) s. When \(U\tau /3=0\), the environment is only diffusive, and we keep the same value for \(\tau \). For \(U\tau /3=0.5\) cm, we estimate \(\gamma =1231\) d\(^{-1}\).
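The arithmetic leading from Eq. 14 to the time step can be reproduced as follows. This is a plain unit-conversion sketch; the variable names are ours.

```python
import math

nu = 1e-6    # kinematic viscosity of water, m^2 s^-1
L_s = 1e-2   # Kolmogorov length scale, m (1 cm)

# Eq. 14: Re ~ 1 gives U ~ 2*pi*nu / L_s.
U_m_per_s = 2 * math.pi * nu / L_s
U_cm_per_d = U_m_per_s * 100 * 86400   # m s^-1 -> cm d^-1

# Time step such that U*tau/3 = 0.5 cm.
tau_d = 3 * 0.5 / U_cm_per_d           # in days
tau_s = tau_d * 86400                  # in seconds

print(f"U = {U_m_per_s:.2e} m/s = {U_cm_per_d:.2e} cm/d")
print(f"tau = {tau_d:.2e} d = {tau_s:.1f} s")
```

Rounded to two significant figures, these values agree with those quoted in the text.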

2.3.2 Diffusion

If we use the Stokes–Einstein equations (Einstein 1905, cited from Dusenbery 2009), diffusivity can be computed with

$$\begin{aligned} D_{i}=\frac{RT}{N_{A}}\frac{1}{6\pi \eta a_{i}} \end{aligned}$$
(15)

where \(R=8.314\) J K\(^{-1}\) mol\(^{-1}\) is the molar gas constant, \(T=293\) K is the temperature, \(N_{A}=6.0225\times 10^{23}\) is Avogadro’s number, \(\eta =10^{-3}\) m\(^{-1}~\)kg s\(^{-1}\) is the dynamic viscosity of water and \(a_{i}\) is the radius of the organism considered.

Using \(D_{i}=\frac{\Delta _{i}^{2}}{2\tau }\), we find that

$$\begin{aligned} \begin{array}{cccc} &{} \Delta _{i} &{} = &{} \sqrt{2\tau D_{i}}\\ \Leftrightarrow &{} \Delta _{i} &{} = &{} \sqrt{\frac{RT}{N_{A}}\frac{\tau }{3\pi \eta a_{i}}}. \end{array} \end{aligned}$$
(16)

We consider \(a_{n}=1.5\) µm for nanophytoplankton individuals and \(a_{m}=25\) µm for microphytoplankton individuals, which allows us to compute \(\Delta _{n}\) and \(\Delta _{m}\) (see Table 1).
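Equations 15 and 16 can be evaluated directly for the two radii. The sketch below uses the constants given in the text and a time step of 24 s; function and variable names are ours, and the printed values are simply what the formulas return rather than a transcription of Table 1.

```python
import math

R = 8.314         # molar gas constant, J K^-1 mol^-1
T = 293.0         # temperature, K
N_A = 6.0225e23   # Avogadro's number as given in the text
eta = 1e-3        # dynamic viscosity of water, kg m^-1 s^-1
tau = 24.0        # time step, s

def diffusivity(a):
    """Stokes-Einstein diffusivity (Eq. 15) in m^2 s^-1 for a radius a in m."""
    return R * T / N_A / (6 * math.pi * eta * a)

def step_std(a, tau):
    """Standard deviation of the diffusive step (Eq. 16) in m."""
    return math.sqrt(R * T / N_A * tau / (3 * math.pi * eta * a))

for name, a in (("nanophytoplankton", 1.5e-6), ("microphytoplankton", 25e-6)):
    D = diffusivity(a)
    print(f"{name}: D = {D * 1e4:.2e} cm^2/s, Delta = {step_std(a, tau) * 100:.2e} cm")
```

Because \(D_{i}\propto 1/a_{i}\), the diffusivity of the smaller organisms is larger by the ratio of radii (25/1.5), and \(\Delta _{i}\) by the square root of that ratio.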

2.3.3 Ecological processes

We study the community at equilibrium, with the birth rate equal to the death rate, i.e., \(p_{i}=q_{i}\,\forall i\). We use a microphytoplankton doubling rate of 1 d\(^{-1}\) (Bissinger et al. 2008) and consider the fastest-growing nanophytoplankton species, corresponding to a diameter of 3 µm (Bec et al. 2008), for which the doubling rate is between 2 and 3 d\(^{-1}\) (set to 2.5 d\(^{-1}\) here).

Table 1 Definitions and values of the main parameters used in the three-dimensional BBM, assuming the duration of a time step \(\tau \) is 24 s

2.3.4 Range of interaction

As we examine individual aggregation and its potential effects on interactions between species, we have to ascertain the volume in which an individual can be affected by the presence of other individuals, or affect other individuals. We only consider here interactions due to competition for nutrients, and therefore need to define a nutrient depletion volume. We approximate this volume as the sphere of radius r where \(C(r)\le 90\%C_{\infty }\) with \(C_{\infty }\) the background concentration of the nutrient and \(C(a_i) = 0\) (perfect absorption at the cell surface). The radius of this nutrient depletion volume is maximized when the individual is in stagnant water so that diffusion is the only hydrodynamic process. In this case, the depletion radius corresponds to 10 times the radius of the individual (Jumars et al. 1993; Karp-Boss et al. 1996). We define the maximum distance which allows for potential interactions (due to competition for resources) between two individuals of radius \(a_{i}\) and \(a_{j}\) as \(d_{\text {threshold}}\), and the corresponding volume of potential interactions around an organism as \(V_{\text {int}}=4/3\pi d_{\text {threshold}}^{3}\) with

$$\begin{aligned} d_{\text {threshold}}=10a_{i}+10a_{j}. \end{aligned}$$
(17)

We consider this maximum value as our baseline, keeping in mind that turbulence reduces the size of the nutrient depletion volume and increases the nutrient flux to the cell (Arnott et al. 2021). We caution that determination of the shape of the nutrient depletion volume in the presence of turbulence is too complex to be addressed here (Karp-Boss et al. 1996).
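The factor of 10 and the resulting thresholds can be made concrete with a short calculation. The sketch assumes the standard steady-state profile for pure diffusion towards a perfectly absorbing sphere, \(C(r)=C_{\infty }(1-a_{i}/r)\), which is our assumption for illustration (it is consistent with the boundary conditions stated above but is not written out in the text); function names are hypothetical.

```python
import math

def concentration(r, a, c_inf=1.0):
    """Assumed steady-state nutrient profile around an absorbing sphere of radius a."""
    return c_inf * (1 - a / r)

def depletion_radius(a, fraction=0.9):
    """Radius at which the nutrient concentration reaches `fraction` of C_inf."""
    return a / (1 - fraction)

def d_threshold(a_i, a_j):
    """Maximum distance for potential interactions (Eq. 17)."""
    return 10 * a_i + 10 * a_j

def interaction_volume(d):
    """Volume of potential interactions, V_int = 4/3*pi*d^3."""
    return 4 / 3 * math.pi * d**3

# Radii in cm: 1.5 um for nanophytoplankton, 25 um for microphytoplankton.
for name, a in (("nanophytoplankton", 1.5e-4), ("microphytoplankton", 25e-4)):
    d = d_threshold(a, a)
    print(f"{name}: depletion radius = {depletion_radius(a):.1e} cm, "
          f"d_threshold = {d:.1e} cm, V_int = {interaction_volume(d):.2e} cm^3")
```

For two individuals of the same size class, the threshold is thus 20 times the cell radius, i.e., \(3\times 10^{-3}\) cm for nanophytoplankton and \(5\times 10^{-2}\) cm for microphytoplankton.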

We consider a total volume of 1000 cm\(^{3}\) for microphytoplankton and 10 cm\(^{3}\) for nanophytoplankton (volumes are adapted to balance realistic concentrations and computation time) with periodic boundary conditions. Individuals are uniformly distributed in the cube at the beginning of the simulation. We run an idealized simulation with 3 species with an even abundance distribution of about \(10^{4}\) cells L\(^{-1}\) for microphytoplankton (Picoche and Barraquand 2020) and \(10^{6}\) cells L\(^{-1}\) for nanophytoplankton individuals (Edwards 2019). We then model a more realistic community with 10 species having a skewed abundance distribution (between 55,000 and 400 cells L\(^{-1}\) for microphytoplankton, according to observations of field abundance distributions in Picoche and Barraquand (2020), and multiplied by \(10^{2}\) for nanophytoplankton). All simulations are run for 1000 time steps of duration \(\tau \) (corresponding to approximately 6 h 40 min of simulated time; note that computation runtimes can be much longer). The computation of g and K for simulated distributions is explained in Section S4 of the SI. The code for all simulations and analyses can be found at https://github.com/CoraliePicoche/brownian_bug_3D/.
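For readers who wish to experiment, a plug-in estimate of \(K_{ij}(r)\) and of the dominance index can be obtained from simulated positions by direct pair counting. The sketch below is a generic brute-force estimator and not necessarily the procedure of Section S4 of the SI: it assumes a periodic cube (so that no edge correction is needed, using minimum-image distances valid for \(r<L/2\)), estimates \(C_{j}\) as \(n_{j}/L^{3}\), and all function names are hypothetical.

```python
import numpy as np

def pair_count(A, B, r, L, same=False):
    """Number of ordered pairs (a in A, b in B) closer than r in a periodic cube."""
    d = A[:, None, :] - B[None, :, :]
    d -= L * np.round(d / L)              # minimum-image convention
    within = np.sqrt((d**2).sum(axis=2)) <= r
    if same:
        np.fill_diagonal(within, False)   # do not count an individual with itself
    return int(within.sum())

def ripley_K(A, B, r, L, same=False):
    """Plug-in estimate of Eq. 6: mean neighbour count divided by C_j = n_j / L^3."""
    return L**3 * pair_count(A, B, r, L, same) / (len(A) * len(B))

def dominance(points, i, r, L):
    """Share of conspecifics among all neighbours within r of species i (Eq. 10)."""
    counts = [pair_count(points[i], P, r, L, same=(j == i)) for j, P in enumerate(points)]
    total = sum(counts)
    return counts[i] / total if total > 0 else float("nan")

# Usage on uniformly distributed points: K should be near 4/3*pi*r^3 and the
# dominance index near the proportion of the species in the community.
rng = np.random.default_rng(1)
points = [rng.uniform(0, 1.0, size=(400, 3)) for _ in range(3)]
print(ripley_K(points[0], points[1], 0.2, 1.0), 4 / 3 * np.pi * 0.2**3)
print(dominance(points, 0, 0.2, 1.0))
```

The memory cost of this brute-force version grows with the product of population sizes, so it is only suitable for small illustrative examples; a spatial index would be needed for the concentrations used in the simulations.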

3 Results

We show an example of nanophytoplankton spatial distributions with and without advection at the end of a simulation in Fig. 1. In the presence of advection, clustering is not visible to the naked eye, even when zooming in on the observation volume, whereas removing turbulence reveals small aggregates of conspecifics. Microphytoplankton distributions are not straightforward to interpret as no clusters can be detected visually (although they may actually be present), whether advection is included or not (Section S5 of the SI). Statistics are therefore needed to go further in detecting patterns of aggregation.

Fig. 1

Spatial distributions of a 3-species community of nanophytoplankton with and without advection with density \(C=10^{3}\) cells cm\(^{-3}\) after 1000 time steps. Each color corresponds to a different species. On the left-hand side, only a zoom on a \(0.5\times 0.5\times 0.5\) cm\(^{3}\) cube is shown, and its projection on the x-y plane is shown on the right-hand side

Ripley’s K-functions extracted from numerical simulations match the theoretical formulas (Fig. 2) for both types of organisms, which also indicates that dominance indices extracted from the simulations match theoretical expectations.

Fig. 2

Comparison of theoretical and simulated Ripley’s K-functions as a function of distance (in cm) for microphytoplankton (a, b) and nanophytoplankton (c, d) in a 3-species community with even abundance distributions after 1000 timesteps, with (a, c) and without (b, d) advection. Each color represents a different species. Intraspecific K-functions are shown with dashed (theoretical values) and solid (simulated values) lines. Interspecific K-functions are shown with dotted lines (theoretical values) and circles (simulated values). The black dash-dotted line corresponds to the threshold considered as the maximum distance for nutrient-based competition

Dominance indices all follow a similar pattern (Figs. 3 and 4). The dominance index is close to 1 for small distances: there is always a scale at which an organism is surrounded almost only by conspecifics. The index then decreases sharply to converge at large distances (close to 1 cm) to the proportion of the focal species in the whole community, as it would for a uniform spatial distribution. Patterns differ at intermediate ranges of distances between organisms.

In the presence of advection, the dominance index starts decreasing for a distance approximately 10 times smaller than when advection is absent, which indicates that organisms are closer to heterospecifics when their environment is turbulent. A quasi-uniform distribution is also reached for smaller distances with advection than without. Microphytoplankton species start mixing for distances larger than for nanophytoplankton species irrespective of the hydrodynamic regime surrounding them.

In a 3-species community with the same initial abundances, in the presence of advection, microphytoplankton dominance indices are between 0.37 and 0.47 at the distance threshold for potential interactions, while they are between 0.80 and 0.94 for nanophytoplankton species. In the absence of turbulence, dominance indices are all above 0.98 when the distance threshold is reached (Fig. 3). Microphytoplankton organisms are therefore as likely to share their depletion volume with conspecifics as they are with heterospecifics, but only when turbulent advection is accounted for, whereas nanophytoplankton organisms always have almost only conspecifics around them.

Fig. 3

Dominance indices as a function of distance (in cm) for microphytoplankton (a) and nanophytoplankton (b) in a 3-species community with even abundance distributions (final proportions in the community are indicated in the figure) after 1000 timesteps, with (circles) and without (lines) advection. Each color represents a different species. The grey dashed curve represents the analytical solution. The black dashed line corresponds to the threshold considered as the maximum distance for nutrient-based competition

More mixing in microphytoplankton than nanophytoplankton, and more mixing with advection, also holds when considering a 10-species community with a skewed abundance distribution (Fig. 4), but dominance indices are overall lower in communities with more species and with less even abundances. In the presence of advection, microphytoplankton dominance indices at the distance threshold are between 0.34 (for the most abundant species) and 0.033 (for one of the least abundant species), while they are between 0.85 and 0.90 when advection is not taken into account. Nanophytoplankton species, too, are more mixed than in the 3-species community: dominance indices vary between 0.2 and 0.54 when the depletion threshold is reached (with the exception of a value of 0 for one particular species, which had no conspecific at distances below \(10^{-2}\) cm) when organisms are displaced by turbulence, while the same quantity is between 0.97 and 1 when they are only subject to diffusion.

Fig. 4

Dominance indices as a function of distance (in cm) for microphytoplankton (a) and nanophytoplankton (b) in a 10-species community with a skewed abundance distribution (final proportions in the community are indicated in the figure) after 1000 timesteps, with (circles) and without (lines) advection. Each color represents a different species. The coloured dashed curves (advection) and small stars (no advection) represent the analytical solution. The black dashed line corresponds to the threshold considered as the maximum distance for nutrient-based competition

Differences in spatial distributions are not only due to organism sizes, which determine their demographic and hydrodynamic properties, but also to their abundances (here set through initial values). In the presence of turbulence, the threshold distance at which dominance falls below 95% is smaller for more abundant species (Fig. 5a, b). Abundant species tend to be present nearly everywhere when they are mixed in the environment. Therefore, they are also more likely to be close to a heterospecific, but still have more conspecifics close to them than the less abundant species (\({\mathcal {D}}\left( d_{\text {threshold}}\right) \) increases with abundance, Fig. 5c, d). However, this increase is less clear for nanophytoplankton than for microphytoplankton (Fig. 5c, d). When turbulence is absent, the relationships with abundance are unclear, possibly affected by sampling effects, and we refrain from interpreting them.

Fig. 5

Minimum distances (in cm) between points for dominance to drop below 95% (a and b) and dominance at a distance corresponding to the threshold for competition (c and d) as a function of abundances (note the logarithmic scale on the x-axis) for microphytoplankton and nanophytoplankton. We consider cases with and without advection in a 10-species community with a skewed abundance distribution. These have been obtained combining 10 sets of simulations

4 Discussion

We designed a stochastic, three-dimensional, individual-based model of the spatial distribution of multiple species in a viscous and turbulent flow. We conducted both mathematical analyses and numerical simulations to quantify spatial correlations in the distribution of organisms. We focused on the pair correlation function and Ripley’s K-function, for which numerical and theoretical analyses showed a good agreement, and extracted a more ecologically-oriented metric from them, i.e., the dominance index. This statistic is the local average ratio of conspecifics, i.e., the number of organisms of the focal species in the neighbourhood of an individual of the same species, divided by the total number of organisms in that neighbourhood. Intraspecific clustering corresponds to a dominance index close to 1, which decreases when interspecific mixing increases. The choice of this index was motivated by two reasons: (1) it is at its core a proportion of a focal species in a certain volume, i.e. a scale-dependent, localized metric bounded between 0 and 1 as opposed to other statistics whose values are less directly interpreted, and (2) it is easy to relate to coexistence theory as it describes the environment of an organism in terms of heterospecifics and conspecifics, which can, under certain assumptions that we discuss below, be related to interspecific and intraspecific interactions. Comparing the distributions of organisms of different sizes, we showed that the presence of turbulence always increased mixing (results are robust to slight modifications in the computation of advection velocity U, shown in Section S6 of the SI). The species composition around an organism depended on its size, which mechanically determines its hydrodynamic properties (diffusivity), and is linked with its ecological characteristics (growth rate and density). 
Microphytoplankters (20–200 µm), larger cells with lower diffusivity, growth rate and abundance, were on average further away from other cells, due to their lower concentrations (Fig. S11 of the SI), than nanophytoplankters (2–20 µm). They were surrounded by more heterospecifics than conspecifics within a volume of potential interactions, whose radius is defined as the maximum distance for which nutrient depletion volumes of two different individuals may overlap. If we consider that interactions between species (not modelled directly here because of timescale issues, see below) could occur with equal probability at all distances within the volume of potential interactions, we would conclude that microphytoplankters are more likely to interact with individuals from other species than with individuals of their own species. This conclusion is, however, conditional upon interactions at 10 cell diameters from an individual being as likely as at 1 diameter from an individual. If we keep in mind that interactions are more likely or stronger at very short distances, or that the maximal radius of interaction could be shorter than our estimate and advection velocity U lower (SI Section S6), microphytoplankters may still experience more frequent effects of conspecifics than heterospecifics.

To see this, let us first focus on the smallest distances between organisms. The nearest neighbour of an organism was always an organism of the same species, and the minimum distance between conspecifics was always lower than expected for a uniform distribution (Section S7 of the SI). The dominance index remained close to 1 for distances below \(10^{-2}\) cm or \(10^{-3}\) cm for microphytoplankton and nanophytoplankton respectively. There was therefore always some intraspecific aggregation, i.e., conspecifics were always closer than heterospecifics at the smallest distances. This is due to the prevalence of demographic processes at individual scales, because an individual acts as a source point for other organisms of the same species, and hydrodynamic processes do not separate conspecifics fast enough to prevent aggregation. This remains true if we add an initial separation distance between mother and daughter cells upon birth (additional simulations, see code repository). If we consider that interaction strengths are a smoothly decaying function of distance, a common assumption in spatial coexistence models (e.g., Bolker and Pacala 1999; Law et al. 2003), this implies that population-level intraspecific interactions could be stronger than interspecific interactions due to intraspecific micro-scale aggregation. However, the mechanisms of competition at this scale are poorly known, likely relying on multiple types of resources with different distributions in the environment, effects on the cell, uptakes, etc. Rather than weighting much more heavily the potential interactions with the closest neighbour(s) through an interaction kernel, we therefore chose conservatively to define a maximum distance for two organisms to possibly affect the concentrations of elements in the environment of each other, assuming perfect absorption on the cell surface. We consider that, at all distances below this threshold, interactions could happen between organisms. 
We continue the discussion with that simplification in mind, and explicitly mention when it is relaxed.

Dominance indices began to decrease at distances above \(10^{-3}\) cm, still below the maximum distance for interactions. At this distance and above, the balance between heterospecifics and conspecifics was much more sensitive to different phytoplankters’ demographic and hydrodynamic traits. The species composition of an organism’s neighbourhood depended on its size: nanophytoplankton organisms mainly shared their volume of potential interactions with conspecifics (the dominance index remained close to 1, even near the distance threshold, i.e., the maximum distance for the overlap of nutrient depletion volumes) while microphytoplankton organisms could affect both conspecifics and heterospecifics (the dominance index was often below 0.5 at the distance threshold, i.e. an individual’s depletion zone probably overlapped with more heterospecifics’ than conspecifics’). Microphytoplankters were therefore more likely to share their depletion volume with heterospecifics than nanophytoplankters. The rate of production of new microphytoplankton conspecifics was not sufficient to compensate for the mixing induced by turbulence and diffusivity, even though the diffusivity of microphytoplankters was smaller than that of nanophytoplankters. There may therefore be different mechanisms at play at the community level for microphytoplankton and nanophytoplankton to maintain coexistence. For nanophytoplankton, the spatial structure likely leads to more interactions between conspecifics than between heterospecifics. The spatial distribution of microphytoplankton species, on the contrary, encourages more interactions between heterospecifics. If we consider that local interaction strengths are equal within the volume of potential interactions, scaling to the population level, we would likely observe stronger intra- over interspecific interactions for nanophytoplankton (a key factor in coexistence theory, Barabás et al. 2017) but not necessarily so for microphytoplankton. 
Using a timescale separation argument, we show in Section S8 in the SI how stronger interactions at population level than individual level may arise in a Lotka-Volterra model whose spatial structure is summed up by the dominance indices evidenced here. Stronger intra- than interspecific competition may arise at population level even when assuming that all local interaction strengths between individuals are equal, regardless of the identity of competitors.

All of the above discussion is based on a microphytoplankter’s neighbourhood in its nutrient depletion volume. To simplify the computation, we used maximum volumes of potential interactions, corresponding to a diffusive-only flow of nutrient particles. But when fluid turbulence increases, nutrient uptake increases, and the size of the depletion zone decreases (Karp-Boss et al. 1996). The proportion of change in the depletion volume increases with the size of organisms: a 10 µm-diameter organism might not experience any change, while the uptake of a 100 µm-diameter organism would increase by at least 50% (Karp-Boss et al. 1996). Therefore, the volume of potential interactions shrinks in the presence of turbulence for microphytoplankton, but not necessarily for nanophytoplankton. An additional reason why microphytoplankters might still be surrounded by conspecifics at ecologically meaningful distances, and interact more frequently with them, is imperfect absorption of nutrients: if nutrient concentration at the cell surface is not zero but \(C_0\), then the radius of interaction is \(10 a_i(1 - C_0/C_{\infty })\), which is smaller than the value \(10 a_i\) obtained under perfect absorption.
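
The dependence of the radius of interaction on surface concentration can be sketched numerically. The sketch below assumes steady-state diffusion towards a sphere of radius a, so that \(C(r) = C_{\infty } - (C_{\infty } - C_0)\, a/r\), and assumes that the depletion zone ends where the concentration recovers to 90% of \(C_{\infty }\). This criterion is our assumption here, chosen because it reproduces the factor 10 in the expression above. The function name `depletion_radius` and the example cell size are hypothetical.

```python
def depletion_radius(a, c0_over_cinf=0.0, fraction=0.9):
    """Distance from the cell centre at which the nutrient concentration
    recovers to `fraction` of the far-field concentration C_inf.

    Assumes the steady-state diffusive profile around a sphere of radius a
    with surface concentration C_0:
        C(r) = C_inf - (C_inf - C_0) * a / r
    Solving C(r) = fraction * C_inf for r gives
        r = a * (1 - C_0 / C_inf) / (1 - fraction),
    i.e. 10 * a * (1 - C_0 / C_inf) for the assumed fraction of 0.9.
    """
    if not 0.0 <= c0_over_cinf <= 1.0:
        raise ValueError("C_0 / C_inf must lie between 0 and 1")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return a * (1.0 - c0_over_cinf) / (1.0 - fraction)


# Hypothetical cell of radius 50 µm, expressed in cm.
a = 5e-3
r_perfect = depletion_radius(a)                       # perfect absorption, C_0 = 0
r_imperfect = depletion_radius(a, c0_over_cinf=0.5)   # C_0 is half of C_inf

print(f"perfect absorption:   {r_perfect:.3g} cm")
print(f"imperfect absorption: {r_imperfect:.3g} cm")
```

Under these assumptions, a surface concentration equal to half the far-field concentration halves the radius, and therefore divides the depletion volume by eight.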

Up to now, we have only focused on the dominance index, a localized proportion of conspecifics. However, interactions also depend on the absolute densities of individuals. Mechanically, when density decreases, the distances between neighbours increase, which explains why the distances between the low-abundance microphytoplankters tended to be greater than distances between the more abundant nanophytoplankters (Section S7 of the SI). Explicit mathematical models using pair densities to express interaction rates (e.g. Law et al. 2003; Plank and Law 2015) may be able to incorporate those effects; however, as we highlight below, the timescales and spatial correlations that are seen in such models may not faithfully represent phytoplankton community dynamics.
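
The mechanical link between density and distances between neighbours can be made explicit for the uniform baseline. The sketch below uses the standard result for a homogeneous Poisson pattern in three dimensions, for which the expected nearest-neighbour distance at density C is \(\Gamma (4/3)\,(4\pi C/3)^{-1/3} \approx 0.554\, C^{-1/3}\). The densities used in the example are hypothetical and are not the concentrations simulated in this study. Clustered distributions such as those described above would have shorter conspecific distances than this baseline.

```python
import math


def mean_nn_distance_3d(density):
    """Expected nearest-neighbour distance for a homogeneous (uniform) Poisson
    point pattern in three dimensions, given a density in individuals per
    unit volume. The result is in the corresponding length unit."""
    if density <= 0:
        raise ValueError("density must be positive")
    return math.gamma(4.0 / 3.0) * (4.0 * math.pi * density / 3.0) ** (-1.0 / 3.0)


# Hypothetical densities, in cells per cubic centimetre.
for density in (10.0, 1000.0, 100000.0):
    distance = mean_nn_distance_3d(density)
    print(f"{density:>9.0f} cells/cm^3 -> {distance:.4f} cm")
```

Because the distance scales as \(C^{-1/3}\), an eightfold decrease in density only doubles the expected nearest-neighbour distance.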

Contrary to other similar models (e.g., Birch and Young 2006; Bouderbala et al. 2018), we did not consider explicit effects of local density on survival and fertility rates. Beyond simply maintaining analytical tractability, we had another, more biological reason for this choice: we cannot be sure that these local density-dependencies make sense in our phytoplankton context. To understand why, consider that even if a species’ abundance is locally tripled, competition might not directly ensue at the time scales covered by our model (\(\approx 7\) h), if nutrient depletion has not had time to set in yet. Even if we considered longer time frames, we would need lagged local density-dependencies, which, to our knowledge, do not lead to tractable spatial branching or dynamic point processes. We could, of course, directly model nutrients, perhaps as resource “points” with dynamics of their own (Murrell 2005; North and Ovaskainen 2007), which in turn change the reproduction or death rate of individuals. If the resource points risk being depleted, this entails a negative spatial correlation between organisms and their resources (Murrell 2005; Barraquand and Murrell 2012). And that is where such models might be inadequate. The phycosphere, a micro-environment at the periphery of a phytoplankton organism where communities of bacteria interact (Seymour et al. 2017), can also impact phytoplankton fitness, both positively (cross-feeding) and negatively (algicidal activities of bacteria). This can sometimes lead to an accumulation of key resources close to the phytoplankter, and hence to positive spatial correlations between consumers and their resources, which we currently have no theoretical models to represent (short of modelling precisely the spatial distribution of these bacteria).

Our model should be viewed as a first model of spatial distributions of multiple phytoplankton species in a realistic, three-dimensional environment at the microscale, describing only basic hydrodynamic and demographic processes. Using this model, we were able to predict whether phytoplankters could be in contact with individuals of their own or other species, and form reasonable conjectures regarding potential intra- vs. interspecific interactions, emerging at the population level through spatial distributions (Detto and Muller-Landau 2016). It is worthwhile to keep in mind that there are many remaining features of phytoplankton physiology and life histories which we do not address here, but which may affect spatial distributions. Many phytoplankters are able to move actively in three dimensions, which can favour cluster formation (Breier et al. 2018). Even those that are believed to move passively often actually move along the vertical dimension by regulating their buoyancy (Reynolds 2006), and can at times aggregate to form pairs (Font-Muñoz et al. 2019). Finally, a part of spatial structure is explained by the partially colonial nature of microphytoplankton (Kiørboe et al. 1990). This clearly calls for viewing our model as a null model to which more complex mechanistic models and their spatial outputs can be compared.