1 Introduction

The use of co-simulation is increasing in industry, as it makes it possible to connect and simulate systems through given interfaces (input and output variables) without disclosing the expertise inside them. Modellers can hence provide system architects with virtual systems as black boxes, since the systems are able to interact through their interfaces. Among these interactions, the minimal requirements are quite simple: a system should at least be able to read the inputs given by the other systems, to simulate its internal physics (most of the time thanks to an embedded solver), and to provide the outputs of the simulation to the other systems.

Besides its black-box aspect protecting the know-how, co-simulation also enables physics-based decomposition (one system can represent the hydraulic part of a modular model, another the mechanical part, a third one the electrical part, and so on) and/or dynamics-based decomposition (some systems isolate the stiff state variables so that they no longer constrain all the other states during the simulation). In other words, co-simulation opens many doors thanks to the modular aspect of the models handled.

The co-simulation field of research nowadays focuses on the numerical methods and algorithms that can be used to process simulations of such modular models. From the simplest implementations (non-iterative Jacobi) to very advanced algorithms [4, 7, 11,12,13, 15], co-simulation methods have been developed in different fields, showing that the underlying problems are not straightforward. Some of these problems have been clearly identified since co-simulation became a center of interest for researchers: the delay between the given inputs and the retrieved outputs of a system (corresponding to the so-called “co-simulation step” or “macro-step”), the instabilities that might occur as a consequence of this delay [16], the discontinuities produced at each communication [5], the error estimation (and its use to adapt the macro-step size) [13], the techniques to solve the so-called “constraint function” corresponding to the interfaces of the systems [9, 14], and so on. Moreover, performance issues usually arise when co-simulation codes are implemented in practice, for instance idling systems: in Gauss-Seidel-like methods, systems are simulated sequentially, one at a time, and in Jacobi-like methods the time taken by a co-simulation step is that of the slowest system, due to synchronization points where faster systems have to wait for slower ones. Many of these problems have been addressed in papers, either proposing an analysis, a method to solve them, or both.

In our previous paper [6], an iterative method that satisfies the interfaces’ consistency while avoiding discontinuities at each macro-step was proposed and compared to well-established methods (non-iterative Jacobi, zero-order hold iterative co-simulation [9], and a non-iterative algorithm enhancing the variables’ smoothness [5]). This algorithm was based on a fixed-point iterative method. Its evolution, presented in this paper, is based on iterative methods that normally require the computation of a Jacobian matrix, used here in their Jacobian-free versions. The name of this method is IFOSMONDI-JFM, standing for Iterative and Flexible Order, SMOoth and Non-Delayed Interfaces, based on Jacobian-Free Methods. The enhancements it brings to the classical IFOSMONDI method make it possible to solve cases that the previous version could not. The integration of an easily interchangeable Jacobian-free method to solve the constraint function will be presented. The software integration, in particular, was made possible by the PETSc framework, a library that provides modular numerical algorithms. The interfacing between PETSc and the co-simulation framework dealing with the systems, interfaces and polynomial representations will be detailed.

2 Formalism and notations

2.1 A word on the JFM acronym

Throughout this paper, the JFM abbreviation will denote Jacobian-free versions of iterative methods that are designed to bring a given function (the so-called callback) to zero and that normally require the computation of the Jacobian matrix of this callback function. In particular, a fixed-point method does not meet these criteria: it is not a JFM, contrary to matrix-free versions of the Newton method, the Anderson method [1] or the non-linear GMRES method [10].

2.2 General notations

The set \(M_{a, b}(A)\) will denote the set of matrices with \(a\) rows and \(b\) columns and coefficients in the set A.

In this paper, we will focus on explicit systems. In other words, we will consider that every system in the co-simulation is a dynamical system corresponding to an ODE (Ordinary Differential Equation). The time-domain of the ODEs considered will be written \([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[\), and the variable t will denote the time.

Let us consider that \(n_{sys}\in {\mathbb {N}}^*\) systems are involved: we will use the index \(k\in [\![1, n_{sys}]\!]\) to denote the \(k^{\mathrm{th}}\) system, and \(n_{st,k}\), \(n_{in,k}\), and \(n_{out,k}\) will respectively denote the number of state variables, inputs, and outputs of system k.

The time-dependent vectors of states, inputs and outputs of system k will respectively be written \(x_k\in L([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[, \mathbb {R}^{n_{st,k}})\), \(u_k\in L([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[, \mathbb {R}^{n_{in,k}})\), and \(y_k\in L([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[, \mathbb {R}^{n_{out,k}})\), where L(A, B) denotes the set of functions with domain A and co-domain B. We can write the ODE form of system k:

$$ \left\{ \begin{array}{lcl} \dot{x}_k(t) & = & f_k(t, x_k(t), u_k(t)) \\ y_k(t) & = & g_k(t, x_k(t), u_k(t)) \end{array} \right. $$
(1)

Please note that co-simulation is mainly relevant for 0D systems. CFD systems, for instance, can be split in terms of physics, generating systems coupled at every point in space; this would generate a very high number of interfaces \(n_{in,k}\) and \(n_{out,k}\) for all k in \([\![1, n_{sys}]\!]\). Although co-simulation can work on such cases, we will focus on 0D systems (such as the test cases presented in Sect. 5), as co-simulation becomes relevant when the stiffness of the systems is local to each of them, and when the interface variables (inputs and outputs) are smooth and relatively few.

Let \(n_{in,tot}\) and \(n_{out,tot}\) respectively be the total amount of inputs \(\sum _{k=1}^{n_{sys}}n_{in,k}\) and the total amount of outputs \(\sum _{k=1}^{n_{sys}}n_{out,k}\).

The total input and total output vectors are simply the concatenations of the input and output vectors of every system. They will be denoted by underlined symbols, the underline denoting a quantity “across every subsystem”.

$$\begin{aligned} \begin{array}{lclcl} \underline{u}(t) & = & (u_1(t)^T, \ldots , u_{n_{sys}}(t)^T)^T & \in & L([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[, \mathbb {R}^{n_{in,tot}}) \\ \underline{y}(t) & = & (y_1(t)^T, \ldots , y_{n_{sys}}(t)^T)^T & \in & L([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[, \mathbb {R}^{n_{out,tot}}) \end{array} \end{aligned}$$
(2)

To illustrate the notations introduced above, an example is given further in this paper, in Fig. 3.

Finally, a tilde symbol \(\tilde{}\) will be added to a functional quantity to represent an element of its co-domain. Exempli gratia, \(\underline{y}\in L([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[, \mathbb {R}^{n_{out,tot}})\), so we can use \(\tilde{\underline{y}}\) to represent an element of \(\mathbb {R}^{n_{out,tot}}\).

2.3 Extractors and rearrangement

To easily switch from global to local inputs, extractors are defined. For \(k\in [\![1, n_{sys}]\!]\), the extractor \(E_k^u\) is the matrix defined by (3).

$$\begin{aligned} E^u_k = \bigg ( \underbrace{\qquad 0\qquad }_{n_{in,k}\,\times\, \sum _{l=1}^{k-1}n_{in,l}} \ \Big |\ \underbrace{\Big ( I_{n_{in,k}} \Big )}_{n_{in,k}\,\times\, n_{in,k}}\ \Big |\ \underbrace{\qquad 0\qquad }_{n_{in,k}\,\times\, \sum _{l=k+1}^{n_{sys}}n_{in,l}} \bigg ) \end{aligned}$$
(3)

where \(\forall n\in {\mathbb {N}},\ I_n\) denotes the identity matrix of size n by n.

The extractors make it possible to extract the inputs of a given system from the global input vector, with a relation of the form \({\tilde{u}}_k = E^u_k \tilde{\underline{u}}\). We have: \(\forall k\in [\![1, n_{sys}]\!],\ E^u_k\in M_{n_{in,k}, n_{in,tot}}(\{0, 1\})\).

A rearrangement operator will also be needed to handle concatenations of outputs and output derivatives. For this purpose, we will use the rearrangement matrix \(R^{\underline{\smash {y}}\vphantom{y}}\in M_{n_{out,tot}, n_{out,tot}}(\{0, 1\})\) defined blockwise in (4).

$$\begin{aligned} R^{\underline{y}} = \left( R^{\underline{y}}_{K, L}\right) _{K, L\,\in\, [\![1,\ 2 n_{sys}]\!]} \quad \text {where}\quad R^{\underline{y}}_{K, L} = \left\{ \begin{array}{ll} I_{n_{out,K}} & \text {if}\ K \leqslant n_{sys}\ \text {and}\ L=2K-1 \\ I_{n_{out,K-n_{sys}}} & \text {if}\ K > n_{sys}\ \text {and}\ L=2(K-n_{sys}) \\ 0 & \text {otherwise} \end{array}\right. \end{aligned}$$
(4)

The \(R^{\underline{\smash {y}}\vphantom{y}}\) operator makes it possible to rearrange the outputs and output derivatives with a relation of the form (5).

$$\begin{aligned} \left( \begin{array}{c} \tilde{\underline{y}} \\ \tilde{\dot{\underline{y}}} \end{array}\right) = \left( \begin{array}{c} {\tilde{y}}_1 \\ \vdots \\ {\tilde{y}}_{n_{sys}} \\ \tilde{{\dot{y}}}_1 \\ \vdots \\ \tilde{{\dot{y}}}_{n_{sys}} \end{array}\right) = \underbrace{\left( \begin{array}{ccccccc} I_{n_{out,1}} & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & I_{n_{out,2}} & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & I_{n_{out,n_{sys}}} & 0 \\ 0 & I_{n_{out,1}} & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & I_{n_{out,2}} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & I_{n_{out,n_{sys}}} \end{array}\right) }_{R^{\underline{y}}} \left( \begin{array}{c} {\tilde{y}}_1 \\ \tilde{{\dot{y}}}_1 \\ \vdots \\ {\tilde{y}}_{n_{sys}} \\ \tilde{{\dot{y}}}_{n_{sys}} \end{array}\right) \end{aligned}$$
(5)

As explained in Sect. 2.2, a reduced number of inputs and outputs is advised. A high number of interface variables would produce large extractor and rearrangement matrices. However, the operations involving such matrices are not strongly impacted, as these matrices are never explicitly constructed in practice. Indeed, these operators help to define the mathematical formalism (namely the callback function, further introduced in Sect. 3.2), yet in practice the extractor operators can be applied while communicating, with a simple call to the \({\texttt {MPI\_Scatterv}}\) function, and the rearrangement operator can be applied while communicating, with a simple call to the \({\texttt {MPI\_Gatherv}}\) function (this workflow will be presented further in this paper and illustrated in Fig. 7). In the implementation, the \(E_k^u\) (for all k in \([\![1, n_{sys}]\!]\)) and \(R^{\underline{y}}\) matrices are thus never assembled.
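For illustration, here is a minimal sketch (in C, with illustrative names) of this input scattering: a single collective call applies all the \(E^u_k\) at once, the \({\texttt {in\_sizes}}\)/\({\texttt {in\_offsets}}\) arrays playing the role of the extractors.

```c
#include <mpi.h>

/* Sketch: applying the extractors E^u_k without assembling them.
 * Rank 0 (the orchestrator) holds the global vector u_tot of size
 * n_in_tot; worker k receives its n_in_k local inputs.
 * in_sizes[k] = n_in_k (with in_sizes[0] = 0 for the orchestrator)
 * and in_offsets[k] holds the corresponding displacement. */
void scatter_inputs(const double *u_tot, const int *in_sizes,
                    const int *in_offsets, double *u_local, int n_in_k,
                    MPI_Comm comm)
{
    /* One collective call replaces the n_sys products E^u_k u. */
    MPI_Scatterv(u_tot, in_sizes, in_offsets, MPI_DOUBLE,
                 u_local, n_in_k, MPI_DOUBLE, 0 /* orchestrator */, comm);
}
```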

2.4 Time discretization

In the context of co-simulation, the \(g_k\) and \(f_k\) functions in (1) are usually not available directly. Thus, several co-simulation steps, the so-called “macro-steps”, are made between \(t^{{\mathrm{init}}}\) and \(t^{{\mathrm{end}}}\). Let’s introduce the notations of the discrete version of the quantities introduced in Sect. 2.2.

A macro-step will be defined by its starting and ending times, the \(N^{\mathrm{th}}\) macro-step being denoted by \([t^{[N]}, t^{[N+1]}[\). The superscript \(^{[N]}\) will be written with square brackets to avoid confusion with power exponents (exempli gratia: \(t^2\)). The macro-steps define a partition of the time-domain, as described in (6) and Fig. 1.

$$\begin{aligned} \left\{ \begin{array}{l} [t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[ \ = \ \displaystyle {\bigcup _{N=0}^{N_{\max }-1}} [t^{[N]}, t^{[N+1]}[ \\ t^{[0]} = t^{{\mathrm{init}}}, \qquad t^{[N_{\max }]} = t^{{\mathrm{end}}} \\ \forall N \in [\![0, N_{\max }-1]\!],\ t^{[N+1]} > t^{[N]} \end{array} \right. \end{aligned}$$
(6)

Let \(\delta t^{[N]}\) denote the size of the \(N{\mathrm{th}}\) macro-step:

$$\begin{aligned} \left\{ \begin{array}{l} \forall N \in [\![0, N_{\max }-1]\!],\ \delta t^{[N]} = t^{[N+1]}-t^{[N]} > 0 \\ \displaystyle {\sum _{N=0}^{N_{\max }-1}} \delta t^{[N]} = t^{{\mathrm{end}}}- t^{{\mathrm{init}}}\end{array} \right. \end{aligned}$$
(7)

Let \({\mathbb {T}}\) denote the set of possible macro-steps.

$$\begin{aligned} {\mathbb {T}} \overset{\varDelta }{=} \{[a, b[\ |\ t^{{\mathrm{init}}}\leqslant a < b \leqslant t^{{\mathrm{end}}}\} \end{aligned}$$
(8)

An element of this set is a macro-step: for instance \(\tau \in {\mathbb {T}}\) with \(\tau = [t^{[N]}, t^{[N+1]}[\).

Fig. 1

Partition of the time domain in macro-steps

On a given macro-step \([t^{[N]}, t^{[N+1]}[\), \(N\in [\![0, N_{\max }-1]\!]\), for all systems, the restrictions of the piecewise equivalents of \(u_k\) and \(y_k\) will be denoted by \(u_k^{[N]}\) and \(y_k^{[N]}\), respectively. In case several iterations are made on the same step, we will refer to the functions by a left superscript index m. Finally, the coordinates of these vectors will be denoted by an extra subscript index.

$$\begin{aligned}&\forall k\in [\![1, n_{sys}]\!],\ \forall N\in [\![0, N_{\max }-1]\!],\ \forall m\in [\![0, m_{\max }(N)]\!], \nonumber \\&\forall j\in [\![1, n_{in,k}]\!],\quad ^{[m]}u_{k, j}^{[N]} \in L([t^{[N]}, t^{[N+1]}[, {\mathbb {R}}) \nonumber \\&\forall i\in [\![1, n_{out,k}]\!],\quad ^{[m]}y_{k, i}^{[N]} \in L([t^{[N]}, t^{[N+1]}[, {\mathbb {R}}) \end{aligned}$$
(9)

In (9), \(m_{\max }(N)\) denotes the number of iterations (minus one) done on the \(N^{\mathrm{th}}\) macro-step. \(m_{\max }(N)\) can be plotted across N in order to see where the method needs more or fewer iterations.

All derived notations introduced in this subsection can also be applied to the total input and output vectors.

$$\begin{aligned}&\forall N\in [\![0, N_{\max }-1]\!],\ \forall m\in [\![0, m_{\max }(N)]\!],\nonumber \\&\forall {\bar{\jmath }}\in [\![1, n_{in,tot}]\!],\quad ^{[m]}\underline{u}_{{\bar{\jmath }}}^{[N]} \in L([t^{[N]}, t^{[N+1]}[, {\mathbb {R}}) \nonumber \\&\forall {\bar{\imath }}\in [\![1, n_{out,tot}]\!],\quad ^{[m]}\underline{y}_{{\bar{\imath }}}^{[N]} \in L([t^{[N]}, t^{[N+1]}[, {\mathbb {R}}) \end{aligned}$$
(10)

Indices \({\bar{\imath }}\) and \({\bar{\jmath }}\) in (10) will be called global indices, as opposed to the local indices i and j in (9).

2.5 Step function

Let \(S_k,\ k\in [\![1, n_{sys}]\!]\) be the ideal step function of the \(k^{\mathrm{th}}\) system, that is to say the function which takes the system to its future state one macro-step forward.

$$\begin{aligned} S_k:\left\{ \begin{array}{lcl} {\mathbb {T}} \times L([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[, \mathbb {R}^{n_{in,k}}) \times \mathbb {R}^{n_{st,k}} & \rightarrow & \mathbb {R}^{n_{out,k}} \times \mathbb {R}^{n_{st,k}} \\ (\tau ,\ u_k,\ {\tilde{x}}) & \mapsto & S_k(\tau ,\ u_k,\ {\tilde{x}}) \end{array} \right. \end{aligned}$$
(11)

In practice, the state vector \({\tilde{x}}\) will not be made explicit. Indeed, it will be embedded inside system k, and successive calls will be done either:

  • with \(\tau\) beginning where the \(\tau\) at the previous call of \(S_k\) ended (moving on),

  • with \(\tau\) beginning where the \(\tau\) at the previous call of \(S_k\) started (step replay),

  • with \(\tau\) of the shape \([t^{{\mathrm{init}}}, t^{[1]}[\) with \(t^{[1]}\in ]t^{{\mathrm{init}}}, t^{{\mathrm{end}}}]\) (first step).

Moreover, the \(u_k\) argument only needs to be defined on the domain \(\tau\) (not necessarily on \([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}[\)). Thus, \(S_k\) will not be considered in the method; the \({\hat{S}}_k\) function (practical step function) defined hereafter will be considered instead. Although \({\hat{S}}_k\) is not properly defined mathematically (its domain depends on the value of one of its arguments, \(\tau\), and some quantities, the states, are hidden), this does not lead to any problem under the hypotheses above.

$$\begin{aligned} \begin{array}{c} {\hat{S}}_k:\left\{ \begin{array}{lcl} {\mathbb {T}} \times L(\tau , \mathbb {R}^{n_{in,k}}) & \rightarrow & \mathbb {R}^{n_{out,k}} \\ (\tau , u_k) & \mapsto & {\hat{S}}_k(\tau , u_k) \end{array} \right. \\ \text {satisfying} \\ {\hat{S}}_k([t^{[N]}, t^{[N+1]}[,\ ^{[m]}u_k^{[N]}) =\ ^{[m]}y_k^{[N]}(t^{[N+1]}) \end{array} \end{aligned}$$
(12)

The \({\hat{S}}_k\) function is the one available in practice, namely in the FMI (Functional Mock-up Interface) standard.

2.6 Extended step function

The values of the output variables might not be sufficient for every co-simulation scheme. This is notably the case for both classical IFOSMONDI and IFOSMONDI-JFM: the time-derivatives of the outputs are also needed.

Let \(\hat{{\hat{S}}}_k\) be the extension of \({\hat{S}}_k\) returning both the output values and derivatives.

$$\begin{aligned} \begin{array}{c} \hat{{\hat{S}}}_k:\left\{ \begin{array}{lcl} {\mathbb {T}} \times L(\tau , \mathbb {R}^{n_{in,k}}) & \rightarrow & \mathbb {R}^{n_{out,k}} \times \mathbb {R}^{n_{out,k}} \\ (\tau , u_k) & \mapsto & \hat{{\hat{S}}}_k(\tau , u_k) \end{array} \right. \\ \text {satisfying} \\ \hat{{\hat{S}}}_k([t^{[N]}, t^{[N+1]}[,\ ^{[m]}u_k^{[N]}) = \left( \begin{array}{c} ^{[m]}y_k^{[N]}(t^{[N+1]}) \\ \displaystyle {\frac{d\,^{[m]}y_k^{[N]}}{dt}}(t^{[N+1]}) \end{array} \right) \end{array} \end{aligned}$$
(13)

To evaluate \(\hat{{\hat{S}}}_k\), system k is integrated over the time-domain \(\tau\) (first argument) with the inputs given by the second argument (a vectorial function of time, of dimension \(n_{in,k}\)); the values and the derivatives of the outputs (of dimension \(n_{out,k}\)) are returned, evaluated at the time corresponding to the end of the first argument (\(\sup (\tau )\)). Figure 2 presents an example of this workflow.
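To make this workflow concrete, here is a minimal sketch of an extended step function for a toy scalar system \(\dot{x} = -x + u\), \(y = x\) (one input, one state, one output), with an explicit Euler micro-solver standing in for the embedded solver; all names are illustrative and not part of any standard.

```c
typedef double (*InputFct)(double t);   /* u(t) over the macro-step */

static double x_state = 0.0;            /* hidden state of the system */

void extended_step(double t_N, double t_Np1, InputFct u,
                   double *y_end, double *dy_end)
{
    const int n_micro = 100;                  /* micro-steps inside tau  */
    const double h = (t_Np1 - t_N) / n_micro;
    double x = x_state, t = t_N;

    for (int i = 0; i < n_micro; ++i) {       /* embedded (Euler) solver */
        x += h * (-x + u(t));
        t += h;
    }
    x_state = x;                              /* state moves to t_Np1    */
    *y_end  = x;                              /* y(t_Np1)                */
    *dy_end = -x + u(t_Np1);                  /* dy/dt(t_Np1) = f at end */
}
```

A real system would additionally save its state at \(t^{[N]}\), so that a step can be replayed with new inputs (the second case of the successive calls listed in Sect. 2.5).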

Fig. 2

Extended step function’s workflow visualization on an example where system k has 2 inputs (\(n_{in,k}=2\)), 2 states, and 2 outputs (\(n_{out,k}=2\))

2.7 Connections

The connections between the systems will be represented by a matrix \(\varPhi\) filled with zeros and ones, with \(n_{out,tot}\) rows and \(n_{in,tot}\) columns. Please note that if each output is connected to exactly one input, \(\varPhi\) is a square permutation matrix. If an output is connected to several inputs, more than one 1 appears on the corresponding row of \(\varPhi\). Without loss of generality, we consider that there is exactly one 1 on each column of \(\varPhi\), that is, an input can neither be connected to zero nor to several outputs. Indeed, an input connected to nothing is not possible (a value has to be given), and a connection of several outputs to the same input can always be decomposed through a relation (sum, difference, ...) so that the situation is equivalent to distinct inputs, each connected to a single output, combined (added, subtracted, ...) inside the considered system.

$$\begin{aligned} \begin{array}{l}\forall {\bar{\imath }}\in [\![1, n_{out,tot}]\!],\ \forall {\bar{\jmath }}\in [\![1, n_{in,tot}]\!],\\ \varPhi _{{\bar{\imath }}, {\bar{\jmath }}} = \left\{ \begin{array}{ll} 1 & \hbox {if output } {\bar{\imath }}\ \hbox {is connected to input } {\bar{\jmath }}\\ 0 & \text {otherwise} \end{array} \right. \end{array} \end{aligned}$$
(14)

An example of a connection matrix is presented in Fig. 3.

Fig. 3

Example of a 3-system co-simulation model with its interfaces and its \(\varPhi\) matrix

The dispatching will denote the stage where the inputs are generated from their connected outputs, using the connections represented by \(\varPhi\).

$$\begin{aligned} {\tilde{\underline{\smash {u}}\vphantom{u}}} = \varPhi ^T {\tilde{\underline{\smash {y}}\vphantom{y}}} \end{aligned}$$
(15)

Analogously to the extractor and rearrangement operators introduced in Sect. 2.3, the \(\varPhi\) matrix does not need to be explicitly constructed in practice. Indeed, the implementation only needs to know the connections to proceed with the dispatching (15), as in the sketch below.
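A minimal sketch of such a dispatching, assuming a hypothetical \({\texttt {conn}}\) array storing, for each input, the global index of its connected output (exactly one 1 per column of \(\varPhi\)); this matches the \({\texttt {dispatch}}\) helper assumed in Sect. 4.3:

```c
#include <stddef.h>

/* Sketch of the dispatching (15) without assembling Phi: conn[j] stores
 * the global index of the unique output connected to input j. */
void dispatch(const double *y_tot, const int *conn, size_t n_in_tot,
              double *u_tot)
{
    for (size_t j = 0; j < n_in_tot; ++j)
        u_tot[j] = y_tot[conn[j]];  /* u = Phi^T y: one lookup per input */
}
```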

The coupling function (16) will denote the absolute difference between corresponding connected variables in a total input vector and a total output vector. In other words, it represents the absolute error between a total input vector and the dispatching of a total output vector. The \(\lambda\) subscript does not correspond to any quantity; it is a simple notation inherited from a “Lagrange multipliers” approach of system coupling [14].

$$\begin{aligned} g_{\lambda }:\left\{ \begin{array}{lcl} \mathbb {R}^{n_{in,tot}}\times \mathbb {R}^{n_{out,tot}} & \rightarrow & \mathbb {R}^{n_{in,tot}}\\ (\tilde{\underline{u}}, \tilde{\underline{y}}) & \mapsto & |\tilde{\underline{u}} - \varPhi ^T \tilde{\underline{y}}| \end{array} \right. \end{aligned}$$
(16)

The coupling condition (17) is the situation where every output of the total output vector corresponds to its connected input in the total input vector.

$$\begin{aligned} g_{\lambda }({\tilde{\underline{\smash {u}}\vphantom{u}}}, {\tilde{\underline{\smash {y}}\vphantom{y}}}) = 0_{\mathbb {R}^{n_{in,tot}}} \end{aligned}$$
(17)

3 IFOSMONDI-JFM method

3.1 Modified extended step function

As in classical IFOSMONDI [6], the IFOSMONDI-JFM method preserves the \(C^1\) smoothness of the interface variables at the communication times \((t^{[N]})_{N\in [\![1, N_{\max }-1]\!]}\). Thus, when a time \(t^{[N]}\) has been reached, the input functions for every system will all satisfy the property (18) illustrated in Fig. 4.

$$\begin{aligned}&\forall k\in [\![1, n_{sys}]\!],\ \forall m\in [\![0, m_{\max }(N)]\!], \nonumber \\&\quad \left\{ \begin{array}{lcl} ^{[m]}u_k^{[N]}(t^{[N]}) & = & \ ^{[m_{\max }(N-1)]}u_k^{[N-1]}(t^{[N]}) \\ \displaystyle {\frac{d\,^{[m]}u_k^{[N]}}{dt}}(t^{[N]}) & = & \ \displaystyle {\frac{d\,^{[m_{\max }(N-1)]}u_k^{[N-1]}}{dt}}(t^{[N]}) \end{array}\right. \end{aligned}$$
(18)
Fig. 4

\(C^1\) smoothness constraints at the left of \(\tau\) for \(j^{\mathrm{th}}\) input of system k

The IFOSMONDI-JFM method also represents the inputs as polynomials of degree at most 3, in order to satisfy the smoothness condition (18) and to respect the imposed values and derivatives at \(t^{[N+1]}\) on every macro-step.

Knowing these constraints, it is possible to write a specification of the practical step function \(\hat{{\hat{S}}}_k\) in the IFOSMONDI-JFM case (also applicable in the classical IFOSMONDI method):

$$\begin{aligned} \zeta _k:\left\{ \begin{array}{lcl} {\mathbb {T}} \times \mathbb {R}^{n_{in,k}} \times \mathbb {R}^{n_{in,k}} & \rightarrow & \mathbb {R}^{n_{out,k}} \times \mathbb {R}^{n_{out,k}} \\ (\tau , {\tilde{u}}_k, \tilde{{\dot{u}}}_k) & \mapsto & \zeta _k(\tau , {\tilde{u}}_k, \tilde{{\dot{u}}}_k) \end{array} \right. \end{aligned}$$
(19)

where the three cases discussed in Sect. 2.5 have to be considered.

Once each of these cases has been detailed, Figs. 5 and 6 will show the succession of such cases.

3.1.1 Case 1: Moving on

In this case, the last call to \(\zeta _k\) was done with a \(\tau \in {\mathbb {T}}\) ending at the current \(t^{[N]}\). In other words, system k “reached” time \(t^{[N]}\). The inputs at this last call were \(^{[m_{\max }(N-1)]}u_k^{[N-1]}\).

To reproduce a behavior analogous to that of the classical IFOSMONDI method, the inputs \(^{[0]}u_k^{[N]}\) will be defined as the polynomial of degree at most 2 satisfying the three following constraints:

$$\begin{aligned} \begin{array}{lcl} ^{[0]}u_k^{[N]}(t^{[N]}) & = & ^{[m_{\max }(N-1)]}u_k^{[N-1]}(t^{[N]}) \\ \displaystyle {\frac{d\,^{[0]}u_k^{[N]}}{dt}}(t^{[N]}) & = & \displaystyle {\frac{d\,^{[m_{\max }(N-1)]}u_k^{[N-1]}}{dt}}(t^{[N]}) \\ ^{[0]}u_k^{[N]}(t^{[N+1]}) & = & ^{[m_{\max }(N-1)]}u_k^{[N-1]}(t^{[N]}) \end{array} \end{aligned}$$
(20)

The first two constraints guarantee the smoothness property (18), and the third one minimizes the risk of out-of-range values (as in the classical IFOSMONDI method). An explicit form of this polynomial is given below.
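For a single scalar input, writing \(a\) and \(b\) for the value and derivative imposed at \(t^{[N]}\) by the first two constraints, the unique polynomial of degree at most 2 satisfying (20) can be written explicitly (this closed form is not given above, but follows directly from the three constraints):

$$\begin{aligned} ^{[0]}u_{k,j}^{[N]}(t) = a + b\,\left( t - t^{[N]}\right) - \frac{b}{\delta t^{[N]}}\left( t - t^{[N]}\right) ^2 \end{aligned}$$

Indeed, its value at \(t^{[N]}\) is \(a\), its derivative there is \(b\), and its value at \(t^{[N+1]} = t^{[N]} + \delta t^{[N]}\) is \(a + b\,\delta t^{[N]} - b\,\delta t^{[N]} = a\), as required.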

In this case, \(\zeta _k\) in (19) is defined by the specification (21).

$$\begin{aligned} \zeta _k([t^{[N]}, t^{[N+1]}[, \cdot , \cdot ) = \hat{{\hat{S}}}_k\Big ([t^{[N]}, t^{[N+1]}[,\ \underbrace{^{[0]}u_k^{[N]}}_{\text {computed with (20)}}\Big ) \end{aligned}$$
(21)

The \(2^{\mathrm{nd}}\) and \(3^{\mathrm{rd}}\) arguments of \(\zeta _k\) are unused.

3.1.2 Case 2: Step replay

In this case, the last call to \(\zeta _k\) was done with a \(\tau \in {\mathbb {T}}\) starting at the current \(t^{[N]}\). In other words, the system did not manage to reach the ending time of the previous \(\tau\) (either because the method did not converge, because the step has been rejected, or for another reason).

Two particular subcases have to be considered here: either the step we are computing follows the previous one in the iterative method detailed after this section, or the previous iteration has been rejected and we are trying to re-integrate the step starting from the same time but with a smaller size \(\delta t^{[N]}\).

3.1.2.1 Subcase 2.1: Following a previous classical step

In this subcase, the last call of \(\zeta _k\) was done not only with the same starting time, but also with the same step ending time \(t^{[N+1]}\). The inputs at this last call were \(^{[m-1]}u_k^{[N]}\) with \(m\geqslant 1\), and satisfied the two conditions at \(t^{[N]}\) of (20).

The Jacobian-free iterative method will ask for given input values \({\tilde{u}}_k\) and time-derivatives \(\tilde{{\dot{u}}}_k\) that will be used as constraints at \(t^{[N+1]}\); thus, \(^{[m]}u_k^{[N]}\) will be defined as the polynomial of degree at most 3 satisfying the four constraints depicted in (22).

$$\begin{aligned} \begin{array}{lclcl} ^{[m]}u_k^{[N]}(t^{[N]}) & = & ^{[m_{\max }(N-1)]}u_k^{[N-1]}(t^{[N]}) & = & ^{[m-1]}u_k^{[N]}(t^{[N]}) \\ \displaystyle {\frac{d\,^{[m]}u_k^{[N]}}{dt}}(t^{[N]}) & = & \displaystyle {\frac{d\,^{[m_{\max }(N-1)]}u_k^{[N-1]}}{dt}}(t^{[N]}) & = & \displaystyle {\frac{d\,^{[m-1]}u_k^{[N]}}{dt}}(t^{[N]}) \\ ^{[m]}u_k^{[N]}(t^{[N+1]}) & = & {\tilde{u}}_k \\ \displaystyle {\frac{d\,^{[m]}u_k^{[N]}}{dt}}(t^{[N+1]}) & = & \tilde{{\dot{u}}}_k \end{array} \end{aligned}$$
(22)

The first two constraints ensure the smoothness property (18), and the third and fourth ones enable the iterative method to find the best values and derivatives satisfying the coupling condition. A sketch of the corresponding Hermite interpolation is given below.
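For a single scalar input, the coefficients of this polynomial follow from standard cubic Hermite interpolation; a minimal sketch in C (names are illustrative):

```c
/* Sketch: coefficients of the unique cubic satisfying (22), for one
 * scalar input. Writing s = t - t_N, the polynomial is
 * p(s) = c[0] + c[1] s + c[2] s^2 + c[3] s^3, and the constraints read
 * p(0) = u0, p'(0) = du0, p(dt) = u1, p'(dt) = du1. */
void hermite_cubic_coeffs(double dt, double u0, double du0,
                          double u1, double du1, double c[4])
{
    c[0] = u0;
    c[1] = du0;
    c[2] = (3.0 * (u1 - u0) - dt * (2.0 * du0 + du1)) / (dt * dt);
    c[3] = (2.0 * (u0 - u1) + dt * (du0 + du1)) / (dt * dt * dt);
}
```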

In this subcase, \(\zeta _k\) in (19) is defined by the specification (23).

$$\begin{aligned} \zeta _k([t^{[N]}, t^{[N+1]}[, {\tilde{u}}_k, \tilde{{\dot{u}}}_k) = \hat{{\hat{S}}}_k\Big ([t^{[N]}, t^{[N+1]}[,\ \underbrace{^{[m]}u_k^{[N]}}_{\text {computed with (22)}}\Big ) \end{aligned}$$
(23)
3.1.2.2 Subcase 2.2: Re-integrating a step starting from \(t^{[N]}\) with a different \(\delta t^{[N]}\) than at the previous call of \(\zeta _k\)

In this subcase, the current \(t^{[N+1]}\) is different from \(\sup {(\tau )}\), where \(\tau\) is the one used at the last call of \(\zeta _k\).

As this shows that a step rejection has just occurred, we will simply do the same as in case 1, as if we were moving on from \(t^{[N]}\). In other words, all calls to \(\zeta _k\) with \(\tau\) starting at \(t^{[N]}\) are “forgotten”.

Please note that \(^{[m_{\max }(N-1)]}u_k^{[N-1]}(t^{[N]})\) and \(\displaystyle {\frac{d\ ^{[m_{\max }(N-1)]}u_k^{[N-1]}}{dt}}(t^{[N]})\) can be retrieved using the values and derivatives constraints at \(t^{[N]}\) of the inputs at the last call of \(\zeta _k\) thanks to the smoothness constraint (18).

3.1.3 Case 3: First step

In this particular case, we will do the same as in the other cases, except that we will not impose any constraint on the time-derivative at \(t^{{\mathrm{init}}}\). That is to say:

  • at the first call of \(\zeta _k\), we have \(N=m=0\): we will only impose \(^{[0]}u_k^{[0]}(t^{{\mathrm{init}}}) =\ ^{[0]}u_k^{[0]}(t^{[1]}) = u_{k}^{\text {init}}\), so as to have a zero-order polynomial satisfying the (supposedly given) initial conditions \(u_{k}^{\text {init}}\),

  • at the other calls, case 2 will be used without considering the constraints on the derivatives at \(t^{{\mathrm{init}}}\) (this lowers the polynomial’s degree). For (22), the first condition becomes \(^{[m]}u_k^{[N]}(t^{{\mathrm{init}}}) = u_{k}^{\text {init}}\), the second one vanishes, and the third and fourth ones remain unchanged. For subcase 2.2, it can be considered that \(^{[m_{\max }(-1)]}u_k^{[-1]}(t^{{\mathrm{init}}}) = u_{k}^{\text {init}}\), and \(\displaystyle {\frac{d\,^{[m_{\max }(-1)]}u_k^{[-1]}}{dt}}(t^{{\mathrm{init}}})\) will not be needed as it is a time-derivative at \(t^{{\mathrm{init}}}\).

Finally, we have \(\zeta _k\) defined in every case, wrapping both the computation of the polynomial inputs and the integration done with \(\hat{{\hat{S}}}_k\).

The workflow consisting of the succession of the cases detailed above can be visualized in Fig. 5. An example on a single input of a single system is presented in Fig. 6, on 2 successive co-simulation steps. Squared numbers 1 to 6 denote the order of the successive input computations.

Fig. 5
figure 5

Workflow of the calibration of the inputs, visualization on a single given \(j{\mathrm{th}}\) input of a given system k, and algorithm’s tasks in transitions between cases. This figure does not represent the whole method: it only focuses on the inputs calibration

Fig. 6

Focus on a single input j of system k: on the co-simulation step \(\tau\), it can be seen that the constraints at the beginning of the steps come from the last iteration of the previous co-simulation step, and the constraints at the end of the steps come from the method, or are artificial (for the first iteration)

Until here, the polynomial input computation stage during an evaluation of \(\zeta _k\) for \(k\in [\![1, n_{sys}]\!]\) has been detailed for all the possible cases. However, the constraints at the end of the co-simulation steps have been described as “coming from the method”. Indeed, the JFM decides which constraints to use, as they are exactly the variables of the function to be brought to zero (the aforementioned callback function, see Sect. 3.2).

3.2 Iterative method’s callback function

The aim is to solve the co-simulation problem by using a Jacobian-free version of an iterative method that usually requires a Jacobian computation (see Sect. 2.1). Modern matrix-free versions of such algorithms make it possible to avoid perturbing the systems and re-integrating them for every input, as done in [14] to compute a finite-differences Jacobian matrix. This saves a lot of integrations over each macro-step, and thus a lot of time.

Nevertheless, on every considered macro-step \(\tau\), a function to be brought to zero has to be defined. This so-called JFM’s callback (standing for Jacobian-Free Method’s callback), presented hereafter, will be denoted by \(\gamma _{\tau }\). In zero-order hold co-simulation, this function is often \(\tilde{\underline{u}}-\varPhi ^T\tilde{\underline{y}}\) (or equivalent), where \(\tilde{\underline{y}}\) are the outputs at \(t^{[N+1]}\) generated by constant inputs \(\tilde{\underline{u}}\) over \([t^{[N]}, t^{[N+1]}[\).

In IFOSMONDI-JFM, only the inputs at \(t^{[N+1]}\) can be changed, the smoothness condition at \(t^{[N]}\) guaranteeing that the coupling condition (17) remains satisfied at \(t^{[N]}\) if it was satisfied before moving on to the step \([t^{[N]}, t^{[N+1]}[\). The time-derivatives are also considered, in order to maintain \(C^1\) smoothness, so the coupling condition (17) is also applied to these time-derivatives.

Finally, the formulation of the JFM’s callback for IFOSMONDI-JFM is given in (24).

$$\begin{aligned} \gamma _{\tau }:\left\{ \begin{array}{lcl} \mathbb {R}^{n_{in,tot}}\times \mathbb {R}^{n_{in,tot}} & \rightarrow & \mathbb {R}^{n_{in,tot}}\times \mathbb {R}^{n_{in,tot}}\\ \left( \begin{array}{c} \tilde{\underline{u}} \\ \tilde{\dot{\underline{u}}} \end{array}\right) & \mapsto & \left( \begin{array}{c} \tilde{\underline{u}} \\ \tilde{\dot{\underline{u}}} \end{array}\right) - \left( \begin{array}{cc} \varPhi ^T & 0 \\ 0 & \varPhi ^T \end{array}\right) R^{\underline{y}} \left( \begin{array}{c} \zeta _1\left( \tau , E^u_1 \tilde{\underline{u}}, E^u_1 \tilde{\dot{\underline{u}}}\right) \\ \vdots \\ \zeta _{n_{sys}}\left( \tau , E^u_{n_{sys}} \tilde{\underline{u}}, E^u_{n_{sys}} \tilde{\dot{\underline{u}}}\right) \end{array}\right) \end{array} \right. \end{aligned}$$
(24)

3.2.1 Link with the fixed-point implementation

The formulation (24) can be used to express the fixed-point function \(\varPsi _{\tau }\). The latter was introduced in the classical IFOSMONDI algorithm [6], where a fixed-point method was used instead of a JFM.

We can now rewrite a proper expression of \(\varPsi _{\tau }\) including the time-derivatives.

$$\begin{aligned} \varPsi _{\tau }:\left\{ \begin{array}{lcl} \mathbb {R}^{n_{in,tot}}\times \mathbb {R}^{n_{in,tot}} & \rightarrow & \mathbb {R}^{n_{in,tot}}\times \mathbb {R}^{n_{in,tot}}\\ \left( \begin{array}{c} \tilde{\underline{u}} \\ \tilde{\dot{\underline{u}}} \end{array}\right) & \mapsto & \left( \begin{array}{c} \tilde{\underline{u}} \\ \tilde{\dot{\underline{u}}} \end{array}\right) - \gamma _{\tau }\left( \begin{array}{c} \tilde{\underline{u}} \\ \tilde{\dot{\underline{u}}} \end{array}\right) = \left( \begin{array}{cc} \varPhi ^T & 0 \\ 0 & \varPhi ^T \end{array}\right) R^{\underline{y}} \left( \begin{array}{c} \zeta _1\left( \tau , E^u_1 \tilde{\underline{u}}, E^u_1 \tilde{\dot{\underline{u}}}\right) \\ \vdots \\ \zeta _{n_{sys}}\left( \tau , E^u_{n_{sys}} \tilde{\underline{u}}, E^u_{n_{sys}} \tilde{\dot{\underline{u}}}\right) \end{array}\right) \end{array} \right. \end{aligned}$$
(25)

\(\varPsi _{\tau }\) was referred to as \(\varPsi\) in [6] and did not include the derivatives in its formulation, yet the smoothness enhancement done by the Hermite interpolation led to an underlying use of these derivatives.

When the result of the \(m{\mathrm{th}}\) iteration is available, a fixed-point iteration on macro-step \(\tau =[t^{[N]}, t^{[N+1]}[\) is simply done by:

$$\begin{aligned} \left( \begin{array}{c} ^{[m+1]}{\tilde{\underline{\smash {u}}\vphantom{u}}} \\ ^{[m+1]}\tilde{{\dot{\underline{\smash {u}}\vphantom{u}}}} \end{array}\right) := \varPsi _{\tau }\left ( \left( \begin{array}{c} ^{[m]}{\tilde{\underline{\smash {u}}\vphantom{u}}} \\ ^{[m]}\tilde{{\dot{\underline{\smash {u}}\vphantom{u}}}} \end{array}\right) \right) \end{aligned}$$
(26)
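Expressed with the callback, one fixed-point iteration simply maps \(z\) to \(z - \gamma _{\tau }(z)\); a minimal sketch, where \({\texttt {gamma\_tau}}\) is a hypothetical wrapper evaluating (24) on the current macro-step:

```c
#include <stdlib.h>

/* One fixed-point iteration (26): z_next = Psi_tau(z) = z - gamma_tau(z),
 * where z stacks the values and the derivatives (n = 2 * n_in_tot). */
void fixed_point_iteration(size_t n, const double *z, double *z_next,
                           void (*gamma_tau)(size_t, const double *, double *))
{
    double *g = malloc(n * sizeof *g);
    gamma_tau(n, z, g);              /* g = gamma_tau(z)                   */
    for (size_t i = 0; i < n; ++i)
        z_next[i] = z[i] - g[i];     /* per (25): Psi_tau(z) = z - gamma(z) */
    free(g);
}
```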

3.3 First and last integrations of a step

The first iteration of a given macro-step \(\tau \in {\mathbb {T}}\) is a particular case to be taken into account. Considering the breakdown presented in Sect. 2.5, this corresponds to case 1, case 2 subcase 2.2, case 3 first bullet point, and case 3 second bullet point when falling into subcase 2.2.

All these cases have something in common: they denote calls to \(\zeta _k\) with a \(\tau\) argument that has never been used in a previous call of \(\zeta _k\). In these cases, the latter function is defined by (21).

For this reason, the first call of \(\gamma _{\tau }\) for a given macro-step \(\tau\) is made before the JFM is applied. Then, every time the JFM calls \(\gamma _{\tau }\), the \((\zeta _k)_{k\in [\![1, n_{sys}]\!]}\) functions called by \(\gamma _{\tau }\) behave in the same way.

Once the JFM ends, if it has converged, a last call to \(\gamma _{\tau }\) is made with the solution \(\big ((^{[m_{\max }(N)]}\tilde{\underline{u}}^{[N]})^T,\ (^{[m_{\max }(N)]}\tilde{\dot{\underline{u}}}^{[N]})^T\big )^T\) so that the systems are in the correct state for the next step (as explained in Sect. 2.5, the state of a system is hidden but affected by each call to a step function).

3.4 Step size control

The step size control is defined with the same rule of thumb as in [6]. The adaptation is not done on an error-based criterion such as in [13], but with a predefined rule based on whether the iterative method converges or not.

A minimal step size \(\delta t_{{\mathrm{min}}}\in \mathbb {R_*^+}\), a maximal step size \(\delta t_{{\mathrm{max}}}\in \mathbb {R_*^+}\) and an initial step size \(\delta t_{{\mathrm{init}}}\in [\delta t_{{\mathrm{min}}}, \delta t_{{\mathrm{max}}}]\) are defined for any simulation with IFOSMONDI-JFM method. At certain times (the communication times), the method will be allowed to reduce this step to help the convergence of the JFM.

The convergence criterion for the iterative method is defined by the rule (27).

$$\begin{aligned} \begin{array}{l} \text {Given}\ (\varepsilon _{\text {abs}}, \varepsilon _{\text {rel}})\in (\mathbb {R_+^*})^2, \\ \text {convergence is reached when} \\ \left| \gamma _{\tau } \left( \begin{array}{c} {\tilde{\underline{\smash {u}}\vphantom{u}}} \\ \tilde{{\dot{\underline{\smash {u}}\vphantom{u}}}} \end{array} \right) \right| < \left| \left( \begin{array}{c} {\tilde{\underline{\smash {u}}\vphantom{u}}} \\ \tilde{{\dot{\underline{\smash {u}}\vphantom{u}}}} \end{array} \right) \right| \varepsilon _{\text {rel}} + \left| \left( \begin{array}{c} 1 \\ \vdots \\ 1 \end{array} \right) \right| \varepsilon _{\text {abs}} \end{array} \end{aligned}$$
(27)

When the iterative method does not converge on the step \([t^{[N]}, t^{[N+1]}[\), either because a maximum number of iterations is reached or for any other reason (the line search does not converge, an internal Krylov method finds a singular matrix, ...), the step is rejected and retried on its half (28), without going below \(\delta t_{{\mathrm{min}}}\). Otherwise, once the method has converged on \([t^{[N]}, t^{[N+1]}[\), the next integration step \(\tau\) tries to increase the size by \(30\%\), without exceeding \(\delta t_{{\mathrm{max}}}\).

Once the iterative method exits on \(\tau _{\text {old}}\), the next step \(\tau _{\text {new}}\) is defined by expression (28).

$$\begin{aligned} \tau _{\text {new}} = \left\{ \begin{array}{ll} \bigg [ \sup (\tau _{\text {old}}),\ \min \Big \{t^{{\mathrm{end}}},\ \sup (\tau _{\text {old}}) + \min \big \{\delta t_{{\mathrm{max}}},\ 1.3\,\big (\sup (\tau _{\text {old}})-\inf (\tau _{\text {old}})\big )\big \} \Big \}\bigg [ & \text {if convergence (27) was reached} \\ \bigg [\inf (\tau _{\text {old}}),\ \inf (\tau _{\text {old}}) + \max \Big \{\delta t_{{\mathrm{min}}},\ \displaystyle {\frac{\sup (\tau _{\text {old}})-\inf (\tau _{\text {old}})}{2}}\Big \}\bigg [ & \text {otherwise (divergence)} \end{array} \right. \end{aligned}$$
(28)

When \(\varepsilon _{\text {abs}} = \varepsilon _{\text {rel}}\), these values will be denoted by \(\varepsilon\).

When \(\delta t_{{\mathrm{max}}}= \delta t_{{\mathrm{init}}}\), these values will be denoted by \(\delta t_{{\mathrm{ref}}}\).

When the step size cannot be reduced anymore because \(\delta t_{{\mathrm{min}}}\) has been reached, the co-simulation stops with an error. One can retry with a smaller \(\delta t_{{\mathrm{min}}}\), or with \(\delta t_{{\mathrm{min}}}= 0\). A sketch of the step-size part of this rule is given below.
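For the step size itself, rule (28) reduces to a few lines; a minimal sketch with illustrative names (the starting point of \(\tau _{\text {new}}\), \(\sup (\tau _{\text {old}})\) or \(\inf (\tau _{\text {old}})\), is chosen by the caller according to the convergence):

```c
/* Sketch of the step-size part of rule (28): on convergence, the next
 * step grows by 30 % (capped at dt_max); on divergence, the same step
 * is retried with half the size (floored at dt_min). */
double next_step_size(double dt_old, int converged,
                      double dt_min, double dt_max)
{
    if (converged)
        return (1.3 * dt_old > dt_max) ? dt_max : 1.3 * dt_old;
    return (0.5 * dt_old < dt_min) ? dt_min : 0.5 * dt_old;
}
```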

4 Note on the implementation

Our implementation is based on an orchestrator-worker architecture involving \(n_{sys}+1\) processes. One of them, the orchestrator, is dedicated to global manipulations: it is not responsible for any system and only deals with global quantities (such as the time, the step \(\tau\), the \(\tilde{\underline{u}}\) and \(\tilde{\underline{y}}\) vectors and the corresponding time-derivatives, and so on). The \(n_{sys}\) remaining processes, the workers, are responsible for one system each. They only deal with local quantities related to the system they are responsible for.

4.1 Parallel evaluation of \(\gamma _{\tau }\) using MPI

An evaluation of \(\gamma _{\tau }\) consists of evaluations of the \(n_{sys}\) functions \((\zeta _k)_{k\in [\![1, n_{sys}]\!]}\), plus some manipulations of vectors and matrices (24). An evaluation of a single \(\zeta _k\) for a given \(k\in [\![1, n_{sys}]\!]\) consists of polynomial computations and an integration (21), (23) through a call to the corresponding \(\hat{{\hat{S}}}_k\) function (13).

A single call to \(\gamma _{\tau }\) can be evaluated in parallel by \(n_{sys}\) processes, each of them carrying out the integration of one of the systems. To achieve this, the MPI standard (standing for Message Passing Interface) has been used, as it provides routines to handle multi-process communications of data.

As the \(k^{\mathrm{th}}\) system only needs \(E^u_k \tilde{\underline{u}}\) and \(E^u_k \tilde{\dot{\underline{u}}}\) (see (3)) among \(\tilde{\underline{u}}\) and \(\tilde{\dot{\underline{u}}}\), the data can be sent in an optimized manner from the orchestrator process to the \(n_{sys}\) workers by using the \({\texttt {MPI\_Scatterv}}\) routine.

Analogously, each worker process has to communicate its contribution to both the outputs and their derivatives (assembling the block vector on the right of expression (24)). This can be done using the \({\texttt {MPI\_Gatherv}}\) routine.

Finally, global quantities such as \(\tau\), m, status notifications and so on can easily be communicated thanks to the \({\texttt {MPI\_Bcast}}\) routine.

In all cases, the communications are organized in a “bus” architecture (all workers communicate with the orchestrator, but not with one another). With synchronization points before and after each evaluation of all the \(\zeta _k\) functions for all k in \([\![1, n_{sys}]\!]\) (in a single call of \(\gamma _{\tau }\)), a point-to-point architecture would generate, in the worst case (when every system is connected with every other system), \(n_{sys}(n_{sys}-1)\) communications for every input/output dispatching or gathering, whereas only \(2\,n_{sys}\) communications are needed in a bus architecture, with the same total amount of exchanged data. Thus, our code uses the bus architecture.

4.2 Using PETSc for the JFM

PETSc [2, 3] is a library used for parallel numerical computations. For this paper, the matrix-free versions of the Newton method and of its variants implemented in PETSc were very attractive. Indeed, the flexibility of this library at runtime enables the use of command-line arguments to control the resolution: \({\texttt {-snes\_mf}}\) orders the use of a matrix-free non-linear solver; \({\texttt {-snes\_type newtonls}}\), \({\texttt {anderson}}\) [1] and \({\texttt {ngmres}}\) [10] select various solving methods that can be used as JFMs; \({\texttt {-snes\_atol}}\), \({\texttt {-snes\_rtol}}\) and \({\texttt {-snes\_max\_it}}\) control the convergence criterion; \({\texttt {-snes\_converged\_reason}}\), \({\texttt {-snes\_monitor}}\) and \({\texttt {-log\_view}}\) produce information and statistics about the run; and so on.

This subsection proposes a solution to use these PETSc implementations in a manner that is compliant with the parallel evaluation of the JFM’s callback (24). This implementation has been used to generate the results of Sect. 5.

First of all, PETSc needs a view of the environment the code runs in: the processes and their relationships. In our case, the \(n_{sys}+1\) processes of the orchestrator-worker architecture are not all dedicated to the JFM. Thus, PETSc runs on the orchestrator process only. In terms of code, this is done by creating PETSc objects referring to the \({\texttt {PETSC\_COMM\_SELF}}\) communicator on the orchestrator process, and creating no PETSc objects on the workers.

The callback \(\gamma _{\tau }\) internally implements the communications with the workers, and is given to the PETSc \({\texttt {SNES}}\) object. The \({\texttt {SNES}}\) non-linear solver calls this callback blindly, and the workers are triggered behind the scenes for integrations, preceded by the communication of the \((\tilde{\underline{u}}^T,\ \tilde{\dot{\underline{u}}}^T)^T\) values asked for by the \({\texttt {SNES}}\), and followed by the gathering of the outputs and their derivatives. The latter are finally returned to PETSc by the callback on the orchestrator side, after being reordered and dispatched as in (24).
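As an illustration, here is a minimal sketch of the corresponding setup on the orchestrator, using the PETSc API (recent versions; error handling reduced to \({\texttt {PetscCall}}\)); \({\texttt {JFM\_callback}}\) refers to the snippets of Sect. 4.3, and the remaining names are assumptions:

```c
#include <petscsnes.h>
#include <stddef.h>

/* JFM_callback is sketched in Snippet 1 (Sect. 4.3); ctx points to the
 * context structure it expects. */
extern PetscErrorCode JFM_callback(SNES, Vec, Vec, void *);

PetscErrorCode solve_macro_step(size_t n_in_tot, void *ctx)
{
    SNES snes;
    Vec z, r;
    PetscInt n = (PetscInt)(2 * n_in_tot);       /* (u~, du~) stacked    */

    PetscCall(VecCreateSeq(PETSC_COMM_SELF, n, &z)); /* orchestrator only */
    PetscCall(VecZeroEntries(z));     /* replace with the actual first guess */
    PetscCall(VecDuplicate(z, &r));               /* holds gamma_tau(z)   */
    PetscCall(SNESCreate(PETSC_COMM_SELF, &snes));
    PetscCall(SNESSetFunction(snes, r, JFM_callback, ctx));
    PetscCall(SNESSetFromOptions(snes)); /* honors -snes_mf, -snes_type, ... */
    PetscCall(SNESSolve(snes, NULL, z)); /* calls the callback blindly    */

    PetscCall(VecDestroy(&z));
    PetscCall(VecDestroy(&r));
    PetscCall(SNESDestroy(&snes));
    return 0;
}
```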

4.3 JFM’s callback implementation

In this section, a suggested implementation of the \(\gamma _{\tau }\) function is proposed, both on the orchestrator side and on the workers side. Details about the variables used in the snippets are given below them.

By convention, the process of rank 0 is the orchestrator, and any process of rank \(k\in [\![1, n_{sys}]\!]\) is responsible for system k.

Snippet 1
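The original snippet is an image in the source; the following C sketch is a reconstruction consistent with the description given below. \({\texttt {MyCtxType}}\), \({\texttt {dispatch}}\) and the \({\texttt {DO\_A\_STEP}}\) order follow the text; everything else (and a PETSc build with real scalars) is an assumption.

```c
#include <petscsnes.h>
#include <mpi.h>

enum Order { DO_A_STEP, STOP };

typedef struct {
    double t_N, t_Np1;             /* boundaries of tau                    */
    size_t n_in_tot, n_sys;
    double *double_u_and_du;       /* (u~, du~) stacked, 2*n_in_tot        */
    int *in_sizes, *in_offsets;    /* n_in_k and displacements, k=0..n_sys */
    int *out_sizes, *out_offsets;  /* same for the outputs                 */
    double *work1_n_out_tot, *work2_n_out_tot; /* gathered y~ and dy~      */
    double *double_res;            /* dispatched outputs, 2*n_in_tot       */
    const int *connections;        /* sparse representation of Phi^T       */
} MyCtxType;

void dispatch(const double *y, const int *conn, size_t n, double *u);

PetscErrorCode JFM_callback(SNES snes, Vec x, Vec f, void *ctx_)
{
    MyCtxType *ctx = (MyCtxType *)ctx_;
    const PetscScalar *xa;
    PetscScalar *fa;
    int order = DO_A_STEP;
    const size_t n = ctx->n_in_tot;

    PetscCall(VecGetArrayRead(x, &xa));
    for (size_t i = 0; i < 2 * n; ++i) ctx->double_u_and_du[i] = xa[i];
    PetscCall(VecRestoreArrayRead(x, &xa));

    /* Wake the workers up, broadcast tau, then scatter u~ and du~ (E^u_k). */
    MPI_Bcast(&order, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&ctx->t_N, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(&ctx->t_Np1, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatterv(ctx->double_u_and_du, ctx->in_sizes, ctx->in_offsets,
                 MPI_DOUBLE, NULL, 0, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatterv(ctx->double_u_and_du + n, ctx->in_sizes, ctx->in_offsets,
                 MPI_DOUBLE, NULL, 0, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Workers integrate; gather y~ and dy~ (the gather order plays the
     * role of the rearrangement R^y). */
    MPI_Gatherv(NULL, 0, MPI_DOUBLE, ctx->work1_n_out_tot, ctx->out_sizes,
                ctx->out_offsets, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Gatherv(NULL, 0, MPI_DOUBLE, ctx->work2_n_out_tot, ctx->out_sizes,
                ctx->out_offsets, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* gamma_tau(u~, du~) = (u~, du~) - dispatched (y~, dy~), as in (24). */
    dispatch(ctx->work1_n_out_tot, ctx->connections, n, ctx->double_res);
    dispatch(ctx->work2_n_out_tot, ctx->connections, n, ctx->double_res + n);
    PetscCall(VecGetArray(f, &fa));
    for (size_t i = 0; i < 2 * n; ++i)
        fa[i] = ctx->double_u_and_du[i] - ctx->double_res[i];
    PetscCall(VecRestoreArray(f, &fa));
    return 0;
}
```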

On the worker’s side, the corresponding running code section is the one in Snippet 2.

Snippet 2
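As for Snippet 1, the following C sketch of the worker side is a reconstruction, not the original code; the \({\texttt {zeta\_k}}\) signature is an assumption wrapping the polynomial input calibration and the integration (Sect. 3.1).

```c
#include <mpi.h>
#include <stdlib.h>

enum Order { DO_A_STEP, STOP };  /* must match Snippet 1 */

/* Hypothetical wrapper: calibrates the polynomial inputs from the
 * constraints at t_Np1, integrates over [t_N, t_Np1[ and returns the
 * outputs and their derivatives at t_Np1. */
void zeta_k(double t_N, double t_Np1, const double *u_k, const double *du_k,
            double *y_k, double *dy_k);

void worker_loop(int n_in_k, int n_out_k)
{
    double t_N, t_Np1;
    double *u_k  = malloc(n_in_k  * sizeof *u_k);
    double *du_k = malloc(n_in_k  * sizeof *du_k);
    double *y_k  = malloc(n_out_k * sizeof *y_k);
    double *dy_k = malloc(n_out_k * sizeof *dy_k);
    int order;

    for (;;) {
        MPI_Bcast(&order, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (order != DO_A_STEP) break;  /* no more callback calls */
        MPI_Bcast(&t_N, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(&t_Np1, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Receive only this system's constraints at t_Np1 (E^u_k). */
        MPI_Scatterv(NULL, NULL, NULL, MPI_DOUBLE,
                     u_k, n_in_k, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Scatterv(NULL, NULL, NULL, MPI_DOUBLE,
                     du_k, n_in_k, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        zeta_k(t_N, t_Np1, u_k, du_k, y_k, dy_k);

        /* Contribute to y~ and dy~ on the orchestrator. */
        MPI_Gatherv(y_k,  n_out_k, MPI_DOUBLE,
                    NULL, NULL, NULL, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Gatherv(dy_k, n_out_k, MPI_DOUBLE,
                    NULL, NULL, NULL, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }
    free(u_k); free(du_k); free(y_k); free(dy_k);
}
```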

The aim is not to show the exact code that has been used to generate the results of Sect. 5, but to show how to combine PETSc and the MPI standard (PETSc being based on MPI) to implement a parallel evaluation of \(\gamma _{\tau }\).

In code snippet 1, the function \({\texttt {JFM\_callback}}\) is the one given to the PETSc \({\texttt {SNES}}\) object with \({\texttt {SNESSetFunction}}\). The context pointer \({\texttt {ctx}}\) can be anything giving access to extra data inside this callback. The principle is the following: when \({\texttt {SNESSolve}}\) is called, the callback function that has been given to the \({\texttt {SNES}}\) object is called an unknown number of times. For this example, we suggest a context structure \({\texttt {MyCtxType}}\) at least containing:

  • \({\texttt {t\_N}}\), \({\texttt {t\_Np1}}\) the boundary times of \(\tau\), id est \(t^{[N]}\) and \(t^{[N+1]}\) (as \({\texttt {double}}\) each),

  • \({\texttt {n\_in\_tot}}\) the total number of inputs \(n_{in,tot}\) (as \({\texttt {size\_t}}\)),

  • \({\texttt {double\_u\_and\_du}}\) an array dedicated to the storage of \(({\tilde{\underline{\smash {u}}\vphantom{u}}}^T, \tilde{{\dot{\underline{\smash {u}}\vphantom{u}}}}^T)^T\) (as \({\texttt {double*}}\)),

  • \({\texttt {in\_sizes}}\) the array containing the number of inputs for each process \((n_{in,k})_{k\in [\![0, n_{sys}]\!]}\) including process 0 (with the convention \(n_{in,0}=0\)) (as \({\texttt {int*}}\)),

  • \({\texttt {in\_offsets}}\) the memory displacements \(\left( \sum _{l=1}^{k-1}n_{in,l}\right) _{k\in [\![0, n_{sys}]\!]}\) for the scattering of the inputs to each process (as \({\texttt {int*}}\)),

  • \({\texttt {work1\_n\_out\_tot}}\) and \({\texttt {work2\_n\_out\_tot}}\) two arrays of size \(n_{out,tot}\) for temporary storage (as \({\texttt {double*}}\)),

  • \({\texttt {out\_sizes}}\) and \({\texttt {out\_offsets}}\) two arrays analogous to \({\texttt {in\_sizes}}\) and \({\texttt {in\_offsets}}\) respectively, considering the outputs,

  • \({\texttt {n\_sys}}\) the total number of systems \(n_{sys}\) (as \({\texttt {size\_t}}\)),

  • \({\texttt {double\_res}}\) an array of size \(2\ n_{in,tot}\) dedicated to the storage of the result of \(\gamma _{\tau }\) (as \({\texttt {double*}}\)), and

  • \({\texttt {connections}}\) any structure to represent the connections between the systems \(\varPhi ^T\) (a full matrix might be a bad idea as \(\varPhi\) is expected to be very sparse).

The function \({\texttt {dispatch}}\) is expected to perform the dispatching (15) of the values given as its first argument into the array pointed to by its fourth argument.

Please note that the orchestrator process has to explicitly send an order different from \({\texttt {DO\_A\_STEP}}\) (with \({\texttt {MPI\_Bcast}}\)) to notify the workers that the callback will not be called anymore. Nonetheless, this order might not be sent right after the call to \({\texttt {SNESSolve}}\) on the orchestrator side. Indeed, if the procedure converged, a last call has to be made explicitly by the orchestrator (see Sect. 3.3).

Another call to \({\texttt {JFM\_callback}}\) should also be made explicitly on the orchestrator side before the call to \({\texttt {SNESSolve}}\) (as also explained in Sect. 3.3).

Figure 7 presents a schematic view of these two snippets running in parallel.

Fig. 7

Workflow of the callback function called by SNESSolve: example with \(n_{sys}=2\) (external first call to the callback is supposed to be already made before \({\texttt {SNESSolve}}\) is called)

5 Results on test cases

Two test cases will be treated here. The first one is a simple case that makes it possible to understand the kind of configurations that really benefit from the IFOSMONDI-JFM method (id est, when the function of the fixed-point formulation is not contractive), and the second one is an industrial-scale model with 148 interface variables that allows the comparison of classical IFOSMONDI (based on the fixed-point method), IFOSMONDI-JFM and the natural explicit ZOH co-simulation method in terms of the time/accuracy trade-off.

5.1 Mechanical model with multiple feed-through

Difficulties may appear in a co-simulation problem when the coupling is not straightforward. Some of the most difficult cases to solve are the algebraic coupling (addressed in [8]) arising from causal conflicts, and the multiple feed-through, id est, the case where the outputs of a system depend linearly on its inputs and the connected system(s) have the same behavior. In some cases, this may lead to a non-contractive \(\varPsi _{\tau }\) function.

This section presents a test case we designed that belongs to this second category. The fixed-point convergence can be accurately analyzed, so that its limitations are highlighted.

Please note that this test case is intentionally simple, in order to easily highlight the enhancements brought by the IFOSMONDI-JFM method compared to the fixed-point IFOSMONDI method. Although very simple, this example makes it possible to understand the convergence properties of the proposed JFM, as the latter is not hindered by the non-contractivity of \(\varPsi _{\tau }\) (contrary to an underlying fixed-point method, as in classical IFOSMONDI).

5.1.1 Test case presentation

The test case has been modeled, parameterized and simulated with the Simcenter Amesim software, a 0D modeling and simulation software developed by Siemens Industry Software. The co-simulations have been run with our code (implementing the fixed-point IFOSMONDI and IFOSMONDI-JFM algorithms), coupled with the systems modeled in Simcenter Amesim for the underlying \(\hat{{\hat{S}}}_k\) evaluations (see Fig. 2) within the \(\zeta _k\) evaluations (the polynomial input computation stage of \(\zeta _k\) happens in our code).

Fig. 8

Mass spring damper with damping reaction modelled with Simcenter Amesim - Parameters are above, variables are below

Figure 8 represents a 1-mass test case with a classical mechanical coupling on force, velocity and position. These coupling quantities are respectively denoted by \(f_c\), \(v_c\) and \(x_c\). The component on the right represents a damper with a massless plate, computing a velocity (and integrating it to compute a displacement) by reaction to a force input.

We propose the parameter values in Table 1.

Table 1 Parameters and initial values of the test case model

All variables will be denoted by either f, v or x (corresponding to forces, velocities and positions, respectively), with an index specifying their role in the model (see Fig. 8).

The predefined force \(f_L\) is a \(C^{\infty }\) function starting from 5 N and definitively reaching 0 N at \(t=2\) s. Its expression is (29), and it is plotted in Fig. 9.

$$\begin{aligned} f_L:\left\{ \begin{array}{lcl} [0, 10] & \rightarrow & [0, 5] \\ t & \mapsto & \left\{ \begin{array}{ll} \displaystyle {\frac{5}{e^{-1}}}\, e^{\left( \left( \frac{t}{2}\right) ^2 - 1\right) ^{-1}} & \text {if}\ t < 2 \\ 0 & \text {if}\ t \geqslant 2 \end{array} \right. \end{array} \right. \end{aligned}$$
(29)
Fig. 9

Predefined force \(f_L\)

The expected behavior of the model is presented in Table 2, referring to the conventional directions of Fig. 10.

Fig. 10

Test model visualized with Simcenter Amesim

Table 2 Main stages of a simulation of the test case model

The behavior presented in Table 2 might slightly change as parameter \(D_D\) changes (all other parameters being fixed, see Table 1).

5.1.2 Equations and eigenvalues of the fixed-point callback \(\varPsi _{\tau }\)

The displacement of the mass \(M_L\) is due to the difference between the forces applied on its left side (\(f_L\), generated by a force source, cf. Fig. 9) and on its right side (\(f_{SD}\), resulting from the spring compression/dilatation and the damper effect). This movement can be computed using the acceleration of the mass. Indeed, Newton’s second law gives:

$$\begin{aligned} \begin{array}{lcl} {\dot{v}}_L & = & (f_L + f_{SD})\, M_L^{-1} \\ {\dot{x}}_L & = & v_L \end{array} \end{aligned}$$
(30)

and the spring and damper forces can be expressed as follows:

$$\begin{aligned} \begin{array}{lcl} f_{SD} & = & K_{SD} ( x_C - x_L ) + D_{SD} (v_C - v_L) \\ f_C & = & - f_{SD} \\ f_D & = & - D_D ( 0 - v_C ) \\ f_C & = & f_D \\ v_C & = & \nicefrac{f_C}{D_D} \end{array} \end{aligned}$$
(31)

leading to the expression (32) of the coupled systems.

$$\begin{aligned} \begin{array}{l} (S_1):\left\{ \begin{array}{ccl} \left( \begin{array}{c} {\dot{v}}_L \\ {\dot{x}}_L \end{array}\right) & = & \left( \begin{array}{cc} \frac{-D_{SD}}{M_L} & \frac{-K_{SD}}{M_L} \\ 1 & 0 \end{array}\right) \left( \begin{array}{c} v_L \\ x_L \end{array}\right) + \left( \begin{array}{cc} \frac{D_{SD}}{M_L} & \frac{K_{SD}}{M_L} \\ 0 & 0 \end{array}\right) \left( \begin{array}{c} v_C \\ x_C \end{array}\right) + \left( \begin{array}{c} \frac{f_L}{M_L} \\ 0 \end{array}\right) \\ f_C & = & \left( D_{SD}\ \ K_{SD}\right) \left( \begin{array}{c} v_L \\ x_L \end{array}\right) +\left( -D_{SD}\ \ -K_{SD}\right) \left( \begin{array}{c} v_C \\ x_C \end{array}\right) \end{array} \right. \\ (S_2):\left\{ \begin{array}{ccl} {\dot{x}}_D & = & 0\, x_D + \frac{1}{D_D}\, f_C \\ \left( \begin{array}{c} v_C\\ x_C \end{array}\right) & = & \left( \begin{array}{c} 0 \\ 1 \end{array}\right) x_D + \left( \begin{array}{c} \frac{1}{D_D} \\ 0 \end{array}\right) f_C \end{array} \right. \end{array} \end{aligned}$$
(32)

At a given time t, we can state the Jacobian of \(\varPsi _{\tau }\) introduced in (25), using the expressions of the coupling quantities (32). Indeed, thanks to the definitions of the \(\zeta _k\), the output variables obtained from a call are evaluated at the same time as the one at which the imposed inputs are reached (the end of the macro-step).

(33)

The framed zeros are “by-design” zeros: indeed, a system never produces outputs depending on the inputs given to the other systems. The block called “Block” in (33) depends on the method used to retrieve the time-derivatives of the coupling quantities (see (13) and its finite-differences version). Nevertheless, this block does not change the eigenvalues of \(J_{\varPsi _{\tau }}\), as the matrix is block-triangular: the characteristic polynomial \(\det (\lambda I_6 - J_{\varPsi _{\tau }})\) is the product of the determinants of the two \(3\times 3\) diagonal blocks of \(\lambda I_6 - J_{\varPsi _{\tau }}\). The eigenvalues of \(J_{\varPsi _{\tau }}\) are:

$$\begin{aligned} 0,\quad +i\sqrt{\frac{D_{SD}}{D_D}},\quad -i\sqrt{\frac{D_{SD}}{D_D}} \qquad \text {(each with a multiplicity of 2)} \end{aligned}$$
(34)

Hence, the following relation between the parameters and the spectral radius can be shown (given \(D_D > 0\) and \(D_{SD} = 1 > 0\)):

$$\begin{aligned} \varrho \left( J_{\varPsi _{\tau }}\right) \left\{ \begin{array}{ll} < 1 & \text {if}\ D_{SD} < D_D \\ \geqslant 1 & \text {if}\ D_{SD} \geqslant D_D \end{array} \right. \end{aligned}$$
(35)

We can thus expect that the IFOSMONDI co-simulation algorithm based on a fixed-point method [6] will not converge on this model when the damping ratio of the component on the right of the model (see Fig. 8) is smaller than that of the spring-damper component.
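Relations (34) and (35) make the difficulty of this test case directly quantifiable. A minimal check in Python, reproducing the spectral radii quoted in Sect. 5.1.3:

```python
import math

def spectral_radius(D_SD, D_D):
    """rho(J_Psi_tau) from (34): the non-zero eigenvalues are +/- i sqrt(D_SD / D_D)."""
    return math.sqrt(D_SD / D_D)

# (35): the fixed-point formulation is contractant iff D_SD < D_D (here D_SD = 1)
for D_D in (4.0, 1.0, 0.64, 0.01):
    rho = spectral_radius(1.0, D_D)
    print(f"D_D = {D_D:5.2f} -> rho = {rho:5.2f} "
          f"({'contractant' if rho < 1.0 else 'non-contractant'})")
```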

We will run several simulations with different values of \(D_D\), leading to different values of \(\varrho (J_{\varPsi _{\tau }})\). These values and the expected movement of the body of the system are plotted in Fig. 11.

Fig. 11

Displacement of the mass (\(x_L\)) for different damping ratios of the right damper (\(D_D\)) simulated on a monolithic model (without co-simulation). Associated spectral radii of \(J_{\varPsi }\) are recalled for further coupled formulations

5.1.3 Results

As the PETSc library makes it easy to change the JFM and its parameters (as explained in Sect. 4.2), three methods have been used in the simulations (a minimal usage sketch follows the list):

  • NewtonLS: a Newton-based non-linear solver that uses a line search,

  • Ngmres: the non-linear generalized minimum residual method [10], and

  • Anderson: the Anderson mixing method [1].
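To give an idea of how little code is needed to switch between these solvers, here is a minimal petsc4py sketch; the toy \(2\times 2\) residual merely stands in for the co-simulation constraint \(\gamma _{\tau }\), and the actual implementation described in Sect. 4.2 interfaces PETSc from the co-simulation framework itself:

```python
from petsc4py import PETSc

n = 2                                   # toy problem size (stand-in for the coupling constraint)
x = PETSc.Vec().createSeq(n)
r = PETSc.Vec().createSeq(n)

def residual(snes, x, f):
    """Toy residual standing in for gamma_tau(x) = 0."""
    v = x.getArray(readonly=True)
    f.setValues([0, 1], [v[0] ** 2 + v[1] - 1.0, v[0] - v[1]])
    f.assemble()

snes = PETSc.SNES().create()
snes.setFunction(residual, r)
snes.setType('ngmres')                  # or 'newtonls' (jacobian-free with -snes_mf), 'anderson'
snes.setFromOptions()
x.set(0.5)                              # initial guess
snes.solve(None, x)
```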

First of all, simulations have been run with all these JFMs (with parameters exhaustively defined in Appendix A) within IFOSMONDI-JFM, with the fixed-point IFOSMONDI algorithm (denoted hereafter as “Fixed-point”), and with the original explicit zero-order hold co-simulation method (sometimes referred to as non-iterative Jacobi). The error is defined as the mean of the normalized \(L^2\) errors on each state variable of both systems over the whole \([t^{{\mathrm{init}}}, t^{{\mathrm{end}}}]\) domain. The reference is the monolithic simulation (of the non-coupled model) done with Simcenter Amesim. These errors are presented for a contractant case (\(D_D=4\) N, so \(\varrho (J_{\varPsi _{\tau }})=0.5\)) in Fig. 12. For a non-contractant case (\(D_D=0.64\) N, so \(\varrho (J_{\varPsi _{\tau }})=1.25\)), analogous plots are presented in Fig. 13.
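A possible transcription of this error metric is sketched below; normalizing each state variable by the \(L^2\) norm of its reference trajectory is our assumption:

```python
import numpy as np

def cosim_error(t, states, states_ref):
    """Mean over all state variables of the normalized L2 error w.r.t. the
    monolithic reference; `states` and `states_ref` are (n_states, n_points)
    arrays sampled on the common time grid `t`."""
    errors = []
    for y, y_ref in zip(states, states_ref):
        num = np.sqrt(np.trapz((y - y_ref) ** 2, t))
        den = np.sqrt(np.trapz(y_ref ** 2, t))   # assumed non-zero reference trajectory
        errors.append(num / den)
    return float(np.mean(errors))
```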

Fig. 12

Error across \(\delta t_{{\mathrm{ref}}}\) with different methods on a contractant case (\(D_D=4.0\), \(\varrho (J_{\varPsi _{\tau }})=0.5\))—NewtonLS, Ngmres and Anderson are matrix-free iterative methods used with the IFOSMONDI-JFM algorithm, Fixed-point is the fixed-point IFOSMONDI algorithm, and Explicit ZOH is the non-iterative zero-order hold fixed-step co-simulation

Fig. 13

Error across \(\delta t_{{\mathrm{ref}}}\) with different methods on a non-contractant case (\(D_D=0.64\), \(\varrho (J_{\varPsi _{\tau }})=1.25\))—NewtonLS, Ngmres and Anderson are matrix-free iterative methods used with the IFOSMONDI-JFM algorithm

As expected, the simulations with the fixed-point method failed (diverged) for the non-contractant case. Moreover, with the explicit ZOH co-simulation algorithm, the values given to the systems were too far from physically-possible values, so the internal solvers of systems \((S_1)\) and \((S_2)\) failed to integrate. This is why these two methods produce no curve in Fig. 13.

Nonetheless, the three versions of the IFOSMONDI-JFM algorithm keep producing reliable results with an acceptable relative error (less than \(1\%\)) when \(\delta t_{{\mathrm{ref}}}\geqslant 0.1\) s.

In Figs. 12 and 13, the IFOSMONDI-JFM method seems to solve the problem with a good accuracy regardless of the value of the damping ratio \(D_D\). To confirm this, several other values have been tried: the ones for which the solution has been computed and plotted in Fig. 11. The error is presented, together with the number of iterations and the number of integrations (calls to \(\zeta _k\), i.e. calls to \(\gamma _{\tau }\) for IFOSMONDI-JFM or to \(\varPsi _{\tau }\) for fixed-point IFOSMONDI). Although for fixed-point IFOSMONDI the number of iterations is the same as the number of integrations, for the IFOSMONDI-JFM algorithm the number of iterations is the one of the underlying non-linear solver (NewtonLS, Ngmres or Anderson), and there may be many more integrations than iterations of the non-linear method. These results are presented in Fig. 14.
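When measuring performance, this distinction between iterations and integrations can be tracked by wrapping the residual callback, as in the following sketch (getIterationNumber is the standard PETSc SNES query; the wrapper class and the gamma_tau callable are ours):

```python
class CountingResidual:
    """Counts every call to the constraint, i.e. every integration of both
    systems over the macro-step, independently of the solver's iteration count."""
    def __init__(self, gamma_tau):
        self.gamma_tau = gamma_tau
        self.integrations = 0

    def __call__(self, snes, x, f):
        self.integrations += 1          # one call = one integration of all systems
        self.gamma_tau(x, f)

# After snes.solve(None, x):
#   snes.getIterationNumber()  -> iterations of NewtonLS / Ngmres / Anderson
#   counting.integrations      -> residual evaluations (matrix-free products included),
#                                 possibly much larger than the iteration count
```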

Fig. 14

Total number of iterations, integrations, and error across the spectral radius of \(J_{\varPsi }\) for different methods (Fixed-point corresponds to the classical IFOSMONDI algorithm, and all other methods are used with the IFOSMONDI-JFM version). All co-simulations ran with \(\varepsilon = 10^{-4}\) and \(\delta t_{{\mathrm{ref}}}= 10^{-2}\)

As expected, the threshold of \(\varrho (J_{\varPsi _{\tau }})=1\) (i.e. \(D_D=D_{SD}=1\)) is critical for the fixed-point method. Not only can the IFOSMONDI-JFM method pass this threshold, but no significant extra difficulty appears in solving the problem in the non-contractant cases, except for the Ngmres non-linear solver (which failed to converge with \(D_D=0.01\), i.e. with \(\varrho (J_{\varPsi _{\tau }})=10\)). However, the variant of the Ngmres method that uses a line search converges in all cases. Even though the latter requires more integrations than the other JFMs, it is more robust to high values of \(\varrho (J_{\varPsi _{\tau }})\). The parameters of this line search are detailed in Table 9 in Appendix A.

The NewtonLS and Anderson methods show a slightly larger error in this “extreme” case of \(\varrho (J_{\varPsi _{\tau }})=10\), yet it stays below \(0.001\%\), which is completely acceptable.

Among these two JFMs (NewtonLS and Anderson), the trend observed in Fig. 14 shows that NewtonLS is always more accurate than Anderson, yet it always requires more integrations. We can thus state that, on this model, IFOSMONDI-JFM is more accuracy-oriented when based on the NewtonLS JFM and more speed-oriented when based on the Anderson JFM (for the same \(\delta t_{{\mathrm{ref}}}\) and \(\varepsilon\)). For high values of \(\varrho (J_{\varPsi _{\tau }})\), accuracy-oriented simulations are better achieved with the Ngmres JFM with line search than with NewtonLS.

Finally, smaller errors are obtained with IFOSMONDI-JFM, and with fewer iterations than with fixed-point IFOSMONDI. Yet, the time consumption is directly linked to the number of integrations, not to the number of iterations of the underlying non-linear solver. The total number of integrations does not increase with the problem difficulty (which increases with \(\varrho (J_{\varPsi _{\tau }})\)), and the non-linear methods within IFOSMONDI-JFM do not even require more integrations than the fixed-point method for most of the values of \(D_D\) for which the fixed-point IFOSMONDI algorithm does not fail.

5.2 Industrial-scale thermal-electric model

Regarding industrial-scale test cases, it is not always possible to determine in advance whether the fixed-point formulation is contractant. Indeed, the analytical analysis (as done for the first test case) is not always tractable due to the model dimensions and its potentially non-linear behavior.

For this reason, this subsection introduces a large model with 324 state variables across eleven systems, and 148 interface variables in total (meaning 148 inputs connected to 148 outputs in a one-to-one way, making \(\varPhi ^T\) a square matrix). As the variable of the JFM is the vector of all inputs and their derivatives, the JFM solves a problem of size 296.

The problem is compliant with the fixed-point IFOSMONDI and the explicit ZOH methods, so that comparisons in terms of time/accuracy trade-off can be conducted. This analysis is notably possible thanks to the scale of the system, which makes the runs take a non-negligible amount of time.

Fig. 15

Subsketch inside a single module of the battery pack: the 6 cells can be seen

5.2.1 Model presentation

The model is an industrial-scale thermal-electric system representing a battery pack (represented in Fig. 16) with an air cooling system. The battery pack is made of 10 modules of 6 cells each (see Fig. 15), all modules being connected by several moist-air ports (to represent airflow at different points in space as the air circulates), thermal connections (representing thermal conduction) and electrical connections.

Fig. 16

Battery pack cooling system modelled with Simcenter Amesim (each module contains 6 cells as shown in Fig. 15)—Monolithic model

In practice, the need for co-simulation for this kind of model arises when an external tool (simulation and modelling platform) provides a black-box system for each module. Indeed, in this case, co-simulation is the only way to test the battery pack made up of these modules (in a flexible configuration regarding the number of modules) against a given battery load/unload scenario.

In this paper, the monolithic system of Fig. 16 will only act as a reference, and we will consider 11 black-box systems respectively corresponding to the 10 modules and to the external load/unload scenario. The sketch of one of the black-box module systems is given as an example in Fig. 17, and the load/unload scenario (in the \(11{\mathrm{th}}\) system) is presented in Fig. 18.

Fig. 17

Black-box system of module 2 only: sketch representation of a single system for co-simulation in Simcenter Amesim

Fig. 18

Battery load/unload signal

The battery pack is a 230 V, 10.4 kWh hybrid vehicle battery. The cells in each module are 3.84 V, 45 Ah Li-Ion cells. The (smooth) charge and discharge steps that can be seen in Fig. 18 simulate critical cases where the highest thermal load occurs (as the battery is submitted to high currents). The pack is in a \(20^{\circ }\hbox {C}\) air environment. As the airflow (cooling system) enters at the bottom of module 1 and exits the pack at the top of module 10, the temperature is distributed along the modules and cells as shown in Fig. 19 at the end of the 5000 s scenario. This result is obtained with the monolithic reference model.

Fig. 19

Battery temperature distribution at \(t=t^{{\mathrm{end}}}=5000\) s—Arrows represent the air flow—Module 1 is on the left and module 10 is on the right

5.2.2 Results

Results have been generated on an HPC cluster so that the processes can run in parallel. Indeed, due to the 10 modules of the model and the system containing the scenario, the battery load and the reference potential, 11 workers are instantiated for a co-simulation. In addition to the orchestrator process (see the architecture in Fig. 7), a total of 12 processes run in parallel.
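As an illustration only (not the paper's actual implementation), such an orchestrator/worker layout maps naturally onto MPI ranks, e.g. with mpi4py:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # 12 processes: rank 0 orchestrates, ranks 1-11 hold the systems

if rank == 0:
    # Orchestrator (cf. Fig. 7): drives the JFM, sends inputs to and gathers outputs from workers.
    for worker in range(1, comm.Get_size()):
        comm.send(("step", None), dest=worker)   # hypothetical protocol message
else:
    # Worker: wraps one black-box system (modules 1..10, plus the scenario system on rank 11).
    command, payload = comm.recv(source=0)
```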

Due to the small number of integrations required by the Anderson method (as it recombines the previously evaluated iterates [1]), this JFM is chosen for IFOSMONDI-JFM. The results are obtained with several values of \(\varepsilon\), both with the IFOSMONDI-JFM and fixed-point IFOSMONDI methods. A strong knowledge of the model is not required with these methods, yet an idea of the order of magnitude of the co-simulation step size always helps. For this reason, we used the following parameters:

  • \(\delta t_{{\mathrm{min}}}= 1\) ms, to be able to catch the fast interfaces dynamics, if any,

  • \(\delta t_{{\mathrm{max}}}= 1\) s, to avoid missing events (peaks, slope changes, etc.), and

  • \(\delta t_{{\mathrm{init}}}= 1\) ms for safety reasons (catch high dynamics at initialization).

In contrast, the explicit ZOH co-simulation method requires the step size to be chosen at the beginning, which implies that the user has a strong knowledge of the system. For this reason, we ran co-simulations with different values of the fixed co-simulation step size \(\delta t\).

Please note that co-simulations with the explicit ZOH method ran with the same architecture as the IFOSMONDI methods. In other words, each co-simulation required 12 processes to run, regardless of the method. This by-design parallelism will therefore not bias the results below (Tables 3, 4).

Table 3 Results on the Battery pack cooling system with IFOSMONDI-JFM
Table 4 Results on the Battery pack cooling system with fixed-point IFOSMONDI

Please note that, in the case of fixed-point IFOSMONDI, one iteration corresponds to a single integration (Table 5).

Table 5 Results on the Battery pack cooling system with Explicit ZOH

Please note that, in the case of Explicit ZOH, one co-simulation step corresponds to a single integration.

Fig. 20

Graphical visualization of results in Tables 3, 4 and 5

On the trade-off graph in Fig. 20, the closer a co-simulation is to the bottom-left corner, the more valuable it is: the run is both accurate and fast. Every method follows a well-known trend: the more accurate a co-simulation is, the slower it is. Graphically, this means that the points corresponding to a given method move right along the x-axis as they move down the y-axis.

Nonetheless, both IFOSMONDI methods' curves are lower and more to the left than the Explicit ZOH curve. This can be interpreted in two equivalent ways:

  • at an accuracy equivalent to Explicit ZOH's, the IFOSMONDI methods (fixed-point and JFM) are faster;

  • at a computational time equivalent to Explicit ZOH's, the IFOSMONDI methods are more accurate.

In addition, the trade-off curve of IFOSMONDI-JFM is lower and more to the right than that of fixed-point IFOSMONDI. This means that, for the same \(\varepsilon\) convergence criterion, IFOSMONDI-JFM is more accuracy-oriented than the fixed-point IFOSMONDI method.

Let's focus on the \(\varepsilon =10^{-8}\) cases. The average step size was 0.831 s for IFOSMONDI-JFM and 0.986 s for fixed-point IFOSMONDI, so let's compare these results with the explicit ZOH run with a fixed co-simulation step size of 1 s. Two variables of interest are plotted in Fig. 21. To see the differences in accuracy between the runs, a focus on \(t\in [3730, 3830]\) is presented in Fig. 22. On the latter, we can clearly see that both IFOSMONDI methods visually match the monolithic reference solution, whereas the explicit ZOH method shows a delay and an overshoot.

Fig. 21

Two variables of interest in the Battery Pack Cooling model, results for different (co-)simulation methods

Fig. 22

Focus on \(t\in [3730, 3830]\) of the curves in Fig. 21

Finally, the step-size adaptation with the rule described in Sect. 3.4 can be visualized for both the IFOSMONDI fixed-point and JFM methods with \(\varepsilon =10^{-8}\), together with the number of integrations (including the rejected steps) and one of the variables of interest of the system. Figures 23 and 24 show that the methods, on the one hand, focus on the stiff parts of the simulation by integrating more often and reducing the co-simulation step size, and, on the other hand, save time on the non-stiff parts by increasing the co-simulation step size and iterating fewer times. This phenomenon can be explained by the fact that non-stiff models (or non-stiff parts of a simulation of models with variable stiffness) produce a version of the coupling constraint that is easier to satisfy (at a given \(\varepsilon\) tolerance) than stiff models (or stiff parts of such a simulation).

Fig. 23

Visualization of the connection between the co-simulation step size (upper straight curve, right y-scale), the number of integrations (lower straight curve, right y-scale) and a representative variable of interest of the system (superimposed red curve with round markers) in the case of the IFOSMONDI-JFM method applied on the Battery Pack Cooling system

Fig. 24

Visualization of the connection between the co-simulation step size (upper straight curve, right y-scale), the number of integrations (lower straight curve, right y-scale) and a representative variable of interest of the system (superimposed blue curve with triangle markers) in the case of the fixed-point IFOSMONDI method applied on the Battery Pack Cooling system

6 Conclusion

The IFOSMONDI-JFM algorithm takes advantage of the \(C^1\) smoothness of the fixed-point IFOSMONDI algorithm [6] without the delay that this smoothness implies in [5] (thanks to its iterative aspect), and the coupling constraint is satisfied on both the left and right sides of every communication time thanks to the underlying non-linear solvers of PETSc [2]. The iterative part needs neither a finite-differences estimation of the jacobian matrix as in [14] nor a reconstruction of it as in [15].

The resulting algorithm even solves co-simulation problems for which the fixed-point formulation would involve a non-contractant coupling function \(\varPsi _{\tau }\).

Thanks to its algebraic loop, the test case introduced in Sect. 5.1.1 highlights this robustness, as its difficulty can easily be increased or decreased in a quantifiable way. It is a good candidate for benchmarking the robustness of various co-simulation methods.

On the test cases considered in this paper, the IFOSMONDI-JFM method either requires fewer iterations to converge (when the parameterization enables both methods to solve the problem) or reaches a better accuracy than the fixed-point IFOSMONDI method. In the end, the time/accuracy trade-off is similar for both methods.

The matrix-free aspect of the underlying solvers used with IFOSMONDI-JFM and their fast convergence are two of the causes of the small number of integrations per step.

As the contractance of the fixed-point function is not always possible to analyze for industrial cases in practice, one cannot know in advance whether the fixed-point IFOSMONDI method will work or fail. The robustness of the newly introduced IFOSMONDI-JFM method brings a solution to this problem.