1 Introduction

Knowledge of the magnetic-field structure is essential for understanding solar phenomena such as coronal mass ejections, flares, and filaments. However, the solar magnetic field is difficult to measure with high accuracy except in the photosphere. Therefore, several magnetic-field extrapolation models have been proposed, such as potential fields (PF) (Schmidt 1964; Altschuler and Newkirk 1969), linear force-free fields (LFFF) (Nakagawa and Raadu 1972; Chiu and Hilton 1977), and nonlinear force-free fields (NLFFF) (Sakurai 1989; Neukirch 2005; Wiegelmann and Sakurai 2021).

The PF model is characterized by

$$ \begin{aligned} \Delta \phi & = 0 \ \ \text{in} \ \Omega \\ \frac{{\partial \phi }}{{\partial n}} & = {B_{z_{0}}} \ \ \text{on} \ \partial \Omega \end{aligned} \ \ , $$
(1)

where \({\boldsymbol{B}} = {\boldsymbol{B}}\left (x,y,z\right ) = \left (B_{x},B_{y},B_{z} \right ) \) is the magnetic field, \(\Omega \) is the open space above the solar surface, and \(\partial \Omega \) is its boundary. The magnetic field is then given by \({\boldsymbol {B}} = \nabla \phi \).

The solution of Equation 1 is unique. The solution is a good approximation of the large-scale topology of a solar active region, but it is not suitable for a small-scale topology such as that discussed by Aulanier, Pariat, and Démoulin (2005).

The force-free field model can be written as

$$ \begin{aligned} \nabla \times {\boldsymbol {B}} & = \alpha {\boldsymbol {B}} \ \ \text{in} \ \Omega \\ \nabla \cdot {\boldsymbol {B}} & = 0 \ \ \text{in} \ \Omega \\ {\boldsymbol {B}} & = {\boldsymbol{B_{0}}} \ \ \text{on} \ \partial \Omega \end{aligned} \ \ , $$
(2)

where \({\boldsymbol{B_{0}}}\) is the observed magnetic field in the photosphere. When \(\alpha \) is a constant, Equation 2 represents the LFFF model, which can be solved analytically using the Green’s function method or the Fourier method, as discussed by Wiegelmann and Sakurai (2021). In particular, when \(\alpha = 0\), Equation 2 describes the PF model, which can also be solved using these analytical methods.

In the LFFF model, \(\alpha \) is a global free parameter. As observed by Démoulin et al. (2002) and Valori et al. (2010), a large \(\alpha \) may lead to overestimation, while a small \(\alpha \) may lead to underestimation. When the assumption of a constant \(\alpha \) is relaxed, Equation 2 becomes the NLFFF model, where \(\alpha \) is a spatially dependent scalar function. With a variable \(\alpha \), a closed-form solution to Equation 2 is no longer feasible, which has prompted the development of various numerical algorithms to solve Equation 2. These include the works by Grad and Rubin (1958), Nakagawa (1974), Mikić and McClymont (1994), Amari et al. (1997), Wheatland, Sturrock, and Roumeliotis (2000), Yan and Sakurai (2000), Régnier, Amari, and Kersalé (2002), Wiegelmann and Neukirch (2003), Wiegelmann, Inhester, and Sakurai (2006), and Yan and Li (2006).

However, without ground-truth 3D magnetic fields, the performance, stability, and accuracy of these algorithms cannot be objectively evaluated. Fortunately, assuming an axially symmetric configuration of the magnetic field, i.e. \(\frac{\partial }{{\partial \phi }} = 0\) in the spherical coordinate system \((r,\theta ,\phi )\), Low and Lou (1990) provided a set of separable and semi-analytical solutions to Equation 2:

$$ {\boldsymbol {B}} = \frac{{{B_{0}}}}{{{R^{n + 2}}}}\left [ { - \frac{{{\mathrm{d}}F}}{{\mathrm{d}\mu }}\widehat {{\boldsymbol {r}}} + \frac{{nF}}{{{{\left ( {1 - {\mu ^{2}}} \right )}^{\frac{1}{2}}}}} \widehat {{\boldsymbol{\theta}}} + \frac{{\eta A}}{{{{\left ( {1 - {\mu ^{2}}} \right )}^{\frac{1}{2}}}}} \widehat { {\boldsymbol{\phi}}} } \right ] , $$
(3)

where

$$ A = \textstyle\begin{cases} {{F^{1+{\frac{1}{n}}}},}&{n = 1} \\ {F{{\left | F \right |}^{\frac{1}{n}}},}&{n \geqslant 3} \end{cases} $$
(4)

and \(F \) satisfies

$$ \left ( {1 - {\mu ^{2}}} \right ) \frac{{{\mathrm{d}^{2}}F}}{{\mathrm{d}{\mu ^{2}}}} + n\left ( {n + 1} \right )F + C = 0, $$
(5)

where

$$ C = \textstyle\begin{cases} {\left ( {1 + \frac{1}{n}} \right ){a^{2}}{F^{1 + \frac{2}{n}}},}&{n = 1} \\ {\left ( {1 + \frac{1}{n}} \right ){a^{2}}F{\left ( {{F^{2}}} \right )^{ \frac{1}{n}}},}&{n \geqslant 3.} \end{cases} $$
(6)

A detailed discussion of the mathematical framework of Low and Lou's equilibria is given in Appendix A.

In fact, the cases \(n = 1 \) and \(n \ge 3 \) can be generalized, since \(\frac{2}{n}\) need only be positive and is not necessarily an integer in Equations 4 and 6. For example, for \(n = \frac{4}{3}\), \(\frac{2}{n} = \frac{3}{2} \) is not a positive integer, whereas for \(n = \frac{2}{9}\), \(\frac{2}{n} = 9 \) is a positive integer.

Note that \({F^{1 + \frac{2}{n}}}\) is not always equal to \(F{\left ( {{F^{2}}} \right )^{\frac{1}{n}}}\). For example, when \(F = -0.1\) and \(n = \frac{2}{9}\), \({F^{1 + \frac{2}{n}}} = F^{10} > 0\), whereas \(F{\left ( {{F^{2}}} \right )^{\frac{1}{n}}} < 0 \). In this article, we take \(n = 5\), 3, 1.5, 1, 0.9, 0.7, 0.5, 0.3, and 0.1.
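This distinction can be verified numerically; a minimal check in Python, using the values quoted above:

```python
# Check that F**(1 + 2/n) and F*(F**2)**(1/n) differ for F = -0.1, n = 2/9.
F = -0.1
# With n = 2/9: 1 + 2/n = 10 and 1/n = 4.5.
lhs = F ** 10              # (-0.1)**10          =  1e-10 > 0
rhs = F * (F ** 2) ** 4.5  # (-0.1)*(0.01)**4.5  = -1e-10 < 0
print(lhs, rhs)            # the signs disagree, so the two expressions are not equal
```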

Equation 5 is a second-order, nonlinear, ordinary differential equation (ODE). Solving ODEs is an important topic in mathematics and engineering. In general, most existing methods for solving ODEs fall into two main categories: analytical techniques and numerical methods. Analytical techniques include, e.g., separation of variables and the method of integrating factors. Numerical methods include, e.g., Euler's method and the Runge–Kutta (RK) method; they are usually expressed in terms of discretization parameters. Artificial-intelligence-based methods are also increasingly used to solve ODEs, such as those discussed by Raissi, Perdikaris, and Karniadakis (2019), Dufera (2021), and Cuomo et al. (2022).

Several works have extended Low and Lou's equilibria over the last 30 years; for example, Low and Flyer (2007), Lerche and Low (2014), and Prasad, Mangalam, and Ravindra (2014) redefined Equations 6, 7, and 9 of Low and Lou (1990) in different ways. To the best of our knowledge, there is no discussion of the existence and uniqueness of analytical solutions to Equation 5. The most commonly used numerical method for solving Equation 5 is the fourth-order RK method (RK4). RK4 provides approximate values of \({{F}}\left ( {{\mu _{i}}} \right )\) at the discrete sampling points \(\mu _{i}\). If \(\mu_{j}\) is not included in the set of \(\mu _{i}\), we cannot directly determine the value of \(F\left ( {{\mu _{j}}} \right )\). In this article, we train a neural network that can compute \(F\left ( {{\mu _{j}}} \right )\) and \({F^{\prime}}\left ( {{\mu _{j}}} \right )\) directly at any \({\mu _{j}}\) in \(\left [ { - 1,1} \right ]\).

The remainder of this article is organized as follows: Section 2 provides an RK-based method for solving for the parameter \(a \) in Equation 5. Section 3 develops the corresponding numerical algorithm. The data-driven numerical method for the parameters \(n \) and \(a \) is presented in Section 4. The conclusion is given in Section 5.

2 RK-Based Method for the Parameter \(a \) in Equation 5

Returning to Equations 3 and 5, there is one parameter \(a \) and two unknowns \(\left ( {F,\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}} \right )\) with \({B_{0}} = 1\) and \({r_{0}} = 1\) in Low and Lou’s equilibria. In this section, we study numerical methods for solving for \(a \).

The basic idea for solving Equation 5 with initial conditions is to rewrite it as a system of first-order ODEs. Introducing the variables

$$ {f_{1}} = F\left ( \mu \right ) \ \ \text{and} \ \ {f_{2}} = {F^{ \prime}}\left ( \mu \right ), $$

we obtain a system of two first-order ODEs:

$$ \textstyle\begin{cases} f_{1}^{\prime }= {f_{2}}, \\ f_{2}^{\prime }= - \frac{{n\left ( {n + 1} \right ){f_{1}} + D}}{{1 - {\mu ^{2}}}}, \end{cases} $$

where

$$ D = \textstyle\begin{cases} \left ( {1 + \frac{1}{n}} \right ){a^{2}}f_{1}^{1 + \frac{2}{n}},&{\text{if $\frac{2}{n}$ is an integer},} \\ \left ( {1 + \frac{1}{n}} \right ){a^{2}}{f_{1}}{\left ( {f_{1}^{2}} \right )^{\frac{1}{n}}}, &{\text{otherwise},} \end{cases} $$

with initial values at \(\mu = - 1\)

$$ {f_{1}}\left ( { - 1} \right ) = 0 \ \ \text{and} \ \ {f_{2}}\left ( { - 1} \right ) = 10. $$

The RK method is effective for solving the initial-value problem of Equation 5; the unknown parameter \(a \) is then determined by requiring that \(f_{1}(1) = 0 \).
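The first-order system above can be integrated with a standard RK solver. A minimal sketch in Python follows (using scipy.integrate.solve_ivp; the integration is started slightly inside \(\mu = -1\) to avoid the singular endpoints, and the offset eps and the tolerances are our assumptions):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(mu, f, n, a):
    """Right-hand side of the first-order system equivalent to Equation 5."""
    f1, f2 = f
    if (2.0 / n).is_integer():                       # branch for D as defined above
        D = (1.0 + 1.0 / n) * a**2 * f1 ** (1.0 + 2.0 / n)
    else:
        D = (1.0 + 1.0 / n) * a**2 * f1 * (f1**2) ** (1.0 / n)
    return [f2, -(n * (n + 1.0) * f1 + D) / (1.0 - mu**2)]

def F_at_one(a, n, eps=1e-6):
    """Integrate from mu = -1+eps to 1-eps with f1(-1) = 0, f2(-1) = 10; return f1 near mu = 1."""
    sol = solve_ivp(rhs, (-1.0 + eps, 1.0 - eps), [0.0, 10.0],
                    args=(n, a), rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]
```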

We divide the interval \(\left [ {a_{0},a_{\mathrm{max}}} \right ]\) into \(N \) equal parts, choosing a step \(h = \frac{{{a_{\max }} - {a_{0}}}}{N}\). Then, \(a_{i} = a_{0} + \left ( {i - 1} \right )h \left ( {1 \leqslant i \leqslant N} \right )\). Without loss of generality, we choose \(a_{0} = {10^{ - 5}}\), \(a_{\max } = 10\), \(N = 999\).

Note that \(\mu = -1 \) is a singular point of \(f_{2}^{\prime }\) in Equation 5, so the value of \(f_{2}^{\prime}\left ( { - 1} \right )\) must be prescribed. Figure 1 shows \(F\left (a,1\right )\) as a function of \(a \) for \(n = 1 \) with \(f_{2}^{\prime}\left ( { - 1} \right ) = 0\) and \(f_{2}^{\prime}\left ( { - 1} \right ) = 10^{6} \).

Figure 1
figure 1

The curves of \(F\left (a, 1\right )\) with the variable \(a\) for \(F^{\prime \prime} \left (-1\right )=0\) and \(F^{\prime \prime} \left (-1\right )=10^{6}\), where the points of intersection between \(F\left (a, 1\right )\) and the horizontal axis can determine the values of \(a\) for \(F\left (a, 1\right )=0\).

We take \(f_{2}^{\prime}\left ( { - 1} \right ) = 0 \) in this section. Plotting \(F\left ( {a_{i},\mu = 1} \right )\) against \(a \), as shown in Figure 2, helps us locate the intervals that contain zeros. Finally, we solve \(F\left ( {a,\mu = 1} \right ) = 0\) by the bisection method and obtain the roots \(a_{n,m} \) (\(1 \le m \le 3 \)) presented in Table 1.
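A sketch of this scan-and-bisection procedure, reusing the F_at_one helper from the previous sketch (the grid follows the values quoted above; the tolerance is our assumption):

```python
import numpy as np

def find_roots(n, a0=1e-5, a_max=10.0, N=999, tol=1e-6):
    """Scan F(a, mu=1) on a grid of a-values and refine each sign change by bisection."""
    a_grid = a0 + np.arange(N) * (a_max - a0) / N      # a_i = a_0 + (i-1)h
    vals = np.array([F_at_one(a, n) for a in a_grid])
    roots = []
    for lo, hi, v_lo, v_hi in zip(a_grid[:-1], a_grid[1:], vals[:-1], vals[1:]):
        if v_lo * v_hi > 0:                            # no sign change in [lo, hi]
            continue
        while hi - lo > tol:                           # bisection on F(a, 1) = 0
            mid = 0.5 * (lo + hi)
            if F_at_one(mid, n) * F_at_one(lo, n) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots

# Example: roots a_{n,m} for n = 1 (cf. Table 1)
# print(find_roots(1))
```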

Figure 2
figure 2

\(F(a,1)\) with the variable \(a\) for different \(n\).

Table 1 \(a_{n,m} \), the \(m \)th root of \(F\left ( {a,\mu = 1} \right ) = 0 \) for a given \(n \).

3 Machine-Learning-Based Method for \(F\) and \(\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}\)

Most of the classical numerical methods compute an approximate value for the solution at discrete sampling points. In this section, we propose a grid-free method based on a neural network to obtain a numerical solution at any point in \(\left [ { - 1,1} \right ]\).

The following theorem provides a solid theoretical basis for using multi-layer perceptrons (MLP) in scientific computing.

Theorem 1

(Universal approximation theorem (Cybenko 1989; Hornik 1991)) Let \(K \subseteq {\mathbb{R}^{d}}\) be compact, \(f:K \to \mathbb{R}\) be continuous, and \(\rho :\mathbb{R} \to \mathbb{R}\) be continuous and not a polynomial. Then, for every \(\epsilon > 0 \), there exist \(N \in \mathbb{N}\), \(a_{k}\), \(b_{k} \in \mathbb{R}\), and \(w_{k} \in \mathbb{R}^{d} \) with

$$ {\left \| {f - \sum \limits _{k = 1}^{N} {{a_{k}}\rho \left ( {\left \langle {{w_{k}}, \cdot } \right \rangle - {b_{k}}} \right )} } \right \|_{\infty }} < \epsilon . $$

Remark 1

The universal approximation theorem also holds when \(f:K \to \mathbb{R}^{m}\).

Therefore, we use an MLP with an input layer, one hidden layer, and an output layer to solve Equation 5, as shown in Figure 3. The input layer has one channel corresponding to the input \(\mu \). The hidden layer has \(N \) neurons. The output layer has two outputs, \(F\) and \(\frac{{\mathrm{d}F}}{{\mathrm{d}\mu}}\). The network can be formulated as

$$ {\boldsymbol {f} \left ( {\mu ;\theta } \right )} = {\left [ {{f_{1}},{f_{2}}} \right ]^{T}} = {W_{2}}Y + {b_{2}} = {{W_{2}}\left [ {\sigma \left ( {{W_{1}} \mu + {b_{1}}} \right )} \right ] + {b_{2}}}, $$
(7)

where \({W_{1}},{b_{1}} \in {\mathbb{R}^{N \times 1}}\), \({W_{2}} \in { \mathbb{R}^{2 \times N}}\), \({b_{2}} \in {\mathbb{R}^{2 \times 1}}\), \({\left [ \cdot \right ]^{T}}\) is the transpose operator in linear algebra, \(Y = {\left [ {{y_{1}}, \ldots ,{y_{N}}} \right ]^{T}}\), and \(\sigma \) is the hyperbolic tangent sigmoid (tanh) elementwise operator

$$ {\mathrm{{tanh}}}\left ( x \right ) = \frac{{1 - {\text{e}^{ - 2x}}}}{{1 + {\text{e}^{ - 2x}}}}, $$
(8)

where \(x \) is an element of the column vector \({W_{1}}\mu + {b_{1}}\).
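A minimal PyTorch sketch of the network in Equation 7 (the class name and the default width are ours; in the article the hidden width \(N\) is taken from Table 2):

```python
import torch
import torch.nn as nn

class LowLouMLP(nn.Module):
    """One-hidden-layer MLP mapping mu to (f1, f2) ~ (F, dF/dmu), as in Equation 7."""
    def __init__(self, N=64):
        super().__init__()
        self.hidden = nn.Linear(1, N)    # W1 mu + b1
        self.out = nn.Linear(N, 2)       # W2 Y + b2

    def forward(self, mu):
        y = torch.tanh(self.hidden(mu))  # Y = tanh(W1 mu + b1)
        return self.out(y)               # [f1, f2]
```

For instance, evaluating LowLouMLP(N=64) on the tensor [[0.3]] returns the two network outputs at \(\mu = 0.3\).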

Figure 3
figure 3

The MLP has an input layer, one hidden layer, and an output layer. A layer consists of small individual units called neurons. The letter \(N \) is used for the number of neurons in the hidden layer. The loss function contains information from the differential equation and the initial conditions.

Theorem 1 ensures the existence of a neural-network solution \({\boldsymbol {f}} \left ( {\mu ;\theta } \right ) \) that can approximate the solution of Equation 5 arbitrarily closely. Then, we find \({\boldsymbol {f}} \left ( {\mu ;\theta } \right ) \) by minimizing the loss function

$$ \mathcal{L}\left ( {\mu ;\theta } \right ) = {\lambda _{1}}{ \mathcal{L}_{1}}\left ( {\mu ;\theta } \right ) + {\lambda _{2}}{ \mathcal{L}_{2}}\left ( {\mu ;\theta } \right ) + {\lambda _{3}}{ \mathcal{L}_{3}}\left ( {\mu ;\theta } \right ), $$
(9)

where

$$ \begin{aligned} {\mathcal{L}_{1}}\left ( {\mu ;\theta } \right ) & = \frac{1}{M}\sum \limits _{\mu = {\mu _{j}},j = 1}^{j=M} {{{\left ( { \left ( {1 - {\mu ^{2}}} \right ) \frac{{\mathrm{d}{f_{2}}}}{{\mathrm{d}\mu }} + n\left ( {n + 1} \right ){f_{1}} + E} \right )}^{2}}} \\ {\mathcal{L}_{2}}\left ( {\mu ;\theta } \right ) & = \frac{1}{M}\sum \limits _{\mu = {\mu _{j}},j = 1}^{M} {{{\left ( { \frac{{\mathrm{d}{f_{1}}}}{{\mathrm{d}\mu }} - {f_{2}}} \right )}^{2}}} \\ {\mathcal{L}_{3}}\left ( {\mu ;\theta } \right ) & = {\left ( {{f_{1}} \left ( { - 1;\theta } \right )} \right )^{2}} + {\left ( {{f_{2}} \left ( { - 1;\theta } \right ) - 10} \right )^{2}}, \end{aligned} $$

where \(M \) is the sample size and where

$$ E = \textstyle\begin{cases} {a^{2}}\left ( {1 + \frac{1}{n}} \right )f_{1}^{\left ( {1 + \frac{2}{n}} \right )},& {\text{if}} \ \frac{2}{n} \ \text{is a positive integer}, \\ {a^{2}}\left ( {1 + \frac{1}{n}} \right ){f_{1}}{\left ( {f_{1}^{2}} \right )^{\frac{1}{n}}},& \text{otherwise} \end{cases} $$

and where \({\boldsymbol {\theta}} \in \mathbb{R}^{(4N+2)\times 1}\) is the learnable parameter vector, obtained by collecting and reshaping the set \(\left \{ {{W_{1}},{b_{1}},{W_{2}},{b_{2}}} \right \}\). Clearly, if Equation 9 reduces to zero, then Equation 5 and the initial conditions hold.

In order to compute \(\frac{{\mathrm{d}{f_{1}}}}{{\mathrm{d}\mu }}\), \(\frac{{\mathrm{d}{f_{2}}}}{{\mathrm{d}\mu }}\), and \(\frac{{\partial \mathcal{L}}}{{\partial \theta }} \) we use automatic differentiation (AD) (Baydin et al. 2018), rather than numerical differentiation or integration (Kincaid and Cheney 2002, Chapter 7) based on an assigned grid. AD is a set of techniques for evaluating derivatives numerically: it applies the symbolic rules of differentiation, but evaluates the derivatives at particular numeric values rather than constructing symbolic expressions for them. AD is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions (Baydin et al. 2018). In Matlab, the dlgradient command computes derivatives with respect to the inputs or the parameters.
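A sketch of the loss in Equation 9 evaluated with AD in PyTorch (torch.autograd.grad playing the role of dlgradient); the function signature, default weights, and branch selection are our assumptions consistent with the definitions above:

```python
import torch

def loss_fn(model, mu, n, a, lam=(1.0, 1.0, 1.0)):
    """Physics-informed loss of Equation 9 for a batch of collocation points mu of shape [M, 1]."""
    mu = mu.clone().requires_grad_(True)
    f = model(mu)
    f1, f2 = f[:, 0:1], f[:, 1:2]

    # AD: derivatives of the network outputs with respect to mu
    df1 = torch.autograd.grad(f1, mu, torch.ones_like(f1), create_graph=True)[0]
    df2 = torch.autograd.grad(f2, mu, torch.ones_like(f2), create_graph=True)[0]

    if (2.0 / n).is_integer():                       # E as defined above
        E = a**2 * (1.0 + 1.0 / n) * f1 ** (1.0 + 2.0 / n)
    else:
        E = a**2 * (1.0 + 1.0 / n) * f1 * (f1**2) ** (1.0 / n)

    L1 = torch.mean(((1.0 - mu**2) * df2 + n * (n + 1.0) * f1 + E) ** 2)  # ODE residual
    L2 = torch.mean((df1 - f2) ** 2)                                      # consistency f1' = f2
    f_ic = model(torch.tensor([[-1.0]]))                                  # initial conditions at mu = -1
    L3 = f_ic[0, 0] ** 2 + (f_ic[0, 1] - 10.0) ** 2

    return lam[0] * L1 + lam[1] * L2 + lam[2] * L3
```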

In practice, we minimize ℒ with mini-batch ADAM (Chen et al. 2022), which uses a batch of \(Nb \) randomly sampled points at every training iteration. The \(M \) data points are randomly divided into \(\frac{M}{{Nb}}\) batches of size \(Nb \); an epoch is completed when all \(\frac{M}{{Nb}}\) batches have been used for optimization once.

We select \(M = \) 100,000 points at random from \([-1, 1]\) to train the MLP. We set \(m = 1 \), \(\beta _{1} = 0.9\), and \(\beta _{2} = 0.999\) for all \(n \)s, and set \(\lambda _{1}\), \(\lambda _{2}\), \(\lambda _{3}\), \(N\), \(Nb \), and \(\eta \) as in Table 2 for the different \(n \). Then, \({\boldsymbol {\theta}}\) is updated using the mini-batch ADAM algorithm, which is shown as Algorithm 1.
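A sketch of the corresponding training loop, reusing LowLouMLP and loss_fn from the sketches above (PyTorch's built-in Adam optimizer is used here as a stand-in for Algorithm 1; the hyperparameter values are placeholders to be replaced by those in Tables 1 and 2):

```python
import torch

M, Nb, eta = 100_000, 1_000, 1e-3      # Nb and eta are placeholders (see Table 2)
n, a = 1.0, 0.5                         # placeholders; a should be a root a_{n,m} from Table 1

mu_all = 2.0 * torch.rand(M, 1) - 1.0   # M random collocation points in [-1, 1]
model = LowLouMLP(N=64)
opt = torch.optim.Adam(model.parameters(), lr=eta, betas=(0.9, 0.999))

for epoch in range(2000):                # number of epochs is illustrative
    perm = torch.randperm(M)
    for k in range(0, M, Nb):            # one pass over all M/Nb batches = one epoch
        batch = mu_all[perm[k:k + Nb]]
        opt.zero_grad()
        loss = loss_fn(model, batch, n, a)
        loss.backward()
        opt.step()
```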

Algorithm 1
figure 4

Mini-batch ADAM Algorithm.

Table 2 Hyperparameter settings of the MLP.

Finally,

$$ {{{\boldsymbol {\theta}}} ^{*}} = \mathop {\arg \min }\limits _{{ \boldsymbol {\theta}}} \mathcal{L}\left ( {\mu ;{\boldsymbol {\theta}}} \right ) . $$

For values of \({\mu _{i}} = - 1 + \left ( {i - 1} \right )\frac{2}{{9999}} \ \left ( 1 \leqslant i \leqslant 10{,}000 \right )\), we compare the predicted values (\({F_{\mathrm{MLP}}}\)) of the MLP with the numerical solutions (\({F_{\mathrm{RK}}}\)) of Equation 5 obtained using the classical RK4.

Define the mean square error (MSE):

$$ \begin{aligned} {\mathrm{MSE}}(F) & = \frac{1}{{10{,}000}}\sum \limits _{i = 1}^{10{,}000} {{{\left [ {{F_{\mathrm{MLP}}}\left ( {{\mu _{i}}} \right ) - {F_{\mathrm{RK}}} \left ( {{\mu _{i}}} \right )} \right ]}^{2}}}, \\ {\mathrm{MSE}}(\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}) & = \frac{1}{{10{,}000}}\sum \limits _{i = 1}^{10{,}000} {{{\left [ {{{ \frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}}_{\mathrm{MLP}}}\left ( {{\mu _{i}}} \right ) - {{\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}}_{\mathrm{RK}}}\left ( {{ \mu _{i}}} \right )} \right ]}^{2}}}, \end{aligned} $$

to measure how close \({F_{\mathrm{MLP}}}\left ( {{\mu _{i}}} \right )\) is to \({F_{\mathrm{RK}}}\left ( {{\mu _{i}}} \right )\) and \({\left ( {\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}} \right )_{\mathrm{MLP}}}\left ( {{\mu _{i}}} \right )\) is to \({\left ( {\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}} \right )_{\mathrm{RK}}} \left ( {{\mu _{i}}} \right )\), respectively. The MSEs are shown in Tables 3 and 4, which indicate that the MLP approach is effective compared with RK4.

Table 3 Test performance of the numerical solutions generated by the MLP when \(n > 1 \).
Table 4 Test performance of the numerical solutions generated by the MLP when \(n < 1 \).

Figures 4 and 5 show how close the solutions generated by the MLP are to those obtained with the RK method, illustrating that the MLP works reasonably well.

Figure 4
figure 5

\(F\), \(\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}\) generated by MLP and RK methods when \(n \ge 1 \).

Figure 5
figure 6

\(F\), \(\frac{{\mathrm{d}F}}{{\mathrm{d}\mu }}\) generated by MLP and RK methods when \(n < 1 \).

We have developed a numerical algorithm that can effectively solve a specific class of ODEs, particularly those derived under the assumption \(\frac{\partial }{{\partial \phi }} = 0 \). A notable aspect of our algorithm is its adaptability: a wide range of initial and boundary conditions can be accommodated through simple modifications of the loss function of the proposed artificial-intelligence (AI) neural-network model. Lerche and Low (2014) generalized the equilibria proposed by Low and Lou (1990) and modified the initial conditions; in this scenario, our numerical algorithm remains valid and applicable. To avoid unnecessary repetition, this case is presented in Appendix B.

4 Data-Driven Approach for Identifying the Parameters \(n \) and \(a \)

The objective is to identify the optimum parameters of Low and Lou’s equilibria to match the observations of an active region at the photosphere. This process concerns an inverse problem: given a measured magnetic field \({\boldsymbol {B}} \) at the photosphere, or its value \(P{\boldsymbol {B}} \) under a measurement operator \(P \), determine a corresponding parameter set \(\left \{ {a,n} \right \}\) such that the neural-network solution \(f\left ( {{r_{i},\mu _{i}};a,n} \right )\) can approximate the field \({\boldsymbol {B}} \).

To analyze a force-free field \({\boldsymbol {B}} \) with the additional condition \(\frac{\partial }{{\partial \phi }} = 0\) imposed in the spherical coordinate system, we consider the transformation matrix \(P \) defined as

$$ P = \left [ {\begin{array}{ccc} 0 & \frac{{{r^{n + 2}}{{\left ( {1 - {\mu ^{2}}} \right )}^{\frac{1}{2}}}}}{n} & 0 \end{array}} \right ] , $$

see Equation 19; then one obtains that

$$ P{\boldsymbol {B}} = \left [ {\begin{array}{ccc} 0 & \frac{{{r^{n + 2}}{{\left ( {1 - {\mu ^{2}}} \right )}^{\frac{1}{2}}}}}{n} & 0 \end{array}} \right ] {\left [ {\begin{array}{ccc} {{B_{r}}} & {{B_{\theta }}} & {{B_{\phi }}} \end{array}} \right ]^{T}} = F . $$

The loss function of the neural network can be designed as

$$ \mathcal{L} = \lambda _{1} \mathcal{L}_{1} + \lambda _{2} \mathcal{L}_{2}, $$

where

$$ \mathcal{L}_{1} = \frac{1}{M}\sum \limits _{i = 1}^{M} {{{\left ( { \left ( {1 - {\mu _{i}^{2}}} \right ) \frac{{{\mathrm{d}^{2}}G}}{{\mathrm{d}{\mu ^{2}}}} + {\theta _{2}} \left ( {{\theta _{2}} + 1} \right )G + {\theta _{3}}\left ( {1 + \frac{1}{{{\theta _{2}}}}} \right )G{{\left ( {{G^{2}}} \right )}^{ \frac{1}{{{\theta _{2}}}}}}} \right )}^{2}} } $$

and

$$ {\mathcal{L}_{2}} = \frac{1}{M}\sum \limits _{i = 1}^{M} {{{\left ( {1 - {\mu _{i}^{2}}} \right )}^{\frac{1}{2}}} \frac{{{r^{{\theta _{2}} + 2}}}}{{{\theta _{2}}}}{{\left ( {{{\left ( {{B_{ \theta }}} \right )}_{i}} - f\left ( {{\mu _{i}};{\theta _{1}}} \right )} \right )}^{2}}} $$

and

$$ G\left ( {{\mu _{i}}} \right ) = \frac{{{r^{n + 2}}}}{n}{\left ( {1 - { \mu _{i}^{2}}} \right )^{\frac{1}{2}}}f\left ( {{\mu _{i}};{\theta _{1}}} \right ). $$

In the above equations, \(\lambda _{1} \) and \(\lambda _{2} \) are two trade-off parameters, \(M \) represents the sample size, and \({\boldsymbol {\theta}}_{1} \) is the learnable parameter vector of the network for \(B_{\theta} \). Furthermore, \(\theta _{2} \) and \(\theta _{3} \) are two learnable scalar parameters associated with \(n \) and \(a \), respectively.
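One possible PyTorch realization of these learnable quantities is sketched below: the network weights play the role of \({\boldsymbol{\theta}}_{1}\) and predict \(B_{\theta}\), while \(\theta_{2}\) and \(\theta_{3}\) are exposed as free trainable scalars (the layer sizes and initial values are our assumptions, not the exact architecture used in the article):

```python
import torch
import torch.nn as nn

class InverseLowLou(nn.Module):
    """B_theta model (parameters theta1) plus learnable scalars theta2 ~ n and theta3 ~ a."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(                       # theta1: network predicting B_theta(r, mu)
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))
        self.theta2 = nn.Parameter(torch.tensor(3.0))   # learnable scalar associated with n
        self.theta3 = nn.Parameter(torch.tensor(1.0))   # learnable scalar associated with a

    def forward(self, r, mu):
        return self.net(torch.cat([r, mu], dim=1))      # predicted B_theta
```

The loss \(\lambda_{1}\mathcal{L}_{1} + \lambda_{2}\mathcal{L}_{2}\) is then built from the outputs of this module and minimized jointly over \({\boldsymbol{\theta}}_{1}\), \(\theta_{2}\), and \(\theta_{3}\).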

For a simple case, the active region is generated by Low and Lou's approach with \(n = 5 \) and \(a = 3.9341 \) (\(n{a_{1}} = \left \{ {n = 5,a = 3.9341} \right \}\)). We use an MLP with five layers to identify the parameters \(n \) and \(a \). The first layer has three inputs corresponding to \(r\), \(\mu \), and \(B_{\theta}\). Each hidden layer has 64 neurons. The last layer has three outputs: the estimated values of \(n\), \(a \), and \(B_{\theta}\). We set \(\lambda _{1} = \lambda _{2} = 1\) and \(M = 65{,}536 \). After training the MLP, the outputs \(n \) and \(a \) are 4.9042 and 3.9830 (\(n{a_{2}} = \left \{ {n = 4.9042,a = 3.9830} \right \}\)), respectively.

Figure 6 shows the magnetogram \(B_{x} \) generated by \({na}_{1} \) and \({na}_{2} \), respectively. The magnetic-field intensity in Figure 6 is visualized in the range of −5000 to 5000; any values exceeding 5000 are limited to 5000, and any values below −5000 are set to −5000. Figure 7 shows the contours of \(B_{x} \) generated by \({na}_{1} \) and \({na}_{2} \), respectively. They are highly compatible in visual representation. In Figure 7, the colorbars are displayed in arbitrary units. It is worth noting that magnetograms can be represented as matrices in Cartesian coordinates \((x,y) \). Figure 8 illustrates a visual representation of the quantity \(\frac{{{{\left ( {{B_{n2}}} \right )}_{x}} - {{\left ( {{B_{n1}}} \right )}_{x}}}}{{\max \left | {{{\left ( {{B_{n1}}} \right )}_{x}}} \right |}}\). The largest differences occur close to the magnetic nulls in Figure 8, since the error propagation in Equation 3 increases rapidly when \(r \) is small. The cosine similarity (Brockmeier et al. 2017) between the magnetograms generated by \({na}_{1} \) and \({na}_{2} \) is 0.9909. Cosine similarity measures the cosine of the angle between two matrices treated as vectors; it ranges from −1 to 1, with a value closer to 1 indicating a higher degree of similarity between the two magnetograms.
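A short sketch of the cosine-similarity computation between two magnetograms, treating them as flattened vectors (array names are illustrative):

```python
import numpy as np

def cosine_similarity(B1, B2):
    """Cosine of the angle between two magnetograms, flattened to vectors."""
    v1, v2 = np.ravel(B1), np.ravel(B2)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```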

Figure 6
figure 7

The magnetogram \(B_{x} \) that is generated by \({na}_{1} \) and \({na}_{2} \), respectively. Please note that the magnetograms are derived from the equilibria by Low and Lou and are dimensionless.

Figure 7
figure 8

The contour of \(B_{x} \) that is generated by \({na}_{1} \) and \({na}_{2} \), respectively. Please note that the contours are derived from the equilibria by Low and Lou and are dimensionless.

Figure 8
figure 9

The normalized differences generated by \({na}_{1} \) and \({na}_{2} \).

If Low and Lou’s equilibria can effectively approximate the solar photospheric observational data, we can utilize a neural network to determine the parameters and initial values of the Low and Lou’s equilibria. Therefore, a neural network as shown in Figure 9 is applied to NOAA active regions (ARs) 11158 and 11302. This process can be written as:

$$ \left [ {\begin{array}{c} {{B_{r}}} \\ {{B_{\theta }}} \\ {{B_{\phi }}} \end{array}} \right ] = {W^{4}}\sigma \left ( {{W^{3}}\sigma \left ( {{W^{2}}\sigma \left ( {{W^{1}}\left [ {\begin{array}{c} r \\ \mu \end{array}} \right ] + {b^{1}}} \right ) + {b^{2}}} \right ) + {b^{3}}} \right ) + {b^{4}}, $$

where \(W^{1}\), \(W^{2}\), \(W^{3}\), and \(W^{4}\) represent the weight matrices for each layer, while \(\sigma \) represents the activation function. The biases are represented by \(b^{1}\), \(b^{2}\), \(b^{3}\), and \(b^{4}\).
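A minimal PyTorch sketch of this mapping (the hidden widths are assumptions, since the article does not state them for this network):

```python
import torch
import torch.nn as nn

class FieldNet(nn.Module):
    """(r, mu) -> (B_r, B_theta, B_phi), with three hidden tanh layers as in the expression above."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),       # W1 [r, mu] + b1, then sigma
            nn.Linear(hidden, hidden), nn.Tanh(),  # W2, b2
            nn.Linear(hidden, hidden), nn.Tanh(),  # W3, b3
            nn.Linear(hidden, 3))                  # W4, b4 (linear output)

    def forward(self, r_mu):
        return self.net(r_mu)                      # columns: B_r, B_theta, B_phi
```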

Figure 9
figure 10

Neural-network architecture for the observational data.

The outputs of the neural network are shown in Figures 10c and d for NOAA ARs 11158 and 11302 from the Solar Dynamics Observatory/Helioseismic and Magnetic Imager (SDO/HMI), respectively. Comparing them with Figures 10a and b, it can be seen that we cannot obtain effective magnetic fields. With only the data-loss term \(\mathcal{L}_{2}\), we obtain reasonable outputs, as shown in Figures 10e and f for NOAA ARs 11158 and 11302, respectively.

Figure 10
figure 11

The neural-network outputs for NOAA ARs 11158 and 11302.

According to Wiegelmann and Sakurai (2021), the necessary conditions for determining whether the solar photospheric magnetic field is a nonlinear force-free field are:

$$ \int _{S} {{B_{z}}\left ( {x,y,0} \right )\mathrm{d}x \hspace{0.08cm} \mathrm{d}y} = 0, \quad \frac{{\left | {{F_{x}}} \right |}}{{{F_{p}}}} \ll 1, \frac{{\left | {{F_{y}}} \right |}}{{{F_{p}}}} \ll 1, \frac{{\left | {{F_{z}}} \right |}}{{{F_{p}}}} \ll 1, $$

where

$$ \begin{aligned} {F_{x}} = - \frac{1}{{4\pi }}\int _{S} {{B_{x}}{B_{z}} \mathrm{d}x \hspace{0.08cm} \mathrm{d}y},\ \ & \ \ {F_{y}} = - \frac{1}{{4\pi }}\int _{S} {{B_{y}}{B_{z}} \mathrm{d}x \hspace{0.08cm} \mathrm{d}y} , \\ {F_{z}} = \frac{1}{{8\pi }}\int _{S} {\left ( {B_{x}^{2} + B_{y}^{2} - B_{z}^{2}} \right )\mathrm{d}x \hspace{0.08cm} \mathrm{d}y},\ \ & \ \ {F_{p}} = \frac{1}{{8\pi }}\int _{S} {\left ( {B_{x}^{2} + B_{y}^{2} + B_{z}^{2}} \right )\mathrm{d}x \hspace{0.08cm} \mathrm{d}y}. \end{aligned} $$
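A sketch of how these surface integrals can be evaluated on a discretized magnetogram (Bx, By, Bz are 2D arrays on a uniform grid with pixel area dA; the function and its defaults are our illustration of the formulas, not the authors' code):

```python
import numpy as np

def force_balance_diagnostics(Bx, By, Bz, dA=1.0):
    """Net-force and flux-balance diagnostics computed from a photospheric magnetogram."""
    Fx = -np.sum(Bx * Bz) * dA / (4.0 * np.pi)
    Fy = -np.sum(By * Bz) * dA / (4.0 * np.pi)
    Fz = np.sum(Bx**2 + By**2 - Bz**2) * dA / (8.0 * np.pi)
    Fp = np.sum(Bx**2 + By**2 + Bz**2) * dA / (8.0 * np.pi)
    net_flux = np.sum(Bz) * dA                     # should be close to zero for flux balance
    return {"|Fx|/Fp": abs(Fx) / Fp, "|Fy|/Fp": abs(Fy) / Fp,
            "|Fz|/Fp": abs(Fz) / Fp, "net_flux": net_flux}
```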

For practical computations, the acceptable conditions for flux imbalance, as stated by Moon et al. (2002), are defined as:

$$ \frac{{\left | {{F^{+} } - {F^{-} }} \right |}}{{{F^{+} } + {F^{-} }}} \leqslant 0.1 , $$

where \(F^{+} \) and \(F^{-} \) represent the upward and downward magnetic fluxes, respectively. Additionally, the vertical-force condition, as mentioned by Liu et al. (2013), is given by

$$ \frac{{\left | {{F_{z}}} \right |}}{{{F_{p}}}} < 0.1, $$

where \(F_{z} \) denotes the vertical force and \(F_{p} \) represents the total magnetic pressure. If these conditions are satisfied, the magnetic field can be approximated as a force-free field, \(\left ( {\nabla \times {\boldsymbol {B}} } \right ) \times {\boldsymbol {B}} = 0\). The active-region data used in our study meet these requirements. One possible reason why the method fails on observational data is that Low and Lou's equilibria may not accurately approximate certain observations, such as those of NOAA AR 11158. To effectively utilize Low and Lou's equilibria, it is crucial to regularize the observational data so that it approximately satisfies the condition \(\frac{\partial }{{\partial \phi }} = 0 \). This regularization term is also essential to ensure that the modified data closely resemble the original data. As is evident from Figures 10c and d, including such a regularization term is indispensable. However, its specific form has not yet been determined, and in our future work we will continue to explore and optimize it.

5 Conclusion

In this article, Low and Lou's equilibria prove to be a useful benchmark for testing numerical algorithms for solar magnetic-field extrapolation. We have presented a machine-learning-based numerical method that effectively determines the parameter \(a\) and the function \(F\) in Low and Lou's equilibria, implemented with an MLP neural network. A crucial area for further investigation is the adaptive selection of the MLP's width and of the parameters \(\lambda _{1} \), \(\lambda _{2} \), and \(\lambda _{3} \) in Equation 9.

Furthermore, we have proposed a machine-learning algorithm to address the inverse problem of Low and Lou’s equilibria. While it performs well on generated data, it falls short when applied to observational data. An intriguing avenue for future research involves optimizing the parameters of the generalized equilibria proposed by Low and Lou to better align with observations of active regions on the photosphere.

Moreover, a promising direction for the future is to combine solar photospheric observation data with artificial-intelligence techniques for coronal magnetic-field extrapolation. This integration holds potential for further advancements in understanding and predicting solar phenomena.