1 Introduction and Motivation

General problems in mathematical optimization are usually formulated in the following form:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{ap}): \;\;\; \min f(\mathbf{x}) , \;\; \mbox{ s.t. } \; \mathbf{h}(\mathbf{x}) = 0, \;\; \mathbf{g} (\mathbf{x}) \le 0 , {}\end{aligned} $$
(1)

where the unknown \(\mathbf {x}\in {\mathbb R}^n\) is a vector, \(f(\mathbf {x}):{\mathbb R}^n \rightarrow {\mathbb R}\) is the so-called “objective” function,Footnote 1 and \(\mathbf {h}(\mathbf {x}) = \{ h_i(\mathbf {x}) \} : {\mathbb R}^n \rightarrow {\mathbb R}^m\) and \( \mathbf {g} (\mathbf {x}) = \{ g_j(\mathbf {x}) \} : {\mathbb R}^n \rightarrow {\mathbb R}^p\) are two vector-valued constraint functions. It must be emphasized that, unlike the basic concept of objectivity in continuum physics and nonlinear analysis, the objective function used extensively in the optimization literature is allowed to be any arbitrarily given function, even a linear function. Therefore, \(({\mathcal {P}}_{ap})\) is an abstractly (or arbitrarily) proposed problem (APP). Although it enables one to “model” a very wide range of problems, it comes at a price: many global optimization problems are considered to be NP-hard. Without detailed information on these arbitrarily given functions, it is impossible to have a powerful theory for solving the artificial nonconvex problem (1).

Canonical duality-triality is a newly developed and continuously improved methodological theory. It comprises mainly: 1) a canonical transformation, which is a versatile methodology that can be used to model complex systems within a unified framework; 2) a complementary-dual principle, which can be used to formulate a perfect dual problem with a unified analytic solution; and 3) a triality theory, which can be used to identify both global and local extrema and to develop effective canonical dual algorithms for solving real-world problems in both continuous and discrete systems. This theory was developed from Gao and Strang’s original work on nonconvex variational/boundary-value problems in large deformation mechanics [43]. It was shown in Gao’s book [18] and in recent articles [40, 53] that the (external) penalty and Lagrange multiplier methods are special applications of the canonical duality theory in convex optimization. It is now understood that this theory reveals an intrinsic multi-scale duality pattern in complex systems, and that many popular theories and methods in nonconvex analysis, global optimization, and computational science can be unified within the framework of the canonical duality-triality theory. Indeed, it is easy to show that the KKT theory in mathematical programming, the semi-definite programming (SDP) method in global optimization, and the half-quadratic regularization in information technology are naturally covered by the canonical duality theory [39, 56, 86].

Mathematics and mechanics have been complementary partners since Newton’s time. Many fundamental ideas, concepts, and mathematical methods extensively used in the calculus of variations and optimization were originally developed from mechanics. It is known that the classical Lagrangian duality theory and the associated Lagrange multiplier method were developed by Lagrange in analytical mechanics [51]. The modern concepts of super-potential and sub-differential in convex analysis were proposed by J.J. Moreau from frictional mechanics [63]. The canonical duality theory was also developed from the fundamental concepts of objectivity and the work-conjugate principle in continuum physics. The Gao-Strang gap function discovered in finite deformation theory provides a global optimality condition for general nonconvex/nonsmooth variational analysis and global optimization. Application of this theory to nonlinear elasticity leads to a pure complementary energy principle, which was a 50-year-old open problem [58]. Generalization to global optimization was made in 2000 [20]. Since then, this theory has been used successfully for solving a large class of challenging problems in multi-disciplinary fields of applied mathematics, computational science, engineering mechanics, operations research, and industrial and systems engineering [11,12,13,14,15,16,17, 22,23,24,25, 36,37,38, 40, 42, 44, 45, 70, 73].

However, as V.I. Arnold indicated [2]: “In the middle of the twentieth century it was attempted to divide physics and mathematics. The consequences turned out to be catastrophic.” Indeed, due to the ever-increasing gap between physics and other fields, some well-defined concepts in continuum physics, such as objectivity, Lagrangian, tensor, and full nonlinearity, have been seriously misused in optimization, which leads not only to ridiculous arguments but also to wrong mathematical models and many artificially proposed problems. Also, the canonical dual transformation theory and methodology have been rediscovered in different forms by researchers from different fields. The main goal of this paper is to bridge this gap by presenting the canonical duality theory systematically, from unified modeling and basic assumptions to the theory, method, and general applications. The methodology, examples, and conjectures presented in this paper are important not only for better understanding this unconventional theory but also for solving many challenging problems in complex systems. This paper will bring some fundamentally new insights into multi-scale complex systems, global optimization, and computational science.

2 Multi-Scale Modeling and Properly Posed Problems

In linguistics, a complete and grammatically correct sentence should be composed of at least three parts: a subject, an object, and a predicate.Footnote 2 As a language of science, mathematics should follow this rule. Based on the canonical duality principle [18], a unified mathematical problem for multi-scale complex systems was proposed in [26, 28]:

$$\displaystyle \begin{aligned} ({\mathcal{P}}): \;\;\; \min \{ \Pi ({{\boldsymbol{\chi}}}) = {G}(\mathbf{D}{{\boldsymbol{\chi}}}) - F({{\boldsymbol{\chi}}}) \; | \; {{\boldsymbol{\chi}}} \in \mathcal{X}_c \}, {} \end{aligned} $$
(2)

where \(F:\mathcal {X}_a \subset \mathcal {X}\rightarrow {\mathbb R}\) is a subjective function such that the external duality relation \({{\boldsymbol {\chi }}}^* = \nabla F({{\boldsymbol {\chi }}}) = \bar {\boldsymbol {\chi }}^*\) is a given input (or source), and its domain \(\mathcal {X}_a\) contains only geometrical constraints (such as boundary or initial conditions), which depend on each given problem; \(\mathbf {D} :\mathcal {X}_a \rightarrow \mathcal {G}_a\) is a linear operator which links the configuration variable \( {{\boldsymbol {\chi }}} \in \mathcal {X}_a\) with an internal variable \(\mathbf {g} = \mathbf {D} {{\boldsymbol {\chi }}} \in \mathcal {G}_a\) at different physical scales; \({G}:\mathcal {G}_a\subset \mathcal {G} \rightarrow {\mathbb R}\) is an objective function such that the internal duality relation \({\mathbf {g}}^* = \nabla {G}(\mathbf {g})\) is governed by the constitutive law, and its domain \(\mathcal {G}_a\) contains only physical constraints, which depend on the mathematical modeling. The feasible set is defined by:

$$\displaystyle \begin{aligned} \mathcal{X}_c = \{ {{\boldsymbol{\chi}}} \in \mathcal{X}_a| \;\; \mathbf{D}{{\boldsymbol{\chi}}} \in \mathcal{G}_a\} . \end{aligned} $$
(3)

The predicate in \(({\mathcal {P}})\) is the operator “−” and the difference Π(χ) is called the target function in general problems. The object and subject are in balance only at the optimal states.

2.1 Objectivity, Isotropy, and Symmetry in Modeling

Objectivity is a central concept in our daily life, related to reality and truth. According to Wikipedia, objectivity in philosophy means the state or quality of being true even outside a subject’s individual biases, interpretations, feelings, and imaginings.Footnote 3 In science, objectivity is often attributed to the property of scientific measurement, as the accuracy of a measurement can be tested independently of the individual scientist who first reports it.Footnote 4 In continuum mechanics, objectivity is also called the principle of frame-indifference [65, 80], which is a basic concept in mathematical modeling [8, 18, 60] but is still subject to serious study in continuum physics [59, 64]. Let \( \mathcal {R} \) be the special orthogonal group SO(n), i.e., \(\mathbf {R} \in \mathcal {R} \) if and only if \({\mathbf {R}}^T = {\mathbf {R}}^{-1}\) and \( \det \mathbf {R} = 1\). The following mathematical definition was given in Gao’s book (Definition 6.1.2 [18]).

Definition 1 (Objectivity and Isotropy)

A set \(\mathcal {G}_a \) is said to be objective if \(\mathbf {R} \mathbf {g} \in \mathcal {G}_a\) \(\forall \mathbf {g} \in \mathcal {G}_a\), \(\forall \mathbf {R} \in \mathcal {R}\). A real-valued function \({G}:\mathcal {G}_a \rightarrow {\mathbb R}\) is said to be objective if

$$\displaystyle \begin{aligned} {G}(\mathbf{R} \mathbf{g} ) = {G}(\mathbf{g}) \;\; \forall \mathbf{g} \in \mathcal{G}_a, \; \forall \mathbf{R} \in \mathcal{R}. \end{aligned} $$
(4)

A set \(\mathcal {G}_a \) is said to be isotropic if \( \mathbf {g} \mathbf {R} \in \mathcal {G}_a \;\; \forall \mathbf {g} \in \mathcal {G}_a, \; \forall \mathbf {R} \in \mathcal {R}.\) A real-valued function \({G}:\mathcal {G}_a \rightarrow {\mathbb R}\) is said to be isotropic if

$$\displaystyle \begin{aligned} {G}(\mathbf{g} \mathbf{R} ) = {G}(\mathbf{g}) \;\; \forall \mathbf{g} \in \mathcal{G}_a, \; \forall \mathbf{R} \in \mathcal{R}. \end{aligned} $$
(5)

Lemma 1

A real-valued function \({G}(\mathbf {g})\) is objective if and only if there exists a real-valued function \(\Phi (\mathbf {C})\) such that \({G}(\mathbf {g}) = \Phi ({\mathbf {g}}^T \mathbf {g})\).

Geometrically speaking, an objective function is rotationally symmetric, i.e., it is SO(n)-invariant in n-dimensional Euclidean space. Physically, an objective function does not depend on observers. Because of Noether’s theorem,Footnote 5 rotational symmetry of a physical system is equivalent to the angular momentum conservation law (see Section 6.1.2 [18]). Therefore, objectivity is essential for any real-world mathematical model. In Euclidean space \(\mathcal {G}_a \subset {\mathbb R}^n\), the simplest objective function is the 2-norm \(\|\mathbf {g}\|\) in \({\mathbb R}^n\), since \(\|\mathbf {R} \mathbf {g} \|{ }^2 = {\mathbf {g}}^T {\mathbf {R}}^T \mathbf {R} \mathbf {g} = \|\mathbf {g}\|{ }^2 \;\; \forall \mathbf {R} \in \mathcal {R}\). In continuum physics, objectivity implies that the equilibrium condition of angular momentum (symmetry of the Cauchy stress tensor \({\boldsymbol {\sigma }} = \partial {G}(\mathbf {g})\), Section 6.1 [18]) holds. It was emphasized by P.G. Ciarlet that objectivity is not an assumption, but an axiom [8]. In Gao and Strang’s work, the internal energy \(W(\mathbf {g})\) must be an objective function such that its variation (Gâteaux derivative) \({\boldsymbol {\sigma }} = \partial W(\mathbf {g})\) is the so-called constitutive duality law, which depends only on the intrinsic property of the system.
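The invariance in Definition 1 and Lemma 1 can be checked numerically. The following is a minimal sketch in the plane, assuming rotations in SO(2) and a hypothetical double-well profile Φ(C) = ½(½C − λ)² chosen only for illustration (it is not taken from the text). It compares a function of the form Φ(gᵀg) with a linear function, which is not objective.

```python
import math

def rotate(theta, g):
    """Apply the rotation R(theta) in SO(2) to the vector g."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * g[0] - s * g[1], s * g[0] + c * g[1])

def phi(C, lam=1.0):
    """Illustrative (assumed) double-well profile Phi(C) = 1/2 (C/2 - lam)^2."""
    return 0.5 * (0.5 * C - lam) ** 2

def G_objective(g):
    """G(g) = Phi(g^T g): depends on g only through g^T g."""
    return phi(g[0] ** 2 + g[1] ** 2)

def G_linear(g, a=(1.0, 2.0)):
    """A linear function a^T g, which is not rotation invariant."""
    return a[0] * g[0] + a[1] * g[1]

g = (0.7, -1.3)
thetas = [0.1 * k for k in range(1, 63)]  # sample angles in (0, 2*pi)

# Largest change of each function over the sampled rotations.
obj_spread = max(abs(G_objective(rotate(t, g)) - G_objective(g)) for t in thetas)
lin_spread = max(abs(G_linear(rotate(t, g)) - G_linear(g)) for t in thetas)
print(obj_spread, lin_spread)
```

The first spread stays at rounding level, while the second is of order one, which is the sense in which a linear “objective” function is not objective.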

2.2 Subjectivity, Symmetry Breaking, and Well-Posed Problem

Dual to the objective function that depends on modeling, the subjective function \({F}({{\boldsymbol {\chi }}})\) depends on each problem such that its variation is governed by the action-reaction duality law: \(\bar {\boldsymbol {\chi }}^* = \partial {F}({{\boldsymbol {\chi }}}) \in \mathcal {X}^*\). From the point of view of systems theory, the action \(\bar {\boldsymbol {\chi }}^* \in \mathcal {X}^*\) can be considered as the input or source of the system, and the reaction \({{\boldsymbol {\chi }}} \in \mathcal {X}\) should be the output (or the configuration, the state) of the system. A system is conservative if the action is independent of the reaction. Therefore, the subjective function must be linear on its domain \(\mathcal {X}_a\) and, by the Riesz representation theorem, we should have \(F({{\boldsymbol {\chi }}}) = \langle {{\boldsymbol {\chi }}}, \bar {\boldsymbol {\chi }}^* \rangle \), where the bilinear form \(\langle {{\boldsymbol {\chi }}}, {{\boldsymbol {\chi }}}^* \rangle :\mathcal {X} \times \mathcal {X}^* \rightarrow {\mathbb R}\) puts \(\mathcal {X}\) and \(\mathcal {X}^*\) in duality. The target function \(\Pi ({{\boldsymbol {\chi }}}) = {G}(\mathbf {D} {{\boldsymbol {\chi }}}) - {F}({{\boldsymbol {\chi }}})\) can have different physical meanings in real-world applications. For example, in continuum mechanics the subjective function \({F}({{\boldsymbol {\chi }}})\) is the external energy and the objective function \({G}(\mathbf {g})\) is the stored energy, so that \(\Pi ({{\boldsymbol {\chi }}})\) is the total potential energy. In this case, the minimum total potential energy principle leads to the general variational problem (2). The criticality condition \(\partial \Pi ({{\boldsymbol {\chi }}}) = 0\) leads to the equilibrium (Euler-Lagrange) equation:

$$\displaystyle \begin{aligned} A ({{\boldsymbol{\chi}}}) = {\mathbf{D}}^* \partial {G}(\mathbf{D} {{\boldsymbol{\chi}}}) =\bar{\boldsymbol{\chi}}^* {} \end{aligned} $$
(6)

where \({\mathbf {D}}^*: \mathcal {G}^*_a \rightarrow \mathcal {X}^*\) is the adjoint operator of \(\mathbf {D}\) and \(A:\mathcal {X}_c \rightarrow \mathcal {X}^*\) is called the equilibrium operator. The triality structure \(\mathbb {S}^e = \{ \langle \mathcal {X}, \mathcal {X}^* \rangle ; A \} \) forms an elementary system in Gao’s book (Section 4.3, [18]). This abstract form covers most of the well-known equilibrium problems in real-world applications, ranging from mathematical physics in continuous analysis to mathematical programming in discrete systems [18, 34]. In particular, if \({G}(\mathbf {g})\) is quadratic such that \(\partial ^2 {G}(\mathbf {g}) = \mathbf {H}\), then the operator \(A:\mathcal {X}_c \rightarrow \mathcal {X}^*\) is linear and can be written in the triality form \(A = {\mathbf {D}}^* \mathbf {H} \mathbf {D}\), which appears extensively in mathematical physics, optimization, and linear systems (see the celebrated text by Strang [77]). Clearly, any convex quadratic function \({G}(\mathbf {D} {{\boldsymbol {\chi }}})\) is objective due to the Cholesky decomposition \(A = {\boldsymbol {\Lambda }}^* {\boldsymbol {\Lambda }} \succeq 0\).
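As a worked instance of the quadratic case, assuming \(\mathbf {H}\) is symmetric and the subjective function is the linear form \(\langle {{\boldsymbol {\chi }}}, \bar {\boldsymbol {\chi }}^* \rangle \) introduced above, the equilibrium equation (6) reduces to a linear system:

$$\displaystyle \begin{aligned} {G}(\mathbf{g}) = \frac{1}{2} \langle \mathbf{g} ; \mathbf{H} \mathbf{g} \rangle \;\; &\Rightarrow \;\; \partial {G}(\mathbf{g}) = \mathbf{H} \mathbf{g}, \\ \Pi({{\boldsymbol{\chi}}}) = \frac{1}{2} \langle \mathbf{D} {{\boldsymbol{\chi}}} ; \mathbf{H} \mathbf{D} {{\boldsymbol{\chi}}} \rangle - \langle {{\boldsymbol{\chi}}}, \bar{\boldsymbol{\chi}}^* \rangle \;\; &\Rightarrow \;\; \partial \Pi({{\boldsymbol{\chi}}}) = {\mathbf{D}}^* \mathbf{H} \mathbf{D} {{\boldsymbol{\chi}}} - \bar{\boldsymbol{\chi}}^* = 0 . \end{aligned} $$

In this sketch the three factors of \(A = {\mathbf {D}}^* \mathbf {H} \mathbf {D}\) correspond, in order, to the geometrical operator \(\mathbf {D}\), the constitutive relation \(\mathbf {H}\), and the balance operator \({\mathbf {D}}^*\).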

According to the action-reaction duality in physics, if there is no action or demand (i.e., \(\bar {\boldsymbol {\chi }}^* = 0\)), the system has no reaction (i.e., χ = 0). Dually, a real-world problem should have at least one nontrivial solution for any given nontrivial input.

Definition 2 (Properly and Well-Posed Problems)

A problem is called properly posed if for any given nontrivial input it has at least one nontrivial solution. It is called well-posed if the solution is unique.

Clearly, this definition is more general than Hadamard’s well-posed problems in dynamical systems since the continuity condition is not required. Physically speaking, any real-world problem should be well-posed since all natural phenomena exist uniquely. But practically, it is difficult to model a real-world problem precisely. Therefore, properly posed problems are allowed for the canonical duality theory. This definition is important for understanding the triality theory and NP-hard problems.

2.3 Management Optimization

In management science, the decision variable \({\boldsymbol {\chi }}\) is simply a vector \(\mathbf {x} \in {\mathbb R}^n\), which could represent the products of a manufacturing company. The input \(\bar {\boldsymbol {\chi }}^*\) can be considered as the market price (or demand), denoted by \(\mathbf {f} \in {\mathbb R}^n\). Therefore, the subjective function \(\langle \mathbf {x}, \mathbf {f} \rangle = {\mathbf {x}}^T \mathbf {f}\) in this example is the total income of the company. The products are produced by workers \(\mathbf {g} \in {\mathbb R}^m\). Due to the cooperation, we have \(\mathbf {g} = \mathbf {D} \mathbf {x}\), where \(\mathbf {D} \in {\mathbb R}^{m\times n}\) is a matrix. Workers are paid the salary \({\mathbf {g}}^* = \partial {G}(\mathbf {g})\), and therefore, the objective function \({G}(\mathbf {g})\) is the cost (in this example G is not necessarily objective since the company is a man-made system). Then, the target \(\Pi (\mathbf {x}) = {G}(\mathbf {D} \mathbf {x}) - {\mathbf {x}}^T \mathbf {f}\) is the total loss and the minimization problem \(\min \Pi (\mathbf { x})\) leads to the equilibrium equation:

$$\displaystyle \begin{aligned} {\mathbf{D}}^T \partial_{\mathbf{g}} {G}(\mathbf{D} \mathbf{x}) = \mathbf{f} . \end{aligned}$$

The cost function \({G}(\mathbf {g})\) could be convex for a small company, but is usually nonconvex for big companies in order to allow some people to have the same salary.
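The equilibrium equation above is simply the statement that the gradient of the total loss vanishes. The sketch below checks this numerically for a small hypothetical instance (three workers, two products). The double-well cost, the matrix `D`, the price vector `f`, and the test point `x` are all illustrative assumptions, not data from the text.

```python
def matvec(D, x):
    """Compute g = D x for a matrix stored as a list of rows."""
    return [sum(dij * xj for dij, xj in zip(row, x)) for row in D]

def rmatvec(D, s):
    """Compute D^T s."""
    return [sum(D[i][j] * s[i] for i in range(len(D))) for j in range(len(D[0]))]

def G(g, lam=1.0):
    """Assumed nonconvex cost: sum_i 1/2 (g_i^2 / 2 - lam)^2."""
    return sum(0.5 * (0.5 * gi * gi - lam) ** 2 for gi in g)

def dG(g, lam=1.0):
    """Salary g* = dG/dg for the assumed cost above."""
    return [gi * (0.5 * gi * gi - lam) for gi in g]

def total_loss(D, f, x):
    """Pi(x) = G(D x) - x^T f."""
    return G(matvec(D, x)) - sum(xi * fi for xi, fi in zip(x, f))

def equilibrium_residual(D, f, x):
    """D^T dG(D x) - f, which vanishes at a critical point of Pi."""
    return [a - b for a, b in zip(rmatvec(D, dG(matvec(D, x))), f)]

D = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # hypothetical cooperation matrix
f = [0.5, -0.2]                            # hypothetical market prices
x = [0.3, 0.8]                             # arbitrary test point

# Central finite differences of Pi should match the analytic residual.
h = 1e-6
fd_grad = []
for j in range(len(x)):
    xp = list(x); xp[j] += h
    xm = list(x); xm[j] -= h
    fd_grad.append((total_loss(D, f, xp) - total_loss(D, f, xm)) / (2 * h))

grad_error = max(abs(a - b) for a, b in zip(fd_grad, equilibrium_residual(D, f, x)))
print(grad_error)
```

Any root of `equilibrium_residual` is a critical point of the total loss; since the assumed cost is nonconvex, such a point need not be a global minimizer.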

If the company has to make a profit \(\frac {1}{2} {\alpha } \|\mathbf {x}\|{ }^2\), where α > 0 is a parameter, then the target function is \(\Pi (\mathbf {x}) = {G}(\mathbf {D} \mathbf {x}) + \frac {1}{2} {\alpha } \|\mathbf {x} \|{ }^2 - {\mathbf {x}}^T \mathbf {f} \) and the minimization problem \(\min \Pi (\mathbf {x})\) leads to:

$$\displaystyle \begin{aligned} {\alpha} \mathbf{x} = \mathbf{f} - {\mathbf{D}}^T \partial_{\mathbf{g}} {G}(\mathbf{D} \mathbf{x}) .{} \end{aligned} $$
(7)

This is a fixed point problem. In this case, if we let \(\bar {\mathbf {g}} =\bar {\mathbf {D}} \mathbf {x} = (\mathbf {D} \mathbf {x}, \mathbf {x})\) and \(\bar {G} = {G}(\mathbf {g}) + \frac {1}{2} {\alpha } \|\mathbf {x}\|{ }^2\), then the fixed point problem (7) can be written in the unified form of:

$$\displaystyle \begin{aligned} \bar{\mathbf{D}}^T \partial_{\bar{\mathbf{g}}} \bar{G}(\bar{\mathbf{D}} \mathbf{x}) = \mathbf{f}. \end{aligned}$$

This shows that the fixed point problem is a special case of the general equilibrium equation (6), which is a necessary condition for the general minimization problem \(({\mathcal {P}})\).
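To make this step explicit, assume that \(\bar {\mathbf {D}}\) stacks \(\mathbf {D}\) on top of the identity \(\mathbf {I}\), so that \(\bar {\mathbf {g}} = (\mathbf {D} \mathbf {x}, \mathbf {x})\) as above. The unified form then expands as:

$$\displaystyle \begin{aligned} \bar{\mathbf{D}}^T \partial_{\bar{\mathbf{g}}} \bar{G}(\bar{\mathbf{D}} \mathbf{x}) = \left( {\mathbf{D}}^T , \; \mathbf{I} \right) \left( \begin{array}{c} \partial_{\mathbf{g}} {G}(\mathbf{D} \mathbf{x}) \\ {\alpha} \mathbf{x} \end{array} \right) = {\mathbf{D}}^T \partial_{\mathbf{g}} {G}(\mathbf{D} \mathbf{x}) + {\alpha} \mathbf{x} = \mathbf{f} , \end{aligned} $$

which is exactly (7) after moving \({\mathbf {D}}^T \partial _{\mathbf {g}} {G}(\mathbf {D} \mathbf {x})\) to the right-hand side.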

2.4 Nonconvex Analysis and Boundary-Value Problems

For static systems, the unknown of a mixed boundary-value problem is a vector-valued function:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {{\boldsymbol{\chi}}}(\mathbf{x}) \in \mathcal{X}_a &\displaystyle =&\displaystyle \{ {{\boldsymbol{\chi}}}\in \mathcal{C}[\Omega, {\mathbb R}^m]| \;\; {{\boldsymbol{\chi}}}(\mathbf{x}) = 0 \;\; \forall \mathbf{x} \in \Gamma_\chi \}, \\ &\displaystyle &\displaystyle \Omega\subset {\mathbb R}^d, \;\; d \le 3, \;\; m \ge 1, \;\; \partial \Omega = \Gamma_\chi\cup{\Gamma_t}, \end{array} \end{aligned} $$

and the input is \(\bar {\boldsymbol {\chi }}^* = \{ \mathbf {f}(\mathbf {x}) \;\;\forall \mathbf {x} \in \Omega , \;\; \mathbf {t}(\mathbf {x}) \; \forall \mathbf {x} \in {\Gamma _t} \}\) [43]. In this case, the external energy is \({F}({{\boldsymbol {\chi }}}) = \langle {{\boldsymbol {\chi }}}, \bar {\boldsymbol {\chi }}^* \rangle = \int _\Omega {{\boldsymbol {\chi }}}\cdot \mathbf {f} \,\mbox{d}\Omega + \int _{\Gamma _t} {{\boldsymbol {\chi }}}\cdot \mathbf {t} \,\mbox{d} \Gamma \). In nonlinear analysis, D is a gradient-like partial differential operator and \(\mathbf {g} = \mathbf {D} {{\boldsymbol {\chi }}} \in \mathcal {G}_a \subset \mathcal {L}^p[\Omega ; {\mathbb R}^{m\times d}]\) is a two-point tensor field [18] over Ω. The internal energy G(g) is defined by:

$$\displaystyle \begin{aligned} {G}(\mathbf{g}) = \int_\Omega {U}(\mathbf{x}, \mathbf{g}) \,\mbox{d}\Omega , \end{aligned} $$
(8)

where \({U}(\mathbf {x},\mathbf {g}):\Omega \times \mathcal {G}_a \rightarrow {\mathbb R}\) is the stored energy density. The system is (space) homogeneous if \(U = U(\mathbf {g})\). Thus, \({G}(\mathbf {g})\) is objective if and only if \({U}(\mathbf {x}, \mathbf {g})\) is objective on an objective set \( \mathcal {G}_a\). Since \(\mathbf {g} = \mathbf {D} {{\boldsymbol {\chi }}}\) is a two-point tensor, which is not considered a strain measure, whereas the (right) Cauchy-Green tensor \(\mathbf {C} = {\mathbf {g}}^T \mathbf {g}\) is an objective strain tensor, there must exist a function \(\Phi (\mathbf {C})\) such that \({G}(\mathbf {g}) = \Phi (\mathbf {C})\). In nonlinear elasticity, the function \(\Phi (\mathbf {C})\) is usually convex and the duality relation \({\mathbf {C}}^* = \partial \Phi (\mathbf {C})\) is invertible (i.e., Hill’s work-conjugate principle [18]). These basic truths in continuum physics laid a foundation for the canonical duality theory.

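By the finite element method, the domain Ω is divided into m elements \(\{ \Omega _e \}\) such that the unknown function is discretized piecewise by \({{\boldsymbol {\chi }}}(\mathbf {x}) \simeq {\mathbf {N}}_e(\mathbf {x}) {{\boldsymbol {\chi }}}_e \;\; \forall \mathbf {x} \in \Omega _e\). Thus, the nonconvex variational problem (2) can be numerically reformulated as a global optimization problem: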

$$\displaystyle \begin{aligned} \min \{ \Pi({\boldsymbol{\chi}}) = {G}(\mathbf{D} {\boldsymbol{\chi}}) - \langle {\boldsymbol{\chi}} , \mathbf{f} \rangle \;\; | \;\; {\boldsymbol{\chi}} \in \mathcal{X}_c \} , \end{aligned} $$
(9)

where \({{\boldsymbol {\chi }}} = \{ {{\boldsymbol {\chi }}}_e \}\) is the discretized unknown \({{\boldsymbol {\chi }}}(\mathbf {x})\), \(\mathbf {D}\) is a generalized matrix depending on the interpolation \({\mathbf {N}}_e(\mathbf {x})\), and \(\mathcal {X}_c\) is a convex constraint set including the boundary conditions. The canonical dual finite element method was first proposed in 1996 [11]. Applications have been given recently in engineering and sciences [30, 45, 73].

2.5 Lagrangian Mechanics and Initial-Value Problems

In Lagrange mechanics [51, 52], the unknown \({{\boldsymbol {\chi }}}(t) \in \mathcal {X}_a \subset \mathcal {C}^1[I;{\mathbb R}^n]\) is a vector field over a time domain \(I \subset {\mathbb R}\). Its components \(\{ \chi _i(t) \}\) (i = 1, …, n) are known as the Lagrangian coordinates. Its dual variable \(\bar {\boldsymbol {\chi }}^*\) is the action vector function in \({\mathbb R}^n\), denoted by \(\mathbf {f}(t)\). The external energy is \({F}({{\boldsymbol {\chi }}}) = \langle {{\boldsymbol {\chi }}}, \bar {\boldsymbol {\chi }}^* \rangle = \int _I {{\boldsymbol {\chi }}}(t) \cdot \mathbf {f}(t) {\,\mbox{d}t}\), while the internal energy \({G}(\mathbf {D} {{\boldsymbol {\chi }}})\) is the so-called action:

$$\displaystyle \begin{aligned} {G}(\mathbf{D} {{\boldsymbol{\chi}}}) = \int_I L(t, {{\boldsymbol{\chi}}}, \dot{ {{\boldsymbol{\chi}}}} ) {\,\mbox{d}t} , \;\;\; L= {T}( \dot{ {{\boldsymbol{\chi}}}} ) - {U}(t, {{\boldsymbol{\chi}}}), \end{aligned} $$
(10)

where \(\mathbf {D} {{\boldsymbol {\chi }}} = \{ 1 , \partial _t \} {{\boldsymbol {\chi }}} = \{ {{\boldsymbol {\chi }}}, \; \dot {{{\boldsymbol {\chi }}}} \}\) is a vector-valued mapping, T is the kinetic energy density, U is the potential density, and L = T − U is the Lagrangian density. Together, \(\Pi ({{\boldsymbol {\chi }}}) = {G}(\mathbf {D} {{\boldsymbol {\chi }}}) - {F}({{\boldsymbol {\chi }}})\) is called the total action. This standard form holds from classical Newtonian mechanics to quantum field theory.Footnote 6 Its stationary condition leads to the well-known Euler-Lagrange equation:

$$\displaystyle \begin{aligned} A({{\boldsymbol{\chi}}}) = {\mathbf{D}}^* \partial{G}(\mathbf{D}{{\boldsymbol{\chi}}}) = \{ - \partial_t , 1 \} \cdot \partial L({{\boldsymbol{\chi}}}, \dot{{{\boldsymbol{\chi}}}}) = - \partial_t \partial_{\dot{{{\boldsymbol{\chi}}}}}{T}( \dot{{{\boldsymbol{\chi}}}}) - \partial_{{\boldsymbol{\chi}}} {U}( t, {{\boldsymbol{\chi}}} ) = \mathbf{f} .{} \end{aligned} $$
(11)

The system is called (time) homogeneous if \(L = L({{\boldsymbol {\chi }}},\dot {{{\boldsymbol {\chi }}}})\). In general, the kinetic energy T must be an objective function of the velocityFootnote 7 \({\mathbf {v}}_k = \dot {\mathbf {x}}_k({{\boldsymbol {\chi }}})\) of each particle \(\mathbf { x}_k = {\mathbf {x}}_k({{\boldsymbol {\chi }}}) \in {\mathbb R}^3 \;\; \forall k \in \{1, \dots , K\}\), while the potential density U depends on each problem. For Newtonian mechanics, we have \({{\boldsymbol {\chi }}}(t) = \mathbf {x}(t)\) and \({T}(\mathbf {v}) = \frac {1}{2} m \|\mathbf {v} \|{ }^2 \) is quadratic. If U = 0, the equilibrium equation \(A({{\boldsymbol {\chi }}}) = - m \ddot {\mathbf {x}}(t) = \mathbf {f}\) includes Newton’s second law, \(\mathbf {F} = m \ddot {\mathbf {x}}\), and the third law, \(-\mathbf {F} = \mathbf {f}\). The first law \(\mathbf {v} = \dot {\mathbf {x}} = {\mathbf {v}}_0\) holds only if \(\mathbf {f} = 0\). In this case, the system has either a trivial solution \(\mathbf {x} = 0\) or infinitely many solutions \(\mathbf {x}(t) = {\mathbf {v}}_0 t + {\mathbf {x}}_0\), depending on the initial conditions in \(\mathcal {X}_a\). This simple fact in elementary physics plays a key role in understanding the canonical duality theory and NP-hard problems in global optimization.

By using the methods of finite difference and least squares [39, 54], the general nonlinear dynamical system (11) can also be formulated as the same global optimization problem (9), where \({{\boldsymbol {\chi }}} = \{ \chi _i(t_k) \}\) collects the Lagrangian coordinates \(\chi _i\) (i = 1, …, n) at each discretized time \(t_k\) (k = 1, …, m), \(\mathbf {D}\) is a finite difference matrix, and \(\mathcal {X}_c\) is a convex constraint set including the initial condition [54]. By the canonical duality theory, an intrinsic relation between chaos in nonlinear dynamics and NP-hardness in global optimization was revealed recently in [54].
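One possible way to read this reformulation is sketched below for a single Newtonian particle, where (11) becomes \(-m \ddot {x} - U'(x) = f\). The central-difference stencil, the uniform time step `h`, and the function names are assumptions made for illustration and are not necessarily the scheme used in [39, 54]. Initial conditions, which would enter through \(\mathcal {X}_c\), are not imposed here; the code only evaluates the least-squares objective.

```python
def residuals(x, h, m, f, dU):
    """Residuals of -m*x'' - U'(x) = f at interior time nodes (central differences)."""
    return [
        -m * (x[k + 1] - 2.0 * x[k] + x[k - 1]) / h**2 - dU(x[k]) - f
        for k in range(1, len(x) - 1)
    ]

def least_squares(x, h, m, f, dU):
    """Least-squares objective 1/2 * sum of squared residuals over the time grid."""
    return 0.5 * sum(r * r for r in residuals(x, h, m, f, dU))

m, f, h = 2.0, 3.0, 0.1           # assumed mass, constant input, and time step
x0, v0 = 1.0, -0.5                # assumed initial position and velocity
times = [k * h for k in range(21)]

# With U = 0 the exact trajectory is x(t) = x0 + v0*t - f*t^2/(2m).
exact = [x0 + v0 * t - f * t * t / (2.0 * m) for t in times]
perturbed = [xk + 0.01 * (-1) ** k for k, xk in enumerate(exact)]

no_potential = lambda q: 0.0
P_exact = least_squares(exact, h, m, f, no_potential)
P_perturbed = least_squares(perturbed, h, m, f, no_potential)
print(P_exact, P_perturbed)
```

The exact trajectory drives the objective to rounding level because the central difference is exact for quadratics, while a small oscillatory perturbation is penalized heavily. For a nonconvex potential U the same objective becomes a nonconvex function of the discretized coordinates.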

2.6 Mono-Duality and Duality Gap

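Lagrangian duality has been developed from Lagrange mechanics since 1788 [51], where the kinetic energy \({T}(\mathbf {v}) = \sum _{k}\frac {1}{2} m_k \|{\mathbf {v}}_k\|{ }^2\) is a quadratic (objective) function. For convex static systems (or dynamical systems with \({U}({{\boldsymbol {\chi }}}) = 0\)), the stored energy \({G}:\mathcal {G}_a \rightarrow {\mathbb R}\) is convex and its Legendre conjugate \({G}^*({\boldsymbol {\sigma }}) = \{ \langle \mathbf {g} ; {\boldsymbol {\sigma }} \rangle - {G}(\mathbf {g}) | \; {\boldsymbol {\sigma }} = \partial {G}(\mathbf {g}) \}\) is uniquely defined on \(\mathcal {G}^*_a\). Thus, by \({G}(\mathbf {D} {{\boldsymbol {\chi }}}) = \langle \mathbf {D} {{\boldsymbol {\chi }}} ; {\boldsymbol {\sigma }} \rangle - {G}^*({\boldsymbol {\sigma }})\), the total action or potential \(\Pi ({{\boldsymbol {\chi }}})\) can be written in the Lagrangian formFootnote 8 \(L:\mathcal {X}_a \times \mathcal {G}_a^* \rightarrow {\mathbb R}\):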

$$\displaystyle \begin{aligned} L({{\boldsymbol{\chi}}}, {\boldsymbol{\sigma}}) = \langle \mathbf{D} {{\boldsymbol{\chi}}} ; {\boldsymbol{\sigma}} \rangle - {G}^*({\boldsymbol{\sigma}}) - \langle {{\boldsymbol{\chi}}}, \mathbf{f} \rangle = \langle {{\boldsymbol{\chi}}}, {\mathbf{D}}^* {\boldsymbol{\sigma}} - \mathbf{f} \rangle - {G}^*({\boldsymbol{\sigma}}), \end{aligned} $$
(12)

where \({{\boldsymbol {\chi }}}\in \mathcal {X}_a\) can be viewed as a Lagrange multiplier for the equilibrium equation \({\mathbf {D}}^* {\boldsymbol {\sigma }} = {\mathbf {f}} \in \mathcal {X}^*_a\). In linear elasticity, L(χ, σ) is the well-known Hellinger-Reissner complementary energy [18]. Let \(\mathcal {S}_c = \{ {\boldsymbol {\sigma }} \in \mathcal {G}^*_a | \; {\mathbf {D}}^* {\boldsymbol {\sigma }} = {\mathbf {f}} \} \) be the so-called statically admissible space. Then, the Lagrangian dual of the general problem \(({\mathcal {P}})\) is given by:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^*): \;\;\; \max \{ \Pi^*({\boldsymbol{\sigma}}) = - {G}^*({\boldsymbol{\sigma}}) | \; {\boldsymbol{\sigma}} \in \mathcal{S}_c \}, {} \end{aligned} $$
(13)

and the saddle Lagrangian leads to a well-known min-max duality in convex (static) systems:

$$\displaystyle \begin{aligned} \min_{{{\boldsymbol{\chi}}}\in \mathcal{X}_c} \Pi({{\boldsymbol{\chi}}}) = \min_{{{\boldsymbol{\chi}}}\in \mathcal{X}_a} \max_{{\boldsymbol{\sigma}} \in \mathcal{G}^*_a} L({{\boldsymbol{\chi}}}, {\boldsymbol{\sigma}}) = \max_{{\boldsymbol{\sigma}} \in \mathcal{G}^*_a} \min_{{{\boldsymbol{\chi}}}\in \mathcal{X}_a} L({{\boldsymbol{\chi}}}, {\boldsymbol{\sigma}}) = \max_{{\boldsymbol{\sigma}} \in \mathcal{S}_c} \Pi^*({\boldsymbol{\sigma}}) . \end{aligned} $$
(14)

This one-to-one duality is the so-called mono-duality in Chapter 1 [18], or the complementary-dual variational principle in continuum physics. In finite elasticity, the Lagrangian dual is also known as the Levinson-Zubov principle. However, this principle holds only for convex problems. In real-world problems, the stored energy \({G}(\mathbf {g})\) is usually nonconvex in order to model complex phenomena. Its complementary energy cannot be determined uniquely by the Legendre transformation. Although its Fenchel conjugate \({G}^\sharp :\mathcal {G}^*_a \rightarrow {\mathbb R} \cup \{ + \infty \}\) can be uniquely defined, the Fenchel-Moreau dual problem:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^\sharp ): \;\;\; \max \{ \Pi^\sharp({\boldsymbol{\sigma}}) = - {G}^\sharp({\boldsymbol{\sigma}}) | \;\; {\boldsymbol{\sigma}} \in \mathcal{S}_c\} \end{aligned} $$
(15)

is not considered a complementary-dual problem due to the Fenchel-Young inequality:

$$\displaystyle \begin{aligned} g_a = \min \{ \Pi({{\boldsymbol{\chi}}}) | \; {{\boldsymbol{\chi}}}\in \mathcal{X}_c \} \ge \max \{ \Pi^\sharp({\boldsymbol{\sigma}}) | \;\; {\boldsymbol{\sigma}} \in \mathcal{S}_c\} = g_p , \end{aligned} $$
(16)

and \(g_{ap} = g_a - g_p \neq 0\) is the well-known duality gap. This duality gap is intrinsic to all Lagrange-Fenchel-Moreau types of duality problems since the linear operator \(\mathbf {D}\) cannot change the nonconvexity of \({G}(\mathbf {D} {{\boldsymbol {\chi }}})\). It turns out that the existence of a pure stress \({\boldsymbol {\sigma }}\) based complementary-dual principle was a well-known debate in nonlinear elasticity for more than fifty years [58].
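For contrast with the nonconvex case, the zero-gap min-max relation (14) can be verified on a tiny convex example. The sketch below assumes a scalar unknown χ, the operator D = (1, 1)ᵀ mapping ℝ to ℝ², and a quadratic stored energy with Hessian H = diag(h1, h2); all numerical values are illustrative. The dual feasible set \(\mathcal {S}_c\) is then the line σ₁ + σ₂ = f, parametrized by σ₁.

```python
h1, h2 = 2.0, 3.0   # assumed positive stiffnesses, H = diag(h1, h2)
f = 1.5             # assumed input

def Pi(chi):
    """Primal: G(D chi) - chi*f with G(g) = 1/2 (h1 g1^2 + h2 g2^2), D chi = (chi, chi)."""
    return 0.5 * (h1 + h2) * chi**2 - f * chi

def Pi_star(s1):
    """Dual: -G*(sigma) on S_c = {sigma : s1 + s2 = f}, G*(sigma) = 1/2 (s1^2/h1 + s2^2/h2)."""
    s2 = f - s1
    return -0.5 * (s1**2 / h1 + s2**2 / h2)

grid = [i * 1e-3 for i in range(-3000, 3001)]
primal_min = min(Pi(c) for c in grid)
dual_max = max(Pi_star(s) for s in grid)
closed_form = -f * f / (2.0 * (h1 + h2))   # value of both problems in this example

print(primal_min, dual_max, closed_form)
```

Both grid searches return the same value as the closed form, so the gap is zero here. For a nonconvex G the conjugate \({G}^*\) is no longer uniquely defined by the Legendre transformation, and this equality is not guaranteed.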

Remark 1 (Equilibrium Constraints and Lagrange Multiplier Law)

Strictly speaking, the Lagrange multiplier method can be used mainly for equilibrium constraints in \(\mathcal {S}_c\), and the Lagrange multiplier must be the solution to the primal problem (see Section 1.5.2 [18]). The equilibrium equation \({\mathbf {D}}^* {\boldsymbol {\sigma }} = \mathbf {f}\) must be invariant under certain coordinate transformations, say the law of angular momentum conservation, which is guaranteed by the objectivity of the stored energy \({G}(\mathbf {D} {{\boldsymbol {\chi }}})\) in continuum mechanics (see Definition 6.1.2, [18]), or by the isotropy of the kinetic energy \({T}(\dot {{{\boldsymbol {\chi }}}})\) in Lagrangian mechanics [52]. Specifically, the equilibrium equation for Newtonian mechanics is invariant under the Galilean transformation, while for Einstein’s special relativity theory, the equilibrium equation \({\mathbf {D}}^* {\boldsymbol {\sigma }} = \mathbf {f}\) is invariant under the Lorentz transformation. For a linear equilibrium equation, the quadratic \({G}(\mathbf {g})\) is naturally an objective function for convex systems. Unfortunately, since the concept of objectivity is misused in optimization and the notation of the Euclidean coordinate \(\mathbf {x} = \{ x_i \}\) is used as the unknown, the Lagrange multiplier method and the associated augmented methods have been mistakenly used for solving general nonconvex optimization problems, which produces many artificial duality gaps [53]. ♣

2.7 Bi-Duality and Conceptual Mistakes

For convex Hamiltonian systems, the action G(D χ) in (10) is a d.c. (difference of convex) functional and the Lagrangian has its standard form in Lagrangian mechanics (see Section 2.5.2 [18] with q(t) = χ and p = σ):

$$\displaystyle \begin{aligned} L(\mathbf{q}, {\mathbf{p}}) = \langle\dot{\mathbf{q}} ; {\mathbf{p}} \rangle - \int_I [{T}^*({\mathbf{p}}) + {U}(\mathbf{q})]{\,\mbox{d}t} - \langle \mathbf{q} , \mathbf{f} \rangle, \end{aligned} $$
(17)

where \(\mathbf {q} \in \mathcal {X}_a \subset \mathcal {C}^1[I,{\mathbb R}^n]\) is the Lagrange coordinate and \({\mathbf {p}} \in \mathcal {S}_a \subset \mathcal {C}[I,{\mathbb R}^n]\) is the momentum. In this case, the Lagrangian is a bi-concave functional on \(\mathcal {X}_a \times \mathcal {S}_a\), but the Hamiltonian:

$$\displaystyle \begin{aligned} H(\mathbf{q},{\mathbf{p}}) = \langle\mathbf{D}{\mathbf{q}} ; {\mathbf{p}} \rangle -{L}(\mathbf{q},{\mathbf{p}}) = \int_I [{T}^*({\mathbf{p}}) + {U}(\mathbf{ q})]{\,\mbox{d}t} \end{aligned} $$
(18)

is convex.Footnote 9 The total action and its canonical dual are [18]

$$\displaystyle \begin{aligned} \Pi(\mathbf{q})& = \max \{ {L}(\mathbf{q}, {\mathbf{p}}) | \;\; {\mathbf{p}} \in \mathcal{V}^*_a\} = \int_I[{T}(\dot{\mathbf{q}}) - {U}(\mathbf{q}) ] {\,\mbox{d}t} - \langle \mathbf{q}, \mathbf{f} \rangle \;\; \forall \mathbf{q} \in \mathcal{X}_c \end{aligned} $$
(19)
$$\displaystyle \begin{aligned} \Pi^d({\mathbf{p}}) & = \max \{ {L}(\mathbf{q}, {\mathbf{p}}) | \;\; \mathbf{q} \in \mathcal{X}_a\} = \int_I [{U}^*(\dot{{\mathbf{p}}}) - {T}^*({\mathbf{p}}) ] {\,\mbox{d}t} \;\; \forall {\mathbf{p}} \in \mathcal{S}_c \end{aligned} $$
(20)

Clearly, both \(\Pi \) and \(\Pi ^d\) are d.c. functionals. In this case, the so-called bi-duality was first presented in Chapter 2 of the author’s book [18]:

Theorem 1 (Bi-Duality Theory)

For a given convex Hamiltonian system, if \((\bar {\mathbf {q}}, \bar {\mathbf {p}})\) is a critical point of L(q, p) over the time interval \(I\subset {\mathbb R}\) , then we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Pi(\bar{\mathbf{q}}) = \min \Pi(\mathbf{q}) \;\; &\displaystyle \Leftrightarrow&\displaystyle \;\; \min \Pi^d({\mathbf{p}}) = \Pi^d(\bar{\mathbf{p}}) \end{array} \end{aligned} $$
(21)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \Pi(\bar{\mathbf{q}}) = \max \Pi(\mathbf{q}) \;\; &\displaystyle \Leftrightarrow&\displaystyle \;\; \max \Pi^d({\mathbf{p}}) = \Pi^d(\bar{\mathbf{p}}). \end{array} \end{aligned} $$
(22)

The mathematical proof of this theory was given in Section 2.6 [18] for convex Hamiltonian systems and in Corollary 5.3.6 [18] for d.c. programming problems. This bi-duality revealed not only an interesting dynamical extremum principle in periodic motion but also an important truth in convex Hamiltonian systems.

Remark 2 (Least Action Principle and Conceptual Mistakes)

The least action principle plays a central role in physics, from classical Newtonian mechanics and general relativity (Einstein-Hilbert action) to modern string theory. Credit for the formulation of this principle is commonly given to Pierre Louis Maupertuis, who felt that “Nature is thrifty in all its actions.” It was historically called “least” because its solution requires finding the path that has the least value [9], as in Fermat’s principle in optics. However, in Hamiltonian systems it should be accurately called the principle of stationary action since its solution does not minimize the total action. Actually, the validity of the least action principle has remained obscure in physics for several centuries. In a footnote in their celebrated book (Section 1.2, [52]), Landau and Lifshitz pointed out that the least action principle holds only for a sufficiently small time interval, not for the whole trajectory of the system. Unfortunately, this is not true in general since the total action could be a concave functional within a sufficiently small time interval.

Theorem 1 shows that a convex Hamiltonian system is controlled by the bi-duality, which revealed the following truths (see page 77 [18]):

The least action principle is not valid for any periodic motion.

It holds for the whole trajectory of the system if the potential U(q) = 0.

The bi-duality theory has been challenged by M.D. Voisei, C. Zălinescu, and his former student R. Strugariu in a paper published in a dynamical systems journal [78]. Instead of identifying any mistake in the author’s work, they created an artificial “Lagrangian”:

$$\displaystyle \begin{aligned} L(x,y) = \langle a, x \rangle \langle b, y \rangle - \frac{1}{2} {\alpha} \| x\|{}^2 - \frac{1}{2} \beta \| y\|{}^2 \end{aligned} $$

and the associated “total action” f(x) as well as its “dual action” g(y):

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(x) &\displaystyle =&\displaystyle \max \{ L(x,y) | \; y \in Y \} = -\frac{1}{2} {\alpha} \| x\|{}^2 + \frac{1}{2} \beta^{-1} \langle a, x \rangle^2 \|b\|{}^2 \;\; \forall x \in X \\ g(y) &\displaystyle =&\displaystyle \max \{ L(x,y) | \; x \in X \} = - \frac{1}{2} \beta \| y\|{}^2 + \frac{1}{2} {\alpha}^{-1} \langle b, y\rangle^2 \|a \|{}^2 \;\;\; \forall y \in Y \end{array} \end{aligned} $$
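
The two maxima can be checked by hand. The following sketch assumes a Lagrangian of the bilinear-quadratic form \(L(x,y) = \langle a, x \rangle \langle b, y \rangle - \frac{1}{2} {\alpha} \| x\|{}^2 - \frac{1}{2} \beta \| y\|{}^2\) with \({\alpha}, \beta > 0\); this form is an assumption made here because it reproduces both displayed expressions. For fixed x, L is strictly concave in y, so its stationary point is the maximizer:

$$\displaystyle \begin{aligned} \partial_y L(x,y) = \langle a, x \rangle b - \beta y = 0 \;\; &\Rightarrow \;\; \bar{y} = \beta^{-1} \langle a, x \rangle b , \\ L(x, \bar{y}) = \beta^{-1} \langle a, x \rangle^2 \|b\|{}^2 - \frac{1}{2} {\alpha} \| x\|{}^2 - \frac{1}{2} \beta^{-1} \langle a, x \rangle^2 \|b\|{}^2 &= -\frac{1}{2} {\alpha} \| x\|{}^2 + \frac{1}{2} \beta^{-1} \langle a, x \rangle^2 \|b\|{}^2 . \end{aligned} $$

The expression for g(y) follows in the same way by exchanging the roles of (x, a, α) and (y, b, β).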

By using these elementary functions in linear algebra, they produced a series of strange counterexamples against the bi-duality theory in convex Hamiltonian systems presented by the author in Chapter 2 [18]. They claimed: “Because our counter-examples are very simple, using quadratic functions defined on whole Hilbert (even finite-dimensional) spaces, it is difficult to reinforce the hypotheses of the above mentioned results in order to keep the same conclusions and not obtain trivialities.”

Clearly, the quadratic function L(x, y) created by Zălinescu et al. is totally irrelevant to the Lagrangian L(q, p) in Lagrangian mechanics and in Gao’s book [18]. Without the differential operator \(\mathbf{D} = \partial_t\), the quadratic d.c. function f(x) (or g(y)) is defined on a one-scale space X (or Y) and is unbounded. Therefore, its critical point does not produce any motion. This basic mistake shows that these people do not have the necessary knowledge either of Lagrangian mechanics (the time derivative \(\mathbf{D} = \partial_t\) is necessary for any dynamical system) or of d.c. programming (unconstrained quadratic d.c. programming does not make any sense). It also shows that these people do not even know what a Lagrangian coordinate is; otherwise, they would never use a time-independent vector \(x\in {\mathbb R}^n\) as an unknown in a dynamical system.

Moreover, since there is neither an input in L(x, y) nor initial/boundary conditions in X, all the counterexamples produced by Zălinescu et al. are simply not problems but only artificial “models.” Since they do not follow the basic rules of mathematical modeling, such as objectivity, symmetry, conservation, and constitutive laws, these artificial “models” are very strange and even ugly (see Examples 3.3, 4.2, 4.4 [78]). This type of mistake shows that these people do not know the difference between modeling and problems. ♣

3 Unified Problem and Canonical Duality-Triality Theory

In this section, we restrict our discussion to a finite-dimensional space \(\mathcal {X}\). Its element \({\boldsymbol {\chi }} \in \mathcal {X}\) could be a vector, a matrix, or a tensor.Footnote 10 In this case, the linear operator D is a generalized matrixFootnote 11 \(\mathbf {D}:\mathcal {X} \rightarrow \mathcal {G}\) and \(\mathcal {G}\) is a generalized matrix space equipped with a natural norm ∥g∥. Let \(\mathcal {X}_a \subset \mathcal {X}\) be a convex subset (with only linear constraints) and \(\mathcal {X}^*_a\) be its dual set such that for any given input \(\bar {\boldsymbol {\chi }}^* = \mathbf {f} \in \mathcal {X}^*_a\) the subjective function satisfies \( \langle {\boldsymbol {\chi }}, \mathbf {f} \rangle \ge 0 \;\; \forall {\boldsymbol {\chi }} \in \mathcal {X}_a\). Although objectivity is necessary for real-world modeling, the numerical discretization of D χ could lead to a complicated function G(D χ) which may not be objective in g = D χ. Also, in operations research, many challenging problems are artificially proposed. Thus, the objectivity required in Gao and Strang’s original work on nonlinear elasticity has been relaxed by the canonical duality theory since 2000 [20].

3.1 Canonical Transformation and Gap Function

Definition 3 (Canonical Function and Transformation)

A real-valued function \({\Phi }:\mathcal {E}_a \rightarrow {\mathbb R}\) is called a canonical function if its domain \(\mathcal {E}_a\) is convex and the duality relation \({\boldsymbol {\xi }}^* = \partial {\Phi } ({\boldsymbol {\xi }}) : \mathcal {E}_a \rightarrow \mathcal {E}_a^*\) is bijective.

For a given real-valued function \({G}:\mathcal {G}_a \rightarrow {\mathbb R}\), if there exist a mapping \({\boldsymbol {\xi }}: \mathcal {G}_a \rightarrow \mathcal {E}_a \) and a canonical function \({\Phi }:\mathcal {E}_a \rightarrow {\mathbb R}\) such that

$$\displaystyle \begin{aligned} {G} (\mathbf{g}) = {\Phi}({\boldsymbol{\xi}}(\mathbf{g})) \;\; \forall \mathbf{g} \in \mathcal{G}_a , {} \end{aligned} $$
(23)

the transformation (23) is called the canonical transformation.

The canonical function need not be convex. Actually, in many real-world applications, Φ(ξ) is usually concave. For example, in differential geometry and finite deformation theory (see [18]) the objective function G(g) is the deformation Jacobian:

$$\displaystyle \begin{aligned} {G}(\mathbf{g}) = \sqrt{\det({\mathbf{g}}^T\mathbf{g})}, \;\; \mathbf{g} = \nabla {\boldsymbol{\chi}} . \end{aligned} $$
(24)

If we choose \( {\boldsymbol {\xi }} = \det (\mathbf { g}^T\mathbf {g})\) as the geometrical measure, which is the third invariant of the Riemannian metric tensor \(\mathbf{g}^T \mathbf{g}\) and is usually denoted by \(I_3\), the canonical function \({\Phi}(I_3) = \sqrt{I_3}\) is then a concave function of the scale measure \(I_3(\mathbf{g})\).
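
As a brief check of this claim, the following worked computation takes \({\Phi}(I_3) = \sqrt{I_3}\) on the domain \(I_3 > 0\) (the restriction to \(I_3 > 0\) is an assumption made here so that the derivatives exist):

$$\displaystyle \begin{aligned} \varsigma = \partial {\Phi}(I_3) = \frac{1}{2\sqrt{I_3}} > 0 , \;\;\; \partial^2 {\Phi}(I_3) = - \frac{1}{4} I_3^{-3/2} < 0 , \;\;\; {\Phi}^*(\varsigma) = \mathrm{sta} \{ I_3 \varsigma - \sqrt{I_3} \; | \; I_3 > 0 \} = - \frac{1}{4 \varsigma} . \end{aligned} $$

The duality map \(I_3 \mapsto \varsigma\) is a bijection from \((0, \infty)\) onto \((0, \infty)\), so Φ is a canonical function in the sense of Definition 3 although it is concave rather than convex.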

The canonical duality is a fundamental principle in sciences and oriental philosophy, which underlies all natural phenomena. Therefore, instead of the objectivity in continuum physics, a generalized objective function G(g) is used in the canonical duality theory under the following assumption.

Definition 4 (Generalized Objective Function)

For a given real-valued function \({G}:\mathcal {G}_a \rightarrow {\mathbb R}\), if there exist a measure \({\boldsymbol {\xi }}:\mathcal {G}_a \rightarrow \mathcal {E}_a\) and a canonical function \({\Phi }:\mathcal {E}_a \rightarrow {\mathbb R}\) such that the following conditions hold:

  • (D1) Positivity: \({G}(\mathbf {g}) \ge 0 \;\; \forall \mathbf {g} \in \mathcal {G}_a\);

  • (D2) Canonicality: \({G}(\mathbf {g}) = {\Phi }({\boldsymbol {\xi }}(\mathbf {g})) \;\; \forall \mathbf {g} \in \mathcal {G}_a\),

then \({G}:\mathcal {G}_a \rightarrow {\mathbb R}\) is called a generalized objective function.

The canonical transformation plays an important role in mathematical modeling and nonlinear analysis. Let \( {\Lambda } = {\boldsymbol {\xi }} \circ \mathbf {D} :\mathcal {X}_a \rightarrow \mathcal {E}_a\) be the so-called geometrically admissible operator and \(\langle {\boldsymbol {\xi }} ; {\boldsymbol {\varsigma }} \rangle :\mathcal {E} \times \mathcal {E}^* \rightarrow {\mathbb R}\) be the bilinear form which puts \(\mathcal {E}\) and \(\mathcal {E}^*\) in duality. By (D2), we have \(\mathcal {X}_c = \{ {\boldsymbol {\chi }} \in \mathcal {X}_a | \;\; {\Lambda }({\boldsymbol {\chi }}) \in \mathcal {E}_a \}\). The problem \(({\mathcal {P}})\) can be equivalently written in the following canonical form:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_g): \;\; \min \left\{ \Pi({\boldsymbol{\chi}}) = {\Phi}({\Lambda}( {\boldsymbol{\chi}})) - \langle {\boldsymbol{\chi}} , \mathbf{f} \rangle | \;\; {\boldsymbol{\chi}} \in \mathcal{X}_c \right\}. \end{aligned} $$
(25)

Because the canonical duality is a universal principle in nature, the canonical measure Λ(χ) is not necessarily objective, and the spaces \(\mathcal {X}\), \(\mathcal {E}\) could be at different physical scales with totally different dimensions, the canonical problem \(({\mathcal {P}}_g)\) can be used to study general optimization problems in multi-scale complex systems. The criticality condition of \(({\mathcal {P}}_g)\) is governed by the fundamental principle of virtual work:

$$\displaystyle \begin{aligned} \langle {\Lambda}_t ({\boldsymbol{\chi}})\delta {\boldsymbol{\chi}} ; {\boldsymbol{\varsigma}} \rangle = \langle \delta {\boldsymbol{\chi}} , {\Lambda}^*_t({\boldsymbol{\chi}}) {\boldsymbol{\varsigma}} \rangle = \langle \delta {\boldsymbol{\chi}}, \mathbf{f} \rangle \;\; \forall \delta {\boldsymbol{\chi}}\in \mathcal{X}_c, \end{aligned} $$
(26)

where \({\Lambda}_t({\boldsymbol{\chi}}) = \partial {\Lambda}({\boldsymbol{\chi}})\) represents a generalized Gâteaux (or directional) derivative of Λ(χ), its adjoint \({\Lambda }^*_t\) is called the balance operator, \({\boldsymbol{\varsigma}} = {\boldsymbol{\xi}}^*({\boldsymbol{\xi}}) = \partial {\Phi}({\boldsymbol{\xi}})\), and \({\boldsymbol {\xi }}^*:\mathcal {E}_a \rightarrow \mathcal {E}^*_a \) is a canonical dual (or constitutive) operator. The strong form of this virtual work principle is called the canonical equilibrium equation:

$$\displaystyle \begin{aligned} \mathbf{A} ({\boldsymbol{\chi}}) = {\Lambda}^*_t({\boldsymbol{\chi}}) {\boldsymbol{\xi}}^* ({\Lambda}({\boldsymbol{\chi}})) = \mathbf{f} . \end{aligned} $$
(27)

A system governed by this equation is called a canonical system and is denoted as (see Chapter 4, [18]):

$$\displaystyle \begin{aligned} \mathbb{S}_a = \{ \langle \mathcal{X}_a , \mathcal{X}_a^* \rangle , \langle \mathcal{E}_a ; \mathcal{E}_a^* \rangle ; ( {\Lambda}, {\boldsymbol{\xi}}^* ) \}. \end{aligned}$$

Definition 5 (Classification of Nonlinearities)

The system \(\mathbb {S}_a \) is called geometrically nonlinear (resp., linear) if the geometrical operator \({\Lambda }:\mathcal {X}_a \rightarrow \mathcal {E}_a\) is nonlinear (resp., linear); the system is called physically (or constitutively) nonlinear (resp., linear) if the canonical dual operator \({\boldsymbol {\xi }}^*:\mathcal {E}_a \rightarrow \mathcal {E}^*_a \) is nonlinear (resp., linear); the system is called fully nonlinear (resp., linear) if it is both geometrically and physically nonlinear (resp., linear).

Both geometrical and physical nonlinearities are basic concepts in nonlinear field theory. The mathematical definition was first given by the author in 2000 under the canonical transformation [20]. A diagrammatic representation of this canonical system is shown in Figure 1.

Fig. 1 Diagrammatic representation for a canonical system

This diagram shows a symmetry breaking in the canonical equilibrium equation, i.e., instead of Λ, the balance operator is \({\Lambda }_t^*\). It was discovered by Gao and Strang [43] that by introducing a complementary operator \({\Lambda}_c({\boldsymbol{\chi}}) = {\Lambda}({\boldsymbol{\chi}}) - {\Lambda}_t({\boldsymbol{\chi}}){\boldsymbol{\chi}}\), this locally broken symmetry is recovered by a so-called complementary gap function:

$$\displaystyle \begin{aligned} G_{ap}({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) = \langle - {\Lambda}_c({\boldsymbol{\chi}}) ; {\boldsymbol{\varsigma}} \rangle, \end{aligned} $$
(28)

which plays a key role in global optimization and the triality theory. Clearly, if Λ = D is linear, then \(G_{ap} = 0\). Thus, the following statement is important for understanding complexity:

Only the geometrical nonlinearity leads to nonconvexity in optimization, bifurcation in analysis, chaos in dynamics, and NP-hard problems in complex systems.
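
The vanishing of the gap function for a linear measure can be illustrated numerically. The sketch below is a scalar toy case, not taken from the source: the names `gap`, `linear`, and `quadratic`, and the values of `d`, `chi`, and `sigma` are hypothetical. It evaluates \(G_{ap} = \langle -{\Lambda}_c({\boldsymbol{\chi}}) ; {\boldsymbol{\varsigma}} \rangle\) with \({\Lambda}_c = {\Lambda} - {\Lambda}_t {\boldsymbol{\chi}}\) for a linear measure and for the quadratic measure \({\Lambda}(\chi) = \frac{1}{2}\chi^2\).

```python
def gap(lam, lam_t, chi, sigma):
    """Complementary gap G_ap = <-Lambda_c(chi); sigma> for scalar chi and sigma,
    where Lambda_c(chi) = Lambda(chi) - Lambda_t(chi) * chi."""
    lam_c = lam(chi) - lam_t(chi) * chi
    return -lam_c * sigma

# Hypothetical linear measure Lambda(chi) = d * chi, with derivative d.
d = 3.0
linear = (lambda chi: d * chi, lambda chi: d)

# Hypothetical quadratic measure Lambda(chi) = chi^2 / 2, with derivative chi.
quadratic = (lambda chi: 0.5 * chi ** 2, lambda chi: chi)

chi, sigma = 1.5, -2.0
gap_linear = gap(*linear, chi, sigma)        # Lambda_c vanishes for a linear measure
gap_quadratic = gap(*quadratic, chi, sigma)  # equals sigma * chi^2 / 2

print(gap_linear, gap_quadratic)
```

In the quadratic case the gap function is the quadratic form \(\frac{1}{2} \varsigma \chi^2\), so its sign, and hence its convexity in χ, is decided by the sign of ς.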

3.2 Complementary-Dual Principle and Analytical Solution

For a given canonical function \({\Phi }:\mathcal {E}_a \rightarrow {\mathbb R}\), its conjugate \({\Phi }^*:\mathcal {E}^*_a \rightarrow {\mathbb R}\) can be uniquely defined by the Legendre transformation:

$$\displaystyle \begin{aligned} {\Phi}^*( {\boldsymbol{\varsigma}} ) = \mathrm{sta} \{ \langle {\boldsymbol{\xi}} ; {\boldsymbol{\varsigma}} \rangle - {\Phi}({\boldsymbol{\xi}}) | \;\; {\boldsymbol{\xi}} \in \mathcal{E}_a \}, \end{aligned} $$
(29)

where \(\mathrm {sta} \{ f({\boldsymbol {\chi }}) | \; {\boldsymbol {\chi }} \in \mathcal {X} \}\) stands for finding the stationary value of f(χ) on \(\mathcal {X}\), and the following canonical duality relations hold on \(\mathcal {E}_a \times \mathcal {E}^*_a\):

$$\displaystyle \begin{aligned} {\boldsymbol{\varsigma}} = \partial {\Phi}({\boldsymbol{\varepsilon}}) \;\;\Leftrightarrow \;\; {\boldsymbol{\varepsilon}} = \partial {\Phi}^*({\boldsymbol{\varsigma}}) \;\;\Leftrightarrow \;\; {\Phi}({\boldsymbol{\varepsilon}}) + {\Phi}^*({\boldsymbol{\varsigma}}) = \langle {\boldsymbol{\varepsilon}} ; {\boldsymbol{\varsigma}} \rangle. \end{aligned} $$
(30)
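
The relations (30) can be checked numerically on simple scalar canonical functions. The sketch below is illustrative only: the three functions, the constants `alpha` and `beta`, and the sample point `xi` are assumptions chosen here, and the conjugates are computed by hand from the Legendre transformation (29).

```python
import math

alpha, beta = 2.0, 0.5  # hypothetical positive constants

# Each entry: (Phi, derivative of Phi, Legendre conjugate Phi^*).
cases = {
    "quadratic": (
        lambda x: 0.5 * alpha * (x + beta) ** 2,
        lambda x: alpha * (x + beta),
        lambda s: s * s / (2.0 * alpha) - beta * s,
    ),
    "exponential": (
        lambda x: alpha * math.exp(x),
        lambda x: alpha * math.exp(x),
        lambda s: s * (math.log(s / alpha) - 1.0),
    ),
    "x log x": (
        lambda x: alpha * x * math.log(x),
        lambda x: alpha * (math.log(x) + 1.0),
        lambda s: alpha * math.exp(s / alpha - 1.0),
    ),
}

xi = 0.7  # sample point with xi > 0 so that log(xi) is defined
residuals = {}
for name, (phi, dphi, phi_conj) in cases.items():
    sigma = dphi(xi)  # duality relation sigma = dPhi(xi)
    # Third relation in (30): Phi(xi) + Phi^*(sigma) - xi * sigma should vanish.
    residuals[name] = phi(xi) + phi_conj(sigma) - xi * sigma

print(residuals)
```

Each residual is zero up to rounding, which is the equality \({\Phi}({\boldsymbol{\varepsilon}}) + {\Phi}^*({\boldsymbol{\varsigma}}) = \langle {\boldsymbol{\varepsilon}} ; {\boldsymbol{\varsigma}} \rangle\) at a dual pair.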

If the canonical function is convex and lower semicontinuous, the Gâteaux derivative should be replaced by the sub-differential and \(\Phi^*\) is replaced by the Fenchel conjugate \(\Phi ^{\sharp }({\boldsymbol {\varsigma }}) = \sup \{ \langle {\boldsymbol {\xi }} ; {\boldsymbol {\varsigma }}\rangle - \Phi ({\boldsymbol {\xi }}) | \; {\boldsymbol {\xi }} \in \mathcal {E}_a \}\). In this case, (30) is replaced by the generalized canonical duality:

$$\displaystyle \begin{aligned} {\boldsymbol{\varsigma}} \in \partial {\Phi}({\boldsymbol{\varepsilon}}) \;\;\Leftrightarrow \;\; {\boldsymbol{\varepsilon}} \in \partial {\Phi}^{\sharp}({\boldsymbol{\varsigma}}) \;\;\Leftrightarrow \;\; {\Phi}({\boldsymbol{\varepsilon}}) + {\Phi}^\sharp({\boldsymbol{\varsigma}}) = \langle {\boldsymbol{\varepsilon}} ; {\boldsymbol{\varsigma}} \rangle\;\; \forall ({\boldsymbol{\varepsilon}}, {\boldsymbol{\varsigma}}) \in \mathcal{E}_a \times \mathcal{ E}^*_a . \end{aligned} $$
(31)

If the convex set \(\mathcal {E}_a\) contains inequality constraints, then (31) includes all the internal KKT conditions [14, 53]. In this sense, a KKT point of the canonical form Π(χ) is a generalized critical point of Π(χ).

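By the complementarity \({\Phi}({\Lambda}({\boldsymbol{\chi}})) = \langle {\Lambda}({\boldsymbol{\chi}}) ; {\boldsymbol{\varsigma}} \rangle - {\Phi}^*({\boldsymbol{\varsigma}})\), the canonical form of Π(χ) can be equivalently written in Gao and Strang’s total complementary function \(\Xi :\mathcal {X}_a \times \mathcal {E}_a^* \rightarrow {\mathbb R} \) [43]: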

$$\displaystyle \begin{aligned} \Xi({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) = \langle {\Lambda}({\boldsymbol{\chi}}) ; {\boldsymbol{\varsigma}} \rangle - {\Phi}^*({\boldsymbol{\varsigma}}) - \langle {\boldsymbol{\chi}}, \mathbf{f} \rangle . \end{aligned} $$
(32)

Then, the canonical dual function \(\Pi ^g:\mathcal {S}_c \rightarrow {\mathbb R}\) can be obtained by the canonical dual transformation:

$$\displaystyle \begin{aligned} \Pi^g({\boldsymbol{\varsigma}}) = \mathrm{sta} \{ \Xi({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) | \;\; {\boldsymbol{\chi}} \in \mathcal{X}_a \} = G_{ap}^{\Lambda}({\boldsymbol{\varsigma}}) - \Phi^*({\boldsymbol{\varsigma}}), \end{aligned} $$
(33)

where \(G_{ap}^{\Lambda } ({\boldsymbol {\varsigma }}) = \mathrm {sta} \{ \langle {\Lambda }({\boldsymbol {\chi }} ) ; {\boldsymbol {\varsigma }} \rangle - \langle {\boldsymbol {\chi }}, \mathbf {f} \rangle | \;\; {\boldsymbol {\chi }} \in \mathcal {X}_a \}\), which is defined on the canonical dual feasible space \(\mathcal {S}_c = \{ {\boldsymbol {\varsigma }} \in \mathcal {E}^*_a | \; \; {\Lambda }_t^*({\boldsymbol {\chi }}) {\boldsymbol {\varsigma }} = \mathbf {f} \;\; \forall {\boldsymbol {\chi }} \in \mathcal {X}_a \}.\) Clearly, \(\mathcal {S}_c \neq \emptyset \) if \(({\mathcal {P}})\) is properly posed.

Theorem 2 (Complementary-Dual Principle [18])

The pair \((\bar {\boldsymbol {\chi }}, \bar {\boldsymbol {\varsigma }})\) is a critical point of Ξ(χ, ς) if and only if \(\bar {\boldsymbol {\chi }}\) is a critical point of Π(χ) and \(\bar {\boldsymbol {\varsigma }}\) is a critical point of \(\Pi^g({\boldsymbol{\varsigma}})\). Moreover:

$$\displaystyle \begin{aligned} \Pi(\bar{\boldsymbol{\chi}}) = \Xi(\bar{\boldsymbol{\chi}}, \bar{\boldsymbol{\varsigma}}) = \Pi^g(\bar{\boldsymbol{\varsigma}}). \end{aligned} $$
(34)

Proof

The criticality condition \(\partial \Xi (\bar {\boldsymbol {\chi }}, \bar {\boldsymbol {\varsigma }}) = 0\) leads to the following canonical equations:

$$\displaystyle \begin{aligned} {\Lambda}(\bar{\boldsymbol{\chi}}) = \partial {\Phi}^*(\bar{\boldsymbol{\varsigma}}), \;\; \; {\Lambda}_t^*(\bar{\boldsymbol{\chi}}) \bar{\boldsymbol{\varsigma}} = \mathbf{f}. \end{aligned} $$
(35)

The theorem is proved by the canonical duality (30) and the definition of \(\Pi^g\).

Theorem 2 shows a one-to-one correspondence of the critical points between the primal function and its canonical dual. In large deformation theory, this theorem solved the fifty-year-old open problem on the complementary variational principle and is known as the Gao principle in the literature [58]. In real-world applications, the geometrical operator Λ is usually quadratic homogeneous, i.e., \({\Lambda } ({\alpha } {\boldsymbol {\chi }}) = {\alpha }^2 {\Lambda }({\boldsymbol {\chi }}) \;\; \forall {\alpha } \in {\mathbb R}\). In this case, we have [43] \({\Lambda}_t({\boldsymbol{\chi}}){\boldsymbol{\chi}} = 2 {\Lambda}({\boldsymbol{\chi}})\), \({\Lambda}_c({\boldsymbol{\chi}}) = - {\Lambda}({\boldsymbol{\chi}})\), and

$$\displaystyle \begin{aligned} \Xi({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) = G_{ap}({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) - {\Phi}^*({\boldsymbol{\varsigma}}) - \langle {\boldsymbol{\chi}}, \mathbf{f} \rangle = \frac{1}{2} \langle {\boldsymbol{\chi}}, \mathbf{G}({\boldsymbol{\varsigma}}) {\boldsymbol{\chi}} \rangle - {\Phi}^*({\boldsymbol{\varsigma}}) - \langle {\boldsymbol{\chi}}, \mathbf{f} \rangle , {} \end{aligned} $$
(36)

where \( \mathbf {G}({\boldsymbol {\varsigma }}) = \partial ^2_{{\boldsymbol {\chi }}} G_{ap}({\boldsymbol {\chi }}, {\boldsymbol {\varsigma }})\). Then, the canonical dual function \(\Pi^g({\boldsymbol{\varsigma}})\) can be written explicitly as:

$$\displaystyle \begin{aligned} \Pi^g({\boldsymbol{\varsigma}}) = \{ \Xi({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) | \;\; \mathbf{G}({\boldsymbol{\varsigma}}) {\boldsymbol{\chi}} = \mathbf{f} \; \forall {\boldsymbol{\chi}} \in \mathcal{X}_a\} = - \frac{1}{2} \langle [\mathbf{G}({\boldsymbol{\varsigma}})]^{+} \mathbf{f} , \mathbf{f} \rangle - {\Phi}^*({\boldsymbol{\varsigma}}), \end{aligned} $$
(37)

where \(\mathbf{G}^{+}\) represents a generalized inverse of \(\mathbf{G}\).

Theorem 3 (Analytical Solution Form [18])

If \(\bar {\boldsymbol {\varsigma }} \in \mathcal {S}_c\) is a critical point of \(\Pi^g({\boldsymbol{\varsigma}})\), then:

$$\displaystyle \begin{aligned} \bar{\boldsymbol{\chi}} = [\mathbf{G}(\bar{\boldsymbol{\varsigma}})]^{+} \mathbf{f} \end{aligned} $$
(38)

is a critical point of Π(χ) and \( \Pi (\bar {\boldsymbol {\chi }}) = \Xi (\bar {\boldsymbol {\chi }}, \bar {\boldsymbol {\varsigma }}) = \Pi ^g(\bar {\boldsymbol {\varsigma }}). \) Dually, if \(\bar {\boldsymbol {\chi }}\in \mathcal {X}_c\) is a critical point of Π(χ), it must be in the form of (38) for a critical point \(\bar {\boldsymbol {\varsigma }}\in \mathcal {S}_c\) of \(\Pi^g({\boldsymbol{\varsigma}})\).

This unified analytical solution form holds not only for general global optimization problems in finite-dimensional systems [25] but also for a large class of nonlinear boundary-/initial-value problems in nonconvex analysis and dynamic systems [21, 23, 54].
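
A minimal sketch of the analytical solution form (38) on a one-dimensional double-well problem is given below. The instance is an assumption chosen for illustration, not an example from the source: \(\Pi(x) = \frac{1}{2}{\alpha}(\frac{1}{2}x^2 - \lambda)^2 - f x\) with \({\Lambda}(x) = \frac{1}{2}x^2\) and \({\Phi}(\xi) = \frac{1}{2}{\alpha}(\xi - \lambda)^2\), so that \(\mathbf{G}(\varsigma) = \varsigma\), \({\Phi}^*(\varsigma) = \varsigma^2/(2{\alpha}) + \lambda \varsigma\), and (37) gives \(\Pi^g(\varsigma) = -f^2/(2\varsigma) - \varsigma^2/(2{\alpha}) - \lambda\varsigma\). The helper names and the bisection solver are hypothetical.

```python
alpha, lam, f = 1.0, 2.0, 0.5  # hypothetical data

def primal(x):
    """Pi(x) = Phi(Lambda(x)) - f*x for the double-well instance."""
    return 0.5 * alpha * (0.5 * x * x - lam) ** 2 - f * x

def primal_grad(x):
    return alpha * (0.5 * x * x - lam) * x - f

def dual(s):
    """Pi^g(s) from (37) with G(s) = s and Phi^*(s) = s^2/(2 alpha) + lam*s."""
    return -f * f / (2.0 * s) - s * s / (2.0 * alpha) - lam * s

def dual_stationarity(s):
    """dPi^g/ds = 0 multiplied by 2 s^2, which removes the singularity at s = 0."""
    return 2.0 * s * s * (s / alpha + lam) - f * f

def bisect(h, lo, hi, iters=200):
    """Root of h on [lo, hi], assuming h(lo) and h(hi) have opposite signs."""
    h_lo = h(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if (h_mid > 0) == (h_lo > 0):
            lo, h_lo = mid, h_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Dual critical point with G(s) = s > 0, then the analytical solution x = G(s)^{-1} f.
s_bar = bisect(dual_stationarity, 0.0, 1.0)
x_bar = f / s_bar

# Coarse grid search over [-5, 5] used only as an independent check.
grid_min = min(primal(-5.0 + 0.001 * k) for k in range(10001))
print(s_bar, x_bar, primal(x_bar), dual(s_bar), grid_min)
```

For these data the dual critical point is positive, so \(\bar{x} = f/\bar{\varsigma}\) should agree with the grid minimum of Π up to the grid resolution, and the primal and dual values coincide as stated in Theorem 3.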

3.3 Triality Theory and NP-Hard Criterion

In order to study the extremality properties of the general problem \(({\mathcal {P}}_g)\), we need additional assumptions on the generalized objective function G(D χ).

Assumption 1 (Canonically Convex Function)

Let G(D χ) be a generalized objective function, i.e., there exist a measure \({\Lambda }:\mathcal {X}_a \rightarrow \mathcal {E}_a\) and a canonical function \({\Phi }:\mathcal {E}_a \rightarrow {\mathbb R}\) such that G(D χ) =  Φ( Λ(χ)). We assume:

  • (A1) Nonlinearity: \({\Lambda }({\boldsymbol {\chi }}): \mathcal {X}_a \rightarrow \mathcal {E}_a\) is a quadratic measure,

  • (A2) Regularity: Φ( Λ(χ)) is twice continuously differentiable for all \({\boldsymbol {\chi }} \in \mathcal {X}_a\),

  • (A3) Convexity: \(\Phi :\mathcal {E}_a \rightarrow {\mathbb R}\) is strictly convex.

One should emphasize that although Φ(ξ) is required to be strictly convex on \(\mathcal {E}_a\) in this section, the composition Φ( Λ(χ)) is usually nonconvex on \(\mathcal {X}_a\) due to the geometrical nonlinearity. The canonical duality theory presented in this section can be generalized to problems governed by a high-order nonlinear Λ(χ) and a canonically concave function Φ(ξ) (see [18, 22, 46, 56]).

Definition 6 (Degenerate and Nondegenerate Critical Points, Morse Function)

Let \(\bar {\boldsymbol {\chi }} \in \mathcal {X}_c\) be a critical point of a real-valued function \(\Pi :\mathcal {X}_c \rightarrow {\mathbb R}\). \(\bar {\boldsymbol {\chi }} \) is called degenerate (resp. nondegenerate) if the Hessian matrix of Π(χ) is singular (resp., nonsingular) at \(\bar {\boldsymbol {\chi }} \). The function \(\Pi :\mathcal {X}_c \rightarrow {\mathbb R}\) is called a Morse function if it has no degenerate critical points.

Theorem 4 (Triality Theory [20])

Suppose that \({\Phi }:\mathcal {E}_a \rightarrow {\mathbb R}\) is convex, \((\bar {\boldsymbol {\chi }}, \bar {\boldsymbol {\varsigma }})\) is a nondegenerate critical point of Ξ(χ, ς), and \(\mathcal {X}_{o}\times \mathcal {S}_{o} \) is a neighborhood Footnote 12 of \(\left ( \bar {\boldsymbol {\chi }},\bar {\boldsymbol {\varsigma }} \right ) \).

If \(\bar {\boldsymbol {\varsigma }} \in \mathcal {S}^+_c = \{ {\boldsymbol {\varsigma }} \in \mathcal {S}_c | \;\; \mathbf {G}({\boldsymbol {\varsigma }} ) \succeq 0 \}\) , then

$$\displaystyle \begin{aligned} \Pi(\bar{\boldsymbol{\chi}}) = \min_{{\boldsymbol{\chi}} \in \mathcal{X}_c} \Pi\left( {\boldsymbol{\chi}}\right) =\max_{{\boldsymbol{\varsigma}} \in \mathcal{S}^+_c} \Pi^g\left({\boldsymbol{\varsigma}} \right) = \Pi^g(\bar{\boldsymbol{\varsigma}}) . {} \end{aligned} $$
(39)

If \(\bar {\boldsymbol {\varsigma }} \in \mathcal {S}^-_c = \{ {\boldsymbol {\varsigma }} \in \mathcal {S}_c| \;\; \mathbf {G}({\boldsymbol {\varsigma }} ) \prec 0 \}\) , then we have either

$$\displaystyle \begin{aligned} \Pi(\bar{\boldsymbol{\chi}}) = \max_{{\boldsymbol{\chi}} \in \mathcal{X}_o} \Pi\left( {\boldsymbol{\chi}}\right) =\max_{{\boldsymbol{\varsigma}} \in \mathcal{S}_o} \Pi^g\left({\boldsymbol{\varsigma}} \right) = \Pi^g(\bar{\boldsymbol{\varsigma}}) , {} \end{aligned} $$
(40)

or (if \(\dim \Pi = \dim \Pi ^g\) )

$$\displaystyle \begin{aligned} \Pi(\bar{\boldsymbol{\chi}}) = \min_{{\boldsymbol{\chi}} \in \mathcal{X}_o} \Pi\left( {\boldsymbol{\chi}}\right) =\min_{{\boldsymbol{\varsigma}} \in \mathcal{S}_o} \Pi^g\left({\boldsymbol{\varsigma}} \right) = \Pi^g(\bar{\boldsymbol{\varsigma}}) . {} \end{aligned} $$
(41)

The statement (39) is the so-called canonical min-max duality, which can be proved easily by Gao and Strang’s work in 1989 [43]. Clearly, \({\boldsymbol {\varsigma }} \in \mathcal {S}^+_c\) if and only if \(G_{ap}({\boldsymbol {\chi }}, {\boldsymbol {\varsigma }}) \ge 0 \;\; \forall {\boldsymbol {\chi }} \in \mathcal {X}\). This duality theory shows that the Gao-Strang gap function provides a global optimum criterion. The statements (40) and (41) are called the canonical double-max and double-min dualities, respectively, which can be used to find local extremum solutions. The triality theory shows that the nonconvex minimization problem \(({\mathcal {P}})\) is canonically dual to the following maximum stationary problem:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^d): \;\; \max \mathrm{sta} \{ \Pi^g({\boldsymbol{\varsigma}}) | \;\; {\boldsymbol{\varsigma}} \in \mathcal{S}^+_c \}. \end{aligned} $$
(42)
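
The three statements of the triality theory can be observed on a small example. The sketch below uses a hypothetical one-dimensional double-well instance, \(\Pi(x) = \frac{1}{2}{\alpha}(\frac{1}{2}x^2 - \lambda)^2 - f x\), for which \(\mathbf{G}(\varsigma) = \varsigma\), \(\Pi^g(\varsigma) = -f^2/(2\varsigma) - \varsigma^2/(2{\alpha}) - \lambda\varsigma\), and \(\dim \Pi = \dim \Pi^g = 1\). The data, the root brackets, and the helper names are assumptions made here. Each dual critical point is mapped to \(x = f/\varsigma\) and both second derivatives are inspected.

```python
alpha, lam, f = 1.0, 2.0, 0.5  # hypothetical data

def primal(x):
    return 0.5 * alpha * (0.5 * x * x - lam) ** 2 - f * x

def primal_hess(x):
    """Second derivative of Pi."""
    return alpha * (1.5 * x * x - lam)

def dual_hess(s):
    """Second derivative of Pi^g(s) = -f^2/(2s) - s^2/(2 alpha) - lam*s."""
    return -f * f / s ** 3 - 1.0 / alpha

def h(s):
    """Dual stationarity condition multiplied by 2 s^2."""
    return 2.0 * s * s * (s / alpha + lam) - f * f

def bisect(g, lo, hi, iters=200):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Brackets chosen by inspecting the sign of h for these particular data.
s_plus = bisect(h, 0.0, 1.0)     # G > 0: canonical min-max duality (39)
s_mid = bisect(h, -1.0, 0.0)     # G < 0: double-min duality (41)
s_minus = bisect(h, -2.0, -1.0)  # G < 0: double-max duality (40)

x_plus, x_mid, x_minus = f / s_plus, f / s_mid, f / s_minus
for name, s, x in [("min-max", s_plus, x_plus),
                   ("double-min", s_mid, x_mid),
                   ("double-max", s_minus, x_minus)]:
    print(name, s, x, primal_hess(x), dual_hess(s))
```

In this instance the positive root gives the global minimizer of Π and a local maximizer of \(\Pi^g\), while the two negative roots give a local minimizer and a local maximizer of Π that are, respectively, a local minimizer and a local maximizer of \(\Pi^g\).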

Theorem 5 (Existence and Uniqueness Criteria [25])

For a properly posed \(({\mathcal {P}})\) , if the canonical function \(\Phi :\mathcal {E}_a \rightarrow {\mathbb R}\) is convex, int \( \mathcal {S}^+_c \neq \emptyset \) , and

$$\displaystyle \begin{aligned} \lim_{{\alpha} \rightarrow 0^+} \Pi^g({\boldsymbol{\varsigma}}_o + {\alpha} {\boldsymbol{\varsigma}}) = - \infty \;\; \forall {\boldsymbol{\varsigma}}_o \in \partial \mathcal{S}^+_c, \;\; \forall {\boldsymbol{\varsigma}} \in \mathcal{S}^+_c, \end{aligned} $$
(43)

then \(({\mathcal {P}}^d)\) has at least one solution \(\bar {\boldsymbol {\varsigma }}\in \mathcal {S}^+_c\) and \(\bar {\boldsymbol {\chi }} = [\mathbf {G}(\bar {\boldsymbol {\varsigma }}) ]^+\mathbf {f}\) is a solution to \(({\mathcal {P}})\). The solution is unique if \(\mathbf {H} = \mathbf {G}(\bar {\boldsymbol {\varsigma }}) \succ 0\).

Proof

Under the required conditions, \(-\Pi ^g:\mathcal {S}^+_c \rightarrow {\mathbb R}\) is convex and coercive and int\(\mathcal {S}^+_c \neq \emptyset \). Therefore, \(({\mathcal {P}}^d)\) has at least one solution. If H ≻ 0, then \(\Pi ^g:\mathcal {S}^+_c \rightarrow {\mathbb R}\) is strictly concave and \(({\mathcal {P}}^d)\) has a unique solution.

This theorem shows that if int\(\mathcal {S}^+_c \neq \emptyset \), the nonconvex problem \(({\mathcal {P}}_g)\) is canonically dual to \(({\mathcal {P}}^d)\), which can be solved easily. Otherwise, the problem \(({\mathcal {P}}_g)\) is canonically dual to the following minimal stationary problem, i.e., to find a global minimum stationary value of \(\Pi^g\) on \(\mathcal {S}_c\):

$$\displaystyle \begin{aligned} ({\mathcal{P}}^s): \;\; \min \mathrm{sta} \{ \Pi^g({\boldsymbol{\varsigma}}) | \;\; {\boldsymbol{\varsigma}} \in \mathcal{S}_c \}, \end{aligned} $$
(44)

which could be really NP-hard since \(\Pi^g({\boldsymbol{\varsigma}})\) is nonconvex on the nonconvex set \(\mathcal {S}_c\). Therefore, a conjecture was proposed in [24].

Conjecture 1 (Criterion of NP-Hardness)

A properly posed problem \(({\mathcal {P}}_g)\) is NP-hard only if int \(\mathcal {S}^+_c = \emptyset \).

Remark 3 (History of Triality and Challenges)

The triality theory was discovered by the author during his research on the post-buckling of a large deformed elastic beam in 1996 [12], where the primal variable u(x) is a displacement vector in \({\mathbb R}^2\) and ς(x) is a canonical dual stress also in \({\mathbb R}^2\). Therefore, the triality theory was correctly proposed in nonconvex analysis, where it provided for the first time a complete set of solutions to the post-buckling problem. Physically, the global minimizer \(\bar {\mathbf {u}}({\mathbf {x}})\) represents a stable buckled beam configuration (which occurs naturally), the local minimizer is an unstable buckled state (which occurs occasionally), while the local maximizer is the unbuckled beam state. A mathematical proof of the triality theory was given in [18] for one-dimensional nonconvex variational problems (Theorem 2.6.2) and for finite-dimensional optimization problems (Theorem 5.3.6 and Corollary 5.3.1). In 2002, the author discovered some counterexamples to the canonical double-min duality when \(\dim \Pi \neq \dim \Pi ^g\). Therefore, the triality theory was presented in an “either-or” form, since the double-max duality is always true but the double-min duality holds only under a certain additional condition (see Remark 1 in [22] and Remark for Theorem 3 in [23]). Recently, the author and his co-workers proved that the canonical double-min duality holds weakly when \(\dim \Pi \neq \dim \Pi ^g\) [6, 61]. It was also discovered by using the canonical dual finite element method that the local minimum solutions in nonconvex mechanics are very sensitive not only to the input and boundary conditions of a given system but also to such artificial conditions as the numerical discretization and computational precision. The triality theory provides a precise mathematical tool for studying and understanding complicated natural phenomena.

The triality theory has been repeatedly challenged by M.D. Voisei and C. Zălinescu in a set of at least 11 papers (see [29]). These papers fall into three groups. In the first group (say [78, 83]), they oppositely choose piecewise linear functions for G and quadratic functions for F as counterexamples against the canonical duality theory, with six conclusions on Gao and Strang’s original work including [83]: “About the (complementary) gap function one can conclude that it is useless at least in the current context. The hope for reading an optimization theory with diverse applications is ruined …” Clearly, they made conceptual mistakes. In the second group, Voisei and Zălinescu chose an artificial problem with certain symmetry such that \(\mathcal {S}^+_a = \emptyset \). Such a problem can be solved easily by linear perturbation (see [62]). The counterexamples in the third group are simply those such that \(\dim \Pi \neq \dim \Pi ^g\). This type of counterexample was first discovered by Gao in 2002, so it was emphasized in [22, 23] that the canonical double-min duality holds under certain additional constraints (see Remark 1 in [22] and Remark for Theorem 3 in [23]). But neither [23] nor [22] was cited by Zălinescu and his co-authors in their papers. Honest people can easily understand the motivation of these challenges.

The canonical duality-triality theory has been successfully used for solving a wide class of problems in both global optimization and nonconvex analysis [39], including certain challenging problems in nonconvex analysis [19], nonlinear PDEs [33], large deformation mechanics [27], and NP-hard integer programming problems [24, 31]. ♣

4 Applications in Complex Systems

Applications to nonconvex constrained global optimization have been discussed in [40, 53]. This section presents applications to two general global optimization problems.

4.1 Unconstrained Nonconvex Optimization Problem

$$\displaystyle \begin{aligned} ({\mathcal{P}}_g): \;\;\; \min \left\{ \Pi({\boldsymbol{\chi}}) = \sum_{s=1}^m \Phi_s({\Lambda}_s({\boldsymbol{\chi}})) - \langle {\boldsymbol{\chi}}, \mathbf{f} \rangle | \;\; {\boldsymbol{\chi}}\in \mathcal{X}_c\right\} , \end{aligned} $$
(45)

where the canonical measures \(\xi_s = {\Lambda}_s({\boldsymbol{\chi}})\) could be either scalars or generalized matrices, and \(\Phi_s(\xi_s)\) are any given canonical functions, such as polynomial, exponential, and logarithmic functions and their compositions. For example, if \({\boldsymbol {\chi }} \in \mathcal {X}_c \subset {\mathbb R}^n\) and

$$\displaystyle \begin{aligned} \begin{array}{rcl} {G}(\mathbf{D} {\boldsymbol{\chi}}) &\displaystyle =&\displaystyle \sum_{i\in \mathbb{I}} \frac{1}{2} {{\alpha}_i} {\boldsymbol{\chi}}^T {\mathbf{Q}}_i {\boldsymbol{\chi}} + \sum_{j \in \mathbb{J }} \frac{1}{2} {\alpha}_j \left( \frac{1}{2} {\boldsymbol{\chi}}^T {\mathbf{Q}}_j {\boldsymbol{\chi}} + \beta_j \right)^2 \\ &\displaystyle +&\displaystyle \sum_{k \in \mathbb{K }} {\alpha}_k \exp \left( \frac{1}{2} {\boldsymbol{\chi}}^T {\mathbf{Q}}_k {\boldsymbol{\chi}} \right) + \sum_{\ell \in \mathbb{L}} \frac{1}{2} {\alpha}_\ell {\boldsymbol{\chi}}^T {\mathbf{Q}}_{\ell} {\boldsymbol{\chi}} \log \left( \frac{1}{2} {\boldsymbol{\chi}}^T {\mathbf{Q}}_{\ell} {\boldsymbol{\chi}} \right) , \end{array} \end{aligned} $$
(46)

where \(\{ {\mathbf{Q}}_s \}\) are positive-definite matrices to allow the Cholesky decomposition \({\mathbf {Q}}_s = {\mathbf {D}}^T_s {\mathbf {D}}_s\) for all \( s \in \{ \mathbb {I}, \mathbb {J}, \mathbb {K}, \mathbb {L}\}\) and \(\{ {\alpha}_s, \beta_s \}\) are physical constants, which could be either positive or negative under Assumption 1. This general function includes naturally the so-called d.c. functions (i.e., difference of convex functions). Let \( p = \dim \mathbb {I}, \;\; q = \dim \mathbb {J} + \dim \mathbb {K}+ \dim \mathbb {L} \). By using the canonical measure:

$$\displaystyle \begin{aligned} {\boldsymbol{\xi}} = \{ \xi_s \} = \left \{\frac{1}{2} {{\alpha}_i} {\boldsymbol{\chi}}^T {\mathbf{Q}}_i {\boldsymbol{\chi}} , \frac{1}{2} {\boldsymbol{\chi}}^T {\mathbf{Q}}_r {\boldsymbol{\chi}} \right \} \in \mathcal{E}_a = {\mathbb R}^p \times {\mathbb R}^{q}_+ , \end{aligned}$$

where \({\mathbb R}^q_+ = \{ \mathbf {x} \in {\mathbb R}^q| \; x_i \ge 0 \; \forall i = 1, \dots , q\}\), G(g) can be written in the canonical form:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\Phi} ({\boldsymbol{\xi}}) &\displaystyle =&\displaystyle \sum_{i\in \mathbb{I}} \xi_i + \sum_{j\in \mathbb{J}} \frac{1}{2} {\alpha}_j (\xi_j + \beta_j)^2 + \sum_{k\in \mathbb{K}} {\alpha}_k \exp \xi_k + \sum_{\ell\in \mathbb{L}} {\alpha}_\ell \xi_\ell \log\xi_\ell . \end{array} \end{aligned} $$

Thus, \(\partial \Phi({\boldsymbol{\xi}}) = \{ 1, \varsigma_r \}\), in which \({\boldsymbol {\varsigma }} = \{ {\alpha }_j (\xi _j + \beta _j), {\alpha }_k \exp \xi _k, \; {\alpha }_\ell ( \log \xi _\ell + 1) \} \in \mathcal {E}^*_a\), and

$$\displaystyle \begin{aligned} \mathcal{E}^*_a = \{ {\boldsymbol{\varsigma}}\in {\mathbb R}^q |\; \varsigma_j \ge - {\alpha}_j \beta_j \;\; \forall j \in \mathbb{J}, \;\; \varsigma_k \ge {\alpha}_k \;\;\forall k \in \mathbb{K}, \;\; \varsigma_\ell \in {\mathbb R} \;\; \forall \ell \in \mathbb{L}\}. \end{aligned}$$

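The conjugate of Φ can be easily obtained by the Legendre transformation of each nonlinear term as:

$$\displaystyle \begin{aligned} \Phi^*({\boldsymbol{\varsigma}}) = \sum_{j\in \mathbb{J}} \left( \frac{1}{2 {\alpha}_j} \varsigma_j^2 - \beta_j \varsigma_j \right) + \sum_{k\in \mathbb{K}} \varsigma_k \left( \log \frac{\varsigma_k}{{\alpha}_k} - 1 \right) + \sum_{\ell\in \mathbb{L}} {\alpha}_\ell \exp \left( \frac{\varsigma_\ell}{{\alpha}_\ell} - 1 \right) . \end{aligned} $$
(47)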

Since Λ(χ) is a homogeneous quadratic operator, the gap function \(G_{ap}\) and \(G_{ap}^{\Lambda }\) in this case are

$$\displaystyle \begin{aligned} \begin{array}{c}\displaystyle G_{ap}({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) = \frac{1}{2} {\boldsymbol{\chi}}^T \mathbf{G}({\boldsymbol{\varsigma}}) {\boldsymbol{\chi}}, \;\; G_{ap}^{\Lambda}({\boldsymbol{\varsigma}}) = \frac{1}{2} {\mathbf{f}}^T [\mathbf{G}({\boldsymbol{\varsigma}})]^+ \mathbf{f}, \\ {} \displaystyle\mathbf{G}({\boldsymbol{\varsigma}}) = \sum_{i \in \mathbb{I}} {\alpha}_i {\mathbf{Q}}_i + \sum_{s\in \{ \mathbb{J}, \mathbb{K}, \mathbb{L}\}} \varsigma_s {\mathbf{Q}}_s . \end{array} \end{aligned}$$

Since \(\Pi ^g({\boldsymbol {\varsigma }}) = -G_{ap}^{\Lambda }({\boldsymbol {\varsigma }}) - \Phi ^*({\boldsymbol {\varsigma }}) \) is concave and \(\mathcal {S}^+_c\) is a closed convex set, if the given physical constants and the input f are such that \( \mathcal {S}^+_c\neq \emptyset \), then the canonical dual problem \(({\mathcal {P}}^g)\) has at least one solution \(\bar {\boldsymbol {\varsigma }}\in \mathcal {S}^+_c \subset {\mathbb R}^q\), and \(\bar {\boldsymbol {\chi }} = [\mathbf {G}(\bar {\boldsymbol {\varsigma }})]^+ \mathbf {f} \in \mathcal {X}_c \subset {\mathbb R}^n\) is a global minimum solution to \(({\mathcal {P}})\). If n ≫ q, the dual problem \(({\mathcal {P}}^g)\) can be much easier to solve than \(({\mathcal {P}})\).
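
As an illustration of this primal-dual relation, the following minimal sketch treats the simplest nonconvex instance: a single quartic term with \({\mathbf{Q}}_j = \mathbf{I}\), so that \(\xi = \frac{1}{2} |{\boldsymbol{\chi}}|^2\), \(\mathbf{G}(\varsigma) = \varsigma \mathbf{I}\), and the dual is a one-dimensional concave maximization on ς > 0. The function names, the numerical data, and the bisection bracket are assumptions made for this example only; the sketch also assumes α > 0 and f ≠ 0.

```python
def primal_value(chi, f, alpha, beta):
    """Pi(chi) = 0.5*alpha*(0.5*|chi|^2 + beta)^2 - <chi, f> (one quartic term, Q = I)."""
    xi = 0.5 * sum(c * c for c in chi)
    return 0.5 * alpha * (xi + beta) ** 2 - sum(c * fi for c, fi in zip(chi, f))


def dual_value(s, f, alpha, beta):
    """Pi^g(s) = -|f|^2/(2 s) - (s^2/(2 alpha) - beta*s), defined for s > 0."""
    f2 = sum(fi * fi for fi in f)
    return -f2 / (2.0 * s) - (s * s / (2.0 * alpha) - beta * s)


def solve_dual(f, alpha, beta, lo=1e-9, hi=1e3, iters=200):
    """Maximize Pi^g on s > 0 by bisection on its derivative.

    For alpha > 0 and f != 0 the derivative is strictly decreasing,
    positive near s = 0 and negative for large s, so it has one root.
    The bracket [lo, hi] is an assumption suited to the data below.
    """
    f2 = sum(fi * fi for fi in f)

    def slope(s):
        return f2 / (2.0 * s * s) - s / alpha + beta

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data: double-well potential (beta < 0) with a small load f.
f = [0.5, 0.0]
alpha, beta = 1.0, -2.0

s_bar = solve_dual(f, alpha, beta)
chi_bar = [fi / s_bar for fi in f]  # chi = G(s)^{-1} f with G(s) = s * I

print("dual solution  :", s_bar)
print("primal solution:", chi_bar)
print("primal value   :", primal_value(chi_bar, f, alpha, beta))
print("dual value     :", dual_value(s_bar, f, alpha, beta))
```

In this instance the two-dimensional nonconvex primal problem is replaced by a scalar concave problem, and the primal and dual values coincide at the computed pair, which is the behavior the statement above describes.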

4.2 D.C. Programming

It is known that in Euclidean space every continuous global optimization problem on a compact set can be reformulated as a d.c. optimization problem, i.e., a nonconvex problem which can be described in terms of d.c. functions (difference of convex functions) and d.c. sets (difference of convex sets) [82]. By the fact that any constraint set can be equivalently relaxed by a nonsmooth indicator function, general nonconvex optimization problems can be written in the following standard d.c. programming form:

$$\displaystyle \begin{aligned} \min \{ f(\mathbf{x})= g(\mathbf{x})-h(\mathbf{x}) \; | \;\; \forall {\mathbf{x}\in \mathcal{X}} \}, \end{aligned} $$
(48)

where \(\mathcal {X} = {\mathbb R}^n\), and g(x), h(x) are convex proper lower semicontinuous functions on \({\mathbb R}^n\). A more general model is that g(x) can be an arbitrary function [82]. Clearly, this d.c. programming problem is too abstract. Although it can be used to “model” a very wide range of mathematical problems [47], it is impossible to have an elegant theory and powerful algorithms for solving this problem without detailed structures on these arbitrarily given functions. After extensive study during the last thirty years (cf. [48, 79]), even some very simple d.c. programming problems are still considered to be NP-hard [82].

Based on the canonical duality theory, a generalized d.c. programming problem \(({\mathcal {P}}_{dc})\) can be presented in a canonical d.c. minimization problem form:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{dc}): \;\; \min \left\{ \Pi({\boldsymbol{\chi}}) = \Phi(\Lambda ({\boldsymbol{\chi}}) ) - \frac{1}{2} \langle {\boldsymbol{\chi}} , \mathbf{ H} {\boldsymbol{\chi}} \rangle - \langle {\boldsymbol{\chi}}, \bar{\boldsymbol{\chi}}^* \rangle | \;\; {\boldsymbol{\chi}} \in \mathcal{ X}_c \right\}, \end{aligned} $$
(49)

where H is a given positive-definite generalized matrix.

Since the canonical measure \({\boldsymbol {\xi }} = \Lambda ({\boldsymbol {\chi }}) \in \mathcal {E}_a\) is nonlinear and Φ(ξ) is convex on \(\mathcal {E}_a\), the composition Φ( Λ(χ)) has a higher-order nonlinearity than a quadratic function. Therefore, the coercivity for the target function Π(χ):

$$\displaystyle \begin{aligned} \lim_{\|{\boldsymbol{\chi}} \| \rightarrow \infty } \Pi ({\boldsymbol{\chi}}) = \infty {} \end{aligned} $$
(50)

should be naturally satisfied for many real-world problems, which is a sufficient condition for existence of a global minimal solution to \(({\mathcal {P}}_{dc})\) (otherwise, the set \(\mathcal {X}_c \) should be bounded). Clearly, this generalized d.c. minimization problem can be used to model a reasonably large class of real-world problems in multi-disciplinary fields [34, 49].

4.3 Fixed Point Problems

The fixed point problem is a well-established subject in nonlinear analysis, and it is usually formulated in the following form:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{fp}): \;\;\;\; \mathbf{x} = {F}(\mathbf{x}), \end{aligned} $$
(51)

where \({F}:\mathcal {X}_a \rightarrow \mathcal {X}_a\) is a nonlinear mapping and \( \mathcal {X}_a \) is a subset of a normed space \(\mathcal {X} \). Problem \(({\mathcal {P}}_{fp})\) appears extensively in engineering and the sciences, for example in equilibrium problems, mathematical economics, game theory, and numerical methods for nonlinear dynamical systems. It was realized in [72] that this well-studied field is actually a special subject of global optimization.

Lemma 2

If F is a potential operator, i.e., there exists a real-valued function \({P}:\mathcal {X}_a \rightarrow {\mathbb R}\) such that F(x) = ∇P(x), then \(({\mathcal {P}}_{fp})\) is equivalent to the following stationary point problem:

$$\displaystyle \begin{aligned} \bar{\mathbf{x}} = \arg \mathrm{sta} \left \{ \Pi(\mathbf{x})= {P}(\mathbf{x})-\frac{1}{2}\|\mathbf{x} \|{}^2 \; | \;\; \forall \mathbf{x}\in \mathcal{X}_a \right \}. \end{aligned} $$
(52)

Otherwise, \(({\mathcal {P}}_{fp})\) is equivalent to the following global minimization problem:

$$\displaystyle \begin{aligned} \bar{\mathbf{x}} = \arg \min \left \{ \Pi(\mathbf{x})= \frac{1}{2} \|{F}(\mathbf{x}) - \mathbf{x}\|{}^2 \; | \;\; \forall \mathbf{x} \in \mathcal{X}_a \right \}. \end{aligned} $$
(53)

Proof

First, assume that F(x) is a potential operator. Then x is a stationary point of Π(x) if and only if ∇Π(x) = ∇P(x) − x = 0; thus x is also a solution to \(({\mathcal {P}}_{fp})\) since F(x) = ∇P(x).

Now, assume that F(x) is not a potential operator. Since \(\Pi (\mathbf {x}) = \frac {1}{2} \|{F}(\mathbf {x}) - \mathbf {x} \|{ }^2 \ge 0 \;\;\forall \mathbf {x} \in \mathcal {X}\), whenever \(({\mathcal {P}}_{fp})\) has a solution, the vector \(\bar {\mathbf {x}}\) is a global minimizer of Π(x) if and only if \({F}({\bar {\mathbf {x}}}) - \bar {\mathbf {x}} = 0 \). Thus, \(\bar {\mathbf {x}}\) must be a solution to \(({\mathcal {P}}_{fp})\).

By the facts that the global minimizer of an unconstrained optimization problem must be a stationary point, and

$$\displaystyle \begin{aligned} \frac{1}{2} \|{F}(\mathbf{x})-\mathbf{x}\|{}^2 = {P}(\mathbf{x}) - \frac{1}{2} \|\mathbf{x}\|{}^2, \;\; {P}(\mathbf{x})=\frac{1}{2} \langle {F}(\mathbf{x}) , {F}(\mathbf{x}) \rangle - \langle \mathbf{x} , {F}(\mathbf{x}) \rangle +\|\mathbf{x}\|{}^2 ,{}\end{aligned} $$
(54)

the global minimization problem (53) is a special case of the stationary point problem (52). Mathematically speaking, if a fixed point problem has a trivial solution, then F(x) must be a homogeneous operator, i.e., F(0) = 0. For general problems, F(x) should have a nonhomogeneous term \(\mathbf {f} \in {\mathbb R}^n\). We can therefore let \(P(\mathbf{x}) = G(\mathbf{D}\mathbf{x}) - \langle \mathbf{x}, \mathbf{f} \rangle\), where \(\mathbf {D}: \mathcal {X} \rightarrow \mathcal {G} \subset {\mathbb R}^m\) is a linear operator and \({G}:\mathcal {G} \rightarrow {\mathbb R}\) is a generalized objective function. The fixed point problem \(({\mathcal {P}}_{fp})\) can then be reformulated as the following stationary point problem:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{fp}):~~~~ \bar{\mathbf{x}} = \arg \mathrm{sta} \left\{ \Pi(\mathbf{x})= {G}(\mathbf{D} \mathbf{x})-\frac{1}{2}\|\mathbf{x}\|{}^2 - \langle \mathbf{x} , \mathbf{ f} \rangle \; | \;\; \forall \mathbf{x} \in \mathcal{X}_c \right\}. \end{aligned} $$
(55)

Clearly, the fixed point problem is actually equivalent to a d.c. programming problem. A canonical duality theory for solving this fixed point problem was given recently in [72].
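
The two reformulations in Lemma 2 can be checked on a scalar toy problem. The sketch below, which is only an illustration under stated assumptions, takes F(x) = cos x (a potential operator with P(x) = sin x), finds the stationary point of Π(x) = P(x) − ½x² by Newton's method, and independently minimizes the least-squares target ½(F(x) − x)² by golden-section search. The choice of F, the starting point, the search interval [0, 1.5], and all function names are assumptions made for this example.

```python
import math


def newton_stationary(x, iters=50):
    """Stationary point of Pi(x) = sin(x) - x^2/2, i.e. a root of cos(x) - x."""
    for _ in range(iters):
        grad = math.cos(x) - x      # Pi'(x) = F(x) - x
        hess = -math.sin(x) - 1.0   # Pi''(x), strictly negative here
        x -= grad / hess
    return x


def golden_min(fun, a, b, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal function on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def least_squares_target(x):
    """Pi(x) = 0.5 * (F(x) - x)^2 with F = cos, as in the minimization form (53)."""
    return 0.5 * (math.cos(x) - x) ** 2


x_sta = newton_stationary(1.0)
x_min = golden_min(least_squares_target, 0.0, 1.5)

print("stationary-point form :", x_sta)
print("least-squares form    :", x_min)
print("residual F(x) - x     :", math.cos(x_sta) - x_sta)
```

Both routes return the same fixed point of the cosine map, which is what the equivalence in Lemma 2 predicts for this instance.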

4.4 Mixed Integer Nonlinear Programming (MINLP)

The decision variable for (MINLP) is \({\boldsymbol {\chi }} = \{ {\mathbf {y}}, {\mathbf {z}} \} \in \mathcal {Y}_a \times \mathcal {Z}_a\), where \(\mathcal {Y}_a\) is a continuous variable set and \(\mathcal {Z}_a \) is a set of integers. It was shown in [69] that for any given integer set \(\mathcal {Z}_a\), there exists a linear transformation \({\mathbf {D}}_z :\mathcal {Z}_a \rightarrow \mathbb {Z} = \{ \pm 1\}^n\). Thus, based on the unified problem \(({\mathcal {P}}_g)\), a general MINLP problem can be proposed as:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{mi}): \;\; \min \{ \Pi({\mathbf{y}},{\mathbf{z}}) = {G}({\mathbf{D}}_y {\mathbf{y}}, {\mathbf{D}}_z {\mathbf{z}}) - \langle {\mathbf{y}} , {\mathbf{s}} \rangle - \langle {\mathbf{z}} , {\mathbf{t}} \rangle \; | \;\;({\mathbf{y}},{\mathbf{z}}) \in \mathcal{Y}_c \times {\mathcal{Z}}_c \}, \end{aligned} $$
(56)

where f = (s, t) is a given input, \({\mathbf {D}}{\boldsymbol {\chi }} = ( {\mathbf {D}}_y {\mathbf {y}}, \; {\mathbf {D}}_z {\mathbf {z}}) \in \mathcal {G}_y \times \mathbb {Z} \) is a multi-scale operator, and

$$\displaystyle \begin{aligned} \mathcal{Y}_c = \{ {\mathbf{y}} \in \mathcal{Y}_a \; | \;\; {\mathbf{D}}_y {\mathbf{y}} \in \mathcal{G}_y \}, \; \; \mathcal{Z}_c = \{ {\mathbf{z}} \in \mathcal{Z}_a | \; {\mathbf{D}}_z {\mathbf{z}} \in \mathbb{Z} \}. \end{aligned}$$

In \(\mathcal {Y}_a\), certain linear constraints are given. Since the set \(\mathbb {Z}_a\) is bounded, by Assumption 1 either \({G}:\mathcal {G}_y \rightarrow {\mathbb R}\) is coercive or \(\mathcal {G}_y\) is bounded. This general problem \(({\mathcal {P}}_{mi}) \) covers many real-world applications, including the so-called fixed cost problem [41]. Let

$$\displaystyle \begin{aligned} {\mathbf{g}} = {\boldsymbol{\Lambda}}_z({\mathbf{z}}) = ({\mathbf{D}}_z {\mathbf{z}}) \circ ({\mathbf{D}}_z {\mathbf{z}}) \in \mathcal{E}_z = {\mathbb R}^n_+, \end{aligned} $$
(57)

where \(\mathbf{x} \circ \mathbf{y} = \{ x_i y_i \}^n\) is the Hadamard product in \({\mathbb R}^n\). The integer constraint in \(\mathbb {Z}\) can then be relaxed by the canonical function \(\Psi(\mathbf{g}) = \{ 0 \mbox{ if } \mathbf{g} \le \mathbf{e}, \;\; \infty \mbox{ otherwise} \}\), where \(\mathbf{e} = \{ 1 \}^n\). Therefore, the canonical form of \(({\mathcal {P}}_{mi})\) is

$$\displaystyle \begin{aligned} \min \{ \Pi({\mathbf{y}},{\mathbf{z}}) = \Phi({\boldsymbol{\Lambda}}({\mathbf{y}},{\mathbf{z}})) + \Psi({\boldsymbol{\Lambda}}_z({\mathbf{z}})) - \langle {\mathbf{y}} , {\mathbf{s}} \rangle - \langle {\mathbf{z}}, {\mathbf{t}} \rangle \; | \; {\mathbf{y}}\in \mathcal{Y}_c \}. \end{aligned} $$
(58)

Since the canonical function Ψ(g) is convex and lower semicontinuous, its Fenchel conjugate is

$$\displaystyle \begin{aligned} \Psi^\sharp({\boldsymbol{\sigma}}) = \sup\{ \langle {\mathbf{g}} ; {\boldsymbol{\sigma}} \rangle - \Psi({\mathbf{g}}) | {\mathbf{g}} \in {\mathbb R}^n \} =\{ \langle {\mathbf{e}} ; {\boldsymbol{\sigma}} \rangle \;\; \mbox{ if } {\boldsymbol{\sigma}} \ge 0, \;\; \infty \mbox{ otherwise} \}. \end{aligned}$$

The generalized canonical duality relations (31) are \({\boldsymbol{\sigma}} \ge 0 \Leftrightarrow \mathbf{g} \le \mathbf{e} \Leftrightarrow \langle \mathbf{g} - \mathbf{e} ; {\boldsymbol{\sigma}} \rangle = 0\). The complementarity shows that the canonical integer constraint g = e can be relaxed by the condition σ > 0 in continuous space. Thus, if ξ = Λ(χ) is a homogeneous quadratic operator and the canonical function Φ(ξ) is convex on \(\mathcal {E}_a\), the canonical dual to \(({\mathcal {P}}_{mi})\) is

(59)

where G(ς, σ) depends on the quadratic operators Λ(χ) and \({\boldsymbol{\Lambda}}_z({\mathbf{z}})\), and \(\mathcal {S}^+_c\) is a convex open set:

$$\displaystyle \begin{aligned} \mathcal{S}^+_c = \{ ({\boldsymbol{\varsigma}}, {\boldsymbol{\sigma}}) \in \mathcal{E}^*_a \times {\mathbb R}^n_+ | \;\; {\mathbf{G}}({\boldsymbol{\varsigma}}, {\boldsymbol{\sigma}}) \succeq 0, \;\; {\boldsymbol{\sigma}} > 0 \}. \end{aligned} $$
(60)

The canonical duality-triality theory has been used successfully for solving mixed integer programming problems [35, 41]. In particular, for the quadratic integer programming problem:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{qi}): \;\;\; \min \left\{ \Pi({\mathbf{x}}) = \frac{1}{2} {\mathbf{x}}^T {\mathbf{Q}} {\mathbf{x}} - {\mathbf{x}}^T {\mathbf{f}} | \;\; {\mathbf{x}} \in \{ -1 , 1 \}^n \right\}, \end{aligned} $$
(61)

we have \(\mathcal {S}^+_c = \{ {\boldsymbol {\sigma }}\in {\mathbb R}^n_+ | \;\; {\mathbf {G}}({\boldsymbol {\sigma }}) = {\mathbf {Q}} + 2 {\mbox{Diag }}({\boldsymbol {\sigma }}) \succeq 0, \;\; {\boldsymbol {\sigma }} > 0 \}\) and

$$\displaystyle \begin{aligned} ({\mathcal{P}}^g_{qi}): \;\;\;\max \left \{ \Pi^g({\boldsymbol{\sigma}}) = -\frac{1}{2} {\mathbf{f}}^T [{\mathbf{G}}({\boldsymbol{\sigma}})]^{+} {\mathbf{f}} - {\mathbf{e}}^T {\boldsymbol{\sigma}} | \;\; {\boldsymbol{\sigma}} \in \mathcal{S}_c^+ \right\} \end{aligned} $$
(62)

which can be solved easily if int\(\mathcal {S}_c^+ \neq \emptyset \). Otherwise, \(({\mathcal {P}}_{qi})\) could be NP-hard since \(\mathcal {S}^+_c\) is an open set, which is a conjecture proposed in [24]. In this case, \(({\mathcal {P}}_{qi})\) is canonically dual to an unconstrained nonsmooth/nonconvex minimization problem [25].
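
The role of the dual feasible set can be illustrated on a very small instance. From the stationarity condition \({\mathbf{G}}({\boldsymbol{\sigma}}) {\mathbf{x}} = {\mathbf{f}}\) and \(x_i^2 = 1\), each \({\mathbf{x}} \in \{-1, 1\}^n\) corresponds to the dual point \({\boldsymbol{\sigma}} = \frac{1}{2} {\mathbf{x}} \circ ({\mathbf{f}} - {\mathbf{Q}} {\mathbf{x}})\). The sketch below does not solve the dual problem (62) itself; as a simplifying assumption it enumerates all x, keeps the one whose dual point satisfies σ > 0 and G(σ) ≻ 0, and compares it with the brute-force minimizer. The data Q, f and all function names are hypothetical.

```python
from itertools import product


def matvec(Q, x):
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in Q]


def primal(Q, f, x):
    """Pi(x) = 0.5 x^T Q x - x^T f."""
    Qx = matvec(Q, x)
    return 0.5 * sum(xi * qi for xi, qi in zip(x, Qx)) - sum(xi * fi for xi, fi in zip(x, f))


def is_positive_definite(G, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only if G is positive definite."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def dual_point(Q, f, x):
    """sigma = 0.5 * x o (f - Q x), obtained from G(sigma) x = f with x_i^2 = 1."""
    Qx = matvec(Q, x)
    return [0.5 * xi * (fi - qi) for xi, fi, qi in zip(x, f, Qx)]


def certified_minimizer(Q, f):
    """Return (x, sigma) with sigma > 0 and G(sigma) positive definite, or None."""
    n = len(f)
    for x in product([-1, 1], repeat=n):
        sigma = dual_point(Q, f, x)
        G = [[Q[i][j] + (2.0 * sigma[i] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        if all(s > 0.0 for s in sigma) and is_positive_definite(G):
            return list(x), sigma
    return None


# Hypothetical data: an indefinite Q and an input f.
Q = [[-1.0, 2.0], [2.0, -1.0]]
f = [3.0, -3.0]

x_bar, sigma_bar = certified_minimizer(Q, f)
# Since G(sigma) x = f, the dual value is -0.5 f^T x - e^T sigma.
dual_val = -0.5 * sum(fi * xi for fi, xi in zip(f, x_bar)) - sum(sigma_bar)
brute = min(product([-1, 1], repeat=2), key=lambda x: primal(Q, f, x))

print("certified x :", x_bar, " sigma:", sigma_bar)
print("primal value:", primal(Q, f, x_bar), " dual value:", dual_val)
print("brute force :", list(brute))
```

For this data the dual point lies in the interior of \(\mathcal{S}_c^+\), the primal and dual values agree, and the certified point matches the enumeration result.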

4.5 General Knapsack Problem and Analytical Solution

Knapsack problems appear in real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, resource allocation where there are financial constraints, selection of investments and portfolios, selection of assets for asset-backed securitization, and generating keys for the Merkle-Hellman and other knapsack cryptosystems. Mathematically, a general quadratic knapsack problem can be formulated as an integer programming problem:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{qk} ): \;\;\;\; \min \left \{ \Pi_{qk} ({\mathbf{z}}) = \frac{1}{2} {\mathbf{z}}^T {\mathbf{Q}} {\mathbf{z}} - {\mathbf{c}}^T {\mathbf{z}} | \;\; {\mathbf{z}} \in \{ 0, 1 \}^n , \;\; {\mathbf{v}}^T {\mathbf{z}} \le V_c \right\}, \end{aligned} $$
(63)

where \({\mathbf {Q}} \in {\mathbb R}^{n \times n}\) is a given symmetric, usually indefinite, matrix, \({\mathbf {c}}, \; {\mathbf {v}} \in {\mathbb R}^n\) are two given vectors, and \(V_c > 0\) is a design parameter.

The knapsack problem has been studied for more than a century, with early works dating as far back as 1897. The main difficulty in this problem is the integer constraint \({\mathbf{z}} \in \{0, 1\}^n\), so that even the simplest linear knapsack problem:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{lk}): \;\;\;\; \min \left \{ \Pi_{lk} ({\mathbf{z}}) = - {\mathbf{c}}^T {\mathbf{z}} | \;\; {\mathbf{z}} \in \{ 0, 1 \}^n , \;\; {\mathbf{v}}^T {\mathbf{z}} \le V_c \right\}, \end{aligned} $$
(64)

is listed as one of Karp’s 21 NP-complete problems [50].

By the fact that \({\boldsymbol {\alpha }} \circ {\mathbf {z}}^2 = {\boldsymbol {\alpha }}\circ {\mathbf {z}} \;\; \forall {\mathbf {z}} \in \{0,1\}^n, \; \forall {\boldsymbol {\alpha }} \in {\mathbb R}^n\), for any given symmetric \({\mathbf {Q}} \in {\mathbb R}^{n\times n}\) we can choose an \({\boldsymbol{\alpha}}\) such that \({\mathbf{Q}}_{\alpha} = {\mathbf{Q}} + 2 \mbox{Diag }({\boldsymbol{\alpha}}) \succeq 0\). Thus, with \({\mathbf{c}}_{\alpha} = {\mathbf{c}} + {\boldsymbol{\alpha}}\), the problem \(({\mathcal {P}}_{qk})\) can be equivalently written in the so-called α-perturbation form [25]:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{\alpha}): \;\; \min\left\{ \Pi_{{\alpha}} ({\mathbf{z}}) = \frac{1}{2} {\mathbf{z}}^T {\mathbf{Q}}_{\alpha} {\mathbf{z}} - {{\mathbf{c}}}_{\alpha}^T {\mathbf{z}} \;\; |\;\; {\mathbf{v}}^T {\mathbf{z}} \le V_c , \;\; {\mathbf{z}}\in \{ 0,1\}^n \right\} . \end{aligned} $$
(65)
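
The equivalence between the original and the α-perturbed target functions on \(\{0,1\}^n\) can be verified directly. In the sketch below the vector α is chosen by a Gershgorin-type diagonal dominance rule, which is only one possible choice and is an assumption of this example (the text above requires only that \({\mathbf{Q}}_{\alpha} \succeq 0\)). The data and function names are hypothetical.

```python
from itertools import product


def gershgorin_alpha(Q):
    """Nonnegative alpha making Q + 2 Diag(alpha) diagonally dominant, hence PSD."""
    n = len(Q)
    return [max(0.0, 0.5 * (sum(abs(Q[i][j]) for j in range(n) if j != i) - Q[i][i]))
            for i in range(n)]


def perturb(Q, c, alpha):
    """Return Q_alpha = Q + 2 Diag(alpha) and c_alpha = c + alpha."""
    n = len(Q)
    Q_a = [[Q[i][j] + (2.0 * alpha[i] if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    c_a = [ci + ai for ci, ai in zip(c, alpha)]
    return Q_a, c_a


def quad(Q, c, z):
    """0.5 z^T Q z - c^T z."""
    n = len(z)
    zQz = sum(z[i] * Q[i][j] * z[j] for i in range(n) for j in range(n))
    return 0.5 * zQz - sum(ci * zi for ci, zi in zip(c, z))


# Hypothetical symmetric indefinite data.
Q = [[0.0, 3.0, -1.0], [3.0, -2.0, 2.0], [-1.0, 2.0, 1.0]]
c = [1.0, 2.0, 0.5]

alpha = gershgorin_alpha(Q)
Q_a, c_a = perturb(Q, c, alpha)

# The two target functions agree at every binary point because z_i^2 = z_i.
max_gap = max(abs(quad(Q, c, z) - quad(Q_a, c_a, z))
              for z in product([0, 1], repeat=3))
print("alpha:", alpha)
print("largest difference over {0,1}^3:", max_gap)
```

Since the knapsack constraint does not involve Q or c, the feasible set is unchanged by the perturbation.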

Let rank \({\mathbf{Q}}_{\alpha} = r \le n\). Then there must exist (see [77]) an \({\mathbf {L}} \in {\mathbb R}^{r\times n} \) and an \({\mathbf {H}} \in {\mathbb R}^{r\times r}\) with rank L = rank H = r and \({\mathbf{H}} \succ 0\) such that \({\mathbf{Q}}_{\alpha} = 4 {\mathbf{L}}^T {\mathbf{H}} {\mathbf{L}}\). Similar to the α-perturbed canonical dual problem \(({\mathcal {P}}^g_{ip})\) given in [25], the canonical dual problem \(({\mathcal {P}}^g_{qk})\) can be reformulated as:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^g_{{\alpha}}): \;\;\; \max_{{\boldsymbol{\zeta}} \in \mathcal{S}^+_c} \left\{ \Pi^g_{{\alpha}} ({\boldsymbol{\sigma}},\tau) = - \frac{1}{2} \mbox{Abs}[ {\boldsymbol{\phi}}({\boldsymbol{\sigma}}, \tau)] - \frac{1}{2} {\boldsymbol{\sigma}}^T {\mathbf{H}}^{-1} {\boldsymbol{\sigma}} - \tau V_b + d \right\} ,\end{aligned} $$
(66)

where \(V_b = V_c - \frac {1}{2} \sum _{i=1}^n v_i , \;\; d = \frac {1}{8} \sum _{i=1}^n (2 {\alpha }_i+ \sum _{j=1}^n Q_{ij}) - \frac {1}{2} \sum _{i=1}^n (c_i + {\alpha }_i)\),

$$\displaystyle \begin{gathered} \mathcal{S}^+_c = \{ {\boldsymbol{\zeta}} =({\boldsymbol{\sigma}}, \tau) \in {\mathbb R}^{m+1} | \;\; \;\; \tau\ge 0 \}. \end{gathered} $$
(67)
$$\displaystyle \begin{gathered} {\boldsymbol{\phi}} ( {\boldsymbol{\sigma}},\tau ) = {\mathbf{c}} - \tau {\mathbf{v}} - 2 {\mathbf{L}}^T {\boldsymbol{\sigma}} - \frac{1}{2} {\mathbf{Q}} {\mathbf{e}}, \end{gathered} $$
(68)

The notation Abs[ϕ(σ, τ)] denotes \( \mbox{Abs}[ {\boldsymbol {\phi }}({\boldsymbol {\sigma }},\tau ) ]= \sum _{i=1}^n | \phi _i({\boldsymbol {\sigma }},\tau ) | \).

Theorem 6 (Analytical Solution to Quadratic Knapsack Problem)

For any given \(V_c > 0\), \({\mathbf {v}}, {\mathbf {c}} \in {\mathbb R}^n_+ \), and \({\boldsymbol {\alpha }} \in {\mathbb R}^n_+\) such that \({\mathbf{Q}}_{\alpha} = {\mathbf{Q}} + 2 \mbox{Diag }({\boldsymbol{\alpha}}) = 4 {\mathbf{L}}^T {\mathbf{H}} {\mathbf{L}}\) and \({\mathbf{H}} \succ 0\), if \( {\bar {{\boldsymbol {\zeta }}}} =\{ \bar {{\boldsymbol {\sigma }}}, \bar {\tau }\} \) is a solution to \(({\mathcal {P}}^g_{{\alpha }})\), then

$$\displaystyle \begin{aligned} {\bar{{\mathbf{z}}}} = \frac{1}{2} \left\{ \frac{ \phi_i (\bar{{\boldsymbol{\sigma}}}, \bar{\tau}) }{| \phi_i (\bar{{\boldsymbol{\sigma}}}, \bar{\tau}) |} + 1 \right\}^n \end{aligned} $$
(69)

is a global optimal solution to \(({\mathcal {P}}_{{\alpha }})\) and

$$\displaystyle \begin{aligned} \Pi_{{\alpha}} ({\bar{{\mathbf{z}}}}) = \min_{{\mathbf{z}} \in \mathcal{Z}_a} \Pi_{{\alpha}} ({\mathbf{z}}) = \max_{{\boldsymbol{\zeta}} \in \mathcal{S}^+_c} \Pi^g_{{\alpha}} ({\boldsymbol{\zeta}}) = \Pi^g_{{\alpha}}({\bar{{\boldsymbol{\zeta}}}}). \end{aligned} $$
(70)

Theorem 7 (Existence and Uniqueness Theorem to Quadratic Knapsack Problem)

Let \(V_c > 0\), \({\mathbf {v}}, {\mathbf {c}} \in {\mathbb R}^n_+ \), and \({\boldsymbol {\alpha }} \in {\mathbb R}^n_+\) be given such that \({\mathbf{Q}}_{\alpha} = {\mathbf{Q}} + 2 \mbox{Diag }({\boldsymbol{\alpha}}) = 4 {\mathbf{L}}^T {\mathbf{H}} {\mathbf{L}}\) and \({\mathbf{H}} \succ 0\), and let \( {\bar {{\boldsymbol {\zeta }}}} =\{ \bar {{\boldsymbol {\sigma }}}, \bar {\tau }\} \) be a solution to \(({\mathcal {P}}^g_{{\alpha }})\). If

$$\displaystyle \begin{aligned} \phi_i (\bar{{\boldsymbol{\sigma}}}, \bar{\tau}) \neq { 0} \;\; \forall i =1, \dots, n {} \end{aligned} $$
(71)

then the canonical dual feasible set \(\mathcal {S}^+_c\neq \emptyset \) and the knapsack problem \(({\mathcal {P}}_{{\alpha }})\) has a unique solution. Otherwise, if \(\phi _i(\bar {{\boldsymbol {\sigma }}}, \bar {\tau }) = 0 \) for at least one \(i \in \{1, \dots, n\}\), then \(\mathcal {S}^+_c = \emptyset \) and \(({\mathcal {P}}_{{\alpha }})\) has at least two solutions.

The canonical dual for the linear knapsack problem has a very simple form:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^g_{lk}): \;\;\; \max_{ \tau \ge 0} \left\{ \Pi^g_{lk}(\tau) = -\frac{1}{2} \sum_{i=1}^n ( | c_i - \tau v_i | - \tau v_i) - \tau V_c \right\}. {} \end{aligned} $$
(72)

Corollary 1 (Analytical Solution to Linear Knapsack Problem)

For any given \(V_c > 0\) and \({\mathbf {v}}, {\mathbf {c}} \in {\mathbb R}^n_+ \), if \(\bar {\tau } > 0\) is a solution to \(({\mathcal {P}}^g_{lk})\), then

$$\displaystyle \begin{aligned} {\bar{{\mathbf{z}}}} = \frac{1}{2} \left\{ \frac{c_i - \bar{\tau} v_i }{|c_i - \bar{\tau} v_i |} + 1 \right\}^n \end{aligned} $$
(73)

is a global optimal solution to \(({\mathcal {P}}_{lk})\) and

$$\displaystyle \begin{aligned} \Pi_{lk} ({\bar{{\mathbf{z}}}}) = \Pi^g_{lk} (\bar{\tau}) \end{aligned} $$
(74)

Corollary 2 (Existence and Uniqueness Theorem to Linear Knapsack Problem)

For any given \({\mathbf {v}}, {\mathbf {c}} \in {\mathbb R}^n_+\), if there exists a constant \(\tau_c > 0\) such that

$$\displaystyle \begin{aligned} \psi_i(\tau_c) = \tau_c v_i - c_i \neq { 0} \;\; \forall i=1, \dots, n {} \end{aligned} $$
(75)

then the knapsack problem \(({\mathcal {P}}_{lk})\) has a unique solution. Otherwise, if \(\psi_i(\tau_c) = 0\) for at least one \(i \in \{1, \dots, n\}\), then \(({\mathcal {P}}_{lk})\) has at least two solutions.

Detailed proofs of these results are given by Gao in [31].
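
The analytical formula (73) can be tried on a small instance in which the dual maximizer satisfies the nonzero condition of Corollary 2. The sketch below is illustrative only. It evaluates the dual in the Lagrangian form \(\sum_i \min(0, \tau v_i - c_i) - \tau V_c\), which keeps the constant \(-\frac{1}{2}\sum_i c_i\); this constant is an assumption made here so that primal and dual values can be compared numerically, and it does not change the maximizer. Because the dual is concave and piecewise linear, the sketch searches only its breakpoints and returns the midpoint of the maximizing interval. The data and all function names are hypothetical, and \(v_i > 0\) is assumed.

```python
from itertools import product


def dual_lk(tau, c, v, V_c):
    """sum_i min(0, tau*v_i - c_i) - tau*V_c, written with absolute values."""
    return -0.5 * sum(abs(ci - tau * vi) - tau * vi + ci
                      for ci, vi in zip(c, v)) - tau * V_c


def maximize_dual(c, v, V_c, tol=1e-12):
    """Maximize the concave piecewise-linear dual over tau >= 0.

    The maximum is attained at tau = 0 or at a breakpoint c_i / v_i. If several
    candidates attain it, the dual is flat between them and the midpoint of
    that interval is returned.
    """
    candidates = sorted({0.0} | {ci / vi for ci, vi in zip(c, v)})
    values = [dual_lk(t, c, v, V_c) for t in candidates]
    best = max(values)
    maximizers = [t for t, g in zip(candidates, values) if g >= best - tol]
    return 0.5 * (maximizers[0] + maximizers[-1])


def analytic_solution(tau, c, v, tol=1e-12):
    """Formula (73); returns None when some c_i - tau*v_i vanishes."""
    phi = [ci - tau * vi for ci, vi in zip(c, v)]
    if any(abs(p) <= tol for p in phi):
        return None
    return [int(round(0.5 * (p / abs(p) + 1.0))) for p in phi]


def brute_force(c, v, V_c):
    """Enumerate {0,1}^n and minimize -c^T z subject to v^T z <= V_c."""
    feasible = [z for z in product([0, 1], repeat=len(c))
                if sum(vi * zi for vi, zi in zip(v, z)) <= V_c]
    return min(feasible, key=lambda z: -sum(ci * zi for ci, zi in zip(c, z)))


# Hypothetical data: the two most valuable items per unit weight fill the knapsack.
c = [6.0, 6.0, 4.0]
v = [2.0, 3.0, 4.0]
V_c = 5.0

tau_bar = maximize_dual(c, v, V_c)
z_bar = analytic_solution(tau_bar, c, v)
z_ref = list(brute_force(c, v, V_c))

print("tau_bar:", tau_bar, " z_bar:", z_bar, " brute force:", z_ref)
print("primal value:", -sum(ci * zi for ci, zi in zip(c, z_bar)),
      " dual value:", dual_lk(tau_bar, c, v, V_c))
```

When the dual maximizer falls exactly on a breakpoint, some \(c_i - \bar{\tau} v_i\) vanishes, the formula is undefined, and the helper returns `None`; this corresponds to the non-unique case described in Corollary 2.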

The so-called multi-dimensional knapsack problem (MKP) is a generalization of the linear knapsack problem, that is:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{mk}): \;\; \max {\mathbf{c}}^T {\mathbf{z}}, \;\; s.t. \;\; {\mathbf{W}} {\mathbf{z}} \le {\boldsymbol{\omega}} , \;\; {\mathbf{z}} \in \{ 0, 1\}^n, \end{aligned} $$
(76)

where \({\mathbf {c}} \in {\mathbb R}^n_+\) and \({\boldsymbol {\omega }} \in {\mathbb R}^m_+\) (m < n) are two given nonnegative vectors,

$$\displaystyle \begin{aligned} {\mathbf{W}} \in {\mathbb R}^{m\times n}_+ = \{ {\mathbf{W}} = \{ w_{ij} \} \in {\mathbb R}^{m\times n} | \;\; w_{ij} \ge 0 \;\; \forall i=1, \dots, m, \; j=1,\dots, n\} \end{aligned}$$

is a given nonnegative matrix such that \(w_{ij} \le \omega _i, \; \sum _{j=1}^n w_{ij} \ge \omega _i\). Clearly, this problem has multiple knapsacks \(\{ \omega_i \}^m\). Therefore, instead of “multi-dimensional,” the correct name for \(({\mathcal {P}}_{mk})\) should be the multi-knapsack problem. This problem has applications in many fields, including capital budgeting and resource allocation [66]. The canonical dual problem for \(({\mathcal {P}}_{mk})\) is

$$\displaystyle \begin{aligned} ({\mathcal{P}}^g_{mk}): \;\;\; \max_{ {\boldsymbol{\tau}}\in {\mathbb R}^m_+} \left\{ \Pi^g_{mk}({\boldsymbol{\tau}}) = -\frac{1}{2} \sum_{i=1}^n ( | c_i - \sum_{j=1}^m w_{ji}\tau_j | - \sum_{j=1}^m w_{ji}\tau_j ) - {\boldsymbol{\omega}}^T {\boldsymbol{\tau}} \right\}. {} \end{aligned} $$
(77)

Thus, if \(\bar {\boldsymbol {\tau }} = \{ \bar {\tau }_i\} \) is a global maximizer of \(({\mathcal {P}}^g_{mk})\), the analytic solution to \(({\mathcal {P}}_{mk})\) is

$$\displaystyle \begin{aligned} {\bar{{\mathbf{z}}}} = \frac{1}{2} \left\{ \frac{ c_i - \sum_{j=1}^m w_{ji}\bar{\tau}_j }{| c_i - \sum_{j=1}^m w_{ji}\bar{\tau}_j | } + 1 \right\}^n . \end{aligned} $$
(78)

4.6 Bilevel Optimization and Optimal Control

Bilevel optimization appears extensively in optimal design and control of complex systems. A general formulation of the bilevel optimization problem can be written as follows:

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{bo}):\;\;\;\; &\displaystyle \min &\displaystyle \{ {T} ( {\mathbf{x}}, {\mathbf{y}}) \;\; | \;\; {\mathbf{x}}\in \mathcal{X}_a , \;\; {\mathbf{y}} \in \mathcal{Y}_a \} , {} \end{array} \end{aligned} $$
(79)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mbox{ s.t.} &\displaystyle {\mathbf{y}} \in \arg \min \{ \Pi({\mathbf{v}}, {\mathbf{x}}) \; | \;\; {\mathbf{v}} \in \mathcal{Y}_a \} {}, \end{array} \end{aligned} $$
(80)

where T represents the top-level target (or leader) function and Π is the lower-level target (or follower) function. Similarly, \({\mathbf {x}} \in \mathcal {X}_a \) represents the upper-level decision vector and \({\mathbf {y}}\in \mathcal {Y}_a \) represents the lower-level variable. Clearly, this is a coupled nonlinear optimization problem, which is fundamentally difficult even for convex systems.

To solve this coupling problem numerically, an alternative iteration method can be used [30]:

  1. For a given \({\mathbf{x}}_{k-1}\), solve the lower-level problem first to obtain

    $$\displaystyle \begin{aligned} {\mathbf{y}}_k \in \arg \min \{ \Pi({\mathbf{y}}, {\mathbf{x}}_{k-1}) | \;\; {\mathbf{y}} \in \mathcal{Y}_a \}. {} \end{aligned} $$
    (81)
  2. Then, for the fixed \({\mathbf{y}}_k\), solve the upper-level problem for

    $$\displaystyle \begin{aligned} {\mathbf{x}}_k = \arg \min \{ {T} ( {\mathbf{x}}, {\mathbf{y}}_k) \;\; | \;\; {{\mathbf{x}}\in \mathcal{X}_a } \}. {} \end{aligned} $$
    (82)

These two single-level optimization problems can be solved by the canonical duality theory, and the sequence \(\{ {\mathbf{x}}_k, {\mathbf{y}}_k \}\) can converge to an optimal solution of \(({\mathcal {P}}_{bo})\) under certain conditions.
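
The alternating scheme can be sketched on a scalar toy problem in which both single-level problems have closed-form solutions. The follower target \(\Pi(y, x) = \frac{1}{2}(y - \frac{1}{2}x)^2\) and the leader target \(T(x, y) = \frac{1}{2}(x - y - \frac{1}{2})^2 + \frac{1}{2}(y - \frac{1}{2})^2\) are assumptions chosen for this example so that the leader's unconstrained optimum is compatible with the follower's response; the sketch does not imply convergence to a bilevel optimum for general targets. The closed-form solvers stand in for whatever single-level method is used in practice.

```python
def alternating_iteration(lower_solve, upper_solve, x0, tol=1e-12, max_iter=1000):
    """Alternate between the lower-level and upper-level problems.

    lower_solve(x) returns argmin_y Pi(y, x); upper_solve(y) returns
    argmin_x T(x, y). Stops when successive x iterates differ by less than tol.
    """
    x = x0
    y = lower_solve(x)
    for k in range(1, max_iter + 1):
        y = lower_solve(x)          # step 1: follower responds to x_{k-1}
        x_new = upper_solve(y)      # step 2: leader updates for fixed y_k
        if abs(x_new - x) < tol:
            return x_new, y, k
        x = x_new
    return x, y, max_iter


def lower_solve(x):
    """argmin_y 0.5 * (y - 0.5 * x)^2."""
    return 0.5 * x


def upper_solve(y):
    """argmin_x 0.5 * (x - y - 0.5)^2 + 0.5 * (y - 0.5)^2 for fixed y."""
    return y + 0.5


x_star, y_star, n_iter = alternating_iteration(lower_solve, upper_solve, x0=5.0)
print("x:", x_star, " y:", y_star, " iterations:", n_iter)
```

In this instance the iteration reduces to \(x_k = \frac{1}{2} x_{k-1} + \frac{1}{2}\), a contraction whose limit (x, y) = (1, 1/2) makes T vanish while y remains the follower's best response.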

As an example, let us consider the following optimal control problem:

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{oc}):\;\;\;\; &\displaystyle \min &\displaystyle \{ \Phi ( {\boldsymbol{\nu}}, {\boldsymbol{\chi}}) \;\; | \;\; {\boldsymbol{\nu}}\in \mathcal{U} , \;\; {\boldsymbol{\chi}} \in \mathcal{X} \} , {} \end{array} \end{aligned} $$
(83)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mbox{ s.t.} &\displaystyle \dot {\boldsymbol{\chi}} = {{\mathbf{a}}}({\boldsymbol{\chi}}, {\boldsymbol{\nu}}, t ) \;\; \forall t \in I = [t_o, t_b] , {} \end{array} \end{aligned} $$
(84)

where χ(t) is the state and ν(t) is the control; \(\mathcal {X} \) is a feasible set including the boundary conditions \({\boldsymbol{\chi}}(t_o) = {\boldsymbol{\chi}}_o\) and \({\boldsymbol{\psi}}({\boldsymbol{\chi}}(t_b), t_b) = 0\). The upper-level target Φ is usually a quadratic continuous-time cost functional:

$$\displaystyle \begin{aligned} \Phi ({\boldsymbol{\nu}}, {\boldsymbol{\chi}}) = \frac{1}{2} \int_I \left[ {\boldsymbol{\chi}}^T(t) {\mathbf{Q}}(t) {\boldsymbol{\chi}}(t){+}{\boldsymbol{\nu}}^T(t) \mathbf{R}(t) {\boldsymbol{\nu}}(t){-}2{\boldsymbol{\chi}}^T(t) {\mathbf{P}}(t) {\boldsymbol{\nu}}(t) \right] {\,\mbox{d}t}{+}\Phi_b({\boldsymbol{\chi}}(t_b)) \end{aligned} $$
(85)

where \({\mathbf {Q}}(t) \in {\mathbb R}^{d\times d} \) and \({\mathbf {R}}(t) \in {\mathbb R}^{p\times p}\) are positive semi-definite and positive definite, respectively, on the time domain \(I = [t_o, t_b]\), and \( \Phi _b({\boldsymbol {\chi }}(t_b)) = \frac {1}{2} {\boldsymbol {\chi }}^T (t_b) {\mathbf {Q}}_b {\boldsymbol {\chi }}(t_b) \). New to this cost function is the coupling term \({\boldsymbol{\chi}}^T(t) {\mathbf{P}}(t) {\boldsymbol{\nu}}(t)\), where \({\mathbf {P}}(t) \in {\mathbb R}^{d\times p}\) is a given matrix function of t, which plays an important role in alternative iteration methods for solving the general nonlinear optimal control problem \(({\mathcal {P}}_{oc})\).

For conservative systems, the nonlinear operator a(χ, ν, t) is a potential operator, i.e., there exists an action (or Lagrangian) \(\Pi ({\boldsymbol {\chi }}, \dot {\boldsymbol {\chi }} ,{\boldsymbol {\nu }} )\) such that for any given control \({\boldsymbol {\nu }}(t) \in \mathcal {U}\) the differential equation (84) can be written in the following least action form:

$$\displaystyle \begin{aligned} {\boldsymbol{\chi}} \in \arg \min \{\Pi({\boldsymbol{\chi}}, \dot{\boldsymbol{\chi}}, {\boldsymbol{\nu}} ) \; | \;\; {\boldsymbol{\chi}} \in \mathcal{X} \} \end{aligned} $$
(86)

Although such a Lagrangian does not exist for dissipative systems, the least squares method can always be used so that (84) can also be written in this minimization form.

In order to reformulate the challenging control problem \(({\mathcal {P}}_{oc})\) in function space, the finite element method can be used such that the time domain I is divided into n elements \(\{ I_e = [t_k, t_{k+1}] \}\) and, in each \(I_e\), the unknown fields can be numerically discretized as:

$$\displaystyle \begin{aligned} {\boldsymbol{\nu}}(t) = {\mathbf{N}}^e_u(t) {\mathbf{u}}_e , \;\; {\boldsymbol{\chi}}(t) = {\mathbf{N}}^e_x(t) {\mathbf{x}}_e \;\; \forall t \in I_e,\;\; e=1, \dots, n, \end{aligned} $$
(87)

where \({\mathbf {N}}_u^e(t) \) is an interpolation matrix for ν(t) and \({\mathbf{u}}_e = ({\boldsymbol{\nu}}(t_k), {\boldsymbol{\nu}}(t_{k+1}))\) is a nodal control vector; similarly, \({\mathbf {N}}_x^e(t)\) is an interpolation matrix for χ(t) and \({\mathbf{x}}_e = ({\boldsymbol{\chi}}(t_k), {\boldsymbol{\chi}}(t_{k+1}))\) is a nodal state vector. Let \(\mathcal {U}_a \subset {\mathbb R}^{p\times n} \) be an admissible nodal control space, \(\mathcal {X}_a \subset {\mathbb R}^{d\times n}\) be an admissible state space, \({\mathbf {u}} =\{ {\boldsymbol {\nu }}_k \} \in \mathcal {U}_a\), and \({\mathbf {x}} =\{{\boldsymbol {\chi }}_k \} \in \mathcal {X}_a\). Then both the cost functional Φ and the action Π can be numerically written as:

$$\displaystyle \begin{aligned} \Phi ({\boldsymbol{\nu}},{\boldsymbol{\chi}}) \approx \Phi_h( {\mathbf{u}},{\mathbf{x}}) = \frac{1}{2} {\mathbf{x}}^T {\mathbf{Q}}_h {\mathbf{x}} + \frac{1}{2} {\mathbf{u}}^T {\mathbf{R}}_h {\mathbf{u}} - {\mathbf{x}}^T {\mathbf{P}}_h {\mathbf{u}}, \end{aligned} $$
(88)
$$\displaystyle \begin{aligned} \Pi ({\boldsymbol{\chi}}, \dot {\boldsymbol{\chi}}, {\boldsymbol{\nu}}) \approx \Pi_h({\mathbf{x}}, {\mathbf{u}}) = {G}({\mathbf{D}} {\mathbf{x}}, {\mathbf{u}}) - F({\mathbf{x}},{\mathbf{u}}) , \end{aligned} $$
(89)

where G(Dx, u) and F(x, u) depend on the action \(\Pi ({\boldsymbol {\chi }}, \dot {\boldsymbol {\chi }}, {\mathbf {u}})\),

$$\displaystyle \begin{aligned} {\mathbf{Q}}_h =\sum_{e=1}^n \int_{I_e} {\mathbf{N}}_x^T(t) {\mathbf{Q}}(t) {\mathbf{N}}_x(t) {\,\mbox{d}t} + \frac{1}{2} {\mathbf{N}}_x^T(t_b) {\mathbf{Q}}_b {\mathbf{N}}_x(t_b), \end{aligned}$$
$$\displaystyle \begin{aligned} {\mathbf{R}}_h = \sum_{e=1}^n \int_{I_e} {\mathbf{N}}_u^T(t) {\mathbf{R}}(t) {\mathbf{N}}_u(t) {\,\mbox{d}t} , \end{aligned}$$
$$\displaystyle \begin{aligned} {\mathbf{P}}_h = \sum_{e=1}^n \int_{I_e} {\mathbf{N}}_x^T(t) {\mathbf{P}}(t) {\mathbf{N}}_u(t) {\,\mbox{d}t}. \end{aligned}$$
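
The assembly of such discretized matrices can be sketched for the simplest case. The example below assumes a scalar state (d = 1), a uniform mesh with n elements and n + 1 nodes, linear interpolation in each element, and a constant weight Q(t) = q; the terminal term is omitted. Under these assumptions the element contribution is the familiar matrix (q h / 6) [[2, 1], [1, 2]]. The function names and data are hypothetical.

```python
def assemble_Qh(n, t0, tb, q):
    """Assemble sum_e int_{I_e} N^T q N dt for linear elements on a uniform mesh."""
    h = (tb - t0) / n
    Qh = [[0.0] * (n + 1) for _ in range(n + 1)]
    element = [[2.0, 1.0], [1.0, 2.0]]  # integral of N^T N, scaled by h / 6
    for e in range(n):
        for a in range(2):
            for b in range(2):
                Qh[e + a][e + b] += q * h / 6.0 * element[a][b]
    return Qh


def quadratic_form(M, x):
    """x^T M x for a dense matrix stored as nested lists."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


n, t0, tb, q = 4, 0.0, 1.0, 1.0
Qh = assemble_Qh(n, t0, tb, q)

# Nodal values of chi(t) = t; its piecewise-linear interpolant is exact,
# so x^T Q_h x should reproduce the integral of t^2 over [0, 1].
x = [t0 + k * (tb - t0) / n for k in range(n + 1)]
print("x^T Q_h x =", quadratic_form(Qh, x))
```

The matrices \({\mathbf{R}}_h\) and \({\mathbf{P}}_h\) can be assembled by the same loop with the corresponding interpolation matrices and weights.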

Therefore, the optimal control problem \(({\mathcal {P}}_{oc}) \) can be written as a bilevel optimization problem:

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}^h_{oc}):\;\;\;\; &\displaystyle \min &\displaystyle \{ \Phi_h ( {\mathbf{u}}, {\mathbf{x}}) \;\; | \;\; {\mathbf{u}}\in \mathcal{U}_a , \;\; {\mathbf{x}} \in \mathcal{X}_a\} , {} \end{array} \end{aligned} $$
(90)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mbox{ s.t.} &\displaystyle {\mathbf{x}} \in \arg \min \{ \Pi_h({\mathbf{y}}, {\mathbf{u}}) \; | \;\; {\mathbf{y}} \in \mathcal{X}_a \} {} \end{array} \end{aligned} $$
(91)

The canonical duality theory has been successfully applied for solving nonlinear dynamical systems [54, 71] and the relation between chaos and NP-hardness was first discovered by Latorre and Gao [54]. Combined with an alternative iteration method, the canonical duality theory can be used to efficiently solve the general bilevel optimization problems.

4.7 Multi-Level Multi-Targets MINLP and Topology Optimization

Multi-target optimization is concerned with mathematical optimization problems involving more than one target function to be optimized simultaneously. Since the target is a vector-valued function, it is also known as vector optimization, multi-criteria optimization, or Pareto optimization. Because the concept of objectivity has been misused in the optimization literature, this important research area has been misleadingly called multi-objective optimization. Multi-target optimization problems appear extensively in multi-scale complex systems where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting targets. Therefore, multi-level optimization and MINLP problems naturally involve multi-target optimization. In real-world applications, the multi-level multi-target mixed integer nonlinear programming (MMM) problem could have many different formulations. Based on the canonical duality theory, a simple form of MMM problems can be proposed as follows:

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{3m}):\;\;\;\; &\displaystyle \min &\displaystyle \{ {T}({\mathbf{z}}, \bar{\mathbf{x}},\bar{\mathbf{y}} ) \;\; | \;\; { \bar{\mathbf{x}} \in \mathcal{X}_a, \;\; \bar{\mathbf{y}} \in \mathcal{Y}_a ,\;\; {\mathbf{z}}\in \mathcal{Z}_a} \} , {} \end{array} \end{aligned} $$
(92)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mbox{ s.t.} &\displaystyle \bar{\mathbf{x}} \in \arg \min \{ \Pi_x({\mathbf{x}}, {\mathbf{y}}, {\mathbf{z}}) \;\; | \;\; {\mathbf{x}} \in \mathcal{X}_a \} {}, \end{array} \end{aligned} $$
(93)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \bar{\mathbf{y}} \in \arg \min \{ \Pi_y({\mathbf{x}}, {\mathbf{y}}, {\mathbf{z}}) \;\; | \;\; {\mathbf{y}}\in \mathcal{Y}_a \} {}. \end{array} \end{aligned} $$
(94)

Without loss of generality, we assume that the leader variable \({\mathbf {z}} \in \mathcal {Z}_a \) is a discrete vector and the follower variables \({\mathbf {x}}\in \mathcal {X}_a \) and \({\mathbf {y}} \in \mathcal {Y}_a\) are continuous vectors; the top-level (leader) target \({T}:\mathcal {Z}_a \times \mathcal {X}_a \times \mathcal {Y}_a \rightarrow {\mathbb R}^m\) is a vector-valued function, which is not necessarily objective, while the lower-level (follower) targets \(\Pi_x\) and \(\Pi_y\) are real-valued functions such that the follower problems can be written respectively in the canonical form \(({\mathcal {P}})\), where objectivity and subjectivity are required. If we let

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{u}} &\displaystyle =&\displaystyle \{ {\mathbf{x}}, {\mathbf{y}}, \cdots \} \in \mathcal{U}_a = \mathcal{X}_a \times \mathcal{Y}_a \times \cdots, \\ \Pi({\mathbf{u}}, {\mathbf{z}}) &\displaystyle =&\displaystyle \{ \Pi_x({\mathbf{u}}, {\mathbf{z}}), \Pi_y({\mathbf{u}}, {\mathbf{z}}) , \cdots\}:\mathcal{U}_a \times \mathcal{Z}_a \rightarrow {\mathbb R}^d, \;\; d \ge 2, \end{array} \end{aligned} $$

then the MMM problem can be written in a general form:

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{3m}):\;\;\;\; &\displaystyle \min &\displaystyle \{ {T}({\mathbf{z}}, {\mathbf{u}} ) \;\; | \;\; { {\mathbf{u}} \in \mathcal{U}_a ,\;\; {\mathbf{z}}\in \mathcal{Z}_a} \} , {} \end{array} \end{aligned} $$
(95)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mbox{ s.t.} &\displaystyle {\mathbf{u}} \in \arg \min \{ \Pi ({\mathbf{v}} , {\mathbf{z}}) \;\; | \;\; {\mathbf{v}} \in \mathcal{U}_a \} {}. \end{array} \end{aligned} $$
(96)

Clearly, \(({\mathcal {P}}_{3m})\) should be one of the most challenging problems proposed so far in global optimization, even if both \(T\) and \(\Pi\) are linear vector-valued functions.

Topology optimization is a mathematical tool that seeks the best mass density distribution \(\rho({\mathbf{x}})\) within a design domain \(\Omega \subset {\mathbb R}^d\) in order to obtain the best structural performance governed by the minimum total potential principle:

$$\displaystyle \begin{aligned} \min \left\{ \Pi({\mathbf{u}}, \rho) = \int_\Omega {U}(\nabla {\mathbf{u}}) \rho \,\mbox{d}\Omega - \int_{\Gamma_t} {\mathbf{u}}^T {\mathbf{t}} \,\mbox{d} \Gamma \;\; | \;\; {\mathbf{u}} \in \mathcal{U} \right\} \end{aligned} $$
(97)

where \({\mathbf {u}} : \Omega \rightarrow {\mathbb R}^d\) is a displacement vector field and the design variable \(\rho({\mathbf{x}}) \in \{0, 1\}\) is a discrete scalar field, which takes \(\rho({\mathbf{x}}) = 1\) at a solid material point \({\mathbf{x}} \in \Omega\) and \(\rho({\mathbf{x}}) = 0\) at a void point \({\mathbf{x}} \in \Omega\). By using the finite element method, the design domain \(\Omega\) is meshed with \(n\) disjoint finite elements \(\{\Omega_e\}\). Letting

$$\displaystyle \begin{aligned} {\mathbf{u}}({\mathbf{x}}) = {\mathbf{N}} ({\mathbf{x}}) {\mathbf{u}}_e , \;\; \rho({\mathbf{x}}) = z_e \in \{0, 1\} \;\; \forall {\mathbf{x}} \in \Omega_e, \end{aligned}$$

the total potential energy can be numerically written as:

$$\displaystyle \begin{aligned} \Pi({\mathbf{u}} , \rho) \approx \Pi_h ({\mathbf{u}}, {\mathbf{z}}) = {\mathbf{z}}^T {\mathbf{c}}({\mathbf{u}}) - {\mathbf{u}}^T {\mathbf{f}} \end{aligned}$$

where \({\mathbf {u}} = \{ {\mathbf {u}}_e \} \in \mathcal {U}_a \subset {\mathbb R}^m\) is a nodal displacement vector, \({\mathbf {z}} \in \mathcal {Z}_a \subset \{ 0,1\}^n\) is a discretized design vector, and

$$\displaystyle \begin{aligned} {\mathbf{c}}({\mathbf{u}}) = \left\{ \int_{\Omega_e} {U}(\nabla {\mathbf{N}}({\mathbf{x}}) {\mathbf{u}}_e) \,\mbox{d}\Omega \right\} \in {\mathbb R}^n_+ , \;\; {\mathbf{f}} = \left\{ \int_{{\Gamma_t}_e} {\mathbf{N}}({\mathbf{x}})^T {\mathbf{t}}({\mathbf{x}}) \,\mbox{d} \Gamma \right\} \in {\mathbb R}^m. \end{aligned}$$

Let

$$\displaystyle \begin{aligned} \mathcal{Z}_a = \{ {\mathbf{z}} \in \{0, 1\}^n | \;\; {\mathbf{v}}^T {\mathbf{z}} \le \omega \}, \end{aligned}$$

where \({\mathbf {v}} = \{ v_e\}\in {\mathbb R}^n_+\), \(v_e \ge 0\) is the volume of the \(e\)-th element, and \(\omega > 0\) is the desired volume of the structure. The correct mathematical problem for general topology optimization has been proposed recently by Gao [30, 31]:

Problem 1 (Topology Optimization for General Materials)

For a given external load \({\mathbf{f}}\) and the desired volume \(\omega > 0\), solve the bilevel MINLP problem:

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{to}):\;\;\;\; &\displaystyle \min &\displaystyle \{ {T}({\mathbf{z}}, {\mathbf{u}} ) \;\; | \;\; { {\mathbf{u}} \in \mathcal{U}_a ,\;\; {\mathbf{z}}\in \mathcal{Z}_a} \} , \end{array} \end{aligned} $$
(98)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mathit{\mbox{ s.t.}} &\displaystyle {\mathbf{u}} \in \arg \min \{ \Pi_h ({\mathbf{v}} , {\mathbf{z}}) \;\; | \;\; {\mathbf{v}} \in \mathcal{U}_a \} . {} \end{array} \end{aligned} $$
(99)

The top-level target function T depends on each design problem, which could be

$$\displaystyle \begin{aligned} \begin{array}{rcl} {T}_1({\mathbf{z}},{\mathbf{u}}) &\displaystyle =&\displaystyle {\mathbf{u}}^T {\mathbf{f}} - {\mathbf{z}}^T {\mathbf{c}}({\mathbf{u}}), \end{array} \end{aligned} $$
(100)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {T}_2({\mathbf{z}},{\mathbf{u}}) &\displaystyle =&\displaystyle \frac{1}{2} {\mathbf{z}}^T {\mathbf{Q}}({\mathbf{u}}) {\mathbf{z}} - {\mathbf{z}}^T {\mathbf{c}}({\mathbf{u}}) . \end{array} \end{aligned} $$
(101)

where \({\mathbf{Q}}({\mathbf{u}})\) is a symmetric matrix with zero diagonal elements, \(\{Q_{ii}\}_{i=1}^n = 0\), and \(Q_{ij}({\mathbf{u}})\) is the negative effect on the structure if the \(i\)-th and \(j\)-th elements are both selected. Clearly, the top level is a linear knapsack problem for \(T = T_1\), or a quadratic knapsack problem if \(T = T_2\). If \(\mathcal {Z}_a = \{ {\mathbf {z}}\in \{0,1\}^n| \;\; {\mathbf {W}} {\mathbf {z}} \le {\boldsymbol {\omega }} \}\), then the top level is a multi-knapsack problem.

In topology limit design, the top-level target could be a vector-valued function depending on a certain design parameter \(\alpha\), say:

$$\displaystyle \begin{aligned} {T}_3({\alpha}, {\mathbf{z}},{\mathbf{u}}) = \{ {\alpha}, {T}_i({\mathbf{z}},{\mathbf{u}}) \} , \;\; i=1, 2. \end{aligned} $$
(102)

If \(\alpha\) is the volume \(\omega\), then \(({\mathcal {P}}_{to})\) becomes a topology optimization problem for lightweight design:

Problem 2 (Topology Lightweight Design)

For the given external load and \(\omega^b > \omega^a > 0\), solve

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{lw}):\; &\displaystyle \min &\displaystyle \{ {T}_3(\omega , {\mathbf{z}}, {\mathbf{u}} ) \; | \;\; {\mathbf{v}}^T {\mathbf{z}} \le \omega, \; \omega \in [\omega^a, \omega^b], \; { {\mathbf{u}} \in \mathcal{U}_a ,\; {\mathbf{z}}\in \{ 0, 1\}^n } \} , \qquad \ \end{array} \end{aligned} $$
(103)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mathit{\mbox{ s.t.}} &\displaystyle {\mathbf{u}} \in \arg \min \{ \Pi_h ({\mathbf{v}} , {\mathbf{z}}) \;\; | \;\; {\mathbf{v}} \in \mathcal{U}_a \} . \end{array} \end{aligned} $$
(104)

This is a bilevel multi-target knapsack problem.

If \(\alpha = -\eta\), where \(\eta > 0\) is the external loading factor, then by simply choosing \(T_3(\alpha, {\mathbf{z}}, {\mathbf{u}}) = -{\mathbf{z}}^T {\mathbf{c}}({\mathbf{u}})\) we have the following problem.

Problem 3 (Topology Limit Design)

For the given external load distribution \({\mathbf{f}}\) and the plastic yield condition in \(\mathcal {U}_a\), solve

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{ld}):\;\;\;\; &\displaystyle \max &\displaystyle \{ {\mathbf{z}}^T {\mathbf{c}}({\mathbf{u}} ) \;\; | \;\; \eta > 0, \;\; { {\mathbf{u}} \in \mathcal{U}_a ,\;\; {\mathbf{z}}\in \mathcal{Z}_a} \} , \end{array} \end{aligned} $$
(105)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mathit{\mbox{ s.t.}} &\displaystyle {\mathbf{u}} \in \arg \min \{ {\mathbf{z}}^T {\mathbf{c}}({\mathbf{v}}) - \eta \;\; | \;\; {\mathbf{v}}^T {\mathbf{f}} = 1, \;\;{\mathbf{v}} \in \mathcal{U}_a \} . \end{array} \end{aligned} $$
(106)

If \(\alpha = \{\omega, -\eta\}\), the combination of \(({\mathcal {P}}_{lw})\) and \(({\mathcal {P}}_{ld}) \) forms a new problem:

Problem 4 (Topology Lightweight Limit Design)

For the given \(\omega^b > \omega^a > 0\), the external load distribution \({\mathbf{f}}\), and the plastic yield condition in \(\mathcal {U}_a\), solve

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}_{ll}):\;\;\;\; &\displaystyle \min &\displaystyle \{ \omega, - {\mathbf{z}}^T{\mathbf{c}}( {\mathbf{u}} ) \} \;\; \forall \; \omega \in [\omega^a, \omega^b],\; \eta > 0, \; {\mathbf{u}} \in \mathcal{U}_a ,\;\; {\mathbf{z}}\in \mathcal{Z}_a , \qquad \end{array} \end{aligned} $$
(107)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mathit{\mbox{ s.t.}} &\displaystyle {\mathbf{u}} \in \arg \min \{ {\mathbf{z}}^T {\mathbf{c}}({\mathbf{v}}) - \eta \;\; | \;\;{\mathbf{v}}^T {\mathbf{f}} = 1, \;\; {\mathbf{v}} \in \mathcal{U}_a \} . \end{array} \end{aligned} $$
(108)

Due to the conflict between \(\min \omega \) and \(\max \{ \eta = {\mathbf {z}}^T{\mathbf {c}}( {\mathbf {u}} ) \}\), this MMM problem could have a (possibly infinite) number of Pareto optimal solutions.

The canonical duality theory is particularly useful in topology optimization for full-stress (or plastic limit) design. For this type of problem, it is much more convenient to use the stress as the unknown in the analysis. Therefore, dual to \(({\mathcal {P}}_{to})\), the problem for full-stress design can be proposed as:

$$\displaystyle \begin{aligned} \begin{array}{rcl} ({\mathcal{P}}^*_{to}):\;\;\;\; &\displaystyle \max &\displaystyle \{ {T}^*( {\mathbf{z}}, {\boldsymbol{\sigma}}) \;\; | \;\; {{\mathbf{z}}\in \mathcal{Z}_a, \;\; {\boldsymbol{\sigma}} \in \mathcal{S}^+_a} \} , {} \end{array} \end{aligned} $$
(109)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \mbox{ s.t.} &\displaystyle {\boldsymbol{\sigma}} \in \arg \max \{ \Pi^d_h({\boldsymbol{\tau}}, {\mathbf{z}}) \;\; | \;\; {\boldsymbol{\tau}}\in \mathcal{S}^+_a \} {}, \end{array} \end{aligned} $$
(110)

The top-level dual target, written \(T^*\) in (109) and \(T^d\) in what follows, can be

$$\displaystyle \begin{aligned} \begin{array}{rcl} {T}^d_1( {\mathbf{z}},{\boldsymbol{\sigma}}) &\displaystyle =&\displaystyle {\mathbf{z}}^T {\mathbf{c}}^d({\boldsymbol{\sigma}}), \end{array} \end{aligned} $$
(111)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {T}^d_2( {\mathbf{z}},{\boldsymbol{\sigma}}) &\displaystyle =&\displaystyle {\mathbf{z}}^T {\mathbf{c}}^d({\boldsymbol{\sigma}})+ \frac{1}{2} {\mathbf{z}}^T {\mathbf{Q}}^*({\boldsymbol{\sigma}}) {\mathbf{z}}, \end{array} \end{aligned} $$
(112)

where \({\mathbf {c}}^d({\boldsymbol {\sigma }})\in {\mathbb R}^n_+\) is a positive vector such that each of its components \(c^d_e({\boldsymbol {\sigma }})\) is the pure complementary energy in the \(e\)-th element \(\Omega_e\). The feasible space \(\mathcal {S}^+_a\) is a bounded convex set with an inequality constraint \(\|{\boldsymbol{\sigma}}\|_g \le \sigma_c\), where \(\|{\boldsymbol{\sigma}}\|_g\) is a generalized norm which depends on the yield condition adopted, say either the Tresca or the von Mises criterion [11]. Corresponding to \(T_3\), we have

$$\displaystyle \begin{aligned} {T}^d_3({\alpha}^*, {\mathbf{z}},{\boldsymbol{\sigma}}) = \{ {\alpha}^*, {T}_i^d( {\mathbf{z}},{\boldsymbol{\sigma}}) \} , \;\; i =1,2, \end{aligned} $$
(113)

where \(\alpha^*\) could be \(-\omega\), \(\eta\), or other design parameters.

For linear elastic structures, the total potential energy is a quadratic function of \({\mathbf{u}}\):

$$\displaystyle \begin{aligned} \Pi_h({\mathbf{u}}, {\mathbf{z}}) = \frac{1}{2} {\mathbf{u}}^T {\mathbf{K}}({\mathbf{z}}) {\mathbf{u}} - {\mathbf{u}}^T {\mathbf{f}} , \end{aligned} $$
(114)

where \( {\mathbf {K}}({\mathbf {z}} ) = \left \{ z_e {\mathbf {K}}_e \right \} \in {\mathbb R}^{m\times m} \) is the overall stiffness matrix, obtained by assembling the sub-matrix \(z_e {\mathbf{K}}_e\) for each element \(\Omega_e\). Accordingly, \( {\mathbf {c}}({\mathbf {u}} ) = \frac {1}{2} \left \{ {\mathbf {u}}^T_e {\mathbf {K}}_e {\mathbf {u}}_e \right \} \) is the strain energy vector. In this case, the global optimal solution of the lower-level minimization problem (99) is simply governed by the linear equilibrium equation \({\mathbf{K}}({\mathbf{z}}) {\mathbf{u}} = {\mathbf{f}}\). Then for \(T = T_1\), the bilevel knapsack problem \(({\mathcal {P}}_{to})\) can be written as the single-level reduction:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{le}): \;\; \min \left\{ {\mathbf{f}}^T {\mathbf{u}} - {\mathbf{z}}^T {\mathbf{c}}({\mathbf{u}}) \;\; | \;\; {\mathbf{K}}({\mathbf{z}} ) {\mathbf{u}} = {\mathbf{f}} , \;\; {\mathbf{v}}^T {\mathbf{z}} \le \omega, \;\; {\mathbf{z}} \in \{0,1\}^n \right\} . \end{aligned} $$
(115)

This knapsack-type problem makes perfect sense in topology optimization: among all elements \(\{\Omega_e\}\) with the given volume vector \({\mathbf{v}} = \{v_e\}\), one should keep only those that store more strain energy density \({\mathbf{c}}({\mathbf{u}})\). Based on the canonical dual solution to the knapsack problem, a canonical duality algorithm (CDT) has been developed with successful applications.
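To make the single-level reduction (115) concrete, the following Python sketch solves a toy instance by brute-force enumeration. It is only an illustration under simplifying assumptions that are not in the text: the structure is a set of parallel bars sharing one displacement degree of freedom (so \(m = 1\) and \({\mathbf{K}}({\mathbf{z}})\) reduces to the scalar \(\sum_e z_e k_e\)); the stiffnesses `k`, volumes `v`, load `f`, bound `omega`, and the function name `solve_ple_bruteforce` are hypothetical; and plain enumeration stands in for the CDT algorithm, which is not reproduced here.

```python
from itertools import product


def solve_ple_bruteforce(k, v, f, omega):
    """Enumerate z in {0,1}^n for a toy parallel-bar instance of (115).

    k[e]: stiffness of element e (a scalar K_e), v[e]: its volume,
    f: external load, omega: volume bound. Returns (T1, z, u, c).
    """
    best = None
    for z in product((0, 1), repeat=len(k)):
        if sum(ve * ze for ve, ze in zip(v, z)) > omega:
            continue  # violates the knapsack constraint v^T z <= omega
        K = sum(ze * ke for ke, ze in zip(k, z))
        if K <= 0:
            continue  # no solid element: K(z) u = f has no solution
        u = f / K  # equilibrium K(z) u = f
        c = [0.5 * ke * u * u for ke in k]  # strain energy per element
        T1 = f * u - sum(ze * ce for ze, ce in zip(z, c))
        if best is None or T1 < best[0]:
            best = (T1, z, u, c)
    return best


# Hypothetical data: one stiff heavy bar and two lighter bars.
T1, z, u, c = solve_ple_bruteforce(k=[3.0, 2.0, 2.0], v=[2.0, 1.0, 1.0],
                                   f=1.0, omega=2.0)
print("z =", z, " T1 =", T1, " u =", u)
```

In this instance the two light bars together are stiffer than the single heavy bar, so the enumeration selects \({\mathbf{z}} = (0, 1, 1)\) rather than the stiffest single element. Enumeration costs \(2^n\) evaluations and is usable only for very small \(n\).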

In terms of the stress, the full-stress design problem \(({\mathcal {P}}^*_{fs})\) for linear elastic structures can be simply given as:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^*_{fs}):\;\;\;\; \max \{ {\mathbf{z}}^T {\mathbf{c}}^d( {\boldsymbol{\sigma}}) \;\; | \;\;{{\mathbf{z}}\in \mathcal{Z}_a, \;\; {\boldsymbol{\sigma}} \in \mathcal{S}^+_a} \} , \end{aligned} $$
(116)

where \({\mathbf {c}}^d({\boldsymbol {\sigma }}) = \{ \frac {1}{2} {\boldsymbol {\sigma }}^T_e {\mathbf {C}}_e {\boldsymbol {\sigma }}_e \} \in {\mathbb R}^n_+ \) is the stress energy vector, \({\mathbf{C}}_e\) is the compliance matrix of the \(e\)-th element,

$$\displaystyle \begin{aligned} \mathcal{S}^+_a = \{ {\boldsymbol{\sigma}} \in {\mathbb R}^p | \;\; {\mathbf{D}}^* {\boldsymbol{\sigma}} = {\mathbf{f}}, \;\; \|{\boldsymbol{\sigma}}\|{}_g \le {\sigma}_c \} , \end{aligned} $$
(117)

in which \(\sigma_c > 0\) is a material constant, and \({\mathbf {D}}^* \in {\mathbb R}^{m\times p}\) is a balance operator depending on the polynomial interpolation in the mixed finite element method [10, 11].

Example

Let us consider the classical 2-D long cantilever beam (see Figure 2). The correct topology optimization model for this benchmark problem should be \(({\mathcal {P}}_{le})\) [31]. We let \(\omega = 0.4\), and the pre-given domain \(\omega_0 = 1\) is discretized by nex × ney = 180 × 60 elements. Computational results obtained by the CDT and by the popular methods SIMP and BESO are summarized in Figure 3, where \(C = {\mathbf{z}}^T {\mathbf{c}}({\mathbf{u}})\) is the total strain energy. The parameters used are penal = 3, rmin = 1 for BESO and penal = 3, rmin = 1.5, ft = 1 for SIMP. Clearly, the precise solid-void solution produced by the CDT method is much better than the approximate results produced by the other methods. In order to examine the strain energy distribution \({\mathbf{c}} = \{c_e({\mathbf{u}})\}\) in the optimal structure, we let nex × ney = 80 × 30. Figure 4 shows clearly that the CDT can produce a mechanically sound structure with a homogeneous distribution of strain energy density. A detailed study of the canonical duality theory for solving topology optimization problems is given in [30,31,32].

Fig. 2

The design domain for a long cantilever beam with external load

Fig. 3

Computational results by SIMP (a), BESO (b), and CDT (c) with ω = 0.4.

Fig. 4

Strain energy distributions by SIMP (a), BESO (b), and CDT (c) with ω = 0.4.

5 Symmetry, NP-Hardness, and Perturbation Methods

The concept of symmetry is closely related to duality and, in a certain sense, can be viewed as a geometric duality. Mathematically, symmetry means invariance under transformation. By canonicality, the object \(G({\mathbf{g}})\) naturally possesses certain symmetry. If the subject \(F({\boldsymbol{\chi}}) = 0\), then \(\Pi({\boldsymbol{\chi}}) = G({\mathbf{D}} {\boldsymbol{\chi}}) = \Phi(\Lambda({\boldsymbol{\chi}}))\) and \(({\mathcal {P}}_g)\) should have either a trivial solution or multiple solutions due to the symmetry. In this case, \(\Pi^d({\boldsymbol{\varsigma}}) = -\Phi^*({\boldsymbol{\varsigma}})\) is concave and, by the triality theory, its critical point \(\bar {\boldsymbol {\varsigma }} \in \mathcal {S}^-_c\) is a global maximizer, and \(\bar {\boldsymbol {\chi }} = [{\mathbf {G}}(\bar {\boldsymbol {\varsigma }})]^+ {\mathbf {f}} = 0\) is the biggest local maximizer of \(\Pi({\boldsymbol{\chi}})\), while the global minimizers must be \(\bar {\boldsymbol {\chi }}(\bar {\boldsymbol {\varsigma }})\) for those \(\bar {\boldsymbol {\varsigma }} \in \partial \mathcal {S}_c^+\) such that

$$\displaystyle \begin{aligned} \Pi^g(\bar{\boldsymbol{\varsigma}}) = \min \{ - \Phi^*({\boldsymbol{\varsigma}}) | \;\; \det{\mathbf{G}}({\boldsymbol{\varsigma}}) = 0 \;\; \forall {\boldsymbol{\varsigma}} \in \mathcal{S}_c\}. \end{aligned} $$
(118)

Clearly, this nonconvex constrained concave minimization problem could be really NP-hard. Therefore, many well-known NP-hard problems in computer science and global optimization are not well-posed problems. One example is the max-cut problem, which is a special case of the quadratic integer programming problem \(({\mathcal {P}}_{qi})\). Due to the symmetry \({\mathbf{Q}} = {\mathbf{Q}}^T\) and \({\mathbf{f}} = 0\), its canonical dual problem has multiple solutions on the boundary of \(\mathcal {S}^+_c\). The problem is considered NP-complete even if \(Q_{ij} = 1\) for all edges. Strictly speaking, this is not a real-world problem but only a perfect geometrical model. Without sufficient geometrical constraints in \(\mathcal {X}_a\), the graph is not physically fixed and any rigid motion is possible. However, by adding a linear perturbation \({\mathbf{f}} \neq 0\), this problem can be solved efficiently by the canonical duality theory [85]. Also, it was proved by the author [25, 35] that the general quadratic integer problem \(({\mathcal {P}}_{qi})\) has a unique solution as long as the input \({\mathbf{f}} \neq 0\) is big enough. These results show that the subjective function plays an essential role in symmetry breaking, which leads to a well-posed problem. To explain the theory and understand NP-hard problems, let us consider a simple problem.

Example 1 (Nonconvex Minimization in \({\mathbb R}^n\))

$$\displaystyle \begin{aligned} \begin{array}{rcl} \min \left\{ \Pi({\mathbf{x}})=\frac{1}{2} \alpha (\frac{1}{2}\|{\mathbf{x}}\|{}^2-{\lambda})^2-{\mathbf{x}}^T {\mathbf{f}} \;\;\; \forall {\mathbf{x}} \in {\mathbb R}^n \right\} ,{} \end{array} \end{aligned} $$
(119)

where \(\alpha, \lambda > 0\) are given parameters. Let \( {\Lambda }({\mathbf {x}})=\frac {1}{2}\|{\mathbf {x}}\|{ }^2 \in {\mathbb R}\); the canonical dual function is \( \Pi ^d(\varsigma )= - \frac {1}{2} \varsigma ^{-1} \|{\mathbf {f}} \|{ }^2 -{\lambda } \varsigma -\frac {1}{2} \alpha ^{-1} \varsigma ^2 , \) which is defined on \(\mathcal {S}_c = \{ \varsigma \in {\mathbb R}| \;\; \varsigma \neq - {\lambda } , \;\; \varsigma = 0 \mbox{ iff } {\mathbf {f}} = 0\}\). The criticality condition \(\nabla \Pi^d(\varsigma) = 0\) leads to the canonical dual equation:

$$\displaystyle \begin{aligned} (\alpha^{-1} \varsigma+{\lambda})\varsigma^2=\frac{1}{2} \|{\mathbf{f}} \|{}^2 . \end{aligned} $$
(120)

This cubic equation has at most three real solutions satisfying \(\varsigma_1 \ge 0 \ge \varsigma_2 \ge \varsigma_3\), and, correspondingly, \(\{{\mathbf{x}}_i = {\mathbf{f}}/\varsigma_i\}\) are three critical points of \(\Pi({\mathbf{x}})\). Since \(\varsigma _ 1 \in \mathcal {S}^+_a = \{ \varsigma \in {\mathbb R}\; |\; \varsigma \ge 0 \}\), \({\mathbf{x}}_1\) is a global minimizer of \(\Pi({\mathbf{x}})\), while for \(\varsigma _2, \varsigma _3 \in \mathcal {S}^-_a= \{ \varsigma \in {\mathbb R}\; |\; \varsigma < 0 \}\), \({\mathbf{x}}_2\) and \({\mathbf{x}}_3\) are a local minimizer (for \(n = 1\)) and a local maximizer of \(\Pi({\mathbf{x}})\), respectively (see Figure 5(a)).

Fig. 5

Graphs of \(\Pi(x)\) (solid) and \(\Pi^g(\varsigma)\) (dashed) (\(\alpha = 1\), \(\lambda = 2\))
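Example 1 can be checked numerically for \(n = 1\). The sketch below finds the real roots of the canonical dual equation (120) by bisection, maps each root to the primal point \(x_i = f/\varsigma_i\), and compares \(\Pi(x_i)\) with \(\Pi^d(\varsigma_i)\). The value \(f = 0.5\), the helper names, and the use of bisection are choices made here for illustration; only \(\alpha = 1\) and \(\lambda = 2\) come from the figure caption.

```python
def primal(x, alpha, lam, f):
    """Pi(x) from (119) for n = 1."""
    return 0.5 * alpha * (0.5 * x * x - lam) ** 2 - x * f


def dual(s, alpha, lam, f):
    """Canonical dual function Pi^d(s), defined for s != 0."""
    return -0.5 * f * f / s - lam * s - 0.5 * s * s / alpha


def residual(s, alpha, lam, f):
    """Left side minus right side of the canonical dual equation (120)."""
    return (s / alpha + lam) * s * s - 0.5 * f * f


def bisect(fun, lo, hi, tol=1e-13):
    """Bisection on [lo, hi]; fun(lo) and fun(hi) must differ in sign."""
    f_lo = fun(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dual_roots(alpha, lam, f):
    """Real roots of (120) for f != 0, in decreasing order."""
    h = lambda s: residual(s, alpha, lam, f)
    hi = 1.0
    while h(hi) <= 0:  # h(0) < 0 and h grows like s^3, so this terminates
        hi *= 2.0
    roots = [bisect(h, 0.0, hi)]  # the unique positive root
    s_turn = -2.0 * alpha * lam / 3.0  # local maximizer of h on s < 0
    if h(s_turn) > 0:  # two further negative roots exist
        roots.append(bisect(h, s_turn, 0.0))
        roots.append(bisect(h, -alpha * lam, s_turn))  # h(-alpha*lam) < 0
    return roots


alpha, lam, f = 1.0, 2.0, 0.5
sigmas = dual_roots(alpha, lam, f)
xs = [f / s for s in sigmas]
for s, x in zip(sigmas, xs):
    print(f"sigma = {s:+.4f}  x = {x:+.4f}  "
          f"Pi(x) = {primal(x, alpha, lam, f):+.4f}  "
          f"Pi^d(sigma) = {dual(s, alpha, lam, f):+.4f}")
```

For these parameters the cubic has three real roots, the primal and dual values agree at each pair \((x_i, \varsigma_i)\), and the point obtained from the positive root has the lowest primal value, in line with the triality statement above.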

If we let \({\mathbf{f}} = 0\), the graph of \(\Pi({\mathbf{x}})\) is symmetric (i.e., the so-called double-well potential, or the Mexican hat for \(n = 2\) [23]) with an infinite number of global minimizers satisfying \(\|{\mathbf{x}}\|^2 = 2\lambda\). In this case, the canonical dual \(\Pi ^g (\varsigma ) = - \frac {1}{2} {\alpha }^{-1} \varsigma ^2 - {\lambda } \varsigma \) is strictly concave with only one critical point (local maximizer) \(\varsigma_3 = -\alpha\lambda < 0\). The corresponding solution \({\mathbf{x}}_3 = {\mathbf{f}}/\varsigma_3 = 0\) is a local maximizer. By the canonical dual equation (120) we have \(\varsigma_1 = \varsigma_2 = 0\) located on the boundary of \(\mathcal {S}^+_a\), which corresponds to the two global minimizers \(x_{1,2} = \pm \sqrt {2 {\lambda }}\) for \(n = 1\); see Figure 5(b). If we let \(f = -2\), then the graph of \(\Pi(x)\) is quasi-convex with only one critical point and (120) has only one solution \(\varsigma _1 \in \mathcal {S}^+_c\) (see Figure 5(c)).

This simple example reveals an important truth: symmetry is the key that leads to multiple solutions. Theoretically speaking, nothing is perfect in this real world, so a perfect symmetry is not allowed for any real-world problem. Thus, any real-world problem must be well-posed [29]. In reality, it is impossible to precisely model any real-world problem. Although most NP-hard problems are artificial, they appear extensively not only in global optimization and computer science but also in chaotic dynamical systems, decision science, and philosophy, for example, the well-known Buridan’s ass paradox in its simplest version.

Example 2 (Paradox of Buridan’s Ass and Perturbation)

A donkey facing two identical hay piles starves to death because reason provides no grounds for choosing to eat one rather than the other.

The mathematical problem of this paradox was formulated in [31]:

$$\displaystyle \begin{aligned} \max \{ c_1 z_1 + c_2 z_2 |\;\;\; c_1 = c_2 = c, \;\; z_1 + z_2 \le 1, \;\; (z_1, z_2) \in \{ 0,1\}^2 \} . \end{aligned} $$
(121)

Clearly, this is a linear knapsack problem in \({\mathbb R}^2\). Due to the symmetries \(v_1 = v_2 = 1\) and \(c_1 = c_2 = c\), the solution to (77) is \(\tau_c = c\). Therefore, \(\psi_i(\tau_c) = 0 \;\; \forall i = 1, 2\) and by Theorem 2 this problem has multiple (two) solutions, which is NP-hard to this donkey.

In order to solve such NP-hard problems, the key idea is to break the symmetry. A linear perturbation method has been proposed by the author and his co-workers. This method is based on a simple truth, i.e., it is impossible to have two identical hay piles. Thus, adding a linear perturbation term \(\epsilon \rho_1\) to the cost function breaks the symmetry: for \(c = 2\), \(\epsilon = 0.05\), the solution to (77) is \(\tau_c = 2.0184\). So, the condition (75) holds for \(i = 1, 2\) and, by the canonical duality theory, the perturbed Buridan’s ass problem has a unique solution \({\mathbf{z}} = (1, 0)\).
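The effect of the perturbation on problem (121) can be seen by direct enumeration. The sketch below is not the canonical dual method (equations (75) and (77) are not used); it simply lists all maximizers of the knapsack problem with and without the perturbation. It assumes that the perturbation adds \(\epsilon\) to the cost of the first hay pile; the function name `knapsack_maximizers` is hypothetical.

```python
from itertools import product


def knapsack_maximizers(costs, volumes, capacity, tol=1e-12):
    """Return (best value, all maximizers) of
    max c^T z  s.t.  v^T z <= capacity,  z in {0,1}^n, by enumeration."""
    best, winners = None, []
    for z in product((0, 1), repeat=len(costs)):
        if sum(v * zi for v, zi in zip(volumes, z)) > capacity:
            continue  # infeasible: both piles cannot be eaten at once
        value = sum(c * zi for c, zi in zip(costs, z))
        if best is None or value > best + tol:
            best, winners = value, [z]
        elif abs(value - best) <= tol:
            winners.append(z)  # a tie: another maximizer
    return best, winners


c, eps = 2.0, 0.05
symmetric = knapsack_maximizers([c, c], [1, 1], 1)
perturbed = knapsack_maximizers([c + eps, c], [1, 1], 1)
print("symmetric:", symmetric)
print("perturbed:", perturbed)
```

With identical costs the two choices \((0, 1)\) and \((1, 0)\) tie, whereas the perturbed problem has the single maximizer \((1, 0)\), matching the unique solution stated above.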

The perturbation method has been successfully applied for solving many challenging problems including hard cases of the trust region method [7], NP-hard problems in integer programming [84, 85], and nonconvex constrained optimization in Euclidean geometry [62]. Since the subjective function \(F({\boldsymbol{\chi}}) = \langle {\boldsymbol{\chi}}, {\mathbf{f}} \rangle\) plays a key role in real-world problems, the following conjecture was proposed recently [26, 28].

Conjecture 2

For any given properly posed problem \(({\mathcal {P}}_g)\) under Assumption 1, there exists a constant \(f_c > 0\) such that \(({\mathcal {P}}^g)\) has a unique solution in \(\mathcal {S}^+_c\) as long as \(\|{\mathbf{f}}\| \ge f_c\).

This conjecture suggests that no properly posed problem is NP-hard if the input \(\|{\mathbf{f}}\|\) is big enough. Generally speaking, most NP-hard problems have multiple solutions located either on the boundary or outside of \(\mathcal {S}^+_c\). Therefore, a quadratic perturbation method can be suggested as:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Xi_{\delta_k} ({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) &\displaystyle =&\displaystyle \Xi({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) + \frac{1}{2} \delta_k \|{\boldsymbol{\chi}} - {\boldsymbol{\chi}}_k \|{}^2\\ &\displaystyle =&\displaystyle \frac{1}{2} \langle {\boldsymbol{\chi}} , {\mathbf{G}}_{\delta_k}({\boldsymbol{\varsigma}}) {\boldsymbol{\chi}} \rangle - \Phi^*({\boldsymbol{\varsigma}}) - \langle {\boldsymbol{\chi}}, {\mathbf{f}}_{\delta_k} \rangle + \frac{1}{2} \delta_k \langle {\boldsymbol{\chi}}_k, {\boldsymbol{\chi}}_k \rangle , \end{array} \end{aligned} $$

where \(\delta_k > 0\) and \({\boldsymbol{\chi}}_k\) (\(k = 1, 2, \dots\)) are perturbation parameters, \({\mathbf {G}}_{\delta _k}({\boldsymbol {\varsigma }}) = {\mathbf {G}}({\boldsymbol {\varsigma }}) + \delta _k {\mathbf {I}} \), and \({\mathbf {f}}_{\delta _k} = {\mathbf {f}}+ \delta _k {\boldsymbol {\chi }}_k\). Thus, the original canonical dual feasible space \(\mathcal {S}_c^+\) can be enlarged to \(\mathcal {S}_{\delta _k}^+ = \{ {\boldsymbol {\varsigma }} \in \mathcal {S}_c | \; {\mathbf {G}}_{\delta _k}({\boldsymbol {\varsigma }}) \succ 0 \}\) such that a perturbed canonical dual problem can be proposed as:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^g_k): \;\; \max \left\{ \min \{ \Xi_{\delta_k} ({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) |\;\; {\boldsymbol{\chi}} \in \mathcal{X}_a \} | \; \; {\boldsymbol{\varsigma}} \in \mathcal{S}_{\delta_k}^+ \right\}. \end{aligned} $$
(122)

Based on this problem, a canonical primal-dual algorithm has been developed with successful applications for solving sensor network optimization problems [70] and chaotic dynamics [54].
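The second equality in the definition of \(\Xi_{\delta_k}\) is a direct expansion of the quadratic term. As a short check, assuming (as the second line of that definition implies) that the unperturbed function is \(\Xi({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) = \frac{1}{2} \langle {\boldsymbol{\chi}}, {\mathbf{G}}({\boldsymbol{\varsigma}}) {\boldsymbol{\chi}} \rangle - \Phi^*({\boldsymbol{\varsigma}}) - \langle {\boldsymbol{\chi}}, {\mathbf{f}} \rangle\):

$$\displaystyle \begin{aligned} \frac{1}{2} \delta_k \|{\boldsymbol{\chi}} - {\boldsymbol{\chi}}_k \|{}^2 &= \frac{1}{2} \delta_k \langle {\boldsymbol{\chi}}, {\boldsymbol{\chi}} \rangle - \delta_k \langle {\boldsymbol{\chi}}, {\boldsymbol{\chi}}_k \rangle + \frac{1}{2} \delta_k \langle {\boldsymbol{\chi}}_k, {\boldsymbol{\chi}}_k \rangle , \\ \Xi_{\delta_k} ({\boldsymbol{\chi}}, {\boldsymbol{\varsigma}}) &= \frac{1}{2} \langle {\boldsymbol{\chi}} , ({\mathbf{G}}({\boldsymbol{\varsigma}}) + \delta_k {\mathbf{I}}) {\boldsymbol{\chi}} \rangle - \Phi^*({\boldsymbol{\varsigma}}) - \langle {\boldsymbol{\chi}}, {\mathbf{f}} + \delta_k {\boldsymbol{\chi}}_k \rangle + \frac{1}{2} \delta_k \langle {\boldsymbol{\chi}}_k, {\boldsymbol{\chi}}_k \rangle . \end{aligned} $$

Collecting the quadratic and linear terms in \({\boldsymbol{\chi}}\) gives \({\mathbf{G}}_{\delta_k}\) and \({\mathbf{f}}_{\delta_k}\) as defined above. The shift \(\delta_k {\mathbf{I}}\) is what enlarges the region where the quadratic form in \({\boldsymbol{\chi}}\) is positive definite.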

6 Connections with Popular Methods and Techniques

By the fact that the canonical duality-triality theory is a unified mathematical methodology with solid foundation in physics, it is naturally connected to many other powerful methods and techniques in different fields. This paper discusses only two well-known methods in optimization and a so-called composite minimization problem. Connections with other theories and methodologies can be found in [34, 56].

6.1 Relation with SDP Programming

Now, let us show the relation between the canonical duality theory and the popular semi-definite programming relaxation.

Theorem 8

Suppose that \(\Phi :\mathcal {E}_s \rightarrow {\mathbb R}\) is convex and \(\bar {\boldsymbol {\varsigma }} \in \mathcal {E}^*_a\) is a solution of the problem:

$$\displaystyle \begin{aligned} ({\mathcal{P}}^{sd}): \;\;\; \min \{ g + \Phi^*({\boldsymbol{\varsigma}}) \} \;\; s.t. \left( \begin{array}{cc} {\mathbf{G}}({\boldsymbol{\varsigma}}) & {\mathbf{f}} \\ {\mathbf{f}}^T & 2 g \end{array} \right) \succeq 0 \;\; \forall {\boldsymbol{\varsigma}} \in \mathcal{E}^*_a , \;\; g \in {\mathbb R}, \end{aligned} $$
(123)

then \(\bar{\boldsymbol{\chi}} = [{\mathbf{G}}(\bar{\boldsymbol{\varsigma}})]^+ {\mathbf{f}}\) is a global minimum solution to the nonconvex problem \(({\mathcal {P}})\).

Proof

The problem \(({\mathcal {P}}^d)\) can be equivalently written as the following problem (see [86]):

$$\displaystyle \begin{aligned} \min \left\{ g + \Phi^*({\boldsymbol{\varsigma}}) |\;\; g \ge G_{ap}^{\Lambda}({\boldsymbol{\varsigma}}), \;\; {\mathbf{G}}({\boldsymbol{\varsigma}}) \succeq 0 \;\; \forall {\boldsymbol{\varsigma}} \in \mathcal{E}^*_a\right \}. \end{aligned} $$
(124)

Then, by using the Schur complement lemma, this problem is equivalent to \(({\mathcal {P}}^{sd})\). The theorem is proved by the triality theory.
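The Schur complement step can be illustrated numerically. The sketch below assumes that, for a positive definite \({\mathbf{G}}\), the bound in (124) is \(G_{ap}^{\Lambda} = \frac{1}{2} {\mathbf{f}}^T {\mathbf{G}}^{-1} {\mathbf{f}}\) (its definition lies outside this section), fixes a hypothetical \(2 \times 2\) matrix `G` and vector `f`, and checks that the block matrix in (123) is positive definite for \(g\) above that bound and fails to be so below it. Sylvester's criterion on leading principal minors is used in place of an eigenvalue routine.

```python
def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def schur_threshold(G, f):
    """0.5 * f^T G^{-1} f for an invertible 2x2 matrix G."""
    d = det2(G)
    y0 = (G[1][1] * f[0] - G[0][1] * f[1]) / d  # G^{-1} f via the adjugate
    y1 = (-G[1][0] * f[0] + G[0][0] * f[1]) / d
    return 0.5 * (f[0] * y0 + f[1] * y1)


def block_matrix(G, f, g):
    """The matrix [[G, f], [f^T, 2g]] appearing in (123)."""
    return [[G[0][0], G[0][1], f[0]],
            [G[1][0], G[1][1], f[1]],
            [f[0], f[1], 2.0 * g]]


def is_positive_definite(m):
    """Sylvester's criterion for a symmetric 3x3 matrix."""
    return m[0][0] > 0 and det2([row[:2] for row in m[:2]]) > 0 and det3(m) > 0


G = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical positive definite G
f = [1.0, -1.0]               # hypothetical input
g_star = schur_threshold(G, f)
print("threshold g* =", g_star)
print("g* + 0.1:", is_positive_definite(block_matrix(G, f, g_star + 0.1)))
print("g* - 0.1:", is_positive_definite(block_matrix(G, f, g_star - 0.1)))
```

At \(g = g^*\) the block matrix is singular and positive semi-definite, which is the boundary case of the constraint in (123); a negative determinant for \(g < g^*\) means a negative eigenvalue, so the semi-definite constraint is violated there.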

It was proved [35] that for the same problem \(({\mathcal {P}}_{qi})\), if we use a different geometrical operator:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\Lambda}({\mathbf{x}}) &\displaystyle =&\displaystyle {\mathbf{x}} {\mathbf{x}}^T \in \mathcal{E}_a = \{ {\boldsymbol{\xi}} \in {\mathbb R}^{n\times n}| \; {\boldsymbol{\xi}} = {\boldsymbol{\xi}}^T, \;\; {\boldsymbol{\xi}} \succeq 0, \\ &\displaystyle &\displaystyle {\mbox{rank }} {\boldsymbol{\xi}} = 1, \;\; \xi_{ii} = 1 \;\; \forall i = 1, \dots, n\}, \end{array} \end{aligned} $$

and the associated canonical function \( \Phi ({\boldsymbol {\xi }}) = \frac {1}{2} \langle {\boldsymbol {\xi }} ; {\mathbf {Q}} \rangle + \{ 0 \mbox{ if } \; {\boldsymbol {\xi }} \in \mathcal {E}_a , + \infty \mbox{ otherwise} \}\), where \(\langle {\boldsymbol{\xi}} ; {\boldsymbol{\varsigma}} \rangle = \mathrm{tr}({\boldsymbol{\xi}}^T {\boldsymbol{\varsigma}})\), we should obtain the same canonical dual problem \(({\mathcal {P}}^d_{qi})\). In particular, if \({\mathbf{f}} = 0\), then (\({\mathcal {P}}_{qi}\)) is a typical linear semi-definite programming problem:

$$\displaystyle \begin{aligned} \min \frac{1}{2} \langle {\boldsymbol{\xi}} ; {\mathbf{Q}} \rangle \;\; s.t. \; {\boldsymbol{\xi}} \in \mathcal{E}_a . \end{aligned}$$

Since \(\mathcal {E}_a\) is not bounded and there is no input, this problem is not properly posed and could have either no solution or multiple solutions for a given indefinite \({\mathbf{Q}} = {\mathbf{Q}}^T\).

SDP has been used for solving a canonical dual problem in the post-buckling analysis of a large deformed elastic beam [1].

6.2 Relation to Reformulation-Linearization/Convexification Technique

The Reformulation-Linearization/Convexification Technique (RLT) proposed by H. Sherali and C.H. Tuncbilek [75] is a well-known approach for efficiently solving general polynomial programming problems. The key idea of this technique is also to introduce a geometrically nonlinear operator \({\boldsymbol{\xi}} = \Lambda({\mathbf{x}})\) such that the higher-order polynomial object \(G({\mathbf{x}})\) can be reduced to a lower-order polynomial \(\Phi({\boldsymbol{\xi}})\). In particular, for the quadratic minimization problem with linear inequality constraints in \(\mathcal {X}_a\):

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{q}): \;\;\min \left\{ \Pi({\mathbf{x}}) = \frac{1}{2} {\mathbf{x}}^T {\mathbf{Q}} {\mathbf{x}} - {\mathbf{x}}^T {\mathbf{f}} |\;\; {\mathbf{x}} \in \mathcal{X}_a \right\}, \end{aligned} $$
(125)

by choosing the quadratic transformation:

$$\displaystyle \begin{aligned} {\boldsymbol{\xi}} = {\Lambda}({\mathbf{x}}) = {\mathbf{x}} \overrightarrow{\otimes} {\mathbf{x}} \in \mathcal{E}_a \subseteq {\mathbb R}^{n\times n} , \;\; i.e., \;\; {\boldsymbol{\xi}} = \{ \xi_{ij} \} = \{ x_i x_j \}, \;\; \forall 1 \le i \le j \le n,\end{aligned} $$
(126)

where \(\overrightarrow {\otimes } \) represents the Kronecker product (avoiding symmetric terms, i.e., \(\xi_{ij} = \xi_{ji}\)), the quadratic object \(G({\mathbf{x}})\) can be reformulated as the following first-level RLT linear relaxation:

$$\displaystyle \begin{aligned} {G}( {\mathbf{x}}) = \frac{1}{2} {\mathbf{x}}^T {\mathbf{Q}} {\mathbf{x}} = \frac{1}{2} \sum_{k=1}^n q_{kk} \xi_{kk} + \sum_{k=1}^{n-1} \sum_{l=k+1}^n q_{kl} \xi_{kl} = \Phi({\boldsymbol{\xi}}). \end{aligned} $$
(127)
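Identity (127) is easy to verify numerically. The following sketch lifts a vector \({\mathbf{x}}\) to the products \(\xi_{ij} = x_i x_j\) for \(i \le j\) as in (126) and compares the quadratic form with the linear function of \({\boldsymbol{\xi}}\). The matrix `Q`, the vector `x`, and the helper names are hypothetical test data chosen here.

```python
def lift(x):
    """xi_ij = x_i * x_j for i <= j, as in (126)."""
    n = len(x)
    return {(i, j): x[i] * x[j] for i in range(n) for j in range(i, n)}


def quad(Q, x):
    """G(x) = 0.5 * x^T Q x for a symmetric matrix Q."""
    n = len(x)
    return 0.5 * sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def phi(Q, xi):
    """Right-hand side of (127): linear in the lifted variable xi."""
    n = len(Q)
    diagonal = 0.5 * sum(Q[k][k] * xi[(k, k)] for k in range(n))
    off_diagonal = sum(Q[k][l] * xi[(k, l)]
                       for k in range(n) for l in range(k + 1, n))
    return diagonal + off_diagonal


Q = [[2.0, -1.0, 0.5],
     [-1.0, 0.0, 3.0],
     [0.5, 3.0, -4.0]]  # symmetric, indefinite
x = [1.0, -2.0, 0.5]
print(quad(Q, x), phi(Q, lift(x)))
```

The two values coincide whenever \({\boldsymbol{\xi}}\) is exactly the lifting of \({\mathbf{x}}\); the relaxation arises only once \({\boldsymbol{\xi}}\) is treated as an independent variable.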

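The linear \(\Phi({\boldsymbol{\xi}})\) can be considered as a special canonical function since \({\boldsymbol{\varsigma}} = \nabla \Phi({\boldsymbol{\xi}})\) is a constant and \(\Phi^*({\boldsymbol{\varsigma}}) = \langle {\boldsymbol{\xi}} ; {\boldsymbol{\varsigma}} \rangle - \Phi({\boldsymbol{\xi}}) \equiv 0\) is uniquely defined. Thus, using \(\Phi({\boldsymbol{\xi}}) = \langle {\boldsymbol{\xi}} ; {\boldsymbol{\varsigma}} \rangle\) to replace \(G({\mathbf{x}})\) and considering \({\boldsymbol{\xi}}\) as an independent variable, the problem \(({\mathcal {P}}_{q})\) can be relaxed by the following RLT linear program: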

$$\displaystyle \begin{aligned} ({\mathcal{P}}_{RLT}): \;\; \min \left\{ \Phi({\boldsymbol{\xi}}) - \langle {\mathbf{x}} , {\mathbf{f}} \rangle | \;\; {\mathbf{x}} \in \mathcal{X}_a, \;\; {\boldsymbol{\xi}} \in \mathcal{E}_a \right\}. \end{aligned} $$
(128)

Based on this RLT linear program, a branch and bound algorithm was designed [76]. It is proved that if \((\bar {\mathbf {x}}, \bar {\boldsymbol {\xi }})\) solves \(({\mathcal {P}}_{RLT})\), then its objective value yields a lower bound of \(({\mathcal {P}}_q)\) and \(\bar {\mathbf {x}}\) provides an upper bound for \(({\mathcal {P}}_q)\). Moreover, if \(\bar {\boldsymbol {\xi }} = {\Lambda }(\bar {\mathbf {x}}) = \bar {\mathbf {x}} \overrightarrow {\otimes } \bar {\mathbf {x}}\), then \(\bar {\mathbf {x}}\) solves \(({\mathcal {P}}_q)\).
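The bounding property can be illustrated on a one-dimensional instance. The sketch below is an illustration under stated assumptions rather than the algorithm of [76]: it takes \(\mathcal{X}_a = [0, 1]\), replaces \(x^2\) by \(\xi\), and describes \(\mathcal{E}_a\) by the three bound-factor product constraints listed in the comments, which is the usual RLT construction but is not spelled out in the text above. The function names and test data are hypothetical. Because the relaxed feasible set is a triangle, the linear program is solved by comparing its three vertices.

```python
def rlt_1d(q, f):
    """RLT linear relaxation of min 0.5*q*x**2 - f*x over 0 <= x <= 1.

    Bound-factor products, with xi standing in for x*x:
        x * x >= 0           ->  xi >= 0
        (1 - x) * (1 - x) >= 0  ->  xi >= 2*x - 1
        x * (1 - x) >= 0     ->  xi <= x
    Returns (x_bar, xi_bar, lower_bound, upper_bound).
    """
    # Vertices (x, xi) of the triangle cut out by the three constraints.
    vertices = [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]
    # A linear objective attains its minimum over a polytope at a vertex.
    x_bar, xi_bar = min(vertices, key=lambda v: 0.5 * q * v[1] - f * v[0])
    lower = 0.5 * q * xi_bar - f * x_bar      # value of the relaxation
    upper = 0.5 * q * x_bar ** 2 - f * x_bar  # original objective at x_bar
    return x_bar, xi_bar, lower, upper


def true_min(q, f, steps=1000):
    """Grid search for the original problem, used only as a reference."""
    return min(0.5 * q * x * x - f * x
               for x in (i / steps for i in range(steps + 1)))


for q, f in [(2.0, 1.0), (-2.0, -0.5)]:
    x_bar, xi_bar, lower, upper = rlt_1d(q, f)
    print(f"q = {q:+.1f}, f = {f:+.1f}: x_bar = {x_bar}, xi_bar = {xi_bar}, "
          f"lower = {lower}, true min = {true_min(q, f)}, upper = {upper}")
```

In the first (convex) instance the relaxation is not tight, since \(\bar{\xi} = 0\) differs from \(\bar{x}^2 = 0.25\), so only the lower and upper bounds are obtained. In the second (concave) instance \(\bar{\xi} = \bar{x}^2\) holds and \(\bar{x} = 1\) solves the original problem.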

This technique has been significantly adapted along with supporting approximation procedures to solve a variety of more general nonconvex constrained optimization problems having polynomial or more general factorable objective and constraint functions [74].

Since for any symmetric \({\mathbf{Q}}\) there exists \({\mathbf {D}} \in {\mathbb R}^{m\times n}\) such that \({\mathbf{Q}} = {\mathbf{D}}^T {\mathbf{H}} {\mathbf{D}}\) with \({\mathbf {H}} = \{ h_{kk} = \pm 1, \;\; h_{kl} = 0 \;\; \forall k\neq l\} \in {\mathbb R}^{m\times m}\), the canonicality condition (127) can be simplified as:

$$\displaystyle \begin{aligned} {G}({\mathbf{D}} {\mathbf{x}}) = \frac{1}{2} ({\mathbf{D}}{\mathbf{x}})^T {\mathbf{H}} ({\mathbf{D}}{\mathbf{x}}) = \frac{1}{2} \sum_{k=1}^m h_{kk} \xi_{kk} = \Phi({\boldsymbol{\xi}}), \end{aligned} $$
(129)
$$\displaystyle \begin{aligned} {\boldsymbol{\xi}} ={\Lambda}({\mathbf{x}}) = ({\mathbf{D}} {\mathbf{x}}) \overrightarrow{\otimes} ({\mathbf{D}} {\mathbf{x}}) \in {\mathbb R}^{m\times m}. \end{aligned} $$
(130)
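The reduction in (129) can be illustrated with a small numerical sketch. The matrix `D`, the signs `h`, the point `x`, and all function names are hypothetical; `D` is stored with m rows and n columns so that the product Dx is defined, and Q is built from D and H rather than factorized.

```python
# Hypothetical illustration of Q = D^T H D with H = diag(+1 / -1).

def matvec(D, x):
    """Matrix-vector product D x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in D]

def build_Q(D, h):
    """Assemble Q = D^T H D, where H = diag(h) and each h[k] is +1 or -1."""
    m, n = len(D), len(D[0])
    return [[sum(D[k][i] * h[k] * D[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]

def G_full(Q, x):
    """0.5 * x^T Q x evaluated with the full n x n matrix."""
    n = len(x)
    return 0.5 * sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def G_reduced(D, h, x):
    """0.5 * sum_k h_kk * xi_kk with xi_kk = (D x)_k ** 2, as in (129)."""
    y = matvec(D, x)
    return 0.5 * sum(h[k] * y[k] ** 2 for k in range(len(h)))

# Assumed example data with m = 2 and n = 4.
D = [[1.0, 2.0, 0.0, -1.0],
     [0.0, 1.0, 3.0, 1.0]]
h = [1.0, -1.0]
x = [1.0, -1.0, 2.0, 0.5]

Q = build_Q(D, h)
full_value = G_full(Q, x)
reduced_value = G_reduced(D, h, x)
```

In this example only the m = 2 diagonal terms \(\xi_{kk}\) enter the reduced form, compared with n(n + 1)/2 = 10 lifted terms in (127).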

Clearly, if the scale m ≪ n, the problem \(({\mathcal {P}}_{RLT})\) will be much easier than the problems using the geometrically nonlinear operator \({\boldsymbol {\xi }} = {\mathbf {x}} \overrightarrow {\otimes } {\mathbf {x}}\). Moreover, if we use the Lagrange multiplier \({\boldsymbol {\varsigma }} \in \mathcal {E}^*_a = \{ {\boldsymbol {\varsigma }} \in {\mathbb R}^{m\times m} | \;\; \langle {\Lambda }({\mathbf {x}}) ; {\boldsymbol {\varsigma }} \rangle \ge 0 \;\;\forall {\mathbf {x}} \in {\mathbb R}^{n} \}\) to relax the ignored geometrical condition ξ = Λ(x) in \(({\mathcal {P}}_{RLT})\), the problem \(({\mathcal {P}}_q)\) can be equivalently relaxed as:

$$\displaystyle \begin{aligned} ({\mathcal{P}}_\Upsilon): \;\; \min_{{\mathbf{x}} \in \mathcal{X}_a} \min_{{\boldsymbol{\xi}}\in \mathcal{E}_a } \max_{{\boldsymbol{\varsigma}}\in \mathcal{E}_a^*} \left \{ \Upsilon({\mathbf{x}}, {\boldsymbol{\xi}}, {\boldsymbol{\varsigma}}) = \Phi({\boldsymbol{\xi}}) + \langle {\Lambda}({\mathbf{x}}) - {\boldsymbol{\xi}} ; {\boldsymbol{\varsigma}} \rangle - \langle {\mathbf{x}} ,{\mathbf{f}} \rangle \right\}. \end{aligned} $$
(131)

Thus, if \((\bar {\mathbf {x}}, \bar {\boldsymbol {\xi }},\bar {\boldsymbol {\varsigma }})\) is a solution to \(({\mathcal {P}}_\Upsilon )\), then \(\bar {\mathbf {x}}\) should be a solution to \(({\mathcal {P}}_q)\). By using the sequential canonical quadratic transformation \({\Lambda}({\mathbf{x}}) = {\Lambda}_p(\dots({\Lambda}_1({\mathbf{x}}))\dots)\) (see Chapter 4, [18]), this technique can be used for solving general global optimization problems.

6.3 Relation to Composite Minimization

The so-called composite minimization in optimization literature is given in the following form [57]:

$$\displaystyle \begin{aligned} \min_x h(c(x)), \end{aligned} $$
(132)

where \(c:{\mathbb R}^n \rightarrow {\mathbb R}^m\) is called the inner function, and \(h:{\mathbb R}^m \rightarrow {\mathbb R}\cup \{ -\infty , +\infty \}\) is called the outer function. Although some mathematical assumptions are imposed, for example that c(x) is smooth and that h(y) may be nonsmooth but is usually convex, this is another abstractly proposed problem. Therefore, this problem arises mainly in numerical approximation methods, for example, the least squares method for solving the fixed point problem (53).

In numerical analysis or matrix completion [5], the variable x is a d × n matrix \({\mathbf {X}} = \{ {\mathbf {x}}_i \} = \{ {\mathbf {x}}^{\alpha }_i \} \; ({\alpha } = 1, \dots , d, \;\; i=1,\dots , n)\). In sensor network communication systems, the component \({\mathbf {x}}_i \in {\mathbb R}^d\) is the position of the i-th sensor and the well-studied sensor localization problem is to find the sensor locations \(\{ {\mathbf{x}}_i \}\) by solving the following nonlinear system [70]:

$$\displaystyle \begin{aligned} \|{\mathbf{x}}_i - {\mathbf{x}}_j \| = d_{ij} \;\; \forall (i,j) \in {\mathcal{A}}_d, \;\; \|{\mathbf{x}}_i - {{\mathbf{a}}}_k \| = e_{ik} \;\; \forall (i,k) \in {\mathcal{A}}_e \end{aligned} $$
(133)

where \(\{ d_{ij} \}\) and \(\{ e_{ik} \}\) are given distances, \({{\mathbf {a}}}_k \in {\mathbb R}^d\) (k = 1, …, m) are specified anchors, and \({\mathcal {A}}_d\) and \({\mathcal {A}}_e\) are two index sets. By the least squares method, this problem can be formulated as a fourth-order polynomial minimization:

$$\displaystyle \begin{aligned} \min \left\{ \Pi({\mathbf{X}}) = \sum_{(i,j) \in {\mathcal{A}}_d} \frac{1}{2} w_{ij} (\|{\mathbf{x}}_i - {\mathbf{x}}_j\|{}^2 - d_{ij} )^2 + \sum_{(i,k) \in {\mathcal{A}}_e} \frac{1}{2} \omega_{ik} (\|{\mathbf{x}}_i - {{\mathbf{a}}}_k\|{}^2 - e_{ik} )^2 \right\}, \end{aligned}$$

where \(w_{ij}\) and \(\omega_{ik}\) are given weights. Clearly, this is a composite minimization if we let

$$\displaystyle \begin{aligned} \begin{array}{rcl} c({\mathbf{X}}) &\displaystyle =&\displaystyle \{ {\mathbf{c}}_{ij}({\mathbf{X}}), \; {\mathbf{c}}_{ik} ({\mathbf{X}}) \} , \;\; {\mathbf{c}}_{ij} = {\mathbf{x}}_i - {\mathbf{x}}_j, \; {\mathbf{c}}_{ik} = {\mathbf{x}}_i - {{\mathbf{a}}}_k , \end{array} \end{aligned} $$
(134)
$$\displaystyle \begin{aligned} \begin{array}{rcl} h(c) &\displaystyle =&\displaystyle \sum_{(i,j) \in {\mathcal{A}}_d} \frac{1}{2} w_{ij} (\|{\mathbf{c}}_{ij} \|{}^2 - d_{ij})^2 + \sum_{(i,k) \in {\mathcal{A}}_e} \frac{1}{2} \omega_{ik} ( \|{\mathbf{c}}_{ik}\|{}^2 - e_{ik})^2. \end{array} \end{aligned} $$
(135)

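In this case, the matrix-valued function \(c({\mathbf{X}}) = {\mathbf{g}}({\mathbf{X}}) = {\mathbf{D}}{\mathbf{X}} = \{ {\mathbf{x}}_i - {\mathbf{x}}_j, \; {\mathbf{x}}_i - {{\mathbf{a}}}_k \}\) is the finite difference operator in numerical analysis and h(c) = G(g) is a fourth-order nonconvex polynomial of the linear operator g = DX.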

We can also let

$$\displaystyle \begin{aligned} \begin{array}{rcl} c({\mathbf{X}}) &\displaystyle =&\displaystyle \{ c_{ij} ({\mathbf{X}}) , c_{ik}({\mathbf{X}}) \}, \;\; c_{ij} = \|{\mathbf{x}}_i {-} {\mathbf{x}}_j\|{}^2 {-} d_{ij}, \; c_{ik} = \|{\mathbf{x}}_i {-} {{\mathbf{a}}}_k\|{}^2 {-} e_{ik} , \qquad \end{array} \end{aligned} $$
(136)
$$\displaystyle \begin{aligned} \begin{array}{rcl} h(c) &\displaystyle =&\displaystyle \sum_{(i,j) \in {\mathcal{A}}_d} \frac{1}{2} w_{ij} c_{ij}^2 + \sum_{(i,k) \in {\mathcal{A}}_e} \frac{1}{2} \omega_{ik} c_{ik}^2. \end{array} \end{aligned} $$
(137)

In this case, Π(X) = h(c(X)) is also a composite function but now c(X) = ξ(X) =  Λ(X) is a nonlinear operator and Φ(ξ) = h(ξ) is a convex function. Therefore, the composition:

$$\displaystyle \begin{aligned} \Pi({\mathbf{X}}) = h(c({\mathbf{X}}) ) = {G}({\mathbf{D}} {\mathbf{X}}) = \Phi({\Lambda}({\mathbf{X}})) \end{aligned}$$

is indeed a canonical transformation. The sensor localization problem is considered to be NP-hard by traditional theories and methods even if d = 1 [3]. From the point of view of the canonical duality theory, this problem usually has multiple global minimizers owing to the lack of a subjective function. Therefore, by introducing a linear perturbation \(F({\mathbf{X}}) = \langle {\mathbf{X}}, {\mathbf{T}} \rangle = \mbox{tr}({\mathbf{X}}^T {\mathbf{T}})\), the perturbed sensor localization problem \(\min \{ \Pi ({\mathbf {X}}) = \Phi ({\Lambda }({\mathbf {X}})) - F({\mathbf {X}})\}\) can be solved deterministically by the canonical duality theory in polynomial time [55, 68, 70].
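The two decompositions of the sensor localization objective, (134)–(135) and (136)–(137), can be compared on a toy instance. This is a sketch under stated assumptions: the network (two sensors, one anchor in the plane), the unit weights, and all function and variable names are hypothetical. The data d and e enter exactly as in the formulas above, i.e., they are subtracted from squared norms.

```python
# Hypothetical toy instance: the same objective written as two compositions h(c(X)).

def sq_norm(v):
    return sum(t * t for t in v)

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

def c_linear(X, anchors, A_d, A_e):
    """Inner function (134): difference vectors x_i - x_j and x_i - a_k."""
    cd = {(i, j): diff(X[i], X[j]) for (i, j) in A_d}
    ce = {(i, k): diff(X[i], anchors[k]) for (i, k) in A_e}
    return cd, ce

def h_quartic(c, d, e, w, om):
    """Outer function (135): fourth-order polynomial of the differences."""
    cd, ce = c
    return (sum(0.5 * w[p] * (sq_norm(cd[p]) - d[p]) ** 2 for p in cd)
            + sum(0.5 * om[p] * (sq_norm(ce[p]) - e[p]) ** 2 for p in ce))

def c_quadratic(X, anchors, A_d, A_e, d, e):
    """Inner function (136): residuals of the squared norms."""
    cd = {(i, j): sq_norm(diff(X[i], X[j])) - d[(i, j)] for (i, j) in A_d}
    ce = {(i, k): sq_norm(diff(X[i], anchors[k])) - e[(i, k)] for (i, k) in A_e}
    return cd, ce

def h_convex(c, w, om):
    """Outer function (137): convex quadratic of the residuals."""
    cd, ce = c
    return (sum(0.5 * w[p] * cd[p] ** 2 for p in cd)
            + sum(0.5 * om[p] * ce[p] ** 2 for p in ce))

# Assumed data, generated from the positions x_0 = (0, 0), x_1 = (1, 0), a_0 = (0, 1).
anchors = [[0.0, 1.0]]
A_d, A_e = [(0, 1)], [(0, 0), (1, 0)]
d = {(0, 1): 1.0}
e = {(0, 0): 1.0, (1, 0): 2.0}
w = {(0, 1): 1.0}
om = {(0, 0): 1.0, (1, 0): 1.0}

X_trial = [[0.5, 0.0], [1.0, 1.0]]
pi_linear = h_quartic(c_linear(X_trial, anchors, A_d, A_e), d, e, w, om)
pi_quadratic = h_convex(c_quadratic(X_trial, anchors, A_d, A_e, d, e), w, om)

X_exact = [[0.0, 0.0], [1.0, 0.0]]
pi_exact = h_convex(c_quadratic(X_exact, anchors, A_d, A_e, d, e), w, om)
```

Both compositions return the same value of Π at any X. The difference lies in where the nonlinearity sits: in the outer function for the first pair, and in the inner (geometrical) operator for the second, which leaves a convex outer function.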

Generally speaking, the composite function is a special case of the canonical transformation G(g) = Φ ∘ Λ(g) if h(y) = Φ(y) is convex, x = g, and Λ(x) = c(x) serves as the geometrical measure. It is an objective function if \(c({\mathbf{x}}) = {\mathbf{x}}^T {\mathbf{x}}\). In this case, h(c(x)) is the so-called convex composite function. In real-world applications, g(x) could itself be a composite function. For multi-scale systems, g can be defined by (see [45]):

$$\displaystyle \begin{aligned} {\mathbf{g}}({\mathbf{x}}) = ( {\mathbf{D}}_1 , {\mathbf{D}}_2, \dots , {\mathbf{D}}_k) {\mathbf{x}}= \{ {\mathbf{g}}_i ({\mathbf{x}}) \} ,\;\; {\mathbf{g}}_i({\mathbf{x}}) = {\mathbf{D}}_i {\mathbf{x}}, \end{aligned} $$
(138)

where each \({\mathbf{g}}_i\) is a geometrical measure with dimension different from that of any other \({\mathbf{g}}_j\), j ≠ i. Correspondingly:

$$\displaystyle \begin{aligned} {G}({\mathbf{D}}{\mathbf{x}}) = \Phi({\Lambda}({\mathbf{x}})) , \; \; {\Lambda}({\mathbf{x}}) = {\Lambda}_k \circ {\Lambda}_{k-1} \circ \dots \circ {\Lambda}_1({\mathbf{x}}) \end{aligned} $$
(139)

is called the sequential canonical transformation (see Chapter 4, [18]). Particularly, if every \({\Lambda}_i({\boldsymbol{\xi}}_{i-1})\) is a convex polynomial function of \({\boldsymbol{\xi}}_{i-1} = {\Lambda}_{i-1}\) (i = 1, …, k, \({\Lambda}_0 = {\mathbf{x}}\)), the composition Φ(Λ(x)) is the canonical polynomial function. The sequential canonical transformation for solving high-order polynomial minimization problems has been studied in [18, 46].
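As a small illustration of the composition in (139), the sketch below builds an eighth-order polynomial from two quadratic maps and a quadratic outer function. The particular maps, the parameters `a` and `b`, and the helper `compose` are hypothetical and chosen only to show the mechanics; they are not taken from [18, 46].

```python
from functools import reduce

def compose(maps):
    """Return Lambda_k o ... o Lambda_1 for maps = [Lambda_1, ..., Lambda_k]."""
    return lambda x: reduce(lambda xi, lam: lam(xi), maps, x)

a, b = 1.0, 0.5  # hypothetical parameters

def Lambda_1(x):
    """Convex quadratic of xi_0 = x."""
    return 0.5 * sum(t * t for t in x) - a

def Lambda_2(xi_1):
    """Convex quadratic of xi_1."""
    return 0.5 * xi_1 ** 2 - b

def Phi(xi_2):
    """Convex outer function of xi_2."""
    return 0.5 * xi_2 ** 2

Lambda = compose([Lambda_1, Lambda_2])

def Pi(x):
    """Composite form Phi(Lambda_2(Lambda_1(x)))."""
    return Phi(Lambda(x))

def Pi_direct(x):
    """The same eighth-order polynomial written out without the composition."""
    s = sum(t * t for t in x)
    return 0.5 * (0.5 * (0.5 * s - a) ** 2 - b) ** 2

x = [2.0, 2.0]
value = Pi(x)
```

Each level is only quadratic in its own argument, while the composed function is a nonconvex polynomial of degree eight in x.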

7 Conclusions

Based on the necessary conditions and basic laws in physics, a unified multi-scale global optimization problem is proposed in the canonical form:

$$\displaystyle \begin{aligned} \Pi({\boldsymbol{\chi}}) = {G}({\mathbf{D}} {\boldsymbol{\chi}}) - {F}({\boldsymbol{\chi}}) = \Phi({\Lambda}({\boldsymbol{\chi}})) - \langle {\boldsymbol{\chi}}, {\mathbf{f}} \rangle . \end{aligned} $$
(140)

The object G depends only on the model and \({G}({\mathbf {g}}) \ge 0 \;\forall {\mathbf {g}} \in \mathcal {G}_a\) is necessary; G should be an objective function for physical systems, but it is not necessary for artificial systems (such as management/manufacturing processes and numerical simulations, etc.). The subject F depends on each properly posed problem and must satisfy F(χ) ≥ 0 together with necessary geometrical constraints for the output \({\boldsymbol {\chi }} \in \mathcal {X}_a\) and equilibrium conditions for the input \({\mathbf {f}} \in \mathcal {X}^*_a\). The geometrical nonlinearity of Λ(χ) is necessary for nonconvexity in global optimization, bifurcation in nonlinear analysis, chaos in dynamics, and NP-hardness in computer science.

Developed from large deformation nonconvex analysis/mechanics, the canonical duality-triality is a precise mathematical theory with a solid foundation in physics and natural roots in philosophy, so it is naturally related to the traditional theories and powerful methods in global optimization and nonlinear analysis. By the fact that the canonical duality is a universal law of nature, this theory can be used not only to model real-world problems but also to solve a wide class of challenging problems in multi-scale complex systems. The conjectures proposed in this paper can be used for understanding and clarifying NP-hard problems.

It is the author’s hope that, by reading this paper, readers will gain a clear understanding not only of the canonical duality-triality theory and its potential applications in multi-disciplinary fields, but also of the generalized duality-triality principle and its role in modeling and understanding real-world problems.