1 Introduction

Many problems in structural optimization can be formulated as a minimization of the maximum eigenvalue of a symmetric matrix. Examples include maximization of the fundamental eigenfrequency (Seyranian et al. 1994; Du and Olhoff 2005), robust design optimization (Kobelev 1993; Takezawa et al. 2011; Brittain et al. 2012; Holmberg et al. 2015), and maximization of the fundamental buckling load (Cox and Overton 1992; Rahmatalla and Swan 2003; Stanford and Beran 2013).

Let λ 1(A) denote the largest eigenvalue of a symmetric matrix \(\boldsymbol {A} \in \mathbb {R}^{n \times n}\). In the present context A depends on the stiffness matrix arising from a finite element discretization of some structure. The stiffness matrix in turn depends on a set of design variables x, usually determining the amount of material present in each finite element. It is frequently observed that the multiplicity of λ 1(A) is greater than one close to or at an optimal solution x∗, rendering the function x ↦ λ 1(A(x)) non-smooth (Overton and Womersley 1993). In this note we provide sufficient conditions for this to happen. The most important of these is that the number of non-zero elements in x is greater than the order n of A. Since the conditions depend on the solution they cannot in general be checked a priori, but this is quite natural, since one expects applied loads and boundary conditions to have a significant impact on the properties of the optimal solutions.

The work presented here builds on the analysis of Pataki (1998) and follows it closely in parts. Pataki (1998) considered unconstrained minimization of the maximum eigenvalue (as did Dedieu (2000), using different techniques), but for structural optimization it is necessary to (at least) introduce lower bounds on the design variables and a linear equality constraint limiting the total amount of material used. The main result here is Theorem 1, which gives a lower bound (greater than one) on the multiplicity of λ 1(A(x∗)) when A depends linearly on the vector of design variables x and x∗ is an extreme point in the optimal solution set. In many problems of interest, however, A is a non-linear function of x. Fortunately, Lemma 1 shows that such problems can be reduced to problems of the type treated by Theorem 1, and Theorem 2 then provides a lower bound on the multiplicity of λ 1(A(x∗)) in the non-linear case.

2 Notation and basic facts

Let \(\mathbb {S}^{n}\) denote the space of n × n symmetric matrices with real entries. If a matrix in \(\mathbb {S}^{n}\) is positive semi-definite we say that it belongs to \(\mathbb {S}^{n}_{+}\). The eigenvalues of \(\boldsymbol {A} \in \mathbb {S}^{n}\) are denoted λ 1 ≥ … ≥ λ n, and we can write A = QDiag{λ i}Q^T, where \(\boldsymbol {Q} \in \mathbb {R}^{n\times n}\) is an orthogonal matrix and Diag{λ i} is a diagonal matrix with λ 1, … , λ n on the diagonal. The functions \(\lambda _{i}:\mathbb {S}^{n} \rightarrow \mathbb {R}\) are Lipschitz continuous and convex. The multiplicity of λ 1 is the largest number p, 1 ≤ p ≤ n, such that λ 1 = … = λ p > λ p+1; we denote this number mult(λ 1). Finally, tr(A) denotes the trace of A.

A face of a convex set C is a non-empty, convex subset F of C such that if \(F \ni x = \frac {1}{2}(x_{1}+x_{2})\) for some x 1, x 2 ∈ C, then x 1, x 2 ∈ F. A zero-dimensional face of a convex set, i.e. a point, is called an extreme point of that set.

We use the notation t(n) = n(n + 1)/2 for the n-th triangular number, and we let ℕ0 denote the set of non-negative integers.

3 The affine case

Let \(\boldsymbol {A} : \mathbb {R}^{m} \rightarrow \mathbb {S}^{n}\) be defined by

$$\boldsymbol{A}(\boldsymbol{x}) = \boldsymbol{A}_{0} + \sum\limits_{i=1}^{m}x_{i}\boldsymbol{A}_{i}, $$

where \(\boldsymbol {A}_{i}\in \mathbb {S}^{n}\) for i = 0,…, m. Consider the optimization problem

$$ \underset{\boldsymbol{x}\in\mathcal{X}}{\min} \quad \lambda_{1}(\boldsymbol{A}(\boldsymbol{x})), $$
(1)

where \(\mathcal {X} = \{\boldsymbol {x} \in \mathbb {R}^{m} \;|\; \boldsymbol {x} \geq \boldsymbol {0},\; \boldsymbol {a}^{\mathsf {T}}\boldsymbol {x} = b\}\). The vector \(\boldsymbol {a} \in \mathbb {R}^{m}\) is assumed to be such that this convex set is non-empty, compact and contains a point x > 0. The objective function in (1) is a composition of a convex and an affine function, hence convex. Since \(\mathcal {X}\) is compact and λ 1(⋅) is lower semi-continuous, the set of optimal solutions to (1) is non-empty and compact (Andréasson et al. 2005, Theorem 4.7).
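
Problem (1) can be prototyped directly with a convex modelling tool. The following is a minimal sketch, assuming CVXPY with an SDP-capable solver (such as SCS) is available; the matrices A_0, …, A_m and the data a, b are randomly generated stand-ins for quantities that would come from an actual finite element model, not part of the original paper.

```python
import numpy as np
import cvxpy as cp

# Hypothetical stand-in data for A_0, A_1, ..., A_m, a and b.
rng = np.random.default_rng(0)
n, m = 4, 8

def sym(M):
    return 0.5 * (M + M.T)

A0 = sym(rng.standard_normal((n, n)))
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]
a, b = np.ones(m), 1.0

# Problem (1): minimize the largest eigenvalue of A(x) over the set X.
x = cp.Variable(m, nonneg=True)                        # x >= 0
A_expr = A0 + sum(x[i] * A_list[i] for i in range(m))  # affine map A(x)
prob = cp.Problem(cp.Minimize(cp.lambda_max(A_expr)), [a @ x == b])
prob.solve()                                           # solved internally as an SDP

lam = np.linalg.eigvalsh(A0 + sum(x.value[i] * A_list[i] for i in range(m)))[::-1]
print(prob.value, lam[:3])   # the top eigenvalues frequently coalesce at the optimum
```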

The results proved in this paper concern extreme points of the set of optimal solutions to (1) so it is of interest to know that such points exist. Let \(\lambda _{1}^{\ast }\) denote the optimal value of (1). The set of optimal solutions is then

$$\mathcal{O} = \left\{ \boldsymbol{x} \in \mathbb{R}^{m} \; |\; \lambda_{1}(\boldsymbol{A}(\boldsymbol{x})) = \lambda_{1}^{\ast} \right\} \, \cap \, \mathcal{X}. $$

The first set on the right-hand side is closed and convex (Pataki 1998, p. 348), so by Theorem 2.1 in Rockafellar (1972) \(\mathcal {O}\) is convex. Proposition 2.3.3 in Hiriart-Urruty and Lemaréchal (1993) then ensures the existence of an extreme point in \(\mathcal {O}\).

3.1 SDP-reformulation and the main theorem

There exist several different characterizations of the maximum eigenvalue of a symmetric matrix (Pataki 1998). Here we employ a characterization in which the eigenvalue is the optimal value of a certain semi-definite program (SDP).

Let x be given. A symmetric matrix is positive semi-definite if and only if all its eigenvalues are non-negative (Horn and Johnson 1985, Theorem 7.2.1). It follows that zI − A(x) ≽ 0 if and only if z − λ 1(A(x)) ≥ 0, where I is the identity matrix. Therefore λ 1(A(x)) is the optimal value of the SDP

$$\begin{array}{@{}rcl@{}} &&\underset{z\in \mathbb{R}, \boldsymbol{W} \in \mathbb{S}^{n}_{+}}{\min} \quad z\\ &&\quad\text{s. t.} \quad z\boldsymbol{I} - \boldsymbol{W} = \boldsymbol{A}(\boldsymbol{x}), \end{array} $$
(2)

where W is a slack variable. Since A(x) is symmetric, A(x) = QDiag{λ i}Q^T, with Q an orthogonal matrix of order n. Then (z∗, W∗) is an optimal solution to (2) if and only if z∗ = λ 1(A(x)) and

$$\boldsymbol{W}^{\ast} = \boldsymbol{Q}\text{Diag}\left\{0,z^{\ast}-\lambda_{2},\ldots,z^{\ast}-\lambda_{n}\right\}\boldsymbol{Q}^{\mathsf{T}}. $$

It follows from the last equality that

$$ \text{mult}\left(\lambda_{1}(\boldsymbol{A}(\boldsymbol{x}))\right) = n - \text{rank}(\boldsymbol{W}^{\ast}). $$
(3)

This implies that the lower the rank of W∗, the greater the multiplicity of λ 1(A(x)). Consequently, the proof of Theorem 1 below amounts to bounding the rank of W∗.
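
Relation (3) is easy to check numerically. The snippet below is a small sketch in which a hypothetical 4 × 4 diagonal matrix plays the role of A(x); it constructs W∗ from the eigendecomposition above and compares n − rank(W∗) with the multiplicity of λ 1.

```python
import numpy as np

# A hypothetical symmetric matrix standing in for A(x), with mult(lambda_1) = 2.
A = np.diag([3.0, 3.0, 1.0, -2.0])
n = A.shape[0]

lam, Q = np.linalg.eigh(A)           # eigenvalues in ascending order
lam, Q = lam[::-1], Q[:, ::-1]       # reorder so that lambda_1 >= ... >= lambda_n

z_star = lam[0]                                  # optimal z in (2)
W_star = Q @ np.diag(z_star - lam) @ Q.T         # optimal slack W in (2)

mult = int(np.sum(np.isclose(lam, lam[0])))
rank_W = np.linalg.matrix_rank(W_star, tol=1e-8)
print(mult, n - rank_W)              # both equal 2, in agreement with (3)
```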

Now since x in (2) is arbitrary, problem (1) can be reformulated as

$$\begin{array}{@{}rcl@{}} \underset{\boldsymbol{x}\in\mathcal{X}}{\min} \quad \lambda_{1}(\boldsymbol{A}(\boldsymbol{x})) & = & \underset{\boldsymbol{x}\in\mathcal{X},z\in \mathbb{R}, \boldsymbol{W} \in \mathbb{S}^{n}_{+}}{\min} \quad z\\ && \qquad\text{s. t.} \quad z\boldsymbol{I} - \boldsymbol{W} = \boldsymbol{A}(\boldsymbol{x}). \end{array} $$
(4)

The key step in the proof of Theorem 1 below is the application of Theorem 2.2 in Pataki (1998), which provides an upper bound on the sum of the ranks of the matrix variables at faces of the feasible set of an SDP. Therefore we need to reformulate (4) into a problem to which the cited theorem applies. To this end, note that a scalar x j ≥ 0 can be seen as a symmetric, positive semi-definite matrix of order 1. This means that problem (4) can be written as

$$\begin{array}{@{}rcl@{}} &&\underset{x_{1}\in\mathbb{S}^{1}_{+},\ldots,x_{m}\in\mathbb{S}^{1}_{+}, z\in \mathbb{R}, \boldsymbol{W} \in \mathbb{S}^{n}_{+}}{\min} \quad z\\ && \qquad\qquad \text{s. t.} \quad \left\{\begin{array}{l} z\boldsymbol{I} - \boldsymbol{W} = \boldsymbol{A}(\boldsymbol{x}) \\ \boldsymbol{a}^{\mathsf{T}}\boldsymbol{x} = b. \end{array}\right. \end{array} $$
(5)

With this reformulation we can now prove the main theorem.

Theorem 1

Let x∗ be an extreme point of \(\mathcal {O}\) such that \({\sum }_{i=1}^{m} r(x_{i}^{\ast }) > n\), where r(x i ) = rank(x i ) ∈ {0, 1}. Then

$$\text{mult}\left(\lambda_{1}(\boldsymbol{A}(\boldsymbol{x}^{\ast}))\right) \geq n - \max\left\{w \in \mathbb{N}_{0} \;\middle|\; t(w) \leq t(n) - \sum\limits_{i=1}^{m} r(x_{i}^{\ast}),\; w \leq n-2\right\}. $$

Proof

Since x∗ is an extreme point of \(\mathcal {O}\), it follows from Lemma 4.2 in Pataki (1998) (omit V and take k = 1) that (x∗, z∗, W∗), where (z∗, W∗) solves (2) with x = x∗, is an extreme point of the feasible set of (5). Let w = rank(W∗). Problem (5) has q = 1 unconstrained scalar variable and t(n) + 1 equality constraints (counting a symmetric matrix constraint as t(n) scalar constraints), and an extreme point is a face of dimension zero, so Theorem 2.2 from Pataki (1998) yields

$$ \sum\limits_{i=1}^{m} r(x_{i}^{\ast}) + t(w) \leq t(n) + 1 - q \quad\Leftrightarrow\quad t(w) \leq t(n) - \sum\limits_{i=1}^{m} r(x_{i}^{\ast}). $$
(6)

To obtain a contradiction, assume that w = n − 1 [by (3), mult(λ 1(A(x∗))) = 1 in that case]. A straightforward calculation shows that

$$ t(n-1) = t(n)-n > t(n)-\sum\limits_{i=1}^{m} r(x_{i}^{\ast}), $$
(7)

since by assumption \({\sum }_{i=1}^{m} r(x_{i}^{\ast })>n\). But now (7) contradicts (6), and it follows that w < n − 1; hence w ≤ n − 2. Substituting this result, together with (6), into (3) then yields

$$\text{mult}\left(\lambda_{1}(\boldsymbol{A}(\boldsymbol{x}^{\ast}))\right) = n - w \geq n - \max\left\{w \in \mathbb{N}_{0} \;\middle|\; t(w) \leq t(n) - \sum\limits_{i=1}^{m} r(x_{i}^{\ast}),\; w \leq n - 2\right\}. $$

□
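
As an illustration (not part of the proof), the maximum in Theorem 1 can be evaluated by a direct search over the admissible integers w. In the hypothetical instance below, n = 4 and the sum of the ranks equals 6, which gives the lower bound mult(λ 1(A(x∗))) ≥ 2.

```python
def t(k):
    """k-th triangular number, t(k) = k(k+1)/2."""
    return k * (k + 1) // 2

def mult_lower_bound(n, R):
    """Lower bound on mult(lambda_1(A(x*))) from Theorem 1,
    where R = sum_i r(x_i*) is assumed to satisfy n < R <= t(n)."""
    w_max = max(w for w in range(0, n - 1)   # search over 0 <= w <= n - 2
                if t(w) <= t(n) - R)
    return n - w_max

# Hypothetical instance: a 4x4 matrix A(x) and six non-zero design variables.
print(mult_lower_bound(n=4, R=6))   # prints 2
```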

Remark 1

If x ≥ 0, y ≥ 0 and x + y = u, then 0 ≤ x ≤ u and 0 ≤ y ≤ u. Based on this observation we could account for an upper bound u i on each of the x i's in (5) by introducing additional variables \(y_{i}\in \mathbb {S}_{+}^{1}\), i = 1,…, m, and m linear equality constraints x i + y i = u i. Inequality (6) in the proof above then becomes

$$t(w) \leq t(n) + m - \sum\limits_{i=1}^{m}\left[r\left( x_{i}^{\ast}\right) + r\left( y_{i}^{\ast}\right)\right]. $$

It holds that \(r(x_{i}^{\ast }) + r(y_{i}^{\ast }) = 2\) if \(0<x_{i}^{\ast }<u_{i}\), and 1 otherwise. Following the same arguments as in the proof, one is thus led to a version of Theorem 1 where \({\sum }_{i=1}^{m} r(x_{i}^{*}) > n\) is replaced by

$$\left|\left\{\,i\in \{1,\ldots,m\} \;|\; 0 < x_{i}^{\ast} < u_{i} \,\right\}\right| > n, $$

where |⋅| denotes the cardinality of a set. In terms of topology optimization this means that the number of "grey" elements in the solution should exceed the order of the matrix. This seems overly restrictive, since researchers report coalescing of eigenvalues despite using penalization methods that lead to solutions where almost all variables attain their upper or lower bound (Holmberg et al. 2015).

Remark 2

It is straightforward to modify the theorem to account for additional linear equality constraints – adding s such constraints adds a term "+ s" to the right-hand side of (6). Linear inequality constraints may be treated by introducing (non-negative) slack variables. Adding one linear inequality constraint and a slack variable s then implies that a term "r(s)" should be added to the left-hand side of (6) and "+ 1" to the right-hand side.

4 The non-linear case

Consider the optimization problem

$$ \underset{\boldsymbol{x}\in\mathcal{X}}{\min} \quad \lambda_{1}(\boldsymbol{A}(\boldsymbol{x})), $$
(8)

where x ↦ A(x) is a continuously differentiable, non-linear function. Lemma 1 below shows that any solution x∗ to this problem is also a solution to a problem of type (1). Therefore it makes sense to assume that x∗ is an extreme point in the optimal solution set of the latter problem. This is done in Theorem 2, the proof of which is then a direct application of Theorem 1.

Lemma 1

Let x∗ be an optimal solution to (8). Then x∗ solves

$$ \underset{\boldsymbol{y}\in \mathcal{X}}{\min} \quad \lambda_{1}\left( \boldsymbol{A}^{\ast}+\sum\limits_{i=1}^{m}(y_{i}-x_{i}^{\ast})\boldsymbol{A}_{i}^{\ast}\right), $$
(9)

where A = A(x ), and \(\boldsymbol {A}_{i}^{\ast } = \frac {\partial \boldsymbol {A}}{\partial x_{i}}(\boldsymbol {x}^{\ast })\).

Proof

We begin by noting that \(\mathcal {X}\) satisfies Slater’s constraint qualification (Andréasson et al. 2005, Def. 5.38), ensuring the necessity of the optimality conditions that follow.

Introduce the Lagrangian

$$L(\boldsymbol{x};\mu,\boldsymbol{\gamma}) = \lambda_{1}(\boldsymbol{A}(\boldsymbol{x})) + \mu(\boldsymbol{a}^{\mathsf{T}}\boldsymbol{x} - b) - \boldsymbol{\gamma}^{\mathsf{T}}\boldsymbol{x}. $$

The necessary first-order optimality conditions for a point \(\boldsymbol {x}^{\ast }\in \mathcal {X}\) in (8) are (Clarke 1983, Theorem 6.1.1) the existence of (μ, γ) ≠ 0, with γ ≥ 0, such that

$$\begin{array}{@{}rcl@{}} &&\boldsymbol{0} \in \partial L(\boldsymbol{x}^{\ast};\mu,\boldsymbol{\gamma})\\ &&\qquad\quad \gamma_{i}x_{i}^{\ast} = 0, \quad i=1,\ldots,m, \end{array} $$
(10)

where ∂L(x∗; μ, γ) = ∂λ 1(A∗) + μ a − γ is Clarke's generalized gradient of L. Let p = mult(λ 1(A∗)). Theorem 3.9 in Overton and Womersley (1993) provides an expression for the generalized gradient ∂λ 1(A∗), with which the optimality conditions become the existence of (μ, γ) ≠ 0, with γ ≥ 0, and \(\boldsymbol {U}\in \mathbb {S}^{p}_{+}\) such that

$$ \begin{array}{ll} 0 = \text{tr}\left( \boldsymbol{U}\boldsymbol{Q}_{1}^{\mathsf{T}}\boldsymbol{A}_{i}^{\ast}\boldsymbol{Q}_{1}\right) + \mu a_{i} - \gamma_{i}, & \quad i = 1,\ldots,m \\ \gamma_{i}x_{i}^{\ast} = 0, & \quad i=1,\ldots,m \\ \text{tr}(\boldsymbol{U}) = 1, & \end{array} $$
(11)

where the columns of \(\boldsymbol {Q}_{1} \in \mathbb {R}^{n \times p}\) are eigenvectors associated with λ 1(A∗).

Problem (9) is convex, so the generalized gradient of the associated Lagrangian coincides with the subdifferential (Clarke 1983, Proposition 2.2.7). The necessary and sufficient first-order optimality conditions for (9) then follow from Theorem 28.3 in Rockafellar (1972), and are the existence of (μ, γ) ≠ 0, with γ ≥ 0, and \(\boldsymbol {U}\in \mathbb {S}^{p}_{+}\) such that

$$ \begin{array}{ll} 0 = \text{tr}\left( \boldsymbol{U}\boldsymbol{Q}_{1}^{\mathsf{T}}\boldsymbol{A}_{i}^{\ast}\boldsymbol{Q}_{1}\right) + \mu a_{i} - \gamma_{i}, & \quad i = 1,\ldots,m \\ \gamma_{i}y_{i} = 0, & \quad i=1,\ldots,m \\ \text{tr}(\boldsymbol{U}) = 1. & \end{array} $$
(12)

But these conditions coincide with conditions (11). This shows that y = x∗ satisfies the optimality conditions (12) of problem (9). Since these are sufficient for optimality, it follows that x∗ solves (9). □

Theorem 2

Assume that \({\sum }_{i=1}^{m} r(x_{i}^{\ast }) > n\) and that x∗ is an extreme point in the set of optimal solutions to problem (9). Then

$$\text{mult}\left(\lambda_{1}(\boldsymbol{A}^{\ast})\right) \geq n - \max\left\{w \in \mathbb{N}_{0} \;\middle|\; t(w) \leq t(n) - \sum\limits_{i=1}^{m} r(x_{i}^{\ast}),\; w \leq n - 2\right\}. $$

Proof

Since problem (9) is of the form (1) with optimal value λ 1(A ) the assertion follows from Theorem 1. □

5 Discussion

This paper has presented sufficient conditions under which the maximum eigenvalue at an optimal solution to a class of problems occurring in structural optimization has multiplicity greater than one. The conditions are given in Theorems 1 and 2, whose proofs are based on results obtained by Pataki (1998).

It is common in structural optimization to require that the design variables satisfy x j ≥ ε for some small number ε > 0 to avoid singularity of the stiffness matrix (Christensen and Klarbring 2009). A simple change of variables x i = y i + ε results in a problem with zero as the lower bound on the design variables, to which the presented results apply.

In Holmberg et al. (2015) the author and co-workers considered minimization of the worst-case compliance for a structure subject to self-weight loadings. The problem was formulated as a non-linear SDP, which is likely the best way to treat it numerically, but it is not difficult to show that it is equivalent to

$$\min_{\boldsymbol{x}\in\mathcal{H}} \lambda_{1}(\boldsymbol{H}(\boldsymbol{x})), $$

where \(\mathcal {H}\) is similar to \(\mathcal {X}\), but includes upper bounds on x, and \(\boldsymbol {H}(\boldsymbol {x}) = \boldsymbol {B}(\boldsymbol {x})\boldsymbol {K}(\boldsymbol {x})^{-1}\boldsymbol {B}(\boldsymbol {x})^{\mathsf {T}}\in \mathbb {S}^{s}\), in which K(x) is the global stiffness matrix, B(x) is a scaling matrix and s ∈ {2, 3} is the number of spatial dimensions. For this problem, n in Theorem 2 equals s – a very small number – so the inequality \({\sum }_{i=1}^{m}r(x_{i}^{\ast }) > n\) will, in practice, always hold. Numerical experience (Holmberg et al. 2015) strongly corroborates the suggestion of Theorem 2 that the maximum eigenvalue at an optimal solution will have multiplicity greater than one.
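
To illustrate how small the reduced eigenvalue problem is, the sketch below forms H = BK⁻¹Bᵀ and its largest eigenvalue; the randomly generated positive definite K and random B are hypothetical stand-ins for the assembled stiffness and scaling matrices at a fixed design, not the actual data of Holmberg et al. (2015).

```python
import numpy as np

# Hypothetical stand-ins for K(x) (stiffness) and B(x) (scaling) at a fixed design x.
s, ndof = 3, 200
rng = np.random.default_rng(1)
M = rng.standard_normal((ndof, ndof))
K = M @ M.T + ndof * np.eye(ndof)     # symmetric positive definite
B = rng.standard_normal((s, ndof))

# H(x) = B K^{-1} B^T is only s x s, so evaluating lambda_1(H) is trivial
# once K has been factorized (here: solved directly).
H = B @ np.linalg.solve(K, B.T)
H = 0.5 * (H + H.T)                   # symmetrize against round-off
lam1 = np.linalg.eigvalsh(H)[-1]      # the worst-case (largest) eigenvalue
print(H.shape, lam1)
```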

6 Future work

There are at least three items that should be addressed in future work:

  (i)

    Are numerical solutions to structural optimization problems typically extreme points in the optimal solution set? Is it possible to modify the results in this paper so that they apply also to non-extremal solutions?

  (ii)

    Problems involving generalized eigenvalues fit into the present framework under the assumption that they can be recast as ordinary eigenvalue problems. This hinges on the non-singularity of at least one of the involved matrices. That assumption is satisfied in the bulk of the relevant literature, but important exceptions exist (Achtziger and Kočvara 2007). Therefore a direct treatment of generalized eigenvalues would be desirable.

  (iii)

    The results given in this paper should be extended to account for upper bounds on the variables x i . This is needed to analyze, for instance, three-dimensional topology optimization problems for continuum bodies. The issue was discussed in the remark following Theorem 1 above, but the obtained condition seems overly restrictive.