1 Introduction

The problem of packing spheres has been the subject of intense theoretical and empirical research. In particular, many works have tackled the problem with optimization tools. See, for example, [7, 9, 17–19, 23, 37, 50–53] and the references therein. On the other hand, the problem of packing ellipsoids has received more attention only in the past few years. This problem appears in a large number of practical applications, such as the design of high-density ceramic materials, the formation and growth of crystals [21, 46], the structure of liquids, crystals and glasses [4], the flow and compression of granular materials [25, 33, 34], the thermodynamics of liquid to crystal transition [1, 20, 45], and, in biological sciences, in the chromosome organization in human cell nuclei [54].

The density of three-dimensional ellipsoid packings is analysed in [24]. On the one hand, physical experiments with up to 30,000 spheres and 23,000 M&M’s Milk Chocolate Candies (registered trademark of Mars, Inc.) are performed. On the other hand, a computer-aided simulation technique that generalises the Lubachevsky–Stillinger compression algorithm (LSA) [41] is proposed and applied to instances with up to 1000 ellipsoids. However, since the main subject of the work is to analyse the density of “jammed disordered packings”, the computer-aided simulations do not confine the ellipsoids to a compact container but to a box with periodic boundary conditions, and optimization procedures are not employed. In the same line of research, a computer-aided simulation technique that generalises the “force-biased” algorithm [35] is considered in [6] to statistically explore the geometrical structure of random packings of spheroids. Simulations with up to 100,000 spheroids within a box with periodic boundaries are considered. The collision of moving ellipsoids, which finds applications in simulations as well as in robotics to approximately model the collision of free-form objects, is studied in [22].

A problem related to the chromosome organization in the human cell nucleus and that falls between ellipsoid packing and covering is considered in [54]. The problem consists in minimizing a measure of the total overlap of a given set of ellipsoids arranged within a given ellipsoidal container. A hard-to-solve bilevel programming model in which the lower-level problem is a semi-definite programming problem is proposed. To the best of our knowledge, only five very recent works in the literature exploit mathematical programming formulations and optimization to deal with the problem of packing ellipses or ellipsoids within rectangular containers. The problem of packing the maximum number of identical ellipses within a rectangle, with the ellipses’ axes parallel to the sides of the rectangle, was approached in [27]. A set T of points is arbitrarily defined (by a grid) and the centers of the ellipses are restricted to coincide with a point in T. In this way, a mixed integer linear programming model is developed and a heuristic method is proposed. Small instances are also solved with an exact commercial solver. Numerical experiments with up to 69 ellipses are presented. The problem of minimizing the area of a rectangular container that can fit a given set of (not necessarily identical) freely-rotated ellipses is tackled in [38]. A nonlinear programming model that addresses the non-overlapping between the ellipses by defining separating lines is proposed. With an eye toward finding global minimizers of the proposed models, symmetry-breaking constraints are introduced and an alternative mixed integer nonlinear programming model is devised. The analysis of the numerical experiments presented in [38] allows us to conclude that the state-of-the-art global optimization solvers BARON [48], LindoGlobal [39], and GloMIQO [43], available within the GAMS platform, were unable to find global solutions for instances with more than 4 ellipses (when restricted to a maximum of 5 h of CPU time).
For larger instances, a heuristic method is proposed in [38] and numerical experiments with up to 100 ellipses are presented. The problem of placing a given set of ellipses within a rectangular container of minimal area is considered in [49]. Nonlinear programming models are proposed by considering “quasi-phi-functions”, which are an extension of the phi-functions extensively used in the literature to model a large variety of packing problems (see, for example, [50–53] and the references therein). Using ad hoc initial guesses, instances with up to 120 ellipses are tackled by a multi-start strategy combined with a local nonlinear programming solver. In [44], the methodology proposed in [49] is extended to deal with the problem of packing spheroids within a rectangular container of minimal volume and numerical experiments with up to 12 spheroids are presented. In [36], the methodology introduced in [38] for packing ellipses within rectangles of minimum area is extended to tackle the problem of packing ellipsoids within rectangular containers of minimum volume. The non-overlapping constraints are based on separating hyperplanes. Nonlinear programming models are proposed and tackled by global optimization methods. Instances with up to 100 ellipsoids are considered, but the state-of-the-art global optimization solvers available within GAMS are unable to find optimal solutions and only feasible points are reported (a gap smaller than \(10^{-4}\) is found for instances with only two ellipsoids).

The problem of packing ellipsoids within spheres, ellipsoids, and polyhedrons (in the n-dimensional space) is considered in the present work. Two different continuous and differentiable nonlinear programming models for the non-overlapping between the ellipsoids are proposed. When combined with fitting constraints, (continuous and differentiable) nonlinear programming models for a variety of ellipsoid packing problems are obtained. In particular, as illustrative examples of the capabilities of the proposed models, five different two- and three-dimensional problems are addressed: (a) packing the maximum number of identical ellipses with given semi-axis lengths within a rectangular container with given length and width; (b) finding the rectangular container with minimum area that can pack a given set of ellipses; (c) finding the elliptical container with minimum area that can pack a given set of identical ellipses; (d) finding the spherical container with minimum volume that can pack a given set of identical ellipsoids; and (e) finding the cuboid with minimum volume that can pack a given set of identical ellipsoids. In all cases, a simple multi-start strategy combined with a nonlinear programming solver is employed with the aim of finding good quality solutions. The models and methodologies presented in [38, 49] apply to problem (b). Numerical experiments will show that the models and methodologies introduced in the present work are able to find solutions equivalent to those obtained in [38] for small-sized instances and to improve almost all solutions for larger instances; they also improve almost all solutions obtained in [49] for the instances introduced in [38].
Moreover, while the models presented in [27, 38, 49] deal with two-dimensional problems and rectangular containers and the model presented in [44] deals with spheroids and rectangular containers, the models introduced in the present work deal with n-dimensional problems with arbitrary ellipsoids and ellipsoidal and polyhedral containers. The models introduced in [36] (which was published online after the submission of the present manuscript) tackle problem (e) above and have some similarities with one of the models presented in this work. However, in [36], global optimization techniques are employed and only feasible solutions are delivered for medium-sized instances. In the present work, on the other hand, multi-start strategies combined with local minimization solvers deliver good-quality local solutions.

This paper is organized as follows. In Sect. 2, we state the problem being considered in this work. In Sect. 3, we present a transformation that will be used to develop some of the models. In Sect. 4, we introduce two continuous and differentiable nonlinear models for avoiding overlapping between ellipsoids. In Sect. 5, we propose nonlinear models for including an ellipsoid inside an ellipsoid and inside a half-space. Some illustrative numerical experiments are presented in Sect. 6. We close the paper with some concluding remarks in Sect. 7. The computer implementation of the models and methods introduced in the present work, as well as the reported solutions, are freely available at http://www.ime.usp.br/~lobato/.

2 Problem statement

An ellipsoid in \({\mathbb {R}}^n\) is a set of the form \({\mathcal {E}}= \{x \in {\mathbb {R}}^n \mid (x-c)^{\top } M^{-1} (x-c) \le 1\}\), where \(M~\in ~{\mathbb {R}}^{n \times n}\) is a symmetric and positive definite matrix. The vector \(c \in {\mathbb {R}}^n\) is the center of the ellipsoid. The eigenvectors of \(M^{-1}\) determine the principal axes of the ellipsoid and the eigenvalues of \(M^{\frac{1}{2}}\) are the lengths of the semi-axes of the ellipsoid. If M is a positive multiple of the identity matrix, then the ellipsoid is a ball. More specifically, if \(M = r^2I_{n}\) for some \(r > 0\), then the ellipsoid is a ball with radius r. An ellipsoid in a two-dimensional space is also called an ellipse. We denote by \(\partial {\mathcal {E}}\) the frontier of \({\mathcal {E}}\), i.e., \(\partial {\mathcal {E}} = \{x \in {\mathbb {R}}^n \mid (x-c)^{\top }M^{-1}(x-c) = 1\}\). We denote by \(\text {int}({\mathcal {E}})\) the interior of \({\mathcal {E}}\), i.e., \(\text {int}({\mathcal {E}}) = \{x \in {\mathbb {R}}^n \mid (x-c)^{\top }M^{-1}(x-c) < 1\}\). We say that two ellipsoids overlap if there exists a point that belongs to the interior of both ellipsoids. We say that two ellipsoids touch each other if they do not overlap and there exists a point that belongs to the frontier of both ellipsoids.
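These definitions can be checked numerically. The following sketch (assuming numpy; the semi-axis lengths, rotation angle, and center are illustrative) builds a rotated ellipse, tests membership in it, and recovers the semi-axis lengths as the eigenvalues of M^(1/2).

```python
import numpy as np

# Illustrative ellipse: semi-axes a = 3 and b = 1, rotated by 30 degrees
# and centered at c. Then M = Q diag(a^2, b^2) Q^T is symmetric positive
# definite, and the eigenvalues of M^(1/2) are the semi-axis lengths.
a, b, theta = 3.0, 1.0, np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
M = Q @ np.diag([a**2, b**2]) @ Q.T
c = np.array([1.0, -2.0])

def in_ellipsoid(x, M, c):
    """Membership test (x - c)^T M^{-1} (x - c) <= 1."""
    d = x - c
    return d @ np.linalg.solve(M, d) <= 1.0

semi_axes = np.sqrt(np.linalg.eigvalsh(M))  # eigenvalues of M^(1/2)
print(np.sort(semi_axes))                   # approximately [1., 3.]
```

Points just inside the endpoint of the major semi-axis, such as `c + Q @ [2.99, 0]`, pass the membership test, while points just outside it fail.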

In this work, we deal with the problem of packing ellipsoids in the n-dimensional space. We can state this problem as follows. Given ellipsoids \({\mathcal {E}}_1, \dots , {\mathcal {E}}_m\) in \({\mathbb {R}}^n\) and a set \({\mathcal {C}}\subset {\mathbb {R}}^n\), that we call a container from now on, we want to find ellipsoids \(\bar{{\mathcal {E}}}_1, \dots , \bar{{\mathcal {E}}}_m\) such that

  1. \(\bar{{\mathcal {E}}}_i\) is obtained by rotating and translating ellipsoid \({\mathcal {E}}_i\) for all \(i \in \{1,\dots ,m\}\);

  2. \(\text {int}(\bar{{\mathcal {E}}}_i) \cap \text {int}(\bar{{\mathcal {E}}}_j) = \emptyset \) for all \(i,j \in \{1,\dots ,m\}\) with \(i \ne j\);

  3. \(\bar{{\mathcal {E}}}_i \subseteq {\mathcal {C}}\) for each \(i \in \{1,\dots ,m\}\).

The first constraint states that we can only rotate and translate the given ellipsoids. The second constraint says that the ellipsoids cannot overlap. The third constraint requires that each ellipsoid be inside the container. This is a feasibility problem whose variables are the center and angles of rotation of each ellipsoid. If the ellipsoids to be packed are all identical, then, by solving a sequence of feasibility problems with an increasing number of ellipsoids, we are able to tackle the optimization problem of packing as many ellipsoids as possible within a given container. In this work, we also consider the problem of, given a set of (not necessarily identical) ellipsoids, minimizing the volume of a container of a given shape.

3 Preliminaries

Consider a rotation matrix \(Q \in {\mathbb {R}}^{n \times n}\) and the transformation \(R: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) defined by \(R(x) = Qx + c\), where \(c \in {\mathbb {R}}^n\). In a two-dimensional space, we can represent a rotation matrix as

$$\begin{aligned} Q(\theta ) = \left( \begin{array}{cc} \cos \theta &{}\quad -\sin \theta \\ \sin \theta &{}\quad \quad \cos \theta \end{array} \right) , \end{aligned}$$
(1)

which rotates a point counterclockwise through an angle \(\theta \). In a three-dimensional space, we can represent a rotation matrix as

$$\begin{aligned} Q(\psi ,\theta ,\phi ) \!=\! \left( \begin{array}{ccc} \cos \theta \cos \psi &{}\quad \sin \phi \sin \theta \cos \psi -\cos \phi \sin \psi &{}\quad \sin \phi \sin \psi + \cos \phi \sin \theta \cos \psi \\ \cos \theta \sin \psi &{}\quad \cos \phi \cos \psi + \sin \phi \sin \theta \sin \psi &{}\quad \cos \phi \sin \theta \sin \psi -\sin \phi \cos \psi \\ -\sin \theta &{}\quad \sin \phi \cos \theta &{}\quad \cos \phi \cos \theta \end{array} \right) , \end{aligned}$$
(2)

which rotates a point through an angle \(\phi \) about the x-axis, through an angle \(\theta \) about the y-axis, and through an angle \(\psi \) about the z-axis. These rotations appear counterclockwise when the axis about which they occur points toward the observer. Consider the ellipsoid \({\mathcal {E}}= \{x \in {\mathbb {R}}^n \mid x^{\top } M^{-1} x \le 1\}\), where \(M \in {\mathbb {R}}^{n \times n}\) is a symmetric and positive definite matrix. After applying the transformation R to the elements of \({\mathcal {E}}\) (which is centered at the origin), we obtain the set

$$\begin{aligned} \bar{{\mathcal {E}}}= & {} \{x \in {\mathbb {R}}^n \mid x = R(z), z \in {\mathcal {E}}\}\\= & {} \{ x \in {\mathbb {R}}^n \mid (x - c)^{\top } QM^{-1}Q^{\top } (x-c) \le 1\}. \end{aligned}$$

The set \(\bar{{\mathcal {E}}}\) is an ellipsoid, since \(QM^{-1}Q^{\top }\) is symmetric and positive definite. In fact, the transformation R is an isometry, since it is a rotation followed by a translation.
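This computation can be verified numerically. The sketch below (assuming numpy; the matrix M, angle, and translation are illustrative) checks that, for any point z, evaluating the transformed quadratic form at R(z) reproduces the original quadratic form at z.

```python
import numpy as np

# Illustrative data: an ellipse M = diag(4, 1) centered at the origin,
# a rotation by theta = 0.7, and a translation by c.
rng = np.random.default_rng(0)
M = np.diag([4.0, 1.0])
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
c = np.array([3.0, -1.0])

Minv = np.linalg.inv(M)
A = Q @ Minv @ Q.T  # quadratic form Q M^{-1} Q^T of the transformed set

# For every z, R(z) = Qz + c satisfies (R(z)-c)^T A (R(z)-c) = z^T M^{-1} z,
# so z belongs to E exactly when R(z) belongs to the transformed ellipsoid.
for _ in range(100):
    z = rng.normal(size=2)
    x = Q @ z + c
    assert np.isclose((x - c) @ A @ (x - c), z @ Minv @ z)
```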

Now, consider the ellipsoids

$$\begin{aligned} {\mathcal {E}}_i= & {} \{ x \in {\mathbb {R}}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\},\nonumber \\ {\mathcal {E}}_j= & {} \{ x \in {\mathbb {R}}^n \mid (x - c_j)^{\top }Q_jP_j^{-1}Q_j^{\top } (x-c_j) \le 1\}, \end{aligned}$$
(3)

where \(P_i\) and \(P_j\) are positive definite diagonal matrices, and \(Q_i\) and \(Q_j\) are rotation matrices. Consider the linear transformation \(T_i: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) defined by

$$\begin{aligned} T_i(x) = P_i^{-\frac{1}{2}} Q_i^{\top }x. \end{aligned}$$
(4)

Let \({\mathcal {E}}_{ii}\) be the set obtained when the transformation \(T_i\) is applied to every element of \({\mathcal {E}}_i\), i.e.,

$$\begin{aligned} {\mathcal {E}}_{ii}= & {} \{ x \in {\mathbb {R}}^n \mid x = T_i(z), z \in {\mathcal {E}}_i\}\nonumber \\= & {} \left\{ x \in {\mathbb {R}}^n \mid \left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) ^{\top }\left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) \le 1\right\} . \end{aligned}$$
(5)

Note that \({\mathcal {E}}_{ii}\) is a ball with unitary radius centered at \(P_i^{-\frac{1}{2}}Q_i^{\top }c_i\). By applying the transformation \(T_i\) to the elements of \({\mathcal {E}}_j\), we obtain the set

$$\begin{aligned} {\mathcal {E}}_{ij}= & {} \{ x \in {\mathbb {R}}^n \mid x = T_i(z), z \in {\mathcal {E}}_j\}\nonumber \\= & {} \left\{ x \in {\mathbb {R}}^n \mid \left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) ^{\top } S_{ij}\left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) \le 1\right\} , \end{aligned}$$
(6)

where

$$\begin{aligned} S_{ij} = P_i^{\frac{1}{2}}Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}}. \end{aligned}$$
(7)

Observe that \(S_{ij}\) can be written as \(S_{ij} = V_{ij}^{\top }V_{ij}\), where \(V_{ij} = P_j^{-\frac{1}{2}}Q_j^{\top }Q_iP_i^{\frac{1}{2}}\). Then, \(S_{ij}\) is symmetric. Moreover, since \(V_{ij}\) is nonsingular with \(V_{ij}^{-1} = P_i^{-\frac{1}{2}}Q_i^{\top }Q_jP_j^{\frac{1}{2}}\), matrix \(S_{ij}\) is positive definite. Thus \({\mathcal {E}}_{ij}\) is an ellipsoid. Lemma 3.1 shows that the ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) overlap if and only if the ball \({\mathcal {E}}_{ii}\) and the ellipsoid \({\mathcal {E}}_{ij}\) overlap. This means that the problem of verifying whether two arbitrary ellipsoids overlap can be reduced to verifying whether a ball with unitary radius and an ellipsoid overlap.
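These properties can be sanity-checked numerically. The sketch below (assuming numpy; the two ellipses are illustrative) builds \(S_{ij}\) both as \(V_{ij}^{\top }V_{ij}\) and directly from (7), confirms it is symmetric positive definite, and verifies that applying \(T_i\) preserves the value of the quadratic form defining \({\mathcal {E}}_j\), which is the key identity behind Lemma 3.1.

```python
import numpy as np

def rot2(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Illustrative ellipses i and j (semi-axes 2, 1 and 3, 0.5).
Pi, Qi, ci = np.diag([4.0, 1.0]), rot2(0.3), np.array([0.0, 0.0])
Pj, Qj, cj = np.diag([9.0, 0.25]), rot2(-1.1), np.array([1.5, 0.5])

Pi_h = np.sqrt(Pi)                                 # P_i^{1/2} (diagonal)
V = np.linalg.inv(np.sqrt(Pj)) @ Qj.T @ Qi @ Pi_h  # V_ij
S = V.T @ V                                        # S_ij = V_ij^T V_ij
# agrees with the direct formula (7) and is symmetric positive definite
assert np.allclose(S, Pi_h @ Qi.T @ Qj @ np.linalg.inv(Pj) @ Qj.T @ Qi @ Pi_h)
assert np.allclose(S, S.T) and np.all(np.linalg.eigvalsh(S) > 0)

def T_i(x):  # transformation (4)
    return np.linalg.inv(Pi_h) @ Qi.T @ x

# The quadratic form of E_j at x equals the quadratic form of E_ij at T_i(x),
# so interior membership is preserved by T_i.
rng = np.random.default_rng(1)
for _ in range(100):
    x = rng.normal(size=2, scale=3.0)
    lhs = (x - cj) @ Qj @ np.linalg.inv(Pj) @ Qj.T @ (x - cj)
    y = T_i(x) - T_i(cj)
    assert np.isclose(lhs, y @ S @ y)
```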

Lemma 3.1

Consider the ellipsoids \({\mathcal {E}}_i, {\mathcal {E}}_j, {\mathcal {E}}_{ii}\) and \({\mathcal {E}}_{ij}\) defined in (3), (5) and (6). Then, the ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) overlap if and only if the ellipsoids \({\mathcal {E}}_{ii}\) and \({\mathcal {E}}_{ij}\) overlap.

Proof

For any \(x \in {\mathbb {R}}^n\), we have

$$\begin{aligned} (x -c_i)^{\top }Q_iP_i^{-1} Q_i^{\top }(x - c_i)= & {} (x -c_i)^{\top }Q_iP_i^{-\frac{1}{2}} P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_i)\\= & {} (x -c_i)^{\top }\left( P_i^{-\frac{1}{2}}Q_i^{\top }\right) ^{\top } P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_i) \\= & {} \left( P_i^{-\frac{1}{2}} Q_i^{\top }x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) ^{\top }\left( P_i^{-\frac{1}{2}} Q_i^{\top }x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) \\= & {} \left( T_i(x) - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) ^{\top }\left( T_i(x) - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) . \end{aligned}$$

Then, \(x \in \text {int}({\mathcal {E}}_i)\) if and only if \(T_i(x) \in \text {int}({\mathcal {E}}_{ii})\). Moreover, for any \(x \in {\mathbb {R}}^n\),

$$\begin{aligned}&(x - c_j)^{\top }Q_jP^{-1}_jQ_j^{\top }(x - c_j)\nonumber \\&\quad = (x - c_j)^{\top }Q_iP_i^{-\frac{1}{2}} P_i^{\frac{1}{2}}Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}} P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_j)\nonumber \\&\quad = (x - c_j)^{\top }Q_iP_i^{-\frac{1}{2}} S_{ij}P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_j)\nonumber \\&\quad = (x - c_j)^{\top }\left( P_i^{-\frac{1}{2}}Q_i^{\top }\right) ^{\top } S_{ij}P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_j)\nonumber \\&\quad = \left( P_i^{-\frac{1}{2}} Q_i^{\top }x - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) ^{\top } S_{ij}\left( P_i^{-\frac{1}{2}} Q_i^{\top }x - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) \nonumber \\&\quad = \left( T_i(x) - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) ^{\top } S_{ij}\left( T_i(x) - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) . \end{aligned}$$

Therefore, \(x \in \text {int}({\mathcal {E}}_j)\) if and only if \(T_i(x) \in \text {int}({\mathcal {E}}_{ij})\). Hence, \(\text {int}({\mathcal {E}}_i) \cap \text {int}({\mathcal {E}}_j) \ne \emptyset \) if and only if \(\text {int}({\mathcal {E}}_{ii}) \cap \text {int}({\mathcal {E}}_{ij}) \ne \emptyset \). In other words, the ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) overlap if and only if \({\mathcal {E}}_{ii}\) and \({\mathcal {E}}_{ij}\) overlap. \(\square \)

Figure 1 illustrates this transformation. Three ellipses are shown in Fig. 1a, where the ellipses \({\mathcal {E}}_1\) and \({\mathcal {E}}_2\) overlap. Figure 1b shows these ellipses after applying the transformation \(T_1\), which turns the ellipse \({\mathcal {E}}_1\) into a ball with unitary radius. Note that in Fig. 1b only the ellipses \({\mathcal {E}}_{11}\) and \({\mathcal {E}}_{12}\) overlap.

Fig. 1

a Three ellipses and an overlapping between ellipses \({\mathcal {E}}_1\) and \({\mathcal {E}}_2\). Ellipse \({\mathcal {E}}_3\) does not overlap the other ellipses. b The transformation that converts \({\mathcal {E}}_1\) into a ball is applied to each ellipse

4 Non-overlapping model

We shall present two nonlinear models for the non-overlapping constraints of ellipsoids. The first one is based on the transformation \(T_i\) introduced in Sect. 3 and the second model is based on separating hyperplanes.

4.1 Transformation based model

Consider a ball \({\mathcal {B}}\) with radius \(r > 0\) and an ellipsoid \({\mathcal {E}}\), both in \({\mathbb {R}}^n\). We know that \({\mathcal {B}}\) and \({\mathcal {E}}\) overlap if and only if the distance between the center of the ball \({\mathcal {B}}\) and the ellipsoid \({\mathcal {E}}\) is strictly less than r. Therefore, a necessary and sufficient condition for \({\mathcal {B}}\) and \({\mathcal {E}}\) not to overlap is that the distance between the center of the ball \({\mathcal {B}}\) and the ellipsoid \({\mathcal {E}}\) be greater than or equal to r.

Now, consider the ellipsoids

$$\begin{aligned} {\mathcal {E}}_i= & {} \{ x \in {\mathbb {R}}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\} {\text { and }}\\ {\mathcal {E}}_j= & {} \{ x \in {\mathbb {R}}^n \mid (x - c_j)^{\top }Q_jP_j^{-1}Q_j^{\top } (x-c_j) \le 1\}, \end{aligned}$$

where \(c_i, c_j \in {\mathbb {R}}^n\), \(Q_i, Q_j \in {\mathbb {R}}^{n \times n}\) are orthogonal matrices, and \(P_i, P_j \in {\mathbb {R}}^{n \times n}\) are diagonal and positive definite matrices. As seen in Sect. 3, when transformation \(T_i\) defined in (4) is applied to both ellipsoids, we obtain the ball

$$\begin{aligned} {\mathcal {E}}_{ii} = \left\{ x \in {\mathbb {R}}^n \mid \left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) ^{\top }\left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) \le 1\right\} \end{aligned}$$

with unitary radius and the ellipsoid

$$\begin{aligned} {\mathcal {E}}_{ij} =\left\{ x \in {\mathbb {R}}^n \mid \left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) ^{\top } S_{ij}\left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) \le 1\right\} , \end{aligned}$$

where \(S_{ij}\) is given by (7). In order to guarantee that \({\mathcal {E}}_{ii}\) and \({\mathcal {E}}_{ij}\) do not overlap, it is enough to require that the distance between the center \(c_{ii}\) of the ball \({\mathcal {E}}_{ii}\) and the ellipsoid \({\mathcal {E}}_{ij}\) be greater than or equal to one. Notice that, according to the discussion presented in Sect. 3, this is a necessary and sufficient condition for the ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) not to overlap. However, there is no known analytic expression for this distance. Thus, to find it, we can solve the problem of projecting \(c_{ii}\) onto \({\mathcal {E}}_{ij}\), which can be formulated as

$$\begin{aligned}&{\text {minimize}} \quad \left\| x - c_{ii} \right\| ^2\nonumber \\&{\text {subject to}} \quad x \in {\mathcal {E}}_{ij}. \end{aligned}$$
(8)

This is a convex quadratic programming problem whose optimal value is the squared distance between the center of the ball \({\mathcal {E}}_{ii}\) and the ellipsoid \({\mathcal {E}}_{ij}\). To find this distance more easily, we can represent the center of the ball \({\mathcal {E}}_{ii}\) as a function of ellipsoid \({\mathcal {E}}_{ij}\) in a convenient way detailed hereafter.

With a simple change of variables, we can rewrite problem (8) as the problem

$$\begin{aligned} \begin{array}{ll} &{}{\text {minimize}} \quad \left\| x - (c_{ii} - P_i^{-\frac{1}{2}}Q_i^{\top }c_j) \right\| ^2\\ &{}{\text {subject to}} \quad x^{\top }S_{ij}x \le 1. \end{array} \end{aligned}$$
(9)

Let \(\bar{{\mathcal {E}}}_{ij}\) be the ellipsoid determined by matrix \(S_{ij}\) and centered at the origin, i.e.,

$$\begin{aligned} \bar{{\mathcal {E}}}_{ij} = \left\{ x \in {\mathbb {R}}^n \mid x^{\top } S_{ij}x \le 1\right\} . \end{aligned}$$

Problem (9) is the problem of projecting the point \(c_{ii} - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\) onto ellipsoid \(\bar{{\mathcal {E}}}_{ij}\). Suppose that \(c_{ii} \notin \text {int}({\mathcal {E}}_{ij})\). Equivalently, we have \(c_{ii} - P_i^{-\frac{1}{2}}Q_i^{\top }c_j \notin \text {int}(\bar{{\mathcal {E}}}_{ij})\). Therefore, by Proposition 4.1 below, problem (9) has a unique solution \(x_{ij} \in {\mathbb {R}}^n\). Moreover, its solution belongs to the frontier of ellipsoid \(\bar{{\mathcal {E}}}_{ij}\), namely, \(x_{ij}^{\top }S_{ij}x_{ij} = 1\), and there exists a unique \(\mu _{ij} \in {\mathbb {R}}_+\) such that

$$\begin{aligned} c_{ii} - P_i^{-\frac{1}{2}}Q_i^{\top }c_j = x_{ij} + \mu _{ij} S_{ij}x_{ij}. \end{aligned}$$

Thus, as long as \(c_{ii} \notin \text {int}({\mathcal {E}}_{ij})\), \(c_{ii}\) is uniquely represented as a function of a point in the frontier of \(\bar{{\mathcal {E}}}_{ij}\) and a non-negative scalar. In this case, the distance between the center \(c_{ii}\) of the ball \({\mathcal {E}}_{ii}\) and the ellipsoid \({\mathcal {E}}_{ij}\) is given by

$$\begin{aligned} \left\| x_{ij} - \left( c_{ii} - P_i^{-\frac{1}{2}}Q_i^{\top }c_j\right) \right\| = \mu _{ij}\left\| S_{ij}x_{ij} \right\| . \end{aligned}$$

On the other hand, by Proposition 4.2 below, any point of the form \(y = x + \mu S_{ij} x\) with \(x^{\top }S_{ij}x = 1\) and \(\mu > 0\) is such that \(y^{\top }S_{ij}y > 1\), i.e., it is a point that does not belong to the ellipsoid \(\bar{{\mathcal {E}}}_{ij}\). If \(\mu = 0\), then \(y=x\) and, therefore, y is a point on the frontier of ellipsoid \(\bar{{\mathcal {E}}}_{ij}\). Thus, any point of the form \(y = x + \mu S_{ij} x\) such that \(x^{\top }S_{ij}x = 1\) and \(\mu \in {\mathbb {R}}_+\) does not belong to the interior of ellipsoid \(\bar{{\mathcal {E}}}_{ij}\).

If \(c_{ii}\) lies in the interior of \({\mathcal {E}}_{ij}\), then the distance from \(c_{ii}\) to the ellipsoid \({\mathcal {E}}_{ij}\) is zero. So, for the ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) not to overlap, \(c_{ii}\) must be outside the interior of \({\mathcal {E}}_{ij}\). Therefore, we can represent the center of the ball \({\mathcal {E}}_{ii}\) as a function of a point \(x_{ij}\) in the frontier of ellipsoid \(\bar{{\mathcal {E}}}_{ij}\) and a nonnegative number \(\mu _{ij}\) without loss of generality. Using this representation, the distance from the center of the ball \({\mathcal {E}}_{ii}\) to the ellipsoid \({\mathcal {E}}_{ij}\) is given by \(\mu _{ij}\left\| S_{ij}x_{ij} \right\| \).
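The projection and the resulting distance can be computed numerically. The sketch below (assuming numpy; the ellipsoid and the point are illustrative) solves \(y = x + \mu M x\) with \(x^{\top }Mx = 1\) by bisection on \(\mu \ge 0\), following the parameterization above, and then recovers the distance as \(\mu \left\| Mx \right\| \).

```python
import numpy as np

def project_onto_ellipsoid(M, y, iters=100):
    """Project y, assumed outside {x : x^T M x <= 1}, onto the ellipsoid
    by solving y = x + mu*M*x with x^T M x = 1, via bisection on mu >= 0.
    Note that x(mu) = (I + mu*M)^{-1} y shrinks monotonically as mu grows."""
    n = len(y)
    def g(mu):
        x = np.linalg.solve(np.eye(n) + mu * M, y)
        return x @ M @ x - 1.0
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:  # enlarge the bracket until x(hi) is inside
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
    mu = 0.5 * (lo + hi)
    return np.linalg.solve(np.eye(n) + mu * M, y), mu

# Ellipse with semi-axes 2 and 1 (M = diag(1/4, 1)); project y = (4, 0).
M = np.diag([0.25, 1.0])
y = np.array([4.0, 0.0])
x, mu = project_onto_ellipsoid(M, y)
# Here the projection is (2, 0), mu = 4, and mu*||Mx|| = ||y - x|| = 2.
assert np.allclose(x, [2.0, 0.0]) and np.isclose(mu, 4.0)
assert np.isclose(mu * np.linalg.norm(M @ x), np.linalg.norm(y - x))
```

In this axis-aligned example the answer is known in closed form, which makes the consistency of the parameterization easy to confirm.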

Proposition 4.1

Let \({\mathcal {E}}= \{x \in {\mathbb {R}}^n \mid x^{\top }Mx \le 1\}\), where \(M \in {\mathbb {R}}^{n \times n}\) is a positive definite matrix. Thus, for each \(y \in {\mathbb {R}}^n \setminus \text {int}({\mathcal {E}})\), there exist unique \(x^* \in {\mathbb {R}}^n\) and \(\mu ^* \in {\mathbb {R}}\) such that \(y = x^* + \mu ^* M x^*\) and \(x^*\) is the projection of y onto \({\mathcal {E}}\). Moreover, \(x^* \in \partial {\mathcal {E}}\) and \(\mu ^* \in {\mathbb {R}}_+\).

Proof

Let \(y \in {\mathbb {R}}^n\) be such that \(y \notin \text {int}({\mathcal {E}})\). The problem of projecting y onto the set \({\mathcal {E}}\) can be formulated as the problem

$$\begin{aligned} \begin{array}{ll} {\text {minimize}} &{}\quad \left\| x - y \right\| ^2\\ {\text {subject to}} &{}\quad x^{\top }Mx \le 1. \end{array} \end{aligned}$$

Since \({\mathcal {E}}\) is convex, this problem has a unique solution \(x^*\) (see, for example, Proposition 2.1.3 in [5]). The Lagrangian function associated with the above problem is

$$\begin{aligned} {\mathcal {L}}(x,\mu ) = \left\| x-y \right\| ^2 + \mu (x^{\top }Mx - 1), \end{aligned}$$

whose gradient with respect to x is

$$\begin{aligned} \nabla _x {\mathcal {L}}(x,\mu ) = 2(x-y) + 2 \mu Mx. \end{aligned}$$

Since the function that defines the inequality constraint is convex and the null vector strictly satisfies this constraint, this problem fulfills the Slater constraint qualification (see, for example, Proposition 3.3.9 in [5]). So, according to the Karush–Kuhn–Tucker first-order necessary conditions (see, for example, Proposition 3.3.1 in [5]), there exists a unique \(\mu ^* \in {\mathbb {R}}\) such that

$$\begin{aligned} \nabla _x \mathcal {L}(x^*,\mu ^*)= & {} 0 \end{aligned}$$
(10)
$$\begin{aligned} \mu ^* ({x^*}^{\top }Mx^* - 1)= & {} 0 \end{aligned}$$
(11)
$$\begin{aligned} \mu ^*\ge & {} 0. \end{aligned}$$
(12)

Therefore, by condition (10), we have that \(y = x^* + \mu ^* Mx^*\). If \(y \in \partial {\mathcal {E}}\), then we must have \(x^* = y\) and \(\mu ^* = 0\). On the other hand, if \(y \notin {\mathcal {E}}\), then we must have \(\mu ^* \ne 0\). So, by condition (12), we must have \(\mu ^* > 0\). Consequently, condition (11) implies \({x^*}^{\top }Mx^* = 1\), i.e., \(x^* \in \partial {\mathcal {E}}\). \(\square \)

Proposition 4.2

Let \({\mathcal {E}}= \{x \in {\mathbb {R}}^n \mid x^{\top }Mx \le 1\}\), where \(M \in {\mathbb {R}}^{n \times n}\) is a positive definite matrix. Thus, for each \(x \in \partial {\mathcal {E}}\) and \(\mu > 0\), we have \((x + \mu M x)^{\top }M(x + \mu Mx) > 1\).

Proof

Let \(x \in \partial {\mathcal {E}}\) and \(\mu > 0\). Thus, \(x^{\top }Mx = 1\) and

$$\begin{aligned} (x + \mu M x)^{\top }M(x + \mu Mx)= & {} x^{\top }Mx + 2 \mu (Mx)^{\top }Mx + \mu ^2 (Mx)^{\top }M(Mx)\\= & {} 1 + 2 \mu \left\| Mx \right\| ^2 + \mu ^2 (Mx)^{\top }M(Mx) > 1, \end{aligned}$$

where the inequality follows from the fact that \(Mx \ne 0\) and M is positive definite. \(\square \)
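Proposition 4.2 can be sanity-checked numerically; the following minimal sketch (assuming numpy, with an arbitrary positive definite M) evaluates the inequality for several values of \(\mu > 0\).

```python
import numpy as np

M = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # symmetric positive definite
x = np.array([1.0, 1.0])
x = x / np.sqrt(x @ M @ x)  # rescale so that x^T M x = 1 (x on the frontier)
for mu in [1e-6, 0.1, 1.0, 10.0]:
    y = x + mu * M @ x
    # y^T M y = 1 + 2*mu*||Mx||^2 + mu^2 (Mx)^T M (Mx) > 1
    assert y @ M @ y > 1.0
```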

Based on this representation, we shall develop a model for the non-overlapping of ellipsoids in \({\mathbb {R}}^n\). Let \(I = \{1,\dots ,m\}\) be the set of indices of the ellipsoids. For each \(i \in I\), a positive definite diagonal matrix \(P_i^{\frac{1}{2}} \in {\mathbb {R}}^{n \times n}\) is given, whose entries are the lengths of the semi-principal axes of ellipsoid i. In order to guarantee that no two of the m ellipsoids overlap, we ensure that ellipsoids i and j do not overlap for each \(i,j \in I\) such that \(i < j\).

For each \(i \in I\), the decision variable \(c_i \in {\mathbb {R}}^n\) will represent the center of ellipsoid i and \(Q_i \in {\mathbb {R}}^{n \times n}\) will represent a rotation matrix for ellipsoid i.

For each \(i,j \in I\) such that \(i < j\), the decision variable \(x_{ij} \in {\mathbb {R}}^n\) will represent a point in the frontier of ellipsoid \(\bar{{\mathcal {E}}}_{ij}\) and \(\mu _{ij} \in {\mathbb {R}}\) will be a nonnegative variable. Let \(i,j \in I\) be such that \(i < j\). Since \(x_{ij}\) will be a point in the frontier of ellipsoid \(\bar{{\mathcal {E}}}_{ij}\), we must have \(x_{ij}^{\top }S_{ij}x_{ij} = 1\). Since \(\mu _{ij}\) must be nonnegative, we must have the constraint \(\mu _{ij} \ge 0\). Moreover, since the distance between the center of ball \({\mathcal {E}}_{ii}\) and the ellipsoid \({\mathcal {E}}_{ij}\) must be greater than or equal to one, we must have \(\mu _{ij}\left\| S_{ij}x_{ij} \right\| \ge 1\) or, equivalently, \(\mu _{ij}^2 \left\| S_{ij}x_{ij} \right\| ^2 \ge 1\). According to the adopted representation, the center \(c_{ii}\) of the ball \({\mathcal {E}}_{ii}\) as a function of ellipsoid \({\mathcal {E}}_{ij}\) is given by

$$\begin{aligned} c_{ii} = x_{ij} + \mu _{ij}S_{ij}x_{ij} + P_i^{-\frac{1}{2}}Q_i^{\top }c_j. \end{aligned}$$

Finally, for each \(i \in I \setminus \{m\}\), the center of ball \({\mathcal {E}}_{ii}\) is \(P_i^{-\frac{1}{2}}Q_i^{\top }c_i\). So, we obtain the following model:

$$\begin{aligned}&x_{ij}^{\top }S_{ij}x_{ij} = 1, \qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(13)
$$\begin{aligned}&\mu _{ij}^2 \left\| S_{ij}x_{ij} \right\| ^2 \ge 1, \qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(14)
$$\begin{aligned}&\mu _{ij} \ge 0, \qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(15)
$$\begin{aligned}&P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j) = x_{ij} + \mu _{ij}S_{ij}x_{ij}, \qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(16)
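As a sanity check of (13)–(16), consider the special case of two unit disks, for which \(P_i = P_j = I\) and hence \(S_{ij} = I\): the model then reduces to the familiar condition that the centers be at least two radii apart. A minimal sketch (assuming numpy; the rotations and centers are illustrative):

```python
import numpy as np

def rot2(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Two unit disks: P_i = P_j = I, arbitrary rotations, illustrative centers.
Qi, Qj = rot2(0.3), rot2(-1.2)
ci, cj = np.array([0.0, 0.0]), np.array([2.5, 0.5])

S = Qi.T @ Qj @ Qj.T @ Qi  # (7) with P_i = P_j = I collapses to the identity
assert np.allclose(S, np.eye(2))

d = Qi.T @ (ci - cj)        # left-hand side of (16) with P_i = I
x = d / np.linalg.norm(d)   # frontier point: satisfies (13), x^T S x = 1
mu = np.linalg.norm(d) - 1.0  # then d = x + mu*S*x, i.e. (16) holds

assert np.isclose(x @ S @ x, 1.0)               # (13)
assert mu >= 0.0                                # (15)
assert np.allclose(d, x + mu * S @ x)           # (16)
assert mu**2 * np.linalg.norm(S @ x)**2 >= 1.0  # (14): the disks do not overlap
```

Since \(\left\| c_i - c_j \right\| = \sqrt{6.5} \ge 2\) in this configuration, all four constraint groups are satisfied, certifying non-overlap.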

Model (13)–(16) can be somewhat simplified. Notice that any solution to the system (13)–(16) must strictly satisfy inequalities (15). In other words, any solution must be such that \(\mu _{ij} > 0\) for all \(i,j \in I\) such that \(i < j\). This is a consequence of constraints (14). Firstly, we present Proposition 4.3, which offers a strictly positive lower bound for the value of \(\mu _{ij}\). Lemma 4.1 is used in the proof of Proposition 4.3 and provides an upper bound on the norm of the vector \(\left\| S_{ij}x_{ij} \right\| \) that depends only on the lengths of the semi-principal axes of ellipsoids i and j.

Lemma 4.1

Let \(x_{ij} \in {\mathbb {R}}^n\) and \(S_{ij} = P_i^{\frac{1}{2}}Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}} \in {\mathbb {R}}^{n \times n}\), where \(Q_i\) and \(Q_j\) are orthogonal matrices and \(P_i\) and \(P_j\) are positive definite diagonal matrices. Suppose that \(x_{ij}^{\top } S_{ij} x_{ij}=1\). Thus,

$$\begin{aligned} \left\| S_{ij}x_{ij} \right\| \le \lambda _{\max }(P_i) \lambda _{\max }\left( P_j^{-1}\right) \lambda _{\max }\left( P_j^{\frac{1}{2}}\right) \lambda _{\max }\left( P_i^{-\frac{1}{2}}\right) . \end{aligned}$$

Proof

We have

$$\begin{aligned} \left\| S_{ij}x_{ij} \right\|= & {} \left\| P_i^{\frac{1}{2}}Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}}x_{ij} \right\| \le \lambda _{\max }\left( P_i^{\frac{1}{2}}\right) \left\| Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}}x_{ij} \right\| \\= & {} \lambda _{\max }\left( P_i^{\frac{1}{2}}\right) \left\| P^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}}x_{ij} \right\| \le \lambda _{\max }\left( P_i^{\frac{1}{2}}\right) \lambda _{\max }\left( P_j^{-1}\right) \left\| Q_j^{\top }Q_iP_i^{\frac{1}{2}}x_{ij} \right\| \\= & {} \lambda _{\max }\left( P_i^{\frac{1}{2}}\right) \lambda _{\max }\left( P_j^{-1}\right) \left\| P_i^{\frac{1}{2}}x_{ij} \right\| \\\le & {} \lambda _{\max }\left( P_i^{\frac{1}{2}}\right) \lambda _{\max }\left( P_j^{-1}\right) \lambda _{\max }\left( P_i^{\frac{1}{2}}\right) \left\| x_{ij} \right\| \\= & {} \lambda _{\max }(P_i) \lambda _{\max }\left( P_j^{-1}\right) \left\| x_{ij} \right\| , \end{aligned}$$

where the second and third equalities hold since \(Q_i\) and \(Q_j\) are orthogonal matrices, and the inequalities and the last equality follow from the fact that \(P_i^{\frac{1}{2}}\) and \(P_j^{-1}\) are positive definite diagonal matrices. Therefore,

$$\begin{aligned} \left\| S_{ij}x_{ij} \right\| \le \lambda _{\max }(P_i) \lambda _{\max }\left( P_j^{-1}\right) \left\| x_{ij} \right\| . \end{aligned}$$
(17)

Since \(x_{ij}^{\top }S_{ij}x_{ij} = 1\), we have \(\left\| x_{ij} \right\| > 0\). Thus,

$$\begin{aligned} \lambda _{\min }(S_{ij}) \le \frac{x_{ij}^{\top }S_{ij}x_{ij}}{\left\| x_{ij} \right\| ^2} = \frac{1}{\left\| x_{ij} \right\| ^2}, \end{aligned}$$

where the inequality follows from the Courant–Fischer Theorem (see, for example, Theorem 8.1.2 in [28]). Since \(S_{ij}\) is positive definite, we have \(\lambda _{\min }(S_{ij}) > 0\). Thus,

$$\begin{aligned} \left\| x_{ij} \right\| ^2 \le \frac{1}{\lambda _{\min }(S_{ij})}. \end{aligned}$$

Moreover, we have

$$\begin{aligned} \lambda _{\min }(S_{ij})= & {} \lambda _{\min }\left( P_i^{\frac{1}{2}}Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}}\right) \ge \lambda _{\min }\left( Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_i\right) \lambda _{\min }\left( P_i^{\frac{1}{2}} P_i^{\frac{1}{2}}\right) \\= & {} \lambda _{\min }\left( Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_i\right) \lambda _{\min }(P_i) \ge \lambda _{\min }\left( P^{-1}_j\right) \lambda _{\min }\left( Q_i^{\top }Q_jQ_j^{\top }Q_i\right) \lambda _{\min }(P_i)\\= & {} \lambda _{\min }\left( P^{-1}_j\right) \lambda _{\min }(P_i), \end{aligned}$$

where the inequalities follow from Theorem 1.4 by Lu and Pearce [40] and the last equality holds since \(Q_i\) and \(Q_j\) are orthogonal matrices. Thus,

$$\begin{aligned} \left\| x_{ij} \right\| ^2 \le \frac{1}{\lambda _{\min }\left( P^{-1}_j\right) \lambda _{\min }(P_i)} = \lambda _{\max }(P_j)\lambda _{\max }\left( P_i^{-1}\right) , \end{aligned}$$

where the equality holds since \(P_i\) and \(P_j\) are positive definite diagonal matrices. So,

$$\begin{aligned}&\left\| x_{ij} \right\| \le \left( \lambda _{\max }(P_j)\lambda _{\max }\left( P_i^{-1}\right) \right) ^{\frac{1}{2}} = \big (\lambda _{\max }(P_j)\big )^{\frac{1}{2}}\left( \lambda _{\max }\left( P_i^{-1}\right) \right) ^{\frac{1}{2}} \\&\quad = \lambda _{\max }\left( P_j^{\frac{1}{2}}\right) \lambda _{\max }\left( P_i^{-\frac{1}{2}}\right) . \end{aligned}$$

Therefore, from (17), we have

$$\begin{aligned} \left\| S_{ij}x_{ij} \right\| \le \lambda _{\max }(P_i) \lambda _{\max }\left( P_j^{-1}\right) \lambda _{\max }\left( P_j^{\frac{1}{2}}\right) \lambda _{\max }\left( P_i^{-\frac{1}{2}}\right) . \end{aligned}$$

\(\square \)
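Lemma 4.1 lends itself to a quick numerical sanity check: draw random orthogonal matrices \(Q_i, Q_j\) and positive definite diagonal matrices \(P_i, P_j\), scale a random \(x_{ij}\) so that \(x_{ij}^{\top }S_{ij}x_{ij} = 1\), and compare both sides of the bound. A minimal NumPy sketch (all names are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Random orthogonal matrices (via QR) and positive definite diagonal matrices.
Qi, _ = np.linalg.qr(rng.standard_normal((n, n)))
Qj, _ = np.linalg.qr(rng.standard_normal((n, n)))
Pi = np.diag(rng.uniform(0.5, 4.0, n))
Pj = np.diag(rng.uniform(0.5, 4.0, n))

sqPi = np.sqrt(Pi)  # P_i^{1/2}: entrywise square root of a diagonal matrix
Sij = sqPi @ Qi.T @ Qj @ np.linalg.inv(Pj) @ Qj.T @ Qi @ sqPi

# Scale a random point so that x^T S_ij x = 1 (S_ij is positive definite).
x = rng.standard_normal(n)
x = x / np.sqrt(x @ Sij @ x)

lhs = np.linalg.norm(Sij @ x)
bound = (np.max(np.diag(Pi)) * np.max(1.0 / np.diag(Pj))
         * np.max(np.sqrt(np.diag(Pj))) * np.max(1.0 / np.sqrt(np.diag(Pi))))
assert lhs <= bound + 1e-12  # the bound of Lemma 4.1 holds
```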

Proposition 4.3

Any solution to the system (13)–(16) is such that \(\mu _{ij} \ge \epsilon _{ij}\) for all \(i < j\), where

$$\begin{aligned} \epsilon _{ij} = \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P_j) \lambda _{\min }\left( P_j^{-\frac{1}{2}}\right) > 0. \end{aligned}$$

Proof

Consider a solution to the system (13)–(16). By constraints (13), we have \(x_{ij}^{\top } S_{ij} x_{ij} = 1\) for each \(i,j \in I\) such that \(i < j\). Thus, by Lemma 4.1, we have

$$\begin{aligned} \left\| S_{ij}x_{ij} \right\| \le \lambda _{\max }(P_i) \lambda _{\max }\left( P_j^{-1}\right) \lambda _{\max }\left( P_j^{\frac{1}{2}}\right) \lambda _{\max }\left( P_i^{-\frac{1}{2}}\right) \end{aligned}$$

for all \(i,j \in I\) such that \(i < j\). By constraints (14) and (15), we must have \(\left\| S_{ij}x_{ij} \right\| > 0\) and \(\mu _{ij} \ge \left\| S_{ij}x_{ij} \right\| ^{-1}\) for all \(i,j \in I\) such that \(i < j\). Therefore, we can take

$$\begin{aligned} \epsilon _{ij}= & {} \left( \lambda _{\max }(P_i) \lambda _{\max }\left( P_j^{-1}\right) \lambda _{\max }\left( P_j^{\frac{1}{2}}\right) \lambda _{\max }\left( P_i^{-\frac{1}{2}}\right) \right) ^{-1}\nonumber \\= & {} \lambda _{\min }(P_i^{-1}) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P_j) \lambda _{\min }\left( P_j^{-\frac{1}{2}}\right) \end{aligned}$$

and the proposition holds. (Note that \(\epsilon _{ij} > 0\) since \(P_i\) and \(P_j\) are positive definite matrices.)

\(\square \)

For each \(i,j \in I\) such that \(i < j\), the term \(S_{ij}x_{ij}\) appears in constraints (13), (14), and (16). From constraints (16), we have

$$\begin{aligned} P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j) = x_{ij} + \mu _{ij}S_{ij}x_{ij}, \quad \forall i,j \in I {\text { such that }} i < j. \end{aligned}$$

Thus, constraints (14) are equivalent to constraints

$$\begin{aligned} \left\| P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j) - x_{ij} \right\| ^2 \ge 1, \quad \forall i,j \in I {\text { such that }} i < j. \end{aligned}$$
(18)

Constraints (13) can be replaced by

$$\begin{aligned} x_{ij}^{\top }\left( P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j) - x_{ij}\right) = \mu _{ij}, \quad \forall i,j \in I {\text { such that }} i < j, \end{aligned}$$
(19)

provided that \(\mu _{ij} \ne 0\). By Proposition 4.3, there exist positive constants \(\epsilon _{ij}\) such that constraints (13) and (15) are equivalent to constraints (19) together with \(\mu _{ij} \ge \epsilon _{ij}\) for all \(i,j \in I\) such that \(i < j\); in particular, we can take

$$\begin{aligned} \epsilon _{ij} = \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P_j) \lambda _{\min }\left( P_j^{-\frac{1}{2}}\right) , \end{aligned}$$
(20)

where \(\lambda _{\min }(M)\) denotes the least eigenvalue of matrix M. Therefore, we can replace constraints (13)–(15) with constraints (18), (19) and \(\mu _{ij} \ge \epsilon _{ij}\), for all \(i,j \in I\) such that \(i < j\), and obtain an equivalent model. Hence, model (13)–(16) is equivalent to the following model:

$$\begin{aligned}&x_{ij}^{\top } \left( P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij}\right) = \mu _{ij}, \quad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(21)
$$\begin{aligned}&\left\| P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij} \right\| ^2 \ge 1, \quad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(22)
$$\begin{aligned}&P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) = x_{ij} + \mu _{ij}S_{ij}x_{ij}, \quad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(23)
$$\begin{aligned}&\mu _{ij} \ge \epsilon _{ij}, \quad \forall i,j \in I {\text { such that }} i < j. \end{aligned}$$
(24)

The model (21)–(24) has \(m (m-1) (n + 2) / 2\) nonlinear constraints and \(m(m-1)/2\) bound-constraints. If the rotation matrices are represented as in (1) and (2), this model will have \(3m(m+1)/2\) variables in the two-dimensional case and \(2m(m+2)\) variables in the three-dimensional case.

4.2 Separating hyperplane based model

A hyperplane is a set of the form \({\mathcal {H}}= \{x \in {\mathbb {R}}^n \mid w^{\top }x = s\}\), where \(w \in {\mathbb {R}}^n\), \(w \ne 0\), and \(s \in {\mathbb {R}}\). There are two half-spaces associated with hyperplane \({\mathcal {H}}\), namely, \({\mathcal {H}}^{-} = \{x \in {\mathbb {R}}^n \mid w^{\top }x \le s\}\) and \({\mathcal {H}}^{+} = \{x \in {\mathbb {R}}^n \mid w^{\top }x \ge s\}\). We say that a hyperplane \({\mathcal {H}}\) in \({\mathbb {R}}^n\) supports a subset \({\mathcal {A}}\) of \({\mathbb {R}}^n\) if \({\mathcal {A}}\) is contained in one of the half-spaces associated with \({\mathcal {H}}\) and there exists at least one element of \({\mathcal {A}}\) that belongs to the hyperplane \({\mathcal {H}}\). We denote the relative interior of set \({\mathcal {A}}\) by \(\text {ri}({\mathcal {A}})\).

Given non-empty subsets \({\mathcal {A}}\) and \({\mathcal {B}}\) of \({\mathbb {R}}^n\), we say that a hyperplane separates sets \({\mathcal {A}}\) and \({\mathcal {B}}\) if \({\mathcal {A}}\) is contained in one of the half-spaces associated with this hyperplane and \({\mathcal {B}}\) is contained in the other half-space associated with this hyperplane. If \({\mathcal {A}}\) and \({\mathcal {B}}\) are convex sets then, by Theorem 11.3 in [47], there exists a hyperplane that separates \({\mathcal {A}}\) and \({\mathcal {B}}\) if and only if \(\text {ri}({\mathcal {A}}) \cap \text {ri}({\mathcal {B}}) = \emptyset \). Therefore, since the relative interior of an ellipsoid is the interior of this ellipsoid, there exists a hyperplane that separates ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) if and only if \(\text {int}({\mathcal {E}}_i) \cap \text {int}({\mathcal {E}}_j) = \emptyset \). In this section, we propose a non-overlapping model based on separating hyperplanes.

Consider the ellipsoids

$$\begin{aligned} {\mathcal {E}}_i= & {} \left\{ x \in {\mathbb {R}}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\right\} {\text { and }}\\ {\mathcal {E}}_j= & {} \left\{ x \in {\mathbb {R}}^n \mid (x - c_j)^{\top }Q_jP_j^{-1}Q_j^{\top } (x-c_j) \le 1\right\} , \end{aligned}$$

where \(c_i, c_j \in {\mathbb {R}}^n\), \(Q_i, Q_j \in {\mathbb {R}}^{n \times n}\) are orthogonal matrices and \(P_i,P_j \in {\mathbb {R}}^{n \times n}\) are positive definite and diagonal matrices. Let \(M_i = Q_iP_i^{-1}Q_i^{\top }\) and \(M_j = Q_jP_j^{-1}Q_j^{\top }\). For any \(x \in \partial {\mathcal {E}}_i\), the vector \(M_i(x - c_i)\) defines a hyperplane that passes through the point x and supports the ellipsoid \({\mathcal {E}}_i\) (see Lemma 4.2 below). For the ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) not to overlap, there must exist points \(x \in \partial {\mathcal {E}}_i\) and \(y \in \partial {\mathcal {E}}_j\) such that x is the sum of y and a nonnegative multiple of \(M_j(y - c_j)\), and such that \(M_j(y - c_j)\) is a negative multiple of \(M_i(x - c_i)\). Figure 2 illustrates this situation in \({\mathbb {R}}^2\). In this figure, we have \(\tilde{x}_{ij} \in \partial {\mathcal {E}}_i\) and \(\tilde{x}_{ji} \in \partial {\mathcal {E}}_j\). So, vectors \(M_i(\tilde{x}_{ij} - c_i)\) and \(M_j(\tilde{x}_{ji} - c_j)\) determine hyperplanes that support the ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) at the points \(\tilde{x}_{ij}\) and \(\tilde{x}_{ji}\), respectively.

Fig. 2

Separation of two ellipsoids by hyperplanes determined by the vectors \(M_i(\tilde{x}_{ij} - c_i)\) and \(M_j(\tilde{x}_{ji} - c_j)\), and the points \(\tilde{x}_{ij}\) and \(\tilde{x}_{ji}\)

We thus obtain the following model for the non-overlapping of ellipsoids, where the variables are \(c_i \in {\mathbb {R}}^n\), the angles that form matrix \(Q_i \in {\mathbb {R}}^{n \times n}\) for each \(i \in I\), \(\gamma _{ij}, \rho _{ij} \in {\mathbb {R}}\) for each \(i,j \in I\) such that \(i < j\), and \(\tilde{x}_{ij} \in {\mathbb {R}}^n\) for each \(i,j \in I\) such that \(i \ne j\).

$$\begin{aligned} (\tilde{x}_{ij} - c_i)^{\top } M_i (\tilde{x}_{ij} - c_i)&= 1&\qquad \qquad \qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(25)
$$\begin{aligned} (\tilde{x}_{ji} - c_j)^{\top } M_j (\tilde{x}_{ji} - c_j)&=1&\qquad \qquad \qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(26)
$$\begin{aligned} M_j (\tilde{x}_{ji} - c_j)&= - \gamma _{ij} M_i (\tilde{x}_{ij} - c_i)&\qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(27)
$$\begin{aligned} \tilde{x}_{ij}&= \tilde{x}_{ji} + \rho _{ij} M_j(\tilde{x}_{ji} - c_j)&\quad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(28)
$$\begin{aligned} \rho _{ij}&\ge 0&\qquad \qquad \qquad \forall i,j \in I {\text { such that }} i < j \end{aligned}$$
(29)
$$\begin{aligned} \gamma _{ij}&\ge 0&\qquad \qquad \qquad \forall i,j \in I {\text { such that }} i < j. \end{aligned}$$
(30)

This model has \(m(m-1)(n+1)\) nonlinear constraints and \(m(m-1)\) bound-constraints. If the rotation matrices are represented as in (1) and (2), then this model will have \(3m^2\) variables in the two-dimensional case and \(4m^2 + 2m\) in the three-dimensional case.
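As an illustration, in the special case of two non-overlapping disks (where \(M_i = r_i^{-2}I\)) the quantities satisfying (25)–(30) have closed forms: the tangency points lie on the segment joining the centers. A minimal NumPy sanity check (the configuration and names below are ours, chosen for illustration):

```python
import numpy as np

# Two non-overlapping disks in R^2: a special case of (25)-(30) with
# M_i = r_i^{-2} I. Configuration chosen for illustration only.
ci, ri = np.array([0.0, 0.0]), 1.0
cj, rj = np.array([4.0, 0.0]), 1.5
Mi, Mj = np.eye(2) / ri**2, np.eye(2) / rj**2

d = np.linalg.norm(cj - ci)
u = (cj - ci) / d                     # unit vector from c_i towards c_j
x_ij = ci + ri * u                    # point of the frontier of E_i closest to E_j
x_ji = cj - rj * u                    # point of the frontier of E_j closest to E_i
gamma = ri / rj                       # makes (27) hold for disks
rho = rj * (d - ri - rj)              # makes (28) hold; >= 0 iff the disks do not overlap

assert np.isclose((x_ij - ci) @ Mi @ (x_ij - ci), 1.0)             # (25)
assert np.isclose((x_ji - cj) @ Mj @ (x_ji - cj), 1.0)             # (26)
assert np.allclose(Mj @ (x_ji - cj), -gamma * (Mi @ (x_ij - ci)))  # (27)
assert np.allclose(x_ij, x_ji + rho * (Mj @ (x_ji - cj)))          # (28)
assert rho >= 0 and gamma >= 0                                     # (29), (30)
```

Here \(\rho = r_j(d - r_i - r_j)\) is nonnegative precisely when the disks do not overlap, in agreement with constraint (29).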

By Propositions 4.4 and 4.5 below, constraints (25)–(30) indeed model the non-overlapping of ellipsoids. Lemma 4.2 is used in the proofs of Propositions 4.4 and 4.5.

Lemma 4.2

Consider the ellipsoid \({\mathcal {E}}= \{x \in {\mathbb {R}}^n \mid (x - c)^{\top }M(x-c) \le 1\}\), where \(M \in {\mathbb {R}}^{n \times n}\) is positive definite. Let \(x^* \in \partial {\mathcal {E}}\) and define \(w = M(x^* - c)\) and \(s = w^{\top }x^*\). Thus, \(w^{\top }x \le s\) for every \(x \in {\mathcal {E}}\).

Proof

Let \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be the function defined by \(f(x) = (x-c)^{\top }M(x-c)\). Since the Hessian of f (the matrix 2M) is positive definite at every point of \({\mathbb {R}}^n\), f is convex. Therefore, by the first-order characterization of convexity, we have

$$\begin{aligned} f(x) \ge f(x^*) + \nabla f(x^*)^{\top }(x - x^*) \end{aligned}$$
(31)

for all \(x \in {\mathcal {E}}\). Since \(w = \nabla f(x^*) / 2\), \(s = w^{\top }x^*\), \(f(x^*) = 1\) and \(f(x) \le 1\) for all \(x \in {\mathcal {E}}\), inequality (31) implies that \(w^{\top }x \le s\) for all \(x \in {\mathcal {E}}\). \(\square \)
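Lemma 4.2 can be checked numerically by sampling points of \({\mathcal {E}}\) and verifying that they lie in the half-space \(w^{\top }x \le s\); a minimal NumPy sketch (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
c = rng.standard_normal(n)
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)           # a positive definite matrix

# A frontier point x*: scale a direction so that (x-c)^T M (x-c) = 1.
v = rng.standard_normal(n)
xstar = c + v / np.sqrt(v @ M @ v)
w = M @ (xstar - c)                   # normal of the supporting hyperplane
s = w @ xstar

# Sample points of the ellipsoid and check that w^T x <= s for each of them.
for _ in range(1000):
    u = rng.standard_normal(n)
    x = c + rng.uniform(0.0, 1.0) * u / np.sqrt(u @ M @ u)
    assert w @ x <= s + 1e-10
```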

Proposition 4.4

Any solution to the system (25)–(30) is such that \(\text {int}({\mathcal {E}}_i) \cap \text {int}({\mathcal {E}}_j) = \emptyset \) for all \(i,j \in I\) such that \(i \ne j\).

Proof

Consider a solution to the system (25)–(30). Let \(i,j \in I\) be such that \(i < j\). Let \(w_{ji} = M_j(\tilde{x}_{ji} - c_j)\) and \(s_{ji} = w_{ji}^{\top }\tilde{x}_{ji}\) and consider the hyperplane \({\mathcal {H}}_{ji} = \{x \in {\mathbb {R}}^n \mid w_{ji}^{\top }x = s_{ji}\}\). We shall prove that \({\mathcal {H}}_{ji}\) separates ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\). By Lemma 4.2, we have \(w_{ji}^{\top }x \le s_{ji}\) for all \(x \in {\mathcal {E}}_j\), i.e., \({\mathcal {E}}_j \subseteq {\mathcal {H}}_{ji}^{-}\). Point \(\tilde{x}_{ij}\) belongs to half-space \({\mathcal {H}}^{+}_{ji}\) since

$$\begin{aligned} w_{ji}^{\top }\tilde{x}_{ij}&=w_{ji}^{\top } \big [ \tilde{x}_{ji} + \rho _{ij} M_j(\tilde{x}_{ji} - c_j) \big ] = w_{ji}^{\top } (\tilde{x}_{ji} + \rho _{ij} w_{ji})\nonumber \\&=w_{ji}^{\top }\tilde{x}_{ji} + \rho _{ij} \left\| w_{ji} \right\| ^2 = s_{ji} + \rho _{ij} \left\| w_{ji} \right\| ^2 \ge s_{ji}, \end{aligned}$$
(32)

where the first equality follows from (28), the second equality follows from the definition of \(w_{ji}\), the fourth equality follows from the definition of \(s_{ji}\) and the inequality holds since \(\rho _{ij}\) is nonnegative. Now, consider the hyperplane \({\mathcal {H}}_{ij} = \{x \in {\mathbb {R}}^n \mid w_{ij}^{\top }x = s_{ij}\}\), where \(w_{ij} = M_i(\tilde{x}_{ij} - c_i)\) and \(s_{ij} = w_{ij}^{\top }\tilde{x}_{ij}\). For all \(x \in {\mathcal {E}}_i\), we have

$$\begin{aligned} w_{ji}^{\top }x = -\gamma _{ij} w_{ij}^{\top }x \ge -\gamma _{ij} s_{ij} = -\gamma _{ij} w_{ij}^{\top }\tilde{x}_{ij} = w_{ji}^{\top }\tilde{x}_{ij} \ge s_{ji}, \end{aligned}$$

where the first and third equalities follow from (27), the second equality follows from the definition of \(s_{ij}\), the first inequality follows from Lemma 4.2 and the fact that \(\gamma _{ij}\) is nonnegative, and the last inequality follows from (32). Therefore, \(x \in {\mathcal {H}}_{ji}^{+}\) for each \(x \in {\mathcal {E}}_i\). Hence, we have \({\mathcal {E}}_j \subseteq {\mathcal {H}}_{ji}^{-}\) and \({\mathcal {E}}_i \subseteq {\mathcal {H}}_{ji}^{+}\), i.e., hyperplane \({\mathcal {H}}_{ji}\) separates ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\). In other words, ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) do not overlap. \(\square \)

Proposition 4.5

Let \(I = \{1,\dots ,m\}\). For each \(i \in I\), let \({\mathcal {E}}_i = \{x \in {\mathbb {R}}^n \mid (x-c_i)^{\top }M_i(x-c_i) \le 1\}\), where \(c_i \in {\mathbb {R}}^n\) and \(M_i \in {\mathbb {R}}^{n \times n}\) is positive definite. If ellipsoids \({\mathcal {E}}_1, \dots , {\mathcal {E}}_m\) do not overlap each other, then the system (25)–(30) has a solution.

Proof

Let \(i,j \in I\) be such that \(i < j\). Suppose that \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) do not overlap. Let \(x^* \in {\mathcal {E}}_i\) and \(y^* \in {\mathcal {E}}_j\) be such that the distance between ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) is equal to \(\left\| x^* - y^* \right\| \). Thus, \((x^*,y^*)\) is an optimal solution to the problem

$$\begin{aligned} {\text {minimize}}&\qquad \left\| x - y \right\| ^2\\ {\text {subject to}}&\qquad (x-c_i)^{\top }M_i(x-c_i)\le 1\\&\qquad (y-c_j)^{\top }M_j(y-c_j)\le 1. \end{aligned}$$

Since both constraints of this problem are convex in \({\mathbb {R}}^{2n}\) and point \((c_i,c_j)\) strictly satisfies both inequalities, this problem fulfills the Slater constraint qualification. Therefore, by Proposition 3.3.9 in [5], there exist Lagrange multipliers \(\mu _i^* \in {\mathbb {R}}\) and \(\mu _j^* \in {\mathbb {R}}\) such that

$$\begin{aligned}&\displaystyle 2(x^* - y^*) + 2\mu _i^* M_i(x^* - c_i) = 0 \end{aligned}$$
(33)
$$\begin{aligned}&\displaystyle 2(y^* - x^*) + 2\mu _j^* M_j(y^* - c_j) = 0 \end{aligned}$$
(34)
$$\begin{aligned}&\displaystyle \mu _i^* \ge 0 \end{aligned}$$
(35)
$$\begin{aligned}&\displaystyle \mu _j^* \ge 0. \end{aligned}$$
(36)

From (34), we have \(x^* = y^* + \mu _j^* M_j(y^* - c_j)\). From (33) and (34), it follows that

$$\begin{aligned} \mu _j^* M_j(y^* - c_j) = -\mu _i^* M_i(x^* - c_i). \end{aligned}$$

Since the ellipsoids do not overlap, we must have \(x^* \in \partial {\mathcal {E}}_i\) and \(y^* \in \partial {\mathcal {E}}_j\). Thus, since \(M_i\) and \(M_j\) are nonsingular, we have \(M_i(x^* - c_i) \ne 0 \ne M_j(y^* - c_j)\). Therefore, \(\mu _j^* \ne 0\) if \(\mu _i^* \ne 0\), and \(\mu _j^* = 0\) if \(\mu _i^* = 0\).

Suppose that \(\mu _i^* = 0\). Thus, equation (33) implies that \(x^* = y^*\). Since \(\text {int}({\mathcal {E}}_i) \cap \text {int}({\mathcal {E}}_j) = \emptyset \), there exists a hyperplane \({\mathcal {H}}\) that separates ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\). Since \(x^* \in \partial {\mathcal {E}}_i\) and \(x^* \in \partial {\mathcal {E}}_j\), point \(x^*\) must belong to the hyperplane \({\mathcal {H}}\). (Suppose that \(x^* \notin {\mathcal {H}}\) and let \(w \in {\mathbb {R}}^n\) and \(s \in {\mathbb {R}}\) be such that \({\mathcal {H}}= \{x \in {\mathbb {R}}^n \mid w^{\top }x = s\}\). Since \(x^* \notin {\mathcal {H}}\), we have either \(w^{\top }x^* < s\) or \(w^{\top }x^* > s\). Suppose, without loss of generality, that \(w^{\top }x^* < s\). Thus, there exists a ball \({\mathcal {B}}\) with center in \(x^*\) and radius \(r > 0\) such that \(w^{\top }x < s\) for all \(x \in {\mathcal {B}}\). Since \(x^* \in \partial {\mathcal {E}}_i\), Theorem 6.1 in [47] implies that there exists \(z_i \ne x^*\) such that \(z_i \in {\mathcal {B}}\cap \text {int}({\mathcal {E}}_i)\). For the same reason, since \(x^* \in \partial {\mathcal {E}}_j\), there exists \(z_j \ne x^*\) such that \(z_j \in {\mathcal {B}}\cap \text {int}({\mathcal {E}}_j)\). Therefore, \(z_i \in \text {int}({\mathcal {E}}_i)\) and \(z_j \in \text {int}({\mathcal {E}}_j)\) satisfy \(w^{\top }z_i < s\) and \(w^{\top }z_j < s\). This contradicts the fact that \({\mathcal {H}}\) separates ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\).)

Since \(x^*\) belongs to the hyperplane \({\mathcal {H}}\) and \({\mathcal {H}}\) separates ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\), we have that \({\mathcal {H}}\) supports ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\) in \(x^*\). Let \({\mathcal {H}}_{ij} = \{x \in {\mathbb {R}}^n \mid w_{ij}^{\top }x = s_{ij}\}\), where \(w_{ij} = M_i(x^* - c_i)\) and \(s_{ij} = w_{ij}^{\top }x^*\). By Lemma 4.2, \({\mathcal {H}}_{ij}\) supports \({\mathcal {E}}_i\) in \(x^*\). By Theorem 3.1 in [29], there exists only one hyperplane that supports \({\mathcal {E}}_i\) in \(x^*\). Therefore, \({\mathcal {H}}_{ij} = {\mathcal {H}}\). Similarly, if we define \({\mathcal {H}}_{ji} = \{x \in {\mathbb {R}}^n \mid w_{ji}^{\top }x = s_{ji}\}\), where \(w_{ji} = M_j(x^* - c_j)\) and \(s_{ji} = w_{ji}^{\top }x^*\), we have that \({\mathcal {H}}_{ji} = {\mathcal {H}}\). Therefore, \({\mathcal {H}}_{ij} = {\mathcal {H}}_{ji}\). Hence, \(w_{ij}\) must be parallel to \(w_{ji}\), i.e., there must exist a scalar \(\gamma \ne 0\) such that \(M_i(x^* - c_i) = \gamma M_j(x^* - c_j)\). Notice that \(w_{ij}^{\top }c_i < s_{ij}\), since

$$\begin{aligned} - w_{ij}^{\top }c_i = -c_i^{\top }M_i(x^* - c_i) = (x^* - c_i - x^*)^{\top }M_i(x^* - c_i) = (x^* - c_i)^{\top }M_i(x^* - c_i) - s_{ij} = 1 - s_{ij}. \end{aligned}$$

So, \(c_i \notin {\mathcal {H}}\). Since \({\mathcal {H}}_{ji}\) separates ellipsoids \({\mathcal {E}}_i\) and \({\mathcal {E}}_j\), Lemma 4.2 implies that \(w_{ji}^{\top }c_i \ge s_{ji}\). Thus, since \(c_i \notin {\mathcal {H}}\), we must have \(w_{ji}^{\top }c_i > s_{ji}\). In order to derive a contradiction, suppose that \(\gamma \) is positive. Then,

$$\begin{aligned} w_{ij}^{\top }c_i < s_{ij} = w_{ij}^{\top }x^* = \gamma w_{ji}^{\top }x^* = \gamma s_{ji} < \gamma w_{ji}^{\top }c_i = w_{ij}^{\top }c_i, \end{aligned}$$

which is a contradiction. Therefore, \(\gamma \) must be negative.

Hence, if we take \(\tilde{x}_{ij} \doteq x^*\), \(\tilde{x}_{ji} \doteq y^*\), \(\rho _{ij} \doteq \mu _j^*\) and

$$\begin{aligned} \gamma _{ij} \doteq \left\{ \begin{array}{ll} \frac{\mu _i^*}{\mu _j^*} &{}\quad {\text {if }} \,\mu _i^* > 0\\ -\frac{(x^* - c_j)^{\top }M_jM_i(x^* - c_i)}{\left\| M_i(x^* - c_i) \right\| ^2} &{}\quad {\text {if }} \,\mu _i^* = 0,\\ \end{array} \right. \end{aligned}$$

then constraints (25)–(30) are satisfied. \(\square \)

By constraints (26)–(27), any solution to the system (25)–(30) must satisfy

$$\begin{aligned} - \gamma _{ij} (\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ij} - c_i) = 1 \end{aligned}$$

for all \(i < j\). Then, \(\gamma _{ij}\) cannot be zero. Hence, since \(\gamma _{ij} \ge 0\) by constraints (30), we must have \(\gamma _{ij} > 0\) for all \(i < j\). The following lemma provides a positive lower bound on the value of \(\gamma _{ij}\).

Lemma 4.3

Any solution to the system (25)–(30) is such that \(\gamma _{ij} \ge \lambda _{\min }(P_i)\) for all \(i < j\).

Proof

Consider a solution to the system (25)–(30). By constraints (26)–(27), we must have

$$\begin{aligned} - \gamma _{ij} (\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ij} - c_i) = 1. \end{aligned}$$

Thus,

$$\begin{aligned} \gamma _{ij} = - \left[ (\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ij} - c_i)\right] ^{-1}. \end{aligned}$$

Since \(M_i\) is positive definite, we have \((\tilde{x}_{ji} - c_j + \tilde{x}_{ij} - c_i)^{\top } M_i (\tilde{x}_{ji} - c_j + \tilde{x}_{ij} - c_i) \ge 0\). Then, since

$$\begin{aligned}&(\tilde{x}_{ji} - c_j + \tilde{x}_{ij} - c_i)^{\top } M_i (\tilde{x}_{ji} - c_j + \tilde{x}_{ij} - c_i) =\\&(\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ji} - c_j) + (\tilde{x}_{ij} - c_i)^{\top } M_i (\tilde{x}_{ij} - c_i) + 2(\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ij} - c_i), \end{aligned}$$

we must have

$$\begin{aligned} - (\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ij} - c_i)\le & {} \frac{1}{2} \left[ (\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ji} - c_j) + (\tilde{x}_{ij} - c_i)^{\top } M_i (\tilde{x}_{ij} - c_i)\right] \\\le & {} \max \{(\tilde{x}_{ji} - c_j)^{\top } M_i (\tilde{x}_{ji} - c_j), (\tilde{x}_{ij} - c_i)^{\top } M_i (\tilde{x}_{ij} - c_i)\}\\\le & {} \lambda _{\max }(M_i) = \lambda _{\max }(Q_iP_i^{-1}Q_i^{\top }) = \lambda _{\max }(P_i^{-1}), \end{aligned}$$

where the last equality holds since \(Q_i\) is orthogonal. Hence,

$$\begin{aligned} \gamma _{ij} \ge \left[ \lambda _{\max }\left( P_i^{-1}\right) \right] ^{-1} = \lambda _{\min }(P_i). \end{aligned}$$

\(\square \)

According to Lemma 4.3, we can replace \(\gamma _{ij} \ge 0\) in (30) with \(\gamma _{ij} \ge \lambda _{\min }(P_i)\), should this prove advantageous for the solution process.

5 Containment models

5.1 Ellipsoid inside an ellipsoid

In this section, we present a model for the inclusion of an ellipsoid \({\mathcal {E}}_i\) inside an ellipsoid \({\mathcal {C}}\). Firstly, we apply a transformation to \({\mathcal {E}}_i\) that converts this ellipsoid into a ball \({\mathcal {E}}_{ii}\) with unit radius and we apply the same transformation to \({\mathcal {C}}\), thus obtaining an ellipsoid \({\mathcal {C}}_i\). In this way, we have \({\mathcal {E}}_i \subseteq {\mathcal {C}}\) if and only if \({\mathcal {E}}_{ii} \subseteq {\mathcal {C}}_i\). In order to guarantee that \({\mathcal {E}}_{ii}\) be contained in \({\mathcal {C}}_i\), we require that the center \(c_{ii}\) of ball \({\mathcal {E}}_{ii}\) be in \({\mathcal {C}}_i\) and the distance between \(c_{ii}\) and the frontier of ellipsoid \({\mathcal {C}}_i\) be at least one. Since the computation of the distance between a point and the frontier of an ellipsoid demands the solution of a non-convex optimization problem, we will represent the center \(c_{ii}\) with respect to \({\mathcal {C}}_i\) in a similar manner to what was done in Sect. 4.1. In this representation, the distance between \(c_{ii}\) and the frontier of ellipsoid \({\mathcal {C}}_i\) is easily obtained.

In order to develop this model, we must first state some results. Next, we present Propositions 5.1 and 5.2 and Lemmas 5.1 and 5.2. Lemma 5.1 is used in the proof of Proposition 5.1 and Lemma 5.2 is used in the proof of Proposition 5.2. These lemmas consider particular cases of Propositions 5.1 and 5.2.

Lemma 5.1

Consider the ellipsoid \({\mathcal {E}}= \{z \in {\mathbb {R}}^n \mid z^{\top } D z \le 1\}\), where \(D \in {\mathbb {R}}^{n \times n}\) is a positive definite diagonal matrix. For each \(y \in {\mathcal {E}}\), there exist \(x \in \partial {\mathcal {E}}\) and \(\alpha \in [-1/\lambda _{\max }(D), 0]\) such that \(y = x + \alpha Dx\).

Proof

We shall prove the assertion by induction on the dimension of the ellipsoid. We will denote the i-th diagonal element of matrix D by \(d_i\).

Consider the one-dimensional case, where \(n = 1\). Then, \(D = d_1 = \lambda _{\max }(D)\), \({\mathcal {E}}= \{z \in {\mathbb {R}}\mid d_1 z^2 \le 1\} = \{z \in {\mathbb {R}}\mid -1/\sqrt{d_1} \le z \le 1/\sqrt{d_1}\}\) and \(\partial {\mathcal {E}} = \{-1/\sqrt{d_1},1/\sqrt{d_1}\}\). Let \(y \in {\mathcal {E}}\). We will analyse the cases where \(-1/\sqrt{d_1} \le y \le 0\) and \(0 < y \le 1/\sqrt{d_1}\) separately. Suppose that \(-1/\sqrt{d_1} \le y \le 0\). Take \(x = -1/\sqrt{d_1}\) and consider the point \(x + \alpha Dx\) with

$$\begin{aligned} \alpha = \frac{y-x}{d_1 x} = - \frac{y + 1/\sqrt{d_1}}{\sqrt{d_1}}. \end{aligned}$$

Then,

$$\begin{aligned} x + \alpha Dx = x + \frac{y-x}{d_1 x} d_1 x = y. \end{aligned}$$

Since \(y \ge -1/\sqrt{d_1}\), we have \(y + 1/\sqrt{d_1} \ge 0\). Thus, \(\alpha \le 0\). In addition, since \(y \le 0\), we have

$$\begin{aligned} \alpha = - \frac{y + 1/\sqrt{d_1}}{\sqrt{d_1}} \ge - \frac{1/\sqrt{d_1}}{\sqrt{d_1}} = -\frac{1}{d_1} = - \frac{1}{\lambda _{\max }(D)}. \end{aligned}$$

Hence, \(y = x + \alpha Dx\) with \(x \in \partial {\mathcal {E}}\) and \(\alpha \in [-1/\lambda _{\max }(D), 0]\). The case where \(0 < y \le 1/\sqrt{d_1}\) is analogous. Simply take \(x = 1/\sqrt{d_1}\) and \(\alpha = (y-x)/(d_1x)\).
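The one-dimensional construction above can be exercised directly; a small Python sketch (the value of \(d_1\) and the sample points are assumed for illustration):

```python
import math

# One-dimensional case of Lemma 5.1: E = [-1/sqrt(d1), 1/sqrt(d1)], D = d1.
d1 = 2.5
lo = -1.0 / math.sqrt(d1)            # left frontier point
for y in [lo, -0.3, 0.0, 0.25, -lo]:
    x = lo if y <= 0 else -lo        # frontier point prescribed by the proof
    alpha = (y - x) / (d1 * x)
    assert math.isclose(x + alpha * d1 * x, y, abs_tol=1e-12)  # y = x + alpha*D*x
    assert -1.0 / d1 - 1e-12 <= alpha <= 1e-12                 # alpha in [-1/lambda_max(D), 0]
```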

Consider \(n > 1\) and suppose that the assertion is true for all ellipsoids of dimension strictly less than n. Consider the ellipsoid \({\mathcal {E}}= \{z \in {\mathbb {R}}^n \mid z^{\top } D z \le 1\}\) and let \(y \in {\mathcal {E}}\). Let \({\mathcal {I}} = \{1,\dots ,n\}\), \({\mathcal {I}}^+ = \{i \in {\mathcal {I}} \mid d_i = \lambda _{\max }(D)\}\) and \({\mathcal {I}}^- = {\mathcal {I}} \setminus {\mathcal {I}}^+\). Since D is diagonal, we must find \(x \in {\mathbb {R}}^n\) and \(\alpha \in {\mathbb {R}}\) such that

$$\begin{aligned}&y_i = x_i + \alpha d_i x_i, \forall i \in {\mathcal {I}}, \end{aligned}$$
(37)
$$\begin{aligned}&x^{\top }Dx = 1, \end{aligned}$$
(38)
$$\begin{aligned}&\alpha \in \,\, [-1/\lambda _{\max }(D),0]. \end{aligned}$$
(39)

For each \(\alpha \in [-1/\lambda _{\max }(D),0]\) and \(i \in {\mathcal {I}}^-\), we have \(1 + \alpha d_i \in (0,1]\). Therefore, from (37), for all \(i \in {\mathcal {I}}^-\) we must have

$$\begin{aligned} x_i = \frac{y_i}{1 + \alpha d_i}. \end{aligned}$$

Now, we consider two cases: the first one where \(y_i \ne 0\) for all \(i \in {\mathcal {I}}^+\) and the second one where \(y_j = 0\) for some \(j \in {\mathcal {I}}^+\).

Case 1. Suppose that \(y_i \ne 0\) for all \(i \in {\mathcal {I}}^+\). In this case, we must have \(\alpha > -1 / \lambda _{\max }(D)\), since otherwise \(1 + \alpha d_i = 0\) for every \(i \in {\mathcal {I}}^+\) and (37) would imply \(y_i = 0\). Thus, from (37), we must have

$$\begin{aligned} x_i = \frac{y_i}{1 + \alpha d_i} \end{aligned}$$
(40)

for all \(i \in {\mathcal {I}}\). Then,

$$\begin{aligned} x^{\top }Dx = \sum _{i=1}^n d_i x_i^2 = \sum _{i=1}^n d_i \frac{y_i^2}{(1 + \alpha d_i)^2} = \sum _{i \in {\mathcal {I}}^+} d_i \frac{y_i^2}{[1 + \alpha \lambda _{\max }(D)]^2} + \sum _{i \in {\mathcal {I}}^-} d_i \frac{y_i^2}{(1 + \alpha d_i)^2}. \end{aligned}$$

Thus, for \(\alpha > -1 / \lambda _{\max }(D)\), we have \(x^{\top }Dx = 1\) if and only if

$$\begin{aligned} \sum _{i \in {\mathcal {I}}^+} d_i y_i^2 = [1 + \alpha \lambda _{\max }(D)]^2 \left[ 1 - \sum _{i \in {\mathcal {I}}^-} d_i \frac{y_i^2}{(1 + \alpha d_i)^2} \right] . \end{aligned}$$

Let \(f: {\mathbb {R}}\rightarrow {\mathbb {R}}\) be the function defined by

$$\begin{aligned} f(t) = \sum _{i \in {\mathcal {I}}^+} d_i y_i^2 - [1 + t \lambda _{\max }(D)]^2 \left[ 1 - \sum _{i \in {\mathcal {I}}^-} d_i \frac{y_i^2}{(1 + t d_i)^2} \right] . \end{aligned}$$

We have

$$\begin{aligned} f(0) = \sum _{i \in {\mathcal {I}}^+} d_i y_i^2 - \left( 1 - \sum _{i \in {\mathcal {I}}^-} d_i y_i^2 \right) = \sum _{i \in {\mathcal {I}}} d_i y_i^2 - 1 = y^{\top }Dy - 1 \le 0, \end{aligned}$$

where the inequality holds since \(y \in {\mathcal {E}}\), i.e., \(y^{\top }Dy \le 1\). We also have

$$\begin{aligned} f(-1/\lambda _{\max }(D)) = \sum _{i \in {\mathcal {I}}^+} d_i y_i^2 > 0. \end{aligned}$$

Thus, since f is continuous in the interval \([-1/\lambda _{\max }(D),0]\), \(f(0) \le 0\), and \(f(-1/\lambda _{\max }(D)) > 0\), by the Intermediate Value Theorem, there exists \(t^* \in (-1/\lambda _{\max }(D), 0]\) such that \(f(t^*) = 0\). Therefore, by taking \(\alpha = t^*\) and x as in (40), the system (37)–(39) is satisfied.
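The Intermediate Value Theorem argument of Case 1 is constructive: \(t^*\) can be located by bisection on f. A minimal NumPy sketch on an assumed instance (the diagonal of D, the point y, and all names are ours):

```python
import numpy as np

# Case 1 of Lemma 5.1 made algorithmic: locate alpha by bisection on f.
d = np.array([4.0, 4.0, 1.0, 0.25])          # diagonal of D; lambda_max(D) = 4
lam = d.max()
y = np.array([0.2, -0.1, 0.3, 0.5])          # y in E with y_i != 0 on I^+
assert y @ (d * y) <= 1.0

plus = d == lam                               # index set I^+

def f(t):
    inner = 1.0 - np.sum(d[~plus] * y[~plus]**2 / (1.0 + t * d[~plus])**2)
    return np.sum(d[plus] * y[plus]**2) - (1.0 + t * lam)**2 * inner

lo, hi = -1.0 / lam + 1e-12, 0.0              # f(lo) > 0 and f(hi) <= 0
for _ in range(200):                          # bisection keeps f(lo) > 0 >= f(hi)
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
alpha = hi
x = y / (1.0 + alpha * d)                     # equation (40)

assert np.isclose(x @ (d * x), 1.0)           # (38): x lies on the frontier
assert np.allclose(x + alpha * d * x, y)      # (37): y = x + alpha*D*x
assert -1.0 / lam <= alpha <= 0.0             # (39)
```

The recovered x and \(\alpha \) satisfy (37)–(39) up to floating-point accuracy.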

Case 2. Suppose that \(y_j = 0\) for some \(j \in {\mathcal {I}}^+\). We shall consider the cases where \(|{\mathcal {I}}^+| = 1\) and \(|{\mathcal {I}}^+| > 1\) individually.

Case 2.1. Suppose that \(|{\mathcal {I}}^+| = 1\). Then, \({\mathcal {I}}^- = {\mathcal {I}} \setminus \{j\}\). Thus, from (37), we must have \(x_i = y_i / (1 + \alpha d_i)\) for all \(i \in {\mathcal {I}} \setminus \{j\}\). Then, \(x^{\top }Dx = 1\) if and only if

$$\begin{aligned} \lambda _{\max }(D) x_j^2 = 1 - \sum _{i \in {\mathcal {I}}^-} d_i \frac{y_i^2}{(1 + \alpha d_i)^2}. \end{aligned}$$

Let \(g: {\mathbb {R}}\rightarrow {\mathbb {R}}\) be the function defined by

$$\begin{aligned} g(t) = 1 - \sum _{i \in {\mathcal {I}}^-} d_i \frac{y_i^2}{(1 + t d_i)^2}. \end{aligned}$$

If \(g(-1/\lambda _{\max }(D)) \ge 0\), then we can take \(\alpha = -1/\lambda _{\max }(D)\), \(x_i = y_i/(1 + \alpha d_i)\) for all \(i \in {\mathcal {I}} \setminus \{j\}\) and

$$\begin{aligned} x_j = \left[ \frac{1}{\lambda _{\max }(D)} \left( 1 - \sum _{i \in {\mathcal {I}}^-} d_i \frac{y_i^2}{(1 + \alpha d_i)^2} \right) \right] ^{\frac{1}{2}}, \end{aligned}$$

and therefore x and \(\alpha \) form a solution to the system (37)–(39). Suppose that \(g(-1/\lambda _{\max }(D)) < 0\). Since

$$\begin{aligned} g(0) = 1 - \sum _{i \in {\mathcal {I}}^-} d_i y_i^2 = 1 - y^{\top }Dy \ge 0 \end{aligned}$$

and g is continuous in the interval \([-1/\lambda _{\max }(D),0]\), by the Intermediate Value Theorem, there exists \(t^* \in (-1/\lambda _{\max }(D),0]\) such that \(g(t^*) = 0\). Then, by taking \(\alpha = t^*\), \(x_j = 0\) and \(x_i = y_i/(1 + \alpha d_i)\) for all \(i \in {\mathcal {I}} \setminus \{j\}\), we have a solution to the system (37)–(39).

Case 2.2. Suppose that \(|{\mathcal {I}}^+| > 1\). Let \(\tilde{y} \in {\mathbb {R}}^{n-1}\) be defined as

$$\begin{aligned} \tilde{y}_i = \left\{ \begin{array}{l l} y_i &{}\quad {\text {if }} i < j,\\ y_{i+1} &{}\quad {\text {if }} i \ge j.\\ \end{array} \right. \end{aligned}$$

Consider the diagonal matrix \(\tilde{D} \in {\mathbb {R}}^{(n-1) \times (n-1)}\), where the i-th element of its diagonal is given by

$$\begin{aligned} \tilde{d}_i = \left\{ \begin{array}{ll} d_i &{}\quad {\text {if }} i < j,\\ d_{i+1} &{}\quad {\text {if }} i \ge j.\\ \end{array} \right. \end{aligned}$$

Then, since \(|{\mathcal {I}}^+| > 1\), there exists \(i \in {\mathcal {I}} \setminus \{j\}\) such that \(d_i = \lambda _{\max }(D)\) and, hence, \(\lambda _{\max }(\tilde{D}) = \lambda _{\max }(D)\). By construction, we have \(\tilde{y}^{\top }\tilde{D}\tilde{y} = y^{\top }Dy \le 1\). Thus, by the induction hypothesis, there exist \(\tilde{\alpha } \in [-1/\lambda _{\max }(D),0]\) and \(\tilde{x} \in {\mathbb {R}}^{n-1}\) such that \(\tilde{y} = \tilde{x} + \tilde{\alpha } \tilde{D}\tilde{x}\) and \(\tilde{x}^{\top }\tilde{D}\tilde{x} = 1\). Therefore, if we define \(\alpha = \tilde{\alpha }\) and \(x \in {\mathbb {R}}^n\) by

$$\begin{aligned} x_i = \left\{ \begin{array}{l l} \tilde{x}_i &{} \quad {\text {if }} i < j,\\ 0 &{}\quad {\text {if }} i = j,\\ \tilde{x}_{i-1} &{} \quad {\text {if }} i > j,\\ \end{array} \right. \end{aligned}$$

we have \(y = x + \alpha D x\), \(x^{\top }Dx = 1\) and \(\alpha \in [-1/\lambda _{\max }(D),0]\). In other words, x and \(\alpha \) form a solution to the system (37)–(39) and the proof is complete. \(\square \)

Proposition 5.1

Consider the ellipsoid \({\mathcal {E}}= \{z \in {\mathbb {R}}^n \mid z^{\top } S z \le 1\}\), where \(S \in {\mathbb {R}}^{n \times n}\) is symmetric and positive definite. For each \(y \in {\mathcal {E}}\), there exist \(x \in \partial {\mathcal {E}}\) and \(\alpha \in [-1/\lambda _{\max }(S), 0]\) such that \(y = x + \alpha Sx\).

Proof

Let \(y \in {\mathcal {E}}\). Since S is symmetric, there exist an orthogonal matrix \(Q \in {\mathbb {R}}^{n \times n}\) and a diagonal matrix \(D \in {\mathbb {R}}^{n \times n}\) formed by the eigenvalues of S such that \(S = QDQ^{\top }\) and \(\lambda _{\max }(S) = \lambda _{\max }(D)\) (see, for example, Theorem 8.1.1 in [28]). Consider the ellipsoid \({\mathcal {E}}' = \{z \in {\mathbb {R}}^n \mid z^{\top } D z \le 1\}\). Then, \(y' = Q^{\top }y \in {\mathcal {E}}'\) and, by Lemma 5.1, there exist \(x' \in \partial {\mathcal {E}}'\) and \(\alpha ' \in [-1/\lambda _{\max }(D), 0]\) such that \(y' = x' + \alpha ' D x'\). By left multiplying by Q both sides of this equality, we obtain

$$\begin{aligned} y = Qx' + \alpha ' QDx' = Qx' + \alpha ' QDQ^{\top }Qx' = Qx' + \alpha ' SQx'. \end{aligned}$$

Since \(x' \in \partial {\mathcal {E}}'\), we have \(Qx' \in \partial {\mathcal {E}}\). Define \(x = Qx'\) and \(\alpha = \alpha '\). Therefore, \(y = x + \alpha S x\), with \(x \in \partial {\mathcal {E}}\) and \(\alpha \in [-1/\lambda _{\max }(S), 0]\). \(\square \)

Lemma 5.2

Let \(D \in {\mathbb {R}}^{n \times n}\) be a positive definite diagonal matrix. Consider the ellipsoid \({\mathcal {E}}= \{z \in {\mathbb {R}}^n \mid z^{\top } D z \le 1\}\). Let \(x \in \partial {\mathcal {E}}\) and \(\alpha \in [-1/\lambda _{\max }(D), 0]\). Let \(y = x + \alpha Dx\). Then, \(y \in {\mathcal {E}}\) and the distance from y to the frontier of \({\mathcal {E}}\) is \(\left\| y - x \right\| \).

Proof

If \(\alpha = 0\), then \(y = x\) and, therefore, \(y \in {\mathcal {E}}\) and \(d(y,\partial {\mathcal {E}}) = d(x,\partial {\mathcal {E}}) = 0 = \left\| y - x \right\| \). Suppose that \(\alpha < 0\). Consider the ball centered at y with radius \(\left\| y-x \right\| \). We shall prove that this ball is contained in the ellipsoid \({\mathcal {E}}\). Let \(z \in {\mathbb {R}}^n\) be a point belonging to this ball. Then,

$$\begin{aligned} \left\| z-y \right\| ^2 \le \left\| y - x \right\| ^2 = \alpha ^2 \left\| Dx \right\| ^2. \end{aligned}$$
(41)

Since \(y = x + \alpha D x\), we have

$$\begin{aligned} \left\| z-y \right\| ^2 = \left\| z - x - \alpha Dx \right\| ^2 = \left\| z-x \right\| ^2 - 2\alpha (z-x)^{\top }Dx + \alpha ^2 \left\| Dx \right\| ^2. \end{aligned}$$
(42)

From (41) and (42), it follows that

$$\begin{aligned} \left\| z-x \right\| ^2 - 2\alpha (z-x)^{\top }Dx \le 0. \end{aligned}$$

Notice that

$$\begin{aligned} \left\| z-x \right\| ^2 - 2\alpha (z-x)^{\top }Dx&= (z-x)^{\top }(z - x - 2\alpha Dx) = (z-x)^{\top }(z - x - \alpha Dx - \alpha Dx)\\&= (z-x)^{\top }(z - y - \alpha Dx) = (z-x)^{\top }(z - y) - \alpha (z-x)^{\top }Dx\\&= (z-x)^{\top }(z - y) - \alpha z^{\top }Dx + \alpha x^{\top }Dx\\&= (z-x)^{\top }(z - y) - \alpha z^{\top }Dx + \alpha , \end{aligned}$$

where the third equality holds since \(y = x + \alpha Dx\) and the last equality holds since \(x \in \partial {\mathcal {E}}\), i.e., \(x^{\top }Dx = 1\). Thus,

$$\begin{aligned} (z-x)^{\top }(z - y) - \alpha z^{\top }Dx \le -\alpha . \end{aligned}$$

By dividing both sides of this inequality by \(-\alpha \) (that, by assumption, is positive), we obtain the following inequality:

$$\begin{aligned} \frac{1}{\alpha }(z-x)^{\top }(y - z) + z^{\top }Dx \le 1. \end{aligned}$$

We have

$$\begin{aligned}&\frac{1}{\alpha }(z-x)^{\top }(y - z) + z^{\top }Dx\\&\quad = \frac{1}{\alpha }(z-x)^{\top }(y - z) + z^{\top }D(x - z + z) = \frac{1}{\alpha }(z-x)^{\top }(y - z) - (z-x)^{\top }Dz + z^{\top }Dz\\&\quad = \frac{1}{\alpha }(z-x)^{\top }(y - z - \alpha Dz) + z^{\top }Dz = \frac{1}{\alpha }(z-x)^{\top }(x + \alpha Dx - z - \alpha Dz) + z^{\top }Dz\\&\quad = \frac{1}{\alpha }(z-x)^{\top }[x - z + \alpha D(x-z)] + z^{\top }Dz = -\frac{1}{\alpha }(x-z)^{\top }[x - z + \alpha D(x-z)] + z^{\top }Dz. \end{aligned}$$

Therefore,

$$\begin{aligned} -\frac{1}{\alpha }(x-z)^{\top }[x - z + \alpha D(x-z)] + z^{\top }Dz \le 1. \end{aligned}$$

Note that

$$\begin{aligned} -\frac{1}{\alpha }(x-z)^{\top }[x - z + \alpha D(x-z)] \ge 0, \end{aligned}$$

since \(-1 / \alpha > 0\) and

$$\begin{aligned}&(x-z)^{\top }[x - z + \alpha D(x-z)]\\&\quad = (x-z)^{\top }(x - z) + \alpha (x-z)^{\top }D(x-z) \ge \left\| x-z \right\| ^2 \!+\! \alpha (x-z)^{\top }(\lambda _{\max }(D) I_{n})(x-z)\\&\quad = \left\| x-z \right\| ^2 + \alpha \lambda _{\max }(D) \left\| x-z \right\| ^2 = [1 + \alpha \lambda _{\max }(D)] \left\| x-z \right\| ^2 \ge 0, \end{aligned}$$

where the first inequality follows from the fact that D is a diagonal matrix and \(\alpha < 0\), and the second inequality holds since \(\alpha \ge -1/\lambda _{\max }(D)\). Consequently, we have \(z^{\top }Dz \le 1\), i.e., \(z \in {\mathcal {E}}\). Thus, the ball centered at y with radius \(\left\| y-x \right\| \) is contained in the ellipsoid \({\mathcal {E}}\). Therefore, \(y \in {\mathcal {E}}\) and since x belongs to this ball and \(x \in \partial {\mathcal {E}}\), we conclude that \(d(y,\partial {\mathcal {E}}) = \left\| y-x \right\| \). (Suppose, in order to derive a contradiction, that \(d(y,\partial {\mathcal {E}}) < \left\| y-x \right\| \). Then, there exists \(v \in \partial {\mathcal {E}}\) such that \(\left\| y - v \right\| < \left\| y-x \right\| \). Then, v belongs to the interior of ball \({\mathcal {B}}(y,\left\| y-x \right\| )\) centered at y with radius \(\left\| y - x \right\| \). Since \({\mathcal {B}}(y,\left\| y-x \right\| )\) is contained in \({\mathcal {E}}\), we have that v is also an interior point of \({\mathcal {E}}\), which is a contradiction. Thus, \(d(y,\partial {\mathcal {E}}) = \left\| y-x \right\| \).) \(\square \)
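
Lemma 5.2 is easy to illustrate numerically: the claimed distance \(\left\| y - x \right\| \) can be compared with a brute-force estimate of \(d(y,\partial {\mathcal {E}})\) obtained by densely sampling the boundary. A pure-Python sketch with the hypothetical choice \(D = \mathrm{diag}(4,1)\):

```python
import math

# Hypothetical ellipse E = {z : 4 z_1^2 + z_2^2 <= 1}, i.e. D = diag(4, 1).
d1, d2 = 4.0, 1.0

# A boundary point x (so that x^T D x = 1) and alpha in [-1/lambda_max, 0].
theta = 0.7
x = (math.cos(theta) / math.sqrt(d1), math.sin(theta) / math.sqrt(d2))
alpha = -0.15                                 # within [-1/4, 0]
y = (x[0] + alpha * d1 * x[0], x[1] + alpha * d2 * x[1])   # y = x + alpha D x

# Distance claimed by Lemma 5.2:
claimed = math.hypot(y[0] - x[0], y[1] - x[1])

# Brute-force estimate: minimum distance from y to sampled boundary points
# z(t) = (cos t / sqrt(d1), sin t / sqrt(d2)).
N = 200_000
sampled = min(
    math.hypot(y[0] - math.cos(2 * math.pi * k / N) / math.sqrt(d1),
               y[1] - math.sin(2 * math.pi * k / N) / math.sqrt(d2))
    for k in range(N)
)
print(claimed, sampled)   # the two values agree to sampling accuracy
```

The sampled minimum can only overestimate the true distance, and the overestimate vanishes as the sampling is refined, in agreement with the lemma.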

Proposition 5.2

Consider the ellipsoid \({\mathcal {E}}= \{z \in {\mathbb {R}}^n \mid z^{\top } S z \le 1\}\), where \(S \in {\mathbb {R}}^{n \times n}\) is a symmetric and positive definite matrix. Let \(x \in \partial {\mathcal {E}}\) and \(\alpha \in [-1/\lambda _{\max }(S), 0]\). Let \(y = x + \alpha Sx\). Then, \(y \in {\mathcal {E}}\) and the distance from y to the frontier of \({\mathcal {E}}\) is \(\left\| y - x \right\| \).

Proof

Since S is symmetric, there exist an orthogonal matrix \(Q \in {\mathbb {R}}^{n \times n}\) and a diagonal matrix \(D \in {\mathbb {R}}^{n \times n}\) formed by the eigenvalues of S such that \(S = QDQ^{\top }\) and \(\lambda _{\max }(S) = \lambda _{\max }(D)\) (see, for example, Theorem 8.1.1 in [28]). Consider the ellipsoid \({\mathcal {E}}' = \{z \in {\mathbb {R}}^n \mid z^{\top } D z \le 1\}\). Then, \(Q^{\top }x \in \partial {\mathcal {E}}'\). Thus, by Lemma 5.2, \(y' = Q^{\top }x + \alpha D Q^{\top }x\) is such that \(y' \in {\mathcal {E}}'\) and the distance from \(y'\) to the frontier of \({\mathcal {E}}'\) is \(\left\| y' - Q^{\top }x \right\| \). Since \(y' \in {\mathcal {E}}'\), it follows that \(Qy' \in {\mathcal {E}}\). Moreover,

$$\begin{aligned} Qy' = Q\left( Q^{\top }x + \alpha D Q^{\top }x\right) = x + \alpha Q D Q^{\top }x = x + \alpha Sx = y. \end{aligned}$$

Thus, \(y = Qy'\) and, therefore, \(y \in {\mathcal {E}}\). We also have

$$\begin{aligned}&d(y',\partial {\mathcal {E}}') = \min _{z \in \partial {\mathcal {E}}'} \left\| y' - z \right\| = \min _{z \in \partial {\mathcal {E}}'} \left\| Q(y' - z) \right\| = \min _{z \in \partial {\mathcal {E}}'} \left\| y - Qz \right\| \\&\quad = \min _{w \in \partial {\mathcal {E}}} \left\| y - w \right\| = d(y,\partial {\mathcal {E}}), \end{aligned}$$

where the second equality is valid since Q is orthogonal and the fourth equality holds since, for all \(z \in \partial {\mathcal {E}}'\), we have \(Qz \in \partial {\mathcal {E}}\) and, for all \(w \in \partial {\mathcal {E}}\), we have \(w = Q(Q^{\top }w)\) and \(Q^{\top }w \in \partial {\mathcal {E}}'\). Thus, \(d(y',\partial {\mathcal {E}}') = d(y,\partial {\mathcal {E}})\). Furthermore,

$$\begin{aligned} d(y',\partial {\mathcal {E}}') = \left\| y' - Q^{\top }x \right\| = \left\| Q\left( y' - Q^{\top }x\right) \right\| = \left\| Qy' - x \right\| = \left\| y - x \right\| . \end{aligned}$$

Hence, \(y \in {\mathcal {E}}\) and \(d(y,\partial {\mathcal {E}}) = \left\| y - x \right\| \). \(\square \)

We are now able to develop the model. Consider the ellipsoid \({\mathcal {C}}= \{x \in {\mathbb {R}}^n \mid x^{\top }P^{-1}x \le 1\}\), where P is a positive definite diagonal matrix. Consider also the ellipsoid \({\mathcal {E}}_i = \{ x \in {\mathbb {R}}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\}\), where \(c_i \in {\mathbb {R}}^n\), \(Q_i \in {\mathbb {R}}^{n \times n}\) is orthogonal and \(P_i \in {\mathbb {R}}^{n \times n}\) is a positive definite diagonal matrix. By applying transformation \(T_i\) defined in (4) to ellipsoid \({\mathcal {E}}_i\), we obtain the ball

$$\begin{aligned} {\mathcal {E}}_{ii} = \left\{ x \in {\mathbb {R}}^n \mid \left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) ^{\top }\left( x - P_i^{-\frac{1}{2}}Q_i^{\top }c_i\right) \le 1\right\} . \end{aligned}$$

By applying the same transformation \(T_i\) to ellipsoid \({\mathcal {C}}\), we obtain the ellipsoid

$$\begin{aligned} {\mathcal {C}}_i = \{ x \in {\mathbb {R}}^n \mid x^{\top }S_ix \le 1\}, \end{aligned}$$

where

$$\begin{aligned} S_{i} = P_i^{\frac{1}{2}}Q_i^{\top }P^{-1}Q_iP_i^{\frac{1}{2}}. \end{aligned}$$
(43)

We have that \({\mathcal {E}}_i \subseteq {\mathcal {C}}\) if and only if \({\mathcal {E}}_{ii} \subseteq {\mathcal {C}}_i\). In order to guarantee that \({\mathcal {E}}_{ii} \subseteq {\mathcal {C}}_i\), we require that the center \(c_{ii}\) of ball \({\mathcal {E}}_{ii}\) be in \({\mathcal {C}}_i\) and that the distance between \(c_{ii}\) and the frontier of \({\mathcal {C}}_i\) be at least one. By Proposition 5.1, if \(c_{ii} \in {\mathcal {C}}_i\) then there exist \(\bar{x}_{i} \in \partial {\mathcal {C}}_i\) and \(\alpha _i \in [-1/\lambda _{\max }(S_i), 0]\) such that

$$\begin{aligned} c_{ii} = \bar{x}_{i} + \alpha _{i}S_{i}\bar{x}_{i}. \end{aligned}$$
(44)

Moreover, by Proposition 5.2, any point of the form (44) belongs to ellipsoid \({\mathcal {C}}_i\) and the distance between \(c_{ii}\) and \(\partial {\mathcal {C}}_i\) is \(\left\| c_{ii} - \bar{x}_i \right\| \). Thus, since \(c_{ii} = P_i^{-\frac{1}{2}}Q_i^{\top }c_i\), we obtain the following model for the inclusion of ellipsoids into an ellipsoid.

$$\begin{aligned}&P_i^{-\frac{1}{2}}Q_i^{\top }c_i = \bar{x}_{i} + \alpha _{i}S_{i}\bar{x}_{i}, \qquad \forall i \in I \end{aligned}$$
(45)
$$\begin{aligned}&\bar{x}_{i}^{\top }S_{i}\bar{x}_{i} = 1, \qquad \forall i \in I \end{aligned}$$
(46)
$$\begin{aligned}&\left\| P_i^{-\frac{1}{2}}Q_i^{\top }c_i - \bar{x}_i \right\| ^2 \ge 1, \qquad \forall i \in I \end{aligned}$$
(47)
$$\begin{aligned}&\alpha _{i} \le 0, \qquad \forall i \in I \end{aligned}$$
(48)
$$\begin{aligned}&\alpha _{i} \ge -1/\lambda _{\max }(S_i), \qquad \forall i \in I. \end{aligned}$$
(49)

Consider a solution to the system (45)–(49). Notice that the value of \(\alpha _i\) must be strictly negative for each \(i \in I\). Otherwise, if \(\alpha _i = 0\) for some \(i \in I\), constraint (45) implies that \(c_{ii} = \bar{x}_{i}\) and, therefore, \(\left\| c_{ii} - \bar{x}_i \right\| = 0\), which violates constraint (47). Lemma 5.3 provides a negative upper bound on the value of \(\alpha _i\).

Lemma 5.3

Any solution to the system (45)–(49) is such that

$$\begin{aligned} \alpha _{i} \le - \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P) \lambda _{\min }\left( P^{-\frac{1}{2}}\right) < 0, \quad {\text { for each }} i \in I. \end{aligned}$$

Proof

Consider a solution to the system (45)–(49). Let \(i \in I\). By constraints (46), we have \(\bar{x}_{i}^{\top } S_{i} \bar{x}_{i} = 1\). Then, by Lemma 4.1 (taking \(x_{ij} \doteq \bar{x}_i\), \(P_j \doteq P\) and \(Q_j \doteq I_{n}\)), we have

$$\begin{aligned} \left\| S_{i}\bar{x}_{i} \right\| \le \lambda _{\max }(P_i) \lambda _{\max }\left( P^{-1}\right) \lambda _{\max }\left( P^{\frac{1}{2}}\right) \lambda _{\max }\left( P_i^{-\frac{1}{2}}\right) . \end{aligned}$$

By constraints (45) and (47), we have \(\alpha _{i}^2 \left\| S_{i}\bar{x}_{i} \right\| ^2 \ge 1\). Thus, we must have \(\left\| S_{i}\bar{x}_{i} \right\| > 0\) and, therefore, \(\alpha _{i}^2 \ge 1 / \left\| S_{i}\bar{x}_{i} \right\| ^2\). Consequently, since \(\alpha _i \le 0\) by constraints (48), we must have \(\alpha _i \le - 1 / \left\| S_{i}\bar{x}_{i} \right\| \). Hence, \(\alpha _i\) must satisfy

$$\begin{aligned} \alpha _i\le & {} - \left( \lambda _{\max }(P_i) \lambda _{\max }(P^{-1}) \lambda _{\max }\left( P^{\frac{1}{2}}\right) \lambda _{\max }\left( P_i^{-\frac{1}{2}}\right) \right) ^{-1}\\= & {} - \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P) \lambda _{\min }\left( P^{-\frac{1}{2}}\right) . \end{aligned}$$

(Note that \(- \lambda _{\min }(P_i^{-1}) \lambda _{\min }(P_i^{\frac{1}{2}}) \lambda _{\min }(P) \lambda _{\min }(P^{-\frac{1}{2}}) < 0\) since \(P_i\) and P are positive definite matrices.) \(\square \)

Thus, by Lemma 5.3, the following is a valid constraint.

$$\begin{aligned} \alpha _{i} \le - \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P) \lambda _{\min }\left( P^{-\frac{1}{2}}\right) , \quad \forall i \in I. \end{aligned}$$
(50)

If the ellipsoidal container is fixed, then the right-hand side of (50) is a constant for each \(i \in I\). In this case, constraints (50) are bound constraints and they may replace constraints (48). On the other hand, if the ellipsoidal container is not fixed (for example, the volume of the container could be minimized), then \(\lambda _{\min }(P)\) is not a constant and, therefore, the right-hand side of (50) is not constant either. In any case, if global optimization techniques are employed to solve the problem, then constraints (50) could be useful in order to reduce the search space.
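
The eigenvalue identity used in the proof of Lemma 5.3 (the negative reciprocal of a product of largest eigenvalues equals minus the product of the corresponding smallest eigenvalues of the inverses) can be sanity-checked numerically for diagonal data; the matrices below are hypothetical:

```python
import math

# Hypothetical diagonal data: container P = diag(9, 4), ellipsoid P_i = diag(1, 1/4).
P, Pi = (9.0, 4.0), (1.0, 0.25)

# Right-hand side of (50):
#   -lambda_min(P_i^-1) lambda_min(P_i^1/2) lambda_min(P) lambda_min(P^-1/2).
bound = -(1.0 / max(Pi)) * math.sqrt(min(Pi)) * min(P) * (1.0 / math.sqrt(max(P)))

# Negative reciprocal of the product of largest eigenvalues from the proof:
#   -(lambda_max(P_i) lambda_max(P^-1) lambda_max(P^1/2) lambda_max(P_i^-1/2))^-1.
alt = -1.0 / (max(Pi) * (1.0 / min(P)) * math.sqrt(max(P))
              * (1.0 / math.sqrt(min(Pi))))

print(bound, alt)   # both equal -2/3 for this data; negative, as Lemma 5.3 states
```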

5.1.1 Computing the largest eigenvalue of \(S_i\)

The i-th constraint in (49) depends on the largest eigenvalue of matrix \(S_i\) defined in (43). Thus, we must know how to compute it. Firstly, we consider the particular two-dimensional case. Next, we consider the problem in \({\mathbb {R}}^n\) where the container is a ball. Finally, we consider the general case in \({\mathbb {R}}^n\) where the container is an arbitrary ellipsoid (centered at the origin).

Let \(i \in I\). Consider the two-dimensional problem and suppose that \(a_i\) and \(b_i\) are the eigenvalues of \(P_i^{\frac{1}{2}}\), and a and b are the eigenvalues of \(P^{\frac{1}{2}}\). In this case, if we represent the rotation matrix \(Q_i\) as in (1), the largest eigenvalue of \(S_i\) will be given by

$$\begin{aligned} \lambda _{\max }(S_i) = \frac{\delta _i + \sqrt{\beta _i}}{4 a^2 b^2}, \end{aligned}$$

where

$$\begin{aligned} \delta _i = \left( a^2 + b^2\right) \left( a_i^2 + b_i^2\right) - \left( a^2 - b^2\right) \left( a_i^2 - b_i^2\right) \cos \left( 2 \theta _i\right) \end{aligned}$$

and

$$\begin{aligned} \beta _i = \delta _i^2 - (4a b a_i b_i)^2. \end{aligned}$$

Constraint \(\alpha _{i} \ge -1/\lambda _{\max }(S_i)\) is therefore equivalent to constraint

$$\begin{aligned} \alpha _i \ge - \frac{4a^2b^2}{\delta _i + \sqrt{\beta _i}}, \end{aligned}$$

which in turn is equivalent to constraint

$$\begin{aligned} \alpha _i \sqrt{\beta _i} \ge - (4a^2b^2 + \alpha _i \delta _i). \end{aligned}$$
(51)

By constraints (48), \(\alpha _i\) must be nonpositive. Then, we must have \(\alpha _i \sqrt{\beta _i} \le 0\) and, therefore, \(4a^2b^2 + \alpha _i \delta _i \ge 0\). In this way, constraint (51) is equivalent to constraints

$$\begin{aligned} (4a^2b^2 + \alpha _i \delta _i)^2 - \alpha _i^2 \beta _i&\ge 0 \end{aligned}$$
(52)
$$\begin{aligned} 4a^2b^2 + \alpha _i \delta _i&\ge 0. \end{aligned}$$
(53)

The function that defines constraint (51) is not everywhere differentiable in the domain of the variables of the model, whereas the functions that define constraints (52) and (53) are continuous and differentiable. For our purposes, the latter constraints are therefore more suitable, since we are interested in solving the ellipsoid packing problem in practice and, for this, we will use methods that exploit the derivatives of the functions that define the problem.
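
The closed-form expression for \(\lambda _{\max }(S_i)\) can be checked against a direct eigenvalue computation of the \(2 \times 2\) matrix \(S_i\) defined in (43). In the sketch below we assume that \(Q_i\) is the counterclockwise rotation by \(\theta _i\) (our reading of representation (1)); the numerical data are hypothetical.

```python
import math

def lam_max_closed_form(a, b, ai, bi, theta):
    """Closed-form largest eigenvalue of S_i in the two-dimensional case."""
    delta = (a**2 + b**2) * (ai**2 + bi**2) \
        - (a**2 - b**2) * (ai**2 - bi**2) * math.cos(2 * theta)
    beta = delta**2 - (4 * a * b * ai * bi) ** 2
    return (delta + math.sqrt(beta)) / (4 * a**2 * b**2)

def lam_max_direct(a, b, ai, bi, theta):
    """Largest eigenvalue of S_i = P_i^(1/2) Q_i^T P^(-1) Q_i P_i^(1/2),
    computed from the trace and determinant of the 2x2 matrix."""
    c, s = math.cos(theta), math.sin(theta)
    p, q = 1.0 / a**2, 1.0 / b**2              # diagonal entries of P^(-1)
    # M = Q_i^T P^(-1) Q_i for the counterclockwise rotation Q_i:
    m11 = p * c**2 + q * s**2
    m22 = p * s**2 + q * c**2
    m12 = c * s * (q - p)
    # S_i = diag(ai, bi) M diag(ai, bi):
    s11, s22, s12 = ai**2 * m11, bi**2 * m22, ai * bi * m12
    tr, det = s11 + s22, s11 * s22 - s12**2
    return 0.5 * (tr + math.sqrt(tr**2 - 4 * det))

# Hypothetical data: container semi-axes a, b; ellipse semi-axes ai, bi; angle.
vals = (3.0, 2.0, 1.0, 0.5, 0.9)
print(lam_max_closed_form(*vals), lam_max_direct(*vals))   # should coincide
```

A useful consistency check is the ball container \(a = b = r\), for which the formula must reduce to \(\max \{a_i^2, b_i^2\}/r^2\).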

Now, consider the problem in \({\mathbb {R}}^n\) and suppose that the container is a ball with radius \(r > 0\). In this case, we have \(P = r^2I_{n}\) and thus

$$\begin{aligned} S_{i} = P_i^{\frac{1}{2}}Q_i^{\top }P^{-1}Q_iP_i^{\frac{1}{2}} = r^{-2} P_i^{\frac{1}{2}}Q_i^{\top }Q_iP_i^{\frac{1}{2}} = r^{-2} P_i. \end{aligned}$$

Then, \(\lambda _{\max }(S_i) = r^{-2}\lambda _{\max }(P_i)\) and the largest eigenvalue of \(P_i\) is simply the largest element of the diagonal of \(P_i\).

Finally, consider the problem in \({\mathbb {R}}^n\) where the container is an ellipsoid centered at the origin. Since \(S_i\) is nonsingular, we have \(\lambda _{\min }(S_i^{-1}) = 1 / \lambda _{\max }(S_i)\). Then, the problem of computing the largest eigenvalue of matrix \(S_i\) is reduced to the problem of computing the least eigenvalue of matrix \(S_i^{-1}\). Consider the system of equations

$$\begin{aligned}&S_i^{-1} v_i = \lambda _i v_i \end{aligned}$$
(54)
$$\begin{aligned}&v_i^{\top }v_i = 1 \end{aligned}$$
(55)
$$\begin{aligned}&(S_i^{-1} - \lambda _i I_{n}) = B_i^{\top } B_i, \end{aligned}$$
(56)

where the variables are \(\lambda _i \in {\mathbb {R}}\), \(v_i \in {\mathbb {R}}^n\) and \(B_i \in {\mathbb {R}}^{n \times n}\). Equations (54) and (55) are satisfied if and only if \(v_i\) is an eigenvector of \(S_i^{-1}\) and \(\lambda _i\) is the eigenvalue associated with \(v_i\). Equation (56) is satisfied if and only if matrix \(S_i^{-1} - \lambda _i I_{n}\) is positive semidefinite. (A matrix A is positive semidefinite if and only if there exists a matrix B such that \(A = B^{\top }B\). See, for example, page 566 in Meyer [42]). Since \(S_i^{-1}\) is positive definite, matrix \(S_i^{-1} - \lambda _i I_{n}\) is positive semidefinite if and only if \(\lambda _i \in [0, \lambda _{\min }(S_i^{-1})]\). Since Eqs. (54) and (55) imply that \(\lambda _i\) is an eigenvalue of \(S_i^{-1}\), Eq. (56) is satisfied if and only if \(\lambda _i = \lambda _{\min }(S_i^{-1})\). Therefore, in the n-dimensional case, the i-th constraint in (49) of the model (45)–(49) must be replaced by constraints

$$\begin{aligned} \alpha _{i}&\ge -\lambda _i\nonumber \\ S_i^{-1} v_i&= \lambda _i v_i\nonumber \\ v_i^{\top }v_i&= 1\nonumber \\ (S_i^{-1} - \lambda _i I_{n})&= B_i^{\top }B_i, \end{aligned}$$
(57)

incorporating the variables \(\lambda _i\), \(v_i\) and \(B_i\) into the model.
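
Equation (56) encodes positive semidefiniteness of \(S_i^{-1} - \lambda _i I_n\) through the factorization \(B_i^{\top }B_i\). The characterization it relies on, namely that the shifted matrix is positive semidefinite exactly when \(\lambda _i \le \lambda _{\min }(S_i^{-1})\), can be illustrated in the \(2 \times 2\) case, where positive semidefiniteness reduces to nonnegativity of both diagonal entries and of the determinant. The matrix below is hypothetical:

```python
def is_psd_2x2(A, tol=1e-12):
    """A symmetric 2x2 matrix is positive semidefinite iff both diagonal
    entries and the determinant are nonnegative."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= -tol

# Hypothetical S_i^(-1) with eigenvalues 1 and 3:
# S_inv = [[2, 1], [1, 2]] (diag(1, 3) rotated by 45 degrees).
S_inv = [[2.0, 1.0], [1.0, 2.0]]
lam_min = 1.0

for lam in (0.0, 0.5, 1.0, 1.5, 3.0):
    shifted = [[S_inv[0][0] - lam, S_inv[0][1]],
               [S_inv[1][0], S_inv[1][1] - lam]]
    # PSD exactly when lam <= lam_min, matching the role of Eq. (56):
    print(lam, is_psd_2x2(shifted), lam <= lam_min)
```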

5.2 Ellipsoid inside a half-space

In this section, we propose a model to include an ellipsoid \({\mathcal {E}}_i\) into a half-space \({\mathcal {H}}\). A transformation is applied to the ellipsoid \({\mathcal {E}}_i\) which converts it into a ball \({\mathcal {E}}_{ii}\) and the same transformation is applied to the half-space \({\mathcal {H}}\), thus obtaining a half-space \({\mathcal {H}}_i\). Next, we model the inclusion of \({\mathcal {E}}_{ii}\) into \({\mathcal {H}}_i\) and observe that \({\mathcal {E}}_i\) is contained in \({\mathcal {H}}\) if and only if \({\mathcal {E}}_{ii}\) is contained in \({\mathcal {H}}_i\).

Consider the half-space \({\mathcal {H}}= \{x \in {\mathbb {R}}^n \mid w^{\top }x \le s\}\), where \(w \in {\mathbb {R}}^n\), \(w \ne 0\), and \(s \in {\mathbb {R}}\), and the ellipsoid \({\mathcal {E}}_i = \{ x \in {\mathbb {R}}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\}\), where \(c_i \in {\mathbb {R}}^n\), \(Q_i \in {\mathbb {R}}^{n \times n}\) is orthogonal and \(P_i \in {\mathbb {R}}^{n \times n}\) is positive definite and diagonal. Let \({\mathcal {H}}_i\) be the set obtained when transformation \(T_i\) defined in (4) is applied to the half-space \({\mathcal {H}}\), i.e.,

$$\begin{aligned} {\mathcal {H}}_i&= \left\{ x \in {\mathbb {R}}^n \mid x = T_i(z), z \in {\mathcal {H}}\right\} = \left\{ x \in {\mathbb {R}}^n \mid x = P_i^{-\frac{1}{2}} Q_i^{\top }z, z \in {\mathcal {H}}\right\} \\&= \left\{ x \in {\mathbb {R}}^n \mid z = Q_iP_i^{\frac{1}{2}}x, z \in {\mathcal {H}}\right\} =\left\{ x \in {\mathbb {R}}^n \mid w^{\top }Q_iP_i^{\frac{1}{2}}x \le s\right\} . \end{aligned}$$

We have that \({\mathcal {E}}_i \subseteq {\mathcal {H}}\) if and only if \({\mathcal {E}}_{ii} \subseteq {\mathcal {H}}_i\). Thus, in order to guarantee that ellipsoid \({\mathcal {E}}_i\) is contained in the half-space \({\mathcal {H}}\), we require that ball \({\mathcal {E}}_{ii}\) be contained in the half-space \({\mathcal {H}}_i\), i.e., the center \(c_{ii}\) of ball \({\mathcal {E}}_{ii}\) must belong to \({\mathcal {H}}_i\) and the distance from \(c_{ii}\) to the frontier of the half-space \({\mathcal {H}}_i\) must be at least one.

The frontier of the half-space \({\mathcal {H}}_i\) is the hyperplane \(\partial {\mathcal {H}}_i = \{ x \in {\mathbb {R}}^n \mid w^{\top }Q_iP_i^{\frac{1}{2}}x = s \}\). Thus, the distance \(d(c_{ii}, \partial {\mathcal {H}}_i)\) from the point \(c_{ii}\) to the frontier of \({\mathcal {H}}_i\) is given by

$$\begin{aligned} d(c_{ii}, \partial {\mathcal {H}}_i) = \frac{|w^{\top }Q_iP_i^{\frac{1}{2}}c_{ii} - s|}{\left\| P_i^{\frac{1}{2}}Q_i^{\top }w \right\| }. \end{aligned}$$

Therefore, the conditions

$$\begin{aligned} \left( w^{\top }Q_iP_i^{\frac{1}{2}}c_{ii} - s\right) ^2 \ge \left\| P_i^{\frac{1}{2}}Q_i^{\top }w \right\| ^2 \quad {\text { and }} \quad w^{\top }Q_iP_i^{\frac{1}{2}}c_{ii} \le s \end{aligned}$$
(58)

are satisfied if and only if \({\mathcal {E}}_i \subseteq {\mathcal {H}}\). Recalling that \(c_{ii} = P_i^{-\frac{1}{2}}Q_i^{\top }c_i\), conditions (58) can also be written as

$$\begin{aligned} \left( w^{\top }c_i - s \right) ^2 \ge \left\| P_i^{\frac{1}{2}}Q_i^{\top }w \right\| ^2 \quad {\text { and }} \quad w^{\top }c_i \le s. \end{aligned}$$
(59)

6 Numerical experiments

In this section, we present a variety of numerical experiments that aim to illustrate the capabilities and limitations of the introduced models for packing ellipsoids. In a first set of experiments, we consider the problem tackled in [27] that consists in packing as many identical ellipses as possible within a given rectangle. In a second set of experiments, we deal with the problem approached in [38] that consists in, given a set of (not necessarily identical) ellipses, finding the rectangle with the smallest area within which the given set of ellipses can be packed. Finally, in a third set of experiments, we deal with the problem of packing three-dimensional ellipsoids within a sphere or cuboid, trying to minimize the volume of the container.

All considered two-dimensional models were coded in AMPL [26] (Modeling Language for Mathematical Programming), while the three-dimensional models were coded in Fortran 90. The experiments were run on a 2.4GHz Intel Core2 Quad Q6600 machine with 4.0GB of RAM and Ubuntu 12.10 (GNU/Linux 3.5.0-21-generic x86_64) operating system. As the nonlinear programming (NLP) solver, we have used Algencan [2, 13] version 3.0.0, which is available for download at the TANGO Project web page (http://www.ime.usp.br/~egbirgin/tango/). Algencan was compiled with GNU Fortran (GCC) 4.7.2 compiler with the -O3 optimization directive enabled.

Algencan is an augmented Lagrangian method for nonlinear programming that solves the bound-constrained augmented Lagrangian subproblems using Gencan [3, 11, 12], an active-set method for bound-constrained minimization. Gencan adopts the leaving-face criterion described in [11], which employs the spectral projected gradients defined in [15, 16]. For the internal-to-the-face minimization, Gencan uses an unconstrained algorithm that depends on the dimension of the problem and the availability of second-order derivatives. For small problems with available Hessians, a Newtonian trust-region approach is used (see [3]), while for medium- and large-sized problems with available Hessians, a Newtonian line-search method that combines backtracking and extrapolation is used (this is the case of the two-dimensional problems presented in the current section, which, since they were coded in AMPL, have second-order derivatives available). When second-order derivatives are not available, each step of Gencan computes the direction inside the face using a line-search truncated-Newton approach with incremental quotients to approximate the matrix-vector products and memoryless BFGS preconditioners (this is the case of the three-dimensional problems considered in the present section, which were coded in Fortran 90 and for which only first-order derivatives were coded).

Although Algencan is a local nonlinear programming solver, it was designed in such a way that global minimizers of subproblems are actively pursued, independently of the fulfillment of approximate stationarity conditions in the subproblems. In other words, Algencan's subproblem solvers always try to find the lowest possible function values, even when this is not necessary for obtaining approximate local minimizers. As a consequence, the practical behavior of Algencan is usually well explained by the properties of its global-optimization counterparts [8]. The "preference for global minimizers" of Algencan has been discussed in [2]. This has also been observed in papers devoted to numerical experiments concerning Algencan and other solvers (see, for example, [31, 32] and the references therein). This does not mean at all that Algencan is able to find global minimizers. Moreover, in no case would it be able to prove that a global minimizer has been found. This simply means that, although unnecessary from the theoretical point of view, Algencan makes an effort to find good-quality local minimizers.

6.1 Two-dimensional packing

6.1.1 Packing the maximum number of ellipses within a rectangle

Given positive numbers L, W, a, and b, the problem considered in this section consists in computing the maximum number \(m^*\) of identical ellipses with semi-axis lengths a and b that can be packed within a rectangle with length L and width W. To illustrate the capabilities of the introduced models, we have considered a very simple strategy for packing the maximum possible number of identical ellipses into a given rectangle. The algorithm iteratively packs an increasing number of ellipses into the rectangle. At the m-th iteration, the algorithm tries to pack m ellipses. If it successfully packs the m ellipses inside the rectangle, then the iteration is over and the next one begins. If it cannot pack the m ellipses, a packing with \(m^* = m-1\) ellipses is returned and the algorithm terminates.

In order to pack m ellipses, a feasibility problem must be solved. This feasibility problem consists of the non-overlapping constraints (21)–(24) or, alternatively, the non-overlapping constraints (25)–(30), plus the fitting constraints that require the ellipses to be inside the rectangle. In (21)–(24) or (25)–(30), we have that \(P_i \in {\mathbb {R}}^{2 \times 2}\), for \(i=1,\dots ,m\), is the diagonal matrix with diagonal entries \(a^2\) and \(b^2\); while \(\epsilon _{ij}\) (\(i,j\in \{1,\dots ,m\}\) such that \(i < j\)) is given by (20). The inclusion of an ellipse within the rectangle is obtained by requiring the ellipse to be contained in four half-spaces (each one associated with a particular side of the rectangle) as modeled in (59). Hence, considering that the rectangle with length L and width W is centered at the origin and has sides parallel to the Cartesian axes, the fitting constraints are given by

$$\begin{aligned} \left( w^{\top }_{\ell }c_i - s_{\ell } \right) ^2 \ge \left\| P_i^{\frac{1}{2}}Q_i^{\top }w_{\ell } \right\| ^2 \quad {\text { and }} \quad w^{\top }_{\ell }c_i \le s_{\ell } \quad \text{ for } \, i=1,\dots ,m, \; \quad \ell =1,\dots ,4, \end{aligned}$$
(60)

where

$$\begin{aligned} w_1 = -w_2 = (1,0)^{\top }, \quad w_3 = -w_4 = (0,1)^{\top }, \quad s_1 = s_2 = \frac{L}{2} \quad {\text { and }} \quad s_3 = s_4 = \frac{W}{2}. \end{aligned}$$

From now on, the feasibility problem with m ellipses and that uses the non-overlapping constraints (21)–(24) plus the fitting constraints (60) will be named \(\mathcal{{F}}^m_1\); while the one that uses the non-overlapping constraints (25)–(30) plus the fitting constraints (60) will be named \(\mathcal{{F}}^m_2\). The model \(\mathcal{{F}}^m_1\) has \(m(5m + 11)/2\) constraints and \(3m(m+1)/2\) variables; while the model \(\mathcal{{F}}^m_2\) has \(m(3m + 5)\) constraints and \(3m^2\) variables.

Since the feasibility problems \(\mathcal{{F}}^m_1\) and \(\mathcal{{F}}^m_2\) are non-convex and their numerical resolution can be a very hard task, we apply a multi-start strategy. We define a maximum number \(N_{{\text {att}}}\) of attempts to solve each problem launching the local NLP solver Algencan from different initial points. If the problem is successfully solved, a packing with m ellipses is found. In this case, m is incremented and the algorithm continues. Otherwise, if the maximum number of attempts has been reached, then the algorithm stops, suggesting that a packing with m ellipses is not possible, and a packing with \(m^* = m-1\) ellipses is returned.

The algorithm starts with \(m = 1\) and increases m by one at each iteration. It is important to mention that most of the computational effort is spent when solving the problem with \(m = m^*\) and trying to solve the problem with \(m = m^* + 1\). See, for example, [10, 14, 17] where exhaustive numerical experiments support this claim.

When trying to pack m ellipses, the first initial point is constructed as follows. First, \(m-1\) ellipses are arranged as in the solution for the problem with \(m-1\) ellipses, and the m-th ellipse is randomly placed in the rectangle (its center is chosen uniformly at random inside the rectangle and its rotation angle is chosen uniformly at random in the interval \([0,\pi ]\)). For each subsequent attempt at packing m ellipses, the initial point is given by a small perturbation of the solution returned by the local NLP solver in the previous unsuccessful attempt. We considered a maximum of \(N_{{\text {att}}} = 100\) attempts to solve each subproblem for a fixed value of m. Also, we considered a total CPU time limit of 5 hours to solve all subproblems for increasing values of m.
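The initial-point construction can be sketched as below; the function names and the perturbation scale are illustrative assumptions, not the paper's exact choices. A configuration is represented as an array with one row \((c_x, c_y, \theta)\) per ellipse.

```python
import numpy as np

# Sketch of the two initial-point rules described above (illustrative names;
# the perturbation scale is an assumption, the paper only says "small").
def first_initial_point(prev_solution, L, W, rng):
    """Keep the m-1 previously packed ellipses; place the new one uniformly
    at random in the rectangle, with a random angle in [0, pi]."""
    new = [rng.uniform(-L / 2, L / 2), rng.uniform(-W / 2, W / 2),
           rng.uniform(0.0, np.pi)]
    if prev_solution is None:             # m = 1: no previous solution exists
        return np.array([new])
    return np.vstack([prev_solution, new])

def perturbed_initial_point(last_attempt, rng, scale=0.05):
    """Small random perturbation of the previous unsuccessful attempt."""
    return last_attempt + rng.normal(0.0, scale, size=last_attempt.shape)
```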

Table 1 Results obtained for the instances proposed in [27]

Table 1 shows the results obtained by applying the described strategy combined with models \({\mathcal {F}}_1\) and \({\mathcal {F}}_2\) to the six instances considered in [27], each one defined by a rectangle with length 6 and width 3 and identical ellipses with eccentricity 0.74536. In the table, the first column gives the instance name and the second column shows the lengths of the semi-axes of the identical ellipses. The third column shows the number of ellipses packed by the method proposed in [27]. The fourth and fifth columns present, for models \(\mathcal{{F}}_1\) and \(\mathcal{{F}}_2\), respectively, the number of packed ellipses and the total CPU time spent (in seconds). As expected, the strategy of solving models \({\mathcal {F}}_1^m\) and \({\mathcal {F}}_2^m\) for increasing values of m was able to find better solutions than the ones found by the method proposed in [27], since our models do not impose constraints on the rotation angles of the ellipses (the method proposed in [27] considers only 90-degree rotations). It is worth noting that this set of experiments suggests that model \({\mathcal {F}}_1\) delivered solutions faster, and it was even able to find a better-quality solution (within the considered CPU time limit of 5 hours and the maximum number of attempts) for instance GL6. Figure 3 shows the graphical representation of the solutions found with model \({\mathcal {F}}_1\).

Fig. 3 Solutions found by model \({\mathcal {F}}_1\) for the instances proposed in [27]

6.1.2 Minimizing the area of the container

In this section, we first consider the problem of packing a given set of m (identical or non-identical) ellipses with semi-axis lengths \(a_i\) and \(b_i\) (for \(i \in \{1,\dots ,m\}\)) within a rectangular container of minimum area. This problem can be modeled as the nonlinear programming problem that consists in minimizing the product of the variable length L and width W of the rectangular container (centered at the origin and with its sides parallel to the Cartesian axes) subject to the non-overlapping constraints (21)–(24) plus the fitting constraints (60), where \(P_i \in {\mathbb {R}}^{2 \times 2}\) is the diagonal matrix with entries \(a_i^2\) and \(b_i^2\) for \(i \in \{1,\dots ,m\}\) and \(\epsilon _{ij}\) is given by (20) for \(i,j\in \{1,\dots ,m\}\) such that \(i < j\). This NLP problem will be named \(\mathcal{{M}}\) from now on. The model \(\mathcal{{M}}\) has \(m(5m + 11)/2\) constraints and \(3m(m+1)/2 + 2\) variables.
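To convey the structure of model \(\mathcal{M}\) (a product objective \(L \cdot W\) subject to fitting and pairwise non-overlapping constraints), the following sketch solves a drastically simplified stand-in in which the ellipses are replaced by identical circles of radius r, so that the paper's ellipse constraints (21)–(24) reduce to plain distance constraints; `scipy`'s SLSQP stands in for Algencan. This is an illustration of the model's shape, not the paper's method.

```python
import numpy as np
from scipy.optimize import minimize

# Simplified stand-in for model M: minimize the container area L * W for m
# identical circles of radius r. Variables: (x1, y1, ..., xm, ym, L, W).
def min_area_rectangle(m, r, x0):
    def area(v):
        return v[-2] * v[-1]

    cons = []
    for i in range(m):
        # fitting: circle i lies inside the L x W rectangle centered at 0
        cons.append({'type': 'ineq', 'fun': lambda v, i=i: v[-2] / 2 - r - v[2 * i]})
        cons.append({'type': 'ineq', 'fun': lambda v, i=i: v[-2] / 2 - r + v[2 * i]})
        cons.append({'type': 'ineq', 'fun': lambda v, i=i: v[-1] / 2 - r - v[2 * i + 1]})
        cons.append({'type': 'ineq', 'fun': lambda v, i=i: v[-1] / 2 - r + v[2 * i + 1]})
        for j in range(i + 1, m):
            # non-overlapping: centers at distance at least 2r
            cons.append({'type': 'ineq', 'fun': lambda v, i=i, j=j:
                         (v[2 * i] - v[2 * j]) ** 2
                         + (v[2 * i + 1] - v[2 * j + 1]) ** 2 - 4 * r * r})
    return minimize(area, x0, method='SLSQP', constraints=cons)
```

For two unit circles started side by side, a local solve of this stand-in approaches the global optimum \(L = 4\), \(W = 2\) (area 8); the real model \(\mathcal{M}\) replaces the distance constraints with (21)–(24) and adds the rotation variables.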

Since problem \(\mathcal{{M}}\) is a very hard non-convex nonlinear programming problem, we consider, once again, a multi-start strategy in order to obtain the best possible local solution using the NLP solver Algencan. The algorithm stops when either the number of attempts to solve the problem reaches \(N_{\text {att}} = 1000\) or 5 hours of CPU time are spent.

For instances with identical ellipses with semi-axis lengths a and b, the initial point is given as follows. The centers of the ellipses are arranged in a regular lattice where the distance between consecutive points is \(2\max \{a,b\}\). The rotation angle of each ellipse is chosen uniformly at random in the interval \([0,\pi ]\). The initial guesses for the length and width of the container are then chosen so that the container contains all the ellipses. In the case of instances with non-identical ellipses, the lattice is constructed so that the ellipses do not overlap when their centers are arranged in the lattice. Moreover, the order in which the ellipses are arranged in the lattice is random.
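A minimal sketch of this lattice construction for identical ellipses follows; the square grid shape and the container guess are illustrative choices of ours, not the paper's.

```python
import numpy as np

# Sketch of the lattice initial point for m identical ellipses with semi-axes
# a and b: centers on a regular grid with spacing 2*max(a, b), so no two
# ellipses can overlap regardless of their (random) rotation angles.
def lattice_initial_point(m, a, b, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    d = 2.0 * max(a, b)
    k = int(np.ceil(np.sqrt(m)))                 # a k x k grid holds m centers
    centers = np.array([(d * (i % k), d * (i // k)) for i in range(m)])
    thetas = rng.uniform(0.0, np.pi, size=m)     # random rotation angles
    # container guess large enough to contain every ellipse on the grid
    L = centers[:, 0].max() + 2 * d
    W = centers[:, 1].max() + 2 * d
    return centers, thetas, L, W
```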

Table 2 Instances with non-identical ellipses considered in [38]
Table 3 Instances with identical ellipses considered in [38]

In a first set of experiments, we considered the three sets of instances introduced in [38] for the problem of packing a given set of identical or non-identical ellipses within a rectangular container of minimum area. The first set includes 15 instances with non-identical ellipses; the second set includes 14 instances with identical ellipses; and the third set includes 15 small instances with 3 non-identical ellipses of increasing eccentricity. Tables 2, 3 and 4 show the results. The first column presents the names of the instances and the second column shows the number m of ellipses. The third column shows the area of the container found by the method proposed in [38]. A subset of the instances in Table 2 was also considered in [49]; therefore, the third column in Table 2 also shows, when applicable, the area of the container found by the method proposed in [49]. The fourth column shows the area of the container found by our method. The area is rounded to 5 decimal places (results up to machine precision can be found at http://www.ime.usp.br/~lobato/). The fifth column shows the number of attempts made to find the solution. The last column shows the average CPU time (in seconds) per local minimization. As can be seen, our method was able to find solutions at least as good as the ones presented in [38]. Moreover, for 20 instances, our method found better solutions (marked with * in Tables 2 and 3) than the ones reported in [38]. In Table 2 it is also possible to see that, considering the 9 instances to which the methodology proposed in [49] was applied, our method found better-quality solutions for 7 instances (TC05b, TC06, TC11, TC14, TC20, TC50, and TC100), a solution of the same quality for one instance (TC05a), and a poorer-quality solution for only one instance (TC30). Figure 4 illustrates the solutions obtained for the three largest instances TC30, TC50, and TC100.

Table 4 Instances with three non-identical ellipses considered in [38]
Fig. 4 Illustrations of the solutions found for the instances TC30, TC50, and TC100

To end this section, we consider the problem of packing a given set of ellipses inside an ellipse with minimum area. This problem can be modeled as the problem of minimizing the product ab of the variable semi-axis lengths a and b of the elliptical container subject to the non-overlapping constraints (21)–(24) plus the fitting constraints (45)–(48) and (52)–(53). This model has \(m(5m+1)/2\) constraints and \(3m(m+3)/2 + 2\) variables. We have considered only one instance where the ellipses to be packed have semi-axis lengths 2 and 1. Figure 5 illustrates the solution found by a single run of Algencan. This solution was found in 1h56m14s. The container has semi-axis lengths 19.136912 and 12.050124.

Fig. 5 100 ellipses with semi-axis lengths 2 and 1 inside an ellipse of minimum area with semi-axis lengths 19.136912 and 12.050124

Table 5 Results for the three-dimensional problem of minimizing the volume of the container (sphere or cuboid) for an increasing number of ellipsoids \(m \in \{ 10, 20, \dots , 100 \}\)
Fig. 6 Illustration of the solutions obtained for the problem of packing (a, b) 90 and 100 ellipsoids, respectively, within a sphere of minimum volume and (c, d) 90 and 100 ellipsoids, respectively, within a cuboid of minimum volume

In all the experiments described in the present and the previous subsection, the local solver Algencan was run with its default parameters, while the feasibility and optimality tolerances \(\varepsilon _{\text {feas}}\) and \(\varepsilon _{\text {opt}}\) (parameters that must be provided by the user) were both set to \(10^{-8}\). These tolerances, related to the stopping criteria, are used to determine whether a solution to the optimization problem being solved has been found; see [13, pp. 116–117] for details. For the packing problems considered in the present work, independently of the stopping criterion satisfied by the optimizer, the accuracy of the delivered solution is relevant information in terms of (a) the fitting constraints and (b) the maximum overlap between the ellipses being packed. Regarding the fitting constraints, once the multi-start process determines that a solution has been found with tolerances \(\varepsilon _{\text {feas}}=\varepsilon _{\text {opt}}=10^{-8}\), the optimization process is resumed with tighter tolerances in order to achieve a precision of the order of \(10^{-14}\) in the sup-norm of the fitting constraints (60). Regarding the overlap between the ellipses, in order to deliver a measure that is independent of the model being solved, the approach introduced in [30] was adopted; in particular, its C/C++ implementation (freely available at https://github.com/chraibi/EEOver) was used. This exact method computes the intersection area between two ellipses and, for all the solutions reported here, the overlap between every pair of ellipses is always smaller than the machine precision \(10^{-16}\).
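For illustration only, a model-independent overlap measure can also be approximated by Monte Carlo sampling, as sketched below; the paper instead uses the exact ellipse-intersection method of [30], which is what the reported figures are based on.

```python
import numpy as np

def rot(t):
    """2x2 rotation matrix for angle t."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def overlap_area_mc(e1, e2, n=200000, seed=0):
    """Monte Carlo estimate of the intersection area of two ellipses, each
    given as (center, angle, semi-axis a, semi-axis b). Sketch only: sample
    uniformly inside ellipse 1, count the samples also inside ellipse 2."""
    rng = np.random.default_rng(seed)
    (c1, t1, a1, b1), (c2, t2, a2, b2) = e1, e2
    u = rng.uniform(-1.0, 1.0, size=(n, 2))
    u = u[(u ** 2).sum(axis=1) <= 1.0]                 # uniform in unit disk
    world = u @ np.diag([a1, b1]) @ rot(t1).T + np.asarray(c1, float)
    q = (world - np.asarray(c2, float)) @ rot(t2)      # ellipse-2 local frame
    hits = ((q[:, 0] / a2) ** 2 + (q[:, 1] / b2) ** 2 <= 1.0).sum()
    return np.pi * a1 * b1 * hits / len(u)
```

Such a sampling check is far too crude to certify overlaps below \(10^{-16}\), which is precisely why an exact intersection method is needed for the reported accuracy.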

6.2 Three-dimensional packing

In this section, we consider two problems of packing three-dimensional ellipsoids. The first problem is to pack a given set of m ellipsoids inside a sphere with minimum volume. It can be modeled as the problem of minimizing the radius r of the sphere subject to the non-overlapping constraints (21)–(24), where \(P_i \in {\mathbb {R}}^{3 \times 3}\) is the diagonal matrix whose entries are the squared lengths of the semi-axes of the i-th ellipsoid for \(i \in \{1,\dots ,m\}\) and \(\epsilon _{ij}\) is given by (20) for \(i,j\in \{1,\dots ,m\}\) such that \(i < j\), plus the fitting constraints (45)–(49), where \(S_i = r^{-2}P_i\) and \(\lambda _{\max }(S_i) = r^{-2}\lambda _{\max }(P_i)\) for each \(i \in \{1,\dots ,m\}\). This problem has \(3m^2 + 3m + 1\) constraints and \(2m^2 + 8m + 1\) variables.

The second problem is to pack a given set of m ellipsoids inside a cuboid with minimum volume. This problem can be modeled as the problem of minimizing the product of the variable length L, width W, and height H of the cuboid subject to the non-overlapping constraints (21)–(24) plus the fitting constraints. Assuming that the edges of the cuboid are parallel to the Cartesian axes, the fitting constraints are given by

$$\begin{aligned} \left( w^{\top }_{\ell }c_i - s_{\ell } \right) ^2 \ge \left\| P_i^{\frac{1}{2}}Q_i^{\top }w_{\ell } \right\| ^2 \quad {\text { and }} \quad w^{\top }_{\ell }c_i \le s_{\ell } \quad \text{ for } \, i=1,\dots ,m, \; \quad \ell =1,\dots ,6, \end{aligned}$$
(61)

where \(w_1 = -w_2 = (1,0,0)^{\top }\), \(w_3 = -w_4 = (0,1,0)^{\top }\), \(w_5 = -w_6 = (0,0,1)^{\top }\), and

$$\begin{aligned} s_1 = s_2 = \frac{L}{2}, \quad s_3 = s_4 = \frac{W}{2}, \quad {\text { and }} \quad s_5 = s_6 = \frac{H}{2}. \end{aligned}$$

This problem has \(3m^2 + 9m\) constraints and \(2m^2 + 4m + 3\) variables.

In our experiments, we have considered identical ellipsoids with semi-axis lengths 1, 0.75, and 0.5. Table 5 presents the results we obtained with a single run of the local solver Algencan applied to instances with \(m \in \{10, 20, \dots, 100\}\). In the table, the first column shows the number of ellipsoids. The second and third columns show the volume of the sphere found and the CPU time, respectively. The fourth and fifth columns show the volume of the cuboid found and the CPU time, respectively. Figure 6 illustrates a few arbitrarily selected solutions.

7 Concluding remarks

We proposed two continuous and differentiable nonlinear models for avoiding overlapping between ellipsoids. We also introduced models for including an ellipsoid inside an ellipsoid and inside a half-space. Since the non-overlapping models have numbers of variables and constraints that grow quadratically with the number of ellipsoids to be packed, the numerical resolution of these models turns out to be a very hard computational task. Numerical experiments suggest that a multi-start strategy combined with a local nonlinear programming solver can be applied to instances with up to 100 ellipsoids in order to obtain “solutions” within an affordable time. Two lines of future research are possible. On the one hand, simpler models with a smaller number of variables and constraints would simplify the optimization process. On the other hand, defining suitable bounds for all the variables of the introduced models is a simple task, and it would allow one to apply spatial branch-and-bound based global optimization solvers. The development of dedicated global optimization solvers applicable to the introduced models would allow the resolution of at least small instances of the hard packing problems considered in the present work.