1 Introduction

Ellipsoids’ packing techniques are important tools for several physical-chemistry and condensed matter physics applications (see [20,21,22,23,24, 35] and the references therein). In a robotic problem addressed in [18], a robot arm and other elements in a scene are approximated by three-dimensional ellipsoids and it allows the authors to explore the relation between the overlapping of ellipsoids and the overlapping of free-form objects. In [4, 20], simulation techniques are employed to study the properties of random packings of ellipsoids. In both works, rectangular boxes with periodic boundaries are considered in order to simulate the three-dimensional infinite space, and properties like density, orientation, and contacts among the ellipsoids are analyzed.

In the last few decades, many works [5, 7, 14,15,16, 19, 29, 41,42,43,44] addressed the problem of packing non-overlapping spheres in the n-dimensional space within a variety of fixed- or variable-dimensions containers of different shapes by means of nonlinear programming (NLP) models and methods. However, to the best of our knowledge, only eight very recent works [8, 25, 30,31,32,33, 38, 40] have considered the problem of packing non-overlapping ellipses, spheroids, or ellipsoids using nonlinear programming models and optimization techniques.

In 2013, the problem of packing as many identical ellipses as possible within a rectangular container was considered in [25]. By restricting the ellipses to have their semi-axes parallel to the Cartesian axes and allowing the centers of the ellipses to belong to a finite set of points in the plane (grid), the problem was modelled as a mixed-integer linear programming problem (MILP). As expected, only small instances of the MILP model could be solved to optimality within an affordable time.

In 2014, the problem of packing a given set of (non-necessarily identical) freely-rotated ellipses within a rectangle of minimum area was considered [31]. The non-overlapping between the ellipses was modelled using the idea of separating lines. State-of-the-art global optimization solvers were able to find solutions for instances with up to 14 ellipses. For instances with up to 100 ellipses, the authors presented solutions obtained by a constructive heuristic method. The same problem was addressed in [40]. The problem was modelled using “quasi-phi-functions” that is an extension of the phi-functions [17] extensively and successfully used to model a large variety of complicated packing problems. As well as in [31], the non-overlapping between ellipses was modelled based on the idea of separating lines. Models were tackled by a local optimization solver combined with ad hoc initial points and a multi-start strategy. Most of the solutions presented in [31] were improved in [40], where numerical experiments with additional instances with up to 120 ellipses were also shown.

In 2015, the work [30] extended the ideas presented for the two-dimensional case in [31] to deal with the three-dimensional problem of packing a given set of (non-necessarily identical) freely-rotated ellipsoids within a rectangular container of minimum volume. The idea of separating lines to model the non-overlapping between ellipses was naturally extended to separating planes to model the non-overlapping between ellipsoids. Resulting NLP models are non-convex and highly complex. Numerical experiments in [30], that considered instances with up to 100 ellipsoids, showed that state-of-the-art global optimization solvers were only able to deliver feasible solutions within an affordable prescribed CPU time limit. Heuristic methods were also proposed in [30]. Also in 2015, Pankratov et al. [38] extended the methods and methodology proposed in [40] from the two- to the three-dimensional case; but only spheroids, instead of arbitrary ellipsoids, were considered in the three-dimensional case. In that work, quasi-phi-functions were defined, NLP models proposed (based on separating planes), and solutions were delivered by applying a multi-start strategy associated with a local NLP solver. Illustrative numerical experiments in [38] describe solutions obtained for instances with up to 12 spheroids.

Still in 2015, continuous and differentiable NLP models for n-dimensional ellipsoids’ packing problems were proposed in [8]. The non-overlapping between ellipsoids was formulated in two different ways: (i) based on separating hyperplanes (and, in this sense, similarly to the already mentioned approaches) and (ii) based on linear transformations. In the latter case, the non-overlapping between a pair of ellipsoids reduces to the non-overlapping between a sphere and an ellipsoid. The solution to this simpler problem was inspired in the models proposed in [5] for packing circles within an ellipse. The non-overlapping models proposed in [8] were employed in the same work, as an illustration of their applicability, to tackle five variations of two- and three-dimensional ellipsoids’ packing problems: (a) packing the maximum number of identical ellipses with given semi-axis lengths within a rectangular container with given length and width; (b) finding the rectangular container with minimum area that can pack a given set of ellipses; (c) finding the elliptical container with minimum area that can pack a given set of identical ellipses; (d) finding the spherical container with minimum volume that can pack a given set of identical ellipsoids; and (e) finding the cuboid with minimum volume that can pack a given set of identical ellipsoids. In all cases, a multi-start strategy combined with a local nonlinear programming solver was employed with the aim of finding good quality solutions. Numerical experiments showed that, when applied to the instances of problem (b) presented in [31], the proposed models and methodology improved most of the solutions presented in [31, 40]. When applied to problem (e), models proposed in [8] can deal with instances with up to 100 ellipsoids in the sense that good quality solutions can be found within an affordable CPU time limit. 
On the other hand, only small-sized instances of the NLP models proposed in [30, 38] can be handled by off-the-shelf local or global nonlinear solvers.

In 2016, the problems of packing ellipses within minimizing-area circles and regular polygons were considered in [32, 33], respectively. The introduced models include variables classified as “primary variables”, that are the variables that define the ellipses’ positions (center and orientation), and “secondary variables”, that are auxiliary variables required to model the non-overlapping between the ellipses and the container fitting constraints. The later variables can be seen as “embedded Lagrange multipliers” since they are, in fact, Lagrange multipliers of underlying optimization subproblems related to the non-overlapping and fitting constraints (this idea is also present in the models proposed in [8]). In [33], illustrative numerical examples that consider the ellipses of the instances presented in [31] are given (all with up to 14 ellipses). In [32], instances with up to 20 ellipses are considered. It is worth noting that the authors mention that the application of non-dedicated deterministic global optimization strategies could become prohibitively expensive for larger instances and that heuristic approaches might be useful in such case.

Nonlinear programming models introduced in [8, 30,31,32,33, 38, 40] share the following property: If m ellipsoids are considered, the cost of evaluating the constraints that define the models is \(O(m^2)\). This means that, independently of variations in the modelling process that may affect the difficulties that appear when a local or global solver is employed, they all share the quadratic complexity in the number of ellipsoids. This inhibits their applicability to large-sized instances of any kind of ellipsoids’ packing problem. For the two-dimensional case, this difficulty was addressed in [40] by means of an heuristic approach that solves a sequence of NLP problems with expected O(m) variables and constraints. Starting from a given initial point, for each ellipse, a square that contains the ellipse is considered and an additional constraint that says that the ellipse must belong to the square is added to an underlying model with \(O(m^2)\) variables and constraints (based on quasi-phi-functions). Since a pair of ellipses that must belong to non-overlapping squares cannot overlap, its associated non-overlapping constraint and auxiliary variables can be deleted from the model. The resulting NLP model is solved and the process is repeated. Hopefully, all subproblems have a linear number of variables and constraints and the sequence of their solutions converges to a solution to the original problem. This strategy was also extended in [39] for tackling several three-dimensional related packing problems.

In the present work, we introduce non-overlapping models with an evaluation cost that is linear in the number of ellipsoids. As a consequence, models for a variety of ellipses’ and ellipsoids’ packing problems are introduced and numerical experiments with up to 1000 ellipsoids are delivered. To fix ideas, suppose that our problem consists of packing m different ellipsoids in a given container in \(\mathbb {R}^n\) without overlapping. Assume that, in principle, each ellipsoid is placed with its center in the origin. The natural variables of the problem turn out to be the displacements of the ellipsoids by means of which each ellipsoid is inside the container and intersections between (the interior of the) ellipsoids do not occur. Each displacement is a combination of a translation and a rotation. Since the translation is represented by n real parameters and the rotation is represented by \(n(n-1)/2\) rotation angles, it is natural to think that the problem has \(m [n + n(n-1)/2]\) variables. However, practical mathematical formulations usually involve \(O(m^2)\) unknowns. This is because there exists one non-overlapping requirement for each pair of ellipsoids and each non-overlapping requirement involves additional specific variables. For example, in order to verify whether two ellipsoids overlap, one finds the points in the ellipsoids that realize the minimum distance. These points are “specific” variables that we would like to eliminate. The main idea of this paper is to eliminate the additional \(O(m^2)\) variables by making them implicit by means of suitable geometrical transformations. Although the non-overlapping model we present has a linear number of variables and constraints, it is still a very-hard-to-solve nonlinear model. In order to find good quality solutions for the ellipsoids’ packing problem, we consider a multi-start method with ad hoc initial guesses.

The rest of this work is organized as follows. A formal definition of the problem is given in Sect. 2. The model introduced in [8] that formulates the non-overlapping problem in terms of its natural variables plus \(O(m^2)\) additional variables is briefly presented in Sect. 3. This model is then used to derive a new non-overlapping model with implicit variables in Sect. 4. The new model contains a linear number of variables and constraints on the number of ellipsoids to be packed. In Sect. 5, we show how to efficiently evaluate the constraints of the non-overlapping model. In Sect. 6, we briefly discuss the applicability of existent state-of-the-art deterministic global optimization tools to tackle packing problems based on the proposed model. In Sect. 7, we present some numerical experiments that show that the non-overlapping model introduced in the present work can be used to pack a large number of ellipsoids. Finally, we draw some conclusions in Sect. 8. The computer implementation of the models and methods introduced in the current work and the solutions reported in Sect. 7 are freely available at http://www.ime.usp.br/~lobato/.

2 Statement of the problem

An ellipsoid in \(\mathbb {R}^n\) is a set of the form \(\{ x \in \mathbb {R}^n \mid (x - c)^{\top }M^{-1} (x-c) \le 1\}\), where \(M \in \mathbb {R}^{n \times n}\) is a symmetric and positive definite matrix. The vector \(c \in \mathbb {R}^n\) is the center of the ellipsoid. The eigenvectors of \(M^{-1}\) determine the principal axes of the ellipsoid and the eigenvalues of \(M^{\frac{1}{2}}\) are the lengths of the semi-axes of the ellipsoid. In this work, we deal with the problem of packing ellipsoids in the n-dimensional space. This problem can be stated as follows. Let m be the number of ellipsoids to be packed. We are given the lengths \(l_i^1, \dots , l_i^n\) of the semi-axes of the i-th ellipsoid for each \(i \in \{1,\dots ,m\}\) and we are also given a set \(\mathcal {C}\subset \mathbb {R}^n\), which we call the container. The objective of the problem is to find ellipsoids \(\mathcal {E}_1, \dots , \mathcal {E}_m\) such that:

  1. 1.

    \(\mathcal {E}_i\) has semi-axes with lengths \(l_i^1, \dots , l_i^n\) for all \(i \in \{1,\dots ,m\}\);

  2. 2.

    \(\text {int}(\mathcal {E}_i) \cap \text {int}(\mathcal {E}_j) = \emptyset \), for each \(i,j \in \{1,\dots ,m\}\) with \(i \ne j\);

  3. 3.

    \(\mathcal {E}_i \subseteq \mathcal {C}\) for all \(i \in \{1,\dots ,m\}\).

The first constraint states that the ellipsoids must have the given lengths of the semi-axes. The second requirement is that the ellipsoids must not overlap each other, which means that the interiors of the ellipsoids must be mutually disjoint. The last constraint says that the ellipsoids must be inside the container. This is a feasibility problem where one must determine the center and rotation of each ellipsoid so that the constraints 1, 2, and 3 are satisfied. We also consider an optimization version of this problem in which the volume of the container must be minimized.

3 Transformation-based non-overlapping model

In [8], continuous and differentiable nonlinear programming models for packing ellipsoids within polyhedra and ellipsoids were introduced. Two models for the non-overlapping constraints were given. One of them is based on a linear transformation that, for every pair of ellipsoids, converts one of the ellipsoids into a ball. Since this model forms the basis for the non-overlapping model that will be introduced in this work, we briefly describe it below. For a more detailed explanation, see [8].

Let \(I = \{1,\dots ,m\}\) be the set of indices of the ellipsoids to be packed. For each \(i \in I\), we denote by \(P_i^{\frac{1}{2}}\) the \(n \times n\) diagonal matrix whose diagonal entries are the lengths of the semi-axes of the ellipsoid \(\mathcal {E}_i\). Then, for each \(i \in I\), we can represent the ellipsoid \(\mathcal {E}_i\) as

$$\begin{aligned} \mathcal {E}_i = \left\{ x \in \mathbb {R}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\right\} , \end{aligned}$$

where \(c_i \in \mathbb {R}^n\) is the center of the ellipsoid and \(Q_i \in \mathbb {R}^{n \times n}\) is an orthogonal matrix that determines the orientation of the ellipsoid. For example, for \(n = 2\), we can represent \(Q_i\) as

$$\begin{aligned} Q_i = \left( \begin{array}{ll} \cos \theta _i &{}\quad -\sin \theta _i \\ \sin \theta _i &{}\quad \phantom {-}\cos \theta _i \end{array} \right) , \end{aligned}$$
(1)

whereas, for \(n = 3\), we can represent \(Q_i\) as

$$\begin{aligned} Q_i = \left( \begin{array}{ccc} \cos \theta _i \cos \psi _i &{}\quad \sin \phi _i \sin \theta _i \cos \psi _i -\cos \phi _i \sin \psi _i &{}\quad \sin \phi _i \sin \psi _i + \cos \phi _i \sin \theta _i \cos \psi _i \\ \cos \theta _i \sin \psi _i &{}\quad \cos \phi _i \cos \psi _i + \sin \phi _i \sin \theta _i \sin \psi _i &{}\quad \cos \phi _i \sin \theta _i \sin \psi _i -\sin \phi _i \cos \psi _i\\ -\sin \theta _i &{}\quad \sin \phi _i \cos \theta _i &{}\quad \cos \phi _i \cos \theta _i \end{array} \right) . \end{aligned}$$
(2)

The parameters \(\theta _i\) (when \(n=2\)) and \(\theta _i\), \(\phi _i\), and \(\psi _i\) (when \(n = 3\)) are called “rotation angles” of the ellipsoid. Similar parametrizations can be made for \(n > 3\).

Let \(i,j \in I\) be such that \(i < j\) and consider the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\). Let \(T_{ij}: \mathbb {R}^n \rightarrow \mathbb {R}^n\) be the linear transformation defined by

$$\begin{aligned} T_{ij}(x) = P_i^{-\frac{1}{2}} Q_i^{\top }(x - c_j). \end{aligned}$$
(3)

Let \(\mathcal {E}_i^{ij}\) and \(\mathcal {E}_j^{ij}\) be the ellipsoids obtained when the transformation \(T_{ij}\) defined in (3) is applied to \(\mathcal {E}_i\) and \(\mathcal {E}_j\), respectively, i.e.,

$$\begin{aligned} \mathcal {E}_i^{ij} = \left\{ x \in \mathbb {R}^n \mid \left[ x - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] ^{\top }\left[ x - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] \le 1 \right\} \end{aligned}$$

and

$$\begin{aligned} \mathcal {E}_j^{ij} = \{ x \in \mathbb {R}^n \mid x^{\top } S_{ij}x \le 1\}, \end{aligned}$$

where

$$\begin{aligned} S_{ij} = P_i^{\frac{1}{2}}Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}}. \end{aligned}$$

Therefore, \(\mathcal {E}_i^{ij}\) is a unit-radius ball centered at \(P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\) and \(\mathcal {E}_j^{ij}\) is an ellipsoid centered at the origin (since \(S_{ij} = V_{ij}^{\top }V_{ij}\) with \(V_{ij} = P_j^{-\frac{1}{2}}Q_j^{\top }Q_iP_i^{\frac{1}{2}}\)). By Lemma 3.1 below (that is a restatement of Lemma 3.1 in [8]), we have that the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) overlap if and only if the ellipsoids \(\mathcal {E}_i^{ij}\) and \(\mathcal {E}_j^{ij}\) overlap.

Lemma 3.1

Consider the ellipsoids \(\mathcal {E}_i\), \(\mathcal {E}_j\), \(\mathcal {E}_i^{ij}\), and \(\mathcal {E}_j^{ij}\) defined above. Thus, the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) overlap if and only if the ellipsoids \(\mathcal {E}_i^{ij}\) and \(\mathcal {E}_j^{ij}\) overlap.

Proof

For any \(x \in \mathbb {R}^n\), we have

$$\begin{aligned} \begin{aligned} (x -c_i)^{\top }Q_iP_i^{-1} Q_i^{\top }(x - c_i)&= (x -c_i)^{\top }Q_iP_i^{-\frac{1}{2}} P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_i)\\&= (x -c_i)^{\top }\left( P_i^{-\frac{1}{2}}Q_i^{\top }\right) ^{\top } P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_i)\\&= \left[ (x-c_j) - (c_i-c_j)\right] ^{\top }\left( P_i^{-\frac{1}{2}}Q_i^{\top }\right) ^{\top }\\&\qquad P_i^{-\frac{1}{2}}Q_i^{\top }\left[ (x-c_j) - (c_i-c_j)\right] \\&= \left[ P_i^{-\frac{1}{2}} Q_i^{\top }(x-c_j) - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] ^{\top }\\&\quad \;\;\left[ P_i^{-\frac{1}{2}} Q_i^{\top }(x-c_j)- P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] \\&= \left[ T_{ij}(x) - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] ^{\top }\left[ T_{ij}(x) - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] . \end{aligned} \end{aligned}$$

Therefore, we have that \(x \in \text {int}(\mathcal {E}_i)\) if and only if \(T_{ij}(x) \in \text {int}(\mathcal {E}_i^{ij})\). Furthermore,

$$\begin{aligned} (x - c_j)^{\top }Q_jP^{-1}_jQ_j^{\top }(x - c_j)&= (x - c_j)^{\top }Q_iP_i^{-\frac{1}{2}} P_i^{\frac{1}{2}}Q_i^{\top }Q_jP^{-1}_jQ_j^{\top }Q_iP_i^{\frac{1}{2}} P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_j)\\&= (x - c_j)^{\top }Q_iP_i^{-\frac{1}{2}} S_{ij}P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_j)\\&= (x - c_j)^{\top }\left( P_i^{-\frac{1}{2}}Q_i^{\top }\right) ^{\top } S_{ij}P_i^{-\frac{1}{2}}Q_i^{\top }(x - c_j)\\&= \left[ P_i^{-\frac{1}{2}} Q_i^{\top }(x-c_j)\right] ^{\top } S_{ij}P_i^{-\frac{1}{2}} Q_i^{\top }(x - c_j)\\&= T_{ij}(x)^{\top }S_{ij}T_{ij}(x). \end{aligned}$$

Then, \(x \in \text {int}(\mathcal {E}_j)\) if and only if \(T_{ij}(x) \in \text {int}(\mathcal {E}_j^{ij})\). Hence, \(\text {int}(\mathcal {E}_i) \cap \text {int}(\mathcal {E}_j) \ne \emptyset \) if and only if \(\text {int}(\mathcal {E}_i^{ij}) \cap \text {int}(\mathcal {E}_j^{ij}) \ne \emptyset \). In other words, the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) overlap if and only if the ellipsoids \(\mathcal {E}_i^{ij}\) and \(\mathcal {E}_j^{ij}\) overlap.\(\square \)

Two propositions given in [8] are now required. The frontier of a set \(\mathcal {E}\) is denoted by \(\partial \mathcal {E}\).

Proposition 3.1

Let \(\mathcal {E}= \{x \in \mathbb {R}^n \mid x^{\top }Mx \le 1\}\), where \(M \in \mathbb {R}^{n \times n}\) is symmetric and positive definite. Thus, for each \(y \in \mathbb {R}^n \setminus \text {int}(\mathcal {E})\), there exist unique \(x^* \in \mathbb {R}^n\) and \(\mu ^* \in \mathbb {R}\) such that \(y = x^* + \mu ^* M x^*\) and \(x^*\) is the projection of y onto \(\mathcal {E}\). Moreover, \(x^* \in \partial \mathcal {E}\) and \(\mu ^* \in \mathbb {R}_+\).

Proof

See the proof of Proposition 4.1 in [8]. \(\square \)

Proposition 3.2

Let \(\mathcal {E}= \{x \in \mathbb {R}^n \mid x^{\top }Mx \le 1\}\), where \(M \in \mathbb {R}^{n \times n}\) is symmetric and positive definite. Thus, for each \(x \in \partial \mathcal {E}\) and \(\mu > 0\), we have \((x + \mu M x)^{\top }M(x + \mu Mx) > 1\).

Proof

See the proof of Proposition 4.2 in [8]. \(\square \)

For the ball \(\mathcal {E}_i^{ij}\) not to overlap with \(\mathcal {E}_j^{ij}\), the distance between the center \(c_{i}^{ij}\) of the unit-radius ball \(\mathcal {E}_i^{ij}\) and the ellipsoid \(\mathcal {E}_j^{ij}\) must be at least one. In particular, \(c_{i}^{ij}\) must not belong to \(\mathcal {E}_j^{ij}\). If \(c_{i}^{ij} \notin \mathcal {E}_j^{ij}\), then, by Proposition 3.1, there exist unique \(x_{ij} \in \partial \mathcal {E}_j^{ij}\) and \(\mu _{ij} \in \mathbb {R}_+\) such that

$$\begin{aligned} c_{i}^{ij} = x_{ij} + \mu _{ij} S_{ij}x_{ij}. \end{aligned}$$

Moreover, \(x_{ij}\) is the projection of \(c_{i}^{ij}\) onto \(\mathcal {E}_j^{ij}\). Hence, the distance \(d(c_{i}^{ij},\mathcal {E}_i^{ij})\) between \(c_{i}^{ij}\) and \(\mathcal {E}_i^{ij}\) is given by

$$\begin{aligned} d\left( c_{i}^{ij},\mathcal {E}_i^{ij}\right) = \Big \Vert c_{i}^{ij} - x_{ij}\Big \Vert = \Big \Vert \mu _{ij} S_{ij}x_{ij}\Big \Vert . \end{aligned}$$

By Proposition 3.2, if \(x_{ij} \in \partial \mathcal {E}_j^{ij}\) and \(\mu _{ij} > 0\), then \(c_{i}^{ij} \notin \mathcal {E}_j^{ij}\). Therefore, we obtain the following non-overlapping model:

$$\begin{aligned}&x_{ij}^{\top }S_{ij}x_{ij} = 1, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(4)
$$\begin{aligned}&\mu _{ij}^2 \Big \Vert S_{ij}x_{ij}\Big \Vert ^2 \ge 1, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(5)
$$\begin{aligned}&P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j) = x_{ij} + \mu _{ij}S_{ij}x_{ij}, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(6)
$$\begin{aligned}&\mu _{ij} \ge 0, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(7)

Figure 1 illustrates the transformation-based non-overlapping model. The unit-radius ball \(\mathcal {E}_i^{ij}\) centered at \(c_i^{ij}\) is shown to the left; while the ellipsoid \(\mathcal {E}_j^{ij}\) is shown to the right. The point \(c_i^{ij}\) can be written as the sum of \(x_{ij}\) (which is the orthogonal projection of \(c_i^{ij}\) onto the frontier of \(\mathcal {E}_j^{ij}\)) and a positive multiple of the vector \(S_{ij}x_{ij}\).

Fig. 1
figure 1

Illustration of the transformation-based non-overlapping model

Notice that \(\mu _{ij}\) must be positive in order to satisfy constraints (5, 7). Proposition 3.3 below provides a positive lower bound \(\epsilon _{ij}\) on \(\mu _{ij}\).

Proposition 3.3

Any solution to the system (4)–(7) is such that \(\mu _{ij} \ge \epsilon _{ij}\) for all \(i < j\), where

$$\begin{aligned} \epsilon _{ij} \equiv \epsilon (P_i,P_j) = \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P_j) \lambda _{\min }\left( P_j^{-\frac{1}{2}}\right) > 0. \end{aligned}$$

Proof

See the proof of Proposition 4.3 in [8]. \(\square \)

By manipulating the equalities and inequalities in model (4)–(7) and including the positive bound on \(\mu _{ij}\), we obtain the following equivalent non-overlapping model:

$$\begin{aligned}&x_{ij}^{\top } \left( P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij}\right) = \mu _{ij}, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(8)
$$\begin{aligned}&\Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij}\Big \Vert ^2 \ge 1, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(9)
$$\begin{aligned}&P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) = x_{ij} + \mu _{ij}S_{ij}x_{ij}, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(10)
$$\begin{aligned}&\mu _{ij} \ge \epsilon _{ij}, \quad \forall i,j \in I \text { such that } i < j. \end{aligned}$$
(11)

The numbers of variables and constraints of this non-overlapping model are quadratically proportional to the number of ellipsoids to be packed. Thus, when the number of ellipsoids is relatively high, this model becomes very hard to be numerically solved. The introduction of a new model that does not suffer from this inconvenience starts by reducing the number of variables and constraints of model (8)–(11). Although those reductions do not reduce the complexity of evaluating the model, they are the first step in the direction of developing a more tractable model. In order to reduce the number of constraints, we will combine all constraints from the non-overlapping model (8)–(11) into a linear number of constraints by simple summing up the squared infeasibilities. To reduce the number of variables, we will replace the variables \(x_{ij}\) and \(\mu _{ij}\) from model (8)–(11) with functions that play the same roles as these variables.

4 Non-overlapping model with implicit variables

In this section, we present a non-overlapping model with implicit variables. The model is derived from the transformation-based non-overlapping model briefly described in the previous section. In Sect. 4.1, we show how to reduce the number of constraints and in Sect. 4.2 we show how to reduce the number of variables. The complete model with implicit variables, that contains a linear number of variables and constraints, is presented in Sect. 4.3.

4.1 Reduction of the number of constraints

Consider the non-overlapping model (8)–(11). By replacing each of the inequality constraints of this model with its squared infeasibility measure, we obtain the following model:

$$\begin{aligned} \left( x_{ij}^{\top } \left( P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij}\right) - \mu _{ij}\right) ^2&= 0,\quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(12)
$$\begin{aligned} \max \left\{ 0, 1 - \Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij}\Big \Vert ^2\right\} ^2&= 0,\quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(13)
$$\begin{aligned} \Big \Vert x_{ij} + \mu _{ij}S_{ij}x_{ij} - P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j})\Big \Vert ^2&= 0, \quad \forall i,j \in I \text { such that } i < j \end{aligned}$$
(14)
$$\begin{aligned} \max \{0, \epsilon _{ij} - \mu _{ij}\}^2&= 0,\quad \forall i,j \in I \text { such that } i < j. \end{aligned}$$
(15)

This model is equivalent to the model (8)–(11), in the sense that any solution to (12)–(15) is a solution to the model (8)–(11) and vice-versa.

For each \(i \in I\), let \(\Omega _i \in \mathbb {R}^q\) denote the vector of rotation angles of the i-th ellipsoid. For each \(i,j \in I\) such that \(i < j\), define the sum of the squared infeasibilities \(o: \mathbb {R}^{3n+2q+1} \rightarrow \mathbb {R}_+\) by

$$\begin{aligned} \begin{aligned} o(c_i, c_j, \Omega _i, \Omega _j, x_{ij}, \mu _{ij}; P_i, P_j)&= \left( x_{ij}^{\top } \left( P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij}\right) - \mu _{ij}\right) ^2\\&\quad + \max \{0, \epsilon _{ij} - \mu _{ij}\}^2\\&\quad + \Big \Vert x_{ij} + \mu _{ij}S_{ij}x_{ij} - P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j})\Big \Vert ^2\\&\quad + \max \left\{ 0, 1 - \Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x_{ij}\Big \Vert ^2\right\} ^2. \end{aligned} \end{aligned}$$
(16)

Observe that the set of constraints (12)–(15) is equivalent to the constraints

$$\begin{aligned} o(c_i, c_j, \Omega _i, \Omega _j, x_{ij}, \mu _{ij}; P_i, P_j) = 0, \quad \forall i,j \in I \text { such that } i < j. \end{aligned}$$
(17)

In order to obtain a model with a linear number of constraints, we can combine the constraints (17) in the following way

$$\begin{aligned} \sum _{j=i+1}^m o(c_i, c_j, \Omega _i, \Omega _j, x_{ij}, \mu _{ij}; P_i, P_j) = 0, \quad \forall i \in I \setminus \{m\}, \end{aligned}$$
(18)

or even combining them into a single constraint:

$$\begin{aligned} \sum _{i=1}^{m-1} \sum _{j=i+1}^m o(c_i, c_j, \Omega _i, \Omega _j, x_{ij}, \mu _{ij}; P_i, P_j) = 0. \end{aligned}$$
(19)

The constraints (18) (or the constraint (19)) are equivalent to the constraints (17). Therefore, we can replace the constraints (8)–(11) with constraints (18) (or constraint (19)) and obtain an equivalent model for the non-overlapping of ellipsoids.

Although this new model has a linear number of constraints that models the non-overlapping of ellipsoids, the total number of terms in the summations is quadratically proportional to the number of ellipsoids to be packed. Thus, the computational cost of evaluating the constraints (18) at a given point is practically the same as the cost of evaluating the constraints (8)–(11). In Sect. 5, we will see how to efficiently evaluate these constraints.

4.2 Reduction of the number of variables

Consider the constraints (18) and suppose that the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap. In this case, we know that there exist values for \(x_{ij}\) and \(\mu _{ij}\) such that the term \(o(c_i, c_j, \Omega _i, \Omega _j, x_{ij}, \mu _{ij}; P_i, P_j)\) as defined in (16) vanishes and, therefore, does not contribute to the summation in (18). We can simply take \(x_{ij}\) as the projection of \(c_{i}^{ij}\) onto the ellipsoid \(\mathcal {E}_j^{ij}\) and \(\mu _{ij}\) as the nonnegative scalar that satisfies \(c_{i}^{ij} = x_{ij} + \mu _{ij} S_{ij} x_{ij}\). The projection of \(c_{i}^{ij}\) onto the ellipsoid \(\mathcal {E}_j^{ij}\) is the solution to the problem

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \Big \Vert x - c_{i}^{ij}\Big \Vert ^2\\ \text {subject to} &{} x^{\top } S_{ij} x \le 1. \end{array} \end{aligned}$$
(20)

However, taking \(x_{ij}\) as the solution to the problem (20) does not lead to a good overlapping measure when ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do overlap. If \(c_{i}^{ij} \in \text {int}(\mathcal {E}_j^{ij})\), then the solution to the problem (20) is \(c_{i}^{ij}\) and we must have \(\mu _{ij} = 0\). Then, the term associated with the distance between \(c_{i}^{ij}\) and \(\mathcal {E}_j^{ij}\) will be constant for any \(c_{i}^{ij} \in \text {int}(\mathcal {E}_j^{ij})\), as well as the term associated with the positivity of \(\mu _{ij}\) in (18). Consider the following problem:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \Big \Vert x - c_{i}^{ij}\Big \Vert ^2\\ \text {subject to} &{} x^{\top } S_{ij} x = 1. \end{array} \end{aligned}$$
(21)

If \(c_{i}^{ij} \notin \text {int}(\mathcal {E}_j^{ij})\), problems (20) and (21) are equivalent and they both have a single solution. Notice that the null vector is not a feasible solution to problem (21). Thus, since problem (21) has a single constraint and the matrix \(S_{ij}\) is positive definite, the gradient of the constraint is nonzero at every feasible point. Therefore, any solution to this problem satisfies the linear independence constraint qualification. This means that the Karush–Kuhn–Tucker optimality conditions of problem (21) (see, for example, Proposition 3.3.1 in [3]) is satisfied by every solution to problem (21). Thus, if \(x^*\) is a solution to this problem then there exists \(\mu ^* \in \mathbb {R}\) such that

$$\begin{aligned} x^* + \mu ^* S_{ij} x^* - c_{i}^{ij} = 0. \end{aligned}$$
(22)

If \(c_{i}^{ij} \notin \text {int}(\mathcal {E}_j^{ij})\) then, by Proposition 3.1, there exist a unique \(x^* \in \partial \mathcal {E}_j^{ij}\) and a unique \(\mu ^*\) that satisfy (22). Moreover, \(\mu ^* \ge 0\). On the other hand, if \(c_{i}^{ij} \in \text {int}(\mathcal {E}_j^{ij})\) then problem (21) may have more than one solution. But, by Proposition 4.2, the Lagrange multiplier associated with the constraint of this problem is the same for any solution and belongs to the interval \([-1/\lambda _{\max }(S_{ij}),0]\).
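The discussion above suggests a simple computational recipe for the diagonal case, to which Proposition 4.2 reduces the general one: express each coordinate of \(x\) in terms of the multiplier, substitute into the boundary constraint, and solve the resulting scalar equation by bisection. The sketch below (the helper name `boundary_projection` is ours, not from the paper) is illustrative only; it assumes the generic case in which the multiplier lies strictly inside the admissible interval, i.e., \(y\) has a nonzero component along an eigendirection associated with \(\lambda _{\max }(D)\) whenever \(y\) is interior.

```python
import math

def boundary_projection(y, d):
    """Solve y = x + mu * D x with x on the boundary x^T D x = 1, for a
    diagonal positive definite D with diagonal entries d, via bisection on
    the scalar equation phi(mu) = sum_i d_i y_i^2 / (1 + mu d_i)^2 = 1.
    Illustrative sketch only: it assumes the generic case in which the
    multiplier lies strictly above -1/lambda_max(D)."""
    dmax = max(d)

    def phi(mu):  # strictly decreasing where all 1 + mu * d_i > 0
        return sum(di * yi * yi / (1.0 + mu * di) ** 2 for di, yi in zip(d, y))

    lo = -1.0 / dmax + 1e-9  # just above the pole (covers interior centers)
    hi = 1.0
    while phi(hi) > 1.0:     # grow hi until the root is bracketed
        hi *= 2.0
    for _ in range(200):     # bisection on phi(mu) = 1
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 1.0 else (lo, mid)
    mu = 0.5 * (lo + hi)
    x = [yi / (1.0 + mu * di) for di, yi in zip(d, y)]
    return x, mu
```

For an exterior center the computed \(\mu \) is nonnegative (the projection case of problem (20)); for a generic interior center it falls in \((-1/\lambda _{\max }(D), 0]\), in agreement with Proposition 4.2.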

We restate here Lemma 4.1 and Proposition 4.1, which were proved in [8]: Lemma 4.1 will be used in the proof of Lemma 4.2, and Proposition 4.1 in the proof of Proposition 4.2. Lemma 4.2 and Proposition 4.2 are introduced in the present work. Lemma 4.2 is a particular case of Proposition 4.2 and will be used in its proof; Proposition 4.2 shows that the Lagrange multiplier associated with the constraint of problem (21) is the same for every solution to this problem.

Lemma 4.1

Consider ellipsoid \(\mathcal {E}= \{z \in \mathbb {R}^n \mid z^{\top } D z \le 1\}\), where \(D \in \mathbb {R}^{n \times n}\) is a positive definite diagonal matrix. For each \(y \in \mathcal {E}\), there exist \(x \in \partial \mathcal {E}\) and \(\alpha \in [-1/\lambda _{\max }(D), 0]\) such that \(y = x + \alpha Dx\).

Proof

See the proof of Lemma 5.1 in [8]. \(\square \)

Proposition 4.1

Consider ellipsoid \(\mathcal {E}= \{z \in \mathbb {R}^n \mid z^{\top } S z \le 1\}\), where \(S \in \mathbb {R}^{n \times n}\) is symmetric and positive definite. For each \(y \in \mathcal {E}\), there exist \(x \in \partial \mathcal {E}\) and \(\alpha \in [-1/\lambda _{\max }(S), 0]\) such that \(y = x + \alpha Sx\).

Proof

See the proof of Proposition 5.1 in [8]. \(\square \)

Lemma 4.2

Consider the ellipsoid \(\mathcal {E}= \{z \in \mathbb {R}^n \mid z^{\top } D z \le 1\}\), where \(D \in \mathbb {R}^{n \times n}\) is diagonal and positive definite. Given \(y \in \mathcal {E}\), there exists a unique \(\alpha \in [-1/\lambda _{\max }(D), 0]\) and there exists \(x \in \partial \mathcal {E}\) such that \(y = x + \alpha Dx\). Moreover, if \(\alpha \in (-1/\lambda _{\max }(D), 0]\) then \(x \in \partial \mathcal {E}\) is unique.

Proof

Let \(\mathcal {I} = \{1,\dots ,n\}\). For each \(i \in \mathcal {I}\), denote the i-th diagonal element of matrix D by \(d_i\). Consider the system

$$\begin{aligned}&\displaystyle y_i = x_i + \alpha d_i x_i,\quad \forall i \in \mathcal {I}, \end{aligned}$$
(23)
$$\begin{aligned}&\displaystyle x^{\top }Dx = 1, \end{aligned}$$
(24)
$$\begin{aligned}&\displaystyle \alpha \in [-1/\lambda _{\max }(D),0]. \end{aligned}$$
(25)

By Lemma 4.1, the system (23)–(25) has at least one solution. Let \((x^*,\alpha ^*)\) be a solution to this system and suppose that \(\alpha ^* > -1/\lambda _{\max }(D)\). We shall prove that \((x^*,\alpha ^*)\) is the only solution to the system; notice that this is enough to prove the lemma.

Since \(\alpha ^* > -1/\lambda _{\max }(D)\), we have that

$$\begin{aligned} 1+\alpha ^*d_i > 0,\quad \forall i\in \mathcal {I}. \end{aligned}$$
(26)

By (23), we have \(y_i = (1 + \alpha ^*d_i)x_i^*\) for each \(i \in \mathcal{I}\). Then, (23) and (26) imply that, for each \(i \in \mathcal{I}\), \(x^*_i = 0\) if \(y_i = 0\). However, since \(x^* = 0\) does not satisfy (24), there must exist \(i \in \mathcal{I}\) such that \(y_i \ne 0\).

Since \(1+\alpha ^*d_i > 0\) for each \(i \in \mathcal{I}\), by (23) we have that

$$\begin{aligned} x_i^* = \frac{y_i}{1 + \alpha ^* d_i}, \quad \forall i \in \mathcal{I}. \end{aligned}$$
(27)

In order to derive a contradiction, suppose that the system has a solution \((\bar{x},\bar{\alpha }) \ne (x^*,\alpha ^*)\). If \(\bar{\alpha } = \alpha ^*\), then \(\bar{x} = x^*\) by (27). Thus, we must have \(\bar{\alpha } \ne \alpha ^*\). We shall divide the proof into the cases \(\bar{\alpha } > -1/\lambda _{\max }(D)\) and \(\bar{\alpha } = -1/\lambda _{\max }(D)\).

Case 1 Suppose that \(\bar{\alpha } > -1/\lambda _{\max }(D)\). Then, \(1+\bar{\alpha }d_i > 0\) for each \(i \in \mathcal{I}\) and, therefore,

$$\begin{aligned} \bar{x}_i = \frac{y_i}{1 + \bar{\alpha } d_i},\quad \forall i \in \mathcal{I}. \end{aligned}$$

By (24), we have that

$$\begin{aligned} 1 = {x^*}^{\top }Dx^* = \sum _{i=1}^n d_i(x_i^*)^2 = \sum _{i=1}^n d_i\frac{y_i^2}{(1 + \alpha ^* d_i)^2}. \end{aligned}$$

If \(\bar{\alpha } < \alpha ^*\), then \(1 + \alpha ^*d_i> 1+\bar{\alpha }d_i > 0\) for each \(i \in \mathcal{I}\) and

$$\begin{aligned} 1 = \sum _{i=1}^n d_i\frac{y_i^2}{(1 + \alpha ^* d_i)^2} < \sum _{i=1}^n d_i\frac{y_i^2}{(1 + \bar{\alpha } d_i)^2} = \sum _{i=1}^n d_i(\bar{x}_i)^2 = \bar{x}^{\top }D\bar{x}. \end{aligned}$$

If \(\bar{\alpha } > \alpha ^*\), then \(0< 1 + \alpha ^*d_i < 1+\bar{\alpha }d_i\) for each \(i \in \mathcal{I}\) and

$$\begin{aligned} 1 = \sum _{i=1}^n d_i\frac{y_i^2}{(1 + \alpha ^* d_i)^2} > \sum _{i=1}^n d_i\frac{y_i^2}{(1 + \bar{\alpha } d_i)^2} = \sum _{i=1}^n d_i(\bar{x}_i)^2 = \bar{x}^{\top }D\bar{x}. \end{aligned}$$

In both cases we have that \(\bar{x}^{\top }D\bar{x} \ne 1\) and, therefore, \((\bar{x},\bar{\alpha })\) is not a solution to the system (23)–(25).

Case 2 Suppose that \(\bar{\alpha } = -1/\lambda _{\max }(D)\). Let \(\mathcal{I}^+ = \{i \in \mathcal{I} \mid d_i = \lambda _{\max }(D)\}\) and \(\mathcal{I}^- = \mathcal{I} \setminus \mathcal{I}^+\). By (23), we must have that \(y_i = 0\) for each \(i \in \mathcal{I}^+\), since \(1 + \bar{\alpha }d_i = 0\) for each \(i \in \mathcal{I}^+\). Thus, since \(\alpha ^* > -1/\lambda _{\max }(D)\), we must have that \(x_i^* = 0\) for each \(i \in \mathcal{I}^+\). Hence, since \({x^*}^{\top }Dx^* = 1\), there exists \(i \in \mathcal{I}^-\) such that \(x_i^* \ne 0\) and, consequently, \(y_i \ne 0\), since \(y_i = (1 + \alpha ^*d_i)x_i^*\). Since \(1+\bar{\alpha }d_i > 0\) for each \(i \in \mathcal{I}^-\), by (23) we have that

$$\begin{aligned} \bar{x}_i = \frac{y_i}{1 + \bar{\alpha } d_i}, \quad \forall i \in \mathcal{I}^-. \end{aligned}$$

Therefore,

$$\begin{aligned} 1 = {x^*}^{\top }Dx^* =&\sum _{i \in \mathcal{I}} d_i(x_i^*)^2 = \sum _{i \in \mathcal{I}^-} d_i(x_i^*)^2 = \sum _{i \in \mathcal{I}^-} d_i\frac{y_i^2}{(1 + \alpha ^* d_i)^2} < \sum _{i \in \mathcal{I}^-} d_i\frac{y_i^2}{(1 + \bar{\alpha } d_i)^2}\\ =&\sum _{i \in \mathcal{I}^-} d_i\bar{x}_i^2. \end{aligned}$$

Thus,

$$\begin{aligned} \bar{x}^{\top }D\bar{x} = \sum _{i \in \mathcal{I}} d_i\bar{x}_i^2 \ge \sum _{i \in \mathcal{I}^-} d_i\bar{x}_i^2 > 1, \end{aligned}$$

that is, \((\bar{x},\bar{\alpha })\) is not a solution to the system (23)–(25).

Hence, \((x^*,\alpha ^*)\) is the only solution to the system (23)–(25). \(\square \)
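The endpoint case \(\alpha = -1/\lambda _{\max }(D)\), in which \(x\) may fail to be unique, can be checked on a concrete instance. The example below is ours, not from the paper: for \(D = \mathrm {diag}(1,4)\) and \(y = (1/2, 0)\), the unique admissible multiplier is \(\alpha = -1/4\), and two distinct boundary points satisfy \(y = x + \alpha Dx\).

```python
import math

# Boundary case of Lemma 4.2: D = diag(1, 4), y = (1/2, 0).
# alpha = -1/lambda_max(D) = -1/4 admits TWO boundary points x with
# y = x + alpha * D x, so x is not unique at the interval endpoint.
d = [1.0, 4.0]
alpha = -1.0 / max(d)
y = [0.5, 0.0]
x1 = y[0] / (1.0 + alpha * d[0])               # first coordinate is forced
x2 = math.sqrt((1.0 - d[0] * x1 * x1) / d[1])  # second is free up to sign
for s in (+1.0, -1.0):
    x = [x1, s * x2]
    # x lies on the boundary and satisfies y = x + alpha * D x
    assert abs(d[0] * x[0] ** 2 + d[1] * x[1] ** 2 - 1.0) < 1e-12
    assert abs(x[0] + alpha * d[0] * x[0] - y[0]) < 1e-12
    assert abs(x[1] + alpha * d[1] * x[1] - y[1]) < 1e-12
```

In the second coordinate, \(1 + \alpha d_2 = 0\), so any value of \(x_2\) compatible with the boundary constraint works; this is precisely why uniqueness of \(x\) is only claimed for \(\alpha \in (-1/\lambda _{\max }(D), 0]\).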

Proposition 4.2

Consider the ellipsoid \(\mathcal {E}= \{z \in \mathbb {R}^n \mid z^{\top } S z \le 1\}\), where \(S \in \mathbb {R}^{n \times n}\) is a symmetric and positive definite matrix. Given \(y \in \mathcal {E}\), there exists a unique \(\alpha \in [-1/\lambda _{\max }(S), 0]\) and there exists \(x \in \partial \mathcal {E}\) such that \(y = x + \alpha Sx\). Moreover, if \(\alpha \in (-1/\lambda _{\max }(S), 0]\) then \(x \in \partial \mathcal {E}\) is unique.

Proof

By Proposition 4.1, there exist \(x^* \in \partial \mathcal {E}\) and \(\alpha ^* \in [-1/\lambda _{\max }(S), 0]\) such that

$$\begin{aligned} y = x^* + \alpha ^* Sx^*. \end{aligned}$$
(28)

Suppose that \(\alpha ^* \in (-1/\lambda _{\max }(S), 0]\). We shall prove that there do not exist \(\bar{x} \in \partial \mathcal {E}\) and \(\bar{\alpha } \in [-1/\lambda _{\max }(S), 0]\) such that \(y = \bar{x} + \bar{\alpha } S\bar{x}\) and \((\bar{x}, \bar{\alpha }) \ne (x^*, \alpha ^*)\). In order to derive a contradiction, suppose that there exist \(\bar{x} \in \partial \mathcal {E}\) and \(\bar{\alpha } \in [-1/\lambda _{\max }(S), 0]\) such that

$$\begin{aligned} y = \bar{x} + \bar{\alpha } S\bar{x} \end{aligned}$$
(29)

and \((\bar{x}, \bar{\alpha }) \ne (x^*, \alpha ^*)\).

Since S is symmetric, there exist an orthogonal matrix \(Q \in \mathbb {R}^{n \times n}\) and a diagonal matrix \(D \in \mathbb {R}^{n \times n}\) formed by the eigenvalues of S such that \(S = QDQ^{\top }\) and \(\lambda _{\max }(S) = \lambda _{\max }(D)\) (see, for example, Theorem 8.1.1 in [26]). Consider the ellipsoid \(\mathcal {E}' = \{z \in \mathbb {R}^n \mid z^{\top } D z \le 1\}\).

Then, \(Q^{\top }y \in \mathcal {E}'\), \(Q^{\top }x^* \in \partial \mathcal {E}'\), and \(Q^{\top }\bar{x} \in \partial \mathcal {E}'\). Moreover, since \(\lambda _{\max }(S) = \lambda _{\max }(D)\), we have that \(\alpha ^* \in (-1/\lambda _{\max }(D), 0]\) and \(\bar{\alpha } \in [-1/\lambda _{\max }(D),0]\). By left multiplying both sides of Eqs. (28) and (29) by \(Q^{\top }\), we obtain

$$\begin{aligned} Q^{\top }y&= Q^{\top }x^* + \alpha ^* D Q^{\top }x^* \end{aligned}$$
(30)

and

$$\begin{aligned} Q^{\top }y&= Q^{\top }\bar{x} + \bar{\alpha } D Q^{\top }\bar{x}. \end{aligned}$$
(31)

By Lemma 4.2, if \(\bar{\alpha } = \alpha ^*\), we must have \(Q^{\top }\bar{x} = Q^{\top }x^*\) and, consequently, \(\bar{x} = x^*\), which contradicts the hypothesis that \((\bar{x}, \bar{\alpha }) \ne (x^*, \alpha ^*)\). If \(\bar{\alpha } \ne \alpha ^*\), then (30) and (31) contradict Lemma 4.2.

Hence, if \(\alpha ^* \in (-1/\lambda _{\max }(S), 0]\), then the system

$$\begin{aligned}&y = x + \alpha S x,\\&x^{\top }Sx = 1,\\&\alpha \in [-1/\lambda _{\max }(S),0] \end{aligned}$$

has a unique solution. \(\square \)
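The eigendecomposition \(S = QDQ^{\top }\) used in the proof is also the practical recipe for the general symmetric case: rotate \(y\) into the eigenbasis, solve the diagonal system, and rotate back. A hedged two-dimensional sketch (closed-form \(2 \times 2\) eigendecomposition, bisection on the scalar equation, generic \(y\); all helper names are ours):

```python
import math

def solve_general(y, S):
    """Given a symmetric positive definite 2x2 matrix S and y in the
    ellipsoid, find x on the boundary and alpha with y = x + alpha * S x
    by rotating to the diagonal case, as in the proof of Proposition 4.2.
    A 2-D sketch for the generic case only."""
    (a, b), (_, c) = S
    theta = 0.5 * math.atan2(2.0 * b, a - c)        # S = Q diag(d) Q^T
    ct, st = math.cos(theta), math.sin(theta)
    Q = [[ct, -st], [st, ct]]
    d = [a * ct * ct + 2 * b * ct * st + c * st * st,
         a * st * st - 2 * b * ct * st + c * ct * ct]
    yr = [Q[0][0] * y[0] + Q[1][0] * y[1],          # y' = Q^T y
          Q[0][1] * y[0] + Q[1][1] * y[1]]

    def phi(al):  # scalar equation in the rotated (diagonal) coordinates
        return sum(di * yi * yi / (1 + al * di) ** 2 for di, yi in zip(d, yr))

    lo, hi = -1.0 / max(d) + 1e-9, 1.0
    while phi(hi) > 1.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 1.0 else (lo, mid)
    alpha = 0.5 * (lo + hi)
    xr = [yi / (1 + alpha * di) for di, yi in zip(d, yr)]
    x = [Q[0][0] * xr[0] + Q[0][1] * xr[1],         # x = Q x'
         Q[1][0] * xr[0] + Q[1][1] * xr[1]]
    return x, alpha
```

Since the rotation is orthogonal, \(x^{\top }Sx = (x')^{\top }Dx'\) and the multiplier is unchanged, exactly as in the reduction from Proposition 4.2 to Lemma 4.2.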

Equation (22) implies that

$$\begin{aligned} {x^*}^{\top }\left( x^* + \mu ^* S_{ij} x^* - c_{i}^{ij}\right) = 0. \end{aligned}$$

Since \({x^*}^{\top } S_{ij} x^* = 1\), this implies that

$$\begin{aligned} {x^*}^{\top }\left( c_{i}^{ij} - x^*\right) - \mu ^* = 0. \end{aligned}$$

Therefore, since \(c_{i}^{ij} = P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\), any solution \(x^*\) to problem (21) together with the corresponding Lagrange multiplier \(\mu ^*\) satisfy

$$\begin{aligned}&{x^*}^{\top } \left( P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - x^*\right) - \mu ^* = 0\\&x^* + \mu ^*S_{ij}x^* - P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) = 0. \end{aligned}$$

Thus, if we take \(\mathcal {X}_{ij}\) as a solution to problem (21) and \(\mathcal {U}_{ij}\) as the corresponding Lagrange multiplier, the constraints (18) become

$$\begin{aligned} \sum _{j=i+1}^m \max \left\{ 0, 1 - \Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - \mathcal {X}_{ij}\Big \Vert ^2\right\} ^2 + \max \{0, \epsilon _{ij} - \mathcal {U}_{ij}\}^2 = 0, \quad \forall i \in I \setminus \{m\}. \end{aligned}$$
(32)

Thus, the variables \(x_{ij}\) and \(\mu _{ij}\) cease to be part of the non-overlapping model. In (32), we define \(\mathcal {X}_{ij}\) to be a mapping whose value is a solution to problem (21) and \(\mathcal {U}_{ij}\) to be the function whose value is the Lagrange multiplier associated with the value of \(\mathcal {X}_{ij}\).

4.3 The model

To make it clearer that \(x_{ij}\) and \(\mu _{ij}\) are no longer variables of the model but functions of the centers and rotation angles of the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\), we shall rewrite (32) in the following way:

$$\begin{aligned} \sum _{j=i+1}^m f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) = 0, \quad \forall i \in I \setminus \{m\}, \end{aligned}$$
(33)

where

$$\begin{aligned} \begin{aligned} f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) =&\max \left\{ 0, 1 - \Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - \mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\Big \Vert ^2\right\} ^2 \\&+\max \{0, \epsilon _{ij} - \mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\}^2, \end{aligned} \end{aligned}$$
(34)

\(\mathcal {X}(c_i,c_j,\Omega _i,\Omega _j; P_i, P_j)\) is a solution to the problem

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \Big \Vert x - c_{i}^{ij}\Big \Vert ^2\\ \text {subject to} &{} x^{\top } S_{ij} x = 1, \end{array} \end{aligned}$$
(35)

and \(\mathcal {U}(c_i,c_j,\Omega _i,\Omega _j; P_i, P_j)\) is the Lagrange multiplier associated with this solution.

This new non-overlapping model has \(m-1\) constraints given by (33) and its variables are the centers of the ellipsoids (\(c_i \in \mathbb {R}^n\) for each \(i \in \{1,\dots ,m\}\)) and their rotation angles (\(\Omega _i \in \mathbb {R}^q\) for each \(i \in \{1,\dots ,m\}\)). Therefore, the numbers of variables and constraints of this model are linear in the number of ellipsoids to be packed. The following lemma shows that the constraints (33) indeed constitute a non-overlapping model.

Lemma 4.3

The constraints (33) are satisfied if and only if the ellipsoids do not overlap.

Proof

Firstly, notice that the function \(f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) is nonnegative at every point. Let \(i,j \in \{1,\dots ,m\}\) be such that \(i < j\). Suppose that the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap. Thus, we have that the ellipsoids \(\mathcal {E}_i^{ij}\) and \(\mathcal {E}_j^{ij}\) do not overlap either. So, the distance from the center \(c_{i}^{ij}\) of the unit-radius ball \(\mathcal {E}_i^{ij}\) to the ellipsoid \(\mathcal {E}_j^{ij}\) is at least one. Since in this case \(c_{i}^{ij} \notin \mathcal {E}_j^{ij}\), we have that \(\mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) is the projection of \(c_{i}^{ij}\) onto the ellipsoid \(\mathcal {E}_j^{ij}\). Therefore, recalling that \(c_{i}^{ij} = P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j})\), we have

$$\begin{aligned} \Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - \mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\Big \Vert \ge 1. \end{aligned}$$

Consequently, we have that

$$\begin{aligned} \max \left\{ 0, 1 - \Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - \mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\Big \Vert ^2\right\} ^2 = 0. \end{aligned}$$

Since \(\mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) is the solution to problem (21) and \(\mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) is the Lagrange multiplier associated with this solution and satisfies (22), we have that

$$\begin{aligned} \mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) \ge 0 \end{aligned}$$

by Proposition 3.1. By Proposition 3.3, we have that

$$\begin{aligned} \mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) \ge \epsilon _{ij}. \end{aligned}$$

Therefore,

$$\begin{aligned} \max \{0, \epsilon _{ij} - \mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\}^2 = 0. \end{aligned}$$

Hence, if the ellipsoids do not overlap, \(f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) vanishes.

Suppose that there exist two ellipsoids that overlap. Let \(i,j \in \{1,\dots ,m\}\) such that \(i < j\) be the indices of those ellipsoids. Let \(c_{i}^{ij}\) be the center of the ball \(\mathcal {E}_i^{ij}\), that is, \(c_{i}^{ij} = P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j})\). Let us consider two cases. Suppose that \(c_{i}^{ij} \notin \text {int}(\mathcal {E}_j^{ij})\). In this case, \(\mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) is the projection of \(c_{i}^{ij}\) onto ellipsoid \(\mathcal {E}_j^{ij}\). Since \(\mathcal {E}_i\) and \(\mathcal {E}_j\) overlap, we have that the ellipsoids \(\mathcal {E}_i^{ij}\) and \(\mathcal {E}_j^{ij}\) also overlap. Thus,

$$\begin{aligned} \Big \Vert c_{i}^{ij} - \mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\Big \Vert < 1. \end{aligned}$$

Therefore,

$$\begin{aligned} \max \left\{ 0, 1 - \Big \Vert P_i^{-\frac{1}{2}}Q_i^{\top }(c_{i} - c_{j}) - \mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\Big \Vert ^2\right\} ^2 > 0. \end{aligned}$$

Now, suppose that \(c_{i}^{ij} \in \text {int}(\mathcal {E}_j^{ij})\). Since \(c_{i}^{ij} \in \text {int}(\mathcal {E}_j^{ij})\) and \(\mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) \in \partial \mathcal {E}_j^{ij}\), by (22) and by Proposition 3.2, we must have \(\mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) \le 0\). Moreover, since \(c_{i}^{ij} \ne \mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\), we must have \(\mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) \ne 0\). Consequently,

$$\begin{aligned} \mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) < 0. \end{aligned}$$

Then,

$$\begin{aligned} \max \{0, \epsilon _{ij} - \mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\}^2 > 0. \end{aligned}$$

Therefore, if the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) overlap, \(f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) is strictly positive.

\(\square \)
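Lemma 4.3 can be checked on the simplest possible instance, two unit balls (\(P_i = P_j = I\), \(Q_i = Q_j = I\)), for which \(S_{ij} = I\), \(c_{i}^{ij} = c_i - c_j\), \(\epsilon _{ij} = 1\), and the projection onto the unit sphere and its multiplier \(\mu = \Vert c_{i}^{ij}\Vert - 1\) are available in closed form. A sketch (the function name is ours, and the simplification holds only for this special case; the general case requires solving problem (35)):

```python
import math

def f_unit_balls(ci, cj, eps=1.0):
    """Overlap measure (34) specialised to two unit balls (P = I, Q = I),
    where S_ij = I, c_i^{ij} = c_i - c_j and eps_ij = 1.  The projection
    onto the unit sphere is c/||c||, with multiplier mu = ||c|| - 1.
    Illustrative sketch only."""
    c = [a - b for a, b in zip(ci, cj)]
    nc = math.sqrt(sum(t * t for t in c))
    mu = nc - 1.0         # Lagrange multiplier of the projection problem
    dist = abs(nc - 1.0)  # distance from the center to its projection
    return max(0.0, 1.0 - dist * dist) ** 2 + max(0.0, eps - mu) ** 2
```

Centers at distance at least 2 (disjoint or tangent balls) give \(f = 0\), while any smaller distance gives \(f > 0\), as Lemma 4.3 asserts.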

We now show that the overlapping measure f of two ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) given by (34) does not depend on the scaling of the ellipsoids.

Definition 1

Consider the set \(\mathcal {E}\subseteq \mathbb {R}^n\). For each \(\nu \in \mathbb {R}_{++}\), we define \(\nu \mathcal {E}= \{\nu x \mid x \in \mathcal {E}\}\).

Consider the ellipsoid \(\mathcal {E}_i = \{ x \in \mathbb {R}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\}\). Let \(\nu \in \mathbb {R}_{++}\). According to Definition 1, we have

$$\begin{aligned} \nu \mathcal {E}_{i}&= \{ \nu x \in \mathbb {R}^n \mid x \in \mathcal {E}_i\}\\&= \left\{ x \in \mathbb {R}^n \mid \nu ^{-1} x \in \mathcal {E}_i\right\} \\&= \left\{ x \in \mathbb {R}^n \mid (\nu ^{-1}x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (\nu ^{-1}x-c_i) \le 1 \right\} \\&= \left\{ x \in \mathbb {R}^n \mid (x - \nu c_i)^{\top }Q_i(\nu ^2 P_i)^{-1}Q_i^{\top } (x- \nu c_i) \le 1 \right\} . \end{aligned}$$

Thus, \(\nu \mathcal {E}_{i}\) is an ellipsoid defined by the tuple \((\nu c_i,\Omega _i,\nu ^2 P_i)\). It is centered at \(\nu c_i\), its semi-axis lengths form the diagonal of \(\nu P_i^{\frac{1}{2}}\), and it has the same orientation as ellipsoid \(\mathcal {E}_{i}\).

Lemma 4.4

The function defined in (34) is invariant with respect to the scaling of the ellipsoids. That is, \(f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j) = f(\nu c_i,\nu c_j,\Omega _i,\Omega _j;\nu ^2P_i,\nu ^2P_j)\) for each \(\nu \in \mathbb {R}_{++}\).

Proof

Consider the ellipsoids \(\mathcal {E}_i = \{ x \in \mathbb {R}^n \mid (x - c_i)^{\top }Q_iP_i^{-1}Q_i^{\top } (x-c_i) \le 1\}\) and \(\mathcal {E}_j = \{ x \in \mathbb {R}^n \mid (x - c_j)^{\top }Q_jP_j^{-1}Q_j^{\top } (x-c_j) \le 1\}\). Let \(\nu \in \mathbb {R}_{++}\). According to Definition 1, we have that

$$\begin{aligned} \nu \mathcal {E}_{i}&= \left\{ x \in \mathbb {R}^n \mid (x - \nu c_i)^{\top }Q_i(\nu ^2 P_i)^{-1}Q_i^{\top } (x- \nu c_i) \le 1 \right\} \\ \nu \mathcal {E}_{j}&= \left\{ x \in \mathbb {R}^n \mid (x - \nu c_j)^{\top }Q_j(\nu ^2 P_j)^{-1}Q_j^{\top } (x- \nu c_j) \le 1 \right\} . \end{aligned}$$

Notice that the constant \(\epsilon _{ij}\) given by Proposition 3.3 for the pair of ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) is the same for the pair of ellipsoids \(\nu \mathcal {E}_i\) and \(\nu \mathcal {E}_j\), since

$$\begin{aligned} \epsilon _{ij} \equiv \epsilon (P_i,P_j)&= \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P_j) \lambda _{\min }\left( P_j^{-\frac{1}{2}}\right) \\&= \left( \nu ^{-2}\nu \nu ^2 \nu ^{-1}\right) \lambda _{\min }\left( P_i^{-1}\right) \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \lambda _{\min }(P_j) \lambda _{\min }\left( P_j^{-\frac{1}{2}}\right) \\&=\nu ^{-2}\lambda _{\min }\left( P_i^{-1}\right) \nu \lambda _{\min }\left( P_i^{\frac{1}{2}}\right) \nu ^2 \lambda _{\min }(P_j) \nu ^{-1}\lambda _{\min }\left( P_j^{-\frac{1}{2}}\right) \\&=\lambda _{\min }\left( (\nu ^2 P_i)^{-1}\right) \lambda _{\min }\left( (\nu ^2 P_i)^{\frac{1}{2}}\right) \lambda _{\min }\left( \nu ^2 P_j\right) \lambda _{\min }\left( (\nu ^2 P_j)^{-\frac{1}{2}}\right) \\&= \epsilon \left( \nu ^2 P_i, \nu ^2 P_j\right) . \end{aligned}$$

Thus, the constant \(\epsilon _{ij}\) is invariant under the scaling.

Let \((\nu \mathcal {E}_i)^{ij}\) be the set obtained by applying the transformation \(T_{ij}\) to the ellipsoid \(\nu \mathcal {E}_i\), that is,

$$\begin{aligned} (\nu \mathcal {E}_i)^{ij}&= \left\{ x \in \mathbb {R}^n \mid x = T_{ij}(z), z \in \nu \mathcal {E}_i\right\} \\&= \left\{ x \in \mathbb {R}^n \mid x = \left( \nu ^2 P_i\right) ^{-\frac{1}{2}} Q_i^{\top }(z - \nu c_j), z \in \nu \mathcal {E}_i\right\} \\&= \left\{ x \in \mathbb {R}^n \mid z = Q_i\left( \nu ^2 P_i\right) ^{\frac{1}{2}}x + \nu c_j, z \in \nu \mathcal {E}_i\right\} \\&= \left\{ x \in \mathbb {R}^n \mid \left( Q_i\left( \nu ^2 P_i\right) ^{\frac{1}{2}}x + \nu c_j - \nu c_i\right) ^{\top }Q_i\left( \nu ^2 P_i\right) ^{-1}Q_i^{\top }\left( Q_i(\nu ^2 P_i)^{\frac{1}{2}}x + \nu c_j - \nu c_i\right) \le 1\right\} \\&= \left\{ x \in \mathbb {R}^n \mid \left( \nu Q_iP_i^{\frac{1}{2}}x + \nu c_j - \nu c_i\right) ^{\top }Q_i\left( \nu ^2 P_i\right) ^{-1}Q_i^{\top }\left( \nu Q_iP_i^{\frac{1}{2}}x + \nu c_j - \nu c_i\right) \le 1\right\} \\&= \left\{ x \in \mathbb {R}^n \mid \nu \left[ Q_iP_i^{\frac{1}{2}}x - (c_i-c_j)\right] ^{\top }Q_i\nu ^{-2} P_i^{-1}Q_i^{\top }\nu \left[ Q_iP_i^{\frac{1}{2}}x - (c_i - c_j)\right] \le 1\right\} \\&= \left\{ x \in \mathbb {R}^n \mid \left[ Q_iP_i^{\frac{1}{2}}x - (c_i-c_j)\right] ^{\top }Q_iP_i^{-1}Q_i^{\top }\left[ Q_iP_i^{\frac{1}{2}}x - (c_i - c_j)\right] \le 1\right\} \\&= \left\{ x \in \mathbb {R}^n \mid \left[ x - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] ^{\top }P_i^{-\frac{1}{2}}Q_i^{\top }Q_iP_i^{-1}Q_i^{\top }Q_iP_i^{\frac{1}{2}}\left[ x - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i - c_j)\right] \le 1\right\} \\&= \left\{ x \in \mathbb {R}^n \mid \left[ x - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\right] ^{\top }\left[ x - P_i^{-\frac{1}{2}}Q_i^{\top }(c_i - c_j)\right] \le 1\right\} \\&= \mathcal {E}_i^{ij}. \end{aligned}$$

Thus, \((\nu \mathcal {E}_i)^{ij} = \mathcal {E}_i^{ij}\) and, similarly, \((\nu \mathcal {E}_j)^{ij} = \mathcal {E}_j^{ij}\). Since \(\mathcal {X}(c_i, c_j,\Omega _i,\Omega _j;P_i,P_j)\) is a projection of the center of \(\mathcal {E}_i^{ij}\) onto the frontier of \(\mathcal {E}_j^{ij}\) and \(\mathcal {X}(\nu c_i, \nu c_j,\Omega _i,\Omega _j;\nu ^2 P_i,\nu ^2 P_j)\) is a projection of the center of \((\nu \mathcal {E}_i)^{ij}\) onto the frontier of \((\nu \mathcal {E}_j)^{ij}\), these identities imply that

$$\begin{aligned} \mathcal {U}\left( \nu c_i, \nu c_j,\Omega _i,\Omega _j;\nu ^2 P_i, \nu ^2 P_j\right) = \mathcal {U}\left( c_i, c_j,\Omega _i,\Omega _j;P_i,P_j\right) \end{aligned}$$

by Proposition 4.2, and it is possible to take

$$\begin{aligned} \mathcal {X}\left( \nu c_i, \nu c_j,\Omega _i,\Omega _j;\nu ^2 P_i, \nu ^2 P_j\right) = \mathcal {X}\left( c_i, c_j,\Omega _i,\Omega _j;P_i,P_j\right) . \end{aligned}$$

Consequently, \(f\left( \nu c_i,\nu c_j,\Omega _i,\Omega _j;\nu ^2P_i,\nu ^2P_j\right) = f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\).

Hence, the function defined in (34) is invariant with respect to the scaling of the ellipsoids. \(\square \)
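The scaling computation for \(\epsilon _{ij}\) in the proof can be verified numerically for diagonal \(P\)'s, where the extreme eigenvalues are just the extreme diagonal entries. A sketch (the helper name `eps` and the sample data are ours):

```python
import math

def eps(Pi, Pj):
    """epsilon(P_i, P_j) from Proposition 3.3 for diagonal P's, given as
    lists of diagonal entries (squared semi-axis lengths):
    lmin(P_i^-1) * lmin(P_i^(1/2)) * lmin(P_j) * lmin(P_j^(-1/2))."""
    return (1.0 / max(Pi)) * math.sqrt(min(Pi)) * min(Pj) / math.sqrt(max(Pj))

# the powers of nu cancel: nu^-2 * nu * nu^2 * nu^-1 = 1
nu = 2.5
assert abs(eps([1.0, 4.0], [2.0, 3.0])
           - eps([nu ** 2 * 1.0, nu ** 2 * 4.0],
                 [nu ** 2 * 2.0, nu ** 2 * 3.0])) < 1e-12
```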

Let \(c_{i}^{ij} \in \mathbb {R}^n\) and let \(S_{ij} \in \mathbb {R}^{n \times n}\) be a symmetric and positive definite matrix. We have defined \(\mathcal {X}\) to be the mapping that returns a solution to problem (35) and \(\mathcal {U}\) to be the mapping that returns the Lagrange multiplier associated with that solution. By Proposition 4.2, every solution to problem (35) is associated with the same Lagrange multiplier. Therefore, no matter which optimal solution to problem (35) is returned by the mapping \(\mathcal {X}\), \(\mathcal {U}\) behaves like a function.

If \(c_{i}^{ij} \notin \text {int}(\mathcal {E}_j^{ij})\) then problem (35) has only one solution. On the other hand, if \(c_{i}^{ij} \in \text {int}(\mathcal {E}_j^{ij})\) then problem (35) may have multiple optimal solutions. Figure 2a illustrates some of those cases. This picture shows an ellipse with semi-axis lengths a and b with \(a > b\). The projection of the point \(y^1\) onto the frontier of the ellipse is uniquely determined: the point \(x^1\). However, for any point \(y^2\) in the set \(\{y \in \mathbb {R}^2 \mid b^2/a - a< y_1 < a - b^2/a, y_2 = 0\}\), there are two projections: \(\bar{x}^2\) and \(\underline{x}^2\). If the ellipse is a circle, that is, \(a = b\), then every point in the frontier of the circle is a projection of the center of the circle onto the frontier. Even if we determined a single solution to be returned by \(\mathcal {X}\) whenever multiple solutions exist (an easy task, since it can be accomplished by tackling (35) with a deterministic nonlinear programming solver), \(\mathcal {X}\) would be discontinuous in those cases (see Fig. 2b). In any case, since the set of points for which problem (35) has multiple optimal solutions has measure zero, this does not appear to be an issue in practice.

Fig. 2
figure 2

a Projections of the points \(y^1\) and \(y^2\) onto the frontier of the ellipse. The projection of \(y^1\) is uniquely determined, while \(y^2\) has two projections \(\bar{x}^2\) and \(\underline{x}^2\). b The points \(y^1\) and \(y^2\) can be arbitrarily close, but their projections \(x^1\) and \(x^2\) may be far from each other

Since the constraints (33) depend on \(\mathcal {X}\) and \(\mathcal {U}\) whose values are given by the solution of an optimization problem, the computation of their derivatives (at the points where they are continuous and differentiable) is nontrivial. In “Appendix”, we show how to compute the first and second order derivatives of the functions that define the constraints (33).

5 Evaluation of the constraints

5.1 Evaluation of the overlapping measure

In order to evaluate the constraints (33) at a given point, we need to find the values of \(\mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) and \(\mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\). As we have seen, if the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap, then \(f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) in the summation in (33) is zero. Thus, if we know that the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap, then we do not need to evaluate the functions \(\mathcal {X}\) and \(\mathcal {U}\).

Suppose that we do not know whether the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) overlap. In this case, we need to compute the values of \(\mathcal {X}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) and \(\mathcal {U}(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\). For this, we need to solve the problem

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}\Big \Vert x - c_{i}^{ij}\Big \Vert ^2\\ \text {subject to} &{} x^{\top } S_{ij} x = 1, \end{array} \end{aligned}$$

where \(c_{i}^{ij} = P_i^{-\frac{1}{2}}Q_i^{\top }(c_i-c_j)\). By performing the change of variable \(w = S_{ij}^{\frac{1}{2}}x\), this problem is equivalent to the problem

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}w^{\top } S_{ij}^{-1}w - {c_{i}^{ij}}^{\top }S_{ij}^{-\frac{1}{2}}w\\ \text {subject to} &{} w^{\top } w = 1. \end{array} \end{aligned}$$
(36)

The objective function of problem (36) is a convex quadratic function and the feasible set is the frontier of the unit-radius ball centered at the origin. This problem can be numerically solved by the algorithm proposed in [36].
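The equivalence between the projection problem and (36) can be checked pointwise: the two objectives differ only by the constant \(\frac{1}{2}\Vert c_{i}^{ij}\Vert ^2\), which does not affect the minimizers, and the boundary constraint becomes the unit sphere. A sketch for a diagonal \(S_{ij}\) (whose square root is taken entrywise) with made-up data:

```python
import math

# Check that w = S^(1/2) x turns
#   (1/2)||x - c||^2   s.t.   x^T S x = 1
# into problem (36): the objectives differ by the constant (1/2)||c||^2
# and the feasible set becomes the unit sphere.  Diagonal-S sketch.
S = [4.0, 1.0]             # diagonal entries of S_ij (made-up data)
c = [0.7, -0.3]            # c_i^{ij} (made-up data)
for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    # a boundary point of the ellipse x^T S x = 1, parametrised by t
    x = [math.cos(t) / math.sqrt(S[0]), math.sin(t) / math.sqrt(S[1])]
    assert abs(S[0] * x[0] ** 2 + S[1] * x[1] ** 2 - 1.0) < 1e-12
    w = [math.sqrt(S[0]) * x[0], math.sqrt(S[1]) * x[1]]   # w = S^(1/2) x
    assert abs(w[0] ** 2 + w[1] ** 2 - 1.0) < 1e-12        # unit sphere
    f1 = 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    f2 = (0.5 * sum(wi * wi / si for wi, si in zip(w, S))
          - sum(ci * wi / math.sqrt(si) for ci, wi, si in zip(c, w, S)))
    assert abs(f1 - (f2 + 0.5 * sum(ci * ci for ci in c))) < 1e-12
```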

5.2 Efficient evaluation of the constraints

The total number of terms in the constraints (33) is \(O(m^2)\). However, most of these terms do not need to be computed when the constraints are evaluated at a point that is almost feasible, that is, a point where most of the ellipsoids do not overlap each other. In a feasible solution, only a constant number of ellipsoids may touch a given ellipsoid. For example, suppose that the ellipsoids are identical balls. In the two-dimensional case, at most six balls can touch a given ball; in the three-dimensional case, this number is twelve. For identical ellipsoids, the number of ellipsoids that can touch a given one depends on their eccentricities. In any case, in an (almost) feasible solution, only O(m) terms need to be evaluated.

If the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap, then the term associated with this pair of ellipsoids in the summation (33) is zero and does not need to be evaluated. A sufficient condition for the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) not to overlap is that their enclosing balls do not overlap. An enclosing ball of a set is a ball that contains that set. The minimal enclosing ball of \(\mathcal {E}_i\) is the ball with radius \(r_i = \lambda _{\max }(P_i^{\frac{1}{2}})\) centered at \(c_i\). Therefore, if

$$\begin{aligned} \Big \Vert c_i - c_j\Big \Vert \ge r_i + r_j \end{aligned}$$

then the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap. Let

$$\begin{aligned} R = \max _{i \in I} \{r_i\}. \end{aligned}$$

Thus, every ellipsoid \(\mathcal {E}_i\) is contained in a ball with radius R centered at \(c_i\). Therefore, if

$$\begin{aligned} \Big \Vert c_i - c_j\Big \Vert \ge 2R \end{aligned}$$
(37)

then the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap.
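This sufficient condition is a one-line test; comparing squared distances avoids the square root. A sketch (the function name is ours):

```python
def certainly_disjoint(ci, ri, cj, rj):
    """Sufficient condition for non-overlap: if the enclosing balls, with
    radius r = lambda_max(P^(1/2)) (the largest semi-axis length), do not
    overlap, neither do the ellipsoids.  Returns True only when
    disjointness is certified; False means the pair must still be checked."""
    d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
    return d2 >= (ri + rj) ** 2
```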

The method described here to identify the pairs of ellipsoids that do not meet the condition (37) has been used in other works, such as [15, 37], to identify pairs of balls that may overlap. Let \(l = 2R\) and consider a hypercube with edge length L that contains the container. This hypercube can be covered by \({\lceil L/l \rceil }^n\) hypercubes with edge length l whose interiors are mutually disjoint. We refer to each of these hypercubes with edge length l as a region. Two regions are adjacent if they share at least one vertex. Suppose that \(\mathcal {E}_i\) and \(\mathcal {E}_j\) have their centers in non-adjacent regions. In this case, since each region is a hypercube with edge length 2R, we must have \(\left\| c_i - c_j\right\| \ge 2R\), that is, the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) do not overlap. Therefore, if two ellipsoids are in non-adjacent regions, they do not overlap. If two ellipsoids are in the same region or in adjacent regions, they may or may not overlap. Hence, considering all the terms that appear in the constraints (33), we can evaluate only those that are associated with pairs of ellipsoids that lie in the same region or in adjacent regions.

Each ellipsoid can be assigned to a region in constant time based on its center. The region of the ellipsoid \(\mathcal {E}_i\) is defined as the tuple

$$\begin{aligned} (p([c_i]_1), \dots , p([c_i]_n)) \end{aligned}$$

where

$$\begin{aligned} p(x) = \min \{\max \{ 1, \lfloor x / l\rfloor \}, N_\mathrm{reg} \} \end{aligned}$$

and \(N_\mathrm{reg} = {\lceil L/l \rceil }\). The method to determine which pairs of ellipsoids should be considered works as follows. First, an n-dimensional array with \(N_\mathrm{reg}\) entries for each dimension is created. Each element of this array is associated with a region and stores a list with the indices of the ellipsoids that belong to that region. This structure can be constructed in O(m) time. Also, there is a list of the non-empty regions (regions that contain at least one ellipsoid), which is likewise constructed in O(m) time. Then, for each non-empty region and for each ellipsoid \(\mathcal {E}_i\) in that region, the term \(f(c_i,c_j,\Omega _i,\Omega _j;P_i,P_j)\) associated with the ellipsoids \(\mathcal {E}_i\) and \(\mathcal {E}_j\) is computed for each ellipsoid \(\mathcal {E}_j\) in that region and in adjacent regions. In the case where all the ellipsoids have approximately the same size, each region contains only a constant number of ellipsoids in an almost feasible solution, and the algorithm runs in O(m) time.
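The cell-assignment-plus-adjacency scan can be sketched as follows. For simplicity, the sketch hashes cells into a dictionary instead of allocating the full n-dimensional array and omits the clamping to the container, which preserves the O(m) expected behaviour but is not the paper's exact data structure:

```python
import math
from collections import defaultdict

def candidate_pairs(centers, R):
    """Cell-list filter in the spirit of Sect. 5.2: hash each center into a
    grid of cells with edge l = 2R and return only the pairs of ellipsoids
    lying in the same or adjacent cells; every discarded pair is certainly
    disjoint by condition (37).  Simplified sketch (dictionary instead of
    an n-dimensional array, no container clamping)."""
    l = 2.0 * R
    n = len(centers[0])
    cells = defaultdict(list)
    for k, c in enumerate(centers):
        cells[tuple(math.floor(x / l) for x in c)].append(k)
    pairs = set()
    for cell, members in cells.items():
        # enumerate the 3^n neighbouring cells (including the cell itself)
        offsets = [()]
        for _ in range(n):
            offsets = [t + (d,) for t in offsets for d in (-1, 0, 1)]
        for off in offsets:
            nb = tuple(ci + di for ci, di in zip(cell, off))
            for i in members:
                for j in cells.get(nb, []):
                    if i < j:
                        pairs.add((i, j))
    return pairs
```

When each cell holds a bounded number of ellipsoids, as in an almost feasible configuration of similarly sized ellipsoids, the number of returned pairs, and hence the evaluation cost of (33), is O(m).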

6 Solvability of the proposed model

In the present paper, solutions to large-sized instances of ellipsoids’ packing problems are sought. Nonlinear programming models (that involve a quadratic number of variables and constraints) for several variants of the problem of packing ellipsoids have already been introduced in the literature [8, 30,31,32,33, 38, 40]. Numerical experiments in all the mentioned papers show that state-of-the-art deterministic global optimization methods are able to find global solutions with a certificate of optimality only for instances (of the models with a quadratic number of variables and constraints) with three ellipsoids in the two- and three-dimensional spaces. In all the papers, the authors highlight how fast the difficulty of the problems increases with the number of ellipsoids. In [32], it is suggested that global optimization strategies could become prohibitively expensive for an arbitrarily increasing number of ellipsoids and that large-sized instances might be tackled with heuristic approaches.

A non-overlapping model with a linear number of variables and constraints was introduced in Sect. 4. A procedure for evaluating the model in linear expected cost was described in Sect. 5. Those two ingredients overcome the quadratic cost inherent to all the previously introduced models. However, considering the poor performance reported in the literature, this improvement is not enough to expect that state-of-the-art deterministic global optimization solvers would be able to tackle the large-sized instances that are the focus of the present work.

In any case, given that applying existing deterministic global optimization tools to a problem requires its analytical expressions to be written in an algebraic modelling language, two characteristics of the proposed non-overlapping model hinder the application of those tools to a packing problem based on it. First, the non-overlapping constraints (33) are constituted, as a whole, by the summation of a quadratic number of non-negative terms. Their efficient evaluation consists in determining, in linear expected cost, which terms may be strictly positive, discarding those that are certainly null. This procedure (described in Sect. 5) uses complex data structures and algorithms and, therefore, to the best of the authors’ knowledge, cannot be efficiently expressed in an algebraic modelling language. Second, once the terms that need to be evaluated have been identified, their evaluation requires the computation of implicit variables that are the (primal and dual) solutions to nonlinear optimization problems of the form (35). Since those subproblems are solved numerically, no analytical expressions for the implicit variables are available and, once again, this situation cannot be handled by an algebraic modelling language.

For the reasons exposed in the previous paragraphs, we concluded that the adequate tool for enhancing the probability of finding good quality solutions to large-sized instances of ellipsoid packing problems is a stochastic multi-start global optimization method. The considered method has two main ingredients: (a) specially devised random initial guesses and (b) a robust local solver developed with the intention of actively pursuing good quality local minimizers. The considered local solver, as well as the ad hoc initial guesses (which depend on the packing problem being considered), is described in detail in the following section.

7 Numerical experiments

In this section, we present experiments showing that the non-overlapping model (33) with implicit variables introduced in the present work can be used to solve instances larger than the ones solved by the transformation-based model (8)–(11) introduced in [8]. Since the former model can be solved faster (by solved we mean that a stationary point can be found), it opens the possibility of using a multi-start strategy to potentially obtain better quality solutions. On the one hand, this strategy could not be applied with the original transformation-based model for medium- or large-sized instances due to the excessive time required to perform a single local minimization from a given initial guess. For example, a single local minimization of the problem of packing 100 three-dimensional ellipsoids within a minimum volume ball took almost 20 h in the numerical experiments reported in [8]. On the other hand, the model with implicit variables is not suitable for small-sized instances because of the overhead of evaluating the non-overlapping constraints, which were designed for medium- and large-sized instances; in that case, the models introduced in [8] should be preferred. For a given ellipsoid packing problem, the decision on whether to use the model introduced in the present work or the model proposed in [8] depends on how different the ellipsoids being packed are and on their number. If there are O(m) ellipsoids that are much smaller than the largest ellipsoid, the “efficient evaluation” of constraint (33) described in Sect. 5.2 may cost \(O(m^2)\). If this is not the case, numerical experiments in [34] (which are not included here for lack of space) suggest that \(m \ge 50\) is the rule of thumb for using the model with implicit variables introduced in the present work.

We have implemented the two- and three-dimensional versions of the introduced non-overlapping models with implicit variables in Fortran 2003, as well as the optimization procedure. To solve the nonlinear programming problems, we used Algencan [1, 11] version 3.0.0, which is available for download at the TANGO Project web page (http://www.ime.usp.br/~egbirgin/tango/). The models, the optimization procedure, and Algencan were compiled with the GNU Fortran compiler (GCC) 4.7.2 with the -O3 option enabled. The experiments were run on a 2.4GHz Intel® Core i7-3770 with 16GB of RAM, running the Debian GNU/Linux 7.8 (Linux version 3.2.0-4-amd64) operating system. Our computer implementation and the solutions reported in this section are freely available at http://www.ime.usp.br/~lobato/.

Algencan is an augmented Lagrangian method for nonlinear programming that solves the bound-constrained subproblems using Gencan [2, 9, 10], an active-set method for bound-constrained minimization. Gencan adopts the leaving-face criterion described in [9], which employs the spectral projected gradients defined in [12, 13]. For the internal-to-the-face minimization, Gencan uses an unconstrained algorithm that depends on the dimension of the problem and the availability of second-order derivatives. For small problems with available Hessians, a Newtonian trust-region approach is used (see [2]); while for medium- and large-sized problems with available Hessians, a Newtonian line-search method that combines backtracking and extrapolation is used. When second-order derivatives are not available, each step of Gencan computes the direction inside the face using a line-search truncated-Newton approach with incremental quotients to approximate the matrix-vector products and memoryless BFGS preconditioners (this is the case of the problems considered in the present section, for which only first-order derivatives were coded). In all the experiments described in the present section, the local solver Algencan was run with its default parameters; the feasibility and optimality tolerances \(\varepsilon _{\text{ feas }}\) and \(\varepsilon _{\text{ opt }}\) (parameters that must be provided by the user) were both set to \(10^{-4}\). Those tolerances, related to the stopping criteria, are used to determine whether a solution (stationary point) to the optimization problem being solved has been found. See [11, pp. 116–117] for details.

Although Algencan is a local nonlinear programming solver, it was designed in such a way that global minimizers of subproblems are actively pursued, independently of the fulfillment of approximate stationarity conditions in the subproblems. In other words, Algencan’s subproblem solvers always try to find the lowest possible function values, even when this is not necessary for obtaining approximate local minimizers. As a consequence, the practical behavior of Algencan is usually well explained by the properties of its global-optimization counterparts [6]. The “preference for global minimizers” of Algencan has been discussed in [1]. This has also been observed in papers devoted to numerical experiments concerning Algencan and other solvers (see, for example, [27, 28] and the references therein). This does not mean at all that Algencan is always able to find global minimizers.

7.1 Two-dimensional packing

In this section, we consider the problem of packing ellipses within a minimum area container. Each instance is defined by the lengths of the semi-axes of the m ellipses to be packed and the type of the container. For illustration purposes, we have considered circles, ellipses, and rectangles as containers. We used the models introduced in [8] for modelling the confinement of ellipses within circles, ellipses, and rectangles. Assuming that the lengths of the semi-axes of the i-th ellipse are \(a_i\) and \(b_i\), for each \(i \in I\), \(P_i^{\frac{1}{2}}\) is the diagonal matrix whose diagonal entries are \(a_i\) and \(b_i\). For each \(i \in I\), the rotation matrix for the i-th ellipse is given by (1). Therefore, the variables of the problem that determine the center and orientation of the i-th ellipse are \(c_i \in \mathbb {R}^2\) and \(\theta _i \in \mathbb {R}\), respectively, for \(i \in I\). When the container is a circle, the additional variable r determines the radius of the circular container. When the container is an ellipse, the additional variables a and b determine the lengths of the semi-axes of the elliptical container. When the container is a rectangle, the additional variables l and w determine the length and width of the rectangular container. Depending on the type of the container, the containment model may have further additional variables. For more details about the containment models, see [8].
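The parameterization of an ellipse used above can be made concrete as follows. This is an illustrative Python sketch (the paper's code is Fortran); the function name is ours, and we assume (1) is the standard planar rotation matrix:

```python
import numpy as np

def ellipse_matrices(a, b, theta):
    """Sketch of the i-th ellipse's data: P_half = diag(a, b) is P_i^{1/2}
    and Q is the planar rotation by theta (our assumption for Eq. (1)).
    The ellipse is then the set {c + Q @ P_half @ u : ||u|| <= 1}."""
    P_half = np.diag([float(a), float(b)])
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return P_half, Q
```

With \(\theta = 0\) the rotation is the identity and the ellipse is axis-aligned with semi-axes a and b.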

7.1.1 Identical ellipses

Since this is a non-convex optimization problem, we employ a multi-start strategy in order to enhance the probability of finding good quality solutions. At each iteration of the multi-start strategy, an initial solution is constructed and then the NLP local solver Algencan is used to solve the problem starting from this initial solution. In the initial solution of the first iteration, the ellipses to be packed are not rotated and their centers are arranged in a generalisation of the hexagonal lattice for circles (see Fig. 3a). For each of the subsequent iterations, the initial solution is given by a random perturbation of the most recent feasible solution found. The centers and rotation angles of the ellipses are perturbed by at most 2.5%. A CPU time limit of 48 h was imposed for running the multi-start strategy on each considered instance.
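The multi-start loop just described can be sketched as follows. This is a minimal Python sketch with our own names: `local_solve` stands in for a call to Algencan and `perturb` for the random perturbation of centers and angles by at most 2.5%; an iteration count replaces the 48 h wall-clock budget:

```python
def multi_start(first_guess, local_solve, perturb, iterations):
    """Multi-start sketch: start from a structured initial guess, then
    restart the local solver from random perturbations of the most recent
    feasible solution found, keeping the best (smallest-area) solution."""
    x = first_guess
    best = None
    last_feasible = first_guess
    for _ in range(iterations):
        sol, area, feasible = local_solve(x)
        if feasible:
            last_feasible = sol
            if best is None or area < best[1]:
                best = (sol, area)
        # next initial guess: perturbation of the last feasible solution
        x = perturb(last_feasible)
    return best
```

The structure is solver-agnostic: only `local_solve` needs to know the packing model being minimized.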

Fig. 3

First and final solutions to the problem of packing \(m=100\) identical ellipses with semi-axis lengths \(a=2\) and \(b=1\) within a minimum area ellipse. a First initial (feasible) solution. b Initial (infeasible) solution at iteration 542. c Solution found at iteration 542

Table 1 shows the results obtained for the problem of packing \(m \in \{100, 200, 300, 400, 500,\) \(1000 \}\) identical ellipses with semi-axis lengths 2 and 1 within a minimum area ellipse. The first column shows the number m of ellipses packed. The second and third columns show the area of the container and the density of the packing in the best solution found, respectively. Areas and densities are rounded to 5 decimal places (full-precision results can be found at http://www.ime.usp.br/~lobato/). The fourth and fifth columns show the number of local minimizations and the total time spent until the best solution was found, respectively. The sixth column shows the total number of (completed) local minimizations performed within the imposed CPU time limit and the last column shows the average time per local minimization.

Table 1 Numerical results of the solutions found to the problem of packing \(m \in \{ 100, 200, 300, 400, 500, 1000 \}\) identical ellipses with semi-axis lengths \(a=2\) and \(b=1\) within a minimum area ellipse

Figure 3 shows graphical representations of the initial and the best solutions found for the problem of packing 100 ellipses within a minimum area ellipse. Figure 3a depicts the initial solution of the first iteration. Figure 3b represents the initial solution of the 542nd iteration and Fig. 3c shows the solution found in the 542nd local minimization. Figure 4 illustrates the best solutions found for the problem of packing m ellipses, for \(m \in \{200, 300, 400, 500, 1000\}\), within a minimum area ellipse.

Fig. 4

Graphical representation of the solutions found to the problem of packing \(m \in \{200, 300, 400, 500, 1000\}\) identical ellipses with semi-axis lengths \(a=2\) and \(b=1\) within a minimum area ellipse

The results we obtained for the problem of packing identical ellipses within a minimum area rectangle are shown in Table 2. Figure 5 depicts the best solutions found within the imposed CPU time limit.

Table 2 Numerical results of the solutions found to the problem of packing \(m \in \{ 100, 200, 300, 400, 500, 1000 \}\) identical ellipses with semi-axis lengths \(a=2\) and \(b=1\) within a minimum area rectangle
Fig. 5

Graphical representation of the solutions found to the problem of packing \(m \in \{100, 200, 300, 400, 500, 1000\}\) identical ellipses with semi-axis lengths \(a=2\) and \(b=1\) within a minimum area rectangle

7.1.2 Non-identical ellipses

We also dealt with the problem \(\mathcal {P}_C\) of packing m non-identical ellipses within a minimum area circle. We considered an instance formed by \(m=231\) ellipses with mutually different pairs of semi-axis lengths belonging to the set \(\{(1 + 0.2\alpha , 1 + 0.2\beta ) \mid \alpha ,\beta \in \{0,1,\dots ,20\}, \alpha \le \beta \}\). In order to create an initial solution to this problem, we considered another problem, called \(\mathcal {P}_S\), whose solution serves as a starting guess for problem \(\mathcal {P}_C\). Problem \(\mathcal {P}_S\) is the problem of packing the ellipses without overlap so that the sum of the squared distances of the ellipses’ centers to the origin of the Cartesian coordinate system is minimum. For solving problem \(\mathcal {P}_S\), we considered a multi-start strategy similar to the one used for solving the problem of packing ellipses within a minimum area ellipse. We imposed a CPU time limit of 1 h to solve problem \(\mathcal {P}_S\). Figure 6a depicts the solution found to this problem within the given time limit. This is the first initial solution to problem \(\mathcal {P}_C\). (The circle that appears around the ellipses in Fig. 6a is not related to problem \(\mathcal {P}_S\) but is part of the initial solution for problem \(\mathcal {P}_C\).) For problem \(\mathcal {P}_C\), the same multi-start strategy was used and Fig. 6b shows the best solution found after 48 h (and 135 local minimizations). In this solution, the container has a radius of approximately 49.39860.
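The objective of the auxiliary problem \(\mathcal {P}_S\) is simple to state in code. This is a minimal sketch (the function name is ours) of the objective only; the non-overlapping constraints are those of model (33):

```python
def ps_objective(centers):
    """Objective of the auxiliary problem P_S: the sum of the squared
    Euclidean distances of the ellipse centers to the origin."""
    return sum(x * x + y * y for x, y in centers)
```

Minimizing this objective pulls the ellipses into a compact cluster around the origin, which is why its solution is a sensible first guess for the minimum-area-circle problem \(\mathcal {P}_C\).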

Fig. 6

Initial and final solutions to the problem of packing \(m=231\) non-identical ellipses with semi-axis lengths belonging to the set \(\{(1 + 0.2\alpha , 1 + 0.2\beta ) \mid \alpha ,\beta \in \{0,1,\dots ,20\}, \alpha \le \beta \}\) within a minimum area circle. a Initial (feasible) solution. b Best solution found

Table 3 Numerical results of the solutions found to the problem of packing \(m \in \{ 100, 200, 300, 400, 500 \}\) identical ellipsoids with semi-axis lengths \(l_1 = 1\), \(l_2 = 0.75\), and \(l_3 = 0.5\) within a minimum volume ball
Table 4 Numerical results of the solutions found to the problem of packing \(m \in \{ 100, 200, 300, 400, 500 \}\) identical ellipsoids with semi-axis lengths \(l_1 = 1\), \(l_2 = 0.75\), and \(l_3 = 0.5\) within a minimum volume cuboid
Fig. 7

First and final solutions to the problem of packing \(m=100\) identical ellipsoids with semi-axis lengths \(l_1 = 1\), \(l_2 = 0.75\), and \(l_3 = 0.5\) within a minimum volume ball. a First initial (feasible) solution. b Initial (infeasible) solution at iteration 369. c Solution found at iteration 369

Fig. 8

Graphical representation of the solutions found to the problem of packing \(m \in \{200, 300, 400, 500\}\) identical ellipsoids with semi-axis lengths \(l_1 = 1\), \(l_2 = 0.75\), and \(l_3 = 0.5\) within a minimum volume ball

Fig. 9

Graphical representation of the solutions found to the problem of packing \(m \in \{100, 200, 300, 400, 500\}\) identical ellipsoids with semi-axis lengths \(l_1 = 1\), \(l_2 = 0.75\), and \(l_3 = 0.5\) within a minimum volume cuboid

7.2 Three-dimensional packing

In this section, we consider the problems of packing ellipsoids within a minimum volume ball and within a minimum volume cuboid. We used the models introduced in [8] for modelling the confinement of ellipsoids within balls and cuboids. We deal with instances formed by \(m \in \{100,200,300,400,500\}\) identical ellipsoids with semi-axis lengths \(l_1 = 1\), \(l_2 = 0.75\), and \(l_3 = 0.5\). For each \(i \in I\), \(P_i^{\frac{1}{2}}\) is the diagonal matrix whose diagonal entries are \(l_1\), \(l_2\), and \(l_3\). For each \(i \in I\), the rotation matrix for the i-th ellipsoid is given by (2). As in the two-dimensional experiments, we use a multi-start strategy in order to find good quality solutions. The initial solution is defined as follows. The ellipsoids are not rotated and their centers correspond to m points in the set \(\{(\delta _1 l_1, \delta _2 l_2, \delta _3 l_3) \mid \delta _1, \delta _2, \delta _3 \in \mathbb {Z}\}\) that are closest to the origin. In this way, the ellipsoids do not overlap in the initial solution. The containers (ball and cuboid) in the initial solution are the smallest ones that contain the enclosing balls of each ellipsoid. For each of the subsequent iterations, the initial solution is given by a random perturbation of the most recent feasible solution found. The centers and rotation angles of the ellipsoids were perturbed by at most 10%.
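The lattice-based initial placement described above can be sketched as follows. This is an illustrative Python sketch (names are ours, not the authors' Fortran code); the `span` parameter, which bounds the integer range enumerated, is our assumption and only needs to be large enough that the lattice contains at least m points:

```python
import itertools

def initial_centers(m, l1, l2, l3, span=10):
    """Initial-placement sketch: the m points of the lattice
    {(d1*l1, d2*l2, d3*l3) : d1, d2, d3 in Z} closest to the origin,
    used as centers of the unrotated ellipsoids in the first guess."""
    pts = [(d1 * l1, d2 * l2, d3 * l3)
           for d1, d2, d3 in itertools.product(range(-span, span + 1),
                                               repeat=3)]
    # sort by squared distance to the origin and keep the m closest points
    pts.sort(key=lambda p: p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    return pts[:m]
```

For large m a heap over lattice shells would avoid enumerating the full cube, but the plain sort keeps the idea visible.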

Table 3 exhibits the results for the minimization of the volume of the spherical container; while Table 4 presents the results for the minimization of the volume of the cuboidal container. The first column shows the number m of packed ellipsoids. The second and third columns show the volume of the container and the packing density in the best solution found, respectively. The fourth and fifth columns show the number of local minimizations (number of multi-start iterations) and the total time required to find the best solution found. The sixth and seventh columns show the number of (completed) local minimizations and the average time per local minimization within the CPU time limit of 48 h.

Figure 7 depicts some solutions that appeared during the multi-start procedure for the problem of packing \(m=100\) identical ellipsoids within a minimum volume ball. Figure 7a illustrates the first initial solution that was constructed for this problem. Figure 7b shows the initial solution for the 369th iteration and Fig. 7c displays the solution found at iteration 369. Figure 8 illustrates the solutions for the problem of packing \(m \in \{200,300,400,500\}\) identical ellipsoids within a minimum volume ball; while Fig. 9 exhibits the solutions for the problem of packing \(m \in \{100,200,300,400,500\}\) identical ellipsoids within a minimum volume cuboid.

8 Concluding remarks

To the best of the authors’ knowledge, all published nonlinear programming models for avoiding the overlapping between ellipsoids have a quadratic number of variables and constraints in the number of ellipsoids to be packed. Therefore, when the number of ellipsoids is relatively large, solving these models becomes a computationally prohibitive task. In order to alleviate this shortcoming, we proposed a model with implicit variables, inspired by the transformation-based non-overlapping model introduced in [8], that has a linear number of variables and constraints. Moreover, an algorithm for evaluating the whole model in linear expected time was also proposed. Numerical experiments showed that, by considering the proposed non-overlapping model, it is possible to tackle problems one or two orders of magnitude larger than the instances previously addressed in the literature.