Clustering Analysis of a Dissimilarity: a Review of Algebraic and Geometric Representation

Fortin, D.

doi:10.1007/s00357-019-09315-7


Published: 30 March 2019

Volume 37, pages 180–202, (2020)

Journal of Classification

D. Fortin ORCID: orcid.org/0000-0002-1188-1806¹


Abstract

It is customary to split clustering analysis into an optimization level, followed by a (preferably) graphical representation level that takes advantage of human vision for an effective understanding of the structure of (big) data. This article aims to clarify the relationships between clustering, both its process and its representation, and the underlying structural graph properties, both algebraic and geometric, starting from the mere knowledge of a dissimilarity matrix among items, possibly with missing entries. It is inspired by an analogous work on the seriation problem, which relates the Robinson property of a dissimilarity with missing entries to interval graph recognition using a sequence of 4 lexicographic breadth-first searches.


1 Introduction

Clustering plays an important role in many real-life applications, and the visualization of a clustering result is deeply related to the clustering process itself. Hierarchical analysis is the most favored approach since it may be drawn in a plane; more generally, planar representation receives much attention, especially in spectral studies: given the correlation matrix between items, the principal explanation of the relationships among items lies in the plane of the first eigenvectors; outliers, if any, are discarded by looking at the plane of the next eigenvectors, and so on. In the presence of missing entries, some attempts have been made to fill the correlation matrix appropriately from the approximated distribution. In general, it is customary to study the spectral and the original space of a problem separately; however, few studies in clustering face the issue of missing entries as such. This article aims to clarify the relationships between clustering, both its process and its representation, and the underlying structural graph properties, both algebraic and geometric, starting from the mere knowledge of a dissimilarity matrix among items, possibly with missing entries. It is inspired by an analogous work on the seriation problem, which relates the Robinson property of a dissimilarity with missing entries to interval graph recognition using a sequence of 4 lexicographic breadth-first searches (LexBFS) (Fortin 2017).

The article is structured as follows: Section 3 reviews the clustering process and Section 4 its planar representation, each from both the algebraic and the geometric points of view. Then follow our minor contributions to existing algorithms for maximum matchings and ear decompositions of simple undirected graphs: in Section 5, we establish the close relationship between augmenting paths and a sequence made of a LexBFS sweep and a lexicographic depth-first search (LexDFS) sweep; in Section 6, we do the same between non-separating ear decompositions and a pair of LexDFS sweeps. This sheds further light on the connection between the structure of these lexicographic traversals and the decompositions retrieved by long-known algorithms; it is discussed further in Section 7.

We insert many (small sized) teaching examples as illustrative witnesses of the relationships between clustering analysis as a whole and the lexicographic traversals in underlying graphs.

2 Notations and Prerequisites

Vectors (resp. matrices) are denoted by lower (resp. upper) case letters, with e (resp. E) for the all-ones case; the dot product 〈u |v〉 extends to matrices through the vec(.) operator that stacks entries columnwise, $\left \langle U\right .\left | V\right \rangle =\left \langle {\text {vec}}(U)\lvert {\text {vec}}(V)\right \rangle ={\sum }_{I}{\sum }_{J} u_{ij}v_{ij}$, where matrix scalar entries are lowercased for ease of reading; in particular, $\left \langle U\right .\left | U\right \rangle =\lVert U \rVert _{F}^{2}$ is the square of the usual Frobenius norm. We assume that the reader is familiar with graphs and matchings and has some acquaintance with concepts from computational geometry, chiefly a basis of fundamental cycles of an undirected graph. For a graph G(V,E), we refer to its maximum genus γ_M(G), its cyclomatic number β_T(G) w.r.t. (with respect to) a spanning tree T (a.k.a. (also known as) cycle rank, Betti number) and (implicitly) to its matching number ω(G) (a.k.a. Berge number). It is well known that, for a connected graph, the cardinality of a basis of fundamental cycles is β(G) = ∣ E ∣ −∣ V ∣ + 1. These invariants are involved in Sections 4.2 and 5 (Section 7.1 for the weighted case), respectively.

We call circuit a simple cycle without repetition and prefer to use rooted circuit to speak of a circuit with a distinguished vertex, over the widespread usage of a Loop in computational geometry.

Since our study mainly addresses the representation of a clustering in some plane, we recall the definition of Schnyder property in orienting an undirected graph (raised after the seminal work by Schnyder 1989).

Definition 2.1

An orientation of an undirected graph G(V,E) fulfills a generalized Schnyder property with k colors if all edges may be oriented in such a way that, for each vertex v ∈ V, the incident edges are ordered, say clockwise, and

- the k outgoing orientations are clockwise compatible,
- if an ingoing orientation occurs between the outgoing arcs colored i and i + 1 clockwise, then the ingoing arc has color i − 1 modulo k,
- no edge oriented in both directions carries the same color on both orientations.

A k-Schnyder suspension of a graph is an orientation of edges satisfying the Schnyder property with k distinguished vertices having each a distinguished extra outgoing arc (no associated edge in G) whose color satisfies Schnyder property with a different color among all distinguished vertices (see Section 4.1.2 for standard Schnyder woods with only 3 colors).

3 Clustering from a Dissimilarity

Clustering aims at grouping together items that resemble each other while separating groups as much as possible for better discrimination; in other words, we face the dilemma of minimizing the intra-group inertia while maximizing the inter-group inertia. Most algebraic approaches assume a dense matrix as input and usually fail when the matrix is sparse. In this section, we review optimization problems that address both algebraic and geometric issues on the support graph of the matrix data (whether missing entries exist or not).

3.1 Algebraic Clustering

As in most approaches, the focus is put on minimizing the intra-group inertia only: for non-negative values, the 1-norm is equivalent to the 2-norm, and for a binary variable x_ij the square $x_{ij}^{2}$ equals x_ij, whence a linear assignment problem (LAP) to be minimized on the dissimilarity (equivalently, maximized on the similarity d_max − D). However, in real-life instances, not all distances among items are known, so that matrices have missing entries (left blank in the example below, drawn from the recognition of a permuted Robinson property (Fortin 2017)). These entries cannot be properly filled with a single value (a huge value in the dissimilarity case), since doing so would force the same relationship among the items concerned, whereas the actual relationships may vary.

$$ \begin{array}{@{}rcl@{}} D=\left[\begin{array}{c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}ccc} 0& 9& 2& 11& 6& 11& 6& & 9& 11& 6& 11& 6& & 11& 11& 9& 11& 6 \\ 9& 0& 9& 11& 2& 11& & 6& 1& 11& 6& & 6& & 11& & 1& 11& 3 \\ 2& 9& 0& 11& 6& 11& 6& 6& 9& 11& 6& 11& 6& & 11& 11& & 11& 6 \\ 11& 11& 11& 0& 11& 8& 11& 11& 11& & 11& 8& & 11& 1& 8& & 2& \\ 6& 2& 6& 11& 0& 11& & 4& 2& 11& 4& & & 6& 11& 11& 2& 11& \\ 11& 11& 11& 8& 11& 0& & 11& 11& 1& 11& 5& 11& & 6& 3& 11& 6& 11 \\ 6& & 6& 11& & & 0& 4& 3& 11& 4& 11& 4& & 11& 11& 3& 11& 2 \\ & 6& 6& 11& 4& 11& 4& 0& 5& 11& 1& 11& & 4& & 11& 5& 11& 4 \\ 9& 1& 9& 11& 2& 11& 3& 5& 0& 11& 5& 11& 6& 9& 11& 11& & & 3 \\ 11& 11& 11& & 11& 1& 11& 11& 11& 0& 11& 5& & 11& 7& 2& 11& 6& 11 \\ 6& 6& 6& 11& 4& 11& 4& 1& 5& 11& 0& 11& 2& 4& 11& 11& 5& 11& 4 \\ 11& & 11& 8& & 5& 11& 11& 11& 5& 11& 0& 11& 11& 2& 5& 11& 1& 11 \\ 6& 6& 6& & & 11& 4& & 6& & 2& 11& 0& 4& 11& 11& 6& 11& \\ & & & 11& 6& & & 4& 9& 11& 4& 11& 4& 0& 11& 11& & 11& 6 \\ 11& 11& 11& 1& 11& 6& 11& & 11& 7& 11& 2& 11& 11& 0& 7& & & 11 \\ 11& & 11& 8& 11& 3& 11& 11& 11& 2& 11& 5& 11& 11& 7& 0& 11& 7& \\ 9& 1& & & 2& 11& 3& 5& & 11& 5& 11& 6& & & 11& 0& 11& 3 \\ 11& 11& 11& 2& 11& 6& 11& 11& & 6& 11& 1& 11& 11& & 7& 11& 0& \\ 6& 3& 6& & & 11& 2& 4& 3& 11& 4& 11& & 6& 11& & 3& & 0 \end{array}\right] \end{array} $$
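As a minimal sketch of how the defined entries of such a matrix give rise to a support graph, the following Python fragment stores missing entries as `None` and lists the weighted edges, optionally keeping only those below a threshold. The 4 × 4 block is the restriction of D above to items 1, 2, 7 and 8; the names `support_graph` and `threshold` are illustrative assumptions and do not come from the article.

```python
# Restriction of D to items 1, 2, 7, 8 (missing entries stored as None).
ITEMS = [1, 2, 7, 8]
D_SUB = [
    [0, 9, 6, None],
    [9, 0, None, 6],
    [6, None, 0, 4],
    [None, 6, 4, 0],
]


def support_graph(d, labels, threshold=None):
    """Return the edges (label_i, label_j, weight) of the defined entries of d.

    Only the strict upper triangle is read, so d is assumed symmetric.
    When threshold is given, entries above it are dropped as well.
    """
    edges = []
    n = len(d)
    for i in range(n):
        for j in range(i + 1, n):
            w = d[i][j]
            if w is None:
                continue  # missing entry: no edge in the support graph
            if threshold is not None and w > threshold:
                continue  # too dissimilar for the chosen level
            edges.append((labels[i], labels[j], w))
    return edges


all_edges = support_graph(D_SUB, ITEMS)
close_edges = support_graph(D_SUB, ITEMS, threshold=6)
```

Leaving a missing entry out of the edge list, rather than filling it with a large value, keeps the pairs (1, 8) and (2, 7) undetermined instead of declaring them maximally dissimilar.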

3.1.1 Linear Assignment Problem

For the non-dense and non-square case, i.e., a bipartite graph, the multiscaling (auction) algorithm (Bertsekas and Castañon 1992) has been found experimentally to be more efficient than the Hungarian method, provided the scaling is appropriately tailored (Buš and Tvrdík 2009).

For non-square m × n LAP, say landscape shape m < n, let us denote row (resp. column) indices by I (resp. J) and let us introduce extra binary primal variables to turn inequalities into equalities

$$ \begin{array}{@{}rcl@{}} && \max\left\langle{C}\lvert{X}\right\rangle \end{array} \tag{1} $$

$$ \begin{array}{@{}rcl@{}} &&\text{s.t.} \end{array} \tag{2} $$

$$ \begin{array}{@{}rcl@{}} &\pi_{i}\lvert\quad&\sum\limits_{J} x_{ij}= 1,\quad i\in I \end{array} \tag{3} $$

$$ \begin{array}{@{}rcl@{}} &p_{j}\lvert\quad&\left( x_{sj}+\sum\limits_{I} x_{ij}\right)= 1,\quad j\in J \end{array} \tag{4} $$

$$ \begin{array}{@{}rcl@{}} &\lambda\lvert\quad&\sum\limits_{J} x_{sj}=n-m \end{array} \tag{5} $$

$$ \begin{array}{@{}rcl@{}} &0\leq\zeta_{ij}&\lvert\quad x_{ij}\geq 0,\quad i\in I, j\in J \end{array} \tag{6} $$

$$ \begin{array}{@{}rcl@{}} &0\leq\sigma_{j}&\lvert\quad x_{sj}\geq 0,\quad j\in J \end{array} \tag{7} $$

where the multipliers are written on the left of each constraint and the extra variables x_sj capture the landscape shape columnwise. The dual Lagrangian reads

$$ \begin{array}{@{}rcl@{}} \sum\limits_{I}\sum\limits_{J} c_{ij}x_{ij}+\sum\limits_{I} \pi_{i}\left( 1-\sum\limits_{J}x_{ij}\right)+\sum\limits_{J} p_{j}\left( 1-x_{sj}-\sum\limits_{I}x_{ij}\right) \end{array} \tag{8} $$

$$ \begin{array}{@{}rcl@{}} \quad -\lambda\left( (n-m)-\sum\limits_{J}x_{sj}\right)+\sum\limits_{I}\sum\limits_{J}\zeta_{ij}x_{ij}+ \sum\limits_{J}\sigma_{j}x_{sj} \end{array} \tag{9} $$

$$ \begin{array}{@{}rcl@{}} x_{ij}\in\left\{0,1\right\}^{mn} \end{array} \tag{10} $$

subject to first order optimality conditions:

$$ \begin{array}{@{}rcl@{}} c_{ij}-\pi_{i}-p_{j}+\zeta_{ij}= 0\\ -p_{j}+\lambda+\sigma_{j}= 0 \end{array} $$

hence, the dual problem:

$$ \begin{array}{@{}rcl@{}} &\min& \sum\limits_{I} \pi_{i}+\sum\limits_{J} p_{j}-\lambda(n-m) \end{array} \tag{11} $$

$$ \begin{array}{@{}rcl@{}} &\text{s.t.} & \pi_{i}+p_{j}=c_{ij}+\zeta_{ij}\geq c_{ij} \end{array} \tag{12} $$

$$ \begin{array}{@{}rcl@{}} &&p_{j}=\lambda+\sigma_{j}\geq \lambda \end{array} \tag{13} $$

together with the complementary conditions at optimum $\bar \zeta _{ij}=\bar \sigma _{j}= 0$. This exhibits the separability of the primal LAP in the dual variables, namely the profit π_i and the price p_j.

Given an assignment, the auction algorithm selects $\lambda=\min_{j\ \text{assigned}} p_{j}$ and σ_j = 0 for all unassigned j, i.e., x_sj = 1; it then alternates between forward and reverse bidding:

forward: for unassigned i, let j_i = argmax(c_ij − p_j) then
$$ \begin{array}{@{}rcl@{}} \text{bid}_{i}=p_{j_{i}}+{\displaystyle\max_{1}}(c_{ij}-p_{j})-{\displaystyle\max_{2}}(c_{ij}-p_{j})+\epsilon\geq \min(p_{j_{i}}+\epsilon,\infty) \end{array} $$
for first and second maximum written max1 and max2 respectively. Price and profit are updated first as
$$ \begin{array}{@{}rcl@{}} p_{j_{i}}=&\max(\lambda,\text{bid}_{i})\quad (\text{increase})\\ \pi_{i}=&{\displaystyle\max_{2}}(c_{ij}-p_{j})-\epsilon \end{array} $$
and if bid_i ≥ λ then i is assigned to j_i, whence $\pi _{i}+p_{j_{i}}=\text {bid}_{i}+\pi _{i}=p_{j_{i}}+{\displaystyle \max _{1}}(c_{ij}-p_{j})=c_{ij_{i}}$ as required for optimality.
reverse: for unassigned j such that p_j > λ let i_j = argmax(c_ij − π_i) then
$$ \begin{array}{@{}rcl@{}} \text{bid}_{j}=\pi_{i_{j}}+{\displaystyle\max_{1}}(c_{ij}-\pi_{i})-\max(\lambda+\epsilon,{\displaystyle\max_{2}}(c_{ij}-\pi_{i}))+\epsilon \end{array} $$
similarly to forward case.
- if max1(c_ij − π_i) ≥ λ + 𝜖 then
  $$ \begin{array}{@{}rcl@{}} p_{j}&=&\max(\lambda+\epsilon,{\displaystyle\max_{2}}(c_{ij}-\pi_{i}))-\epsilon\\ \pi_{i_{j}}&=&\text{bid}_{j}\quad (\text{increase}) \end{array} $$
  and j is assigned to i_j so that $p_{j}+\pi _{i_{j}}=c_{i_{j}j}$ as required for optimality.
- otherwise p_j = max1(c_ij − π_i) − 𝜖 < λ; notice that, since j is no longer examined, we may degrade p_j = λ.

The multiscaling scheme is determined by the way 𝜖 is driven to 0 (Buš and Tvrdík 2009).

To avoid the cycling of the forward-reverse auction, we must ensure an irreversible progress before the switching between the forward and reverse stages. This is guaranteed if the switching is done only when the assignment is enlarged in the current forward or reverse stage.

If at least one feasible assignment exists for the square n × n LAP with integer costs, then the scaled Forward Reverse Auction Algorithm (FRAA) has been proven to return an optimal assignment in O(n³ log(nc_max)) steps. Among other algorithms, FRAA offers an easy parallelization for the huge cost matrices of real-life applications (biological data).
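The forward stage can be illustrated by the following minimal Python sketch. It is a simplification under stated assumptions: a square, dense n × n profit matrix to be maximized, forward bidding only, and a fixed 𝜖 without scaling, so it is neither the forward-reverse scheme nor the multiscaling variant discussed above. All function and variable names are hypothetical. With integer profits and 𝜖 < 1/n, the classical analysis of the auction algorithm guarantees that the final assignment is optimal.

```python
def forward_auction(c, eps):
    """Forward-only auction for a square profit matrix c (maximization).

    Returns (assigned, price), where assigned[i] is the column given to row i.
    Simplified sketch: dense square case, fixed eps, no reverse stage.
    """
    n = len(c)
    price = [0.0] * n          # p_j, one price per column
    owner = [None] * n         # owner[j] = row currently holding column j
    assigned = [None] * n      # assigned[i] = column currently held by row i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        values = [c[i][j] - price[j] for j in range(n)]
        j_best = max(range(n), key=values.__getitem__)
        best = values[j_best]                       # max_1 (c_ij - p_j)
        second = max((values[j] for j in range(n) if j != j_best),
                     default=best)                  # max_2 (c_ij - p_j)
        # Bid: raise the price of the preferred column by the gap plus eps.
        price[j_best] += best - second + eps
        previous = owner[j_best]
        if previous is not None:                    # outbid the former owner
            assigned[previous] = None
            unassigned.append(previous)
        owner[j_best] = i
        assigned[i] = j_best
    return assigned, price


profits = [[10, 5, 2], [7, 8, 3], [4, 6, 9]]
assignment, prices = forward_auction(profits, eps=0.25)   # 0.25 < 1/3
total = sum(profits[i][j] for i, j in enumerate(assignment))
```

The price increase `best - second + eps` is the bidding increment of the forward step; the strictly positive 𝜖 is what prevents two rows from exchanging the same column forever at equal values.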

3.1.2 Subtour Patching

The idea of subtour patching is borrowed from the traveling salesperson problem (TSP): start by finding an optimal assignment ϕ for the linear assignment problem. If ϕ is a tour, it is clearly a shortest tour and the TSP is solved to optimality. Otherwise, ϕ consists of several subtours (also called 2-factors). In this case, patch the subtours together to yield a single tour, aiming at an optimal tour. In summary, the problem is: given an optimal assignment ϕ, find a permutation α such that ϕ ∘ α is an optimal tour (Burkard et al. 1998; Deineko 2004; Deineko et al. 2006).

Let ϕ be a permutation with two subtours ϕ₁ and ϕ₂ where i ∈ ϕ₁ and j ∈ ϕ₂ hold. Essentially, the subtour patching scheme relies on the fact that if we postmultiply ϕ by the transposition (i,j), this will result in a permutation where the two subtours ϕ₁ and ϕ₂ are patched together. Hence, the number of subtours of ϕ ∘ (i,j) is one less than the number of subtours of ϕ. A permutation α will be called a patching permutation for ϕ if ϕ ∘ α is a cyclic permutation.

Permutation α is called an optimal patching permutation for ϕ if ϕ ∘ α is an optimal tour. The theory of subtour patching is based on the result of Gilmore and Gomory that every tree permutation on a certain patching graph is a patching permutation; then, by dynamic programming, a patching permutation is retrieved by multiplying adjacent transpositions among subtours. Notice that under special four-point conditions on the cost matrix, the exponential neighborhood searched in polynomial time may shrink, e.g., for Kalmanson, Supnick, and van der Veen structured matrices. In the clustering case, there is no need to patch all subtours together; the aim is rather to balance cluster sizes. Therefore, we define a patching graph G(V,E) among tiny 2-factors, with one vertex per 2-factor and multiple edges weighted by all defined entries of the dissimilarity between 2-factors. In the above example, consider the assignment whose 2-factors are, in disjoint cycle notation, (1,3), (2,9,5), (4,15), (6,10), (7,19,17,16,12,18,14,13), (8,11), with the patching graph shown in Fig. 1. Recall that clustering aims at grouping together items that resemble each other and at separating groups as much as possible for better discrimination; therefore, it makes sense to threshold the patching graph at a small to medium dissimilarity level.
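The effect of postmultiplying by a transposition can be checked on the 2-factors listed above. The following Python sketch is only an illustration (the helper names are hypothetical): it builds ϕ from its disjoint cycles, forms ϕ ∘ (i,j) by applying the transposition first, and counts subtours before and after.

```python
TWO_FACTORS = [(1, 3), (2, 9, 5), (4, 15), (6, 10),
               (7, 19, 17, 16, 12, 18, 14, 13), (8, 11)]


def from_cycles(cycles):
    """Build the permutation (as a dict) whose disjoint cycles are given."""
    phi = {}
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            phi[a] = b
    return phi


def count_cycles(phi):
    """Number of disjoint cycles (subtours) of the permutation phi."""
    seen, count = set(), 0
    for start in phi:
        if start in seen:
            continue
        count += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = phi[x]
    return count


def patch(phi, i, j):
    """Return phi o (i, j): apply the transposition first, then phi."""
    tau = {x: x for x in phi}
    tau[i], tau[j] = j, i
    return {x: phi[tau[x]] for x in phi}


phi = from_cycles(TWO_FACTORS)
merged = patch(phi, 3, 2)      # 3 and 2 lie in different subtours
split = patch(phi, 2, 9)       # 2 and 9 lie in the same subtour
```

Choosing i and j in different subtours merges them, whereas choosing both in the same subtour splits it; a clustering-oriented patching that balances cluster sizes could use both moves.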

As in the TSP case, the ordering of the 2-factors plays a prominent role in patching them; it therefore raises a seriation problem, whose solution could be handled by thresholding the weights as in Robinsonian dissimilarity recognition (Fortin 2017).

3.2 Geometric Clustering

The fundamental problem in distance geometry is the assigned Distance Geometry Problem (aDGP), a decision problem that, given an integer K > 0 and a connected simple edge-weighted graph G = (V,E,d) where d is a positive edge weight function $E \mapsto \mathbb {R}^{+}$, asks whether there exists a realization x that maps V to $\mathbb {R}^{K}$ such that

$$ \begin{array}{@{}rcl@{}} \lVert x(u)-x(v)\rVert = d_{uv},\quad{\text{~for all~}}~(u, v)\in E \end{array} $$

where ∥.∥ indicates an arbitrary norm (most applications use the Euclidean norm). The aDGP is an inverse problem: whereas computing some of the pairwise distances given the positions of the points is an easy task, the inverse inference (retrieving the point positions given some of the distances) is not so easy. Notice that a realization can be represented by a ∣V ∣ × K matrix, the i-th row of which is the location vector x_i of vertex i ∈ V. The even more complicated unassigned Distance Geometry Problem (uDGP) deals with a set of (possibly multiple) distances and aims at the same condition, together with assigning all the distances in the set to actual entries in the distance matrix (Duxbury et al. 2016). Were the aDGP solved for a dissimilarity, we could then apply the celebrated k-means algorithm to achieve the clustering; notice that our algebraic approach above remains valid to fix the a priori number of clusters. Notice also that the aDGP is related to the widely used multidimensional scaling.
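The easy direction of the aDGP, checking that a candidate realization reproduces the known distances, can be sketched in a few lines of Python. The example assumes the Euclidean norm and a numerical tolerance `tol`; the function name and the toy triangle are illustrative and not taken from the article.

```python
import math


def is_realization(x, edges, tol=1e-9):
    """Check ||x(u) - x(v)|| = d_uv (Euclidean norm) for every known edge.

    x maps each vertex to a tuple of K coordinates; edges is a list of
    (u, v, d_uv) triples covering only the defined distances.
    """
    return all(abs(math.dist(x[u], x[v]) - d) <= tol for u, v, d in edges)


# Toy instance with K = 2: a right triangle with sides 3, 4, 5.
known = [("a", "b", 3.0), ("a", "c", 4.0), ("b", "c", 5.0)]
good = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 4.0)}
bad = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (1.0, 4.0)}
```

Only the pairs present in `known` are tested, which mirrors the fact that the graph G need not be complete when entries of the dissimilarity are missing.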

4 Representation of a Clustering

This section is devoted to the challenging issue of representing the clusters, obtained by optimization problems, on some surface so that human vision helps the user to give a structural interpretation of the (huge) data under study; again, it is divided into algebraic and geometric tools to this end.

4.1 Algebraic Representation: Simplicial Embedding

In this section, we apply, from a computational perspective, the fundamental forms of integral matrices studied theoretically by Newman (1972); we then address the representation of these matrices on some surface (of small genus to be effective, e.g., either the sphere/plane or the torus) for visualization purposes, to help interpret the structure of the data. A number of pathological cases are provided to stress the issues: although we focus on surfaces of small genus, we exhibit relatively simple structural properties that require higher genus.

4.1.1 Singular Ideals Decomposition

The Smith Normal Form (SNF) of an integral matrix A is the diagonal matrix S such that S[i,i] divides S[i + 1,i + 1] up to the rank of A, and S = LAR where L and R are unimodular. It is polynomially computable; yet, it suffers from fast-growing entries (called expression swell) and requires arbitrary-precision integer arithmetic together with modular reduction; among others, Storjohann (1998) computes S while bounding the expression swell in terms of the determinant det(S) (see Hruz and Fortin 1993 for worst case expression swell). Storjohann (1998) gives all the complexity proofs; however, for the sake of completeness, we detail the easy cases omitted in his article. Let T be a 2 × 2 upper triangular matrix with coprime entries; using the Bézout identity for the extended $\gcd $, b₁t₁₁ + b₂t₁₂ = h where $h=\gcd (t_{11},t_{12})$ may be greater than 1, and q₁ = t₁₁/h, q₂ = t₁₂/h, we get

$$ \begin{array}{@{}rcl@{}} \left[\begin{array}{cc} 1&0\\-b_{2}q_{2}&1 \end{array}\right] \left[\begin{array}{cc} t_{11}&t_{12}\\0&t_{22} \end{array}\right] \left[\begin{array}{cc} b_{1}&-q_{2}\\b_{2}&q_{1} \end{array}\right] =\left[\begin{array}{cc} h&0\\0&q_{1}t_{22} \end{array}\right] \end{array} $$

A careful look at the induction proof shows that this case arises at the upper left corner reduction and at each lower right corner whenever a_{k−1} does not divide a_k, with the notations of the reference above. As a consequence, maintaining the induction hypothesis that a_i divides a_j for j ∈ (i,k] requires backtracking upwards in order to retrieve the largest index i ∈ [0,k − 1] such that a_i divides g = gcd(a_{i+1},…,a_k) (with the convention a₀ = 1), and then applying the 2 × 2 diagonal transform

$$ \begin{array}{@{}rcl@{}} \left[\begin{array}{cc} b_{1}& b_{2}\\-s_{2}& s_{1} \end{array}\right] \left[\begin{array}{cc} ghs_{1}& 0\\0& ghs_{2} \end{array}\right] \left[\begin{array}{cc} 1& 0\\1& 1 \end{array}\right] \left[\begin{array}{cc} 1& -b_{2}s_{2}\\0& 1 \end{array}\right] =\left[\begin{array}{cc} gh& 0\\0& ghs_{1}s_{2} \end{array}\right] \end{array} $$

where b₁s₁ + b₂s₂ = 1 after factoring out h = gcd(s₁,s₂)
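The 2 × 2 diagonal transform above can be verified numerically. In the following Python sketch, the two diagonal entries are written a = d·s₁ and b = d·s₂ with d = gcd(a,b) playing the role of gh; positive entries are assumed, and the helper names are hypothetical. The right factor is the product of the two right-hand matrices of the display.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def matmul2(p, q):
    """Product of two 2 x 2 integer matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def smith_2x2_diagonal(a, b):
    """Unimodular L, R with L * diag(a, b) * R = diag(gcd, lcm), a, b > 0."""
    d, _, _ = extended_gcd(a, b)
    s1, s2 = a // d, b // d                  # coprime cofactors
    _, b1, b2 = extended_gcd(s1, s2)         # b1*s1 + b2*s2 == 1
    left = [[b1, b2], [-s2, s1]]
    # [[1, 0], [1, 1]] times [[1, -b2*s2], [0, 1]]
    right = [[1, -b2 * s2], [1, 1 - b2 * s2]]
    return left, right, [[d, 0], [0, d * s1 * s2]]


L, R, S = smith_2x2_diagonal(4, 6)
product = matmul2(matmul2(L, [[4, 0], [0, 6]]), R)
```

Both factors have determinant 1, since det(L) = b₁s₁ + b₂s₂ and det(R) = (1 − b₂s₂) + b₂s₂, so the transform is unimodular on both sides.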

It is well known that the SNF is unique but the left and right unimodular matrices are not. It is customary to use the Hermite normal form (HNF) to deal with the principal submatrix, i.e., a left-sided unimodular transform, and then to apply double-sided unimodular reduction to yield the SNF as in the 2 × 2 cases above. However, we may apply double-sided reductions from the very beginning, using different pivoting strategies to find an upper trapezoidal matrix: either partial pivoting along the row and the column, or full pivoting within the lower right rectangular matrix (much as in Gaussian inversion over a field); furthermore, either the minimum or the maximum in absolute value may be chosen, and the influence of this choice on expression swell varies. At the end of the first phase, we get a simplicial decomposition:

$$ \begin{array}{@{}rcl@{}} LAR=\left[\begin{array}{cc} D&C\\0&0 \end{array}\right] \end{array} $$

where D is diagonal with D[i,i] dividing D[i + 1,i + 1], and C is rectangular and reduced rowwise w.r.t. the diagonal element. By analogy with the field case (QRP, resp. QR decomposition), we call it a singular (resp. principal) ideals decomposition according to the double-sided (resp. Hermite) triangular initialization. Since every column in C is a convex combination of the columns in D, the name simplicial decomposition is justified. In a second phase, C is zeroed rowwise using the extended $\gcd $ w.r.t. the diagonal element, leading to a square lower triangular matrix whose transpose feeds the induction phase once more, yielding the unique SNF. The example below shows the factorization of the matrix on the right-hand side; the expression swell of the entries of the left and right factors is displayed in Table 1.

$$ \begin{array}{@{}rcl@{}} \left[\begin{array}{cccccccc} 8& 0 & 0 & 0 & 0 & 0 & 0& 0\\ 0& 16& 0 & 0 & 0 & 0 & 0& 0\\ 0& 0 & 16& 0 & 0 & 0 & 0& 0\\ 0& 0 & 0 & 16 & 0 & 0 & 0& 0\\ 0& 0 & 0 & 0 & 16& 0 & 0& 0\\ 0& 0 & 0 & 0 & 0 & 128& 0& 0 \end{array}\right]=L \left[\begin{array}{cccccccc} 72 & 120& 120& 88& 56 & 88& 72 & 56 \\ 104& 72 & 24 & 56& 8 & 72& 88 & 72 \\ 56 & 56 & 40 & 40& 120& 72& 24 & 120\\ 72 & 120& 120& 88& 56 & 56& 120& 8 \\ 8 & 104& 72 & 88& 72 & 88& 40 & 88 \\ 104& 8 & 88 & 56& 72 & 72& 104& 88 \end{array}\right]R \end{array} $$

After the first phase, all pivoting policies yield the same upper left decomposition but a different lower right matrix:

$$ \begin{array}{@{}rcl@{}} \left[\begin{array}{cc}\text{diag}(\left[\begin{array}{cc}8,16,16,16 \end{array}\right])& 0\\ 0& B \end{array}\right] \end{array} $$

Table 1 Expression swell for Hermite phase, then various pivoting policies for Smith phase


Since D[i,i] divides D[i + 1,i + 1] for all 1 ≤ i < k, D^‡ = D[k,k]D^{−1} is integral and R^{−1}D^‡L^{−1} defines a pseudo-inverse up to the D[k,k] scaling.
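The integrality of D^‡ is easy to check on the invariant factors of the example above, diag(8, 16, 16, 16, 16, 128). The Python sketch below stores a diagonal matrix as the list of its diagonal entries; the function name is an assumption made for illustration.

```python
def scaled_inverse(diagonal):
    """Return the diagonal of D^‡ = D[k,k] * D^{-1} for a divisibility chain.

    diagonal lists the nonzero entries D[1,1], ..., D[k,k]; each entry must
    divide the next one, which makes every quotient an integer.
    """
    for a, b in zip(diagonal, diagonal[1:]):
        if b % a != 0:
            raise ValueError("entries do not form a divisibility chain")
    top = diagonal[-1]
    return [top // d for d in diagonal]


invariant_factors = [8, 16, 16, 16, 16, 128]
dagger = scaled_inverse(invariant_factors)
# Entrywise, D * D^‡ equals D[k,k] times the identity.
scaled_identity = [d * e for d, e in zip(invariant_factors, dagger)]
```

Working with D^‡ rather than D^{−1} keeps all computations in integer arithmetic, at the price of the global factor D[k,k].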

4.1.2 Finding a Suspension

Definition 4.1

Each vertex v in a k-Schnyder suspension is associated with a vector, called its cyclomatic vector, whose i-th component is the cyclomatic number of the cone delimited by the outgoing arcs i + 1 and i − 1 w.r.t. the cyclic ordering mod k around v.

A graph is planar if and only if it has a 3-Schnyder suspension; its cyclomatic vectors then fulfill a Principal Ideals Decomposition starting from the 3 suspension vertices, so that the base of the simplex they define contains all the remaining vertices, and so does the circle circumscribed to the base; this yields a polar coordinates representation whose center is the projection of the apex on the base. The examples in Figs. 2 and 3 give a planar representation along with the SNF to show the successful case of simplicial embedding. However, both have an asteroidal triple (AT); therefore, they do not belong to the interval, permutation, trapezoid, or co-comparability graph classes, and the recognition of the graph class they belong to remains open.

Felsner and Zickfeld (2008) study 3-Schnyder and orthogonal surfaces in order to lift vertices inside the base such that edges have a 3D non-crossing representation with a single bend, called a rigid representation.

All other graphs in the sequel are non-planar; even with a k-Schnyder suspension, there may no longer be a principal ideals decomposition with the k roots, possibly with row and column rank deficiencies that preclude a direct polar embedding in the base of the simplex defined by the roots of the suspension trees. A number of examples show the spread of properties within graph classes, SNF and Schnyder’s woods: Fig. 4 for its obstruction to planarity, Fig. 5 for its non-Hamiltonian (left) / triangle-free (right) properties, Fig. 6 for its Hamiltonian (TSP-related), triangle-free and 4-colorable properties, Fig. 7 for its polynomial recognition as complete multipartite, Fig. 8 for its 4 edge-disjoint trees (Section 4.1.3), Figs. 9 and 10 for their embedding in the torus (genus 1), and finally Fig. 11 for its embedding in a surface of higher genus, addressed in Section 4.2.

$$ \begin{array}{@{}rcl@{}} H=\left[\begin{array}{cccccc} 2& 0& 8 & 0 & 8 & 8 \\ 0& 2& 2 & 2 & 2 & 2 \\ 0& 0& 12& 0 & 12 & 12\\ 0& 0& 0 & 0 & 0 & 0 \end{array}\right] \end{array} $$

Instead, we are looking for a planar immersion in the simplex base at largest principal ideal given by SNF S, say hyperplane 〈e|x〉 = S[k,k], since the cyclomatic vectors C rewrite, using decomposition of left and right inverses accordingly:

$$ \begin{array}{@{}rcl@{}} &\left[\begin{array}{cc} \text{diag}(S)& 0\\0& 0 \end{array}\right]=L C R,\quad \det(L)=\det(R)=\pm 1\\ &\left\{\begin{array}{l} L{^{-1}}=\left[\begin{array}{l} L_{1}\\L_{2} \end{array}\right]\\ R{^{-1}}=\left[\begin{array}{cc} R_{1}&R_{2} \end{array}\right] \end{array}\right.\\ &C=R_{1} \text{diag}(S)L_{1} \end{array} $$

It may eventually be required to search for a grid (more generally, lattice) immersion in the selected base; in this event, shortest/closest lattice vector problems are involved (Hanrot et al. 2011). In any case, the rank-1 deficiency remains intractable w.r.t. algebraic processing, although a neat structure is available.

4.1.3 Edge Disjoint Trees

Let us call a k-Schnyder suspension standard if and only if every edge has a single orientation; to standardize, duplicate each bicolored edge to get single-colored edges, then assign endpoints appropriately around the outgoing edges to fulfill the Schnyder property. Notice that this simply introduces multiple edges in the original graph, but it results in distorted principal ideals. It yields a representation of the dissimilarity with k edge-disjoint trees in the spirit of Bailey et al. (2014), Durocher and Mondal (2015), Kundu (1974), Kaiser (2012), and Li et al. (2015); however, it changes the algebraic structure, since the principal ideals are different, and the geometric structure as well, by introducing a torsion around the vertices during the standardization of bicolored edges (see Fig. 8).

4.2 Geometric Representation: Embedding in Orientable Surface with Higher Genus

Since the Chvátal graph is 4-connected, it is upper-embeddable (Jæger and Kundu, see Xuong 1979) in an orientable surface of genus $\left \lfloor \frac {\beta (G)}{2}\right \rfloor = 6$; no matter how the edges wrap around the canonical fundamental polygon, we arrive at the planar representation inside the polygon shown in Fig. 11. However, all crossings occur outside the fundamental polygon, so that the actual structure is hidden. Various crossing numbers have been defined, especially the bundled crossing number, where edges are grouped for ease of visualization (Alam et al. 2016; Holten and van Wijk 2009) in the spirit of our study, and more generally crossing families (Aronov et al. 1994; Mohar 2009).
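The genus value quoted above follows from the cyclomatic number of Section 2. The short Python sketch below reproduces the arithmetic; it relies on the standard facts, not restated in the text, that the Chvátal graph is connected with 12 vertices and 24 edges. Function names are illustrative.

```python
def cyclomatic_number(num_vertices, num_edges, num_components=1):
    """beta(G) = |E| - |V| + c, with c the number of connected components."""
    return num_edges - num_vertices + num_components


def upper_embeddable_genus(beta):
    """Maximum genus floor(beta / 2) of an upper-embeddable graph."""
    return beta // 2


# Assumed counts for the Chvatal graph: 12 vertices, 24 edges (4-regular).
beta_chvatal = cyclomatic_number(12, 24)
genus_chvatal = upper_embeddable_genus(beta_chvatal)
```

Here β(G) = 24 − 12 + 1 = 13, hence a maximum genus of 6, in agreement with the value given in the text.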

Despite an extensive literature on embedding in orientable surfaces of higher genus (Ren et al. 2009; Cabello et al. 2016; Johnson 1975), only a few works address planar embedding in small genus (Castelli Aleardi et al. 2009; Gonçalves and Lévêque 2012), and all of them require a triangulated graph w.r.t. the orientable surface.

5 Strongly Simple Augmenting Paths

This section focuses on the matching number of a bipartite graph in connection with discovering the genus of a surface well adapted for embedding the exact or approximate solution of an underlying optimization problem. Alternating augmenting paths for bipartite graphs have been well known since Berge’s work; for simple connected graphs, we review in this section the augmenting path framework, starting from some matching. Maximum cardinality matchings play a fundamental role in studying the genus of a graph (Kotrbčík and Škoviera 2012). They are also involved in the implementation of maximum weighted matching within the blossom framework; although this augmenting phase is marginal in the whole primal-dual scheme (Kolmogorov 2009), N. Blum recently gave another implementation of the algorithm, say BLOSSOM VI, with a varying δ parameter (Blum 2015). His modifications of BFS and DFS to handle general graphs may be rephrased as LexBFS and LexDFS, respectively, using the partition refinement paradigm, whose implementation is quite easy (Habib et al. 2000).

Given some matching M in a graph G = (V,E), a vertex is M-free whenever it is not incident to a matching edge; w.l.o.g. we consider that the M-free vertices form a stable set, or else the matching could be improved by at least one edge. A path P in G is M-alternating if its edges are alternately in M and not in M; a simple M-alternating path is M-augmenting if it starts and ends with an M-free vertex. Then, clearly, the symmetric difference PΔM yields a matching of cardinality one greater than that of M, whence Berge’s original theorem. When the graph is general, it is first transformed into a directed bipartite graph $\vec {G}_{M}=(A,B,\vec {E}_{M})$ w.r.t. the matching M as follows: A and B are both copies of V and, for all (u,v) ∈ E, (u^A,v^B) and (v^A,u^B) (resp. (u^B,v^A) and (v^B,u^A)) belong to $\vec {E}_{M}$ according to whether (u,v) ∈ M (resp. ∉M). For the sake of simplicity, we number u^A = 2u (resp. u^B = 2u − 1) for layered neighbors in either LexBFS or LexDFS partition refinement.
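The augmentation step PΔM can be made concrete with a small Python sketch. Undirected edges are stored as `frozenset` pairs so that the symmetric difference does not depend on the orientation in which an edge is written; the path graph 1-2-3-4 and the helper names are illustrative assumptions.

```python
def augment(matching, path):
    """Return the symmetric difference of the path edges and the matching.

    matching is an iterable of vertex pairs; path is the vertex sequence of
    an M-augmenting path. Edges are compared as unordered pairs.
    """
    matched = {frozenset(e) for e in matching}
    path_edges = {frozenset(e) for e in zip(path, path[1:])}
    return matched ^ path_edges


def is_matching(edges):
    """True if no two edges of the collection share an endpoint."""
    covered = set()
    for edge in edges:
        if covered & edge:
            return False
        covered |= edge
    return True


# Path graph 1 - 2 - 3 - 4 with M = {(2, 3)}; vertices 1 and 4 are M-free.
old_matching = [(2, 3)]
new_matching = augment(old_matching, [1, 2, 3, 4])
```

The matched edge (2,3) leaves the matching while the two unmatched path edges enter it, which is where the gain of exactly one edge comes from.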

Definition 5.1

A path P in $\vec {G}_{M}$ is strongly simple if it is simple and for all u^A ∈ P, then u^B∉P.

Theorem 5.1

(Blum 2015) Let G = (V,E) and M a matching, then there exists an M-augmenting path in G if and only if there exists a strongly simple path in $\vec {G}_{M}=(A,B,\vec {E}_{M})$.
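The construction of $\vec {G}_{M}$ and the strongly simple test of Definition 5.1 can be sketched as follows in Python. The sketch assumes vertices numbered by positive integers and uses the numbering u^A = 2u, u^B = 2u − 1 introduced above, so that both copies of u map back to u through (x + 1) // 2; all function names are hypothetical.

```python
def copy_a(u):
    return 2 * u          # u^A


def copy_b(u):
    return 2 * u - 1      # u^B


def directed_bipartite(edges, matching):
    """Arc set of the directed bipartite graph built from G and M."""
    matched = {frozenset(e) for e in matching}
    arcs = set()
    for u, v in edges:
        if frozenset((u, v)) in matched:     # matched edge: A -> B
            arcs.add((copy_a(u), copy_b(v)))
            arcs.add((copy_a(v), copy_b(u)))
        else:                                # unmatched edge: B -> A
            arcs.add((copy_b(u), copy_a(v)))
            arcs.add((copy_b(v), copy_a(u)))
    return arcs


def is_strongly_simple(path, arcs):
    """True if path follows arcs and never uses both copies of a vertex."""
    if any(arc not in arcs for arc in zip(path, path[1:])):
        return False
    originals = [(x + 1) // 2 for x in path]
    return len(set(originals)) == len(originals)


# Path graph 1 - 2 - 3 - 4 with M = {(2, 3)}.
path_arcs = directed_bipartite([(1, 2), (2, 3), (3, 4)], [(2, 3)])
# Triangle 1 - 2 - 3 with M = {(2, 3)}: an odd cycle.
triangle_arcs = directed_bipartite([(1, 2), (2, 3), (3, 1)], [(2, 3)])
```

On the path graph, the sequence 1^B, 2^A, 3^B, 4^A (numbered 1, 4, 5, 8) is strongly simple and corresponds to the augmenting path 1, 2, 3, 4. On the triangle, the sequence 1^B, 2^A, 3^B, 1^A (numbered 1, 4, 5, 2) is simple in the directed graph but not strongly simple, since it visits both copies of vertex 1.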

We claim that an M-augmenting path, if any, is found in 2 sweeps: a LexBFS sweep for shortest distances, followed by a LexDFS sweep for longest strongly simple paths. In our opinion, this simplifies the modified BFS and DFS versions used by Blum (in the spirit of Hopcroft-Karp’s original work) to find a maximum cardinality matching in $O(\sqrt {\mid {\!V\!}\mid }(\mid {\!E\!}\mid +\mid {\!V\!}\mid ))$.

In the partition refinement paradigm of LexBFS (resp. LexDFS), the pivot neighbors are put ahead of the class they belong to (resp. collected and put in a single part behind the pivot part) while their original ordering is kept; we borrow notations from Fortin (2017) for describing the four-point condition on a vertex ordering σ of a simple undirected graph G(V,E): for all j <_σ k <_σ l (BFS), respectively for all i <_σ k <_σ l (DFS),

$$ \begin{array}{@{}rcl@{}} j?l:k &\Rightarrow& \exists i<_{\sigma} j,\quad i? k:l (\text{unsigned LexBFS})\\ i?l:k &\Rightarrow& \exists i<_{\sigma} j<_{\sigma} k,\quad j? k:l (\text{unsigned LexDFS}) \end{array} $$

where the ternary operator j?l : k means (v_j,v_l) ∈ E and (v_j,v_k)∉E for associated vertices in G. The structure of either traversal was first proven using a labeling scheme (Berry and Bordat 1998; Xu et al. 2013) and rewritten using partition refinement in Fortin (2017), i.e., without resorting to the (trivial) relationship between labels in the former paradigm and parts in the latter. Under partition refinement, a LexBFS ordering σ is retrieved in O(∣ E ∣); the actual complexity of LexDFS is not accurately known, although an amortized O(∣ E ∣) complexity is conjectured for maintaining the original ordering among the pivot neighbors belonging to the same part (Fig. 12).
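
For readers who want to experiment, here is a minimal Python sketch of both sweeps by partition refinement, together with brute-force checks of the two four points conditions. It uses the textbook refinement rule (each part is split into pivot neighbors and non-neighbors; LexBFS keeps the split in place, LexDFS moves all neighbor sub-parts ahead), which is an assumption of this sketch and not necessarily the exact variant of Fortin (2017). It runs in quadratic time and makes no attempt at the O(∣ E ∣) bound; all function names are illustrative.

```python
from itertools import combinations


def lex_sweep(adj, start_order, depth_first=False):
    """LexBFS (default) or LexDFS ordering by naive partition refinement."""
    parts = [list(start_order)]
    sigma = []
    while parts:
        pivot = parts[0].pop(0)          # first vertex of the first part
        if not parts[0]:
            parts.pop(0)
        sigma.append(pivot)
        near, far, in_place = [], [], []
        for part in parts:
            inside = [v for v in part if v in adj[pivot]]
            outside = [v for v in part if v not in adj[pivot]]
            # LexBFS: neighbors go ahead of their own part only
            in_place += [p for p in (inside, outside) if p]
            # LexDFS: neighbors go ahead of every non-neighbor part
            if inside:
                near.append(inside)
            if outside:
                far.append(outside)
        parts = near + far if depth_first else in_place
    return sigma


def satisfies_four_points(adj, sigma, depth_first=False):
    """Brute-force check of the unsigned LexBFS / LexDFS condition."""
    for a, b, c in combinations(range(len(sigma)), 3):
        first, vk, vl = sigma[a], sigma[b], sigma[c]
        if vl in adj[first] and vk not in adj[first]:   # first ? l : k
            # witness before `first` (BFS) or strictly between it and k (DFS)
            span = range(a + 1, b) if depth_first else range(a)
            if not any(vk in adj[sigma[w]] and vl not in adj[sigma[w]]
                       for w in span):
                return False
    return True


# Small example graph, given as adjacency sets.
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4, 5}, 4: {2, 3, 5}, 5: {3, 4}}
bfs = lex_sweep(adj, [1, 2, 3, 4, 5])
dfs = lex_sweep(adj, [1, 2, 3, 4, 5], depth_first=True)
```

On this graph the two sweeps differ: the LexBFS ordering is also a plain BFS ordering, whereas the LexDFS sweep visits vertex 4 right after vertex 2.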

For concrete programming, we consider even and odd parts, and we assume that matchings are associated with even to odd vertices while remaining edges are from odd to even vertices in graph $\vec {G}_{M}$; in the examples below, we denote the vertices of $\vec {G}_{M}$ on which the traversals operate by their corresponding vertex in G, appended with E (resp. O), to make clear which vertex of G is concerned.

For the LexBFS sweep, we start with two parts: odd M-free vertices in the part ahead, and all remaining odd and even vertices (except even M-free vertices since they have no neighbor) in the second part, so that the final ordering alternates even and odd vertices by the previous observation on pivot neighbors; furthermore, by the definition of $\vec {G}_{M}$ w.r.t. matched edges, even vertices are followed by exactly the same number of odd vertices. For the LexDFS sweep, we start with a single part given by the final ordering from the LexBFS sweep appended with even M-free vertices since they finish any augmenting path; the observation on pivot neighbors leads to an alternate sequence of odd and even vertices.

$$ \begin{array}{@{}rcl@{}} &&\left[\begin{array}{cccccccccccccc} {4O}& {12O}& 1E& 2E& 3E& 9E& 10E& 11E& 9O& 3O& 2O& 1O& 11O& 10O\\ \text{cont'd} & 5E & 6O& 7E& 8E& 8O& 7O& 6E& 5O \end{array}\right]\\ &&\left[\begin{array}{ccccccccccccccc} {4O}& 1E& 9O& 10E& 11O & {12E}& 5E& 6O& 7E& 8O& 6E& 5O& 8E& 7O& 9E \\\text{cont'd} & 1O& 2E& 3O& {4E}& 3E& 2O& {12O}& 11E& 10O \end{array}\right] \end{array} $$

$$ \begin{array}{@{}rcl@{}} && \left[\begin{array}{ccccccccccccccccccc} {10O}& { 15O}& 6E& 7E& 8E & 9E& 11E& 12E& 13E& 14E & 7O& 6O& 9O& 8O& 12O & 11O& 14O& 13O\\ \text{cont'd} & 16E & 1O& 2E& 5E& 5O& 2O& 1E & 3E& 4E& 16O& 4O & 3O \end{array}\right]\\ &&\left[\begin{array}{ccccccccccccccccccc} {10O}& 6E& 7O& 8E& 9O& { 10E}& 7E& 6O& 16E& 1O& 2E& 5O& 1E& 16O& 11E& 12O& 13E& 14O& { 15E} \\\text{cont'd} & 12E& 11O& 14E& 13O& 3E& 4O& 5E& 2O& 4E& 3O& 9E& 8O& { 15O} \end{array}\right] \end{array} $$

Lemma 5.1

If M-free vertices are interleaving in LexDFS ordering, then there exists a greater matching.

Proof

Let us consider the first interleaving sequence in the LexDFS odd-even ordering among M-free vertices u,v: uO,…,vE,xE,…,uE. W.l.o.g. consider uO = 1O; by LexDFS definition, there exists iO > 1O such that iO?lE : kE where kE = vE and lE = xE and uO < iO < kE; therefore, using the four points condition, there exists jO such that jO?kE : lE with iO < jO < kE. The crucial point here is that, since vE is M-free but xE is not, xE was ahead of vE in the starting LexDFS ordering and would occur before if jO were connected to lE; therefore, j = k − 1 and k = l − 1 whatever i is. Clearly, we augment the matching M from 1 to k by selecting odd to even vertices (see Fig. 13). □

This constructive proof using the four points property of the LexDFS sweep provides an alternative to Blum’s modified DFS traversal. It simplifies the BLOSSOM implementation as far as increasing the cardinality of a current matching is concerned; however, this is marginal compared with managing the blossoms among maximum cardinality matchings of different weights. To our knowledge, no comparison has been made between BLOSSOM V (a cutting algorithm with possibly an exponential number of cuts, see Section 7.1 below) and general purpose branch and bound software with an adaptive branching rule (Fig. 14) (Fortin and Tseveendorj 2009).

6 Non-separating Ear Decomposition

Finally, we address the case of any kind of graph to be embedded on a surface through the knowledge of its Betti number; to this aim, we review the ear decomposition algorithm.

Definition 6.1

An ear decomposition of a simple graph G = (V,E) is a sequence D = (P₀,…,P_k) such that P₀ is a cycle and each P_i for i > 0 is a path that intersects $\cup _{j<i} P_{j}$ in exactly its endpoints.

A short (resp. long) ear has 1 (resp. more than 1) edge and no (resp. at least 1) inner vertex. Let G_i be the subgraph induced by the vertices $V_{i} = V(\cup _{j\leq i} P_{j})$ and let its complement be denoted $\bar G_{i}$, with vertices $\bar V_{i}=V\setminus V_{i}$. An ear decomposition is non-separating if and only if $\bar G_{i}$ is connected for all i. The birth of an edge (resp. inner vertex) is the smallest path index of its appearance; it defines an ordering u <_birth v meaning birth(u) < birth(v). A path P_i is induced if there is no chord among its inner vertices.
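
The two notions above can be checked mechanically on small graphs. The Python sketch below tests Definition 6.1 and the non-separating condition; the encoding (P₀ written with its first vertex repeated at the end, adjacency given as sets, an empty complement counted as connected) and the function names are assumptions made for illustration.

```python
def is_ear_decomposition(ears):
    """Check Definition 6.1 on a list of vertex sequences.

    ears[0] is a cycle written with its first vertex repeated at the end;
    every later ear is a path given by its vertices in order.
    """
    cycle = ears[0]
    if len(cycle) < 4 or cycle[0] != cycle[-1]:
        return False
    seen = set(cycle[:-1])
    if len(seen) != len(cycle) - 1:
        return False
    for ear in ears[1:]:
        inner = ear[1:-1]
        ends_ok = (len(ear) >= 2 and ear[0] != ear[-1]
                   and {ear[0], ear[-1]} <= seen)
        inner_ok = len(set(inner)) == len(inner) and seen.isdisjoint(inner)
        if not (ends_ok and inner_ok):
            return False
        seen.update(inner)
    return True


def is_non_separating(adj, ears):
    """True if, after each ear, the uncovered vertices induce a connected graph."""
    covered = set()
    for ear in ears:
        covered.update(ear)
        rest = set(adj) - covered
        if rest:
            # depth-first search restricted to the uncovered vertices
            stack, reached = [next(iter(rest))], set()
            while stack:
                v = stack.pop()
                if v not in reached:
                    reached.add(v)
                    stack.extend(adj[v] & rest)
            if reached != rest:
                return False
    return True


# K4: three ears, in agreement with |E| - |V| + 1 = 6 - 4 + 1.
k4 = {v: {1, 2, 3, 4} - {v} for v in (1, 2, 3, 4)}
k4_ears = [[1, 2, 3, 1], [1, 4, 2], [3, 4]]
```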

Definition 6.2

Let (r,t) and (r,u) be edges of a simple connected graph G; a Mondshein sequence through (r,t) and avoiding (r,u) is an ear decomposition D of G such that (r,t) ∈ P₀, $P_{\text {birth}(u)}$ is the last long ear with the single inner vertex u that does not contain edge (r,u), and D is non-separating.

It is known that a simple connected graph G = (V,E) with edges (r,t), (r,u) ∈ E is 3-connected if and only if it has a Mondshein sequence through (r,t) and avoiding (r,u). Constructing a Mondshein sequence $D = (P_{0},\ldots ,P_{\beta (G)-1})$ through (r,t) and avoiding (r,u) relies on the following

Lemma 6.1

(Schmidt 2014) Let G = (V,E) be a simple 3-connected graph, a Mondshein sequence $D = (P_{0},\ldots ,P_{\beta (G)-1})$ through (r,t) and avoiding (r,u) can be constructed in amortized linear time O(∣E∣).

where the cyclomatic (Betti) number β(G) = ∣E∣ −∣V ∣ + 1 is the dimension of a basis of fundamental cycles; as an illustrative example, Fig. 15 shows the case for Chvátal graph.
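
As a quick check of the formula, recall that the Chvátal graph is 4-regular on 12 vertices, hence has 24 edges, so that

$$ \beta (G) = \mid {E}\mid - \mid {V}\mid + 1 = 24 - 12 + 1 = 13, $$

i.e., any ear decomposition of this graph, and in particular a Mondshein sequence, consists of 13 ears P₀,…,P₁₂.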

We claim that in 2 LexDFS sweeps, a non-separating ear decomposition of a simple 3-connected graph could be retrieved under the partition refinement.

Lemma 6.2

Using the four points property of LexDFS, the circuit P₀ is retrieved from the forward semi-umbrellas recognized along the LexDFS sweep.

Proof

For the first semi-umbrella, we get u <_σ v <_σ w in the LexDFS ordering σ with (u,v) and (u,w) in E; let v = v₁ <_σ … <_σ v_p <_σ w be the sequence in σ, then for all 1 < j ≤ p, (u,v_j)∉E (or else it contradicts the first semi-umbrella). If (v_p,w) ∈ E then we have found the circuit [u,v,…,w]; otherwise, (v_p,w)∉E, and we consider the partition refinement at pivot w: if ∃ 1 ≤ j < p such that (v_j,w) ∈ E, we get the circuit [u,…,v_j,w] (notice it happens whenever w has no more neighbor to process). Otherwise, let w₂ be the vertex following w in σ, (w,w₂) ∈ E, then proceed with the first semi-umbrella occurring after w in σ until the previous argument applies. If the enumeration stops without applying the argument, then 3-connectedness is violated. □

Gathering forward semi-umbrellas is done along the LexDFS sweep (Fig. 16) and there are at most O(∣E∣) of them, hence the LexDFS complexity for selecting P₀. Let σ = [P₋,P₀,P₊]; the second sweep starts with 2 parts, P₀ and [P₊,P₋], to keep the cycle structure in P₊ as close as possible to P₀.

Lemma 6.3

Using the four points property of LexDFS, the non-separating ear decomposition is retrieved from the second LexDFS sweep.

Proof

Since the first part P₀ is a circuit without chord, LexDFS keeps it ahead, up to a possible reordering of its vertices. By induction, we assume a non-separating ear decomposition has been found up to P_e; we just have to prove that the next path $P_{e+1}$, with endpoints in $\cup _{i=0}^{e} P_{i}$, has no chord. By contradiction, consider the first chord w.r.t. the LexDFS ordering (Fig. 17) and consider the endpoints of P_e; then there are i <_σ k <_σ l such that i?l : k; therefore ∃ j, i <_σ j <_σ k, such that j?k : l, hence either $v_{j}\in \cup _{i=0}^{e} P_{i}$ and we have found a path in the ear decomposition before $P_{e+1}$, or $v_{j}\in P_{e+1}$ and we have a chord before, contradicting our assumption. □

Remark 6.1

LexDFS enumeration of paths in a non-separating ear decomposition does not examine simple edges between different paths; they have to be added to the tail of the list, as noticed in Schmidt (2014).

Were the first sweep σ₀ to start with P₀, the second sweep σ₁ would possibly differ from the first only by the P₀ part, ahead of the remaining vertices in σ₀; see below the example in Schmidt (2014), starting from 1,3,5,4,2,6,7,8,9,10,11,12,13,14:

$$ \begin{array}{@{}rcl@{}} \left[\begin{array}{l}\sigma_{0}\\\sigma_{1} \end{array}\right] =&\left[\begin{array}{cccccccccccccc} 1& 5& 4& 2& 3& 6& 7& 8& 9& 11& 12& 13& 14& 10\\ 1& 5& 4& 2& 3& 6& 7& 8& 9& 11& 12& 13& 14& 10 \end{array}\right] \end{array} $$

7 Discussion

This section summarizes the challenging issues at both optimization and representation levels described above; furthermore, it provides perspectives to carry the methodology over input data not far from dissimilarities.

7.1 Maximum Weighted Matching

The separability of profit and price dual variables is crucial for solving the LAP optimization problem by auction on either dense or sparse costs. Notice that we can start from the optimal solution of the linear relaxation, i.e., where binary variables are replaced by continuous variables in [0,1] and then rows are assigned to columns along the decreasing ordered optimal (fractional) values until no more assignment is feasible: here, we get the optimal binary solution from the fractional solution (since it is integral), in disjoint cycle notations (1,3), (2,9), (4,15), (5,17), (6,10,16), (7,19), (8,11), (12,18), (13,14).
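
The rounding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relaxed solution is given as a dense list of lists `x`, only strictly positive entries are candidates, ties are broken in row-major order, and the names `round_assignment` and `disjoint_cycles` are hypothetical. It is not the CPLEX-based procedure used for the reported results.

```python
def round_assignment(x, eps=1e-9):
    """Assign rows to columns by decreasing fractional value x[i][j]."""
    n = len(x)
    entries = sorted(((x[i][j], i, j) for i in range(n) for j in range(n)),
                     key=lambda t: -t[0])
    col_of, used_cols = {}, set()
    for value, i, j in entries:
        if value <= eps:
            break                       # only positive entries are candidates
        if i not in col_of and j not in used_cols:
            col_of[i] = j
            used_cols.add(j)
    return col_of


def disjoint_cycles(col_of):
    """Cycle notation (1-based) of a complete assignment row -> column."""
    cycles, seen = [], set()
    for start in sorted(col_of):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i + 1)
            i = col_of[i]
        cycles.append(tuple(cycle))
    return cycles


# An integral solution made of two subtours of length 2.
x = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
cycles = disjoint_cycles(round_assignment(x))
```

When the relaxed solution is integral, as in the example reported above, the rounding simply reads off the permutation, and `disjoint_cycles` prints it in the same cycle notation.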

For maximum cardinality matching of minimum weight, we can add a sufficiently large number d_max = max|d_ij| to every edge so that if the matching found is not of maximal cardinality then a better solution can be found with greater cardinality; it leads to maximum weighted matching on cost matrix C = nd_max − D

$$ \begin{array}{@{}rcl@{}} C=\left[\begin{array}{ccccccccccccccccccc} 0& 186& 193& 184& 189& 184& 189& & 186& 184& 189& 184& 189& & 184& 184& 186& 184& 189\\ 186& 0& 186& 184& 193& 184& & 189& 194& 184& 189& & 189& & 184& & 194& 184& 192\\ 193& 186& 0& 184& 189& 184& 189& 189& 186& 184& 189& 184& 189& & 184& 184& & 184& 189\\ 184& 184& 184& 0& 184& 187& 184& 184& 184& & 184& 187& & 184& 194& 187& & 193& \\ 189& 193& 189& 184& 0& 184& & 191& 193& 184& 191& & & 189& 184& 184& 193& 184& \\ 184& 184& 184& 187& 184& 0& & 184& 184& 194& 184& 190& 184& & 189& 192& 184& 189& 184\\ 189& & 189& 184& & & 0& 191& 192& 184& 191& 184& 191& & 184& 184& 192& 184& 193\\ & 189& 189& 184& 191& 184& 191& 0& 190& 184& 194& 184& & 191& & 184& 190& 184& 191\\ 186& 194& 186& 184& 193& 184& 192& 190& 0& 184& 190& 184& 189& 186& 184& 184& & & 192\\ 184& 184& 184& & 184& 194& 184& 184& 184& 0& 184& 190& & 184& 188& 193& 184& 189& 184\\ 189& 189& 189& 184& 191& 184& 191& 194& 190& 184& 0& 184& 193& 191& 184& 184& 190& 184& 191\\ 184& & 184& 187& & 190& 184& 184& 184& 190& 184& 0& 184& 184& 193& 190& 184& 194& 184\\ 189& 189& 189& & & 184& 191& & 189& & 193& 184& 0& 191& 184& 184& 189& 184& \\ & & & 184& 189& & & 191& 186& 184& 191& 184& 191& 0& 184& 184& & 184& 189\\ 184& 184& 184& 194& 184& 189& 184& & 184& 188& 184& 193& 184& 184& 0& 188& & & 184\\ 184& & 184& 187& 184& 192& 184& 184& 184& 193& 184& 190& 184& 184& 188& 0& 184& 188& \\ 186& 194& & & 193& 184& 192& 190& & 184& 190& 184& 189& & & 184& 0& 184& 192\\ 184& 184& 184& 193& 184& 189& 184& 184& & 189& 184& 194& 184& 184& & 188& 184& 0& \\ 189& 192& 189& & & 184& 193& 191& 192& 184& 191& 184& & 189& 184& & 192& & 0 \end{array}\right] \end{array} $$
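
The shift C = nd_max − D can be reproduced with the short Python sketch below. It assumes, as the matrix displayed above suggests, that the diagonal is left at 0 and that missing entries stay missing; representing a missing entry by `None` and the name `to_max_weight_costs` are choices made for this example only.

```python
def to_max_weight_costs(D):
    """Return C = n * d_max - D off the diagonal; None marks a missing entry."""
    n = len(D)
    d_max = max(abs(d) for i, row in enumerate(D)
                for j, d in enumerate(row) if d is not None and i != j)
    shift = n * d_max
    return [[0 if i == j else (None if d is None else shift - d)
             for j, d in enumerate(row)]
            for i, row in enumerate(D)]


# Toy dissimilarity with one missing pair (0, 2).
D = [[0, 1, None], [1, 0, -2], [None, -2, 0]]
C = to_max_weight_costs(D)
```

Every known off-diagonal cost becomes at least (n − 1)d_max, so adding an edge to a matching never decreases its total weight.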

Here, all subtours in a LAP formulation should have length 2, which introduces x_ij = x_ji whenever the size is square (forcing c_ji = c_ij when one entry is missing in the cost matrix); for the bipartite case, LAP constraints are completed with ${\sum }_{J} x_{ij}\leq 1$ (resp. ${\sum }_{I} x_{ij}\leq 1$), without taking care of extra variables x_sj (resp. x_is). Finally, the BLOSSOM algorithm completes this formulation by valid inequalities on connected induced odd sets ${\sum }_{(i,j)\in \mathcal {O}} x_{ij}\geq 1$ and could start with rounding the continuous variables solutions (see Notes) without odd sets inequalities, until no more assignment is feasible; here, we get (1,3), (2,17), (4,15), (5,9), (7,19), (8,11), (12,18), (13,14).

For maximum unweighted matching, we get respectively (1,9), (2,3), (5,6), (7,8), (10,11), and (1,16), (2,5), (3,4), (6,7), (8,9), (11,12), (13,14), which lead to the LexBFS and LexDFS sweeps in Section 5.

Notice that since the seminal work by Edmonds (1987), five implementations of the Blossom algorithm with odd sets (Kolmogorov 2009) have been required in order to actually achieve the complexity $O(\sqrt {\mid {V}\mid }(\mid {E}\mid +\mid {V}\mid ))$, and maximum weighted matching is important enough to keep this branch still active (Blum 2015).

7.2 Generalized Schnyder Woods

For small-sized graphs, Schnyder woods construction is manually tractable. In general, the algorithm proceeds by selecting an outercircuit passing through all the vertices with minimum degree w.r.t. some orientation (say clockwise as in examples); then, it chooses a vertex in rounds on the outercircuit such that all its uncolored edges can be assigned a color which fulfills the Schnyder rule and whose removal does not disconnect the outercircuit (the removal of which yields the outercircuit for the next round), whence the termination and correctness of the suspension. In the planar case, a sufficient condition to meet this framework is to select a vertex which is not incident to a chord of the outercircuit, provided the graph is triangulated before starting the algorithm. Despite the geometric change introduced by the triangulation in this case, the extra edges may be removed after completing the planar drawing.

In orientable surfaces of higher genus, as could be seen in Section 4.2, a triangulation requires the knowledge of the genus so that Schnyder woods eventually depend on the triangulation, and therefore it strongly impacts the geometric representation; moreover, the choice of a vertex to process on the outercircuit requires the feasibility of the coloring together with the connectedness of the outercircuit for the next round. As it has been shown above, multiple edges do not change the algebraic/geometric structure but triangulation does; therefore, we leave as a challenging issue: how to build a k-Schnyder suspension under adding multiple edges on demand (when a vertex has degree deficiency).

We believe that the fundamental cycles (Ren et al. 2009) of the induced graph in each round, their intersection graph and their (so called) attachments on the oriented outercircuit are useful in this regard (Chambers et al. 2009; Erickson and Whittlesey 2005; Cabello et al. 2016); more precisely, an ear decomposition looks necessary in each round to select the vertex on the outercircuit that maintains both the coloring capability and the connectedness of the next outercircuit.

7.3 Preference Matrices

In Fortin (2017), dissimilarity matrices have been studied for Robinson property recognition (using 4 LexBFS sweeps). It is stated there that similarity matrices with an ultrametric property compatible with an ordering are related to co-comparability graph recognition, a problem harder than recognizing the Robinson property on dissimilarities. The discussion above on non-separating ear decomposition opens up new perspectives for recognizing the anti-Robinson property. Consider preference matrices as square non-symmetric matrices, possibly sparse, with positive entries; then, the whole discussion above (weighted matching, weighted ear decomposition) applies to such matrices. Indeed, to such a matrix P, we may associate the symmetric pencil (L,U), i.e., L = L^t and U = U^t, such that P is made of the lower triangular part of L and the upper triangular part of U. The name pencil for a pair of matrices comes from the generalization of spectral matrix algorithms (involving a matrix A and implicitly the identity I) to any pair (A,B). A constructive answer to the question whether there exists an ordering such that the permuted pencil has the (Robinsonian, anti-Robinsonian) property would refine voting systems knowledge, with the Schulze method (Schulze 2018) at one end and the (Robinsonian, anti-Robinsonian) pencil at the other.
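
The association of a symmetric pencil with a preference matrix is a simple mirroring of each triangular part, as the following Python sketch shows. The function name `symmetric_pencil` and the choice of letting L and U share the diagonal of P are assumptions made for illustration.

```python
def symmetric_pencil(P):
    """Split a square matrix P into symmetric matrices (L, U).

    L mirrors the lower triangular part of P, U mirrors the upper one;
    both share the diagonal of P.
    """
    n = len(P)
    L = [[P[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]
    U = [[P[min(i, j)][max(i, j)] for j in range(n)] for i in range(n)]
    return L, U


# A small non-symmetric preference matrix.
P = [[0, 2, 3], [5, 0, 4], [7, 8, 0]]
L, U = symmetric_pencil(P)
```

P is recovered by taking the entries of L below the diagonal and those of U above it.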

8 Conclusion

In this article, we review the clustering of a dissimilarity with missing entries, and its representation in orientable surfaces of small genus, from the algebraic and geometric points of view. Our minor contributions concern the SNF computation, where we provide details missing in the literature addressing this algebraic topic. For undirected simple graphs, we establish in a few lemmas the intimate relationships between a pair of LexBFS and LexDFS sweeps and augmenting paths to increase the cardinality of a matching on the one hand, and between a pair of LexDFS sweeps and a non-separating ear decomposition on the other hand.

It leaves open the challenging issue of constructing a k-Schnyder suspension without triangularizing the graph w.r.t. some orientable surface with a priori genus, as most authors do in computational geometry. Finally, it suggests using ear decomposition to address the relationship between the anti-Robinson property of a dissimilarity and co-comparability graph recognition, for seriation problems; the same tracks look promising as well to extend this work to preference matrices for voting systems.

This review is far from being exhaustive, both at the optimization level, where more constraints could be introduced, and at the representation level, where objects other than surfaces could benefit from human vision capabilities: to our knowledge, braids and knots have not been targeted while they could quite appropriately render entanglement among clusters.

Notes

All problems have been solved using CPLEX under IBM Academic Initiative

References

Alam, M.J., Fink, M., Pupyrev, S. (2016). The bundled crossing number (pp. 399–412). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-50106-2_31. ISBN 978-3-319-50106-2.
Aronov, B., Erdős, P., Goddard, W., Kleitman, D.J., Klugerman, M., Pach, J., Schulman, L.J. (1994). Crossing families. Combinatorica, 14(2), 127–134. https://doi.org/10.1007/BF01215345. ISSN 1439-6912.
Bailey, R.F., Newman, M., Stevens, B. (2014). A note on packing spanning trees in graphs and bases in matroids. Australasian Journal of Combinatorics, 59(1), 24–38.
Berry, A., & Bordat, J.-P. (1998). Separability generalizes Dirac’s theorem. Discrete Applied Mathematics, 84(1–3), 43–53. https://doi.org/10.1016/S0166-218X(98)00005-5. ISSN 0166-218X.
Bertsekas, D.P., & Castañon, D.A. (1992). A forward/reverse auction algorithm for asymmetric assignment problems. Computational Optimization and Applications, 1 (3), 277–297. ISSN 0926-6003.
Blum, N. (2015). Maximum matching in general graphs without explicit consideration of blossoms revisited. CoRR, arXiv:1509.04927.
Burkard, R., Deineko, V., van Dal, R., van der Veen, J., Woeginger, G. (1998). Well-solvable special cases of the traveling salesman problem: a survey. SIAM Review, 40(3), 496–546. https://doi.org/10.1137/S0036144596297514.
Buš, L., & Tvrdík, P. (2009). Towards auction algorithms for large dense assignment problems. Computational Optimization and Applications, 43(3), 411–436. https://doi.org/10.1007/s10589-007-9146-5. ISSN 0926-6003.
Cabello, S., Colin de Verdière, É., Lazarus, F. (2016). Finding shortest non-trivial cycles in directed graphs on surfaces. Journal of Computational Geometry, 7(1), 123–148.
Castelli Aleardi, L., Fusy, E., Lewiner, T. (2009). Schnyder woods for higher genus triangulated surfaces, with applications to encoding. Discrete and Computational Geometry, 42(3), 489–516. https://doi.org/10.1007/s00454-009-9169-z. https://hal.inria.fr/hal-00712046. Extended version of the article appeared in the Proc. of the ACM SoCG 2008.
Chambers, E.W., Erickson, J., Nayyeri, A. (2009). Homology flows, cohomology cuts. In Proceedings of the forty-first annual ACM symposium on theory of computing, STOC ’09. https://doi.org/10.1145/1536414.1536453. ISBN 978-1-60558-506-2. http://doi.acm.org/10.1145/1536414.1536453 (pp. 273–282). New York: ACM.
Deineko, V. (2004). New exponential neighbourhood for polynomially solvable TSPs. Electronic Notes in Discrete Mathematics, 17, 111–115. ISSN 1571-0653. https://doi.org/10.1016/j.endm.2004.03.015. http://www.sciencedirect.com/science/article/pii/S1571065304010236.
Deineko, V., Klinz, B., Woeginger, G.J. (2006). Four point conditions and exponential neighborhoods for symmetric TSP. In Proceedings of the seventeenth annual ACM-SIAM symposium on discrete algorithm, SODA ’06. ISBN 0-89871-605-5. http://dl.acm.org/citation.cfm?id=1109557.1109617 (pp. 544–553). Philadelphia: Society for Industrial and Applied Mathematics.
Durocher, S., & Mondal, D. (2015). Plane 3-trees, Embeddability and approximation. SIAM Journal on Discrete Mathematics, 29(1), 405–420. https://doi.org/10.1137/140964710.
Duxbury, P., Granlund, L., Gujarathi, S., Juhas, P., Billinge, S. (2016). The unassigned distance geometry problem. Discrete Applied Mathematics, 204, 117–132. https://doi.org/10.1016/j.dam.2015.10.029. ISSN 0166-218X, http://www.sciencedirect.com/science/article/pii/S0166218X15005168.
Edmonds, J. (1987). Paths, trees, and flowers. In I. Gessel, G.-C. Rota (Eds.), Classic Papers in Combinatorics (pp. 361–379). Boston: Birkhäuser Boston. https://doi.org/10.1007/978-0-8176-4842-8_26.
Erickson, J., & Whittlesey, K. (2005). Greedy optimal homotopy and homology generators. In Proceedings of the sixteenth annual ACM-SIAM symposium on discrete algorithms, SODA ’05. ISBN 0-89871-585-7. http://dl.acm.org/citation.cfm?id=1070432.1070581 (pp. 1038–1046). Philadelphia: Society for Industrial and Applied Mathematics.
Felsner, S., & Zickfeld, F. (2008). Schnyder woods and orthogonal surfaces. Discrete & Computational Geometry, 40(1), 103–126. https://doi.org/10.1007/s00454-007-9027-9, ISSN 1432-0444.
Fortin, D. (2017). Robinsonian matrices: recognition challenges. Journal of Classification, 34(2), 191–222.
Fortin, D., & Tseveendorj, I. (2009). A trust branching path heuristic for zero–one programming. European Journal of Operational Research, 197(2), 439–445. https://doi.org/10.1016/j.ejor.2008.06.033. ISSN 0377-2217. http://www.sciencedirect.com/science/article/pii/S0377221708004967.
Gonçalves, D., & Lévêque, B. (2012). Toroidal maps: Schnyder woods, orthogonal surfaces and straight-line representations. CoRR, arXiv:1202.0911.
Habib, M., McConnell, R., Paul, C., Viennot, L. (2000). Lex-BFS and partition refinement, with applications to transitive orientation, interval graph recognition and consecutive ones testing. Theoretical Computer Science, 234(1–2), 59–84. https://doi.org/10.1016/S0304-3975(97)00241-7. ISSN 0304-3975.
Hanrot, G., Pujol, X., Stehlé, D. (2011). Algorithms for the shortest and closest lattice vector problems (pp. 159–190). Berlin: Springer.
Holten, D., & van Wijk, J.J. (2009). Force-directed edge bundling for graph visualization. In Proceedings of the 11th Eurographics/IEEE - VGTC conference on visualization, EuroVis’09 (pp. 983–998). Chichester: The Eurographs Association & Wiley. https://doi.org/10.1111/j.1467-8659.2009.01450.x.
Hruz, T., & Fortin, D. (1993). Parallelism in Hermite and Smith normal forms. Technical report, INRIA. http://hal.inria.fr/inria-00074594.
Johnson, D.B. (1975). Finding all the elementary circuits of a directed graph. SIAM Journal on Computing, 4(1), 77–84. https://doi.org/10.1137/0204007.
Kaiser, T. (2012). A short proof of the tree-packing theorem. Discrete Mathematics, 312(10), 1689–1691. https://doi.org/10.1016/j.disc.2012.01.020.
Kolmogorov, V. (2009). Blossom v: a new implementation of a minimum cost perfect matching algorithm. Mathematical Programming Computation, 1(1), 43–67. https://doi.org/10.1007/s12532-009-0002-8. ISSN 1867-2957.
Kotrbčík, M., & Škoviera, M. (2012). Matchings, cycle bases, and the maximum genus of a graph. The Electronic Journal of Combinatorics, 19(3), 1–12.
Kundu, S. (1974). Bounds on the number of disjoint spanning trees. Journal of Combinatorial Theory, Series B, 17(2), 199–203. https://doi.org/10.1016/0095-8956(74)90087-2. ISSN 0095-8956. http://www.sciencedirect.com/science/article/pii/0095895674900872.
Li, H., Li, X., Mao, Y., Yue, J. (2015). Note on the spanning-tree packing number of lexicographic product graphs. Discrete Mathematics, 338(5), 669–673. https://doi.org/10.1016/j.disc.2014.12.007. ISSN 0012-365X. www.sciencedirect.com/science/article/pii/S0012365X14004543.
Mohar, B. (2009). The genus crossing number. ARS Mathematica Contemporanea, 2(2). ISSN 1855-3974. http://amc-journal.eu/index.php/amc/article/view/21.
Newman, M. (1972). Integral matrices. Pure and applied mathematics: a series of monographs and textbooks. New York: Academic Press. ISBN 9780125178501. https://books.google.fr/books?id=bpHglAEACAAJ.
Ren, H., Zhao, H., Li, H. (2009). Fundamental cycles and graph embeddings. Science in China Series A: Mathematics, 52(9), 1920–1926. ISSN 1862-2763. https://doi.org/10.1007/s11425-009-0041-7.
Schmidt, J.M. (2014). The Mondshein sequence (pp. 967–978). Berlin: Springer.
Schnyder, W. (1989). Planar graphs and poset dimension. Order, 5, 323–343, 12. https://doi.org/10.1007/BF00353652.
Schulze, M. (2018). The schulze method of voting. CoRR, arXiv:1804.02973.
Storjohann, A. (1998). Computing Hermite and Smith normal forms of triangular integer matrices. Linear Algebra and its Applications, 282(1), 25–45. ISSN 0024-3795. https://doi.org/10.1016/S0024-3795(98)10012-5. http://www.sciencedirect.com/science/article/pii/S0024379598100125.
Xu, S.-J., Li, X., Liang, R. (2013). Moplex orderings generated by the LexDFS algorithm. Discrete Applied Mathematics, 161(13–14), 2189–2195. ISSN 0166-218X. https://doi.org/10.1016/j.dam.2013.02.028.
Xuong, N.H. (1979). Upper-embeddable graphs and related topics. Journal of Combinatorial Theory, Series B, 26(2), 226–232. https://doi.org/10.1016/0095-8956(79)90059-5. ISSN 0095-8956. http://www.sciencedirect.com/science/article/pii/0095895679900595.

Acknowledgments

An anonymous referee interacted deeply with the manuscript, bringing many clarifications and improvements. The author is grateful to him/her for the deep and accurate reviewing.

Author information

Authors and Affiliations

Inria, 2 Rue Simone IFF, 75012, Paris, France
D. Fortin

Corresponding author

Correspondence to D. Fortin.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fortin, D. Clustering Analysis of a Dissimilarity: a Review of Algebraic and Geometric Representation. J Classif 37, 180–202 (2020). https://doi.org/10.1007/s00357-019-09315-7

Published: 30 March 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s00357-019-09315-7
