Two Simple but Efficient Algorithms to Recognize Robinson Dissimilarities

Carmona, M.; Chepoi, V.; Naves, G.; Préa, P.

doi:10.1007/s00357-023-09446-y

Two Simple but Efficient Algorithms to Recognize Robinson Dissimilarities

Special Issue: IFCS 2022
Published: 18 August 2023

(2023)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Classification Aims and scope Submit manuscript

Two Simple but Efficient Algorithms to Recognize Robinson Dissimilarities

Download PDF

M. Carmona^1,2,
V. Chepoi¹,
G. Naves¹ &
…
P. Préa ORCID: orcid.org/0000-0002-7951-8827^1,2

108 Accesses
Explore all metrics

Abstract

A dissimilarity d on a set S of size n is said to be Robinson if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. Equivalently, S admits a total order < such that $i<j<k$ implies that $d(i,j)\le d(i,k)$ and $ d(j,k)\le d(i,k) $. Intuitively, d is Robinson if S can be represented by points on a line. Recognizing Robinson dissimilarities has numerous applications in seriation and classification. Robinson dissimilarities also play an important role in the recognition of tractable cases for TSP. In this paper, we present two simple algorithms (inspired by Quicksort) to recognize Robinson dissimilarities. One of these algorithms runs in $O(n^2\log n)$; the other one runs in $O(n^3)$ in worst case and in $O(n^2)$ on average.

Robinsonian Matrices: Recognition Challenges

Article 24 June 2017

Clustering Analysis of a Dissimilarity: a Review of Algebraic and Geometric Representation

Article 30 March 2019

A Lex-BFS-Based Recognition Algorithm for Robinsonian Matrices

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

A major issue in classification and data analysis is to visualize simple geometrical and relational structures between objects based on their pairwise dissimilarities. Many applied algorithmic problems ranging from archaeological dating through DNA sequencing and numerical ecology to sparse matrix reordering and overlapping clustering involve ordering a set of objects so that closely coupled elements are placed near each other. For example, the classical seriation problem, introduced by Robinson (1951) as a tool to seriate archaeological deposits, asks to find a simultaneous ordering (or permutation) of the rows and the columns of the dissimilarity matrix with the objective that small values should be concentrated around the main diagonal as closely as possible, whereas large values should fall as far from it as possible. This goal is best achieved by considering the so-called Robinson property: a dissimilarity matrix A is said to have the Robinson property if its values increase monotonically in the rows and the columns when moving away from the main diagonal in both directions. In case of (0, 1)-matrices, the Robinson property is best known as the Consecutive One Property. A dissimilarity matrix A with Robinson property is called a Robinson matrix (or a R-matrix, see Atkins et al. ,1998). A Robinsonian matrix (called also a pre-R-matrix by Atkins et al. ,1998) is a dissimilarity matrix A which can be transformed by permuting its rows and columns to a dissimilarity matrix having the Robinson property. The permutation which leads to a matrix with the Robinson property is called a compatible order. Instead of dissimilarities, many papers on seriation consider similarities; in this case, “increase monotonically” is replaced by “decrease monotonically.”

1.1 Related Work

Due to the importance of Robinson dissimilarities in seriation and classification, the algorithmic problem of recognizing Robinsonian dissimilarities on n points attracted the interest of many authors, and several polynomial time algorithms for solving this problem have been proposed. The existing recognition algorithms can be classified into combinatorial and spectral. All combinatorial algorithms are based on the correspondence between Robinson dissimilarities and interval hypergraphs/unit interval graphs. The main difficulty arising in recognition algorithms is the existence of several compatible orders for the whole matrix or for some its submatrices (if the whole matrix is Robinson).

Historically, the first recognition algorithm was given in 1984 by Mirkin and Rodin and consists in testing if the hypergraph of balls of the dissimilarity is an interval hypergraph; it runs in $O(n^4)$ time and uses $O(n^3)$ space. Critchley & Fichet (1994) gave a simple divide-and-conquer algorithm running in $O(n^{3})$ time and using $O(n^2)$ space algorithms to recognize Robinson dissimilarities. The algorithm divides the set of points into subsets and consequently refine the obtained subsets into blocks to which the recursion is applied. Seston (2008a) presented another $O(n^{3})$ time and $O(n^2)$ space algorithm, by using threshold graphs defined by the input dissimilarity. Seston (2008b) consequently improved the complexity of the second algorithm to $O(n^2 \log n)$. Finally, in 2014, Préa & Fortin (2014) presented an algorithm running in optimal $O(n^2)$ time. The optimal complexity of the algorithm of Préa & Fortin (2014) is due to the use of the PQ-trees of Booth & Lueker (1976) as a data structure for encoding all compatible orders. Even if optimal, the algorithm of Préa & Fortin (2014) is far from being simple and efficient in practice.

Subsequently, two new recognition algorithms were proposed by Laurent & Seminaroti (2017a): they presented an algorithm of complexity $O(\alpha \cdot n)$ based on classical LexBFS traversal and divide-and-conquer paradigm (where $\alpha $ is the depth of the recursion tree, which is at most the number of distinct nonzero elements of the input matrix) and an $O(n^2\log n)$ algorithm, which extends LexBFS to weighted matrices and is used as a multisweep traversal. Finally, (Laurent et al., 2017) presented a structural characterization of Robinson matrices in terms of forbidden substructures, extending the notion of asteroidal triples in graphs to weighted graphs.

The spectral approach to the recognition of Robinson (dis)similarities was originally introduced by Atkins et al. (1998) and was subsequently used in numerous papers (see, for example, (Fogel et al., 2016) and the references therein). The method is based on the computation of the second smallest eigenvalue of the Laplacian of a similarity matrix A and of its eigenvector, which are called the Fiedler value and the Fiedler vector of the matrix A. This leads to an algorithm of complexity $O(nT(n)+n^2\log n)$ to recognize if a similarity matrix is Robinson, where T(n) is the complexity of computing the Fiedler vector of a matrix. The Fiedler vector is computed by the Lanczos algorithm, which is an iterative numerical algorithm that at each iteration performs a multiplication of the input matrix A by a vector. The authors mention that the algorithm converges in fewer than n iterations, often only $O(\sqrt{n})$. Their algorithm has complexity $O(n^4)$ in the worst case and $O(n^{3.5})$ on average.

Real data contains errors; therefore, the dissimilarity between the objects can be measured only approximately, and the resulting dissimilarity matrix fails to satisfy the Robinson property. In this case, we are led to the problem of approximating a dissimilarity by a Robinson dissimilarity. As an error measure, one can use the usual $\ell _p$-distance between two matrices of equal size. However, this $\ell _p$-fitting problem has been shown to be NP-hard for $p=1$ (see Barthélemy & Brucker (2001)) and for $p=\infty $ (see Chepoi et al., (2009)); no efficient algorithm is known for other values of p. Various heuristics for this optimization problem have been considered in (see Hubert , 1974; Hubert et al. , 2006 and papers cited therein). The approximability of this fitting problem for any $1\le p<\infty $ is open. Chepoi & Seston (2011) presented a polynomial factor 16 approximation for the $\ell _{\infty }$-fitting problem. Given the input dissimilarity matrix A, the algorithm uses the fact that the $\ell _{\infty }$-fitting problem is polynomial when the total order $\prec $ (which is derived from the existence of super-dominants) and the fact that the optimal fitting error belongs to a well-defined list of size $O(n^4)$. Running a binary search on this list, for each potential error value $\epsilon >0$, the algorithm either detects that there is no Robinson matrix approximating A with error at most $\epsilon $ or returns a Robinson approximation with factor at most $16\epsilon $.

Similarly to the classical correspondence between ultrametrics and hierarchies, there is a one-to-one correspondence between Robinson dissimilarities and pseudohierarchies due to Diday (1986); Durand & Fichet (1988). Pseudohierarchies are now classical examples of classification with overlapping classes. As pseudohierarchies are a generalization of hierarchies, Robinson dissimilarities are a generalization of ultrametrics.

Robinson dissimilarities are also linked with the Traveling Salesman Problem (TSP). First, TSP can be polynomially solved on Robinson dissimilarity matrices by returning any compatible order. Kalmanson (1975); Demidenko (1976) dissimilarity matrices are two other types of related dissimilarities on which TSP can be polynomially solved (see Deineko et al. (2014) for a study of some of these cases). They are defined quite similarly to Robinson dissimilarities: a dissimilarity on a set S is Demidenko (respectively, Kalmanson) if there exists a linear (respectively, circular) order $\{x_1,\ldots ,x_n\}$ such that $i< j<k<\ell $ implies $d(x_i, x_j) + d(x_k, x_\ell ) \le d(x_i, x_k) + d(x_j, x_\ell )$ (respectively, $d(x_i,x_k)+d(x_j,x_{\ell })\ge \max \{ d(x_i,x_j)+d(x_k,x_{\ell }),d(x_i,x_{\ell })+d(x_j,x_k)\}$). Notice that the currently best algorithm for recognizing Demidenko matrices (and finding a permutation of the points which lead to the Demidenko condition) is due to Çela et al. (2023) and is based on the recognition of Robinson dissimilarities as a subroutine..

1.2 Our Results

The recognition of Robinson dissimilarities can be viewed as a 2-dimensional version of the classical sorting problem. Therefore, it is natural to investigate in which way the sorting algorithms can be generalized to this 2-dimensional setting. The goal of this paper is to propose two new algorithms to recognize Robinson dissimilarities based on the idea of QuickSort of partitioning the elements with respect to randomly chosen pivots. In our setting, the pivots are randomly chosen pairs of points $\{ x, y\}$ and the partition consists in determining, when, say, $x<y$, which points must be located to the left of x, between x and y, and to the right of y. At the difference of QuickSort, this partition step is much less evident because of ambiguity caused by equalities between distances from x to y and from x or y to other points. We deal with this difficulty in two different ways:

Do a precise study of the ambiguities and of the use of the PQ-tree structure (see Section 2.2). This yields to an algorithm which, similarly to QuickSort, is simple and optimal ($O(n^2)$ time) on average but less efficient ($O(n^3)$ time) in the worst case.
Use the partition refinement (see Section 2.3). It appears that partition refinement was sufficient to recognize Robinson dissimilarities, and so we designed a simple algorithm without pivots and partitioning. This algorithm runs in $O(n^2\log n)$ time.

1.3 Paper’s Organization

The paper is organized as follows. In Section 2, we recall the definitions related to Robinson dissimilarities and we recall two classical tools (PQ-trees and refinement of partitions) that will be used by our algorithms. In Section 3, we will show how to partition the input S according to a pair of points. This is the main tool of the first of our algorithms. In Section 4, we describe our first algorithm, which is iterative and that we called irri for Iterative-Robinson-Recognition-with-Intervals. In Section 5, we give the second algorithm, which is recursive and that we called RRRR for Recursive-Robinson-Recognition-by-Refinement. In Appendix A, we show experimentally that Algorithm irri has complexity $O(n^2)$ on average.

2 Preliminaries

2.1 Robinsonian Dissimilarities

Let $S=\{ p_1,\ldots , p_n\}$ be a set of n elements, called points. A partial order $\prec $ on S is called linear (or total) if any two elements of S are comparable. A dissimilarity on S is a symmetric function d from $S^2$ to the nonnegative real numbers and vanishing on the main diagonal, i.e., $d(x,y)=d(y,x)\ge 0$ and $d(x,y)=0$ if $x=y$. Then, d(x, y) is called the distance between x and y, and (S, d) is called a dissimilarity space. A dissimilarity d and a linear order $\prec $ on S are called compatible if $x\prec z\prec y$ implies that $d(x,y)\ge \max \{ d(x,z),d(z,y)\}.$ If d and $\prec $ are compatible, then d is also compatible with the linear order $\prec ^{op}$ opposite to $\prec $. A dissimilarity d on S is said to be Robinson if it admits a compatible order (Robinson , 1951). Equivalently, d is Robinson if its distance matrix $D=(d(p_i,p_j))$ can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. Such a dissimilarity matrix D is called Robinson (Critchley & Fichet, 1994; Diday, 1986; Durand & Fichet, 1988; Hubert, 1974), and we will call (S, d) a Robinson space.

The ball of radius $r\ge 0$ and center $x \in S$ is the set $B_r(x):=\{y\in S: d(x,y)\le r\}$. From the definition of a Robinson dissimilarity immediately follows that d is Robinson if and only if there exists a linear order $\prec $ on S such that all balls $B_r(x)$ of (S, d) are intervals of $\prec $. Moreover, this property holds for all compatible orders. Finally, notice that if $S'\subset S$, then the restriction $d|_{S'}$ of d to $S'$ is Robinsonian and the restriction of any compatible order $\prec $ of d to $S'$ is a compatible order of $d|_{S'}$.

Basic examples of Robinson dissimilarities are the ultrametrics. Recall that d is an ultrametric if $d(x,y)\le \max \{ d(x,z),d(y,z)\}$ for all $x,y,z\in S$. Another basic example of a Robinson dissimilarity is provided by the standard line-distance between n points $p_1<\ldots <p_n$ of ${\mathbb {R}}$. Notice that any line-distance has exactly two compatible orders: the order $p_1<\ldots <p_n$ defined by the coordinates of the points and its opposite. We say that a Robinson dissimilarity d is straight if d has exactly two compatible orders. All line-distances are straight, but the converse is not true.

A set $B\subset S$ is called a block if B is an interval in any compatible order, i.e., if $x,y\in B$ and $x\prec z\prec y$ in some compatible order $\prec $ implies that $z\in B$. Two disjoint blocks are consecutive if their union is also a block. More generally, the blocks in a sequence $B_1,\ldots , B_k$ are consecutive if $B_i$ and $B_{i+1}$ are consecutive blocks for all $1\le i<k$.

Let U and V be two disjoints subsets of S. We say that U is independent from V if for all $x,y \in U$, $z\in V$, we have $d(x,z) = d(y,z)$, i.e., if the points of U cannot be distinguished from V. If U is independent from $S\setminus U$, we say that U is independent.

Notice that an independent set may not be a block (if d is the constant dissimilarity on a set S, every subset of S is independent, but the only blocks of (S, d) are S and the singletons) and that a block is not necessary independent (if $S = [n]$ and $d(i,j) = |i-j|$, then every interval of S is a block, but the only independent sets are S and the singletons).

2.2 PQ-Trees and the Consecutive One’s Property

A PQ-tree is a tree-based data structure introduced by Booth & Lueker (1976) to efficiently encode a family of permutations on a set S in which various subsets of S occur consecutively. A PQ-tree over a set S is a rooted, ordered tree T whose leaves are the elements of S and whose internal nodes are distinguished as P-nodes or Q-nodes. The children of a P-node can be arbitrarily permuted. The children of a Q-node are ordered, and the only permutation we can apply to this ordered list is to reverse it. We use the convention that P-nodes are represented by circles or ellipses and Q-nodes are represented by rectangles. For a P-node or Q-node $\alpha $ of T, we denote by $S(\alpha )$ the set of all leaves in the subtree of T rooted at $\alpha $. Two PQ-trees are said to be equivalent if one can be transformed into the other by applying a sequence of the following two equivalence transformations.

(1)
Arbitrarily permute the children of a P-node.
(2)
Reverse the children of a Q-node.

Example 1

The PQ-tree of Fig. 1 has one Q-node (the root) and one P-node $\alpha $ with $S({\alpha })=\{ 1,2,3\}$ corresponding to all permutations of the elements 1, 2, 3. Consequently, the equivalence class represented by this PQ-tree corresponds precisely to the set of 12 permutations of the forms $(\pi ,4,5,6,7)$ and $(7,6,5,4,\pi )$, where $\pi $ is any permutation on $\{ 1,2,3\}$.

PQ-trees are used in DNA sequencing (to create a contig map from DNA fragments), testing a matrix for the consecutive ones property, recognizing interval graphs, and testing planarity of a graph (for last three applications, see the original paper by Booth & Lueker (1976). Préa & Fortin (2014) used PQ-trees to encode the compatible orderings of a Robinson dissimilarity space (S, d), namely, that the set of all orders (permutations) compatible with d correspond to the equivalence class represented by a PQ-tree on S. We recall this correspondence.

A (0, 1)-matrix A has the Consecutive Ones Property (C1P) if its columns can be permuted in such a way that in all rows the 1s appear consecutively. Such an order is called compatible. If A is a C1P-matrix, then the sets of all its compatible permutations can be represented by a PQ-tree (Booth & Lueker, 1976). Let ${\textbf{B}}$ denote the set of all distinct balls of a dissimilarity space (S, d), i.e., ${\textbf{B}}:= \{B_r(x): r \in \text {Im}(d), x \in S\}$, where $\text {Im}(d):= \{\delta \in \mathbb {R}: \exists x,y \in S: d(x,y) = \delta \}$. Let $\Pi $ be the $\{0,1\}$-matrix whose columns are indexed by the points of S and rows by the balls of ${\textbf{B}}$: for $x\in S$ and $B\in {\textbf{B}}$, we define $\Pi (B,x):=1$ if $x\in B$ and $\Pi (B,x):=0$ otherwise. The following simple result establishes a link between Robinson dissimilarities and C1P-matrices:

Proposition 1

(Mirkin & Rodin, 1984) A dissimilarity d on S is Robinson if and only if the matrix $\Pi $ satisfies the C1P. There exists a bijection between the orders compatible with d and the orders compatible with $\Pi $.

Since the sets of all compatible permutations of a C1P-matrix can be represented by a PQ-tree, from Proposition 1, we obtain:

Corollary 1

The set of all orders compatible with a Robinson dissimilarity space (S, d) can be represented by a PQ-tree.

Booth & Lueker (1976) designed an iterative algorithm which determines if a matrix M has the C1P. If the answer is “yes,” the algorithm of Booth and Lueker constructs the corresponding PQ-tree. The algorithm is sketched in Algorithm 1, where

Universal-PQ-Tree([n]) returns a PQ-tree representing all n! permutations on [n]. This PQ-tree has one internal node, which is a P-node.
Given a PQ-tree T on a set S and $L\subset S$, PQ-Tree-Update(T, L) returns a PQ-tree representing all permutations encoded by T in which the elements of L are consecutive (form an interval). If there is no such permutation or if T is None, then it returns None. PQ-Tree-Update(T, L) mainly goes through T via a bottom-up traversal and modify (or not) each node accordingly to its type (P-node or Q-node), the type of its children and the repartition of the elements of L among its children.

Functions Universal-PQ-Tree and PQ-Tree-Update run in O(n), so Algorithm Booth-Lueker runs in O(nm).

2.3 Refinement Procedures

The general algorithmic paradigm of partition refinement was introduced by Paige & Tarjan (1987). For recognizing Robinson dissimilarities, this paradigm has already been used in Chepoi & Fichet (1997); Mirkin & Rodin (1984); Préa & Fortin (2014), as well as in some other papers.

Within the field of Robinson dissimilarities, the paradigm of partition refinement can be expressed in the following way: given a Robinson space (S, d) and two disjoint subsets U and V of S, refining U (with respect to V) consists in partitioning U into $U_1\cup U_2 \cup \ldots \cup U_k$ in such a way that there exists a subset $V'$ of V such that

For all $0<i\le k$ and for all $x,y \in U_i, z\in V$, the equality $d(x,z) = d(y,z)$ holds.
For all $0<i<j\le k$ and for all $x\in U_i, y\in U_j, z\in V$, if $z \in V'$, then $d(x, z) \le d(y,z)$, and if $z \in V\setminus V'$, then $d(x, z) \ge d(y,z)$.

Notice that this operation is not possible with arbitrary dissimilarity spaces, for example, if $U=\{u_1,u_2,u_3\}, V=\{v_1,v_2\}$ and $d(u_1,v_1) = d(u_1,v_2)< d(u_2,v_1) = d(u_3,v_2) < d(u_3, v_1) = d(u_2,v_2)$. However if (S, d) is Robinson and U is an interval of $U\cup V$ when $U\cup V$ is sorted along a compatible order, then it is possible to refine U with respect to V. In this case, $V'$ is the set of the points to the left of U and $V\setminus V'$ is the set of the points to the right of U (or vice-versa) or whose position is unknown; in addition, each $U_i$ is an interval of U when $U\cup V$ is sorted along a compatible order. More precisely, we will consider the procedure Refine(U, V), which

Takes as input a block U, divided into a sequence of consecutive blocks $(U_1, \ldots , U_k)$ and a set $V\subset S\setminus U$.
Returns a sequence of consecutive blocks $\mathcal {B}=(U_{1,1},\ldots , U_{1, k_1}, U_{2,1},\ldots U_{2,k_2}, \ldots , U_{k, 1}$, $\ldots , U_{k, k_k})$ and a partition $V^+ \cup V^- \cup V^\circ $ of V such that
1. (1)
  For all i, j, $U_{i,j} \subset U_i$,
2. (2)
  For all i, j, $x,y \in U_{i,j},$ and $z \in V$, $d(x,z) = d(y,z)$ holds,
3. (3)
  For all $i,j, i', j'$ such that $i<i'$ or ($i=i'$ and $j<j'$), and for all $x \in U_{i,j}, y \in U_{i',j'}$:
  1. (i)
    If $z \in V^+$, then $d(x,z) \ge d(y,z)$,
  2. (ii)
    If $z \in V^-$, then $d(x,z) \le d(y,z)$,
4. (4)
  $z \in V^\circ $ if and only if for all $x,y \in U,$ $d(x,z) = d(y,z)$ holds,
5. (5)
  For all $i,j,i',j'$ with $(i,j) \ne (i',j')$, there exists $z \in V$ such that for all $x \in U_{i,j}, y \in U_{i',j'}$, $d(x,z) \ne d(y,z)$ holds.

Notice that

Refine(U, V) runs in $O(mp\log p)$, where $p = \vert U\vert $ and $m = \vert V\vert $.
If V and $V'$ are two subsets of $S\setminus U$, refining U relatively to V and then refining U relatively to $V'$ is equivalent to refining U relatively to $V\cup V'$.

3 Partitioning S with Respect to a Pivot-Pair

Let (S, d) be a Robinson space with an arbitrary input order on S, which is not compatible with d. A pivot-pair is any pair $\{ x,y\}$ of distinct points of S. By an xy-order, we will mean any linear order $\prec $ compatible with d such that $x\prec y$. The aim of this section is to determine how the points of $S\setminus \{x, y\}$ can be located relatively to x and y in any xy-order. This defines a partition of S into 13 classes. We show that some of these classes are blocks of (S, d). Moreover, we establish the order between the resulting blocks and between the remaining classes and these blocks.

Given two points $z, t\in S\setminus \{x, y\}$, we say that z is before t relatively to the pair $\{x,y\}$ (with the notation $z\prec _{xy} t$ or $z\prec t$ if there is no ambiguity) if $z\prec t$ for every xy-order $\prec $. For two disjoint subsets U, V of S, we write $U\prec _{xy} V$ if $u\prec _{xy} v$ for any $u\in U$ and any $v\in V$. Since the reversal of any xy-order is a yx-order and that any compatible order is either an xy-order or an yx-order, $z\prec _{xy} t$ if and only if $t \prec _{yx} z$. Thus, U is a block of d if and only if U is an interval of any xy-order. We say that point $z_2$ is d-between two points $z_1$ and $z_3$ if $z_2$ is between $z_1$ and $z_3$ (i.e., $z_1\prec z_2 \prec z_3$ or $z_3\prec z_2\prec z_1$) for any compatible order. As above, $z_2$ is d-between $z_1$ and $z_3$ if and only if $z_2$ is between $z_1$ and $z_3$ for any compatible xy-order. Consequently, without loss of generality, we will further consider only xy-orders.

3.1 The Partition

$S= L\cup M\cup R \cup X\cup Y \cup A^{\circ } \cup A{^{=}}$. Let x, y be any pivot pair of S. Comparing for each other point z of S the three distances d(z, x), d(z, y), and d(x, y), we partition the set S as follows:

$$\begin{aligned} S= L\cup M\cup R \cup X\cup Y \cup A^{\circ } \cup A{^{=}}, \end{aligned}$$

where

$$\begin{aligned} L= & {} \{z \in S: d(z, y)> \max (d(z, x), d(x, y) \},\\ M= & {} \{z \in S: d(x, y)> \max (d(z, x), d(z, y) \},\\ R= & {} \{z \in S: d(z, x)> \max (d(z, y), d(x, y) \},\\ X= & {} \{z \in S: d(z, y) = d(x, y)> d(z, x) \},\\ Y= & {} \{z \in S: d(z, x) = d(x, y)> d(z, y) \},\\ A^{\circ }= & {} \{z \in S: d(z, y) = d(z, x) > d(x, y) \},\\ A{^{=}}= & {} \{z \in S: d(z, y) = d(z, x) = d(x, y) \}. \end{aligned}$$

Furthermore, we partition the set L into $L = L_{\ell } \cup L_m \cup L_r$, the set M into $M = M_{\ell } \cup M_m \cup M_r$, and the set R into $R = R_{\ell } \cup R_m \cup R_r$, where

$$\begin{aligned}{} & {} L_{\ell } = \{z \in S: d(z, y)> d(z, x)> d(x, y) \},\\{} & {} L_m = \{z \in S: d(z, y)> d(z, x) = d(x, y) \},\\{} & {} L_r = \{z \in S: d(z, y)> d(x, y)> d(z, x) \},\\{} & {} M_{\ell } = \{z \in S: d(x, y)> d(z, y)> d(z, x) \},\\{} & {} M_m = \{z \in S: d(x, y)> d(z, y) = d(z, x) \},\\{} & {} M_r = \{z \in S: d(x, y)> d(z, x)> d(z, y)\},\\{} & {} R_{\ell } = \{z \in S: d(z, x)> d(x, y)> d(z, y) \},\\{} & {} R_m = \{z \in S: d(z, x)> d(z, y) = d(x, y) \},\\{} & {} R_r = \{z \in S: d(z, x)> d(z, y) > d(x, y) \}. \end{aligned}$$

We now continue with properties of these sets. We call the sets L, M, R, X, and Y non-ambiguous and the sets $A^\circ $ and $A^=$ ambiguous. In the following three subsections (i.e., until Lemma 12), we assume that d is Robinson.

3.2 Properties of the Non-Ambiguous Sets

We consider the properties of the non-ambiguous sets L, M, R, X, and Y. Notice first that $x \in X$ and $y \in Y$.

Lemma 1

X and Y are blocks.

Proof

We will prove the result for X. Pick any $z \in X$ and let t be any point which is d-between z and x. We assert that $t \in X$. First, notice that $z \prec _{xy} y$. Indeed, if for an xy-order $\prec $ we have $x\prec y\prec z$, then $d(x, y) \le d(x, z)$, contrary to the assumption that $z\in X$. Thus, for any xy-order $\prec $, we have $z\prec t\prec x\prec y$ or $x\prec t\prec z\prec y$. In both cases, we have $d(y, t)\in [d(z, y), d(x, y)]$, yielding $d(y, t)=d(x, y)$. Since $d(x, t) \le d(x, z) < d(x, y)$, we conclude that $t \in X$. $\square $

Lemma 2

$L \prec _{xy} X \prec _{xy} M \prec _{xy} Y \prec _{xy} R$.

Proof

We first show that $L\prec _{xy} X$. Pick any $z \in L$. Since $d(x, z) < d(y, z)$, y cannot be d-between x and z and since $d(x, y)<d(z, y)$, z cannot be d-between x and y. Consequently, $z \prec _{xy} y$ and since X is a block, we get $L \prec _{xy} X$. Similarly, we deduce that $Y \prec _{xy} R$.

We now show that $X\prec _{xy} M$. Let $z \in M$. Since $d(y, z)<d(x, y)$, x cannot be d-between z and y, so $x \prec _{xy} z$, yielding $X \prec _{xy} M$. Similarly, we have $M \prec _{xy} Y$. $\square $

Lemma 3

$L_{\ell } \prec _{xy} L_m \prec _{xy} L_r$ and $R_{\ell } \prec _{xy} R_m \prec _{xy} R_r$.

Proof

Pick any $u \in L_{\ell }$, $v \in L_m$, and $w \in L_r$. Then, we have $d(x, u)> d(x, v) > d(x, w)$. Since $L \prec _{xy} x$, we obtain $z \prec _{xy} v \prec _{xy} w$. The proof of the second statement is similar. $\square $

Lemma 4

$M_{\ell } \prec _{xy} M_m \prec _{xy} M_r$.

Proof

We prove that $M_{\ell } \prec _{xy} M_m$ (the proof that $M_m \prec _{xy} M_r$ is similar). Let $z \in M_{\ell }$ and $t \in M_m$ and suppose by way of contradiction that for some xy-order $\prec $ we have $x\prec t\prec z \prec y$. Since t is between x and z, we must have $d(x, t) \le d(x, z)$. Since z is between t and y, we must have $d(y, z) \le d(y, t)$. This is impossible because $d(x, t) = d(y, t)$ and $d(x, z) < d(y, z)$. $\square $

3.3 Properties of the Ambiguous Set $A^=$

We now provide relationships between the ambiguous set $A^=$ and the non-ambiguous sets. Namely, we show that $A^=$ is “mixed” with the sets $L_r$, X, M, Y, and $R_{\ell }$.

Lemma 5

$L_m \prec _{xy} A^= \prec _{xy} R_m$.

Proof

We prove that $L_m \prec _{xy} A^=$ (the proof that $A^= \prec _{xy} R_m$ is similar). Pick any $z \in L_m$ and $t \in A^=$. Suppose by way of contradiction that for some xy-order $\prec $ we have $t\prec z$. Since $z \prec _{xy} x$, we have $z\prec x$, yielding $d(y, z)\le d(y, t)$, which is impossible because $d(y, z)>d(x, y)=d(y, t)$. $\square $

A more precise location of the set $A^=$ can be given when one of the sets $L_r$ or $R_{\ell }$ is nonempty:

Lemma 6

If $L_r \ne \varnothing $, then $X \prec _{xy} A^=$. If $R_{\ell } \ne \varnothing $, then $A^= \prec _{xy} Y$. Consequently, if $L_r \ne \varnothing $ and $R_{\ell } \ne \varnothing $, then $X\prec _{xy} A^=\prec _{xy} Y$.

Proof

Let $L_r \ne \varnothing $ (the proof of the case $R_{\ell } \ne \varnothing $ is similar). Pick any $z \in L_r$ and $t \in A^=$. Suppose by way of contradiction there exists an xy-order $\prec $ such that $t\prec x$. Then, either $z \prec t \prec x \prec y$ or $t\prec z \prec x \prec y$. If $z \prec t \prec x \prec y$, then we would have $d(x, t) \le d(x, z)$, which is impossible because $d(x, z)<d(x, y)=d(x, t)$. If $t\prec z \prec x \prec y$, then we would have $d(y, z) \le d(y, t)$, which is impossible because $d(y, t) = d(x, y) < d(y, z)$. $\square $

Lemma 7

If $M \ne \varnothing $, then for all xy-orders $\prec $ and any $z \in A^=$, we have $z\prec x$ or $y\prec z$.

Proof

Suppose by way of contradiction that there exists an xy-order $\prec $ such that $x\prec z\prec y$. Pick any $t \in M$. Then, either we have $x\prec z\prec t\prec y$ or $x\prec t\prec z\prec y$. In the first case, we will obtain $d(x, t)\ge d(x, z)=d(x, y)$, and in the second case, we will obtain $d(t, y)\ge d(z, y)=d(x, y)$, contrary to the assumption $d(x, y)>\max \{ d(x, t),d(t, y)\}$. $\square $

By Lemma 8 below, $A^\circ $ does not “interfere” with X, Y, and M. Therefore, from Lemma 7, we obtain:

Corollary 2

If $M \ne \varnothing $, then M and $X\cup M\cup Y$ are blocks.

From Lemmas 4 and 2, we have:

Corollary 3

The sets $M_{\ell }$, $M_m$, and $M_r$ are blocks.

From Lemmas 6–7, we have:

Corollary 4

At least one of the sets $L_r$, M, $R_{\ell }$ or $A^=$ is empty.

Lemmas 6–7 and Corollary 4 are resumed in Table 1.

Table 1 The blocks induced by $L_r$, X, M, Y, $R_{\ell }$ and $A^=$ when $A^= \ne \varnothing $

Full size table

3.4 Properties of the Ambiguous Set $A^{\circ }$

We now look at the set $A^\circ $. First, we show that $A^\circ $ may interfere only with the sets $L_{\ell }$ and $R_r$.

Lemma 8

$L_m \cup L_r \cup X \cup M \cup A^= \cup Y \cup R_{\ell } \cup R_m$ is a block.

Proof

Let $\prec $ be an xy-order and z be a point of $L_{\ell }\cup A^\circ \cup R_r$. Then, $d(x, z)>d(x, y)$ and $d(y, z) > d(x, y)$. Let $v_{\ell }$ and $v_r$ be the leftmost and the rightmost points of $L_m \cup L_r \cup X \cup M \cup A^= \cup Y \cup R_{\ell } \cup R_m$ with respect to $\prec $. Then, $v_{\ell }\preceq x\prec y\preceq v_r$. Consequently, $v_{\ell }\in L_m \cup L_r \cup X\cup A^=$ and $v_r\in A^= \cup Y \cup R_{\ell } \cup R_m$, from which we conclude that $d(x, v_{\ell })\le d(x, y)$ and $d(y, v_r)\le d(x, y)$.

Since $d(x, z)>d(x, y)$ and $d(x, v_{\ell }) \le d(x, y)$, z cannot be located between $v_{\ell }$ and x. Since $d(y, z)>d(x, y)$ and $d(y, v_r)\le d(x, y)$, z cannot be located between y and $v_r$. Finally, since $d(x, z)>d(x, y)$, z cannot be located between x and y. $\square $

In order to determine the relations between $A^\circ $ and $L_{\ell }$, $R_r$, we pick the points of $L_{\ell }$ and $L_r$ farthest from y and x, respectively. Namely, let $z_{\ell }$ be a point of $L_{\ell }$ such that for all $z \in L_{\ell }$, we have $d(y, z) \le d(y, z_{\ell })$, and let $z_r$ be a point of $R_r$ such that for all $z \in R_r$, we have $d(x, z) \le d(x, z_r)$. Finally, we define the following subsets of $A^\circ $:

$$\begin{aligned}{} & {} A^\circ _{\ell }:= \{ z \in A^\circ : d(y, z)< d(y, z_{\ell }) \text { and } d(z_{\ell }, z) \le d(z_{\ell }, x)\},\\{} & {} A^\circ _r:= \{ z \in A^\circ : d(x, z) < d(x, z_r) \text { and } d(z_r, z) \le d(z_r, y)\}, \end{aligned}$$

$$ A_\circ ^\circ := A^\circ \setminus (A^\circ _{\ell } \cup A^\circ _r).$$

If $L_{\ell } = \varnothing $, then we set $A^\circ _{\ell }:= \varnothing $, and if $R_r = \varnothing $, then we set $A^\circ _r:= \varnothing $.

Lemma 9

There does not exist $z \in A^\circ $ such that $d(y,z) < d(y, z_{\ell })$ and $d(z_{\ell }, x)< d(z_{\ell }, z) < d(z_{\ell }, y)$.

Proof

Suppose by way of contradiction that such a point z exists, however there exists an xy-order $\prec $. Since $d(y, z)<d(y, z_{\ell })$, $z_{\ell }\prec z$. As $d(z_{\ell }, x)< d(z_{\ell }, z) < d(z_{\ell }, y)$, z must be located between x and y, which is impossible because $d(x, z) > d(x, y)$. $\square $

Similarly, we have:

Lemma 10

There does not exist $z \in A^\circ $ such that $d(x, z) < d(x, z_r)$ and $d(z_r, y)< d(z_r, z) < d(z_r, x)$.

Lemma 11

$L_{\ell } \cup A^\circ _{\ell }$ and $R_r \cup A^\circ _r$ are blocks.

Proof

Let $\prec $ be an xy-order and pick any $z \in A^\circ _{\ell }$. By Lemma 8, $z \prec L_m$ or $R_m \prec z$. Since $d(z_{\ell }, z) \le d(z_{\ell }, x) < d(z_{\ell }, y)$, z cannot be at the right of $R_m$, hence $A^\circ _{\ell }\prec L_m$ and so $L_\ell \cup A^\circ _{\ell } \prec L_m$. Similarly, $R_m \prec R_r \cup A^\circ _r$.

Let $z\in L_{\ell } \cup A^\circ _{\ell }$ and $t \in A_\circ ^\circ $. We suppose that $t\prec x$ (otherwise $ L_{\ell } \cup A^\circ _{\ell } \prec t$, and we are done), so if $d(z_\ell , t) > d(z_\ell , x)$, we have $t \prec z_\ell $. We prove now that $t\prec z$.

If $d(z, y) < d(z_{\ell }, y)$, then $z_\ell \prec z$. So, if $d(z_\ell , t) > d(z_\ell , x)$, then $t \prec z$. If $d(z_\ell , t) \le d(z_ell, x)$, then, as $t \notin A^\circ _{\ell }$, we have $d(y, t) \ge d(y, z_\ell )$, and so $t \prec z_\ell $. Thus, in this case too, $t\prec z$.
If $d(z, y) = d(z_{\ell }, y)$, then $z \in L_{\ell }$ and $d(z, x) < d(z, y)$. As $t \notin A^\circ _{\ell }$, we have $d(y,t) \ge d(y, z_\ell )$ or $d(z_\ell , t) > d(z_\ell , x)$. If $d(y,t) \ge d(y, z_\ell )$, then, as $d(t,x) = d(t,y)$, we have $d(z,x) < d(t, x)$ and so $t\prec x$. If $d(z_\ell , t) > d(z_\ell , x)$, then $t\prec z_\ell $ and $d(t, x) = d(t, y) \ge d(z_\ell , y) = d(z, y) > d(z, x)$. So $t\prec z$.

As it is impossible that $d(z, y) > d(z_{\ell }, y)$, $L_{\ell } \cup A^\circ _{\ell }$ is an interval. Similarly, $R_r \cup A^\circ _r$ is an interval. $\square $

There may exist many points which, like $z_{\ell }$ or $z_r$, maximize the distance from y or x. By Lemma , the choice of $z_{\ell }$ and $z_r$ among these points has no impact on the sets $A_{\ell }^\circ $ and $A_r^\circ $. In addition, from this lemma, we obtain:

Corollary 5

$A^\circ _{\ell } \cap A^\circ _r = \varnothing $.

We conclude this section by showing that the set $S \setminus A_\circ ^\circ $ defines an block.

Lemma 12

$S \setminus A_\circ ^\circ $ is a block.

Proof

Let $\prec $ be an xy-order and pick any $z \in A_\circ ^\circ $. First suppose that $z\prec x$ and let t be the leftmost point of $S \setminus A_\circ ^\circ $ for $\prec $. If $L_{\ell } \cup A^\circ _{\ell } \ne \varnothing $, then $t \in L_{\ell } \cup A^\circ _{\ell }$ and, by Lemma , $z\prec t$. If $L_{\ell } \cup A^\circ _{\ell } = \varnothing $, then $d(x, t) \le d(x, y) < d(x, z)$, and so $z \prec t$. Similarly, if $y \prec z$, then $S\setminus A_\circ ^\circ \prec z$. Since z cannot be located between x and y, we conclude that $S\setminus A_\circ ^\circ $ is a block. $\square $

3.5 Summary

The links between the sets $L_r$, X, M, Y, $R_{\ell }$ and $A^=$ when $A^= \ne \varnothing $ can be resumed by Table 1. In this case, $(L_{\ell }\cup A_{\ell }^\circ $, $L_m$, $\mathcal {B}$, $R_m$, $R_r\cup A_r^\circ )$ is a sequence of consecutive blocks. Notice that $L_{\ell }\cup A_{\ell }^\circ $, $L_m$, $R_m$ or $R_r\cup A_r^\circ $ may be empty.

If $A^= = \varnothing $, then $(L_{\ell }\cup A_{\ell }^\circ $, $L_m$, $L_r$, X, M, Y, $R_{\ell }$, $R_m$, $R_r\cup A_r^\circ )$ is a sequence of consecutive blocks (which can all be empty except X and Y).

Lemmas 9 and 10, Corollaries 4 and 5 can be settled as

Theorem 1

Let d be a dissimilarity on S. If one of the following conditions is satisfied

$L_r$, M, $R_{\ell }$ and $A^=$ are all non empty,
$A^\circ _{\ell } \cap A^\circ _r \ne \varnothing $,
There exist $z \in A^\circ $ such that $d(y,z) < d(y, z_{\ell })$ and $d(z_{\ell }, x)< d(z_{\ell }, z) < d(z_{\ell }, y)$,
There exist $z \in A^\circ $ such that $d(x, z) < d(x, z_r)$ and $d(z_r, y)< d(z_r, z) < d(z_r, x)$,

then d is not Robinson. In this case, we say that we have an internal contradiction.

It is thus simple to construct, from a pivot-pair (x, y), a set of intervals:

(1)
Construct the sets $L_\ell , L_m, L_r,X, M, Y,R_\ell , R_m, R_r, A^=, A_\ell ^\circ , A_r^\circ , A_\circ ^\circ $
(2)
From Table 1, transform these sets into sequences of consecutive blocks.
(3)
Remove the empty blocks from the sequence.
(4)
Take the blocks two by two to form the intervals.

For instance, if after Step (1), we have $A^=, L_r\ne \varnothing $ and $R_\ell = M = R_m = \varnothing $; then, after Step (2), we have two sequences of consecutive blocks: $(L_\ell \cup A_\ell ^\circ , L_m, L_r, X, Y\cup A^=, R_m, R_r)$ and (Y). Since $R_m = \varnothing $, we remove it from the first sequence, and so, after Step (4), we get the set $\{L_\ell \cup A_\ell ^\circ \cup L_m, L_m\cup L_r, L_r\cup X, X\cup Y\cup A^=, Y\cup A^=\cup R_r, Y \}$.

We call this construction (Steps (1)–(4)) Intervals-From-Pivot-Pair(x, y). If, at Step (1), we have an internal contradiction, then Intervals-From-Pivot-Pair(x, y) returns None. Clearly, Intervals-From-Pivot-Pair runs in O(n).

4 An Iterative Algorithm Using PQ-Trees

In this section, we present Algorithm irri (Iterative-Robinson-Recognition-with-Intervals), which is a direct application of Sections 3 and 2.2. Algorithm irri considers all pairs $\{ x,y\}$ of S as pivot-pairs and for each of them computes the partition from Section 3 and its intervals. Then, the algorithm updates the current PQ-tree T by subsequently applying PQ-Tree-Update(T, I) for each occurring interval. We first give a “basic” version of irri (Algorithm 2) which runs n $O(n^3)$ in all cases and then an “elaborate” version of irri (Algorithm 3) which runs in $O(n^3)$ in worst case but in $O(n^2)$ on average (this will be experimentally shown in Appendix A).

Proposition 2

If d is Robinson, then Algorithm 2 returns a PQ-tree that represents all compatible orders of d; otherwise, it returns None. Algorithm 2 runs in $O(n^3)$ time.

Proof

Clearly, if d is Robinson, then Algorithm 2 returns a PQ-tree T such that all permutations compatible with d are represented by the PQ-trees equivalent to T. In addition, for every pair $\{x, y\} \subset S$ ($x \ne y$), if we denote d(x, y) by $\delta $, then $X \cup L_{\ell } \cup L_m \cup M \cup A^= \cup Y = B_\delta (x)$. So all balls of d, for all centers and all radius, are implicitly considered by Algorithm 2, and the result follows from Proposition 1.

For each pivot-pair $\{x, y\} \subset S$, ${\mathcal {I}}_{xy}$ contains at most 13 intervals. Since each interval is treated by PQ-Tree-Update in O(n), Algorithm 2 runs in $O(n^3)$. $\square $

At each step of Algorithm 2, we consider several blocks, not only the one defined by the ball $B_\delta (x)$. Therefore, it may happen that the algorithm builds the PQ-tree of d before checking all pivot-pairs $\{x, y\}$. This option is incorporated in Algorithm 3.

Given two points $u,v\in S$, if no pivot-pair chosen by the Algorithm 3 contains u or v, then the distance d(u, v) is not considered by the algorithm; therefore, Algorithm 3 cannot determine if d is not Robinson. So, Algorithm 3 must consider at least O(n) pivot-pairs. Thus, at Line 1, we set $\kappa $ (which is the initial number of tested pivot pairs) to n. Testing if an order is compatible (line 3) with d can be done in $O(n^2)$ time. To avoid making too many such tests, in the loop beginning at Line 2, we consider $2^{i-1} \times n$ pivots at the i-th occurrence of the repeat loop. Consequently, we have:

Proposition 3

Algorithm 3 runs in O(Kn), where K is the total number of considered pivot-pairs.

Proof

Suppose that Algorithm 3 stops after testing a total number of K pivot-pairs (with a compatible permutation or None as answer). We set $P:= K/n$. The test at Line 3 is made for $p=1, 2, 4, 8,\ldots , P/2, P$, so it is made $\log P$ times. As each of these tests runs in $O(n^2)$, the total time taken by the tests at Line 3 is $O(\log P\, n^2)$, which is less than O(Kn).

The complexity of the rest of the algorithm is dominated by the calls to Intervals-From-Pivot-Pair and PQ-Tree-Update. Each of these calls runs in O(n), and there are K such calls. So Algorithm 3 runs in O(Kn). $\square $

Consequently, Algorithm 3 runs in $O(n^3)$ in the worst case. Experimental simulations in Appendix A show that $K=O(n)$ on average, so Algorithm 3 has an average complexity of $O(n^2)$. In some cases, we can have $K=\Theta (n^2)$, as the following Examples 2 and 3 show:

Example 2

Let $S=\{1,\ldots , n\}$, $d(1, i) = i$ for $i>1$ and $d(i, j) = 1$ for $1<i<j$. For every couple $(x, y)=(i, j)$ with $1 < i, j$, we have $X=\{i\}$, $Y = \{j\}$, $L = \{1\}$ and $A^= = S \setminus \{1, i, j \}$. So, to determine a compatible order, the algorithm must consider all pairs of the form $\{1, i \}$ as pivot-pairs. Since pivots are selected at random, we have to select $O(n^2)$ pairs in order to consider all pertinent ones.

Example 3

Consider the PQ-tree $T_t$ with $3^t$ leaves, all at depth t and such that all the internal nodes are Q-nodes with three children; see $T_3$ on Fig. 2.

On $S=\{1,\ldots , 3^t\}$ we define the dissimilarity $d_t$ in the following way: for $i, j \in S$, let $\alpha $ be the lowest common ancestor of i and j and $\beta _1, \beta _2, \beta _3$ be the three children (in this order) of $\alpha $. We set $d_t(i, j):= 2h$ if i or j is a descendant of $\beta _2$ and $d_t(i, j):= 2h+1$ otherwise, where h is the (graph) distance between $\alpha $ and i. Clearly, $d_t$ is a Robinson dissimilarity. Recall that for a node $\beta $ of $T_t$, we define $S(\beta )\subseteq S$ as the set of leaves under $\beta $. If we partition S with respect to the pivot-pair $\{ i, j\}$, then, up to symmetry, we get

$X = S({\beta _1})$, $Y = S_{\beta _3}$, $M_m = S({\beta _2})$ and $A_\circ ^\circ = S\setminus S(\alpha )$ if $i \in S({\beta _1})$ and $j \in S({\beta _3})$.
$X = S({\beta _1})$, $Y = S({\beta _2})$, $R_m = S({\beta _3})$ and $A_\circ ^\circ = S\setminus S(\alpha )$ if $i \in S({\beta _1})$ and $j \in S({\beta _2})$.

To determine an order compatible with $d_t$, we have to consider a pivot-pair $\{i, j\} \in S(\alpha )$ for every internal node $\alpha $, so we must check $O(n^2)$ pivot-pairs.

5 A Recursive Algorithm by Partition Refinement

This section is devoted to Algorithm Recursive-Robinson-Recognition-by-Refinement (RRRR). Contrary to irri, it is recursive and does not use PQ-trees, but partition refinement.

5.1 Preliminaries

Algorithm RRRR is based on the two following Claims 1 and 2 (whose proof is elementary) and on the Refine procedure defined in Section 2.3:

Claim 1

Let (S, d) be a Robinson space. If S is partitioned into a sequence of consecutive blocks $(B_{1,1},\ldots , B_{1, k_1}, B_{2,1}, \ldots , B_{2, k_2},$ $\ldots , B_{k,1},\ldots , B_{k,k_k})$ such that

for all $1\le i\ne i' \le k, 1\le j\le k_i, 1\le j' \le k_{i'}$, $x, y \in B_{i,j}, z \in B_{i', j'}$, we have $d(x,z) = d(y,z)$,
for all $1\le i\le k, 1\le j_1<j_2\le k_i$, there exist $i' \ne i, 1\le j' \le k_{i'}$ such that for all $x \in B_{i,j_1}, y \in B_{i, j_2}, z \in B_{i', j'}$, we have $d(x,z) \ne d(y,z)$,
for all $1\le i< i' \le k, 1\le j_1 <j_2 \le k_i, 1\le j'\le k_{i'}, x \in B_{i, j_1}, y \in B_{i, j_2}, z \in B_{i', j'}$, we have $d(x,z) \ge d(y,z)$,
for all $1\le i'< i \le k, 1\le j_1 <j_2 \le k_i, 1\le j'\le k_{i'}, x \in B_{i, j_1}, y \in B_{i, j_2}, z \in B_{i', j'}$, we have $d(x,z) \le d(y,z)$.

Additionally, if for all $1\le i\le k$, $\pi _i$ is a permutation of $B_i:= \bigcup _{j=1}^{k_i}{B_{i,j}}$ such that

$\pi _i$ is a compatible order on $(B_i,d\vert _{B_i})$,
for all $1\le j_1 < j_2 \le i_k$, $x \in B_{i, j_1}, y\in B_{i,j_2}$, we have $x \prec _{\pi _i} y$,

then $\pi _1\pi _2\ldots \pi _k$ is a compatible order of d.

Claim 2

Let (S, d) be a Robinson space, U be an independent block of S, and $u \in U$. Then, for every compatible order $\sigma $ of $d|_{U}$ and $\pi _1u\pi _2$ of $d|_{S\setminus U \cup \{u\}}$, $\pi _1\sigma \pi _2$ is a compatible order of d. Moreover, every compatible order on (S, d) can be decomposed in this way.

5.2 The Algorithm

We now describe Algorithm RRRR. It takes as input a pair (S, U) where (S, d) is a dissimilarity space and U is a subset of S of size $\ne 1$ (U may be empty) and returns, if (S, d) is Robinson, a compatible order of S. Sets S and U are given as sequences (eventually of length 1) of disjoint subsets $(B_1, \ldots , B_k)$ and $(U_1, \ldots , U_{k'})$. If (S, d) is Robinson, then $(B_1, \ldots , B_k)$ and $(U_1, \ldots , U_{k'})$ will be sequences of consecutive blocks (so U is a block). The initial call of RRRR will be with $U = \varnothing $ and $k=k' =1$.

If $\vert S\vert \le 2$ or if d is the constant dissimilarity, then (S, d) is Robinson and RRRR returns any order such that $B_1<\ldots <B_k$. If all blocks $B_i$ are singletons, then RRRR returns the total order on these sets. Otherwise,

(1)
If $U = \varnothing $, then:
1. (a)
  If S is given by a sequence $(B_1, \ldots , B_k)$ with $k>1$, then take as U any set $B_i$ of size $>1$.
2. (b)
  Otherwise, as d is not constant, then there exist $x\in S$ and $\delta >0$ such that $B_\delta (x) \varsubsetneq S$. Set $U:= B_\delta (x)$. (Section 5.3 shows how to find such a ball $B_\delta (x)$ or to decide that d is constant in O(n) time, after a $O(n^2\log n)$ preprocessing). Notice that $\vert B_\delta (x)\vert > 1$ and that if (S, d) is Robinson, then $B_\delta (x)$ is a block.
(2)
Refine U relatively to $V:= S\setminus U$.
(3)
Two cases may occur:
1. (a)
  If $V^- = V^+ = \varnothing $, then U is an independent block. By Claim 2, after performing the following operations (i)-(iii), the algorithm returns a compatible order in Step (4) if (S, d) is Robinson.
  1. (i)
    Pick $u \in U$.
  2. (ii)
    Apply RRRR on $(U, \varnothing )$. We get a permutation $\sigma $ of U.
  3. (iii)
    Apply RRRR on $(V\cup \{u\}, \varnothing )$. We get a permutation $\pi _1u\pi _2$ of $V\cup \{u\}$.
2. (b)
  If $V^-\cup V^+ \ne \varnothing $, then U is partitioned into a sequence of disjoint subsets $(U_1,\ldots , U_p)$ with $p>1$.
  1. (i)
    Randomly choose $u_1\in U_1$ and $u_p\in U_p$.
    - If $V^-\ne \varnothing $, then take $v^- \in V^-$ such that $d(v^-, u_1)$ is maximal and set $W^-:= \{w \in V^\circ : d(u_1, w) \le d(u_1, v^-) \wedge d(v^-, w) \le d(v^-, u_1) \}$.
    - If $V^- =\varnothing $, then $W^-:=\varnothing $.
    - If $V^+\ne \varnothing $, then take $v^+ \in V^+$ such that $d(v^+, u_p)$ is maximal and set $W^+:= \{w \in V^\circ : d(u_k, w) \le d(u_p, v^+) \wedge d(v^+, w) \le d(v^+, u_p) \}$.
    - If $V^+ = \varnothing $, then $W^+:= \varnothing $.
    By Lemma 13, $\overline{V^{-}}:= V^-\cup W^-$, U, and $\overline{V^+}:= V^+ \cup W^+$ are consecutive blocks.
  2. (ii)
    Refine $\overline{V^{-}}$ and $\overline{V^{+}}$ relatively to U. If these sets are nonempty, then we get $\overline{V^-}$ as a sequence $(B_1^-,\ldots ,B_{k^-}^-)$ and $\overline{V^+}$ as $(B_1^+,\ldots , B_{k^+}^+)$. Note that in each case, U is unchanged.
  3. (iii)
    Pick a point $u\in U$ and set $U':= \overline{V^-} \cup \{u\} \cup \overline{V^+}$, given as the sequence $(B_1^-,\ldots ,B_{k^-}^-,\{u\}, B_1^+,\ldots , B_{k^+}^+)$.
  4. (iv)
    Run RRRR on $(U=(U_1,\ldots , U_k), \varnothing )$ and get a permutation $\sigma $ of U.
  5. (v)
    Run RRRR on $(V^\circ \cup U', U')$ and get a permutation $\pi :=\pi _1u\pi _2$.
(4)
Return $\pi _{1}\sigma \pi _{2}$.

The pseudo-code of Algorithm RRRR is given by Algorithm 4.

Lemma 13

With the notations of Step (3-b-i): A point $w \in V^\circ $ is d-between a point of $V^-$ and $u_1$ if and only if $w \in W^-$; a point $w \in V^\circ $ is d-between a point of $V^+$ and $u_p$ if and only if $w \in W^+$.

Proof

We prove the first assertion (the proof of the second one is similar). For a given $u_1u_p$-order, let $v'_{\ell }$ be the leftmost point of $V^-$. Even if $v_{\ell }' \ne v_{\ell }$, we have $d(v'_{\ell }, u_1) = d(v_{\ell }, u_1)$. So the “only if" part is the definition of Robinson dissimilarities.

To prove the converse, we only consider $u_1u_p$-orders. Since $d(w, u_1) \le d(v^-, u_1)$, we have $d(w, u_p) < d(v^-, u_p)$, so w is on the right of $v^-$. As $d(v^-, w) \le d(v^-, u_1)$, we have $d(v^-, w) < d(v^-, u_p)$ and w is on the left of $u_p$. As U is a block, w is on the left of $u_1$. $\square $

When we call RRRR at Step (3-a-ii) or (3-b-iv), U has to be partitioned into a sequence of consecutive blocks $(U_1, \ldots , U_k)$ such that, $\forall 1\le i\le k, x,y \in U_i, z \in S\setminus U$, we have $d(x, z) = d(y, z)$, i.e., U has to be refined relatively to $S\setminus U$.

If there is a sequence of Step (3-b), and so of calls by Step (3-b-iv), then we get a sequence of sets $U^0, U^1= U'^0, U^2= U'^1,\ldots $ By Step (2), each $U^i$ is refined relatively to $S\setminus \bigcup _{j<i}{U^j}$. But, at Step (3-b-ii), $U^i$ is refined relatively to $U^{i-1}$ but not relatively to $\bigcup _{j<i-1}{U^j}$. But as $\forall x,y \in \bigcup _{j<i-1}{U^j}, z \in U^i$, we have $d(x, z) = d(y, z)$ and $U^{i-1}$ contains a point of $\bigcup _{j<i-1}{U^j}$ (at Step (3-b-iii), we pick $u \in \bigcup _{j<i-1}{U^j}$), refining $U^i$ relatively to $U^{i-1}$ is equivalent to refining $U^i$ relatively to $\bigcup _{j<i}{U^j}$.

So, we have

Proposition 4

If d is Robinson, RRRR$(S,\varnothing )$ returns a compatible order.

Notice in addition that the construction of $W^-$ and $W^+$ at Step (3-b-i) is similar to the construction of $A^\circ _\ell $ and $A^\circ _r$ for Algorithms 2 and 3 (see Section 3.4)

5.3 Finding a Block

Before running RRRR, we compute, for each $x\in S$, the balls $B_\delta (x)$ centered in x and of radius $\delta >0$ that we give as a sequence $\mathcal {B}(x):=(B_\delta (x))$, sorted by increasing values of $\delta $. This takes (for all $x\in S$) $O(n^2\log n)$. There are three cases where RRRR can be called with $U=\varnothing $:

The initial call: finding a ball $B_\delta (x)$ which is different from S or determining that d is a constant dissimilarity takes O(n): we only have to check the first element of each sequence $\mathcal {B}(x)$.
At Step (3-a-ii) or (3-b-iv). If U is not partitioned into a sequence of consecutive blocks $(U_1, \ldots , U_k)$, then for all $x, y \in U, z \in S\setminus U$, we have $d(x, y) \le d(x, z) = d(y, z)$. So $d\vert _U$ is constant if and only if for all $x \in U$, the first element $B_{\delta _1}(x)$ of $\mathcal {B}(x)$ is such that $ \vert B_{\delta _1}(x)\vert \ge \vert U\vert $. Thus checking if d is constant or finding a block can be done in O(n).
At Step (3-a-iii) with V not partitioned into a sequence of consecutive blocks. For $x\in V$, as d(x, y) has a constant value for $y\in U$, d(x, y) has a constant value for $y \in V\cup \{u\}$ if and only if the first element $B_{\delta _1}(x)$ of $\mathcal {B}(x)$ is such that $ \vert B_{\delta _1}(x)\vert \ge \vert S\vert $ (It is possible that $ \vert B_{\delta _1}(x)\vert > S$ since the computation of $\mathcal {B}(x)$ has been made as a pre-treatment, on the whole set). We can check that in $O(\vert V\vert )$ time. We now look at the balls centered at u. If one of these balls contains a point not in U, it contains all of U. So, a ball $B_\delta (u)$ corresponds to a block of $V\cup \{u\}$ if and only if $\vert U \vert< \vert B_\delta (u)\vert < \vert U \cup V\vert $. If such a ball exists, it appears in the first $\vert U\cup V\vert $ elements of $\mathcal {B}(u)$. So checking the existence of such a ball takes $O(\vert U\cup V\vert )$.

5.4 Complexity

In this subsection, we prove the following result.

Proposition 5

Algorithm RRRR runs in $O(n^2\log n)$ time in worst case.

Proof

We set $n:= \vert S\vert $, $p:= \vert U\vert $ and $m:= n-p$. Apart from the recursive calls, the steps of RRRR have the following complexity: O(n) for Steps (1), (3-b-i) and (3-b-iii), $O(mp\log p)$ for Step (2), $O(mp \log m)$ for Step (3-b-ii), and O(1) for Step (3-a-i). So, apart from the recursive calls, there exists a constant C such that the number of operations taken by RRRR(S, U) is at most $Cmp\log n$.

We now show by induction on n that there exists a constant $C'$ such that the number of operations taken by RRRR is at most $C'n^2\log n$. The property trivially holds for small values of n. We suppose, with no loss of generality that $U \ne \varnothing $ (when $U=\varnothing $, at Step (1), the algorithm builds in O(n) a non-empty set U). So we have $p\ge 2$. We set $C'>2C$,

Suppose now that, for all $n' < n$, RRRR runs on a set $S'$ with $\vert S' \vert = n'$ within at most $C'n'^2\log n'$ operations. Since $C'>2C$ and $p\ge 2$, we obtain the following bounds for the total number of operations for RRRR(S, U):

$$\begin{aligned}&\le C'p^2 \log p + C'(m+1)^2\log (m+1) + Cmp\log n\\&\le (C'p^2+ C'(m+1)^2 + Cmp)\log n\le C'(p^2+(m+1)^2+\frac{mp}{2}) \log n\\&\le C'(p + m)^2 \log n= C'n^2 \log n.\\ \end{aligned}$$

This concludes the complexity analysis of Algorithm RRRR. $\square $

Remark 1

It is possible to make Refine(U, V) run in $O(\max (\vert U\vert ^2, \vert V\vert ^2))$ but, if we apply this version of Refine to RRRR, we get an algorithm in $O(n^3)$ instead of $O(n^2\log n)$, for instance with a sequence of recursive calls at Step (3-b-v) with, at each time, $\vert U\vert = 2$.

6 Conclusion

In regard to the algorithm of Section 4, one can note that the decomposition of M into $M_\ell \cup M_m \cup M_r$ changes neither the theoretical complexity nor the results of the empirical tests. The two examples at the end of Section 4 lead to the following open questions:

${\textbf {Question 1:}}$:: In these two examples, only O(n) pivots need to be considered and Algorithm 3 checks $O(n^2)$ pivots to find O(n) pertinent ones. Does there exist dissimilarities which actually need $O(n^2)$ pivots? Or at least more than O(n) ones?
${\textbf {Question 2:}}$:: If every dissimilarity needs less than $O(n^2)$ pivots (i.e., if the answer of Question 1 is “No"), is it possible to derandomize Algorithm 3 in such a way that pertinent pivots are selected first? In this case, Algorithm irri would be optimal in $O(n^2)$.

As mentioned is Section 2.3, partition refinement has already been used to recognize Robinson dissimilarities, but the algorithm RRRR is the first one to use only this paradigm. Moreover, although it uses only one tool, it is quite efficient since it runs in $O(n^2\log n)$.

Data Availability

The code we use for simulations is available from the corresponding author upon request. We do not analyze or generate other datasets.

References

Atkins, J. E., Boman, E. G., & Hendrickson, B. (1998). Spectral algorithm for seriation and the consecutive ones problem. SIAM Journal on Computing, 28, 297–310.
Article MathSciNet MATH Google Scholar
Barthélemy, J.-P., & Brucker, F. (2001). NP-hard approximation problems in overlapping clustering. Journal of Classification, 18, 159–183.
Article MathSciNet MATH Google Scholar
Booth, K. S., & Lueker, G. S. (1976). Testing for the consecutive ones property, interval graphs and graph planarity using PQ-tree algorithm. Journal of Computer and System Sciences, 13, 335–379.
Article MathSciNet MATH Google Scholar
Çela, E., Deineko, V., & Woeginger G.J. (2023). Recognising permuted Demidenko matrices. arXiv:230 2.05191v1.
Chepoi, V., & Fichet, B. (1997). Recognition of Robinsonian dissimilarities. Journal of Classification, 14, 311–325.
Article MathSciNet MATH Google Scholar
Chepoi, V., Fichet, B., & Seston, M. (2009). Seriation in the presence of errors: NP-hardness of l$_\infty $-fitting Robinson structures to dissimilarity matrices. Journal of Classification, 26, 279–296.
Article MathSciNet MATH Google Scholar
Chepoi, V., & Seston, M. (2011). Seriation in the presence of errors: An approximation algorithm for fitting Robinson structures to dissimilarity matrices. Algorithmica, 59, 521–568.
Article MathSciNet MATH Google Scholar
Critchley, F., & Fichet, B. (1994). The partial ordre by inclusion of the principal classe of dissimilarity on a finite set, and some properties of their basic properties. In Classification and dissimilarity analysis, B. van Cutsen (Ed.), Lecture notes in statistics, 5–65.
Deineko, V., Klinz, B., Tiskin, A., & Woeginger, G. J. (2014). Four-point conditions for the TSP: The complete complexity classification. Discrete Optimization, 14, 147–159.
Article MathSciNet MATH Google Scholar
Demidenko, V. M. (1976). A special case of traveling salesman problem.Izvestiya Akademii Nauk Belarusi. Seriya Fiziko-Matematicheskikh Nauk, 5, 28–32. (in Russian).
Google Scholar
Diday, E. (1986). Orders and overlapping clusters by pyramids. In Multidimensionnal data analysis, J. de Leeuw, W. Heiser, J. Meulman and F. Critchley (Eds.), 201–234, DSWO.
Durand, C., & Fichet, B. (1988). One-to-one correspondences in pyramidal representation: An unified approach. In Classification and related methods of data analysis, H.H. Bock (Ed.), 85–90, North-Holland.
Fogel, F., D’aspremont, A., & Vojnovic, M. (2016). Spectral ranking using seriation. Journal Machine Learning Research, 17: Paper No. 88.
Hubert, L. (1974). Some applications of graph theory and related nonmetric techniques to problems of approximate seriation: The case of symmetric proximity measures. British Journal of Mathematical Statistics and Psychology, 27, 133–153.
Article MATH Google Scholar
Hubert, L., Arabie, P., & Meulman, J. (2006). The structural representation of proximity matrices with Matlab. ASA-SIAM Series on Statistics and Applied Probability.
Kalmanson, K. (1975). Edgeconvex circuits and the travelling salesman problem. Canadian Journal Mathematics, 27(1975), 1000–1010.
Article MathSciNet MATH Google Scholar
Laurent, M., & Seminaroti, M. (2017). A Lex-BFS-based recognition algorithm for Robinsonian matrices. Discrete Applied Mathematics, 222, 151–165.
Article MathSciNet MATH Google Scholar
Laurent, M., & Seminaroti, M. (2017). Similarity-first search: A new algorithm with application to Robinsonian matrix recognition. SIAM Journal Discrete Mathematics, 31, 1765–1800.
Article MathSciNet MATH Google Scholar
Laurent, M., Seminaroti, M., & Tanigawa, S.-I. (2017). A structural characterization for certifying Robinsonian matrices. Electronic Journal Combination, 24–2
Mirkin, B., & Rodin, S. (1984). Graphs and Genes. Springer-Verlag.
Paige, R., & Tarjan, R. E. (1987). Three partition refinement algorithms. SIAM Journal on Computing, 16, 973–989.
Article MathSciNet MATH Google Scholar
Préa, P., & Fortin, D. (2014). An optimal algorithm to recognize Robinsonian dissimilarities. Journal of Classification, 31, 351–385.
Article MathSciNet MATH Google Scholar
Robinson, W. S. (1951). A method for chronologically ordering archeological deposits. American Antiquity, 16, 293–301.
Article Google Scholar
Seston, M. (2008). A simple algorithm to recognize Robinsonian dissimilarities. COMPSTAT’2008, Porto.
Seston, M. (2008). Dissimilarités de Robinson : Algorithmes de Reconnaissance et d’Approximation, Ph.D. Thesis, Université de la Méditerranée.

Download references

Acknowledgements

We would like to acknowledge the referees for their useful comments and suggestions. This research was supported in part by anr project distancia (ANR-17-CE40-0015) and has received funding from Excellence Initiative of Aix-Marseille - a*midex(Archimedes Institute AMX-19-IET-009), a French “Investissements d’Avenir” Program.

Author information

Authors and Affiliations

LIS, Aix-Marseille Université, CNRS and Université de Toulon, Marseille, France
M. Carmona, V. Chepoi, G. Naves & P. Préa
École Centrale Marseille, Marseille, France
M. Carmona & P. Préa

Authors

M. Carmona
View author publications
You can also search for this author in PubMed Google Scholar
V. Chepoi
View author publications
You can also search for this author in PubMed Google Scholar
G. Naves
View author publications
You can also search for this author in PubMed Google Scholar
P. Préa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to P. Préa.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Empirical tests

The aim of this appendix is to show, from a practical point of view, that Algorithm 3 runs in $O(n^2)$ on average or, at least, is much more efficient than $O(n^3)$. More precisely, we made tests on Algorithm 2 and wrote down the number of pivot-pairs tested before obtaining a contradiction or getting the PQ-tree of the dissimilarity. Tests have been made for various generators of “random matrices.” For each kind of generator (except for the Toeplitz matrices, see Fig. 6), we have generated 1000 matrices of size $25\times 25$, $50\times 50$, $100\times 100$, and $200\times 200$ and noted the minimum number of pivot-pairs tested, the first decile ($10\%$ of the matrices need less than this number of pivot-pairs), the median, the last decile, and the maximum number of pivot-pairs tested. Notice that the total number of pivot-pairs is 300, 1225, 4950, and 19900 for $25 \times 25$, $50 \times 50$, $100 \times 100$, and $200 \times 200$ matrices, respectively.

Table 2 Average number of compatible permutations of the dissimilarities tested for Figs. 4 and 5 for dissimilarities on n points whose compatible orders are represented by PQ-trees built from k intervals of [n]

Full size table

We have first tested Algorithm 2 on Robinson matrices generated diagonals-after-diagonals, i.e., for each $i<j$, M[i, j] is randomly chosen in $\{\max (M[i, j-1], M[i+1, j]), \max (M[i, j-1], M[i+1, j]) +1\}$ (see Fig. 3). Nearly all dissimilarities that are generated in this way are flat (have only one compatible order and its reverse). Although nearly all Robinson dissimilarities are flat (the cone of all Robinson dissimilarities on n points is the subspace of ${\mathbb {R}}^{\frac{n(n-1)}{2}}$ which is the union of n! convex cones of Robinson dissimilarities with a given compatibility order and the flat Robinson dissimilarities correspond to the inner points of these full cones), we have also generated random Robinson dissimilarities on [n] with several compatible orders. This has been done in the following way:

(1)
Generate k random intervals on [n].
(2)
Generate the PQ-tree on [n] which represents the set of permutations such that these k intervals (and only them) remain intervals for all these permutations. This is done by k calls of Function PQ-Tree-Update.
(3)
Generate a Robinson dissimilarity admitting all these permutations (and only them) as compatible ones.

Such dissimilarities have been generated for $k \in \{\lceil n/10\rceil , \lceil n/2\rceil , n, \lceil 3n/2\rceil , 2n, 3n \}$, and the results of the tests are shown in Figs. 4 and 5. These dissimilarities have several compatible orders (see Table 2). We have also tested Algorithm 2 on Toeplitz matrices on $\{0,1,2\}$. A $\{0,1,2\}$-Toeplitz matrix is defined by $M[i,i] = 0$, $M[i, j] = 1$ if $0<\vert i-j\vert \le k$ and $M[i,j] = 2$ otherwise (these matrices are Robinson). For our purpose, the interest for these matrices is that they are flat and have very few different values, two characteristics which are, at a first glance, opposite to each other. The result of tests on Toeplitz matrices is shown in Fig. 6.

Table 3 Average running time of Algorithm 3 on different structures of dissimilarities tested in Figs. 3–9 for a number n of points in $\{50, 100, 200\}$

Full size table

On Figs. 3–6, one can see that the median value of the number of requested pivot-pairs grows linearly with the number of points. The last decile of the number of used pivot-pairs also grows linearly, except for dissimilarities generated from k intervals with $k<n$. In any case, this last decile is very low when compared with the total number of pivot-pairs.

Finally, we have also tested our algorithm on non-Robinson dissimilarities. On random dissimilarities, the maximum number of pivot-pairs before getting a contradiction is always very low: at most 5 for $200\times 200$ matrices. We have also tested non-Robinson dissimilarities built from Robinson dissimilarities on which a few values have been modified so that the resulting dissimilarity is not Robinson:

On matrices M such that, $\forall i, j$, $M[i,j] = \max (M[i, j-1], M[i+1, j])$ with probability 0.6 and $M[i,j] =1+ \max (M[i, j-1], M[i+1, j])$ with probability 0.4 (as for Fig. 3-B) with 1, 2, 5 or 10 modified elements. The results are shown in Fig. 7.
On $n\times n$ matrices such that the set of compatible orders is represented by a PQ-tree that can be built from $\lceil 2n/10\rceil $ intervals (as for Fig. 4-B) with 1, 2, 5, or 10 modified elements. The results are shown in Fig. 8.
On $n\times n$ matrices such that the set of compatible orders is represented by a PQ-tree that can be built from $\lceil 3n/2\rceil $ intervals (as for Fig. 5-B) with 1, 2, 5, or 10 modified elements. The results are shown in Fig. 9.

We can see that if two or more values are modified, then the median value of the number of pivot-pairs used before getting a contradiction is nearly linear. If there are 5 or 10 modified values, then this is also the case for the last decile. In any case, even with only one modified value, the number of pivot-pairs used before getting a contradiction is much smaller than the total number of pivot-pairs.

In conclusion, in practice, Algorithm 3 is very efficient and is not distinguishable from the optimal $O(n^2)$ complexity. The running average time in Python 3 on an ordinary desktop computer (Intel Core i5 at 2.7 GHz with 4 Go of memory) is approximatively 26 ms for a $50\times 50$ matrix, 129 ms for a $100\times 100$ matrix, and 686 ms for a $200\times 200$ matrix. In Table 3, we show the running time for diverse structures of distance corresponding with Figs. 3–9.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Carmona, M., Chepoi, V., Naves, G. et al. Two Simple but Efficient Algorithms to Recognize Robinson Dissimilarities. J Classif (2023). https://doi.org/10.1007/s00357-023-09446-y

Download citation

Accepted: 31 July 2023
Published: 18 August 2023
DOI: https://doi.org/10.1007/s00357-023-09446-y

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Two Simple but Efficient Algorithms to Recognize Robinson Dissimilarities

Abstract

Similar content being viewed by others

Robinsonian Matrices: Recognition Challenges

Clustering Analysis of a Dissimilarity: a Review of Algebraic and Geometric Representation

A Lex-BFS-Based Recognition Algorithm for Robinsonian Matrices

1 Introduction

1.1 Related Work

1.2 Our Results

1.3 Paper’s Organization

2 Preliminaries

2.1 Robinsonian Dissimilarities

2.2 PQ-Trees and the Consecutive One’s Property

Example 1

Proposition 1

Corollary 1

2.3 Refinement Procedures

3 Partitioning S with Respect to a Pivot-Pair

3.1 The Partition

3.2 Properties of the Non-Ambiguous Sets

Lemma 1

Proof

Lemma 2

Proof

Lemma 3

Proof

Lemma 4

Proof

3.3 Properties of the Ambiguous Set \(A^=\)

Lemma 5

Proof

Lemma 6

Proof

Lemma 7

Proof

Corollary 2

Corollary 3

Corollary 4

3.4 Properties of the Ambiguous Set \(A^{\circ }\)

Lemma 8

Proof

Lemma 9

Proof

Lemma 10

Lemma 11

Proof

Corollary 5

Lemma 12

Proof

3.5 Summary

Theorem 1

4 An Iterative Algorithm Using PQ-Trees

Proposition 2

Proof

Proposition 3

Proof

Example 2

Example 3

5 A Recursive Algorithm by Partition Refinement

5.1 Preliminaries

Claim 1

Claim 2

5.2 The Algorithm

Lemma 13

Proof

Proposition 4

5.3 Finding a Block

5.4 Complexity

Proposition 5

Proof

Remark 1

6 Conclusion

Data Availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of Interest