1 Introduction

The k -SUM problem is a fixed-parameter version of the NP-complete SUBSET SUM problem. It consists of deciding, given a set of n numbers, whether any subset of size k sums to zero. The problem for \(k=3\), known as 3-SUM, is now a well-established bottleneck problem in fine-grained complexity theory (see for instance [1, 29] and references therein). While there are many reductions showing 3-SUM- or k -SUM-hardness of computational problems in geometry, only few reductions to 3-SUM and k -SUM are known (from triangle enumeration in graphs, for instance; see Jafargholi and Viola [26]). We give examples of computational geometry problems that reduce to 3-SUM or k -SUM.

Our results are motivated by the nontrivial improved upper bounds on the complexity of 3-SUM and k -SUM proven in the recent years. While it has long been conjectured that no subquadratic algorithm for 3-SUM existed, it is now known to be solvable in time \(O((n^2/{\log n})(\log \log n)^2)\) in the real RAM model, and in time \(O((n^2/{\log ^2\!n})(\log \log n)^{O(1)})\) if we allow bitwise operations on fixed-length words [14, 20, 22, 25]. The existence of an \(O(n^{2-\delta })\) algorithm for some \(\delta >0\) remains an open problem. Using folklore meet-in-the-middle algorithms, k -SUM can be solved in time \(O(n^{\lceil k/2\rceil })\) if k is odd, and in time \(O(n^{k/2}\log n)\) if k is even. Recently, Kane et al. [27] showed that it can be solved in time \(O(n\log ^2 n)\) in the linear decision tree model, improving on previous polynomial bounds [13, 19].

Geometric pattern matching  We consider two problems involving searching for a given set P of k points, called the pattern, within a larger set S of points, up to some geometric transformation. Here we focus on exact algorithms, in which the pattern must match the subset of points exactly. We consider two types of geometric transformations: similarity transformations, which are compositions of a translation, a rotation, and a uniform scaling, and affine transformations, which are compositions of a translation and a linear map. This yields the following two problems.

Problem 1.1

(SIMILARITY MATCHING) For a fixed integer \(k\ge 3\), given a set P of k points in the plane and a set S of n points in the plane, determine whether S contains the image of P under a similarity transformation.

Problem 1.2

(AFFINE MATCHING) For fixed integers \(d\ge 2\) and \(k\ge d+2\), given a set P of k points in \({\mathbb {R}}^d\) containing \(d+1\) affinely independent points, and a set S of n points in \({\mathbb {R}}^d\), determine whether S contains the image of P under an affine transformation.

A large body of the computational geometry and pattern recognition literature is dedicated to the problems of finding approximate matches up to some geometric transformation, where the quality of the approximation is typically measured by the Hausdorff distance [6, 15, 21, 24]. For exact pattern matching problems under different families of transformations, known upper bounds on time complexity have been compiled in a survey by Peter Braß [11]. We reproduce them in Table 1.

Table 1 Known upper bounds on the time complexity of exact geometric pattern matching in various settings (taken from [11] and [23, Chap. 54])

The complexity of these algorithms is directly related to bounds on the maximum number of occurrences of a pattern or a distance in a set of n points. In fact, such bounds directly yield a lower bound on the computational problem of listing all occurrences of the pattern. A prototypal example is Erdős’ unit distance problem; see Braß and Pach [12] for more examples. It is known, in particular, that there can be \(\varTheta (n^2)\) similar copies of a pattern in an n-point set [3, 4, 18]. Structural results on the extremal point sets are also known [2]. For affine transformations in \({\mathbb {R}}^d\), there exist pairs PS such that S contains \(\varTheta (n^{d+1})\) copies of P: for instance, the d-dimensional lattice \(\{1,2,\ldots ,n^{1/d}\}^d\) contains \(\varTheta (n^{d+1})\) affine images of a cube.

Our results  We suppose we can perform exact computations over the reals. Therefore, all the algorithms that we consider are either uniform algorithms in the real RAM model, or nonuniform algorithms in the real algebraic decision tree model. For simplicity, we will also assume that in the RAM model we have access to random real numbers. We show that they can be replaced by polynomially bounded random integers, at the cost of a polynomially vanishing probability of error. Our main result is the following.

Theorem 1.3

SIMILARITY MATCHING and AFFINE MATCHING reduce in randomized linear time to k -SUM.

We refer the reader to the exact definitions of the k -SUM problem and the notion of randomized linear-time reduction given later. Theorem 1.3 has a number of consequences. Let us consider the special case of the SIMILARITY MATCHING problem in which \(k=3\).

Problem 1.4

(TRIANGLE) Given a triangle \(\varDelta \) and a set S of n points in the plane, determine whether S contains three points whose convex hull is similar to \(\varDelta \).

Combining the reduction provided by Theorem 1.3 with the real RAM algorithm for 3-SUM from Chan [14], we obtain the following.

Corollary 1.5

There exists an \(O((n^2/{\log n})(\log \log n)^2)\) randomized real RAM algorithm for TRIANGLE. In particular, there exists a subquadratic algorithm to detect equilateral triangles in a point set.

This contrasts with our current knowledge on the related 3-SUM-hard problem of finding three collinear points, also known as GENERAL POSITION TESTING. Despite recent attempts [10, 14], it is still an open problem to find a subquadratic algorithm for GENERAL POSITION TESTING.

Our next corollary is obtained directly from known algorithms for k -SUM. It improves on the best known \(O(n^{d+1}\log n)\) algorithm whenever \(k<2(d+1)\).

Corollary 1.6

There exists an \(O(n^{\lceil k/2\rceil })\) (for k odd), or an \(O(n^{k/2}\log n)\) (for k even) randomized real RAM algorithm for AFFINE MATCHING.

Finally, we consider the nonuniform decision tree complexity, also known as query complexity, of the two problems. By applying a recent result of Kane et al. [27], we can bound the number of algebraic tests that are required to detect copies of P in an input set S.

Corollary 1.7

There exist randomized algebraic decision trees of height \(O(n\log ^2\!n)\) for SIMILARITY MATCHING and AFFINE MATCHING.

In fact, if the pattern P is a fixed parameter, that is, when P is not part of the input, but known at the algorithm design time, then the decision tree in the statement above only involves linear tests.

Corollary 1.8

There exist randomized linear decision trees of height \(O(n\log ^2\!n)\) for the fixed-parameter versions of SIMILARITY MATCHING and AFFINE MATCHING, in which P is a fixed parameter of the problems.

In a recent paper, Aronov et al. [8] study the following problem: Given three sets ABC of n points in the plane, decide whether there exists \((a,b,c)\in A\times B\times C\) that simultaneously satisfies two real polynomial equations. They provide a subquadratic upper bound on the algebraic decision tree complexity of this problem. In a preliminary version of their paper [9, version 2, Corollary 4.4], they considered the TRIANGLE problem as a special case of this problem. This version also contains a proof that TRIANGLE is 3-SUM-hard. As our result shows, it turns out that this special case is in fact much easier than the general problem, as the two polynomial equations can be made linear. Hence TRIANGLE is actually linear-time equivalent to 3-SUM, and its decision tree complexity is near-linear. We refer to [8, 9] for a thorough discussion of the relation between these and other related problems.

Note that our results rely on a randomized reduction. We leave as an open problem the question of whether analogous deterministic reductions exist.

Plan  In the next section, we define a number of variants of the k -SUM problem and prove they are all equivalent in the computation model we consider. In Sect. 3, we prove our main result for SIMILARITY MATCHING. Section 4 considers the AFFINE MATCHING problem. The last section is dedicated to the proof of Corollaries 1.7 and 1.8.

2 Linear Degeneracy Testing

We first give a definition of the k -SUM problem. Here, \(k\ge 3\) is a fixed integer and X is a ring.

Problem 2.1

() Given k sets \(A_1,\ldots ,A_k\) of n elements of X, determine whether there exists a k-tuple such that \(\sum _{i=1}^ka_i=0\).

Our next problem is often referred to as linear degeneracy testing [7, 17]. We consider the cases where \(X={\mathbb {R}}\) or \({\mathbb {C}}\) with the usual addition and multiplication operations, or where \(X={\mathbb {R}}^d\) or \({\mathbb {C}}^d\) for some integer \(d\ge 2\), with the vector addition and Hadamard (entrywise) product defined by \((uv)_i=u_iv_i\). In the latter cases, the all-zero vector is denoted by 0, and the all-one vector by 1.

Problem 2.2

() For a linear function \(f:X^k\rightarrow X\) given by \(f(a_1,\ldots ,a_k)= \beta _0 + \sum _{i=1}^k \beta _i a_i\) with \(\beta _i\in X\) for \(0\le i\le k\), given k sets \(A_1,\ldots ,A_k\) of n elements of X, determine whether there exists a k-tuple such that \(f(a_1,\ldots ,a_k)=0\).

We make two observations. First, these are fixed-parameter problems: the integer k is part of the definition of the problem, not of the input. The same can be assumed for the function f. Such parameters will be referred to as fixed in what follows. Another observation is that using the Hadamard product in the definition of the function f allows us to combine conditions on the sought k-tuples: In the ring X, searching for k-tuples that simultaneously satisfy d linear equations can be cast as .

It is clear that k -SUM is the special case of k -LDT in which \(\beta _0=0\) and \(\beta _i=1\) for \(1\le i\le k\). On the other hand, k -LDT is not harder than k -SUM.

Lemma 2.3

For any integer \(d>0\), reduces in linear time to .

Proof

Consider the sets \(A_i\) from the k -LDT instance and let \(A'_i:=\{\beta _i a \mid a\in A_i \}\) for all \(1\le i< k\), and \(A'_k:=\{\beta _k a+\beta _0\mid a\in A_k\}\). Then the instance of k -SUM composed of the sets \(A'_i\) has a solution if and only if the instance of k -LDT has a solution. \(\square \)

In what follows, we say that a problem A reduces to problem B in randomized g(n) time if there exists an algorithm in the real RAM model with access to random real numbers in [0, 1] that maps any instance of size n of A to an equivalent instance of B in time O(g(n)) with probability 1. Over the reals, the vector and scalar versions of k -SUM are also essentially equivalent, up to such a randomized reduction.

Lemma 2.4

For any fixed integer \(d>0\), reduces in randomized linear time to .

Proof

Given an instance \(\{A_1,\ldots ,A_k\}\) of , pick a uniform random unit vector \(v\in {\mathbb {R}}^d\) (see for instance Chapter V in Devroye’s classical textbook [16] for the generation of random vectors on the unit hypersphere) and consider the sets \(A'_i:=\{a\cdot v\mid a\in A_i\}\subset {\mathbb {R}}\), where \(a\cdot v\) is the usual dot product. They form an instance of such that any solution to the original instance of is also a solution. In the other direction, suppose there is a k-tuple such that \(\sum _{i=1}^ka'_i=0\), where \(a'_i=a_i\cdot v\). Hence we have \(\sum _{i=1}^ka_i\cdot v=0\), which is either because \(v\perp \sum _{i=1}^ka_i\) and \(\sum _{i=1}^ka_i\ne 0\), or because \(\sum _{i=1}^ka_i=0\). Since \(v\perp \sum _{i=1}^k a_i\) and \(\sum _{i=1}^ka_i\ne 0\) occurs with probability 0, the k-tuple \((a_1,\ldots ,a_k)\) is a solution of the instance \(\{A_1,\ldots ,A_k\}\) of with probability 1. \(\square \)

In a model of computation where random real numbers are not available, one can replace a random unit vector in \({\mathbb {R}}^d\) by a random vector v from the integer grid cube \([n^c]^d\), for a suitably large constant c. There is no need to normalize the length of v. The above dot product becomes zero only if v lies on a particular hyperplane. Since the grid cube intersects any hyperplane in at most \((n^c)^{d-1}\) points, the probability of such an event is at most \(1/n^c\). We also make the following simple observation:

Observation 2.5

is equivalent to .

3 Searching for a Similar Copy

Recall that in the TRIANGLE problem, we want to determine whether an input set S of n points in the plane contains three points whose convex hull is similar to a given triangle \(\varDelta \). The short proof of the following result uses the interpretation of points in the plane as complex numbers, an idea that was exploited in a combinatorial context before [18, 28].

Lemma 3.1

TRIANGLE reduces in linear time to .

Proof

Let \(u=re^{i\theta }\) be such that the three numbers 0, 1, u are the vertices of a triangle similar to \(\varDelta \) in the complex plane. Recall that multiplying by \(re^{i\theta }\) has a geometric interpretation in the complex plane as scaling by a factor r and rotating by an angle \(\theta \). Hence three other complex numbers \(a,b,c\in {\mathbb {C}}\) form a triangle similar to \(\varDelta \) in the complex plane with the same orientation if and only if \(c-a=u(b-a)\), or equivalently if \((u-1) a-ub+c=0\). Hence TRIANGLE reduces to with \(\beta =(0,u-1,-u,1)\). From Lemma 2.3, it reduces in linear time to . \(\square \)

Combining with Observation 2.5 and Lemma 2.4, we obtain:

Theorem 3.2

TRIANGLE reduces in randomized linear time to .

Recall that TRIANGLE is also known to be 3-SUM-hard [9], hence it is actually linear-time equivalent to 3-SUM. Our result generalizes naturally to larger patterns.

Lemma 3.3

SIMILARITY MATCHING reduces in linear time to .

Proof

Let \(u_1,\ldots ,u_{k-2}\in {\mathbb {C}}\) be such that the set \(Q=\{0,1,u_1,\ldots ,u_{k-2}\}\) is similar to P in the complex plane. Then k numbers \(a_1,\ldots ,a_k\in {\mathbb {C}}\) form a similar copy of Q in the complex plane, with \(a_1\) mapped to 0, \(a_2\) to 1, and so on, if and only if \(a_i-a_1=u_{i-2}(a_2-a_1)\) for all \(3\le i\le k\). These are \(k-2\) linear equations on the k complex numbers \(a_1,\ldots ,a_k\), hence SIMILARITY MATCHING reduces in linear time to . From Lemma 2.3, it reduces in linear time to . \(\square \)

Again, combining with Observation 2.5 and Lemma 2.4, we obtain the first statement of Theorem 1.3.

Theorem 3.4

SIMILARITY MATCHING reduces in randomized linear time to .

4 Searching for an Affine Image

We now prove the analogous result for the affine case. As a warm-up, we first consider the following simpler special case of AFFINE MATCHING in which the pattern is a square. Four points form the affine image of vertices of a square if and only if they are the vertices of a (possibly degenerate) parallelogram. Hence the problem can be cast as follows.

Problem 4.1

(PARALLELOGRAM) Given a set S of n points in the plane, determine whether S contains four points whose convex hull is a parallelogram.

Theorem 4.2

PARALLELOGRAM reduces in randomized linear time to .

Proof

Four points \(a_1,a_2,a_3,a_4\in S\) in this order form a parallelogram with \(a_1a_2\) parallel to \(a_4a_3\) and \(a_2a_3\) parallel to \(a_1a_4\) if and only if \(a_2-a_1=a_3-a_4\), or equivalently if \(a_1-a_2+a_3-a_4=0\). Hence PARALLELOGRAM reduces to with \(\beta =((0,0),(1,1),(-1,-1),(1,1),(-1,-1))\). From Lemmas 2.3 and 2.4, it also reduces in randomized linear time to . \(\square \)

The general case follows from the following observation. Consider a matrix \(Q\in {\mathbb {R}}^{n\times n}\) and let \(Q_k\) denote the matrix obtained from Q by replacing its kth column by the column vector \(x^T\), where \(x_1,x_2,\ldots ,x_n\) are variables. Then \(\det Q_k\) is a linear combination of \(x_1,x_2,\ldots ,x_n\), with coefficients defined by Q.

Lemma 4.3

AFFINE MATCHING reduces in linear time to with \(\ell =d(k-(d+1))\).

Proof

We use the notation \([k]:=\{1,2,\ldots ,k\}\). Let \(p_i=(p_{i,1},\ldots ,p_{i,d})\) be a row vector representing the ith point of P. From the problem definition, P must contain \(d+1\) affinely independent points. Since we suppose k and d fixed, these points can be determined in constant time. We therefore assume without loss of generality that they are the first \(d+1\) points \(p_1,\ldots ,p_{d+1}\). Let \(A=\{a_1,\ldots ,a_k\}\in {S\atopwithdelims ()k}\) be a candidate match. In order for the set A to be the image of P under an affine transformation, there must be a solution to the system of k linear equations of the form \(p_iF+t=a_i\) for all \(i\in [k]\), with \(d^2+d\) real unknowns \(F\in {\mathbb {R}}^{d\times d}\) and \(t\in {\mathbb {R}}^d\). The system can be decomposed into d systems, one for each coordinate \(j\in [d]\). Each consists of k equations with \(d+1\) unknowns, of the form \(p_iF_j+t_j=a_{ij}\) for \(i\in [k]\), where \(F_j\) is the jth column of F. We consider one such system, for a fixed \(j\in [d]\), and restrict it to the first \(d+1\) equations only:

$$\begin{aligned} Q \cdot \begin{pmatrix} F_j \\ t_j\end{pmatrix}=\begin{pmatrix} a_{1,j}\\ \vdots \\ a_{d+1,j}\end{pmatrix},\qquad \text {where}\quad \ Q=\begin{pmatrix} p_1 &{} 1 \\ \vdots &{} \vdots \\ p_{d+1} &{} 1\end{pmatrix}. \end{aligned}$$

Since the first \(d+1\) points of P are affinely independent, Q is invertible and the system defines a unique solution for the coefficients \(F_j\) and \(t_j\) of the affine transformation. From Cramer’s rule, the value of the kth unknown is the ratio \({\det Q_k}/{\det Q}\), where \(Q_k\) is the matrix obtained by replacing the kth column of Q by \((a_{1,j},\ldots ,a_{d+1,j})^T\). From the above observation and the fact that Q does not depend on S, the expressions \({\det Q_k}/{\det Q}\) are linear combinations of the values \(a_{1,j},\ldots ,a_{d+1,j}\), with coefficients determined by P. Hence the explicit solution for the coefficients \(F_j\) and \(t_j\) are linear combinations of the \(a_{1,j},\ldots ,a_{d+1,j}\).

A necessary and sufficient condition for the set A to be a match is that the remaining \(k-d-1\) points of A are also images of the corresponding points in P. Hence we require that for all \(i>d+1\) the ith equation \(p_i F_j + t_j = a_{ij}\) is also satisfied by this solution. The unknowns \(F_j\) and \(t_j\) can be replaced by linear combinations of \(a_{1,j},\ldots ,a_{d+1,j}\). Hence we obtain a set of \(k-(d+1)\) linear equations on the variables \(a_{1,j},\ldots ,a_{k,j}\), with coefficients depending on P.

Since these \(k-(d+1)\) equations must hold for all coordinates \(j\in [d]\) simultaneously, we obtain that AFFINE MATCHING reduces to with \(\ell =d(k-(d+1))\). From Lemma 2.3 it also reduces to . Since d and k are fixed, the reduction takes linear time. \(\square \)

Combining with the randomization step in Lemma 2.4, we get the second part of Theorem 1.3.

Theorem 4.4

AFFINE MATCHING reduces in randomized linear time to .

5 Algebraic Decision Tree Complexity

An algebraic decision tree is a type of nonuniform algorithm for problems on inputs composed of n real numbers. For each input size n, it consists of a binary tree whose internal nodes are labeled with inequalities of the form “\(q(x)\le 0\)” on the input \(x\in {\mathbb {R}}^n\), where q is a bounded-degree n-variate polynomial in \(x_1,x_2,\ldots ,x_n\). Inequalities are interpreted as queries on the input, and the two subtrees correspond to the possible outcomes of the query on the input. Leaves of the tree are labeled with the answer to the problem. The minimum height h(n) of an algebraic decision tree solving instances of size n of the problem is the decision tree complexity, or query complexity of the problem. When the queries only involve linear functions, such trees are called linear decision trees. In that case, a query is said to be t-sparse when it involves at most t numbers of the input.

We have the following recent result on the linear decision tree complexity of the k -SUM problem.

Theorem 5.1

(Kane et al. [27])  The k -SUM problem on n elements can be solved by a linear decision tree of height \(O(kn\log ^2\!n)\) in which all the queries are 2k-sparse and have only \(\{-1,0,1\}\) coefficients.

We now show that this result directly applies to the SIMILARITY MATCHING and AFFINE MATCHING problems, thereby proving Corollary 1.7.

We first consider the SIMILARITY MATCHING problem, an instance y of which consists of two coordinates per point of P and S, hence of \(2(k+n)\) real numbers. Suppose we apply the randomized reduction proposed in Theorem 3.4 to obtain an instance of . Now consider the linear decision tree from Theorem 5.1. Each linear query on the transformed input maps to a query on the original input numbers y. Because the reduction only involves multiplications and additions on these numbers, such queries are algebraic queries on the original input y. Therefore, the linear decision tree for k -SUM maps to an algebraic decision tree of the same height for SIMILARITY MATCHING. The same reasoning applies to AFFINE MATCHING. In that case, it suffices to observe that multiplying both sides of every query by the quantity \(\det Q\) for the matrix Q used in the proof of Lemma 4.3 yields algebraic queries again. Note that since k and d are constant and the linear queries in Theorem 5.1 are sparse, the queries have bounded degree and bounded size. This proves Corollary 1.7.

Also note that if we suppose the pattern P is a fixed parameter of the problem, then the two problems are solved by linear decision trees of height \(O(n\log ^2\!n)\). It can indeed be checked that the algebraic queries do not involve multiplications between coordinates of the points of S, hence are linear whenever P is fixed. This proves Corollary 1.8. It applies in particular to the PARALLELOGRAM problem, or to finding an equilateral triangle in a point set.