1 Introduction

Graph algorithms are the foundation of many computer science applications, such as navigation systems, community detection, supply chain networks [32, 38, 39], hyperspectral imaging [35], and sparse linear solvers. Privacy-preserving parallel algorithms are needed to expedite the processing of large private data sets with graph algorithms and to meet high-end computational demands. Constructing real-world privacy applications on top of secure multiparty computation (SMC) is challenging due to the round complexity of the protocols run among the computation parties [11, 23, 24]. The round complexity problem of SMC protocols can be mitigated using parallel computing [10, 14].

Single-Instruction-Multiple-Data (SIMD) is a principle for parallel computations [19]. Recently, SIMD principles have been used to reduce the round complexities of many privacy-preserving graph algorithms, including minimum spanning tree [4, 25] and shortest path [2, 3, 5]. These privacy-preserving graph protocols are constructed on top of SMC protocols, and they are capable of processing sizeable private data sets, where both the locations and lengths of edges are private. The construction of these protocols follows the classical graph algorithms [15]: they store the intermediate values privately, invoke SMC protocols for the computational steps of these algorithms, and attempt to parallelize the computations as much as possible.

In this paper we show that certain other SSSD algorithms may be even more suitable for conversion into SMC protocols. Considering the Parallel RAM (PRAM) model of execution, Pan and Reif [29, 31] proposed a parallel algorithm for the Algebraic Path Computation (APC). Their algorithm assumes the availability of the separator tree of the graph and computes a recursive factorization of the graph’s adjacency matrix on its basis [31], with the number of steps and the parallel complexity depending on the height of the tree and the size of separators. We also assume that the separator tree is among the public inputs for the SMC protocol, and show how to privately execute Pan and Reif’s algorithm. Our empirical evaluation shows that for graphs with “good” separator trees, including planar graphs, our protocol may be up to two orders of magnitude faster than protocols based on classical SSSD algorithms.

The availability of the separator tree implies that the locations of the edges of the graph have to be public (this is accounted for in our empirical comparisons), only their lengths are private. Privately computing SSSD in this setting can still be relevant for a number of applications. E.g. private SSSD in city streets with public layouts has been tackled using either SMC [37] or differential privacy [34]. However, SMC protocols based on classical SSSD algorithms either do not benefit from the public end-points of edges at all [17], or benefit only slightly [6].

Our Contributions. In this paper, we present the following:

  • The first privacy-preserving parallel protocol for algebraic shortest path computation. The protocol uses a sparse representation of the (adjacency) matrix, where the locations of edges are public, while their lengths are private. We propose suitable data structures and normalizations for this task.

  • Implementations (on top of the Sharemind MPC platform [7, 8]) of the algebraic shortest path protocol, and of an optimized privacy-preserving Bellman-Ford protocol, with public locations and private lengths of edges, together with benchmarking results for both implementations for various sizes of graphs and different network environments.

2 Preliminaries

2.1 Secure Multiparty Computation

Secure multiparty computation (SMC) is a cryptographic technique allowing a number of parties to each give an input to a pre-agreed functionality F and to learn the output meant for that party, such that no party (or tolerable coalition of parties) learns anything besides its own inputs and outputs. There exist a number of different approaches for constructing SMC protocols, including garbled circuits [40], homomorphic encryption [16, 21], and secret sharing [12, 20], offering security against either passive or active adversaries. These approaches typically include steps for entering a value into the computation in a privacy-preserving manner, for performing simple arithmetic operations (e.g. addition and multiplication in a finite field or ring) with private values present in the computation, and for opening a private value to a party upon the agreement of sufficiently many other parties. These steps, which constitute protocols by themselves, can be combined relatively freely. Hence, if the functionality F has been presented as an arithmetic circuit, then these protocols for input/output and arithmetic operations can be combined to yield a protocol for F.

Availability of such compositions leads to the typical abstraction of SMC in privacy-preserving applications—the Arithmetic Black Box (ABB) [16, 26]. An ABB is an ideal functionality in the Universal Composability [13] framework. This framework considers a set \(\mathcal {T}\) of interacting Turing machines [22], executing a protocol \(\varPi \). Besides the set of machines \(\mathcal {T}\), there is also another Turing machine—the adversary that can interfere with \(\varPi \) by sending to machines in \(\mathcal {T}\) certain commands that have been defined in the adversarial API’s of these machines. The set of the machines also includes the environment that interacts with machines in \(\mathcal {T}\) and the adversary over a well-defined API. Given two sets of machines \(\mathcal {T}\) and \(\mathcal {T}'\) implementing the same API towards the environment, we say that \(\mathcal {T}\) is at least as secure as \(\mathcal {T}'\), if for any possible adversary \(\textbf{A}\) targeting \(\mathcal {T}\) (i.e. its adversarial API), there exists an adversary \(\textbf{S}\) targeting \(\mathcal {T}'\), such that the environment cannot distinguish whether it is executing with \(\mathcal {T}\) and \(\textbf{A}\), or with \(\mathcal {T}'\) and \(\textbf{S}\). This notion is composable: if additionally \(\mathcal {T}=\mathcal {T}_0\cup \{\varXi \}\) for a Turing machine \(\varXi \), and a set of machines \(\mathcal {U}\) is at least as secure as \(\{\varXi \}\), then \(\mathcal {T}_0\cup \mathcal {U}\) is at least as secure as \(\mathcal {T}'\). Often, we say that \(\varXi \) is the ideal functionality for the corresponding real functionality \(\mathcal {U}\) that implements it.

The ABB functionality is represented by a Turing machine \(\mathcal {F}_\mathcal {ABB}\) that allows the environment representing all parties of a multiparty application to perform private computations. If one of the parties sends the command \((\textsf{store},v)\) to the ABB, where v is a value from one of the rings that the ABB supports, then it creates a new handle h, stores the pair (h, v), and sends h back to all parties. If all (or sufficiently many) parties send the command \((\textsf{perform}, op ,h_1,\ldots ,h_k)\) to the ABB, where \( op \) is one of the supported operations and \(h_1,\ldots ,h_k\) are existing handles, then the ABB looks up the stored pairs \((h_1,v_1),\ldots ,(h_k,v_k)\), computes \(v= op (v_1,\ldots ,v_k)\), creates a new handle h, stores (h, v), and sends h back to all parties. If all (or sufficiently many) parties send the command \((\textsf{declassify},h)\), then the ABB looks up (h, v) and sends v back to all parties. A secure application that makes use of the ABB remains secure if \(\mathcal {F}_\mathcal {ABB}\) is replaced with a set of Turing machines that securely implement the ABB, i.e. run secure multiparty computation protocols. Note that if we want to compute a function F with the help of an ABB, and if the ABB only declassifies the end result of F, then the resulting protocol is trivially private [26].
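As a concrete (and deliberately insecure) illustration of this interface, the following Python sketch stores values in the clear and only mimics the handle-based API; the class and method names are ours and do not belong to any SMC framework:

```python
# Minimal cleartext stand-in for the ABB interface described above.
# It is NOT secure: values are stored in the clear; only the handle-based
# store / perform / declassify API is illustrated.
class MockABB:
    def __init__(self):
        self._values = {}    # handle -> stored value
        self._next = 0       # next fresh handle

    def store(self, v):
        """Corresponds to the (store, v) command: returns a fresh handle."""
        h = self._next
        self._next += 1
        self._values[h] = v
        return h

    def perform(self, op, *handles):
        """Corresponds to (perform, op, h1, ..., hk): applies op to the
        stored values and returns a handle to the result."""
        vals = [self._values[h] for h in handles]
        return self.store(op(*vals))

    def declassify(self, h):
        """Corresponds to (declassify, h): reveals the stored value."""
        return self._values[h]

abb = MockABB()
h1 = abb.store(3)
h2 = abb.store(5)
h3 = abb.perform(min, h1, h2)   # private minimum, result stays in the ABB
print(abb.declassify(h3))       # -> 3
```

A real implementation replaces the dictionary by secret-shared values and each `perform` by an MPC protocol between the computation parties.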

In the following, a value v stored in the ABB and accessed through a handle is denoted by \(\llbracket {v}\rrbracket \). Similarly, \(\llbracket {\boldsymbol{v}}\rrbracket \) denotes a vector of values, and \(\llbracket {\textbf{V}}\rrbracket \) a matrix of values stored in the ABB. We use the notation \(\llbracket {u}\rrbracket +\llbracket {v}\rrbracket \) [resp. \(\textsf{min}(\llbracket {u}\rrbracket ,\llbracket {v}\rrbracket )\)] to denote that the addition [resp. minimum] operation is being invoked on the values \(\llbracket {u}\rrbracket \) and \(\llbracket {v}\rrbracket \); the result of this operation is again stored in the ABB. We extend this notation pointwise to vectors and matrices.

The cost of the operations of the ABB depends on the implementation of \(\mathcal {F}_\mathcal {ABB}\). If Sharemind is used as the implementation, then addition is a free operation (i.e. it requires no communication between parties), while minimum requires a constant number of bits to be exchanged in a constant number of rounds. Hence the bandwidth cost of \(\textsf{min}(\llbracket {\boldsymbol{u}}\rrbracket ,\llbracket {\boldsymbol{v}}\rrbracket )\) is linear in the length of \(\boldsymbol{u}\) and \(\boldsymbol{v}\), while the round complexity is logarithmic in their length. In the following descriptions of algorithms built on top of the ABB, we have to state explicitly which operations can or cannot be performed in parallel. For loops, we write forall to denote that all iterations take place in parallel; we write for to state that the loop is sequential.

2.2 Graphs, Separators, Semirings, and Algebra Path Problem

A graph \(G=(V,E)\) is a mathematical structure consisting of a set V of vertices that are connected by edges from a set \(E\subseteq V\times V\). The edges between vertices may have lengths assigned to them; these are given by a function \(w:E\rightarrow \mathbb {Z}\). A graph may be directed or undirected; in the latter case, E is symmetric. If \(V'\subseteq V\), then we let \(G[V']\) denote the induced subgraph \((V',E\cap V'\times V')\).

A graph \(G=(V,E)\) can be represented in computer memory in different ways. The dense representation of G is the adjacency matrix—a \(|V|\times |V|\) matrix over \(\mathbb {Z}\cup \{\infty \}\), where the entry in the u-th row and v-th column is w(u, v). On the other hand, the adjacency list representation gives for each vertex \(u\in V\) the list of pairs \((v_1, w_1),\ldots ,(v_k, w_k)\), where \((u,v_1),\ldots ,(u, v_k)\) are all edges in G that start in u, and \(w_i = w(u, v_i)\); we call such representations sparse. If \(|E|\ll |V|^2\), then the sparse representation takes up less space than the dense one, and algorithms working on the sparse representation may be faster [9].

We call a graph (actually, an infinite family of graphs) sparse if its number of edges is proportional to its number of vertices, \(|E|=O(|V|)\); otherwise we call it dense. A graph is planar if it can be drawn on a plane so that its edges intersect only at vertices. If G is planar, then \(|E|\le 3|V|-6\) by Euler’s formula for the numbers of vertices, edges, and faces of its drawing [36].

A separation of the graph \(G=(V,E)\) is a partition of its vertices \(V=V_1\mathbin {\dot{\cup }}S\mathbin {\dot{\cup }}V_2\), such that any path from a vertex in \(V_1\) to a vertex in \(V_2\) must pass through a vertex in S (called a separator). It is known [27] that planar graphs have separations where \(|S|=O(\sqrt{|V|})\) and \(|V_1|,|V_2|\le 2|V|/3\). A separator tree of G is either a single node containing \((\emptyset ,V,\emptyset )\); or the root node \((V_1,S,V_2)\) for some separation of G, and its two subtrees—separator trees for \(G[V_1\cup S]\) and \(G[V_2\cup S]\). Planar graphs thus have separator trees of height \(O(\log |V|)\).

A (commutative) semiring is an algebraic structure S with two binary operations \(\oplus \) and \(\otimes \), where both are associative and commutative and have unit elements \(\mathbb {0}\) and \(\mathbb {1}\) respectively, where \(\otimes \) distributes over \(\oplus \), and where \(a\otimes \mathbb {0}=\mathbb {0}\) for all \(a\in S\). We overload \(\oplus \) and \(\otimes \) to also denote addition and multiplication of matrices with elements from S; the multiplication may also be denoted by juxtaposition. Given matrices \(\textbf{A}\in S^{n\times n}\) and \(\textbf{X}\in S^{m\times n}\), the algebra path problem is to find a matrix \(\textbf{Y}\in S^{m\times n}\), such that \(\textbf{Y} = \textbf{X}\oplus \textbf{Y} \textbf{A}\). Let \(\textbf{I}\in S^{n\times n}\) be the identity matrix. If \(\textbf{A}\) has a quasi-inverse, i.e. a matrix \(\textbf{A}^*\in S^{n\times n}\), such that \(\textbf{I}\oplus \textbf{A}\textbf{A}^*=\textbf{I}\oplus \textbf{A}^*\textbf{A}=\textbf{A}^*\), then \(\textbf{Y}=\textbf{X}\textbf{A}^*\) is a possible solution to the algebra path problem.

The algebra path problem generalizes a number of graph-theoretic tasks. Let \(G=(V,E)\) be a graph and \(t\in V\), and let \(S=\mathbb {N}\cup \{\infty \}\), \(\oplus \) be the minimum, \(\otimes \) be the addition, \(\mathbb {0}=\infty \), \(\mathbb {1}=0\), \(n=|V|\), \(m=1\), \(\textbf{A}\) be the adjacency matrix of G, and \(\boldsymbol{x} = \textbf{X}\in S^{1\times n}\) be the t-th unit vector (i.e. \(x_t=\mathbb {1}\) and \(x_j=\mathbb {0}\) for \(j\not =t\)). Then \(\boldsymbol{y}=\textbf{Y}\) is the vector of shortest distances from the t-th vertex [28]. Other instantiations of the semiring and \(\textbf{X}\) give solutions to other problems [30].
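This instantiation can be checked in a few lines of cleartext Python (our own illustration, not part of the protocols in this paper): in the (min, +) semiring, iterating \(y := x \oplus (y \otimes \textbf{A})\) reaches the fixpoint of \(\textbf{Y} = \textbf{X}\oplus \textbf{Y}\textbf{A}\), i.e. the shortest distances from t.

```python
# Cleartext illustration of the tropical-semiring instantiation.
INF = float('inf')   # the 0-element of the semiring

def mp_vecmat(y, A):
    """Min-plus product of a row vector with a square matrix (y ⊗ A)."""
    n = len(A)
    return [min(y[k] + A[k][j] for k in range(n)) for j in range(n)]

def sssd_fixpoint(A, t):
    """Iterate y := x ⊕ (y ⊗ A) until the fixpoint, i.e. SSSD from t."""
    n = len(A)
    x = [INF] * n
    x[t] = 0             # t-th unit vector: the 1-element of the semiring is 0
    y = x[:]
    for _ in range(n):   # n iterations suffice on a graph with n vertices
        y = [min(a, b) for a, b in zip(x, mp_vecmat(y, A))]
    return y

# Directed graph: 0 -> 1 (length 4), 1 -> 2 (length 1), 0 -> 2 (length 10)
A = [[INF, 4, 10],
     [INF, INF, 1],
     [INF, INF, INF]]
print(sssd_fixpoint(A, 0))   # -> [0, 4, 5]
```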

Having the semiring instantiated as in the previous paragraph, the quasi-inverse of \(\textbf{A}\in S^{n\times n}\) is defined; it is equal to \((\textbf{I}\oplus \textbf{A})^n\). If \(\textbf{A}\) is the adjacency matrix of some graph, then \(\textbf{A}^*\) is the matrix of shortest distances between the vertices of the same graph. Hence any all-pairs shortest distance (APSD) algorithm is suitable for computing \(\textbf{A}^*\); but it would be inefficient to use for the SSSD problem, particularly when \(\textbf{A}\) is a sparse(ly represented) matrix.
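The identity \(\textbf{A}^*=(\textbf{I}\oplus \textbf{A})^n\) can likewise be illustrated in cleartext Python (again our own sketch, assuming non-negative edge lengths; any APSD algorithm could replace the repeated squaring):

```python
# Quasi-inverse in the tropical semiring: A* = (I ⊕ A)^n, computed by
# repeated min-plus squaring; the result is the matrix of shortest distances.
INF = float('inf')

def mp_matmul(A, B):
    """Min-plus product of two square matrices."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def quasi_inverse(A):
    n = len(A)
    # I ⊕ A: the 1-element (0) on the diagonal, assuming non-negative lengths
    M = [[0 if i == j else A[i][j] for j in range(n)] for i in range(n)]
    p = 1
    while p < n:          # ceil(log2 n) squarings reach (I ⊕ A)^n
        M = mp_matmul(M, M)
        p *= 2
    return M

A = [[INF, 4, 10],
     [INF, INF, 1],
     [INF, INF, INF]]
print(quasi_inverse(A))   # -> [[0, 4, 5], [inf, 0, 1], [inf, inf, 0]]
```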

Given \(\boldsymbol{x}\) and symmetric \(\textbf{A}\), Pan and Reif [30] proposed the following algorithm for computing \(\boldsymbol{x}\otimes \textbf{A}^*\) without ever materializing \(\textbf{A}^*\). Let \(d\in \mathbb {N}\) and pick numbers \(n=n_0>n_1>\cdots>n_d>0\). Let \(\textbf{P}\in S^{n\times n}\) be a permutation matrix. Define the matrix \(\textbf{A}_0=\textbf{P}\textbf{A}\textbf{P}^T\) (i.e. we permute the rows and columns of \(\textbf{A}\) in the same manner; this corresponds to reordering the vertices of G) and define the matrices \(\textbf{X}_h,\textbf{Y}_h,\textbf{Z}_h,\textbf{A}_{h+1}\) (for \(h\in \{0,\ldots ,d-1\}\)) by

$$\begin{aligned} \begin{bmatrix} \textbf{X}_h &{} \textbf{Y}_{h}^T \\ \textbf{Y}_h &{} \textbf{Z}_h \end{bmatrix} := \textbf{A}_h, \qquad \textbf{A}_{h+1} := \textbf{Z}_{h} \oplus \textbf{Y}_{h} \textbf{X}_{h}^*\textbf{Y}_{h}^T, \end{aligned}$$
(1)

where \(\textbf{Z}_h,\textbf{A}_{h+1}\in S^{n_{h+1}\times n_{h+1}}\); this also defines the sizes of \(\textbf{X}_h\) and \(\textbf{Y}_h\). Letting \(\textbf{I}\) and \(\textbf{O}\) denote identity and zero matrices of appropriate sizes, one can verify that the following identity holds:

$$\begin{aligned} \textbf{A}_{h}^*= \begin{bmatrix} \textbf{I} &{} \textbf{X}_{h}^*\otimes \textbf{Y}_{h}^T \\ \textbf{O} &{} \textbf{I} \end{bmatrix}\otimes \begin{bmatrix} \textbf{X}_{h}^* &{} \textbf{O} \\ \textbf{O} &{} \textbf{A}_{h+1}^* \end{bmatrix}\otimes \begin{bmatrix} \textbf{I} &{} \textbf{O} \\ \textbf{Y}_{h} \otimes \textbf{X}_{h}^* &{} \textbf{I} \end{bmatrix}. \end{aligned}$$
(2)

We thus have an algorithm to compute \(\boldsymbol{y}=\boldsymbol{x}\otimes \textbf{A}^*\). Let \({\boldsymbol{x}}_\textrm{P}=\boldsymbol{x}\otimes \textbf{P}^T\). Let \(h=0\). Extract \(\textbf{X}_h,\textbf{Y}_h,\textbf{Z}_h\) from \(\textbf{A}_h\) and compute \(\textbf{X}_h^*\) (using any APSD algorithm), \(\textbf{Q}_h=\textbf{Y}_h\otimes \textbf{X}_h^*\) and \(\textbf{A}_{h+1}\). Note that \(\textbf{X}_h^*\otimes \textbf{Y}_h^T=\textbf{Q}_h^T\). Multiply \({\boldsymbol{x}}_\textrm{P}\) with the first matrix in (2), then the result with the second matrix, and then the result with the third matrix, thus defining \({\boldsymbol{y}}_\textrm{P}={\boldsymbol{x}}_\textrm{P}\otimes \textbf{A}_0^*\). Finally remove the permutation, computing \(\boldsymbol{y}={\boldsymbol{y}}_\textrm{P}\otimes (\textbf{P}^T)^{-1}\). All computations are done with sparse matrices. Importantly, multiplication with the second matrix in (2) splits the current vector into two parts, where the left part is multiplied with \(\textbf{X}^*_h\), and the right part with \(\textbf{A}_{h+1}^*\) through a recursive call. The recursion stops by computing \(\textbf{A}^*_d\) directly (using any APSD algorithm).
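The recursion just described can be sketched in cleartext Python with dense matrices (our own simplification: no sparsity, no ABB, no permutation, and a `sizes` list holding the block sizes \(n_h-n_{h+1}\); the names `mp_*` and `pan_reif` are ours):

```python
# Cleartext dense-matrix sketch of the recursive computation of x ⊗ A*.
INF = float('inf')

def mp_matmul(A, B):
    """Min-plus matrix product (rectangular matrices allowed)."""
    return [[min(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mp_vecmat(v, M):
    """Min-plus row-vector times matrix."""
    return [min(v[k] + M[k][j] for k in range(len(v)))
            for j in range(len(M[0]))]

def quasi_inverse(A):
    """A* = (I ⊕ A)^n by repeated min-plus squaring (stands in for APSD)."""
    n = len(A)
    M = [[0 if i == j else A[i][j] for j in range(n)] for i in range(n)]
    p = 1
    while p < n:
        M = mp_matmul(M, M)
        p *= 2
    return M

def pan_reif(x, A, sizes):
    """Compute x ⊗ A* via factorization (2); sizes[h] = n_h - n_{h+1} is
    the number of rows/columns split off at recursion level h."""
    if len(sizes) == 1:
        return mp_vecmat(x, quasi_inverse(A))       # base case: A_d* directly
    s = sizes[0]
    X  = [row[:s] for row in A[:s]]                 # X_h
    Yt = [row[s:] for row in A[:s]]                 # Y_h^T
    Y  = [row[:s] for row in A[s:]]                 # Y_h
    Z  = [row[s:] for row in A[s:]]                 # Z_h
    Xs = quasi_inverse(X)                           # X_h*
    Q  = mp_matmul(Y, Xs)                           # Q_h = Y_h ⊗ X_h*
    QYt = mp_matmul(Q, Yt)
    A1 = [[min(Z[i][j], QYt[i][j]) for j in range(len(Z))]
          for i in range(len(Z))]                   # A_{h+1} = Z ⊕ Y X* Y^T
    # first matrix of (2): [I, X*Y^T; O, I]
    z_right = [min(a, b) for a, b in
               zip(x[s:], mp_vecmat(x[:s], mp_matmul(Xs, Yt)))]
    # second matrix: left part times X*, right part recursively times A_{h+1}*
    w1 = mp_vecmat(x[:s], Xs)
    w2 = pan_reif(z_right, A1, sizes[1:])
    # third matrix of (2): [I, O; Y X*, I]
    return [min(a, b) for a, b in zip(w1, mp_vecmat(w2, Q))] + w2

# Undirected path graph 0 - 1 - 2 with lengths 2 and 3; distances from 0
A = [[INF, 2, INF],
     [2, INF, 3],
     [INF, 3, INF]]
print(pan_reif([0, INF, INF], A, [1, 2]))   # -> [0, 2, 5]
```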

Pan and Reif [30] show that if the choice of \(\textbf{P}\) and \(n_0,\ldots ,n_d\) is informed by a separator tree \(\textbf{T}\) of G with height \(d=O(\log n)\) and separators of size \(O(\sqrt{n})\), then, depending on how \(\textbf{X}_h^*\) and \(\textbf{A}_d^*\) are computed, the described algorithm requires either \(O(\log ^3 n)\) parallel time and \(O(n^{3/2}\log n)\) work, or \(O(\sqrt{n}\log n)\) parallel time and \(O(n^{3/2})\) work. The time estimate follows directly from the parallel time complexity of the matrix operations, multiplied by the depth of the recursion. The work estimate follows from careful counting of elements in the sparse representations of matrices [31].

Pan and Reif [31, Sec. 7] describe what kind of information is extracted from the separator tree \(\textbf{T}\). We refer to them for details, but let us describe the result. The main outcome is a list \(\boldsymbol{\mathcal {L}}=(\mathcal {L}_0,\ldots ,\mathcal {L}_d)\) of lists of lists of vertices of G, such that each \(i\in V\) occurs in \(\boldsymbol{\mathcal {L}}\) exactly once. The permutation matrix \(\textbf{P}\) must reorder the vertices so that they appear in the same order as in the flattened \(\boldsymbol{\mathcal {L}}\). For \(h\in \{0,\ldots ,d\}\), the number \(n_h\) is equal to the number of vertices in \((\mathcal {L}_h,\mathcal {L}_{h+1},\ldots ,\mathcal {L}_d)\). Let \(\boldsymbol{L}=(L_0,\ldots ,L_d)\), where each \(L_h\) is the list of lengths of the elements of \(\mathcal {L}_h\) (note that the elements of \(\mathcal {L}_h\) are lists of vertices). In (1), \(\textbf{X}_h\) is going to be a block-diagonal matrix with the block sizes listed in \(L_h\); this is used in the computation of \(\textbf{X}^*_h\).

3 Privacy-Preserving Algebraic Shortest Path Protocol

This section presents the privacy-preserving version of the algorithm described above. We present the data structures used, the auxiliary functionalities, and the main computation.

Data Structures. We mostly use the sparse representation of matrices. The representation \(\langle \!\langle {\textbf{A}}\rangle \!\rangle \) of a matrix \(\textbf{A}\in S^{m\times n}\), where we do not hide the positions of non-\(\mathbb {0}\) cells but do hide the contents of these cells, is a triple \(\langle \!\langle {\textbf{A}}\rangle \!\rangle =\langle m,n,\mathcal {C}\rangle \), where \(\mathcal {C}\) is the list of cells of the matrix that may contain an element different from \(\mathbb {0}\). Each cell is again a triple \((i,j,\llbracket {v}\rrbracket )\), where \(i\in \{0,\ldots ,m-1\}\) and \(j\in \{0,\ldots ,n-1\}\) are the coordinates of that cell, and \(v\in S=\mathbb {N}\cup \{\infty \}\) is the value in it. The value v is stored privately in the ABB. In our implementation on top of Sharemind, we represent elements of S as 64-bit integers (representing \(\infty \) as a large number). In the following, we use the standard list constructors, destructors, and combinators—\(\textsf{NIL},\textsf{cons},\textsf{length},\textsf{head},\textsf{tail},\mathbin {+\!+}\) (concatenation)—to express algorithms working with lists. We write \(\mathcal {C}[k]\) for the k-th element of the list (starting from 0).

We allow the same coordinates (i, j) to occur several times in \(\mathcal {C}\). We define that the triple \(\langle m,n,\mathcal {C}\rangle \) represents an \(m\times n\) matrix, where the cell at coordinates (i, j) contains the value \(\min \{v \,|\, (i,j,\llbracket {v}\rrbracket )\in \mathcal {C}\}\).

We also make use of the dense representation \(\llbracket {\textbf{V}}\rrbracket \) of (small) matrices. It is simply a matrix of elements of S stored in the ABB.

Auxiliary Functions. We have a relatively large set of helper functions for decomposing and combining matrices, as well as for normalizing and converting between different representations. We list them below and briefly describe how they work.

  • \(\textsf{getMin}(\llbracket {\boldsymbol{v}}\rrbracket ,\boldsymbol{\imath })\) takes a private vector of values, and an equal-length public vector of indicators. The indicator vector consists of segments of equal values. If there are k such segments of lengths \(l_1,\ldots ,l_k\) (with \(|\boldsymbol{v}|=\sum _j l_j\)), then the output of \(\textsf{getMin}\) is a private vector of k values, where the j-th element is the minimum among the elements of \(\boldsymbol{v}\) at the positions corresponding to the j-th segment of equal values in \(\boldsymbol{\imath }\). The implementation of \(\textsf{getMin}\) is straightforward: we divide \(\llbracket {\boldsymbol{v}}\rrbracket \) into k segments according to the values in \(\boldsymbol{\imath }\), and then call \(\textsf{min}\) from the ABB for all segments in parallel. Sharemind does not directly support such parallel invocation for segments of different lengths, but it is still possible to design \(\textsf{getMin}\) to run in SIMD fashion, doing \(O(|\boldsymbol{v}|)\) work and requiring \(O(\log \max _j l_j)\) rounds.

  • \(\textsf{norm}_1(\langle m,n,\mathcal {C}\rangle )\) takes a sparsely represented matrix. It returns the same matrix, having sorted the elements \((i,j,\llbracket {v}\rrbracket )\) of \(\mathcal {C}\) by (i, j). It does not invoke any MPC protocols.

  • \(\textsf{norm}_2(\langle m,n,\mathcal {C}\rangle )\) first invokes \(\textsf{norm}_1\) on its input, and then removes the duplicate occurrences of the same cell from \(\mathcal {C}\). It does the latter by invoking \(\textsf{getMin}\).

  • \(\textsf{getSlice}(\langle m,n,\mathcal {C}\rangle ,u,l,m',n')\) returns the \(m'\times n'\)-sized submatrix of \(\langle m,n,\mathcal {C}\rangle \), whose upper left corner is at the cell (u, l) of the input matrix. Its output is \(\langle m',n',\mathcal {C}'\rangle \), where \(\mathcal {C}'\) is the list of elements \((i-u,j-l,\llbracket {v}\rrbracket )\), where \((i,j,\llbracket {v}\rrbracket )\in \mathcal {C}\), \(u\le i< u+m'\), and \(l\le j< l+n'\).

  • \(\textsf{overlay}(\langle m,n,\mathcal {C}\rangle ,u,l,m',n'),\) where \(m'\ge m+u\) and \(n'\ge n+l\), outputs \(\langle m',n',\mathcal {C}'\rangle \), where \(\mathcal {C}'\) is the list of elements \((i+u,j+l,\llbracket {v}\rrbracket )\), where \((i,j,\llbracket {v}\rrbracket )\in \mathcal {C}\). I.e. \(\textsf{overlay}\) creates an \(m'\times n'\)-sized supermatrix of the original matrix, where the upper left corner of the original matrix is at position (u, l), and the rest of the matrix is filled with \(\mathbb {0}\).

  • \(\textsf{overlap}(\langle m,n,\mathcal {C}_1\rangle ,\ldots ,\langle m,n,\mathcal {C}_k\rangle )\) returns \(\langle m,n,\mathcal {C}_1\mathbin {+\!+}\cdots \mathbin {+\!+}\mathcal {C}_k\rangle \).

  • \(\textsf{transpose}(\langle m,n,\mathcal {C}\rangle )\) returns \(\langle n,m,\mathcal {C}'\rangle \), where the elements of \(\mathcal {C}'\) are the elements of \(\mathcal {C}\) with their first two components swapped.

  • \(\textsf{identity}(n)\) returns the \(n\times n\) identity matrix, represented sparsely.

  • \(\textsf {sparse-to-dense}(\langle m,n,\mathcal {C}\rangle )\) returns the dense representation of its argument (which has to be normalized). It initializes an \(m\times n\) array of values \(\llbracket {\infty }\rrbracket \), and copies the elements of \(\mathcal {C}\) to their places.

  • \(\textsf {dense-to-sparse}(\llbracket {\textbf{V}}\rrbracket )\) returns the sparse representation of its argument. It returns \(\langle m,n,\mathcal {C}\rangle \), where m and n are dimensions of \(\textbf{V}\), and \(\mathcal {C}\) is a list of length mn, containing one element for each cell of \(\llbracket {\textbf{V}}\rrbracket \).
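The segmented minimum \(\textsf{getMin}\) and the normalization \(\textsf{norm}_2\) can be sketched in cleartext Python (our own simplification: plain values stand in for ABB handles, and the sorting is done locally):

```python
# Cleartext sketches of getMin and norm2; handles are replaced by values.
from itertools import groupby

def get_min(v, ind):
    """Segmented minimum: ind is a public indicator vector consisting of
    runs of equal values; returns the minimum of v over each run."""
    out, k = [], 0
    for _, grp in groupby(ind):
        l = len(list(grp))
        out.append(min(v[k:k + l]))
        k += l
    return out

def norm2(m, n, cells):
    """cells is a list of (i, j, v) triples; duplicates of the same (i, j)
    are collapsed by the minimum, matching the representation's semantics."""
    cells = sorted(cells)                       # norm1: sort by (i, j)
    ind = [(i, j) for i, j, _ in cells]         # segments = equal coordinates
    vals = get_min([v for _, _, v in cells], ind)
    coords = sorted(set(ind))
    return (m, n, [(i, j, v) for (i, j), v in zip(coords, vals)])

print(get_min([5, 2, 7, 1], [0, 0, 1, 1]))               # -> [2, 1]
print(norm2(2, 2, [(0, 1, 5), (0, 1, 3), (1, 0, 2)]))    # duplicate (0,1) kept as 3
```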

Major Functions. These include the addition and multiplication of matrices, and the computation of quasi-inverses of block-diagonal matrices. The first of them—pointwise minimum—is simple: if \(\langle \!\langle {\textbf{M}}\rangle \!\rangle \) and \(\langle \!\langle {\textbf{N}}\rangle \!\rangle \) have the same dimensions, then \(\langle \!\langle {\textbf{M}}\rangle \!\rangle \oplus \langle \!\langle {\textbf{N}}\rangle \!\rangle =\textsf{norm}_2(\textsf{overlap}(\langle \!\langle {\textbf{M}}\rangle \!\rangle ,\langle \!\langle {\textbf{N}}\rangle \!\rangle ))\).

The multiplication protocol for sparse matrices, given in Algorithm 1, is also unsurprising. An interesting detail is the transposition (and normalization) of the first matrix before the actual multiplication. In this way, the values of both \(x_1\) and \(x_2\) are non-decreasing during the loop. In our implementation, we optimize the inner loop by running only through the segment of \(\mathcal {D}\) where \(x_2=x_1\).

[Algorithm 1: multiplication protocol for sparsely represented matrices; shown as a figure in the original]

The only non-local operation in Algorithm 1 is the final \(\textsf{norm}_2\). The addition in line 8 is performed locally by the parties running the protocols implementing the ABB. Both the round complexity and the number of non-free operations of Algorithm 1 depend on the cells included in \(\mathcal {C}\) and \(\mathcal {D}\).
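Since Algorithm 1 itself appears only as a figure, here is a cleartext sketch (our own simplification, without the ABB, the transposition, or the sorting optimization) of min-plus multiplication on the sparse representation:

```python
# Cleartext sketch of min-plus multiplication of sparsely represented
# matrices: every pair of cells (i, k, u) and (k, j, v) contributes a
# candidate (i, j, u + v); duplicates are then collapsed as in norm2.
def sparse_mp_mul(M, N):
    m, n, C = M
    n2, p, D = N
    assert n == n2, "inner dimensions must match"
    cand = [(i, j, u + v)
            for (i, k, u) in C
            for (k2, j, v) in D if k == k2]
    best = {}
    for i, j, v in cand:          # norm2: keep the minimum per coordinate
        best[(i, j)] = min(v, best.get((i, j), float('inf')))
    return (m, p, sorted((i, j, v) for (i, j), v in best.items()))

x = (1, 3, [(0, 0, 0)])           # row vector with the 1-element at position 0
A = (3, 3, [(0, 1, 4), (0, 2, 10), (1, 2, 1)])
print(sparse_mp_mul(x, A))        # -> (1, 3, [(0, 1, 4), (0, 2, 10)])
```

In the real protocol the additions `u + v` are free local operations on shares, and only the final normalization communicates.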

Algorithm 2 for the quasi-inverse of \(\langle \!\langle {\textbf{M}}\rangle \!\rangle \) finds the quasi-inverse of each block of \(\textbf{M}\), and then combines the blocks. Besides \(\langle \!\langle {\textbf{M}}\rangle \!\rangle \), the input to Algorithm 2 is a list \(\boldsymbol{B}\) of the sizes of the blocks on the main diagonal of \(\textbf{M}\); the sum of the elements of \(\boldsymbol{B}\) has to be n. We use the Floyd-Warshall APSD algorithm [18] for computing the quasi-inverse of a single block. We have adapted our privacy-preserving implementation [2, Alg. 8] to compute the APSD for several (adjacency) matrices at the same time, such that the round complexity of Algorithm 2 is \(O(\max \boldsymbol{B})\), while the number of non-free operations is \(O(\sum _i (B[i])^3)\). Our experiments [2] show that despite the greater round complexity, Floyd-Warshall is faster than repeated squaring.

[Algorithm 2: quasi-inverse of a sparsely represented block-diagonal matrix; shown as a figure in the original]

Main Computation. The computation corresponding to the multiplication \(\boldsymbol{x}\otimes \textbf{A}^*\) according to (2) is given in Algorithm 3. It takes as inputs the sparse matrix representations of both \(\textbf{A}\) and \(\boldsymbol{x}\), where we think of the latter as a matrix with a single row. The multiplication operation also takes as input a list \(\boldsymbol{L}\) of lists of block-sizes; it is formed on the basis of the separator tree of the graph having the adjacency matrix \(\textbf{A}\) (described at the end of Sect. 2.2), its length is \(d+1\).

[Algorithm 3: privacy-preserving computation of \(\boldsymbol{x}\otimes \textbf{A}^*\); shown as a figure in the original]

Algorithm 3 closely follows (1)–(2). The current length of \(\boldsymbol{L}\) describes the current depth of the recursion; length 1 (checked in line 3) indicates the base case. Otherwise, \(\langle \!\langle {\textbf{A}}\rangle \!\rangle \) is the representation of one of the matrices \(\textbf{A}_h\). We start by decomposing \(\textbf{A}_h\) into \(\textbf{X}_h\) (whose quasi-inverse we find using the list of lengths in the first element of \(\boldsymbol{L}\)), \(\textbf{Y}_h\), and \(\textbf{Z}_h\); we then compute \(\textbf{Q}_h\) and \(\textbf{A}_{h+1}\), and multiply \(\langle \!\langle {\boldsymbol{x}}\rangle \!\rangle \) with the first matrix in (2). We then split the resulting vector \(\boldsymbol{z}\) into two parts of lengths s and \(n-s\), and multiply the left part with \(\textbf{X}_h^*\). We recursively call Algorithm 3 with the right part of \(\boldsymbol{z}\), with \(\textbf{A}_{h+1}\), and with the list of lists of lengths missing its first element. We complete the computation by concatenating the two vectors and multiplying the result with the third matrix in (2). The round complexity and the number of invoked ABB operations follow directly from Pan and Reif’s analysis [31].

In order to compute the distances from a vertex t of an undirected graph \(G=(V,E)\) with public locations but private lengths of edges, we have to perform more steps before and after invoking Algorithm 3, but all these steps are public. Starting from the sparsely represented adjacency matrix \(\langle \!\langle {\textbf{A}}\rangle \!\rangle \) of G, we have to find the separator tree of G, permute the vertices of G (giving us the matrix \(\langle \!\langle {\textbf{A}_0}\rangle \!\rangle \)), and create the list \(\boldsymbol{L}\). We have to create the vector \(\langle \!\langle {\boldsymbol{x}}\rangle \!\rangle \) as a unit vector, with the value \(\mathbb {1}\) only at the position corresponding to the location of vertex t after the permutation. After calling Algorithm 3 with \(\langle \!\langle {\textbf{A}_0}\rangle \!\rangle \), \(\boldsymbol{L}\) and \(\langle \!\langle {\boldsymbol{x}}\rangle \!\rangle \), we have to apply the inverse permutation to the resulting vector \(\langle \!\langle {\boldsymbol{y}}\rangle \!\rangle \).

4 Security and Privacy of Protocols

The privacy-preserving APC protocol is built on top of a universally composable ABB. It receives its private inputs through the handles to values stored in the ABB, and returns its private outputs in the same fashion. The protocol contains no \(\textsf{declassify}\)-operations. Hence, as discussed in Sect. 2.1, it inherits the same security properties against various adversaries as the underlying secure computation protocol set. In particular, if the ABB is implemented by the Sharemind MPC platform, then the resulting APC protocol is a three-party protocol, working with public locations but secret-shared lengths of edges, and provides information-theoretic security against an adversary passively corrupting at most one of the parties.

5 Empirical Evaluation

5.1 Privacy-Preserving Bellman-Ford with Public Edges

We want to compare the APC protocol with protocols based on classical SSSD algorithms, where the locations of edges are public, but their lengths are private. Dijkstra’s algorithm cannot benefit from the public locations of edges, because the order in which it relaxes the vertices depends on the lengths of the edges; the random permutation of vertices that could hide that order [1] would make the locations of edges private again. Hence we think that it is fair to compare the new protocol against a protocol based on the Bellman-Ford (BF) algorithm.

Such a privacy-preserving algorithm is given in Algorithm 4. At each iteration of the main loop, it defines \(\llbracket {\boldsymbol{a}}\rrbracket \) as the current distance from s of the start vertex of each edge. The vector \(\llbracket {\boldsymbol{b}}\rrbracket \) then records the candidate distance of the end vertex of each edge, obtained when the last step of the path is made over this edge. The same \(\textsf{getMin}\) operation as in Sect. 3 is used to find the minimum distance for each vertex. The number of non-free operations executed by Algorithm 4 is O(mn), while its round complexity is \(O(n\log D)\), where D is the maximum in-degree of a vertex.

[Algorithm 4: privacy-preserving Bellman-Ford with public locations and private lengths of edges; shown as a figure in the original]
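Since Algorithm 4 appears only as a figure, the following cleartext Python sketch (our own simplification: plain values instead of ABB handles, and a per-vertex loop in place of the parallel \(\textsf{getMin}\)) shows its structure; for an undirected graph, each edge would be listed in both directions:

```python
# Cleartext sketch of Bellman-Ford with public edge locations; only the
# lengths w[k] would be private in the real protocol.
INF = float('inf')

def bellman_ford_public_edges(n, src, dst, w, s):
    """n vertices; edge k goes from src[k] to dst[k] with length w[k]."""
    dist = [INF] * n
    dist[s] = 0
    for _ in range(n - 1):                     # sequential main loop
        a = [dist[u] for u in src]             # [a]: distance at start vertex
        b = [x + wk for x, wk in zip(a, w)]    # [b]: step over each edge
        new = dist[:]
        for k in range(len(dst)):              # stands in for getMin per vertex
            new[dst[k]] = min(new[dst[k]], b[k])
        dist = new
    return dist

# Directed graph: 0 -> 1 (4), 1 -> 2 (1), 0 -> 2 (10); distances from 0
print(bellman_ford_public_edges(3, [0, 1, 0], [1, 2, 2], [4, 1, 10], 0))
# -> [0, 4, 5]
```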

5.2 Setup of Benchmarking

We have implemented the APC and BF algorithms on the Sharemind MPC platform, using the SecreC language [33] offered by this platform. The benchmarking took place on three servers with 12-core 3 GHz CPUs with Hyper-Threading running Linux, and 48 GB of RAM, connected by a 1 Gbps Ethernet LAN. The local computations in Sharemind MPC are single-threaded, and there is no support for performing computations and network operations at the same time.

We want to measure the performance in different network environments, corresponding to LAN and WAN deployments. We throttle the connections between the servers in order to simulate these environments. In our experiments, we consider “HBLL”, “HBHL” and “LBHL” settings. Here HB (high-bandwidth) means 1 Gbps and LB (low-bandwidth) 100 Mbps link speed between servers. Also, LL (low-latency) means no added delay for the messages sent between the servers, while HL (high-latency) means additional 40 ms delay.

The performance of the APC algorithm is highly dependent on the locations of edges. As we are most interested in the performance of the algorithms on planar graphs, and as we want to focus on optimizing the privacy-preserving computations rather than the computation of the separator tree, we have selected grid graphs as the family of graphs on which we have performed benchmarking. The \(R\times C\) grid graph has RC vertices that can be thought of as being placed in an \(R\times C\) grid. Each vertex is connected with 4 of its closest neighbours (fewer for vertices at the edges of the grid); the number of (undirected) edges is \((2RC-R-C)\). Grid graphs have easy-to-compute separators of size \(\min (R,C)\) that split their set of vertices into two roughly equal parts; the height of the resulting separator tree is \(\approx \log R+\log C\). In the following we let G(N) denote the \(N\times N\) grid graph.
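The counts above can be checked with a few lines of arithmetic. The sketch below (our own illustrative helper, not part of the benchmarking code) computes the vertex count, edge counts, separator size, and approximate separator-tree height for an \(R\times C\) grid graph; for G(600) it reproduces the 360 k vertices and \(\approx \)1.4 M directed edges reported in Sect. 5.3.

```python
import math

def grid_graph_stats(R, C):
    """Counts for the R x C grid graph used in the benchmarks:
    vertices, undirected edges (2RC - R - C), directed edges,
    separator size min(R, C), and the approximate separator-tree
    height log R + log C."""
    vertices = R * C
    undirected_edges = 2 * R * C - R - C   # horizontal + vertical edges
    directed_edges = 2 * undirected_edges  # each undirected edge twice
    separator = min(R, C)
    tree_height = math.log2(R) + math.log2(C)
    return vertices, undirected_edges, directed_edges, separator, tree_height
```

The edge formula is just the sum of the \(R(C-1)\) horizontal and \((R-1)C\) vertical edges of the grid.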

5.3 Measuring the Performance of Algebraic Path Computation

We report the running times and the bandwidth consumption (per computing server) for grid graphs G(N) for different values of N, on Sharemind cluster for the HBLL network environment in Table 1. The times correspond to the execution of Algorithm 3; we have not measured the time it takes to construct the separator tree, the lists \(\boldsymbol{\mathcal {L}}\) and \(\boldsymbol{L}\), or to permute the matrices and vectors.

Table 1. Running time (in seconds) and bandwidth consumption of privacy-preserving algebraic path computation protocol for graphs G(N)

The largest grid graph that we ran our implementation on was G(600). This graph has 360 k vertices and \(\approx \)1.4 M (directed) edges. We are not aware of any previous executions of privacy-preserving SSSD on graphs of similar size, regardless of whether the locations of edges are private and regardless of the actual shape of the graph. The running time for this graph was a bit over 16 h, which may be practical for certain settings.

In Fig. 1, we compare the running time of the privacy-preserving algebraic path computation protocol on graphs of different sizes in different network environments. For small graphs, the performance depends only on the latency of the network; only for graphs with 1000 or more vertices (\(N=33\)) does the available bandwidth start to have an effect.

Fig. 1.
figure 1

Performance of algebraic path computation protocol on graphs with given numbers of vertices in different network environments

5.4 Comparison of APC and BF Protocols

Table 2 shows the running times of both privacy-preserving SSSD protocols that use public edge locations, Bellman-Ford and algebraic path computation, for the sparse representation of the graphs, together with the average bandwidth consumption in different network environments. For all graphs and all network environments, the running times of the algebraic path computation protocol are lower than those of the Bellman-Ford protocol; likewise, its bandwidth consumption is smaller.

In Table 2, the execution times of the Bellman-Ford protocol on larger graphs have been estimated: we ran only a few iterations of the main loop in Algorithm 4, measured the running time of a single iteration, and multiplied it by the total number of iterations.
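The extrapolation itself is straightforward; a minimal sketch (our own illustration, not the actual benchmarking harness) averages the measured per-iteration times and scales by the iteration count of the main loop.

```python
def estimate_total_time(sample_times, total_iterations):
    """Estimate the full protocol runtime from a few measured
    iterations of the main loop: average the samples, then multiply
    by the total number of iterations. `sample_times` are seconds
    per iteration."""
    per_iteration = sum(sample_times) / len(sample_times)
    return per_iteration * total_iterations
```

This assumes each iteration of the main loop has roughly the same cost, which holds for Algorithm 4 since every iteration touches all m edges.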

Table 2. Benchmarking results (bandwidth for a single computing server) for Bellman-Ford and Algebraic path protocol in different network environments, for grid graphs G(N)
Fig. 2.
figure 2

Performance (time in seconds) of Bellman-Ford Version 3 and Algebraic path computation protocols on graphs of different sizes in different network environments (red: HBLL, green: HBHL, blue: LBHL, light: Bellman-Ford, dark: Algebraic path computation) (Color figure online)

We also depict the running times in Fig. 2, comparing the algebraic path computation and Bellman-Ford protocols in different network environments. Despite the simple structure of Bellman-Ford, algebraic path computation remains faster even in high-latency environments.

6 Conclusion and Future Work

We have shown that designers of privacy-preserving applications working with data in graph form and needing to find the distances between vertices should look beyond the classical SSSD algorithms when selecting the protocol for shortest-path computation on top of an SMC framework. Even though many of the Parallel RAM algorithms proposed for SSSD have components that are not easily converted into parallel privacy-preserving protocols (e.g. the spawning and scheduling of tasks based on private data), there may be algorithms that process data sufficiently uniformly to serve as the basis of SMC protocols.

We have shown how APC may be used to compute SSSD in a privacy-preserving manner. The resulting protocols are more efficient than those based on classical SSSD algorithms. The same semiring framework may be instantiated in different ways and used for solving other graph problems, e.g. finding minimum spanning trees or solving the all-pairs shortest distance problem. These algorithms may be converted into SMC protocols exactly as we have done here, with the only possible slight difference arising from the scalar \(\otimes \)-operation no longer being free.
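To make the instantiation point concrete, here is a small plaintext sketch of a generic semiring matrix product. With \(\oplus =\min \) and \(\otimes =+\) (the tropical semiring) one product step corresponds to shortest-distance computation; swapping \(\otimes \) for \(\max \) gives the minimax semiring relevant to minimum spanning trees. The function and parameter names are our own; the privacy-preserving version would run over secret-shared entries.

```python
INF = float("inf")

def semiring_matmul(A, B, oplus=min, otimes=lambda x, y: x + y, zero=INF):
    """Generic matrix product over a semiring with addition `oplus`,
    multiplication `otimes`, and additive identity `zero`. The
    defaults give the (min, +) tropical semiring used for shortest
    distances."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[zero] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = zero
            for t in range(k):
                acc = oplus(acc, otimes(A[i][t], B[t][j]))
            C[i][j] = acc
    return C
```

In an SMC instantiation, \(\oplus =\min \) requires oblivious comparisons (non-free), while \(\otimes =+\) on shared values is free; replacing \(\otimes \) by \(\max \) is precisely the case where the scalar \(\otimes \)-operation would no longer be free.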

In this paper, we have presented a protocol for undirected graphs. The APC algorithm is equally well applicable to directed graphs [31, Remark 6.1], and this change can also be implemented on top of an ABB.

In this paper, we have required the locations of edges to be public. We believe that a protocol with private locations is possible. This would not significantly change the subroutines; still, matrix multiplication may become more expensive due to the need to run through both loops in Algorithm 1, and the quasi-inverse will become more expensive due to the need to consider a central stripe of diagonals instead of just the blocks on the main diagonal. There may be more changes to the main computation, as we would no longer know the sizes of matrices; hence padding may be necessary. Also, the main computation would receive the list of lists of block sizes as a private parameter, too.

Computing that list of lists of block sizes privately is likely an even more complex problem. We are not aware of efficient parallel RAM algorithms for computing the separator tree that could easily be converted to run on top of an SMC framework.