1 Introduction

Interactive proofs are a natural extension of non-determinism and have become a fundamental concept in complexity theory and cryptography. The study of interactive proofs has led to many of the exciting notions that are at the heart of several areas of theoretical computer science, including zero-knowledge proofs [40, 41] and probabilistically checkable proofs (PCPs) [4, 10, 11].

An interactive proof is a protocol between a randomized verifier and a powerful but untrusted prover. The goal of the prover is to convince the verifier regarding the validity of some statement. If the statement is indeed correct, we require that the verifier should accept an honestly generated proof with high probability. Otherwise, if the statement is false, the verifier should reject with high probability any maliciously crafted proof. A particularly interesting and practical case is when the verifier is significantly weaker than the prover in some aspect. Typically, verifiers that are weaker in terms of computational abilities are studied, but other kinds of limitations are also of interest.

The standard model of interactive proofs, as described above, has a key limitation: the prover is modeled as a single machine that holds all of the data. A scenario where the data is distributed among multiple parties is not natively supported. Indeed, large organizations nowadays store vast amounts of data, often reaching petabytes or even exabytes in size. To store and efficiently manage such enormous volumes of data, these organizations utilize massive data-centers. If such an organization takes on the role of the prover, the only way to use existing interactive-proof technology is to essentially aggregate the data at a single machine. However, this is physically impossible, as no single machine can store so much data.

Motivated by the above scenario, in this work, we study interactive proofs for distributed provers. We first define a concrete model that captures the constraints of such a distributed setting, and then design new interactive proofs in our model.

The Distributed Computation Model. We imagine an enormous data-set, the size of which is denoted \(N\). The data is stored in a cluster split among \(M\) machines; i.e., every machine stores roughly a size-\(N/M\) portion of the data-set. As an example, imagine that \(N=10^{17}\) bytes (100 petabytes) and that each machine has a hard disk capable of storing \(10^{13}\) bytes (10 terabytes). Then, a cluster consisting of \(10^4 = 10{,}000\) machines is needed. (Clearly, there is no single machine capable of storing 100 petabytes.)

The distributed prover is the above cluster, consisting of \(M\) server machines. The verifier is another machine, as powerful as a single machine in the cluster, i.e., it can store \(N/M\) bits of information. The goal of the distributed prover is to convince the verifier of the validity of some statement about its data-set. The distributed prover can perform arbitrary communication (server-to-server or server-to-verifier) and local computation, as long as it respects the space constraint of each machine. If we care about the computational complexity of the prover, we additionally require that the local computation of the (honest or malicious) provers runs in polynomial time. Each server machine and the verifier have their own private source of randomness.

The above rationale coincides with that behind the Massively Parallel Computation (MPC) model. This model was designed to capture popular modern parallel-computation programming paradigms, such as MapReduce, Hadoop, and Spark, which utilize parallel computation power to manipulate and analyze huge amounts of data. In this model, first introduced by Karloff, Suri, and Vassilvitskii [42], a data-set of size \(N\) is stored in a distributed manner among \(M\) machines. The machines are connected via pairwise communication channels, and each machine can only store \(S= N^\epsilon \) bits of information locally, for some \(\epsilon \in (0,1)\). Naturally, we assume that \(M \ge N^{1-\epsilon }\) so that the machines can jointly store the entire data-set.

The primary metric for the complexity of algorithms in this model is their round complexity. Reasonable polynomial-time computations that are performed within a machine are considered “for free” since the communication is often the bottleneck. We typically want algorithms in the MPC framework to have a small number of rounds, say, poly-logarithmically or even sub-logarithmically many rounds (in the total data size \(N\)). With the goal of designing efficient algorithms in the MPC model, there is an immensely rich algorithmic literature suggesting various non-trivial efficient algorithms for tasks of interest, including graph problems [1,2,3, 6,7,8,9, 12, 15, 16, 26, 28, 31], clustering [13, 14, 33, 39] and submodular function optimization [32, 34, 46].

Succinct Arguments in the MPC Model. In this work, we study the question of constructing interactive argument systems in the MPC model, where the “prover” is a cluster of machines, each with at most \(N^\epsilon \) storage for a witness of size N, and where the verifier is also a machine with the same storage restriction. Note that it is unrealistic to achieve an argument system for all polynomial-time computable functions in this model, because there are various results showing that not all such functions can be computed in the MPC model [30, 54]. Thus, we aim for the best-possible goal: to prove a statement whose verification algorithm is itself an MPC algorithm.

We design an argument system that supports clusters acting as provers and where the protocol respects the requirements of the MPC model. Specifically, we prove the following theorem.

Theorem 1 (Main result; Informal)

Let \(R = \{ (x, w) \}\) be any relation which has a massively-parallel verification algorithm \(\varPi \) among \(M = N^{1-\epsilon }\) parties, each with space \(N^\epsilon \), where \(N = |w|\) and \(|x| \le N^\epsilon \).

Then there exists an argument system \(\varPi '\) for R in the MPC model, which has M space-bounded provers \(P_1, \dots , P_M\), and convinces a space-bounded verifier V that \(x \in L_R\). The protocol \(\varPi '\) has space overhead multiplicative in \(\textsf{poly}(\lambda )\) relative to \(\varPi \), where \(\lambda \) is a security parameter, and has round overhead multiplicative in \(\textsf{polylog}(N)\).

Under standard falsifiable cryptographic assumptions, the argument \(\varPi '\) is sound in the CRS model against malicious provers with arbitrary \(\textsf{poly}(N, \lambda )\) running time and space.

Our protocol’s soundness relies on the existence of groups of hidden order, which can be instantiated based on the RSA assumption [53] or on class groups [27, 57].

To put the above result in better context, we mention a recent work of Fernando et al. [35] (building on [36]), who built a secure computation compiler for arbitrary MPC protocols. That is, they compile any MPC protocol into a secure counterpart which still respects the constraints of the model. In particular, their protocol can be used as an argument system in the cluster-verifier model we introduce above. Unfortunately, their compiler relies on (publicly verifiable) succinct non-interactive arguments of knowledge (SNARKs), which are well known not to be constructible from falsifiable assumptions [38, 49]. Our main contribution, and the main technical challenge we overcome, is achieving such an argument system relying only on falsifiable assumptions. As a bonus, we mention that if we instantiate the hidden-order group using class groups, our protocol requires only a common random string, whereas the SNARK-based solution requires a structured common reference string.

1.1 Techniques: Distributed IOPs and Distributed Streaming Polynomial Commitments

To achieve our main result, we use recent work on interactive oracle proofs (IOPs). Recall that the IOP model is a proof system model that combines features of interactive proofs (IPs) and probabilistically checkable proofs (PCPs). In this model, the verifier is not required to read the prover’s messages in their entirety; rather, the verifier has oracle access to some of the prover’s messages (viewed as strings), and may probabilistically query these messages during its interaction with the prover. IOPs strictly generalize PCPs, and serve as a convenient intermediate model for getting succinct “plain model” protocols. Many recent succinct arguments have been constructed by first giving a protocol in the IOP model, and then using a vector commitment or polynomial commitment to instantiate the IOP oracle [18, 21, 22, 27, 29, 37].

We extend the IOP model to a setting where the prover is distributed — from here on referred to as the distributed IOP. We imagine a prover that is made up of a collection of servers that can communicate among themselves via peer-to-peer channels, as in the cluster-verifier MPC model described above. However, communication between any server and the verifier occurs as in the IOP model: the verifier has oracle access to a large string committed to by the servers, in addition to being able to communicate directly with any of the parties comprising the prover.

We build a distributed IOP in the MPC model analogous to the “plain model” protocol we stated above. Specifically, given a distributed, massively-parallel protocol \(\varPi \) for verifying a relation R, we construct a distributed argument system \(\varPi '\) which works in this new IOP model, and where a distributed group of provers convince a verifier V that some \(x \in L_R\). Our argument uses a polynomial commitment oracle, where each prover first streams evaluations of some multilinear polynomial W over some subset of the Boolean hypercube, and where at the end the provers have collectively defined W by their evaluations. The verifier then interacts with the prover and queries this polynomial IOP oracle in order to verify the statement x.

Our IOP is inspired by the work of Blumberg et al. [23], who give an IOP for RAM programs, where the prover’s running time and space are approximately preserved in relation to the running time and space of the verification algorithm. At a very high level, the [23] IOP has the prover commit to a polynomial \(\hat{W}\), which encodes the RAM computation, and then has the prover and verifier run a sumcheck argument in relation to a polynomial h that is based on \(\hat{W}\). The polynomial h has the property that it can be evaluated at any point via a constant number of evaluations of \(\hat{W}\). At the end of the sumcheck, the verifier can thus query the IOP oracle in order to do the final random evaluation of h.
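The sumcheck interaction at the heart of this approach can be made concrete with a small self-contained sketch. The snippet below is not the [23] protocol (there the summed polynomial h is derived from \(\hat{W}\)); it only illustrates the round structure, for the simplest claim \(T = \sum_{\vec b} g(\vec b)\) with g multilinear and given by its table of hypercube evaluations. All names are ours.

```python
import random

P = 2**61 - 1  # a Mersenne prime standing in for the field F_p

def fold(Y, r):
    # Fix the first (most significant) variable of the table Y to the value r:
    # g(r, rest) = (1 - r) * g(0, rest) + r * g(1, rest)
    half = len(Y) // 2
    return [(Y[i] * (1 - r) + Y[half + i] * r) % P for i in range(half)]

def sumcheck(Y):
    # Honest prover and verifier for the claim  claim = sum_b Y_b  (mod P).
    claim = sum(Y) % P
    while len(Y) > 1:
        half = len(Y) // 2
        # Prover's round message: the linear polynomial s(X), sent as (s(0), s(1)).
        s0, s1 = sum(Y[:half]) % P, sum(Y[half:]) % P
        # Verifier's consistency check: s(0) + s(1) must equal the running claim.
        if (s0 + s1) % P != claim:
            return False
        r = random.randrange(P)  # verifier's random challenge
        claim = (s0 * (1 - r) + s1 * r) % P
        Y = fold(Y, r)  # both sides restrict g to the partial assignment r
    # Final check: a single query to the (here, fully known) multilinear oracle.
    return Y[0] == claim
```

In the IOP setting the final check is exactly the verifier's oracle query to the committed polynomial; here the table Y plays that role.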

We would like to use a similar strategy to [23]: have the provers encode a polynomial \(\hat{W}\) which encodes the MPC computation, and then use a sumcheck argument to verify the correctness of \(\hat{W}\). However, since \(\hat{W}\) now encodes an execution of the interactive protocol \(\varPi _L\) between RAM programs, instead of just a single RAM computation, it is unclear how the provers would be able to generate sumcheck messages without rerunning the MPC protocol many times, thus blowing up the communication complexity.

To solve this, we use several ideas. First, for each round of the MPC protocol, the provers commit to a concatenation \(\pi _r\) of their states after the round is finished, using a Merkle tree-based succinct commitment. This defines a statement \((r, \pi _{r-1}, \pi _r)\), where a witness for this statement is a set of decommitments for \(\pi _{r-1}\) and \(\pi _r\) which show honest behavior during this round. If we can build a knowledge-sound argument for this statement which works in the MPC model and is round-efficient, this is sufficient to build an argument for honest execution of the whole protocol \(\varPi _L\). We then design an IOP similar to [23] for proving the statement \((r, \pi _{r-1}, \pi _r)\). Note that even though we have reduced to proving honesty of one round, we still have the problem that knowledge of \(\hat{W}\) is spread across all the provers, and no single prover knows the whole description of \(\hat{W}\). Thus it is still unclear how the provers will generate the sumcheck provers’ messages in a round-efficient way. The main technical part of our paper deals with how to do this.
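The per-round Merkle commitment and its decommitments can be sketched as follows, assuming the number of party states is a power of two; the helper names (`merkle_root`, `merkle_path`, `merkle_verify`) are ours, not the paper's, and a real instantiation would build the tree distributively.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    # leaves: the M parties' states (byte strings), M a power of 2.
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, i):
    # Decommitment for leaf i: the sibling hashes from leaf to root.
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        path.append(level[i ^ 1])
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return path

def merkle_verify(root, leaf, i, path):
    # Recompute the root from leaf i and its decommitment path.
    cur = h(leaf)
    for sib in path:
        cur = h(cur + sib) if i % 2 == 0 else h(sib + cur)
        i //= 2
    return cur == root
```

The root plays the role of \(\pi_r\); a witness for the statement \((r, \pi_{r-1}, \pi_r)\) consists of such paths opening the relevant states.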

Polynomial Commitments. Once we have an IOP for L, we still need to instantiate it using a polynomial commitment scheme. Informally, a polynomial commitment scheme allows a prover to commit to some low-degree polynomial f, and provide evaluations f(x) to a verifier along with an (interactive) proof that the provided evaluation is consistent with the commitment. Polynomial commitments were introduced by [43] and have recently drawn significant attention due to their use in compiling oracle proof systems (e.g., PCPs and IOPs) into real-world proof systems (e.g., arguments). A sequence of works [5, 17, 19, 24, 25, 27, 44, 47, 52, 55, 56, 59] has studied several different aspects of efficiency, including constant-sized proofs/commitments, sublinear (even polylogarithmic) verification time, as well as linear prover time. However, these works consider a monolithic prover that stores the entire polynomial locally. This is in stark contrast with our setting, where there are multiple provers \(P_1, \dots , P_M\), each of which only has streaming access to a small piece of the description of the polynomial. Looking ahead, the polynomial in our context is the description of the transcript of the RAM computation, which can be generated as a stream.

The works that come closest to our requirements are those of Block et al. [21, 22], who introduced the streaming model of access, where a monolithic prover has streaming access to the description of the polynomial. They build a logarithmic-round polynomial commitment scheme in the streaming model where the prover’s memory usage is logarithmic, the prover time is quasilinear, and the prover requires only a logarithmic number of passes over the stream. Using such a polynomial commitment scheme, they build a succinct argument for RAM computation where the prover is both time- and space-optimal. The key structural property of their construction that allows for this small-space implementation in the streaming model is the following: they show that for each of the logarithmic rounds, the prover’s messages in the interactive proof of consistency can be expressed as a linear combination of the elements in the description stream. Therefore, it is sufficient for the monolithic prover to take a single pass over its stream to compute its message in every round. Although their work still considers a monolithic prover, this structural property is the starting point of our work. In particular, we observe that the natural adaptation of the Block et al. [22] commitment scheme to our setting suffices for our purposes. In fact, when the cluster of provers \(P_1, \ldots , P_M\) is viewed as a monolithic prover, the two schemes are identical. This allows us to base our security on that of Block et al. [22], which, in turn, is based on groups of hidden order (e.g., RSA and class groups). Due to the above structural property, in each of the rounds, each of the provers in the cluster can (a) first compute its contribution to this round’s message in small space, while making a single pass over its stream, and (b) then all provers can combine their contributions in logarithmic (in \(M\)) rounds via a tree-based protocol to compute the full round message.
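Step (b) can be sketched as follows, with field addition standing in for the actual combining operation on group elements; `tree_combine` is our illustrative name, and the loop body models one communication round in which the second half of the active parties send their values to the first half.

```python
P = 2**61 - 1  # a prime standing in for the field F_p

def tree_combine(contributions):
    # Simulates the tree protocol: each iteration is one communication round,
    # halving the number of active parties. After log2(M) rounds, party 1
    # holds the combined round message.
    vals = list(contributions)
    rounds = 0
    while len(vals) > 1:
        half = len(vals) // 2
        vals = [(vals[i] + vals[half + i]) % P for i in range(half)]
        rounds += 1
    return vals[0], rounds
```

With \(M = 8\) contributions, the combined value is ready after exactly \(\log_2 8 = 3\) rounds.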

We present our construction in the MPC model in Sect. 6.2. Along the way, we introduce the definition of polynomial commitments in the MPC model tailored to the case of multilinear polynomials in Sect. 4.

1.2 Related Work

The terminology of distributed interactive proofs has appeared in several prior works, all of which differ significantly from our notion. The works [20, 45, 50] all study a variant of interactive proofs where the verifier is distributed but the prover is a single machine. The works [51, 58] allow multiple (potentially mutually-distrusting) provers to efficiently derive a single SNARK for a large statement/witness pair. While their goal is, on the surface, similar to ours, both works inherently require non-falsifiable assumptions since they rely on SNARKs. In contrast, the main contribution of our work is in building a succinct argument system that does not require non-falsifiable assumptions.

1.3 Organization

The rest of the paper is organized as follows. Section 2 contains preliminaries. In Sect. 3, we define the MPC model and security properties required for argument systems in this model. In Sect. 4, we define polynomial commitments that work with distributed committers. Section 5 contains the main construction of succinct arguments in the MPC model. Section 6 contains our adaptation of the [22] polynomial commitment.

2 Preliminaries

Let S be some finite, non-empty set. By \(x \leftarrow S\) we denote the process of sampling a random element x from S. For any \(k \in \mathbb {N}\), by \(S^k\) we denote the set of all sequences/vectors of length k containing elements of S, where \(S^0 = \{\epsilon \}\) for the empty string \(\epsilon \). We let \(\mathbb {F}= \mathbb {F}_p\) denote a finite field of prime cardinality p. For bitstrings \(\vec b \in \{0,1\}^n\), we write \(\vec b = (b_n, \ldots , b_1)\), where \(b_n\) is the most significant bit and \(b_1\) is the least significant bit, and we naturally associate \(\vec b\) with an integer in the set \(\{0, \ldots , 2^n-1\}\), i.e., \(\vec b = \sum _{i=1}^n b_i \cdot 2^{i-1}\). For any two equal-sized vectors \(\vec u, \vec v\), by \(\vec u \odot \vec v\) we denote the coordinate-wise multiplication of \(\vec u\) and \(\vec v\). We use uppercase letters to denote matrices, e.g., \(A \in \mathbb {Z}^{m \times n}\). For an \(m \times n\) matrix A, \(A(i, *)\) and \(A(*, j)\) denote the i-th row and j-th column of A, respectively.
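As a quick sanity check of the bit-to-integer convention, a short helper (ours, for illustration only) mapping an MSB-first bit tuple \((b_n, \ldots, b_1)\) to \(\sum_i b_i \cdot 2^{i-1}\):

```python
def bits_to_int(b):
    # b is given MSB-first as (b_n, ..., b_1); the associated integer is
    # sum_{i=1}^{n} b_i * 2^{i-1}, i.e., the last tuple entry has weight 1.
    return sum(bit << i for i, bit in enumerate(reversed(b)))
```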

Notation for Matrix-Vector “Exponents”. For some group \(\mathbb {G}\), \(A \in \mathbb {Z}^{m \times n}\), \(\vec u = (u_1, \ldots , u_m) \in \mathbb {G}^{1 \times m}\), and \(\vec v = (v_1, \ldots , v_n)^\top \in \mathbb {G}^{n \times 1}\), we let \(\vec u \star A\) and \(A \star \vec v\) denote matrix-vector exponents, defined for every \(j\in [n]\), \(i' \in [m]\) as

$$\begin{aligned} (\vec u \star A)_j = \prod _{i=1}^m u_i^{A(i,j)} \ ; \ (A \star \vec v)_{i'} = \prod _{j'=1}^n v_{j'}^{A(i',j')} \ . \end{aligned}$$

For any vector \(\vec x \in \mathbb {Z}^n\) and group element \(g \in \mathbb {G}\), we define \(g^{\vec x} = (g^{x_1}, \ldots , g^{x_n})\). Finally, for \(k \in \mathbb {Z}\) and a vector \(\vec u \in \mathbb {G}^n\), we let \(\vec u^k\) denote the vector \((u_1^k, \ldots , u_n^k)\).
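To make the \(\star\) notation concrete, here is an illustrative implementation over the toy group \(\mathbb{Z}_Q^*\) for a small prime Q. The paper instantiates \(\mathbb{G}\) as a hidden-order group; \(\mathbb{Z}_Q^*\) (whose order is public) is used here for demonstration only, and all function names are ours.

```python
Q = 101  # a small prime; the demonstration group is Z_Q^* of order Q - 1

def prod_pow(bases, exps):
    # prod_k bases_k ^ exps_k in the group Z_Q^*
    out = 1
    for b, e in zip(bases, exps):
        out = out * pow(b, e, Q) % Q
    return out

def vec_star_mat(u, A):
    # (u * A)_j = prod_{i=1}^{m} u_i^{A(i, j)}
    m, n = len(A), len(A[0])
    return [prod_pow(u, [A[i][j] for i in range(m)]) for j in range(n)]

def mat_star_vec(A, v):
    # (A * v)_{i'} = prod_{j'=1}^{n} v_{j'}^{A(i', j')}
    return [prod_pow(v, row) for row in A]
```

For example, with a single base element, \(\vec u \star A\) simply exponentiates it by each matrix entry.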

2.1 Multilinear Polynomials

An n-variate polynomial \(f : \mathbb {F}^n \rightarrow \mathbb {F}\) is multilinear if the individual degree of each variable in f is at most 1.

Fact 1

A multilinear polynomial \(f : \mathbb {F}^n \rightarrow \mathbb {F}\) (over a finite field \(\mathbb {F}\)) is uniquely defined by its evaluations over the Boolean hypercube. Moreover, for every \(\vec \zeta \in \mathbb {F}^n\),

$$\begin{aligned} f(\vec \zeta ) = \sum _{\vec b \in \{0,1\}^{n}} f(\vec b) \cdot \prod _{i = 1}^n \chi (b_i, \zeta _i)\ , \end{aligned}$$

where \(\chi (b,\zeta ) = b \cdot \zeta + (1 - b) \cdot (1 - \zeta )\).

As a shorthand, we will often denote \(\prod _{i = 1}^n \chi (b_i, \zeta _i)\) by \(\overline{\chi }(\vec b, \vec \zeta )\) for \(n = |\vec b| = |\vec \zeta |\).

Notation for Multilinear Polynomials. Throughout, we denote a multilinear polynomial f by the \(2^n\)-sized sequence \(\mathcal {Y}\) containing its evaluations over the Boolean hypercube. That is, \(\mathcal {Y}= (f(\vec b) : \vec b \in \{0,1\}^n)\), and we denote the evaluation of the multilinear polynomial defined by \(\mathcal {Y}\) at the point \(\vec \zeta \) by \(\textsf{ML}(\mathcal {Y}, \vec \zeta ) = \sum _{\vec b \in \{0,1\}^n} \mathcal {Y}_{\vec b} \cdot \overline{\chi }(\vec b, \vec \zeta )\).
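Fact 1 and the \(\textsf{ML}(\mathcal{Y}, \vec\zeta)\) notation can be checked with a short sketch (function names are ours; the table Y is indexed by the integer value of \(\vec b\), most significant bit first):

```python
from itertools import product

P = 2**61 - 1  # a prime modulus standing in for the field F_p

def chi(b, z):
    # chi(b, zeta) = b*zeta + (1-b)*(1-zeta), the degree-1 basis on {0,1}
    return (b * z + (1 - b) * (1 - z)) % P

def ml_eval(Y, zeta):
    # ML(Y, zeta) = sum over b in {0,1}^n of Y_b * prod_i chi(b_i, zeta_i)
    n = len(zeta)
    assert len(Y) == 2**n
    total = 0
    for idx, b in enumerate(product([0, 1], repeat=n)):
        term = Y[idx]
        for b_i, z_i in zip(b, zeta):
            term = term * chi(b_i, z_i) % P
        total = (total + term) % P
    return total
```

On hypercube points the formula reproduces the table entries (uniqueness in Fact 1), while off the hypercube it gives the multilinear extension.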

3 Model Definition

In the massively-parallel computation (MPC) model, there are \(M\) parties (also called machines) and each party has a local space of \(S\) bits. The input is assumed to be distributed across the parties. Let \(N\) denote the total input size in bits; it is standard to assume \(M\ge N^{1 - \varepsilon }\) and \(S= N^\varepsilon \) for some small constant \(\varepsilon \in (0, 1)\). Note that the total space is \(M\cdot S\) which is large enough to store the input (since \(M\cdot S\ge N\)), but at the same time it is not desirable to waste space and so it is commonly further assumed that \(M\cdot S\in \tilde{O}(N)\) or \(M\cdot S= N^{1 + \theta }\) for some small constant \(\theta \in (0, 1)\). Further, assume that \(S= \varOmega (\log M)\).

At the beginning of a protocol, each party receives an input, and the protocol proceeds in rounds. During each round, each party performs some local computation given its current state (modeled as a RAM program with maximum space \(S\)), and afterwards may send messages to some other parties through private authenticated pairwise channels. An MPC protocol must respect the space restriction throughout its execution, even during the communication phase—to send a message at the end of a round, a party must write that message in some designated place in memory, and in order to receive a message at the end of a round, a party must reserve some space in memory equal to the size of the message. This in turn implies that each party can send or receive at most \(S\) bits in each round. An MPC algorithm may be randomized, in which case every machine has a sequential-access random tape and can read random coins from the random tape. The size of this random tape is not charged to the machine’s space consumption.

3.1 Succinct Arguments in the MPC Model

We are interested in building a succinct argument in this model for some NP language L, where the witness \(w = (w_1, \dots , w_M)\) for \(x \in L\) has size much larger than \(S\). The prover role is carried out by a group of \(S\)-space-bounded parties \(P_1, \dots , P_M\), each of which has the statement x and one piece \(w_i\) of the witness. They work together to convince a verifier V, which is also \(S\)-space-bounded. Since any prover must at least be powerful enough to verify that \((x, w) \in R_L\), and the MPC model is not known to capture P when the number of rounds is bounded, we only consider languages L where the verification algorithm \(R_L :( (x, w_1), \dots , (x, w_M)) \rightarrow \{0,1\}\) is implementable by an MPC protocol \(\varPi _L\) where each party is \(S\)-space-bounded. Given such a protocol, our goal is to build a new MPC protocol \(\varPi _L'\) between \(M+1\) parties \(P_1, \dots , P_M, V\), where \(P_i\) has input \((x, w_i)\) and V has input x, which satisfies the properties discussed below.

Communication Model and Setup. We assume a synchronous setting, with pairwise channels between parties. We also allow for a CRS \(\textsf{Setup}(1^\lambda ) \rightarrow (\alpha _1,\ldots ,\alpha _M)\), where party i receives \(\alpha _i\) at the beginning of the protocol. Since each party must store some \(\alpha _i\), it is clear that \(|\alpha _i |\le S\) for all i. Looking ahead, in our protocol, all parties get the same CRS string \(\alpha \) which is a description of a group of size \(2^\lambda \), that is, \(\alpha _i = \alpha \) for all \(i \in [M]\).

Efficiency Requirements. We want to build a protocol \(\varPi '_L\) which has efficiency properties as close as possible to the original verification protocol \(\varPi _L\). Specifically, if in \(\varPi _L\) each party uses space bounded by \(S\), in \(\varPi '_L\) each party’s space should be bounded by \(S\cdot p(\lambda )\), for some fixed polynomial p. Moreover, if \(\varPi _L\) takes r rounds, \(\varPi '_L\) should take a small multiplicative factor \(r\cdot \beta \) rounds. In this paper, we set \(\beta = \textsf{polylog}(N)\).

Security Requirements. Let \(\alpha \) be the output of the setup algorithm, and denote with

$$\varPi '_L\left\langle \left[ P_1, \dots , P_M\right] , V \right\rangle \left( 1^\lambda , \alpha , x, w = (w_1, \dots , w_M) \right) $$

the output of the protocol \(\varPi _L'\) with interactive RAM programs \(P_1, \dots , P_M\) playing the roles of the \(M\) provers, and with the interactive RAM program V playing the role of the verifier, where each \(P_i\) is initialized with input \((1^\lambda , \alpha , x, w_i)\), and V is initialized with input \((1^\lambda , \alpha , x)\). Similarly, denote with

$$\varPi '_L\left\langle \mathcal {A}, V \right\rangle \left( 1^\lambda , \alpha , x, w = (w_1, \dots , w_M) \right) $$

the output of the protocol \(\varPi '_L\) with an interactive monolithic RAM program \(\mathcal {A}\) playing the role of all provers \(P_1, \dots , P_M\), and with the interactive RAM program V playing the role of the verifier, where \(\mathcal {A}\) is initialized with the inputs of all \(P_i\) as defined above, and V is initialized in the same way as above.

We require that \(\varPi '_L\) satisfy completeness and soundness, defined as follows.

Definition 1 (Completeness)

Let L be a language with a corresponding MPC protocol \(\varPi _L\) which implements the verification functionality for \(R_L\). For all \((x, w) \in R_L\) and for all \(\lambda \), letting \(M = M(|x|)\),

$$\Pr \Bigl [ \varPi '_L\left\langle \left[ P_1, \dots , P_M\right] , V \right\rangle \left( 1^\lambda , \textsf{Setup}(1^\lambda ), x, w\right) = 1 \Bigr ] = 1,$$

where \(P_1, \dots , P_M\) (resp., V) are the honest provers (resp., verifier), and the probability is taken over random coins of the parties and of the setup algorithm.

Definition 2 (Soundness)

Let L be a language with a corresponding MPC protocol \(\varPi _L\) which implements the verification functionality \(R_L\). Fix a PPT adversary \(\mathcal {A}= (\mathcal {A}_1, \mathcal {A}_2)\), where \(\mathcal {A}_1\) takes as input the security parameter and the output of \(\textsf{Setup}\), and chooses an input x, and where \(\mathcal {A}_2\) plays the roles of the provers \(P_1, \dots , P_M\). Then \(\varPi '_L\) is said to satisfy soundness if there exists a negligible function \(\textsf{negl}\) such that for all \(\lambda \),

$$ \Pr \left[ \begin{array}{c} x \notin L\ \wedge \\ \varPi '_L\left\langle \mathcal {A}_2, V \right\rangle \left( 1^\lambda , \alpha , x, \bot \right) = 1 \end{array} :\begin{array}{l} (\alpha _1, \ldots , \alpha _M) \leftarrow \textsf{Setup}(1^\lambda ) \\ x \leftarrow \mathcal {A}_1(1^\lambda , \alpha _1, \dots , \alpha _M) \end{array} \right] < \textsf{negl}(\lambda ).$$

To prove soundness of our protocol, we show the stronger property of witness-extended emulation as formalized by Lindell [48]. Intuitively, witness-extended emulation requires the existence of an efficient extractor that can simulate an adversarial prover’s view while extracting the underlying witness. Below we formally extend the standard definition to the MPC setting in the natural way.

Definition 3 (Witness-Extended Emulation)

Let L be a language with a corresponding MPC protocol \(\varPi _L\) which implements the verification functionality \(R_L\). Fix a PPT adversary \(\mathcal {A}= (\mathcal {A}_1, \mathcal {A}_2)\), where \(\mathcal {A}_1\) takes as input the security parameter and the output of \(\textsf{Setup}\) and chooses an input x along with a private state \(\sigma \), and where \(\mathcal {A}_2\) takes this \(\sigma \) as input and plays the roles of the provers \(P_1, \dots , P_M\). Then \(\varPi '_L\) is said to satisfy witness-extended emulation with respect to L (and \(R_L\)) if there exists an (expected) PPT machine \(\mathcal {E}\) (called the “extractor”) and a negligible function \(\textsf{negl}\) such that the following holds. Define two distributions \(\mathcal {D}_1^\lambda \) and \(\mathcal {D}_2^\lambda \) based on \(\mathcal {A}\) and \(\mathcal {E}\), as follows:

  • \(\mathcal {D}_1^\lambda \): Compute the setup \(\alpha \leftarrow \textsf{Setup}(1^\lambda )\) and then compute \((x,\sigma ) \leftarrow \mathcal {A}_1(1^\lambda , \alpha )\), then output \((\alpha , r_\mathcal {A}, r_V, x, \tau )\), where \(\tau \) is the transcript of messages obtained by the execution \(\varPi '_L\left\langle \mathcal {A}_2(\sigma ), V \right\rangle \left( 1^\lambda , \alpha , x, \bot \right) \), \(r_\mathcal {A}\) is the random tape of \(\mathcal {A}_1\) and \(\mathcal {A}_2\), and \(r_V\) is the random tape of V.

  • \(\mathcal {D}_2^\lambda \): Compute the setup \(\alpha \leftarrow \textsf{Setup}(1^\lambda )\) and then compute \((x, \sigma ) \leftarrow \mathcal {A}_1(1^\lambda , \alpha )\), then output \((\alpha , r_\mathcal {A}, r_V, x, \tau , w) \leftarrow \mathcal {E}^{\mathcal {O}}(1^\lambda , \alpha , x)\), where \(\mathcal {O}\) is an oracle which provides an execution of \(\varPi '_L\left\langle \mathcal {A}_2(\sigma ), V \right\rangle \left( 1^\lambda , \alpha , x, \bot \right) \), and allows for rewinding of the protocol and choosing the randomness of \(\mathcal {A}_2\) during each round.

With respect to these distributions, for all \(\lambda \), the following holds:

  1. The distributions \(\mathcal {D}_1^\lambda \) and \({\mathcal {D}_2^\lambda }\big |_{\alpha , r, x, \tau }\) are identical, where \(r = (r_\mathcal {A}, r_V)\) and \({\mathcal {D}_2^\lambda }\big |_{\alpha , r, x, \tau }\) is the restriction of \(\mathcal {D}_2^\lambda \) to the first four components of the tuple \((\alpha , r, x, \tau , w)\).

  2. It holds that \(\Pr \left[ V\text { accepts and }(x,w) \notin R_L :(\alpha , r, x, \tau , w) \leftarrow \mathcal {D}_2^\lambda \right] < \textsf{negl}(\lambda ).\)

4 Defining Multilinear Polynomial Commitments in the MPC Model

In this section, we discuss how to define a polynomial commitment scheme that works in the MPC model, starting with a discussion of how the polynomial is distributed across the M \(S\)-space-bounded parties. Let M be a power of 2, and let \(\mathcal {Y}\in \mathbb {F}^N\) define an n-variate multilinear polynomial, where \(N=2^n\). We assume that \(\mathcal {Y}\) is distributed across all parties in the following way: Let \(\{I_1, \ldots , I_M \}\) be the canonical partition of \(\{0,1\}^n\), that is, \(I_i = \{(i-1) \cdot N/M, \ldots , i\cdot N/M - 1 \}\) (identifying bitstrings with integers as in Sect. 2). We associate each party \(P_i\) with the subset \(I_i\), and assume that \(P_i\) holds only the partial vector \(\mathcal {Y}_i\) containing the elements of \(\mathcal {Y}\) restricted to the indices in \(I_i\). That is,

$$\begin{aligned} \mathcal {Y}_i = (\mathcal {Y}_{\vec b})_{\vec b \in I_i} \ . \end{aligned}$$

Furthermore, for the canonical partition, if the i-th party holds the partial vector \(\mathcal {Y}_i\), then the parties collectively define the multilinear polynomial described by \(\mathcal {Y}= \mathcal {Y}_1 \mid \mid \mathcal {Y}_2 \mid \mid \ldots \mid \mid \mathcal {Y}_M\), where \(\mid \mid \) denotes the concatenation of vectors.
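The canonical partition, and the fact that \(\textsf{ML}(\mathcal{Y}, \vec\zeta)\) splits into a sum of M partial evaluations over the sets \(I_i\) (the functionality that the partial- and combine-evaluation algorithms below compute), can be sketched as follows; all helper names are ours.

```python
P = 2**61 - 1  # a prime standing in for the field F_p

def chi_bar(bits, zeta):
    # chi-bar(b, zeta) = prod_i chi(b_i, zeta_i)
    out = 1
    for b, z in zip(bits, zeta):
        out = out * (b * z + (1 - b) * (1 - z)) % P
    return out

def int_to_bits(x, n):
    # MSB-first bit tuple of the index x
    return tuple((x >> (n - 1 - i)) & 1 for i in range(n))

def partial_eval(Y_i, I_i, zeta):
    # Party i's contribution: the terms of ML(Y, zeta) with indices in I_i.
    n = len(zeta)
    return sum(y * chi_bar(int_to_bits(b, n), zeta) for y, b in zip(Y_i, I_i)) % P

def distributed_eval(Y, zeta, M):
    # Split Y by the canonical partition and sum the M partial evaluations.
    chunk = len(Y) // M
    return sum(
        partial_eval(Y[i * chunk:(i + 1) * chunk],
                     range(i * chunk, (i + 1) * chunk), zeta)
        for i in range(M)
    ) % P
```

Since the hypercube sum is simply partitioned, the combined result is independent of M.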

Definition 4 (Multilinear Polynomial Commitment Syntax)

A multilinear polynomial commitment has the following syntax.

  • \(\textsf{PC}.\textsf{Setup}(1^\lambda , p, 1^n, M) \rightarrow pp\): On input the security parameter \(1^\lambda \) (in unary), a field size p less than \(2^\lambda \), the number of variables \(1^n\) (also in unary), and the number of parties M, the setup algorithm \(\textsf{PC}.\textsf{Setup}\) is a randomized PPT algorithm that outputs a CRS pp whose size is at most \(\textsf{poly}(\lambda , n, \log (M))\).

  • \(\textsf{PC}.\textsf{PartialCom}(pp, \mathcal {Y}_i) \rightarrow (\textsf{com}_i; \mathcal {Z}_i)\): On input a CRS pp, and a vector \(\mathcal {Y}_i \in \mathbb {F}^{N/M}\) which is the description of a multilinear polynomial restricted to the set \(I_i \subset \{0,1\}^n\), \(\textsf{PC}.\textsf{PartialCom}\) outputs a “partial commitment” \(\textsf{com}_i\) as well as the corresponding decommitment \(\mathcal {Z}_i \in \mathbb {Z}\).

  • \(\textsf{PC}.\textsf{CombineCom}(pp, \{ \textsf{com}_i \}_{i \in [M]}) \rightarrow \textsf{com}\): This is an interactive PPT protocol in the MPC model computing the following functionality: each party \(P_i\) holds the string \((pp, \textsf{com}_i)\); the parties jointly compute the full commitment \(\textsf{com}\), which \(P_1\) learns and outputs.

  • \(\textsf{PC}.\textsf{PartialEval}(pp, \mathcal {Y}_i, \vec \zeta ) \rightarrow y_i\): On input a CRS pp, a partial description vector \(\mathcal {Y}_i\), and an evaluation point \(\vec \zeta \in \mathbb {F}^n\), \(\textsf{PC}.\textsf{PartialEval}\) is a PPT algorithm that outputs the partial evaluation \(y_i\).

  • \(\textsf{PC}.\textsf{CombineEval}(pp, \{ y_i \}_{i \in [M]}, \vec \zeta ) \rightarrow y\): This is an interactive PPT protocol in the MPC model computing the following functionality: each party \(P_i\) holds the string \((pp, y_i)\); the parties jointly compute the full evaluation y, which \(P_1\) learns and outputs.

  • \(\textsf{PC}.\textsf{IsValid}(pp, \textsf{com}, \mathcal {Y}, \mathcal {Z}) \rightarrow 0\text { or }1\): On input the CRS pp, a commitment \(\textsf{com}\), a multilinear polynomial \(\mathcal {Y}\) and a decommitment \(\mathcal {Z}\), \(\textsf{PC}.\textsf{IsValid}\) is a PPT algorithm that returns a decision bit.

  • \(\textsf{PC}.\textsf{Open}\): This is a public-coin succinct interactive argument system \(\langle [P_1, \dots , P_M], V \rangle \) in the MPC model for the statement \((pp, \textsf{com}, \vec \zeta , y)\) and witness \((\mathcal {Y}= \{ \mathcal {Y}_i \}_{i \in [M]}, \mathcal {Z}= \{ \mathcal {Z}_i \}_{i \in [M]})\), with respect to the relation

    $$R = \left\{ \left( (pp, \textsf{com}, \vec \zeta , y), (\mathcal {Y}, \mathcal {Z})\right) :\begin{array}{c} \textsf{IsValid}(pp, \textsf{com}, \mathcal {Y}, \mathcal {Z}) = 1\text {, and} \\ \textsf{ML}(\mathcal {Y},\vec \zeta ) = y \end{array} \right\} ,$$

    where each prover \(P_i\) has input \((pp, \textsf{com}, \vec \zeta , y, \mathcal {Y}_i, \mathcal {Z}_i)\) and V has input \((pp, \textsf{com}, \vec \zeta , y)\).

In the following sections, we assume that \(\textsf{PC}.\textsf{PartialCom}\) works even if we are given streaming access to \(\mathcal {Y}_i\).

We now specify the security properties which are required of \(\textsf{PC}\).

Definition 5 (Multilinear Polynomial Commitment Security)

We require the following three properties from a polynomial commitment scheme:

  • Correctness: For every prime p, number of variables n, number of parties M, and all \(\mathcal {Y}\) and \(\vec \zeta \), letting \(y = \textsf{ML}(\mathcal {Y}, \vec \zeta )\),

    $$\Pr \left[ 1 = \textsf{PC}.\textsf{Open}(pp, \textsf{com}, \vec \zeta , y; \mathcal {Y}, \mathcal {Z}) :\begin{array}{c} pp \leftarrow \textsf{PC}.\textsf{Setup}(1^\lambda , p, 1^n, M) \\ \{(\textsf{com}_i, \mathcal {Z}_i) \leftarrow \textsf{PC}.\textsf{PartialCom}(pp,\mathcal {Y}_i)\}_{i\in [M]} \\ \textsf{com}\leftarrow \textsf{PC}.\textsf{CombineCom}(pp, \{ \textsf{com}_i \}_{i \in [M]}) \end{array} \right] = 1.$$
  • Computational Binding: For every prime p, number of variables n, number of parties M, and nonuniform polynomial-time machine \(\mathcal {A}\), there exists a negligible function \(\textsf{negl}: \mathbb {N}\rightarrow [0,1]\) such that for every \(\lambda \in \mathbb {N}\) and every \(z \in \{0,1\}^*\), the following holds:

    $$\Pr \left[ \begin{array}{c} b_0 = 1 \\ b_1 = 1 \\ \mathcal {Y}_0 \ne \mathcal {Y}_1 \end{array} :\begin{array}{c} pp \leftarrow \textsf{PC}.\textsf{Setup}(1^\lambda , p, 1^n, M) \\ (\textsf{com}, \mathcal {Y}_0, \mathcal {Y}_1, \mathcal {Z}_0, \mathcal {Z}_1) \leftarrow \mathcal {A}(1^\lambda , pp, z) \\ b_0 \leftarrow \textsf{PC}.\textsf{IsValid}(pp, \textsf{com}, \mathcal {Y}_0, \mathcal {Z}_0) \\ b_1 \leftarrow \textsf{PC}.\textsf{IsValid}(pp, \textsf{com}, \mathcal {Y}_1, \mathcal {Z}_1) \\ \end{array} \right] < \textsf{negl}(\lambda ).$$
  • Properties of \(\textsf{PC}.\textsf{Open}\): The argument \(\textsf{PC}.\textsf{Open}\) satisfies the efficiency, completeness and witness-extended emulation properties defined in Sect. 3.

Looking ahead, in Sect. 6, we will prove the following theorem, showing the existence of a scheme \(\textsf{PC}\) which satisfies the properties above.

Theorem 2

Assume \(\mathcal {G}\) is a group sampler for which the Hidden Order Assumption holds. Let n be the number of variables and let \(M \le 2^n\) be the number of parties. Then, the scheme defined in Sect. 6.2 is a polynomial commitment scheme (as in Sect. 4) for n-variate multilinear polynomials over a finite field of prime order p in the MPC model with M parties, with the following efficiency guarantees:

  1. \(\textsf{PC}.\textsf{PartialCom}\) outputs a partial commitment of size \(\textsf{poly}(\lambda )\) bits, runs in time \(2^n \cdot \textsf{poly}(\lambda , n, \log (p))\), and uses a single pass over the stream.

  2. \(\textsf{PC}.\textsf{PartialEval}\) outputs a partial evaluation of size \(\lceil \log (p) \rceil \), runs in time \((2^n/M) \cdot \textsf{poly}(n, \log (p))\), and uses a single pass over the stream.

  3. \(\textsf{PC}.\textsf{CombineCom}\) and \(\textsf{PC}.\textsf{CombineEval}\) take \(O(\log (M))\) rounds, and each party requires \(\textsf{poly}(\lambda )\) bits of space.

  4. \(\textsf{PC}.\textsf{Open}\) takes \(O(n \cdot \log (M))\) rounds with \(\textsf{poly}(n, \lambda , \log (p), \log (M))\) communication.

  5. The verifier in \(\textsf{PC}.\textsf{Open}\) runs in time \(\textsf{poly}(\lambda , n, \log (p))\).

  6. Each party \(P_i\) in \(\textsf{PC}.\textsf{Open}\) runs in time \(2^n \cdot \textsf{poly}(n, \lambda , \log (p))\), requires space \(n \cdot \textsf{poly}(\lambda , \log (p), \log (M))\), and uses O(n) passes over its stream.

5 Constructing Succinct Arguments in the MPC Model

Our construction uses the subprotocols \(\textsf{Distribute}\), \(\textsf{Combine}\), and \(\textsf{CalcMerkleTree}\) introduced in [35]. These protocols take \(O(\log _\nu {M})\) rounds, and each machine communicates \(O(S\cdot \nu )\) bits per round, for a small integral branching factor \(\nu \ge 2\).
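As a rough sketch of the round structure of such tree-based subprotocols (hypothetical Python that abstracts away the actual communication), a \(\nu \)-ary aggregation that sums one value per machine proceeds in \(\lceil \log _\nu M \rceil \) rounds:

```python
# Sketch: nu-ary tree aggregation. Each round, every group of nu adjacent
# machines forwards its values to the first machine of the group, which
# folds them into a single value.

def combine_rounds(values, nu):
    """Return (total, number_of_rounds) of the nu-ary tree aggregation."""
    rounds = 0
    while len(values) > 1:
        values = [sum(values[j:j + nu]) for j in range(0, len(values), nu)]
        rounds += 1
    return values[0], rounds
```

Each machine receives at most \(\nu \) values per round, matching the \(O(\log _\nu M)\)-round, \(O(S \cdot \nu )\)-communication profile stated above.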

5.1 Tools from Prior Work

We import two major tools from previous work. The first is the following lemma, which says that any RAM program can be transformed into a circuit C whose wire assignments can be streamed in time and space proportional to the time and space of the RAM program, respectively. In addition, the circuit logic can be represented succinctly by low-degree polynomials with properties amenable to sumcheck arguments.

Lemma 1

(From Blumberg et al. [23]). Let M be an arbitrary (non-deterministic) RAM program that on inputs of length n runs in time T(n) and space \(S(n)\). M can be transformed into an equivalent (non-deterministic) arithmetic circuit C over a field \(\mathbb {F}\) of size \(\textsf{polylog}(T(n))\). Moreover, there exist cubic extensions \(\widehat{\textsf{add}}\) and \(\widehat{\textsf{mult}}\) of the wiring predicates \(\textsf{add}\) and \(\textsf{mult}\) of C that satisfy:

  1. C has size \(O(T(n) \cdot \textsf{polylog}(T(n)))\).

  2. The cubic extensions \(\widehat{\textsf{add}}\) and \(\widehat{\textsf{mult}}\) of C can be evaluated in time \(O(\textsf{polylog}(T(n)))\).

  3. An (input, witness) pair (x, w) that makes M accept can be mapped to a correct transcript W for C in time \(O(T(n) \cdot \textsf{polylog}(T(n)))\) and space \(O(S(n) \cdot \textsf{polylog}(T(n)))\). Furthermore, w is a substring of the transcript W, and any correct transcript \(W'\) for C possesses a witness \(w'\) for (M, x) as a substring.

  4. C can be evaluated “gate-by-gate” in time \(O(T(n) \cdot \textsf{polylog}(T(n)))\) and space \(O(S(n) \cdot \textsf{polylog}(T(n)))\).

  5. The prover’s sumcheck messages can be computed in space \(O(S(n) \cdot \textsf{polylog}(T(n)))\).

5.2 Notation

We make the following notational assumptions about the MPC algorithm \(\varPi _L\) which verifies membership in L.

Let R be the number of rounds that \(\varPi _L\) takes. In each round \(r \in [R]\) of an execution of \(\varPi _L\), the behavior of party \(i \in [M]\) is described as a succinct RAM program \(\textsf{NextSt}(i,r,\cdot )\). Thus the program \(\textsf{NextSt}\) is a succinct representation of the entire protocol \(\varPi _L\). We assume \(\textsf{NextSt}\) has size much less than S. For convenience, we write \(\textsf{NextSt}_{i,r}(\cdot ) = \textsf{NextSt}(i, r, \cdot )\). We assume that \(\textsf{NextSt}_{i,r}\) takes a string \(\textsf{st}_{i, r-1}||\textsf{msg}^{\textsf{in}}_{i, r-1}\) as input and outputs a string \(\textsf{st}_{i,r}||\textsf{msg}^{\textsf{out}}_{i, r}\), where \(\textsf{st}_{i,r}\) is the internal, private state of party i in round r, \(\textsf{msg}^{\textsf{in}}_{i, r-1}\) is the list of messages which party i received in round \(r-1\), and \(\textsf{msg}^{\textsf{out}}_{i, r}\) is the list of outgoing messages of party i in round r. Note that the space of each party is limited to \(S\) bits, so in particular \(|\textsf{st}_{i, r}||\textsf{msg}^{\textsf{in}}_{i,r}||\textsf{msg}^{\textsf{out}}_{i,r}| \le S\) for each \(i\in [M]\) and \(r\in [R]\). We assume that the initial private state \(\textsf{st}_{i,0}\) of each party i is equal to its private input \((x, w_i)\) (or x if \(i = M+1\)). In addition, we assume that \(\textsf{msg}^{\textsf{out}}_{i,r} = \{ (j, \ell _j, m_j) \}_j\), where each triple \((j, \ell _j, m_j)\) means that party i should send message \(m_j\) to party j, and that party j should store this message at position \(\ell _j\) in \(\textsf{msg}^{\textsf{in}}_{j, r}\). Finally, we assume that if r is the final round, then \(P_1\) writes 1 to the first position of \(\textsf{st}_{1,r}\) iff \(x \in L\).
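To make the message-routing convention concrete, the following toy sketch (Python; the `next_st` argument is a hypothetical stand-in for the succinct RAM program above) simulates one round and delivers each outgoing triple \((j, \ell _j, m_j)\) into party j's incoming-message list at position \(\ell _j\):

```python
# Toy simulation of one round of an MPC protocol in the above notation.

def run_round(next_st, states, msgs_in, r, M):
    """Apply NextSt_{i,r} for every party i; each outgoing triple (j, l, m)
    is stored at position l of party j's incoming list for the next round."""
    new_states = {}
    new_msgs_in = {j: {} for j in range(1, M + 1)}
    for i in range(1, M + 1):
        st, out = next_st(i, r, states[i], msgs_in[i])
        new_states[i] = st
        for (j, l, m) in out:
            new_msgs_in[j][l] = m  # party j stores message m at position l
    return new_states, new_msgs_in
```

For example, with a toy `next_st` in which every party sends a message to \(P_1\) at position i, all M messages land in \(P_1\)'s incoming list after one round.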

5.3 The Construction

The main construction of a succinct argument in the MPC model works as follows. First, we construct a succinct argument for the following scenario. Fix a round r and corresponding starting states \(\textsf{st}_{i,r-1} || \textsf{msg}^{\textsf{in}}_{i,r-1}\) for each party \(i \in [M]\), and let \(\pi _{r-1}\) be a Merkle commitment to the concatenation of all these starting states. Let \(\textsf{st}_{i,r} || \textsf{msg}^{\textsf{out}}_{i,r} || \textsf{msg}^{\textsf{in}}_{i,r}\) be the state of party i after an honest execution of round r, and let \(\pi _r\) be a Merkle commitment to the concatenation of all these end states. Assuming V has x, \(\pi _{r-1}\), and \(\pi _r\), the goal is to convince V that \(\pi _r\) is a commitment to states which have been obtained by an honest round-r interaction, starting with the states committed to by \(\pi _{r-1}\). If we construct an argument for this language, and this argument satisfies witness-extended emulation, this is sufficient for achieving an argument system which verifies an honest execution of the full protocol \(\varPi _L\) with respect to a witness for L.

In the following, we construct such a “round verification protocol,” which is our main technical contribution. In Sect. 5.4, we show how to use this round verification protocol to build an argument system for L.

To start, we define a new machine, which we call \(\textsf{NextSt}'\). As before, we write \(\textsf{NextSt}'_{i,r}(\cdot ) = \textsf{NextSt}'(i, r, \cdot )\). Let

$$\textsf{NextSt}'_{i,r}(\pi _{r-1}, \pi _r, \textsf{st}_{i,r-1}, \textsf{msg}^{\textsf{in}}_{i,r-1}, \rho _{i, r-1}, \rho _{i, r}, \{ \rho _{i\rightarrow j, r} \}_j ) = 1$$

if all of the following hold:

  • \(\rho _{i,r-1}\) is an opening of \(\pi _{r-1}\) to \((\textsf{st}_{i,r-1}, \textsf{msg}^{\textsf{in}}_{i,r-1})\) at position i,

  • \(\rho _{i,r}\) is an opening of \(\pi _r\) to \(\textsf{st}_{i,r} || \textsf{msg}^{\textsf{out}}_{i,r} || \textsf{msg}^{\textsf{in}}_{i,r}\) at position i, where

    $$\textsf{NextSt}_{i,r}(\textsf{st}_{i,r-1}, \textsf{msg}^{\textsf{in}}_{i,r-1}) = \textsf{st}_{i,r} || \textsf{msg}^{\textsf{out}}_{i,r}||\textsf{msg}^{\textsf{in}}_{i,r},$$
  • Writing \(\textsf{msg}^{\textsf{out}}_{i,r}\) as \(\{ (j, \ell _j, m_j) \}_j\), for each j, \(\rho _{i\rightarrow j, r}\) is an opening to \(m_j\) at position \(\ell _j\) in \(\textsf{msg}^{\textsf{in}}_{j,r}\).

Otherwise, let

$$\textsf{NextSt}'_{i,r}(\pi _{r-1}, \pi _r, \textsf{st}_{i,r-1}, \textsf{msg}^{\textsf{in}}_{i,r-1}, \rho _{i, r-1}, \rho _{i, r}, \{ \rho _{i\rightarrow j, r} \}_j ) = 0.$$

Note that since \(\textsf{NextSt}_{i,r}\) is succinct, \(\textsf{NextSt}'_{i,r}\) is also succinct. Let \(C_{i,r}\) be the circuit corresponding to \(\textsf{NextSt}'_{i,r}\) via Lemma 1. Also from Lemma 1, party i can stream the gate assignments \(W_{i,r}\) of \(C_{i,r}\) in space proportional to the space taken by an execution of \(\textsf{NextSt}'_{i,r}\).

We take an approach inspired by that of [22, 23]: we construct a sumcheck polynomial that encodes the computation and use a polynomial commitment to allow for a succinct verifier. Let \(s = \lceil \log T' \rceil \), where \(T'\) is the number of wires in \(C_{i,r}\) (which is constant across i and r). We can index every wire in \(C_{i,r}\) with some string \(\vec x \in \{0,1\}^s\). Define the polynomial \(\hat{W}_{i,r}(X_1, \dots , X_s)\) to be the multilinear extension of \(W_{i,r}\), i.e., for all \(\vec x \in \{0,1\}^s\), \(\hat{W}_{i,r}(\vec x)\) is the value that \(W_{i,r}\) assigns to wire \(\vec x\). Now, letting \(m = \lceil \log M\rceil \), we can index each party by a string \(\vec z \in \{0,1\}^m\). Define \(\hat{W}_r(X_1, \dots , X_s, Z_1, \dots , Z_m)\) to be the multilinear polynomial such that \(\hat{W}_r(\vec x, \vec z ) = \hat{W}_{i,r}(\vec x)\), where i is the index which corresponds to \(\vec z\). Let \(\widehat{\textsf{add}}(X_1, \dots , X_{3s})\) be the succinct, low-degree polynomial from Lemma 1 such that \(\widehat{\textsf{add}}(\vec x_1, \vec x_2, \vec x_3) = 1\) if in \(C_{i,r}\) the unique gate which has input wires \(\vec x_1\) and \(\vec x_2\) and output wire \(\vec x_3\) is an addition gate. Note that \(\widehat{\textsf{add}}\) does not depend on i (or r, for that matter) since, except for some hardcoded input wires, \(C_{i,r} = C_{i',r'}\) for all \(i, i', r, r'\). Similarly, define \(\widehat{\textsf{mult}}(X_1, \dots , X_{3s})\). Finally, define \(\widehat{\textsf{inout}}(X_1, \dots , X_{3s})\) so that \(\widehat{\textsf{inout}}(\vec x_1, \vec x_2, \vec x_3) = 1\) if either \(\vec x_3\) is an input wire which is known by V, or \(\vec x_3\) is an output wire which is known by V and \(\vec x_1\) and \(\vec x_2\) are the input wires for the gate whose output wire is \(\vec x_3\). 
Define \(\hat{I}(X_1, \dots , X_s)\) to be the multilinear polynomial such that \(\hat{I}(\vec x)\) is the corresponding bit of \(\pi _{r-1}\) (or \(\pi _r\)) if \(\vec x\) is an input wire which takes the value of a bit of \(\pi _{r-1}\) (or \(\pi _r\), respectively); is the corresponding bit of the statement to the argument system if \(r = 0\) and \(\vec x\) is an input wire which takes the value of the statement; and is 1 if \(\vec x\) is an output wire which V knows should be 1.

Given the above, we can define the polynomial g as follows:

$$\begin{aligned} g(\vec X_1, \vec X_2, \vec X_3, \vec Z ) &= \widehat{\textsf{add}}(\vec X_1, \vec X_2, \vec X_3) (\hat{W}_r( \vec X_3, \vec Z) - (\hat{W}_r( \vec X_1, \vec Z) + \hat{W}_r(\vec X_2, \vec Z ))) \\ {} &+ \widehat{\textsf{mult}}(\vec X_1, \vec X_2, \vec X_3) (\hat{W}_r( \vec X_3,\vec Z ) - (\hat{W}_r( \vec X_1, \vec Z ) \cdot \hat{W}_r( \vec X_2, \vec Z))) \\ {} &+ \widehat{\textsf{inout}}(\vec X_1, \vec X_2, \vec X_3) (\hat{W}_r( \vec X_3, \vec Z) - \hat{I}(\vec X_3)). \end{aligned}$$

With this definition, g vanishes on all boolean inputs if and only if \(\hat{W}_r\) encodes transcripts of the correct computations of each party i with respect to starting states committed to in \(\pi _{r-1}\) and ending states committed to in \(\pi _r\), and if all messages sent by i have been stored in the respective \(\textsf{msg}^{\textsf{in}}_{j,r}\). For \(q \in \mathbb {Z}_p\), let \(h_q(\vec X) = g(\vec X) \cdot \prod _{\beta \in [m + 3s]} (1 - (1 - q^{2^{\beta -1}})X_\beta )\). Then, \(h_q(\vec x) = g(\vec x) \cdot q^{\textsf{bin}^{-1}(\vec x)}\) for all \(\vec x \in \{0,1\}^{m+3s}\), where \(\textsf{bin}^{-1}(\vec x)\) is the integer whose binary representation is \(\vec x\). We have now defined the polynomials required for the protocol below. If \(P_1, \dots , P_M\) can collectively construct the prover’s sumcheck messages for the polynomial \(h_q\) for a randomly chosen q, then this is sufficient to build an argument that convinces V that g vanishes on the boolean hypercube. We now describe the protocol, assuming the provers have an efficient subprotocol \(\textsf{CalcSumcheckProverMsg}\) (defined below) for constructing their responses. This protocol is heavily inspired by the IOP in [22]. However, that protocol was significantly simpler, since in that setting there is only one prover, who can stream the whole polynomial \(\hat{W}_r\). In contrast, we have the task of showing that it is possible to construct the prover’s sumcheck responses in a round-efficient way, even though \(\hat{W}_r\) is spread across many different machines.

[Figure: the \(\textsf{VerifyRound}\) protocol]
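The weighting trick used in the protocol, multiplying g by \(\prod _{\beta }(1 - (1 - q^{2^{\beta -1}})X_\beta )\), can be sanity-checked numerically: on boolean inputs the product collapses to \(q^{\textsf{bin}^{-1}(\vec x)}\). A small self-checking sketch (Python over a toy prime; names are illustrative, and we take \(x_1\) as the low-order bit):

```python
from itertools import product

P = 101  # toy prime; the protocol works over Z_p for a large prime p

def weight_factor(x, q):
    """prod_beta (1 - (1 - q^(2^(beta-1))) * x_beta) mod P, for boolean x."""
    acc = 1
    for beta, xb in enumerate(x, start=1):
        acc = acc * (1 - (1 - pow(q, 2 ** (beta - 1), P)) * xb) % P
    return acc

def bin_inv(x):
    """Integer whose binary representation is x (x_1 taken as the low bit)."""
    return sum(xb << (beta - 1) for beta, xb in enumerate(x, start=1))

# Each factor is 1 when x_beta = 0 and q^(2^(beta-1)) when x_beta = 1,
# so the product is q^(bin_inv(x)) on every boolean input:
assert all(weight_factor(x, 7) == pow(7, bin_inv(x), P)
           for x in product([0, 1], repeat=4))
```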

The \(\textsf{CalcSumcheckProverMsg}\) subprotocol 

We now show how the parties \(P_1, \dots , P_M\) can generate the sumcheck prover’s polynomials \(f_\gamma \) in a round- and space-efficient manner. For each round \(\gamma \), the honest \(f_\gamma (X)\) is defined to be the following univariate polynomial:

$$f_\gamma (X) = \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} h_q(\vec \zeta , X, \vec x),$$

for the random vector \(\vec \zeta \) chosen by the verifier in previous rounds. (In round one, \(\vec \zeta \) is the empty vector of length 0.) Recall that \(h_q(\vec X) = g(\vec X) \cdot \prod _{i \in [m + 3s]} (1 - (1 - q^{2^{i-1}})X_i)\), for g as defined below (setting \(\vec X = ( \vec X_1, \vec X_2, \vec X_3, \vec Z)\)):

$$\begin{aligned} g(\vec X_1, \vec X_2, \vec X_3, \vec Z ) &= \widehat{\textsf{add}}(\vec X_1, \vec X_2, \vec X_3) (\hat{W}_r( \vec X_3, \vec Z) - (\hat{W}_r( \vec X_1, \vec Z) + \hat{W}_r(\vec X_2, \vec Z ))) \\ {} &+ \widehat{\textsf{mult}}(\vec X_1, \vec X_2, \vec X_3) (\hat{W}_r( \vec X_3,\vec Z ) - (\hat{W}_r( \vec X_1, \vec Z ) \cdot \hat{W}_r( \vec X_2, \vec Z))) \\ {} &+ \widehat{\textsf{inout}}(\vec X_1, \vec X_2, \vec X_3) (\hat{W}_r( \vec X_3, \vec Z) - \hat{I}(\vec X_3)). \end{aligned}$$

Observe that \(h_q(\vec X)\) can be written as

$$\begin{aligned} h_q(\vec X_1, \vec X_2, \vec X_3, \vec Z ) &= \sum _{j=1}^5 p_j( \vec X_1, \vec X_2, \vec X_3, \vec Z ), \end{aligned}$$

where \(p_5( \vec X_1, \vec X_2, \vec X_3, \vec Z) = \widehat{\textsf{inout}}(\vec X_1, \vec X_2, \vec X_3) \cdot \hat{I}(\vec X_3)\) and can be computed locally by each party,

$$p_4( \vec X_1, \vec X_2, \vec X_3, \vec Z) = p'_4( \vec X_1, \vec X_2, \vec X_3, \vec Z) \hat{W}_r( \vec X_1 ,\vec Z) \hat{W}_r( \vec X_2,\vec Z ),$$

and for all \(j \in \{1, 2, 3\}\)

$$p_j( \vec X_1, \vec X_2, \vec X_3 , \vec Z) = p'_j( \vec X_1, \vec X_2, \vec X_3, \vec Z) \hat{W}_r( \vec X_j, \vec Z) \ .$$

Here each \(p'_j\) is a succinct low-degree polynomial known by V. Thus, to compute the polynomial \(f_\gamma (X)\) in few rounds and small space, it suffices to compute

$$\begin{aligned} \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p_j(\vec \zeta , X, \vec x) \end{aligned}$$
(1)

in small rounds and space for each \(j \in [4]\) (and \(p_5\) locally) and sum the results.

We now show how to do this, focusing first on the case of \(j \in \{1,2,3\}\). Note that in every round except the first, computing the sum in Eq. (1) involves computing \(O(2^{|\vec x |})\) interpolations of \(\hat{W}_r\). Since the evaluations of \(\hat{W}_r\) are distributed among the \(M\) parties \(P_1, \dots , P_M\), doing these interpolations requires communication among these parties. If we interpolated \(p_j(\vec \zeta , X, \vec x)\) for each \(\vec x\) and then summed the results, then even if the communication per interpolation takes a constant number of rounds, computing Eq. (1) would take a number of rounds linear in the total computation time. So we need something slightly more clever than this naive strategy.

Before we go on, we note that for Eq. (1), it suffices to compute

$$\begin{aligned} \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p_j(\vec \zeta , \zeta ', \vec x), \end{aligned}$$

for each \(\zeta ' \in \{0, \dots , \delta \}\), where \(\delta \) is the degree of \(p_j\). Once we have these \(\delta + 1\) field elements, we can interpolate Eq. (1) in constant space. So we focus on computing this quantity; i.e., on computing the following for an arbitrary \(\vec \zeta \in \mathbb {F}^{\gamma }\):

$$\begin{aligned} \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p_j(\vec \zeta , \vec x). \end{aligned}$$
(2)
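The interpolation step mentioned above is standard: the \(\delta + 1\) values \(f(0), \ldots , f(\delta )\) determine the degree-\(\delta \) polynomial f, which can then be evaluated anywhere by Lagrange interpolation in constant space. A minimal sketch (Python over a toy prime field; requires Python 3.8+ for the modular inverse via `pow`):

```python
P = 97  # toy prime modulus

def lagrange_eval(evals, x):
    """Given evals[k] = f(k) for k = 0, ..., delta, evaluate the unique
    polynomial f of degree <= delta at x, with all arithmetic mod P."""
    delta = len(evals) - 1
    total = 0
    for k in range(delta + 1):
        num, den = 1, 1
        for j in range(delta + 1):
            if j != k:
                num = num * (x - j) % P
                den = den * (k - j) % P
        total = (total + evals[k] * num * pow(den, -1, P)) % P
    return total
```

This assumes \(\delta < P\) so that the denominators are invertible; the paper's field is far larger than any degree arising here.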

Note that each term in the sum above is of the form \(p'_j(\vec \zeta , \vec x)\hat{W}_r(\vec \zeta ', \vec x')\), where \(\vec \zeta '\) is obtained from \(\vec \zeta \) by deleting some (possibly zero) indices, and \(\vec x'\) is obtained from \(\vec x\) in the same manner. The key insight which allows us to compute Eq. (2) in low rounds is as follows. Imagine that \(\vec \zeta ' = (\zeta _1)\) is a single element. Then, by the multilinearity of \(\hat{W}_r\), it follows that \(\hat{W}_r(\zeta _1, \vec x') = \zeta _1 \cdot \hat{W}_r(1, \vec x') + (1 - \zeta _1) \cdot \hat{W}_r(0, \vec x')\). In the same way, if \(\vec \zeta ' = (\zeta _1, \zeta _2)\), then

$$\begin{aligned} \hat{W}_r(\zeta _1, \zeta _2, \vec x') &= \zeta _1 \cdot \hat{W}_r(1, \zeta _2, \vec x') + (1 - \zeta _1) \cdot \hat{W}_r(0, \zeta _2, \vec x') \\ &= \zeta _1 \cdot \left( \zeta _2 \cdot \hat{W}_r(1, 1, \vec x') + (1 - \zeta _2) \cdot \hat{W}_r(1, 0, \vec x') \right) \\ &+ (1 - \zeta _1) \cdot \left( \zeta _2 \cdot \hat{W}_r(0, 1, \vec x') + (1 - \zeta _2) \cdot \hat{W}_r(0, 0, \vec x') \right) . \end{aligned}$$

By a simple use of induction, we can write \(\hat{W}_r(\vec \zeta ', \vec x')\), for arbitrary \(\vec \zeta '\), as

$$\begin{aligned} \hat{W}_r(\vec \zeta ', \vec x') = \sum _{\vec y \in \{0,1\}^{|\vec \zeta ' |}} c_{\vec \zeta ', \vec y} \cdot \hat{W}_r(\vec y, \vec x') \end{aligned}$$
(3)

where

$$\begin{aligned} c_{\vec \zeta ', \vec y} &= \prod _{j = 1}^{|\vec \zeta ' |} \left( \zeta _j \cdot y_j + (1 - \zeta _j)(1 - y_j) \right) = \prod _{j = 1}^{|\vec \zeta ' |} \left\{ \zeta _j\text { if }y_j = 1\text {, otherwise }(1-\zeta _j) \right\} . \end{aligned}$$
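Eq. (3) and the coefficients \(c_{\vec \zeta ', \vec y}\) can be sanity-checked on a toy example. The sketch below (Python over a toy prime; illustrative names) evaluates a multilinear extension both by folding one variable at a time, as in the two-variable expansion above, and by the weighted sum over the hypercube, and checks that the two agree:

```python
from itertools import product

P = 101  # toy prime modulus

def coeff(zeta, y):
    """c_{zeta, y} = prod_j (zeta_j if y_j = 1 else 1 - zeta_j) mod P."""
    acc = 1
    for zj, yj in zip(zeta, y):
        acc = acc * (zj if yj == 1 else 1 - zj) % P
    return acc

def mle_weighted(W, zeta):
    """Evaluate the multilinear extension of table W at zeta via Eq. (3)."""
    n = len(zeta)
    return sum(coeff(zeta, y) * W[y] for y in product([0, 1], repeat=n)) % P

def mle_fold(W, zeta):
    """Evaluate by repeatedly using W(z, ...) = z*W(1, ...) + (1-z)*W(0, ...)."""
    table = dict(W)
    for z in zeta:
        k = len(next(iter(table)))  # arity of the current table
        table = {y: ((1 - z) * table[(0,) + y] + z * table[(1,) + y]) % P
                 for y in product([0, 1], repeat=k - 1)}
    return table[()]
```

Since the multilinear extension of a table that is itself (the restriction of) a multilinear function agrees with that function, an affine table gives an easy cross-check.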

It follows that we can rewrite Eq. (2) as

$$\begin{aligned} \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p_j(\vec \zeta , \vec x) &= \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p'_j(\vec \zeta , \vec x) \left( \sum _{\vec y \in \{0,1\}^{|\vec \zeta ' |}} c_{\vec \zeta ', \vec y} \cdot \hat{W}_r(\vec y, \vec x') \right) \end{aligned}$$
(4)
$$\begin{aligned} &= \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} \sum _{\vec y \in \{0,1\}^{|\vec \zeta ' |}} c'_{\vec x, \vec \zeta ', \vec y} \cdot \hat{W}_r(\vec y, \vec x'), \end{aligned}$$
(5)

where \(c'_{\vec x, \vec \zeta ', \vec y}\) is computable in space proportional to the space required to compute \(c_{\vec \zeta ', \vec y}\) and \(p'_j(\vec \zeta , \vec x)\).

Since Eq. (2) can be written as a weighted sum of evaluations of \(\hat{W}_r\) on points in the boolean hypercube, and since all such evaluations are partitioned across the provers, each prover can compute a component of the sum by streaming the computation in space \(O(S)\), and then the provers can all sum their components together using a large-arity tree in constant rounds.
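Concretely, each prover can fold its component of this weighted sum into a single accumulator while streaming its slice of the hypercube evaluations, so its local space is independent of the slice length. A hedged sketch (Python; the `weight` function is a hypothetical stand-in for the locally computable coefficients \(c'_{\vec x, \vec \zeta ', \vec y}\)):

```python
P = 101  # toy prime modulus

def local_weighted_sum(stream, weight):
    """One prover's component of Eq. (2): accumulate weight(index) * value
    over streamed (index, value) pairs, using O(1) field elements of memory."""
    acc = 0
    for idx, val in stream:
        acc = (acc + weight(idx) * val) % P
    return acc

# The M local components are then added via a large-arity tree in O(1) rounds.
```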

The case where \(j = 4\) is more involved. Recall that the goal is to compute

$$\begin{aligned} \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p_4(\vec \zeta , \vec x), \end{aligned}$$
(6)

for some given \(\vec \zeta \in \mathbb {F}^\gamma \). We first handle the case where \(\gamma \le 3s\). In this case,

$$\begin{aligned} \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p_4(\vec \zeta , \vec x) = \sum _{\vec z \in \{0,1\}^m} \sum _{\vec x' \in \{0,1\}^{3\,s - \gamma }} p'_4(\vec \zeta , \vec x', \vec z) \hat{W}_r(\vec \zeta _1, \vec z) \hat{W}_r(\vec \zeta _2, \vec z), \end{aligned}$$
(7)

where \(\vec \zeta _1\) and \(\vec \zeta _2\) are both a combination of \(\vec \zeta \) and \(\vec x'\). Observe that, by the discussion above, for each \(\vec z \in \{0,1\}^m\), the values \(\{ \hat{W}_r(\vec \zeta _j, \vec z) \}_{j \in \{1,2\}}\) can be computed by a single party \(P_i\), where \(\vec z\) is the binary representation of i, by streaming the values of \(\hat{W}_r\) on the boolean hypercube and then using Eq. (3). Thus, for each \(\vec z\), the inner sum \(\sum _{\vec x' \in \{0,1\}^{3\,s - \gamma }} p'_4(\vec \zeta , \vec x', \vec z) \hat{W}_r(\vec \zeta _1, \vec z) \hat{W}_r(\vec \zeta _2, \vec z)\) can be computed by a single party in \(O(S)\) space. The parties can then sum these terms in a large-arity tree, thus computing Eq. (6) in O(1) rounds and \(O(S)\) space.

We now consider the case where \(\gamma > 3s\). Write \(\gamma = 3s + m'\) for some \(m' \ge 1\), and write \(\vec \zeta = (\vec \zeta _1, \vec \zeta _2, \vec \zeta _3, \vec \zeta _4)\). In this case,

$$\begin{aligned} \sum _{\vec x \in \{0,1\}^{m+3\,s - \gamma }} p_4(\vec \zeta , \vec x) &= \sum _{\vec z' \in \{0,1\}^{m - m'}} p'_4(\vec \zeta , \vec z') \hat{W}_r(\vec \zeta _1, \vec \zeta _4, \vec z') \hat{W}_r(\vec \zeta _2, \vec \zeta _4, \vec z'), \end{aligned}$$

and then again by Eq. (3), this is equal to

$$\begin{aligned} \sum _{\vec z' \in \{0,1\}^{m - m'}} \textsf{term}_{\vec z'} \end{aligned}$$
(8)

where \(\textsf{term}_{\vec z'}\) is the following:

$$\begin{aligned} p'_4(\vec \zeta , \vec z') \left( \sum _{\vec y_4^{(1)} \in \{0,1\}^{m'}} c_{\vec \zeta _4, \vec y_4^{(1)}} \cdot \hat{W}_r(\vec \zeta _1, \vec y_4^{(1)}, \vec z') \right) \left( \sum _{\vec y_4^{(2)} \in \{0,1\}^{m'}} c_{\vec \zeta _4, \vec y_4^{(2)}} \cdot \hat{W}_r(\vec \zeta _2, \vec y_4^{(2)}, \vec z') \right) . \end{aligned}$$
(9)

Note that for any \(\hat{W}_r(\vec \zeta _j, \vec y_4^{(2)}, \vec z')\), there is a party (indexed by \((\vec y_4^{(2)}, \vec z')\)) who can compute this value locally, so WLOG we assume each party has precomputed this corresponding value. Observe that \(\vec z'\) defines a subset of parties, indexed by the set \(S_{\vec z'} = \{ (\vec y_4, \vec z') : \vec y_4 \in \{0,1\}^{m'} \}\), which is disjoint from \(S_{\vec z''}\) for all \(\vec z'' \ne \vec z'\). Observe also that for each \(\vec z'\), to compute \(\textsf{term}_{\vec z'}\), only the parties in \(S_{\vec z'}\) must interact, and they can compute the sum in Eq. (9) in constant rounds and \(O(S)\) space by first computing the two inner sums via large-arity trees as in all the previous cases, and then multiplying these two summed values together and weighting the product by \(p'_4(\vec \zeta , \vec z')\). Thus, to compute the outer sum, for each \(\vec z'\), the parties in \(S_{\vec z'}\) can interact in the manner described above, simultaneously with all other sets \(S_{\vec z''}\). Then, once each set has its term of the sum, representative parties for each of the sets can again use a large-arity tree to obtain the final result in constant rounds and \(O(S)\) space.
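The partition of parties into the sets \(S_{\vec z'}\) can be made concrete with a short sketch (Python; identifying each party with its m-bit index, split as \((\vec y_4, \vec z')\); names are illustrative):

```python
from itertools import product

def party_subsets(m, m_prime):
    """Return {z': S_{z'}} where S_{z'} = {(y4, z') : y4 in {0,1}^{m'}},
    for z' ranging over {0,1}^{m - m'}; the sets partition {0,1}^m."""
    return {zp: [y4 + zp for y4 in product([0, 1], repeat=m_prime)]
            for zp in product([0, 1], repeat=m - m_prime)}
```

Each set contains \(2^{m'}\) parties, the sets are pairwise disjoint, and together they cover all \(2^m\) party indices, so the per-set aggregations can run simultaneously.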

We now give the description of \(\textsf{CalcSumcheckProverMsg}\).

[Figure: the \(\textsf{CalcSumcheckProverMsg}\) subprotocol]

Efficiency  

We now discuss the efficiency of the \(\textsf{VerifyRound}\) protocol.

Round complexity. The protocol \(\textsf{VerifyRound}\) can be separated into two steps: first, the provers commit to the polynomial \(\hat{W}_r\) and receive a random q from V; second, the parties carry out a sumcheck protocol. The first step is dominated by the subprotocols \(\textsf{PC}.\textsf{CombineCom}(\alpha , \{ \phi _i \}_{i \in [M]})\) and \(\textsf{Distribute}_\nu (q)\). Note that since \(\nu = \lambda \) and each of these two protocols takes \(O(\log _\nu (M))\) rounds, the first step takes a constant number of rounds. The second step takes \((m+3s) \cdot (R_\textsf{CalcSumcheckProverMsg}+ C_1) + C_2 \cdot R_{\textsf{PC}.\textsf{Open}}\) rounds, where \(m+3s\) is \(\textsf{polylog}(N)\), \(R_\textsf{CalcSumcheckProverMsg}\) and \(R_{\textsf{PC}.\textsf{Open}}\) are the numbers of rounds required for the \(\textsf{CalcSumcheckProverMsg}\) and \(\textsf{PC}.\textsf{Open}\) subprotocols respectively, and \(C_1\) and \(C_2\) are constants. As explained in Sect. 6, \(R_{\textsf{PC}.\textsf{Open}} = \textsf{polylog}(|\hat{W}_r |)\), which is \(\textsf{polylog}(N)\). As explained in Sect. 5.3, \(R_\textsf{CalcSumcheckProverMsg}\) is constant. Thus, \((m+3s) \cdot (R_\textsf{CalcSumcheckProverMsg}+ C_1) + C_2 \cdot R_{\textsf{PC}.\textsf{Open}}\) is \(\textsf{polylog}(N)\), and the entire protocol \(\textsf{VerifyRound}\) takes \(\textsf{polylog}(N)\) rounds.

Space complexity per party. By the properties of the polynomial commitment and the sumcheck protocol, the verifier takes space \(\textsf{polylog}(N) \cdot \textsf{poly}(\lambda )\). The provers each take space \(S\cdot \textsf{poly}(\lambda )\); this follows from the following:

  • Each party’s polynomial \(\hat{W}_{i,r}\), which encodes the wire assignments of \(C_{i,r}\), can be streamed in space \(O(S)\) by Lemma 1, and \(\textsf{PC}.\textsf{PartialCom}\) works given only streaming access to it.

  • \(\textsf{PC}.\textsf{CombineCom}\) and \(\textsf{PC}.\textsf{Open}\) are MPC protocols where the provers require at most \(S\cdot \textsf{poly}(\lambda )\) space, as per the properties of \(\textsf{PC}\).

  • \(\textsf{CalcSumcheckProverMsg}\) is an MPC protocol where the provers require at most \(S\cdot \textsf{poly}(\lambda )\) space, as discussed in the previous section.

5.4 From Round Verification to a Full Argument

In this section, we use the \(\textsf{VerifyRound}\) protocol from Sect. 5 and the polynomial commitment \(\textsf{PC}\) from Sect. 6.2 to achieve a succinct argument for a language L, assuming L has an MPC verification algorithm \(\varPi _L\) as described in Sect. 5.2.

The formal description of the argument system is as follows. Assume the original protocol \(\varPi _L\) runs for R rounds.

[Figure: the argument system for L]

Efficiency. The round complexity of the above argument is \(R \cdot \textsf{polylog}(N)\), where R is the number of rounds taken by \(\varPi _L\). The space complexity is \(S\cdot \textsf{poly}(\lambda )\) per party. Both follow from the round and space complexity of \(\textsf{VerifyRound}\) discussed above.

Security. We have the following theorem and defer its proof to the full version.

Theorem 3

Assume the polynomial commitment scheme \(\textsf{PC}\) satisfies the security properties in Definition 5. Then the argument system above satisfies witness-extended emulation with respect to the language L.

6 Constructing Polynomial Commitments in the MPC Model

Our construction extensively uses the polynomial commitment scheme of Block et al. [22], which we describe in detail in the full version. To describe our construction, we first introduce the distributed streaming model in Sect. 6.1, then describe the construction in Sect. 6.2 with its proof in Sect. 6.3.

6.1 Distributed Streaming Model

Looking ahead to our goal of designing succinct arguments in the MPC model, we consider an enhancement of the streaming model [22] to the MPC setting. We refer to the model as the distributed streaming model: Let \(\mathcal {Y}\in \mathbb {F}^N\) be some multilinear polynomial and let \(\{\mathcal {Y}_i \in \mathbb {F}^{N/M}\}_{i \in [M]}\) be the set of partial description vectors such that \(\mathcal {Y}= \mathcal {Y}_1 \mid \mid \mathcal {Y}_2 \mid \mid \ldots \mid \mid \mathcal {Y}_M\). In the distributed streaming model, we assume that each of the S-space-bounded parties \(P_i\) has streaming access only to the elements of its partial description vector \(\mathcal {Y}_i\), where \(S \ll N/M\).

While adapting Block et al. [22] to the distributed streaming model, we need to ensure two properties: (a) low-space provers and (b) a low-round protocol. A naive low-space implementation is achieved by blowing up the number of rounds of interaction. Similarly, a naive polylogarithmic-round protocol is achieved by simply having each party communicate its whole input (in a single round) to a single party, but this incurs high space for the prover. Achieving the two properties together is the main technical challenge. We build a low-space, low-round protocol by heavily exploiting the algebraic structure of [22].

6.2 Our New Construction

To support n-variate polynomials, recall that each party \(P_i\) holds a partial vector \(\mathcal {Y}_i\) over \(\mathbb {F}\) of size N/M and the corresponding index set \(I_i = \{(i-1) \cdot N/M, \ldots , iN/M - 1\}\). The \(\textsf{PC}.\textsf{Setup}\) algorithm is identical to that of [22], the algorithms \(\textsf{PC}.\textsf{PartialCom}\) and \(\textsf{PC}.\textsf{CombineCom}\) collectively implement the commitment algorithm of [22], and \(\textsf{PC}.\textsf{Open}\) implements their open algorithm.

\(\underline{\textsf{PC}.\textsf{Setup}(1^\lambda , p, 1^n, M):}\) The public parameters pp output by \(\textsf{PC}.\textsf{Setup}\) contain the tuple \((q,g,\mathbb {G})\) where g is a random element of the hidden-order group \(\mathbb {G}\) and q is a sufficiently large odd integer (i.e., \(q > p \cdot 2^{n \cdot \textsf{poly}(\lambda )}\)).

\(\underline{\textsf{PC}.\textsf{PartialCom}(pp, \mathcal {Y}_i):}\) Each of the parties locally runs this algorithm to compute its partial commitment to the polynomial. In particular, on inputs \(pp = (q, g, \mathbb {G})\) and the partial sequence \(\mathcal {Y}_i \in \mathbb {F}^{N/M}\), the algorithm \(\textsf{PartialCom}\) outputs a commitment \(\textsf{com}_i\) to \(\mathcal {Y}_i\) by encoding its elements as an integer in base q. Specifically, \(\textsf{com}_i = g^{z_i}\) where

$$\begin{aligned} z_i = q^{(i-1)N/M} \left( \sum _{\vec b \in \{0,1\}^{n-m}} q^{\vec b} \cdot {\mathcal {Y}_i}_{\vec b} \right) \ , \end{aligned}$$
(10)

and the private partial decommitment is the sequence \(\mathcal {Z}_i = \textsf{lift}{(\mathcal {Y}_i)}\). We give the formal description of this algorithm in the streaming model below.

figure e
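For intuition, the quantity in Eq. (10) can be computed in a single pass using Horner's rule in the exponent. The sketch below is our own illustration with toy, insecure parameters: a small modulus stands in for the hidden-order group \(\mathbb {G}\), and the stream is scanned from the most significant element first; the formal algorithm above may differ in such details.

```python
def partial_com(g, q, modulus, Y_i, offset):
    """Streaming computation of com_i = g^{z_i} (cf. Eq. (10)) over a
    toy group Z_modulus^*: z_i = q^offset * sum_b q^b * Y_i[b], with
    offset = (i-1) * N/M. One pass over Y_i, most significant element
    first, via Horner's rule in the exponent: com <- com^q * g^y."""
    com = 1
    for y in reversed(Y_i):  # single conceptual pass over the stream
        com = pow(com, q, modulus) * pow(g, y, modulus) % modulus
    # shift the encoding by party i's position in the global vector
    return pow(com, q ** offset, modulus)
```

Here the exponentiation by \(q^{\text{offset}}\) corresponds to line 5 of the formal description, while the per-element update corresponds to lines 3–4.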

\(\underline{\textsf{PC}.\textsf{CombineCom}(pp, \{\textsf{com}_i\}):}\) Parties each holding their partial commitments \(\textsf{com}_i\) want to jointly compute a full commitment \(\textsf{com}= \prod _{i \in [M]} \textsf{com}_i\). For this, parties run the \(\textsf{Combine}\) subprotocol on their inputs with \(\textsf{op}\) as the group multiplication and \(P_1\) as the receiver. Then, \(P_1\) outputs \(\textsf{com}\) as the commitment.
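The \(\textsf{Combine}\) subprotocol is specified elsewhere in the paper; purely for illustration, a binary-tree aggregation toward \(P_1\), which attains the \(O(\log M)\) round complexity, can be simulated as follows (function name and structure are our own assumptions):

```python
def combine(values, op):
    """Simulate the Combine subprotocol as binary-tree aggregation:
    values[j] is party P_{j+1}'s input; in each round, every surviving
    party merges one incoming message with its local value. Returns
    P_1's output and the number of rounds (ceil(log2 M))."""
    vals = list(values)
    M = len(vals)
    step, rounds = 1, 0
    while step < M:
        for j in range(0, M, 2 * step):
            if j + step < M:
                vals[j] = op(vals[j], vals[j + step])
        step *= 2
        rounds += 1
    return vals[0], rounds
```

For \(\textsf{PC}.\textsf{CombineCom}\), op is group multiplication; for \(\textsf{PC}.\textsf{CombineEval}\) below, it is field addition.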

\(\underline{\textsf{PC}.\textsf{PartialEval}(pp, \mathcal {Y}_i, \vec \zeta ):}\) Each of the parties locally runs this algorithm to compute its contribution to the evaluation. In particular, on input the CRS pp, a partial vector \(\mathcal {Y}_i \in \mathbb {F}^{N/M}\) and an evaluation point \(\vec \zeta \in \mathbb {F}^n\), the partial evaluation algorithm outputs \(y_i \in \mathbb {F}\) such that

$$\begin{aligned} y_i = \sum _{\vec b \in \{0,1\}^{n-m}} {\mathcal {Y}_i}_{\vec b} \cdot \overline{\chi }(\vec \zeta , \vec b + (i-1) \cdot N/M) \ . \end{aligned}$$
(11)

We give the formal description of this algorithm in the streaming model below.

figure f
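To illustrate Eq. (11), the sketch below (our own) takes \(\overline{\chi }\) to be the standard multilinear Lagrange basis polynomial and fixes an LSB-first bit convention for indices; the paper's actual definition may differ in such details.

```python
def chi_bar(zeta, idx, n, p):
    """Multilinear Lagrange basis polynomial over F_p, evaluated at
    point zeta for the Boolean index idx (LSB-first bits, a convention
    chosen here for concreteness)."""
    out = 1
    for k in range(n):
        b = (idx >> k) & 1
        out = out * ((zeta[k] if b else 1 - zeta[k]) % p) % p
    return out

def partial_eval(Y_i, zeta, offset, n, p):
    """Party P_i's contribution (cf. Eq. (11)): one pass over the
    stream Y_i, shifting local indices by offset = (i-1) * N/M."""
    y_i = 0
    for b, v in enumerate(Y_i):
        y_i = (y_i + v * chi_bar(zeta, b + offset, n, p)) % p
    return y_i
```

Summing the M partial evaluations recovers the full multilinear evaluation of \(\mathcal {Y}\) at \(\vec \zeta \), which is what \(\textsf{PC}.\textsf{CombineEval}\) below computes.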

\(\underline{\textsf{PC}.\textsf{CombineEval}(pp, y_i, \vec \zeta ):}\) Parties each holding their partial evaluations \(y_i\) want to jointly compute the full evaluation \(y = \sum _{i \in [M]} y_i\). For this, parties run the \(\textsf{Combine}\) subprotocol on their inputs with field addition as the associative operator \(\textsf{op}\) and \(P_1\) as the receiver. Then, \(P_1\) outputs y as the evaluation.

\(\underline{\textsf{PC}.\textsf{Open}:}\) The \(\textsf{PC}.\textsf{Open}\) algorithm is the natural adaptation of the \(\textsf{Open}\) algorithm in [22] to the distributed streaming model. Specifically, all parties (including V) hold the public parameters \(pp = (q, g, \mathbb {G})\), the claimed evaluation \(y \in \mathbb {F}\), the evaluation point \(\vec \zeta \in \mathbb {F}^n\) and the commitment \(\textsf{com}\). Further, each party \(P_i\) has streaming access to the entries of its partial decommitment vector \(\mathcal {Z}_i\).

figure g

6.3 Proof of Theorem 2

We now prove Theorem 2 – our main theorem statement for multilinear polynomial commitments in the MPC model. The correctness, binding and witness-extended emulation properties follow readily from those of [22]: this is because, for these properties, it suffices to view the cluster of provers as a single monolithic prover. In such a setting, the polynomial commitment scheme described above is identical to that of [22]. Finally, we argue the efficiency of each of the algorithms next.

Efficiency of \(\textsf{PC}.\textsf{PartialCom}\). In \(\textsf{PC}.\textsf{PartialCom}\) (Sect. 6.2), each party \(P_i\) runs through the stream of \(\mathcal {Y}_i\) once, and for each of the \(2^n/M\) elements performs the following computation: In line 3, it performs a single group exponentiation where the exponent is an \(\mathbb {F}\) value, followed by a single group multiplication. In line 4, it performs a group exponentiation where the exponent is q. Thus, lines 3–4 result in a total runtime of \((2^n/M) \cdot \textsf{poly}(\lambda , \log (p), \log (q))\). In line 5, it performs a single group exponentiation where the exponent is \(q^{(i-1)N/M}\), followed by a single group multiplication. The former requires \((i-1) (N/M) \cdot \textsf{poly}(\lambda , \log (q))\) time whereas the latter requires \(\textsf{poly}(\lambda )\) time. Plugging in the value of q results in an overall running time of \(2^n \cdot \textsf{poly}(\lambda , n, \log (p))\). The output is a single group element, which requires \(\textsf{poly}(\lambda )\) bits, and only one pass over the stream \(\mathcal {Y}_i\) is required.

Efficiency of \(\textsf{PC}.\textsf{CombineCom}\). Recall from Sect. 6.2, that in \(\textsf{PC}.\textsf{CombineCom}\), all parties run the \(\textsf{Combine}\) subprotocol on local inputs of \(\textsf{poly}(\lambda )\) bits. This requires \(O(\log M)\) rounds and each party only requires \(\textsf{poly}(\lambda )\) bits of space.

Efficiency of \(\textsf{PC}.\textsf{PartialEval}\). Recall from Sect. 6.2, each party \(P_i\) runs through the stream of \(\mathcal {Y}_i\) once, and for each of the \(2^n/M\) elements performs the following operations in line 3: (a) computes the polynomial \(\overline{\chi }\) on inputs of size n, and (b) performs a single field multiplication and addition. Thus \(\textsf{PC}.\textsf{PartialEval}\)’s running time is bounded by \((2^n/M) \cdot \textsf{poly}(\lambda , n, \log (p))\), the output is a single field element of \(\lceil \log (p) \rceil \) bits, and only one pass over \(\mathcal {Y}_i\) is required.

Efficiency of \(\textsf{PC}.\textsf{CombineEval}\). Recall from Sect. 6.2, that in \(\textsf{PC}.\textsf{CombineEval}\), all parties run the \(\textsf{Combine}\) subprotocol on local inputs of \(\textsf{poly}(\lambda )\) bits. This requires \(O(\log M)\) rounds and each party only requires \(\textsf{poly}(\lambda )\) bits of space.

Communication/Round Complexity of \(\textsf{PC}.\textsf{Open}\). The round complexity of \(\textsf{PC}.\textsf{Open}\) as described in Sect. 6.2 is dominated by line 2. In particular, line 2 is executed O(n) times, where in each iteration k parties perform local computations in all lines except 2-(b), 2-(d) and 2-(f). Specifically, in 2-(b) (resp., 2-(f)), an instantiation of the \(\textsf{Combine}\) (resp., \(\textsf{Distribute}\)) subprotocol is run, which requires \(O(\log (M))\) rounds. Additionally, in 2-(d), party \(P_1\) and the verifier engage in a PoE protocol, which requires \(O(n-k)\) rounds. Therefore, overall, the round complexity of \(\textsf{PC}.\textsf{Open}\) is \(O(n \cdot \log (M))\) rounds. In terms of communication complexity, in each round of the protocol at most \(\textsf{poly}(\lambda , n, \log (p), \log (M))\) bits are transmitted; therefore, overall it is bounded by \(\textsf{poly}(\lambda , n, \log (p), \log (M))\).

Computational Efficiency of \(\textsf{PC}.\textsf{Open}\). The verifier's efficiency is dominated by its computation in the PoE execution in line 2 of each of the n rounds, which is bounded by \(\textsf{poly}(\lambda , n, \log (p), \log (q))\). We now turn to the prover's efficiency. The efficiency of each party \(P_i\) is dominated by the n iterative executions of line 2 of \(\textsf{PC}.\textsf{Open}\) (Sect. 6.2). In each iteration: in line 2-(a), \(P_i\) runs through the stream of \(\mathcal {Y}_i\) once, and for each of the \(2^n/M\) elements performs \(\textsf{poly}(\lambda , n)\) computation for computing the matrices \(M_{\vec c}\) as well as an O(n)-size product of evaluations of the \(\chi \) function. Further, the prover computation in lines 2-(d) through 2-(g) does not depend on the stream. In particular, its running time is dominated by the computation in line 2-(d), where \(P_1\) acts as a prover in the PoE protocol with an exponent of the form \(q^{2^{n-k-1}}\). This results in an overall running time of \(2^n \cdot \textsf{poly}(\lambda ,n, \log (p))\). Further, the prover's space in each of the n iterations is \(\textsf{poly}(\lambda , \log (p), \log (M))\). Finally, in each run of line 2, a single pass over the entire stream is sufficient, resulting in O(n) passes over the stream for each party \(P_i\).