
1 Introduction

Privacy-preserving data mining (PPDM) is becoming increasingly vital as more and more data is collected, analyzed, and shared. Alongside this trend, privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have heightened the importance of privacy-preserving techniques by requiring organizations to protect sensitive and personal information. To this end, various techniques such as differential privacy and homomorphic encryption (HE) have been proposed to protect sensitive information. Among these techniques, homomorphic encryption, which builds on lattice-based cryptography, is considered post-quantum secure and one of the most promising defenses against attacks from quantum computers. The technique is often referred to as the “holy grail” of cryptography, as it allows computation on encrypted data. Hence, much of the literature [3,4,5] has focused on performing privacy-preserving data analysis using homomorphic encryption.

However, despite its promising properties, homomorphic encryption is often considered impractical due to its poor time performance. Specifically, evaluating an HE circuit typically requires at least several orders of magnitude more time than the corresponding plaintext computation. In [3], the evaluation of a logistic regression model on the Edinburgh Myocardial Infarction dataset, which consists of 1,253 observations and 10 features, was reported to require 116 min when implemented within the HE scheme. In contrast, the same process in the plain domain, on a personal computer, completes in a matter of seconds. In light of this limitation, much recent research has focused on algorithmic improvements [14, 15] to the time performance of HE, as well as on the use of parallel hardware structures [6] in the construction of HE schemes, in an effort to make the technique practical for deployment.

The most efficient and practical implementation of a fully homomorphic encryption scheme based on the Learning with Errors (LWE) problem is the CKKS scheme in the leveled homomorphic encryption (LHE) setting. While the CKKS scheme can support arbitrary computations through bootstrapping, its practical deployment is largely limited to the LHE setting. In this setting, the depth of the circuit must be determined in advance, with the number of multiplications per ciphertext serving as the determining factor. Furthermore, as the scheme is an arithmetic homomorphic encryption scheme, most algorithms must be approximated using only the basic operations of addition and multiplication.

The primary concern when executing privacy-preserving data mining algorithms in HE is the high latency of matrix operations. Among these operations, the most challenging and time-consuming to construct within HE is the matrix inverse. In the plain domain, the inverse of a matrix can easily be obtained via Gaussian elimination. In the encrypted domain, however, all HE circuits must be designed for the worst-case scenario. In addition, because the matrix elements are encrypted, every elementary row (or column) operation requires comparison operations as well as time-consuming divisions.

To overcome this problem, there have been several attempts at designing matrix inverse operations in the context of HE. However, they have met with limited success due to their naïve implementations, which result in a significant increase in computational time and multiplicative depth. Cheon et al. [7] use a matrix version of Goldschmidt’s algorithm described in [8], since it can be evaluated using only additions and multiplications. However, it is not practical because it requires knowledge of a threshold value in advance, which is infeasible in the encrypted domain. Therefore, the literature generally uses Newton’s method [9] for matrix inversion in HE [11, 12], because it obtains an approximate matrix inverse using only additions and multiplications in an iterative manner. However, it also has a drawback: it requires many iterations and multiplications.

The issue of multiplicative depth is also crucial when designing homomorphic circuits, as most algorithms in practice use leveled homomorphic encryption (LHE), in which the multiplicative depth is predetermined. In the encrypted domain, the matrix inverse must be approximated using a sequence of matrix multiplications, which significantly increases the multiplicative depth of the circuit. For example, the Newton method requires a multiplicative depth of 43 to approximate the inverse matrix, taking up most of the circuit’s depth and preventing further operations. Although a technique called bootstrapping can increase the multiplicative depth of the ciphertext, it requires a much greater amount of time and is therefore avoided in practical circuit construction.

Therefore, it is crucial to design an efficient matrix inverse operation with fewer depths in leveled homomorphic encryption (LHE). By reducing the number of multiplications per ciphertext in the matrix inverse algorithm, one can design an HE circuit with a shallower multiplicative depth. Additionally, with the same security parameter set, more operations can be added for further computations within a leveled homomorphic encryption scheme or smaller parameters can be chosen for more efficient computation of the circuit.

In short, our contributions are summarized as follows:

  • We present a novel iterative matrix inverse operation. Our technique reduces the required multiplicative depth by nearly half compared to Newton’s method, the most widely used algorithm in the current literature.

  • We provide mathematical proofs and experimental results comparing the two approaches—ours and Newton’s method. Specifically, we demonstrate the convergence speed and required depths of both approaches in theory and in implementation.

  • Our matrix inverse algorithm integrates seamlessly into HE circuits that require an inverse matrix. We substantiate this claim by presenting experimental results.

2 Background

2.1 Homomorphic Encryption

Homomorphic encryption (HE) is a technique that allows for computations to be performed on the encrypted data without the need for decryption, utilizing a one-to-one model between the client and the server. This is achieved by designing the encryption scheme based on the Learning with Errors (LWE) problem [10], which uses noise as a means of ensuring security. However, as computations are performed on the encrypted data, the noise in the ciphertext accumulates, and if this noise exceeds a certain threshold, the correctness of the decryption process can no longer be guaranteed.

Let \(\mathcal {M}\) and \(\mathcal {C}\) denote the spaces of plaintexts and ciphertexts, respectively. The process of HE is typically composed of four algorithms: key generation, encryption, decryption, and evaluation.

  1. Key generation: Given the security parameter \(\lambda \), this algorithm outputs a public key pk, a public evaluation key evk and a secret key sk.

  2. Encryption: Using the public key pk, the encryption algorithm encrypts a plaintext \(m \in \mathcal {M}\) into a ciphertext \(ct \in \mathcal {C}\).

  3. Decryption: For the secret key sk and a ciphertext ct, the decryption algorithm outputs a plaintext \(m \in \mathcal {M}\).

  4. Evaluation: Suppose a function \(f: \mathcal {M}^k \rightarrow \mathcal {M}\) is to be performed over the plaintexts \(m_1, \cdots , m_k\). Then, the evaluation algorithm takes ciphertexts \(c_1, \cdots , c_k\) corresponding to \(m_1, \cdots , m_k\) and the evaluation key evk, and outputs \(c^*\) such that \(\textsf {Dec}(c^*) = f(m_1, \cdots , m_k)\).
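To make the interface concrete, the following is a minimal, scheme-agnostic sketch of these four algorithms as a Python protocol. The names and signatures are illustrative only and do not correspond to any particular library.

```python
from typing import Any, Callable, Protocol, Sequence, Tuple

class HEScheme(Protocol):
    """Abstract view of the four HE algorithms (illustrative names only)."""

    def keygen(self, security_param: int) -> Tuple[Any, Any, Any]:
        """Return (pk, evk, sk) for a security parameter lambda."""
        ...

    def encrypt(self, pk: Any, m: Any) -> Any:
        """Encrypt a plaintext m in M into a ciphertext ct in C."""
        ...

    def decrypt(self, sk: Any, ct: Any) -> Any:
        """Recover the plaintext from a ciphertext ct."""
        ...

    def evaluate(self, evk: Any, f: Callable[..., Any], cts: Sequence[Any]) -> Any:
        """Return c* such that decrypt(sk, c*) == f(m_1, ..., m_k)."""
        ...
```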

In the field of homomorphic encryption, there are two primary categories of encryption schemes: fully homomorphic encryption (FHE) and leveled homomorphic encryption (LHE). FHE permits any computation to be executed on the encrypted data, while LHE is more restricted in the types of computations that can be performed. These distinctions are due to the various methods used to handle the accumulation of noise in the ciphertext.

FHE utilizes a specialized technique known as bootstrapping to reduce the noise in the ciphertext and increase the multiplicative level of the ciphertext, allowing for further computations to be performed on the encrypted data. However, the use of bootstrapping is a computationally expensive technique and can be time-consuming. In practical applications, LHE is often preferred for its faster performance when working with limited depth circuits. This is because LHE does not rely on the use of bootstrapping and thus is less computationally intensive.

Homomorphic encryption can also be categorized, in terms of evaluation, based on the type of computations that can be performed on the encrypted data. Arithmetic homomorphic encryption allows basic arithmetic operations such as addition and multiplication to be performed on the encrypted data. Two popular examples are the CKKS [1] and BFV [13] encryption schemes, with CKKS being the most recent and the most practical HE solution providing real-number arithmetic. Boolean-based homomorphic encryption allows Boolean operations, such as AND, OR, and NOT, to be performed on the encrypted data; TFHE [15] and FHEW [14] are two examples. The choice of homomorphic encryption scheme depends on the specific application and the type of computations that need to be performed on the data.

2.2 Arithmetic HE

Arithmetic HE supports the usual arithmetic operations, addition and multiplication, within a limited multiplicative depth that is pre-defined by the encryption parameters. Therefore, one needs to consider the depth of the circuit in advance for optimal performance, since a deeper circuit requires a larger parameter set, resulting in performance degradation. In the BFV and CKKS schemes, the depth of the circuit is mostly determined by the number of multiplications per ciphertext required by the HE circuit.

Moreover, multiplication is a more complex operation than addition in HE. In BFV and CKKS, the multiplication of two ciphertexts entails auxiliary procedures such as relinearization and modulus switching. Therefore, the gap in execution time between the two operations is significant. As an illustration, within the CKKS scheme, the computational time required for multiplication exceeds that of addition by a factor greater than 46 (time for mult.: \(649 \,ms\), add: \(14 \,ms\); see Footnote 1). Hence, reducing the number of multiplications is crucial in HE circuit design.

One of the key features of the CKKS scheme is the use of the Single Instruction Multiple Data (SIMD) structure. SIMD [17] is a structure that enables the packing of vector plaintexts into a single ciphertext, and operations are performed in vector units. Another feature is additional functionalities such as slot rotation. Rotations enable us to interact with values located in different ciphertext slots. These features allow for efficient operations on vectors and matrices. Halevi et al. [16] introduce a matrix encoding method based on diagonal decomposition, where the matrix is arranged in diagonal order. This method requires O(n) ciphertexts to represent the matrix, and the matrix multiplication can be computed using \(O(n^2)\) rotations and multiplications and two circuit depths given the multiplication of two square matrices of size n.
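As an illustration of the diagonal ordering, the following plain-domain sketch shows the matrix-vector building block of this encoding: each generalized diagonal is multiplied slot-wise with a rotation of the input vector. Under HE, the rotation would be a ciphertext slot rotation; the function names are our own.

```python
import numpy as np

def rotate(v, i):
    """Cyclic rotation by i slots (stand-in for an HE slot rotation)."""
    return np.roll(v, -i)

def diag_matvec(M, v):
    """Compute M @ v in diagonal order: sum_i d_i * rot(v, i),
    where d_i[j] = M[j, (j + i) % n] is the i-th generalized diagonal."""
    n = M.shape[0]
    out = np.zeros(n)
    for i in range(n):
        d_i = np.array([M[j, (j + i) % n] for j in range(n)])
        out += d_i * rotate(v, i)
    return out

# Sanity check against the plain product
M = np.random.rand(4, 4); v = np.random.rand(4)
assert np.allclose(diag_matvec(M, v), M @ v)
```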

Additionally, Jiang et al. [18] propose a matrix multiplication method that reduces the complexity of multiplications and rotations to O(n) at the cost of three levels of computational depth. These approaches are beneficial in terms of computational efficiency. Nevertheless, within the scope of this paper, we employ a naive matrix multiplication approach that requires \(O(n^3)\) multiplicative operations for the computation of the inverse matrix. The evaluation of an inverse matrix typically entails substantial computational depth, and the naive matrix multiplication method is advantageous in this regard, as it requires only a single depth.

2.3 Circuit Depth

In leveled homomorphic encryption, the total count of multiplication evaluations for a single ciphertext is predetermined by the initial depth parameter of the HE system. For example, when a ciphertext is assigned a depth level denoted as L, it is intrinsically constrained to execute a maximum of L multiplicative operations. Beyond this specified threshold of L multiplications, the ciphertext ceases to support further multiplication operations.

The design of HE circuits can significantly influence the multiplicative depths, making it a crucial consideration. To illustrate this point, consider four distinct ciphertexts denoted as x, y, z, and w, each initially possessing a depth level of L. When these ciphertexts are multiplied sequentially, it consumes 3 depth levels, resulting in a ciphertext denoted as xyzw with a reduced depth of \(L-3\).

Alternatively, we can initially perform a multiplication between x and y, yielding xy with a depth decrement of 1; likewise, we can evaluate a multiplication on z and w. Finally, the multiplication of xy and zw results in a ciphertext xyzw with a reduced depth of \(L-2\). Importantly, both approaches yield equivalent results and require an identical count of 3 multiplication operations. However, the depth level of the resulting ciphertext differs by a factor of 1.

Note that when multiplying ciphertexts with different levels, the multiplication operations are executed based on the lowest level among them.
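The following toy level tracker illustrates the two multiplication orders above. It is a plain Python sketch, not HE code; it simply applies the rule that a ciphertext-ciphertext multiplication drops the result to one level below the lower of its operands.

```python
class Ct:
    """Toy ciphertext that only tracks its remaining depth level."""
    def __init__(self, level: int):
        self.level = level
    def __mul__(self, other: "Ct") -> "Ct":
        # Multiplication proceeds at the lowest operand level and consumes one level.
        return Ct(min(self.level, other.level) - 1)

L = 10
x, y, z, w = Ct(L), Ct(L), Ct(L), Ct(L)
print((((x * y) * z) * w).level)   # sequential order: L - 3
print(((x * y) * (z * w)).level)   # balanced order:   L - 2
```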

2.4 Conventional Iterative Matrix Inverse

There are mainly two approaches in implementing the iterative matrix inverse operation: Goldschmidt’s method [8] and Newton’s method.

Goldschmidt Algorithm. (See the details in Algorithm 3) Let \(\textbf{A}\) be an invertible square matrix that satisfies \(\Vert \bar{\textbf{A}} \Vert \le \epsilon < 1\) for \(\bar{\textbf{A}} = \textbf{I} - \frac{1}{2^{t}} \textbf{A}\) for some non-negative integer t. It follows that

$$ \frac{1}{2^{t}}\textbf{A}(\textbf{I}+\mathbf {\bar{A}})(\textbf{I}+\mathbf {\bar{A}}^{2})\cdots (\textbf{I}+\mathbf {\bar{A}}^{2^{r-1}}) = \textbf{I} - \mathbf {\bar{A}}^{2^{r}} $$

where \(\textbf{I}\) is the identity matrix. Additionally, we note that \(\Vert \mathbf {\bar{A}}^{2^{r}} \Vert \le \Vert \mathbf {\bar{A}} \Vert ^{2^{r}} \le \epsilon ^{2^{r}}\), which implies that \(\frac{1}{2^{t}}\prod _{i=0}^{r-1} (\textbf{I}+\mathbf {\bar{A}}^{2^{i}}) = \textbf{A}^{-1}(\textbf{I}-\mathbf {\bar{A}}^{2^{r}})\) is an approximate inverse of \(\textbf{A}\) when \(\epsilon ^{2^{r}} \ll 1\).

The algorithm is able to correctly output the approximate matrix inverse for some sufficiently large \(r\in \mathbb {N}\). Using the Goldschmidt algorithm, Cheon et al. propose a matrix inverse method over HE schemes [7].
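For reference, a plain-domain sketch of this iteration in Python/NumPy is given below. It assumes a scaling exponent t with \(\Vert \textbf{I} - \frac{1}{2^{t}}\textbf{A}\Vert < 1\) is supplied, which, as discussed in Sect. 3, is exactly what cannot be checked in the encrypted domain.

```python
import numpy as np

def goldschmidt_inverse(A, t, r):
    """Approximate A^{-1} as (1/2^t) * prod_{i<r} (I + Abar^(2^i)),
    where Abar = I - A / 2^t and ||Abar|| < 1 is assumed."""
    n = A.shape[0]
    I = np.eye(n)
    A_bar = I - A / 2**t
    Y = I / 2**t            # running product, initialized to (1/2^t) I
    P = A_bar               # holds Abar^(2^i)
    for _ in range(r):
        Y = Y @ (I + P)
        P = P @ P           # square to obtain the next power Abar^(2^(i+1))
    return Y                # equals A^{-1} (I - Abar^(2^r))
```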

Newton’s Method. (See the details in Algorithm 4) Likewise, let \(\textbf{A} \in \mathbb {R}^{n \times n}\) be any invertible square matrix, and let \(\alpha \) be the reciprocal of the dominant eigenvalue of \(\textbf{AA}^{T}\). Newton’s method computes the following sequence of matrices \(\{X_{k}\}_{k\ge 0}\) as:

$$ \textbf{X}_{0} = \mathbf {\alpha A}^{T} ~\text { and }~~ \textbf{X}_{k+1} = \textbf{X}_{k}(2\textbf{I}-\textbf{AX}_{k}), $$

until \(\textbf{X}_{k}\) converges to \(\textbf{A}^{-1}\). We will dive into the details including the proof for convergence in Theorem 1.

Newton’s method for obtaining an approximate inverse matrix consists of three steps: (1) computing \(\textbf{A} \textbf{A}^T\), (2) computing the dominant eigenvalue of \(\textbf{AA}^T\), and (3) calculating the sequence \(\textbf{X}_k\) to approximate the inverse of \(\textbf{A}\). It is worth noting that \(\alpha \), the reciprocal of the dominant eigenvalue of \(\textbf{AA}^{T}\), can be approximated using Goldschmidt’s algorithm through a combination of addition and multiplication operations.

In fact, it is difficult to directly obtain the dominant eigenvalue under homomorphic encryption. However, in this paper we show that convergence can still be proven when a larger value is used in place of the exact dominant eigenvalue. For this reason, some of the literature uses the trace instead of the dominant eigenvalue when computing the matrix inverse in homomorphic encryption via Newton’s method [12]. The trace of a square matrix is the sum of its main diagonal elements. The trace of \(\textbf{AA}^T\) is always greater than the dominant eigenvalue of \(\textbf{AA}^T\), since the trace is the sum of the eigenvalues and \(\textbf{AA}^T\) is positive-definite.
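A plain-domain NumPy sketch of Newton’s iteration with the trace used in place of the dominant eigenvalue, as in [12], is shown below; in the encrypted domain the reciprocal of the trace would itself be approximated with Goldschmidt’s division algorithm.

```python
import numpy as np

def newton_inverse(A, r):
    """Newton's iteration X_{k+1} = X_k (2I - A X_k) with
    X_0 = alpha * A^T and alpha = 1 / trace(A A^T)."""
    AAt = A @ A.T
    alpha = 1.0 / np.trace(AAt)   # trace(AA^T) >= dominant eigenvalue
    X = alpha * A.T
    I = np.eye(A.shape[0])
    for _ in range(r):
        X = X @ (2 * I - A @ X)   # two matrix multiplications per iteration
    return X
```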

3 Problems in Two Popular Methods

In this section, we will delve into the details and challenges associated with the implementation of the iterative matrix inverse operation in HE using two distinct approaches: Goldschmidt’s method and Newton’s method.

First, a major limitation of Goldschmidt’s method is that the norm of \(\mathbf {\bar{A}}\) must be known in advance in order to satisfy the condition \(\Vert \mathbf {\bar{A}}\Vert < 1\). This is infeasible, as all values—including input, intermediate, and output—are processed in an encrypted state. In other words, it is not possible to find t such that \(\Vert \bar{\textbf{A}}\Vert = \Vert \textbf{I} - \frac{1}{2^{t}} \textbf{A}\Vert < 1\). As a result, the algorithm cannot be initiated at all. One might suggest raising t to a sufficiently large value to satisfy the condition \(\Vert \mathbf {\bar{A}}\Vert < 1\); however, this would very likely zero out the elements of \(\frac{1}{2^{t}} \textbf{A}\) in \(\bar{\textbf{A}} = \textbf{I} - \frac{1}{2^{t}} \textbf{A}\), so the approach cannot provide an approximate matrix inverse for all \(\textbf{A}\).

Next, a drawback of Newton’s method is its significant computational complexity in terms of overall time consumption. Examining Newton’s method, the sequence \(\textbf{X}_k\) requires two matrix multiplications per iteration; assuming that the process converges in r iterations, the time complexity of step (3) in Sect. 2.4 is \(O(n^2 r)\). Additionally, step (3) consumes a circuit depth of 2r. As a result, the time complexity of Newton’s method and its depth consumption are significant. To provide an intuitive example, for a small matrix of size \(n=10\) and iteration number \(r=15\), the total number of multiplications in an HE setting is 4,500. If we assume that each multiplication takes 649 ms, the expected time for the inverse matrix operation would be at least 2,920 s.

4 Proposed Approach

We propose a novel matrix inverse method by combining elements from both Goldschmidt’s method and Newton’s method.

4.1 Motivation

Goldschmidt’s approach requires the value of t for the convergence of \(\bar{\textbf{A}} = \textbf{I} - \frac{1}{2^{t}} \textbf{A}\), however, as previously mentioned, finding this value in the encrypted domain is infeasible.

In contrast, Newton’s method relates the dominant eigenvalue of \(\textbf{A} \textbf{A}^T\) to the scaling of \(\textbf{A}\textbf{A}^T\): scaling by \(\alpha \) ensures that the spectral radius of \(\textbf{I} - \alpha \textbf{AA}^T\) is less than 1.

4.2 Efficient Matrix Inverse

Based on this observation, we posit that the dominant eigenvalue \(\lambda _1\) is correlated with the role of t in Goldschmidt’s method. To address this issue, (1) we first find the dominant eigenvalue of \(\textbf{A} \textbf{A}^T\), and (2) scale \(\textbf{AA}^T\) by its dominant eigenvalue. (3) We then use the normalized \(\textbf{AA}^T\) to iteratively approximate the matrix inverse using the Goldschmidt’s sequence for \(\textbf{Y}_i\), as detailed in Algorithm 1.

Algorithm 1. Our Approach
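Since the pseudocode of Algorithm 1 is not reproduced here, the following plain-domain NumPy sketch follows the recurrence of Theorem 2, with the trace of \(\textbf{AA}^T\) used as the scaling value as discussed in Sect. 5. It is meant to convey the structure of our approach rather than its HE implementation.

```python
import numpy as np

def our_inverse(A, r):
    """Y_0 = alpha * A^T, Abar = I - alpha * A A^T,
    Y_{k+1} = Y_k (I + Abar^(2^k)); Y_r approximates A^{-1}."""
    n = A.shape[0]
    I = np.eye(n)
    AAt = A @ A.T
    alpha = 1.0 / np.trace(AAt)   # ensures rho(I - alpha * AA^T) < 1
    Y = alpha * A.T
    P = I - alpha * AAt           # Abar; squared independently each iteration
    for _ in range(r):
        Y = Y @ (I + P)           # Y_{k+1} = Y_k (I + Abar^(2^k))
        P = P @ P
    return Y
```

Note that the two multiplications per iteration (updating Y and squaring P) are independent of each other, which is what permits the depth of only 1 per iteration discussed below.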

In summary, our approach diverges from Newton’s method in two fundamental ways: (1) we employ the Goldschmidt algorithm to approximate the inverse of matrix \(\textbf{A}\), and (2) our technique incurs a multiplicative depth of only 1 per iteration, while Newton’s method entails a depth of 2 per iteration.

It is worth emphasizing that both methods involve the same number of multiplications per iteration, namely, 2. However, the discrepancy in depth utilization per iteration between the two methods arises from the fact that our approach permits the computation of multiplications independently, incurring a depth cost of 1 for each operation. In contrast, Newton’s method conducts matrix multiplications sequentially, incurring a depth cost of 2 per iteration.

Furthermore, it is crucial to note that both Newton’s method and our approach require an equivalent number of iterations to achieve convergence. Consequently, given that Newton’s method necessitates a depth of 2 per iteration, our approach ultimately requires only half the depth cost to achieve convergence compared to Newton’s method. Further details regarding this matter will be addressed in the subsequent proof section.

The reason for finding the dominant eigenvalue of \(\textbf{AA}^T\), instead of \(\textbf{A}\) itself, is because not all eigenvalues of the input matrix \(\textbf{A}\) are necessarily positive. For convergence, it is essential that the norm of \(\mathbf {\bar{A}_0}\) (the matrix used in the Algorithm 1) be less than 1. \(\textbf{AA}^T\) has the property that all of its eigenvalues are positive. By using the dominant eigenvalue of \(\textbf{AA}^T\), we ensure that the norm of \(\mathbf {\bar{A}_0}\) remains less than 1 for any invertible matrix \(\textbf{A}\). In the case that the input matrix \(\textbf{A}\) is positive definite, it is unnecessary to calculate \(\textbf{AA}^T\). Under such circumstances, we can directly evaluate the inverse matrix using the following approach.

Algorithm 2. Our Approach
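The pseudocode of Algorithm 2 is likewise not reproduced here. The sketch below shows one plausible instantiation of the positive-definite variant, in which \(\textbf{AA}^T\) is skipped and the Goldschmidt iteration is applied to \(\textbf{A}\) directly with \(\alpha = 1/\textrm{tr}(\textbf{A})\); this choice of initialization is an assumption on our part and may differ in detail from the paper's Algorithm 2.

```python
import numpy as np

def our_inverse_pd(A, r):
    """Variant for symmetric positive-definite A: no A A^T is needed.
    Uses alpha = 1 / trace(A), Abar = I - alpha * A, Y_0 = alpha * I.
    (Assumed instantiation, not taken verbatim from Algorithm 2.)"""
    n = A.shape[0]
    I = np.eye(n)
    alpha = 1.0 / np.trace(A)     # trace(A) >= largest eigenvalue for PD A
    Y = alpha * I
    P = I - alpha * A
    for _ in range(r):
        Y = Y @ (I + P)
        P = P @ P
    return Y
```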

5 Convergence and Depth Analysis

In this work, we demonstrate that our proposed method converges to the inverse matrix, and it does so at the same rate as Newton’s method. To support our claim, we provide the following lemma, which establishes the convergence of a matrix \(\textbf{A}\) under a specific condition.

Lemma 1

Suppose \(\textbf{A}\) is an \(n\times n\) complex matrix with spectral radius \(\rho (\textbf{A}).\) Then, \( \underset{k \rightarrow \infty }{\textrm{lim}} \textbf{A}^{k} = 0\) if \(\rho (\textbf{A}) < 1 \).

5.1 Proof of Convergence

Denote the eigenvalues of an \(n\times n\) matrix \(\textbf{A}\) by \(\lambda _{i}(\textbf{A}), i = 1,\dots ,n\). When \(\textbf{A}\) is positive-definite, we can order its eigenvalues in non-increasing order as follows:

$$\begin{aligned} \lambda _{1}(\textbf{A})\ge \lambda _{2}(\textbf{A})\ge \cdots \ge \lambda _{n}(\textbf{A}) > 0. \end{aligned}$$

It is worth noting that the eigenvalues of a positive definite matrix are real and positive.

We first state the convergence of Newton’s iterative algorithm. We provide details of the proof of Theorem 1 in Appendix B.1 since it is used in other theorems.

Theorem 1

Let \(\textbf{A} \in \mathbb {R}^{n\times n}\) be an invertible matrix and define the sequence \( \{\textbf{X}_{k} \}_{k\ge 0}\) of matrices as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} \textbf{X}_{0} = \mathbf {\alpha A}^{T},\\ \textbf{X}_{k+1} = \textbf{X}_{k}(2\textbf{I}-\textbf{AX}_{k}). \end{array}\right. } \end{aligned}$$

where \(\mathbf {\alpha } = \frac{1}{\lambda _{1}(\textbf{AA}^{T})}\). Then, \(\textbf{X}_{k} \rightarrow \textbf{A}^{-1} \ as \ k \rightarrow \infty .\)

Next, we prove that the sequence in our approach (in Algorithm 1) converges to an inverse matrix, i.e., \(\textbf{Y}_i \rightarrow \textbf{A}^{-1}\).

Theorem 2

Let \(\textbf{A} \in \mathbb {R}^{n\times n}\) be an invertible matrix and define the sequence \( \{\textbf{Y}_{k} \}_{k\ge 0}\) of matrices as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} \textbf{Y}_{0} = \mathbf {\alpha \textbf{A}}^T, \ with \ \mathbf {\alpha } = \frac{1}{\lambda _{1}(\textbf{AA}^T)},\\ \mathbf {\bar{A}} = \textbf{I} - \alpha \textbf{AA}^T,\\ \textbf{Y}_{k+1} = \textbf{Y}_{k}(\textbf{I}+\mathbf {\bar{A}}^{2^{k}}). \end{array}\right. } \end{aligned}$$

Then, \(\textbf{Y}_{k} \rightarrow \textbf{A}^{-1} \ as \ k \rightarrow \infty .\)

Proof

From the definition of Theorem 2, we get

$$\begin{aligned} \textbf{Y}_{k} = \alpha \textbf{A}^T(\textbf{I}+\mathbf {\bar{A}})(\textbf{I}+\mathbf {\bar{A}}^{2})\cdots (\textbf{I}+\mathbf {\bar{A}}^{2^{k-1}}) = \textbf{A}^{-1}(\textbf{I}-\mathbf {\bar{A}}^{2^{k}}). \end{aligned}$$
(1)

We show that \(\rho (\mathbf {\bar{A}}) = \rho (\textbf{I} - \alpha \textbf{AA}^T) < 1.\) We note that the eigenvalues \(\lambda _{i}(\mathbf {\bar{A}})\) are given by, \(\lambda _{i}(\mathbf {\bar{A}}) = 1 - \alpha \lambda _{i}(\textbf{AA}^T).\) Since \(\textbf{AA}^T\) is positive-definite and \(\alpha = \frac{1}{\lambda _{1}(\textbf{AA}^T)}\), we have \(|\lambda _{i}(\mathbf {\bar{A}})| < 1\). Thus, we can get \(\rho (\mathbf {\bar{A}}) < 1.\) Therefore, by Lemma 1 we have \(\lim _{k \rightarrow \infty } \mathbf {\bar{A}}^{k} = 0\). We note that \(\textbf{Y}_k = \alpha \textbf{A}^T\prod _{i=0}^{k-1} (\textbf{I}+\mathbf {\bar{A}}^{2^{i}}) = \textbf{A}^{-1}(\textbf{I}-\mathbf {\bar{A}}^{2^{k}})\) follows from Eq. (1). Therefore,

$$\begin{aligned} \lim _{k \rightarrow \infty } \textbf{Y}_{k} = \alpha \textbf{A}^T\prod _{i=0}^{\infty } (\textbf{I}+\mathbf {\bar{A}}^{2^{i}}) = \textbf{A}^{-1}(\textbf{I}-\lim _{k \rightarrow \infty }\mathbf {\bar{A}}^{2^{k}}) = \textbf{A}^{-1}. \end{aligned}$$

In the context of our method, we posit the use of the trace of \(\textbf{AA}^T\) in place of the dominant eigenvalue of \(\textbf{AA}^T\). Our method still guarantees convergence of the iterative process, as the spectral radius of the modified matrix \(\mathbf {\bar{A}}\), denoted as \(\rho (\mathbf {\bar{A}})\), remains less than one under this assumption.

5.2 Convergence Comparison

We prove that our method has the same convergence rate as Newton’s method.

Theorem 3

Let \(\textbf{A} \in \mathbb {R}^{n \times n}\) be an invertible matrix. Suppose \(\{\textbf{X}_{k}\}_{k \ge 0}\) is the sequence of matrices generated from Newton’s method of Theorem 1 and \(\{ \textbf{Y}_{k}\}_{k \ge 0}\) generated from Theorem 2 with \(\textbf{A}\). Then for any \( 0 < \epsilon \ll \Vert \textbf{A}^{-1} \Vert \), let \(R_{1}, R_{2} \in \mathbb {N}\) be the smallest integers that satisfy \(\Vert \textbf{A}^{-1} - \textbf{X}_{i} \Vert < \epsilon \; \; \text {for all} \; i > R_{1}\) and \(\Vert \textbf{A}^{-1} - \textbf{Y}_{j} \Vert < \epsilon \; \; \text {for all} \; j > R_{2}\) respectively. Then we have \(R_{1} = R_{2}\). That is, the method illustrated in Theorem 2 converges with the same iterations as Newton’s method.

Proof

From the proofs of Theorem 1 and Theorem 2, we have

$$\begin{aligned} \textbf{X}_{k} &= \textbf{A}^{-1}(\textbf{I}- \textbf{R}_{k}) = \textbf{A}^{-1}\left( \textbf{I} - \left( \textbf{I} - \frac{1}{\lambda _{1}(\textbf{AA}^{T})}\textbf{AA}^{T}\right) ^{2^{k}}\right) , \\ \textbf{Y}_{k} &= \textbf{A}^{-1}(\textbf{I} - \bar{\textbf{A}}^{2^{k}}). \end{aligned}$$

We first prove that \(R_{1}\) and \(R_{2}\) always exist for \(0 < \epsilon < \Vert \textbf{A}^{-1} \Vert \). Define two sequences \(\{x_{k}\}_{k \ge 0}\) and \( \{y_{k}\}_{k \ge 0}\) with \(x_{k} = \Vert \textbf{A}^{-1} - \textbf{X}_{k}\Vert \) and \(y_{k} = \Vert \textbf{A}^{-1} - \textbf{Y}_{k} \Vert \).

For simplicity, we denote the greatest eigenvalue of \(\textbf{AA}^T\) as \(\lambda _{1}\), and the smallest eigenvalue as \(\lambda _{n}\). Then we have

$$\begin{aligned} x_{k} = \Vert \textbf{A}^{-1} - \textbf{X}_{k}\Vert &= \left\Vert \textbf{A}^{-1} \left( \textbf{I} - \frac{1}{\lambda _{1}}\textbf{AA}^{T}\right) ^{2^{k}}\right\Vert \\ &\le \left\Vert \textbf{A}^{-1} \right\Vert \cdot \left\Vert \textbf{I} - \frac{1}{\lambda _{1}}\textbf{AA}^{T} \right\Vert ^{2^{k}} \\ &= \left\Vert \textbf{A}^{-1} \right\Vert \cdot \left( \frac{\lambda _{1} - \lambda _{n}}{\lambda _{1}}\right) ^{2^{k}}. \end{aligned}$$

Also for \(y_{k}\), we have

$$\begin{aligned} y_{k} = \left\Vert \textbf{A}^{-1} - \textbf{Y}_{k} \right\Vert = \left\Vert \textbf{A}^{-1} \bar{\textbf{A}}^{2^{k}} \right\Vert &= \left\Vert \textbf{A}^{-1} \left( \textbf{I} - \frac{1}{\lambda _{1}} \cdot \textbf{AA}^T \right) ^{2^{k}}\right\Vert \\ &\le \left\Vert \textbf{A}^{-1} \right\Vert \cdot \left\Vert \textbf{I} - \frac{1}{\lambda _{1}} \cdot \textbf{AA}^T\right\Vert ^{2^{k}} \\ &= \left\Vert \textbf{A}^{-1}\right\Vert \cdot \left( \frac{\lambda _{1} - \lambda _{n}}{\lambda _{1}} \right) ^{2^{k}}. \end{aligned}$$

Then by the definition of \(\lambda _{1}\) and \(\lambda _{n}\), we have the inequality

$$\begin{aligned} 0 < \frac{\lambda _{1} - \lambda _{n}}{\lambda _{1}} < 1. \end{aligned}$$

From these bounds, we can observe that both sequences \(x_{k}\) and \(y_{k}\) monotonically decrease and both converge to 0 as \(k \rightarrow \infty \). Thus, for any \(0 < \epsilon \ll \left\Vert \textbf{A}^{-1} \right\Vert \), there always exist \(R_{1}, R_{2} \in \mathbb {N}\) such that

$$\begin{aligned} x_{i} < \epsilon \; \; \text {for all} \; i > R_{1}, \text { and } y_{j} < \epsilon \; \; \text {for all} \; j > R_{2}. \end{aligned}$$

We further investigate the behavior of \(x_{k}\) and \(y_{k}\) to compare the minimal iteration required, namely \(R_{1}\) and \(R_{2}\):

$$\begin{aligned} x_{R_{1}} &= \left\Vert \textbf{A}^{-1} \left( \textbf{I} - \frac{1}{\lambda _{1}}\textbf{AA}^{T}\right) ^{2^{R_{1}}}\right\Vert < \epsilon , \\ y_{R_{2}} &= \left\Vert \textbf{A}^{-1} \left( \textbf{I} - \frac{1}{\lambda _{1}}\textbf{AA}^{T}\right) ^{2^{R_{2}}}\right\Vert < \epsilon . \end{aligned}$$

It is readily evident that, for a given \(\epsilon \) value, \(R_{1}\) is equal to \(R_{2}\).

Our proposed method, despite relying on the trace instead of the dominant eigenvalue when compared to Newton’s method, demonstrates an equivalent convergence rate. The proof for this is similar to Theorem 3.

5.3 Depth Comparison

From Theorem 3, we confirm that our method converges at the same rate as Newton’s method. This implies that our method uses less multiplicative depth for the matrix inverse operation.

Specifically, let \(t_{div}\) denote the number of iterations required for the division algorithm. Moreover, let \(\textbf{X}_k\) and \(\textbf{Y}_k\) represent the sequences of the two algorithms above, and assume that \(\textbf{X}_k\) and \(\textbf{Y}_k\) converge after \(R_1\) and \(R_2\) iterations, respectively. Then, the total number of multiplications required for \(\textbf{X}_k\) is \(2t_{div} + n^3 + n^2 + 2n^{3}R_1\), and the total number required for \(\textbf{Y}_k\) is \(2t_{div} + 2n^2 + 2n^{3}R_2\). Since the division algorithm requires the same number of multiplications in both cases, we compare only the remaining terms. Hence, \(\textbf{Y}_k\) requires almost the same number of multiplications, since \(R_1 = R_2\).

For the depth comparison, we analyze the sequence equation \(\textbf{X}_{k+1} = \textbf{X}_{k}(2\textbf{I}-\textbf{AX}_{k})\) in Theorem 1 and the sequence equation \(\textbf{Y}_{k+1} = \textbf{Y}_{k}(\textbf{I}+\mathbf {\bar{A}}^{2^{k}})\) in Theorem 2. First, assuming the depth level of the input matrix \(\textbf{A}\) is L and the level of \(\textbf{X}_0\) is \(L-5\), we observe that \(\textbf{X}_1\) is computed by multiplying \(\textbf{A}\) and \(\textbf{X}_0\), subtracting the result from \(2\textbf{I}\), and then multiplying by \(\textbf{X}_0\) again. Considering only the multiplication operations (since addition and subtraction do not affect the level), the level of \(\textbf{A}\textbf{X}_0\) becomes \(L-6\), and after another multiplication with \(\textbf{X}_0\), the resulting matrix \(\textbf{X}_1\) has level \(L-7\). Following this pattern, \(\textbf{X}_2\) has level \(L-9\), \(\textbf{X}_3\) has level \(L-11\), and so on. Since the level difference between \(\textbf{X}_k\) and \(\textbf{X}_{k+1}\) (\(k\ge 0\)) is 2, we conclude that Newton's method consumes 2 depths per iteration.

Next, assuming the level of \(\textbf{Y}_0\) is L, then \(\mathbf {\bar{A}}\) has a \(L-1\) level. \(\textbf{Y}_1\) is computed by adding \(\mathbf {\bar{A}}\) and \(\textbf{I}\) and then multiplying it by \(\textbf{Y}_0\), resulting in a \(L-2\) level. \(\mathbf {\bar{A}}^2\) is the square of \(\mathbf {\bar{A}}\), which has \(L-2\) level. \(\textbf{Y}_2\) is the result of multiplying \(\textbf{Y}_1\) and \(\mathbf {\bar{A}}^2\), which makes its level \(L-3\). This pattern continues, and we can observe that \(\textbf{Y}_3\) has a \(L-4\) level, \(\textbf{Y}_4\) has a \(L-5\) level, and so on. The level difference between \(\textbf{Y}_k\) and \(\textbf{Y}_{k+1}\) (\(k\ge 1\)) is always 1. Therefore, our method consumes 1 depth per iteration.

Based on this observation, the total depth required for \(\textbf{X}_k\) is \(t_{div}+ 2 + 2R_1\) and the total depth required for \(\textbf{Y}_k\) is \(t_{div} + 3 + R_2\). Since \(R_1 = R_2\), and assuming \(R_1 = R_2 \ge 2\), our method obtains the inverse matrix with less depth than Newton's method.
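The level bookkeeping above can be checked with a small simulation. The sketch below applies the rule that a multiplication drops the result to one below the lower operand level, and ignores the depth spent on initialization (\(t_{div}\) and the computation of \(\alpha \)); the values of L and r are arbitrary examples.

```python
def mul(a: int, b: int) -> int:
    """Level after multiplying ciphertexts at levels a and b."""
    return min(a, b) - 1

def newton_depth(L: int, r: int) -> int:
    A, X = L, L                 # levels of A and X_0 (initialization ignored)
    for _ in range(r):
        AX = mul(A, X)          # A @ X_k
        X = mul(X, AX)          # X_k @ (2I - A X_k)
    return L - X                # 2 levels consumed per iteration

def ours_depth(L: int, r: int) -> int:
    Y, P = L, L - 1             # Y_0 and Abar = I - alpha * A A^T
    for _ in range(r):
        Y = mul(Y, P)           # Y_k @ (I + Abar^(2^k))
        P = mul(P, P)           # square Abar independently
    return L - Y                # roughly 1 level consumed per iteration

print(newton_depth(50, 16), ours_depth(50, 16))   # 32 vs 17
```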

6 Experiment

In this section, we conduct a comparative analysis to evaluate the performance of the proposed algorithm and Newton’s method when applied to invertible matrices in both the plain and encrypted domains. The evaluation focuses on two critical metrics: circuit depth and iteration number. Subsequently, the proposed algorithm is applied to linear regression and LDA in the encrypted domain to validate its computational efficiency.

6.1 Experiment Setting

Environment. In our cryptographic experiments, we employed the OpenFHE [2] library for implementing the CKKS scheme. All experiments were run on a system with an Intel Core i9-9900K CPU @ 3.60 GHz \(\times \) 16 and 62.7 GiB RAM, running Ubuntu 20.04.4 LTS.

CKKS Scheme Setting. We employed a 128-bit security level for all CKKS implementations. The other encryption parameters, including the ring dimension N, scaling factor \(\varDelta \), and circuit depth D, were pre-determined to perform the inverse matrix operations or machine learning algorithms. Furthermore, we exclusively used a leveled approach and avoided the use of bootstrapping during the evaluation of homomorphic circuits.

6.2 Invertible Matrix and Machine Learning

Fig. 1. The distribution of iteration numbers required for convergence to the inverse matrix across various dimensions for two algorithms—ours and Newton's method.

Iteration Number Distribution. Figure 1 shows the distributions of the iteration numbers for our proposed algorithm and Newton's method. We conducted 100 experiments for matrix dimensions of \(10, 20, \dots , 50\) and depicted their distributions using box plots. We randomly generated square invertible matrices of varying sizes, with the smallest eigenvalue greater than \(10^{-7}\) to avoid being treated as zero. We recorded the iteration number at which convergence was achieved, with \(\epsilon \) set to 0.001, and compared the approaches in the plain domain using Matlab R2022b. The results show that our proposed algorithm converges identically to Newton's method regardless of dimension.

Time and Memory w.r.t. Circuit Depth. The reduction in depth has a significant impact on both the multiplication time and the memory size of the ciphertext and keys in the encrypted domain. For example, in the CKKS scheme, with the same parameter set (\(\lambda \), N, \(\varDelta \)), the multiplication time increases proportionally with the depth D of the circuit (see Table 1). Specifically, the multiplication time for \(D=1\) is 0.037 s, while it is 0.649 s for \(D=50\); the latter is approximately 17 times greater.

Table 1. Impact of circuit depth on the multiplication and key generation time in the CKKS encryption scheme with fixed encryption parameters (\(\lambda \), N, \(\varDelta \)).

Additionally, in the leveled-CKKS scheme, the size of the ciphertext and key are linearly determined by the circuit depth. This is due to the fact that the CKKS scheme uses rescaling (or similarly modulus-reduction in other schemes) procedure, which reduces the ciphertext size (modulus) after multiplication. Consequently, a larger initial ciphertext size is necessary to accommodate the entire circuit multiplications. Therefore, the depth of the circuit is a crucial factor that determines both the time performance and memory capacity in leveled encryption schemes.

Comparison of Implementation in Encrypted Domain: Time and Depth. We compare our proposed algorithm with Newton's method for a randomly generated square matrix of size 5 with regard to the error at specific iterations, under varying circuit depths (see Table 2), in the encrypted domain. We use the same set of parameters (\(\lambda \), N, \(\varDelta \)) as in Table 1 and measure the error of the approximated inverse matrix using the spectral norm. For convergence of the approximated inverse, we set \(\epsilon = 0.001\).

Table 2. Evaluation of our approach and Newton’s method in the encrypted domain based on iteration number, circuit depth, and error (both use trace instead of dominant eigenvalue).

The results indicate that our algorithm converges at iteration number 16, which can be efficiently implemented with a circuit depth of \(D=27\). In contrast, the Newton’s method converges at the same iteration number 16; however, it requires a circuit depth of \(D=43\).

Therefore, we conclude that our proposed algorithm has the same convergence speed as the Newton’s method in the encrypted domain. However, as our algorithm can be implemented with a smaller circuit depth, its total execution time is about 596 s, whereas the Newton’s method’s execution time is about 1,256 s, making our method 2.1 times faster.

Table 3. Comparison of our proposed approach and Newton’s method in performing ML algorithms—linear regression and LDA (both use trace instead of dominant eigenvalue).

Application to ML Algorithms. We demonstrate the efficiency of our approach through two popular ML algorithms, linear regression and LDA, that utilize a positive definite matrix as input to evaluate its inverse. We compare the efficiency of our method with the Newton’s algorithm in terms of circuit depth and time performance in the encrypted domain; we show that our algorithm significantly enhances the overall performance.

For our evaluation of linear regression in the encrypted domain, we employed 100 samples with 8 features from the well-known public “Diabetes dataset”. We used the same encryption parameters \(\lambda , N, \varDelta \) and set \(\epsilon = 0.001\) for the convergence of the matrix inverse operation. The linear regression of the dataset requires the inverse of an \(8 \times 8\) square matrix. Our method and Newton's method both required 22 iterations (see Table 3). However, our method requires less depth per iteration than Newton's method. This results in an optimal circuit depth of 37 for our method, compared to 58 for Newton's method.

Initially, we conducted an experiment using the same circuit depth of 58 for both our method and Newton's method. Our approach closely resembles Newton's method in terms of the number of iterations required for convergence. However, it is noteworthy that as the depth level of a ciphertext decreases, its modulus decreases as well, resulting in faster multiplication. In contrast to our method, which consumes only one depth per iteration, Newton's method consumes two depths per iteration. Consequently, even when performing the same number of operations, the multiplications on ciphertexts with relatively lower depth levels in Newton's method take less time than in our approach. This phenomenon reduces the total execution time of Newton's method by 1568.81 s compared to that of our method. However, our method can still perform an additional 21 multiplications after obtaining the inverse matrix, whereas with Newton's method no further multiplication is feasible once the inverse matrix is obtained.

Subsequently, we measured the execution time of our approach with its optimal circuit depth of 37. Our approach ran approximately 1.8 times faster than Newton's method. It is important to note that Newton's method cannot be implemented with a depth of 37; a minimum circuit depth of 58 is required to ensure the correctness of the result.

In the evaluation of LDA, we used a subset of 150 samples from the Iris flower dataset, which consists of 4 features and 3 species. With the same setting as in the linear regression experiment, the LDA algorithm must compute the inverse of a \(4 \times 4\) matrix. Our method and Newton's method both required 9 iterations; hence, the total depths required for constructing the optimal circuit were 28 and 36, respectively. The evaluation time of the optimal circuit was approximately 1481.71 s for our method and 1884.46 s for Newton's method, a 1.27 times improvement in time performance for our proposed algorithm.

7 Conclusion

This paper presents a novel iterative matrix inverse algorithm that reduces the multiplicative depth compared to the widely used Newton's method in the homomorphic encryption domain. Our algorithm offers a significant improvement in computational time, roughly a factor of two, and is advantageous for machine learning algorithms that require matrix inverses.