
1 Introduction

As the era of the Internet of Things approaches, smart sensing devices are deployed everywhere, and face recognition systems are widely applied in areas such as law enforcement, payment, transportation, and access control. Compared with traditional identity authentication mechanisms, face recognition credentials cannot be forgotten or damaged and are convenient to use, effectively improving the efficiency of authentication. A common attack on face recognition targets the facial feature template library: for instance, an attacker who obtains the stored templates can reconstruct the original face image to bypass authentication, or infer personal attributes such as age and gender.

Since facial data contain sensitive information, uploading them to the cloud carries a risk of leakage. An effective countermeasure is to encrypt the data before uploading, which prevents leaks but hinders feature matching in the ciphertext domain. Conventional encryption algorithms protect data but do not allow computation on ciphertexts, so finding a way to perform computations on encrypted data has become a hot topic in academia.

Enabling computation on encrypted data was a long-standing challenge until the advent of Fully Homomorphic Encryption (FHE), which allows computations to be carried out directly on ciphertexts. Since the first FHE construction, many scholars have developed a variety of FHE schemes, enriching the field of encrypted computation and providing the theoretical basis for face recognition in the ciphertext domain.

Existing homomorphic-encryption-based face recognition algorithms are extremely time-consuming. This paper therefore focuses on the challenges of face recognition in the ciphertext domain, namely low efficiency and massive computational overhead, and proposes a data stream computing framework for face privacy protection. Face data are encrypted with the CKKS homomorphic encryption scheme and processed as data streams by Flink, a stream processing framework that offers real-time processing, multi-threading, and stateful computation; the stream processing is divided into two stages, Map and Reduce. The proposed framework protects the facial ciphertexts stored in the database, performs matching calculations under approximate homomorphic encryption, and uses Flink stream processing to improve the efficiency of facial ciphertext computation. This approach ensures both the security of facial data and the efficiency and practicality of the system.

2 Related Work

2.1 Homomorphic Encryption

Homomorphic encryption is one of the important techniques for privacy protection. The idea behind homomorphic encryption is to perform encryption first and then perform encrypted calculations based on the ciphertext data. The result of the encrypted computation, when decrypted, will match the result of the same computation on the original unencrypted data.

In 2017, Cheon et al. proposed the CKKS (Cheon-Kim-Kim-Song) homomorphic encryption scheme [1]. It is an approximate homomorphic encryption scheme that supports floating-point number calculations and has high efficiency. Currently, it is one of the most widely used homomorphic encryption schemes [2].

As the theory of homomorphic encryption continues to evolve, this technology has been widely applied in various fields [2,3,4,5,6], such as machine learning [7, 8], secure multi-party computation [9,10,11], federated learning [12, 13], and cloud computing [14,15,16]. However, the high computational overhead and low efficiency of homomorphic encryption privacy protection schemes have been major bottlenecks restricting the development of homomorphic encryption technology. In recent years, many experts and scholars have been dedicated to improving the efficiency of homomorphic encryption technology [17,18,19,20,21]. Parallel computing techniques have been widely used to enhance the execution efficiency of homomorphic encryption schemes, such as ciphertext packing techniques [22], batch processing, and single instruction multiple data (SIMD) techniques, among others.

2.2 Flink Stream Processing Engine

Apache Flink is a distributed processing engine primarily designed to handle streaming data with support for stateful computations. In the Flink framework, all data is treated as streams: batch data corresponds to bounded streams and real-time data to unbounded streams. Flink is therefore a unified big data engine capable of both stream and batch processing. A key feature of Flink is that it is event-driven: it can receive data from multiple sources as streams and triggers the corresponding operations only when data arrives; when no data is available, no processing is performed.

2.3 FaceRecognition

FaceRecognition is an open-source face recognition library based on the Python programming language. It utilizes a simple API to perform face detection and recognition. The library offers a range of powerful functionalities, including face alignment, feature extraction, and more, making it suitable for various applications such as face recognition and face verification.
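As a quick illustration, the sketch below (with a hypothetical image path) shows the typical face_recognition calls for detecting a face and extracting the 128-dimensional feature encoding that the rest of this paper operates on.

```python
import face_recognition

# load an image and locate the faces it contains (hypothetical file name)
image = face_recognition.load_image_file("person.jpg")
locations = face_recognition.face_locations(image)

# one 128-dimensional encoding per detected face
encodings = face_recognition.face_encodings(image, known_face_locations=locations)
print(len(encodings), len(encodings[0]))  # e.g. "1 128"
```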

3 Methodology

3.1 Methodology Overview

In this paper, the features extracted from facial images are first encrypted and stored in a cloud-based database. During face identity verification, the features of the current face are extracted and encrypted. The feature matching phase uses Flink for data stream processing and consists of two sub-stages: the Map operator computes, for each sample, the element-wise differences between the encrypted feature streams and squares them; the Reduce operator aggregates the results of all data streams and computes their sum. Finally, the encrypted comparison score between the two faces is returned.

This scheme aims to protect the private information of users stored in the template database by performing approximate homomorphic encryption for matching calculations. Simultaneously, it utilizes Flink for data stream processing to improve computational efficiency. The approach ensures both data security and efficiency, providing a practical and effective solution.

System Model: In privacy-preserving machine learning algorithms, one of the most common computations is matrix multiplication. A matrix product can be decomposed into vector inner products, and a vector inner product is the multiplication of corresponding elements of two vectors followed by their summation. The element-wise multiplications are independent of each other, so they can be parallelized to improve efficiency, and the partial results are then aggregated and summed. The framework is shown in Fig. 1, and a plaintext sketch of this pattern follows the figure.

Fig. 1. Framework
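To make the decomposition in Fig. 1 concrete, the following plaintext sketch splits an inner product into independent chunk-wise products that can run in parallel and then aggregates the partial sums; the chunk size and pool size are illustrative choices, not values prescribed by the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_dot(chunk):
    """Element-wise multiply one chunk of x and y and sum the products."""
    xs, ys = chunk
    return sum(a * b for a, b in zip(xs, ys))

def parallel_inner_product(x, y, parts=4):
    size = (len(x) + parts - 1) // parts
    chunks = [(x[i:i + size], y[i:i + size]) for i in range(0, len(x), size)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(chunk_dot, chunks))  # aggregate the partial results

print(parallel_inner_product([1, 2, 3, 4], [5, 6, 7, 8]))  # -> 70
```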

3.2 Feature Encryption

To prevent malicious attacks on the template database, which could lead to obtaining real facial images and their corresponding feature attributes, it is necessary to protect the stored feature templates in the cloud. Therefore, the proposed scheme suggests encrypting the successfully extracted feature templates using approximate homomorphic encryption and then sending the encrypted feature templates to the cloud. The key generation and encryption process for approximate homomorphic encryption are as follows:

Key Generation: The SEAL library receives the locally transmitted parameters parms and builds the encryption parameter container. The CKKS context is then created from these parameters: SEALContext context(parms). The key list is generated locally. After key generation, the secret key used for decryption is saved by the local client by calling the HE.save_key function of the homomorphic encryption module.

The public key, relinearization key, and rotation keys are sent to the cloud for the subsequent homomorphic computation. The relinearization key is used to shrink ciphertexts back to their normal size after homomorphic multiplication, and the rotation keys are required for the Euclidean distance calculation. The algorithm flow can be seen in Algorithm 1.

Algorithm 1
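The paper's HE module is built on SEAL; as a minimal, non-authoritative sketch of the same steps, the TenSEAL Python bindings can generate a CKKS context together with the relinearization and rotation (Galois) keys, keep the secret key locally, and export a public copy for the cloud. The parameter values and file name below are illustrative assumptions.

```python
import tenseal as ts

# CKKS context with illustrative parameters (not the paper's exact settings)
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()  # rotation keys, needed for slot rotation/summation
context.generate_relin_keys()   # relinearization keys, applied after multiplications

# the secret key stays with the client (analogous to HE.save_key)
with open("secret_context.bin", "wb") as f:
    f.write(context.serialize(save_secret_key=True))

# a public copy (without the secret key) is what gets sent to the cloud
public_context = context.copy()
public_context.make_context_public()
```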

Feature Encryption: After the facial feature values are obtained, they must be encrypted with the generated keys. The facial floating-point feature values are packed into a DoubleVector, and the encryption is performed with the HE.encrypt function of the homomorphic encryption module, which returns the ciphertext of the facial features. The local client then uploads the encrypted facial feature ciphertext to the cloud-based ciphertext database. The algorithm flow can be seen in Algorithm 2.

Algorithm 2
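A corresponding encryption step, again sketched with TenSEAL rather than the paper's own HE.encrypt, packs the feature values into a single CKKS ciphertext and serializes it for upload; the feature values below are placeholders, not real encodings.

```python
import tenseal as ts

context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

features = [0.12, -0.03, 0.27] + [0.0] * 125      # stands in for a real 128-d encoding
enc_features = ts.ckks_vector(context, features)  # ciphertext of the feature template
payload = enc_features.serialize()                # bytes stored in the cloud database
```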

3.3 Ciphertext Feature Stream Distributed Computation

The approximate homomorphic encryption used in this paper is based on polynomial arithmetic, so only additions and multiplications can be evaluated homomorphically. When the function to be computed contains operations outside this set, it must be approximately transformed into such operations or partially evaluated outside the ciphertext domain.

Since face_recognition measures the similarity between features by their Euclidean distance, the smaller the Euclidean distance between two facial feature vectors, the more likely the two faces belong to the same person. Equation (1) gives the Euclidean distance in n-dimensional space.

$$\begin{aligned} {\text {dist}}(\textbf{x},\textbf{y})=\sqrt{\sum _{i=1}^{n}(\textbf{x}_{i}-\textbf{y}_{i})^{2}} \end{aligned}$$
(1)

From the formula, all operations except the final square root are additions and multiplications, which CKKS supports directly. In this paper, the homomorphic computation is performed on the cloud side, the encrypted result is returned to the client, and the square root is then taken in the plaintext domain.
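As a hedged end-to-end sketch (TenSEAL once more standing in for the paper's module), the cloud-side computation of the squared Euclidean distance can be expressed as subtract, element-wise square, and slot summation, with only the final square root taken by the client after decryption. TenSEAL exposes a built-in sum over slots; the paper's module instead realizes the summation with rotations, as described next.

```python
import tenseal as ts

context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

x = ts.ckks_vector(context, [0.1, 0.5, 0.9, 0.3])  # stored encrypted template
y = ts.ckks_vector(context, [0.2, 0.4, 0.8, 0.3])  # encrypted probe features

diff = x - y
sq_dist = (diff * diff).sum()        # encrypted sum of squared differences
dist = sq_dist.decrypt()[0] ** 0.5   # client-side decryption and plaintext sqrt
print(dist)                          # approximately 0.1732
```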

The homomorphic encryption module currently does not directly support summation over the slots of a ciphertext. Therefore, the ciphertext is repeatedly rotated and added so that the total sum accumulates in the first slot.
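This rotate-and-add pattern is illustrated below in plaintext with NumPy, where np.roll stands in for the homomorphic rotation enabled by the rotation keys; after log2(m) rounds the first slot holds the total, assuming the slot count is a power of two (as for 128-dimensional features).

```python
import numpy as np

def slot_sum(slots):
    """Fold all slot values into slot 0 by repeated rotate-and-add."""
    acc = np.asarray(slots, dtype=float)
    step = 1
    while step < len(acc):
        acc = acc + np.roll(acc, -step)  # homomorphic analogue: add(acc, rotate(acc, step))
        step *= 2
    return acc[0]

print(slot_sum([1.0, 2.0, 3.0, 4.0]))  # -> 10.0
```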

The approach proposed in this paper is based on the Flink data stream processing engine. Similar to traditional big data frameworks like Hadoop and Spark, Flink’s parallel computing framework also supports the Map-Reduce paradigm. In Flink, data processing is done on data streams, where the Map operator performs operations on the data stream, and the Reduce operator aggregates the results of all data streams. Prior to the Reduce operation, a KeyBy operation is needed to split the data stream into different partitions and group data with the same key into the same partition.

In the proposed approach, we need to identify parallelizable steps in the ciphertext computation process. In the CKKS algorithm’s ciphertext computation, ciphertexts are mutually independent and do not interfere with each other. Therefore, we can follow the parallelization techniques used in plaintext data processing to design parallelized computation for ciphertext data. The algorithm flow can be seen in Algorithm 3.

Algorithm 3
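A minimal PyFlink sketch of the two-stage pipeline is shown below. The records are plaintext toy values standing in for ciphertext chunks, and the record layout and job name are assumptions for illustration rather than the paper's actual implementation.

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(2)

# each record: (user_id, stored_feature_value, probe_feature_value)
records = env.from_collection([
    ("alice", 0.11, 0.10), ("alice", 0.52, 0.50),
    ("bob",   0.90, 0.10), ("bob",   0.30, 0.35),
])

scores = (
    records
    .map(lambda r: (r[0], (r[1] - r[2]) ** 2))  # Map: squared difference per element
    .key_by(lambda r: r[0])                     # KeyBy: partition the stream by user id
    .reduce(lambda a, b: (a[0], a[1] + b[1]))   # Reduce: accumulate the squared differences
)

scores.print()  # per-user squared distance, before the client-side square root
env.execute("encrypted-feature-matching-sketch")
```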

3.4 Feature Matching

The local client receives from the cloud a dictionary of encrypted squared sums of the facial feature differences: the keys are user identifiers and the values are the corresponding encrypted squared Euclidean distances. The HE.decrypt function decrypts the values, and a plaintext square root operation is then performed locally, yielding the plaintext Euclidean distances. Using the plaintext distance threshold, the client selects the user whose facial features have the smallest distance below the threshold. The algorithm flow can be seen in Algorithm 4.

Algorithm 4
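The client-side step can be sketched as follows, with hypothetical decrypted values; the 0.6 threshold mirrors face_recognition's usual tolerance and is an assumption, not a value fixed by the paper.

```python
import math

def best_match(squared_distances, threshold=0.6):
    """Pick the user with the smallest Euclidean distance below the threshold."""
    distances = {uid: math.sqrt(v) for uid, v in squared_distances.items()}
    uid, dist = min(distances.items(), key=lambda kv: kv[1])
    return (uid, dist) if dist <= threshold else (None, dist)

# hypothetical decrypted squared distances returned by the cloud
print(best_match({"alice": 0.16, "bob": 0.81}))  # -> ('alice', 0.4)
```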

4 Algorithm Analysis

4.1 Security Analysis

We assume that all entities (clients and servers) in the system model are honest-but-curious: they faithfully execute the protocol but may try to learn other entities' data. The adversary is defined with the following capabilities: (1) it may eavesdrop on the transmission of facial feature data; (2) it may eavesdrop on and obtain intermediate results of facial feature template matching on the server, such as the squared Euclidean distance, and attempt to infer other private data from these intermediate results combined with its own facial data.

Regarding the facial feature templates, the client’s information is encrypted using homomorphic encryption before uploading it to the privacy service provider. The decryption key required for decryption can only be obtained by the client, and the server cannot decrypt the encrypted feature templates. This ensures that the server cannot access users’ facial information.

The squared Euclidean distance, an intermediate result obtained through homomorphic operations, also requires the decryption key, so the server cannot obtain this value either. Therefore, the client's data is secure against both the server and the adversary, and the only remaining concern is the security of the approximate homomorphic encryption scheme itself, which rests on the hardness of the RLWE problem.

4.2 Algorithm Performance Analysis

Serial Performance Analysis

We assume that the data stream contains N data points, each consisting of m features. After encryption, there are N ciphertexts, each ciphertext containing m ciphertext slots with data. The ciphertext computations are derived from two fundamental operations: homomorphic addition and homomorphic multiplication. As mentioned in the previous introduction to the CKKS algorithm, homomorphic addition and multiplication primarily involve additions and multiplications. In computer computations, the time consumed by addition and multiplication is roughly the same and denoted as \({T}_{m} \).

Homomorphic addition requires one addition operation and one modulo operation, while homomorphic multiplication requires 9 additions or multiplications and one modulo operation. Additionally, after each homomorphic multiplication, a re-scaling operation is needed, which requires 2 multiplications. The time consumed by the modulo operation is denoted as \({T}_{a} \).

In ciphertext computations, the time consumed by homomorphic addition or subtraction is denoted as \({T}_{m} + {T}_{a}\), and the time consumed by homomorphic multiplication is denoted as \(11{T}_{m} + {T}_{a}\). Homomorphic vector dot product is implemented through operations that traverse ciphertext slots. Assuming the time taken to traverse one ciphertext slot is denoted as \({T}_{c} \), the time consumed by homomorphic vector dot product is \((10m+2){T}_{m} + {T}_{a} + m{T}_{c}\).

In this scheme, calculating the Euclidean distance requires N homomorphic vector subtractions, N homomorphic vector dot products, and N homomorphic additions. The time taken for homomorphic vector subtraction is \(Nm({T}_{m} + {T}_{a})\), the time taken for homomorphic vector dot product is \(N[(10m+2){T}_{m} + {T}_{a} + m{T}_{c}]\), and the time taken for homomorphic addition is \(N{T}_{m} \).

When performed sequentially, the time consumed for calculating the Euclidean distance is denoted as \({T}_{serial} \), and it is equal to the sum of the times taken for vector subtraction, vector dot product, and addition: \({T}_{serial} = Nm({T}_{m} + {T}_{a})+ N[(10m+2){T}_{m} + {T}_{a} + m{T}_{c}] + N{T}_{m} = (11m+3)N{T}_{m}+(m+1)N{T}_{a}+Nm{T}_{c}\).
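As an illustration, for 128-dimensional feature vectors such as those produced by face_recognition (m = 128), the serial time evaluates to:

$$\begin{aligned} {T}_{serial} = 1411\,N{T}_{m} + 129\,N{T}_{a} + 128\,N{T}_{c} \end{aligned}$$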

Parallel Performance Analysis

Assuming parallelism, the data stream is divided into s sub-data streams and assigned to s Map operators for ciphertext computations. Each sub-data stream contains k data points, where \(k=N/s\). Assuming the cluster has d nodes and, on average, each node can process v sub-data streams, then \(s=dv\).

The parallel ciphertext computation process can be divided into two parts: Map and Reduce. In the Map phase, data stream computations and communication between Maps are performed. In the Reduce phase, node scheduling and merging/sorting of sub-data streams are carried out. Let the time consumed by a single Map be \({M}_{0} \), and the total time for the Map phase be M. Then:

$$\begin{aligned} M &= v{M}_{0}\\ M_{0}^{main} &= (11m+3)k{T}_{m}+(m+1)k{T}_{a}+km{T}_{c} \end{aligned}$$
(2)

When there are s Maps working in the cluster, there will be at least 2s communication events. Let the communication time be \(T_{f}=\delta s T_{m} \). The speedup of the algorithm in the Map phase can be calculated as follows:

$$\begin{aligned} \frac{T_{main}}{T_{Map}} &\approx \frac{T_{main}}{vM_{0}^{main}+T_{f}}\\ &=\frac{(11m+3)NT_{m}+(m+1)NT_{a}+NmT_{c}}{v\left[(11m+3)kT_{m}+(m+1)kT_{a}+kmT_{c}\right]+\delta sT_{m}}\\ &=\frac{1}{\dfrac{kv}{N}+\dfrac{\delta sT_{m}}{(11m+3)NT_{m}+(m+1)NT_{a}+NmT_{c}}}\\ &=\frac{d}{1+\dfrac{d\delta sT_{m}}{(11m+3)NT_{m}+(m+1)NT_{a}+NmT_{c}}}\approx d \end{aligned}$$
(3)

In practical environments, N is usually much larger than the number of cluster nodes and the number of sub-data streams, and the communication time between nodes is almost negligible. Therefore, the speedup in the Map phase can approach the theoretical value of d.

In the Reduce phase, the main time-consuming tasks are the integration and sorting of the computation results from the Map nodes and the communication among nodes. Let the communication time in this process be \(T_{fr}=\delta_{1}sT_{m}\). Additionally, using a sorting algorithm with complexity \(O(s\log s)\), the sorting time is denoted as \(S=\varepsilon s\log s\,T_{m}\).

The overall speedup in the parallel computation process can be calculated as follows:

$$\begin{aligned} \frac{T_{main}}{T_{P}}=\frac{T_{main}}{vM_{0}^{main}+T_{f}+S+T_{fr}} \approx \frac{T_{main}}{vM_{0}^{main}+T_{f}} \approx \frac{T_{main}}{T_{Map}}\approx d \end{aligned}$$
(4)

In the experimental environment of this paper, the average computation time for the Map phase is 0.5121 s, while the average computation time for the Reduce phase is 0.6163 s. The Reduce phase’s computation time is 1.204 times that of the Map phase.

$$\begin{aligned} S+T_{fr}\approx 1.2\,(vM_{0}^{main}+T_{f}) \end{aligned}$$
(5)

The overall speedup is given by:

$$\begin{aligned} \frac{T_{main}}{T_{P}}=\frac{T_{main}}{vM_{0}^{main}+T_{f}+S+T_{fr}} \approx \frac{T_{main}}{vM_{0}^{main}+T_{f}+1.2\,(vM_{0}^{main}+T_{f})}\approx \frac{1}{2.2}d \end{aligned}$$
(6)

Based on the above analysis, it can be concluded that, due to the relatively large computation time in the Reduce phase, the overall speedup will be approximately 1/2.2 times the number of cluster nodes.

5 Experiments

Precision Comparison: In this experiment, the Labeled Faces in the Wild (LFW) dataset was used, which consists of 5,749 folders containing 13,233 images. The face verification task was performed using the face pair information provided in pairs.txt, which includes 3,000 matching pairs and 3,000 non-matching pairs. The goal was to compute the facial similarity and verify whether two face images belong to the same person. The CPU used in this experiment was an 11th Gen Intel Core i7-11800H @ 2.30 GHz (octa-core) with 32 GB of memory (Kingston DDR4 3200 MHz). The Flink cluster was set up on Ubuntu virtual machines, consisting of one JobManager and two TaskManagers.

The results are shown in Fig. 2.

Fig. 2. Face Similarity

We obtain the matching results and perform an error analysis of the similarity scores between the unencrypted and encrypted schemes. The error rate of the encryption scheme is calculated with the following formula:

$$\begin{aligned} error\_rate = \frac{\left|\,Unencrypted - Encrypted\,\right|}{Unencrypted} \times 100 \end{aligned}$$
(7)
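For illustration with made-up similarity scores (not values from the experiments), a plaintext score of 0.400000 and an encrypted-domain score of 0.400018 give:

$$\begin{aligned} error\_rate = \frac{\left|0.400000 - 0.400018\right|}{0.400000} \times 100 = 0.0045\% \end{aligned}$$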

Figure 3 shows the error rate over the 6,000 face verification pairs; the final accuracy is 98.05%, with an average error rate of 0.004618455%. The face verification scheme based on homomorphic encryption therefore produces similarity values that are close to those obtained in plaintext. Furthermore, compared with the accuracy results from [23], as shown in Table 1, our method significantly improves accuracy and demonstrates higher efficiency.

Further experimental comparison of unencrypted and encrypted data is shown in Fig. 4. From the ROC curves of the False Acceptance Rate (FAR) and True Acceptance Rate (TAR), the ROC value is 98.3401% for both the unencrypted and encrypted schemes. The parallel processing of homomorphic encryption hardly affects the recognition accuracy, which is consistent with the error analysis results.

Fig. 3. Face Feature Error Rate

Table 1. Accuracy Comparison

The smaller the False Acceptance Rate (FAR) and the False Rejection Rate (FRR), the better the performance; however, lowering one metric can raise the other. From the Detection Error Tradeoff (DET) curve of FAR versus FRR, the EER (Equal Error Rate) of the encrypted scheme is 0.02567. A curve closer to the lower-left corner indicates better performance, meaning both error rates remain small. The results are shown in Fig. 5.

Efficiency Comparison: This paper conducts serial and parallel tests of encrypted face recognition on a test dataset containing 10,000 pairs of encrypted face images. The parallelism of the cluster is adjusted by controlling the number of available task slots and the task parallelism. The experiment records the running time of the algorithm in serial mode and with different degrees of parallelism in the cluster. Using the serial time as the baseline, the speedup for each degree of parallelism is calculated, and the experimental results are analyzed. The results are shown in Table 2 and Fig. 6.

Fig. 4. ROC Curve for Encrypted and Unencrypted Data

Fig. 5. DET Curve for Encrypted and Unencrypted Data

Based on the analysis of Table 2 and Fig. 6, the following conclusions can be drawn regarding the increase in available slots and parallelism of the Flink cluster:

(1) The overall execution time of the ciphertext computation gradually decreases as parallelism increases. In particular, before the maximum number of available slots in the cluster is reached, the execution time drops significantly, yielding a notable improvement in algorithm performance. Once the parallelism exceeds the maximum available slots, all nodes are already fully utilized, further increases in parallelism have little effect on cluster performance, and the execution time gradually stabilizes.

(2) As the size of the Flink cluster increases, it can theoretically handle larger data volumes and more complex computational tasks. In practice, however, the efficiency improvement does not always scale directly with the cluster size. This is mainly due to the following factors:

    • Data Partitioning: For some datasets, it may be challenging to distribute them evenly among all nodes in the Flink cluster. This leads to some nodes being overloaded while others remain idle, which decreases the overall efficiency of the cluster.

    • Network Communication: With the growth of the cluster, the amount of data that needs to be transferred between nodes also increases. If the network bandwidth is insufficient or there are high network latencies, it can slow down the communication between nodes, thus affecting the overall efficiency of the cluster.

    • Hardware Limitations: Before increasing the size of the Flink cluster, it is essential to ensure that each node has sufficient hardware resources such as memory, CPU, and disk space. Otherwise, adding more nodes may result in wastage of resources without a corresponding increase in efficiency.

(3) Due to the significant time consumption in the Reduce stage, the overall speedup ratio decreases. The highest overall speedup is 4.8, which is close to the theoretical value 9/2.2 ≈ 4.09 from the earlier analysis. Moreover, compared with [23], the average encrypted matching time per pair of images is reduced from 0.71 s to 63.5 ms, greatly improving efficiency.

Feasibility Analysis: Based on the precision comparison experiments conducted on encrypted data, it can be concluded that the average error rate of the face recognition algorithm in the current encrypted domain is approximately 0.004618455%, which ensures high accuracy in practical application scenarios.

Table 2. Runtime and Speedup at Different Parallelism Levels
Fig. 6. Runtime and Speedup Curves

Furthermore, based on the efficiency comparison experiments, in the experimental environment with nine slots, retrieving 10,000 facial images at the highest parallelism took approximately 635,354 ms, i.e., an average retrieval time of 63.5 ms per image. According to the conclusions of the parallel performance analysis, increasing the number of slots to 54 would raise the speedup to approximately 24.5, reducing the per-image retrieval time to around 10.58 ms. If an encrypted clustering search algorithm is applied as well, the scheme can effectively meet the requirements of daily applications.

6 Conclusion

This paper proposes a parallel privacy-preserving homomorphic encryption face verification scheme to address the issues of face information leakage and low computational efficiency in traditional face recognition. The scheme reconstructs the feature template matching protocol based on the homomorphic encryption module to achieve feature vector matching in the ciphertext domain. Security analysis and experimental comparisons demonstrate that the proposed scheme can achieve vector feature template matching while ensuring the security of the feature templates.

Leveraging the parallel computing concept of Map-Reduce, the scheme combines the stream data processing engine Flink with the homomorphic encryption algorithm CKKS to accelerate feature value matching in the ciphertext domain. Through algorithm performance analysis and experimental results, the acceleration ratio and accuracy of the algorithm are verified.

In conclusion, the proposed approach achieves efficient and accurate face verification while ensuring the security of data information during transmission and storage in the cloud.