1 Introduction

Face recognition [12, 25, 46] is an important task in various applications. A facial image is personal information, and so we need to consider privacy when we include face recognition in these applications. The European Union has enforced the General Data Protection Regulation, which requires the protection of personal information. In addition, the regulation of privacy protection may expand worldwide in the future. We must therefore pay close attention to the protection of privacy when using facial images. However, it is difficult to successfully realize both privacy protection and face recognition.

Nodari et al. [29] proposed a method of decreasing the visibility of a facial region with a mosaic, in Google Street View. Padilla-Lopez et al. [30] proposed a method of replacing a facial image with a public image, to avoid privacy concerns. Fernande et al. [13] proposed a blurring method for a self-moving robot. Thorpe et al. [44] proposed a method using two types of blurred images, which differ depending on whether they are public or private images. These methods focus on image processing after an image is captured, and must overcome the serious problem of storing the facial image securely, because stored facial images can be leaked, and privacy can be violated through hacking.

To protect privacy, an image should be captured with enhanced security for personal information. As examples, an event camera records only the change in brightness of pixels for each frame [15], and thermal-image-based recognition also records information without detailed personal information [14]. Browarek et al. [4] proposed a method for human detection that uses an infrared sensor. Dai et al. [11] proposed a privacy protection method that uses a low-resolution camera. However, although these methods preserve privacy, they have difficulty in recognizing faces with high accuracy.

Computational photography [33] is another perspective on privacy protection that is worth considering. As an example of such technologies, Cossalter et al. [10] proposed a method that protects privacy by random projection based on compressive sensing. It captures an optically blurred image using a coded-aperture camera [6, 17, 19, 23, 24, 26, 41, 45], in which a coded mask is arranged in front of the aperture. Pittaluga et al. [31, 32] proposed a method of shooting an optically blurred image with a multi-aperture mask using three-dimensionally printed optics. However, its effectiveness in recognition technology is not clear because image recognition is not evaluated quantitatively. Wang et al. [48] proposed an action recognition method that protects privacy using a coded aperture camera. However, the recognition accuracy is low because it is insufficient as a feature extraction method for improving the accuracy of image recognition. Canh et al. [5] proposed a method in which it is possible to select whether to reconstruct the area excluding the face area or including the face area from the blurred image; however, they do not describe the application that uses the reconstructed image.

Fig. 1.
Training steps of the proposed face recognition method. The training of the proposed method comprises three steps. First, we pretrain the face feature extraction network, using blurred images and unblurred images, for the extraction of face features. Second, we pretrain the face recognition network using unblurred images. Finally, we fine-tune the privacy-aware face recognition network

It is difficult to identify an individual from a blurred image captured by a coded-aperture camera. However, a reconstructed image can be obtained from the blurred image if the code pattern of the mask is known. Inspired by this technology, we propose a multi-pinhole camera (MPC) with a mask that has multiple pinholes. We also propose a face recognition method that achieves accuracy comparable to that obtained with unblurred images even when we use privacy-preserving blurred images. In this paper, an image captured with a normal camera is called an unblurred image, and an image captured with an MPC, in which privacy is protected, is called a blurred image. In general, the features of a blurred image are ineffective for face recognition, and reconstruction methods are employed in preprocessing to obtain effective features. ConvNet-based methods reconstruct high-quality facial images [8, 9, 16, 21, 28, 35, 37, 39, 42, 43, 47, 50, 51]. However, these methods require substantial memory and have a high computational cost.

As an alternative approach, Xu et al. [49] and Ren et al. [36] proposed restoration methods for slight blur using shallow neural networks. These methods are based on the fact that a blurred image is formed by convolving an unblurred image with a point spread function (PSF). They reduce the computational cost and suppress blur in the blurred image, even when face reconstruction is difficult. Because a shallow neural network cannot fully reconstruct the face, these methods are suitable for face feature extraction while privacy remains protected.

Face recognition and privacy protection are inseparable. It is difficult to protect privacy when effective features can be extracted from an unblurred image. Conversely, face recognition accuracy decreases when the image is blurred to protect privacy, because blur remains in the restored face features. To resolve this dilemma, we focus on extracting effective features rather than reconstructing the face image. This makes it possible to recognize a face even in an extremely blurred image.

We propose a method of improving the face recognition accuracy in a privacy-protected image to solve the above dilemma. Our method involves capturing an extremely blurred facial image using a lensless MPC, for privacy protection. The training of the proposed method comprises three steps, as shown in Figure 1. First, we pretrain the face feature extraction network (FFEN), which extracts face features from a blurred image. Second, we pretrain the face recognition network (FRN) using unblurred images. Finally, we train a privacy-aware FRN, which is the connected FFEN and FRN, as one network, in an end-to-end manner.

The contributions of the paper are as follows.

  • New attempts are made to achieve both privacy protection and high recognition accuracy.

  • Effective features are extracted from a restored facial image captured by a lensless MPC.

  • High facial recognition accuracy is achieved even if the facial image is extremely blurred.

2 Lensless Multi-pinhole Camera

The MPC captures a blurred image such that privacy is protected, even if a raw image stored in memory is leaked. Conventional coded apertures are intended to record light rays. Their image is therefore only slightly blurred, and privacy protection is not possible. In contrast, in the MPC, a coded mask with multiple pinholes is arranged in front of the aperture. Multiple blurs can be superimposed because light rays from the same object are incident on each pinhole. It is difficult to identify the individual in such a multiple-blurred facial image.

Fig. 2.
Proposed MPC: (a) a lensless MPC, (b) an enlarged image of the coded mask captured with a microscope, and (c) a PSF of the coded mask (b)

Table 1. Specifications of the coded mask

2.1 Design of the Proposed Lensless Multi-pinhole Camera

Various methods have been proposed to reconstruct deblurred images from images captured by a lensless camera [2, 20, 22, 27, 40]. These methods have a high cost because the imaging systems must be changed substantially. In contrast, the setup of the lensless MPC only requires a coded mask to be attached in front of the aperture. Therefore, the lensless MPC is more versatile and practical. Figure 2 shows our lensless MPC and coded mask. We employ an FLIR Blackfly BFLY-U3-23S6C-C for the body of the lensless MPC, as shown in Fig. 2(a). Figure 2(b) is an example of an enlarged image of the coded mask captured with a microscope. Figure 2(c) is the PSF corresponding to the coded mask in Fig. 2(b). The blurred image is a spatial convolution of an unblurred image and the PSF. Therefore, if the PSF is unknown, it is difficult to reconstruct the unblurred image from the blurred image. Conversely, if the PSF is known in advance, as it is for our system, the unblurred image can be reconstructed easily by deconvolving the blurred image with the PSF. Because the PSF is different for each camera, a hacker would need to steal the camera to discover the PSF. The hacker would then need to reinstall the camera in its original location after measuring the PSF. Therefore, in an actual scene, the risk of PSF leakage is lower than the risk of image leakage. Even if a blurred image is leaked from the network or data storage, the risk of image restoration is small.

Fig. 3.
Examples of simulated images convolved with a PSF: (a) an unblurred image, and images convolved with (b) PSFid:3–025, (c) PSFid:3–050, (d) PSFid:9–025, and (e) PSFid:9–050

2.2 PSF of the Proposed Camera

The PSF represents the response of an optical system to a point light source, in terms of the spread of spatial blur. If the PSF is known in advance, we can obtain the unblurred image by convolving the blurred image with the inverse PSF. The PSF can be measured before an image is captured.

We prepared four types of coded masks. The specifications of each coded mask are given in Table 1, including the number of pinholes and the distance from the center of the mask to the nearest pinhole. Figure 3 shows the measured PSF and an example of a blurred image, for each coded mask. The overlapping of objects increases with the number of pinholes. In the case where there are three pinholes and the distance from the center of the mask to each pinhole is 0.25 mm, the facial image is blurred, as shown in Fig. 3(b). It is difficult to recognize the individual in the image because parts of the face that exhibit individual characteristics, such as the eyes, nose, and mouth, are blurred. The facial image is extremely blurred when we combine nine pinholes with a distance from the center of the mask of 0.50 mm. With these settings, it is difficult even to recognize the image as that of a human face.
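The overlapping effect described above can be illustrated with a small simulation. The following is a minimal sketch, assuming an idealized PSF made of one impulse per pinhole and circular (FFT-based) convolution; the function names `make_pinhole_psf` and `blur` and the pixel offsets are hypothetical and do not correspond to the measured PSFs in Table 1.

```python
import numpy as np

def make_pinhole_psf(size, offsets):
    """Idealized multi-pinhole PSF: one impulse per pinhole, normalized to sum to 1."""
    psf = np.zeros((size, size))
    c = size // 2
    for dy, dx in offsets:          # offsets are (row, col) shifts from the PSF center
        psf[c + dy, c + dx] += 1.0
    return psf / psf.sum()

def blur(image, psf):
    """Circular convolution y = k * x, computed with the FFT."""
    h, w = psf.shape
    k = np.zeros(image.shape, dtype=float)
    k[:h, :w] = psf
    # Move the PSF center to index (0, 0) so the output is not shifted.
    k = np.roll(k, (-(h // 2), -(w // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(k)))

# Hypothetical layout: three pinholes at pixel offsets around the mask center.
psf3 = make_pinhole_psf(15, [(-5, 0), (4, -4), (4, 4)])

# A single bright point is copied once per pinhole, which is why faces overlap.
x = np.zeros((32, 32))
x[16, 16] = 1.0
y = blur(x, psf3)
print(np.argwhere(y > 1e-6))
```

In this idealization, adding pinholes adds shifted copies of the scene, and moving the pinholes away from the center increases the shift between copies.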

3 Face Recognition from Images Captured by the Lensless Multi-pinhole Camera

We propose a method of recognizing a face in a blurred image captured by the lensless MPC. As shown in Fig. 1, the proposed method adopts the FFEN and FRN. In the FFEN, we extract effective face features from the blurred image. We can employ a state-of-the-art FRN if we obtain features similar to the features of the unblurred image. The important aspect of the proposed method is that, to preserve privacy, we do not reconstruct the deblurred facial image explicitly. In the proposed method, we train both networks in an end-to-end manner to obtain suitable face features for the FRN. The training of the proposed method comprises three steps. First, we pretrain the FFEN, which extracts face features from a blurred image. Second, we pretrain the FRN using unblurred images. Finally, we train a privacy-aware FRN, which is the connected FFEN and FRN, as one network, in an end-to-end manner. To protect privacy, we use the unblurred images only for training.

Many reconstruction methods have been proposed, but they focus on reconstruction of the entire face. This approach fails to reconstruct detail in the facial region. However, facial areas exhibiting individual characteristics are important features from the viewpoint of facial recognition. In contrast, in our approach, the FFEN focuses on the extraction of face features from a blurred image instead of a high-quality facial image.

The FRN extracts features that are effective in verifying the individual and can be easily used with state-of-the-art methods. We employ metric-learning-based methods that achieve high recognition accuracy, such as ArcFace [12], CosFace [46], and SphereFace [25]. After pretraining the FFEN and FRN, we fine-tune both networks using a blurred facial image to extract suitable features.

3.1 Pretraining of the FFEN

We measure the PSF of the lensless MPC before training the FFEN. We initialize the parameters of the network by calculating the inverse PSF, following [36, 49]. The blurred image y is obtained by convolving the unblurred image x with the PSF k, as expressed in Eq. (1).

$$\begin{aligned} y = k *x. \end{aligned}$$
(1)

Here, \(*\) is the convolution operation. In the frequency domain, Eq. (1) becomes Eq. (2), because convolution corresponds to element-wise multiplication in the frequency domain.

$$\begin{aligned} \mathcal {F}(y) = \mathcal {F}(k) \times \mathcal {F}(x). \end{aligned}$$
(2)

Here, \(\mathcal {F}(\cdot )\) is the discrete Fourier transform. After converting to the frequency domain, we convert the blurred image y to the unblurred image x by Eq. (3).

$$\begin{aligned} \begin{aligned} x = \mathcal {F}^{-1}(1/\mathcal {F}(k)) *y. \end{aligned} \end{aligned}$$
(3)

The function \(\mathcal {F}^{-1}(\cdot )\) is the inverse Fourier transform. To prevent division by zero in the frequency domain, the Wiener filter, expressed in Eq. (4), is used.

$$\begin{aligned} \begin{aligned} x&= \mathcal {F}^{-1}(1/\mathcal {F}(k) \{ \frac{|\mathcal {F}(k)|^2}{|\mathcal {F}(k)|^2+\frac{1}{SNR}}\}) *y \\&= k^{\dag } *y. \end{aligned} \end{aligned}$$
(4)

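Here, \(k^{\dag }\) is the pseudo-inverse PSF and SNR is the signal-to-noise ratio assumed in the pseudo-inverse PSF. The term 1/SNR regularizes the inversion: at frequencies where \(|\mathcal {F}(k)|^2\) is much larger than 1/SNR, the filter behaves like the inverse filter of Eq. (3), whereas at frequencies where \(|\mathcal {F}(k)|^2\) is close to zero, the response is damped rather than amplified, which makes the reconstruction robust to noise.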
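As an illustration of Eq. (4), the following minimal sketch applies the Wiener filter in the frequency domain with NumPy. It assumes circular convolution and a toy three-tap PSF chosen so that \(\mathcal {F}(k)\) has no zeros; the function name `wiener_deconvolve`, the tap weights, and the SNR value are hypothetical and are not the settings used for the measured PSFs.

```python
import numpy as np

def wiener_deconvolve(y, k, snr):
    """Apply the pseudo-inverse PSF of Eq. (4) in the frequency domain.

    k must have the same shape as y, with the PSF origin at index (0, 0).
    """
    K = np.fft.fft2(k)
    # (1/K) * |K|^2 / (|K|^2 + 1/SNR) simplifies to conj(K) / (|K|^2 + 1/SNR),
    # which never divides by zero.
    W = np.conj(K) / (np.abs(K) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(W * np.fft.fft2(y)))

rng = np.random.default_rng(0)
x = rng.random((32, 32))                      # stand-in for an unblurred image

# Toy PSF (an assumption): three taps, weighted so that F(k) has no zeros.
k = np.zeros((32, 32))
k[0, 0], k[0, 3], k[2, 0] = 0.6, 0.25, 0.15

y = np.real(np.fft.ifft2(np.fft.fft2(k) * np.fft.fft2(x)))   # Eq. (1), circular
x_hat = wiener_deconvolve(y, k, snr=1e8)
print(np.max(np.abs(x_hat - x)))
```

With a known, well-conditioned PSF and a high assumed SNR, the recovered image is close to the original. Lowering `snr` damps the frequencies where the PSF response is weak.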

The pseudo-inverse PSF can be decomposed as \(k^{\dag } = USV^{T}\) through singular value decomposition (SVD). The \(j^{th}\) columns of U and V are \(u_j\) and \(v_j\), respectively, and the \(j^{th}\) singular value is \(s_{j}\). Applied to Eq. (4), SVD replaces the convolution with the two-dimensional pseudo-inverse PSF by a sum of convolutions with the one-dimensional vectors \(u_j\) and \(v_j\), each scaled by the scalar \(s_j\), as in Eq. (5).

$$\begin{aligned} \begin{aligned} x&= \sum _j{s_j \cdot u_j *(v_j^T *y) }. \end{aligned} \end{aligned}$$
(5)

Conversion from the blurred image to the unblurred image using the pseudo-inverse PSF can therefore be regarded as a convolutional neural network that takes \(s_j\), \(u_j\), and \(v_j^T\) as the convolutional kernels of three layers. These three layers have neither an activation function nor normalization, such as batch normalization. Following [36, 49], we add an outlier rejection subnetwork as the last three layers.
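The equivalence behind Eq. (5) can be checked numerically. The following minimal sketch uses random arrays as stand-ins for the pseudo-inverse PSF and the blurred image; the helper names `conv2d_full` and `separable_deconv` are hypothetical. It compares a direct two-dimensional convolution with the sum of separable one-dimensional convolutions.

```python
import numpy as np

def conv2d_full(img, ker):
    """Full 2-D convolution by direct summation (NumPy only)."""
    H, W = img.shape
    h, w = ker.shape
    out = np.zeros((H + h - 1, W + w - 1))
    for i in range(h):
        for j in range(w):
            out[i:i + H, j:j + W] += ker[i, j] * img
    return out

rng = np.random.default_rng(0)
k_dag = rng.standard_normal((7, 7))   # stand-in for a pseudo-inverse PSF
y = rng.standard_normal((16, 16))     # stand-in for a blurred image

U, S, Vt = np.linalg.svd(k_dag)       # k_dag = U @ diag(S) @ Vt

def separable_deconv(y, U, S, Vt, K):
    """Eq. (5) truncated to the K largest singular values."""
    x = np.zeros((y.shape[0] + U.shape[0] - 1, y.shape[1] + Vt.shape[1] - 1))
    for j in range(K):
        row_pass = conv2d_full(y, Vt[j][None, :])            # 1-D kernel v_j^T
        x += S[j] * conv2d_full(row_pass, U[:, j][:, None])  # 1-D kernel u_j
    return x

x_full = conv2d_full(y, k_dag)
err_all = np.max(np.abs(separable_deconv(y, U, S, Vt, K=7) - x_full))
err_top3 = np.max(np.abs(separable_deconv(y, U, S, Vt, K=3) - x_full))
print(err_all, err_top3)
```

Using all singular values reproduces the two-dimensional convolution up to rounding error, whereas keeping only the K largest gives an approximation whose kernels can serve as initial values for the first three layers.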

The FFEN module in Fig. 1 shows the network architecture. The first and second layers have \(K \times 1\) and \(1 \times K\) kernels, respectively. Both layers have K channels. The third layer has a \(1 \times 1\) kernel with K channels. The kernels are initialized with the singular vectors and singular values corresponding to the K largest singular values. The kernel sizes of the fourth, fifth, and sixth layers are \(15 \times 15\), \(1 \times 1\), and \(7 \times 7\), respectively. The fourth, fifth, and sixth layers have 64, 128, and 128 channels, respectively. For optimization, we use the \(L_1\) loss for the FFEN, given by Eq. (6).

$$\begin{aligned} \begin{aligned} loss_{FFEN}&= \frac{1}{N}\sum _{n=1}^{N}{|x_n - z_n|} \\ z&= DF(y) \end{aligned} \end{aligned}$$
(6)

Here, N is the number of pixels, x is the unblurred image, y is the blurred image, and DF(y) is the face feature of y.

3.2 Pretraining of the FRN

The FFEN outputs features that are effective for face recognition. These features are then input to the FRN. We first pretrain the FRN with unblurred facial images. The FRN is based on ArcFace [12], CosFace [46], and SphereFace [25], which are state-of-the-art methods. ArcFace obtains effective features using cosine distance. The loss function for pretraining in ArcFace is given by Eq. (7).

$$\begin{aligned} loss_{arcface} = -\frac{1}{M}\sum _{i=1}^{M}\log \frac{ e^{s\cos (\theta _{y_i}+m)}}{e^{s\cos (\theta _{y_i}+m)}+\sum _{j=1,j \ne y_i}^{n}e^{s\cos \theta _j}}. \end{aligned}$$
(7)

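Here, M is the number of data items, n is the number of classes, \(y_i\) is the class label of the \(i^{th}\) data item, \(\theta _j\) is the angle between the feature of the \(i^{th}\) data item and the weight vector of class j, s is the scale parameter for cosine similarity, and m is the margin with other classes.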
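For concreteness, Eq. (7) can be written in a few lines of NumPy. This is a minimal sketch, not the training code used in the experiments; the function name `arcface_loss` is hypothetical, and the defaults `s=64.0` and `m=0.5` are values commonly used with ArcFace that are assumed here, not values stated in this paper.

```python
import numpy as np

def arcface_loss(features, weights, labels, s=64.0, m=0.5):
    """Additive angular margin loss of Eq. (7).

    features: (M, d) array, weights: (n, d) array, labels: (M,) integer array.
    """
    # Normalize so that the dot product equals cos(theta_j).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)

    idx = np.arange(len(labels))
    logits = s * cos
    # Add the margin m only to the angle of the ground-truth class y_i.
    logits[idx, labels] = s * np.cos(theta[idx, labels] + m)

    # Numerically stable log-softmax over the n classes.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[idx, labels].mean())

# Toy check: four classes with orthogonal weights and perfectly aligned features.
W = np.eye(4)
y_lab = np.array([0, 1, 2, 3])
F = W[y_lab]
loss_no_margin = arcface_loss(F, W, y_lab, s=4.0, m=0.0)
loss_margin = arcface_loss(F, W, y_lab, s=4.0, m=0.5)
print(loss_no_margin, loss_margin)
```

In this toy check the margin raises the loss for the same features, so training must pull features closer to their class weights to compensate.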

To determine whether two facial images are of the same person, we extract features of the faces using the trained FRN. The two facial images are judged to be of the same person if the cosine similarity between the extracted features is greater than or equal to a threshold.
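The verification rule can be sketched as follows, reading the comparison as a cosine similarity, for which a larger value means the features are more alike. The function names, the toy feature vectors, and the threshold of 0.3 are hypothetical; in practice the threshold would be tuned on validation pairs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(feat_a, feat_b, threshold=0.3):
    """Accept the pair when the similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(feat_a, feat_b) >= threshold

# Toy features standing in for FRN outputs.
f1 = np.array([1.0, 0.0, 0.0])
f2 = np.array([0.9, 0.1, 0.0])   # nearly parallel to f1
f3 = np.array([0.0, 1.0, 0.0])   # orthogonal to f1

print(same_person(f1, f2), same_person(f1, f3))
```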

3.3 Fine-Tuning of the Privacy-Aware FRN

The FFEN and FRN are pretrained independently. The output of the FFEN comprises face features, which are the input to the FRN. We fine-tune the two connected networks in an end-to-end manner, using the loss function given by Eq. (7), so that the output of the FFEN and the input expected by the FRN are adapted to each other. Combining the networks in this way makes the proposed method less affected by blur.

For our networks, particularly the FFEN, accurate feature extraction over the entire face region is not essential. By fine-tuning both networks, it is possible to extract features only from the regions that are effective for recognizing the individual. Even if the subject wears eyeglasses, and there are few samples of faces wearing eyeglasses in the training data, the network can extract features in other important regions. When we do not pretrain the FFEN, it is necessary to extract features of the entire facial image that represent individual characteristics. However, this is difficult because the network cannot extract features of the small regions that represent individual characteristics. The proposed method improves the accuracy of face recognition by training an FFEN that extracts feature maps representing individual characteristics.

4 Experiments

4.1 Details of Implementation

The parameters of each layer of the FFEN are shown in Fig. 1. No activation function is used in the first three layers. In the last three layers, Leaky ReLU with a negative slope of 0.02 is used as the activation function. For the FFEN, the mini-batch size is 1, the learning rate is 0.0001, and training runs for 50 epochs. For training, we use 58,346 images, randomly sampled from MS1MV2 [12] and LFW [18]. To validate the performance of feature extraction, we use 58,346 images, randomly sampled from MS1MV2 and LFW, that are not used in training.

We employ SphereFace [25], CosFace [46], and ArcFace [12] as the FRN. The backbone network is ResNet50. We use the MS1MV2 dataset for training. The number of images is 5,822,653 and the number of IDs is 85,741. We use LFW [18], CPLFW [3], and CALFW [7] for the evaluation data; there are 12,000 images in each dataset. Each image is normalized in terms of orientation and cropped to \(112 \times 112\) pixels. The mini-batch size is 256 and training runs for four epochs. The initial learning rate for pretraining is 0.1, and the learning rate is multiplied by 0.1 in epoch 3. The learning rate, momentum, and weight decay in fine-tuning are determined by adopting Bayesian optimization [1].
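The pretraining learning-rate schedule can be written as a small step function. This is a sketch under two assumptions that the text does not state: epochs are numbered from 1, and the reduced rate is kept for every epoch from epoch 3 onward. The function name `pretraining_lr` is hypothetical.

```python
def pretraining_lr(epoch, base_lr=0.1, decay_epoch=3, factor=0.1):
    """Learning rate for a 1-indexed epoch.

    Returns base_lr before decay_epoch and base_lr * factor from decay_epoch on
    (the persistence of the decay is an assumption).
    """
    return base_lr * factor if epoch >= decay_epoch else base_lr

# Four epochs of FRN pretraining.
schedule = [pretraining_lr(e) for e in range(1, 5)]
print(schedule)
```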

Public face recognition datasets do not include both unblurred and blurred images. Therefore, we first simulate the blurred images using the PSF of this camera, as shown in the leftmost four columns of Fig. 4. For PSFid:3–025, the eyes, nose, and mouth are blurred, and it is difficult to recognize the individual. In the case of PSFid:3–050, the distance of each pinhole from the center of the mask is large, and it is possible to identify the individual, but the positional deviation is large and feature extraction is therefore difficult. It is generally possible to identify the shape of the facial contours in the blurred image of PSFid:9–025, but it is difficult to identify facial parts, because of the blur. For the blurred image of PSFid:9–050, it is difficult to identify the contours of the face and face parts. To prevent personal identification, our method blurs the face by superimposing shifted copies of the image. When the number of pinholes is small or the distance from the center of the mask is large, the area where the copies overlap decreases. In this case, it is difficult to protect privacy. To make it difficult to identify individuals, it is therefore desirable to arrange the pinholes so that the face images overlap over a large area. We also used blurred images captured by this apparatus as real images in the experiments reported below, in which we evaluated a simulated image against a real image captured by the lensless MPC.

Fig. 4.
Examples of unblurred and blurred images and attention maps of each image. (a) Left four columns: examples of unblurred images and blurred images for each PSFid. (b) Center four columns: attention maps of unblurred images and proposed method for each PSFid. (c) Right four columns: attention maps of conventional method for each PSFid

Table 2. Comparison of face verification results (%)

4.2 Face Recognition Results of the Privacy-Aware FRN

In this section we present the results of the pretraining of the FFEN and FRN, and the results of the fine-tuning of the privacy-aware FRN for each PSF. Table 2 shows the face recognition results of each PSF for LFW, CPLFW, and CALFW. In Table 2, the first column shows the dataset, the second column shows the PSFid, and the third and subsequent columns show evaluation results using different FRN algorithms. (A), (B), (C), and (D) show the result of training with a blurred image, the result of training without pretraining the FRN, the result without fine-tuning, and the result of the proposed method, respectively. (A) is a conventional result. Each row shows the result for different coded masks, and the other rows show the results of SphereFace, CosFace, and ArcFace trained with unblurred images. The first value of each PSFid is the number of pinholes, and the second value is the distance of the pinholes from the mask center.

When there are three pinholes, the performance is similar to that obtained when unblurred images are used in training. Even for nine pinholes, the performance of the proposed method is superior to that without pretraining or fine-tuning. This result shows that both pretraining and fine-tuning, which are the training steps of the proposed method, are effective. The recognition rates for CPLFW and CALFW are lower than that for LFW. This trend is not specific to this study; a similar trend is reported in [12]. CPLFW evaluates face verification on image pairs with different face poses, and CALFW evaluates face verification on image pairs of different ages. Therefore, CPLFW and CALFW are more difficult datasets than LFW.

4.3 Analysis Using the Area of Focus of Features and Extracted Features

We use Grad-CAM [38] to visualize whether the fine-tuned model extracts face features. ArcFace obtains similarity based on the cosine distance between feature vectors. The visualization is performed using a one-hot vector that has a value of 1 for the most similar person.

The leftmost four columns (a) of Fig. 4 show the unblurred image and the blurred image for each PSFid, the center four columns (b) show the attention maps of the proposed method for each PSFid, and the rightmost four columns (c) show the attention maps of the blurred image (conventional method) for each PSFid. When the face has little blur, such as in the case of PSFid:3–025, the attention maps of both the unblurred image and the proposed method are similar in position and strength within the area of the face. For other PSFs, the position of the attention map is slightly different, but parts of the face such as the eyes, nose, and mouth respond strongly. In the case of the conventional method, the attention maps are largely outside the face. It is therefore difficult to obtain effective features for face recognition from the blurred image.

Fig. 5.
Examples of captured images. Real captured images are more blurred than simulated images

Table 3. Face verification results for captured images (%)

4.4 Experiments Using Real Images

We compared the accuracy of face recognition on real images, using blurred images captured by the lensless MPC for PSFid:3–025 and PSFid:9–025. Each unblurred image was displayed on a monitor in a dark room and captured by the lensless MPC, and the captured image was treated as an image with real blur. As a result, pairs comprising an unblurred image and a real blurred image were obtained. To train the FFEN, we used 53,143 images randomly sampled from MS1MV2 and LFW. The captured images are presented in Fig. 5. The real images used in the experiment were more blurred than the simulated images. To train the FRN, we used 147,464 images sampled from MS1MV2. The image size is \(112 \times 112\), as in the simulation.

Comparison results are shown in Table 3. The proposed method achieved higher accuracy than the conventional method. Although the images were extremely blurred and there was little training data, the proposed setting had higher accuracy than the other settings for both PSFid:3–025 and PSFid:9–025. In the experiment using real images, pretraining and fine-tuning were effective, as in the experiments using simulated data. Because the proposed method achieved high accuracy even with real images, we conclude that it achieves face recognition that can protect privacy.

Fig. 6.
Training losses of CycleGAN

4.5 Evaluation of Privacy Protection Performance of Proposed System

We evaluate the privacy protection performance of blurred images. As noted in Sect. 2.1, the PSF is unlikely to leak. We therefore evaluated the privacy protection performance under the assumption that the PSF is unknown, using CycleGAN [52], which is a generative model, and SelfDeblur [34], which is one of the state-of-the-art methods for blind deconvolution.

For training CycleGAN, we require unblurred and blurred images. Unblurred images were randomly selected from LFW. Two types of blurred images were used: simulated images and captured images. The blurred images selected from LFW did not overlap with the unblurred images from LFW. The number of training images in each set (unblurred images from LFW and blurred images) was 5000. Images that were not used for training served as the evaluation images. The number of training iterations was 10000. The generator losses for each PSFid are shown in Fig. 6, where the vertical axis is the loss and the horizontal axis is the iteration number. The losses converged after approximately 7000 iterations, so 10000 iterations provide sufficient training in this experiment. SelfDeblur estimates the unblurred image and the PSF, given a single image.

Fig. 7.
Example of privacy protection performance. (a) is an unblurred image, (b) is a result of the simulated image, and (c) is a result of the captured image. For each PSFid, the leftmost image is the blurred image, the center image is that generated by CycleGAN [52], and the rightmost image is that reconstructed by SelfDeblur [34]

Figure 7(a) shows an unblurred image, Fig. 7(b) shows the reconstruction result of the simulated image, and Fig. 7(c) shows the reconstruction result using the captured image. For each PSFid, the figure shows the blurred image, the results of CycleGAN, and the results of SelfDeblur. The image generated by CycleGAN is a sharp image. When the distance of the pinholes from the mask center is small, as for PSFid:3–025 and PSFid:9–025, the contour shape is similar to that of the unblurred image, but the face parts of the generated image are different from those of the unblurred image. In contrast, when the distance of the pinholes from the mask center is large, as for PSFid:3–050 and PSFid:9–050, the unblurred image and the generated image differ greatly in the shape of the contours, in addition to the face parts. Therefore, it is difficult to recognize the person in the blurred image and the person in the generated image as the same person. In general, more training data yields higher accuracy. However, a GAN learns the distribution of the training data. In this experiment, it learns the feature distribution that expresses faces in general rather than the characteristics that represent the individual. Therefore, it is possible to generate face images without blur, but because individuality is lost, the face recognition accuracy does not necessarily increase even if the number of training images is increased.

The deconvolved image created by SelfDeblur from the simulated image can approximately distinguish the face area from the background. However, the artifacts are so severe that the subject cannot be identified. This tendency is the same for all PSFids, but increasing the number of pinholes causes the face shape to collapse more, making deconvolution difficult. In the result of deconvolution using a captured image, it is difficult even to visually recognize the position of the face area. From these results, we confirmed that it is difficult to identify the person through image generation or image reconstruction when the PSF is unknown, and that the proposed system is effective for privacy protection.

5 Conclusion

We have proposed a privacy-aware face recognition method that solves the dilemma of simultaneously realizing good privacy protection and face recognition accuracy. To be successful at both, we constructed an acquisition system based on a lensless MPC that captures extremely blurred face images. The MPC has several pinholes and captures a blurred image. From this blurred image, we extract face features that are similar to those of an unblurred image using an FFEN. The FFEN is trained with initial parameters calculated using the inverse PSF. An FRN based on ArcFace recognizes a person using the face features. These networks are fine-tuned, in an end-to-end manner, after each is pretrained.

A remaining concern is whether privacy is protected if a hacker steals the captured image. If the PSF is unknown, it is difficult to reconstruct the image only from the blurred image; however, if the PSF is known, image reconstruction can be performed relatively easily. Nevertheless, because the PSF is different for each camera, a hacker would need to measure the PSF, in addition to stealing the captured image. Therefore, in a real environment, it is unlikely that a hacker could recover an unblurred image from a blurred image. Through experiments on image reconstruction with an unknown PSF, we showed that it is difficult to reconstruct an unblurred image from a blurred image without the PSF.

We experimented with four types of coded masks, but these are not necessarily optimal for privacy protection. In future studies, we intend to design a pattern that is optimal for both recognition and privacy protection, by treating the coded mask pattern as a trainable parameter. In addition, although the face recognition loss is back-propagated to the FFEN during fine-tuning, the training does not explicitly specify which regions are effective for face recognition. We will therefore consider combining the method with attention and facial feature inspection.