1 Introduction

It is well known that illumination is an important factor in computer vision tasks. As shown in Fig. 1, factors such as over- or under-exposure of the camera and the intensity and direction of the light can make lighting conditions complicated. Face appearance can change dramatically under illumination variations; as a consequence, the variations between images of the same face caused by illumination are almost always larger than the variations caused by changes in face identity. Zhang et al. (2021) proposed a recursive copy and paste generative adversarial network (Re-CPGAN) to recover authentic high-resolution face images while compensating for non-uniform illumination. A depth algorithm was proposed for a single-lens occluded portrait to estimate the actual portrait distance for different poses, angles of view and occlusions (Tsai et al. 2020). A network analysis approach was carried out, depicting sub-communities of nodes related to the face (Moret-Tatay et al. 2020). There are still some challenges in illumination (Adini et al. 1997). Therefore, illumination normalization of faces is essential and valuable. In this study, we focus on the task of illumination normalization of facial images under different illumination conditions and a variety of head poses.

Fig. 1

Some poor-lighted faces under various head poses

As a pioneering work, Faisal et al. (2006) combined Phong’s lighting model and a 3D face model to normalize the illumination of color faces. Unfortunately, due to the requirement for 3D point clouds and the large amount of computation, this method has limited practical application. With the development of hardware and neural networks, illumination normalization has gradually evolved from traditional techniques to deep learning-based ones. So far, only a few methods process illumination with deep learning, and two challenges remain: (1) the key challenge is identity preservation; (2) it is difficult to deal with the illumination of color faces. Ma et al. (2018) first used generative adversarial nets to process the illumination of facial images. Han et al. (2019) put forward asymmetric joint generative adversarial networks (GAN) to normalize facial illumination with lighting labels. In view of the shortcomings of these methods, we propose a new method that normalizes the illumination of color facial images without any identity label or illumination label as input. Moreover, our method, which contains one generator, one discriminator and one feature extractor, is simpler than previous deep learning-based methods.

Inspired by the success of GANs on image denoising, image synthesis and transfer learning, we reformulate the illumination normalization problem in the same spirit as these tasks. Our goal is to learn a GAN mapping from any poor-lighted image to a well-lighted image; the latter is called the standard illumination case in this study. In summary, our main contributions are as follows:

  1. A new scheme is proposed for the illumination normalization of color face images. Unlike previous deep learning methods, we reduce the number of discriminators and do not rely on reconstruction computation, which improves the computational speed.

  2. We use a content loss and an elaborately designed generator to preserve identity. Experimental results demonstrate that the combination of content and L1 losses makes our method achieve good performance. The proposed method can process the illumination not only of frontal faces but also of non-frontal faces.

  3. Though our model is trained on faces under well-controlled lighting variations, it generalizes well to face images with less-controlled lighting variations while preserving identity effectively.

The remainder of the paper is organized as follows. Section 2 describes related work on illumination normalization. Section 3 describes the proposed Illumination Normalization GAN (IN-GAN) in detail. Experimental results, evaluation and comparisons are included in Sect. 4. Section 5 describes the validation of identity preservation. Finally, conclusions are drawn in Sect. 6.

2 Related work

2.1 Illumination normalization

  (1) Traditional methods

To deal with the illumination variation problem, numerous works have been put forward over the past decades. In 1987, Pizer et al. (1987) proposed adaptive histogram equalization to enhance image contrast. Afterward, many researchers extended the histogram equalization algorithm. For instance, Shan et al. (2003) proposed region-based histogram equalization to deal with illumination. Xie et al. (2005) proposed block-based histogram equalization for illumination processing. Oriented local histogram equalization, which compensates for illumination while encoding rich information on edge orientations, was presented by Lee et al. (2012). In 1999, Shashua and Riklin-Raviv (1999) proposed the quotient image method, which provided an invariant approach to deal with illumination variation. Afterward, many researchers extended the quotient image algorithm. Shan et al. (2003) developed gamma intensity correction for normalizing the overall image intensity at a given illumination level by introducing an intensity mapping and quotient image relighting. Wang et al. (2004) put forward the self-quotient image. Chen et al. (2005) came up with a TV-based quotient image model for illumination normalization. Srisuk et al. (2008) proposed the Gabor quotient image, which extends the self-quotient image by applying a 2D Gabor filter instead of a weighted Gaussian filter. An et al. (2010) decomposed an image under L1 and L2 norm constraints, then obtained the illumination-invariant large-scale part by region-based histogram equalization and the illumination-invariant small-scale part by the self-quotient image.

Adini et al. (1997) proposed logarithmic transformation, directional gray-scale derivation, and Laplacian of Gaussian for illumination normalization. Single-scale Retinex was put forward by Jobson et al. (1997) for processing illumination. Fitzgibbon et al. (2002) proposed Gaussian high-pass filtering to process illumination. Local normalization technology was proposed by Xie et al. (2006); it can effectively eliminate the adverse effect of uneven illumination while keeping the local statistical properties of the processed image the same as those of the corresponding image under normal illumination. Chen et al. (2005) came up with a lighting normalization method based on the generic intrinsic illumination subspace, which was used as a bootstrap subspace for novel images. Du et al. (2005) presented wavelet-based illumination normalization. Chen et al. (2006a, b) proposed logarithmic total variation for processing illumination. Chen et al. (2006a, b) also put forward a method named logarithmic discrete cosine transformation for illumination compensation and normalization. Tan and Triggs (2010) processed illumination by a combination of gamma correction, difference of Gaussian filtering, masking, and contrast equalization, which is called TT in the literature (2013). Fan et al. (2011) proposed a method named homomorphic filtering-based illumination normalization, whose key component is a difference of Gaussian filter.

Wang et al. (2011) came up with illumination normalization based on Weber’s Law. Zhao et al. (2012) proposed a self-lighting ratio to suppress illumination differences in the frequency domain. A linear representation-based face illumination normalization method was put forward by Li et al. (2012). BimaSenaBayu et al. (2013) proposed an adaptive contrast ratio based on fuzzy logic that considers two models of the individual face as input, an appearance estimation model and a shadow coefficient model. Goel et al. (2013) put forward an approach for illumination normalization based on discrete wavelet transformation and discrete cosine transformation: discrete wavelet transformation was performed on the image, discrete cosine transformation was applied to the low-frequency sub-band, and the low-frequency discrete cosine transformation coefficients were then modified to suppress illumination variations. Vishwakarma (2015) proposed a fuzzy filter applied to the low-frequency discrete cosine transformation coefficients for illumination normalization. With the development of 3D technologies, physical lighting models became mainstream. Zhao et al. (2014) decomposed the lighting effect into ambient, diffuse, and specular lighting maps and estimated the albedo of face images under drastic lighting conditions. Tu et al. (2017) presented a new and efficient method for illumination normalization within an energy minimization framework. Ahmad et al. (2017) used independent component analysis and filtering to process illumination. Zhang et al. (2018) presented a novel patch-based dictionary learning framework for face illumination normalization. Zheng et al. (2019) proposed a local texture enhanced illumination normalization method based on a fusion of difference of Gaussian filters and difference of bilateral filters. Zhang et al. (2019) first combined Phong’s model and the Lambertian model, then generated the chromaticity intrinsic image (CII) in a log chromaticity space that is robust to illumination variations. The largest matching area was shown to be helpful for lighting normalization, occlusion de-emphasis and, finally, face recognition (Mclaughlin et al. 2017).

  (2) Deep learning-based methods

The essence of deep learning is to learn a function that maps inputs to outputs. Ma et al. (2018) used a GAN to process the illumination of color faces. Although their method can generate vivid and well-lighted facial images based on an illumination label, it uses reconstruction and multiple discriminators, so it takes more time to complete its computation. Han et al. (2020) put forward an asymmetric joint GAN to normalize face illumination. Their method contains two GANs, one to normalize illumination and the other to maintain personalized facial structures. Moreover, their method needs lighting labels.

2.2 GANs and their applications

The GAN (Goodfellow 2014) brings extraordinary vitality to image generation and even extends to Fourier series (González-Prieto et al. 2021). With the combination of GAN and convolutional neural network (CNN), the deep convolutional GAN (Radford et al. 2016) makes a great leap in image generation ability. By specifying input conditions, the conditional GAN (Mirza et al. 2014) can generate specific target photos. At the same time, GANs have led to a series of breakthroughs in image inpainting (Demir et al. 2018), super-resolution (Wang et al. 2018), style transfer (Yang et al. 2019), image translation (Zhu et al. 2017; Lin et al. 2018) and so forth. In the field of face recognition, training data are expanded by using GANs to generate face photos (Shrivastava et al. 2017), different expressions (Pumarola et al. 2018), faces of different ages (Zhu et al. 2017), etc. For the application of generating frontal photos for face recognition (FR), TP-GAN (Huang et al. 2017), DR-GAN (Tran et al. 2017) and others synthesize frontal photos from large-pose face images and obtain good FR results.

3 Proposed method

3.1 Overall framework

In this section, we detail the proposed IN-GAN for illumination normalization. Figure 2 shows the block diagram of IN-GAN, which takes a set of poor-lighted face images and the corresponding standard illumination images as input and outputs a set of well-lighted face images in an end-to-end way. As illustrated in Fig. 2, the core of our approach consists of a generator, a discriminator, and a feature extractor. The generator is composed of an encoder network and a decoder network. The discriminator is employed to judge whether its input is a real image or a fake image produced by the generator. The feature extractor extracts face features for every face image. In the testing phase, only the generator is used to transform poor-lighted face images into well-lighted ones. We utilize three loss terms: an adversarial loss, a content loss and an L1 loss.
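To make the data flow concrete, the following minimal PyTorch sketch wires the three components together; the modules below are tiny stand-ins with hypothetical layers and channel sizes, not our actual networks (which are detailed in Sect. 3.2).

```python
# A minimal sketch of the IN-GAN data flow. The three modules below are tiny
# stand-ins (hypothetical layers), not our actual networks from Sect. 3.2.
import torch
import torch.nn as nn

generator = nn.Sequential(                       # encoder-decoder stand-in
    nn.Conv2d(3, 16, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 3, 3, padding=1))
discriminator = nn.Sequential(                   # real/fake judge stand-in
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(), nn.LazyLinear(1))
feature_extractor = nn.Sequential(               # fixed face-feature network stand-in
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in feature_extractor.parameters():
    p.requires_grad_(False)                      # the extractor is not updated

poor = torch.rand(1, 3, 128, 128)                # poor-lighted input
target = torch.rand(1, 3, 128, 128)              # standard-illumination target

well_lit = generator(poor)                       # used in both training and testing
real_or_fake = discriminator(well_lit)           # training only: adversarial feedback
content_pair = (feature_extractor(target),       # training only: features compared by
                feature_extractor(well_lit))     # the content loss (Sect. 3.3)
```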

Fig. 2

Our overall generative adversarial network framework

3.2 Generator and discriminator architecture

The generator of our IN-GAN is inspired by the components of Pix2Pix (Isola et al. 2017) and U-net (Ronneberger et al. 2015). It consists of 11 convolutional layers and 11 deconvolutional layers, each followed by a LeakyReLU activation. Details of the generator are shown in Fig. 3. As shown in Fig. 3, the input of the generator is a 128 × 128 color image, and its output is also 128 × 128 pixels in size. The dotted lines in Fig. 3 are skip connections, which are conducive to feature retention. In the middle 6 convolutional layers, we utilize dropout to avoid overfitting, and special deconvolutional and convolutional layers are placed at the end of the generator to enhance the synthetic ability of our model; this design further enhances the ability of feature retention. Because InstanceNorm (Ulyanov et al. 2016) prevents instance-specific mean and covariance shift and thereby simplifies the learning process, we utilize InstanceNorm after each convolutional layer. Moreover, experiments demonstrate that InstanceNorm makes our model converge fast, which means that our model can obtain a higher recognition rate, good visual effect, and fine illumination normalization results after about 14 epochs. InstanceNorm can be computed by:

$$y_{tijk} = \frac{{x_{tijk} - \mu_{ti} }}{{\sqrt {\sigma_{ti}^{2} + \varepsilon } }}$$
(1)
$$\mu_{ti} = \frac{1}{HW}\sum\limits_{l = 1}^{W} {\sum\limits_{m = 1}^{H} {x_{tilm} } }$$
(2)
$$\sigma_{ti}^{2} = \frac{1}{HW}\sum\limits_{l = 1}^{W} {\sum\limits_{m = 1}^{H} {\left( {x_{tilm} - \mu_{ti} } \right)^{2} } }$$
(3)

where \(x \in R^{T \times C \times W \times H}\) is an input tensor with a batch of \(T\) images. Let xtijk denote its tijk-th element, where k and j span the spatial dimensions, \(i\) denotes the feature channel (color channel if the input is an RGB image), and t is the index of the image in the batch. The discriminator of our IN-GAN is inspired by the components of Pix2Pix (Isola et al. 2017) and consists of 5 convolutional layers, each followed by a LeakyReLU activation. Details of the discriminator are shown in Fig. 4. As illustrated in Fig. 4, the input of the discriminator is a 128 × 128 color image. We use InstanceNorm after each convolutional layer.
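As a sanity check of Eqs. (1)–(3), the sketch below (with an assumed 2 × 3 × 128 × 128 input) computes the per-instance, per-channel statistics manually and compares them with PyTorch's built-in InstanceNorm2d; the match confirms that the statistics are taken over the spatial dimensions only.

```python
# A small check of Eqs. (1)-(3) on an assumed 2 x 3 x 128 x 128 tensor: mean and
# variance are taken per image t and per channel i over the spatial dimensions only,
# which is exactly what PyTorch's InstanceNorm2d (affine=False) computes.
import torch
import torch.nn as nn

x = torch.randn(2, 3, 128, 128)
mu = x.mean(dim=(2, 3), keepdim=True)                     # Eq. (2)
var = x.var(dim=(2, 3), unbiased=False, keepdim=True)     # Eq. (3), population variance
eps = 1e-5
y_manual = (x - mu) / torch.sqrt(var + eps)               # Eq. (1)

y_builtin = nn.InstanceNorm2d(3, eps=eps, affine=False)(x)
print(torch.allclose(y_manual, y_builtin, atol=1e-5))     # expected: True
```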

Fig. 3

The detailed structure of our generator

Fig. 4

The detailed structure of our discriminator
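For readers who prefer code over diagrams, the sketch below illustrates only the recurring Conv–InstanceNorm–LeakyReLU pattern and a U-Net-style skip connection used in Figs. 3 and 4; the channel sizes and number of blocks are hypothetical and far smaller than in the real 11-layer encoder and decoder.

```python
# Illustration only: the recurring Conv-InstanceNorm-LeakyReLU pattern and a
# U-Net-style skip connection as sketched in Figs. 3 and 4. Channel sizes and the
# number of blocks are hypothetical; the real generator has 11 conv and 11 deconv layers.
import torch
import torch.nn as nn

def down(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2))

def up(cin, cout, dropout=False):
    layers = [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
              nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2)]
    if dropout:
        layers.append(nn.Dropout(0.5))        # dropout is used in the middle layers
    return nn.Sequential(*layers)

x = torch.rand(1, 3, 128, 128)
d1 = down(3, 64)(x)                           # -> 64 x 64 x 64
d2 = down(64, 128)(d1)                        # -> 128 x 32 x 32
u1 = up(128, 64, dropout=True)(d2)            # -> 64 x 64 x 64
u2 = up(64 + 64, 3)(torch.cat([u1, d1], 1))   # skip connection: concatenate encoder feature
print(u2.shape)                               # torch.Size([1, 3, 128, 128])
```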

3.3 Objective

Our method uses one discriminator D and one generator G, which together constitute an adversarial training process optimized as a min–max problem. Our full objective is:

$$L(G,D) = L_{adversarial} (G,D) + \lambda_{1} \times L_{content} (G) + \lambda_{2} \times L_{l1} (G)$$
(4)

where λ1 and λ2 are weight parameters, and \(L_{adversarial}\), \(L_{content}\) and \(L_{l1}\) are defined as follows:

$$L_{adversarial} (G,D) = E_{x} \left[ {\log D\left( x \right)} \right] + E_{G(x)} \left[ {\log \left( {1 - D\left( {G\left( x \right)} \right)} \right)} \right]$$
(5)
$$L_{content} (G) = \left\| {F(y) - F(G(x))} \right\|_{1}$$
(6)
$$L_{l1} (G) = \left\| {y - G(x)} \right\|_{1}$$
(7)

where x denotes the input image, y is the target image (standard illumination), and F denotes a feature extractor such as VGG-19 (Simonyan et al. 2014) or ResNet-50 (He et al. 2016). In this study, ResNet-50 trained on VGGFace2 (Cao et al. 2018) is used as the feature extractor.
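A hedged sketch of the generator-side training loss built from Eqs. (4)–(7) is given below; the adversarial term is written in its usual binary cross-entropy (non-saturating) form, and the tensor shapes in the usage line are merely plausible examples.

```python
# A sketch of the generator-side loss built from Eq. (4). The adversarial term is the
# usual non-saturating cross-entropy realization of Eq. (5); tensor shapes in the
# usage line are plausible examples, not values from our experiments.
import torch
import torch.nn.functional as F_nn

def generator_loss(d_fake_logits, feat_y, feat_gx, y, g_x, lambda1=1.0, lambda2=0.1):
    # adversarial part: push D(G(x)) toward the "real" decision
    adv = F_nn.binary_cross_entropy_with_logits(d_fake_logits,
                                                torch.ones_like(d_fake_logits))
    content = F_nn.l1_loss(feat_gx, feat_y)    # Eq. (6): ||F(y) - F(G(x))||_1
    pixel = F_nn.l1_loss(g_x, y)               # Eq. (7): ||y - G(x)||_1
    return adv + lambda1 * content + lambda2 * pixel   # Eq. (4)

# usage with dummy tensors
loss = generator_loss(torch.randn(4, 1),                           # D(G(x)) logits
                      torch.randn(4, 2048), torch.randn(4, 2048),  # F(y), F(G(x))
                      torch.rand(4, 3, 128, 128), torch.rand(4, 3, 128, 128))
```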

4 Experiments

4.1 Datasets

As for the dataset, we choose Multi-PIE (Gross et al. 2010), which contains 15 poses ranging from − 90° to + 90°. Each pose includes 20 illuminations and up to 6 expressions, for a total of 337 identities. We select 0°, − 15°, − 30°, − 45° faces with 20 illuminations, neutral expression, and without glasses from session 1 of Multi-PIE as our dataset. We detect and crop faces with the single shot scale-invariant face detector (S3FD) (Zhang et al. 2017) and resize them to 128 × 128 to build our training and test sets. The faces under illumination 07 are chosen as standard faces (standard illumination) and the rest are used as poor-lighted facial images. The total number of identities is 129: 30 identities are chosen as our test dataset and the remaining 99 identities as our training dataset. For the test dataset, we organize 4 settings. Setting 1 contains only frontal facial images of the 30 identities under 19 illuminations. Setting 2 contains the same identities as setting 1 but with large head poses ([− 15°, − 30°, − 45°]) under 19 illuminations. For setting 3, we choose − 15° facial images and convert them from RGB to gray. For setting 4, we collect some poor-lighted faces from the internet using search engines and from the face recognition grand challenge (FRGC) database (Phillips et al. 2005). To further verify the performance of our algorithm, we also add people who wear glasses and faces under large head poses to this test set. All the faces of setting 4 are under less-controlled lighting variations.

4.2 Implementation details

In the encoder, LeakyReLU with a slope of 0.2 is used as the activation; in the decoder, LeakyReLU also has a slope of 0.2. For gradient descent, we use the Adam (Kingma et al. 2015) optimizer with a learning rate of 0.0002, momentum parameters β1 = 0.5 and β2 = 0.999, and weight decay = 0.0001. One 1080Ti graphics card with a batch size of 16 is used for training; 20 epochs take about 25 min. No data augmentation is used during training. By setting different values for λ1 and λ2, we obtain 3 combinations of loss terms: λ1 = 1.0, λ2 = 0 trains a model with the content loss only; λ1 = 0.0, λ2 = 1.0 trains a model with the L1 loss only; and λ1 = 1.0, λ2 = 0.1 trains a model with both the content and L1 losses.
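The training configuration above can be expressed as the following PyTorch sketch; the two modules are placeholders for the networks of Sect. 3.2, and applying identical Adam settings and weight decay to both optimizers is our reading of the description above.

```python
# The training configuration above expressed as a PyTorch sketch. The two modules are
# placeholders for the networks of Sect. 3.2; using identical Adam settings and weight
# decay for both optimizers is an assumption based on the description above.
import torch
import torch.nn as nn

generator = nn.Conv2d(3, 3, 3, padding=1)        # placeholder
discriminator = nn.Conv2d(3, 1, 3, padding=1)    # placeholder

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4,
                         betas=(0.5, 0.999), weight_decay=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4,
                         betas=(0.5, 0.999), weight_decay=1e-4)
batch_size = 16                                  # one 1080Ti; 20 epochs take about 25 min
```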

4.3 Metrics

At present, most of the literature evaluates the performance of illumination normalization algorithms from two aspects: one is the recognition rate, the other is a set of face images illustrated before and after processing by various methods. We select the cosine similarity of feature vectors for face recognition. To estimate the performance of various illumination normalization algorithms more comprehensively, in addition to comparing recognition rates and illustrating examples, the peak signal to noise ratio (PSNR) is used. Cosine similarity and PSNR are briefly introduced as follows (a short implementation sketch of both metrics is given after their definitions):

  (1) Cosine similarity is defined as:

    $$\cos \left( \theta \right) = \frac{{\sum\nolimits_{i = 1}^{n} {A_{i} \times B_{i} } }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {\left( {A_{i} } \right)^{2} } } \times \sqrt {\sum\nolimits_{i = 1}^{n} {\left( {B_{i} } \right)^{2} } } }}$$
    (8)

    where A and B are the feature vectors obtained from ResNet-50 (He et al. 2016) trained on VGGFace2 (Cao et al. 2018), Ai denotes the i-th element of vector A, and Bi denotes the i-th element of vector B.

  (2) PSNR is defined as:

    $$PSNR = 10 \times \log_{10} \left[ {\frac{{\left( {2^{n} - 1} \right)^{2} }}{MSE}} \right]$$
    (9)

where n is the number of bits per pixel, and MSE denotes the mean square error between two images, computed by:

$$MSE = \frac{1}{mn}\sum\limits_{i = 0}^{m - 1} {\sum\limits_{j = 0}^{n - 1} {\left[ {I(i,j) - K(i,j)} \right]^{2} } }$$
(10)

where m and n denote the width and height of the image, and I and K are two gray images of equal width and height.
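The sketch referred to above implements both metrics in NumPy for illustration; it assumes 8-bit gray images (so the peak value in Eq. 9 is 255) and random 2048-dimensional vectors standing in for ResNet-50 features.

```python
# NumPy versions of the two metrics, for illustration. We assume 8-bit gray images
# (peak value 2^8 - 1 = 255 in Eq. 9) and random 2048-dimensional vectors standing in
# for ResNet-50 features.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))             # Eq. (8)

def psnr(img_i, img_k, n_bits=8):
    mse = np.mean((img_i.astype(np.float64) - img_k.astype(np.float64)) ** 2)  # Eq. (10)
    return 10.0 * np.log10(((2 ** n_bits - 1) ** 2) / mse)                     # Eq. (9)

a, b = np.random.rand(2048), np.random.rand(2048)
i = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
k = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
print(cosine_similarity(a, b), psnr(i, k))
```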

4.4 Quantitative evaluation

In this section, the performance of existing methods and our algorithm is evaluated from two aspects: face recognition rate and PSNR. In terms of recognition rate, we evaluate directional gray-scale derivation X (DGDx) (Adini et al. 1997), directional gray-scale derivation Y (DGDy) (Adini et al. 1997), Gaussian high-pass (GHP) (Fitzgibbon et al. 2002), histogram equalization (HE) (Pizer et al. 1987), logarithmic discrete cosine transform (LDCT) (Chen et al. 2006a, b), local normalization technology (LN) (Xie et al. 2006), Laplacian of Gaussian (LoG) (Adini et al. 1997), logarithmic transform (LT) (Adini et al. 1997), logarithmic total variation (LTV) (Du et al. 2005), self-quotient image (SQI) (Wang et al. 2004), single-scale retinex (SSR) (Jobson et al. 1997), TT (Tan et al. 2010) and ours. For PSNR, we only evaluate GHP, GIC, HE, LN, LT, SQI, SSR, TT and ours, because these methods have higher recognition rates and better visual effect than the rest when used to process frontal faces.

As shown in Table 1, the recognition rates of GIC, HE, LN, SQI, SSR and ours are 100% when the probe images are frontal faces under various illuminations processed by the respective methods. GHP, LT, LTV and TT also obtain high recognition rates, all greater than 92%. In general, all the methods preserve details or identity information to some extent, but GIC, HE, LN, SQI, SSR, and ours obtain the maximal recognition rate. In the qualitative comparisons section, we illustrate contrast face images for the aforesaid algorithms.

Table 1 Recognition rate on setting 1

Table 2 lists the PSNR of GHP, GIC, HE, LN, LT, SQI, SSR, TT and ours under 19 illuminations. As shown in Table 2, the PSNR of the original images differs across illuminations, but after illumination normalization, GHP, GIC, HE, LN, LT, SQI, SSR, TT and ours all narrow this difference and enhance the quality of the poor-lighted images. It is obvious that our method obtains the best illumination normalization result among all the algorithms listed in Table 2.

Table 2 PSNR of original and processed images by various illumination normalization methods

4.5 Qualitative comparisons

Because most of the previous methods focus on the illumination normalization of gray faces, for the sake of fairness the comparative experiments are conducted on gray faces. In Fig. 5, the faces in columns 2 to 7 are at − 15°, columns 8 to 13 at − 30°, and columns 14 to 19 at − 45°. The first row shows the original poor-lighted facial images. The second and third rows are the results of GHP and GIC separately. The fourth row is obtained from the HE method. Next, the experimental results of LN, LT, SQI, SSR, TT, DGDx, DGDy, LDCT, LoG, and LTV are shown. The final row shows the output of our method.

Fig. 5

Some gray faces under various illuminations and large head poses before and after processing by various methods. (Color figure online)

As can be observed from Fig. 5, methods such as SQI, GIC, HE, LN, SSR, LT and ours generally achieve good performance, whereas the other methods preserve only part of the details and have poor visual effect. Although SQI, GIC, HE, LN, SSR and LT achieve good results, there are some defects. For instance, though the SQI method has a high recognition rate, its visual effect is somewhat worse. From the fourth row of Fig. 5, the HE method introduces a lot of noise and cannot handle shadows effectively. Although the LN method improves the illumination of the original facial images, it cannot deal with the shadows. From the third, ninth, eleventh and fifteenth images of the third row of Fig. 5, GIC cannot handle light and cast shadows effectively.

There is a large difference in light intensity over the whole face region. Though the SSR method has good visual effect, some images are over-saturated and cast shadows are not processed effectively. Although the LT method can deal with the illumination of each facial image effectively and with fine visual effect, it cannot process cast shadows effectively and some images are over-saturated. As is obvious from the final row of Fig. 5, our method can not only deal with the illumination of each image but also keep the corresponding identity of the images under various illuminations effectively. In summary, GIC, SSR, LT and ours perform better than the other illumination normalization algorithms. Considering three aspects, recognition rate, illumination normalization effect and visual effect, our method is the best of all the approaches.

From the previous parts, methods such as GHP, GIC, HE, LN, LT, SQI, SSR, TT and ours perform better than the others. To verify their performance under less-controlled lighting variations, another set of experiments is conducted; the comparison results are shown in Fig. 6. It can be concluded that GIC, HE, LN, LT, SSR and ours have better visual effect than the others. From the third image of the third row, GIC cannot process the light. From the third image of the fourth row, HE encounters the same problem and introduces some noise. In line 5 of Fig. 6, it is obvious that all the face images still have severe shadows after processing by LN. LT and SSR are the best when judged by visual effect alone, but some of their images are over-saturated. Our algorithm is the best of all the approaches in terms of both illumination normalization and visual effect.

Fig. 6

Some gray faces under less-controlled illuminations by various methods. (Color figure online)

4.6 Ablation studies

Since our method uses neither illumination labels nor identity labels, we compare it only to algorithms that do not require any label. Because there is no label-free deep learning algorithm available for comparison, we only illustrate results of our own method. In Fig. 7, the first row shows the original facial images under various illuminations. The second row shows the synthetic facial images produced by our approach trained with the content loss only. The third row shows the synthetic faces produced by our method trained with the L1 loss only. The final row of Fig. 7 shows the result obtained from our algorithm trained with both the content and L1 losses. As shown in Fig. 7, our method can not only make the poor illumination well-lighted and uniform but also preserve the identity information of the original poor-lighted facial images effectively. Since both the content and L1 losses preserve identity well, the model trained with their combination can not only keep identity effectively but also obtain better visual effect.

Fig. 7

Some frontal color faces under various loss items by our method. (Color figure online)

To further verify the performance of our algorithm, we perform an illumination normalization experiment on non-frontal facial images under various lighting conditions. As shown in Fig. 8, the first and third rows are − 15°, − 30° and − 45° faces under various illuminations; the second row is the output of the first row, and the fourth row is the corresponding synthetic result of the third row. It is obvious that our method can not only process the illumination of color faces under large poses and poor lighting conditions but also keep the corresponding identities effectively.

Fig. 8

Some color faces under large head poses and various illuminations before and after processing by our method. (Color figure online)

In practical applications, illumination is basically less controlled. Therefore, it is necessary to verify the performance of our algorithm under less-controlled lighting variations. In Fig. 9, the first row shows the original input with poorly lighted faces and the second row the corresponding output. It is apparent that the input images are not aligned and are under various less-controlled lighting variations. The results in Fig. 9 indicate that our method can not only process the illumination of color faces under less-controlled lighting variations with various head poses but also keep their identities effectively. Though no one wears glasses in our training set, our method can still synthesize glasses, which demonstrates that our approach has a strong ability of feature retention.

Fig. 9

Verification of our algorithm on the illumination of color faces under less-controlled lighting variations

As illustrated in Fig. 10, the first row shows the original images, and the next 3 rows show the corresponding outputs at epochs 1, 5 and 13. From Fig. 10, we can conclude that our method trained with the content loss, the L1 loss, or their combination has good feature retention ability. When the content and L1 losses are combined, our method converges faster than with either loss alone. Although the content loss alone also makes our algorithm converge fast, its visual effect is not as good as that of the combination. It is apparent from Fig. 10 that our method has the advantages of rapid convergence, good feature retention, and favorable illumination normalization results.

Fig. 10

Some color face after illumination normalization by our method under various epochs and loss items. (Color figure online)

Figures 11 and 12 evaluate our method quantitatively under the 3 loss settings from epoch 1 to 14. Figure 11 illustrates the recognition rate of frontal faces under the 3 loss settings; it shows that the combination of content and L1 losses makes our model converge fast and obtain a higher recognition rate. Figure 12 shows the recognition rate of − 15° faces under the 3 loss settings, which demonstrates that our method performs better when combining the content and L1 losses than when using either loss alone. From Figs. 11 and 12, it can be concluded that our method converges fast and obtains a high recognition rate after combining the content and L1 losses.

Fig. 11

Recognition rate of frontal faces

Fig. 12

Recognition rate of − 15° faces

5 Validation for identity preserving

In this section, we discuss our algorithm’s ability to keep identity information by conducting a face recognition experiment. Table 3 shows the recognition results of our algorithm. Line 2 of Table 3 lists the recognition rates of color faces at 0°, − 15°, − 30° and − 45°. It is obvious that our method trained with the content and L1 losses obtains high recognition rates. As illustrated in Table 3, even though no identity label or illumination label is used as input, our method still preserves the corresponding identities of the original faces effectively.

Table 3 Recognition rate of faces under various head poses and lighting variations after doing illumination normalization by our method

6 Conclusion

In this study, we put forward a novel and practical deep fully convolutional neural network architecture for illumination normalization of color faces, termed IN-GAN. Our method can process not only the illumination of color face images but also that of gray face images. Furthermore, whereas existing methods mainly focus on processing the illumination of frontal or near-frontal faces, our scheme can process the illumination of both frontal and non-frontal faces. Moreover, our method can normalize the illumination of a face image while retaining identity information effectively. Finally, though our model is trained on faces under well-controlled lighting variations, it can process faces under less-controlled lighting variations and preserve identity information effectively.

In our future research, other features and geometric structure (Pareja-Corcho et al. 2020) need to be considered. We find that the number of layers and the types of connection are important in CNNs (Guo et al. 2021a). Even the discriminative model has been considered in the generative adversarial network (Guo et al. 2021b), together with multiple features (Guo et al. 2022). The authors showed that a CNN with six convolutional layers and three fully-connected layers, nine layers in total, achieved better performance in sensitivity, specificity, accuracy, and precision (Wang et al. 2020a). In Wang et al. (2020b), the parametric rectified linear unit performs better than the ordinary ReLU; the authors also verify that batch normalization overcomes the internal covariate shift and that dropout overcomes overfitting. We shall try CNN models with different numbers of layers and various types of connection in the future. ReLU has no gradient when its input is less than zero, whereas LeakyReLU keeps a small gradient there. Therefore, LeakyReLU is selected in our experiments, which gives good results.
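A tiny illustration of the gradient difference just mentioned (with arbitrary example values) is given below.

```python
# Arbitrary example values showing that ReLU passes no gradient for negative inputs
# while LeakyReLU keeps a small one.
import torch
import torch.nn.functional as F_nn

x = torch.tensor([-2.0, 3.0], requires_grad=True)
F_nn.relu(x).sum().backward()
print(x.grad)                                    # tensor([0., 1.])

x.grad = None
F_nn.leaky_relu(x, negative_slope=0.2).sum().backward()
print(x.grad)                                    # tensor([0.2000, 1.0000])
```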

Although our illumination normalization algorithm achieves preferable results in both qualitative and quantitative comparisons and has gained advantages over other algorithms, there is still a lot of future work worth pursuing:

  • To improve our network structure for preserving more texture details.

  • To train a feature extractor and classifier for the facial images after normalizing illumination by our method.

  • To extend illumination normalization to other image types, such as landscape and medical images.

  • To apply our method in the preprocessing stage of other visual analysis tasks, such as facial landmark detection and face alignment.