1 Introduction

Imaging in low illumination conditions is of interest for applications in remote sensing, underwater imaging, and night vision. Integral imaging is a three-dimensional (3D) imaging technique that incorporates the angular and intensity information from multiple viewing perspectives to reconstruct a 3D scene [1, 2]. It has been shown to provide superior performance over 2D imaging strategies in low light environments because it is optimal in a maximum likelihood sense [3,4,5,6,7,8]. The pickup process, which consists of recording images from different viewing perspectives (known as elemental images), can be accomplished using a lenslet or camera array, or by using a single camera on a moving translation stage. The 3D reconstruction of the scene can be performed either optically or computationally. Computational 3D integral imaging reconstruction is performed using the following equation:

$$I(x,y;z) = \frac{1}{O(x,y)} \sum_{a=0}^{A-1} \sum_{b=0}^{B-1} E_{a,b}\left( x - a\frac{L_{x} \times p_{x}}{c_{x} \times M},\; y - b\frac{L_{y} \times p_{y}}{c_{y} \times M} \right) + \varepsilon ,$$

where $(x, y)$ is the pixel index, $z$ is the reconstruction distance, $O(x, y)$ is the overlapping number at $(x, y)$, and $A$ and $B$ are the total number of elemental images obtained in each column and row, respectively. $E_{a,b}$ is the elemental image in the $a$-th column and $b$-th row, and $L_x$ and $L_y$ are the total number of pixels in each column and row, respectively, of each $E_{a,b}$. $M = z/f$ is the magnification factor, $f$ is the focal length, $p_x$ and $p_y$ represent the pitch between image sensors, $c_x$ and $c_y$ are the dimensions of the image sensor, and $\varepsilon$ is the additive read noise. Figure 1 shows the diagram for optical pickup and computational 3D reconstruction of integral imaging [3].

Fig. 1

Diagram of integral imaging: (a) optical pickup of elemental images and (b) 3D computational reconstruction. p is the pitch between sensors, g is the focal length, and c is the sensor size [3]
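The reconstruction equation of Sect. 1 can be read as a shift-and-sum operation: each elemental image is shifted by a depth-dependent number of pixels, the shifted images are summed, and the sum is normalized by the overlapping number. The following is a minimal NumPy sketch of that reading, not the implementation used in [3]. The function name, the array layout `(A, B, Ly, Lx)`, the rounding of shifts to whole pixels, and the enlarged output canvas are all assumptions made for illustration. The read noise ε is not modeled separately, since it is already contained in the recorded elemental images.

```python
import numpy as np


def reconstruct_plane(elemental, z, f, pitch, sensor_size):
    """Shift-and-sum reconstruction of a single depth plane (illustrative).

    elemental   : array of shape (A, B, Ly, Lx); elemental[a, b] is E_{a,b}
    z, f        : reconstruction distance and focal length (same units)
    pitch       : (p_x, p_y), pitch between sensor positions
    sensor_size : (c_x, c_y), sensor dimensions (same units as pitch)
    """
    elemental = np.asarray(elemental, dtype=float)
    A, B, Ly, Lx = elemental.shape
    M = z / f  # magnification factor

    # Per-index pixel shifts L*p/(c*M), rounded to whole pixels (assumption).
    sx = int(round(Lx * pitch[0] / (sensor_size[0] * M)))
    sy = int(round(Ly * pitch[1] / (sensor_size[1] * M)))

    # Canvas large enough to hold every shifted elemental image.
    total = np.zeros((Ly + (B - 1) * sy, Lx + (A - 1) * sx))
    overlap = np.zeros_like(total)  # O(x, y): number of images per pixel

    for a in range(A):
        for b in range(B):
            y0, x0 = b * sy, a * sx
            total[y0:y0 + Ly, x0:x0 + Lx] += elemental[a, b]
            overlap[y0:y0 + Ly, x0:x0 + Lx] += 1

    # Normalize by the overlapping number; uncovered pixels stay at zero.
    return np.divide(total, overlap, out=np.zeros_like(total), where=overlap > 0)
```

Because the shift is inversely proportional to M = z/f, objects located at the chosen distance z add up in register while objects at other depths are blurred. Averaging over the overlapping images is also what suppresses read noise in the reconstructed plane.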

2 Results and discussion

In [3], 3D integral imaging-based low illumination object visualization and detection was presented using a conventional low-cost and compact CMOS sensor on a moving translation stage to capture the perspective elemental images of the scene. Thirty-six elemental images were recorded of a person standing behind an occluding tree branch in low illumination conditions. Following image acquisition, computational 3D integral imaging reconstruction was performed, and the Total Variation (TV) denoising algorithm [9] was then applied to the 3D reconstructed image. After denoising, the Viola-Jones object detection framework [10] was applied to the reconstructed image and successfully detected the face. Experimental results under two low illumination conditions are shown in Fig. 2. Analysis of these results showed a reduction in entropy for the 3D reconstructed images, as well as an increase in the signal-to-noise ratio (SNR), in comparison to traditional 2D imaging [3]. This overviewed work [3] demonstrated 3D integral imaging for object visualization and detection in poor illumination conditions without the need for photon-counting or cooled CCD cameras, and it enabled detection of faces that was not possible in the conventional 2D images.

Fig. 2

Experimental results for two low light illumination conditions. (a, d) Read-noise-limited 2D elemental images. (b, e) Reconstructed 3D images with the faces detected. (c, f) Detected faces from (b, e), respectively, after total variation denoising [3]
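To make the denoising and entropy steps of the pipeline in [3] concrete, the sketch below shows one simple way to perform TV denoising and to compute an image entropy with NumPy. It is an illustration under stated assumptions, not the algorithm of [9] or the metric used in [3]. The denoiser minimizes a smoothed Rudin-Osher-Fatemi style objective by plain gradient descent, and the entropy is a histogram-based Shannon entropy. The function names and the parameters `weight`, `eps`, `n_iter`, `bins`, and `value_range` are hypothetical. Face detection is omitted.

```python
import numpy as np


def tv_denoise(image, weight=0.2, eps=0.05, n_iter=300):
    """Gradient descent on 0.5*||u - f||^2 + weight * sum(sqrt(|grad u|^2 + eps^2)).

    This smoothed-TV scheme is an illustrative stand-in for TV denoising.
    """
    f = np.asarray(image, dtype=float)
    u = f.copy()
    # Step size from a Lipschitz bound of the gradient: L <= 1 + 8*weight/eps.
    tau = 1.0 / (1.0 + 8.0 * weight / eps)
    for _ in range(n_iter):
        # Forward differences; the last column/row of the gradient is zero.
        gx = np.diff(u, axis=1, append=u[:, -1:])
        gy = np.diff(u, axis=0, append=u[-1:, :])
        norm = np.sqrt(gx**2 + gy**2 + eps**2)
        px, py = gx / norm, gy / norm
        # Backward-difference divergence (negative adjoint of the gradient).
        div = np.diff(px, axis=1, prepend=0.0) + np.diff(py, axis=0, prepend=0.0)
        u -= tau * ((u - f) - weight * div)
    return u


def shannon_entropy(image, bins=256, value_range=(0.0, 1.0)):
    """Shannon entropy (in bits) of the intensity histogram of an image."""
    hist, _ = np.histogram(image, bins=bins, range=value_range)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())
```

In this sketch, a larger `weight` removes more noise at the cost of contrast, and a smaller `eps` approximates true total variation more closely but forces a smaller step size, so more iterations are needed.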

More recently, convolutional neural networks (CNNs) have been used for 3D integral imaging-based object recognition in very low illumination conditions [5]. In this overviewed work, 3D integral imaging is used to improve the SNR, followed by TV denoising. Regions of interest are then extracted from the denoised reconstructed image using the Viola-Jones face detection framework [10] and input into a CNN. The input to the CNN is a 2D slice of the reconstructed volume, with several different depths selected separately from each scene. The CNN is trained on different low illumination conditions and then performs object recognition on 3D reconstructed images taken at an unknown illumination condition [5].

During the experiments, 72 elemental images were obtained using an Allied Vision Mako-192 camera on a translation stage in varying illumination conditions. Six human subjects were used in the experiments and were located 4.5 m from the camera array. The scene illumination was altered by adjusting the intensity of the light source. Data were collected at 17 different illumination conditions for each of the 6 subjects. The images were reconstructed at different depths between 4 and 5 m using a step size of 50 mm. From each of the reconstructed images, a region of interest was extracted using the Viola-Jones face detector to be input into the network. The data were then split by illumination condition: 4 randomly chosen illumination conditions were held out of the training procedure for testing, and the remaining 13 were used for training the CNN. To increase the size of the training set, data augmentation was applied to the extracted regions of interest. After data augmentation, a total of 29,232 images were used for training the network. An overview of the CNN-based classification scheme is shown in Fig. 3. Using this scheme, 100% classification accuracy was achieved for object recognition among the 6 subjects in very low illumination conditions.

Fig. 3

Overview of the classification procedure using a CNN for low light object recognition [5]
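The data partitioning described for [5] holds out entire illumination conditions, so the network is tested on light levels it never saw during training. The sketch below illustrates such a split together with the list of reconstruction depths. It is an illustration only: the function names, the integer condition indices, the random seed, and the inclusion of both the 4 m and 5 m end points (which gives 21 planes) are assumptions not stated in the source.

```python
import random


def split_by_illumination(n_conditions=17, n_test=4, seed=0):
    """Hold out n_test whole illumination conditions for testing."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    conditions = list(range(n_conditions))
    test = sorted(rng.sample(conditions, n_test))
    train = [c for c in conditions if c not in test]
    return train, test


def depth_planes_mm(start_mm=4000, stop_mm=5000, step_mm=50):
    """Reconstruction depths in millimetres, both end points included (assumption)."""
    return list(range(start_mm, stop_mm + 1, step_mm))


train_conditions, test_conditions = split_by_illumination()
depths = depth_planes_mm()
```

Splitting by condition rather than by individual image keeps near-duplicate reconstructions of the same scene at the same light level from appearing in both the training and testing sets.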

3 Conclusions

In summary, we have overviewed recent works [3, 5] on integral imaging-based 3D object detection and recognition in low illumination conditions. 3D integral imaging improves the SNR over conventional 2D images in photon-starved conditions. Following 3D reconstruction, TV denoising further improves the image quality, after which faces can be detected using the Viola-Jones face detector, which fails on the read-noise-dominated conventional 2D images. The detected faces can be recognized using a CNN for classification [5]. Continued research on integral imaging-based 3D object detection and recognition in low illumination conditions includes work with highly sensitive imaging sensors, such as scientific CMOS and electron multiplying CCD cameras, and work in low light polarimetric imaging [6,7,8].