1 Introduction

Three-dimensional (3D) displays can be applied to a wide range of fields, such as television, medicine, education, automobiles, digital signage, and games, and various display methods and techniques to improve display characteristics are actively being researched and developed. Research on 3D displays has a long history, and their origin is said to be the stereoscope [1], which was proposed in 1838. A stereoscope is a stereoscopic display that presents different images simultaneously to the right and left eyes. This allows the viewer to perceive binocular parallax and view 3D images. Nowadays, stereoscopic displays using 3D glasses based on the polarization method or active shutter method are widely used. In a stereoscopic display, the number of pixels in a 3D image is half of the total number of pixels in the display device, which has the advantage of enabling viewing of 3D images with high pixel density. However, there are several problems such as the inconvenience of wearing 3D glasses and visual fatigue caused by the mismatch between convergence and accommodation [2]. In addition, because motion parallax is not presented, users may feel that what they see is unnatural when moving the viewpoint.

Holography and light field displays, which are based on the concept of faithfully reproducing light rays from objects, have been researched and developed as 3D display methods that have the potential to solve various problems of stereoscopic displays. Holography was first proposed by Gábor Dénes in 1948 [3]. This method is based on wavefront reproduction, which reproduces all light ray information, such as light intensity, wavelength, and phase, to form an optical image of objects in space. In principle, it satisfies all the physiological factors of stereopsis, such as convergence, accommodation, and binocular parallax, and is, therefore, considered the ultimate 3D display method. Smooth motion parallax is perceived when the viewer moves, and because parallax exists in both the horizontal and vertical directions, 3D images can be viewed even when the viewer’s face is tilted. However, holography requires coherent light, such as a laser, and to obtain a wide viewing angle, an ultrahigh pixel density display device with a pixel size equivalent to a wavelength is required. It is anticipated that it will take a long time for research and development to solve these problems and achieve practical applications in a wide range of fields.

On the other hand, a light field display is a 3D display method based on ray reproduction, which reproduces the emitting position, traveling angle, intensity, and wavelength of light rays. Although the spatial frequency characteristics of 3D images at deep depth positions are generally lower than those of holography, coherent light and display devices with a pixel size equivalent to a wavelength are not required. Therefore, it is expected to be practical for use in many fields sooner than holography. Light field displays originated from integral photography [4], a 3D photographic technique proposed by Gabriel Lippmann in 1908. The term “light field” was first used by Andrey Gershun in 1936 as a vector function to describe light rays [5]. In the 1990s, concepts, such as the plenoptic function [6], which defines light rays as a seven-dimensional function, and light field rendering [7], which is an image-based rendering technique, were proposed one after another. Paralleling the progress of computer technology, the research and development of 3D imaging and display technology based on light field reproduction have become more active. Although light field displays do not require a display device with a pixel size equivalent to a wavelength, as in holography, a large amount of image information is required to display 3D images with high display characteristics. To improve the display characteristics of 3D images, various methods and techniques based on light field reproduction have been researched and developed.

Based on the direction in which parallax is presented, light field displays can be classified into two types: horizontal parallax and full parallax. In the horizontal parallax type, light rays are diffused in the vertical direction; therefore, an optical image is not exactly formed in space. The advantage of this type is that, when the same display device is used, it can display 3D images with higher display characteristics, such as pixel density, than the full-parallax type, because the image information is distributed only in the horizontal direction. The disadvantages are that the 3D image cannot be viewed when the viewer’s face is tilted and that geometric distortions occur in the 3D image when it is observed away from the specified viewing distance. In the full-parallax type, it is necessary to distribute image information both horizontally and vertically, making it difficult to achieve high display characteristics for 3D images. However, 3D images can be viewed from any position, including when the viewer’s face is tilted. Therefore, compared to the horizontal parallax type, the full-parallax type is expected to be applied to a wider range of fields in the future. This paper reviews full-parallax light field display methods, namely the integral 3D display, Aktina Vision, and the compressive light field display, while also referring to horizontal parallax-type light field displays. The basic principles of these methods and the latest research results on technology for improving the display characteristics of 3D images are reported.

2 Integral 3D display

The integral 3D display is a light field display method that is based on a 3D photographic technique called integral photography, which was proposed in 1908 [4]. Integral photography uses a photographic plate to record and display elemental images and reproduce 3D still images. In an integral 3D display, by contrast, the photographic plate is replaced by electronic devices. Using an image sensor for capturing and a display device such as a liquid crystal display (LCD) or organic light-emitting diode (OLED) display for displaying, it is possible to capture and display 3D videos. The basic configuration of an integral 3D display is illustrated in Fig. 1. It should be noted that only the principal rays are illustrated in this figure for simplicity. A lens array is placed in front of the image sensor while capturing an object, as shown in Fig. 1a. A lens array is an optical element consisting of many tiny lenses arranged in a two-dimensional array. By capturing the object through the lens array, a tiny individual image is captured by each lens. These are referred to as elemental images. For display, the elemental images are displayed on the display device with the lens array placed in front of it, as shown in Fig. 1b. As a result, light rays traveling in the direction opposite to that of the capture are reproduced. These light rays form an optical image of the object in space, and thus a full-parallax 3D image can be viewed.

Fig. 1 Basic principle of integral 3D display. a Capturing. b Displaying

A technology similar to the integral 3D display is the light field display using a lenticular lens. Lenticular lenses have conventionally been used in stereoscopic displays and multi-view 3D displays. While these conventional 3D displays form a small number of viewpoints, the lenticular lens-based light field display reproduces dense light rays in many directions to reproduce light rays from an object, similar to an integral 3D display. Because the lenticular lens has a structure that is periodic only in the horizontal direction, the parallax of the 3D image is only in the horizontal direction. Because it is a horizontal parallax light field display, there are problems, such as the inability to view 3D images when the viewer’s face is tilted, as described in Sect. 1. However, it has the advantage of being able to display 3D images with a higher pixel density than integral 3D displays, because it does not need to distribute the image information in the vertical direction.

Indicators of the display characteristics of integral 3D images are the maximum number of pixels, depth range, and viewing angle. In general, the maximum pixel number in 3D images is the same as the number of elemental lenses constituting the lens array. Next, the depth range mainly depends on the pixel pitch of the display device, the lens pitch, and the focal length of the lens array. To explain the depth range in detail, a graph of the resolution limit of integral 3D images [8] is shown in Fig. 2. The horizontal axis of the graph represents the depth distance of the 3D image with respect to the lens array plane, and the vertical axis represents the maximum spatial frequency \({\beta }_{\mathrm{max}}\) of the 3D image when observed from a predetermined viewing distance. The upper limit of the maximum spatial frequency of the 3D image is determined by the Nyquist frequency \({\beta }_{\mathrm{nyq}}\). The Nyquist frequency is expressed by the following equation:

$$\beta_{\mathrm{nyq}} = \frac{L}{2p}, \tag{1}$$

where \(L\) is the viewing distance from the lens array plane and \(p\) is the lens pitch of the lens array. The shorter the lens pitch, the higher the Nyquist frequency. As shown in Fig. 2, when the depth of the 3D image is shallow, the maximum spatial frequency is limited by the Nyquist frequency and does not vary with the depth distance. The maximum spatial frequency decreases as the depth of the 3D image increases, because the density of the light rays that form the 3D image decreases as they move away from the lens array surface. The maximum spatial frequency at deep positions is mainly determined by the pixel pitch of the elemental images and focal length of the lens array. To improve the maximum spatial frequency at deep depth positions, it is necessary to reduce the pixel pitch or increase the focal length. Finally, the viewing angle \(\theta\) of the 3D image is determined by the positional relationship between the elemental images and lens array. The viewing angle is expressed by the following equation:

$$\theta = 2\tan^{-1}\left(\frac{p}{2f}\right), \tag{2}$$

where the distance between the display device and lens array is the focal length \(f\), and the shapes of the elemental images and elemental lenses are identical.
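As a quick numerical illustration of Eqs. (1) and (2), the following minimal sketch evaluates both expressions. The function names and the example values of the viewing distance, lens pitch, and focal length are hypothetical and chosen only for illustration. The Nyquist frequency is read here as cycles per radian seen from the viewer, on the assumption that \(L/(2p)\) is the reciprocal of twice the angular lens pitch \(p/L\).

```python
import math


def nyquist_frequency(viewing_distance, lens_pitch):
    """Eq. (1): beta_nyq = L / (2 p); both lengths in the same unit."""
    return viewing_distance / (2.0 * lens_pitch)


def viewing_angle_deg(lens_pitch, focal_length):
    """Eq. (2): theta = 2 * atan(p / (2 f)), returned in degrees."""
    return math.degrees(2.0 * math.atan(lens_pitch / (2.0 * focal_length)))


# Hypothetical example values in metres (not taken from any cited system).
L = 1.0       # viewing distance from the lens array plane
p = 1.0e-3    # lens pitch
f = 2.0e-3    # focal length (gap between display device and lens array)

print(nyquist_frequency(L, p))        # Nyquist limit for this lens pitch
print(viewing_angle_deg(p, f))        # viewing angle in degrees

# Halving the lens pitch doubles the Nyquist frequency but, for a fixed
# focal length, narrows the viewing angle.
print(nyquist_frequency(L, p / 2), viewing_angle_deg(p / 2, f))
```

The last line makes the dependence on the lens pitch explicit: a finer pitch raises the resolution limit at shallow depths while reducing the viewing angle unless the focal length is shortened as well.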

Fig. 2 Maximum spatial frequency characteristics of a 3D image

There is a trade-off between the display characteristics of 3D images, such as the maximum number of pixels, depth range, and viewing angle. Therefore, a large amount of image information is required to improve the overall display characteristics. With this background, integral 3D displays have been developed that use display devices with a large number of pixels, such as ultra-high-definition projectors. For example, in 2010, a display system was developed that can display 3D images with approximately 100,000 pixels using a single projector with full 8 K resolution [9]. In addition, as a way to improve the display characteristics of 3D images, methods that use multiple display devices have been researched and developed [10,11,12]. For example, by tiling multiple direct-view display devices and using a dedicated multi-image combining optical system, the gaps caused by the bezels around the display devices can be eliminated and 3D images with an increased screen size and number of pixels can be displayed [12].
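The trade-off can be made concrete with a simple pixel budget. The sketch below assumes that each elemental image occupies a block of display pixels, one per reproduced ray direction, and that each elemental lens contributes one pixel of the 3D image. The function name, the 7680 × 4320 panel size, and the 18 × 18 elemental image size are hypothetical illustration values, not the configuration of the system in [9].

```python
def pixel_budget(display_w, display_h, rays_h, rays_v):
    """Split the display pixels between 3D-image pixels and ray directions.

    rays_h x rays_v display pixels are assigned to each elemental image
    (one per ray direction); each elemental lens yields one 3D-image pixel.
    Returns (lenses_h, lenses_v, total_3d_pixels).
    """
    lenses_h = display_w // rays_h
    lenses_v = display_h // rays_v
    return lenses_h, lenses_v, lenses_h * lenses_v


# Hypothetical 7680 x 4320 display device.
full_parallax = pixel_budget(7680, 4320, 18, 18)   # rays in both directions
horizontal_only = pixel_budget(7680, 4320, 18, 1)  # rays only horizontally

print(full_parallax)      # (426, 240, 102240)
print(horizontal_only)    # (426, 4320, 1840320)
```

Setting `rays_v` to 1 corresponds to a horizontal parallax display, which keeps the full vertical pixel count. Increasing the number of ray directions per lens at a fixed display resolution directly reduces the number of 3D-image pixels, which is why a larger total amount of image information is needed to improve all characteristics at once.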

In general, in an integral 3D display using a direct-view display device, light rays from each elemental image enter not only the elemental lens facing them but also the adjacent ones, resulting in a repeating image when the viewer observes the display screen from outside the viewing zone. On the other hand, in an integral 3D display that uses a projector to project elemental images directly onto a lens array, the light rays of each elemental image enter only the corresponding elemental lens; thus, there is no repeating image. Based on this feature, methods have been proposed to improve the display characteristics of 3D images using multiple projectors [13,14,15]. For example, by projecting elemental images from each projector at a predetermined angle onto the lens array, the viewing zone of the 3D image is continuously combined, and the viewing angle is expanded [13]. It is also possible to improve the pixel density of 3D images by overlapping the viewing zones of 3D images displayed by each projector [14]. A method to efficiently improve both the pixel density and viewing angle of 3D images has also been proposed [15]. These projection-type integral 3D displays can improve the display characteristics of 3D images as the number of display devices increases. However, there is the problem that the scale of the display system becomes larger.

As another approach to improving the display characteristics of 3D images, time-division multiplexing techniques have also been proposed [16,17,18,19]. For example, methods have been proposed to change the position of the lens array physically [16, 17] or to change the aperture position of the elemental lenses electronically by placing a liquid crystal mask close to the lens array [19]. In addition, eye-tracking techniques for integral 3D displays have been studied [20, 21]. In this approach, a lens array with a long focal length is used to increase the depth range of the 3D image in exchange for a narrower viewing angle. A camera then detects the position of the viewer’s eyes, and a computer generates elemental images in real time such that the viewing zone is formed in the direction of the viewer. Thus, a wide depth range and a wide effective viewing angle can be achieved simultaneously.
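The idea of steering the viewing zone toward a tracked viewer can be illustrated with a simplified pinhole geometry. The sketch below is only an assumption-laden illustration and not the algorithm of [20, 21]: each elemental lens is treated as a pinhole, the display device lies a gap \(g\) behind the lens array, and the centre of each elemental image is placed where the line from the eye through the lens centre meets the display plane. The function name and all numerical values are hypothetical.

```python
def elemental_image_center(lens_x, eye_x, eye_distance, gap):
    """Lateral centre of the elemental image behind one lens (pinhole model).

    lens_x:       lateral position of the lens centre on the lens array plane
    eye_x:        lateral position of the tracked eye
    eye_distance: distance from the lens array plane to the eye
    gap:          distance from the lens array plane to the display device
    Similar triangles: the line eye -> lens centre is extended by `gap`.
    """
    return lens_x + (lens_x - eye_x) * gap / eye_distance


# Hypothetical values in metres.
gap = 2.0e-3
eye_distance = 1.0

# Viewer on the optical axis, then 10 cm to the right.
for eye_x in (0.0, 0.1):
    centers = [elemental_image_center(x, eye_x, eye_distance, gap)
               for x in (-0.01, 0.0, 0.01)]
    print(eye_x, centers)
```

When the eye moves to the right, every elemental image centre shifts to the left by the same amount, so the narrow viewing zone of a long-focal-length lens array follows the viewer.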

Color moiré is a major problem in integral 3D displays that use direct-view display devices. Color moiré is caused by interference between the periodic structure of the lens array and the pixels of the direct-view display device. A method for shifting the moiré component to the high-frequency region by tilting the lens array has been proposed [22]. Although this method does not eliminate the color moiré, it can make it less noticeable. The use of a diffuser or the defocusing effect has been proposed as a method to reduce the color moiré effect [23]. However, this method increases the crosstalk between the reproduced light rays and may degrade the spatial frequency characteristics of 3D images. To solve this problem, a method for optically synthesizing the displayed images of three integral 3D displays has been proposed [24]. In this method, the position of each integral 3D display and the phase of the color moiré are adjusted with high accuracy to cancel out the color moiré and improve the pixel density of the 3D image.

Integral 3D displays generally use a lens array; however, a point light source (PLS) array and an aperture array can also be used to display 3D images. The advantage of using a PLS array is that it is easy to apply a time-division multiplexing technique and to implement a 3D/2D conversion function. In a PLS array-type integral 3D display, elemental images are generally displayed on a transmissive LCD panel and a PLS array is placed behind the panel as a backlight. The PLS array can be a self-emitting light source, such as a light-emitting diode (LED) array or a pseudo PLS array, generated by projecting collimated light rays onto a lens array. As an example of a time-division multiplexing technique, a method has been proposed to improve the pixel density and viewing angle of 3D images by switching the light emission/non-light emission of the PLS on a frame-by-frame basis and synchronously displaying the corresponding elemental images [25]. In a 3D/2D convertible integral 3D display, the backlight status is switched between the PLS array and diffuse light. Several backlight switching methods have been proposed, including polymer-dispersed liquid crystals [26], dual-layer LED arrays [27], polarization-selective diffusers [28], edge-lit light guide plates [29], and geometric phase lens arrays [30].

As described above, various display methods and techniques to improve display characteristics are being researched and developed based on integral 3D displays. The basic system configuration of an integral 3D display is simple and consists of a lens array placed in front of a display device. This makes it possible to develop a thin and compact display device, which is promising for portable 3D displays.

3 Aktina Vision

We have developed a display method called Aktina Vision [31, 32], which is a light field display based on multi-view projection. The first 3D display based on multi-view projection was proposed by Herbert E. Ives in 1929 [33]. In the multi-view projection method, multiple projectors are generally arranged in the horizontal direction, and the viewpoint images corresponding to each projection direction are projected from behind onto a diffusing screen and superimposed there to display 3D images with horizontal parallax. The diffusing screen has anisotropic diffusion characteristics with a narrow horizontal diffusion angle and a wide vertical diffusion angle, and it is used to widen the incoming discrete rays slightly to produce light rays with a continuous luminance distribution. This screen allows the viewer to perceive motion parallax, because the displayed image changes continuously when the viewer moves horizontally, and to view the same 3D image without darkening of the display screen when the viewer moves vertically. In addition to rear projection methods [34,35,36], a front projection method using retroreflective material [37] and 360-degree displays with a cylindrical diffusion screen [38] or a concave cone screen [39] have been proposed as display system configurations. In a configuration where a condenser lens is placed near the diffusing screen, light rays are focused at a predetermined viewing distance [36]. In this configuration, the maximum number of pixels in the 3D image is the same as the number of pixels in the viewpoint image when the viewer observes the 3D image from the specified viewing distance [40]. The multi-view projection method can increase the overall amount of image information by using display devices with a large number of pixels and by increasing the number of display devices; therefore, this method can easily display 3D images with high display characteristics.

As mentioned above, the conventional multi-view projection method displays 3D images with only horizontal parallax. Therefore, 3D images cannot be viewed when the viewer’s face is tilted, which is a common problem with horizontal parallax 3D displays. To solve this problem, we developed a display method called Aktina Vision, whose basic configuration is shown in Fig. 3. The display system consists of projectors, imaging lenses, condenser lenses, and a 3D screen. Each projector projects two-dimensionally arrayed viewpoint images. The light rays of each viewpoint image enter the corresponding imaging lens and are magnified. The direction of travel is changed by the first condenser lens and all the viewpoint images are superimposed on the 3D screen. As a result, each viewpoint image is imaged on the 3D screen, and high-density light rays diffused by the 3D screen form an optical image in space. This allows the observer to view full-parallax 3D images, as shown in Fig. 4. A second condenser lens is used to control the viewing zone. The light rays of each viewpoint image are focused at a specified optimum viewing distance. The display characteristics of the 3D image are maximized when observed at this optimum viewing distance. Because Aktina Vision forms an optical image in space, 3D images can be viewed without geometric distortions even when the viewing distance is far from the optimal viewing distance.

Fig. 3 Basic configuration of Aktina Vision

Fig. 4 Displayed 3D image from different viewpoints

A 3D screen is a special isotropic diffusion screen with a narrow diffusion angle and top-hat diffusion characteristics. The structure and function of the 3D screen are illustrated in Fig. 5a. The 3D screen is composed of a microlens array that diffuses light rays by refraction of the lenses, resulting in diffusion characteristics close to a top-hat shape. In conventional multi-view projection methods, a holographic screen is mainly used as an anisotropic diffusion screen. Conventional screens have a Gaussian distribution of diffusion characteristics, which causes a relatively large crosstalk between neighboring light rays, resulting in a decrease in the spatial frequency characteristics of 3D images at deep depth positions, as shown in Fig. 5b. Conversely, the top-hat diffusion of the 3D screen reduces the crosstalk between light rays and increases the spatial frequency characteristics of the 3D image at deep depth positions, as shown in Fig. 5c. To finely control the diffusion angle of the 3D screen, the surface of the microlens array is filled with ultraviolet (UV) curable resin with a slightly different refractive index. By controlling the difference in refractive indices between the lens array and the filled part, the diffusion angle of the 3D screen can be adjusted. Through these developments, we realized a light field display capable of displaying full-parallax 3D images with high pixel density.
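The dependence of the diffusion angle on the refractive index difference can be sketched with a paraxial model. The following is an idealized estimate under stated assumptions, not the design procedure of the actual 3D screen: collimated light, a single spherical interface of radius \(R\) between the lens material and the filling resin, a flat exit surface into air, and no diffraction or aberration. Under these assumptions a ray at height \(h\) is deflected in air by about \(h\,\Delta n / R\), so the full diffusion angle for a lens of pitch \(p\) is about \(p\,\Delta n / R\). The function name and all numerical values are hypothetical.

```python
import math


def diffusion_angle_deg(lens_pitch, radius, n_lens, n_fill):
    """Paraxial full diffusion angle (degrees) of a resin-filled microlens.

    A ray at height h is deflected by about h * |n_lens - n_fill| / radius
    in air, so rays across the aperture [-p/2, +p/2] span p * delta_n / radius.
    """
    delta_n = abs(n_lens - n_fill)
    return math.degrees(lens_pitch * delta_n / radius)


# Hypothetical microlens: 100 um pitch, 200 um radius of curvature.
pitch = 100e-6
radius = 200e-6

for n_fill in (1.48, 1.46, 1.40):
    print(n_fill, diffusion_angle_deg(pitch, radius, 1.50, n_fill))
```

In this model the diffusion angle scales linearly with the index difference, which is consistent with using a resin of slightly different refractive index for fine adjustment. Because the deflection is linear in the ray height, a uniformly illuminated aperture maps to a nearly uniform angular distribution, which gives some intuition for the top-hat shape.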

Fig. 5 Principle of a 3D screen. a Function of the 3D screen. b Diffusion characteristics and crosstalk between light rays of Gaussian diffusion screen and c 3D screen

We developed a display system with a maximum of approximately 330,000 pixels, based on Aktina Vision [31]. Similar to the maximum spatial frequency characteristics of the integral 3D display shown in Fig. 2, the spatial frequency of the 3D image displayed by Aktina Vision is high at shallow depth distances and decreases at deeper depth distances with respect to the 3D screen plane. The spatial frequency at a shallow depth distance can be improved by increasing the pixel density of the multi-view images, and the spatial frequency at a deeper depth distance can be improved by increasing the projection density of the multi-view images. Based on these characteristics, to further improve the spatial frequency characteristics of 3D images, we are developing two types of time-division multiplexing techniques that shift light rays spatially [32]. Although 3D display methods based on multi-view projection, including Aktina Vision, can display 3D images with high pixel density, the scale of the display system is large. Therefore, it is more promising as a display method for stationary large-screen 3D displays than for portable displays.

4 Compressive light field display

The compressive light field display is capable of displaying full-parallax 3D images based on a different principle from that of the integral 3D display and Aktina Vision. In this method, special pattern images are displayed on several different depth planes. Because the overlapping pixels of the pattern images differ depending on the viewing position, a 3D image based on light field reproduction is displayed. Depth-fused 3D (DFD) [41] is a 3D display method with a similar system configuration, although it is not a light field display. In DFD, images are displayed on two depth planes arranged one behind the other, and the luminance ratio of the two images is controlled to give the impression that there is a 3D image between the two planes when viewed from the front. In this way, DFD does not form optical images, but instead relies on the phenomenon of stereoscopic illusion to create the perception of 3D images. Conversely, the compressive light field display forms an optical image in space and can display a full-parallax 3D image based on light field reproduction. The basic configuration of this method is shown in Fig. 6. The display device consists of several transmissive LCDs stacked face to face and a backlight. Consider a light ray emitted from position \(x\) on the backlight surface at an angle \(\theta\) and passing through three LCDs, and let the luminance of this light ray be \(L\left(x, \theta \right)\). In addition, let \({p}_{1}\), \({p}_{2}\), and \({p}_{3}\) be the transmittances of the LCD pixels that this light ray intersects. Then, \(L\left(x, \theta \right)\) is the luminance of the light ray emitted from the backlight multiplied by the transmittances \({p}_{1}\), \({p}_{2}\), and \({p}_{3}\). Similarly, the luminance of every light ray in the viewing zone is expressed in terms of the transmittances of the LCD pixels, and the transmittances are determined such that the luminance error between these light rays and the light rays of the desired 3D image is minimized. A compressive light field display can display 3D images with high pixel density. For example, if a display system is configured with two LCDs, and a 3D image is displayed at the same depth distance as either LCD, the maximum spatial frequency of the 3D image is the same as that of the LCD [42]. The maximum spatial frequency of the 3D image gradually decreases as the depth distance of the 3D image increases from the LCD surface.
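The multiplicative model described above can be sketched in a one-dimensional ("flatland") form. The code below is a minimal illustration under simplifying assumptions that are not part of the original description: all LCDs share the same pixel pitch, refraction and pixel apertures are ignored, and a ray that misses a panel is assigned zero luminance. The function name, the panel contents, and the geometry are hypothetical.

```python
import math


def ray_luminance(x, theta, layers, depths, pitch, backlight=1.0):
    """Luminance L(x, theta) of a ray after passing through stacked LCDs.

    x, theta:  emitting position on the backlight and travelling angle (rad)
    layers:    one list of pixel transmittances (0..1) per LCD
    depths:    distance of each LCD from the backlight surface
    pitch:     pixel pitch, assumed identical for all LCDs
    """
    luminance = backlight
    for transmittance, depth in zip(layers, depths):
        hit = x + depth * math.tan(theta)       # where the ray crosses this LCD
        index = int(math.floor(hit / pitch))    # pixel intersected by the ray
        if not 0 <= index < len(transmittance):
            return 0.0                          # ray misses the panel
        luminance *= transmittance[index]       # multiply by p_k
    return luminance


# Three hypothetical two-pixel LCDs at depths 1, 2, and 3 (arbitrary units).
layers = [[1.0, 0.5], [0.5, 1.0], [1.0, 0.8]]
depths = [1.0, 2.0, 3.0]

print(ray_luminance(0.5, 0.0, layers, depths, pitch=1.0))
print(ray_luminance(0.2, math.atan(0.3), layers, depths, pitch=1.0))
```

The two rays start within the same backlight pixel region but intersect different pixels on the third LCD, so their luminances differ. This direction dependence is what allows a small number of pattern images to encode many light rays.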

Fig. 6 Basic configuration of compressive light field display

The compressive light field display is a relatively new display method that has been an active research topic since approximately 2010, triggered by the publication of research by the Massachusetts Institute of Technology [42,43,44,45,46]. In 2012, a unified optimization framework was proposed to represent the reproduced light field by tensors, considering the application of time-division multiplexing and directional backlighting [44]. After the title of that paper, a compressive light field display using stacked LCDs is also called a tensor display. The light field can also be reproduced by adding the luminance of the pixels, instead of multiplying the transmittance of the pixels. For example, a method using two layers of holographic optical elements (HOEs) and two projectors has been proposed [47]. The HOEs are designed to reflect and diffuse light rays incident from a predetermined angle toward the frontal direction while transmitting light rays incident from other angles. Different images can be displayed on the two HOEs by projecting images from two projectors at different angles, and these images can be synthesized optically. One of the challenges of the multiplicative-type compressive light field display is that the reproduced light rays are attenuated by the LCDs, so the displayed 3D image becomes dark. The problem with the additive type is that it requires special optical elements or a time-division multiplexing technique, which makes the system configuration more complicated. A common problem for the multiplicative and additive types is that the viewing zone is limited compared with other light field display methods.

In compressive light field displays, the accuracy of light ray reproduction, such as light ray direction and luminance, can be improved by increasing the number of layers on which images are displayed. For example, a method using a single LCD and two layers of geometric phase lenses has been proposed to increase the number of layers using time-division multiplexing [48]. In this method, the position at which the virtual image of the LCD image is formed is switched by time division, and the corresponding pattern image is synchronously displayed on the LCD to display the 3D image based on an additive-type compressive light field display. Methods to increase the number of layers and improve the accuracy of light ray reproduction using various optical elements such as a reflective polarizer [49] and half mirror [50] have also been studied.

Two types of methods have been proposed for generating pattern images to be displayed in each layer: analytical methods using non-negative tensor factorization [44] or non-negative least squares [47], and methods based on convolutional neural networks (CNNs) [51]. The analytical method is more accurate for generating images, whereas the CNN-based method generates images more rapidly. The luminance values of the light rays reproduced by the compressive light field display always contain errors compared to the light rays of the desired 3D image to be reproduced. Comparing the multiplicative and additive methods of light field reproduction, it has been reported from theoretical analysis that the accuracy of the reproduced light rays is higher with the multiplicative type [51]. To generate pattern images, it is necessary to provide data on the target light field, such as multi-view images. The relationship between the target light field and the display quality of 3D images has been analyzed, and it has been reported that the disparity between adjacent viewpoints in the target light field must be within 0 to 1 pixel [52]. In addition, a method for generating pattern images directly using the focal stack as input has been proposed to reduce the cost of data acquisition when acquiring the light field in real space [53].
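To give a feel for how pattern images can be obtained from a target light field, the sketch below fits a two-layer multiplicative display without time-division multiplexing. It uses alternating least squares with clipping to the range 0 to 1, which is a deliberately simple stand-in and not the non-negative tensor factorization of [44], the non-negative least squares of [47], or a CNN-based method. The ray indexing (the ray through rear pixel `i` and front pixel `j` has luminance `rear[i] * front[j]`), the function names, and the test data are all hypothetical.

```python
def factorize_two_layers(target, iterations=50):
    """Fit target[i][j] ~ rear[i] * front[j] with transmittances in [0, 1].

    Each update is the least-squares optimum for one layer with the other
    layer fixed, clipped to the physically valid transmittance range.
    """
    n, m = len(target), len(target[0])
    rear, front = [1.0] * n, [1.0] * m
    eps = 1e-12  # guards against division by zero for an all-dark layer

    def clip(value):
        return min(1.0, max(0.0, value))

    for _ in range(iterations):
        denom = sum(v * v for v in front) + eps
        rear = [clip(sum(target[i][j] * front[j] for j in range(m)) / denom)
                for i in range(n)]
        denom = sum(v * v for v in rear) + eps
        front = [clip(sum(target[i][j] * rear[i] for i in range(n)) / denom)
                 for j in range(m)]
    return rear, front


def max_error(target, rear, front):
    """Largest absolute luminance error over all rays."""
    return max(abs(target[i][j] - rear[i] * front[j])
               for i in range(len(rear)) for j in range(len(front)))


# A hypothetical target that two layers can reproduce exactly ...
exact = [[r * f for f in (1.0, 0.5)] for r in (0.2, 0.8, 0.5)]
print(max_error(exact, *factorize_two_layers(exact)))

# ... and one they cannot, which leaves a residual luminance error.
hard = [[1.0, 0.0], [0.0, 1.0]]
print(max_error(hard, *factorize_two_layers(hard)))
```

The second target shows why the reproduced light rays generally contain errors: a single pair of layers can only represent light fields of a restricted form, and additional layers or time-multiplexed frames enlarge the set of light fields that can be approximated well.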

Among the three full-parallax light field display methods introduced in this paper, the integral 3D display and Aktina Vision reproduce one light ray per pixel of the display device, whereas the compressive light field display reproduces light rays by superimposing the pixels of multiple layers and is thus based on a different concept of light field reproduction. The compressive light field display is a promising method for displaying 3D images with high pixel density, and further improvements are expected.

5 Conclusions

Three types of methods for displaying full-parallax 3D images based on light field reproduction, namely, the integral 3D display, Aktina Vision, and the compressive light field display, were explained in terms of their basic system configuration, advantages, and challenges. Integral 3D displays have the advantage that thin display systems can be developed owing to their simple configuration. Aktina Vision has the advantage of being able to reproduce 3D images with high pixel density. A compressive light field display is capable of reproducing 3D images with high pixel density in a relatively simple configuration, although it reproduces an approximate light field that contains errors. To display 3D images with high display characteristics on a light field display, a large amount of image information is required. As techniques to improve the display characteristics of 3D images, we explained various methods, such as using multiple display devices and time-division multiplexing techniques. Light field displays have many advantages over stereoscopic displays, including the ability to present motion parallax and the possibility of solving the problem of visual fatigue associated with stereopsis. In the future, with technological developments in display devices, such as LCDs, OLED displays, and projectors, as well as progress in various optical elements, the display performance of 3D images will be improved, and it is expected that light field displays will be put to practical use in a wide range of fields.