1 Introduction

Computational photography is based on capturing and processing discrete samples of all the light rays in the 3D space [1]. These light rays are described by the 7D plenoptic function [2] in terms of position, orientation, spectral content, and time. This can be simplified to a 4D lightfield [3, 4] by considering only the value of each ray as a function of its position and orientation in a static scene, and by constraining each ray to have the same value at every point along its direction of propagation. Compared to conventional photography, which captures 2D images, computational photography captures the entire 4D lightfield. This 4D lightfield extends the capabilities of current commercial cameras with new features such as refocusing [5–7], 3D depth estimation [8–10] or super-resolution [11–13].

There are several methods to obtain a 4D lightfield. In this paper, we will use a plenoptic camera [14]. A plenoptic camera uses a microlens array inside a conventional camera body to measure the intensity and direction of light rays. There are several commercial models of plenoptic cameras available [15, 16]. Since the complete 4D lightfield inside the camera is recorded, new 2D photographs can be computed by tracing and projecting all the rays to desired virtual imaging planes. This so-called digital refocusing makes it possible to change focus after the picture is taken, as well as to extend the depth of field without decreasing the aperture.

The set of refocused images is called the focal stack of the scene. A particular image in the focal stack can be modeled through the continuous Photography integral operator [3]. Since the plenoptic image is discrete, to obtain a refocused photograph it is necessary to discretize the Photography operator. This requires solving two problems: interpolating the lightfield and approximating the integration process. There are several methods to perform this discretization: the brute-force approach interpolates the lightfield by nearest neighbor and approximates the integral by sums. If we assume that the plenoptic camera is composed of \(n\times n\) microlenses and that each microlens generates an \(n\times n\) image, the brute-force approach would need \({O}(n^{4})\) operations to generate a refocused photograph. A significant improvement over this performance was obtained in [5] with the “Fourier Slice Photography” (FSP) technique, which reformulates the problem in the Fourier domain. The FSP method is based on the extraction of an appropriate dilated 2D slice in the 4D Fourier transform of the lightfield. In this case, interpolation takes place in the 4D Fourier domain using a 4D Kaiser–Bessel filter. This decreases the computational cost of the discretization of the Photography operator to \({O}(n^{4}\hbox {log}(n))\) operations to generate a focal stack with \({O}(n^{2})\) refocused photographs. Another method that uses the frequency domain is the Discrete Focal Stack Transform (DFST), which uses 4D trigonometric interpolation in the spatial domain and a discretization of the integral operator [6]. The DFST is fast [17] and can be computed in \({O}(n^{4}\hbox {log}(n))\) operations for an \(n^{4}\) lightfield. It can also be described analytically.

This paper studies the existence of an inverse for the Focal Stack transform. Such an inverse would take the focal stack of a scene as its input and produce a 4D lightfield as its output. This would make a 4D lightfield equivalent to its focal stack, and new procedures for capturing lightfields based on refocused images from standard conventional cameras could be developed. Such an inverse could also be used for lightfield compression purposes, since the focal stack images are highly redundant. In this paper, we show that such an inverse does not exist in general. We introduce the concept of \({\varPi }\)-constant lightfields, which includes the Lambertian lightfields, where this inversion is theoretically possible. We also study the numerical properties of this inversion procedure. Finally, we provide some experimental results that show that the lightfield captured from a plenoptic camera can be well approximated using a \({\varPi }\)-constant lightfield, and we study the performance of the inverse for general lightfields using numerical inversion with regularization.

This paper is divided into five sections. In Sect. 2, we describe the algorithms to compute the Focal Stack in the frequency domain that are based on the Fourier Slice technique in [5, 6]. In Sect. 3, we study the inversion of the Focal Stack transform and discuss its properties. Section 4 contains some experimental results, and Sect. 5 includes conclusions and future work.

2 Previous Work

Conventional 2D photographs are obtained in plenoptic cameras through the Photography transform. This transform takes a lightfield as its input and generates a photograph focused on a given plane [5, 18]. The Photography transform will be defined by means of a two-plane parameterization of the lightfield inside the plenoptic camera. We will write \(L_{F}({{\varvec{x}}},\,{{\varvec{u}}})\) for the radiance traveling from position \({{\varvec{u}}}=(u_{1},\,u_{2})^\mathrm{T}\) on a domain defined by the aperture of the lens plane to position \({{\varvec{x}}}=(x_{1},\,x_{2})^\mathrm{T}\) over the sensor extent. F is the distance between the lens and the sensor (see Fig. 1, adapted from [5]).

To compute conventional photographs, the sensor plane is virtually placed at any distance \(\alpha F\) inside the camera (see Fig. 1). Let \(\mathcal {P}_\alpha \left[ {L_F } \right] \) be the operator that transforms a lightfield \(L_{F}\) at a sensor depth F into a photograph formed at sensor depth \(\alpha F\). The operator can be formulated as

$$\begin{aligned} {\mathcal {P}}_\alpha \left[ {L_F } \right] \left( {\varvec{t}} \right) =\frac{1}{\alpha ^{2}F^{2}}\int L_F \left( {{\varvec{u}}\left( {1-\frac{1}{\alpha }} \right) +{\varvec{t}},{\varvec{u}}} \right) \mathrm{d}{\varvec{u}}. \end{aligned}$$
(1)

This equation shows how to compute a photograph \({\mathcal {P}}_\alpha \left[ {L_F } \right] \) formed at a virtual sensor plane that is located at distance \(\alpha F\) from the lens plane. Pixels in the 2D photograph depend on the spatial variable \({{\varvec{t}}}=(t_{1}, t_{2})^\mathrm{T}\) as seen on Fig. 1. When we compute the photographs for every virtual sensor distance \(\alpha F\), we obtain the Focal Stack transform \(\mathcal{S}\) of the lightfield.

Fig. 1
figure 1

Two plane parameterization of the lightfield

To render the 2D photograph, we usually want the image taken from a constant lightfield to be a constant independent of \(\alpha \), so we normalize the Photography operator by removing the \(1/({\alpha ^{2}F^{2}})\) term. Also, in order to make the discretization of the focal stack easier, we reparametrize the focusing distance using \((1-1/\alpha )=q\), leading to the normalized Photography operator defined as

$$\begin{aligned} P_q \left[ L \right] \left( {\varvec{t}} \right) =\int L\left( {{\varvec{u}}q+{\varvec{t}},{\varvec{u}}} \right) \mathrm{d}{\varvec{u}}. \end{aligned}$$
(2)

In practice, the normalized Photography operator cannot be used since a plenoptic camera only captures discrete samples of the lightfield. To discretize \(P_q \left[ L \right] \), we need to resample the lightfield and to approximate the integration process.
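A minimal numerical sketch of this discretization (Python/NumPy; the nearest-neighbor resampling, circular boundary handling, and array sizes are illustrative assumptions, not choices fixed by the text) replaces the integral over \({\varvec{u}}\) by a sum of shifted sub-aperture images:

```python
import numpy as np

def refocus_brute_force(L, q):
    """Shift-and-add discretization of P_q[L](t) = sum_u L(u*q + t, u).

    L has shape (Nx, Nx, Nu, Nu), indexed as L[x1, x2, u1, u2] with
    u in [-nu, nu].  Nearest-neighbor resampling and circular boundaries
    are illustrative assumptions.
    """
    Nx, _, Nu, _ = L.shape
    nu = (Nu - 1) // 2
    photo = np.zeros((Nx, Nx))
    for i1, u1 in enumerate(range(-nu, nu + 1)):
        for i2, u2 in enumerate(range(-nu, nu + 1)):
            # resample L at (u*q + t, u): shift the sub-aperture image by u*q
            s1, s2 = int(round(u1 * q)), int(round(u2 * q))
            photo += np.roll(L[:, :, i1, i2], (-s1, -s2), axis=(0, 1))
    return photo

# Normalization check: a constant lightfield refocuses to a constant
# (Nu^2 = 9 here), independently of the slope q.
print(np.allclose(refocus_brute_force(np.ones((9, 9, 3, 3)), 0.7), 9.0))
```

Each refocused image costs \({O}(n^{4})\) operations with this approach, which is the cost quoted in the introduction.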

After defining the basics of the Photography transform, we introduce in Table 1 a summary of the symbols that will be used in the paper.

Table 1 Notation table

2.1 The Fourier Slice Theorem

A simple discretization of \(P_q \left[ L \right] \) could be done by resampling the lightfield through local interpolation and replacing the integration process with sums. This is the brute-force approach that needs \({O}(n^{4})\) operations to generate a single refocused photograph of size \(n^{2}\). A significant improvement in the computational complexity can be achieved through the Fourier Slice Photography technique that studies the Photography operator in the frequency domain. The main result is the Fourier Slice Photography Theorem [5], which states that in the frequency domain, a photograph is a 2D slice of the 4D lightfield. Then, photographs focused at different depths correspond to slices at different slopes in the 4D space. We may write the Fourier Slice Photography Theorem as

$$\begin{aligned} P_q \left[ L \right] \left( {\varvec{t}} \right) =\int {\hat{L}} \left( {{\varvec{\xi }} ,-q{\varvec{\xi }} } \right) e^{2\pi i{\varvec{t}}\cdot {\varvec{\xi }} }\mathrm{d}{\varvec{\xi }} , \end{aligned}$$
(3)

where \({\varvec{\xi }} \) is the 2D spatial frequency, the dot symbol \(\cdot \) is the scalar product, and \({\hat{L}} \) is the Fourier transform of the lightfield:

$$\begin{aligned} {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{\eta }} } \right) =\int \int L\left( {{\varvec{x}},{\varvec{u}}} \right) e^{-2\pi i\left( {{\varvec{x}}\cdot {\varvec{\xi }} +{\varvec{u}}\cdot {\varvec{\eta }} } \right) }\mathrm{d}{\varvec{x}}\mathrm{d}{\varvec{u}}. \end{aligned}$$
(4)

Therefore, to obtain a photograph \(P_q \left[ L \right] \left( {\varvec{t}} \right) \), we first compute the Fourier transform of the lightfield \({\hat{L}} \) using the 4D integral in (4). Then we evaluate the slice \({\hat{L}} \left( {{\varvec{\xi }} ,-q{\varvec{\xi }} } \right) \), and finally we compute the inverse Fourier transform of the slice. It only remains to specify how to resample in the frequency domain as required by the slicing operator. In the Fourier Slice Photography technique, this resampling is carried out with a Kaiser–Bessel filter to minimize aliasing. The algorithm generates \({O}(n^{2})\) photographs of size \(n^{2}\) with a computational complexity of \({O}(n^{4}\hbox {log}(n))\), a significant improvement over the direct approach.
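These steps can be illustrated numerically in a discrete, periodic model (a sketch with illustrative sizes; for integer slopes q the slice falls exactly on grid points, so no Kaiser–Bessel resampling is needed and this is a simplified analogue of FSP rather than the full technique): the FFT is taken over the spatial axes, the slice \({\hat{L}}({\varvec{\xi }},-q{\varvec{\xi }})\) becomes the phase-weighted sum \(\sum _{\varvec{u}} {\hat{L}}({\varvec{\xi }},{\varvec{u}})e^{2\pi iq({\varvec{u}}\cdot {\varvec{\xi }})/N_x}\), and an inverse FFT yields the photograph.

```python
import numpy as np

def refocus_direct(L, q):
    """Reference: P_q[L](t) = sum_u L(u*q + t, u), circular boundaries, integer q."""
    Nx, _, Nu, _ = L.shape
    nu = (Nu - 1) // 2
    out = np.zeros((Nx, Nx))
    for i1, u1 in enumerate(range(-nu, nu + 1)):
        for i2, u2 in enumerate(range(-nu, nu + 1)):
            out += np.roll(L[:, :, i1, i2], (-u1 * q, -u2 * q), axis=(0, 1))
    return out

def refocus_fourier(L, q):
    """FFT over the spatial axes, phase-weighted sum over u, inverse FFT."""
    Nx, _, Nu, _ = L.shape
    nu = (Nu - 1) // 2
    Lhat = np.fft.fft2(L, axes=(0, 1))               # L_hat(xi, u)
    xi1 = np.fft.fftfreq(Nx)[:, None, None, None]    # xi_1 / Nx
    xi2 = np.fft.fftfreq(Nx)[None, :, None, None]    # xi_2 / Nx
    u1 = np.arange(-nu, nu + 1)[None, None, :, None]
    u2 = np.arange(-nu, nu + 1)[None, None, None, :]
    phase = np.exp(2j * np.pi * q * (u1 * xi1 + u2 * xi2))
    slice_hat = (Lhat * phase).sum(axis=(2, 3))      # the photograph's spectrum
    return np.fft.ifft2(slice_hat).real

rng = np.random.default_rng(1)
L = rng.random((16, 16, 3, 3))
print(np.allclose(refocus_direct(L, 2), refocus_fourier(L, 2)))
```

For integer q the frequency-domain result coincides exactly with direct shift-and-add refocusing.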

2.2 The Discrete Focal Stack Transform

The Discrete Focal Stack transform [6] is an alternative to the Fourier Slice Photography technique based on trigonometric interpolation. The Discrete Focal Stack Transform uses the following decomposition of (2) [17]:

$$\begin{aligned} P_q \left[ L \right] \left( {\varvec{t}} \right) =\int \int L\left( {{\varvec{x}},{\varvec{u}}} \right) \delta \left( {{\varvec{x}}-{\varvec{u}}q-{\varvec{t}}} \right) \mathrm{d}{\varvec{x}}\mathrm{d}{\varvec{u}} \end{aligned}$$
(5)

and the \(\delta \) representation in the Fourier domain:

$$\begin{aligned} \delta \left( {\varvec{x}} \right) =\int e^{-2\pi i{\varvec{\xi }} \cdot {\varvec{x}}}\mathrm{d}{\varvec{\xi }} \end{aligned}$$
(6)

The combination of (5) and (6) leads to

$$\begin{aligned}&P_q \left[ L \right] \left( {\varvec{t}} \right) \nonumber \\&\quad =\int \left( {\int \int L\left( {{\varvec{x}},{\varvec{u}}} \right) e^{-2\pi i\left( {{\varvec{x}}-{\varvec{u}}q} \right) \cdot {\varvec{\xi }} }\mathrm{d}{\varvec{x}}\mathrm{d}{\varvec{u}}} \right) e^{2\pi i{\varvec{\xi }} \cdot {\varvec{t}}}\mathrm{d}{\varvec{\xi }}\nonumber \\ \end{aligned}$$
(7)

Then, taking the Fourier transform of \(\mathcal{S}\left[ L \right] \left( {{\varvec{t}},q} \right) \) with respect to q, we obtain the following decomposition of the focal stack \(\mathcal{S}\)

$$\begin{aligned}&\mathcal{S}\left[ L \right] \left( {{\varvec{t}},q} \right) = P_q \left[ L \right] \left( {\varvec{t}} \right) \nonumber \\&= \int \int \left( {\int \int {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) e^{2\pi i\left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) q^{\prime }}\mathrm{d}q^{\prime }\mathrm{d}{\varvec{u}}} \right) e^{2\pi i{\varvec{\xi }} \cdot {\varvec{t}}}e^{2\pi iq\kappa }\mathrm{d}{\varvec{\xi }} \mathrm{d}\kappa \nonumber \\&= \int \int \left( {\int {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \delta \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) \mathrm{d}{\varvec{u}}} \right) e^{2\pi i{\varvec{\xi }} \cdot {\varvec{t}}}e^{2\pi iq\kappa }\mathrm{d}{\varvec{\xi }} \mathrm{d}\kappa \nonumber \\ \end{aligned}$$
(8)

Therefore, defining the back-projection operator \({\varPi }\) in the frequency domain as

$$\begin{aligned} {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) =\int {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \delta \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) \mathrm{d}{\varvec{u}}, \end{aligned}$$
(9)

we obtain

$$\begin{aligned} \mathcal{S}\left[ L \right] \left( {{\varvec{t}},q} \right) =\int \int {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) e^{2\pi i{\varvec{\xi }} \cdot {\varvec{t}}}e^{2\pi iq\kappa }\mathrm{d}{\varvec{\xi }} \mathrm{d}\kappa , \end{aligned}$$
(10)

where variables \({\varvec{t}},q\) are in the spatial domain and variables \({\varvec{\xi }} ,\kappa \) are in the frequency domain.

The Fast Discrete Focal Stack transform discretizes (2) using trigonometric interpolation to obtain a discrete counterpart to (10). For a discretized lightfield composed of \(N_x \times N_x \) microlenses of size \(N_u \times N_u \) with \(N_x =2n_x +1,N_u =2n_u +1,\) the photograph \(P_q \left[ L \right] \left( {\varvec{t}} \right) \) is computed as

$$\begin{aligned} P_q \left[ L \right] \left( {\varvec{t}} \right)= & {} \mathop \sum \limits _{{\varvec{u}}=-{\varvec{n}}_{\varvec{u}} }^{{\varvec{n}}_{\varvec{u}} } L\left( {{\varvec{u}}q+{\varvec{t}},{\varvec{u}}} \right) \\= & {} \frac{1}{{\varvec{N}}_x }\mathop \sum \limits _ {{\varvec{u}}=-{\varvec{n}}_{\varvec{u}} }^{{\varvec{n}}_{\varvec{u}} } \mathop \sum \limits _{{\varvec{\xi }} =-{\varvec{n}}_{\varvec{x}} }^{{\varvec{n}}_{\varvec{x}} } {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) e^{2\pi i\frac{\left( {{\varvec{u}}q+{\varvec{t}}} \right) \cdot {\varvec{\xi }} }{N_x }}\\= & {} \frac{1}{{\varvec{N}}_x }\mathop \sum \limits _{{\varvec{\xi }} =-{\varvec{n}}_{\varvec{x}} }^{{\varvec{n}}_{\varvec{x}} } e^{2\pi i\frac{{\varvec{t}}\cdot {\varvec{\xi }} }{N_x }}\mathop \sum \limits _{{\varvec{u}}=-{\varvec{n}}_{\varvec{u}} } ^{{\varvec{n}}_{\varvec{u}} } {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) e^{2\pi i\frac{q\left( {{\varvec{u}}\cdot {\varvec{\xi }} } \right) }{N_x }} \end{aligned}$$

Now, for \(n_q =2n_x n_u ,N_q =2n_q +1\)

$$\begin{aligned} e^{2\pi i\frac{q\left( {{\varvec{u}}\cdot {\varvec{\xi }} } \right) }{N_x }}=\mathop \sum \limits _{\kappa =-n_q }^{n_q } \delta _{N_q } \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) e^{2\pi i\frac{\kappa q}{N_x }},\quad \delta _{N_q } \left( x \right) =\frac{1}{N_q }\mathop \sum \limits _{\kappa =-n_q }^{n_q } e^{2\pi i\frac{\kappa x}{N_q }} \end{aligned}$$

So

$$\begin{aligned} \mathcal{S}\left[ L \right] \left( {{\varvec{t}},q} \right)= & {} P_q \left[ L \right] \left( {\varvec{t}} \right) \nonumber \\= & {} \frac{1}{{\varvec{N}}_x }\mathop \sum \limits _{{\varvec{\xi }} =-{\varvec{n}}_{\varvec{x}} }^{{\varvec{n}}_{\varvec{x}} } \mathop \sum \limits _{\kappa =-n_q }^{n_q } \left( {\mathop \sum \limits _{{\varvec{u}}=-{\varvec{n}}_{\varvec{u}} }^{{\varvec{n}}_{\varvec{u}} } {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \delta _{N_q } \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) } \right) e^{2\pi i\frac{\kappa q}{N_x }}e^{2\pi i\frac{{\varvec{t}}\cdot {\varvec{\xi }} }{N_x }},\quad {\varvec{N}}_x =N_x^2 \end{aligned}$$
(11)

Then, defining the discrete back-projection operator \({\varPi }\) in the frequency domain as

$$\begin{aligned} {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) =\mathop \sum \limits _ {{\varvec{u}}=-{\varvec{n}}_{\varvec{u}} } ^{{\varvec{n}}_{\varvec{u}} } {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \delta _{N_q } \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) \end{aligned}$$
(12)

we obtain the discrete version of (10)

$$\begin{aligned} \mathcal{S}\left[ L \right] \left( {{\varvec{t}},q} \right)= & {} P_q \left[ L \right] \left( {\varvec{t}} \right) \nonumber \\= & {} \frac{1}{{\varvec{N}}_x }\mathop \sum \limits _{{\varvec{\xi }} =-{\varvec{n}}_{\varvec{x}} }^{{\varvec{n}}_{\varvec{x}} } \mathop \sum \limits _{\kappa =-n_q }^{n_q } {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) e^{2\pi i\frac{\kappa q}{N_x }}e^{2\pi i\frac{{\varvec{t}}\cdot {\varvec{\xi }} }{N_x }}\nonumber \\ \end{aligned}$$
(13)

The Fast Discrete Focal Stack transform [17] computes the back-projections in \({O}(n^{4})\) operations. The inverse Fourier transform of the back-projections is then computed in \({O}(n^{4}\hbox {log}n)\) operations, which gives the final computational complexity of the DFST.
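A minimal sketch of the two DFST stages (Python/NumPy, with small illustrative sizes and a periodic model as assumptions): the back-projections (12) bin \({\hat{L}}({\varvec{\xi }},{\varvec{u}})\) over the integer values \(\kappa ={\varvec{u}}\cdot {\varvec{\xi }}\) (for signed frequencies \(|\xi _i |\le n_x \) these values stay in \([-n_q ,n_q ]\), so \(\delta _{N_q }\) reduces to exact equality), and (13) then synthesizes a refocused image for any slope q.

```python
import numpy as np

def backprojections(L):
    """Discrete back-projection (12): bin L_hat(xi, u) over kappa = u . xi."""
    Nx, _, Nu, _ = L.shape
    nx, nu = (Nx - 1) // 2, (Nu - 1) // 2
    nq = 2 * nx * nu
    Lhat = np.fft.fft2(L, axes=(0, 1))                 # FFT over the x axes
    Pi = np.zeros((Nx, Nx, 2 * nq + 1), dtype=complex)
    xis = np.fft.fftfreq(Nx, d=1.0 / Nx).astype(int)   # signed xi in [-nx, nx]
    for a in range(Nx):
        for b in range(Nx):
            for i1, u1 in enumerate(range(-nu, nu + 1)):
                for i2, u2 in enumerate(range(-nu, nu + 1)):
                    kappa = u1 * xis[a] + u2 * xis[b]  # |kappa| <= n_q
                    Pi[a, b, kappa + nq] += Lhat[a, b, i1, i2]
    return Pi

def refocus_from_backprojections(Pi, Nx, q):
    """Synthesis (13): sum over kappa with phase e^{2 pi i kappa q / Nx}."""
    nq = (Pi.shape[2] - 1) // 2
    kappas = np.arange(-nq, nq + 1)
    slice_hat = (Pi * np.exp(2j * np.pi * q * kappas / Nx)).sum(axis=2)
    return np.fft.ifft2(slice_hat).real                # ifft2 supplies the 1/Nx^2

# Cross-check against direct shift-and-add refocusing for integer q = 1
rng = np.random.default_rng(5)
L = rng.random((5, 5, 3, 3))
direct = np.zeros((5, 5))
for i1, u1 in enumerate((-1, 0, 1)):
    for i2, u2 in enumerate((-1, 0, 1)):
        direct += np.roll(L[:, :, i1, i2], (-u1, -u2), axis=(0, 1))
print(np.allclose(refocus_from_backprojections(backprojections(L), 5, 1), direct))
```

For non-integer q, the same synthesis performs the trigonometric interpolation of the DFST.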

3 The Inversion of the Discrete Focal Stack Transform

In this section, we study the inversion of the Discrete Focal Stack Transform both in the continuous and discrete case. We introduce the concept of \({\varPi }\)-constant lightfields that includes the Lambertian lightfields, where this inversion is theoretically possible. We also study the numerical properties of this inversion procedure and propose several regularization methods to stabilize the transform.

3.1 The Continuous Case

Our main result makes explicit what information about the lightfield can be exactly recovered from the focal stack.

Theorem

Given a lightfield L and the back-projection operator:

$$\begin{aligned} {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) =\int {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \delta \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) \mathrm{d}{\varvec{u}}, \end{aligned}$$
(14)

the focal stack \(S\left[ L \right] \) and \({\varPi }\left[ L \right] \) can be obtained from each other using:

$$\begin{aligned} {\hat{{\varPi }}} \left[ L \right] ={\hat{S}}\left[ L \right] \end{aligned}$$
(15)

This result shows that the back-projection operator \({\hat{{\varPi }}} \left[ L \right] \) is the Fourier transform of the focal stack \(S\left[ L \right] \) and conversely that the focal stack is the inverse Fourier transform of \({\hat{{\varPi }}} \left[ L \right] \). The result easily follows from (10).

An immediate consequence of the preceding theorem is that the Focal Stack transform is not invertible in general. For example, if we take \(L\left( {{\varvec{x}},{\varvec{u}}} \right) =u_1 \) for \(-1/2\le u_1 \le 1/2\) and 0 elsewhere, then we have \({\hat{L}} ({{\varvec{\xi }} ,{\varvec{u}}})=\delta \left( {\varvec{\xi }} \right) u_1 \), so \({\hat{{\varPi }}} \left( {{\varvec{\xi }} ,\kappa } \right) =\delta \left( {\varvec{\xi }} \right) \delta \left( \kappa \right) \int u_1 \mathrm{d}{\varvec{u}}=0\). Since \(-L\left( {{\varvec{x}},{\varvec{u}}} \right) \) gives the same value of \({\hat{{\varPi }}} \left( {{\varvec{\xi }} ,\kappa } \right) \), both lightfields generate the same focal stack, and the Focal Stack transform is not invertible.
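This counterexample can be reproduced in the discrete setting of Sect. 2.2 (a sketch with illustrative sizes; the periodic grid is an assumption): for \(L({\varvec{x}},{\varvec{u}})=u_1 \), the discrete back-projections (12) vanish identically, so L and \(-L\) generate the same (zero) focal stack.

```python
import numpy as np

def backprojections(L):
    """Discrete back-projection (12): bin L_hat(xi, u) over kappa = u . xi."""
    Nx, _, Nu, _ = L.shape
    nx, nu = (Nx - 1) // 2, (Nu - 1) // 2
    nq = 2 * nx * nu
    Lhat = np.fft.fft2(L, axes=(0, 1))
    Pi = np.zeros((Nx, Nx, 2 * nq + 1), dtype=complex)
    xis = np.fft.fftfreq(Nx, d=1.0 / Nx).astype(int)
    for a in range(Nx):
        for b in range(Nx):
            for i1, u1 in enumerate(range(-nu, nu + 1)):
                for i2, u2 in enumerate(range(-nu, nu + 1)):
                    Pi[a, b, u1 * xis[a] + u2 * xis[b] + nq] += Lhat[a, b, i1, i2]
    return Pi

# L(x, u) = u_1 : constant in x, odd in u_1
Nx, Nu = 9, 3
u1 = np.arange(-1, 2)[None, None, :, None].astype(float)
L = np.broadcast_to(u1, (Nx, Nx, Nu, Nu)).copy()
print(np.allclose(backprojections(L), 0))                    # Pi[L] vanishes
print(np.allclose(backprojections(L), backprojections(-L)))  # same focal stack
```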

Although the Focal Stack transform is not invertible in general, there are special cases of lightfields where it can be inverted. The simplest invertible lightfields are those for which \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) is constant along all planes \({\varvec{\xi }} \cdot {\varvec{u}}=\kappa \). We will call them \({\varPi }\)-constant. For a \({\varPi }\)-constant lightfield there exists a function \(\varphi \), such that the relation \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) =\varphi \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) \) holds. Then

$$\begin{aligned} {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) =\varphi \left( {{\varvec{\xi }} ,\kappa } \right) \int \delta \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) \mathrm{d}{\varvec{u}}. \end{aligned}$$
(16)

Note that for a \({\varPi }\)-constant lightfield we can directly recover the lightfield from the back-projections. Renaming variable \({\varvec{u}}\) to \({\varvec{v}}\) and substituting \(\kappa ={\varvec{\xi }} \cdot {\varvec{u}}\) in (16) we have

$$\begin{aligned} {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) ={\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \int \delta \left( {{\varvec{\xi }} \cdot {\varvec{v}}-{\varvec{\xi }} \cdot {\varvec{u}}} \right) \mathrm{d}{\varvec{v}}. \end{aligned}$$
(17)

Then, using (15) and (16) for a \({\varPi }\)-constant lightfield we have

$$\begin{aligned} {\hat{S}}\left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) =\varphi \left( {{\varvec{\xi }} ,\kappa } \right) \int \delta \left( {{\varvec{u}}\cdot {\varvec{\xi }} -\kappa } \right) \mathrm{d}{\varvec{u}}. \end{aligned}$$
(18)

Finally, substituting \({\varvec{\xi }} \cdot {\varvec{u}}\) for \(\kappa \)

$$\begin{aligned} {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) =\frac{{\hat{S}}\left[ L \right] \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) }{\int \delta \left( {{\varvec{\xi }} \cdot \left( {{\varvec{v}}-{\varvec{u}}} \right) } \right) \mathrm{d}{\varvec{v}}} \end{aligned}$$
(19)

where, for example, in a square aperture region \(-n_u \le v_i \le n_u \), we have for \({\varvec{\xi }} =\left( {{{\xi }} _1 ,{{\xi }} _2 } \right) ^{\mathrm{T}}\)

$$\begin{aligned}&\int \delta \left( {{\varvec{\xi }} \cdot \left( {{\varvec{v}}-{\varvec{u}}} \right) } \right) \mathrm{d}{\varvec{v}}\nonumber \\&\quad =\left\{ {{\begin{array}{ll} {\frac{2n_u }{{\max }\left( {\left| {{{\xi }} _2 } \right| ,\left| {{{\xi }} _1 } \right| } \right) }+\frac{\kappa +l}{\left| {{{\xi }} _2 } \right| \left| {{{\xi }} _1 } \right| }}&{} {-m<\kappa<-l} \\ {\frac{2n_u }{{\max }\left( {\left| {{{\xi }} _2 } \right| ,\left| {{{\xi }} _1 } \right| } \right) }}&{} {-l\le \kappa \le l} \\ {\frac{2n_u }{{\max }\left( {\left| {{{\xi }} _2 } \right| ,\left| {{{\xi }} _1 } \right| } \right) }-\frac{\kappa -l}{\left| {{{\xi }} _2 } \right| \left| {{{\xi }} _1 } \right| }}&{} {l<\kappa <m} \\ \end{array} }} \right. \nonumber \\&\kappa ={\varvec{\xi }} \cdot {\varvec{u}},\quad l=n_u \left| {\left| {{{\xi }} _2 } \right| -\left| {{{\xi }} _1 } \right| } \right| ,\quad m=n_u \left( {\left| {{{\xi }} _2 } \right| +\left| {{{\xi }} _1 } \right| } \right) \end{aligned}$$
(20)
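The piecewise expression (with the plateau on \(-l\le \kappa \le l\)) can be verified against an exact evaluation of the integral: integrating out \(v_2 \) gives \(\int \delta ({\varvec{\xi }} \cdot {\varvec{v}}-\kappa )\mathrm{d}{\varvec{v}}=|I|/|\xi _2 |\), where I is the interval of \(v_1 \in [-n_u ,n_u ]\) for which \((\kappa -\xi _1 v_1 )/\xi _2 \in [-n_u ,n_u ]\). A sketch (assuming \(\xi _1 ,\xi _2 \ne 0\)):

```python
import numpy as np

def line_integral(xi1, xi2, kappa, nu):
    """Exact value of  int delta(xi . v - kappa) dv  over [-nu, nu]^2.

    Integrating out v2 leaves (1/|xi2|) times the measure of
    {v1 in [-nu, nu] : (kappa - xi1*v1)/xi2 in [-nu, nu]}.
    """
    lo, hi = sorted([(kappa - xi2 * nu) / xi1, (kappa + xi2 * nu) / xi1])
    lo, hi = max(lo, -nu), min(hi, nu)
    return max(hi - lo, 0.0) / abs(xi2)

def formula(xi1, xi2, kappa, nu):
    """Piecewise expression (20) with the symmetric plateau |kappa| <= l."""
    l = nu * abs(abs(xi2) - abs(xi1))
    m = nu * (abs(xi2) + abs(xi1))
    plateau = 2 * nu / max(abs(xi1), abs(xi2))
    if abs(kappa) <= l:
        return plateau
    if abs(kappa) < m:
        return plateau - (abs(kappa) - l) / (abs(xi1) * abs(xi2))
    return 0.0

ok = all(
    np.isclose(line_integral(x1, x2, float(k), 2.0), formula(x1, x2, float(k), 2.0))
    for x1 in (1, 2, -3) for x2 in (1, -2, 4)
    for k in np.linspace(-15, 15, 61)
)
print(ok)
```

Both routes agree on the plateau and on the linear ramps \(l<|\kappa |<m\), and both vanish for \(|\kappa |\ge m\).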

Note that (19) can also be written as

$$\begin{aligned} {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) =\frac{{\hat{S}}\left[ L \right] \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) }{{\hat{S}}\left[ \delta \right] \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) } \end{aligned}$$
(21)

where \(\delta \) is the lightfield \(L\left( {{\varvec{x}},{\varvec{u}}} \right) =\delta \left( {\varvec{x}} \right) .\) Therefore, in order to recover the original lightfield, the image from the focal stack \({\hat{S}}\left[ L \right] \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) \) has to be deconvolved with the kernel \({\hat{S}}\left[ \delta \right] \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) \).

3.1.1 Lambertian Lightfields

An important subset of lightfields is that of Lambertian lightfields, where the radiance emitted in the scene is independent of the direction. We now show that Lambertian lightfields are \({\varPi }\)-constant and therefore invertible. The proof follows from the fact that if L is a Lambertian lightfield, there exist functions \(\lambda \left( {\varvec{x}} \right) ,{\phi }\left( {\varvec{x}} \right) \) such that, for all \({\varvec{x}},{\varvec{u}}\), the lightfield L is uniquely determined by the equation \(L\left( {{\varvec{x}}+\lambda \left( {\varvec{x}} \right) {\varvec{u}},{\varvec{u}}} \right) ={\phi }\left( {\varvec{x}} \right) \). Then, for a fixed \({\varvec{u}}\), \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) is determined through the integral equation

$$\begin{aligned} \int {\hat{L}} \left( {{\varvec{\xi }},{\varvec{u}}} \right) e^{2\pi i{\varvec{\xi }} \cdot \left( {{\varvec{x}}+\lambda \left( {\varvec{x}} \right) {\varvec{u}}} \right) }\mathrm{d}{\varvec{\xi }} ={\phi }\left( {\varvec{x}} \right) \end{aligned}$$
(22)

The preceding equation remains unchanged if we substitute \({\varvec{v}}={\varvec{u}}+\mu {\varvec{\xi }} ^{\bot }\) for \({\varvec{u}}\), with \({\varvec{\xi }} ^{\bot }\) a vector orthogonal to \({\varvec{\xi }} \). Then, for all \({\varvec{u}}\) and \(\mu \), the relation \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) ={\hat{L}} ({{\varvec{\xi }} ,{\varvec{u}}+\mu {\varvec{\xi }} ^{\bot }})\) holds. This means that we can remove the \({\varvec{\xi }} ^{\bot }\) component without changing the value of \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) .\) Now, since vector \({\varvec{u}}\) can be decomposed as \({\varvec{u}}= ({{\varvec{\xi }} /\Vert {\varvec{\xi }}\Vert ^{2}}) ({{\varvec{\xi }} \cdot {\varvec{u}}})+ ({{\varvec{\xi }}^{\bot }/\Vert {\varvec{\xi }}^{\bot }\Vert ^{2}}) ({{\varvec{\xi }}^{\bot }\cdot {\varvec{u}}})\), we can write \({\hat{L}} ({{\varvec{\xi }} ,{\varvec{u}}})\) as \({\hat{L}} ({{\varvec{\xi }} ,{\varvec{u}}})={\hat{L}} ({{\varvec{\xi }}, ({{\varvec{\xi }} /\Vert {\varvec{\xi }}\Vert ^{2}}) ({{\varvec{\xi }} \cdot {\varvec{u}}})+ ({{\varvec{\xi }} ^{\bot }/\Vert {\varvec{\xi }}^{\bot }\Vert ^{2}}) ( {{\varvec{\xi }} ^{\bot }\cdot {\varvec{u}}})}).\) Removing the \({\varvec{\xi }} ^{\bot }\) component, we have \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) ={\hat{L}} \left( {{\varvec{\xi }} ,\left( {{\varvec{\xi }} /\Vert {\varvec{\xi }}\Vert ^{2}} \right) \left( {{\varvec{\xi }} \cdot {\varvec{u}}} \right) } \right) \). So \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) depends only on \({\varvec{\xi }} \) and \({\varvec{\xi }} \cdot {\varvec{u}}\), and there exists a function \(\varphi \) such that \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) =\varphi \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) \). Therefore, Lambertian lightfields are \({\varPi }\)-constant and invertible.
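The conclusion can be checked numerically for the simplest Lambertian case, a scene at a single constant depth (Python/NumPy; the periodic grid, the integer disparity \(\lambda \), and the sizes are illustrative assumptions): for \(L({\varvec{x}},{\varvec{u}})=\varphi ({\varvec{x}}-\lambda {\varvec{u}})\), the FFT over \({\varvec{x}}\) gives \({\hat{L}}({\varvec{\xi }},{\varvec{u}})={\hat{\varphi }}({\varvec{\xi }})e^{-2\pi i\lambda ({\varvec{u}}\cdot {\varvec{\xi }})/N_x }\), which depends on \({\varvec{u}}\) only through \({\varvec{\xi }} \cdot {\varvec{u}}\).

```python
import numpy as np

Nx, Nu, lam = 11, 3, 2           # constant integer depth lam (assumption)
nu = (Nu - 1) // 2
rng = np.random.default_rng(2)
phi = rng.random((Nx, Nx))       # scene texture phi(x)

# L(x, u) = phi(x - lam*u) on a periodic grid
L = np.zeros((Nx, Nx, Nu, Nu))
for i1, u1 in enumerate(range(-nu, nu + 1)):
    for i2, u2 in enumerate(range(-nu, nu + 1)):
        L[:, :, i1, i2] = np.roll(phi, (lam * u1, lam * u2), axis=(0, 1))

Lhat = np.fft.fft2(L, axes=(0, 1))               # L_hat(xi, u)
xis = np.fft.fftfreq(Nx, d=1.0 / Nx).astype(int)

# Pi-constancy: L_hat(xi, u) agrees whenever xi . u coincides (mod Nx)
ok = True
for a in range(Nx):
    for b in range(Nx):
        seen = {}
        for i1, u1 in enumerate(range(-nu, nu + 1)):
            for i2, u2 in enumerate(range(-nu, nu + 1)):
                k = (u1 * xis[a] + u2 * xis[b]) % Nx
                if k in seen and not np.isclose(seen[k], Lhat[a, b, i1, i2]):
                    ok = False
                seen[k] = Lhat[a, b, i1, i2]
print(ok)
```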

3.2 The Discrete Case

To study the discrete case, we will use the Discrete Focal Stack Transform to obtain a result analogous to (15). Using the discrete back-projections in (12), we have

$$\begin{aligned} \frac{1}{N_q }{\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q\frac{N_x }{N_q }} \right) \left( {{\varvec{\xi }} ,\kappa } \right) ={\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) \end{aligned}$$
(23)

and

$$\begin{aligned} {\varPi }\left[ L \right] \left( {{\varvec{t}},q} \right) =\frac{1}{N_q }\mathcal{S}\left[ L \right] \left( {{\varvec{t}},q\frac{N_x }{N_q }} \right) \end{aligned}$$
(24)

The result follows from (13). If the lightfield is \({\varPi }\)-constant, we have the following relations in the discrete case, analogous to (17), (19), (21):

$$\begin{aligned}&{\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,{\varvec{u}}\cdot {\varvec{\xi }} } \right) ={\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \mathop \sum \limits _{{\varvec{v}}=-{\varvec{n}}_{\varvec{u}} }^{{\varvec{n}}_{\varvec{u}} } \delta _{N_q } \left( {{\varvec{v}}\cdot {\varvec{\xi }} -{\varvec{u}}\cdot {\varvec{\xi }} } \right) \end{aligned}$$
(25)
$$\begin{aligned}&{\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) =\frac{{\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q\frac{N_x }{N_q }} \right) \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) }{N_q \mathop \sum \nolimits _{{\varvec{v}}=-{\varvec{n}}_{\varvec{u}} } ^{{\varvec{n}}_{\varvec{u}} } \delta _{N_q } \left( {{\varvec{\xi }} \cdot {\varvec{v}}-{\varvec{\xi }} \cdot {\varvec{u}}} \right) } \end{aligned}$$
(26)
$$\begin{aligned}&{\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) =\frac{{\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q\frac{N_x }{N_q }} \right) \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) }{{\hat{\mathcal {S}}}\left[ \delta \right] \left( {{\varvec{t}},q\frac{N_x }{N_q }} \right) \left( {{\varvec{\xi }} ,{\varvec{\xi }} \cdot {\varvec{u}}} \right) } \end{aligned}$$
(27)

where \(\delta \) denotes the lightfield \(L\left( {{\varvec{x}},{\varvec{u}}} \right) =\delta _{N_x } \left( {\varvec{x}} \right) \). Similarly to the continuous case, the recovery of the lightfield can also be expressed as a deconvolution problem.
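The discrete relations (25)–(26) can be exercised end to end on a \({\varPi }\)-constant example (a sketch with illustrative sizes; the periodic single-depth Lambertian model is an assumption): we form the back-projections (12) — for signed frequencies \(|\xi _i |\le n_x \) the arguments of \(\delta _{N_q }\) stay in \((-N_q ,N_q )\), so the kernel reduces to exact equality of the dot products — and recover \({\hat{L}}({\varvec{\xi }},{\varvec{u}})\) by dividing \({\hat{{\varPi }}}({\varvec{\xi }},{\varvec{\xi }} \cdot {\varvec{u}})\) by the multiplicity count, as in (25).

```python
import numpy as np

Nx, Nu, lam = 5, 3, 1
nx, nu = (Nx - 1) // 2, (Nu - 1) // 2
nq = 2 * nx * nu

rng = np.random.default_rng(3)
phi = rng.random((Nx, Nx))
L = np.zeros((Nx, Nx, Nu, Nu))
us = range(-nu, nu + 1)
for i1, u1 in enumerate(us):
    for i2, u2 in enumerate(us):
        L[:, :, i1, i2] = np.roll(phi, (lam * u1, lam * u2), axis=(0, 1))

Lhat = np.fft.fft2(L, axes=(0, 1))
xis = np.fft.fftfreq(Nx, d=1.0 / Nx).astype(int)

# back-projections (12) with exact-equality kernel; kappa index = kappa + nq
Pi = np.zeros((Nx, Nx, 2 * nq + 1), dtype=complex)
for a in range(Nx):
    for b in range(Nx):
        for i1, u1 in enumerate(us):
            for i2, u2 in enumerate(us):
                Pi[a, b, u1 * xis[a] + u2 * xis[b] + nq] += Lhat[a, b, i1, i2]

# recovery (25): divide by the number of v's sharing the same xi . v
Lrec = np.zeros_like(Lhat)
for a in range(Nx):
    for b in range(Nx):
        dots = np.array([[v1 * xis[a] + v2 * xis[b] for v2 in us] for v1 in us])
        for i1, u1 in enumerate(us):
            for i2, u2 in enumerate(us):
                k = u1 * xis[a] + u2 * xis[b]
                Lrec[a, b, i1, i2] = Pi[a, b, k + nq] / np.count_nonzero(dots == k)
print(np.allclose(Lrec, Lhat))
```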

3.3 Direct Focal Stack Inversion

Using the result in (27) from the previous section, we have derived a formula analogous to (21) that can be used to invert the lightfield if we assume that the lightfield is \({\varPi }\)-constant. However, the practical use of this inversion formula for focal stacks obtained from plenoptic cameras is very limited. First, we need a huge number of images in the focal stack to use (26), (27). For a typical plenoptic camera with \(255\times 255\) microlenses and \(9\times 9\) pixels behind each microlens, we need \(N_q =2n_q +1= 2033\) images in the focal stack to use the inversion formula. Another problem is the range of the slope variable q. To avoid aliasing in the images of the focal stack, in a plenoptic camera we need \(|{qN_x /N_q }|\le 1\). In our case, this leads to \(\left| q \right| \le ({N_q /N_x })\approx N_u \) (in the preceding example this would give 17 images), while the range of q in (26), (27) is \(\left| q \right| \le n_q \). To summarize, for the lightfield example we set before, a plenoptic camera will typically provide 17 unaliased planes out of the 2033 in the focal stack; the rest will be aliased, so the inversion formula in (26), (27) is not practical.

3.4 Partial Focal Stack Inversion

To obtain a practical procedure, we need to invert the focal stack using a reduced number of its unaliased images. In this section, we show how this can be done. In the discrete case, from (11) we have for arbitrary q:

$$\begin{aligned} {\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q} \right) \left( {{\varvec{\xi }}} \right) =\mathop \sum \limits _{\kappa =-n_q }^{n_q } {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) e^{2\pi i\frac{\kappa q}{N_x }} \end{aligned}$$
(28)

Defining the set \(W\left( {\varvec{\xi }} \right) =\{ {\kappa : {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) \ne 0}\}\), the equation in (28) is equivalent to

$$\begin{aligned} {\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q} \right) \left( {\varvec{\xi }} \right) =\mathop \sum \limits _{\kappa \in W\left( {\varvec{\xi }} \right) } {\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) e^{2\pi i\frac{\kappa q}{N_x }} \end{aligned}$$
(29)

Using (29), we see that for each fixed \({\varvec{\xi }}\) we obtain a system of linear equations where \({\hat{\mathcal {S}}}\left[ L \right] \) are the known images from the focal stack, the back-projections \({\hat{{\varPi }}} \left[ L \right] \) are the unknowns, and both are connected through the coefficients \(e^{2\pi i\frac{\kappa q}{N_x }}\). The linear system can be written for several q values and \(\kappa \in W({\varvec{\xi }})\) in matrix form as

$$\begin{aligned} {\hat{{\varvec{s}}}}= & {} {\varvec{G}}{\hat{{\varvec{\pi }}}},\quad {\hat{{\varvec{s}}}} _i ={\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q_i } \right) \left( {\varvec{\xi }} \right) ,\quad {\varvec{G}}_{i,j} =e^{2\pi i\frac{\kappa _j q_i }{N_x }},\quad {\hat{{\varvec{\pi }}}} _j ={\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa _j } \right) \nonumber \\&\hbox {with } -m\le i\le m,\;-n\le j\le n,\;2m+1=N_u^2 ,\;2n+1=\hbox {cardinal}\left( {W\left( {\varvec{\xi }} \right) } \right) . \end{aligned}$$
(30)

Then, we obtain the following result:

Proposition

For each \({\varvec{\xi }} ,\) the back-projection values \({\hat{{\varPi }}} \left[ L \right] \left( {\varvec{\xi }},\right. \left. \kappa \right) \) for \(\kappa \in W\left( {\varvec{\xi }} \right) \) can be obtained from at most \(N_u^2 \) images in the focal stack with slopes \(q_i =\frac{i}{n}, \quad n\ge N_q /N_x ,\; i=-m\ldots m,\; 2m+1=N_u^2 .\)

The proof follows from the fact that the coefficient matrix \({\varvec{G}}\) of the linear system in (30) is of Vandermonde type. Note that the preceding proposition also allows us to recover the complete 4D lightfield for a \({\varPi }\)-constant lightfield using a focal stack with \(N_u^2 \) images, since a \({\varPi }\)-constant lightfield can be directly recovered from the back-projections \({\varPi }\) (25). For the typical lightfield we described before, this leads to a set of 81 images from the focal stack. Those \(N_u^2 \) images can be taken from the non-aliasing range in the focal stack of a plenoptic camera by taking \(n=m\) in the previous result, since \(N_q /N_x \approx N_u \).
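The structure of the system can be illustrated numerically. The following sketch builds the Vandermonde-type matrix \({\varvec{G}}\) of (30) for a single frequency and recovers the back-projection coefficients from simulated focal-stack values; all sizes (\(N_x\), the frequency set, the number of slopes) are small assumed values chosen only for illustration.

```python
import numpy as np

# Minimal numerical sketch of the linear system (30): focal-stack values
# at a fixed frequency xi are related to the back-projection coefficients
# through a Vandermonde-type matrix G. All sizes are assumed.
N_x = 16
kappas = np.arange(-2, 3)                  # kappa_j in W(xi), j = -n..n
m = 2                                      # 2m + 1 = 5 slopes
qs = np.array([i / m for i in range(-m, m + 1)])   # slopes q_i = i/m

# G_{i,j} = exp(2*pi*1j * kappa_j * q_i / N_x): row i is a geometric
# progression in exp(2*pi*1j*q_i/N_x), i.e. a Vandermonde-type row.
G = np.exp(2j * np.pi * np.outer(qs, kappas) / N_x)

rng = np.random.default_rng(0)
pi_true = rng.standard_normal(5) + 1j * rng.standard_normal(5)
s = G @ pi_true                            # simulated focal-stack data
pi_rec = np.linalg.solve(G, s)             # recover the back-projections
```

Because the nodes \(e^{2\pi i q_i /N_x}\) are distinct, the square system is invertible and the coefficients are recovered exactly up to floating-point error.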

A relevant fact about the linear system in (30) is that the minimal number of equations needed to determine a particular frequency \({\varvec{\xi }} \) may be much lower than \(N_u^2 \). For example, the linear system in (30) for \({\varvec{\xi }} ={\varvec{0}}\) is \({\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{0}},q} \right) ={\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{0}},0} \right) \), so only one equation is needed to determine \({\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{0}},0} \right) \) and the rest are redundant. An example where the maximum number of equations is needed occurs when \({\varvec{\xi }} =\left( {\xi _1 ,\xi _2 } \right) \) with \(\xi _1 ,\xi _2 \) coprime and \(\left| {\xi _1 } \right|>N_u ,\left| {\xi _2 } \right| >N_u \). Therefore, the effective volume of data needed to store the back-projections is lower than the volume of data of the lightfield. For the typical lightfield we described before, with \(255\times 255\) microlenses and \(9\times 9\) pixels behind each microlens, the size needed to store the back-projections is 96.51 % of the original lightfield.
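These two extremes can be checked with a small count, under the assumption (cf. (34)) that \(W({\varvec{\xi }})\) only contains frequencies of the form \(\kappa ={\varvec{u}}\cdot {\varvec{\xi }}\) with pixel offsets \({\varvec{u}}\in [-n_u ,n_u ]^2\). The concrete frequencies below are assumed examples; the microlens size matches the typical \(9\times 9\) case in the text.

```python
# Sketch of how the number of unknowns card(W(xi)) varies with xi,
# assuming W(xi) only contains frequencies kappa = u . xi with pixel
# offsets u in [-n_u, n_u]^2 (9 x 9 microlens image -> n_u = 4).
def card_W_bound(xi, n_u):
    """Number of distinct kappa = u1*xi1 + u2*xi2 over the pixel grid."""
    us = range(-n_u, n_u + 1)
    return len({u1 * xi[0] + u2 * xi[1] for u1 in us for u2 in us})

n_u = 4
print(card_W_bound((0, 0), n_u))    # only kappa = 0: one unknown
print(card_W_bound((1, 1), n_u))    # collinear contributions collapse
print(card_W_bound((10, 19), n_u))  # coprime, both > N_u: all N_u^2 values
```

For \({\varvec{\xi }}=(0,0)\) a single unknown remains, while for coprime components larger than \(N_u\) all \(N_u^2 =81\) values \({\varvec{u}}\cdot {\varvec{\xi }}\) are distinct, matching the two cases discussed above.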

3.5 Numerical Analysis and Regularization

In the preceding section, it was shown that it is possible to recover the lightfield from a partial section of the focal stack under the \({\varPi }\)-constant assumption. However, this approach has several problems from a numerical point of view. As shown in the previous section, if we use \(N_u^2 \) images, the linear systems for some frequencies will be overdetermined, so even a minimal amount of noise would make a direct inversion fail. Of course, we could remove the redundant equations to make the system determined, but this procedure would not use all the data available. Instead of directly using (30), to cope with image noise we can compute \({\hat{{\varPi }}} \left[ L \right] \left( {{\varvec{\xi }} ,\kappa } \right) \) for \(\kappa \in W\left( {\varvec{\xi }} \right) \) as the solution of

$$\begin{aligned} {\hat{{\varvec{\pi }}}} _\mathrm{opt} =\arg \mathop {\min }\limits _{{\hat{{\varvec{\pi }}}} } \Vert {\hat{{\varvec{s}}}} -{\varvec{G}}{\hat{{\varvec{\pi }}}}\Vert ^{2}, \end{aligned}$$
(31)

leading to

$$\begin{aligned} {\hat{{\varvec{\pi }}}} _\mathrm{opt} =\left( {{\varvec{G}}^{\mathrm{T}}{\varvec{G}}}\right) ^{-1}{\varvec{G}}^{\mathrm{T}}{\hat{{\varvec{s}}}}, \quad ({\varvec{G}}^{\mathrm{T}}{\varvec{G}})_{ij} =\mathop \sum \limits _l e^{2\pi i\frac{(\kappa _i -\kappa _j )q_l }{N_x }} \end{aligned}$$
(32)

where \(\left( {{\varvec{G}}^{\mathrm{T}}{\varvec{G}}} \right) ^{-1}{\varvec{G}}^{\mathrm{T}}\) is the Moore–Penrose pseudo-inverse of \({\varvec{G}}\).
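As a numerical illustration, the least-squares solution (31)–(32) for an overdetermined noisy system can be computed through the normal equations or an SVD-based routine; the sizes, frequency set, and noise level below are assumed. For the complex matrix \({\varvec{G}}\), the transpose in (32) is understood as the conjugate transpose \({\varvec{G}}^{\mathrm{H}}\).

```python
import numpy as np

# Least-squares solution (31)-(32) of an overdetermined noisy system.
# Sizes, frequencies, and noise level are assumed; for complex G the
# Moore-Penrose pseudo-inverse uses the conjugate transpose G^H.
N_x = 16
kappas = np.array([-4, 0, 4])
qs = np.linspace(-1.0, 1.0, 9)           # 9 equations, 3 unknowns
G = np.exp(2j * np.pi * np.outer(qs, kappas) / N_x)

rng = np.random.default_rng(1)
pi_true = rng.standard_normal(3) + 1j * rng.standard_normal(3)
s = G @ pi_true + 1e-4 * (rng.standard_normal(9)
                          + 1j * rng.standard_normal(9))

# Normal-equation solution (G^H G)^{-1} G^H s, and the equivalent
# SVD-based least-squares solve.
pi_opt = np.linalg.solve(G.conj().T @ G, G.conj().T @ s)
pi_lstsq, *_ = np.linalg.lstsq(G, s, rcond=None)
```

Both routes give the same minimizer; the SVD-based solve is preferable when \({\varvec{G}}^{\mathrm{H}}{\varvec{G}}\) is poorly conditioned, which motivates the regularization discussed next.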

The structure of the linear systems in (30), (32) poses another problem, since a Vandermonde system is ill-conditioned [19]. Therefore, additional information must be provided to obtain a stable solution. This process is called regularization [20]. There are different formulations that can be used to describe the solution of the regularized problem. In general, some regularization measure Reg is used to promote desired solutions. A common choice for the regularization measure is

$$\begin{aligned} \mathrm{Reg}\left( {\varvec{L}} \right) =\Vert {\varvec{D}}_k {\varvec{L}}\Vert _p^p , \quad 1\le p\le \infty \end{aligned}$$
(33)

where \({\varvec{D}}_k \) is a finite difference approximation to the kth derivative with \({\varvec{D}}_0 ={\varvec{I}}\) , the identity operator, by convention. The most common choices of p are \(p=1\) and \(p=2\).
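The operators \({\varvec{D}}_k \) can be built explicitly as matrices. A small sketch with periodic boundaries (consistent with the discrete Fourier transforms used below); the size is illustrative.

```python
import numpy as np

# Finite-difference regularization operators D_k from (33): D_0 = I by
# convention and D_k = (D_1)^k, sketched in 1D with periodic boundaries.
def D(k, n):
    """k-th order periodic forward-difference matrix of size n x n."""
    I = np.eye(n)
    D1 = np.roll(I, 1, axis=1) - I        # (D1 x)_i = x_{i+1} - x_i
    M = I                                  # D_0 = I
    for _ in range(k):
        M = D1 @ M
    return M

x = np.arange(6, dtype=float)
print(D(0, 6) @ x)                         # identity: x itself
print(D(1, 6) @ x)                         # forward differences (with wrap)
```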

3.5.1 \({L}_{2}\) Regularization

If \(p=2\) in (33) the regularization is called \(L_{2}\) or Tikhonov regularization [21] so \(\mathrm{Reg}({\varvec{L}})=\Vert {\varvec{D}}_k {\varvec{L}}\Vert _2^2\). In order to use \(L_{2}\) regularization we rewrite (28) as

$$\begin{aligned} {\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q} \right) \left( {\varvec{\xi }} \right) =\mathop \sum \limits _ {{\varvec{u}}=-{\varvec{n}}_{\varvec{u}} }^{{\varvec{n}}_{\varvec{u}} } {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) e^{2\pi i\frac{q\left( {{\varvec{u}}\cdot {\varvec{\xi }} } \right) }{N_x }} \end{aligned}$$
(34)

This is a collection of linear equations with unknowns \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) for each \({\varvec{\xi }} \) as q varies. Defining the product of two operators \({\varvec{A}}\left( {{\varvec{\alpha }} ,{\varvec{\beta }} } \right) \), \({\varvec{B}}\left( {{\varvec{\lambda }} ,{\varvec{\mu }} } \right) \) as \({\varvec{AB}}\left( {{\varvec{\alpha }} ,{\varvec{\mu }} } \right) =\mathop \sum \limits _{\varvec{\beta }} {\varvec{A}}\left( {{\varvec{\alpha }} ,{\varvec{\beta }} } \right) {\varvec{B}}\left( {{\varvec{\beta }} ,{\varvec{\mu }} } \right) \), (34) can be written as

$$\begin{aligned} {\hat{{\varvec{s}}}}={\varvec{G}}{\hat{{\varvec{l}}}} ,\quad {\hat{{\varvec{s}}}} _q ={\hat{\mathcal {S}}}\left[ L \right] \left( {{\varvec{t}},q} \right) \left( {\varvec{\xi }} \right) ,\quad {\varvec{G}}\left( {q,{\varvec{u}}} \right) =e^{2\pi i\frac{q\left( {{\varvec{u}}\cdot {\varvec{\xi }} } \right) }{N_x }},\quad {\hat{{\varvec{l}}}} \left( {\varvec{u}} \right) ={\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \end{aligned}$$
(35)

We will use \(k=0\) in (33) and define the solution to the regularized linear system through the following optimization problem:

$$\begin{aligned} {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) _\mathrm{opt} =\arg \mathop {\min }\limits _{{\hat{{\varvec{l}}}} } \Vert {\hat{{\varvec{s}}}} -{\varvec{G}}{\hat{{\varvec{l}}}}\Vert ^{2}+\lambda \Vert {\hat{{\varvec{l}}}}\Vert ^{2} \end{aligned}$$
(36)

where \(\lambda \) is a regularization parameter, \(\Vert {\hat{{\varvec{s}}}} -{\varvec{G}}{\hat{{\varvec{l}}}}\Vert ^{2}\) is the data-fit term, and \(\mathrm{Reg}\left( {\hat{{\varvec{l}}}} \right) =\Vert {\hat{{\varvec{l}}}}\Vert ^{2}\) is the regularization measure, which penalizes solutions with a large norm. The normal equations for the preceding system are

$$\begin{aligned} \left( {{\varvec{G}}^{\mathrm{T}}{\varvec{G}}+\lambda {\varvec{I}}} \right) {\hat{{\varvec{l}}}} ={\varvec{G}}^{\mathrm{T}}{\hat{{\varvec{s}}}} \end{aligned}$$
(37)

with

\({\varvec{G}}^{\mathrm{T}}{\varvec{G}}\left( {{\varvec{u}},{\varvec{v}}} \right) =\mathop \sum \nolimits _q e^{2\pi i\frac{q\left( {\left( {{\varvec{u}}-{\varvec{v}}} \right) \cdot {\varvec{\xi }} } \right) }{N_x }}\), and \({\varvec{I}}\left( {{\varvec{u}},{\varvec{v}}} \right) =1\) if \({\varvec{u}}={\varvec{v}}\) and \({\varvec{I}}\left( {{\varvec{u}},{\varvec{v}}} \right) =0\) otherwise.

The system in (37) has \(N_u^2 \) equations and \(N_u^2 \) unknowns. The value of \(\lambda \) has to be determined beforehand; different choices of \(\lambda \) result in a tradeoff between the smoothness of the solution and a good data fit. There are several techniques to select the regularization parameter, including the discrepancy principle, the L-curve, or Generalized Cross Validation [22]. Since we have \(N_x^2 \) linear systems with \(N_u^2 \) unknowns, the total computational cost for lightfield inversion is \({O}(N_x^2 N_u^6 )\). The \(L_{2}\) regularization technique tends to produce a smoothing effect on the solution, so other regularization approaches can also be employed [20], as we will see in the next section.
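A sketch of the Tikhonov solve (37) on an assumed small system makes the role of \(\lambda \) visible: a small value stays close to the unregularized solution, while a large value damps the solution toward zero. As before, \({\varvec{G}}^{\mathrm{T}}\) is taken as the conjugate transpose for the complex matrix.

```python
import numpy as np

# Tikhonov-regularized solve of the normal equations (37),
# (G^H G + lambda I) l = G^H s, with assumed sizes and frequencies.
N_x = 16
qs = np.linspace(-1.0, 1.0, 5)
freqs = np.array([-6, -3, 0, 3, 6])        # assumed values of u . xi
G = np.exp(2j * np.pi * np.outer(qs, freqs) / N_x)

rng = np.random.default_rng(2)
l_true = rng.standard_normal(5) + 1j * rng.standard_normal(5)
s = G @ l_true + 1e-4 * rng.standard_normal(5)

def tikhonov(G, s, lam):
    A = G.conj().T @ G + lam * np.eye(G.shape[1])
    return np.linalg.solve(A, G.conj().T @ s)

l_small = tikhonov(G, s, 1e-6)   # close to the unregularized solution
l_large = tikhonov(G, s, 1e3)    # heavily damped toward zero
```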

3.5.2 Total Variation (TV) Regularization

Total variation regularization was introduced for image denoising and reconstruction in [23]. It is designed to prevent oscillations in the regularized solution while allowing for discontinuities. The anisotropic version of TV regularization uses \(\mathrm{Reg}({\varvec{L}})=\Vert {\varvec{D}}_1 {\varvec{L}}\Vert _1 \). We will use the isotropic TV model, which takes the form

$$\begin{aligned} \mathrm{Reg}({\varvec{L}})= & {} \mathop \sum \limits _{x,u} \sqrt{\left( {{\varvec{D}}_{x_1 } {\varvec{L}}} \right) ^{2}{+}\left( {{\varvec{D}}_{x_2 }{\varvec{L}}} \right) ^{2}{+}\left( {{\varvec{D}}_{u_1 } {\varvec{L}}} \right) ^{2}{+}\left( {{\varvec{D}}_{u_2 } {\varvec{L}}} \right) ^{2}}\nonumber \\= & {} \mathop \sum \limits _{{\varvec{x}},{\varvec{u}}}\Vert \nabla {\varvec{L}}\left( {{\varvec{x}},{\varvec{u}}} \right) \Vert \end{aligned}$$
(38)

This model is a generalization of TV regularization from images to lightfields. The isotropic TV is the \(L_{1}\) norm of discretized lightfield gradient magnitudes. The \(L_{1}\) -norm in the regularizer introduces non-linearities and the minimization of (38) is more complex than the \(L_{2}\) case. A simple procedure that solves this problem is called variable splitting, which decouples the \(L_{2}\) data fit in (36) and the \(L_{1}\) regularizer by introducing new variables and converting the solution to the problem into simple minimization steps. There are several techniques to minimize this new problem like the augmented Lagrangian method [24], split Bregman iterative method [25], or additive half-quadratic algorithm [26]. To use the additive half-quadratic algorithm we begin by defining the finite difference operators for lightfields in frequency space:

$$\begin{aligned}&{\hat{{\varvec{D}}}}_{x_1 } {\varvec{L}}\left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) =\left( {e^{2\pi i\frac{\xi _1 }{N_x }}-1} \right) {\hat{{\varvec{L}}}} \left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) \nonumber \\&{\hat{{\varvec{D}}}}_{x_2 } {\varvec{L}}\left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) =\left( {e^{2\pi i\frac{\xi _2 }{N_x }}-1} \right) {\hat{{\varvec{L}}}} \left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) \nonumber \\&{\hat{{\varvec{D}}}}_{u_1 } {\varvec{L}}\left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) ={\hat{{\varvec{L}}}} \left( {\xi _1 ,\xi _2 ,u_1 +1,u_2 } \right) -{\hat{{\varvec{L}}}} \left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) \nonumber \\&{\hat{{\varvec{D}}}}_{u_2 } {\varvec{L}}\left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) ={\hat{{\varvec{L}}}} \left( {\xi _1 ,\xi _2 ,u_1 ,u_2 +1} \right) -{\hat{{\varvec{L}}}} \left( {\xi _1 ,\xi _2 ,u_1 ,u_2 } \right) \end{aligned}$$
(39)

That can be written as

$$\begin{aligned} {\hat{{\varvec{D}}}}_{x_1 } {\varvec{L}}= & {} {\varvec{R}}_{x_1 } {\hat{{\varvec{L}}}} ,{\varvec{R}}_{x_1 } =\left( {e^{2\pi i\frac{\xi _1 }{N_x }}-1} \right) {\varvec{I}},\nonumber \\ {\hat{{\varvec{D}}}}_{x_2 } {\varvec{L}}= & {} {\varvec{R}}_{x_2 } {\hat{{\varvec{L}}}} ,{\varvec{R}}_{x_2 } =\left( {e^{2\pi i\frac{\xi _2 }{N_x }}-1} \right) {\varvec{I}}\nonumber \\ {\hat{{\varvec{D}}}}_{{{u}}_1 } {\varvec{L}}= & {} {\varvec{R}}_{{{u}}_1 } {\hat{{\varvec{L}}}} ,{\varvec{R}}_{{{u}}_1 } \left( {{\varvec{u}},{\varvec{v}}} \right) ={\varvec{I}}\left( {{\varvec{u}},{\varvec{v}}-\left( {1,0} \right) } \right) -{\varvec{I}}\left( {{\varvec{u}},{\varvec{v}}} \right) ;\nonumber \\ {\hat{{\varvec{D}}}}_{{{u}}_2 } {\varvec{L}}= & {} {\varvec{R}}_{{{u}}_2 } {\hat{{\varvec{L}}}} ,{\varvec{R}}_{{{u}}_2 } \left( {{\varvec{u}},{\varvec{v}}} \right) ={\varvec{I}}\left( {{\varvec{u}},{\varvec{v}}-\left( {0,1} \right) } \right) -{\varvec{I}}\left( {{\varvec{u}},{\varvec{v}}} \right) .\nonumber \\ \end{aligned}$$
(40)
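The frequency-domain form of the spatial difference operators can be checked numerically: a periodic forward difference in x corresponds to multiplication by \((e^{2\pi i\xi /N_x }-1)\) under the DFT. A 1D sketch with an assumed size:

```python
import numpy as np

# Numerical check of (39)-(40): a periodic forward difference in the
# spatial variable corresponds to multiplication by
# (exp(2*pi*1j*xi/N_x) - 1) in the frequency domain.
N_x = 16
rng = np.random.default_rng(3)
L = rng.standard_normal(N_x)          # one spatial slice (illustrative)

DL = np.roll(L, -1) - L               # D_x L(x) = L(x + 1) - L(x)
xi = np.arange(N_x)
lhs = np.fft.fft(DL)
rhs = (np.exp(2j * np.pi * xi / N_x) - 1) * np.fft.fft(L)
```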

Then, to obtain the optimal solution of

$$\begin{aligned} {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) _\mathrm{opt} =\arg \mathop {\min }\limits _{{\hat{{\varvec{l}}}} } \frac{\gamma }{2}\Vert {\hat{{\varvec{s}}}} -{\varvec{G}}{\hat{{\varvec{l}}}}\Vert ^{2}+\frac{\beta }{2}\mathop \sum \limits _{x,u}\Vert \nabla {\varvec{L}}\left( {{\varvec{x}},{\varvec{u}}} \right) \Vert \end{aligned}$$
(41)

the additive half-quadratic algorithm proceeds by minimizing the relaxed function

$$\begin{aligned} \left[ {{\hat{{\varvec{l}}}} ,{\hat{{\varvec{v}}}}}\right] _\mathrm{opt}= & {} \arg \mathop {\min }\limits _{{\hat{{\varvec{l}}}} ,{\hat{{\varvec{v}}}}} \frac{\gamma }{2}\Vert {\hat{{\varvec{s}}}} -{\varvec{G}}{\hat{{\varvec{l}}}}\Vert ^{2}\nonumber \\&+\,\mathop \sum \limits _{i\in \left\{ {x_1 ,x_2 ,u_1 ,u_2 } \right\} } \left( \frac{\beta }{2}\Vert {\varvec{R}}_i {\hat{{\varvec{l}}}} -{\hat{{\varvec{v}}}} _i\Vert ^{2}+\Vert {\hat{{\varvec{v}}}}_i\Vert \right) \end{aligned}$$
(42)

where \({\hat{v}} _i \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) are \(N_u^2 \) auxiliary variables for each i. The minimization proceeds by an alternate minimization in the variables \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) and \({\hat{v}} _i \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \). Optimization in \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) leads to the normal equations:

$$\begin{aligned}&\left( {{\varvec{G}}^{\mathrm{T}}{\varvec{G}}+\beta /\gamma \mathop \sum \limits _{i\in \left\{ {x_1 ,x_2 ,u_1 ,u_2 } \right\} } {\varvec{R}}_i^\mathrm{T} {\varvec{R}}_i } \right) {\hat{{\varvec{l}}}} ={\varvec{G}}^{\mathrm{T}}{\hat{{\varvec{s}}}} \nonumber \\&\quad +\,\beta /\gamma \mathop \sum \limits _{i\in \left\{ {x_1 ,x_2 ,u_1 ,u_2 } \right\} } {\varvec{R}}_i^\mathrm{T} {\hat{{\varvec{v}}}} _i \end{aligned}$$
(43)

This is a system for each \({\varvec{\xi }} \) with \(N_u^2 \) equations and \(N_u^2 \) unknowns, assuming that the \({\hat{v}} _i \) are fixed. Optimization of (42) with respect to the \({\hat{{\varvec{v}}}} _i \) for a fixed \({\hat{{\varvec{l}}}} \) leads to the closed-form solution

$$\begin{aligned} {\varvec{v}}_{i_\mathrm{opt}} =\frac{{\varvec{D}}_i {\varvec{l}}}{\Vert {\varvec{D}}_i {\varvec{l}}\Vert }\max \left( \Vert {{\varvec{D}}_i {\varvec{l}}\Vert -\frac{1}{\beta },0} \right) \end{aligned}$$
(44)

Note that this second optimization is performed in the original spatial space, while the first optimization is performed in the frequency space, so the method alternates between both spaces at each iteration. Since the method is iterative, its computational complexity depends on the number of iterations; the computational complexity of each iteration is \({O}(N_x^2 N_u^6 )\).
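The alternation of (43) and (44) can be sketched on a 1D toy problem with an identity forward model, a single difference operator, and assumed parameters; the real method works per-frequency on the 4D lightfield. Since both updates are exact minimizations, the relaxed objective (42) is non-increasing along the iterations.

```python
import numpy as np

# 1D toy sketch of the additive half-quadratic alternation (42)-(44):
# a linear solve for l with v fixed, then a shrinkage update of v with
# l fixed. Forward model, sizes, and parameters are all assumed.
rng = np.random.default_rng(4)
n = 32
l_true = np.where(np.arange(n) < n // 2, 0.0, 1.0)  # piecewise constant
G = np.eye(n)                                       # toy forward model
s = G @ l_true + 0.05 * rng.standard_normal(n)

I = np.eye(n)
R = np.roll(I, 1, axis=1) - I                       # periodic difference
gamma, beta = 20.0, 10.0

def objective(l, v):
    # Relaxed objective (42) for this toy problem.
    return (gamma / 2 * np.sum((s - G @ l) ** 2)
            + beta / 2 * np.sum((R @ l - v) ** 2) + np.sum(np.abs(v)))

A = G.T @ G + (beta / gamma) * (R.T @ R)
l = np.linalg.solve(A, G.T @ s)                     # start with v = 0
f0 = objective(l, np.zeros(n))
for _ in range(20):
    Rl = R @ l
    v = np.sign(Rl) * np.maximum(np.abs(Rl) - 1.0 / beta, 0.0)     # (44)
    l = np.linalg.solve(A, G.T @ s + (beta / gamma) * (R.T @ v))   # (43)
```

The shrinkage step zeroes small (noise-like) differences while preserving the large jump, which is the edge-preserving behavior expected from TV regularization.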

The values of \(\gamma \) and \(\beta \) have to be determined beforehand; different choices of the parameters result in a tradeoff between the smoothness of the solution and a good data fit. Since the optimization method is iterative, it is necessary to provide an initial starting point for the algorithm. Setting \({\hat{{\varvec{v}}}}_i=0\) and computing \({\hat{{\varvec{l}}}} \) from (43) leads to a starting point for \({\hat{{\varvec{l}}}} \) that is the solution to the \(L_{2}\) regularization problem

$$\begin{aligned} {\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) _\mathrm{opt} =\arg \mathop {\min }\limits _{{\hat{{\varvec{l}}}} } \frac{\gamma }{2}\Vert {\hat{{\varvec{s}}}} -{\varvec{G}}{\hat{{\varvec{l}}}} \Vert ^{2}+\frac{\beta }{2}\mathop \sum \limits _{{\varvec{x}},{\varvec{u}}}\Vert \nabla {\varvec{L}}\left( {{\varvec{x}},{\varvec{u}}} \right) \Vert ^{2} \end{aligned}$$
(45)

with \(\mathrm{Reg}\left( {\varvec{L}} \right) =\Vert {\varvec{D}}_1 {\varvec{L}}\Vert _2^2 \). This is the initial point we will use for TV regularization in the experiments.

4 Experimental Results

In this section, we provide computational results for the inversion of the focal stack. To test the applicability of the inversion procedure, we have processed several lightfields from several plenoptic cameras. The central image of each lightfield is shown in Fig. 2. Following the order in the figure, the statue lightfield has \(291\times 291\times 7\times 7\) pixels and the faces lightfield has \(291\times 291\times 5\times 5\) pixels; both were taken with a research plenoptic camera. The toys, belfry, bicycle, and batteries lightfields were taken with a Lytro camera and have \(377\times 377\times 7\times 7\) pixels. Finally, the elephant and watch lightfields were taken with a Raytrix camera and have \(377\times 377\times 9\times 9\) pixels. The faces and statue lightfields had round microlenses that were cropped to obtain square microlenses, so that all the lightfields could be processed in the same way. The Lytro and Raytrix cameras have hexagonal microlenses in a hexagonal lattice that were interpolated to provide square microlenses in a square lattice [27]. Lytro images were affected by vignetting and noise; noise is also noticeable in the statue and faces lightfields. Raytrix images are in black and white, while the rest of the plenoptic images are in color. Lightfield values are normalized to the range [0, 1].

Fig. 2
figure 2

Central images from the statue, faces, toys, belfry, bicycle, batteries, elephant, and watch lightfields

4.1 The \({\varPi }\)-Constant Assumption

In Sect. 3.1, we introduced the \({\varPi }\)-constant lightfield assumption since inversion is not possible in the general case. In order to test the validity of this assumption, we have recovered the lightfields described in the previous section assuming that they are \({\varPi }\)-constant by means of formula (27). To measure the quality of the reconstruction, we have computed the Mean Squared Error (MSE), the Peak Signal-to-Noise Ratio (PSNR), and the Structural Similarity Index Measure (SSIM) [28]. Those measures are averaged over all the images of the lightfield to obtain lightfield quality measures. In Table 2, we show quantitative results of the reconstruction error. Qualitative results are presented in Sect. 4.2.2.
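For reference, the per-image measures can be computed as follows for images normalized to [0, 1]; this is a minimal sketch, and SSIM [28] is omitted since it requires windowed local statistics (implementations are available in standard image-processing libraries).

```python
import numpy as np

# MSE and PSNR for images normalized to [0, 1] (peak value 1), as used
# for the per-image quality measures; the lightfield-level measure is
# the average over all images of the lightfield.
def mse(a, b):
    return float(np.mean((a - b) ** 2))

def psnr(a, b, peak=1.0):
    return float(10.0 * np.log10(peak ** 2 / mse(a, b)))

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)
print(mse(a, b))     # 0.01 (up to float rounding)
print(psnr(a, b))    # 20.0 dB
```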

Table 2 Quality measures for lightfield reconstruction using the \({\varPi }\)-constant assumption

Results show that recovery based on the \({\varPi }\)-constant assumption is a very good approximation to the original lightfield, with a standard deviation of the error around 1 % of the true values and a structural similarity above 0.95 in all cases. However, as shown in Sect. 3.3, despite its accuracy this reconstruction is not practical due to the high number of planes needed.

4.2 Regularized Reconstruction

Regularized reconstruction tries to invert the discrete focal stack transform by computing, for each \({\varvec{\xi }} \), the solution \({\hat{L}} \left( {{\varvec{\xi }} ,{\varvec{u}}} \right) \) of a linear system. We have performed several experiments with various types of regularizers: \(L_{2}\) regularization of 0th and 1st order, and \(L_{1}\) isotropic TV regularization. As explained in Sect. 3.5.2, the initial point for TV regularization is computed from \(L_{2}\) 1st-order regularization. In order to test the dependency of the reconstruction on the number of images in the focal stack, we have varied this number. All planes are equispaced in the non-aliasing range so that the proposition in Sect. 3.4 applies. The results for the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM) are shown in Tables 3 and 4.

Table 3 PSNR of lightfield reconstruction for several regularizers
Table 4 SSIM of lightfield reconstruction for several regularizers

The regularization parameter values were \(\lambda =10^{-6}\) for the \(L_{2}\) 0th order regularization and \(\gamma =10^{7}\), \(\beta =10^{2}\) for the 1st order regularization, and were optimized for the recovery of all lightfields. The number of iterations for the TV regularization was set to 10. Although the regularization parameters should ideally be adjusted for each camera and each number of planes, we adopted this approach due to the robustness of the error measures with respect to the regularization parameters. The data-fit term in (36) was divided by the number of planes, which allows the same regularization parameters to be used for all problems.

Numerical results in Tables 3 and 4 show that the lightfield recovery error stabilizes at around 11 planes, which is approximately twice the width of the microlens. For 11 planes, the PSNR of the research camera is in the range [27.9, 32.0] with SSIM in the range [0.84, 0.92]. For the Lytro camera, the PSNR is in the range [27.6, 33.9] with SSIM in the range [0.88, 0.94], and for the Raytrix camera the PSNR is around 40 with SSIM around 0.99. The difference in the recovery results can be attributed to the quality of the images, since the research and Lytro lightfield images are noisier than the Raytrix lightfield images. Note that for a typical lightfield with \(255\times 255\) microlenses and \(9\times 9\) pixels behind each microlens, this would lead to a recovery from a volume of data that is 22.2 % of the original lightfield. The quantitative performance of the different regularizers is similar, although numerical results show that in general the PSNR and SSIM are ordered from lower to higher values as \(L_{2}\) 0th order, \(L_{2}\) 1st order, and TV regularization. A qualitative comparison of the recovery process is presented in Sect. 4.2.2.

4.2.1 Error Distribution in the Lightfield

Numerical results in Tables 3 and 4 show the mean error over all the images in the lightfield. A relevant question is how the error is distributed among the images. To evaluate the error distribution, we show in Fig. 3 the PSNR for each of the \(7\times 7\) images in the statue lightfield, which has the worst recovery results, using \(L_{2}\) 0th order regularization. The coefficient of variation of the PSNR and SSIM over all the lightfield images is small, with values 0.0081 and 0.0274, respectively.

Fig. 3
figure 3

Error distribution of the PSNR and SSIM in the lightfield images

We have also studied the spatial distribution of the recovery error, measured as the absolute difference between the recovered lightfield and the true lightfield. The results can be seen in Fig. 4 for the central image of the lightfields in Fig. 2 and \(L_{1}\) TV regularization with 11 planes. Results show that errors are very low for most pixels of the image (lightfield values are in the range [0, 1]). Errors are higher in the statue image (which can be attributed to noise) and in the batteries image (where specular reflections violate the Lambertian assumption). Note that errors also increase around depth discontinuities, as shown in the belfry and bicycle images.

Fig. 4
figure 4

Spatial distribution of the recovery absolute error. It is measured as the absolute difference between the recovered lightfield and the true lightfield. Results are shown for the central images in the statue, faces, toys, belfry, bicycle, batteries, elephant, and watch lightfields. The regularization used has been \(L_{1}\) TV with a number of planes equal to 11

Fig. 5
figure 5

Focal stack with \(N_{p}=11\) planes for the statue lightfield

Fig. 6
figure 6

Focal stack with \(N_{p}=11\) planes for the elephant lightfield

4.2.2 Qualitative Results

In order to qualitatively test the lightfield recovery, we have selected the statue lightfield, with the worst recovery results, and the elephant lightfield, with the best recovery results. The focal stack for \(N_{p}=11\) planes is shown in Fig. 5 for the statue lightfield and in Fig. 6 for the elephant lightfield. To present the results, we have also selected the number of planes with the worst recovery results, \(N_{p}=3\), the number of planes where the error stabilizes, \(N_{p}=11\), and the number of planes with the best recovery results, \(N_{p}=41\). We have also selected the central image of the lightfield for comparison. In accordance with the preceding quantitative results, in Figs. 7 and 8 we can see that there is no visually appreciable difference between \(N_{p}=11\) and \(N_{p}=41\). Comparing the recovered central image with the original central lightfield image, we can also see that even small details are restored. Note that lightfield images are extended depth of field (DOF) images, so one feature of the lightfield recovery technique is that it is capable of computing extended DOF images from a set of defocused images.

Fig. 7
figure 7

First row Central image of the lightfield (left) and recovery with the \({\varPi }\)-constant assumption (right). Second row, from left to right Recovery with \(N_{p}=3\) for \(L_{2}\) 0th order, \(L_{2}\) 1st order, and TV regularization. Third and fourth row Recovery with \(N_{p}=11\) and with \(N_{p}=41\) with the same regularizers

Fig. 8
figure 8

First row Central image of the lightfield (left) and recovery with the \({\varPi }\)-constant assumption (right). Second row, from left to right Recovery with \(N_{p}=3\) for \(L_{2}\) 0th order, \(L_{2}\) 1st order, and TV regularization. Third and fourth row Recovery with \(N_{p}=11\) and with \(N_{p}=41\) with the same regularizers

Fig. 9
figure 9

Top Detail from the central image of the faces and statue lightfield. Bottom Detail from the central image of the faces and statue recovered lightfields with TV regularization

Fig. 10
figure 10

Details from the batteries lightfield where non-Lambertian specularities are present. Note the quality in the reconstruction despite the specular reflection on the surfaces

4.2.3 Lightfield Denoising

A consequence of the regularization process is the attenuation of oscillations in the lightfield, which reduces noise in the lightfield images. Therefore, the inversion process can also be used to denoise lightfield images by selecting parameter values that control the strength of the regularization. We have selected the noisiest lightfields, statue and faces, to show this denoising effect. As can be seen in Fig. 9, the recovery process effectively denoises the images in the lightfield.

4.2.4 Lightfield Recovery Under Non-Lambertian Scenes

As explained in Sect. 3.1.1, lightfield inversion is exact for Lambertian lightfields. Then, it is interesting to study the behavior of the lightfield recovery process under non-Lambertian scenes. A common extension to the Lambertian assumption is the dichromatic model [29]. In this approach, the light reflected from the objects has two independent components, which correspond to Lambertian diffuse reflection and a specular non-Lambertian component. In Fig. 10, we study the qualitative performance of the reconstruction process under strong specular components. We have selected a detail from the batteries lightfield. It can be seen that the reconstruction process is able to recover both the diffuse and specular components of the original lightfield with high quality. The quantitative results of the difference between these two lightfield images are presented in Fig. 4.

Another source of non-Lambertian behavior appears at surface discontinuities, even if the individual surfaces in the scene have a Lambertian reflectance. It is therefore interesting to study the recovery technique around depth discontinuities. We have selected the belfry lightfield due to its strong depth changes between the foreground and the background. As shown in Fig. 11 for a detail of two side images from the lightfield, the reconstruction around depth discontinuities is also of good quality. Note that the parallax in the lightfield images is also recovered. The numerical results of the reconstruction error around the discontinuity can also be found in Fig. 4.

Fig. 11
figure 11

Top Detail from two belfry lightfield images. Bottom Detail recovered from the belfry lightfield. Note the differences in position between the foreground and background in the left and right images

Fig. 12
figure 12

Numerical study of the dependence of the recovery error (vertical axis) against a Lambertian photoconsistency error measure (horizontal axis) for the belfry (left) and batteries (right) lightfield

Finally, in order to obtain a better understanding of the dependence of the reconstruction error on the Lambertian assumption, we have computed a photoconsistency measure [8] for each point in the scene. It is obtained from the lightfield by measuring the variance of the intensity of all rays emanating from a surface point at a given depth. Since the depth of a given point on the surface is unknown, we compute the minimum of this measure over all possible point depths. Therefore, for a Lambertian surface, the photoconsistency error measure should ideally be zero at every point, while deviations due to specularities or discontinuities should generate higher values. To study the dependence between both errors, we have computed a 2D histogram with the negative logarithm of the photoconsistency and reconstruction errors on the horizontal and vertical axes, respectively. Therefore, lower errors for both measures appear at the top and right side of the histogram. These values are accumulated for all surface points and rendered in Fig. 12 for the belfry and batteries lightfields. Numerical results show that lower values of the photoconsistency error lead to lower values of the reconstruction error.
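A minimal sketch of this photoconsistency measure on a synthetic 2D (x, u) lightfield slice; the sizes, the integer disparity, and the periodic geometry are assumed for illustration. For a Lambertian slice, the minimal variance is zero and is attained at the true disparity.

```python
import numpy as np

# Sketch of the photoconsistency measure: re-align the rays at each
# candidate disparity, take the variance across rays, and keep the
# minimum over disparities. All quantities here are assumed.
N_x, N_u, d_true = 32, 5, 2
rng = np.random.default_rng(5)
texture = rng.standard_normal(N_x)
us = np.arange(N_u) - N_u // 2

# Lambertian slice: L(x, u) = texture(x + d_true * u), periodic in x.
L = np.stack([np.roll(texture, -d_true * u) for u in us], axis=1)

def photoconsistency(L, us, disparities):
    """Per-pixel minimal ray variance and the minimizing disparity."""
    V = np.stack([np.stack([np.roll(L[:, j], d * u)
                            for j, u in enumerate(us)], axis=1).var(axis=1)
                  for d in disparities])
    d_arr = np.array(list(disparities))
    return V.min(axis=0), d_arr[V.argmin(axis=0)]

min_err, best_d = photoconsistency(L, us, range(-3, 4))
```

As a side effect, the minimizing disparity is a simple depth estimate, which is how the same measure is used for depth estimation in [8].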

5 Conclusions

We have presented a new technique for the recovery of a lightfield from its focal stack. This recovery is based on the inversion of the Discrete Focal Stack transform. The inversion process has been studied in the continuous case, and it is shown that this inversion is not possible in general. However, there is a subset of lightfields where this inversion is possible. This subset, which we call \({\varPi }\)-constant lightfields, has been characterized, and it is shown that it includes the important case of Lambertian lightfields. A simple and direct inversion procedure has been developed for these lightfields. The preceding results have also been studied in the discrete case, and a similar direct procedure has been developed. A drawback of the direct procedure is that it needs a large number of images in the focal stack. Therefore, a theoretical result has been provided that ensures that inversion for \({\varPi }\)-constant lightfields is possible by solving a linear system of equations for a focal stack with a volume of data equal to that of the lightfield. We have studied the inversion of this linear system and shown that it is ill-conditioned, so regularization procedures have been proposed and detailed. The inversion method has been tested with several images from plenoptic cameras. Experimental results show that \({\varPi }\)-constant lightfields are a very good approximation to real-world lightfields. To study practical inversion procedures, we have studied the inversion of general lightfields with regularization. The experimental results show that general lightfields can be recovered with high precision from a focal stack with a limited number of images, so a focal stack can be used as a compressed representation of a lightfield. The inversion method is accurate enough to recover parallax and depth discontinuities. This method can also be employed to obtain an extended depth of field image from a focal stack.
Due to the regularization effect, the inversion procedure can also be used to denoise the lightfield. Future extensions of this work will include the study of the optimal placement of planes in the focal stack to decrease the number needed in the recovery process, and porting the technique to Graphics Processing Units and Field Programmable Gate Arrays.