
1 Introduction

Data analysis is becoming increasingly important in astronomy. This can be explained by detector evolution, which concerns all wavelengths. In the 1980s, CCD (charge-coupled device) images had a typical size of 512 × 512 pixels, while astronomers now have CCD mosaics of 16,000 × 16,000 pixels or even more. At the same time, methods of analysis have become much more complex, and the human and financial effort needed to create and process the data can sometimes be of the same order as for the construction of the instrument itself. As an example, for the ISOCAM camera of the Infrared Space Observatory (ISO), the command software of the instrument, together with the online and offline data processing, required altogether 70 person-years of development, while 200 person-years were necessary for the construction of the camera. The data analysis effort for the PLANCK project is even larger. Furthermore, the quantity of outputs requires the use of databases, and in parallel sophisticated tools are needed to extract ancillary astrophysical information, generally now through the web. From the current knowledge, new questions emerge, and it becomes necessary to carry out new observations of a given object or a part of the sky. The acquired data must then be calibrated before useful information for the scientific project can be extracted.

Data analysis comes into play during calibration, during the scientific information extraction process, and in database manipulation. The calibration phase consists of correcting various instrumental effects, such as the dark current (in the absence of any light, the camera does not return zero values; the measured image, called the dark image, must be subtracted from any observation) or the flat field (under uniform illumination, the detector does not return the same value for each pixel, so a normalization must be performed by dividing the observed image by the “flat” image). Hence, it is very important to know the parameters of the detector (flat-field image, dark image, etc.) well, because any error in these parameters will propagate to the measurements. Other effects can also be corrected during this phase, such as the removal of cosmic ray impacts or the correction of field distortion (each pixel does not correspond to the same surface area on the sky). Depending on our knowledge of the instrument, each of these tasks may be more or less difficult.
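The two basic corrections just described reduce to a subtraction and a normalized division. The following sketch illustrates this, under the assumption that the dark and flat frames have already been measured; all names and the unit-mean normalization convention are illustrative, not a reference pipeline.

```python
import numpy as np

def calibrate(raw, dark, flat):
    """Basic CCD calibration sketch: dark subtraction, then flat-field division."""
    corrected = raw - dark              # remove the dark current contribution
    flat_norm = flat / np.mean(flat)    # normalize the flat field to unit mean
    return corrected / flat_norm        # equalize the pixel-to-pixel response
```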

Once the data are calibrated, the analysis phase can start. Depending on the scientific objectives, several kinds of information can be extracted from the data, such as the detection of stars and galaxies, or the measurement of their position, intensity, and various morphological parameters. The results can be compared to existing catalogs obtained from previous observations. It is obviously impossible to cite all the operations we may want to carry out on an astronomical image, and we have just mentioned the most common ones. In order to extract the information, it is necessary to take into account noise and the point spread function. Noise is the random fluctuation added to the CCD data, which comes partially from the detector and partially from the signal itself (photon noise). In addition to the errors induced by the noise on the different measurements, noise also limits the detection of objects and can be responsible for false detections. The point spread function is manifested in the fact that the image of a star, for example, is generally spread out over several pixels, caused by the atmosphere’s effect on the light path. The main effect is a loss of resolution, because two sufficiently close objects cannot be separated. Once information has been extracted, it can be compared to our existing knowledge. This comparison allows us to validate or reject our understanding of the universe.

In this chapter, we will discuss in detail how to detect objects in astronomical images and how to take into account the point spread function through the deconvolution processing.

1.1 Source Detection

As explained above, source (i.e., object) extraction from images is a fundamental step for astronomers. For example, to build catalogs, stars and galaxies must be identified, and their position and photometry must be estimated with good accuracy. Catalogs are a key product of astronomical research, and various methods have been proposed to support their construction. One of the most widely used software packages today is SExtractor [6], which is capable of handling very large images. A standard source detection approach, such as in SExtractor, consists of the following steps:

  1. Background estimation.

  2. Convolution with a mask.

  3. Detection.

  4. Deblending/merging.

  5. Photometry.

  6. Classification.

These different steps are described in the next section. Astronomical images typically contain a large set of point-like sources (the stars), some quasi point-like objects (faint galaxies, double stars), and some complex and diffuse structures (galaxies, nebulae, planetary nebulae, clusters, etc.). These objects are often hierarchically organized: a star in a small nebula, itself embedded in a galaxy arm, itself included in a galaxy, and so on.

The standard approach, which is presented in detail in Sect. 2, has some limitations when we are looking for faint extended objects embedded in noise. Figure 1 shows a typical example where a faint extended object is below the detection limit. In order to detect such objects, more complex data modeling needs to be defined. Section 3 presents another approach to model and represent astronomical data, using a sparse model in a wavelet dictionary. A specific wavelet transform, called the starlet transform or the isotropic undecimated wavelet transform, is presented. Based on this modeling, several approaches are proposed in Sects. 4 and 5.

Fig. 1

Example of astronomical data: a point source and an extended source are shown, with noise and background. The extended object, which can be detected by eye, is undetected by a standard detection approach

2 Standard Approaches to Source Detection

We describe here the most popular way to create a catalog of galaxies from astronomical images.

2.1 The Traditional Data Model

The observed data Y can be decomposed into two parts, the signal X and the noise N:

$$\displaystyle{ Y [k,l] = X[k,l] + N[k,l] }$$
(1)

The imaging system can also be considered. If it is linear, the relation between the data and the image in the same coordinate frame is a convolution:

$$\displaystyle\begin{array}{rcl} Y [k,l]& =& (HX)[k,l] + N[k,l]{}\end{array}$$
(2)

where H is the matrix related to the point spread function (PSF) of the imaging system.

In most cases, objects of interest are superimposed on a relatively flat signal B, called background signal. The model becomes

$$\displaystyle{ Y [k,l] = (HX)[k,l] + B[k,l] + N[k,l] }$$
(3)

2.2 PSF Estimation

The PSF H can be estimated from the data or from an optical model of the imaging telescope. In astronomical images, the data may contain stars, or one can point towards a reference star in order to reconstruct a PSF. The drawback is the “degradation” of this PSF because of unavoidable noise or spurious instrument signatures in the data. So, when reconstructing a PSF from experimental data, one has to reduce the images used very carefully (background removal, for instance). Another problem arises when the PSF is highly variable with time, as is the case for adaptive optics (AO) images. This usually means that the PSF estimated when observing a reference star, before or after the observation of the scientific target, differs slightly from the true PSF.

Another approach consists of constructing a synthetic PSF. Various studies [11, 21, 38, 39] have suggested a radially symmetric approximation to the PSF:

$$\displaystyle\begin{array}{rcl} P(r) \propto \left (1 + \frac{r^{2}} {R^{2}}\right )^{-\beta }& &{}\end{array}$$
(4)

The parameters β and R are obtained by fitting the model with stars contained in the data.
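As an illustration, this fit can be carried out with a standard nonlinear least-squares routine. In the sketch below, the radial profile (r, p) would in practice be measured from stars in the image; here it is synthesized, and the amplitude A and starting values are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def moffat(r, A, R, beta):
    # radially symmetric PSF model of Eq. (4), with a free amplitude A
    return A * (1.0 + (r / R) ** 2) ** (-beta)

# synthetic stand-in for a measured stellar radial profile
r = np.linspace(0.0, 10.0, 50)
p = moffat(r, 1.0, 2.0, 3.0) + 0.01 * np.random.randn(r.size)

(A, R, beta), _ = curve_fit(moffat, r, p, p0=[1.0, 1.0, 2.5])  # fit R and beta
```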

In the case of AO systems, this model can be used for the tail of the PSF (the so-called seeing contribution), while in the central region, the system provides an approximation of the diffraction-limited PSF. The quality of the approximation is measured by the Strehl ratio (SR), which is defined as the ratio of the observed peak intensity in the image of a point source to the theoretical peak intensity of a perfect imaging system.

2.3 Background Estimation

The background must be accurately estimated; otherwise it will introduce a bias in the flux estimation. In [7, 28], the image is partitioned into blocks, and the local sky level in each block is estimated from its histogram. The pixel intensity histogram p(Y) is modeled using three parameters: the true sky level B, the RMS (root mean square) noise σ, and a parameter a describing the asymmetry in p(Y) due to the presence of objects. It is defined by [7]:

$$\displaystyle{ p(Y ) = \frac{1} {a}\exp \left (\sigma ^{2}/2a^{2}\right )\exp \left [-(Y - B)/a\right ]{\mathrm{erfc}}\left ( \frac{\sigma } {a} -\frac{(Y - B)} {\sigma } \right ) }$$
(5)

Median filtering can be applied to the 2D array of background measurements in order to correct for spurious background values. Finally the background map is obtained by a bilinear or a cubic interpolation of the 2D array. The block size is a crucial parameter. If it is too small, the background estimation map will be affected by the presence of objects, and if too large it will not take into account real background variations.

In [6, 15], the local sky level is calculated differently. A 3-sigma clipping around the median is performed in each block. If the standard deviation changes by less than 20 % during the clipping iterations, the block is considered uncrowded, and the background level is taken to be the mean of the clipped histogram. Otherwise, it is calculated as \(c_{1} \times \mathrm{median} - c_{2} \times \mathrm{mean}\), where \(c_{1} = 3,c_{2} = 2\) in [15] and \(c_{1} = 2.5,c_{2} = 1.5\) in [6]. This approach has been preferred to histogram fitting for two reasons: it is computationally more efficient and more robust for small sample sizes.
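A minimal sketch of this clipped estimator for a single block is given below; the convergence test and the mode approximation follow the description above, while the partition into blocks and the final interpolation are left out. The iteration count and constants are illustrative.

```python
import numpy as np

def block_sky_level(block, c1=2.5, c2=1.5, n_iter=10):
    """Sigma-clipped sky level of one block (sketch of the estimator in [6])."""
    data = block.ravel().astype(float)
    std0 = data.std()
    for _ in range(n_iter):
        med = np.median(data)
        data = data[np.abs(data - med) < 3.0 * data.std()]  # 3-sigma clip
    if data.std() > 0.8 * std0:   # std changed by less than 20 %: uncrowded
        return data.mean()
    return c1 * np.median(data) - c2 * data.mean()  # crowded-field mode estimate
```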

2.4 Convolution

In order to optimize the detection, the image must be convolved with a filter. A filter matched to the shape of a given class of objects optimizes the detection of those objects: for star detection, the optimal filter is therefore the PSF. For extended objects, a larger filter size is recommended. In order to have optimal detection for any object size, the detection must be repeated several times with different filter sizes, leading to a kind of multiscale approach.

2.5 Detection

Once the image is convolved, all pixels \(Y[k,l]\) at location (k, l) with a value larger than \(T[k,l]\) are considered significant, i.e., belonging to an object. \(T[k,l]\) is generally chosen as \(B[k,l] + K\sigma\), where \(B[k,l]\) is the background estimate at the same position, σ is the noise standard deviation, and K is a given constant (typically chosen between 3 and 5). The thresholded image is then segmented, i.e., a label is assigned to each group of connected pixels. The next step is to separate the blended objects, which are connected and have the same label.
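The thresholding and segmentation steps map directly onto standard array operations, as in the sketch below. It assumes the image has already been convolved with the chosen filter and that a background map is available; the connected-component labeling is delegated to a library routine.

```python
import numpy as np
from scipy import ndimage

def detect_sources(convolved, background, sigma, K=3.0):
    """Threshold at B + K*sigma, then label groups of connected pixels."""
    significant = convolved > background + K * sigma
    labels, n_objects = ndimage.label(significant)  # segmentation step
    return labels, n_objects
```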

An alternative to the thresholding/segmentation procedure is to find peaks. This is only well suited to star detection and not to extended objects. In this case, the next step is to merge the pixels belonging to the same object.

2.6 Deblending/Merging

This is the most delicate step. Extended objects must be considered as single objects, while multiple objects must be well separated. In SExtractor, each group of connected pixels is analyzed at different intensity levels, starting from the highest down to the lowest level. The pixel group can be seen as a surface, with mountains and valleys. At the beginning (highest level), only the highest peak is visible. When the level decreases, several other peaks may become visible, defining therefore several structures. At a given level, two structures may become connected, and the decision whether they form only one (i.e., merging) or several objects (i.e., deblending) must be taken. This is done by comparing the integrated intensities inside the peaks. If the ratio between them is too low, then the two structures must be merged.

2.7 Photometry and Classification

2.7.1 Photometry

Several methods can be used to derive the photometry of a detected object [7, 29]. Adaptive aperture photometry uses the first image moment to determine the elliptical aperture from which the object flux is integrated. Kron [29] proposed an aperture size of twice the first image moment radius \(r_{1}\), which leads to recovery of most of the flux (> 90 %). In [6], the value \(2.5r_{1}\) is discussed, leading to a loss of less than 6 % of the total flux. Assuming that the intensity profiles of the faint objects are Gaussian, flux estimates can be refined [6, 35]. When the image contains only stars, specific methods which take the PSF into account can be used [18, 42].
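The idea is illustrated below with a deliberately simplified circular aperture centered on a known source position; real implementations such as SExtractor use elliptical apertures derived from the second moments, so this routine and its parameter k are purely illustrative.

```python
import numpy as np

def kron_flux(img, x0, y0, k=2.0):
    """First image moment (Kron) radius and the flux within a k*r1 aperture."""
    yy, xx = np.indices(img.shape)
    r = np.hypot(xx - x0, yy - y0)
    r1 = np.sum(r * img) / np.sum(img)  # first image moment radius
    flux = img[r <= k * r1].sum()       # integrate the flux in the aperture
    return r1, flux
```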

2.7.2 Star–Galaxy Separation

In the case of star–galaxy classification, following the scanning of digitized images, Kurtz [30] lists the following parameters which have been used:

  1. Mean surface brightness;

  2. Maximum intensity and area;

  3. Maximum intensity and intensity gradient;

  4. Normalized density gradient;

  5. Areal profile;

  6. Radial profile;

  7. Maximum intensity, 2nd- and 4th-order moments, and ellipticity;

  8. The fit of galaxy and star models;

  9. Contrast versus smoothness ratio;

  10. The fit of a Gaussian model;

  11. Moment invariants;

  12. Standard deviation of brightness;

  13. 2nd-order moment;

  14. Inverse effective squared radius;

  15. Maximum intensity and intensity-weighted radius;

  16. 2nd- and 3rd-order moments, number of local maxima, and maximum intensity.

References for all of these may be found in the cited work. Clearly there is room for differing views on the parameters to be chosen for what is essentially the same problem. Several aspects will, of course, also help to orient us towards a particular set of parameters in a particular case: the quality of the data; the computational ease of measuring certain parameters; the relevance and importance of the parameters measured relative to the data analysis output (e.g., the classification, or the planar graphics); and, similarly, the importance of the parameters relative to the theoretical models under investigation.

2.7.3 Galaxy Morphology Classification

The inherent difficulty of characterizing spiral galaxies, especially when they are not face-on, has meant that most work focuses on ellipticity in the galaxies under study. This points to an inherent bias in the potential multivariate statistical procedures. In the following, we will not attempt to address problems of galaxy photometry per se [17, 44], but rather draw some conclusions on what types of parameters or features have been used in practice.

From the point of view of multivariate statistical algorithms, a reasonably homogeneous set of parameters is required. Given this fact, and the available literature on quantitative galaxy morphological classification, two approaches to parameter selection appear to be strongly represented:

  1. The luminosity profile along the major axis of the object is determined at discrete intervals. This may be done by fitting elliptical contours, followed by integrating the light in elliptical annuli [33]. A similar approach was used in the ESO–Uppsala survey. Noisiness and faintness require attention to robustness in measurement: the radial profile may be determined under the assumption of a face-on, optically thin, axisymmetric galaxy and may be further adjusted to yield values for circles of given radius [64]. Alternatively, isophotal contours may determine the discrete radial values for which the profile is determined [62].

  2. Specific morphology-related parameters may be derived instead of the profile. The integrated magnitude within the limiting surface brightness of 25 or 26 mag arcsec−2 in the visual is popular [33, 61]. The logarithmic diameter (\(D_{26}\)) is also supported by Okamura [43]. It may be interesting to fit model bulges and disks to the galaxies under consideration, using \(r^{\frac{1} {4} }\) or exponential laws respectively [62], in order to define further parameters. Some allowance for the asymmetry of spirals may be made by decomposing the object into octants; furthermore, taking a Fourier transform of the intensity may indicate aspects of the spiral structure [61].

The following remarks can be made relating to image data and reduced data:

  • The range of parameters to be used should be linked to the subsequent use to which they might be put, such as to underlying physical aspects.

  • Parameters can be derived from a carefully constructed luminosity profile, whereas a profile cannot be derived from an arbitrary set of parameters.

  • The presence of both partially reduced data such as luminosity profiles, and more fully reduced features such as integrated flux in a range of octants, is of course not a hindrance to analysis. However, it is more useful if the analysis is carried out on both types of data separately.

Parameter data can be analyzed by clustering algorithms, by principal component analysis, or by methods for discriminant analysis. Profile data can be sampled at suitable intervals and thus analyzed also by the foregoing procedures. It may be more convenient in practice to create dissimilarities between profiles and analyze these dissimilarities: this can be done using clustering algorithms with dissimilarity input.

3 Mathematical Modeling

Different models may be considered to represent the data. One of the most effective is certainly the sparsity model, especially when a specific wavelet dictionary is chosen to represent the data. We introduce here the sparsity concept as well as the wavelet decomposition that is the most widely used in astronomy.

3.1 Sparsity Data Model

A signal \(X,\ X = \left [x_{1},\ldots,x_{N}\right ]^{T}\), is sparse if most of its entries are equal to zero. For instance, a k-sparse signal is a signal where only k samples have a nonzero value. A less strict definition is to consider a signal as weakly sparse or compressible when only a few of its entries have a large magnitude, while most of them are close to zero.

If a signal is not sparse, it may be sparsified using a given data representation. For instance, if X is a sine, it is clearly not sparse, but its Fourier transform is extremely sparse. Hence, we say that a signal X is sparse in the Fourier domain if its Fourier coefficients \(\hat{X}[u]\), \(\hat{X}[u] = \frac{1} {N}\sum _{k=0}^{N-1}X[k]e^{-2i\pi \frac{uk} {N} },\) are sparse. More generally, we can model a vector signal \(X \in \mathbb{R}^{N}\) as the linear combination of T elementary waveforms, also called signal atoms: \(X =\boldsymbol{ \Phi }\alpha =\sum _{ i=1}^{T}\alpha [i]\phi _{i}\), where \(\alpha [i] =\big< X,\phi _{i}\big >\) are called the decomposition coefficients of X in the dictionary \(\boldsymbol{\Phi } = [\phi _{1},\ldots,\phi _{T}]\) (the N × T matrix whose columns are the atoms normalized to a unit 2-norm, i.e., \(\forall i \in [1,T],\left \|\phi _{i}\right \|_{\ell^{2}} = 1\)).

Therefore, to get a sparse representation of our data, we need first to define the dictionary \(\boldsymbol{\Phi }\) and then to compute the coefficients α. X is sparse in \(\boldsymbol{\Phi }\) if the coefficients, sorted in decreasing magnitude, show a fast decay, i.e., most coefficients α vanish except for a few.

The best dictionary is the one which leads to the sparsest representation. Hence, we could imagine having a huge overcomplete dictionary (i.e., \(T \gg N\)), but we would be faced with a prohibitive computation time for calculating the α coefficients. Therefore, there is a trade-off between the complexity of our analysis step (i.e., the size of the dictionary) and the computation time. Some specific dictionaries have the advantage of having fast operators and are very good candidates for analyzing the data.

The Isotropic Undecimated Wavelet Transform (IUWT) , also called starlet wavelet transform, is well known in the astronomical domain because it is well adapted to astronomical data where objects are more or less isotropic in most cases [54, 57]. For most astronomical images, the starlet dictionary is very well adapted.

3.2 The Starlet Transform

The starlet wavelet transform [53] decomposes an N × N image \(c_{0}\) into a coefficient set \(W =\{ w_{1},\ldots,w_{J},c_{J}\}\), as a superposition of the form

$$\displaystyle{c_{0}[k,l] = c_{J}[k,l] +\sum _{ j=1}^{J}w_{ j}[k,l],}$$

where \(c_{J}\) is a coarse or smooth version of the original image \(c_{0}\) and \(w_{j}\) represents the details of \(c_{0}\) at dyadic scale \(2^{j}\) (see Starck et al. [56, 58] for more information). Thus, the algorithm outputs J + 1 sub-band arrays, each of size N × N. (The present indexing is such that j = 1 corresponds to the finest scale, i.e., high frequencies.)

The decomposition is achieved using the filter bank \((h_{2\text{D}},g_{2\text{D}} =\delta -h_{2\text{D}},\tilde{h}_{2\text{D}} =\delta,\tilde{g}_{2\text{D}} =\delta )\), where \(h_{2\text{D}}\) is the tensor product of two 1D filters \(h_{1\text{D}}\) and δ is the Dirac delta. The passage from one resolution to the next is obtained using the “à trous” (“with holes”) algorithm [58]:

$$\displaystyle\begin{array}{rcl} \begin{array}{ll} c_{j+1}[k,l] & =\sum _{m}\sum _{n}h_{1{\mathrm{D}}}[m]\ h_{1{\mathrm{D}}}[n]\ c_{j}[k + 2^{j}m,l + 2^{j}n], \\ w_{j+1}[k,l]& = c_{j}[k,l] - c_{j+1}[k,l]\,\end{array} & &{}\end{array}$$
(6)

If we choose a \(B_{3}\)-spline for the scaling function,

$$\displaystyle\begin{array}{rcl} & \phi (x) = B_{3}(x) = & \\ & \frac{1} {12}(\mid x - 2\mid ^{3} - 4\mid x - 1\mid ^{3} + 6\mid x\mid ^{3} - 4\mid x + 1\mid ^{3} + \mid x + 2\mid ^{3})&{}\end{array}$$
(7)

the coefficients of the convolution mask in one dimension are \(h_{1{\mathrm{D}}} = \left \{ \frac{1} {16}, \frac{1} {4}, \frac{3} {8}, \frac{1} {4}, \frac{1} {16}\right \}\), and in two dimensions,

$$\displaystyle{h_{2\text{D}} = \left (\begin{array}{c} \frac{1} {16} \\ \frac{1} {4} \\ \frac{3} {8} \\ \frac{1} {4} \\ \frac{1} {16}\end{array} \right )\left (\begin{array}{ccccc} \frac{1} {16} & \frac{1} {4} & \frac{3} {8} & \frac{1} {4} & \frac{1} {16}\end{array} \right ) = \left (\begin{array}{ccccc} \frac{1} {256} & \frac{1} {64} & \frac{3} {128} & \frac{1} {64} & \frac{1} {256} \\ \frac{1} {64} & \frac{1} {16} & \frac{3} {32} & \frac{1} {16} & \frac{1} {64} \\ \frac{3} {128} & \frac{3} {32} & \frac{9} {64} & \frac{3} {32} & \frac{3} {128} \\ \frac{1} {64} & \frac{1} {16} & \frac{3} {32} & \frac{1} {16} & \frac{1} {64} \\ \frac{1} {256} & \frac{1} {64} & \frac{3} {128} & \frac{1} {64} & \frac{1} {256}\end{array} \right )}$$

Figure 2 shows the scaling function and the wavelet function when a cubic spline function is chosen as the scaling function ϕ.

Fig. 2

Left, the cubic spline function ϕ; right, the wavelet ψ

The most general way to handle the boundaries is to consider that \(c[k + N] = c[N - k]\) (“mirror”). But other methods can be used, such as periodicity \((c[k + N] = c[k])\) or continuity \((c[k + N] = c[N])\).

The starlet transform algorithm is:

  1. We initialize j to 0 and we start with the data \(c_{j}[k,l]\).

  2. We carry out a discrete convolution of the data \(c_{j}[k,l]\) using the filter \(h_{2\text{D}}\), exploiting separability in the two-dimensional case. In the case of the \(B_{3}\)-spline, this amounts to a row-by-row convolution with \(\left ( \frac{1} {16}, \frac{1} {4}, \frac{3} {8}, \frac{1} {4}, \frac{1} {16}\right )\), followed by a column-by-column convolution. The distance between the central pixel and the adjacent ones is \(2^{j}\).

  3. After this smoothing, we obtain the discrete wavelet transform from the difference \(c_{j}[k,l] - c_{j+1}[k,l]\).

  4. If j is less than the number J of resolutions we want to compute, we increment j and then go to step 2.

  5. The set \(\alpha =\{ w_{1},\ldots,w_{J},c_{J}\}\) represents the wavelet transform of the data.

This starlet transform is very well adapted to the detection of isotropic features, and this explains its success for astronomical image processing, where the data contain mostly isotropic or quasi-isotropic objects, such as stars, galaxies, or galaxy clusters.
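To make the algorithm concrete, here is a minimal NumPy sketch of the first-generation starlet transform, using mirror boundary handling; it is an illustration rather than a reference implementation.

```python
import numpy as np
from scipy import ndimage

def starlet_transform(image, J):
    """First-generation starlet (isotropic undecimated) wavelet transform.

    Returns [w_1, ..., w_J, c_J]; their plain sum reproduces `image`.
    """
    h = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # B3-spline filter h_1D
    c = image.astype(float)
    scales = []
    for j in range(J):
        # "a trous": insert 2**j - 1 zeros between the filter taps
        step = 2 ** j
        h_j = np.zeros((len(h) - 1) * step + 1)
        h_j[::step] = h
        # separable 2D smoothing with mirror boundary handling
        c_next = ndimage.convolve1d(c, h_j, axis=0, mode='mirror')
        c_next = ndimage.convolve1d(c_next, h_j, axis=1, mode='mirror')
        scales.append(c - c_next)  # wavelet scale w_{j+1}
        c = c_next
    scales.append(c)               # coarse scale c_J
    return scales
```

Since the synthesis filters of this first-generation transform are Dirac deltas, `sum(starlet_transform(image, J))` reproduces the input exactly, as stated above.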

Figure 3 shows the starlet transform of the galaxy NGC 2997 displayed in Fig. 4. Five wavelet scales and the final smoothed plane (lower right) are shown. The original image is given exactly by the sum of these six images.

Fig. 3

Wavelet transform of NGC 2997 by the IUWT. The co-addition of these six images reproduces exactly the original image

Fig. 4

Galaxy NGC 2997

3.3 The Starlet Reconstruction

The reconstruction is straightforward. A simple co-addition of all wavelet scales reproduces the original map: \(c_{0}[k,l] = c_{J}[k,l] +\sum _{ j=1}^{J}w_{j}[k,l]\). But because the transform is non-subsampled, there are many ways to reconstruct the original image from its wavelet transform [53]. For a given wavelet filter bank (h, g), associated with a scaling function ϕ and a wavelet function ψ, any synthesis filter bank \((\tilde{h},\ \tilde{g})\), which satisfies the following reconstruction condition

$$\displaystyle\begin{array}{rcl} \hat{h}^{{\ast}}(\nu )\hat{\tilde{h}}(\nu ) +\hat{ g}^{{\ast}}(\nu )\hat{\tilde{g}}(\nu ) = 1\,& &{}\end{array}$$
(8)

leads to exact reconstruction. For instance, for isotropic h, if we choose \(\tilde{h} = h\) (the synthesis scaling function \(\tilde{\phi } =\phi\)), we obtain a filter \(\tilde{g}\) defined by [53]

$$\displaystyle{\tilde{g} =\delta +h.}$$

If h is a positive filter, then \(\tilde{g}\) is also positive. For instance, if \(h_{\mathrm{1D}} = [1,4,6,4,1]/16\), then \(\tilde{g}_{\mathrm{1D}} = [1,4,22,4,1]/16\), which is indeed positive. This means that \(\tilde{g}\) is no longer related to a wavelet function. The 1D detail synthesis function related to \(\tilde{g}_{\mathrm{1D}}\) is defined by

$$\displaystyle\begin{array}{rcl} \frac{1} {2}\tilde{\psi }_{\mathrm{1D}}\left ( \frac{t} {2}\right )& =& \phi _{\mathrm{1D}}(t) + \frac{1} {2}\phi _{\mathrm{1D}}\left ( \frac{t} {2}\right ).{}\end{array}$$
(9)

Note that by choosing \(\tilde{\phi }_{\mathrm{1D}} =\phi _{\mathrm{1D}}\), any synthesis function \(\tilde{\psi }_{\mathrm{1D}}\) which satisfies

$$\displaystyle\begin{array}{rcl} \hat{\tilde{\psi }}_{\mathrm{1D}}(2\nu )\hat{\psi }_{\mathrm{1D}}(2\nu ) = \hat{\phi }_{\mathrm{1D}}^{2}(\nu ) -\hat{\phi }_{\mathrm{ 1D}}^{2}(2\nu )& &{}\end{array}$$
(10)

leads to an exact reconstruction [36] and \(\hat{\tilde{\psi }}_{\mathrm{1D}}(0)\) can take any value. The synthesis function \(\tilde{\psi }_{\mathrm{1D}}\) does not need to verify the admissibility condition (i.e., to have a zero mean).

Figure 5 shows the two functions \(\tilde{\phi }_{\mathrm{1D}}(=\phi _{\mathrm{1D}})\) and \(\tilde{\psi }_{\mathrm{1D}}\) used in the reconstruction in 1D, corresponding to the synthesis filters \(\tilde{h}_{\mathrm{1D}} = h_{\mathrm{1D}}\) and \(\tilde{g}_{\mathrm{1D}} =\delta +h_{\mathrm{1D}}\). More details can be found in [53].

Fig. 5

Left, \(\tilde{\phi }_{\mathrm{1D}}\) the 1D synthesis scaling function and right, \(\tilde{\psi }_{\mathrm{1D}}\) the 1D detail synthesis function

3.4 Starlet Transform: Second Generation

A particular case is obtained when \(\hat{\tilde{\phi }}_{\mathrm{1D}} =\hat{\phi } _{\mathrm{1D}}\) and \(\hat{\psi }_{\mathrm{1D}}(2\nu ) = \frac{\hat{\phi }^{2}_{ \mathrm{1D}}(\nu )-\hat{\phi }_{\mathrm{1D}}^{2}(2\nu )} {\hat{\phi }_{\mathrm{1D}}(\nu )}\), which leads to a filter \(g_{\mathrm{1D}}\) equal to \(\delta -h_{\mathrm{1D}} \star h_{\mathrm{1D}}\). In this case, the synthesis function \(\tilde{\psi }_{\mathrm{1D}}\) is defined by \(\frac{1} {2}\tilde{\psi }_{\mathrm{1D}}( \frac{t} {2}) =\phi _{\mathrm{1D}}(t)\), and the filter \(\tilde{g}_{\mathrm{1D}} =\delta\) is the solution to (8).

We end up with a synthesis scheme where only the smooth part is convolved during the reconstruction.

We derive h from a spline scaling function, for instance \(B_{1}\ (h_{1} = [1,2,1]/4)\) or \(B_{3}\ (h_{3} = [1,4,6,4,1]/16)\) (note that \(h_{3} = h_{1} \star h_{1}\)). Since \(h_{\mathrm{1D}}\) is even-symmetric (i.e., \(H(z) = H(z^{-1})\)), the z-transform of \(g_{\mathrm{1D}}\) is then

$$\displaystyle\begin{array}{rcl} \begin{array}{ll} G(z)& = 1 - H^{2}(z) = 1 - z^{4}\left (\frac{1+z^{-1}} {2} \right )^{8} \\ & = \frac{1} {256}\left (-z^{4} - 8z^{3} - 28z^{2} - 56z + 186 - 56z^{-1} - 28z^{-2} - 8z^{-3} - z^{-4}\right ),\end{array}& &{}\end{array}$$
(11)

which is the z-transform of the filter

$$\displaystyle{g_{\mathrm{1D}} = [-1,-8,-28,-56,186,-56,-28,-8,-1]/256.}$$

We get the following filter bank:

$$\displaystyle\begin{array}{rcl} \begin{array}{ll} h_{\mathrm{1D}} & = h_{3} = \tilde{h} = [1,4,6,4,1]/16 \\ g_{\mathrm{1D}} & =\delta -h \star h = [-1,-8,-28,-56,186,-56,-28,-8,-1]/256\\ \tilde{g}_{ \mathrm{1D}} & =\delta .\end{array} & &{}\end{array}$$
(12)

The second-generation starlet transform algorithm is:

  1. We initialize j to 0 and we start with the data \(c_{j}[k]\).

  2. We carry out a discrete convolution of the data \(c_{j}[k]\) using the filter \(h_{\mathrm{1D}}\). The distance between the central pixel and the adjacent ones is \(2^{j}\). We obtain \(c_{j+1}[k]\).

  3. We carry out exactly the same convolution on \(c_{j+1}[k]\), and we obtain \(c_{j+1}^{{\prime}}[k]\).

  4. After this two-step smoothing, we obtain the discrete starlet wavelet transform from the difference \(w_{j+1}[k] = c_{j}[k] - c_{j+1}^{{\prime}}[k]\).

  5. If j is less than the number J of resolutions we want to compute, we increment j and then go to step 2.

  6. The set \(\alpha =\{ w_{1},\ldots,w_{J},c_{J}\}\) represents the starlet wavelet transform of the data.

As in the standard starlet transform, extension to 2D is trivial. We just replace the convolution with h 1D by a convolution with the filter h 2D, which is performed efficiently by using the separability.

With this specific filter bank, there is no convolution with the filter \(\tilde{g}_{\mathrm{1D}}\) during the reconstruction. Only the low-pass synthesis filter \(\tilde{h}_{\mathrm{1D}}\) is used.

The reconstruction formula is

$$\displaystyle\begin{array}{rcl} c_{j}[l] = (h_{\mathrm{1D}}^{(j)} \star c_{ j+1})[l] + w_{j+1}[l]\,& &{}\end{array}$$
(13)

and denoting \(L^{j} = h^{(0)} \star \ldots \star h^{(j-1)}\) and L 0 = δ, we have

$$\displaystyle\begin{array}{rcl} c_{0}[l] = \left (L^{J} \star c_{ J}\right )[l] +\sum _{ j=1}^{J}\left (L^{j-1} \star w_{ j}\right )[l].& &{}\end{array}$$
(14)

Each wavelet scale is convolved with a low-pass filter.

The second-generation starlet reconstruction algorithm is:

  1. The set \(\alpha =\{ w_{1},\ldots,w_{J},c_{J}\}\) represents the input starlet wavelet transform of the data.

  2. We initialize j to J − 1 and we start with the coefficients \(c_{J}[k]\).

  3. We carry out a discrete convolution of the data \(c_{j+1}[k]\) using the filter \(h_{\mathrm{1D}}\). The distance between the central pixel and the adjacent ones is \(2^{j}\). We obtain \(c_{j+1}^{{\prime}}[k]\).

  4. Compute \(c_{j}[k] = c_{j+1}^{{\prime}}[k] + w_{j+1}[k]\).

  5. If j is larger than 0, set \(j = j - 1\) and then go to step 3.

  6. \(c_{0}\) contains the reconstructed data.

As for the transformation, the 2D extension consists simply of replacing the convolution with \(h_{\mathrm{1D}}\) by a convolution with \(h_{2\text{D}}\).
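The following sketch implements both second-generation algorithms in 1D, reusing the à trous smoothing twice per scale for the analysis and Eq. (13) for the synthesis; the 2D version would apply the smoothing separably along both axes. It is a minimal illustration, not a reference implementation.

```python
import numpy as np
from scipy import ndimage

H = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # B3-spline filter h_1D

def _smooth(c, j):
    # "a trous" convolution: 2**j - 1 zeros inserted between the taps of h_1D
    h_j = np.zeros((len(H) - 1) * 2 ** j + 1)
    h_j[::2 ** j] = H
    return ndimage.convolve1d(c, h_j, mode='mirror')

def starlet2_transform(signal, J):
    c, scales = signal.astype(float), []
    for j in range(J):
        c_next = _smooth(c, j)
        scales.append(c - _smooth(c_next, j))  # w_{j+1} = c_j - c'_{j+1}
        c = c_next
    return scales + [c]                         # [w_1, ..., w_J, c_J]

def starlet2_reconstruct(scales):
    c = scales[-1]
    for j in range(len(scales) - 2, -1, -1):
        c = _smooth(c, j) + scales[j]           # Eq. (13)
    return c
```

By construction, `starlet2_reconstruct(starlet2_transform(x, J))` returns x exactly, since only the smooth part is convolved during the reconstruction.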

Figure 6 shows the analysis scaling and wavelet functions. The synthesis functions \(\tilde{\phi }_{\mathrm{1D}}\) and \(\tilde{\psi }_{\mathrm{1D}}\) are the same as those in Fig. 5. As both are positive, we have a decomposition of an image X on positive synthesis functions \(\tilde{\phi }_{\mathrm{1D}}\) and \(\tilde{\psi }_{\mathrm{1D}}\), but the coefficients α are obtained with the starlet wavelet transform and have zero mean (except for \(c_{J}\)), as in a regular wavelet transform.

Fig. 6

Left, the ϕ 1D analysis scaling function and right, the ψ 1D analysis wavelet function. The synthesis functions \(\tilde{\phi }_{\mathrm{1D}}\) and \(\tilde{\psi }_{\mathrm{1D}}\) are the same as those in Fig. 5

In 2D, similarly, the second-generation starlet transform leads to the representation of an image X[k, l]:

$$\displaystyle\begin{array}{rcl} X[k,l] =\sum _{m,n}\phi _{J,k,l}^{(1)}(m,n)c_{ J}[m,n] +\sum _{ j=1}^{J}\sum _{ m,n}\phi _{j,k,l}^{(2)}(m,n)w_{ j}[m,n]\,& &{}\end{array}$$
(15)

where \(\phi _{j,k,l}^{(1)}(m,n) = 2^{-2j}\tilde{\phi }_{\mathrm{1D}}(2^{-j}(k - m))\tilde{\phi }_{\mathrm{1D}}(2^{-j}(l - n))\) and \(\phi _{j,k,l}^{(2)}(m,n) = 2^{-2j}\tilde{\psi }_{\mathrm{1D}}(2^{-j}(k - m))\tilde{\psi }_{\mathrm{1D}}(2^{-j}(l - n))\).

ϕ (1) and ϕ (2) are positive, and w j are zero mean 2D wavelet coefficients.

The advantage of the second-generation starlet transform will be seen in section “Sparse Positive Decomposition” below.

3.5 Sparse Modeling of Astronomical Images

Using sparse modeling, we now consider that the observed signal X can be represented as a linear combination of a few atoms of the wavelet dictionary \(\boldsymbol{\Phi } = [\phi _{1},\ldots,\phi _{T}]\). The model of Eq. 3 is then replaced by the following:

$$\displaystyle\begin{array}{rcl} Y & =& H\boldsymbol{\Phi }\alpha + N + B{}\end{array}$$
(16)

with \(X =\boldsymbol{ \Phi }\alpha\) and \(\alpha = \left \{w_{1},\ldots,w_{J},c_{J}\right \}\). Furthermore, most of the coefficients α will be equal to zero. The positions and scales of the active coefficients are unknown, but they can be estimated directly from the data Y. We define the multiresolution support M of an image Y by

$$\displaystyle\begin{array}{rcl} M_{j}[k,l] = \left \{\begin{array}{ll} \text{ 1 }&\text{ if }w_{j}[k,l]\text{ is significant} \\ \text{ 0 }&\text{ if }w_{j}[k,l]\text{ is not significant} \end{array} \right.& &{}\end{array}$$
(17)

where \(w_{j}[k,l]\) is the wavelet coefficient of Y at scale j and at position (k, l). Hence, M describes the set of active atoms in Y. If H is compact and not too extended, then M also describes the active set of X well. This is true because the background B is generally very smooth, and therefore a wavelet coefficient \(w_{j}[k,l]\) of Y which does not belong to the coarsest scale depends only on X and N (the term \(\big< \phi _{i},B\big >\) being equal to zero).

3.5.1 Selection of Significant Coefficients Through Noise Modeling

We need now to determine when a wavelet coefficient is significant. Wavelet coefficients of Y are corrupted by noise, which follows in many cases a Gaussian distribution, a Poisson distribution, or a combination of both. It is important to detect the wavelet coefficients which are “significant,” i.e., the wavelet coefficients which have an absolute value too large to be due to noise.

For Gaussian noise, it is easy to derive an estimate of the noise standard deviation \(\sigma _{j}\) at scale j from the noise standard deviation of the data, which can be evaluated with good accuracy in an automated way [55]. To detect the significant wavelet coefficients, it suffices to compare the wavelet coefficients \(w_{j}[k,l]\) to a threshold level \(t_{j}\). \(t_{j}\) is generally taken equal to \(K\sigma _{j}\), where K, as noted in Sect. 2, is chosen between 3 and 5. The value of 3 corresponds to a probability of false detection of 0.27 %. If \(w_{j}[k,l]\) is small, then it is not significant and could be due to noise. If \(w_{j}[k,l]\) is large, it is significant:

$$\displaystyle\begin{array}{rcl} \begin{array}{l} \text{ if }\mid w_{j}[k,l]\mid \ \geq \ t_{j}\ \ \text{ then }w_{j}[k,l]\text{ is significant } \\ \text{ if }\mid w_{j}[k,l]\mid \ < t_{j}\ \ \text{ then }w_{j}[k,l]\text{ is not significant }\end{array} & &{}\end{array}$$
(18)
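Under the Gaussian assumption, the per-scale levels \(\sigma _{j}\) can be calibrated by transforming unit-variance white noise, while the noise level of the data itself can be estimated robustly from the finest scale. The sketch below reuses the `starlet_transform` function sketched earlier; the MAD-based estimator is one common choice and is given purely as an illustration.

```python
import numpy as np

def multiresolution_support(image, J, K=3.0, seed=0):
    """Boolean support of significant wavelet coefficients (Gaussian noise)."""
    rng = np.random.default_rng(seed)
    # per-scale noise standard deviation for unit-variance input noise
    ref = starlet_transform(rng.standard_normal(image.shape), J)
    sigma_e = [w.std() for w in ref[:-1]]
    w = starlet_transform(image, J)
    # robust (median absolute deviation) noise estimate from the finest scale
    sigma = np.median(np.abs(w[0])) / 0.6745 / sigma_e[0]
    return [np.abs(w[j]) >= K * sigma * sigma_e[j] for j in range(J)]
```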

When the noise is not Gaussian, other strategies may be used:

  • Poisson noise: if the noise in the data Y is Poisson, the Anscombe transformation [1] \(\mathcal{A}(Y ) = 2\sqrt{Y + \frac{3} {8}}\) acts as if the data arose from a Gaussian white noise model, with σ = 1, under the assumption that the mean value of Y is sufficiently large (a small code sketch is given after this list). However, this transform has its limits, and it has been shown that it cannot be applied to data with fewer than 20 counts (due to photons) per pixel. So for X-ray or gamma-ray data, other solutions have to be chosen, which handle the case of a small number of events or photons under the assumption of Poisson statistics.

  • Gaussian + Poisson noise: the generalization of variance stabilization [40] is

    $$\displaystyle\begin{array}{rcl} \mathcal{G}(Y [k,l]) = \frac{2} {\alpha } \sqrt{\alpha Y [k, l] + \frac{3} {8}\alpha ^{2} +\sigma ^{2} -\alpha g}& & {}\\ \end{array}$$

    where α is the gain of the detector and g and σ are the mean and the standard deviation of the readout noise.

  • Poisson noise with few events using the MS-VST: for images with very few photons, one solution consists in using the Multi-Scale Variance Stabilization Transform (MS-VST) [66]. The MS-VST combines both the Anscombe transform and the starlet transform in order to produce stabilized wavelet coefficients, i.e., coefficients corrupted by a Gaussian noise with a standard deviation equal to 1. In this framework, wavelet coefficients are now calculated by

    $$\displaystyle\begin{array}{rcl} \begin{array}{c} \text{Starlet}\\ + \\ \text{MS-VST}\\ \end{array} \left \{\begin{array}{lll} c_{j} & =&\sum _{m}\sum _{n}h_{1{\mathrm{D}}}[m]h_{1{\mathrm{D}}}[n] \\ & &c_{j-1}[k + 2^{j-1}m,l + 2^{j-1}n] \\ w_{j}& =&\mathcal{A}_{j-1}(c_{j-1}) -\mathcal{A}_{j}(c_{j})\end{array} \right.& & {}\end{array}$$
    (19)

    where \(\mathcal{A}_{j}\) is the VST operator at scale j defined by

    $$\displaystyle{ \mathcal{A}_{j}(c_{j}) = b^{(j)}\sqrt{\vert c_{ j} + e^{(j)}\vert } }$$
    (20)

    where the variance stabilization constants b (j) and e (j) only depend on the filter h 1D and the scale level j. They can all be precomputed once for any given h 1D [66]. The multiresolution support is computed from the MS-VST coefficients, considering a Gaussian noise with a standard deviation equal to 1. This stabilization procedure is also invertible as we have

    $$\displaystyle{ c_{0} = \mathcal{A}_{0}^{-1}\left [\mathcal{A}_{ J}(c_{J}) +\sum _{ j=1}^{J}w_{ j}\right ] }$$
    (21)

For other kinds of noise (correlated noise, nonstationary noise, etc.), other solutions have been proposed to derive the multiresolution support [57].
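As a small illustration of the variance stabilization described in the bullets above, both transforms are essentially one-liners. The parameter names follow the equations (gain α, readout mean g and standard deviation σ); in a real pipeline they would come from the instrument characterization.

```python
import numpy as np

def anscombe(y):
    # Poisson data -> approximately Gaussian with sigma = 1
    # (only valid for roughly 20 or more counts per pixel)
    return 2.0 * np.sqrt(y + 3.0 / 8.0)

def gen_anscombe(y, alpha, g, sigma):
    # generalized stabilization for mixed Gaussian + Poisson noise
    return (2.0 / alpha) * np.sqrt(alpha * y + 3.0 / 8.0 * alpha ** 2
                                   + sigma ** 2 - alpha * g)
```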

3.6 Sparse Positive Decomposition

Many astronomical images can be modeled as a sum of positive features, like stars and galaxies, which are more or less isotropic. The previous representation, based on the starlet transform, is well adapted to the representation of isotropic objects, but it does not introduce any prior on the positivity of the features contained in the image. A positive and sparse modeling of astronomical images is similar to Eq. 16:

$$\displaystyle\begin{array}{rcl} Y & =& H\boldsymbol{\Phi }\alpha + N + B{}\end{array}$$
(22)

or

$$\displaystyle\begin{array}{rcl} Y & =& \boldsymbol{\Phi }\alpha + N + B{}\end{array}$$
(23)

if we do not take into account the point spread function. All coefficients in α are now positive, and all atoms in the dictionary \(\boldsymbol{\Phi }\) are positive functions. Such a decomposition normally requires computationally intensive algorithms such as matching pursuit [37]. The second-generation starlet transform offers us a new way to perform such a decomposition. Indeed, we have seen in section “Starlet Transform: Second Generation” that, using a specific filter bank, we can decompose an image Y on a positive dictionary \(\boldsymbol{\Phi }\) (see Fig. 5) and obtain a set of coefficients \(\alpha ^{(Y )}\), where \(\alpha ^{(Y )} = \mathbf{W}Y =\{ w_{1},\ldots,w_{J},c_{J}\}\), W being the starlet wavelet transform operator. These α coefficients can be positive or negative and are obtained using the standard starlet wavelet transform algorithm. Hence, by setting to zero all negative (respectively, positive) coefficients, the reconstruction is always positive (respectively, negative), since \(\boldsymbol{\Phi }\) contains only positive atoms.

Hence, we would like to have a sparse set of positive coefficients \(\tilde{\alpha }\) which verify \(\boldsymbol{\Phi }\tilde{\alpha } = Y\). But in order to take into account the background and the noise, we need to define the constraint in the wavelet space (i.e., \(\mathbf{W}\boldsymbol{\Phi }\tilde{\alpha } = \mathbf{W}Y =\alpha ^{(Y )}\)), and this constraint must be applied only to the subset of coefficients in α (Y ) which are larger than the detection level. Therefore, to get a sparse positive decomposition on \(\boldsymbol{\Phi }\), we need to minimize

$$\displaystyle\begin{array}{rcl} \tilde{\alpha }=\min _{\alpha } \parallel \alpha \parallel _{1}\ \ \text{ s.t. }M\mathbf{W}\boldsymbol{\Phi }\alpha = M\alpha ^{(Y )}\ \,& &{}\end{array}$$
(24)

where M is the multiresolution support defined in the previous section (i.e., M j [k, l] = 1 if a significant coefficient is detected at scale j and at position (k, l), and zero otherwise). To remove the background, we have to set \(M_{J+1}[k,l] = 0\) for all (k, l).

It has been shown that such optimization problems can be efficiently solved through an iterative soft thresholding (IST) algorithm [14, 24, 52]. The following algorithm, based on IST, allows us to take the noise modeling into account through the multiresolution support and forces the coefficients to all be positive:

  1. Take the second-generation starlet wavelet transform of the data Y; we obtain \(\alpha ^{(Y )}\).

  2. From a given noise model, determine the multiresolution support M.

  3. Set the number of iterations \(N_{{\mathrm{iter}}}\), the first threshold \(\lambda ^{(0)} = {\mathrm{MAX}}(\alpha ^{(Y )})\), and the solution \(\tilde{\alpha }^{(0)} = 0\).

  4. For i = 1 to \(N_{{\mathrm{iter}}}\) do:

    • Reconstruct the image \(\tilde{Y }^{(i)}\) from \(\tilde{\alpha }^{(i)}:\ \tilde{Y }^{(i)} =\boldsymbol{ \Phi }\tilde{\alpha }^{(i)}\).

    • Take the second-generation starlet wavelet transform of \(\tilde{Y }^{(i)}\); we obtain \(\alpha ^{\tilde{Y }^{(i)} } = \mathbf{W}\boldsymbol{\Phi }\tilde{\alpha }^{(i)}\).

    • Compute the significant residual \(r^{(i)}\):

      $$\displaystyle\begin{array}{rcl} r^{(i)} = M\left (\alpha ^{(Y )} -\alpha ^{\tilde{Y }^{(i)} }\right ) = M\left (\alpha ^{(Y )} -\mathbf{W}\boldsymbol{\Phi }\tilde{\alpha }^{(i)}\right )& & {}\end{array}$$
      (25)

    • Calculate the value \(\lambda ^{(i)} =\lambda ^{(0)}(1 - i/N_{{\mathrm{iter}}})\).

    • Update the solution by adding the residual, applying a soft thresholding on positive coefficients using the threshold level \(\lambda ^{(i)}\), and setting all negative coefficients to zero:

      $$\displaystyle\begin{array}{rcl} \begin{array}{ll} \tilde{\alpha }^{(i+1)} & = \left (\tilde{\alpha }^{(i)} + r^{(i)} -\lambda ^{(i)}\right )_{+} \\ & = \left (\tilde{\alpha }^{(i)} + M\left (\alpha ^{(Y )} -\mathbf{W}\boldsymbol{\Phi }\tilde{\alpha }^{(i)}\right ) -\lambda ^{(i)}\right )_{+}\end{array} & & {}\end{array}$$
      (26)

    • \(i = i + 1\).

  5. The set \(\tilde{\alpha } = \tilde{\alpha }^{(N_{{\mathrm{iter}}})}\) represents the sparse positive decomposition of the data.

The threshold parameter λ (i) decreases with the iteration number, and it plays a role similar to the cooling parameter of the simulated annealing techniques, i.e., it allows the solution to escape from local minima.
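A compact sketch of this iteration is given below, built on the 1D `starlet2_transform` and `starlet2_reconstruct` functions sketched earlier (2D data would use their separable 2D counterparts). The support M is assumed precomputed, with its coarsest entry set to False to remove the background; the sketch omits the refinements of published implementations.

```python
import numpy as np

def positive_starlet_decomposition(Y, J, M, n_iter=100):
    """Sparse positive decomposition via iterative soft thresholding (sketch)."""
    alpha_Y = starlet2_transform(Y, J)
    lam0 = max(a.max() for a in alpha_Y)        # first threshold lambda^(0)
    alpha = [np.zeros_like(a) for a in alpha_Y]
    for i in range(1, n_iter + 1):
        Y_i = starlet2_reconstruct(alpha)       # Phi alpha^(i)
        alpha_i = starlet2_transform(Y_i, J)    # W Phi alpha^(i)
        lam = lam0 * (1.0 - i / n_iter)         # linearly decreasing threshold
        for j in range(J + 1):
            r = np.where(M[j], alpha_Y[j] - alpha_i[j], 0.0)  # significant residual
            alpha[j] = np.maximum(alpha[j] + r - lam, 0.0)    # soft threshold, keep positive
    return alpha
```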

3.6.1 Example 1: Sparse Positive Decomposition of NGC 2997

Figure 7 shows the positive starlet decomposition, using 100 iterations, and can be compared to Fig. 3.

Fig. 7

Positive starlet decomposition of the galaxy NGC 2997 with six scales

3.6.2 Example 2: Sparse Positive Starlet Decomposition of a Simulated Image

The next example compares the standard starlet transform to the positive starlet decomposition (PSD) on a simulated image.

Figure 8 shows, from top to bottom and left to right, (a) the original simulated image, (b) the noisy data, (c) the reconstruction from the PSD coefficients, and (d) the residual between the noisy data and the PSD reconstructed image (i.e., image b minus image c). The PSD reconstructed image gives a very good approximation of the original image. No structures can be seen in the residual, and all sources are well detected.

Fig. 8

(a and b) Original simulated image and the same image contaminated by a Gaussian noise. (c and d) Reconstructed image for the positive starlet coefficients of the noisy image using 50 iterations, and residual (i.e., noisy image – reconstructed image)

The first PSD scale does not contain any nonzero coefficient. Figure 9, top, shows the first four scales of the wavelet transform, and Fig. 9, bottom, the first four scales of the PSD.

Fig. 9

Top, starlet transform, and bottom, positive starlet decomposition of a simulated astronomical image

4 Source Detection Using a Sparsity Model

As described in the previous section, the wavelet coefficients of Y which do not belong to the coarsest scale \(c_{J}\) do not depend on the background. This is a significant advantage, since background estimation can sometimes be very problematic.

Two approaches have been proposed to detect sources, assuming the signal is sparse in the wavelet domain. The first consists of removing the noise and the background, and then applying the standard approach described in Sect. 2; it has been used for many years for X-ray source detection [45, 59]. The second approach, called the Multiscale Vision Model [8], attempts to define an astronomical object directly in wavelet space.

4.1 Detection Through Wavelet Denoising

The most commonly used filtering method is hard thresholding, which consists of setting to 0 all wavelet coefficients of Y which have an absolute value lower than a threshold t j :

$$\displaystyle\begin{array}{rcl} \tilde{w}_{j}[k,l] = \left \{\begin{array}{ll} w_{j}[k,l]&\text{ if }\mid w_{j}[k,l]\mid > t_{j} \\ 0 &\text{ otherwise } \end{array} \right.& &{}\end{array}$$
(27)

More generally, for a given sparse representation (i.e., wavelet) with its associated fast transform W and fast reconstruction R, we can derive a hard threshold denoising solution X from the data Y, by first estimating the multiresolution support M using a given noise model, and then calculating

$$\displaystyle{ X = \mathbf{R}M\mathbf{W}Y. }$$
(28)

We transform the data, multiply the coefficients by the support, and reconstruct the solution.

The solution can however be improved by considering the following optimization problem, \(\min _{X} \parallel M(\mathbf{W}Y -\mathbf{W}X) \parallel _{2}^{2}\), where M is the multiresolution support of Y. A solution can be obtained using the Landweber iterative scheme [51, 58]:

$$\displaystyle\begin{array}{rcl} X^{n+1} = X^{n} + \mathbf{R}M\left [\mathbf{W}Y -\mathbf{W}X^{n}\right ]& &{}\end{array}$$
(29)

If the solution is known to be positive, the positivity constraint can be introduced using the following equation:

$$\displaystyle\begin{array}{rcl} X^{n+1} = P_{ +}\left (X^{n} + \mathbf{R}M\left [\mathbf{W}Y -\mathbf{W}X^{n}\right ]\right )& &{}\end{array}$$
(30)

where P + is the projection on the cone of nonnegative images.

This algorithm allows us to constrain the residual to have a zero value within the multiresolution support [58]. For astronomical image filtering, iterating improves the results significantly, especially for the photometry (i.e., the integrated number of photons in a given object).

Removing the background in the solution is straightforward. The algorithm does not need to be modified. We only need to set to zero the coefficients related to the coarsest scale in the multiresolution support: \(\forall (k,l),\ \ M_{J}[k,l] = 0\).
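A sketch of the scheme of Eqs. (29) and (30) follows, built on the `starlet_transform` and `multiresolution_support` functions sketched earlier; for the first-generation starlet, the reconstruction operator R is a plain co-addition of the scales. It is a simplified illustration.

```python
import numpy as np

def wavelet_denoise(Y, J, support, n_iter=10, positivity=True,
                    remove_background=False):
    """Hard-threshold denoising with Landweber iterations (Eqs. 28-30)."""
    # `support` covers scales 1..J; the coarsest scale is kept unless removed
    M = list(support) + [np.ones_like(Y, dtype=bool)]
    if remove_background:
        M[-1][:] = False          # zero the coarsest scale: M_J = 0
    wY = starlet_transform(Y, J)
    X = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        wX = starlet_transform(X, J)
        # residual restricted to the multiresolution support, then co-added (R)
        X = X + sum(np.where(M[j], wY[j] - wX[j], 0.0) for j in range(J + 1))
        if positivity:
            X = np.maximum(X, 0.0)  # projection P_+ onto nonnegative images
    return X
```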

4.2 The Multiscale Vision Model

4.2.1 Introduction

The starlet transform of an image Y produces at each scale j a set {w j }, which has the same number of pixels as the image. The original image Y can be expressed as the sum of all the wavelet scales and the smoothed array \(c_{J}\) by the expression

$$\displaystyle\begin{array}{rcl} Y [k,l] = c_{J}[k,l] +\sum _{ j=1}^{J}w_{ j}[k,l].& &{}\end{array}$$
(31)

Hence, we have a multiscale pixel representation, i.e., each pixel of the input image is associated with a set of pixels of the multiscale transform. A further step is to consider a multiscale object representation, which would associate with an object contained in the data a volume in the multiscale transform. Such a representation obviously depends on the kind of image we need to analyze, and we present here a model which has been developed for astronomical data. It may however be used for other kinds of data, to the extent that such data are similar to astronomical data. We assume that an image Y can be decomposed into a set of components:

$$\displaystyle\begin{array}{rcl} Y [k,l] =\sum _{ i=1}^{N_{o} }X_{i}[k,l] + B[k,l] + N[k,l]& &{}\end{array}$$
(32)

where N o is the number of components, X i are the components contained in the data (stars, galaxies, etc.), B is the background image, and N is the noise.

To perform such a decomposition, we have to detect, to extract, to measure, and to recognize the significant structures. This is done by first computing the multiresolution support of the image (i.e., the set of significant active coefficients) and by applying a segmentation scale by scale. The wavelet space of a 2D direct space is a 3D volume. An object, associated with a component, has to be defined in this space. A general idea for object definition lies in the connectivity property. An object occupies a physical region, and in this region we can join any pixel to other pixels based on significant adjacency. Connectivity in direct space has to be transported into wavelet transform space. In order to define the objects, we have to identify the wavelet transform space pixels we can attribute to the objects. We describe in this section the different steps of this method.

4.2.2 Multiscale Vision Model Definition

The multiscale vision model, MVM [8], describes an object as a hierarchical set of structures. It uses the following definitions:

  • Significant wavelet coefficient: a wavelet coefficient is significant when its absolute value is above a given detection limit. The detection limit depends on the noise model (Gaussian noise, Poisson noise, and so on). See section “Sparse Modeling of Astronomical Images” for more details.

  • Structure: a structure S j, k is a set of significant connected wavelet coefficients at the same scale j:

    $$\displaystyle{ S_{j,k} =\{ w_{j}[k_{1},l_{1}],w_{j}[k_{2},l_{2}],\ldots,w_{j}[k_{p},l_{p}]\} }$$
    (33)

    where p is the number of significant coefficients included in the structure \(S_{j,k}\) and \(w_{j}[k_{i},l_{i}]\) is a wavelet coefficient at scale j and at position \((k_{i},l_{i})\).

  • Object: an object is a set of structures:

    $$\displaystyle{ O_{l} =\{ S_{j_{1},k_{1}},\ldots,S_{j_{n},k_{n}}\} }$$
    (34)

    We define also the operator \(\mathcal{L}\) which indicates to which object a given structure belongs: \(\mathcal{L}(S_{j,k}) = l\) if \(S_{j,k} \in O_{l}\), and \(\mathcal{L}(S_{j,k}) = 0\) otherwise.

  • Object scale: the scale of an object is given by the scale of the maximum of its wavelet coefficients.

  • Interscale relation: the criterion allowing us to connect two structures into a single object is called the “interscale relation.”

  • Sub-object: a sub-object is a part of an object. It appears when an object has a local wavelet maximum. Hence, an object can be composed of several sub-objects. Each sub-object can also be analyzed.

4.2.3 From Wavelet Coefficients to Object Identification

4.2.4 Multiresolution Support Segmentation

Once the multiresolution support has been calculated, we have at each scale a Boolean image (i.e., the pixel intensity equals 1 when a significant coefficient has been detected, and 0 otherwise). The segmentation consists of labeling the Boolean scales. Each group of connected pixels having a “1” value gets a label value between 1 and \(L_{\max }\), \(L_{\max }\) being the number of groups. This process is repeated at each scale of the multiresolution support. We define a “structure” \(S_{j,i}\) as the group of connected significant pixels which has the label i at a given scale j.
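Per scale, this labeling reduces to a connected-component search; a minimal sketch operating on the support computed earlier:

```python
from scipy import ndimage

def segment_support(support):
    # one (labels, n_structures) pair per scale of the multiresolution support;
    # pixels sharing a nonzero label form a structure S_{j,i}
    return [ndimage.label(M_j) for M_j in support]
```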

4.2.5 Interscale Connectivity Graph

An object is described as a hierarchical set of structures. The rule which allows us to connect two structures into a single object is called the “interscale relation.” Figure 10 shows how several structures at different scales are linked together to form objects. We now have to define this interscale relation. Let us consider two structures at two successive scales, \(S_{j,k}\) and \(S_{j+1,l}\). Each structure is located in one of the individual images of the decomposition and corresponds to a region in this image where the signal is significant. Denoting \((x_{m},y_{m})\) the pixel position of the maximum wavelet coefficient value of \(S_{j,k}\), \(S_{j,k}\) is said to be connected to \(S_{j+1,l}\) if \(S_{j+1,l}\) contains the pixel position \((x_{m},y_{m})\) (i.e., the pixel position of the maximum wavelet coefficient of the structure \(S_{j,k}\) must also be contained in the structure \(S_{j+1,l}\)). Several structures appearing in successive wavelet coefficient images can be connected in such a way, which we call an object in the interscale connectivity graph. Hence, we identify \(n_{o}\) objects in wavelet space, each object \(O_{i}\) being defined by a set of structures, and we can assign to each structure a label i, with \(i \in [1,n_{o}]\): \(\mathcal{L}(S_{j,k}) = i\) if the structure \(S_{j,k}\) belongs to the ith object.
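This linking rule can be sketched as follows, operating on the per-scale wavelet coefficients w (the first J outputs of the starlet transform) and on the labels produced by `segment_support`; the dictionary keys (j, k) index the structures and are an illustrative convention.

```python
import numpy as np

def interscale_links(w, labeled):
    """Connect S_{j,k} to the structure at scale j+1 containing the position
    of its maximum wavelet coefficient (the interscale relation)."""
    links = {}
    for j in range(len(labeled) - 1):
        labels_j, n_j = labeled[j]
        labels_up, _ = labeled[j + 1]
        for k in range(1, n_j + 1):
            masked = np.where(labels_j == k, w[j], -np.inf)
            xm, ym = np.unravel_index(np.argmax(masked), masked.shape)
            if labels_up[xm, ym] > 0:       # S_{j,k} is connected to S_{j+1,l}
                links[(j, k)] = (j + 1, labels_up[xm, ym])
    return links
```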

Fig. 10

Example of connectivity in wavelet space: contiguous significant wavelet coefficients form a structure, and following an interscale relation, a set of structures forms an object. Two structures \(S_{j}\), \(S_{j+1}\) at two successive scales belong to the same object if the pixel position of the maximum wavelet coefficient value of \(S_{j}\) is included in \(S_{j+1}\)

4.2.6 Filtering

Statistically, some significant structures can be due to noise. They contain very few pixels and are generally isolated, i.e., connected to no structure at upper or lower scales. So, to avoid false detections, the isolated structures can be removed from the initial interscale connection graph. Structures at the border of the images may also have been detected because of border problems and can be removed.

4.2.7 Merging/Deblending

As in the standard approach, true objects which are too close may generate a set of connected structures, initially associated with the same object, and a decision must be taken whether to consider such a case as one or two objects. Several cases may be distinguished:

  • Two (or more) close objects, approximately of the same size, generate a set of structures. At a given scale j, two separate structures \(S_{j,1}\) and \(S_{j,2}\) are detected, while at scale j + 1, only one structure \(S_{j+1,1}\) is detected, which is connected to both \(S_{j,1}\) and \(S_{j,2}\).

  • Two (or more) close objects of different sizes generate a set of structures, from scale j to scale k (k > j).

In the wavelet space, the merging/deblending decision will be based on the local maxima values of the different structures belonging to this object. A new object (i.e., deblending) is derived from the structure S j, k if there exists at least one other structure at the same scale belonging to the same object (i.e., there exists one structure S j+1, a and at least one structure S j, b such that \(\mathcal{L}(S_{j+1,a}) = \mathcal{L}(S_{j,b}) = \mathcal{L}(S_{j,k})\)) and if the following relationship is verified: \(w_{j}^{m} > w_{j-1}^{m}\) and \(w_{j}^{m} > w_{j+1}^{m}\), where:

  • w j m is the maximum wavelet coefficient of the structure S j, k : w j m = Max(S j, k ):

    • \(w_{j-1}^{m} = 0\) if S j, k is not connected to any structure at scale j − 1.

    • w j−1 m is the maximum wavelet coefficient of the structure S j−1, l , where S j−1, l is such that \(\mathcal{L}(S_{j-1,l}) = \mathcal{L}(S_{j,k})\) and the position of its highest wavelet coefficient is the closest to the position of the maximum of S j, k .

  • \(w_{j+1}^{m} =\mathrm{ Max}\{w_{j+1,x_{1},y_{1}},\ldots,w_{j+1,x_{n},y_{n}}\}\), where all wavelet coefficients w j+1, x, y are at a position which belongs also to S j, k (i.e., w j, x, y S j, k ).

When these conditions are verified, S j, k and all structures at smaller scales which are directly or indirectly connected to S j, k will define a new object.
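This test is easy to express in code; the following is a minimal sketch in which the maxima of the connected structures at scales j − 1 and j are assumed to have been computed already (the helper name and its arguments are hypothetical):

```python
import numpy as np

def is_new_object(w_jm1_max, w_j_max, w_jp1, structure_mask):
    """Deblending test for a structure S_{j,k}.

    w_jm1_max      : max coefficient of the connected structure at scale j-1
                     (0.0 if S_{j,k} is connected to nothing at scale j-1)
    w_j_max        : max coefficient of S_{j,k} itself
    w_jp1          : wavelet coefficient image at scale j+1
    structure_mask : boolean mask of the pixels belonging to S_{j,k}
    """
    # w_{j+1}^m: maximum at scale j+1 over the positions of S_{j,k}
    w_jp1_max = w_jp1[structure_mask].max()
    # a local maximum along scales seeds a new (deblended) object
    return w_j_max > w_jm1_max and w_j_max > w_jp1_max
```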

4.2.8 Object Identification

We can now summarize this method, which allows us to identify all the objects in a given image Y (a code sketch of the noise estimation and thresholding steps follows the list):

  1. We compute the wavelet transform with the starlet algorithm, which leads to a set \(\alpha = \mathbf{W}Y =\{ w_{1},\ldots,w_{J},c_{J}\}\). Each scale w j has the same size as the input image.

  2. We determine the noise standard deviation in w 1.

  3. We deduce the thresholds at each scale from the noise modeling.

  4. We threshold scale by scale and we do an image labeling.

  5. We determine the interscale relations.

  6. We identify all the wavelet coefficient maxima of the wavelet transform space.

  7. We extract all the connected trees resulting from each wavelet transform space maximum.
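Steps 2–4 can be sketched as follows. The starlet coefficients w 1, …, w J are assumed to have been computed beforehand, and the tabulated per-scale noise factors are illustrative placeholders rather than the published table:

```python
import numpy as np
from scipy import ndimage

# Noise standard deviation per starlet scale for unit-variance Gaussian
# noise (illustrative values; the exact table depends on the wavelet).
SCALE_SIGMA = [0.889, 0.200, 0.086, 0.041, 0.020, 0.010]

def detect_structures(wavelet_scales, k=3.0):
    """Steps 2-4: noise estimation, per-scale thresholding, labeling."""
    # step 2: robust noise estimate in w_1 via the median absolute deviation
    sigma = np.median(np.abs(wavelet_scales[0])) / (0.6745 * SCALE_SIGMA[0])
    labeled = []
    for w_j, s_j in zip(wavelet_scales, SCALE_SIGMA):
        support = np.abs(w_j) > k * sigma * s_j        # step 3: threshold scale j
        labels, n_structures = ndimage.label(support)  # step 4: labeling
        labeled.append(labels)
    return labeled  # steps 5-7 then link the structures across scales
```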

4.3 Source Reconstruction

4.3.1 Partial Reconstruction as an Inverse Problem

A set of structures \(\mathcal{S}_{i}\) (\(\mathcal{S}_{i} =\{ S_{j,k},\ldots,S_{j^{{\prime}},k^{{\prime}}}\}\)) defines an object O i which can be reconstructed separately from other objects, in order to provide the components X i . The co-addition of all reconstructed objects is a filtered version of the input data. We will denote α i the set of wavelet coefficients belonging to the object O i . Therefore, α i is a subset of the wavelet transform of X i , \(\tilde{\alpha }_{i} = \mathbf{W}X_{i}\). Indeed, the last scale of \(\tilde{\alpha }_{i}\) is unknown, as are many wavelet coefficients which have not been detected. The reconstruction problem then consists of searching for an image X i such that its wavelet transform reproduces the coefficients α i (i.e., they are the same as those of \(\mathcal{S}_{i}\), the detected structures). If W describes the wavelet transform operator and P w the projection operator onto the subspace of the detected coefficients (i.e., setting to zero all coefficients at scales and positions where nothing was detected), the solution is found by minimization of

$$\displaystyle\begin{array}{rcl} \min _{X_{i}} \parallel \alpha _{i} - P_{w}\left (\mathbf{W}X_{i}\right ) \parallel ^{2}& &{}\end{array}$$
(35)

The size of the restored image X i is arbitrary, and it can be easily set greater than the number of known coefficients. It is certain that there exists at least one image X i which gives exactly α i , i.e., the original one. But generally we have an infinity of solutions, and we have to choose among them the one which is considered as correct. An image is always a positive function, which leads us to constrain the solution, but this is not sufficient to get a unique solution. More details on the reconstruction algorithm can be found in [8, 57].
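For illustration, Eq. 35 can be minimized with a simple projected Landweber-type iteration. This is only a sketch, not the algorithm of [8, 57]: `W` and `WT` stand for user-supplied forward and inverse wavelet operators, and `support` is the boolean mask of detected coefficients implementing P w :

```python
import numpy as np

def reconstruct_object(alpha_i, W, WT, support, n_iter=50, step=1.0):
    """Minimize ||alpha_i - P_w(W X)||^2 under a positivity constraint."""
    X = np.maximum(WT(np.where(support, alpha_i, 0.0)), 0.0)  # initial guess
    for _ in range(n_iter):
        residual = np.where(support, alpha_i - W(X), 0.0)  # P_w projection
        X = X + step * WT(residual)     # gradient step on the quadratic cost
        X = np.maximum(X, 0.0)          # positivity of the solution
    return X
```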

4.4 Examples

4.4.1 Band Extraction

We simulated a spectrum containing an emission band at \(3.50\,\upmu {\mathrm{m}}\), with nonstationary noise superimposed on a smooth continuum. The band is a Gaussian of width \(\text{FWHM} = 0.01\,\upmu {\mathrm{m}}\) (FWHM = full width at half-maximum), normalized such that its maximum value equals ten times the local noise standard deviation.

Figure 11 (top) contains the simulated spectrum. The wavelet analysis results in the detection of an emission band at \(3.50\,\upmu {\mathrm{m}}\) above 3σ. Figure 11 (middle) shows the reconstruction of the detected band in the simulated spectrum. The real feature is overplotted as a dashed line. Figure 11 (bottom) contains the original simulation with the reconstructed band subtracted. It can be seen that there are no strong residuals near the location of the band, which indicates that the band is well reconstructed. The center position of the band, its FWHM, and its maximum can then be estimated via a Gaussian fit. More details about the use of MVM for spectral analysis can be found in [60].

Fig. 11

Top: simulated spectrum. Middle: reconstructed simulated band (full line) and original band (dashed line). Bottom: simulated spectrum minus the reconstructed band

4.4.2 Star Extraction in NGC 2997

We applied MVM to the galaxy NGC 2997 (Fig. 12, top left). Two images were created by co-adding objects detected from scales 1 and 2 and from scales 3 to 6. They are displayed, respectively, in Fig. 12, top right and bottom left. Figure 12, bottom right, shows the difference between the input data and the image which contained the objects from scales 1 and 2. As we can see, all small objects have been removed, and the galaxy can be better analyzed.

Fig. 12

(a) Galaxy NGC 2997, (b) objects detected from scales 1 and 2, (c) objects detected from scales 3 to 6, and (d) difference between (a) and (b)

4.4.3 Galaxy Nucleus Extraction

Figure 13 shows the extracted nucleus of NGC 2997 using the MVM method and the difference between the galaxy image and the nucleus image.

Fig. 13

Upper left, galaxy NGC 2997; upper right, extracted nucleus; bottom, difference between the two previous images

5 Deconvolution

Up to now, the PSF H has not been considered in the source detection. This means that all morphological parameters (size, ellipticity, etc.) derived from the detected objects need to be corrected for the PSF. Very close objects may also be seen as a single object because H acts as a blurring operator on the data. A solution may consist in first deconvolving the data and carrying out the source detection afterwards.

The problem of image deconvolution is ill-posed [3], and as a consequence, the matrix H modeling the imaging system is ill-conditioned. If Y is the observed image and X the unknown object, the equation HX = Y does not have a unique and stable solution. Therefore, one must look for approximate solutions of this equation that are also physically meaningful. One approach is Tikhonov regularization theory [23]; a more general approach is provided by the so-called Bayes paradigm [25], even if it is applicable only to discrete problems. In this framework one can both take into account the statistical properties of the data (Tikhonov regularization is obtained by assuming additive Gaussian noise) and introduce a priori information on the unknown object.

5.1 Statistical Approach to Deconvolution

We assume that the detected image Y is the realization of a multivalued random variable I corresponding to the (unknown) value X of another multivalued random variable, the object O. Moreover, we assume that the conditional probability distribution p I (Y | X) is known. Since the unknown object appears as a set of unknown parameters, the problem of image deconvolution can be considered as a classical problem of parameter estimation. The standard approach is the maximum likelihood (ML) method . In our specific application, for a given detected image Y, this consists of introducing the likelihood function defined by

$$\displaystyle{ L_{Y }(X) = p_{I}(Y \vert X). }$$
(36)

Then the ML estimate of the unknown object is any maximizer X of the likelihood function

$$\displaystyle{ X^{{\ast}} = \text{arg}\max _{ X\in \mathbb{R}^{n}}\ L_{Y }(X), }$$
(37)

if it exists.

In our applications the likelihood function is the product of a very large number of terms (the data components are assumed to be statistically independent), so that it is convenient to take the logarithm of this function; moreover, if we consider its negative logarithm, the maximization problem is transformed into a minimization one. Let us consider the function

$$\displaystyle{ J_{0}(X;Y ) = -A\ {\mathrm{ln}}\ L_{Y }(X) + B, }$$
(38)

where A, B are suitable constants. They are introduced in order to obtain a function which has a simple expression and is also nonnegative since, in our applications, the negative logarithm of the likelihood is bounded from below. Then, it is easy to verify that the problem of Eq. 37 is equivalent to the following one:

$$\displaystyle{ X^{{\ast}} = \text{arg}\min _{ X\in \mathbb{R}^{n}}\ J_{0}(X;Y ). }$$
(39)

We consider now the model of Eq. 2 with three different examples of noise.

Example 1.

In the case of additive white Gaussian noise, by a suitable choice of the constants A, B, we obtain (we assume here that the background B is not subtracted even if it must be estimated)

$$\displaystyle{ J_{0}(X;Y ) = \vert \vert HX + B - Y \vert \vert ^{2}, }$$
(40)

and therefore, the ML approach coincides with the well-known least-squares (LS) approach. It is also well known that the function of Eq. 40 is convex, and strictly convex if and only if the equation HX = 0 has only the solution X = 0. Moreover, it always has absolute minimizers, i.e., the LS problem always has a solution; but the problem is ill-conditioned because it is equivalent to the solution of the Euler equation:

$$\displaystyle{ H^{T}H\ X = H^{T}(Y - B). }$$
(41)

We remark that the ill-posedness of the LS problem is the starting point of Tikhonov regularization theory (see, for instance, [23, 63]), and therefore, this theory is based on the tacit assumption that the noise affecting the data is additive and Gaussian.

We remark that, in the case of object reconstruction, since objects are nonnegative, we should consider the minimization of the function of Eq. 40 on the nonnegative orthant. With such a constraint the problem is not treatable in the standard framework of regularization theory.
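As an illustration, the nonnegatively constrained LS problem can be attacked by a projected Landweber iteration; the sketch below assumes a normalized, space-invariant PSF, so that a unit step length is admissible:

```python
import numpy as np
from scipy.signal import fftconvolve

def ls_deconvolve(Y, psf, background=0.0, n_iter=200, tau=1.0):
    """Projected Landweber iteration for Eq. 40 on the nonnegative orthant."""
    H = lambda x: fftconvolve(x, psf, mode="same")               # blur
    HT = lambda x: fftconvolve(x, psf[::-1, ::-1], mode="same")  # adjoint
    X = np.maximum(Y - background, 0.0)            # initial guess
    for _ in range(n_iter):
        grad = HT(H(X) + background - Y)           # gradient of Eq. 40 (up to 2)
        X = np.maximum(X - tau * grad, 0.0)        # project on X >= 0
    return X
```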

Example 2.

In the case of Poisson noise, if we introduce the so-called generalized Kullback–Leibler (KL) divergence of a vector Z from a vector Y, defined by

$$\displaystyle{ D_{{\mathrm{KL}}}(Y,Z) =\sum _{ i=1}^{m}\left \{Y _{i}\ {\mathrm{ln}}\ \frac{Y _{i}} {Z_{i}} + Z_{i} - Y _{i}\right \}, }$$
(42)

then, with a suitable choice of the constants A, B, the function J 0(X; Y ) is given by

$$\displaystyle\begin{array}{rcl} \begin{array}{ll} J_{0}(X;Y )& = D_{{\mathrm{KL}}}(Y,HX + B) \\ & =\sum _{ i=1}^{m}\left \{Y _{i}\ {\mathrm{ln}}\ \frac{Y _{i}} {(HX+B)_{i}} + (HX + B)_{i} - Y _{i}\right \}\ .\end{array} & &{}\end{array}$$
(43)

It is quite natural to take the nonnegative orthant as the domain of this function. Moreover, it is well known that it is convex (strictly convex if the equation HX = 0 has only the solution X = 0), nonnegative, and coercive. Therefore, it has absolute minimizers. However, these minimizers are strongly affected by noise, and the specific effect of the noise in this problem is known as checkerboard effect [41], since many components of the minimizers are zero.
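For reference, the generalized KL divergence of Eq. 42 is straightforward to evaluate; a small constant guards the logarithm against zero values:

```python
import numpy as np

def kl_divergence(Y, Z, eps=1e-12):
    """Generalized Kullback-Leibler divergence of Z from Y (Eq. 42)."""
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    return np.sum(Y * np.log((Y + eps) / (Z + eps)) + Z - Y)
```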

Example 3.

In the case of Gauss+Poisson noise, the function J 0(X; Y ) has a much more complex form. This function is also convex (strictly convex if the equation HX = 0 has the unique solution X = 0), nonnegative, and coercive [2]. Therefore, it also has absolute minimizers on the nonnegative orthant.

The previous examples demonstrate that, in the case of image reconstruction, ML problems are ill-posed or ill-conditioned. This means that one is not interested in computing the minimizers X of the functions corresponding to the different noise models, because they do not provide sensible estimates of the unknown object.

The previous remark is not surprising in the framework of inverse problem theory. Indeed it is generally accepted that, if the formulation of the problem does not use some additional information on the object, then the resulting problem is ill-posed. This is what happens in the maximum likelihood approach because we only use information about the noise with, possibly, the addition of the constraint of non-negativity.

The additional information may consist, for instance, of prescribed bounds on the solution and/or its derivatives up to a certain order (in general not greater than two). These prescribed bounds can be introduced in the problem as additional constraints in the variational formulation provided by ML. However, in a quite natural probabilistic approach, called the Bayesian approach, the additional information is given in the form of statistical properties of the object [25].

In other words, one assumes that the unknown object X is a realization of a vector-valued random variable O and that the probability distribution of O, the so-called prior denoted by p O (X), is also known or can be deduced from known properties of the object. The most frequently used priors are Markov random fields or, equivalently, Gibbs random fields, i.e., they have the following form:

$$\displaystyle{ p_{O}(X) = \frac{1} {Z}{\mathrm{e}}^{-\mu \varOmega (X)}, }$$
(44)

where Z is a normalization constant, μ is a positive parameter (a hyperparameter in statistical language, a regularization parameter in the language of regularization theory), while Ω(X) is a function, possibly convex.

The previous assumptions imply that the joint probability density of the random variables O, I is given by

$$\displaystyle{ p_{OI}(X,Y ) = p_{I}(Y \vert X)p_{O}(X). }$$
(45)

If we introduce the marginal probability density of the image I

$$\displaystyle{ p_{I}(Y ) =\int p_{OI}(X,Y )\ dX, }$$
(46)

from Bayes’ formula we obtain the conditional probability density of O for a given value Y of I:

$$\displaystyle{ p_{O}(X\vert Y ) = \frac{p_{OI}(X,Y )} {p_{I}(Y )} = \frac{p_{I}(Y \vert X)p_{O}(X)} {p_{I}(Y )} . }$$
(47)

If in this equation we insert the detected value Y of the image, we obtain the a posteriori probability density of X:

$$\displaystyle{ P_{Y }(X) = p_{O}(X\vert Y ) = L_{Y }(X)\frac{p_{O}(X)} {p_{I}(Y )} . }$$
(48)

Then, a maximum a posteriori (MAP) estimate of the unknown object is defined as any object X that maximizes the a posteriori probability density:

$$\displaystyle{ X^{{\ast}} = \text{arg}\max _{ X\in \mathbb{R}^{n}}P_{Y }(X). }$$
(49)

As in the case of the likelihood, it is convenient to consider the negative logarithm of P Y (X). If we assume a Gibbs prior as that given in Eq. 44 and we take into account the definition of Eq. 38, we can introduce the following function:

$$\displaystyle{ J(X;Y ) = -A\ {\mathrm{ln}}\ P_{Y }(X) + B - A\ {\mathrm{ln}}\ Z - A\ {\mathrm{ln}}\ p_{I}(Y ) = J_{0}(X;Y ) +\mu J_{R}(X), }$$
(50)

where \(J_{R}(X) = A\Omega (X)\). Therefore, the MAP estimates are also given by

$$\displaystyle{ X^{{\ast}} = \text{arg}\min _{ X\in \mathbb{R}^{n}}\ J(X;Y ) }$$
(51)

and again one must look for the minimizers satisfying the non-negativity constraint.

5.2 The Richardson–Lucy Algorithm

One of the most frequently used methods for image deconvolution in astronomy is an iterative algorithm known as the Richardson–Lucy (RL) algorithm [34, 48]. In emission tomography it is also denoted as expectation maximization (EM) because, as shown in [49], it can be obtained by applying to the ML problem with Poisson noise a general EM method introduced in [19] for obtaining ML estimates.

In [49] it is shown that, if the iteration converges, then the limit is just an ML estimate in the case of Poisson data. Subsequently the convergence of the algorithm was proved by several authors in the case B = 0. An account can be found in [41].

The iteration is as follows: it is initialized with a positive image X (0) (a constant array, in general); then, given X (n), X (n+1) is computed by

$$\displaystyle{ X^{(n+1)} = X^{(n)}H^{T} \frac{Y } {HX^{(n)} + B}. }$$
(52)

This algorithm has some nice features. First, the result of each iteration is automatically a positive array; second, in the case B = 0, the result of each iteration has the same flux as the detected image Y, a property which is interesting from the photometric point of view.
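A minimal implementation of the iteration of Eq. 52 might read as follows (a sketch assuming a normalized, space-invariant PSF; in practice the iteration is stopped before convergence, as discussed below):

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(Y, psf, background=0.0, n_iter=100):
    """Richardson-Lucy iteration of Eq. 52 (products are elementwise)."""
    H = lambda x: fftconvolve(x, psf, mode="same")
    HT = lambda x: fftconvolve(x, psf[::-1, ::-1], mode="same")
    X = np.full(Y.shape, Y.mean(), dtype=float)   # constant positive start
    for _ in range(n_iter):
        ratio = Y / (H(X) + background + 1e-12)   # guard against division by 0
        X = X * HT(ratio)                         # multiplicative: X stays >= 0
    return X
```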

The limit of the RL iteration is, in general, very noisy and sparse in pixel space (see the remark at the end of Example 2 in the previous section); it can provide satisfactory results in the case of star systems (see [4], section 3.1), while in the case of complex systems, a reasonable solution can be obtained by a suitable stopping of the algorithm before convergence. This early stopping can be seen as a kind of regularization, and the underlying property is called semi-convergence [3]: the iterates first approach the correct solution and then move away from it. An example of RL reconstruction is shown in Fig. 14 (lower left panel).

Fig. 14

Simulated Hubble Space Telescope Wide Field Camera image of a distant cluster of galaxies. Upper left: original, unaberrated, and noise-free. Upper right: input, aberrated, noise added. Lower left: restoration, Richardson–Lucy. Lower right: restoration, starlet deconvolution

The main drawback of RL is that, in general, it is very slow and may require hundreds or thousands of iterations. The proposed acceleration approaches are based on the remark that RL is a scaled gradient method since it can be written in the following form:

$$\displaystyle{ X^{(n+1)} = X^{(n)} - X^{(n)}\nabla _{ X}J_{0}(X^{(n)};Y ), }$$
(53)

where ∇ X denotes the gradient with respect to X and J 0(X; Y ) is the data-fidelity function defined in Eq. 43. Therefore, a reduction of the number of iterations can be obtained by means of a suitable line search along the descent direction. This is the approach proposed by several authors. However, this structure of RL has inspired a recently proposed optimization method, known as scaled gradient projection (SGP) [10], which can be viewed as a general method for the constrained minimization of differentiable functions.

If J(X) is the function to be minimized on the nonnegative orthant, then the method is based on the descent direction:

$$\displaystyle{ D^{(n)} = P_{ +}\big(X^{(n)} -\sigma _{ n}S^{(n)}\nabla _{ X}J(X^{(n)})\big) - X^{(n)} }$$
(54)

where P + is the projection on the nonnegative orthant, S (n) is the (diagonal) scaling matrix, and σ n is a suitably chosen step length [10]. The iterate X (n+1) is then obtained by a line search along this descent direction based on the Armijo rule, as in the sketch below. We can add that the method can be easily extended to the case where the convex set of admissible solutions is defined by box and equality constraints.
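A sketch of one SGP iteration follows; the objective J, its gradient, and the scaling array S are assumed to be supplied by the caller, and the Armijo parameters shown are illustrative:

```python
import numpy as np

def sgp_step(X, J, grad_J, S, sigma, beta=0.5, gamma=1e-4):
    """One scaled gradient projection step (Eq. 54 plus Armijo line search)."""
    g = grad_J(X)
    # descent direction: projected scaled gradient step minus current point
    D = np.maximum(X - sigma * S * g, 0.0) - X
    lam, J0 = 1.0, J(X)
    # Armijo backtracking: shrink the step until sufficient decrease holds
    while J(X + lam * D) > J0 + gamma * lam * np.sum(g * D):
        lam *= beta
    return X + lam * D
```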

In the case of the minimization of the KL divergence, the diagonal scaling matrix is that suggested by RL, i.e., S (n) = X (n) (with the addition of suitable upper and lower bounds), and the method shows the same semi-convergence property as RL but requires far fewer iterations to obtain a sensible reconstruction. In an application to the deconvolution of astronomical images [46], it has been shown that, even if the computational cost per iteration is about 30 % greater than that of RL, the reduction of the number of iterations yields a speedup, with respect to RL, ranging from 4 to more than 30, depending on the astronomical source and the noise level. Moreover, implementation on graphics processing units (GPU) makes it possible to deconvolve a 2,048 × 2,048 image in a few seconds.

Several iterative methods, modeled on RL, have been introduced for computing MAP estimates corresponding to different kinds of priors. A recent account can be found in [4]. A nice feature of SGP, whose convergence is proved in [10], is that it can be easily applied to this problem, i.e., to the minimization of the function of Eq. 50 (again, with the addition of box and equality constraints). The scaling is taken from the split-gradient method (SGM), proposed in [31], since this scaling is always nonnegative, while the scaling proposed in [26] may take negative values. In general the choice of the scaling is as follows. If the gradient of J R (X) is split in the following way:

$$\displaystyle{ -\nabla _{X}J_{R}(X) = U_{R}(X) - V _{R}(X), }$$
(55)

where U R , V R are nonnegative arrays, then the scaling is given by the array

$$\displaystyle{ S = \frac{X} {1 +\mu V _{R}(X)}. }$$
(56)

Of course SGP can be applied if J R (X) is differentiable, and therefore it covers both smoothing regularization, as given by Tikhonov, and edge-preserving regularization, as given by smoothed TV (total variation) [12, 65]. Finally, a difficult problem in the case of regularized problems is the choice of the value of the regularization parameter. An attempt to solve this problem is provided by a recently proposed discrepancy principle for Poisson data deconvolution [5].

5.3 Blind Deconvolution

Blind deconvolution is the problem of image deblurring when the blur is unknown. In the case of a space-invariant model, the naive problem formulation is to solve the equation Y = H ∗ X, where only Y is known and ∗ denotes convolution. It is obvious that this problem is extremely underdetermined and that there is an infinite set of pairs solving the equation. Among them is also the trivial solution \(X = Y,\ H =\delta\), where δ denotes the usual delta function. Therefore, the problem must be formulated by introducing as far as possible all available constraints on both the object X and the PSF H.

In the case of Poisson noise, several iterative methods have been proposed, which consist of alternating updates of the object and the PSF by means of RL iterations or accelerated RL iterations. For instance, in [27] one RL iteration is used on both the object and the PSF. This algorithm was investigated, in the context of nonnegative matrix factorization (NMF), by Lee and Seung [32], but their convergence proof is incomplete, since only the monotonic decrease of the objective function is shown, while, for a general descent method to be convergent, stronger Armijo-like decrease conditions have to be verified. In general, the proposed approaches to blind deconvolution with Poisson data can be classified as methods of inexact alternating minimization applied to the KL divergence, viewed as a function of both the object and the PSF.

In a recent paper [9], in the context of NMF, convergence of inexact alternating minimization is proved if the iterative algorithm used for the inner iterations satisfies suitable conditions, which are satisfied by SGP. Therefore, this approach looks very suitable for the problem of blind deconvolution with Poisson data. The approach is applied in [47] using constraints on both the object and the PSF. The method applies to imaging by ground-based telescopes since, as suggested in [20], one of the constraints on the PSF is provided by the Strehl ratio (SR; see Sect. 2), a parameter measuring the quality of the AO correction. Indeed, we recall that the advantages of SGP are not only fast convergence, if a suitable scaling of the gradient is used, but also the possibility of introducing suitable box and equality constraints on the solution. In particular, the SR constraint excludes the trivial solution mentioned above. The method works well in the case of star systems.

5.4 Deconvolution with a Sparsity Prior

Another approach is to use sparsity to model the data. A sparse model can be interpreted from a Bayesian standpoint by assuming that the coefficients α of the solution in the dictionary Φ follow a leptokurtic PDF with heavy tails, such as the generalized Gaussian distribution:

$$\displaystyle{ \qquad {\mathrm{pdf}}_{\alpha }(\alpha _{1},\ldots,\alpha _{K}) \propto \prod _{k=1}^{K}\exp \left (-\lambda \left \|\alpha _{k}\right \|_{p}^{p}\right )\ \ \ 0 \leq p < 2. }$$
(57)

Among all possible solutions, we want the one which has the sparsest representation in the dictionary Φ. Putting together the log-likelihood function in the case of Gaussian noise of standard deviation σ and the prior on α, the MAP estimator leads to the following optimization problem:

$$\displaystyle{ \min _{\alpha _{1},\ldots,\alpha _{K}}\frac{1} {2\sigma ^{2}}\left \|Y - H\varPhi \alpha \right \|^{2} +\lambda \sum _{ k=1}^{K}\left \|\alpha _{ k}\right \|_{p}^{p},\quad 0 \leq p < 2. }$$
(58)

The sparsity can be measured through the \(\|\alpha \|_{0}\) norm (i.e., p = 0), which counts the number of nonzero elements in the sequence. It was also proposed to convexify the constraint by substituting the convex \(\|\alpha \|_{1}\) norm for the \(\|\alpha \|_{0}\) norm [13]; this leads to the soft-thresholding rule sketched below. Depending on the H operator, there are several ways to obtain the solution of this equation.
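The proximity operator of the ℓ1 penalty is the familiar soft-thresholding rule; a one-line sketch:

```python
import numpy as np

def soft_threshold(alpha, lam):
    """Shrink coefficients toward zero: prox of lam * ||.||_1."""
    return np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0.0)
```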

A first iterative thresholding deconvolution method was proposed in [51] which consists of the following iterative scheme:

$$\displaystyle\begin{array}{rcl} X^{(n+1)} = P_{ +}\left (X^{(n)} + H^{T}\left (\mathbf{WDen}_{ M^{(n)}}\left (Y - HX^{(n)}\right )\right )\right )& &{}\end{array}$$
(59)

where P + is the projection on the cone of nonnegative images and WDen is an operator which performs a wavelet thresholding, i.e., computes the wavelet transform of the residual R (n) (i.e., \(R^{(n)} = Y - HX^{(n)}\)), thresholds some wavelet coefficients, and applies the inverse wavelet transform. Only coefficients that belong to the multiresolution support M (n) [51] are kept, while the others are set to zero. At each iteration, the multiresolution support M (n) is updated by selecting new coefficients in the wavelet transform of the residual which have an absolute value larger than a given threshold. The threshold is automatically derived assuming a given noise distribution, such as Gaussian or Poisson noise.

More recently, it was shown [14, 16, 24] that a solution of Eq. 58 for p = 1 can be obtained through a thresholded Landweber iteration:

$$\displaystyle\begin{array}{rcl} X^{(n+1)} = P_{ +}\left (\mathbf{WDen}_{\lambda }\left (X^{(n)} + H^{T}\left (Y - HX^{(n)}\right )\right )\right )\,& &{}\end{array}$$
(60)

with \(\|H\| = 1\). In the framework of monotone operator splitting theory, it was shown that for frame dictionaries, a slight modification of this algorithm converges to the solution [14]. Extension to constrained nonlinear deconvolution is proposed in [22].
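A sketch of the thresholded Landweber iteration of Eq. 60, assuming \(\|H\| = 1\) and user-supplied frame transforms `W` and `WT` (names are ours):

```python
import numpy as np
from scipy.signal import fftconvolve

def thresholded_landweber(Y, psf, W, WT, lam, n_iter=100):
    """Iterate Eq. 60: Landweber step, wavelet soft threshold, positivity."""
    H = lambda x: fftconvolve(x, psf, mode="same")
    HT = lambda x: fftconvolve(x, psf[::-1, ::-1], mode="same")
    X = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        Z = X + HT(Y - H(X))                         # Landweber step
        coeffs = W(Z)                                # analysis
        coeffs = np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)
        X = np.maximum(WT(coeffs), 0.0)              # WDen_lambda, then P_+
    return X
```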

5.4.1 Constraints in the Object or Image Domains

Let us define the object domain \(\mathcal{O}\) as the space to which the solution belongs and the image domain \(\mathcal{I}\) as the space to which the observed data belong (i.e., if \(X \in \mathcal{O}\) then \(HX \in \mathcal{I}\)). The constraint in (59) was applied in the image domain, while in (60) we have considered constraints on the solution. Hence, two different wavelet-based strategies can be chosen in order to regularize the deconvolution problem. The constraint in the image domain through the multiresolution support leads to a very robust way to control the noise. Indeed, whatever the nature of the noise, we can always derive robust detection levels in wavelet space and determine the scales and positions of the important coefficients. A drawback of the image constraints is that there is no guarantee that the solution is free of artifacts such as ringing around point sources. A second drawback is that image constraints can be used only if the point spread function is relatively compact, i.e., does not smear the information over the whole image.

The property of introducing robust noise modeling is lost when applying the constraint in the object domain. For example, in the case of Poisson noise, there is no way (except using time-consuming Monte Carlo techniques) to estimate the level of the noise in the solution and to adjust properly the thresholds. The second problem with this approach is that, in fact, we try to solve two problems simultaneously (noise amplification and artifact control in the solution) with one parameter (i.e., λ). The choice of this parameter is crucial, while such a parameter is implicit when using the multiresolution support.

Ideally, constraints should be added in both the object and image domains in order to better control the noise by using the multiresolution support and to avoid ringing artifacts.

5.4.2 Example

A simulated Hubble Space Telescope Wide Field Camera image of a distant cluster of galaxies is shown in Fig. 14, upper left. The simulated data are shown in Fig. 14, upper right. The Richardson–Lucy and the wavelet solutions are shown, respectively, in Fig. 14, lower left and right. The Richardson–Lucy method amplifies the noise, which implies that the faintest objects disappear in the deconvolved image, while the wavelet starlet solution is stable for any kind of PSF, and any kind of noise modeling can be considered.

5.5 Detection and Deconvolution

The PSF is not needed with MVM. This is an advantage when the PSF is unknown or difficult to estimate, which happens relatively often when it is space variant. However, when the PSF is well determined, this becomes a drawback because known information is not used for the object reconstruction, which can lead to systematic errors in the photometry that depend on the PSF and on the source signal-to-noise ratio. In order to avoid such a bias, a kind of calibration must be performed using simulations [50]. This section shows how the PSF can be used in the MVM, leading to a deconvolution.

5.6 Object Reconstruction Using the PSF

A reconstructed and deconvolved object X i can be obtained by searching for a signal X i such that the wavelet coefficients of HX i are the same as those of the detected structures α i . If W describes the wavelet transform operator and P w the projection operator in the subspace of the detected coefficients, the solution is found by minimization of

$$\displaystyle\begin{array}{rcl} \min _{X_{i}} \parallel \alpha _{i} - P_{w}\left (\mathbf{W}HX_{i}\right ) \parallel ^{2}& &{}\end{array}$$
(61)

where α i represents the detected wavelet coefficients of the object O i and H is the PSF. In this approach, each object is deconvolved separately. The flux related to the extent of the PSF will be taken into account. For point sources, the solution will be close to that obtained by PSF fitting. This problem is also different from global deconvolution in the sense that it is well constrained. Except for the positivity of the solution which is always true and must be used, no other constraint is needed. This is due to the fact that the reconstruction is performed from a small set of wavelet coefficients (those above a detection limit). The number of objects is the same as those obtained by the MVM, but the photometry and the morphology are different. The astrometry may also be affected.

5.7 The Algorithm

Any minimizing method can be used to obtain the solution X i . Since there is no problem of convergence, noise amplification, or ringing effect, the Van Cittert method was proposed on the grounds of its simplicity [57]. It leads to the following iterative scheme:

$$\displaystyle\begin{array}{rcl} X_{i}^{(n+1)} = X_{ i}^{(n)} + \mathbf{R}\left (\alpha _{ i} - P_{w}\left (\mathbf{W}HX_{i}^{(n)}\right )\right )& &{}\end{array}$$
(62)

where R is the inverse wavelet transform, and the algorithm is:

  1. Set n to 0.

  2. Find the initial estimate X i (n) by applying an inverse wavelet transform to the set α i corresponding to the detected wavelet coefficients in the data.

  3. Convolve X i (n) with the PSF H: \(Y _{i}^{(n)} = HX_{i}^{(n)}\).

  4. Determine the wavelet transform \(\alpha ^{(Y _{i}^{(n)}) }\) of Y i (n).

  5. Threshold all wavelet coefficients in \(\alpha ^{(Y _{i}^{(n)}) }\) at positions and scales where nothing has been detected (i.e., apply the P w operator). We get \(\alpha _{t}^{(Y _{i}^{(n)}) }\).

  6. Determine the residual \(\alpha _{r} =\alpha _{i} -\alpha _{t}^{(Y _{i}^{(n)}) }\).

  7. Reconstruct the residual image R (n) by applying an inverse wavelet transform.

  8. Add the residual to the solution: \(X_{i}^{(n+1)} = X_{i}^{(n)} + R^{(n)}\).

  9. Threshold negative values in X i (n+1).

  10. If \(\sigma (R^{(n)})/\sigma (X_{i}^{(0)}) > \epsilon\), then set \(n = n + 1\) and go to step 3.

  11. X i (n+1) contains the deconvolved reconstructed object.

In practice, convergence is very fast (fewer than 20 iterations). The reconstructed image (not deconvolved) can also be obtained simply by reconvolving the solution with the PSF.
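The loop can be sketched as follows; `W` and `WT` stand for the wavelet transform and reconstruction operators, `support` encodes the projection P w , and the PSF is assumed space invariant here (all names are ours):

```python
import numpy as np
from scipy.signal import fftconvolve

def deconvolve_object(alpha_i, support, psf, W, WT, eps=1e-3, max_iter=50):
    """Van Cittert scheme of Eq. 62 for one detected object."""
    H = lambda x: fftconvolve(x, psf, mode="same")
    X = WT(np.where(support, alpha_i, 0.0))        # step 2: initial estimate
    sigma_X0 = X.std()
    for _ in range(max_iter):
        alpha_t = np.where(support, W(H(X)), 0.0)  # steps 3-5: P_w(W H X)
        R = WT(np.where(support, alpha_i, 0.0) - alpha_t)   # steps 6-7
        X = np.maximum(X + R, 0.0)                 # steps 8-9
        if R.std() / sigma_X0 < eps:               # step 10: stop when small
            break
    return X
```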

5.8 Space-Variant PSF

Deconvolution methods generally do not take into account the case of a space-variant PSF. The standard approach when the PSF varies is to decompose the image into blocks and to consider the PSF constant inside a given block. Blocks which are too small lead to a problem of computation time (the FFT cannot be used), while blocks which are too large introduce errors due to the use of an incorrect PSF. Blocking artifacts may also appear. Combining source detection and deconvolution opens up an elegant way for deconvolution with a space-variant PSF. Indeed, a straightforward method is derived by just replacing the constant PSF at step 3 of the algorithm with the PSF at the center of the object. This means that it is not the image which is deconvolved, but its constituent objects.

5.9 Undersampled Point Spread Function

If the PSF is undersampled, it can be used in the same way, but results may not be optimal due to the fact that the sampled PSF varies depending on the position of the source. If an oversampled PSF is available, resulting from theoretical calculation or from a set of observations, it should be used to improve the solution. In this case, each reconstructed object will be oversampled. Equation 61 must be replaced by

$$\displaystyle\begin{array}{rcl} \min _{X_{i}} \parallel \alpha _{i} - P_{w}\left (\mathbf{W}\mathcal{D}_{l}HX_{i}\right ) \parallel ^{2}& &{}\end{array}$$
(63)

where \(\mathcal{D}_{l}\) is the averaging-decimation operator, consisting of averaging the data in the window of size l × l and keeping only one average pixel for each l × l block.
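The operator \(\mathcal{D}_{l}\) itself is plain block averaging; a sketch, assuming the image dimensions are multiples of l:

```python
import numpy as np

def average_decimate(X, l):
    """Averaging-decimation D_l: one mean value per l x l block."""
    n, m = X.shape
    return X.reshape(n // l, l, m // l, l).mean(axis=(1, 3))
```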

5.10 Example: Application to Abell 1689 ISOCAM Data

Figure 15 (left) shows the detections (isophotes) obtained using the MVM method without deconvolution on ISOCAM data. The data were collected using the 6 arcsec lens at \(6.75\,\upmu {\mathrm{m}}\). This was a raster observation with 10 s integration time, 16 raster positions, and 25 frames per raster position. The noise is nonstationary, and the detection of the significant wavelet coefficients was carried out using the root mean square error map \(R_{\sigma }(x,y)\) by the method described in [50]. The isophotes are overplotted on an optical image (NTT, band V) in order to identify the infrared sources. Figure 15 (right) shows the same treatment but using the MVM method with deconvolution. The objects are the same, but the photometry is improved, and it is clearly easier to identify the optical counterparts of the infrared sources.

Fig. 15

Abell 1689: left, ISOCAM source detection (isophotes) overplotted on an optical image (NTT, band V). The ISOCAM image is a raster observation at 7 \(\upmu {\mathrm{m}}\). Right, ISOCAM source detection using the PSF (isophotes) overplotted on the optical image. Compared to the left panel, it is clearly easier to identify the detected infrared sources in the optical image

6 Conclusion

In this chapter we have used the sparsity principle, which now occupies a very central role in signal processing, and we have discussed the vision models within which it is applied. We have reviewed the use of the starlet wavelet transform as a prime technique for applying the sparsity principle in the context of vision models in various application domains, among them object detection coupled with denoising, deconvolution, and filtering generally. Issues of algorithmic optimization and of statistical modeling entered into our discussion on various occasions. Many examples and case studies were used to demonstrate the power of the approaches described for astronomical data analysis and processing.

7 Recommended Readings

  • Bertero, M., Boccacci, P.: Introduction to Inverse Problems in Imaging. Institute of Physics (1998)

  • Starck, J.-L., Murtagh, F.: Astronomical Image and Data Analysis, 2nd edn. Springer (2006)

  • Mallat, S.: A Wavelet Tour of Signal Processing, 3rd edn. Academic (2008)

  • Starck, J.-L., Murtagh, F., Fadili, J.: Sparse Image and Signal Processing. Cambridge University Press (2010)