1 Introduction

Image inpainting is a broad and rapidly progressing research area in the image processing domain. Earlier, inpainting was done only for restoring small areas, text removal, filling in holes, removing red eye, etc. The application has since been extended to modifying large areas while retaining the structure and texture information, video inpainting [1], secret image sharing [2], etc. The aim of image inpainting is to fill the damaged or unknown region of an image in such a way that the inpainted image appears unaltered.

Bertalmio et al. [3] introduced a partial differential equation (PDE) based image inpainting method. The method first inpaints the boundary of the target region by considering the strength of the isophotes and then propagates the information using anisotropic diffusion. Isophotes represent the direction of minimal change and are obtained by computing the perpendicular to the gradient vector at every pixel. Anisotropic diffusion [4] applies diffusion at each pixel to smooth textures in an image; a threshold function encourages diffusion inside the region and prevents diffusion across the boundary. The method is suitable for inpainting small regions but produces blur for large inpainting areas. To enhance the results and reduce the complexity, Oliveira et al. [5] proposed a convolution method in which the target region is repeatedly convolved with a diffusion kernel. The results in [5] were produced after 100 iterations on average, which is time consuming. To speed up the process, the authors of [6] modified the diffusion kernel by shifting the zero value to the bottom right corner of the filter. This modification eliminates the iterative convolution process and thus reduces the operation time; however, the results are better only when the background is symmetrical, otherwise the method produces blur. The authors of [7] proposed an interpolation method. Candidate inpainting results are produced by interpolating the observed data in different neighborhoods of the target region, and a final inpainting decision is taken among the candidates for each pixel in the target region by combining the information of the co-occurrence matrix and the patches found in the image. However, averaging pixel intensities on the basis of surrounding pixel information leads to a blurring effect and does not faithfully recreate the texture and structure. Bertalmio et al. [8] proposed a new approach of dividing the image into two sub-images, a structural image and a textural image, which are processed individually using different approaches: the target region in the structural image is processed using the inpainting approach of [3], and the textural image is processed using texture synthesis [9]. After the target region is inpainted, the two individually inpainted images are combined to obtain the final result.

The above methods inpaint one pixel at a time, which results in high computational complexity and long computation times. They are suitable for inpainting small target regions but produce blur when inpainting larger areas. In 2004, Criminisi et al. [10] proposed patch-wise rather than pixel-wise filling of the target region. The method first extracts the border of the target region and then defines a patch for every pixel on the border. Every patch is assigned a priority by using a data term and a confidence term.
The data term reflects the presence of structure and the confidence term reflects the presence of known information. The border patches are inpainted in decreasing order of priority. After every border patch is filled, the target region is updated for the next iteration, and the process repeats until the target region is completely filled. Many variations of the data term have been proposed in the literature to improve the results of Criminisi [11]. The authors of [12] identified that the priority of a patch drops to a low value due to the rapid decrease of the confidence term as filling proceeds. To reduce this effect, a regularization factor is introduced into the confidence term; it controls how sharply the confidence curve decays. Another modification of the priority term has been proposed in [13]. Some pixels in the target region were observed to have a low data term and a high confidence term, which resulted in a low priority for those pixels. To avoid this situation, the curvature feature of the isophotes was added to the data term: if the surroundings of a pixel change linearly, the absolute value of the curvature is small and the priority of the patch becomes higher. The authors of [14] presented a mechanism for determining inpainting point priority. In the traditional exemplar method, regions with different structures may be assigned a low priority value due to a low confidence term or vice versa. To reduce this, the inpainting points are divided into four levels of queues according to their parameters, and a scheduling mechanism chooses one of these queues and determines the best inpainting point.

In the traditional exemplar method, the best patch is identified by computing the sum of squared differences (SSD) between the target patch and a candidate patch. Here, the SSD is calculated between the known region of the target patch and the corresponding region of the candidate patch, but it does not consider the unknown region of the target patch and the corresponding region of the candidate patch. Hence, the authors of [15] added the variance of the target patch and the candidate patch to the SSD to account for the correlation between the unknown region of the target patch and the corresponding region of the candidate patch. The traditional exemplar method also does not consider the situation where many candidate patches with similar SSD values are found in the image. To avoid this, a two-round search was introduced [16]: a few candidates are first selected by calculating the mean SSD, which reflects the average similarity of the patches, and these candidate patches are then re-evaluated by measuring the normalized cross correlation (NCC). The sum of absolute differences (SAD) ignores the edge sensitivity of the human visual system (HVS); hence, an edge-information-weighted SAD was introduced in [14]. All the above methods compare two patches at the pixel level. To compare the similarity at the patch level, the Structural Similarity Index (SSIM) was proposed [17]; the SSIM algorithm compares the structural information between two patches, as the HVS is highly adaptive to structural changes. The authors of [18] proposed a patch selection method that calculates the perceptual-fidelity aware mean squared error (PAMSE) between two patches. PAMSE is a Gaussian-smoothed mean squared error (MSE) that can extract more geometric structural information than MSE; the candidate patch with the smallest PAMSE value is chosen as the source patch for the target patch. All of the above methods compare two patches as they are and do not consider rotated versions of them.
The authors of [19] proposed patch matching based on symmetrical exemplars. The basic idea is to fill the damaged region in the left part of a patch with its symmetrical region on the right. A similar patch for the target patch is searched in its eight directions and also in arbitrary directions, and the most similar symmetrical patch is then copied to the target patch. The method gives better results when the image contains many similar symmetrical regions. Another modification of the traditional exemplar method, based on patch size selection, can be made to reduce the complexity of the algorithm. Ishi et al. [20] proposed gradient-based adaptation of the patch size. The magnitude of the image gradient represents textural variation: a high gradient magnitude indicates edges and a low gradient magnitude indicates smooth regions. The patch size is increased when the gradient magnitude is high; otherwise the pre-defined patch size is used for inpainting the target region. Another method for adaptive patch size was introduced using image segmentation [21]. The segmentation map divides the image into different regions based on local texture similarity. Initially, the patch size for the target patch and the source patch is fixed. When the target patch belongs to only one segmented region, the patch size is increased, while the patch size remains unchanged when the target patch belongs to more than one segmented region.

These methods perform inpainting at a single resolution and can be extended to a multi-resolution setting. Inpainting in multi-resolution reduces the search range for the best patch and consumes less inpainting time. Hierarchical methods working on image pyramids were proposed in [22, 23, 24]. Kim et al. proposed an image pyramid method in which the original image is decomposed into a set of images of different resolutions: the original image is at the finest level and its lower resolutions are at the coarser levels. The best match, as defined in [10], is iteratively searched from the coarse to the fine level of the pyramid. This method does not analyze frequency components individually. Therefore, the authors of [23] combined texture synthesis and image inpainting in a multi-resolution framework to treat the frequency components individually: the original image containing the damaged region is divided into two sub-images, a high-frequency image and a low-frequency image, using the discrete cosine transform (DCT). Another multi-resolution modification, based on the wavelet transform, was presented in [24]. The wavelet transform is applied to the damaged image to form L coarse levels, and for every level the scaling and wavelet coefficient subbands are estimated. The coarsest scaling subband is filled with the PDE method [3], as it is a smooth low-frequency image, and the wavelet subbands are filled using exemplar-based inpainting [10]. After the coarsest scale is filled, the inverse wavelet transform is applied to approximate the next finer scale. In this way, filling proceeds from the coarsest level to the finest level.

2 Review of exemplar based image inpainting method

Criminisi et al. [10] proposed a patch-wise filling algorithm which is known as the exemplar-based image inpainting algorithm. The basic notations used here are: Ω for the target region, δΩ for its boundary pixels, and Φ for the source region. The priority P(p) given in (1) is assigned on the basis of two criteria: (i) contours striking the patch and (ii) maximum known information in the patch. The data term reflects the presence of contours and the confidence term reflects the presence of known information. C(p) as given in (2) and D(p) as given in (3) are the confidence term and data term respectively for the target patch ψp with center pixel p. C(q) is the confidence value of a pixel q belonging to the known part of the target patch ψp, |ψp| is the area of the patch, I − Ω is the source region, α is the normalization factor (α = 255), \( \nabla I_{p}^{ \bot } \) gives the direction and intensity of the isophote at pixel p, and np is the unit vector orthogonal to δΩ at the pixel p.

$$ P\left( p \right) = C\left( p \right)* D(p) $$
(1)
$$ C\left( p \right) = \frac{{\sum\nolimits_{{q \in \psi_{p} \cap \left( {I - \varOmega } \right)}} {C\left( q \right)} }}{{\left| {\psi_{p} } \right|}} $$
(2)
$$ D\left( p \right) = \frac{{\left| {\nabla I_{p}^{ \bot } \cdot n_{p} } \right|}}{\alpha } $$
(3)
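To make (1)–(3) concrete, the following minimal sketch (an illustration, not the original authors' code) evaluates the priority of a single border pixel. It assumes a grayscale image, a binary mask that is 1 inside the target region Ω, a patch half-width of 4, a running confidence map initialized to 1 − mask, and α = 255; the gradients are estimated with finite differences.

```python
import numpy as np

def priority(img, mask, p, w=4, alpha=255.0, conf=None):
    """Priority P(p) = C(p) * D(p) of a border pixel p, following (1)-(3)."""
    if conf is None:
        conf = 1.0 - mask.astype(float)          # known pixels start with confidence 1
    y, x = p
    win = (slice(y - w, y + w + 1), slice(x - w, x + w + 1))

    # Confidence term (2): summed confidence of known pixels, divided by the patch area.
    known = mask[win] == 0
    C = conf[win][known].sum() / float((2 * w + 1) ** 2)

    # Data term (3): |isophote . boundary normal| / alpha.
    gy, gx = np.gradient(img.astype(float))
    isophote = np.array([-gx[y, x], gy[y, x]])   # image gradient rotated by 90 degrees
    my, mx = np.gradient(mask.astype(float))     # mask gradient approximates the boundary normal
    n = np.array([my[y, x], mx[y, x]])
    n /= (np.linalg.norm(n) + 1e-8)
    D = abs(isophote @ n) / alpha

    return C * D

# Toy usage: a 64x64 ramp image with a square hole.
img = np.tile(np.linspace(0, 255, 64), (64, 1))
mask = np.zeros_like(img)
mask[24:40, 24:40] = 1
print(priority(img, mask, p=(24, 30)))
```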

The best match is identified by computing, for every candidate patch in the source region, the distance to the target patch over its known pixels using pixel intensity information. The distance between the two patches is computed by the sum of squared differences (SSD), denoted \( d(\psi_{{\hat{p}}} ,\psi_{q} ) \). The most similar patch \( \psi_{{\hat{q}}} \) is defined as,

$$ \psi_{{\hat{q}}} = \arg \mathop {\hbox{min} }\limits_{{\psi_{q} \in \varPhi }} d\left( {\psi_{{\hat{p}}} ,\psi_{q} } \right) $$
(4)
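A brute-force sketch of the search in (4), under the same image/mask convention as the previous sketch: the SSD is accumulated only over the pixels that are known in the target patch, and candidate patches overlapping the target region are skipped. This is an illustration; practical implementations restrict or accelerate the search.

```python
import numpy as np

def best_match(img, mask, p, w=4):
    """Return the centre of the source patch psi_q_hat minimising the SSD in (4)."""
    y, x = p
    tgt = img[y - w:y + w + 1, x - w:x + w + 1]
    known = mask[y - w:y + w + 1, x - w:x + w + 1] == 0   # known pixels of the target patch
    H, W = img.shape
    best, best_d = None, np.inf
    for i in range(w, H - w):
        for j in range(w, W - w):
            if mask[i - w:i + w + 1, j - w:j + w + 1].any():
                continue                                   # candidate must lie entirely in the source region
            cand = img[i - w:i + w + 1, j - w:j + w + 1]
            d = np.sum((tgt[known] - cand[known]) ** 2)    # SSD over known pixels only
            if d < best_d:
                best, best_d = (i, j), d
    return best, best_d

# Toy usage with the ramp image and square hole from the previous sketch.
img = np.tile(np.linspace(0, 255, 64), (64, 1))
mask = np.zeros_like(img)
mask[24:40, 24:40] = 1
print(best_match(img, mask, p=(24, 30)))
```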

The confidence term C(p) is updated over the area of the target patch \( \psi_{{\hat{p}}} \) every time pixels are copied from \( \psi_{{\hat{q}}} \) to \( \psi_{{\hat{p}}} \).

$$ C\left( q \right) = C\left( {\hat{p}} \right)\quad \forall q \in \psi_{{\hat{p}}} \cap \varOmega $$
(5)

This method can inpaint large target areas while restoring both textural and structural information. However, it does not perform well for curved structures and does not generate good results when no similar patch is found in the image.

3 Variations in traditional exemplar method

3.1 Modifications in priority term

The priority calculation is important for the filling order. Here, the traditional priority calculation is modified by adding a regularization factor [12], by adding the curvature feature of isophotes [13], and by scheduling the priority points [14]. In [12], the priority calculation is modified as in (6) and the modified confidence term Gc(p) is defined in (7). Here, a and b are the coefficients related to the confidence term and the data term respectively, where a ≥ 0, b ≤ 1 and a + b = 1. The regularization factor μ is set empirically to a value between 0.1 and 0.7.

$$ P\left( p \right) = a \, G_{c} \left( p \right) \, + \, b \, D\left( p \right) $$
(6)
$$ G_{c} \left( p \right) = \left( {1 - \, \mu } \right) \, C\left( p \right) + \mu $$
(7)
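As a numerical illustration (the values are chosen only for this example), suppose the confidence has decayed to C(p) = 0.02 while D(p) = 0.4, and take μ = 0.4 with a = b = 0.5. Then

$$ G_{c} \left( p \right) = \left( {1 - 0.4} \right) \times 0.02 + 0.4 = 0.412,\quad P\left( p \right) = 0.5 \times 0.412 + 0.5 \times 0.4 = 0.406, $$

whereas the multiplicative priority (1) would give P(p) = 0.02 × 0.4 = 0.008. The regularized, additive form therefore keeps the priority from collapsing when the confidence term becomes very small.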

The consideration of the curvature feature of isophotes avoids the condition where the priority value is close to zero because the data term is zero [13]. The priority equation is given in (8), where the weights α and β are introduced such that α + β = 1, and K(p) is the curvature of the isophote through the center pixel p, defined as (9).

$$ P\left( p \right) = \alpha \cdot C\left( p \right) + \, \beta \cdot (D\left( p \right) + 1/K(p)) $$
(8)
$$ K\left( p \right) = \nabla \cdot \left[ {\frac{{\nabla I_{p} }}{{\left| {\nabla I_{p} } \right|}}} \right] $$
(9)
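The curvature in (9) is the divergence of the normalized gradient field and can be estimated with finite differences. The sketch below is a minimal illustration (not the implementation of [13]); a small ε avoids division by zero in flat regions.

```python
import numpy as np

def isophote_curvature(img, eps=1e-8):
    """K(p) = div( grad(I) / |grad(I)| ) at every pixel, as in (9)."""
    gy, gx = np.gradient(img.astype(float))
    norm = np.sqrt(gx ** 2 + gy ** 2) + eps   # avoid division by zero
    nx, ny = gx / norm, gy / norm             # normalised gradient field
    dny_dy, _ = np.gradient(ny)               # d(ny)/dy
    _, dnx_dx = np.gradient(nx)               # d(nx)/dx
    return dnx_dx + dny_dy                    # divergence

# Circular isophotes: |K| ~ 1/radius, so far from the centre 1/K(p) in (8) is large.
img = np.fromfunction(lambda y, x: (x - 32.0) ** 2 + (y - 32.0) ** 2, (64, 64))
print(isophote_curvature(img)[10, 32])
```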

The method does not perform well for images containing different colors and textures. Another approach [14] modified the mechanism of determining inpainting point priorities. The method first divides the target region into structure-type and smooth-type regions: if the value of D(p) as given in (3) for a pixel p on the border of the target region exceeds the pre-defined threshold THT, the pixel belongs to the structure region, otherwise it belongs to the smooth region. The inpainting priority points are then classified into four levels of queues. For structure regions, the inpainting priority is classified into levels by considering the D(p) values, and for smooth regions it is classified based on the C(p) values as defined in (2). The D(p) and C(p) values of the inpainting points are sorted in descending order; the points whose D(p) values (for structure regions) or C(p) values (for smooth regions) fall in the first 25% belong to the 1st queue, the second 25% to the 2nd queue, the next 25% to the 3rd queue, and the last 25% to the 4th queue. The priority queue is selected in a Weighted Round Robin (WRR) fashion. This method generates better results than the traditional exemplar method, but the computational complexity of the scheduling mechanism is high.
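A sketch of the queue construction and scheduling described above (an illustration under stated assumptions: the threshold THT, the weighted-round-robin weights, and the data layout are placeholders chosen for this example, not values from [14]).

```python
import random

def quarters(points, score):
    """Sort points by score (descending) and cut the ordering into four equal parts."""
    ordered = sorted(points, key=score, reverse=True)
    n = len(ordered)
    return [ordered[k * n // 4:(k + 1) * n // 4] for k in range(4)]

def build_queues(points, D, C, tht=0.1):
    """Structure points (D(p) > THT) are ranked by D(p), smooth points by C(p);
    the k-th queue collects the k-th quarter of each ranking."""
    structure = [p for p in points if D[p] > tht]
    smooth = [p for p in points if D[p] <= tht]
    sq = quarters(structure, lambda p: D[p])
    mq = quarters(smooth, lambda p: C[p])
    return [sq[k] + mq[k] for k in range(4)]

def wrr_schedule(queues, weights=(4, 3, 2, 1)):
    """Serve the queues in weighted round-robin order (weights are illustrative only)."""
    queues = [list(q) for q in queues]
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(min(w, len(q))):
                yield q.pop(0)

# Toy usage: random D(p) and C(p) values for a handful of border points.
pts = [(0, i) for i in range(12)]
D = {p: random.random() for p in pts}
C = {p: random.random() for p in pts}
print(list(wrr_schedule(build_queues(pts, D, C))))
```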

3.2 Variation in patch selection

The choice of the best matching patch can be made by measuring the sum of squared differences (SSD), sum of absolute differences (SAD), normalized cross correlation (NCC), mean squared error (MSE), perceptual-fidelity aware MSE (PAMSE), Structural Similarity Index (SSIM), or other objective metrics. In [15], the authors extended the SSD criterion to consider the correlation between the unknown region of the target patch and the corresponding region of the candidate patch: the difference of the variances of the target patch and the candidate patch is added to the SSD criterion as given in (10). Here, v(ψp) and v(ψq) are the variances of the target patch and the candidate patch respectively, and δ is a regularization factor such that 0 < δ < 1.

$$ \psi_{{\hat{q}}} = \arg \hbox{min} \left( {d\left( {\psi_{p} ,\psi_{q} } \right) + \delta \left( {v\left( {\psi_{p} } \right) - v\left( {\psi_{q} } \right)} \right)} \right) $$
(10)
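A small sketch of the criterion (10); the variance difference is used exactly as written (not in absolute value), since the text does not specify otherwise, and computing the variance of the target patch over its known pixels only is an assumption of this illustration.

```python
import numpy as np

def variance_ssd(target, candidate, known, delta=0.5):
    """Distance of (10): SSD over known pixels plus delta * (v(psi_p) - v(psi_q))."""
    ssd = np.sum((target[known] - candidate[known]) ** 2)
    return ssd + delta * (np.var(target[known]) - np.var(candidate))
```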

The authors of [16] proposed a two-round search method to avoid the situation where many patches have similar SSD values. In this method, a few candidates are first selected by calculating the mean SSD as given in (11), where \( \bar{R} \), \( \bar{G} \) and \( \bar{B} \) are the mean intensity values of each channel, and these candidates are then re-evaluated by measuring the Normalized Cross Correlation (NCC) as given in (12), where G denotes grayscale values.

$$ \bar{d}_{SSD} \left( {\psi_{p} ,\psi_{q} } \right) = \sum \left[ {\left( {\bar{R}_{{\psi_{p} }} - \bar{R}_{{\psi_{q} }} } \right)^{2} + \left( {\bar{G}_{{\psi_{p} }} - \bar{G}_{{\psi_{q} }} } \right)^{2} + \left( {\bar{B}_{{\psi_{p} }} - \bar{B}_{{\psi_{q} }} } \right)^{2} } \right] $$
(11)
$$ d_{NCC} \left( {\psi_{p} ,\psi_{q} } \right) = \frac{{\left[ {\sum G_{{\psi_{p } }} \cdot G_{{\psi_{q} }} } \right]^{2} }}{{\sum \left[ {G_{{\psi_{p } }} } \right]^{2} \sum \left[ {G_{{\psi_{q } }} } \right]^{2} }} $$
(12)
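A sketch of the two-round search: round one ranks candidates by the mean SSD of (11) and keeps a shortlist, round two picks the shortlisted candidate with the highest NCC of (12). The grayscale conversion weights and the shortlist size are assumptions of this illustration.

```python
import numpy as np

def mean_ssd(p_rgb, q_rgb):
    """Eq. (11): squared differences of the per-channel mean intensities."""
    dm = p_rgb.reshape(-1, 3).mean(axis=0) - q_rgb.reshape(-1, 3).mean(axis=0)
    return np.sum(dm ** 2)

def ncc(p_gray, q_gray):
    """Eq. (12): (sum p*q)^2 / (sum p^2 * sum q^2) on grayscale patches."""
    return np.sum(p_gray * q_gray) ** 2 / (np.sum(p_gray ** 2) * np.sum(q_gray ** 2) + 1e-12)

def two_round_search(target_rgb, candidates_rgb, keep=10):
    """Round 1: shortlist by mean SSD. Round 2: choose the shortlisted patch with the highest NCC."""
    gray = lambda a: a @ np.array([0.299, 0.587, 0.114])
    scores = [mean_ssd(target_rgb, c) for c in candidates_rgb]
    shortlist = np.argsort(scores)[:keep]
    return max(shortlist, key=lambda i: ncc(gray(target_rgb), gray(candidates_rgb[i])))
```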

The best matching patch in [12] was selected by measuring SSE and SAD independently; the distance between the target patch and the candidate patch is calculated separately with each measure. These measures ignore the edge sensitivity of the human visual system (HVS), and comparing two patches while considering edge information leads to a better choice of the best patch. Hence, the authors of [14] added weighted edge information to the traditional SAD calculation. All the above methods compare two patches at the pixel level. To compare the similarity at the patch level, the Structural Similarity Index (SSIM) was proposed [17]; SSIM extracts more structural information but has a higher computational complexity. MSE is simpler to compute, but it is not effective at capturing structural information. Hence, a Gaussian-smoothed MSE known as the Perceptual-fidelity Aware Mean Squared Error (PAMSE) was proposed in [18]. PAMSE is defined as (13), where \( \sigma \in \left[ {0.3, \, 0.5} \right] \) is the standard deviation of a Gaussian filter Gσ, \( \otimes \) is the convolution operator, Ωc is the source region and \( |\psi_{{\hat{p}}} \cap \, \varOmega c| \) is the number of known pixels.

$$ PAMSE\left( {\psi_{{\hat{p}}} ,\psi_{q} } \right) = \frac{1}{{\left| {\psi_{{\hat{p}}} \cap \varOmega c} \right|}}\left\| {G_{\sigma } \otimes \left( {\psi_{{\hat{p}}} - \psi_{q} } \right)} \right\|_{2}^{2} $$
(13)
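A minimal sketch of PAMSE as defined in (13), using a Gaussian filter from SciPy. Setting the difference to zero on the unknown pixels before smoothing is a simplification made for this illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pamse(target, candidate, known, sigma=0.4):
    """Gaussian-smoothed squared error over the known part of the target patch, as in (13)."""
    diff = np.where(known, target - candidate, 0.0)   # compare only the known pixels
    return np.sum(gaussian_filter(diff, sigma=sigma) ** 2) / known.sum()
```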

The above methods compare two patches only as they are and do not consider rotated versions of them. Therefore, the authors of [17] proposed a patch matching algorithm based on rotation invariance. The patch is transformed into a matrix of vectors: from the center pixel to the edge pixels, every vector li represents one ring, i.e. one scale of rotation. The vectors do not contain the same number of pixels, so zeros are padded to form a matrix. Rotation of the patch is achieved by cyclically shifting every layer vector li. For one vector, \( l_{i} = \, \left( {p_{1} , \, p_{2} , \ldots , \, p_{(8*i - 1)} , \, p_{(8*i)} } \right) \) are the pixels in the vector. The similarity is measured with the function given in (14), where \( ROR(\psi_{q} ,s) \) is defined as (15) and numi is defined as (16).

$$ \psi_{{\hat{q}}} = \arg \mathop {\hbox{min} }\limits_{{\psi_{q} \in \varPhi , s = 1, \ldots ,S}} ssd\left( {\psi_{{\hat{p}}} ,ROR\left( {\psi_{q} ,s} \right)} \right) $$
(14)
$$ ROR\left( {\psi_{q} ,s} \right) = ROR_{i = 1}^{n} \left( {l_{i} ,num_{i} } \right) $$
(15)
$$ num_{i} = floor\left( {\frac{8*i*s}{S}} \right) $$
(16)

The rotation scale S is set to 18 and s = 1, 2, …, S. For numi = 1, the function \( ROR\left( {l_{i} ,1} \right) \, = \left( {p_{{\left( {8*i} \right),}} p_{1} , \, p_{2} , \ldots , \, p_{(8*i - 1)} } \right) \) is one right cyclic shift of the vector. The ROR function is calculated for all values of s = 1, 2, …, S, and the candidate patch rotated by the value of s that yields the smallest SSD is taken as the most similar patch.
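The ring rotation of (14)–(16) can be sketched as below. Each ring li is read off in clockwise order and cyclically shifted by numi; the square-ring extraction and the use of SSD over known pixels are illustrative choices, not the exact implementation of [17].

```python
import numpy as np

def ring_coords(w, i):
    """The 8*i perimeter pixels of the i-th square ring around the centre (w, w), in clockwise order."""
    top = [(w - i, x) for x in range(w - i, w + i + 1)]
    right = [(y, w + i) for y in range(w - i + 1, w + i + 1)]
    bottom = [(w + i, x) for x in range(w + i - 1, w - i - 1, -1)]
    left = [(y, w - i) for y in range(w + i - 1, w - i, -1)]
    return top + right + bottom + left          # the vector l_i

def ror(patch, s, S=18):
    """Rotate a (2w+1)x(2w+1) patch by step s: shift ring i by num_i = floor(8*i*s/S), as in (15)-(16)."""
    w = patch.shape[0] // 2
    rotated = patch.copy()
    for i in range(1, w + 1):
        coords = ring_coords(w, i)
        ring = np.array([patch[c] for c in coords])
        shifted = np.roll(ring, int(np.floor(8 * i * s / S)))
        for c, v in zip(coords, shifted):
            rotated[c] = v
    return rotated

def rotation_aware_distance(target, candidate, known, S=18):
    """Minimum SSD over the S rotated versions of the candidate patch, following (14)."""
    return min(np.sum((target[known] - ror(candidate, s, S)[known]) ** 2)
               for s in range(1, S + 1))
```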

3.3 Adaptive patch size

To enhance the results of the traditional exemplar method, modifications of the patch size have been proposed. The authors of [20] proposed gradient-based adaptation of the patch size. Textural information is indicated by the magnitude of the image gradient: a high gradient magnitude indicates edges and a low magnitude indicates smooth areas. The patch size depends on the gradient magnitude; if the magnitude is high a larger patch size is used, and if it is low a smaller patch size is used. This method generates satisfactory results while consuming less time. Another modification achieves an adaptive patch size using a graph-based region segmentation algorithm [21]. The segmentation map M divides an image I into different regions based on texture similarity and represents structure by the region boundaries, as expressed in (17).

$$ I = \mathop {\bigcup }\limits_{i = 1}^{N} R_{i} $$
(17)

Ri denotes the ith segment of the image I and N is the number of segments. The minimum and maximum patch sizes are defined by the user. If the target patch belongs to more than one segmented region, the default patch size is used, while if it belongs to only one segmented region, the patch size is increased. An adaptive patch size for the inpainting and matching processes based on patch sparsity was proposed in [17]. The method first initializes the patch size w0, which can be taken as one fortieth of the target region. The patch size selection is split between the two processes: the matching process uses patch size wm and the inpainting process uses patch size wp. The patch size is decided by calculating the patch sparsity S(p). For a target patch ψp and source patches ψq, the patch sparsity S(p) is calculated as given in (18), where ωp,q is defined as given in (19).

$$ S\left( p \right) = \sqrt {\left[ {\mathop \sum \limits_{{q \in N_{s\left( p \right)} }} \omega_{p,q}^{2} } \right] \cdot C\left( p \right)} $$
(18)
$$ \omega_{p,q} = \frac{1}{Z\left( p \right)}\exp \left( { - \frac{{d\left( {\psi_{p} ,\psi_{q} } \right)}}{25}} \right) $$
(19)

Here, \( d(\psi_{p} ,\psi_{q} ) \) is the sum of squared differences (SSD) between the patches ψp and ψq, and Z(p) is a normalization constant. If the patch sparsity is small, the patch is located in an edge region, so a large patch size is used for the matching process and a small patch size for the inpainting process. A large patch sparsity indicates that the patch is located in a stable region, so a large patch size is used to speed up the process.
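A sketch of the sparsity-driven size decision in (18)–(19); the candidate set standing in for the neighborhood Ns(p), the sparsity threshold, and the concrete patch sizes returned are assumptions made only for this illustration.

```python
import numpy as np

def patch_sparsity(target, candidates, known, confidence):
    """S(p) per (18)-(19): normalised exp(-SSD/25) weights combined with the confidence C(p)."""
    ssds = np.array([np.sum((target[known] - c[known]) ** 2) for c in candidates])
    w = np.exp(-ssds / 25.0)
    w /= w.sum()                              # the 1/Z(p) normalisation of (19)
    return np.sqrt(np.sum(w ** 2) * confidence)

def choose_patch_sizes(sparsity, w0=9, threshold=0.2):
    """Small sparsity -> edge region: large matching patch, small inpainting patch.
    Large sparsity -> stable region: large patch for both (sizes and threshold are illustrative)."""
    if sparsity < threshold:
        return {"matching": w0 + 4, "inpainting": max(3, w0 - 4)}
    return {"matching": w0 + 4, "inpainting": w0 + 4}
```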

3.4 Multi-resolution exemplar method

The traditional exemplar method works on a single resolution; it has been extended to multi-resolution for better results and lower computation time. In [22], the authors presented a recursive method based on an image pyramid: Laplacian pyramids are generated for a given input image and the best match is chosen by searching this multi-scale space. To generate a pyramid, the original image g0 is downsampled to form the image g1 containing the low-frequency information. The downsampled image g1 is then upsampled to recover the size of the original image, and the Laplacian image l0 is obtained by subtracting the upsampled image from the original image g0. If this process is repeated N times starting from g1, an N-layer pyramid is obtained; the last layer of the pyramid is not a Laplacian image. The inpainting process starts with the last layer.

Another multi-resolution approach for image restoration was proposed in [23]. The damaged image is divided into a low-frequency image and a high-frequency image using the DCT: the original image I is decomposed into spectral subbands, and the Inverse Discrete Cosine Transform (IDCT) is applied to the first k subbands to obtain the low-frequency image Lk; the high-frequency image Hk is obtained by subtracting Lk from the original image I. The low-frequency image L is inpainted by the fast inpainting approach proposed in [5] to obtain L*. The high-frequency image H is decomposed to form an n-level Gaussian pyramid and texture synthesis [9] is applied to the high-frequency images Hi. The synthesized high-frequency image H0* and the inpainted low-frequency image L* are combined to form the completely restored image I*. This method simultaneously repairs texture and intensity/color information.

Another inpainting approach using wavelet-based inter- and intra-scale dependency was presented in [24]. The input image is decomposed into scaling and wavelet coefficients using the discrete wavelet decomposition. For each scale, the scaling and wavelet coefficients are given as in (20) and (21).

$$ s_{j + 1} = h_{j} *h_{j} *s_{j} ,\quad w_{j + 1}^{LH} = g_{j} *h_{j} *s_{j} $$
(20)
$$ w_{j + 1}^{HL} = h_{j} *g_{j} *s_{j} , \quad w_{j + 1}^{HH} = g_{j} *g_{j} *s_{j} $$
(21)

Here, hj represents the low-pass filter, gj the high-pass filter, and sj, wj are the scaling and wavelet coefficients respectively at scale j. The scaling and wavelet coefficients are predicted from the coarse to the fine level to inpaint the image in the wavelet domain. Every coefficient depends on the neighboring coefficients of the same subband and the corresponding coefficients of the other subbands. The coarsest scaling band, which is a smooth image, is inpainted by total variation based inpainting. To fill the three wavelet subbands, the exemplar-based method [10] is used. The difference between the target patch and the source patch should be minimal, and it is calculated as given in (22), where \( {\text{w}}_{\text{LH}}^{\text{d}} \;{\text{and}}\;{\text{w}}_{\text{LH}}^{\text{e}} \) are the target patch and source patch in the LH subband, sd and se are the target patch and source patch of the scaling subband, and α, β, γ are parameters to be determined, set to 1 by default.

$$ d = \alpha \left\| {w_{LH}^{d} - w_{LH}^{e} } \right\| + \beta \left\| {w_{HL}^{d} - w_{HL}^{e} } \right\| + \gamma \left\| {w_{HH}^{d} - w_{HH}^{e} } \right\| + \left\| {s^{d} - s^{e} } \right\| $$
(22)

The inverse transform for the level is then applied to approximate the next finer level, and the procedure is repeated until the target region at the finest level is completely filled. The results of this method were compared to the exemplar method and show that it maintains textural and structural information better.
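To illustrate (20)–(22), the sketch below computes one level of separable low/high-pass filtering and the weighted subband distance. The Haar analysis filters and the absence of downsampling are simplifications for this illustration, and for brevity the subbands are computed directly from the compared patches rather than taken from a full multi-level decomposition.

```python
import numpy as np
from scipy.ndimage import convolve1d

h = np.array([1.0, 1.0]) / np.sqrt(2)    # low-pass filter h_j (Haar, for illustration)
g = np.array([1.0, -1.0]) / np.sqrt(2)   # high-pass filter g_j

def subbands(s):
    """One decomposition level over rows then columns, following (20)-(21)."""
    rows = lambda f, a: convolve1d(a, f, axis=1)
    cols = lambda f, a: convolve1d(a, f, axis=0)
    return {"s":  cols(h, rows(h, s)),    # s_{j+1}
            "LH": cols(g, rows(h, s)),    # w^{LH}_{j+1}
            "HL": cols(h, rows(g, s)),    # w^{HL}_{j+1}
            "HH": cols(g, rows(g, s))}    # w^{HH}_{j+1}

def subband_distance(d_patch, e_patch, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (22): weighted norms of the wavelet-subband differences plus the scaling-band difference."""
    D, E = subbands(np.asarray(d_patch, float)), subbands(np.asarray(e_patch, float))
    return (alpha * np.linalg.norm(D["LH"] - E["LH"]) +
            beta * np.linalg.norm(D["HL"] - E["HL"]) +
            gamma * np.linalg.norm(D["HH"] - E["HH"]) +
            np.linalg.norm(D["s"] - E["s"]))
```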

4 Conclusion

This paper covers variations in patch selection, the priority term, the patch size, and multi-resolution inpainting. A review of some of the variations of the exemplar-based image inpainting method is presented in Table 1. For the two patch matching measures SSE and SAD, the results are similar when repairing small inpainting regions; for restoring large inpainting regions, combining a high regularization factor in the priority term with SSE and a low regularization factor with SAD gives better results, otherwise the methods may generate low-quality results. Adding the variance to the SSD criterion for patch matching reduces errors in patch selection and gives better results than the traditional exemplar method. Using the priority scheduling technique for choosing the inpainting point enhances the restoration of structures in an image, but since the algorithm has a higher computational complexity, it is preferable only for inpainting large areas. Varying the patch size during the inpainting process restores minute details in highly structured and textured images, while for images with little texture and structure the method becomes cumbersome. The rotation-invariant inpainting method gives better results when refilling large areas in images containing symmetrical objects and when repairing small areas of asymmetrical objects. Multi-resolution approaches based on the DCT and DWT may generate results that show sudden changes around the target region in highly textured areas. Multi-resolution inpainting based on energy optimization preserves texture and structure well, but greater hardware capacity is needed to reduce its time cost.

Table 1 Summary table for variations in exemplar method