1 Introduction

With the rapid progress of digital image editing software, digital images can be easily manipulated without leaving any perceptible trace. The field of digital forensics has emerged to restore public trust in digital images. Image forensic techniques are generally categorized into two groups: active and passive [16, 32]. Active forensic techniques, such as digital watermarking, insert a watermark or signature into the image at the time of recording and detect tampered regions using the embedded information [38, 44]. In practice, the applicability of active approaches is limited because the majority of consumer images are created without any embedded watermark or signature. In contrast, passive image forensic techniques authenticate images without requiring any prior knowledge [7]. These techniques rely on the assumption that, although digital forgeries may leave no visual clue, they alter the underlying statistics of an image [10]. In this work, we focus on passive image forensics.

Digital images can be tampered with or manipulated in many different ways. Copy-move forgery (CMF), which copies a part of an image and pastes it into another region of the same image, is one of the most common methods of digital image tampering [26, 33]. In the CMF scenario, a tampered region might not be exactly identical to its source region, since it usually undergoes a sequence of post-processing operations such as rotation, scaling, blurring, and noising for a better visual appearance. Therefore, it becomes increasingly difficult to identify tampered regions manually, even for experienced users. Accordingly, CMF detection has become one of the most actively researched topics in passive image forensics.

Exhaustive search, in which an image is repeatedly compared with all of its cyclically shifted versions to find duplicated regions, is a straightforward way to detect CMF [8]. However, it has extremely high computational complexity and cannot detect CMF if the copied region has been scaled. Therefore, many CMF detection (CMFD) algorithms with reduced computational complexity have been introduced to find duplicated regions efficiently.

In this paper, we propose a new feature descriptor, upsampled log-polar Fourier (ULPF) features, which is robust to several geometric transformations including rotation, scaling, shearing, and reflection. We first describe the theoretical background of the ULPF representation. We then present how to extract rotation- and scale-invariant features from the ULPF representation by exploiting properties of the Fourier transform. In addition, we analyze the common CMFD processing pipeline and, based on this analysis, improve part of it to handle various types of tampering attacks efficiently. In our simulations, we compare the proposed feature descriptor with state-of-the-art descriptors of established performance.

The rest of this paper is organized as follows. In Section 2, detailed overviews of conventional CMFD algorithms and the common CMFD processing pipeline are given. In Section 3, we introduce the ULPF representation and present how to extract features from the ULPF representation. The proposed CMFD processing pipeline is illustrated in Section 4. Comparative experimental results of the proposed and conventional algorithms are presented in Section 5. Finally, our conclusions are drawn in Section 6.

2 Related works

2.1 Conventional CMFD algorithms

To date, researchers have developed various techniques to find tampered regions efficiently. The first CMFD method was proposed by Fridrich et al. in 2003 [14]. This method divides an image into 8×8 overlapping blocks and extracts discrete cosine transform (DCT) features from each block. The feature vectors are lexicographically sorted, and similar feature vectors are then identified to detect forgery. More efficient algorithms have since been introduced, including blur-invariant moments [31], principal component analysis (PCA) [36], Hu moments [41], discrete wavelet transform (DWT) features [17, 32], improved DCT features [15], and segmentation-based detection [23, 39].

Recently, there have been attempts to apply the Fourier-Mellin transform (FMT) to CMFD. The authors in [5] proposed extracting features from image blocks using the FMT, which takes the Fourier transform of each block and maps the resulting magnitudes into log-polar coordinates. They claimed that the FMT-based features are robust to geometric transformations including scaling and translation. In [22], Li and Yu proposed an improved algorithm that modifies the projection operation of the algorithm in [5] to achieve better rotation invariance. A log-polar Fourier (LPF) transform-based algorithm was proposed in [42]. Unlike the FMT algorithms in [5] and [22], the LPF algorithm first performs a log-polar transform and then takes the Fourier transform of the result. Similar regions are identified by computing the cross power spectrum of the LPF results. Our work is mainly motivated by the FMT and LPF algorithms in [5] and [42].

Ryu et al. [37] proposed a feature extraction algorithm using Zernike moments (ZERNIKE). Since the magnitude of Zernike moments is algebraically invariant to rotation, Ryu’s method can detect a forged region even when it has been rotated by a large angle. It was reported in [11] that the ZERNIKE algorithm performs relatively well at detecting rotated duplicated regions. Note that all the algorithms mentioned above divide the input image into overlapping blocks and apply a feature extraction process to each block.

There exists another type of CMFD algorithm that does not use block-based feature representations. These algorithms identify high-entropy regions (keypoints) in the image and extract feature vectors only at the keypoints. Therefore, the number of feature vectors is reduced, and the computational complexity of keypoint-based algorithms is relatively lower than that of block-based algorithms. On the other hand, in keypoint-based algorithms, duplicated regions are often only sparsely covered by matched pairs, which may cause the duplicated regions to be missed entirely. A number of keypoint-based descriptors have been widely used for image retrieval and object recognition. Among them, the scale invariant feature transform (SIFT) [28] and speeded up robust features (SURF) [4] have been applied to CMFD.

In this work, we compare the performance of the proposed algorithm with state-of-the-art algorithms. We first choose FMT, LPF, ZERNIKE, and SIFT as comparison targets. Recently, new keypoint-based descriptors such as binary robust invariant scalable keypoints (BRISK) [21] and fast retina keypoints (FREAK) [1] have received considerable attention due to their proven efficacy. In our simulations, we implemented FREAK and applied it to the CMFD application. To the best of our knowledge, this is the first study to evaluate the performance of the FREAK descriptor for the CMFD application.

2.2 Common processing pipeline

Most CMFD algorithms follow the common processing pipeline shown in Fig. 1 [11, 12]. First, the target image is preprocessed; for example, a color image is optionally converted to grayscale. Next, block-based methods divide the image into overlapping blocks and compute a feature vector for each block, whereas keypoint-based methods find keypoints and compute feature vectors at those keypoints.

Fig. 1 Common CMFD processing pipeline

In the matching step, highly similar feature vectors are matched. Since computing the similarity between all possible feature-vector pairs imposes a huge computational load, several fast matching algorithms have been introduced. Among them, lexicographic matching [25] and kd-tree matching [6] are the most commonly used for identifying similar feature vectors. Notably, a recent study [18] shows that, for block-based methods, lexicographic matching may offer a better trade-off in practice when both accuracy and runtime are taken into account.
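A minimal Python sketch of lexicographic matching is given below. It assumes that feature vectors are integer sequences and pairs each vector only with its immediate successor in sorted order (the pairing used in Section 4); the function name is hypothetical, and some implementations compare several neighbors rather than just one.

```python
from typing import List, Sequence, Tuple


def lexicographic_matches(features: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Pair each feature vector with its successor in lexicographic order.

    Returns (i, j) index pairs into `features`. Sorting places similar vectors
    next to each other, so the search costs O(T log T) comparisons instead of
    comparing all O(T^2) pairs.
    """
    # Indices of the feature vectors, ordered by the vectors' lexicographic order.
    order = sorted(range(len(features)), key=lambda i: tuple(features[i]))
    # Each vector is matched with the next one in the sorted list.
    return [(order[k], order[k + 1]) for k in range(len(order) - 1)]


if __name__ == "__main__":
    feats = [[3, 1, 2], [0, 5, 5], [3, 1, 2], [0, 5, 6]]
    print(lexicographic_matches(feats))  # [(1, 3), (3, 0), (0, 2)]
```

The two identical vectors (indices 0 and 2) land next to each other after sorting and therefore form a candidate pair.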

In the verification step, spurious matched pairs are removed, and transformation parameters are estimated using reliable pairs. Many verification schemes have been proposed, such as threshold-based schemes [31, 37], morphological operations [20, 35], and clustering-based schemes [2, 3]. In [2], Amerini et al. proposed a scheme that builds clusters from the locations of detected features and uses random sample consensus (RANSAC) to estimate the geometric transformation between the original area and its replica. Alternatively, the same affine transformation selection (SATS) scheme in [12] finds an initial transformation using matched pairs that are spatially close to each other and then applies region growing to the pairs with similar transformation parameters. In [11], the authors reported that SATS provided the most reliable results in their experiments.

The last step of the processing pipeline is post-processing. In this step, the input image is geometrically transformed using the transformation parameters estimated in the verification step. The original and transformed images are overlaid to find closely matched regions. The regions that do not exhibit common behavior are removed in the post-processing step.

The proposed algorithm mainly follows the common processing pipeline. Specifically, we adopt lexicographic sorting for feature matching. Further, we improve the verification step in the common processing pipeline to efficiently handle various types of tampering attacks.

3 Proposed feature descriptor

In this section, we first present the theoretical background of the proposed ULPF representation that can be successfully used for CMFD applications. Next, we describe how to extract feature vectors from the ULPF representation. The resultant feature vectors are used in the following matching and verification steps.

3.1 Upsampled Log-Polar Fourier (ULPF) representation

Consider an H×V input image that is divided into T overlapping circular patches of radius R, where \(T=(H-2R+1)\cdot(V-2R+1)\). Suppose that a circular patch c and its replica \(\tilde{\textbf{c}}\) are related by a scale factor s and a rotation angle λ. Let \(\textbf{m}=[m,n]^{T}\) be a column vector indicating the pixel position in the circular patch. Further, let \(\textbf{A}^{(s,\lambda)}\) denote the 2×2 linear transformation matrix given by

$$\begin{array}{@{}rcl@{}} \textbf{A}^{(s,\lambda)}= \left[\begin{array}{ll} s\cos\lambda & s\sin\lambda\\ -s\sin\lambda & s\cos\lambda\\ \end{array}\right]. \end{array} $$
(1)

Then, the relationship between the two circular patches is expressed as

$$ c(\textbf{m})=\tilde{c}(\textbf{A}^{(s,\lambda)}\textbf{m}) $$
(2)

where s>0 and 0≤λ<2π.

We separate the effects of scaling and rotation by performing a log-polar transform on the circular patches. Let \(\textbf{p}=[r,\theta]^{T}\) be a column vector indicating the pixel position in log-polar coordinates. Using this notation, (2) is rewritten as

$$ c(\textbf{p})=\tilde{c}(\textbf{p} + \Delta) $$
(3)

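where \(r=\log\sqrt{m^{2}+n^{2}}\), \(\theta=\arctan(n/m)\) (taken in the appropriate quadrant), and \(\Delta=[\log s,-\lambda]^{T}\), because \(\textbf{A}^{(s,\lambda)}\) multiplies the radius by s and shifts the polar angle by \(-\lambda\). The above equation indicates that two-dimensional (2-D) scaling and rotation in Cartesian coordinates can be replaced with separate one-dimensional (1-D) translations in log-polar coordinates.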
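The translation property can be checked numerically on a single point. This is a small sketch, assuming θ is computed with the four-quadrant atan2 and wrapped to [0, 2π); under that convention, applying \(\textbf{A}^{(s,\lambda)}\) of Eq. (1) adds log s to r and subtracts λ from θ. Only the sign of the angular shift depends on the convention; the magnitude invariance derived below does not. The helper names are hypothetical.

```python
import math
from typing import Tuple


def apply_A(s: float, lam: float, m: float, n: float) -> Tuple[float, float]:
    """Apply the matrix A^(s, lambda) of Eq. (1) to the column vector [m, n]^T."""
    return (s * math.cos(lam) * m + s * math.sin(lam) * n,
            -s * math.sin(lam) * m + s * math.cos(lam) * n)


def log_polar(m: float, n: float) -> Tuple[float, float]:
    """Return (r, theta) with r = log sqrt(m^2 + n^2) and theta in [0, 2*pi)."""
    return math.log(math.hypot(m, n)), math.atan2(n, m) % (2 * math.pi)


if __name__ == "__main__":
    s, lam = 1.5, 0.3
    r0, t0 = log_polar(2.0, 1.0)
    r1, t1 = log_polar(*apply_A(s, lam, 2.0, 1.0))
    print(r1 - r0, math.log(s))  # radial shift approximately equals log s
    print(t1 - t0)               # angular shift approximately equals -lambda
```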

We propose a log-polar grid sampling strategy to capture the spatial information precisely. We use an upsampled log-polar grid with \(NR\) bins in both the radial and angular directions to sample a circular patch of radius R, where N is the upsampling factor. Let b and \(\tilde{\textbf{b}}\) be \(NR\times NR\) matrices containing the pixels of c and \(\tilde{\textbf{c}}\), respectively, sampled on the upsampled log-polar grid (see Fig. 2). Further, let B and \(\tilde{B}\) be the Fourier transforms of b and \(\tilde{\textbf{b}}\), respectively. Then, according to the Fourier shift property, B and \(\tilde{B}\) are related by

$$ B(\textbf{u})=e^{j2\pi(\textbf{u}^{T}\Delta)} \tilde{B}(\textbf{u}) $$
(4)

where \(\textbf{u}=[u,v]^{T}\). Consequently, the magnitudes of B and \(\tilde{B}\) satisfy the following relationship:

$$\begin{array}{@{}rcl@{}} |B(\textbf{u})|&=&|e^{j2\pi(\textbf{u}^{T}\Delta)}\tilde{B}(\textbf{u})|\\ &=&|\tilde{B}(\textbf{u})| \end{array} $$
(5)

since \(|e^{j2\pi(\textbf{u}^{T}\Delta)}|=1\). This result forms the basis of a rotation- and scale-invariant feature extraction scheme for CMFD: even when the copied region is rotated and scaled during the tampering attack, the magnitudes of B and \(\tilde{B}\) are identical. We now explain the proposed feature extraction algorithm using the ULPF representation.
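Eq. (5) can be verified on a toy example. The sketch below treats the shift Δ as a circular shift of whole bins (the model implied by the DFT), applies it to a small hypothetical 4×4 array, and confirms that the DFT magnitudes agree while the phases differ. The naive transform is only meant for tiny illustrative arrays, and the function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft2(x: Sequence[Sequence[float]]) -> List[List[complex]]:
    """Naive 2-D DFT (O(K^4)); adequate only for tiny illustrative arrays."""
    K1, K2 = len(x), len(x[0])
    return [[sum(x[a][b] * cmath.exp(-2j * cmath.pi * (u * a / K1 + v * b / K2))
                 for a in range(K1) for b in range(K2))
             for v in range(K2)]
            for u in range(K1)]


def circular_shift(x: Sequence[Sequence[float]], da: int, db: int) -> List[List[float]]:
    """Return y with y[a][b] = x[(a + da) % K1][(b + db) % K2], i.e. b(p) = b~(p + Delta)."""
    K1, K2 = len(x), len(x[0])
    return [[x[(a + da) % K1][(b + db) % K2] for b in range(K2)] for a in range(K1)]


if __name__ == "__main__":
    b_tilde = [[1.0, 4.0, 2.0, 0.0],
               [3.0, 5.0, 1.0, 2.0],
               [0.0, 2.0, 6.0, 1.0],
               [2.0, 1.0, 0.0, 3.0]]
    b = circular_shift(b_tilde, 1, 2)  # Delta = (1, 2) bins
    B, B_tilde = dft2(b), dft2(b_tilde)
    gap = max(abs(abs(B[u][v]) - abs(B_tilde[u][v])) for u in range(4) for v in range(4))
    print(gap < 1e-9)  # True: magnitudes agree, only phases differ
```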

Fig. 2 Graphical explanation of c and b for the parameters R=4 and N=2
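Sampling a circular patch on the upsampled grid can be sketched as follows. The exact grid layout of Fig. 2 is not specified in the text, so this sketch assumes log-spaced radii from 1 to R, uniformly spaced angles in [0, 2π), and bilinear interpolation with clamping at the image border; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def bilinear(img: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Bilinearly interpolate img at column x, row y (coordinates clamped to the image)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def upsampled_log_polar(img: Sequence[Sequence[float]], cx: float, cy: float,
                        R: int, N: int) -> List[List[float]]:
    """Sample an (N*R) x (N*R) log-polar grid centred at (cx, cy).

    Row i uses radius exp(i * log(R) / (N*R - 1)), i.e. log-spaced from 1 to R
    (assumed); column j uses angle 2*pi*j / (N*R). Requires N*R >= 2.
    """
    K = N * R
    grid = []
    for i in range(K):
        rho = math.exp(math.log(R) * i / (K - 1))
        grid.append([bilinear(img, cx + rho * math.cos(2 * math.pi * j / K),
                              cy + rho * math.sin(2 * math.pi * j / K))
                     for j in range(K)])
    return grid
```

With R=4 and N=2 (the values of Fig. 2), the result is an 8×8 matrix whose rows move outward in radius and whose columns sweep the full circle.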

3.2 Feature extraction from the ULPF representation

It is well known that high-frequency components are not stable if the image has undergone signal processing operations such as JPEG compression and noise contamination [24]. Therefore, low-pass filtering can improve the detection performance when a tampering attack involves such operations. To exploit this observation, we apply a 3×3 Gaussian low-pass filter to the input image before it is divided into overlapping circular patches.
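The text fixes only the 3×3 support of the Gaussian filter, not its standard deviation. The sketch below therefore assumes the common binomial approximation, the kernel [1 2 1]ᵀ[1 2 1]/16, with replicated image borders; the function name is hypothetical.

```python
from typing import List, Sequence

# Assumed binomial approximation of a 3x3 Gaussian; the weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian3x3(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """Smooth a grayscale image with KERNEL, replicating edge pixels at the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate border rows
                    xx = min(max(x + dx, 0), w - 1)  # replicate border columns
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out
```

A single bright pixel is spread over its 3×3 neighborhood with weights 4/16 at the center, 2/16 at edge neighbors, and 1/16 at corner neighbors, which attenuates the high-frequency content described above.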

In addition, large flat areas (such as sky, clouds, and ocean) often produce many false matches in the matching process [8, 37]. To deal with this issue, we calculate the standard deviation of each circular patch, and only the circular patches whose standard deviations are larger than a threshold α are considered in the subsequent CMFD process. In this paper, the Greek symbols α, β, and γ denote thresholds.
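The flat-region test can be sketched as below. This assumes integer patch centers, a pixel-centered disc of radius R that lies entirely inside the image, and the population standard deviation; the default α=4 is the value used in Section 5, and the function names are hypothetical.

```python
import math
from typing import Sequence


def patch_std(img: Sequence[Sequence[float]], cx: int, cy: int, R: int) -> float:
    """Population standard deviation of the pixels within distance R of (cx, cy)."""
    vals = [img[y][x]
            for y in range(cy - R, cy + R + 1)
            for x in range(cx - R, cx + R + 1)
            if (x - cx) ** 2 + (y - cy) ** 2 <= R * R]
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))


def is_textured(img: Sequence[Sequence[float]], cx: int, cy: int, R: int,
                alpha: float = 4.0) -> bool:
    """Keep the patch for the later CMFD stages only if its std exceeds alpha."""
    return patch_std(img, cx, cy, R) > alpha
```

A perfectly flat patch has a standard deviation of 0 and is discarded, whereas a textured patch passes the test.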

In the proposed algorithm, the feature vector of the circular patch consists of two parts, the feature header and the feature body. Let f be the feature vector of c, which is represented by

$$ \textbf{f}=\{h,\textbf{f}^{\prime}\} $$
(6)

where h and \(\textbf{f}^{\prime}\) are the feature header and the feature body of c, respectively. The header h implicitly describes the overall characteristic of the current patch, and the feature body \(\textbf{f}^{\prime}\) specifies the actual features. Note that the standard deviations of the circular patches have already been calculated at the beginning of the feature extraction step. Using these results, the feature header h is simply constructed as

$$ h=\text{Round}(\sigma) $$
(7)

where σ is the pre-calculated standard deviation of c and Round(⋅) denotes rounding to the nearest integer.

Next, we extract features from the ULPF representation and construct \(\textbf{f}^{\prime}\) from them. It is well known that the Fourier transform compacts most of the energy of an image into a few low-frequency coefficients. This implies that the similarity between two circular patches can be accurately estimated using only a few low-frequency coefficients [34, 43]. Based on this energy compaction property, the proposed algorithm arranges the coefficient magnitudes from low to high frequency and uses only the first few. In addition, when the input samples are real, the resulting Fourier transform is conjugate symmetric, so its magnitude is symmetric (see Fig. 3a) [40]. Hence, for real input samples, the Fourier transform is completely specified by about half of its coefficients, and not all coefficients need to be considered in the feature extraction step.

Fig. 3 Derivation of the AZS order based on the properties of the Fourier transform for an 8×8 block, where an underbar indicates a duplicated magnitude: (a) symmetry of Fourier magnitudes; (b) AZS order

Based on these observations, we propose an adaptive zigzag scanning (AZS) scheme for the Fourier transform of real input samples. Note that, in the (unshifted) Fourier transform, low-frequency coefficients are located at the four corners. The proposed AZS starts the scanning process at the two upper corners, (0,0) and (NR,0), and the coefficients from the two corners are ordered alternately. Figure 3b shows an example of the AZS order for an 8×8 block, where duplicated magnitudes are included only once. Let L be the total length of the proposed ULPF feature vector containing h and \(\textbf{f}^{\prime}\). The proposed algorithm rearranges the magnitudes of B in the AZS order and forms a length-(L−1) 1-D vector \(\textbf{f}^{\prime}\), which is represented by

$$ \textbf{f}^{\prime}=\{f^{\prime}(0), f^{\prime}(1), \dots,f^{\prime}(L-2)\} $$
(8)

where \(f^{\prime}(i)\), i=0,1,…,L−2, is the i-th element of \(\textbf{f}^{\prime}\). Here, \(f^{\prime}(i)\) is obtained by quantizing the magnitude of the i-th low-frequency coefficient as follows:

$$ f^{\prime}(i)=\mathrm{Q}(|B(\textbf{u}_{i})|) $$
(9)

where Q(⋅) is a quantization operation and \(\textbf{u}_{i}=[u_{i},v_{i}]^{T}\) represents the i-th coefficient position in the AZS order.
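Because Fig. 3b is not reproduced here, the exact AZS order cannot be recovered from the text alone. The sketch below therefore substitutes a hypothetical low-to-high ordering that shares the two key properties of AZS described above (low frequencies first, one entry per conjugate-symmetric pair) and then assembles the feature vector of Eqs. (6) to (9). Half-up rounding for Round(⋅), the uniform quantizer Q(x)=⌊x/step⌋, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _signed(k: int, K: int) -> int:
    """Signed frequency of DFT index k in a length-K transform."""
    return k if k <= K // 2 else k - K


def low_to_high_order(K: int) -> List[Tuple[int, int]]:
    """Hypothetical low-to-high scan of a K x K DFT (NOT the exact AZS of Fig. 3b).

    Positions are sorted by |u| + |v| of their signed frequencies, and only one
    position of each conjugate-symmetric pair (u, v) / (-u, -v) is kept, since
    their magnitudes coincide for real input.
    """
    def key(p: Tuple[int, int]) -> Tuple[int, int, Tuple[int, int]]:
        fu, fv = abs(_signed(p[0], K)), abs(_signed(p[1], K))
        return fu + fv, fv, p

    seen, order = set(), []
    for u, v in sorted(((u, v) for u in range(K) for v in range(K)), key=key):
        if ((-u) % K, (-v) % K) in seen:
            continue  # conjugate partner already scanned
        seen.add((u, v))
        order.append((u, v))
    return order


def ulpf_feature(sigma: float, magnitudes: Sequence[Sequence[float]],
                 order: Sequence[Tuple[int, int]], L: int,
                 step: float = 1.0) -> Tuple[int, List[int]]:
    """Build (h, f') as in Eqs. (6)-(9).

    h = Round(sigma) with half-up rounding (assumed); f' holds the first L - 1
    magnitudes in scan order, quantized by a hypothetical uniform quantizer
    Q(x) = floor(x / step).
    """
    h = int(math.floor(sigma + 0.5))
    body = [int(math.floor(magnitudes[u][v] / step)) for (u, v) in order[:L - 1]]
    return h, body
```

For an 8×8 block, this order keeps 34 of the 64 positions (60 positions form 30 conjugate pairs, and 4 positions are their own conjugates). With the settings of Section 5 (N=2, R=16, L=65), one would scan a 32×32 transform and keep the first 64 positions.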

It should be noted that the feature header h is used only in the lexicographic sorting process. The Euclidean distance between two feature vectors, f and \(\tilde{\textbf{f}}\), is calculated without their headers, as follows:

$$\begin{array}{@{}rcl@{}} ||\textbf{f}-\tilde{\textbf{f}}||_{2} &\triangleq&||\textbf{f}^{\prime}-\tilde{\textbf{f}}{}^{\prime}||_{2}\\ &=&\sqrt{{\sum}_{k=0}^{L-2}|f^{\prime}(k)-\tilde{f}^{\prime}(k)|^{2}}. \end{array} $$
(10)

Using the above measure, we determine whether the two patches are duplicated or not.
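The two roles of a feature vector can be sketched as follows, representing a feature as an (h, f′) tuple: the header participates in the sort key, while Eq. (10) compares bodies only. Placing h first in the sort key is our reading of how the header enters lexicographic sorting, and the names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Feature = Tuple[int, Sequence[int]]  # (header h, body f')


def sort_key(f: Feature) -> Tuple[int, ...]:
    """Lexicographic sort key: the header h first, then the body f'."""
    h, body = f
    return (h, *body)


def feature_distance(f: Feature, g: Feature) -> float:
    """Euclidean distance of Eq. (10): headers are ignored, bodies compared."""
    (_, fb), (_, gb) = f, g
    if len(fb) != len(gb):
        raise ValueError("feature bodies must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fb, gb)))
```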

The novelty of the proposed ULPF descriptor is summarized as follows:

  • The ULPF magnitudes of duplicated regions are mathematically identical even when the copied region is rotated and scaled. In contrast, the FMT magnitudes in [5] vary if the copied region is scaled.

  • The geometric transformation can be accurately estimated by using the upsampled log-polar grid. As mentioned, the LPF descriptor was proposed in the previous work [42]. However, the CMFD performance can be improved significantly by exploiting the upsampled grid.

  • The proposed feature vector, consisting of a header and a body, is well suited to CMFD.

    • ✓ The feature header can improve the sorting performance by implicitly representing the overall characteristic of the circular patch.

    • ✓ The AZS scheme, which orders the Fourier magnitudes by frequency, constructs a compact feature body that excludes duplicated magnitudes. In contrast, the LPF algorithm in [42] uses all transform coefficients, including redundant ones.

The proposed ULPF descriptor can be implemented efficiently on parallel processors. It needs to compute the ULPF representations of all circular patches whose standard deviations are larger than the threshold α, which incurs a large computational load. Therefore, where possible, we recommend computing the feature vectors of the circular patches in parallel using multi-core CPUs and GPUs.
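To give a sense of this load, consider an image of the average dataset size reported in Section 5.1 (assumed here to be exactly H=3000 by V=2300) and the settings R=16 and N=2 used in Section 5. Before the α test removes flat patches, the numbers of patches and of log-polar samples are

$$ T=(H-2R+1)(V-2R+1)=(3000-32+1)(2300-32+1)=2969\cdot 2269=6{,}736{,}661, $$

$$ T\cdot(NR)^{2}=6{,}736{,}661\cdot 32^{2}=6{,}736{,}661\cdot 1024\approx 6.9\times 10^{9}. $$

Each patch is processed independently of the others, which makes the per-patch computation a natural fit for the multi-core parallelism recommended above.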

4 Improved verification process

In the verification step of the common CMFD processing pipeline, the geometric transformation is estimated using reliable matched pairs whose matching distortions are lower than a predefined threshold. We observed that a variety of tampering attacks, such as rotation, scaling, blurring, and noising, cannot be handled properly with a single fixed threshold. For example, in the case of plain CMF, a sufficient number of pairs satisfy the given constraint. However, if the copied regions are rotated and scaled, the number of pairs satisfying the constraint is too small to estimate the geometric transformation. To address this problem, we improve the verification step of the common CMFD pipeline.

Suppose that all feature vectors have been lexicographically sorted in the matching step. Let \(\textbf{f}_{k}\) be the k-th feature vector in the sorted list and \(\tilde{\textbf{f}}_{k}\) be its matched pair; in the lexicographically sorted list, \(\tilde{\textbf{f}}_{k}\) is \(\textbf{f}_{k+1}\). Let \(\textbf{x}=[x,y]^{T}\) be a column vector indicating the pixel position in image coordinates, and let \(\textbf{x}_{k}\) (\(\tilde{\textbf{x}}_{k}\)) be the center of the circular patch \(\textbf{c}_{k}\) (\(\tilde{\textbf{c}}_{k}\)) from which \(\textbf{f}_{k}\) (\(\tilde{\textbf{f}}_{k}\)) is extracted. The proposed verification algorithm first constructs a distortion list D containing the distortions of the matched pairs. It then selects reliable matched pairs using D and estimates the geometric transformation from them. The proposed verification step proceeds as follows.

  • (1) The distortion list D is constructed by applying the following procedure to each feature vector pair, \(\textbf{f}_{k}\) and \(\tilde{\textbf{f}}_{k}\).

    • (a) Calculate the spatial distance \(z_{k}=||\textbf{x}_{k}-\tilde{\textbf{x}}_{k}||_{2}\) between \(\textbf{f}_{k}\) and \(\tilde{\textbf{f}}_{k}\), that is, between the centers of their patches. If \(z_{k}\le\beta\), this pair is classified as a false match (spatially too close regions) and its verification process is terminated. Otherwise, if \(z_{k}>\beta\), the algorithm proceeds to the next step.

    • (b) Compute the matching distortion \(d_{k}=||\textbf{f}_{k}-\tilde{\textbf{f}}_{k}||_{2}\) of the current pair. The resulting \(d_{k}\) is inserted into the distortion list D in ascending order.

  • (2) Initialize the parameter w to 1 and count the elements of D that are less than w. If this number is larger than γ, the algorithm proceeds to the next step. Otherwise, the algorithm repeatedly increases w by 1 until the number of such elements is larger than γ. The updated w is the input of the following step.

  • (3) Estimate the geometric transformation between duplicated regions using the pairs satisfying \(d_{k}\le w\). The proposed algorithm computes the affine transformation of these pairs using the SATS algorithm, as recommended in [11]. The resulting affine parameters are used in the subsequent post-processing step.
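Steps (1) to (3) can be sketched as follows, assuming each matched pair is given as its two patch centers and its distortion \(d_{k}\). The function name is hypothetical, the SATS estimation of step (3) is not reproduced, and the early exit in step (2) is our addition: without it, the loop would never terminate when D holds γ or fewer entries.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point, float]  # (x_k, x~_k, d_k)


def select_reliable_pairs(pairs: Sequence[Pair], beta: float = 100.0,
                          gamma: int = 200) -> Tuple[int, List[Tuple[Point, Point]]]:
    """Return the adaptive threshold w and the pairs passed on to step (3)."""
    # Step (1a): discard pairs whose patch centres are within beta pixels.
    kept = [(x, xt, d) for (x, xt, d) in pairs if math.dist(x, xt) > beta]
    # Step (1b): distortion list D in ascending order.
    D = sorted(d for (_, _, d) in kept)
    # Step (2): grow w until more than gamma distortions are below w.
    w = 1
    while bisect.bisect_left(D, w) <= gamma:  # count of d in D with d < w
        if not D or w > D[-1]:
            break  # every distortion is already below w; larger w cannot help
        w += 1
    # Step (3): pairs with d_k <= w would be handed to SATS (not shown).
    return w, [(x, xt) for (x, xt, d) in kept if d <= w]
```

Because D is sorted, `bisect.bisect_left` counts the distortions below w in logarithmic time, so each iteration of step (2) stays cheap even for long distortion lists.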

Through the above procedure, the proposed algorithm can handle different types of tampering attacks without introducing significant computational overhead. The subsequent post-processing step is the same as in the common pipeline (Section 2.2): the input image is geometrically transformed using the estimated parameters, the original and transformed images are overlaid to find closely matched regions, and regions that do not exhibit common behavior are removed.

5 Experimental results

We evaluated the performance of the proposed feature descriptor by comparing it with existing descriptors, namely FMT, LPF, ZERNIKE, SIFT, and FREAK. All feature descriptors were implemented in optimized ANSI C, and the performance was evaluated on an Intel i7 3.4 GHz CPU with 16 GB of RAM. The source code of FMT, ZERNIKE, and SIFT is available online [45], and the FREAK descriptor was implemented based on the SIFT code. In our implementation, the overall CMFD process was accelerated with OpenMP [9].

We measured the forgery detection performance of all descriptors using the common CMFD processing pipeline introduced in [11]. Lexicographic sorting was used to identify similar feature vectors in the matching step. For a fair comparison, the improved verification step in Section 4 was applied to the CMFD process of all feature descriptors. In the simulations, the parameters R, N, L, α, β, and γ were set to 16, 2, 65, 4, 100, and 200, respectively.

5.1 Datasets and evaluation criteria

Several benchmark CMFD datasets exist for evaluating the performance of feature descriptors. In this paper, we use the realistic and challenging dataset introduced in [11]. The tampered images in this dataset were manually created by skilled artists. In addition, common noise sources, such as JPEG artifacts, noise, and additional scaling or rotation, are automatically applied using a software framework. The dataset also provides ground-truth images, which are very useful for performance evaluation. The average image size is about 3000×2300 pixels. The image scotland, whose tampered region was created from a saturated region, was excluded from our simulations.

To quantitatively evaluate the detection performance, we adopt two metrics, precision \(M_{p}\) and recall \(M_{r}\), which are calculated as [15]

$$ M_{p}=\frac{\text{\#correctly detected pixels}}{\text{\#all detected pixels}} $$
(11)

and

$$ M_{r} = \frac{\text{\#correctly detected pixels}}{\text{\#all forged pixels}}. $$
(12)

Hence, precision is the fraction of pixels identified as tampered that are truly tampered, and recall is the fraction of tampered pixels that are correctly classified as such. A trade-off exists between the two: increasing precision typically decreases recall and vice versa. To account for both, we compute their harmonic mean \(M_{F}\), called the \(F_{1}\)-score, as follows:

$$ M_{F} = \frac{2M_{p} M_{r}}{M_{p}+M_{r}}. $$
(13)

Using these metrics, we show how precisely the tampered regions are identified.
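Eqs. (11) to (13) can be computed from a detection mask and a ground-truth mask as sketched below. The sketch assumes both masks are flattened per-pixel sequences of equal length and reports 0.0 for any ratio with an empty denominator; that convention and the function name are assumptions.

```python
from typing import Sequence, Tuple


def detection_scores(detected: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float, float]:
    """Pixel-level precision M_p, recall M_r and F1-score M_F (Eqs. 11-13).

    A ratio with an empty denominator is reported as 0.0 (assumed convention).
    """
    if len(detected) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for d, t in zip(detected, truth) if d and t)  # correctly detected pixels
    n_detected = sum(1 for d in detected if d)              # all detected pixels
    n_forged = sum(1 for t in truth if t)                   # all forged pixels
    m_p = tp / n_detected if n_detected else 0.0
    m_r = tp / n_forged if n_forged else 0.0
    m_f = 2 * m_p * m_r / (m_p + m_r) if (m_p + m_r) else 0.0
    return m_p, m_r, m_f
```

For example, a detector that marks exactly half of a forged region and nothing else has \(M_{p}=1\), \(M_{r}=0.5\), and \(M_{F}=2/3\).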

5.2 Performance evaluation

We evaluate the performance of the feature descriptors for four CMF scenarios: rotation, scaling, JPEG compression, and additive white Gaussian noise (AWGN). We then present the measured CMFD processing times of the six descriptors.

5.2.1 Rotation invariance

In this scenario, the copied regions are rotated from 0° to 10° in steps of 2°. We also test three larger rotation angles: 20°, 60°, and 180°. Figure 4 shows the measured results for CMF with rotation. As shown in Fig. 4, the proposed ULPF descriptor generally provides the best precision and recall over the entire range of rotation angles. In particular, ULPF achieves a significant performance improvement for large rotations compared with the existing feature descriptors; for example, the \(M_{F}\) of ULPF is almost double those of FMT, LPF, and SIFT. Therefore, for applications that need to detect CMF with rotation, we strongly recommend the ULPF descriptor. To illustrate this result, Figs. 5b and d show the CMFD results of ULPF for plain copy-move and for a rotation of 20°, respectively.

Fig. 4 Measured \(M_{p}\), \(M_{r}\), and \(M_{F}\) for CMF with rotation

Fig. 5 Three test cases and their CMFD results using the ULPF descriptor: (a) plain CMF; (b) CMFD result of (a); (c) CMF with a rotation of 20°; (d) CMFD result of (c); (e) CMF with a scaling of 120 %; (f) CMFD result of (e)

In our simulations, ZERNIKE also shows relatively good results for small rotations. Further, we observed that the keypoint-based descriptors, SIFT and FREAK, yield stable results for CMF with rotation. For a rotation of 180°, the FMT and LPF descriptors also achieve relatively good detection performance.

5.2.2 Scale invariance

We investigate the case where the copied regions are scaled to between 101 % and 109 % of their original size in increments of 2 %, as well as to 120 % and 200 %. Figure 6 presents the results for CMF with scaling. Compared with rotation, most descriptors show relatively weak invariance to scaling. The proposed ULPF tends to exhibit the best scale invariance in the experiments. However, for a scaling of 200 %, SIFT achieves the highest \(M_{F}\) among the descriptors. In general, if the scale factor is very large, the keypoint-based descriptors perform better than the block-based ones. In summary, the proposed ULPF can handle moderate amounts of scaling, which are common in real-world CMF manipulations. However, the detection performance of ULPF decreases sharply as the scale factor increases, so the keypoint-based descriptors are the better choice for relatively large scale factors. Figure 5f shows the CMFD result of ULPF for a scaling of 120 %.

Fig. 6 Measured \(M_{p}\), \(M_{r}\), and \(M_{F}\) for CMF with scaling

5.2.3 Robustness to JPEG compression artifacts

In this scenario, robustness to JPEG compression artifacts is investigated. The JPEG quality factor is varied from 100 to 20 in steps of 10. In general, ULPF and ZERNIKE outperform the other methods. As shown in Fig. 7, the \(M_{F}\) values of these two methods decrease moderately as the quality factor decreases. For the high quality factors 100 and 90, the \(M_{F}\) of ULPF is slightly higher than that of ZERNIKE, and for the quality factors 80 and 70, it is almost the same. FREAK is the best descriptor for the very low quality factors 30 and 20. It is worth noting that the quality factor is usually 70 or higher for real-world forgeries. In our setup, the FMT and LPF descriptors show weak robustness to JPEG compression artifacts.

Fig. 7 Robustness to JPEG compression artifacts

5.2.4 Robustness to Gaussian noise

We also evaluate the robustness of all feature descriptors to AWGN. We normalized the image intensities to the range [0, 1] and added zero-mean Gaussian noise with standard deviations of 0.02, 0.04, 0.06, 0.08, and 0.10 to the tampered regions. Fig. 8 shows that the detection performance of all descriptors decreases sharply as the standard deviation increases. When the standard deviation is 0.02, ULPF achieves the highest detection performance. However, for standard deviations of 0.04 or larger, FREAK achieves a higher \(M_{F}\) than the other algorithms. These results suggest that the keypoint-based methods tend to be more robust to Gaussian noise than the block-based ones.
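The noise injection can be sketched as follows. The sketch assumes that the tampered region is given by a 0/1 ground-truth mask and that noisy values are clipped back to [0, 1]; the text does not state whether clipping was applied. The fixed seed only makes the sketch reproducible, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def add_awgn(img: Sequence[Sequence[float]], mask: Sequence[Sequence[int]],
             sigma: float, seed: int = 0) -> List[List[float]]:
    """Add zero-mean Gaussian noise of std `sigma` to the masked pixels of a [0, 1] image.

    Clipping the result back to [0, 1] is an assumption, not stated in the text.
    """
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) if m else v
             for v, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]
```

Pixels outside the mask are returned unchanged, so only the tampered region is degraded, as in the experiment described above.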

Fig. 8 Robustness to Gaussian noise

5.2.5 Computational complexity

The CMFD processing time of a descriptor depends on the complexity of its feature vectors and on the number of feature vectors. The measured processing times are listed in Table 1. As shown in Table 1, our implementation is highly optimized in terms of processing time. In our simulations, FMT requires the longest time to extract feature vectors; accordingly, its total processing time is the highest among all methods. ZERNIKE also requires a relatively long time for feature extraction. However, its matching and verification time is lower than those of the other block-based methods because its feature vectors are shorter. The total processing time of ULPF is lower than those of FMT and ZERNIKE but higher than that of LPF, which has the lowest total processing time among the block-based algorithms.

Table 1 Average CMFD processing time (s)

As expected, the processing times of the keypoint-based methods are much lower than those of the block-based methods, because the keypoint-based methods produce far fewer feature vectors. In our simulations, FREAK achieves the lowest processing time.

5.2.6 Detailed performance analysis

To provide more insight into the simulation results, we measured the contributions of the upsampled log-polar grid, the feature header, AZS, and the improved verification scheme separately. We evaluated the effectiveness of each scheme by excluding it from the CMFD processing pipeline and measuring the resulting average change in \(M_{F}\). The measured results are listed in Table 2.

Table 2 Detailed performance analysis (variation of \(M_{F}\))

As shown in Table 2, every scheme improves the detection performance. In particular, the improved verification scheme introduced in Section 4 yields a significant performance enhancement for CMF with both rotation and scaling. Further, AZS achieves a relatively large improvement for scaling, and the upsampled log-polar grid yields a large improvement for rotation. Inserting the feature header into the feature vector consistently enhances the detection performance for CMF with both rotation and scaling.

6 Conclusions

A new feature descriptor was presented for the efficient detection of CMF. The proposed ULPF descriptor has a solid theoretical background, and its performance is superior to that of existing descriptors in most of the tested scenarios. In particular, the proposed descriptor achieves very stable detection performance over the entire range of rotation angles. In addition, the proposed feature vector structure and AZS order can be utilized in a wide range of applications that process images in the Fourier domain.