1 Introduction

Motion estimation is most vital difficulties for development of video coding applications. Between all motion estimation methodologies, block matching (BM) algorithm is the utmost standard techniques because of efficiency and effortlessness in case of both software as well as hardware executions (Barjatya 2004; Feghali 2005). BMA methodologies take responsibility that measure of pixels inside a definite area of current frame can be demonstrated as a paraphrase of pixels confined in the earlier frame. So in this process, motion vector is found by lessening sum of absolute differences (SAD) or mean absolute difference (MAD) created by the micro-block of the existing frame over a determined examine window from earlier frame. The SAD estimation is computationally costly and exemplifies the overwhelming task in BMA progression. The most popular BMA technique is full search (FS) algorithm (Lin and Tai 1997) that finds most precise motion vector, computing comprehensively SAD values for most of the element of examine window. In recent years several fast BMAs have been recommended to diminish the number of SAD tasks by computing simply a stable subsection of search positions at the expense of poor accurateness. A general mechanism of block-matching algorithm is shown in Fig. 1. There are several video compression criterions have been suggested for different applications such as H.261, MPEG-1 and MPEG-2 (Sullivan and Wiegand 2005; Sullivan et al. 2012). Generally it has been seen that most of the multimedia data are constituted by video data (Wiegand 2003). Coding of video is also very important for the usage of limited amount of bandwidth as well as storage medium. High quantity of compression is enabled by temporal correlation which generally exists between successive image frames (Pal 2015). To exploit temporal correlation, motion estimation is an important tool (Luo and Konofagou 2010; Dikbas and Altunbasak 2013).

Fig. 1
figure 1

Mechanism of block-matching algorithm

Block based motion estimation with non-overlapping rectangular blocks are widely used in numerous video coding standards (Bergen et al. 1992; Karunakar and Pai 2009). In this case, frames of image are divided into several non-overlapping blocks and the match who has been found to be best, searched within a pre-defined search range using all possible positions for each block (Jung and Ye 2010). Although full FS algorithm which is also called as exhaustive search (ES) algorithm provides optimal quality but pointedly it suffers from computational load (Brünig and Niehsen 2001). Various motion estimation methods exist but block matching is used for all most all type of international standards video compression such as MPEG1, MPEG2, MPEG4 and H.264 (Ertürk 2007; Luo et al. 2008). A general video compression flow process has been shown in Fig. 2. Motion Estimation for Block matching is the procedure for so many compensated video coding standards where temporal redundancies between consecutive frames are efficiently removed (Luo et al. 2008). BMA begins by dividing a reference image up into equal size squares blocks. Then each block in the reference image is compared to a block in current image. Blocks in current image which are centered at different locations called search points inside a designated searching area of the current image called searching window (Farsiu et al. 2004). The distortion is calculated between two blocks centered at each designated search point and the point which has the lowest distortion is the center for the reference block in current image. Distortion can be calculated by means of number of different ways including mean square error (MSE), MAD as well as SAD technique. Selecting number of search points and pixels which become search points is a main difference for many of the block matching algorithms (Gao et al. 2000).

Fig. 2
figure 2

Video compression flow progression

As we know most accurate and most expensive algorithm is FS algorithm which checks every point inside the search window. Search patterns in block matching motion estimation with altered size or shape and also the individualities of motion vector which are centrally biased have a huge influence on examining quality and speed of performance. Many other algorithms have tried to enhance computational complexity level of FS algorithm despite the fact maintaining a reasonable level of accuracy. Some such algorithms are three step search (TSS), four step search (FSS), diamond search (DS) and logarithmic search (LS) (Jing and Chau 2004; Po and Ma 1996a) but most of the algorithms have poor accuracy as well as computational complexity is too large.

2 Block matching algorithms

2.1 Three step search algorithm

Three step search algorithm (TSSA), suggested in (Li et al. 1994), is a satisfactory search mechanism. It turns out to be very standard algorithm because of simple and performance is robust and optimum. It examines for the finest motion vector to make search pattern precise. The TSSA is process which confines total number of inspection points in examines area. Also TSSA uses a homogeneously circulated search pattern in every single stage and for that reason displays straightforwardness and uniformity. One major difficult that happens with the TSSA, uses a homogeneously distributed inspection point shape in 1st phase which further turn out to be ineffective for insignificant motion estimation. Figure 3 shows the TSSA algorithm with selection of block in different stages.

Fig. 3
figure 3

Block search pattern in TSSA

There are also various step search algorithm related to TSSA have been proposed by authors in Sun et al. (2009) and Kim and Choi (1998), but most of the BMAs have disadvantages in terms of accuracy, complexity and as well as in performances. Authors in Jing and Chau (2004) and Kim and Choi (1998) proposed new three step search algorithm by changing some search pattern which reflects better results in comparison of exiting TSAA but it is also not up to the mark for practical implementation because of its complexity.

2.2 Four step search algorithm

Authors in Po and Ma (1996a 1996b) first proposed four step search (4SS) algorithm which is basically based on actual world image agreement’s distinguishing of midpoint-influenced motion and half way stop methods. This algorithm also activities the midpoint influenced appearances of the actual world video orders by means of using a lesser preliminary step size related with TSSA. The preliminary step size is 4th of determined motion dislocation. By reason of lesser preliminary step size, 4SS algorithm wants four examining steps to touch the periphery of a examine window. A general 4SS process has been shown in Fig. 4 where all the four stages are shown to find the pattern.

Fig. 4
figure 4

Block search pattern in 4SS algorithm

There are also various step search algorithm related to 4SS have been projected by authors in Nisar and Choi (2000), but most of the BMAs have disadvantages in terms of accuracy, complexity and as well as in performances.

2.3 Full step search algorithm

FS algorithm method counterparts all probable displace contender block inside the search area in reference frame with the purpose of discover the block with minimum distortion (Brünig and Niehsen 2001). So FS algorithm has large motion and more searching point to do the blocks matching and thus the computations may be too complete. A general FS algorithm search pattern is shown in Fig. 5. FS algorithm is frequently used in hardware execution of motion estimation for the reason that of its consistency and its simplicity ideal result. The commonly used metric is to find the finest match for FS algorithm in hardware is the MAD technique (Cai and David Pan 2012). To find the minimum MAD from all candidate blocks, performing search iteration for each candidate block. The performance of existing FPGA technology permits to new designs to resolve the motion estimation problem.

Fig. 5
figure 5

Block Search pattern in FS algorithm

This work defines a study about recent motion estimation designs (Karunakar and Pai 2009). FS algorithm is a computationally expensive BMA of all existing algorithm. Cost function in this FS algorithm is calculated at each and every possible place in search or examine window. As an outcome, it finds finest likely match and gives the maximum PSNR amongst any BMA. Fast block matching algorithm attempts to attain similar PSNR doing as slight calculation as probable. The noticeable shortcoming of FS is that number of computations required is very high since a contender block is matched with all reference blocks in search window thus making, processing slower (Ouyang et al. 2012; Kim 2005).

A unique diamond search method which is planned to be realized in this paper are totally different from exiting all BM algorithm and we compare the performance with FS algorithm as well as with some other existing BM algorithm in reference of total number of essential search points, PSNR and computational complication (Yaakob et al. 2013; Ouyang et al. 2012).

3 Evaluation metrics

An evaluation criterion for matching the macro-block in frames with a different block is on the basis of cost function. Most standard in reference of computational expense is MAD as well as MSE. MAD can be calculated by Eq. (1):

$$ MAD = \frac{1}{{N^{2} }}\mathop \sum \limits_{i = 0}^{n - 1} \mathop \sum \limits_{j = 0}^{n - 1} \left| {C_{ij} - R_{ij} } \right|. $$
(1)

MSE can be calculated by using Eq. (2):

$$ MSE = \frac{1}{{N^{2} }}\mathop \sum \limits_{i = 0}^{n - 1} \mathop \sum \limits_{j = 0}^{n - 1} \left( {\left| {C_{ij} - R_{ij} } \right|} \right)^{2} , $$
(2)

where N is size of the macro-block, \( C_{ij} \) is pixels compared in the Present macro-block and \( R_{ij} \) is pixels compared in reference macro-block.

Goodness of a match is also found by SAD criteria (Vassiliadis et al. 1998; Niitsuma and Maruyama 2010). It is similar to MSE criteria but as an alternative of totaling the value of square differences, absolute differences are added which is given by Eq. (3):

$$ SAD = \frac{1}{{N^{2} }}\mathop \sum \limits_{i = 0}^{n - 1} \mathop \sum \limits_{j = 0}^{n - 1} \left| {C_{ij} - R_{ij} } \right|. $$
(3)

Motion compensated image which is generated using motion vectors as well as macro-blocks from reference frame is categorized as PSNR which is calculated by Eq. (4):

$$ PSNR = 10\log_{10} \left( {\frac{Peak\;to\;Peak\;Value}{MSE}} \right)^{2} . $$
(4)

4 Proposed algorithm

A novel diamond search (DS) algorithm has been implemented. There is pre-assumption in proposed algorithm is that motion vector should must be centrally biased. There are 25 crosses which are showing all different–different checking points inside the circle which has been shown in Fig. 6. Proposed DS algorithm basically using three search patterns and these are,

Fig. 6
figure 6

Search pattern of proposed algorithm

  • Small diamond search pattern.

  • Medium diamond search pattern.

  • Large diamond search pattern.

Small diamond search pattern defined with 5 cross points with one search point is at center and another 4 point is around center cross point with length value 1 which is shown in Fig. 7. Small diamond search pattern is last stage in proposed DS algorithm to find the best matches in the current frames. Medium diamond search pattern defined with 9 cross points with one search point is at center, 4 point is around center cross point with length value \( \sqrt 2 \) and another 4 points is around center with length value 2 which is shown in Fig. 8. Large diamond search pattern defined with 13 cross points with one search point is at center, 8 point is around center cross point with length value \( 5 \) with different–different arrangement and another 4 points is located at extreme end of diameter of circle with length value 3 which is displayed in Fig. 9.

Fig. 7
figure 7

Small diamond search pattern

Fig. 8
figure 8

Medium diamond search pattern

Fig. 9
figure 9

Large diamond search pattern

The proposed algorithm runs as follows:

Step 1: Begin with search position at midpoint. Search 24 positions about origin using diamond search pattern. Select between 25 positions that are searched, one which has smallest cost function. If smallest weightiness is achieved at midpoint for search window then follow the Step 2 and if smallest weightiness is achieved at one of the 24 positions other than midpoint then fix new origin to this position and start repeating Step 1.

Step 2: Fixe new search origin and start search 8 positions. Select between 9 positions that are searched, one with smallest cost function. If smallest weightiness is achieved at midpoint for search window then follow the Step 3 and if smallest weightiness is achieved at one of the 8 positions other than midpoint and fix new origin to this position and starting repeating Step 2.

Step 3: Fix new search origin and repeat search process to evaluate position with smallest weightiness. Choose position with smallest weightiness as motion vector.

5 Simulation results and discussion

Simulation of proposed algorithm is done using MATLAB tool and also results have been compared to the some existing algorithms. The below plots reflects the average number of searches which is essential per macro block for video sequence for different–different algorithms. Simulation parameters with its value are listed in Table 1.

Table 1 Simulation parameters

5.1 Three step search algorithm

Figure 10 represents the plot for computational level (search per macro block) for TSS algorithm for frame number 0–33. For better understanding of this plot Table 2 shows the number of computations needed for TSS algorithm.

Fig. 10
figure 10

Computation levels for TSS algorithm

Table 2 Computations needed for TSS algorithm

Figure 11 represents plot for PSNR values for TSS algorithm for frame number 0–33. For better understanding of this plot Table 3 shows the PSNR values for different frame numbers.

Fig. 11
figure 11

PSNR levels for TSS algorithm

Table 3 PSNR values for TSS algorithm

5.2 Four step search algorithm

Figure 12 represents the plot for computational level (search per macro block) for 4SS algorithm for frame number 0–33. For better understanding of this plot Table 4 shows the number of computations needed for 4SS algorithm.

Fig. 12
figure 12

Computation levels for 4SS algorithm

Table 4 Computations needed for 4SS algorithm

Figure 13 represents plot for PSNR values for 4SS algorithm for frame number 0–33. For better understanding of this plot Table 5 shows the PSNR values for different frame numbers.

Fig. 13
figure 13

PSNR levels for 4SS algorithm

Table 5 PSNR values for 4SS algorithm

The PSNR result of 4SS algorithm is improved compared to the TSS algorithm with the significant amount (see in Figs. 11, 13). TSS takes average value around 23 searches for every macro-block while 4SS takes around 19 searches per macro block and hence drops the search per macro block (see in Figs. 10, 12) which reflects that performance is also improved and computational complexity has been decreased up to significant level.

5.3 Full search algorithm

Figure 14 represents the plot for computational level (search per macro block) for FS algorithm for frame number 0–33. For better understanding of this plot Table 6 shows the number of computations needed for FS algorithm.

Fig. 14
figure 14

Computation levels for FS algorithm

Table 6 Computations needed for FS algorithm

Figure 15 represents plot for PSNR values for FS algorithm for frame number 0–33. For better understanding of this plot Table 7 shows the PSNR values for different frame numbers.

Fig. 15
figure 15

PSNR levels for FS algorithm

Table 7 PSNR values for FS algorithm

5.4 Proposed diamond search algorithm

Figure 16 represents the plot for computational level (search per macro block) for proposed DS algorithm for frame number 0–33. For better understanding of this plot Table 8 shows the number of computations needed for TSS algorithm

Fig. 16
figure 16

Computation levels for proposed DS algorithm

Table 8 Computations needed for DS algorithm

Figure 17 represents plot for PSNR values for DS algorithm for frame number 0–33. For better understanding of this plot Table 9 shows the PSNR values for different frame numbers.

Fig. 17
figure 17

PSNR levels for DS algorithm

Table 9 PSNR values for proposed DS algorithm

The PSNR result of proposed DS algorithm is almost same (average 27) as the PSNR results of FS algorithm (see in Figs. 15, 17) and far better than TSS and similar to 4SS algorithm but main problem with 4SS algorithm is that it not searches all the macro-blocks in specified area which leads to sometimes inaccuracy in performances while DS algorithm follow one of the best hierarchy to search the macro-blocks to find the best matches in the specified area and leads to the almost same performance compared to the FS algorithm. FS takes average value around 207 searches for every macro -block while DS takes around 18 searches for every macro- block and hence drops number by the huge margin (see in Figs. 14, 16) which reflects that performance is almost similar to FS algorithm and computational complexity has been decreased up to significant level. Experimentally it has been also found that 88–99% of the motion vectors are found inside the circle which has radius of 3-pixel unit and centered on location of zero motion.

6 Conclusion

The proposed DS algorithm outperforming well compared to the existing algorithms in reference of PSNR, required number of search points and computational complexity. Experimentally 88–99% of the motion vectors are found inside the circle which has radius of 3-pixel unit and pinpointed on place of zero motion. The proposed process is used to implement various standards examples such as MPEG1 and MPEG4.In future results can be also improved by considering various parameters such as matching criteria, outsized search region for large multifaceted motions, small search region for less multifaceted motions as well as bidirectional motion approximation. There is a further scope to develop new methods which will decrease complication of MPEG video coding.