Center of Mass-Based Adaptive Fast Block Motion Estimation



INTRODUCTION
Motion estimation is the foundation of motion-compensated predictive coding of video sequences. Efficient block matching algorithms (BMAs) have received considerable attention and have been adopted in modern video compression standards such as MPEG-4, H.264/AVC, and WMV9 [1,2].
Several fast block matching algorithms, such as three-step search (TSS), new three-step search (N3SS) [3], four-step search (4SS) [4], diamond search (DS) [5], and block-based gradient descent search (BBGDS) [6], have been proposed to reduce computational complexity during the matching process by decreasing the number of search points. Based on the center-biased distribution of motion vectors (MVs), the N3SS, 4SS, and DS algorithms were proposed in [3][4][5] to improve TSS performance when estimating small motions. These algorithms exploit the center-biased MV distribution and use a halfway-stop approach to speed up the matching of stationary or quasi-stationary blocks. By employing a first-step-stop mechanism and a center-biased small square pattern, BBGDS [6] yields an extremely small number of search points for zero motion.
On the other hand, some studies have applied one-bit transform (1BT) techniques to motion estimation. In [7,8], 1BT was utilized to assess whether a pixel was an edge pixel. The benefit of such a representation is that the distortion between the reference block and the search block can be computed very efficiently using an exclusive-or (XOR) function. The 1BT markedly reduces arithmetic complexity, hardware complexity, and power consumption while retaining good compression performance.
As block-based motion compensation is commonly utilized in video coding to eliminate temporal redundancy, a blocking effect is generated that decreases video quality. Thus, using a fixed block size for block matching is inappropriate. Although utilizing large blocks decreases the bitrate, the blocking effect increases. This phenomenon is caused by ineffective matching of the blocks straddling the moving zone boundary. Conversely, a small block size increases the number of MVs and, hence, requires additional bits to code the MVs. Therefore, numerous studies [9][10][11][12] have proposed quadtree-based variable block size segmentation approaches that utilize large blocks for the background to decrease computational complexity, and small blocks for moving zone boundaries to improve prediction precision. However, considerable computation is required to obtain the difference, variance, or even MV from reference frames in top-down splitting or bottom-up merging approaches.
Moreover, some studies have developed search techniques based on motion type to enhance the speed and quality of BMAs. For example, Jiancong et al. [13] proposed a content adaptive search technique that clusters the blocks within a frame into foreground and background regions based on video scene analysis. Parameters describing the motion characteristics of each region are extracted to identify a suitable search area and the initial search point.
This work proposes a novel adaptive fast block motion estimation algorithm based on center of mass (CEM), binary transform, subsampling, and horizontal/vertical projection techniques. A preliminary MV is computed from the CEM difference between macroblocks; this CEM MV then classifies the moving direction and motion type to determine the initial search point and search patterns. As the conventional CEM calculation is computationally intensive, binary transform and subsampling techniques [15,16] are utilized to simplify the CEM MV calculation, yielding the binary transform CEM (BITCEM). Since CEM properties do not hold in particular scenarios, horizontal and vertical projections are applied to segment the blocks when the variable block size option is enabled. The BITCEM MV is not applied when a block is segmented. After the motion type is classified, different search patterns are employed to obtain the MVs.
The remainder of this paper is organized as follows. Section 2 describes the proposed BITCEM and the techniques that decrease computational complexity and define search patterns. Section 3 describes in detail the proposed CEM-based BMA algorithm. Sections 4 and 5 present experimental results and the discussion, respectively. Conclusions are reported in Section 6.

PROPOSED BINARY TRANSFORM CENTER OF MASS
The principle of the CEM scheme, which has been utilized in previous imaging applications [17], is applied here to motion estimation. The shortcoming of the CEM technique is that it requires a massive amount of computation. Therefore, this study redefines the CEM of a moving zone by transforming the gray-level image into a binary-level image, thereby decreasing the number of operations. Based on this BITCEM approach, the CEM of a moving zone within a block and its direction of movement can be obtained rapidly. Four additional techniques are employed in this study to decrease computational complexity and maintain picture quality. All approaches utilized in the proposed search scheme are described as follows.

Center of mass
Motion of a CEM can represent rigid object motion. In this study, gray levels are regarded as the pixel mass. The CEM is defined as

$$\bar{i} = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} i\, I(i,j)}{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} I(i,j)}, \qquad \bar{j} = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} j\, I(i,j)}{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} I(i,j)}, \tag{1}$$

where $I(i, j)$ is the gray level of pixel $(i, j)$ of a block, $(\bar{i}, \bar{j})$ is the coordinate of the CEM of the block, and $(M, N)$ is the block dimension. In terms of complexity, (1) requires considerable computation.
Example 1. For a 16 × 16 block, using (1) to identify the block CEM requires the following computations: additions: 2 × 255 + 255 = 765, multiplications: 2 × 256 = 512, divisions: 2. Note that the number of additions in the numerator is 16 × 16 − 1 = 255, and that in the denominator is also 16 × 16 − 1 = 255. The number of additions for a horizontal or vertical component is the sum of those in the numerator and denominator, that is, 255 + 255. However, the horizontal and vertical components share a common denominator, so the number of additions for both components is only 2 × 255 + 255 rather than 2 × (255 + 255). To obtain the MV between two CEMs, the calculation must be applied to the colocated blocks in the previous and current frames. Consequently, the total number of additions doubles to 2 × (2 × 255 + 255) = 1530.
When the mean absolute difference (MAD) is utilized as the matching criterion, a search point requires 256 subtractions and 255 additions. The computation of the CEM MV is therefore equivalent to approximately 11 search points, assuming that a multiplication or division costs four times as much as an addition.
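The figure of roughly 11 search points can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and the multiplication count of 2 × 256 per block is inferred from the CEM definition (one product per pixel for each coordinate) rather than quoted from the text.

```python
# Rough operation-count model for the gray-level CEM of an m x n block.
ADD_COST = 1        # cost of one addition or subtraction
MUL_DIV_COST = 4    # assumption stated in the text: 4x the cost of an addition


def cem_ops(m=16, n=16):
    """Return (additions, multiplications, divisions) for one block."""
    pixels = m * n
    additions = 2 * (pixels - 1) + (pixels - 1)  # two numerators + shared denominator
    multiplications = 2 * pixels                 # inferred: i*I(i,j) and j*I(i,j)
    divisions = 2
    return additions, multiplications, divisions


def search_point_cost(m=16, n=16):
    """MAD cost of one search point: m*n subtractions and m*n - 1 additions."""
    pixels = m * n
    return pixels + (pixels - 1)


def cem_mv_in_search_points(m=16, n=16):
    """Cost of the CEM MV (two blocks) expressed in search points."""
    adds, muls, divs = cem_ops(m, n)
    per_block = adds * ADD_COST + (muls + divs) * MUL_DIV_COST
    return 2 * per_block / search_point_cost(m, n)


print(round(cem_mv_in_search_points(), 2))  # 11.04
```

Under these assumptions the two-block cost is 5642 addition-equivalents, or about 11 search points of 511 operations each.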
In the following section, the CEM is revised to decrease the computational complexity.

Revised center of mass with the binary transform
Notably, the effort spent on the nonmoving zone within a block when calculating the CEM is unnecessary; consequently, the CEM of a moving zone is redefined to decrease computational effort. A binary transformation is applied to each block such that each pixel has a bi-level value, and the bi-level image block is denoted by P. P(i, j) = 1 indicates that pixel (i, j) is inside the moving zone, and P(i, j) = 0 indicates that the pixel is outside the moving zone. The BITCEM is defined as

$$\bar{i} = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} i\, P(i,j)}{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} P(i,j)}, \tag{2}$$

$$\bar{j} = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} j\, P(i,j)}{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} P(i,j)}, \tag{3}$$

where $P(i, j)$ is the binary level of pixel $(i, j)$ of a block, $(\bar{i}, \bar{j})$ is the coordinate of the BITCEM of the block, and $(M, N)$ is the block dimension.
Clearly, by utilizing (2) and (3), multiplication is avoided when calculating the BITCEM, and an addition is required only when a pixel is located inside the moving zone, that is, when P(i, j) = 1. Taking a 16 × 16 block as an example, the computations required in (2) and (3) are as follows: additions (maximum): 2 × 255 + 255 = 765, multiplications: 0, divisions: 2.
Similarly, to acquire the MV between two BITCEMs, the calculation must be performed for both colocated blocks in the previous and current frames. Consequently, the maximum number of additions doubles to 2 × (2 × 255 + 255) = 1530, indicating that the computation of the BITCEM MV is equivalent to at most roughly 3 search points, assuming that a multiplication or division costs four times as much as an addition. Hence, the BITCEM formula markedly decreases the CEM computational complexity.
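A minimal sketch of the BITCEM computation follows, assuming the block is already available as a binary mask stored as a list of rows. The function name `bitcem` and the example block are illustrative, not taken from the paper.

```python
def bitcem(mask):
    """Return (i_bar, j_bar) of the 1-pixels of a binary block, or None if empty.

    Only additions are performed, and only for pixels inside the moving zone;
    the two divisions happen once at the end.
    """
    sum_i = sum_j = area = 0
    for i, row in enumerate(mask):
        for j, p in enumerate(row):
            if p:
                sum_i += i
                sum_j += j
                area += 1
    if area == 0:
        return None  # no moving zone detected in this block
    return sum_i / area, sum_j / area


# Example: a 6 x 6 moving zone occupying rows 4..9 and columns 2..7.
block = [[0] * 16 for _ in range(16)]
for i in range(4, 10):
    for j in range(2, 8):
        block[i][j] = 1

print(bitcem(block))  # (6.5, 4.5)
```

Because the loop skips pixels with P(i, j) = 0, the actual number of additions depends on the moving zone size, which is why the text quotes 765 as a maximum.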

Definition of moving zone and BITCEM motion vector
In nature, an object may exhibit some degree of uniformity or homogeneity in gray level [18], suggesting that an object (the moving zone within a block) can be represented by a reference gray value. To eliminate false alarms or misdetection caused by noise prior to identifying a moving zone, the moving zone is assumed to be larger than a 5 × 5 pixel area. As movement of a moving zone generates gray-level differences, the current block $B_k$ is subtracted from its colocated block $B_{k-1}$ to obtain the block difference. A moving zone should be located at a position at which a large pixel difference exists, of which there are two cases: one position lies along the moving direction, and the other lies along the opposite direction. Hence, this work searches for the largest pixel difference with the outermost coordinates in the quadrant indicated by the motion vector $MV_{k-1}$ of the colocated block in the reference frame. Those outermost coordinates with the largest pixel difference are most likely a moving zone edge. A reference gray level must then be identified; it is best to adopt a pixel value inside the moving zone. Thus, according to the motion vector $MV_{k-1}$, pixel $(i', j')$ is the candidate located farthest along the moving direction among those with the largest gray-level difference. To obtain a pixel inside the moving zone that carries the reference gray level, 5 is added to or subtracted from the horizontal and vertical coordinates in the reverse moving direction to derive $I_k(\hat{i}, \hat{j})$, since the moving zone is assumed to be larger than 5 × 5. Figure 1 shows the reference gray level of the moving zone within block $k$.
After obtaining the reference gray level $I_k(\hat{i}, \hat{j})$ for a moving zone, (2) and (3) are applied to locate the BITCEMs of the moving zones within the current block $B_k$ and the colocated block $B_{k-1}$. The following are the steps defining a moving zone and the BITCEM of a block.
Step 2. Use (2) and (3) to derive $(\bar{i}_k, \bar{j}_k)$, the BITCEM of the current block $B_k$.
The decision regarding the threshold (TH) value is based on human perceptual characteristics. The BITCEM MV (mx, my) is then obtained using the following equations:

$$mx = \bar{i}_k - \bar{i}_{k-1}, \tag{4}$$

$$my = \bar{j}_k - \bar{j}_{k-1}. \tag{5}$$

A BITCEM MV can be obtained from two colocated blocks in successive frames (Figure 2). The following simple proof verifies that the BITCEM MV represents the MV of a moving zone, which is taken as the basis of the proposed algorithm.
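Theorem 1. Suppose that the moving zone does not move outside the block. Then the BITCEM MV represents the MV of the moving zone.
Proof. Let the BITCEMs of the moving zone within the current block $P_k$ and the reference block $P_{k-1}$ be $(\bar{i}_k, \bar{j}_k)$ and $(\bar{i}_{k-1}, \bar{j}_{k-1})$, respectively. The BITCEM MV (mx, my) is then defined by (4) and (5) as follows:

$$mx = \bar{i}_k - \bar{i}_{k-1}, \qquad my = \bar{j}_k - \bar{j}_{k-1}. \tag{6}$$

Substituting (2) into (4) gives

$$mx = \frac{\sum_{i}\sum_{j} i\, P_k(i,j)}{\text{Area}} - \frac{\sum_{i}\sum_{j} i\, P_{k-1}(i,j)}{\text{Area}}, \tag{7}$$

where $\text{Area} = \sum_{i}\sum_{j} P_k(i,j) = \sum_{i}\sum_{j} P_{k-1}(i,j)$ represents the area of the moving zone within a block. Additionally, the motion quantity of all pixels within the moving zone is the same, such that $i_k = i_{k-1} + \Delta i$, where $\Delta i$ is the motion quantity of the moving zone. Equation (7) can therefore be rewritten as

$$mx = \frac{\sum_{i}\sum_{j} (i + \Delta i)\, P_{k-1}(i,j) - \sum_{i}\sum_{j} i\, P_{k-1}(i,j)}{\text{Area}} = \frac{\Delta i \sum_{i}\sum_{j} P_{k-1}(i,j)}{\text{Area}} = \Delta i. \tag{8}$$

By the same reasoning, $my = \Delta j$. Clearly, the BITCEM MV is equivalent to the MV of the moving zone.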
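Theorem 1 can be illustrated numerically. The sketch below places an arbitrary (non-rectangular) zone in a reference block, translates it inside the block, and checks that the difference of the two BITCEMs equals the translation. The helper names and the zone shape are hypothetical, and exact fractions are used only to avoid floating-point rounding in the comparison.

```python
from fractions import Fraction


def bitcem(mask):
    """BITCEM (i_bar, j_bar) of a non-empty binary block, as exact fractions."""
    sum_i = sum_j = area = 0
    for i, row in enumerate(mask):
        for j, p in enumerate(row):
            if p:
                sum_i += i
                sum_j += j
                area += 1
    return Fraction(sum_i, area), Fraction(sum_j, area)


def place(zone, di=0, dj=0, dim=16):
    """Draw the zone pixels, translated by (di, dj), into a dim x dim block."""
    mask = [[0] * dim for _ in range(dim)]
    for i, j in zone:
        mask[i + di][j + dj] = 1
    return mask


def bitcem_mv(cur, ref):
    """BITCEM MV: BITCEM of the current block minus that of the reference block."""
    (ci, cj), (ri, rj) = bitcem(cur), bitcem(ref)
    return ci - ri, cj - rj


# Irregular zone that stays inside the block after moving by (3, -1).
zone = [(4, 2), (4, 3), (5, 2), (6, 2), (6, 3), (6, 4)]
ref = place(zone)
cur = place(zone, di=3, dj=-1)
assert bitcem_mv(cur, ref) == (3, -1)
```

The check holds for any zone shape as long as every pixel moves by the same amount and none leaves the block, which are the conditions the proof relies on.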

Subsampling
To obtain the BITCEM for a 16 × 16 block, at least 2 × 256 subtractions and 2 × 256 comparisons are required. Hence, the cost is at least that of two search points. Moreover, additional computation, which depends on the moving zone size, is required to calculate the BITCEM. Hence, under the assumption that each pixel in a block has the same MV, subsampling can be utilized to simplify the BITCEM computation. In this approach, the bi-level frame is subsampled at a rate of 1, 2, 4, or 8, causing a small reduction in precision. As a trade-off between computational complexity and picture quality, the subsampling rate is set to 4. The following is the mathematical proof for the subsampling approach employed in the BITCEM algorithm.
Proof. Let $R$ be the subsampling rate, so that a retained pixel $(i, j)$ satisfies $i = R \times i^*$ and $j = R \times j^*$; then, summing over the retained pixels,

$$\bar{i} = \frac{\sum_{i^*}\sum_{j^*} R\, i^*\, P(i^*, j^*)}{\sum_{i^*}\sum_{j^*} P(i^*, j^*)} = R \times \bar{i}^*,$$

where $(i^*, j^*)$ is the coordinate of a pixel after subsampling, $P(i^*, j^*)$ is the bi-level value of pixel $(i^*, j^*)$, and $(\bar{i}^*, \bar{j}^*)$ is the coordinate of the BITCEM after subsampling. By the same reasoning, $\bar{j} = R \times \bar{j}^*$. Based on this deduction, the BITCEM of a block is the BITCEM of the subsampled block multiplied by $R$. In the same manner, the BITCEM MV following pixel subsampling is equivalent to the MV of a moving zone:

$$mx = \bar{i}_k - \bar{i}_{k-1} = R \times (\bar{i}^*_k - \bar{i}^*_{k-1}) = \Delta i.$$

By the same reasoning, $my = \Delta j$.
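The subsampling relation can be sketched as follows, assuming subsampling simply keeps every R-th pixel in both directions; this is one plausible reading, and the paper's exact sampling grid is not specified in this section. All names are illustrative.

```python
from fractions import Fraction


def subsample(mask, rate):
    """Keep every `rate`-th pixel in both directions (assumed sampling grid)."""
    return [row[::rate] for row in mask[::rate]]


def bitcem(mask):
    """BITCEM (i_bar, j_bar) of a non-empty binary block, as exact fractions."""
    sum_i = sum_j = area = 0
    for i, row in enumerate(mask):
        for j, p in enumerate(row):
            if p:
                sum_i += i
                sum_j += j
                area += 1
    return Fraction(sum_i, area), Fraction(sum_j, area)


def subsampled_bitcem_mv(cur, ref, rate):
    """BITCEM MV computed on subsampled blocks and scaled back by the rate."""
    ci, cj = bitcem(subsample(cur, rate))
    ri, rj = bitcem(subsample(ref, rate))
    return rate * (ci - ri), rate * (cj - rj)


def square_zone(top, left, size=8, dim=16):
    """dim x dim block containing a size x size zone at (top, left)."""
    return [[1 if top <= i < top + size and left <= j < left + size else 0
             for j in range(dim)] for i in range(dim)]


ref = square_zone(2, 2)
cur = square_zone(6, 2)  # zone moved by (4, 0)
assert subsampled_bitcem_mv(cur, ref, rate=4) == (4, 0)
```

In this sketch the result is exact because the displacement is a multiple of the rate; a displacement that is not a multiple of R may be quantized, which is consistent with the small loss of precision mentioned in the text.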

Classification of video motion types
To utilize computational resources efficiently, different search patterns are allocated to different video motion types. A (0, 0) BITCEM MV implies a still block. Table 1 lists the percentage of still blocks in each sequence using different subsampling rates. The still block percentage of the previous frame is utilized to classify the video into three BITCEM motion types: near-still motion, slow motion, and fast motion (Table 2). Table 2 serves as the reference for classifying BITCEM motion types when the percentage of still blocks (Table 1) is given. The three classification types of video motion are not arbitrary. First, the still block percentage in Table 1 is calculated. Each frame in an image sequence is then classified dynamically according to the classification rule in Table 2.
Notably, the still block percentage ranges (Table 2) are empirical values. As background blocks always dominate a full scene, they account for more than 75% of the blocks in all video motion types.

Estimation of initial search point
The spatial and temporal correlations between blocks are significant characteristics for increasing the speed of the block matching algorithm [19].
(1) In consecutive frames, the moving zones move at almost the same velocity; consequently, the MVs of colocated blocks in consecutive frames are strongly correlated.
(2) The MVs of neighboring blocks within the same frame are almost the same.
Consequently, when the MVs of certain blocks are identified, the linear prediction model MV [20] can be applied to predict the initial search point of the related block.
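Let $MV(i, j, k)$ be the MV of block $(i, j)$ in the $k$th frame; then

$$MV(i, j, k) = \widehat{MV}(i, j, k) + dMV(i, j, k), \tag{12}$$

where $\widehat{MV}(i, j, k)$ is the estimation of the initial search point and $dMV(i, j, k)$ is the MV difference between the MV and this estimation. The estimation can be represented as

$$\widehat{MV}(i, j, k) = \sum_{(p,q) \in W1} \lambda_{p,q,k}\, MV(i+p, j+q, k) + \sum_{(p,q) \in W2} \lambda_{p,q,k-1}\, MV(i+p, j+q, k-1), \tag{13}$$

where $(p, q)$ is the coordinate difference between a neighboring block and the current block; $W1$ and $W2$ are the ranges of weighted MVs in the current and previous frames, respectively; $\lambda_{p,q,k}$ and $\lambda_{p,q,k-1}$ are weighted coefficients; $\lambda_{p,q,k}$ is the spatial correlation of $MV(i, j, k)$; and $\lambda_{p,q,k-1}$ is the temporal correlation of $MV(i, j, k)$ (Figure 3).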
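The prediction of the initial search point can be sketched as a weighted sum of already known MVs from spatial neighbors (frame k) and temporal neighbors (frame k − 1). The function name, the choice of neighbors, and the weights below are hypothetical placeholders; the paper's actual weighting ranges and coefficients are not given in this section.

```python
def predict_initial_mv(spatial, temporal):
    """Weighted sum of neighbor MVs.

    spatial  -- list of (weight, (mvx, mvy)) for neighbors in the current frame
    temporal -- list of (weight, (mvx, mvy)) for neighbors in the previous frame
    """
    terms = list(spatial) + list(temporal)
    px = sum(w * mv[0] for w, mv in terms)
    py = sum(w * mv[1] for w, mv in terms)
    return px, py


# Illustrative weights only (they sum to 1 here by choice, not by requirement).
spatial = [(0.25, (2, 0)),   # e.g. left neighbor in frame k
           (0.25, (2, 4))]   # e.g. upper neighbor in frame k
temporal = [(0.5, (4, 0))]   # e.g. colocated block in frame k-1

print(predict_initial_mv(spatial, temporal))  # (3.0, 1.0)
```

In practice the prediction would be rounded or clamped to the search grid before use; that step is left out because the source does not describe it.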

Variable block size option
In addition to the fixed block size (FBS) mode, a variable block size (VBS) option, including 8 × 8 and 16 × 16 block sizes, is proposed in this work. As the projection of a binary image retains considerable information, projections are widely utilized for object shape recognition [21]. Horizontal projections (HP) and vertical projections (VP), which project a binary image in the horizontal and vertical directions, respectively, are the two simplest projection methods. Blocks whose projection contains zeros within 2 pixels of the middle of the current block are horizontally or vertically segmented after horizontal or vertical projection; the block motion can then be estimated using small blocks. In Figure 4, horizontal projection is applied to a binary value block, resulting in a zero value in the horizontal direction. In the proposed algorithm, segmentation is applied in accordance with the horizontal projection HP(i) or the vertical projection VP(j). Almost no additional computation is required for the binary image projections when obtaining the BITCEM. HP(i) and VP(j) are defined as

$$HP(i) = \sum_{j=0}^{N-1} P(i, j), \qquad VP(j) = \sum_{i=0}^{M-1} P(i, j),$$

where $P(i, j)$ is the binary value of pixel $(i, j)$ of a block and $(M, N)$ is the block dimension.
Based on the assumption of a rigid object under translation, the projections of a moving zone satisfy

$$\sum_{i=0}^{M-1} HP(i) = \sum_{j=0}^{N-1} VP(j) = \text{Area},$$

where Area is the area of the moving zone. Then, the BITCEM $(\bar{i}, \bar{j})$ can be rewritten as

$$\bar{i} = \frac{\sum_{i=0}^{M-1} i \times HP(i)}{\text{Area}}, \qquad \bar{j} = \frac{\sum_{j=0}^{N-1} j \times VP(j)}{\text{Area}}.$$

Based on this analysis, the BITCEM can be derived from the HP and the VP of a binary value block. Thus, only 64 multiplication operations are needed to obtain the BITCEMs of the current and reference blocks. This computation corresponds to 256 addition-equivalents, or a negligible 0.5 search point, assuming that a multiplication costs four times as much as an addition.
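The projection-based route can be sketched as follows. The projections and the BITCEM follow the definitions in the text; the segmentation test is only one possible reading of "zeros within 2 pixels from the middle", and the extra requirement that zone pixels exist on both sides of the middle is an assumption added here so that an empty block is not split. All function names are hypothetical.

```python
def projections(mask):
    """HP(i) = sum over j of P(i, j); VP(j) = sum over i of P(i, j)."""
    hp = [sum(row) for row in mask]
    vp = [sum(col) for col in zip(*mask)]
    return hp, vp


def bitcem_from_projections(hp, vp):
    """BITCEM from the projections: one multiplication per row and per column."""
    area = sum(hp)  # equals sum(vp)
    if area == 0:
        return None
    i_bar = sum(i * h for i, h in enumerate(hp)) / area
    j_bar = sum(j * v for j, v in enumerate(vp)) / area
    return i_bar, j_bar


def has_gap_near_middle(proj, margin=2):
    """True if the projection is zero within `margin` samples of the middle
    and zone pixels exist on both sides (assumed segmentation criterion)."""
    mid = len(proj) // 2
    gap = any(v == 0 for v in proj[mid - margin:mid + margin])
    return gap and sum(proj[:mid]) > 0 and sum(proj[mid:]) > 0


# One 6 x 6 zone: rows 4..9, columns 2..7.
single = [[1 if 4 <= i <= 9 and 2 <= j <= 7 else 0 for j in range(16)]
          for i in range(16)]
hp, vp = projections(single)
print(bitcem_from_projections(hp, vp))  # (6.5, 4.5)

# Two zones separated along the row axis: rows 1..4 and rows 11..14.
double = [[1 if (1 <= i <= 4 or 11 <= i <= 14) and 2 <= j <= 7 else 0
           for j in range(16)] for i in range(16)]
hp2, vp2 = projections(double)
print(has_gap_near_middle(hp2), has_gap_near_middle(vp2))  # True False
```

For a 16 × 16 block the BITCEM needs 16 products per coordinate, that is, 32 per block and 64 for the current and reference blocks together, matching the count quoted above.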

Initial search point
In the proposed scheme, the current and reference blocks are first input to acquire the BITCEM MV; the percentage of (0, 0) BITCEM MVs in the previous frame is then utilized to classify the frame into one of the three BITCEM motion types. This study alternates between the conventional linear prediction model MV (as described in Section 2.5) and the proposed BITCEM MV as the initial search point, based on the BITCEM motion type. The BITCEM MV is applied in near-still and slow BITCEM motion types to acquire a precise initial search point, whereas for the fast BITCEM motion type, the linear prediction model in (13) is adopted instead.

Segmentation
When the VBS option is enabled (as described in Section 2.6), the proposed scheme determines whether segmentation is required after identifying the initial search point. Both HP and VP are by-products of the BITCEM calculation and are used to determine whether the block requires segmentation. Once a block has been segmented, the BITCEM of the original 16 × 16 block is not used, because it fails to represent the BITCEMs of multiple moving zones within the block. For simplicity, the BITCEMs of the subblocks after horizontal and vertical segmentation are not calculated. The BITCEM MV calculated prior to segmentation is replaced by (0, 0) as the initial search point.

Search patterns
Based on BITCEM motion directions and motion types, different search patterns with different search strategies are proposed (Figures 5(a) and 5(b)) to estimate a motion vector with increased precision. For near-still and slow BITCEM motion types, concentrated search patterns are applied, whereas for the fast BITCEM motion type, dispersed search patterns are applied. Additionally, alternative search patterns are introduced into the scheme to further decrease the number of search points while retaining picture quality.
When the BITCEM MV is not (0, 0), some additional points, in addition to the points close to the center, are added along the BITCEM moving direction (horizontal, vertical, sloped, inverse-sloped) to improve search precision. For a BITCEM moving horizontally or vertically, additional search points, such as SP3H/SP4H or SP3V/SP4V, are allocated to the horizontal or vertical direction, respectively. Regardless of the direction in which the BITCEM is moving, the search patterns contain points close to the center to exploit the center-biased distribution for near-still and slow BITCEM motion types. For the fast BITCEM motion type, points in a circular shape are added at locations far from the center. To accommodate all BITCEM moving directions other than strictly horizontal or strictly vertical, defined as sloped or inverse-sloped, concentrated and dispersed search patterns, such as SP5S or SP5IS, are combined for all BITCEM motion types. During the next search step, when the frame is of the near-still or slow BITCEM motion type, SP6 or SP1 is allocated alternately around the best match candidate of the first search step to acquire the final MV.
When the BITCEM MV is (0, 0), no directionally biased search points are allocated. The SP1 search pattern is applied for near-still or slow BITCEM motion types, and SP2 is applied for the fast BITCEM motion type. When the block requires segmentation, a single search pattern, SP7, is utilized as the initial search pattern.
The proposed algorithmic process is summarized as follows.
Step 1. Input the current block.
Step 3 (VBS option). If any zero value is located in the middle of HP(i) or VP(j), then the block is segmented.
Step 4 (VBS option). When the block is segmented, the initial MV is assigned to be (0, 0); go to Step 7.
Step 5. Classify the BITCEM motion types according to the percentage of the (0, 0) BITCEM MV.
Step 6. Assign the initial search point ((0, 0) for segmented blocks, the BITCEM MV for near-still and slow BITCEM motion types, and the linear prediction model MV for the fast BITCEM motion type) and allocate the search pattern based on the BITCEM motion type and the direction of BITCEM motion.
Step 7. Begin searching in accordance with the initial search pattern.
Step 8. Continue searching from the best match point via Step 7 using the next search pattern.
Step 9. When the best match point is (0, 0) during a search iteration, stop the search and go to Step 1 (when the block is segmented, continue searching other subblocks); otherwise, continue searching based on the next search pattern.
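The choice of initial search point in Step 6, together with the initial pattern for the cases the text spells out explicitly, can be sketched as below. The function names and the string labels for motion types are hypothetical; the directional patterns (SP3H, SP4V, SP5S, and so on) depend on Figure 5 and are deliberately not reproduced.

```python
CONCENTRATED_TYPES = ("near-still", "slow")


def initial_search_point(segmented, motion_type, bitcem_mv, predicted_mv):
    """Pick the initial search point as described in Step 6."""
    if segmented:
        return (0, 0)                      # BITCEM MV is discarded after segmentation
    if motion_type in CONCENTRATED_TYPES:
        return bitcem_mv                   # near-still and slow motion
    if motion_type == "fast":
        return predicted_mv                # linear prediction model MV
    raise ValueError(f"unknown motion type: {motion_type!r}")


def initial_pattern(segmented, motion_type, bitcem_mv):
    """Initial pattern for the non-directional cases only."""
    if segmented:
        return "SP7"
    if bitcem_mv == (0, 0):
        return "SP1" if motion_type in CONCENTRATED_TYPES else "SP2"
    return None  # directional pattern; see Figures 5(a) and 5(b)


print(initial_search_point(False, "slow", (2, -1), (4, 0)))   # (2, -1)
print(initial_search_point(False, "fast", (2, -1), (4, 0)))   # (4, 0)
print(initial_pattern(True, "fast", (2, -1)))                 # SP7
```

Keeping the two decisions in separate functions mirrors the text, where the initial point depends on segmentation and motion type, while the pattern additionally depends on the BITCEM direction.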
(1) Average mean square error (MSE): since the focus of this work is on motion estimation rather than the whole coding scheme, only the difference between the frame reconstructed via motion compensation and the original frame is compared. That is, the residual frame is not added to the reconstructed frame, which makes the comparison between BMAs clearer. Notably, MSE is inversely correlated with picture quality.
(2) Picture deterioration percentage: this criterion measures the difference in MSE between each algorithm and the full search (FS) algorithm, divided by the MSE of the FS algorithm. Deterioration percentage is inversely correlated with picture quality.
(3) Complexity/block: complexity is a measure of the number of search points for each algorithm. Taking BITCEM as an example, since each search point requires 256 subtractions and 255 additions, the complexity of BITCEM is calculated as follows:

$$\text{complexity} = \text{search points} + \frac{\text{BITCEM computation}}{511}\ \text{(search points)}. \tag{16}$$

Complexity is inversely correlated with coding speed.
(4) Speedup: speedup represents the complexity of the FS algorithm divided by that of each algorithm.
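The four criteria can be written down directly. The sketch below uses hypothetical function names and made-up input values chosen only to exercise the formulas; none of the numbers are results from the paper.

```python
def mse(original, predicted):
    """Mean square error between two equally sized pixel sequences."""
    if len(original) != len(predicted):
        raise ValueError("frames must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, predicted)) / len(original)


def deterioration_pct(mse_alg, mse_fs):
    """MSE increase relative to full search, in percent (negative = better)."""
    return 100.0 * (mse_alg - mse_fs) / mse_fs


def complexity(search_points, bitcem_ops, ops_per_search_point=511):
    """Search points plus BITCEM overhead expressed in search points, as in (16)."""
    return search_points + bitcem_ops / ops_per_search_point


def speedup(complexity_fs, complexity_alg):
    """Complexity of full search divided by that of the algorithm."""
    return complexity_fs / complexity_alg


# Illustrative values only.
print(mse([10, 20, 30], [12, 20, 27]))   # (4 + 0 + 9) / 3
print(deterioration_pct(105.0, 100.0))   # 5.0
print(complexity(10, 1022))              # 12.0
print(speedup(200.0, 10.0))              # 20.0
```

A negative deterioration percentage, as reported for the VBS mode, simply means the algorithm's MSE is lower than that of FS with a fixed block size.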
The FBS mode (Table 3) demonstrates that the proposed algorithm decreases computational complexity significantly. The algorithm is 13-20 times faster than the FS algorithm. Additionally, based on the MSE/pixel comparison results, the proposed algorithm renders the best picture quality and the fewest search points compared with TSS, N3SS, 4SS, and DS, except for the Football and Carphone sequences. Although BBGDS requires fewer search points than the other algorithms, it is likely to be trapped in a local minimum for video sequences with large motion content. The proposed algorithm requires slightly more search points than BBGDS and retains superior MSE performance.
The VBS mode (Table 3) demonstrates that the speed of the proposed algorithm remains high and that it generates better picture quality than the FS algorithm with a fixed block size, such that the deterioration percentage comparison results in negative values. That is, the algorithm reuses the BITCEM MV calculation to further increase picture quality, without excessive computation in the projection technique utilized when determining whether to segment the block. The proposed algorithm costs only 5.38% to 8.22% of the computation required by FS while improving picture quality by 0.21%-9.67% relative to FS. Thus, the proposed BITCEM-based adaptive BMA with the variable block size technique effectively reduces the blocking effect, thereby improving the precision of motion estimation. Experimental results justify the motivation and robustness of the proposed scheme.

DISCUSSION
The threshold TH is inversely correlated with the degree of uniformity of the moving zone gray level, and positively correlated with the moving zone size. Thus, when TH is set to a large value, an increased number of pixels fall within the range in which $P_k(i, j) = 1$. That is, the moving zone area estimated by the number of pixels with $P_k(i, j) = 1$ or $P_{k-1}(i, j) = 1$ enlarges. Considering the uniformity of the moving zone gray level, 40 is the empirical optimal threshold that attains a satisfactory result for all video sequence types, suggesting that a threshold of 40 generates the most likely moving zone and the most accurate BITCEM. In the "Football" and "Carphone" fast motion sequences, many blocks break the assumption in Theorem 1; consequently, the moving direction of the BITCEM cannot be accurately estimated, and, hence, a correct search pattern for successive block matching cannot be applied.
Furthermore, as with all BMAs, three assumptions are required: (1) no object distortion while moving; (2) a single moving object within a block; and (3) the object will not move outside a block. The following discusses the impact on BITCEM robustness when any one of the three assumptions does not hold.
(1) No object distortion while moving (rigid object translation): the nonrigid object translation problem cannot be solved using BMAs. When this assumption is violated, any BMA fails to find a similar block as the best match. This typically results in a large prediction error.
(2) A single moving object within a block: when there is more than one moving zone in a block, a single reference pixel fails to represent the gray levels of multiple moving zones. This issue may be solved using the VBS option with the proposed H/V projection segmentation.
(3) The moving zone does not move outside a block: when the moving zone moves out of a block, the reference pixel cannot be located to perform the binary transform and the subsequent CEM operation. Since a moving zone is prone to move out of a block in fast video motion, a violation of this assumption can be detected by classifying the BITCEM motion type. When this assumption breaks, the moving zone does not exist in the current block, which leads to an inaccurate BITCEM MV. In that case, the proposed algorithm applies the linear prediction model MV rather than the BITCEM MV as the initial search point.
Moreover, the approach for performing block size variation is computationally efficient, as HP and VP are derived during the BITCEM calculation, for which no overhead is required. On the other hand, by enabling the VBS option, a specific block size can be determined. Although this study only simulated the block sizes of 16 × 16 and 8 × 8, which are adopted in MPEG-4 and H.263, the proposed scheme can be extended to block sizes of 16 × 8/8 × 16/8 × 4/4 × 8/4 × 4 for H.264 via further horizontal and/or vertical segmentation. Consequently, it is unnecessary to repeat the same motion search for each block size, which significantly decreases the number of search steps [14].
Considering the conditional branches, there are 5 conditional branches in the proposed scheme when the segmentation branch is not taken: (1) to check whether the block requires segmentation (one conditional branch); (2) to determine the initial search pattern and the next search pattern based on the value of the BITCEM MV and the BITCEM motion type (two conditional branches); and (3) to determine successive search patterns by the BITCEM motion type and the termination condition of whether the best match point is in the center (two conditional branches). Conversely, when the segmentation branch is taken, there are only 2 conditional branches: (1) to check whether the block requires segmentation (one conditional branch); and (2) to determine the successive search patterns by the termination condition of whether the best match point is in the center (one conditional branch). Note that the penalty differs considerably for each conditional branch in each BMA and varies with how a BMA is implemented in software or hardware.

CONCLUSION
This study presented a novel adaptive motion estimation algorithm based on CEM. The proposed scheme primarily focuses on accurately predicting the moving direction and motion quantity of a block to increase matching process efficiency (including speed and precision). The principal approaches applied are the CEM via binary transform, subsampling, predictive search, classification of video motion types, arrangement of search patterns, and variable block size. To decrease computational complexity, a binary transform approach with colocated measures (e.g., reference pixel estimation and empirical threshold finding) is utilized. Subsampling is applied to further decrease the number of computations, which is the best method of decreasing the overhead generated when calculating a binary transform and CEM. When the VBS option is enabled, the horizontal/vertical projections of a binary transformed macroblock are employed to determine whether the block requires segmentation. Experimental results show that the VBS mode generates the best picture quality with a slight increase in overhead complexity.
When the FBS mode is adopted, its speed is close to the first step-stop BBGDS algorithm with the fewest search points (complexity/block), and picture quality, with the exception of Football and Carphone sequences, remains the highest among FS, TSS, N3SS, 4SS, DS, and BBGDS algorithms.
Experimental findings demonstrate that the proposed algorithm is an efficient BMA that is robust in prediction quality while decreasing the computational complexity across all benchmark sequences.

Figure 1: The reference gray levels of moving zone with moving directions to the top left.

Figure 2: The relationship between block motion and BITCEM motion.

Figure 3: Estimation of initial search point.

Table 1: Still block percentage with different subsampling rates.

Table 2: Classification of video motion type.