
25.1 Introduction

Nowadays, CMOS sensors are widely used in video cameras because of their low price and their ability to integrate with image processing circuits. A CMOS camera commonly adopts a rolling shutter, which reads and resets the rows of the sensor array sequentially, so that each row is captured slightly later than the previous one. When the camera or an object moves during image capture, the scene imaged by each row undergoes relative motion, resulting in a geometric distortion called rolling shutter distortion. These distortions degrade image quality and make objects in videos appear to undergo dynamic non-rigid deformation.

Removing rolling shutter distortion requires computing the relative displacements between the rows within a frame. This task is more difficult than video stabilization, in which only the relative motions between frames are computed. Recently the problem of removing rolling shutter distortions from videos has been studied by many researchers [1–8]. These studies focused on distortions caused only by camera motion. 2D image processing methods were proposed to handle the distortions induced by smoothly varying translation parallel to the image plane [2, 3, 6, 7]. The more complex distortions caused by camera rotation were studied using a 3D rotation model [4], which requires camera calibration.

In this paper, we propose a rectification method that can handle both camera translation and camera rotation without requiring any information about the camera. Since the general image transformation is very complex, we do not intend to estimate the motion of each pixel exactly, but only to compute approximate frame motions that can be used to remove the dominant distortion caused by the rolling shutter. A 2D rotation model is used to represent the transformation between two successive frames. The parameters of the model are solved by minimizing the measurement constraint on point correspondences. Based on the rotation model, the distortion caused by the rolling shutter can then be removed. Experiments show that our method is comparable to the 3D rotation model [4], which outperformed other 2D methods in most situations.

We introduce related work in Sect. 25.2. Section 25.3 presents our 2D rotation model and analyzes why the model works for our task. Our rectification method is described in Sect. 25.4. Section 25.5 analyzes the experimental results. Finally, Sect. 25.6 concludes the paper.

25.2 Related Work

Mechanical methods have been designed to avoid rolling shutter distortion by using a mechanical global shutter or by adopting a mechanical image stabilization system. However, a mechanical global shutter reduces the fill factor of the image sensor, and mechanical image stabilization systems are not available in most consumer cameras. Post-processing methods are thus required to rectify distorted videos.

The key to rectifying rolling shutter distortions is finding the relative pixel displacements during the image exposure interval. Each displacement equals the temporal integral of the image motion between the time instants at which the pixels are imaged. If a camera undergoes a constant translation parallel to the image plane, the constant motion velocity of the image can be computed directly from the global motion between two successive frames [2, 3, 7]. Liang et al. proposed a method that can deal with smoothly varying translation parallel to the image plane [6]: a high-resolution motion velocity of the frames is computed using global motion estimation and parametric curve fitting. These methods assume that points on the same row share the same motion, and so cannot rectify distortion induced by camera rotation.

Forssén and Ringaby were the first to compute 3D camera motion for rolling shutter rectification [4, 8]. Since image motion is the result of camera motion through perspective projection, their method has a clear physical meaning. They used spherical linear interpolation to interpolate the camera rotation between knots represented by rotation parameters, and then solved for the parameters using inter-frame correspondences. This method can effectively recover rolling shutter images distorted by camera rotation, but it requires camera calibration, which is inconvenient in some situations. To facilitate controlled evaluation of different algorithms, Forssén and Ringaby generated synthetic rolling shutter sequences that can be downloaded from [9].

When a camera is attached to a moving vehicle, it may vibrate at a frequency higher than its frame rate. Baker et al. studied this problem using translational and affine models [1]. The camera jitter was solved for by optimizing the measurement constraints of correspondences subsampled from the optical flow fields.

Cho and Hong used an affine matrix to represent the transformation between two distorted frames [2]. Their derivation assumed uniform frame translation.

Experiments showed that the video stabilization methods proposed by Liu et al. can reduce rolling shutter distortions [10, 11], though no clear theoretical explanation for this effect was given.

25.3 An Image Motion Model

We first analyze the relationship between rolling shutter distortions and image motion. Then a 2D rotation model is introduced.

25.3.1 Rolling Shutter Distortions

To rectify a distorted rolling shutter frame, the middle row of the frame is normally chosen as the reference. As shown in Fig. 25.1, the aim of rectification is to obtain a frame as if it had been imaged by a global shutter camera at the instant when the middle row of the frame was imaged. We define the time interval between the sampling instants of two successive rows as the unit time. The displacement of a pixel \( p_{i} \) between the instant \( r_{i} \) at which it is imaged and the reference instant \( r_{m} \) is:

$$ d_{i} = \int_{r_{i}}^{r_{m}} v(t)\,dt, $$
(25.1)

where \( v(t) \) represents the motion velocity of the pixel. If the inter-frame delay of the camera is zero, the time interval \( r_{m} - r_{i} \) is at most half of the frame period, and a nonzero inter-frame delay decreases this fraction further. The distortion \( d_{i} \) of a pixel is therefore at most half of its displacement between frames: if the inter-frame displacement is small, the distortion is smaller still. Thus we need only consider large pixel displacements to remove the obvious distortions.

Fig. 25.1 The image rectification model

25.3.2 A 2D Rotation Model

For two successive frames captured by a global shutter camera, the dominant motion of each pixel is either rotation around a point in the image plane or translation along a direction. If a frame rotates around a point \( {\mathbf{c}} \) in the image plane through an angle \( \theta \), pixel \( {\mathbf{x}} \) in this frame moves to a new position \( {\mathbf{x}}' \):

$$ \mathbf{x}' = f(\mathbf{x},\theta ,c_{x} ,c_{y}) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \left( \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} c_{x} \\ c_{y} \end{bmatrix} \right) + \begin{bmatrix} c_{x} \\ c_{y} \end{bmatrix}. $$
(25.2)

This equation models in-plane rotation. If the motion is a translation, the pixels can be thought of as rotating around a faraway point located in the direction perpendicular to the motion. Eq. (25.2) can therefore approximately model translation as well, provided an appropriate rotation center is chosen.
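For concreteness, the following sketch implements Eq. (25.2) with NumPy; the function name and array conventions are ours. Allowing a per-point angle will be convenient later, when each image row rotates by a different amount.

```python
import numpy as np

def rotate_about(x, theta, cx, cy):
    """Eq. (25.2): rotate point(s) x around the center (cx, cy).

    x     : (K, 2) array of pixel coordinates, one point per row.
    theta : a single angle, or a length-K array giving each point
            its own angle (used later for per-row rotations).
    """
    x = np.atleast_2d(x).astype(float)
    theta = np.broadcast_to(theta, (x.shape[0],))
    ca, sa = np.cos(theta), np.sin(theta)
    dx, dy = x[:, 0] - cx, x[:, 1] - cy
    return np.stack([ca * dx - sa * dy + cx,
                     sa * dx + ca * dy + cy], axis=1)
```

The translation approximation corresponds to a tiny angle around a distant center: for example, `rotate_about(x, 2e-6, 0, 1e6)` shifts points near the origin by about 2 pixels along the x-axis.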

25.4 Image Rectification

Suppose we have a pair of corresponding points \( {\mathbf{x}} \) and \( {\mathbf{x}}' \) in two successive frames. Then:

$$ \mathbf{x}' = \mathbf{x} + \int_{r}^{N + r'} v(t,\mathbf{x})\,dt, $$
(25.3)

where \( r \) and \( r' \) are the row numbers of pixels \( {\mathbf{x}} \) and \( {\mathbf{x}}' \), and \( N \) is the number of rows in a frame. In Eq. (25.3) we assume the inter-frame delay is zero. Given a set of point correspondences, the measurement constraint of the correspondences is:

$$ E = \sum_{k = 1}^{K} d(\mathbf{x}_{k} - \mathbf{o}_{k})^{2} + d(\mathbf{x}'_{k} - \mathbf{o}'_{k})^{2}, $$
(25.4)

where \( K \) is the number of correspondences, \( d(\cdot) \) denotes the Euclidean norm, \( {\mathbf{x}}_{k} \) and \( {\mathbf{x}}'_{k} \) form the \( k \)-th point correspondence, and \( {\mathbf{o}}_{k} \) and \( {\mathbf{o}}'_{k} \) are the relocated points. In Eq. (25.4), \( {\mathbf{o}}'_{k} \) is computed from \( {\mathbf{o}}_{k} \) and \( v(t) \) using Eq. (25.3); \( {\mathbf{o}}_{k} \) and \( v(t) \) are the parameters to be determined by minimizing the measurement constraint. Without any constraint on \( v(t) \), the parameters are difficult to solve for, even when enough correspondences are available.

25.4.1 Motion Computation

To compute the frame motion for image rectification, we first use a strong assumption to initialize the computation, and then use a weaker assumption to complete it.

25.4.1.1 Initialization

If a frame uniformly rotates around a fixed center \( (c_{x} ,c_{y} ) \) during the capture interval of two successive frames, the rotation velocity \( v_{r} \) is:

$$ v_{r} = \theta /N, $$
(25.5)

where \( \theta \) represents the rotation angle of the frame in one frame interval. Then Eq. (25.2) can be rewritten as:

$$ \mathbf{x}' = f(\mathbf{x}, (N + r' - r)\,v_{r}, c_{x}, c_{y}). $$
(25.6)

Using Eq. (25.6), \( \theta \), \( c_{x} \) and \( c_{y} \) can be solved for by minimizing the measurement constraint. We use the sparse Levenberg-Marquardt algorithm given by Hartley and Zisserman [12] to solve this optimization problem.
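As an illustration, the sketch below fits the three parameters of Eq. (25.6) from correspondences. It is a simplified variant of the procedure above: it minimizes the transfer error \( f(\mathbf{x}) - \mathbf{x}' \) directly rather than the full latent-point constraint of Eq. (25.4), and SciPy's Levenberg-Marquardt solver stands in for the sparse implementation of [12]. The function name and starting point are ours; `rotate_about` is the helper from the sketch in Sect. 25.3.2.

```python
import numpy as np
from scipy.optimize import least_squares

def solve_uniform_rotation(x, xp, r, rp, N):
    """Fit (theta, cx, cy) of Eq. (25.6) from correspondences
    between two successive frames.

    x, xp : (K, 2) arrays of corresponding pixels.
    r, rp : (K,) row indices of x and xp.
    N     : number of rows in a frame.
    """
    def residuals(p):
        theta, cx, cy = p
        # Eq. (25.6): each point rotates by (N + r' - r) * theta / N.
        pred = rotate_about(x, (N + rp - r) * theta / N, cx, cy)
        return (pred - xp).ravel()

    # Start with a small angle and a center near the data, so the
    # Jacobian with respect to the center is nonzero.
    x0 = [1e-3, x[:, 0].mean(), x[:, 1].mean()]
    return least_squares(residuals, x0, method='lm').x  # theta, cx, cy
```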

Though the assumption of uniform rotation over two frame intervals seems strong, it can be satisfied by some frames in a video captured by a camera undergoing smoothly varying motion. To find such frames, we compute the measurement constraint for each pair of successive frames. The pair giving the minimal value of \( E \) is considered to satisfy the assumption and is chosen as the initial frames, with their motion solved.

25.4.1.2 Sequential Computation

Now we relax our assumption, requiring only that a frame rotates uniformly around a fixed center within a single frame interval. Starting from the motion of the initial frames, we compute the motion of the other frames in two directions, as illustrated in Fig. 25.2.

Fig. 25.2 Sequential computation of frame motion. Frame \( k \) is the initial frame with known motion

For the frames after the initial frame \( k \), we first transform frame \( k \) to the time instant at which its last row was imaged:

$$ \mathbf{x}^{new} = f(\mathbf{x}, ((N - r)/N)\,\theta(k), c_{x}(k), c_{y}(k)) $$
(25.7)

After this transformation, the remaining displacements between corresponding points are caused only by the motion of frame \( k + 1 \). The relationship between the corresponding points \( {\mathbf{x}}^{new} \) and \( {\mathbf{x}}' \) is:

$$ \mathbf{x}' = f(\mathbf{x}^{new}, (r'/N)\,\theta(k + 1), c_{x}(k + 1), c_{y}(k + 1)) $$
(25.8)

Now we can obtain \( \theta(k + 1) \), \( c_{x}(k + 1) \) and \( c_{y}(k + 1) \) by minimizing Eq. (25.4).

To find the rotation of the frames captured before frame \( k \), we first rotate frame \( k \) back to the instant at which its first row was imaged:

$$ \mathbf{x}^{new} = f(\mathbf{x}, (-r/N)\,\theta(k), c_{x}(k), c_{y}(k)) $$
(25.9)

The relationship between \( {\mathbf{x}}^{new} \) and the corresponding point \( {\mathbf{x^{\prime}}} \) in frame \( k - 1 \) is:

$$ \mathbf{x}' = f(\mathbf{x}^{new}, (-(N - r')/N)\,\theta(k - 1), c_{x}(k - 1), c_{y}(k - 1)) $$
(25.10)

This process continues until the motion of every frame has been solved.
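A sketch of one forward step of this sequential computation follows, under the same transfer-error simplification as the initialization sketch above; the backward step of Eqs. (25.9) and (25.10) is symmetric. The function and argument names are ours, and `rotate_about` is again the helper from Sect. 25.3.2.

```python
from scipy.optimize import least_squares

def solve_next_frame(x, xp, r, rp, N, theta_k, cx_k, cy_k):
    """Solve frame k+1's motion from frame k's (Eqs. 25.7-25.8).

    x, xp : (K, 2) correspondences between frames k and k+1;
    r, rp : their row indices; theta_k, cx_k, cy_k : frame k's motion.
    """
    # Eq. (25.7): move frame k's points to the instant of its last row.
    x_new = rotate_about(x, ((N - r) / N) * theta_k, cx_k, cy_k)

    # Eq. (25.8): the remaining displacement is due to frame k+1 alone.
    def residuals(p):
        theta, cx, cy = p
        return (rotate_about(x_new, (rp / N) * theta, cx, cy) - xp).ravel()

    # Frame k's solution is a natural starting point for frame k+1.
    return least_squares(residuals, [theta_k, cx_k, cy_k], method='lm').x
```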

25.4.1.3 Final Solution

When we compute the motion of frame \( k + 1 \) or \( k - 1 \) from the motion of frame \( k \), errors in the latter may degrade the quality of the computation. To handle this problem, for each frame we compare the solution computed in the initialization step with the solution acquired in the sequential computation step. The one with the smaller value of \( E \) is chosen as the final solution.

25.4.2 Frame Rectification

To remove the rolling shutter distortion in frame \( k \), the corrected position of pixel \( {\mathbf{x}} \) is:

$$ \mathbf{x}^{cor} = f(\mathbf{x}, ((N/2 - r)/N)\,\theta(k), c_{x}(k), c_{y}(k)) $$
(25.11)

Because \( {\mathbf{x}}^{cor} \) may not fall on the image sampling grid, we perform cubic interpolation to obtain the recovered image, using the MATLAB function griddata.
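A sketch of this rectification step is given below, with SciPy's `griddata` standing in for MATLAB's; `rectify_frame` is our name, and the sketch reuses `rotate_about` from above. Scattered cubic interpolation over a full frame is slow, so this is for illustration only.

```python
import numpy as np
from scipy.interpolate import griddata

def rectify_frame(img, theta_k, cx_k, cy_k):
    """Warp frame k to the reference instant of its middle row (Eq. 25.11)."""
    H, W = img.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

    # Eq. (25.11): the per-pixel angle depends on the pixel's row r = y.
    cor = rotate_about(pts, ((H / 2 - pts[:, 1]) / H) * theta_k, cx_k, cy_k)

    # Scatter each source intensity at its corrected position, then
    # resample on the regular pixel grid by cubic interpolation.
    chans = img.reshape(H * W, -1).astype(float)
    out = np.stack([griddata(cor, chans[:, c], pts,
                             method='cubic', fill_value=0).reshape(H, W)
                    for c in range(chans.shape[1])], axis=-1)
    return out.squeeze()
```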

25.5 Experimental Analysis

In our experiments, point correspondences are computed with the optical flow algorithm proposed by Sun et al. [13]. The flow field over the whole frame is sampled uniformly to obtain the point correspondences, so the acquired correspondences cover the whole frame.
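A sketch of this sampling step follows. Since an implementation of Sun et al.'s flow is not assumed to be at hand, OpenCV's Farnebäck algorithm stands in for it; the function name and the `step` parameter are ours.

```python
import cv2
import numpy as np

def sample_correspondences(frame_a, frame_b, step=16):
    """Uniformly subsample a dense flow field into correspondences."""
    g0 = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    ys, xs = np.mgrid[step // 2:g0.shape[0]:step,
                      step // 2:g0.shape[1]:step]
    x = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    xp = x + flow[ys.ravel(), xs.ravel()]
    # The row indices are simply the y coordinates of the points.
    return x, xp, x[:, 1], xp[:, 1]
```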

25.5.1 Accuracy Comparison

We use the synthetic dataset provided by Forssén and Ringaby [9] to compare our method with their 3D models [4]. These 3D models beat previous 2D rolling shutter rectification methods on this dataset.

We use the thresholded Euclidean colour distance to compute the rectification accuracy, as in [4], with the threshold likewise set to 0.3. Figure 25.3 shows the results on sequence rot1_B0 of the dataset, in which the camera rotates in a spiral fashion. Our method achieves nearly the same result as the 3D rotation model. Figure 25.4 gives the results on sequence rot2_B40, which simulates a complex camera rotation in 3D space. Here the performance of our 2D rotation model is slightly worse than that of the 3D rotation model. In Fig. 25.5, we compare the 3D rotation model and the 3D translation model with our 2D model on sequence trans1_B40, in which the camera undergoes a pure translation. Our model is better than the 3D rotation model and comparable to the 3D translation model. In Fig. 25.6, we compare the 3D rotation model and the 3D full model with our 2D model on sequence trans_rot1_B40, which is generated by adding a translation to the sequence rot2_B40. The accuracy of our 2D model is similar to that of the 3D rotation model, and is much better than that of the 3D full model.
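For reference, here is a minimal sketch of our reading of this accuracy measure (the exact definition is given in [4]); it assumes colour values scaled to [0, 1] and reports the fraction of pixels whose colour distance to the ground-truth global-shutter frame falls below the threshold.

```python
import numpy as np

def rectification_accuracy(rectified, truth, thresh=0.3):
    """Fraction of pixels within the colour-distance threshold."""
    diff = rectified.astype(float) - truth.astype(float)
    return float((np.sqrt((diff ** 2).sum(axis=-1)) < thresh).mean())
```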

Fig. 25.3 Sequence rot1_B0. Left: the 3D rotation model with a 2-frame reconstruction window, our 2D rotation model, and uncorrected frames. Right: the 3D rotation model with a 3-frame reconstruction window, our 2D rotation model, and uncorrected frames

Fig. 25.4 Sequence rot2_B40. Left: the 3D rotation model with a 2-frame reconstruction window, our 2D rotation model, and uncorrected frames. Right: the 3D rotation model with a 3-frame reconstruction window, our 2D rotation model, and uncorrected frames

Fig. 25.5 Sequence trans1_B40. Left: the 3D rotation model with a 2-frame reconstruction window, our 2D rotation model, and uncorrected frames. Right: the 3D translation model with a 3-frame reconstruction window, our 2D rotation model, and uncorrected frames

Fig. 25.6 Sequence trans_rot1_B40. Left: the 3D rotation model with a 2-frame reconstruction window, our 2D rotation model, and uncorrected frames. Right: the 3D full model with a 2-frame reconstruction window, our 2D rotation model, and uncorrected frames

The above figures show that our 2D model works well for both rotation and translation. Even for complex camera rotation, it is comparable to the 3D method that requires camera calibration.

25.5.2 The Inter-Frame Delay

In our 2D rotation model, we simply assume the inter-frame delay is zero. In this section, we show that our method is not sensitive to the value of the inter-frame delay. Figure 25.7 compares the results obtained with the inter-frame delay set to zero against those obtained with it set to its true value; the results are nearly the same. Thus our method does not need to estimate the inter-frame delay.

Fig. 25.7 Accuracy comparison with the inter-frame delay set to 0 versus its true value of 40 rows. Top left: sequence rot1_B40. Top right: sequence rot2_B40. Bottom left: sequence trans1_B40. Bottom right: sequence trans_rot1_B40

25.6 Conclusion

In this paper, we have proposed a 2D rectification method for removing rolling shutter distortion caused by camera motion. The method approximately models the frame motion as a 2D rotation, whose parameters are solved by minimizing the measurement constraint on point correspondences. Experiments show that the method can effectively remove distortions caused by camera rotation and translation. Since it requires no information about the camera, the method can be used to process videos obtained from unknown sources.