J.-B. Chun et al.: Suppressing Rolling-Shutter Distortion of CMOS Image Sensors by Motion Vector Detection
Contributed Paper
Manuscript received August 22, 2008 0098 3063/08/$20.00 © 2008 IEEE
Suppressing Rolling-Shutter Distortion of CMOS Image Sensors
by Motion Vector Detection
Jung-Bum Chun, Hunjoon Jung, and Chong-Min Kyung, Senior Member, IEEE
Abstract - This paper addresses the rolling-shutter
distortion of CMOS image sensors, which arises from their
row-sequential readout mechanism and is a main cause of
image degradation when objects move quickly. We propose a
post-processing scheme based on motion vector detection to
suppress the rolling-shutter distortion. Motion vectors are
detected with an optical flow method at a reasonable
computational complexity. A practical implementation
scheme is also described.
Index Terms - CMOS image sensor, rolling-shutter distortion,
post-processing technique
I. INTRODUCTION
Since solid-state image sensors replaced film in the consumer
electronics market, more and more digital gadgets such as
cellular phones and PDAs have been equipped with a digital
camera function. An image sensor, which converts photons
into electrons via the photoelectric effect, consists of
photodiodes or phototransistors as the light-sensing area and
peripheral circuitry to control the signal readout. Image
sensors are classified into CCD image sensors and CMOS
image sensors (CIS). While a CCD can store charges like a
memory and transfer them by controlling gate voltages [1],
the MOSFETs in a CIS can only transfer charges from the
photodiodes to the readout circuitry without storing them.
A CCD is fabricated in a dedicated process and is generally
known to deliver better image quality than a CIS. A major
drawback of the CCD, however, is its process incompatibility
with CMOS, which makes it difficult to implement peripherals
such as the timing generator and analog-to-digital converter
on the same die as the image sensor. A CIS, on the other hand,
can be built on the same chip as its peripherals thanks to the
process compatibility, which gives the CIS an economical
advantage over the CCD.
Another important difference between CMOS and CCD lies
in the signal readout mechanism. While all photodiodes of
CCD are exposed to a scene simultaneously to obtain signals
corresponding to an image frame, each CIS row, being
sequentially accessed, is given a different exposure time
window as shown in Fig. 1 (a). We call the readout
mechanism of CIS rolling shutter (RS) mechanism and that of
CCD synchronous shutter (SS) mechanism.
Jung-Bum Chun is with the Electrical Engineering Department, Korea
Advanced Institute of Science and Technology, Daejeon, 305-701, Korea (e-
mail: jbchun@vslab.kaist.ac.kr).
Hunjoon Jung is with Clairpixel Co., Ltd., Seoul, 153-803, Korea (e-mail:
henry@clairpixel.com).
Chong-Min Kyung is with the Electrical Engineering Department, Korea
Advanced Institute of Science and Technology, Daejeon, 305-701, Korea (e-
mail: kyung@ee.kaist.ac.kr).
The RS mechanism does not cause any problem as long as
the object and the camera are stationary with respect to each
other. If either one moves relative to the other, the RS
mechanism produces distorted images as shown in Fig. 1 (b).
The three images on the left were taken while the panel was
stationary, whereas the images on the right were taken while
the panel was rotating clockwise. The distortion patterns
differ according to the direction of motion of the objects.
Fig.1 Rolling shutter mechanism. (a) The integration time (ΔI) of each row,
shown as a shaded region, slides downwards as each row is sequentially
accessed to form an image. D and b denote the timing delay between the
first and last rows and the blank time between two consecutive frames,
respectively. ΔI_M denotes the integration time when a mechanical shutter
is employed. (b) The three images on the left are taken when the panel is
stationary, while the others, showing different distortion patterns according
to the direction of motion, are taken when the panel rotates.
This paper proposes a post-processing scheme to reduce the
image distortion caused by the RS mechanism. Previous
works on the RS mechanism and known alternatives are
described in section II. A mathematical analysis of the RS
mechanism is given in section III. In section IV, the
implementation scheme for the proposed algorithm is
described. Experimental results are given in section V.
IEEE Transactions on Consumer Electronics, Vol. 54, No. 4, NOVEMBER 2008
II. PREVIOUS WORKS AND OTHER ALTERNATIVES
The RS distortion can be removed completely by using a
mechanical shutter. With a mechanical shutter, all the
photodiodes of the CIS are exposed to light during the
same time interval, denoted as ΔI_M in Fig. 1 (a),
regardless of the readout mechanism. The downsides of a
mechanical shutter are the reduced integration time
(ΔI → ΔI_M) and the additional size and cost of the
mechanical parts. Since the primary applications of CIS
are portable devices such as cellular phones and PDAs, in
which imaging is a subordinate function, such overheads
are not always justified.
On the other hand, the RS distortion can be reduced by
raising the readout speed. In Fig. 1 (a), ΔI denotes the
exposure time and D denotes the maximum access-time
difference between rows. D can be reduced by speeding up
the readout while ΔI remains unchanged. However, raising
the readout frequency becomes more difficult and consumes
more power as the number of pixels increases.
El Gamal et al. [2] described a novel CIS architecture in
which each pixel incorporates an analog-to-digital converter
to digitize the signal and a latch to store the digitized value.
This architecture enables a CIS to operate in the same way
as a CCD does. However, it is economically impractical due
to its poor fill factor and poor sensitivity.
Geyer et al. [3] proposed a new camera projection model
for cameras with the RS mechanism to mitigate the
reduction in accuracy, and described a framework for
analyzing structure-from-motion problems in RS cameras.
By parameterizing the velocity of the camera coordinate
frame, the RS effect is applied to the traditional pinhole
camera model [4] to derive the projection matrix.
Ait-Aider et al. [5] proposed a technique for pose
recovery and 3-D velocity computation by taking the RS
effect into account. They took advantage of image
deformation induced by the RS mechanism and computed
3-D poses and velocity based on rigid sets of 3-D points.
Liang et al. [6] made the first attempt to correct the RS
distortion based on motion vector detection. They found the
global motion vector by the block matching and voting
method [7]. Their block matching technique is similar to that
of MPEG-4, in which motion vector accuracy must be
sacrificed to reduce computation time. The smoothing
operation then needed to compensate for the inaccurate
motion vectors imposes an excessive computational load,
which is not acceptable in mobile devices, the major
applications of CIS.
In this paper, we utilized an optical flow method, which
usually produces more accurate motion vectors than block
matching. To reduce computation, we set center-oriented
subwindows, applied the optical flow algorithm to each, and
produced one motion vector from those outputs. For low-
power operation, motion vector detection is performed only
when an image capture takes place.
III. ROLLING SHUTTER ANALYSIS
An image sensor array with the RS mechanism is defined
in Fig. 2. The sensor has a W x H pixel array and can generate
up to F_M frames per second. We define Cartesian coordinates
whose origin is located at the top-left corner of the sensor
array. In addition, we refer to an M x N array (in units of
pixels) located at the center as the effective area, whose top-
left corner is located at (x1, y1). The effective area
corresponds to the actual image output of the sensor, while
the rest, called the margin area, is utilized by auxiliary
routines such as black-level compensation, color interpolation
and so on.
Now consider a situation where an image is taken by the
image sensor when there is a rectilinearly moving object with
velocity v. We assume that the integration time ΔI is so small
that we can ignore any motion blur in the image, and that the
motion occurs globally throughout the sensing area. We also
assume that the camera is fixed and the scene moves with a
relative velocity. It is then possible to find the distortion
patterns of the effective area with respect to the individual
components of the velocity vector v.
A. Horizontal (x-axis) Motion
A CIS is controlled row by row, and all the pixels belonging
to a row have the same exposure timing. Consider only a
horizontal motion with motion vector (vx, 0), where the
moving object is the effective area itself. If the sensor starts
reading out the pixel array from the first row after an
integration time ΔI, then it takes τH to read out the whole
array, where H is the number of rows in the array and τ is
the time spent reading out a single row.
Fig.2 Definitions of related parameters and axes; an image sensor array
consists of effective area for actual image output and margin area for
other image processing purposes
τ is either a known parameter or can be approximated
from other given parameters. When the clock frequency f is
given, τ = W/f, since reading a row requires W clock cycles.
If the frame rate F_M is given instead of f, the period of a
single image frame is 1/F_M - b, where b is the blank time
between image frames as shown in Fig. 1 (a). By equating
τH with 1/F_M - b, τ can be obtained as

\tau = \frac{W}{f} = \frac{1 - b F_M}{H F_M} \quad (1)
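As a sketch, Eq. (1) can be evaluated as follows (a hypothetical helper with our own parameter names; either the clock frequency f or the frame rate F_M may be supplied):

```python
def line_readout_time(W=None, f=None, H=None, F_M=None, b=0.0):
    """Line readout time tau, per Eq. (1): tau = W/f = (1 - b*F_M)/(H*F_M)."""
    if W is not None and f is not None:
        return W / f                         # reading one row takes W clock cycles
    if H is not None and F_M is not None:
        return (1.0 - b * F_M) / (H * F_M)   # from tau * H = 1/F_M - b
    raise ValueError("need either (W, f) or (H, F_M)")
```

For example, a sensor with H = 480 rows at F_M = 30 frames/s and zero blanking gives tau = 1/14400 s per row.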
In Fig. 3, the y-coordinate of the k-th row of the effective
area is y_1 + k - 1, and the time spent by the rolling shutter
until it reaches the row is (y_1 + k - 1)\tau. Since the row also
moves with velocity v_x, multiplying (y_1 + k - 1)\tau by v_x
yields d_{x,k}, the displacement of the k-th row in the x
direction:

d_{x,k} = v_x \tau (y_1 + k - 1) \quad (2)
Fig.3 Distortion in a horizontal motion; a rectangular object is distorted
into a parallelogram due to the RS distortion
Since d_{x,k} is the sum of a constant and a component
proportional to k, the rectangular area is distorted into a
parallelogram, where the so-called skew angle \theta formed
by the y-axis and a side of the parallelogram is a good
measure of the degree of distortion. The maximum horizontal
skew, d_{x,\max}, is defined as the difference between d_{x,k}
of the first (k = 1) and the last (k = N) rows:

d_{x,\max} = d_{x,N} - d_{x,1}
           = v_x \tau (y_1 + N - 1) - v_x \tau y_1
           = v_x \tau (N - 1) \quad (3)

\theta is thus given by

\theta = \tan^{-1} \frac{d_{x,\max}}{N - 1} = \tan^{-1}(v_x \tau) \quad (4)
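Eqs. (2)-(4) can be sketched as follows (a hypothetical helper with our own names; v_x is in pixels per second, tau in seconds per row):

```python
import math

def horizontal_distortion(v_x, tau, N, y1=1):
    """Horizontal RS distortion: per-row displacement d_{x,k} (Eq. 2),
    maximum skew d_{x,max} (Eq. 3) and skew angle theta (Eq. 4)."""
    d_x = [v_x * tau * (y1 + k - 1) for k in range(1, N + 1)]  # Eq. (2)
    d_x_max = d_x[-1] - d_x[0]           # Eq. (3): v_x * tau * (N - 1)
    theta = math.atan2(d_x_max, N - 1)   # Eq. (4): atan(v_x * tau)
    return d_x, d_x_max, theta
```

For v_x = 1000 px/s, tau = 100 us and N = 480 rows, the maximum skew is 0.1 x 479 = 47.9 pixels and theta = atan(0.1).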
B. Vertical (y-axis) Motion
Let us consider a vertical motion of the effective area with
velocity (0, v_y). To understand the vertical distortion, we
define the scan velocity v_{scan}, the number of rows read
out per second, as the inverse of \tau:

v_{scan} = \frac{1}{\tau} \quad (5)
The displacement of the k-th row of the effective area in
the vertical direction is denoted by d_{y,k}. When a capture
starts, the rolling shutter starts its readout at velocity
v_{scan}, and the k-th row of the object starts its downward
motion at velocity v_y, the number of rows traversed by the
moving object per second. The time elapsed until the rolling
shutter meets the k-th row of the effective area can be written
either as (y_1 + k - 1 + d_{y,k}) / v_{scan} or as
d_{y,k} / v_y. By equating the two expressions, d_{y,k} can
be obtained as

d_{y,k} = \frac{v_y (y_1 + k - 1)}{v_{scan} - v_y}
        = \frac{v_y \tau (y_1 + k - 1)}{1 - v_y \tau} \quad (6)
The maximum vertical stretch, denoted by d_{y,\max}, is
defined as the difference between d_{y,1} and d_{y,N}:

d_{y,\max} = d_{y,N} - d_{y,1}
           = \frac{v_y \tau (N - 1)}{1 - v_y \tau} \quad (7)
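The vertical displacement and stretch of Eqs. (6) and (7) can be sketched in the same style (our own function and parameter names; v_y is in rows per second and must satisfy v_y * tau < 1, i.e. the object moves fewer rows per second than the scan):

```python
def vertical_distortion(v_y, tau, N, y1=1):
    """Vertical RS distortion: per-row displacement d_{y,k} (Eq. 6)
    and maximum stretch d_{y,max} (Eq. 7)."""
    if v_y * tau >= 1.0:
        raise ValueError("object outruns the rolling shutter")
    d_y = [v_y * tau * (y1 + k - 1) / (1.0 - v_y * tau)
           for k in range(1, N + 1)]                 # Eq. (6)
    d_y_max = d_y[-1] - d_y[0]  # Eq. (7): v_y*tau*(N-1) / (1 - v_y*tau)
    return d_y, d_y_max
```

As the text notes, the sign of d_y_max follows the sign of v_y: positive v_y stretches the object, negative v_y shrinks it.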
Fig.4 Distortion in a vertical motion. The dotted rectangle denotes the
effective area, while the shaded rectangles denote how it appears when
v_y ≠ 0. (a) The original image appears undistorted when v_y = 0.
(b) Images in the RS system are vertically shrunk (top) when v_y < 0,
and vertically expanded (bottom) when v_y > 0.
If 1 - v_y\tau > 0, the sign of d_{y,\max} is given by that of
v_y. Vertical motion in the RS mechanism causes vertical
distortion as shown in Fig. 4: when v_y is positive, the object
is stretched by d_{y,\max}, and when v_y is negative, it is
shrunk by |d_{y,\max}|.
C. Motion Vector Composition
For general motions with nonzero values of v_x and v_y, the
distortion follows from the results of the previous sections;
Eqs. (2), (3), (6) and (7) remain valid under the same
definitions. However, the skew angle needs to be rewritten as

\theta = \tan^{-1} \frac{d_{x,\max}}{N + d_{y,\max}} \quad (8)
The analysis up to now can explain the distortion patterns in
Fig. 1. Because a vertical motion is dominant in the top and
the middle case, the object is stretched or shrunk. In the
bottom case where a horizontal motion is dominant, the
object is skewed.
When we consider only rectilinear and global motions, the
RS distortion can be represented by an affine transformation
from non-distortion space (x, y, 1) to distortion space (x’, y’,
1) as shown in Fig.5. The image in the non-distortion space
can be regarded as the result of the SS system. We can
represent the transformation as
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & a_{13} \\
                  a_{21} & a_{22} & a_{23} \\
                  0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
= A \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \quad (9)
Fig.5 Distortion by rolling-shutter imager can be represented by an
affine transformation
The same image as would be obtained by the SS system
can also be obtained in the RS system by ‘undoing’ the
distortion through the inverse transformation if the
transformation matrix A can be found.
To find matrix A in Eq. (9), instead of substituting
matching points between two spaces and solving
simultaneous equations with respect to aij, we took
advantage of well-known properties of the affine
transformation. In Eq. (9), a11 and a22 reflect scaling
factors with respect to x- and y-axis, respectively. Since
no scaling takes place in the direction of the x-axis, a11 is
unity. The vertical scaling factor a22 is given by
(N + d_{y,max}) / N, since the height of the sample is changed
from N to N + d_{y,max}. On the other hand, a12 reflects a
shearing factor with respect to the x-axis, given by
tan θ = d_{x,max} / (N + d_{y,max}), whereas a21 is zero
since there is no shearing effect in the direction of the y-axis.
Thus A is given by

A = \begin{pmatrix}
      1 & \dfrac{d_{x,\max}}{N + d_{y,\max}} & d_{x,1} \\
      0 & \dfrac{N + d_{y,\max}}{N} & d_{y,1} \\
      0 & 0 & 1
    \end{pmatrix} \quad (10)
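As a sketch of how Eq. (10) might be applied, the following hypothetical helpers (our own names; the explicit inverse is ours) build A from the distortion parameters, with unit x-scaling, vertical scaling (N + d_{y,max})/N, shear d_{x,max}/(N + d_{y,max}), and the row offsets d_{x,1}, d_{y,1} as translation entries, then undo the distortion for one point by inverting the affine map directly:

```python
def build_affine(d_x_max, d_y_max, d_x_1, d_y_1, N):
    """Affine matrix A mapping undistorted (x, y, 1) to distorted (x', y', 1)."""
    a12 = d_x_max / (N + d_y_max)    # shear along x
    a22 = (N + d_y_max) / N          # vertical scaling
    return [[1.0, a12, d_x_1],
            [0.0, a22, d_y_1],
            [0.0, 0.0, 1.0]]

def undistort_point(A, xp, yp):
    """Recover the undistorted (x, y) from a distorted (x', y')."""
    y = (yp - A[1][2]) / A[1][1]     # invert y' = a22*y + a23
    x = xp - A[0][1] * y - A[0][2]   # invert x' = x + a12*y + a13
    return x, y
```

Because A is affine with a zero a21, the inverse can be written in closed form instead of calling a general matrix-inversion routine, which suits a low-power implementation.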
The variables can be evaluated by Eq. (3) and (7) along with
known sensor parameters such as N and τ if motion vector
(vx, vy) is acquired.
IV. IMPLEMENTATION SCHEME TO REDUCE ROLLING-
SHUTTER DISTORTIONS
To reduce the complexity of the implementation, three
assumptions were made, shown below along with the
rationale for each.
- Motions are rectilinear: any general motion can be
approximated by a combination of rectilinear ones over a
sufficiently short period of time.
- Motion blur is below a certain level: motion blur, which
accompanies any motion, is affected by exposure time and
motion velocity. For a short exposure time, motion blur can
be assumed to be negligible.
- There is only a global motion in an image: distortions due
to partial motions are less apparent than those due to a
global motion, so partial motions are ignored.
Under these assumptions, we propose a post-processing
routine as shown in Fig. 6. The routine receives an RGB
image as input and generates an RGB output image. As a
whole, the routine consists of a motion detection stage and a
transformation stage, and operates in either preview mode or
capture mode. In the preview mode, the sensor generates
low-resolution video streams before a capture takes place. In
the capture mode, high-resolution target image data are
generated by the sensor after a capture signal is issued.
Motion vectors, extracted from consecutive images by the
motion detection stage in the preview mode, are used in the
transformation stage to adjust the image. To reduce power
consumption, only two image frames after each capture
signal are used in the motion detection, which is
accomplished by delaying the mode switching until two
preview images are saved after the capture.
A. Motion Vector Detection Stage
Motion vector detection is an important topic in image
processing and computer vision. Applications such as video
compression, video mosaicing and video surveillance are
among its major uses, and they employ block matching
techniques or optical flow methods for motion vector
detection. On the other hand, image stabilization techniques
[8], [9] are used in video cameras to compensate for image
deterioration due to unintended shaking of the user's hands.
Their motion vector detection and image stabilization steps
correspond to the first and second stages of our approach
shown in Fig. 6. To find global motion vectors, Oshima et al.
[8] utilize a specialized gyro sensor, while Kinugasa et al. [9]
derive motion vectors from consecutive images by projecting
the images onto the x- and y-axes and comparing the current
projection with the previous one. [9] is similar to our
approach in utilizing consecutive images, but its one-
dimensional motion vector detection is less accurate than
typical two-dimensional techniques.
Most known methods for finding motion vectors are based
on consecutive images. The most straightforward way to find
a motion vector from two consecutive images is to perform
an exhaustive search over all possible relative motions
between the two frames, choosing the one for which the sum
of errors between the two images is minimal. The
computational complexity becomes O(M²N²) when the size
of the image is M x N.
Fig.6 Block diagram of the proposed system
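The exhaustive search described above can be illustrated with a toy pure-Python sketch (names are our own; it scores every candidate integer shift, which is what makes the cost O(M²N²)):

```python
def exhaustive_motion(F, G, max_shift):
    """Brute-force shift estimate: return the integer shift (hx, hy)
    minimizing the mean squared error between F shifted by h and G,
    evaluated over the overlapping region."""
    rows, cols = len(F), len(F[0])
    best, best_h = float("inf"), (0, 0)
    for hy in range(-max_shift, max_shift + 1):
        for hx in range(-max_shift, max_shift + 1):
            err = n = 0
            for y in range(rows):
                for x in range(cols):
                    ys, xs = y + hy, x + hx
                    if 0 <= ys < rows and 0 <= xs < cols:
                        err += (F[ys][xs] - G[y][x]) ** 2
                        n += 1
            if n and err / n < best:
                best, best_h = err / n, (hx, hy)
    return best_h
```

Normalizing by the overlap size n (rather than taking the raw sum of the text) keeps large shifts with small overlaps from being trivially favored; that choice is ours.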
Various methods to reduce the complexity have been
published [10]-[13]. The most popular one is the Lucas-
Kanade (LK) algorithm [12], on which our motion detection
method is based. In the LK algorithm, for two images F(X)
and G(X) with X = (x, y), the error function E(h) is defined as

E(h) = \sum_X [F(X + h) - G(X)]^2 \quad (11)

where h is the 2-D shift.
Motion vector detection between F and G amounts to finding
the h that minimizes the error function. Using the linear
approximation F(X + h) \approx F(X) + h F'(X), E(h) is
differentiated with respect to h and set to zero:

\frac{\partial E}{\partial h}
\approx \frac{\partial}{\partial h}
        \sum_X [F(X) + h F'(X) - G(X)]^2
= \sum_X 2 F'(X) [F(X) + h F'(X) - G(X)] = 0 \quad (12)
Eq. (12) can be solved with respect to h. F(X) is then
shifted by h and the same procedure is iterated until the
update to h falls below a target value. In this way, the
complexity is reduced to O(MN log MN). Because the
proposed routine is expected to run on mobile devices,
whose computing power is lower than that of stationary
systems, the complexity at the LK stage of the proposed
routine needs to be reduced further.
In Fig. 6, the RGB2Gray block, the first stage of motion
vector detection