A Flexible New Technique for Camera
Calibration
Zhengyou Zhang
December 2, 1998
(updated on December 14, 1998)
(updated on March 25, 1999)
(last updated on Aug. 10, 2002; a typo in Appendix B)
Technical Report
MSR-TR-98-71
Microsoft Research
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
zhang@microsoft.com
http://research.microsoft.com/~zhang
Contents

1 Motivations
2 Basic Equations
  2.1 Notation
  2.2 Homography between the model plane and its image
  2.3 Constraints on the intrinsic parameters
  2.4 Geometric Interpretation†
3 Solving Camera Calibration
  3.1 Closed-form solution
  3.2 Maximum likelihood estimation
  3.3 Dealing with radial distortion
  3.4 Summary
4 Degenerate Configurations
5 Experimental Results
  5.1 Computer Simulations
  5.2 Real Data
  5.3 Sensitivity with Respect to Model Imprecision‡
    5.3.1 Random noise in the model points
    5.3.2 Systematic non-planarity of the model pattern
6 Conclusion
A Estimation of the Homography Between the Model Plane and its Image
B Extraction of the Intrinsic Parameters from Matrix B
C Approximating a 3×3 Matrix by a Rotation Matrix
D Camera Calibration Under Known Pure Translation§
†added on December 14, 1998
‡added on December 28, 1998; added results on systematic non-planarity on March 25, 1999
§added on December 14, 1998, corrected (based on the comments from Andrew Zisserman) on January 7, 1999
Abstract
We propose a flexible new technique to easily calibrate a camera. It is well suited for use
without specialized knowledge of 3D geometry or computer vision. The technique only requires
the camera to observe a planar pattern shown at a few (at least two) different orientations. Either
the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens
distortion is modeled. The proposed procedure consists of a closed-form solution, followed by a
nonlinear refinement based on the maximum likelihood criterion. Both computer simulation and
real data have been used to test the proposed technique, and very good results have been obtained.
Compared with classical techniques which use expensive equipment such as two or three orthog-
onal planes, the proposed technique is easy to use and flexible. It advances 3D computer vision
one step from laboratory environments to real world use.
Index Terms— Camera calibration, calibration from planes, 2D pattern, absolute conic, projective
mapping, lens distortion, closed-form solution, maximum likelihood estimation, flexible setup.
1 Motivations
Camera calibration is a necessary step in 3D computer vision in order to extract metric information
from 2D images. Much work has been done, starting in the photogrammetry community (see [2,
4] to cite a few), and more recently in computer vision ([9, 8, 23, 7, 26, 24, 17, 6] to cite a few).
We can classify those techniques roughly into two categories: photogrammetric calibration and self-
calibration.
Photogrammetric calibration. Camera calibration is performed by observing a calibration object
whose geometry in 3-D space is known with very good precision. Calibration can be done very
efficiently [5]. The calibration object usually consists of two or three planes orthogonal to each
other. Sometimes, a plane undergoing a precisely known translation is also used [23]. These
approaches require an expensive calibration apparatus, and an elaborate setup.
Self-calibration. Techniques in this category do not use any calibration object. Just by moving a
camera in a static scene, the rigidity of the scene provides in general two constraints [17, 15]
on the cameras’ internal parameters from one camera displacement by using image informa-
tion alone. Therefore, if images are taken by the same camera with fixed internal parameters,
correspondences between three images are sufficient to recover both the internal and external
parameters which allow us to reconstruct 3-D structure up to a similarity [16, 13]. While this ap-
proach is very flexible, it is not yet mature [1]. Because there are many parameters to estimate,
we cannot always obtain reliable results.
Other techniques exist: vanishing points for orthogonal directions [3, 14], and calibration from pure
rotation [11, 21].
Our current research is focused on a desktop vision system (DVS) since the potential for using
DVSs is large. Cameras are becoming cheap and ubiquitous. A DVS aims at the general public,
who are not experts in computer vision. A typical computer user will perform vision tasks only from
time to time, so will not be willing to invest money for expensive equipment. Therefore, flexibility,
robustness and low cost are important. The camera calibration technique described in this paper was
developed with these considerations in mind.
The proposed technique only requires the camera to observe a planar pattern shown at a few (at
least two) different orientations. The pattern can be printed on a laser printer and attached to a “rea-
sonable” planar surface (e.g., a hard book cover). Either the camera or the planar pattern can be moved
by hand. The motion need not be known. The proposed approach lies between the photogrammet-
ric calibration and self-calibration, because we use 2D metric information rather than 3D or purely
implicit one. Both computer simulation and real data have been used to test the proposed technique,
and very good results have been obtained. Compared with classical techniques, the proposed tech-
nique is considerably more flexible. Compared with self-calibration, it gains considerable degree of
robustness. We believe the new technique advances 3D computer vision one step from laboratory
environments to the real world.
Note that Bill Triggs [22] recently developed a self-calibration technique from at least 5 views of
a planar scene. His technique is more flexible than ours, but is difficult to initialize. Liebowitz and
Zisserman [14] described a technique of metric rectification for perspective images of planes using
metric information such as a known angle, two equal though unknown angles, and a known length
ratio. They also mentioned that calibration of the internal camera parameters is possible provided at
least three such rectified planes, although no experimental results were shown.
The paper is organized as follows. Section 2 describes the basic constraints from observing a
single plane. Section 3 describes the calibration procedure. We start with a closed-form solution,
followed by nonlinear optimization. Radial lens distortion is also modeled. Section 4 studies con-
figurations in which the proposed calibration technique fails. It is very easy to avoid such situations
in practice. Section 5 provides the experimental results. Both computer simulation and real data are
used to validate the proposed technique. In the Appendix, we provide a number of details, including
the techniques for estimating the homography between the model plane and its image.
2 Basic Equations
We examine the constraints on the camera’s intrinsic parameters provided by observing a single plane.
We start with the notation used in this paper.
2.1 Notation
A 2D point is denoted by $m = [u, v]^T$. A 3D point is denoted by $M = [X, Y, Z]^T$. We use $\tilde{x}$ to denote the augmented vector obtained by adding 1 as the last element: $\tilde{m} = [u, v, 1]^T$ and $\tilde{M} = [X, Y, Z, 1]^T$. A camera
is modeled by the usual pinhole: the relationship between a 3D point M and its image projection m is
given by
$$ s\,\tilde{m} = A\,[R \;\; t]\,\tilde{M}, \qquad (1) $$
where s is an arbitrary scale factor, (R, t), called the extrinsic parameters, is the rotation and trans-
lation which relates the world coordinate system to the camera coordinate system, and A, called the
camera intrinsic matrix, is given by
$$ A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} $$
with (u0, v0) the coordinates of the principal point, α and β the scale factors in image u and v axes,
and γ the parameter describing the skewness of the two image axes.
We use the abbreviation $A^{-T}$ for $(A^{-1})^T$ or $(A^T)^{-1}$.
2.2 Homography between the model plane and its image
Without loss of generality, we assume the model plane is on Z = 0 of the world coordinate system.
Let’s denote the ith column of the rotation matrix R by ri. From (1), we have
$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A\,[r_1 \;\; r_2 \;\; r_3 \;\; t] \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = A\,[r_1 \;\; r_2 \;\; t] \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}. $$
By abuse of notation, we still use M to denote a point on the model plane, but $M = [X, Y]^T$ since Z is always equal to 0. In turn, $\tilde{M} = [X, Y, 1]^T$. Therefore, a model point M and its image m are related by a homography H:
$$ s\,\tilde{m} = H \tilde{M} \quad \text{with} \quad H = A\,[r_1 \;\; r_2 \;\; t]. \qquad (2) $$
As is clear, the 3×3 matrix H is defined up to a scale factor.
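The homography in (2) is estimated from point correspondences (Appendix A). As a rough illustration, the standard linear (DLT) estimate can be sketched in Python with NumPy; the maximum-likelihood refinement described in the appendix is omitted, and all numeric values in the check below are invented for the example:

```python
import numpy as np

def estimate_homography(M, m):
    """Linear (DLT) estimate of H in s*m~ = H*M~ from point pairs.

    M: (n, 2) model-plane points, m: (n, 2) image points, n >= 4.
    Appendix A of the paper additionally refines such an estimate with a
    maximum-likelihood criterion; that step is omitted here.
    """
    rows = []
    for (X, Y), (u, v) in zip(M, m):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    L = np.asarray(rows, dtype=float)
    # Right singular vector with the smallest singular value.
    h = np.linalg.svd(L)[2][-1]
    return h.reshape(3, 3)

# Quick check: project points with a known H, then recover it up to scale.
H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [0.001, 0.002, 1.0]])
M = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [0.5, 1.5]], dtype=float)
q = (H_true @ np.column_stack([M, np.ones(len(M))]).T).T
m = q[:, :2] / q[:, 2:3]
H_est = estimate_homography(M, m)
H_est /= H_est[2, 2]  # remove the arbitrary scale (and sign)
```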
2.3 Constraints on the intrinsic parameters
Given an image of the model plane, a homography can be estimated (see Appendix A). Let's denote it by $H = [h_1 \;\; h_2 \;\; h_3]$. From (2), we have
$$ [h_1 \;\; h_2 \;\; h_3] = \lambda A\,[r_1 \;\; r_2 \;\; t], $$
where λ is an arbitrary scalar. Using the knowledge that r1 and r2 are orthonormal, we have
$$ h_1^T A^{-T} A^{-1} h_2 = 0 \qquad (3) $$
$$ h_1^T A^{-T} A^{-1} h_1 = h_2^T A^{-T} A^{-1} h_2. \qquad (4) $$
These are the two basic constraints on the intrinsic parameters, given one homography. Because a
homography has 8 degrees of freedom and there are 6 extrinsic parameters (3 for rotation and 3 for
translation), we can only obtain 2 constraints on the intrinsic parameters. Note that $A^{-T}A^{-1}$ actually describes the image of the absolute conic [16]. In the next subsection, we will give a geometric interpretation.
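To make the two constraints concrete, the following sketch (Python/NumPy, with invented intrinsic and extrinsic values) builds a noise-free homography from equation (2) and verifies that (3) and (4) hold numerically:

```python
import numpy as np

# Illustrative intrinsics (alpha, beta, gamma, u0, v0) -- values made up.
A = np.array([[1000.0, 0.2, 320.0],
              [0.0, 1010.0, 240.0],
              [0.0, 0.0, 1.0]])

def rodrigues(r):
    # Rotation matrix from an axis-angle 3-vector (Rodrigues' formula).
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

R = rodrigues(np.array([0.1, -0.2, 0.05]))
t = np.array([0.3, -0.1, 2.0])

# Homography induced by the model plane Z = 0, Eq. (2), noise-free.
H = A @ np.column_stack([R[:, 0], R[:, 1], t])

# Image of the absolute conic.
B = np.linalg.inv(A).T @ np.linalg.inv(A)
h1, h2 = H[:, 0], H[:, 1]

c1 = h1 @ B @ h2                  # Eq. (3): should vanish
c2 = h1 @ B @ h1 - h2 @ B @ h2    # Eq. (4): should vanish
```

Both residuals reduce to `r1.T @ r2` and `r1.T @ r1 - r2.T @ r2`, so they are zero up to floating-point error.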
2.4 Geometric Interpretation
We are now relating (3) and (4) to the absolute conic.
It is not difficult to verify that the model plane, under our convention, is described in the camera
coordinate system by the following equation:
$$ \begin{bmatrix} r_3 \\ r_3^T t \end{bmatrix}^T \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = 0, $$
where w = 0 for points at infinity and w = 1 otherwise. This plane intersects the plane at infinity in a line, and we can easily see that $\begin{bmatrix} r_1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} r_2 \\ 0 \end{bmatrix}$ are two particular points on that line. Any point on it
is a linear combination of these two points, i.e.,
$$ x_\infty = a \begin{bmatrix} r_1 \\ 0 \end{bmatrix} + b \begin{bmatrix} r_2 \\ 0 \end{bmatrix} = \begin{bmatrix} a r_1 + b r_2 \\ 0 \end{bmatrix}. $$
Now, let’s compute the intersection of the above line with the absolute conic. By definition, the
point $x_\infty$, known as the circular point, satisfies $x_\infty^T x_\infty = 0$, i.e.,
$$ (a r_1 + b r_2)^T (a r_1 + b r_2) = 0, \quad \text{or} \quad a^2 + b^2 = 0. $$
The solution is $b = \pm a i$, where $i^2 = -1$. That is, the two intersection points are
$$ x_\infty = a \begin{bmatrix} r_1 \pm i r_2 \\ 0 \end{bmatrix}. $$
Their projection in the image plane is then given, up to a scale factor, by
$$ \tilde{m}_\infty = A (r_1 \pm i r_2) = h_1 \pm i h_2. $$
Point $\tilde{m}_\infty$ is on the image of the absolute conic, described by $A^{-T}A^{-1}$ [16]. This gives
$$ (h_1 \pm i h_2)^T A^{-T} A^{-1} (h_1 \pm i h_2) = 0. $$
Requiring that both real and imaginary parts be zero yields (3) and (4).
3 Solving Camera Calibration
This section provides the details of how to effectively solve the camera calibration problem. We start
with an analytical solution, followed by a nonlinear optimization technique based on the maximum
likelihood criterion. Finally, we take into account lens distortion, giving both analytical and nonlinear
solutions.
3.1 Closed-form solution
Let
$$ B = A^{-T}A^{-1} \equiv \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\alpha^2} & -\dfrac{\gamma}{\alpha^2\beta} & \dfrac{v_0\gamma - u_0\beta}{\alpha^2\beta} \\[6pt] -\dfrac{\gamma}{\alpha^2\beta} & \dfrac{\gamma^2}{\alpha^2\beta^2} + \dfrac{1}{\beta^2} & -\dfrac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \dfrac{v_0}{\beta^2} \\[6pt] \dfrac{v_0\gamma - u_0\beta}{\alpha^2\beta} & -\dfrac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \dfrac{v_0}{\beta^2} & \dfrac{(v_0\gamma - u_0\beta)^2}{\alpha^2\beta^2} + \dfrac{v_0^2}{\beta^2} + 1 \end{bmatrix}. \qquad (5) $$
Note that B is symmetric, defined by a 6D vector
$$ b = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^T. \qquad (6) $$
Let the $i$th column vector of H be $h_i = [h_{i1}, h_{i2}, h_{i3}]^T$. Then, we have
$$ h_i^T B h_j = v_{ij}^T b \qquad (7) $$
with
$$ v_{ij} = [h_{i1}h_{j1},\; h_{i1}h_{j2} + h_{i2}h_{j1},\; h_{i2}h_{j2},\; h_{i3}h_{j1} + h_{i1}h_{j3},\; h_{i3}h_{j2} + h_{i2}h_{j3},\; h_{i3}h_{j3}]^T. $$
Therefore, the two fundamental constraints (3) and (4), from a given homography, can be rewritten as
2 homogeneous equations in b:
$$ \begin{bmatrix} v_{12}^T \\ (v_{11} - v_{22})^T \end{bmatrix} b = 0. \qquad (8) $$
If n images of the model plane are observed, by stacking n such equations as (8) we have
$$ V b = 0, \qquad (9) $$
where V is a $2n \times 6$ matrix. If $n \geq 3$, we will have in general a unique solution b defined up to a scale
factor. If n = 2, we can impose the skewless constraint γ = 0, i.e., [0, 1, 0, 0, 0, 0]b = 0, which is
added as an additional equation to (9). (If n = 1, we can only solve two camera intrinsic parameters,
e.g., α and β, assuming u0 and v0 are known (e.g., at the image center) and γ = 0, and that is indeed
what we did in [19] for head pose determination based on the fact that eyes and mouth are reasonably
coplanar.) The solution to (9) is well known as the eigenvector of VTV associated with the smallest
eigenvalue (equivalently, the right singular vector of V associated with the smallest singular value).
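The construction of V and the SVD-based solution of (9) can be sketched as follows; the homographies here are synthetic and noise-free, generated from an invented intrinsic matrix, so the recovered b should match the true $A^{-T}A^{-1}$ up to scale:

```python
import numpy as np

def v_ij(H, i, j):
    # v_ij of the paper, built from columns i and j of H (0-based here).
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def solve_b(Hs):
    # Stack the two rows of (8) for every homography and take the right
    # singular vector associated with the smallest singular value of V.
    V = np.array([row for H in Hs
                  for row in (v_ij(H, 0, 1), v_ij(H, 0, 0) - v_ij(H, 1, 1))])
    return np.linalg.svd(V)[2][-1]

# Noise-free check with three synthetic views (all values invented).
def rot(r):
    th = np.linalg.norm(r)
    k = r / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

A = np.array([[800.0, 0.5, 300.0], [0.0, 820.0, 250.0], [0.0, 0.0, 1.0]])
Hs = []
for r in ([0.2, 0.1, 0.05], [-0.1, 0.3, 0.1], [0.05, -0.25, 0.2]):
    Rm = rot(np.array(r))
    Hs.append(A @ np.column_stack([Rm[:, 0], Rm[:, 1], [0.1, 0.2, 2.0]]))

b = solve_b(Hs)
Bt = np.linalg.inv(A).T @ np.linalg.inv(A)
bt = np.array([Bt[0, 0], Bt[0, 1], Bt[1, 1], Bt[0, 2], Bt[1, 2], Bt[2, 2]])
bt /= np.linalg.norm(bt)
err = min(np.linalg.norm(b - bt), np.linalg.norm(b + bt))  # sign ambiguity
```

The three rotations are deliberately about different axes; as Section 4 explains, views related by a pure translation or by planes in parallel positions would make V rank-deficient.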
Once b is estimated, we can compute the camera intrinsic matrix A. See Appendix B for the
details.
Once A is known, the extrinsic parameters for each image are readily computed. From (2), we have
$$ r_1 = \lambda A^{-1} h_1, \qquad r_2 = \lambda A^{-1} h_2, \qquad r_3 = r_1 \times r_2, \qquad t = \lambda A^{-1} h_3 $$
with $\lambda = 1/\|A^{-1}h_1\| = 1/\|A^{-1}h_2\|$. Of course, because of noise in the data, the so-computed matrix
R = [r1, r2, r3] does not in general satisfy the properties of a rotation matrix. Appendix C describes
a method to estimate the best rotation matrix from a general 3× 3 matrix.
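The extraction of (R, t) from A and H, together with the SVD-based orthogonalization of Appendix C, might look as follows (a sketch with invented values; the positive sign of λ assumes the model plane lies in front of the camera):

```python
import numpy as np

def extrinsics_from_homography(A, H):
    Ainv = np.linalg.inv(A)
    lam = 1.0 / np.linalg.norm(Ainv @ H[:, 0])  # assumes the positive root
    r1 = lam * (Ainv @ H[:, 0])
    r2 = lam * (Ainv @ H[:, 1])
    r3 = np.cross(r1, r2)
    t = lam * (Ainv @ H[:, 2])
    # Appendix C: the rotation closest to [r1 r2 r3] in Frobenius norm
    # is U @ Vt, from the SVD of the estimated matrix.
    U, _, Vt = np.linalg.svd(np.column_stack([r1, r2, r3]))
    return U @ Vt, t

# Round-trip check with one synthetic view (illustrative values).
def rot(r):
    th = np.linalg.norm(r)
    k = r / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

A = np.array([[900.0, 0.3, 310.0], [0.0, 910.0, 260.0], [0.0, 0.0, 1.0]])
R_true = rot(np.array([0.15, -0.2, 0.1]))
t_true = np.array([0.2, -0.3, 3.0])
H = A @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = extrinsics_from_homography(A, H)
```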
3.2 Maximum likelihood estimation
The above solution is obtained through minimizing an algebraic distance which is not physically
meaningful. We can refine it through maximum likelihood inference.
We are given n images of a model plane and there are m points on the model plane. Assume
that the image points are corrupted by independent and identically distributed noise. The maximum
likelihood estimate can be obtained by minimizing the following functional:
$$ \sum_{i=1}^{n} \sum_{j=1}^{m} \| m_{ij} - \hat{m}(A, R_i, t_i, M_j) \|^2, \qquad (10) $$
where $\hat{m}(A, R_i, t_i, M_j)$ is the projection of point $M_j$ in image $i$, according to equation (2). A rotation
R is parameterized by a vector of 3 parameters, denoted by r, which is parallel to the rotation axis
and whose magnitude is equal to the rotation angle. R and r are related by the Rodrigues formula [5].
Minimizing (10) is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt
Algorithm as implemented in Minpack [18]. It requires an initial guess of A, {Ri, ti|i = 1..n}
which can be obtained using the technique described in the previous subsection.
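A minimal stand-in for this refinement, using a hand-rolled Gauss-Newton loop instead of the Minpack Levenberg-Marquardt routine, and refining only the five intrinsics of a single synthetic view with known extrinsics to keep the sketch short (all numeric values are invented):

```python
import numpy as np

def project(p, R, t, M):
    # p = [alpha, beta, gamma, u0, v0]; M is (m, 2) planar model points.
    a, b, g, u0, v0 = p
    A = np.array([[a, g, u0], [0.0, b, v0], [0.0, 0.0, 1.0]])
    P = A @ np.column_stack([R[:, :2], t])  # homography of Eq. (2)
    q = (P @ np.column_stack([M, np.ones(len(M))]).T).T
    return q[:, :2] / q[:, 2:3]

def refine(p, R, t, M, m_obs, iters=5):
    # Gauss-Newton on the reprojection error of (10), numerical Jacobian.
    p = np.asarray(p, dtype=float)
    for _ in range(iters):
        r = (project(p, R, t, M) - m_obs).ravel()
        J = np.empty((r.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = 1e-6 * max(1.0, abs(p[k]))
            J[:, k] = ((project(p + dp, R, t, M) - m_obs).ravel() - r) / dp[k]
        p = p - np.linalg.lstsq(J, r, rcond=None)[0]
    return p

# Synthetic single view: identity rotation, plane 2 units away (made up).
M = np.array([[x, y] for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)])
p_true = np.array([1000.0, 1000.0, 0.0, 320.0, 240.0])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
m_obs = project(p_true, R, t, M)
p_est = refine(np.array([980.0, 1030.0, 2.0, 300.0, 260.0]), R, t, M, m_obs)
```

In the full problem the parameter vector also contains the axis-angle vector $r_i$ and translation $t_i$ of every view, and a Levenberg-Marquardt damping term makes the iteration robust far from the optimum.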
3.3 Dealing with radial distortion
Up to now, we have not considered lens distortion of a camera. However, a desktop camera usually
exhibits significant lens distortion, especially radial distortion. In this section, we only consider the
first two terms of radial distortion. The reader is referred to [20, 2, 4, 26] for more elaborate models.
Based on the reports in the literature [2, 23, 25], the distortion function is likely to be totally
dominated by the radial components, and especially by the first term. It has also been found that more
elaborate modeling not only does not help (its effect is negligible compared with sensor quantization),
but can also cause numerical instability [23, 25].
Let (u, v) be the ideal (nonobservable distortion-free) pixel image coordinates, and (u˘, v˘) the
corresponding real observed image coordinates. The ideal points are the projection of the model
points according to the pinhole model. Similarly, (x, y) and (x˘, y˘) are the ideal (distortion-free) and
real (distorted) normalized image coordinates. We have [2, 25]
$$ \breve{x} = x + x\,[k_1(x^2+y^2) + k_2(x^2+y^2)^2] $$
$$ \breve{y} = y + y\,[k_1(x^2+y^2) + k_2(x^2+y^2)^2], $$
where $k_1$ and $k_2$ are the coefficients of the radial distortion. The center of the radial distortion is the same as the principal point. From $\breve{u} = u_0 + \alpha \breve{x} + \gamma \breve{y}$ and $\breve{v} = v_0 + \beta \breve{y}$, we have
$$ \breve{u} = u + (u - u_0)[k_1(x^2+y^2) + k_2(x^2+y^2)^2] \qquad (11) $$
$$ \breve{v} = v + (v - v_0)[k_1(x^2+y^2) + k_2(x^2+y^2)^2]. \qquad (12) $$
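Applying (11) and (12) to ideal projections is straightforward; a small sketch with invented values (skew set to zero, so that $u = u_0 + \alpha x$ and $v = v_0 + \beta y$):

```python
import numpy as np

def apply_distortion(u0, v0, k1, k2, uv, xy):
    # uv: ideal pixel coordinates, xy: ideal normalized coordinates of
    # the same points (both (n, 2)). Implements Eqs. (11) and (12).
    r2 = xy[:, 0] ** 2 + xy[:, 1] ** 2
    d = k1 * r2 + k2 * r2 ** 2
    return np.column_stack([uv[:, 0] + (uv[:, 0] - u0) * d,
                            uv[:, 1] + (uv[:, 1] - v0) * d])

# One invented point with alpha = beta = 1000, gamma = 0, so that
# (x, y) = (0.1, 0.2) projects ideally to (u, v) = (420, 440).
uv = np.array([[420.0, 440.0]])
xy = np.array([[0.1, 0.2]])
uv_d = apply_distortion(320.0, 240.0, -0.2, 0.05, uv, xy)
# uv_d -> [[419.0125, 438.025]]  (barrel distortion pulls toward (u0, v0))
```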
Estimating Radial Distortion by Alternation. As the radial distortion is expected to be small, one
would expect to estimate the other five intrinsic parameters, using the technique described in Sect. 3.2,
reasonably well by simply ignoring distortion. One strategy is then to estimate k1 and k2 after having
estimated the other parameters, which will give us the ideal pixel coordinates (u, v). Then, from (11)
and (12), we have two equations for each point in each image:
$$ \begin{bmatrix} (u-u_0)(x^2+y^2) & (u-u_0)(x^2+y^2)^2 \\ (v-v_0)(x^2+y^2) & (v-v_0)(x^2+y^2)^2 \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} \breve{u}-u \\ \breve{v}-v \end{bmatrix}. $$
Given m points in n images, we can stack all equations together to obtain in total 2mn equations, or
in matrix form as $Dk = d$, where $k = [k_1, k_2]^T$. The linear least-squares solution is given by
$$ k = (D^T D)^{-1} D^T d. \qquad (13) $$
Once k1 and k2 are estimated, one can refine the estimate of the other parameters by solving (10) with
$\hat{m}(A, R_i, t_i, M_j)$ replaced by (11) and (12). We can alternate these two procedures until convergence.
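The stacking of D and d and the least-squares solution (13) can be sketched as follows; `np.linalg.lstsq` is used rather than forming $(D^T D)^{-1}$ explicitly, which is equivalent here and numerically preferable. All values are invented, and in the noise-free case the recovery is exact:

```python
import numpy as np

def estimate_k(uv_ideal, uv_obs, xy, u0, v0):
    # Builds D and d from Eqs. (11)-(12), one pair of rows per point,
    # and solves Dk = d in the least-squares sense, Eq. (13).
    r2 = xy[:, 0] ** 2 + xy[:, 1] ** 2
    du = uv_ideal[:, 0] - u0
    dv = uv_ideal[:, 1] - v0
    D = np.vstack([np.column_stack([du * r2, du * r2 ** 2]),
                   np.column_stack([dv * r2, dv * r2 ** 2])])
    d = np.concatenate([uv_obs[:, 0] - uv_ideal[:, 0],
                        uv_obs[:, 1] - uv_ideal[:, 1]])
    k, *_ = np.linalg.lstsq(D, d, rcond=None)
    return k

# Synthetic check: distort ideal points with known k1, k2, then recover.
u0, v0, alpha = 320.0, 240.0, 800.0
xy = np.array([[x, y] for x in (-0.3, 0.0, 0.25) for y in (-0.2, 0.15, 0.3)])
uv = np.column_stack([u0 + alpha * xy[:, 0], v0 + alpha * xy[:, 1]])
k1, k2 = -0.25, 0.08
r2 = (xy ** 2).sum(axis=1)
dfac = k1 * r2 + k2 * r2 ** 2
uv_obs = uv + (uv - [u0, v0]) * dfac[:, None]   # Eqs. (11)-(12)
k_est = estimate_k(uv, uv_obs, xy, u0, v0)
```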
Complete Maximum Likelihood Estimation. Experimentally, we found the convergence of the
above alternation technique is slow. A natural extension to (10) is then to estimate the complete set of
parameters by minimizing the following functional:
$$ \sum_{i=1}^{n} \sum_{j=1}^{m} \| m_{ij} - \breve{m}(A, k_1, k_2, R_i, t_i, M_j) \|^2, \qquad (14) $$
where $\breve{m}(A, k_1, k_2, R_i, t_i, M_j)$ is the projection of point $M_j$ in image $i$ according to equation (2),
followed by distortion according to (11) and (12). This is a nonlinear minimization problem, which
is solved with the Levenberg-Marquardt Algorithm as implemented in Minpack [18]. A rotation is
again parameterized by a 3-vector r, as in Sect. 3.2. An initial guess of A and {Ri, ti|i = 1..n} can
be obtained using the technique described in Sect. 3.1 or in Sect. 3.2. An initial guess of k1 and k2 can
be obtained with the technique described in the last paragraph, or simply by setting them to 0.
3.4 Summary
The recommended calibration procedure is as follows:
1. Print a pattern and attach it to a planar surface;
2. Take a few images of the model plane under different orientations by moving either the plane
or the camera;
3. Detect the feature points in the images;
4. Estimate the five intrinsic parameters and all the extrinsic parameters using the closed-form
solution as described in Sect. 3.1;
5. Estimate the coefficients of the radial distortion by solving the linear least-squares (13);
6. Refine all parameters by minimizing (14).
4 Degenerate Configurations
We study in this section configurations in which additional images do not provide more constraints on
the camera intrinsic parameters. Because (3) and (4) are derived from the properties of the rotation
matrix, if R2 is not independent of R1, then image 2 does not provide additional constraints. In
particular, if a plane undergoes a pure translation, then R2 = R1 and image 2 is not helpful for
camera calibration. In the following, we consider a more complex configuration.
Proposition 1. If the model plane at the second position is parallel to its first position, then the second
homography does not provide additional constraints.
Proof. Under our convention, $R_2$ and $R_1$ are related by a rotation around the z-axis. That is,
$$ R_1 \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = R_2, $$
where θ is the angle of the relative rotation. We will use superscript (1) and (2) to denote vectors
related to image 1 and 2,