
Face pose estimation and its application in video shot selection


Zhiguang YANG 1, Haizhou AI 1, Bo WU 1, Shihong LAO 2 and Lianhong CAI 1
1 Computer Science and Technology Department, Tsinghua University, Beijing, 100084, China
2 Sensing Technology Laboratory, Omron Corporation
E-mail: ahz@mail.tsinghua.edu.cn

Abstract

This paper introduces a face pose estimation method and its application to video shot selection for face image preprocessing. The pose estimator is learned by a boosting regression algorithm called SquareLev.R [1], which learns poses from simple Haar-type features. It consists of two tree-structured subsystems, one for the left-right angle and one for the up-down angle. As a specific application in video-based face recognition, the best shot selection problem is discussed, resulting in a real-time system that automatically selects the most frontal face from a video sequence.

1. Introduction

Face pose estimation (PE) predicts the 3D orientation of the human head, that is, its rotation-in-plane (RIP) and rotation-out-of-plane (ROP) angles. In this paper we discuss only a simplified version restricted to the left-right and up-down angles. Pose estimation is important because face pose plays an essential role in many real-life applications, such as monitoring the attentiveness of drivers [2] or automating camera management [3]. In addition, many view-based approaches to face image analysis, such as face recognition, need to estimate the pose to some extent [4].

Previous work on pose estimation includes PCA [5,6], ANN [7], SVMs [8,9], and Independent Subspace Analysis (ISA) [10]. In this paper, we propose a novel method that learns a pose estimator with the boosting regression algorithm SquareLev.R [1] from simple Haar-type features [11]. It consists of two tree-structured subsystems, one for the left-right angle and one for the up-down angle.
As a specific application in video-based face recognition, we discuss the best shot selection problem, which results in a real-time system that automatically selects the most frontal face from a video sequence. Best shot selection is valuable in live video-based face processing such as face recognition and demographic classification [12]. The main contribution of our work is a novel pose estimation method based on boosting regression that proves useful for practical applications such as best shot selection.

The rest of this paper is organized as follows: Section 2 discusses the problems involved in pose estimation; Section 3 briefly introduces the boosting regression algorithm SquareLev.R; Section 4 introduces the Haar-feature-based weak learner for regression; Section 5 describes our pose estimation trees; Section 6 gives our solution to the best shot selection problem and its results; and Section 7 presents our conclusions.

Figure 1. Definition of pose estimation (PE). [Flow chart: All Sub-Windows → PE (Level 1) → View-Based Face Detectors → PE (Level 2) → View-Based Models for Face Alignment → PE (Level 3) → Further Processing]

2. The Definition of Pose Estimation

As illustrated in Fig. 1, PE takes three forms according to its position in the flow chart. PE before face detection is a rough prediction used to assign each sub-window to its corresponding subcategory for the view-based face detectors. Because face detection usually has to process millions of patches, PE at this level must be simple and fast. PE after face detection serves as a multiplexer that routes the face pattern to its view-based model, so its accuracy directly influences the performance of further processing. PE after face alignment is the last level. At this stage many facial landmarks are usually available, so model-based methods can be used.
In this paper we focus on the second level: we assume that no landmarks are available, and the goal is to estimate the pose from the detected face regions in an image.

Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04), 1051-4651/04 $ 20.00 IEEE

3. Boosting Regression Algorithm

The learning algorithm SquareLev.R [1] is a boosting-style (or leveraging-style) regression algorithm that aims to reduce the variance of the residuals. Given a sample set $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$ and a regressor $F$, the variance of the residuals is

$$P_{\mathrm{Var}} = \|\mathbf{r} - \bar{\mathbf{r}}\|_2^2, \qquad (1)$$

where $\mathbf{r}$ is the $m$-vector of residuals defined by $r_i = y_i - F(\mathbf{x}_i)$, and $\bar{\mathbf{r}}$ is the $m$-vector with all components equal to $\bar{r} = \frac{1}{m}\sum_{i=1}^{m} r_i$. Fig. 2 gives the details of SquareLev.R. It has been proved that in each iteration of SquareLev.R, $P_{\mathrm{Var}}$ decreases by a factor of $(1 - \epsilon_t^2)$ [1]. Hence, if $\epsilon_t$ has a positive lower bound $\epsilon_{\min}$, then for any positive number $\rho$ the algorithm is guaranteed to generate a master regressor whose sample error is at most $\rho$.

4. Haar Feature Based Weak Learner

In each boosting round, SquareLev.R calls the weak learner to obtain a hypothesis, or weak regressor. Unlike classification, where the hypothesis is a threshold function, the hypothesis for regression should be a continuous function of the feature value. A very simple yet effective hypothesis set is the look-up table (LUT). Following Viola and Jones [11], we use Haar features. For a Haar feature $h$ whose range has been normalized to $[0,1]$, our LUT has 64 bins, and the $i$-th bin corresponds to the sub-domain $[(i-1)/64,\ i/64]$, $i = 1, \ldots, 64$. The hypothesis on $bin_i$ is calculated as

$$E[\,y \mid h(\mathbf{x}) \in bin_i\,]. \qquad (2)$$

Define the characteristic function

$$B_i(u) = \begin{cases} 1, & u \in bin_i \\ 0, & u \notin bin_i \end{cases}$$

then the hypothesis based on Haar feature $h$ can be formalized as

$$f(\mathbf{x}) = \sum_{i=1}^{64} B_i(h(\mathbf{x}))\, E[\,y \mid h(\mathbf{x}) \in bin_i\,]. \qquad (3)$$

We construct a hypothesis pool from all possible Haar features.

Figure 3. Multi-view face samples

5. Pose Estimation Tree

Pose data for training consist of faces with ±45°, ±30°, ±15°, 0° left-right ROP and ±30°, ±15°, 0° up-down ROP, for a total of 35 view categories, each containing 300 faces of different people. Because our target is PE after face detection, we do not apply any shape alignment to the face samples; that is, the face block returned by the face detection module is used for training directly. All samples are resized to 24×24-pixel patches (see Fig. 3).

• Given a sample set $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$, a base learning algorithm, and parameters $\rho$ and $T_{\max}$
• Initialize the master regressor $F$ to the zero function
• For $t = 1$ to $T_{\max}$ do
    For $i = 1$ to $m$ do
        $r_i = y_i - F(\mathbf{x}_i)$
    end do
    If … $m\rho$ …
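The variance-of-residuals potential in Eq. (1) and the per-round reduction factor from Section 3 can be checked numerically. The sketch below is only an illustration under stated assumptions, not the authors' procedure. It assumes that the edge is the correlation between the centered residuals and the centered weak-regressor outputs, and that the step size is the least-squares one. Neither definition appears in the text above. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def residual_variance(y, F_pred):
    """P_Var = ||r - r_bar||_2^2 with residuals r_i = y_i - F(x_i), as in Eq. (1)."""
    r = y - F_pred
    return float(np.sum((r - r.mean()) ** 2))

def leverage_step(y, F_pred, f_pred):
    """One update F <- F + alpha * f on the training sample.

    ASSUMPTION: eps is the correlation between the centered residuals and the
    centered weak-regressor outputs, and alpha is the least-squares step size.
    Returns the updated master predictions and eps.
    """
    r = y - F_pred
    a = r - r.mean()              # centered residuals
    b = f_pred - f_pred.mean()    # centered weak-regressor outputs
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0.0 or norm_b == 0.0:
        return F_pred.copy(), 0.0  # nothing left to fit, or a constant weak regressor
    eps = float(a @ b) / (norm_a * norm_b)
    alpha = eps * norm_a / norm_b
    return F_pred + alpha * f_pred, eps

# Synthetic data: "angles" in degrees and a noisy weak-regressor output.
rng = np.random.default_rng(0)
y = rng.uniform(-45.0, 45.0, size=200)
f_pred = 0.5 * y + rng.normal(0.0, 10.0, size=200)

F0 = np.zeros_like(y)             # master regressor initialized to zero
before = residual_variance(y, F0)
F1, eps = leverage_step(y, F0, f_pred)
after = residual_variance(y, F1)
```

With this step size, `after` equals `before * (1 - eps**2)` up to floating-point error, which matches the reduction factor quoted in Section 3.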
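The LUT hypothesis of Section 4 (Eqs. (2) and (3)) can be sketched in a few lines. This is a minimal illustration that assumes the Haar feature responses have already been computed and normalized to [0, 1]. The function names are hypothetical, the bins are 0-based rather than 1-based, and returning 0 for an empty bin is an assumption that the text does not specify.

```python
import numpy as np

N_BINS = 64  # number of LUT bins, as in Section 4

def bin_index(h):
    """Map feature values in [0, 1] to 0-based bin indices 0..63.

    The value h = 1.0 is clamped into the last bin.
    """
    h = np.asarray(h, dtype=float)
    return np.minimum((h * N_BINS).astype(int), N_BINS - 1)

def fit_lut(h, targets):
    """Per-bin mean of the targets, i.e. an empirical E[y | h(x) in bin_i]."""
    idx = bin_index(h)
    sums = np.bincount(idx, weights=np.asarray(targets, dtype=float), minlength=N_BINS)
    counts = np.bincount(idx, minlength=N_BINS)
    table = np.zeros(N_BINS)      # ASSUMPTION: empty bins predict 0
    nonempty = counts > 0
    table[nonempty] = sums[nonempty] / counts[nonempty]
    return table

def predict_lut(table, h):
    """Evaluate f(x) = sum_i B_i(h(x)) * table[i], which is a table look-up."""
    return table[bin_index(h)]

# Toy example: five training feature values with pose-angle targets.
h_train = np.array([0.0, 0.005, 0.5, 0.51, 1.0])
y_train = np.array([10.0, 20.0, -30.0, -10.0, 45.0])
table = fit_lut(h_train, y_train)
pred = predict_lut(table, np.array([0.01, 0.505, 0.99, 0.25]))
```

Because exactly one characteristic function $B_i$ is nonzero for any feature value, the sum in Eq. (3) reduces to a single array look-up, which keeps evaluation cheap.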
