On-Line Selection of Discriminative Tracking Features

Robert T. Collins and Yanxi Liu
CMU-RI-TR-03-12
The Robotics Institute, Carnegie Mellon University, Pittsburgh PA

Abstract

This paper presents a method for evaluating multiple feature spaces while tracking, and for adjusting the set of features used to improve tracking performance. Our hypothesis is that the features that best discriminate between object and background are also best for tracking the object. We develop an on-line feature ranking mechanism based on the two-class variance ratio measure, applied to log likelihood values computed from empirical distributions of object and background pixels with respect to a given feature. This feature ranking mechanism is embedded in a tracking system that adaptively selects the top-ranked discriminative features for tracking. Examples are presented to illustrate how the method adapts to changing appearances of both tracked object and scene background.

This work is supported in part by DARPA/IAO HumanID under ONR contract N00014-00-1-0915, and by DARPA/IPTO MARS contract NBCHC020090. © 2003 Carnegie Mellon University. This work has been submitted to IEEE ICCV03 for possible publication. Copyright may be transferred without notice.

1 Introduction

Two decades of vision research have yielded an arsenal of powerful algorithms for object tracking. Moving objects can be effectively tracked in real-time from stationary cameras using frame differencing or adaptive background subtraction combined with simple data association techniques [10]. This approach can be generalized to situations where the video data can be easily stabilized, including purely rotating and zooming cameras, and aerial views that allow scene structure to be modeled as an approximately planar surface [5]. Modern appearance-based tracking methods such as the mean-shift algorithm use viewpoint-insensitive object appearance models to track objects through non-rigid pose changes without any prior knowledge of scene structure or camera motion [4]. Kalman filter extensions achieve more robust tracking of maneuvering objects through the introduction of statistical models of object and camera motion [2]. Particle filtering extensions enable tracking through occlusion and clutter by reasoning over a state-space of multiple hypotheses [6].

Our experience with a variety of tracking methods can be summarized simply: tracking success or failure depends primarily on how distinguishable an object is from its surroundings. If the object is very distinctive, we can use a simple tracker to follow it. If the object has low contrast or is camouflaged, we will obtain robust tracking only by imposing a great deal of prior knowledge about scene structure or expected motion, and thus tracking success is bought at the price of reduced generality.

The degree to which a tracker can discriminate object and background is directly related to the feature space(s) it uses. Surprisingly, most tracking applications are conducted using a fixed set of features, determined a priori. Preliminary experiments are often run to determine which fixed feature space to use – a good example is work on head tracking using skin color, where many papers evaluate different color spaces to find one in which pixel values for skin cluster most tightly, e.g. [13]. However, these approaches ignore the fact that it is the ability to distinguish between object and background that is most important, and the background can rarely be specified in advance.
Furthermore, both foreground and background appearance will change as the target object moves from place to place, so tracking features will also need to adapt. Figure 1 illustrates this phenomenon with low contrast imagery of a car traveling through patches of sunlight and shadow. The best feature for tracking the car through sunlight performs poorly in shadow, and vice versa.

Figure 1: The features used for tracking an object must be adapted as the appearance of the object and background changes. The source imagery (left column) is low contrast aerial video of a car on a road. The car travels between sunny patches (top row) and shadow (bottom row). The best feature for tracking the car in sunlight (R-G) performs poorly in shadow. Similarly, the best feature for tracking through shadow (2G-B) does not perform as well in sunlight.

A key issue addressed in this work is on-line, adaptive selection of an appropriate feature space for tracking. Our insight is that the feature space that best distinguishes between object and background is the best feature space to use for tracking, and that this choice of feature space will need to be continuously re-evaluated over time to adapt to changing appearances of the tracked object and scene background. Target tracking is cast as a local discrimination problem with two classes: foreground and background. This point of view opens up a wide range of pattern recognition feature selection techniques that can be potentially adapted for use in tracking. An interesting characteristic of target tracking is that foreground and background appearances are constantly changing, albeit gradually. Naturally, when class appearance varies, the most discriminating set of features also varies. The issue of on-line feature selection has rarely been addressed in the literature, especially under the hard constraint of speed required for target tracking. The nearest relevant work is [11], which dynamically switches between five color spaces to improve face tracking performance.

Section 2 presents a brief look at off-line discriminative feature selection in the field of pattern classification. Section 3 adapts these ideas to the task of target tracking. Since the goal is to perform on-line feature selection while tracking, efficiency must be favored over optimality. Examples are presented in Section 4 to illustrate how incorporating feature selection with tracking facilitates adaptation to changing object and background appearance. Section 5 concludes the paper.

2 Feature Selection

Feature selection is a technique for dimensionality reduction whereby a set of m features is chosen from a pool of n candidates, where usually m << n. The choice is made by optimizing a criterion function over all subsets of size m. This technique is especially useful if some of the input features carry little useful information for the problem and/or there are strong correlations between different feature dimensions [1], which is often the case when we extract image features for classification problems in computer vision.

The two major components in feature selection are the selection criterion function, which is a quantitative measure that can be used to compare one feature subset against another, and the search strategy, which is a systematic procedure to enumerate candidate feature subsets and to decide when to stop. Criterion functions can be categorized by whether the evaluation process is data intrinsic (filters) or classifier-dependent (wrappers).
For discrimination problems, the criterion involves evaluation of the discriminating power of the selected feature subset. There are many ways to evaluate the discriminative power of each feature. For example, augmented variance ratio (AVR) has been shown to be effective for feature ranking as a preprocessing step for feature subset selection [7, 8]. AVR is the ratio of the between-class variance of the feature to the within-class variance of the feature, with an added penalty for features that may have small intra-class variance but have close inter-class mean values. Other measures for discriminative power of a feature include information gain and mutual information.

Since we usually do not know what the best subset size m should be, the search space for feature subsets is 2^n, where n is the total number of features: with 100 features, the search space is about 10^30. The goal in feature subset selection is to find m features (out of n possible ones) that best complement each other for the classification task at hand. Existing heuristic search methods for feature selection provide a set of compromises between speed and optimality of the selected feature set. For example, Sequential Forward Selection [1] has linear computational complexity in n. In biomedical image classification, for example, a combination of feature ranking and feature subset selection has been shown to be effective for off-line selection of small, discriminative feature subsets from thousands of feature candidates [8]. To achieve on-line selection, we are forced to consider simplified selection criteria, non-exhaustive search spaces and heuristic search strategies. In this work, we simplify by finding the best m features individually, fully realizing that the best m individual features may not form the best feature subset of size m [12].

3 Feature Selection for Tracking

Our goal in this section is to develop an efficient method that continually evaluates and updates the set of features used for tracking. Our hypothesis is that the most promising features for tracking are the same features that best discriminate between object and background classes. Given an appearance model learned from previous views of the object, the distribution of feature values for object and background samples is computed. Candidate features are then rank-ordered by measuring separability of the distributions of object and background classes. The most discriminative features are used to label pixels in a new video frame with the likelihood that they correspond to either object or background. Discriminative features produce likelihood maps where object pixels have high values, and background pixels have low values. We use the mean-shift algorithm as a non-parametric method to find the nearest local mode of this likelihood surface, thereby estimating the 2D location of the object in the image. Each of these steps is described in more detail below.

It is important to note that the features we use for tracking need only be locally discriminative, in that the object only needs to be clearly separable from its immediate surroundings. This is a much less restrictive assumption than is necessary for a tracker that uses a fixed set of features, since that set must by necessity be discriminative across a wide range of imaging conditions.
Since we are swapping features in and out on the fly while tracking, we are able to focus on finding features that are finely tuned to provide good foreground/background discrimination, even if they are only locally, and temporarily, valid.

3.1 Feature Spaces

In principle, a wide range of features could be used for tracking, including color, texture, shape and motion. Each potential feature space typically has dozens of tunable parameters, and therefore the full set of potential features that could be used for tracking is enormous. In this work, we represent target appearance using histograms of color filter bank responses applied to R, G, B pixel values within local image windows. This representation is chosen since it is relatively insensitive to variations in target appearance due to viewpoint, occlusions and non-rigidity. Although we only consider color features in this paper, the approach can in principle be extended to incorporate other cues such as texture and object motion.

The set of candidate features is composed of linear combinations of camera R, G, B pixel values. Specifically, for our experiments, we have chosen the following set of feature-space candidates

    F_1 ≡ { w_1 R + w_2 G + w_3 B | w_* ∈ {-2, -1, 0, 1, 2} }    (1)

that is, linear combinations composed of integer coefficients between -2 and 2. The total number of such candidates would be 5^3 = 125, but by pruning redundant coefficients where (w'_1, w'_2, w'_3) = k(w_1, w_2, w_3), and by disallowing (w_1, w_2, w_3) = (0, 0, 0), we are left with a pool of 49 features. This set of candidate features is chosen because: 1) the features are efficient to compute (only integer arithmetic is involved); 2) the features approximately uniformly sample the set of 1D subspaces of 3D RGB space; and 3) some common features from the literature are covered in the candidate space, such as raw R, G and B values, intensity R+G+B, approximate chrominance features such as R-B, and so-called excess color features such as 2G-R-B.

All features are normalized into the range 0 to 255, and further discretized into histograms of length 2^b values, where b is the number of bits of resolution to use. We typically discretize to 5 or 6 bits, yielding feature histograms with 32 or 64 buckets. This discretization is performed for efficiency, and for defeating the "curse of dimensionality" when trying to estimate feature densities from small numbers of samples.

3.2 Evaluating Feature Discriminability

If both object and background were uni-colored, then a plausible argument could be made that variation in apparent color of pixels would lead to Gaussian distributions in color space. In this case, Linear Discriminant Analysis (LDA) could be used to find the subspace projection yielding the least overlap (i.e. maximum separability) between object and background. However, we must be able to handle targets and backgrounds that have multi-modal distributions of colors. These violate LDA's Gaussian assumption, and thus invalidate its analytic solution.

Our approach is to empirically evaluate each candidate feature to determine which ones yield good class separability. For a given feature, we measure separability between the object and background classes by 1) estimating the distributions of object and background pixels with respect to the feature; 2) computing the log likelihood ratio of these distributions; and 3) applying a variance ratio measure to the distribution of likelihood values from object vs background. Figure 2 illustrates this process.
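Before turning to that evaluation (Figure 2), the candidate pool itself is easy to make concrete. The following is a minimal Python/numpy sketch (not from the report): it enumerates the 49 features of Eq. (1), pruning coefficient triples that are scalar multiples of an already-kept triple via a cross-product test, and applies one feature to an RGB image. The fixed-range normalization shown is an assumption; the text states only that features are scaled into the range 0 to 255.

    import itertools
    import numpy as np

    def candidate_features():
        """Enumerate the 49 candidate features of Eq. (1).

        Triples (w1', w2', w3') = k (w1, w2, w3) define the same 1D
        projection direction, so only the first representative of each
        direction is kept; the cross product of two 3-vectors is zero
        exactly when they are parallel.
        """
        kept = []
        for w in itertools.product([-2, -1, 0, 1, 2], repeat=3):
            if w == (0, 0, 0):
                continue
            if any(not np.any(np.cross(w, k)) for k in kept):
                continue  # scalar multiple of a feature we already have
            kept.append(w)
        return kept

    def apply_feature(rgb, w):
        """Project an RGB image (H x W x 3, channels in R,G,B order)
        onto one candidate feature and scale it into [0, 255].

        Fixed-range normalization (our assumption): the extreme attainable
        values follow from the coefficient signs, since each channel
        lies in [0, 255].
        """
        f = sum(c * rgb[..., i].astype(np.float64) for i, c in enumerate(w))
        lo = 255.0 * sum(min(c, 0) for c in w)
        hi = 255.0 * sum(max(c, 0) for c in w)
        return 255.0 * (f - lo) / (hi - lo)

    assert len(candidate_features()) == 49  # matches the pool size in the text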
Figure 2: Empirical evaluation of a candidate feature, demonstrated on an IR image of a truck. Histograms of feature values for object and background pixels are used to compute a log likelihood function in which object pixels have positive values and background pixels have negative values. When mapped back into image space, the result is a 2D "likelihood" image that can be used to track the object. The variance ratio is computed from histograms of these likelihood values for object and background pixels to determine separability of the two classes, which correlates well with suitability of the likelihood image for tracking.

We use a "center-surround" approach to sampling pixels from the object and the background. That is, a compact set of pixels (e.g. a rectangle or ellipse) covering the object is chosen to represent the object pixels, while a larger ring of neighboring pixels surrounding that region is chosen to represent the background. This is a conservative strategy that leads to discriminative features that separate object from background regardless of which direction the object maneuvers in the image. Of course, one could sample background appearance in other ways. For example, we could bias selection of pixels from the area of the image that we expect the object to traverse in the future, given its recent trajectory.

Given a feature f, let H_obj(i) be a histogram of that feature's values for pixels on the object, and H_bg(i) be a histogram for pixels from the background sample, where index i ranges from 1 to 2^b, the number of histogram buckets. We form an empirical discrete probability density p(i) for the object, and density q(i) for the background, by normalizing each histogram by the number of elements in it:

    p(i) = H_obj(i) / n_obj    (2)
    q(i) = H_bg(i) / n_bg      (3)

with n_obj and n_bg being the number of object and background samples, respectively. The log likelihood of a feature value i is given by

    L(i) = log( max{p(i), δ} / max{q(i), δ} )    (4)

where δ is a small value (we set it to 0.001) that prevents dividing by zero or taking the log of zero. The nonlinear log likelihood ratio maps potentially multimodal object/background distributions into positive values for colors distinctive to the object, and negative values for colors associated with the background. Colors that are shared by both object and background tend towards zero. A new image composed of these log likelihood values becomes the "likelihood" image used for tracking (Figure 2).

Finally, we compute the variance ratio of L(i) in order to quantify the separability of object and background classes under feature f. Given a discrete probability density function a(i), we use the equality var(x) = E[x^2] - (E[x])^2 to define the variance of L(i) with respect to a as

    var(L; a) = Σ_i a(i) L^2(i) - [Σ_i a(i) L(i)]^2    (5)

The variance ratio of the log likelihood function can now be defined as

    VR(L; p, q) ≡ var(L; (p+q)/2) / [var(L; p) + var(L; q)]    (6)

which is the total variance of L over both object and background pixels, divided by the sum of the within-class variances of L when object and background pixels are treated separately. The intuition behind the variance ratio is that we would like log likelihood values of pixels on the object and background to both be tightly clustered (low within-class variance), while the two clusters should ideally be spread apart as much as possible (high total variance).
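The ranking computation is compact enough to state directly in code. Below is a minimal numpy sketch of Eqs. (2)-(6), written to the definitions above rather than taken from the report's implementation; it takes the raw feature values of the sampled object and background pixels and returns both the variance ratio score and the per-bin log likelihood table L(i) used to build the likelihood image.

    import numpy as np

    def variance_ratio(obj_vals, bg_vals, bins=32, delta=0.001):
        """Score one feature by the two-class variance ratio, Eqs. (2)-(6).

        obj_vals, bg_vals: 1D arrays of feature values (scaled to 0..255)
        for the sampled object and background pixels; bins = 2^b.
        """
        edges = np.linspace(0.0, 256.0, bins + 1)
        p = np.histogram(obj_vals, bins=edges)[0] / len(obj_vals)  # Eq. (2)
        q = np.histogram(bg_vals, bins=edges)[0] / len(bg_vals)    # Eq. (3)
        L = np.log(np.maximum(p, delta) / np.maximum(q, delta))    # Eq. (4)

        # var(L; a) = sum_i a(i) L(i)^2 - [sum_i a(i) L(i)]^2, Eq. (5)
        def var(a):
            return np.dot(a, L**2) - np.dot(a, L)**2

        return var((p + q) / 2.0) / (var(p) + var(q)), L           # Eq. (6)

Candidate features can then be rank-ordered by this score, and the L(i) tables of the highest-scoring features used to label pixels in the next frame.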
Returning to Eq. (6): the denominator enforces that the within-class variances should be small for both object and background classes, while the numerator rewards cases where values associated with object and background are widely separated. Note the similarity to the Fisher discriminant used in the computation of LDA, where the squared difference between the mean values of the two classes is used as an alternative measure of total variance.

3.3 Ranked Likelihood Images

If a feature's two-class log likelihood function from the previous step is used to label pixels in a new video frame, the result is a likelihood image where, ideally, object pixels contain positive values and background pixels contain negative values. Figure 3 shows a sample object, and the set of likelihood images produced by all 49 candidate features, after rank-ordering the features based on the two-class variance ratio measure. The likelihood image for the most discriminative feature is at the upper left, and the image for the least discriminative feature is at the lower right. We observe a very high correlation between variance-ratio ranking and suitability of the likelihood image for localizing the object in the next frame.

Figure 3: (A) A sample image with concentric boxes delineating object and background samples. (B) Likelihood images produced by all 49 candidate feature spaces, rank-ordered by the two-class variance ratio measure. The likelihood image for the most discriminative feature (which is also best for tracking) is drawn in the upper left. The image for the least discriminative feature (worst for tracking) is at the lower right.

Figure 4 shows other sample images with labeled object and background pixels, along with log likelihood images associated with the features having highest, median, and lowest variance ratio values, corresponding to the best, median and worst features, respectively, in terms of object/background separability. Again, we see good agreement between these rankings and our intuitive preference regarding which likelihood images to use for tracking.

Figure 4: Sample video frames with ranked likelihood images. Left column: frame with labeled object (green box) and background (red box) pixels. Second through fourth columns: likelihood images corresponding to the highest ranked, median, and lowest ranked features, respectively. We can see that rank ordering features by the two-class variance ratio correlates well with the suitability of the resulting likelihood images for tracking.

3.4 Tracking

The above feature ranking mechanism is embedded in a tracking system as depicted in Figure 5. Object pixels and background pixels are sampled from the current frame, given the current location of the object.
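Although the report is truncated at this point, the localization step has already been described at the start of Section 3: the top-ranked features produce a likelihood image, and mean-shift climbs to its nearest local mode. The following is a generic sketch of that step under our own assumptions, not the report's exact implementation; in particular, the uniform rectangular kernel and the clamping of negative log likelihoods to zero are our choices.

    import numpy as np

    def mean_shift_mode(likelihood, x, y, half_w, half_h, iters=20):
        """Move a window to the nearest local mode of a 2D likelihood image.

        Repeatedly re-centers the window on the likelihood-weighted
        centroid of the pixels under it. Negative log likelihood values
        (background-like pixels) are clamped to zero so that they
        contribute no weight.
        """
        H, W = likelihood.shape
        for _ in range(iters):
            x0, x1 = max(int(x - half_w), 0), min(int(x + half_w) + 1, W)
            y0, y1 = max(int(y - half_h), 0), min(int(y + half_h) + 1, H)
            win = np.clip(likelihood[y0:y1, x0:x1], 0.0, None)
            total = win.sum()
            if total <= 0.0:
                break  # no object-like pixels under the window
            ys, xs = np.mgrid[y0:y1, x0:x1]
            nx = (win * xs).sum() / total
            ny = (win * ys).sum() / total
            if abs(nx - x) < 0.5 and abs(ny - y) < 0.5:
                return nx, ny  # converged to within half a pixel
            x, y = nx, ny
        return x, y

In a full per-frame loop, each new frame would be scored with apply_feature and the stored L(i) tables of the top-ranked features, the resulting likelihood images combined (for example by averaging, an assumption on our part), mean_shift_mode run from the previous object position, and the center-surround samples then redrawn at the new position so that the feature ranking can be refreshed for the next frame.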