Image Processing (Feature Extraction) Must-Read Paper: mikolajczyk_ijcv2004


International Journal of Computer Vision 60(1), 63–86, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Scale & Affine Invariant Interest Point Detectors

KRYSTIAN MIKOLAJCZYK AND CORDELIA SCHMID
INRIA Rhône-Alpes GRAVIR-CNRS, 655 av. de l'Europe, 38330 Montbonnot, France
Krystian.Mikolajczyk@inrialpes.fr
Cordelia.Schmid@inrialpes.fr

Received January 3, 2003; Revised September 24, 2003; Accepted January 22, 2004

Abstract. In this paper we propose a novel approach for detecting interest points invariant to scale and affine transformations. Our scale and affine invariant detectors are based on the following recent results: (1) Interest points extracted with the Harris detector can be adapted to affine transformations and give repeatable results (geometrically stable). (2) The characteristic scale of a local structure is indicated by a local extremum over scale of normalized derivatives (the Laplacian). (3) The affine shape of a point neighborhood is estimated based on the second moment matrix.

Our scale invariant detector computes a multi-scale representation for the Harris interest point detector and then selects points at which a local measure (the Laplacian) is maximal over scales. This provides a set of distinctive points which are invariant to scale, rotation and translation as well as robust to illumination changes and limited changes of viewpoint. The characteristic scale determines a scale invariant region for each point. We extend the scale invariant detector to affine invariance by estimating the affine shape of a point neighborhood. An iterative algorithm modifies location, scale and neighborhood of each point and converges to affine invariant points. This method can deal with significant affine transformations including large scale changes. The characteristic scale and the affine shape of neighborhood determine an affine invariant region for each point.
We present a comparative evaluation of different detectors and show that our approach provides better results than existing methods. The performance of our detector is also confirmed by excellent matching results; the image is described by a set of scale/affine invariant descriptors computed on the regions associated with our points.

Keywords: interest points, local features, scale invariance, affine invariance, matching, recognition

1. Introduction

Local features have been shown to be well suited to matching and recognition as well as to many other applications as they are robust to occlusion, background clutter and other content changes. The difficulty is to obtain invariance to viewing conditions. Different solutions to this problem have been developed over the past few years and are reviewed in Section 1.1. These approaches first detect features and then compute a set of descriptors for these features. In the case of significant transformations, feature detection has to be adapted to the transformation, as at least a subset of the features must be present in both images in order to allow for correspondences. Features which have proved to be particularly appropriate are interest points. However, the Harris interest point detector is not invariant to scale and affine transformations (Schmid et al., 2000).

In this paper we give a detailed description of a scale and an affine invariant interest point detector introduced in Mikolajczyk and Schmid (2001, 2002). Our approach combines the Harris detector with the Laplacian-based scale selection. The Harris-Laplace detector is then extended to deal with significant affine transformations. Previous detectors partially handle the problem of affine invariance since they assume that the localization and scale are not affected by an affine transformation of the local image structures. The proposed improvements result in better repeatability and accuracy of interest points.
Moreover, the scale invariant Harris-Laplace approach detects different regions than the DoG detector (Lowe, 1999). The latter one detects mainly blobs, whereas the Harris detector responds to corners and highly textured points, hence these detectors extract complementary features in images.

If the scale change between images is known, we can adapt the Harris detector to the scale change (Dufournaud et al., 2000) and we then obtain points for which the localization and scale perfectly reflect the real scale change between two images. If the scale change between images is unknown, a simple way to deal with scale changes is to extract points at several scales and to use all these points to represent an image. The problem with a multi-scale approach is that in general a local image structure is present in a certain range of scales. The points are then detected at each scale within this range. As a consequence, there are many points which represent the same structure, but the location and the scale of the points are slightly different. The unnecessarily high number of points increases the probability of mismatches and the complexity of the matching algorithms. In this case, efficient methods for rejecting the false matches and for verifying the results are necessary.

Our scale invariant approach solves this problem by selecting the points in the multi-scale representation which are present at characteristic scales. Local extrema over scale of normalized derivatives indicate the presence of characteristic local structures (Lindeberg, 1998). Here we use the Laplacian-of-Gaussian to select points localized at maxima in scale-space. This detector can deal with significant scale changes, as presented in Section 2. To obtain affine invariant points, we adapt the shape of the point neighborhood. The affine shape is determined by the second moment matrix (Lindeberg and Garding, 1997).
We then obtain a truly affine invariant image description which gives stable/repeatable results in the presence of arbitrary viewpoint changes. Note that a perspective transformation of a smooth surface can be locally approximated by an affine transformation. Although smooth surfaces are almost never planar in the large, they are always planar in the small, that is, sufficiently small surface patches can always be thought of as being comprised of coplanar points. Of course this does not hold if the point is localized on a depth boundary. However, such points are rejected during the subsequent steps, for example during matching. An additional post-processing method can be used to separate the foreground from the background (Borenstein and Ullman, 2002; Mikolajczyk and Schmid, 2003b). The affine invariant detector is presented in Section 3.

To measure the accuracy of our detectors we introduce a repeatability criterion which we use to evaluate and compare our detectors to existing approaches. Section 4 presents the evaluation criteria and the results of the comparison, which shows that our detector performs better than existing ones. Finally, in Section 5 we present experimental results for matching.

1.1. Related Work

Many approaches have been proposed for extracting scale and affine invariant features. These are reviewed in the following.

Scale Invariant Detectors. There are a few approaches which are truly invariant to significant scale changes. Typically, such techniques assume that the scale change is the same in every direction, although they exhibit some robustness to weak affine deformations. Existing methods search for local extrema in the 3D scale-space representation of an image (x, y and scale). This idea was introduced in the early eighties by Crowley (1981) and Crowley and Parker (1984). In this approach the pyramid representation is computed using difference-of-Gaussian filters.
A feature point is detected if a local 3D extremum is present and if its absolute value is higher than a threshold. The existing approaches differ mainly in the differential expression used to build the scale-space representation.

Lindeberg (1998) searches for 3D maxima of scale normalized differential operators. He proposes to use the Laplacian-of-Gaussian (LoG) and several other derivative based operators. The scale-space representation is built by successive smoothing of the high resolution image with Gaussian based kernels of different size. The LoG operator is circularly symmetric and it detects blob-like structures. The scale invariance of interest point detectors with automatic scale selection has also been explored by Bretzner and Lindeberg (1998) in the context of tracking.

Lowe (1999) proposed an efficient algorithm for object recognition based on local 3D extrema in the scale-space pyramid built with difference-of-Gaussian (DoG) filters. The input image is successively smoothed with a Gaussian kernel and sampled. The difference-of-Gaussian representation is obtained by subtracting two successive smoothed images. Thus, all the DoG levels are constructed by combined smoothing and sub-sampling. The local 3D extrema in the pyramid representation determine the localization and the scale of the interest points. The DoG operator is a close approximation of the LoG function but the DoG can significantly accelerate the computation process (Lowe, 1999). A few images per second can be processed with this algorithm.

The common drawback of the DoG and the LoG representation is that local maxima can also be detected in the neighborhood of contours or straight edges, where the signal change is only in one direction. These maxima are less stable because their localization is more sensitive to noise or small changes in neighboring texture.
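
The statement that the DoG closely approximates the LoG can be checked numerically. The sketch below is only an illustration and is not taken from the paper: it assumes NumPy and SciPy are available, uses a random test image, and the function names `dog_response` and `normalized_log_response`, the scale `sigma = 2.0` and the ratio `k = 1.2` are hypothetical choices. It relies on the standard relation G(kσ) − G(σ) ≈ (k − 1) σ² ∇²G, which follows from ∂G/∂σ = σ ∇²G.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace


def dog_response(image, sigma, k=1.2):
    """Difference of two Gaussian-smoothed copies of the image (scales k*sigma and sigma)."""
    return gaussian_filter(image, k * sigma) - gaussian_filter(image, sigma)


def normalized_log_response(image, sigma):
    """Scale-normalized Laplacian-of-Gaussian: sigma^2 * (Lxx + Lyy)."""
    return sigma ** 2 * gaussian_laplace(image, sigma)


# Hypothetical test data: a reproducible random image.
rng = np.random.default_rng(0)
image = rng.random((128, 128))

sigma, k = 2.0, 1.2
dog = dog_response(image, sigma, k)
# First-order approximation of the DoG by the normalized LoG.
log_approx = (k - 1.0) * normalized_log_response(image, sigma)

# Pearson correlation between the two response maps.
corr = np.corrcoef(dog.ravel(), log_approx.ravel())[0, 1]
print(f"correlation between DoG and (k - 1) * sigma^2 * LoG: {corr:.3f}")
```

The two response maps should be strongly correlated; the approximation is first order in k − 1, so it degrades as the ratio between successive scales grows.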
A more sophisticated approach, solving this problem, is to select the scale for which the trace and the determinant of the Hessian matrix (H) simultaneously assume a local extremum (Mikolajczyk, 2002). The trace of the H matrix is equal to the LoG but detecting simultaneously the maxima of the determinant penalizes points for which the second derivatives detect signal changes in only one direction. A similar idea is explored in the Harris detector, although it uses the first derivatives. The second derivative gives a small response exactly in the point where the signal change is most significant. Therefore the maxima are not localized exactly at the largest signal variation, but in its neighborhood.

A different approach for the scale selection was proposed by Kadir and Brady (2001). They explore the idea of using local complexity as a measure of saliency. The salient scale is selected at the entropy extremum of the local descriptors. The selected scale is therefore descriptor dependent. The method searches for scale localized features with high entropy, with the constraint that the scale is isotropic.

Affine Invariant Detectors. An affine invariant detector can be seen as a generalization of the scale invariant detector. In the case of an affine transformation the scaling can be different in each direction. The non-uniform scaling has an influence on the localization, the scale and the shape of a local structure. Therefore, the scale invariant detectors fail in the case of significant affine transformations.

An affine invariant algorithm for corner detection was proposed by Alvarez and Morales (1997). They apply affine morphological multi-scale analysis to extract corners. For each extracted point they build a chain of points detected at different scales, but associated with the same local image structure. The final location and orientation of the corner is computed using the bisector line given by the chain of points.
A similar idea was previously explored by Deriche and Giraudon (1993). The main drawback of these approaches is that an interest point in images of natural scenes cannot be approximated by a model of a perfect corner, as it can take any form of a bi-directional signal change. The real points detected at different scales do not move along a straight bisector line as the texture around the points significantly influences the location of the local maxima. This approach cannot be a general solution to the problem of affine invariance but gives good results for images where the corners and multi-junctions are formed by straight or nearly straight step-edges. Our approach makes no assumption on the form of a local structure. It only requires a bi-directional signal change.

Recently, Tuytelaars and Van Gool (1999, 2000) proposed two approaches for detecting image features in an affine invariant way. The first one starts from Harris points and uses the nearby edges. Two nearby edges, which are required for each point, limit the number of potential features in an image. A parallelogram region is bounded by these two edges and the initial Harris point. Several intensity based functions are used to determine the parallelogram. In this approach, a reliable algorithm for extracting the edges is necessary. The second method is purely intensity-based and starts with extraction of local intensity extrema. Next, the algorithm investigates the intensity profiles along rays going out of the local extremum. An ellipse is fitted to the region determined by significant changes in the intensity profiles. A similar approach based on local intensity extrema was introduced by Matas et al. (2002). They use the watershed algorithm to find intensity regions and fit an ellipse to the estimated boundaries.

Lindeberg and Garding (1997) developed a method for finding blob-like affine features with an iterative procedure in the context of shape from texture.
The affine invariance of shape adapted fixed points was also used for estimating surface orientation from binocular data (shape from disparity gradients). This work provided the theory for the affine invariant detector presented in this paper. It explores the properties of the second moment matrix and iteratively estimates the affine transformation of local patterns. The authors propose to extract the points using the maxima of a uniform scale-space representation and to iteratively modify the scale and the shape of points. However, the location of points is detected only at the initial step of the algorithm, by the circularly symmetric, not affine invariant Laplacian measure. Therefore, the spatial location of the maximum can be slightly different if the pattern undergoes a significant affine deformation. This method was also applied to detect elliptical blobs in the context of hand tracking (Laptev and Lindeberg, 2001).

The affine shape estimation was used for matching and recognition by Baumberg (2000). He extracts interest points at several scales using the Harris detector and then adapts the shape of the point neighborhood to the local image structure using the iterative procedure proposed by Lindeberg. The affine shape is estimated for a fixed scale and fixed location, that is, the scale and the location of the points are not extracted in an affine invariant way. The points as well as the associated regions are therefore not invariant in the case of significant affine transformations (see Section 4.1 for a quantitative comparison). Furthermore, there are many points repeated at the neighboring scale levels (Fig. 2), which increases the probability of false matches and the complexity. Recently, Schaffalitzky and Zisserman (2002) extended the Harris-Laplace detector (Mikolajczyk and Schmid, 2001) by affine normalization proposed by Baumberg (2000).
However, the location and scale of points are provided by the scale invariant Harris-Laplace detector (Mikolajczyk and Schmid, 2001), which is not invariant to significant affine transformations.

2. Scale Invariant Interest Point Detector

The evaluation of interest point detectors presented in Schmid et al. (2000) demonstrates an excellent performance of the Harris detector compared to other existing approaches (Cottier, 1994; Forstner, 1994; Heitger et al., 1992; Horaud et al., 1990). However, this detector is not invariant to scale changes. In this section we propose a new interest point detector that combines the reliable Harris detector (Harris and Stephens, 1988) with automatic scale selection (Lindeberg, 1998) to obtain a scale invariant detector. In Section 2.1 we introduce the methods on which we base the approach. In Section 2.2 we discuss in detail the scale invariant detector and present an example of extracted points.

2.1. Feature Detection in Scale-Space

Scale Adapted Harris Detector. The Harris detector is based on the second moment matrix. The second moment matrix, also called the auto-correlation matrix, is often used for feature detection or for describing local image structures. This matrix must be adapted to scale changes to make it independent of the image resolution. The scale-adapted second moment matrix is defined by:

\[
\mu(\mathbf{x}, \sigma_I, \sigma_D) =
\begin{bmatrix} \mu_{11} & \mu_{12} \\ \mu_{21} & \mu_{22} \end{bmatrix}
= \sigma_D^2 \, g(\sigma_I) *
\begin{bmatrix}
L_x^2(\mathbf{x}, \sigma_D) & L_x L_y(\mathbf{x}, \sigma_D) \\
L_x L_y(\mathbf{x}, \sigma_D) & L_y^2(\mathbf{x}, \sigma_D)
\end{bmatrix}
\tag{1}
\]

where $\sigma_I$ is the integration scale, $\sigma_D$ is the differentiation scale and $L_a$ is the derivative computed in the $a$ direction. The matrix describes the gradient distribution in a local neighborhood of a point. The local derivatives are computed with Gaussian kernels of the size determined by the local scale $\sigma_D$ (differentiation scale). The derivatives are then averaged in the neighborhood of the point by smoothing with a Gaussian window of size $\sigma_I$ (integration scale).
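
As an illustration of Eq. (1), the following sketch computes the three distinct entries of the scale-adapted second moment matrix at every pixel. It is a minimal example under stated assumptions, not the authors' implementation: it assumes NumPy and SciPy, the function name `second_moment_matrix` and the test image are hypothetical, and the scale values used in the usage lines are arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def second_moment_matrix(image, sigma_i, sigma_d):
    """Return (mu11, mu12, mu22) of Eq. (1) for every pixel.

    sigma_d is the differentiation scale, sigma_i the integration scale.
    mu21 equals mu12 by symmetry, so it is not returned separately.
    """
    image = np.asarray(image, dtype=float)
    # Gaussian derivatives at the differentiation scale.
    # Axis 0 is y (rows) and axis 1 is x (columns).
    lx = gaussian_filter(image, sigma_d, order=(0, 1))
    ly = gaussian_filter(image, sigma_d, order=(1, 0))
    # Average the derivative products with a Gaussian window of size sigma_i,
    # then apply the sigma_d^2 normalization factor.
    norm = sigma_d ** 2
    mu11 = norm * gaussian_filter(lx * lx, sigma_i)
    mu12 = norm * gaussian_filter(lx * ly, sigma_i)
    mu22 = norm * gaussian_filter(ly * ly, sigma_i)
    return mu11, mu12, mu22


# Hypothetical test image: a vertical step edge, so the signal changes along x only.
image = np.zeros((64, 64))
image[:, 32:] = 1.0
mu11, mu12, mu22 = second_moment_matrix(image, sigma_i=2.0, sigma_d=1.0)
print(mu11[32, 32], mu12[32, 32], mu22[32, 32])
```

On the step edge only `mu11` is significantly different from zero, which reflects a signal change in a single direction.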
The eigenvalues of this matrix represent two principal signal changes in the neighborhood of a point. This property enables the extraction of points for which both curvatures are significant, that is, the signal change is significant in the orthogonal directions, i.e. corners, junctions, etc. Such points are stable in arbitrary lighting conditions and are representative of an image. One of the most reliable interest point detectors, the Harris detector (Harris and Stephens, 1988), is based on this principle. The Harris measure combines the trace and the determinant of the second moment matrix:

\[
\text{cornerness} = \det(\mu(\mathbf{x}, \sigma_I, \sigma_D)) - \alpha \, \operatorname{trace}^2(\mu(\mathbf{x}, \sigma_I, \sigma_D))
\tag{2}
\]

Local maxima of cornerness determine the location of interest points.

Automatic Scale Selection. Automatic scale selection and the properties of the selected scales have been extensively studied by Lindeberg (1998). The idea is to select the characteristic scale of a local structure, for which a given function attains an extremum over scales. In relation to automatic scale selection, the term characteristic originally referred to the fact that the selected scale estimates the characteristic length of the corresponding image structures, in a similar manner as the notion of characteristic length is used in physics. The selected scale is characteristic in the quantitative sense, since it measures the scale at which there is maximum similarity between the feature detection operator and the local image structures. This scale estimate will (for a given image operator) obey perfect scale invariance under rescaling of the image pattern.

Given a point in an image and a scale selection operator we compute the operator responses for a set of scales $\sigma_n$ (Fig. 1). The characteristic scale corresponds to the local extremum of the responses.
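
The two ingredients above, the Harris measure of Eq. (2) and the search for an extremum over scale, can be sketched as follows. This is an illustrative example only, not the authors' code. It assumes NumPy and SciPy; the function names `harris_cornerness` and `characteristic_scales` are hypothetical; the value `alpha = 0.05` is an assumed typical choice, since the text above does not state one; and the scale-normalized Laplacian-of-Gaussian is used as the scale selection operator, with local maxima of its absolute response taken as extrema.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace


def harris_cornerness(mu11, mu12, mu22, alpha=0.05):
    """Eq. (2): det(mu) - alpha * trace(mu)^2. alpha = 0.05 is an assumed value."""
    det = mu11 * mu22 - mu12 * mu12
    trace = mu11 + mu22
    return det - alpha * trace ** 2


def characteristic_scales(image, row, col, scales):
    """Scales at which |sigma^2 * LoG| at (row, col) has a local maximum.

    Returns the list of selected scales and the array of responses.
    Filtering the whole image for one pixel is wasteful but keeps the sketch short.
    """
    image = np.asarray(image, dtype=float)
    responses = np.array(
        [abs(s ** 2 * gaussian_laplace(image, s)[row, col]) for s in scales]
    )
    # An interior sample is a peak if it exceeds both of its neighbours in scale.
    inner = responses[1:-1]
    is_peak = (inner > responses[:-2]) & (inner > responses[2:])
    peaks = [scales[i + 1] for i in np.flatnonzero(is_peak)]
    return peaks, responses


# Hypothetical test pattern: an isotropic Gaussian blob with standard deviation 4.
yy, xx = np.mgrid[0:65, 0:65]
blob = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 4.0 ** 2))
scales = [2.0 + 0.5 * i for i in range(10)]
peaks, responses = characteristic_scales(blob, 32, 32, scales)
print("selected scales:", peaks)

# Cornerness is positive when both eigenvalues are large, negative on an edge.
print(harris_cornerness(1.0, 0.0, 1.0), harris_cornerness(1.0, 0.0, 0.0))
```

For a Gaussian blob of standard deviation s, the normalized Laplacian at the blob centre is proportional to σ² / (s² + σ²)², which is maximal at σ = s, so the selected scale should lie close to 4 here. The function returns a list because several extrema may exist at one location.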
Note that there might be several maxima or minima, that is, several characteristic scales corresponding to different local structures centered on this point. The characteristic scale is relatively independent of the image resolution. It is related to the structure and not to the resolution at which the structure is represented. The ratio of the scales at which the extrema are found for corresponding points is the actual scale factor between the point neighborhoods. In Mikolajczyk and Schmid (2001) we compared several differential o
