Conditional quantification, or poor man's probability

Michiel van Lambalgen*
ILLC, University of Amsterdam
Plantage Muidergracht 24, NL 1018 TV Amsterdam
michiell@wins.uva.nl

June 25, 1999

1 Introduction

The purpose of this note is to take a few steps toward a first order logic of incomplete information, based on the idea that such a logic may be obtained by importing an essential ingredient of probability theory, conditional expectation, into the logic. The reason that we focus on conditional expectation is that it is often used to model a kind of incomplete knowledge that is of interest in itself: incomplete knowledge about the value of a variable, where the 'degree of incompleteness' is given by some algebraic structure.

Conditional expectation often comes into play when what can be described as a change of granularity is involved. Formal details will be given below, but a typical example is this. Let Ω be a sample space equipped with a second countable Hausdorff topology, and let X : Ω → IR be a random variable. Think of X as representing some measurement apparatus. If B is the Borel σ-algebra on Ω, then the fact that singletons are in B represents the assumption that an outcome can be observed with arbitrary precision. In practice, however, it may be impossible to observe an outcome X(ω) with arbitrary precision. For instance, the best we can do may be to locate X(ω) in some element of a partition of IR into intervals of length ε. Let B′ be the σ-algebra generated by those intervals. Then G = X⁻¹B′ is strictly contained in B. The conditional expectation E(X|G) is in a sense X viewed from the perspective of G; the values of X are averaged over an element in the generated partition on Ω. This is brought out by the defining condition of E: E(X|G) is a G-measurable random variable such that for A in G,

  ∫_A E(X|G) dP = ∫_A X dP.

[*] I am indebted to Samson Abramsky, Alan Bundy, Jaap van der Does, Rosalie Iemhoff, Alekos Kechris, Daniele Turi and Marco Vervoort for helpful comments. Support by the Netherlands Organisation for Scientific Research (NWO) under grant PGS 22 262 is gratefully acknowledged. This paper was begun in Amsterdam and finished at HCRC in Edinburgh. I thank Keith Stenning for his friendship and hospitality, and the EPSRC for financial support.

Incomplete information is not always a negative qualification. In some cases one wants to expressly reduce the information present in a signal in order to make that information more useful. This happens, for instance, when noise is suppressed in an audio signal. Moving to a coarser granularity can thus be beneficial as well. For our present purposes it is of interest that this process of filtering is also modelled by means of conditional expectation.

These two aspects of changes in granularity occur as well in qualitative reasoning with incomplete information. In an interesting programmatic paper, Hobbs [8] emphasises the positive aspects of switching to coarser granularities:

  Our ability to conceptualize the world at different granularities and to switch among these granularities is fundamental to our intelligence and flexibility. It enables us to map the complexities of the world around us into simple theories that are computationally tractable to reason in.

A few years earlier, Marr, in his book Vision, emphasised the same point while sketching a program for semantics (Marr [12, p. 357-8]):

  I expect that at the heart of our understanding of intelligence will lie at least one and probably several important principles about organizing and representing knowledge that in some sense capture what is important about our intellectual capabilities. ... The perception of an event or an object must include the simultaneous computation of several different descriptions of it that capture different aspects of the use, purpose or circumstances of the event or object. ... The various descriptions include coarse versions as well as fine ones. These coarse descriptions are a vital link in choosing appropriate overall scenarios ... and in correctly establishing the roles played by the objects and actions that caused those scenarios to be chosen.

In fact, natural language even has adverbial expressions, such as 'actually' and 'really', which indicate such shifts. Here is an example due to Asher and Vieu:

  The point of this pencil is actually an irregular surface with several peaks.

It would be useful to have a formal apparatus capturing the process of simultaneously viewing the world at different scales; the claim of this paper is that one way to approach this problem is via a suitable notion of generalised quantification. We are thus motivated by the following analogy. Moving to a larger grain size is analogous to taking conditional expectation, and it will be shown that this process can be captured by a quantifier which shares many formal properties with conditional expectation. We will investigate several ways of defining grain size with their associated conditioning structures and quantifiers. An important (and largely open) question that arises here is: when and why can we be confident that a result (for instance a plan) obtained in a coarse world also applies to the real, or at least a finer, world? This leads to the study of a type of preservation theorem which is analogous to the martingale convergence theorems of probability theory. The ideas sketched above have been applied to the development of a logic of visual perception in a series of papers [21], [19] and [20]; the present article intends to provide some theoretical complements.

To further clarify the point of departure of the present article it is useful to contrast it with the aims of Keisler's probability logic (cf. Keisler [10]).
Keisler presents an infinitary axiomatisation of such concepts as probability measure, integral and conditional expectation, and proves the completeness of the axiomatisation with respect to the usual probability spaces. By contrast, we look at what conditional expectation is used for in probability theory, try to identify situations of a more qualitative nature where approximately the same use is called for, and define qualitative analogues of conditional expectation which can perform these functions. There appears to be a certain logical and philosophical interest in this procedure, because it touches upon the old problem of the relation between logic and probability. It is our contention that logic and probability are alike in that their fundamental concepts, quantification and conditional expectation, are basically the same. Indeed, it has been observed several times that, algebraically speaking, the existential quantifier and conditional expectation are in the same class of operators on a Boolean algebra, namely the hemimorphisms[1] α satisfying the averaging identity

  α(p ∧ αq) = αp ∧ αq.

(See, for example, Wright [24], [25] and the references given therein.) Furthermore, Ellerman and Rota [4] showed that existential quantification ∃x is conditional expectation with respect to the algebra determined by projection along x and a suitably generalised notion of measure, which is such that even on uncountable models, every subset of the domain has positive measure.

What remains to be done is to further exploit the analogy and to generalise existential quantification in such a way that the very special algebra used to define ∃x (namely, the algebra of the sets of assignments obtained by projection along x) can be replaced by arbitrary Boolean algebras, or even more general structures. In view of the analogy with conditional expectation, the new quantifiers will be called conditional quantifiers.
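The averaging identity can be checked concretely. The sketch below (an illustration, not part of the original paper) represents ∃x as cylindrification along x on sets of assignments over a two-element domain with two variables, and verifies α(p ∧ αq) = αp ∧ αq exhaustively, with ∧ read as intersection:

```python
from itertools import combinations, product

D = [0, 1]                                # a two-element domain
assignments = list(product(D, D))         # assignments to the variables (x, y)

def exists_x(p):
    """∃x as an operator on sets of assignments: cylindrify along x.

    An assignment f belongs to ∃x p iff some g in p agrees with f
    on every variable other than x (here: on y)."""
    return frozenset(f for f in assignments if any(g[1] == f[1] for g in p))

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Exhaustively check the averaging identity a(p ∧ aq) = ap ∧ aq
# over all 16 x 16 pairs of sets of assignments (p, q).
for p in powerset(assignments):
    for q in powerset(assignments):
        assert exists_x(p & exists_x(q)) == exists_x(p) & exists_x(q)
```

The same check goes through for projection along any variable, which is the algebraic content of the observation attributed to Wright above.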
It will be very useful to think of these quantifiers as resource-bounded quantifiers: the resource is some algebraic structure which determines the granularity of the available information with respect to which one quantifies. For example, put in these terms, quantification ∃xϕ means that we have no information about x, but full information about the other variables.

The remainder of the paper is organised as follows. We shall first, in section 2, briefly rehearse theory and applications of conditional expectation in so far as relevant for our purposes. We then look at several examples of shifts of granularity, and show that they call for different types of resources (section 3). In section 4 the resources considered are Boolean algebras, and we try to retain as many of the properties of the existential quantifier as possible. Although we prove an infinitary completeness theorem here, the results are not very satisfactory, and in the next section (6) we consider, along with Boolean resources, a modified notion of quantifier, namely a quantifier ∃ which lacks the property ϕ ≤ ∃ϕ. These quantifiers, which bear a stronger resemblance to conditional expectation than the existential quantifier itself, have a smoother theory, although at some points we need the continuum hypothesis to get things off the ground. In section 8 we consider resources which do not have the structure of a Boolean algebra, notably Lawvere's co-Heyting algebras (Lawvere [11]). These algebras naturally arise in situations where there is a distinction between positive and negative information. Again we supply an infinitary axiomatisation with a completeness theorem. Lastly, we return to our original motivation, conditional quantification as capturing shifts of granularity, which leads us to define a logical analogue of the martingales familiar from probability theory.

[1] A hemimorphism on a lattice L with 1 is a mapping α : L → L satisfying α(a ∨ b) = α(a) ∨ α(b) and α1 = 1.
We show that there exists a natural relation between nonmonotonic reasoning and martingale convergence, and we close with a conjecture on the precise form of martingale convergence in the logical setting.

2 Conditional expectation

In this section we give a very brief introduction to the fundamental properties of conditional expectation. The reader wishing to know more is advised to consult any introductory treatise on measure theoretic probability, a beautiful specimen of which is Williams' Probability with martingales [23].

We begin with a simple example. Suppose we have a variable X on a sample space Ω which takes values 0 and 1, both with probability 1/2. Let Ai be the subset of Ω on which X takes value i; then P(Ai) = 1/2. Suppose furthermore that we cannot measure X directly, but can only measure X + Y, where Y is some small perturbation. Let Bi ⊆ Ai be such that P(B0) = ε ≠ P(B1) = δ, where ε, δ ≪ 1/2; Y takes value 1/2 on B0, value −1/2 on B1 and 0 elsewhere. Then X + Y takes value 0 on A0 − B0, value 1 on A1 − B1 and value 1/2 on B0 ∪ B1. The smallest Boolean algebra B which contains A0 − B0, A1 − B1 and B0 ∪ B1 does not contain A0 and A1. The algebra B represents the situation that we have incomplete information about X, and precisely codes the kind of information that we do have available about X, namely X + Y, since B is the smallest algebra with respect to which X + Y is measurable. If we can measure only X + Y, not X, this implies that all information about X is represented by the function E(X|B), which takes value 0 on A0 − B0, value 1 on A1 − B1, and value δ/(ε + δ) on B0 ∪ B1. It follows that we have no information about the event X > 0 (since A1 ∉ B), but our best estimate for the probability of this event is given by P(E(X|B) > 0) = 1/2 + ε.

The next example is a more realistic one, with continuous noise.
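Before turning to continuous noise, the discrete example can be verified with exact arithmetic. The sketch below is an illustration, not part of the original paper; the specific values ε = 1/20 and δ = 3/100 are hypothetical choices satisfying ε, δ ≪ 1/2:

```python
from fractions import Fraction as F

# Hypothetical numbers: P(B0) = eps, P(B1) = delta, both << 1/2.
eps, delta = F(1, 20), F(3, 100)

# Probabilities of the three atoms of B (the algebra generated by X + Y):
P = {"A0-B0": F(1, 2) - eps,      # X = 0 here
     "A1-B1": F(1, 2) - delta,    # X = 1 here
     "B0uB1": eps + delta}        # X = 0 on B0, X = 1 on B1

# E(X|B) is constant on each atom: the P-weighted average of X there.
E = {"A0-B0": F(0),
     "A1-B1": F(1),
     "B0uB1": delta / (eps + delta)}   # = P(B1) / (P(B0) + P(B1))

# Defining condition on the mixed atom: ∫ E(X|B) dP = ∫ X dP over B0 ∪ B1,
# and the right-hand side is just P(B1), where X = 1.
assert E["B0uB1"] * P["B0uB1"] == delta

# Best estimate of P(X > 0) available in B: P(E(X|B) > 0) = 1/2 + eps.
assert sum(P[a] for a in P if E[a] > 0) == F(1, 2) + eps
```

Using `Fraction` keeps the check exact, so the identity P(E(X|B) > 0) = 1/2 + ε holds on the nose rather than up to rounding.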
Suppose we have a random variable X on a sample space Ω (measurable with respect to a σ-algebra B) which we want to measure; for the sake of definiteness, assume X is distributed as the Gaussian N(0, σ²). Due to noise, we can only observe X + c·ξ, where ξ is independent of X, ξ is distributed as the Gaussian N(0, 1) and c ∈ IR. X + c·ξ can be taken to be measurable with respect to a σ-algebra G which is not the same as B. For example, the event {X < 0} ∈ B is not in G if G is the smallest σ-algebra with respect to which X + c·ξ is measurable. Hence we can only determine properties of X as filtered through G; this is represented by the conditional expectation E(X|G). It is not possible to determine precisely which sample point ω has been chosen; we can only ask whether ω ∈ A for A ∈ G. Then the expected value of X given such information, namely ∫_A X dP, should equal ∫_A E(X|G) dP; from the point of view of G, no other questions about X can be answered. This leads to the following definition.

Definition 1. Let B be a σ-algebra on Ω. A conditional expectation of a B-measurable random variable X : Ω → IR with respect to the sub-σ-algebra G and the probability measure P is a G-measurable function E(X|G) : Ω → IR satisfying, for all A ∈ G,

  ∫_A E(X|G) dP = ∫_A X dP.

The Radon-Nikodym theorem is used to show that a function E(X|G)(ω) with these properties exists. However, it is not unique, in the sense that for given X two versions of E(X|G) only agree almost everywhere[2]. In the following list of properties of E(X|G), the (in)equalities must therefore be understood as holding almost everywhere:

1. E(1_∅|G) = 0, E(1_Ω|G) = 1;
2. X ⊆ Y implies E(1_X|G) ≤ E(1_Y|G);
3. E(X·E(Y|G)|G) = E(X|G)·E(Y|G).

All three properties are strongly suggestive of quantifier properties, as has often been remarked in the literature; see, for example, Birkhoff's survey paper [3].
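Definition 1 can be checked by simulation for the Gaussian example. The sketch below is an illustration with hypothetical parameters σ = 2 and c = 1; it uses the standard fact that for jointly Gaussian variables, E(X|G) is the linear function (σ²/(σ² + c²))·(X + c·ξ) of the observable:

```python
import numpy as np

# Monte Carlo check of Definition 1, with hypothetical parameters.
rng = np.random.default_rng(42)
sigma, c, n = 2.0, 1.0, 1_000_000
X = rng.normal(0.0, sigma, n)        # X ~ N(0, sigma^2)
xi = rng.normal(0.0, 1.0, n)         # independent noise ~ N(0, 1)
Z = X + c * xi                       # the observable X + c.xi

# For jointly Gaussian (X, Z), E(X|G) is linear in Z (a standard fact):
EXG = (sigma ** 2 / (sigma ** 2 + c ** 2)) * Z

# Definition 1: the averages of X and E(X|G) agree on every A in G,
# i.e. every event expressible in terms of Z alone.
A = Z < 0                            # {X + c.xi < 0} is in G
assert abs(X[A].mean() - EXG[A].mean()) < 0.01

# By contrast {X < 0} is in B but not in G, and there the two disagree:
B = X < 0
assert abs(X[B].mean() - EXG[B].mean()) > 0.1
```

The second assertion makes the point of the example concrete: questions about X itself, as opposed to questions about the noisy observable, are not answered correctly by E(X|G).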
However, conditional expectation as constructed above by means of the Radon-Nikodym theorem is not yet quite analogous to quantification, since the construction is not uniform. For example, for each X, Y there is a nullset N such that off N, property 3 holds; but there need not be a nullset which does the job for every X, Y. The required uniformity is given by the following definition. For ease of exposition, we formulate it in terms of the conditional probability P(·|G) determined by P(A|G) = E(1_A|G).

[2] The analogues of conditional expectation in section 6 suffer from this non-uniqueness as well.

Definition 2. A regular conditional probability is a function P(F|G)(ω) : B × Ω → [0, 1] satisfying

1. for each F ∈ B, P(F|G)(·) = E(1_F|G) a.s.;
2. for almost every ω, the map F ↦ P(F|G)(ω) is a probability measure on B.

The last condition ensures that the additivity and monotonicity properties hold in a uniform manner, in the sense that there exists a single nullset such that outside this nullset these properties hold. Regular conditional probabilities do not always exist, but we need not inquire into the conditions for existence here. It is important to observe that also in this case the function P(F|G)(ω) is not completely determined by G, and that essential reference is made to nullsets.

In probability one often considers families of conditional expectation operators. Since this suggests new ways of looking at quantifiers, we include a brief description. Let T be a directed set, i.e. a set partially ordered by a relation ≥ such that for s, t ∈ T, there exists r ∈ T with r ≥ s, t. A family of algebras {B_s}_{s∈T} is called a net (sometimes a filtration) if s ≥ t implies B_s ⊇ B_t. The intuitive idea is that B_s contains all the information at hand at 'stage' s (which could be a timepoint, or a location in space, etc.). A set of random variables {X_s}_{s∈T} is called adapted (w.r.t. {B_s}_{s∈T}) if each X_s is measurable with respect to B_s.
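A net of algebras and an adapted family can be sketched with dyadic partitions. The set-up below is a hypothetical illustration: Ω is [0, 1) with uniform P, discretised into 1024 equally likely points, B_n is generated by the 2^n dyadic intervals, and X(ω) = ω²; the family E(X|B_n) is adapted by construction, and the first assertion checks the tower property of conditional expectation:

```python
import numpy as np

# Hypothetical discretised sketch: Omega = [0,1) with uniform P,
# represented by 2**10 equally likely points, and X(omega) = omega**2.
N = 1024
omega = (np.arange(N) + 0.5) / N
X = omega ** 2

def cond_exp(Z, n):
    """E(Z|B_n), where B_n is generated by the 2**n dyadic intervals:
    replace Z on each dyadic block by its average over that block."""
    out = np.empty_like(Z)
    for block in np.split(np.arange(N), 2 ** n):
        out[block] = Z[block].mean()
    return out

# Tower property: coarsening E(X|B_5) down to B_2 returns E(X|B_2).
assert np.allclose(cond_exp(cond_exp(X, 5), 2), cond_exp(X, 2))

# Refinement: as n grows, E(X|B_n) approximates X ever more closely.
errors = [np.abs(cond_exp(X, n) - X).max() for n in (2, 4, 6, 8)]
assert all(a > b for a, b in zip(errors, errors[1:]))
```

The second assertion is a finite shadow of the martingale convergence behaviour discussed next.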
The set {X_s}_{s∈T} is called a martingale if for each s, t such that s ≥ t, E(X_s|B_t) = X_t. For our purposes the following fundamental theorem is of interest.

Theorem 1. Let X be a random variable measurable with respect to B = σ(⋃_{s∈T} B_s).

1. The family of random variables {E(X|B_s)}_{s∈T} is adapted w.r.t. {B_s}_{s∈T} and is a martingale.
2. lim_{s∈T} E(X|B_s) = E(X|B) a.s.

(In many interesting cases, but not always, one has in addition that X = E(X|B) a.s.)

The preceding material suggests that the family of quantifiers considered by first order logic, {∃y | y a variable}, may be only one possibility among many interesting families of quantifiers. In section 9 we shall study one example in some detail, namely the logical analogue of a martingale. Another interesting example is furnished by the logical analogue of a Markov random field, a family of quantifiers that was implicitly used in the proof theory for generalised quantifiers presented in Alechina and van Lambalgen [1]. In slightly more formal detail: it can be shown that every filter quantifier is definable using a family of resource-bounded quantifiers which is the qualitative analogue of a Markov random field. This, however, is best left for a separate paper.

3 Examples: partiality and granularity

We proceed to give a few examples of conditional quantifiers, motivated by the idea of 'changes in granularity'. These examples play an important role in the logic of visual perception developed in [21] and [19], and the reader is encouraged to consult the papers cited for fuller information on their use. Here, we shall discuss the motivating examples mostly with reference to Hobbs' paper. We will return to vision in section 7, after we have developed the necessary technical apparatus.

3.1 Restricted signature

Let L be a first order language, L′ a sublanguage of L containing only the 'relevant' predicates of L.
A concrete case is furnished by visual occlusion: if I am looking at the back of someone's head, a predicate such as 'mouth' is not applicable, hence not part of my model of the situation, or, equivalently, not part of the language with which I describe that situation. Given a model M ('the real world'), we may then define an equivalence relation E on M by: E(a, b) iff for all formulas ϕ(x) in the language L′, M |= ϕ(a) iff M |= ϕ(b). Hobbs proposes to define a model N with domain the E-equivalence classes of elements of M, such that (for monadic predicates)

  N |= A([a]) iff M |= A(a).

By the choice of E, this is well-defined. However, apart from the fact that this does not generalise to predicates of higher arity, the truth definition does not capture the intuitive motivation. The trouble lies in the direction from left to right: if N |= A([a]) then M |= A(a). When planning a hike, a curve on the map (represented by a suitable equivalence class) may be a trail from x to y from the perspective of the map's scale, without being an actual trail (e.g. it may have been overgrown in part). A more intuitive truth definition is obtained by putting

  N |= A([a]) iff ∃b(E(a, b) & M |= A(b)).

Since E is an equivalence relation this is again well-defined, but the definition now better captures the uncertain nature of the inference from N to M.

In order to treat predicates of arbitrary arity, we can define an equivalence relation R on assignments by: R(f, g) iff for all formulas ϕ in the language L′, M |= ϕ[f] iff M |= ϕ[g]. R generates a quantifier[3] ∃R by

  M |= ∃Rϕ[f] iff ∃g(R(f, g) & M |= ϕ[g]).

If A is a predicate of L, ∃RA represents the set of tuples of R-equivalence classes corresponding to tuples satisfying A. ∃R thus defines a model N which can be viewed as a coarsening of M[4]. Let B be the algebra of L-definable sets of assignments, and let G be the subalgebra of B determined by L′. We will see later that ∃R is in a sense a quantifier conditional on G.
It will also become clear later that ∃R only has the right properties for a resource-bounded quantifier when N is finite, but at this stage ∃R is useful to fix ideas. The important point to remember is that the 'best estimate' of a predicate is represented by a quantifier. We could also use two quantifiers ∃ and ∀ corresponding to the best upper and best lower estimates, but in the Boolean case ∃ and ∀ will be dual.

[3] Quantifiers of this type (which satisfy the S5 axioms) were introduced by Halmos [6].
[4] The reader may observe that Hobbs's truth definition actually corresponds to using the dual quantifier ∀R when interpreting predicates.
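For monadic predicates, the pair of best upper and best lower estimates can be sketched on a tiny finite model. Everything below is a hypothetical illustration (the model, the predicate names "Large", "Small", "Trail", and the restricted signature are invented for the example): ∃R collects the points whose E-class may satisfy the predicate, while its dual ∀R collects those whose E-class must:

```python
# A hypothetical finite model M with monadic predicates; the restricted
# signature L' keeps only "Large"/"Small", so a and b form one E-class
# (both Large), and c is alone in its class.
M = {"a": {"Large", "Trail"}, "b": {"Large"}, "c": {"Small", "Trail"}}
L_prime = {"Large", "Small"}

def E(x, y):
    """x and y indistinguishable in L' (here: same L'-predicates hold)."""
    return M[x] & L_prime == M[y] & L_prime

def best_upper(pred):
    """∃R: points whose E-class contains SOME element satisfying pred."""
    return {x for x in M if any(E(x, y) and pred in M[y] for y in M)}

def best_lower(pred):
    """∀R (= dual of ∃R): points whose E-class satisfies pred THROUGHOUT."""
    return {x for x in M if all(pred in M[y] for y in M if E(x, y))}

# a's class {a, b} may be a trail (a is one), so b is in the upper
# estimate even though b itself is not a trail in M; only c's class
# consists entirely of trails.
assert best_upper("Trail") == {"a", "b", "c"}
assert best_lower("Trail") == {"c"}
```

The gap between the two estimates is exactly the uncertainty of the inference from the coarse model N back to M discussed above; Hobbs's original truth definition corresponds to the lower estimate.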