Conditional quantification, or poor man’s
probability
Michiel van Lambalgen∗
ILLC
University of Amsterdam
Plantage Muidergracht 24
NL 1018 TV Amsterdam
michiell@wins.uva.nl
June 25, 1999
1 Introduction
The purpose of this note is to take a few steps toward a first order logic of
incomplete information, based on the idea that such a logic may be obtained by
importing an essential ingredient of probability theory, conditional expectation,
into the logic. The reason that we focus on conditional expectation is that it
is often used to model a kind of incomplete knowledge that is of interest in
itself: incomplete knowledge about the value of a variable, where the ‘degree of
incompleteness’ is given by some algebraic structure. Conditional expectation
often comes into play when what can be described as a change of granularity
is involved. Formal details will be given below, but a typical example is this.
Let Ω be a sample space equipped with a second countable Hausdorff topology,
and let X : Ω −→ IR be a random variable. Think of X as representing some
measurement apparatus. If B is the Borel σ-algebra on Ω, then the fact that
singletons are in B represents the assumption that an outcome can be observed
with arbitrary precision. In practice, however, it may be impossible to observe
an outcome X(ω) with arbitrary precision. For instance, the best we can do may
be to locate X(ω) in some element of a partition of IR into intervals of length
ǫ. Let B′ be the σ-algebra generated by those intervals. Then G = X−1B′ is
strictly contained in B. The conditional expectation E(X|G) is in a sense X
viewed from the perspective of G; the values of X are averaged over an element
∗I am indebted to Samson Abramsky, Alan Bundy, Jaap van der Does, Rosalie Iemhoff,
Alekos Kechris, Daniele Turi and Marco Vervoort for helpful comments. Support by the
Netherlands Organisation for Scientific research (NWO) under grant PGS 22 262 is gratefully
acknowledged. This paper was begun in Amsterdam and finished at HCRC in Edinburgh. I
thank Keith Stenning for his friendship and hospitality, and the EPSRC for financial support.
1
in the generated partition on Ω. This is brought out by the defining condition
of E: E(X|G) is a G-measurable random variable such that for A in G,
∫
A
E(X|G)dP =
∫
A
XdP.
Incomplete information is not always a negative qualification. In some cases
one wants to expressly reduce the information present in a signal in order to
make that information more useful. This happens for instance when noise is
suppressed in an audio signal. Moving to a coarser granularity can thus be
beneficial as well. For our present purposes it is of interest that this process of
filtering is also modelled by means of conditional expectation.
These two aspects of changes in granularity occur as well in qualitative
reasoning with incomplete information. In an interesting programmatic paper,
Hobbs [8] emphasises the positive aspects of switching to coarser granularities:
Our ability to conceptualize the world at different granularities
and to switch among these granularities is fundamental to our in-
telligence and flexibility. It enables us to map the complexities of
the world around us into simple theories that are computationally
tractable to reason in.
A few years earlier, Marr, in his book Vision, emphasised the same point while
sketching a program for semantics (Marr [12, p. 357–8])
I expect that at the heart of our understanding of intelligence will
lie at least one and probably several important principles abour orga-
nizing and representing knowledge that in some sense capture what
is important about our intellectual capabilities. . . . The perception
of an event or an object must include the simultaneous computation
of several different descriptions of it that capture different aspects of
the use, purpose or circumstances of the event or object. . . . The var-
ious descriptions include coarse versions as well as fine ones. These
coarse descriptions are a vital link in choosing appropriate overall
scenarios . . . and in correctly establishing the roles played by the
objects and actions that caused those scenarios to be chosen.
In fact, natural language even has adverbial expressions, such as ‘actually’,
‘really’, which indicate such shifts. Here is an example due to Asher and Vieu
The point of this pencil is actually an irregular surface with several
peaks.
It would be useful to have a formal apparatus capturing the process of simulta-
neously viewing the world at different scales; the claim of this paper is that one
way to approach this problem is via a suitable notion of generalised quantifica-
tion.
We are thus motivated by the following analogy. Moving to a larger grain
size is analogous to taking conditional expectation, and it will be shown that
2
this process can be captured by a quantifier which shares many formal proper-
ties with conditional expectation. We will investigate several ways of defining
grain size with their associated conditioning structures and quantifiers. An im-
portant (and largely open) question that arises here is: when and why can we
be confident that a result (for instance a plan) obtained in a coarse world also
applies to the real, or at least a finer world? This leads to the study of a type
of preservation theorems which are analogous to the martingale convergence
theorems of probability theory. The ideas sketched above have been applied to
the development of a logic of visual perception in a series of papers [21], [19]
and [20]; the present article intends to provide some theoretical complements.
To further clarify the point of departure of the present article it is useful to
contrast it with the aims of Keisler’s probability logic (cf. Keisler [10]). Keisler
presents an infinitary axiomatisation of such concepts as probability measure,
integral and conditional expectation, and proves the completeness of the ax-
iomatisation with respect to the usual probability spaces. By contrast, we look
at what conditional expectation is used for in probability theory, try to iden-
tify situations of a more qualitative nature where approximately the same use
is called for, and define qualitative analogues of conditional expectation which
can perform these functions. There appears to be a certain logical and philo-
sophical interest in this procedure, because it touches upon the old problem of
the relation between logic and probability. It is our contention that logic and
probability are alike in that their fundamental concepts, quantification and con-
ditional expectation, are basically the same. Indeed, it has been observed several
times that algebraically speaking the existential quantifier and conditional ex-
pectation are in the same class of operators on a Boolean algebra, namely the
hemimorphisms1 α satisfying the averaging identity α(p ∧ αq) = αp ∧ αq. (See,
for example, Wright [24],[25] and the references given therein.) Furthermore
Ellerman and Rota [4] showed that existential quantification ∃x is conditional
expectation with respect to the algebra determined by projection along x and a
suitably generalised notion of measure, which is such that even on uncountable
models, every subset of the domain has positive measure.
What remains to be done is to further exploit the analogy and to generalise
existential quantification in such a way that the very special algebra used to
define ∃x (namely, the algebra of the sets of assignments obtained by projection
along x) can be replaced by arbitrary Boolean algebras, or even more general
structures. In view of the analogy with conditional expectation, the new quan-
tifiers will be called conditional quantifiers. It will be very useful to think of
these quantifiers as resource-bounded quantifiers: the resource is some algebraic
structure which determines the granularity of the available information with
respect to which one quantifies. For example, put in these terms quantification
∃xϕ means that we have no information about x, but full information about
the other variables.
The remainder of the paper is organised as follows. We shall first, in sec-
1A hemimorphism on a lattice L with 1 is a mapping α : L −→ L satisfying α(a ∨ b) =
α(a) ∨ α(b) and α1 = 1.
3
tion 2, briefly rehearse theory and applications of conditional expectation in so
far as relevant for our purposes. We then look at several examples of shifts of
granularity, and show that they call for different types of resources (section 3).
In section 4 the resources considered are Boolean algebras, and we try to retain
as many of the properties of the existential quantifier as possible. Although we
prove an infinitary completeness theorem here, the results are not very satis-
factory, and in the next section (6) we consider, along with Boolean resources,
a modified notion of quantifier, namely a quantifier ∃ which lacks the property
ϕ ≤ ∃ϕ. These quantifiers, which bear a stronger resemblance to conditional
quantification then the existential quantifier itself, have a smoother theory, al-
though at some points we need the continuum hypothesis to get things off the
ground. In section 8 we consider resources which do not have the structure
of a Boolean algebra, notably Lawvere’s co-Heyting algebras (Lawvere [11]).
These algebras naturally arise in situations where there is a distinction between
positive and negative information. Again we supply an infinitary axiomatisa-
tion with a completeness theorem. Lastly, we return to our original motivation,
conditional quantification as capturing shifts of granularity, which leads us to
define a logical analogue of the martingales familiar from probability theory.
We show that there exists a natural relation between nonmonotonic reasoning
and martingale convergence, and we close with a conjecture on the precise form
of martingale convergence in the logical setting.
2 Conditional expectation
In this section we give a very brief introduction to the fundamental properties of
conditional expectation. The reader wishing to know more is advised to consult
any introductory treatise on measure theoretic probability, a beautiful specimen
of which is Williams’ Probability with martingales [23]. We begin with a simple
example. Suppose we have a variable X on a sample space Ω which takes values
0 and 1 both with probability 12 . Let Ai be the subset of Ω on which X takes
value i, then P (Ai) =
1
2 . Suppose furthermore that we cannot measure X
directly, but can only measure X +Y , where Y is some small perturbation. Let
Bi ⊆ Ai be such that P (B0) = ǫ 6= P (B1) = δ, where ǫ, δ ≪
1
2 ; Y takes value
1
2
on B0, value −
1
2 on B1 and 0 elsewhere. Then X +Y takes value 0 on A0−B0,
value 1 on A1 − B1 and value
1
2 on B0 ∪ B1. The smallest Boolean algebra B
which contains A0−B0, A1−B1 and B0∪B1 does not contain A0 and A1. The
algebra B represents the situation that we have incomplete information about
X, and precisely codes the kind of information that we do have available about
X, namely X + Y , since B is the smallest algebra with respect to which X + Y
is measurable. If we can measure only X + Y , not X, this implies that all
information about X is represented by the function E(X|B), which takes value
0 on A0 −B0, value
1
2 − δ on A1 −B1, and value δ on B0 ∪B1. It follows that
we have no information about the event X > 0 (since A0 /∈ B, but our best
estimate for the probability of this event is given by P (E(X|B) > 0) = 12 + ǫ.
The next example is more realistic example, with continuous noise. Suppose
4
we have a random variable X on a sample space Ω (measurable with respect to
a σ-algebra B) which we want to measure; for the sake of definiteness, assume
X is distributed as the Gaussian N(0, σ2). Due to noise, we can only observe
X + c.ξ, where ξ is independent of X, ξ is distributed as the Gaussian N(0, 1)
and c ∈ IR. X + c.ξ can be taken to be measurable with respect to a σ-algebra
G which is not the same as B. For example, the event {X < 0} ∈ B is not in G if
G is the smallest σ-algebra with respect to which X + c.ξ is measurable. Hence
we can only determine properties of X as filtered through G; this is represented
by the conditional expectation E(X|G). It is not possible to determine precisely
which sample point ω has been chosen. We can only ask whether ω ∈ A for
A ∈ G. Then the expected value of X given such information, namely
∫
A
XdP ,
should equal
∫
A
E(X|G)dP ; from the point of view of G, no other questions
about X can be answered. This leads to the following
Definition 1 Let B be a σ-algebra on Ω. A conditional expectation of a B-
measurable random variable X : Ω −→ IR with respect to the sub-σ-algebra G
and the probability measure P is a function E(X|G) : Ω −→ IR satisfying
∫
A
E(X|G)dP =
∫
A
XdP.
The Radon-Nikodym theorem is used to show that a function E(X|G)(ω) with
these properties exists. However, it is not unique, in the sense that for given X
two versions of E(X|G) only agree almost everywhere2. In the following list of
properties of E(X|G), the (in)equalities must therefore be understood as holding
almost everywhere:
1. E(1∅|G) = 0, E(1Ω|G) = 1;
2. X ⊆ Y implies E(1X |G) ≤ E(1Y |G); and
3. E(X ·E(Y |G)|G) = E(X|G) ·E(Y |G).
All three properties are strongly suggestive of quantifier properties, as has
often been remarked in the literature; see, for example, Birkhoff’s survey paper
[3]. However, conditional expectation as constructed above by means of the
Radon-Nikodym theorem is not yet quite analogous to quantification, since the
construction is not uniform. For example, for each X,Y there is a nullset N
such that off N , 3 holds; but there need not be a nullset which does the job
for every X,Y . The required uniformity is given by the following definition.
For ease of exposition, we formulate it in terms of conditional probability P(·|G)
determined by
P(A|G) = E(1A|G).
Definition 2 A regular conditional probability is a function P(F |G)(ω) : B ×
Ω −→ [0, 1] satisfying
2The analogues of conditional expectation in section 6 suffer from this non-uniqueness as
well.
5
1. for each F ∈ B, P(F |G)(·) = E(1F |G) a.s.
2. for almost every ω, the map F 7→ P(F |G)(ω) is a probability measure on
B.
The last condition ensures that the additivity and monotonicity properties hold
in a uniform manner, in the sense that there exists a single nullset such that
outside this nullset these properties hold. Regular conditional probabilities do
not always exist, but we need not inquire into the conditions for existence here.
It is important to observe that also in this case the function P(F | G)(ω) is not
completely determined by G, and that essential reference is made to nullsets.
In probability one often considers families of conditional expectation opera-
tors. Since this suggests new ways of looking at quantifiers, we include a brief
description. Let T be a directed set, i.e. a set partially ordered by a relation
≥ such that for s, t ∈ T , there exists r ∈ T with r ≥ s, t. A family of algebras
{Bs}s∈T is called a net (sometimes filtration) if s ≥ t implies Bs ⊇ Bt. The
intuitive idea is that Bs contains all the information at hand at ‘stage’ s (which
could be a timepoint, or a location in space etc.). A set of random variables
{Xs}s∈T is called adapted (w.r.t. {Bs}s∈T ) if each Xs is measurable with re-
spect to Bs. The set {Xs}s∈T is called a martingale if for each s, t such that
s ≥ t, E(Xs|Bt) = Xt. For our purposes the following fundamental theorem is
of interest.
Theorem 1 Let X be a random variable measurable with respect to B =
⋃
s∈T Bs.
1. The family of random variables {E(X | Bs)}s∈T is adapted w.r.t. {Bs}s∈T
and is a martingale.
2. lims∈T E(X | Bs) = E(X | B) a.s.. (In many interesting cases, but not
always, one has in addition that X = E(X | B) a.s..)
The preceding material suggests that the family of quantifiers considered by
first order logic, {∃y | y a variable}, may be only one possibility among many
interesting families of quantifiers. In section 9 we shall study one example in
some detail, namely the logical analogue of a martingale. Another interesting
example is furnished by the logical analogue of a Markov random field, a family
of quantifiers that was implicitly used in the proof theory for generalised quan-
tifiers presented in Alechina and van Lambalgen [1]. In slightly more formal
detail: it can be shown that every filter quantifier is definable using a family
of resource-bounded quantifiers which is the qualitative analogue of a Markov
random field. This however is best left for a separate paper.
3 Examples: partiality and granularity
We proceed to give a few examples of conditional quantifiers, motivated by the
idea of ‘changes in granularity’. These examples play an important role in the
6
logic of visual perception developed in [21] and [19], and the reader is encouraged
to consult the papers cited for fuller information on their use. Here, we shall
discuss the motivating examples mostly with reference to Hobbs’ paper. We will
return to vision in section 7, after we have developed the necessary technical
apparatus.
3.1 Restricted signature
Let L be a first order language, L′ a sublanguage of L containing only the
‘relevant’ predicates of L. A concrete case is furnished by visual occlusion: if
I am looking at the back of someone’s head, a predicate such as ‘mouth’ is
not applicable, hence not part of my model of the situation, or, equivalently,
not part of the language with which I describe that situation. Given a model
M (‘the real world’), we may then define an equivalence relation E on M by
E(a, b) iff for all formulas ϕ(x) in the language L′, M |= ϕ(a) ⇐⇒M |= ϕ(b).
Hobbs proposes to define a model N with domain the E-equivalence classes
of elements of M, such that (for monadic predicates) N |= A([a]) iff M |= A(a).
By the choice of E, this is welldefined. However, apart from the fact that this
does not generalise to predicates of higher arity, the truth definition does not
capture the intuitive motivation. The trouble lies in the direction from left to
right: if N |= A([a]) then M |= A(a). When planning a hike, a curve on the
map (represented by a suitable equivalence class) be a trail from x to y from
the perspective of the map’s scale, without being an actual trail (e.g. it may
have been overgrown in part). A more intuitive truth definition is obtained
by putting N |= A([a]) iff ∃b(E(a, b)&M |= A(b)). Since E is an equivalence
relation this is again welldefined, but the definition now better captures the
uncertain nature of the inference from N to M.
In order to treat predicates of arbitrary arity, we can define an equivalence re-
lationR on assignments byR(f, g) iff for all formulas ϕ in the language L′, M |=
ϕ[f ] ⇐⇒M |= ϕ[g]. R generates a quantifier3 ∃R by
M |= ∃R[f ] ⇐⇒ ∃g(R(f, g) & M |= ϕ[g]).
If A is a predicate of L, ∃RA represents the set of tuples of R-equivalence classes
corresponding to tuples satisfying A. ∃R thus defines a model N which can be
viewed as a coarsening of M4.
Let B be the algebra of L-definable sets of assignments, and let G be the
subalgebra of B determined by L′. We will see later that ∃R is in a sense a
quantifier conditional on G. It will also become clear later that ∃R only has the
right properties for a resource-bounded quantifier when N is finite, but at this
stage ∃R is useful to fix ideas. The important point to remember is that the
‘best estimate’ of a predicate is represented by a quantifier. We could also use
two quantifiers ∃ and ∀ corresponding to best upper and best lower estimate,
but in the Boolean case ∃ and ∀ will be dual.
3Quantifiers of this type (which satisfy the S5 axioms) were introduced by Halmos [6].
4The reader may observe that Hobbs’s truth definition actually corresponds to using the
dual quantifier ∀R when interpreting pred
本文档为【英文版-Conditional Quantification(or:Poor Mans Probability)】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。