J. Parallel Distrib. Comput. 68
Data-stream-based global event moni
Punit Chandra, Ajay D
Department of Computer Science, University of Illino
fo
ne 4
ute
ed
rac
ape
pro
s p
communication structures to determine which of the intervals being examined at any time may never satisfy the stipulated interaction type for that
pair of processes, and therefore that interval(s) need no longer be considered as forming a part of any solution. Based on this theory, the paper
proposes two on-line distributed algorithms to solve the problem.
c© 2008 Elsevier Inc. All rights reserved.
Keywords: Causality; Data streams; Data fusion; Distributed system; Global state; Interactions; Intervals; Snapshot; State observation
1. Introduction
A global state is represented by a collection of local
states, one from each process. The problem of global state
observation in a consistent manner is fundamental to distributed
systems, as identified by Chandy’s and Lamport’s seminal
paper on recording global states [14]. Furthermore, the
problem of monitoring and analyzing data streams is becoming
increasingly important, particularly for crisis management
(see [13,38]). Often, data streams have to be analyzed on-line
[1,2] under different constraints and modalities. For example,
I Some portions of this paper have appeared in [P. Chandra, A.D.
Kshemkalyani, Detection of orthogonal interval relations, in: Proceedings 9th
International High Performance Computing Conference (HiPC), in: Lecture
Notes in Computer Science, vol. 2552, Springer-Verlag, 2002, pp. 323–333;
P. Chandra, A.D. Kshemkalyani, Analysis of interval-based global state
much research focuses on data streams in sensor networks [19,
31,32], and in generic middleware roles [30,36].
In this paper, we consider applications that require the con-
sistent observation and continuous processing of data in the data
streams. The consistent observation of the global state [14,24]
requires observing the causality constraints among the events in
the execution [28]. This paper generalizes the problem of global
state observation to executions where the “event” at a process
is a high-level abstract event that can contain multiple events
that span across a time interval. The interval has the property
that a predicate defined on local variables is always true in the
interval and is false immediately before and immediately af-
ter the interval. The semantics of the interval depend on the
predicate which is application-specific [17,18,20,22,25,27,29];
application areas such as sensor networks, distributed debug-
ging, deadlock characterization [26], predicate detection [7,11,
12], checkpointing [3–5,33,34], and industrial process control
Received 14 November 2006; received in revised
Available onli
Abstract
The problem of global state observation is fundamental to distrib
distributed systems can be analyzed in terms of the building block form
causality-based pairwise interactions by which two processes may inte
pair of processes (Pi , P j ), let interaction type ri, j be of interest. This p
is specified in terms of such pairwise interaction types, one per pair of
a global state in which the interaction type specified for each proces
detection, in: Proceedings 2nd Int. Conference on Distributed Computing and
Internet Technology (ICDCIT), in: Lecture Notes in Computer Science, vol.
3816, Springer, 2005, pp. 203–216; P. Chandra, A.D. Kshemkalyani, Global
state detection based on peer-to-peer interactions, in: Proceedings IFIP Int.
Conference on Embedded and Ubiquitous Computing (EUC), in: Lecture Notes
in Computer Science, vol. 3824, Springer, 2005, pp. 560–571].
∗ Corresponding author.
E-mail address: ajayk@cs.uic.edu (A.D. Kshemkalyani).
0743-7315/$ - see front matter c© 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2008.01.006
(2008) 729–751
www.elsevier.com/locate/jpdc
toring using pairwise interactionsI
. Kshemkalyani∗
is at Chicago, Chicago, IL 60607, United States
rm 16 October 2007; accepted 23 January 2008
March 2008
d systems and to the analysis of data streams. Many interactions in
by the pairwise interactions of intervals at two processes. Considering
t with each other, there are 40 orthogonal interaction types. For each
r examines the problem: “If a global state of interest to an application
cesses, how can such a global state be detected?” A solution identifies
air is satisfied. This paper formulates the specific conditions on the
model such intervals. Such high-level abstract events at pro-
cesses and the corresponding time intervals that they span have
been explicitly studied [18,20,21,29].
Fig. 1(a) shows a consistent global state GS2 and an
inconsistent global state GS1 in a distributed execution where
events are modeled as atomic send, receive, or internal events,
ra
te.
Relations [6,22].
Problem
processes
identify t
that each
Devisi
DOOR i
at differe
relations
states. In
that can b
distribute
1) which
pair of
ntified as
), gives a
usality in
identified
R (also
[7]. The
DOOR: Given a relation ri, j from R for each pair of
Pi and Pj , devise a distributed on-line algorithm to
he intervals, if they exist, one from each process, such
relation ri, j is satisfied by the (Pi , Pj ) process pair.
ng an efficient on-line algorithm to solve problem
s a challenge because of having to track the intervals
nt processes and to determine the pairwise orthogonal
in a search space of an exponential number of global
this paper, we first identify the underlying principles
4. The process of devising the principle (Theorem
determines whether at least one interval in any
intervals being examined at any time can be ide
never forming a part of any solution (Lemma 4
deeper insight into the nature of reasoning with ca
a distributed execution. Schwarz and Mattern have
this as an important problem [37].
A centralized algorithm to solve problem DOO
referred to as problem Fine Rel,) was given in
730 P. Chandra, A.D. Kshemkalyani / J. Pa
Fig. 1. (a) Event positions determine the consistency of a global sta
as per the traditional model of a distributed execution [28].
Fig. 1(b) shows high-level abstract events that may contain
multiple events and span across a time interval. The intervals
are shown as a rectangle. An application-specific predicate
defined at each process becomes true at the start of each interval
shown and it becomes false at the end of each interval shown.
Intervals X , Y , and Z occur at Pi , Pj , and Pk , respectively.
Observe that if interval Y were to begin at event a marked by
a vertical line, the inherent causality-based interaction type of
Y with respect to Z as well as with respect to X would be
different from what is shown. Thus, the relative placement of
one interval with respect to another defines different interaction
types between the interval pair.
It has been observed that causality-based interactions
in distributed executions can be analyzed in terms of the
building block formed by point-to-point pairwise interactions
of intervals at two processes [20]. A detailed analysis of the
causality-based pairwise interactions by which two processes
may interact with each other identified 29 (40) causality-
based orthogonal interaction types, denoted as R, under the
dense (and non-dense) time model, respectively [20]. The
orthogonality of the interaction types in R implies that no
interaction type can be expressed in terms of the other
interaction types. Each interaction type in R between a pair of
intervals is essentially a relationship between the two intervals.
Hence, the interaction types in R are also interchangeably
termed as relations. For each pair of processes (Pi , Pj ), let
interaction type ri, j be of interest. This paper examines the state
detection problem: “If a global state of interest to an application
is specified in terms of such pairwise interaction types, one per
pair of processes, how can such a global state be detected?”
For any relationship r ∈ R, let ri, j (X i , Y j ) denote that r holds
for interval X i at process Pi and interval Y j at process Pj .
The above state detection problem is formally formulated as
the following problem DOOR for the Detection of Orthogonal
e used to solve problem DOOR. We then propose two
d algorithms to solve the problem.
llel Distrib. Comput. 68 (2008) 729–751
(b) Interval positions define the interaction types in a global state.
Summary of results and contributions:
1. To devise any efficient solution to problem DOOR, this
paper formulates specific conditions on the structure of the
causal communication patterns to determine which of two
intervals being examined from processes Pi and Pj may
never satisfy ri, j , and therefore that interval will never form
part of any solution and should no longer be considered. This
result is embodied as:
• a basic principle that we prove in Theorem 1 — the main
result, and
• Lemma 4 — a useful lemma derived from the above
theorem, that we will use to efficiently manage the
distributed data structures in solving problem DOOR.
Any algorithms to solve DOOR can leverage this principle.
2. The paper proposes two distributed on-line algorithms to
solve problem DOOR, based on the above formulated
conditions to determine when an interval no longer needs
to be considered as a candidate for a solution. Let n be the
number of processes, ms be the total number of messages
sent, mr be the total number of messages received, and p
be the maximum number of intervals at any process. For
unicast communication, ms = mr . The first distributed
algorithm uses O(min(np, 2ms+2mr )) number of messages
with a message size of O(n2). The second algorithm uses
O(n · min(np, 2ms + 2mr )) number of messages with a
message size of O(n). For both the algorithms, the total
space complexity across all the processes is min(4n2 p −
2np, 6msn+4mrn), and the total time complexity across all
processes is O(n · min(np, 2ms + 2mr )). The performance
of the algorithms is compared in Table 1.
3. Global state observation and predicate detection are
fundamental problems in distributed systems. This paper
provides an understanding of interval-based global states in
terms of the causal communication patterns induced by the
message-passing interactions in an execution [25].
algorithm was presented without any formal discussion
or analysis of the theoretical basis — embodied here in
el
ms
n +
m
4
))
2
2m
gives the theory used to determine which of two given intervals at each process are the durations during which an application-
at different processes can never be part of a solution set.
Section 4 shows how to track intervals. Sections 5 and 6 present
the distributed algorithms to solve Problem DOOR based on
the results derived in Section 3. Section 7 gives an extended
specification of DOOR. Section 8 gives concluding remarks.
2. System model and background
System model. We assume an asynchronous distributed system
in which n processes communicate by reliable message
passing over logical FIFO channels [20,29]. The execution is
modeled as (E,≺), where ≺ is an irreflexive partial ordering
specific local predicate is true. See Fig. 2. An interval begins
when the predicate becomes true, and ends when/just before the
predicate becomes false. Each interval defines an abstract event
of coarser granularity at a process, as studied by Lamport [29],
Helary et al. [18], and Kshemkalyani [27]. Such higher-level
abstract events can be used to identify a global state [27].
For example, in the context of checkpointing [4,5,15,33,34],
ψi can be used to describe the kth checkpoint interval at Pi .
To characterize deadlock [26,25], ψi denotes the interval from
the time of incoming wait-for dependency to the time of an
outgoing wait-for dependency, or vice versa; such intervals at
different processes capture how dependency chains grow. For
P. Chandra, A.D. Kshemkalyani / J. Parall
Table 1
Summary of space, time, and message complexities
Metric Algorithm 1
Total space complexity O(min(2np(2n − 1), 6
Space per process (worst case) O(min(4np − 2p, 4ms
Total time complexity O(n ·min(np, 2ms + 2
Time per process (worst case) O(n ·min(10p, 20ms +
Total number of messages O(min(np, 2ms + 2mr
Message size (average case) O(n2)
Message size (worst case) O(n ·min(2(n − 1)p +
Total message space O(n2 ·min(np, 2ms +
Fig. 2. Intervals at processes. Messages and send and receive events are not
shown to simplify the diagram.
Theorems 1–3 and Lemma 4. That paper also showed how
to perform predicate detection in the context of traditional
modalities on predicates [12]. This paper gives the theory
behind the results of [7] and then gives two distributed
algorithms for DOOR. Distributed algorithms are more elegant
than centralized algorithms and do not require a central server.
Distributed algorithms result in a better workload and space
complexity distribution as compared to a centralized one.
The space complexity and the time complexity get distributed
linearly with the number of processes. Thus, a distributed
algorithm is more scalable than a centralized algorithm. Also,
the network traffic gets distributed more uniformly in a
distributed algorithm compared to the centralized approach
where a traffic bottleneck gets created at the central process.
Preliminary versions of this paper appeared in [6,8,9].
Section 2 gives the system model and background. Section 3
representing the causality or the “happens before” relation [28]
on the event set E .
Distrib. Comput. 68 (2008) 729–751 731
Algorithm 2
n + 4mrn)) O(min(2np(2n − 1), 6msn + 4mrn))
2ms )) O(min(4np − 2p, 4msn + 2ms ))
r )) O(n ·min(np, 2ms + 2mr ))
mr )) O(n ·min(10p, 20ms + 4mr ))
O(n ·min(np, 2ms + 2mr ))
O(n)
n, 2ms + 2n)) O(min(2(n − 1)p + 2n, 2ms + 2n))
r )) average O(n2 ·min(4np − 2p, 6ms + 4mr ))
Definition 1 (Causality or the “happens before” Relation ≺).
Event e happens before event e′, denoted e ≺ e′, if:
1. event e occurs before event e′ at the same process, or
2. event e is the send of a message m and event e′ is the receive
of m, or
3. there is an event e′′ such that e happens before e′′ and e′′
happens before e′.
E is partitioned into local executions at each process. Let Ei
denote the linearly ordered set of events executed by process
Pi . An event e executed by Pi is denoted ei and may be of three
types — an internal event, a send event, or a receive event. N
denotes the set of all processes. A cut C is a subset of E such
that if ei ∈ C then (∀e′i ) e′i ≺ ei H⇒ e′i ∈ C . A consistent
cut is a downward-closed subset of E , i.e., if e ∈ C then
(∀e′) e′ ≺ e H⇒ e′ ∈ C . A consistent cut denotes an execution
prefix. Two special consistent cuts ↓ e and e ↑ can be defined
for any event e.
Definition 2 (Past and Future Cuts). For any event e, cut ↓e is
the set of events {e′|e′ ≺ e} that happen before e. Cut e↑ is the
set of events {e′|e′ 6� e}⋃{ei , i = 1, . . . , n | ei � e∧ (∀e′i ≺
ei , e′i 6� e)} up to and including the earliest events at each
process for which e happens before the events.
The system state after the events in a cut is a global state
[14]; if the cut is consistent, the corresponding system state
is a consistent global state. We assume that vector clocks are
available [16,35].
Intervals. As introduced in Section 1, the intervals of interest
conjunctive predicate detection [7,12], ψi is the predicate on
the local variables that the application wants to detect.
ra
ec
, Y
−1
IC (= IV ) 0 0 1 1 1 0 0 0 0 0 0 0
ID (= IX−1) 0 0 1 1 1 1 0 1 0 1 0 0
ID′(= IU−1) 0 0 1 1 0 1 0 1 0 1 0 1
IE (= IW−1) 0 0 1 1 1 1 0 0 0 1 0 0
IE′(= IT−1) 0 0 1 1 0 1 0 0 0 1 0 1
IF (= IS−1) 0 1 1 1 0 1 0 0 0 1 0 1
IG (= IG−1) 0 0 0 0 1 0 0 0 0 0 1 0
IH (= IK−1) 0 0 0 1 1 0 0 0 0 0 1 0
II (= IJ−1) 0 1 0 1 0 0 0 0 0 0 1 0
IL (= IO−1) 0 0 0 1 1 1 0 1 0 1 0 0
IL′ (= IP−1) 0 0 0 1 0 1 0 1 0 1 0 1
IM (= IM−1) 0 0 0 1 1 0 0 0 0 1 1 0
IN (= IM′−1) 0 0 0 1 1 1 0 0 0 1 0 0
IN′ (= IN′−1) 0 0 0 1 0 1 0 0 0 1 0 1
ID′′ (= (IUX)−1) 0 0 1 1 0 1 0 1 0 1 0 0
IE′′ (= (ITW)−1) 0 0 1 1 0 1 0 0 0 1 0 0
IL′′ (= (IOP)−1) 0 0 0 1 0 1 0 1 0 1 0 0
IM′′ (= (IMN)−1) 0 0 0 1 0 0 0 0 0 1 1 0
IN′′ (= (IMN′)−1) 0 0 0 1 0 1 0 0 0 1 0 0
IMN′′ (= (IMN′′)−1) 0 0 0 1 0 0 0 0 0 1 0 0
The upper part gives the 29 interaction types for dense time. The lower part gives 11 additional interaction types for non-dense time.
In our discrete event system model, an interval X i at process
Pi is identified by the (totally ordered) subset of adjacent events
of Ei , beginning from the event that makes the predicate true
up to the event that precedes the event that makes the predicate
false. Intervals are denoted by capitals such as X , Y , and Z . The
subscripts are omitted when not necessary or when the context
is clear. Lower-case alphabet x denotes an individual event in
an abstract event X .
Definition 3 (Interval). For a predicate ψi defined on process
Pi , an interval X i (ψi ) ⊆ Ei , satisfies the following.
• ∀ei ∈ Ei , if min(X i ) ≺ ei ≺ max(X i ) then ei ∈ X i
• ψi becomes true at min(X i ) and becomes false at
next(max(X i ))
• ∀xi such that min(X i ) ≺ xi ≺ max(X i ), xi does not falsify
ψi .
The execution history at Pi is 〈s0i , e1i , s1i , e2i , s2i . . . eki ,
k k k
true. ψi is application-specific and so we henceforth refer to
intervals without using this parameter.
Orthogonal interaction types/relationships. There are 29 or 40
possible mutually orthogonal ways in which any two durations
can be related to each other, depending on whether the dense
or the non-dense time model is assumed [20]. Informally, with
dense time, ∀z1, z2 in interval Z , z1 ≺ z2 H⇒ ∃z ∈ Z | z1 ≺
z ≺ z2. These orthogonal interaction types were identified
by first using the six dependent relations defined in the first
two columns of Table 2. Relations R1 (strong precedence), R2
(partially strong precedence), R3 (partially weak precedence),
R4 (weak precedence) define causality conditions; S1 and S2
define coupling conditions. The tests in the third column are
explained later in this section.
• (Dense time:) The 29 orthogonal interaction types between
a pair of intervals are given in the upper part of Table 3.
For any intervals X and Y , the orthogonal interaction types
732 P. Chandra, A.D. Kshemkalyani / J. Pa
Table 2
Dependent relations for interactions between intervals [20]
Relation r Expression for r(X, Y )
R1 ∀x ∈ X∀y ∈ Y, x ≺ y
R2 ∀x ∈ X∃y ∈ Y, x ≺ y
R3 ∃x ∈ X∀y ∈ Y, x ≺ y
R4 ∃x ∈ X∃y ∈ Y, x ≺ y
S1 ∃x ∈ X∀y ∈ Y, x 6� y∧ y 6� x
S2 ∃x1, x2 ∈ X∃y ∈ Y, x1 ≺ y ≺ x2
The second column defines the relations. The third column gives the tests using v
Table 3
Definitions of the 40 orthogonal interaction types inR [20]
Orthogonal interaction type (on intervals X and Y ) Dependent relation r(X
R1 R2 R3
IA (= IQ−1) 1 1 1
IB (= IR−1) 0 1 1
si , . . .〉, where si is the state after event ei . So the definition of
X i attempts to capture the time duration in which ψi remains
llel Distrib. Comput. 68 (2008) 729–751
Test for r(Xi , Y j ) using vector timestamps
V−j (Y j )[i] ≥ V+i (Xi )[i]
V+j (Y j )[i] ≥ V+i (Xi )[i]
V−j (Y j )[i] ≥ V−i (Xi )[i]
V+j (Y j )[i] ≥ V−i (Xi )[i]
∃x0 ∈ Xi : V−j (Y j )[ j] 6≤ V x
0
i [ j] ∧ V x
0
i [i] 6≤ V+j (Y j )[i]
∃y0 ∈ Y j : V+i (Xi )[ j] 6< V y
0
j [ j] ∧ V y
0
j [i] 6< V−i (Xi )[i]
tor timestamps.
) Dependent relation r(Y, X)
R4 S1 S2 R1 R2 R3 R4 S1 S2
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
listed in the first column are specified using boolean vectors
of length 12 on the dependent relations R1–R4 and S1–S2
lel
es
IC(X i , Y j ), IA(Zk, Y j ), and IX(Zk, X i ), or alternately, in
t
O
s
A
s
a
I
a
E
i
r
a
i
v
c
t
U
e
theorem in the form of Lemma 4 will be used in practice by
erms of inverses, IV(Y j , X i ), IQ(Y j , Zk), and ID(X i , Zk).
bserve using Fig. 3 that the intervals in Fig. 1(b) satisfy this
pecification on the interval-based global state.
nother example specification of DOOR: Detect a global
tate satisfying IE(X i , Y j ), IP(Zk, Y j ), and IX(Zk, X i ),
本文档为【Data Stream Based Global Event Monitoring Using Pairwise Interactions】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。