Chapter 4
Markov Chains
4.1 Definitions and Examples
The importance of Markov chains comes from two facts: (i) there are a large
number of physical, biological, economic, and social phenomena that can be
described in this way, and (ii) there is a well-developed theory that allows us to
do computations. We begin with a famous example, then describe the property
that is the defining feature of Markov chains.
Example 4.1 (Gambler’s ruin). Consider a gambling game in which on any
turn you win $1 with probability p = 0.4 or lose $1 with probability 1− p = 0.6.
Suppose further that you adopt the rule that you quit playing if your fortune
reaches $N . Of course, if your fortune reaches $0 the casino makes you stop.
Let Xn be the amount of money you have after n plays. I claim that your
fortune Xn has the “Markov property.” In words, this means that given the
current state, any other information about the past is irrelevant for predicting
the next state Xn+1. To check this, we note that if you are still playing at time
n, i.e., your fortune Xn = i with 0 < i < N , then for any possible history of
your wealth in−1, in−2, . . . i1, i0
P (Xn+1 = i+ 1|Xn = i,Xn−1 = in−1, . . . X0 = i0) = 0.4
since to increase your wealth by one unit you have to win your next bet and the
outcome of the previous bets has no useful information for predicting the next
outcome.
Turning now to the formal definition, we say that Xn is a discrete time
Markov chain with transition matrix p(i, j) if for any j, i, in−1, . . . i0
P (Xn+1 = j|Xn = i,Xn−1 = in−1, . . . , X0 = i0) = p(i, j) (4.1)
Equation (4.1), also called the “Markov property” says that the conditional
probability Xn+1 = j given the entire history Xn = i,Xn−1 = in−1, . . . X1 =
i1, X0 = i0 is the same as the conditional probability Xn+1 = j given only the
previous state Xn = i. This is what we mean when we say that “given the
current state any other information about the past is irrelevant for predicting
Xn+1.”
In formulating (4.1) we have restricted our attention to the temporally
homogeneous case in which the transition probability
p(i, j) = P (Xn+1 = j|Xn = i)
does not depend on the time n. Intuitively, the transition probability gives the
rules of the game. It is the basic information needed to describe a Markov chain.
In the case of the gambler’s ruin chain, the transition probability has
p(i, i+ 1) = 0.4, p(i, i− 1) = 0.6, if 0 < i < N
p(0, 0) = 1 p(N,N) = 1
When N = 5 the matrix is
0 1 2 3 4 5
0 1.0 0 0 0 0 0
1 0.6 0 0.4 0 0 0
2 0 0.6 0 0.4 0 0
3 0 0 0.6 0 0.4 0
4 0 0 0 0.6 0 0.4
5 0 0 0 0 0 1.0
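This transition matrix is easy to build in code. The sketch below is our own illustration (the function name `gamblers_ruin_matrix` and the nested-list representation are choices we made, not from the text); it constructs p for a general goal N and checks that every row is a probability distribution:

```python
def gamblers_ruin_matrix(N, p=0.4):
    """Transition matrix for the gambler's ruin chain on states 0, 1, ..., N."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1.0          # state 0 is absorbing: the casino stops you
    P[N][N] = 1.0          # state N is absorbing: you quit
    for i in range(1, N):
        P[i][i + 1] = p        # win $1 with probability p
        P[i][i - 1] = 1 - p    # lose $1 with probability 1 - p
    return P

P = gamblers_ruin_matrix(5)
# each row sums to 1, as any transition matrix must
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
```

For N = 5 this reproduces the 6 × 6 matrix displayed above.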
Example 4.2 (Wright–Fisher model). Thinking of a population of N/2 diploid
individuals who have two copies of each of their genes, or of N haploid individuals who have one copy, we consider a fixed population of N genes that can be
one of two types: A or a. These types are called alleles. In the simplest version
of this model the population at time n + 1 is obtained by drawing with replacement from the population at time n. In this case if we let Xn be the number of
A alleles at time n, then Xn is a Markov chain with transition probability
p(i, j) = (N choose j) (i/N)^j (1 − i/N)^{N−j}
since the right-hand side is the binomial distribution for N independent trials
with success probability i/N .
(Figure: the genes at time n + 1 are obtained by drawing with replacement from the A’s and a’s present at time n.)
In the Gambler’s Ruin chain and the Wright-Fisher model the states 0 and N
are absorbing states. Once we enter these states we can never leave. The long
run behavior of these models is not very interesting: they will eventually enter
one of the absorbing states and stay there forever. To make the Wright-Fisher
model more interesting and more realistic, we can introduce the possibility of
mutations: an A that is drawn ends up being an a in the next generation with
probability u, while an a that is drawn ends up being an A in the next generation
with probability v. In this case the probability an A is produced by a given draw
is
ρ_i = (i/N)(1 − u) + ((N − i)/N)v
i.e., we can get an A by drawing an A and not having a mutation or by drawing
an a and having a mutation. Since the draws are independent the transition
probability still has the binomial form
p(i, j) = (N choose j) ρ_i^j (1 − ρ_i)^{N−j}    (4.2)
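Formula (4.2) is easy to check numerically. In this sketch (`wright_fisher_row` is an illustrative name of our own; taking u = v = 0 recovers the mutation-free model) each row of the transition matrix is a binomial distribution with success probability ρ_i:

```python
from math import comb

def wright_fisher_row(i, N, u=0.0, v=0.0):
    """Row i of the Wright-Fisher transition matrix, as in (4.2):
    the number of A alleles next generation is Binomial(N, rho_i)."""
    rho = (i / N) * (1 - u) + ((N - i) / N) * v   # chance one draw yields an A
    return [comb(N, j) * rho**j * (1 - rho)**(N - j) for j in range(N + 1)]

# without mutation, 0 and N are absorbing states
assert wright_fisher_row(0, 10)[0] == 1.0
assert wright_fisher_row(10, 10)[10] == 1.0
```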
Moving from biology to physics:
Example 4.3 (Ehrenfest chain). We imagine two cubical volumes of air connected by a small hole. In the mathematical version, we have two “urns,” i.e.,
two of the exalted trash cans of probability theory, in which there are a total of
N balls. We pick one of the N balls at random and move it to the other urn.
(Figure: two urns containing a total of N balls.)
Let Xn be the number of balls in the “left” urn after the nth draw. It should
be clear that Xn has the Markov property; i.e., if we want to guess the state
at time n + 1, then the current number of balls in the left urn Xn, is the only
relevant information from the observed sequence of states Xn, Xn−1, . . . X1, X0.
To check this we note that
P (Xn+1 = i+ 1|Xn = i,Xn−1 = in−1, . . . X0 = i0) = (N − i)/N
since to increase the number we have to pick one of the N − i balls in the other
urn. The number can also decrease by 1 with probability i/N . In symbols, we
have computed that the transition probability is given by
p(i, i+ 1) = (N − i)/N, p(i, i− 1) = i/N for 0 ≤ i ≤ N
with p(i, j) = 0 otherwise. When N = 5, for example, the matrix is
0 1 2 3 4 5
0 0 5/5 0 0 0 0
1 1/5 0 4/5 0 0 0
2 0 2/5 0 3/5 0 0
3 0 0 3/5 0 2/5 0
4 0 0 0 4/5 0 1/5
5 0 0 0 0 5/5 0
Here we have written 1 as 5/5 to emphasize the pattern in the diagonals of the
matrix.
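The diagonal pattern is transparent in code. A minimal construction of our own, using exact fractions to match the 5/5 display:

```python
from fractions import Fraction

def ehrenfest_matrix(N):
    """Ehrenfest chain on 0..N: move a uniformly chosen ball to the other urn."""
    P = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i < N:
            P[i][i + 1] = Fraction(N - i, N)  # chose a ball from the other urn
        if i > 0:
            P[i][i - 1] = Fraction(i, N)      # chose a ball from the left urn
    return P

P = ehrenfest_matrix(5)
assert P[2][3] == Fraction(3, 5) and P[2][1] == Fraction(2, 5)
```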
Moving from science to business:
Example 4.4 (Inventory chain). An electronics store sells a video game system.
If at the end of the day, the number of units they have on hand is 1 or 0, they
order enough new units so their total on hand is 5. This chain is an example of
an s, S inventory control policy with s = 1 and S = 5. That is, when the stock
on hand falls to s or below we order enough to bring it back up to S.
For simplicity we assume that the new merchandise arrives before the store
opens the next day. Let Xn be the number of units on hand at the end of the
nth day. If we assume that the number of customers who want to buy a video
game system each day is 0, 1, 2, or 3 with probabilities .3, .4, .2, and .1, then
we have the following transition matrix:
0 1 2 3 4 5
0 0 0 .1 .2 .4 .3
1 0 0 .1 .2 .4 .3
2 .3 .4 .3 0 0 0
3 .1 .2 .4 .3 0 0
4 0 .1 .2 .4 .3 0
5 0 0 .1 .2 .4 .3
To explain the entries we note that if Xn ≥ 2 then no ordering is done, so what
we have at the end of the day is the supply minus the demand. If Xn = 2 and
the demand is 3, we end up with 0 units at the end of the day and one unhappy
customer. If Xn = 0 or 1 then we order enough so that at the beginning of the
day we have 5, so the result at the end of the day is the same as if Xn = 5.
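The rows of this matrix can be generated mechanically from the demand distribution. In the sketch below (`inventory_row` and the dictionary encoding of demand are our own framing of the example), the stock when the store opens is S if the previous day closed at s or below, and otherwise the previous closing stock:

```python
def inventory_row(x, s=1, S=5):
    """Distribution of stock at the end of a day that began after closing at x,
    under the (s, S) policy and the demand distribution of the example."""
    demand = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}
    start = S if x <= s else x              # reorder up to S overnight if needed
    row = [0.0] * (S + 1)
    for d, prob in demand.items():
        row[max(start - d, 0)] += prob      # stock cannot go below 0
    return row

assert inventory_row(0) == inventory_row(5)   # both days start with 5 units
```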
Markov chains are described by giving their transition probabilities. To
create a chain, we can write down any k × k matrix, provided that the entries
satisfy:
(i) p(i, j) ≥ 0, since they are probabilities.
(ii) Σ_j p(i, j) = 1, since when Xn = i, Xn+1 will be in some state j.
The equation in (ii) is read “sum p(i, j) over all possible values of j.” In words
the last two conditions say: the entries of the matrix are nonnegative and each
ROW of the matrix sums to 1.
Any matrix with properties (i) and (ii) gives rise to a Markov chain, Xn.
To construct the chain we can think of playing a board game. When we are
in state i, we roll a die (or generate a random number on a computer) to pick
the next state, going to j with probability p(i, j). To illustrate this we will now
introduce some simple examples.
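The board-game description can be turned directly into a simulator. This minimal sketch (the function `simulate` and the fixed seed are our own choices) picks the next state by walking the cumulative distribution of the current row:

```python
import random

def simulate(P, start, n_steps, seed=0):
    """Generate a path of the Markov chain with transition matrix P (list of rows)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        u, acc, nxt = rng.random(), 0.0, None
        for j, pij in enumerate(P[path[-1]]):   # walk the CDF of the current row
            acc += pij
            if u < acc:
                nxt = j
                break
        path.append(nxt if nxt is not None else len(P[0]) - 1)  # rounding guard
    return path

chain = [[0.6, 0.4], [0.2, 0.8]]   # a two-state example
path = simulate(chain, 0, 1000)
```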
Example 4.5 (Weather chain). Let Xn be the weather on day n in Ithaca, NY,
which we assume is either: 1 = rainy, or 2 = sunny. Even though the weather
is not exactly a Markov chain, we can propose a Markov chain model for the
weather by writing down a transition probability
1 2
1 .6 .4
2 .2 .8
The table says, for example, that the probability a rainy day (state 1) is followed
by a sunny day (state 2) is p(1, 2) = 0.4.
Example 4.6 (Social mobility). Let Xn be a family’s social class in the nth
generation, which we assume is either 1 = lower, 2 = middle, or 3 = upper. In
our simple version of sociology, changes of status are a Markov chain with the
following transition probability
1 2 3
1 .7 .2 .1
2 .3 .5 .2
3 .2 .4 .4
Example 4.7 (Brand preference). Suppose there are three types of laundry
detergent, 1, 2, and 3, and let Xn be the brand chosen on the nth purchase.
Customers who try these brands are satisfied and choose the same thing again
with probabilities 0.8, 0.6, and 0.4 respectively. When they change they pick one
of the other two brands at random. The transition probability matrix is
1 2 3
1 .8 .1 .1
2 .2 .6 .2
3 .3 .3 .4
We now have several examples to think about. The basic question concerning
Markov chains is: what happens in the long run? In the case of the weather chain,
does the probability that day n is sunny converge to a limit? Do the fractions
of the population in the three income classes (or that buy each of the three
types of detergent) stabilize as time goes on? The first step in answering these
questions is to figure out what happens in the Markov chain after two or more
steps.
4.2 Multistep Transition Probabilities
The transition probability p(i, j) = P (Xn+1 = j|Xn = i) gives the probability
of going from i to j in one step. Our goal in this section is to compute the
probability of going from i to j in m > 1 steps:

p^m(i, j) = P(Xn+m = j | Xn = i)
For a concrete example, we start with the transition probability of the social
mobility chain:
1 2 3
1 .7 .2 .1
2 .3 .5 .2
3 .2 .4 .4
To warm up we consider:
Example 4.8. Suppose the family starts in the middle class (state 2) in gen-
eration 0. What is the probability that the generation 1 rises to the upper class
(state 3) and generation 2 falls to the lower class (state 1)?
Intuitively, the Markov property implies that starting from state 2 the probability of jumping to 3 and then to 1 is given by p(2, 3)p(3, 1). To get this conclusion
from the definitions, we note that using the definition of conditional probability,
P(X2 = 1, X1 = 3 | X0 = 2) = P(X2 = 1, X1 = 3, X0 = 2) / P(X0 = 2)

Multiplying and dividing by P(X1 = 3, X0 = 2):

= [P(X2 = 1, X1 = 3, X0 = 2) / P(X1 = 3, X0 = 2)] · [P(X1 = 3, X0 = 2) / P(X0 = 2)]

Using the definition of conditional probability:

= P(X2 = 1 | X1 = 3, X0 = 2) · P(X1 = 3 | X0 = 2)

By the Markov property (4.1) the last expression is

P(X2 = 1 | X1 = 3) · P(X1 = 3 | X0 = 2) = p(2, 3)p(3, 1)
Moving on to the real question:
Example 4.9. Suppose the family starts in the middle class (state 2) in gen-
eration 0. What is the probability that generation 2 will be in the lower class
(state 1)?
To do this we simply have to consider the three possible states for generation 1
and use the previous computation.
P(X2 = 1 | X0 = 2) = Σ_{k=1}^{3} p(2, k) p(k, 1)
                   = (.3)(.7) + (.5)(.3) + (.2)(.2)
                   = .21 + .15 + .04 = .40
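This sum over the intermediate state is a one-liner to verify. A quick check (the dictionary encoding of p is our own):

```python
# social mobility transition probabilities p(i, j)
p = {(1, 1): .7, (1, 2): .2, (1, 3): .1,
     (2, 1): .3, (2, 2): .5, (2, 3): .2,
     (3, 1): .2, (3, 2): .4, (3, 3): .4}

# P(X2 = 1 | X0 = 2): sum over the state k occupied at time 1
p2_21 = sum(p[(2, k)] * p[(k, 1)] for k in (1, 2, 3))
assert abs(p2_21 - 0.40) < 1e-12
```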
There is nothing special here about the states 2 and 1. By the same reasoning,

P(X2 = j | X0 = i) = Σ_{k=1}^{3} p(i, k) p(k, j)

The right-hand side of the last equation is the (i, j)th entry of the matrix obtained when p is multiplied by itself.
To explain this, we note that to compute p^2(2, 1) we multiplied the entries
of the second row by those in the first column:

( .  .  .  ) ( .7 . . )   ( .   . . )
( .3 .5 .2 ) ( .3 . . ) = ( .40 . . )
( .  .  .  ) ( .2 . . )   ( .   . . )

If we wanted p^2(1, 3) we would multiply the first row by the third column:

( .7 .2 .1 ) ( . . .1 )   ( . . .15 )
( .  .  .  ) ( . . .2 ) = ( . . .   )
( .  .  .  ) ( . . .4 )   ( . . .   )

When all of the computations are done we have

( .7 .2 .1 ) ( .7 .2 .1 )   ( .57 .28 .15 )
( .3 .5 .2 ) ( .3 .5 .2 ) = ( .40 .39 .21 )
( .2 .4 .4 ) ( .2 .4 .4 )   ( .34 .40 .26 )
The two-step transition probability is p^2 = p · p. Based on this you can probably
leap to the next conclusion:
The m-step transition probability

p^m(i, j) = P(Xn+m = j | Xn = i)    (4.3)

is the mth power of the transition matrix p, i.e., p · p · · · p, where there are m
factors in the product.
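In code, the m-step transition probabilities are literally matrix powers. A quick numerical confirmation for the social mobility chain (assuming numpy is available):

```python
import numpy as np

p = np.array([[.7, .2, .1],    # the social mobility chain
              [.3, .5, .2],
              [.2, .4, .4]])

p2 = np.linalg.matrix_power(p, 2)      # same as p @ p
assert np.allclose(p2, [[.57, .28, .15],
                        [.40, .39, .21],
                        [.34, .40, .26]])
```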
The key ingredient in proving this is the:
Chapman–Kolmogorov equation

p^{m+n}(i, j) = Σ_k p^m(i, k) p^n(k, j)    (4.4)

Once this is proved, (4.3) follows, since taking n = 1 in (4.4), we see that
p^{m+1} = p^m · p.
Why is (4.4) true? To go from i to j in m+ n steps, we have to go from i to
some state k in m steps and then from k to j in n steps. The Markov property
implies that the two parts of our journey are independent.
(Figure: to go from i at time 0 to j at time m + n, the chain passes through some intermediate state k at time m.)
Proof of (4.4). The independence in the second sentence of the previous
explanation is the mysterious part. To show this, we combine Examples 4.8 and
4.9. Breaking things down according to the state at time m,
P(Xm+n = j | X0 = i) = Σ_k P(Xm+n = j, Xm = k | X0 = i)

Repeating the computation in Example 4.8, the definition of conditional probability implies:

P(Xm+n = j, Xm = k | X0 = i) = P(Xm+n = j, Xm = k, X0 = i) / P(X0 = i)

Multiplying and dividing by P(Xm = k, X0 = i) gives:

= [P(Xm+n = j, Xm = k, X0 = i) / P(Xm = k, X0 = i)] · [P(Xm = k, X0 = i) / P(X0 = i)]

Using the definition of conditional probability we have:

= P(Xm+n = j | Xm = k, X0 = i) · P(Xm = k | X0 = i)

By the Markov property (4.1) the last expression is

= P(Xm+n = j | Xm = k) · P(Xm = k | X0 = i) = p^m(i, k) p^n(k, j)
and we have proved (4.4).
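The Chapman–Kolmogorov equation is also easy to sanity-check numerically, since in matrix form it just says p^{m+n} = p^m p^n:

```python
import numpy as np

p = np.array([[.7, .2, .1], [.3, .5, .2], [.2, .4, .4]])
m, n = 3, 2
lhs = np.linalg.matrix_power(p, m + n)                        # p^(m+n)
rhs = np.linalg.matrix_power(p, m) @ np.linalg.matrix_power(p, n)
assert np.allclose(lhs, rhs)   # sum_k p^m(i, k) p^n(k, j), entry by entry
```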
Having established (4.4), we now return to computations. We begin with
the weather chain:

( 0.6 0.4 ) ( 0.6 0.4 )   ( 0.44 0.56 )
( 0.2 0.8 ) ( 0.2 0.8 ) = ( 0.28 0.72 )

Multiplying again, p^2 · p = p^3:

( 0.44 0.56 ) ( 0.6 0.4 )   ( 0.376 0.624 )
( 0.28 0.72 ) ( 0.2 0.8 ) = ( 0.312 0.688 )

and then p^3 · p = p^4:

( 0.376 0.624 ) ( 0.6 0.4 )   ( 0.3504 0.6496 )
( 0.312 0.688 ) ( 0.2 0.8 ) = ( 0.3248 0.6752 )

To increase the time faster we can use (4.4) to conclude that p^4 · p^4 = p^8:

( 0.3504 0.6496 ) ( 0.3504 0.6496 )   ( 0.33377 0.66623 )
( 0.3248 0.6752 ) ( 0.3248 0.6752 ) = ( 0.33311 0.66689 )

Multiplying again, p^8 · p^8 = p^16:

( 0.33333362 0.66666638 )
( 0.33333319 0.66666681 )
Based on the last calculation, one might guess that as n gets large the matrix
becomes closer and closer to

( 1/3 2/3 )
( 1/3 2/3 )

This is true and will be explained in the next section.
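The convergence can be watched directly. In this sketch the maximum entrywise distance from the guessed limit shrinks at each doubling of n (it decays like 0.4^n, a fact the text has not yet established, so we only check that the error decreases):

```python
import numpy as np

p = np.array([[0.6, 0.4], [0.2, 0.8]])
limit = np.array([[1/3, 2/3], [1/3, 2/3]])

# largest entrywise deviation of p^n from the conjectured limit
errs = {n: np.abs(np.linalg.matrix_power(p, n) - limit).max()
        for n in (4, 8, 16)}
assert errs[16] < errs[8] < errs[4]
```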
4.3 Stationary distributions
Our first step is to consider what happens when the initial state is random.
Breaking things down according to the value of the initial state and using the
definition of conditional probability,
P(Xn = y) = Σ_x P(X0 = x, Xn = y)
          = Σ_x P(X0 = x) P(Xn = y | X0 = x)

If we introduce q(x) = P(X0 = x), then the last equation can be written as

P(Xn = y) = Σ_x q(x) p^n(x, y)    (4.5)

In words, we multiply the transition matrix on the left by the vector q of initial
probabilities. If there are k states, then p^n(x, y) is a k × k matrix. So to make
the matrix multiplication work out right, we should take q as a 1 × k matrix or
a “row vector.”
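Equation (4.5) is a row vector times a matrix power. A numerical check for the social mobility chain (numpy assumed):

```python
import numpy as np

p = np.array([[.7, .2, .1], [.3, .5, .2], [.2, .4, .4]])
q = np.array([.5, .2, .3])               # initial distribution as a row vector
q1 = q @ p                               # distribution at time 1, per (4.5)
qn = q @ np.linalg.matrix_power(p, 5)    # distribution at time 5
assert abs(qn.sum() - 1.0) < 1e-12       # still a probability distribution
```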
For a concrete example consider the social mobility chain and suppose that
the initial distribution is q(1) = .5, q(2) = .2, and q(3) = .3. Multiplying the
vector q times the transition probability gives the vector of probabilities at time
1:

             ( .7 .2 .1 )
( .5 .2 .3 ) ( .3 .5 .2 ) = ( .47 .32 .21 )
             ( .2 .4 .4 )
To check the arithmetic note that the three entries are
.5(.7) + .2(.3) + .3(.2) = .35 + .06 + .06 = .47
.5(.2) + .2(.5) + .3(.4) = .10 + .10 + .12 = .32
.5(.1) + .2(.2) + .3(.4) = .05 + .04 + .12 = .21
For a second example consider the weather chain and suppose that the initial
distribution is q(1) = 1/3 and q(2) = 2/3. In this case

            ( 0.6 0.4 )
( 1/3 2/3 ) ( 0.2 0.8 ) = ( 1/3 2/3 )

since

(1/3)(0.6) + (2/3)(0.2) = 1/3
(1/3)(0.4) + (2/3)(0.8) = 2/3
In symbols, q · p = q. In words, if q is the distribution at time 0 then it is also the
distribution at time 1, and by the Markov property at all times n ≥ 1. Because
of this q is called a stationary distribution. Stationary distributions have a
special importance in the theory of Markov chains, so we will use a special letter
π to denote solutions of π · p = π.
To have a mental picture of what happens to the distribution of probability
when one step of the Markov chain is taken, it is useful to think that we have
q(i) pounds of sand at state i, with the total amount of sand Σ_i q(i) being one
pound. When a step is taken in the Markov chain, a fraction p(i, j) of the sand
at i is moved to j. The amount of sand at state j when this has been done is

(q · p)(j) = Σ_i q(i) p(i, j)

If the distribution of sand is not changed by this procedure, q is a stationary
distribution.
General two state transition probability.
1 2
1 1− a a
2 b 1− b
We have written the chain in this way so the stationary distribution has a simple
formula:

π(1) = b/(a + b)    π(2) = a/(a + b)    (4.6)
As a first check on this formula we note that in the weather chain a = 0.4 and
b = 0.2 which gives (1/3, 2/3) as we found before. We can prove this works in
general by drawing a picture:

(Figure: states 1 and 2 holding b/(a+b) and a/(a+b) pounds of sand, with sand flowing from 1 to 2 at rate a and from 2 to 1 at rate b.)
In words, the amount of sand that flows from 1 to 2 is the same as the amount
that flows from 2 to 1 so the amount of sand at each site stays constant. To
check algebraically that this is true:

[b/(a + b)](1 − a) + [a/(a + b)]b = (b − ba + ab)/(a + b) = b/(a + b)

[b/(a + b)]a + [a/(a + b)](1 − b) = (ba + a − ab)/(a + b) = a/(a + b)    (4.7)
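Formula (4.6) and the check in (4.7) can be verified numerically for any a, b. A small sketch (`two_state_stationary` is our own name):

```python
import numpy as np

def two_state_stationary(a, b):
    """Stationary distribution (4.6) of the chain [[1 - a, a], [b, 1 - b]]."""
    return np.array([b / (a + b), a / (a + b)])

a, b = 0.4, 0.2                            # the weather chain
pi = two_state_stationary(a, b)
P = np.array([[1 - a, a], [b, 1 - b]])
assert np.allclose(pi @ P, pi)             # pi is unchanged by one step
assert np.allclose(pi, [1/3, 2/3])
```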
Formula (4.6) gives the stationary distribution for any two-state chain, so we
progress now to the three-state case and consider the brand preference chain.
The equation πp = π says

             ( .8 .1 .1 )
( π1 π2 π3 ) ( .2 .6 .2 ) = ( π1 π2 π3 )
             ( .3 .3 .4 )

which translates into three equations

.8π1 + .2π2 + .3π3 = π1
.1π1 + .6π2 + .3π3 = π2
.1π1 + .2π2 + .4π3 = π3
Note that the columns of the matrix give the numbers in the rows of the equations. The third equation is redundant since if we add up the three equations
we get

π1 + π2 + π3 = π1 + π2 + π3

If we replace the third equation by π1 + π2 + π3 = 1 and subtract π1 from each
side of the first equation and π2 from each side of the second equation we get

−.2π1 + .2π2 + .3π3 = 0
.1π1 − .4π2 + .3π3 = 0
π1 + π2 + π3 = 1    (4.8)
At this point we can solve the equations by hand or using a calculator.

By hand. We note that the third equation implies π3 = 1 − π1 − π2, and
substituting this in the first two gives

−.5π1 − .1π2 + .3 = 0
−.2π1 − .7π2 + .3 = 0

which we rearrange to give

0.3 = .5π1 + .1π2
0.3 = .2π1 + .7π2

Multiplying the first equation by .7 and adding −.1 times the second gives

0.18 = (.35 − .02)π1 or π1 = 18/33 = 6/11

Multiplying the first equation by .2 and adding −.5 times the second gives

−0.09 = (.02 − .35)π2 or π2 = 9/33 = 3/11

Since the three probabilities add up to 1, π3 = 2/11.
Using the calculator is easier. To begin we write (4.8) in matrix form as

             ( −.2  .1  1 )
( π1 π2 π3 ) (  .2 −.4  1 ) = ( 0 0 1 )
             (  .3  .3  1 )

If we let A be the 3 × 3 matrix in the middle, this can be written as πA = (0, 0, 1).
Multiplying on each side by A^{-1} we see that

π = (0, 0, 1)A^{-1}

which is the third row of A^{-1}. Entering A into our calculator, computing the
inverse, and reading the third row, we find that the stationary distribution is

(.545454, .272727, .181818)

Converting the answer to fractions using the first entry in the math menu gives

(6/11, 3/11, 2/11)
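The same recipe is immediate in numpy: build A by replacing the last column of p − I with ones, then read off π = (0, 0, 1)A^{-1}. This is a sketch of the book's calculator procedure, not an additional method:

```python
import numpy as np

p = np.array([[.8, .1, .1],    # the brand preference chain
              [.2, .6, .2],
              [.3, .3, .4]])

A = p - np.eye(3)              # rows of pi (p - I) = 0
A[:, -1] = 1.0                 # replace the last equation by pi1 + pi2 + pi3 = 1
pi = np.array([0.0, 0.0, 1.0]) @ np.linalg.inv(A)   # the third row of A^{-1}
assert np.allclose(pi, [6/11, 3/11, 2/11])
```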
Example 4.10 (Mobility chain).
1 2 3
1 .7 .2 .1
2 .3 .5 .2
3 .2 .4 .4
Using the first two equations and the fact that the sum of the