Dynamic Programming
and Optimal Control
Volume I
THIRD EDITION
Dimitri P. Bertsekas
Massachusetts Institute of Technology
WWW site for book information and orders
http://www.athenasc.com
Athena Scientific, Belmont, Massachusetts
Athena Scientific
Post Office Box 805
Nashua, NH 03061-0805
U.S.A.
Email: info@athenasc.com
WWW: http://www.athenasc.com
Cover Design: Ann Gallager, www.gallagerdesign.com
© 2005, 2000, 1995 Dimitri P. Bertsekas
All rights reserved. No part of this book may be reproduced in any form
by any electronic or mechanical means (including photocopying, recording,
or information storage and retrieval) without permission in writing from
the publisher.
Publisher's Cataloging-in-Publication Data
Bertsekas, Dimitri P.
Dynamic Programming and Optimal Control
Includes Bibliography and Index
1. Mathematical Optimization. 2. Dynamic Programming. I. Title.
QA402.5 .B465 2005 519.703 00-91281
ISBN 1-886529-26-4
ABOUT THE AUTHOR
Dimitri Bertsekas studied Mechanical and Electrical Engineering at
the National Technical University of Athens, Greece, and obtained his
Ph.D. in system science from the Massachusetts Institute of Technology. He
has held faculty positions with the Engineering-Economic Systems Dept.,
Stanford University, and the Electrical Engineering Dept. of the Univer-
sity of Illinois, Urbana. Since 1979 he has been teaching at the Electrical
Engineering and Computer Science Department of the Massachusetts In-
stitute of Technology (M.I.T.), where he is currently McAfee Professor of
Engineering.
His research spans several fields, including optimization, control, large-
scale computation, and data communication networks, and is closely tied
to his teaching and book authoring activities. He has written numerous
research papers, and thirteen books, several of which are used as textbooks
in MIT classes. He consults regularly with private industry and has held
editorial positions in several journals.
Professor Bertsekas was awarded the INFORMS 1997 Prize for Re-
search Excellence in the Interface Between Operations Research and Com-
puter Science for his book "Neuro-Dynamic Programming" (co-authored
with John Tsitsiklis), the 2000 Greek National Award for Operations Re-
search, and the 2001 ACC John R. Ragazzini Education Award. In 2001,
he was elected to the United States National Academy of Engineering.
ATHENA SCIENTIFIC
OPTIMIZATION AND COMPUTATION SERIES
1. Convex Analysis and Optimization, by Dimitri P. Bertsekas, with
Angelia Nedic and Asuman E. Ozdaglar, 2003, ISBN 1-886529-
45-0, 560 pages
2. Introduction to Probability, by Dimitri P. Bertsekas and John N.
Tsitsiklis, 2002, ISBN 1-886529-40-X, 430 pages
3. Dynamic Programming and Optimal Control, Two-Volume Set,
by Dimitri P. Bertsekas, 2005, ISBN 1-886529-08-6, 840 pages
4. Nonlinear Programming, 2nd Edition, by Dimitri P. Bertsekas,
1999, ISBN 1-886529-00-0, 791 pages
5. Network Optimization: Continuous and Discrete Models, by Dim-
itri P. Bertsekas, 1998, ISBN 1-886529-02-7, 608 pages
6. Network Flows and Monotropic Optimization, by R. Tyrrell Rock-
afellar, 1998, ISBN 1-886529-06-X, 634 pages
7. Introduction to Linear Optimization, by Dimitris Bertsimas and
John N. Tsitsiklis, 1997, ISBN 1-886529-19-1, 608 pages
8. Parallel and Distributed Computation: Numerical Methods, by
Dimitri P. Bertsekas and John N. Tsitsiklis, 1997, ISBN 1-886529-
01-9, 718 pages
9. Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John
N. Tsitsiklis, 1996, ISBN 1-886529-10-8, 512 pages
10. Constrained Optimization and Lagrange Multiplier Methods, by
Dimitri P. Bertsekas, 1996, ISBN 1-886529-04-3, 410 pages
11. Stochastic Optimal Control: The Discrete-Time Case, by Dimitri
P. Bertsekas and Steven E. Shreve, 1996, ISBN 1-886529-03-5,
330 pages
Contents
1. The Dynamic Programming Algorithm
1.1. Introduction . . . . . p. 2
1.2. The Basic Problem . . . . . p. 12
1.3. The Dynamic Programming Algorithm . . . . . p. 18
1.4. State Augmentation and Other Reformulations . . . . . p. 35
1.5. Some Mathematical Issues . . . . . p. 42
1.6. Dynamic Programming and Minimax Control . . . . . p. 46
1.7. Notes, Sources, and Exercises . . . . . p. 51

2. Deterministic Systems and the Shortest Path Problem
2.1. Finite-State Systems and Shortest Paths . . . . . p. 64
2.2. Some Shortest Path Applications . . . . . p. 68
2.2.1. Critical Path Analysis . . . . . p. 68
2.2.2. Hidden Markov Models and the Viterbi Algorithm . . . . . p. 70
2.3. Shortest Path Algorithms . . . . . p. 77
2.3.1. Label Correcting Methods . . . . . p. 78
2.3.2. Label Correcting Variations - A* Algorithm . . . . . p. 87
2.3.3. Branch-and-Bound . . . . . p. 88
2.3.4. Constrained and Multiobjective Problems . . . . . p. 91
2.4. Notes, Sources, and Exercises . . . . . p. 97

3. Deterministic Continuous-Time Optimal Control
3.1. Continuous-Time Optimal Control . . . . . p. 106
3.2. The Hamilton-Jacobi-Bellman Equation . . . . . p. 109
3.3. The Pontryagin Minimum Principle . . . . . p. 115
3.3.1. An Informal Derivation Using the HJB Equation . . . . . p. 115
3.3.2. A Derivation Based on Variational Ideas . . . . . p. 125
3.3.3. Minimum Principle for Discrete-Time Problems . . . . . p. 129
3.4. Extensions of the Minimum Principle . . . . . p. 131
3.4.1. Fixed Terminal State . . . . . p. 131
3.4.2. Free Initial State . . . . . p. 135
3.4.3. Free Terminal Time . . . . . p. 135
3.4.4. Time-Varying System and Cost . . . . . p. 138
3.4.5. Singular Problems . . . . . p. 139
3.5. Notes, Sources, and Exercises . . . . . p. 142

4. Problems with Perfect State Information
4.1. Linear Systems and Quadratic Cost . . . . . p. 148
4.2. Inventory Control . . . . . p. 162
4.3. Dynamic Portfolio Analysis . . . . . p. 170
4.4. Optimal Stopping Problems . . . . . p. 176
4.5. Scheduling and the Interchange Argument . . . . . p. 186
4.6. Set-Membership Description of Uncertainty . . . . . p. 190
4.6.1. Set-Membership Estimation . . . . . p. 191
4.6.2. Control with Unknown-but-Bounded Disturbances . . . . . p. 197
4.7. Notes, Sources, and Exercises . . . . . p. 201

5. Problems with Imperfect State Information
5.1. Reduction to the Perfect Information Case . . . . . p. 218
5.2. Linear Systems and Quadratic Cost . . . . . p. 229
5.3. Minimum Variance Control of Linear Systems . . . . . p. 236
5.4. Sufficient Statistics and Finite-State Markov Chains . . . . . p. 251
5.4.1. The Conditional State Distribution . . . . . p. 252
5.4.2. Finite-State Systems . . . . . p. 258
5.5. Notes, Sources, and Exercises . . . . . p. 270

6. Suboptimal Control
6.1. Certainty Equivalent and Adaptive Control . . . . . p. 283
6.1.1. Caution, Probing, and Dual Control . . . . . p. 289
6.1.2. Two-Phase Control and Identifiability . . . . . p. 291
6.1.3. Certainty Equivalent Control and Identifiability . . . . . p. 293
6.1.4. Self-Tuning Regulators . . . . . p. 298
6.2. Open-Loop Feedback Control . . . . . p. 300
6.3. Limited Lookahead Policies . . . . . p. 304
6.3.1. Performance Bounds for Limited Lookahead Policies . . . . . p. 305
6.3.2. Computational Issues in Limited Lookahead . . . . . p. 310
6.3.3. Problem Approximation - Enforced Decomposition . . . . . p. 312
6.3.4. Aggregation . . . . . p. 319
6.3.5. Parametric Cost-to-Go Approximation . . . . . p. 325
6.4. Rollout Algorithms . . . . . p. 335
6.4.1. Discrete Deterministic Problems . . . . . p. 342
6.4.2. Q-Factors Evaluated by Simulation . . . . . p. 361
6.4.3. Q-Factor Approximation . . . . . p. 363
6.5. Model Predictive Control and Related Methods . . . . . p. 366
6.5.1. Rolling Horizon Approximations . . . . . p. 367
6.5.2. Stability Issues in Model Predictive Control . . . . . p. 369
6.5.3. Restricted Structure Policies . . . . . p. 376
6.6. Additional Topics in Approximate DP . . . . . p. 382
6.6.1. Discretization . . . . . p. 382
6.6.2. Other Approximation Approaches . . . . . p. 384
6.7. Notes, Sources, and Exercises . . . . . p. 386

7. Introduction to Infinite Horizon Problems
7.1. An Overview . . . . . p. 402
7.2. Stochastic Shortest Path Problems . . . . . p. 405
7.3. Discounted Problems . . . . . p. 417
7.4. Average Cost per Stage Problems . . . . . p. 421
7.5. Semi-Markov Problems . . . . . p. 435
7.6. Notes, Sources, and Exercises . . . . . p. 445

Appendix A: Mathematical Review
A.1. Sets . . . . . p. 459
A.2. Euclidean Space . . . . . p. 460
A.3. Matrices . . . . . p. 461
A.4. Analysis . . . . . p. 465
A.5. Convex Sets and Functions . . . . . p. 467

Appendix B: On Optimization Theory
B.1. Optimal Solutions . . . . . p. 468
B.2. Optimality Conditions . . . . . p. 470
B.3. Minimization of Quadratic Forms . . . . . p. 471

Appendix C: On Probability Theory
C.1. Probability Spaces . . . . . p. 472
C.2. Random Variables . . . . . p. 473
C.3. Conditional Probability . . . . . p. 475

Appendix D: On Finite-State Markov Chains
D.1. Stationary Markov Chains . . . . . p. 477
D.2. Classification of States . . . . . p. 478
D.3. Limiting Probabilities . . . . . p. 479
D.4. First Passage Times . . . . . p. 480
Appendix E: Kalman Filtering
E.1. Least-Squares Estimation . . . . . p. 481
E.2. Linear Least-Squares Estimation . . . . . p. 483
E.3. State Estimation - Kalman Filter . . . . . p. 491
E.4. Stability Aspects . . . . . p. 496
E.5. Gauss-Markov Estimators . . . . . p. 499
E.6. Deterministic Least-Squares Estimation . . . . . p. 501

Appendix F: Modeling of Stochastic Linear Systems
F.1. Linear Systems with Stochastic Inputs . . . . . p. 503
F.2. Processes with Rational Spectrum . . . . . p. 504
F.3. The ARMAX Model . . . . . p. 506

Appendix G: Formulating Problems of Decision Under Uncertainty
G.1. The Problem of Decision Under Uncertainty . . . . . p. 507
G.2. Expected Utility Theory and Risk . . . . . p. 511
G.3. Stochastic Optimal Control Problems . . . . . p. 524

References . . . . . p. 529

Index . . . . . p. 541
CONTENTS OF VOLUME II
1. Infinite Horizon - Discounted Problems
1.1. Minimization of Total Cost - Introduction
1.2. Discounted Problems with Bounded Cost per Stage
1.3. Finite-State Systems - Computational Methods
1.3.1. Value Iteration and Error Bounds
1.3.2. Policy Iteration
1.3.3. Adaptive Aggregation
1.3.4. Linear Programming
1.3.5. Limited Lookahead Policies
1.4. The Role of Contraction Mappings
1.5. Scheduling and Multiarmed Bandit Problems
1.6. Notes, Sources, and Exercises
2. Stochastic Shortest Path Problems
2.1. Main Results
2.2. Computational Methods
2.2.1. Value Iteration
2.2.2. Policy Iteration
2.3. Simulation-Based Methods
2.3.1. Policy Evaluation by Monte-Carlo Simulation
2.3.2. Q-Learning
2.3.3. Approximations
2.3.4. Extensions to Discounted Problems
2.3.5. The Role of Parallel Computation
2.4. Notes, Sources, and Exercises
3. Undiscounted Problems
3.1. Unbounded Costs per Stage
3.2. Linear Systems and Quadratic Cost
3.3. Inventory Control
3.4. Optimal Stopping
3.5. Optimal Gambling Strategies
3.6. Nonstationary and Periodic Problems
3.7. Notes, Sources, and Exercises
4. Average Cost per Stage Problems
4.1. Preliminary Analysis
4.2. Optimality Conditions
4.3. Computational Methods
4.3.1. Value Iteration
4.3.2. Policy Iteration
4.3.3. Linear Programming
4.3.4. Simulation-Based Methods
4.4. Infinite State Space
4.5. Notes, Sources, and Exercises

5. Continuous-Time Problems
5.1. Uniformization
5.2. Queueing Applications
5.3. Semi-Markov Problems
5.4. Notes, Sources, and Exercises

References

Index

Preface

This two-volume book is based on a first-year graduate course on
dynamic programming and optimal control that I have taught for over
twenty years at Stanford University, the University of Illinois, and the Mas-
sachusetts Institute of Technology. The course has been typically attended
by students from engineering, operations research, economics, and applied
mathematics. Accordingly, a principal objective of the book has been to
provide a unified treatment of the subject, suitable for a broad audience.
In particular, problems with a continuous character, such as stochastic con-
trol problems, popular in modern control theory, are simultaneously treated
with problems with a discrete character, such as Markovian decision prob-
lems, popular in operations research. Furthermore, many applications and
examples, drawn from a broad variety of fields, are discussed.
The book may be viewed as a greatly expanded and pedagogically
improved version of my 1987 book "Dynamic Programming: Deterministic
and Stochastic Models," published by Prentice-Hall. I have included much
new material on deterministic and stochastic shortest path problems, as
well as a new chapter on continuous-time optimal control problems and the
Pontryagin Minimum Principle, developed from a dynamic programming
viewpoint. I have also added a fairly extensive exposition of simulation-
based approximation techniques for dynamic programming. These tech-
niques, which are often referred to as "neuro-dynamic programming" or
"reinforcement learning," represent a breakthrough in the practical ap-
plication of dynamic programming to complex problems that involve the
dual curse of large dimension and lack of an accurate mathematical model.
Other material was also augmented, substantially modified, and updated.
With the new material, however, the book grew so much in size that
it became necessary to divide it into two volumes: one on finite horizon,
and the other on infinite horizon problems. This division was not only
natural in terms of size, but also in terms of style and orientation. The
first volume is more oriented towards modeling, and the second is more
oriented towards mathematical analysis and computation. I have included
in the first volume a final chapter that provides an introductory treatment
of infinite horizon problems. The purpose is to make the first volume self-
1 The Dynamic Programming Algorithm

Contents
1.1. Introduction . . . . . p. 2
1.2. The Basic Problem . . . . . p. 12
1.3. The Dynamic Programming Algorithm . . . . . p. 18
1.4. State Augmentation and Other Reformulations . . . . . p. 35
1.5. Some Mathematical Issues . . . . . p. 42
1.6. Dynamic Programming and Minimax Control . . . . . p. 46
1.7. Notes, Sources, and Exercises . . . . . p. 51
Life can only be understood going backwards,
but it must be lived going forwards.
Kierkegaard
1.1 INTRODUCTION

This book deals with situations where decisions are made in stages. The
outcome of each decision may not be fully predictable but can be antici-
pated to some extent before the next decision is made. The objective is to
minimize a certain cost, a mathematical expression of what is considered
an undesirable outcome.

A key aspect of such situations is that decisions cannot be viewed in
isolation since one must balance the desire for low present cost with the
undesirability of high future costs. The dynamic programming technique
captures this tradeoff. At each stage, it ranks decisions based on the sum
of the present cost and the expected future cost, assuming optimal decision
making for subsequent stages.

There is a very broad variety of practical problems that can be treated
by dynamic programming. In this book, we try to keep the main ideas
uncluttered by irrelevant assumptions on problem structure. To this end,
we formulate in this section a broadly applicable model of optimal control
of a dynamic system over a finite number of stages (a finite horizon). This
model will occupy us for the first six chapters; its infinite horizon version
will be the subject of the last chapter as well as Vol. II.

Our basic model has two principal features: (1) an underlying discrete-
time dynamic system, and (2) a cost function that is additive over time.
The dynamic system expresses the evolution of some variables, the system's
"state", under the influence of decisions made at discrete instances of time.
The system has the form

x_{k+1} = f_k(x_k, u_k, w_k), k = 0, 1, ..., N - 1,

where

k indexes discrete time,

x_k is the state of the system and summarizes past information that is
relevant for future optimization,

u_k is the control or decision variable to be selected at time k,

w_k is a random parameter (also called disturbance or noise depending on
the context),

N is the horizon or number of times control is applied,

and f_k is a function that describes the system and in particular the mech-
anism by which the state is updated.

The cost function is additive in the sense that the cost incurred at
time k, denoted by g_k(x_k, u_k, w_k), accumulates over time. The total cost
is

g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k),

where g_N(x_N) is a terminal cost incurred at the end of the process. How-
ever, because of the presence of w_k, the cost is generally a random variable
and cannot be meaningfully optimized. We therefore formulate the problem
as an optimization of the expected cost

E { g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) },

where the expectation is with respect to the joint distribution of the random
variables involved. The optimization is over the controls u_0, u_1, ..., u_{N-1},
but some qualification is needed here; each control u_k is selected with some
knowledge of the current state x_k (either its exact value or some other
related information).
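The ingredients just listed can be made concrete in code. The sketch below estimates the expected cost of a fixed closed-loop policy by averaging simulated runs of the system equation and the additive cost; the particular dynamics `f`, stage cost `g`, terminal cost `g_N`, disturbance distribution, and policy `mu` are all invented for illustration and are not taken from the book.

```python
import random

N = 5  # horizon: number of times control is applied

def f(k, x, u, w):
    # hypothetical system equation x_{k+1} = f_k(x_k, u_k, w_k)
    return x + u - w

def g(k, x, u, w):
    # hypothetical stage cost g_k(x_k, u_k, w_k)
    return abs(x) + u

def g_N(x):
    # terminal cost incurred at the end of the process
    return abs(x)

def mu(k, x):
    # a simple closed-loop policy: u_k is chosen knowing the current state x_k
    return max(0, -x)

def expected_cost(x0, policy, trials=10000):
    """Monte Carlo estimate of E[g_N(x_N) + sum_k g_k(x_k, u_k, w_k)]."""
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        for k in range(N):
            u = policy(k, x)
            w = random.randint(0, 2)  # random disturbance w_k
            cost += g(k, x, u, w)
            x = f(k, x, u, w)
        total += cost + g_N(x)
    return total / trials

print(expected_cost(0, mu))
```

The estimate converges to the true expected cost as the number of trials grows; the qualification in the text, that each u_k may depend on the current state x_k, is what makes `policy` a function of the state rather than a fixed sequence of numbers.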
A more precise definition of the terminology just used will be given
shortly. We first provide some orientation by means of examples.
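The tradeoff described above, present cost plus expected future cost under optimal continuation, can also be previewed in a few lines of code before turning to examples. The sketch below runs the backward recursion J_k(x) = min_u E[g(x, u, w) + J_{k+1}(f(x, u, w))] on a toy problem; the state space, costs, and transition are invented for illustration, and the recursion itself is developed formally in Section 1.3.

```python
# Toy finite problem: states 0..2, controls 0 or 1, horizon N = 3.
N = 3
STATES = [0, 1, 2]
CONTROLS = [0, 1]
W = [(0, 0.5), (1, 0.5)]  # disturbance values and their probabilities

def f(x, u, w):
    return min(2, max(0, x + u - w))  # made-up transition, clipped to 0..2

def g(x, u, w):
    return x * x + u                  # made-up stage cost

def gN(x):
    return x * x                      # made-up terminal cost

# Backward recursion: start from J_N = gN and step k = N-1, ..., 0.
J = {x: gN(x) for x in STATES}
for k in reversed(range(N)):
    J = {x: min(sum(p * (g(x, u, w) + J[f(x, u, w)]) for w, p in W)
                for u in CONTROLS)
         for x in STATES}

print(J)  # optimal expected cost from each initial state
```

At every stage each control is ranked by its immediate expected cost plus the cost-to-go J computed at the previous step, which is exactly the ranking idea described in the text.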
Example 1.1.1 (Inventory Control)

Consider a problem of ordering a quantity of a certain item at each of N
periods so as to (roughly) meet a stochastic demand, while minimizing the
incurred expected cost. Let us denote

x_k: stock available at the beginning of the kth period,

u_k: stock ordered (and immediately delivered) at the beginning of the kth
period,

w_k: demand during the kth period with given probability distribution.

We assume that w_0, w_1, ..., w_{N-1} are independent random variables,
and that excess demand is backlogged and filled as soon as additional inven-
tory becomes available. Thus, stock evolves according to the discrete-time
equation

x_{k+1} = x_k + u_k - w_k,

where negative stock corresponds to backlogged demand (see Fig. 1.1.1).

Figure 1.1.1 Inventory control example. At period k, the current stock
(state) x_k, the stock ordered (control) u_k, and the demand (random distur-
bance) w_k determine the cost r(x_k) + c u_k and the stock x_{k+1} = x_k + u_k - w_k
at the next period.

The cost incurred in period k consists of two components:

(a) A cost r(x_k) representing a penalty for either positive stock x_k (holding
cost for excess inventory) or negative stock x_k (shortage cost for unfilled
demand).

(b) The purchasing cost c u_k, where c is the cost per unit ordered.

There is also a terminal cost R(x_N) for being left with inventory x_N at the
end of N periods. Thus, the total cost over N periods is

E { R(x_N) + sum_{k=0}^{N-1} ( r(x_k) + c u_k ) }.

We want to minimize this cost by proper choice of the orders u_0, ..., u_{N-1},
subject to the natural constraint u_k >= 0 for all k.

At this point we need to distinguish between closed-loop and open-
loop minimization of the cost. In open-loop minimization we select all orders
u_0, ..., u_{N-1} at once at time 0, without waiting to see the subsequent demand
levels. In closed-loop minimization we postpone placing the order u_k until the
last possible moment (time k) when the current stock x_k will be known. The
idea is that since there is no penalty for delaying the order u_k up to time k,
we can take advantage of information that becomes available between times
0 and k (the demand and stock level in past periods). In closed-loop
minimization we thus seek a sequence of functions mu_k mapping the stock
x_k into the order u_k = mu_k(x_k), so as to minimize the expected cost. The
meaning of mu_k is that, for each k and each possible value of x_k,

mu_k(x_k) = amount that should be ordered at time k if the stock is x_k.

The sequence pi = {mu_0, ..., mu_{N-1}} will be referred to as a policy or
control law. For each pi, the corresponding cost for a fixed initial stock x_0 is

J_pi(x_0) = E { R(x_N) + sum_{k=0}^{N-1} ( r(x_k) + c mu_k(x_k) ) },

and we want to minimize J_pi(x_0) for a given x_0 over all pi that satisfy the
constraints of the problem. This is a typical dynamic programming problem.
We will analyze this problem in various forms in subsequent sections. For
example, we will show in Section 4.2 that for a reasonable choice of the cost
function, the optimal ordering policy is of the form

mu_k(x_k) = S_k - x_k  if x_k < S_k,  0 otherwise,

where S_k is a suitable threshold level determined by the data of the problem.
In other words, when stock falls below the threshold S_k, order just enough to
bring stock up to S_k.
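The value of closed-loop ordering can be seen in a small simulation. The sketch below implements the inventory dynamics x_{k+1} = x_k + u_k - w_k and compares, by averaging many runs, a fixed open-loop order sequence against a threshold policy of the form described in the example; the cost parameters, demand distribution, and constant threshold S are invented for illustration.

```python
import random

N = 10      # number of periods
c = 1.0     # purchasing cost per unit ordered
S = 5       # order-up-to threshold, taken constant over k for simplicity

def r(x):
    # assumed penalty: holding cost 1 per unit of stock, shortage cost 10
    return max(x, 0) + 10.0 * max(-x, 0)

def average_cost(policy, x0=0, trials=5000):
    """Average of R(x_N) + sum_k (r(x_k) + c*u_k) over simulated runs."""
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        for k in range(N):
            u = policy(k, x)
            w = random.randint(0, 4)   # demand w_k, uniform on 0..4
            cost += r(x) + c * u
            x = x + u - w              # stock evolution with backlogging
        total += cost + r(x)           # terminal cost R(x_N) taken as r(x_N)
    return total / trials

def open_loop(k, x):
    return 2                           # all orders fixed at time 0 (mean demand)

def threshold(k, x):
    return max(0, S - x)               # order just enough to bring stock up to S

print(average_cost(open_loop), average_cost(threshold))
```

With these invented parameters, the threshold policy incurs a substantially lower average cost: by reacting to the demands observed in past periods it keeps the stock near the threshold, while the open-loop orders let the stock drift and accumulate holding and shortage penalties.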