Dynamic Programming
and Optimal Control
Volume I
THIRD EDITION
Dimitri P. Bertsekas
Massachusetts Institute of Technology
WWW site for book information and orders
http://www.athenasc.com
Athena Scientific, Belmont, Massachusetts
Athena Scientific
Post Office Box 805
Nashua, NH 03061-0805
U.S.A.
Email: info@athenasc.com
WWW: http://www.athenasc.com
Cover Design: Ann Gallager, www.gallagerdesign.com
© 2005, 2000, 1995 Dimitri P. Bertsekas
All rights reserved. No part of this book may be reproduced in any form
by any electronic or mechanical means (including photocopying, recording,
or information storage and retrieval) without permission in writing from
the publisher.
Publisher's Cataloging-in-Publication Data
Bertsekas, Dimitri P.
Dynamic Programming and Optimal Control
Includes Bibliography and Index
1. Mathematical Optimization. 2. Dynamic Programming. I. Title.
QA402.5 .B465 2005 519.703 00-91281
ISBN 1-886529-26-4
ABOUT THE AUTHOR
Dimitri Bertsekas studied Mechanical and Electrical Engineering at
the National Technical University of Athens, Greece, and obtained his
Ph.D. in system science from the Massachusetts Institute of Technology. He
has held faculty positions with the Engineering-Economic Systems Dept.,
Stanford University, and the Electrical Engineering Dept. of the Univer-
sity of Illinois, Urbana. Since 1979 he has been teaching at the Electrical
Engineering and Computer Science Department of the Massachusetts In-
stitute of Technology (M.I.T.), where he is currently McAfee Professor of
Engineering.
His research spans several fields, including optimization, control, large-
scale computation, and data communication networks, and is closely tied
to his teaching and book authoring activities. He has written numerous
research papers, and thirteen books, several of which are used as textbooks
in MIT classes. He consults regularly with private industry and has held
editorial positions in several journals.
Professor Bertsekas was awarded the INFORMS 1997 Prize for Re-
search Excellence in the Interface Between Operations Research and Com-
puter Science for his book "Neuro-Dynamic Programming" (co-authored
with John Tsitsiklis), the 2000 Greek National Award for Operations Re-
search, and the 2001 ACC John R. Ragazzini Education Award. In 2001,
he was elected to the United States National Academy of Engineering.
ATHENA SCIENTIFIC
OPTIMIZATION AND COMPUTATION SERIES
1. Convex Analysis and Optimization, by Dimitri P. Bertsekas, with
Angelia Nedic and Asuman E. Ozdaglar, 2003, ISBN 1-886529-
45-0, 560 pages
2. Introduction to Probability, by Dimitri P. Bertsekas and John N.
Tsitsiklis, 2002, ISBN 1-886529-40-X, 430 pages
3. Dynamic Programming and Optimal Control, Two-Volume Set,
by Dimitri P. Bertsekas, 2005, ISBN 1-886529-08-6, 840 pages
4. Nonlinear Programming, 2nd Edition, by Dimitri P. Bertsekas,
1999, ISBN 1-886529-00-0, 791 pages
5. Network Optimization: Continuous and Discrete Models, by Dim-
itri P. Bertsekas, 1998, ISBN 1-886529-02-7, 608 pages
6. Network Flows and Monotropic Optimization, by R. Tyrrell Rock-
afellar, 1998, ISBN 1-886529-06-X, 634 pages
7. Introduction to Linear Optimization, by Dimitris Bertsimas and
John N. Tsitsiklis, 1997, ISBN 1-886529-19-1, 608 pages
8. Parallel and Distributed Computation: Numerical Methods, by
Dimitri P. Bertsekas and John N. Tsitsiklis, 1997, ISBN 1-886529-
01-9, 718 pages
9. Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John
N. Tsitsiklis, 1996, ISBN 1-886529-10-8, 512 pages
10. Constrained Optimization and Lagrange Multiplier Methods, by
Dimitri P. Bertsekas, 1996, ISBN 1-886529-04-3, 410 pages
11. Stochastic Optimal Control: The Discrete-Time Case, by Dimitri
P. Bertsekas and Steven E. Shreve, 1996, ISBN 1-886529-03-5,
330 pages
Contents
1. The Dynamic Programming Algorithm
1.1. Introduction . . . . . p. 2
1.2. The Basic Problem . . . . . p. 12
1.3. The Dynamic Programming Algorithm . . . . . p. 18
1.4. State Augmentation and Other Reformulations . . . . . p. 35
1.5. Some Mathematical Issues . . . . . p. 42
1.6. Dynamic Programming and Minimax Control . . . . . p. 46
1.7. Notes, Sources, and Exercises . . . . . p. 51

2. Deterministic Systems and the Shortest Path Problem
2.1. Finite-State Systems and Shortest Paths . . . . . p. 64
2.2. Some Shortest Path Applications . . . . . p. 68
2.2.1. Critical Path Analysis . . . . . p. 68
2.2.2. Hidden Markov Models and the Viterbi Algorithm . . . . . p. 70
2.3. Shortest Path Algorithms . . . . . p. 77
2.3.1. Label Correcting Methods . . . . . p. 78
2.3.2. Label Correcting Variations - A* Algorithm . . . . . p. 87
2.3.3. Branch-and-Bound . . . . . p. 88
2.3.4. Constrained and Multiobjective Problems . . . . . p. 91
2.4. Notes, Sources, and Exercises . . . . . p. 97

3. Deterministic Continuous-Time Optimal Control
3.1. Continuous-Time Optimal Control . . . . . p. 106
3.2. The Hamilton-Jacobi-Bellman Equation . . . . . p. 109
3.3. The Pontryagin Minimum Principle . . . . . p. 115
3.3.1. An Informal Derivation Using the HJB Equation . . . . . p. 115
3.3.2. A Derivation Based on Variational Ideas . . . . . p. 125
3.3.3. Minimum Principle for Discrete-Time Problems . . . . . p. 129
3.4. Extensions of the Minimum Principle . . . . . p. 131
3.4.1. Fixed Terminal State . . . . . p. 131
3.4.2. Free Initial State . . . . . p. 135
3.4.3. Free Terminal Time . . . . . p. 135
3.4.4. Time-Varying System and Cost . . . . . p. 138
3.4.5. Singular Problems . . . . . p. 139
3.5. Notes, Sources, and Exercises . . . . . p. 142

4. Problems with Perfect State Information
4.1. Linear Systems and Quadratic Cost . . . . . p. 148
4.2. Inventory Control . . . . . p. 162
4.3. Dynamic Portfolio Analysis . . . . . p. 170
4.4. Optimal Stopping Problems . . . . . p. 176
4.5. Scheduling and the Interchange Argument . . . . . p. 186
4.6. Set-Membership Description of Uncertainty . . . . . p. 190
4.6.1. Set-Membership Estimation . . . . . p. 191
4.6.2. Control with Unknown-but-Bounded Disturbances . . . . . p. 197
4.7. Notes, Sources, and Exercises . . . . . p. 201

5. Problems with Imperfect State Information
5.1. Reduction to the Perfect Information Case . . . . . p. 218
5.2. Linear Systems and Quadratic Cost . . . . . p. 229
5.3. Minimum Variance Control of Linear Systems . . . . . p. 236
5.4. Sufficient Statistics and Finite-State Markov Chains . . . . . p. 251
5.4.1. The Conditional State Distribution . . . . . p. 252
5.4.2. Finite-State Systems . . . . . p. 258
5.5. Notes, Sources, and Exercises . . . . . p. 270

6. Suboptimal Control
6.1. Certainty Equivalent and Adaptive Control . . . . . p. 283
6.1.1. Caution, Probing, and Dual Control . . . . . p. 289
6.1.2. Two-Phase Control and Identifiability . . . . . p. 291
6.1.3. Certainty Equivalent Control and Identifiability . . . . . p. 293
6.1.4. Self-Tuning Regulators . . . . . p. 298
6.2. Open-Loop Feedback Control . . . . . p. 300
6.3. Limited Lookahead Policies . . . . . p. 304
6.3.1. Performance Bounds for Limited Lookahead Policies . . . . . p. 305
6.3.2. Computational Issues in Limited Lookahead . . . . . p. 310
6.3.3. Problem Approximation - Enforced Decomposition . . . . . p. 312
6.3.4. Aggregation . . . . . p. 319
6.3.5. Parametric Cost-to-Go Approximation . . . . . p. 325
6.4. Rollout Algorithms . . . . . p. 335
6.4.1. Discrete Deterministic Problems . . . . . p. 342
6.4.2. Q-Factors Evaluated by Simulation . . . . . p. 361
6.4.3. Q-Factor Approximation . . . . . p. 363
6.5. Model Predictive Control and Related Methods . . . . . p. 366
6.5.1. Rolling Horizon Approximations . . . . . p. 367
6.5.2. Stability Issues in Model Predictive Control . . . . . p. 369
6.5.3. Restricted Structure Policies . . . . . p. 376
6.6. Additional Topics in Approximate DP . . . . . p. 382
6.6.1. Discretization . . . . . p. 382
6.6.2. Other Approximation Approaches . . . . . p. 384
6.7. Notes, Sources, and Exercises . . . . . p. 386

7. Introduction to Infinite Horizon Problems
7.1. An Overview . . . . . p. 402
7.2. Stochastic Shortest Path Problems . . . . . p. 405
7.3. Discounted Problems . . . . . p. 417
7.4. Average Cost per Stage Problems . . . . . p. 421
7.5. Semi-Markov Problems . . . . . p. 435
7.6. Notes, Sources, and Exercises . . . . . p. 445

Appendix A: Mathematical Review
A.1. Sets . . . . . p. 459
A.2. Euclidean Space . . . . . p. 460
A.3. Matrices . . . . . p. 461
A.4. Analysis . . . . . p. 465
A.5. Convex Sets and Functions . . . . . p. 467

Appendix B: On Optimization Theory
B.1. Optimal Solutions . . . . . p. 468
B.2. Optimality Conditions . . . . . p. 470
B.3. Minimization of Quadratic Forms . . . . . p. 471

Appendix C: On Probability Theory
C.1. Probability Spaces . . . . . p. 472
C.2. Random Variables . . . . . p. 473
C.3. Conditional Probability . . . . . p. 475

Appendix D: On Finite-State Markov Chains
D.1. Stationary Markov Chains . . . . . p. 477
D.2. Classification of States . . . . . p. 478
D.3. Limiting Probabilities . . . . . p. 479
D.4. First Passage Times . . . . . p. 480
Appendix E: Kalman Filtering
E.1. Least-Squares Estimation . . . . . p. 481
E.2. Linear Least-Squares Estimation . . . . . p. 483
E.3. State Estimation - Kalman Filter . . . . . p. 491
E.4. Stability Aspects . . . . . p. 496
E.5. Gauss-Markov Estimators . . . . . p. 499
E.6. Deterministic Least-Squares Estimation . . . . . p. 501

Appendix F: Modeling of Stochastic Linear Systems
F.1. Linear Systems with Stochastic Inputs . . . . . p. 503
F.2. Processes with Rational Spectrum . . . . . p. 504
F.3. The ARMAX Model . . . . . p. 506

Appendix G: Formulating Problems of Decision Under Uncertainty
G.1. The Problem of Decision Under Uncertainty . . . . . p. 507
G.2. Expected Utility Theory and Risk . . . . . p. 511
G.3. Stochastic Optimal Control Problems . . . . . p. 524

References . . . . . p. 529

Index . . . . . p. 541
CONTENTS OF VOLUME II
1. Infinite Horizon - Discounted Problems
1.1. Minimization of Total Cost - Introduction
1.2. Discounted Problems with Bounded Cost per Stage
1.3. Finite-State Systems - Computational Methods
1.3.1. Value Iteration and Error Bounds
1.3.2. Policy Iteration
1.3.3. Adaptive Aggregation
1.3.4. Linear Programming
1.3.5. Limited Lookahead Policies
1.4. The Role of Contraction Mappings
1.5. Scheduling and Multiarmed Bandit Problems
1.6. Notes, Sources, and Exercises
2. Stochastic Shortest Path Problems
2.1. Main Results
2.2. Computational Methods
2.2.1. Value Iteration
2.2.2. Policy Iteration
2.3. Simulation-Based Methods
2.3.1. Policy Evaluation by Monte-Carlo Simulation
2.3.2. Q-Learning
2.3.3. Approximations
2.3.4. Extensions to Discounted Problems
2.3.5. The Role of Parallel Computation
2.4. Notes, Sources, and Exercises
3. Undiscounted Problems
3.1. Unbounded Costs per Stage
3.2. Linear Systems and Quadratic Cost
3.3. Inventory Control
3.4. Optimal Stopping
3.5. Optimal Gambling Strategies
3.6. Nonstationary and Periodic Problems
3.7. Notes, Sources, and Exercises
4. Average Cost per Stage Problems
4.1. Preliminary Analysis
4.2. Optimality Conditions
4.3. Computational Methods
4.3.1. Value Iteration
4.3.2. Policy Iteration
4.3.3. Linear Programming
4.3.4. Simulation-Based Methods
4.4. Infinite State Space
4.5. Notes, Sources, and Exercises

5. Continuous-Time Problems
5.1. Uniformization
5.2. Queueing Applications
5.3. Semi-Markov Problems
5.4. Notes, Sources, and Exercises

References

Index

Preface

This two-volume book is based on a first-year graduate course on
dynamic programming and optimal control that I have taught for over
twenty years at Stanford University, the University of Illinois, and the Mas-
sachusetts Institute of Technology. The course has been typically attended
by students from engineering, operations research, economics, and applied
mathematics. Accordingly, a principal objective of the book has been to
provide a unified treatment of the subject, suitable for a broad audience.
In particular, problems with a continuous character, such as stochastic con-
trol problems, popular in modern control theory, are simultaneously treated
with problems with a discrete character, such as Markovian decision prob-
lems, popular in operations research. Furthermore, many applications and
examples, drawn from a broad variety of fields, are discussed.
The book may be viewed as a greatly expanded and pedagogically
improved version of my 1987 book "Dynamic Programming: Deterministic
and Stochastic Models," published by Prentice-Hall. I have included much
new material on deterministic and stochastic shortest path problems, as
well as a new chapter on continuous-time optimal control problems and the
Pontryagin Minimum Principle, developed from a dynamic programming
viewpoint. I have also added a fairly extensive exposition of simulation-
based approximation techniques for dynamic programming. These tech-
niques, which are often referred to as "neuro-dynamic programming" or
"reinforcement learning," represent a breakthrough in the practical ap-
plication of dynamic programming to complex problems that involve the
dual curse of large dimension and lack of an accurate mathematical model.
Other material was also augmented, substantially modified, and updated.
With the new material, however, the book grew so much in size that
it became necessary to divide it into two volumes: one on finite horizon,
and the other on infinite horizon problems. This division was not only
natural in terms of size, but also in terms of style and orientation. The
first volume is more oriented towards modeling, and the second is more
oriented towards mathematical analysis and computation. I have included
in the first volume a final chapter that provides an introductory treatment
of infinite horizon problems. The purpose is to make the first volume self-
1 The Dynamic Programming Algorithm

Contents
1.1. Introduction . . . . . p. 2
1.2. The Basic Problem . . . . . p. 12
1.3. The Dynamic Programming Algorithm . . . . . p. 18
1.4. State Augmentation and Other Reformulations . . . . . p. 35
1.5. Some Mathematical Issues . . . . . p. 42
1.6. Dynamic Programming and Minimax Control . . . . . p. 46
1.7. Notes, Sources, and Exercises . . . . . p. 51
Life can only be understood going backwards,
but it must be lived going forwards.
Kierkegaard
1.1 INTRODUCTION

This book deals with situations where decisions are made in stages. The
outcome of each decision may not be fully predictable but can be antici-
pated to some extent before the next decision is made. The objective is to
minimize a certain cost, a mathematical expression of what is considered
an undesirable outcome.

A key aspect of such situations is that decisions cannot be viewed in
isolation since one must balance the desire for low present cost with the
undesirability of high future costs. The dynamic programming technique
captures this tradeoff. At each stage, it ranks decisions based on the sum
of the present cost and the expected future cost, assuming optimal decision
making for subsequent stages.

There is a very broad variety of practical problems that can be treated
by dynamic programming. In this book, we try to keep the main ideas
uncluttered by irrelevant assumptions on problem structure. To this end,
we formulate in this section a broadly applicable model of optimal control
of a dynamic system over a finite number of stages (a finite horizon). This
model will occupy us for the first six chapters; its infinite horizon version
will be the subject of the last chapter as well as Vol. II.

Our basic model has two principal features: (1) an underlying discrete-
time dynamic system, and (2) a cost function that is additive over time.
The dynamic system expresses the evolution of some variables, the system's
"state", under the influence of decisions made at discrete instances of time.
The system has the form

x_{k+1} = f_k(x_k, u_k, w_k), k = 0, 1, ..., N - 1,

where

k indexes discrete time,

x_k is the state of the system and summarizes past information that is
relevant for future optimization,

u_k is the control or decision variable to be selected at time k,

w_k is a random parameter (also called disturbance or noise depending on
the context),

N is the horizon or number of times control is applied,

and f_k is a function that describes the system and in particular the mech-
anism by which the state is updated.

The cost function is additive in the sense that the cost incurred at
time k, denoted by g_k(x_k, u_k, w_k), accumulates over time. The total cost
is

g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k),

where g_N(x_N) is a terminal cost incurred at the end of the process. How-
ever, because of the presence of w_k, the cost is generally a random variable
and cannot be meaningfully optimized. We therefore formulate the problem
as an optimization of the expected cost

E { g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) },

where the expectation is with respect to the joint distribution of the random
variables involved. The optimization is over the controls u_0, u_1, ..., u_{N-1},
but some qualification is needed here; each control u_k is selected with some
knowledge of the current state x_k (either its exact value or some other
related information).
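The ingredients just listed can be made concrete in code. The sketch below estimates the expected cost of a fixed closed-loop policy by averaging simulated runs of the system equation and the additive cost; the particular dynamics `f`, stage cost `g`, terminal cost `g_N`, disturbance distribution, and policy `mu` are all invented for illustration and are not taken from the book.

```python
import random

N = 5  # horizon: number of times control is applied

def f(k, x, u, w):
    # hypothetical system equation x_{k+1} = f_k(x_k, u_k, w_k)
    return x + u - w

def g(k, x, u, w):
    # hypothetical stage cost g_k(x_k, u_k, w_k)
    return abs(x) + u

def g_N(x):
    # terminal cost incurred at the end of the process
    return abs(x)

def mu(k, x):
    # a simple closed-loop policy: u_k is chosen knowing the current state x_k
    return max(0, -x)

def expected_cost(x0, policy, trials=10000):
    """Monte Carlo estimate of E[g_N(x_N) + sum_k g_k(x_k, u_k, w_k)]."""
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        for k in range(N):
            u = policy(k, x)
            w = random.randint(0, 2)  # random disturbance w_k
            cost += g(k, x, u, w)
            x = f(k, x, u, w)
        total += cost + g_N(x)
    return total / trials

print(expected_cost(0, mu))
```

The estimate converges to the true expected cost as the number of trials grows; the qualification in the text, that each u_k may depend on the current state x_k, is what makes `policy` a function of the state rather than a fixed sequence of numbers.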
A more precise definition of the terminology just used will be given
shortly. We first provide some orientation by means of examples.
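The tradeoff described above, present cost plus expected future cost under optimal continuation, can also be previewed in a few lines of code before turning to examples. The sketch below runs the backward recursion J_k(x) = min_u E[g(x, u, w) + J_{k+1}(f(x, u, w))] on a toy problem; the state space, costs, and transition are invented for illustration, and the recursion itself is developed formally in Section 1.3.

```python
# Toy finite problem: states 0..2, controls 0 or 1, horizon N = 3.
N = 3
STATES = [0, 1, 2]
CONTROLS = [0, 1]
W = [(0, 0.5), (1, 0.5)]  # disturbance values and their probabilities

def f(x, u, w):
    return min(2, max(0, x + u - w))  # made-up transition, clipped to 0..2

def g(x, u, w):
    return x * x + u                  # made-up stage cost

def gN(x):
    return x * x                      # made-up terminal cost

# Backward recursion: start from J_N = gN and step k = N-1, ..., 0.
J = {x: gN(x) for x in STATES}
for k in reversed(range(N)):
    J = {x: min(sum(p * (g(x, u, w) + J[f(x, u, w)]) for w, p in W)
                for u in CONTROLS)
         for x in STATES}

print(J)  # optimal expected cost from each initial state
```

At every stage each control is ranked by its immediate expected cost plus the cost-to-go J computed at the previous step, which is exactly the ranking idea described in the text.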
Example 1.1.1 (Inventory Control)

Consider a problem of ordering a quantity of a certain item at each of N
periods so as to (roughly) meet a stochastic demand, while minimizing the
incurred expected cost. Let us denote

x_k: stock available at the beginning of the kth period,

u_k: stock ordered (and immediately delivered) at the beginning of the kth
period,

w_k: demand during the kth period with given probability distribution.

We assume that w_0, w_1, ..., w_{N-1} are independent random variables,
and that excess demand is backlogged and filled as soon as additional inven-
tory becomes available. Thus, stock evolves according to the discrete-time
equation

x_{k+1} = x_k + u_k - w_k,

where negative stock corresponds to backlogged demand (see Fig. 1.1.1).

Figure 1.1.1 Inventory control example. At period k, the current stock
(state) x_k, the stock ordered (control) u_k, and the demand (random distur-
bance) w_k determine the cost r(x_k) + c u_k and the stock x_{k+1} = x_k + u_k - w_k
at the next period.

The cost incurred in period k consists of two components:

(a) A cost r(x_k) representing a penalty for either positive stock x_k (holding
cost for excess inventory) or negative stock x_k (shortage cost for unfilled
demand).

(b) The purchasing cost c u_k, where c is the cost per unit ordered.

There is also a terminal cost R(x_N) for being left with inventory x_N at the
end of N periods. Thus, the total cost over N periods is

E { R(x_N) + sum_{k=0}^{N-1} ( r(x_k) + c u_k ) }.

We want to minimize this cost by proper choice of the orders u_0, ..., u_{N-1},
subject to the natural constraint u_k >= 0 for all k.

At this point we need to distinguish between closed-loop and open-
loop minimization of the cost. In open-loop minimization we select all orders
u_0, ..., u_{N-1} at once at time 0, without waiting to see the subsequent demand
levels. In closed-loop minimization we postpone placing the order u_k until the
last possible moment (time k) when the current stock x_k will be known. The
idea is that since there is no penalty for delaying the order u_k up to time k,
we can take advantage of information that becomes available between times
0 and k (the demand and stock level in past periods). In closed-loop
minimization we thus seek a sequence of functions mu_k mapping the stock
x_k into the order u_k = mu_k(x_k), so as to minimize the expected cost. The
meaning of mu_k is that, for each k and each possible value of x_k,

mu_k(x_k) = amount that should be ordered at time k if the stock is x_k.

The sequence pi = {mu_0, ..., mu_{N-1}} will be referred to as a policy or
control law. For each pi, the corresponding cost for a fixed initial stock x_0 is

J_pi(x_0) = E { R(x_N) + sum_{k=0}^{N-1} ( r(x_k) + c mu_k(x_k) ) },

and we want to minimize J_pi(x_0) for a given x_0 over all pi that satisfy the
constraints of the problem. This is a typical dynamic programming problem.
We will analyze this problem in various forms in subsequent sections. For
example, we will show in Section 4.2 that for a reasonable choice of the cost
function, the optimal ordering policy is of the form

mu_k(x_k) = S_k - x_k  if x_k < S_k,  0 otherwise,

where S_k is a suitable threshold level determined by the data of the problem.
In other words, when stock falls below the threshold S_k, order just enough to
bring stock up to S_k.
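The value of closed-loop ordering can be seen in a small simulation. The sketch below implements the inventory dynamics x_{k+1} = x_k + u_k - w_k and compares, by averaging many runs, a fixed open-loop order sequence against a threshold policy of the form described in the example; the cost parameters, demand distribution, and constant threshold S are invented for illustration.

```python
import random

N = 10      # number of periods
c = 1.0     # purchasing cost per unit ordered
S = 5       # order-up-to threshold, taken constant over k for simplicity

def r(x):
    # assumed penalty: holding cost 1 per unit of stock, shortage cost 10
    return max(x, 0) + 10.0 * max(-x, 0)

def average_cost(policy, x0=0, trials=5000):
    """Average of R(x_N) + sum_k (r(x_k) + c*u_k) over simulated runs."""
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        for k in range(N):
            u = policy(k, x)
            w = random.randint(0, 4)   # demand w_k, uniform on 0..4
            cost += r(x) + c * u
            x = x + u - w              # stock evolution with backlogging
        total += cost + r(x)           # terminal cost R(x_N) taken as r(x_N)
    return total / trials

def open_loop(k, x):
    return 2                           # all orders fixed at time 0 (mean demand)

def threshold(k, x):
    return max(0, S - x)               # order just enough to bring stock up to S

print(average_cost(open_loop), average_cost(threshold))
```

With these invented parameters, the threshold policy incurs a substantially lower average cost: by reacting to the demands observed in past periods it keeps the stock near the threshold, while the open-loop orders let the stock drift and accumulate holding and shortage penalties.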