Reinforcement learning (RL) can optimally solve decision and control problems involving complex dynamic systems, without requiring a mathematical model of the system. If a model is available, dynamic programming (DP), the model-based counterpart of RL, can be used. RL and DP are applicable in a variety of disciplines, including automatic control, artificial intelligence, economics, and medicine. Recent years have seen a surge of interest in RL and DP using compact, approximate representations of the solution, which enable algorithms to scale up to realistic problems.
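To make the model-based/model-free distinction concrete, the short sketch below (a minimal illustration, not an example from the book) solves a hypothetical two-state MDP both ways: Q-iteration, a DP method, computes the optimal Q-function directly from the transition model, while Q-learning, an RL method, estimates the same Q-function from sampled transitions only. All quantities in the sketch are assumed for illustration.

    import numpy as np

    # Hypothetical two-state, two-action MDP, used only for illustration.
    # P[s, a, s'] is the transition probability and R[s, a] the expected reward.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.8, 0.2], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    gamma = 0.9

    # Model-based DP: Q-iteration sweeps the Bellman optimality backup,
    # using P and R directly.
    Q = np.zeros((2, 2))
    for _ in range(200):
        Q = R + gamma * P @ Q.max(axis=1)

    # Model-free RL: Q-learning estimates the same Q-function from sampled
    # transitions, without ever using P or R explicitly.
    rng = np.random.default_rng(0)
    Q_rl = np.zeros((2, 2))
    alpha, s = 0.1, 0
    for _ in range(20000):
        a = rng.integers(2)                  # explore uniformly at random
        s_next = rng.choice(2, p=P[s, a])    # environment samples the next state
        r = R[s, a]                          # reward observed from the environment
        Q_rl[s, a] += alpha * (r + gamma * Q_rl[s_next].max() - Q_rl[s, a])
        s = s_next

    print(Q)      # exact DP solution
    print(Q_rl)   # model-free estimate, close to the DP solution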
This book provides an in-depth introduction to RL and DP with function approximators. A concise description of classical RL and DP (Chapter 2) builds the foundation for the remainder of the book. This is followed by an extensive review of the state of the art in RL and DP with approximation, which combines algorithm development with theoretical guarantees, illustrative numerical examples, and insightful comparisons (Chapter 3). Each of the final three chapters (4 to 6) is dedicated to a representative algorithm from one of the three major classes of methods: value iteration, policy iteration, and policy search. The features and performance of these algorithms are highlighted in extensive experimental studies on a range of control applications.
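As a flavor of the approximate value iteration methods the book covers, here is a minimal sketch of fitted Q-iteration with a linear radial-basis-function approximator on a hypothetical one-dimensional control problem. The dynamics, reward, features, and parameters below are illustrative assumptions and do not reproduce the book's case studies.

    import numpy as np

    # Hypothetical 1-D control problem: state s in [-1, 1], two actions {-1, +1},
    # with toy dynamics and reward chosen only for illustration.
    rng = np.random.default_rng(0)
    gamma = 0.95
    actions = np.array([-1.0, 1.0])
    centers = np.linspace(-1, 1, 9)               # assumed RBF centers

    def features(s):
        # Radial basis features: Q(s, a) is approximated as features(s) @ w[a].
        return np.exp(-((s - centers) ** 2) / 0.1)

    def step(s, a):
        s_next = np.clip(s + 0.1 * a, -1.0, 1.0)  # toy dynamics
        return s_next, -s_next ** 2               # reward: drive the state to 0

    # Collect a batch of random transitions (samples only, no model needed).
    S = rng.uniform(-1, 1, 500)
    A = rng.choice(actions, 500)
    SN, Rw = map(np.array, zip(*(step(s, a) for s, a in zip(S, A))))
    Phi = np.array([features(s) for s in S])
    PhiN = np.array([features(s) for s in SN])

    # Approximate value iteration: regress each action's weights onto the
    # Bellman targets computed from the current approximation.
    w = {a: np.zeros(len(centers)) for a in actions}
    for _ in range(50):
        V_next = np.max([PhiN @ w[a] for a in actions], axis=0)
        targets = Rw + gamma * V_next
        for a in actions:
            mask = A == a
            w[a], *_ = np.linalg.lstsq(Phi[mask], targets[mask], rcond=None)

    # Greedy policy induced by the fitted Q-function.
    def policy(s):
        return actions[np.argmax([features(s) @ w[a] for a in actions])]

    print(policy(0.5))   # should steer the state toward 0, i.e. pick -1.0

The structural point of the sketch is that each iteration reduces the Bellman backup to a standard regression problem, which is what lets such methods scale beyond small discrete state spaces.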
For graduate students and others new to the field, this book offers a thorough introduction to both the basics and emerging methods. For researchers and practitioners in optimal and adaptive control, machine learning, artificial intelligence, and operations research, it offers a combination of practical algorithms, theoretical analysis, and comprehensive examples that they can adapt and apply to their own work.
Sample chapter: Ch. 3 - Dynamic programming and reinforcement learning in large and continuous spaces. The most extensive chapter in the book, it reviews methods and algorithms for approximate dynamic programming and reinforcement learning, with theoretical results, discussion, and illustrative numerical examples. This chapter has been made freely available for download, for a limited time, with the kind permission of Taylor & Francis.
Code used for the numerical studies in the book: ApproxRL, a Matlab toolbox for approximate RL and DP, approxrl.zip. See the readme file of the toolbox for more information.
Lecture slides on classical RL and DP (part 1) and on RL and DP with function approximation (part 2).
Lucian Busoniu is a postdoctoral fellow at the Delft Center for Systems and Control of Delft University of Technology, in the Netherlands. He received his PhD degree (cum laude) in 2009 from the Delft University of Technology, and his MSc degree in 2003 from the Technical University of Cluj-Napoca, Romania. His current research interests include reinforcement learning and dynamic programming with function approximation, intelligent and learning techniques for control problems, and multi-agent learning.
Robert Babuska is a full professor at the Delft Center for Systems and Control of Delft University of Technology in the Netherlands. He received his PhD degree (cum laude) in Control in 1997 from the Delft University of Technology, and his MSc degree (with honors) in Electrical Engineering in 1990 from the Czech Technical University in Prague. His research interests include fuzzy systems modeling and identification, data-driven construction and adaptation of neuro-fuzzy systems, model-based fuzzy control, and learning control. He is active in applying these techniques in robotics, mechatronics, and aerospace.
Bart De Schutter is a full professor at the Delft Center for Systems and Control and at the Marine & Transport Technology department of Delft University of Technology in the Netherlands. He received the PhD degree in Applied Sciences (summa cum laude with congratulations of the examination jury) in 1996 from K.U. Leuven, Belgium. His current research interests include multi-agent systems, hybrid systems control, discrete-event systems, and control of intelligent transportation systems.
Damien Ernst received the MSc and PhD degrees from the University of Liège in 1998 and 2003, respectively. He is currently a Research Associate of the Belgian FRS-FNRS and is affiliated with the Systems and Modeling Research Unit of the University of Liège. From 2003 to 2006 he was a Postdoctoral Researcher of the FRS-FNRS at the University of Liège, and during this period he held visiting researcher positions at CMU, MIT, and ETH. He spent the academic year 2006--2007 at Supélec (France) as a professor. His main research interests are in the fields of power system dynamics, optimal control, reinforcement learning, and design of dynamic treatment regimes.
Comments, suggestions, and questions concerning the book or the Web site are welcome. Please direct them preferably to the first author, Lucian Busoniu, or otherwise to any of the other authors (see their respective websites for contact information).