Optimal Control and Reinforcement Learning
Preface
Feedback
Offerings
1 Markov Decision Process
1.1 Finite-Horizon MDP
1.1.1 Value Functions
1.1.2 Policy Evaluation
1.1.3 Principle of Optimality
1.1.4 Dynamic Programming
1.2 Infinite-Horizon MDP
1.2.1 Value Functions
1.2.2 Policy Evaluation
1.2.3 Principle of Optimality
1.2.4 Policy Improvement
1.2.5 Policy Iteration
1.2.6 Value Iteration
2 Value-based Reinforcement Learning
2.1 Tabular Methods
2.1.1 Policy Evaluation
2.1.2 Convergence Proof of TD Learning
2.1.3 On-Policy Control
2.1.4 Off-Policy Control
2.2 Function Approximation
2.2.1 Basics of Continuous MDP
2.2.2 Policy Evaluation
2.2.3 On-Policy Control
2.2.4 Off-Policy Control
3 Policy Gradient Methods
3.1 Gradient-based Optimization
3.1.1 Basic Setup
3.1.2 Gradient Ascent and Descent
3.1.3 Stochastic Gradients
3.1.4 Beyond Vanilla Gradient Methods
3.2 Policy Gradients
3.2.1 Setup
3.2.2 The Policy Gradient Lemma
3.2.3 REINFORCE
3.2.4 Baselines and Variance Reduction
3.3 Actor–Critic Methods
3.3.1 Anatomy of an Actor–Critic
3.3.2 On-Policy Actor–Critic with TD(0)
3.3.3 Generalized Advantage Estimation (GAE)
3.3.4 Off-Policy Actor–Critic
3.4 Advanced Policy Gradients
3.4.1 Revisiting Generalized Policy Iteration
3.4.2 Performance Difference Lemma
3.4.3 Trust Region Constraint
3.4.4 Natural Policy Gradient
3.4.5 Proof of Fisher Information
3.4.6 Trust Region Policy Optimization
3.4.7 Proximal Policy Optimization
3.4.8 Soft Actor–Critic
3.4.9 Deterministic Policy Gradient
3.5 Model-based Policy Optimization
4 Model-based Planning and Optimization
4.1 Linear Quadratic Regulator
4.1.1 Finite-Horizon LQR
4.1.2 Infinite-Horizon LQR
4.1.3 Linear System Basics
4.2 LQR Trajectory Tracking
4.3 Trajectory Optimization
4.3.1 Iterative LQR
4.3.2 Differential Dynamic Programming
4.3.3 Quadratic Programming
4.3.4 Sequential Quadratic Programming
4.3.5 Interior Point Method
4.4 Model Predictive Control
5 Advanced Materials
Appendix
A Convex Analysis and Optimization
A.1 Theory
A.1.1 Sets
A.1.2 Convex Functions
A.1.3 Lagrange Dual
A.1.4 KKT Conditions
A.2 Practice
A.2.1 CVX Introduction
A.2.2 Linear Programming (LP)
A.2.3 Quadratic Programming (QP)
A.2.4 Quadratically Constrained Quadratic Programming (QCQP)
A.2.5 Second-Order Cone Programming (SOCP)
A.2.6 Semidefinite Programming (SDP)
A.2.7 CVXPY Introduction and Examples
B Linear System Theory
B.1 Stability
B.1.1 Continuous-Time Stability
B.1.2 Discrete-Time Stability
B.1.3 Lyapunov Analysis
B.2 Controllability and Observability
B.2.1 Cayley-Hamilton Theorem
B.2.2 Equivalent Statements for Controllability
B.2.3 Duality
B.2.4 Equivalent Statements for Observability
B.3 Stabilizability and Detectability
B.3.1 Equivalent Statements for Stabilizability
B.3.2 Equivalent Statements for Detectability
References
Chapter 5
Advanced Materials