References
Antos, András, Csaba Szepesvári, and Rémi Munos. 2007. “Fitted Q-Iteration in Continuous Action-Space MDPs.” Advances in Neural Information Processing Systems 20.
Arnold, William F, and Alan J Laub. 1984. “Generalized Eigenproblem Algorithms and Software for Algebraic Riccati Equations.” Proceedings of the IEEE 72 (12): 1746–54.
Baird, Leemon. 1995. “Residual Algorithms: Reinforcement Learning with Function Approximation.” In Proceedings of the Twelfth International Conference on Machine Learning, 30–37.
Barto, Andrew G, Richard S Sutton, and Charles W Anderson. 1983. “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems.” IEEE Transactions on Systems, Man, and Cybernetics SMC-13 (5): 834–46.
Chen, Chi-Tsong. 1984. Linear System Theory and Design. Saunders College Publishing.
Davison, E., and W. Wonham. 1968. “On Pole Assignment in Multivariable Linear Systems.” IEEE Transactions on Automatic Control 13 (6): 747–48. https://doi.org/10.1109/TAC.1968.1099056.
Fan, Jianqing, Zhaoran Wang, Yuchen Xie, and Zhuoran Yang. 2020. “A Theoretical Analysis of Deep Q-Learning.” In Learning for Dynamics and Control, 486–89. PMLR.
Garrigos, Guillaume, and Robert M Gower. 2023. “Handbook of Convergence Theorems for (Stochastic) Gradient Methods.” arXiv Preprint arXiv:2301.11235.
Haarnoja, Tuomas, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.” In International Conference on Machine Learning, 1861–70. PMLR.
Janner, Michael, Justin Fu, Marvin Zhang, and Sergey Levine. 2019. “When to Trust Your Model: Model-Based Policy Optimization.” Advances in Neural Information Processing Systems 32.
Kakade, Sham M. 2001. “A Natural Policy Gradient.” Advances in Neural Information Processing Systems 14.
Kang, Shucheng, Xiaoyang Xu, Jay Sarva, Ling Liang, and Heng Yang. 2024. “Fast and Certifiable Trajectory Optimization.” In International Workshop on the Algorithmic Foundations of Robotics.
Kearns, Michael J, and Satinder Singh. 2000. “Bias-Variance Error Bounds for Temporal Difference Updates.” In COLT, 142–47.
Lillicrap, Timothy P, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. “Continuous Control with Deep Reinforcement Learning.” arXiv Preprint arXiv:1509.02971.
Liu, Dong C, and Jorge Nocedal. 1989. “On the Limited Memory BFGS Method for Large Scale Optimization.” Mathematical Programming 45 (1): 503–28.
Mahmood, A Rupam, Huizhen Yu, Martha White, and Richard S Sutton. 2015. “Emphatic Temporal-Difference Learning.” arXiv Preprint arXiv:1507.01569.
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, et al. 2015. “Human-Level Control Through Deep Reinforcement Learning.” Nature 518 (7540): 529–33.
Munos, Rémi, and Csaba Szepesvári. 2008. “Finite-Time Bounds for Fitted Value Iteration.” Journal of Machine Learning Research 9 (5): 815–57.
Nesterov, Yurii. 2018. Lectures on Convex Optimization. Vol. 137. Springer.
Nocedal, Jorge, and Stephen J Wright. 1999. Numerical Optimization. Springer.
Riedmiller, Martin. 2005. “Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method.” In European Conference on Machine Learning, 317–28. Springer.
Robbins, Herbert, and David Siegmund. 1971. “A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications.” In Optimizing Methods in Statistics, 233–57. Elsevier.
Schulman, John, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. “Trust Region Policy Optimization.” In International Conference on Machine Learning, 1889–97. PMLR.
Schulman, John, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2015. “High-Dimensional Continuous Control Using Generalized Advantage Estimation.” arXiv Preprint arXiv:1506.02438.
Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. “Proximal Policy Optimization Algorithms.” arXiv Preprint arXiv:1707.06347.
Silver, David, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. “Deterministic Policy Gradient Algorithms.” In International Conference on Machine Learning, 387–95. PMLR.
Sutton, Richard S, and Andrew G Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. MIT Press.
Sutton, Richard S, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, and Eric Wiewiora. 2009. “Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation.” In Proceedings of the 26th Annual International Conference on Machine Learning, 993–1000.
Sutton, Richard S, Csaba Szepesvári, and Hamid Reza Maei. 2008. “A Convergent O(n) Algorithm for Off-Policy Temporal-Difference Learning with Linear Function Approximation.” Advances in Neural Information Processing Systems 21 (21): 1609–16.
Zhou, Kemin, John C Doyle, and Keith Glover. 1996. Robust and Optimal Control. Prentice Hall.