Here is a curated list of Reinforcement Learning materials that we've gathered over the years. We share it with our new employees, and we thought we'd share it with the rest of the world too.

**Theory**

Lectures

[UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver

Video Lectures by David Silver https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PL5X3mDkKaJrL42i_jhE4N-p6E2Ol62Ofa

[UC Berkeley] CS188 Artificial Intelligence by Pieter Abbeel

[Udacity (Georgia Tech.)] Machine Learning 3: Reinforcement Learning (CS7641)

[Stanford] CS229 Machine Learning - Lecture 16: Reinforcement Learning by Andrew Ng

**Books**

Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction [Book] [Code]

Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]

David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]

Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]

Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]

**Surveys**

Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]

S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]

Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR, 2009. [Paper]

Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics: A Survey, IJRR, 2013. [Paper]

Michael L. Littman, Reinforcement Learning Improves Behaviour from Evaluative Feedback, Nature 521(7553): 445-451, 2015. [Paper]

Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]

**Papers / Thesis**

Foundational Papers

Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper]

Discusses issues in RL such as the "credit assignment problem".

Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper]

Earliest publication on the temporal-difference (TD) learning rule.

Methods

Dynamic Programming (DP):

Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]

Monte Carlo:

Temporal-Difference:

Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences, Machine Learning 3: 9-44, 1988. [Paper]

Q-Learning (Off-policy TD algorithm):

Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]

Sarsa (On-policy TD algorithm):

R-Learning (learning of relative values)

Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]

Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)

Policy Search / Policy Gradient

Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]

Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]

Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]

Jan Peters, Katharina Mülling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]

Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]

Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]

Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]

Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]

Hierarchical RL

Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)

V. Mnih et al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]

Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]

Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies, ArXiv, 16 Oct 2015. [ArXiv]

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]

Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]

Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016. [ArXiv]

**Game Playing**

Doom! - https://github.com/openai/doom-py/tree/master/doom_py

Traditional Games

Computer Games

**Control**

An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper][Video]

Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]

**Tutorials / Websites**

Mance Harmon and Stephanie Harmon, Reinforcement Learning: A Tutorial

Short introduction to some Reinforcement Learning algorithms

C. Igel, M.A. Riedmiller, et al., Reinforcement Learning in a Nutshell, ESANN, 2007. [Paper]

UNSW - Reinforcement Learning

Scholarpedia articles on:

Repository with useful MATLAB Software, presentations, and demo videos

UC Berkeley - CS 294: Deep Reinforcement Learning, Fall 2015 (John Schulman, Pieter Abbeel) [Class Website]

Blog posts on Reinforcement Learning, Parts 1-4 by Travis DeWolf

The Arcade Learning Environment - Atari 2600 games environment for developing AI agents

Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy

**Online Demos**

Deep Q-Learning Demo - A deep Q learning demonstration using ConvNetJS

Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow

Reinforcement Learning Demo - A reinforcement learning demo using reinforcejs by Andrej Karpathy

Who are we?

*Remi AI is an Artificial Intelligence Research Firm with offices in Sydney and San Francisco. We have delivered inventory and supply chain projects across FMCG, automotive, industrial and corporate supply and more.*
