How can I apply reinforcement learning to continuous action spaces?

The common way of dealing with this problem is with actor-critic methods, which extend naturally to continuous action spaces. Basic Q-learning can diverge when combined with function approximation; if you still want to use it, you can try combining it with a self-organizing map, as done in "Applications of the self-organising map to reinforcement learning". The paper also contains some further references you might find useful.
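
For concreteness, here is a minimal sketch of a one-step actor-critic with a Gaussian policy on a toy one-dimensional problem. The environment, the linear features, and all step sizes are illustrative assumptions of mine, not taken from the paper above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (made up for this sketch): the state is a scalar
# target, and the reward penalises the squared distance between the
# chosen action and that target. The optimal policy is a = s.
def step(state, action):
    reward = -(action - state) ** 2
    next_state = rng.uniform(-1.0, 1.0)
    return reward, next_state

# Linear actor (mean of a Gaussian) and linear critic (state value),
# both with the single feature phi(s) = s.
w_actor, w_critic = 0.0, 0.0
sigma = 0.5                               # fixed exploration noise
alpha_a, alpha_c, gamma = 1e-3, 1e-2, 0.9

state = rng.uniform(-1.0, 1.0)
for t in range(20000):
    mu = w_actor * state                  # policy mean
    action = rng.normal(mu, sigma)        # sample a continuous action
    reward, next_state = step(state, action)

    # TD error from the critic's value estimates.
    td_error = reward + gamma * (w_critic * next_state) - w_critic * state

    # Critic: semi-gradient TD(0) update.
    w_critic += alpha_c * td_error * state

    # Actor: policy-gradient step; for a Gaussian policy,
    # grad_w log pi(a|s) = (a - mu) / sigma^2 * s.
    w_actor += alpha_a * td_error * (action - mu) / sigma**2 * state

    state = next_state

print("learned actor weight (optimum is 1.0):", w_actor)
```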


More recently, researchers from DeepMind proposed a deep reinforcement learning actor-critic method for dealing with both continuous state and action spaces. It is based on a technique called the deterministic policy gradient. See the paper Continuous control with deep reinforcement learning and some of its implementations.
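
The two gradient steps at the heart of that method (DDPG) are compact. Below is a condensed sketch of them in PyTorch; the network sizes, dimensions, and learning rates are placeholder assumptions, and the replay buffer and target-network soft updates from the paper are omitted for brevity:

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1  # placeholder dimensions

# Deterministic policy mu(s) and action-value critic Q(s, a).
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

def update(s, a, r, s2):
    # Critic: regress Q(s, a) toward the one-step bootstrap target.
    with torch.no_grad():
        target = r + gamma * critic(torch.cat([s2, actor(s2)], dim=-1))
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient -- ascend Q(s, mu(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Example call on a random batch of fake transitions.
s = torch.randn(32, obs_dim); a = torch.rand(32, act_dim) * 2 - 1
r = torch.randn(32, 1); s2 = torch.randn(32, obs_dim)
update(s, a, r, s2)
```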


There are numerous ways to extend reinforcement learning to continuous actions. One way is to use actor-critic methods; another is to use policy gradient methods such as REINFORCE (see the sketch below).
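
To illustrate the plain policy-gradient route (no critic), here is a minimal REINFORCE sketch with a Gaussian policy on a single-step problem; the quadratic reward and the step size are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, alpha = 0.0, 1.0, 0.01   # Gaussian policy N(mu, sigma)

for episode in range(5000):
    action = rng.normal(mu, sigma)   # sample a continuous action
    ret = -(action - 2.0) ** 2       # episode return (optimum at a = 2)

    # Policy-gradient ascent on the expected return:
    # grad_mu log N(a; mu, sigma) = (a - mu) / sigma^2.
    mu += alpha * ret * (action - mu) / sigma**2

print("learned mean (optimum is 2.0):", mu)
```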

A rather extensive overview of the different methods can be found in the following paper, which is available online: Reinforcement Learning in Continuous State and Action Spaces, by Hado van Hasselt and Marco A. Wiering.