Real-world problems modeled by Markov decision processes (MDPs) are often difficult to solve in practice because of the curse of dimensionality. In other cases, explicit specification of the MDP model parameters is not feasible, but simulation samples are available. For these settings, various sampling and population-based numerical algorithms have recently been developed for computing an optimal solution in terms of a policy and/or value function.
Here, this state-of-the-art research is brought together in a way that makes it accessible to researchers of varying interests and backgrounds. Many specific algorithms, illustrative numerical examples, and rigorous theoretical convergence results are provided. The algorithms differ from the successful computational methods for solving MDPs based on neuro-dynamic programming or reinforcement learning. They can also be combined with approximate dynamic programming methods that reduce the size of the state space and ameliorate the effects of dimensionality.