This book presents reinforcement learning (RL) based solutions for user-centric online network selection optimization. The main content is divided into three parts. The first part (Chapters 2 and 3) focuses on how to learn the best network when quality of experience (QoE) is revealed beyond quality of service (QoS), under the framework of the multi-armed bandit (MAB). The second part (Chapters 4 and 5) focuses on how to meet dynamic user demand in complex and uncertain heterogeneous wireless networks, under the framework of the Markov decision process (MDP). The third part (Chapters 6 and 7) focuses on how to meet heterogeneous user demand for multiple users in large-scale networks, under the framework of game theory. Efficient RL algorithms that account for practical constraints are proposed to optimize QoE and realize intelligent online network selection in future mobile networks. This book is intended as a reference for researchers and designers working on resource management in 5G networks and beyond.