User Scheduling and Beam-Selection with Tabular and Deep Reinforcement Learning
Rebecca Almeida Aben-Athar, Cleverson Nahum, Davi da Silva Brilhante, José F. de Rezende, Luciano Leonel Mendes, Aldebaro Klautau

DOI: 10.14209/sbrt.2022.1570824777
Event: XL Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT2022)
Keywords: Reinforcement learning, 5G, 6G, beam-selection
Abstract
Reinforcement Learning (RL) is a promising alternative to traditional methods for user scheduling and beam-selection (SBS). Most current works on this topic adopt deep RL, in which neural networks enable state and action spaces with dimensions far larger than those supported by tabular RL. However, while deep RL relies on approximations that prevent it from obtaining policies guaranteed to be optimal, tabular RL admits methods that find the optimal policy. The lack of optimal solutions complicates the proper interpretation and assessment of results when deep RL is applied to SBS. This paper discusses how optimal policies can be found in the context of SBS and the associated issues. It also provides environments based on finite Markov decision processes that promote reproducible results and support a smooth transition from simple to more advanced RL problems. The presented experiments benchmark state-of-the-art deep and tabular RL algorithms, including scenarios for which the optimal solution is known. The results indicate that there is still room for improvement in deep RL algorithms, which do not reach the optimal solution in the adopted scenarios. This methodology not only provides insight into the performance of RL methods but also helps in comparing new algorithms, by first examining contrived problems and later expanding the number of states and actions.
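To illustrate the abstract's central point, that on a finite Markov decision process the optimal policy can be computed exactly and used as a ground truth for deep RL, the sketch below runs value iteration on a tiny hypothetical 2-state, 2-action MDP. The transition and reward tables are invented for illustration only and are not the paper's SBS environment.

```python
import numpy as np

# Hypothetical toy finite MDP (NOT the paper's environment):
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
n_states, n_actions = 2, 2
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0, actions 0 and 1
    [[0.5, 0.5], [0.1, 0.9]],  # transitions from state 1, actions 0 and 1
])
R = np.array([
    [1.0, 0.0],  # rewards in state 0
    [0.0, 2.0],  # rewards in state 1
])
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality operator.
# On a finite MDP this converges to the unique optimal value function V*,
# from which the optimal policy is read off greedily.
V = np.zeros(n_states)
for _ in range(10_000):
    Q = R + gamma * P @ V        # Q[s, a] = R[s, a] + gamma * sum_s' P[s,a,s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)        # provably optimal policy for this toy MDP
print("optimal policy:", policy, "V*:", V)
```

Because the policy is provably optimal, any deep RL agent trained on the same environment can be scored against it, which is the benchmarking methodology the abstract describes.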