Optimal Policies for Reinforcement Learning Applied to User Scheduling Tabular Environments
Caio B. Brasil, Cleverson Veloso Nahum, Aldebaro Klautau, Jasmine Priscyla Leite de Araújo, Ingrid Nascimento

DOI: 10.14209/sbrt.2025.1571156595
Event: XLIII Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT 2025)
Keywords: Reinforcement learning, User scheduling, Baseline
Abstract
User scheduling is a fundamental task in shared systems where multiple users or processes compete for limited resources. Its main objective is to allocate these resources efficiently while ensuring fairness, high performance, and adherence to quality of service (QoS) requirements. In this paper, we explore the use of Reinforcement Learning (RL) methods to address the user scheduling problem in a simplified scenario. Our results show that when the problem is modeled as a fully observable finite Markov decision process (FMDP), deep reinforcement learning methods appear to converge during training. However, compared with classical approaches such as Value Iteration, their policies still leave noticeable room for improvement, making them suitable baselines for further implementations.
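To illustrate the classical reference point mentioned above, the sketch below runs Value Iteration on a small tabular, fully observable finite MDP. This is a minimal, hypothetical example: the state and action dimensions, discount factor, and random transition/reward model are placeholder assumptions, not the paper's scheduling environment.

```python
import numpy as np

# Minimal Value Iteration sketch for a tabular, fully observable finite MDP.
# The toy model below is a hypothetical stand-in, not the paper's environment:
# 4 joint channel states for 2 users; the action selects which user to schedule.

n_states = 4          # assumed: (bad,bad), (bad,good), (good,bad), (good,good)
n_actions = 2         # schedule user 0 or user 1
gamma = 0.9           # assumed discount factor
rng = np.random.default_rng(0)

# Random stochastic transition tensor P[s, a, s'] and reward matrix R[s, a]
# stand in for the real environment dynamics.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

# Value Iteration: repeatedly apply the Bellman optimality operator
#   V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s, a, s') V(s') ]
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)        # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the converged values
print("Optimal values:", np.round(V, 3))
print("Optimal policy (user to schedule per state):", policy)
```

The resulting greedy policy is optimal for the given tabular model, which is why Value Iteration serves as the upper bound against which the learned deep RL policies are compared.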
