Next-generation networks aim to provide performance guarantees to real-time interactive services that require timely and cost-efficient packet delivery. In this context, the goal is to reliably deliver packets within strict deadlines imposed by the application while minimizing overall resource allocation cost. A large body of work has leveraged stochastic optimization techniques to design efficient dynamic routing and scheduling solutions under average delay constraints; however, these methods fall short when faced with strict per-packet delay requirements. We formulate the minimum-cost delay-constrained network control problem as a constrained Markov decision process and leverage constrained deep reinforcement learning (CDRL) techniques to minimize total resource allocation cost while keeping timely throughput above a target reliability level.
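One common way to solve such a constrained MDP is a Lagrangian primal-dual scheme: the agent maximizes a scalarized reward that trades off cost against timely deliveries, while a dual multiplier is adjusted toward satisfying the reliability constraint. The sketch below illustrates this idea only; the function names, learning rate, and target value are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch of a primal-dual (Lagrangian) update for a CMDP of the form:
#   minimize cost  subject to  reliability >= target.
# All names and constants here are illustrative assumptions.

def lagrangian_reward(cost: float, delivered_on_time: float, lam: float) -> float:
    """Scalarized reward the RL agent maximizes: -cost + lam * timely deliveries."""
    return -cost + lam * delivered_on_time

def dual_update(lam: float, reliability: float, target: float, lr: float = 0.01) -> float:
    """Dual ascent: raise lam when the reliability constraint is violated,
    lower it when there is slack; the multiplier stays non-negative."""
    return max(lam + lr * (target - reliability), 0.0)

# Toy trajectory: reliability below target pushes lam up, above target pulls it down.
lam = 0.0
for reliability in [0.80, 0.85, 0.92]:
    lam = dual_update(lam, reliability, target=0.90, lr=1.0)
```

In practice the policy update (maximizing the scalarized reward) and the dual update run in alternation, so the multiplier converges to a value at which the learned policy meets the reliability target at minimum cost.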
Figure 1: Illustration of lifetime-based queue dynamics. Each commodity is responsible for forwarding packets from its source node to its destination node. Packets accumulate in queues according to their remaining lifetimes and the commodity they belong to. After each time slot, packets gradually deplete their lifetimes as the queues evolve, moving to queues with smaller remaining lifetime. In this figure, packets go from green to red as they lose lifetime, and packets with no remaining lifetime are dropped from the queue. The scheduling agent can decide to forward, hold, or actively drop packets based on its policy.
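The queue evolution described in the caption can be sketched as follows. This is a minimal illustration under assumed semantics (queues indexed by remaining lifetime, one commodity, no arrivals or forwarding shown): after each slot every held packet moves to the queue with one less slot of lifetime, and packets whose lifetime reaches zero are dropped.

```python
# Minimal sketch of the lifetime-based queue dynamics of Figure 1 (assumed
# semantics). Queues are keyed by remaining lifetime in slots; the value is
# the number of packets waiting with that lifetime.

from collections import defaultdict

def evolve(queues: dict) -> dict:
    """One time-slot transition: every held packet shifts to the queue with
    one less slot of remaining lifetime; expired packets are dropped."""
    nxt = defaultdict(int)
    for lifetime, count in queues.items():
        if lifetime - 1 > 0:          # lifetime exhausted -> packet dropped
            nxt[lifetime - 1] += count
    return dict(nxt)

# Example: 3 packets with 2 slots of lifetime left, 1 packet with 1 slot left.
q = {2: 3, 1: 1}
q = evolve(q)  # the lifetime-1 packet is dropped; the rest move to lifetime 1
```

A full simulation would also inject new arrivals at the maximum lifetime each slot and remove packets the agent chooses to forward or drop, per its policy.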
Figure 2: Reliability (timely throughput / mean arrival rate) on the left y-axis, and cost per episode on the right y-axis. We compare our algorithm, CDRL-NC, against baselines such as Backpressure (BP) and Universal Max-Weight (UMW). Our results show that CDRL-NC satisfies the reliability constraint (horizontal black dashed line) at lower cost, whereas the baselines fail to satisfy the constraint even at higher cost.
Please refer to Ozan’s website for more information and related papers.