Minghao Ye

[ICNP ’23] Roracle: Enabling Lookahead Routing for Scalable Traffic Engineering with Supervised Learning

October 31, 2023 by Minghao Ye

One of our recent works entitled “Roracle: Enabling Lookahead Routing for Scalable Traffic Engineering with Supervised Learning” was accepted by the 31st IEEE International Conference on Network Protocols (ICNP 2023) on 7/29/2023. Congratulations!

In this work, we developed a Supervised Learning (SL)-based Graph Neural Network (GNN) framework called Roracle to greatly accelerate Traffic Engineering (TE) operations in today’s large-scale networks. Instead of solving a time-consuming routing optimization problem in real time, Roracle can bypass routing optimization and quickly infer lookahead routing decisions to improve network load balancing performance with good scalability. This work was presented by my PhD advisor, Prof. H. Jonathan Chao, at the ICNP conference in Reykjavik, Iceland, on 10/13/2023.

Project Overview:

(1) Limitations of traditional TE: Poor scalability in today’s large-scale wide-area networks

(2) Proposed TE solution: Roracle and its advantages

(3) GNN framework of Roracle

(4) Evaluation results in real-world large networks and traffic traces

Abstract:

Traditional Traffic Engineering (TE) usually balances the load on network links by formulating and solving a routing optimization problem based on measured Traffic Matrices (TMs). Given that traffic demands could change unexpectedly and significantly in realistic scenarios, routing strategies optimized based on currently measured TMs might not work well in future traffic scenarios. To compensate for the mismatch between stale routing decisions and future TMs, network operators may perform routing updates more frequently, which could introduce significant network disturbance and service disruption. Moreover, given the high routing computation overhead of TE optimization in today’s large-scale networks, routing updates could experience severe delay and thus cannot accommodate future traffic changes in time.
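
To make the mismatch between stale routing and future traffic concrete, the following is a minimal sketch that computes the maximum link utilization (MLU) of a fixed routing under two traffic matrices. The three-node topology, the demand values, and the function name `max_link_utilization` are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict


def max_link_utilization(demands, routing, capacity):
    """Return the maximum link utilization (MLU) of a routing.

    demands:  {(src, dst): volume}
    routing:  {(src, dst): [(path, split_ratio), ...]}, ratios sum to 1
    capacity: {(u, v): link capacity}
    """
    load = defaultdict(float)
    for flow, volume in demands.items():
        for path, ratio in routing[flow]:
            # Add this flow's share to every link along the path.
            for link in zip(path, path[1:]):
                load[link] += volume * ratio
    return max(load[link] / capacity[link] for link in capacity)


# Hypothetical toy network: A-B direct, plus a detour A-C-B.
capacity = {("A", "B"): 10.0, ("A", "C"): 10.0, ("C", "B"): 10.0}

# Split ratios tuned for the currently measured TM (illustrative values).
routing = {
    ("A", "B"): [(("A", "B"), 0.625), (("A", "C", "B"), 0.375)],
    ("A", "C"): [(("A", "C"), 1.0)],
}

current_tm = {("A", "B"): 8.0, ("A", "C"): 2.0}
future_tm = {("A", "B"): 8.0, ("A", "C"): 8.0}  # A->C demand grows later

mlu_current = max_link_utilization(current_tm, routing, capacity)
mlu_future = max_link_utilization(future_tm, routing, capacity)
print(f"MLU on current TM: {mlu_current:.2f}")
print(f"MLU on future TM:  {mlu_future:.2f}")
```

In this toy case the same split ratios that balance the current TM overload link A-C once the A-to-C demand grows, which is the kind of degradation that lookahead routing aims to avoid.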

To address these challenges, we propose Roracle, a scalable learning-based TE that quickly predicts a good routing strategy for a long sequence of future TMs, while the learning process is guided by the optimal solutions of Linear Programming (LP) problems using Supervised Learning (SL). We design a scalable Graph Neural Network (GNN) architecture that greatly facilitates training and inference processes to accelerate TE in large networks.

Extensive simulation results on real-world network topologies and traffic traces show that Roracle outperforms existing TE solutions by up to 36% in terms of worst-case performance under future unknown traffic scenarios. Additionally, Roracle achieves good scalability by providing at least 71× speedup over the most efficient baseline method in large-scale networks.

Publications:

[ICNP ’23] Minghao Ye, Junjie Zhang, Zehua Guo, and H. Jonathan Chao, “Roracle: Enabling Lookahead Routing for Scalable Traffic Engineering with Supervised Learning,” The 31st IEEE International Conference on Network Protocols (ICNP), 2023. (Acceptance rate: 18.8%, 34/181) [Paper URL] [PDF]

[IWQoS ’23] QoS-RL: QoS-Aware Traffic Engineering with Reinforcement Learning

June 20, 2023 by Minghao Ye

One of our recent works entitled “Reinforcement Learning-based Traffic Engineering for QoS Provisioning and Load Balancing” was accepted by the 31st IEEE/ACM International Symposium on Quality of Service (IWQoS 2023) on 3/30/2023. Congratulations!

In this work, we developed a Reinforcement Learning (RL)-based traffic engineering solution to provide good Quality of Service (QoS) for high priority traffic while maintaining promising load balancing performance in the network by rerouting a small portion of low priority traffic with low management overhead. This work was presented at the IWQoS conference in Orlando, FL, USA, on 6/19/2023.

Project Overview:

(1) TE requirements for different applications: QoS provisioning + load balancing

(2) Our idea: Categorize traffic into different priority levels for routing optimization

(3) Apply destination-based routing to forward high/low priority traffic with simplified routing updates

(4) Leverage intelligent RL to select a few forwarding entries for routing updates: Achieve good performance with efficient TCAM usage + accelerated TE optimization + reduced management overhead

(5) Evaluation results: Close-to-optimal delay performance + Optimal load balancing performance

Abstract:

Emerging applications pose different Quality of Service (QoS) requirements for the network, where Traffic Engineering (TE) plays an important role in QoS provisioning by carefully selecting routing paths and adjusting traffic split ratios on routing paths. To accommodate diverse QoS requirements of traffic flows under network dynamics, TE usually periodically computes an optimal routing strategy and updates a significant number of forwarding entries, which introduces considerable network operation management overhead.

In this work, we propose QoS-RL, a Reinforcement Learning (RL)-based TE solution for QoS provisioning and load balancing with low management overhead and service disruption during routing updates. Given the traffic matrices that represent the traffic demands of high and low priority flows, QoS-RL can intelligently select and update only a few destination-based forwarding entries to satisfy the QoS requirements of high priority traffic while maintaining good load balancing performance by rerouting a small portion of low priority traffic.

Extensive simulation results on four real-world network topologies demonstrate that QoS-RL provides at least 95.5% of optimal end-to-end delay performance on average for high priority flows, and also achieves above 90% of optimal load balancing performance in most cases by updating only 10% of destination-based forwarding entries.

Publications:

[IWQoS ’23] Minghao Ye, Yang Hu (co-first author), Junjie Zhang, Zehua Guo, and H. Jonathan Chao, “Reinforcement Learning-based Traffic Engineering for QoS Provisioning and Load Balancing,” The 31st IEEE/ACM International Symposium on Quality of Service (IWQoS), 2023. (Acceptance rate: 23.5%, 62/264) [Paper URL] [PDF]

[INFOCOM ’23] LARRI: Adaptive Range Routing Prediction with Supervised Learning and Graph Neural Networks

December 1, 2022 by Minghao Ye

One of our recent works entitled “LARRI: Learning-based Adaptive Range Routing for Highly Dynamic Traffic in WANs” was accepted by the IEEE International Conference on Computer Communications (INFOCOM 2023). Congratulations!

In this work, we developed a learning-based adaptive range routing scheme to accommodate dynamic future traffic fluctuation in the networks. The key insight is to directly predict a routing for future traffic scenarios by combining Supervised Learning (SL) and Graph Neural Network (GNN) techniques. This work was presented at the INFOCOM conference in Hoboken, NJ, USA, on 5/17/2023.

Project Overview:

(1) Existing problem: Traditional TE cannot adapt to unexpected traffic fluctuations in the future

(2) LARRI: Directly predict appropriate range routing strategies based on historical traffic demands to accommodate future traffic variations

(3) SL + GNN: Accelerate the training process and improve the accuracy of routing prediction

(4) Workflow of LARRI

(5) Evaluation results: LARRI is robust against unexpected traffic fluctuations with a strong worst-case performance guarantee

Abstract:

Traffic Engineering (TE) has been widely used by network operators to improve network performance and provide better service quality to users. One major challenge for TE is how to generate good routing strategies adaptive to highly dynamic future traffic scenarios. Unfortunately, existing works could either experience severe performance degradation under unexpected traffic fluctuations or sacrifice performance optimality for guaranteeing the worst-case performance when traffic is relatively stable.

In this paper, we propose LARRI, a learning-based TE solution that predicts adaptive routing strategies for future unknown traffic scenarios. By learning and predicting a routing to handle an appropriate range of future possible traffic matrices, LARRI can effectively realize a trade-off between performance optimality and worst-case performance guarantee. This is done by integrating the prediction of future demand range and the imitation of optimal range routing into one step. Moreover, LARRI employs a scalable graph neural network architecture to greatly facilitate training and inference.

Extensive simulation results on six real-world network topologies and traffic traces show that LARRI achieves near-optimal load balancing performance in future traffic scenarios with up to 43.3% worst-case performance improvement over state-of-the-art baselines, and also provides the lowest end-to-end delay under dynamic traffic fluctuations.

Publication:

[INFOCOM ’23] Minghao Ye, Junjie Zhang, Zehua Guo, and H. Jonathan Chao, “LARRI: Learning-based Adaptive Range Routing for Highly Dynamic Traffic in WANs,” IEEE International Conference on Computer Communications (INFOCOM), 2023. (Selected as one of the five fast-tracked papers for submission to IEEE/ACM Transactions on Networking. Acceptance rate: 19.2%, 252/1312) [Paper URL] [PDF]

[ToN 22] FlexDATE: Disturbance-Aware Traffic Engineering with Reinforcement Learning in SDN

November 17, 2022 by Minghao Ye

One important issue neglected by existing Traffic Engineering (TE) solutions is the network disturbance and service disruption caused by flow rerouting operations. To address this problem, we developed an RL-based TE solution called FlexDATE to reduce the network disturbance of TE while achieving near-optimal load balancing performance. Our idea is to leverage Reinforcement Learning (RL) to intelligently select and reroute a flexible number of critical flows that contribute the most to load balancing performance improvement, while the majority of network traffic is routed by the static ECMP method without any routing updates to mitigate network disturbance.
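
As a rough illustration of the static ECMP baseline that carries the majority of traffic, the sketch below splits each demand evenly across all shortest-path next hops and accumulates link loads. It assumes undirected links with unit cost, and the diamond topology and the function name `ecmp_link_loads` are hypothetical. The RL-based selection of critical flows and the LP rerouting step are not modeled here.

```python
from collections import defaultdict, deque


def ecmp_link_loads(adjacency, demands):
    """Link loads when every node splits traffic equally over all
    shortest-path next hops (hop-count metric, per-destination)."""
    load = defaultdict(float)
    for dst in {d for (_, d) in demands}:
        # BFS from the destination gives each node's hop distance to dst.
        dist = {dst: 0}
        queue = deque([dst])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)

        # Traffic that each node must forward toward dst.
        inflow = defaultdict(float)
        for (src, d), volume in demands.items():
            if d == dst:
                inflow[src] += volume

        # Visit nodes farthest from dst first so upstream traffic has
        # already arrived before a node forwards it.
        for node in sorted(dist, key=dist.get, reverse=True):
            if node == dst or inflow[node] == 0.0:
                continue
            next_hops = [n for n in adjacency[node]
                         if dist.get(n) == dist[node] - 1]
            share = inflow[node] / len(next_hops)
            for n in next_hops:
                load[(node, n)] += share
                inflow[n] += share
    return dict(load)


# Hypothetical diamond topology: two equal-cost paths from A to D.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
loads = ecmp_link_loads(adjacency, {("A", "D"): 10.0})
print(loads)
```

Because ECMP needs no per-flow routing updates, only the selected critical flows would have to be rerouted on top of these baseline loads.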

Project Overview:

(1) Motivation: Network disturbance caused by TE is neglected by existing works

(2) FlexDATE: Use RL to intelligently identify critical flows in dynamic networks, and then use Linear Programming (LP) to optimize routing for critical flows to achieve load balancing with low network disturbance

(3) RL training pipeline of FlexDATE

(4) Evaluation results: Generalizes well to dynamic traffic scenarios and unseen link failures with near-optimal load balancing performance and mitigated network disturbance

Contributions:

We proposed a new QoS metric named network disturbance to evaluate the negative impact of TE’s flow rerouting operations on WANs, such as service disruption.
We designed a disturbance-aware TE with GNN and RL to intelligently reroute flexible numbers of critical flows under dynamic traffic fluctuations and unexpected single link failures.
Our proposed TE solution achieved close-to-optimal performance (i.e., above 90% of optimal performance) in 99% of network scenarios and mitigated network disturbance by up to 38.6% in five real networks.

Abstract:

Traffic Engineering (TE) is an important network operation that routes/reroutes flows based on network topology and traffic demands to optimize network performance. Recently, emerging applications have posed challenges to TE with dynamic network conditions, where frequent routing updates are required to maintain good network performance with Software-Defined Networking (SDN). However, flow rerouting operations could lead to considerable Quality of Service (QoS) degradation and service disruption, which is often neglected by existing TE solutions.

In this paper, we apply a new QoS metric named network disturbance to measure the negative impact of flow rerouting operations performed by TE. To achieve near-optimal load balancing performance and mitigate network disturbance together in dynamic network scenarios, we propose a flexible and disturbance-aware TE solution called FlexDATE that combines Reinforcement Learning (RL) and Linear Programming (LP). Specifically, FlexDATE leverages RL to intelligently identify flexible numbers of critical flows for each traffic matrix and reroutes these critical flows based on LP optimization to improve network performance with low disturbance. Empowered by a customized actor-critic architecture coupled with Graph Neural Networks (GNNs), FlexDATE can generalize well to unseen traffic scenarios and remain resilient to single link failures.

Extensive simulations are conducted on five real-world network topologies to evaluate FlexDATE with real and synthetic traffic traces. The results show that FlexDATE can achieve the performance target (i.e., 90% of optimal performance) in 99% of network scenarios and effectively mitigate the average and maximum network disturbance by up to 9.1% and 38.6%, respectively, compared to state-of-the-art TE solutions.
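
A small sketch of how a statement such as "90% of optimal performance in 99% of network scenarios" could be checked is given below. It assumes that the performance ratio of a scenario is the optimal maximum link utilization divided by the achieved one; this definition, the function names, and the sample numbers are assumptions made for illustration and are not taken from the paper.

```python
def performance_ratios(optimal_mlu, achieved_mlu):
    """Per-scenario ratio, assumed here to be optimal MLU / achieved MLU,
    so that 1.0 means the achieved routing matches the optimum."""
    return [opt / ach for opt, ach in zip(optimal_mlu, achieved_mlu)]


def fraction_meeting_target(ratios, target=0.9):
    """Fraction of scenarios whose ratio reaches the performance target."""
    return sum(1 for r in ratios if r >= target) / len(ratios)


# Made-up MLU values for three traffic scenarios.
optimal_mlu = [0.50, 0.60, 0.45]
achieved_mlu = [0.50, 0.75, 0.48]

ratios = performance_ratios(optimal_mlu, achieved_mlu)
hit_rate = fraction_meeting_target(ratios)
print([round(r, 4) for r in ratios])
print(f"{hit_rate:.1%} of scenarios meet the 90% target")
```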

Publications:

[ToN 22] Minghao Ye, Junjie Zhang, Zehua Guo, and H. Jonathan Chao, “FlexDATE: Flexible and Disturbance-Aware Traffic Engineering with Reinforcement Learning in Software-Defined Networks,” IEEE/ACM Transactions on Networking (ToN), 2022. (Impact factor: 3.7) [Paper URL] [PDF]
[IWQoS ’21] Minghao Ye, Junjie Zhang, Zehua Guo, and H. Jonathan Chao, “DATE: Disturbance-Aware Traffic Engineering with Reinforcement Learning in Software-Defined Networks,” The 29th IEEE/ACM International Symposium on Quality of Service (IWQoS), 2021. (Acceptance rate: 25%, 64/256) [Paper URL] [Video] [PDF]
[JSAC 20] Junjie Zhang, Minghao Ye, Zehua Guo, Chen-Yu Yen, and H. Jonathan Chao, “CFR-RL: Traffic Engineering with Reinforcement Learning in SDN,” IEEE Journal on Selected Areas in Communications (JSAC), 2020. (Impact factor: 16.4) [Paper URL] [arXiv] [Codes] [PDF]

[JSAC 22] FlexEntry: Mitigating Destination-based Routing Update Overhead with Reinforcement Learning

July 15, 2022 by Minghao Ye

Traffic Engineering (TE) may incur high time complexity when generating and updating a large number of forwarding entries in the network. To mitigate such routing update overhead, we developed a 2-stage Reinforcement Learning (RL)-based TE solution to identify a flexible number of critical destination-based forwarding entries to be updated in different traffic scenarios. As a result, we only need to compute the optimal traffic split ratios for these critical entries and update them accordingly to achieve close-to-optimal performance, while reducing the entry updates by up to 99.3% on average.

Project Overview:

(1) Our idea: Mitigating routing update overhead by only updating some critical entries at some critical nodes to reroute traffic and improve network performance

(2) FlexEntry: Reinforcement Learning (RL) + Linear Programming (LP) combined approach

(3) Stage 1: Train multiple RL sub-models with different numbers of critical entries 𝐾 to be identified in different traffic scenarios

(4) Stage 2: Train a single RL model to learn a sub-model selection policy, such that FlexEntry can select flexible numbers of critical entries to accommodate dynamic traffic scenarios

(5) Evaluation results: FlexEntry generalizes well to different traffic variations with near-optimal load balancing performance while only updating a very low percentage (~1%) of critical entries

Contributions:

We customized a 2-stage RL approach to identify critical destination-based forwarding entries for routing updates in different traffic scenarios.
We adopted Linear Programming (LP) to produce reward signals for RL and optimize traffic split ratios for the selected critical entries to control traffic distribution.
Our proposed TE solution achieved near-optimal performance in unseen traffic scenarios while saving up to 99.3% of entry updates on average in six real networks.

Abstract:

Traffic Engineering (TE) is a widely adopted network operation to optimize network performance and resource utilization. Destination-based routing is supported by legacy routers and more readily deployed than flow-based routing, where the forwarding entries could be frequently updated by TE to accommodate traffic dynamics. However, as the network size grows, destination-based TE could incur high time complexity when generating and updating many forwarding entries, which may limit the responsiveness of TE and degrade network performance.

In this paper, we propose a novel destination-based TE solution called FlexEntry, which leverages emerging Reinforcement Learning (RL) to reduce the time complexity and routing update overhead while achieving good network performance simultaneously. For each traffic matrix, FlexEntry only updates a few forwarding entries called critical entries for redistributing a small portion of the total traffic to improve network performance. These critical entries are intelligently selected by RL with traffic split ratios optimized by Linear Programming (LP).
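
The idea of touching only a few critical entries can be pictured with the sketch below, which models a destination-based forwarding table as a mapping from (node, destination) to next-hop split ratios and applies updates to a chosen subset. The table layout, the sample ratios, and the function name `apply_critical_updates` are hypothetical. In FlexEntry the critical entries are selected by RL and their ratios are optimized by LP, whereas here they are simply given as input.

```python
def apply_critical_updates(table, updates):
    """Return a copy of `table` with only the entries in `updates` replaced,
    plus the fraction of entries whose split ratios actually changed.

    table / updates: {(node, destination): {next_hop: split_ratio}}
    """
    new_table = {entry: dict(ratios) for entry, ratios in table.items()}
    for entry, ratios in updates.items():
        if entry not in new_table:
            raise KeyError(f"unknown forwarding entry: {entry}")
        if abs(sum(ratios.values()) - 1.0) > 1e-9:
            raise ValueError(f"split ratios for {entry} must sum to 1")
        new_table[entry] = dict(ratios)
    changed = sum(1 for entry in table if new_table[entry] != table[entry])
    return new_table, changed / len(table)


# Hypothetical table with four destination-based entries.
table = {
    ("A", "D"): {"B": 0.5, "C": 0.5},
    ("B", "D"): {"D": 1.0},
    ("C", "D"): {"D": 1.0},
    ("A", "B"): {"B": 1.0},
}

# Assume one critical entry was selected and its ratios re-optimized.
updates = {("A", "D"): {"B": 0.8, "C": 0.2}}

new_table, updated_fraction = apply_critical_updates(table, updates)
print(new_table[("A", "D")])
print(f"{updated_fraction:.0%} of entries updated")
```

All other entries keep their existing ratios, so the update cost scales with the number of critical entries rather than with the size of the table.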

We find that the combination of RL and LP is very effective. Our simulation results on six real-world network topologies show that FlexEntry reduces entry updates by up to 99.3% on average and generalizes well to unseen traffic matrices with near-optimal load balancing performance.

Publications:

[JSAC 22] Minghao Ye, Yang Hu, Junjie Zhang, Zehua Guo, and H. Jonathan Chao, “Mitigating Routing Update Overhead for Traffic Engineering by Combining Destination-based Routing with Reinforcement Learning,” IEEE Journal on Selected Areas in Communications (JSAC), 2022. (Impact factor: 16.4) [Paper URL] [Codes] [PDF]
[NetAI ’20] Junjie Zhang, Zehua Guo, Minghao Ye, and H. Jonathan Chao, “SmartEntry: Mitigating Routing Update Overhead with Reinforcement Learning for Traffic Engineering,” ACM SIGCOMM Workshop on Network Meets AI & ML (NetAI), 2020. (9 out of 19 papers were accepted) [Paper URL] [Slides] [Video] [PDF]
