Paper Title
Differentiated Federated Reinforcement Learning Based Traffic Offloading on Space-Air-Ground Integrated Networks
Authors
Abstract
The Space-Air-Ground Integrated Network (SAGIN) serves as a foundational communication infrastructure and offers opportunities for efficient global data transmission. However, because SAGIN is a dynamic, heterogeneous network, conventional network optimization methods struggle to meet the stringent latency and stability requirements of data transmission in this environment. This paper therefore proposes differentiated federated reinforcement learning (DFRL) to solve the traffic offloading problem in SAGIN, using multiple agents to generate differentiated traffic offloading policies. To account for the differing characteristics of each SAGIN region, DFRL models traffic offloading policy optimization as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP). The paper proposes a novel Differentiated Federated Soft Actor-Critic (DFSAC) algorithm to solve it. DFSAC takes network packet delay as the joint reward and introduces a global trend model as the joint target action-value function of each agent, which guides each agent's policy update. Simulation results show that the DFSAC-based traffic offloading policy achieves better network throughput, packet loss rate, and packet delay than the traditional federated reinforcement learning approach and other baseline approaches.
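The abstract names two ingredients of DFSAC: a joint reward derived from packet delay, and a global model that serves as every agent's target action-value function. The sketch below is only one plausible reading of those two ideas, not the paper's algorithm. The abstract does not say how the reward is scaled or how the global trend model is built, so the negative-mean-delay reward, the FedAvg-style parameter averaging, and all function names here are assumptions made for illustration. The target formula is the standard Soft Actor-Critic soft Bellman target.

```python
from statistics import fmean


def joint_reward(packet_delays_ms):
    """Shared reward for all agents (assumed form): negative mean packet
    delay, so that lower delay yields a higher reward."""
    return -fmean(packet_delays_ms)


def aggregate_global_model(agent_params):
    """Build a global parameter vector by element-wise averaging of the
    per-agent critic parameters (FedAvg-style; an assumption, since the
    abstract does not describe the aggregation rule)."""
    n = len(agent_params)
    return [sum(values) / n for values in zip(*agent_params)]


def soft_target(reward, gamma, q_global_next, alpha, log_prob_next):
    """Standard SAC soft Bellman target, with the next-state value taken
    from the global model instead of a per-agent target network."""
    return reward + gamma * (q_global_next - alpha * log_prob_next)


# Toy usage with made-up numbers (not results from the paper).
reward = joint_reward([10.0, 20.0, 30.0])
global_params = aggregate_global_model([[1.0, 2.0], [3.0, 4.0]])
target = soft_target(reward, gamma=0.5, q_global_next=4.0,
                     alpha=0.5, log_prob_next=-2.0)
```

In this reading, each agent keeps its own policy (the "differentiated" part) while regressing its critic toward a target computed from the shared global model, which is how a common trend could guide region-specific policies.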