Ph.D. Dissertation Defense: Deep Reinforcement Learning Based Resource Allocation in Terahertz Wireless Networks
Date: 2025/05/21
Speaker: Zhifeng Hu, Ph.D. candidate at UM-SJTU Joint Institute
Time: 2:00-3:30 p.m., Wednesday, May 21, 2025
Venue: Conference Room 503, Long Bin Building, UM-SJTU Joint Institute
Abstract
Terahertz (THz) band communications are envisioned as a key technology to pave the way to sixth-generation (6G) and beyond networks. Benefiting from tens of gigahertz (GHz) of available bandwidth, THz communications with ultra-high data rates can support both small-scale access scenarios and large-scale backhaul and satellite systems, including THz non-orthogonal multiple access (NOMA) networks, THz mesh backhaul networks, and THz space communication (Tera-SpaceCom) satellite edge computing (SEC) networks. The THz-NOMA network, with power-domain exploration and ultra-broad bandwidth, is promising for supporting massive connectivity, high channel capacities, and user fairness. With enticing data rates and flexible reconfigurability, the THz mesh network can empower next-generation backhaul systems. Furthermore, Tera-SpaceCom can provide SEC services for THz space sensing task computation via massive satellites in a low Earth orbit (LEO) mega-constellation. Owing to their ultra-high data rates, THz links enable Tera-SpaceCom to support efficient SEC services with low latency. To unleash the potential of the THz band, effective and efficient long-term resource allocation solutions are essential for these networks.
Nevertheless, the peculiarities of these THz networks pose several challenges. i) The long-term hybrid continuous and discrete beamforming-bandwidth-power (BBP) allocation for massive user equipments in THz-NOMA networks is a non-deterministic polynomial-time hard (NP-hard) mixed-integer non-linear programming problem. ii) Besides the NP-hardness, possible failures of highly directional THz links, together with dynamic and unpredictable backhaul traffic demands, further complicate long-term resource allocation design in THz mesh backhaul networks. iii) In addition to addressing the NP-hardness and unpredictable computing demands, Tera-SpaceCom SEC networks must efficiently coordinate the long-term resource allocation strategies of a large number of satellites separated by very long distances. These issues challenge the feasibility of traditional non-learning solutions. To address them, the author develops customized deep reinforcement learning (DRL) resource allocation algorithms for THz-NOMA, THz mesh backhaul, and Tera-SpaceCom SEC networks in this dissertation.
In the THz-NOMA network, given the continuous nature of power and sub-array ratio assignment and the discrete nature of sub-band allocation, a hybrid discrete and continuous actions multi-task DRL (DISCO) algorithm is proposed to maximize the long-term throughput. Specifically, the multi-task structure in the DISCO actor integrates two state-of-the-art DRL architectures, namely actor-critic (AC) and deep deterministic policy gradient (DDPG), designed for discrete and continuous actions, respectively. Moreover, to tailor DISCO to the BBP allocation problem, rigorous theoretical derivations for the neural network design and the backpropagation process are discussed. Simulation results demonstrate that, compared to the benchmark algorithms, DISCO improves the network throughput while achieving good fairness among user equipments and a short running time.
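As a rough illustration of the hybrid action space described above, the following sketch shows a toy actor with one shared trunk and two heads: a softmax head that samples a discrete sub-band, and a sigmoid head that outputs two continuous ratios. This is not the DISCO implementation. The class name `HybridActor`, the layer sizes, the random weights, and the absence of any training loop are all assumptions made only for illustration.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    # Dense layer: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class HybridActor:
    """Hypothetical two-head actor (untrained, random weights)."""

    def __init__(self, state_dim, hidden_dim, num_sub_bands, seed=0):
        self.rng = random.Random(seed)

        def init(rows, cols):
            return [[self.rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_trunk, self.b_trunk = init(hidden_dim, state_dim), [0.0] * hidden_dim
        self.w_band, self.b_band = init(num_sub_bands, hidden_dim), [0.0] * num_sub_bands
        self.w_cont, self.b_cont = init(2, hidden_dim), [0.0] * 2

    def act(self, state):
        # Shared representation used by both heads (the multi-task part).
        hidden = [math.tanh(h) for h in linear(self.w_trunk, self.b_trunk, state)]
        # Discrete head: stochastic choice of a sub-band index.
        band_probs = softmax(linear(self.w_band, self.b_band, hidden))
        sub_band = self.rng.choices(range(len(band_probs)), weights=band_probs)[0]
        # Continuous head: deterministic ratios squashed into (0, 1).
        power_ratio, sub_array_ratio = (
            sigmoid(v) for v in linear(self.w_cont, self.b_cont, hidden))
        return sub_band, power_ratio, sub_array_ratio, band_probs


actor = HybridActor(state_dim=4, hidden_dim=8, num_sub_bands=3)
sub_band, power_ratio, sub_array_ratio, band_probs = actor.act([0.1, 0.2, 0.3, 0.4])
```

The discrete head is stochastic and the continuous head is deterministic, which loosely mirrors the AC and DDPG roles named in the abstract. How the two heads are actually trained jointly is not specified here.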
To tackle the challenges brought by NP-hardness, stochastic traffic demands, and possible sudden link failures, a DRL-based cross-layer design in THz mesh backhaul networks (DEFLECT) is proposed, which considers both resource allocation and routing. In particular, a heuristic routing metric is first developed to enhance resource efficiency in terms of power and sub-array usage. Furthermore, a DRL-based resource allocation algorithm is designed to realize long-term resource efficiency maximization and fast recovery from broken links. On the one hand, the multi-task and multi-agent DRL mechanisms cooperatively benefit joint power and sub-array allocation for all base stations. On the other hand, DEFLECT DRL deploys a hierarchical architecture to tailor resource allocation for each base station and to transfer learned knowledge for rapid recovery. Simulation results show that DEFLECT routing consumes fewer resources than the commonly used minimal hop-count metric. Moreover, unlike conventional DRL methods that lead to packet loss and second-level latency, DEFLECT DRL realizes long-term resource efficiency maximization with zero packet loss and millisecond-level latency, and recovers resource-efficient backhaul from broken links within 1 s.
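The contrast between a hop-count metric and a resource-aware metric can be sketched with an ordinary shortest-path search. The example below is only an illustration under assumptions: the topology, the link attributes `power` and `sub_arrays`, and the cost `power + sub_arrays` are hypothetical and are not the DEFLECT routing metric, whose exact form is not given in this abstract.

```python
import heapq


def shortest_path(graph, source, target, cost_fn):
    """Dijkstra's algorithm; graph[u][v] holds link attributes and
    cost_fn maps those attributes to a non-negative link cost."""
    best = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, attrs in graph[node].items():
            new_cost = cost + cost_fn(attrs)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1]


# Hypothetical backhaul topology: one long direct link that needs many
# resources, and a two-hop detour made of short, cheap links.
graph = {
    "A": {"D": {"power": 10, "sub_arrays": 8},
          "B": {"power": 2, "sub_arrays": 2}},
    "B": {"D": {"power": 2, "sub_arrays": 2}},
    "D": {},
}

hop_path = shortest_path(graph, "A", "D", lambda attrs: 1)
resource_path = shortest_path(
    graph, "A", "D", lambda attrs: attrs["power"] + attrs["sub_arrays"])
```

In this toy topology the hop-count metric picks the direct link `A -> D`, while the resource-aware cost prefers the detour `A -> B -> D` because the two short links together use fewer resources.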
In the Tera-SpaceCom SEC network with a large number of satellites, a graph neural network (GNN)-DRL-based joint resource allocation and task offloading (GRANT) algorithm is proposed for long-term resource efficiency maximization. GNNs learn the relationships among satellites in the large-scale LEO mega-constellation from their connectivity information, which benefits the cooperative training of all satellites. Additionally, the multi-agent and multi-task structures in GRANT further enable cooperative training of computing task offloading as well as power and sub-array allocation for each satellite. Simulation results illustrate that, compared with benchmark solutions, GRANT not only achieves the highest resource efficiency with relatively low latency, but also realizes the highest memory efficiency and time efficiency.
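To make the idea of learning from connectivity information more concrete, the sketch below performs one round of neighbor aggregation on a small satellite graph, which is the basic operation underlying most GNN layers. It is a simplification under assumptions: the ring topology, the single "load" feature, and the fixed `self_weight` are hypothetical, and no weights are learned, so it does not represent the GRANT architecture.

```python
def message_passing_round(features, neighbors, self_weight=0.5):
    """Blend each node's features with the mean of its neighbors' features."""
    updated = {}
    for node, feat in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(feat))]
        else:
            mean = list(feat)  # isolated node keeps its own features
        updated[node] = [self_weight * f + (1.0 - self_weight) * m
                         for f, m in zip(feat, mean)]
    return updated


# Hypothetical ring of four satellites; only sat0 starts with a task load.
neighbors = {
    "sat0": ["sat1", "sat3"],
    "sat1": ["sat0", "sat2"],
    "sat2": ["sat1", "sat3"],
    "sat3": ["sat2", "sat0"],
}
features = {"sat0": [1.0], "sat1": [0.0], "sat2": [0.0], "sat3": [0.0]}

after_one_round = message_passing_round(features, neighbors)
```

After one round, the direct neighbors of `sat0` already carry part of its load information, while `sat2`, two hops away, is still unaffected. Stacking more rounds widens each satellite's view of the constellation.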
The DRL-based algorithms proposed in this dissertation tackle important resource allocation challenges in key THz network scenarios, facilitating the development of THz communications and next-generation wireless networks.
Speaker Biography
Zhifeng Hu received the B.E. degree in electrical and computer engineering from Shanghai Jiao Tong University, China, in 2020, where he is currently pursuing the Ph.D. degree with the Terahertz Wireless Communications (TWC) Laboratory, University of Michigan–Shanghai Jiao Tong University (UM-SJTU) Joint Institute. His current research interests include machine/deep learning for terahertz networking.