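1. School of Electronic Engineering, Xidian University, Xi'an 710071, Shaanxi, China
2. School of Computer Science and Technology, Qilu University of Technology, Jinan 250000, Shandong, China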
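JIA Ruipeng (born 1996), male, PhD candidate at Xidian University, E-mail: [email protected]
LIN Zhongchao (born 1988), male, associate professor, E-mail: [email protected]
ZUO Sheng (born 1992), male, associate research fellow, E-mail: [email protected]
ZHANG Yu (born 1978), male, professor, E-mail: [email protected]
YANG Meihong (born 1966), female, research fellow, E-mail: [email protected]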
Print publication date: 2024-04-20
Online publication date: 2023-10-13
Received: 2023-03-21
Ruipeng JIA, Zhongchao LIN, Sheng ZUO, et al. Study of the parallel MoM on a domestic heterogeneous DCU platform[J]. Journal of Xidian University, 2024,51(2):76-83. DOI: 10.19665/j.issn1001-2400.20230504.
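In line with the trend toward domestic supercomputers built on heterogeneous many-core processors, a massively parallel higher-order method of moments (MoM) is implemented on a domestic CPU+DCU heterogeneous parallel system. Building on the load-balancing strategy of the homogeneous parallel MoM, an efficient "MPI+OpenMP+DCU" heterogeneous parallel programming framework is proposed. It resolves the mismatch between computing tasks and computing power and balances the load of the heterogeneous parallel MoM computation. A fine-grained task partitioning strategy and asynchronous communication are used to pipeline the computation on the deep computing unit (DCU), which overlaps computation with communication and improves the efficiency of heterogeneous cooperative MoM computation. The accuracy of the CPU+DCU heterogeneous parallel MoM is verified by comparison with finite element method simulation results. Scalability analysis on the domestic DCU heterogeneous platform shows that, compared with CPU-only computation, the CPU+DCU heterogeneous cooperative method achieves a speedup of 5.5 to 7.0 times. It can also run on the full system of the National Supercomputing Center in Xi'an: when the parallel scale grows from 360 nodes to 3,600 nodes (1,036,800 processor cores in total), the parallel efficiency reaches about 73.5%.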
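The scaling figures quoted above can be checked with a little arithmetic. The sketch below assumes the usual definition of parallel efficiency relative to a reference run on a fixed problem size (ideal time divided by measured time); the abstract does not state which definition the authors use, and the function names are illustrative only.

```python
def parallel_efficiency(t_ref, n_ref, t_big, n_big):
    """Efficiency of the larger run relative to the reference run.

    Assumed definition (not confirmed by the abstract): the ideal time
    t_ref * n_ref / n_big divided by the measured time t_big.
    """
    return (t_ref * n_ref) / (t_big * n_big)


def implied_speedup(efficiency, n_ref, n_big):
    """Speedup over the reference run implied by a given efficiency."""
    return efficiency * n_big / n_ref


# Figures taken from the abstract.
total_cores = 1_036_800
nodes_ref, nodes_big = 360, 3_600
efficiency = 0.735

cores_per_node = total_cores // nodes_big          # 288
speedup = implied_speedup(efficiency, nodes_ref, nodes_big)

print(f"cores per node: {cores_per_node}")
print(f"speedup from {nodes_ref} to {nodes_big} nodes: about {speedup:.2f}x")
```

Under this assumption, a tenfold increase in node count at about 73.5% efficiency corresponds to running roughly 7.35 times faster than on 360 nodes. This is a separate quantity from the 5.5 to 7.0 times speedup over CPU-only computation.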
In view of the current development trend of domestic supercomputers toward the CPU+DCU heterogeneous architecture, research on a massively parallel CPU+DCU heterogeneous higher-order method of moments is carried out. First, the basic implementation strategy for using the DCU to accelerate the method of moments is given. Based on the load-balancing strategy of the homogeneous parallel method of moments, an efficient heterogeneous parallel programming framework of "MPI+OpenMP+DCU" is proposed to address the mismatch between computing tasks and computing power. In addition, a fine-grained task division strategy and asynchronous communication are adopted to pipeline the DCU computation process, thus overlapping computation with communication and improving the acceleration performance of the program. The accuracy of the CPU+DCU heterogeneous parallel method of moments is verified by comparing its simulation results with those of the finite element method. Scalability analysis on the domestic DCU heterogeneous platform shows that the implemented CPU+DCU heterogeneous co-computing program obtains a speedup of 5.5 to 7.0 times at different parallel scales, and that the parallel efficiency reaches 73.5% when scaled from 360 nodes to 3,600 nodes (1,036,800 cores in total).
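The idea of matching computing tasks to computing power can be illustrated with a minimal sketch. It is not the authors' scheme: the function names `split_blocks` and `makespan`, the device rates, and the block count are all hypothetical, and the split simply assigns work in proportion to each device's throughput using largest-remainder rounding.

```python
def split_blocks(n_blocks, rates):
    """Split n_blocks among devices in proportion to their throughput.

    rates maps a device name to a relative processing rate (blocks per
    unit time). Leftover blocks go to the largest fractional remainders.
    """
    total = sum(rates.values())
    quotas = {dev: n_blocks * r / total for dev, r in rates.items()}
    shares = {dev: int(q) for dev, q in quotas.items()}
    leftover = n_blocks - sum(shares.values())
    by_remainder = sorted(quotas, key=lambda d: quotas[d] - shares[d], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


def makespan(shares, rates):
    """Time until the slowest device finishes its share."""
    return max(shares[dev] / rates[dev] for dev in shares)


# Hypothetical rates: the DCU is taken to be 6x faster than the CPU cores.
rates = {"cpu": 1.0, "dcu": 6.0}
balanced = split_blocks(70, rates)
even = {"cpu": 35, "dcu": 35}

print(balanced, makespan(balanced, rates))
print(even, makespan(even, rates))
```

With these made-up rates, an even split leaves the DCU idle while the CPU finishes, whereas the proportional split lets both devices finish at the same time. A real implementation would measure the rates and would also have to account for host-device transfer and MPI communication costs.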
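higher-order method of moments; domestic heterogeneous parallel system; deep computing unit (DCU); heterogeneous cooperative parallel computing
method of moments; domestic heterogeneous platforms; deep computing unit (DCU); parallel algorithm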