From grid to "East-west Computing Transfer": Constructing national computing infrastructure

QIAN Depei, LUAN Zhongzhi, LIU Yi

Citation: QIAN Depei, LUAN Zhongzhi, LIU Yi, et al. From grid to "East-west Computing Transfer": Constructing national computing infrastructure[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(9): 1561-1574. doi: 10.13700/j.bh.1001-5965.2022.0715 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2022.0715
Details
    Corresponding author: QIAN Depei, E-mail: depeiq@buaa.edu.cn

  • CLC number: TP311; TP316; TP391

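  • Abstract:

    This paper briefly reviews how the ways of using computers have changed over the past several decades and introduces the design and implementation of CNGrid, the national high performance computing infrastructure built on network computing technology. It then discusses the new trends in the development of China's computing power under the "East-west Computing Transfer" strategic project and the new technical challenges facing the national computing infrastructure, and offers an outlook on China's future supercomputing application ecosystem and the construction of its computing infrastructure.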

  • Figure 1.  Decentralized hierarchical virtualization architecture for national high performance computing infrastructure

    Figure 2.  CNGrid Suite system architecture

    Figure 3.  Architectural framework of integrated infrastructure monitoring and management

    Figure 4.  Industrial community system framework

    Figure 5.  System architecture of integrated development environment for high performance computing applications

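    Table 1.  Selected grid-related research programs worldwide

    | Country/Region | Grid computing research programs/projects |
    |---|---|
    | United States | TeraGrid, XSEDE |
    | European Union | EGEE, EGI |
    | United Kingdom | UK e-Science |
    | Japan | NAREGI, HPCI |
    | South Korea | K*Grid |
    | China | CNGrid, ChinaGrid |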

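    Table 2.  Grid and high performance computing projects under the Ministry of Science and Technology of China

    | Funding source | Project name | Period | Outcomes |
    |---|---|---|---|
    | National 863 Program key subject | National High Performance Computing Environment | 1999-2000 | 400 GFlops Dawning 3000; a prototype national high performance computing environment comprising 5 high performance computing centers |
    | National 863 Program major special project | High Performance Computer and Core Software | 2002-2005 | 11.2 TFlops Dawning 4000 and 5.36 TFlops Lenovo DeepComp 6800; China National Grid (CNGrid), a testbed for the national high performance computing environment, with 8 nodes and 18 TFlops of computing capacity; a set of grid applications |
    | National 863 Program major project | High Productivity Computer and Grid Service Environment | 2006-2010 | 4.7 PFlops Tianhe-1A, 3 PFlops Dawning 6000, and 1.071 PFlops Sunway BlueLight; CNGrid as a service-oriented national grid service environment, with 11 nodes and 8 PFlops of computing capacity; a set of grid and high performance computing applications |
    | National 863 Program major project | High Productivity Computer and Application Service Environment | 2011-2015 | 125 PFlops Sunway TaihuLight and 100 PFlops Tianhe-2A; CNGrid as a national high performance computing environment that supports applications through services, with 14 nodes and 200 PFlops of computing capacity; a set of high performance computing applications |
    | National Key R&D Program special project | High Performance Computing | 2016-2021 | Exascale computers; CNGrid as a national high performance computing environment that has begun to take the form of an infrastructure, with 19 nodes and 520 PFlops of computing capacity; a set of high performance computing applications |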

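    Table 3.  Training computations of typical deep neural networks[25]

    | Neural network model | Application domain | Training compute/Flops |
    |---|---|---|
    | AlexNet | Image classification | 4.7×10^17 |
    | VGG16 | Image classification | 8.5×10^18 |
    | YOLOv3 | Image object detection | 5.1×10^19 |
    | Transformer | Natural language processing | 7.4×10^18 |
    | GPT-3 | Natural language processing | 3.1×10^23 |

    Note: data from https://docs.google.com/spreadsheets/d/1AAIebjNsnJj_uKALHbXNfn3_YsT6sHXtCU0q7OIPuc4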

    Table 4.  Top 10 systems in the TOP500 list of high performance computers (June 2022)[26]

    | Rank | System | Processor/Accelerator | Linpack performance/PFlops |
    |---|---|---|---|
    | 1 | Frontier | AMD 64C+AMD MI250X | 1 102 |
    | 2 | Fugaku | A64FX 48C | 442.01 |
    | 3 | LUMI | AMD 64C+AMD MI250X | 151.90 |
    | 4 | Summit | IBM Power+Nvidia V100 | 148.60 |
    | 5 | Sierra | IBM Power+Nvidia V100 | 94.64 |
    | 6 | Sunway TaihuLight | Sunway SW26010 | 93.01 |
    | 7 | Perlmutter | AMD 64C+Nvidia A100 | 70.87 |
    | 8 | Selene | AMD 64C+Nvidia A100 | 63.46 |
    | 9 | Tianhe-2A | Intel Xeon + Matrix2000 | 61.44 |
    | 10 | Adastra | AMD 64C+AMD MI250X | 46.10 |

    Note: data from http://www.top500.org
  • [1] DENNIS J. Segmentation and the design of multiprogrammed computer systems[J]. Journal of the ACM, 1965, 12(4): 589-602. doi: 10.1145/321296.321310
    [2] SACKMAN H. Time-sharing versus batch processing: The experimental evidence[C]//Proceedings of the American Federation of Information Processing Societies. New York: ACM, 1968: 1-10.
    [3] SCHWARTZ J, COFFMAN E, WEISSMAN C. A general-purpose time-sharing system[C]//Proceedings of the American Federation of Information Processing Societies. New York: ACM, 1964: 397-411.
    [4] MILLS D L, BRAUN H. The NSFNET backbone network[C]//Proceedings of the ACM Workshop on Frontiers in Computer Communications Technology. New York: ACM, 1987: 191-196.
    [5] FOSTER I T, KESSELMAN C. The grid: Blueprint for a new computing infrastructure[M]. San Francisco: Morgan Kaufmann Publishers, 1998.
    [6] STEVENS R, WOODWARD P, DEFANTI T, et al. From the I-WAY to the national technology grid[J]. Communications of the ACM, 1997, 40(11): 50-60. doi: 10.1145/265684.265692
    [7] THOMAS M, BOISSEAU J, DAHAN M, et al. Development of NPACI grid application portals and portal Web services[J]. Cluster Computing, 2003, 6(3): 177-188. doi: 10.1023/A:1023566402391
    [8] FOSTER I, CZAJKOWSKI K, FERGUSON D, et al. Modeling and managing state in distributed systems: The role of OGSI and WSRF[J]. Proceedings of the IEEE, 2005, 93(3): 604-612. doi: 10.1109/JPROC.2004.842766
    [9] TALIA D. The open grid services architecture: Where the grid meets the Web[J]. IEEE Internet Computing, 2002, 6(6): 67-71. doi: 10.1109/MIC.2002.1067739
    [10] FOSTER I, KESSELMAN C. Globus: A metacomputing infrastructure toolkit[J]. International Journal of Supercomputer Applications, 1998, 11(2): 115-129.
    [11] REED D A. Grids, the TeraGrid, and beyond[J]. IEEE Computer, 2003, 36(1): 62-68. doi: 10.1109/MC.2003.1160057
    [12] KUNSZT P. European DataGrid project: Status and plans[J]. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 2003, 502(2-3): 376-381. doi: 10.1016/S0168-9002(03)00447-9
    [13] GAGLIARDI F, JONES B, GREY F, et al. Building an infrastructure for scientific grid computing: Status and goals of the EGEE project[J]. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2005, 363(1833): 1729-1742. doi: 10.1098/rsta.2005.1603
    [14] HEY T, TREFETHEN A E. The UK e-Science core programme and the grid[J]. Future Generation Computer Systems, 2002, 18(8): 1017-1031. doi: 10.1016/S0167-739X(02)00082-1
    [15] MATSUOKA S, SHINJO S, AOYAGI M, et al. Japanese computational grid research project: NAREGI[J]. Proceedings of the IEEE, 2005, 93(3): 522-533. doi: 10.1109/JPROC.2004.842748
    [16] ARMBRUST M, FOX A, GRIFFITH R, et al. Above the clouds: A Berkeley view of cloud computing: UCB/EECS-2009-28[R]. Berkeley: EECS Department University of California, Berkeley Technical Report, 2009.
    [17] SARASWAT M, TRIPATHI R C. Cloud computing: Analysis of top 5 CSPs in SaaS, PaaS and IaaS platforms[C]//2020 9th International Conference on System Modeling and Advancement in Research Trends, 2020: 20421390.
    [18] SOTOMAYOR B, MONTERO R, LLORENTE I, et al. Virtual infrastructure management in private and hybrid clouds[J]. IEEE Internet Computing, 2009, 13(5): 14-22. doi: 10.1109/MIC.2009.119
    [19] BARIK R, LENKA R, RAO K, et al. Performance analysis of virtual machines and containers in cloud computing[C]//2016 International Conference on Computing, Communication and Automation. Piscataway: IEEE Press, 2016: 16585534.
    [20] SIMONS J. HPC cloud bad; HPC in the cloud good[C]//2013 IEEE 27th International Symposium on Parallel and Distributed Processing. Piscataway: IEEE Press, 2013: 13683523.
    [21] MOR N. Edge computing: Scaling resources within multiple administrative domains[J]. Queue, 2018, 16(6): 106-116. doi: 10.1145/3305263.3313377
    [22] QIAO J, ZHA L. Design and implementation of grid job management for China national grid[J]. Computer Applications, 2008, 28(8): 2003-2009 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-JSJY200808030.htm
    [23] WANG X N, XIAO H L, CAO R Q. Design and implementation of an optimal job scheduling model for the high performance computing environment[J]. Computer Engineering & Science, 2017, 39(4): 619-626 (in Chinese). doi: 10.3969/j.issn.1007-130X.2017.04.002
    [24] YU L, ZOU Y Q, ZHA L. CNGrid GOS security: Design and implementation[J]. Journal of Huazhong University of Science & Technology (Natural Science Edition), 2010, 38(S1): 6-10 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-HZLG2010S1003.htm
    [25] SEVILLA J, VILLALOBOS P, CERON J, et al. Parameter, compute and data trends in machine learning[EB/OL]. [2022-05-30]. https://docs.google.com/spreadsheets/d/1AAIebjNsnJj_uKALHbXNfn3_YsT6sHXtCU0q7OIPuc4.
    [26] TOP500 list[EB/OL]. [2022-06-20]. https://top500.org/lists/top500/2022/06/.
    [27] BONAWITZ K, EICHNER H, GRIESKAMP W, et al. Towards federated learning at scale: System design[C]//Proceedings of the Conference on Machine Learning and Systems. Piscataway: IEEE Press, 2019: 1-15.
Publication history
  • Received:  2022-08-02
  • Accepted:  2022-08-18
  • Published online:  2022-08-23
