Virtualized GPU computing platform in clustered system environment

YANG Jingwei; MA Kai; LONG Xiang

doi:10.13700/j.bh.1001-5965.2015.0731

School of Computer Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China

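About the authors:
YANG Jingwei, male, Ph.D. candidate. Research interests: computer architecture, embedded systems, multicore real-time operating systems, and real-time scheduling. E-mail: yaungjw@buaa.edu.cn
MA Kai, male, master's student. Research interests: computer architecture, parallel and distributed computing. E-mail: makai@buaa.edu.cn
LONG Xiang, male, Ph.D., professor, doctoral supervisor. Research interests: computer architecture, parallel and distributed systems, real-time systems. Tel.: 010-82339685, E-mail: long@buaa.edu.cn

Corresponding author:
LONG Xiang, Tel.: 010-82339685, E-mail: long@buaa.edu.cn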

CLC number: TP315
Publication history
- Received: 2015-11-09
- Revised: 2016-01-15
- Published online: 2016-11-20

Virtualized GPU computing platform in clustered system environment

School of Computer Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China

Abstract

A virtualized GPU computing platform is proposed for clustered systems with multiple nodes and multiple GPUs. The platform uniformly abstracts and manages the GPUs on all nodes of the cluster and exposes them as virtualized GPUs in a common resource pool. Legacy GPU programs can run on the platform without any modification and can use any free GPU in the pool, so programmers do not need to write explicit MPI code for multi-node, multi-GPU applications. Programs are no longer limited to the GPUs of their local node and can access any available GPU in the cluster in a uniform way, which improves overall resource utilization and throughput. Pipelined communication hides the run-time overhead of the platform and the inter-node transmission latency behind intra-node memory copying and GPU computing. Experiments show that, compared with non-pipelined communication, the total data transmission latency is reduced by approximately 50%-70%, giving communication performance comparable to intra-node local data transmission.
- GPU
- MPI
- CUDA
- clustered systems
- hardware acceleration
- parallel computing
- high performance computing
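
As a rough illustration of why pipelined communication can hide transfer latency, the sketch below compares a store-and-forward transfer with a chunked, overlapped one using the textbook pipeline estimate. It is not the platform's implementation: the function names, the three-stage split (network transfer, host staging copy, host-to-device copy), and all bandwidth figures are hypothetical, and per-chunk overheads are ignored.

```python
from math import ceil


def serial_time(total_bytes, stage_bandwidths):
    """Non-pipelined: each stage handles the whole buffer before the next starts."""
    return sum(total_bytes / bw for bw in stage_bandwidths)


def pipelined_time(total_bytes, chunk_bytes, stage_bandwidths):
    """Pipelined: the buffer is split into chunks and stages overlap.

    Textbook estimate: the first chunk traverses every stage, after which
    one chunk completes per period of the slowest stage.
    Assumes total_bytes is a multiple of chunk_bytes.
    """
    n_chunks = ceil(total_bytes / chunk_bytes)
    per_chunk = [chunk_bytes / bw for bw in stage_bandwidths]
    return sum(per_chunk) + (n_chunks - 1) * max(per_chunk)


if __name__ == "__main__":
    MIB = 2 ** 20
    # Hypothetical bandwidths in bytes/s: network, host staging copy, host-to-device copy.
    stages = [3e9, 6e9, 6e9]
    s = serial_time(64 * MIB, stages)
    p = pipelined_time(64 * MIB, 1 * MIB, stages)
    print(f"serial:    {s * 1e3:.2f} ms")
    print(f"pipelined: {p * 1e3:.2f} ms")
    print(f"reduction: {100 * (1 - p / s):.1f} %")
```

Under this simplified model, the saving approaches 1 - max(t_i)/sum(t_i) as the number of chunks grows, where t_i is the per-chunk time of stage i, so the achievable reduction depends on how evenly the stage times are balanced.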
