Virtualized GPU computing platform in clustered system environment
Abstract: A virtualized GPU computing platform is proposed for multi-node, multi-GPU clustered systems. The platform uniformly abstracts and manages the GPUs on all nodes of the cluster as virtualized GPUs in a shared resource pool. Legacy GPU programs can be migrated to the platform without any modification and can access any GPU in the pool, so programmers do not need to write explicit MPI code for multi-node, multi-GPU applications. Programs are no longer limited to the GPUs on their local node and can access any available GPU in the cluster without distinction. This improves overall system resource utilization and throughput. Pipelined communication hides the platform's run-time overhead and the inter-node data transmission latency behind intra-node memory copying and GPU computation. Experiments show that, compared with non-pipelined communication, pipelining reduces the total data transmission latency by 50%-70%, giving communication performance comparable to that of local data transmission within a node.
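The latency-hiding effect of pipelined communication can be illustrated with a simple model. The sketch below is not the platform's implementation. It assumes a hypothetical three-stage path (inter-node network transfer, intra-node host-to-device copy, GPU kernel) with made-up bandwidths and per-chunk overheads. It then compares sending a buffer in one piece with splitting it into equal chunks, so that each stage works on one chunk while the next stage works on the previous one.

```python
"""Toy latency model: non-pipelined vs. pipelined chunked transfer.

All stage names, bandwidths, and overheads are hypothetical placeholders,
not measurements from the virtualized GPU platform.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    bandwidth: float                 # bytes per second (hypothetical)
    per_chunk_overhead: float = 0.0  # fixed seconds per chunk (hypothetical)

    def time(self, nbytes: float) -> float:
        """Time for this stage to process one chunk of nbytes."""
        return self.per_chunk_overhead + nbytes / self.bandwidth


def serial_latency(stages: list[Stage], total_bytes: float) -> float:
    """Non-pipelined: each stage handles the whole buffer before the next starts."""
    return sum(s.time(total_bytes) for s in stages)


def pipelined_latency(stages: list[Stage], total_bytes: float, n_chunks: int) -> float:
    """Pipelined: split the buffer into n_chunks equal chunks so that stage k
    can work on chunk i while stage k+1 works on chunk i-1."""
    chunk = total_bytes / n_chunks
    times = [s.time(chunk) for s in stages]
    done = [0.0] * len(stages)  # finish time of the previous chunk on each stage
    for _ in range(n_chunks):
        ready = 0.0  # time the current chunk leaves the previous stage
        for k, t in enumerate(times):
            # A chunk starts on stage k once it has left stage k-1
            # and stage k has finished the previous chunk.
            done[k] = max(ready, done[k]) + t
            ready = done[k]
    return done[-1]


def best_chunk_count(stages: list[Stage], total_bytes: float, max_chunks: int = 256) -> int:
    """Chunk count in [1, max_chunks] that minimizes the pipelined latency."""
    return min(range(1, max_chunks + 1),
               key=lambda n: pipelined_latency(stages, total_bytes, n))


# Hypothetical path: remote node -> network -> local host memory -> GPU.
EXAMPLE_STAGES = [
    Stage("inter-node network", bandwidth=1.25e9, per_chunk_overhead=50e-6),
    Stage("host-to-device copy", bandwidth=6e9, per_chunk_overhead=10e-6),
    Stage("GPU kernel", bandwidth=3e9),
]
TOTAL_BYTES = 64 * 2**20  # 64 MiB buffer

if __name__ == "__main__":
    serial = serial_latency(EXAMPLE_STAGES, TOTAL_BYTES)
    print(f"non-pipelined: {serial * 1e3:.2f} ms")
    for n in (4, 16, 64):
        t = pipelined_latency(EXAMPLE_STAGES, TOTAL_BYTES, n)
        print(f"pipelined, {n:3d} chunks: {t * 1e3:.2f} ms")
    print(f"best chunk count in [1, 256]: {best_chunk_count(EXAMPLE_STAGES, TOTAL_BYTES)}")
```

The per-chunk overhead stands in for a fixed run-time cost per forwarded operation. Because of it, the best chunk count is finite rather than as large as possible. Once the pipeline is full, total latency is governed by the slowest stage (here the network), and the other stages are effectively hidden behind it.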
Keywords:
- GPU
- MPI
- CUDA
- clustered systems
- hardware acceleration
- parallel computing
- high performance computing