Design and Implementation of High Performance BLAS for Pentium Pro
-
Abstract: BLAS (Basic Linear Algebra Subprograms) is a basic linear algebra library for scientific and engineering applications and plays an important role in high-performance computing. By identifying and optimizing frequently used, numerically intensive operations, BLAS helps reduce the cost of computation, enhance portability, and improve productivity. This paper exploits the architectural features of the Pentium Pro and proposes a series of optimization methods for implementing BLAS so that it performs optimally on Pentium Pro systems. Test results show that BLAS3 (Level 3 BLAS) reaches 112 Mflops on a 200 MHz Pentium Pro.
-
Key words:
- linear algebra
- optimization
- registers
- BLAS (Basic Linear Algebra Subprograms)
- cache
- blocking
- loop unrolling
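The abstract does not give the authors' kernels, so the following is only a minimal sketch of how the techniques named in the key words (cache blocking, loop unrolling, and keeping accumulators in registers) are commonly combined in a Level 3 matrix multiply. The function name `blocked_matmul`, the tile size `BS`, and the row-major layout are assumptions made for illustration, not details taken from the paper.

```c
#include <stddef.h>

/* Hypothetical tile size (an assumption, not from the paper); in practice
 * it is tuned so that the working tiles fit in the L1 data cache. */
#ifndef BS
#define BS 32
#endif

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for row-major n x n matrices.
 * The three outer loops walk over BS x BS tiles (cache blocking); the
 * j loop is unrolled by two so that two partial sums stay in registers. */
void blocked_matmul(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS) {
        size_t i_end = min_sz(ii + BS, n);
        for (size_t kk = 0; kk < n; kk += BS) {
            size_t k_end = min_sz(kk + BS, n);
            for (size_t jj = 0; jj < n; jj += BS) {
                size_t j_end = min_sz(jj + BS, n);
                for (size_t i = ii; i < i_end; ++i) {
                    size_t j = jj;
                    /* Unrolled by 2 in j: c0 and c1 are register accumulators. */
                    for (; j + 1 < j_end; j += 2) {
                        double c0 = C[i * n + j];
                        double c1 = C[i * n + j + 1];
                        for (size_t k = kk; k < k_end; ++k) {
                            double a = A[i * n + k];
                            c0 += a * B[k * n + j];
                            c1 += a * B[k * n + j + 1];
                        }
                        C[i * n + j] = c0;
                        C[i * n + j + 1] = c1;
                    }
                    /* Remainder column when the tile width is odd. */
                    for (; j < j_end; ++j) {
                        double c0 = C[i * n + j];
                        for (size_t k = kk; k < k_end; ++k)
                            c0 += A[i * n + k] * B[k * n + j];
                        C[i * n + j] = c0;
                    }
                }
            }
        }
    }
}
```

Because the routine accumulates into C, the caller zeroes C first to obtain a plain product. Production kernels usually unroll further and copy tiles into contiguous buffers; those refinements are omitted here to keep the sketch short.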