Optimized BLAS and Its Effect on Performance of Parallel Programs

LONG Xiang; LI Zhong-ze; CHEN Jin

Volume 27 Issue 1

Jan. 2001

LONG Xiang, LI Zhong-ze, CHEN Jin, et al. Optimized BLAS and Its Effect on Performance of Parallel Programs[J]. Journal of Beijing University of Aeronautics and Astronautics, 2001, 27(1): 79-82. (in Chinese)

Beijing University of Aeronautics and Astronautics, Dept. of Computer Science and Engineering

Received Date: 04 May 1999
Publish Date: 31 Jan 2001

Abstract

Using SMP boards as the compute nodes of high-performance systems is a growing trend. The benefits of multithreading are discussed first, and the BLAS3 routines are rewritten to achieve higher performance on a dual Pentium II system. To investigate the relationship between the performance of a single compute node and that of the entire parallel system, SUMMA (Scalable Universal Matrix Multiplication Algorithm) is then taken as a case study. The results demonstrate that the higher the performance of an SMP compute node, the more sensitive the performance of the whole parallel system is to the capability of the SAN (System Area Network).

Keywords

- parallel processing
- linear algebra
- optimization
- multithreading
- BLAS
- SUMMA
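
To make the relationship between node-level BLAS3 performance and network cost more concrete, the following is a minimal serial sketch of the SUMMA idea, not the authors' implementation. It assumes a square `grid` x `grid` process mesh simulated inside one Python process; the names `summa`, `local_gemm`, `grid`, and `panel` are hypothetical and chosen only for illustration. At each step a column panel of A and a row panel of B are "broadcast" (here, simply sliced), and every simulated process applies a local matrix update to its own tile of C. That local update is the part a tuned BLAS3 routine would accelerate, while the panel broadcasts are the part that depends on the interconnect.

```python
def local_gemm(c, a, b):
    """Local update c += a * b on nested lists.

    Stands in for the BLAS3 GEMM call a real compute node would make.
    """
    for i in range(len(a)):
        for k in range(len(b)):
            aik = a[i][k]
            for j in range(len(b[0])):
                c[i][j] += aik * b[k][j]


def summa(A, B, grid, panel):
    """Serially simulate SUMMA on a grid x grid process mesh (assumed layout).

    A and B are n x n nested lists, with n divisible by grid.
    """
    n = len(A)
    blk = n // grid  # each simulated process owns a blk x blk tile of C
    C = [[0.0] * n for _ in range(n)]
    for k0 in range(0, n, panel):
        k1 = min(k0 + panel, n)
        for pr in range(grid):
            rows = range(pr * blk, (pr + 1) * blk)
            # Stand-in for the row broadcast: A's column panel for process row pr.
            a_panel = [A[i][k0:k1] for i in rows]
            for pc in range(grid):
                c0, c1 = pc * blk, (pc + 1) * blk
                # Stand-in for the column broadcast: B's row panel for process column pc.
                b_panel = [B[k][c0:c1] for k in range(k0, k1)]
                tile = [C[i][c0:c1] for i in rows]
                local_gemm(tile, a_panel, b_panel)
                for i, row in zip(rows, tile):
                    C[i][c0:c1] = row
    return C
```

In this sketch, a faster `local_gemm` shortens only the compute phase of each step, so the time spent moving panels makes up a larger share of the total, which is consistent with the sensitivity to the SAN described in the abstract.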
