Volume 27 Issue 1
Jan.  2001
LONG Xiang, LI Zhong-ze, CHEN Jin, et al. Optimized BLAS and Its Effect on Performance of Parallel Programs[J]. Journal of Beijing University of Aeronautics and Astronautics, 2001, 27(1): 79-82. (in Chinese)

Optimized BLAS and Its Effect on Performance of Parallel Programs

  • Received Date: 04 May 1999
  • Publish Date: 31 Jan 2001
  • Using SMP boards as the compute nodes of high-performance systems is becoming a trend. The benefits of multithreading are discussed first, and the BLAS3 routines are rewritten to achieve higher performance on a dual Pentium II system. To investigate the relation between the performance of a single compute node and that of the entire parallel system, SUMMA (Scalable Universal Matrix Multiplication Algorithm) is then taken as a case study. The results demonstrate that the higher the performance of an SMP compute node, the more sensitive the performance of the whole parallel system is to the capability of the SAN (System Area Network).
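
The abstract names SUMMA but gives no details of it, so the following is only a minimal sketch of the general idea, not the authors' implementation. It simulates serially a g x g process grid in which, at step k, block column k of A is "broadcast" along process rows and block row k of B along process columns, and every process then performs a local multiply-accumulate. The function names (`matmul_add`, `split_blocks`, `summa_simulated`) are hypothetical, and the pure-Python local update merely stands in for the optimized BLAS3 call a real compute node would use.

```python
def matmul_add(c, a, b):
    """Local update c += a @ b on nested lists (stand-in for a BLAS3 GEMM call)."""
    for i in range(len(a)):
        for k in range(len(b)):
            aik = a[i][k]
            for j in range(len(b[0])):
                c[i][j] += aik * b[k][j]


def split_blocks(m, g):
    """Split an n x n matrix into a g x g grid of square blocks (n divisible by g)."""
    nb = len(m) // g
    return [[[row[j * nb:(j + 1) * nb] for row in m[i * nb:(i + 1) * nb]]
             for j in range(g)]
            for i in range(g)]


def summa_simulated(a, b, g):
    """Serially simulate a SUMMA-style product on an assumed g x g process grid."""
    nb = len(a) // g
    A = split_blocks(a, g)
    B = split_blocks(b, g)
    # C[i][j] is the result block owned by simulated process (i, j).
    C = [[[[0] * nb for _ in range(nb)] for _ in range(g)] for _ in range(g)]
    for k in range(g):
        for i in range(g):
            a_panel = A[i][k]      # would be broadcast along process row i
            for j in range(g):
                b_panel = B[k][j]  # would be broadcast along process column j
                matmul_add(C[i][j], a_panel, b_panel)
    # Reassemble the distributed blocks into one n x n matrix.
    return [sum((C[i][j][r] for j in range(g)), [])
            for i in range(g) for r in range(nb)]
```

In a real distributed run the two "broadcast" comments would be network operations over the SAN, while `matmul_add` would be the node-local BLAS3 kernel. Speeding up only the local kernel leaves the communication time unchanged, so communication takes a larger share of each step, which is consistent with the sensitivity to the SAN that the abstract reports.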
