Citation: | LONG Xiang, LI Zhong-ze, CHEN Jin, et al. Optimized BLAS and Its Effect on Performance of Parallel Programs[J]. Journal of Beijing University of Aeronautics and Astronautics, 2001, 27(1): 79-82. (in Chinese) |