Citation: LONG Xiang, LI Zhong-ze, CHEN Jin, et al. Optimized BLAS and Its Effect on Performance of Parallel Programs[J]. Journal of Beijing University of Aeronautics and Astronautics, 2001, 27(1): 79-82. (in Chinese)
[1] Dongarra J J, Gustavson F G, Karp A. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine[J]. SIAM Review, 1984, 26: 91-112.
[2] Kamath C, Ho R, Manley D P. DXML: a high-performance scientific subroutine library. http://www.digital.com/info/DTJF04/DTJF04SC.TXT.
[3] LI Zhong-ze, CHEN Jin, LONG Xiang, et al. Design and implementation of a high-performance BLAS based on the Pentium Pro[J]. Journal of Beijing University of Aeronautics and Astronautics, 1998, 24(4): 454-457. (in Chinese)
[4] Golub G H, Van Loan C F. Matrix computations[M]. 2nd ed. Baltimore: Johns Hopkins University Press, 1989.
[5] Cannon L E. A cellular computer to implement the Kalman filter algorithm[D]. Bozeman: Montana State University, 1969.
[6] Fox G C, Johnson M A, Lyzenga G A, et al. Solving problems on concurrent processors[M]. Englewood Cliffs: Prentice Hall, 1988.
[7] Agarwal R C, Gustavson F, Zubair M. A high-performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication[J]. IBM Journal of Research and Development, 1994, 38(6): 673-681.
[8] van de Geijn R, Watts J. SUMMA: scalable universal matrix multiplication algorithm. Technical Report TR-95-13, The University of Texas, 1995.