[1] LI W, LI Z J, GAO Y W. Understanding digital music: A review of music information retrieval technology[J]. Journal of Fudan University (Natural Science), 2018, 57(3): 5-47 (in Chinese).
[2] SIMPSON A J R, ROMA G, PLUMBLEY M D. Deep karaoke: Extracting vocals from musical mixtures using a convolutional deep neural network[C]//International Conference on Latent Variable Analysis and Signal Separation. Berlin: Springer, 2015: 429-436.
[3] HUANG P S, KIM M, HASEGAWA-JOHNSON M, et al. Joint optimization of masks and deep recurrent neural networks for monaural source separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(12): 2136-2147.
[4] UHLICH S, PORCU M, GIRON F, et al. Improving music source separation based on deep neural networks through data augmentation and network blending[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2017: 261-265.
[5] JANSSON A, HUMPHREY E, MONTECCHIO N, et al. Singing voice separation with deep U-Net convolutional networks[C]//18th International Society for Music Information Retrieval Conference (ISMIR), 2017: 745-751.
[6] PARK S, KIM T, LEE K, et al. Music source separation using stacked hourglass networks[C]//19th International Society for Music Information Retrieval Conference (ISMIR), 2018: 289-296.
[7] STOLLER D, EWERT S, DIXON S. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation[C]//19th International Society for Music Information Retrieval Conference (ISMIR), 2018: 334-340.
[8] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5693-5703.
[9] SUN K, ZHAO Y, JIANG B R, et al. High-resolution representations for labeling pixels and regions[EB/OL]. (2019-04-09)[2019-09-01]. https://arxiv.org/abs/1904.04514.
[10] VIRTANEN T. Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(3): 1066-1074.
[11] HUANG P S, CHEN S D, SMARAGDIS P, et al. Singing-voice separation from monaural recordings using robust principal component analysis[C]//2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2012: 57-60.
[12] HSU C L, WANG D L, JANG J S R, et al. A tandem algorithm for singing pitch extraction and voice separation from music accompaniment[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(5): 1482-1491.
[13] IKEMIYA Y, ITOYAMA K, YOSHII K. Singing voice separation and vocal F0 estimation based on mutual combination of robust principal component analysis and subharmonic summation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(11): 2084-2095.
[14] RAFII Z, PARDO B. Music/voice separation using the similarity matrix[C]//13th International Society for Music Information Retrieval Conference (ISMIR), 2012: 583-588.
[15] ZHU B L, LI W, LI R J, et al. Multi-stage non-negative matrix factorization for monaural singing voice separation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(10): 2096-2107.
[16] ZHANG X, LI W, ZHU B L. Latent time-frequency component analysis: A novel pitch-based approach for singing voice separation[C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2015: 131-135.
[17] DEIF H, WANG W, GAN L, et al. Local discontinuity based approach for monaural singing voice separation from accompanying music with multi-stage non-negative matrix factorization[C]//2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). Piscataway: IEEE Press, 2015: 93-97.
[18] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.