Volume 46 Issue 8
Aug.  2020
Turn off MathJax
Article Contents
ZHANG Yang, NIU Zhixian, NIU Baoning, et al. Monaural singing voice separation based on high-resolution network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(8): 1555-1563. doi: 10.13700/j.bh.1001-5965.2019.0491(in Chinese)
Citation: ZHANG Yang, NIU Zhixian, NIU Baoning, et al. Monaural singing voice separation based on high-resolution network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(8): 1555-1563. doi: 10.13700/j.bh.1001-5965.2019.0491(in Chinese)

Monaural singing voice separation based on high-resolution network

doi: 10.13700/j.bh.1001-5965.2019.0491
Funds:

National Key R & D Program of China 2017YFB1401001-01

National Natural Science Foundation of China 61572345

More Information
  • Corresponding author: NIU Zhixian, E-mail:niuniurose63@163.com
  • Received Date: 09 Sep 2019
  • Accepted Date: 13 Dec 2019
  • Publish Date: 20 Aug 2020
  • Monaural singing voice separation separates singing voice and accompaniment from a song, which can be used for applications such as melody extraction, lyrics recognition, karaoke, etc. To resolve the limited accuracy of predicted spectrogram, this paper proposes a monaural singing voice separation algorithm based on high-resolution neural network, which has the advantages of parallel structure and sufficient features interaction for improving the performance of the model. Firstly, the high-resolution network suitable for singing voice separation is designed and constructed. Then, the spectrogram of the origin song is input to the network in order to get the predicted spectrograms of accompaniment and singing voice. Finally, the time-domain signals are reconstructed by combining the song phases with the separated spectrograms. Experiments conducted on the MIR-1K dataset show that SNR, SIR and SAR indicators of the proposed algorithm are better than those of the state-of-the-art algorithm, and the proposed algorithm improves the quality of the separated accompaniment and singing voice.

     

  • loading
  • [1]
    李伟, 李子晋, 高永伟.理解数字音乐——音乐信息检索技术综述[J].复旦学报(自然科学版), 2018, 57(3):5-47. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=fdxb201803001

    LI W, LI Z J, GAO Y W.Understanding digital music-A review of music information retrieval technology[J].Journal of Fudan University(Natural Science), 2018, 57(3):5-47(in Chinese). http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=fdxb201803001
    [2]
    SIMPSON A J R, ROMA G, PLUMBLEY M D.Deep karaoke: Extracting vocals from musical mixtures using a convolutional deep neural network[C]//International Conference on Latent Variable Analysis and Signal Separation.Berlin: Springer, 2015: 429-436.
    [3]
    HUANG P S, KIM M, HASEGAWA-JOHNSON M, et al.Joint optimization of masks and deep recurrent neural networks for monaural source separation[J].IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(12):2136-2147. doi: 10.1109/TASLP.2015.2468583
    [4]
    UHLICH S, PORCH M, GIRON F, et al.Improving music source separation based on deep neural networks through data augmentation and network blending[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP).Piscataway: IEEE Press, 2017: 261-265.
    [5]
    JANSSON A, HUMPHREY E, MONTECCHIO N, et al.Singing voice separation with deep U-Net convolutional networks[C]//18th International Society for Music Information Retrieval Conference(ISMIR), 2017: 745-751.
    [6]
    PARK S, KIM T, LEE K, et al.Music source separation using stacked hourglass networks[C]//19th International Society for Music Information Retrieval Conference(ISMIR), 2018: 289-296.
    [7]
    STOLLER D, EWERT S, DIXON S.Wave-U-Net: A multi-scale neural network for end-to-end audio source separation[C]//19th International Society for Music Information Retrieval Conference(ISMIR), 2018: 334-340.
    [8]
    SUN K, XIAO B, LIU D, et al.Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2019: 5693-5703.
    [9]
    SUN K, ZHAO Y, JIANG B R, et al.High-resolution representations for labeling pixels and regions[EB/OL].(2019-04-09)[2019-09-01].https://arxiv.org/abs/1904.04514.
    [10]
    VIRTANEN T.Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria[J].IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(3):1066-1074. doi: 10.1109/TASL.2006.885253
    [11]
    HUANG P S, CHEN S D, SMARAGDIS P, et al.Singing-voice separation from monaural recordings using robust principal component analysis[C]//2012 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP).Piscataway: IEEE Press, 2012: 57-60.
    [12]
    HSU C L, WANG D L, JANG J S R, et al.A tandem algorithm for singing pitch extraction and voice separation from music accompaniment[J].IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(5):1482-1491. doi: 10.1109/TASL.2011.2182510
    [13]
    IKEMIYA Y, ITOYAMA K, YOSHⅡ K.Singing voice separation and vocal F0 estimation based on mutual combination of robust principal component analysis and subharmonic summation[J].IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(11):2084-2095. doi: 10.1109/TASLP.2016.2577879
    [14]
    RAFⅡ Z, PARDO B.Music/voice separation using the similarity matrix[C]//13th International Society for Music Information Retrieval Conference(ISMIR), 2012: 583-588.
    [15]
    ZHU B L, LI W, LI R J, et al.Multi-stage non-negative matrix factorization for monaural singing voice separation[J].IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(10):2096-2107. doi: 10.1109/TASL.2013.2266773
    [16]
    ZHANG X, LI W, ZHU B L.Latent time-frequency component analysis: A novel pitch-based approach for singing voice separation[C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP).Piscataway: IEEE Press, 2015: 131-135.
    [17]
    DEIF H, WANG W, GAN L, et al.Local discontinuity based approach for monaural singing voice separation from accompanying music with multi-stage non-negative matrix factorization[C]//2015 IEEE Global Conference on Signal and Information Processing(GlobalSIP).Piscataway: IEEE Press, 2015: 93-97.
    [18]
    HE K M, ZHANG X Y, REN S Q, et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2016: 770-778.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(6)  / Tables(2)

    Article Metrics

    Article views(522) PDF downloads(80) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return