Monaural singing voice separation based on high-resolution network

ZHANG Yang, NIU Zhixian, NIU Baoning, CHANG Yan

Citation: ZHANG Yang, NIU Zhixian, NIU Baoning, et al. Monaural singing voice separation based on high-resolution network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(8): 1555-1563. doi: 10.13700/j.bh.1001-5965.2019.0491 (in Chinese)

Funds:

National Key R & D Program of China 2017YFB1401001-01

National Natural Science Foundation of China 61572345

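    About the authors:

    ZHANG Yang  Female, master's student. Research interest: music information retrieval.

    NIU Zhixian  Female, M.S., associate professor, master's supervisor. Research interests: information retrieval, data mining, software theory and algorithms.

    NIU Baoning  Male, Ph.D., professor, doctoral supervisor. Research interests: big data, autonomic computing and performance management of database systems.

    CHANG Yan  Female, master's student. Research interest: operating system security.

    Corresponding author:

    NIU Zhixian, E-mail: niuniurose63@163.com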

CLC number: TP391

Abstract:

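    Monaural singing voice separation is the task of separating the accompaniment and the singing voice in a monaural (single-channel) song. It has important applications in melody extraction, lyrics recognition, karaoke accompaniment and related tasks. To address the limited accuracy with which current methods predict time-frequency spectrograms, this paper proposes a monaural singing voice separation algorithm based on the high-resolution network (HR-Net), taking advantage of the network's parallel branches and the thorough exchange of features between them, which improve model performance. A high-resolution network suited to monaural singing voice separation is designed and built: the spectrogram of a song is fed into the network, which outputs predicted spectrograms of the accompaniment and the singing voice. These are then combined with the phase of the song to reconstruct the time-domain signals of the accompaniment and the singing voice. Experiments on the public MIR-1K dataset show that the proposed algorithm outperforms current representative algorithms on the SNR, SIR and SAR metrics, improving the quality of the separated accompaniment and singing voice.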
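The reconstruction step described in the abstract (predict magnitude spectrograms, reuse the phase of the mixture, invert the STFT) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the STFT settings (`N_FFT`, `HOP`, Hann window) are hypothetical, the helper names are illustrative, and because HR-Net itself is not reproduced here, the usage part substitutes oracle ratio masks built from known sources for the network's predictions. It uses `scipy.signal.stft` and `scipy.signal.istft`.

```python
import numpy as np
from scipy.signal import istft, stft

# Hypothetical STFT settings; the paper's window length and hop are not given here.
N_FFT = 1024
HOP = 256


def magnitude_phase(signal):
    """Return the magnitude and phase of a signal's STFT (Hann window)."""
    _, _, spec = stft(signal, window="hann", nperseg=N_FFT, noverlap=N_FFT - HOP)
    return np.abs(spec), np.angle(spec)


def reconstruct(pred_mag, mix_phase, length):
    """Attach the mixture phase to a predicted magnitude and invert the STFT."""
    spec = pred_mag * np.exp(1j * mix_phase)
    _, signal = istft(spec, window="hann", nperseg=N_FFT, noverlap=N_FFT - HOP)
    return signal[:length]  # istft may return a few padded samples at the end


# Usage with a stand-in for HR-Net: oracle ratio masks from the true sources
# (illustration only; a trained network would predict the magnitudes instead).
rng = np.random.default_rng(0)
voice = rng.standard_normal(16000)
accompaniment = rng.standard_normal(16000)
mixture = voice + accompaniment

mix_mag, mix_phase = magnitude_phase(mixture)
voice_mag, _ = magnitude_phase(voice)
acc_mag, _ = magnitude_phase(accompaniment)
voice_mask = voice_mag / (voice_mag + acc_mag + 1e-8)

voice_est = reconstruct(voice_mask * mix_mag, mix_phase, len(mixture))
acc_est = reconstruct((1.0 - voice_mask) * mix_mag, mix_phase, len(mixture))
```

Because the inverse STFT is linear and the two masks sum to one, the two estimates add back up to the mixture. Reusing the mixture phase avoids estimating phase at all, at the cost that each separated signal carries the mixture's phase rather than its own.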

     

    Figure 1.  Monaural singing voice separation based on high-resolution network

    Figure 2.  Multi-resolution representation fusion

    Figure 3.  Multi-resolution block

    Figure 4.  Overall framework of the test phase

    Figure 5.  Spectrograms predicted by different algorithms and clean (ground-truth) spectrograms

    Figure 6.  Performance evaluation of different singing voice separation algorithms
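Figure 2 refers to multi-resolution representation fusion. In the original HRNet [8], each branch receives the sum of all branches resampled to its own resolution, using strided 3×3 convolutions to downsample and a 1×1 convolution plus nearest-neighbour upsampling to upsample. The weight-free sketch below shows only that exchange pattern for two branches. It assumes both branches have the same channel count and even spatial sizes, and it replaces the learned convolutions with 2×2 average pooling and plain nearest-neighbour upsampling, so it is an illustration of the idea rather than the fusion module used in this paper.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """2x2 average pooling of a (C, H, W) map (HRNet uses strided 3x3 convs instead)."""
    c, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("spatial sizes must be even")
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def fuse(high, low):
    """Exchange information between a high- and a low-resolution branch.

    Each output is its own input plus the other branch resampled to its
    resolution, so both branches keep their original shapes.
    """
    new_high = high + upsample2x(low)
    new_low = low + downsample2x(high)
    return new_high, new_low


# Hypothetical shapes: 8 channels, e.g. frequency x time at full and half resolution.
high = np.ones((8, 64, 32))
low = np.ones((8, 32, 16))
new_high, new_low = fuse(high, low)
print(new_high.shape, new_low.shape)  # (8, 64, 32) (8, 32, 16)
```

Keeping a full-resolution branch alive throughout, instead of recovering resolution only at the end as an encoder-decoder does, is the property the abstract credits for the improved spectrogram prediction.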

    Table 1.  Overall evaluation of accompaniment separation quality (unit: dB)

    Algorithm            GSNR    GSIR    GSAR
    U-Net[5]             10.09   11.96   11.30
    SH-4stack[6]         12.61   14.19   12.25
    HR-Net (proposed)    15.28   14.55   12.82

    Table 2.  Overall evaluation of singing voice separation quality (unit: dB)

    Algorithm            GSNR    GSIR    GSAR
    U-Net[5]              9.28   13.38   11.19
    SH-4stack[6]         12.09   15.38   12.47
    HR-Net (proposed)    14.76   16.60   13.02
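Tables 1 and 2 report global scores. In the MIR-1K literature, global metrics such as GSIR and GSAR are commonly length-weighted averages of per-clip BSS Eval scores, so longer clips count more. Assuming GSNR, GSIR and GSAR here follow that convention (this page does not define them), the aggregation step can be sketched as below. Computing the per-clip scores themselves is outside the sketch, the function name is hypothetical, and the numbers in the usage line are toy values, not results from the paper.

```python
import numpy as np


def global_metric(per_clip_db, clip_lengths):
    """Length-weighted mean of per-clip scores in dB (e.g. per-clip SIR or SAR)."""
    scores = np.asarray(per_clip_db, dtype=float)
    weights = np.asarray(clip_lengths, dtype=float)
    if scores.shape != weights.shape or scores.size == 0:
        raise ValueError("expected one length for each clip score")
    return float(np.sum(weights * scores) / np.sum(weights))


# Toy values: two clips, the second three times as long as the first.
print(global_metric([10.0, 14.0], [1, 3]))  # 13.0
```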
References

    [1] LI W, LI Z J, GAO Y W. Understanding digital music: A review of music information retrieval technology[J]. Journal of Fudan University (Natural Science), 2018, 57(3): 5-47 (in Chinese). http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=fdxb201803001
    [2] SIMPSON A J R, ROMA G, PLUMBLEY M D. Deep karaoke: Extracting vocals from musical mixtures using a convolutional deep neural network[C]//International Conference on Latent Variable Analysis and Signal Separation. Berlin: Springer, 2015: 429-436.
    [3] HUANG P S, KIM M, HASEGAWA-JOHNSON M, et al. Joint optimization of masks and deep recurrent neural networks for monaural source separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(12): 2136-2147. doi: 10.1109/TASLP.2015.2468583
    [4] UHLICH S, PORCU M, GIRON F, et al. Improving music source separation based on deep neural networks through data augmentation and network blending[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2017: 261-265.
    [5] JANSSON A, HUMPHREY E, MONTECCHIO N, et al. Singing voice separation with deep U-Net convolutional networks[C]//18th International Society for Music Information Retrieval Conference (ISMIR), 2017: 745-751.
    [6] PARK S, KIM T, LEE K, et al. Music source separation using stacked hourglass networks[C]//19th International Society for Music Information Retrieval Conference (ISMIR), 2018: 289-296.
    [7] STOLLER D, EWERT S, DIXON S. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation[C]//19th International Society for Music Information Retrieval Conference (ISMIR), 2018: 334-340.
    [8] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5693-5703.
    [9] SUN K, ZHAO Y, JIANG B R, et al. High-resolution representations for labeling pixels and regions[EB/OL]. (2019-04-09)[2019-09-01]. https://arxiv.org/abs/1904.04514.
    [10] VIRTANEN T. Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(3): 1066-1074. doi: 10.1109/TASL.2006.885253
    [11] HUANG P S, CHEN S D, SMARAGDIS P, et al. Singing-voice separation from monaural recordings using robust principal component analysis[C]//2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2012: 57-60.
    [12] HSU C L, WANG D L, JANG J S R, et al. A tandem algorithm for singing pitch extraction and voice separation from music accompaniment[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(5): 1482-1491. doi: 10.1109/TASL.2011.2182510
    [13] IKEMIYA Y, ITOYAMA K, YOSHII K. Singing voice separation and vocal F0 estimation based on mutual combination of robust principal component analysis and subharmonic summation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(11): 2084-2095. doi: 10.1109/TASLP.2016.2577879
    [14] RAFII Z, PARDO B. Music/voice separation using the similarity matrix[C]//13th International Society for Music Information Retrieval Conference (ISMIR), 2012: 583-588.
    [15] ZHU B L, LI W, LI R J, et al. Multi-stage non-negative matrix factorization for monaural singing voice separation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(10): 2096-2107. doi: 10.1109/TASL.2013.2266773
    [16] ZHANG X, LI W, ZHU B L. Latent time-frequency component analysis: A novel pitch-based approach for singing voice separation[C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2015: 131-135.
    [17] DEIF H, WANG W, GAN L, et al. Local discontinuity based approach for monaural singing voice separation from accompanying music with multi-stage non-negative matrix factorization[C]//2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). Piscataway: IEEE Press, 2015: 93-97.
    [18] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
Publication history
  Received: 2019-09-09
  Accepted: 2019-12-13
  Published online: 2020-08-20
