Journal of Beijing University of Aeronautics and Astronautics ›› 2020, Vol. 46 ›› Issue (8): 1555-1563. doi: 10.13700/j.bh.1001-5965.2019.0491


Monaural singing voice separation based on high-resolution network

ZHANG Yang1, NIU Zhixian1, NIU Baoning1, CHANG Yan2

  1. College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China;
  2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2019-09-09 Published: 2020-08-27
  • Corresponding author: NIU Zhixian E-mail: niuniurose63@163.com
  • About the authors: ZHANG Yang, female, master's student. Main research interest: music information retrieval.
    NIU Zhixian, female, M.S., associate professor, master's supervisor. Main research interests: information retrieval, data mining, software theory and algorithms.
    NIU Baoning, male, Ph.D., professor, doctoral supervisor. Main research interests: big data, autonomic computing and performance management of database systems.
    CHANG Yan, female, master's student. Main research interest: operating system security.
  • Supported by:
    National Key R&D Program of China (2017YFB1401001-01); National Natural Science Foundation of China (61572345)

Abstract: Monaural singing voice separation separates the singing voice and the accompaniment in a single-channel song, with important applications in melody extraction, lyrics recognition, and karaoke accompaniment. To address the limited accuracy of currently predicted time-frequency spectrograms, this paper proposes a monaural singing voice separation algorithm based on the high-resolution network, whose parallel structure and thorough feature interaction improve model performance. First, a high-resolution network suited to monaural singing voice separation is designed and constructed. Then, the spectrogram of the original song is fed to the network to obtain the predicted spectrograms of the accompaniment and the singing voice. Finally, the predicted spectrograms are combined with the phase of the song to reconstruct the time-domain signals of the accompaniment and the singing voice. Experiments on the public MIR-1K dataset show that the proposed algorithm outperforms current representative algorithms on the SNR, SIR, and SAR metrics, improving the quality of the separated accompaniment and singing voice.

Keywords: monaural singing voice separation, deep learning, spectrogram, high-resolution network, frequency-domain model
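
The three steps named in the abstract (spectrogram in, predicted accompaniment and voice spectrograms out, reconstruction with the phase of the song) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the high-resolution network is not reproduced and is replaced by a hypothetical oracle ratio mask computed from known sources (`oracle_magnitudes`). The frame settings `N_FFT` and `HOP`, the toy sine-wave signals, and the plain energy-ratio `snr_db` are likewise assumptions made for illustration.

```python
import numpy as np

N_FFT, HOP = 512, 256  # illustrative frame settings, not taken from the paper


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex spectrogram, shape (frames, bins), using a periodic Hann window."""
    window = np.hanning(n_fft + 1)[:-1]
    x = np.pad(x, (n_fft, n_fft))  # zero-pad so every sample sits in two frames
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft(); returns `length` samples."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)  # undo the squared-window weighting
    return out[n_fft:n_fft + length]  # drop the zero padding


def oracle_magnitudes(mix_mag, voice_mag, acc_mag, eps=1e-8):
    """HYPOTHETICAL stand-in for the network: ratio masks built from the true sources."""
    voice_mask = voice_mag / (voice_mag + acc_mag + eps)
    acc_mask = acc_mag / (voice_mag + acc_mag + eps)
    return voice_mask * mix_mag, acc_mask * mix_mag


def reconstruct(pred_mag, mix_spec, length):
    """Attach the mixture phase to a predicted magnitude and return to the time domain."""
    phase = np.angle(mix_spec)
    return istft(pred_mag * np.exp(1j * phase), length)


def snr_db(reference, estimate):
    """Energy-ratio SNR in dB (assumed definition): 10*log10(||s||^2 / ||s - s_hat||^2)."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))


def demo(fs=8000, seconds=1.0):
    """Separate a toy two-tone mixture and return all signals for inspection."""
    t = np.arange(int(fs * seconds)) / fs
    voice = 0.5 * np.sin(2 * np.pi * 440.0 * t)          # toy "singing voice"
    accompaniment = 0.5 * np.sin(2 * np.pi * 110.0 * t)  # toy "accompaniment"
    mixture = voice + accompaniment

    mix_spec = stft(mixture)
    voice_mag, acc_mag = oracle_magnitudes(
        np.abs(mix_spec), np.abs(stft(voice)), np.abs(stft(accompaniment)))
    voice_est = reconstruct(voice_mag, mix_spec, len(mixture))
    acc_est = reconstruct(acc_mag, mix_spec, len(mixture))
    return mixture, voice, accompaniment, voice_est, acc_est


if __name__ == "__main__":
    mixture, voice, accompaniment, voice_est, acc_est = demo()
    print("voice SNR (dB):", snr_db(voice, voice_est))
    print("accompaniment SNR (dB):", snr_db(accompaniment, acc_est))
```

Only magnitudes are predicted in this sketch, so both estimates inherit the phase of the mixture, which is the reconstruction strategy the abstract describes. A trained network would take the place of `oracle_magnitudes`, which needs the true sources and is therefore usable only as an illustration.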

CLC number:


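Copyright © Editorial Office of Journal of Beijing University of Aeronautics and Astronautics
Address: Editorial Office of Journal of Beijing University of Aeronautics and Astronautics, No. 37 Xueyuan Road, Haidian District, Beijing 100191, China E-mail: jbuaa@buaa.edu.cn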