
基于改进SEGAN的空管语音增强算法

王宇哲 黎新 牟睿 周继华 何怡甫

引用本文: 王宇哲,黎新,牟睿,等. 基于改进SEGAN的空管语音增强算法[J]. 北京航空航天大学学报,2024,50(12):3930-3939 doi: 10.13700/j.bh.1001-5965.2022.0874
Citation: WANG Y Z, LI X, MOU R, et al. Improved SEGAN based ATC speech enhancement algorithm for air traffic control[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(12): 3930-3939 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0874


doi: 10.13700/j.bh.1001-5965.2022.0874
基金项目: 国家自然科学基金民航联合基金重点项目(U2033213);中国民用航空飞行学院-安全保障能力提升专项(MHAQ2022007);中国民用航空飞行学院民航飞行技术与飞行安全重点实验室飞行技术专题项目(FZ2022ZX10);中国民用航空飞行学院空管重点实验室项目(XM3484)
    Corresponding author. E-mail: lixin11236@163.com

  • CLC number: TN912.35; V328.3; TP183

Improved SEGAN based ATC speech enhancement algorithm for air traffic control

Funds: Key Project of the Civil Aviation Joint Fund of the National Natural Science Foundation of China (U2033213); Special Project for Safety and Security Capacity Enhancement, Civil Aviation Flight University of China (MHAQ2022007); Flight Technology Special Project of the Key Laboratory of Civil Aviation Flight Technology and Flight Safety, Civil Aviation Flight University of China (FZ2022ZX10); Key Laboratory of Air Traffic Control Project, Civil Aviation Flight University of China (XM3484)
  • Abstract:

    To improve the quality of voice communication in air traffic control (ATC), an ATC speech enhancement algorithm based on an improved speech enhancement generative adversarial network (SEGAN) is proposed. To address the problem that speech is drowned out by noise under low signal-to-noise ratio (SNR) conditions with the conventional SEGAN, the SEGAN model is extended with a multi-stage, multi-mapping, multi-dimensional-output generator and multiple multi-scale discriminators. A deep neural network is used to extract semantic speech features and segment the ATC speech semantically; several generators are cascaded to further refine the speech signal; a downsampling module is added to the convolutional layers to improve the utilization of speech information and reduce its loss; and multiple discriminators operating at multiple scales learn the distribution and characteristics of the speech samples from several perspectives. The results show that under the low-SNR condition of −15 dB, the improved SEGAN model raises the short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores by 23.28% and 20.11%, respectively, compared with the original SEGAN, and can enhance ATC speech quickly and effectively, laying the groundwork for subsequent ATC speech recognition.
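    The cascaded-generator idea described in the abstract (several generators, each further refining the speech signal) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the encoder-decoder layout, layer sizes, and two-stage depth below are assumptions chosen only to show how one generator's output feeds the next.

```python
# Minimal sketch of the multi-stage (cascaded) generator idea: a second
# generator further refines the output of the first. Layer sizes, the
# encoder-decoder layout and the number of stages are illustrative assumptions.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy 1-D encoder-decoder mapping a noisy waveform to an enhanced one."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=31, stride=4, padding=15),
            nn.PReLU(),
            nn.Conv1d(channels, channels * 2, kernel_size=31, stride=4, padding=15),
            nn.PReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(channels * 2, channels, kernel_size=32, stride=4, padding=14),
            nn.PReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=32, stride=4, padding=14),
            nn.Tanh(),
        )

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(noisy))

# Two-stage cascade: G2 refines the intermediate estimate produced by G1.
g1, g2 = TinyGenerator(), TinyGenerator()
noisy = torch.randn(1, 1, 16384)      # (batch, channels, samples)
stage1 = g1(noisy)                    # first enhancement pass
enhanced = g2(stage1)                 # second pass refines the result
print(stage1.shape, enhanced.shape)   # both torch.Size([1, 1, 16384])
```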

     

  • 图 1  SEGAN模型构架图

    Figure 1.  Framework of SEGAN model

    图 2  生成器结构图

    Figure 2.  Generator structure diagram

    图 3  改进SEGAN网络结构图

    Figure 3.  Improved SEGAN network structure diagram

    图 4  生成器网络结构图

    Figure 4.  Generator network structure diagram

    图 5  生成器的损失函数

    Figure 5.  Loss function of the generator

    图 6  D1和D2的网络结构图

    Figure 6.  Network structure diagram of D1 and D2

    图 7  鉴别器的网络结构图

    Figure 7.  Network structure diagram of the discriminator

    图 8  不同卷积层的声谱图性能对比

    Figure 8.  Comparison of spectrogram performance for different convolutional layers

    图 9  不同信噪比条件下语音频谱对比图

    Figure 9.  Comparison of speech spectra under different signal-to-noise ratio conditions

    表  1  子鉴别器1

    Table  1.   Sub-discriminator 1

    Layer     Kernel size    Stride    Input size    Output size    Activation
    Conv 1    31             4         16384×1       4096×16        LeakyReLU
    Conv 2    31             4         4096×16       1024×32        LeakyReLU
    Conv 3    31             4         1024×32       256×64         LeakyReLU
    Conv 4    31             4         256×64        64×128         LeakyReLU
    Conv 5    31             4         64×128        16×256         LeakyReLU
    Conv 6    31             4         16×256        4×512          LeakyReLU
    Conv 7    1              1         4×512         4×1            LeakyReLU
    FC        —              —         4×1           1              Softmax

    表  2  子鉴别器2

    Table  2.   Sub-discriminator 2

    Layer     Kernel size    Stride    Input size    Output size    Activation
    Conv 1    31             4         4096×16       1024×32        LeakyReLU
    Conv 2    31             4         1024×32       256×64         LeakyReLU
    Conv 3    31             4         256×64        64×128         LeakyReLU
    Conv 4    31             4         64×128        16×256         LeakyReLU
    Conv 5    31             4         16×256        4×512          LeakyReLU
    Conv 6    1              1         4×512         4×1            LeakyReLU
    FC        —              —         4×1           1              Softmax

    表  3  子鉴别器3

    Table  3.   Sub-discriminator 3

    Layer     Kernel size    Stride    Input size    Output size    Activation
    Conv 1    31             4         1024×32       256×64         LeakyReLU
    Conv 2    31             4         256×64        64×128         LeakyReLU
    Conv 3    31             4         64×128        16×256         LeakyReLU
    Conv 4    31             4         16×256        4×512          LeakyReLU
    Conv 5    1              1         4×512         4×1            LeakyReLU
    FC        —              —         4×1           1              Softmax
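    Tables 1-3 share a single kernel-31, stride-4 convolution stack; each sub-discriminator simply starts one level deeper than the previous one, so the three of them examine the signal at progressively coarser scales before a 1×1 convolution and a fully connected output layer. A minimal builder that reproduces the three layer stacks is sketched below; the activation slope, the absence of normalization, and how D2 and D3 obtain their multi-channel inputs are assumptions, and the Softmax listed in the tables is omitted for simplicity.

```python
# Sketch of the three sub-discriminators in Tables 1-3: the same Conv1d stack
# (kernel 31, stride 4), starting at a different depth for each scale, followed
# by a 1x1 convolution and a fully connected output layer.
import torch
import torch.nn as nn

CHANNELS = [1, 16, 32, 64, 128, 256, 512]   # channel widths from Tables 1-3

def make_sub_discriminator(start: int) -> nn.Sequential:
    """start = 0, 1 or 2 builds sub-discriminator 1, 2 or 3 respectively."""
    layers = []
    for c_in, c_out in zip(CHANNELS[start:-1], CHANNELS[start + 1:]):
        layers += [nn.Conv1d(c_in, c_out, kernel_size=31, stride=4, padding=15),
                   nn.LeakyReLU()]
    layers += [nn.Conv1d(CHANNELS[-1], 1, kernel_size=1),   # 4x512 -> 4x1
               nn.LeakyReLU(),
               nn.Flatten(),
               nn.Linear(4, 1)]                             # FC layer from the tables
    return nn.Sequential(*layers)

d1 = make_sub_discriminator(0)   # expects a 1 x 16384 input (Table 1)
d2 = make_sub_discriminator(1)   # expects a 16 x 4096 input (Table 2)
d3 = make_sub_discriminator(2)   # expects a 32 x 1024 input (Table 3)

print(d1(torch.randn(1, 1, 16384)).shape)   # torch.Size([1, 1])
print(d2(torch.randn(1, 16, 4096)).shape)   # torch.Size([1, 1])
print(d3(torch.randn(1, 32, 1024)).shape)   # torch.Size([1, 1])
```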

    表  4  不同信噪比下STOI的评价结果

    Table  4.   Evaluation results of STOI under different signal-to-noise ratios

    Model             −15 dB    −5 dB     0 dB      5 dB      15 dB
    Noisy             0.6404    0.7982    0.8721    0.9342    0.9541
    SEGAN             0.6212    0.8031    0.8984    0.9246    0.9517
    TFSEGAN           0.6517    0.8204    0.9002    0.9408    0.9544
    Improved SEGAN    0.7658    0.8912    0.9223    0.9465    0.9693

    表  5  不同信噪比下PESQ的评价结果

    Table  5.   Evaluation results of PESQ under different signal-to-noise ratios

    Model             −15 dB    −5 dB     0 dB      5 dB      15 dB
    Noisy             1.0978    1.2973    1.6130    1.8234    2.2304
    SEGAN             1.0289    1.3286    1.7238    2.0463    2.3956
    TFSEGAN           1.1519    1.4359    1.7603    2.0957    2.4172
    Improved SEGAN    1.2358    1.4828    1.8489    2.1518    2.5803
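    For reproducing scores of the kind reported in Tables 4 and 5, the open-source pystoi and pesq Python packages compute STOI and PESQ directly from waveform pairs. The sketch below is illustrative only; the file names and the 16 kHz wide-band setting are assumptions, not details taken from the paper.

```python
# Hedged sketch: scoring an enhanced utterance against its clean reference
# with the open-source pystoi and pesq packages (pip install soundfile pystoi pesq).
# File paths and the 16 kHz sampling rate are placeholders.
import soundfile as sf
from pystoi import stoi
from pesq import pesq

clean, fs = sf.read("clean_utterance.wav")       # time-aligned reference speech
enhanced, _ = sf.read("enhanced_utterance.wav")  # output of the enhancement model

# Short-time objective intelligibility (higher is better, roughly 0-1).
stoi_score = stoi(clean, enhanced, fs, extended=False)

# Perceptual evaluation of speech quality; 'wb' = wide-band mode for 16 kHz audio.
pesq_score = pesq(fs, clean, enhanced, "wb")

print(f"STOI = {stoi_score:.4f}, PESQ = {pesq_score:.4f}")
```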

    表  6  MOS评分标准

    Table  6.   MOS scoring criteria

    MOS score    Speech quality grade    Listener perception
    1            Bad                     Noise intolerable
    2            Poor                    Annoying but tolerable
    3            Fair                    Noise audible but acceptable
    4            Good                    Noise just perceptible
    5            Excellent               Noise almost imperceptible

    表  7  测评结果

    Table  7.   Evaluation results

    Model             Score 1    Score 2    Score 3    Score 4    Score 5    Total    MOS
    Noisy             3          8          89         0          0          100      2.86
    SEGAN             0          0          72         28         0          100      3.28
    DSEGAN            0          0          58         41         1          100      3.43
    Improved SEGAN    0          0          13         83         4          100      3.91
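    The MOS column in Table 7 is simply the average of the 100 listener scores per model, i.e. the rating counts weighted by their score values; the short check below reproduces the reported values.

```python
# Reproduce the MOS column of Table 7 as the score-weighted average
# of the 100 listener ratings given to each model.
ratings = {
    "Noisy":          [3, 8, 89, 0, 0],
    "SEGAN":          [0, 0, 72, 28, 0],
    "DSEGAN":         [0, 0, 58, 41, 1],
    "Improved SEGAN": [0, 0, 13, 83, 4],
}
for model, counts in ratings.items():
    mos = sum(score * n for score, n in zip(range(1, 6), counts)) / sum(counts)
    print(f"{model}: MOS = {mos:.2f}")   # 2.86, 3.28, 3.43, 3.91
```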

    表  8  不同区管不同信噪比下STOI的评价结果

    Table  8.   Evaluation results of STOI under different signal-to-noise ratios for different area control centers (ACC)

    Model                   −15 dB    −5 dB     0 dB      5 dB      15 dB
    Noisy                   0.6404    0.7982    0.8721    0.9342    0.9541
    XX ATC center, China    0.7658    0.8912    0.9223    0.9465    0.9693
    ACC 1                   0.7652    0.8909    0.9217    0.9460    0.9689
    ACC 2                   0.7645    0.8896    0.9201    0.9458    0.9686
    ACC 3                   0.7640    0.8893    0.9105    0.9459    0.9690

    表  9  不同区管不同信噪比下PESQ的评价结果

    Table  9.   Evaluation results of PESQ under different signal-to-noise ratios for different area control centers (ACC)

    Model                   −15 dB    −5 dB     0 dB      5 dB      15 dB
    Noisy                   1.0978    1.2973    1.6130    1.8234    2.2304
    XX ATC center, China    1.2358    1.4828    1.8489    2.1518    2.5803
    ACC 1                   1.2346    1.4822    1.8486    2.1511    2.5800
    ACC 2                   1.2359    1.4824    1.8482    2.1509    2.5806
    ACC 3                   1.2340    1.4817    1.8479    2.1513    2.5799
Publication history
  • Received: 2022-10-30
  • Accepted: 2023-07-28
  • Published online: 2023-09-08
  • Issue published: 2024-12-31
