Journal of Beijing University of Aeronautics and Astronautics (北京航空航天大学学报), 2015, Vol. 41, Issue (10): 1925-1934. doi: 10.13700/j.bh.1001-5965.2014.0771

• Algorithms and Applications •

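  • Corresponding author: MAO Xia (1952-), female, from Yiwu, Zhejiang; professor. Her main research interests include artificial intelligence, pattern recognition, affective computing, human-computer interaction, and infrared target detection, tracking, recognition and evaluation. E-mail: moukyou@buaa.edu.cn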
  • About the author: JING Shaoling (1987-), female, from Yongji, Shanxi; Ph.D. candidate. E-mail: jingshaoling2013@163.com
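  • Funding:
    Specialized Research Fund for the Doctoral Program of Higher Education (20121102130001); Fundamental Research Funds for the Central Universities (YWF-14-DZXY-015)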

Annotations and consistency detection for Chinese dual-mode emotional speech database

JING Shaoling, MAO Xia, CHEN Lijiang, ZHANG Nana   

  1. School of Electronic and Information Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
  • Received:2014-12-08 Revised:2015-01-16 Online:2015-10-20 Published:2015-11-02


Abstract: To address the lack of emotional speech databases with rich emotion annotation, a Chinese dual-mode emotional speech database containing both speech and electroglottograph (EGG) signals was established, and the database was then annotated and checked for annotation consistency. First, detailed annotation rules and methods were designed according to the characteristics of the emotional speech database, and 5 annotators labeled the database following these rules. Second, to ensure annotation quality and to test the completeness of the annotation rules, the annotators carried out a trial annotation before the formal annotation; the test material comprised 280 utterances (7 emotions × 2 speakers × 20 utterances). Finally, a corresponding consistency detection algorithm was designed according to the annotation rules. The results show that within a time error tolerance of 5 ms, the average consistency of the 5 annotators on the same utterances exceeds 60%; when the tolerance is widened to 8 ms and 10 ms, the average consistency increases by 5% and 8%, respectively. These results indicate that the 5 annotators interpreted the speech relatively consistently, that the annotation rules are fairly complete, and that the quality of the emotional speech database is relatively high.
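The abstract does not spell out how the consistency detection algorithm works. Purely as an illustration, the sketch below assumes that each annotator produces a list of boundary times in milliseconds for an utterance, that two boundaries agree when they lie within the time error tolerance (e.g. 5, 8 or 10 ms), and that consistency is the symmetric score 2m/(|a|+|b|) averaged over all annotator pairs. The greedy nearest-neighbour matching, the scoring formula, the function names (`matched_boundaries`, `pair_consistency`, `mean_consistency`) and the sample times are all hypothetical and are not the authors' method.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def matched_boundaries(ref: Sequence[float], other: Sequence[float], tol_ms: float) -> int:
    """Count boundaries in `ref` that pair with a distinct boundary in `other`
    no more than `tol_ms` milliseconds away (greedy nearest-neighbour matching,
    processing `ref` in time order)."""
    unused = sorted(other)
    matched = 0
    for t in sorted(ref):
        # Candidates are still-unused boundaries inside the tolerance window.
        candidates = [i for i, u in enumerate(unused) if abs(u - t) <= tol_ms]
        if candidates:
            best = min(candidates, key=lambda i: abs(unused[i] - t))
            unused.pop(best)  # each boundary in `other` may be matched only once
            matched += 1
    return matched


def pair_consistency(a: Sequence[float], b: Sequence[float], tol_ms: float) -> float:
    """Hypothetical agreement score 2*m / (|a| + |b|) for two annotators."""
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    m = matched_boundaries(a, b, tol_ms)
    return 2 * m / (len(a) + len(b))


def mean_consistency(annotations: Sequence[Sequence[float]], tol_ms: float) -> float:
    """Average pair_consistency over every pair of annotators (needs >= 2 annotators)."""
    return mean(pair_consistency(a, b, tol_ms) for a, b in combinations(annotations, 2))


# Made-up boundary times (ms) from three annotators for one utterance.
marks = [
    [0.0, 120.0, 250.0, 400.0],
    [3.0, 126.0, 249.0, 410.0],
    [1.0, 118.0, 262.0, 404.0],
]

for tol in (5, 8, 10):
    print(f"tolerance {tol} ms: mean consistency {mean_consistency(marks, tol):.3f}")
```

With these made-up times, widening the tolerance lets more near-miss boundaries count as agreements, so the mean score rises monotonically. This mirrors the direction of the trend reported for 5, 8 and 10 ms, though not its magnitude, which depends on the real data and on the authors' actual matching rule.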

Key words: Chinese, dual-mode, emotional speech database, speech annotation, consistency detection



Copyright © Editorial Office of the Journal of Beijing University of Aeronautics and Astronautics