Annotations and consistency detection for Chinese dual-mode emotional speech database
Abstract: To address the lack of emotional speech databases with rich emotion annotation information, a Chinese dual-mode emotional speech database containing both speech and electroglottography (EGG) signals was established, and annotation and consistency detection were carried out on it. First, detailed annotation rules and methods were designed according to the characteristics of the database, and five annotators labeled the database in accordance with these rules. Second, to ensure the annotation quality and to test the completeness of the annotation rules, the annotators performed a trial annotation before the official one; the trial material comprised 280 utterances (7 emotions × 2 speakers × 20 utterances). Finally, a consistency detection algorithm was designed according to the annotation rules. The results show that, within a time error tolerance of 5 ms, the labels produced by the five annotators for the same utterances agree more than 60% of the time on average; when the tolerance is widened to 8 ms and 10 ms, the average consistency rises by 5% and 8%, respectively. The experiment indicates that the five annotators understood the speech fairly consistently, that the annotation rules are fairly complete, and that the database is of fairly high quality.
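The consistency check described above compares the annotators' time-aligned labels under a tolerance. Below is a minimal Python sketch of such tolerance-based agreement, assuming annotations are boundary timestamps in milliseconds and that a boundary counts as consistent when all pairwise differences between annotators fall within the tolerance; the function name and the pairwise criterion are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (not the authors' exact algorithm) of tolerance-based
# annotation consistency detection between multiple annotators.

from itertools import combinations

def boundary_agreement(annotations, tol_ms):
    """Fraction of boundaries on which all annotators agree within
    tol_ms milliseconds. `annotations` holds one equal-length list of
    boundary times (in ms) per annotator."""
    n = len(annotations[0])
    agreed = sum(
        1
        for i in range(n)
        if all(
            abs(x - y) <= tol_ms
            for x, y in combinations([a[i] for a in annotations], 2)
        )
    )
    return agreed / n

# Example: five annotators marking four boundaries in one utterance.
annos = [
    [120, 455, 780, 1302],
    [118, 457, 786, 1300],
    [123, 452, 779, 1305],
    [121, 458, 783, 1299],
    [119, 454, 781, 1304],
]
for tol in (5, 8, 10):  # the tolerances reported in the abstract
    print(f"tolerance {tol} ms: {boundary_agreement(annos, tol):.0%}")
```

Widening the tolerance can only keep or grow the agreed set, which matches the abstract's observation that consistency rises as the error range increases from 5 ms to 8 ms and 10 ms.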
Key words:
- Chinese
- dual-mode
- emotional speech database
- speech annotation
- consistency detection