Annotations and consistency detection for Chinese dual-mode emotional speech database

JING Shaoling (景少玲), MAO Xia (毛峡), CHEN Lijiang (陈立江), ZHANG Nana (张娜娜)

Citation: JING Shaoling, MAO Xia, CHEN Lijiang, et al. Annotations and consistency detection for Chinese dual-mode emotional speech database[J]. Journal of Beijing University of Aeronautics and Astronautics, 2015, 41(10): 1925-1934. doi: 10.13700/j.bh.1001-5965.2014.0771 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2014.0771
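Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20121102130001); Fundamental Research Funds for the Central Universities (YWF-14-DZXY-015)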
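Details
    About the author:

    JING Shaoling (born 1987), female, from Yongji, Shanxi, PhD candidate, jingshaoling2013@163.com

    Corresponding author:

    MAO Xia (born 1952), female, from Yiwu, Zhejiang, professor, moukyou@buaa.edu.cn. Her main research interests include artificial intelligence, pattern recognition, affective computing, human-computer interaction, and infrared target detection, tracking, recognition, and evaluation.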

  • CLC number: TP391.4

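  • Abstract: To address the lack of emotional speech databases with rich emotional annotation, a Chinese dual-mode emotional speech database containing both speech and electroglottograph (EGG) signals was built, annotated, and checked for annotation consistency. First, detailed annotation rules and methods were formulated according to the characteristics of the database, and five annotators labeled the database following these rules. Second, to ensure annotation quality and to test the completeness of the rules, the annotators carried out a trial annotation before the formal one; the test set contained 280 utterances (7 emotions × 2 speakers × 20 utterances). Finally, a consistency detection algorithm was designed according to the annotation rules. The results show that, within a time tolerance of 5 ms, the consistency among the five annotators on the same speech exceeds 60% on average; when the tolerance is widened to 8 ms and 10 ms, the average consistency rises by 5% and 8%, respectively. The experiment indicates that the five annotators understood the speech fairly consistently, that the annotation rules are fairly complete, and that the quality of the emotional speech database is fairly high.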
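The abstract does not spell out the consistency detection algorithm, so the following is only a minimal sketch of one plausible reading: each annotator marks boundary times (in ms) on the same utterance, two boundaries agree when they differ by no more than the tolerance, and the score is averaged over all annotator pairs. The function names, the greedy one-to-one matching, the normalization by the longer boundary list, and the sample boundary times are all assumptions made for illustration; none of them come from the paper.

```python
from itertools import combinations

# Size of the trial annotation set stated in the abstract:
# 7 emotions x 2 speakers x 20 utterances.
TEST_UTTERANCES = 7 * 2 * 20


def matched_count(a, b, tol_ms):
    """Count one-to-one boundary matches between a and b within tol_ms.

    Assumed matching rule (not from the paper): walk both sorted lists
    and pair boundaries greedily whenever they lie within the tolerance.
    """
    a, b = sorted(a), sorted(b)
    i = j = hits = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tol_ms:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too early to match any remaining b
        else:
            j += 1  # b[j] is too early to match any remaining a
    return hits


def pairwise_consistency(a, b, tol_ms):
    """Fraction of boundaries that agree, relative to the longer list."""
    if not a and not b:
        return 1.0
    return matched_count(a, b, tol_ms) / max(len(a), len(b))


def mean_consistency(annotations, tol_ms):
    """Average pairwise consistency over all pairs of annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(pairwise_consistency(a, b, tol_ms) for a, b in pairs) / len(pairs)


# Hypothetical boundary times (ms) from five annotators for one utterance.
SAMPLE = [
    [100, 250, 400],
    [103, 256, 402],
    [96, 259, 409],
    [100, 251, 400],
    [105, 244, 393],
]

if __name__ == "__main__":
    for tol in (5, 8, 10):  # tolerances named in the abstract
        print(f"tolerance {tol} ms: {mean_consistency(SAMPLE, tol):.2f}")
```

Under this reading, widening the tolerance can only keep or raise the score, which is in line with the trend the abstract reports when moving from 5 ms to 8 ms and 10 ms.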

     

Publication history
  • Received: 2014-12-08
  • Revised: 2015-01-16
  • Published online: 2015-10-20