Volume 41 Issue 10
Oct. 2015
Citation: JING Shaoling, MAO Xia, CHEN Lijiang, et al. Annotations and consistency detection for Chinese dual-mode emotional speech database[J]. Journal of Beijing University of Aeronautics and Astronautics, 2015, 41(10): 1925-1934. doi: 10.13700/j.bh.1001-5965.2014.0771 (in Chinese)

Annotations and consistency detection for Chinese dual-mode emotional speech database

doi: 10.13700/j.bh.1001-5965.2014.0771
  • Received Date: 08 Dec 2014
  • Revised Date: 16 Jan 2015
  • Publish Date: 20 Oct 2015
  • Abstract: To address the lack of emotional speech databases with rich emotion annotation information, a Chinese dual-mode emotional speech database containing both speech and electroglottography (EGG) signals was established, and annotation and consistency detection for the database were conducted. First, detailed annotation rules and methods were designed according to the characteristics of the emotional speech database, and 5 annotators were selected to label the database in accordance with the rules. Second, to ensure annotation quality and to test the completeness of the annotation rules, the annotators labeled part of the utterances as a pilot test before the official annotation; the test material comprised 280 sentences (seven emotions × two actors × twenty sentences). Finally, a consistency detection algorithm was designed according to the speech annotation rules. The results show that within a time error tolerance of 5 ms, the annotation consistency among the 5 annotators for the same utterances exceeds 60% on average; when the tolerance is increased to 8 ms and 10 ms, the consistency rises by 5% and 8% on average, respectively. The experiments indicate that the 5 annotators understand the speech consistently, that the designed annotation rules are complete, and that the emotional speech database is of high quality.
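    The abstract describes consistency detection as agreement between annotators' time labels within a tolerance (5, 8, or 10 ms). The sketch below is a minimal illustration of that idea, not the paper's exact algorithm: it assumes each annotation is a list of segment-boundary times in seconds, counts boundaries that match across annotators within the tolerance, and averages the agreement over all annotator pairs. The function names, the example boundary values, and the symmetrized pairwise averaging are assumptions made for illustration.

        from itertools import combinations

        def boundary_agreement(ann_a, ann_b, tol=0.005):
            # Fraction of boundaries in ann_a that have a counterpart in ann_b
            # within +/- tol seconds (tol=0.005 corresponds to a 5 ms tolerance).
            if not ann_a:
                return 0.0
            matched = sum(1 for t in ann_a if any(abs(t - u) <= tol for u in ann_b))
            return matched / len(ann_a)

        def consistency(annotations, tol=0.005):
            # Mean pairwise agreement over all annotator pairs for one utterance.
            # `annotations` maps annotator id -> list of boundary times in seconds.
            # Agreement is symmetrized: a against b and b against a are averaged.
            scores = [0.5 * (boundary_agreement(a, b, tol) + boundary_agreement(b, a, tol))
                      for a, b in combinations(annotations.values(), 2)]
            return sum(scores) / len(scores) if scores else 0.0

        # Hypothetical boundary labels of one utterance by three annotators.
        anns = {
            "A1": [0.120, 0.480, 0.910],
            "A2": [0.117, 0.483, 0.906],
            "A3": [0.126, 0.471, 0.913],
        }
        for tol in (0.005, 0.008, 0.010):  # the paper's 5 ms, 8 ms and 10 ms tolerances
            print(f"tol = {tol * 1000:.0f} ms -> consistency {consistency(anns, tol):.1%}")

    Run on this toy data, the mean pairwise consistency rises as the tolerance widens, mirroring the trend the paper reports when the tolerance grows from 5 ms to 8 ms and 10 ms.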

     

