The Emotion in Speech Project: Papers (most recent first)

Greasley, P.; Sherrard, C. & Waterman, M. (Forthcoming). "Emotion in speech: Methodological issues in naturalistic approaches". To appear in Language and Speech.

Stibbard, R.M. (2001). "Vocal Expression of Emotions in Non-laboratory Speech: An Investigation of the Reading/Leeds Emotion in Speech Project Annotation Data". Unpublished PhD thesis. University of Reading.

Abstract: Findings are reported from the annotations made for a previously established project, the Reading/Leeds Emotion in Speech Project, which aimed to create a database of genuine emotional speech, to annotate it perceptually for emotions, intonation, and paralinguistic features, and to establish co-occurrences between the emotion and speech-related annotations. This thesis examines the hypothesis that such co-occurrences exist.
Shortcomings of the project's conception and methodology are identified. Two emotion annotation systems were used, the one involving too many emotion descriptors for replication to be achieved, the other too few to describe the data adequately.
The paralinguistic annotations suffered from well-known shortcomings, including a lack of agreement as to the level of delicacy to be achieved, the lack of objectively defined terminology, and the impossibility of transcribing enough data for statistical analysis. By the close of the project, little emotion-specific analysis of the annotations had been achieved and no method of doing so had been developed.
The present study used automated processing which facilitated the analysis of this data in a way not achieved in previous work of a similar kind. However, little evidence was found of a systematic link between the phonetic and the psychological annotations.
Recommendations for future research include the development of a more appropriate system for the classification of emotions in speech, the development of a methodologically less problematic description of speech-related features, and the expansion of the data collected to include relevant non-phonetic factors including contextual and inter-personal information.
The thesis concludes by arguing further that the attempt to discover a speaker-independent relationship between speech sounds and emotions is ill-conceived, that speech sounds do not alone carry systematic, reliable cues sufficient to differentiate emotions, and that the vocal cues to emotional expression are more context-bound and specific to particular types of interaction than has previously been thought.

Roach, P. (2000). "Techniques for the phonetic description of emotional speech". Proceedings of the ISCA Workshop on Speech and Emotion. Newcastle, Northern Ireland. September 2000. 53-59.

Abstract: It is inconceivable that there could be information present in the speech signal that could be detected by the human auditory system but which is not accessible to acoustic analysis and phonetic categorisation. We know that humans can reliably recognise a range of emotions produced by speakers of their own language on the basis of the acoustic signal alone, yet it appears that our ability to identify the relevant acoustic correlates is at present rather limited. This paper proposes that we have to build a bridge between the human perceptual experience and the measurable properties of the acoustic signal by developing an analytic framework based partly on auditory analysis. A possible framework is outlined which is based on the work of the Reading/Leeds Emotional Speech Database. The project was funded by ESRC Grant no. R000235285.

Stibbard, R.M. (2000). "Automated extraction of ToBI annotation data from the Reading/Leeds Emotional Speech Corpus". In Cowie, R.; Douglas-Cowie, E. & Schröder, M. (Eds.), Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, 60-65.

Abstract: This paper reports on computational tools developed for the automatic extraction of numerical and audio data from the Reading/Leeds Emotion in Speech Corpus and presents findings concerning the distribution of ToBI terminal tone contours across the emotions annotated.
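
The tools themselves are described in the paper; purely as an illustration of the kind of tally involved, a short Python sketch follows. The xlabel-style ".tones" file layout, the terminal-contour heuristic and the emotion_of lookup are assumptions made for the example, not the project's actual code.

    from collections import Counter, defaultdict
    from pathlib import Path

    def read_tone_labels(path):
        """Parse an xlabel-style tone tier: header lines up to '#', then 'time colour label' lines."""
        labels = []
        in_body = False
        for line in Path(path).read_text().splitlines():
            line = line.strip()
            if not in_body:
                in_body = (line == "#")
                continue
            parts = line.split()
            if len(parts) >= 3:
                labels.append((float(parts[0]), parts[2]))  # (end time, ToBI tone label)
        return labels

    def terminal_contour(tone_labels):
        """Return the utterance-final boundary event (e.g. 'L-L%', 'H-H%'), if any."""
        for _, label in reversed(tone_labels):
            if label.endswith("%"):
                return label
        return None

    def contour_distribution(corpus_dir, emotion_of):
        """Cross-tabulate terminal contours by emotion; emotion_of maps a file stem to its emotion label."""
        table = defaultdict(Counter)
        for tone_file in sorted(Path(corpus_dir).glob("*.tones")):
            contour = terminal_contour(read_tone_labels(tone_file))
            if contour is not None:
                table[emotion_of(tone_file.stem)][contour] += 1
        return table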

Roach, P.; Stibbard, R.; Osborne, J.; Arnfield, S. & Setter, J. (1998). "Transcription of prosodic and paralinguistic features of emotional speech". Journal of the International Phonetic Association, 28, 83-94.

Abstract: A study of emotional speech has resulted in a collection of some five hours of recorded material. The analysis of this material has required computer-based annotation incorporating prosodic and paralinguistic transcription as well as the coding of various psychological variables. A version of the prosodic and paralinguistic transcription devised by Crystal & Quirk was developed for use within the xwaves environment. This paper describes this transcription system and its application.

Greasley, P.; Sherrard, C.; Waterman, M.; Setter, J.; Roach, P.; Arnfield, S. & Horton, D. (1996). "The perception of emotion in speech". Abstracted in International Journal of Psychology, 31 (3/4), 406.

Abstract: Research into the perception of emotion in speech has focused on portrayals of emotions by actors. A series of four experiments is reported in which subjects listened to ninety-one episodes of naturally occurring emotional speech. Subjects were required to judge the emotions expressed by: 1) using a word of their own choice; 2) selecting from a list of 22 'emotion types'; 3) selecting from a list of 5 'basic emotions'. Analysis of the results showed that naturally occurring emotional speech presents a much more complex picture of emotion perception than that found in studies using actor portrayals of emotion.

Greasley, P.; Setter, J.; Waterman, M.; Sherrard, C.; Roach, P.; Arnfield, S. & Horton, D. (1995). "Representation of prosodic and emotional features in a spoken language database". Proceedings of the 13th International Congress of Phonetic Sciences. Stockholm. 242-245.

Abstract: This paper reports on research in progress on the Emotion in Speech Project, in which the ToBI transcription system is used to represent prosodic information and additional transcription tiers have been created to code emotional speech according to the emotional lexicon, affective valence and cognitive appraisals. Work has recently begun on representing features such as tempo. The aim is to produce a database of fully labelled emotional speech.
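
As an informal illustration only (the project's actual database format is not reproduced here), multi-tier labelling of this general kind might be represented along the following lines in Python; the tier names, field choices and example values are assumptions.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Interval:
        start: float  # seconds
        end: float    # seconds
        label: str    # e.g. a ToBI tone, an emotion-lexicon term, or a valence code

    @dataclass
    class Utterance:
        audio_file: str
        tiers: Dict[str, List[Interval]] = field(default_factory=dict)

        def add_label(self, tier, start, end, label):
            self.tiers.setdefault(tier, []).append(Interval(start, end, label))

        def labels_at(self, time):
            """Return the label on each tier whose interval spans the given time point."""
            return {tier: iv.label
                    for tier, intervals in self.tiers.items()
                    for iv in intervals
                    if iv.start <= time < iv.end}

    # Example: one utterance carrying a prosodic tier and two emotion-related tiers.
    utt = Utterance("clip_042.wav")
    utt.add_label("tobi_tones", 1.20, 1.25, "H*")
    utt.add_label("emotion_lexicon", 0.00, 3.40, "frustrated")
    utt.add_label("valence", 0.00, 3.40, "negative")
    print(utt.labels_at(1.22))  # all tier labels active at 1.22 s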

Arnfield, S.; Roach, P.; Setter, J.; Greasley, P. & Horton, D. (1995). "Emotional stress and speech tempo variation". Proceedings of the ESCA-NATO Tutorial and Research Workshop on Speech Under Stress. Lisbon. 13-15.

Abstract: In studying the effect of emotional stress on speech, it is necessary to give attention to variations in speaking tempo. Recordings made for the Emotion in Speech Project are being analysed both for prosodic and for emotional content. A simple measurement procedure for speech tempo is being used to look for cases of acceleration and deceleration in conditions of emotional stress.
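
By way of illustration only, a simple tempo measure of this general kind might be sketched in Python as follows; the sliding-window size, the change threshold and the input format are assumptions, not the procedure actually used in the project.

    from typing import List

    def syllable_rate(onsets: List[float], window: float = 2.0) -> List[float]:
        """Syllables per second in successive windows over the utterance."""
        if not onsets:
            return []
        rates = []
        t = onsets[0]
        while t < onsets[-1]:
            count = sum(1 for o in onsets if t <= o < t + window)
            rates.append(count / window)
            t += window
        return rates

    def tempo_changes(rates: List[float], threshold: float = 1.0) -> List[str]:
        """Label each window-to-window change as acceleration, deceleration, or steady."""
        changes = []
        for prev, curr in zip(rates, rates[1:]):
            if curr - prev > threshold:
                changes.append("acceleration")
            elif prev - curr > threshold:
                changes.append("deceleration")
            else:
                changes.append("steady")
        return changes

    # Example: syllable onset times (seconds) taken from a hand-labelled tier.
    onsets = [0.10, 0.32, 0.55, 0.81, 1.02, 1.20, 1.38, 1.55, 1.71, 1.90,
              2.40, 2.95, 3.55]
    print(tempo_changes(syllable_rate(onsets)))  # flags the slowing-down at the end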

Professor Peter Roach (P.J.Roach@reading.ac.uk)

June 2001