Petra Wagner, Universität Bielefeld

Exploiting the speech-gesture link to capture fine-grained prosodic prominence impressions and unveil individual processing strategies


September 15, 2016

In recent work [1], we suggested a novel approach to fast, fine-grained prominence annotation by naive listeners. Annotators replicate a perceived utterance by “drumming” it, modulating their drumming velocity in accordance with the perceptual prominence of consecutive linguistic units (syllables, words). The drumming velocity then serves as a fine-grained operationalization of prosodic prominence. This intuitive method exploits the established link between prominence and speech-accompanying gestures [2, 3]. Because it is fast and easy to perform, it allows rapid annotation of large amounts of data, and it yields results comparable to those of established annotation methods [4]. Our results also show that “drummed” prominences capture intra-sentential prosodic variability, e.g. variability caused by speakers' individual interpretations of information structure or speaking style, much as fine-grained expert annotations do.
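To make the operationalization concrete, the following minimal sketch turns one annotator's drumming velocities into a relative prominence profile. It is an illustration only, not the procedure used in [1]: the per-utterance z-scoring, the function names, and the example velocity values (imagined as readings from an electronic drum pad) are all assumptions.

```python
from statistics import mean, pstdev


def normalize_velocities(velocities):
    """Z-score one annotator's drumming velocities within one utterance.

    Assumption: z-scoring is one plausible way to make annotators with
    different overall drumming strength comparable; it is not taken from [1].
    """
    mu = mean(velocities)
    sigma = pstdev(velocities)
    if sigma == 0:
        # All beats equally strong: no relative prominence differences.
        return [0.0 for _ in velocities]
    return [(v - mu) / sigma for v in velocities]


def prominence_profile(units, velocities):
    """Pair each linguistic unit (syllable or word) with its normalized score."""
    if len(units) != len(velocities):
        raise ValueError("need exactly one drum beat per linguistic unit")
    return list(zip(units, normalize_velocities(velocities)))


# Hypothetical example: three units and invented velocity readings.
units = ["the", "dog", "barked"]
velocities = [40, 100, 70]
profile = prominence_profile(units, velocities)
```

In this sketch the unit drummed with the highest velocity receives the highest score, and the scores of one utterance sum to zero, so they express prominence relative to the annotator's own average beat strength.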

Because the method gives access to naive listeners' individual prominence impressions on a larger scale, we investigate prosodic processing strategies with the help of Random Forest regression models. These models allow us to estimate the individual impact of established linguistic (lexical stress, lexical class, syllable weight), acoustic (duration, F0) and contextual (stress clash) prominence correlates on prominence impressions. Our analyses unveil individual strategies for blending and integrating top-down and bottom-up cues into impressions of word-level prominence. However, they also reveal substantial inter-annotator agreement in the weighting of cues when drumming syllable prominence.
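The idea of estimating cue impact with a Random Forest regression can be sketched as follows. This is a hedged illustration, not the analysis reported in the talk: it assumes scikit-learn and NumPy are available, uses impurity-based feature importances as the impact measure, covers only a subset of the cues named above, and runs on purely synthetic data generated so that duration dominates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names for a subset of the cues mentioned in the text.
FEATURES = ["lexical_stress", "duration", "f0", "stress_clash"]


def cue_importances(X, y, seed=0):
    """Fit a Random Forest regressor and return one importance per cue."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    return dict(zip(FEATURES, model.feature_importances_))


# Synthetic stand-in for one annotator's data (NOT real measurements).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.integers(0, 2, n),   # lexical stress: 0 = unstressed, 1 = stressed
    rng.uniform(0, 1, n),    # normalized duration
    rng.uniform(0, 1, n),    # normalized F0
    rng.integers(0, 2, n),   # stress clash: 0 = absent, 1 = present
])
# Invented "drummed prominence": driven mostly by duration, a little by stress.
y = 3.0 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(0, 0.1, n)

importances = cue_importances(X, y)
```

Fitting one such model per annotator and comparing the resulting importance profiles would be one way to contrast individual cue-weighting strategies; whether the reported analyses proceed in exactly this way is not stated in the abstract.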

[1] B. Samlowski and P. Wagner, “Promdrum – exploiting the prosody-gesture link for intuitive, fast and fine-grained prominence annotation,” in Proceedings of Speech Prosody 2016, 2016, p. p5.06.

[2] P. Wagner, Z. Malisz, and S. Kopp, “Speech and gesture in interaction: an overview,” Speech Communication, vol. 57, pp. 209–232, 2014.

[3] B. Parrell, L. Goldstein, S. Lee, and D. Byrd, “Spatiotemporal coupling between speech and manual motor actions,” Journal of Phonetics, vol. 42, pp. 1–11, 2014.

[4] J. Cole, Y. Mo, and M. Hasegawa-Johnson, “Signal-based and expectation-based factors in the perception of prosodic prominence,” Journal of Laboratory Phonology, vol. 1, pp. 425–452, 2010.