Movements of the head and speech articulators have been observed in tandem during an alternating word pair production task driven by an accelerating-rate metronome. Word pairs contrasted either onset or coda dissimilarity with same-word controls. Results show that as production effort increased, so did speaker head nodding, and that nodding increased abruptly following errors. More errors occurred under faster production rates, and in coda rather than onset alternations. The greatest entrainment between head and articulators was observed at the fastest rate under coda alternation. Neither jaw coupling nor imposed prosodic stress was observed to be a primary driver of head movement. In alternating pairs, nodding frequency tracked the slower alternation rate rather than the syllable rate, interpreted as recruitment of additional degrees of freedom to stabilize the alternation pattern under increasing production rate pressure.
This article introduces theoretically driven acoustic measures of /s/ that reflect aerodynamic and articulatory conditions. The measures were evaluated by assessing whether they revealed expected changes over time and labiality effects, along with possible gender differences suggested by past work.
Productions of /s/ were extracted from various speaking tasks from typically speaking adolescents (6 boys, 6 girls). Measures were made of relative spectral energies in low- (550-3000 Hz), mid- (3000-7000 Hz), and high-frequency regions (7000-11025 Hz); the mid-frequency amplitude peak; and temporal changes in these parameters. Spectral moments were also obtained to permit comparison with existing work.
Spectral balance measures in low-mid and mid-high frequency bands varied over the time course of /s/, capturing the development of sibilance at mid-fricative along with showing some effects of gender and labiality. The mid-frequency spectral peak was significantly higher in nonlabial contexts, and in girls. Temporal variation in the mid-frequency peak differentiated ±labial contexts while normalizing over gender.
The measures showed expected patterns, supporting their validity. Comparison of these data with studies of adults suggests some developmental patterns that call for further study. The measures may also serve to differentiate some cases of typical and misarticulated /s/.
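The band-energy measures described above for /s/ can be illustrated with a minimal sketch. It assumes a 22050 Hz sampling rate (so that 11025 Hz is the Nyquist frequency), a plain DFT power spectrum, and a spectral balance defined as the dB ratio of two band energies; the abstract does not give the exact definitions, so the function names and the dB formulation here are illustrative assumptions rather than the authors' procedure.

```python
import cmath
import math

# Band edges in Hz, taken from the abstract; bands are treated as half-open [lo, hi).
BANDS_HZ = {"low": (550, 3000), "mid": (3000, 7000), "high": (7000, 11025)}


def synth_tone_mix(sample_rate, n_samples, components):
    """Build a test signal from (frequency_hz, amplitude) pairs (hypothetical helper)."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in components)
        for i in range(n_samples)
    ]


def power_spectrum(samples, sample_rate):
    """Return (frequencies_hz, power) for the non-negative-frequency DFT bins."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def band_energies(freqs, power):
    """Sum spectral power inside each of the low, mid, and high bands."""
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in BANDS_HZ.items()
    }


def spectral_balance_db(energies, upper, lower):
    """Assumed definition: energy of the upper band relative to the lower band, in dB."""
    return 10 * math.log10(energies[upper] / energies[lower])


def mid_peak_hz(freqs, power):
    """Frequency of the strongest spectral component within the mid band."""
    lo, hi = BANDS_HZ["mid"]
    return max((p, f) for f, p in zip(freqs, power) if lo <= f < hi)[1]
```

Temporal change in these parameters could then be approximated by applying the same functions to successive analysis windows across the fricative and comparing the values window by window.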
Gaze cues serve an important role in facilitating human conversations and are generally considered to be one of the most important non-verbal cues. Gaze cues are used to manage turn-taking, coordinate joint attention, regulate intimacy, and signal cognitive effort. In particular, it is well established that gaze aversion is used in conversations to avoid prolonged periods of mutual gaze. Given the numerous functions of gaze cues, there has been extensive work on modelling these cues in social robots. Researchers have also tried to identify the impact of robot gaze on human participants. However, the influence of robot gaze behavior on human gaze behavior has been less explored. We conducted a within-subjects user study (N = 33) to test whether a robot's gaze aversion influenced human gaze aversion behavior. Our results show that participants tend to avert their gaze more when the robot keeps staring at them as compared to when the robot exhibits well-timed gaze aversions. We interpret our findings in terms of intimacy regulation: humans try to compensate for the robot's lack of gaze aversion.
Intra-gestural and inter-gestural coordination in German word-initial consonant clusters /kl, kn, ks, pl, ps/ is investigated in four speakers by means of EMA as a function of segmental make-up and prosodic variation, i.e. prosodic boundary strength and lexical stress. Segmental make-up is shown to determine the extent of articulatory overlap of the clusters, with /kl/ exhibiting the highest degree, followed by /pl/, /ps/, /ks/ and finally /kn/. Prosodic variation does not alter this order. However, overlap is shown to be affected by lexical stress in /kl/ and /ps/ and by boundary strength in /pC/ clusters. This indicates that boundary effects on coordination are stronger for clusters with little inter-articulator dependence (e.g. lips + tongue tip in /pl/ vs. tongue back + tongue tip in /kl/). The results also show that the extent to which prosodic factors affect articulation interacts with the position of the affected segment in the sound sequence: In general, boundary strength strongly affects the cluster's first consonant while lexical stress influences the second consonant. This indicates that prosodic effects are strongest at their source (i.e. the boundary or the stressed nucleus) and decrease in strength with distance from their source. However, prosodic lengthening effects can reach the more distal consonant in clusters with a high degree of overlap and high inter-articulator dependence. Besides these aspects, the discussion covers differences in measures of articulatory coordination.
• We analyze the coordination of word-initial consonant clusters in German.
• Segmental make-up and prosodic condition are systematically varied.
• Effects of boundary strength and of lexical stress are graded and decline with distance from the source.
• Results show that sensitivity to prosodic variation depends on segmental make-up.
Coarticulation and invariance are two topics at the center of theorizing about speech production and speech perception. In this paper, a quantitative scale is proposed that places coarticulation and invariance at the two ends of the scale. This scale is based on physical information flow in the articulatory signal, and uses Information Theory, especially the concept of mutual information, to quantify these central concepts of speech research. Mutual Information measures the amount of physical information shared across phonological units. In the proposed quantitative scale, coarticulation corresponds to greater and invariance to lesser information sharing. The measurement scale is tested by data from three languages: German, Catalan, and English. The relation between the proposed scale and several existing theories of coarticulation is discussed, and implications for existing theories of speech production and perception are presented.
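As a rough illustration of the quantity involved, mutual information between two paired measurement sequences can be estimated as below. This is a minimal sketch under stated assumptions: it uses the simple plug-in estimator on discretized values, the equal-width binning and both function names are hypothetical, and the abstract does not say which estimator or articulatory variables the study actually used.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map continuous values to equal-width bin indices 0..n_bins-1 (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y), written with raw counts to limit rounding error.
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

Under this reading, a higher value for, say, tongue position sampled in a consonant and in the neighbouring vowel would fall toward the coarticulation end of the scale, and a value near zero toward the invariance end. Plug-in estimates are biased upward for small samples, so any real analysis would need a correction or a different estimator.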
This study uses acoustic and physiological measures to compare laryngeal reflexes of global changes in vocal effort to the effects of modulating such aspects of linguistic prominence as sentence accent, induced by focus variation, and word stress. Seven speakers were recorded by using a laryngograph. The laryngographic pulses were preprocessed to normalize time and amplitude. The laryngographic pulse shape was quantified using open and skewness quotients and also by applying a functional version of the principal component analysis. Acoustic measures included the acoustic open quotient and spectral balance in the vowel /e/ during the test syllable. The open quotient and the laryngographic pulse shape indicated a significantly shorter open phase for loud speech than for soft speech. Similar results were found for lexical stress, suggesting that lexical stress and loud speech are produced with a similar voice source mechanism. Stressed syllables were distinguished from unstressed syllables by their open phase and pulse shape, even in the absence of sentence accent. Evidence for laryngeal involvement in signaling focus, independent of fundamental frequency changes, was not as consistent across speakers. Acoustic results on various spectral balance measures were generally much less consistent compared to results from laryngographic data.
This study compares the time to initiate words with varying syllable structures (V, VC, CV, CVC, CCV, CCVC). In order to test the hypothesis that different syllable structures require different amounts of time to prepare their temporal controls, or plans, two delayed naming experiments were carried out. In the first of these the initiation time was determined from acoustic recordings. The results confirmed the hypothesis but also showed an interaction with the initial segment (i.e., vowel-initial words were initiated later than words beginning with consonants, but this difference was much smaller for words starting with stops compared to /l/ or /s/). Adding a coda did not affect the initiation time. In order to rule out effects of segment-specific articulatory-to-acoustic interval differences, a second experiment was performed in which speech movements of the tongue, the jaw and the lips were recorded by means of electromagnetic articulography. Results from initiation time, based on articulatory measurements, showed a significant syllable structure effect with VC words being initiated significantly later than CV(C) words. Only minor effects of the initial segment were found. These results can be partly explained by the amount of accumulated experience a speaker has in coordinating the relevant gesture combinations and triggering them appropriately in time.
► Planning time for varying syllable structures compared in a delayed naming task.
► Measurements made using acoustics and articulatory kinematics.
► Onsetless syllables (V, VC) take longer to plan than those with onsets (e.g. CV, CVC).
► Inclusion of coda does not affect planning time.
► Planning time decreases inversely with phonotactic probability.
This study investigated whether speakers adapt their breathing and speech (fundamental frequency) to a prerecorded confederate who is sitting or moving under different levels of physical effort and who is either speaking or not. Following Paccalin and Jeannerod (2000), we would expect breathing rate to change in the direction of the confederate's, even if the participant is physically inactive. This might in turn affect their speech acoustics.
We recorded the speech and respiration of 22 native German speakers. They produced solo and synchronous read speech in interaction with a confederate who appeared on a prerecorded video. There were three within-subject experimental conditions: the confederate (a) sitting, (b) biking with light effort, or (c) biking with heavier effort.
During speech, the confederate's inhalation amplitude and fundamental frequency increased with physical effort, as expected. Her breath cycle duration changed differently, probably because of read speech constraints. Overall, the only adaptation the participants showed was higher fundamental frequency with increase in the confederate's physical effort during synchronous, but not solo, speech. Additionally, they produced shallower inhalations when observing the confederate biking in silence, as compared to the condition without movement. Crucially, the participants' acoustic and breathing data showed large interindividual variability.
Our findings indicate that, in this paradigm, convergence only took place on fundamental frequency during synchronous speech and that this phonetic adaptation happened independently from any speech breathing adaptation. They also suggest that participants may adapt their quiet breathing while watching a person performing physical exercise but that the mechanism is more complex than that explained previously.
In the description of German phonology, two distinct phonetic symbols are currently recommended for the transcription of the vowels a (a central low vowel, phonemically /a/) and ɐ (phonemically /əʁ/) in word-final, unstressed positions. The present study examines whether differences between these two vowels exist in production and perception of Standard German speakers from the north of Germany. In Experiment 1, six speakers produced a series of minimal pairs that were embedded in meaningful sentences and varied with respect to their accentuation and position within a prosodic phrase. In Experiment 2, the minimal pairs produced by the six speakers of the first experiment were extracted from their respective contexts and tested with 44 native German listeners in a forced-choice identification task. Perceptual results showed a better-than-chance performance for one male speaker of the corpus only. Phonetic analyses also confirmed that only this male speaker produced subtle, but consistent F2/F3 differences between a and ɐ while the contrast was completely neutralised in the rest of the corpus. We discuss the role of prosody in vowel neutralisation with a specific focus on unstressed vowels and make suggestions for phonetic and phonological accounts of Standard German.
Research on language use has become increasingly interested in the multimodal and interactional aspects of language – theoretical models of dialogue, such as the Communication Accommodation Theory and the Interactive Alignment Model, are examples of this. In addition, researchers have started to give more consideration to the relationship between physiological processes and language use. This article aims to contribute to the advancement in studies of physiological and/or multimodal language use in naturalistic settings. It does so by providing methodological recommendations for such multi‐speaker experimental designs. It covers the topics of (a) speaker preparation and logistics, (b) experimental tasks and (c) data synchronisation and post‐processing. The types of data that will be considered in further detail include audio and video, electroencephalography, respiratory data and electromagnetic articulography. This overview with recommendations is based on the answers to a questionnaire that was sent amongst the members of the Horizon 2020 research network ‘Conversational Brains’, several researchers in the field and interviews with three additional experts.