This study investigated the role of orthographic information in the acquisition of non-native speech sounds by monolingual English listeners. Two potentially important orthographic variables were explored: orthographic compatibility (whether the orthographic information supports or contradicts the distributional information) and orthographic familiarity (whether the native and target languages share the same orthography). Ten groups of learners were trained on either a unimodal or bimodal distribution of two length continua. Out of the 10 groups, eight groups were also exposed to orthographic cues that varied in their compatibility with the distributional information (compatible vs. incompatible) and familiarity with the orthography of learners’ native language (Roman vs. Arabic). Following training, all participants performed an AX discrimination task to test their discrimination of the length contrast. The results revealed that, in general, the availability of either familiar or unfamiliar orthographic input which signaled the existence of a single length category significantly lowered learners’ discrimination of the length contrast regardless of the auditory distribution. Further, the exposure to orthographic input that supported a two-category length distinction enhanced the discrimination of the length contrast irrespective of the distribution. However, the most significant improvement occurred when both distributional information and familiar orthographic input were compatible. Overall, these findings indicate that orthographic input, regardless of its level of compatibility or familiarity, may influence the acquisition of non-native speech sounds.
This study constitutes an investigation into the acoustic variability of intervocalic alveolar taps in a corpus of spontaneous speech from Madrid, Spain. Substantial variability was documented in this segment, with highly reduced variants constituting roughly half of all tokens during spectrographic inspection. In addition to qualitative documentation, the intensity difference between the tap and surrounding vowels was measured. Changes in this intensity difference were statistically modeled using Bayesian finite mixture models containing lexical and phonetic predictors. Model comparisons indicate predictive performance is improved when we assume two latent categories, interpreted as two pronunciation variants for the Spanish tap. In interpreting the model, predictors were more often related to categorical changes in which pronunciation variant was produced than to gradient intensity changes within each tap type. Variability in tap production was found according to lexical frequency, speech rate, and phonetic environment. These results underscore the importance of evaluating model fit to the data as well as what researchers modeling phonetic variability can gain in moving past linear models when they do not adequately fit the observed data.
• We compared native English and non-native (Dutch) Lombard and plain speech.
• Canadian, Dutch and Spanish listeners participated in an intelligibility experiment.
• Both native and non-native Lombard speech resulted in a Lombard benefit.
• A Lombard benefit was evident for both native and non-native listeners.
• Energetic masking release partially accounted for the Lombard benefit.
Speech produced in noise (Lombard speech) is more intelligible than speech produced in quiet (plain speech). Previous research on the Lombard intelligibility benefit focused almost entirely on how native speakers produce and perceive Lombard speech. In this study, we investigate the size of the Lombard intelligibility benefit of both native (American-English) and non-native (native Dutch) English for native and non-native listeners (Dutch and Spanish). We used a glimpsing metric to measure the energetic masking potential of speech, which predicted that both native and non-native Lombard speech could withstand greater amounts of masking to a similar extent, compared to plain speech. In an intelligibility experiment, native English, Spanish, and Dutch listeners listened to the same words, mixed with noise. While the non-native listeners appeared to benefit more from Lombard speech than the native listeners did, each listener group experienced a similar benefit for native and non-native Lombard speech. Energetic masking, as captured by the glimpsing metric, only accounted for part of the Lombard benefit, indicating that the Lombard intelligibility benefit does not only result from a shift in spectral distribution. Despite subtle native language influences on non-native Lombard speech, both native and non-native speech provides a Lombard benefit.
Reconsidering classic ideas in speech communication
Winn, Matthew B; Wright, Richard A; Tucker, Benjamin V
The Journal of the Acoustical Society of America, 03/2023, Volume 153, Issue 3
Journal Article. Peer reviewed. Open access.
The papers in this special issue provide a critical look at some historical ideas that have had an influence on research and teaching in the field of speech communication. They also address widely used methodologies or address long-standing methodological challenges in the areas of speech perception and speech production. The goal is to reconsider and evaluate the need for caution or replacement of historical ideas with more modern results and methods. The contributions provide respectful historical context to the classic ideas, as well as new original research or discussion that clarifies the limitations of the original ideas.
Variability is perhaps the most notable characteristic of speech, and it is particularly noticeable in spontaneous conversational speech. The current research examines how speakers realize the American English stops /p, k, b, g/ and flaps (ɾ from /t, d/), in casual conversation and in careful speech. Target consonants appear after stressed syllables (e.g., "lobby") or between unstressed syllables (e.g., "humanity"), in one of six segmental/word-boundary environments. This work documents the degree and types of variability listeners encounter and must parse. Findings show greater reduction in connected and spontaneous speech, greater reduction in high frequency phrases (but not within high frequency words), and greater reduction between unstressed syllables than after a stress. Although highly reduced productions of stops and flaps occur often, with approximant-like tokens even in careful speech, reduction does not lead to a large amount of overlap between phonological categories. Approximant-like realizations of expected stops and flaps in some conditions constitute the majority of tokens. This shows that reduced speech is something that listeners encounter, and must perceive, in a large proportion of the speech they hear.
We present a series of computational simulations of the auditory lexical decision task using the jTRACE and TISK models of spoken word recognition. Simulation 1 replicates high accuracy in word recognition and similar performance of these models using the small, default dictionary. Simulation 2 expands the set of words and phonemes, leading to issues in representing certain phonemes in jTRACE. Simulation 3 expands the lexicon of competitors and we find that TISK struggles to select the target word as the winner. Finally, Simulation 4 shows that the decision criteria employed lead to many false positives when pseudowords are presented to the model. None of the model estimates of the time cycle when the winner should be selected predicted participant response latency in the auditory lexical decision task. We discuss these findings and offer suggestions as to what a contemporary model of spoken word recognition should be able to do.
Using phonological neighborhood density has been a common method to quantify lexical competition. It is useful and convenient but has shortcomings that are worth reconsidering. The present study quantifies the effects of lexical competition during spoken word recognition using acoustic distance and acoustic absement rather than phonological neighborhood density. A word's lexical competition is indicated by what is termed its acoustic distinctiveness, which is taken as its average acoustic absement to all words in the lexicon. A variety of acoustic representations for items in the lexicon are analyzed. Statistical modeling shows that acoustic distinctiveness has an effect trend similar to that of phonological neighborhood density. Additionally, acoustic distinctiveness consistently improves model fit more than phonological neighborhood density regardless of which kind of acoustic representation is used. However, acoustic distinctiveness does not seem to explain all of the same things as phonological neighborhood density. The different areas that these two predictors explain are discussed in addition to the potential theoretical implications of the usefulness of acoustic distinctiveness in the models. The present paper concludes with some reasons why a researcher may want to use acoustic distinctiveness over phonological neighborhood density in future experiments.
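As a rough illustration of the averaging step described in this abstract, the sketch below scores each word by its mean distance to every other word in a lexicon. It is only a sketch under stated assumptions: the Euclidean distance over fixed-length feature vectors is a placeholder and is not the acoustic absement measure used in the study, the toy lexicon and all function names are hypothetical, and excluding the word itself from its own average is an assumption the abstract does not confirm.

```python
import math


def euclidean(a, b):
    """Placeholder distance between two equal-length feature vectors.

    Assumption: this stands in for the study's acoustic absement measure,
    which the abstract names but does not define.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def acoustic_distinctiveness(lexicon, distance=euclidean):
    """Map each word to its mean distance to every other word in the lexicon."""
    scores = {}
    for word, features in lexicon.items():
        # Assumption: a word is not compared with itself.
        others = [distance(features, f) for w, f in lexicon.items() if w != word]
        scores[word] = sum(others) / len(others) if others else 0.0
    return scores


# Hypothetical toy lexicon: each word is a made-up 2-dimensional feature vector.
toy_lexicon = {"cat": [0.0, 0.0], "cap": [3.0, 4.0], "dog": [6.0, 8.0]}
scores = acoustic_distinctiveness(toy_lexicon)
```

Passing a different `distance` function would let the same averaging step be reused with another representation; the choice of distance is where a measure such as absement would plug in.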
The Massive Auditory Lexical Decision (MALD) database is an end-to-end, freely available auditory and production data set for speech and psycholinguistic research, providing time-aligned stimulus recordings for 26,793 words and 9592 pseudowords, and response data for 227,179 auditory lexical decisions from 231 unique monolingual English listeners. In addition to the experimental data, we provide many precompiled listener- and item-level descriptor variables. This data set makes it easy to explore responses, build and test theories, and compare a wide range of models. We present summary statistics and analyses.
Studies have shown that the voice onset time (VOT) of alveolo-palatal affricates is the longest, followed by velars, dental/alveolars, and bilabials. In a reciprocal pattern, closure duration is the longest for bilabials, followed by dental/alveolars, and then velars. Longer VOT is also associated with high and front vowels and tones with rising components. Moreover, the VOT of voiceless unaspirated stops is reported to be longer and closure duration shorter in nasal words. Finally, the voiceless interval has been described as constant in some languages and inconstant in others. Given the evidence of previous research, this study investigates the effects of place, nasality, tone, and vowel quality on the VOT, closure duration, and voiceless interval of the voiced and voiceless obstruents of Northern Pwo Karen (N. Pwo), a language of Thailand. N. Pwo (ISO 639-3 pww) is a ‘true voicing’ language with a three-way distinction in stops, voiceless aspirated and unaspirated affricates, oral and nasal vowels, and six tones (four modal tones and two glottalized tones). In N. Pwo, the place effects on VOT and closure duration pattern reciprocally. By contrast, both VOT and the voiceless interval are longer before oral vowels than before nasal vowels. VOT is longest before the mid tone, which has a slight rise, and shortest before the falling-glottalized tone. This pattern is reversed for the closure duration of aspirates and voiced stops. Finally, VOT, closure duration, and the voiceless interval are the longest before high and front vowels.