Background:
The visual speech signal can provide sufficient information to support successful communication. However, individual differences in the ability to appreciate that information are large, and relatively little is known about their sources.
Purpose:
Here a body of research regarding the development of a theoretical framework for studying speechreading and individual differences in that ability is reviewed. Based on the hypothesis that visual speech is processed via the same perceptual-cognitive machinery as auditory speech, the framework was developed by adapting one originally proposed for auditory spoken word recognition.
Conclusion:
The evidence to date is consistent with the conclusion that visual spoken word recognition is achieved via a process similar to auditory word recognition, provided differences in perceptual similarity are taken into account. Words that are perceptually similar to many other words and that occur infrequently in the input stream are at a distinct disadvantage within this process. The results to date are also consistent with the conclusion that deaf individuals, regardless of speechreading ability, recognize spoken words via a process similar to that of individuals with hearing.
A new pneumatic tactile stimulator, called the TAC-Cell, was developed in our laboratory to non-invasively deliver patterned cutaneous stimulation to the face and hand in order to study the neuromagnetic response adaptation patterns within the primary somatosensory cortex (S1) in young adult humans. Individual TAC-Cells were positioned on the glabrous surface of the right hand and the midline of the upper and lower lip vermilion. A 151-channel magnetoencephalography (MEG) scanner was used to record the cortical response to a novel tactile stimulus that consisted of a repeating 6-pulse train delivered at three different frequencies through the active membrane surface of the TAC-Cell. The evoked activity in S1 (contralateral for hand stimulation, and bilateral for lip stimulation) was characterized from the best-fit dipoles of the earliest prominent response component. The S1 responses showed significant modulation and adaptation as a function of the frequency of the punctate pneumatic stimulus trains and the stimulus site (glabrous lip versus glabrous hand).
The neighborhood activation model (NAM; P. A. Luce & Pisoni, 1998) of spoken word recognition was applied to the problem of predicting accuracy of visual spoken word identification. One hundred fifty-three spoken consonant-vowel-consonant words were identified by a group of 12 college-educated adults with normal hearing and a group of 12 college-educated deaf adults. In both groups, item identification accuracy was correlated with the computed NAM output values. Analysis of subsets of the stimulus set demonstrated that when stimulus intelligibility was controlled, words with fewer neighbors were easier to identify than words with many neighbors. However, when neighborhood density was controlled, variation in segmental intelligibility was minimally related to identification accuracy. The present study provides evidence of a common spoken word recognition system for both auditory and visual speech that retains sensitivity to the phonetic properties of the input.
This study examines relationships between external face movements, tongue movements, and speech acoustics for consonant-vowel (CV) syllables and sentences spoken by two male and two female talkers with different visual intelligibility ratings. The questions addressed are how relationships among measures vary by syllable, whether talkers who are more intelligible produce greater optical evidence of tongue movements, and how the results for CVs compare with those for sentences. Results show that the prediction of one data stream from another is better for C/a/ syllables than for C/i/ and C/u/ syllables. Across the different places of articulation, lingual places result in better predictions of one data stream from another than do bilabial and glottal places. Results vary from talker to talker; interestingly, highly rated intelligibility does not result in high predictions. In general, predictions for CV syllables are better than those for sentences.
This study investigated effects of short-term training/practice on group and individual differences in deaf and hearing speechreaders. In two experiments, participants speechread sentences with feedback during training and without feedback during testing, alternating 10 times over six sessions spanning up to 5 weeks. Testing used sentence sets balanced for expected mean performance. In each experiment, participants were adults who reported good speechreading and either normal hearing (n = 8) or severe to profound hearing impairments (n = 8). The experiments were replicates, except that in one, participants received vibrotactile speech stimuli in addition to visible speech during training, testing whether vibrotactile speech enhances speechreading learning. Results showed that (a) training/practice did not alter the relative performance among individuals or groups; (b) significant learning occurred when training and testing were conducted with speechreading only (although the magnitude of the effect was small); and (c) there was evidence that vibrotactile training depressed rather than raised speechreading scores over the training period.
Probabilistic phonotactics refers to the relative frequencies of segments and sequences of segments in spoken words. Neighborhood density refers to the number of words that are phonologically similar to a given word. Despite a positive correlation between phonotactic probability and neighborhood density, nonsense words with high probability segments and sequences are responded to more quickly than nonsense words with low probability segments and sequences, whereas real words occurring in dense similarity neighborhoods are responded to more slowly than real words occurring in sparse similarity neighborhoods. This contradiction may be resolved by hypothesizing that effects of probabilistic phonotactics have a sublexical focus and that effects of similarity neighborhood density have a lexical focus. The implications of this hypothesis for models of spoken word recognition are discussed.
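Neighborhood density as used in these studies is typically operationalized with the one-phoneme rule: a neighbor is any word formed by substituting, deleting, or adding a single phoneme. A minimal sketch of that computation, using a hypothetical toy lexicon of phoneme-tuple transcriptions (the lexicon and transcriptions are illustrative, not from the studies above):

```python
def one_phoneme_apart(w1, w2):
    """True if w2 can be formed from w1 by substituting, deleting,
    or adding exactly one phoneme (the standard one-phoneme rule)."""
    if w1 == w2:
        return False
    a, b = list(w1), list(w2)
    if len(a) == len(b):  # candidate substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:  # candidate deletion/addition
        if len(a) > len(b):
            a, b = b, a  # make a the shorter word
        # removing one phoneme from b must yield a
        return any(b[:i] + b[i + 1:] == a for i in range(len(b)))
    return False

def neighborhood_density(word, lexicon):
    """Number of lexicon entries within one phoneme of `word`."""
    return sum(one_phoneme_apart(word, w) for w in lexicon)

# Hypothetical toy lexicon ("cat", "bat", "cut", "cap", "ca-", "scat").
lexicon = [("k", "ae", "t"), ("b", "ae", "t"), ("k", "ah", "t"),
           ("k", "ae", "p"), ("k", "ae"), ("s", "k", "ae", "t")]
print(neighborhood_density(("k", "ae", "t"), lexicon))  # prints 5
```

On a real lexicon, each count would additionally be weighted by word frequency to obtain NAM-style neighborhood probabilities; the unweighted count above is the density measure itself.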
Objective: To determine whether congenitally deafened adults achieve improved speech perception when auditory and visual speech information is available after cochlear implantation.
Design: Repeated-measures single-subject analysis of speech perception in visual-alone, auditory-alone, and audiovisual conditions.
Setting: Neurotologic private practice and research institute.
Patients: Eight subjects with profound congenital bilateral hearing loss who underwent cochlear implantation as adults (aged 18-55 years) between 1995 and 2002 and had at least 1 year of experience with the implant.
Main Outcome Measures: Auditory, visual, and audiovisual speech perception.
Results: Median speech perception scores were as follows: visual-alone, 25.9% (range, 12.7-58.1%); auditory-alone, 5.2% (range, 0-49.4%); and audiovisual, 50.7% (range, 16.5-90.8%). Seven of eight subjects did as well as or better in the audiovisual condition than in either the auditory-alone or visual-alone condition. Three subjects had audiovisual scores greater than would be expected from a simple additive effect of the information from the auditory-alone and visual-alone conditions, suggesting a superadditive effect of combining auditory and visual information. Three subjects showed a simple additive effect of speech perception in the audiovisual condition.
Conclusions: Some congenitally deafened subjects who undergo implantation as adults show significant gains in speech perception when auditory information from a cochlear implant and visual information from lipreading are available. This study shows that some congenitally deafened adults are able to integrate auditory information provided by the cochlear implant (despite the lack of auditory speech experience before implantation) with visual speech information.
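The additive-versus-superadditive distinction above can be made concrete with a small sketch. It assumes "simple additive" means the sum of the unimodal percent-correct scores (other benchmarks, such as probability summation, are also used in the literature); the median scores reported above serve as the example input:

```python
def classify_av_gain(vo, ao, av):
    """Classify audiovisual (AV) benefit against a simple additive
    benchmark: the sum of the visual-only (VO) and auditory-only (AO)
    percent-correct scores, capped at 100%. This benchmark choice is
    an assumption; probability summation is a common alternative."""
    additive = min(vo + ao, 100.0)
    if av > additive:
        return "superadditive"
    if av >= max(vo, ao):
        return "additive or better than best unimodal"
    return "no audiovisual benefit"

# Median scores from the results above: VO 25.9%, AO 5.2%, AV 50.7%.
print(classify_av_gain(25.9, 5.2, 50.7))  # prints "superadditive"
```

Here 50.7% exceeds the additive benchmark of 31.1%, matching the superadditive pattern described for three of the subjects.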
The locus coeruleus (LC), the origin of noradrenergic modulation of cognitive and behavioral function, may play an important role in healthy ageing and in neurodegenerative conditions. We investigated the functional significance of age-related differences in mean normalized LC signal intensity values (LC-CR) in magnetization-transfer (MT) images from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) cohort, an open-access, population-based dataset. Using structural equation modelling, we tested the pre-registered hypothesis that putatively noradrenergic (NA)-dependent functions would be more strongly associated with LC-CR in older than in younger adults. A unidimensional model (in which LC-CR related to a single factor representing all cognitive and behavioral measures) fit the data better than the a priori two-factor model (in which LC-CR related to separate NA-dependent and NA-independent factors). Our findings support the concept that age-related reduction of LC structural integrity is associated with impaired cognitive and behavioral function.
Neuromagnetic evoked fields were recorded to compare the adaptation of the primary somatosensory cortex (SI) response to tactile stimuli delivered to the glabrous skin at the fingertips of the first three digits (condition 1) and between the midline upper and lower lips (condition 2). The stimulation paradigm made it possible to characterize response adaptation in the presence of functional integration of tactile stimuli from adjacent skin areas in each condition. At each stimulation site, cutaneous stimuli (50 ms duration) were delivered in three runs, using trains of 6 pulses with regular stimulus onset asynchrony (SOA). The pulses were separated by SOAs of 500 ms, 250 ms, or 125 ms in each run, respectively, while the inter-train interval was fixed (5 s) across runs. The evoked activity in SI (contralateral to the stimulated hand, and bilateral for lip stimulation) was characterized from the best-fit dipoles of the response component peaking around 70 ms for hand stimulation and 8 ms earlier (on average) for lip stimulation. The SOA-dependent long-term adaptation effects were assessed from the change in the amplitude of the responses to the first stimulus in each train. Short-term adaptation was characterized by the lifetime of an exponentially saturating model function fitted to the set of suppression ratios of the second relative to the first SI response in each train. Our results indicate: 1) the presence of a rate-dependent long-term adaptation effect induced only by tactile stimulation of the digits; and 2) shorter recovery lifetimes for the digits compared with the lips.
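The lifetime estimate described above can be sketched as a least-squares fit of an exponentially saturating recovery curve to suppression ratios at the three SOAs. The model form r(SOA) = r_inf * (1 - exp(-SOA/tau)) and the example ratios below are assumptions for illustration; the study's exact parameterization and data are not given here:

```python
import numpy as np

# Three SOAs used in the study (in seconds) and hypothetical
# suppression ratios (2nd/1st response) at each SOA.
soas = np.array([0.125, 0.250, 0.500])
ratios = np.array([0.40, 0.62, 0.78])  # illustrative values only

def fit_lifetime(soas, ratios, taus=np.linspace(0.01, 2.0, 2000)):
    """Grid-search least-squares fit of the recovery lifetime tau for
    r(SOA) = r_inf * (1 - exp(-SOA / tau)). For each candidate tau,
    the optimal r_inf has a closed form (ordinary least squares with
    a single basis function)."""
    best_tau, best_err = taus[0], np.inf
    for tau in taus:
        basis = 1.0 - np.exp(-soas / tau)
        r_inf = (basis @ ratios) / (basis @ basis)
        err = np.sum((ratios - r_inf * basis) ** 2)
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau

tau_hat = fit_lifetime(soas, ratios)
print(f"estimated recovery lifetime ~ {tau_hat:.3f} s")
```

Under the finding above, the fitted tau would come out smaller for digit stimulation than for lip stimulation, i.e. faster recovery from short-term adaptation at the fingertips.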
Purpose: This study investigated the effects of external feedback on perceptual learning of visual speech during lipreading training with sentence stimuli. The goal was to improve visual-only (VO) speech recognition and increase accuracy of audiovisual (AV) speech recognition in noise. The rationale was that spoken word recognition depends on the accuracy of sublexical (phonemic/phonetic) speech perception; effective feedback during training must therefore support sublexical perceptual learning. Method: Normal-hearing (NH) adults were assigned to one of three types of feedback: Sentence feedback was the entire sentence printed after responding to the stimulus. Word feedback was the correct response words and perceptually near but incorrect response words. Consonant feedback was correct response words and consonants in incorrect but perceptually near response words. Six training sessions were given. Pre- and posttraining testing included an untrained control group. Test stimuli were disyllabic nonsense words for forced-choice consonant identification, and isolated words and sentences for open-set identification. Words and sentences were VO, AV, and audio-only (AO), with the audio in speech-shaped noise. Results: Lipreading accuracy increased during training. Pre- and posttraining tests of consonant identification showed no improvement beyond the test-retest increases obtained by untrained controls. Isolated word recognition with a talker not seen during training showed that the control group improved more than the sentence group. Tests of untrained sentences showed that the consonant group significantly improved in all of the stimulus conditions (VO, AO, and AV): its mean words-correct scores increased by 9.2 percentage points for VO, 3.4 percentage points for AO, and 9.8 percentage points for AV stimuli. Conclusions: Consonant feedback during training with sentence stimuli significantly increased perceptual learning. The training generalized to untrained VO, AO, and AV sentence stimuli. Lipreading training thus has the potential to significantly improve adults' face-to-face communication in noisy settings in which the talker can be seen.