Similar Articles
20 similar articles found (search time: 46 ms)
1.
Research has shown that auditory speech recognition is influenced by the appearance of a talker's face, but the actual nature of this visual information has yet to be established. Here, we report three experiments that investigated visual and audiovisual speech recognition using color, gray-scale, and point-light talking faces (which allowed comparison with the influence of isolated kinematic information). Auditory and visual forms of the syllables /ba/, /bi/, /ga/, /gi/, /va/, and /vi/ were used to produce auditory, visual, congruent, and incongruent audiovisual speech stimuli. Visual speech identification and visual influences on identifying the auditory components of congruent and incongruent audiovisual speech were identical for color and gray-scale faces and were much greater than for point-light faces. These results indicate that luminance, rather than color, underlies visual and audiovisual speech perception and that this information is more than the kinematic information provided by point-light faces. Implications for processing visual and audiovisual speech are discussed.

2.
Seeing a talker's face influences auditory speech recognition, but the visible input essential for this influence has yet to be established. Using a new seamless editing technique, the authors examined effects of restricting visible movement to oral or extraoral areas of a talking face. In Experiment 1, visual speech identification and visual influences on identifying auditory speech were compared across displays in which the whole face moved, the oral area moved, or the extraoral area moved. Visual speech influences on auditory speech recognition were substantial and unchanging across whole-face and oral-movement displays. However, extraoral movement also influenced identification of visual and audiovisual speech. Experiments 2 and 3 demonstrated that these results are dependent on intact and upright facial contexts, but only with extraoral movement displays.

3.
Audiovisual integration (AVI) has been demonstrated to play a major role in speech comprehension. Previous research suggests that AVI in speech comprehension tolerates a temporal window of audiovisual asynchrony. However, few studies have employed audiovisual presentation to investigate AVI in person recognition. Here, participants completed an audiovisual voice familiarity task in which the synchrony of the auditory and visual stimuli was manipulated, and in which the identity of the visual speaker either corresponded or did not correspond to the voice. Recognition of personally familiar voices systematically improved when corresponding visual speakers were presented near synchrony or with slight auditory lag. Moreover, when faces of different familiarity were presented with a voice, recognition accuracy suffered only from near synchrony to slight auditory lag. These results provide the first evidence for a temporal window for AVI in person recognition between approximately 100 ms auditory lead and 300 ms auditory lag.

4.
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners’ auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants’ susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners’ McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
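The individual-differences question in this entry reduces to correlating two per-listener scores: McGurk susceptibility and the visual benefit to sentence recognition in noise. A minimal sketch of such an analysis, assuming hypothetical per-listener scores (the variable names and numbers below are illustrative, not data from the study):

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-listener scores (proportions):
mcgurk_susceptibility = [0.10, 0.45, 0.30, 0.80, 0.25]
# Visual benefit = audiovisual minus audio-only sentence accuracy in noise.
visual_benefit = [0.12, 0.05, 0.20, 0.08, 0.15]

r = pearson_r(mcgurk_susceptibility, visual_benefit)
```

A correlation near zero across listeners, as the abstract reports, would argue against a single shared integration mechanism driving both measures.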

5.
Speech perception is audiovisual, as demonstrated by the McGurk effect in which discrepant visual speech alters the auditory speech percept. We studied the role of visual attention in audiovisual speech perception by measuring the McGurk effect in two conditions. In the baseline condition, attention was focused on the talking face. In the distracted attention condition, subjects ignored the face and attended to a visual distractor, which was a leaf moving across the face. The McGurk effect was weaker in the latter condition, indicating that visual attention modulated audiovisual speech perception. This modulation may occur at an early, unisensory processing stage, or it may be due to changes at the stage where auditory and visual information is integrated. We investigated this issue by conventional statistical testing, and by fitting the Fuzzy Logical Model of Perception (Massaro, 1998) to the results. The two methods suggested different interpretations, revealing a paradox in the current methods of analysis.
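The Fuzzy Logical Model of Perception cited in this entry combines the unimodal support for each response alternative multiplicatively and normalizes over the alternatives. A minimal sketch of that combination rule, with illustrative support values rather than values fitted to the study's data:

```python
def flmp_response_prob(auditory_support, visual_support):
    """Fuzzy Logical Model of Perception combination rule:
    the probability of each response alternative is the product of its
    auditory and visual support values, normalized over all alternatives.

    Both arguments are dicts mapping response alternatives
    (e.g. "ba", "da") to support values in [0, 1].
    """
    combined = {alt: auditory_support[alt] * visual_support[alt]
                for alt in auditory_support}
    total = sum(combined.values())
    return {alt: val / total for alt, val in combined.items()}

# Illustrative values: audio weakly favors /ba/, vision strongly favors /da/,
# so the combined percept leans toward /da/, as in a McGurk-style conflict.
probs = flmp_response_prob({"ba": 0.6, "da": 0.4},
                           {"ba": 0.1, "da": 0.9})
```

Fitting the model to data means estimating the support values that best reproduce observed response proportions across unimodal and bimodal conditions.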

6.
Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and temporal modulations in the 2–7 Hz range are of particular importance. Dyslexic individuals have specific problems in perceiving speech envelope cues. In the current study, we used an audiovisual noise-vocoded speech task to investigate the contribution of low-frequency visual information to intelligibility of 4-channel and 16-channel noise vocoded speech in participants with and without dyslexia. For the 4-channel speech, noise vocoding preserves amplitude information that is entirely congruent with dynamic visual information. All participants were significantly more accurate with 4-channel speech when visual information was present, even when this information was purely spatio-temporal (pixelated stimuli changing in luminance). Possible underlying mechanisms are discussed.

7.
8.
Vatakis, A., & Spence, C. (2008). Perception, 37(1), 143-160.
Research has shown that inversion is more detrimental to the perception of faces than to the perception of other types of visual stimuli. Inverting a face results in an impairment of configural information processing that leads to slowed early face processing and reduced accuracy when performance is tested in face recognition tasks. We investigated the effects of inverting speech and non-speech stimuli on audiovisual temporal perception. Upright and inverted audiovisual video clips of a person uttering syllables (experiments 1 and 2), playing musical notes on a piano (experiment 3), or a rhesus monkey producing vocalisations (experiment 4) were presented. Participants made unspeeded temporal-order judgments regarding which modality stream (auditory or visual) appeared to have been presented first. Inverting the visual stream did not have any effect on the sensitivity of temporal discrimination responses in any of the four experiments, thus implying that audiovisual temporal integration is resilient to the effects of orientation in the picture plane. By contrast, the point of subjective simultaneity differed significantly as a function of orientation only for the audiovisual speech stimuli but not for the non-speech stimuli or monkey calls. That is, smaller auditory leads were required for the inverted than for the upright visual speech stimuli. These results are consistent with the longer processing latencies reported previously when human faces are inverted and demonstrate that the temporal perception of dynamic audiovisual speech can be modulated by changes in the physical properties of the visual speech (i.e., by changes in orientation).

9.
We propose a measure of audiovisual speech integration that takes into account both accuracy and response times. This measure should prove beneficial for researchers investigating multisensory speech recognition, since it applies to both normal-hearing and aging populations. For example, age-related sensory decline influences both the rate at which one processes information and the ability to utilize cues from different sensory modalities. Our function assesses integration when both auditory and visual information are available, by comparing performance on these audiovisual trials with theoretical predictions for performance under the assumptions of parallel, independent self-terminating processing of single-modality inputs. We provide example data from an audiovisual identification experiment and discuss applications for measuring audiovisual integration skills across the life span.
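For accuracy alone, the parallel, independent self-terminating benchmark described in this entry can be sketched as probability summation: a trial is correct if either modality alone would have succeeded, so the independence prediction is P = P_A + P_V − P_A·P_V, and integration can be indexed as observed audiovisual accuracy relative to that prediction. This is a simplified sketch of the benchmark only; the authors' actual measure also incorporates response times, and the function and variable names here are illustrative:

```python
def independent_race_accuracy(p_auditory, p_visual):
    """Predicted audiovisual accuracy under parallel, independent,
    self-terminating processing of the two unimodal inputs:
    correct whenever either modality alone would succeed."""
    return p_auditory + p_visual - p_auditory * p_visual

def integration_index(p_av_observed, p_auditory, p_visual):
    """Positive values: observed audiovisual accuracy exceeds the
    independent-race prediction, suggesting facilitatory integration."""
    return p_av_observed - independent_race_accuracy(p_auditory, p_visual)

# Illustrative numbers only (not data from the study):
pred = independent_race_accuracy(0.60, 0.30)   # prediction = 0.72
gain = integration_index(0.80, 0.60, 0.30)     # observed AV beats prediction
```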

10.
McCotter, M. V., & Jordan, T. R. (2003). Perception, 32(8), 921-936.
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.

11.
Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds which requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150–350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. It also has major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.

12.
The authors investigated the effects of changes in horizontal viewing angle on visual and audiovisual speech recognition in 4 experiments, using a talker's face viewed full face, three quarters, and in profile. When only experimental items were shown (Experiments 1 and 2), identification of unimodal visual speech and visual speech influences on congruent and incongruent auditory speech were unaffected by viewing angle changes. However, when experimental items were intermingled with distractor items (Experiments 3 and 4), identification of unimodal visual speech decreased with profile views, whereas visual speech influences on congruent and incongruent auditory speech remained unaffected by viewing angle changes. These findings indicate that audiovisual speech recognition withstands substantial changes in horizontal viewing angle, but explicit identification of visual speech is less robust. Implications of this distinction for understanding the processes underlying visual and audiovisual speech recognition are discussed.

13.
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

14.
When participants judge multimodal audiovisual stimuli, the auditory information strongly dominates temporal judgments, whereas the visual information dominates spatial judgments. However, temporal judgments are not independent of spatial features. For example, in the kappa effect, the time interval between two marker stimuli appears longer when they originate from spatially distant sources rather than from the same source. We investigated the kappa effect for auditory markers presented with accompanying irrelevant visual stimuli. The spatial sources of the markers were varied such that they were either congruent or incongruent across modalities. In two experiments, we demonstrated that the spatial layout of the visual stimuli affected perceived auditory interval duration. This effect occurred although the visual stimuli were designated to be task-irrelevant for the duration reproduction task in Experiment 1, and even when the visual stimuli did not contain sufficient temporal information to perform a two-interval comparison task in Experiment 2. We conclude that the visual and auditory marker stimuli were integrated into a combined multisensory percept containing temporal as well as task-irrelevant spatial aspects of the stimulation. Through this multisensory integration process, visuospatial information affected even temporal judgments, which are typically dominated by the auditory modality.

15.
Emotional expression and how it is lateralized across the two sides of the face may influence how we detect audiovisual speech. To investigate how these components interact we conducted experiments comparing the perception of sentences expressed with happy, sad, and neutral emotions. In addition we isolated the facial asymmetries for affective and speech processing by independently testing the two sides of a talker's face. These asymmetrical differences were exaggerated using dynamic facial chimeras in which left- or right-face halves were paired with their mirror image during speech production. Results suggest that there are facial asymmetries in audiovisual speech such that the right side of the face and right-facial chimeras supported better speech perception than their left-face counterparts. Affective information was also found to be critical in that happy expressions tended to improve speech performance on both sides of the face relative to all other emotions, whereas sad emotions generally inhibited visual speech information, particularly from the left side of the face. The results suggest that approach information may facilitate visual and auditory speech detection.

16.
Successful communication in everyday life crucially involves the processing of auditory and visual components of speech. Viewing our interlocutor and processing visual components of speech facilitates speech processing by triggering auditory processing. Auditory phoneme processing, analyzed by event-related brain potentials (ERP), has been shown to be associated with impairments in reading and spelling (i.e. developmental dyslexia), but visual aspects of phoneme processing have not been investigated in individuals with such deficits. The present study analyzed the passive visual Mismatch Response (vMMR) in school children with and without developmental dyslexia in response to video-recorded mouth movements pronouncing syllables silently. Our results reveal that both groups of children showed processing of visual speech stimuli, but with different scalp distribution. Children without developmental dyslexia showed a vMMR with typical posterior distribution. In contrast, children with developmental dyslexia showed a vMMR with anterior distribution, which was even more pronounced in children with severe phonological deficits and very low spelling abilities. As anterior scalp distributions are typically reported for auditory speech processing, the anterior vMMR of children with developmental dyslexia might suggest an attempt to anticipate potentially upcoming auditory speech information in order to support phonological processing, which has been shown to be deficient in children with developmental dyslexia.

17.
In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

18.
Perception of visual speech and the influence of visual speech on auditory speech perception is affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.

19.
Hu, Z., Zhang, R., Zhang, Q., Liu, Q., & Li, H. (2012). Brain and Language, 121(1), 70-75.
Previous studies have found a late frontal-central audiovisual interaction during the time period about 150-220 ms post-stimulus. However, it is unclear which process this audiovisual interaction reflects: processing of acoustic features, or classification of stimuli? To investigate this question, event-related potentials were recorded during a word-categorization task with stimuli presented in the auditory-visual modality. In the experiment, congruency of the visual and auditory stimuli was manipulated. Results showed that within the window of about 180-210 ms post-stimulus, more positive values were elicited by category-congruent audiovisual stimuli than by category-incongruent audiovisual stimuli. This indicates that the late frontal-central audiovisual interaction is related to audiovisual integration of semantic category information.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号