Similar Documents
A total of 20 similar documents were retrieved.
1.
Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that message is presented in a noisy background. Speech is a particularly important example of multisensory integration because of its behavioural relevance to humans and also because brain regions have been identified that appear to be specifically tuned for auditory speech and lip gestures. Previous research has suggested that speech stimuli may have an advantage over other types of auditory stimuli in terms of audio-visual integration. Here, we used a modified adaptive psychophysical staircase approach to compare the influence of congruent visual stimuli (brief movie clips) on the detection of noise-masked auditory speech and non-speech stimuli. We found that congruent visual stimuli significantly improved detection of an auditory stimulus relative to incongruent visual stimuli. This effect, however, was equally apparent for speech and non-speech stimuli. The findings suggest that speech stimuli are not specifically advantaged by audio-visual integration for detection at threshold when compared with other naturalistic sounds.
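As a rough illustration of the adaptive staircase logic mentioned above, the sketch below implements a simple 2-down/1-up rule that converges on roughly 70.7% correct detection; the specific rule, step size, and stopping criterion are assumptions for illustration, not the parameters used in the study.

```python
# Minimal sketch of a 2-down/1-up adaptive staircase for estimating an
# auditory detection threshold in noise. Rule, step size, and stopping
# criterion are illustrative assumptions only.
import statistics

class Staircase:
    def __init__(self, start_level=0.0, step=2.0, max_reversals=8):
        self.level = start_level          # stimulus level in dB SNR
        self.step = step                  # step size in dB
        self.max_reversals = max_reversals
        self.correct_streak = 0
        self.reversal_levels = []
        self.last_direction = None        # 'up' or 'down'

    def update(self, correct: bool):
        """Adjust the level after each trial (2-down/1-up tracks ~70.7% correct)."""
        if correct:
            self.correct_streak += 1
            if self.correct_streak == 2:
                self._move('down')        # make the task harder
                self.correct_streak = 0
        else:
            self.correct_streak = 0
            self._move('up')              # make the task easier

    def _move(self, direction):
        if self.last_direction and direction != self.last_direction:
            self.reversal_levels.append(self.level)
        self.last_direction = direction
        self.level += self.step if direction == 'up' else -self.step

    def finished(self):
        return len(self.reversal_levels) >= self.max_reversals

    def threshold(self):
        # Average the last few reversal levels as the threshold estimate.
        return statistics.mean(self.reversal_levels[-6:])
```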

2.
Visual information conveyed by iconic hand gestures and visible speech can enhance speech comprehension under adverse listening conditions for both native and non-native listeners. However, how a listener allocates visual attention to these articulators during speech comprehension is unknown. We used eye-tracking to investigate whether and how native and highly proficient non-native listeners of Dutch allocated overt eye gaze to visible speech and gestures during clear and degraded speech comprehension. Participants watched video clips of an actress uttering a clear or degraded (6-band noise-vocoded) action verb while performing a gesture or not, and were asked to indicate the word they heard in a cued-recall task. Gestural enhancement was the largest (i.e., a relative reduction in reaction time cost) when speech was degraded for all listeners, but it was stronger for native listeners. Both native and non-native listeners mostly gazed at the face during comprehension, but non-native listeners gazed more often at gestures than native listeners. However, only native but not non-native listeners' gaze allocation to gestures predicted gestural benefit during degraded speech comprehension. We conclude that non-native listeners might gaze at gesture more as it might be more challenging for non-native listeners to resolve the degraded auditory cues and couple those cues to phonological information that is conveyed by visible speech. This diminished phonological knowledge might hinder the use of semantic information that is conveyed by gestures for non-native compared to native listeners. Our results demonstrate that the degree of language experience impacts overt visual attention to visual articulators, resulting in different visual benefits for native versus non-native listeners.
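A common way to quantify how gaze is allocated to the face versus the gestures is to compute dwell-time proportions within areas of interest (AOIs). The sketch below shows this computation on synthetic gaze samples; the AOI rectangles, sampling rate, and data are hypothetical and not taken from the study.

```python
# Sketch: dwell-time proportions for face and gesture AOIs from eye-tracking
# samples. AOI boxes, sampling rate, and gaze data are illustrative assumptions.
import numpy as np

fs = 250                                     # eye-tracker sampling rate in Hz (assumed)
aois = {"face": (300, 100, 500, 300),        # (x_min, y_min, x_max, y_max) in pixels
        "gesture": (250, 350, 550, 550)}

rng = np.random.default_rng(4)
gaze_xy = rng.uniform([200, 50], [600, 600], size=(fs * 3, 2))   # 3 s of gaze samples

def dwell_proportion(gaze, box):
    """Proportion of gaze samples falling inside a rectangular AOI."""
    x_min, y_min, x_max, y_max = box
    inside = ((gaze[:, 0] >= x_min) & (gaze[:, 0] <= x_max)
              & (gaze[:, 1] >= y_min) & (gaze[:, 1] <= y_max))
    return inside.mean()

for name, box in aois.items():
    print(f"{name}: {dwell_proportion(gaze_xy, box):.1%} of samples")
```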

3.
McCotter MV, Jordan TR. Perception, 2003, 32(8): 921-936.
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
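One plausible way to construct luminance-inverted and colour-inverted face images is to separate lightness from chroma in CIELAB space and invert each independently; whether this matches the exact manipulation used in the study is an assumption. The sketch below (using scikit-image on a random placeholder image) illustrates the idea.

```python
# Rough sketch of luminance-inverted (LI), colour-inverted (CI), and combined
# (CLI) image manipulations via CIELAB. The interpretation of the manipulations
# and the placeholder image are assumptions for illustration only.
import numpy as np
from skimage import color

rgb = np.random.default_rng(6).random((240, 320, 3))   # stand-in face image
lab = color.rgb2lab(rgb)

li = lab.copy()
li[..., 0] = 100 - li[..., 0]          # invert lightness only (LI)

ci = lab.copy()
ci[..., 1:] *= -1                      # invert chroma only (CI)

cli = lab.copy()
cli[..., 0] = 100 - cli[..., 0]
cli[..., 1:] *= -1                     # invert both (CLI)

li_rgb, ci_rgb, cli_rgb = (np.clip(color.lab2rgb(img), 0, 1) for img in (li, ci, cli))
```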

4.
In musical performance, bodily gestures play an important role in communicating expressive intentions to audiences. Although previous studies have demonstrated that visual information can have an effect on the perceived expressivity of musical performances, the investigation of audiovisual interactions has been held back by the technical difficulties associated with the generation of controlled, mismatching stimuli. With the present study, we aimed to address this issue by utilizing a novel method in order to generate controlled, balanced stimuli that comprised both matching and mismatching bimodal combinations of different expressive intentions. The aim of Experiment 1 was to investigate the relative contributions of auditory and visual kinematic cues in the perceived expressivity of piano performances, and in Experiment 2 we explored possible crossmodal interactions in the perception of auditory and visual expressivity. The results revealed that although both auditory and visual kinematic cues contribute significantly to the perception of overall expressivity, the effect of visual kinematic cues appears to be somewhat stronger. These results also provide preliminary evidence of crossmodal interactions in the perception of auditory and visual expressivity. In certain performance conditions, visual cues had an effect on the ratings of auditory expressivity, and auditory cues had a small effect on the ratings of visual expressivity.

5.
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.

6.
In two experiments, we investigated whether simultaneous speech reading can influence the detection of speech in envelope-matched noise. Subjects attempted to detect the presence of a disyllabic utterance in noise while watching a speaker articulate a matching or a non-matching utterance. Speech detection was not facilitated by an audio-visual match, which suggests that listeners relied on low-level auditory cues whose perception was immune to cross-modal top-down influences. However, when the stimuli were words (Experiment 1), there was a (predicted) relative shift in bias, suggesting that the masking noise itself was perceived as more speechlike when its envelope corresponded to the visual information. This bias shift was absent, however, with non-word materials (Experiment 2). These results, which resemble earlier findings obtained with orthographic visual input, indicate that the mapping from sight to sound is lexically mediated even when, as in the case of the articulatory-phonetic correspondence, the cross-modal relationship is non-arbitrary.
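Detection performance and bias shifts of the kind reported above are typically quantified with signal detection theory: sensitivity d′ and criterion c computed from hit and false-alarm rates. The sketch below shows the computation on hypothetical counts, using a 0.5 correction to avoid infinite z-scores; the numbers are illustrative only.

```python
# Sketch of signal-detection measures: sensitivity (d') and response bias
# (criterion c), computed separately for matching and non-matching visual
# conditions. Trial counts are hypothetical.
from scipy.stats import norm

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    # Add 0.5 to each cell (log-linear correction) to avoid rates of 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
    criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
    return d_prime, criterion

# Hypothetical counts: roughly equal sensitivity but a more liberal criterion
# (lower c, i.e. more "speech present" responses) in the matching condition.
for label, counts in {"match": (34, 16, 20, 30), "mismatch": (28, 22, 14, 36)}.items():
    d, c = sdt_measures(*counts)
    print(f"{label}: d' = {d:.2f}, c = {c:.2f}")
```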

7.
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners’ auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants’ susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners’ McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
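Testing sentence identification across a range of signal-to-noise ratios requires scaling a noise masker relative to the speech before mixing. The sketch below shows one standard way to do this; the signals and SNR values are placeholders rather than the study's materials.

```python
# Sketch: scale a noise masker to a target SNR (in dB) and mix it with speech,
# as is typically done in speech-in-noise testing. Signals are placeholders.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech + noise with the noise scaled to the requested SNR."""
    noise = noise[:len(speech)]
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    return speech + noise * (target_noise_rms / noise_rms)

# Example: generate mixtures across a range of SNRs.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-in for a recorded sentence
noise = rng.standard_normal(16000)
mixtures = {snr: mix_at_snr(speech, noise, snr) for snr in (-12, -8, -4, 0, 4)}
```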

8.
A Review of Research on the Relationship Between Speech and Hand Movements
Speech and hand movements are linked in complex ways. This paper reviews behavioral and neuroscientific findings on the relationship between speech and two types of hand movement: co-speech gestures and grasping movements. The main findings are: (1) meaningful gestures produced alongside speech facilitate speech processing, particularly lexical retrieval; (2) observing grasping movements influences lip movements and acoustic components during speech production; (3) word perception affects the early planning stage of grasping movements; and (4) speech production increases the excitability of the hand motor cortex. The authors therefore argue that the link between speech processing and gesture is reflected not only in overlapping and mutually activating neural pathways, but possibly also in mutual influences on overt behavior.

9.
Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we examined perception of both audiovisually congruent and audiovisually incongruent speech in school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults. In the first experiment, all participants watched videos of a talker articulating syllables ‘ba’, ‘da’, and ‘ga’ under three conditions – audiovisual (AV), auditory only (A), and visual only (V). The amplitude of the N1 (but not of the P2) event-related component elicited in the AV condition was significantly reduced compared to the N1 amplitude measured from the sum of the A and V conditions in all groups of participants. Because N1 attenuation to AV speech is thought to index the degree to which facial movements predict the onset of the auditory signal, our findings suggest that this aspect of audiovisual speech perception is mature by mid-childhood and is normal in the H-SLI children. In the second experiment, participants watched videos of audiovisually incongruent syllables created to elicit the so-called McGurk illusion (with an auditory ‘pa’ dubbed onto a visual articulation of ‘ka’, the expected percept being ‘ta’ if audiovisual integration took place). As a group, H-SLI children were significantly more likely than either TD children or adults to hear the McGurk syllable as ‘pa’ (in agreement with its auditory component) rather than as ‘ka’ (in agreement with its visual component), suggesting that susceptibility to the McGurk illusion is reduced in at least some children with a history of SLI. Taken together, the results of the two experiments argue against a global audiovisual integration impairment in children with a history of SLI and suggest that, when present, audiovisual integration difficulties in this population likely stem from a later (non-sensory) stage of processing.
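The comparison described above follows the additive model commonly used in ERP studies of multisensory integration: the response to audiovisual speech is contrasted with the sum of the unisensory auditory and visual responses. The sketch below illustrates that comparison on random placeholder data; the sampling rate and N1 window are assumptions.

```python
# Sketch of the additive-model test: compare the ERP to audiovisual (AV) speech
# with the sum of the unisensory auditory (A) and visual (V) ERPs, and measure
# N1 attenuation in an assumed 80-120 ms window. Data here are random placeholders.
import numpy as np

fs = 500                                   # samples per second (assumed)
times = np.arange(-0.1, 0.5, 1 / fs)       # epoch from -100 to +500 ms
n_trials = 60

rng = np.random.default_rng(1)
erp_a = rng.standard_normal((n_trials, times.size)).mean(axis=0)
erp_v = rng.standard_normal((n_trials, times.size)).mean(axis=0)
erp_av = rng.standard_normal((n_trials, times.size)).mean(axis=0)

def mean_amplitude(erp, tmin, tmax):
    """Mean amplitude in a latency window (seconds relative to sound onset)."""
    mask = (times >= tmin) & (times <= tmax)
    return erp[mask].mean()

n1_av = mean_amplitude(erp_av, 0.08, 0.12)
n1_sum = mean_amplitude(erp_a + erp_v, 0.08, 0.12)
print(f"N1 attenuation (sum minus AV): {n1_sum - n1_av:.3f} µV")
```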

10.
Perception of visual speech and the influence of visual speech on auditory speech perception is affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.
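The two stimulus manipulations described above can be approximated with a spatial low-pass filter (blurring) and a vertical flip (inversion). The sketch below applies both to a placeholder grey-scale frame; the blur strength is an arbitrary choice, not the value used in the study.

```python
# Sketch of the blurring and inversion manipulations on a single video frame.
# The frame and the blur sigma are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

frame = np.random.default_rng(5).random((480, 640))  # stand-in grey-scale frame

blurred = gaussian_filter(frame, sigma=8)   # low-pass filter removes fine detail
inverted = np.flipud(frame)                 # upside-down face disrupts configuration
blurred_inverted = np.flipud(blurred)       # combined manipulation
```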

11.
The current study addressed the question of whether audiovisual (AV) speech can improve speech perception in older and younger adults in a noisy environment. Event-related potentials (ERPs) were recorded to investigate age-related differences in the processes underlying AV speech perception. Participants performed an object categorization task in three conditions, namely auditory-only (A), visual-only (V), and AV speech. Both age groups revealed an equivalent behavioral AV speech benefit over unisensory trials. ERP analyses revealed an amplitude reduction of the auditory P1 and N1 on AV speech trials relative to the summed unisensory (A + V) response in both age groups. These amplitude reductions are interpreted as an indication of multisensory efficiency, as fewer neural resources were recruited to achieve better performance. Of interest, the observed P1 amplitude reduction was larger in older adults. Younger and older adults also showed an earlier auditory N1 in AV speech relative to A and A + V trials, an effect that was again greater in the older adults. The degree of multisensory latency shift was predicted by basic auditory functioning (i.e., higher hearing thresholds were associated with larger latency shifts) in both age groups. Together, the results show that AV speech processing is not only intact in older adults, but that the facilitation of neural responses occurs earlier, and to a greater extent, in older than in younger adults. Thus, older adults appear to benefit more from additional visual speech cues than younger adults, possibly to compensate for more impoverished unisensory inputs because of sensory aging.

12.
Kim J, Sironic A, Davis C. Perception, 2011, 40(7): 853-862.
Seeing the talker improves the intelligibility of speech degraded by noise (a visual speech benefit). Given that talkers exaggerate spoken articulation in noise, this set of two experiments examined whether the visual speech benefit was greater for speech produced in noise than in quiet. We first examined the extent to which spoken articulation was exaggerated in noise by measuring the motion of face markers as four people uttered 10 sentences either in quiet or in babble-speech noise (these renditions were also filmed). The tracking results showed that articulated motion in speech produced in noise was greater than that produced in quiet and was more highly correlated with speech acoustics. Speech intelligibility was tested in a second experiment using a speech-perception-in-noise task under auditory-visual and auditory-only conditions. The results showed that the visual speech benefit was greater for speech recorded in noise than for speech recorded in quiet. Furthermore, the amount of articulatory movement was related to performance on the perception task, indicating that the enhanced gestures made when speaking in noise function to make speech more intelligible.
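Relating tracked articulatory motion to the speech acoustics can be done by correlating a marker-motion signal with the acoustic amplitude envelope resampled to the motion frame rate. The sketch below shows the idea on synthetic signals; sampling rates and data are assumptions, and a real analysis would load marker coordinates and audio from the recordings.

```python
# Sketch: correlate face-marker motion with the speech amplitude envelope.
# Sampling rates and the synthetic signals are illustrative assumptions.
import numpy as np
from scipy.signal import hilbert, resample

fs_audio, fs_motion = 16000, 60                 # assumed sampling rates
rng = np.random.default_rng(2)
audio = rng.standard_normal(fs_audio * 2)       # 2 s of stand-in audio
marker_y = rng.standard_normal(fs_motion * 2)   # vertical lip-marker position

# Amplitude envelope of the audio, downsampled to the motion frame rate.
envelope = np.abs(hilbert(audio))
envelope = resample(envelope, marker_y.size)

# Frame-to-frame marker speed as a simple index of articulatory motion.
motion = np.abs(np.diff(marker_y, prepend=marker_y[0]))

r = np.corrcoef(envelope, motion)[0, 1]
print(f"correlation between articulatory motion and acoustic envelope: r = {r:.2f}")
```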

13.
Despite spectral and temporal discontinuities in the speech signal, listeners normally report coherent phonetic patterns corresponding to the phonemes of a language that they know. What is the basis for the internal coherence of phonetic segments? According to one account, listeners achieve coherence by extracting and integrating discrete cues; according to another, coherence arises automatically from general principles of auditory form perception; according to a third, listeners perceive speech patterns as coherent because they are the acoustic consequences of coordinated articulatory gestures in a familiar language. We tested these accounts in three experiments by training listeners to hear a continuum of three-tone, modulated sine wave patterns, modeled after a minimal pair contrast between three-formant synthetic speech syllables, either as distorted speech signals carrying a phonetic contrast (speech listeners) or as distorted musical chords carrying a nonspeech auditory contrast (music listeners). The music listeners could neither integrate the sine wave patterns nor perceive their auditory coherence to arrive at consistent, categorical percepts, whereas the speech listeners judged the patterns as speech almost as reliably as the synthetic syllables on which they were modeled. The outcome is consistent with the hypothesis that listeners perceive the phonetic coherence of a speech signal by recognizing acoustic patterns that reflect the coordinated articulatory gestures from which they arose.
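Three-tone sine-wave analogues of speech are typically built by replacing each formant with a single sinusoid whose frequency and amplitude follow the formant track. The sketch below synthesises such a pattern from made-up formant trajectories; the trajectories are illustrative and not the study's stimuli.

```python
# Sketch of three-tone sine-wave synthesis: one sinusoid per formant track,
# with time-varying frequency and amplitude. Formant values are made up.
import numpy as np

fs = 16000
t = np.arange(0, 0.5, 1 / fs)              # 500 ms token

def tone_from_track(freqs, amps):
    """Synthesise a sinusoid whose frequency/amplitude follow a sampled track."""
    freq = np.interp(t, np.linspace(0, t[-1], len(freqs)), freqs)
    amp = np.interp(t, np.linspace(0, t[-1], len(amps)), amps)
    phase = 2 * np.pi * np.cumsum(freq) / fs
    return amp * np.sin(phase)

# Hypothetical F1, F2, F3 trajectories (Hz) for a CV-like syllable.
sws = (tone_from_track([300, 700], [0.6, 1.0])
       + tone_from_track([2200, 1200], [0.4, 0.6])
       + tone_from_track([2900, 2500], [0.2, 0.3]))
sws /= np.max(np.abs(sws))                  # normalise for playback
```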

14.
The visible movement of a talker's face is an influential component of speech perception. However, the ability of this influence to function when large areas of the face (~50%) are covered by simple substantial occlusions, and so are not visible to the observer, has yet to be fully determined. In Experiment 1, both visual speech identification and the influence of visual speech on identifying congruent and incongruent auditory speech were investigated using displays of a whole (unoccluded) talking face and of the same face occluded vertically so that the entire left or right hemiface was covered. Both the identification of visual speech and its influence on auditory speech perception were identical across all three face displays. Experiment 2 replicated and extended these results, showing that visual and audiovisual speech perception also functioned well with other simple substantial occlusions (horizontal and diagonal). Indeed, displays in which entire upper facial areas were occluded produced performance levels equal to those obtained with unoccluded displays. Occluding entire lower facial areas elicited some impairments in performance, but visual speech perception and visual speech influences on auditory speech perception were still apparent. Finally, implications of these findings for understanding the processes supporting visual and audiovisual speech perception are discussed.

15.
In accord with a proposed innate link between speech perception and production (e.g., motor theory), this study provides compelling evidence for the inhibition of stuttering events in people who stutter prior to the initiation of the intended speech act, via both the perception and the production of speech gestures. Stuttering frequency during reading was reduced in 10 adults who stutter by approximately 40% in three of four experimental conditions: (1) following passive audiovisual presentation (i.e., viewing and hearing) of another person producing pseudostuttering (stutter-like syllabic repetitions) and following active shadowing of both (2) pseudostuttered and (3) fluent speech. Stuttering was not inhibited during reading following passive audiovisual presentation of fluent speech. Syllabic repetitions can inhibit stuttering both when produced and when perceived, and we suggest that these elementary stuttering forms may serve as compensatory speech gestures for releasing involuntary stuttering blocks by engaging mirror neuronal systems that are predisposed for fluent gestural imitation.

16.
Infant perception often deals with audiovisual speech input, and a first step in processing this input is to perceive both visual and auditory information. The speech directed to infants has special characteristics and may enhance visual aspects of speech. The current study was designed to explore the impact of visual enhancement in infant-directed speech (IDS) on audiovisual mismatch detection in a naturalistic setting. Twenty infants participated in an experiment with a visual fixation task conducted in participants’ homes. Stimuli consisted of IDS and adult-directed speech (ADS) syllables with a plosive and the vowel /a:/, /i:/ or /u:/. These were either audiovisually congruent or incongruent. Infants looked longer at incongruent than congruent syllables and longer at IDS than ADS syllables, indicating that IDS and incongruent stimuli contain cues that can make audiovisual perception challenging and thereby attract infants’ gaze.

17.
Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and temporal modulations in the 2–7 Hz range are of particular importance. Dyslexic individuals have specific problems in perceiving speech envelope cues. In the current study, we used an audiovisual noise-vocoded speech task to investigate the contribution of low-frequency visual information to intelligibility of 4-channel and 16-channel noise vocoded speech in participants with and without dyslexia. For the 4-channel speech, noise vocoding preserves amplitude information that is entirely congruent with dynamic visual information. All participants were significantly more accurate with 4-channel speech when visual information was present, even when this information was purely spatio-temporal (pixelated stimuli changing in luminance). Possible underlying mechanisms are discussed.
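Noise-vocoded speech of the kind used here is generally produced by splitting the signal into frequency bands, extracting each band's amplitude envelope, and using the envelopes to modulate band-limited noise. The sketch below is a minimal version of that pipeline; filter design, band spacing, and cutoffs are simplifying assumptions.

```python
# Minimal noise-vocoder sketch: band-pass filter the speech into N channels,
# extract each channel's envelope, and modulate band-limited noise with it.
# Filter order, cutoffs, and log band spacing are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=4, f_lo=100, f_hi=7000):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)    # log-spaced band edges
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(speech.size)
    out = np.zeros_like(speech, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfiltfilt(sos, speech)
        envelope = np.abs(hilbert(band))                 # channel envelope
        carrier = sosfiltfilt(sos, noise)                # band-limited noise
        out += envelope * carrier
    return out / np.max(np.abs(out))

# Example: 4-channel vs 16-channel versions of the same (stand-in) signal.
fs = 16000
speech = np.random.default_rng(3).standard_normal(fs)
vocoded_4 = noise_vocode(speech, fs, n_channels=4)
vocoded_16 = noise_vocode(speech, fs, n_channels=16)
```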

18.
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

19.
Cvejic E, Kim J, Davis C. Cognition, 2012, 122(3): 442-453.
Prosody can be expressed not only by modification to the timing, stress and intonation of auditory speech but also by modifying visual speech. Studies have shown that the production of visual cues to prosody is highly variable (both within and across speakers); however, behavioural studies have shown that perceivers can use such visual cues effectively. The latter result suggests that people are sensitive to the type of prosody expressed despite cue variability. The current study investigated the extent to which perceivers can match visual cues to prosody from different speakers and from different face regions. Participants were presented with two pairs of sentences (consisting of the same segmental content) and were required to decide which pair had the same prosody. Experiment 1 tested visual and auditory cues from the same speaker and Experiment 2 from different speakers. Experiment 3 used visual cues from the upper and the lower face of the same talker and Experiment 4 from different speakers. The results showed that perceivers could accurately match prosody even when signals were produced by different speakers. Furthermore, perceivers were able to match the prosodic cues both within and across modalities regardless of the face area presented. This ability to match prosody from very different visual cues suggests that perceivers cope with variation in the production of visual prosody by flexibly mapping specific tokens to abstract prosodic types.

20.
This study investigated whether individual differences in cognitive functions, attentional abilities in particular, were associated with individual differences in the quality of phonological representations, resulting in variability in speech perception and production. To do so, we took advantage of a tone merging phenomenon in Cantonese, and identified three groups of typically developed speakers who could differentiate the two rising tones (high and low rising) in both perception and production [+Per+Pro], only in perception [+Per–Pro], or in neither modality [–Per–Pro]. Perception and production were reflected, respectively, by discrimination sensitivity d′ and by acoustic measures of pitch offset and rise time differences. Event-related potential (ERP) components, namely the mismatch negativity (MMN) and the ERPs to amplitude rise time, were taken to reflect the representations of the acoustic cues of tones. Components of attention and working memory in the auditory and visual modalities were assessed with published test batteries. The results show that individual differences in both perception and production are linked to how listeners encode and represent the acoustic cues (pitch contour and rise time), as reflected by the ERPs. The present study advances previous work by integrating measures of perception, production, attention, and the quality of representation to offer a comprehensive account of the cognitive factors underlying individual differences in speech processing. In particular, it is proposed that domain-general attentional switching affects the quality of perceptual representations of the acoustic cues, giving rise to individual differences in perception and production.
