Similar Literature
A total of 20 similar documents were retrieved.
1.
The importance of visual cues in speech perception is illustrated by the McGurk effect, whereby a speaker’s facial movements affect speech perception. The goal of the present study was to evaluate whether the McGurk effect is also observed for sung syllables. Participants heard and saw sung instances of the syllables /ba/ and /ga/ and then judged the syllable they perceived. Audio-visual stimuli were congruent or incongruent (e.g., auditory /ba/ presented with visual /ga/). The stimuli were presented as spoken, sung in an ascending and descending triad (C E G G E C), and sung in an ascending and descending triad that returned to a semitone above the tonic (C E G G E C#). Results revealed no differences in the proportion of fusion responses between spoken and sung conditions, confirming that cross-modal phonemic information is integrated similarly in speech and song.
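As a point of reference for how "fusion responses" are usually quantified in designs like this, the minimal Python sketch below tallies the proportion of fusion percepts (e.g., /da/ reported for auditory /ba/ dubbed onto visual /ga/) separately for each presentation condition. The trial records, condition labels, and fusion criterion are hypothetical placeholders, not the authors' data or analysis code.

```python
# Minimal sketch: proportion of McGurk fusion responses per presentation condition.
# Trial records and the fusion criterion (/da/ reported for audio /ba/ + visual /ga/)
# are hypothetical and only illustrate the kind of tally the abstract describes.
from collections import defaultdict

trials = [
    # (condition, auditory, visual, response)
    ("spoken", "ba", "ga", "da"),
    ("spoken", "ba", "ga", "ba"),
    ("sung_triad", "ba", "ga", "da"),
    ("sung_triad", "ba", "ga", "da"),
]

counts = defaultdict(lambda: [0, 0])  # condition -> [fusion responses, incongruent trials]
for condition, aud, vis, resp in trials:
    if aud == "ba" and vis == "ga":       # incongruent (McGurk) trials only
        counts[condition][1] += 1
        if resp == "da":                  # classic fusion percept
            counts[condition][0] += 1

for condition, (fusion, total) in counts.items():
    print(f"{condition}: {fusion / total:.2f} fusion responses")
```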

2.
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners’ auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants’ susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners’ McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
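The individual-differences question posed in this abstract reduces to correlating two per-listener scores: McGurk susceptibility and the audiovisual benefit for sentences in noise. The sketch below shows that correlation computed on hypothetical per-participant values; the variable names and numbers are placeholders, not the study's data.

```python
# Minimal sketch: does McGurk susceptibility predict audiovisual benefit in noise?
# Both per-participant score arrays are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

mcgurk_susceptibility = np.array([0.10, 0.35, 0.60, 0.25, 0.80, 0.45])  # fusion rate per listener
av_benefit_in_noise = np.array([0.12, 0.20, 0.15, 0.22, 0.18, 0.25])    # AV minus audio-only accuracy

r, p = pearsonr(mcgurk_susceptibility, av_benefit_in_noise)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
# A null correlation here would mirror the abstract's conclusion that the two
# measures may not index the same integration mechanism.
```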

3.
Research has shown that auditory speech recognition is influenced by the appearance of a talker's face, but the actual nature of this visual information has yet to be established. Here, we report three experiments that investigated visual and audiovisual speech recognition using color, gray-scale, and point-light talking faces (which allowed comparison with the influence of isolated kinematic information). Auditory and visual forms of the syllables /ba/, /bi/, /ga/, /gi/, /va/, and /vi/ were used to produce auditory, visual, congruent, and incongruent audiovisual speech stimuli. Visual speech identification and visual influences on identifying the auditory components of congruent and incongruent audiovisual speech were identical for color and gray-scale faces and were much greater than for point-light faces. These results indicate that luminance, rather than color, underlies visual and audiovisual speech perception and that this information is more than the kinematic information provided by point-light faces. Implications for processing visual and audiovisual speech are discussed.

4.
Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we examined perception of both audiovisually congruent and audiovisually incongruent speech in school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults. In the first experiment, all participants watched videos of a talker articulating syllables 'ba', 'da', and 'ga' under three conditions – audiovisual (AV), auditory only (A), and visual only (V). The amplitude of the N1 (but not of the P2) event-related component elicited in the AV condition was significantly reduced compared to the N1 amplitude measured from the sum of the A and V conditions in all groups of participants. Because N1 attenuation to AV speech is thought to index the degree to which facial movements predict the onset of the auditory signal, our findings suggest that this aspect of audiovisual speech perception is mature by mid-childhood and is normal in the H-SLI children. In the second experiment, participants watched videos of audiovisually incongruent syllables created to elicit the so-called McGurk illusion (with an auditory 'pa' dubbed onto a visual articulation of 'ka', the expected percept being 'ta' if audiovisual integration took place). As a group, H-SLI children were significantly more likely than either TD children or adults to hear the McGurk syllable as 'pa' (in agreement with its auditory component) than as 'ka' (in agreement with its visual component), suggesting that susceptibility to the McGurk illusion is reduced in at least some children with a history of SLI. Taken together, the results of the two experiments argue against a global audiovisual integration impairment in children with a history of SLI and suggest that, when present, audiovisual integration difficulties in this population likely stem from a later (non-sensory) stage of processing.

5.
The “McGurk effect” demonstrates that visual (lip-read) information is used during speech perception even when it is discrepant with auditory information. While this has been established as a robust effect in subjects from Western cultures, our own earlier results had suggested that Japanese subjects use visual information much less than American subjects do (Sekiyama & Tohkura, 1993). The present study examined whether Chinese subjects would also show a reduced McGurk effect due to their cultural similarities with the Japanese. The subjects were 14 native speakers of Chinese living in Japan. Stimuli consisted of 10 syllables (/ba/, /pa/, /ma/, /wa/, /da/, /ta/, /na/, /ga/, /ka/, /ra/) pronounced by two speakers, one Japanese and one American. Each auditory syllable was dubbed onto every visual syllable within one speaker, resulting in 100 audiovisual stimuli in each language. The subjects’ main task was to report what they thought they had heard while watching and listening to the speaker as the stimuli were uttered. Compared with previous results obtained with American subjects, the Chinese subjects showed a weaker McGurk effect. The results also showed that the magnitude of the McGurk effect depends on the length of time the Chinese subjects had lived in Japan. Factors that foster and alter the Chinese subjects’ reliance on auditory information are discussed.

6.
In the McGurk effect, perception of audiovisually discrepant syllables can depend on auditory, visual, or a combination of audiovisual information. Under some conditions, visual information can override auditory information to the extent that identification judgments of a visually influenced syllable can be as consistent as for an analogous audiovisually compatible syllable. This might indicate that visually influenced and analogous audiovisually compatible syllables are phonetically equivalent. Experiments were designed to test this issue using a compelling visually influenced syllable in an AXB matching paradigm. Subjects were asked to match an audio syllable /va/ either to an audiovisually consistent syllable (audio /va/-video /fa/) or an audiovisually discrepant syllable (audio /ba/-video /fa/). It was hypothesized that if the two audiovisual syllables were phonetically equivalent, then subjects should choose them equally often in the matching task. Results show, however, that subjects are more likely to match the audio /va/ to the audiovisually consistent /va/, suggesting differences in phonetic convincingness. Additional experiments further suggest that this preference is not based on a phonetically extraneous dimension or on noticeable relative audiovisual discrepancies.

7.
McCotter MV, Jordan TR. Perception, 2003, 32(8): 921-936.
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.

8.
The performance of 14 poor readers on an audiovisual speech perception task was compared with that of 14 normal subjects matched on chronological age (CA) and 14 subjects matched on reading age (RA). The task consisted of identifying synthetic speech varying in place of articulation on an acoustic 9-point continuum between /ba/ and /da/ (Massaro & Cohen, 1983). The acoustic speech events were factorially combined with the visual articulation of /ba/, /da/, or none. In addition, the visual-only articulation of /ba/ or /da/ was presented. The results showed that (1) poor readers were less categorical than the CA and RA groups in the identification of the auditory speech events and (2) they were worse at speechreading. This convergence between the deficits clearly suggests that the auditory speech-processing difficulty of poor readers is speech specific and relates to the processing of phonological information.
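A common way to quantify how categorical identification is along such a 9-point /ba/–/da/ continuum is to fit a logistic function to the proportion of /da/ responses at each step and compare the fitted slopes between groups (a shallower slope corresponding to less categorical identification). The sketch below illustrates one such fit on hypothetical proportions; it is not the analysis used in the study.

```python
# Minimal sketch: logistic fit to identification along a 9-step /ba/-/da/ continuum.
# The response proportions are hypothetical; a shallower fitted slope would
# correspond to the "less categorical" identification described in the abstract.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

steps = np.arange(1, 10)                                    # continuum steps 1..9
p_da = np.array([0.02, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.96, 0.99])

(boundary, slope), _ = curve_fit(logistic, steps, p_da, p0=[5.0, 1.0])
print(f"category boundary ~ step {boundary:.1f}, slope = {slope:.2f}")
```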

9.
In this study, the nature of speech perception of native Mandarin Chinese speakers was compared with that of American English speakers, using synthetic visual and auditory continua (from /ba/ to /da/) in an expanded factorial design. In Experiment 1, speakers identified synthetic unimodal and bimodal speech syllables as either /ba/ or /da/. In Experiment 2, Mandarin speakers were given nine possible response alternatives. Syllable identification was influenced by both visual and auditory sources of information for both Mandarin and English speakers. Performance was better described by the fuzzy logical model of perception than by an auditory dominance model or a weighted-averaging model. Overall, the results are consistent with the idea that although there may be differences in information (which reflect differences in phonemic repertoires, phonetic realizations of the syllables, and the phonotactic constraints of languages), the underlying nature of audiovisual speech processing is similar across languages.
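For readers unfamiliar with the models being compared here, the two-alternative form of the fuzzy logical model of perception combines auditory and visual support for a response multiplicatively and normalizes over the alternatives. The expression below is the standard textbook formulation (after Massaro); the symbols $a_i$ and $v_j$ (degree of auditory and visual support for /da/ at stimulus levels $i$ and $j$) are introduced here for illustration and do not come from the abstract.

$$P(\text{/da/} \mid A_i, V_j) = \frac{a_i\, v_j}{a_i\, v_j + (1 - a_i)(1 - v_j)}$$

A weighted-averaging model would instead combine the same supports additively, e.g. $w\, a_i + (1 - w)\, v_j$, which is the contrast the abstract evaluates.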

10.
Vatakis A, Spence C. Perception, 2008, 37(1): 143-160.
Research has shown that inversion is more detrimental to the perception of faces than to the perception of other types of visual stimuli. Inverting a face results in an impairment of configural information processing that leads to slowed early face processing and reduced accuracy when performance is tested in face recognition tasks. We investigated the effects of inverting speech and non-speech stimuli on audiovisual temporal perception. Upright and inverted audiovisual video clips of a person uttering syllables (experiments 1 and 2), playing musical notes on a piano (experiment 3), or a rhesus monkey producing vocalisations (experiment 4) were presented. Participants made unspeeded temporal-order judgments regarding which modality stream (auditory or visual) appeared to have been presented first. Inverting the visual stream did not have any effect on the sensitivity of temporal discrimination responses in any of the four experiments, thus implying that audiovisual temporal integration is resilient to the effects of orientation in the picture plane. By contrast, the point of subjective simultaneity differed significantly as a function of orientation only for the audiovisual speech stimuli, not for the non-speech stimuli or monkey calls. That is, smaller auditory leads were required for the inverted than for the upright-visual speech stimuli. These results are consistent with the longer processing latencies reported previously when human faces are inverted and demonstrate that the temporal perception of dynamic audiovisual speech can be modulated by changes in the physical properties of the visual speech (i.e., by changes in orientation).
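The two dependent measures described here, sensitivity of temporal discrimination and the point of subjective simultaneity (PSS), are conventionally estimated by fitting a cumulative Gaussian to the proportion of "visual first" responses as a function of audiovisual asynchrony. The sketch below shows such a fit on hypothetical stimulus onset asynchronies and response proportions; it is illustrative only and not the authors' analysis.

```python
# Minimal sketch: estimating PSS and discrimination sensitivity from
# temporal-order-judgment data. SOAs (ms; positive = visual stream leads) and
# the proportions of "visual first" responses are hypothetical.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cum_gauss(soa, pss, sigma):
    return norm.cdf(soa, loc=pss, scale=sigma)

soas = np.array([-240, -120, -60, 0, 60, 120, 240])          # audiovisual asynchrony (ms)
p_visual_first = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.98])

(pss, sigma), _ = curve_fit(cum_gauss, soas, p_visual_first, p0=[0.0, 80.0])
print(f"PSS ~ {pss:.0f} ms, sensitivity (sigma) ~ {sigma:.0f} ms")
# A shift in PSS with face inversion, with sigma unchanged, is the pattern the
# abstract reports for audiovisual speech.
```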

11.
The visible movement of a talker's face is an influential component of speech perception. However, the ability of this influence to function when large areas of the face (~50%) are covered by simple substantial occlusions, and so are not visible to the observer, has yet to be fully determined. In Experiment 1, both visual speech identification and the influence of visual speech on identifying congruent and incongruent auditory speech were investigated using displays of a whole (unoccluded) talking face and of the same face occluded vertically so that the entire left or right hemiface was covered. Both the identification of visual speech and its influence on auditory speech perception were identical across all three face displays. Experiment 2 replicated and extended these results, showing that visual and audiovisual speech perception also functioned well with other simple substantial occlusions (horizontal and diagonal). Indeed, displays in which entire upper facial areas were occluded produced performance levels equal to those obtained with unoccluded displays. Occluding entire lower facial areas elicited some impairments in performance, but visual speech perception and visual speech influences on auditory speech perception were still apparent. Finally, implications of these findings for understanding the processes supporting visual and audiovisual speech perception are discussed.

12.
Perception of visual speech and the influence of visual speech on auditory speech perception are affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.
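The "decreasing spatial frequency (blurring)" manipulation described in this abstract is, in essence, low-pass filtering: it removes fine featural detail while leaving coarse configural structure intact. The sketch below shows one conventional way to produce such a stimulus frame with a Gaussian blur; the file name, image format, and blur width are hypothetical, and this is not the authors' stimulus-generation procedure.

```python
# Minimal sketch: low-pass filtering a talking-face frame to remove fine
# featural detail (cf. the blurring manipulation described in the abstract).
# The input file (assumed H x W x 3 RGB) and the sigma value are placeholders.
import imageio.v3 as iio
from scipy.ndimage import gaussian_filter

frame = iio.imread("talker_frame.png")                              # hypothetical video frame
blurred = gaussian_filter(frame.astype(float), sigma=(8, 8, 0))     # blur space, not colour channels
iio.imwrite("talker_frame_blurred.png", blurred.clip(0, 255).astype("uint8"))
```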

13.
In the McGurk effect, perception of audiovisually discrepant syllables can depend on auditory, visual, or a combination of audiovisual information. Under some conditions, visual information can override auditory information to the extent that identification judgments of a visually influenced syllable can be as consistent as for an analogous audiovisually compatible syllable. This might indicate that visually influenced and analogous audiovisually compatible syllables are phonetically equivalent. Experiments were designed to test this issue using a compelling visually influenced syllable in an AXB matching paradigm. Subjects were asked to match an audio syllable /va/ either to an audiovisually consistent syllable (audio /va/-video /fa/) or an audiovisually discrepant syllable (audio /ba/-video /fa/). It was hypothesized that if the two audiovisual syllables were phonetically equivalent, then subjects should choose them equally often in the matching task. Results show, however, that subjects are more likely to match the audio /va/ to the audiovisually consistent /va/, suggesting differences in phonetic convincingness. Additional experiments further suggest that this preference is not based on a phonetically extraneous dimension or on noticeable relative audiovisual discrepancies.

14.
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

15.
Infant perception often deals with audiovisual speech input, and a first step in processing this input is to perceive both the visual and the auditory information. The speech directed to infants has special characteristics and may enhance visual aspects of speech. The current study was designed to explore the impact of visual enhancement in infant-directed speech (IDS) on audiovisual mismatch detection in a naturalistic setting. Twenty infants participated in an experiment with a visual fixation task conducted in participants’ homes. Stimuli consisted of IDS and adult-directed speech (ADS) syllables with a plosive and the vowel /a:/, /i:/, or /u:/. These were either audiovisually congruent or incongruent. Infants looked longer at incongruent than congruent syllables and longer at IDS than ADS syllables, indicating that IDS and incongruent stimuli contain cues that can make audiovisual perception challenging and thereby attract infants’ gaze.

16.
Speech perception is audiovisual, as demonstrated by the McGurk effect in which discrepant visual speech alters the auditory speech percept. We studied the role of visual attention in audiovisual speech perception by measuring the McGurk effect in two conditions. In the baseline condition, attention was focused on the talking face. In the distracted attention condition, subjects ignored the face and attended to a visual distractor, which was a leaf moving across the face. The McGurk effect was weaker in the latter condition, indicating that visual attention modulated audiovisual speech perception. This modulation may occur at an early, unisensory processing stage, or it may be due to changes at the stage where auditory and visual information is integrated. We investigated this issue by conventional statistical testing, and by fitting the Fuzzy Logical Model of Perception (Massaro, 1998) to the results. The two methods suggested different interpretations, revealing a paradox in the current methods of analysis.
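Fitting the Fuzzy Logical Model of Perception to identification data amounts to finding auditory and visual support parameters that minimize the discrepancy between predicted and observed response proportions (see the model equation under entry 9 above). The sketch below performs such a fit by least squares for a toy 2 (auditory) x 2 (visual) design with hypothetical observed proportions; it illustrates the general procedure only and is not the model-fitting code used in this study.

```python
# Minimal sketch: least-squares fit of a two-alternative FLMP to hypothetical
# /da/-response proportions from a 2 (auditory) x 2 (visual) design.
import numpy as np
from scipy.optimize import minimize

observed = np.array([[0.10, 0.55],     # rows: auditory /ba/, /da/ (hypothetical)
                     [0.60, 0.95]])    # cols: visual  /ba/, /da/

def flmp_predictions(params):
    a = np.clip(params[:2], 1e-6, 1 - 1e-6)   # auditory support for /da/
    v = np.clip(params[2:], 1e-6, 1 - 1e-6)   # visual support for /da/
    num = np.outer(a, v)
    return num / (num + np.outer(1 - a, 1 - v))

def sse(params):
    return np.sum((flmp_predictions(params) - observed) ** 2)

fit = minimize(sse, x0=[0.3, 0.7, 0.3, 0.7], method="Nelder-Mead")
print("fitted supports:", np.round(fit.x, 2))
print("RMSE:", np.sqrt(sse(fit.x) / observed.size).round(3))
```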

17.
Sætrevik, B. (2010). The influence of visual information on auditory lateralization. Scandinavian Journal of Psychology. The classic McGurk study showed that presentation of one syllable in the visual modality simultaneous with a different syllable in the auditory modality creates the perception of a third, not presented syllable. The current study presented dichotic syllable pairs (one in each ear) simultaneously with video clips of a mouth pronouncing the syllables from one of the ears, or pronouncing a syllable that was not part of the dichotic pair. When asked to report the auditory stimuli, responses were shifted towards selecting the auditory stimulus from the side that matched the visual stimulus.

18.
Three experiments are reported on the influence of different timing relations on the McGurk effect. In the first experiment, it is shown that strict temporal synchrony between auditory and visual speech stimuli is not required for the McGurk effect. Subjects were strongly influenced by the visual stimuli when the auditory stimuli lagged the visual stimuli by as much as 180 msec. In addition, a stronger McGurk effect was found when the visual and auditory vowels matched. In the second experiment, we paired auditory and visual speech stimuli produced under different speaking conditions (fast, normal, clear). The results showed that the manipulations in both the visual and auditory speaking conditions independently influenced perception. In addition, there was a small but reliable tendency for the better-matched stimuli to elicit more McGurk responses than the unmatched conditions. In the third experiment, we combined auditory and visual stimuli produced under different speaking conditions (fast, clear) and delayed the acoustics with respect to the visual stimuli. The subjects showed the same pattern of results as in the second experiment. Finally, the delay did not cause different patterns of results for the different audiovisual speaking-style combinations. The results suggest that perceivers may be sensitive to the concordance of the time-varying aspects of speech, but they do not require temporal coincidence of that information.

19.
Three experiments examined whether image manipulations known to disrupt face perception also disrupt visual speech perception. Research has shown that an upright face with an inverted mouth looks strikingly grotesque, whereas an inverted face and an inverted face containing an upright mouth look relatively normal. The current study examined whether a similar sensitivity to upright facial context plays a role in visual speech perception. Visual and audiovisual syllable identification tasks were tested under 4 presentation conditions: upright face-upright mouth, inverted face-inverted mouth, inverted face-upright mouth, and upright face-inverted mouth. Results revealed that, for some visual syllables, only the upright face-inverted mouth image disrupted identification. These results suggest that upright facial context can play a role in visual speech perception. A follow-up experiment testing isolated mouths supported this conclusion.

