Similar Articles
20 similar articles found
1.
Studies of the McGurk effect have shown that when discrepant phonetic information is delivered to the auditory and visual modalities, the information is combined into a new percept not originally presented to either modality. In typical experiments, the auditory and visual speech signals are generated by the same talker. The present experiment examined whether a discrepancy in the gender of the talker between the auditory and visual signals would influence the magnitude of the McGurk effect. A male talker's voice was dubbed onto a videotape containing a female talker's face, and vice versa. The gender-incongruent videotapes were compared with gender-congruent videotapes, in which a male talker's voice was dubbed onto a male face and a female talker's voice was dubbed onto a female face. Even though there was a clear incompatibility in talker characteristics between the auditory and visual signals on the incongruent videotapes, the resulting magnitude of the McGurk effect was not significantly different for the incongruent as opposed to the congruent videotapes. The results indicate that the mechanism for integrating speech information from the auditory and the visual modalities is not disrupted by a gender incompatibility even when it is perceptually apparent. The findings are compatible with the theoretical notion that information about voice characteristics of the talker is extracted and used to normalize the speech signal at an early stage of phonetic processing, prior to the integration of the auditory and the visual information.

2.
The McGurk effect, where an incongruent visual syllable influences identification of an auditory syllable, does not always occur, suggesting that perceivers sometimes fail to use relevant visual phonetic information. We tested whether another visual phonetic effect, which involves the influence of visual speaking rate on perceived voicing (Green & Miller, 1985), would occur in instances when the McGurk effect does not. In Experiment 1, we established this visual rate effect using auditory and visual stimuli matching in place of articulation, finding a shift in the voicing boundary along an auditory voice-onset-time continuum with fast versus slow visual speech tokens. In Experiment 2, we used auditory and visual stimuli differing in place of articulation and found a shift in the voicing boundary due to visual rate when the McGurk effect occurred and, more critically, when it did not. The latter finding indicates that phonetically relevant visual information is used in speech perception even when the McGurk effect does not occur, suggesting that the incidence of the McGurk effect underestimates the extent of audio-visual integration.

3.
Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we examined perception of both audiovisually congruent and audiovisually incongruent speech in school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults. In the first experiment, all participants watched videos of a talker articulating syllables ‘ba’, ‘da’, and ‘ga’ under three conditions – audiovisual (AV), auditory only (A), and visual only (V). The amplitude of the N1 (but not of the P2) event-related component elicited in the AV condition was significantly reduced compared to the N1 amplitude measured from the sum of the A and V conditions in all groups of participants. Because N1 attenuation to AV speech is thought to index the degree to which facial movements predict the onset of the auditory signal, our findings suggest that this aspect of audiovisual speech perception is mature by mid-childhood and is normal in the H-SLI children. In the second experiment, participants watched videos of audiovisually incongruent syllables created to elicit the so-called McGurk illusion (an auditory ‘pa’ dubbed onto a visual articulation of ‘ka’, with the expected percept being ‘ta’ if audiovisual integration took place). As a group, H-SLI children were significantly more likely than either TD children or adults to hear the McGurk syllable as ‘pa’ (in agreement with its auditory component) than as ‘ka’ (in agreement with its visual component), suggesting that susceptibility to the McGurk illusion is reduced in at least some children with a history of SLI. Taken together, the results of the two experiments argue against a global audiovisual integration impairment in children with a history of SLI and suggest that, when present, audiovisual integration difficulties in this population likely stem from a later (non-sensory) stage of processing.
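The AV-versus-(A+V) comparison in the first experiment is an instance of the standard additive-model test for multisensory ERPs: the unisensory responses are summed and contrasted with the response to the combined stimulus. Below is a minimal sketch of that test in NumPy; the sampling rate, the N1 search window, and the synthetic waveforms are illustrative assumptions, not values from the study.

```python
import numpy as np

FS = 500                   # sampling rate in Hz (assumed for illustration)
N1_WINDOW = (0.08, 0.15)   # seconds post-onset where N1 is sought (assumed)

def n1_amplitude(erp: np.ndarray, times: np.ndarray) -> float:
    """Most negative value inside the N1 window (a simple peak measure)."""
    mask = (times >= N1_WINDOW[0]) & (times <= N1_WINDOW[1])
    return float(erp[mask].min())

def additive_model_test(av, a_only, v_only, times):
    """Compare the N1 of the AV response with the N1 of the A+V sum.
    An attenuated (less negative) AV N1 is the pattern the study reports."""
    return n1_amplitude(av, times), n1_amplitude(a_only + v_only, times)

# Toy usage with synthetic single-channel averages.
times = np.arange(-0.1, 0.5, 1 / FS)
rng = np.random.default_rng(0)
gauss = lambda mu, amp, width: amp * np.exp(-((times - mu) ** 2) / width)
a_only = gauss(0.10, -2.0, 0.0005) + rng.normal(0, 0.05, times.size)
v_only = gauss(0.12, -0.5, 0.0008) + rng.normal(0, 0.05, times.size)
av     = gauss(0.10, -1.8, 0.0005) + rng.normal(0, 0.05, times.size)

av_n1, sum_n1 = additive_model_test(av, a_only, v_only, times)
print(f"AV N1 = {av_n1:.2f}, (A+V) N1 = {sum_n1:.2f}")  # AV less negative here
```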

4.
An experiment was conducted to investigate the claims made by Bruce and Young (1986) for the independence of facial identity and facial speech processing. A well-reported phenomenon in audiovisual speech perception—the McGurk effect (McGurk & MacDonald, 1976), in which synchronous but conflicting auditory and visual phonetic information is presented to subjects—was utilized as a dynamic facial speech processing task. An element of facial identity processing was introduced into this task by manipulating the faces used for the creation of the McGurk-effect stimuli such that (1) they were familiar to some subjects and unfamiliar to others, and (2) the faces and voices used were either congruent (from the same person) or incongruent (from different people). A comparison was made between the different subject groups in their susceptibility to the McGurk illusion, and the results show that when the faces and voices are incongruent, subjects who are familiar with the faces are less susceptible to McGurk effects than those who are unfamiliar with the faces. The results suggest that facial identity and facial speech processing are not entirely independent, and these findings are discussed in relation to Bruce and Young’s (1986) functional model of face recognition.

5.
Aftereffects indicative of cross-modal recalibration have been observed after exposure to spatially incongruent inputs from different sensory modalities, but they had not previously been demonstrated for identity incongruence. We show that exposure to incongruent audiovisual speech (producing the well-known McGurk effect) can recalibrate auditory speech identification. In Experiment 1, exposure to an ambiguous sound intermediate between /aba/ and /ada/ dubbed onto a video of a face articulating either /aba/ or /ada/ increased the proportion of /aba/ or /ada/ responses, respectively, during subsequent sound identification trials. Experiment 2 demonstrated either the same recalibration effect or its opposite (fewer /aba/ or /ada/ responses, revealing selective speech adaptation), depending on whether the ambiguous sound or a congruent nonambiguous one was used during exposure. In separate forced-choice identification trials, bimodal stimulus pairs producing these contrasting effects were categorized identically, which makes a role of postperceptual factors in the generation of the effects unlikely.

6.
The personal attributes of a talker perceived via acoustic properties of speech are commonly considered to be an extralinguistic message of an utterance. Accordingly, accounts of the perception of talker attributes have emphasized a causal role of aspects of the fundamental frequency and coarse-grain acoustic spectra distinct from the detailed acoustic correlates of phonemes. In testing this view, in four experiments, we estimated the ability of listeners to ascertain the sex or the identity of 5 male and 5 female talkers from sinusoidal replicas of natural utterances, which lack fundamental frequency and natural vocal spectra. Given such radically reduced signals, listeners appeared to identify a talker’s sex according to the central spectral tendencies of the sinusoidal constituents. Under acoustic conditions that prevented listeners from determining the sex of a talker, individual identification from sinewave signals was often successful. These results reveal that the perception of a talker’s sex and identity are not contingent on one another and that fine-grain aspects of a talker’s phonetic production can elicit individual identification under conditions that block the perception of voice quality.

7.
The “McGurk effect” demonstrates that visual (lip-read) information is used during speech perception even when it is discrepant with auditory information. While this has been established as a robust effect in subjects from Western cultures, our own earlier results had suggested that Japanese subjects use visual information much less than American subjects do (Sekiyama & Tohkura, 1993). The present study examined whether Chinese subjects would also show a reduced McGurk effect due to their cultural similarities with the Japanese. The subjects were 14 native speakers of Chinese living in Japan. Stimuli consisted of 10 syllables (/ba/, /pa/, /ma/, /wa/, /da/, /ta/, /na/, /ga/, /ka/, /ra/) pronounced by two speakers, one Japanese and one American. Each auditory syllable was dubbed onto every visual syllable within one speaker, resulting in 100 audiovisual stimuli in each language. The subjects’ main task was to report what they thought they had heard while watching and listening to the speaker as each stimulus was uttered. Compared with previous results obtained with American subjects, the Chinese subjects showed a weaker McGurk effect. The results also showed that the magnitude of the McGurk effect depends on the length of time the Chinese subjects had lived in Japan. Factors that foster and alter the Chinese subjects’ reliance on auditory information are discussed.

8.
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners’ auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants’ susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners’ McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.

9.
The McGurk effect is a classic audiovisual integration phenomenon. The effect is influenced by the physical characteristics of the stimuli, the allocation of attention, individuals' relative reliance on auditory versus visual information, audiovisual integration ability, and linguistic and cultural differences. The key visual information that elicits the McGurk effect comes mainly from the speaker's mouth region. The cognitive process that produces the effect involves early audiovisual integration (associated with the superior temporal cortex) and a later conflict between incongruent auditory and visual information (associated with the inferior frontal cortex). Future research should examine the influence of social information carried by faces on the McGurk effect, the relationship between unimodal information processing and audiovisual integration within the effect, and its cognitive and neural mechanisms as probed with computational models.

10.
Visual information provided by a talker's mouth movements can influence the perception of certain speech features. Thus, the "McGurk effect" shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/, as it is presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. (ABSTRACT TRUNCATED AT 250 WORDS)

11.
It has been well documented that listeners are able to estimate speaking rate when listening to a talker, but almost no work has been done on perception of rate information provided by looking at a talker’s face. In the present study, the method of magnitude estimation was used to collect estimates of the rate at which a talker was speaking. The estimates were collected under four experimental conditions: auditory only, visual only, combined auditory-visual, and inverted visual only. The results showed no difference in the slope of the functions relating perceived rate to physical rate for the auditory only, visual only, and combined auditory-visual presentations. There was, however, a significant difference between the normal visual-only and the inverted-visual presentations. These results indicate that there is visual rate information available on a talker’s face and, more importantly, suggest that there is a correspondence between the auditory and visual modalities for the perception of speaking rate, but only when the visual information is presented in its normal orientation.

12.
Visual information provided by a talker’s mouth movements can influence the perception of certain speech features. Thus, the “McGurk effect” shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/, as it is presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. In three follow-up experiments, we show that (1) the voicing boundary is not shifted in the absence of a change in the global percept, even when discrepant auditory-visual information is presented; (2) the number of response alternatives provided for the subjects does not affect the categorization or the VOT boundary of the auditory-visual stimuli; and (3) the original effect of a VOT boundary shift is not replicated when subjects are forced by instruction to “relabel” the /b-p/ auditory stimuli as /d/ or /t/. The subjects successfully relabeled the stimuli, but no shift in the VOT boundary was observed.
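The voicing boundary referred to in items 10 and 12 is conventionally estimated by fitting a sigmoid to the identification function along the VOT continuum and solving for the 50% crossover. A minimal sketch with SciPy follows; the VOT steps and response proportions are invented for illustration, not data from these experiments.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, boundary, slope):
    """Proportion of voiceless responses as a function of VOT (ms).
    The 'boundary' parameter is the 50% crossover point."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

vot_ms = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# Hypothetical identification proportions: audio-only vs. audio + visual /igi/.
p_voiceless_a  = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.95, 0.99])
p_voiceless_av = np.array([0.01, 0.03, 0.10, 0.35, 0.70, 0.90, 0.98])

popt_a, _  = curve_fit(logistic, vot_ms, p_voiceless_a,  p0=[30.0, 0.2])
popt_av, _ = curve_fit(logistic, vot_ms, p_voiceless_av, p0=[30.0, 0.2])

print(f"audio-only boundary:   {popt_a[0]:.1f} ms VOT")
print(f"audio-visual boundary: {popt_av[0]:.1f} ms VOT")
# A boundary at a longer VOT for the AV tokens is the shift the abstract reports.
```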

13.
The importance of visual cues in speech perception is illustrated by the McGurk effect, whereby a speaker’s facial movements affect speech perception. The goal of the present study was to evaluate whether the McGurk effect is also observed for sung syllables. Participants heard and saw sung instances of the syllables /ba/ and /ga/ and then judged the syllable they perceived. Audio-visual stimuli were congruent or incongruent (e.g., auditory /ba/ presented with visual /ga/). The stimuli were presented as spoken, sung in an ascending and descending triad (C E G G E C), and sung in an ascending and descending triad that returned to a semitone above the tonic (C E G G E C#). Results revealed no differences in the proportion of fusion responses between spoken and sung conditions confirming that cross-modal phonemic information is integrated similarly in speech and song.

14.
There is evidence that for both auditory and visual speech perception, familiarity with the talker facilitates speech recognition. Explanations of these effects have concentrated on the retention of talker information specific to each of these modalities. It could be, however, that some amodal, talker-specific articulatory-style information facilitates speech perception in both modalities. If this is true, then experience with a talker in one modality should facilitate perception of speech from that talker in the other modality. In a test of this prediction, subjects were given about 1 hr of experience lipreading a talker and were then asked to recover speech in noise from either this same talker or a different talker. Results revealed that subjects who lip-read and heard speech from the same talker performed better on the speech-in-noise task than did subjects who lip-read from one talker and then heard speech from a different talker.

15.
MacDonald, J., Andersen, S., & Bachmann, T. (2000). Perception, 29(10), 1155-1168.
In the McGurk effect (McGurk and MacDonald, 1976 Nature 264 746-748), illusory auditory perception is produced if the visual information from lip movements is discrepant from the auditory information from the voice. A study is reported of the tolerance of the effect to varying levels of spatial degradation (videotaped images of a speaker's face were quantised by a mosaic transform). The illusory effect systematically decreased with an increase in the coarseness of the spatial quantisation. However, even with the coarsest level (11.2 pixels/face) the illusion did not completely disappear. In addition, those participants who did not experience the illusion nevertheless showed the effects of auditory-visual interaction in their clarity ratings of the auditory stimulus. It is concluded that auditory-visual interaction in visible speech perception is based on relatively coarse-spatial-scale information.
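A mosaic transform of the kind used here is essentially block-averaging: the image is partitioned into square cells and each cell is replaced by its mean intensity, so larger cells mean fewer effective pixels per face. The sketch below illustrates the idea on a grayscale NumPy array; the block sizes and frame dimensions are arbitrary choices, not the study's parameters.

```python
import numpy as np

def mosaic(image: np.ndarray, block: int) -> np.ndarray:
    """Spatially quantise a grayscale image: replace each block x block
    cell with its mean intensity (block sizes here are illustrative)."""
    out = image.astype(float).copy()
    h, w = out.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            cell = out[y:y + block, x:x + block]  # a view into `out`
            cell[...] = cell.mean()               # flatten the cell to its mean
    return out

frame = np.random.default_rng(1).random((240, 320))  # stand-in video frame
fine   = mosaic(frame, block=4)    # mild quantisation
coarse = mosaic(frame, block=32)   # coarse quantisation, few pixels per face
```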

16.
In the McGurk effect, visual information specifying a speaker’s articulatory movements can influence auditory judgments of speech. In the present study, we attempted to find an analogue of the McGurk effect by using nonspeech stimuli—the discrepant audiovisual tokens of plucks and bows on a cello. The results of an initial experiment revealed that subjects’ auditory judgments were influenced significantly by the visual pluck and bow stimuli. However, a second experiment in which speech syllables were used demonstrated that the visual influence on consonants was significantly greater than the visual influence observed for pluck-bow stimuli. This result could be interpreted to suggest that the nonspeech visual influence was not a true McGurk effect. In a third experiment, visual stimuli consisting of the words pluck and bow were found to have no influence over auditory pluck and bow judgments. This result could suggest that the nonspeech effects found in Experiment 1 were based on the audio and visual information’s having an ostensive lawful relation to the specified event. These results are discussed in terms of motor-theory, ecological, and FLMP approaches to speech perception.

17.
Social-rank cues communicate social status or social power within and between groups. Information about social rank is fluently processed in both the visual and auditory modalities. So far, investigation of the processing of social-rank cues has been limited to studies in which information from a single modality was assessed or manipulated. Yet, in everyday communication, multiple information channels are used to express and understand social rank. We sought to examine the (in)voluntary nature of the processing of facial and vocal signals of social rank using a cross-modal Stroop task. In two experiments, participants were presented with face-voice pairs that were either congruent or incongruent in social rank (i.e. social dominance). Participants’ task was to label the social dominance of the face while ignoring the voice, or to label the social dominance of the voice while ignoring the face. In both experiments, we found that face-voice incongruent stimuli were processed more slowly and less accurately than congruent stimuli in both the face-attend and voice-attend tasks, exhibiting classical Stroop-like effects. These findings are consistent with the functioning of a social-rank bio-behavioural system that consistently and automatically monitors one’s social standing in relation to others and uses that information to guide behaviour.

18.
Speech perception is audiovisual, as demonstrated by the McGurk effect in which discrepant visual speech alters the auditory speech percept. We studied the role of visual attention in audiovisual speech perception by measuring the McGurk effect in two conditions. In the baseline condition, attention was focused on the talking face. In the distracted attention condition, subjects ignored the face and attended to a visual distractor, which was a leaf moving across the face. The McGurk effect was weaker in the latter condition, indicating that visual attention modulated audiovisual speech perception. This modulation may occur at an early, unisensory processing stage, or it may be due to changes at the stage where auditory and visual information is integrated. We investigated this issue by conventional statistical testing, and by fitting the Fuzzy Logical Model of Perception (Massaro, 1998) to the results. The two methods suggested different interpretations, revealing a paradox in the current methods of analysis.
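The Fuzzy Logical Model of Perception fitted here combines the auditory and visual support for each response alternative multiplicatively and normalizes across alternatives. For the two-alternative case this reduces to a single formula, sketched below; the support values are invented, and this is a simplification of Massaro's full model rather than the fitting procedure used in the study.

```python
def flmp_two_alternative(a: float, v: float) -> float:
    """FLMP, two-alternative form.
    a: auditory support for response R, in [0, 1]
    v: visual support for response R, in [0, 1]
    Returns the predicted probability of responding R."""
    return (a * v) / (a * v + (1.0 - a) * (1.0 - v))

# Invented example: weak auditory but strong visual support for a percept,
# as in a McGurk-style stimulus. Vision dominates the prediction.
print(flmp_two_alternative(a=0.3, v=0.9))  # ~0.79
```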

19.
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

20.
Most speech research with infants occurs in quiet laboratory rooms with no outside distractions. However, in the real world, speech directed to infants often occurs in the presence of other competing acoustic signals. To learn language, infants need to attend to their caregiver’s speech even under less than ideal listening conditions. We examined 7.5-month-old infants’ abilities to selectively attend to a female talker’s voice when a male voice was talking simultaneously. In three experiments, infants heard a target voice repeating isolated words while a distractor voice spoke fluently at one of three different intensities. Subsequently, infants heard passages produced by the target voice containing either the familiar words or novel words. Infants listened longer to the familiar words when the target voice was 10 dB or 5 dB more intense than the distractor, but not when the two voices were equally intense. In a fourth experiment, the assignment of words and passages to the familiarization and testing phases was reversed so that the passages and distractors were presented simultaneously during familiarization, and the infants were tested on the familiar and unfamiliar isolated words. During familiarization, the passages were 10 dB more intense than the distractors. The results suggest that this may be at the limits of what infants at this age can do in separating two different streams of speech. In conclusion, infants have some capacity to extract information from speech even in the face of a competing acoustic voice.
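As a point of reference for the intensity manipulations above, a level difference in decibels maps onto an amplitude ratio through the standard 20·log10 rule; the snippet below simply performs that conversion.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a sound-pressure (amplitude) ratio."""
    return 10 ** (db / 20.0)

for db in (10, 5, 0):
    ratio = db_to_amplitude_ratio(db)
    print(f"{db:2d} dB -> target voice {ratio:.2f}x the distractor amplitude")
# 10 dB ~ 3.16x, 5 dB ~ 1.78x, 0 dB = equal amplitude.
```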
