Similar Articles
A total of 20 similar articles were found (search time: 62 ms).
1.
Studies of the McGurk effect have shown that when discrepant phonetic information is delivered to the auditory and visual modalities, the information is combined into a new percept not originally presented to either modality. In typical experiments, the auditory and visual speech signals are generated by the same talker. The present experiment examined whether a discrepancy in the gender of the talker between the auditory and visual signals would influence the magnitude of the McGurk effect. A male talker’s voice was dubbed onto a videotape containing a female talker’s face, and vice versa. The gender-incongruent videotapes were compared with gender-congruent videotapes, in which a male talker’s voice was dubbed onto a male face and a female talker’s voice was dubbed onto a female face. Even though there was a clear incompatibility in talker characteristics between the auditory and visual signals on the incongruent videotapes, the resulting magnitude of the McGurk effect was not significantly different for the incongruent as opposed to the congruent videotapes. The results indicate that the mechanism for integrating speech information from the auditory and the visual modalities is not disrupted by a gender incompatibility even when it is perceptually apparent. The findings are compatible with the theoretical notion that information about voice characteristics of the talker is extracted and used to normalize the speech signal at an early stage of phonetic processing, prior to the integration of the auditory and the visual information.

2.
The effects of talker variability on visual speech perception were tested by having subjects speechread sentences from either single-talker or mixed-talker sentence lists. Results revealed that changes in talker from trial to trial decreased speechreading performance. To help determine whether this decrement was due to talker change--and not a change in superficial characteristics of the stimuli--Experiment 2 tested speechreading from visual stimuli whose images were tinted by a single color, or mixed colors. Results revealed that the mixed-color lists did not inhibit speechreading performance relative to the single-color lists. These results are analogous to findings in the auditory speech literature and suggest that, like auditory speech, visual speech operations include a resource-demanding component that is influenced by talker variability.

3.
It has been well documented that listeners are able to estimate speaking rate when listening to a talker, but almost no work has been done on perception of rate information provided by looking at a talker’s face. In the present study, the method of magnitude estimation was used to collect estimates of the rate at which a talker was speaking. The estimates were collected under four experimental conditions: auditory only, visual only, combined auditory-visual, and inverted visual only. The results showed no difference in the slope of the functions relating perceived rate to physical rate for the auditory only, visual only, and combined auditory-visual presentations. There was, however, a significant difference between the normal visual-only and the inverted-visual presentations. These results indicate that there is visual rate information available on a talker’s face and, more importantly, suggest that there is a correspondence between the auditory and visual modalities for the perception of speaking rate, but only when the visual information is presented in its normal orientation.
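For context on the analysis described above: in magnitude estimation, the "slope" compared across conditions is conventionally the exponent of a power function fitted to the estimates in log-log coordinates. The following is a minimal sketch of that computation with invented rates and estimates; it is an illustration of the general method, not the study's data or code.

```python
import numpy as np

# Minimal sketch (invented data): in magnitude estimation, perceived rate R is
# conventionally modeled as a power function of physical rate S, R = k * S**b.
# Taking logs gives log R = log k + b * log S, so the exponent b is the slope
# of a straight line in log-log coordinates, the quantity compared across
# presentation conditions.

physical_rate = np.array([2.5, 3.5, 4.5, 5.5, 6.5])   # syllables per second (invented)
mean_estimates = {                                      # invented group means
    "auditory only": np.array([48, 71, 95, 118, 142]),
    "visual only": np.array([45, 68, 92, 115, 139]),
    "inverted visual only": np.array([60, 74, 88, 100, 111]),
}

for condition, estimates in mean_estimates.items():
    # Least-squares fit of log(estimate) on log(rate); first coefficient is the slope b.
    slope, intercept = np.polyfit(np.log(physical_rate), np.log(estimates), 1)
    print(f"{condition}: slope (power-law exponent) = {slope:.2f}")
```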

4.
Studies of the McGurk effect have shown that when discrepant phonetic information is delivered to the auditory and visual modalities, the information is combined into a new percept not originally presented to either modality. In typical experiments, the auditory and visual speech signals are generated by the same talker. The present experiment examined whether a discrepancy in the gender of the talker between the auditory and visual signals would influence the magnitude of the McGurk effect. A male talker's voice was dubbed onto a videotape containing a female talker's face, and vice versa. The gender-incongruent videotapes were compared with gender-congruent videotapes, in which a male talker's voice was dubbed onto a male face and a female talker's voice was dubbed onto a female face. Even though there was a clear incompatibility in talker characteristics between the auditory and visual signals on the incongruent videotapes, the resulting magnitude of the McGurk effect was not significantly different for the incongruent as opposed to the congruent videotapes. The results indicate that the mechanism for integrating speech information from the auditory and the visual modalities is not disrupted by a gender incompatibility even when it is perceptually apparent. The findings are compatible with the theoretical notion that information about voice characteristics of the talker is extracted and used to normalize the speech signal at an early stage of phonetic processing, prior to the integration of the auditory and the visual information.

5.
There is evidence that for both auditory and visual speech perception, familiarity with the talker facilitates speech recognition. Explanations of these effects have concentrated on the retention of talker information specific to each of these modalities. It could be, however, that some amodal, talker-specific articulatory-style information facilitates speech perception in both modalities. If this is true, then experience with a talker in one modality should facilitate perception of speech from that talker in the other modality. In a test of this prediction, subjects were given about 1 hr of experience lipreading a talker and were then asked to recover speech in noise from either this same talker or a different talker. Results revealed that subjects who lip-read and heard speech from the same talker performed better on the speech-in-noise task than did subjects who lip-read from one talker and then heard speech from a different talker.

6.
Visual information provided by a talker's mouth movements can influence the perception of certain speech features. Thus, the "McGurk effect" shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/, as it is presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. (ABSTRACT TRUNCATED AT 250 WORDS)

7.
Visual information provided by a talker’s mouth movements can influence the perception of certain speech features. Thus, the “McGurk effect” shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/, as it is presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. In three follow-up experiments, we show that (1) the voicing boundary is not shifted in the absence of a change in the global percept, even when discrepant auditory-visual information is presented; (2) the number of response alternatives provided for the subjects does not affect the categorization or the VOT boundary of the auditory-visual stimuli; and (3) the original effect of a VOT boundary shift is not replicated when subjects are forced by instruction to “relabel” the /b-p/ auditory stimuli as /d/ or /t/. The subjects successfully relabeled the stimuli, but no shift in the VOT boundary was observed.
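The "voicing boundary" in experiments like this is typically estimated as the 50% crossover point of a function fitted to the proportion of voiceless responses along the VOT continuum. The sketch below fits a logistic function to invented identification proportions to show how a boundary shift toward longer VOT values would be quantified; it illustrates the general method, not the authors' analysis or data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Minimal sketch (invented data): a voicing boundary on a VOT continuum is
# commonly estimated as the 50% crossover of a logistic function fitted to
# the proportion of voiceless (/p/ or /t/) responses at each VOT step.

def logistic(vot, boundary, slope):
    """Probability of a voiceless response as a function of VOT (ms)."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

vot_ms = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)           # continuum steps (invented)
voiceless_auditory = np.array([.02, .05, .20, .65, .90, .97, .99])     # invented proportions
voiceless_audiovisual = np.array([.01, .03, .10, .40, .80, .95, .99])  # invented proportions

for label, proportions in [("auditory only", voiceless_auditory),
                           ("auditory-visual", voiceless_audiovisual)]:
    # Fit boundary (50% point) and slope; the boundary is the quantity of interest.
    (boundary, slope), _ = curve_fit(logistic, vot_ms, proportions, p0=[30.0, 0.2])
    print(f"{label}: estimated voicing boundary = {boundary:.1f} ms VOT")
```

With these made-up proportions, the audiovisual boundary falls at a longer VOT than the auditory-only boundary, which is the direction of shift the abstract reports.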

8.
The effects of viewing the face of the talker (visual speech) on the processing of clearly presented intact auditory stimuli were investigated using two measures likely to be sensitive to the articulatory motor actions produced in speaking. The aim of these experiments was to highlight the need for accounts of the effects of audio-visual (AV) speech that explicitly consider the properties of articulated action. The first experiment employed a syllable-monitoring task in which participants were required to monitor for target syllables within foreign carrier phrases. An AV effect was found in that seeing a talker's moving face (moving face condition) assisted in more accurate recognition (hits and correct rejections) of spoken syllables than of auditory-only still face (still face condition) presentations. The second experiment examined processing of spoken phrases by investigating whether an AV effect would be found for estimates of phrase duration. Two effects of seeing the moving face of the talker were found. First, the moving face condition had significantly longer duration estimates than the still face auditory-only condition. Second, estimates of auditory duration made in the moving face condition reliably correlated with the actual durations whereas those made in the still face auditory condition did not. The third experiment was carried out to determine whether the stronger correlation between estimated and actual duration in the moving face condition might have been due to generic properties of AV presentation. Experiment 3 employed the procedures of the second experiment but used stimuli that were not perceived as speech although they possessed the same timing cues as those of the speech stimuli of Experiment 2. It was found that simply presenting both auditory and visual timing information did not result in more reliable duration estimates. Further, when released from the speech context (used in Experiment 2), duration estimates for the auditory-only stimuli were significantly correlated with actual durations. In all, these results demonstrate that visual speech can assist in the analysis of clearly presented auditory stimuli in tasks concerned with information provided by viewing the production of an utterance. We suggest that these findings are consistent with there being a processing link between perception and action such that viewing a talker speaking will activate speech motor schemas in the perceiver.

9.
A series of experiments was conducted to determine if linguistic representations accessed during reading include auditory imagery for characteristics of a talker's voice. In 3 experiments, participants were familiarized with two talkers during a brief prerecorded conversation. One talker spoke at a fast speaking rate, and one spoke at a slow speaking rate. Each talker was identified by name. At test, participants were asked to either read aloud (Experiment 1) or silently (Experiments 1, 2, and 3) a passage that they were told was written by either the fast or the slow talker. Reading times, both silent and aloud, were significantly slower when participants thought they were reading a passage written by the slow talker than when reading a passage written by the fast talker. Reading times differed as a function of passage author more for difficult than for easy texts, and individual differences in general auditory imagery ability were related to reading times. These results suggest that readers engage in a type of auditory imagery while reading that preserves the perceptual details of an author's voice.

10.
Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker’s spatial location or their gender. Participants directed attention to location and gender simultaneously (“objects”) at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.

11.
A shadowing task was used to demonstrate an auditory analogue of change blindness (the failure to detect a change in a visual scene), namely change deafness. Participants repeated words varying in lexical difficulty. Halfway through the word list, either the same or a different talker presented the words to participants. At least 40% of the participants failed to detect the change in talker. More interesting is that differences in shadowing times were found as a function of change detection. Alternative possibilities to the change detection phenomenon were ruled out. The results of these experiments suggest that the allocation of attention may influence the detection of changes as well as the processing of spoken words in complex ways.

12.
When engaging in conversation, we efficiently go back and forth with our partner, organizing our contributions in reciprocal turn-taking behavior. Using multiple auditory and visual cues, we make online decisions about when it is the appropriate time to take our turn. In two experiments, we demonstrated, for the first time, that auditory and visual information serve complementary roles when making such turn-taking decisions. We presented clips of single utterances spoken by individuals engaged in conversations in audiovisual, auditory-only or visual-only modalities. These utterances occurred either right before a turn exchange (i.e., ‘Turn-Ends’) or right before the next sentence spoken by the same talker (i.e., ‘Turn-Continuations’). In Experiment 1, participants discriminated between Turn-Ends and Turn-Continuations in order to synchronize a button-press response to the moment the talker would stop speaking. We showed that participants were best at discriminating between Turn-Ends and Turn-Continuations in the audiovisual condition. However, in terms of response synchronization, participants were equally precise at timing their responses to a Turn-End in the audiovisual and auditory-only conditions, showing no advantage of visual information. In Experiment 2, we used a gating paradigm, where increasing segments of Turn-Ends and Turn-Continuations were presented, and participants predicted if a turn exchange would occur at the end of the sentence. We found an audiovisual advantage in detecting an upcoming turn early in the perception of a turn exchange. Together, these results suggest that visual information functions as an early signal indicating an upcoming turn exchange while auditory information is used to precisely time a response to the turn end.

13.
In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

14.
Rosenblum, Miller, and Sanchez (Psychological Science, 18, 392-396, 2007) found that subjects first trained to lip-read a particular talker were then better able to perceive the auditory speech of that same talker, as compared with that of a novel talker. This suggests that the talker experience a perceiver gains in one sensory modality can be transferred to another modality to make that speech easier to perceive. An experiment was conducted to examine whether this cross-sensory transfer of talker experience could occur (1) from auditory to lip-read speech, (2) with subjects not screened for adequate lipreading skill, (3) when both a familiar and an unfamiliar talker are presented during lipreading, and (4) for both old (presentation set) and new words. Subjects were first asked to identify a set of words from a talker. They were then asked to perform a lipreading task from two faces, one of which was of the same talker they heard in the first phase of the experiment. Results revealed that subjects who lip-read from the same talker they had heard performed better than those who lip-read a different talker, regardless of whether the words were old or new. These results add further evidence that learning of amodal talker information can facilitate speech perception across modalities and also suggest that this information is not restricted to previously heard words.

15.
Cvejic E, Kim J, Davis C. Cognition, 2012, 122(3): 442-453.
Prosody can be expressed not only by modification to the timing, stress and intonation of auditory speech but also by modifying visual speech. Studies have shown that the production of visual cues to prosody is highly variable (both within and across speakers); however, behavioural studies have shown that perceivers can effectively use such visual cues. The latter result suggests that people are sensitive to the type of prosody expressed despite cue variability. The current study investigated the extent to which perceivers can match visual cues to prosody from different speakers and from different face regions. Participants were presented with two pairs of sentences (consisting of the same segmental content) and were required to decide which pair had the same prosody. Experiment 1 tested visual and auditory cues from the same speaker and Experiment 2 from different speakers. Experiment 3 used visual cues from the upper and the lower face of the same talker and Experiment 4 from different speakers. The results showed that perceivers could accurately match prosody even when signals were produced by different speakers. Furthermore, perceivers were able to match the prosodic cues both within and across modalities regardless of the face area presented. This ability to match prosody from very different visual cues suggests that perceivers cope with variation in the production of visual prosody by flexibly mapping specific tokens to abstract prosodic types.

16.
Speech perception, especially in noise, may be maximized if the perceiver observes the naturally occurring visual-plus-auditory cues inherent in the production of spoken language. Evidence is conflicting, however, about which aspects of visual information mediate enhanced speech perception in noise. For this reason, we investigated the relative contributions of audibility and the type of visual cue in three experiments in young adults with normal hearing and vision. Relative to static visual cues, access to the talker’s phonetic gestures in speech production, especially in noise, was associated with (a) faster response times and sensitivity for speech understanding in noise, and (b) shorter latencies and reduced amplitudes of auditory N1 event-related potentials. Dynamic chewing facial motion also decreased the N1 latency, but only meaningful linguistic motions reduced the N1 amplitude. The hypothesis that auditory-visual facilitation is distinct to properties of natural, dynamic speech gestures was partially supported.

17.
In a series of experiments, the effect of white noise distortion and talker variation on lexical access in normal and Broca's aphasic participants was examined using an auditory lexical decision paradigm. Masking the prime stimulus in white noise resulted in reduced semantic priming for both groups, indicating that lexical access is degraded by nonlinguistic white noise distortion. However, talker variation within a prime-target pair had no effect upon the performance of either the normal or aphasic individuals. The absence of a talker variation effect suggests that voice-specific information is not encoded in the lexical representations of words. The normal performance of Broca's aphasics under conditions of both white noise and talker variation supports the view that these patients have a lexical processing impairment rather than a more generalized language deficit characterized by limited computational resources.

18.
In a cross-modal matching task, participants were asked to match visual and auditory displays of speech based on the identity of the speaker. The present investigation used this task with acoustically transformed speech to examine the properties of sound that can convey cross-modal information. Word recognition performance was also measured under the same transformations. The authors found that cross-modal matching was only possible under transformations that preserved the relative spectral and temporal patterns of formant frequencies. In addition, cross-modal matching was only possible under the same conditions that yielded robust word recognition performance. The results are consistent with the hypothesis that acoustic and optical displays of speech simultaneously carry articulatory information about both the underlying linguistic message and indexical properties of the talker.

19.
Speech alignment is the tendency for interlocutors to unconsciously imitate one another’s speaking style. Alignment also occurs when a talker is asked to shadow recorded words (e.g., Shockley, Sabadini, & Fowler, 2004). In two experiments, we examined whether alignment could be induced with visual (lipread) speech and with auditory speech. In Experiment 1, we asked subjects to lipread and shadow out loud a model silently uttering words. The results indicate that shadowed utterances sounded more similar to the model’s utterances than did subjects’ nonshadowed read utterances. This suggests that speech alignment can be based on visual speech. In Experiment 2, we tested whether raters could perceive alignment across modalities. Raters were asked to judge the relative similarity between a model’s visual (silent video) utterance and subjects’ audio utterances. The subjects’ shadowed utterances were again judged as more similar to the model’s than were read utterances, suggesting that raters are sensitive to cross-modal similarity between aligned words.

20.
When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of the four types of perceptual responses to 384 different stimuli from four talkers. The measures included mutual information, correlations, and acoustic measures, all representing audiovisual stimulus relationships. In Experiment 1, open-set perceptual responses were obtained for acoustic /bɑ/ or /lɑ/ dubbed to video /bɑ, dɑ, gɑ, vɑ, zɑ, lɑ, wɑ, eɑ/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of response category proportions were a subset of the physical stimulus measures, with the variance accounted for in the perceptual response category proportions ranging from 17% to 52%. That audiovisual stimulus relationships can account for perceptual response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships.
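As a rough illustration of the first measure named above, mutual information between two discrete stimulus variables (for example, trial-by-trial acoustic and video syllable labels) can be estimated directly from their joint and marginal frequencies. The sketch below uses invented trial labels and a generic estimator; it is not the measure set, stimuli, or data from the study.

```python
import math
from collections import Counter

# Minimal sketch (invented data): mutual information between two discrete
# stimulus variables, estimated from joint and marginal frequencies. Pairing
# the acoustic and video syllable labels across trials is one generic way to
# quantify an audiovisual stimulus relationship.

def mutual_information_bits(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Invented trial labels (hypothetical, not the 384 stimuli from the study).
acoustic = ["ba", "la", "ba", "la", "ba", "la", "ba", "la"]
video = ["ba", "da", "ga", "va", "za", "la", "wa", "ba"]
print(f"I(acoustic; video) = {mutual_information_bits(acoustic, video):.3f} bits")
```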
