Similar Documents
20 similar documents retrieved.
1.
Buchan JN, Munhall KG. Perception, 2011, 40(10): 1164-1182.
Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.
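As a concrete illustration of the two stimulus manipulations described in the abstract above, the following sketch shows one plausible way to attenuate fine spatial-frequency detail in a video frame (via a Gaussian blur) and to offset an audio track relative to its video. This is a minimal Python sketch, assuming NumPy and SciPy; the blur width, sample rate, and offset values are arbitrary placeholders, not the authors' actual stimulus parameters.

```python
# Illustrative sketch (not the authors' code): removing high spatial
# frequencies from a video frame and shifting the audio track relative
# to the video, the two manipulations described in the abstract.
import numpy as np
from scipy.ndimage import gaussian_filter

def low_pass_frame(frame: np.ndarray, sigma_px: float = 4.0) -> np.ndarray:
    """Blur a grayscale frame to attenuate fine (high spatial frequency) detail."""
    return gaussian_filter(frame.astype(float), sigma=sigma_px)

def offset_audio(audio: np.ndarray, offset_ms: float, sr: int = 48_000) -> np.ndarray:
    """Delay (positive offset) or advance (negative offset) audio relative to video."""
    shift = int(round(offset_ms / 1000.0 * sr))
    out = np.zeros_like(audio)
    if shift >= 0:
        out[shift:] = audio[:len(audio) - shift]
    else:
        out[:shift] = audio[-shift:]
    return out

if __name__ == "__main__":
    frame = np.random.rand(480, 640)    # stand-in for a grayscale video frame
    audio = np.random.randn(48_000)     # stand-in for 1 s of audio
    blurred = low_pass_frame(frame, sigma_px=6.0)
    delayed = offset_audio(audio, offset_ms=200.0)
    print(blurred.shape, delayed.shape)
```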

2.
A brief, vivid phase of auditory sensory storage that outlasts the stimulus could be used in perception in two ways: First, all of the neural activity resulting from the stimulus, including that of the sensory store, could contribute to a sensation of growing loudness; second, the sensory store could permit the continued extraction of information about the sound's acoustic properties. This study includes a task for which these two processes lead to different predictions; a third prediction is based on the two processes combined. The task required loudness judgments for two brief tones presented with a variable intertone interval. The results of Experiments 1-3 were as one would expect if both the growth of sensation and information extraction contributed to the pattern of loudness judgments. Experiment 4 strengthened the two-process account by demonstrating the separability of the two processes. Approaches to mathematical modeling of these results are discussed.

3.
Perception of visual speech and the influence of visual speech on auditory speech perception are affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and the effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.

4.
Hearing by eye
Recent work on the integration of auditory and visual information during speech perception has indicated that adults are surprisingly good at, and rely extensively on, lip reading. The conceptual status of lip-read information is of interest: such information is at the same time both visual and phonological. Three experiments investigated the nature of short-term coding of lip-read information in hearing subjects. The first experiment used asynchronous visual and auditory information and showed that a subject's ability to repeat words, when heard speech lagged lip movements, was unaffected by the lag duration, both quantitatively and qualitatively. This suggests that lip-read information is immediately recoded into a durable code. An experiment on serial recall of lip-read items showed a serial position curve containing a recency effect (characteristic of auditory but not visual input). It was then shown that an auditory suffix diminishes the recency effect obtained with lip-read stimuli. These results are consistent with the hypothesis that seen speech that is not heard is encoded into a durable code which has some shared properties with heard speech. The results of the serial recall experiments are inconsistent with interpretations of the recency and suffix effects in terms of precategorical acoustic storage, for they demonstrate that recency and suffix effects can be supra-modal.

5.
In the McGurk effect, visual information specifying a speaker's articulatory movements can influence auditory judgments of speech. In the present study, we attempted to find an analogue of the McGurk effect by using nonspeech stimuli: the discrepant audiovisual tokens of plucks and bows on a cello. The results of an initial experiment revealed that subjects' auditory judgments were influenced significantly by the visual pluck and bow stimuli. However, a second experiment in which speech syllables were used demonstrated that the visual influence on consonants was significantly greater than the visual influence observed for pluck-bow stimuli. This result could be interpreted to suggest that the nonspeech visual influence was not a true McGurk effect. In a third experiment, visual stimuli consisting of the words "pluck" and "bow" were found to have no influence over auditory pluck and bow judgments. This result could suggest that the nonspeech effects found in Experiment 1 were based on the audio and visual information having an ostensive lawful relation to the specified event. These results are discussed in terms of motor-theory, ecological, and FLMP approaches to speech perception.

6.
In two previous studies, the perception of speech rate was found to be positively related to the vocal frequency and intensity of speech. In those studies, a single sample of spontaneous, content-masked speech was used to produce nine stimuli by factorially varying three levels each of vocal frequency and intensity, while controlling the actual speech rate of the stimuli. Participants were asked to judge each stimulus, preceded by a standard ("anchoring") stimulus, for its speech rate, pitch, loudness, and duration. The purpose of the three studies reported here was to examine the generalizability of the previous findings by using stimuli that were nonmasked and/or were not preceded by an anchoring stimulus. In each study, nine speech stimuli were prepared, as described above, and participants were asked to make judgments about the rate, pitch, loudness, and duration of each stimulus. In the first study, the stimuli were masked but were not preceded by an anchoring stimulus. In the second study, participants listened to content-standard speech stimuli preceded by an anchoring stimulus. Finally, in the third study, content-standard stimuli without an anchoring stimulus were used. In addition, Studies 2 and 3 used speech segments from both a male and a female speaker. The findings from the three studies replicated the central findings of the previous work: rate perception of speech is indeed influenced by vocal frequency and, to some extent, by intensity, and these relationships are not materially altered by the speakers' gender.
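The two acoustic variables at issue here, vocal (fundamental) frequency and intensity, can be quantified from a speech segment. Below is a minimal, illustrative Python sketch, assuming NumPy: a crude autocorrelation-based F0 estimate and an RMS level in dB. It is not the procedure used to prepare the stimuli in these studies; the demo signal and voice-range limits are assumptions.

```python
# Illustrative sketch (assumptions, not the original stimulus pipeline):
# estimating vocal (fundamental) frequency via autocorrelation and
# intensity via RMS level.
import numpy as np

def rms_db(signal: np.ndarray) -> float:
    """Intensity of a segment as RMS level in dB (arbitrary reference)."""
    return 20.0 * np.log10(np.sqrt(np.mean(signal ** 2)) + 1e-12)

def f0_autocorr(signal: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 300.0) -> float:
    """Crude F0 estimate: pick the autocorrelation peak inside the voice range."""
    sig = signal - signal.mean()
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    demo = 0.3 * np.sin(2 * np.pi * 120 * t)   # stand-in for a voiced segment at 120 Hz
    print(f"F0 ~ {f0_autocorr(demo, sr):.1f} Hz, level {rms_db(demo):.1f} dB")
```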

7.
To examine whether a weapon's presence impairs witnesses' memory for auditory information (as it impairs memory for visual information), we conducted two experiments in which undergraduates watched one version of a videotape depicting a male target who held either a weapon or a neutral object and conversed with a female character. The semantic content of his remarks was either easy or difficult to comprehend. The weapon's presence did not affect voice identification accuracy or memory for the target's vocal characteristics (e.g., pitch, loudness, speech rate) but did worsen memory for semantic content in the Difficult Comprehension condition. Our results can be explained by multiple resource models of attention, which propose separate resource "pools" for different sensory modalities.

8.
Vatakis A, Spence C. Perception, 2008, 37(1): 143-160.
Research has shown that inversion is more detrimental to the perception of faces than to the perception of other types of visual stimuli. Inverting a face impairs configural information processing, which leads to slowed early face processing and reduced accuracy in face recognition tasks. We investigated the effects of inverting speech and non-speech stimuli on audiovisual temporal perception. Upright and inverted audiovisual video clips of a person uttering syllables (experiments 1 and 2), a person playing musical notes on a piano (experiment 3), or a rhesus monkey producing vocalisations (experiment 4) were presented. Participants made unspeeded temporal-order judgments regarding which modality stream (auditory or visual) appeared to have been presented first. Inverting the visual stream had no effect on the sensitivity of temporal discrimination responses in any of the four experiments, implying that audiovisual temporal integration is resilient to changes of orientation in the picture plane. By contrast, the point of subjective simultaneity differed significantly as a function of orientation for the audiovisual speech stimuli but not for the non-speech stimuli or monkey calls. That is, smaller auditory leads were required for the inverted than for the upright visual speech stimuli. These results are consistent with the longer processing latencies reported previously when human faces are inverted and demonstrate that the temporal perception of dynamic audiovisual speech can be modulated by changes in the physical properties of the visual speech (i.e., by changes in orientation).
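Temporal-order-judgment data of this kind are commonly summarised by a point of subjective simultaneity (PSS) and a just-noticeable difference (JND) obtained from a fitted psychometric function. The sketch below, in Python with SciPy, fits a cumulative Gaussian to made-up proportions of "visual first" responses; the SOA values and proportions are placeholders, not data from this study.

```python
# Illustrative sketch, not the authors' analysis code: estimating the point of
# subjective simultaneity (PSS) and just-noticeable difference (JND) from
# temporal-order-judgment data by fitting a cumulative Gaussian.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(soa, pss, sigma):
    """P('visual first') as a function of audio-visual SOA (ms)."""
    return norm.cdf(soa, loc=pss, scale=sigma)

# Placeholder data: positive SOA means the visual stream led the auditory stream.
soas = np.array([-240, -120, -60, 0, 60, 120, 240], dtype=float)
p_visual_first = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.98])

(pss, sigma), _ = curve_fit(psychometric, soas, p_visual_first, p0=[0.0, 80.0])
jnd = sigma * norm.ppf(0.75)   # half-width between the 50% and 75% points
print(f"PSS ~ {pss:.1f} ms, JND ~ {jnd:.1f} ms")
```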

9.
Congruent information conveyed over different sensory modalities often facilitates a variety of cognitive processes, including speech perception (Sumby & Pollack, 1954). Since auditory processing is substantially faster than visual processing, auditory-visual integration can occur over a surprisingly wide temporal window (Stein, 1998). We investigated the processing architecture mediating the integration of acoustic digit names with corresponding symbolic visual forms. The digits "1" or "2" were presented in auditory, visual, or bimodal format at several stimulus onset asynchronies (SOAs: 0, 75, 150, and 225 msec). The reaction times (RTs) for echoing unimodal auditory stimuli were approximately 100 msec faster than the RTs for naming their visual forms. Correspondingly, bimodal facilitation violated race model predictions, but only at SOA values greater than 75 msec. These results indicate that the acoustic and visual information are pooled prior to verbal response programming. However, full expression of this bimodal summation is dependent on the central coincidence of the visual and auditory inputs. These results are considered in the context of studies demonstrating multimodal activation of regions involved in speech production.
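The race-model test mentioned in this abstract is conventionally carried out with Miller's race-model inequality: under a race account, the CDF of redundant-signal RTs should not exceed the sum of the two unimodal CDFs. The following Python sketch checks the inequality on simulated placeholder RTs, not the study's data.

```python
# Illustrative sketch of a race-model inequality check: a positive maximum
# violation suggests coactivation rather than a race between modalities.
import numpy as np

rng = np.random.default_rng(0)
rt_audio = rng.normal(420, 60, 500)     # unimodal auditory RTs (ms), simulated
rt_visual = rng.normal(520, 60, 500)    # unimodal visual RTs (ms), simulated
rt_bimodal = rng.normal(390, 50, 500)   # redundant audiovisual RTs (ms), simulated

def ecdf(samples: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Empirical cumulative distribution of the samples evaluated at times t."""
    return np.searchsorted(np.sort(samples), t, side="right") / len(samples)

t = np.linspace(250, 700, 200)
race_bound = np.minimum(ecdf(rt_audio, t) + ecdf(rt_visual, t), 1.0)
violation = ecdf(rt_bimodal, t) - race_bound
print("max race-model violation:", violation.max())   # > 0 implies a violation
```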

10.
11.
When participants judge multimodal audiovisual stimuli, the auditory information strongly dominates temporal judgments, whereas the visual information dominates spatial judgments. However, temporal judgments are not independent of spatial features. For example, in the kappa effect, the time interval between two marker stimuli appears longer when they originate from spatially distant sources rather than from the same source. We investigated the kappa effect for auditory markers presented with accompanying irrelevant visual stimuli. The spatial sources of the markers were varied such that they were either congruent or incongruent across modalities. In two experiments, we demonstrated that the spatial layout of the visual stimuli affected perceived auditory interval duration. This effect occurred although the visual stimuli were designated to be task-irrelevant for the duration reproduction task in Experiment 1, and even when the visual stimuli did not contain sufficient temporal information to perform a two-interval comparison task in Experiment 2. We conclude that the visual and auditory marker stimuli were integrated into a combined multisensory percept containing temporal as well as task-irrelevant spatial aspects of the stimulation. Through this multisensory integration process, visuospatial information affected even temporal judgments, which are typically dominated by the auditory modality.

12.
Research has shown that auditory speech recognition is influenced by the appearance of a talker's face, but the actual nature of this visual information has yet to be established. Here, we report three experiments that investigated visual and audiovisual speech recognition using color, gray-scale, and point-light talking faces (which allowed comparison with the influence of isolated kinematic information). Auditory and visual forms of the syllables /ba/, /bi/, /ga/, /gi/, /va/, and /vi/ were used to produce auditory, visual, congruent, and incongruent audiovisual speech stimuli. Visual speech identification and visual influences on identifying the auditory components of congruent and incongruent audiovisual speech were identical for color and gray-scale faces and were much greater than for point-light faces. These results indicate that luminance, rather than color, underlies visual and audiovisual speech perception and that this information is more than the kinematic information provided by point-light faces. Implications for processing visual and audiovisual speech are discussed.
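The contrast between color and gray-scale faces amounts to discarding chromatic information while preserving luminance. A minimal Python sketch of that reduction is given below, assuming standard Rec. 601 luma weights, which may differ from whatever conversion the study actually used.

```python
# Illustrative sketch: reducing a colour frame to a luminance image,
# the information the results suggest carries visual speech.
import numpy as np

def to_luminance(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame (0-255) to a gray-scale luminance image."""
    weights = np.array([0.299, 0.587, 0.114])   # assumed Rec. 601 luma coefficients
    return rgb_frame[..., :3].astype(float) @ weights

if __name__ == "__main__":
    frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
    gray = to_luminance(frame)
    print(gray.shape, gray.min(), gray.max())
```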

13.
Three experiments investigated the "McGurk effect", whereby optically specified syllables experienced synchronously with acoustically specified syllables integrate in perception to determine a listener's auditory perceptual experience. The experiments contrasted the cross-modal effect of orthographic syllables on acoustic syllables, presumed to be associated in experience and memory, with that of haptically experienced syllables on acoustic syllables, presumed not to be associated. The latter pairing gave rise to cross-modal influences when subjects were informed that the cross-modal syllables were paired independently. Mouthed syllables affected reports of simultaneously heard syllables (and vice versa). These effects were absent when syllables were simultaneously seen (spelled) and heard. The McGurk effect thus does not arise from association in memory but from conjoint near-specification of the same causal source in the environment: in speech, the moving vocal tract producing phonetic gestures.

14.
We are constantly exposed to our own face and voice, and we identify our own faces and voices as familiar. However, the influence of self-identity upon self-speech perception is still uncertain. Speech perception is a synthesis of both auditory and visual inputs; although we hear our own voice when we speak, we rarely see the dynamic movements of our own face. If visual speech and identity are processed independently, no processing advantage would obtain in viewing one's own highly familiar face. In the present experiment, the relative contributions of facial and vocal inputs to speech perception were evaluated with an audiovisual illusion. Our results indicate that auditory self-speech conveys a processing advantage, whereas visual self-speech does not. The data thereby support a model of visual speech as dynamic movement processed separately from speaker recognition.

15.
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.
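The masking manipulation underlying these detection and intelligibility experiments is typically implemented by mixing speech with noise at a fixed signal-to-noise ratio. The sketch below shows one way to do this in Python with NumPy; the stand-in speech token and the -6 dB SNR are arbitrary assumptions, not the stimuli of the cited work.

```python
# Illustrative sketch: embedding a speech signal in noise at a chosen
# signal-to-noise ratio by rescaling the noise.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale the noise so that the speech-to-noise power ratio equals snr_db."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    speech = 0.2 * np.sin(2 * np.pi * 200 * t)      # stand-in for a speech token
    noise = np.random.randn(sr)
    mixed = mix_at_snr(speech, noise, snr_db=-6.0)  # speech 6 dB below the noise
    print(mixed.shape)
```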

16.
The ability to make accurate audiovisual synchrony judgments is affected by the "complexity" of the stimuli: We are much better at making judgments when matching single beeps or flashes as opposed to video recordings of speech or music. In the present study, we investigated whether the predictability of sequences affects whether participants report that auditory and visual sequences appear to be temporally coincident. When we reduced their ability to predict both the next pitch in the sequence and the temporal pattern, we found that participants were increasingly likely to report that the audiovisual sequences were synchronous. However, when we manipulated pitch and temporal predictability independently, the same effect did not occur. By altering the temporal density (items per second) of the sequences, we further determined that the predictability effect occurred only in temporally dense sequences: If the sequences were slow, participants' responses did not change as a function of predictability. We propose that reduced predictability affects synchrony judgments by reducing the effective pitch and temporal acuity in perception of the sequences.

17.
It is shown that an irrelevant visual perception interferes more with verbal learning by means of imagery than does an irrelevant auditory perception. The relative interfering effects of these perceptions were reversed in a verbal learning task involving highly abstract materials. Such results implicate the existence of a true visual component in imaginal mediation. A theoretical model is presented in which a visual system and a verbal-auditory system are distinguished. The visual system controls visual perception and visual imagination. The verbal-auditory system controls auditory perception, auditory imagination, internal verbal representation, and speech. Attention can be more easily divided between the two systems than within either one taken by itself. Furthermore, the visual and verbal-auditory systems are functionally linked by information recoding operations. The application of mnemonic imagery appears to involve a recoding of initially verbal information into visual form, and then the encoding of a primarily visual schema into memory. During recall, the schema is decoded as a visual image, and then recoded once again into the verbal-auditory system. Evidence for such transformations is provided not only by the interference data, but also by an analysis of recall errors made by Ss using mnemonic imagery.

18.
The human voice is the carrier of speech, but also an "auditory face" that conveys important affective and identity information. Little is known about the neural bases of our abilities to perceive such paralinguistic information in voice. Results from recent neuroimaging studies suggest that the different types of vocal information could be processed in partially dissociated functional pathways, and support a neurocognitive model of voice perception largely similar to that proposed for face perception.

19.
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented; a strong correlation was found between the timing of perceptual transitions and the timing of video switches. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

20.
When listeners hear a sinusoidal replica of a sentence, they perceive linguistic properties despite the absence of short-time acoustic components typical of vocal signals. Is this accomplished by a postperceptual strategy that accommodates the anomalous acoustic pattern ad hoc, or is a sinusoidal sentence understood by the ordinary means of speech perception? If listeners treat sinusoidal signals as speech signals, however unlike speech they may be, then perception should exhibit the commonplace sensitivity to the dimensions of the originating vocal tract. The present study, employing sinusoidal signals, raised this issue by testing the identification of target /bVt/ (b-vowel-t) syllables occurring in sentences that differed in the range of frequency variation of their component tones. Vowel quality of the target syllables was influenced by this acoustic correlate of vocal-tract scale, implying that the perception of these nonvocal signals includes a process of vocal-tract normalization. Converging evidence suggests that the perception of sinusoidal vowels depends on the relation among the component tones and not on the phonetic likeness of each tone in isolation. The findings support the general claim that sinusoidal replicas of natural speech signals are perceptible phonetically because they preserve time-varying information present in natural signals.
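A sinusoidal replica of the kind described here is built by replacing each formant track with a single time-varying sine wave and summing the tones. The Python sketch below illustrates that synthesis step; the formant trajectories and amplitude envelope are invented placeholders rather than tracks measured from real speech.

```python
# Illustrative sketch of sine-wave replica synthesis: each (placeholder)
# formant track drives one tone whose instantaneous frequency follows the
# track, and the tones are summed.
import numpy as np

def sine_wave_tone(freq_track: np.ndarray, amp_track: np.ndarray, sr: int) -> np.ndarray:
    """Synthesise one tone whose frequency and amplitude follow the given tracks."""
    phase = 2 * np.pi * np.cumsum(freq_track) / sr
    return amp_track * np.sin(phase)

if __name__ == "__main__":
    sr, dur = 16_000, 0.5
    n = int(sr * dur)
    # Placeholder trajectories roughly spanning typical F1-F3 ranges.
    f1 = np.linspace(300, 700, n)
    f2 = np.linspace(2200, 1100, n)
    f3 = np.full(n, 2500.0)
    amp = np.hanning(n)
    replica = sum(sine_wave_tone(f, amp, sr) for f in (f1, f2, f3))
    print(replica.shape)
```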
