Similar Articles
20 similar articles found
1.
McCotter MV, Jordan TR. Perception, 2003, 32(8): 921-936
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, were reduced for LI faces relative to NC and GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, were reduced for LI and CLI faces relative to NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.

2.
Audiovisual timing perception can recalibrate following prolonged exposure to asynchronous auditory and visual inputs. It has been suggested that this might contribute to achieving perceptual synchrony for auditory and visual signals despite differences in physical and neural signal times for sight and sound. However, given that people can be concurrently exposed to multiple audiovisual stimuli with variable neural signal times, a mechanism that recalibrates all audiovisual timing percepts to a single timing relationship could be dysfunctional. In the experiments reported here, we showed that audiovisual temporal recalibration can be specific for particular audiovisual pairings. Participants were shown alternating movies of male and female actors containing positive and negative temporal asynchronies between the auditory and visual streams. We found that audiovisual synchrony estimates for each actor were shifted toward the preceding audiovisual timing relationship for that actor and that such temporal recalibrations occurred in positive and negative directions concurrently. Our results show that humans can form multiple concurrent estimates of appropriate timing for audiovisual synchrony.
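As a toy illustration only (this is not the authors' model or analysis; the actor labels, lag values, and gain are invented), pairing-specific recalibration can be sketched in Python by letting each actor's synchrony estimate drift a fixed fraction of the way toward the asynchrony shown for that actor, so that shifts in opposite directions coexist:

    # Toy sketch of pairing-specific temporal recalibration (all values hypothetical):
    # each actor's point of subjective simultaneity (PSS) shifts toward the
    # audiovisual lag that actor was presented with.
    exposed_lag_ms = {"female_actor": +200, "male_actor": -200}  # + = vision leads
    gain = 0.1  # hypothetical fraction of the exposed lag adopted after exposure

    pss_shift_ms = {actor: gain * lag for actor, lag in exposed_lag_ms.items()}
    print(pss_shift_ms)  # {'female_actor': 20.0, 'male_actor': -20.0}

The point of the sketch is only that a single global timing estimate cannot produce shifts of both signs at once, whereas a per-pairing estimate can.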

3.
Infant perception often deals with audiovisual speech input, and a first step in processing this input is to perceive both visual and auditory information. The speech directed to infants has special characteristics and may enhance visual aspects of speech. The current study was designed to explore the impact of visual enhancement in infant-directed speech (IDS) on audiovisual mismatch detection in a naturalistic setting. Twenty infants participated in an experiment with a visual fixation task conducted in participants’ homes. Stimuli consisted of IDS and adult-directed speech (ADS) syllables with a plosive and the vowel /a:/, /i:/ or /u:/. These were either audiovisually congruent or incongruent. Infants looked longer at incongruent than congruent syllables and longer at IDS than ADS syllables, indicating that IDS and incongruent stimuli contain cues that can make audiovisual perception challenging and thereby attract infants’ gaze.

4.
Research has shown that auditory speech recognition is influenced by the appearance of a talker's face, but the actual nature of this visual information has yet to be established. Here, we report three experiments that investigated visual and audiovisual speech recognition using color, gray-scale, and point-light talking faces (which allowed comparison with the influence of isolated kinematic information). Auditory and visual forms of the syllables /ba/, /bi/, /ga/, /gi/, /va/, and /vi/ were used to produce auditory, visual, congruent, and incongruent audiovisual speech stimuli. Visual speech identification and visual influences on identifying the auditory components of congruent and incongruent audiovisual speech were identical for color and gray-scale faces and were much greater than for point-light faces. These results indicate that luminance, rather than color, underlies visual and audiovisual speech perception and that this information is more than the kinematic information provided by point-light faces. Implications for processing visual and audiovisual speech are discussed.

5.
Vatakis A, Spence C. Perception, 2008, 37(1): 143-160
Research has shown that inversion is more detrimental to the perception of faces than to the perception of other types of visual stimuli. Inverting a face results in an impairment of configural information processing that leads to slowed early face processing and reduced accuracy when performance is tested in face recognition tasks. We investigated the effects of inverting speech and non-speech stimuli on audiovisual temporal perception. Upright and inverted audiovisual video clips of a person uttering syllables (experiments 1 and 2), playing musical notes on a piano (experiment 3), or a rhesus monkey producing vocalisations (experiment 4) were presented. Participants made unspeeded temporal-order judgments regarding which modality stream (auditory or visual) appeared to have been presented first. Inverting the visual stream did not have any effect on the sensitivity of temporal discrimination responses in any of the four experiments, implying that audiovisual temporal integration is resilient to changes of orientation in the picture plane. By contrast, the point of subjective simultaneity differed significantly as a function of orientation for the audiovisual speech stimuli, but not for the non-speech stimuli or monkey calls. That is, smaller auditory leads were required for the inverted than for the upright visual speech stimuli. These results are consistent with the longer processing latencies reported previously when human faces are inverted, and demonstrate that the temporal perception of dynamic audiovisual speech can be modulated by changes in the physical properties of the visual speech (i.e., by changes in orientation).
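For background (this is generic psychophysics, not code from the paper), the point of subjective simultaneity (PSS) and temporal sensitivity in a temporal-order judgment task are typically estimated by fitting a cumulative Gaussian to the proportion of "visual first" responses across stimulus onset asynchronies. A minimal sketch, with invented response proportions and assuming SciPy is available:

    # Fit a cumulative Gaussian to temporal-order-judgment data to estimate the
    # point of subjective simultaneity (PSS) and just-noticeable difference (JND).
    # The SOAs and response proportions below are invented for illustration.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    soa_ms = np.array([-300, -200, -100, 0, 100, 200, 300])  # negative = audio led
    p_visual_first = np.array([0.05, 0.10, 0.30, 0.55, 0.80, 0.93, 0.97])

    def cum_gauss(x, pss, sigma):
        # pss: the SOA at which both orders are reported equally often (p = 0.5)
        return norm.cdf(x, loc=pss, scale=sigma)

    (pss, sigma), _ = curve_fit(cum_gauss, soa_ms, p_visual_first, p0=(0.0, 100.0))
    jnd = sigma * norm.ppf(0.75)  # SOA change separating 50% from 75% performance
    print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")

In these terms, the paper's result is a shift in the fitted PSS for inverted speech faces with no change in sigma: a change in perceived synchrony without any loss of discrimination sensitivity.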

6.
Perceptual changes are experienced during rapid and continuous repetition of a speech form, leading to an auditory illusion known as the verbal transformation effect. Although verbal transformations are considered to reflect mainly the perceptual organization and interpretation of speech, the present study was designed to test whether or not speech production constraints may participate in the emergence of verbal representations. With this goal in mind, we examined whether variations in the articulatory cohesion of repeated nonsense words--specifically, temporal relationships between articulatory events--could lead to perceptual asymmetries in verbal transformations. The first experiment revealed variations in timing relations between two consonantal gestures embedded in various nonsense syllables in a repetitive speech production task. In the second experiment, French participants repeatedly uttered these syllables while searching for verbal transformations. Syllable transformation frequencies followed the temporal clustering between consonantal gestures: The more synchronized the gestures, the more stable and attractive the syllable. In the third experiment, which involved a covert repetition mode, the pattern was maintained without external speech movements. However, when a purely perceptual condition was used in a fourth experiment, the previously observed perceptual asymmetries of verbal transformations disappeared. These experiments demonstrate the existence of an asymmetric bias in the verbal transformation effect linked to articulatory control constraints. The persistence of this effect from an overt to a covert repetition procedure provides evidence that articulatory stability constraints originating from the action system may be involved in auditory imagery. The absence of the asymmetric bias during a purely auditory procedure rules out perceptual mechanisms as a possible explanation of the observed asymmetries.

7.
Perception of visual speech and the influence of visual speech on auditory speech perception is affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.

8.
Three experiments are reported on the influence of different timing relations on the McGurk effect. In the first experiment, it is shown that strict temporal synchrony between auditory and visual speech stimuli is not required for the McGurk effect. Subjects were strongly influenced by the visual stimuli when the auditory stimuli lagged the visual stimuli by as much as 180 msec. In addition, a stronger McGurk effect was found when the visual and auditory vowels matched. In the second experiment, we paired auditory and visual speech stimuli produced under different speaking conditions (fast, normal, clear). The results showed that the manipulations in both the visual and auditory speaking conditions independently influenced perception. In addition, there was a small but reliable tendency for the better matched stimuli to elicit more McGurk responses than unmatched conditions. In the third experiment, we combined auditory and visual stimuli produced under different speaking conditions (fast, clear) and delayed the acoustics with respect to the visual stimuli. The subjects showed the same pattern of results as in the second experiment. Finally, the delay did not cause different patterns of results for the different audiovisual speaking style combinations. The results suggest that perceivers may be sensitive to the concordance of the time-varying aspects of speech, but they do not require temporal coincidence of that information.

9.
Although music and dance are often experienced simultaneously, it is unclear what modulates their perceptual integration. This study investigated how two factors related to music–dance correspondences influenced audiovisual binding of their rhythms: the metrical match between the music and dance, and the kinematic familiarity of the dance movement. Participants watched a point-light figure dancing synchronously to a triple-meter rhythm that they heard in parallel, whereby the dance communicated a triple (congruent) or a duple (incongruent) visual meter. The movement was either the participant’s own or that of another participant. Participants attended to both streams while detecting a temporal perturbation in the auditory beat. The results showed lower sensitivity to the auditory deviant when the visual dance was metrically congruent to the auditory rhythm and when the movement was the participant’s own. This indicated stronger audiovisual binding and a more coherent bimodal rhythm in these conditions, thus making a slight auditory deviant less noticeable. Moreover, binding in the metrically incongruent condition involving self-generated visual stimuli was correlated with self-recognition of the movement, suggesting that action simulation mediates the perceived coherence between one’s own movement and a mismatching auditory rhythm. Overall, the mechanisms of rhythm perception and action simulation could inform the perceived compatibility between music and dance, thus modulating the temporal integration of these audiovisual stimuli.
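Sensitivity in deviant-detection designs of this kind is conventionally summarized with the signal-detection measure d'; the sketch below is not the study's code (the trial counts are hypothetical), but it shows the standard computation from hit and false-alarm counts:

    # Standard d' computation from hit/false-alarm counts (illustrative only).
    from scipy.stats import norm

    def d_prime(hits, misses, false_alarms, correct_rejections):
        # Log-linear correction avoids infinite z-scores when a rate is 0 or 1.
        hit_rate = (hits + 0.5) / (hits + misses + 1)
        fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
        return norm.ppf(hit_rate) - norm.ppf(fa_rate)

    # Hypothetical counts; a lower d' for the metrically congruent dance would
    # match the reported pattern (stronger binding makes the deviant harder to hear).
    print(d_prime(30, 10, 8, 32))  # congruent   -> ~1.5
    print(d_prime(35, 5, 6, 34))   # incongruent -> ~2.1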

10.
Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we examined perception of both audiovisually congruent and audiovisually incongruent speech in school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults. In the first experiment, all participants watched videos of a talker articulating syllables ‘ba’, ‘da’, and ‘ga’ under three conditions – audiovisual (AV), auditory only (A), and visual only (V). The amplitude of the N1 (but not of the P2) event-related component elicited in the AV condition was significantly reduced compared to the N1 amplitude measured from the sum of the A and V conditions in all groups of participants. Because N1 attenuation to AV speech is thought to index the degree to which facial movements predict the onset of the auditory signal, our findings suggest that this aspect of audiovisual speech perception is mature by mid-childhood and is normal in the H-SLI children. In the second experiment, participants watched videos of audiovisually incongruent syllables created to elicit the so-called McGurk illusion (with an auditory ‘pa’ dubbed onto a visual articulation of ‘ka’, the expected percept being ‘ta’ if audiovisual integration took place). As a group, H-SLI children were significantly more likely than either TD children or adults to hear the McGurk syllable as ‘pa’ (in agreement with its auditory component) than as ‘ka’ (in agreement with its visual component), suggesting that susceptibility to the McGurk illusion is reduced in at least some children with a history of SLI. Taken together, the results of the two experiments argue against global audiovisual integration impairment in children with a history of SLI and suggest that, when present, audiovisual integration difficulties in this population likely stem from a later (non-sensory) stage of processing.
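The N1 analysis in the first experiment follows the common additive-model logic for audiovisual ERPs: the response to AV stimuli is compared with the algebraic sum of the A-only and V-only responses, and a reduced N1 for AV relative to (A + V) is read as integration. A minimal sketch of that comparison (array shapes, sampling rate, and the random placeholder data are all invented):

    # Additive-model ERP comparison: AV versus the sum of A-only and V-only.
    # Epoch arrays are random placeholders of shape (n_trials, n_timepoints),
    # so the printed value is meaningless; only the computation is illustrated.
    import numpy as np

    rng = np.random.default_rng(0)
    n_trials, n_times = 60, 500        # e.g. 500 samples at an assumed 1000 Hz
    av = rng.normal(size=(n_trials, n_times))
    a = rng.normal(size=(n_trials, n_times))
    v = rng.normal(size=(n_trials, n_times))

    erp_av = av.mean(axis=0)           # average across trials -> ERP waveform
    erp_sum = a.mean(axis=0) + v.mean(axis=0)

    n1_window = slice(80, 120)         # roughly 80-120 ms after stimulus onset
    attenuation = erp_sum[n1_window].mean() - erp_av[n1_window].mean()
    print(f"N1 attenuation (sum minus AV): {attenuation:.3f}")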

11.
The visible movement of a talker's face is an influential component of speech perception. However, the ability of this influence to function when large areas of the face (~50%) are covered by simple substantial occlusions, and so are not visible to the observer, has yet to be fully determined. In Experiment 1, both visual speech identification and the influence of visual speech on identifying congruent and incongruent auditory speech were investigated using displays of a whole (unoccluded) talking face and of the same face occluded vertically so that the entire left or right hemiface was covered. Both the identification of visual speech and its influence on auditory speech perception were identical across all three face displays. Experiment 2 replicated and extended these results, showing that visual and audiovisual speech perception also functioned well with other simple substantial occlusions (horizontal and diagonal). Indeed, displays in which entire upper facial areas were occluded produced performance levels equal to those obtained with unoccluded displays. Occluding entire lower facial areas elicited some impairments in performance, but visual speech perception and visual speech influences on auditory speech perception were still apparent. Finally, implications of these findings for understanding the processes supporting visual and audiovisual speech perception are discussed.

12.
In 3 experiments, auditory massed repetition was used to examine age-related differences in habituation by means of the verbal transformation paradigm. Participants heard 10 words (5 high frequency and 5 low frequency), each presented 180 times, and they reported perceived changes in the repeated words (verbal transformations). In these experiments, older adults reported fewer illusory percepts than young adults. Older adults' loss of auditory acuity and slowing of processing, stimulus degradation (in young adults), and instructions biasing the report of these illusory percepts did not account for the fewer illusory percepts reported by the older adults. These findings suggest that older adults' reduced susceptibility to habituation arises from centrally located declines in the transmission of information within the word-recognition pathway. The discussion focuses on the implications that these age-related declines may have on word identification during on-line speech perception.

13.
Buchan JN, Munhall KG. Perception, 2011, 40(10): 1164-1182
Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.

14.
Multisensory integration can play a critical role in producing unified and reliable perceptual experience. When sensory information in one modality is degraded or ambiguous, information from other senses can crossmodally resolve perceptual ambiguities. Prior research suggests that auditory information can disambiguate the contents of visual awareness by facilitating perception of intermodally consistent stimuli. However, it is unclear whether these effects are truly due to crossmodal facilitation or are mediated by voluntary selective attention to audiovisually congruent stimuli. Here, we demonstrate that sounds can bias competition in binocular rivalry toward audiovisually congruent percepts, even when participants have no recognition of the congruency. When speech sounds were presented in synchrony with speech-like deformations of rivalling ellipses, ellipses with crossmodally congruent deformations were perceptually dominant over those with incongruent deformations. This effect was observed in participants who could not identify the crossmodal congruency in an open-ended interview (Experiment 1) or detect it in a simple 2AFC task (Experiment 2), suggesting that the effect was not due to voluntary selective attention or response bias. These results suggest that sound can automatically disambiguate the contents of visual awareness by facilitating perception of audiovisually congruent stimuli.
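The congruency-detection check in Experiment 2 is the kind of analysis usually handled with a binomial test of 2AFC accuracy against chance (0.5); a hedged sketch with hypothetical counts, assuming SciPy >= 1.7 for binomtest:

    # Test whether 2AFC congruency detection exceeds chance (counts are hypothetical).
    from scipy.stats import binomtest

    correct, total = 21, 40  # near the 50% chance level
    result = binomtest(correct, total, p=0.5, alternative="greater")
    print(f"accuracy = {correct / total:.2f}, p = {result.pvalue:.3f}")
    # A non-significant result is what licenses the claim that observers could
    # not explicitly detect the audiovisual congruency.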

15.
The authors investigated the effects of changes in horizontal viewing angle on visual and audiovisual speech recognition in 4 experiments, using a talker's face viewed full face, three quarters, and in profile. When only experimental items were shown (Experiments 1 and 2), identification of unimodal visual speech and visual speech influences on congruent and incongruent auditory speech were unaffected by viewing angle changes. However, when experimental items were intermingled with distractor items (Experiments 3 and 4), identification of unimodal visual speech decreased with profile views, whereas visual speech influences on congruent and incongruent auditory speech remained unaffected by viewing angle changes. These findings indicate that audiovisual speech recognition withstands substantial changes in horizontal viewing angle, but explicit identification of visual speech is less robust. Implications of this distinction for understanding the processes underlying visual and audiovisual speech recognition are discussed.

16.
In the McGurk effect, visual information specifying a speaker’s articulatory movements can influence auditory judgments of speech. In the present study, we attempted to find an analogue of the McGurk effect by using nonspeech stimuli: the discrepant audiovisual tokens of plucks and bows on a cello. The results of an initial experiment revealed that subjects’ auditory judgments were influenced significantly by the visual pluck and bow stimuli. However, a second experiment in which speech syllables were used demonstrated that the visual influence on consonants was significantly greater than the visual influence observed for pluck-bow stimuli. This result could be interpreted to suggest that the nonspeech visual influence was not a true McGurk effect. In a third experiment, visual stimuli consisting of the words pluck and bow were found to have no influence over auditory pluck and bow judgments. This result could suggest that the nonspeech effects found in Experiment 1 were based on the audio and visual information’s having an ostensive lawful relation to the specified event. These results are discussed in terms of motor-theory, ecological, and FLMP approaches to speech perception.

17.
Cognitive scientists routinely distinguish between controlled and automatic mental processes. Through learning, practice, and exposure, controlled processes can become automatic; however, whether automatic processes can become deautomatized – recuperated under the purview of control – remains unclear. Here we show that a suggestion derails a deeply ingrained process involving involuntary audiovisual integration. We compared the performance of highly versus less hypnotically suggestible individuals (HSIs versus LSIs) in a classic McGurk paradigm – a perceptual illusion task demonstrating the influence of visual facial movements on auditory speech percepts. Following a posthypnotic suggestion to prioritize auditory input, HSIs but not LSIs manifested fewer illusory auditory perceptions and correctly identified more auditory percepts. Our findings demonstrate that a suggestion deautomatized a ballistic audiovisual process in HSIs. In addition to guiding our knowledge regarding theories and mechanisms of automaticity, the present findings pave the road to a more scientific understanding of top-down effects and multisensory integration.

18.
Speech prosody has traditionally been considered solely in terms of its auditory features, yet correlated visual features exist, such as head and eyebrow movements. This study investigated the extent to which visual prosodic features are able to affect the perception of the auditory features. Participants were presented with videos of a speaker pronouncing two words, with visual features of emphasis on one of these words. For each trial, participants saw one video where the two words were identical in both pitch and amplitude, and another video where there was a difference in either pitch or amplitude that was congruent or incongruent with the visual changes. Participants were asked to decide which video contained the sound difference. Thresholds were obtained for the congruent and incongruent videos, and for an auditory-alone condition. It was found that the congruent thresholds were better than the incongruent thresholds for both pitch and amplitude changes. Interestingly, the congruent thresholds for amplitude were better than for the auditory-alone condition, which implies that the visual features improve sensitivity to loudness changes. These results demonstrate that visual stimuli can affect auditory thresholds for changes in pitch and amplitude, and furthermore support the view that visual prosodic features enhance speech processing.
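Thresholds in paradigms like this are often measured with an adaptive staircase; the sketch below is a generic 2-down/1-up procedure (not the authors' exact method, and the simulated observer and all values are invented), a rule that converges near 70.7% correct:

    # Generic 2-down/1-up staircase on a pitch difference (all values hypothetical).
    import random
    from statistics import mean

    random.seed(1)
    level, step = 8.0, 1.0      # current pitch difference (Hz) and step size
    streak, direction = 0, -1   # consecutive correct count; -1 means descending
    reversals = []

    def simulated_correct(level):
        # Stand-in observer: accuracy grows with the size of the difference.
        return random.random() < min(0.99, 0.5 + level / 10)

    for _ in range(80):
        if simulated_correct(level):
            streak += 1
            if streak == 2:                  # two correct in a row -> harder
                streak = 0
                if direction == +1:
                    reversals.append(level)  # direction change = reversal
                direction = -1
                level = max(0.5, level - step)
        else:
            streak = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step

    print(f"threshold ~ {mean(reversals[-6:]):.2f} Hz")  # mean of last reversals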

19.
康冠兰, 罗霄骁. 《心理科学》, 2020, (5): 1072-1078
Cross-modal information interaction refers to the set of processes by which information from one sensory modality interacts with, and influences, information from another sensory modality. It covers two main questions: how inputs from different sensory modalities are integrated, and how conflicts between cross-modal signals are controlled. This article reviews the behavioral and neural mechanisms of audiovisual cross-modal integration and conflict control, and discusses how attention affects both. Future work should investigate the brain-network mechanisms of audiovisual cross-modal processing and examine cross-modal integration and conflict control in special populations, to help reveal the mechanisms underlying their cognitive and social-functional impairments.

20.
Reports of sex differences in language processing are inconsistent and are thought to vary by task type and difficulty. In two experiments, we investigated a sex difference in visual influence on heard speech (the McGurk effect). First, incongruent consonant-vowel stimuli were presented where the visual portion of the signal was brief (100 msec) or full (temporally equivalent to the auditory). Second, to determine whether men and women differed in their ability to extract visual speech information from these brief stimuli, the same stimuli were presented to new participants with an additional visual-only (lipread) condition. In both experiments, women showed a significantly greater visual influence on heard speech than did men for the brief visual stimuli. No sex differences for the full stimuli or in the ability to lipread were found. These findings indicate that the more challenging brief visual stimuli elicit sex differences in the processing of audiovisual speech.
