Similar Documents
20 similar documents found
1.
Identity perception often takes place in multimodal settings, where perceivers have access to both visual (face) and auditory (voice) information. Despite this, identity perception is usually studied in unimodal contexts, where face and voice identity perception are modelled independently of one another. In this study, we asked whether and how much auditory and visual information contribute to audiovisual identity perception from naturally-varying stimuli. In a between-subjects design, participants completed an identity sorting task with either dynamic video-only, audio-only, or dynamic audiovisual stimuli. In this task, participants were asked to sort multiple, naturally-varying stimuli from three different people by perceived identity. We found that identity perception was more accurate for video-only and audiovisual stimuli than for audio-only stimuli. Interestingly, there was no difference in accuracy between video-only and audiovisual stimuli. Auditory information nonetheless played a role alongside visual information, as audiovisual identity judgements per stimulus could be predicted from both auditory and visual identity judgements. While the relationship between visual and audiovisual judgements was the stronger one, auditory information still uniquely explained a significant portion of the variance in audiovisual identity judgements. Our findings thus align with previous theoretical and empirical work proposing that, compared with faces, voices are an important but less salient and weaker cue to identity perception. We expand on this work to show that, at least in the context of this study, having access to voices in addition to faces does not improve identity perception accuracy.
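The analysis described above, predicting per-stimulus audiovisual identity judgements from the corresponding unimodal judgements and asking how much variance each modality uniquely explains, can be illustrated with a simple incremental-R-squared regression. A minimal sketch in Python; all numbers below are invented placeholders, not data from the study:

import numpy as np

rng = np.random.default_rng(0)
n = 120                                   # hypothetical number of stimuli
aud = rng.normal(size=n)                  # audio-only identity judgements
vis = 0.3 * aud + rng.normal(size=n)      # video-only judgements, correlated with audio
av = 0.25 * aud + 0.65 * vis + rng.normal(scale=0.5, size=n)  # audiovisual judgements

def r_squared(X, y):
    # R^2 of an ordinary least-squares fit with an intercept term
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_full = r_squared(np.column_stack([aud, vis]), av)   # both predictors
r2_vis = r_squared(vis[:, None], av)                   # vision alone
r2_aud = r_squared(aud[:, None], av)                   # audio alone

# Incremental R^2: variance each modality explains beyond the other
print(f"unique to audio:  {r2_full - r2_vis:.3f}")
print(f"unique to vision: {r2_full - r2_aud:.3f}")

A nonzero audio increment over the vision-only model is the pattern the abstract reports: auditory information contributes uniquely even when the visual contribution is larger.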

2.
Previous research has shown that irrelevant sounds can facilitate the perception of visual apparent motion. Here the effectiveness of a single sound in facilitating motion perception was investigated in three experiments. Observers were presented with two discrete lights temporally separated by stimulus onset asynchronies from 0 to 350 ms. After each trial, observers classified their impression of the stimuli using a categorisation system. A short sound presented temporally (and spatially) midway between the lights facilitated the impression of motion relative to baseline (lights without sound), whereas a sound presented before the first light, after the second light, or simultaneously with the lights did not affect the motion impression. The facilitation effect also occurred with sound presented far from the visual display, as well as with a continuous sound that started with the first light and terminated with the second light. No facilitation of visual motion perception occurred if the sound was part of a tone sequence that allowed intramodal perceptual grouping of the auditory stimuli prior to the critical audiovisual stimuli. Taken together, the findings are consistent with a low-level audiovisual integration approach in which the perceptual system merges temporally proximate sound and light stimuli, thereby provoking the impression of a single multimodal moving object.

3.
Looming visual stimuli (log-increasing in proximal size over time) and looming auditory stimuli (increasing in sound intensity over time) have been shown to be perceived as longer than receding visual and auditory stimuli (i.e., looming stimuli reversed in time). Here, we investigated whether such an asymmetry in subjective duration also occurs for audiovisual looming and receding stimuli, as well as for stationary stimuli (i.e., stimuli that do not change in size and/or intensity over time). Our results showed a large temporal asymmetry in audition but no asymmetry in vision. In contrast, the audiovisual asymmetry was moderate, suggesting that multisensory percepts arise from the integration of unimodal percepts in a maximum-likelihood fashion.
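The maximum-likelihood integration mentioned in the final sentence is the standard cue-combination rule in which each unimodal estimate is weighted by its reliability (inverse variance), so the bimodal percept falls between the unimodal ones and is more precise than either. A hedged numerical sketch; the durations and variances are illustrative, not values from the study:

import numpy as np

# Hypothetical unimodal duration percepts (ms) and their variances
d_aud, var_aud = 620.0, 40.0**2    # audition: large looming/receding asymmetry
d_vis, var_vis = 500.0, 25.0**2    # vision: no asymmetry, more reliable here

# Reliability (inverse-variance) weights
w_aud = (1 / var_aud) / (1 / var_aud + 1 / var_vis)
w_vis = 1 - w_aud

# Maximum-likelihood audiovisual estimate and its (reduced) variance
d_av = w_aud * d_aud + w_vis * d_vis
var_av = (var_aud * var_vis) / (var_aud + var_vis)

print(f"weights: audio={w_aud:.2f}, visual={w_vis:.2f}")
print(f"audiovisual estimate: {d_av:.0f} ms, variance {var_av:.0f}")

On this account, a moderate audiovisual asymmetry follows naturally: the bimodal estimate inherits only a weighted fraction of the auditory asymmetry.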

4.
Both inhibition of return (IOR) and emotional stimuli guide attentional biases and improve search efficiency, but whether the two interact has so far remained unclear. This study used a cue-target paradigm with emotional stimuli presented audiovisually to examine the interaction between the processing of emotional stimuli and IOR. In Experiment 1, emotional stimuli were presented either as unimodal visual faces or as emotionally congruent audiovisual pairs. Experiment 2 presented emotionally incongruent audiovisual stimuli to further test whether the effect of congruent audiovisual emotional stimuli on IOR was driven by the congruent emotional stimulus in the auditory channel, that is, whether the auditory emotional stimulus was actually processed. The results showed that emotionally congruent audiovisual stimuli weakened IOR, whereas incongruent emotional stimuli did not interact with IOR, and IOR did not differ significantly between unimodal and bimodal presentation. These findings indicate that only emotionally congruent audiovisual stimuli affect IOR at the same processing stage, further supporting the perceptual-inhibition account of IOR.

5.
In musical performance, bodily gestures play an important role in communicating expressive intentions to audiences. Although previous studies have demonstrated that visual information can have an effect on the perceived expressivity of musical performances, the investigation of audiovisual interactions has been held back by the technical difficulties associated with the generation of controlled, mismatching stimuli. With the present study, we aimed to address this issue by utilizing a novel method in order to generate controlled, balanced stimuli that comprised both matching and mismatching bimodal combinations of different expressive intentions. The aim of Experiment 1 was to investigate the relative contributions of auditory and visual kinematic cues in the perceived expressivity of piano performances, and in Experiment 2 we explored possible crossmodal interactions in the perception of auditory and visual expressivity. The results revealed that although both auditory and visual kinematic cues contribute significantly to the perception of overall expressivity, the effect of visual kinematic cues appears to be somewhat stronger. These results also provide preliminary evidence of crossmodal interactions in the perception of auditory and visual expressivity. In certain performance conditions, visual cues had an effect on the ratings of auditory expressivity, and auditory cues had a small effect on the ratings of visual expressivity.

6.
We report a series of experiments designed to assess the effect of audiovisual semantic congruency on the identification of visually-presented pictures. Participants made unspeeded identification responses concerning a series of briefly-presented, and then rapidly-masked, pictures. A naturalistic sound was sometimes presented together with the picture at a stimulus onset asynchrony (SOA) that varied between 0 and 533 ms (auditory lagging). The sound could be semantically congruent, semantically incongruent, or else neutral (white noise) with respect to the target picture. The results showed that when the onset of the picture and sound occurred simultaneously, a semantically-congruent sound improved, whereas a semantically-incongruent sound impaired, participants’ picture identification performance, as compared to performance in the white-noise control condition. A significant facilitatory effect was also observed at SOAs of around 300 ms, whereas no such semantic congruency effects were observed at the longest interval (533 ms). These results therefore suggest that the neural representations associated with visual and auditory stimuli can interact in a shared semantic system. Furthermore, this crossmodal semantic interaction is not constrained by the need for the strict temporal coincidence of the constituent auditory and visual stimuli. We therefore suggest that audiovisual semantic interactions likely occur in a short-term buffer which rapidly accesses, and temporarily retains, the semantic representations of multisensory stimuli in order to form a coherent multisensory object representation. These results are explained in terms of Potter’s (1993) notion of conceptual short-term memory.

7.
Audiovisual timing perception can recalibrate following prolonged exposure to asynchronous auditory and visual inputs. It has been suggested that this might contribute to achieving perceptual synchrony for auditory and visual signals despite differences in physical and neural signal times for sight and sound. However, given that people can be concurrently exposed to multiple audiovisual stimuli with variable neural signal times, a mechanism that recalibrates all audiovisual timing percepts to a single timing relationship could be dysfunctional. In the experiments reported here, we showed that audiovisual temporal recalibration can be specific for particular audiovisual pairings. Participants were shown alternating movies of male and female actors containing positive and negative temporal asynchronies between the auditory and visual streams. We found that audiovisual synchrony estimates for each actor were shifted toward the preceding audiovisual timing relationship for that actor and that such temporal recalibrations occurred in positive and negative directions concurrently. Our results show that humans can form multiple concurrent estimates of appropriate timing for audiovisual synchrony.

8.
When auditory stimuli are used in two-dimensional spatial compatibility tasks, where the stimulus and response configurations vary along the horizontal and vertical dimensions simultaneously, a right–left prevalence effect occurs in which horizontal compatibility dominates over vertical compatibility. The right–left prevalence effects obtained with auditory stimuli are typically larger than those obtained with visual stimuli, even though less attention should be demanded by the horizontal dimension in auditory processing. In the present study, we examined whether auditory or visual dominance occurs when the two-dimensional stimuli are audiovisual, as well as whether there is cross-modal facilitation of response selection for the horizontal and vertical dimensions. We also examined whether there is an additional benefit of adding a pitch dimension to the auditory stimulus to facilitate vertical coding through the spatial-musical association of response codes (SMARC) effect, whereby pitch is coded in terms of height in space. In Experiment 1, we found a larger right–left prevalence effect for unimodal auditory than for visual stimuli. Neutral, non-pitch-coded audiovisual stimuli did not produce cross-modal facilitation, but did show evidence of visual dominance. The right–left prevalence effect was eliminated in the presence of SMARC audiovisual stimuli, but the pitch dimension influenced horizontal rather than vertical coding. Experiment 2 showed that the influence of the pitch dimension was not a matter of affecting response selection on a trial-to-trial basis, but of altering the salience of the task environment. Taken together, these findings indicate that, in the absence of salient vertical cues, auditory and audiovisual stimuli tend to be coded along the horizontal dimension, and that vision tends to dominate audition in this two-dimensional spatial stimulus–response task.

9.
McCotter, M. V., & Jordan, T. R. (2003). Perception, 32(8), 921-936.
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.

10.
Estimating time to contact (TTC) involves multiple sensory systems, including vision and audition. Previous findings suggested that the ratio of an object’s instantaneous optical size/sound intensity to its instantaneous rate of change in optical size/sound intensity (τ) drives TTC judgments. Other evidence has shown that heuristic-based cues are used, including final optical size or final sound pressure level. Most previous studies have used decontextualized and unfamiliar stimuli (e.g., geometric shapes on a blank background). Here we evaluated TTC estimates by using a traffic scene with an approaching vehicle to evaluate the weights of visual and auditory TTC cues under more realistic conditions. Younger (18–39 years) and older (65+ years) participants made TTC estimates in three sensory conditions: visual-only, auditory-only, and audio–visual. Stimuli were presented within an immersive virtual-reality environment, and cue weights were calculated for both visual cues (e.g., visual τ, final optical size) and auditory cues (e.g., auditory τ, final sound pressure level). The results demonstrated the use of visual τ as well as heuristic cues in the visual-only condition. TTC estimates in the auditory-only condition, however, were primarily based on an auditory heuristic cue (final sound pressure level), rather than on auditory τ. In the audio–visual condition, the visual cues dominated overall, with the highest weight being assigned to visual τ by younger adults, and a more equal weighting of visual τ and heuristic cues in older adults. Overall, better characterizing the effects of combined sensory inputs, stimulus characteristics, and age on the cues used to estimate TTC will provide important insights into how these factors may affect everyday behavior.
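The τ variable referred to above is the ratio of an object's instantaneous optical size (or sound intensity) to that size's instantaneous rate of change; for an object approaching at constant velocity, optical τ closely approximates the true time to contact. A toy demonstration of that relationship, with invented scene geometry rather than the study's stimuli:

import numpy as np

# Constant-velocity approach: object of width w at distance z(t) = z0 - v*t
w, z0, v = 2.0, 50.0, 10.0                      # metres, metres, metres/second
t = np.linspace(0.0, 2.0, 2001)
theta = 2 * np.arctan(w / (2 * (z0 - v * t)))   # optical angle over time
theta_dot = np.gradient(theta, t)               # its rate of change

tau = theta / theta_dot                         # visual tau
true_ttc = (z0 - v * t) / v                     # actual time to contact

# tau tracks the true TTC closely while the optical angle stays small
print(f"at t=1.0 s: tau={tau[1000]:.2f} s, true TTC={true_ttc[1000]:.2f} s")

The auditory analogue replaces optical angle with sound intensity and its rate of change; the heuristic cues named in the abstract (final optical size, final sound pressure level) would instead use only the last sample of theta or of intensity.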

11.
Vatakis, A., & Spence, C. (2008). Perception, 37(1), 143-160.
Research has shown that inversion is more detrimental to the perception of faces than to the perception of other types of visual stimuli. Inverting a face impairs configural information processing, which slows early face processing and reduces accuracy when performance is tested in face recognition tasks. We investigated the effects of inverting speech and non-speech stimuli on audiovisual temporal perception. Upright and inverted audiovisual video clips of a person uttering syllables (experiments 1 and 2), playing musical notes on a piano (experiment 3), or a rhesus monkey producing vocalisations (experiment 4) were presented. Participants made unspeeded temporal-order judgments regarding which modality stream (auditory or visual) appeared to have been presented first. Inverting the visual stream had no effect on the sensitivity of temporal discrimination responses in any of the four experiments, implying that audiovisual temporal integration is resilient to changes of orientation in the picture plane. By contrast, the point of subjective simultaneity differed significantly as a function of orientation for the audiovisual speech stimuli, but not for the non-speech stimuli or monkey calls. That is, smaller auditory leads were required for the inverted than for the upright visual speech stimuli. These results are consistent with the longer processing latencies reported previously when human faces are inverted and demonstrate that the temporal perception of dynamic audiovisual speech can be modulated by changes in the physical properties of the visual speech (i.e., by changes in orientation).

12.
This study investigated audiovisual synchrony perception in a rhythmic context, where the sound was not consequent upon the observed movement. Participants judged synchrony between a bouncing point-light figure and an auditory rhythm in two experiments. Two questions were of interest: (1) whether the reference in the visual movement, with which the auditory beat should coincide, relies on a position cue or a velocity cue; (2) whether the figure's form and motion profile affect synchrony perception. Experiment 1 required synchrony judgments with regard to the same (lowest) position of the movement in four visual conditions: two figure forms (human or non-human) combined with two motion profiles (human or ball trajectory). Whereas figure form did not affect synchrony perception, the point of subjective simultaneity differed between the two motions, suggesting that participants adopted the peak velocity in each downward trajectory as their visual reference. Experiment 2 further demonstrated that, when judgment was required with regard to the highest position, the maximal synchrony response was markedly lower for ball motion, which lacks a peak velocity in the upward trajectory. The finding of peak velocity as a cue parallels results of visuomotor synchronization tasks employing biological stimuli, suggesting that synchrony judgment with rhythmic motions relies on the perceived visual beat.

13.
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

14.
Here, we investigate how audiovisual context affects perceived event duration with experiments in which observers reported which of two stimuli they perceived as longer. Target events were visual and/or auditory and could be accompanied by nontargets in the other modality. Our results demonstrate that the temporal information conveyed by irrelevant sounds is automatically used when the brain estimates visual durations but that irrelevant visual information does not affect perceived auditory duration (Experiment 1). We further show that auditory influences on subjective visual durations occur only when the temporal characteristics of the stimuli promote perceptual grouping (Experiments 1 and 2). Placed in the context of scalar expectancy theory of time perception, our third and fourth experiments have the implication that audiovisual context can lead both to changes in the rate of an internal clock and to temporal ventriloquism-like effects on perceived on- and offsets. Finally, intramodal grouping of auditory stimuli diminished any crossmodal effects, suggesting a strong preference for intramodal over crossmodal perceptual grouping (Experiment 5).
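Scalar expectancy theory, invoked for the third and fourth experiments, models perceived duration as the number of pacemaker pulses accumulated over an interval, so a context-driven increase in clock rate lengthens subjective duration multiplicatively. A minimal sketch under that assumption; the pacemaker rates and the Poisson pulse model are illustrative choices, not parameters from the study:

import numpy as np

rng = np.random.default_rng(1)

def perceived_duration(true_ms, rate_hz, trials=10000):
    # Mean pulse count for an interval, using a Poisson pacemaker of the given rate
    counts = rng.poisson(rate_hz * true_ms / 1000.0, size=trials)
    return counts.mean()

baseline = perceived_duration(500, rate_hz=100)   # visual stimulus alone
speeded = perceived_duration(500, rate_hz=110)    # clock sped up by auditory context

# A faster clock yields more pulses for the same physical 500 ms,
# i.e., the visual interval is judged longer.
print(f"baseline count: {baseline:.1f}, with auditory context: {speeded:.1f}")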

15.
Tang, X., Sun, J., & Peng, X. (2020). Acta Psychologica Sinica, 52(3), 257-268.
Using a cue-target paradigm, this study manipulated two independent variables, target type (visual, auditory, audiovisual) and cue validity (valid cue, neutral condition, invalid cue), across three experiments to examine how divided attention across modalities affects audiovisual inhibition of return (IOR). Experiment 1 (auditory stimuli presented on the left/right) found that under divided attention across modalities, visual targets produced a significant IOR effect, whereas audiovisual targets did not. Experiment 2 (auditory stimuli presented on the left/right) and Experiment 3 (auditory stimuli presented centrally) found that under selective attention to the visual modality, both visual and audiovisual targets produced significant IOR effects that did not differ from each other. These results indicate that divided attention across modalities weakens the audiovisual IOR effect.

16.
Tang, X., Tong, J., Yu, H., & Wang, A. (2021). Acta Psychologica Sinica, 53(11), 1173-1188.
Using an endogenous-exogenous spatial cue-target paradigm, this study manipulated three independent variables: endogenous cue validity (valid, invalid), exogenous cue validity (valid, invalid), and target type (visual, auditory, audiovisual). Two experiments differing in task difficulty (Experiment 1: a simple localization task; Experiment 2: a more demanding discrimination task) examined how endogenous and exogenous spatial attention affect multisensory integration. Both experiments found that exogenous spatial attention significantly reduced the multisensory integration effect, whereas endogenous spatial attention did not significantly enhance it. Experiment 2 further showed that endogenous spatial attention modulated the reduction of multisensory integration by exogenous spatial attention. These results indicate that, unlike that of endogenous spatial attention, the influence of exogenous spatial attention on multisensory integration is largely insensitive to task difficulty, and that when the task is difficult, endogenous spatial attention affects the process by which exogenous attention reduces multisensory integration. We therefore infer that the modulation of multisensory integration by endogenous and exogenous spatial attention is not independent; rather, the two interact.

17.
Vatakis, A., & Spence, C. (in press; Crossmodal binding: Evaluating the 'unity assumption' using audiovisual speech stimuli, Perception & Psychophysics) recently demonstrated that when two briefly presented speech signals (one auditory and the other visual) refer to the same audiovisual speech event, people find it harder to judge their temporal order than when they refer to different speech events. Vatakis and Spence argued that the 'unity assumption' facilitated crossmodal binding on the former (matching) trials by means of a process of temporal ventriloquism. In the present study, we investigated whether the 'unity assumption' would also affect the binding of non-speech stimuli (video clips of object action or musical notes). The auditory and visual stimuli were presented at a range of stimulus onset asynchronies (SOAs) using the method of constant stimuli. Participants made unspeeded temporal order judgments (TOJs) regarding which modality stream had been presented first. The auditory and visual musical and object-action stimuli were either matched (e.g., the sight of a note being played on a piano together with the corresponding sound) or mismatched (e.g., the sight of a note being played on a piano together with the sound of a guitar string being plucked). However, in contrast to the results of Vatakis and Spence's speech study, no significant difference in the accuracy of temporal discrimination performance was observed for matched versus mismatched video clips. Reasons for this discrepancy are discussed.

18.
Perception of visual speech and the influence of visual speech on auditory speech perception is affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.

19.
Using a novel variant of dichotic selective listening, we examined the control of auditory selective attention. In our task, subjects had to respond selectively to one of two simultaneously presented auditory stimuli (number words), always spoken by a female and a male speaker, by performing a numerical size categorization. The gender of the task-relevant speaker could change, as indicated by a visual cue prior to auditory stimulus onset. Three experiments show clear performance costs with instructed attention switches. Experiment 2 varied the cuing interval to examine advance preparation for an attention switch. Experiment 3 additionally isolated auditory switch costs from visual cue priming by using two cues for each gender, so that a gender repetition could be indicated by a changed cue. Experiment 2 showed that switch costs decreased with prolonged cuing intervals, but Experiment 3 revealed that preparation did not affect auditory switch costs, only visual cue priming. Moreover, incongruent numerical categories in competing auditory stimuli produced interference and substantially increased error rates, suggesting continued processing of task-irrelevant information that often leads to responding to the incorrect auditory source. Together, the data show clear limitations in the advance preparation of auditory attention switches and suggest a considerable degree of inertia in the intentional control of auditory selection criteria.

20.
In the ventriloquism aftereffect, brief exposure to a consistent spatial disparity between auditory and visual stimuli leads to a subsequent shift in subjective sound localization toward the positions of the visual stimuli. Such rapid adaptive changes probably play an important role in maintaining the coherence of spatial representations across the various sensory systems. In the research reported here, we used event-related potentials (ERPs) to identify the stage in the auditory processing stream that is modulated by audiovisual discrepancy training. Both before and after exposure to synchronous audiovisual stimuli that had a constant spatial disparity of 15°, participants reported the perceived location of brief auditory stimuli that were presented from central and lateral locations. In conjunction with a sound localization shift in the direction of the visual stimuli (the behavioral ventriloquism aftereffect), auditory ERPs as early as 100 ms poststimulus (N100) were systematically modulated by the disparity training. These results suggest that cross-modal learning was mediated by a relatively early stage in the auditory cortical processing stream.
