Similar Documents
A total of 20 similar documents were found.
1.
Two experiments were performed under visual-only and visual-auditory discrepancy conditions (dubs) to assess observers’ abilities to read speech information on a face. In the first experiment, identification and multiple-choice testing were used. In addition, the relation between visual and auditory phonetic information was manipulated and related to perceptual bias. In the second experiment, the “compellingness” of the visual-auditory discrepancy as a single speech event was manipulated. Subjects also rated the confidence they had that their perception of the lipped word was accurate. Results indicated that competing visual information exerted little effect on auditory speech recognition, but visual speech recognition was substantially interfered with when discrepant auditory information was present. The extent of auditory bias was found to be related to the abilities of observers to read speech under nondiscrepancy conditions, the magnitude of the visual-auditory discrepancy, and the compellingness of the visual-auditory discrepancy as a single event. Auditory bias during speech was found to be a moderately compelling conscious experience, and not simply a case of confused responding or guessing. Results were discussed in terms of current models of perceptual dominance and related to results from modality discordance during space perception.

2.
Three experiments were carried out to investigate the evaluation and integration of visual and auditory information in speech perception. In the first two experiments, subjects identified /ba/ or /da/ speech events consisting of high-quality synthetic syllables ranging from /ba/ to /da/ combined with a videotaped /ba/ or /da/ or neutral articulation. Although subjects were specifically instructed to report what they heard, visual articulation made a large contribution to identification. The tests of quantitative models provide evidence for the integration of continuous and independent, as opposed to discrete or nonindependent, sources of information. The reaction times for identification were primarily correlated with the perceived ambiguity of the speech event. In a third experiment, the speech events were identified with an unconstrained set of response alternatives. In addition to /ba/ and /da/ responses, the /bda/ and /tha/ responses were well described by a combination of continuous and independent features. This body of results provides strong evidence for a fuzzy logical model of perceptual recognition.
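As a rough illustration of what integrating continuous and independent sources means under a fuzzy logical model of perception (FLMP), the sketch below combines hypothetical auditory and visual degrees of support for /da/; the specific support values and function names are our assumptions, not data from these experiments.

```python
# Minimal sketch of FLMP-style integration (Oden & Massaro, 1978).
# The support values below are hypothetical, chosen only to illustrate
# how independent continuous sources are multiplied and renormalized.

def flmp_p_da(a, v):
    """Probability of a /da/ response given auditory support a and
    visual support v for /da/, each in (0, 1)."""
    return (a * v) / (a * v + (1 - a) * (1 - v))

auditory_support = [0.1, 0.3, 0.5, 0.7, 0.9]   # /ba/-to-/da/ continuum
visual_support = {"visual /ba/": 0.1, "neutral": 0.5, "visual /da/": 0.9}

for label, v in visual_support.items():
    row = [round(flmp_p_da(a, v), 2) for a in auditory_support]
    print(f"{label:>12}: {row}")
```

With a visual /da/, even weak auditory support yields mostly /da/ responses, which is the kind of large visual contribution to identification the abstract describes.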

3.
Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds which requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150–350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. It also has major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.

4.
Modality specificity in priming is taken as evidence for independent perceptual systems. However, Easton, Greene, and Srinivas (1997) showed that visual and haptic cross-modal priming is comparable in magnitude to within-modal priming. Where appropriate, perceptual systems might share like information. To test this, we assessed priming and recognition for visual and auditory events, within and across modalities. On the visual test, auditory study resulted in no priming. On the auditory priming test, visual study resulted in priming that was only marginally less than within-modal priming. The priming results show that visual study facilitates identification on both visual and auditory tests, but auditory study only facilitates performance on the auditory test. For both recognition tests, within-modal recognition exceeded cross-modal recognition. The results have two novel implications for the understanding of perceptual priming: First, we introduce visual and auditory priming for spatio-temporal events as a new priming paradigm chosen for its ecological validity and potential for information exchange. Second, we propose that the asymmetry of the cross-modal priming observed here may reflect the capacity of these perceptual modalities to provide cross-modal constraints on ambiguity. We argue that visual perception might inform and constrain auditory processing, while auditory perception corresponds to too many potential visual events to usefully inform and constrain visual perception.

5.
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.

6.
7.
Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that message is presented in a noisy background. Speech is a particularly important example of multisensory integration because of its behavioural relevance to humans and also because brain regions have been identified that appear to be specifically tuned for auditory speech and lip gestures. Previous research has suggested that speech stimuli may have an advantage over other types of auditory stimuli in terms of audio-visual integration. Here, we used a modified adaptive psychophysical staircase approach to compare the influence of congruent visual stimuli (brief movie clips) on the detection of noise-masked auditory speech and non-speech stimuli. We found that congruent visual stimuli significantly improved detection of an auditory stimulus relative to incongruent visual stimuli. This effect, however, was equally apparent for speech and non-speech stimuli. The findings suggest that speech stimuli are not specifically advantaged by audio-visual integration for detection at threshold when compared with other naturalistic sounds.
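The paper's "modified adaptive psychophysical staircase approach" is not detailed here; as a generic illustration of how an adaptive staircase homes in on a masked-detection threshold, the sketch below implements a standard 2-down/1-up rule against a simulated observer. The step size, starting SNR, and observer parameters are all assumptions.

```python
# Generic 2-down/1-up adaptive staircase sketch (not the authors' exact
# procedure). It lowers the signal-to-noise ratio after two consecutive
# correct detections and raises it after each miss, converging near the
# ~70.7% correct point of a simulated observer's psychometric function.
import random

def simulated_observer(snr_db, threshold_db=-12.0, slope=1.0):
    """Return True if the simulated observer detects the masked stimulus."""
    p = 1.0 / (1.0 + 10 ** (-(snr_db - threshold_db) * slope / 10))
    return random.random() < p

snr, step = 0.0, 2.0
correct_in_a_row, direction, reversals = 0, 0, []
while len(reversals) < 10:
    if simulated_observer(snr):
        correct_in_a_row += 1
        if correct_in_a_row == 2:          # 2-down: make the task harder
            correct_in_a_row = 0
            if direction == +1:
                reversals.append(snr)
            direction = -1
            snr -= step
    else:
        correct_in_a_row = 0               # 1-up: make the task easier
        if direction == -1:
            reversals.append(snr)
        direction = +1
        snr += step

print("Estimated threshold (dB SNR):", sum(reversals[-6:]) / 6)
```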

8.
We are constantly exposed to our own face and voice, and we identify our own faces and voices as familiar. However, the influence of self-identity upon self-speech perception is still uncertain. Speech perception is a synthesis of both auditory and visual inputs; although we hear our own voice when we speak, we rarely see the dynamic movements of our own face. If visual speech and identity are processed independently, no processing advantage would obtain in viewing one’s own highly familiar face. In the present experiment, the relative contributions of facial and vocal inputs to speech perception were evaluated with an audiovisual illusion. Our results indicate that auditory self-speech conveys a processing advantage, whereas visual self-speech does not. The data thereby support a model of visual speech as dynamic movement processed separately from speaker recognition.

9.
Buchan JN, Munhall KG. Perception, 2011, 40(10): 1164-1182.
Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.

10.
11.
Previous studies have shown that people can use the information in trajectory forms to recognize visual events. A trajectory form is composed of the path of motion and the change in speed along that path. In past studies, however, only sensitivity to trajectory forms viewed from a single perspective was examined. The optical components change when an event is viewed from different perspectives, and the projected form of the trajectory is transformed. Does event recognition exhibit constancy despite these changes? In Experiment 1, participants were familiarized with five different trajectory forms viewed from a single perspective. Then the participants had to identify the same events viewed from different perspectives: from the side, at an angle, and entirely in depth. The participants exhibited perceptual constancy. Experiment 2 revealed, however, that both the change in optical components and the perspective transformations affected recognition.
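To make the geometric point concrete, the sketch below (our own illustration, not the study's stimuli) orthographically projects a circular trajectory viewed from the side, at a slant, and entirely in depth, showing how the projected trajectory form changes with perspective.

```python
# Sketch of how perspective transforms a projected trajectory form:
# rotating a planar circular path in depth compresses one axis of its
# orthographic projection, so the same event yields different optical
# trajectories at different viewpoints. All values are illustrative.
import math

def project(points, slant_deg):
    """Rotate trajectory points about the vertical axis, keep (x, y)."""
    s = math.radians(slant_deg)
    return [(x * math.cos(s), y) for x, y in points]

circle = [(math.cos(t), math.sin(t)) for t in
          (2 * math.pi * i / 8 for i in range(8))]

for slant in (0, 45, 90):   # side view, at an angle, entirely in depth
    projected = [(round(x, 2), round(y, 2)) for x, y in project(circle, slant)]
    print(f"slant {slant:>2} deg:", projected)
```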

12.
We propose a measure of audiovisual speech integration that takes into account accuracy and response times. This measure should prove beneficial for researchers investigating multisensory speech recognition, since it applies to both normal-hearing and aging populations. As an example, age-related sensory decline influences both the rate at which one processes information and the ability to utilize cues from different sensory modalities. Our function assesses integration when both auditory and visual information are available, by comparing performance on these audiovisual trials with theoretical predictions for performance under the assumptions of parallel, independent self-terminating processing of single-modality inputs. We provide example data from an audiovisual identification experiment and discuss applications for measuring audiovisual integration skills across the life span.
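One standard way to build such a benchmark is a workload-capacity comparison in the Townsend tradition, in which the audiovisual cumulative hazard is compared against the sum of the unimodal cumulative hazards expected from parallel, independent, self-terminating channels. The sketch below, with made-up response times, shows how that comparison could be computed; the authors' actual function may differ, for instance by folding in accuracy.

```python
# Illustrative capacity-style comparison (not necessarily the authors'
# exact measure): compare audiovisual RTs with the prediction from
# parallel, independent, self-terminating (first-terminating) processing
# of the unimodal channels, via cumulative hazard functions.
import numpy as np

def survivor(rts, t):
    """Empirical survivor function S(t) = P(RT > t)."""
    rts = np.asarray(rts)
    return np.array([(rts > x).mean() for x in t])

rng = np.random.default_rng(0)
rt_a  = rng.normal(560, 60, 200)   # made-up auditory-only RTs (ms)
rt_v  = rng.normal(620, 70, 200)   # made-up visual-only RTs (ms)
rt_av = rng.normal(520, 55, 200)   # made-up audiovisual RTs (ms)

t = np.linspace(450, 620, 7)
H_a  = -np.log(np.clip(survivor(rt_a,  t), 1e-6, 1.0))
H_v  = -np.log(np.clip(survivor(rt_v,  t), 1e-6, 1.0))
H_av = -np.log(np.clip(survivor(rt_av, t), 1e-6, 1.0))

# C(t) > 1 suggests integration beyond an independent parallel race;
# C(t) = 1 is consistent with independent self-terminating channels.
capacity = H_av / np.clip(H_a + H_v, 1e-6, None)
for ti, ci in zip(t, capacity):
    print(f"t = {ti:5.0f} ms   C(t) = {ci:4.2f}")
```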

13.
Congruent information conveyed over different sensory modalities often facilitates a variety of cognitive processes, including speech perception (Sumby & Pollack, 1954). Since auditory processing is substantially faster than visual processing, auditory-visual integration can occur over a surprisingly wide temporal window (Stein, 1998). We investigated the processing architecture mediating the integration of acoustic digit names with corresponding symbolic visual forms. The digits "1" or "2" were presented in auditory, visual, or bimodal format at several stimulus onset asynchronies (SOAs; 0, 75, 150, and 225 msec). The reaction times (RTs) for echoing unimodal auditory stimuli were approximately 100 msec faster than the RTs for naming their visual forms. Correspondingly, bimodal facilitation violated race model predictions, but only at SOA values greater than 75 msec. These results indicate that the acoustic and visual information are pooled prior to verbal response programming. However, full expression of this bimodal summation is dependent on the central coincidence of the visual and auditory inputs. These results are considered in the context of studies demonstrating multimodal activation of regions involved in speech production.
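The race-model prediction mentioned here is conventionally tested with Miller's race model inequality, P(RT_AV <= t) <= P(RT_A <= t) + P(RT_V <= t); a violation at any t indicates that bimodal facilitation exceeds what a race between independent unimodal processes allows. The sketch below checks the bound on made-up reaction times.

```python
# Illustrative test of Miller's race model inequality (our own example
# data, not the study's): the bound is violated wherever the observed
# bimodal CDF exceeds the sum of the two unimodal CDFs.
import numpy as np

def cdf(rts, t):
    """Empirical cumulative distribution function P(RT <= t)."""
    rts = np.asarray(rts)
    return np.array([(rts <= x).mean() for x in t])

rng = np.random.default_rng(1)
rt_a  = rng.normal(480, 50, 300)   # hypothetical auditory naming RTs (ms)
rt_v  = rng.normal(580, 60, 300)   # hypothetical visual naming RTs (ms)
rt_av = rng.normal(440, 45, 300)   # hypothetical bimodal RTs (ms)

t = np.linspace(350, 550, 9)
bound = np.minimum(cdf(rt_a, t) + cdf(rt_v, t), 1.0)   # race model bound
observed = cdf(rt_av, t)

for ti, obs, b in zip(t, observed, bound):
    flag = "violation" if obs > b else ""
    print(f"t = {ti:5.0f} ms   AV = {obs:4.2f}   bound = {b:4.2f}   {flag}")
```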

14.
Older adults often experience hearing difficulties in multitalker situations. Attentional control of auditory perception is crucial in situations where a plethora of auditory inputs compete for further processing. We combined an intensity-modulated dichotic listening paradigm with attentional manipulations to study adult age differences in the interplay between perceptual saliency and attentional control of auditory processing. When confronted with two competing sources of verbal auditory input, older adults modulated their attention less flexibly and were more driven by perceptual saliency than younger adults. These findings suggest that aging severely impairs the attentional regulation of auditory perception.

15.
Here, we investigate how audiovisual context affects perceived event duration with experiments in which observers reported which of two stimuli they perceived as longer. Target events were visual and/or auditory and could be accompanied by nontargets in the other modality. Our results demonstrate that the temporal information conveyed by irrelevant sounds is automatically used when the brain estimates visual durations but that irrelevant visual information does not affect perceived auditory duration (Experiment 1). We further show that auditory influences on subjective visual durations occur only when the temporal characteristics of the stimuli promote perceptual grouping (Experiments 1 and 2). Placed in the context of scalar expectancy theory of time perception, our third and fourth experiments have the implication that audiovisual context can lead both to changes in the rate of an internal clock and to temporal ventriloquism-like effects on perceived on- and offsets. Finally, intramodal grouping of auditory stimuli diminished any crossmodal effects, suggesting a strong preference for intramodal over crossmodal perceptual grouping (Experiment 5).
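In scalar expectancy theory, perceived duration reflects the number of pacemaker pulses accumulated over the timed interval, so audiovisual context can lengthen a visual duration either by speeding the clock or by shifting the perceived on- and offsets that bound the interval. The toy computation below, with arbitrary numbers of our own choosing, separates those two routes.

```python
# Toy illustration (arbitrary numbers) of the two routes by which
# audiovisual context could lengthen a perceived visual duration under
# scalar expectancy theory: speeding the internal clock vs. shifting the
# perceived on-/offset markers of the timed interval.
base_rate_hz = 50.0          # assumed pacemaker rate
duration_ms = 600.0          # physical visual duration

pulses_baseline = base_rate_hz * duration_ms / 1000

# Route 1: sound speeds the clock by, say, 10%.
pulses_faster_clock = base_rate_hz * 1.10 * duration_ms / 1000

# Route 2: temporal ventriloquism drags the perceived onset 30 ms earlier
# and the offset 30 ms later, lengthening the counted interval.
pulses_shifted_markers = base_rate_hz * (duration_ms + 60) / 1000

print(pulses_baseline, pulses_faster_clock, pulses_shifted_markers)
```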

16.
Investigation of the effect that a word recognition task has on concurrent nonverbal tasks showed that (a) auditory verbal messages affected visual tracking performance but not the detection of brief light flashes in the visual periphery, and (b) impairment of both tracking and light detection was greater when the verbal messages were visual rather than auditory. With a kinaesthetic tracking task, errors increased significantly during auditory messages but were even greater during visual messages. There was no interaction between the modality of tracking error feedback (auditory or visual) and the modality of the verbal message. Nor was the decrement from visual messages reduced by changing the presentation format. It is suggested that different temporal characteristics of visual and auditory information affect the attentional demands of verbal messages.

17.
Virtual reality technology creates immersive perceptual experiences by providing visual, auditory, and haptic information. Haptic feedback, however, faces numerous technical bottlenecks that limit natural interaction in virtual reality. Pseudo-haptic techniques based on multisensory illusions can use information from other sensory channels to strengthen and enrich haptic sensations, and are currently an effective way to improve the haptic experience in virtual reality environments. This paper focuses on one of the most important dimensions of touch, roughness, and aims to offer new ideas for addressing the limitations of haptic feedback in virtual reality. We discuss the relationships among visual, auditory, and tactile integration in roughness perception, analyze how visual cues (surface texture density, surface lighting and shading, control-display ratio) and auditory cues (pitch/frequency, loudness) influence tactile roughness perception, and summarize current methods of manipulating these factors to alter perceived roughness. Finally, we discuss how, when pseudo-haptic feedback is used, visual, auditory, and tactile information in virtual reality environments may differ from the real world in presentation and perceptual integration, and we propose applicable methods for improving the haptic experience as well as directions for future research.
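Among the visual cues listed, the control-display (C/D) ratio is the one most directly tied to pseudo-haptics: slowing the on-screen cursor relative to the hand's actual motion tends to be felt as increased resistance or roughness. The sketch below is a minimal, hypothetical illustration of gain modulation over a virtual texture; it is not drawn from the paper.

```python
# Minimal pseudo-haptic sketch (hypothetical, not from the paper):
# modulate the control-display ratio so the cursor slows down over the
# "bumps" of a virtual texture, which tends to be perceived as roughness.
import math

def cd_gain(x_virtual, bump_spacing=0.05, min_gain=0.4):
    """Cursor gain as a function of position over a sinusoidal texture:
    gain dips toward min_gain at each bump crest."""
    bump = 0.5 * (1 + math.cos(2 * math.pi * x_virtual / bump_spacing))
    return 1.0 - (1.0 - min_gain) * bump

x_cursor = 0.0
hand_step = 0.005                      # metres of hand motion per frame
for frame in range(10):
    x_cursor += hand_step * cd_gain(x_cursor)
    print(f"frame {frame}: cursor at {x_cursor:.4f} m, "
          f"gain {cd_gain(x_cursor):.2f}")
```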

18.
In three experiments, we investigated whether the ease with which distracting sounds can be ignored depends on their distance from fixation and from attended visual events. In the first experiment, participants shadowed an auditory stream of words presented behind their heads, while simultaneously fixating visual lip-read information consistent with the relevant auditory stream, or meaningless "chewing" lip movements. An irrelevant auditory stream of words, which participants had to ignore, was presented either from the same side as the fixated visual stream or from the opposite side. Selective shadowing was less accurate in the former condition, implying that distracting sounds are harder to ignore when fixated. Furthermore, the impairment when fixating toward distractor sounds was greater when speaking lips were fixated than when chewing lips were fixated, suggesting that people find it particularly difficult to ignore sounds at locations that are actively attended for visual lipreading rather than merely passively fixated. Experiments 2 and 3 tested whether these results are specific to cross-modal links in speech perception by replacing the visual lip movements with a rapidly changing stream of meaningless visual shapes. The auditory task was again shadowing, but the active visual task was now monitoring for a specific visual shape at one location. A decrement in shadowing was again observed when participants passively fixated toward the irrelevant auditory stream. This decrement was larger when participants performed a difficult active visual task there versus fixating, but not for a less demanding visual task versus fixation. The implications for cross-modal links in spatial attention are discussed.

19.
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners’ auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants’ susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners’ McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
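As a hedged illustration, with made-up numbers rather than the authors' analysis, one way such a relationship could be quantified is to correlate each listener's McGurk fusion rate with a normalized audiovisual benefit score for sentences in noise:

```python
# Illustrative analysis sketch (our own made-up data): correlate McGurk
# susceptibility with the visual benefit for sentence recognition in
# noise, where benefit is normalized by the room for improvement.
import numpy as np

rng = np.random.default_rng(2)
n = 40
mcgurk_rate = rng.uniform(0.0, 1.0, n)          # proportion of fused responses
acc_audio_only = rng.uniform(0.3, 0.7, n)       # keyword accuracy, audio only
acc_audiovisual = np.clip(acc_audio_only + rng.uniform(0.0, 0.3, n), 0, 1)

visual_benefit = (acc_audiovisual - acc_audio_only) / (1 - acc_audio_only)

r = np.corrcoef(mcgurk_rate, visual_benefit)[0, 1]
print(f"Pearson r between McGurk susceptibility and visual benefit: {r:.2f}")
```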

20.
The TRACE model of speech perception (McClelland & Elman, 1986) is contrasted with a fuzzy logical model of perception (FLMP) (Oden & Massaro, 1978). The central question is how the models account for the influence of multiple sources of information on perceptual judgment. Although the two models can make somewhat similar predictions, the assumptions underlying the models are fundamentally different. The TRACE model is built around the concept of interactive activation, whereas the FLMP is structured in terms of the integration of independent sources of information. The models are tested against the results of an experiment involving the independent manipulation of bottom-up and top-down sources of information. Using a signal detection framework, sensitivity and bias measures of performance can be computed. The TRACE model predicts that top-down influences from the word level influence sensitivity at the phoneme level, whereas the FLMP does not. The empirical results of a study involving the influence of phonological context and segmental information on the perceptual recognition of a speech segment are best described without any assumed changes in sensitivity. To date, not only is a mechanism of interactive activation not necessary to describe speech perception, it is shown to be wrong when instantiated in the TRACE model.
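The sensitivity-versus-bias distinction at stake can be made concrete with standard signal detection computations: if phonological context only biases responses, d' should stay roughly constant across contexts while the criterion shifts. The hit and false alarm rates below are hypothetical numbers chosen to illustrate that pattern.

```python
# Hypothetical illustration of the sensitivity-vs-bias distinction used
# to compare the models: context that only biases responses shifts the
# criterion c while leaving d' essentially unchanged.
from statistics import NormalDist

z = NormalDist().inv_cdf   # inverse of the standard normal CDF

def dprime_and_criterion(hit_rate, false_alarm_rate):
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# Identification of an ambiguous segment in two phonological contexts
# (made-up rates): context shifts both rates in the same direction.
for context, (hit, fa) in {"context A": (0.89, 0.33),
                           "context B": (0.67, 0.12)}.items():
    d, c = dprime_and_criterion(hit, fa)
    print(f"{context}: d' = {d:.2f}, c = {c:+.2f}")
```

Running this gives nearly identical d' values (about 1.6 in both contexts) but criteria of opposite sign, the pattern the abstract describes as a bias effect without a change in sensitivity.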
