Similar Documents
1.
While audiovisual integration is well known in speech perception, faces and speech are also informative with respect to speaker recognition. To date, audiovisual integration in the recognition of familiar people has never been demonstrated. Here we show systematic benefits and costs for the recognition of familiar voices when these are combined with time-synchronized articulating faces, of corresponding or noncorresponding speaker identity, respectively. While these effects were strong for familiar voices, they were smaller or nonsignificant for unfamiliar voices, suggesting that the effects depend on the previous creation of a multimodal representation of a person's identity. Moreover, the effects were reduced or eliminated when voices were combined with the same faces presented as static pictures, demonstrating that the effects do not simply reflect the use of facial identity as a “cue” for voice recognition. This is the first direct evidence for audiovisual integration in person recognition.

2.
Audiovisual integration (AVI) has been demonstrated to play a major role in speech comprehension. Previous research suggests that AVI in speech comprehension tolerates a temporal window of audiovisual asynchrony. However, few studies have employed audiovisual presentation to investigate AVI in person recognition. Here, participants completed an audiovisual voice familiarity task in which the synchrony of the auditory and visual stimuli was manipulated, and in which visual speaker identity could be corresponding or noncorresponding to the voice. Recognition of personally familiar voices systematically improved when corresponding visual speakers were presented near synchrony or with slight auditory lag. Moreover, when faces of different familiarity were presented with a voice, recognition accuracy suffered at near synchrony to slight auditory lag only. These results provide the first evidence for a temporal window for AVI in person recognition between approximately 100 ms auditory lead and 300 ms auditory lag.
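To make the reported window concrete, here is a minimal Python sketch (mine, not the study's analysis code; the sign convention of negative values for auditory lead, and all names, are illustrative assumptions) that classifies an audiovisual asynchrony as inside or outside the roughly -100 ms to +300 ms integration window:

```python
# Illustrative sketch of the integration window described above.
# Assumed convention: negative SOA = auditory lead, positive SOA =
# auditory lag, both in milliseconds.
MAX_AUDITORY_LEAD_MS = -100
MAX_AUDITORY_LAG_MS = 300

def within_integration_window(soa_ms: float) -> bool:
    """Return True if an audiovisual asynchrony falls inside the window
    reported for audiovisual integration in person recognition."""
    return MAX_AUDITORY_LEAD_MS <= soa_ms <= MAX_AUDITORY_LAG_MS

for soa in (-200, -100, 0, 150, 300, 400):
    print(f"SOA {soa:+5d} ms -> within window: {within_integration_window(soa)}")
```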

3.
We are constantly exposed to our own face and voice, and we identify our own faces and voices as familiar. However, the influence of self-identity upon self-speech perception is still uncertain. Speech perception is a synthesis of both auditory and visual inputs; although we hear our own voice when we speak, we rarely see the dynamic movements of our own face. If visual speech and identity are processed independently, no processing advantage would obtain in viewing one’s own highly familiar face. In the present experiment, the relative contributions of facial and vocal inputs to speech perception were evaluated with an audiovisual illusion. Our results indicate that auditory self-speech conveys a processing advantage, whereas visual self-speech does not. The data thereby support a model of visual speech as dynamic movement processed separately from speaker recognition.

4.
We rarely become familiar with the voice of another person in isolation but usually also have access to visual identity information, thus learning to recognize their voice and face in parallel. There are conflicting findings as to whether learning to recognize voices in audiovisual vs audio-only settings is advantageous or detrimental to learning. One prominent finding shows that the presence of a face overshadows the voice, hindering voice identity learning by capturing listeners' attention (Face Overshadowing Effect; FOE). In the current study, we tested the proposal that the effect of audiovisual training on voice identity learning is driven by attentional processes. Participants learned to recognize voices through either audio-only training (Audio-Only) or through three versions of audiovisual training, where a face was presented alongside the voices. During audiovisual training, the faces were either looking at the camera (Direct Gaze), were looking to the side (Averted Gaze) or had closed eyes (No Gaze). We found a graded effect of gaze on voice identity learning: Voice identity recognition was most accurate after audio-only training and least accurate after audiovisual training including direct gaze, constituting a FOE. While effect sizes were overall small, the magnitude of FOE was halved for the Averted and No Gaze conditions. With direct gaze being associated with increased attention capture compared to averted or no gaze, the current findings suggest that incidental attention capture at least partially underpins the FOE. We discuss these findings in light of visual dominance effects and the relative informativeness of faces vs voices for identity perception.

5.
An experiment was conducted to investigate the claims made by Bruce and Young (1986) for the independence of facial identity and facial speech processing. A well-reported phenomenon in audiovisual speech perception—the McGurk effect (McGurk & MacDonald, 1976), in which synchronous but conflicting auditory and visual phonetic information is presented to subjects—was utilized as a dynamic facial speech processing task. An element of facial identity processing was introduced into this task by manipulating the faces used for the creation of the McGurk-effect stimuli such that (1) they were familiar to some subjects and unfamiliar to others, and (2) the faces and voices used were either congruent (from the same person) or incongruent (from different people). A comparison was made between the different subject groups in their susceptibility to the McGurk illusion, and the results show that when the faces and voices are incongruent, subjects who are familiar with the faces are less susceptible to McGurk effects than those who are unfamiliar with the faces. The results suggest that facial identity and facial speech processing are not entirely independent, and these findings are discussed in relation to Bruce and Young’s (1986) functional model of face recognition.

6.
Three experiments investigated functional asymmetries related to self-recognition in the domain of voices. In Experiment 1, participants were asked to identify one of three presented voices (self, familiar or unknown) by responding with either the right or the left hand. In Experiment 2, participants were presented with auditory morphs between the self-voice and a familiar voice and were asked to perform a forced-choice decision on speaker identity with either the left or the right hand. In Experiment 3, participants were presented with continua of auditory morphs between self- or a familiar voice and a famous voice, and were asked to stop the presentation either when the voice became "more famous" or "more familiar/self". While these experiments did not reveal an overall hand difference for self-recognition, the last study, with improved design and controls, suggested a right-hemisphere advantage for self- compared to other-voice recognition, similar to that observed in the visual domain for self-faces.

7.
Apart from speech content, the human voice also carries paralinguistic information about speaker identity. Voice identification and its neural correlates have received little scientific attention up to now. Here we use event-related potentials (ERPs) in an adaptation paradigm in order to investigate the neural representation and the time course of vocal identity processing. Participants adapted to repeated vowel-consonant-vowel (VCV) utterances of one personally familiar speaker (either A or B), before classifying a subsequent test voice varying on an identity continuum between these two speakers. Following adaptation to speaker A, test voices were more likely perceived as speaker B and vice versa, and these contrastive voice identity aftereffects (VIAEs) were much more pronounced when the same syllable, rather than a different syllable, was used as adaptor. Adaptation induced amplitude reductions of the frontocentral N1-P2 complex and a prominent reduction of the parietal P3 component, for test voices preceded by identity-corresponding adaptors. Importantly, only the P3 modulation remained clear for across-syllable combinations of adaptor and test stimuli. Our results suggest that voice identity is contrastively processed by specialized neurons in auditory cortex within ~250 ms after stimulus onset, with identity processing becoming less dependent on speech content after ~300 ms.

8.
Our voices sound different depending on the context (laughing vs. talking to a child vs. giving a speech), making within‐person variability an inherent feature of human voices. When perceiving speaker identities, listeners therefore need to not only ‘tell people apart’ (perceiving exemplars from two different speakers as separate identities) but also ‘tell people together’ (perceiving different exemplars from the same speaker as a single identity). In the current study, we investigated how such natural within‐person variability affects voice identity perception. Using voices from a popular TV show, listeners, who were either familiar or unfamiliar with this show, sorted naturally varying voice clips from two speakers into clusters to represent perceived identities. Across three independent participant samples, unfamiliar listeners perceived more identities than familiar listeners and frequently mistook exemplars from the same speaker to be different identities. These findings point towards a selective failure in ‘telling people together’. Our study highlights within‐person variability as a key feature of voices that has striking effects on (unfamiliar) voice identity perception. Our findings not only open up a new line of enquiry in the field of voice perception but also call for a re‐evaluation of theoretical models to account for natural variability during identity perception.

9.
Recent studies have shown that the face and voice of an unfamiliar person can be matched for identity. Here the authors compare the relative effects of changing sentence content (what is said) and sentence manner (how it is said) on matching identity between faces and voices. A change between speaking a sentence as a statement and as a question disrupted matching performance, whereas changing the sentence itself did not. This was the case when the faces and voices were from the same race as participants and speaking a familiar language (English; Experiment 1) or from another race and speaking an unfamiliar language (Japanese; Experiment 2). Altering manner between conversational and clear speech (Experiment 3) or between conversational and casual speech (Experiment 4) was also disruptive. However, artificially slowing (Experiment 5) or speeding (Experiment 6) speech did not affect cross-modal matching performance. The results show that bimodal cues to identity are closely linked to manner but that content (what is said) and absolute tempo are not critical. Instead, prosodic variations in rhythmic structure and/or expressiveness may provide a bimodal, dynamic identity signature.

10.
Despite the absence of “conscious”, overt identification, some patients with face recognition impairments continue covertly to process information regarding face familiarity. The fact that by no means all patients show these covert effects has led to the suggestion that indirect recognition tasks may help in identifying different types of face recognition impairment. The present report describes a number of experiments with the patient NR, who, after a closed head injury, has been severely impaired at recognizing familiar faces. Investigations mostly failed to show overt or covert face recognition, but NR performed at an above-chance level in selecting the familiar face on a task requiring a forced-choice between a familiar and an unfamiliar face. This discrepancy between a degree of rudimentary overt recognition and absence of covert effects on most indirect tests was addressed using a cross-domain identity priming paradigm. This examined separately the possibility of preserved recognition for faces that NR consistently chose correctly in a forced-choice familiarity decision and those on which he performed at chance level. Priming effects were apparent only for the faces that were consistently chosen as “familiar” in forced-choice. We suggest that NR's stored representations of familiar faces are degraded, so that face recognition is possible only via a limited set of relatively preserved representations able to support a rudimentary form of overt recognition and to facilitate performance in matching and priming tasks.

11.
Face identification and voice identification were examined using a standard old/new recognition task in order to see whether seeing and hearing the target interfered with subsequent recognition. Participants studied either visual or audiovisual stimuli prior to a face recognition test, and studied either audio or audiovisual stimuli prior to a voice recognition test. Analysis of recognition performance revealed a greater ability to recognise faces than voices. More importantly, faces accompanying voices at study interfered with subsequent voice identification but voices accompanying faces at study did not interfere with subsequent face identification. These results are similar to those obtained in previous research using a lineup methodology, and are discussed with respect to the interference that can result when earwitnesses are also eyewitnesses.

12.
In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

13.
Identity perception often takes place in multimodal settings, where perceivers have access to both visual (face) and auditory (voice) information. Despite this, identity perception is usually studied in unimodal contexts, where face and voice identity perception are modelled independently from one another. In this study, we asked whether and how much auditory and visual information contribute to audiovisual identity perception from naturally-varying stimuli. In a between-subjects design, participants completed an identity sorting task with either dynamic video-only, audio-only or dynamic audiovisual stimuli. In this task, participants were asked to sort multiple, naturally-varying stimuli from three different people by perceived identity. We found that identity perception was more accurate for video-only and audiovisual stimuli compared with audio-only stimuli. Interestingly, there was no difference in accuracy between video-only and audiovisual stimuli. Auditory information nonetheless played a role alongside visual information, as audiovisual identity judgements per stimulus could be predicted from both auditory and visual identity judgements. While the relationship between visual and audiovisual judgements was the stronger one, auditory information still uniquely explained a significant portion of the variance in audiovisual identity judgements. Our findings thus align with previous theoretical and empirical work that proposes that, compared with faces, voices are an important but relatively less salient and weaker cue to identity perception. We expand on this work to show that, at least in the context of this study, having access to voices in addition to faces does not result in better identity perception accuracy.
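The per-stimulus prediction described above is, in effect, a multiple regression. The Python sketch below illustrates that kind of analysis; the data are fabricated placeholders and the variable names are mine, so it shows the technique rather than reproducing the study's actual analysis:

```python
# Hedged sketch: regress per-stimulus audiovisual identity judgements
# on the corresponding audio-only and video-only judgements.
# All numbers below are fabricated placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_stimuli = 60

# Hypothetical per-stimulus judgement scores (e.g., proportion of
# participants grouping the stimulus with the correct identity).
audio_only = rng.uniform(0, 1, n_stimuli)
video_only = rng.uniform(0, 1, n_stimuli)
audiovisual = 0.2 * audio_only + 0.7 * video_only + rng.normal(0, 0.05, n_stimuli)

# Design matrix with an intercept column, solved by least squares.
X = np.column_stack([np.ones(n_stimuli), audio_only, video_only])
coefs, *_ = np.linalg.lstsq(X, audiovisual, rcond=None)

predicted = X @ coefs
ss_res = np.sum((audiovisual - predicted) ** 2)
ss_tot = np.sum((audiovisual - audiovisual.mean()) ** 2)
print(f"intercept={coefs[0]:.2f}, audio beta={coefs[1]:.2f}, "
      f"video beta={coefs[2]:.2f}, R^2={1 - ss_res / ss_tot:.2f}")
```

In a pattern like the one the abstract reports, the video beta would dominate while the audio beta remains reliably above zero.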

14.
In this study, we used the distinction between remember and know (R/K) recognition responses to investigate the retrieval of episodic information during familiar face and voice recognition. The results showed that familiar faces presented in standard format were recognized with R responses on approximately 50% of the trials. The corresponding figure for voices was less than 20%. Even when overall levels of recognition were matched between faces and voices by blurring the faces, significantly more R responses were observed for faces than for voices. Voices were significantly more likely to be recognized with K responses than were blurred faces. These findings indicate that episodic information was recalled more often from familiar faces than from familiar voices. The results also showed that episodic information about a familiar person was never recalled unless some semantic information, such as the person's occupation, was also retrieved.

15.
Vatakis, A., & Spence, C. (2008). Perception, 37(1), 143-160.
Research has shown that inversion is more detrimental to the perception of faces than to the perception of other types of visual stimuli. Inverting a face results in an impairment of configural information processing that leads to slowed early face processing and reduced accuracy when performance is tested in face recognition tasks. We investigated the effects of inverting speech and non-speech stimuli on audiovisual temporal perception. Upright and inverted audiovisual video clips of a person uttering syllables (experiments 1 and 2), playing musical notes on a piano (experiment 3), or a rhesus monkey producing vocalisations (experiment 4) were presented. Participants made unspeeded temporal-order judgments regarding which modality stream (auditory or visual) appeared to have been presented first. Inverting the visual stream did not have any effect on the sensitivity of temporal discrimination responses in any of the four experiments, thus implying that audiovisual temporal integration is resilient to the effects of orientation in the picture plane. By contrast, the point of subjective simultaneity differed significantly as a function of orientation only for the audiovisual speech stimuli but not for the non-speech stimuli or monkey calls. That is, smaller auditory leads were required for the inverted than for the upright-visual speech stimuli. These results are consistent with the longer processing latencies reported previously when human faces are inverted and demonstrate that the temporal perception of dynamic audiovisual speech can be modulated by changes in the physical properties of the visual speech (i.e., by changes in orientation).

16.
Several findings showed that semantic information is more likely to be retrieved from recognised faces than from recognised voices. Earlier experiments, which investigated the recall of biographical information following person recognition, used stimuli that were pre-experimentally familiar to the participants, such as famous people's voices and faces. We propose an alternative method to compare the participants' ability to associate semantic information with faces and voices. The present experiments allowed a very strict control of frequency of exposure to pre-experimentally unfamiliar faces and voices and ensured the absence of identity clues in the spoken extracts. In Experiment 1 semantic information was retrieved from the presentation of a name. In Experiment 2 semantic and lexical information was retrieved from faces and/or voices. A memory advantage for faces over voices was again observed.

17.
Damjanovic and Hanley (2007) showed that episodic information is more readily retrieved from familiar faces than familiar voices, even when the two presentation modalities are matched for overall recognition rates by blurring the faces. This pattern of performance contrasts with the results obtained by Hanley and Turner (2000) who showed that semantic information could be recalled equally easily from familiar blurred faces and voices. The current study used the procedure developed by Hanley and Turner (2000) and applied it to the stimuli used by Damjanovic and Hanley (2007). The findings showed a marked decrease in retrieval of occupations and names from familiar voices relative to blurred faces even though the two modalities were matched for overall levels of recognition and rated familiarity. Similar results were obtained in Experiment 2 in which the same participants were asked to recognise both faces and voices. It is argued that these findings pose problems for any model of person recognition (e.g., Burton, Bruce, & Johnston, 1990) in which familiarity decisions occur beyond the point at which information from different modalities has been integrated.

18.
Why are familiar-only experiences more frequent for voices than for faces?
Hanley, Smith, and Hadfield (1998) showed that when participants were asked to recognize famous people from hearing their voice, there was a relatively large number of trials in which the celebrity's voice was felt to be familiar but biographical information about the person could not be retrieved. When a face was found familiar, however, the celebrity's occupation was significantly more likely to be recalled. This finding is consistent with the view that it is much more difficult to associate biographical information with voices than with faces. Nevertheless, recognition level was much lower for voices than for faces in Hanley et al.'s study, and participants made significantly more false alarms in the voice condition. In the present study, recognition performance in the face condition was brought down to the same level as recognition in the voice condition by presenting the faces out of focus. Under these circumstances, it proved just as difficult to recall the occupations of faces found familiar as it was to recall the occupations of voices found familiar. In other words, there was an equally large number of familiar-only responses when faces were presented out of focus as in the voice condition. It is argued that these results provide no support for the view that it is relatively difficult to associate biographical information with a person's voice. It is suggested instead that associative connections between processing units at different levels in the voice-processing system are much weaker than is the case with the corresponding units in the face-processing system. This will reduce the recall of occupations from voices even when the voice has been found familiar. A simulation was performed using the latest version of the IAC model of person recognition (Burton, Bruce, & Hancock, 1999), which demonstrated that the model can readily accommodate the pattern of results obtained in this study.
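The IAC simulation itself is not reproduced here, but its core idea, that weaker between-level links in the voice pathway produce familiarity without semantic retrieval, can be sketched in a few lines of Python. Everything below (the three-unit cascade, parameter values, thresholds) is an illustrative assumption of mine, not the Burton, Bruce, & Hancock (1999) implementation:

```python
# Illustrative IAC-style sketch (assumptions mine): activation cascades
# from a modality-specific recognition unit to a person identity node
# (PIN) and on to a semantic unit (occupation). The only difference
# between the two runs is the strength of the between-level links,
# which the abstract argues is weaker for voices than for faces.

def clamp(a: float) -> float:
    return max(0.0, min(1.0, a))

def cascade(link_weight: float, steps: int = 60,
            rate: float = 0.3, decay: float = 0.1) -> tuple[float, float]:
    """Leaky activation cascade: input -> recognition unit -> PIN -> semantics.
    Returns final PIN (familiarity) and semantic (occupation) activation."""
    recog = pin = sem = 0.0
    for _ in range(steps):
        recog = clamp(recog + rate * 1.0 - decay * recog)
        pin = clamp(pin + rate * link_weight * recog - decay * pin)
        sem = clamp(sem + rate * link_weight * pin - decay * sem)
    return pin, sem

THRESHOLD = 0.7  # arbitrary familiarity / retrieval criterion

for label, weight in (("face", 1.0), ("voice", 0.25)):
    pin, sem = cascade(weight)
    print(f"{label}: PIN={pin:.2f} (familiar: {pin >= THRESHOLD}), "
          f"semantics={sem:.2f} (occupation recalled: {sem >= THRESHOLD})")
```

With identical sensory input, the weaker links let the voice identity node cross the familiarity criterion while its semantic unit stays below threshold, reproducing the familiar-only pattern described above.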

19.
Reaction times to make a familiarity decision to the faces of famous people were measured after recognition of the faces in a pre-training phase had occurred spontaneously or following prompting with a name or other cue. At test, reaction times to familiar faces that had been recognized spontaneously in the pre-training phase were significantly facilitated relative to an unprimed comparison condition. Reaction times to familiar faces recognized only after prompting in the pre-training phase were not significantly facilitated. This was demonstrated both when a name prompt was used (Experiments 1 and 3) and when subjects were cued with brief semantic information (Experiment 2).

Repetition priming was not found to depend on prior spontaneous recognition per se. In Experiment 3, spontaneously recognizing a familiar face did not prime subsequent familiarity judgements when the same face had only been identified following prompting on a prior encounter. In Experiment 4, recognition memory for faces recognized after cueing was found to be over 90% accurate. This indicates that prompted recognition does not yield repetition priming, even though subjects can remember the faces. A fusion of “face recognition unit” and “episodic record” accounts of the repetition priming effect may be more useful than either theory alone in explaining these results.
