Similar Documents
20 similar documents found (search time: 46 ms)
1.
The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease (PD), with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical ‘pseudo-utterances’ were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice, and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).

2.
A standard speaker read linguistically confident and doubtful texts in a confident or doubtful voice. A computer-based acoustic analysis of the four tapes showed that paralinguistic confidence was expressed by increased loudness of voice, a rapid rate of speech, and infrequent, short pauses. Under some conditions, higher pitch levels and greater pitch and energy fluctuations in the voice were also related to paralinguistic confidence. In a 2 × 2 design, observers perceived and used these cues to attribute confidence and related personality traits to the speaker. Both text and voice cues were related to confidence ratings; in addition, the two types of cue were related to different personality attributes.
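The acoustic cues named above (loudness, pausing, and pitch level/variability) correspond to standard measurements that can be extracted from any recording. The sketch below is not the study's original computer-based analysis; it shows one hypothetical way to compute comparable features with the librosa library, where the file name and pause threshold are invented, and speaking rate would additionally require a transcript or syllable detector.

```python
import numpy as np
import librosa

def confidence_cues(wav_path, pause_rms_threshold=0.01):
    """Extract rough acoustic correlates of paralinguistic confidence:
    loudness, pitch level/variability, and pausing. Thresholds are illustrative."""
    y, sr = librosa.load(wav_path, sr=None)

    # Loudness: frame-wise RMS energy and its variability.
    rms = librosa.feature.rms(y=y)[0]
    loudness_mean, loudness_sd = float(np.mean(rms)), float(np.std(rms))

    # Pitch: fundamental-frequency track (unvoiced frames come back as NaN).
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[~np.isnan(f0)]
    pitch_mean, pitch_sd = float(np.mean(f0)), float(np.std(f0))

    # Pausing: proportion of low-energy frames as a crude silence measure.
    pause_ratio = float(np.mean(rms < pause_rms_threshold))

    return {"loudness_mean": loudness_mean, "loudness_sd": loudness_sd,
            "pitch_mean_hz": pitch_mean, "pitch_sd_hz": pitch_sd,
            "pause_ratio": pause_ratio}

# Hypothetical usage: the confident readings described above would be expected
# to show higher loudness and a lower pause ratio than the doubtful readings.
# print(confidence_cues("confident_reading.wav"))
```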

3.
Under a noisy “cocktail-party” listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners to enhance target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound with a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker’s voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, electrodermal (skin conductance) responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase in listening effort when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker’s voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

4.
Voice recognition was assessed by a matching-to-sample procedure in 30 right-handed adults with normal hearing. The subject was required to indicate which of three voices speaking a nonsense syllable matched the speaker of a sample vowel. Subjects were able to recognize voices with reasonable accuracy, but there were no significant differences as a function of ears or practice, and performance was not markedly affected by knowledge of results or mode of response. There was a significant difference as a function of the temporal position of the matching voice, with recognition being most accurate when the matching voice was first and least accurate when it was third. Further research is necessary to determine whether voice recognition should be classified as the type of verbal ability associated with the cerebral hemisphere dominant for speech, or whether it is the type of nonverbal auditory ability associated with the nonspeech hemisphere. This research was supported in part by grant MA-1652 from the Medical Research Council of Canada.

5.
Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the current paper, we tested whether 3- and 5-year-old children attend to speaker reliability when they make inductive inferences about a non-obvious property of a novel artifact based on its label. Children were more likely to use a reliable speaker's label than an unreliable speaker's label when making inductive inferences. Thus, children not only prefer to learn from reliable speakers, they are also more likely to use information from reliable speakers as the basis for future inferences. The findings are discussed in light of the debate between a similarity-driven and a label-driven approach to inductive inferences.

6.
Are listeners able to adapt to a foreign-accented speaker who has, as is often the case, an inconsistent accent? Two groups of native Dutch listeners participated in a cross-modal priming experiment, either in a consistent-accent condition (German-accented items only) or in an inconsistent-accent condition (German-accented and nativelike pronunciations intermixed). The experimental words were identical for both groups (words with vowel substitutions characteristic of German-accented speech); additional contextual words differed in accentedness (German-accented or nativelike words). All items were spoken by the same speaker: a German native who could produce the accented forms but could also pass for a Dutch native speaker. Listeners in the consistent-accent group were able to adapt quickly to the speaker (i.e., showed facilitatory priming for words with vocalic substitutions). Listeners in the inconsistent-accent condition showed adaptation to words with vocalic substitutions only in the second half of the experiment. These results indicate that adaptation to foreign-accented speech is rapid. Accent inconsistency slows listeners down initially, but a short period of additional exposure is enough for them to adapt to the speaker. Listeners can therefore tolerate inconsistency in foreign-accented speech.

7.
A man, a woman, and a child saying the same vowel do so with very different voices. The auditory system solves the complex problem of extracting what the man, woman or child has said despite substantial differences in the acoustic properties of their voices. Much of the acoustic variation between the voices of men and women is due to changes in the underlying anatomical mechanisms for producing speech. If the auditory system knew the sex of the speaker, then it could potentially correct for speaker-sex-related acoustic variation, thus facilitating vowel recognition. This study measured the minimum stimulus duration necessary to accurately discriminate whether a brief vowel segment was spoken by a man or a woman, and the minimum stimulus duration necessary to accurately recognise what vowel was spoken. Results showed that reliable vowel recognition precedes reliable speaker-sex discrimination, thus questioning the use of speaker-sex information in compensating for speaker-sex-related acoustic variation in the voice. Furthermore, the pattern of performance across experiments in which the fundamental frequency and formant frequency information of speakers' voices were systematically varied was markedly different depending on whether the task was speaker-sex discrimination or vowel recognition. This argues for there being little relationship between perception of speaker sex (indexical information) and perception of what has been said (linguistic information) at short durations.

8.
Apart from speech content, the human voice also carries paralinguistic information about speaker identity. Voice identification and its neural correlates have received little scientific attention up to now. Here we use event-related potentials (ERPs) in an adaptation paradigm in order to investigate the neural representation and the time course of vocal identity processing. Participants adapted to repeated vowel-consonant-vowel (VCV) utterances of one personally familiar speaker (either A or B) before classifying a subsequent test voice varying on an identity continuum between these two speakers. Following adaptation to speaker A, test voices were more likely to be perceived as speaker B and vice versa, and these contrastive voice identity aftereffects (VIAEs) were much more pronounced when the same syllable, rather than a different syllable, was used as adaptor. Adaptation induced amplitude reductions of the frontocentral N1-P2 complex and a prominent reduction of the parietal P3 component for test voices preceded by identity-corresponding adaptors. Importantly, only the P3 modulation remained clear for across-syllable combinations of adaptor and test stimuli. Our results suggest that voice identity is contrastively processed by specialized neurons in auditory cortex within ~250 ms after stimulus onset, with identity processing becoming less dependent on speech content after ~300 ms.

9.
We describe an account of lexically guided tuning of speech perception based on interactive processing and Hebbian learning. Interactive feedback provides lexical information to prelexical levels, and Hebbian learning uses that information to retune the mapping from auditory input to prelexical representations of speech. Simulations of an extension of the TRACE model of speech perception are presented that demonstrate the efficacy of this mechanism. Further simulations show that acoustic similarity can account for the patterns of speaker generalization. This account addresses the role of lexical information in guiding both perception and learning with a single set of principles of information propagation.
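Abstract 9 describes a computational mechanism: lexical feedback combined with Hebbian learning retunes the auditory-to-prelexical mapping. As a rough, hypothetical illustration of that idea (not the authors' actual TRACE simulations), the sketch below updates a small weight matrix whenever lexical feedback and an auditory input are co-active; the layer sizes, learning rate, and feedback vector are all invented.

```python
import numpy as np

# Invented dimensions: 12 auditory feature detectors, 4 prelexical phoneme units.
N_AUDITORY, N_PRELEXICAL = 12, 4
W = np.zeros((N_PRELEXICAL, N_AUDITORY))  # auditory -> prelexical mapping, learned below

def prelexical_activation(auditory_input, weights):
    """Bottom-up activation of the prelexical units from the auditory input."""
    return weights @ auditory_input

def hebbian_retune(weights, auditory_input, lexical_feedback, lr=0.05):
    """Lexically guided Hebbian step: top-down feedback from recognized words is
    added to the prelexical activations, and weights grow for co-active pairs."""
    prelexical = prelexical_activation(auditory_input, weights) + lexical_feedback
    return weights + lr * np.outer(prelexical, auditory_input)

rng = np.random.default_rng(0)
ambiguous_token = rng.normal(size=N_AUDITORY)      # an ambiguous speech sound
lexical_feedback = np.array([1.0, 0.0, 0.0, 0.0])  # the word context supports unit 0

for _ in range(10):
    W = hebbian_retune(W, ambiguous_token, lexical_feedback)

# After exposure, the same token drives the lexically supported unit most strongly:
# the auditory-to-prelexical mapping has been retuned toward the lexical interpretation.
print(prelexical_activation(ambiguous_token, W))
```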

10.
An experiment was conducted to investigate the reliability of voice lineups. More specifically, the experiment was designed to look into the effects of retention interval (an immediate test or a test after a week), speech duration (30 or 70 s) and acoustic environment (indoors only, or indoors and outdoors) on speaker identification accuracy. In addition, the relation between identification accuracy and the confidence assessments of both the participants and the test assistant was explored. A total of 361 participants heard the target voice in one of four exposure conditions (short or long text, and speech samples recorded only indoors or both indoors and outdoors). Half the participants were tested immediately after exposure to the target voice and half 1 week later. The results show that the target was correctly identified in 42% of cases. In the target-absent condition there were 51% false alarms. Acoustic environment did not affect identification accuracy. There was an interaction between speech duration and retention interval in the target-absent condition: when listeners were tested after a week, they made fewer false identifications if the speech sample was long, whereas no effects were found when participants were tested immediately. Only the confidence scores of the test assistant had predictive value; taking the confidence score of the test assistant into account therefore increases the diagnostic value of the identity parade.

11.
We are constantly exposed to our own face and voice, and we identify our own faces and voices as familiar. However, the influence of self-identity upon self-speech perception is still uncertain. Speech perception is a synthesis of both auditory and visual inputs; although we hear our own voice when we speak, we rarely see the dynamic movements of our own face. If visual speech and identity are processed independently, no processing advantage would obtain in viewing one’s own highly familiar face. In the present experiment, the relative contributions of facial and vocal inputs to speech perception were evaluated with an audiovisual illusion. Our results indicate that auditory self-speech conveys a processing advantage, whereas visual self-speech does not. The data thereby support a model of visual speech as dynamic movement processed separately from speaker recognition.

12.
Are perceptions of computer-synthesized speech altered by the belief that the person using this technology is disabled? In a 2 x 2 factorial design, participants completed an attitude pretest and were randomly assigned to watch an actor deliver a persuasive appeal under 1 of the following 4 conditions: disabled or nondisabled using normal speech and disabled or nondisabled using computer-synthesized speech. Participants then completed a posttest survey and a series of questionnaires assessing perceptions of voice, speaker, and message. Natural speech was perceived more favorably and was more persuasive than computer-synthesized speech. When the speaker was perceived to be speech-disabled, however, this difference diminished. This finding suggests that negatively viewed assistive technologies will be perceived more favorably when used by people with disabilities.

13.
Past research has established that listeners can accommodate a wide range of talkers in understanding language. How this adjustment operates, however, is a matter of debate. Here, listeners were exposed to spoken words from a speaker of an American English dialect in which the vowel /ae/ is raised before /g/, but not before /k/. Results from two experiments showed that listeners' identification of /k/-final words like back (which are unaffected by the dialect) was facilitated by prior exposure to their dialect-affected /g/-final counterparts, e.g., bag. This facilitation occurred because the competition between interpretations, e.g., bag or back, while hearing the initial portion of the input [bae], was mitigated by the reduced probability for the input to correspond to bag as produced by this talker. Thus, adaptation to an accent is not just a matter of adjusting the speech signal as it is being heard; adaptation involves dynamic adjustment of the representations stored in the lexicon, according to the characteristics of the speaker or the context.
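The competition account in abstract 13 can be phrased probabilistically: while the listener hears [bae], each candidate word is weighted by how likely this particular talker is to produce an unraised [ae] in that word. The toy numbers below are invented purely to illustrate that reasoning and are not taken from the study.

```python
def posterior_over_words(likelihoods, priors):
    """Bayes' rule over candidate words given the partial acoustic input."""
    unnormalized = {w: likelihoods[w] * priors[w] for w in likelihoods}
    total = sum(unnormalized.values())
    return {w: round(p / total, 2) for w, p in unnormalized.items()}

priors = {"bag": 0.5, "back": 0.5}  # both words equally expected a priori

# P(unraised [ae] | word, talker): invented values for illustration.
generic_talker = {"bag": 0.9, "back": 0.9}  # most talkers say [ae] in both words
dialect_talker = {"bag": 0.1, "back": 0.9}  # this talker raises /ae/ before /g/, so an
                                            # unraised [ae] rarely comes from "bag"

print(posterior_over_words(generic_talker, priors))  # {'bag': 0.5, 'back': 0.5}: full competition
print(posterior_over_words(dialect_talker, priors))  # {'bag': 0.1, 'back': 0.9}: "back" dominates early
```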

14.
While audiovisual integration is well known in speech perception, faces and speech are also informative with respect to speaker recognition. To date, audiovisual integration in the recognition of familiar people has never been demonstrated. Here we show systematic benefits and costs for the recognition of familiar voices when these are combined with time-synchronized articulating faces, of corresponding or noncorresponding speaker identity, respectively. While these effects were strong for familiar voices, they were smaller or nonsignificant for unfamiliar voices, suggesting that the effects depend on the previous creation of a multimodal representation of a person's identity. Moreover, the effects were reduced or eliminated when voices were combined with the same faces presented as static pictures, demonstrating that the effects do not simply reflect the use of facial identity as a “cue” for voice recognition. This is the first direct evidence for audiovisual integration in person recognition.

15.
A Study on the Speech Rate of Voice Warning Signals (cited 3 times: 0 self-citations, 3 by others)
Two sets of test materials, ordinary conversational sentence lists and aircraft warning sentence lists, were used to study the appropriate speech rate for voice warning signals by means of speech intelligibility testing and subjective evaluation. The speech rates in the experiment were set at six levels: 0.11, 0.15, 0.20, 0.25, 0.35 and 0.45 s/character. The experiment simulated an aircraft cockpit environment: computer-generated digitized speech signals were delivered to participants through headphones against 90 dB(A) aircraft noise. The study reached the following conclusions: the appropriate speech rate for voice warning signals is 0.25 s/character (i.e., 4 characters/s), with a lower limit of more than 0.20 s/character (fewer than 5 characters/s) and an upper limit of 0.30 s/character (3.33 characters/s).
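For reference, the two units used in this conclusion are simply reciprocals of one another:

$$\text{characters/s} = \frac{1}{\text{s/character}}, \qquad \frac{1}{0.25\ \text{s/character}} = 4\ \text{characters/s}, \qquad \frac{1}{0.30\ \text{s/character}} \approx 3.33\ \text{characters/s}.$$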

16.
Borden’s (1979, 1980) hypothesis that speakers with vulnerable speech systems rely more heavily on feedback monitoring than do speakers with less vulnerable systems was investigated. The second language (L2) of a speaker is vulnerable in comparison with the native language, so according to this hypothesis, alteration to feedback should have a detrimental effect on it. Here, we specifically examined whether altered auditory feedback has an effect on accent strength when speakers speak L2. There were three stages in the experiment. First, 6 German speakers who were fluent in English (their L2) were recorded under six conditions—normal listening, amplified voice level, voice shifted in frequency, delayed auditory feedback, and slowed and accelerated speech rate conditions. Second, judges were trained to rate accent strength. Training was assessed by whether it was successful in separating German speakers speaking English from native English speakers, also speaking English. In the final stage, the judges ranked recordings of each speaker from the first stage in order of increasing strength of German accent. The results show that accents were more pronounced under frequency-shifted and delayed auditory feedback conditions than under normal or amplified feedback conditions. Control tests were done to ensure that listeners were judging accent rather than fluency changes caused by altered auditory feedback. The findings are discussed in terms of Borden’s hypothesis and other accounts of why altered auditory feedback disrupts speech control.

17.
The effects of variation in a speaker's voice and temporal phoneme location were assessed through a series of speeded classification experiments. Listeners monitored speech syllables for target consonants or vowels. The results showed that speaker variability and phoneme-location variability had detrimental effects on classification latencies for target sounds. In addition, an interaction between variables showed that the speaker variability effect was obtained only when temporal phoneme location was fixed across trials. A subadditive decrement in latencies produced by the interaction of the two variables was also obtained, suggesting that perceptual loads may not affect perceptual adjustments to a speaker's voice in the same way that memory loads do.

18.
Recent work on perceptual learning shows that listeners' phonemic representations dynamically adjust to reflect the speech they hear (Norris, McQueen, & Cutler, 2003). We investigate how the perceptual system makes such adjustments, and what (if anything) causes the representations to return to their pre-perceptual-learning settings. Listeners are exposed to a speaker whose pronunciation of a particular sound (either /s/ or /ʃ/) is ambiguous (e.g., halfway between /s/ and /ʃ/). After exposure, participants are tested for perceptual learning on two continua that range from /s/ to /ʃ/, one in the Same voice they heard during exposure, and one in a Different voice. To assess how representations revert to their prior settings, half of Experiment 1's participants were tested immediately after exposure; the other half performed a 25-min silent intervening task. The perceptual learning effect was actually larger after such a delay, indicating that simply allowing time to pass does not cause learning to fade. The remaining experiments investigate different ways that the system might unlearn a person's pronunciations: listeners hear the Same or a Different speaker for 25 min with either no relevant (i.e., 'good') /s/ or /ʃ/ input (Experiment 2), one of the relevant inputs (Experiment 3), or both relevant inputs (Experiment 4). The results support a view of phonemic representations as dynamic and flexible, and suggest that they interact with both higher-level (e.g., lexical) and lower-level (e.g., acoustic) information in important ways.
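A common way to quantify the kind of perceptual learning described in abstract 18 is as a shift of the listener's category boundary along the /s/–/ʃ/ continuum, estimated by fitting a logistic function to identification responses. The sketch below uses invented response proportions and is an illustration of that analysis style, not the study's actual data or procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Probability of an /s/ response at continuum step x."""
    return 1.0 / (1.0 + np.exp(slope * (x - boundary)))

steps = np.arange(1, 8, dtype=float)  # a 7-step /s/-to-/sh/ continuum

# Invented proportions of /s/ responses before and after exposure to a talker
# whose ambiguous fricative was lexically disambiguated as /s/.
pre_exposure  = np.array([0.98, 0.95, 0.85, 0.55, 0.20, 0.05, 0.02])
post_exposure = np.array([0.99, 0.97, 0.92, 0.75, 0.45, 0.15, 0.05])

popt_pre, _ = curve_fit(logistic, steps, pre_exposure, p0=[4.0, 2.0])
popt_post, _ = curve_fit(logistic, steps, post_exposure, p0=[4.0, 2.0])

# A positive shift means more of the continuum is now heard as /s/ for this talker.
print(f"category boundary shift: {popt_post[0] - popt_pre[0]:.2f} continuum steps")
```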

19.
20.
During a conversation, we hear the sound of the talker as well as the intended message. Traditional models of speech perception posit that acoustic details of a talker's voice are not encoded with the message whereas more recent models propose that talker identity is automatically encoded. When shadowing speech, listeners often fail to detect a change in talker identity. The present study was designed to investigate whether talker changes would be detected when listeners are actively engaged in a normal conversation, and visual information about the speaker is absent. Participants were called on the phone, and during the conversation the experimenter was surreptitiously replaced by another talker. Participants rarely noticed the change. However, when explicitly monitoring for a change, detection increased. Voice memory tests suggested that participants remembered only coarse information about both voices, rather than fine details. This suggests that although listeners are capable of change detection, voice information is not continuously monitored at a fine-grain level of acoustic representation during natural conversation and is not automatically encoded. Conversational expectations may shape the way we direct attention to voice characteristics and perceive differences in voice.
