Similar Literature
1.
Despite spectral and temporal discontinuities in the speech signal, listeners normally report coherent phonetic patterns corresponding to the phonemes of a language that they know. What is the basis for the internal coherence of phonetic segments? According to one account, listeners achieve coherence by extracting and integrating discrete cues; according to another, coherence arises automatically from general principles of auditory form perception; according to a third, listeners perceive speech patterns as coherent because they are the acoustic consequences of coordinated articulatory gestures in a familiar language. We tested these accounts in three experiments by training listeners to hear a continuum of three-tone, modulated sine wave patterns, modeled after a minimal pair contrast between three-formant synthetic speech syllables, either as distorted speech signals carrying a phonetic contrast (speech listeners) or as distorted musical chords carrying a nonspeech auditory contrast (music listeners). The music listeners could neither integrate the sine wave patterns nor perceive their auditory coherence to arrive at consistent, categorical percepts, whereas the speech listeners judged the patterns as speech almost as reliably as the synthetic syllables on which they were modeled. The outcome is consistent with the hypothesis that listeners perceive the phonetic coherence of a speech signal by recognizing acoustic patterns that reflect the coordinated articulatory gestures from which they arose.
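The stimulus construction described here, three time-varying sinusoids tracking the formant centers of a synthetic syllable, can be sketched in code. The following is a minimal illustration, assuming hypothetical formant tracks and a 16-kHz sample rate; it is not the procedure or the stimuli of the study itself.

```python
import numpy as np

def sinewave_replica(formant_tracks, amp_tracks, frame_rate, sr=16000):
    """Synthesize a sine-wave speech analogue: one time-varying
    sinusoid per formant track (frequencies in Hz, one value per frame)."""
    n_frames = len(formant_tracks[0])
    n_samples = int(n_frames * sr / frame_rate)
    t = np.arange(n_samples) / sr
    frame_times = np.arange(n_frames) / frame_rate
    signal = np.zeros(n_samples)
    for freqs, amps in zip(formant_tracks, amp_tracks):
        # Interpolate frame-rate tracks up to the audio sample rate.
        f = np.interp(t, frame_times, freqs)
        a = np.interp(t, frame_times, amps)
        # Integrate instantaneous frequency to obtain phase.
        phase = 2 * np.pi * np.cumsum(f) / sr
        signal += a * np.sin(phase)
    return signal / np.max(np.abs(signal))

# Toy example: a /ba/-like pattern with rising formant transitions.
frames = 50  # 500 ms at a 100-Hz frame rate (hypothetical values)
f1 = np.linspace(300, 700, frames)
f2 = np.linspace(1000, 1200, frames)
f3 = np.linspace(2200, 2500, frames)
amps = [np.ones(frames)] * 3
audio = sinewave_replica([f1, f2, f3], amps, frame_rate=100)
```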

2.
Is the observed link between musical ability and non-native speech-sound processing due to enhanced sensitivity to acoustic features underlying both musical and linguistic processing? To address this question, native English speakers (N = 118) discriminated Norwegian tonal contrasts and Norwegian vowels. Short tones differing in temporal, pitch, and spectral characteristics were used to measure sensitivity to the various acoustic features implicated in musical and speech processing. Musical ability was measured using Gordon's Advanced Measures of Musical Audiation. Results showed that sensitivity to specific acoustic features played a role in non-native speech-sound processing: Controlling for non-verbal intelligence, prior foreign language-learning experience, and sex, sensitivity to pitch and spectral information partially mediated the link between musical ability and discrimination of non-native vowels and lexical tones. The findings suggest that while sensitivity to certain acoustic features partially mediates the relationship between musical ability and non-native speech-sound processing, complex tests of musical ability also tap into other shared mechanisms.
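The partial-mediation analysis reported here follows standard regression logic: the indirect effect is the product of the predictor-to-mediator and mediator-to-outcome paths. A toy sketch on simulated data, assuming invented effect sizes and omitting the study's covariates (non-verbal intelligence, language experience, sex):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 118
musical_ability = rng.normal(size=n)
# Simulate partial mediation: ability -> pitch sensitivity -> discrimination.
pitch_sensitivity = 0.6 * musical_ability + rng.normal(size=n)
vowel_discrimination = (0.3 * musical_ability
                        + 0.4 * pitch_sensitivity + rng.normal(size=n))

# Path a: predictor -> mediator.
a = sm.OLS(pitch_sensitivity, sm.add_constant(musical_ability)).fit().params[1]
# Path b and direct effect c': regress outcome on mediator and predictor.
X = sm.add_constant(np.column_stack([musical_ability, pitch_sensitivity]))
fit = sm.OLS(vowel_discrimination, X).fit()
c_prime, b = fit.params[1], fit.params[2]
print(f"indirect effect a*b = {a * b:.3f}, direct effect c' = {c_prime:.3f}")
```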

3.
Although little studied, whining is a vocal pattern that is both familiar and irritating to parents of preschool- and early school-age children. The current study employed multidimensional scaling to identify the crucial acoustic characteristics of whining speech by analysing participants' perceptions of its similarity to other types of speech (question, neutral speech, angry statement, demand, and boasting). We discovered not only that participants find whining speech more annoying than other forms of speech, but that it shares the salient acoustic characteristics found in motherese, namely increased pitch, slowed production, and exaggerated pitch contours. We think that this relationship is not random but may reflect the fact that the two forms of vocalization are the result of a similar accommodation to a universal human auditory sensitivity to the prosody of both forms of speech.
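Multidimensional scaling of the kind used here takes a matrix of pairwise perceptual dissimilarities among the speech types and embeds them in a low-dimensional space where clustering is interpretable. A minimal sketch, assuming a made-up dissimilarity matrix rather than the study's ratings:

```python
import numpy as np
from sklearn.manifold import MDS

speech_types = ["whine", "question", "neutral", "anger", "demand", "boast"]
# Hypothetical symmetric dissimilarity ratings
# (0 = identical, 1 = maximally different).
D = np.array([
    [0.0, 0.6, 0.9, 0.7, 0.5, 0.6],
    [0.6, 0.0, 0.5, 0.8, 0.6, 0.5],
    [0.9, 0.5, 0.0, 0.6, 0.5, 0.4],
    [0.7, 0.8, 0.6, 0.0, 0.4, 0.7],
    [0.5, 0.6, 0.5, 0.4, 0.0, 0.6],
    [0.6, 0.5, 0.4, 0.7, 0.6, 0.0],
])
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
for name, (x, y) in zip(speech_types, coords):
    print(f"{name:>8}: ({x:+.2f}, {y:+.2f})")
```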

4.
When listening to speech in everyday-life situations, our cognitive system must often cope with signal instabilities such as sudden breaks, mispronunciations, interfering noises or reverberations potentially causing disruptions at the acoustic/phonetic interface and preventing efficient lexical access and semantic integration. The physiological mechanisms allowing listeners to react instantaneously to such fast and unexpected perturbations in order to maintain intelligibility of the delivered message are still partly unknown. The present electroencephalography (EEG) study aimed at investigating the cortical responses to real-time detection of a sudden acoustic/phonetic change occurring in connected speech and how these mechanisms interfere with semantic integration. Participants listened to sentences in which final words could contain signal reversals along the temporal dimension (time-reversed speech) of varying durations and could have either a low- or high-cloze probability within sentence context. Results revealed that early detection of the acoustic/phonetic change elicited a fronto-central negativity shortly after the onset of the manipulation that matched the spatio-temporal features of the Mismatch Negativity (MMN) recorded in the same participants during an oddball paradigm. Time reversal also affected late event-related potentials (ERPs) reflecting semantic expectancies (N400) differently when words were predictable or not from the sentence context. These findings are discussed in the context of brain signatures to transient acoustic/phonetic variations in speech. They contribute to a better understanding of natural speech comprehension as they show that acoustic/phonetic information and semantic knowledge strongly interact under adverse conditions.
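The time-reversal manipulation itself is straightforward to express in code: a chosen span of the waveform is flipped along the temporal axis while the rest of the sentence is left intact. A minimal sketch, assuming hypothetical sample rate and segment boundaries:

```python
import numpy as np

def reverse_segment(signal, sr, start_s, duration_s):
    """Return a copy of `signal` with the span [start_s, start_s + duration_s)
    reversed along the temporal dimension (time-reversed speech)."""
    out = signal.copy()
    i0 = int(start_s * sr)
    i1 = min(i0 + int(duration_s * sr), len(signal))
    out[i0:i1] = out[i0:i1][::-1]
    return out

# Toy usage: reverse 150 ms at the onset of a sentence-final word.
sr = 16000
sentence = np.random.randn(2 * sr)   # stand-in for a recorded sentence
final_word_onset = 1.6               # seconds (hypothetical)
manipulated = reverse_segment(sentence, sr, final_word_onset, 0.150)
```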

5.
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.

6.
The cyclic variation in the energy envelope of the speech signal results from the production of speech in syllables. This acoustic property is often identified as a source of information in the perception of syllable attributes, though spectral variation can also provide this information reliably. In the present study of the relative contributions of the energy and spectral envelopes in speech perception, we employed sinusoidal replicas of utterances, which permitted us to examine the roles of these acoustic properties in establishing or maintaining time-varying perceptual coherence. Three experiments were carried out to assess the independent perceptual effects of variation in sinusoidal amplitude and frequency, using sentence-length signals. In Experiment 1, we found that the fine grain of amplitude variation was not necessary for the perception of segmental and suprasegmental linguistic attributes; in Experiment 2, we found that amplitude was nonetheless effective in influencing syllable perception, and that in some circumstances it was crucial to segmental perception; in Experiment 3, we observed that coarse-grain amplitude variation, above all, proved to be extremely important in phonetic perception. We conclude that in perceiving sinusoidal replicas, the perceiver derives much from following the coherent pattern of frequency variation and gross signal energy, but probably derives rather little from tracking the precise details of the energy envelope. These findings encourage the view that the perceiver uses time-varying acoustic properties selectively in understanding speech.
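One way to make the amplitude/frequency separation concrete is to neutralize a signal's energy envelope while leaving its frequency variation untouched. In the sine-wave paradigm this would be done by holding tone amplitudes constant at synthesis; the same idea can be sketched post hoc with the Hilbert envelope. A minimal illustration, not the study's procedure:

```python
import numpy as np
from scipy.signal import hilbert

def flatten_envelope(signal):
    """Remove amplitude variation while preserving frequency variation:
    divide the signal by its Hilbert (analytic) amplitude envelope."""
    envelope = np.abs(hilbert(signal))
    return signal / np.maximum(envelope, 1e-8)

# Toy usage: an amplitude- and frequency-modulated tone.
sr = 16000
t = np.arange(sr) / sr
carrier = np.sin(2 * np.pi * (200 * t + 50 * t**2))  # rising frequency
am = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))           # 4-Hz energy envelope
flattened = flatten_envelope(am * carrier)  # sweep kept, envelope removed
```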

7.
The development of speech perception has a profound influence on an individual's language development. During the first year of life, shaped by linguistic experience, infants' speech perception gradually develops from an initial language-general sensitivity into perception specifically attuned to the native language. Researchers have proposed a statistical learning mechanism to explain this process: infants are highly sensitive to the frequency distributions of speech sounds in their linguistic environment and, by computing these distributions, can partition the speech continuum into the phonetic categories that are contrastive in the native language. In addition, a functional reorganization mechanism and certain social cues also exert an important influence on the development of infant speech perception.
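The distributional computation attributed to infants can be illustrated with a toy model: fit a mixture model to a bimodal distribution of a phonetic cue and read the components off as candidate categories. A sketch assuming invented voice onset time (VOT) values, in the spirit of distributional-learning models rather than any specific study:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical bimodal VOT input, as in a language with a /b/-/p/ contrast.
vot_ms = np.concatenate([rng.normal(10, 5, 500),    # short-lag cluster
                         rng.normal(60, 10, 500)])  # long-lag cluster

gmm = GaussianMixture(n_components=2, random_state=0).fit(vot_ms.reshape(-1, 1))
means = sorted(gmm.means_.ravel())
print(f"inferred category centers: {means[0]:.1f} ms and {means[1]:.1f} ms")
# A unimodal input distribution would be captured as well by one component,
# yielding a single category: the core idea of distributional learning.
```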

8.
When listeners hear a sinusoidal replica of a sentence, they perceive linguistic properties despite the absence of short-time acoustic components typical of vocal signals. Is this accomplished by a postperceptual strategy that accommodates the anomalous acoustic pattern ad hoc, or is a sinusoidal sentence understood by the ordinary means of speech perception? If listeners treat sinusoidal signals as speech signals however unlike speech they may be, then perception should exhibit the commonplace sensitivity to the dimensions of the originating vocal tract. The present study, employing sinusoidal signals, raised this issue by testing the identification of target /bVt/, or b-vowel-t, syllables occurring in sentences that differed in the range of frequency variation of their component tones. Vowel quality of target syllables was influenced by this acoustic correlate of vocal-tract scale, implying that the perception of these nonvocal signals includes a process of vocal-tract normalization. Converging evidence suggests that the perception of sinusoidal vowels depends on the relation among component tones and not on the phonetic likeness of each tone in isolation. The findings support the general claim that sinusoidal replicas of natural speech signals are perceptible phonetically because they preserve time-varying information present in natural signals.

9.
Wonnacott, E., & Watson, D. G. (2008). Cognition, 107(3), 1093-1101.
Acoustic emphasis may convey a range of subtle discourse distinctions, yet little is known about how this complex ability develops in children. This paper presents a first investigation of the factors which influence the production of acoustic prominence in young children's spontaneous speech. In a production experiment, SVO sentences were elicited from 4-year-olds who were asked to describe events in a video. Children were found to place more acoustic prominence both on 'new' words and on words that were 'given' but had shifted to a more accessible position within the discourse. This effect of accessibility concurs with recent studies of adult speech. We conclude that, by age four, children show appropriate, adult-like use of acoustic prominence, suggesting sensitivity to a variety of discourse distinctions.

10.
Speech sounds can be classified on the basis of their underlying articulators or on the basis of the acoustic characteristics resulting from particular articulatory positions. Research in speech perception suggests that distinctive features are based on both articulatory and acoustic information. In recent years, neuroelectric and neuromagnetic investigations provided evidence for the brain's early sensitivity to distinctive features and their acoustic consequences, particularly for place of articulation distinctions. Here, we compare English consonants in a Mismatch Field design across two broad and distinct places of articulation, labial and coronal, and provide further evidence that early evoked auditory responses are sensitive to these features. We further add to the findings of asymmetric consonant processing, although we do not find support for coronal underspecification. Labial glides (Experiment 1) and fricatives (Experiment 2) elicited larger Mismatch responses than their coronal counterparts. Interestingly, their M100 dipoles differed along the anterior/posterior dimension in the auditory cortex that has previously been found to spatially reflect place of articulation differences. Our results are discussed with respect to acoustic and articulatory bases of featural speech sound classifications and with respect to a model that maps distinctive phonetic features onto long-term representations of speech sounds.

11.
Developmental changes in children's sensitivity to the role of acoustic variation in the speech stream in conveying speaker affect (vocal paralanguage) were examined. Four-, 7- and 10-year-olds heard utterances in three formats: low-pass filtered, reiterant, and normal speech. The availability of lexical and paralinguistic information varied across these three formats in a way that required children to base their judgments of speaker affect on different configurations of cues in each format. Across ages, the best performance was obtained when a rich array of acoustic cues was present and when there was no competing lexical information. Four-year-olds performed at chance when judgments had to be based solely on speech prosody in the filtered format and they were unable to selectively attend to paralanguage when discrepant lexical cues were present in normal speech. Seven-year-olds were significantly more sensitive to the paralinguistic role of speech prosody in filtered speech than were 4-year-olds and there was a trend toward greater attention to paralanguage when lexical and paralinguistic cues were inconsistent in normal speech. An integration of the ability to utilize prosodic cues to speaker affect with attention to paralanguage in cases of lexical/paralinguistic discrepancy was observed for 10-year-olds. The results are discussed in terms of the development of a perceptual bias emerging out of selective attention to language.
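The low-pass filtered format works by preserving prosodic cues to affect (pitch contour, rhythm, gross amplitude) while rendering the words unintelligible. A minimal sketch of such filtering, assuming a typical cutoff near 400 Hz rather than the study's exact parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def low_pass_speech(signal, sr, cutoff_hz=400.0, order=8):
    """Low-pass filter speech so lexical content is masked but
    prosodic cues to speaker affect (pitch contour, rhythm) remain."""
    sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)

sr = 16000
utterance = np.random.randn(3 * sr)   # stand-in for a recorded utterance
filtered = low_pass_speech(utterance, sr)
```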

12.
Two- and 3-month-old infants were found to discriminate the acoustic cues for the phonetic feature of place of articulation in a categorical manner; that is, evidence for the discriminability of two synthetic speech patterns was present only when the stimuli signaled a change in the phonetic feature of place. No evidence of discriminability was found when two stimuli, separated by the same acoustic difference, signaled acoustic variations of the same phonetic feature. Discrimination of the same acoustic cues in a nonspeech context was found, in contrast, to be noncategorical or continuous. The results were discussed in terms of infants' ability to process acoustic events in either an auditory or a linguistic mode.

13.
Two experiments were performed employing acoustic continua which change from speech to nonspeech. The members of one continuum, synthesized on the Pattern Playback, varied in the bandwidths of the first three formants in equal steps of change, from the vowel /ɑ/ to a nonspeech buzz. The other continuum, achieved through digital synthesis, varied in the bandwidths of the first five formants, from the vowel /æ/ to a buzz. Identification and discrimination tests were carried out to establish that these continua were perceived categorically. Perceptual adaptation of these continua revealed shifts in the category boundaries comparable to those previously reported for speech sounds. The results were interpreted as suggesting that neither phonetic nor auditory feature detectors are responsible for perceptual adaptation of speech sounds, and that feature detector accounts of speech perception should therefore be reconsidered.

14.
Children determined to be at risk (n = 24) or not at risk (n = 13) for reading difficulty listened to tokens from a voice onset time (VOT) (/ga/-/ka/) or tone series played in a continuous unbroken rhythm. Changes between tokens occurred at random intervals and children were asked to press a button as soon as they detected a change. For the VOT series, at-risk children were less sensitive than not-at-risk children to changes between tokens that crossed the phonetic boundary. Maps of group stimulus space produced using multidimensional scaling of reaction times for the VOT series indicated that at-risk children may attend less to the phonological information available in the speech stimuli and more to subtle acoustic differences between phonetically similar stimuli than not-at-risk children. Better phonological processing was associated with greater sensitivity to changes between VOT tokens that crossed the phonetic boundary and greater relative weighting of the phonological compared to the acoustic dimension across both groups.

15.
Research on infant speech perception shows that infants can initially (at 1-4 months) discriminate nearly all phonetic category contrasts. As native-language experience accumulates, infants' speech perception gradually shows the influence of native phonological properties: consonant perception shows increased sensitivity to native-language category boundaries and decreased sensitivity to non-native boundaries, non-native categories begin to be assimilated into the native phonological system, and native vowel perception exhibits the perceptual magnet effect. This evidence indicates that infants gradually acquire the phonemic categories of the native language, and that the order of acquisition may depend on factors such as the acoustic properties of category exemplars and their frequency of occurrence.

16.
During a conversation, we hear the sound of the talker as well as the intended message. Traditional models of speech perception posit that acoustic details of a talker's voice are not encoded with the message whereas more recent models propose that talker identity is automatically encoded. When shadowing speech, listeners often fail to detect a change in talker identity. The present study was designed to investigate whether talker changes would be detected when listeners are actively engaged in a normal conversation, and visual information about the speaker is absent. Participants were called on the phone, and during the conversation the experimenter was surreptitiously replaced by another talker. Participants rarely noticed the change. However, when explicitly monitoring for a change, detection increased. Voice memory tests suggested that participants remembered only coarse information about both voices, rather than fine details. This suggests that although listeners are capable of change detection, voice information is not continuously monitored at a fine-grain level of acoustic representation during natural conversation and is not automatically encoded. Conversational expectations may shape the way we direct attention to voice characteristics and perceive differences in voice.

17.
How might young learners parse speech into linguistically relevant units? Sensitivity to prosodic markers of these segments is one possibility. Seven experiments examined infants' sensitivity to acoustic correlates of phrasal units in English. The results suggest that: (a) 9-month-olds, but not 6-month-olds, are attuned to cues that differentially mark speech that is artificially segmented at linguistically COINCIDENT as opposed to NONCOINCIDENT boundaries (Experiments 1 and 2); (b) the pattern holds across both subject phrases and predicate phrases and across samples of both Child- and Adult-directed speech (Experiments 3, 4, and 7); and (c) both 9-month-olds and adults show the sensitivity even when most phonetic information is removed by low-pass filtering (Experiments 5 and 6). Acoustic analyses suggest that pitch changes and in some cases durational changes are potential cues that infants might be using to make their discriminations. These findings are discussed with respect to their implications for theories of language acquisition.

18.
Speech processing requires sensitivity to long-term regularities of the native language yet demands listeners to flexibly adapt to perturbations that arise from talker idiosyncrasies such as nonnative accent. The present experiments investigate whether listeners exhibit dimension-based statistical learning of correlations between acoustic dimensions defining perceptual space for a given speech segment. While engaged in a word recognition task guided by perceptually unambiguous voice-onset time (VOT) acoustics to signal beer, pier, deer, or tear, listeners were exposed incidentally to an artificial "accent" deviating from English norms in its correlation of the pitch onset of the following vowel (F0) to VOT. Results across four experiments are indicative of rapid, dimension-based statistical learning; reliance on the F0 dimension in word recognition was rapidly down-weighted in response to the perturbation of the correlation between F0 and VOT dimensions. However, listeners did not simply mirror the short-term input statistics. Instead, response patterns were consistent with a lingering influence of sensitivity to the long-term regularities of English. This suggests that the very acoustic dimensions defining perceptual space are not fixed and, rather, are dynamically and rapidly adjusted to the idiosyncrasies of local experience, such as might arise from nonnative accent, dialect, or dysarthria. The current findings extend demonstrations of "object-based" statistical learning across speech segments to include incidental, online statistical learning of regularities residing within a speech segment.
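The down-weighting result can be illustrated as cue reweighting: a classifier trained on canonical tokens, where F0 and VOT covary, assigns weight to both dimensions, whereas training on "accented" input that decorrelates them drives the F0 weight toward zero. A toy sketch with invented distributions, not the study's model or stimuli:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def sample_tokens(n, f0_correlated):
    """Simulate /b/-/p/ tokens: VOT always signals voicing; F0 is
    correlated with voicing in canonical speech, flat in the 'accent'."""
    voiced = rng.integers(0, 2, n)                     # 0 = /b/, 1 = /p/
    vot = np.where(voiced, rng.normal(60, 8, n), rng.normal(10, 8, n))
    if f0_correlated:
        f0 = np.where(voiced, rng.normal(220, 15, n), rng.normal(180, 15, n))
    else:
        f0 = rng.normal(200, 15, n)                    # accent: F0 uninformative
    return np.column_stack([vot, f0]), voiced

for label, correlated in [("canonical", True), ("accented", False)]:
    X, y = sample_tokens(2000, correlated)
    w = LogisticRegression().fit((X - X.mean(0)) / X.std(0), y).coef_[0]
    print(f"{label}: VOT weight {w[0]:+.2f}, F0 weight {w[1]:+.2f}")
```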

19.
In a spoken utterance, a talker expresses linguistic constituents in serial order. A listener resolves these linguistic properties in the rapidly fading auditory sample. Classic measures agree that auditory integration occurs at a fine temporal grain. In contrast, recent studies have proposed that sensory integration of speech occurs at a coarser grain, approximate to the syllable, on the basis of indirect and relatively insensitive perceptual measures. Evidence from cognitive neuroscience and behavioral primatology has also been adduced to support the claim of sensory integration at the pace of syllables. In the present investigation, we used direct performance measures of integration, applying an acoustic technique to isolate the contribution of short-term acoustic properties to the assay of modulation sensitivity. In corroborating the classic finding of a fine temporal grain of integration, these functional measures can inform theory and speculation in accounts of speech perception.
