Similar Articles
20 similar articles found (search time: 31 ms)
1.
Three experiments were designed to investigate how listeners to coarticulated speech use the acoustic speech signal during a vowel to extract information about a forthcoming oral or nasal consonant. The first experiment showed that listeners use evidence of nasalization in a vowel as information for a forthcoming nasal consonant. The second and third experiments attempted to distinguish two accounts of this ability. According to one account, listeners hear nasalization in the vowel as such and use it to predict that a forthcoming consonant is nasal. According to the second, they perceive speech gestures and hear nasalization in the acoustic domain of a vowel as the onset of a nasal consonant. Therefore, they parse nasal information from the vowel and hear the vowel as oral. Experiment 2 found evidence in favor of the parsing hypothesis. Experiment 3 showed, however, that parsing is incomplete.

2.
This investigation examined whether speakers produce reliable prosodic correlates to meaning across semantic domains and whether listeners use these cues to derive word meaning from novel words. Speakers were asked to produce phrases in infant-directed speech in which novel words were used to convey one of two meanings from a set of antonym pairs (e.g., big/small). Acoustic analyses revealed that some acoustic features were correlated with overall valence of the meaning. However, each word meaning also displayed a unique acoustic signature, and semantically related meanings elicited similar acoustic profiles. In two perceptual tests, listeners either attempted to identify the novel words with a matching meaning dimension (picture pair) or with mismatched meaning dimensions. Listeners inferred the meaning of the novel words significantly more often when prosody matched the word meaning choices than when prosody mismatched. These findings suggest that speech contains reliable prosodic markers to word meaning and that listeners use these prosodic cues to differentiate meanings. That prosody is semantic suggests a reconceptualization of traditional distinctions between linguistic and nonlinguistic properties of spoken language.

3.
Listeners are able to accurately recognize speech despite variation in acoustic cues across contexts, such as different speaking rates. Previous work has suggested that listeners use rate information (indicated by vowel length; VL) to modify their use of context-dependent acoustic cues, like voice-onset time (VOT), a primary cue to voicing. We present several experiments and simulations that offer an alternative explanation: that listeners treat VL as a phonetic cue rather than as an indicator of speaking rate, and that they rely on general cue-integration principles to combine information from VOT and VL. We demonstrate that listeners use the two cues independently, that VL is used in both naturally produced and synthetic speech, and that the effects of stimulus naturalness can be explained by a cue-integration model. Together, these results suggest that listeners do not interpret VOT relative to rate information provided by VL and that the effects of speaking rate can be explained by more general cue-integration principles.
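A hedged sketch of what "general cue-integration principles" can look like in practice: each cue contributes an independent log-likelihood ratio and the voicing decision sums them. The Gaussian cue distributions, means, and standard deviations below are invented for illustration only; they are not taken from the study's stimuli or its actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated voicing categories: 0 = voiced /b/, 1 = voiceless /p/.
# VOT is longer, and the preceding vowel shorter, for voiceless stops.
n = 4000
voiceless = rng.integers(0, 2, n)
vot = np.where(voiceless == 1, 52.5, 30.0) + rng.normal(0, 15, n)   # ms
vl = np.where(voiceless == 1, 200.0, 260.0) + rng.normal(0, 40, n)  # ms

def gaussian_llr(x, m_voiced, m_voiceless, sd):
    """Log-likelihood ratio log P(x | voiceless) / P(x | voiced)
    for equal-variance Gaussian cue distributions."""
    return (x - (m_voiced + m_voiceless) / 2) * (m_voiceless - m_voiced) / sd**2

llr_vot = gaussian_llr(vot, 30.0, 52.5, 15.0)
llr_vl = gaussian_llr(vl, 260.0, 200.0, 40.0)

# Independent integration: the combined evidence is the sum of the LLRs.
acc = lambda llr: np.mean((llr > 0) == (voiceless == 1))
vot_acc, vl_acc, both_acc = acc(llr_vot), acc(llr_vl), acc(llr_vot + llr_vl)
print(f"VOT alone: {vot_acc:.2f}  VL alone: {vl_acc:.2f}  combined: {both_acc:.2f}")
```

On this toy data the summed-LLR decision outperforms either cue alone, which is the signature of independent cue combination the abstract appeals to.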

4.
Most models of word recognition concerned with prosody are based on a distinction between strong syllables (containing a full vowel) and weak syllables (containing a schwa). In these models, the possibility that listeners take advantage of finer grained prosodic distinctions, such as primary versus secondary stress, is usually rejected on the grounds that these two categories are not discriminable from each other without lexical information or normalization of the speaker's voice. In the present experiment, subjects were presented with word fragments that differed only in their degree of stress, namely primary or secondary stress (e.g., /'prasI/ vs. /"prasI/). The task was to guess the origin of the fragment (e.g., "prosecutor" vs. "prosecution"). The results showed that guessing performance significantly exceeded chance level, which indicates that making fine stress distinctions is possible without lexical information and with minimal speech normalization. This finding is discussed in the framework of prosody-based word recognition theories.

5.
6.
A central question in psycholinguistic research is how listeners isolate words from connected speech despite the paucity of clear word-boundary cues in the signal. A large body of empirical evidence indicates that word segmentation is promoted by both lexical (knowledge-derived) and sublexical (signal-derived) cues. However, an account of how these cues operate in combination or in conflict is lacking. The present study fills this gap by assessing speech segmentation when cues are systematically pitted against each other. The results demonstrate that listeners do not assign the same power to all segmentation cues; rather, cues are hierarchically integrated, with descending weights allocated to lexical, segmental, and prosodic cues. Lower level cues drive segmentation when the interpretive conditions are altered by a lack of contextual and lexical information or by white noise. Taken together, the results call for an integrated, hierarchical, and signal-contingent approach to speech segmentation.

7.
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model: the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context dependent. This study assessed the informational assumptions of several models of speech categorization, in particular, the number of cues that form the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2,880 fricative productions (Jongman, Wayland, & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values and manipulated the information in the training set to contrast (a) models based on a small number of invariant cues, (b) models using all cues without compensation, and (c) models in which cues underwent compensation for contextual factors. Compensation was modeled by computing cues relative to expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved accuracy similar to listeners' and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed.
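The C-CuRE idea of "computing cues relative to expectations" can be sketched minimally as follows: re-express each cue as a residual from its context-conditional expectation, then categorize from the residuals. The toy corpus, single cue, hypothetical talker offsets, and bare-bones gradient-descent classifier are illustrative assumptions, not the study's actual 24-cue corpus or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: two fricative categories cued by one spectral cue, with each
# of four hypothetical talkers shifting the cue by an idiosyncratic offset.
n = 400
talker = rng.integers(0, 4, n)
talker_offset = np.array([-2.0, -0.5, 0.5, 2.0])
category = rng.integers(0, 2, n)              # e.g., 0 = /s/, 1 = /sh/
cue = 1.5 * category + talker_offset[talker] + rng.normal(0, 0.4, n)

def c_cure(cue, context):
    """C-CuRE-style compensation: express each cue value relative to its
    expected value given the context (here, the talker's mean), which
    removes context effects while keeping fine-grained residual detail."""
    expected = np.zeros_like(cue)
    for t in np.unique(context):
        expected[context == t] = cue[context == t].mean()
    return cue - expected

def fit_logistic(x, y, lr=0.5, steps=2000):
    """Minimal one-cue logistic regression trained by gradient descent."""
    w = b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(x, y):
    w, b = fit_logistic(x, y)
    return np.mean(((w * x + b) > 0) == (y == 1))

raw_acc = accuracy(cue, category)
comp_acc = accuracy(c_cure(cue, talker), category)
print(f"raw cue: {raw_acc:.2f}  compensated (C-CuRE): {comp_acc:.2f}")
```

With the talker variability regressed out, the same simple classifier separates the categories far better, mirroring the abstract's claim that compensation lets even simple categorization metrics overcome variability.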

8.
We propose a psycholinguistic model of lexical processing which incorporates both process and representation. The view of lexical access and selection that we advocate claims that these processes are conducted with respect to abstract, underspecified phonological representations of lexical form. The abstract form of a given item in the recognition lexicon is an integrated segmental-featural representation from which all predictable and non-distinctive information is withheld. This means that listeners do not have available to them, as they process the speech input, a representation of the surface phonetic realisation of a given word-form. What determines performance is the abstract, underspecified representation with respect to which this surface string is being interpreted. These claims were tested by studying the interpretation of the same phonological feature, vowel nasality, in two languages, English and Bengali. The underlying status of this feature differs in the two languages: nasality is distinctive only in consonants in English, while both vowels and consonants contrast in nasality in Bengali. Both languages have an assimilation process which spreads nasality from a nasal consonant to the preceding vowel. A cross-linguistic gating study was conducted to investigate whether listeners would interpret nasal and oral vowels differently in the two languages. The results show that surface phonetic nasality in the vowel in VN sequences is used by English listeners to anticipate the upcoming nasal consonant. In Bengali, however, nasality is initially interpreted as an underlying nasal vowel. Bengali listeners respond to CVN stimuli with words containing a nasal vowel until they get information about the nasal consonant. In contrast, oral vowels in both languages are unspecified for nasality and are interpreted accordingly. Listeners in both languages respond with CVN words (which have phonetic nasality on the surface) as well as with CVC words when hearing an oral vowel.
The results of this cross-linguistic study support, in detail, the hypothesis that the listener's interpretation of the speech input is in terms of an abstract underspecified representation of lexical form.

9.
The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease (PD), with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice, and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).

10.
The research investigates how listeners segment the acoustic speech signal into phonetic segments and explores implications that the segmentation strategy may have for their perception of the (apparently) context-sensitive allophones of a phoneme. Two manners of segmentation are contrasted. In one, listeners segment the signal into temporally discrete, context-sensitive segments. In the other, which may be consistent with the talker’s production of the segments, they partition the signal into separate, but overlapping, segments freed of their contextual influences. Two complementary predictions of the second hypothesis are tested. First, listeners will use anticipatory coarticulatory information for a segment as information for the forthcoming segment. Second, subjects will not hear anticipatory coarticulatory information as part of the phonetic segment with which it co-occurs in time. The first hypothesis is supported by findings on a choice reaction time procedure; the second is supported by findings on a 4IAX discrimination test. Implications of the findings for theories of speech production, perception, and of the relation between the two are considered.

11.
Talkers hyperarticulate vowels when communicating with listeners who require increased speech intelligibility. Vowel hyperarticulation is said to be motivated by knowledge of the listener's linguistic needs because it typically occurs in speech to infants, foreigners, and hearing-impaired listeners, but not to non-verbal pets. However, the degree to which vowel hyperarticulation is determined by feedback from the listener is surprisingly poorly understood. This study examines whether mothers' speech input is driven by knowledge of the infant's linguistic competence, or by the infant's feedback cues. Specifically, we manipulated (i) whether mothers believed their infants could hear them, and (ii) the audibility of the speech signal available to the infant (full or partial audibility, or inaudible). Remarkably, vowel hyperarticulation was completely unaffected by mothers' beliefs; instead, it was graded by audibility: vowels were hyperarticulated to the greatest extent in the full-audibility condition, hyperarticulated less in the partially audible condition, and not hyperarticulated at all in the inaudible condition. Thus, while it might be considered adaptive to hyperarticulate speech to a hearing-impaired adult or infant, when these two factors (infancy and hearing difficulty) are coupled, vowel hyperarticulation is sacrificed. Our results imply that infant feedback drives talker behavior and raise implications for intervention strategies used with carers of hearing-impaired infants.

12.
This study introduces a new paradigm for investigating lexical processing. First, an analysis of data from a series of word-spotting experiments is presented, suggesting that listeners treat vowels as more mutable than consonants in auditory word recognition in English. To assess this hypothesis, a word reconstruction task was devised in which listeners were required to turn word-like nonwords into words by adapting the identity of either one vowel or one consonant. Listeners modified vowel identity more readily than consonant identity. Furthermore, incorrect responses more often involved a vowel change than a consonant change. These findings are compatible with the proposal that English listeners are equipped to deal with vowel variability by assuming that vowel identity is comparatively underdefined. The results are discussed in the light of theoretical accounts of speech processing.

13.
The article reviews the literature from psychology, phonetics, and phonology bearing on the production and perception of syllable timing in speech. A review of the psychological and phonetics literature suggests that the production of vowels and consonants is interleaved in syllable sequences in such a way that vowel production is continuous or nearly so. Based on that literature, a hypothesis is developed concerning the perception of syllable timing, assuming that vowel production is continuous. The hypothesis is that perceived syllable timing corresponds to the time sequencing of the vowels as produced, and not to the timing either of vowel onsets as conventionally measured or of syllable-initial consonants. Three experiments support the hypothesis. One shows that information present during the portion of an acoustic signal in which a syllable-initial consonant predominates is used by listeners to identify the vowel. Compatibly, this information for the vowel contributes to the vowel's perceived duration. Finally, a measure of the perceived timing of a syllable correlates significantly with the time required to identify syllable-medial vowels, but not with the time to identify the syllable-initial consonants. Further support for the proposed mode of vowel-consonant production and perception is derived from the literature on phonology. Language-specific phonological conventions can be identified that may reflect exaggerations and conventionalizations of the articulatory tendency for vowels to be produced continuously in speech.

14.
Even when the speaker, context, and speaking style are held fixed, the physical properties of naturally spoken utterances of the same speech sound vary considerably. This variability imposes limits on our ability to distinguish between different speech sounds. We present a conceptual framework for relating the ability to distinguish between speech sounds in single-token experiments (in which each speech sound is represented by a single wave form) to resolution in multiple-token experiments. Experimental results indicate that this ability is substantially reduced by an increase in the number of tokens from 1 to 4, but that there is little further reduction when the number of tokens increases to 16. Furthermore, although there is little relation between the ability to distinguish between a given pair of tokens in the multiple- and the 1-token experiments, there is a modest correlation between the ability to distinguish specific vowel tokens in the 4- and 16-token experiments. These results suggest that while listeners use a multiplicity of cues to distinguish between single tokens of a pair of vowel sounds, so that performance is highly variable both across tokens and listeners, they use a smaller set when distinguishing between populations of naturally produced vowel tokens, so that variability is reduced. The effectiveness of the cues used in the latter case is limited more by internal noise than by the variability of the cues themselves.

15.
This article reviews recent research on the use of prosody to resolve syntactic ambiguity. It first introduces prosodic features and their functions, then briefly describes syntactic ambiguity and its processing models. From the perspectives of both the speaker and the listener, it analyzes two questions: whether speakers spontaneously and reliably generate prosodic cues in natural settings, and whether listeners can use prosodic information immediately to guide initial syntactic parsing. It also briefly discusses the distinctive characteristics of Chinese prosody and syntax and related research within China, and concludes with a preliminary discussion of issues that should be considered in corresponding research on Chinese.

16.
Prolongation of speech sounds is currently used to modify stuttering and enhance fluency. Prolonged speech (PS) (e.g., prolonged vowels, prolongation throughout utterances) is, however, often perceived as unnatural by listeners. This study examined at which durations and in which contexts 52 college students (whose primary language was American English) perceived PS to be unnatural. Stimuli were limited to controlled variation in prolongation of the vowel in the middle single-syllable word of a carrier phrase (i.e., "say [word] again"). The prolongation was effected by digital waveform manipulation within the Kay Elemetrics Computerized Speech Laboratory (CSL). The listeners judged whether they strongly agreed, agreed, or disagreed that the phrases sounded natural. Results indicated that the extent of vowel duration (and possibly context) does influence listeners' perception of speech naturalness, findings which can be applied to facilitate fluency therapy.

Educational objectives: (1) The reader will learn about and be able to summarize the digital waveform manipulation procedure in the study. (2) The reader will learn about and be able to describe the effects of differential vowel prolongation on listeners' perception of speech naturalness. (3) The reader will learn about and evaluate how differential vowel prolongation can be used to enhance fluency.


17.
Despite spectral and temporal discontinuities in the speech signal, listeners normally report coherent phonetic patterns corresponding to the phonemes of a language that they know. What is the basis for the internal coherence of phonetic segments? According to one account, listeners achieve coherence by extracting and integrating discrete cues; according to another, coherence arises automatically from general principles of auditory form perception; according to a third, listeners perceive speech patterns as coherent because they are the acoustic consequences of coordinated articulatory gestures in a familiar language. We tested these accounts in three experiments by training listeners to hear a continuum of three-tone, modulated sine wave patterns, modeled after a minimal pair contrast between three-formant synthetic speech syllables, either as distorted speech signals carrying a phonetic contrast (speech listeners) or as distorted musical chords carrying a nonspeech auditory contrast (music listeners). The music listeners could neither integrate the sine wave patterns nor perceive their auditory coherence to arrive at consistent, categorical percepts, whereas the speech listeners judged the patterns as speech almost as reliably as the synthetic syllables on which they were modeled. The outcome is consistent with the hypothesis that listeners perceive the phonetic coherence of a speech signal by recognizing acoustic patterns that reflect the coordinated articulatory gestures from which they arose.

18.
Prosodic patterns of speech appear to make a critical contribution to memory-related processing. We considered the case of a previously unexplored prosodic feature of Greek storytelling and its effect on free recall in thirty typically developing children between the ages of 10 and 12 years, using short ecologically valid auditory stimuli. The combination of a falling pitch contour and, more notably, extensive final-syllable vowel lengthening, which gives rise to the prosodic feature in question, led to significantly higher performance in comparison to neutral phrase-final prosody. The number of syllables in target words did not reveal a substantial difference in performance. The current study presents a previously undocumented, culturally specific prosodic pattern and its effect on short-term memory.

19.
Speech processing requires sensitivity to long-term regularities of the native language, yet demands that listeners flexibly adapt to perturbations that arise from talker idiosyncrasies such as nonnative accent. The present experiments investigate whether listeners exhibit dimension-based statistical learning of correlations between acoustic dimensions defining the perceptual space for a given speech segment. While engaged in a word recognition task in which perceptually unambiguous voice-onset time (VOT) acoustics signaled beer, pier, deer, or tear, listeners were exposed incidentally to an artificial "accent" deviating from English norms in its correlation of the pitch onset of the following vowel (F0) with VOT. Results across four experiments are indicative of rapid, dimension-based statistical learning; reliance on the F0 dimension in word recognition was rapidly down-weighted in response to the perturbation of the correlation between the F0 and VOT dimensions. However, listeners did not simply mirror the short-term input statistics. Instead, response patterns were consistent with a lingering influence of sensitivity to the long-term regularities of English. This suggests that the very acoustic dimensions defining perceptual space are not fixed and, rather, are dynamically and rapidly adjusted to the idiosyncrasies of local experience, such as might arise from nonnative accent, dialect, or dysarthria. The current findings extend demonstrations of "object-based" statistical learning across speech segments to include incidental, online statistical learning of regularities residing within a speech segment.
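One hedged way to picture dimension-based down-weighting is an online learner whose cue weights track each dimension's current predictive validity. The delta-rule learner, the standardized VOT/F0 cue distributions, and the block sizes below are illustrative assumptions for this sketch, not the authors' experimental design or model.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_block(n, f0_corr):
    """Simulated trials: voicing category is reliably signaled by VOT,
    while F0 either follows the English-like correlation (f0_corr = +1)
    or the artificial accent's reversed one (f0_corr = -1)."""
    cat = rng.integers(0, 2, n)                                 # 0 = /b/, 1 = /p/
    vot = np.where(cat == 1, 1.0, -1.0) + rng.normal(0, 0.5, n)
    f0 = f0_corr * np.where(cat == 1, 1.0, -1.0) + rng.normal(0, 0.5, n)
    return np.stack([vot, f0], axis=1), cat

def update_weights(w, x, y, lr=0.05):
    """Online delta-rule (logistic) updates: each weight drifts toward
    the value that makes its cue predictive of the category."""
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + np.exp(-(xi @ w)))
        w = w - lr * (p - yi) * xi
    return w

w = np.zeros(2)
x, y = sample_block(500, f0_corr=+1.0)     # canonical (English-like) exposure
w = update_weights(w, x, y)
w_before = w.copy()
x, y = sample_block(500, f0_corr=-1.0)     # "accented" exposure: F0 reversed
w = update_weights(w, x, y)
print("F0 weight before accent:", round(w_before[1], 2), "after:", round(w[1], 2))
```

When the short-term F0-VOT correlation is perturbed, the learner's F0 weight drops, the qualitative down-weighting pattern the abstract reports; capturing the lingering influence of long-term English regularities would require an additional slow-learning or prior term not included in this sketch.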

20.
Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account, in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme, in which lexical representations can be partially activated on the basis of early cues and then updated when more information arrives. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds that requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150–350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account of speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. They also have major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.
