Similar Documents
20 similar documents found (search time: 31 ms)
1.
Models of speech processing typically assume that speech is represented by a succession of codes. In this paper we argue for the psychological validity of a prelexical (phonetic) code and for a postlexical (phonological) code. Whereas phonetic codes are computed directly from an analysis of input acoustic information, phonological codes are derived from information made available subsequent to the perception of higher order (word) units. The results of four experiments described here indicate that listeners can gain access to, or identify, entities at both of these levels. In these studies listeners were presented with sentences and were asked to respond when a particular word-initial target phoneme was detected (phoneme monitoring). In the first three experiments speed of lexical access was manipulated by varying the lexical status (word/nonword) or frequency (high/low) of a word in the critical sentences. Reaction times (RTs) to target phonemes were unaffected by these variables when the target phoneme was on the manipulated word. On the other hand, RTs were substantially affected when the target-bearing word was immediately after the manipulated word. These studies demonstrate that listeners can respond to the prelexical phonetic code. Experiment IV manipulated the transitional probability (high/low) of the target-bearing word and the comprehension test administered to subjects. The results suggest that listeners are more likely to respond to the postlexical phonological code when contextual constraints are present. The comprehension tests did not appear to affect the code to which listeners responded. A “Dual Code” hypothesis is presented to account for the reported findings. According to this hypothesis, listeners can respond to either the phonetic or the phonological code, and various factors (e.g., contextual constraints, memory load, clarity of the input speech signal) influence in predictable ways the code that will be responded to. The Dual Code hypothesis is also used to account for and integrate data gathered with other experimental tasks and to make predictions about the outcome of further studies.

2.
We examined whether listeners use acoustic correlates of voicing to resolve lexical ambiguities created by whispered speech, in which a key feature, voicing, is missing. Three associative priming experiments were conducted. The results showed a priming effect with whispered primes that included an intervocalic voiceless consonant (/petal/ “petal”) when the visual targets (FLEUR “flower”) were presented at the offset of the primes. A priming effect emerged with whispered primes that included a voiced intervocalic consonant (/pedal/ “pedal”) when the delay between the offset of the primes and the visual targets (VELO “bike”) was increased by 50 ms. In none of the experiments did the voiced primes (/pedal/) facilitate the processing of the targets (FLEUR) associated with the voiceless primes (/petal/). Our results suggest that the acoustic correlates of voicing are used by listeners to recover the intended words. Nonetheless, the retrieval of the voiced feature is not immediate during whispered word recognition.

3.
Speech input presented under sub-optimal conditions generally incurs processing costs that affect spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign-accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, their reliance on semantic information increases. Speech rate was not observed to influence semantic access, except when unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.

4.
For optimal word recognition, listeners should use all relevant acoustic information as soon as it becomes available. Using printed-word eye tracking, we investigated when during word processing Dutch listeners use suprasegmental lexical stress information to recognize words. Fixations on targets such as “OCtopus” (capitals indicate stress) were more frequent than fixations on segmentally overlapping but differently stressed competitors (“okTOber”) before segmental information could disambiguate the words. Furthermore, prior to segmental disambiguation, initially stressed words were stronger lexical competitors than noninitially stressed words. Listeners recognize words by immediately using all relevant information in the speech signal.

5.
Eleven-month-olds can recognize a few auditorily presented familiar words in experimental situations where no hints are given by the intonation, the situation, or the presence of possible visual referents. That is, infants of this age (and possibly somewhat younger) can recognize words based on sound patterns alone. The issue addressed in this article is what type of mental representations infants use to code the words they recognize. The results of a series of experiments with French-learning infants indicate that word representations in 11-month-olds are segmentally underspecified and suggest that they are all the more underspecified when infants engage in recognizing words rather than merely attending to meaningless speech sounds. But underspecification has limits, which were explored here with respect to word-initial consonants. The last two experiments point the way to further investigation of these limits, both for word-initial consonants and for segments in other word positions. In French, infants' word representations are flexible enough to allow for structural changes in the voicing or even in the manner of articulation of word-initial consonants. Word-initial consonants must be present, however, for words to be recognized. In conclusion, a parallel is proposed between the emerging capacities to ignore variations that are irrelevant for word recognition in a “lexical mode” and to ignore variations that are phonemically irrelevant in a “neutral mode” of listening to native speech.

6.
Listeners are able to accurately recognize speech despite variation in acoustic cues across contexts, such as different speaking rates. Previous work has suggested that listeners use rate information (indicated by vowel length; VL) to modify their use of context-dependent acoustic cues, like voice-onset time (VOT), a primary cue to voicing. We present several experiments and simulations that offer an alternative explanation: that listeners treat VL as a phonetic cue rather than as an indicator of speaking rate, and that they rely on general cue-integration principles to combine information from VOT and VL. We demonstrate that listeners use the two cues independently, that VL is used in both naturally produced and synthetic speech, and that the effects of stimulus naturalness can be explained by a cue-integration model. Together, these results suggest that listeners do not interpret VOT relative to rate information provided by VL and that the effects of speaking rate can be explained by more general cue-integration principles.
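As a rough illustration of the cue-integration idea in the abstract above, here is a minimal sketch in which VOT and vowel length each contribute an independent log-odds term toward a voiceless versus voiced decision. The weights, boundary values, and stimulus values are invented for illustration and are not the authors' model.

```python
import math

# Illustrative cue-integration sketch: each cue contributes an independent
# log-odds term toward a "voiceless" (e.g., /p/) vs. "voiced" (e.g., /b/) decision.
# All parameter values below are invented for illustration only.

CUE_WEIGHTS = {"vot_ms": 0.25, "vowel_length_ms": -0.04}    # reliability-based weights
CUE_MIDPOINTS = {"vot_ms": 25.0, "vowel_length_ms": 150.0}  # illustrative category boundaries

def p_voiceless(cues: dict) -> float:
    """Probability of a voiceless response from a weighted, independent
    combination of phonetic cues (logistic link)."""
    log_odds = sum(
        CUE_WEIGHTS[name] * (value - CUE_MIDPOINTS[name])
        for name, value in cues.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

# A long VOT pushes toward the voiceless category; a long vowel (negative weight)
# pushes back toward the voiced category, independently of any inferred speaking rate.
print(p_voiceless({"vot_ms": 40.0, "vowel_length_ms": 120.0}))
print(p_voiceless({"vot_ms": 40.0, "vowel_length_ms": 220.0}))
```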

7.
Speech perception requires listeners to integrate multiple cues that each contribute to judgments about a phonetic category. Classic studies of trading relations assessed the weights attached to each cue but did not explore the time course of cue integration. Here, we provide the first direct evidence that asynchronous cues to voicing (/b/ vs. /p/) and manner (/b/ vs. /w/) contrasts become available to the listener at different times during spoken word recognition. Using the visual world paradigm, we show that the probabilities of eye movements to pictures of target and competitor objects diverge at different points in time after the onset of the target word. These points of divergence correspond to the availability of early (voice onset time or formant transition slope) and late (vowel length) cues to voicing and manner contrasts. These results support a model of cue integration in which phonetic cues are used for lexical access as soon as they are available.
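To make the "point of divergence" analysis concrete, the sketch below shows one simple way such a point could be estimated from fixation-proportion curves: find the first time bin at which the target proportion exceeds the competitor proportion by some margin and stays ahead for a sustained run of bins. The data, threshold, and window size are made up; this is not the authors' analysis.

```python
# Hypothetical divergence-point estimate for visual-world fixation curves.
# target[t] and competitor[t] are fixation proportions in successive time bins.

def divergence_bin(target, competitor, threshold=0.05, run=3):
    """Return the first bin index where target exceeds competitor by at least
    `threshold` and keeps doing so for `run` consecutive bins, else None."""
    n = min(len(target), len(competitor))
    for i in range(n - run + 1):
        if all(target[i + j] - competitor[i + j] >= threshold for j in range(run)):
            return i
    return None

# Toy curves (proportions per 50-ms bin); values are invented.
target     = [0.10, 0.12, 0.15, 0.25, 0.40, 0.55, 0.65]
competitor = [0.10, 0.11, 0.13, 0.15, 0.16, 0.15, 0.12]
bin_ms = 50
i = divergence_bin(target, competitor)
print(f"divergence at ~{i * bin_ms} ms after word onset" if i is not None else "no divergence")
```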

8.
Evidence is presented that (a) the open and the closed word classes in English have different phonological characteristics, (b) the phonological dimension on which they differ is one to which listeners are highly sensitive, and (c) spoken open- and closed-class words produce different patterns of results in some auditory recognition tasks. What implications might link these findings? Two recent lines of evidence from disparate paradigms—the learning of an artificial language, and natural and experimentally induced misperception of juncture—are summarized, both of which suggest that listeners are sensitive to the phonological reflections of open- vs. closed-class word status. Although these correlates cannot be strictly necessary for efficient processing, if they are present listeners exploit them in making word class assignments. That such a use of phonological information is of value to listeners could be indirect evidence that open- vs. closed-class words undergo different processing operations.

9.
When identifying spoken words, older listeners may have difficulty resolving lexical competition or may place a greater weight on factors like lexical frequency. To obtain information about age differences in the time course of spoken word recognition, young and older adults' eye movements were monitored as they followed spoken instructions to click on objects displayed on a computer screen. Older listeners were more likely than younger listeners to fixate high-frequency displayed phonological competitors. However, degradation of auditory quality in younger listeners does not reproduce this result. These data are most consistent with an increased role for lexical frequency with age.

10.
Models of how listeners understand speech must specify the types of representations that are computed, the nature of the flow of information, and the control structures that modify performance. Three experiments are reported that focus on the control processes in speech perception. Subjects in the experiments tried to discriminate stimuli in which a phoneme had been replaced with white noise from stimuli in which white noise was merely superimposed on a phoneme. In the first two experiments, subjects practiced the discrimination for thousands of trials but did not improve, suggesting that they have poor access to low-level representations of the speech signal. In the third experiment, each (auditory) stimulus was preceded by a visual cue that could potentially be used to focus attention in order to enhance performance. Only subjects who received information about both the identity of the impending word and the identity of the critical phoneme showed enhanced discrimination. Other cues, including syllabic plus phonemic information, were ineffective. The results indicate that attentional control of processing is difficult but possible, and that lexical representations play a central role in the allocation of attention.

11.
Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds that requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150–350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. They also have major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.

12.
A central question in psycholinguistic research is how listeners isolate words from connected speech despite the paucity of clear word-boundary cues in the signal. A large body of empirical evidence indicates that word segmentation is promoted by both lexical (knowledge-derived) and sublexical (signal-derived) cues. However, an account of how these cues operate in combination or in conflict is lacking. The present study fills this gap by assessing speech segmentation when cues are systematically pitted against each other. The results demonstrate that listeners do not assign the same power to all segmentation cues; rather, cues are hierarchically integrated, with descending weights allocated to lexical, segmental, and prosodic cues. Lower level cues drive segmentation when the interpretive conditions are altered by a lack of contextual and lexical information or by white noise. Taken together, the results call for an integrated, hierarchical, and signal-contingent approach to speech segmentation.
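The hierarchical-weighting claim in the abstract above can be pictured as a scheme in which each candidate word boundary receives evidence from lexical, segmental, and prosodic cues, with lexical evidence outweighing the rest whenever it is available. The sketch below is only a toy rendering of that idea, with invented weights; it is not the authors' model.

```python
# Toy rendering of hierarchically weighted segmentation cues.
# Weights descend from lexical to segmental to prosodic; values are invented.

CUE_WEIGHTS = {"lexical": 4.0, "segmental": 2.0, "prosodic": 1.0}

def boundary_score(cues_for_boundary, lexical_available=True):
    """Sum weighted evidence for a candidate word boundary.
    When lexical/contextual information is unavailable (e.g., nonword
    context or white noise), lower-level signal cues drive segmentation."""
    score = 0.0
    for cue in cues_for_boundary:
        if cue == "lexical" and not lexical_available:
            continue  # knowledge-derived cue removed from the decision
        score += CUE_WEIGHTS[cue]
    return score

# Two candidate boundary locations with conflicting support:
a = boundary_score({"lexical"})                # supported only by word knowledge
b = boundary_score({"segmental", "prosodic"})  # supported only by signal cues
print(a > b)   # True: lexical evidence wins when it is available

a = boundary_score({"lexical"}, lexical_available=False)
print(a > b)   # False: with lexical information removed, signal cues win
```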

13.
The possible-word constraint (PWC; Norris, McQueen, Cutler, & Butterfield, 1997) has been proposed as a language-universal segmentation principle: Lexical candidates are disfavoured if the resulting segmentation of continuous speech leads to vowelless residues in the input—for example, single consonants. Three word-spotting experiments investigated segmentation in Slovak, a language with single-consonant words and fixed stress. In Experiment 1, Slovak listeners detected real words such as ruka “hand” embedded in prepositional-consonant contexts (e.g., /gruka/) faster than those in nonprepositional-consonant contexts (e.g., /truka/) and slowest in syllable contexts (e.g., /dugruka/). The second experiment controlled for effects of stress. Responses were still fastest in prepositional-consonant contexts, but were now slowest in nonprepositional-consonant contexts. In Experiment 3, the lexical and syllabic status of the contexts was manipulated. Responses were again slowest in nonprepositional-consonant contexts but equally fast in prepositional-consonant, prepositional-vowel, and nonprepositional-vowel contexts. These results suggest that Slovak listeners use fixed stress and the PWC to segment speech, but that single consonants that can be words have a special status in Slovak segmentation. Knowledge about what constitutes a phonologically acceptable word in a given language therefore determines whether vowelless stretches of speech are or are not treated as acceptable parts of the lexical parse.

14.
This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g., [WItlo?], from witlof, chicory) and unambiguous [s]-final words (e.g., naaldbos, pine forest). Another group heard the reverse (e.g., ambiguous [na:ldbo?], unambiguous witlof). Listeners who had heard [?] in [f]-final words were subsequently more likely to categorize ambiguous sounds on an [f]-[s] continuum as [f] than those who heard [?] in [s]-final words. Control conditions ruled out alternative explanations based on selective adaptation and contrast. Lexical information can thus be used to train categorization of speech. This use of lexical information differs from the on-line lexical feedback embodied in interactive models of speech perception. In contrast to on-line feedback, lexical feedback for learning is of benefit to spoken word recognition (e.g., in adapting to a newly encountered dialect).

15.
Language-users reduce words in predictable contexts. Previous research indicates that reduction may be stored in lexical representation if a word is often reduced. Because representation influences production regardless of context, production should be biased by how often each word has been reduced in the speaker’s prior experience. This study investigates whether speakers have a context-independent bias to reduce low-informativity words, which are usually predictable and therefore usually reduced. Content word durations were extracted from the Buckeye and Switchboard speech corpora, and analyzed for probabilistic reduction effects using a language model based on spontaneous speech in the Fisher corpus. The analysis supported the hypothesis: low-informativity words have shorter durations, even when the effects of local contextual predictability, frequency, speech rate, and several other variables are controlled for. Additional models that compared word types against only other words of the same segmental length further supported this conclusion. Words that usually appear in predictable contexts are reduced in all contexts, even those in which they are unpredictable. The result supports representational models in which reduction is stored, and where sufficiently frequent reduction biases later production. The finding provides new evidence that probabilistic reduction interacts with lexical representation.
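As a sketch of how an informativity measure of the kind described above could be computed and related to duration, the code below treats a word's informativity as its average surprisal (negative log contextual probability) over a toy corpus, and then lists hypothetical mean durations alongside it. The corpus, durations, and bigram model are invented for illustration; the study's actual language model and statistical controls are more elaborate.

```python
import math
from collections import defaultdict

# Sketch: informativity = mean negative log2 probability of a word given its
# preceding word, averaged over all of the word's occurrences in a corpus.
# The toy "corpus" and durations below are invented for illustration.

corpus = "i want to go to the store i want to see the show".split()

bigram = defaultdict(lambda: defaultdict(int))
unigram = defaultdict(int)
for prev, word in zip(corpus, corpus[1:]):
    bigram[prev][word] += 1
    unigram[prev] += 1

def informativity(word):
    """Average surprisal (bits) of `word` over the contexts it occurs in."""
    surprisals = [
        -math.log2(bigram[prev][word] / unigram[prev])
        for prev, w in zip(corpus, corpus[1:]) if w == word
    ]
    return sum(surprisals) / len(surprisals)

# Hypothetical mean durations (ms) measured for two words:
durations = {"to": 85.0, "see": 140.0}
for w, dur in durations.items():
    print(f"{w}: informativity={informativity(w):.2f} bits, duration={dur} ms")
```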

16.
The roles of spectro-temporal coherence, lexical status, and word position in the perception of speech in acoustic signals containing a mixture of speech and nonspeech sounds were investigated. Stimuli consisted of nine (non)words in which white noise either was inserted only into the silent interval preceding and/or following the onset of vocalic transitions ambiguous between /p/ and /f/, or overlaid the entire utterance. Ten listeners perceived 85% /f/s when noise was inserted only into the silent interval signaling a stop closure, 47% /f/s when noise overlaid the entire (non)words, and 1% in the control condition that contained no noise. Effects of spectro-temporal coherence seemed to have dominated perceptual outcomes, although the lexical status and position of the critical phoneme also appeared to affect responses. The results are explained more adequately by the theory of Auditory Scene Analysis than by the Motor Theory of Speech Perception.

17.
Although previous research has established that multiple top-down factors guide the identification of words during speech processing, the ultimate range of information sources that listeners integrate from different levels of linguistic structure is still unknown. In a set of experiments, we investigate whether comprehenders can integrate information from the two most disparate domains: pragmatic inference and phonetic perception. Using contexts that trigger pragmatic expectations regarding upcoming coreference (expectations for either he or she), we test listeners' identification of phonetic category boundaries (using acoustically ambiguous words on the /hi/~/ʃi/ continuum). The results indicate that, in addition to phonetic cues, word recognition also reflects pragmatic inference. These findings are consistent with evidence for top-down contextual effects from lexical, syntactic, and semantic cues, but they extend this previous work by testing cues at the pragmatic level and by eliminating a statistical-frequency confound that might otherwise explain the previously reported results. We conclude by exploring the time course of this interaction and discussing how different models of cue integration could be adapted to account for our results.
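One way to picture the result above is as a shift in a logistic psychometric function along the /hi/~/ʃi/ continuum, with the pragmatic expectation contributing an additional bias term. The sketch below uses invented parameters and is only an illustration of that kind of model, not the authors' analysis.

```python
import math

# Sketch: probability of a "she" response along an acoustic /hi/~/ʃi/ continuum,
# with a pragmatically induced bias shifting responses toward one category.
# Parameter values are invented for illustration.

def p_she(step, boundary=3.5, slope=1.2, pragmatic_bias=0.0):
    """Logistic psychometric function over continuum steps 0..7.
    `pragmatic_bias` > 0 models a context in which 'she' is more expected."""
    return 1.0 / (1.0 + math.exp(-(slope * (step - boundary) + pragmatic_bias)))

ambiguous_step = 3.5  # maximally ambiguous token
print(p_she(ambiguous_step))                      # neutral context: ~0.50
print(p_she(ambiguous_step, pragmatic_bias=1.0))  # context expecting "she": > 0.50
```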

18.
Most models of word recognition concerned with prosody are based on a distinction between strong syllables (containing a full vowel) and weak syllables (containing a schwa). In these models, the possibility that listeners take advantage of finer-grained prosodic distinctions, such as primary versus secondary stress, is usually rejected on the grounds that these two categories are not discriminable from each other without lexical information or normalization of the speaker's voice. In the present experiment, subjects were presented with word fragments that differed only in their degree of stress, namely primary or secondary stress (e.g., /ˈprasI/ vs. /ˌprasI/). The task was to guess the origin of the fragment (e.g., “prosecutor” vs. “prosecution”). The results showed that guessing performance significantly exceeded the chance level, which indicates that making fine stress distinctions is possible without lexical information and with minimal speech normalization. This finding is discussed in the framework of prosody-based word recognition theories.

19.
Phoneme identification with audiovisually discrepant stimuli is influenced by information in the visual signal (the McGurk effect). Additionally, lexical status affects identification of auditorily presented phonemes. The present study tested for lexical influences on the McGurk effect. Participants identified phonemes in audiovisually discrepant stimuli in which the lexical status of the auditory component and of a visually influenced percept was independently varied. Visually influenced (McGurk) responses were more frequent when they formed a word and when the auditory signal was a nonword (Experiment 1). Lexical effects were larger for slow than for fast responses (Experiment 2), as with auditory speech, and were replicated with stimuli matched on physical properties (Experiment 3). These results are consistent with models in which lexical processing of speech is modality independent.

20.
We propose a psycholinguistic model of lexical processing which incorporates both process and representation. The view of lexical access and selection that we advocate claims that these processes are conducted with respect to abstract underspecified phonological representations of lexical form. The abstract form of a given item in the recognition lexicon is an integrated segmental-featural representation, where all predictable and non-distinctive information is withheld. This means that listeners do not have available to them, as they process the speech input, a representation of the surface phonetic realisation of a given word-form. What determines performance is the abstract, underspecified representation with respect to which this surface string is being interpreted. These claims were tested by studying the interpretation of the same phonological feature, vowel nasality, in two languages, English and Bengali. The underlying status of this feature differs in the two languages; nasality is distinctive only in consonants in English, while both vowels and consonants contrast in nasality in Bengali. Both languages have an assimilation process which spreads nasality from a nasal consonant to the preceding vowel. A cross-linguistic gating study was conducted to investigate whether listeners would interpret nasal and oral vowels differently in two languages. The results show that surface phonetic nasality in the vowel in VN sequences is used by English listeners to anticipate the upcoming nasal consonant. In Bengali, however, nasality is initially interpreted as an underlying nasal vowel. Bengali listeners respond to CVN stimuli with words containing a nasal vowel, until they get information about the nasal consonant. In contrast, oral vowels in both languages are unspecified for nasality and are interpreted accordingly. Listeners in both languages respond with CVN words (which have phonetic nasality on the surface) as well as with CVC words while hearing an oral vowel. The results of this cross-linguistic study support, in detail, the hypothesis that the listener's interpretation of the speech input is in terms of an abstract underspecified representation of lexical form.
