Found 20 similar articles (search time: 0 ms)
The interactions, during word-recognition in continuous speech, between the bottom-up analyses of the input and different forms of internally generated top-down constraint, were investigated using a shadowing task and a mispronunciation detection task (in the detection task the subject saw a text of the original passage as he listened to it). The listener's dependence on bottom-up analyses in the shadowing task, as measured by the number of fluent restorations of mispronounced words, was found to vary as a function of the syllable position of the mispronunciation within the word and of the contextual constraints on the word as a whole. In the detection task only syllable position effects were obtained. The results, discussed in conjunction with earlier research, were found to be inconsistent with either the logogen model of word-recognition or an autonomous search model. Instead, an active direct access model is proposed, in which top-down processing constraints interact directly with bottom-up information to produce the primary lexical interpretation of the acoustic-phonetic input.

Models of speech perception have stressed the importance of investigating recognition of words in fluent speech. The effects of word length and the initial phonemes of words on the speech perception of foreign language learners were investigated. English-speaking subjects were asked to listen for target words in repeated presentations of a prose passage read in French by a native speaker. The four target words were either one or four syllables in length and began with either an initial stop or fricative consonant. Each of the four words was substituted 60 times in identical sentence contexts in place of nouns deleted from the original story. The results indicated that four-syllable words were more easily detected than one-syllable words. Contrary to expectation, stop-initial words were not more accurately detected than fricative-initial words. Based on these findings additional considerations that seem needed in order to apply current models of word recognition to naive listeners are discussed.

Does memory retrieval occur in a continuous or an all-or-none manner? The shape of the receiver operating characteristic (ROC) has been used to answer this question, with curvilinear and linear memory ROCs indicating continuous and all-or-none retrieval processes, respectively. Signal detection models (e.g., the unequal variance model) correspond to a continuous retrieval process, whereas threshold models (including the multinomial model and the recollection component of the dual-process model) correspond to an all-or-none process. In studies of source memory, Slotnick et al. (2000) and others have observed curvilinear ROCs (supporting the unequal variance model), whereas Yonelinas (1999) observed linear ROCs (supporting the dual-process model). We resolve these seemingly inconsistent results, showing that source memory ROCs are naturally curvilinear but can appear linear when nondiagnostic source information is included in the analysis. Furthermore, the unequal variance model accounted for both recognition memory and source memory ROCs, supporting a continuous process of memory retrieval.
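
The contrast between curvilinear and linear ROCs can be illustrated with a minimal sketch. It is not taken from the article: the function names, the parameter values, and the simple high-threshold form used for the all-or-none case are assumptions chosen only to show the two shapes.

```python
from statistics import NormalDist


def uvsd_roc(d_prime, sigma_old, criteria):
    """(false-alarm rate, hit rate) pairs under an unequal-variance
    signal detection model: new items ~ N(0, 1), old items ~ N(d_prime, sigma_old)."""
    new_items = NormalDist(0.0, 1.0)
    old_items = NormalDist(d_prime, sigma_old)
    return [(1 - new_items.cdf(c), 1 - old_items.cdf(c)) for c in criteria]


def threshold_roc(p_detect, guess_rates):
    """(false-alarm rate, hit rate) pairs under a simple high-threshold model:
    an old item is detected with probability p_detect, otherwise guessed 'old'
    with probability g. Hit rate is linear in g, so the ROC is a straight line."""
    return [(g, p_detect + (1 - p_detect) * g) for g in guess_rates]


# Hypothetical parameter values, for illustration only.
curved = uvsd_roc(d_prime=1.0, sigma_old=1.25, criteria=[1.5, 0.5, -0.5])
straight = threshold_roc(p_detect=0.4, guess_rates=[0.1, 0.3, 0.5])
```

Plotting hit rate against false-alarm rate for `curved` gives a bowed curve, whereas `straight` falls on a line with slope `1 - p_detect`.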

This article presents a theory of visual word recognition that assumes that, in the tasks of word identification, lexical decision, and semantic categorization, human readers behave as optimal Bayesian decision makers. This leads to the development of a computational model of word recognition, the Bayesian reader. The Bayesian reader successfully simulates some of the most significant data on human reading. The model accounts for the nature of the function relating word frequency to reaction time and identification threshold, the effects of neighborhood density and its interaction with frequency, and the variation in the pattern of neighborhood density effects seen in different experimental tasks. Both the general behavior of the model and the way the model predicts different patterns of results in different tasks follow entirely from the assumption that human readers approximate optimal Bayesian decision makers.

A theoretical account of the mirror effect for word frequency and of dissociations in the pattern of responding Remember vs. Know (R vs. K) for low- and high-frequency words was tested both empirically and computationally by comparing predicted with observed data in 3 experiments. The SAC (Source of Activation Confusion) theory of memory makes the novel prediction of more K responses for high- than for low-frequency words, for both old and new items. Two experiments used a continuous presentation and judgment paradigm that presented words up to 10 times. The computer simulation closely modeled the pattern of results, fitting new Know and Remember patterns of responding at each level of experimental presentation and for both levels of word frequency for each participant. Experiment 3 required list discrimination after each R response (Group 1) or after an R or K response (Group 2). List accuracy was better following R responses. All experiments were modeled using the same parameter values.

A dynamic oscillator-based model of the sequencing of phonemes in speech production (OSCAR) is described. An analysis of phoneme movement errors (anticipations, perseverations, and exchanges) from a large naturalistic speech error corpus provides a new set of data suitable for quantitative modeling and is used to derive a set of constraints that any speech-production model must address. The new computational model is shown to account for error type proportions, movement error distance gradients, the syllable-position effect, and phonological similarity effects. The model provides an alternative to frame-based accounts, serial buffer accounts, and associative chaining theories of serial order processing in speech.

Endress AD  Bonatti LL 《Cognition》2007,105(2):247-299
To learn a language, speakers must learn its words and rules from fluent speech; in particular, they must learn dependencies among linguistic classes. We show that when familiarized with a short artificial, subliminally bracketed stream, participants can learn relations about the structure of its words, which specify the classes of syllables occurring in first and last word positions. By studying the effect of familiarization length, we compared the general predictions of associative theories of learning and those of models postulating separate mechanisms for quickly extracting the word structure and for tracking the syllable distribution in the stream. As predicted by the dual-mechanism model, the preference for structurally correct items was negatively correlated with the familiarization length. This result is difficult to explain by purely associative schemes; an extensive set of neural network simulations confirmed this difficulty. Still, we show that powerful statistical computations operating on the stream are available to our participants, as they are sensitive to co-occurrence statistics among non-adjacent syllables. We suggest that different learning mechanisms analyze speech on-line: A rapid mechanism extracting structural information about the stream, and a slower mechanism detecting statistical regularities among the items occurring in it.

Batchelder EO 《Cognition》2002,83(2):167-206
Prelinguistic infants must find a way to isolate meaningful chunks from the continuous streams of speech that they hear. BootLex, a new model which uses distributional cues to build a lexicon, demonstrates how much can be accomplished using this single source of information. This conceptually simple probabilistic algorithm achieves significant segmentation results on various kinds of language corpora - English, Japanese, and Spanish; child- and adult-directed speech, and written texts; and several variations in coding structure - and reveals which statistical characteristics of the input have an influence on segmentation performance. BootLex is then compared, quantitatively and qualitatively, with three other groups of computational models of the same infant segmentation process, paying particular attention to functional characteristics of the models and their similarity to human cognition. Commonalities and contrasts among the models are discussed, as well as their implications both for theories of the cognitive problem of segmentation itself, and for the general enterprise of computational cognitive modeling.

Norris D  McQueen JM  Cutler A 《The Behavioral and brain sciences》2000,23(3):299-325; discussion 325-70
Top-down feedback does not benefit speech recognition; on the contrary, it can hinder it. No experimental data imply that feedback loops are required for speech recognition. Feedback is accordingly unnecessary and spoken word recognition is modular. To defend this thesis, we analyse lexical involvement in phonemic decision making. TRACE (McClelland & Elman 1986), a model with feedback from the lexicon to prelexical processes, is unable to account for all the available data on phonemic decision making. The modular Race model (Cutler & Norris 1979) is likewise challenged by some recent results, however. We therefore present a new modular model of phonemic decision making, the Merge model. In Merge, information flows from prelexical processes to the lexicon without feedback. Because phonemic decisions are based on the merging of prelexical and lexical information, Merge correctly predicts lexical involvement in phonemic decisions in both words and nonwords. Computer simulations show how Merge is able to account for the data through a process of competition between lexical hypotheses. We discuss the issue of feedback in other areas of language processing and conclude that modular models are particularly well suited to the problems and constraints of speech recognition.

Previous work (Tuller, Case, Ding, & Kelso, 1994) has revealed signature properties of nonlinear dynamical systems in how people categorize speech sounds. The data were modeled by using a two-well potential function that deformed with stimulus properties and was sensitive to context. Here we evaluate one prediction of the model, namely that the rate of change of the potential’s slope should increase when the category is repeatedly perceived. Judged goodness of category membership was used as an index of the slope of the potential. Stimuli from a “say”-“stay” continuum were presented with gap duration changing sequentially throughout the range from 0 to 76 to 0 msec, or from 76 to 0 to 76 msec. Subjects identified each token as either “say” or “stay” and rated how good an exemplar it was of the identified category. As predicted, the same physical stimulus presented at the end of a sequence was judged a better exemplar of the category than was the identical stimulus presented at the beginning of the sequence. In contrast, stimuli presented twice near the middle of a sequence with few (or no) stimuli between them, as well as stimuli presented with an intervening random set, showed no such differences. These results confirm the hypothesis of a context-sensitive dynamical representation underlying speech.

Missing data are very common in behavioural and psychological research. In this paper, we develop a Bayesian approach in the context of a general nonlinear structural equation model with missing continuous and ordinal categorical data. In the development, the missing data are treated as latent quantities, and provision for the incompleteness of the data is made by a hybrid algorithm that combines the Gibbs sampler and the Metropolis‐Hastings algorithm. We show by means of a simulation study that the Bayesian estimates are accurate. A Bayesian model comparison procedure based on the Bayes factor and path sampling is proposed. The required observations from the posterior distribution for computing the Bayes factor are simulated by the hybrid algorithm in Bayesian estimation. Our simulation results indicate that the correct model is selected more frequently when the incomplete records are used in the analysis than when they are ignored. The methodology is further illustrated with a real data set from a study concerned with an AIDS preventative intervention for Filipina sex workers.

Räsänen O 《Cognition》2011,(2):149-176
Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this work, a computational model for word segmentation and learning of primitive lexical items from continuous speech is presented. The model does not utilize any a priori linguistic or phonemic knowledge such as phones, phonemes or articulatory gestures, but computes transitional probabilities between atomic acoustic events in order to detect recurring patterns in speech. Experiments with the model show that word segmentation is possible without any knowledge of linguistically relevant structures, and that the learned ungrounded word models show a relatively high selectivity towards specific words or frequently co-occurring combinations of short words.
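
The idea of using transitional probabilities to find recurring patterns can be sketched on a toy sequence of discrete symbols. This is a deliberately simplified illustration, not the model described in the abstract: the symbols stand in for acoustic events, and the threshold rule, function names, and example stream are assumptions.

```python
from collections import Counter


def transitional_probabilities(sequence):
    """Estimate P(next = b | current = a) from adjacent pairs in the sequence."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(sequence, threshold):
    """Place a boundary wherever the transitional probability between two
    adjacent symbols falls below the threshold (a hypothetical decision rule)."""
    if not sequence:
        return []
    tp = transitional_probabilities(sequence)
    chunks, current = [], [sequence[0]]
    for a, b in zip(sequence, sequence[1:]):
        if tp[(a, b)] < threshold:
            chunks.append(current)
            current = [b]
        else:
            current.append(b)
    chunks.append(current)
    return chunks


# Toy stream built from two recurring "words", ab and cd, in varying order.
stream = list("abcdababcdcdab")
chunks = ["".join(c) for c in segment(stream, threshold=0.9)]
```

Within-word transitions (a to b, c to d) are perfectly predictable in this stream, while transitions across word boundaries are not, so the low-probability transitions mark the boundaries.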

Psychological experiments often collect choice responses using buttonpresses. However, spoken responses are useful in many cases, for example when working with special clinical populations, or when a paradigm demands vocalization, or when accurate response time measurements are desired. In these cases, spoken responses are typically collected using a voice key, which usually involves manual coding by experimenters in a tedious and error-prone manner. We describe ChoiceKey, an open-source speech recognition package for MATLAB. It can be optimized by training for small response sets and different speakers. We show ChoiceKey to be reliable with minimal training for most participants in experiments with two different responses. Problems presented by individual differences, and occasional atypical responses, are examined, and extensions to larger response sets are explored. The ChoiceKey source files and instructions may be downloaded as supplemental materials for this article from brm.psychonomic-journals.org/content/supplemental.

To successfully infer a speaker's emotional state, diverse sources of emotional information need to be decoded. The present study explored to what extent emotional speech recognition of 'basic' emotions (anger, disgust, fear, happiness, pleasant surprise, sadness) differs between different sex (male/female) and age (young/middle-aged) groups in a behavioural experiment. Participants were asked to identify the emotional prosody of a sentence as accurately as possible. As a secondary goal, the perceptual findings were examined in relation to acoustic properties of the sentences presented. Findings indicate that emotion recognition rates differ between the different categories tested and that these patterns varied significantly as a function of age, but not of sex.

In 4 chronometric experiments, influences of spoken word planning on speech recognition were examined. Participants were shown pictures while hearing a tone or a spoken word presented shortly after picture onset. When a spoken word was presented, participants indicated whether it contained a prespecified phoneme. When the tone was presented, they indicated whether the picture name contained the phoneme (Experiment 1) or they named the picture (Experiment 2). Phoneme monitoring latencies for the spoken words were shorter when the picture name contained the prespecified phoneme compared with when it did not. Priming of phoneme monitoring was also obtained when the phoneme was part of spoken nonwords (Experiment 3). However, no priming of phoneme monitoring was obtained when the pictures required no response in the experiment, regardless of monitoring latency (Experiment 4). These results provide evidence that an internal phonological pathway runs from spoken word planning to speech recognition and that active phonological encoding is a precondition for engaging the pathway.

Sato H  Takeuchi T  Sakai KL 《Cognition》1999,73(3):B55-B66
Cortical activity during speech recognition was examined using optical topography (OT), a recently developed non-invasive technique. To assess relative changes in hemoglobin oxygenation, local changes in near-infrared light absorption were measured simultaneously from 44 points in both hemispheres. A dichotic listening paradigm was used in this experiment, in which target stimuli and non-target stimuli were presented to different ears. Subjects were asked to track targets and to press a button when targets shifted from one ear to the other. We compared three tasks: (i) a control task, in which a tone was used as the target; (ii) a repeat task, in which the target was one repeated sentence; (iii) a story task, in which the targets were continuous sentences of a story. The activity for the story task, compared with the repeat task, was localized in the left superior temporal cortex. Relative to the control task, we observed in this region a larger increase in oxyhemoglobin concentration and a decrease in deoxyhemoglobin concentration in the story task than those in the repeat task. These results suggest that the activity in the left temporal association area reflects the load of auditory, memory, and language information processing.

At the behavioral level one of the primary disturbances involved in congenital dyslexia concerns phonological processing. At the neuroarchitectural level autopsies have revealed ectopies, e.g., a reduced number of neurons in the upper layers of the cortex and an increased number in the lower ones. In dynamic models of interacting neuronal populations the behavioral level can be related to the neurophysiological level. In this study an attempt is made to do so at the cortical level. The first focus of this model study is the results of a Finnish experiment assessing geminate stop perception in quasi speech stimuli by 6 month old infants using a head turning paradigm and evoked potentials. The second focus of this study is the results of a Dutch experiment assessing discrimination of transients in speech stimuli, by adult dyslexics and controls and 2 month old infants. There appears to be a difference in the phonemic perceptual boundaries of children at genetic risk for dyslexia and control children as revealed in the Finnish study. Assuming a lowered neuronal density in the 'dyslexic' model, reflecting ectopies, it may be postulated that less neuronal surface is available for synaptic connections resulting in a lowered synaptic density and thus a lowered amount of available neurotransmitter. A lowered synaptic density also implies a reduced amount of membrane surface available for neurotransmitter metabolism. By assuming both a reduced upper bound of neurotransmitter and a reduced metabolic transmitter rate in the dynamic model, the Finnish experimental results can be approximated closely. This applies both to data from behavioral head turning and that of the evoked potential study. In the Dutch study adult dyslexics show poor performance in discriminating transients in the speech signal compared to the controls. The same stimuli were used in a study comparing infants from dyslexic families and controls. Using the same transmitter parameters as in modeling the results of the Finnish study, also in this case the experimental results for adults and infants can be approximated closely. Simulations of behavioral and pharmaceutical interventions with the model provide predictions which can be put to the test in experiments.

In a recent paper, four-look recognition performance was predicted from one-look (1L) data by Bayes’s theorem, with the entire pattern of two Ss’ four-look data being predicted reasonably well. In the present study, three Ss were run, with the addition that feedback was given and confidence judgments were required. Their task was to identify tachistoscopically presented graphemes A, T, or U. Predictions of four-look performance were made using three orders of 1L data matrices, differing in the breakdown of confidence categories. The three matrices led to reasonably accurate predictions. Predictions varied somewhat in accuracy, depending on the order of the 1L matrix. The possibility that the variation in predictive accuracy reflected the capacity of an S to combine information received from each observation was discussed. The capacity question is presently under investigation by the authors.
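
One way to picture a Bayes's theorem prediction of multi-look performance from one-look data is sketched below. It is a guess at the general logic, not the authors' procedure: the confusion matrix values are invented, the looks are assumed to be conditionally independent given the stimulus, and the prior over the three graphemes is assumed uniform.

```python
def posterior_over_stimuli(confusion, responses, prior=None):
    """Posterior P(stimulus | responses) by Bayes's theorem.

    confusion[s][r] is a one-look estimate of P(response r | stimulus s).
    Looks are treated as conditionally independent given the stimulus
    (an assumption of this sketch).
    """
    stimuli = list(confusion)
    if prior is None:
        prior = {s: 1 / len(stimuli) for s in stimuli}
    unnormalized = {}
    for s in stimuli:
        weight = prior[s]
        for r in responses:
            weight *= confusion[s][r]
        unnormalized[s] = weight
    total = sum(unnormalized.values())
    return {s: w / total for s, w in unnormalized.items()}


# Hypothetical one-look confusion matrix for the graphemes A, T, and U.
one_look = {
    "A": {"A": 0.6, "T": 0.2, "U": 0.2},
    "T": {"A": 0.2, "T": 0.6, "U": 0.2},
    "U": {"A": 0.2, "T": 0.2, "U": 0.6},
}

# Four looks at the same stimulus, yielding one-look responses A, A, T, A.
post = posterior_over_stimuli(one_look, ["A", "A", "T", "A"])
best = max(post, key=post.get)
```

A finer breakdown of the responses by confidence category would simply add more response columns to the matrix; the combination rule stays the same.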
