Similar documents
20 similar documents found.
1.
Twelve male listeners categorized 54 synthetic vowel stimuli that varied in second and third formant frequency on a Bark scale into the American English vowel categories [see text]. A neuropsychologically plausible model of categorization in the visual domain, the Striatal Pattern Classifier (SPC; Ashby & Waldron, 1999), is generalized to the auditory domain and applied separately to the data from each observer. Performance of the SPC is compared with that of the successful Normal A Posteriori Probability model (NAPP; Nearey, 1990; Nearey & Hogan, 1986) of auditory categorization. A version of the SPC that assumed piecewise-linear response region partitions provided a better account of the data than the SPC that assumed linear partitions, and was indistinguishable from a version that assumed quadratic response region partitions. A version of the NAPP model that assumed nonlinear response regions was superior to the NAPP model with linear partitions. The best fitting SPC provided a good account of each observer's data but was outperformed by the best fitting NAPP model. Implications for bridging the gap between the domains of visual and auditory categorization are discussed.
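The NAPP model's core decision rule, a normalized a posteriori probability over Gaussian category distributions, can be sketched as follows. This is a minimal illustration, not the fitted models from the study; the formant values and the diagonal-covariance assumption are invented for the example.

```python
import math

# Hypothetical F2/F3 (Bark) training tokens for two vowel categories.
TOKENS = {
    "i": [(13.0, 15.0), (13.4, 15.2), (12.8, 14.9), (13.2, 15.1)],
    "u": [(7.0, 14.0), (7.4, 14.2), (6.8, 13.9), (7.2, 14.1)],
}

def fit_gaussian(points):
    """Mean and (diagonal) variance of a set of 2-D points."""
    n = len(points)
    mean = tuple(sum(p[d] for p in points) / n for d in (0, 1))
    var = tuple(sum((p[d] - mean[d]) ** 2 for p in points) / n + 1e-6
                for d in (0, 1))
    return mean, var

def log_density(x, mean, var):
    """Log density of a diagonal-covariance 2-D Gaussian at point x."""
    return sum(-0.5 * math.log(2 * math.pi * var[d])
               - (x[d] - mean[d]) ** 2 / (2 * var[d]) for d in (0, 1))

def posterior(x, models):
    """Normalized a posteriori probability for each category (equal priors)."""
    logs = {c: log_density(x, m, v) for c, (m, v) in models.items()}
    mx = max(logs.values())
    unnorm = {c: math.exp(l - mx) for c, l in logs.items()}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

models = {c: fit_gaussian(pts) for c, pts in TOKENS.items()}
print(posterior((13.1, 15.0), models))  # dominated by "i"
```

A stimulus near the "i" tokens gets nearly all the posterior mass; the SPC, by contrast, would carve the same space with explicit (piecewise-linear or quadratic) response-region boundaries.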

2.
Sussman HM  Fruchter D  Hilbert J  Sirosh J 《The Behavioral and brain sciences》1998,21(2):241-59; discussion 260-99
Neuroethological investigations of mammalian and avian auditory systems have documented species-specific specializations for processing complex acoustic signals that could, if viewed in abstract terms, have an intriguing and striking relevance for human speech sound categorization and representation. Each species forms biologically relevant categories based on combinatorial analysis of information-bearing parameters within the complex input signal. This target article uses known neural models from the mustached bat and barn owl to develop, by analogy, a conceptualization of human processing of consonant plus vowel sequences that offers a partial solution to the noninvariance dilemma--the nontransparent relationship between the acoustic waveform and the phonetic segment. Critical input sound parameters used to establish species-specific categories in the mustached bat and barn owl exhibit high correlation and linearity due to physical laws. A cue long known to be relevant to the perception of stop place of articulation is the second formant (F2) transition. This article describes an empirical phenomenon--the locus equations--that describes the relationship between the F2 of a vowel and the F2 measured at the onset of a consonant-vowel (CV) transition. These variables, F2 onset and F2 vowel within a given place category, are consistently and robustly linearly correlated across diverse speakers and languages, and even under perturbation conditions as imposed by bite blocks. A functional role for this category-level extreme correlation and linearity (the "orderly output constraint") is hypothesized based on the notion of an evolutionarily conserved auditory-processing strategy. High correlation and linearity between critical parameters in the speech signal that help to cue place of articulation categories might have evolved to satisfy a preadaptation by mammalian auditory systems for representing tightly correlated, linearly related components of acoustic signals.  
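A locus equation is simply an ordinary least-squares line relating F2 at the vowel midpoint to F2 at the onset of the CV transition, fitted within a place-of-articulation category. A minimal sketch, with invented Hz values standing in for real /d/+vowel measurements:

```python
# Hypothetical F2 measurements (Hz) for /d/ + vowel tokens:
# (F2 at vowel midpoint, F2 at CV-transition onset).
DATA = [(2300, 1900), (2000, 1750), (1700, 1600), (1400, 1450), (1100, 1300)]

def locus_equation(pairs):
    """Ordinary least-squares fit: F2_onset = slope * F2_vowel + intercept."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = locus_equation(DATA)
```

The "orderly output constraint" is the empirical finding that, per place category, real (F2 vowel, F2 onset) pairs hug such a line tightly across speakers and languages; the slope and intercept differ by place of articulation.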

3.
Three experiments investigated whether extrinsic vowel normalization takes place largely at a categorical or a precategorical level of processing. Traditional vowel normalization effects in categorization were replicated in Experiment 1: Vowels taken from an [ɪ]–[ε] continuum were more often interpreted as /ɪ/ (which has a low first formant, F1) when the vowels were heard in contexts that had a raised F1 than when the contexts had a lowered F1. This was established with contexts that consisted of only two syllables. These short contexts were necessary for Experiment 2, a discrimination task that encouraged listeners to focus on the perceptual properties of vowels at a precategorical level. Vowel normalization was again found: Ambiguous vowels were more easily discriminated from an endpoint [ε] than from an endpoint [ɪ] in a high-F1 context, whereas the opposite was true in a low-F1 context. Experiment 3 measured discriminability between pairs of steps along the [ɪ]–[ε] continuum. Contextual influences were again found, but without discrimination peaks, contrary to what was predicted from the same participants’ categorization behavior. Extrinsic vowel normalization therefore appears to be a process that takes place at least in part at a precategorical processing level.

4.
A two-alternative, forced-choice procedure was used in two experiments to test 3-year-old children’s categorization of natural vowel tokens produced by several talkers. An appropriate pointing response (right or left) was visually reinforced with one of two television displays. In Experiment 1, the stimuli were isolated tokens of /a/ and /i/ produced by a male adult, a female adult, a male child, and a female child. In Experiment 2, the stimuli were isolated tokens of /æ/ and /ʌ/ produced by the same talkers. In both experiments, 3-year-olds spontaneously generalized their pointing responses from the male adult vowel tokens to the corresponding vowels produced by the other talkers. Children reinforced for an arbitrary grouping of the two vowel categories persisted in categorizing on the basis of vowel quality. Results from both experiments demonstrate the presence of perceptual constancy for vowel tokens across talkers. In particular, the results from Experiment 2 provide evidence for normalization of isolated, quasi-steady-state vowel tokens because the formant values for tokens of /ʌ/ produced by the woman and the two children were closer to the formant frequencies of the male adult’s /æ/ than to the male adult’s /ʌ/.

5.
Six-year-old children's ability to categorize words on the basis of vowel categories was examined at the beginning of first grade and again after 6 months of formal schooling. The potential effects of relative proximity of vowels in the vowel space, of syllable structure, and of input phonology were assessed. Also, the effect of literacy instruction on vowel categorization and the relationship of vowel categorization with vowel spelling and reading skill were investigated. Results indicate that the ability to categorize vowels does not develop uniformly but is affected by the degree of spectral/articulatory proximity between vowels, by syllable structure, and potentially by characteristics of the input phonology. Error analyses further indicate that children have fuzzy category boundaries between vowels adjacent on the height continuum. The pattern of results on oral categorization and written tasks suggests a reciprocal relationship. Categorization ability improved after 6 months of schooling. However, vowels that children found more difficult to categorize were also more difficult to read and spell.

6.
A speculative neuronal template, equivalent to canonical syllable forms and independent of segmental representations, is offered to help account for (1) the inviolate nature of phonotactic constraints in aphasic speech output, and (2) left hemisphere specialization for speech sound access and output. The model, which attempts to relate plausible neuronal systems to linguistic function, is based on cell assemblies that are thought to develop by way of genetic predisposition and ontogenetic language experience, into configurations that can represent canonical slot positions for the consonants and vowel comprising a syllable. The syllable is assumed to be the basic organizational rhythmic unit for serial concatenation of sublexical segments. A scheme for neurological differentiation of vowels and consonants is offered. Phonotactic constraints can become "hard-wired" to help create the automaticity underlying phonological sound organization. Testable predictions are offered to substantiate the claims of the model.

7.
Ten nonaphasic left cerebrovascular accident (CVA) patients, 12 right CVA patients, and 16 normals were matched for age, education, lesion sizes, and postonset intervals; all were right handed. One task consisted of 36 sentences connoting one of six primary emotions (joy, sadness, fear, surprise, disgust, anger) presented binaurally with a neutral emotional tone. Subjects were required to point to the appropriate emotion name on a vertically arranged list. A second task consisted of the same 36 sentences voiced emotionally by humming with a closed mouth, presented binaurally, and requiring the same response as for the preceding task. A third task consisted of 18 of the sentences spoken with concordant emotional tone and the remaining 18 sentences spoken with discordant emotional tone, presented binaurally and requiring pointing to the word "SAME" or "DIFFERENT" arrayed vertically. The right hemisphere (RH) patients were significantly impaired, relative to the left hemisphere (LH) patients and normals, on the pure prosody task (2) and on the emotional concordance task (3), the latter effect being significant only for mismatch categorization. The LH patients performed (nonsignificantly) less well than the RH patients and normals on the verbal contextual task (1). Performances on the three tasks were not significantly correlated in the patient groups. It was concluded that the RH probably dominates for phonetic discrimination of vowel trains (fundamental frequency and/or single vowel or multivowel contour) and that the RH probably dominates for certain forms of selective attention in the verbal domain perhaps involving simultaneous mismatch treatment of ongoing sentence-level, distracting, complementary, verbal processes. 
Comparison of similar right and left, cortical (frontoparietal), and subcortical (capsule and basal ganglia) lesions suggested, but did not prove, that the RH pure prosody impairment is cortical whereas the RH tonal-semantic mismatch categorization impairment involves subcortical as well as cortical contributions.

8.
Working memory uses central sound representations as an informational basis. The central sound representation is the temporally and feature-integrated mental representation that corresponds to phenomenal perception. It is used in (higher-order) mental operations and stored in long-term memory. In the bottom-up processing path, the central sound representation can be probed at the level of auditory sensory memory with the mismatch negativity (MMN) of the event-related potential. The present paper reviews a newly developed MMN paradigm to tap into the processing of speech sound representations. Preattentive vowel categorization based on F1-F2 formant information occurs in speech sounds and complex tones even under conditions of high variability of the auditory input. However, an additional experiment demonstrated the limits of the preattentive categorization of language-relevant information. It tested whether the system categorizes complex tones containing the F1 and F2 formant components of the vowel /a/ differently from six sounds with nonlanguage-like F1-F2 combinations. From the absence of an MMN in this experiment, it is concluded that no adequate vowel representation was constructed. This shows the limits of preattentive vowel categorization.

9.
Two experiments were conducted to test the hypothesis that higher formant normalization results from the auditory integration of F2 and F3 when they are within 3 Bark of each other. In the first experiment, Formants 3-5 were manipulated in both a "hid"-"head" continuum (in which F2 and F3 are within 3 Bark of each other) and a "hood"-"HUD" continuum (in which F2 and F3 are not within 3 Bark of each other). It was found that there was a shift in identification consistent with the higher formant normalization effect only in the "hid"-"head" continuum. In the second experiment, F3 alone was manipulated in a "hood"-"HUD" continuum. The amplitude of F3 in this continuum was increased (as compared with the F3 in the "hood"-"HUD" continuum used in Experiment 1) and a pretest indicated that the shift in F3 could be detected. As in the first experiment, there was no shift in identification associated with shifting F3 frequency in a back-vowel continuum. The results of these experiments are not consistent with an explanation of higher formant normalization in which hearers adjust an internal vowel space in response to higher formant information; rather, the present findings indicate that higher formant normalization results from auditory integration of F2 and F3.
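The 3-Bark criterion can be made concrete with a standard Hz-to-Bark conversion. The sketch below uses Traunmüller's approximation of the Bark scale; the formant values are illustrative, not the study's stimuli.

```python
def hz_to_bark(f):
    """Traunmüller's approximation of the Bark (critical-band) scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def within_3_bark(f2_hz, f3_hz):
    """True if F2 and F3 fall within 3 Bark of each other."""
    return abs(hz_to_bark(f3_hz) - hz_to_bark(f2_hz)) < 3.0

# Front vowel (e.g., the "hid"-"head" range): F2 and F3 are close in Bark.
print(within_3_bark(2000, 2600))   # True
# Back vowel (e.g., the "hood"-"HUD" range): F2 and F3 are well separated.
print(within_3_bark(1000, 2400))   # False
```

This is why the manipulation of the higher formants shifted identification only in the front-vowel continuum: auditory integration of F2 and F3 is predicted only where the two formants fall inside the 3-Bark window.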

10.
The ability to form perceptual equivalence classes from variable input stimuli is common in both animals and humans. Neural circuitry that can disambiguate ambiguous stimuli to arrive at perceptual constancy has been documented in the barn owl's inferior colliculus where sound-source azimuth is signaled by interaural phase differences spanning the frequency spectrum of the sound wave. Extrapolating from the sound-localization system of the barn owl to human speech, 2 hypothetical models are offered to conceptualize the neural realization of relative invariance in (a) categorization of stop consonants /b, d, g/ across varying vowel contexts and (b) vowel identity across speakers. 2 computational algorithms employing real speech data were used to establish acoustic commonalities to form neural mappings representing phonemic equivalence classes in the form of functional arrays similar to those seen in the barn owl.

11.
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the informational assumptions of several models of speech categorization, in particular, the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2,880 fricative productions (Jongman, Wayland, & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values and manipulated the information in the training set to contrast (a) models based on a small number of invariant cues, (b) models using all cues without compensation, and (c) models in which cues underwent compensation for contextual factors. Compensation was modeled by computing cues relative to expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved a similar accuracy to listeners and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed.
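The C-CuRE idea of computing cues relative to expectations can be illustrated by residualizing a single cue against talker means. A toy sketch with invented fricative cue values (the study's actual model used 24 cues and regression-based expectations, not simple talker means):

```python
from collections import defaultdict

# Hypothetical fricative spectral-peak cue (Hz): (talker, category, cue value).
# Note talker A's "sh" (4200) and talker B's "s" (4300) overlap in raw Hz.
TOKENS = [
    ("talkerA", "s", 6500), ("talkerA", "sh", 4200),
    ("talkerB", "s", 4300), ("talkerB", "sh", 2900),
]

def c_cure(tokens):
    """Re-express each cue relative to the talker's expected (mean) value."""
    by_talker = defaultdict(list)
    for talker, _, cue in tokens:
        by_talker[talker].append(cue)
    expect = {t: sum(v) / len(v) for t, v in by_talker.items()}
    return [(lab, cue - expect[t]) for t, lab, cue in tokens]

residuals = c_cure(TOKENS)
```

In the raw values the two categories overlap across talkers, but after subtracting each talker's expectation every "s" residual is positive and every "sh" residual is negative: the fine-grained detail survives, yet the categories become linearly separable.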

12.
Information-accumulation theory of speeded categorization
A process model of perceptual categorization is presented, in which it is assumed that the earliest stages of categorization involve gradual accumulation of information about object features. The model provides a joint account of categorization choice proportions and response times by assuming that the probability that the information-accumulation process stops at a given time after stimulus presentation is a function of the stimulus information that has been acquired. The model provides an accurate account of categorization response times for integral-dimension stimuli and for separable-dimension stimuli, and it also explains effects of response deadlines and exemplar frequency.
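In the theory itself the stopping probability depends on the information acquired so far; the sketch below simplifies this to a constant per-step stopping hazard, just to show how a single accumulation process can jointly produce choice proportions and response times. All parameter values are invented.

```python
import random

def accumulate(drift, stop_rate=0.05, noise=1.0, rng=None):
    """Accumulate noisy evidence until a per-step stopping event occurs.

    Returns (choice, response_time); positive evidence maps to category "A".
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0
    while True:
        t += 1
        evidence += drift + rng.gauss(0.0, noise)
        if rng.random() < stop_rate:  # simplified constant stopping hazard
            return ("A" if evidence >= 0 else "B"), t

rng = random.Random(1)
trials = [accumulate(0.5, rng=rng) for _ in range(2000)]
p_a = sum(c == "A" for c, _ in trials) / len(trials)
mean_rt = sum(t for _, t in trials) / len(trials)
```

With a positive drift toward "A", the simulation yields both a choice proportion favoring "A" and a full response-time distribution from the same mechanism, which is the model's central selling point.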

13.
About 700 misspelled words were collected from the responses produced on a fill-in-the-blank task by two groups of advanced learners of English, (1) English-speaking children (9–11 years) and (2) Spanish-speaking adults studying English. The spelling errors were coded using a detailed categorization system (whose use and rationale are described), and the resulting tabulations were analyzed for differences between the two subject groups or differences across error types. The two groups were similar in that they both made proportionally more vowel than consonant errors. On the other hand, significant differences between the subject groups were found in three of the major categories: The Spanish speakers made more errors involving consonant doubling, while the native English speakers made more involving the unstressed vowel schwa (ə) and the grapheme silent |e|. It is argued that these differences stem from the language backgrounds and resulting spelling strategies of the two groups, and the paper concludes with a discussion of the need for other studies comparing the spelling errors of first- and second-language learners.

14.
The distribution of striate cortex cells exhibits a maximum number of cells tuned to vertical and horizontal orientations (Mansfield, 1974). This was interpreted as an adaptation of the visual system to the presence in the visual environment of greater amounts of vertical and horizontal information compared to information from other orientations (Keil & Cristobal, 2000). The present research confirms that vertical and horizontal orientations are, indeed, present in greater number in natural scenes. After normalization of the amount of information across all orientations, vertical information appeared to be better for bottom-up categorization. We demonstrate this using a connectionist autoassociator model of categorization used elsewhere in simulations of early infant categorization.

15.
Most models of word recognition concerned with prosody are based on a distinction between strong syllables (containing a full vowel) and weak syllables (containing a schwa). In these models, the possibility that listeners take advantage of finer grained prosodic distinctions, such as primary versus secondary stress, is usually rejected on the grounds that these two categories are not discriminable from each other without lexical information or normalization of the speaker's voice. In the present experiment, subjects were presented with word fragments that differed only by their degree of stress--namely, primary or secondary stress (e.g., /'prasI/ vs. /"prasI/). The task was to guess the origin of the fragment (e.g., "prosecutor" vs. "prosecution"). The results showed that guessing performance significantly exceeds the chance level, which indicates that making fine stress distinctions is possible without lexical information and with minimal speech normalization. This finding is discussed in the framework of prosody-based word recognition theories.

16.
A model was quantified to describe the integration of vowel duration, fricative duration, and fundamental frequency (F0) contour as cues to final position fricatives differing in voicing. The basic assumptions are that perceived vowel duration and perceived frication duration are cues to the identity of final position fricatives and that both F0 contour and vowel duration influence perceived vowel duration. Binary choice and rating responses to synthetic stimuli varying independently along the three dimensions were collected. The results were consistent with the assumption that F0 contour operates by modifying perceived vowel duration, which is a direct cue. Unfortunately, the nature of the modification appears to be very similar in form to that which results from the integration of two independent cues in syllable identification. Therefore, the results do not allow a rejection of the idea that the perception of F0 contour may directly cue the identity of final position fricatives.
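The proposed integration, in which the F0 contour acts by modifying perceived vowel duration before a single decision stage, can be sketched with a logistic choice rule. The weights and the size of the F0 adjustment are invented; this illustrates the model's structure, not its fitted form.

```python
import math

def p_voiced(vowel_ms, fric_ms, f0_falling, f0_gain=20.0):
    """Sigmoid integration of duration cues to final-fricative voicing.

    Longer perceived vowels and shorter frications favor "voiced"; the F0
    contour is assumed to act by lengthening perceived vowel duration.
    """
    perceived_vowel = vowel_ms + (f0_gain if f0_falling else 0.0)
    z = 0.05 * (perceived_vowel - fric_ms) - 2.0  # invented decision weights
    return 1.0 / (1.0 + math.exp(-z))
```

The abstract's caveat is visible here: adding `f0_gain` to perceived duration before the sigmoid is hard to distinguish empirically from letting F0 contribute its own independent term inside `z`.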

17.
Synthetic continua of two minimal pairs, BAIT-DATE and DATE-GATE, closely modeled on natural utterances by a female speaker, were presented to a group of 16 listeners for identification in full-cue and reduced-cue conditions. Grouped results showed that categorization curves for full- and reduced-cue conditions differed significantly in both contrasts. However, an averaging of results obscures marked variability in labeling behavior. Some listeners showed large changes in categorization between the full- and reduced-cue conditions, whereas others showed relatively small or no changes. In a follow-up study, perception of the BAIT-DATE contrast was compared with the perception of a highly stylized BA-DA continuum. A smaller degree of intersubject and between-condition variability was found for these less complex synthetic stimuli. The amount of variability found in the labeling of speech contrasts may be dependent on cue salience, which will be determined by the speech pattern complexity of the stimuli and by the vowel environment.

18.
19.
Wilson C 《Cognitive Science》2006,30(5):945-982
There is an active debate within the field of phonology concerning the cognitive status of substantive phonetic factors such as ease of articulation and perceptual distinctiveness. A new framework is proposed in which substance acts as a bias, or prior, on phonological learning. Two experiments tested this framework with a method in which participants are first provided highly impoverished evidence of a new phonological pattern, and then tested on how they extend this pattern to novel contexts and novel sounds. Participants were found to generalize velar palatalization (e.g., the change from [k] as in keep to [tʃ] as in cheap) in a way that accords with linguistic typology, and that is predicted by a cognitive bias in favor of changes that relate perceptually similar sounds. Velar palatalization was extended from the mid front vowel context (i.e., before [e] as in cape) to the high front vowel context (i.e., before [i] as in keep), but not vice versa. The key explanatory notion of perceptual similarity is quantified with a psychological model of categorization, and the substantively biased framework is formalized as a conditional random field. Implications of these results for the debate on substance, theories of phonological generalization, and the formalization of similarity are discussed.
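The similarity bias can be quantified with the exponential similarity function familiar from exemplar models of categorization: changes relating perceptually closer sounds receive a larger prior. The distances below are invented for illustration; in the study, similarity was derived from a fitted psychological model of categorization.

```python
import math

# Invented perceptual distances between [k] and the palatal affricate in two
# vowel contexts; smaller distance = more confusable pair.
DIST = {"before_i": 1.0, "before_e": 2.0}

def similarity(d, c=1.0):
    """Shepard-style exponential similarity, exp(-c * d)."""
    return math.exp(-c * d)

bias = {ctx: similarity(d) for ctx, d in DIST.items()}
# Palatalization is favored where [k] and the affricate are more similar:
# before [i], matching the asymmetric generalization listeners showed.
```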

20.
The reported research investigates how listeners recognize coarticulated phonemes. First, 2 data sets from experiments on the recognition of coarticulated phonemes published by D. H. Whalen (1989) are reanalyzed. The analyses indicate that listeners used categorization strategies involving a hierarchical dependency. Two new experiments are reported investigating the production and perception of fricative-vowel syllables. On the basis of measurements of acoustic cues on a large set of natural utterances, it was predicted that listeners would use categorization strategies involving a dependency of the fricative categorization on the perceived vowel. The predictions were tested in a perception experiment using a 2-dimensional synthetic fricative-vowel continuum. Model analyses of the results pooled across listeners confirmed the predictions. Individual analyses revealed some variability in the categorization dependencies used by different participants.
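A hierarchical categorization dependency of this kind can be written as marginalizing the fricative decision over the perceived vowel: P(f | x) = Σ_v P(v | x) P(f | v, x). A minimal sketch with invented probabilities for a single ambiguous stimulus:

```python
# Hypothetical categorization probabilities for one stimulus x:
# the fricative decision depends on which vowel the listener perceived.
P_VOWEL = {"a": 0.7, "u": 0.3}                    # P(vowel | x)
P_FRIC_GIVEN_VOWEL = {                            # P(fricative | vowel, x)
    "a": {"s": 0.8, "sh": 0.2},
    "u": {"s": 0.3, "sh": 0.7},
}

def fricative_posterior(p_vowel, p_fric):
    """Marginalize the vowel-dependent fricative decision over perceived vowels."""
    out = {}
    for v, pv in p_vowel.items():
        for f, pf in p_fric[v].items():
            out[f] = out.get(f, 0.0) + pv * pf
    return out

post = fricative_posterior(P_VOWEL, P_FRIC_GIVEN_VOWEL)
# post["s"] = 0.7 * 0.8 + 0.3 * 0.3 = 0.65
```

A non-hierarchical model would instead assign P(f | x) directly from the acoustic cues, with no dependence on the vowel percept.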

