Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
The distinction between auditory and phonetic processes in speech perception was used in the design and analysis of an experiment. Earlier studies had shown that dichotically presented stop consonants are more often identified correctly when they share place of production (e.g., /ba-pa/) or voicing (e.g., /ba-da/) than when neither feature is shared (e.g., /ba-ta/). The present experiment was intended to determine whether the effect has an auditory or a phonetic basis. Increments in performance due to feature-sharing were compared for synthetic stop-vowel syllables in which formant transitions were the sole cues to place of production under two experimental conditions: (1) when the vowel was the same for both syllables in a dichotic pair, as in our earlier studies, and (2) when the vowels differed. Since the increment in performance due to sharing place was not diminished when vowels differed (i.e., when formant transitions did not coincide), it was concluded that the effect has a phonetic rather than an auditory basis. Right ear advantages were also measured and were found to interact with both place of production and vowel conditions. Taken together, the two sets of results suggest that inhibition of the ipsilateral signal in the perception of dichotically presented speech occurs during phonetic analysis.
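To make the feature-sharing manipulation concrete, here is a minimal sketch (Python; illustrative only, not part of the study) that tabulates the standard place and voicing features of the six English stops and classifies a dichotic pair by the features its members share:

# Standard place/voicing features of the English stop consonants.
FEATURES = {
    "b": {"place": "labial",   "voicing": "voiced"},
    "p": {"place": "labial",   "voicing": "voiceless"},
    "d": {"place": "alveolar", "voicing": "voiced"},
    "t": {"place": "alveolar", "voicing": "voiceless"},
    "g": {"place": "velar",    "voicing": "voiced"},
    "k": {"place": "velar",    "voicing": "voiceless"},
}

def shared_features(c1: str, c2: str) -> set[str]:
    """Return the set of features two stop consonants share."""
    return {f for f in ("place", "voicing")
            if FEATURES[c1][f] == FEATURES[c2][f]}

# The pair types named in the abstract:
assert shared_features("b", "p") == {"place"}    # /ba-pa/ shares place
assert shared_features("b", "d") == {"voicing"}  # /ba-da/ shares voicing
assert shared_features("b", "t") == set()        # /ba-ta/ shares neither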

2.
A dichotic listening experiment was conducted to determine if vowel perception is based on phonetic feature extraction as is consonant perception. Twenty normal right-handed subjects were given dichotic CV syllables contrasting in final vowels. It was found that, unlike consonants, the perception of dichotic vowels was not significantly lateralized, that the dichotic perception of vowels was not significantly enhanced by the number of phonetic features shared, and that the occurrence of double-blend errors was not greater than chance. However, there was strong evidence for the use of phonetic features at the level of response organization. It is suggested that the differences between vowel and consonant perception reflect the differential availability of the underlying acoustic information from auditory store, rather than differences in processing mechanisms.

3.
The nature of acoustic memory and its relationship to the categorizing process in speech perception is investigated in three experiments on the serial recall of lists of syllables. The first study confirms previous reports that sequences comprising the syllables bah, dah, and gah show neither enhanced retention when presented auditorily rather than visually, nor a recency effect; both effects did occur with sequences in which the vowel sounds differed (bee, bih, boo). This was found not to be a simple vowel-consonant difference, since acoustic memory effects did occur with consonant sequences that were acoustically more discriminable (sha, ma, ga and ash, am, ag). Further experiments used the stimulus suffix effect to provide evidence of acoustic memory, and showed (1) that increasing the acoustic similarity of the set grossly impairs acoustic memory effects for vowels as well as consonants, and (2) that such memory effects are no greater for steady-state vowels than for continuously changing diphthongs. It is concluded that the usefulness of the information that can be retrieved from acoustic memory depends on the acoustic similarity of the items in the list rather than on their phonetic class or on whether or not they have “encoded” acoustic cues. These results question whether there is any psychological evidence for “encoded” speech sounds being categorized in ways different from other speech sounds.

4.
A complete understanding of visual phonetic perception (lipreading) requires linking perceptual effects to physical stimulus properties. However, the talking face is a highly complex stimulus, affording innumerable possible physical measurements. In the search for isomorphism between stimulus properties and phonetic effects, second-order isomorphism was examined between the perceptual similarities of video-recorded, perceptually identified speech syllables and the physical similarities among the stimuli. Four talkers produced the stimulus syllables, comprising 23 initial consonants followed by one of three vowels. Six normal-hearing participants identified the syllables in a visual-only condition. Perceptual stimulus dissimilarity was quantified using the Euclidean distances between stimuli in perceptual spaces obtained via multidimensional scaling. Physical stimulus dissimilarity was quantified using face points recorded in three dimensions by an optical motion capture system. The variance accounted for in the relationship between the perceptual and the physical dissimilarities was evaluated using both the raw dissimilarities and the weighted dissimilarities. With weighting and the full set of 3-D optical data, the variance accounted for ranged between 46% and 66% across talkers and between 49% and 64% across vowels. The robust second-order relationship between the sparse 3-D point representation of visible speech and the perceptual effects suggests that the 3-D point representation is a viable basis for controlled studies of first-order relationships between visual phonetic perception and physical stimulus attributes.
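The second-order analysis described above can be illustrated with a short sketch (Python with NumPy, SciPy, and scikit-learn). This is not the authors' code: the random matrices below merely stand in for measured perceptual and optical dissimilarities. The sketch recovers a perceptual space by metric multidimensional scaling and reports the variance that the physical distances account for in the perceptual distances:

import numpy as np
from scipy.spatial.distance import pdist
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n = 23  # e.g., one stimulus per initial consonant

# Stand-in for a measured perceptual dissimilarity matrix
# (symmetric, zero diagonal).
raw = rng.random((n, n))
perceptual = (raw + raw.T) / 2
np.fill_diagonal(perceptual, 0)

# Stand-in for physical measurements, e.g., 3-D face-point motion
# reduced to one feature vector per stimulus.
physical_features = rng.random((n, 5))

# Recover a perceptual space via metric MDS on the precomputed
# dissimilarities, then compare pairwise distances in that space
# with the pairwise physical distances.
space = MDS(n_components=2, dissimilarity="precomputed",
            random_state=0).fit_transform(perceptual)
perc_dist = pdist(space)              # condensed pairwise distances
phys_dist = pdist(physical_features)

r = np.corrcoef(perc_dist, phys_dist)[0, 1]
print(f"variance accounted for: {r ** 2:.1%}")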

5.
Discrimination of speech sounds from three computer-generated continua that ranged from voiced to voiceless syllables (/ba-pa/, /da-ta/, and /ga-ka/) was tested with three macaques. The stimuli on each continuum varied in voice-onset time (VOT). Pairs of stimuli that were equally different in VOT were chosen such that they were either within-category pairs (syllables given the same phonetic label by human listeners) or between-category pairs (syllables given different phonetic labels by human listeners). Results demonstrated that discrimination performance was always best for between-category pairs of stimuli, thus replicating the “phoneme boundary effect” seen in adult listeners and in human infants as young as 1 month of age. The findings are discussed in terms of their specific impact on accounts of voicing perception in human listeners and in terms of their impact on discussions of the evolution of language.
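The within- versus between-category pair construction can be sketched as follows; the 0-60 ms continuum, the 20-ms step, and the 25-ms boundary are illustrative values, not the study's parameters:

# Build VOT pairs with a fixed spacing and classify each pair as
# within- or between-category relative to a hypothetical boundary.
vots = list(range(0, 61, 10))   # an illustrative 0-60 ms VOT continuum
BOUNDARY = 25                   # hypothetical voiced/voiceless boundary (ms)
STEP = 20                       # fixed VOT difference for every pair (ms)

pairs = [(v, v + STEP) for v in vots if v + STEP <= 60]
for a, b in pairs:
    kind = "between" if (a < BOUNDARY) != (b < BOUNDARY) else "within"
    print(f"{a:2d} vs {b:2d} ms VOT -> {kind}-category pair")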

6.
Identification of CV syllables was studied in a backward masking paradigm in order to examine two types of interactions observed between dichotically presented speech sounds: the feature sharing effect and the lag effect. Pairs of syllables differed in the consonant, the vowel, and their relative times of onset. Interference between the two dichotic inputs was observed primarily for pairs that contrasted on voicing. Performance on pairs that shared voicing remained excellent under all three conditions. The results suggest that the interference underlying the lag effect and the feature sharing effect for voicing occurs before phonetic analysis, at a stage where both auditory inputs interact.

7.
The auditory neural representations of infants can easily be studied with electroencephalography using mismatch experimental designs. We recorded high-density event-related potentials while 3-month-old infants listened to trials consisting of CV syllables produced with different vowels (/bX/ or /gX/). The consonant remained the same for the first three syllables and was followed (or not) by a change in the fourth position. A consonant change evoked a significant difference around the second auditory peak (400–600 ms) relative to control trials. This mismatch response demonstrates that the infants robustly categorized the consonant despite coarticulation that blurs the phonetic cues, and at an age at which they do not yet produce these consonants themselves. This response was obtained even when infants had no visual articulatory information to help them track the consonant repetition. In combination with previous studies establishing categorical perception and normalization across speakers, this result demonstrates that preverbal infants already have abstract phonetic representations that integrate over acoustic features in the first months of life.

8.
Speech perception without hearing
In this study of visual phonetic speech perception without accompanying auditory speech stimuli, adults with normal hearing (NH; n = 96) and with severely to profoundly impaired hearing (IH; n = 72) identified consonant-vowel (CV) nonsense syllables and words in isolation and in sentences. The measures of phonetic perception were the proportion of phonemes correct and the proportion of transmitted feature information for CVs, the proportion of phonemes correct for words, and the proportion of phonemes correct and the amount of phoneme substitution entropy for sentences. The results demonstrated greater sensitivity to phonetic information in the IH group. Transmitted feature information was related to isolated word scores for the IH group, but not for the NH group. Phoneme errors in sentences were more systematic in the IH than in the NH group. Individual differences in phonetic perception for CVs were more highly associated with word and sentence performance for the IH than for the NH group. The results suggest that the necessity to perceive speech without hearing can be associated with enhanced visual phonetic perception in some individuals.

9.
Duplex perception occurs when the phonetically distinguishing transitions of a syllable are presented to one ear and the rest of the syllable (the “base”) is simultaneously presented to the other ear. Subjects report hearing both a nonspeech “chirp” and a speech syllable correctly cued by the transitions. In two experiments, we compared phonetic identification of intact syllables, duplex percepts, isolated transitions, and bases. In both experiments, subjects were able to identify the phonetic information encoded into isolated transitions in the absence of an appropriate syllabic context. Also, there was no significant difference in phonetic identification of isolated transitions and duplex percepts. Finally, in the second experiment, the category boundaries from identification of isolated transitions and duplex percepts were not significantly different from each other. However, both boundaries were statistically different from the category boundary for intact syllables. Taken together, these results suggest that listeners do not need to perceptually integrate F2 transitions or F2 and F3 transition pairs with the base in duplex perception. Rather, it appears that listeners identify the chirps as speech without reference to the base.

10.
When listeners hear a sinusoidal replica of a sentence, they perceive linguistic properties despite the absence of short-time acoustic components typical of vocal signals. Is this accomplished by a postperceptual strategy that accommodates the anomalous acoustic pattern ad hoc, or is a sinusoidal sentence understood by the ordinary means of speech perception? If listeners treat sinusoidal signals as speech signals, however unlike speech they may be, then perception should exhibit the commonplace sensitivity to the dimensions of the originating vocal tract. The present study, employing sinusoidal signals, raised this issue by testing the identification of target /bVt/, or b-vowel-t, syllables occurring in sentences that differed in the range of frequency variation of their component tones. Vowel quality of target syllables was influenced by this acoustic correlate of vocal-tract scale, implying that the perception of these nonvocal signals includes a process of vocal-tract normalization. Converging evidence suggests that the perception of sinusoidal vowels depends on the relation among component tones and not on the phonetic likeness of each tone in isolation. The findings support the general claim that sinusoidal replicas of natural speech signals are perceptible phonetically because they preserve time-varying information present in natural signals.
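The sinusoidal-replica idea lends itself to a toy illustration: replace each formant with a single tone that follows the formant's frequency and amplitude track. The linear formant trajectories below are invented for the sketch; real replicas use tracks measured from natural sentences:

import numpy as np

sr, dur = 16000, 0.3
t = np.linspace(0, dur, int(sr * dur), endpoint=False)

def tone(freq_track, amp_track):
    """Sinusoid whose instantaneous frequency follows freq_track (Hz)."""
    phase = 2 * np.pi * np.cumsum(freq_track) / sr
    return amp_track * np.sin(phase)

# Invented trajectories for the first three formants (Hz), rising
# transitions roughly like a /ba/-type syllable onset.
f1 = np.linspace(300, 700, t.size)
f2 = np.linspace(900, 1200, t.size)
f3 = np.linspace(2200, 2500, t.size)
amp = np.linspace(0.2, 1.0, t.size)

# Three tones, one per formant, summed into the sinusoidal replica.
replica = tone(f1, amp) + 0.5 * tone(f2, amp) + 0.25 * tone(f3, amp)
replica /= np.abs(replica).max()  # normalize for playback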

11.
The aim of this study is to investigate whether speech sounds can be perceived only as instances of phonetic categories, as the widely accepted theory of categorical perception of speech states, or whether physical differences between speech sounds lead to perceptual differences regardless of their phonetic categorization. Subjects listened to pairs of synthetically generated speech sounds corresponding to realizations of the syllables "ba" and "pa" in natural German, and they were instructed to decide as fast as possible whether they perceived the two as belonging to the same or to different phonetic categories. For 'same' responses, reaction times become longer as the physical distance between the speech sounds increases; for 'different' responses, reaction times become shorter with growing physical distance between the stimuli. The results show that subjects can judge speech sounds on the basis of perceptual continua, which is inconsistent with the theory of categorical perception. A mathematical model is presented that attempts to explain the results by postulating two interacting stages of processing, a psychoacoustical and a phonetic one. The model is not entirely confirmed by the data, but it seems to deserve further consideration.

12.
Two new experimental operations were used to distinguish between auditory and phonetic levels of processing in speech perception: the first based on reaction time data in speeded classification tasks with synthetic speech stimuli, and the second based on average evoked potentials recorded concurrently in the same tasks. Each of four experiments compared the processing of two different dimensions of the same synthetic consonant-vowel syllables. When a phonetic dimension was compared to an auditory dimension, different patterns of results were obtained in both the reaction time and evoked potential data. No such differences were obtained for isolated acoustic components of the phonetic dimension or for two purely auditory dimensions. Together with other recent evidence, the present results constitute additional converging operations on the distinction between auditory and phonetic processes in speech perception and on the idea that phonetic processing involves mechanisms that are lateralized in one cerebral hemisphere.

13.
Recognition memory for consonants and vowels selected from within and between phonetic categories was examined in a delayed comparison discrimination task. Accuracy of discrimination for synthetic vowels selected from both within and between categories was inversely related to the magnitude of the comparison interval. In contrast, discrimination of synthetic stop consonants remained relatively stable both within and between categories. The results indicate that differences in discrimination between consonants and vowels are primarily due to the differential availability of auditory short-term memory for the acoustic cues distinguishing these two classes of speech sounds. The findings provide evidence for distinct auditory and phonetic memory codes in speech perception.

14.
This study investigated the acoustic correlates of perceptual centers (p-centers) in CV and VC syllables and developed an acoustic p-center model. In Part 1, listeners located syllables’ p-centers by a method-of-adjustment procedure. The CV syllables contained the consonants /?/, /r/, /n/, /t/, /d/, /k/, and /g/; the VCs, the consonants /?/, /r/, and /n/. The vowel in all syllables was /a/. The results of this experiment replicated and extended previous findings regarding the effects of phonetic variation on p-centers. In Part 2, a digital signal processing procedure was used to acoustically model p-center perception. Each stimulus was passed through a six-band digital filter, and the outputs were processed to derive low-frequency modulation components. These components were weighted according to a perceived modulation magnitude function and recombined to create six psychoacoustic envelopes containing modulation energies from 3 to 47 Hz. In this analysis, p-centers were found to be highly correlated with the time-weighted function of the rate of change in the psychoacoustic envelopes, multiplied by the psychoacoustic envelope magnitude increment. The results were interpreted as suggesting (1) the probable role of low-frequency energy modulations in p-center perception, and (2) the presence of perceptual processes that integrate multiple articulatory events into a single syllabic event.
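A rough sketch of this kind of modulation-envelope analysis is given below (Python with NumPy/SciPy). The band edges, filter orders, envelope weighting, and the final p-center estimate are illustrative guesses, not the paper's exact model:

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def psychoacoustic_envelopes(x, sr, bands=((100, 300), (300, 800),
                                           (800, 1500), (1500, 2500),
                                           (2500, 4000), (4000, 6000))):
    """Low-frequency amplitude envelopes from a six-band filter bank."""
    envs = []
    for lo, hi in bands:
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        env = np.abs(hilbert(filtfilt(b, a, x)))   # band amplitude envelope
        bl, al = butter(2, 47 / (sr / 2))          # keep slow modulations
        envs.append(filtfilt(bl, al, env))
    return np.array(envs)

def p_center_estimate(x, sr):
    """Time of the strongest magnitude-weighted envelope rise (s)."""
    envs = psychoacoustic_envelopes(x, sr)
    summed = envs.sum(axis=0)
    rate = np.clip(np.gradient(summed) * sr, 0, None)  # rising rate of change
    weight = rate * summed                             # scaled by magnitude
    return weight.argmax() / sr

# Demo on a synthetic amplitude-ramped tone:
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
x = np.minimum(t / 0.05, 1.0) * np.sin(2 * np.pi * 440 * t)
print(f"estimated p-center at {p_center_estimate(x, sr) * 1000:.1f} ms")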

15.
Some reaction time experiments are reported on the relation between the perception and production of phonetic features in speech. Subjects had to produce spoken consonant-vowel syllables rapidly in response to other consonant-vowel stimulus syllables. The stimulus syllables were presented auditorily in one condition and visually in another. Reaction time was measured as a function of the phonetic features shared by the consonants of the stimulus and response syllables. Responses to auditory stimulus syllables were faster when the response syllables started with consonants that had the same voicing feature as those of the stimulus syllables. A shared place-of-articulation feature did not affect the speed of responses to auditory stimulus syllables, even though the place feature was highly salient. For visual stimulus syllables, performance was independent of whether the consonants of the response syllables had the same voicing, same place of articulation, or no shared features. This pattern of results occurred in cases where the syllables contained stop consonants and where they contained fricatives. It held for natural auditory stimuli as well as artificially synthesized ones. The overall data reveal a close relation between the perception and production of voicing features in speech. It does not appear that such a relation exists between perceiving and producing places of articulation. The experiments are relevant to the motor theory of speech perception and to other models of perceptual-motor interactions.

16.
Vocal tract resonances, called formants, are the most important parameters in human speech production and perception. They encode linguistic meaning and have been shown to be perceived by a wide range of species. Songbirds are also sensitive to different formant patterns in human speech. They can categorize words differing only in their vowels based on the formant patterns, independent of speaker identity, in a way comparable to humans. These results indicate that speech perception mechanisms are more similar between songbirds and humans than realized before. One of the major questions regarding formant perception concerns the weighting of different formants in the speech signal (“acoustic cue weighting”) and whether this process is unique to humans. Using an operant Go/NoGo design, we trained zebra finches to discriminate syllables whose vowels differed in their first three formants. When the birds were subsequently tested with novel vowels that resembled the familiar vowels in either their first formant or their second and third formants, similarity in the higher formants was weighted much more strongly than similarity in the lower formant. Thus, zebra finches indeed exhibit a cue weighting bias. Interestingly, we also found that Dutch speakers, when tested with the same paradigm, exhibit the same cue weighting bias. This, together with earlier findings, supports the hypothesis that human speech evolution might have exploited general properties of the vertebrate auditory system.
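Cue weighting of this sort is easy to express as a weighted distance in formant space. The sketch below uses invented formant values: the same test vowel is classified as one familiar vowel under an F1-dominated weighting and as the other under an F2/F3-dominated weighting:

import numpy as np

# Invented (F1, F2, F3) values in Hz for two familiar training vowels.
vowel_a = np.array([700, 1200, 2500])
vowel_b = np.array([300, 2200, 3000])
test    = np.array([700, 2200, 3000])  # matches A in F1, B in F2/F3

def nearest(test, weights):
    """Classify a test vowel by weighted Euclidean distance."""
    d_a = np.sqrt(np.sum(weights * (test - vowel_a) ** 2))
    d_b = np.sqrt(np.sum(weights * (test - vowel_b) ** 2))
    return "A" if d_a < d_b else "B"

print(nearest(test, np.array([1.0, 0.1, 0.1])))  # F1-weighted    -> "A"
print(nearest(test, np.array([0.1, 1.0, 1.0])))  # F2/F3-weighted -> "B"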

17.
Despite many attempts to define the major unit of speech perception, none has been generally accepted. In a unique study, Mermelstein (1978) claimed that consonants and vowels are the appropriate units because a single piece of information (duration, in this case) can be used for one distinction without affecting the other. In a replication, this apparent independence was found, instead, to reflect a lack of statistical power: The vowel and consonant judgments did interact. In another experiment, interdependence of two phonetic judgments was found in responses based on the fricative noise and the vocalic formants of a fricative-vowel syllable. These results show that each judgment made on speech signals must take into account other judgments that compete for information in the same signal. An account is proposed that takes segments as the primary units, with syllables imposing constraints on the shape they may take.

18.
Properties of memory for unattended spoken syllables
Whereas previous studies on memory for unattended speech have inadvertently included acoustic interference, the present study examines memory for unattended syllables during a silent period of 1, 5, or 10 s. The primary task was to read silently (Experiments 1-3) or whisper the reading (Experiment 4). Occasionally, when a light cue occurred, the subject was to recall the most recent spoken syllable, as well as the recent reading material. Memory for both the vowels and consonants of the syllables decreased across 10 s, confirming that auditory memory does decay in the absence of acoustic interference. However, the specific patterns of memory decay for vowels versus consonants depended on task demands, including the allocation of attention and the opportunity for subvocal coding. We suggest an account of performance that includes auditory sensory and phonetic memory codes with different properties, used in combination.

19.
Dorman (1974) found that small intensity differences carried on the initial portions of consonant-vowel syllables were not discriminable. Similar differences carried on steady-state vowels and on isolated formant transitions, however, were readily discriminable. He interpreted the difference between the first and the latter conditions as a phonetic effect. Yet Pastore, Ahroon, Wolz, Puleo, and Berger (1975) found similar results using sine-wave analogs to Dorman’s stimuli. They concluded that the effect is not phonetic, and that it is attributable to simple backward masking. The present studies examined the discriminability of intensity differences carried on formant transitions varying in extent and duration. Results support the conclusion of Pastore et al. to the extent that the effect is clearly not phonetic. However, these results and others suggest that simple peripheral backward masking is not a likely cause; instead, recognition masking may be involved. Moreover, the finding that phonetic-like processes occur elsewhere in audition does not necessarily impugn the existence of a speech processor; phonemic and phonological processes remain, as yet, unmatched.

20.
Although many individual speech contrast pairs have been studied within the cross-language literature, no one has created a comprehensive and systematic set of such stimuli. This article justifies and details an extensive set of contrast pairs for Mandarin Chinese and American English. The stimuli consist of 180 pairs of CVC syllables recorded in two tokens each (720 syllables total). Between each CVC pair, two of the segments are identical, whereas the third differs in that a segment drawn from a "native" phonetic category (either Mandarin, English, or both) is partnered with a segment drawn from a "foreign" phonetic category (nonnative to Mandarin, English, or both). Each contrast pair differs by a minimal phonetic amount and constitutes a meaningful contrast among the world's languages (as cataloged in the UCLA Phonological Segment Inventory Database of 451 languages). The entire collection of phonetic differences envelops the Mandarin and English phonetic spaces and generates a range of phonetic discriminability. Contrastive segments are balanced across all possible syllable positions, and the noncontrastive segments are filled in with other "foreign" segments. Although intended to measure phonetic perceptual sensitivity among adult speakers of the two languages, these stimuli are offered here to all for similar or for altogether unrelated investigations.
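The pairing scheme can be sketched programmatically: hold two segments of a CVC constant and contrast a "native" segment with a "foreign" one in the remaining position. The tiny inventories below are placeholders, not the article's actual segment lists:

from itertools import product

NATIVE_ONSETS  = ["p", "t", "k"]  # assumed native to both languages
FOREIGN_ONSETS = ["q", "x"]       # assumed nonnative to both languages
VOWEL, CODA = "a", "n"            # held constant across each pair

# One minimal onset contrast per pair; the vowel and coda are identical.
pairs = [(f"{n}{VOWEL}{CODA}", f"{f}{VOWEL}{CODA}")
         for n, f in product(NATIVE_ONSETS, FOREIGN_ONSETS)]

for native, foreign in pairs:
    print(f"{native} ~ {foreign}")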
