Similar Documents
1.
To help resolve the issue of whether developmental dyslexia (DD) is related to central auditory processing deficits or to language-specific processing deficits, we had nine dyslexic and nine nondyslexic right-handed undergraduate students perform linguistic (Experiment 1: phoneme identification) and nonlinguistic (Experiment 2: formant rate change detection) tasks. In Experiment 1, subjects listened to synthetic vowels whose second formant (F2) was modulated sinusoidally with F1, F3, and F4 held constant. F2 modulation rate (4-18 Hz) was manipulated within and across stimuli. The groups did not differ in phoneme identification. Experiment 2 was run three times and showed that the control subjects' performance improved across runs whereas the dyslexics' deteriorated (p < .0001), suggesting practice and fatigue effects, respectively. Performance on the two experiments correlated significantly and negatively for the dyslexic subjects only. These results suggest that resource depletion or frontal lobe dysfunction may be implicated in developmental dyslexia.
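The F2 manipulation in Experiment 1 is easy to make concrete. The sketch below generates a rough analogue of such a stimulus with a cascade of two-pole resonators, one of which (F2) sweeps sinusoidally while F1, F3, and F4 stay fixed. The formant frequencies, bandwidths, and the 8 Hz modulation rate are illustrative values chosen within the ranges the abstract reports, not the authors' actual synthesis parameters.

```python
import numpy as np

def resonator(x, f_hz, bw_hz, sr):
    """Two-pole resonator with a (possibly time-varying) centre frequency."""
    f = np.broadcast_to(np.asarray(f_hz, float), x.shape)
    r = np.exp(-np.pi * bw_hz / sr)            # pole radius from bandwidth
    y = np.zeros_like(x)
    for n in range(len(x)):
        c = 2 * r * np.cos(2 * np.pi * f[n] / sr)
        y[n] = x[n] + c * (y[n-1] if n > 0 else 0.0) - r * r * (y[n-2] if n > 1 else 0.0)
    return y

sr, dur, f0 = 16000, 0.5, 120
t = np.arange(int(sr * dur)) / sr
source = np.where(np.arange(len(t)) % (sr // f0) == 0, 1.0, 0.0)  # glottal pulse train

# F2 modulated sinusoidally (rate within the study's 4-18 Hz range);
# F1, F3, and F4 held constant, as in Experiment 1.
rate, depth = 8.0, 150.0                        # Hz; illustrative values
f2 = 1500 + depth * np.sin(2 * np.pi * rate * t)
vowel = source
for f, bw in [(500, 60), (None, 90), (2500, 120), (3500, 150)]:
    vowel = resonator(vowel, f2 if f is None else f, bw, sr)
vowel /= np.abs(vowel).max()
```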

2.
According to the formant centre of gravity (FCOG) hypothesis, two vowel formants in close proximity are merged during perceptual analysis, and their contribution to vowel quality depends on the centre of gravity of the formant cluster. Findings consistent with this hypothesis are that two formants can be replaced by a single formant of intermediate centre frequency, provided their separation is less than 3-3.5 Bark; and that changes in their relative amplitudes produce systematic shifts in vowel quality. In Experiment 1, listeners adjusted the frequencies of F1 and F2 in a synthesized 6-formant vowel (with the F1-F2 separation fixed at 250 Hz, i.e. less than 3 Bark) to find the best phonetic match to a reference vowel with modified formant amplitudes. Contrary to FCOG predictions, F2 attenuation did not produce lower frequency matches. Raising the amplitude of F2 led to predicted upward shifts in formant frequencies of the matched vowel, but with increased variability of matches for some stimuli. In Experiment 2, listeners identified synthesized vowels with a range of separations of F1 and F2. Formant amplitude manipulations had no effect on listeners' judgements when the fundamental frequency was low (125 Hz). Small shifts in vowel quality appeared for stimuli with a high fundamental (250 Hz), but the shifts were significantly larger for F1-F2 separations greater than 3.5 Bark. These effects of formant amplitude are qualitatively different from those observed with single-formant vowels and are generally incompatible with a formant-averaging mechanism.
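The 3-3.5 Bark criterion and the centre-of-gravity computation can be made concrete with a short sketch. The Hz-to-Bark conversion below uses Traunmüller's (1990) approximation; the abstract does not say which conversion the authors used, so treat the formula and the amplitude values as illustrative assumptions.

```python
import numpy as np

def hz_to_bark(f):
    """Traunmueller (1990) approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def fcog(freqs, amps):
    """Amplitude-weighted formant centre of gravity, in Bark."""
    z = hz_to_bark(np.asarray(freqs, float))
    a = np.asarray(amps, float)
    return np.sum(a * z) / np.sum(a)

f1, f2 = 500.0, 750.0                       # F1-F2 separation fixed at 250 Hz
sep = hz_to_bark(f2) - hz_to_bark(f1)
print(f"separation = {sep:.2f} Bark")       # < 3 Bark, so FCOG merging should apply
print(f"FCOG (equal amps)   : {fcog([f1, f2], [1.0, 1.0]):.2f} Bark")
print(f"FCOG (F2 attenuated): {fcog([f1, f2], [1.0, 0.5]):.2f} Bark")
```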

3.
Subjects rated ambiguous steady-state vowels from a continuum with respect to the categories /i/ and /I/ (Experiment 1) or /ε/ and /æ/ (Experiment 2). Each target was preceded, 0.35 sec earlier, by one of the following precursors: (1) one endpoint from the target continuum, (2) the other endpoint, (3) the isolated first formant (F1) from (1), (4) the isolated F1 from (2), or (5) a hissing noise. Although (3) and (4) did not sound as if they came from the target continuum, they produced reliable contrast in both experiments. In the /i/-/I/ experiment, contrast was as powerful from single formants as from the full vowels. These results suggest a sensory, rather than a judgmental, basis for the vowel contrast effects obtained.

4.
Working memory uses central sound representations as an informational basis. The central sound representation is the temporally and feature-integrated mental representation that corresponds to phenomenal perception; it is used in (higher-order) mental operations and stored in long-term memory. In the bottom-up processing path, the central sound representation can be probed at the level of auditory sensory memory with the mismatch negativity (MMN) of the event-related potential. The present paper reviews a newly developed MMN paradigm to tap into the processing of speech sound representations. Preattentive vowel categorization based on F1-F2 formant information occurs in speech sounds and complex tones even under conditions of high variability of the auditory input. However, an additional experiment demonstrated the limits of preattentive categorization of language-relevant information. It tested whether the system categorizes complex tones containing the F1 and F2 formant components of the vowel /a/ differently from six sounds with nonlanguage-like F1-F2 combinations. From the absence of an MMN in this experiment, it is concluded that no adequate vowel representation was constructed, which marks a limit on the capability of preattentive vowel categorization.

5.
Recent work by Summerfield (1975) and others indicates that a listener’s phonemic judgments may vary with the utterance rate of prior context. In particular, if a phonemic distinction is signaled by a temporal cue such as voice onset time (VOT), faster utterance rates tend to shift the phoneme boundary toward smaller values of that cue. The listener thus appears to “normalize” temporal cues according to utterance rate. In the present experiment, subjects identified syllables varying in VOT ([ga]-[kha]) following either a slow or a fast version of the phrase “Teddy hears_ _ _ _ .” Typical normalization effects were observed when the precursor phrase and target syllable had formant frequencies corresponding to an adult male vocal tract. However, a reversal of the typical pattern (i.e., a shift in the perceived voicing boundary toward larger values of VOT with an increased utterance rate) occurred when the precursor and target had formant frequencies corresponding to an adult female vocal tract. Both normalization and “reverse” normalization effects were reduced or eliminated under several conditions of source change between precursor and target. These conditions included a change in fundamental frequency, a change in implied vocal-tract size (as reflected in an upward or downward scaling of formant frequencies), or both.

6.
Underlying auditory processes in speech perception were explored. Of specific interest were the stages of auditory processing involved in the integration of dynamic information in nontraditional speech cues such as virtual formant transitions. These signals use intensity-ratio cues and changes in spectral center of gravity (instead of actual formant frequency transitions) to produce perceived F3 glides. 6 men and 8 women (M age = 24.2 yr., SD = 2.1), recruited through posted materials from among graduate students at The Ohio State University, participated in two experiments. The results for frequency-based formant transitions (Exp. 1) indicated that spectral cues to syllable identification are combined at more central levels of auditory processing. However, when the components of the virtual formant stimuli were divided between the ears in a dichotic listening task (Exp. 2), the results indicated that auditory spectral integration may occur above the auditory periphery but at intermediate rather than central stages.
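The principle behind a virtual formant can be sketched in a few lines: two fixed-frequency components whose intensity ratio is crossfaded over time, so that the spectral centre of gravity, and hence the perceived formant, glides even though neither component frequency moves. The component frequencies and duration below are hypothetical, not the stimulus values from the study.

```python
import numpy as np

# Two fixed sinusoids; only their intensity ratio changes over time.
sr, dur = 16000, 0.3
t = np.arange(int(sr * dur)) / sr
f_lo, f_hi = 2000.0, 3000.0                # illustrative component frequencies
w = t / t[-1]                              # crossfade weight, 0 -> 1
a_lo, a_hi = 1.0 - w, w

signal = a_lo * np.sin(2 * np.pi * f_lo * t) + a_hi * np.sin(2 * np.pi * f_hi * t)

# The perceived "virtual formant" follows the spectral centre of gravity,
# which glides even though neither component frequency moves.
cog = (a_lo * f_lo + a_hi * f_hi) / (a_lo + a_hi)
print(cog[0], cog[-1])                     # 2000.0 ... 3000.0 Hz
```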

7.
Twelve male listeners categorized 54 synthetic vowel stimuli that varied in second and third formant frequency on a Bark scale into the American English vowel categories [see text]. A neuropsychologically plausible model of categorization in the visual domain, the Striatal Pattern Classifier (SPC; Ashby & Waldron, 1999), is generalized to the auditory domain and applied separately to the data from each observer. Performance of the SPC is compared with that of the successful Normal A Posteriori Probability model (NAPP; Nearey, 1990; Nearey & Hogan, 1986) of auditory categorization. A version of the SPC that assumed piecewise-linear response region partitions provided a better account of the data than the SPC that assumed linear partitions, and was indistinguishable from a version that assumed quadratic response region partitions. A version of the NAPP model that assumed nonlinear response regions was superior to the NAPP model with linear partitions. The best-fitting SPC provided a good account of each observer's data but was outperformed by the best-fitting NAPP model. Implications for bridging the gap between the domains of visual and auditory categorization are discussed.
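At its core, a NAPP-style model is a Gaussian a posteriori classifier: fit one multivariate normal per vowel category and assign a stimulus to the category with the highest posterior. The sketch below shows that general idea on fabricated toy data; it is not Nearey's implementation, and the category names and means are invented for illustration.

```python
import numpy as np

def fit_gaussians(X, y):
    """Fit one multivariate Gaussian per vowel category."""
    return {c: (X[y == c].mean(0), np.cov(X[y == c].T)) for c in np.unique(y)}

def posterior_classify(x, models):
    """Assign x to the category with the highest (equal-prior) posterior."""
    def loglik(x, mu, cov):
        d = x - mu
        return -0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(cov)))
    return max(models, key=lambda c: loglik(x, *models[c]))

# Toy data: F2/F3 in Bark for two hypothetical vowel categories.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([10, 14], 0.5, (50, 2)),
               rng.normal([12, 15], 0.5, (50, 2))])
y = np.array(["i"] * 50 + ["e"] * 50)
models = fit_gaussians(X, y)
print(posterior_classify(np.array([10.2, 14.1]), models))  # -> "i"
```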

8.
Research has shown that speaking rate provides an important context for the perception of certain acoustic properties of speech. For example, syllable duration, which varies as a function of speaking rate, has been shown to influence the perception of voice onset time (VOT) for syllable-initial stop consonants. The purpose of the present experiments was to examine the influence of syllable duration when the initial portion of the syllable was produced by one talker and the remainder of the syllable was produced by a different talker. A short-duration and a long-duration /bi/-/pi/ continuum were synthesized with pitch and formant values appropriate to a female talker. When presented to listeners for identification, these stimuli demonstrated the typical effect of syllable duration on the voicing boundary: a shorter VOT boundary for the short stimuli than for the long stimuli. An /i/ vowel, synthesized with pitch and formant values appropriate to a male talker, was added to the end of each of the short tokens, producing a new hybrid continuum. Although the overall syllable duration of the hybrid stimuli equaled that of the original long stimuli, they produced a VOT boundary similar to that for the short stimuli. In a second experiment, two new /i/ vowels were synthesized. One had a pitch appropriate to a female talker with formant values appropriate to a male talker; the other had a pitch appropriate to a male talker and formants appropriate to a female talker. These vowels were used to create two new hybrid continua. In a third experiment, new hybrid continua were created by using more extreme male formant values. The results of both experiments demonstrated that the hybrid tokens with a change in pitch acted like the short stimuli, whereas the tokens with a change in formants acted like the long stimuli. A fourth experiment demonstrated that listeners could hear a change in talker with both sets of hybrid tokens. These results indicate that continuity of pitch, but not formant structure, appears to be the critical factor in the calculation of speaking rate within a syllable.

9.
Three experiments were performed to examine listeners’ thresholds for identifying stimuli whose spectra were modeled after the vowels /i/ and /ε/, with the differences between these stimuli restricted to the frequency of the first formant. The stimuli were presented in a low-pass masking noise that spectrally overlapped the first formant but not the higher formants. Identification thresholds were lower when the higher formants were present than when they were not, even though the first formant contained the only distinctive information for stimulus identification. This indicates that listeners were more sensitive in identifying the first formant energy through its contribution to the vowel than as an independent percept; this effect is given the name coherence masking protection. The first experiment showed this effect for synthetic vowels in which the distinctive first formant was supported by a series of harmonics that progressed through the higher formants. In the second two experiments, the harmonics in the first formant region were removed, and the first formant was simulated by a narrow band of noise. This was done so that harmonic relations did not provide a basis for grouping the lower formant with the higher formants; coherence masking protection was still observed. However, when the temporal alignment of the onsets and offsets of the higher and lower formants was disrupted, the effect was eliminated, although the stimuli were still perceived as vowels. These results are interpreted as indicating that general principles of auditory grouping that can exploit regularities in temporal patterns cause acoustic energy belonging to a coherent speech sound to stand out in the auditory scene.

10.
Experiments were conducted investigating unimodal and cross-modal phonetic context effects on /r/ and /l/ identifications to test a hypothesis that context effects arise in early auditory speech processing. Experiment 1 demonstrated an influence of a preceding bilabial stop consonant on the acoustic realization of /r/ and /l/ produced within the stop clusters /ibri/ and /ibli/. In Experiment 2, members of an acoustic /iri/ to /ili/ continuum were paired with an acoustic /ibi/. These dichotic tokens were associated with an increase in "l" identification relative to the /iri/ to /ili/ continuum. In Experiment 3, the /iri/ to /ili/ tokens were dubbed onto a video of a talker saying /ibi/. This condition was associated with a reliable perceptual shift relative to an auditory-only condition in which the /iri/ to /ili/ tokens were presented by themselves, ruling out an account of these context effects as arising during early auditory processing.

11.
In two experiments, each including a simple reaction time (RT) task, a localization task, and a passive oddball paradigm, the physical similarity between two dichotically presented auditory stimuli was manipulated. In both experiments, a redundant signals effect (RSE), high localization performance, and a reliable mismatch negativity (MMN) were observed for largely differing stimuli, suggesting that these are coded separately in auditory memory. In contrast, no RSE and a localization rate close to chance level (Experiment 1) or at chance (Experiment 2) were observed for stimuli differing to a lesser degree. Crucially, for such stimuli a small (Experiment 1) or no (Experiment 2) MMN was observed. These MMN results indicate that such stimuli tend to fuse into a single percept and that this fusion occurs rather early within information processing.

12.
A series of experiments investigated the effect of phase changes in low-numbered single harmonics in target sounds that were either synthesized steady-state vowels or periodic signals having only a single formant. A matching procedure was used in which subjects selected a sound along a continuum differing in first formant frequency in order to get the best match with the target sound; perceptual effects of the phase manipulations in the target were detected as a change in the matched first formant frequency. Stimuli had to contain at least three harmonics to produce the effect, but it did not require a particular starting phase of the components. A suppression phenomenon is discussed, in which phase changes alter the phase-locking characteristics of auditory fibres tuned to low-numbered harmonics.
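As a concrete, hypothetical example of this kind of stimulus manipulation, the sketch below builds a harmonic complex and flips the starting phase of one low-numbered harmonic. The amplitude spectrum is untouched, so any perceptual difference between the two signals must be phase-based. The harmonic count, f0, phase shift, and choice of harmonic are illustrative, not the study's parameters.

```python
import numpy as np

def harmonic_complex(f0, n_harm, sr, dur, flip_harmonic=None):
    """Periodic signal of n_harm equal-amplitude harmonics; optionally flip
    the starting phase of one low-numbered harmonic by pi."""
    t = np.arange(int(sr * dur)) / sr
    sig = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        phi = np.pi if k == flip_harmonic else 0.0
        sig += np.sin(2 * np.pi * k * f0 * t + phi)
    return sig / n_harm

sr, f0 = 16000, 125
original = harmonic_complex(f0, 20, sr, 0.5)
shifted = harmonic_complex(f0, 20, sr, 0.5, flip_harmonic=3)  # flip harmonic 3
# Identical amplitude spectra; any perceptual difference must come from phase.
```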

13.
Three experiments investigated whether extrinsic vowel normalization takes place largely at a categorical or a precategorical level of processing. Traditional vowel normalization effects in categorization were replicated in Experiment 1: Vowels taken from an [ɪ]–[ε] continuum were more often interpreted as /ɪ/ (which has a low first formant, F1) when the vowels were heard in contexts that had a raised F1 than when the contexts had a lowered F1. This was established with contexts that consisted of only two syllables. These short contexts were necessary for Experiment 2, a discrimination task that encouraged listeners to focus on the perceptual properties of vowels at a precategorical level. Vowel normalization was again found: Ambiguous vowels were more easily discriminated from an endpoint [ε] than from an endpoint [ɪ] in a high-F1 context, whereas the opposite was true in a low-F1 context. Experiment 3 measured discriminability between pairs of steps along the [ɪ]–[ε] continuum. Contextual influences were again found, but without discrimination peaks, contrary to what was predicted from the same participants’ categorization behavior. Extrinsic vowel normalization therefore appears to be a process that takes place at least in part at a precategorical processing level.

14.
Listeners tune in to talkers’ vowels through extrinsic normalization. We asked here whether this process could be based on compensation for the long-term average spectrum (LTAS) of preceding sounds and whether the mechanisms responsible for normalization are indifferent to the nature of those sounds. If so, normalization should apply to nonspeech stimuli. Previous findings were replicated with first-formant (F1) manipulations of speech. Targets on a [pɪt]–[pɛt] (low–high F1) continuum were labeled as [pɪt] more after high-F1 than after low-F1 precursors. Spectrally rotated nonspeech versions of these materials produced similar normalization. None occurred, however, with nonspeech stimuli that were less speechlike, even though precursor–target LTAS relations were equivalent to those used earlier. Additional experiments investigated the roles of pitch movement, amplitude variation, formant location, and the stimuli's perceived similarity to speech. It appears that normalization is not restricted to speech but that the nature of the preceding sounds does matter. Extrinsic normalization of vowels is due, at least in part, to an auditory process that may require familiarity with the spectrotemporal characteristics of speech.
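The LTAS of a precursor is straightforward to estimate; below is a minimal sketch, assuming Welch averaging over the precursor waveform. The "compensation" comment is schematic shorthand for the hypothesis under test, not a claim about the actual perceptual mechanism, and the random-noise precursor is a stand-in signal.

```python
import numpy as np
from scipy.signal import welch

def ltas(x, sr, nperseg=1024):
    """Long-term average spectrum: the power spectrum averaged over the whole
    signal, here estimated with Welch's method and returned in dB."""
    f, pxx = welch(x, fs=sr, nperseg=nperseg)
    return f, 10 * np.log10(pxx + 1e-12)

# Compensation idea (schematic): subtract the precursor's LTAS from the
# target's short-term spectrum before evaluating F1 cues.
sr = 16000
precursor = np.random.default_rng(1).standard_normal(sr)  # stand-in signal
f, spec = ltas(precursor, sr)
```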

15.
An electrophysiological correlate of the discrimination of stop consonants drawn from within and across phonetic categories was investigated by an auditory evoked response (AER) technique. Ss were presented a string of stimuli from the phonetic category [ba] (the standard stimulus) and were asked to detect the occurrence of a stimulus from the same phonetic category (within-category shift) or the occurrence of a stimulus from a different phonetic category [pa] (across-category shift). Both the across- and within-category shift stimuli differed equally from the standard stimulus in the time of onset of the first formant and in the amount of aspiration in the second and third formants. The N1-P2 response of the AER was larger to the across-category shift than to the within-category shift. The within-category shift did not differ from a no-shift control. These findings suggest (1) that the AER can reflect the relative discriminability of stop consonants drawn from the same or different phonetic categories in a manner similar to other behavioral measures, and (2) that the detailed acoustic representation of stop consonants is transformed into a categorized phonetic representation within 200 msec after stimulus onset.

16.
Characterization of the vocal profile of profoundly deaf children using an objective voice analysis was carried out in a university-based pediatric otolaryngology clinic. 21 persons ages 3.5 to 18 years were assessed. From each sustained phonation of the vowel /a/ the following acoustic variables were extracted: fundamental frequency (F0), jitter percentage, shimmer percentage, fundamental frequency variation (vF0), peak amplitude variation (vAM), and first, second, and third formant frequencies (F1, F2, F3). Mean F0 was 267.8 Hz and consistent with established normative data. Mean measurements of jitter (0.88%) and shimmer (3.5%) were also within normal limits. The notable feature of the acoustic analysis was a statistically significant elevation in vF0 (2.81%) and vAM (23.58%). With the exception of one subject, the F1, F2, and F3 formant frequencies were comparable to those for normal hearing children. Auditory deprivation results in poor long-term control of frequency and amplitude during sustained phonation. The inability to maintain a sustained phonation may represent the partial collapse of an internal model of voice and speech.
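Jitter and shimmer percentages have standard local definitions: the mean absolute cycle-to-cycle difference in period (or peak amplitude) divided by the mean, times 100. The sketch below implements that common definition; the study's analysis software may compute a variant, and the cycle data here are simulated around the reported mean F0 of 267.8 Hz, not taken from the study.

```python
import numpy as np

def jitter_percent(periods):
    """Local jitter: mean absolute difference of consecutive pitch periods,
    as a percentage of the mean period."""
    p = np.asarray(periods, float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer_percent(amps):
    """Local shimmer: the same measure applied to cycle peak amplitudes."""
    a = np.asarray(amps, float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)

# Simulated cycle-by-cycle measurements (not data from the study).
rng = np.random.default_rng(2)
periods = 1.0 / (267.8 + rng.normal(0, 2.0, 100))   # seconds per cycle
amps = 1.0 + rng.normal(0, 0.02, 100)               # peak amplitude per cycle
print(f"jitter  = {jitter_percent(periods):.2f}%")
print(f"shimmer = {shimmer_percent(amps):.2f}%")
```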

17.
Sussman, H. M., Fruchter, D., Hilbert, J., & Sirosh, J. (1998). The Behavioral and Brain Sciences, 21(2), 241-259; discussion 260-299.
Neuroethological investigations of mammalian and avian auditory systems have documented species-specific specializations for processing complex acoustic signals that could, if viewed in abstract terms, have an intriguing and striking relevance for human speech sound categorization and representation. Each species forms biologically relevant categories based on combinatorial analysis of information-bearing parameters within the complex input signal. This target article uses known neural models from the mustached bat and barn owl to develop, by analogy, a conceptualization of human processing of consonant plus vowel sequences that offers a partial solution to the noninvariance dilemma--the nontransparent relationship between the acoustic waveform and the phonetic segment. Critical input sound parameters used to establish species-specific categories in the mustached bat and barn owl exhibit high correlation and linearity due to physical laws. A cue long known to be relevant to the perception of stop place of articulation is the second formant (F2) transition. This article describes an empirical phenomenon--the locus equations--that describes the relationship between the F2 of a vowel and the F2 measured at the onset of a consonant-vowel (CV) transition. These variables, F2 onset and F2 vowel within a given place category, are consistently and robustly linearly correlated across diverse speakers and languages, and even under perturbation conditions as imposed by bite blocks. A functional role for this category-level extreme correlation and linearity (the "orderly output constraint") is hypothesized based on the notion of an evolutionarily conserved auditory-processing strategy. High correlation and linearity between critical parameters in the speech signal that help to cue place of articulation categories might have evolved to satisfy a preadaptation by mammalian auditory systems for representing tightly correlated, linearly related components of acoustic signals.
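A locus equation is simply a linear regression of F2 at CV-transition onset on F2 at the vowel midpoint, fitted within one place-of-articulation category; the high correlation and linearity the article emphasizes correspond to a high r for that fit. The sketch below shows the computation on synthetic values invented for illustration, not measurements from the article.

```python
import numpy as np

# Locus equation: F2 at CV-transition onset as a linear function of F2 in
# the vowel, fitted within one stop-place category. Values are illustrative.
rng = np.random.default_rng(3)
f2_vowel = np.array([900, 1200, 1500, 1800, 2100, 2400], float)   # Hz
f2_onset = 0.7 * f2_vowel + 450 + rng.normal(0, 20, 6)            # Hz

slope, intercept = np.polyfit(f2_vowel, f2_onset, 1)
r = np.corrcoef(f2_vowel, f2_onset)[0, 1]
print(f"F2_onset = {slope:.2f} * F2_vowel + {intercept:.0f} Hz  (r = {r:.3f})")
```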

18.
When synthetic fricative noises from a [ʃ]-[s] continuum are followed by [a] or [u] (with appropriate formant transitions), listeners perceive more instances of [s] in the context of [u] than in the context of [a]. Presumably, this reflects a perceptual adjustment for the coarticulatory effect of rounded vowels on preceding fricatives. In Experiment 1, we found that varying the duration of the fricative noise leaves the perceptual context effect unchanged, whereas insertion of a silent interval following the noise reduces the effect substantially. Experiment 2 suggested that it is temporal separation rather than the perception of an intervening stop consonant that is responsible for this reduction, in agreement with recent, analogous observations on anticipatory coarticulation. In Experiment 3, we showed that the vowel context effect disappears when the periodic stimulus portion is synthesized so as to contain no formant transitions. To dissociate the contribution of formant transitions from contextual effects due to vowel quality per se, Experiment 4 employed synthetic fricative noises followed by periodic portions excerpted from naturally produced [ʃa], [sa], [ʃu], and [su]. The results showed strong and largely independent effects of formant transitions and vowel quality on fricative perception. In addition, we found a strong speaker (male vs. female) normalization effect. All three influences on fricative perception were reduced by temporal separation of noise and periodic stimulus portions. Although no single hypothesis can explain all of our results, they are generally supportive of the view that some knowledge of the dynamics of speech production has a role in speech perception.

19.
In three experiments, we determined how perception of the syllable-initial distinction between the stop consonant [b] and the semivowel [w], when cued by duration of formant transitions, is affected by parts of the sound pattern that occur later in time. For the first experiment, we constructed four series of syllables, similar in that each had initial formant transitions ranging from one short enough for [ba] to one long enough for [wa], but different in overall syllable duration. The consequence in perception was that, as syllable duration increased, the [b-w] boundary moved toward transitions of longer duration. Then, in the second experiment, we increased the duration of the sound by adding a second syllable, [da] (thus creating [bada-wada]), and observed that lengthening the second syllable also shifted the perceived [b-w] boundary in the first syllable toward transitions of longer duration; however, this effect was small by comparison with that produced when the first syllable was lengthened equivalently. In the third experiment, we found that altering the structure of the syllable had an effect that is not to be accounted for by the concomitant change in syllable duration: lengthening the syllable by adding syllable-final transitions appropriate for the stop consonant [d] (thus creating [bad-wad]) caused the perceived [b-w] boundary to shift toward transitions of shorter duration, an effect precisely opposite to that produced when the syllable was lengthened to the same extent by adding steady-state vowel. We suggest that, in all these cases, the later-occurring information specifies rate of articulation and that the effect on the earlier-occurring cue reflects an appropriate perceptual normalization.

20.
Identification and discrimination of two-formant [bæ-dæ-gæ] and [pæ-tæ-kæ] synthetic speech stimuli and discrimination of corresponding isolated second formant transitions (chirps) were performed by six subjects. Stimuli were presented at several intensity levels such that the intensity of the F2 transition was equated between speech and nonspeech stimuli, or the overall intensity of the stimulus was equated. At higher intensity (92 dB), b-d-g and p-t-k identification and between-category discrimination performance declined and bilabial-alveolar phonetic boundaries shifted in location on the continuum towards the F2 steady-state frequency. Between-category discrimination improved from performance at 92 dB when 92-dB speech stimuli were simultaneously masked by 60-dB speech noise; alveolar-velar boundaries shifted to a higher frequency location in the 92-dB-plus-noise condition. Chirps were discriminated categorically when presented at 58 dB, but discrimination peaks declined at higher intensities. Perceptual performance for chirps and p-t-k stimuli was very similar, and slightly inferior to performance for b-d-g stimuli, where simultaneous masking by F1 resulted in a lower effective intensity of F2. The results were related to a suggested model involving pitch comparison and transitional quality perceptual strategies.
