Similar literature
1.
Phonetic segments are coarticulated in speech. Accordingly, the articulatory and acoustic properties of the speech signal during the time frame traditionally identified with a given phoneme are highly context-sensitive. For example, due to carryover coarticulation, the front tongue-tip position for /l/ results in more fronted tongue-body contact for a /g/ preceded by /l/ than for a /g/ preceded by /r/. Perception by mature listeners shows a complementary sensitivity: when a synthetic /da/-/ga/ continuum is preceded by either /al/ or /ar/, adults hear more /g/s following /l/ than following /r/. That is, some of the fronting information in the temporal domain of the stop is perceptually attributed to /l/ (Mann, 1980). We replicated this finding and extended it to a signal-detection test of discrimination with adults, using triads of disyllables. Three equidistant items from a /da/-/ga/ continuum were used, preceded by /al/ and /ar/. In the identification test, adults had identified item ga5 as "ga" and da1 as "da" following both /al/ and /ar/, whereas they identified the crucial item d/ga3 predominantly as "ga" after /al/ but as "da" after /ar/. In the discrimination test, they discriminated d/ga3 from da1 preceded by /al/ but not /ar/; compatibly, they discriminated d/ga3 readily from ga5 preceded by /ar/ but poorly preceded by /al/. We obtained similar results with 4-month-old infants. Following habituation to either ald/ga3 or ard/ga3, infants heard either the corresponding ga5 or da1 disyllable. As predicted, the infants discriminated d/ga3 from da1 following /al/ but not /ar/; conversely, they discriminated d/ga3 from ga5 following /ar/ but not /al/. The results suggest that prelinguistic infants disentangle consonant-consonant coarticulatory influences in speech in an adult-like fashion.

2.
Phonetic segments are coarticulated in speech. Accordingly, the articulatory and acoustic properties of the speech signal during the time frame traditionally identified with a given phoneme are highly context-sensitive. For example, due to carryover coarticulation, the front tongue-tip position for /l/ results in more fronted tongue-body contact for a /g/ preceded by /l/ than for a /g/ preceded by /r/. Perception by mature listeners shows a complementary sensitivity: when a synthetic /da/-/ga/ continuum is preceded by either /al/ or /ar/, adults hear more /g/s following /l/ than following /r/. That is, some of the fronting information in the temporal domain of the stop is perceptually attributed to /l/ (Mann, 1980). We replicated this finding and extended it to a signal-detection test of discrimination with adults, using triads of disyllables. Three equidistant items from a /da/-/ga/ continuum were used, preceded by /al/ and /ar/. In the identification test, adults had identified item ga5 as “ga” and da1 as “da” following both /al/ and /ar/, whereas they identified the crucial item d/ga3 predominantly as “ga” after /al/ but as “da” after /ar/. In the discrimination test, they discriminated d/ga3 from da1 preceded by /al/ but not /ar/; compatibly, they discriminated d/ga3 readily from ga5 preceded by /ar/ but poorly preceded by /al/. We obtained similar results with 4-month-old infants. Following habituation to either ald/ga3 or ard/ga3, infants heard either the corresponding ga5 or da1 disyllable. As predicted, the infants discriminated d/ga3 from da1 following /al/ but not /ar/; conversely, they discriminated d/ga3 from ga5 following /ar/ but not /al/. The results suggest that prelinguistic infants disentangle consonant-consonant coarticulatory influences in speech in an adult-like fashion.

3.
Certain attributes of a syllable-final liquid can influence the perceived place of articulation of a following stop consonant. To demonstrate this perceptual context effect, the CV portions of natural tokens of [al-da], [al-ga], [ar-da], [ar-ga] were excised and replaced with closely matched synthetic stimuli drawn from a [da]-[ga] continuum. The resulting hybrid disyllables were then presented to listeners who labeled both liquids and stops. The natural VC portions had two different effects on perception of the synthetic CVs. First, there was an effect of liquid category: Listeners perceived “g” more often in the context of [al] than in that of [ar]. Second, there was an effect due to tokens of [al] and [ar] having been produced before [da] or [ga]: More “g” percepts occurred when stops followed liquids that had been produced before [g]. A hypothesis that each of these perceptual effects finds a parallel in speech production is supported by spectrograms of the original utterances. Here, it seems, is another instance in which findings in speech perception reflect compensation for coarticulation during speech production.

4.
This article reports three experiments designed to explore the basis for speech perceivers' apparent compensations for coarticulation. In the first experiment, the stimuli were members of three /da/-to-/ga/ continua hybridized from natural speech. The monosyllables had originally been produced in disyllables /ada/ and /aga/ to make Continuum 1, /alda/ and /alga/ (Continuum 2), and /arda/ and /arga/ (Continuum 3). Members of the second and third continua were influenced by carryover coarticulation from the preceding /l/ or /r/ context. Listeners showed compensation for this carryover coarticulation in the absence of the precursor /al/ or /ar/ syllables. This rules out an account in which compensation for coarticulation reflects a spectral contrast effect exerted by a precursor syllable, as previously has been proposed by Lotto, Holt, and colleagues (e.g., Lotto, Kluender, & Holt, 1997; Lotto & Kluender, 1998). The second experiment showed an enhancing effect of the endpoint monosyllables in Experiment 1 on identifications of preceding natural hybrids along an /al/-to-/ar/ continuum. That is, coarticulatory /l/ and /r/ information in /da/ and /ga/ syllables led to increased judgments of /l/ and /r/, respectively, in the precursor /al/-to-/ar/ continuum members. This was opposite to the effect, in Experiment 3, of /da/ and /ga/ syllables on preceding tones synthesized to range in frequency from approximately the ending F3 of /ar/ to the ending F3 of /al/. The enhancing, not contrastive, effect in Experiment 2, juxtaposed to the contrastive effect in Experiment 3, further disconfirms the spectral contrast account of compensation for coarticulation. A review of the literature buttresses that conclusion and provides strong support for an account that invokes listeners' attention to information in speech for the occurrence of gestural overlap.

5.
For native speakers of English and several other languages, preceding vocalic duration and F1 offset frequency are two of the cues that convey the stop consonant voicing distinction in word-final position. For speakers learning English as a second language, there are indications that use of vocalic duration, but not F1 offset frequency, may be hindered by a lack of experience with phonemic (i.e., lexical) vowel length (the “phonemic vowel length account”: Crowther & Mann, 1992). In this study, native speakers of Arabic, a language that includes a phonemic vowel length distinction, were tested for their use of vocalic duration and F1 offset in production and perception of the English consonant-vowel-consonant forms pod and pot. The phonemic vowel length hypothesis predicts that Arabic speakers should use vocalic duration extensively in production and perception. On the contrary, Experiment 1 revealed that, consistent with Flege and Port’s (1981) findings, they produced only slightly (but significantly) longer vocalic segments in their pod tokens. It further indicated that their productions showed a significant variation in F1 offset as a function of final stop voicing. Perceptual sensitivity to vocalic duration and F1 offset as voicing cues was tested in two experiments. In Experiment 2, we employed a factorial combination of these two cues and a finely spaced vocalic duration continuum. Arabic speakers did not appear to be very sensitive to vocalic duration, but they were about as sensitive as native English speakers to F1 offset frequency. In Experiment 3, we employed a one-dimensional continuum of more widely spaced stimuli that varied only vocalic duration. Arabic speakers showed native-English-like sensitivity to vocalic duration. An explanation based on the perceptual anchor theory of context coding (Braida et al., 1984; Macmillan, 1987; Macmillan, Braida, & Goldberg, 1987) and phoneme perception theory (Schouten & Van Hessen, 1992) is offered to reconcile the apparently contradictory perceptual findings. The explanation does not attribute native-English-like voicing perception to the Arabic subjects. The findings in this study call for a modification of the phonemic vowel length hypothesis.

6.
The third-formant (F3) transition of a three-formant /da/ or /ga/ syllable was extracted and replaced by sine-wave transitions that followed the F3 centre frequency. The syllable without the F3 transition (base) was always presented at the left ear, and a /da/ (falling) or /ga/ (rising) sine-wave transition could be presented at either the left, the right, or both ears. The listeners perceived the base as a syllable, and the sine-wave transition as a non-speech whistle, which was lateralized near the left ear, the right ear, or the middle of the head, respectively. In Experiment 1, the sine-wave transition strongly influenced the identity of the syllable only when it was lateralized at the same ear as the base (left ear). Phonetic integration between the base and the transitions became weak, but was not completely eliminated, when the latter was perceived near the middle of the head or at the ear opposite the base (right ear). The second experiment replicated these findings by using duplex stimuli in which the level of the sine-wave transitions was such that the subjects could not reliably tell whether a /da/ or a /ga/ transition was present at the same ear as the base. This condition was introduced in order to control for the possibility that the subjects could have identified the syllables by associating a rising or falling transition presented at the left ear with a /da/ or /ga/ percept. Alternative suggestions about the relation between speech and non-speech perceptual processes are discussed on the basis of these results.

7.
Three experiments are reported that collectively show that listeners perceive speech sounds as contrasting auditorily with neighboring sounds. Experiment 1 replicates the well-established finding that listeners categorize more of a [d–g] continuum as [g] after [l] than after [r]. Experiments 2 and 3 show that listeners discriminate stimuli in which the energy concentrations differ in frequency between the spectra of neighboring sounds better than those in which they do not differ. In Experiment 2, [alga–arda] pairs, in which the energy concentrations in the liquid-stop sequences are H(igh) L(ow)–LH, were more discriminable than [alda–arga] pairs, in which they are HH–LL. In Experiment 3, [da] and [ga] syllables were more easily discriminated when they were preceded by lower and higher pure tones, respectively (that is, tones that differed from the stops’ higher and lower F3 onset frequencies), than when they were preceded by H and L pure tones with similar frequencies. These discrimination results show that contrast with the target’s context exaggerates its perceived value when energy concentrations differ in frequency between the target’s spectrum and its context’s spectrum. Because contrast with its context does more than merely shift the criterion for categorizing the target, it cannot be produced by neural adaptation. The finding that nonspeech contexts exaggerate the perceived values of speech targets also rules out compensation for coarticulation by showing that their values depend on the proximal auditory qualities evoked by the stimuli’s acoustic properties, rather than the distal articulatory gestures.

8.
Neisser, Hoenig, and Goldstein (1969) reduced the “stimulus prefix effect” (diminished recall of seven digits preceded by a redundant prefix) when the redundant prefix and the recall digits were produced by different speakers. In the present studies, similar results were obtained using one speaker only, but with the prefix and recall digits spoken separately in different utterances and combined by tape splicing. The results support a hypothesis concerning the perception of intact, wholistically organized articulatory units. A second hypothesis, also based on the idea of intact articulatory units, was tested.

9.
10.
It has been demonstrated using the “silent-center” (SC) syllable paradigm that there is sufficient information in syllable onsets and offsets, taken together, to support accurate identification of vowels spoken in both citation-form syllables and syllables spoken in sentence context. Using edited natural speech stimuli, the present study examined the identification of American English vowels when increasing amounts of syllable onsets alone or syllable offsets alone were presented in their original sentence context. The stimuli were /d/-vowel-/d/ syllables spoken in a short carrier sentence by a male speaker. Listeners attempted to identify the vowels in experimental conditions that differed in the number of pitch periods presented and whether the pitch periods were from syllable onsets or syllable offsets. In general, syllable onsets were more informative than syllable offsets, although neither onsets nor offsets alone specified vowel identity as well as onsets and offsets together (SC syllables). Vowels differed widely in ease of identification; the diphthongized long vowels /e/, /ae/, /o/ were especially difficult to identify from syllable offsets. Identification of vowels as “front” or “back” was accurate, even from short samples of the syllable; however, vowel “height” was quite difficult to determine, again, especially from syllable offsets. The results emphasize the perceptual importance of time-varying acoustic parameters, which are the direct consequence of the articulatory dynamics involved in producing syllables.

11.
When synthetic fricative noises from a [ʃ]-[s] continuum are followed by [a] or [u] (with appropriate formant transitions), listeners perceive more instances of [s] in the context of [u] than in the context of [a]. Presumably, this reflects a perceptual adjustment for the coarticulatory effect of rounded vowels on preceding fricatives. In Experiment 1, we found that varying the duration of the fricative noise leaves the perceptual context effect unchanged, whereas insertion of a silent interval following the noise reduces the effect substantially. Experiment 2 suggested that it is temporal separation rather than the perception of an intervening stop consonant that is responsible for this reduction, in agreement with recent, analogous observations on anticipatory coarticulation. In Experiment 3, we showed that the vowel context effect disappears when the periodic stimulus portion is synthesized so as to contain no formant transitions. To dissociate the contribution of formant transitions from contextual effects due to vowel quality per se, Experiment 4 employed synthetic fricative noises followed by periodic portions excerpted from naturally produced [ʃa], [sa], [ʃu], and [su]. The results showed strong and largely independent effects of formant transitions and vowel quality on fricative perception. In addition, we found a strong speaker (male vs. female) normalization effect. All three influences on fricative perception were reduced by temporal separation of noise and periodic stimulus portions. Although no single hypothesis can explain all of our results, they are generally supportive of the view that some knowledge of the dynamics of speech production has a role in speech perception.

12.
It is well known that the formant transitions of stop consonants in CV and VC syllables are roughly the mirror image of each other in time. These formant motions reflect the acoustic correlates of the articulators as they move rapidly into and out of the period of stop closure. Although acoustically different, these formant transitions are correlated perceptually with similar phonetic segments. Earlier research of Klatt and Shattuck (1975) had suggested that mirror image acoustic patterns resembling formant transitions were not perceived as similar. However, mirror image patterns could still have some underlying similarity which might facilitate learning, recognition, and the establishment of perceptual constancy of phonetic segments across syllable positions. This paper reports the results of four experiments designed to study the perceptual similarity of mirror-image acoustic patterns resembling the formant transitions and steady-state segments of the CV and VC syllables /ba/, /da/, /ab/, and /ad/. Using a perceptual learning paradigm, we found that subjects could learn to assign mirror-image acoustic patterns to arbitrary response categories more consistently than they could do so with similar arrangements of the same patterns based on spectrotemporal commonalities. Subjects respond not only to the individual components or dimensions of these acoustic patterns, but also process entire patterns and make use of the patterns’ internal organization in learning to categorize them consistently according to different classification rules.

13.
This study investigated the acoustic correlates of perceptual centers (p-centers) in CV and VC syllables and developed an acoustic p-center model. In Part 1, listeners located syllables’ p-centers by a method-of-adjustment procedure. The CV syllables contained the consonants /?/, /r/, /n/, /t/, /d/, /k/, and /g/; the VCs, the consonants /?/, /r/, and /n/. The vowel in all syllables was /a/. The results of this experiment replicated and extended previous findings regarding the effects of phonetic variation on p-centers. In Part 2, a digital signal processing procedure was used to acoustically model p-center perception. Each stimulus was passed through a six-band digital filter, and the outputs were processed to derive low-frequency modulation components. These components were weighted according to a perceived modulation magnitude function and recombined to create six psychoacoustic envelopes containing modulation energies from 3 to 47 Hz. In this analysis, p-centers were found to be highly correlated with the time-weighted function of the rate-of-change in the psychoacoustic envelopes, multiplied by the psychoacoustic envelope magnitude increment. The results were interpreted as suggesting (1) the probable role of low-frequency energy modulations in p-center perception, and (2) the presence of perceptual processes that integrate multiple articulatory events into a single syllabic event.

14.
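Liu Wenli, Qi Zhiqiang. 《心理科学》 (Journal of Psychological Science), 2016, 39(2): 291-298
Using a priming paradigm, two experiments examined priming effects in the perception of consonant categories and vowel categories, respectively. The primes were pure tones and the target categories themselves; the targets were consonant-category and vowel-category continua. For consonant continua, the percentage of category responses was affected by both pure-tone and speech primes, whereas reaction times were affected only by speech primes. For vowel continua, the percentage of category responses was unaffected by either type of prime, but reaction times were affected by speech primes. The results indicate that priming effects differ between consonant and vowel category perception, providing new evidence that the underlying processing mechanisms for consonant and vowel categories differ.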

15.
A series of experiments is reported that investigated the pattern of acoustic information specifying place and manner of stop consonants in medial position after [s]. In both production and perception, information for stop place includes the spectrum of the fricative at offset, the duration of the silent closure interval, the spectral relationship between the frequency of the stop release burst and the following periodically excited formants, and the spectral and temporal characteristics of the first formant transition. Similarly, the information for stop manner includes the duration of silent closure, the frequency of the first formant at the release, the magnitude of the first formant transition, and the proximity of the second and third formants at release. A relationship was shown to exist in perception between the spectral characteristics of the first formant and the duration of the silent closure required to hear a stop. This appears to reciprocate the covariation of these parameters in production across different places of articulation and different vocalic contexts. The existence of perceptual sensitivity to a wide range of the acoustic consequences of production questions the efficacy of accounts of speech perception in terms of the fractionation of the signal into elemental acoustic cues, which are then integrated to yield a phonetic percept. It is argued that it is inappropriate to ascribe a psychological status to cues whose only reality is their operational role as physical parameters whose manipulation can change the phonetic interpretation of a signal. It is suggested that the metric of the information for phonetic perception cannot be that of the cues; rather, a metric should be sought in which acoustic and articulatory dynamics are isomorphic.

16.
Five-year-old children were tested for perceptual trading relations between a temporal cue (silence duration) and a spectral cue (F1 onset frequency) for the “say-stay” distinction. Identification functions were obtained for two synthetic “say-stay” continua, each containing systematic variations in the amount of silence following the /s/ noise. In one continuum, the vocalic portion had a lower F1 onset than in the other continuum. Children showed a smaller trading relation than has been found with adults. They did not differ from adults, however, in their perception of an “ay-day” continuum formed by varying F1 onset frequency only. The results of a discrimination task in which the two acoustic cues were made to “cooperate” or “conflict” phonetically supported the notion of perceptual equivalence of the temporal and spectral cues along a single phonetic dimension. The results indicate that young children, like adults, perceptually integrate multiple cues to a speech contrast in a phonetically relevant manner, but that they may not give the same perceptual weights to the various cues as do adults.

17.
18.
It has been demonstrated using the "silent-center" (SC) syllable paradigm that there is sufficient information in syllable onsets and offsets, taken together, to support accurate identification of vowels spoken in both citation-form syllables and syllables spoken in sentence context. Using edited natural speech stimuli, the present study examined the identification of American English vowels when increasing amounts of syllable onsets alone or syllable offsets alone were presented in their original sentence context. The stimuli were /d/-vowel-/d/ syllables spoken in a short carrier sentence by a male speaker. Listeners attempted to identify the vowels in experimental conditions that differed in the number of pitch periods presented and whether the pitch periods were from syllable onsets or syllable offsets. In general, syllable onsets were more informative than syllable offsets, although neither onsets nor offsets alone specified vowel identity as well as onsets and offsets together (SC syllables). Vowels differed widely in ease of identification; the diphthongized long vowels /e/, /ae/, /o/ were especially difficult to identify from syllable offsets. Identification of vowels as "front" or "back" was accurate, even from short samples of the syllable; however, vowel "height" was quite difficult to determine, again, especially from syllable offsets. The results emphasize the perceptual importance of time-varying acoustic parameters, which are the direct consequence of the articulatory dynamics involved in producing syllables.

19.
This paper shows that maximal rate of speech varies as a function of syllable structure. For example, CCV syllables such as [sku] and CVC syllables such as [kus] are produced faster than VCC syllables such as [usk] when subjects repeat these syllables as fast as possible. Spectrographic analyses indicated that this difference in syllable duration was not confined to any one portion of the syllables: the vowel, the consonants, and even the interval between syllable repetitions were longer for VCC syllables than for CVC and CCV syllables. These and other findings could not be explained in terms of word frequency, transition frequency of adjacent phonemes, or coarticulation between segments. Moreover, number of phonemes was a poor predictor of maximal rate for a wide variety of syllable structures, since VCC structures such as [ulk] were produced more slowly than phonemically longer CCCV structures such as [sklu], and V structures such as [a] were produced no faster than phonemically longer CV structures such as [ga]. These findings could not be explained by traditional models of speech production or articulatory difficulty but supported a complexity metric derived from a recently proposed theory of the serial production of syllables. This theory was also shown to be consistent with the special status of CV syllables suggested by Jakobson as well as certain aspects of speech errors, tongue-twisters and word games such as Double Dutch.

20.
Several experiments investigate voicing judgments in minimal pairs like rabid-rapid when the duration of the first vowel and the medial stop are varied factorially and other cues for voicing remain ambiguous. In Experiments 1 and 2, in which synthetic labial and velar-stop voicing pairs are investigated, the perceptual boundary along a continuum of silent consonant durations varies in constant proportion to increases in the duration of the preceding vocalic interval. In Experiment 3, it is shown that speaking tempo external to the test word has far smaller effects on a closure duration boundary for voicing than does the tempo within the test word. Experiment 4 shows that, even within the word, it is primarily the preceding vowel that accounts for changes in the consonant duration effects. Furthermore, in Experiments 3 and 4, the effects of timing outside the vowel-consonant interval are independent of the duration of that interval itself. These findings suggest that consonant/vowel ratio serves as a primary acoustic cue for English voicing in syllable-final position and imply that this ratio possibly is directly extracted from the speech signal.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号