Similar Literature (20 results retrieved)
1.
An interesting phenomenon in human speech perception is the trading relation, in which two different acoustic cues both signal the same phonetic percept. The present study compared American English, Spanish, and monkey listeners in their perception of the trading relation between gap duration and F1 transition onset frequency in a synthetic say-stay continuum. For all the subjects, increased gap duration caused perception to change from say to stay; however, subjects differed in the extent to which the F1 cue traded with gap duration. For American English listeners, a change from a low to a high F1 onset caused a phoneme boundary shift of 26 msec toward shorter gap durations, indicating a strong trading relation. For Spanish listeners, the shift was significantly smaller at 13.7 msec, indicating a weaker trading relation. For monkeys, there was no shift at all, indicating no trading relation. These results provide evidence that the say-stay trading relation is dependent on perceptual learning from linguistic exposure.
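For readers who want to see how a trading relation of this kind is typically quantified, the following Python sketch fits a logistic identification function to hypothetical say-stay labeling data for two F1-onset conditions and reports the boundary shift. The data, starting values, and function names are illustrative assumptions, not the study's materials or analysis.

```python
# Minimal sketch (not the study's analysis): estimate the say-stay category boundary
# (50% crossover of a logistic identification function) for two hypothetical F1-onset
# conditions; the boundary shift between conditions quantifies the trading relation.
import numpy as np
from scipy.optimize import curve_fit

def logistic(gap_ms, boundary, slope):
    """Probability of a 'stay' response as a function of gap duration."""
    return 1.0 / (1.0 + np.exp(-slope * (gap_ms - boundary)))

gap_ms = np.array([0, 8, 16, 24, 32, 40, 48, 56])                    # continuum (ms)
p_stay_low_f1 = np.array([.02, .05, .15, .45, .80, .95, .98, .99])   # hypothetical data
p_stay_high_f1 = np.array([.05, .20, .55, .85, .95, .98, .99, 1.0])  # hypothetical data

(b_low, _), _ = curve_fit(logistic, gap_ms, p_stay_low_f1, p0=[25, 0.3])
(b_high, _), _ = curve_fit(logistic, gap_ms, p_stay_high_f1, p0=[15, 0.3])

print(f"boundary, low F1 onset:  {b_low:.1f} ms")
print(f"boundary, high F1 onset: {b_high:.1f} ms")
print(f"trading relation (boundary shift): {b_low - b_high:.1f} ms")
```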

2.
The perception of the distinction between /r/ and /l/ by native speakers of American English and of Japanese was studied using natural and synthetic speech. The American subjects were all nearly perfect at recognizing the natural speech sounds, whereas there was substantial variation among the Japanese subjects in their accuracy of recognizing /r/ and /l/ except in syllable-final position. A logit model, which additively combined the acoustic information conveyed by F1-transition duration and by F3-onset frequency, provided a good fit to the perception of synthetic /r/ and /l/ by the American subjects. There was substantial variation among the Japanese subjects in whether the F1 and F3 cues had a significant effect on their classifications of the synthetic speech. This variation was related to variation in accuracy of recognizing natural /r/ and /l/, such that greater use of both the F1 cue and the F3 cue in classifying the synthetic speech sounds was positively related to accuracy in recognizing the natural sounds. However, multiple regression showed that use of the F1 cue did not account for significant variance in natural speech performance beyond that accounted for by the F3 cue, indicating that the F3 cue is more important than the F1 cue for Japanese speakers learning English. The relation between performance on natural and synthetic speech also provides external validation of the logit model by showing that it predicts performance outside of the domain of data to which it was fit.
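As an illustration of the additive logit modeling described above, the sketch below fits a logistic regression in which the log-odds of an /r/ response depend linearly on F1-transition duration and F3-onset frequency. The simulated data, cue weights, and variable names are assumptions for demonstration only; this is not the authors' model code.

```python
# A minimal sketch, with fabricated data, of an additive logit model of two acoustic
# cues: log-odds of an /r/ response = b0 + b1*F1_duration + b2*F3_onset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
f1_dur = rng.uniform(20, 80, n)        # F1-transition duration (ms), hypothetical
f3_onset = rng.uniform(1600, 2600, n)  # F3-onset frequency (Hz), hypothetical
# Simulate /r/ responses driven additively by both cues (arbitrary weights).
logit = 0.05 * (f1_dur - 50) - 0.004 * (f3_onset - 2100)
resp_r = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([f1_dur, f3_onset]))
model = sm.Logit(resp_r, X).fit(disp=False)
print(model.params)  # per-cue coefficients: how strongly each cue shifts the log-odds
```

A listener whose fitted coefficient for a cue is near zero is, in this framework, making little use of that cue, which is the sense in which cue use varied across the Japanese subjects.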

3.
For native speakers of English and several other languages, preceding vocalic duration and F1 offset frequency are two of the cues that convey the stop consonant voicing distinction in word-final position. For speakers learning English as a second language, there are indications that use of vocalic duration, but not F1 offset frequency, may be hindered by a lack of experience with phonemic (i.e., lexical) vowel length (the “phonemic vowel length account”: Crowther & Mann, 1992). In this study, native speakers of Arabic, a language that includes a phonemic vowel length distinction, were tested for their use of vocalic duration and F1 offset in production and perception of the English consonant-vowel-consonant forms pod and pot. The phonemic vowel length hypothesis predicts that Arabic speakers should use vocalic duration extensively in production and perception. On the contrary, Experiment 1 revealed that, consistent with Flege and Port’s (1981) findings, they produced only slightly (but significantly) longer vocalic segments in their pod tokens. It further indicated that their productions showed a significant variation in F1 offset as a function of final stop voicing. Perceptual sensitivity to vocalic duration and F1 offset as voicing cues was tested in two experiments. In Experiment 2, we employed a factorial combination of these two cues and a finely spaced vocalic duration continuum. Arabic speakers did not appear to be very sensitive to vocalic duration, but they were about as sensitive as native English speakers to F1 offset frequency. In Experiment 3, we employed a one-dimensional continuum of more widely spaced stimuli that varied only in vocalic duration. Arabic speakers showed native-English-like sensitivity to vocalic duration. An explanation based on the perceptual anchor theory of context coding (Braida et al., 1984; Macmillan, 1987; Macmillan, Braida, & Goldberg, 1987) and phoneme perception theory (Schouten & Van Hessen, 1992) is offered to reconcile the apparently contradictory perceptual findings. The explanation does not attribute native-English-like voicing perception to the Arabic subjects. The findings in this study call for a modification of the phonemic vowel length hypothesis.
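A common way to express perceptual sensitivity to a single cue, in the signal-detection spirit of the context-coding account cited above, is d'. The short sketch below computes d' from hypothetical response counts; the counts and the log-linear correction are illustrative assumptions, not data or procedures from this study.

```python
# Illustrative calculation (not from the paper): sensitivity to a voicing cue as d',
# treating "pod" responses to long-vowel stimuli as hits and "pod" responses to
# short-vowel stimuli as false alarms. All counts below are hypothetical.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction so rates of 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts for one listener on a widely spaced vocalic-duration continuum.
print(round(d_prime(hits=42, misses=8, false_alarms=12, correct_rejections=38), 2))
```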

4.
The effects of stimulus duration and spatial separation on the illusion of apparent motion in the auditory modality were examined. Two narrow-band noise sources (40 dB, A-weighted) were presented through speakers separated in space by 2.5°, 5°, or 10°, centered about the subject’s midline. The duration of each stimulus was 5, 10, or 50 msec. On each trial, the sound pair was temporally separated by 1 of 10 interstimulus onset intervals (ISOIs): 0, 2, 4, 6, 8, 10, 15, 20, 50, or 70 msec. Five subjects were tested in nine trial blocks; each block represented a particular combination of spatial separation and duration. Within a trial block, each ISOI was presented 30 times, in random order. Subjects were instructed to listen to the stimulus sequence and classify their perception of the sound into one of five categories: single sound, simultaneous sounds, continuous motion, broken motion, or successive sounds. Each subject was also required to identify the location of the first-occurring stimulus (left or right). The percentage of continuous-motion responses was significantly affected by the ISOI [F(9,36) = 5.67, p < .001], the duration × ISOI interaction [F(18,72) = 3.54, p < .0001], and the separation × duration × ISOI interaction [F(36,144) = 1.51, p < .05]. The results indicate that a minimum duration is required for the perception of auditory apparent motion. Little or no motion was reported at durations of 10 msec or less. At a duration of 50 msec, motion was reported most often for ISOIs of 20–50 msec. The effect of separation appeared to be limited to durations and ISOIs during which little motion was perceived.
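For reference, a repeated-measures analysis of this general shape (percentage of continuous-motion responses as a function of within-subject duration and ISOI factors) could be run as sketched below. The data frame, column names, and simulated percentages are assumptions; the sketch only mirrors the kind of duration × ISOI test reported above, not the original analysis.

```python
# Minimal sketch of a within-subjects duration x ISOI ANOVA on fabricated percentages
# of "continuous motion" responses; one value per subject per cell (balanced design).
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
subjects, durations, isois = range(5), [5, 10, 50], [0, 2, 4, 6, 8, 10, 15, 20, 50, 70]
rows = []
for s in subjects:
    for d in durations:
        for i in isois:
            # Motion mostly at the 50-ms duration with intermediate ISOIs (as reported).
            pct = 60 if (d == 50 and 15 <= i <= 50) else 5
            rows.append({"subject": s, "duration": str(d), "isoi": str(i),
                         "pct_motion": pct + rng.normal(0, 5)})
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="pct_motion", subject="subject",
              within=["duration", "isoi"]).fit()
print(res.anova_table)  # F tests for duration, ISOI, and their interaction
```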

5.
The fundamental frequency (F0) of the voice is used to convey information about both linguistic and affective distinctions. However, no research has directly investigated how these two types of distinctions are simultaneously encoded in speech production. This study provides evidence that F0 prominences intended to convey linguistic or affective distinctions can be differentiated by their influence on the amount of final-syllable F0 rise used to signal a question. Specifically, a trading relation obtains when the F0 prominence is used to convey emphatic stress. That is, the amount of final-syllable F0 rise decreases as the F0 prominence increases. When the F0 prominence is used to convey affect, no trading relation is observed.

6.
In three experiments, the processing of lexical tone in Cantonese was examined. Cantonese listeners more often accepted a nonword as a word when the only difference between the nonword and the word was in tone, especially when the F0 onset difference between correct and erroneous tone was small. Same-different judgments by these listeners were also slower and less accurate when the only difference between two syllables was in tone, and this was true whether the F0 onset difference between the two tones was large or small. Listeners with no knowledge of Cantonese produced essentially the same same-different judgment pattern as that produced by the native listeners, suggesting that the results display the effects of simple perceptual processing rather than of linguistic knowledge. It is argued that the processing of lexical tone distinctions may be slowed, relative to the processing of segmental distinctions, and that, in speeded-response tasks, tone is thus more likely to be misprocessed than is segmental structure.

7.
The acoustical spectrum of the five Spanish vowels /a, e, i, o, u/ has been delimited to show the areas covered by F1, F2, and F3 and the relative distribution of energy among the formants. Through the analysis of the spectral components of vowels, isolated and in consonantal context, it is possible to estimate the different weight of each formant in vowel identification. At least for isolated vowels, F2 and F3 seem to be effective for the identification of [i] and [e], while F1 and F2 carry the weight for the identification of [o] and [u]. The cue differentiating [a] seems to be F2. Spanish vowels are compared with the cardinal vowels and with North American English vowels. There is no correlation with the cardinal vowels, while similarities are found with the English vowels.

8.
The performance of Spanish-English bilinguals in two perception tasks, using a synthetic speech continuum varying in voice onset time, was compared with the performance of Spanish and English monolinguals. Voice onset time in speech production was also compared between these groups. The perceptual results of the bilinguals differed from those of both monolingual groups. The bilinguals' production results in their two languages conformed to the results obtained from each monolingual group. The perceptual results are interpreted in terms of differences in the use of available acoustic cues by bilingual and monolingual listeners of English and Spanish.

9.
The effects of variations in response categories, subjects' perception of natural speech, and stimulus range on the identification of American English /r/ and /l/ by native speakers of Japanese were investigated. Three experiments using a synthesized /rait/-/lait/ series showed that all these variables affected identification and discrimination performance by Japanese subjects. Furthermore, some of the perceptual characteristics of /r/ and /l/ for Japanese listeners were clarified: (1) Japanese listeners identified some of the stimuli of the series as /w/. (2) A positive correlation between the perception of synthesized stimuli and naturally spoken stimuli was found. Japanese listeners who were able to easily identify naturally spoken stimuli perceived the synthetic series categorically but still perceived a /w/ category on the series. (3) The stimulus range showed a striking effect on identification consistency; identification of /r/ and /l/ was strongly affected by the stimulus range, the /w/ identification less so. This indicates that Japanese listeners tend to make relative judgments between /r/ and /l/.

10.
The effects of variations in response categories, subjects’ perception of natural speech, and stimulus range on the identification of American English /r/ and /l/ by native speakers of Japanese were investigated. Three experiments using a synthesized /rait/-/lait/ series showed that all these variables affected identification and discrimination performance by Japanese subjects. Furthermore, some of the perceptual characteristics of /r/ and /l/ for Japanese listeners were clarified: (1) Japanese listeners identified some of the stimuli of the series as /w/. (2) A positive correlation between the perception of synthesized stimuli and naturally spoken stimuli was found. Japanese listeners who were able to easily identify naturally spoken stimuli perceived the synthetic series categorically but still perceived a /w/ category on the series. (3) The stimulus range showed a striking effect on identification consistency; identification of /r/ and /l/ was strongly affected by the stimulus range, the /w/ identification less so. This indicates that Japanese listeners tend to make relative judgments between /r/ and /l/.

11.
Five-year-old children were tested for perceptual trading relations between a temporal cue (silence duration) and a spectral cue (F1 onset frequency) for the “say-stay” distinction. Identification functions were obtained for two synthetic “say-stay” continua, each containing systematic variations in the amount of silence following the /s/ noise. In one continuum, the vocalic portion had a lower F1 onset than in the other continuum. Children showed a smaller trading relation than has been found with adults. They did not differ from adults, however, in their perception of an “ay-day” continuum formed by varying F1 onset frequency only. The results of a discrimination task in which the two acoustic cues were made to “cooperate” or “conflict” phonetically supported the notion of perceptual equivalence of the temporal and spectral cues along a single phonetic dimension. The results indicate that young children, like adults, perceptually integrate multiple cues to a speech contrast in a phonetically relevant manner, but that they may not give the same perceptual weights to the various cues as do adults.

12.
The dichotic perception of Mandarin tones by native and nonnative listeners was examined in order to investigate the lateralization of lexical tone. Twenty American listeners with no tone language background and 20 Chinese listeners were asked to identify dichotically presented tone pairs by identifying which tone they heard in each ear. For the Chinese listeners, 57% of the total errors occurred via the left ear, indicating a significant right ear advantage. However, the American listeners revealed no significant ear preference, with 48% of the errors attributable to the left ear. These results indicated that Mandarin tones are predominantly processed in the left hemisphere by native Mandarin speakers, whereas they are bilaterally processed by American English speakers with no prior tone experience. The results also suggest that the left hemisphere superiority for native Mandarin tone processing is similar to native processing of other tone languages.
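A small worked example of how an ear advantage can be summarized: the share of dichotic errors per ear, plus a conventional laterality index computed on correct responses. All counts below are hypothetical; only the 57%/43% error split mirrors the figures reported above.

```python
# Hypothetical summary of a dichotic-listening ear advantage (not the study's data).
left_ear_errors, right_ear_errors = 57, 43          # error shares, in percent
print(f"left-ear error share: {left_ear_errors}%  -> right-ear advantage")

left_correct, right_correct = 120, 150              # hypothetical correct counts
laterality = (right_correct - left_correct) / (right_correct + left_correct)
print(f"laterality index: {laterality:+.2f}  (positive = right ear / left hemisphere)")
```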

13.
The perception of the distinction between /r/ and /l/ by native speakers of American English and of Japanese was studied using natural and synthetic speech. The American subjects were all nearly perfect at recognizing the natural speech sounds, whereas there was substantial variation among the Japanese subjects in their accuracy of recognizing /r/ and /l/ except in syllable-final position. A logit model, which additively combined the acoustic information conveyed by F1-transition duration and by F3-onset frequency, provided a good fit to the perception of synthetic /r/ and /l/ by the American subjects. There was substantial variation among the Japanese subjects in whether the F1 and F3 cues had a significant effect on their classifications of the synthetic speech. This variation was related to variation in accuracy of recognizing natural /r/ and /l/, such that greater use of both the F1 cue and the F3 cue in classifying the synthetic speech sounds was positively related to accuracy in recognizing the natural sounds. However, multiple regression showed that use of the F1 cue did not account for significant variance in natural speech performance beyond that accounted for by the F3 cue, indicating that the F3 cue is more important than the F1 cue for Japanese speakers learning English. The relation between performance on natural and synthetic speech also provides external validation of the logit model by showing that it predicts performance outside of the domain of data to which it was fit.

14.
The telling fact about duplex perception is that listeners integrate into a unitary phonetic percept signals that are coherent from a phonetic point of view, even though the signals are, on purely auditory grounds, separate sources. Here we explore the limits on the integration of a sinusoidal consonant cue (the F3 transition for [da] vs. [ga]) with the resonances of the remainder of the syllable. Perceiving duplexly, listeners hear the whistle of the sinusoid, but also the [da] and [ga] for which the sinusoid provides the critical information. In the first experiment, phonetic integration was significantly reduced, but not to zero, by a precursor that extended the transition cue forward in time so that it started 50 msec before the cue. The effect was the same above and below the duplexity threshold (the intensity of the sinusoid in the combined pattern at which the whistle was just barely audible). In the second experiment, integration was reduced once again by the precursor, and also, but only below the duplexity threshold, by harmonics of the cues that were simultaneous with it. The third experiment showed that the simultaneous harmonics reduced phonetic integration only by serving as distractors while also permitting the conclusion that the precursor produced its effects by making the cue part of a coherent and competing auditory pattern, and so “capturing” it. The fourth experiment supported this interpretation by showing that for some subjects the amount of capture was reduced when the capturing tone was itself captured by being made part of a tonal complex. The results support the assumption that the independent phonetic system will integrate across disparate sources according to the cohesive power of that system as measured against the evidence for separate sources.

15.
We have previously identified categorical individual differences in the occurrence of temporal brightness enhancement (TBE) by using a simultaneous brightness discrimination paradigm (Bowen & Markell, 1980). TBE is a nonmonotonic relation between brightness and pulse duration: pulses of intermediate duration (75–125 msec) can appear brighter than longer or shorter pulses of the same luminance. Three classes of observers can be defined based on whether they perceive TBE under one of two conditions of temporal asynchrony between a short test pulse and a longer (500 msec) comparison pulse: simultaneous onset of the pulses or simultaneous offset. Type A observers show TBE for both asynchrony conditions; Type B observers show the effect for simultaneous offset but not simultaneous onset; Type C observers do not show TBE for either asynchrony. In the present study, we show that Type A and Type C observers maintain a constant brightness-duration relation as the asynchrony between test and comparison pulses is varied from simultaneous onset to simultaneous offset. Type B observers show a gradual shift in the brightness-duration relation as asynchrony changes. In a separate experiment, we find that practice has little effect on Type A and Type B observers but that Type C observers may change in classification to Types A and B over as few as five experimental sessions. The hypothesis that individual differences are due to differential “weighting” of chromatic (sustained) and achromatic (transient) visual channels is discussed.

16.
In a series of three experiments, the effect of marker duration on temporal discrimination was evaluated with empty auditory intervals bounded by markers ranging from 3 to 300 msec or presented as a gap within a continuous tone. As a measure of performance, difference thresholds in relation to a base duration of 50 msec were computed. Performance on temporal discrimination was significantly better with markers ranging from 3 to 150 msec than with markers ranging from 225 to 300 msec or under the gap condition. However, within each range of marker duration (3–150 msec; 225–300 msec or gap) performance did not differ significantly. A fourth experiment provided evidence that the effect of marker duration cannot be explained in terms of marker-induced masking. A good approximation of the relationship between marker duration and temporal discrimination performance in the present experiments is a smooth step function, which can account for 99.3% of the variance of mean discrimination performance. Thus, the findings of the present study point to the conclusion that two different mechanisms are used in the processing of temporal information, depending on the duration of the auditory markers. The tradeoff point for the hypothetical shift from one timing mechanism to the other may be found at a marker duration of approximately 200 msec.
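To make the "smooth step function" concrete, the sketch below fits a logistic step to hypothetical difference thresholds as a function of marker duration and reports the estimated transition point and R². The threshold values, bounds, and starting values are assumptions chosen to resemble the pattern described above, not the study's data.

```python
# Minimal sketch: fit a smooth step (logistic) function to made-up difference
# thresholds versus marker duration and report the transition point and R^2.
import numpy as np
from scipy.optimize import curve_fit

def smooth_step(marker_ms, low, high, midpoint, steepness):
    """Thresholds rise from 'low' to 'high' around 'midpoint' (roughly 200 ms)."""
    return low + (high - low) / (1.0 + np.exp(-(marker_ms - midpoint) / steepness))

marker_ms = np.array([3, 10, 30, 75, 150, 225, 300])   # marker durations (ms)
thresholds = np.array([14, 15, 14, 16, 15, 28, 29])    # hypothetical thresholds (ms)

popt, _ = curve_fit(smooth_step, marker_ms, thresholds, p0=[15, 28, 200, 10],
                    bounds=([0, 0, 100, 1], [50, 50, 300, 100]))
predicted = smooth_step(marker_ms, *popt)
r_squared = 1 - np.sum((thresholds - predicted) ** 2) / np.sum((thresholds - thresholds.mean()) ** 2)
print(f"estimated transition point: {popt[2]:.0f} ms, R^2 = {r_squared:.3f}")
```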

17.
Two experiments are reported in which the possibility that auditory attention may be controlled in a stimulus-driven manner by duration, intensity, and timbre cues was examined. In both experiments, listeners were presented with a cue followed, after a variable time period of a 150-, 450-, or 750-msec stimulus onset asynchrony (SOA), by a target. In three different conditions for each experiment, the duration, intensity, or timbre relation between the cue and the target was varied so that, on 50% of the trials, the two sounds were identical and, on 50% of the trials, the two sounds were different in the manipulated feature. The two experiments differed only in the judgment required, with listeners in Experiment 1 identifying the duration, intensity, or timbre of the target and listeners in Experiment 2 indicating whether the target incorporated a brief silent gap. In both experiments, performance was observed to depend on both the similarity of and the time between the cue and the target. Specifically, whereas at the 150-msec SOA performance was best when the target was identical to the preceding cue, at the 750-msec SOA performance was best when the cue and the target differed. This pattern establishes the existence of duration-, intensity-, and timbre-based auditory inhibition of return. The theoretical implications of these results are considered.

18.
Two experiments are reported in which the possibility that auditory attention may be controlled in a stimulus-driven manner by duration, intensity, and timbre cues was examined. In both experiments, listeners were presented with a cue followed, after a variable time period of a 150-, 450-, or 750-msec stimulus onset asynchrony (SOA), by a target. In three different conditions for each experiment, the duration, intensity, or timbre relation between the cue and the target was varied so that, on 50% of the trials, the two sounds were identical and, on 50% of the trials, the two sounds were different in the manipulated feature. The two experiments differed only in the judgment required, with listeners in Experiment 1 identifying the duration, intensity, or timbre of the target and listeners in Experiment 2 indicating whether the target incorporated a brief silent gap. In both experiments, performance was observed to depend on both the similarity of and the time between the cue and the target. Specifically, whereas at the 150-msec SOA performance was best when the target was identical to the preceding cue, at the 750-msec SOA performance was best when the cue and the target differed. This pattern establishes the existence of duration-, intensity-, and timbre-based auditory inhibition of return. The theoretical implications of these results are considered.

19.
This paper investigates the difficulties adult second language (L2) users of English encounter with plosive consonants in the L2. It presents the results of a task examining the acquisition of plosive voicing contrasts by college students with a Cypriot Greek (CG) linguistic background. The task focused on the types of errors involving plosive consonants, indicating that performance was significantly better in the voiceless plosive category. Participants were able to perceive voiced plosives, but they treated such instances as a /nasal + voiced plosive/ sequence. The question raised therefore concerns different phonological contrasts realised through similar phonetic cues. The patterns observed suggested that this gap between phonetic cues and phonological contrast might explain why CG users have difficulties perceiving voiced English plosives. In this context, voice onset time (VOT) differences between the L1 and L2 are of crucial importance. In English, voiced plosives are characterised by short lag VOT, while their voiceless counterparts fall within the long lag VOT continuum. The same phonetic contrast is used in CG to differentiate between single and geminate voiceless plosives. The results are discussed in relation to frameworks of second language phonology and speech perception, suggesting that the difficulties faced by the L2 listeners reflect a phonetic-phonological challenge.

20.
Two experiments evaluated a potential explanation of categorical perception (CP) for place of articulation – namely, that listeners derive limited information from rapid spectral changes. Experiment 1 examined vowel context effects for /b/–/d/ continua that included consonant–vowel tokens with F2 onset frequencies that varied systematically from the F2 frequencies of their corresponding steady states. Phoneme categorisation shifted sharply with F2 direction at locations along the continuum where discrimination performance peaked, indicating CP. Experiment 2 compared findings for a replicated condition against conditions with vowels reduced to match consonant duration or consonants extended to match vowels. CP was similarly obtained for the replicated and vowel-reduced conditions. However, listeners frequently perceived diphthongs in the central region of the consonant-extended continuum. Some listeners demonstrated CP, although aggregate performance appeared more continuous. These experiments support a model based upon the perceived direction of frequency transitions.
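As background for the discrimination-peak logic of categorical perception invoked here, the sketch below applies the classic covert-labeling prediction, P(correct) = 0.5 + 0.5*(p_i - p_j)^2, to a hypothetical /b/–/d/ identification function. The identification probabilities are invented for illustration; this is not the paper's analysis.

```python
# Illustration only: predicted ABX discrimination of adjacent continuum steps derived
# from identification probabilities (covert-labeling model of categorical perception).
import numpy as np

p_b = np.array([0.98, 0.95, 0.90, 0.60, 0.20, 0.06, 0.03, 0.02])  # P(/b/) per step
predicted_abx = 0.5 + 0.5 * (p_b[:-1] - p_b[1:]) ** 2

for step, p in enumerate(predicted_abx, start=1):
    print(f"steps {step}-{step + 1}: predicted ABX correct = {p:.2f}")
# The predicted peak falls at the pair of steps where identification crosses 50%,
# which is the discrimination-peak-at-the-boundary signature of categorical perception.
```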
