Similar Documents
1.
The effects of variations in response categories, subjects’ perception of natural speech, and stimulus range on the identification of American English /r/ and /l/ by native speakers of Japanese were investigated. Three experiments using a synthesized /rait/-/lait/ series showed that all of these variables affected identification and discrimination performance by Japanese subjects. Furthermore, some of the perceptual characteristics of /r/ and /l/ for Japanese listeners were clarified: (1) Japanese listeners identified some of the stimuli of the series as /w/. (2) A positive correlation between the perception of synthesized stimuli and naturally spoken stimuli was found. Japanese listeners who were able to easily identify naturally spoken stimuli perceived the synthetic series categorically but still perceived a /w/ category on the series. (3) The stimulus range had a striking effect on identification consistency; identification of /r/ and /l/ was strongly affected by the stimulus range, while /w/ identification was affected less. This indicates that Japanese listeners tend to make relative judgments between /r/ and /l/.

2.
Recent evidence shows that listeners use abstract prelexical units in speech perception. Using the phenomenon of lexical retuning in speech processing, we ask whether those units are necessarily phonemic. Dutch listeners were exposed to a Dutch speaker producing ambiguous phones between the Dutch syllable-final allophones approximant [r] and dark [l]. These ambiguous phones replaced either final /r/ or final /l/ in words in a lexical-decision task. This differential exposure affected perception of ambiguous stimuli on the same allophone continuum in a subsequent phonetic-categorization test: Listeners exposed to ambiguous phones in /r/-final words were more likely to perceive test stimuli as /r/ than listeners with exposure in /l/-final words. This effect was not found for test stimuli on continua using other allophones of /r/ and /l/. These results confirm that listeners use phonological abstraction in speech perception. They also show that context-sensitive allophones can play a role in this process, and hence that context-insensitive phonemes are not necessary. We suggest there may be no one unit of perception.

3.
Twenty-one normally developing 3-year-old children were tested on two approximant consonant contrasts, rake-lake and wake-rake, and a control contrast, wake-bake. Perception was assessed in a two-choice picture identification paradigm; stimuli were (1) natural and computer synthesized “clear cases” of the minimal pairs, and (2) synthetic stimulus series which interpolated on acoustic dimensions that differentiate the minimal pairs. As a group, the children showed very accurate perception of the minimal pairs. Performance on the synthetic series yielded consistent identification of the endpoint stimuli and monotonic functions with abrupt crossovers at the phoneme boundary. Children who did not yet articulate /r/ and /l/ appropriately showed somewhat less consistent perception than children who produced all phonemes correctly.

4.
Previous work from our laboratories has shown that monolingual Japanese adults who were given intensive high-variability perceptual training improved in both perception and production of English /r/-/l/ minimal pairs. In this study, we extended those findings by investigating the long-term retention of learning in both perception and production of this difficult non-native contrast. Results showed that 3 months after completion of the perceptual training procedure, the Japanese trainees maintained their improved levels of performance on the perceptual identification task. Furthermore, perceptual evaluations by native American English listeners of the Japanese trainees’ pretest, posttest, and 3-month follow-up speech productions showed that the trainees retained their long-term improvements in the general quality, identifiability, and overall intelligibility of their English /r/-/l/ word productions. Taken together, the results provide further support for the efficacy of high-variability laboratory speech sound training procedures, and suggest an optimistic outlook for the application of such procedures for a wide range of “special populations.” This work was supported by NIDCD Training Grant DC-00012 and by NIDCD Research Grant DC-00111 to Indiana University.

5.
We investigated the effects of two types of task instructions on performance on a voice sorting task by listeners who were either familiar or unfamiliar with the voices. Listeners were asked to sort 15 naturally varying stimuli from two voice identities into perceived identities. Half of the listeners sorted the recordings freely into as many identities as they perceived; the other half were forced to sort the stimuli into two identities only. As reported in previous studies, unfamiliar listeners formed more clusters than familiar listeners. Listeners therefore perceived different naturally varying stimuli from the same identity as coming from different identities, while being highly accurate at telling apart stimuli from different voices. We further show that a change in task instructions – forcing listeners to sort stimuli into two identities only – helped unfamiliar listeners to overcome this selective failure at ‘telling people together’. This improvement, however, came at the cost of an increase in errors in telling people apart. For familiar listeners, similar non-significant trends were apparent. Therefore, even when informed about the correct number of identities, listeners may fail to accurately perceive identity, further highlighting that voice identity perception in the context of natural within-person variability is a challenging task. We discuss our results in terms of similarities and differences to findings in the face perception literature and their importance in applied settings, such as forensic voice identification.

6.
刘文理, 乐国安 《心理学报》 2012, 44(5): 585-594
Using a priming paradigm with native Chinese listeners as participants, this study examined whether nonspeech sounds influence the perception of speech sounds. Experiment 1 examined the effect of pure tones on the perception of a consonant category continuum; pure tones affected perception of the continuum, showing a spectral contrast effect. Experiment 2 examined the effects of pure tones and complex tones on vowel perception; pure or complex tones whose frequencies matched the vowel's formant frequencies speeded vowel identification, showing a priming effect. Both experiments found that nonspeech sounds can influence the perception of speech sounds, indicating that speech perception also involves a prespeech stage of spectral feature analysis, consistent with the auditory theory of speech perception.

7.
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners’ auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants’ susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners’ McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.

8.
The question of whether sensitivity peaks at vowel boundaries (i.e., phoneme boundary effects) and sensitivity minima near excellent category exemplars (i.e., perceptual magnet effects) stem from the same stage of perceptual processing was examined in two experiments. In Experiment 1, participants gave phoneme identification and goodness ratings for 13 synthesized English /i/ and /e/ vowels. In Experiment 2, participants discriminated pairs of these vowels. Either the listeners discriminated the entire range of stimuli within each block of trials, or the range within each block was restricted to a single stimulus pair. In addition, listeners discriminated either one-step or two-step intervals along the stimulus series. The results demonstrated that sensitivity peaks at vowel boundaries were more influenced by stimulus range than were perceptual magnet effects; peaks in sensitivity near the /i/-/e/ boundary were reduced with restricted stimulus ranges and one-step intervals, but minima in discrimination near the best exemplars of /i/ were present in all conditions.

9.
The present study examined acoustic cue utilisation for perception of vocal emotions. Two sets of vocal-emotional stimuli were presented to 35 German and 30 American listeners: (1) sentences in German spoken with five different vocal emotions; and (2) systematically rate- or pitch-altered versions of the original emotional stimuli. In addition to response frequencies on emotional categories, activity ratings were obtained. For the systematically altered stimuli, slow rate was reliably associated with the “sad” label. In contrast, fast rate was classified as angry, frightened, or neutral. Manipulation of pitch variation was less potent than rate manipulation in influencing vocal emotional category choices. Reduced pitch variation was associated with perception as sad or neutral; greater pitch variation increased frightened, angry, and happy responses. Performance was highly similar for the two samples, although across tasks, German subjects perceived greater variability of activity in the emotional stimuli than did American participants.
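The abstract does not say how the pitch-altered versions were produced. As an illustration only, here is a minimal sketch of one way to reduce a recording's pitch variation while leaving its rate untouched, using the WORLD vocoder via the pyworld package; the input file name and the 0.5 compression factor are placeholders, not the study's materials or parameters.

```python
# Illustrative sketch: compress a recording's F0 variation toward its mean
# while keeping duration (rate) unchanged. Assumes a mono WAV file; the file
# name and compression factor are invented placeholders.
import numpy as np
import soundfile as sf
import pyworld as pw

x, fs = sf.read("emotional_sentence.wav")        # placeholder file name
x = x.astype(np.float64)

f0, sp, ap = pw.wav2world(x, fs)                 # F0 contour, spectrum, aperiodicity
voiced = f0 > 0                                  # unvoiced frames have f0 == 0
mean_f0 = f0[voiced].mean()
f0_flat = f0.copy()
f0_flat[voiced] = mean_f0 + 0.5 * (f0[voiced] - mean_f0)  # halve pitch variation

y = pw.synthesize(f0_flat, sp, ap, fs)           # resynthesize, same rate
sf.write("emotional_sentence_flat.wav", y, fs)
```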

10.
This study investigated the acoustic correlates of perceptual centers (p-centers) in CV and VC syllables and developed an acoustic p-center model. In Part 1, listeners located syllables’ p-centers by a method-of-adjustment procedure. The CV syllables contained the consonants /?/, /r/, /n/, /t/, /d/, /k/, and /g/; the VCs, the consonants /?/, /r/, and /n/. The vowel in all syllables was /a/. The results of this experiment replicated and extended previous findings regarding the effects of phonetic variation on p-centers. In Part 2, a digital signal processing procedure was used to acoustically model p-center perception. Each stimulus was passed through a six-band digital filter, and the outputs were processed to derive low-frequency modulation components. These components were weighted according to a perceived modulation magnitude function and recombined to create six psychoacoustic envelopes containing modulation energies from 3 to 47 Hz. In this analysis, p-centers were found to be highly correlated with the time-weighted function of the rate of change in the psychoacoustic envelopes, multiplied by the psychoacoustic envelope magnitude increment. The results were interpreted as suggesting (1) the probable role of low-frequency energy modulations in p-center perception, and (2) the presence of perceptual processes that integrate multiple articulatory events into a single syllabic event.
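Part 2 of this abstract describes a concrete signal-processing pipeline. The sketch below mirrors its overall shape in Python (band filtering, low-frequency envelope extraction, a weighted rate-of-change score); the band edges, filter orders, weights, and the scoring function are illustrative placeholders, not the paper's actual parameters, and the 3 Hz high-pass side of the modulation band is dropped for simplicity.

```python
# Illustrative p-center-style analysis: band-pass filter the signal, extract
# smoothed amplitude envelopes, and locate the time of maximal weighted
# envelope change. All numeric parameters here are invented placeholders.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelope(x, fs, lo, hi):
    """Amplitude envelope of one band, low-passed to keep slow modulations."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, x)))
    sos_mod = butter(2, 47.0, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos_mod, env)

def p_center(x, fs, bands, weights):
    """Time (s) of maximal weighted envelope rise: a loose analogue of the
    abstract's rate-of-change x magnitude-increment term."""
    envs = np.stack([w * band_envelope(x, fs, lo, hi)
                     for (lo, hi), w in zip(bands, weights)])
    rate = np.diff(envs, axis=1).clip(min=0)   # rising modulation only
    score = (rate * envs[:, 1:]).sum(axis=0)   # rate weighted by magnitude
    return np.argmax(score) / fs

fs = 16000
t = np.arange(int(0.4 * fs)) / fs
syllable = np.sin(2 * np.pi * 150 * t) * np.exp(-((t - 0.12) / 0.05) ** 2)
bands = [(100, 300), (300, 800), (800, 1500),
         (1500, 2500), (2500, 4000), (4000, 6000)]
print(p_center(syllable, fs, bands, weights=np.ones(6)))
```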

11.
When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of the four types of perceptual responses to 384 different stimuli from four talkers. The measures included mutual information, correlations, and acoustic measures, all representing audiovisual stimulus relationships. In Experiment 1, open-set perceptual responses were obtained for acoustic /bɑ/ or /lɑ/ dubbed to video /bɑ, dɑ, gɑ, vɑ, zɑ, lɑ, wɑ, ðɑ/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of the response category proportions were a subset of the physical stimulus measures, which accounted for between 17% and 52% of the variance in the perceptual response category proportions. That audiovisual stimulus relationships can account for perceptual response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships.
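Of the quantitative measures named in this abstract, mutual information is the least standard to compute by hand. Below is a minimal, generic sketch of mutual information from a joint count table; the counts are invented for illustration, and the study's own measures were computed over its audiovisual stimulus set, not these numbers.

```python
# Minimal sketch: mutual information (in bits) between two discrete
# variables, computed from a joint co-occurrence count table.
import numpy as np

def mutual_information(joint_counts):
    p = joint_counts / joint_counts.sum()    # joint distribution
    px = p.sum(axis=1, keepdims=True)        # marginal over rows
    py = p.sum(axis=0, keepdims=True)        # marginal over columns
    nz = p > 0                               # skip zero cells to avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# Hypothetical co-occurrence of auditory (rows) and visual (columns) categories
counts = np.array([[30, 5, 5],
                   [4, 28, 8],
                   [6, 7, 27]], dtype=float)
print(f"I(A;V) = {mutual_information(counts):.3f} bits")
```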

12.
We examined the effect of perceptual training on a well-established hemispheric asymmetry in speech processing. Eighteen listeners were trained to use a within-category difference in voice onset time (VOT) to cue talker identity. Successful learners (n=8) showed faster response times for stimuli presented only to the left ear than for those presented only to the right. The development of a left-ear/right-hemisphere advantage for processing a prototypically phonetic cue supports a model of speech perception in which lateralization is driven by functional demands (talker identification vs. phonetic categorization) rather than by acoustic stimulus properties alone.

13.
In the McGurk effect, perceptual identification of auditory speech syllables is influenced by simultaneous presentation of discrepant visible speech syllables. This effect has been found in subjects of different ages and with various native language backgrounds. But no McGurk tests have been conducted with prelinguistic infants. In the present series of experiments, 5-month-old English-exposed infants were tested for the McGurk effect. Infants were first gaze-habituated to an audiovisual /va/. Two different dishabituation stimuli were then presented: audio /ba/-visual /va/ (perceived by adults as /va/), and audio /da/-visual /va/ (perceived by adults as /da/). The infants showed generalization from the audiovisual /va/ to the audio /ba/-visual /va/ stimulus but not to the audio /da/-visual /va/ stimulus. Follow-up experiments revealed that these generalization differences were not due to a general preference for the audio /da/-visual /va/ stimulus or to the auditory similarity of /ba/ to /va/ relative to /da/. These results suggest that the infants were visually influenced in the same way as English-speaking adults are visually influenced.

14.
To test the effect of linguistic experience on the perception of a cue that is known to be effective in distinguishing between [r] and [l] in English, 21 Japanese and 39 American adults were tested on discrimination of a set of synthetic speech-like stimuli. The 13 “speech” stimuli in this set varied in the initial stationary frequency of the third formant (F3) and its subsequent transition into the vowel over a range sufficient to produce the perception of [ra] and [la] for American subjects and to produce [ra] (which is not in phonemic contrast to [la]) for Japanese subjects. Discrimination tests of a comparable set of stimuli consisting of the isolated F3 components provided a “nonspeech” control. For Americans, the discrimination of the speech stimuli was nearly categorical, i.e., comparison pairs which were identified as different phonemes were discriminated with high accuracy, while pairs which were identified as the same phoneme were discriminated relatively poorly. In comparison, discrimination of speech stimuli by Japanese subjects was only slightly better than chance for all comparison pairs. Performance on nonspeech stimuli, however, was virtually identical for Japanese and American subjects; both groups showed highly accurate discrimination of all comparison pairs. These results suggest that the effect of linguistic experience is specific to perception in the “speech mode.”

15.
American English liquids /r/ and /l/ have been considered intermediate between stop consonants and vowels acoustically, articulatorily, phonologically, and perceptually. Cutting (1974a) found position-dependent ear advantages for liquids in a dichotic listening task: syllable-initial liquids produced significant right ear advantages, while syllable-final liquids produced no reliable ear advantages. The present study employed identification and discrimination tasks to determine whether /r/ and /l/ are perceived differently depending on syllable position when perception is tested by a different method. Fifteen subjects listened to two synthetically produced speech series—/li/ to /ri/ and /il/ to /ir/—in which stepwise variations of the third formant cued the difference in consonant identity. The results indicated that: (1) perception did not differ between syllable positions (in contrast to the dichotic listening results), (2) liquids in both syllable positions were perceived categorically, and (3) discrimination of a nonspeech control series did not account for the perception of the speech sounds.

16.
Detection and identification thresholds were obtained for 6- and 10-year-old normal children and normal adults using five-formant synthesized consonant-vowel stimuli. Six-year-old children were found to have poorer detection than adults, just as they do for pure tones. For the identification task, the slopes of the performance-intensity functions were shallower for 6-year-old children than for 10-year-olds and adults. Consequently, compared with 10-year-olds and adults, 6-year-old listeners require a greater increase in stimulus intensity above detection threshold to identify these stimuli at a high performance level. The influence of the acoustic characteristics of the stimuli on all listeners is also discussed.

17.
Event-related potentials (ERPs) were utilized to study brain activity while subjects listened to speech and nonspeech stimuli. The effect of duplex perception was exploited, in which listeners perceive formant transitions that are isolated as nonspeech "chirps," but perceive formant transitions that are embedded in synthetic syllables as unique linguistic events with no chirp-like sounds heard at all (Mattingly et al., 1971). Brain ERPs were recorded while subjects listened to and silently identified plain speech-only tokens, duplex tokens, and tone glides (perceived as "chirps" by listeners). A highly controlled set of stimuli was developed that represented equivalent speech and nonspeech stimulus tokens such that the differences were limited to a single acoustic parameter: amplitude. The acoustic elements were matched in terms of number and frequency of components. Results indicated that the neural activity in response to the stimuli was different for different stimulus types. Duplex tokens had significantly longer latencies than the pure speech tokens. The data are consistent with the contention of separate modules for phonetic and auditory stimuli.

18.
Speech perception can be viewed in terms of the listener’s integration of two sources of information: the acoustic features transduced by the auditory receptor system and the context of the linguistic message. The present research asked how these sources were evaluated and integrated in the identification of synthetic speech. A speech continuum between the glide-vowel syllables /ri/ and /li/ was generated by varying the onset frequency of the third formant. Each sound along the continuum was placed in a consonant-cluster vowel syllable after an initial consonant /p/, /t/, /s/, or /v/. In English, both /r/ and /l/ are phonologically admissible following /p/, but neither is admissible following /v/. Only /l/ is admissible following /s/, and only /r/ is admissible following /t/. A third experiment used synthetic consonant-cluster vowel syllables in which the first consonant varied between /b/ and /d/ and the second consonant varied between /l/ and /r/. Identification of synthetic speech varying in both acoustic featural information and phonological context allowed quantitative tests of various models of how these two sources of information are evaluated and integrated in speech perception.
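The abstract does not name the candidate models it tested, but a classic candidate for this kind of identification data is a multiplicative, FLMP-style integration rule. The sketch below shows that rule with invented support values; it illustrates the model class, not the study's fitted model.

```python
# Sketch of a multiplicative (FLMP-style) integration rule: acoustic support
# a and phonological-context support c for /r/ combine as
#   p(r) = a*c / (a*c + (1-a)*(1-c)).
# The support values below are invented for illustration.
def p_r(acoustic_support, context_support):
    num = acoustic_support * context_support
    denom = num + (1 - acoustic_support) * (1 - context_support)
    return num / denom

# A 7-step F3 continuum (acoustic support for /r/) crossed with contexts that
# favor /r/ (after /t/), are neutral (after /p/), or favor /l/ (after /s/).
continuum = [i / 6 for i in range(7)]
for label, c in [("after /t/", 0.9), ("after /p/", 0.5), ("after /s/", 0.1)]:
    print(label, [round(p_r(a, c), 2) for a in continuum])
```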

19.
Spoken language perception may be constrained by a listener's cognitive resources, including verbal working memory (WM) capacity and basic auditory perception mechanisms. For Japanese listeners, it is unknown how, or even if, these resources are involved in the processing of pitch accent at the word level. The present study examined the extent to which native Japanese speakers could make correctness judgments on and categorize spoken Japanese words by pitch accent pattern, and how verbal WM capacity and acoustic pitch sensitivity related to perception ability. Results showed that Japanese listeners were highly accurate at judging pitch accent correctness (M = 93%), but that the more cognitively demanding accent categorization task yielded notably lower performance (M = 61%). Of chief interest was the finding that acoustic pitch sensitivity significantly predicted accuracy scores on both perception tasks, while verbal WM had a predictive role only for the categorization of a specific accent pattern. These results indicate, first, that task demands greatly influence accuracy and, second, that basic cognitive capacities continue to support the perception of lexical prosody even in adult listeners.
