Similar Documents
20 similar documents found (search time: 15 ms)
1.
The performance of 14 poor readers on an audiovisual speech perception task was compared with that of 14 normal subjects matched on chronological age (CA) and 14 subjects matched on reading age (RA). The task consisted of identifying synthetic speech varying in place of articulation on an acoustic 9-point continuum between /ba/ and /da/ (Massaro & Cohen, 1983). The acoustic speech events were factorially combined with the visual articulation of /ba/, /da/, or none. In addition, the visual-only articulation of /ba/ or /da/ was presented. The results showed that (1) poor readers were less categorical than the CA and RA groups in identifying the auditory speech events, and (2) they were worse at speech reading. This convergence between the deficits clearly suggests that the auditory speech processing difficulty of poor readers is speech specific and relates to the processing of phonological information.

2.
The “McGurk effect” demonstrates that visual (lip-read) information is used during speech perception even when it is discrepant with auditory information. While this has been established as a robust effect in subjects from Western cultures, our own earlier results had suggested that Japanese subjects use visual information much less than American subjects do (Sekiyama & Tohkura, 1993). The present study examined whether Chinese subjects would also show a reduced McGurk effect due to their cultural similarities with the Japanese. The subjects were 14 native speakers of Chinese living in Japan. Stimuli consisted of 10 syllables (/ba/, /pa/, /ma/, /wa/, /da/, /ta/, /na/, /ga/, /ka/, /ra/) pronounced by two speakers, one Japanese and one American. Each auditory syllable was dubbed onto every visual syllable within one speaker, resulting in 100 audiovisual stimuli in each language. The subjects’ main task was to report what they thought they had heard while watching and listening to the speaker as the stimuli were uttered. Compared with previous results obtained with American subjects, the Chinese subjects showed a weaker McGurk effect. The results also showed that the magnitude of the McGurk effect depends on the length of time the Chinese subjects had lived in Japan. Factors that foster and alter the Chinese subjects’ reliance on auditory information are discussed.

3.
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners’ auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants’ susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners’ McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.

4.
In the McGurk effect, perceptual identification of auditory speech syllables is influenced by simultaneous presentation of discrepant visible speech syllables. This effect has been found in subjects of different ages and with various native language backgrounds. But no McGurk tests have been conducted with prelinguistic infants. In the present series of experiments, 5-month-old English-exposed infants were tested for the McGurk effect. Infants were first gaze-habituated to an audiovisual /va/. Two different dishabituation stimuli were then presented: audio /ba/-visual /va/ (perceived by adults as /va/), and audio /da/-visual /va/ (perceived by adults as /da/). The infants showed generalization from the audiovisual /va/ to the audio /ba/-visual /va/ stimulus but not to the audio /da/-visual /va/ stimulus. Follow-up experiments revealed that these generalization differences were not due to a general preference for the audio /da/-visual /va/ stimulus or to the auditory similarity of /ba/ to /va/ relative to /da/. These results suggest that the infants were visually influenced in the same way as English-speaking adults are visually influenced.

5.
Liu Wenli & Le Guoan. Acta Psychologica Sinica (心理学报), 2012, 44(5): 585–594
Using a priming paradigm with native Chinese listeners as participants, this study examined whether nonspeech sounds influence the perception of speech sounds. Experiment 1 examined the effect of pure tones on the perception of a consonant category continuum; the results showed that the pure tones affected perception of the continuum, producing a spectral contrast effect. Experiment 2 examined the effect of pure tones and complex tones on vowel perception; the results showed that pure or complex tones matching the vowels' formant frequencies speeded vowel identification, producing a priming effect. Both experiments thus found that nonspeech sounds can influence the perception of speech sounds, indicating that speech perception also requires a prespeech stage of spectral feature analysis, consistent with auditory theories of speech perception.

6.
Three experiments were carried out to investigate the evaluation and integration of visual and auditory information in speech perception. In the first two experiments, subjects identified /ba/ or /da/ speech events consisting of high-quality synthetic syllables ranging from /ba/ to /da/ combined with a videotaped /ba/, /da/, or neutral articulation. Although subjects were specifically instructed to report what they heard, visual articulation made a large contribution to identification. Tests of quantitative models provide evidence for the integration of continuous and independent, as opposed to discrete or nonindependent, sources of information. The reaction times for identification were primarily correlated with the perceived ambiguity of the speech event. In a third experiment, the speech events were identified with an unconstrained set of response alternatives. In addition to /ba/ and /da/ responses, the /bda/ and /tha/ responses were well described by a combination of continuous and independent features. This body of results provides strong evidence for a fuzzy logical model of perceptual recognition.
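For readers unfamiliar with the fuzzy logical model of perception (FLMP) invoked in this abstract and in item 13 below, the sketch below illustrates its core integration rule for a two-alternative /ba/–/da/ decision: the auditory and visual sources each supply an independent, continuous degree of support, and the supports are combined multiplicatively and normalized. The function name and the numeric support values are illustrative assumptions, not values taken from the studies.

    # Two-alternative FLMP integration rule (sketch): a_da and v_da are the
    # continuous degrees of auditory and visual support for /da/, each in [0, 1].
    def flmp_da_probability(a_da, v_da):
        """Predicted probability of a /da/ response given independent auditory and visual support."""
        match = a_da * v_da
        mismatch = (1.0 - a_da) * (1.0 - v_da)
        return match / (match + mismatch)

    # Hypothetical example: an ambiguous auditory token (0.5) dubbed onto a clear
    # visual /da/ articulation (0.9) yields a strong predicted /da/ response (0.9).
    print(flmp_da_probability(0.5, 0.9))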

7.
The McGurk effect, where an incongruent visual syllable influences identification of an auditory syllable, does not always occur, suggesting that perceivers sometimes fail to use relevant visual phonetic information. We tested whether another visual phonetic effect, which involves the influence of visual speaking rate on perceived voicing (Green & Miller, 1985), would occur in instances when the McGurk effect does not. In Experiment 1, we established this visual rate effect using auditory and visual stimuli matching in place of articulation, finding a shift in the voicing boundary along an auditory voice-onset-time continuum with fast versus slow visual speech tokens. In Experiment 2, we used auditory and visual stimuli differing in place of articulation and found a shift in the voicing boundary due to visual rate when the McGurk effect occurred and, more critically, when it did not. The latter finding indicates that phonetically relevant visual information is used in speech perception even when the McGurk effect does not occur, suggesting that the incidence of the McGurk effect underestimates the extent of audio-visual integration.

8.
Pre-delinquent peers in Achievement Place (a community-based, family-style rehabilitation program based on a token economy) were given points (token reinforcement) to modify the articulation errors of two boys. In Experiment I, using a multiple-baseline experimental design, error words involving the /l/, /r/, /th/, and /ting/ sounds were successfully treated both by a group of peers and by individual peers. Also, generalization occurred to words that were not trained. The speech correction procedure used by the peers involved a number of variables, including modelling, peer approval, contingent points, and feedback. The individual role of each of these variables was not experimentally analyzed, but it was demonstrated that peers could function as speech therapists without instructions, feedback, or the presence of an adult. It was also found that payment of points to peers for detecting correct articulations produced closer agreement with the experimenter than payment of points for finding incorrect articulations. The results were replicated in a second experiment with another subject who had similar articulation errors. In addition, the second experiment showed that peer speech correction procedures resulted in some generalization to the correct use of target words in sentences and in significant improvements on standard tests of articulation.

9.
The relations among articulation accuracy, speech perception, and phoneme awareness were examined in a sample of 97 typically developing children ages 48 to 66 months. Of these 97 children, 46 were assessed twice at ages 4 and 5 years. Children completed two tasks for each of the three skills, assessing these abilities for the target phoneme /r/ and the control phoneme /m/ in the word-initial position. Concurrent analyses revealed that phoneme-specific relations existed among articulation, awareness, and perception. Articulation accuracy of /r/ predicted speech perception and phoneme awareness for /r/ after controlling for age, vocabulary, letter-word knowledge, and speech perception or phoneme awareness for the control phoneme /m/. The longitudinal analyses confirmed the pattern of relations. The findings are consistent with a model whereby children's articulation accuracy affects preexisting differences in phonological representations and, consequently, affects how children perceive, discriminate, and manipulate speech sounds.

10.
The importance of visual cues in speech perception is illustrated by the McGurk effect, whereby a speaker’s facial movements affect speech perception. The goal of the present study was to evaluate whether the McGurk effect is also observed for sung syllables. Participants heard and saw sung instances of the syllables /ba/ and /ga/ and then judged the syllable they perceived. Audio-visual stimuli were congruent or incongruent (e.g., auditory /ba/ presented with visual /ga/). The stimuli were presented as spoken, sung in an ascending and descending triad (C E G G E C), and sung in an ascending and descending triad that returned to a semitone above the tonic (C E G G E C#). Results revealed no differences in the proportion of fusion responses between spoken and sung conditions, confirming that cross-modal phonemic information is integrated similarly in speech and song.

11.
Auditory perception of speech and speech sounds was examined in three groups of patients with cerebral damage in the dominant hemisphere. Two groups consisted of brain-injured war veterans, one group of patients with high-frequency hearing loss and the other a group of patients with a flat hearing loss. The third group consisted of patients with recent cerebral infarcts due to vascular occlusion of the middle cerebral and internal carotid artery. Word and phoneme discrimination, as well as phoneme confusions in incorrect responses, were analyzed from conventional speech audiometry tests with bisyllabic Finnish words presented close to the speech reception threshold of the patient. The results were compared with those of a control group with no cerebral disorders and normal hearing. The speech discrimination scores of veterans with high-frequency hearing loss and patients with recent cerebral infarcts were some 15–20% lower than those of controls or veterans with flat hearing loss. Speech sound feature discrimination, analyzed in terms of place of articulation and distinctive features, was distorted especially in cases of recent cerebral infarcts, whereas general information transmission of phonemes was more impaired in patients with high-frequency hearing loss.

12.
We have implemented a facial animation system to carry out visible speech synthesis. Using this system, it is possible to manipulate control parameters to synthesize a sequence of speech articulations. In addition, it is possible to synthesize novel articulations, such as one that is halfway between /ba/ and /da/.
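To make concrete what an articulation "halfway between /ba/ and /da/" could mean in a parametric talking-face system, here is a minimal sketch that linearly interpolates two viseme parameter sets; the parameter names and values are hypothetical and are not taken from the system described in the abstract.

    # Hypothetical control-parameter targets for the /ba/ and /da/ visemes.
    ba_params = {"jaw_opening": 0.30, "lip_closure": 0.95, "tongue_tip_raise": 0.10}
    da_params = {"jaw_opening": 0.35, "lip_closure": 0.10, "tongue_tip_raise": 0.90}

    def blend_visemes(a, b, weight):
        """Linearly interpolate two parameter sets; weight=0.0 returns a, weight=1.0 returns b."""
        return {name: (1.0 - weight) * a[name] + weight * b[name] for name in a}

    # An articulation halfway between /ba/ and /da/.
    print(blend_visemes(ba_params, da_params, 0.5))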

13.
We examined whether the orientation of the face influences speech perception in face-to-face communication. Participants identified auditory syllables, visible syllables, and bimodal syllables presented in an expanded factorial design. The syllables were /ba/, /va/, /ða/, or /da/. The auditory syllables were taken from natural speech, whereas the visible syllables were produced by computer animation of a realistic talking face. The animated face was presented either in its normal upright orientation or in an inverted orientation (180° frontal rotation). The central intent of the study was to determine whether an inverted view of the face would change the nature of processing bimodal speech or simply influence the information available in visible speech. The results with both the upright and inverted face views were adequately described by the fuzzy logical model of perception (FLMP). The observed differences in the FLMP’s parameter values corresponding to the visual information indicate that inverting the view of the face influences the amount of visible information but does not change the nature of the information processing in bimodal speech perception.

14.
Phonetic segments are coarticulated in speech. Accordingly, the articulatory and acoustic properties of the speech signal during the time frame traditionally identified with a given phoneme are highly context-sensitive. For example, due to carryover coarticulation, the front tongue-tip position for /l/ results in more fronted tongue-body contact for a /g/ preceded by /l/ than for a /g/ preceded by /r/. Perception by mature listeners shows a complementary sensitivity: when a synthetic /da/-/ga/ continuum is preceded by either /al/ or /ar/, adults hear more /g/s following /l/ than following /r/. That is, some of the fronting information in the temporal domain of the stop is perceptually attributed to /l/ (Mann, 1980). We replicated this finding and extended it to a signal-detection test of discrimination with adults, using triads of disyllables. Three equidistant items from a /da/-/ga/ continuum were used, preceded by /al/ and /ar/. In the identification test, adults had identified item ga5 as "ga" and da1 as "da" following both /al/ and /ar/, whereas they identified the crucial item d/ga3 predominantly as "ga" after /al/ but as "da" after /ar/. In the discrimination test, they discriminated d/ga3 from da1 preceded by /al/ but not /ar/; compatibly, they discriminated d/ga3 readily from ga5 preceded by /ar/ but poorly preceded by /al/. We obtained similar results with 4-month-old infants. Following habituation to either ald/ga3 or ard/ga3, infants heard either the corresponding ga5 or da1 disyllable. As predicted, the infants discriminated d/ga3 from da1 following /al/ but not /ar/; conversely, they discriminated d/ga3 from ga5 following /ar/ but not /al/. The results suggest that prelinguistic infants disentangle consonant-consonant coarticulatory influences in speech in an adult-like fashion.
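For context on the signal-detection test of discrimination mentioned above: discrimination sensitivity in such tests is conventionally summarized with d′, the difference between the z-transformed hit and false-alarm rates. The sketch below computes d′ for two hypothetical pairs; the rates are made-up numbers, not data from the study.

    from scipy.stats import norm

    def d_prime(hit_rate, false_alarm_rate):
        """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
        return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

    # Hypothetical rates: a well-discriminated pair versus a poorly discriminated pair.
    print(d_prime(0.85, 0.20))  # about 1.88
    print(d_prime(0.55, 0.45))  # about 0.25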

15.
Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we examined perception of both audiovisually congruent and audiovisually incongruent speech in school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults. In the first experiment, all participants watched videos of a talker articulating the syllables ‘ba’, ‘da’, and ‘ga’ under three conditions – audiovisual (AV), auditory only (A), and visual only (V). The amplitude of the N1 (but not of the P2) event-related component elicited in the AV condition was significantly reduced compared to the N1 amplitude measured from the sum of the A and V conditions in all groups of participants. Because N1 attenuation to AV speech is thought to index the degree to which facial movements predict the onset of the auditory signal, our findings suggest that this aspect of audiovisual speech perception is mature by mid-childhood and is normal in the H-SLI children. In the second experiment, participants watched videos of audiovisually incongruent syllables created to elicit the so-called McGurk illusion (with an auditory ‘pa’ dubbed onto a visual articulation of ‘ka’, the expected percept being ‘ta’ if audiovisual integration took place). As a group, H-SLI children were significantly more likely than either TD children or adults to hear the McGurk syllable as ‘pa’ (in agreement with its auditory component) than as ‘ka’ (in agreement with its visual component), suggesting that susceptibility to the McGurk illusion is reduced in at least some children with a history of SLI. Taken together, the results of the two experiments argue against a global audiovisual integration impairment in children with a history of SLI and suggest that, when present, audiovisual integration difficulties in this population likely stem from a later (non-sensory) stage of processing.

16.
Research has shown that auditory speech recognition is influenced by the appearance of a talker's face, but the actual nature of this visual information has yet to be established. Here, we report three experiments that investigated visual and audiovisual speech recognition using color, gray-scale, and point-light talking faces (which allowed comparison with the influence of isolated kinematic information). Auditory and visual forms of the syllables /ba/, /bi/, /ga/, /gi/, /va/, and /vi/ were used to produce auditory, visual, congruent, and incongruent audiovisual speech stimuli. Visual speech identification and visual influences on identifying the auditory components of congruent and incongruent audiovisual speech were identical for color and gray-scale faces and were much greater than for point-light faces. These results indicate that luminance, rather than color, underlies visual and audiovisual speech perception and that this information is more than the kinematic information provided by point-light faces. Implications for processing visual and audiovisual speech are discussed.

17.
Discrimination of speech sounds from three computer-generated continua that ranged from voiced to voiceless syllables (/ba-pa/, /da-ta/, and /ga-ka/) was tested with three macaques. The stimuli on each continuum varied in voice-onset time (VOT). Pairs of stimuli that were equally different in VOT were chosen such that they were either within-category pairs (syllables given the same phonetic label by human listeners) or between-category pairs (syllables given different phonetic labels by human listeners). Results demonstrated that discrimination performance was always best for between-category pairs of stimuli, thus replicating the “phoneme boundary effect” seen in adult listeners and in human infants as young as 1 month of age. The findings are discussed in terms of their specific impact on accounts of voicing perception in human listeners and in terms of their impact on discussions of the evolution of language.

18.
Different kinds of speech sounds are used to signify possible word forms in every language. For example, lexical stress is used in Spanish (/‘be.be/, ‘he/she drinks’ versus /be.’be/, ‘baby’), but not in French (/‘be.be/ and /be.’be/ both mean ‘baby’). Infants learn many such native language phonetic contrasts in their first year of life, likely using a number of cues from parental speech input. One such cue could be parents’ object labeling, which can explicitly highlight relevant contrasts. Here we ask whether phonetic learning from object labeling is abstract—that is, if learning can generalize to new phonetic contexts. We investigate this issue in the prosodic domain, as the abstraction of prosodic cues (like lexical stress) has been shown to be particularly difficult. One group of 10-month-old French-learners was given consistent word labels that contrasted on lexical stress (e.g., Object A was labeled /‘ma.bu/, and Object B was labeled /ma.’bu/). Another group of 10-month-olds was given inconsistent word labels (i.e., mixed pairings), and stress discrimination in both groups was measured in a test phase with words made up of new syllables. Infants trained with consistently contrastive labels showed an earlier effect of discrimination compared to infants trained with inconsistent labels. Results indicate that phonetic learning from object labeling can indeed generalize, and suggest one way infants may learn the sound properties of their native language(s).

19.
If a place-of-articulation contrast is created between the auditory and the visual component syllables of videotaped speech, frequently the syllable that listeners report they have heard differs phonetically from the auditory component. These “McGurk effects”, as they have come to be called, show that speech perception may involve some kind of intermodal process. There are two classes of these phenomena: fusions and combinations. Perception of the syllable /da/ when auditory /ba/ and visual /ga/ are presented provides a clear example of the former, and perception of the string /bga/ after presentation of auditory /ga/ and visual /ba/ is an unambiguous instance of the latter. Besides perceptual fusions and combinations, hearing visually presented component syllables also shows an influence of vision on audition. It is argued that these “visual” responses arise from basically the same underlying processes that yield fusions and combinations, respectively. In the present study, the visual component of audiovisually incongruous CV syllables was presented in the left and the right visual hemifield, respectively. Audiovisual fusion responses showed a left hemifield advantage, and audiovisual combination responses a right hemifield advantage. This finding suggests that the process of audiovisual integration differs between audiovisual fusions and combinations and, furthermore, that the two cerebral hemispheres contribute differentially to the two classes of response.

20.
Young infants are capable of integrating auditory and visual information, and their speech perception can be influenced by visual cues, while 5-month-olds detect mismatch between mouth articulations and speech sounds. From 6 months of age, infants gradually shift their attention away from the eyes and towards the mouth in articulating faces, potentially to benefit from the intersensory redundancy of audiovisual (AV) cues. Using eye tracking, we investigated whether 6- to 9-month-olds showed a similar age-related increase of looking to the mouth while observing congruent and/or redundant versus mismatched and non-redundant speech cues. Participants distinguished between congruent and incongruent AV cues, as reflected by the amount of looking to the mouth. They showed an age-related increase in attention to the mouth, but only for non-redundant, mismatched AV speech cues. Our results highlight the role of intersensory redundancy and audiovisual mismatch mechanisms in facilitating the development of speech processing in infants under 12 months of age.

