期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Lip-read me now, hear me better later: cross-modal transfer of talker-familiarity effects

Rosenblum LD Miller RM Sanchez K 《Psychological science》2007,18(5):392-396

There is evidence that for both auditory and visual speech perception, familiarity with the talker facilitates speech recognition. Explanations of these effects have concentrated on the retention of talker information specific to each of these modalities. It could be, however, that some amodal, talker-specific articulatory-style information facilitates speech perception in both modalities. If this is true, then experience with a talker in one modality should facilitate perception of speech from that talker in the other modality. In a test of this prediction, subjects were given about 1 hr of experience lipreading a talker and were then asked to recover speech in noise from either this same talker or a different talker. Results revealed that subjects who lip-read and heard speech from the same talker performed better on the speech-in-noise task than did subjects who lip-read from one talker and then heard speech from a different talker. 相似文献

2.

Integrating speech information across talkers,gender, and sensory modality: Female faces and male voices in the McGurk effect

Kerry P. Green Patricia K. Kuhl Andrew N. Meltzoff Erica B. Stevens 《Attention, perception & psychophysics》1991,50(6):524-536

Studies of the McGurk effect have shown that when discrepant phonetic information is delivered to the auditory and visual modalities, the information is combined into a new percept not originally presented to either modality. In typical experiments, the auditory and visual speech signals are generated by the same talker. The present experiment examined whether a discrepancy in the gender of the talker between the auditory and visual signals would influence the magnitude of the McGurk effect. A male talker’s voice was dubbed onto a videotape containing a female talker’s face, and vice versa. The gender-incongruent videotapes were compared with gender-congruent videotapes, in which a male talker’s voice was dubbed onto a male face and a female talker’s voice was dubbed onto a female face. Even though there was a clear incompatibility in talker characteristics between the auditory and visual signals on the incongruent videotapes, the resulting magnitude of the McGurk effectwas not significantly different for the incongruent as opposed to the congruent videotapes. The results indicate that the mechanism for integrating speech information from the auditory and the visual modalities is not disrupted by a gender incompatibility even when it is perceptually apparent. The findings are compatible with the theoretical notion that information about voice characteristics of the talker is extracted and used to normalize the speech signal at an early stage of phonetic processing, prior to the integration of the auditory and the visual information. 相似文献

3.

Integrating speech information across talkers, gender, and sensory modality: female faces and male voices in the McGurk effect. 总被引：2，自引：0，他引：2

K P Green P K Kuhl A N Meltzoff E B Stevens 《Perception & psychophysics》1991,50(6):524-536

Studies of the McGurk effect have shown that when discrepant phonetic information is delivered to the auditory and visual modalities, the information is combined into a new percept not originally presented to either modality. In typical experiments, the auditory and visual speech signals are generated by the same talker. The present experiment examined whether a discrepancy in the gender of the talker between the auditory and visual signals would influence the magnitude of the McGurk effect. A male talker's voice was dubbed onto a videotape containing a female talker's face, and vice versa. The gender-incongruent videotapes were compared with gender-congruent videotapes, in which a male talker's voice was dubbed onto a male face and a female talker's voice was dubbed onto a female face. Even though there was a clear incompatibility in talker characteristics between the auditory and visual signals on the incongruent videotapes, the resulting magnitude of the McGurk effect was not significantly different for the incongruent as opposed to the congruent videotapes. The results indicate that the mechanism for integrating speech information from the auditory and the visual modalities is not disrupted by a gender incompatibility even when it is perceptually apparent. The findings are compatible with the theoretical notion that information about voice characteristics of the talker is extracted and used to normalize the speech signal at an early stage of phonetic processing, prior to the integration of the auditory and the visual information. 相似文献

4.

Visual speech processing: Word-decoding and word-discrimination related to sentence-based speechreading and hearing-impairment 总被引：3，自引：3，他引：0

BJÖRN LYXELL JERKER RÖNNBERG 《Scandinavian journal of psychology》1991,32(1):9-17

Two aspects of visual speech processing in speechreading (word decoding and word discrimination) were tested in a group of 24 normal hearing and a group of 20 hearing-impaired subjects. Word decoding and word discrimination performance were independent of factors related to the impairment, both in a quantitative and a qualitative sense. Decoding skill, but not discrimination skill, was associated with sentence-based speechreading. The results were interpreted such that, in order to represent a critical component process in sentence-based speechreading, the visual speech perception task must entail lexically induced processing as a task-demand. The theoretical status of the word decoding task as one operationalization of a speech decoding module was discussed (Fodor, 1983). An error analysis of performance in the word decoding/discrimination tasks suggested that the perception of heard stimuli, as well as the perception of lipped stimuli, were critically dependent on the same features; that is, the temporally initial phonetic segment of the word (cf. Marslen-Wilson, 1987). Implications for a theory of visual speech perception were discussed. 相似文献

5.

Recognizing prosody across modalities, face areas and speakers: examining perceivers' sensitivity to variable realizations of visual prosody

Cvejic E Kim J Davis C 《Cognition》2012,122(3):442-453

Prosody can be expressed not only by modification to the timing, stress and intonation of auditory speech but also by modifying visual speech. Studies have shown that the production of visual cues to prosody is highly variable (both within and across speakers), however behavioural studies have shown that perceivers can effectively use such visual cues. The latter result suggests that people are sensitive to the type of prosody expressed despite cue variability. The current study investigated the extent to which perceivers can match visual cues to prosody from different speakers and from different face regions. Participants were presented two pairs of sentences (consisting of the same segmental content) and were required to decide which pair had the same prosody. Experiment 1 tested visual and auditory cues from the same speaker and Experiment 2 from different speakers. Experiment 3 used visual cues from the upper and the lower face of the same talker and Experiment 4 from different speakers. The results showed that perceivers could accurately match prosody even when signals were produced by different speakers. Furthermore, perceivers were able to match the prosodic cues both within and across modalities regardless of the face area presented. This ability to match prosody from very different visual cues suggests that perceivers cope with variation in the production of visual prosody by flexibly mapping specific tokens to abstract prosodic types. 相似文献

6.

Something in the way she moves

Munhall KG Buchan JN 《Trends in cognitive sciences》2004,8(2):51-53

A recent study using a crossmodal matching task showed that the identity of a talker could be recognized even when the auditory and visual stimuli that were being matched were different sentences spoken by the talker. This finding implies that general temporal features of a person's speech are shared across the auditory and visual modalities. 相似文献

7.

Experience with a talker can transfer across modalities to facilitate lipreading

Kauyumari Sanchez James W. Dias Lawrence D. Rosenblum 《Attention, perception & psychophysics》2013,75(7):1359-1365

Rosenblum, Miller, and Sanchez (Psychological Science, 18, 392-396, 2007) found that subjects first trained to lip-read a particular talker were then better able to perceive the auditory speech of that same talker, as compared with that of a novel talker. This suggests that the talker experience a perceiver gains in one sensory modality can be transferred to another modality to make that speech easier to perceive. An experiment was conducted to examine whether this cross-sensory transfer of talker experience could occur (1) from auditory to lip-read speech, (2) with subjects not screened for adequate lipreading skill, (3) when both a familiar and an unfamiliar talker are presented during lipreading, and (4) for both old (presentation set) and new words. Subjects were first asked to identify a set of words from a talker. They were then asked to perform a lipreading task from two faces, one of which was of the same talker they heard in the first phase of the experiment. Results revealed that subjects who lip-read from the same talker they had heard performed better than those who lip-read a different talker, regardless of whether the words were old or new. These results add further evidence that learning of amodal talker information can facilitate speech perception across modalities and also suggest that this information is not restricted to previously heard words. 相似文献

8.

Audio-visual interactions with intact clearly audible speech

Chris Davis Jeesun Kim 《The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology》2004,57(6):1103-1121

The effects of viewing the face of the talker (visual speech) on the processing of clearly presented intact auditory stimuli were investigated using two measures likely to be sensitive to the articulatory motor actions produced in speaking. The aim of these experiments was to highlight the need for accounts of the effects of audio-visual (AV) speech that explicitly consider the properties of articulated action. The first experiment employed a syllable-monitoring task in which participants were required to monitor for target syllables within foreign carrier phrases. An AV effect was found in that seeing a talker's moving face (moving face condition) assisted in more accurate recognition (hits and correct rejections) of spoken syllables than of auditory-only still face (still face condition) presentations. The second experiment examined processing of spoken phrases by investigating whether an AV effect would be found for estimates of phrase duration. Two effects of seeing the moving face of the talker were found. First, the moving face condition had significantly longer duration estimates than the still face auditory-only condition. Second, estimates of auditory duration made in the moving face condition reliably correlated with the actual durations whereas those made in the still face auditory condition did not. The third experiment was carried out to determine whether the stronger correlation between estimated and actual duration in the moving face condition might have been due to generic properties of AV presentation. Experiment 3 employed the procedures of the second experiment but used stimuli that were not perceived as speech although they possessed the same timing cues as those of the speech stimuli of Experiment 2. It was found that simply presenting both auditory and visual timing information did not result in more reliable duration estimates. Further, when released from the speech context (used in Experiment 2), duration estimates for the auditory-only stimuli were significantly correlated with actual durations. In all, these results demonstrate that visual speech can assist in the analysis of clearly presented auditory stimuli in tasks concerned with information provided by viewing the production of an utterance. We suggest that these findings are consistent with there being a processing link between perception and action such that viewing a talker speaking will activate speech motor schemas in the perceiver. 相似文献

9.

Acoustic differences, listener expectations, and the perceptual accommodation of talker variability

Magnuson JS Nusbaum HC 《Journal of experimental psychology. Human perception and performance》2007,33(2):391-409

Two talkers' productions of the same phoneme may be quite different acoustically, whereas their productions of different speech sounds may be virtually identical. Despite this lack of invariance in the relationship between the speech signal and linguistic categories, listeners experience phonetic constancy across a wide range of talkers, speaking styles, linguistic contexts, and acoustic environments. The authors present evidence that perceptual sensitivity to talker variability involves an active cognitive mechanism: Listeners expecting to hear 2 different talkers differing only slightly in average pitch showed performance costs typical of adjusting to talker variability, whereas listeners hearing the same materials but expecting a single talker or given no special instructions did not show these performance costs. The authors discuss the implications for understanding phonetic constancy despite variability between talkers (and other sources of variability) and for theories of speech perception. The results provide further evidence for active, controlled processing in real-time speech perception and are consistent with a model of talker normalization that involves contextual tuning. 相似文献

10.

Effects of stimulus variability on perception and representation of spoken words in memory

Lynne C. Nygaard Mitchell S. Sommers David B. Pisoni 《Attention, perception & psychophysics》1995,57(7):989-1001

A series of experiments was conducted to investigate the effects of stimulus variability on the memory representations for spoken words. A serial recall task was used to study the effects of changes in speaking rate, talker variability, and overall amplitude on the initial encoding, rehearsal, and recall of lists of spoken words. Interstimulus interval (ISI) was manipulated to determine the time course and nature of processing. The results indicated that at short ISIs, variations in both talker and speaking rate imposed a processing cost that was reflected in poorer serial recall for the primacy portion of word lists. At longer ISIs, however, variation in talker characteristics resulted in improved recall in initial list positions, whereas variation in speaking rate had no effect on recall performance. Amplitude variability had no effect on serial recall across all ISIs. Taken together, these results suggest that encoding of stimulus dimensions such as talker characteristics, speaking rate, and overall amplitude may be the result of distinct perceptual operations. The effects of these sources of stimulus variability in speech are discussed with regard to perceptual saliency, processing demands, and memory representation for spoken words. 相似文献

11.

Single band amplitude envelope cues as an aid to speechreading

Ken W. Grant Louis D. Braida Rebecca J. Renn 《The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology》1991,43(3):621-645

Amplitude envelopes derived from speech have been shown to facilitate speech-reading to varying degrees, depending on how the envelope signals were extracted and presented and on the amount of training given to the subjects. In this study, three parameters related to envelope extraction and presentation were examined using both easy and difficult sentence materials: (1) the bandwidth and centre frequency of the filtered speech signal used to obtain the envelope; (2) the bandwidth of the envelope signal determined by the lowpass filter cutoff frequency used to “smooth” the envelope fluctuations; and (3) the carrier signal used to convey the envelope cues. Results for normal hearing subjects following a brief visual and auditory-visual familiarization/training period showed that (1) the envelope derived from wideband speech does not provide the greatest benefit to speechreading when compared to envelopes derived from selected octave bands of speech; (2) as the bandwidth centred around the carrier frequency increased from 12.5 to 1600 Hz, auditory-visual (AV) performance obtained with difficult sentence materials improved, especially for envelopes derived from high-frequency speech energy; (3) envelope bandwidths below 25 Hz resulted in AV scores that were sometimes equal to or worse than speechreading alone; (4) for each filtering condition tested, there was at least one bandwidth and carrier condition that produced AV scores that were significantly greater than speech-reading alone; (5) low-frequency carriers were better than high-frequency or wideband carriers for envelopes derived from an octave band of speech centred at 500 Hz; and (6) low-frequency carriers were worse than high-frequency or wideband carriers for envelopes derived from an octave band centred at 3150 Hz. These results suggest that amplitude envelope cues can provide a substantial benefit to speechreading for both easy and difficult sentence materials, but that frequency transposition of these signals to regions remote from their “natural” spectral locations may result in reduced performance. 相似文献

12.

Some consequences of stimulus variability on speech processing by 2-month-old infants.

P W Jusczyk D B Pisoni J Mullennix 《Cognition》1992,43(3):253-291

The present study explores how stimulus variability in speech production influences the 2-month-old infant's perception and memory for speech sounds. Experiment 1 focuses on the consequences of talker variability for the infant's ability to detect differences between speech sounds. When tested with high-amplitude sucking (HAS) procedure, infants who listened to versions of a syllable, such as [symbol: see text], produced by 6 male and 6 female talkers, detected a change to another syllable, such as [symbol: see text], uttered by the same group of talkers. In fact, infants exposed to multiple talkers performed as well as other infants who heard utterances produced by only a single talker. Moreover, other results showed that infants discriminate the voices of the individual talkers, although discriminating one mixed group of talkers (3 males and 3 females) from another is too difficult for them. Experiment 2 explored the consequences of talker variability on infants' memory for speech sounds. The HAS procedure was modified by introducing a 2-min delay period between the familiarization and test phases of the experiment. Talker variability impeded infants' encoding of speech sounds. Infants who heard versions of the same syllable produced by 12 different talkers did not detect a change to a new syllable produced by the same talkers after the delay period. However, infants who heard the same syllable produced by a single talker were able to detect the phonetic change after the delay. Finally, although infants who heard productions from a single talker retained information about the phonetic structure of the syllable during the delay, they apparently did not retain information about the identity of the talker. Experiment 3 reduced the range of variability across talkers and investigated whether variability interferes with retention of all speech information. Although reducing the range of variability did not lead to retention of phonetic details, infants did recognize a change in the gender of the talkers' voices (from male to female or vice versa) after a 2-min delay. Two additional experiments explored the consequences of limiting the variability to a single talker. In Experiment 4, with an immediate testing procedure, infants exposed to 12 different tokens of one syllable produced by the same talker discriminated these from 12 tokens of another syllable.(ABSTRACT TRUNCATED AT 400 WORDS) 相似文献

13.

Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms

Emma Holmes Padraig T. Kitterick A. Quentin Summerfield 《Attention, perception & psychophysics》2018,80(6):1520-1538

Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker’s spatial location or their gender. Participants directed attention to location and gender simultaneously (“objects”) at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities. 相似文献

14.

The role of visual information in the processing of

Kerry P. Green Patricia K. Kuhl 《Attention, perception & psychophysics》1989,45(1):34-42

Visual information provided by a talker’s mouth movements can influence the perception of certain speech features. Thus, the “McGurk effect” shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/, as it is presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. In three follow-up experiments, we show that (1) the voicing boundary is not shifted in the absence of a change in the global percept, even when discrepant auditory-visual information is presented; (2) the number of response alternatives provided for the subjects does not affect the categorization or the VOT boundary of the auditory-visual stimuli; and (3) the original effect of a VOT boundary shift is not replicated when subjects are forced by instruction to \ldrelabel\rd the /b-p/auditory stimuli as/d/or/t/. The subjects successfully relabeled the stimuli, but no shift in the VOT boundary was observed. 相似文献

15.

Effects of talker variability on recall of spoken word lists

C S Martin J W Mullennix D B Pisoni W V Summers 《Journal of experimental psychology. Learning, memory, and cognition》1989,15(4):676-684

Three experiments were conducted to investigate recall of lists of words containing items spoken by either a single talker or by different talkers. In each experiment, recall of early list items was better for lists spoken by a single talker than for lists of the same words spoken by different talkers. The use of a memory preload procedure demonstrated that recall of visually presented preload digits was superior when the words in a subsequent list were spoken by a single talker than by different talkers. In addition, a retroactive interference task demonstrated that the effects of talker variability on the recall of early list items were not due to use of talker-specific acoustic cues in working memory at the time of recall. Taken together, the results suggest that word lists produced by different talkers require more processing resources in working memory than do lists produced by a single talker. The findings are discussed in terms of the role that active rehearsal plays in the transfer of spoken items into long-term memory and the factors that may affect the efficiency of rehearsal. 相似文献

16.

The role of visual information in the processing of place and manner features in speech perception 总被引：1，自引：0，他引：1

K P Green P K Kuhl 《Perception & psychophysics》1989,45(1):34-42

Visual information provided by a talker's mouth movements can influence the perception of certain speech features. Thus, the "McGurk effect" shows that when the syllable (bi) is presented audibly, in synchrony with the syllable (gi), as it is presented visually, a person perceives the talker as saying (di). Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specificed by a combination of auditory and visual information. Members of an auditory continuum ranging from (ibi) to (ipi) were paired with a video display of a talker saying (igi). The auditory tokens were heard as ranging from (ibi) to (ipi), but the auditory-visual tokens were perceived as ranging from (idi) to (iti). The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly.(ABSTRACT TRUNCATED AT 250 WORDS) 相似文献

17.

The influence of infant-directed speech on 12-month-olds’ intersensory perception of fluent speech

《Infant behavior & development》2014,37(4):644-651

The present study examined whether infant-directed (ID) speech facilitates intersensory matching of audio–visual fluent speech in 12-month-old infants. German-learning infants’ audio–visual matching ability of German and French fluent speech was assessed by using a variant of the intermodal matching procedure, with auditory and visual speech information presented sequentially. In Experiment 1, the sentences were spoken in an adult-directed (AD) manner. Results showed that 12-month-old infants did not exhibit a matching performance for the native, nor for the non-native language. However, Experiment 2 revealed that when ID speech stimuli were used, infants did perceive the relation between auditory and visual speech attributes, but only in response to their native language. Thus, the findings suggest that ID speech might have an influence on the intersensory perception of fluent speech and shed further light on multisensory perceptual narrowing. 相似文献

18.

On the nature of talker variability effects on recall of spoken word lists. 总被引：3，自引：0，他引：3

S D Goldinger D B Pisoni J S Logan 《Journal of experimental psychology. Learning, memory, and cognition》1991,17(1):152-162

In a recent study, Martin, Mullennix, Pisoni, and Summers (1989) reported that subjects' accuracy in recalling lists of spoken words was better for words in early list positions when the words were spoken by a single talker than when they were spoken by multiple talkers. The present study was conducted to examine the nature of these effects in further detail. Accuracy of serial-ordered recall was examined for lists of words spoken by either a single talker or by multiple talkers. Half the lists contained easily recognizable words, and half contained more difficult words, according to a combined metric of word frequency, lexical neighborhood density, and neighborhood frequency. Rate of presentation was manipulated to assess the effects of both variables on rehearsal and perceptual encoding. A strong interaction was obtained between talker variability and rate of presentation. Recall of multiple-talker lists was affected much more than single-talker lists by changes in presentation rate. At slow presentation rates, words in early serial positions produced by multiple talkers were actually recalled more accurately than words produced by a single talker. No interaction was observed for word confusability and rate of presentation. The data provide support for the proposal that talker variability affects the accuracy of recall of spoken words not only by increasing the processing demands for early perceptual encoding of the words, but also by affecting the efficiency of the rehearsal process itself. 相似文献

19.

An experimental analysis of memory processing

Wright AA 《Journal of the experimental analysis of behavior》2007,88(3):405-433

Rhesus monkeys were trained and tested in visual and auditory list-memory tasks with sequences of four travel pictures or four natural/environmental sounds followed by single test items. Acquisitions of the visual list-memory task are presented. Visual recency (last item) memory diminished with retention delay, and primacy (first item) memory strengthened. Capuchin monkeys, pigeons, and humans showed similar visual-memory changes. Rhesus learned an auditory memory task and showed octave generalization for some lists of notes--tonal, but not atonal, musical passages. In contrast with visual list memory, auditory primacy memory diminished with delay and auditory recency memory strengthened. Manipulations of interitem intervals, list length, and item presentation frequency revealed proactive and retroactive inhibition among items of individual auditory lists. Repeating visual items from prior lists produced interference (on nonmatching tests) revealing how far back memory extended. The possibility of using the interference function to separate familiarity vs. recollective memory processing is discussed. 相似文献

20.

Cross-modal source information and spoken word recognition

Lachs L Pisoni DB 《Journal of experimental psychology. Human perception and performance》2004,30(2):378-396

In a cross-modal matching task, participants were asked to match visual and auditory displays of speech based on the identity of the speaker. The present investigation used this task with acoustically transformed speech to examine the properties of sound that can convey cross-modal information. Word recognition performance was also measured under the same transformations. The authors found that cross-modal matching was only possible under transformations that preserved the relative spectral and temporal patterns of formant frequencies. In addition, cross-modal matching was only possible under the same conditions that yielded robust word recognition performance. The results are consistent with the hypothesis that acoustic and optical displays of speech simultaneously carry articulatory information about both the underlying linguistic message and indexical properties of the talker. 相似文献