Similar Documents
20 similar documents found (search time: 15 ms)
1.
The intelligibility of word lists subjected to various types of spectral filtering has been studied extensively. Although words used for communication are usually present in sentences rather than lists, there has been no systematic report of the intelligibility of lexical components of narrowband sentences. In the present study, we found that surprisingly little spectral information is required to identify component words when sentences are heard through narrow spectral slits. Four hundred twenty listeners (21 groups of 20 subjects) were each presented with 100 bandpass-filtered CID (“everyday speech”) sentences; separate groups received center frequencies of 370, 530, 750, 1100, 1500, 2100, 3000, 4200, and 6000 Hz at 70 dBA SPL. In Experiment 1, intelligibility of single 1/3-octave bands with steep filter slopes (96 dB/octave) averaged more than 95% for sentences centered at 1100, 1500, and 2100 Hz. In Experiment 2, we used the same center frequencies with extremely narrow bands (slopes of 115 dB/octave intersecting at the center frequency, resulting in a nominal bandwidth of 1/20 octave). Despite the severe spectral tilt at all frequencies of this impoverished spectrum, intelligibility remained relatively high for most bands, with the greatest intelligibility (77%) at 1500 Hz. In Experiments 1 and 2, the bands centered at 370 and 6000 Hz provided little useful information when presented individually, but in each experiment they interacted synergistically when combined. The present findings demonstrate the adaptive flexibility of mechanisms used for speech perception and are discussed in the context of the LAME model of opportunistic multilevel processing.
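As a rough illustration of this kind of narrowband "slit" filtering (not taken from the study; the filter order, sample rate, and placeholder input are assumptions), the sketch below carves a 1/3-octave band around a chosen center frequency with a high-order Butterworth filter.

```python
# Illustrative sketch only: a 1/3-octave bandpass "slit" around a center frequency.
# A high-order digital Butterworth filter stands in for the steep analog slopes
# (96 dB/octave) used in the study; all parameters here are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def third_octave_band(x, fs, fc):
    """Bandpass x into a 1/3-octave band centered on fc (Hz)."""
    low = fc / 2 ** (1 / 6)    # lower edge, 1/6 octave below fc
    high = fc * 2 ** (1 / 6)   # upper edge, 1/6 octave above fc
    sos = butter(8, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 16000
sentence = np.random.randn(fs)                       # placeholder for recorded speech
band_1500 = third_octave_band(sentence, fs, 1500.0)
```

A zero-phase filter (sosfiltfilt) is used here simply to avoid phase distortion; the original study specified analog-style filters by their slopes.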

2.
Comodulation masking release (CMR) is a phenomenon that demonstrates the sensitivity of the auditory system to across-frequency differences in the temporal modulation pattern of a complex waveform. In this paper, we review briefly some of the data on the physical parameters that affect CMR and describe models that have been proposed to account for CMR, namely models based upon envelope equalization/cancellation, across-frequency envelope correlation, and “dip listening”. The present literature is ambiguous with regard to the relative importance of energy in the peak and dip regions of the waveform envelope. We therefore performed a series of experiments to investigate this issue. In the first experiment, we examined CMR for signals that resulted either in a uniform increment or in a uniform decrement in the masking noise centred on the signal frequency. This was accomplished by using a 20-Hz-wide noise band centred on 700 Hz as both the masker and the signal, adjusting the phase angle between the signal and masker to either 0° (increment) or 180° (decrement). Conditions were examined in which zero, one, two, four, or six comodulated flanking bands were present. Results indicated positive CMRs for all conditions in which a comodulated flanking band was present. CMR increased as the number of flanking bands increased for intensity increments, but not for intensity decrements. The remaining experiments examined conditions in which signals were present only in masker peaks, or only in masker dips. The results of these experiments indicated relatively large CMRs when the signal occurred in dip regions, but no CMR when the signal occurred in peak regions. Although some of the results of the above experiments would be difficult to account for in terms of the dip-listening hypothesis of CMR, the present findings did indicate that the stimulus cues that give rise to CMR appear to be derived primarily from the dip regions of the masking noise.
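A minimal sketch of the increment/decrement manipulation, under assumed parameters (filter order, signal level, duration): the masker is a 20-Hz-wide noise band at 700 Hz, and the signal is a scaled copy of the same band added in phase (a uniform level increment) or in antiphase (a uniform decrement).

```python
# Sketch, not the authors' stimulus code; levels, duration, and filter order are assumed.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs, dur = 16000, 0.5
rng = np.random.default_rng(0)
wideband = rng.standard_normal(int(fs * dur))

sos = butter(4, [690.0, 710.0], btype="bandpass", fs=fs, output="sos")
masker = sosfiltfilt(sos, wideband)          # 20-Hz-wide noise band centred on 700 Hz

signal_gain = 0.3                            # relative signal amplitude (assumed)
increment = masker + signal_gain * masker    # 0 deg phase: uniform intensity increment
decrement = masker - signal_gain * masker    # 180 deg phase: uniform intensity decrement
```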

3.
This study examines a form of masking that can take place when the signal and masker are widely separated in frequency and cannot be explained in terms of the traditional concept of the auditory filter or critical band. We refer to this as across-channel masking. The task of the subject was to detect an increment in modulation depth of a 1000-Hz sinusoidal carrier. The carrier could either be sinusoidally amplitude modulated or sinusoidally frequency modulated at a 10-Hz rate. Modulation increment thresholds of this “target” signal were measured for the target alone, and in the presence of two interfering sounds with carrier frequencies of 230 and 3300 Hz. When the interfering sounds were unmodulated, they had no effect on modulation increment thresholds. When the interfering sounds were either amplitude or frequency modulated, thresholds increased. Amplitude modulation (AM) increment thresholds were affected by both amplitude-modulated and frequency-modulated interference. Similarly, frequency modulation (FM) increment thresholds were affected by both amplitude-modulated and frequency-modulated interference. For both types of signal, the interference was tuned for modulation rate; across-channel masking was greatest when the interfering sounds were modulated at rates close to 10 Hz, and declined for higher or lower rates. However, the tuning was rather broad. When the target and interfering sounds were modulated at the same rate, there was no effect of the relative phase of the modulators. Two possible explanations for the results are discussed. One is based on the idea that carriers that are modulated in a similar way tend to be perceptually “grouped”. The other is based on the idea that there are “channels” in the auditory system tuned for AM and FM rate. Neither explanation appears completely satisfactory.
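For concreteness, here is a sketch of the two target types (parameters such as AM depth and FM deviation are illustrative assumptions, not values from the study): a 1000-Hz carrier that is either sinusoidally amplitude modulated or sinusoidally frequency modulated at 10 Hz.

```python
# Illustrative AM and FM targets; depth and deviation values are assumptions.
import numpy as np

fs = 16000
t = np.arange(fs) / fs                       # 1 s of samples
fc, fm = 1000.0, 10.0                        # carrier and modulation rate (Hz)

am_depth = 0.5
am_tone = (1 + am_depth * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

freq_dev = 100.0                             # peak frequency deviation (Hz)
beta = freq_dev / fm                         # FM modulation index
fm_tone = np.sin(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))
```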

4.
To comprehend speech in most environments, listeners must combine some but not all sounds from across a wide range of frequencies. Three experiments were conducted to examine the role of amplitude comodulation in performing an essential part of this function: the grouping together of the simultaneous components of a speech signal. Each of the experiments used time-varying sinusoidal (TVS) sentences (Remez, Rubin, Pisoni, & Carrell, 1981) as base stimuli because their component tones are acoustically unrelated. The independence of the three tones reduced the number of confounding grouping cues available compared with those found in natural or computer-synthesized speech (e.g., fundamental frequency and simultaneity of harmonic onset). In each of the experiments, the TVS base stimuli were amplitude modulated to determine whether this modulation would lead to appropriate grouping of the three tones as reflected by sentence intelligibility. Experiment 1 demonstrated that amplitude comodulation at 100 Hz did improve the intelligibility of TVS sentences. Experiment 2 showed that the component tones of a TVS sentence must be comodulated (as opposed to independently modulated) for improvements in intelligibility to be found. Experiment 3 showed that the comodulation rates that led to intelligibility improvements were consistent with the effective rates found in experiments that examined the grouping of complex nonspeech sounds by common temporal envelopes (e.g., comodulation masking release; Hall, Haggard, & Fernandes, 1984). The results of these experiments support the claim that certain basic temporal-envelope processing capabilities of the human auditory system contribute to the perception of fluent speech.

5.
To comprehend speech in most environments, listeners must combine some but not all sounds from across a wide range of frequencies. Three experiments were conducted to examine the role of amplitude comodulation in performing an essential part of this function: the grouping together of the simultaneous components of a speech signal. Each of the experiments used time-varying sinusoidal (TVS) sentences (Remez, Rubin, Pisoni, & Carrell, 1981) as base stimuli because their component tones are acoustically unrelated. The independence of the three tones reduced the number of confounding grouping cues available compared with those found in natural or computer-synthesized speech (e.g., fundamental frequency and simultaneity of harmonic onset). In each of the experiments, the TVS base stimuli were amplitude modulated to determine whether this modulation would lead to appropriate grouping of the three tones as reflected by sentence intelligibility. Experiment 1 demonstrated that amplitude comodulation at 100 Hz did improve the intelligibility of TVS sentences. Experiment 2 showed that the component tones of a TVS sentence must be comodulated (as opposed to independently modulated) for improvements in intelligibility to be found. Experiment 3 showed that the comodulation rates that led to intelligibility improvements were consistent with the effective rates found in experiments that examined the grouping of complex nonspeech sounds by common temporal envelopes (e.g., comodulation masking release; Hall, Haggard, & Fernandes, 1984). The results of these experiments support the claim that certain basic temporal-envelope processing capabilities of the human auditory system contribute to the perception of fluent speech.
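The core manipulation, a shared low-rate amplitude envelope imposed on several otherwise unrelated tones, can be sketched as follows. The fixed tone frequencies and modulation depth are illustrative stand-ins for the time-varying TVS components, not the study's stimuli.

```python
# Sketch: comodulated vs. independently modulated three-tone complexes (assumed values).
import numpy as np

fs = 16000
t = np.arange(fs) / fs
tone_freqs = [500.0, 1500.0, 2500.0]                    # stand-ins for TVS tone tracks
tones = [np.sin(2 * np.pi * f * t) for f in tone_freqs]

fm = 100.0                                              # comodulation rate (Hz)
envelope = 0.5 * (1 + np.sin(2 * np.pi * fm * t))       # shared 100-Hz envelope

comodulated = sum(envelope * tone for tone in tones)    # same envelope on every tone
independent = sum(np.roll(envelope, i * int(fs / fm) // 3) * tone
                  for i, tone in enumerate(tones))      # envelopes shifted out of phase
```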

6.
R Frost. Cognition, 1991, 39(3): 195-214
When an amplitude-modulated noise generated from a spoken word is presented simultaneously with the word's printed version, the noise sounds more speechlike. This auditory illusion obtained by Frost, Repp, and Katz (1988) suggests that subjects detect correspondences between speech amplitude envelopes and printed stimuli. The present study investigated whether the speech envelope is assembled from the printed word or whether it is lexically addressed. In two experiments subjects were presented with speech-plus-noise and with noise-only trials, and were required to detect the speech in the noise. The auditory stimuli were accompanied by matching or non-matching Hebrew print, which was unvoweled in Experiment 1 and voweled in Experiment 2. The stimuli of both experiments consisted of high-frequency words, low-frequency words, and non-words. The results demonstrated that matching print caused a strong bias to detect speech in the noise when the stimuli were either high- or low-frequency words, whereas no bias was found for non-words. The bias effect for words or non-words was not affected by spelling-to-sound regularity; that is, similar effects were obtained in the voweled and the unvoweled conditions. These results suggest that the amplitude envelope of the word is not assembled from the print. Rather, it is addressed directly from the printed word and retrieved from the mental lexicon. Since amplitude envelopes are contingent on detailed phonetic structures, this outcome suggests that representations of words in the mental lexicon are not only phonological but also phonetic in character.
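A sketch of how amplitude-modulated noise carrying a word's envelope can be generated; envelope extraction via the Hilbert transform and the smoothing cutoff are assumptions for illustration, not details reported by Frost.

```python
# Sketch: replace a word's fine structure with noise while preserving its amplitude envelope.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope_noise(word, fs, cutoff_hz=30.0):
    env = np.abs(hilbert(word))                                   # instantaneous amplitude
    sos = butter(4, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)                                   # smoothed envelope
    noise = np.random.default_rng(0).standard_normal(len(word))
    return env * noise

fs = 16000
spoken_word = np.random.randn(fs // 2)            # placeholder for a recorded word
modulated_noise = envelope_noise(spoken_word, fs)
```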

7.
In order to function effectively as a means of communication, speech must be intelligible under the noisy conditions encountered in everyday life. Two types of perceptual synthesis have been reported that can reduce or cancel the effects of masking by extraneous sounds: Phonemic restoration can enhance intelligibility when segments are replaced or masked by noise, and contralateral induction can prevent mislateralization by effectively restoring speech masked at one ear when it is heard in the other. The present study reports a third type of perceptual synthesis induced by noise: enhancement of intelligibility produced by adding noise to spectral gaps. In most of the experiments, the speech stimuli consisted of two widely separated narrow bands of speech (center frequencies of 370 and 6000 Hz, each band having high-pass and low-pass slopes of 115 dB/octave meeting at the center frequency). These very narrow bands effectively reduced the available information to frequency-limited patterns of amplitude fluctuation lacking information concerning formant structure and frequency transitions. When stochastic noise was introduced into the gap separating the two speech bands, intelligibility increased for “everyday” sentences, for sentences that varied in the transitional probability of keywords, and for monosyllabic word lists. Effects produced by systematically varying noise amplitude and noise bandwidth are reported, and the implications of some of the novel effects observed are discussed.
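A rough sketch of this stimulus configuration, with illustrative filter settings and noise level (none of these values are from the paper beyond the two center frequencies): two very narrow speech bands at 370 and 6000 Hz, plus band-limited noise filling the spectral gap between them.

```python
# Sketch only: narrow speech "slits" at 370 and 6000 Hz with gap noise (assumed settings).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def narrow_band(x, fs, fc, half_width_hz=20.0):
    sos = butter(8, [fc - half_width_hz, fc + half_width_hz],
                 btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 16000
sentence = np.random.randn(fs)                           # placeholder for recorded speech
slits = narrow_band(sentence, fs, 370.0) + narrow_band(sentence, fs, 6000.0)

gap_sos = butter(6, [500.0, 5500.0], btype="bandpass", fs=fs, output="sos")
gap_noise = sosfiltfilt(gap_sos, np.random.randn(len(sentence)))
stimulus = slits + 0.05 * gap_noise                      # gap-noise level is an assumption
```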

8.
The experiment investigated how the addition of emotion information from the voice affects the identification of facial emotion. We presented whole-face, upper-face, and lower-face displays and examined correct recognition rates and patterns of response confusions for auditory-visual (AV), auditory-only (AO), and visual-only (VO) expressive speech. Emotion recognition accuracy was superior for AV compared to unimodal presentation. The pattern of response confusions differed across the unimodal conditions and across display type. For AV presentation, a response confusion occurred only when such a confusion was present in each modality separately; thus, response confusions were reduced compared to unimodal presentations. Emotion space (calculated from the confusion data) differed across display types for the VO presentations but was more similar for the AV ones, indicating that the addition of the auditory information acted to harmonize the various VO response patterns. These results are discussed with respect to how bimodal emotion recognition combines auditory and visual information.

9.
To explore differences in the facial processing strategies used during lipreading by hearing-impaired students with high versus low lipreading comprehension ability, this study used a video-picture matching paradigm combined with eye tracking to examine processing in the pre-speech, during-speech, and post-speech phases, as well as holistic face processing, in the high- and low-ability groups. The results showed that although both groups exhibited a social coordination pattern, the high-ability group obtained higher social coordination scores and maintained longer dwell times on the eye region. This indicates that skilled lipreaders are better at holistic processing and at processing the eyes and the mouth in parallel, supporting the gaze hypothesis and the social coordination model, whereas less skilled lipreaders process the face less efficiently as a whole, rely more heavily on the mouth, and fail to obtain good compensation through compensatory strategies.

10.
In the context of face processing, the skill of processing speech from faces (speechreading) occupies a unique cognitive and neuropsychological niche. Neuropsychological dissociations in two cases (Campbell et al., 1986) suggested a very clear pattern: speechreading, but not face recognition, can be impaired by left-hemisphere damage, while face-recognition impairment consequent to right-hemisphere damage leaves speechreading unaffected. However, this story soon proved too simple, as neuroimaging techniques began to reveal further, more detailed patterns. These patterns, moreover, were readily accommodated within the Bruce and Young (1986) model. Speechreading requires structural encoding of faces as faces, but further analysis of visible speech is supported by a network comprising several lateral temporal regions and inferior frontal regions. Posterior superior temporal regions play a significant role in speechreading natural speech, including audiovisual binding in hearing people. In deaf people, similar regions and circuits are implicated. Although these detailed developments were not predicted by Bruce and Young, their model has stood the test of time, affording a structural framework for exploring speechreading in terms of face processing.

11.
Although speechreading can be facilitated by auditory or tactile supplements, the process that integrates cues across modalities is not well understood. This paper describes two “optimal processing” models for the types of integration that can be used in speechreading consonant segments and compares their predictions with those of the Fuzzy Logical Model of Perception (FLMP; Massaro, 1987). In “pre-labelling” integration, continuous sensory data are combined across modalities before response labels are assigned. In “post-labelling” integration, the responses that would be made under unimodal conditions are combined, and a joint response is derived from the pair. To describe pre-labelling integration, confusion matrices are characterized by a multidimensional decision model that allows performance to be described by a subject's sensitivity and bias in using continuous-valued cues. The cue space is characterized by the locations of stimulus and response centres. The distance between a pair of stimulus centres determines how well two stimuli can be distinguished in a given experiment. In the multimodal case, the cue space is assumed to be the product space of the cue spaces corresponding to the stimulation modes. Measurements of multimodal accuracy in five modern studies of consonant identification are more consistent with the predictions of the pre-labelling integration model than with those of the FLMP or the post-labelling model.
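For reference, the FLMP combines unimodal support values multiplicatively and normalizes them with a relative-goodness rule; a minimal sketch (with made-up support values) follows.

```python
# Sketch of the FLMP combination rule; the support values below are hypothetical.
import numpy as np

def flmp(auditory_support, visual_support):
    """Predicted response probabilities from unimodal fuzzy support values."""
    joint = np.asarray(auditory_support, float) * np.asarray(visual_support, float)
    return joint / joint.sum()               # relative-goodness (choice-rule) normalization

# Example: support for /ba/ vs /da/ from each modality (hypothetical numbers).
print(flmp([0.8, 0.2], [0.3, 0.7]))          # -> approximately [0.63, 0.37]
```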

12.
The cyclic variation in the energy envelope of the speech signal results from the production of speech in syllables. This acoustic property is often identified as a source of information in the perception of syllable attributes, though spectral variation can also provide this information reliably. In the present study of the relative contributions of the energy and spectral envelopes in speech perception, we employed sinusoidal replicas of utterances, which permitted us to examine the roles of these acoustic properties in establishing or maintaining time-varying perceptual coherence. Three experiments were carried out to assess the independent perceptual effects of variation in sinusoidal amplitude and frequency, using sentence-length signals. In Experiment 1, we found that the fine grain of amplitude variation was not necessary for the perception of segmental and suprasegmental linguistic attributes; in Experiment 2, we found that amplitude was nonetheless effective in influencing syllable perception, and that in some circumstances it was crucial to segmental perception; in Experiment 3, we observed that coarse-grain amplitude variation, above all, proved to be extremely important in phonetic perception. We conclude that in perceiving sinusoidal replicas, the perceiver derives much from following the coherent pattern of frequency variation and gross signal energy, but probably derives rather little from tracking the precise details of the energy envelope. These findings encourage the view that the perceiver uses time-varying acoustic properties selectively in understanding speech.
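The basic manipulation, keeping the frequency variation of a sinusoidal replica while preserving or flattening its amplitude variation, can be sketched as follows; the frequency and amplitude tracks are made-up stand-ins for formant-tracked values, not the study's stimuli.

```python
# Sketch: synthesize one sinusoidal component from a frequency track, with either
# its original amplitude contour or a flat one. All tracks here are illustrative.
import numpy as np

fs = 16000
t = np.arange(fs) / fs
freq_track = 500.0 + 300.0 * np.sin(2 * np.pi * 3 * t)      # slowly varying frequency (Hz)
amp_track = 0.5 + 0.5 * np.abs(np.sin(2 * np.pi * 4 * t))   # coarse energy envelope

phase = 2 * np.pi * np.cumsum(freq_track) / fs              # integrate frequency to phase
full_replica = amp_track * np.sin(phase)                    # frequency + amplitude variation
flat_replica = np.sin(phase)                                 # frequency variation only
```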

13.
Two aspects of visual speech processing in speechreading (word decoding and word discrimination) were tested in a group of 24 normal-hearing and a group of 20 hearing-impaired subjects. Word decoding and word discrimination performance were independent of factors related to the impairment, in both a quantitative and a qualitative sense. Decoding skill, but not discrimination skill, was associated with sentence-based speechreading. The results were interpreted as indicating that, in order to represent a critical component process in sentence-based speechreading, the visual speech perception task must entail lexically induced processing as a task demand. The theoretical status of the word decoding task as one operationalization of a speech decoding module was discussed (Fodor, 1983). An error analysis of performance in the word decoding/discrimination tasks suggested that the perception of heard stimuli, as well as the perception of lipped stimuli, was critically dependent on the same features; that is, the temporally initial phonetic segment of the word (cf. Marslen-Wilson, 1987). Implications for a theory of visual speech perception were discussed.

14.
A sentence construction experiment examining the effect of part of speech and phonological form in written-word comprehension is reported. Normal and aphasic subjects had to write sentences incorporating a given word pair; one word was a homograph (e.g., “bank”) whose meaning was context-biased by the other (e.g., “money”/“river”). The effect of three psycholinguistic factors on subjects' performance was examined: (i) the relative frequency of one meaning of the homograph as compared to the other meaning; (ii) the lexical/syntactic ambiguity of the homograph (“ball”/“can”); and (iii) the same or different phonological forms of the two meanings (“fair”/“bass”). The results are discussed in the framework of a model in which multiple special-purpose procedures are involved in normal processing, some of them being differentially impaired by brain disease in Broca's and Wernicke's aphasics.

15.
We evaluated whether lexical selection in speech production is affected by word frequency by means of two experiments. In Experiment 1 participants named pictures using utterances with the structure “pronoun + verb + adjective”. In Experiment 2 participants had to perform a gender decision task on the same pictures. Access to the noun's grammatical gender is needed in both tasks, and therefore lexical selection (lemma retrieval) is required. However, retrieval of the phonological properties (lexeme retrieval) of the referent noun is not needed to perform the tasks. In both experiments we observed faster latencies for high-frequency pictures than for low-frequency pictures. This frequency effect was stable over four repetitions of the stimuli. Our results suggest that lexical selection (lemma retrieval) is sensitive to word frequency. This interpretation runs against the hypothesis that a word's frequency exerts its effects only at the level at which the phonological properties of words are retrieved.

16.
Kim J, Davis C. Perception, 2003, 32(1): 111-120
We investigated audio-visual (AV) perceptual integration by examining the effect of seeing the speaker's synchronised moving face on masked-speech detection ability. Signal amplification and higher-level cognitive accounts of an AV advantage were contrasted, the latter by varying whether participants knew the language of the speaker. An AV advantage was shown for sentences whose mid-to-high-frequency acoustic envelope was highly correlated with articulator movement, regardless of knowledge of the language. For low-correlation sentences, knowledge of the language had a large impact; for participants with no knowledge of the language an AV inhibitory effect was found (providing support for reports of a compelling AV illusion). The results indicate a role for both sensory enhancement and higher-level cognitive factors in AV speech detection.
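The envelope/articulator correlation referred to above can be sketched roughly as follows; the band edges, smoothing cutoff, video frame rate, and the placeholder signals are all assumptions for illustration.

```python
# Sketch: correlate a mid-to-high-frequency acoustic envelope with a lip-aperture track.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample

fs, video_rate = 16000, 30
audio = np.random.randn(2 * fs)                         # placeholder sentence audio (2 s)
lip_aperture = np.random.randn(2 * video_rate)          # placeholder lip-aperture track

band = butter(4, [1000.0, 4000.0], btype="bandpass", fs=fs, output="sos")
env = np.abs(hilbert(sosfiltfilt(band, audio)))         # mid-to-high-frequency envelope
smooth = butter(4, 10.0, btype="lowpass", fs=fs, output="sos")
env = sosfiltfilt(smooth, env)

env_video = resample(env, len(lip_aperture))            # downsample to the video frame rate
r = np.corrcoef(env_video, lip_aperture)[0, 1]          # Pearson correlation
print(f"envelope/articulator correlation: {r:.2f}")
```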

17.
Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated with time courses derived from the speech signal (envelope, syllable onsets, and pitch periodicity) to capture phase-locked MEG components (14 blind, 12 sighted subjects; speech rate = 8 or 16 syllables/s; pre-defined source regions: auditory and visual cortex, inferior frontal gyrus). Blind individuals showed stronger phase locking in auditory cortex than sighted controls, and right-hemisphere visual cortex activity correlated with syllable onsets in the case of ultra-fast speech. Furthermore, inferior-frontal MEG components time-locked to pitch periodicity displayed opposite lateralization effects in sighted subjects (towards the right hemisphere) and blind subjects (towards the left). Thus, ultra-fast speech comprehension in blind individuals appears to be associated with changes in early signal-related processing mechanisms both within and outside the central-auditory terrain.

18.
Vocalizations often contain low-frequency modulations of the envelope of a high-frequency sound. The high-frequency portion of the cochlear nerve of mice (Mus musculus) generates a robust phase-locked response to these low-frequency modulations, and it can be easily recorded from the surface of the scalp. The cochlea is most sensitive to envelope modulation frequencies of approximately 500 to 2000 Hz. These responses have detection thresholds that are approximately 10 dB more sensitive than auditory brainstem responses, and they are very sharply tuned. These measurements may provide a nontraumatic means of repeatedly assessing cochlear functions involved in sound localization and perception of vocalizations.

19.
S J Prince, R A Eagle, B J Rogers. Perception, 1998, 27(11): 1345-1355
Yang and Blake (1991, Vision Research, 31, 1177-1189) investigated depth detection in stereograms containing spatially narrow-band signal and noise energies. The resulting masking functions led them to conclude that stereo vision was subserved by only two channels, peaking at 3 and 5 cycles deg⁻¹. Glennerster and Parker (1997, Vision Research, 37, 2143-2152) re-analysed these data, taking into account the relative attenuation of low- and high-frequency noise masks as a consequence of the modulation transfer function (MTF) of the early visual system. They transformed the data using an estimated MTF and found that peak masking was always at the signal frequency across a 2.8-octave range. Here we determine the MTF of the early visual system for individual subjects by measuring contrast thresholds in a 2AFC orientation-discrimination task (horizontal vs vertical) using band-limited stimuli presented in a 7 deg x 7 deg window at 4 deg eccentricity. The filtered stimuli had a bandwidth of 1.5 octaves in frequency and 15 degrees in orientation at half-height. In the subsequent stereo experiment, the same (vertical) filters were used to generate both signal and noise bands. The noise was binocularly uncorrelated and scaled by each subject's MTF. Subjects performed a 2AFC depth-discrimination task (crossed vs uncrossed disparity) to determine threshold signal contrast as a function of signal and mask frequency. The resulting functions showed that peak masking was at the signal frequency over the three-octave range tested (0.4-3.2 cycles deg⁻¹). Comparison with simple luminance-masking data from experiments with similar stimuli shows that bandwidths for stereo masking are considerably larger. These data suggest that there are multiple bandpass channels feeding into stereopsis but that their characteristics differ from those of luminance channels in pattern vision.

20.
The goal of this study was to explore the ability to discriminate languages using the visual correlates of speech (i.e., speech-reading). Participants were presented with silent video clips of an actor pronouncing two sentences (in Catalan and/or Spanish) and were asked to judge whether the sentences were in the same language or in different languages. Our results established that Spanish-Catalan bilingual speakers could discriminate running speech from their two languages on the basis of visual cues alone (Experiment 1). However, we found that this ability was critically restricted by linguistic experience, since Italian and English speakers who were unfamiliar with the test languages could not successfully discriminate the stimuli (Experiment 2). A test of Spanish monolingual speakers revealed that knowledge of only one of the two test languages was sufficient to achieve the discrimination, although at a lower level of accuracy than that seen in bilingual speakers (Experiment 3). Finally, we evaluated the ability to identify the language by speech-reading particularly distinctive words (Experiment 4). The results obtained are in accord with recent proposals arguing that the visual speech signal is rich in informational content, above and beyond what traditional accounts based solely on visemic confusion matrices would predict.
