Similar articles
Found 20 similar articles (search time: 31 ms)
1.
When deleted segments of speech are replaced by extraneous sounds rather than silence, the missing speech fragments may be perceptually restored and intelligibility improved. This phonemic restoration (PhR) effect has been used to measure various aspects of speech processing, with deleted portions of speech typically being replaced by stochastic noise. However, several recent studies of PhR have used speech-modulated noise, which may provide amplitude-envelope cues concerning the replaced speech. The present study compared the effects upon intelligibility of replacing regularly spaced portions of speech with stochastic (white) noise versus speech-modulated noise. In Experiment 1, filling periodic gaps in sentences with noise modulated by the amplitude envelope of the deleted speech fragments produced twice the intelligibility increase obtained with interpolated stochastic noise. Moreover, when lists of isolated monosyllables were interrupted in Experiment 2, interpolation of speech-modulated noise increased intelligibility whereas stochastic noise reduced intelligibility. The augmentation of PhR produced by modulated noise appeared without practice, suggesting that speech processing normally involves not only a narrowband analysis of spectral information but also a wideband integration of amplitude levels across critical bands. This is of considerable theoretical interest, but it also suggests that since PhRs produced by speech-modulated noise utilize potent bottom-up cues provided by the noise, they differ from the PhRs produced by extraneous sounds, such as coughs and stochastic noise.
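The contrast between the two gap fillers in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: the moving-RMS envelope, the uniform white noise, the window length, and the names `envelope` and `fill_gaps` are all assumptions made for illustration, and the input is a toy amplitude-varying tone rather than speech.

```python
import math
import random


def envelope(signal, window):
    """Moving RMS amplitude envelope (an assumed, simple envelope follower)."""
    half = window // 2
    env = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        env.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return env


def fill_gaps(signal, period, gap, mode, rng, window=32):
    """Replace the last `gap` samples of every `period` samples.

    mode = "silence"    -> zeros
    mode = "stochastic" -> white noise at a fixed level (the overall RMS)
    mode = "modulated"  -> white noise scaled by the envelope of the deleted samples
    """
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    env = envelope(signal, window)
    out = list(signal)
    for i in range(len(signal)):
        if i % period >= period - gap:  # sample falls inside a deleted segment
            noise = rng.uniform(-1.0, 1.0)  # uniform white noise sample
            if mode == "silence":
                out[i] = 0.0
            elif mode == "stochastic":
                out[i] = rms * noise
            elif mode == "modulated":
                out[i] = env[i] * noise
            else:
                raise ValueError(mode)
    return out


# Toy input (not real speech): a tone whose loudness rises and falls.
toy = [
    math.sin(2 * math.pi * i / 20) * (0.2 + 0.8 * abs(math.sin(math.pi * i / 400)))
    for i in range(800)
]
filled = fill_gaps(toy, period=200, gap=100, mode="modulated", rng=random.Random(0))
```

In the "modulated" mode the noise in each gap rises and falls with the level of the samples it replaces, which is the amplitude-envelope cue the abstract refers to; in the "stochastic" mode the noise level is constant and carries no such cue.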

2.
In order to function effectively as a means of communication, speech must be intelligible under the noisy conditions encountered in everyday life. Two types of perceptual synthesis have been reported that can reduce or cancel the effects of masking by extraneous sounds: Phonemic restoration can enhance intelligibility when segments are replaced or masked by noise, and contralateral induction can prevent mislateralization by effectively restoring speech masked at one ear when it is heard in the other. The present study reports a third type of perceptual synthesis induced by noise: enhancement of intelligibility produced by adding noise to spectral gaps. In most of the experiments, the speech stimuli consisted of two widely separated narrow bands of speech (center frequencies of 370 and 6000 Hz, each band having high-pass and low-pass slopes of 115 dB/octave meeting at the center frequency). These very narrow bands effectively reduced the available information to frequency-limited patterns of amplitude fluctuation lacking information concerning formant structure and frequency transitions. When stochastic noise was introduced into the gap separating the two speech bands, intelligibility increased for “everyday” sentences, for sentences that varied in the transitional probability of keywords, and for monosyllabic word lists. Effects produced by systematically varying noise amplitude and noise bandwidth are reported, and the implications of some of the novel effects observed are discussed.

3.
4.
This study assessed intelligibility in a dysarthric patient with Parkinson's disease (PD) across five speech production tasks: spontaneous speech, repetition, reading, repeated singing, and spontaneous singing, using the same phrases for all but spontaneous singing. The results show that this speaker was significantly less intelligible when speaking spontaneously than in the other tasks. Acoustic analysis suggested that relative intensity and word duration were not independently linked to intelligibility, but dysfluencies (from perceptual analysis) and articulatory/resonance patterns (from acoustic records) were related to intelligibility in predictable ways. These data indicate that speech production task may be an important variable to consider during the evaluation of dysarthria. As speech production efficiency was found to vary with task in a patient with Parkinson's disease, these results can be related to recent models of basal ganglia function in motor performance.

5.
To comprehend speech in most environments, listeners must combine some but not all sounds from across a wide range of frequencies. Three experiments were conducted to examine the role of amplitude comodulation in performing an essential part of this function: the grouping together of the simultaneous components of a speech signal. Each of the experiments used time-varying sinusoidal (TVS) sentences (Remez, Rubin, Pisoni, & Carrell, 1981) as base stimuli because their component tones are acoustically unrelated. The independence of the three tones reduced the number of confounding grouping cues available compared with those found in natural or computer-synthesized speech (e.g., fundamental frequency and simultaneity of harmonic onset). In each of the experiments, the TVS base stimuli were amplitude modulated to determine whether this modulation would lead to appropriate grouping of the three tones as reflected by sentence intelligibility. Experiment 1 demonstrated that amplitude comodulation at 100 Hz did improve the intelligibility of TVS sentences. Experiment 2 showed that the component tones of a TVS sentence must be comodulated (as opposed to independently modulated) for improvements in intelligibility to be found. Experiment 3 showed that the comodulation rates that led to intelligibility improvements were consistent with the effective rates found in experiments that examined the grouping of complex nonspeech sounds by common temporal envelopes (e.g., comodulation masking release; Hall, Haggard, & Fernandes, 1984). The results of these experiments support the claim that certain basic temporal-envelope processing capabilities of the human auditory system contribute to the perception of fluent speech.
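The difference between comodulated and independently modulated tones can be sketched as follows. This is only an illustration under stated assumptions: real TVS sentences use tones whose frequencies track speech over time, whereas the three fixed frequencies here are hypothetical, the raised-cosine modulator is an assumed shape, and giving each tone a different rate is just one assumed way of making the modulation independent. Only the 100 Hz comodulation rate comes from the abstract.

```python
import math


def modulator(rate_hz, t):
    """Raised-cosine amplitude modulator in the range [0, 1] (assumed shape)."""
    return 0.5 * (1.0 + math.cos(2 * math.pi * rate_hz * t))


def modulated_tones(freqs_hz, mod_rates_hz, duration_s=0.05, sample_rate=8000):
    """Sum of sinusoids, each multiplied by its own amplitude modulator."""
    n = int(duration_s * sample_rate)
    out = []
    for i in range(n):
        t = i / sample_rate
        out.append(sum(
            modulator(rate, t) * math.sin(2 * math.pi * freq * t)
            for freq, rate in zip(freqs_hz, mod_rates_hz)
        ))
    return out


# Hypothetical fixed tone frequencies standing in for the three TVS components.
tones = [530.0, 1470.0, 2510.0]

# Comodulated: every tone shares the same 100 Hz modulator.
comodulated = modulated_tones(tones, [100.0, 100.0, 100.0])

# Independently modulated (assumed here to mean a different rate per tone).
independent = modulated_tones(tones, [100.0, 130.0, 170.0])
```

With a shared modulator all three tones swell and fade together, so the summed signal falls to zero at every modulator minimum; with separate modulators the tones no longer share a common temporal envelope.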

6.
To comprehend speech in most environments, listeners must combine some but not all sounds from across a wide range of frequencies. Three experiments were conducted to examine the role of amplitude comodulation in performing an essential part of this function: the grouping together of the simultaneous components of a speech signal. Each of the experiments used time-varying sinusoidal (TVS) sentences (Remez, Rubin, Pisoni, & Carrell, 1981) as base stimuli because their component tones are acoustically unrelated. The independence of the three tones reduced the number of confounding grouping cues available compared with those found in natural or computer-synthesized speech (e.g., fundamental frequency and simultaneity of harmonic onset). In each of the experiments, the TVS base stimuli were amplitude modulated to determine whether this modulation would lead to appropriate grouping of the three tones as reflected by sentence intelligibility. Experiment 1 demonstrated that amplitude comodulation at 100 Hz did improve the intelligibility of TVS sentences. Experiment 2 showed that the component tones of a TVS sentence must be comodulated (as opposed to independently modulated) for improvements in intelligibility to be found. Experiment 3 showed that the comodulation rates that led to intelligibility improvements were consistent with the effective rates found in experiments that examined the grouping of complex nonspeech sounds by common temporal envelopes (e.g., comodulation masking release; Hall, Haggard, & Fernandes, 1984). The results of these experiments support the claim that certain basic temporal-envelope processing capabilities of the human auditory system contribute to the perception of fluent speech.

7.
An automated threshold method has been developed for determining the maximum rate of speech understood by individual listeners. Two experiments were undertaken to determine whether the threshold was related to the comprehension of speech or to speech intelligibility. The first experiment compared thresholds of two types of rapid speech reportedly different in intelligibility: simple speeded speech and speech compressed by the sampling method. The second experiment sought to determine the relationship of the threshold to traditional comprehension measures. The results are discussed in terms of the intelligibility and comprehensibility of speech.

8.
The present paper describes a simple and relatively inexpensive brain stimulator circuit for generating trains of conditioning (C) and test (T) pulse pairs for refractory period and excitability cycle analyses. C and T outputs are constant-current monophasic cathodal pulses of adjustable frequency, duration, amplitude, and delay. C and T pulses can be controlled manually or through logic programming and can be fed out through the same or separate channels. The stimulator can be operated on either ac or dc supplies and, when battery operated, features a high degree of stimulus isolation.

9.
The intelligibility of word lists subjected to various types of spectral filtering has been studied extensively. Although words used for communication are usually present in sentences rather than lists, there has been no systematic report of the intelligibility of lexical components of narrowband sentences. In the present study, we found that surprisingly little spectral information is required to identify component words when sentences are heard through narrow spectral slits. Four hundred twenty listeners (21 groups of 20 subjects) were each presented with 100 bandpass filtered CID (“everyday speech”) sentences; separate groups received center frequencies of 370, 530, 750, 1100, 1500, 2100, 3000, 4200, and 6000 Hz at 70 dBA SPL. In Experiment 1, intelligibility of single 1/3-octave bands with steep filter slopes (96 dB/octave) averaged more than 95% for sentences centered at 1100, 1500, and 2100 Hz. In Experiment 2, we used the same center frequencies with extremely narrow bands (slopes of 115 dB/octave intersecting at the center frequency, resulting in a nominal bandwidth of 1/20 octave). Despite the severe spectral tilt for all frequencies of this impoverished spectrum, intelligibility remained relatively high for most bands, with the greatest intelligibility (77%) at 1500 Hz. In Experiments 1 and 2, the bands centered at 370 and 6000 Hz provided little useful information when presented individually, but in each experiment they interacted synergistically when combined. The present findings demonstrate the adaptive flexibility of mechanisms used for speech perception and are discussed in the context of the LAME model of opportunistic multilevel processing.
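To give a sense of how narrow these bands are, the sketch below computes nominal band edges for a given center frequency and bandwidth in octaves. It assumes the band is placed symmetrically around the center on a logarithmic frequency axis, which is the usual convention for fractional-octave bands but is not stated in the abstract; the function name `band_edges` is hypothetical.

```python
def band_edges(center_hz, bandwidth_octaves):
    """Lower and upper edges of a band centered (in log frequency) on center_hz."""
    half = bandwidth_octaves / 2.0
    return center_hz * 2.0 ** (-half), center_hz * 2.0 ** half


# Compare the two nominal bandwidths mentioned in the abstract at 1500 Hz.
for label, width in (("1/3 octave", 1 / 3), ("1/20 octave", 1 / 20)):
    low, high = band_edges(1500.0, width)
    print(f"{label} band at 1500 Hz: {low:.1f} to {high:.1f} Hz")
```

Under this convention the upper edge divided by the lower edge equals 2 raised to the bandwidth in octaves, so the 1/20-octave band spans only a few percent of its center frequency.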

10.
A continuous speech message alternated between the left and right ears retains generally good intelligibility, except at certain critical rates of alternation of about 3–4 switching cycles/sec. In the present experiment, subjects heard speech alternated between the two ears at eight different switching frequencies, and at four different speech rates. Results support an earlier contention that the critical intelligibility parameter in alternated speech is average speech content per ear segment, rather than absolute time per ear. Implications are discussed both in terms of critical speech segments in auditory analysis and in neural processing of binaural auditory information.

11.
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

12.
Understanding low-intelligibility speech is effortful. In three experiments, we examined the effects of intelligibility on working memory (WM) demands imposed by perception of synthetic speech. In all three experiments, a primary speeded word recognition task was paired with a secondary WM-load task designed to vary the availability of WM capacity during speech perception. Speech intelligibility was varied either by training listeners to use available acoustic cues in a more diagnostic manner (as in Experiment 1) or by providing listeners with more informative acoustic cues (i.e., better speech quality, as in Experiments 2 and 3). In the first experiment, training significantly improved intelligibility and recognition speed; increasing WM load significantly slowed recognition. A significant interaction between training and load indicated that the benefit of training on recognition speed was observed only under low memory load. In subsequent experiments, listeners received no training; intelligibility was manipulated by changing synthesizers. Improving intelligibility without training improved recognition accuracy, and increasing memory load still decreased it, but more intelligible speech did not produce more efficient use of available WM capacity. This suggests that perceptual learning modifies the way available capacity is used, perhaps by increasing the use of more phonetically informative features and/or by decreasing use of less informative ones.

13.
The cyclic variation in the energy envelope of the speech signal results from the production of speech in syllables. This acoustic property is often identified as a source of information in the perception of syllable attributes, though spectral variation can also provide this information reliably. In the present study of the relative contributions of the energy and spectral envelopes in speech perception, we employed sinusoidal replicas of utterances, which permitted us to examine the roles of these acoustic properties in establishing or maintaining time-varying perceptual coherence. Three experiments were carried out to assess the independent perceptual effects of variation in sinusoidal amplitude and frequency, using sentence-length signals. In Experiment 1, we found that the fine grain of amplitude variation was not necessary for the perception of segmental and suprasegmental linguistic attributes; in Experiment 2, we found that amplitude was nonetheless effective in influencing syllable perception, and that in some circumstances it was crucial to segmental perception; in Experiment 3, we observed that coarse-grain amplitude variation, above all, proved to be extremely important in phonetic perception. We conclude that in perceiving sinusoidal replicas, the perceiver derives much from following the coherent pattern of frequency variation and gross signal energy, but probably derives rather little from tracking the precise details of the energy envelope. These findings encourage the view that the perceiver uses time-varying acoustic properties selectively in understanding speech.

14.
Speech intelligibility performance with an in-the-ear microphone embedded in a custom-molded deep-insertion earplug was compared with results obtained using a free-field microphone. Intelligibility differences between microphones were further analyzed to assess whether reduced intelligibility was specific to certain sound classes. Thirty-six participants completed the Modified Rhyme Test using recordings made with each microphone. While speech intelligibility for both microphones was highly accurate, intelligibility with the free-field microphone was significantly better than with the in-the-ear microphone. There were significant effects of place and manner of sound production. Significant differences in recognition among specific phonemes were also revealed. Implications included modifying the in-the-ear microphone to transmit more high-frequency energy. Use of the in-the-ear microphone was limited by significant loss of high-frequency energy of the speech signal, which resulted in reduced intelligibility for some sounds; however, the in-the-ear microphone is a promising technology for effective communication in military environments.

15.
16.
Although temporal processing is used in a wide range of sensory and motor tasks, there is little evidence as to whether a single centralized clock or a distributed system underlies timing in the range of tens to hundreds of milliseconds. We investigated this question by studying whether learning on an auditory interval discrimination task generalizes across stimulus types, intervals, and frequencies. The degree to which improvements in timing carry over to different stimulus features constrains the neural mechanisms underlying timing. Human subjects trained on a 100- or 200-msec interval discrimination task showed an improvement in temporal resolution. This learning generalized to a perceptually distinct duration stimulus, as well as to the trained interval presented with tones at untrained spectral frequencies. The improvement in performance did not generalize to untrained intervals. To determine if spectral generalization was dependent on the importance of frequency information in the task, subjects were simultaneously trained on two different intervals identified by frequency. As a whole, our results indicate that the brain uses circuits that are dedicated to specific time spans, and that each circuit processes stimuli across nontemporal stimulus features. The patterns of generalization additionally indicate that temporal learning does not rely on changes in early, subcortical processing, because the nontemporal features are encoded by different channels at early stages.

17.
A device is described which has 10 input and 2 output lines. Grounding an input causes a pulse with a specific amplitude, polarity, and duration to appear on one of the output lines. Pulse parameters can be set by front-panel controls. Thus, 10 distinct events can be coded by associating a unique pulse with each event. These pulses can be recorded on one (or two) channels of a magnetic tape recorder for subsequent processing. The use of this coder in the study of event-related potentials is described.

18.
When speech is rapidly alternated between the two ears, intelligibility declines as the rate of alternation approaches 3 to 5 switching cycles per second, and then, paradoxically, returns to a good level beyond that point. We tested intelligibility when shadowing was used as a response measure (Experiment 1), when recall was used as a response measure (Experiment 2), and when time-compression was used to vary the speech rate of the presented materials (Experiment 3). In spite of claims that older adults are generally slower in switching attention, younger and older adults did not differ in the critical alternation rates producing minimal intelligibility. We suggest that the point of minimal intelligibility in alternated speech reflects an interaction between (1) the rate of disruption induced by breaking the speech stream between two sound sources, (2) the amount of contextual information per ear, and (3) the size of the silent gaps separating the speech elements that must be perceptually bridged.

19.
The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or relatively slow acoustic variations, is unclear. We recorded the cardiac activity of 82 near-term fetuses (38 weeks GA) in quiet sleep during a silent control condition and four 15 s streams presented at 90 dB SPL Leq: two piano melodies with opposite contours, a natural Icelandic sentence, and a chimera of the sentence in which all spectral information was replaced with broadband noise, leaving its specific temporal variations in amplitude intact without any phonological information. All stimuli elicited a heart rate deceleration. The response patterns to the melodies were the same and differed significantly from those observed with the Icelandic sentence and its chimera, which did not differ. The melodies elicited a monophasic heart rate deceleration, indicating a stimulus orienting reflex, while the Icelandic sentence and its chimera evoked a sustained lower magnitude response, indicating a sustained attentional response or more focused information processing. A conservative interpretation of the data is that near-term fetuses can perceive sound streams and the rapid temporal variations in amplitude that are specific to speech sounds with no spectral variations at all.

20.
Speech prosody has traditionally been considered solely in terms of its auditory features, yet correlated visual features exist, such as head and eyebrow movements. This study investigated the extent to which visual prosodic features are able to affect the perception of the auditory features. Participants were presented with videos of a speaker pronouncing two words, with visual features of emphasis on one of these words. For each trial, participants saw one video where the two words were identical in both pitch and amplitude, and another video where there was a difference in either pitch or amplitude that was congruent or incongruent with the visual changes. Participants were asked to decide which video contained the sound difference. Thresholds were obtained for the congruent and incongruent videos, and for an auditory-alone condition. It was found that the congruent thresholds were better than the incongruent thresholds for both pitch and amplitude changes. Interestingly, the congruent thresholds for amplitude were better than for the auditory-alone condition, which implies that the visual features improve sensitivity to loudness changes. These results demonstrate that visual stimuli can affect auditory thresholds for changes in pitch and amplitude, and furthermore support the view that visual prosodic features enhance speech processing.
