Twenty female students in speech-language pathology provided magnitude-estimation scaling responses for the speech intelligibility and acceptability of audiotaped speech samples varying systematically in the number of consonant sounds produced correctly. Analysis indicated no significant overall differences between listeners' judgments of intelligibility and acceptability; however, listeners tended to judge samples with fewer than 50% of the consonants correct as more acceptable than intelligible, and they judged samples with more than 50% of the consonants correct as less acceptable than intelligible.

People naturally move their heads when they speak, and our study shows that this rhythmic head motion conveys linguistic information. Three-dimensional head and face motion and the acoustics of a talker producing Japanese sentences were recorded and analyzed. The head movement correlated strongly with the pitch (fundamental frequency) and amplitude of the talker's voice. In a perception study, Japanese subjects viewed realistic talking-head animations based on these movement recordings in a speech-in-noise task. The animations allowed the head motion to be manipulated without changing other characteristics of the visual or acoustic speech. Subjects correctly identified more syllables when natural head motion was present in the animation than when it was eliminated or distorted. These results suggest that nonverbal gestures such as head movements play a more direct role in the perception of speech than previously known.

Text cues facilitate the perception of spoken sentences to which they are semantically related (Zekveld, Rudner, et al., 2011). In this study, semantically related and unrelated cues preceding sentences evoked more activation in middle temporal gyrus (MTG) and inferior frontal gyrus (IFG) than nonword cues, regardless of acoustic quality (speech in noise or speech in quiet). Larger verbal working memory (WM) capacity (reading span) was associated with greater intelligibility benefit obtained from related cues, with less speech-related activation in the left superior temporal gyrus and left anterior IFG, and with more activation in right medial frontal cortex for related versus unrelated cues. Better ability to comprehend masked text was associated with greater ability to disregard unrelated cues, and with more activation in left angular gyrus (AG). We conclude that individual differences in cognitive abilities are related to activation in a speech-sensitive network including left MTG, IFG and AG during cued speech perception.

A continuous speech message alternated between the left and right ears retains generally good intelligibility, except at certain critical rates of alternation of about 3–4 switching cycles/sec. In the present experiment, subjects heard speech alternated between the two ears at eight different switching frequencies, and at four different speech rates. Results support an earlier contention that the critical intelligibility parameter in alternated speech is average speech content per ear segment, rather than absolute time per ear. Implications are discussed both in terms of critical speech segments in auditory analysis and in neural processing of binaural auditory information.
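
The distinction between time per ear and speech content per ear segment can be illustrated with a little arithmetic. The sketch below is not taken from the study: it assumes that each switching cycle is split equally between the two ears and that speech content is counted in words, and the function name and the example rates are hypothetical.

```python
def words_per_ear_segment(speech_rate_wpm, switching_hz):
    """Average number of words falling in one uninterrupted ear segment.

    Assumes (hypothetically) that one switching cycle is one left segment
    plus one right segment of equal length, so a segment lasts 1 / (2 * f) s.
    """
    words_per_second = speech_rate_wpm / 60.0
    segment_seconds = 1.0 / (2.0 * switching_hz)
    return words_per_second * segment_seconds


# Doubling both the speech rate and the switching rate halves the time per
# ear segment but leaves the speech content per segment unchanged.
slow = words_per_ear_segment(speech_rate_wpm=120, switching_hz=2.0)
fast = words_per_ear_segment(speech_rate_wpm=240, switching_hz=4.0)
```

Under these assumptions, `slow` and `fast` describe conditions with different absolute time per ear but the same average content per segment, which is the kind of contrast that lets the two accounts be separated.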

Listeners are able to accurately recognize speech despite variation in acoustic cues across contexts, such as different speaking rates. Previous work has suggested that listeners use rate information (indicated by vowel length; VL) to modify their use of context-dependent acoustic cues, like voice-onset time (VOT), a primary cue to voicing. We present several experiments and simulations that offer an alternative explanation: that listeners treat VL as a phonetic cue rather than as an indicator of speaking rate, and that they rely on general cue-integration principles to combine information from VOT and VL. We demonstrate that listeners use the two cues independently, that VL is used in both naturally produced and synthetic speech, and that the effects of stimulus naturalness can be explained by a cue-integration model. Together, these results suggest that listeners do not interpret VOT relative to rate information provided by VL and that the effects of speaking rate can be explained by more general cue-integration principles.
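
One simple way to picture two cues being used independently is to multiply their separate likelihoods for each voicing category. The sketch below does that with Gaussian likelihoods for VOT and VL. It is only an illustration and is not the cue-integration model used in the study; the category means and standard deviations are invented placeholder values, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Hypothetical (mean, sd) pairs in milliseconds for each cue and category.
# These numbers are placeholders for illustration, not measured values.
CATEGORIES = {
    "voiced": {"vot": (5.0, 10.0), "vl": (200.0, 50.0)},
    "voiceless": {"vot": (50.0, 15.0), "vl": (170.0, 50.0)},
}


def p_voiceless(vot_ms, vl_ms):
    """Posterior probability of 'voiceless' with equal priors.

    Each cue contributes its own likelihood; the cues are combined by
    multiplication, i.e. they are treated as independent.
    """
    likelihood = {}
    for name, cues in CATEGORIES.items():
        likelihood[name] = (
            gaussian_pdf(vot_ms, *cues["vot"]) * gaussian_pdf(vl_ms, *cues["vl"])
        )
    return likelihood["voiceless"] / (likelihood["voiced"] + likelihood["voiceless"])
```

In this sketch VL shifts the voicing decision directly, as a cue in its own right, without rescaling how VOT is interpreted.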

The reliability of magnitude-estimation scaling as a measure of overall clarity of speech was investigated. Forty subjects (M age = 19 yr.) provided magnitude-estimation responses for nine audiotaped versions of a nonsense sentence varying systematically in number of correct consonant phonemes. There was no significant difference in the magnitude-estimation responses of the subjects during two test sessions separated by one week. Analysis suggested that magnitude-estimation scaling is a reliable measure of speech clarity/intelligibility. This finding is discussed in relation to speech samples varying in aspects other than number of consonant phonemes correct and possible further clinical research applications.

When speakers repair speech errors, they plan the repair in the context of an abandoned word (the error) that is usually similar in meaning or form. Two picture-naming experiments tested whether the error's lexical representations influence repair planning. Context pictures were sometimes replaced with target pictures; the picture names were related in meaning or form or were unrelated. The authors measured target picture-naming latencies separately for trials in which the context name was interrupted or completed. Interrupted trials showed semantic interference and phonological facilitation, whereas completed trials showed semantic facilitation and phonological interference. Thus, errors influence repair production. The authors explain the polarity of these effects in terms of the literature on context effects in word production.

Listeners must cope with a great deal of variability in the speech signal, and thus theories of speech perception must also account for variability, which comes from a number of sources, including variation between accents. It is well known that there is a processing cost when listening to speech in an accent other than one's own, but recent work has suggested that this cost is reduced when listening to a familiar accent widely represented in the media, and/or when short amounts of exposure to an accent are provided. Little is known, however, about how these factors (long-term familiarity and short-term familiarization with an accent) interact. The current study tested this interaction by playing listeners difficult-to-segment sentences in noise, before and after a familiarization period where the same sentences were heard in the clear, allowing us to manipulate short-term familiarization. Listeners were speakers of either Glasgow English or Standard Southern British English, and they listened to speech in either their own or the other accent, thereby allowing us to manipulate long-term familiarity. Results suggest that both long-term familiarity and short-term familiarization mitigate the perceptual processing costs of listening to an accent that is not one's own, but seem not to compensate for them entirely, even when the accent is widely heard in the media.

A threshold method has been developed for determination of the maximum rate of connected speech understood by an individual. The method is similar to the Békésy method for the determination of pure-tone thresholds but differs from it in that rate of speech is varied rather than intensity of a tone. Instrumentation that varies speech rate with or without pitch changes was developed and is described in some detail.
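
The general idea of Békésy-style tracking applied to speech rate can be sketched as follows: the rate rises while the listener reports understanding and falls while they do not, and the threshold is estimated from the points where the direction reverses. This is a minimal sketch under stated assumptions, not the instrumentation described in the abstract; the step size, starting rate, number of reversals, and the deterministic simulated listener are all hypothetical.

```python
def track_rate_threshold(understands, start_rate=150.0, step=5.0,
                         max_reversals=8, max_trials=1000):
    """Estimate the maximum understood speech rate by up-down tracking.

    `understands(rate)` returns True when the listener signals that speech
    at `rate` (e.g. words per minute) is understood. The rate is raised
    after a True response and lowered after a False one; the estimate is
    the mean of the rates at which the direction reversed.
    """
    rate = start_rate
    direction = None  # +1 while speeding up, -1 while slowing down
    reversals = []
    for _ in range(max_trials):
        new_direction = 1 if understands(rate) else -1
        if direction is not None and new_direction != direction:
            reversals.append(rate)
            if len(reversals) >= max_reversals:
                break
        direction = new_direction
        rate += direction * step
    return sum(reversals) / len(reversals) if reversals else None


def simulated_listener(rate, limit=275.0):
    """Hypothetical listener who understands any rate up to `limit`."""
    return rate <= limit


estimate = track_rate_threshold(simulated_listener)
```

With a real listener the responses would be noisy, so the reversal points would scatter around the threshold rather than alternate between two adjacent rates as they do for this idealized listener.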

How does the brain extract invariant properties of variable-rate speech? A neural model, called PHONET, is developed to explain aspects of this process and, along the way, data about perceptual context effects. For example, in consonant-vowel (CV) syllables, such as /ba/ and /wa/, an increase in the duration of the vowel can cause a switch in the percept of the preceding consonant from /w/ to /b/ (J.L. Miller & Liberman, 1979). The frequency extent of the initial formant transitions of fixed duration also influences the percept (Schwab, Sawusch, & Nusbaum, 1981). PHONET quantitatively simulates over 98% of the variance in these data, using a single set of parameters. The model also qualitatively explains many data about other perceptual context effects. In the model, C and V inputs are filtered by parallel auditory streams that respond preferentially to the transient and sustained properties of the acoustic signal before being stored in parallel working memories. A lateral inhibitory network of onset- and rate-sensitive cells in the transient channel extracts measures of frequency transition rate and extent. Greater activation of the transient stream can increase the processing rate in the sustained stream via a cross-stream automatic gain control interaction. The stored activities across these gain-controlled working memories provide a basis for rate-invariant perception, since the transient-to-sustained gain control tends to preserve the relative activities across the transient and sustained working memories as speech rate changes. Comparisons with alternative models tested suggest that the fit cannot be attributed to the simplicity of the data. Brain analogues of model cell types are described.

Outside of the laboratory, listening conditions are often less than ideal, and when attending to sounds from a particular source, portions are often obliterated by extraneous noises. However, listeners possess rather elegant reconstructive mechanisms. Restoration can be complete, so that missing segments are indistinguishable from those actually present and the listener is unaware that the signal is fragmented. This phenomenon, called temporal induction (TI), has been studied extensively with nonverbal signals and to a lesser extent with speech. Earlier studies have demonstrated that TI can produce illusory continuity spanning gaps of a few hundred milliseconds when portions of a signal are replaced by a louder sound capable of masking the signal were it actually present. The present study employed various types of speech signals with periodic gaps and measured the effects upon intelligibility produced by filling these gaps with noises. Enhancement of intelligibility through multiple phonemic restoration occurred when the acoustic requirements for TI were met and when sufficient contextual information was available in the remaining speech fragments. It appears that phonemic restoration is a specialized form of TI that uses linguistic skills for the reconstruction of obliterated speech.

An automated threshold method has been developed for determining the maximum rate of speech understood by individual listeners. Two experiments were undertaken to determine whether the threshold was related to the comprehension of speech or to speech intelligibility. The first experiment compared thresholds of two types of rapid speech reportedly different in intelligibility: simple speeded speech and speech compressed by the sampling method. The second experiment sought to determine the relationship of the threshold to traditional comprehension measures. The results are discussed in terms of the intelligibility and comprehensibility of speech.

Understanding low-intelligibility speech is effortful. In three experiments, we examined the effects of intelligibility on working memory (WM) demands imposed by perception of synthetic speech. In all three experiments, a primary speeded word recognition task was paired with a secondary WM-load task designed to vary the availability of WM capacity during speech perception. Speech intelligibility was varied either by training listeners to use available acoustic cues in a more diagnostic manner (as in Experiment 1) or by providing listeners with more informative acoustic cues (i.e., better speech quality, as in Experiments 2 and 3). In the first experiment, training significantly improved intelligibility and recognition speed; increasing WM load significantly slowed recognition. A significant interaction between training and load indicated that the benefit of training on recognition speed was observed only under low memory load. In subsequent experiments, listeners received no training; intelligibility was manipulated by changing synthesizers. Improving intelligibility without training improved recognition accuracy, and increasing memory load still decreased it, but more intelligible speech did not produce more efficient use of available WM capacity. This suggests that perceptual learning modifies the way available capacity is used, perhaps by increasing the use of more phonetically informative features and/or by decreasing use of less informative ones.

