Similar Documents
A total of 20 similar documents were retrieved.
1.
Text cues facilitate the perception of spoken sentences to which they are semantically related (Zekveld, Rudner, et al., 2011). In this study, semantically related and unrelated cues preceding sentences evoked more activation in middle temporal gyrus (MTG) and inferior frontal gyrus (IFG) than nonword cues, regardless of acoustic quality (speech in noise or speech in quiet). Larger verbal working memory (WM) capacity (reading span) was associated with greater intelligibility benefit obtained from related cues, with less speech-related activation in the left superior temporal gyrus and left anterior IFG, and with more activation in right medial frontal cortex for related versus unrelated cues. Better ability to comprehend masked text was associated with greater ability to disregard unrelated cues, and with more activation in left angular gyrus (AG). We conclude that individual differences in cognitive abilities are related to activation in a speech-sensitive network including left MTG, IFG and AG during cued speech perception.

2.
Spoken language perception may be constrained by a listener's cognitive resources, including verbal working memory (WM) capacity and basic auditory perception mechanisms. For Japanese listeners, it is unknown how, or even if, these resources are involved in the processing of pitch accent at the word level. The present study examined the extent to which native Japanese speakers could make correctness judgments on and categorize spoken Japanese words by pitch accent pattern, and how verbal WM capacity and acoustic pitch sensitivity related to perception ability. Results showed that Japanese listeners were highly accurate at judging pitch accent correctness (M = 93%), but that the more cognitively demanding accent categorization task yielded notably lower performance (M = 61%). Of chief interest was the finding that acoustic pitch sensitivity significantly predicted accuracy scores on both perception tasks, while verbal WM had a predictive role only for the categorization of a specific accent pattern. These results indicate, first, that task demands greatly influence accuracy and, second, that basic cognitive capacities continue to support perception of lexical prosody even in adult listeners.

3.
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.

4.
Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds which requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150–350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. They also have major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.

5.
During speech perception, listeners make judgments about the phonological category of sounds by taking advantage of multiple acoustic cues for each phonological contrast. Perceptual experiments have shown that listeners weight these cues differently. How do listeners weight and combine acoustic cues to arrive at an overall estimate of the category for a speech sound? Here, we present several simulations using mixture-of-Gaussians models that learn cue weights and combine cues on the basis of their distributional statistics. We show that a cue-weighting metric in which cues receive weight as a function of their reliability at distinguishing phonological categories provides a good fit to the perceptual data obtained from human listeners, but only when these weights emerge through the dynamics of learning. These results suggest that cue weights can be readily extracted from the speech signal through unsupervised learning processes.
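As a rough sketch of reliability-based cue weighting of the kind described above (an illustrative simplification, not the authors' model: it uses labeled category samples and a d'-style separation metric rather than the unsupervised mixture-of-Gaussians learning they report, and the function names and toy values are assumptions):

import numpy as np

def cue_reliability(samples_a, samples_b):
    # Separation of the two category distributions along one cue: the
    # farther apart they are relative to their spread, the more reliable
    # the cue is for distinguishing the categories.
    pooled_sd = np.sqrt((np.var(samples_a) + np.var(samples_b)) / 2)
    return abs(np.mean(samples_a) - np.mean(samples_b)) / pooled_sd

def cue_weights(training_data):
    # training_data: {cue_name: (category_A_samples, category_B_samples)}.
    # Weights are normalized reliabilities, so they sum to 1.
    rel = {cue: cue_reliability(a, b) for cue, (a, b) in training_data.items()}
    total = sum(rel.values())
    return {cue: r / total for cue, r in rel.items()}

def classify(token, training_data, weights):
    # Weighted Gaussian log-likelihood-ratio decision for category A vs. B.
    score = 0.0
    for cue, value in token.items():
        a, b = training_data[cue]
        ll_a = -0.5 * ((value - np.mean(a)) / np.std(a)) ** 2 - np.log(np.std(a))
        ll_b = -0.5 * ((value - np.mean(b)) / np.std(b)) ** 2 - np.log(np.std(b))
        score += weights[cue] * (ll_a - ll_b)
    return "A" if score > 0 else "B"

# Toy data: VOT separates the two categories well, F0 only weakly,
# so VOT ends up carrying most of the weight.
rng = np.random.default_rng(0)
data = {"VOT": (rng.normal(10, 5, 200), rng.normal(60, 5, 200)),
        "F0": (rng.normal(100, 20, 200), rng.normal(110, 20, 200))}
w = cue_weights(data)
print(w, classify({"VOT": 15, "F0": 108}, data, w))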

6.
Many older listeners report difficulties in understanding speech in noisy situations. Working memory and other cognitive skills may modulate older listeners’ ability to use context information to alleviate the effects of noise on spoken-word recognition. In the present study, we investigated whether verbal working memory predicts older adults’ ability to immediately use context information in the recognition of words embedded in sentences, presented in different listening conditions. In a phoneme-monitoring task, older adults were asked to detect as fast and as accurately as possible target phonemes in sentences spoken by a target speaker. Target speech was presented without noise, with fluctuating speech-shaped noise, or with competing speech from a single distractor speaker. The gradient measure of contextual probability (derived from a separate offline rating study) affected the speed of recognition. Contextual facilitation was modulated by older listeners’ verbal working memory (measured with a backward digit span task) and age across listening conditions. Working memory and age, as well as hearing loss, were also the most consistent predictors of overall listening performance. Older listeners’ immediate benefit from context in spoken-word recognition thus relates to their ability to keep and update a semantic representation of the sentence content in working memory.

7.
Reflected sounds are often treated as an acoustic problem because they produce false localization cues and decrease speech intelligibility. However, their properties are shaped by the acoustic properties of the environment and therefore are a potential source of information about that environment. The objective of this study was to determine whether information carried by reflected sounds can be used by listeners to enhance their awareness of their auditory environment. Twelve listeners participated in two auditory training tasks in which they learned to identify three environments based on a limited subset of sounds and then were tested to determine whether they could transfer that learning to new, unfamiliar sounds. Results showed that significant learning occurred despite the task difficulty. An analysis of stimulus attributes suggests that it is easiest to learn to identify reflected sound when it occurs in sounds with longer decay times and broadly distributed dominant spectral components.

8.
Despite spectral and temporal discontinuities in the speech signal, listeners normally report coherent phonetic patterns corresponding to the phonemes of a language that they know. What is the basis for the internal coherence of phonetic segments? According to one account, listeners achieve coherence by extracting and integrating discrete cues; according to another, coherence arises automatically from general principles of auditory form perception; according to a third, listeners perceive speech patterns as coherent because they are the acoustic consequences of coordinated articulatory gestures in a familiar language. We tested these accounts in three experiments by training listeners to hear a continuum of three-tone, modulated sine wave patterns, modeled after a minimal pair contrast between three-formant synthetic speech syllables, either as distorted speech signals carrying a phonetic contrast (speech listeners) or as distorted musical chords carrying a nonspeech auditory contrast (music listeners). The music listeners could neither integrate the sine wave patterns nor perceive their auditory coherence to arrive at consistent, categorical percepts, whereas the speech listeners judged the patterns as speech almost as reliably as the synthetic syllables on which they were modeled. The outcome is consistent with the hypothesis that listeners perceive the phonetic coherence of a speech signal by recognizing acoustic patterns that reflect the coordinated articulatory gestures from which they arose.

9.
The effects of perceptual learning of talker identity on the recognition of spoken words and sentences were investigated in three experiments. In each experiment, listeners were trained to learn a set of 10 talkers’ voices and were then given an intelligibility test to assess the influence of learning the voices on the processing of the linguistic content of speech. In the first experiment, listeners learned voices from isolated words and were then tested with novel isolated words mixed in noise. The results showed that listeners who were given words produced by familiar talkers at test showed better identification performance than did listeners who were given words produced by unfamiliar talkers. In the second experiment, listeners learned novel voices from sentence-length utterances and were then presented with isolated words. The results showed that learning a talker’s voice from sentences did not generalize well to identification of novel isolated words. In the third experiment, listeners learned voices from sentence-length utterances and were then given sentence-length utterances produced by familiar and unfamiliar talkers at test. We found that perceptual learning of novel voices from sentence-length utterances improved speech intelligibility for words in sentences. Generalization and transfer from voice learning to linguistic processing was found to be sensitive to the talker-specific information available during learning and test. These findings demonstrate that increased sensitivity to talker-specific information affects the perception of the linguistic properties of speech in isolated words and sentences.

10.
Perception of speech in competing speech is facilitated by spatial separation of the target and distracting speech, but this benefit may arise at either a perceptual or a cognitive level of processing. Load theory predicts different effects of perceptual and cognitive (working memory) load on selective attention in flanker task contexts, suggesting that this paradigm may be used to distinguish levels of interference. Two experiments examined interference from competing speech during a word recognition task under different perceptual and working memory loads in a dual-task paradigm. Listeners identified words produced by a talker of one gender while ignoring a talker of the other gender. Perceptual load was manipulated using a nonspeech response cue, with response conditional upon either one or two acoustic features (pitch and modulation). Memory load was manipulated with a secondary task consisting of one or six visually presented digits. In the first experiment, the target and distractor were presented at different virtual locations (0° and 90°, respectively), whereas in the second, all the stimuli were presented from the same apparent location. Results suggest that spatial cues improve resistance to distraction in part by reducing working memory demand.

11.
Three experiments tested the role of verbal versus visuo-spatial working memory in the comprehension of co-speech iconic gestures. In Experiment 1, participants viewed congruent discourse primes in which the speaker's gestures matched the information conveyed by his speech, and incongruent ones in which the semantic content of the speaker's gestures diverged from that in his speech. Discourse primes were followed by picture probes that participants judged as being either related or unrelated to the preceding clip. Performance on this picture probe classification task was faster and more accurate after congruent than incongruent discourse primes. The effect of discourse congruency on response times was linearly related to measures of visuo-spatial, but not verbal, working memory capacity, as participants with greater visuo-spatial WM capacity benefited more from congruent gestures. In Experiments 2 and 3, participants performed the same picture probe classification task under conditions of high and low loads on concurrent visuo-spatial (Experiment 2) and verbal (Experiment 3) memory tasks. Effects of discourse congruency and verbal WM load were additive, while effects of discourse congruency and visuo-spatial WM load were interactive. Results suggest that congruent co-speech gestures facilitate multi-modal language comprehension, and indicate an important role for visuo-spatial WM in these speech–gesture integration processes.

12.
The performance of Spanish-English bilinguals in two perception tasks, using a synthetic speech continuum varying in voice onset time, was compared with the performance of Spanish and English monolinguals. Voice onset time in speech production was also compared between these groups. The bilinguals' perception results differed from those of both monolingual groups. Their production results in each of their two languages conformed with those obtained from the corresponding monolingual group. The perceptual results are interpreted in terms of differences in the use of available acoustic cues by bilingual and monolingual listeners of English and Spanish.

13.
Listeners are able to accurately recognize speech despite variation in acoustic cues across contexts, such as different speaking rates. Previous work has suggested that listeners use rate information (indicated by vowel length; VL) to modify their use of context-dependent acoustic cues, like voice-onset time (VOT), a primary cue to voicing. We present several experiments and simulations that offer an alternative explanation: that listeners treat VL as a phonetic cue rather than as an indicator of speaking rate, and that they rely on general cue-integration principles to combine information from VOT and VL. We demonstrate that listeners use the two cues independently, that VL is used in both naturally produced and synthetic speech, and that the effects of stimulus naturalness can be explained by a cue-integration model. Together, these results suggest that listeners do not interpret VOT relative to rate information provided by VL and that the effects of speaking rate can be explained by more general cue-integration principles.
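To make the contrast between the two accounts concrete, here is a minimal sketch (the function names and all coefficient values are made-up assumptions, not the authors' fitted model): under independent cue integration, VOT and vowel length (VL) contribute additive, separately weighted evidence for voicelessness, whereas under rate normalization the VOT evidence is evaluated relative to VL.

import numpy as np

def p_voiceless_independent(vot_ms, vl_ms, b0=-8.0, b_vot=0.35, b_vl=-0.02):
    # Independent cue integration: each cue gets its own weight and the
    # evidence is combined additively (logistic link); no VOT-by-VL interaction.
    return 1.0 / (1.0 + np.exp(-(b0 + b_vot * vot_ms + b_vl * vl_ms)))

def p_voiceless_rate_normalized(vot_ms, vl_ms, b0=-6.0, b_ratio=40.0):
    # Rate-normalization alternative: VOT is interpreted relative to vowel
    # length, so only the VOT/VL ratio matters.
    return 1.0 / (1.0 + np.exp(-(b0 + b_ratio * vot_ms / vl_ms)))

# A longer vowel shifts the category boundary in both models, but only the
# independent-cue model treats VL as evidence in its own right.
print(p_voiceless_independent(35, 150), p_voiceless_independent(35, 250))
print(p_voiceless_rate_normalized(35, 150), p_voiceless_rate_normalized(35, 250))

The abstract's finding that listeners use the two cues independently corresponds to the first form, where VL enters as a cue in its own right rather than as a divisor that rescales VOT.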

14.
In 5 experiments, the authors investigated how listeners learn to recognize unfamiliar talkers and how experience with specific utterances generalizes to novel instances. Listeners were trained over several days to identify 10 talkers from natural, sinewave, or reversed speech sentences. The sinewave signals preserved phonetic and some suprasegmental properties while eliminating natural vocal quality. In contrast, the reversed speech signals preserved vocal quality while distorting temporally based phonetic properties. The training results indicate that listeners learned to identify talkers even from acoustic signals lacking natural vocal quality. Generalization performance varied across the different signals and depended on the salience of phonetic information. The results suggest similarities in the phonetic attributes underlying talker recognition and phonetic perception.

15.
This study investigated to what extent advance planning during sentence production is affected by a concurrent cognitive load. In two picture–word interference experiments in which participants produced subject–verb–object sentences while ignoring auditory distractor words, we assessed advance planning at a phonological (lexeme) and at an abstract–lexical (lemma) level under visuospatial or verbal working memory (WM) load. At the phonological level, subject and object nouns were found to be activated before speech onset with concurrent visuospatial WM load, but only subject nouns were found to be activated with concurrent verbal WM load, indicating a reduced planning scope as a function of type of WM load (Experiment 1). By contrast, at the abstract–lexical level, subject and object nouns were found to be activated regardless of type of concurrent load (Experiment 2). In both experiments, sentence planning had a more detrimental effect on concurrent verbal WM task performance than on concurrent visuospatial WM task performance. Overall, our results suggest that advance planning at the phonological level is more affected by a concurrently performed verbal WM task than advance planning at the abstract–lexical level. Also, they indicate an overlap of resources allocated to phonological planning in speech production and verbal WM.

16.
The reported research investigates how listeners recognize coarticulated phonemes. First, 2 data sets from experiments on the recognition of coarticulated phonemes published by D. H. Whalen (1989) are reanalyzed. The analyses indicate that listeners used categorization strategies involving a hierarchical dependency. Two new experiments are reported investigating the production and perception of fricative-vowel syllables. On the basis of measurements of acoustic cues on a large set of natural utterances, it was predicted that listeners would use categorization strategies involving a dependency of the fricative categorization on the perceived vowel. The predictions were tested in a perception experiment using a 2-dimensional synthetic fricative-vowel continuum. Model analyses of the results pooled across listeners confirmed the predictions. Individual analyses revealed some variability in the categorization dependencies used by different participants.
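A toy sketch of what such a hierarchical categorization dependency could look like (the cue names, boundary values, and slopes are all illustrative assumptions, not the fitted models from the study): the vowel is categorized from its own cue, the fricative boundary shifts with the perceived vowel, and the fricative decision is marginalized over the vowel percept.

import math

def p_fricative_s(fric_centroid_hz, vowel_f2_hz):
    # Hierarchical dependency: P(/s/) = sum over vowels of
    # P(/s/ | vowel) * P(vowel), with a vowel-conditional fricative boundary.
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # Step 1: categorize the vowel from its F2 (low F2 -> rounded /u/).
    p_u = sigmoid(-(vowel_f2_hz - 1200.0) / 100.0)
    p_a = 1.0 - p_u
    # Step 2: the /s/-like boundary on the frication spectrum sits lower
    # before /u/, compensating for coarticulatory lowering of the centroid.
    p_s_given_u = sigmoid((fric_centroid_hz - 4500.0) / 300.0)
    p_s_given_a = sigmoid((fric_centroid_hz - 5200.0) / 300.0)
    # Step 3: marginalize the fricative decision over the perceived vowel.
    return p_u * p_s_given_u + p_a * p_s_given_a

# The same frication is reported as /s/ more often before a perceived /u/.
print(p_fricative_s(4800.0, 900.0), p_fricative_s(4800.0, 1600.0))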

17.
Bradlow, A. R., & Bent, T. (2008). Cognition, 106(2), 707-729.
This study investigated talker-dependent and talker-independent perceptual adaptation to foreign-accented English. Experiment 1 investigated talker-dependent adaptation by comparing native English listeners' recognition accuracy for Chinese-accented English across single and multiple talker presentation conditions. Results showed that the native listeners adapted to the foreign-accented speech over the course of the single talker presentation condition with some variation in the rate and extent of this adaptation depending on the baseline sentence intelligibility of the foreign-accented talker. Experiment 2 investigated talker-independent perceptual adaptation to Chinese-accented English by exposing native English listeners to Chinese-accented English and then testing their perception of English produced by a novel Chinese-accented talker. Results showed that, if exposed to multiple talkers of Chinese-accented English during training, native English listeners could achieve talker-independent adaptation to Chinese-accented English. Taken together, these findings provide evidence for highly flexible speech perception processes that can adapt to speech that deviates substantially from the pronunciation norms in the native talker community along multiple acoustic-phonetic dimensions.

18.
To comprehend speech in most environments, listeners must combine some but not all sounds from across a wide range of frequencies. Three experiments were conducted to examine the role of amplitude comodulation in performing an essential part of this function: the grouping together of the simultaneous components of a speech signal. Each of the experiments used time-varying sinusoidal (TVS) sentences (Remez, Rubin, Pisoni, & Carrell, 1981) as base stimuli because their component tones are acoustically unrelated. The independence of the three tones reduced the number of confounding grouping cues available compared with those found in natural or computer-synthesized speech (e.g., fundamental frequency and simultaneity of harmonic onset). In each of the experiments, the TVS base stimuli were amplitude modulated to determine whether this modulation would lead to appropriate grouping of the three tones as reflected by sentence intelligibility. Experiment 1 demonstrated that amplitude comodulation at 100 Hz did improve the intelligibility of TVS sentences. Experiment 2 showed that the component tones of a TVS sentence must be comodulated (as opposed to independently modulated) for improvements in intelligibility to be found. Experiment 3 showed that the comodulation rates that led to intelligibility improvements were consistent with the effective rates found in experiments that examined the grouping of complex nonspeech sounds by common temporal envelopes (e.g., comodulation masking release; Hall, Haggard, & Fernandes, 1984). The results of these experiments support the claim that certain basic temporal-envelope processing capabilities of the human auditory system contribute to the perception of fluent speech.
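As a rough illustration of the comodulation manipulation (the sample rate, tone trajectories, and modulation depth below are made up for the sketch; they are not the actual TVS stimuli from Remez et al. or from the experiments above): three tone tracks are given either one shared 100 Hz amplitude envelope, or envelopes with independent phases, which removes the common temporal-envelope grouping cue.

import numpy as np

fs = 22050                                   # sample rate (Hz), assumed
t = np.arange(int(0.5 * fs)) / fs            # 500 ms of signal

# Three slowly varying tone tracks standing in for the formant-like
# components of a time-varying sinusoidal (TVS) sentence.
f1 = 500.0 + 100.0 * np.sin(2 * np.pi * 3.0 * t)
f2 = 1500.0 + 300.0 * np.sin(2 * np.pi * 2.0 * t)
f3 = 2500.0 + 200.0 * np.sin(2 * np.pi * 1.0 * t)
tones = [np.sin(2 * np.pi * np.cumsum(f) / fs) for f in (f1, f2, f3)]

# Comodulation: one shared 100 Hz amplitude envelope on all three tones.
shared_env = 0.5 * (1.0 + np.sin(2 * np.pi * 100.0 * t))
comodulated = sum(shared_env * tone for tone in tones)

# Independent modulation: each tone gets its own envelope phase, so the
# tones no longer share a common temporal envelope (cf. Experiment 2).
phases = (0.0, 2 * np.pi / 3, 4 * np.pi / 3)
independent = sum(0.5 * (1.0 + np.sin(2 * np.pi * 100.0 * t + ph)) * tone
                  for ph, tone in zip(phases, tones))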

19.
Recent studies have documented substantial variability among typical listeners in how gradiently they categorize speech sounds, and this variability in categorization gradience may link to how listeners weight different cues in the incoming signal. The present study tested the relationship between categorization gradience and cue weighting across two sets of English contrasts, each varying orthogonally in two acoustic dimensions. Participants performed a four-alternative forced-choice identification task in a visual world paradigm while their eye movements were monitored. We found that (a) greater categorization gradience derived from behavioral identification responses corresponds to larger secondary cue weights derived from eye movements; (b) the relationship between categorization gradience and secondary cue weighting is observed across cues and contrasts, suggesting that categorization gradience may be a consistent within-individual property in speech perception; and (c) listeners who showed greater categorization gradience tend to adopt a buffered processing strategy, especially when cues arrive asynchronously in time.

20.
To comprehend speech in most environments, listeners must combine some but not all sounds from across a wide range of frequencies. Three experiments were conducted to examine the role of amplitude comodulation in performing an essential part of this function: the grouping together of the simultaneous components of a speech signal. Each of the experiments used time-varying sinusoidal (TVS) sentences (Remez, Rubin, Pisoni, & Carrell, 1981) as base stimuli because their component tones are acoustically unrelated. The independence of the three tones reduced the number of confounding grouping cues available compared with those found in natural or computer-synthesized speech (e.g., fundamental frequency and simultaneity of harmonic onset). In each of the experiments, the TVS base stimuli were amplitude modulated to determine whether this modulation would lead to appropriate grouping of the three tones as reflected by sentence intelligibility. Experiment 1 demonstrated that amplitude comodulation at 100 Hz did improve the intelligibility of TVS sentences. Experiment 2 showed that the component tones of a TVS sentence must be comodulated (as opposed to independently modulated) for improvements in intelligibility to be found. Experiment 3 showed that the comodulation rates that led to intelligibility improvements were consistent with the effective rates found in experiments that examined the grouping of complex nonspeech sounds by common temporal envelopes (e.g., comodulation masking release; Hall, Haggard, & Fernandes, 1984). The results of these experiments support the claim that certain basic temporal-envelope processing capabilities of the human auditory system contribute to the perception of fluent speech.
