Related Articles

20 related articles were retrieved.
1.
When listeners hear a sinusoidal replica of a sentence, they perceive linguistic properties despite the absence of short-time acoustic components typical of vocal signals. Is this accomplished by a postperceptual strategy that accommodates the anomalous acoustic pattern ad hoc, or is a sinusoidal sentence understood by the ordinary means of speech perception? If listeners treat sinusoidal signals as speech signals however unlike speech they may be, then perception should exhibit the commonplace sensitivity to the dimensions of the originating vocal tract. The present study, employing sinusoidal signals, raised this issue by testing the identification of target /bVt/, or b-vowel-t, syllables occurring in sentences that differed in the range of frequency variation of their component tones. Vowel quality of target syllables was influenced by this acoustic correlate of vocal-tract scale, implying that the perception of these nonvocal signals includes a process of vocal-tract normalization. Converging evidence suggests that the perception of sinusoidal vowels depends on the relation among component tones and not on the phonetic likeness of each tone in isolation. The findings support the general claim that sinusoidal replicas of natural speech signals are perceptible phonetically because they preserve time-varying information present in natural signals.
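A sinusoidal replica of the kind described above replaces each formant with a single time-varying tone and sums the tones. The following is a minimal sketch of that idea, not the authors' actual procedure; the formant frequency and amplitude tracks (here schematic ramps) are assumed to be given.

```python
import numpy as np

def sinewave_replica(freq_tracks, amp_tracks, sr=16000):
    """Sum one time-varying sinusoid per formant track.

    Each frequency track gives the instantaneous frequency (Hz) of one
    tone, sampled once per audio sample; the phase is obtained by
    integrating (cumulatively summing) the instantaneous frequency.
    """
    out = np.zeros(len(freq_tracks[0]))
    for freq, amp in zip(freq_tracks, amp_tracks):
        phase = 2.0 * np.pi * np.cumsum(freq) / sr
        out += amp * np.sin(phase)
    return out

# Toy example: three tones following schematic formant-like trajectories
# over a 300-ms interval (values are illustrative, not measured).
sr = 16000
n = int(sr * 0.3)
f1 = np.linspace(400.0, 700.0, n)    # tone 1 stands in for F1
f2 = np.linspace(1000.0, 1200.0, n)  # tone 2 stands in for F2
f3 = np.full(n, 2500.0)              # tone 3 stands in for F3
amps = [np.full(n, a) for a in (1.0, 0.5, 0.25)]
replica = sinewave_replica([f1, f2, f3], amps, sr)
```

Because the three tones track only the coarse pattern of spectral variation, the result lacks harmonics, noise, and natural vocal quality, which is exactly what makes such signals useful for the questions posed in these studies.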

2.
In 5 experiments, the authors investigated how listeners learn to recognize unfamiliar talkers and how experience with specific utterances generalizes to novel instances. Listeners were trained over several days to identify 10 talkers from natural, sinewave, or reversed speech sentences. The sinewave signals preserved phonetic and some suprasegmental properties while eliminating natural vocal quality. In contrast, the reversed speech signals preserved vocal quality while distorting temporally based phonetic properties. The training results indicate that listeners learned to identify talkers even from acoustic signals lacking natural vocal quality. Generalization performance varied across the different signals and depended on the salience of phonetic information. The results suggest similarities in the phonetic attributes underlying talker recognition and phonetic perception.

3.
During a conversation, we hear the sound of the talker as well as the intended message. Traditional models of speech perception posit that acoustic details of a talker's voice are not encoded with the message whereas more recent models propose that talker identity is automatically encoded. When shadowing speech, listeners often fail to detect a change in talker identity. The present study was designed to investigate whether talker changes would be detected when listeners are actively engaged in a normal conversation, and visual information about the speaker is absent. Participants were called on the phone, and during the conversation the experimenter was surreptitiously replaced by another talker. Participants rarely noticed the change. However, when explicitly monitoring for a change, detection increased. Voice memory tests suggested that participants remembered only coarse information about both voices, rather than fine details. This suggests that although listeners are capable of change detection, voice information is not continuously monitored at a fine-grain level of acoustic representation during natural conversation and is not automatically encoded. Conversational expectations may shape the way we direct attention to voice characteristics and perceive differences in voice.

4.
Our studies revealed two stable modes of perceptual organization, one based on attributes of auditory sensory elements and another based on attributes of patterned sensory variation composed by the aggregation of sensory elements. In a dual-task method, listeners attended concurrently to both aspects, component and pattern, of a sine wave analogue of a word. Organization of elements was indexed by several single-mode tests of auditory form perception to verify the perceptual segregation of either an individual formant of a synthetic word or a tonal component of a sinusoidal word analogue. Organization of patterned variation was indexed by a test of lexical identification. The results show the independence of the perception of auditory and phonetic form, which appear to be differently organized concurrent effects of the same acoustic cause.

6.
This study examined the effect of native language background on listeners' perception of native and non-native vowels spoken by native (Hong Kong Cantonese) and non-native (Mandarin and Australian English) speakers. Listeners completed a discrimination task and an identification task, with and without visual cues, in clear and noisy conditions. Results indicated that visual cues did not facilitate perception, and performance was better in clear than in noisy conditions. More importantly, the Cantonese talker's vowels were the easiest to discriminate, and the Mandarin talker's vowels were as intelligible as the native talkers' speech. These results supported the interlanguage speech native intelligibility benefit patterns proposed by Hayes-Harb et al. (J Phonetics 36:664–679, 2008). The Mandarin and English listeners' identification patterns were similar to those of the Cantonese listeners, suggesting that they might have assimilated Cantonese vowels to their closest native vowels. In addition, listeners' perceptual patterns were consistent with the principles of Best's Perceptual Assimilation Model (Best in Speech perception and linguistic experience: issues in cross-language research. York Press, Timonium, 1995).

7.
The effects of perceptual learning of talker identity on the recognition of spoken words and sentences were investigated in three experiments. In each experiment, listeners were trained to learn a set of 10 talkers' voices and were then given an intelligibility test to assess the influence of learning the voices on the processing of the linguistic content of speech. In the first experiment, listeners learned voices from isolated words and were then tested with novel isolated words mixed in noise. The results showed that listeners who were given words produced by familiar talkers at test showed better identification performance than did listeners who were given words produced by unfamiliar talkers. In the second experiment, listeners learned novel voices from sentence-length utterances and were then presented with isolated words. The results showed that learning a talker's voice from sentences did not generalize well to identification of novel isolated words. In the third experiment, listeners learned voices from sentence-length utterances and were then given sentence-length utterances produced by familiar and unfamiliar talkers at test. We found that perceptual learning of novel voices from sentence-length utterances improved speech intelligibility for words in sentences. Generalization and transfer from voice learning to linguistic processing was found to be sensitive to the talker-specific information available during learning and test. These findings demonstrate that increased sensitivity to talker-specific information affects the perception of the linguistic properties of speech in isolated words and sentences.

8.
The own-race bias in memory for faces has been a rich source of empirical work on the mechanisms of person perception. This effect is thought to arise because the face-perception system differentially encodes the relevant structural dimensions of features and their configuration based on experiences with different groups of faces. However, the effects of sociocultural experiences on person perception abilities in other identity-conveying modalities like audition have not been explored. Investigating an own-race bias in the auditory domain provides a unique opportunity for studying whether person identification is a modality-independent construct and how it is sensitive to asymmetric cultural experiences. Here we show that an own-race bias in talker identification arises from asymmetric experience with different spoken dialects. When listeners categorized voices by race (White or Black), a subset of the Black voices were categorized as sounding White, while the opposite case was unattested. Acoustic analyses indicated listeners' perceptions about race were consistent with differences in specific phonetic and phonological features. In a subsequent person-identification experiment, the Black voices initially categorized as sounding White elicited an own-race bias from White listeners, but not from Black listeners. These effects are inconsistent with person-perception models that strictly analogize faces and voices based on recognition from only structural features. Our results demonstrate that asymmetric exposure to spoken dialect, independent from talkers' physical characteristics, affects auditory perceptual expertise for talker identification. Person perception thus additionally relies on socioculturally-acquired dynamic information, which may be represented by different mechanisms in different sensory modalities.

9.
Given sequences of digits with temporally equidistant acoustic onsets, listeners do not perceive them as isochronous (Morton, Marcus, & Frankish, 1976). In order for the sequences to be perceptually isochronous, systematic departures from acoustic isochrony must be introduced. These acoustic departures are precisely those that talkers generate when asked to produce an isochronous sequence (Fowler, 1979), suggesting that listeners judge isochrony based on acoustic information about articulatory timing. The present experiment was an attempt to test directly whether perceptually isochronous sequences have isochronous articulatory correlates. Electromyographic potentials were recorded from the orbicularis oris muscle when speakers produced sequences of monosyllables "as if speaking in time to a metronome." Sequences were devised so that lip-muscle activity was related to the syllable-initial consonant, the stressed vowel, or the stressed vowel and final consonant. Results indicate that isochronous muscular activity accompanies both isochronous and anisochronous acoustic signals produced under instructions to generate isochronous sequences. These results support an interpretation of the perceptual phenomenon reported by Morton et al. to the effect that listeners judge isochrony of the talker's articulations as they are reflected in the acoustic signal.

10.
We examined the effect of perceptual training on a well-established hemispheric asymmetry in speech processing. Eighteen listeners were trained to use a within-category difference in voice onset time (VOT) to cue talker identity. Successful learners (n=8) showed faster response times for stimuli presented only to the left ear than for those presented only to the right. The development of a left-ear/right-hemisphere advantage for processing a prototypically phonetic cue supports a model of speech perception in which lateralization is driven by functional demands (talker identification vs. phonetic categorization) rather than by acoustic stimulus properties alone.

11.
The cyclic variation in the energy envelope of the speech signal results from the production of speech in syllables. This acoustic property is often identified as a source of information in the perception of syllable attributes, though spectral variation can also provide this information reliably. In the present study of the relative contributions of the energy and spectral envelopes in speech perception, we employed sinusoidal replicas of utterances, which permitted us to examine the roles of these acoustic properties in establishing or maintaining time-varying perceptual coherence. Three experiments were carried out to assess the independent perceptual effects of variation in sinusoidal amplitude and frequency, using sentence-length signals. In Experiment 1, we found that the fine grain of amplitude variation was not necessary for the perception of segmental and suprasegmental linguistic attributes; in Experiment 2, we found that amplitude was nonetheless effective in influencing syllable perception, and that in some circumstances it was crucial to segmental perception; in Experiment 3, we observed that coarse-grain amplitude variation, above all, proved to be extremely important in phonetic perception. We conclude that in perceiving sinusoidal replicas, the perceiver derives much from following the coherent pattern of frequency variation and gross signal energy, but probably derives rather little from tracking the precise details of the energy envelope. These findings encourage the view that the perceiver uses time-varying acoustic properties selectively in understanding speech.

12.
Two talkers' productions of the same phoneme may be quite different acoustically, whereas their productions of different speech sounds may be virtually identical. Despite this lack of invariance in the relationship between the speech signal and linguistic categories, listeners experience phonetic constancy across a wide range of talkers, speaking styles, linguistic contexts, and acoustic environments. The authors present evidence that perceptual sensitivity to talker variability involves an active cognitive mechanism: Listeners expecting to hear 2 different talkers differing only slightly in average pitch showed performance costs typical of adjusting to talker variability, whereas listeners hearing the same materials but expecting a single talker or given no special instructions did not show these performance costs. The authors discuss the implications for understanding phonetic constancy despite variability between talkers (and other sources of variability) and for theories of speech perception. The results provide further evidence for active, controlled processing in real-time speech perception and are consistent with a model of talker normalization that involves contextual tuning.

13.
The research investigates how listeners segment the acoustic speech signal into phonetic segments and explores implications that the segmentation strategy may have for their perception of the (apparently) context-sensitive allophones of a phoneme. Two manners of segmentation are contrasted. In one, listeners segment the signal into temporally discrete, context-sensitive segments. In the other, which may be consistent with the talker's production of the segments, they partition the signal into separate, but overlapping, segments freed of their contextual influences. Two complementary predictions of the second hypothesis are tested. First, listeners will use anticipatory coarticulatory information for a segment as information for the forthcoming segment. Second, subjects will not hear anticipatory coarticulatory information as part of the phonetic segment with which it co-occurs in time. The first hypothesis is supported by findings on a choice reaction time procedure; the second is supported by findings on a 4IAX discrimination test. Implications of the findings for theories of speech production, perception, and of the relation between the two are considered.

14.
It has been well documented that listeners are able to estimate speaking rate when listening to a talker, but almost no work has been done on perception of rate information provided by looking at a talker's face. In the present study, the method of magnitude estimation was used to collect estimates of the rate at which a talker was speaking. The estimates were collected under four experimental conditions: auditory only, visual only, combined auditory-visual, and inverted visual only. The results showed no difference in the slope of the functions relating perceived rate to physical rate for the auditory only, visual only, and combined auditory-visual presentations. There was, however, a significant difference between the normal visual-only and the inverted-visual presentations. These results indicate that there is visual rate information available on a talker's face and, more importantly, suggest that there is a correspondence between the auditory and visual modalities for the perception of speaking rate, but only when the visual information is presented in its normal orientation.

15.
Research has shown that speaking rate provides an important context for the perception of certain acoustic properties of speech. For example, syllable duration, which varies as a function of speaking rate, has been shown to influence the perception of voice onset time (VOT) for syllable-initial stop consonants. The purpose of the present experiments was to examine the influence of syllable duration when the initial portion of the syllable was produced by one talker and the remainder of the syllable was produced by a different talker. A short-duration and a long-duration /bi/-/pi/ continuum were synthesized with pitch and formant values appropriate to a female talker. When presented to listeners for identification, these stimuli demonstrated the typical effect of syllable duration on the voicing boundary: a shorter VOT boundary for the short stimuli than for the long stimuli. An /i/ vowel, synthesized with pitch and formant values appropriate to a male talker, was added to the end of each of the short tokens, producing a new hybrid continuum. Although the overall syllable duration of the hybrid stimuli equaled the original long stimuli, they produced a VOT boundary similar to that for the short stimuli. In a second experiment, two new /i/ vowels were synthesized. One had a pitch appropriate to a female talker with formant values appropriate to a male talker; the other had a pitch appropriate to a male talker and formants appropriate to a female talker. These vowels were used to create two new hybrid continua. In a third experiment, new hybrid continua were created by using more extreme male formant values. The results of both experiments demonstrated that the hybrid tokens with a change in pitch acted like the short stimuli, whereas the tokens with a change in formants acted like the long stimuli. A fourth experiment demonstrated that listeners could hear a change in talker with both sets of hybrid tokens. These results indicate that continuity of pitch but not formant structure appears to be the critical factor in the calculation of speaking rate within a syllable.
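The VOT boundaries discussed in this and several other entries are usually estimated by fitting a psychometric function to identification responses and reading off the 50% crossover. Below is a minimal sketch of that analysis with hypothetical data (the VOT steps and response proportions are invented, not taken from these studies).

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, boundary, slope):
    """Proportion of voiceless ("pi") responses as a function of VOT (ms)."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

# Hypothetical identification data for a /bi/-/pi/ continuum: eight VOT
# steps (ms) and the proportion of "pi" responses at each step.
vot_steps = np.array([5, 10, 15, 20, 25, 30, 35, 40], dtype=float)
p_voiceless = np.array([0.02, 0.05, 0.12, 0.35, 0.55, 0.85, 0.95, 0.98])

# The fitted `boundary` parameter is the VOT at which responses cross 50%;
# shifts in this value across conditions index effects like syllable
# duration or perceived speaking rate.
(boundary, slope), _ = curve_fit(logistic, vot_steps, p_voiceless,
                                 p0=[25.0, 0.3])
print(f"Estimated VOT boundary: {boundary:.1f} ms")
```

Comparing the fitted boundary for short versus long (or hybrid) continua is what licenses statements such as "the hybrid tokens acted like the short stimuli."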

16.
Bradlow, A. R., & Bent, T. (2008). Cognition, 106(2), 707-729.
This study investigated talker-dependent and talker-independent perceptual adaptation to foreign-accent English. Experiment 1 investigated talker-dependent adaptation by comparing native English listeners' recognition accuracy for Chinese-accented English across single and multiple talker presentation conditions. Results showed that the native listeners adapted to the foreign-accented speech over the course of the single talker presentation condition with some variation in the rate and extent of this adaptation depending on the baseline sentence intelligibility of the foreign-accented talker. Experiment 2 investigated talker-independent perceptual adaptation to Chinese-accented English by exposing native English listeners to Chinese-accented English and then testing their perception of English produced by a novel Chinese-accented talker. Results showed that, if exposed to multiple talkers of Chinese-accented English during training, native English listeners could achieve talker-independent adaptation to Chinese-accented English. Taken together, these findings provide evidence for highly flexible speech perception processes that can adapt to speech that deviates substantially from the pronunciation norms in the native talker community along multiple acoustic-phonetic dimensions.

17.

The nondeterministic relationship between speech acoustics and abstract phonemic representations imposes a challenge for listeners to maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers compared to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.

18.
People naturally move their heads when they speak, and our study shows that this rhythmic head motion conveys linguistic information. Three-dimensional head and face motion and the acoustics of a talker producing Japanese sentences were recorded and analyzed. The head movement correlated strongly with the pitch (fundamental frequency) and amplitude of the talker's voice. In a perception study, Japanese subjects viewed realistic talking-head animations based on these movement recordings in a speech-in-noise task. The animations allowed the head motion to be manipulated without changing other characteristics of the visual or acoustic speech. Subjects correctly identified more syllables when natural head motion was present in the animation than when it was eliminated or distorted. These results suggest that nonverbal gestures such as head movements play a more direct role in the perception of speech than previously known.

19.
Vocal Expression and Perception of Emotion
Speech is an acoustically rich signal that provides considerable personal information about talkers. The expression of emotions in speech sounds and corresponding abilities to perceive such emotions are both fundamental aspects of human communication. Findings from studies seeking to characterize the acoustic properties of emotional speech indicate that speech acoustics provide an external cue to the level of nonspecific arousal associated with emotional processes and, to a lesser extent, the relative pleasantness of experienced emotions. Outcomes from perceptual tests show that listeners are able to accurately judge emotions from speech at rates far greater than expected by chance. More detailed characterizations of these production and perception aspects of vocal communication will necessarily involve knowledge about differences among talkers, such as those components of speech that provide comparatively stable cues to individual talkers' identities.

20.
The acoustic structure of the speech signal is extremely variable due to a variety of contextual factors, including talker characteristics and speaking rate. To account for the listener's ability to adjust to this variability, speech researchers have posited the existence of talker and rate normalization processes. The current study examined how the perceptual system encoded information about talker and speaking rate during phonetic perception. Experiments 1–3 examined this question, using a speeded classification paradigm developed by Garner (1974). The results of these experiments indicated that decisions about phonemic identity were affected by both talker and rate information: irrelevant variation in either dimension interfered with phonemic classification. While rate classification was also affected by phoneme variation, talker classification was not. Experiment 4 examined the impact of talker and rate variation on the voicing boundary under different blocking conditions. The results indicated that talker characteristics influenced the voicing boundary when talker variation occurred within a block of trials only under certain conditions. Rate variation, however, influenced the voicing boundary regardless of whether or not there was rate variation within a block of trials. The findings from these experiments indicate that phoneme and rate information are encoded in an integral manner during speech perception, while talker characteristics are encoded separately.
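The Garner speeded-classification logic used here (and in entry 17) quantifies integrality as an RT cost: classification slows when an irrelevant dimension varies. A minimal sketch of that comparison with simulated data follows; the RT distributions are invented for illustration, not drawn from these experiments.

```python
import numpy as np

# Simulated per-trial response times (ms) for phoneme classification.
# Baseline block: the irrelevant dimension (talker) is held constant.
# Orthogonal block: talker varies unpredictably from trial to trial.
rng = np.random.default_rng(0)
baseline_rts = rng.normal(520.0, 40.0, 200)    # talker constant
orthogonal_rts = rng.normal(565.0, 40.0, 200)  # talker varies

# Garner interference: mean RT cost of irrelevant variation. A reliable
# positive cost is taken as evidence that the two dimensions are
# processed integrally rather than separably.
interference = orthogonal_rts.mean() - baseline_rts.mean()
print(f"Garner interference: {interference:.0f} ms")
```

In the study above, the asymmetry matters as much as the cost itself: phoneme classification suffered from both talker and rate variation, but talker classification did not suffer from phoneme variation, motivating the conclusion that talker characteristics are encoded separately.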
