Similar Documents
20 similar documents found.
1.
Three experiments were performed to examine listeners’ thresholds for identifying stimuli whose spectra were modeled after the vowels /i/ and /ε/, with the differences between these stimuli restricted to the frequency of the first formant. The stimuli were presented in a low-pass masking noise that spectrally overlapped the first formant but not the higher formants. Identification thresholds were lower when the higher formants were present than when they were not, even though the first formant contained the only distinctive information for stimulus identification. This indicates that listeners were more sensitive in identifying the first formant energy through its contribution to the vowel than as an independent percept; this effect is given the name coherence masking protection. The first experiment showed this effect for synthetic vowels in which the distinctive first formant was supported by a series of harmonics that progressed through the higher formants. In the second and third experiments, the harmonics in the first formant region were removed, and the first formant was simulated by a narrow band of noise. This was done so that harmonic relations did not provide a basis for grouping the lower formant with the higher formants; coherence masking protection was still observed. However, when the temporal alignment of the onsets and offsets of the higher and lower formants was disrupted, the effect was eliminated, although the stimuli were still perceived as vowels. These results are interpreted as indicating that general principles of auditory grouping that can exploit regularities in temporal patterns cause acoustic energy belonging to a coherent speech sound to stand out in the auditory scene.
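
A note on the manipulation in the second and third experiments: disrupting temporal alignment amounts to shifting the amplitude envelope of the F1 band relative to the higher formants. Below is a minimal Python sketch of that envelope logic; the duration, ramp length, and shift are illustrative assumptions, not the study's parameters.

    import numpy as np

    SR = 16000  # sample rate in Hz (assumed)

    def ramped_envelope(dur_s, onset_shift_s=0.0, ramp_s=0.02):
        # Unit amplitude envelope with linear onset/offset ramps.
        # A nonzero onset_shift_s delays the whole envelope, misaligning
        # it with any band built with shift 0; this is the kind of
        # disruption that eliminated coherence masking protection.
        n = int(dur_s * SR)
        env = np.ones(n)
        r = int(ramp_s * SR)
        env[:r] = np.linspace(0.0, 1.0, r)
        env[-r:] = np.linspace(1.0, 0.0, r)
        shift = int(onset_shift_s * SR)
        return np.pad(env, (shift, 0))  # zeros before the delayed onset

    aligned_f1 = ramped_envelope(0.3)                      # shared timing
    shifted_f1 = ramped_envelope(0.3, onset_shift_s=0.05)  # misaligned F1 band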

2.
Visual information provided by a talker’s mouth movements can influence the perception of certain speech features. Thus, the “McGurk effect” shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/, as it is presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. In three follow-up experiments, we show that (1) the voicing boundary is not shifted in the absence of a change in the global percept, even when discrepant auditory-visual information is presented; (2) the number of response alternatives provided for the subjects does not affect the categorization or the VOT boundary of the auditory-visual stimuli; and (3) the original effect of a VOT boundary shift is not replicated when subjects are forced by instruction to “relabel” the /b-p/ auditory stimuli as /d/ or /t/. The subjects successfully relabeled the stimuli, but no shift in the VOT boundary was observed.
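
The boundary comparison reported here is, in general, made by estimating the 50% crossover of each identification function. Below is a minimal Python sketch of that analysis with made-up response proportions; logistic curve fitting is a standard approach, not necessarily the authors' exact procedure.

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(vot, boundary, slope):
        # Proportion of voiceless responses as a function of VOT (ms);
        # 'boundary' is the 50% crossover point.
        return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

    vot_ms = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
    p_audio = np.array([0.02, 0.05, 0.20, 0.70, 0.95, 0.99, 1.00])  # hypothetical
    p_av = np.array([0.01, 0.03, 0.10, 0.40, 0.80, 0.97, 1.00])     # hypothetical

    for label, p in [("audio-only", p_audio), ("audio-visual", p_av)]:
        (boundary, slope), _ = curve_fit(logistic, vot_ms, p, p0=[30.0, 0.2])
        print(f"{label}: voicing boundary = {boundary:.1f} ms VOT")
    # A longer audio-visual boundary is the place-voice interaction above.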

3.
A two-alternative, forced-choice procedure was used in two experiments to test 3-year-old children’s categorization of natural vowel tokens produced by several talkers. An appropriate pointing response (right or left) was visually reinforced with one of two television displays. In Experiment 1, the stimuli were isolated tokens of /a/ and /i/ produced by a male adult, a female adult, a male child, and a female child. In Experiment 2, the stimuli were isolated tokens of /æ/ and /ʌ/ produced by the same talkers. In both experiments, 3-year-olds spontaneously generalized their pointing responses from the male adult vowel tokens to the corresponding vowels produced by the other talkers. Children reinforced for an arbitrary grouping of the two vowel categories persisted in categorizing on the basis of vowel quality. Results from both experiments demonstrate the presence of perceptual constancy for vowel tokens across talkers. In particular, the results from Experiment 2 provide evidence for normalization of isolated, quasi-steady-state vowel tokens because the formant values for tokens of /ʌ/ produced by the woman and the two children were closer to the formant frequencies of the male adult’s /æ/ than to the male adult’s /ʌ/.

4.
It has been demonstrated using the "silent-center" (SC) syllable paradigm that there is sufficient information in syllable onsets and offsets, taken together, to support accurate identification of vowels spoken in both citation-form syllables and syllables spoken in sentence context. Using edited natural speech stimuli, the present study examined the identification of American English vowels when increasing amounts of syllable onsets alone or syllable offsets alone were presented in their original sentence context. The stimuli were /d/-vowel-/d/ syllables spoken in a short carrier sentence by a male speaker. Listeners attempted to identify the vowels in experimental conditions that differed in the number of pitch periods presented and whether the pitch periods were from syllable onsets or syllable offsets. In general, syllable onsets were more informative than syllable offsets, although neither onsets nor offsets alone specified vowel identity as well as onsets and offsets together (SC syllables). Vowels differed widely in ease of identification; the diphthongized long vowels /e/, /æ/, /o/ were especially difficult to identify from syllable offsets. Identification of vowels as "front" or "back" was accurate, even from short samples of the syllable; however, vowel "height" was quite difficult to determine, again, especially from syllable offsets. The results emphasize the perceptual importance of time-varying acoustic parameters, which are the direct consequence of the articulatory dynamics involved in producing syllables.

5.
This study shows that the ratio of voice onset time (VOT) to syllable duration for /t/ and /d/ presents distributions with a stable boundary across speaking rates and that this boundary constitutes a perceptual criterion by which listeners judge the category affiliation of VOT. In Experiment 1, best-fit regression lines for VOT ratios of intervocalic /t/ and /d/ against speaking rate had zero slopes, and there was an inferable boundary between the distributions. In Experiment 2, listeners' identifications of syllable-initial stops conformed to this boundary ratio. In Experiment 3, VOT was held constant, while VOT ratios were altered by modifying the duration of the following vowel. As VOT ratios exceeded the boundary estimated from the data of Experiment 1, listeners' identifications shifted from /d/ to /t/. Timing relations in speech production can determine the identification of voicing categories across speaking rates.
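
The metric at issue is simply VOT divided by syllable duration. A minimal Python sketch of rate-independent classification follows; the boundary ratio is an illustrative assumption, not the value inferred in the paper.

    BOUNDARY_RATIO = 0.3  # hypothetical /d/-/t/ boundary ratio

    def classify_stop(vot_ms, syllable_ms):
        # Label a syllable-initial alveolar stop from its VOT ratio.
        return "/t/" if vot_ms / syllable_ms > BOUNDARY_RATIO else "/d/"

    # The same 40-ms VOT flips category as syllable duration (speaking
    # rate) changes, mirroring the manipulation in Experiment 3:
    print(classify_stop(40.0, 100.0))  # fast speech -> /t/
    print(classify_stop(40.0, 200.0))  # slow speech -> /d/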

6.
According to the formant centre of gravity (FCOG) hypothesis, two vowel formants in close proximity are merged during perceptual analysis, and their contribution to vowel quality depends on the centre of gravity of the formant cluster. Findings consistent with this hypothesis are that two formants can be replaced by a single formant of intermediate centre frequency, provided their separation is less than 3-3.5 Bark; and that changes in their relative amplitudes produce systematic shifts in vowel quality. In Experiment 1, listeners adjusted the frequencies of F1 and F2 in a synthesized 6-formant vowel (with the F1-F2 separation fixed at 250 Hz, i.e. less than 3 Bark) to find the best phonetic match to a reference vowel with modified formant amplitudes. Contrary to FCOG predictions, F2 attenuation did not produce lower frequency matches. Raising the amplitude of F2 led to predicted upward shifts in formant frequencies of the matched vowel, but with increased variability of matches for some stimuli. In Experiment 2, listeners identified synthesized vowels with a range of separations of F1 and F2. Formant amplitude manipulations had no effect on listeners' judgements when the fundamental frequency was low (125 Hz). Small shifts in vowel quality appeared for stimuli with a high fundamental (250 Hz), but the shifts were significantly larger for F1-F2 separations greater than 3.5 Bark. These effects of formant amplitude are qualitatively different from those observed with single-formant vowels and are generally incompatible with a formant-averaging mechanism.
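
The FCOG prediction can be made concrete: convert formant frequencies to the Bark scale and, when two formants fall within roughly 3-3.5 Bark, average their frequencies weighted by linear amplitude. A minimal Python sketch using the Zwicker-Terhardt Bark approximation, with illustrative frequencies and levels rather than the study's stimuli:

    import math

    def hz_to_bark(f_hz):
        # Zwicker & Terhardt (1980) approximation of the Bark scale.
        return (13.0 * math.atan(0.00076 * f_hz)
                + 3.5 * math.atan((f_hz / 7500.0) ** 2))

    def fcog_hz(f1_hz, f2_hz, a1_db, a2_db):
        # Amplitude-weighted centre of gravity of an F1-F2 cluster.
        w1, w2 = 10.0 ** (a1_db / 20.0), 10.0 ** (a2_db / 20.0)
        return (w1 * f1_hz + w2 * f2_hz) / (w1 + w2)

    f1, f2 = 500.0, 750.0                   # 250-Hz separation, as in Experiment 1
    print(hz_to_bark(f2) - hz_to_bark(f1))  # about 2 Bark, i.e. under 3 Bark
    print(fcog_hz(f1, f2, 0.0, 0.0))        # equal levels -> 625 Hz
    print(fcog_hz(f1, f2, 0.0, -12.0))      # F2 attenuated -> shifts toward F1

On this account, attenuating F2 should pull the perceived vowel toward F1; the matching data in Experiment 1 are what contradict that prediction.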

7.
It has been demonstrated using the “silent-center” (SC) syllable paradigm that there is sufficient information in syllable onsets and offsets,taken together, to support accurate identification of vowels spoken in both citation-form syllables and syllables spoken in sentence context. Using edited natural speech stimuli, the present study examined the identification of American English vowels when increasing amounts of syllable onsetsalone or syllable offsetsalone were presented in their original sentence context. The stimuli were /d/-vowel-/d/ syllables spoken in a short carrier sentence by a male speaker. Listeners attempted to identify the vowels in experimental conditions that differed in the number of pitch periods presented and whether the pitch periods were from syllable onsets or syllable off-sets. In general, syllable onsets were more informative than syllable offsets, although neither onsets nor offsets alone specified vowel identity as well as onsets and offsets together (SC syllables). Vowels differed widely in ease of identification; the diphthongized long vowels /e/, /ae/, /o/ were especially difficult to identify from syllable offsets. Identification of vowels as “front” or “back” was accurate, even from short samples of the syllable; however, vowel "height" was quite difficult to determine, again, especially from syllable offsets. The results emphasize the perceptual importance of time-varying acoustic parameters, which are the direct consequence of the articulatory dynamics involved in producing syllables.  相似文献   

8.
Exaggeration of the vowel space in infant-directed speech (IDS) is well documented for English, but not consistently replicated in other languages or for other speech-sound contrasts. A second attested, but less discussed, pattern of change in IDS is an overall rise of the formant frequencies, which may reflect an affective speaking style. The present study investigates longitudinally how Dutch mothers change their corner vowels, voiceless fricatives, and pitch when speaking to their infant at 11 and 15 months of age. In comparison to adult-directed speech (ADS), Dutch IDS has a smaller vowel space, higher second and third formant frequencies in the vowels, and a higher spectral frequency in the fricatives. The formants of the vowels and spectral frequency of the fricatives are raised more strongly for infants at 11 than at 15 months, while the pitch is more extreme in IDS to 15-month-olds. These results show that enhanced positive affect is the main factor influencing Dutch mothers’ realisation of speech sounds in IDS, especially to younger infants. This study provides evidence that mothers’ expression of emotion in IDS can influence the realisation of speech sounds, and that the loss or gain of speech clarity may be secondary effects of affect.
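
Vowel-space size of the kind compared here is commonly quantified as the area of the polygon spanned by the corner vowels in the F1-F2 plane. A minimal Python sketch using the shoelace formula; the formant values are invented for illustration, not taken from the Dutch data.

    def vowel_space_area(corners):
        # Shoelace area of the polygon over (F1, F2) corner vowels, in Hz^2.
        n = len(corners)
        total = 0.0
        for i in range(n):
            (x1, y1), (x2, y2) = corners[i], corners[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

    # Hypothetical (F1, F2) means in Hz for /a/, /i/, /u/:
    ads = [(750, 1300), (300, 2300), (350, 800)]   # adult-directed speech
    ids = [(700, 1400), (320, 2250), (380, 950)]   # infant-directed speech
    print(vowel_space_area(ads), vowel_space_area(ids))  # IDS area is smaller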

9.
Two experiments evaluated a potential explanation of categorical perception (CP) for place of articulation – namely, that listeners derive limited information from rapid spectral changes. Experiment 1 examined vowel context effects for /b/–/d/ continua that included consonant–vowel tokens with F2 onset frequencies that varied systematically from the F2 frequencies of their corresponding steady-states. Phoneme categorisation sharply shifted with F2 direction at locations along the continuum where discrimination performance peaked, indicating CP. Experiment 2 compared findings for a replicated condition against conditions with vowels reduced to match consonant duration or consonants extended to match vowels. CP was similarly obtained for replicated and vowel-reduced conditions. However, listeners frequently perceived diphthongs centrally on the consonant-extended continuum. Some listeners demonstrated CP, although aggregate performance appeared more continuous. These experiments support a model based upon the perceived direction of frequency transitions.

10.
Visual information provided by a talker's mouth movements can influence the perception of certain speech features. Thus, the "McGurk effect" shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/, as it is presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception, when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. (ABSTRACT TRUNCATED AT 250 WORDS)

11.
Event-related potentials (ERPs) were utilized to study brain activity while subjects listened to speech and nonspeech stimuli. The effect of duplex perception was exploited, in which listeners perceive formant transitions that are isolated as nonspeech "chirps," but perceive formant transitions that are embedded in synthetic syllables as unique linguistic events with no chirp-like sounds heard at all (Mattingly et al., 1971). Brain ERPs were recorded while subjects listened to and silently identified plain speech-only tokens, duplex tokens, and tone glides (perceived as "chirps" by listeners). A highly controlled set of stimuli was developed that represented equivalent speech and nonspeech stimulus tokens such that the differences were limited to a single acoustic parameter: amplitude. The acoustic elements were matched in terms of number and frequency of components. Results indicated that the neural activity in response to the stimuli was different for different stimulus types. Duplex tokens had significantly longer latencies than the pure speech tokens. The data are consistent with the contention of separate modules for phonetic and auditory stimuli.

12.
Vocal tract resonances, called formants, are the most important parameters in human speech production and perception. They encode linguistic meaning and have been shown to be perceived by a wide range of species. Songbirds are also sensitive to different formant patterns in human speech. They can categorize words differing only in their vowels based on the formant patterns independent of speaker identity in a way comparable to humans. These results indicate that speech perception mechanisms are more similar between songbirds and humans than realized before. One of the major questions regarding formant perception concerns the weighting of different formants in the speech signal (“acoustic cue weighting”) and whether this process is unique to humans. Using an operant Go/NoGo design, we trained zebra finches to discriminate syllables, whose vowels differed in their first three formants. When subsequently tested with novel vowels, similar in either their first formant or their second and third formants to the familiar vowels, similarity in the higher formants was weighted much more strongly than similarity in the lower formant. Thus, zebra finches indeed exhibit a cue weighting bias. Interestingly, we also found that Dutch speakers when tested with the same paradigm exhibit the same cue weighting bias. This, together with earlier findings, supports the hypothesis that human speech evolution might have exploited general properties of the vertebrate auditory system.
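
Acoustic cue weighting of this kind can be expressed as a weighted distance in formant space, with the reported bias putting more weight on F2 and F3 than on F1. A minimal Python sketch; the weights and formant values are illustrative assumptions, not parameters fitted to the study.

    def weighted_formant_distance(v1, v2, weights=(0.2, 0.4, 0.4)):
        # Weighted Euclidean distance between (F1, F2, F3) triples in Hz;
        # the heavier F2/F3 weights mimic a higher-formant bias.
        return sum(w * (a - b) ** 2
                   for w, a, b in zip(weights, v1, v2)) ** 0.5

    familiar = (400.0, 1800.0, 2600.0)
    novel_f1_match = (400.0, 1400.0, 2200.0)    # shares F1 only
    novel_f23_match = (600.0, 1800.0, 2600.0)   # shares F2 and F3 only
    # Under the higher-formant bias, the F2/F3 match counts as "closer":
    print(weighted_formant_distance(familiar, novel_f1_match))   # larger
    print(weighted_formant_distance(familiar, novel_f23_match))  # smaller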

13.
Discrimination of speech sounds from three computer-generated continua that ranged from voiced to voiceless syllables (/ba-pa/, /da-ta/, and /ga-ka/) was tested with three macaques. The stimuli on each continuum varied in voice-onset time (VOT). Pairs of stimuli that were equally different in VOT were chosen such that they were either within-category pairs (syllables given the same phonetic label by human listeners) or between-category pairs (syllables given different phonetic labels by human listeners). Results demonstrated that discrimination performance was always best for between-category pairs of stimuli, thus replicating the “phoneme boundary effect” seen in adult listeners and in human infants as young as 1 month of age. The findings are discussed in terms of their specific impact on accounts of voicing perception in human listeners and in terms of their impact on discussions of the evolution of language.

14.
Recent experiments showed that the perception of vowel length by German listeners exhibits the characteristics of categorical perception. The present study sought to find the neural activity reflecting categorical vowel length and the short-long boundary by examining the processing of non-contrastive durations and categorical length using MEG. Using disyllabic words with varying /a/-durations and temporally matched nonspeech stimuli, we found that each syllable elicited an M50/M100 complex. The M50 amplitude to the second syllable varied along the durational continuum, possibly reflecting the mapping of duration onto a rhythm representation. Categorical length was reflected by an additional response elicited when vowel duration exceeded the short-long boundary. This was interpreted to reflect the integration of an additional timing unit for long in contrast to short vowels. Unlike for speech, responses to short nonspeech durations lacked an M100 to the first syllable and an M50 to the second, indicating different integration windows for speech and nonspeech signals.

15.
We examined the effect of perceptual training on a well-established hemispheric asymmetry in speech processing. Eighteen listeners were trained to use a within-category difference in voice onset time (VOT) to cue talker identity. Successful learners (n=8) showed faster response times for stimuli presented only to the left ear than for those presented only to the right. The development of a left-ear/right-hemisphere advantage for processing a prototypically phonetic cue supports a model of speech perception in which lateralization is driven by functional demands (talker identification vs. phonetic categorization) rather than by acoustic stimulus properties alone.

16.
The present study explores how stimulus variability in speech production influences the 2-month-old infant's perception and memory for speech sounds. Experiment 1 focuses on the consequences of talker variability for the infant's ability to detect differences between speech sounds. When tested with the high-amplitude sucking (HAS) procedure, infants who listened to versions of a syllable, such as [symbol: see text], produced by 6 male and 6 female talkers, detected a change to another syllable, such as [symbol: see text], uttered by the same group of talkers. In fact, infants exposed to multiple talkers performed as well as other infants who heard utterances produced by only a single talker. Moreover, other results showed that infants discriminate the voices of the individual talkers, although discriminating one mixed group of talkers (3 males and 3 females) from another is too difficult for them. Experiment 2 explored the consequences of talker variability on infants' memory for speech sounds. The HAS procedure was modified by introducing a 2-min delay period between the familiarization and test phases of the experiment. Talker variability impeded infants' encoding of speech sounds. Infants who heard versions of the same syllable produced by 12 different talkers did not detect a change to a new syllable produced by the same talkers after the delay period. However, infants who heard the same syllable produced by a single talker were able to detect the phonetic change after the delay. Finally, although infants who heard productions from a single talker retained information about the phonetic structure of the syllable during the delay, they apparently did not retain information about the identity of the talker. Experiment 3 reduced the range of variability across talkers and investigated whether variability interferes with retention of all speech information. Although reducing the range of variability did not lead to retention of phonetic details, infants did recognize a change in the gender of the talkers' voices (from male to female or vice versa) after a 2-min delay. Two additional experiments explored the consequences of limiting the variability to a single talker. In Experiment 4, with an immediate testing procedure, infants exposed to 12 different tokens of one syllable produced by the same talker discriminated these from 12 tokens of another syllable. (ABSTRACT TRUNCATED AT 400 WORDS)

17.
Three experiments designed to determine how pitch information is represented in auditory memory are reported. A same-different reaction time task was used in all three experiments. Previous experiments have interpreted the finding of faster “same” responses to acoustically identical pairs than to pairs that are phonemically identical but acoustically distinct as indicating that there is a memory that preserves auditory information. It has been assumed that this can be used to match “same” pairs only if the formant frequencies of the members of the pair are the same. In the first experiment, the size of this matching advantage for pairs with identical formant frequencies was not altered when the members of the pair were on different pitches. This indicates that pitch is represented separately from the formants at the auditory level. The second and third experiments used a bigger pitch difference when the pairs were on a different pitch, which, for one of the stimulus sets, resulted in a change in vowel quality but not in the identity of the consonant. In the other stimulus set, both phonemes of the syllable remained the same when presented on different pitches. The matching advantage was reduced when the stimuli were on different pitches for both stimulus sets. This indicates that a difference in pitch can prevent matching at the auditory level under some circumstances. An additional finding, a reduced residual matching advantage when the syllable changes, indicates that at least a syllable-length representation is held in auditory memory. The results are discussed with respect to how the representation in auditory memory might be used in the perception of speech produced by different speakers.

18.
This study used synthesis to manipulate vowel formant frequencies and durations to evaluate their role on foreign accent perception. Formant frequencies and durations for the vowels /æ/, /?/, and /a/ were manipulated with changes toward and away from the mean native English and Spanish-accented values from Sidaras, S. K., Alexander, J. E. D., & Nygaard, L. C. (2009). Perceptual learning of systematic variation in Spanish-accented speech. The Journal of the Acoustical Society of America, 125, 3306–3316. Native listeners rated these stimuli on degree of accentedness and comprehensibility. Gradual changes in formant frequencies from native to non-native values impacted /a/ negatively, /?/ positively, and /æ/ minimally. Effects of vowel duration on either type of rating were small and restricted to vowel-specific interactions. The current findings suggest that vowel formant frequencies are primary cues to foreign accent. Their influence depends upon whether or not frequencies could reflect alternative vowel categories.

19.
In four experiments we investigated whether listeners can locate the formants of vowels not only from peaks, but also from spectral “shoulders”—features that give rise to zero crossings in the third, but not the first, differential of the excitation pattern—as hypothesized by Assmann and Summerfield (1989). Stimuli were steady-state approximations to the vowels [a, i, ɜ, u, ?] created by summing the first 45 harmonics of a fundamental of 100 Hz. Thirty-nine harmonics had equal amplitudes; the other 6 formed three pairs that were raised in level to define three “formants.” An adaptive psychophysical procedure determined the minimal difference in level between the 6 harmonics and the remaining 39 at which the vowels were identifiably different from one another. These thresholds were measured through simulated communication channels, giving overall slopes to the excitation patterns of the five vowels that ranged from −1 dB/erb to +2 dB/erb. Excitation patterns of the threshold stimuli were computed, and the locations of formants were estimated from zero crossings in the first and third differentials. With the more steeply sloping communication channels, some formants of some vowels were represented as shoulders rather than peaks, confirming the predictions of Assmann and Summerfield’s models. We discuss the limitations of the excitation pattern model and the related issue of whether the location of formants can be computed from spectral shoulders in auditory analysis.
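
The peak/shoulder criterion can be stated computationally: a formant shows up as a peak where the first differential of the excitation pattern has a downward zero crossing, and as a shoulder where only the third differential does. Below is a minimal Python sketch over a sampled excitation pattern, assumed here to be a 1-D array of levels in dB on an ERB-spaced frequency axis; the tolerance for matching peaks to shoulders is an illustrative choice.

    import numpy as np

    def downward_zero_crossings(x):
        # Indices where a sampled curve crosses from positive to non-positive.
        return set(np.where((x[:-1] > 0) & (x[1:] <= 0))[0])

    def peaks_and_shoulders(excitation_db, tol=2):
        # Peaks: downward zero crossings of the 1st differential.
        # Shoulders: downward zero crossings of the 3rd differential
        # with no peak within 'tol' samples (after Assmann & Summerfield, 1989).
        peaks = downward_zero_crossings(np.diff(excitation_db, n=1))
        d3 = downward_zero_crossings(np.diff(excitation_db, n=3))
        shoulders = [i for i in d3
                     if all(abs(i - p) > tol for p in peaks)]
        return sorted(peaks), shoulders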

20.
The present study investigated burst cue discrimination in 3- to 4-month-old infants with the natural speech stimuli [bu] and [gu]. The experimental stimuli consisted of either a [bu] or a [gu] burst attached to the formants of the [bu], such that the sole difference between the two stimuli was the initial burst cue. Infants were tested using a cardiac orienting response (OR) paradigm which consisted of 20 tokens of one stimulus (e.g. [bu]) followed by 20 tokens of the second syllable (20/20 paradigm). An OR to the stimulus change revealed that young infants can discriminate burst cue differences in speech stimuli. Discussion of the results focused on asymmetries observed in the data and the relationship of these findings to our previous failure to demonstrate burst discrimination using the habituation/dishabituation cardiac measure generally employed with older infants.
