Similar Documents
20 similar documents found (search time: 15 ms)
1.
It is well known that the formant transitions of stop consonants in CV and VC syllables are roughly the mirror image of each other in time. These formant motions reflect the acoustic correlates of the articulators as they move rapidly into and out of the period of stop closure. Although acoustically different, these formant transitions are correlated perceptually with similar phonetic segments. Earlier research by Klatt and Shattuck (1975) had suggested that mirror-image acoustic patterns resembling formant transitions were not perceived as similar. However, mirror-image patterns could still have some underlying similarity which might facilitate learning, recognition, and the establishment of perceptual constancy of phonetic segments across syllable positions. This paper reports the results of four experiments designed to study the perceptual similarity of mirror-image acoustic patterns resembling the formant transitions and steady-state segments of the CV and VC syllables /ba/, /da/, /ab/, and /ad/. Using a perceptual learning paradigm, we found that subjects could learn to assign mirror-image acoustic patterns to arbitrary response categories more consistently than they could assign similar arrangements of the same patterns based on spectrotemporal commonalities. Subjects respond not only to the individual components or dimensions of these acoustic patterns, but also process entire patterns and make use of the patterns’ internal organization in learning to categorize them consistently according to different classification rules.

2.
This study investigated whether consonant phonetic features or consonant acoustic properties more appropriately describe perceptual confusions among speech stimuli in multitalker babble backgrounds. Ten normal-hearing subjects identified 19 consonants, each paired with /a/, /i/, and /u/ in a CV format. The stimuli were presented in quiet and in three levels of babble. Multidimensional scaling analyses of the confusion data retrieved stimulus dimensions corresponding to consonant acoustic parameters. The acoustic dimensions identified were: periodicity/burst onset, friction duration, consonant–vowel ratio, second-formant transition slope, and first-formant transition onset. These findings are comparable to previous reports of acoustic effects observed in white-noise conditions, and support the theory that acoustic characteristics are the relevant perceptual properties of speech in noise conditions. Perceptual effects of vowel context and babble level were also observed. These condition effects contrast with those previously reported for white-noise interference, and are attributed to direct masking of the low-frequency acoustic cues in the nonsense syllables by the low-frequency spectrum of the babble.

3.
Speech perception can be viewed in terms of the listener’s integration of two sources of information: the acoustic features transduced by the auditory receptor system and the context of the linguistic message. The present research asked how these sources were evaluated and integrated in the identification of synthetic speech. A speech continuum between the glide–vowel syllables /ri/ and /li/ was generated by varying the onset frequency of the third formant. Each sound along the continuum was placed in a consonant-cluster vowel syllable after an initial consonant /p/, /t/, /s/, or /v/. In English, both /r/ and /l/ are phonologically admissible following /p/ but are not admissible following /v/. Only /l/ is admissible following /s/, and only /r/ is admissible following /t/. A third experiment used synthetic consonant-cluster vowel syllables in which the first consonant varied between /b/ and /d/ and the second consonant varied between /l/ and /r/. Identification of synthetic speech varying in both acoustic featural information and phonological context allowed quantitative tests of various models of how these two sources of information are evaluated and integrated in speech perception.

4.
The stop consonants /b, d, g, p, t, k/ were recorded before /i/, /a/, and /u/. The energy spectrum for each stop consonant was removed from its original vowel and spliced onto a different steady-state vowel. Results of a recognition test revealed that consonants were accurately recognized in all cases except when /k/ or /g/ was spliced from /i/ to /u/. Further demonstrations suggested that /k/ and /g/ do have invariant characteristics before /i/, /a/, and /u/. These results support the general notion that stop consonants may be recognized before different vowels in normal speech in terms of invariant acoustic features.

5.
When members of a series of synthesized stop consonants varying acoustically in F3 characteristics and varying perceptually from /da/ to /ga/ are preceded by /al/, subjects report hearing more /ga/ syllables relative to when each member is preceded by /ar/ (Mann, 1980). It has been suggested that this result demonstrates the existence of a mechanism that compensates for coarticulation via tacit knowledge of articulatory dynamics and constraints, or through perceptual recovery of vocal-tract dynamics. The present study was designed to assess the degree to which these perceptual effects are specific to qualities of human articulatory sources. In three experiments, series of consonant–vowel (CV) stimuli varying in F3-onset frequency (/da/–/ga/) were preceded by speech versions or nonspeech analogues of /al/ and /ar/. The effect of liquid identity on stop-consonant labeling remained when the preceding VC was produced by a female speaker and the CV syllable was modeled after a male speaker’s productions. Labeling boundaries also shifted when the CV was preceded by a sine-wave glide modeled after the F3 characteristics of /al/ and /ar/. Identifications shifted even when the preceding sine wave was of constant frequency equal to the offset frequency of F3 from a natural production. These results suggest an explanation in terms of general auditory processes as opposed to recovery of or knowledge of specific articulatory dynamics.

6.
We examined whether the orientation of the face influences speech perception in face-to-face communication. Participants identified auditory syllables, visible syllables, and bimodal syllables presented in an expanded factorial design. The syllables were /ba/, /va/, /ða/, or /da/. The auditory syllables were taken from natural speech, whereas the visible syllables were produced by computer animation of a realistic talking face. The animated face was presented either as viewed in normal upright orientation or in inverted orientation (180° frontal rotation). The central intent of the study was to determine whether an inverted view of the face would change the nature of processing bimodal speech or simply influence the information available in visible speech. The results with both the upright and inverted face views were adequately described by the fuzzy logical model of perception (FLMP). The observed differences in the FLMP’s parameter values corresponding to the visual information indicate that inverting the view of the face influences the amount of visible information but does not change the nature of the information processing in bimodal speech perception.
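For readers unfamiliar with the FLMP, its two-source integration rule has a simple closed form: each modality contributes an independent fuzzy truth value for a response alternative, and the values are multiplied and renormalized across alternatives. A minimal sketch for the two-alternative case (the support values used below are illustrative, not fitted parameters from this study):

```python
def flmp(audio_support, visual_support):
    """FLMP two-alternative integration: multiply the fuzzy truth values
    from each modality, then renormalize against the complementary response."""
    num = audio_support * visual_support
    denom = num + (1 - audio_support) * (1 - visual_support)
    return num / denom

# A completely ambiguous visual source (0.5) leaves the auditory evidence
# unchanged: flmp(0.9, 0.5) == 0.9, while two agreeing sources reinforce
# each other: flmp(0.8, 0.8) > 0.8.
```

Under this rule, a manipulation such as face inversion that only degrades visual information shows up as less extreme visual support values, while the multiplicative form of the integration itself is unchanged.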

7.
When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of the four types of perceptual responses to 384 different stimuli from four talkers. The measures included mutual information, correlations, and acoustic measures, all representing audiovisual stimulus relationships. In Experiment 1, open-set perceptual responses were obtained for acoustic /bɑ/ or /lɑ/ dubbed to video /bɑ, dɑ, gɑ, vɑ, zɑ, lɑ, wɑ, ðɑ/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of response category proportions were a subset of the physical stimulus measures, with the variance accounted for in the perceptual response category proportions between 17% and 52%. That audiovisual stimulus relationships can account for perceptual response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships.
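Mutual information between the audio and video syllable labels, one of the stimulus-relationship measures named above, can be estimated directly from the contingency table of observed label pairs. A minimal sketch (the label pairs in the test below are invented examples, not the study's stimuli):

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Estimate I(X; Y) in bits from a list of (x, y) label pairs,
    using plug-in (empirical) probabilities from the contingency table."""
    n = len(pairs)
    pxy = Counter(pairs)                  # joint counts
    px = Counter(x for x, _ in pairs)     # marginal counts for X
    py = Counter(y for _, y in pairs)     # marginal counts for Y
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

When the audio and video labels always match, the estimate equals the entropy of the label set; when they are statistically independent, it is zero, so higher values quantify tighter audiovisual stimulus relationships.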

8.
On the basis of the lexical corpus created by Amano and Kondo (2000), using the Asahi newspaper, the present study provides frequencies of occurrence for units of Japanese phonemes, morae, and syllables. Among the five vowels, /a/ (23.42%), /i/ (21.54%), /u/ (23.47%), and /o/ (20.63%) showed similar frequency rates, whereas /e/ (10.94%) was less frequent. Among the 12 consonants, /k/ (17.24%), /t/ (15.53%), and /r/ (13.11%) were used often, whereas /p/ (0.60%) and /b/ (2.43%) appeared far less frequently. Among the contracted sounds, /sj/ (36.44%) showed the highest frequency, whereas /mj/ (0.27%) rarely appeared. Among the five long vowels, /aR/ (34.4%) was used most frequently, whereas /uR/ (12.11%) was not used so often. The special sound /N/ appeared very frequently in Japanese. The syllable combination /k/+V+/N/ (19.91%) appeared most frequently among syllabic combinations with the nasal /N/. The geminate (or voiceless obstruent) /Q/, when placed before the four consonants /p/, /t/, /k/, and /s/, appeared 98.87% of the time, but the remaining 1.13% did not follow the definition. The special sounds /R/, /N/, and /Q/ seem to appear very frequently in Japanese, suggesting that they are not special in terms of frequency counts. The present study further calculated frequencies for the 33 newly and officially listed morae/syllables, which are used particularly for describing alphabetic loanwords. In addition, the top 20 bi-mora frequency combinations are reported. Files of frequency indexes may be downloaded from the Psychonomic Society Web archive at http://www.psychonomic.org/archive/.
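Frequency indexes of this kind reduce to counting units over a segmented corpus and normalizing to percentages. A minimal sketch, assuming words are already segmented into phoneme symbols (the mini-corpus is invented for illustration; /N/ and /R/ follow the abstract's notation for the special sounds):

```python
from collections import Counter

def phoneme_frequencies(words):
    """Return each phoneme's frequency as a percentage of all
    phoneme tokens in a corpus of segmented words."""
    counts = Counter(p for word in words for p in word)
    total = sum(counts.values())
    return {p: 100 * c / total for p, c in counts.items()}

# Hypothetical mini-corpus: each word is a list of phoneme symbols.
corpus = [["k", "a", "N"], ["s", "a", "k", "u"], ["t", "o", "R"]]
```

The same counting scheme extends to larger units (morae, syllables, or bi-mora combinations) by iterating over those units instead of single phonemes.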

9.
Monkeys were presented with synthetic speech stimuli in a shock-avoidance situation. On the basis of their behavior, perceptual boundaries were determined along the physical continua between /ba/ and /pa/, and /ga/ and /ka/, that were close to the human boundaries between voiced and voiceless consonants. As is the case with humans, discrimination across a boundary was better than discrimination between stimuli that were both on one side of the boundary, and there was generalization of the voiced-voiceless distinction from labial to velar syllables. Unlike humans, the monkeys showed large shifts in boundary when the range of stimuli was varied.

10.
The results of earlier studies by several authors suggest that speech and nonspeech auditory patterns are processed primarily in different places in the brain and perhaps by different modes. The question arises in studies of speech perception whether all phonetic elements or all features of phonetic elements are processed in the same way. The technique of dichotic presentation was used to examine this question.

The present study compared identifications of dichotically presented pairs of synthetic CV syllables and pairs of steady-state vowels. The results show a significant right-ear advantage for CV syllables but not for steady-state vowels. Evidence for analysis by feature in the perception of consonants is discussed.

11.
Three experiments assessed the roles of release bursts and formant transitions as acoustic cues to place of articulation in syllable-initial voiced stop consonants by systematically removing them from American English /b, d, g/, spoken before nine different vowels by two speakers, and by transposing the bursts across all vowels for each class of stop consonant. The results showed that bursts were largely invariant in their effect, but carried significant perceptual weight in only one syllable out of 27 for Speaker 1, and in only 13 syllables out of 27 for Speaker 2. Furthermore, bursts and transitions tended to be reciprocally related: Where the perceptual weight of one increased, the weight of the other declined. They were thus shown to be functionally equivalent, context-dependent cues, each contributing to the rapid spectral changes that follow consonantal release. The results are interpreted as pointing to the possible role of the front-cavity resonance in signaling place of articulation.

12.
A dichotic listening experiment was conducted to determine if vowel perception is based on phonetic feature extraction as is consonant perception. Twenty normal right-handed subjects were given dichotic CV syllables contrasting in final vowels. It was found that, unlike consonants, the perception of dichotic vowels was not significantly lateralized, that the dichotic perception of vowels was not significantly enhanced by the number of phonetic features shared, and that the occurrence of double-blend errors was not greater than chance. However, there was strong evidence for the use of phonetic features at the level of response organization. It is suggested that the differences between vowel and consonant perception reflect the differential availability of the underlying acoustic information from auditory store, rather than differences in processing mechanisms.

13.
Infant perception often deals with audiovisual speech input and a first step in processing this input is to perceive both visual and auditory information. The speech directed to infants has special characteristics and may enhance visual aspects of speech. The current study was designed to explore the impact of visual enhancement in infant-directed speech (IDS) on audiovisual mismatch detection in a naturalistic setting. Twenty infants participated in an experiment with a visual fixation task conducted in participants’ homes. Stimuli consisted of IDS and adult-directed speech (ADS) syllables with a plosive and the vowel /a:/, /i:/ or /u:/. These were either audiovisually congruent or incongruent. Infants looked longer at incongruent than congruent syllables and longer at IDS than ADS syllables, indicating that IDS and incongruent stimuli contain cues that can make audiovisual perception challenging and thereby attract infants’ gaze.

14.
Selective adaptation with a syllable-initial consonant fails to affect perception of the same consonant in syllable-final position, and vice versa. One account of this well-replicated result invokes a cancellation explanation: with the place-of-articulation stimuli used, the pattern of formant transitions switches according to syllabic position, allowing putative phonetic-level effects to be opposed by putative acoustic-level effects. Three experiments tested the cancellation hypothesis by preempting the possibility of acoustic countereffects. In Experiment 1, the test syllables and adaptors were /r/-/l/ CVs and VCs, which do not produce cancelling formant patterns across syllabic position. In Experiment 2, /b/-/d/ continua were used in a paired-contrast procedure, believed to be sensitive to phonetic, but not acoustic, identity. In Experiment 3, cross-ear adaptation, also believed to tap phonetic rather than acoustic processes, was used. All three experiments refuted the cancellation hypothesis. Instead, it appears that the perceptual process treats syllable-initial consonants and syllable-final ones as inherently different. These results provide support for the use of demisyllabic representations in speech perception.

16.
When discriminating pairs of speech stimuli from an acoustic voice onset time (VOT) continuum (for example, one ranging from /ba/ to /pa/), English-speaking subjects show a characteristic performance peak in the region of the phonemic category boundary. We demonstrate that this "category boundary effect" is reduced or eliminated when the stimuli are preceded by /s/. This suppression does not seem to be due to the absence of a phonological voicing contrast for stop consonants following /s/, since it is also obtained when the /s/ terminates a preceding word and (to a lesser extent) when broadband noise is substituted for the fricative noise. The suppression is stronger, however, when the noise has the acoustic properties of a syllable-initial /s/, all else being equal. We hypothesize that these properties make the noise cohere with the following speech signal, which makes it difficult for listeners to focus on the VOT differences to be discriminated.

17.
When listening to speech, do we recognize syllables or phonemes? Information concerning the organization of the decisions involved in identifying a syllable may be elicited by allowing separate phonetic decisions regarding the vowel and consonant constituents to be controlled by the same acoustic information and by looking for evidence of interaction between these decisions. The duration and first formant frequency of the steady-state vocalic segment in synthesized consonant-vowel-consonant syllables were varied to result in responses of /bεd/, /bæd/, /bεt/, and /bæt/. The fact that the duration of the steady-state segment controls both decisions implies that that segment must be included in its entirety in the signal intervals on which the two decisions are based. For most subjects, no further significant interaction between the vocalic and consonantal decision is found beyond the fact that they are both affected by changes in the duration parameter. A model of two separate and independent phonetic decisions based on overlapping ranges of the signal adequately accounts for these data, and no explicit syllable level recognition needs to be introduced.
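The two-independent-decisions account can be made concrete with a toy model: the vowel and final-consonant decisions are each a logistic function of the shared acoustic parameters, and syllable response proportions are simply the products of the two decision probabilities. All boundary and slope values below are invented for illustration, not the paper's fitted model:

```python
from math import exp

def logistic(x):
    return 1 / (1 + exp(-x))

def p_syllable(duration_ms, f1_hz):
    """Response probabilities for /bed/, /baed/, /bet/, /baet/ under two
    independent phonetic decisions sharing the duration cue.
    Boundary (180 ms, 650 Hz) and slope values are hypothetical."""
    # Vowel decision: higher F1 and longer steady state favor /ae/ over /e/.
    p_ae = logistic((f1_hz - 650) / 60 + (duration_ms - 180) / 40)
    # Final-consonant decision: a shorter vowel favors voiceless /t/ over /d/.
    p_t = logistic((180 - duration_ms) / 30)
    return {
        "bed": (1 - p_ae) * (1 - p_t),
        "bad": p_ae * (1 - p_t),
        "bet": (1 - p_ae) * p_t,
        "bat": p_ae * p_t,
    }
```

Because the factors multiply, duration moves both decisions at once (as observed) without any interaction term, which is exactly the pattern the abstract reports for most subjects.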

18.
Different memory functions were obtained for consonants (C) and vowels (V) in a serial recall task. In general, the most recently heard vowels in a sequence were easier to recall than the most recently heard consonants. This effect was observed for auditorily presented sequences of CV or VC syllables, but was not observed for visually presented stimuli. The results were explained in terms of a limited capacity acoustic storage in which vowels are preserved longer than consonants. Retrieval of the last vowels from this storage was presumed to cause the vowel recency effect.

19.
How does the brain extract invariant properties of variable-rate speech? A neural model, called PHONET, is developed to explain aspects of this process and, along the way, data about perceptual context effects. For example, in consonant-vowel (CV) syllables, such as /ba/ and /wa/, an increase in the duration of the vowel can cause a switch in the percept of the preceding consonant from /w/ to /b/ (J.L. Miller & Liberman, 1979). The frequency extent of the initial formant transitions of fixed duration also influences the percept (Schwab, Sawusch, & Nusbaum, 1981). PHONET quantitatively simulates over 98% of the variance in these data, using a single set of parameters. The model also qualitatively explains many data about other perceptual context effects. In the model, C and V inputs are filtered by parallel auditory streams that respond preferentially to the transient and sustained properties of the acoustic signal before being stored in parallel working memories. A lateral inhibitory network of onset- and rate-sensitive cells in the transient channel extracts measures of frequency transition rate and extent. Greater activation of the transient stream can increase the processing rate in the sustained stream via a cross-stream automatic gain control interaction. The stored activities across these gain-controlled working memories provide a basis for rate-invariant perception, since the transient-to-sustained gain control tends to preserve the relative activities across the transient and sustained working memories as speech rate changes. Comparisons with alternative models tested suggest that the fit cannot be attributed to the simplicity of the data. Brain analogues of model cell types are described.
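The cross-stream gain-control idea can be illustrated with a toy calculation (this is not the published PHONET model; all durations and extents are invented): compressing a syllable at a faster speech rate raises the formant transition rate and shortens the vowel, but letting the transient channel's output scale the sustained channel's integration leaves the stored activities rate-invariant.

```python
def stored_activities(rate_scale, extent_hz=600.0, trans_ms=40.0, vowel_ms=200.0):
    """Toy sketch: compress a /ba/-like syllable by rate_scale and return
    (transient, sustained) working-memory activities. The transient
    channel's output gain-controls the sustained channel's integration,
    keeping both stored values (and their ratio) rate-invariant."""
    trans_dur = trans_ms / rate_scale             # transition compresses at fast rates
    vowel_dur = vowel_ms / rate_scale             # so does the vowel
    transition_rate = extent_hz / trans_dur       # Hz/ms, grows with speech rate
    transient = transition_rate * trans_dur       # extent measure stored in transient WM
    gain = transition_rate / (extent_hz / trans_ms)   # cross-stream gain, normalized to 1 at base rate
    sustained = gain * vowel_dur                  # gain-controlled integration of the vowel
    return transient, sustained
```

Doubling the speech rate halves the vowel duration but doubles the gain, so the sustained working-memory activity comes out the same, which is the sense in which the gain control supports rate-invariant perception.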

20.
Selective adaptation experiments were conducted to test for the presence of a mechanism that mediates an aspect of both speech perception and speech production. Subjects were instructed to utter /pi/ or /bi/ after listening to repetitions of either of these syllables or to repetitions of the vowel /i/. Analysis of the utterances showed that a timing relation which distinguishes /pi/ from /bi/, namely the latency in onset of voicing relative to the release burst of the consonant, varied systematically for the /pi/ utterances but not for the /bi/ utterances as a function of the speech input. The effect for the /pi/ utterances was shown not to be attributable to factors such as compensation for distorted perception of the /pi/ adapting stimulus or voluntary mimicry of this stimulus.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) · 京ICP备09084417号