1.
2.
Within the next few years, there will be an extensive proliferation of various types of voice response devices in human-machine communication systems. Unfortunately, at present, relatively little basic or applied research has been carried out on the intelligibility, comprehension, and perceptual processing of synthetic speech produced by these devices. On the basis of our research, we identify five factors that must be considered in studying the perception of synthetic speech: (1) the specific demands imposed by a particular task, (2) the inherent limitations of the human information processing system, (3) the experience and training of the human listener, (4) the linguistic structure of the message set, and (5) the structure and quality of the speech signal.
3.
Richard M. Warren Keri Riener Hainsworth Bradley S. Brubaker James A. Bashford Eric W. Healy 《Attention, perception & psychophysics》1997,59(2):275-283
In order to function effectively as a means of communication, speech must be intelligible under the noisy conditions encountered in everyday life. Two types of perceptual synthesis have been reported that can reduce or cancel the effects of masking by extraneous sounds: Phonemic restoration can enhance intelligibility when segments are replaced or masked by noise, and contralateral induction can prevent mislateralization by effectively restoring speech masked at one ear when it is heard in the other. The present study reports a third type of perceptual synthesis induced by noise: enhancement of intelligibility produced by adding noise to spectral gaps. In most of the experiments, the speech stimuli consisted of two widely separated narrow bands of speech (center frequencies of 370 and 6000 Hz, each band having high-pass and low-pass slopes of 115 dB/octave meeting at the center frequency). These very narrow bands effectively reduced the available information to frequency-limited patterns of amplitude fluctuation lacking information concerning formant structure and frequency transitions. When stochastic noise was introduced into the gap separating the two speech bands, intelligibility increased for “everyday” sentences, for sentences that varied in the transitional probability of keywords, and for monosyllabic word lists. Effects produced by systematically varying noise amplitude and noise bandwidth are reported, and the implications of some of the novel effects observed are discussed.
4.
Infant rule learning facilitated by speech
Sequences of speech sounds play a central role in human cognitive life, and the principles that govern such sequences are crucial in determining the syntax and semantics of natural languages. Infants are capable of extracting both simple transitional probabilities and simple algebraic rules from sequences of speech, as demonstrated by studies using ABB grammars (la ta ta, gai mu mu, etc.). Here, we report a striking finding: Infants are better able to extract rules from sequences of nonspeech--such as sequences of musical tones, animal sounds, or varying timbres--if they first hear those rules instantiated in sequences of speech.
5.
Michael Natale 《Brain and language》1977,4(1):32-44
Two groups of right-handed young adults were tested on a series of handedness measures and on dichotic nonverbal rhythmic sequences. Cross-validated multiple regression analysis revealed that all of the cerebral-lateralization/manual-praxis measures were positively related to the degree of left-hemisphere perceptual asymmetry for nonverbal rhythms (Crawford peg, scissor, handwriting, Crawford screws, tracing, total R = .67). Seventeen of the 52 subjects manifested significant (p < .05) left-hemisphere laterality coefficients for the dichotic stimuli. More complex rhythms elicited greater left-hemisphere perceptual preference. The results are discussed in reference to the concept of cerebral lateralization.
6.
Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial stops in English (e.g., "beach" and "peach"). Participants categorized words from distributions of VOT with wide or narrow variances. Uncertainty about word identity was measured by four-alternative forced-choice judgments and by the probability of looks to pictures. Both measures closely reflected the posterior probability of the word given the likelihood distributions of VOT, suggesting that listeners are sensitive to these distributions.
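The posterior probability mentioned in this abstract can be illustrated with a small sketch. It is not the authors' analysis: it assumes two word categories with Gaussian VOT likelihoods and equal priors, and the category means (0 ms and 50 ms) and standard deviations are hypothetical values chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_p(vot_ms, sd, mean_b=0.0, mean_p=50.0, prior_p=0.5):
    """Posterior probability of the /p/ word given a VOT value (Bayes' rule).

    mean_b, mean_p, sd and prior_p are hypothetical illustration values,
    not parameters reported in the abstract.
    """
    like_b = gaussian_pdf(vot_ms, mean_b, sd)  # likelihood under /b/
    like_p = gaussian_pdf(vot_ms, mean_p, sd)  # likelihood under /p/
    numerator = like_p * prior_p
    return numerator / (numerator + like_b * (1.0 - prior_p))


if __name__ == "__main__":
    # Compare a narrow and a wide VOT distribution at the same VOT values.
    for vot in (15.0, 25.0, 35.0):
        narrow = posterior_p(vot, sd=8.0)
        wide = posterior_p(vot, sd=16.0)
        print(f"VOT {vot:4.1f} ms  narrow sd: {narrow:.3f}  wide sd: {wide:.3f}")
```

Under these assumptions, a VOT away from the category midpoint yields a posterior closer to 0 or 1 when the distributions are narrow than when they are wide, which is the kind of graded uncertainty the abstract describes.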
7.
The insertion of noise in the silent intervals of interrupted speech has a very striking perceptual effect if a certain signal-to-noise ratio is used. Conflicting reports have been published as to whether the inserted noise improves speech intelligibility or not. The major difference between studies was the level of redundancy in the speech material. We show in the present paper that the noise leads to a better intelligibility of interrupted speech. The redundancy level determines the possible amount of improvement. The consequences of our findings are discussed in relation to such phenomena as continuity perception and pulsation threshold measurement. A hypothesis is formulated for the processing of interrupted stimuli with and without intervening noise: for stimuli presented with intervening noise, the presence in the auditory system of an automatic interpolation mechanism is assumed. The mechanism operates only if the noise makes it impossible to perceive the interruption.
8.
Four monkeys and 6 humans representing five different native languages were compared in the ability to categorize natural CV tokens of /b/ versus /d/ produced by 4 talkers of American English (2 male, 2 female) in four vowel contexts (/i, e, a, u/). A two-choice "left/right" procedure was used in which both percentage correct and response time data were compared between species. Both measures indicated striking context effects for monkeys, in that they performed better for the back vowels /a/ and /u/ than for the front vowels /i/ and /e/. Humans showed no context effects for the percentage correct measure, but their response times showed an enhancement for the /i/ vowel, in contrast with monkeys. Results suggest that monkey perception of place of articulation is more dependent than human perception on the direction of the F2 onset transitions of syllables, since back-vowel F2s differentiate /b/ and /d/ more distinctively. Although monkeys do not provide an accurate model of the adult human in place perception, they may be able to model the preverbal human infant before it learns a more speech-specific strategy of place information extraction.
9.
Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired (“unity assumption”). Participants made temporal order judgments (TOJ) and simultaneity judgments (SJ) about sine-wave speech (SWS) replicas of pseudowords and the corresponding video of the face. Listeners in speech and non-speech mode were equally sensitive judging audiovisual temporal order. Yet, using the McGurk effect, we could demonstrate that the sound was more likely integrated with lipread speech if heard as speech than non-speech. Judging temporal order in audiovisual speech is thus unaffected by whether the auditory and visual streams are paired. Conceivably, previously found differences between speech and non-speech stimuli are not due to the putative “special” nature of speech, but rather reflect low-level stimulus differences.
10.
Anne P. Copeland 《Journal of abnormal child psychology》1979,7(2):169-177
Types and amount of private speech (audible talking that is not addressed to another person) were assessed during the free play of 16 hyperactive and 16 nonhyperactive boys. Verbalizations were coded into nine categories that denoted the boys' level of use of verbal control of their own behavior (Luria, 1961; Kohlberg, Yeager, & Hjertholm, 1968). Differences in amount and type of private speech between hyperactive and nonhyperactive boys were found to indicate that hyperactive boys may be presenting a specific or general cognitive lag in development. Treatment ramifications are discussed.
11.
12.
Alexander P.W Shubsachs James B Rounds René V Dawis Lloyd H Lofquist 《Journal of Vocational Behavior》1978,13(1):54-62
The factor structure of 109 Occupational Reinforcer Patterns approximating the distribution of the employed labor force of the United States was investigated. These work-reinforcer systems, as perceived by almost 6000 raters, were found to be represented best by a three-factor solution. The factors were identified as a Self-Reinforcement factor, an Environmental/Organizational Reinforcement factor, and a Reinforcement-via-Altruism factor. The factors were found to correspond to, respectively, the Achievement-Autonomy, Safety-Comfort, and Altruism need factors of the Minnesota Importance Questionnaire. For these two measures utilized in the assessment of individual-environment correspondence, commensurate measurement, as required by person-environment fit theories, is possible.
13.
The aim of this study was to investigate the perception of competing speech stimuli in 3-, 4-, and 5-year-old normally developing children. A dichotic listening paradigm was used in which the temporal alignment between the two stimuli was varied to represent three levels of competition. Minimal, moderate, and maximal levels of temporal competition were represented by a Separation, Lag, and Simultaneous test condition, respectively. The subjects were behaviorally set to listen for and to report the two stimuli on each trial. The incidence of double correct responses in the test conditions was the measure of interest. The results show a sharp and linear drop in double correct scores from the Separation, to the Lag, and to the Simultaneous condition. There were no age-related differences in the Separation and the Simultaneous conditions. In the Lag condition, the performance of the 3-year-olds was significantly lower than the 4- and 5-year-olds. The findings were interpreted to be indicative of limited auditory processing ability in preschoolers for moderately and maximally competing speech stimuli.
14.
To offset shortcomings of existing demonstrations of right-ear superiority in the analysis of formant transitions, an experiment was performed on whispered speech. Two aspects of dichotic listening performance were examined in a single-report paradigm: the right-ear advantage (REA) for the perception of the voicing distinction and the feature sharing advantage (FSA) for both voicing and place features. A significant REA was obtained for the voicing distinction cued by first formant transition in the absence of a switch from aperiodic to periodic excitation. This, plus a greater incidence of voiced responses to right-ear stimuli, suggests that a distinction involving transitions can specifically augment the REA. The data also showed better identification of place and of voicing feature values when the competing dichotic speech stimuli shared these respective features (FSA) than when they did not. This FSA was restricted to the feature shared and hence not an effect of response uncertainty. The implications of these results for models of speech processing are discussed.
15.
Stephen M. Williams 《Current Psychology》1987,6(2):148-154
Previous experimental investigation of the effects of repeating an unfamiliar stimulus suggests that mere exposure breeds attraction (e.g., Zajonc, 1968). On the other hand, correlational work with naturally occurring stimuli such as names, music, or landscapes suggests that there is also an overexposure effect: the preference function does rise with familiarity at first but then reaches a turning point and diminishes. The study (N=72) demonstrates this inverted-U relationship in an experimental setting. The stimuli were synthetic nonsense speech, permitting exact control of exposure durations and interstimulus intervals. The critical factors for demonstrating the effect are probably (1) the inclusion of a large number of repetitions, and (2) blocked repetition of each stimulus in a homogeneous sequence not interspersed with other more or less frequent stimuli.
16.
Kam-lun Edwin Lee 《Zygon》1997,32(1):65-81
This article seeks to explain the correspondence between human intelligibility and that of the physical world by synthesizing the contributions of Jean Ladrière. Ladrière shows that the objectification function of formal symbolism in mathematics as an artificial language has operative power acquired through algorithm to represent physical reality. In physical theories, mathematics relates to observations through theoretic and empirical languages mutually interacting in a methodological circle, and nonmathematical anticipatory intention guides mathematical algorithmic exploration. Ladrière reasons that mathematics can make the physical world comprehensible because of the presence of a rational principle in both kinds of intelligibility.
17.
18.
Christopher Friel 《Heythrop Journal》2019,60(1):55-78
The aim of this paper is to note the convergence between two critical realist philosophies of science, namely, that of Roy Bhaskar and Bernard Lonergan with regard to the intelligibility of experimental activity. Bhaskar very explicitly argues that ‘differentiation implies stratification.’ The idea is that because the situations produced in laboratories are special instances of closure (like the solar system in the open universe, they do not represent the general case) the significance of experimental activity is that it brings about regularities with a view to understanding scientific laws at a deeper level. This is to say, when experiment is properly understood, the weaknesses of empiricism are exposed. Although he is not as explicit, Lonergan also has recourse to this argument. The parallels between Bhaskar and Lonergan are not surprising given the Aristotelian heritage that is manifest in their common concern for a realist ontology. Nevertheless, some differences between the two emerge, for example, in Lonergan's concern with the development of statistical science, and as well, a firm commitment to substance (rather than to powers, simply). Some attention to the significance of experimental activity for the debate surrounding realism is explored; it is suggested that Lonergan has something to offer in the subsequent conversation associated with Maxwell, van Fraassen, Hacking and Cartwright.
19.
The perception of the distinction between /r/ and /l/ by native speakers of American English and of Japanese was studied using natural and synthetic speech. The American subjects were all nearly perfect at recognizing the natural speech sounds, whereas there was substantial variation among the Japanese subjects in their accuracy of recognizing /r/ and /l/ except in syllable-final position. A logit model, which additively combined the acoustic information conveyed by F1-transition duration and by F3-onset frequency, provided a good fit to the perception of synthetic /r/ and /l/ by the American subjects. There was substantial variation among the Japanese subjects in whether the F1 and F3 cues had a significant effect on their classifications of the synthetic speech. This variation was related to variation in accuracy of recognizing natural /r/ and /l/, such that greater use of both the F1 cue and the F3 cue in classifying the synthetic speech sounds was positively related to accuracy in recognizing the natural sounds. However, multiple regression showed that use of the F1 cue did not account for significant variance in natural speech performance beyond that accounted for by the F3 cue, indicating that the F3 cue is more important than the F1 cue for Japanese speakers learning English. The relation between performance on natural and synthetic speech also provides external validation of the logit model by showing that it predicts performance outside of the domain of data to which it was fit.
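The additive logit model named in this abstract can be sketched as follows. This is only an illustration of the general form (a logistic function of a weighted sum of the two cues); the function name, the coefficient values, and their signs are hypothetical and are not taken from the study.

```python
import math


def p_l_response(f1_transition_ms, f3_onset_hz, b0=-6.0, b_f1=-0.05, b_f3=0.004):
    """Probability of an /l/ response under an additive logit model.

    The log-odds are a weighted sum of the two cues with no interaction
    term. b0, b_f1 and b_f3 are hypothetical illustration values.
    """
    log_odds = b0 + b_f1 * f1_transition_ms + b_f3 * f3_onset_hz
    return 1.0 / (1.0 + math.exp(-log_odds))


def log_odds(p):
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Additivity: the same change in F1-transition duration shifts the
    # log-odds by the same amount at every F3-onset frequency.
    for f3 in (2000.0, 2500.0):
        shift = log_odds(p_l_response(20.0, f3)) - log_odds(p_l_response(50.0, f3))
        print(f"F3 onset {f3:.0f} Hz: log-odds shift = {shift:.3f}")
```

In a model of this form, a listener who ignores one cue would simply have a coefficient near zero for it, which is one way to read the individual variation in cue use that the abstract reports.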
20.
Leah S. Larkey Jerry Wald Winifred Strange 《Attention, perception & psychophysics》1978,23(4):299-312
Identification and discrimination of synthesized syllable-initial and syllable-final nasal consonants (/mæ-næ-ŋæ/ and /æm-æn-æŋ/) by adult American subjects were assessed to determine (1) whether place-of-articulation contrasts in nasals, cued only by second and third formant transition variations, were perceived categorically, and (2) if linguistic experience affected the perception of this acoustic dimension. In two experiments, subjects produced consistent identification functions with sharp boundaries between familiar phoneme categories. Corresponding discrimination functions showed “peaks” of relatively accurate perception for cross-category comparison pairs, indicating categorical perception. Identification consistency and discrimination accuracy were inferior for the /n/-/ŋ/ contrast in the unfamiliar (and phonologically inappropriate) syllable-initial condition compared to the familiar syllable-final condition. No such difference was found in identification and discrimination of the acoustically comparable oral stop consonant contrast /d/-/g/ in syllable-initial and syllable-final position. These results provide evidence that perception of linguistically relevant acoustic dimensions by adults is constrained, at least in part, by their familiarity with those acoustic (and phonetic) contrasts in specific phonological contexts.