20 similar documents found (search time: 0 ms)
1.
We have implemented a facial animation system to carry out visible speech synthesis. Using this system, it is possible to manipulate control parameters to synthesize a sequence of speech articulations. In addition, it is possible to synthesize novel articulations, such as one that is halfway between /ba/ and /da/.
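A "halfway" articulation of the kind described above can be sketched as a linear interpolation over the system's control parameters. The parameter names and values below are hypothetical illustrations, not the actual parameters of the system:

```python
def interpolate_articulation(params_a, params_b, alpha):
    """Linearly blend two articulatory parameter sets.

    alpha=0.0 returns params_a, alpha=1.0 returns params_b,
    and alpha=0.5 gives the halfway articulation.
    """
    return {name: (1.0 - alpha) * params_a[name] + alpha * params_b[name]
            for name in params_a}

# Hypothetical control parameters for /ba/ and /da/
ba = {"jaw_opening": 0.8, "lip_closure": 1.0, "tongue_tip_height": 0.1}
da = {"jaw_opening": 0.7, "lip_closure": 0.2, "tongue_tip_height": 0.9}

halfway = interpolate_articulation(ba, da, 0.5)
```

Sweeping `alpha` from 0 to 1 would yield a stimulus continuum between the two syllables, the kind of continuum used in the perception studies listed below.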
2.
Luis E. López-Bascuas Carlos Carrero Marín Francisco J. Serradilla García 《Behavior research methods》1999,31(2):334-340
A computer program capable of supporting auditory and speech perception experimentation is described. Given a continuum of acoustic stimuli, the program (Paradigm) allows the user to present those stimuli under different, well-known psychophysical paradigms (simple identification, identification with a rating scale, 2IAX, ABX, AXB, and oddity task). For discrimination tests, both high uncertainty (roving designs) and minimal psychophysical uncertainty (fixed designs) procedures are available. All the relevant time intervals can be precisely specified, and feedback is also available. Response times can be measured as well. Furthermore, the program stores subjects’ responses and provides summaries of experimental results for both individual subjects and groups. The program runs on Microsoft Windows (3.1 or 95) on personal computers equipped with any soundboard.
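As an illustration of one of the paradigms named above, here is a minimal sketch of the logic of an ABX discrimination block. The function and stimulus names are hypothetical and are not Paradigm's actual interface:

```python
import random

def run_abx_block(stim_a, stim_b, classify, n_trials=20, seed=0):
    """Simulate an ABX block: on each trial, X is randomly drawn to be
    A or B, and the listener's classify(x) returns "A" or "B".
    Returns the proportion of correct responses."""
    rng = random.Random(seed)  # seeded for reproducibility
    correct = 0
    for _ in range(n_trials):
        x_is_a = rng.random() < 0.5
        x = stim_a if x_is_a else stim_b
        response = classify(x)
        if (response == "A") == x_is_a:
            correct += 1
    return correct / n_trials

# A perfect listener who always identifies X correctly scores 1.0
perfect = lambda x: "A" if x == "stim_A" else "B"
score = run_abx_block("stim_A", "stim_B", perfect)
```

AXB and 2IAX differ only in the order and pairing of the intervals; the scoring logic is analogous.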
3.
The hardware configuration and software for a voice-output communications station are described. Advantages of such a capability in computer-aided instruction and as an aid for the visually handicapped are noted. The man-computer communication station incorporates a relatively inexpensive digitally controlled voice synthesizer.
4.
Kerzel D 《Psychological research》2002,66(3):195-200
The present study investigated compatibility effects between written and spoken syllables. Participants saw the syllables "Ba" or "Da" printed on a speaker's mouth that was articulating either /bʌ/ or /dʌ/. Participants classified either the printed syllable or the mouth movement by pressing a left or right key. Responses were faster when mouth movement and letters were congruent, regardless of imperative stimulus dimension. As the two stimulus dimensions (mouth movements and letters) showed dimensional overlap, but did not overlap with the response, stimulus-response compatibility was ruled out according to some models. It is argued that the compatibility effect was due to the competition of phonological codes at a stage preceding response selection. Also, the results lend support to the view that Stroop-like tasks are ambiguous with regard to the locus of compatibility effects: stimulus-response and stimulus-stimulus compatibility may both be observed.
5.
Mary Beth Rosson 《Behavior research methods》1985,17(2):250-252
The availability of unlimited text-to-speech synthesis systems provides the potential for remote access to databases over existing telephone systems. This paper describes some of the intelligibility and user interface problems associated with such applications and reports work aimed at understanding and solving these problems. In particular, it describes an approach to intelligibility problems based on the understanding and manipulation of listeners’ adaptation to synthetic speech, and user interface work examining use of a simulated travel information system.
6.
7.
An experiment was performed to test whether cross-modal speaker matches could be made using isolated visible speech movement information. Visible speech movements were isolated using a point-light technique. In five conditions, subjects were asked to match a voice to one of two (unimodal) speaking point-light faces on the basis of speaker identity. Two of these conditions were designed to maintain the idiosyncratic speech dynamics of the speakers, whereas three of the conditions deleted or distorted the dynamics in various ways. Some of these conditions also equated video frames across dynamically correct and distorted movements. The results revealed generally better matching performance in the conditions that maintained the correct speech dynamics than in those conditions that did not, despite containing exactly the same video frames. The results suggest that visible speech movements themselves can support cross-modal speaker matching.
8.
Kerzel D Bekkering H 《Journal of experimental psychology. Human perception and performance》2000,26(2):634-647
In speech perception, phonetic information can be acquired optically as well as acoustically. The motor theory of speech perception holds that motor control structures are involved in the processing of visible speech, whereas perceptual accounts do not make this assumption. Motor involvement in speech perception was examined by showing participants response-irrelevant movies of a mouth articulating /ba/ or /da/ and asking them to verbally respond with either the same or a different syllable. The letters "Ba" and "Da" appeared on the speaker's mouth to indicate which response was to be performed. A reliable interference effect was observed. In subsequent experiments, perceptual interference was ruled out by using response-unrelated imperative stimuli and by preexposing the relevant stimulus information. Further, it was demonstrated that simple directional features (opening and closing) do not account for the effect. Rather, the present study provides evidence for the view that visible speech is processed up to a late, response-related processing stage, as predicted by the motor theory of speech perception.
9.
Within the next few years, there will be an extensive proliferation of various types of voice response devices in human-machine communication systems. Unfortunately, at present, relatively little basic or applied research has been carried out on the intelligibility, comprehension, and perceptual processing of synthetic speech produced by these devices. On the basis of our research, we identify five factors that must be considered in studying the perception of synthetic speech: (1) the specific demands imposed by a particular task, (2) the inherent limitations of the human information processing system, (3) the experience and training of the human listener, (4) the linguistic structure of the message set, and (5) the structure and quality of the speech signal.
10.
Stephen M. Williams 《Current Psychology》1987,6(2):148-154
Previous experimental investigation of the effects of repeating an unfamiliar stimulus suggests that mere exposure breeds attraction (e.g., Zajonc, 1968). On the other hand, correlational work with naturally occurring stimuli such as names, music, or landscapes suggests that there is also an overexposure effect: the preference function does rise with familiarity at first but then reaches a turning point and diminishes. The study (N = 72) demonstrates this inverted-U relationship in an experimental setting. The stimuli were synthetic nonsense speech, permitting exact control of exposure durations and interstimulus intervals. The critical factors for demonstrating the effect are probably (1) the inclusion of a large number of repetitions, and (2) blocked repetition of each stimulus in a homogeneous sequence not interspersed with other more or less frequent stimuli.
11.
The perception of the distinction between /r/ and /l/ by native speakers of American English and of Japanese was studied using natural and synthetic speech. The American subjects were all nearly perfect at recognizing the natural speech sounds, whereas there was substantial variation among the Japanese subjects in their accuracy of recognizing /r/ and /l/ except in syllable-final position. A logit model, which additively combined the acoustic information conveyed by F1-transition duration and by F3-onset frequency, provided a good fit to the perception of synthetic /r/ and /l/ by the American subjects. There was substantial variation among the Japanese subjects in whether the F1 and F3 cues had a significant effect on their classifications of the synthetic speech. This variation was related to variation in accuracy of recognizing natural /r/ and /l/, such that greater use of both the F1 cue and the F3 cue in classifying the synthetic speech sounds was positively related to accuracy in recognizing the natural sounds. However, multiple regression showed that use of the F1 cue did not account for significant variance in natural speech performance beyond that accounted for by the F3 cue, indicating that the F3 cue is more important than the F1 cue for Japanese speakers learning English. The relation between performance on natural and synthetic speech also provides external validation of the logit model by showing that it predicts performance outside of the domain of data to which it was fit.
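The additive logit model described above combines the two acoustic cues linearly on the log-odds scale. A minimal sketch follows; the coefficient values are illustrative assumptions, not the fitted values from the study (longer F1 transitions and lower F3 onsets are assumed to favor an /r/ response):

```python
import math

def logit_prob_r(f1_dur_ms, f3_onset_hz, b0, b_f1, b_f3):
    """Additive logit model: the log-odds of an /r/ response are a
    linear combination of the two acoustic cues."""
    log_odds = b0 + b_f1 * f1_dur_ms + b_f3 * f3_onset_hz
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients: F1 duration in ms, F3 onset in Hz
B0, B_F1, B_F3 = 6.0, 0.05, -0.004

p_r_like = logit_prob_r(60, 1600, B0, B_F1, B_F3)  # long F1, low F3: /r/-like
p_l_like = logit_prob_r(20, 2800, B0, B_F1, B_F3)  # short F1, high F3: /l/-like
```

Because the cues enter additively in log-odds, the model's fit can be compared cue by cue, which is how the study assessed the relative weight of F1 versus F3 for the Japanese listeners.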
12.
Carsten Elbro Ingelise Rasmussen Birgitte Spelling 《Scandinavian journal of psychology》1996,37(2):140-155
In a long-term study, two groups of language- and reading-impaired students (N = 15 + 15) read with the aid of segmented speech feedback in a computerized program. One group received feedback that was simultaneously segmented visually and auditorily into syllables; the other received feedback by letter names. In both groups, subjects were expected to synthesize segments into words and to compare their synthesis to whole-word feedback subsequently provided by the computer. They worked for half a lesson (approximately 20 minutes) a day for a total of 40 days. During this period, the experimental groups progressed more in reading than a control group of age- and reading-level-matched students (N = 35) who received traditional remedial instruction. The group in the syllable condition gained slightly more in non-word reading and in syllable segmentation than did the letter group. Differences in gains in reading abilities were not explained by differences in age, but to some extent by initial level of phoneme and syllable awareness. Future applications of the speech-feedback system are discussed.
13.
Denny C. LeCompte 《Psychonomic bulletin & review》1995,2(3):391-397
The irrelevant speech effect is the impairment of task performance by the presentation of to-be-ignored speech stimuli. Typically, the irrelevant speech comprises a variety of sounds, but previous research (e.g., Jones, Madden, & Miles, 1992) has suggested that the deleterious effect of background speech is virtually eliminated if the speech comprises repetitions of a sound (e.g., “be, be, be”) or a single continuous sound (e.g., “beeeeeee”). Four experiments are reported that challenge this finding. Experiments 1, 2, and 4 show a substantial impairment in serial recall performance in the presence of a repeated sound, and Experiments 3 and 4 show a similar impairment of serial recall in the presence of a continuous sound. The relevance of these findings to several explanations of the irrelevant speech effect is discussed.
15.
Barbara Wise Richard Olson Mike Anstett Lauralyn Andrews Maureen Terjak Vivian Schneider Julie Kostuch Laura Kriho 《Behavior research methods》1989,21(2):173-180
This paper discusses hardware choices, software developments, implementation issues, and preliminary results from an ongoing long-term remedial reading study. Reading-disabled children read books on microcomputers linked to speech synthesizers, obtaining speech feedback on difficult words at whole-word, syllable, or subsyllable levels of segmentation. Word-recognition ability and attitude about reading improved for children using the system. In addition, segmented feedback especially benefited phonological word-decoding skills for most of the children.
16.
D W Massaro 《Journal of experimental psychology. General》1988,117(4):417-421
Bruno and Cutting (1988) varied four monocular cues to perceived depth in a factorial design. Subjects judged the distance between test objects. Given main effects in the analysis of variance, the authors concluded that the perceivers integrated the four different sources of information, as opposed to simply selecting a single source. Given no interactions in the analysis of variance, the authors concluded that the integration process was additive rather than multiplicative. The ambiguity inherent in Bruno and Cutting's experiments and analyses is discussed. As presented, their results did not provide evidence for integration of depth cues or evidence for additivity, independence, and parallel processing of the cues. An additional analysis of the distribution of the rating judgments given by their subjects, however, provides some evidence for integration of the cues. The fuzzy logical model of perception (FLMP) is extended to describe perceptual recognition of depth. The model assumes independence of the cues during feature evaluation and a nonadditive integration process in which the least ambiguous cues have the greatest impact on the judgment. The FLMP is contrasted with a model assuming additivity of the cues. Because both models describe the results equally well, it remains for future researchers to provide definitive tests between the models.
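The contrast between the FLMP's nonadditive integration and an additive rule can be made concrete with a small sketch. The FLMP formula below is the standard two-alternative form (multiplicative combination with normalization); the cue support values are illustrative:

```python
def flmp(t1, t2):
    """Two-cue, two-alternative FLMP: cue supports (0..1) are evaluated
    independently, combined multiplicatively, and normalized against
    the support for the competing alternative."""
    support = t1 * t2
    alt_support = (1.0 - t1) * (1.0 - t2)
    return support / (support + alt_support)

def additive(t1, t2):
    """A simple additive alternative: average the two cue supports."""
    return (t1 + t2) / 2.0

# An unambiguous cue (0.9) paired with a fully ambiguous one (0.5):
# under the FLMP the ambiguous cue drops out and the unambiguous cue
# dominates, while the additive rule dilutes it.
p_flmp = flmp(0.9, 0.5)      # ~0.9
p_add  = additive(0.9, 0.5)  # ~0.7
```

This is the property described in the abstract: under the FLMP, the least ambiguous cue has the greatest impact on the judgment, whereas additive combination weights the cues equally regardless of their ambiguity.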
17.
Catherine T. Best Harry Hoffman Bradley B. Glanville 《Attention, perception & psychophysics》1982,31(1):75-85
Groups of 2-, 3-, and 4-month-olds were tested for dichotic ear differences in memory-based phonetic and music timbre discriminations. A right-ear advantage for speech and a left-ear advantage (LEA) for music were found in the 3- and 4-month-olds. However, the 2-month-olds showed only the music LEA, with no reliable evidence of memory-based speech discrimination by either hemisphere. Thus, the responses of all groups to speech contrasts were different from those to music contrasts, but the pattern of the response dichotomy in the youngest group deviated from that found in the older infants. It is suggested that the quality or use of left-hemisphere phonetic memory may change between 2 and 3 months, and that the engagement of right-hemisphere specialized memory for musical timbre may precede that for left-hemisphere phonetic memory. Several directions for future research are suggested to determine whether infant short-term memory asymmetries for speech and music are attributable to acoustic factors, to different modes or strategies in perception, or to structural and dynamic properties of natural sound sources.
18.
L Goffman A Smith 《Journal of experimental psychology. Human perception and performance》1999,25(3):649-660
It is often hypothesized that speech production units are less distinctive in young children and that generalized movement primitives, or templates, serve as a base on which distinctive, mature templates are later elaborated. This hypothesis was examined by analyzing the shape and stability of single close-open speech movements of the lower lip recorded in 4-year-old, 7-year-old, and adult speakers during production of utterances that varied in only a single phoneme. To assess the presence of a generalized template, lower lip movement sequences were time and amplitude normalized, and a pattern recognition procedure was implemented. The findings indicate that speech movements of children already converged on phonetically distinctive patterns by 4 years of age. In contrast, an index of spatiotemporal stability demonstrated that the stability of underlying patterning of the movement sequence improves with maturation.
19.
Language and speech were studied in a young child with perinatally acquired bifrontal lesions. Bilateral frontal pathology seriously interfered with the development of intelligible speech and resulted in a persistent expressive aphasia. Analysis of the neuropsychological profile indicated impairments in intelligence and language comprehension. These deficits, however, were considered secondary to the profound speech programming disorder. The findings indicate that, despite the plasticity of the immature central nervous system, bilateral frontal injury sustained at an early age precludes the development of intelligible speech. Furthermore, structurally intact cortical regions outside the territories of the speech zones fail to mediate normal speech and language development. 相似文献