Similar Documents
Found 20 similar documents (search time: 31 ms).
1.
Voice quality is an important perceptual cue in many disciplines, but knowledge of its nature is limited by a poor understanding of the relevant psychoacoustics. This article (aimed at researchers studying voice, speech, and vocal behavior) describes the UCLA voice synthesizer, software for voice analysis and synthesis designed to test hypotheses about the relationship between acoustic parameters and voice quality perception. The synthesizer provides experimenters with a useful tool for creating and modeling voice signals. In particular, it offers an integrated approach to voice analysis and synthesis and allows easy, precise, spectral-domain manipulations of the harmonic voice source. The synthesizer operates in near real time, using a parsimonious set of acoustic parameters for the voice source and vocal tract that a user can modify to accurately copy the quality of most normal and pathological voices. The software, user’s manual, and audio files may be downloaded from http://brm.psychonomic-journals.org/content/supplemental. Future updates may be downloaded from www.surgery.medsch.ucla.edu/glottalaffairs/.

2.
In this paper, we describe the application of new computer and speech synthesis technologies for reading instruction. Stories are presented on the computer screen, and readers may designate words or parts of words that they cannot read for immediate speech feedback. The important contingency between speech sounds and their corresponding letter patterns is emphasized by displaying the letter patterns in reverse video as they are spoken. Speech feedback is provided by an advanced text-to-speech synthesizer (DECtalk). Intelligibility data are presented, showing that DECtalk can be understood almost as well as natural human speech by both normal adults and reading disabled children. Preliminary data from 26 disabled readers indicate that there are significant benefits of speech feedback for reading comprehension and word recognition, and that children enjoy reading with the system.

3.
In this report we describe a graphical interface for generating voiced speech using a frequency-domain implementation of the Klatt (1980) cascade formant synthesizer. The input to the synthesizer is a set of parameter vectors, called tracks, which specify the overall amplitude, fundamental frequency, formant frequencies, and formant bandwidths at specified time intervals. Tracks are drawn with the aid of a computer mouse that can be used either in point-draw mode, which selects a parameter value for a single time frame, or in line-draw mode, which uses piecewise linear interpolation to connect two user-selected endpoints. Three versions of the program are described: (1) SYNTH draws tracks on an empty time-frequency grid, (2) SPECSYNTH creates a spectrogram of a recorded signal upon which tracks can be superimposed, and (3) SWSYNTH is similar to SPECSYNTH, except that it generates sine-wave speech (Remez, Rubin, Pisoni, & Carrell, 1981) using a set of time-varying sinusoids rather than cascaded formants. The program is written for MATLAB, an interactive computing environment for matrix computation. Track-Draw provides a useful tool for investigating the perceptually salient properties of voiced speech and other sounds.
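A minimal sketch of the line-draw interpolation described above, written in Python rather than the program's own MATLAB; the function and variable names are illustrative and not Track-Draw's actual API:

```python
import numpy as np

def line_draw(track, t0, v0, t1, v1):
    """Fill track frames t0..t1 with a straight line between two
    user-selected endpoints (t0, v0) and (t1, v1), as in line-draw mode."""
    frames = np.arange(t0, t1 + 1)
    track[t0:t1 + 1] = np.interp(frames, [t0, t1], [v0, v1])
    return track

# Point-draw mode reduces to setting a single frame: track[t] = v
f0_track = np.zeros(100)                             # e.g., a 100-frame F0 track
f0_track = line_draw(f0_track, 0, 120.0, 99, 90.0)   # fall from 120 Hz to 90 Hz
```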

4.
Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but they cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient’s speaking rate in real time and provide user-friendly feedback to both patient and therapist. Speaking rate is computed by a phoneme-counting algorithm that uses spectral transition measure extraction to estimate phoneme boundaries. The algorithm runs in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice, and the other provides the speech therapist with recordings, speech rate analysis, and tools to manage the patient’s practice. The algorithm’s phoneme-counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal, and fast paces, and was compared to manual counting by speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of −4% to 11% between automatic and human phoneme counting; differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice.
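A rough sketch of the kind of spectral transition measure (STM) such a phoneme-counting algorithm relies on, assuming MFCC features and a short regression window; this illustrates the general technique, not the authors' implementation (librosa is an assumed dependency):

```python
import numpy as np
import librosa  # assumed available for MFCC extraction

def spectral_transition_measure(y, sr, half_win=2):
    """Per-frame STM: mean squared regression slope of each MFCC
    coefficient over a short window; peaks suggest phoneme boundaries."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, T)
    t = np.arange(-half_win, half_win + 1)
    denom = float(np.sum(t ** 2))
    T = mfcc.shape[1]
    stm = np.zeros(T)
    for n in range(half_win, T - half_win):
        window = mfcc[:, n - half_win:n + half_win + 1]
        slopes = window @ t / denom        # regression slope per coefficient
        stm[n] = np.mean(slopes ** 2)
    return stm

# Counting STM peaks and dividing by utterance duration gives a
# phonemes-per-second estimate of speaking rate.
```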

5.
Inexpensive speech synthesizers are now available that either plug into a computer’s card slots or connect to its serial port. With some listening practice, the speech produced by these synthesizers is quite intelligible, and because they can voice the information that appears on the monitor screen, they make it possible for blind persons to interact with computers. However, they provide only a partial solution: in too many instances, some of the text appearing on the monitor screen cannot be directed to the synthesizer, and the missing information is often crucial. This happens because the writers of application programs sometimes bypass the computer’s disk operating system when writing to the screen. This limitation can be overcome with special hardware that reads the screen buffer directly and creates a virtual image of the screen in memory external to the computer. Because the image is updated continually, it is always an accurate representation of whatever appears on the screen. The characters contained in the external buffer can be sent to a speech synthesizer and examined selectively by means of a small keyboard. Several successful hardware solutions have been demonstrated, and one of them is described here in detail.

6.
Talking computers employing computer-generated speech feedback have been used to remediate the literacy skills of dyslexic readers. A computer program is described that employs DECtalk, a high-level speech synthesizer, to narrate instruction involving intensive training in identifying whole words or in identifying and blending word segments corresponding to onsets, rimes, and phonemes. Procedures for developing individualized instruction are described, as well as for constructing and editing the speech and graphics features of the program. Neurologically impaired dyslexic children trained with this program achieved greater acquisition and transfer of word recognition skill when their training involved segmented rather than whole word feedback.

7.
Emotion is an essential element of human-computer interaction. In expressive speech synthesis, it is important to generate emotional speech that reflects subtle and complex emotional states. However, there has been limited research on how to synthesize emotional speech with intuitive control over different levels of emotion strength, which are difficult to model effectively. In this paper, we explore an expressive speech synthesis model that can produce speech with multiple emotion strengths. Unlike previous studies that encoded emotions as discrete codes, we propose an embedding vector that controls emotion strength continuously, a data-driven method for synthesizing speech with fine control over emotion. Compared with models using retraining or a one-hot vector, our proposed model using an embedding vector can explicitly learn the high-level emotion strength from low-level acoustic features. As a result, we can control the emotion strength of synthetic speech in a relatively predictable and globally consistent way. Objective and subjective evaluations show that our proposed model achieves state-of-the-art performance in terms of model flexibility and controllability.
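The paper's model is not reproduced here, but the core idea of conditioning synthesis on a continuous strength value via an embedding vector can be sketched as follows; the dimensions, the additive conditioning, and all names are assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class StrengthEmbedding(nn.Module):
    """Map a continuous emotion-strength scalar to an embedding vector
    that conditions a TTS encoder's states (illustrative sketch only)."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(1, embed_dim), nn.Tanh())

    def forward(self, encoder_states, strength):
        # encoder_states: (batch, time, embed_dim); strength: (batch, 1) in [0, 1]
        e = self.proj(strength).unsqueeze(1)   # (batch, 1, embed_dim)
        return encoder_states + e              # broadcast over all time steps

cond = StrengthEmbedding()
states = torch.randn(2, 50, 256)
strengths = torch.tensor([[0.2], [0.9]])       # weak vs. strong emotion
conditioned = cond(states, strengths)
```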

8.
There is no consensus regarding the fundamental phonetic units that underlie speech production. There is, however, general agreement that the frequency of occurrence of these units is a significant factor. Investigators often use the effects of manipulating frequency to support the importance of particular units. Studies of pseudoword production have been used to show the importance of sublexical units, such as initial syllables, phonemes, and biphones. However, it is not clear that these units play the same role when the production of pseudowords is compared to the production of real words. In this study, participants overtly repeated real words and pseudowords that were matched for length, complexity, and initial syllable frequency while undergoing functional magnetic resonance imaging. Compared to real words, pseudowords elicited greater activation in much of the speech production network, including bilateral inferior frontal cortex, the precentral gyri and supplementary motor areas, and left superior temporal cortex and anterior insula. Only the right middle frontal gyrus showed greater activation for real words than for pseudowords. Compared to a no-speech control condition, production of either pseudowords or real words activated all of the areas shown to comprise the speech production network. Our data, in conjunction with previous studies, suggest that the unit identified as the basic unit of speech production is influenced by the nature of the speech being studied, i.e., real words compared to other real words, pseudowords compared to other pseudowords, or real words compared to pseudowords.

9.
The roles of phonological short-term memory (pSTM) and speech perception in spoken sentence comprehension were examined in an experimental design. Deficits in pSTM and speech perception were simulated through task demands while typically developing children (N = 71) completed a sentence-picture matching task. Children performed the control, simulated pSTM deficit, simulated speech perception deficit, or simulated double deficit condition. On long sentences, the double deficit group had lower scores than the control and speech perception deficit groups, and the pSTM deficit group had lower scores than the control group and marginally lower scores than the speech perception deficit group. The pSTM and speech perception groups performed similarly to groups with real deficits in these areas, who completed the control condition. Overall, scores were lowest on noncanonical long sentences. Results show that pSTM has a greater effect than speech perception on sentence comprehension, at least in the tasks employed here.

10.
A system is described for analyzing recorded natural speech in real time using a microcomputer. Recordings up to 15 min in length can be analyzed in terms of fundamental frequency, amplitude, length of utterances, and pauses. Although primarily developed for clinical research, the system has applicability to other research areas involving speech.
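As a sketch of one of the measures such a system computes, the following Python fragment segments a frame-level energy contour into utterance and pause durations by simple thresholding; the threshold and frame values are illustrative, not the system's actual parameters:

```python
import numpy as np

def utterance_and_pause_lengths(energy_db, frame_s=0.01,
                                thresh_db=-40.0, min_pause_s=0.25):
    """Split a frame-level energy contour (dB) into utterance and
    pause durations using an energy threshold."""
    speech = energy_db > thresh_db
    # Indices where speech/silence状态 flips, plus the two ends
    changes = np.flatnonzero(np.diff(speech.astype(int))) + 1
    bounds = np.concatenate(([0], changes, [len(speech)]))
    utterances, pauses = [], []
    for a, b in zip(bounds[:-1], bounds[1:]):
        dur = (b - a) * frame_s
        if speech[a]:
            utterances.append(dur)
        elif dur >= min_pause_s:     # ignore very short gaps
            pauses.append(dur)
    return utterances, pauses
```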

11.
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

12.
To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to vowel sounds with infant vocal properties over vowel sounds with adult vocal properties. A listening preference favoring infant vowels may derive from their higher voice pitch, which has been shown to attract infant attention in infant-directed speech (IDS). In addition, infants' nascent articulatory abilities may induce a bias favoring infant speech, given that 4- to 6-month-olds are beginning to produce vowel sounds. We created infant and adult /i/ (‘ee’) vowels using a production-based synthesizer that simulates the act of speaking in talkers at different ages and then tested infants across four experiments using a sequential preferential listening task. The findings provide the first evidence that infants preferentially attend to vowel sounds with infant voice pitch and/or formants over vowel sounds with no infant-like vocal properties, supporting the view that infants' production abilities influence how they process infant speech. The findings with respect to voice pitch also reveal parallels between IDS and infant speech, raising new questions about the role of this speech register in infant development. Research exploring the underpinnings and impact of this perceptual bias can expand our understanding of infant language development.

13.
We have implemented software for development of synthetic visual speech and perceptual experimentation on a UNIX workstation. We describe recent improvements in the speech synthesis and the capabilities of the development system. We also show how a typical experiment is programmed and describe our solutions for real-time experimental control under the UNIX operating system.

14.
We have implemented a facial animation system to carry out visible speech synthesis. Using this system, it is possible to manipulate control parameters to synthesize a sequence of speech articulations. In addition, it is possible to synthesize novel articulations, such as one that is halfway between /ba/ and /da/.
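A blend of this sort amounts to interpolating the control parameters of the two target articulations. A minimal sketch, with hypothetical parameter names rather than the system's actual controls:

```python
def blend_articulations(params_a, params_b, alpha=0.5):
    """Linearly interpolate two articulation parameter sets; alpha = 0.5
    yields a token halfway between them, e.g. between /ba/ and /da/."""
    return {k: (1 - alpha) * params_a[k] + alpha * params_b[k]
            for k in params_a}

# Hypothetical parameter values for the two endpoint articulations
ba = {"jaw_opening": 0.8, "lip_closure": 1.0, "tongue_tip_raise": 0.0}
da = {"jaw_opening": 0.7, "lip_closure": 0.0, "tongue_tip_raise": 1.0}
halfway = blend_articulations(ba, da, 0.5)
```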

15.
Recent studies have shown that the presentation of concurrent linguistic context can lead to highly efficient performance in a standard conjunction search task by the induction of an incremental search strategy (Spivey, Tyler, Eberhard, & Tanenhaus, 2001). However, these findings were obtained under anomalously slow speech rate conditions. Accordingly, in the present study, the effects of concurrent linguistic context on visual search performance were compared when speech was recorded at both a normal rate and a slow rate. The findings provided clear evidence that the visual search benefit afforded by concurrent linguistic context was contingent on speech rate, with normal speech producing a smaller benefit. Overall, these findings have important implications for understanding how linguistic and visual processes interact in real time and suggest a disparity in the temporal resolution of speech comprehension and visual search processes.

16.
A procedure based on monaural fusion has been developed to construct acoustic continua between natural speech sounds, to be used in studies of speech perception. Two speech stimuli of similar temporal structure and different spectral composition are precisely aligned in time and presented simultaneously to the listener. By mixing both stimulus components in varying intensity ratios, a transition from one component to the other can be achieved. Such stimulus continua have several advantages over the synthetic continua commonly used in studies of categorical perception and related phenomena: They are based on real speech stimuli; the endpoint stimuli are unambiguous; and the stimuli are characterized by a well-defined physical variable, the relative intensity of the two components.
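The mixing step itself is straightforward; a minimal sketch, assuming x and y are equal-length, time-aligned waveform arrays:

```python
import numpy as np

def mix_continuum_step(x, y, ratio_db):
    """Mix two time-aligned speech tokens at a given intensity ratio
    (level of x relative to y, in dB), normalized to avoid clipping."""
    g = 10.0 ** (ratio_db / 20.0)
    mix = g * x + y
    return mix / np.max(np.abs(mix))

# An 11-step continuum from x-dominant (+20 dB) to y-dominant (-20 dB):
# steps = [mix_continuum_step(x, y, db) for db in np.linspace(20, -20, 11)]
```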

17.
Schizophrenia involves multiple communication impairments, including (a) disorganized speech, or formal thought disorder (FTD); and (b) decreased speech output, or poverty of speech. Both FTD and poverty of speech have been hypothesized to be associated with deficits in executive functioning or cognitive control. The current study examined whether FTD and poverty of speech were differentially associated with two distinct aspects of cognitive control, working memory and controlled retrieval. Compared with control participants (n = 30), people with schizophrenia (n = 47) exhibited poorer performance on both working memory and controlled retrieval tasks. However, only FTD (and not poverty of speech) was associated with poor working memory. In contrast, only poverty of speech (and not FTD) had a significant zero-order association with poor controlled retrieval. At the same time, working memory and controlled retrieval interacted to predict FTD, with the highest amount of FTD associated with both poor working memory and poor controlled retrieval. In contrast, psychometric control tasks were not associated with FTD or poverty of speech. This research suggests that FTD and poverty of speech are differentially associated with deficits in distinct aspects of cognitive control.

18.
Stutterers stutter significantly less in the laboratory and the clinic than in everyday speaking situations. This paper reviews pertinent literature to show that, in the outside world, stutterers have a stake in what they say, and message content and interpersonal dynamics therefore command attention, relegating speech-motor planning and execution to an automatic, memory-based process called speech concatenation. In the laboratory and the clinic, content of communication and interpersonal dynamics are less important, allowing stutterers to concentrate on the motor planning of articulation and prosody. Evidence reviewed here suggests that speech construction (real-time preparation of an utterance motor plan) is incompatible with stuttering. Evidence also suggests that a slight delay in retrieving motor plans from memory during speech concatenation is the immediate source of stuttering.

19.
The present study investigated the articulatory implementation deficits of Broca's and Wernicke's aphasics and their potential neuroanatomical correlates. Five Broca's aphasics, two Wernicke's aphasics, and four age-matched normal speakers produced consonant-vowel-(consonant) real-word tokens consisting of [m, n] followed by [i, e, a, o, u]. Three acoustic measures were analyzed, corresponding to different properties of articulatory implementation: murmur duration (a measure of timing), amplitude of the first harmonic at consonantal release (a measure of articulatory coordination), and murmur amplitude over time (a measure of laryngeal control). Results showed that Broca's aphasics displayed impairments in all of these parameters, whereas Wernicke's aphasics only exhibited greater variability in the production of two of the parameters. The lesion extent data showed that damage in either Broca's area or the insular cortex was not predictive of the severity of the speech output impairment. Instead, lesions in the upper and lower motor face areas and the supplementary motor area resulted in the most severe implementation impairments. For the Wernicke's aphasics, the posterior areas (supramarginal gyrus, parietal, and sensory) appear to be involved in the retrieval and encoding of lexical forms for speech production, resulting in increased variability in speech production.

20.
One oft-cited problem with teaching speech skills to autistic children is the failure of the speech to be spontaneous. That is, the children's speech often remains under the control of the verbal behavior of others rather than under the control of other nonverbal referents in the environment. We investigated the effectiveness of a time delay procedure to increase the spontaneous speech of seven autistic children. Initially, the experimenter presented a desired object (e.g., cookie) and immediately modeled the appropriate response “I want (cookie).” Gradually, as the child imitated the vocalization, the experimenter increased the time between presentation of the object and the modeled vocalization in an attempt to transfer stimulus control of the child's vocalization from the experimenter's model to the object. Results indicated that all the children learned to request items spontaneously and generalized this behavior across settings, people, situations, and to objects which had not been taught. These results are discussed in relation to the literature on spontaneous speech, prompting, and generalization.
