Similar literature
 Found 20 similar articles; search took 78 ms
1.
This article presents MANULEX, a Web-accessible database that provides grade-level word frequency lists of nonlemmatized and lemmatized words (48,886 and 23,812 entries, respectively) computed from the 1.9 million words taken from 54 French elementary school readers. Word frequencies are provided for four levels: first grade (G1), second grade (G2), third to fifth grades (G3-5), and all grades (G1-5). The frequencies were computed following the methods described by Carroll, Davies, and Richman (1971) and Zeno, Ivens, Millard, and Duvvuri (1995), with four statistics at each level (F, overall word frequency; D, index of dispersion across the selected readers; U, estimated frequency per million words; and SFI, standard frequency index). The database also provides the number of letters in the word and syntactic category information. MANULEX is intended to be a useful tool for studying language development through the selection of stimuli based on precise frequency norms. Researchers in artificial intelligence can also use it as a source of information on natural language processing to simulate written language acquisition in children. Finally, it may serve an educational purpose by providing basic vocabulary lists.
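
The D and SFI statistics named above can be illustrated with a minimal Python sketch. It assumes the formulations usually attributed to Carroll, Davies, and Richman (1971): D as the normalized entropy of a word's relative frequency across readers, and SFI = 10 × (log10 U + 4). The function names are hypothetical, and the dispersion-adjusted estimate U itself is not computed here; the sketch is not the MANULEX authors' code.

```python
import math


def dispersion(counts, sizes):
    """Normalized entropy of a word's relative frequency across readers (0 to 1)."""
    # Relative frequency of the word within each reader (assumed definition).
    rates = [c / s for c, s in zip(counts, sizes)]
    total = sum(rates)
    if total == 0 or len(rates) < 2:
        return 0.0
    # Entropy of the normalized rates; readers where the word is absent add 0.
    entropy = -sum((r / total) * math.log(r / total) for r in rates if r > 0)
    return entropy / math.log(len(rates))


def standard_frequency_index(u):
    """SFI = 10 * (log10(U) + 4), with U in occurrences per million words."""
    return 10.0 * (math.log10(u) + 4.0)
```

Under these assumptions, a word spread evenly over all readers has D close to 1, a word confined to a single reader has D of 0, and a word with U = 1 per million has an SFI of 40, with each tenfold increase in U adding 10.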

2.
The LEXIN database offers psycholinguistic indexes of the 13,184 different words (types) computed from 178,839 occurrences of these words (tokens) contained in a corpus of 134 beginning readers widely used in Spain. This database provides four statistical indicators: F (overall word frequency), D (index of dispersion across selected readers), U (estimated frequency per million words), and SFI (standard frequency index). It also gives information about the number of letters, syntactic category, and syllabic structure of the words included. To facilitate comparisons, LEXIN provides data from LEXESP’s (Sebastián-Gallés, Martí, Cuetos, & Carreiras, 2000), Alameda and Cuetos’s (1995), and Martínez and García’s (2004) Spanish adult psycholinguistic frequency databases. Access to the LEXIN database is facilitated by a computer program. The LEXIN program allows for the creation of word lists by letting the user specify searching criteria. LEXIN can be useful for researchers in cognitive psychology, particularly in the areas of psycholinguistics and education.

3.
In this article, we introduce ESCOLEX, the first European Portuguese children’s lexical database with grade-level-adjusted word frequency statistics. Computed from a 3.2-million-word corpus, ESCOLEX provides 48,381 word forms extracted from 171 elementary and middle school textbooks for 6- to 11-year-old children attending the first six grades in the Portuguese educational system. Like other children’s grade-level databases (e.g., Carroll, Davies, & Richman, 1971; Corral, Ferrero, & Goikoetxea, Behavior Research Methods, 41, 1009–1017, 2009; Lété, Sprenger-Charolles, & Colé, Behavior Research Methods, Instruments, & Computers, 36, 156–166, 2004; Zeno, Ivens, Millard, & Duvvuri, 1995), ESCOLEX provides four frequency indices for each grade: overall word frequency (F), index of dispersion across the selected textbooks (D), estimated frequency per million words (U), and standard frequency index (SFI). It also provides a new measure, contextual diversity (CD). In addition, the number of letters in the word and its part(s) of speech, number of syllables, syllable structure, and adult frequencies taken from P-PAL (a European Portuguese corpus-based lexical database; Soares, Comesaña, Iriarte, Almeida, Simões, Costa, …, Machado, 2010; Soares, Iriarte, Almeida, Simões, Costa, França, …, Comesaña, in press) are provided. ESCOLEX will be a useful tool both for researchers interested in language processing and development and for professionals in need of verbal materials adjusted to children’s developmental stages. ESCOLEX can be downloaded along with this article or from http://p-pal.di.uminho.pt/about/databases.
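
The difference between overall frequency (F) and contextual diversity (CD) can be sketched in a few lines of Python. The sketch assumes the common definition of CD as the number (or proportion) of documents in which a word occurs at least once; the function name and the toy textbooks are hypothetical, and the exact ESCOLEX procedure may differ.

```python
def contextual_diversity(word, documents):
    """Return (count, proportion) of documents that contain `word` at least once.

    `documents` is a list of token lists, one per textbook (assumed input format).
    """
    n_containing = sum(1 for tokens in documents if word in set(tokens))
    return n_containing, n_containing / len(documents)


def overall_frequency(word, documents):
    """Total number of occurrences of `word` across all documents (F)."""
    return sum(tokens.count(word) for tokens in documents)


# Three toy "textbooks": "gato" occurs three times but in only two of them.
textbooks = [["o", "gato"], ["o", "cão"], ["gato", "gato"]]
```

With the toy data, F for "gato" is 3 while its CD is 2 of 3 documents, which shows why repeated use inside a single textbook raises F but not CD.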

4.
In this article, we present Procura-PALavras (P-PAL), a Web-based interface for a new European Portuguese (EP) lexical database. Based on a contemporary printed corpus of over 227 million words, P-PAL provides a broad range of word attributes and statistics, including several measures of word frequency (e.g., raw counts, per-million word frequency, logarithmic Zipf scale), morpho-syntactic information (e.g., parts of speech [PoSs], grammatical gender and number, dominant PoS, and frequency and relative frequency of the dominant PoS), as well as several lexical and sublexical orthographic (e.g., number of letters; consonant–vowel orthographic structure; density and frequency of orthographic neighbors; orthographic Levenshtein distance; orthographic uniqueness point; orthographic syllabification; and trigram, bigram, and letter type and token frequencies), and phonological measures (e.g., pronunciation, number of phonemes, stress, density and frequency of phonological neighbors, transposed and phonographic neighbors, syllabification, and biphone and phone type and token frequencies) for ~53,000 lemmatized and ~208,000 nonlemmatized EP word forms. To obtain these metrics, researchers can choose between two word queries in the application: (i) analyze words previously selected for specific attributes and/or lexical and sublexical characteristics, or (ii) generate word lists that meet word requirements defined by the user in the menu of analyses. For the measures it provides and the flexibility it allows, P-PAL will be a key resource to support research in all cognitive areas that use EP verbal stimuli. P-PAL is freely available at http://p-pal.di.uminho.pt/tools.
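
The three frequency measures listed above (raw count, per-million frequency, Zipf scale) are related by simple arithmetic, sketched below in Python. The sketch assumes the basic Zipf definition, log10 of the per-million frequency plus 3, without the smoothing terms some databases add; the function names are hypothetical and this is not P-PAL's own code.

```python
import math


def per_million(count, corpus_size):
    """Occurrences per million words for a raw count in a corpus of `corpus_size` tokens."""
    return count * 1_000_000 / corpus_size


def zipf(count, corpus_size):
    """Zipf value, assuming the unsmoothed definition log10(per-million frequency) + 3."""
    return math.log10(per_million(count, corpus_size)) + 3.0
```

Under this assumption, a word seen 227 times in a 227-million-word corpus occurs once per million words and has a Zipf value of 3; a word a thousand times more frequent has a Zipf value of 6.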

5.
In this article, we present a new lexical database for Modern Standard Arabic: Aralex. Based on a contemporary text corpus of 40 million words, Aralex provides information about (1) the token frequencies of roots and word patterns, (2) the type frequency, or family size, of roots and word patterns, and (3) the frequency of bigrams and trigrams in orthographic forms, roots, and word patterns. Aralex will be a useful tool for studying the cognitive processing of Arabic through the selection of stimuli on the basis of precise frequency counts. Researchers can use it as a source of information on natural language processing, and it may serve an educational purpose by providing basic vocabulary lists. Aralex is distributed under a GNU-like license, allowing people to interrogate it freely online or to download it from www.mrc-cbu.cam.ac.uk:8081/aralex.online/login.jsp.

6.
In this article, we introduce HelexKids, an online written-word database for Greek-speaking children in primary education (Grades 1 to 6). The database is organized on a grade-by-grade basis, and on a cumulative basis by combining Grade 1 with Grades 2 to 6. It provides values for Zipf, frequency per million, dispersion, estimated word frequency per million, standard word frequency, contextual diversity, orthographic Levenshtein distance, and lemma frequency. These values are derived from 116 textbooks used in primary education in Greece and Cyprus, producing a total of 68,692 different word types. HelexKids was developed to assist researchers in studying language development, educators in selecting age-appropriate items for teaching, as well as writers and authors of educational books for Greek/Cypriot children. The database is open access and can be searched online at www.helexkids.org.
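
Orthographic Levenshtein distance can be made concrete with a short Python sketch. It assumes the widely used OLD20-style definition, the mean edit distance from a word to its k closest neighbors in the lexicon (k = 20 by convention); whether HelexKids uses exactly this variant is an assumption, and the function names and toy lexicon are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_closest_distance(word, lexicon, k=20):
    """Mean Levenshtein distance from `word` to its k closest other words."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:k]
    return sum(closest) / len(closest) if closest else None
```

A lower value indicates a denser orthographic neighborhood. With a realistic lexicon, k = 20 would be used; the small k in a toy example only keeps the arithmetic easy to follow.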

7.
8.
Extant word lists are typically based on word frequency counts from various types of literature (e.g., basal readers, content textbooks, trade books, adult reading material). The word list described in this study was constructed by determining what words are commonly known (i.e., recognized in their written form) by beginning readers. Almost 7,000 primary grade children were tested for basic sight recognition of 1,800 high frequency words. Using a 90 percent minimum criterion (i.e., 90 percent or more of the students at each grade level recognized each word), a 1,683-word list was established that consisted of 587 first-grade words, 861 second-grade words, and 235 third-grade words. Implications and uses of the extended basic sight vocabulary are also discussed.

9.
Lexical development is typically viewed as elaboration, differentiation, and integration of semantic codes, that is, codes that signify the meanings embodied in the words. Our earlier work based on 50 noun + noun (NN) compounds in Telugu has shown that children in the age group 8–14 years exhibit clear-cut developmental trends in producing and segmenting the NN compound nouns and generating words that are related in meaning to the target compounds. The database for the present study is drawn from our earlier work, and it consists of 1800 words reported to be related in meaning to the 50 target compound nouns by 36 children (12 III grade children, 12 VI grade children, and 12 IX grade children) and 600 words produced by 12 adults. A thorough analysis of the individual word associations generated by the subjects revealed that children tended to generate (1) compounds with the same head word as the target word but with a new modifier word; (2) novel compounds that have phonetic/phonological association with the target words, most of which are actually nonwords in the language; and (3) new single-stem nouns and new compounds that are considerably fewer in number than those produced by adult subjects. Some of the theoretical and pedagogical implications of the differences in performance of children vs. adult subjects in the encoding of word meanings in an experimental context are discussed in this paper.

10.
Word frequency is the most important variable in research on word processing and memory. Yet, the main criterion for selecting word frequency norms has been the availability of the measure, rather than its quality. As a result, much research is still based on the old Kučera and Francis frequency norms. By using the lexical decision times of recently published megastudies, we show how bad this measure is and what must be done to improve it. In particular, we investigated the size of the corpus, the language register on which the corpus is based, and the definition of the frequency measure. We observed that corpus size is of practical importance for small sizes (depending on the frequency of the word), but not for sizes above 16–30 million words. As for the language register, we found that frequencies based on television and film subtitles are better than frequencies based on written sources, certainly for the monosyllabic and bisyllabic words used in psycholinguistic research. Finally, we found that lemma frequencies are not superior to word form frequencies in English and that a measure of contextual diversity is better than a measure based on raw frequency of occurrence. Part of the superiority of the latter is due to the words that are frequently used as names. Assembling a new frequency norm on the basis of these considerations turned out to predict word processing times much better than did the existing norms (including Kučera & Francis and Celex). The new SUBTL frequency norms from the SUBTLEXUS corpus are freely available for research purposes from http://brm.psychonomic-journals.org/content/supplemental, as well as from the University of Ghent and Lexique Web sites.

11.
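Yang Qun (杨群), Wang Yan (王艳), Zhang Jijia (张积家). 《心理学报》 (Acta Psychologica Sinica), 2019, 51(1): 1-13
Chinese has a large number of polyphonic characters (characters with more than one pronunciation) of complex types, which makes learning Chinese difficult for Uyghur students. Two experiments examined the effect of orthographic depth on the naming of Chinese words by Han and Uyghur college students. The results showed that, whether naming single-character or two-character words, Uyghur students had significantly longer reaction times than Han students. For single-character words, naming times in both groups were affected by the orthographic depth and frequency of the characters: participants took significantly longer to name polyphonic characters than monophonic characters, and significantly longer to name low-frequency characters than high-frequency characters. For two-character words, naming times in both groups showed an interaction between word frequency and orthographic depth. For high-frequency words, Han students' reaction times did not differ significantly between words composed of polyphonic characters and words composed of monophonic characters, whereas Uyghur students responded significantly more slowly to words composed of polyphonic characters. For low-frequency words, Han students responded significantly more slowly to words composed of polyphonic characters than to words composed of monophonic characters, whereas Uyghur students showed no significant difference. Overall, the study shows that orthographic depth affects Chinese word naming in the two groups in different patterns. This difference is related to differences between the two groups in native-language characteristics, age of vocabulary acquisition, language proficiency, and language processing style.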

12.
In this article, we present a new lexical database for French: Lexique. In addition to classical word information such as gender, number, and grammatical category, Lexique includes a series of interesting new characteristics. First, word frequencies are based on two cues: a contemporary corpus of texts and the number of Web pages containing the word. Second, the database is split into a graphemic table with all the relevant frequencies, a table structured around lemmas (particularly interesting for the study of the inflectional family), and a table about surface frequency cues. Third, Lexique is distributed under a GNU-like license, allowing people to contribute to it. Finally, a metasearch engine, Open Lexique, has been developed so that new databases can be added very easily to the existing ones. Lexique can either be downloaded or interrogated freely from http://www.lexique.org.

13.
After previously encoding lists of related words (e.g. bed, rest, awake, etc.) associated with one critical word (e.g. sleep), participants frequently falsely recognize critical words as having been previously presented. Past research indicates that warning participants of this memory illusion can reduce false recognition of critical words. However, the memory processes responsible for this reduction are not known. We investigated whether the increase in critical-word memory performance reflects changes that are specific to the processing of critical words, or alternatively, changes that are applied generally to processing in all conditions. Different participants were warned (in two different ways) or not warned before encoding, and recognition sensitivity for critical words, related words and unrelated words was tested. The warnings increased memory performance equally across all conditions, not just for critical words. These results help to more effectively conceptualize false recognition and reductions in false recognition in this paradigm. Copyright © 2005 John Wiley & Sons, Ltd.

14.
Background: When constructing stimuli for experimental investigations of cognitive processes in early reading development, researchers have to rely on adult or American children's word frequency counts, as no such counts exist for English children. Aim: The present paper introduces a database of children's early reading vocabulary, for use by researchers and teachers. Sample: Texts from 685 books from reading schemes and story books read by 5- to 7-year-old children were used in the construction of the database. Method: All words from the 685 books were typed or scanned into an Oracle database. Results: The resulting up-to-date word frequency list of early print exposure in the UK is available in two forms from a website address given in this paper. This allows access to one list of the words ordered alphabetically and one list of the words ordered by frequency. We also briefly address some fundamental issues underlying early reading vocabulary (e.g., that it is heavily skewed towards low frequencies). Other characteristics of the vocabulary are then discussed. Conclusions: We hope the word frequency lists will be of use to researchers seeking to control word frequency, and to teachers interested in the vocabulary to which young children are exposed in their reading material.

15.
We present Chinese translation norms for 1,429 English words. Chinese-English bilinguals (N = 28) were asked to provide the first Chinese translation that came to mind for 1,429 English words. The results revealed that 71% of the English words received more than one correct translation, indicating the large amount of translation ambiguity when translating from English to Chinese. The relationship between translation ambiguity and word frequency, concreteness and language proficiency was investigated. Although the significant correlations were not strong, results revealed that English word frequency was positively correlated with the number of alternative translations, whereas English word concreteness was negatively correlated with the number of translations. Importantly, regression analyses showed that the number of Chinese translations was predicted by word frequency and concreteness. Furthermore, an interaction between these predictors revealed that the number of translations was more affected by word frequency for more concrete words than for less concrete words. In addition, mixed-effects modelling showed that word frequency, concreteness and English language proficiency were all significant predictors of whether or not a dominant translation was provided. Finally, correlations between the word frequencies of English words and their Chinese dominant translations were higher for translation-unambiguous pairs than for translation-ambiguous pairs. The translation norms are made available in a database together with lexical information about the words, which will be a useful resource for researchers investigating Chinese-English bilingual language processing.

16.
Many studies show that age deficits in memory are smaller for information supported by pre-experimental experience. Many studies also find dissociations in memory tasks between words that occur with high and low frequencies in language, but the literature is mixed regarding the extent of word frequency effects in normal ageing. We examined whether age deficits in episodic memory could be influenced by manipulations of word frequency. In Experiment 1, young and older adults studied short and long lists of high- and low-frequency words for free recall. The list length effect (the drop in proportion recalled for longer lists) was larger in young compared to older adults and for high- compared to low-frequency words. In Experiment 2, young and older adults completed item and associative recognition memory tests with high- and low-frequency words. Age deficits were greater for associative memory than for item memory, demonstrating an age-related associative deficit. High-frequency words led to better associative memory performance whilst low-frequency words resulted in better item memory performance. In neither experiment was there any evidence for age deficits to be smaller for high- relative to low-frequency words, suggesting that word frequency effects on memory operate independently from effects due to cognitive ageing.

17.
The internal validity of several types of experiments in experimental psychology and neuroscience depends in part on the possibility of controlling or manipulating critical lexical variables such as word frequency of occurrence. Two ways of estimating this variable are (1) objective frequency counts and (2) subjective ratings of word frequency. Each method produces estimates that generally agree (i.e., they are highly correlated) but that disagree substantially concerning the relative frequency of a number of words. To investigate this issue more closely, the global and local agreement of subjective frequency estimates was examined in detail for a pool of 6,202 words drawn from the OMNILEX database of French words (Desrochers, 2006; www.omnilex.uottawa.ca). The results indicated that objective and subjective frequencies are strongly correlated, subjective frequencies share a significant amount of bias variance with other lexical characteristics (e.g., imageability), and the codeterminants of subjective frequency are in an antagonistic relationship with one another. The implications of these results for the selection of lexical stimuli are discussed, and multiple variables to aid in item selection are reported. Supplemental materials for this study may be downloaded from brm.psychonomic-journals.org/content/supplemental.

18.
Do typological properties of language, such as agglutination (i.e., the morphological process of adding affixes to the lexeme of a word), have an impact on the development of visual word recognition? To answer this question, we carried out an experiment in which beginning, intermediate, and adult Basque readers (n = 32 each, average age = 7, 11, and 22 years, respectively) needed to read correctly versus incorrectly inflected words embedded in sentences. Half of the targets contained high-frequency stems, and the other half contained low-frequency stems. To each stem, four inflections of different lengths were attached (-a, -ari, -aren, and -arentzat, i.e., inflectional sequences). To test whether the process of word recognition was modulated by the knowledge of word structure in the language, half of the participants’ native language was Basque and the other half’s native language was Spanish. Children showed robust effects of frequency and length of inflection that diminished with age. In addition, the effect of length of inflection was modulated by the frequency of the stem and by the native language. Taken together, these results suggest that word recognition develops from a decoding strategy to a direct lexical access strategy and that this process is modulated by children’s knowledge of the inflectional structure of words from the beginning of their reading experience.

19.
This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, biphones, and bisyllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users either to upload a set of words to receive their properties or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible and will be added over time as they become available. It is freely available from the following website: http://www.bcbl.eu/databases/espal/.
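
Subword properties such as bigram and trigram frequencies can be derived from a word frequency list, as the following Python sketch illustrates. It assumes two common conventions that the abstract does not spell out: type frequency counts the distinct words containing an n-gram, and token frequency weights each occurrence by the word's corpus count. The function name and the toy counts are hypothetical, and EsPal's own definitions (for example, position-specific counts) may differ.

```python
from collections import Counter


def ngram_frequencies(word_counts, n=2):
    """Return (type_freq, token_freq) Counters for letter n-grams.

    `word_counts` maps each word to its corpus frequency (assumed input format).
    """
    type_freq = Counter()
    token_freq = Counter()
    for word, count in word_counts.items():
        grams = [word[i:i + n] for i in range(len(word) - n + 1)]
        for gram in set(grams):   # each word counts once per n-gram type
            type_freq[gram] += 1
        for gram in grams:        # each occurrence weighted by word frequency
            token_freq[gram] += count
    return type_freq, token_freq


# Toy frequency list: "sa" occurs in both words, "ca" only in "casa".
type_freq, token_freq = ngram_frequencies({"casa": 10, "cosa": 5}, n=2)
```

In the toy list, the bigram "sa" has a type frequency of 2 and a token frequency of 15, whereas "ca" has a type frequency of 1 and a token frequency of 10. Passing n=3 yields trigram counts in the same way.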

20.
WORD FREQUENCY AND WORD DIFFICULTY:   Total citations: 1 (self-citations: 0; citations by others: 1)
This article compares word counts made using four different collections of text, including one based on collections of electronic text. For each of the collections, standard word frequency indices were computed and compared with a carefully developed list of words ranked in order of difficulty as determined by vocabulary tests. Correlations between the word frequency indices and word difficulty ranks show that word frequencies for all four corpora are highly correlated with word difficulty. Despite these high correlations, the results also show that the difficulty of some words is not estimated accurately by word frequency. The reasons for disparities between word frequency and word difficulty are not clear. The high correlations obtained for the corpus based on electronic text suggest that this method of text sampling has potential but that caution is advisable in conducting such collections.
