A Large‐Scale Analysis of Variance in Written Language
Authors:Brendan T Johns  Randall K Jamieson
Institution:1. Department of Communicative Disorders and Sciences, University at Buffalo; 2. Department of Psychology, University of Manitoba
Abstract:The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers, & Tenenbaum, 2007; Jones & Mewhort, 2007; Landauer & Dumais, 1997; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). The models treat knowledge as an interaction of processing mechanisms and the structure of language experience. But language experience is often treated agnostically. We report a distributional semantic analysis that shows written language in fiction books varies appreciably between books from different genres, books from the same genre, and even books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing.
Keywords:Distributional semantics  Cognitive modeling  Natural language processing  Big data analytics