Automatic Extraction of Property Norm‐Like Data From Large Text Corpora
Authors: Colin Kelly, Barry Devereux, Anna Korhonen
Institution: 1. Computer Laboratory, University of Cambridge; 2. Department of Psychology, Centre for Speech, Language and the Brain, University of Cambridge
Abstract: Traditional methods for deriving property‐based representations of concepts from text have focused either on extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is‐a vehicle) or meronymy/metonymy (e.g., car has wheels), or on unspecified relations (e.g., car — petrol). We propose a system for the challenging task of automatic, large‐scale acquisition of unconstrained, human‐like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept‐relation‐feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property‐based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights the triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human‐generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human‐judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows performance comparable to the current state of the art, while subsequent evaluations exhibit the human‐like character of our generated properties.
Keywords: Natural language processing; Property norm; Wikipedia; Human evaluation; WordNet; Pointwise mutual information; Log‐likelihood; Entropy
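
The reweighting step described in the abstract (a linear combination of triple frequency and statistical metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rerank_triples`, the weights `w_freq` and `w_pmi`, the use of relative frequency, and the restriction to a single metric (pointwise mutual information between concept and feature, one of the listed keywords) are all assumptions made for illustration. The abstract states that four metrics are combined but does not give the weights or the exact formulas.

```python
import math
from collections import Counter


def rerank_triples(triples, w_freq=0.5, w_pmi=0.5):
    """Rank distinct (concept, relation, feature) triples.

    Hypothetical scoring: w_freq * relative frequency of the triple
    plus w_pmi * PMI(concept, feature). The weights are illustrative
    assumptions, not values from the paper.
    """
    total = len(triples)
    triple_counts = Counter(triples)
    concept_counts = Counter(c for c, _, _ in triples)
    feature_counts = Counter(f for _, _, f in triples)
    pair_counts = Counter((c, f) for c, _, f in triples)

    scored = []
    for (concept, relation, feature), n in triple_counts.items():
        # PMI = log( P(concept, feature) / (P(concept) * P(feature)) )
        pmi = math.log(
            (pair_counts[(concept, feature)] * total)
            / (concept_counts[concept] * feature_counts[feature])
        )
        score = w_freq * (n / total) + w_pmi * pmi
        scored.append(((concept, relation, feature), score))

    # Highest-scoring triples first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy candidate triples; the counts are invented for the example.
    candidates = (
        [("car", "require", "petrol")] * 3
        + [("car", "be", "fast")] * 2
        + [("dog", "be", "fast")]
        + [("dog", "have", "tail")] * 2
    )
    for triple, score in rerank_triples(candidates):
        print(triple, round(score, 3))
```

In this toy setup a feature that occurs with only one concept (tail with dog) is pushed up by the PMI term, while a feature shared across concepts (fast) is pushed down, which is the general intent of combining raw frequency with an association measure.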