Sequential sampling models of human text classification
Authors:Michael D. Lee, Elissa Y. Corlett
Affiliation:2. Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy;1. The State Key Laboratory of Refractories and Metallurgy, Institute of Advanced Materials and Nanotechnology, College of Materials and Metallurgy, Wuhan University of Science and Technology, Wuhan 430081, PR China;2. School of Chemistry, South China Normal University, Guangzhou 510006, PR China
Abstract:Text classification involves deciding whether or not a document is about a given topic. It is an important problem in machine learning, because automated text classifiers have enormous potential for application in information retrieval systems. It is also an interesting problem for cognitive science, because it involves real world human decision making with complicated stimuli. This paper develops two models of human text document classification based on random walk and accumulator sequential sampling processes. The models are evaluated using data from an experiment where participants classify text documents presented one word at a time under task instructions that emphasize either speed or accuracy, and rate their confidence in their decisions. Fitting the random walk and accumulator models to these data shows that the accumulator provides a better account of the decisions made, and a “balance of evidence” measure provides the best account of confidence. Both models are also evaluated in the applied information retrieval context, by comparing their performance to established machine learning techniques on the standard Reuters‐21578 corpus. It is found that they are almost as accurate as the benchmarks, and make decisions much more quickly because they only need to examine a small proportion of the words in the document. In addition, the ability of the accumulator model to produce useful confidence measures is shown to have application in prioritizing the results of classification decisions.
Keywords:Text classification; Sequential sampling processes; Random walks; Accumulators; Information retrieval
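
The two sequential sampling processes named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the per-word evidence values, the threshold, the forced-choice rule used when the words run out, and the exact form of the "balance of evidence" confidence (here, the winning tally minus the losing tally) are all assumptions made for illustration. The function names random_walk and accumulator and the example data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-word evidence: positive values favour "about the topic",
# negative values favour "not about the topic". Illustrative numbers only.
EVIDENCE: Dict[str, float] = {
    "wheat": 1.5, "grain": 1.0, "bank": -1.25, "loan": -0.75, "the": 0.0,
}
DOC: List[str] = ["the", "wheat", "bank", "grain", "wheat", "loan", "grain"]


def random_walk(words: List[str], evidence: Dict[str, float],
                threshold: float) -> Tuple[bool, int]:
    """Single running total; stop when it crosses +threshold or -threshold.

    Returns (decision, number of words read).
    """
    total = 0.0
    for n, word in enumerate(words, start=1):
        total += evidence.get(word, 0.0)  # unknown words carry no evidence
        if total >= threshold:
            return True, n
        if total <= -threshold:
            return False, n
    # Assumption: if the words run out, decide by the sign of the total.
    return total > 0, len(words)


def accumulator(words: List[str], evidence: Dict[str, float],
                threshold: float) -> Tuple[bool, int, float]:
    """Two separate tallies; stop when either one reaches the threshold.

    Returns (decision, number of words read, confidence), where confidence
    is an assumed balance-of-evidence measure: winning minus losing tally.
    """
    yes_tally = 0.0
    no_tally = 0.0
    for n, word in enumerate(words, start=1):
        e = evidence.get(word, 0.0)
        if e > 0:
            yes_tally += e
        else:
            no_tally += -e
        if yes_tally >= threshold:
            return True, n, yes_tally - no_tally
        if no_tally >= threshold:
            return False, n, no_tally - yes_tally
    # Assumption: if the words run out, the larger tally wins.
    return yes_tally > no_tally, len(words), abs(yes_tally - no_tally)
```

With a threshold of 2.0 on the example document, the random walk decides "yes" after five words, while the accumulator decides "yes" after four words with a confidence of 1.25. In both cases the decision is made before the whole document has been read, which is the property the abstract credits for the models' speed.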
This article is indexed in ScienceDirect and other databases.