Sequential sampling models of human text classification

Authors:	Michael D. Lee, Elissa Y. Corlett

Affiliation:	2. Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy; 1. The State Key Laboratory of Refractories and Metallurgy, Institute of Advanced Materials and Nanotechnology, College of Materials and Metallurgy, Wuhan University of Science and Technology, Wuhan 430081, PR China; 2. School of Chemistry, South China Normal University, Guangzhou 510006, PR China

Abstract:	Text classification involves deciding whether or not a document is about a given topic. It is an important problem in machine learning, because automated text classifiers have enormous potential for application in information retrieval systems. It is also an interesting problem for cognitive science, because it involves real world human decision making with complicated stimuli. This paper develops two models of human text document classification based on random walk and accumulator sequential sampling processes. The models are evaluated using data from an experiment where participants classify text documents presented one word at a time under task instructions that emphasize either speed or accuracy, and rate their confidence in their decisions. Fitting the random walk and accumulator models to these data shows that the accumulator provides a better account of the decisions made, and a “balance of evidence” measure provides the best account of confidence. Both models are also evaluated in the applied information retrieval context, by comparing their performance to established machine learning techniques on the standard Reuters‐21578 corpus. It is found that they are almost as accurate as the benchmarks, and make decisions much more quickly because they only need to examine a small proportion of the words in the document. In addition, the ability of the accumulator model to produce useful confidence measures is shown to have application in prioritizing the results of classification decisions.
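The difference between the two stopping rules named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each word has already been converted to a signed evidence value (positive favors "about the topic", negative favors "not about the topic"), that both models use a single symmetric threshold, and that "balance of evidence" confidence is the gap between the winning and losing totals at decision time. The function names, the `threshold` parameter, and the evidence values are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def random_walk(evidence: Iterable[float],
                threshold: float) -> Tuple[Optional[bool], int]:
    """Keep ONE running total; stop when it reaches +threshold or -threshold.

    Returns (decision, words_read). decision is None if no bound is reached.
    Assumption: evidence values are signed, one per word, in reading order.
    """
    total = 0.0
    n = 0
    for n, e in enumerate(evidence, start=1):
        total += e
        if total >= threshold:
            return True, n
        if total <= -threshold:
            return False, n
    return None, n


def accumulator(evidence: Iterable[float],
                threshold: float) -> Tuple[Optional[bool], int, float]:
    """Keep TWO separate totals; stop when either one reaches threshold.

    Returns (decision, words_read, balance), where balance is the winning
    total minus the losing total (an assumed reading of "balance of evidence").
    """
    yes = no = 0.0
    n = 0
    for n, e in enumerate(evidence, start=1):
        if e > 0:
            yes += e      # evidence for "about the topic"
        else:
            no += -e      # evidence against, stored as a positive amount
        if yes >= threshold:
            return True, n, yes - no
        if no >= threshold:
            return False, n, no - yes
    # No bound reached: report the absolute gap between the two totals.
    return None, n, abs(yes - no)


# Hypothetical per-word evidence that alternates in sign.
words = [1.0, -1.0, 1.0, -1.0, 1.0]
rw_decision, rw_read = random_walk(words, threshold=2.0)
acc_decision, acc_read, acc_balance = accumulator(words, threshold=2.0)
```

On alternating evidence like this, the single total of the random walk keeps cancelling and never reaches a bound, while the accumulator's "yes" total reaches the threshold after three words. Both functions return as soon as a bound is hit, which is why a sequential sampler may need to read only part of a document.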

Keywords:	Text classification; Sequential sampling processes; Random walks; Accumulators; Information retrieval
This article is indexed in ScienceDirect and other databases.
