Sequential sampling models of human text classification |
| |
Authors: | Michael D. Lee; Elissa Y. Corlett |
| |
Affiliation: | Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy; The State Key Laboratory of Refractories and Metallurgy, Institute of Advanced Materials and Nanotechnology, College of Materials and Metallurgy, Wuhan University of Science and Technology, Wuhan 430081, PR China; School of Chemistry, South China Normal University, Guangzhou 510006, PR China |
| |
Abstract: | Text classification involves deciding whether or not a document is about a given topic. It is an important problem in machine learning, because automated text classifiers have enormous potential for application in information retrieval systems. It is also an interesting problem for cognitive science, because it involves real world human decision making with complicated stimuli. This paper develops two models of human text document classification based on random walk and accumulator sequential sampling processes. The models are evaluated using data from an experiment where participants classify text documents presented one word at a time under task instructions that emphasize either speed or accuracy, and rate their confidence in their decisions. Fitting the random walk and accumulator models to these data shows that the accumulator provides a better account of the decisions made, and a “balance of evidence” measure provides the best account of confidence. Both models are also evaluated in the applied information retrieval context, by comparing their performance to established machine learning techniques on the standard Reuters‐21578 corpus. It is found that they are almost as accurate as the benchmarks, and make decisions much more quickly because they only need to examine a small proportion of the words in the document. In addition, the ability of the accumulator model to produce useful confidence measures is shown to have application in prioritizing the results of classification decisions. |
| |
Keywords: | Text classification; Sequential sampling processes; Random walks; Accumulators; Information retrieval |
This article is indexed in ScienceDirect and other databases. |
|
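The two stopping rules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each word contributes a signed evidence value (positive for "about the topic", negative against), a single symmetric threshold, and a "balance of evidence" confidence taken as the difference between the winning and losing accumulator totals at decision time. The function names and the evidence values are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def random_walk(evidence: Iterable[float],
                threshold: float) -> Tuple[Optional[bool], int]:
    """Keep one running total; decide when it reaches +threshold or -threshold.

    Returns (decision, words_read). decision is None if the document
    ends before either boundary is reached.
    """
    total = 0.0
    n = 0
    for n, e in enumerate(evidence, start=1):
        total += e
        if total >= threshold:
            return True, n
        if total <= -threshold:
            return False, n
    return None, n


def accumulator(evidence: Iterable[float],
                threshold: float) -> Tuple[Optional[bool], int, float]:
    """Keep separate 'yes' and 'no' totals; decide when either reaches threshold.

    Returns (decision, words_read, confidence), where confidence is the
    assumed balance-of-evidence measure: winning total minus losing total.
    """
    yes = 0.0
    no = 0.0
    n = 0
    for n, e in enumerate(evidence, start=1):
        if e > 0:
            yes += e       # evidence for the topic
        else:
            no += -e       # evidence against the topic (stored as a magnitude)
        if yes >= threshold:
            return True, n, yes - no
        if no >= threshold:
            return False, n, no - yes
    return None, n, abs(yes - no)


# Hypothetical per-word evidence values for one document.
words = [1.0, -0.5, 1.0, 1.0, -0.5, 1.0]
print(random_walk(words, threshold=2.0))
print(accumulator(words, threshold=2.0))
```

Both functions stop reading as soon as a boundary is reached, which is the property the abstract credits for the fast decisions. Only the accumulator retains the losing total, so only it can supply a confidence value of this form.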