The Role of Machine Learning in Solving Overfitting

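Cite this article:	Dong Bo, Chen Airui, Zhang Ming. The Role of Machine Learning in Solving Overfitting[J]. Journal of Psychological Science (心理科学), 2021, 0(2): 274-281

Authors:	Dong Bo (董波), Chen Airui (陈艾睿), Zhang Ming (张明)

Affiliations:	1. Suzhou University of Science and Technology; 2. School of Education, Soochow University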

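Abstract:	Overfitting is a major obstacle to psychology becoming a predictive science. This article reviews the value of machine learning in addressing overfitting and the ways it can be applied: (1) it introduces the two forms of overfitting and their current prevalence; (2) it analyzes the root cause of overfitting, namely that "high explanatory power ≠ high predictive power"; (3) it clarifies the role that the modeling logic and core techniques of machine learning play in addressing overfitting; (4) it uses sample data and code to show how the statistical thinking of machine learning is applied in model fitting. The article argues that psychology should start from solving practical problems, draw on the analytical thinking of machine learning, and avoid overfitting, so as to provide more accurate and more stable conclusions and predictive models.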
Keywords:	overfitting; machine learning; modeling logic; core techniques; application example
Received:	2019-04-11
Revised:	2020-03-02
The Role of Machine Learning in Solving Overfitting

Abstract:	Overfitting is the production of an analysis that corresponds too closely or exactly to a particular set of data and may therefore fail to fit additional data or to predict future observations reliably. Two forms occur in current psychology: model overfitting and procedural overfitting. An overfitted model is a statistical model that contains more parameters than the data can justify. The essence of overfitting is to have unknowingly extracted some of the residual variance (i.e., the noise) as if that variance represented underlying model structure. Machine learning is one way to avoid overfitting. Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task effectively without explicit instructions, relying on patterns and inference instead. Machine learning algorithms are used in email filtering, detection of network intruders, and computer vision. Here, we summarize the value of machine learning for prediction and the ways it can be applied. First, we describe the current state of psychology, in which explanation is pursued without prediction, by introducing model overfitting and procedural overfitting. In model overfitting, a researcher who pursues only a high goodness of fit obtains a model that predicts less well in other samples. In procedural overfitting, analytical procedures are selected flexibly, based in part on the quality of the results they produce, a practice that has come to be known as p-hacking or data-contingent analysis. This happens mainly because the bias (not the variance) of the sample is incorrectly fitted. Second, we show the essential reason for overfitting: a model that explains the data well does not necessarily predict human behavior reliably. Although explanation and prediction are relatively close in philosophy, they differ greatly from a statistical and a practical point of view.
From a statistical point of view, goodness of fit is an important indicator of how well the model fits the data-generating process. The higher the goodness of fit, the more variance the model explains and the closer it appears to be to the real data-generating process. However, this variance contains both meaningful variance caused by the independent variables and meaningless variance caused by sampling. In theory, only the meaningful variance should be fitted, but existing tools cannot distinguish the two in practice. This limitation leads directly to overfitting. Overfitted models explain the sample data well but predict poorly in other samples. From a practical point of view, it is difficult to construct a model that both explains and predicts well. Third, we introduce the modeling logic and key technologies of machine learning. The two principles are: 1) decomposing error into bias and variance, so that high model variance can be identified and avoided; 2) trading off bias against variance to minimize prediction error. The three core technologies are cross-validation, regularization, and big data. All of them safeguard the predictive performance of the model rather than its explanatory power. Finally, we use sample data and MATLAB code to illustrate how machine learning is applied in model fitting. We believe that psychology researchers should try to solve practical problems, use machine learning methods, and build more accurate and robust prediction models.
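
The article's own worked example is written in MATLAB and is not reproduced on this page. As a rough illustration of the ideas summarized above, the following Python sketch (not the authors' code) fits a flexible polynomial model with ridge (L2) regularization, one possible form of regularization, and compares in-sample error with k-fold cross-validated error. The synthetic data (a sine curve plus Gaussian noise), the polynomial degree, the candidate penalty values, and all function names are assumptions made for illustration only.

```python
import math
import random


def poly_features(x, degree):
    """Return [1, x, x^2, ..., x^degree] for a scalar x."""
    return [x ** d for d in range(degree + 1)]


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_ridge(xs, ys, degree, lam):
    """Minimise squared error + lam * ||w||^2 via the normal equations.

    For simplicity the intercept is penalised as well (an illustrative choice).
    """
    rows = [poly_features(x, degree) for x in xs]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def mse(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w on (xs, ys)."""
    degree = len(w) - 1
    preds = [sum(wi * f for wi, f in zip(w, poly_features(x, degree))) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def cv_mse(xs, ys, degree, lam, k=5):
    """Mean held-out MSE over k folds (fold i holds out every k-th point starting at i)."""
    total = 0.0
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        w = fit_ridge([x for x, _ in train], [y for _, y in train], degree, lam)
        total += mse(w, [x for x, _ in test], [y for _, y in test])
    return total / k


# Synthetic data (an assumption for illustration): sine curve plus Gaussian noise.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(30)]
ys = [math.sin(3.0 * x) + random.gauss(0.0, 0.3) for x in xs]

DEGREE = 9                                # deliberately flexible model
LAMBDAS = [0.0, 0.001, 0.01, 0.1, 1.0]    # 0.0 is ordinary least squares

results = []  # (lambda, in-sample MSE, cross-validated MSE)
for lam in LAMBDAS:
    w = fit_ridge(xs, ys, DEGREE, lam)
    results.append((lam, mse(w, xs, ys), cv_mse(xs, ys, DEGREE, lam)))

# Choose the penalty by predictive (held-out) error, not by in-sample fit.
best_lambda = min(results, key=lambda r: r[2])[0]

for lam, train_err, cv_err in results:
    print(f"lambda={lam:<6} train MSE={train_err:.4f}  CV MSE={cv_err:.4f}")
print("lambda chosen by cross-validation:", best_lambda)
```

The unpenalized fit (lambda = 0) always has the lowest in-sample error, because least squares minimizes exactly that quantity. The cross-validated column estimates error on data the model has not seen, which is the quantity the abstract argues should guide model choice. Which penalty performs best depends on the data, so the sketch selects it by held-out error.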

Keywords:	overfitting; machine learning; logics of modeling; key technologies; application example
This article is indexed in CNKI and other databases.