20 similar articles found; search took 15 ms
1.
Yoon Jeon Kim, Russell G. Almond, Valerie J. Shute 《International Journal of Testing》2016,16(2):142-163
Game-based assessment (GBA) is a specific use of educational games that employs game activities to elicit evidence for educationally valuable skills and knowledge. While this approach can provide individualized and diagnostic information about students, the design and development of assessment mechanics for a GBA is a nontrivial task. In this article, we describe the 10-step procedure that the design team of Physics Playground (formerly known as Newton's Playground) has established by adapting evidence-centered design to address unique challenges of GBA. The scaling method used for Physics Playground was Bayesian networks; thus this article describes specific actions taken for the iterative process of constructing and revising Bayesian networks in the context of the game Physics Playground.
2.
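《心理科学进展》 (Advances in Psychological Science) 2024,33(1)
Spatial ability is the capacity to mentally recognize, encode, store, represent, decompose and recombine, and abstract objects or spatial figures; it is the cognitive foundation on which individuals understand their environment and solve problems. Assessing spatial ability accurately, conveniently, and effectively is important for improving both STEM teaching and the quality of talent development. Because spatial ability is shaped by many interacting factors and is complex, multidimensional, and implicit, it is difficult to assess by computer. This study aims to assess spatial ability accurately, effectively, and at scale: it will use multimodal learning analytics to explore the characteristics of learners' spatial-cognition behavior, and will develop key techniques and tools for stealth assessment of spatial ability in a video game environment. Specifically, it will: (1) construct a framework for the internal representation of spatial ability and a system of assessment indicators; (2) build a model of learners' spatial-ability behavioral performance based on multimodal learning analytics; (3) explore the key factors through which video games affect spatial ability, and use a game engine to develop a video-game-based assessment tool; (4) use the evidence-centered design framework and Bayesian network models to develop and deploy assessment algorithms that can infer and predict spatial ability; and (5) conduct empirical studies in laboratory and real classroom settings to validate the assessment tool. The results should help explain human spatial cognition processes and behavioral performance, extend and enrich theories of spatial ability, and provide key technical support for large-scale digital assessment.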
3.
4.
Janne V. Kujala, Ulla Richardson, Heikki Lyytinen 《Journal of Mathematical Psychology》2010,54(2):247-255
Adaptive learning games should provide opportunities for the student to learn as well as motivate playing until goals have been reached. In this paper, we give a mathematically rigorous treatment of the problem in the framework of Bayesian decision theory. To quantify the opportunities for learning, we assume that the learning tasks that yield the most information about the current skills of the student, while being desirable for measurement in their own right, would also be among those that are efficient for learning. Indeed, optimization of the expected information gain appears to naturally avoid tasks that are exceedingly demanding or exceedingly easy as their results are predictable and thus uninformative. Still, tasks that are efficient for learning may be experienced as too challenging, and the resulting failures can lower motivation. Therefore, in addition to quantifying the expected informational benefit for learning of any prospective task to be presented next, we also model the expected motivational cost of its presentation, measured simply as the estimated probability of failure in our example model. We propose a “learner-friendly” adaptation algorithm that chooses the learning tasks by optimizing the expected benefit divided by the expected cost. We apply this algorithm to a Rasch-like student model implemented within a real-world application and present initial results of a pilot experiment.
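The benefit-to-cost rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain Rasch model, a belief about ability discretized on a grid, expected information gain measured as the mutual information between the response and the ability, and cost taken as the predicted probability of failure. The function names, the grid, and the difficulty values are all hypothetical.

```python
import math

def p_success(theta, difficulty):
    """Rasch model: probability of success given ability and task difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def entropy(p):
    """Entropy (in nats) of a success/failure outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def benefit_and_cost(grid, belief, difficulty):
    """Return (expected information gain, predicted probability of failure)."""
    probs = [p_success(theta, difficulty) for theta in grid]
    predicted = sum(w * p for w, p in zip(belief, probs))
    # Mutual information between response and ability: H(response) - H(response | ability).
    conditional = sum(w * entropy(p) for w, p in zip(belief, probs))
    return entropy(predicted) - conditional, 1.0 - predicted

def choose_task(grid, belief, difficulties):
    """Pick the difficulty that maximizes expected benefit divided by expected cost."""
    def ratio(difficulty):
        gain, cost = benefit_and_cost(grid, belief, difficulty)
        return gain / cost
    return max(difficulties, key=ratio)

# Hypothetical belief about the student: a discretized standard normal over ability.
grid = [i / 10.0 for i in range(-40, 41)]
weights = [math.exp(-0.5 * t * t) for t in grid]
belief = [w / sum(weights) for w in weights]

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical task pool
chosen = choose_task(grid, belief, difficulties)
most_informative = max(difficulties, key=lambda d: benefit_and_cost(grid, belief, d)[0])
print(chosen, most_informative)
```

Under these assumptions the belief is symmetric around zero, so a task of difficulty -b carries the same expected information as one of difficulty +b but a lower failure probability; dividing by the cost therefore shifts the choice toward easier tasks than information gain alone would select.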
5.
When participants assess the relationship between two variables, each with levels of presence and absence, the two most robust phenomena are that: (a) observing the joint presence of the variables has the largest impact on judgment and observing joint absence has the smallest impact, and (b) participants' prior beliefs about the variables' relationship influence judgment. Both phenomena represent departures from the traditional normative model (the phi coefficient or related measures) and have therefore been interpreted as systematic errors. However, both phenomena are consistent with a Bayesian approach to the task. From a Bayesian perspective: (a) joint presence is normatively more informative than joint absence if the presence of variables is rarer than their absence, and (b) failing to incorporate prior beliefs is a normative error. Empirical evidence is reported showing that joint absence is seen as more informative than joint presence when it is clear that absence of the variables, rather than their presence, is rare.
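The rarity argument in this abstract can be made concrete with a small worked sketch. It is an illustration under stated assumptions, not the model used in the study: two hypotheses are compared ("unrelated", where the cells are products of the marginals, and "related", where the variables have correlation phi), and both variables are assumed to be present with the same probability. The function name and the numbers are hypothetical.

```python
def likelihood_ratios(p_present, phi):
    """Likelihood ratios (related vs. unrelated) for joint presence and joint absence.

    Assumes both variables are present with probability p_present. A correlation
    of phi adds phi * p * (1 - p) to both the joint-presence and joint-absence cells.
    """
    p = p_present
    shift = phi * p * (1.0 - p)
    presence_lr = (p * p + shift) / (p * p)
    absence_lr = ((1.0 - p) ** 2 + shift) / ((1.0 - p) ** 2)
    return presence_lr, absence_lr

# Presence is rare (10%): joint presence is the more diagnostic observation.
rare_presence = likelihood_ratios(0.1, 0.5)   # roughly (5.5, 1.06)

# Absence is rare (presence 90%): the pattern reverses.
rare_absence = likelihood_ratios(0.9, 0.5)    # roughly (1.06, 5.5)

print(rare_presence, rare_absence)
```

A likelihood ratio near 1 means the observation barely discriminates between the two hypotheses, so whichever joint outcome is rare is the one that moves a Bayesian observer's belief the most.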
6.
Ard J. Barends, Reinout E. de Vries 《International Journal of Selection & Assessment》2023,31(1):120-134
There is scant research on the validity of personality assessment games in selection situations. Therefore, in two experimental simulated selection studies, the construct validity of an assessment game developed to assess honesty-humility was tested. Both studies found no differences between a control condition and a simulated selection condition on honesty-humility game scores. Moreover, convergent and discriminant validity with self-reported personality were not affected by the manipulation. We obtained mixed evidence that individual differences in dispositional insight and the ability to identify criteria influenced the validity of the game. As the validity of the personality assessment game was not significantly affected in the simulated selection context, our findings may imply that well-designed personality assessment games can be used for high-stakes selection assessments.
7.
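Fully exploiting the diagnostic information in multiple-choice (MC) items has attracted considerable attention; taking distractor information into account can improve diagnostic accuracy. To compensate for the fact that parametric models yield reliable estimates only with large samples, and to suit small-sample diagnostic testing at the classroom level, this study proposes a nonparametric diagnostic method for multiple-choice items. Simulation and empirical results show that: (1) when item parameters in the MC test do not differ greatly, the $d_{\text{ph-MC}}$ method outperforms parametric diagnostic models in most cases; (2) when item parameters in the MC test differ greatly, the $d_{\text{ph-MC}}$ method performs best; (3) in the empirical study, classification agreement between the nonparametric method and the parametric models is high, and the overall attribute mastery estimated by the $d_{\text{ph-MC}}$ distance method correlates most highly with total score. Finally, several research directions are proposed based on the characteristics of MC diagnostic tests.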
8.
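The development of polytomous cognitive diagnosis models has played an important role in the advancement of cognitive diagnosis, but Q-matrix refinement under polytomous models remains to be studied. This study examines Q-matrix refinement for polytomous cognitive diagnosis, focusing on refinement at the item-category level, which has greater diagnostic value. Relative fit statistics are applied to Q-matrix refinement in polytomous cognitive diagnosis and compared with an existing approach, the Stepwise method (Ma & de la Torre, 2019). The results show that the BIC method achieves high pattern and attribute correct classification rates in Q-matrix refinement for polytomous cognitive diagnosis models, that its Q-matrix recovery rate is higher than that of the Stepwise method, and that the Q-matrix it produces fits the data better; that in complex models the relative fit index BIC performs better than AIC and -2LL, so in practice users can choose the BIC method to refine a test's Q-matrix; and that refinement performance is affected by sample size, with more examinees raising the accuracy of Q-matrix refinement. Overall, this study provides important methodological support for Q-matrix refinement in polytomous cognitive diagnosis.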
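The three relative fit indices named in this abstract have standard definitions, which the following sketch computes for two candidate Q-matrices. It only illustrates how the indices are compared; it is not the search procedure used in the study, and the log-likelihoods, parameter counts, and sample size are made-up values.

```python
import math

def relative_fit(log_likelihood, n_parameters, n_examinees):
    """Standard relative fit indices: -2LL, AIC = -2LL + 2k, BIC = -2LL + k ln N."""
    neg2ll = -2.0 * log_likelihood
    return {
        "-2LL": neg2ll,
        "AIC": neg2ll + 2.0 * n_parameters,
        "BIC": neg2ll + n_parameters * math.log(n_examinees),
    }

# Hypothetical fitted models under two candidate Q-matrices (values are invented).
candidates = {
    "simpler Q-matrix": relative_fit(log_likelihood=-5230.0, n_parameters=40, n_examinees=1000),
    "richer Q-matrix": relative_fit(log_likelihood=-5215.0, n_parameters=48, n_examinees=1000),
}

# For each index, the preferred candidate is the one with the smallest value.
best_by = {
    index: min(candidates, key=lambda name: candidates[name][index])
    for index in ("-2LL", "AIC", "BIC")
}
print(best_by)
```

With these invented numbers, -2LL and AIC prefer the richer Q-matrix while BIC, which penalizes each extra parameter by ln N, prefers the simpler one. This is the kind of disagreement between indices that makes the choice of index matter.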
9.
A growing body of research suggests that, apart from the wording of specific questions, various aspects of the interview process itself may affect the reliability of information provided by research participants. To examine whether the order of presentation of specific diagnostic modules affects the likelihood of subjects' yes/no responses within the Diagnostic Interview Schedule for Children (DISC), the authors used a counterbalanced design, presenting two DISC diagnostic modules to children and their parents in standard or reversed order. Results indicate that the order of module administration exerts effects on the total numbers of symptoms endorsed, level of impairment, and the likelihood of meeting diagnostic criteria, regardless of whether the information is provided by parent or child respondents. Future child and adult assessment measures should take these difficulties fully into account through novel approaches to instrument design and interview procedures.
10.
Despite the well-known difficulties in obtaining reliable and valid assessments of child psychopathology, investigators generally have not examined the influence of factors such as subject characteristics or the specific assessment procedures themselves on the validity of the information obtained. To address these issues, this special section presents four studies of the Diagnostic Interview Schedule for Children, in which investigators examined the impact of a range of variables on the reliability of its symptom and diagnostic information. Factors studied include interview structural characteristics; question length, complexity, and placement within the interview; and interview subject characteristics. Overall findings suggest that interview and subject characteristics exert important influences on the data obtained, and that novel approaches, such as allowing subjects a greater role in the ordering of questions to be answered, may improve the precision and accuracy of such measures of children's psychopathology.
11.
David A. Wilder, Daniel Cymbal, Jamie Villacorta 《Journal of Applied Behavior Analysis》2020,53(2):1170-1176
The Performance Diagnostic Checklist-Human Services (PDC-HS) is an informant-based tool designed to identify the variables contributing to poor employee performance in human service settings, such as clinics, schools, and residential facilities. Upon completion of the tool, an intervention indicated by PDC-HS results is used to improve employee performance. To date, the PDC-HS has been used in a number of studies. This review describes the existing research on the PDC-HS and provides suggestions for future research.
12.
Cocaine is a type of drug that functions to increase the availability of the neurotransmitter dopamine in the brain. However, cocaine dependence or abuse is highly related to an increased risk of psychiatric disorders and deficits in cognitive performance, attention, and decision-making abilities. Given the chronic and persistent features of drug addiction, the progression of abstaining from cocaine often evolves across several states, such as addiction to, moderate dependence on, and swearing off cocaine. Hidden Markov models (HMMs) are well suited to the characterization of longitudinal data in terms of a set of unobservable states, and have increasingly been used to uncover the dynamic heterogeneity in progressive diseases or activities. However, the existence of outliers or influential points may misidentify the hidden states and distort the associated inference. In this study, we develop a Bayesian local influence procedure for HMMs with latent variables in the presence of missing data. The proposed model enables us to investigate the dynamic heterogeneity of multivariate longitudinal data, reveal how the interrelationships among latent variables change from one state to another, and simultaneously conduct statistical diagnosis for the given data, model assumptions, and prior inputs. We apply the proposed procedure to analyze a dataset collected by the UCLA Center for Advancing Longitudinal Drug Abuse Research. Several outliers or influential points that seriously influence estimation results are identified and removed. The proposed procedure also discovers the effects of treatment and individuals’ psychological problems on cocaine use behavior and delineates their dynamic changes across the cocaine-addiction states.
13.
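Cognitive diagnostic assessment aims to examine an individual's internal structure of knowledge mastery and to provide detailed diagnostic information about students' strengths and weaknesses, so as to promote the individual's all-round development. Researchers have developed a large number of dichotomously scored (0-1) cognitive diagnosis models, but research on polytomously scored cognitive diagnosis models is still relatively scarce. This paper summarizes existing polytomous cognitive diagnosis models and introduces their assumptions, psychometric properties, and scope of application, providing a reference for practitioners and researchers in comparing and selecting polytomous cognitive diagnosis models. Finally, future research directions for polytomous diagnostic models are discussed.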
14.
James M. Lattin 《Psychometrika》1990,55(2):353-370
This paper presents an approach for determining unidimensional scale estimates that are relatively insensitive to limited inconsistencies in paired comparisons data. The solution procedure, shown to be a minimum-cost network-flow problem, is presented in conjunction with a sensitivity diagnostic that assesses the influence of a single pairwise comparison on traditional Thurstone (ordinary least squares) scale estimates. When the diagnostic indicates some source of distortion in the data, the network technique appears to be more successful than Thurstone scaling in preserving the interval scale properties of the estimates. My special thanks go to Alvin Silk, Thomas Magnanti, and Roy Welsch for their support and advice throughout the formative stages of this paper, and to V. Srinivasan for his helpful comments on a later draft of this paper. I also wish to thank the Editor, Associate Editor, and two reviewers for their constructive suggestions. James M. Lattin is Associate Professor of Marketing and Management Science and the James and Doris McNamara Faculty Fellow for 1988-1989.
15.
Lucas CP, Fisher P, Piacentini J, Zhang H, Jensen PS, Shaffer D, Dulcan M, Schwab-Stone M, Regier D, Canino G 《Journal of Abnormal Child Psychology》1999,27(6):429-437
Previous studies have suggested that discrepant reporting in a test–retest reliability paradigm is not purely random measurement error, but partly a function of a systematic tendency to say no during retest to questions answered positively at initial testing (attenuation). To examine features of interview questions that may be associated with attenuation, three raters independently assessed the structural and content features of questions from the Diagnostic Interview Schedule for Children (version 2.3) and linked these to data from a test–retest reliability study of 223 community respondents (parent and child reports). Results indicated that for both parent and youth reports, item features most strongly associated with attenuation were (a) being a stem question (asked of all respondents, regardless of any skip structure); (b) question placement in the first half of the interview; (c) question length; (d) question complexity; or (e) requiring assessment of the timing, duration, or frequency of a symptom. Findings may be explained by participants' conscious efforts to avoid further questions or by their learning more about the nature and purpose of the interview as they gain more experience; alternatively, findings may represent a methodological artifact of structured interview design.
16.
Xiaofeng Yu, Ying Cheng 《The British Journal of Mathematical and Statistical Psychology》2020,73(Z1):145-179
In a cognitive diagnostic assessment (CDA), attributes refer to fine-grained knowledge points or skills. The Q-matrix is a central component of CDA, which specifies the relationship between items and attributes. Oftentimes, attributes and the Q-matrix are defined by subject-matter experts, and assumed to be appropriate without any misspecifications. However, this assumption does not always hold in real applications. To address this concern, this paper proposes a residual-based statistic for validating the Q-matrix. Its performance is evaluated in a simulation study and compared against that of an existing method proposed in Liu, Xu and Ying (2012, Applied Psychological Measurement, 36, 548). Simulation results indicate that the proposed method leads to a higher recovery rate of the Q-matrix and is computationally more efficient. The advantage in computational efficiency is particularly pronounced when the number of attributes measured by the test reaches five or more. Results also suggest that the two methods have different tendencies in estimating the attribute vector for each item. In cases where the methods fail to recover the correct Q-matrix, the method in Liu et al. (2012, Applied Psychological Measurement, 36, 548) tends to overestimate the number of attributes measured by the items, whereas our method does not show that bias.
17.
Bruce F. Chorpita, Letitia M. Yim, Susan A. Tracey 《Journal of Psychopathology and Behavioral Assessment》2002,24(1):13-23
An important goal of clinical assessment is to balance cost-effectiveness, administration demands, and accuracy (G. Young, J. O'Brien, E. Gutterman, & P. Cohen, 1987). The incorporation of Bayesian logic into diagnostic interviewing may assist with this goal, but in previous examinations, such methods have been prohibitively complex. In this study, analysis of a simplified Bayesian system showed overall classification error rates as good as or better than traditional structured interviewing, and reduction in error was positively related to the psychometric properties of the predictor used in the actuarial functions. A dynamic system using simplified Bayesian logic appears to function well in the context of a structured interview and requires comparatively less data than previously tested Bayesian approaches. This type of system appears suitable for further research with clinical populations to determine its performance in applied settings.
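The abstract does not spell out its simplified Bayesian system, so the following is only a generic sketch of how Bayesian updating could drive a dynamic interview. It assumes each yes/no question has a known sensitivity and specificity for the diagnosis, that answers are conditionally independent given the diagnosis, and that the interview stops once the posterior leaves a chosen band. All names, thresholds, and numbers are hypothetical.

```python
def update(prior, answered_yes, sensitivity, specificity):
    """Bayes' rule for one yes/no answer about a single diagnosis."""
    if answered_yes:
        like_dx, like_no_dx = sensitivity, 1.0 - specificity
    else:
        like_dx, like_no_dx = 1.0 - sensitivity, specificity
    numerator = prior * like_dx
    return numerator / (numerator + (1.0 - prior) * like_no_dx)

def run_interview(prior, questions, answers, lower=0.05, upper=0.95):
    """Ask questions in order; stop early once the posterior leaves [lower, upper].

    Assumes answers are conditionally independent given the diagnosis.
    Returns (posterior probability of the diagnosis, number of questions asked).
    """
    posterior, asked = prior, 0
    for (sensitivity, specificity), answer in zip(questions, answers):
        posterior = update(posterior, answer, sensitivity, specificity)
        asked += 1
        if posterior <= lower or posterior >= upper:
            break
    return posterior, asked

# Hypothetical module: four questions, each with sensitivity 0.8 and specificity 0.9.
questions = [(0.8, 0.9)] * 4
posterior, asked = run_interview(prior=0.2, questions=questions,
                                 answers=[True, True, True, True])
print(round(posterior, 4), asked)
```

In this invented case each "yes" multiplies the odds of the diagnosis by about 8, so the posterior passes 0.95 after the third answer and the fourth question is never asked, which is the kind of saving in administration demands the abstract refers to.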
18.
Konstantinos Vamvourellis, Konstantinos Kalogeropoulos, Irini Moustaki 《The British Journal of Mathematical and Statistical Psychology》2023,76(3):559-584
The paper proposes a novel model assessment paradigm aiming to address shortcomings of posterior predictive p-values, which provide the default metric of fit for Bayesian structural equation modelling (BSEM). The model framework presented in the paper focuses on the approximate zero approach (Psychological Methods, 17, 2012, 313), which involves formulating certain parameters (such as factor loadings) to be approximately zero through the use of informative priors, instead of explicitly setting them to zero. The introduced model assessment procedure monitors the out-of-sample predictive performance of the fitted model, and together with a list of guidelines we provide, one can investigate whether the hypothesised model is supported by the data. We incorporate scoring rules and cross-validation to supplement existing model assessment metrics for BSEM. The proposed tools can be applied to models for both continuous and binary data. The modelling of categorical and non-normally distributed continuous data is facilitated with the introduction of an item-individual random effect. We study the performance of the proposed methodology via simulation experiments as well as real data on the ‘Big-5’ personality scale and the Fagerstrom test for nicotine dependence.
19.
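Test reliability is an important indicator of test quality, and reliability deserves equal attention in cognitive diagnostic assessment. Existing methods for computing reliability in cognitive diagnosis all rest on one assumption: that an examinee's posterior probability distribution and marginal probabilities are exactly the same across two administrations of the test. This assumption is too strong and ignores the random error between the two administrations. Based on Bootstrap sampling, two types of indices for attribute reliability and pattern reliability are proposed: the product-moment correlation method and the modified consistency method. A simulation study compares the new and existing methods under different numbers of attributes, correlations among attributes, and numbers of items, and empirical data from the ECPE English proficiency certification examination and from a fraction subtraction test are used to verify the feasibility of the new methods. Finally, factors affecting reliability estimation are discussed.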
20.
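The Q-matrix plays an important role in model parameter estimation and diagnostic classification in cognitive diagnosis. Building on the method of Liu et al., this paper designs a joint estimation algorithm that estimates the item parameters and the Q-matrix simultaneously. A simulation study is conducted under the DINA model with unknown item parameters. The study assumes 20 items; the number of attributes examined is 3, 4, and 5, and the initial Q-matrix contains 3, 4, and 5 items with misspecified attributes, respectively. The results show that the joint estimation algorithm recovers the correct Q-matrix from an erroneous initial Q-matrix with high probability. In addition, when the number of test attributes identified by experts is wrong, the Q-matrix and model parameters derived by this method provide good information for detecting Q-matrix errors.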
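To show how the Q-matrix enters the DINA model mentioned in this abstract, here is a minimal sketch of the standard DINA response function and a brute-force maximum-likelihood classification. It is not the joint estimation algorithm of the paper: the Q-matrix, the slip and guess values, and the response pattern are hypothetical, and the item parameters are treated as known.

```python
from itertools import product

def dina_probability(alpha, q_row, slip, guess):
    """DINA: P(correct) is 1 - slip if the examinee masters every attribute
    the item requires (per its Q-matrix row), otherwise guess."""
    masters_all = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if masters_all else guess

def likelihood(responses, alpha, q_matrix, slips, guesses):
    """Likelihood of a 0/1 response pattern for one attribute profile."""
    value = 1.0
    for x, q_row, s, g in zip(responses, q_matrix, slips, guesses):
        p = dina_probability(alpha, q_row, s, g)
        value *= p if x == 1 else 1.0 - p
    return value

def most_likely_profile(responses, q_matrix, slips, guesses):
    """Enumerate all 2^K attribute profiles and return the most likely one."""
    n_attributes = len(q_matrix[0])
    return max(product((0, 1), repeat=n_attributes),
               key=lambda alpha: likelihood(responses, alpha, q_matrix, slips, guesses))

# Hypothetical test: 4 items, 3 attributes.
q_matrix = [[1, 0, 0],
            [0, 1, 0],
            [1, 1, 0],
            [0, 1, 1]]
slips = [0.1, 0.1, 0.1, 0.1]
guesses = [0.2, 0.2, 0.2, 0.2]

responses = [1, 1, 1, 0]
profile = most_likely_profile(responses, q_matrix, slips, guesses)
print(profile)
```

Because the Q-matrix row decides which attribute profiles count as "mastering" an item, a misspecified row changes the predicted probabilities for whole groups of examinees, which is why errors in the Q-matrix affect both parameter estimation and classification.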