Similar literature
 Found 20 similar articles
1.
Game-based assessment (GBA) is a specific use of educational games that employs game activities to elicit evidence for educationally valuable skills and knowledge. While this approach can provide individualized and diagnostic information about students, the design and development of assessment mechanics for a GBA is a nontrivial task. In this article, we describe the 10-step procedure that the design team of Physics Playground (formerly known as Newton's Playground) has established by adapting evidence-centered design to address unique challenges of GBA. The scaling method used for Physics Playground was Bayesian networks; thus this article describes specific actions taken for the iterative process of constructing and revising Bayesian networks in the context of the game Physics Playground.

2.
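The Q-matrix is the foundation and core element of cognitive diagnostic assessment: it reflects the construct and content design of a test and directly affects the diagnostic classification results. Using Monte Carlo simulation, this study examined how different Q-matrix designs affect cognitive diagnosis under six attribute hierarchies. The mean and standard deviation of the pattern match rate were used to evaluate diagnostic performance in terms of classification accuracy and stability, respectively. The results showed that: (1) under all attribute hierarchies, classification accuracy increased with test length, but a "ceiling effect" appeared once test length reached a certain point; (2) the number of R* in the Q-matrix (NR*) affected both classification accuracy and stability: the larger NR*, the higher the classification stability, and classification accuracy was highest when test length was an integer multiple of the number of attributes and NR* was the largest odd multiple of test length relative to the number of attributes; (3) the number of attributes measured by items other than R* affected classification accuracy and stability differently depending on the attribute hierarchy. Based on these results, the study offers suggestions for optimizing Q-matrix design in diagnostic assessment.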
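
The evaluation criterion named in this abstract can be illustrated with a minimal sketch. It assumes the pattern match rate is the proportion of examinees whose estimated attribute pattern equals the true pattern on every attribute, and that accuracy and stability are summarized as the mean and standard deviation of that rate over simulation replications. The function name and the data are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def pattern_match_rate(true_patterns, estimated_patterns):
    """Share of examinees whose whole attribute pattern is recovered exactly."""
    if len(true_patterns) != len(estimated_patterns):
        raise ValueError("both lists must cover the same examinees")
    if not true_patterns:
        raise ValueError("at least one examinee is required")
    hits = sum(
        tuple(t) == tuple(e) for t, e in zip(true_patterns, estimated_patterns)
    )
    return hits / len(true_patterns)


# Hypothetical data: four examinees, two attributes, one misclassified pattern.
true_patterns = [(1, 0), (1, 1), (0, 0), (0, 1)]
estimated_patterns = [(1, 0), (1, 0), (0, 0), (0, 1)]
rate = pattern_match_rate(true_patterns, estimated_patterns)

# Hypothetical rates from three replications of one simulation condition:
# the mean reflects accuracy, the standard deviation reflects stability.
replication_rates = [0.75, 0.80, 0.70]
accuracy = mean(replication_rates)
stability = stdev(replication_rates)
```

A smaller standard deviation across replications would indicate a more stable classification under that Q-matrix design.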

3.
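Test reliability is an important indicator of test quality, and it deserves equal attention in cognitive diagnostic assessment. Existing methods for computing reliability in cognitive diagnosis all rest on one assumption: an examinee's posterior probability distribution and marginal probabilities are exactly the same across two test administrations. This assumption is too strong and ignores the random error between the two administrations. Based on Bootstrap sampling, two types of indices for attribute reliability and pattern reliability are proposed: the product-moment correlation method and the modified consistency method. A simulation study compared the new and existing methods under different numbers of attributes, correlations among attributes, and numbers of items, and empirical data from the Examination for the Certificate of Proficiency in English (ECPE) and a fraction subtraction test verified the feasibility of the new methods. Finally, factors affecting reliability estimation are discussed.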

4.
Pursuing the line of difference models in IRT (Thissen & Steinberg, 1986), this article proposed a new cognitive diagnostic model for graded/polytomous data based on the deterministic inputs, noisy "and" gate model (Haertel, 1989; Junker & Sijtsma, 2001), which is named the DINA model for graded data (DINA-GD). We investigated the performance of a full Bayesian estimation of the proposed model. In the simulation, the classification accuracy and item parameter recovery of the DINA-GD model were investigated. The results indicated that the proposed model had an acceptable correct attribute classification rate for examinees and acceptable item parameter recovery. In addition, a real-data example was used to illustrate the application of the new model to graded data or polytomously scored items.
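
For orientation, the dichotomous DINA model that DINA-GD builds on can be sketched as follows. This is only the standard binary-response DINA item response function, not the graded extension proposed in the article; the function name and the parameter values are hypothetical.

```python
def dina_correct_probability(alpha, q_row, slip, guess):
    """Probability of a correct response under the dichotomous DINA model.

    alpha  -- examinee's binary attribute pattern (1 = mastered)
    q_row  -- the item's row of the Q-matrix (1 = attribute required)
    slip   -- probability that a fully qualified examinee answers incorrectly
    guess  -- probability that an unqualified examinee answers correctly
    """
    if len(alpha) != len(q_row):
        raise ValueError("alpha and q_row must have the same length")
    # eta is True only when every attribute required by the item is mastered.
    eta = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1.0 - slip if eta else guess


# Hypothetical item requiring attributes 1 and 2 out of three.
q_row = (1, 1, 0)
p_master = dina_correct_probability((1, 1, 0), q_row, slip=0.1, guess=0.2)
p_nonmaster = dina_correct_probability((1, 0, 1), q_row, slip=0.1, guess=0.2)
```

The "and" gate is the `all(...)` step: missing any single required attribute drops the examinee to the guessing probability, no matter how many other attributes are mastered.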

5.
The assessment of higher-education student learning outcomes is an important component in understanding the strengths and weaknesses of academic and general education programs. This study illustrates the application of diagnostic classification models, a burgeoning set of statistical models, in assessing student learning outcomes. To facilitate understanding and future applications of diagnostic modeling, the log-linear cognitive diagnosis model used in this study is presented in a didactic manner. The model is applied in a context where undergraduate students were assessed along four learning outcomes related to psychosocial research across two time points. Results focus on implications and methods to aid stakeholders’ interpretation of the analyses. Contrasts to traditional measurement models and potential future applications are also discussed.

6.
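Classification consistency and accuracy are important indices in cognitive diagnostic assessment; the former reflects reliability and the latter validity. Existing indices are all based on dichotomous attributes, but the posterior probability distribution and the marginal attribute probability distribution for polytomous attributes differ from those for dichotomous attributes, so new indices are needed to measure reliability and validity in the polytomous-attribute setting. Drawing on the idea of dichotomization, this study constructs binary information indices for computing reliability and validity in tests with polytomous attributes, examines the performance of the new indices under multiple influencing factors through an experimental design, and verifies their effectiveness. Finally, it offers suggestions for constructing polytomous-attribute diagnostic tests and proposes directions for future research.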

7.
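The item attribute vector balance (IAVB) strategy for constructing cognitive diagnostic tests emphasizes that a test must embody the cognitive model, and it takes the item attribute vector, rather than the single attribute, as the unit of examination. The strategy overcomes the limitation that the strict attribute balance (SAB) strategy applies only to the independent structure. When each item measures (roughly) the same number of attributes, and with the pattern match rate (PMR) as the criterion, the strategy outperforms non-IAVB strategies. In particular, when the attribute hierarchy is an independent structure, a test using the IAVB strategy performs best, the SAB strategy comes second, and a test using neither strategy performs worst. In addition, the IAVB matrix can significantly improve the PMR.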

8.
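丁树良, 毛萌萌, 汪文义, 罗芬, CUI Ying. 《心理学报》 (Acta Psychologica Sinica), 2012, 44(11): 1535-1546
Building a correct cognitive model is one of the keys to successful cognitive diagnosis; if a cognitive diagnostic test does not represent the cognitive model completely and accurately, the validity of the test is in question. Attributes and their hierarchy can represent a cognitive model. Assuming the cognitive model is correct, the article gives a quantitative formula that measures how well a cognitive diagnostic test represents the cognitive model. It analyzes equivalence classes that contain more than one knowledge state and the reasons they form, and suggests modifications to the hierarchy consistency index (HCI) of Cui et al. to better detect the agreement between the data and the cognitive model specified by experts.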

9.
MicroCog: Assessment of Cognitive Functioning version 2.1 (Powell, D. H., Kaplan, E. F., Whitla, D., Catlin, R., and Funkenstein, H. H. (1993). The Psychological Corporation, San Antonio, TX.) is one of the first computerized assessment batteries commercially developed to detect early signs of cognitive impairment. This paper reviews its psychometric characteristics and relates them to its clinical utility. It concludes that MicroCog provides an accurate, cost-effective screen for early dementia among elderly subjects living in the community and that it can distinguish dementia from depression. Its ability to detect cognitive decline at other ages or to discriminate dementia from other mental disorders has not been established. MicroCog measures different constructs than do traditional neuropsychological tests, making it difficult to relate test performance to current models of cognitive functioning. The review recommends further development of MicroCog and discusses its implications for the future of computer-based neuropsychological assessment.

10.
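Writing multiple-choice items whose options carry diagnostic information is an effective way to improve the diagnostic performance of multiple-choice cognitive diagnostic tests. Starting from the goals of cognitive diagnosis, and based on the criteria for evaluating the quality of cognitive diagnostic tests and the characteristics of multiple-choice items, the study discusses principles for constructing such tests. Considering those characteristics and the need for nominal scoring of multiple-choice items, it also sets out two requirements for writing items whose options carry diagnostic information. Based on these principles and requirements, it gives operational steps for constructing multiple-choice cognitive diagnostic tests. Simulation results show that tests constructed according to the proposed principles and requirements have good diagnostic performance, indicating that the principles and requirements are reasonable and feasible. Because the principles, requirements, and steps are readily operational, they offer some guidance for constructing multiple-choice cognitive diagnostic tests.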

11.
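Design of the cognitive diagnostic test blueprint (cited 5 times: 0 self-citations, 5 by others)
It is commonly assumed that the items corresponding to the columns of the attribute-by-item incidence matrix (the Q-matrix) can serve as the behavior sample in a cognitive diagnostic test; in fact, this practice cannot effectively prevent misclassification of ideal response patterns. If the attributes to be measured and their hierarchical relations can be determined before testing, the reachability matrix can be obtained, and it can be proved that the item types corresponding to the columns of the reachability matrix are indispensable in a cognitive diagnostic test; otherwise, even under ideal response patterns, some examinees will necessarily be misclassified. This article introduces the concept of a necessary and sufficient Q-matrix, to distinguish it from the sufficient Q-matrix discussed by Tatsuoka (1995, 2009). Only a necessary and sufficient Q-matrix can effectively guide test construction.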
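
The reachability matrix mentioned in this abstract can be derived from the direct prerequisite relations among attributes. The sketch below is one common way to do that (a Warshall-style transitive closure with the diagonal set to 1), under the assumption that entry (i, j) of the adjacency matrix is 1 when attribute i is a direct prerequisite of attribute j. The function name and the three-attribute hierarchy are hypothetical and do not come from the article.

```python
def reachability_matrix(adjacency):
    """Reflexive transitive closure of a direct-prerequisite adjacency matrix."""
    n = len(adjacency)
    if any(len(row) != n for row in adjacency):
        raise ValueError("adjacency matrix must be square")
    # Start from the direct relations and let every attribute reach itself.
    reach = [[bool(adjacency[i][j]) or i == j for j in range(n)] for i in range(n)]
    # Warshall's algorithm: allow paths that pass through attribute k.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return [[int(cell) for cell in row] for row in reach]


# Hypothetical linear hierarchy A1 -> A2 -> A3.
adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
R = reachability_matrix(adjacency)
# Each column of R lists the attributes an item of that type would measure.
item_types = [tuple(row[j] for row in R) for j in range(len(R))]
```

For this linear hierarchy the columns are (1, 0, 0), (1, 1, 0), and (1, 1, 1), that is, one item type per attribute together with all of its prerequisites.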

12.
The cognitive content-specificity hypothesis proposes that depression and anxiety can be discriminated on the basis of unique cognitive profiles. Alternatively, the Tripartite model suggests that, although depression and anxiety share a general distress factor, anhedonia is a characteristic of depression with anxious arousal a characteristic of anxiety. Past research devoted to integrating these two models has been limited in a number of ways. To remedy these limitations, this study attempted to assess the complete Tripartite model and used a multidimensional cognitive assessment tool to handle the heterogeneity of anxious cognitive content. Results on data collected from 411 clients seeking services at a university counseling center suggested that a one-to-one mapping between Tripartite dimensions and cognitive content was possible. Further, variables from each model simultaneously explained unique variance in depression and anxiety ratings.

13.
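A simulation study compared the performance of previously proposed Wald statistics based on the observed information matrix and the sandwich matrix (denoted W_Obs and W_Sw, respectively), the likelihood ratio statistic, and a newly proposed Wald statistic based on the empirical cross-product information matrix (W_XPD) in item-level model comparison under model-data misfit. The results showed that: (1) the Type I error control of W_Sw was highly robust; (2) W_XPD outperformed W_Sw under most conditions with a misspecified Q-matrix. Conclusion: when the model fits the data well, W_Sw can be used for item-level model comparison; when the model misfits the data, W_XPD may be the better choice.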

15.
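A comprehensive method for assessing cognitive load during human-computer interaction (cited 7 times: 0 self-citations, 7 by others)
A dual-task experiment simulating web search-engine searching and mental arithmetic was designed to analyze the sensitivity of three types of assessment indices (subjective ratings, performance measures, and physiological measures) to changes in cognitive load. Three modeling methods (factor analysis, BP neural network, and self-organizing neural network) were used to explore comprehensive assessment models of cognitive load during human-computer interaction. The results showed that six indices were sensitive to changes in cognitive load: mental effort, subjective task difficulty, fixation duration, fixation count, primary-task reaction time, and primary-task accuracy. Measuring dual-task cognitive load with a multidimensional comprehensive assessment model was, overall, more effective than measuring it with a single index. The two neural network models, the BP network and the self-organizing network, gave better measurements of cognitive load than the traditional factor analysis method.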

16.
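郭磊, 郑蝉金, 边玉芳. 《心理学报》 (Acta Psychologica Sinica), 2015, 47(1): 129-140
Drawing on the ideas of traditional computerized adaptive testing and the characteristics of cognitive diagnosis, this study proposes four termination rules for variable-length CD-CAT within the cognitive diagnosis framework: the standard error of attribute method (SEA), the difference of adjacent posterior probabilities method (DAPP), the halving algorithm (HA), and the hybrid method (HM). They were compared with the HSU method and the KL method with no exposure control and under different exposure control conditions. The results showed that: (1) the stricter the termination condition, the longer the average test length, the larger the percentage of tests terminated at the maximum test length, and the higher the pattern match rate. (2) Without exposure control, the four new termination rules all performed well and were very close to the HSU method. As the preset value of the maximum posterior probability increased or e decreased, the pattern match rate rose and the average test length gradually increased, but item bank usage was poor in all cases. (3) With item exposure control, item bank usage under the six variable-length termination rules improved greatly while the pattern match rate remained high, and different exposure control methods affected the termination rules differently. In particular, relative-criterion termination rules were highly susceptible to the exposure control method. (4) Overall, SEA, HM, and HA performed essentially the same as the HSU method on all indices, followed by the KL method and DAPP.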

17.
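Using the D-N (Das-Naglieri) Cognitive Assessment System (CAS), which is built on PASS theory, as the main assessment tool, the study compared the PASS cognitive processes of 18 clinically referred children with AD/HD and 18 typically developing children matched to the clinical sample on gender, age, and intelligence level, in order to probe potential abnormalities in the cognitive processes of the clinical group. The results showed that: (1) children with clinical AD/HD scored significantly lower than control children on the CAS full-scale score; (2) the two groups differed significantly on planning and attention process scores, and the planning and attention subscale scores of children with clinical AD/HD predicted their inattention ratings on the DSM-IV well; (3) the two groups did not differ significantly in simultaneous or successive processing.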

18.
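郭磊, 杨静, 宋乃庆. 《心理科学》 (Journal of Psychological Science), 2018, (3): 735-742
Cluster analysis has been used successfully in cognitive diagnostic assessment (CDA). The most widely used clustering method is the K-means algorithm, and studies have shown that K-means clusters well in CDA. Because spectral clustering usually classifies better than K-means, this study introduces spectral clustering into CDA and examines how attribute hierarchy, number of attributes, sample size, and slip rate affect the method. The study found that: (1) spectral clustering gives better clustering results than K-means and is more robust, especially under harsher experimental conditions; (2) clustering was best for the linear structure, similar for the convergent and divergent structures, and poorer for the independent structure; (3) clustering performance declined as the number of attributes and the slip rate increased; (4) clustering performance improved somewhat as sample size increased, although K-means sometimes showed the opposite result.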

19.
The Kimberley Indigenous Cognitive Assessment (KICA) was initially developed and validated as a culturally appropriate dementia screening tool for older Indigenous people living in the Kimberley. This paper describes the re-evaluation of the psychometric properties of the cognitive section (KICA-Cog) of this tool in two different populations, including a Northern Territory sample, and a larger population-based cohort from the Kimberley. In both populations, participants were evaluated on the KICA-Cog tool, and independently assessed by expert clinical raters blinded to the KICA scores, to determine validity and reliability of dementia diagnosis for both groups. Community consultation, feedback and education were integral parts of the research. For the Northern Territory sample, 52 participants were selected primarily through health services. Sensitivity was 82.4% and specificity was 87.5% for diagnosis of dementia, with area under the curve (AUC) of .95, based on a cut-off score of 31/32 of a possible 39. For the Kimberley sample, 363 participants from multiple communities formed part of a prevalence study of dementia. Sensitivity was 93.3% and specificity was 98.4% for a cut-off score of 33/34, with AUC = .98 (95% confidence interval: 0.97–0.99). There was no education bias found. The KICA-Cog appears to be most reliable at a cut-off of 33/39.
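
As a rough illustration of how sensitivity and specificity follow from a cut-off score, the sketch below classifies each participant as screen-positive when the score is at or below the cut-off and compares that with the clinical diagnosis. It assumes that lower scores indicate greater impairment, so that a cut-off written as 31/32 treats 31 and below as positive. The function name and all scores are hypothetical; they are not KICA-Cog data.

```python
def screen_metrics(records, cutoff):
    """Return (sensitivity, specificity) for a 'score <= cutoff' screening rule.

    records -- iterable of (score, has_dementia) pairs, where has_dementia is
               the independent clinical diagnosis used as the reference.
    """
    tp = fn = tn = fp = 0
    for score, has_dementia in records:
        screen_positive = score <= cutoff  # assumption: low score = impaired
        if has_dementia and screen_positive:
            tp += 1
        elif has_dementia:
            fn += 1
        elif screen_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores out of 39 with reference diagnoses.
records = [
    (25, True), (30, True), (35, True),
    (38, False), (36, False), (31, False), (33, False),
]
sensitivity, specificity = screen_metrics(records, cutoff=31)
```

Raising the cut-off in this sketch would flag more participants as positive, which tends to increase sensitivity at the expense of specificity.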

20.
Generating items during testing: Psychometric issues and models (cited 2 times: 0 self-citations, 2 by others)
On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well established principle of measuring persons from items with known psychometric properties. This paper examines psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles to generate items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed and a new IRT model that includes the impact on item discrimination, as well as difficulty, is developed. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed content and adaptive testing are presented. Editor's note: This article is based on the Presidential Address Susan E. Embretson gave on June 26, 1999 at the 1999 Annual Meeting of the Psychometric Society held at the University of Kansas in Lawrence, Kansas.
