Game-based assessment (GBA) is a specific use of educational games that employs game activities to elicit evidence for educationally valuable skills and knowledge. While this approach can provide individualized and diagnostic information about students, the design and development of assessment mechanics for a GBA is a nontrivial task. In this article, we describe the 10-step procedure that the design team of Physics Playground (formerly known as Newton's Playground) has established by adapting evidence-centered design to address unique challenges of GBA. The scaling method used for Physics Playground was Bayesian networks; thus this article describes specific actions taken for the iterative process of constructing and revising Bayesian networks in the context of the game Physics Playground.

An important goal of clinical assessment is to balance cost-effectiveness, administration demands, and accuracy (G. Young, J. O'Brien, E. Gutterman, & P. Cohen, 1987). The incorporation of Bayesian logic into diagnostic interviewing may assist with this goal, but in previous examinations, such methods have been prohibitively complex. In this study, analysis of a simplified Bayesian system showed overall classification error rates as good as or better than those of traditional structured interviewing, and reduction in error was positively related to the psychometric properties of the predictor used in the actuarial functions. A dynamic system using simplified Bayesian logic appears to function well in the context of a structured interview and requires comparatively less data than previously tested Bayesian approaches. This type of system appears suitable for further research with clinical populations to determine its performance in applied settings.

The assessment of higher-education student learning outcomes is an important component in understanding the strengths and weaknesses of academic and general education programs. This study illustrates the application of diagnostic classification models, a burgeoning set of statistical models, in assessing student learning outcomes. To facilitate understanding and future applications of diagnostic modeling, the log-linear cognitive diagnosis model used in this study is presented in a didactic manner. The model is applied in a context where undergraduate students were assessed along four learning outcomes related to psychosocial research across two time points. Results focus on implications and methods to aid stakeholders’ interpretation of the analyses. Contrasts to traditional measurement models and potential future applications are also discussed.

When participants assess the relationship between two variables, each with levels of presence and absence, the two most robust phenomena are that: (a) observing the joint presence of the variables has the largest impact on judgment and observing joint absence has the smallest impact, and (b) participants' prior beliefs about the variables' relationship influence judgment. Both phenomena represent departures from the traditional normative model (the phi coefficient or related measures) and have therefore been interpreted as systematic errors. However, both phenomena are consistent with a Bayesian approach to the task. From a Bayesian perspective: (a) joint presence is normatively more informative than joint absence if the presence of variables is rarer than their absence, and (b) failing to incorporate prior beliefs is a normative error. Empirical evidence is reported showing that joint absence is seen as more informative than joint presence when it is clear that absence of the variables, rather than their presence, is rare.
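
The rarity argument in this abstract can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes two binary variables with the same marginal probability of presence `p`, and compares a hypothetical "related" hypothesis (phi correlation `rho`) with an "unrelated" one (phi of zero). The function names and the numbers are illustrative assumptions.

```python
import math

def cell_probabilities(p, rho):
    """Joint distribution of two binary variables that share the marginal
    probability of presence p and have phi correlation rho (assumed setup)."""
    cov = rho * p * (1 - p)  # covariance implied by phi when marginals are equal
    return {
        "both_present": p * p + cov,
        "x_only": p * (1 - p) - cov,
        "y_only": p * (1 - p) - cov,
        "both_absent": (1 - p) ** 2 + cov,
    }

def log_likelihood_ratios(p, rho):
    """Log likelihood ratio (related vs. unrelated) carried by one observation
    falling in each cell; a larger absolute value means more informative."""
    related = cell_probabilities(p, rho)
    unrelated = cell_probabilities(p, 0.0)
    return {cell: math.log(related[cell] / unrelated[cell]) for cell in related}

# Presence is rare (p = 0.1): joint presence is the more informative cell.
rare_presence = log_likelihood_ratios(p=0.1, rho=0.5)
# Absence is rare (p = 0.9): the ordering reverses.
rare_absence = log_likelihood_ratios(p=0.9, rho=0.5)
```

Under these assumptions, a joint-presence observation shifts belief toward "related" more strongly than a joint-absence observation when `p` is below 0.5, and less strongly when `p` is above 0.5, which is the pattern the abstract describes.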

Cocaine is a type of drug that functions to increase the availability of the neurotransmitter dopamine in the brain. However, cocaine dependence or abuse is highly related to an increased risk of psychiatric disorders and deficits in cognitive performance, attention, and decision-making abilities. Given the chronic and persistent features of drug addiction, the progression of abstaining from cocaine often evolves across several states, such as addiction to, moderate dependence on, and swearing off cocaine. Hidden Markov models (HMMs) are well suited to the characterization of longitudinal data in terms of a set of unobservable states, and have increasingly been used to uncover the dynamic heterogeneity in progressive diseases or activities. However, the existence of outliers or influential points may lead to misidentification of the hidden states and distort the associated inference. In this study, we develop a Bayesian local influence procedure for HMMs with latent variables in the presence of missing data. The proposed model enables us to investigate the dynamic heterogeneity of multivariate longitudinal data, reveal how the interrelationships among latent variables change from one state to another, and simultaneously conduct statistical diagnosis for the given data, model assumptions, and prior inputs. We apply the proposed procedure to analyze a dataset collected by the UCLA Center for Advancing Longitudinal Drug Abuse Research. Several outliers or influential points that seriously influence estimation results are identified and removed. The proposed procedure also discovers the effects of treatment and individuals’ psychological problems on cocaine use behavior and delineates their dynamic changes across the cocaine-addiction states.

The Performance Diagnostic Checklist-Human Services (PDC-HS) is an informant-based tool designed to identify the variables contributing to poor employee performance in human service settings, such as clinics, schools, and residential facilities. Upon completion of the tool, an intervention indicated by PDC-HS results is used to improve employee performance. To date, the PDC-HS has been used in a number of studies. This review describes the existing research on the PDC-HS and provides suggestions for future research.

The paper proposes a novel model assessment paradigm aiming to address the shortcomings of posterior predictive p-values, which provide the default metric of fit for Bayesian structural equation modelling (BSEM). The model framework presented in the paper focuses on the approximate zero approach (Psychological Methods, 17, 2012, 313), which involves formulating certain parameters (such as factor loadings) to be approximately zero through the use of informative priors, instead of explicitly setting them to zero. The introduced model assessment procedure monitors the out-of-sample predictive performance of the fitted model, and together with a list of guidelines we provide, one can investigate whether the hypothesised model is supported by the data. We incorporate scoring rules and cross-validation to supplement existing model assessment metrics for BSEM. The proposed tools can be applied to models for both continuous and binary data. The modelling of categorical and non-normally distributed continuous data is facilitated with the introduction of an item-individual random effect. We study the performance of the proposed methodology via simulation experiments as well as real data on the ‘Big-5’ personality scale and the Fagerstrom test for nicotine dependence.

Despite the well-known difficulties in obtaining reliable and valid assessments of child psychopathology, investigators generally have not examined the influence of factors such as subject characteristics or the specific assessment procedures themselves on the validity of the information obtained. To address these issues, this special section presents four studies of the Diagnostic Interview Schedule for Children, in which investigators examined the impact of a range of variables on the reliability of its symptom and diagnostic information. Factors studied include interview structural characteristics; question length, complexity, and placement within the interview; and interview subject characteristics. Overall findings suggest that interview and subject characteristics exert important influences on the data obtained, and that novel approaches, such as allowing subjects a greater role in the ordering of questions to be answered, may improve the precision and accuracy of such measures of children's psychopathology.

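Within the recent neural-network approach, investigating how the dorsal and ventral pathways cooperate in the brain mechanisms of reading is a focal point shared by several theoretical problems in the cognitive neuroscience of language. This project plans to use two functional brain imaging experiments to construct a dynamic causal model of Chinese character reading, and to examine systematically the neural network of Chinese character reading and the cooperation between the dorsal and ventral pathways within that network. Experiment 1 exploits the advantages of the rapid adaptation paradigm to identify and examine the functional brain regions that correspond to the cognitive components involved in Chinese character reading, as well as the neural circuits formed by the connections among these regions, and to construct a dynamic causal model of Chinese character reading. Experiment 2 further examines the dynamic activation and interaction of reading-related brain regions under different stimulus properties (phonological and semantic information) and task demands. By comparing models across tasks, the project focuses on changes in the connectivity patterns among regions of the reading network, especially the cooperation between the dorsal and ventral pathways when they are affected by stimulus and task. The results will provide direct evidence for theoretical issues such as revealing the neurophysiological model of reading and resolving the debate over language-specific brain activation, and will also provide a theoretical basis and guidance for language teaching, the remediation of reading disorders, and clinical applications.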

The incorporation of Bayesian logic into diagnostic interviewing may assist with empirically based diagnostic assessment strategies in practice settings, balancing cost effectiveness, administration demands, and accuracy, yet few demonstrations of such a system have been undertaken in the context of mental health diagnosis. The present study represented an initial feasibility demonstration of whether a simplified Bayesian approach offered comparative advantages in interview accuracy and efficiency against a standard assessment procedure. Two different diagnostic algorithms were compared targeting three selected diagnoses: generalized anxiety disorder (GAD), major depressive disorder (MDD), and social phobia (SP). The first algorithm was from a standard semi-structured diagnostic interview, and the second was from a dynamic system using diagnostic base rate information to select interview content. The dynamic algorithm reduced administration time and uniformly matched or improved accuracy over standard procedures. Preparation of this article was supported in part by National Institute of Mental Health Grant R03 MH60134, an award from the University of Hawai‘i Research Council, and awards from the Hawaii Departments of Health and Education to the first author.
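
The general idea of letting base rates drive a dynamic interview can be sketched with Bayes' rule. The code below is a minimal illustration and not the algorithm evaluated in the study: it assumes each interview item has a known sensitivity and specificity for the diagnosis, starts from the diagnostic base rate as the prior, and stops asking once the posterior crosses a rule-out or rule-in threshold. All names, thresholds, and numbers are hypothetical.

```python
def update(prior, answered_yes, sensitivity, specificity):
    """Posterior probability of the diagnosis after one yes/no item (Bayes' rule)."""
    if answered_yes:
        like_dx, like_no_dx = sensitivity, 1 - specificity
    else:
        like_dx, like_no_dx = 1 - sensitivity, specificity
    numerator = like_dx * prior
    return numerator / (numerator + like_no_dx * (1 - prior))

def run_interview(base_rate, items, answers, rule_out=0.02, rule_in=0.90):
    """Ask items in order, stopping early once the posterior is decisive.

    items   : list of (sensitivity, specificity) pairs (hypothetical values)
    answers : list of booleans, True for an endorsed symptom
    Returns the final posterior and the number of items actually asked.
    """
    posterior = base_rate
    asked = 0
    for (sensitivity, specificity), answer in zip(items, answers):
        if posterior <= rule_out or posterior >= rule_in:
            break  # further questions for this diagnosis are skipped
        posterior = update(posterior, answer, sensitivity, specificity)
        asked += 1
    return posterior, asked

# With a 5% base rate, one denied screening item is enough to rule out.
posterior, asked = run_interview(0.05, [(0.9, 0.8)] * 3, [False, False, False])
```

In this sketch the saving in administration time comes from the early stop: a low base rate combined with a single denied item already pushes the posterior below the rule-out threshold, so the remaining items are never asked.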

In this paper, we propose a Bayesian framework for estimating finite mixtures of the LISREL model. The basic idea in our analysis is to augment the observed data of the manifest variables with the latent variables and the allocation variables. The Gibbs sampler is implemented to obtain the Bayesian solution. Other associated statistical inferences, such as the direct estimation of the latent variables, establishment of a goodness-of-fit assessment for a posited model, Bayesian classification, residual and outlier analyses, are discussed. The methodology is illustrated with a simulation study and a real example. This research was supported by a Hong Kong UGC Earmarked grant CUHK 4026/97H. The authors are indebted to the Editor, the Associate Editor, and three anonymous reviewers for constructive comments in improving the paper, and also to ICPSR and the relevant funding agency for allowing the use of the data. The assistance of Michael K.H. Leung and Esther L.S. Tam is gratefully acknowledged.

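Diagnostic reasoning in drug-induced liver injury
Drug-induced liver injury (DILI) refers to liver damage caused by drugs or their metabolites. There is no gold standard for diagnosing DILI; the diagnostic criteria most commonly used internationally each assign quantitative scores on several different aspects and reach a diagnosis from the resulting score. This article reviews the mechanisms, diagnostic criteria, clinical classification, and pathological types of DILI, and summarizes the clinical diagnostic reasoning and decision-making involved, as a reference for clinicians seeking to improve diagnostic accuracy.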

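詹沛达 (Zhan Peida). 《心理科学》, 2019, (1): 170-178
With the development of psychological and educational measurement research and advances in technology, computerized (large-scale) testing has attracted growing attention. This study aims to explore how response-time data can be used to help assess multidimensional latent abilities in computerized multidimensional tests, and to provide theoretical support, in terms of data-analysis methods, for monitoring the quality of compulsory education in China. Using data from the computerized mathematics tests of the 2012 and 2015 Programme for International Student Assessment (PISA) as an example, it proposes a joint response-and-time multidimensional Rasch model that uses response-time and response-accuracy data simultaneously. Analysis of the PISA data with the new model indicates that incorporating response-time data not only improves the precision of model parameter estimates but also allows analysts to use examinees' response-time information for further decisions and interventions (e.g., diagnosing aberrant response behavior or prerequisite knowledge).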

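Test reliability is an important indicator of test quality, and it deserves equal attention in cognitive diagnostic assessment. Existing methods for computing reliability in cognitive diagnosis all rest on one assumption: an examinee's posterior probability distribution and marginal probabilities are exactly the same on two administrations of the test. This assumption is too strong, because it ignores the random error between the two administrations. Based on bootstrap sampling, this study proposes two types of indices for attribute reliability and pattern reliability: the product-moment correlation method and the modified consistency method. A simulation study compares the new methods with existing ones under different numbers of attributes, correlations among attributes, and numbers of items, and empirical data from the ECPE English proficiency certification examination and from a fraction subtraction test are used to verify the feasibility of the new methods. Finally, factors that affect reliability estimation are discussed.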

There are a growing number of item response theory (IRT) studies that calibrate different patient-reported outcome (PRO) measures, such as anxiety, depression, physical function, and pain, on common, instrument-independent metrics. In the case of depression, it has been reported that there are considerable mean score differences when scoring on a common metric from different, previously linked instruments. Ideally, those estimates should be the same. We investigated to what extent those differences are influenced by different scoring methods that take into account several levels of uncertainty, such as measurement error (through plausible value imputation) and item parameter uncertainty (through full Bayesian IRT modeling). Depression estimates from different instruments were more similar, and their corresponding confidence/credible intervals were larger when plausible value imputation or Bayesian modeling was used, compared to the direct use of expected a posteriori (EAP) estimates. Furthermore, we explored the use of Bayesian IRT models to update item parameters based on newly collected data.

In this study, we introduce an interval estimation approach based on Bayesian structural equation modeling to evaluate factorial invariance. For each tested parameter, the size of noninvariance with an uncertainty interval (i.e., highest density interval [HDI]) is assessed via Bayesian parameter estimation. By comparing the most credible values (i.e., 95% HDI) with a region of practical equivalence (ROPE), the Bayesian approach allows researchers to (1) support the null hypothesis of practical invariance, and (2) examine the practical importance of the noninvariant parameter. Compared to the traditional likelihood ratio test, simulation results suggested that the proposed Bayesian approach could offer additional insight into evaluating factorial invariance, thus leading to more informative conclusions. We provide an empirical example to demonstrate the procedures necessary to implement the proposed method in applied research. The importance of and influences on the choice of an appropriate ROPE are discussed.
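
The comparison of a 95% HDI with a ROPE can be sketched in a few lines. This is an illustrative sketch rather than the authors' procedure: it assumes posterior draws of a between-group parameter difference are already available as a list, computes the HDI as the narrowest interval covering the requested share of draws, and applies the commonly used rule that an HDI entirely inside the ROPE supports practical invariance. The ROPE limits of ±0.1 and the decision labels are hypothetical.

```python
import math

def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the posterior draws."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of that size over the sorted draws and keep the narrowest.
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]

def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """Classify a parameter difference by where its HDI falls relative to the ROPE."""
    low, high = hdi(samples, mass)
    if rope[0] <= low and high <= rope[1]:
        return "practically invariant"   # HDI entirely inside the ROPE
    if high < rope[0] or low > rope[1]:
        return "noninvariant"            # HDI entirely outside the ROPE
    return "undecided"                   # HDI overlaps a ROPE boundary

# Evenly spaced stand-in draws between -0.05 and 0.05 (not real posterior output).
draws = [i / 1000 for i in range(-50, 51)]
decision = rope_decision(draws)
```

The choice of ROPE limits determines the outcome directly, which is why the abstract stresses that the ROPE must be chosen with care.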

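The Q-matrix is the foundation and a core element of cognitive diagnostic assessment; it reflects the construct and content design of a test and directly affects the quality of diagnostic classification. Using Monte Carlo simulation, this article examines how different Q-matrix designs affect cognitive diagnosis under six attribute hierarchy structures. The mean and standard deviation of the pattern correct classification rate are used to evaluate diagnosis in terms of classification accuracy and stability, respectively. The results show that: (1) under the different attribute hierarchies, classification accuracy increases with test length, but a "ceiling effect" appears once the test reaches a certain length; (2) the number of R* in the Q-matrix (NR*) affects both the classification accuracy and the stability of the test: the larger NR* is, the higher the classification stability, and accuracy is highest when the test length is an integer multiple of the number of attributes and NR* is the largest odd multiple of the test length relative to the number of attributes; (3) the number of attributes measured by the items other than R* affects classification accuracy and stability differently depending on the attribute hierarchy. Based on these results, the study offers some suggestions for optimizing Q-matrix design in diagnostic assessment.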

Pursuing the line of the difference models in IRT (Thissen & Steinberg, 1986), this article proposed a new cognitive diagnostic model for graded/polytomous data based on the deterministic input, noisy "and" gate model (Haertel, 1989; Junker & Sijtsma, 2001), which is named the DINA model for graded data (DINA-GD). We investigated the performance of a full Bayesian estimation of the proposed model. In the simulation, the classification accuracy and item recovery for the DINA-GD model were investigated. The results indicated that the proposed model achieved acceptable rates of correct attribute classification for examinees and acceptable item parameter recovery. In addition, a real-data example was used to illustrate the application of this new model to graded data or polytomously scored items.
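
For orientation, the standard dichotomous DINA model that the DINA-GD extends can be written in a few lines. The sketch below covers only that well-known binary case, with a slip and a guessing parameter per item; the abstract does not give the graded-data formulation, so no attempt is made to reproduce it here. The function name and example values are illustrative.

```python
def dina_prob_correct(alpha, q_row, slip, guess):
    """P(correct response) under the dichotomous DINA model.

    alpha : examinee attribute profile, e.g. [1, 0, 1] (1 = mastered)
    q_row : the item's row of the Q-matrix (1 = attribute required)
    slip  : probability of an incorrect answer despite mastering all required attributes
    guess : probability of a correct answer while lacking a required attribute
    """
    # The "and" gate: every attribute the item requires must be mastered.
    has_all_required = all(a == 1 for a, needed in zip(alpha, q_row) if needed == 1)
    return 1 - slip if has_all_required else guess

# An item requiring the first two of three attributes (hypothetical values).
p_master = dina_prob_correct([1, 1, 0], [1, 1, 0], slip=0.1, guess=0.2)
p_nonmaster = dina_prob_correct([1, 0, 0], [1, 1, 0], slip=0.1, guess=0.2)
```

The model is conjunctive: lacking any single required attribute drops the success probability to the guessing level, no matter how many other attributes are mastered.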

The objective of the present article is to explore differences and similarities between cognitive diagnostic assessment (CDA) and evidence-centered game design (ECgD) in the service of intentional hybridization. Although some testing specialists might argue that both are essentially the same given their origins in principled assessment design and equivalency of measurement models, this view misses differences in their focus and operationalization. Given the strengths of both CDA and ECgD, there is motivation to consider ways in which each can deliberately inform the other. The intentional hybridization of CDA and ECgD has, at least in principle, significant advantages to produce a stronger offspring than either parent alone. The article includes four sections: (1) conceptual differences between CDA and ECgD, (2) conceptual similarities between CDA and ECgD, (3) challenges with CDA and ECgD, including narrowness of cognitive models, fidelity with learning, ocean of data, sensitivity to diverse learners, reliance on multidimensional psychometric models, and how hybridization may help, and (4) implications for educational assessment in the twenty-first century around the globe.

