Similar literature
1.
Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel regression structure which allows the incorporation of covariates to explain the variance in speed and accuracy between individuals and groups of test takers. A Bayesian approach with Markov chain Monte Carlo (MCMC) computation enables straightforward estimation of all model parameters. Model-specific implementations of a Bayes factor (BF) and deviance information criterion (DIC) for model selection are proposed, which are easily calculated as byproducts of the MCMC computation. Results from both simulation studies and real-data examples are given to illustrate several novel analyses possible with this modeling framework. The authors thank Steven Wise, James Madison University, and Pere Joan Ferrando, Universitat Rovira i Virgili, for generously making available their data sets for the empirical examples in this paper.

2.
The purpose of this study is to explore patterns in model-data fit related to subgroups of test takers from a large-scale writing assessment. Using data from the SAT, a calibration group was randomly selected to represent test takers who reported that English was their best language (EBL) from the total population of test takers (N = 322,011). A reference scale for the items was constructed based on EBL responses. Response behaviors of test takers who reported that English was not their best language (ENBL) were examined in relationship to this reference scale. This study illustrates the use of differential subgroup analyses to identify patterns related to person misfit within subgroups, as well as subsets of items, that may affect the validity of writing scores for ENBL test takers. The methodology described here offers an approach that can be used to explore, understand, and improve the validity of scores obtained from ENBL test takers in large-scale writing assessments.

3.
《人类行为》2013,26(2):157-178
The aim of this study was to estimate the impact of distraction on standardized test performance. The distraction investigated here was from fellow examinees who were taking a speaking test. Study participants were volunteers (N = 171) who had previously taken the Graduate Management Admission Test (GMAT), the Graduate Record Examinations (GRE) General Test, or the Test of English as a Foreign Language (TOEFL). They were invited to retake a different form of the same test under either distracting conditions or standard, distraction-free conditions. Test takers expressed strong negative perceptions about the distraction caused by fellow test takers. The impact on actual test performance, however, was slight in the GMAT sample and negligible in both the GRE and TOEFL samples. Moreover, the influence of distraction was no greater than that associated with other common, undesirable influences.

4.
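Reviewable cognitive diagnostic computerized adaptive testing (RCD-CAT), which allows answers to be revised, helps diagnose examinees' knowledge states more accurately. The item pocket (IP) method gives examinees the opportunity to set items aside and revise their answers, and the modified item pocket (MIP) method rescores the items revised within the pocket. A simulation study compared the performance of IP, MIP, stocking Ⅰ, and stocking Ⅱ in RCD-CAT. The results showed that the stocking designs performed best, with stocking Ⅱ slightly better than stocking Ⅰ; that the IP and MIP methods had lower classification accuracy than traditional CD-CAT; and that the stocking designs show good prospects for application in RCD-CAT.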

5.
John Ross 《Psychometrika》1964,29(1):67-73
The difference in factor structure resulting from the factorization of correlations, covariances, and cross products is discussed. Factoring cross products has the advantage of retaining information on both means and variances; this method of factoring learning data is recommended. The conditions under which factoring covariances and cross products lead to the same essential structure are investigated.
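The relation between the three matrices can be checked numerically. The sketch below is not taken from the paper: the data are made up, the helper name `moment_matrices` is hypothetical, and the covariance matrix is assumed to use the 1/n divisor. Under those assumptions it illustrates that the mean cross-product matrix equals the covariance matrix plus the outer product of the means, which is why cross products keep the mean information that covariances and correlations discard.

```python
import math
import random

def moment_matrices(rows):
    """Return (means, cross_products, covariances, correlations) for n x p data.

    Assumption: all matrices use the 1/n divisor.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # Mean cross products of the raw scores (keeps means and variances).
    cross = [[sum(r[j] * r[k] for r in rows) / n for k in range(p)]
             for j in range(p)]
    # Covariances of the centered scores (means removed).
    cov = [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / n
            for k in range(p)] for j in range(p)]
    # Correlations (means and variances both removed).
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    corr = [[cov[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]
    return means, cross, cov, corr

# Made-up data: three variables with different means and spreads.
random.seed(0)
rows = [[random.gauss(2, 1), random.gauss(5, 2), random.gauss(8, 3)]
        for _ in range(200)]
means, cross, cov, corr = moment_matrices(rows)

# Largest deviation from: cross = cov + outer(means, means).
max_gap = max(abs(cross[j][k] - (cov[j][k] + means[j] * means[k]))
              for j in range(3) for k in range(3))
```

When all means are zero, the outer-product term vanishes and the cross-product and covariance matrices coincide.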

6.
Meng Xiangbin 《心理科学》 (Psychological Science) 2016,39(3):727-734
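In recent years, modeling item response time data has become one of the most active topics in psychological and educational measurement. To address the shortcomings of the lognormal model and the Box-Cox normal model for response times, this paper builds a log-linear model for response times based on the skew-normal distribution within van der Linden's hierarchical framework, and gives a Markov chain Monte Carlo (MCMC) algorithm for estimating the model parameters. Results from a simulation study and a real-data analysis both show that, compared with the lognormal and Box-Cox normal models, the log-skew-normal model fits better and is more flexible and more widely applicable.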
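For orientation, the baseline lognormal response time model that the abstract compares against can be simulated in a few lines. This is a minimal sketch of that baseline only, not of the paper's skew-normal extension or its MCMC algorithm. It assumes the usual parameterization in which the log response time equals an item time-intensity `beta` minus a person speed `tau`, plus normal noise with standard deviation `1 / alpha`. The function name and the parameter values are made up for illustration.

```python
import math
import random

def simulate_log_times(beta, alpha, tau, n, rng):
    """Draw n log response times: ln T = beta - tau + e, e ~ N(0, 1/alpha^2).

    beta  : item time intensity (hypothetical value)
    alpha : item time discrimination; larger alpha means less noise
    tau   : person speed; faster test takers have larger tau
    """
    return [beta - tau + rng.gauss(0.0, 1.0 / alpha) for _ in range(n)]

rng = random.Random(42)
log_times = simulate_log_times(beta=1.0, alpha=2.0, tau=0.3, n=5000, rng=rng)

# Sample mean and standard deviation on the log scale.
mean_log = sum(log_times) / len(log_times)
sd_log = math.sqrt(sum((x - mean_log) ** 2 for x in log_times) / len(log_times))

# Response times on the original scale are always positive.
times = [math.exp(x) for x in log_times]
```

With these values the log times should centre near `beta - tau = 0.7` with a spread near `1 / alpha = 0.5`. A skew-normal error term, as the abstract describes, would relax the symmetry of the noise on the log scale.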

7.

Individuals are known to categorize others into social groups based on cues like race and gender and to experience relative discomfort when interacting with “outgroup” members. Two experimental studies were used to examine whether actor demographic cues in situational judgment assessment items completed by test takers in a simulated employee selection context may lead to differences in their performance and reactions to the hiring organization. In both studies, test takers assumed the perspective of actors shown in video-based scenarios and indicated how they would respond to interaction partners (IPs) to whom they were racially similar or dissimilar. In Study 1, a given test taker responded to IPs of a constant gender; in Study 2, IPs’ gender varied across scenarios within each condition. In Study 1, Black test takers spent more time and scored better on two of the four scenarios when responding to racially similar IPs. These effects were not found in Study 2, but demographic cues showed new interactive effects on performance and reactions. We discuss the implications of different findings across the two studies.


8.
For detecting differential item functioning (DIF) between two or more groups of test takers in the Rasch model, their item parameters need to be placed on the same scale. Typically this is done by means of choosing a set of so-called anchor items based on statistical tests or heuristics. Here the authors suggest an alternative strategy: By means of an inequality criterion from economics, the Gini Index, the item parameters are shifted to an optimal position where the item parameter estimates of the groups best overlap. Several toy examples, extensive simulation studies, and two empirical application examples are presented to illustrate the properties of the Gini Index as an anchor point selection criterion and compare its properties to those of the criterion used in the alignment approach of Asparouhov and Muthén. In particular, the authors show that, in addition to the globally optimal position for the anchor point, the criterion plot contains valuable additional information and may help discover unaccounted DIF-inducing multidimensionality. They further provide mathematical results that enable an efficient sparse grid optimization and make it feasible to extend the approach, for example, to multiple group scenarios.
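The idea of shifting one group's item parameters until they best overlap with the other group's can be sketched with a toy grid search. This is an illustration under stated assumptions, not the authors' implementation. It assumes the criterion is the Gini index of the absolute between-group differences in item parameters, and that larger values are better because they mean most items agree and only a few differ. The function names, the item parameters, and the plain (non-sparse) grid are all hypothetical.

```python
def gini(values):
    """Gini index of non-negative values (0 = all equal, (n-1)/n = maximal)."""
    v = sorted(values)
    n = len(v)
    total = sum(v)
    if total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(v, start=1)) / (n * total)

def best_shift(beta_ref, beta_foc, grid):
    """Return the shift c in grid that maximizes the Gini index of
    |beta_ref - (beta_foc + c)| across items."""
    def criterion(c):
        return gini([abs(r - (f + c)) for r, f in zip(beta_ref, beta_foc)])
    return max(grid, key=criterion)

# Toy example: the focal group's scale is offset by 0.3,
# and only the last item shows DIF.
beta_ref = [-1.0, -0.5, 0.0, 0.5, 1.0]
beta_foc = [-1.3, -0.8, -0.3, 0.2, 1.5]
grid = [i / 100 for i in range(-100, 101)]
c_hat = best_shift(beta_ref, beta_foc, grid)
```

In this toy case the search recovers the shift of 0.3, at which four items line up and the single DIF item stands out. Plotting the criterion over the whole grid, rather than keeping only the maximum, corresponds to the criterion plot mentioned in the abstract.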

9.
Tu Dongbo  Cai Yan  Dai Haiqi 《心理科学》 (Psychological Science) 2013,36(2):469-474
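Cognitive diagnostic computerized adaptive testing (CD_CAT) combines the basic theory and methods of cognitive diagnosis with computerized adaptive testing and is a new area in modern measurement. Item selection strategies in computerized adaptive testing (CAT) have long attracted attention from researchers in China and abroad, but item selection strategies for CD_CAT have rarely been reported, and methods for choosing the initial items in CD_CAT have been studied even less. This study used computer simulation to examine five item selection strategies and two initial-item selection methods for CD_CAT under the HO-DINA model. The results show that both the initial-item selection method and the item selection strategy affect the accuracy of examinee diagnosis and the precision of ability estimation. Overall, of the two initial-item selection methods, the "T-matrix method" proposed in this study outperforms the traditional random method; of the five item selection strategies, SL_GDI performs best; and among the combinations, the "T-matrix method" paired with SL_GDI performs best.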

10.
We evaluated the fit of Morey's (1991) proposed 4-factor structure on Personality Assessment Inventory-Borderline Features Scale (PAI-BOR; Morey, 1991) items in a sample of approximately 5,000 nonclinical participants. The proposed model did not fit the data well. Results from a series of exploratory and confirmatory factor analyses suggested that a 6-factor model provided the best fit to the PAI-BOR item covariances.

11.
Conditional Covariance Theory and Detect for Polytomous Items
This paper extends the theory of conditional covariances to polytomous items. It has been proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, given an appropriately chosen composite is positive if, and only if, the two items measure similar constructs besides the composite. The theory provides a theoretical foundation for dimensionality assessment procedures based on conditional covariances or correlations, such as DETECT and DIMTEST, so that the performance of these procedures is theoretically justified when applied to response data with polytomous items. Various estimators of conditional covariances are constructed, and special attention is paid to the case of complex sampling data, such as those from the National Assessment of Educational Progress (NAEP). As such, the new version of DETECT can be applied to response data sets not only with polytomous items but also with missing values, either by design or at random. DETECT is then applied to analyze the dimensional structure of the 2002 NAEP reading samples of grades 4 and 8. The DETECT results show that the substantive test structure based on the purposes for reading is consistent with the statistical dimensional structure for either grade. This research was supported by the Educational Testing Service and the National Assessment of Educational Progress (Grant R902F980001), US Department of Education. The opinions expressed herein are solely those of the author and do not necessarily represent those of the Educational Testing Service. The author would like to thank Ting Lu, Paul Holland, Shelby Haberman, and Feng Yu for their comments and suggestions. Requests for reprints should be sent to Jinming Zhang, Educational Testing Service, MS 02-T, Rosedale Road, Princeton, NJ 08541, USA. E-mail: jzhang@ets.org

12.
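MST combines the advantages of paper-and-pencil tests and CAT and is currently used in many large-scale examinations in the United States. Drawing on ideas from MST, cognitive diagnosis, CD-CAT, and OMST, this paper studies the feasibility of CD-MST. CD-MST offers both cognitive diagnosis and adaptivity and can give examinees immediate, accurate, and rich diagnostic information using relatively few items. It also computes quickly, allows examinees to go back to review and revise their answers, better matches real testing situations, and makes test assembly easier to control. This study examined how item selection strategy and item bank quality affect CD-MST under different test designs and compared the results with CD-CAT. The simulation study found that the MPWKL, GDI, and SHE item selection strategies are also suitable for CD-MST, and that with a high-quality item bank their classification accuracy matches that of CD-CAT. Testing time for CD-MST is more than two-thirds shorter than for CD-CAT.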

13.
Previous research has found that test takers can score above chance level on reading comprehension tests even when the passages are omitted. The present research investigated whether the effect would vary as a function of race. In Study 1, 386 participants completed a reading test with the passages omitted. General mental ability and race were significantly correlated with test performance. In Study 2, 827 job applicants completed the test as part of an entry-level selection battery. Eliminating items for which large race differences existed in Study 1 had no real effect on the size of the Black-White mean difference.

14.
This study examined the psychometric properties of the Revised Illness Perception Questionnaire adapted for a clinical sample of low-income Latinos suffering from depression. Participants (N = 339) were recruited from public primary care centers. Their average age was 49.73 years and the majority were foreign-born females of either Mexican or Central American descent. Confirmatory factor analysis was used to test the factor structure of this measure. Construct and discriminant validity and internal consistency were evaluated. After the elimination of three items because of low factor loadings (< .40) and the specification of seven error covariances, a revised model composed of 24 items had adequate goodness-of-fit indices and factor loadings, supporting construct validity. Each of the subscales reported satisfactory internal consistency. Intercorrelations between the 5 illness perception factors provided initial support for the discriminant validity of these factors in the context of depression. The establishment of the psychometric properties of this adapted measure will pave the way for future studies examining the role illness perceptions play in the help seeking and management of depression among Latinos.

15.
In this study, eCAT-Listening, a new computerized adaptive test for the evaluation of English Listening, is described. Item bank development, anchor design for data collection, and the study of the psychometric properties of the item bank and the adaptive test are described. The calibration sample comprised 1,576 participants. The results provide good psychometric guarantees: the bank is unidimensional, the items are satisfactorily fitted to the 3-parameter logistic model, and an accurate estimation of the trait level is obtained. As validity evidence, a high correlation was obtained between the estimated trait level and a latent factor made up of the diverse criteria selected. The analysis of the trait level estimation by means of a simulation led us to fix the test length at 20 items, with a maximum exposure rate of .40.

16.
The Beck Depression Inventory-II (BDI-II) is a frequently used scale for measuring depressive severity. BDI-II data (404 clinical; 695 nonclinical adults) were analyzed by means of confirmatory factor analysis to test whether the factor structure model with a somatic-affective and cognitive component of depression, formulated by Beck and colleagues, has a good fit. We also evaluated 10 alternative models. The fit of Beck's model was not good for all criteria. Three of the alternative models had a better fit in both samples, but none of these met all criteria for good fit. Of the alternatives with a better fit, we selected the only model with unidimensional subscales, which assesses a somatic, affective, and cognitive dimension. For this model, which we recommend, as well as for Beck's original model, a good fitting structure containing 15 and 16 items was developed with an item-deletion algorithm.

17.
In classical test theory, a high-reliability test always leads to a precise measurement. However, when it comes to the prediction of test scores, it is not necessarily so. Based on a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the predicted means, variances, and covariances of predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. Regarding a new subject taking a new test, in this study, higher test reliability led to a large variance when the sample size was smaller than half the number of tests. Classical test theory is reanalyzed from the viewpoint of predictions and some suggestions are made.

18.
The purpose of the present study was to test the factorial and discriminant validity of the Revised Illness Perception Questionnaire (IPQ-R), a measure of illness representations based on Leventhal, Meyer and Nerenz's Self-Regulation Theory, in a cervical screening context using confirmatory factor analysis. Six hundred and sixty women, who had attended a colposcopy clinic and were invited to re-attend, completed the IPQ-R. Data were analysed using covariance structure analysis. The adequacy of an a priori confirmatory factor analytic model that included seven dimensions of the cognitive illness representation: identity, timeline-acute/chronic, serious consequences, personal control, treatment control, illness coherence, and causal attributions, and one emotional representation factor was tested against the observed data. After the elimination of two items responsible for large standardised residuals and with low factor loadings, the model adequately accounted for covariances among the IPQ-R items according to multiple criteria for goodness-of-fit. Factor inter-correlations supported the discriminant validity of the constructs and the factors exhibited satisfactory composite reliability. A theoretically predictable pattern of relationships among the representation dimensions was evident. In particular, the control-related constructs and the illness coherence dimension were negatively related to other illness representation constructs. The present study provided confirmatory evidence using a robust hypothesis-testing framework to support the proposed structure of the illness representation dimensions in a cervical screening context.

19.
Genetic and environmental influences on academic achievement were investigated in four groups of siblings: (1) White full siblings, (2) White half-siblings, (3) Black full siblings, and (4) Black half-siblings. Our expectation was that the variances and covariances among three achievement tests would have the same structure across the four groups. This expectation was confirmed by a quantitative genetic model that imposed equal factor loadings across groups. This best fitting model had two factors: a Genetic factor representing genetic variation and a Shared Environment factor representing environmental differences among families. Reading recognition, reading comprehension, and mathematics tests all loaded on the Genetic factor, but primarily mathematics loaded on the Shared Environment factor. The quantitative genetic model was next fit to the achievement test means. Its successful fit suggested that the genetic and environmental influences involved in producing individual variation were the same as those producing the group-mean differences. In this sample, genes accounted for 66% to 74% of the observed group difference in verbal achievement and 36% of the difference in mathematics achievement. Shared environment accounted for the remainder, 34% to 26% of the difference in verbal achievement and 64% of that in mathematics achievement.

20.
Mao Xiuzhen  Liu Huan  Tang Qian 《心理科学》 (Psychological Science) 2019,(1):187-193
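The bifactor model assumes that a test measures one general factor and several group factors, which matches the factor structure of many educational and psychological tests. The "dimension reduction" method, which reduces the multidimensional integrals in parameter estimation to a series of iterated two-dimensional integrals, is an important feature of the bifactor model. For computerized adaptive testing with polytomously scored items, this paper first derives the Fisher information under the bifactor graded response model, then derives how the "dimension reduction" method applies to item selection, and finally compares the D-optimality method, the posterior-weighted Fisher information D-optimality method (PDO), the posterior-weighted Kullback-Leibler method (PKL), the continuous entropy method (CEM), and the mutual information (MI) method in item banks with low, medium, and high bifactor patterns, in terms of the correlation, root mean square error, absolute bias, and Euclidean distance of the ability estimates. The simulation study shows that (1) the stronger the bifactor pattern, that is, the smaller the difference between the general-factor and group-factor discriminations of the items, the lower the estimation precision for the general factor, the higher the precision for the group factors, and the higher the overall precision of ability estimation; and (2) under the same experimental conditions, the continuous entropy method gives the highest measurement precision, the PKL method gives the lowest precision of ability estimation, and the other methods do not differ significantly in measurement precision.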
