Similar Articles
19 similar articles found (search time: 140 ms).
1.
Traditional scoring of forced-choice tests yields ipsative data, which cannot support conventional reliability and validity analyses, factor analysis, or analysis of variance. In recent years, researchers have proposed scoring models based on item response theory, such as the Thurstonian IRT model and the MUPP model, which avoid the drawbacks of ipsative data. The Thurstonian IRT model is convenient for parameter estimation and flexible in model specification, whereas the MUPP model is less extensible and its parameter estimation methods need improvement. On the other hand, researchers have already developed faking-resistant forced-choice tests based on the MUPP model, while the Thurstonian IRT model remains far from such applications. Moreover, the applicability and validity of both models await further empirical testing.
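For orientation, the Thurstonian IRT model referred to here typically models the binary outcome of a pairwise comparison between statements i and k, measuring traits a and b, as (a standard formulation in the literature, not quoted from this abstract; notation assumed):

$$\Pr(y_{ik}=1 \mid \boldsymbol{\eta}) = \Phi\!\left(\frac{-\gamma_{ik} + \lambda_i \eta_a - \lambda_k \eta_b}{\sqrt{\psi_i^2 + \psi_k^2}}\right)$$

where $\gamma_{ik}$ is a pair threshold, the $\lambda$ are statement loadings, and the $\psi^2$ are the uniquenesses of the statements' latent utilities.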

2.
Methods for controlling faking in personality tests   Total citations: 2 (self: 0, others: 2)
Test takers can easily fake on personality tests, which seriously undermines their validity. Assessment experts have proposed a number of countermeasures, which fall into two broad classes: prevention techniques applied before testing and detection techniques applied afterwards. The former include forced-choice scales, warnings, and the bogus pipeline technique; the latter include faking-detection scales and detection based on IRT and on response times. At present, embedding faking-detection scales within personality tests and adding warnings to the test instructions are two relatively effective methods, and the continued development of forced-choice scales is also promising. Because little is known about the internal mechanisms by which faking occurs, the development of IRT-based and response-time-based detection techniques remains constrained.

3.
Traditional forced-choice scale scores are ipsative. The recently proposed Thurstonian IRT model provides a mathematical model of examinees' responses to forced-choice scales and measures trait levels more precisely. This study developed a forced-choice neuroticism scale and administered it, together with a commonly used Likert neuroticism scale, under three conditions: no pressure, simulated job application, and actual job application. The empirical forced-choice data fit the Thurstonian IRT model well; the trait scores estimated by the model did not have ipsative properties and resisted faking better than traditional scoring did. Under either scoring method, the forced-choice scale resisted faking better than the Likert scale.

4.
Development of a computerized adaptive jigsaw-puzzle test for conscription candidates   Total citations: 1 (self: 0, others: 1)
Based on a literature review and reference materials from foreign militaries, an item bank was constructed according to item response theory and theories of spatial ability testing. A paper-and-pencil pilot study first examined the feasibility of building a CAT jigsaw-puzzle test under IRT. The items were then revised and the item pool expanded, and a computer-assisted test was developed. Using the three-parameter logistic model and an anchor-item equating design, seven different test forms were administered to 55,777 conscription candidates during the national conscription psychological screening. Based on the results, the items were analyzed and high-quality items were selected to form the CAT item bank; a-stratified sampling was used to control item exposure, and different test-termination rules were implemented in the CAT jigsaw-puzzle test. Finally, the test's validity was examined with 72 participants, using the WAIS Block Design subtest and examination scores in three school subjects as criteria. The results showed that the test satisfies the assumptions of the three-parameter logistic model, the item parameters are satisfactory, and the test has good reliability and validity, making it suitable for use in the psychological selection of conscription candidates.
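For reference, the three-parameter logistic model used above has the standard form (notation assumed, not quoted from the abstract):

$$P(X_{ij}=1 \mid \theta_j) = c_i + (1-c_i)\,\frac{1}{1+\exp\left[-Da_i(\theta_j - b_i)\right]}$$

where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty, and pseudo-guessing parameters of item i, $\theta_j$ is the ability of examinee j, and $D \approx 1.702$ is the usual scaling constant.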

5.
Within the IRT framework, and drawing on well-known Chinese and international social anxiety scales, this paper constructs a social anxiety item bank and a corresponding computerized adaptive test (CAT-SA). The IRT analyses included tests of unidimensionality, item model-data fit, local independence, and DIF; items meeting the IRT requirements were selected to build the item bank and the CAT. Finally, the diagnostic performance, reliability, and validity of CAT-SA were evaluated. The results show that CAT-SA has good reliability and validity and can greatly reduce the number of items administered, lightening the testing burden. In short, the CAT-SA developed here provides a new technique and instrument for efficient, fast, and accurate measurement of social anxiety.
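As a minimal sketch of the adaptive logic behind a CAT such as this one, the following selects the next item by maximum Fisher information under a 2PL model (the item bank, parameter values, and function names are illustrative assumptions, not taken from CAT-SA):

```python
import numpy as np

def p_2pl(theta, a, b):
    """Response probability under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of each 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Return the index of the unadministered item with maximum information."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf  # mask items already given
    return int(np.argmax(info))

# Hypothetical five-item bank: discriminations a, difficulties b.
a = np.array([1.2, 0.8, 1.5, 1.0, 2.0])
b = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])
print(select_next_item(theta_hat=0.3, a=a, b=b, administered={2}))
```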

6.
Taking a cognitive ability test for children aged 4 to 5 years as an example, this study explored how to conduct measurement invariance analysis of longitudinal data within the IRT framework. The analysis used a between-item multidimensional IRT model and a within-item multidimensional two-tier model. Participants were 882 48-month-old children from across the country, and the instrument was a self-developed cognitive ability test for 4- to 5-year-olds. Test-level and item-level analyses showed that: (1) the proposed approach to measurement invariance analysis of longitudinal data is reasonable and effective; (2) the test satisfies partial measurement invariance across the two time points, and its latent structure is stable; (3) both the discrimination and difficulty parameters of the "orientation" item changed, and the difficulty parameters of four other items fluctuated; (4) children's cognitive ability showed rapid overall development between ages 4 and 5, with significant growth.

7.
Item response theory (IRT) models, which predict examinees' responses from item and person characteristics, are among the most widely used psychometric models. The effective use of IRT, however, depends on how well the chosen model matches the actual data (i.e., model-data fit, or goodness of fit). Only when the chosen IRT model fits the data well can the advantages and functions of IRT be realized (Orlando & Thissen, 2000). When the model fits poorly or the wrong model is chosen, parameter estimation, test equating, and differential item functioning analysis can all carry large errors (Kang, Cohen, & Sung, 2009), with adverse consequences in practice. IRT analyses should therefore begin by thoroughly examining whether the chosen model matches the data (McKinley & Mills, 1985). The model-data fit statistics commonly used in IRT can be described and compared from two angles, item fit and test fit. This is an important topic in psychological and educational measurement and an easily overlooked step in test analysis, yet no published review of this kind has appeared. Future research could compare these statistics empirically and extend them to the field of cognitive diagnosis.
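One representative item-fit statistic of the kind surveyed here is the chi-square comparing observed and model-predicted proportions correct across K ability groups (Yen's Q1 is shown as an example; the abstract itself does not single out a statistic):

$$Q_{1i} = \sum_{k=1}^{K} \frac{N_k \left(O_{ik} - E_{ik}\right)^2}{E_{ik}\left(1 - E_{ik}\right)}$$

where $N_k$ is the number of examinees in ability group k, and $O_{ik}$ and $E_{ik}$ are the observed and expected proportions answering item i correctly in that group.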

8.
Objective: To revise the Meier Art Judgment Test and examine its reliability and validity. Methods: The test was administered to 2,270 students from six universities and technical secondary schools. Items were screened using CTT discrimination indices and IRT model fit and discrimination. Criterion-related validity was examined against the Holland artistic subtest, students' self-rated level of artistic creation, and an artistic experience subscale, as well as with a known-groups comparison (art vs. non-art majors). Results: All 61 retained items fit the two-parameter logistic IRT model; scale scores correlated significantly with each criterion, and art and non-art majors differed significantly. Test information analysis, however, indicated relatively large measurement error for high-ability examinees. Conclusion: The revised scale measures individual art judgment ability; future improvement should focus on adding more difficult items.

9.
李金波  王权 《心理科学》2003,26(5):885-886
1 Introduction: Test reliability and validity are the two main indices of the quality of a test. Both are constrained by item difficulty, item discrimination, the ability distribution of the examinees, and other factors. Using the concept of the information function, IRT provides a concrete method for tuning test reliability through the item parameters, one of IRT's major contributions to psychological and educational measurement. For improving test validity, however, items are still chosen by experience, and no objective and effective method exists. Moreover, item difficulty and item discrimination are closely related and jointly affect test validity. Before studying the relation between item parameters and test validity, therefore, the relation between item difficulty and item discrimination should be studied first. 2 A simulation of the regression of discrimination on difficulty 2.1 …

10.
Subjective measurement criteria for implicit learning   Total citations: 4 (self: 1, others: 3)
郭秀艳  朱磊  邹庆宇 《心理科学》2005,28(5):1192-1195
For nearly 40 years, verifying the implicit nature of acquired knowledge has been a focus of implicit learning research. The three classic paradigms typically use free verbal report or forced-choice tests as explicit indices of knowledge, but various findings suggest that both have limitations. Dienes and colleagues therefore proposed subjective measures of implicit learning. Starting from the limitations of verbal reports and forced-choice tests, this article discusses why subjective measures are needed and their validity in assessing the conscious status of knowledge.

11.
Abstract

Differential item functioning (DIF) is a pernicious statistical issue that can mask true group differences on a target latent construct. A considerable amount of research has focused on evaluating methods for testing DIF, such as using likelihood ratio tests in item response theory (IRT). Most of this research has focused on the asymptotic properties of DIF testing, in part because many latent variable methods require large samples to obtain stable parameter estimates. Much less research has evaluated these methods in small sample sizes, despite the fact that many social and behavioral scientists frequently encounter small samples in practice. In this article, we examine the extent to which model complexity (the number of model parameters estimated simultaneously) affects the recovery of DIF in small samples. We compare three models that vary in complexity: logistic regression with sum scores, the 1-parameter logistic IRT model, and the 2-parameter logistic IRT model. We expected that logistic regression with sum scores and the 1-parameter logistic IRT model would more accurately estimate DIF, because these models yield more stable estimates despite being misspecified. Indeed, a simulation study and an empirical example of adolescent substance use show that, even when the data are generated from, and assumed to follow, a 2-parameter logistic IRT model, using parsimonious models in small samples leads to more powerful tests of DIF while adequately controlling the Type I error rate. We also provide evidence for the minimum sample sizes needed to detect DIF, and we evaluate whether applying corrections for multiple testing is advisable. Finally, we provide recommendations for applied researchers who conduct DIF analyses in small samples.
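A minimal sketch of the logistic-regression DIF test compared in this article, using sum scores as the matching variable (the simulated data and variable names are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 300                                    # small sample, as in the article
group = rng.integers(0, 2, n)              # 0 = reference, 1 = focal
theta = rng.normal(0, 1, n)
# Simulate a studied item with uniform DIF against the focal group.
p = 1 / (1 + np.exp(-(theta - 0.5 * group)))
y = rng.binomial(1, p)
sum_score = theta + rng.normal(0, 0.5, n)  # stand-in for sum-score matching

# Reduced model: item response explained by the matching score only.
X0 = sm.add_constant(np.column_stack([sum_score]))
m0 = sm.Logit(y, X0).fit(disp=0)
# Full model: adds group (uniform DIF) and group x score (non-uniform DIF).
X1 = sm.add_constant(np.column_stack([sum_score, group, sum_score * group]))
m1 = sm.Logit(y, X1).fit(disp=0)

lr = 2 * (m1.llf - m0.llf)                 # likelihood-ratio statistic, df = 2
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=2):.4f}")
```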

12.
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The objective was to provide bounds of the likely DIF effects on these measurement consequences. Five factors were manipulated: test length, percentage of DIF items per form, item type, sample size, and level of group ability difference. Results indicate that the greatest DIF effect was less than 2 points on the 0 to 60 total score scale and about 0.15 on the IRT ability scale. DIF had a limited effect on the ratio of true-score variance to observed-score variance, but its influence on the standard error of estimation for the IRT ability parameter was evident for certain ability values.

13.
It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used, and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random-item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. In correspondence with these issues, three parts are included. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and in terms of its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined; instead, a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.
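The fixed-person, random-item Rasch model studied here can be written as (a standard formulation; notation assumed):

$$\Pr(X_{pi}=1) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}, \qquad \beta_i \sim N(\mu_\beta, \sigma_\beta^2)$$

with fixed person parameters $\theta_p$ and item difficulties $\beta_i$ treated as draws from a common normal distribution.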

14.
刘红云  骆方 《心理学报》2008,40(1):92-100
The authors briefly introduce multilevel item response models, examine the relation between multilevel IRT and conventional IRT, derive the relation between the parameters of multilevel item response models and those of conventional item response models, and discuss extensions of the multilevel model. With an empirical example, a multilevel item response model is used to analyze the characteristics of test items, to test the effects of individual-level and group-level predictors on the ability parameter, and to analyze differential item functioning. Finally, the strengths and limitations of multilevel item response models are discussed.
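A common two-level formulation of the kind introduced here places a Rasch measurement model at level 1 and regresses ability on predictors at level 2 (a representative form, not necessarily the authors' exact specification):

$$\operatorname{logit}\Pr(X_{pij}=1) = \theta_{pj} - \beta_i, \qquad \theta_{pj} = \gamma_{00} + \gamma_{01} W_j + u_{0j} + r_{pj}$$

where $W_j$ is a group-level predictor, $u_{0j} \sim N(0, \tau^2)$ captures between-group ability variance, and $r_{pj} \sim N(0, \sigma^2)$ is the within-group residual.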

15.
A real-data simulation of computerized adaptive testing (CAT) is an important step in real-life CAT applications. Such a simulation allows CAT developers to evaluate important features of the CAT system, such as item selection and stopping rules, before live testing. SIMPOLYCAT, an SAS macro program, was created by the authors to conduct real-data CAT simulations based on polytomous item response theory (IRT) models. In SIMPOLYCAT, item responses can be input from an external file or generated internally on the basis of item parameters provided by users. The program allows users to choose among methods of setting initial θ, approaches to item selection, trait estimators, CAT stopping criteria, polytomous IRT models, and other CAT parameters. In addition, CAT simulation results can be saved easily and used for further study. The purpose of this article is to introduce SIMPOLYCAT, briefly describe the program algorithm and parameters, and provide examples of CAT simulations, using generated and real data. Visual comparisons of the results obtained from the CAT simulations are presented.

16.
Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random (MCAR) or at random (MAR). An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.
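A minimal sketch of the two-way mean substitution mentioned above, which imputes a missing score from the person, item, and overall means (the array and names are illustrative; dichotomous scores are often rounded to 0/1 afterwards):

```python
import numpy as np

def two_way_impute(X):
    """Impute missing entries as person mean + item mean - overall mean."""
    X = np.asarray(X, dtype=float)
    person_mean = np.nanmean(X, axis=1, keepdims=True)
    item_mean = np.nanmean(X, axis=0, keepdims=True)
    overall_mean = np.nanmean(X)
    imputed = person_mean + item_mean - overall_mean
    return np.where(np.isnan(X), imputed, X)

# Hypothetical 4 persons x 3 items with two missing responses.
X = np.array([[1, 0, 1],
              [np.nan, 1, 1],
              [0, 0, np.nan],
              [1, 1, 0]])
print(two_way_impute(X))
```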

17.
Constant latent odds-ratios models and the Mantel-Haenszel null hypothesis   Total citations: 1 (self: 0, others: 1)
In the present paper, a new family of item response theory (IRT) models for dichotomous item scores is proposed. Two basic assumptions define the most general model of this family. The first assumption is local independence of the item scores given a unidimensional latent trait. The second assumption is that the odds-ratios for all item-pairs are constant functions of the latent trait. Since the latter assumption is characteristic of the whole family, the models are called constant latent odds-ratios (CLORs) models. One nonparametric special case and three parametric special cases of the general CLORs model are shown to be generalizations of the one-parameter logistic Rasch model. For all CLORs models, the total score (the unweighted sum of the item scores) is shown to be a sufficient statistic for the latent trait. In addition, conditions under the general CLORs model are studied for the investigation of differential item functioning (DIF) by means of the Mantel-Haenszel procedure. This research was supported by the Dutch Organization for Scientific Research (NWO), grant number 400-20-026.
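For reference, the Mantel-Haenszel common odds ratio targeted by the null hypothesis in the title is usually estimated from the 2×2 tables at each total-score level k as (standard notation, assumed here):

$$\hat{\alpha}_{\mathrm{MH}} = \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k}$$

where $A_k$ and $B_k$ are the reference group's correct and incorrect counts, $C_k$ and $D_k$ the focal group's, and $N_k$ the table total; the Mantel-Haenszel null hypothesis of no DIF corresponds to $\alpha_{\mathrm{MH}} = 1$.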

18.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact.

19.
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.
