Similar Literature
 20 similar documents found (search time: 125 ms)
1.
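A preliminary study of the effects of task characteristics in three-dimensional mental rotation   Total citations: 7, self-citations: 2, citations by others: 5
Cai Huajian (蔡华俭), Yang Zhiliang (杨治良). 《心理科学》 (Journal of Psychological Science), 1998, 21(2): 153-158
In cognitive psychology, research on mental rotation has relied mainly on the reaction-time paradigm. This study attempts to go beyond that paradigm by combining psychometric and experimental methods. Using the item parameters of a modern measurement theory, item response theory (item difficulty, discrimination, and the guessing parameter), as indicators, it examines how several task characteristics of a mental rotation test affect the properties of its items. The results show that, for a mental rotation test built from Shepard's three-dimensional rotation materials, the complexity of the test materials, the angular difference between the standard and comparison figures, and the position in which the comparison figures are arranged have no significant effect on any of the item properties.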

2.
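Wei Zhichao (魏知超), Yang Jing (杨靖). 《心理科学》 (Journal of Psychological Science), 2006, 29(2): 401-405
This study developed a nonword repetition test for measuring children's phonological working memory and carried out preliminary reliability and validity checks and an item analysis with 48 fourth-grade primary school pupils. The results show that: (1) the test has high test-retest reliability; (2) the test has high construct validity and criterion-related validity; (3) the item difficulty distribution of subtest 2 is reasonable and most of its items discriminate well, whereas the item difficulty distribution and item discrimination of subtest 1 need to be improved in future research.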

3.
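A review of item selection strategies in computerized adaptive testing   Total citations: 2, self-citations: 0, citations by others: 2
Mao Xiuzhen (毛秀珍), Xin Tao (辛涛). 《心理科学进展》 (Advances in Psychological Science), 2011, 19(10): 1552-1562
Computerized adaptive testing (CAT) is a testing mode built on measurement theory and computer technology. It selects test items adaptively according to the examinee's responses. The item selection strategy is one of the key components of CAT and bears on measurement efficiency, test security, and test reliability and validity. The paper classifies and introduces the item selection strategies of traditional CAT and cognitive diagnostic CAT according to whether the CAT has non-statistical constraints. Future research should further improve the overall performance of item selection strategies and examine in depth the strategies for polytomously scored items and for cognitive diagnostic CAT.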

4.
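Li Jinbo (李金波), Wang Quan (王权). 《心理科学》 (Journal of Psychological Science), 2003, 26(5): 885-886
1 Introduction  Test reliability and validity are the two main parameters for judging the quality of test construction. Both are constrained by many factors, including item difficulty, item discrimination, and the ability distribution of the examinees. Using the concept of the information function, IRT offers a concrete method for adjusting test reliability through item parameters, which is a major contribution of IRT to psychological and educational measurement. For improving test validity, however, items are still selected by experience, and an objective and effective method is lacking. In addition, item difficulty and item discrimination are closely related and jointly affect test validity. Therefore, before studying the relationship between item parameters and test validity, the relationship between item difficulty and item discrimination should be studied first. 2 Simulation study of the regression of discrimination on difficulty  2.1 …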

5.
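A total of 235 figural reasoning items were written for this study. Using an anchor-test equating design, with 72 items from the Combined Raven's Test as anchor items, the test was administered to 1,733 males at ability levels ranging from junior high school to university. The data were analysed with BILOG-MG 3.0 (marginal maximum likelihood estimation) under the three-parameter Logistic model. After removing items that fitted the model poorly and items whose maximum information was below 0.3, a final bank of 181 items was established. The bank can be used to screen out conscription candidates of low intelligence.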
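The information-based screening step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the scaling constant D = 1.7, the theta grid, the helper names (`p_3pl`, `info_3pl`, `max_info`, `screen_items`) and the two example items are hypothetical and do not come from the study, which used BILOG-MG rather than hand-written code. The sketch uses the standard three-parameter Logistic item information function.

```python
import math

D = 1.7  # scaling constant; an assumption, the abstract does not state it


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter Logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c):
    """Standard 3PL item information at ability theta."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def max_info(a, b, c, lo=-4.0, hi=4.0, steps=801):
    """Approximate the maximum of the information function on a theta grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(info_3pl(t, a, b, c) for t in grid)


def screen_items(items, threshold=0.3):
    """Keep items whose maximum information is at least the threshold."""
    return [name for name, (a, b, c) in items.items()
            if max_info(a, b, c) >= threshold]


# Hypothetical item parameters (a, b, c), for illustration only.
items = {"weak": (0.4, 0.0, 0.2), "strong": (1.2, 0.5, 0.2)}
kept = screen_items(items)
```

With these made-up parameters the weakly discriminating item falls below the 0.3 threshold and is dropped, while the more discriminating item is retained.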

6.
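Xiao Wei (肖玮), Miao Danmin (苗丹民), Gong Jingjing (贡京京), Wu Shengjun (武圣君). 《心理科学》 (Journal of Psychological Science), 2007, 30(1): 139-141, 127
A digit search test for conscription was developed on the basis of information-processing speed theory and administered to 15735 conscription candidates nationwide and 190 new recruits. At the end of three months of recruit training, 228 superiors were surveyed about the intelligence-related job performance of 1900 soldiers. These data were analysed to determine the test procedure and the cutoff score, and reliability and validity were examined. The results show that: which digit is missing affects item difficulty; the cutoff score is at least 27 correct responses within 197 seconds; the internal consistency (α coefficient) of the test is 0.864; and the predictive agreement rate is 95.7%.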

7.
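This paper presents an item analysis of the Combined Raven's Test based on item response theory. The analysis covers the following aspects: item characteristic curves, item information functions, the test information function, item validity, and item bias. The results raise several issues worth discussing, for reference by those who construct, revise, and use the test.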

8.
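The four-parameter Logistic model can be used to analyse data affected by the "sleeping" phenomenon, in which high-ability examinees answer easy items incorrectly. The study took real data from psychological tests and achievement tests, fitted them with both the traditional models and the four-parameter Logistic model, and compared the fit indices and parameter estimates of the different models. The results show that the four-parameter Logistic model improves model fit, increases the accuracy of the estimates, and effectively corrects the underestimation of high-ability examinees' ability. The authors recommend using the four-parameter Logistic model for data analysis where necessary.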
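A minimal sketch of why an upper asymptote below 1 helps in this situation, assuming the usual four-parameter Logistic form P = c + (d - c) / (1 + exp(-D a (theta - b))). The parameter values, the scaling constant D = 1.7 and the function name `p_4pl` are illustrative assumptions, not values from the study.

```python
import math


def p_4pl(theta, a, b, c, d, D=1.7):
    """Probability of a correct response under the 4-parameter Logistic model.

    c is the lower asymptote (guessing); d is the upper asymptote.
    Setting d = 1.0 reduces the model to the 3-parameter Logistic model.
    """
    return c + (d - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical high-ability examinee on a hypothetical easy item.
theta = 3.0
a, b, c = 1.0, -1.0, 0.2

p3 = p_4pl(theta, a, b, c, d=1.0)   # 3PL: success is almost certain
p4 = p_4pl(theta, a, b, c, d=0.95)  # 4PL: success is capped near 0.95

# Probability the model assigns to an incorrect answer on this item.
miss_3pl = 1.0 - p3
miss_4pl = 1.0 - p4
```

Because the four-parameter model assigns a larger probability to a careless miss on an easy item, such a miss is less surprising in the likelihood and lowers the ability estimate less than it would under the three-parameter model.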

9.
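Forced-choice (FC) tests are widely used in non-cognitive assessment because they can control the response biases that come with the traditional Likert method. Their traditional scoring, however, produces ipsative data, which have long been criticised as unsuitable for comparisons between individuals. In recent years, the development of several forced-choice IRT models has allowed researchers to obtain near-normative data from forced-choice tests, which has renewed the interest of researchers and practitioners in these models. The paper first classifies and introduces six of the more mainstream forced-choice IRT models according to the decision model and the item response model they adopt. It then compares and summarises the models in terms of model construction and parameter estimation methods. Next, it reviews three areas of applied research: tests of parameter invariance, computerized adaptive testing (CAT), and validity studies. Finally, it suggests four directions for future research: model extension, tests of parameter invariance, forced-choice CAT, and validity studies.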

10.
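Jian Xiaozhu (简小珠), Dai Buyun (戴步云), Dai Haiqi (戴海琦). 《心理学报》 (Acta Psychologica Sinica), 2016, 48(12): 1625-1630
Item difficulty and the weight reflecting the importance of what an item assesses are two basic attributes of polytomously scored items, so they need to be represented by different parameters in the IRT item characteristic function. Earlier polytomous models describe the difficulty of a polytomously scored item with several difficulty parameters and cannot effectively express the weighting role of its scores. Starting from this score-weighting role, the paper proposes the Logistic weighted model and sets out the ideas behind its theoretical construction. The EM algorithm for item parameter estimation under the Logistic weighted model is derived and a corresponding estimation program is written. Simulated tests under the Logistic weighted model show good recovery of the item parameters.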

11.
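Just as different illnesses call for different medical techniques for diagnosis, different cognitive structures call for correspondingly designed test modes, so that a test delivers high-quality diagnostic assessment. Traditional test forms, however, do not consider the need for diagnosis targeted at different cognitive structures, so giving the same paper to every examinee is inefficient. Cognitive diagnostic computerized adaptive testing can administer different items to examinees with different cognitive structures, but the item bank that supports the adaptive process has not been designed with items targeted at those different structures, so the bank is used inefficiently. The key to solving these problems is to explore how to design a test mode that matches each cognitive structure. This study uses Monte Carlo simulation to examine test design modes for different cognitive structures under six attribute hierarchies. The results show that (1) under the same attribute hierarchy, the best test design mode differs across cognitive structures; (2) an item bank built according to the best test design modes for the different cognitive structures is used more efficiently. Test developers can use these results to optimise the test design mode for each cognitive structure and to guide item bank construction.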

12.
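Yang Xiangdong (杨向东). 《心理学报》 (Acta Psychologica Sinica), 2010, 42(7): 802-812
In automatic item generation, item parameters are predicted from the set of stimulus features used in cognitive item design, and their sources of uncertainty are more complex than those of parameters calibrated from empirical data. Through an empirical study, the article analyses how accurately ability is estimated under computerized adaptive testing when predicted item parameters are used for items of the Abstract Reasoning Test (ART) generated with the cognitive design system approach. The study shows that predicted item parameters are distributed closer to the centre than the corresponding calibrated parameters. This regression effect affects the size of the error in ability estimation and also leads to differences in item selection during adaptive testing. After controlling for the differences in item selection, the error in ability estimation is larger than that based on calibrated item parameters, but the difference is not pronounced. The two sets of ability estimates are highly correlated, and the differences between corresponding ability values are small across almost the entire range of the ability distribution.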

13.
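Estimating the item parameters of new (uncalibrated) items in computerized adaptive testing   Total citations: 1, self-citations: 1, citations by others: 0
The security of computerized adaptive testing (CAT) faces new challenges, and small item banks are especially at risk. How to build a large, high-quality item bank has therefore become a very important topic in CAT research. Current CAT item bank construction has several problems, such as high cost and poor confidentiality. In particular, equating techniques are complex, and repeated use of anchor items easily leads to item leakage. If items that have not yet been calibrated (new items) could be inserted during CAT administration and their parameters estimated at the same time, this would be of obvious value for building large, high-quality CAT item banks. This paper studies the problem under the 1PLM and the 2PLM, proposes a new method for online estimation of new items, and derives a formula for the initial iteration value of the discrimination parameter a. The results show that, in both the simulation study and the empirical study, the number of times a new item is answered affects the item parameter estimates, and the more examinees answer a new item, the more precise its parameter estimates are.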

14.

To further advance assessment of patient-reported outcomes, the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group has developed computerized adaptive test (CAT) versions of all EORTC Quality of Life Core Questionnaire (QLQ-C30) scales/items. The aim of this study was to develop and evaluate an item bank for CAT measurement of insomnia (CAT-SL). In line with the EORTC guidelines, the developmental process comprised four phases: (I) defining the concept insomnia and literature search, (II) selection and formulation of new items, (III) pre-testing and (IV) field-testing, including psychometric analyses of the final item bank. In phase I, the literature search identified 155 items that were compatible with our conceptualisation of insomnia, including both quantity and quality of sleep. In phase II, following a multistep approach, this number was reduced to 15 candidate items. Pre-testing of these items in cancer patients (phase III) resulted in an item list of 14 items, which were field-tested among 1094 patients in phase IV. Psychometric evaluations showed that eight items could be retained in a unidimensional model. The final item bank yielded greater measurement precision than the original QLQ-C30 insomnia item. It was estimated that administering two or more items from the insomnia item bank with CAT results in a saving in sample size between approximately 15–25%. The 8-item EORTC CAT-SL item bank facilitates precise and efficient measurement of insomnia as part of the EORTC CAT system of health-related quality of life assessment in both clinical research and practice.


15.
This paper describes several simulation studies that examine the effects of capitalization on chance in the selection of items and the ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000 and 2000 subjects) as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small sample calibration conditions. For broad ranges of theta, the overestimation of the precision (asymptotic Se) reaches levels of 40%, something that does not occur with the RMSE (theta). The problem is greater as the item bank size to test length ratio increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.

16.
In the first of three experiments, university undergraduates were presented a list of 300 words and 100 nonwords in two sessions. Their confidence that an item was a word was indicated for each item on a 6-point scale. This experiment demonstrated the feasibility of creating a recognition test of vocabulary. In Experiment II, 100 items were chosen to form a subtest, and the subtest was cross-validated on a new sample of subjects. The tests in Experiments I and II were scored using signal-detection measures. The primary criterion, SAT (verbal) scores, correlated approximately .60 with the test scores. In Experiment III subjects scaled the words and nonwords for four psychological attributes. These were submitted to a stepwise regression with the confidence ratings from Experiment I as the dependent variable. It was concluded that associability, frequency, orthography, and pronounceability all may be components of word recognition. However, only frequency was found to be a significant predictor of the confidence of recognition of nonwords.

17.
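Chen Ping (陈平), Xin Tao (辛涛). 《心理学报》 (Acta Psychologica Sinica), 2011, 43(6): 710-724
Item replenishment is essential to item bank maintenance in cognitive diagnostic computerized adaptive testing (CD-CAT). In traditional CAT, online calibration methods are often used to estimate the item parameters of new items. To date, however, no paper on online calibration in CD-CAT has been published. This study lays the analytical groundwork for extending three representative online calibration methods from traditional CAT (Method A, OEM, and MEM) to CD-CAT (CD-Method A, CD-OEM, and CD-MEM) and compares the three methods by simulation. The results show that CD-Method A recovers the item parameters better than the other two methods, and that an adaptive calibration design improves the quality of item parameter recovery compared with a random calibration design.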

18.
Multidimensional computerized adaptive testing (MCAT) has received increasing attention over the past few years in educational measurement. Like all other formats of CAT, item replenishment is an essential part of MCAT for its item bank maintenance and management, which governs retiring overexposed or obsolete items over time and replacing them with new ones. Moreover, calibration precision of the new items will directly affect the estimation accuracy of examinees' ability vectors. In unidimensional CAT (UCAT) and cognitive diagnostic CAT, online calibration techniques have been developed to effectively calibrate new items. However, there has been very little discussion of online calibration in MCAT in the literature. Thus, this paper proposes new online calibration methods for MCAT based upon some popular methods used in UCAT. Three representative methods, Method A, the 'one EM cycle' method and the 'multiple EM cycles' method, are generalized to MCAT. Three simulation studies were conducted to compare the three new methods by manipulating three factors (test length, item bank design, and level of correlation between coordinate dimensions). The results showed that all the new methods were able to recover the item parameters accurately, and the adaptive online calibration designs showed some improvements compared to the random design under most conditions.

19.
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Function System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

20.
The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item-exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive method (Revuelta & Ponsoda, 1998) and the proportional method (Segall, 2004a), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.
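The idea of blending a random component with Fisher information can be sketched as below. This is a simplified illustration in the spirit of the progressive method, not the procedure evaluated in the paper: the linear weight s = position / (test_length - 1), the two-parameter Logistic information function without a scaling constant, the function names and the four-item bank are all assumptions made for the example. The abstract itself studies several different weight functions, none of which are specified there.

```python
import math
import random


def info_2pl(theta, a, b):
    """Fisher information of a 2-parameter Logistic item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_item(theta, bank, used, position, test_length, rng):
    """Pick the next item index by mixing randomness and information.

    position is 0-based. The weight s grows linearly from 0 (first item,
    purely random) to 1 (last item, pure maximum information). The linear
    form is an assumption for illustration.
    """
    s = position / (test_length - 1) if test_length > 1 else 1.0
    candidates = [i for i in range(len(bank)) if i not in used]
    infos = {i: info_2pl(theta, *bank[i]) for i in candidates}
    top = max(infos.values())
    # Random component drawn on the same scale as the information values.
    scores = {i: (1.0 - s) * rng.uniform(0.0, top) + s * infos[i]
              for i in candidates}
    return max(scores, key=scores.get)


# Hypothetical bank of (a, b) pairs, for illustration only.
bank = [(0.8, -1.0), (1.5, 0.0), (1.0, 1.0), (2.0, 0.1)]
rng = random.Random(0)

first_pick = select_item(0.0, bank, set(), position=0, test_length=10, rng=rng)
last_pick = select_item(0.0, bank, set(), position=9, test_length=10, rng=rng)
```

At the last position the weight on information is 1, so the rule reduces to maximum Fisher information selection; at the first position the choice depends only on the random draws, which is what spreads exposure across the bank.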

