
1.

A Comparison of Using the Fixed Common-Precalibrated Parameter Method and the Matched Characteristic Curve Method for Linking Multiple-Test Items

《International Journal of Testing》2013,13(3):267-293

A linking design typically consists of a data collection procedure together with an item linking procedure that places item parameters calibrated from multiple test forms onto a common scale. This study considered 2 potentially useful item response theory linking designs. The first one is characterized by selecting a single set of common items across all multiple test forms, the precalibrated item parameters of which are kept fixed while the unknown parameters of the other items are being estimated. This linking design will be referred to as the fixed common-precalibrated item parameter design. However, data collected under this design could also be analyzed by the characteristic curve method, which constituted an alternative linking procedure. In this study, the relative merits of the 2 linking designs were examined with respect to their robustness against 3 manipulated conditions, namely, when the common items have imprecise estimates, when there is a noticeable difference in the average item difficulty between the common and the noncommon items, and when the examinees are heterogeneous in terms of their abilities. A parameter recovery study was conducted to achieve this purpose. The results indicated that both linking designs were capable of producing accurate linking of items and equivalent estimation of ability parameters under the 3 conditions. When the 2 designs were actually utilized in the development of an item bank, it was found that both linking designs produced quite consistent solutions despite minor differences on some item and ability estimates. Conditions under which one linking design is preferred over the other are also provided in the Discussion section of this article.
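The characteristic curve idea mentioned in this abstract can be illustrated with a small sketch. It assumes a two-parameter logistic model and a criterion in the style of Stocking and Lord (squared differences between test characteristic curves). The abstract does not say which variant the authors used, and the item values, the linking constants `TRUE_A` and `TRUE_B`, and all function names are invented for illustration.

```python
import math

def p2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p2pl(theta, a, b) for a, b in items)

def rescale(items, A, B):
    """Move (a, b) pairs onto the target scale: a* = a / A, b* = A * b + B."""
    return [(a / A, A * b + B) for a, b in items]

def tcc_loss(A, B, new_items, target_items, grid):
    """Sum of squared TCC differences over a grid of ability points."""
    moved = rescale(new_items, A, B)
    return sum((tcc(t, target_items) - tcc(t, moved)) ** 2 for t in grid)

# Hypothetical common items expressed on the target scale.
target_items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

# The same items as they would appear on a new form's scale if the true
# linking constants were A = 1.25 and B = -0.4 (invented values).
TRUE_A, TRUE_B = 1.25, -0.4
new_items = [(a * TRUE_A, (b - TRUE_B) / TRUE_A) for a, b in target_items]

grid = [-4.0 + 0.25 * i for i in range(33)]  # ability points from -4 to 4
loss_at_truth = tcc_loss(TRUE_A, TRUE_B, new_items, target_items, grid)
loss_no_link = tcc_loss(1.0, 0.0, new_items, target_items, grid)
```

In practice the constants A and B would be found by minimizing `tcc_loss` numerically. Here the loss is only evaluated at the true constants, where it vanishes up to rounding, and at the identity transformation, where it does not.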

2.

A Q-Matrix Calibration Method Based on the Reachability Matrix

Wang Wenyi (汪文义), Song Lihong (宋丽红), Ding Shuliang (丁树良) 《Journal of Psychological Science (心理科学)》2018,(4):968-975

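Q-matrix calibration is a prerequisite for cognitive diagnostic assessment, but existing Q-matrix refinement methods are not well suited to tests in which only a few items have known attribute vectors. Building on the result from extended Q-matrix theory that the columns of the reachability matrix R and the columns of the reduced Q-matrix are related by Boolean OR, this study proposes for the first time, under certain cognitive assumptions, that the latent response columns of R and of the reduced Q-matrix are related by Boolean AND, and derives from this a reachability-matrix-based Q-matrix calibration method. The results show that, given one known reachability matrix, when the guessing or slip parameters of the reachability-matrix items are below .20 and the item parameters of the items to be calibrated are below about .30, the recovery rate of Q-matrix entries under the new method is generally above .90, and examinee classification accuracy differs very little between the true and the estimated Q-matrix; for an independent structure with 5 attributes, the new method requires only a small random sample; an empirical study confirmed the simulation results. The new method requires experts to specify the Q-matrix for only a small number of items, namely a calibrated Q-matrix that corresponds to the reachability matrix of the attribute hierarchy.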

3.

A general solution for the latent class model of latent structure analysis (Total citations: 1; self-citations: 0; citations by others: 1)

GREEN BF 《Psychometrika》1951,16(2):151-166


4.

Impact of Predicted Item Parameters from Cognitively Designed Items under Computerized Adaptive Testing

Yang Xiangdong (杨向东) 《Acta Psychologica Sinica (心理学报)》2010,42(7):802-812

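In automatic item generation, item parameters are predicted from the set of stimulus features specified by the cognitive item design, so their sources of uncertainty are more complex than those of parameters calibrated from empirical data. This empirical study examined how predicted parameters for Abstract Reasoning Test (ART) items generated with the cognitive design system approach affect the precision of ability estimation under computerized adaptive testing. The results show that predicted item parameters are distributed closer to the center than the corresponding calibrated parameters. This regression effect both affects the size of the error in ability estimates and leads to differences in item selection during adaptive testing. After controlling for differences in item selection, the error of ability estimates was larger than that obtained with calibrated item parameters, but the difference was not pronounced. The two sets of ability estimates were highly correlated, and the differences between corresponding ability values were small across nearly the entire ability range.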

5.

A comparison of the efficiency and accuracy of BILOG and LOGIST

Wendy M. Yen 《Psychometrika》1987,52(2):275-291

Comparisons are made between BILOG version 2.2 and LOGIST 5.0 Version 2.5 in estimating the item parameters, traits, item characteristic functions (ICFs), and test characteristic functions (TCFs) for the three-parameter logistic model. Data analyzed are simulated item responses for 1000 simulees and one 10-item test, four 20-item tests, and four 40-item tests. LOGIST usually was faster than BILOG in producing maximum likelihood estimates. BILOG almost always produced more accurate estimates of individual item parameters. In estimating ICFs and TCFs BILOG was more accurate for the 10-item test, and the two programs were about equally accurate for the 20- and 40-item tests. I am grateful to Robert J. Mislevy, Martha L. Stocking, and Marilyn S. Wingersky for many helpful comments on an earlier version of this paper. I would also like to thank Hamid Kamrani and Bongmyoung Park for getting LOGIST and BILOG running and keeping them running under changing computer systems at CTB/McGraw-Hill.
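For readers unfamiliar with the quantities compared in this abstract, the following minimal sketch shows an item characteristic function under the three-parameter logistic model and the test characteristic function obtained by summing over items. It is not code from BILOG or LOGIST. The item values are invented, and the scaling constant that some formulations place in the exponent is omitted as a simplifying assumption.

```python
import math

def icf_3pl(theta, a, b, c):
    """Three-parameter logistic ICF: discrimination a, difficulty b,
    lower asymptote (pseudo-guessing) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def tcf(theta, items):
    """Test characteristic function: sum of the ICFs of all items."""
    return sum(icf_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical (a, b, c) triples for a three-item test.
items = [(1.0, -0.5, 0.20), (1.4, 0.0, 0.25), (0.7, 1.0, 0.15)]

# At theta equal to the difficulty b, the ICF sits halfway between c and 1.
p_at_difficulty = icf_3pl(0.0, 1.4, 0.0, 0.25)
expected_score = tcf(0.0, items)
```

The TCF always lies between the sum of the lower asymptotes and the number of items, which is one reason errors in individual item parameters can partly cancel at the test level.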

6.

A Note on the Reliability Coefficients for Item Response Model-Based Ability Estimates

Seonghoon Kim 《Psychometrika》2012,77(1):153-162

Assuming item parameters on a test are known constants, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true abilities and estimates. Due to the bias of IRT ability estimates, the parallel-forms reliability coefficient is not generally equal to the squared-correlation reliability coefficient. It is shown algebraically that the parallel-forms reliability coefficient is expected to be greater than the squared-correlation reliability coefficient, but the difference would be negligible in a practical sense.
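The two definitions in this abstract can be written compactly. The symbols below are notation chosen here, not taken from the article: $\theta$ is true ability, $\hat\theta$ and $\hat\theta'$ are ability estimates from two parallel forms, and the subscripts PF and SC are labels for "parallel forms" and "squared correlation".

```latex
\rho_{\mathrm{PF}} = \operatorname{Corr}\bigl(\hat\theta,\ \hat\theta'\bigr),
\qquad
\rho_{\mathrm{SC}} = \bigl[\operatorname{Corr}\bigl(\theta,\ \hat\theta\bigr)\bigr]^{2},
\qquad
\text{with } \rho_{\mathrm{PF}} \ge \rho_{\mathrm{SC}} \text{ expected, per the abstract.}
```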

7.

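A Comparison of Item Selection Strategies for Computerized Adaptive Testing with Polytomously Scored Items (Total citations: 12; self-citations: 2; citations by others: 10)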

Dai Haiqi (戴海琦), Chen Dezhi (陈德枝), Ding Shuliang (丁树良), Deng Taiping (邓太萍) 《Acta Psychologica Sinica (心理学报)》2006,38(5):778-783

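This study compared five item selection strategies for computerized adaptive testing with polytomously scored items. The IRT model used was Samejima's graded response model. The strategies compared were matching mean difficulty to ability, matching median difficulty to ability, maximum information, and two a-stratified methods. Four evaluation criteria were used: bias of ability estimates relative to the true values, standard deviation of ability estimates, average number of items administered per examinee, and standard deviation of item usage counts. Using Monte Carlo simulation, the study found that each method has strengths and weaknesses and that, with appropriate stratification, the a-stratified method (median) gave the best overall performance.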
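As background for the model named in this abstract, here is a minimal sketch of category probabilities under the logistic form of Samejima's graded response model. It is not the authors' simulation code. The function name and the parameter values are assumptions made for illustration.

```python
import math

def grm_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be strictly increasing; K thresholds give K + 1
    ordered score categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (top).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is a difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical four-category item.
probs = grm_probs(theta=0.3, a=1.2, thresholds=[-1.0, 0.0, 1.5])
```

The difficulty-matching strategies in the abstract summarize an item's thresholds by their mean or median. That summary is then compared with the current ability estimate when choosing the next item.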

8.

An extension of item analysis procedures to the case of polychotomous response (Total citations: 2; self-citations: 0; citations by others: 2)

Frank B. Baker John Gurland 《Psychometrika》1968,33(3):259-266

Classical item analysis procedures were developed for dichotomously scored items and do not apply to items allowing multiple correct responses. Maximum likelihood procedures analogous to those employed in polychotomous bio-assay are presented which yield estimates of the sets of parameters for items having multiple nonordered responses. Expressions for the estimates of the asymptotic variances of the item parameters and an overall chi-square goodness of fit test are also provided.

9.

Item Parameter Estimation for Multidimensional Tests: A Comparison of SEM and MIRT Approaches

Liu Hongyun (刘红云), Luo Fang (骆方), Wang Yue (王玥), Zhang Yu (张玉) 《Acta Psychologica Sinica (心理学报)》2012,44(1):121-132

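The authors briefly review the categorical confirmatory factor analysis (CCFA) model in the SEM framework and the models relating test items to latent abilities in the MIRT framework, and summarize the main parameter estimation methods in both frameworks. A simulation study compared the WLSc and WLSMV estimators in the SEM framework with the MLR and MCMC estimators in the MIRT framework. The results show that: (1) WLSc produced the most biased parameter estimates and had convergence problems; (2) as sample size increased, the precision of all item parameter estimates improved, WLSMV and MLR differed very little in precision, and in most cases they were no worse than MCMC; (3) except for WLSc, precision increased as the number of items per dimension increased; (4) the number of test dimensions had a large effect on discrimination and difficulty parameters but a relatively small effect on item factor loadings and thresholds; (5) precision depended on the number of dimensions an item measures, with items measuring a single dimension estimated more precisely. The article also offers some suggestions on issues to watch for when applying the two approaches in practice.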

10.

Using a Response Time–Based Expected A Posteriori Estimator to Control for Differential Speededness in Computerized Adaptive Test

Justin L. Kern Edison Choe 《Applied Psychological Measurement》2021,45(5):361

This study investigates using response times (RTs) with item responses in a computerized adaptive test (CAT) setting to enhance item selection and ability estimation and control for differential speededness. Using van der Linden’s hierarchical framework, an extended procedure for joint estimation of ability and speed parameters for use in CAT is developed following van der Linden; this is called the joint expected a posteriori estimator (J-EAP). It is shown that the J-EAP estimate of ability and speededness outperforms the standard maximum likelihood estimator (MLE) of ability and speededness in terms of correlation, root mean square error, and bias. It is further shown that under the maximum information per time unit item selection method (MICT), which uses estimates of ability and speededness directly, using the J-EAP further reduces average examinee time spent and variability in test times between examinees beyond the gains this selection algorithm achieves with the MLE, while maintaining estimation efficiency. Simulated test results are further corroborated with test parameters derived from a real data example.

11.

Compromised item detection: A Bayesian change-point perspective

Yang Du Susu Zhang Hua-Hua Chang 《The British journal of mathematical and statistical psychology》2023,76(1):131-153

Psychometric methods for accurate and timely detection of item compromise have been a long-standing topic. While Bayesian methods can incorporate prior knowledge or expert inputs as additional information for item compromise detection, they have not been employed in item compromise detection itself. The current study proposes a two-phase Bayesian change-point framework for both stationary and real-time detection of changes in each item's compromise status. In Phase I, a stationary Bayesian change-point model for compromise detection is fitted to the observed responses over a specified time-frame. The model produces parameter estimates for the change-point of each item from uncompromised to compromised, as well as structural parameters accounting for the post-change response distribution. Using the post-change model identified in Phase I, the Shiryaev procedure for sequential testing is employed in Phase II for real-time monitoring of item compromise. The proposed methods are evaluated in terms of parameter recovery, detection accuracy, and detection efficiency under various simulation conditions and in a real data example. The proposed method also showed superior detection accuracy and efficiency compared to the cumulative sum procedure.

12.

Assessing statistical accuracy in ability estimation: A bootstrap approach

Michelle Liou Lien-Chi Yu 《Psychometrika》1991,56(1):55-67


13.

Consequences of Ignoring Guessing Effects on Measurement Invariance Analysis

Ismail Cuhadar Yanyun Yang Insu Paek 《Applied Psychological Measurement》2021,45(4):283

Pseudo-guessing parameters are present in item response theory applications for many educational assessments. When sample size is not sufficiently large, the guessing parameters may be ignored from the analysis. This study examines the impact of ignoring pseudo-guessing parameters on measurement invariance analysis, specifically, on item difficulty, item discrimination, and mean and variance of ability distribution. Results show that when non-zero guessing parameters are ignored from the measurement invariance analysis, item discrimination estimates tend to decrease particularly for more difficult items, and item difficulty estimates decrease unless the items are highly discriminating and difficult. As the guessing parameter increases, the size of the decrease in item discrimination and difficulty tends to increase, and the estimated mean and variance of ability distribution tend to be inaccurate. When two groups have heterogeneous ability distributions, ignoring the guessing parameter affects the reference group and the focal group differently. Implications of the findings are discussed.

14.

Methods for Handling Missing Data under the 2PLM and Their Comparison

Wang Wenyi (汪文义), Song Lihong (宋丽红), Luo Fen (罗芬), Ding Shuliang (丁树良) 《Journal of Psychological Science (心理科学)》2016,39(6):1500-1507

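Item response theory (IRT) is one of the modern educational and psychological measurement theories for objective measurement and is widely used to analyze large-scale assessments, where missing data are very common. For the two-parameter logistic model (2PLM), the only available EM algorithm for handling missing responses and missing abilities assumes a missing-completely-at-random mechanism. This study derives an EM algorithm for the 2PLM that ignores missing responses, and proposes an EM algorithm for handling missing responses and missing abilities under a missing-at-random mechanism, together with a multiple imputation method that accounts for uncertainty in both ability estimates and item responses. The results show that the EM algorithm that ignores missing responses and the multiple imputation method performed well across the missing-data mechanisms, missing rates, and test designs studied.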

15.

Item bias detection using loglinear IRT

Henk Kelderman 《Psychometrika》1989,54(4):681-697

A method is proposed for the detection of item bias with respect to observed or unobserved subgroups. The method uses quasi-loglinear models for the incomplete subgroup × test score × item 1 × ... × item k contingency table. If subgroup membership is unknown the models are Haberman's incomplete-latent-class models. The (conditional) Rasch model is formulated as a quasi-loglinear model. The parameters in this loglinear model, that correspond to the main effects of the item responses, are the conditional estimates of the parameters in the Rasch model. Item bias can then be tested by comparing the quasi-loglinear-Rasch model with models that contain parameters for the interaction of item responses and the subgroups. The author thanks Wim J. van der Linden and Gideon J. Mellenbergh for comments and suggestions and Frank Kok for empirical data.

16.

Utilizing response times in cognitive diagnostic computerized adaptive testing under the higher-order deterministic input, noisy ‘and’ gate model

Hung-Yu Huang 《The British journal of mathematical and statistical psychology》2020,73(1):109-141

Methods of cognitive diagnostic computerized adaptive testing (CD-CAT) under higher-order cognitive diagnosis models have been developed to simultaneously provide estimates of the attribute mastery statuses of examinees for formative assessment and estimates of a latent continuous trait for overall summative evaluation. In a typical CD-CAT environment, examinees are often subject to a time limit, and the examinees’ response times (RTs) for specific test items can be routinely recorded by custom-made programs. Because examinees are individually administered tailored sets of test items from the item pool, they may experience different levels of speededness during testing and different levels of risk of running out of time. In this study, RTs were considered during the item-selection procedure to control the test speededness and the RTs were treated as useful information for improving latent trait estimation in CD-CAT under the higher-order deterministic input, noisy ‘and’ gate (DINA) model. A modified posterior-weighted Kullback–Leibler (PWKL) method that maximizes the item information per time unit and a shadow-test method that assembles a provisional test subject to a specified time constraint were developed. Two simulation studies were conducted to assess the effects of the proposed methods on the quality of CD-CAT for fixed- and variable-length exams. The results show that, compared with the traditional PWKL method, the proposed methods preserve a lower risk of running out of time while ensuring satisfactory attribute estimation and providing more accurate estimates of the latent trait and speed parameters. Finally, several suggestions for future research are proposed.
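The item-level part of the DINA model named in this abstract is simple enough to sketch. The example covers only the basic response probability. The higher-order layer (a continuous trait governing attribute mastery), the response-time model and the item-selection rules from the article are not shown. The function name and the slip and guess values are assumptions.

```python
def dina_prob(alpha, q, slip, guess):
    """P(correct response) under the DINA model.

    alpha: examinee's binary attribute-mastery pattern.
    q:     the item's row of the Q-matrix (1 = attribute required).
    """
    # eta is True only when every attribute the item requires is mastered.
    eta = all(a_k >= q_k for a_k, q_k in zip(alpha, q))
    # Masters answer correctly unless they slip; non-masters only by guessing.
    return 1.0 - slip if eta else guess

# Hypothetical item requiring the first two of three attributes.
q_row = [1, 1, 0]
p_master = dina_prob([1, 1, 0], q_row, slip=0.1, guess=0.2)
p_nonmaster = dina_prob([1, 0, 1], q_row, slip=0.1, guess=0.2)
```

Because of the conjunctive "and" gate, mastering some but not all required attributes gives the same success probability as mastering none.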

17.

Empirical comparison of item parameters based on the logistic and normal functions

Frank B. Baker 《Psychometrika》1961,26(2):239-246

Maximum likelihood estimates of item parameters of a scholastic aptitude test were computed using the normal and logistic models. The goodness of fit of ogives specified by the pairs of item parameters to the observed data was determined for all items. While negligible differences in the limen values were found, differences in item discrimination indices indicated that interpretation of these indices requires separate frames of reference. The empirical results showed the logistic model to be a useful alternative to the normal model in item analysis.
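The remark about separate frames of reference for discrimination indices can be illustrated numerically. The sketch below compares the normal ogive with a logistic curve, using the commonly cited scaling constant D = 1.702. That constant is a standard convention assumed here, not something stated in the abstract.

```python
import math

def normal_ogive(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z, D=1.702):
    """Logistic curve with scaling constant D applied to the argument."""
    return 1.0 / (1.0 + math.exp(-D * z))

grid = [-6.0 + 0.01 * i for i in range(1201)]  # z from -6 to 6

# Largest vertical gap between the two curves, with and without rescaling.
max_gap_scaled = max(abs(normal_ogive(z) - logistic(z)) for z in grid)
max_gap_unscaled = max(abs(normal_ogive(z) - logistic(z, D=1.0)) for z in grid)
```

With the scaling constant the two curves stay within 0.01 of each other over the grid, while without it the gap exceeds 0.1. A discrimination value estimated under one model therefore has to be rescaled before it is compared with a value from the other.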

18.

Marginal maximum likelihood estimation of item response theory (IRT) equating coefficients for the common-examinee design

Haruhiko Ogasawara 《The Japanese psychological research》2001,43(2):72-82

A method of estimating item response theory (IRT) equating coefficients by the common-examinee design with the assumption of the two-parameter logistic model is provided. The method uses the marginal maximum likelihood estimation, in which individual ability parameters in a common-examinee group are numerically integrated out. The abilities of the common examinees are assumed to follow a normal distribution but with an unknown mean and standard deviation on one of the two tests to be equated. The distribution parameters are jointly estimated with the equating coefficients. Further, the asymptotic standard errors of the estimates of the equating coefficients and the parameters for the ability distribution are given. Numerical examples are provided to show the accuracy of the method.
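For orientation, the equating coefficients discussed in this abstract are usually the slope and intercept of a linear change of scale. The notation below ($A$, $B$, and the subscripts 1 and 2 for the two scales) is assumed here and may differ from the article's.

```latex
\theta_{2} = A\,\theta_{1} + B, \qquad
a_{2} = \frac{a_{1}}{A}, \qquad
b_{2} = A\,b_{1} + B,
\qquad\text{so that}\qquad
a_{2}\,(\theta_{2} - b_{2}) = a_{1}\,(\theta_{1} - b_{1}).
```

The last identity shows why the two-parameter logistic response probability is unchanged by the transformation. It also shows that a normal ability distribution with mean $\mu$ and standard deviation $\sigma$ on scale 1 has mean $A\mu + B$ and standard deviation $A\sigma$ on scale 2 (for $A > 0$).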

19.

The equivalence of two methods of parameter estimation for the Rasch model

Larry G. Blackwood Edwin L. Bradley 《Psychometrika》1989,54(4):751-754

Two methods of estimating parameters in the Rasch model are compared. It is shown that estimates for a certain loglinear model for the score × item × response table are equivalent to the unconditional maximum likelihood estimates for the Rasch model.

20.

Bayesian estimation in the two-parameter logistic model

Hariharan Swaminathan Janice A. Gifford 《Psychometrika》1985,50(3):349-364

A Bayesian procedure is developed for the estimation of parameters in the two-parameter logistic item response model. Joint modal estimates of the parameters are obtained and procedures for the specification of prior information are described. Through simulation studies it is shown that Bayesian estimates of the parameters are superior to maximum likelihood estimates in the sense that they are (a) more meaningful since they do not drift out of range, and (b) more accurate in that they result in smaller mean squared differences between estimates and true values. The research reported here was performed pursuant to Grant No. N0014-79-C-0039 with the Office of Naval Research.