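Information functions of polytomously scored items within the item response theory framework   Total citations: 1 (self-citations: 0, citations by others: 1)
杜文久 (Du Wenjiu), 《心理学报》 (Acta Psychologica Sinica), 2006, 38(1): 135-144
This paper aims to give formulas for computing the information function of polytomously scored items and discusses, through several examples, how these information functions are applied in practice. The main results are: (1) the sample space of a test item is given by way of an example; (2) taking the two-parameter logistic model as a basis, the probability functions of several kinds of polytomously scored items are discussed, and formulas for their information functions are derived from them; (3) the practical application of polytomous item information functions is discussed through several examples.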
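As a rough illustration of result (2), the sketch below computes the information function of a polytomous item under one common construction from 2PL curves, the graded response model, where I(θ) = Σ_k [P_k'(θ)]² / P_k(θ). The abstract does not state the paper's own formulas, so the choice of model, the function names and the parameter values are assumptions made for illustration only.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def cumulative_probs(theta, a, thresholds):
    """P*(X >= k) for k = 0..m+1; each interior curve is a 2PL curve.

    `thresholds` is assumed to be strictly increasing.
    """
    return [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Category probabilities as differences of adjacent cumulative curves."""
    cum = cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = cumulative_probs(theta, a, thresholds)
    # Derivative of each cumulative curve is a * P* * (1 - P*); it is 0 at the padded ends.
    d_cum = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        dp_k = d_cum[k] - d_cum[k + 1]
        info += dp_k ** 2 / p_k
    return info


if __name__ == "__main__":
    # Hypothetical four-category item.
    print(grm_item_information(0.0, 1.2, [-1.0, 0.0, 1.5]))
```

With a single threshold the function reduces to the familiar 2PL item information a²P(1 − P), which is a convenient sanity check.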

The relations among alternative parameterizations of the binary factor analysis (FA) model and two-parameter logistic (2PL) item response theory (IRT) model have been thoroughly discussed in the literature. However, the widely available conversion formulas are mainly for transforming parameter estimates from one parameterization to another. There is a lack of discussion about standard error (SE) conversion among different parameterizations, even though SEs of IRT model parameters are often of immediate interest to practitioners. This article provides general formulas for computing the SEs of transformed parameter values when these parameters are transformed from FA to IRT models. These formulas are suitable for unidimensional 2PL, multidimensional 2PL, and bi-factor 2PL models. A simulation study is conducted to verify the formulas by providing empirical evidence. A real data example is given at the end as an illustration.
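The idea of converting an SE along with the estimate can be sketched with the delta method in the simplest case. The example assumes a unidimensional model with a standardized loading λ and unit-variance latent response, for which the normal-ogive discrimination is a = λ / sqrt(1 − λ²) and its derivative is (1 − λ²)^(−3/2). This is the textbook special case, not the article's general formulas, and the function names are hypothetical.

```python
import math

# Conventional scaling constant between the normal-ogive and logistic metrics.
D = 1.702


def loading_to_discrimination(loading, logistic_metric=True):
    """Convert a standardized FA loading to an IRT discrimination parameter."""
    a = loading / math.sqrt(1.0 - loading ** 2)
    return D * a if logistic_metric else a


def discrimination_se(loading, loading_se, logistic_metric=True):
    """First-order delta-method SE: |da/d(lambda)| * SE(lambda)."""
    jacobian = (1.0 - loading ** 2) ** -1.5
    se = abs(jacobian) * loading_se
    return D * se if logistic_metric else se


if __name__ == "__main__":
    # Hypothetical estimate: loading 0.6 with SE 0.05.
    print(loading_to_discrimination(0.6), discrimination_se(0.6, 0.05))
```

Because the transformation is nonlinear, the same SE on the loading scale maps to a much larger SE on the discrimination scale as the loading approaches 1.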

This paper studies changes of standard errors (SE) of the normal-distribution-based maximum likelihood estimates (MLE) for confirmatory factor models as model parameters vary. Using logical analysis, simplified formulas and numerical verification, monotonic relationships between SEs and factor loadings as well as unique variances are found. Conditions under which monotonic relationships do not exist are also identified. Such functional relationships allow researchers to better understand the problem when significant factor loading estimates are expected but not obtained, and vice versa. The factors that affect the likelihood of Heywood cases (negative unique variance estimates) are also made explicit through these relationships. Empirical findings in the literature are discussed using the obtained results.

In this paper, the efficiency of conditional maximum likelihood (CML) and marginal maximum likelihood (MML) estimation of the item parameters of the Rasch model in incomplete designs is investigated. The use of the concept of F-information (Eggen, 2000) is generalized to incomplete testing designs. The scaled determinant of the F-information matrix is used as a scalar measure of information contained in a set of item parameters. In this paper, the relation between the normalization of the Rasch model and this determinant is clarified. It is shown that comparing estimation methods with the defined information efficiency is independent of the chosen normalization. The generalization of the method to models other than the Rasch model is discussed. In examples, information comparisons are conducted. It is found that for both CML and MML some information is lost in all incomplete designs compared to complete designs. A general result is that with increasing test booklet length the efficiency of an incomplete design, compared to a complete design, increases, as does the efficiency of CML compared to MML. The main difference between CML and MML is seen in the effect of the length of the test booklet. It is demonstrated that with very small booklets, there is a substantial loss in information (about 35%) with CML estimation, while this loss is only about 10% in MML estimation. However, with increasing test length, the differences between CML and MML quickly disappear.

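Psychological and educational measurement can generally reach only the level of an ordinal scale, and the relationship between the measured data and the underlying factors is not simply linear. Item factor analysis is a statistical model for describing the nonlinear relationship between test items and factors. There are two main classes of approaches to item factor analysis, one based on structural equation modeling and one based on item response theory. The two are closely connected and can even be regarded as two representations of the same model. This article explains this relationship in detail and compares the two classes of methods with respect to parameter estimation, model fit indices, tests of measurement invariance, and supporting software, so that researchers can choose the method best suited to their research.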

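Application of the item characteristic curve equating method under the graded response model in a large-scale examination   Total citations: 2 (self-citations: 1, citations by others: 1)
In the Economics Professional Qualification Examination, one of the largest qualification examinations in China, the item characteristic curve equating method under the graded response model of item response theory was applied with an anchor-test equating design in order to ensure comparability of the examination across years, to build an item bank, and to prepare for computerized adaptive testing. Item parameters and ability parameters from four years of examination data were equated, and an item bank for the economics profession was successfully assembled. On this basis, equating was used to compare the cut scores of the papers from different years, providing empirical evidence for setting the passing standard of the examination and for ensuring its fairness.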

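This article examines in some depth the methods for estimating the item parameters of the GPCM (generalized partial credit model), with particular attention to two points that are easily confused: how the initial parameter values are computed, and why the item parameters are estimated in two steps. An item parameter estimation program based on the MMLE/EM algorithm was also developed. In extensive Monte Carlo simulations comparing it with the specialized software PARSCALE, the program's step parameter estimates were better and its discrimination parameter estimates were comparable, so it has good practical value.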

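A comparison of two classes of methods for detecting differential item functioning: CFA and IRT   Total citations: 2 (self-citations: 0, citations by others: 2)
Both confirmatory factor analysis (CFA) and item response theory (IRT) now offer procedures for identifying differential item functioning (DIF). Focusing on unidimensional polytomously scored items, this article introduces the CFA and IRT methods for detecting DIF in turn and compares the two.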

With reference to a questionnaire aimed at assessing the performance of Italian nursing homes on the basis of the health conditions of their patients, we investigate two relevant issues: dimensionality of the latent structure and discriminating power of the items composing the questionnaire. The approach is based on a multidimensional item response theory model, which assumes a two-parameter logistic parameterization for the response probabilities. This model represents the health status of a patient by latent variables having a discrete distribution and, therefore, it may be seen as a constrained version of the latent class model. On the basis of the adopted model, we implement a hierarchical clustering algorithm aimed at assessing the actual number of dimensions measured by the questionnaire. These dimensions correspond to disjoint groups of items. Once the number of dimensions is selected, we also study the discriminating power of every item, so that it is possible to select the subset of these items which is able to provide an amount of information close to that of the full set. We illustrate the proposed approach on the basis of the data collected on 1,051 elderly people hosted in a sample of Italian nursing homes.

In educational and psychological measurement when short test forms are used, the asymptotic normality of the maximum likelihood estimator of the person parameter of item response models does not hold. As a result, hypothesis tests or confidence intervals of the person parameter based on the normal distribution are likely to be problematic. Inferences based on the exact distribution, on the other hand, do not suffer from this limitation. However, the computation involved for the exact distribution approach is often prohibitively expensive. In this paper, we propose a general framework for constructing hypothesis tests and confidence intervals for IRT models within the exponential family based on the exact distribution. In addition, an efficient branch and bound algorithm for calculating the exact p value is introduced. The type-I error rate and statistical power of the proposed exact test as well as the coverage rate and the lengths of the associated confidence interval are examined through a simulation. We also demonstrate its practical use by analyzing three real data sets.

Exploratory process factor analysis (EPFA) is a data-driven latent variable model for multivariate time series. This article presents analytic standard errors for EPFA. Unlike standard errors for exploratory factor analysis with independent data, the analytic standard errors for EPFA take into account the time dependency in time series data. In addition, factor rotation is treated as the imposition of equality constraints on model parameters. Properties of the analytic standard errors are demonstrated using empirical and simulated data.

The multivariate rather than the univariate range correction is used for estimating unrestricted applicant population validities in many military test validity studies but not uniformly. A Monte Carlo approach compared the standard errors of range-corrected validities under various experimental conditions adhering to the assumptions underlying correction accuracy. The multivariate corrected validities had smaller standard errors than both the univariate-corrected validities and the unrestricted validities. We conclude that using the univariate correction could fail to reveal the most valid selection instrument and that the multivariate correction should be used when scores for relevant predictors are available for the unrestricted population.
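For orientation, the univariate correction referred to above is usually taken to be the classical formula for direct restriction on the predictor, r_c = r·u / sqrt(1 + r²(u² − 1)) with u the ratio of unrestricted to restricted predictor standard deviations. The abstract does not spell out which formula the study used, so this identification, the function name and the example numbers are assumptions. The multivariate correction is not sketched here.

```python
import math


def correct_univariate_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Correct a validity coefficient for direct range restriction on the predictor.

    r_restricted    : correlation observed in the selected (restricted) sample
    sd_unrestricted : predictor SD in the applicant (unrestricted) population
    sd_restricted   : predictor SD in the selected sample
    """
    u = sd_unrestricted / sd_restricted
    return r_restricted * u / math.sqrt(1.0 + r_restricted ** 2 * (u ** 2 - 1.0))


if __name__ == "__main__":
    # Hypothetical values: observed r = 0.30, predictor SD halved by selection.
    print(correct_univariate_range_restriction(0.30, 10.0, 5.0))
```

When the two standard deviations are equal the function returns the observed correlation unchanged, and the corrected value grows as selection narrows the predictor's spread.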

The article describes 6 issues influencing standard errors in exploratory factor analysis and reviews 7 methods of computing standard errors for rotated factor loadings and factor correlations. These 7 methods are the augmented information method, the nonparametric bootstrap method, the infinitesimal jackknife method, the method using the asymptotic distributions of unrotated factor loadings, the sandwich method, the parametric bootstrap method, and the jackknife method. Standard error estimates are illustrated using a personality study with 537 men and an intelligence study with 145 children.

The paper obtains consistent standard errors (SE) and biases of order O(1/n) for the sample standardized regression coefficients with both random and given predictors. Analytical results indicate that the formulas for SEs given in popular textbooks are consistent only when the population value of the regression coefficient is zero. The sample standardized regression coefficients are also biased in general, although this should not be a concern in practice when the sample size is not too small. Monte Carlo results imply that, for both standardized and unstandardized sample regression coefficients, SE estimates based on asymptotics tend to under-predict the empirical ones at smaller sample sizes.

In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.

The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter θ is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the given level of significance in many cases; and, therefore, the corresponding intervals are no longer confidence intervals in terms of the actual definition. In the present work, confidence intervals are defined more precisely by utilizing the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored that are optimal with respect to criteria of smallness and consistency with the standard approach.
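The standard normal-approximation interval that this abstract takes as its starting point can be sketched as follows, assuming a 2PL model with known item parameters, so that the interval is θ̂ ± z / sqrt(I(θ̂)) with test information I(θ) = Σ a_i² P_i(θ)(1 − P_i(θ)). The model choice, function names and item values are assumptions for illustration. The two alternative constructions explored in the work are not shown.

```python
import math
from statistics import NormalDist


def information_2pl(theta, items):
    """Test information at theta for a list of (a, b) 2PL item parameters."""
    total = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total


def wald_interval(theta_hat, items, level=0.95):
    """Normal-approximation (Wald) confidence interval for the person parameter."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = 1.0 / math.sqrt(information_2pl(theta_hat, items))
    return theta_hat - z * se, theta_hat + z * se


if __name__ == "__main__":
    # Hypothetical five-item test; a short test gives a wide interval.
    items = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.1, 1.0)]
    print(wald_interval(0.2, items))
```

The interval width depends only on the information at θ̂, which is why short tests, with little information, are where the normal approximation is least trustworthy.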

The cognitive reflection test (CRT) is a short measure of a person's ability to resist intuitive response tendencies and to produce a normatively correct response, which is based on effortful reasoning. Although the CRT is a very popular measure, its psychometric properties have not been extensively investigated. A major limitation of the CRT is the difficulty of the items, which can lead to floor effects in populations other than highly educated adults. The present study aimed at investigating the psychometric properties of the CRT applying item response theory analyses (a two‐parameter logistic model) and at developing a new version of the scale (the CRT‐long), which is appropriate for participants with both lower and higher levels of cognitive reflection. The results demonstrated the good psychometric properties of the original, as well as the new scale. The validity of the new scale was also assessed by measuring correlations with various indicators of intelligence, numeracy, reasoning and decision‐making skills, and thinking dispositions. Moreover, we present evidence for the suitability of the new scale to be used with developmental samples. Finally, by comparing the performance of adolescents and young adults on the CRT and CRT‐long, we report the first investigation into the development of cognitive reflection. Copyright © 2015 John Wiley & Sons, Ltd.

