Similar articles
20 similar articles found; search time 31 ms
1.
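Equating is receiving growing attention from testing organizations and measurement researchers, and the advantages of item response theory (IRT) equating in particular have given them confidence in it. Many, however, have overlooked the kind and size of the effect that the shape of the examinee ability distribution may have on equating results. This study focuses on how item-parameter equating under dichotomous IRT models differs across ability-distribution shapes, and examines the error that examinee sampling bias may introduce into item characteristic curve equating. The results show that the shape of the ability distribution significantly affects the equating coefficients. In particular, the skewness of the ability distribution has a significant linear correlation with the intercept of the equating equation, whereas changes in the shape of the ability distribution have no evident effect on the slope.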

2.
Adapting educational tests from 1 language to others requires equating across the different language versions to be able to compare examinees from different language groups. Such equating is usually based on translated items considered to have similar content and psychometric characteristics in both source and target languages. However, because it is not possible to ascertain that these items are really similar in different languages, it is difficult to control and validate the equating outcome. The purpose of this study was to develop a method for evaluating cross-lingual equating and apply it to the Psychometric Entrance Test (PET) used for admission to Israeli universities. This test is written in Hebrew (the source-language, SL) and translated into 5 languages. A cross-lingual equating in a double-linking plan was performed in each of 12 forms translated to 1 target language (TL1), and each of 9 forms translated to another target language (TL2). The average difference between the equating results in the 2 links, indicating the overall instability incorporated in the equating process, was more than 10 times the standard error of equating in the TL1-SL process and about half this size in the TL2-SL process. The significance of the results and the differences found between the 2 TLs are discussed, as well as the potential displayed by the method for use as a general evaluative tool for cross-lingual equating.

3.
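Wu Rui (吴锐), Ding Shuliang (丁树良), Gan Dengwen (甘登文). Acta Psychologica Sinica (《心理学报》), 2010, 42(3): 434-442
Testlets appear increasingly often in all kinds of examinations. Equating a test that contains testlets with a standard IRT model may distort the equating results, because the local dependence within testlets is ignored. To address this problem, we used the testlet-based 2PTM model together with the IRT characteristic curve method of equating, took the error of the estimated equating coefficients as the evaluation criterion and the Wilcoxon signed-rank test as the basis for comparison, and ran a large number of Monte Carlo simulations under several conditions. The results show that the testlet model 2PTM, which accounts for local dependence, yields smaller equating error than 2PLM in the great majority of cases, and that the difference is significant. In addition, six different equating criteria were applied to 2PTM equating, and their relative merits under different conditions were evaluated.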

4.
A method of the IRT observed-score equating using chain equating through a third test without equating coefficients is presented with the assumption of the three-parameter logistic model. The asymptotic standard errors of the equated scores by this method are obtained using the results given by M. Liou and P.E. Cheng. The asymptotic standard errors of the IRT observed-score equating method using a synthetic examinee group with equating coefficients, which is a currently used method, are also provided. Numerical examples show that the standard errors by these observed-score equating methods are similar to those by the corresponding true score equating methods except in the range of low scores. The author is indebted to Michael J. Kolen for access to the real data used in this article and anonymous reviewers for their corrections and suggestions on this work.

5.
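Liu Tiechuan (刘铁川), Dai Haiqi (戴海琦), Zhao Yu (赵玉). Journal of Psychological Science (《心理科学》), 2012, 35(2): 446-451
Using anchor items to link different test forms is a common equating design. Under the influence of factors such as exposure, however, anchor items may function differently from one administration to the next. This study used the Mantel-Haenszel (MH) test and logistic regression to examine the quality of the anchor items used in equating a large-scale examination in China. It found that 22 anchor items showed parameter drift, that both the difficulty and the discrimination parameters of anchor items may drift, and that most of these items failed the model-fit test on their second use. If anchor items with parameter drift are not removed, large systematic equating error results, so testing anchor items for parameter drift should be a required step in equating.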

6.
In the design of common-item equating, two groups of examinees are administered separate test forms, and each test form contains a common subset of items. We consider test equating under this situation as an incomplete data problem; that is, examinees have observed scores on one test form and missing scores on the other. Through the use of statistical data-imputation techniques, the missing scores can be replaced by reasonable estimates, and consequently the forms may be directly equated as if both forms were administered to both groups. In this paper we discuss different data-imputation techniques that are useful for equipercentile equating; we also use empirical data to evaluate the accuracy of these techniques as compared with chained equipercentile equating. A paper presented at the European Meeting of the Psychometric Society, Barcelona, Spain, July, 1993.

7.
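Population invariance of equating for parallel tests under different definitions   Total citations: 1, self-citations: 0, citations by others: 1
Population invariance is an important assumption of equating: the equating function should be the same across examinee subgroups. This study carried out a theoretical analysis and a simulation study of the population invariance of linear equating under different definitions of parallel tests; in the simulation, the REMSD index was computed with six different weighting schemes. The results show that for strictly parallel tests the REMSD index is larger when reliability is lower; that subgroup mean differences and reliability differences have a clear interaction effect on REMSD; and that REMSD is largest when the expectation weights are equal and smallest when the score weights are proportional to subgroup size. Finally, the results are discussed, and recommendations are given on the choice of REMSD weights and on further research.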

8.
A method of estimating item response theory (IRT) equating coefficients by the common-examinee design with the assumption of the two-parameter logistic model is provided. The method uses the marginal maximum likelihood estimation, in which individual ability parameters in a common-examinee group are numerically integrated out. The abilities of the common examinees are assumed to follow a normal distribution but with an unknown mean and standard deviation on one of the two tests to be equated. The distribution parameters are jointly estimated with the equating coefficients. Further, the asymptotic standard errors of the estimates of the equating coefficients and the parameters for the ability distribution are given. Numerical examples are provided to show the accuracy of the method.

9.
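The kernel equating procedure consists of presmoothing, estimation of score probabilities, continuization, equating, and evaluation of the equating results. The method combines the advantages of linear equating and equipercentile equating, and each of its steps is readily extended and adapted. Its smoothing and continuization steps can reduce random equating error, and concepts specific to it, such as the standard error of equating difference, provide reliable tools for evaluating results. Factors such as the continuization method and the bandwidth selection method can affect its performance. New methods based on kernel equating offer a fresh perspective on the development of equating. Future work may address extending and refining the kernel equating framework, updating the procedure, and combining and comparing equating methods.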

10.
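A new equating criterion and an exploration of its range of application   Total citations: 3, self-citations: 0, citations by others: 3
Inspired by hypothesis-testing methods, this paper introduces a new equating method based on item response theory, the square-root equating criterion. It has several features: the probabilities of a correct and an incorrect response both appear in its definition and cannot be substituted for each other; it extends very easily from the 0-1 scoring version to a polytomous version; and it can be regarded as a weighted form of the Haebara criterion. Taking the error of the estimated equating coefficients as the evaluation criterion and the Wilcoxon signed-rank test as the basis for comparison, a large number of Monte Carlo simulations revealed an interesting pattern: the range of application of an equating method depends on both the precision of the item parameter estimates and the range of the equating coefficient A, but not on the range of the other equating coefficient, B. When the item parameter estimates are of high or moderate precision and A lies between 0.9 and 1.3, the new method often has smaller estimation error than the Stocking-Lord and Haebara methods, and the difference is significant; when the item parameter estimates are of low precision, the new method is superior for A from 1.0 to 2.0.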
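The abstract above compares its new criterion with the Haebara criterion but gives no formulas. As a rough illustration only, the following sketch evaluates a one-directional Haebara-type loss for the two-parameter logistic model. The function names, the item values, the ability grid, and the omission of the scaling constant D = 1.7 are all assumptions made for this example; it is not the authors' square-root criterion or their simulation setup.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (no D = 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def haebara_loss(A, B, items_x, items_y, thetas):
    """Sum of squared differences between item characteristic curves.

    items_x holds (a, b) pairs on scale X, items_y the same items on scale Y.
    Scale X is mapped onto scale Y with theta_Y = A * theta_X + B,
    so a becomes a / A and b becomes A * b + B.
    """
    total = 0.0
    for (a_x, b_x), (a_y, b_y) in zip(items_x, items_y):
        a_t, b_t = a_x / A, A * b_x + B  # transformed X parameters
        for t in thetas:
            total += (p_2pl(t, a_y, b_y) - p_2pl(t, a_t, b_t)) ** 2
    return total

# Hypothetical anchor items and an error-free transformation with A = 1.2, B = 0.5.
items_x = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.2)]
true_A, true_B = 1.2, 0.5
items_y = [(a / true_A, true_A * b + true_B) for a, b in items_x]
thetas = [-3 + 0.5 * i for i in range(13)]  # ability grid from -3 to 3

loss_at_truth = haebara_loss(true_A, true_B, items_x, items_y, thetas)
loss_off = haebara_loss(1.0, 0.0, items_x, items_y, thetas)
```

With error-free parameters the loss vanishes at the generating coefficients and is positive elsewhere; in practice A and B would be found by minimizing such a loss numerically over estimated parameters.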

11.
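Inspired by Berkson's use of test methods to estimate unknown parameters, this paper derives three new methods for obtaining equating coefficients from three goodness-of-fit statistics: the square-root equating method (Square Root criterion, SQRTcrit), the symmetric relative entropy equating method (Symmetric Relative Entropy criterion, SREcrit), and the weighted equating method (Weighted criterion, Wcrit), which is the weighted form of the Haebara criterion. Although these three multinomial goodness-of-fit tests are asymptotically equivalent when the two distributions being tested are very close, Monte Carlo simulation results show that the three new equating methods behave differently when used to obtain equating coefficients. The differences among them are closely related to the size of the random error, that is, to the precision of the item parameter estimates, and also to the range of the equating coefficient A.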

12.
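A new equating method within the item response theory framework: the log-contrast equating method   Total citations: 3, self-citations: 2, citations by others: 1
Item response theory includes some polytomous models that are expressed in the form of a quotient. When the Haebara, Stocking-Lord, or symmetric relative entropy equating method is used to equate tests under such models, equating may fail because these methods place high demands on the initial values. For this class of models we present a new equating method, the log-contrast equating method. It converges quickly, places low demands on the initial values of the iteration, gives fairly accurate results, and can supply good initial values for other equating methods. The study shows that the log-contrast equating method also improves on and generalizes the logit-transformation equating method for the 0-1 scored two-parameter logistic model.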

13.
A cubic spline method for smoothing equipercentile equating relationships under the common item nonequivalent populations design is described. Statistical techniques based on bootstrap estimation are presented that are designed to aid in choosing an equating method/degree of smoothing. These include: (a) asymptotic significance tests that compare no equating and linear equating to equipercentile equating; (b) a scheme for estimating total equating error and for dividing total estimated error into systematic and random components. The smoothing technique and statistical procedures are explored and illustrated using data from forms of a professional certification test.

14.
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are compared through the analysis of data sets famous in the equating literature. Also, the classical percentile-rank, linear, and mean equating models are each proven to be a special case of a Bayesian model under a highly-informative choice of prior distribution.

15.
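Ye Meng (叶萌), Xin Tao (辛涛). Journal of Psychological Science (《心理科学》), 2015, (1): 209-215
Taking "anchor item representativeness" as its point of entry, this article explores what relationship should hold, under the nonequivalent groups with anchor test design, between the anchor items, as the key vehicle for test linking, and the related test forms or levels. The article first points out that the concept of anchor item representativeness has different meanings in equating and in vertical scaling, and gives its meaning in vertical scaling. By examining existing research on anchor item representativeness in test linking and systematically summarizing the findings, the article outlines possible ways to improve current anchor construction practice and analyzes future directions for research on anchor item representativeness.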

16.
This study examined the effects of passage and presentation order on progress monitoring assessments of oral reading fluency in 134 second grade students. The students were randomly assigned to read six one-minute passages in one of six fixed orders over a seven week period. The passages had been developed to be comparable based on readability formulas. Estimates of oral reading fluency varied across the six stories (67.9 to 93.9), but not as a function of presentation order. These passage effects altered the shape of growth trajectories and affected estimates of linear growth rates, but were shown to be removed when forms were equated. Explicit equating is essential to the development of equivalent forms, which can vary in difficulty despite high correlations across forms and apparent equivalence through readability indices.

17.
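Equating methods based on classical test theory (CTT) are mainly of two kinds, linear equating and equipercentile equating. Different equating methods yield different results in different situations. Taking true-score equating as the criterion, this study used Monte Carlo simulation to compare the results of the two CTT equating methods across a range of item difficulty distributions and sample sizes. The results show that: (1) the error of linear equating is strongly affected by the item difficulty distribution, whereas the error of equipercentile equating is almost unaffected by it; (2) the error of linear equating is almost unaffected by sample size, whereas the error of equipercentile equating is strongly affected by it; (3) whatever the item difficulty distribution, equipercentile equating performs better than linear equating provided the sample is large enough.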
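For readers unfamiliar with the first of the two methods named above, linear equating maps a score on form X to the form Y score that has the same standardized position. The sketch below is a minimal illustration under assumed conditions: the function name and the two tiny score lists are hypothetical, population standard deviations are used, and nothing here reproduces the study's simulation design. Equipercentile equating is not shown.

```python
from statistics import mean, pstdev

def linear_equate(x, scores_x, scores_y):
    """Map score x on form X to the form Y scale by matching mean and SD."""
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical score samples from two forms given to equivalent groups.
form_x = [10, 12, 14, 16, 18]
form_y = [20, 24, 28, 32, 36]

equated_mid = linear_equate(14, form_x, form_y)   # the mean of X maps to the mean of Y
equated_high = linear_equate(18, form_x, form_y)  # two SDs above the mean on both forms
```

Because only two moments of each distribution enter the formula, the result depends on the sample through its mean and standard deviation alone.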

18.
Symbolic play and language are known to be highly interrelated, but the developmental process involved in this relationship is not clear. Three hypothetical paths were postulated to explore how play and language drive each other: (1) direct paths, whereby initiation of basic forms in symbolic action or babbling, will be directly related to all later emerging language and motor outputs; (2) an indirect interactive path, whereby basic forms in symbolic action will be associated with more complex forms in symbolic play, as well as with babbling, and babbling mediates the relationship between symbolic play and speech; and (3) a dual path, whereby basic forms in symbolic play will be associated with basic forms of language, and complex forms of symbolic play will be associated with complex forms of language. We micro-coded 288 symbolic vignettes gathered during a yearlong prospective bi-weekly examination (N = 14; from 6 to 18 months of age). Results showed that the age of initiation of single-object symbolic play correlates strongly with the age of initiation of later-emerging symbolic and vocal outputs; its frequency at initiation is correlated with frequency at initiation of babbling, later-emerging speech, and multi-object play in initiation. Results support the notion that a single-object play relates to the development of other symbolic forms via a direct relationship and an indirect relationship, rather than a dual-path hypothesis.

19.
Intraclass correlations: uses in assessing rater reliability   Total citations: 52, self-citations: 0, citations by others: 52
Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. Relevant to the choice of the coefficient are the appropriate statistical model for the reliability and the application to be made of the reliability results. Confidence intervals for each of the forms are reviewed.
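The abstract above refers to six forms of the intraclass correlation without writing any of them out. As an illustration of just one, the sketch below computes the one-way random-effects, single-rating form, commonly written ICC(1,1), from the between-target and within-target mean squares of a one-way ANOVA. The function name and the example ratings are assumptions for this example, and which form is appropriate depends on the study design, as the abstract notes.

```python
def icc_1_1(ratings):
    """One-way random-effects, single-rating intraclass correlation.

    ratings is a list of n rows (targets), each holding k ratings (judges).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-target mean square.
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square.
    wms = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)

# Hypothetical data: judges agree perfectly, then disagree completely.
perfect = icc_1_1([[1, 1], [2, 2], [3, 3]])
reversed_pair = icc_1_1([[1, 2], [2, 1]])
```

Perfect agreement leaves no within-target variance, so the coefficient reaches 1; when targets do not differ on average and all variance is within targets, it falls to its lower bound for two judges.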

20.
The utility of the method of aggregation as a measure of behavioral consistency was examined in 26 studies involving computer-generated, repeated-measurement data. The first series of studies involved rectangular distributions in which score constant. A second series of studies used normally distributed z scores, and score consistency was manipulated by inducing a desired correlation between the scores in adjacent trials. In both sets of studies, the aggregate stability coefficient was a strictly increasing function of the number of aggregated trials, and even trivial amounts of score stability resulted in large stability coefficients. In a third series of studies, high stability coefficients occurred when computed on combined unstable subsamples which differed from each other only in central tendency. Terminal aggregate coefficients were compared with Spearman-Brown prophecy and Cronbach's alpha reliability coefficients computed on the experimental data. It was concluded that the method of aggregation produces spuriously high estimates of behavioral consistency. It was further shown that the Spearman-Brown prophecy formula and coefficient alpha accurately predict the results of the aggregation method, suggesting that aggregation is an internal consistency reliability procedure. The equating of stability with traditional notions of reliability was questioned.
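The Spearman-Brown prophecy formula mentioned above predicts the reliability of a composite of k parallel measurements from the reliability r of a single one: k·r / (1 + (k − 1)·r). The short sketch below, with a hypothetical function name and an arbitrarily chosen single-trial value of 0.1, illustrates how the predicted coefficient rises with the number of aggregated trials; the numbers are not taken from the study.

```python
def spearman_brown(r, k):
    """Predicted reliability of the aggregate of k parallel trials, each with reliability r."""
    return k * r / (1 + (k - 1) * r)

# A weak single-trial consistency of 0.1, aggregated over more and more trials.
single = spearman_brown(0.1, 1)
ten = spearman_brown(0.1, 10)
fifty = spearman_brown(0.1, 50)
print(round(single, 3), round(ten, 3), round(fifty, 3))  # 0.1 0.526 0.847
```

The formula is strictly increasing in k for any r between 0 and 1, which matches the abstract's observation that the aggregate coefficient grows with the number of trials.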
