1.
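Equating is receiving more and more attention from testing agencies and measurement researchers, and the advantages of item response theory (IRT) equating in particular have given them confidence in it. However, many have overlooked how much the shape of the examinee ability distribution may affect equating results. This study focuses on how the results of item-parameter equating under the IRT dichotomous scoring model differ across ability-distribution shapes, and examines the error that examinee sampling bias may introduce into item characteristic curve equating. The results show that the shape of the ability distribution significantly affects the item-parameter equating coefficients. In particular, the skewness of the ability distribution has a significant linear correlation with the intercept of the equating equation, whereas changes in distribution shape have no clear effect on the slope.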
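Background on the slope and intercept: under a logistic IRT model, the equating equation is a linear map θ_old = Aθ_new + B. A 2PL item's parameters move to the old scale as a/A and Ab + B, so A is the slope and B the intercept discussed above. The minimal sketch below estimates A and B with the mean–sigma method, which is swapped in here for simplicity and is not the characteristic-curve method the study used. The anchor difficulties are made-up values.

```python
from statistics import mean, pstdev

def mean_sigma(b_new, b_old):
    """Estimate A, B so that theta_old = A * theta_new + B, using the
    means and SDs of the anchor items' difficulty estimates on each form."""
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B

def to_old_scale(a, b, A, B):
    """Rescale one 2PL item (a, b) from the new-form scale to the old-form scale."""
    return a / A, A * b + B

# Hypothetical difficulties of the same three anchor items, estimated on each form.
b_new = [-1.0, 0.0, 1.0]
b_old = [-1.5, 0.5, 2.5]
A, B = mean_sigma(b_new, b_old)
print(A, B)                         # slope and intercept of the equating line
print(to_old_scale(1.2, 1.0, A, B))
```

Because the mean and SD of the difficulties drive both coefficients, any sampling distortion in the calibration groups feeds directly into B and A. This illustrates why the study examined distribution shape.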
2.
International Journal of Testing, 2013, 13(2): 101–117
Adapting educational tests from one language to others requires equating across the language versions so that examinees from different language groups can be compared. Such equating is usually based on translated items that are considered to have similar content and psychometric characteristics in both the source and target languages. However, because it is not possible to ascertain that these items are really similar across languages, it is difficult to control and validate the equating outcome. The purpose of this study was to develop a method for evaluating cross-lingual equating and to apply it to the Psychometric Entrance Test (PET) used for admission to Israeli universities. This test is written in Hebrew (the source language, SL) and translated into five languages. A cross-lingual equating in a double-linking plan was performed in each of 12 forms translated into one target language (TL1) and in each of 9 forms translated into another target language (TL2). The average difference between the equating results in the two links indicates the overall instability incorporated in the equating process. It was more than 10 times the standard error of equating in the TL1–SL process and about half that size in the TL2–SL process. The significance of the results and the differences found between the two TLs are discussed, as well as the method's potential as a general evaluative tool for cross-lingual equating.
3.
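Testlets appear increasingly often in all kinds of examinations. Equating testlet-based tests with standard IRT models may distort the equating results because it ignores the local dependence within testlets. To address this problem, we used the testlet-based 2PTM model with IRT characteristic-curve equating. We ran extensive Monte Carlo simulations under several conditions, using the error of the estimated equating coefficients as the criterion and the Wilcoxon signed-rank test as the basis for comparison. In the great majority of cases, the testlet model 2PTM, which accounts for local dependence, produced smaller equating error than 2PLM, and the differences were significant. In addition, 2PTM equating was carried out under six different equating criteria, and the relative merits of these criteria were evaluated under different conditions.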
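Characteristic-curve equating chooses A and B by minimizing a discrepancy between anchor-item curves on the two scales. The minimal sketch below shows two widely used criteria, Haebara (item-by-item curves) and Stocking–Lord (test characteristic curve). It makes several assumptions: it uses the plain 2PL rather than the 2PTM testlet model, a scaling constant D = 1.7, equally weighted ability points, a one-directional Haebara loss, and a crude grid search in place of a proper optimizer. All item parameters are hypothetical.

```python
import math

def p2pl(theta, a, b, D=1.7):
    """2PL probability of a correct response (D = 1.7 is an assumed scaling constant)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def haebara_loss(A, B, items_new, items_old, thetas):
    """Haebara: squared differences between matching item curves, summed over items and theta."""
    loss = 0.0
    for theta in thetas:
        for (a_n, b_n), (a_o, b_o) in zip(items_new, items_old):
            diff = p2pl(theta, a_o, b_o) - p2pl(theta, a_n / A, A * b_n + B)
            loss += diff ** 2
    return loss

def stocking_lord_loss(A, B, items_new, items_old, thetas):
    """Stocking-Lord: squared difference between the anchor test characteristic curves."""
    loss = 0.0
    for theta in thetas:
        tcc_old = sum(p2pl(theta, a, b) for a, b in items_old)
        tcc_new = sum(p2pl(theta, a / A, A * b + B) for a, b in items_new)
        loss += (tcc_old - tcc_new) ** 2
    return loss

def grid_minimize(loss_fn, items_new, items_old, thetas):
    """Crude grid search over A in [0.5, 1.5] (step 0.02) and B in [-1, 1] (step 0.04)."""
    best = None
    for i in range(51):
        A = 0.5 + 0.02 * i
        for j in range(51):
            B = -1.0 + 0.04 * j
            value = loss_fn(A, B, items_new, items_old, thetas)
            if best is None or value < best[0]:
                best = (value, A, B)
    return best[1], best[2]

# Hypothetical anchor items: old-form parameters built from new-form ones with A = 1.2, B = 0.4.
items_new = [(1.0, -1.0), (0.8, 0.0), (1.5, 0.5), (1.2, 1.0), (0.6, -0.5)]
items_old = [(a / 1.2, 1.2 * b + 0.4) for a, b in items_new]
thetas = [-3.0 + 0.3 * k for k in range(21)]  # equally weighted ability points

A_h, B_h = grid_minimize(haebara_loss, items_new, items_old, thetas)
A_sl, B_sl = grid_minimize(stocking_lord_loss, items_new, items_old, thetas)
print(A_h, B_h)
print(A_sl, B_sl)
```

With error-free parameters, both criteria recover the generating A and B. They start to disagree once the item estimates contain error, which is the setting the simulation studies above examine.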
4.
A method of IRT observed-score equating is presented under the assumption of the three-parameter logistic model. It uses chain equating through a third test and does not require equating coefficients. The asymptotic standard errors of the equated scores from this method are obtained using results given by M. Liou and P. E. Cheng. The asymptotic standard errors are also provided for the currently used IRT observed-score equating method, which uses a synthetic examinee group with equating coefficients. Numerical examples show that the standard errors of these observed-score equating methods are similar to those of the corresponding true-score equating methods except in the range of low scores. The author is indebted to Michael J. Kolen for access to the real data used in this article and to anonymous reviewers for their corrections and suggestions on this work.
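IRT observed-score equating first needs the model-implied number-correct distribution on each form. A standard way to compute it is the Lord–Wingersky recursion, and the minimal sketch below implements it. It assumes the 3PL with D = 1.7 and a discretized standard normal ability distribution. The item parameters are hypothetical. The sketch does not reproduce the chain-equating method or its asymptotic standard errors.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def lord_wingersky(probs):
    """Number-correct distribution P(X = x | theta), x = 0..n, from per-item probabilities."""
    dist = [1.0]  # with zero items, a score of 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, f in enumerate(dist):
            new[x] += f * (1.0 - p)   # item answered incorrectly: score unchanged
            new[x + 1] += f * p       # item answered correctly: score goes up by one
        dist = new
    return dist

def marginal_score_dist(items, thetas, weights):
    """Mix the conditional score distributions over a discrete ability distribution."""
    total = [0.0] * (len(items) + 1)
    for theta, w in zip(thetas, weights):
        conditional = lord_wingersky([p3pl(theta, *item) for item in items])
        for x, f in enumerate(conditional):
            total[x] += w * f
    return total

# Hypothetical 3PL items (a, b, c) and a discretized standard normal ability distribution.
items = [(1.0, -0.5, 0.2), (1.3, 0.0, 0.15), (0.8, 0.7, 0.25)]
thetas = [-4.0 + 0.4 * k for k in range(21)]
raw = [math.exp(-t * t / 2.0) for t in thetas]
norm = sum(raw)
weights = [r / norm for r in raw]
score_dist = marginal_score_dist(items, thetas, weights)
print(score_dist)
```

The marginal distributions for the two forms, computed on a common ability scale, are then equated by the equipercentile method.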
5.
6.
In the common-item equating design, two groups of examinees take separate test forms, and each test form contains a common subset of items. We treat test equating in this situation as an incomplete-data problem: examinees have observed scores on one test form and missing scores on the other. Statistical data-imputation techniques can replace the missing scores with reasonable estimates, after which the forms can be equated directly, as if both forms had been administered to both groups. In this paper we discuss several data-imputation techniques that are useful for equipercentile equating. We also use empirical data to evaluate the accuracy of these techniques compared with chained equipercentile equating. A paper presented at the European Meeting of the Psychometric Society, Barcelona, Spain, July 1993.
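Once imputation provides score distributions for both forms in the same group, the forms can be equated with the basic equipercentile function. That function maps each form-X score to the form-Y score with the same percentile rank. The minimal sketch below uses the usual continuization for integer scores, in which each score is spread uniformly over [x − 0.5, x + 0.5). It applies no smoothing, and the frequencies are hypothetical.

```python
def cumulative_proportions(freqs):
    """Cumulative proportions F(0), ..., F(K) for integer scores 0..K."""
    total = sum(freqs)
    cum, running = [], 0
    for f in freqs:
        running += f
        cum.append(running / total)
    return cum

def percentile_rank(x, cum):
    """Percentile rank of integer score x, with each score spread over [x - 0.5, x + 0.5)."""
    below = cum[x - 1] if x > 0 else 0.0
    return 100.0 * (below + 0.5 * (cum[x] - below))

def inverse_percentile_rank(p, cum):
    """Continuous score whose percentile rank is p (linear interpolation within a score interval)."""
    target = p / 100.0
    for y, F in enumerate(cum):
        if F > target:  # smallest score whose cumulative proportion exceeds the target
            below = cum[y - 1] if y > 0 else 0.0
            return (target - below) / (F - below) + (y - 0.5)
    return len(cum) - 0.5  # p = 100 maps to the top of the score range

def equipercentile(x, freqs_x, freqs_y):
    """Form-Y equivalent of form-X score x: the Y score with the same percentile rank."""
    p = percentile_rank(x, cumulative_proportions(freqs_x))
    return inverse_percentile_rank(p, cumulative_proportions(freqs_y))

# Hypothetical frequencies for one group on both forms, e.g. after imputing the missing form.
freqs_x = [1, 2, 3, 2, 1]        # scores 0..4 on form X
freqs_y = [0, 1, 2, 3, 2, 1]     # scores 0..5 on form Y, shifted up by one point
equated = [equipercentile(x, freqs_x, freqs_y) for x in range(5)]
print(equated)
```

Because form Y here is form X shifted up by one point, each X score maps to the score one point higher.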
7.
8.
Haruhiko Ogasawara. Japanese Psychological Research, 2001, 43(2): 72–82
A method is provided for estimating item response theory (IRT) equating coefficients under the common-examinee design, assuming the two-parameter logistic model. The method uses marginal maximum likelihood estimation, in which the individual ability parameters of the common-examinee group are numerically integrated out. On one of the two tests to be equated, the abilities of the common examinees are assumed to follow a normal distribution with an unknown mean and standard deviation. These distribution parameters are estimated jointly with the equating coefficients. The asymptotic standard errors of the estimated equating coefficients and of the ability-distribution parameters are also given. Numerical examples are provided to show the accuracy of the method.
9.
10.
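A new equating criterion and an exploration of its range of applicability  Total citations: 3, self-citations: 0, citations by others: 3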
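Inspired by hypothesis-testing methods, this paper introduces a new IRT-based equating method, the square-root equating criterion. It has several features. The probabilities of correct and incorrect responses both appear in its defining formula and cannot be substituted for each other. It extends very easily from the 0–1 (dichotomous) scoring version to a polytomous version. It can also be viewed as a weighted form of the Haebara equating criterion. Extensive Monte Carlo simulations were run, using the error of the estimated equating coefficients as the criterion and the Wilcoxon signed-rank test as the basis for comparison. They revealed an interesting phenomenon: the range in which an equating method performs well depends on the precision of item-parameter estimation and on the range of the equating coefficient A, but not on the range of the other equating coefficient, B. When item-parameter estimation precision is high or moderate and A lies between 0.9 and 1.3, the new method often has smaller estimation error than the Stocking–Lord and Haebara methods, and the differences are significant. When estimation precision is low, the new method is superior for A between 1.0 and 2.0.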
11.
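Inspired by Berkson's use of test statistics to estimate unknown parameters, this paper derives three new methods for obtaining equating coefficients from three goodness-of-fit statistics. They are the square-root equating method (Square Root criterion, SQRTcrit), the symmetric relative entropy equating method (Symmetric Relative Entropy criterion, SREcrit), and the weighted equating method (Weighted criterion, Wcrit), which is a weighted form of the Haebara criterion. These three multinomial goodness-of-fit tests are asymptotically equivalent when the two distributions being tested are very close. Nevertheless, Monte Carlo simulation results show that the three new equating methods behave differently when used to obtain equating coefficients. Their differences are closely related to the magnitude of the random error, that is, to the precision of item-parameter estimation, and also to the range of the equating coefficient A.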
12.
13.
Analytic smoothing for equipercentile equating under the common item nonequivalent populations design  Total citations: 1, self-citations: 0, citations by others: 1
A cubic spline method for smoothing equipercentile equating relationships under the common item nonequivalent populations design is described. Statistical techniques based on bootstrap estimation are presented that are designed to aid in choosing an equating method/degree of smoothing. These include: (a) asymptotic significance tests that compare no equating and linear equating to equipercentile equating; (b) a scheme for estimating total equating error and for dividing total estimated error into systematic and random components. The smoothing technique and statistical procedures are explored and illustrated using data from forms of a professional certification test.
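The bootstrap idea behind these techniques is to resample examinees, re-run the equating, and treat the spread of the replicated equated scores as the random (standard) error of equating. The minimal sketch below makes two simplifying assumptions. It uses linear equating for two randomly equivalent groups rather than smoothed equipercentile equating under the common-item nonequivalent populations design. It also uses simulated, hypothetical scores. Under the common-item design, one would instead resample examinees within each population together with their anchor scores.

```python
import random
from statistics import mean, pstdev, stdev

def linear_equate(x, scores_x, scores_y):
    """Linear equating for randomly equivalent groups: match standardized scores on X and Y."""
    return mean(scores_y) + pstdev(scores_y) / pstdev(scores_x) * (x - mean(scores_x))

def bootstrap_see(x, scores_x, scores_y, reps=300, seed=1):
    """Bootstrap standard error of the equated score at x: resample each group with
    replacement, re-equate, and take the SD of the replicated equated scores."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(reps):
        boot_x = rng.choices(scores_x, k=len(scores_x))
        boot_y = rng.choices(scores_y, k=len(scores_y))
        replicates.append(linear_equate(x, boot_x, boot_y))
    return stdev(replicates)

# Hypothetical simulated number-correct scores for two randomly equivalent groups.
gen = random.Random(0)
scores_x = [gen.randint(0, 40) for _ in range(200)]
scores_y = [gen.randint(5, 45) for _ in range(200)]
se = bootstrap_see(20, scores_x, scores_y)
print(round(se, 3))
```

Systematic error cannot be estimated this way. It requires a criterion equating, for example the known relationship in a simulation.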
14.
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between the distributions of scores from two tests. The Bayesian model and the previous equating models are compared through the analysis of data sets that are famous in the equating literature. Also, the classical percentile-rank, linear, and mean equating models are each proven to be a special case of a Bayesian model under a highly informative choice of prior distribution.
15.
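This paper takes "anchor-item representativeness" as its entry point. It explores what relationship anchor items should have with the associated test forms or levels, since under the non-equivalent groups anchor test design they are the key vehicle for linking tests. The paper first points out that anchor representativeness has different meanings in equating and in vertical scaling, and gives its meaning in vertical scaling. It then reviews and systematically summarizes existing research on anchor representativeness in test linking. On that basis, it outlines possible improvements to current anchor-construction practice and analyzes future directions for research on anchor representativeness.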
16.
David J. Francis, Kristi L. Santi, Christopher Barr, Jack M. Fletcher, Al Varisco, Barbara R. Foorman. Journal of School Psychology, 2008, 46(3): 315–342
This study examined the effects of passage and presentation order on progress-monitoring assessments of oral reading fluency in 134 second-grade students. The students were randomly assigned to read six one-minute passages in one of six fixed orders over a seven-week period. The passages had been developed to be comparable based on readability formulas. Estimates of oral reading fluency varied across the six stories (67.9 to 93.9) but not as a function of presentation order. These passage effects altered the shape of growth trajectories and affected estimates of linear growth rates, but they were removed when the forms were equated. Explicit equating is essential to the development of equivalent forms, which can vary in difficulty despite high correlations across forms and apparent equivalence on readability indices.
17.
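The main equating methods based on classical test theory (CTT) are linear equating and equipercentile equating. In different situations, different equating methods produce different equating results. Using true-score equating as the criterion, this study used Monte Carlo simulation to compare the two CTT equating methods comprehensively across various item-difficulty distributions and sample sizes. The results show the following. (1) The error of linear equating is strongly affected by the item-difficulty distribution, whereas the error of equipercentile equating is almost unaffected by it. (2) The error of linear equating is almost unaffected by sample size, whereas the error of equipercentile equating is strongly affected by it. (3) Whatever the item-difficulty distribution, equipercentile equating outperforms linear equating as long as the sample size is large enough.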
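In the simplest random-groups form, the CTT methods compared above can be written in a few lines. Mean equating matches only the means. Linear equating also matches the standard deviations by rescaling with their ratio. Equipercentile equating, not shown here, matches the whole distribution. The minimal sketch below assumes randomly equivalent groups and uses hypothetical scores.

```python
from statistics import mean, pstdev

def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift X scores so the two forms have the same mean."""
    return x - mean(scores_x) + mean(scores_y)

def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the mean and the standard deviation."""
    slope = pstdev(scores_y) / pstdev(scores_x)
    return mean(scores_y) + slope * (x - mean(scores_x))

# Hypothetical scores from randomly equivalent groups; form Y is more spread out than form X.
scores_x = [10, 12, 14, 16, 18]
scores_y = [20, 24, 28, 32, 36]
print(mean_equate(16, scores_x, scores_y))
print(linear_equate(16, scores_x, scores_y))
```

Here mean equating maps a form-X score of 16 to 30. Linear equating also rescales by the SD ratio, which is 2 for these data, and maps it to 32. Because linear equating estimates only two moments, it has little sampling error but cannot follow a nonlinear relationship. That trade-off is consistent with the findings above.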
18.
Symbolic play and language are known to be highly interrelated, but the developmental process behind this relationship is not clear. Three hypothetical paths were postulated to explore how play and language drive each other. (1) Direct paths: the initiation of basic forms in symbolic action or babbling will be directly related to all later-emerging language and motor outputs. (2) An indirect interactive path: basic forms in symbolic action will be associated with more complex forms of symbolic play and with babbling, and babbling mediates the relationship between symbolic play and speech. (3) A dual path: basic forms of symbolic play will be associated with basic forms of language, and complex forms of symbolic play with complex forms of language. We micro-coded 288 symbolic vignettes gathered during a yearlong prospective biweekly examination (N = 14; from 6 to 18 months of age). The age of initiation of single-object symbolic play correlated strongly with the age of initiation of later-emerging symbolic and vocal outputs. Its frequency at initiation correlated with the frequency at initiation of babbling, later-emerging speech, and multi-object play. The results support the notion that single-object play relates to the development of other symbolic forms through both a direct and an indirect relationship, rather than the dual-path hypothesis.
19.
Intraclass correlations: uses in assessing rater reliability  Total citations: 52, self-citations: 0, citations by others: 52
Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. The choice of coefficient depends on the appropriate statistical model for the reliability study and on how the reliability results will be applied. Confidence intervals for each of the forms are reviewed.
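The six forms can be computed from ANOVA mean squares on an n × k ratings table. The minimal sketch below uses the ICC(model, form) labels commonly associated with this article. Model 1 is one-way random, model 2 is two-way random, and model 3 is two-way mixed. The first index is the model, and the second is a single rating (1) or the mean of k ratings. The ratings table is a small illustrative example, and confidence intervals are not computed.

```python
def icc_six_forms(ratings):
    """ICC(1,1), ICC(2,1), ICC(3,1) and their k-judge averages from an n x k table
    (rows = targets, columns = judges), via one- and two-way ANOVA mean squares."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_targets          # one-way: all variation within targets
    ss_error = ss_within - ss_judges           # two-way residual after removing judges

    bms = ss_targets / (n - 1)                 # between-targets mean square
    wms = ss_within / (n * (k - 1))            # within-target mean square
    jms = ss_judges / (k - 1)                  # between-judges mean square
    ems = ss_error / ((n - 1) * (k - 1))       # residual mean square

    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }

# Illustrative ratings: 6 targets (rows) rated by 4 judges (columns).
ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
iccs = icc_six_forms(ratings)
for name, value in iccs.items():
    print(name, round(value, 2))
```

These judges differ greatly in overall leniency, so the coefficients that treat judge differences as error (models 1 and 2) are much lower than model 3. Model 3 ignores those differences.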
20.
H. D. Day, David Marshall, Basil Hamilton, John Christy. Journal of Research in Personality, 1983, 17(1): 97–109
The utility of the method of aggregation as a measure of behavioral consistency was examined in 26 studies involving computer-generated, repeated-measurement data. The first series of studies involved rectangular distributions in which score constant. A second series of studies used normally distributed z scores, and score consistency was manipulated by inducing a desired correlation between the scores in adjacent trials. In both sets of studies, the aggregate stability coefficient was a strictly increasing function of the number of aggregated trials, and even trivial amounts of score stability resulted in large stability coefficients. In a third series of studies, high stability coefficients occurred when computed on combined unstable subsamples which differed from each other only in central tendency. Terminal aggregate coefficients were compared with Spearman-Brown prophecy and Cronbach's alpha reliability coefficients computed on the experimental data. It was concluded that the method of aggregation produces spuriously high estimates of behavioral consistency. It was further shown that the Spearman-Brown prophecy formula and coefficient alpha accurately predict the results of the aggregation method, suggesting that aggregation is an internal consistency reliability procedure. The equating of stability with traditional notions of reliability was questioned.
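Why do trivial trial-to-trial stabilities still yield large aggregate coefficients? The Spearman–Brown prophecy formula answers this, and coefficient alpha is its internal-consistency counterpart. The minimal sketch below implements both in their textbook forms. The single-trial correlation of 0.1 and the tiny data set are hypothetical.

```python
from statistics import pvariance

def spearman_brown(r, k):
    """Reliability of an aggregate of k parallel trials whose single-trial correlation is r."""
    return k * r / (1 + (k - 1) * r)

def cronbach_alpha(data):
    """Coefficient alpha for rows = persons, columns = trials (or items)."""
    k = len(data[0])
    trial_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(trial_vars) / total_var)

# A small trial-to-trial correlation still yields a large aggregate coefficient.
for k in (1, 5, 20, 50):
    print(k, round(spearman_brown(0.1, k), 3))
```

With r = 0.1, aggregating 20 trials already gives about 0.69. Standardized alpha is exactly the Spearman–Brown formula applied to the mean inter-trial correlation, which is consistent with the abstract's conclusion that aggregation behaves like an internal-consistency procedure.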