Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Gulliksen, H. Psychometrika, 1950, 15(3): 259-269
Some methods are presented for estimating the reliability of a partially speeded test without the use of a parallel form. The effect of these formulas on some test data is illustrated. Whenever an odd-even reliability is computed it is probably desirable to use one of the formulas noted in Section 2 of this paper in addition to the usual Spearman-Brown correction. Since the formulas given here involve the mean and the standard deviation of the “number unattempted score,” a method is given in Section 4 for computing this mean and standard deviation from item analysis data. If the item analysis data are available, this method will save considerable time as compared with rescoring answer sheets.
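
For orientation, the usual odd-even procedure mentioned in this abstract can be sketched as below. This is a minimal sketch of the standard Spearman-Brown step only; it does not reproduce the speeded-test formulas of Section 2, which the abstract does not state. The function names and the small response matrix are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, factor=2.0):
    """Projected reliability of a test `factor` times as long."""
    return factor * r / (1.0 + (factor - 1.0) * r)


def odd_even_reliability(responses):
    """responses[p][i] is the 0/1 score of person p on item i."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))


# Hypothetical data: 5 examinees, 6 dichotomous items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
print(round(odd_even_reliability(responses), 3))
```

On a speeded test the unattempted items at the end fall equally into both halves, which tends to inflate the half-test correlation; that is presumably why the abstract recommends an additional correction.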

2.
Asymptotic formulas are derived for the bias in the maximum likelihood estimators of the item parameters in the logistic item response model when examinee abilities are known. Numerical results are given for a typical verbal test for college admission.

3.
Lord, Frederic M. Psychometrika, 1960, 25(4): 325-342
Formulas are derived for using the available item statistics and score statistics on a test to estimate the moments of the score distribution of a lengthened (or shortened) form of the same test. Other formulas are derived for estimating the bivariate moments of the scatterplot between two parallel test forms using only the data available on either form alone. An empirical study is made showing in each case satisfactory agreement between the theoretical values predicted from the formulas and the values actually observed. These results suggest the utility of the true-score model used in deriving the formulas. This work was supported by contract Nonr-2752(00) between the Office of Naval Research and Educational Testing Service. Reproduction in whole or in part for any purpose of the United States Government is permitted.

4.
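Missing data are common in examination scoring, and making effective use of the available data for statistical analysis is a key problem. In examination scoring, the effects of items and raters on test scores cannot be ignored. Based on the principles of generalizability theory, formulas for estimating the variance components of a two-facet crossed design (p×i×r) with missing data were derived according to examination scoring rules, and multiple sets of missing data were simulated with Matlab 7.0 to verify the formulas. The results showed that: (1) the derived formulas are fairly reliable, with relatively small bias in the estimated variance components; even when more than 50% of the data are missing, the formulas still estimate the variance components fairly accurately; (2) the number of items has the greatest influence on the estimation of variance components with missing data in generalizability theory, followed by the number of raters; when the numbers of items and raters are 6 and 5, respectively, the estimates from the formulas tend to become stable; (3) the number of students has little influence on the estimation of the variance components; whether the examination is small-scale or large-scale, the estimated variance components differ little.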

5.
Previous papers on this subject derive the correlation between an item and the remainder of the test. This correlation is unsatisfactory because the reliability of the remainder varies inversely with the reliability of the item omitted. The present paper derives the correlation between an item and the total test, with that item replaced by a rationally equivalent item. The general formula is then modified, for dichotomous items, to give the corrected point-biserial, biserial, and Brogden biserial correlations. The results apply strictly only to factorially homogeneous tests: those in which the same trait or combination of traits is measured (apart from error) by every item.

6.
Babitz, Milton; Keys, Noel. Psychometrika, 1940, 5(4): 283-288
It is noted that the average inter-item correlation, which represents the internal consistency of a test, yields a unique estimate of test reliability. A close approximation to this average is given by a formula which requires the correlation of each item with the total score and the standard deviation of each item. The formula is especially useful in those instances where the number of items is small and where the variation in item sigmas should not be neglected.

7.
This article (a) describes how McDonald's nonlinear factor analytic approach to the normal ogive curve can be used to factor analyse total test scores, (b) discusses the conditions in which this model is more appropriate than the widely used linear model, and (c) illustrates the applicability of both models using an empirical example. The rationale for the described procedure is that the test scores are simple sums of binary item responses whose item characteristic curves are adequately represented by normal ogives. The results obtained in the empirical example are meaningful and informative, and agree with the results obtained at the item level.

8.
A hybrid procedure for number correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT and the total test scores are computed based on CTT. Thus, what makes the hybrid scoring method attractive is that this method accounts for the dimensionality of the test items while test scores remain easy to compute. Further, the hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study generated estimated and true scores that were highly correlated. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.

9.
Although pictures are often added to text in items of educational tests, little is known about their influence on item solving. Therefore, we conducted an experiment in which we examined how pictures affected item solving. A total of N = 158 fourth‐grade students completed a physics knowledge test under one of six experimental conditions. The experimental conditions varied according to whether or not pictures were presented in the stem and in the answer options of the test items. The results showed that pictures in the stem and in the answer options increased the correctness with which students responded to the test items. This was particularly true for test items that required the application of relationships. In addition, response time was reduced when pictures were added to the answer options of the test items. Hence, pictures are an important feature of test items that produce changes in item processing. Copyright © 2011 John Wiley & Sons, Ltd.

10.
Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.
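
The sufficiency property stated for the two-category case can be checked numerically. The sketch below covers only the dichotomous Rasch model, not the paper's pairwise pseudo-likelihood for the polytomous model; the function names and item difficulties are hypothetical. It shows that the probability of a response pattern, conditional on the person's total score, is the same for every value of the person parameter.

```python
from itertools import product
from math import exp, isclose


def pattern_prob(x, theta, b):
    """P(response pattern x | ability theta) under the dichotomous Rasch model."""
    prob = 1.0
    for xi, bi in zip(x, b):
        p_correct = exp(theta - bi) / (1.0 + exp(theta - bi))
        prob *= p_correct if xi == 1 else 1.0 - p_correct
    return prob


def conditional_prob(x, theta, b):
    """P(pattern x | total score of x), by enumerating all patterns with that score."""
    score = sum(x)
    same_score = [y for y in product((0, 1), repeat=len(b)) if sum(y) == score]
    return pattern_prob(x, theta, b) / sum(pattern_prob(y, theta, b) for y in same_score)


# Hypothetical item difficulties for a three-item test.
b = [-0.5, 0.3, 1.2]
x = (1, 0, 1)

low = conditional_prob(x, -1.0, b)
high = conditional_prob(x, 2.0, b)
print(isclose(low, high))  # the conditional probability is free of theta
```

Because the person parameter cancels once the total score is fixed, item parameters can be estimated without knowing it; the abstract describes the mirror-image use, in which item parameters are conditioned out to estimate person parameters.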

11.
When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non‐normality for Yuen's two‐group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas can minimize the total cost, the total sample size, or the sum of total sample size and total cost. On the other hand, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, the present simulation applies Yuen's test to the sample generated, and then the procedure is validated in terms of Type I errors and power. Simulation results show that the proposed formulas can control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and future research are discussed.

12.
For multiple-choice tests where no a priori key exists, the initial selection of a key for maximum validity may be made on the basis of the number of persons choosing each alternative and their mean criterion score. The keying formula is derived. Once the initial keying has been done, further precision in keying and item selection may use, in addition, the mean total test score for persons choosing each alternative. Item-selection formulas suggested by Horst and by Gulliksen for maximizing test validity are both in the form of a ratio, an item-validity index divided by an item-reliability index. The formula derived here is shown to be equivalent to the numerators of these formulas. The expression in the denominators uses the total test score. Although a radical appears in the denominator of Horst's formula and not in the denominator of Gulliksen's formula, both of them select the same items in practice. The author gratefully acknowledges the suggestions and criticisms of Dr. Harold Gulliksen, Research Adviser at the Educational Testing Service.

13.
The construct validity of the short form of the Bruininks-Oseretsky Test of Motor Proficiency for the assessment of gross and fine motor skills was assessed in 377 nondisabled Greek preschool and primary school children (age range 5 yr. to 8:3 mo.) from urban areas of northern Greece. Analysis showed the three factors accounted for 54.1% of the total score variance, agreeing with the earlier findings. Moreover, the item scores had statistically significant relationships with the total short-form score, except for that of copying a circle with the preferred hand. This latter item was also the only one with a small effect size. Age had a statistically significant effect on the scores of half the items of the test battery, also an earlier finding. This test seemed to be a valid test of motor proficiency in normal Greek preschool and primary school children.

14.
The item recovery or reminiscence component of recall in RTT procedures was investigated in two free recall experiments. In the first, Erdelyi and Becker's (1974) "hypermnesia" effect was found with pictures as the to-be-remembered material: total amount recalled increased over two successive test trials, and included a large reminiscence effect, with some 27% of previously unrecalled items appearing in the second test. The second experiment, with word lists, showed that the frequency of occurrence of new items was greater following a 12-min separation of two test trials than in two relatively massed tests. This kind of item recovery is relevant to models of output interference and retrieval limitations in free recall, and may be also related to spontaneous recovery effects.

15.
For the tests in which the score on an item is not restricted to 0 and 1, but is any number on a continuous scale, a procedure for estimating an examinee's true score is given. For the case of 0, 1 item scoring this problem was considered by Lord [1959]. Following Lord, the least squares estimation procedure is used and the regression coefficient is obtained, which is compared with the generalized KR(20) and KR(21) formulas. Also, results are discussed using analysis of variance models. Now at Brooklyn College of the City University of New York.
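
As a reference point for the comparison mentioned in this abstract, the classical KR-20 and KR-21 coefficients for 0/1 items can be sketched as follows. This is only the familiar dichotomous case, not the generalized formulas or the regression estimate from the paper, whose exact form the abstract does not give. Function names and data are hypothetical, and population (divide-by-n) variances are assumed throughout.

```python
def _total_stats(responses):
    """Return (number of items, mean, population variance) of total scores."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n
    return k, mean, var


def kr20(responses):
    """KR-20: uses each item's proportion correct p_i and q_i = 1 - p_i."""
    k, _, var = _total_stats(responses)
    n = len(responses)
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        sum_pq += p * (1.0 - p)
    return k / (k - 1) * (1.0 - sum_pq / var)


def kr21(responses):
    """KR-21: needs only the mean and variance of total scores."""
    k, mean, var = _total_stats(responses)
    return k / (k - 1) * (1.0 - mean * (k - mean) / (k * var))


# Hypothetical data: 5 examinees, 6 dichotomous items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
print(round(kr20(responses), 4), round(kr21(responses), 4))
```

KR-21 treats all items as equally difficult, so it cannot exceed KR-20 and falls below it whenever item difficulties differ, as they do in this small example.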

16.
Hypermnesia is an increase in recall over repeated tests. A core issue is the role of repeated testing, per se, versus total retrieval time. Prior research implies an equivalence between multiple recall tests and a single test of equal total duration, but theoretical analyses indicate otherwise. Three experiments investigated this issue using various study materials (unrelated word lists, related word lists, and a short story). In the first experimental session, the study phase was followed by a series of short recall tests or by a single, long test of equal total duration. Two days later, participants took a final recall test. The multiple and single test conditions produced equivalent performance in the first session, but the multiple test group exhibited less forgetting and fewer item losses in the final test. In a fourth experiment, using a brief delay (15 min) between the recall sessions, the multiple recall condition produced greater hypermnesia as well as fewer item losses. In addition, final recall was significantly higher in the multiple than in the single test condition in three of the four experiments. Thus, single and repeated recall tests of equal total duration are not functionally equivalent, but rather produce differences observable in subsequent recall tests.

17.
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The objective was to provide bounds of the likely DIF effects on these measurement consequences. Five factors were manipulated: test length, percentage of DIF items per form, item type, sample size, and level of group ability difference. Results indicate that the greatest DIF effect was less than 2 points on the 0 to 60 total score scale and about 0.15 on the IRT ability scale. DIF had a limited effect on the ratio of true-score variance to observed-score variance, but its influence on the standard error of estimation for the IRT ability parameter was evident for certain ability values.

18.
A general linear latent trait model for continuous item responses is described. The special unidimensional case for continuous item responses is Jöreskog's (1971) model of congeneric item responses. In the context of the unidimensional case model for continuous item responses the concepts of item and test information functions, specific objectivity, item bias, and reliability are discussed; also the application of the model to test construction is shown. Finally, the correspondence with latent trait theory for dichotomous item responses is discussed.

19.
To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed‐length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.
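
The three quantities defined in this abstract can be computed directly from administered tests, as in the sketch below. It only implements the definitions as worded above and does not reproduce the analytical relationships derived in the paper. The function names and the four fixed-length tests are hypothetical, and item pooling is read here as the number of an examinee's items that at least one group member has also seen, which is an assumption about the intended definition.

```python
def exposure_rates(tests, pool_size):
    """Proportion of examinees who were administered each item in the pool."""
    counts = [0] * pool_size
    for test in tests:
        for item in test:
            counts[item] += 1
    return [c / len(tests) for c in counts]


def item_sharing(group):
    """Number of items common to every examinee in the group."""
    return len(set.intersection(*group))


def item_pooling(examinee, group):
    """Number of the examinee's items seen by at least one group member."""
    return len(examinee & set.union(*group))


# Hypothetical fixed-length tests (3 items each) from a pool of 5 items.
tests = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {0, 2, 4}]

print(exposure_rates(tests, pool_size=5))
print(item_sharing(tests))
print(item_pooling(tests[0], tests[1:]))
```

In this toy example every test contains item 2, so its exposure rate is 1.0 and it is the single item shared by the whole group, while the first examinee's entire test is covered once the other three pool their items.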

20.
This paper proposes an on‐line version of the Sympson and Hetter procedure with test overlap control (SHT) that can provide item exposure control at both the item and test levels on the fly without iterative simulations. The on‐line procedure is similar to the SHT procedure in that exposure parameters are used for simultaneous control of item exposure rates and test overlap rate. The exposure parameters for the on‐line procedure, however, are updated sequentially on the fly, rather than through iterative simulations conducted prior to operational computerized adaptive tests (CATs). Unlike the SHT procedure, the on‐line version can control item exposure rate and test overlap rate without time‐consuming iterative simulations even when item pools or examinee populations have been changed. Moreover, the on‐line procedure was found to perform better than the SHT procedure in controlling item exposure and test overlap for examinees who take tests earlier. Compared with two other on‐line alternatives, this proposed on‐line method provided the best all‐around test security control. Thus, it would be an efficient procedure for controlling item exposure and test overlap in CATs.
