Similar Documents
20 similar documents found.
1.
For multiple-choice tests where no a priori key exists, the initial selection of a key for maximum validity may be made on the basis of the number of persons choosing each alternative and their mean criterion score. The keying formula is derived. Once the initial keying has been done, further precision in keying and item selection may use, in addition, the mean total test score of the persons choosing each alternative. The item-selection formulas suggested by Horst and by Gulliksen for maximizing test validity both take the form of a ratio: an item-validity index divided by an item-reliability index. The formula derived here is shown to be equivalent to the numerators of these formulas. The expression in the denominators uses the total test score. Although a radical appears in the denominator of Horst's formula and not in the denominator of Gulliksen's formula, both select the same items in practice. The author gratefully acknowledges the suggestions and criticisms of Dr. Harold Gulliksen, Research Adviser at the Educational Testing Service.

2.
3.
Time-limit tests: estimating their reliability and degree of speeding (total citations: 7; self-citations: 0; citations by others: 7)
Non-spurious methods are needed for estimating the coefficient of equivalence for speeded tests from single-trial data. Spuriousness in a split-half estimate depends on three conditions; the split-half method may be used if any of these is shown to be absent. A lower-bound formula, r_c, is developed. An empirical trial of this coefficient and of other bounds proposed by Gulliksen demonstrates that, for moderately speeded tests, the coefficient of equivalence can be determined approximately from single-trial data. It is proposed that the degree to which tests are speeded be investigated explicitly, and an index is advanced to define this concept.

4.
Tucker LR. Psychometrika, 1949, 14(2): 117–119.
The Kuder-Richardson formula (20) is rewritten to be identical with the simplest formula, (21), except for the addition of a term involving the standard deviation, σ_p, of the item p's. If σ_p can be estimated, a rapid estimate of test reliability becomes possible that is superior to the simpler formula (21), which is used when only the number of items and the mean and standard deviation of test scores are known. Kuder, G. F., and Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151–160.

5.
It is shown that approaches other than the internal consistency method of estimating test reliability are either less satisfactory or lead to the same general results. The commonly attendant assumption of a single factor throughout the test items is challenged, however. Consideration of a test made up of K sub-tests, each composed of a different orthogonal factor, disclosed that the assumption of a single factor produced an erroneous estimate of reliability, with a ratio of (n–K)/(n–1) to the correct estimate. Special difficulties arising from this error when current techniques are applied to short tests or to test batteries are discussed. Application of the same multi-factor concept to item analysis discloses similar difficulties in that field. The item-test coefficient approaches 1/K as an upper limit rather than 1.00, and 1/n as a lower limit rather than .00. This latter finding accounts for an over-estimation error in the Kuder-Richardson formula (8). A new method of isolating sub-tests based upon the item-test coefficient is proposed and tentatively outlined. Either this new method or a complete factor analysis is regarded as the only proper approach to the problem of test reliability, and the item-subtest coefficient is similarly recommended as the proper approach for item analysis.

6.
Cronbach’s α is widely used in social science research to estimate the internal consistency reliability of a measurement scale. However, when items are not strictly parallel, Cronbach’s α provides a lower-bound estimate of true reliability, and this estimate may be further biased downward when items are dichotomous. The estimation of standardized Cronbach’s α for a scale with dichotomous items can be improved by using the upper bound of coefficient ϕ. This article provides SAS and SPSS macros that obtain standardized Cronbach’s α via this method. The simulation analysis showed that Cronbach’s α based on the upper bound of ϕ may be appropriate for estimating true reliability when standardized Cronbach’s α is problematic.

7.
Maximizing the discriminating power of a multiple-score test involves maximizing the homogeneity of each subtest and minimizing the correlations between subtests. A method is presented for constructing such tests from items whose intercorrelations are not too high. Under certain restrictions, the saturation, defined as the proportion of inter-item covariance to total variance, is maximized for each subtest. The nucleus of each subtest is three items with high covariances inter se. All items that would lower the saturation are discarded, and the one item that maximizes the saturation of the resulting test is added. This process is repeated until every item has been either included in or discarded from that subtest. If the correlation between any two such subtests approaches the geometric mean of their saturations, their items form a new pool for one or more subtests. Formulas are presented for deciding which items to eliminate in order to further reduce the correlations between subtests. This research was supported in part by the United States Air Force under Contract AF 33(038)-10588 with Human Resources Research Center, Lackland Air Force Base, San Antonio, Texas. Permission is granted for reproduction, translation, publication, use, and disposal in whole and in part by or for the United States Government.
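The greedy loop described above can be sketched as follows. This is a simplified reading of the abstract, assuming the item covariance matrix is known and taking saturation as (σ_X² − Σσ_i²)/σ_X², the share of total-score variance contributed by inter-item covariances. The "certain restrictions" and the between-subtest steps are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations


def saturation(cov, items):
    """Share of total-score variance contributed by inter-item covariances."""
    total = sum(cov[i][j] for i in items for j in items)
    item_var = sum(cov[i][i] for i in items)
    return (total - item_var) / total


def build_subtest(cov):
    """Greedy subtest construction from an item covariance matrix (a sketch)."""
    n = len(cov)
    # Nucleus: the three items whose pairwise covariances have the largest sum.
    nucleus = max(
        combinations(range(n), 3),
        key=lambda t: cov[t[0]][t[1]] + cov[t[0]][t[2]] + cov[t[1]][t[2]],
    )
    chosen = list(nucleus)
    pool = [i for i in range(n) if i not in chosen]
    while pool:
        current = saturation(cov, chosen)
        # Discard every remaining item whose addition would lower the saturation.
        pool = [i for i in pool if saturation(cov, chosen + [i]) >= current]
        if not pool:
            break
        # Add the single item that maximizes the saturation of the enlarged test.
        best = max(pool, key=lambda i: saturation(cov, chosen + [i]))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

With four items sharing covariance 0.5 and a fifth uncorrelated item (unit variances throughout), the sketch keeps the four correlated items and discards the fifth, because adding it to the three-item nucleus would drop the saturation from 0.5 to about 0.43.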

8.
Rae G. Psychological Methods, 2007, 12(2): 177–184.
The relationship between stratified alpha (alpha(s)) and the reliability of a test composed of interrelated nonhomogeneous items is examined. It is mathematically demonstrated that when there is congeneric equivalence within the strata or subtests, the difference between the coefficients is a function of the variances of the loadings within the strata. When the items within each stratum are essentially tau equivalent, these variances are 0, and alpha(s) and true reliability are equal, provided errors of measurement are uncorrelated. If errors of measurement are positively correlated and there is essential tau equivalence within strata, stratified alpha will overestimate reliability. These findings indicate that recent studies involving stratified alpha (A. Kamata, A. Turhan, & E. Darandari, 2003; H. G. Osburn, 2000) need to be interpreted with some caution. Nevertheless, the hypothetical population data presented in this article suggest that under certain circumstances, stratified alpha can be considerably greater than alpha and closer to the true reliability. Because stratified alpha is easily computed, it is recommended that, with stratified tests, practicing researchers routinely calculate both alpha and stratified alpha coefficients.
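Stratified alpha combines subtest alphas as α_s = 1 − Σ_j σ_j²(1 − α_j)/σ_X², where σ_j² and α_j are the variance and coefficient alpha of stratum j and σ_X² is the total-score variance. The sketch below computes both coefficients from an item covariance matrix; the toy matrix in the note after it is invented for illustration, and the function names are hypothetical.

```python
def alpha(cov, items):
    """Coefficient alpha for the listed items, from their covariance matrix."""
    k = len(items)
    total = sum(cov[i][j] for i in items for j in items)
    item_var = sum(cov[i][i] for i in items)
    return k / (k - 1) * (1 - item_var / total)


def stratified_alpha(cov, strata):
    """alpha_s = 1 - sum_j var_j * (1 - alpha_j) / var_X."""
    all_items = [i for s in strata for i in s]
    var_x = sum(cov[i][j] for i in all_items for j in all_items)
    error_part = 0.0
    for s in strata:
        # Variance of the stratum subtotal, weighted by its estimated unreliability.
        var_s = sum(cov[i][j] for i in s for j in s)
        error_part += var_s * (1 - alpha(cov, s))
    return 1 - error_part / var_x
```

For two strata of three items (unit item variances, within-stratum covariance 0.5, between-strata covariance 0.1), the items in each stratum are essentially tau equivalent with error variance 0.5, so true reliability is 1 − 3/13.8 ≈ 0.783. Stratified alpha reproduces that value, while ordinary alpha gives about 0.678, in line with the abstract's result for uncorrelated errors.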

9.
The conventional scoring formula to correct for guessing is derived and compared with a regression method for scoring recently proposed by Hamilton. It is shown that the usual formula, S = R − W/(n − 1), where R and W are the numbers of right and wrong answers and n is the number of choices per item, yields a close approximation (correct within one point) to the maximum-likelihood estimate of an individual's true score on the test, if we assume that the individual either knows or does not know the answer to each item, that guessing on unknown items is random, and that success at guessing is governed by the binomial law. It is also shown that the usual scoring formula yields an unbiased estimate of the individual's true score, when the true score is defined as the mean score over an indefinitely large number of independent attempts at the test or at equivalent (parallel) tests.
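The formula score, and the unbiasedness property stated above, can be checked with a short sketch. It assumes, as the abstract does, that an examinee either knows an item or guesses among the n options at random; `known`, `n_items`, and `n_options` are illustrative names, and omitted items are treated as neither right nor wrong.

```python
from math import comb


def formula_score(rights, wrongs, n_options):
    """Conventional correction for guessing: S = R - W / (n - 1)."""
    if n_options < 2:
        raise ValueError("each item needs at least two options")
    return rights - wrongs / (n_options - 1)


def expected_formula_score(known, n_items, n_options):
    """Exact expected formula score when `known` items are answered correctly
    and every remaining item is guessed uniformly at random (binomial luck)."""
    guessed = n_items - known
    p = 1 / n_options
    expected = 0.0
    for lucky in range(guessed + 1):
        prob = comb(guessed, lucky) * p**lucky * (1 - p) ** (guessed - lucky)
        expected += prob * formula_score(known + lucky, guessed - lucky, n_options)
    return expected
```

For example, 40 right and 8 wrong on five-option items gives a formula score of 38, and an examinee who knows 30 of 50 items has an expected formula score of exactly 30.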

10.
Johnson HG. Psychometrika, 1950, 15(2): 115–119.
Evidence is cited to show that specificity, or lack of equivalence, in the comparable forms of tests has a tendency to lower the value of reliability coefficients but has no tendency to lower the value of observed trait coefficients. This implies that the greater the lack of equivalence, the higher will be coefficients corrected for attenuation. Errors of measurement are supposed to reduce the magnitude of observed trait coefficients. Since specificity does not lower the correlation between two tests and since the split-half and equivalent-form reliability coefficients treat specificity as error, it follows that these two coefficients cannot legitimately be used in Spearman's correction-for-attenuation formula.
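For reference, Spearman's correction divides the observed correlation by the geometric mean of the two reliabilities. The sketch below, with made-up coefficients in the note after it, shows the consequence the abstract points to: a reliability estimate that counts specificity as error is lower, and a lower reliability inflates the corrected coefficient. The function name is hypothetical.

```python
from math import sqrt


def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)
```

With an observed r of .42, reliabilities of .70 and .80 give a corrected value of about .56, while reliabilities of .60 and .70 push it to about .65.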

11.
Equivalence tests are an alternative to traditional difference‐based tests for demonstrating a lack of association between two variables. While there are several recent studies investigating equivalence tests for comparing means, little research has been conducted on equivalence methods for evaluating the equivalence or similarity of two correlation coefficients or two regression coefficients. The current project proposes novel tests for evaluating the equivalence of two regression or correlation coefficients derived from the two one‐sided tests (TOST) method (Schuirmann, 1987, J. Pharmacokinet. Biopharm, 15, 657) and an equivalence test by Anderson and Hauck (1983, Stat. Commun., 12, 2663). A simulation study was used to evaluate the performance of these tests and compare them with the common, yet inappropriate, method of assessing equivalence using non‐rejection of the null hypothesis in difference‐based tests. Results demonstrate that equivalence tests have more accurate probabilities of declaring equivalence than difference‐based tests. However, equivalence tests require large sample sizes to ensure adequate power. We recommend the Anderson–Hauck equivalence test over the TOST method for comparing correlation or regression coefficients.
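A minimal TOST sketch for two independent correlations, working on Fisher's z scale with the usual standard error √(1/(n₁ − 3) + 1/(n₂ − 3)), is shown below. The equivalence margin `delta` is expressed on the z scale here, which is an assumption of this sketch rather than the article's specification; the Anderson–Hauck test the authors recommend is not implemented, and the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def tost_correlations(r1, n1, r2, n2, delta, alpha=0.05):
    """Two one-sided tests for the difference of two independent correlations.

    Works on Fisher's z scale; `delta` is the equivalence margin on that
    scale (an assumption of this sketch). Returns (p_value, equivalent).
    """
    d = atanh(r1) - atanh(r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    phi = NormalDist().cdf
    p_lower = 1 - phi((d + delta) / se)  # tests H0: d <= -delta
    p_upper = phi((d - delta) / se)      # tests H0: d >= +delta
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha
```

With r = .30 and .31 and a margin of 0.1, the sketch declares equivalence at n = 1000 per group (p ≈ .02) but not at n = 50 per group (p ≈ .33), which illustrates the sample-size demands noted in the abstract.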

12.
For (0, 1) scored multiple-choice tests, a formula giving test reliability as a function of the number of item options is derived, assuming the knowledge or random guessing model, the parallelism of the new and old tests (apart from the guessing probability), and the assumptions of classical test theory. It is shown that the formula is a more general case of an equation by Lord, and reduces to Lord's equation if the items are effectively parallel. Further, the formula is shown to be closely related to another formula derived from Lord's randomly parallel tests model.

13.
A sizeable amount of research literature has failed to demonstrate a stable relationship between self-report and the Rorschach (Exner, 1993). However, principal component first-factor related test-interaction style has been shown to moderate convergence. In this study 78 psychiatric patients completed the Rorschach and the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Hathaway & McKinley, 1989). Across all patients, MMPI-2 and Rorschach indexes measuring similar constructs were practically uncorrelated. Patients with similar test-interaction styles demonstrated positive intermethod correlations between both conceptually related and conceptually not directly related test indexes. The same scales were negatively correlated in patients with discordant test-interaction styles, and this difference between test-interaction style groups was significant. It is suggested that first-factor related test-interaction style moderates convergence. It is further suggested that test-interaction style moderates convergence between both conceptually related and conceptually not directly related measures of distress or psychopathology.

15.
The theory of the estimation of test reliability (total citations: 13; self-citations: 0; citations by others: 13)
The theoretically best estimate of the reliability coefficient is stated in terms of a precise definition of the equivalence of two forms of a test. Various approximations to this theoretical formula are derived, with reference to several degrees of completeness of information about the test and to special assumptions. The familiar Spearman-Brown formula is shown to be a special case of the general formulation of the problem of reliability. Reliability coefficients computed in various ways are presented for comparison.
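The Spearman-Brown formula mentioned above gives the reliability of a test whose length is changed by a factor k with parallel items, ρ_k = kρ/(1 + (k − 1)ρ); the usual split-half case is k = 2. A sketch, together with its inverse for the lengthening factor needed to reach a target reliability (function names are illustrative):

```python
def spearman_brown(rho, k):
    """Reliability after changing test length by factor k with parallel parts."""
    return k * rho / (1 + (k - 1) * rho)


def length_factor(rho, target):
    """Lengthening factor k that takes reliability rho to the target value.

    Obtained by solving target = k * rho / (1 + (k - 1) * rho) for k.
    """
    return target * (1 - rho) / (rho * (1 - target))
```

For instance, a half-test correlation of .50 steps up to a full-test estimate of about .67, and a test with reliability .70 must be about 3.9 times as long to reach .90.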

16.
Despite a growing body of applied research on using the Internet for some human resource management practices, few studies have provided equivalence information or practical lessons concerning selection testing via the Internet. We identify several issues associated with measurement and validity, the role of several individual characteristics, respondents' reactions and behaviors, and other considerations concerning Internet test administration. We also report results from an exploratory study of the correlation between paper‐and‐pencil and Internet‐administered cognitively oriented selection tests (including timed and untimed, proctored tests). Our empirical results suggest modest degrees of cross‐mode equivalence for an untimed situational judgment test (r = .84) and for a timed cognitive ability test (r = .60). Further, some types of items (math, verbal, spatial) in the timed cognitive ability test seem to play a differential role in the reduced cross‐mode equivalence. New issues regarding the perception of, and reaction to, items presented via the Internet are presented, and a variety of practical issues are derived and discussed.

17.
Computerized adaptive testing (CAT) was originally proposed to measure θ, usually a latent trait, with greater precision by sequentially selecting items according to the student’s responses to previously administered items. Although the application of CAT is promising for many educational testing programs, most current CAT systems were not designed to provide diagnostic information. This article discusses item selection strategies specifically tailored for cognitive diagnostic tests. Our goal is to identify an effective item selection algorithm that not only estimates θ efficiently but also classifies the student’s knowledge status α accurately. A single-stage item selection method with a dual purpose is introduced. The main idea is to treat diagnostic criteria as constraints: by using the maximum priority index method to meet these constraints, the CAT system is able to generate cognitive diagnostic feedback in a fairly straightforward fashion. Different priority functions are proposed. Some are based on information measures such as Kullback–Leibler information, and others use only the information provided by the Q-matrix. An extensive simulation study is conducted, and the results indicate that the information-based method not only yields higher classification rates for cognitive diagnosis but also achieves more accurate θ estimation. Other constraint controls, such as item exposure rates, are also considered for all the competing methods.
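The maximum priority index idea can be sketched as follows, using the general form PI_i = I_i × Π_k (w_k f_k)^{c_ik}, where I_i is an information value for item i (for example, a Kullback–Leibler index), c_ik = 1 if item i counts toward constraint k, and f_k = (quota_k − used_k)/quota_k. Treating every constraint as an upper-bound quota is a simplification assumed here; the abstract does not give the article's exact priority functions, and all names are hypothetical.

```python
def priority_index(info, constraints, quotas, used, weights=None):
    """Priority of one item: info * product over its constraints of w_k * f_k,
    with f_k = (quota_k - used_k) / quota_k (the unused share of quota k)."""
    pi = info
    for k in constraints:  # only constraints with c_ik = 1 contribute a factor
        w = 1.0 if weights is None else weights[k]
        f = (quotas[k] - used[k]) / quotas[k]
        pi *= w * f
    return pi


def select_item(pool, quotas, used, weights=None):
    """Return the id of the item with the largest priority index.

    `pool` maps item id -> (information value, list of constraint keys).
    """
    return max(
        pool,
        key=lambda i: priority_index(pool[i][0], pool[i][1], quotas, used, weights),
    )
```

An item whose constraint is already exhausted gets f_k = 0 and therefore priority 0, so the selector moves on to items whose constraints still have room, even if their information value is lower.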

18.
Item-analysis data are usually obtained from a single test administration, with a given item sequence and time limit. This raises the question of how changes in item position and test timing affect item data. In this study, two forms of a verbal test and two forms of a mathematics test were used. In each case, both forms contained the same items, but items appearing early in one form were placed late in the other. Each form was administered once with a short time limit and once with generous timing to comparable groups of high school students. The relationships among various speed and power scores were determined, and the changes that occurred during the added time were studied. Values of the item indices p (proportion right), Δ (another difficulty index), and the item-test biserial correlation coefficient were obtained under both the speed and the power conditions and were systematically compared. The proportion right among those attempting the item, the Δ index, and the biserial r were all found to have undesirable characteristics for items appearing late in a speeded test. The author gratefully acknowledges the suggestions and criticisms of Dr. Harold Gulliksen, Research Adviser at the Educational Testing Service.
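The gap between p computed over all examinees and p computed over those who attempted the item is easy to see in a sketch. Δ is taken here to be the ETS delta scale, Δ = 13 + 4z with z the normal deviate that cuts off proportion p in the upper tail; that this is the index used in the study is an assumption. Not-reached responses are coded as None, and the function name is hypothetical.

```python
from statistics import NormalDist


def item_indices(responses):
    """Difficulty indices for one item.

    responses: 1 (right), 0 (wrong) or None (not reached / omitted), one per examinee.
    Returns (p over all examinees, p among those attempting, ETS-style delta).
    """
    attempted = [r for r in responses if r is not None]
    right = sum(attempted)
    p = right / len(responses)
    p_attempting = right / len(attempted) if attempted else float("nan")
    # Delta = 13 + 4z, z being the normal deviate with upper-tail area p.
    delta = 13 + 4 * NormalDist().inv_cdf(1 - p) if 0 < p < 1 else float("nan")
    return p, p_attempting, delta
```

For a late item that 12 of 20 examinees did not reach, 6 right and 2 wrong gives p = .30 (Δ ≈ 15.1) but a proportion right among those attempting of .75, so the two indices tell very different stories about the item's difficulty.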

19.
Sampling fluctuations resulting from the sampling of test items rather than of examinees are discussed. It is shown that the Kuder-Richardson reliability coefficients actually are measures of this type of sampling fluctuation. Formulas for certain standard errors are derived; in particular, a simple formula is given for the standard error of measurement of an individual examinee's score. A common misapplication of the Wilks-Votaw criterion for parallel tests is pointed out. It is shown that the Kuder-Richardson formula-21 reliability coefficient should be used instead of the formula-20 coefficient in certain common practical situations. Most of the work reported here was carried out under contract with the Office of Naval Research. The writer is indebted to Professor S. S. Wilks, who has checked over certain critical portions of a draft of this paper.
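Under the item-sampling (binomial) error model, the standard error of measurement for a number-right score x on n items is commonly written √(x(n − x)/(n − 1)); treating this as the "simple formula" of the abstract is an assumption. Unlike a single test-wide SEM, it varies with the examinee's score. The function name below is hypothetical.

```python
from math import sqrt


def binomial_sem(x, n):
    """Standard error of measurement for number-right score x on n items
    under the binomial (item-sampling) error model: sqrt(x(n - x)/(n - 1))."""
    if not 0 <= x <= n or n < 2:
        raise ValueError("need 0 <= x <= n and n >= 2")
    return sqrt(x * (n - x) / (n - 1))
```

On a 40-item test this is 0 at the extremes, symmetric about the midpoint, and largest at x = 20 (about 3.2 points).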

20.
The relation between item difficulty distributions and the validity and reliability of tests is computed through use of normal correlation surfaces for varying numbers of items and varying degrees of item intercorrelations. Optimal or near optimal item difficulty distributions are thus identified for various possible item difficulty distributions. The results indicate that, if a test is of conventional length, is homogeneous as to content, and has a symmetrical distribution of item difficulties, correlation with a normally distributed perfect measure of the attribute common to the items does not vary appreciably with variation in the item difficulty distribution. Greater variation was evident in correlation with a second duplicate test (reliability). The general implications of these findings and their particular significance for evaluating techniques aimed at increasing reliability are considered.
