Similar Articles
20 similar articles found.
1.
Coefficient alpha and the internal structure of tests (cited 135 times: 0 self-citations, 135 by others)
A general formula (α), of which a special case is the Kuder-Richardson coefficient of equivalence, is shown to be the mean of all split-half coefficients resulting from different splittings of a test. α is therefore an estimate of the correlation between two random samples of items from a universe of items like those in the test. α is found to be an appropriate index of equivalence and, except for very short tests, of the first-factor concentration in the test. Tests divisible into distinct subtests should be so divided before using the formula. The index r̄_ij (the mean inter-item correlation), derived from α, is shown to be an index of inter-item homogeneity. Comparison is made to the Guttman and Loevinger approaches. Parallel-split coefficients are shown to be unnecessary for tests of common types. In designing tests, maximum interpretability of scores is obtained by increasing the first-factor concentration in any separately scored subtest and avoiding substantial group-factor clusters within a subtest. Scalability is not a requisite. The assistance of Dora Damrin and Willard Warrington is gratefully acknowledged. Miss Damrin took major responsibility for the empirical studies reported. This research was supported by the Bureau of Research and Service, College of Education.
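A minimal numerical check of the central identity, using simulated scores on six hypothetical items (all names and values here are illustrative, not from the paper): α computed from the item covariance matrix coincides with the mean of the split-half coefficients taken over all divisions into equal halves.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# 6 hypothetical items sharing one common factor.
X = rng.normal(size=(500, 6)) + rng.normal(size=(500, 1))
C = np.cov(X, rowvar=False)
n = C.shape[0]

alpha = n / (n - 1) * (1 - np.trace(C) / C.sum())

# Split-half coefficient for halves A, B: 4*cov(A_total, B_total) / var(total).
halves = []
for A in itertools.combinations(range(n), n // 2):
    B = [j for j in range(n) if j not in A]
    sA, sB = X[:, list(A)].sum(axis=1), X[:, B].sum(axis=1)
    halves.append(4 * np.cov(sA, sB)[0, 1] / (sA + sB).var(ddof=1))

print(alpha, np.mean(halves))  # identical up to floating-point error
```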

2.
A technique is indicated by which approximations to the factor loadings of a new test may be obtained if factor loadings of a given group of tests and the correlations of the new test with the other tests are known. The technique is applicable to any orthogonal system and is especially adapted to cases in which a_ji a_jk = 0 when i ≠ k. Application is also made to the simultaneous determination of the factor weights of a group of tests in which no additional common factor is present. The technique is useful in adding tests to a completed factorial solution and in using factorial solutions involving errors to give results which are approximately correct.
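A sketch of the kind of computation described, with made-up loadings and correlations: the least-squares approximation to the new test's loadings f solves A f ≈ r, where A holds the battery's known loadings and r the new test's correlations with the battery; when the columns of A are mutually orthogonal (AᵀA diagonal), the solution separates factor by factor.

```python
import numpy as np

# Hypothetical orthogonal loading matrix for 5 tests on 2 factors.
A = np.array([[0.8, 0.0],
              [0.7, 0.0],
              [0.6, 0.0],
              [0.0, 0.7],
              [0.0, 0.6]])

# Correlations of the new test with the 5 battery tests (made-up values).
r = np.array([0.40, 0.35, 0.30, 0.28, 0.24])

# General least-squares approximation to the new test's loadings.
f, *_ = np.linalg.lstsq(A, r, rcond=None)

# Closed form when the loading columns are orthogonal.
f_orth = (A * r[:, None]).sum(axis=0) / (A ** 2).sum(axis=0)

print(f, f_orth)  # identical here because A'A is diagonal
```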

3.
It is shown that approaches other than the internal consistency method of estimating test reliability are either less satisfactory or lead to the same general results. The commonly attendant assumption of a single factor throughout the test items is challenged, however. The consideration of a test made up of K sub-tests, each composed of a different orthogonal factor, disclosed that the assumption of a single factor produced an erroneous estimate of reliability with a ratio of (n − K)/(n − 1) to the correct estimate. Special difficulties arising from this error in application of current techniques to short tests or to test batteries are discussed. Application of this same multi-factor concept to item analysis discloses similar difficulties in that field. The item-test coefficient approaches 1/√K as an upper limit rather than 1.00 and approaches 1/√n as a lower limit rather than .00. This latter finding accounts for an over-estimation error in the Kuder-Richardson formula (8). A new method of isolating sub-tests based upon the item-test coefficient is proposed and tentatively outlined. Either this new method or a complete factor analysis is regarded as the only proper approach to the problem of test reliability, and the item-subtest coefficient is similarly recommended as the proper approach for item analysis.
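A sketch checking the quoted ratio on a synthetic structure (my construction, not the paper's data): n items in K orthogonal subtests, with correlation rho between items of the same subtest and zero across subtests. The internal-consistency estimate that ignores the multi-factor structure recovers the correct reliability only up to the factor (n − K)/(n − 1).

```python
import numpy as np

n, K, rho = 12, 3, 0.5            # hypothetical test: 12 items, 3 orthogonal subtests
m = n // K                        # items per subtest

# Block-diagonal item correlation matrix: rho within a subtest, 0 across subtests.
R = np.zeros((n, n))
for b in range(K):
    R[b*m:(b+1)*m, b*m:(b+1)*m] = rho
np.fill_diagonal(R, 1.0)

alpha = n / (n - 1) * (1 - np.trace(R) / R.sum())   # single-factor estimate
correct = m * rho / (1 + (m - 1) * rho)             # reliability respecting the subtests

print(alpha / correct, (n - K) / (n - 1))           # both = 9/11 = 0.818...
```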

4.
A method is derived for finding the average Spearman rank correlation coefficient ρ̄_av of N sets of ranks with a single dependent or criterion ranking of n items without computing any of the individual coefficients. Procedures for calculating the exact distribution of ρ̄_av for small values of N and n are described for the null case. The first four moments about zero of this distribution are derived, and it is concluded that for samples as small as N = 4 and n = 4 the normal distribution can be used safely in testing the hypothesis ρ̄_av = 0. This problem first came to the writer's attention in discussions with Dr. Dean J. Clyde.
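One way to realize the shortcut (a sketch with simulated rankings): each individual coefficient is ρ_i = 1 − 6S_i/(n³ − n), with S_i the sum of squared rank differences from the criterion, so the average over N sets needs only the pooled sum of squared differences, never the individual coefficients.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
N, n = 8, 10
criterion = rng.permutation(n) + 1                            # criterion ranking of n items
sets = np.array([rng.permutation(n) + 1 for _ in range(N)])   # N judges' rankings

# Average rho from the pooled sum of squared rank differences alone.
S_total = ((sets - criterion) ** 2).sum()
rho_av = 1 - 6 * (S_total / N) / (n ** 3 - n)

# Check against the mean of the individually computed coefficients.
rho_mean = np.mean([spearmanr(row, criterion)[0] for row in sets])
print(rho_av, rho_mean)   # equal (no ties in these rankings)
```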

5.
A table is developed and presented to facilitate the computation of the Pearson Q₃ (cosine method) estimate of the tetrachoric correlation coefficient. Data are presented concerning the accuracy of Q₃ as an estimate of the tetrachoric correlation coefficient, and it is compared with the results obtainable from the Chesire, Saffir, and Thurstone tables for the same four-fold frequency tables. The authors are indebted to Mr. John Scott, Chief of the Test Development Section of the U.S. Civil Service Commission, for his encouragement, and to Miss Elaine Ambrifi and Mrs. Elaine Nixon for the large amount of computational work involved in this paper.
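For orientation, a sketch of the cosine-method estimate in the form usually quoted (the paper's table serves hand computation of the same quantity; the exact conventions here are assumptions): for a fourfold table with concordant frequencies a, d and discordant frequencies b, c, Q₃ = cos(π / (1 + √(ad/bc))).

```python
import numpy as np

def q3_cosine(a, b, c, d):
    """Pearson Q3 (cosine method) estimate of tetrachoric r for a
    fourfold table [[a, b], [c, d]] with a, d the concordant cells."""
    if b * c == 0:
        return 1.0               # no discordant pairs -> estimate +1
    if a * d == 0:
        return -1.0              # no concordant pairs -> estimate -1
    return np.cos(np.pi / (1 + np.sqrt((a * d) / (b * c))))

# Hypothetical fourfold frequencies.
print(q3_cosine(40, 10, 10, 40))   # strong positive association, ~0.81
print(q3_cosine(25, 25, 25, 25))   # independence -> cos(pi/2) = 0
```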

6.
Item-analysis data are usually obtained from a single test administration, with a given item sequence and time limit. Questions can be raised as to the effects upon item data resulting from changes in item position and test timing. In this study, two forms of a verbal test and two forms of a mathematics test were used. In each case, both forms of each test contained the same items, but items coming early in one form were placed late in the other. Each of these forms was administered once with a short time limit and once with generous timing to comparable groups of high school students. The relationships of various speed and power scores were determined, and the changes which occurred during the added time were studied. Values of the item indices p (proportion right), Δ (another difficulty index), and the item-test biserial correlation coefficient were obtained for both the speed and the power conditions and were systematically compared. The proportion right among those attempting the item, the Δ index, and the biserial r were all found to have undesirable characteristics for items appearing late in a speeded test. The author gratefully acknowledges the suggestions and criticisms of Dr. Harold Gulliksen, Research Adviser at the Educational Testing Service.
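As background, a sketch (with fabricated responses) of two of the indices compared: p, the proportion right, and the item-test biserial correlation via the standard normal-ordinate formula r_bis = (M₁ − M₀)·p·q / (y·s_t). The data-generating choices are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
theta = rng.normal(size=1000)                              # latent ability
total = theta + rng.normal(scale=0.5, size=1000)           # total test score
item = (theta + rng.normal(size=1000) > 0.3).astype(int)   # one hypothetical item

p = item.mean()                     # proportion right
q = 1 - p
y = norm.pdf(norm.ppf(p))           # normal ordinate at the p split
r_bis = (total[item == 1].mean() - total[item == 0].mean()) * p * q / (y * total.std())

print(f"p = {p:.3f}, biserial r = {r_bis:.3f}")
```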

7.
The validity of a univocal multiple-choice test is determined for varying distributions of item difficulty and varying degrees of item precision. Validity is a function of d² + v², where d measures item unreliability and v measures the spread of item difficulties. When this variance is very small, validity is high for one optimum cutting score, but the test gives relatively little valid information for other cutting scores. As this variance increases, eta increases up to a certain point, and then begins to decrease. Screening validity at the optimum cutting score declines as this variance increases, but the test becomes much more flexible, maintaining the same validity for a wide range of cutting scores. For items of the type ordinarily used in psychological tests, the test with uniform item difficulty gives greater over-all validity, and superior validity for most cutting scores, compared to a test with a range of item difficulties. When a multiple-choice test is intended to reject the poorest F per cent of the men tested, items should on the average be located at or above the threshold for men whose true ability is at the Fth percentile. This research was performed under contract Nop 536 with the Bureau of Naval Personnel, and received additional support from the Bureau of Research and Service, College of Education, University of Illinois.

8.
Experimental studies of the successive changes (frequently represented by curves describing laws of learning and other similar functional relationships) in a criterion variable accompanying experimental variations in a given treatment, and experimental comparisons of such changes for different populations or for different treatments, constitute a large and important class of psychological experiments. In most such experiments, no attempt has been made to analyze or to make allowance for errors of sampling or of observation. In many others, the techniques of error analysis that have been employed have been inefficient, inexact, or inappropriate. This paper suggests tests, using the methods of analysis of variance, of certain hypotheses concerning trends and trend differences in sample means obtained in experiments of this general type. For means of successive independent samples, tests are provided of the hypotheses: (H1) that there is no trend, or that the trend is a horizontal straight line, (H3) that there is a linear trend, (H5) that the trend is as described by a line not derived from the observed means, and (H7) that the trend is as described by a line fitted to the observed means. Tests are also provided of similar hypotheses (H2, H4, H6, and H8, respectively) for means of successive measurements of the same sample. Finally, tests are provided of the null hypotheses that there is no difference in trend in two series of means: (H9) when each mean in each series is based on an independent sample, (H10) when each pair of corresponding means is based on an independent sample, (H11) when each series of means is based on an independent sample, and (H12) when both series are based on a single sample.
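A sketch of one member of this family for independent samples, in the spirit of H3 (is there a linear trend in successive sample means?), using the usual single-degree-of-freedom contrast partition of the between-groups sum of squares; the group means and sizes are invented.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
# Successive independent samples (e.g., trials 1..4), with a built-in linear trend.
groups = [rng.normal(loc=mu, size=30) for mu in (1.0, 1.4, 1.8, 2.2)]

k = len(groups)
ns = np.array([len(g) for g in groups])
means = np.array([g.mean() for g in groups])
c = np.arange(k) - (k - 1) / 2                 # linear contrast coefficients

ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = ns.sum() - k
mse = ss_within / df_within

ss_linear = (c @ means) ** 2 / (c ** 2 / ns).sum()   # 1-df contrast sum of squares
F = ss_linear / mse
p_value = f_dist.sf(F, 1, df_within)
print(f"F(1, {df_within}) = {F:.2f}, p = {p_value:.4g}")
```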

9.
Two new methods of item analysis are described. One involves the computation of the φ coefficient (correlation of a fourfold point distribution) and the other involves chi square. The only data required are the proportions of passing individuals in the upper and lower criterion groups, for the determination of φ, and in addition, N, for the determination of chi square. Abacs are presented for graphic solution of the two indices of validity, and tests of significance are provided.
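A sketch of both indices from exactly the data the abstract says are required, under the assumption (mine) of equal-sized upper and lower criterion groups: φ then reduces to (p_u − p_l)/√((p_u + p_l)(2 − p_u − p_l)), and χ² = Nφ².

```python
import numpy as np
from scipy.stats import chi2

def item_phi(p_upper, p_lower, N):
    """Fourfold point (phi) coefficient from the proportions passing the item
    in equal-sized upper and lower criterion groups, plus chi square."""
    phi = (p_upper - p_lower) / np.sqrt(
        (p_upper + p_lower) * (2 - p_upper - p_lower))
    chi_sq = N * phi ** 2
    p_value = chi2.sf(chi_sq, df=1)
    return phi, chi_sq, p_value

print(item_phi(0.8, 0.4, N=100))  # phi ~ 0.41, chi2 ~ 16.7, p < .001
```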

10.
Determining a lack of association between an outcome variable and a number of different explanatory variables is frequently necessary in order to disregard a proposed model (i.e., to confirm the lack of a meaningful association between an outcome and predictors). Despite this, the literature rarely offers information about, or technical recommendations concerning, the appropriate statistical methodology to be used to accomplish this task. This paper introduces non-inferiority tests for ANOVA and linear regression analyses, which correspond to the standard, widely used F tests for η² and R², respectively. A simulation study is conducted to examine the Type I error rates and statistical power of the tests, and a comparison is made with an alternative Bayesian testing approach. The results indicate that the proposed non-inferiority test is a potentially useful tool for 'testing the null'.
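A sketch of how such a 'testing the null' procedure can be framed for one-way ANOVA (my reconstruction of the standard noncentral-F logic; the paper's exact statistic may differ): test H0: f ≥ f0 against H1: f < f0, where f0 is the largest effect size still deemed negligible, rejecting H0 when the ordinary F statistic falls below the α quantile of the noncentral F distribution with noncentrality N·f0².

```python
import numpy as np
from scipy.stats import f_oneway, ncf

rng = np.random.default_rng(4)
groups = [rng.normal(loc=0.0, size=50) for _ in range(3)]  # truly equal means

alpha, f0 = 0.05, 0.25          # f0: equivalence margin on Cohen's f scale
k = len(groups)
N = sum(len(g) for g in groups)

F_obs = f_oneway(*groups).statistic
crit = ncf.ppf(alpha, dfn=k - 1, dfd=N - k, nc=N * f0 ** 2)

print(f"F = {F_obs:.3f}, critical value = {crit:.3f}")
print("conclude negligible effect" if F_obs < crit else "cannot conclude negligibility")
```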

11.
It is assumed that a battery of n tests has been resolved into components in a common factor space of r dimensions and a unique factor space of at most n dimensions, where r is much less than n. Simplified formulas for ordinary multiple and partial correlation of tests are derived directly in terms of the components. The best (in the sense of least squares) linear regression equations for predicting factor scores from test scores are derived also in terms of the components. Spearman's single-factor prediction formulas emerge as special cases. The last part of the paper shows how the communality is an upper bound for multiple correlation. A necessary and sufficient condition is established for the square of the multiple correlation coefficient of test j on the remaining n − 1 tests to approach the communality of test j as a limit as n increases indefinitely while r remains constant. Limits are established for partial correlation and regression coefficients and for the prediction of factor scores. I am indebted to Professor Dunham Jackson for helpful criticism of most of this paper.
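A sketch of the communality bound on a synthetic single-factor battery: the squared multiple correlation of test j on the others is 1 − 1/r^jj, with r^jj the j-th diagonal element of R⁻¹, and it approaches the communality from below as n grows while r stays fixed.

```python
import numpy as np

def smc(R):
    """Squared multiple correlations of each variable on all the others."""
    return 1 - 1 / np.diag(np.linalg.inv(R))

for n in (4, 10, 40):
    loading = np.full((n, 1), 0.7)          # one common factor, loadings 0.7
    R = loading @ loading.T
    np.fill_diagonal(R, 1.0)
    print(n, smc(R)[0], "communality =", 0.7 ** 2)  # SMC rises toward 0.49
```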

12.
Pen-based computers are similar to paper and pencil (P&P) tests in the method of responding, and thus, may more closely match paper and pencil administration in construct equivalence than keyboard-entry computers. A study was conducted comparing P&P, pen-based notebook computer, and keyboard-entry PC versions of two test batteries. Participants completed tests administered using different administration modes on separate days; construct equivalence was evaluated by comparing Day 1–Day 2 correlations across conditions. Although construct equivalence was found for the power tests, differences emerged for the speeded tests. For the pen-based computer, solid evidence of equivalence to P&P appeared for all but one of the speeded tests, whereas the keyboard PC showed borderline equivalence for only one of the three speeded tests. These findings suggested that the pen-entry computer may be more capable than the keyboard-entry computer in maintaining construct equivalence to P&P tests.

13.
Use of the same term split-half for division of an n-item test into two subtests containing equal [Cronbach], and possibly unequal [Guttman], numbers of items sometimes leads to a misunderstanding about the relation between Guttman's maximum split-half bound and Cronbach's coefficient alpha. Coefficient alpha is the average of split-half bounds in the Cronbach sense and so is not larger than the maximum split-half bound in either sense when n is even. When n is odd, however, split-half bounds exist only in the Guttman sense and the largest of these may be smaller than coefficient alpha.
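A sketch of the odd-n case with three parallel items (unit variances, common correlation ρ — my construction, not the paper's example): every split is unequal, each Guttman split-half bound equals 8ρ/(3 + 6ρ), and coefficient alpha, 9ρ/(3 + 6ρ), exceeds the largest of them.

```python
import numpy as np
from itertools import combinations

rho, n = 0.5, 3
C = np.full((n, n), rho)
np.fill_diagonal(C, 1.0)
total_var = C.sum()

alpha = n / (n - 1) * (1 - np.trace(C) / total_var)

# Guttman split-half bound for parts A, B: 2 * (1 - (varA + varB) / varTotal).
bounds = []
for size in range(1, n // 2 + 1):
    for A in combinations(range(n), size):
        B = [j for j in range(n) if j not in A]
        varA = C[np.ix_(A, A)].sum()
        varB = C[np.ix_(B, B)].sum()
        bounds.append(2 * (1 - (varA + varB) / total_var))

print(alpha, max(bounds))   # 0.75 > 0.667: alpha exceeds every split-half bound
```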

14.
Tautologies are established for the reliability coefficient ρ²_t of the sum of n part scores. It is not assumed that the part scores are experimentally independent of each other, nor that the parts are equivalent to each other. The tautologies show the exact role played by experimental dependence and nonequivalence of parts, respectively, in the reliability of the sum. The formal algebra is appropriate to reliability in the sense of repeated trials of the same test, as well as in the sense of a universe of parallel tests, although the empirical meanings are different. Emphasis is on practical formulas that require information from only a single experiment (or test). These can take the form only of lower bounds to ρ²_t, four of which are developed.
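The abstract does not spell out its four bounds, so as a sketch in the same spirit, here are the first three of Guttman's classical lower bounds computed from a single administration's part-score covariance matrix (the matrix itself is hypothetical).

```python
import numpy as np

def guttman_bounds(C):
    """Guttman's lambda1-lambda3 lower bounds to the reliability of a
    total score, from the covariance matrix C of the n part scores."""
    n = C.shape[0]
    total = C.sum()
    off = C - np.diag(np.diag(C))
    lam1 = 1 - np.trace(C) / total
    lam2 = lam1 + np.sqrt(n / (n - 1) * (off ** 2).sum()) / total
    lam3 = n / (n - 1) * lam1          # coefficient alpha
    return lam1, lam2, lam3

C = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.6],
              [0.4, 0.6, 1.0]])        # hypothetical part-score covariances
print(guttman_bounds(C))               # lambda2 >= lambda3 here
```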

15.
Let Σ_x be the (population) dispersion matrix, assumed well-estimated, of a set of non-homogeneous item scores. Finding the greatest lower bound for the reliability of the total of these scores is shown to be equivalent to minimizing the trace of Σ_x by reducing the diagonal elements while keeping the matrix non-negative definite. Using this approach, Guttman's bounds are reviewed, a method is established to determine whether his λ4 (maximum split-half coefficient alpha) is the greatest lower bound in any instance, and three new bounds are discussed. A geometric representation, which sheds light on many of the bounds, is described. Present affiliation of the second author: Department of Statistics, University of Nigeria (Nsukka Campus). Work on this paper was carried out while on study leave in Aberystwyth.
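The trace-minimization formulation translates directly into a small semidefinite program; a sketch using cvxpy and a made-up dispersion matrix (a modern convenience, not the paper's own search procedure): maximize the total diagonal reduction θ subject to Σ_x − diag(θ) staying non-negative definite, giving the greatest lower bound 1 − Σθ_j / (total variance).

```python
import cvxpy as cp
import numpy as np

C = np.array([[1.0, 0.6, 0.2, 0.1],
              [0.6, 1.0, 0.2, 0.1],
              [0.2, 0.2, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])   # hypothetical item covariance matrix

# Maximize total "error variance" removed from the diagonal while the
# reduced matrix stays non-negative definite (i.e., minimize the trace).
theta = cp.Variable(C.shape[0], nonneg=True)
problem = cp.Problem(cp.Maximize(cp.sum(theta)),
                     [C - cp.diag(theta) >> 0])
problem.solve()

glb = 1 - theta.value.sum() / C.sum()
print(f"greatest lower bound to reliability: {glb:.4f}")
```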

16.
A common question of interest to researchers in psychology is the equivalence of two or more groups. Failure to reject the null hypothesis of traditional hypothesis tests such as the ANOVA F-test (i.e., H0: μ1 = … = μk) does not imply the equivalence of the population means. Researchers interested in determining the equivalence of k independent groups should apply a one-way test of equivalence (e.g., Wellek, 2003). The goals of this study were to investigate the robustness of the one-way Wellek test of equivalence to violations of the homogeneity of variance assumption, and to compare the Type I error rates and power of the Wellek test with a heteroscedastic version based on the logic of the one-way Welch (1951) F-test. The results indicate that the proposed Wellek–Welch test was insensitive to violations of the homogeneity of variance assumption, whereas the original Wellek test was not appropriate when the population variances were not equal.
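A sketch of the heteroscedastic logic described, under assumptions of my own: Welch's statistic with its approximate degrees of freedom, referred to a Wellek-style noncentral-F equivalence critical value with noncentrality N·ε². The paper's exact formulation may differ.

```python
import numpy as np
from scipy.stats import ncf

def welch_wellek(groups, eps=0.25, alpha=0.05):
    """Sketch: Welch's heteroscedastic F statistic compared against a
    noncentral-F equivalence critical value (equivalence margin eps)."""
    k = len(groups)
    n = np.array([len(g) for g in groups])
    m = np.array([g.mean() for g in groups])
    v = np.array([g.var(ddof=1) for g in groups])
    w = n / v
    grand = (w * m).sum() / w.sum()
    lam = ((1 - w / w.sum()) ** 2 / (n - 1)).sum()
    F = (w * (m - grand) ** 2).sum() / (k - 1)
    F /= 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)
    crit = ncf.ppf(alpha, k - 1, df2, nc=n.sum() * eps ** 2)  # assumed noncentrality
    return F, crit, F < crit          # True -> conclude equivalence

rng = np.random.default_rng(5)
groups = [rng.normal(0, s, size=60) for s in (1.0, 2.0, 3.0)]  # equal means, unequal variances
print(welch_wellek(groups))
```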

17.
Image theory for the structure of quantitative variates (cited 1 time: 0 self-citations, 1 by others)
A universe of infinitely many quantitative variables is considered, from which a sample of n variables is arbitrarily selected. Only linear least-squares regressions are considered, based on an infinitely large population of individuals or respondents. In the sample of variables, the predicted value of a variable x from the remaining n − 1 variables is called the partial image of x, and the error of prediction is called the partial anti-image of x. The predicted value of x from the entire universe, or the limit of its partial images as n → ∞, is called the total image of x, and the corresponding error is called the total anti-image. Images and anti-images can be used to explain why any two variables x_j and x_k are correlated with each other, or to reveal the structure of the intercorrelations of the sample and of the universe. It is demonstrated that image theory is related to common-factor theory but has greater generality than common-factor theory, being able to deal with structures other than those describable in a Spearman-Thurstone factor space. A universal computing procedure is suggested, based upon the inverse of the correlation matrix. This paper introduces one of three new structural theories, each of which generalizes common-factor analysis in a different direction. Nodular theory extends common-factor analysis to qualitative data and to data with curvilinear regressions (6). Order-factor theory introduces the notions of order among the observed variables and of separable factors (7). The present image theory is relevant also to the other two. Attention may be called to empirical results published since this paper was written: Louis Guttman, Two new approaches to factor analysis, Annual Technical Report on contract Nonr—731(00). The present research was aided by an uncommitted grant-in-aid from the Ford Foundation.
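A sketch of the suggested computing procedure based on the inverse of the correlation matrix: with standardized scores X and D = diag(R⁻¹)⁻¹, the partial anti-images are E = XR⁻¹D and the partial images are X − E. The data here are simulated.

```python
import numpy as np

rng = np.random.default_rng(6)
F = rng.normal(size=(1000, 2))                      # two latent dimensions
X = F @ rng.normal(size=(2, 5)) + 0.7 * rng.normal(size=(1000, 5))
X = (X - X.mean(0)) / X.std(0)                      # standardize 5 observed variables

R = np.corrcoef(X, rowvar=False)
R_inv = np.linalg.inv(R)
D = np.diag(1 / np.diag(R_inv))

anti_images = X @ R_inv @ D     # errors of predicting each variable from the rest
images = X - anti_images        # least-squares predictions (partial images)

# Each anti-image is uncorrelated with the image of its own variable.
j = 0
print(np.corrcoef(images[:, j], anti_images[:, j])[0, 1])  # ~0
```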

18.
It is proposed that a satisfactory criterion for an approximation to simple structure is the minimization of the sums of cross-products (across factors) of squares of factor loadings. This criterion is completely analytical and yields a unique solution; it requires no plotting, nor any decisions as to the clustering of variables into subgroups. The equations involved appear to be capable only of iterative solution; for more than three or four factors the computations become extremely laborious, but may be feasible for high-speed electronic equipment. Either orthogonal or oblique solutions may be achieved. For illustration, the Johnson-Reynolds study of flow and selection factors and the Thurstone box problem are reanalyzed. The presence of factorially complex tests produces a type of hyperplanar fit which the investigator may desire to adjust by graphical rotations; the smaller the number of such tests, the closer the criterion comes to approximating simple structure.
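A sketch of the criterion and its minimization for the two-factor orthogonal case, where a single rotation angle suffices (the paper's iterations handle the general case); the loadings are invented.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def criterion(A):
    """Sum over variables of cross-products (across factor pairs) of
    squared loadings; small when each variable loads on few factors."""
    sq = A ** 2
    return ((sq.sum(1) ** 2 - (sq ** 2).sum(1)) / 2).sum()

A = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.3, -0.6],
              [0.4, -0.7]])     # hypothetical unrotated loadings

def rotated(theta):
    c, s = np.cos(theta), np.sin(theta)
    return A @ np.array([[c, -s], [s, c]])

res = minimize_scalar(lambda t: criterion(rotated(t)),
                      bounds=(0, np.pi / 2), method="bounded")
print(criterion(A), "->", criterion(rotated(res.x)))
print(np.round(rotated(res.x), 2))      # loadings closer to simple structure
```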

19.
Tucker, L. R. Psychometrika, 1949, 14(2), 117–119.
The Kuder-Richardson formula (20) is rewritten to be identical with the simplest formula, (21), except for the addition of a term involving the standard deviation, σ_p, of the item p's. If σ_p can be estimated, a rapid and superior estimate of test reliability is possible in contrast to the simpler formula (21), which is used when the number of items and the mean and standard deviation of test scores are known. Kuder, G. F., & Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151–160.
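A sketch of the rewriting with hypothetical summary statistics: since Σpq = M(n − M)/n − nσ_p², with M the mean total score, KR-20 needs only n, M, σ_t, and σ_p; dropping the σ_p term gives the simpler formula (21).

```python
def kr20_from_summary(n, M, sd_total, sd_p):
    """KR-20 from the number of items, the mean and SD of total scores,
    and the SD of the item difficulties (the rewriting sketched above)."""
    sum_pq = M * (n - M) / n - n * sd_p ** 2
    return n / (n - 1) * (1 - sum_pq / sd_total ** 2)

def kr21(n, M, sd_total):
    """KR-21: the same formula with the item-difficulty spread ignored."""
    return n / (n - 1) * (1 - M * (n - M) / (n * sd_total ** 2))

# Hypothetical 40-item test: mean 24, SD 7, SD of the item p's 0.15.
print(kr20_from_summary(40, 24, 7.0, 0.15), kr21(40, 24, 7.0))  # 0.844 vs 0.825
```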

20.
Holistic processing (HP) of faces is usually measured by the composite effect. While Weston and Perfect [2005. Effects of processing bias on the recognition of composite face halves. Psychonomic Bulletin & Review, 12, 1038–1042. doi:10.3758/BF03206440] found that priming at the local level speeded recognition of components of faces, Gao et al. [2011. Priming global and local processing of composite faces: Revisiting the processing-bias effect on face perception. Attention Perception & Psychophysics, 73, 1477–1486. doi:10.3758/s13414-011-0109-7] found that only global priming had an effect on HP of faces. The two studies used different versions of the composite task (the partial design, which is considered to be prone to bias, and the complete design). However, the two studies also differed in other respects, and it is difficult to know to what extent issues with the partial design contributed to the differing conclusions. In the present study, the HP indexed by the complete-design measure was augmented by global priming. In contrast, no effect was observed in the partial-design index. We claim that the partial-design index reflects other factors besides HP, including response bias, and conclude that HP can be understood within the context of domain-general attentional processes.
