期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

The effect of difficulty and chance success on correlations between items or between tests

John B. Carroll 《Psychometrika》1945,10(1):1-19

A study is made of the extent to which correlations between items and between tests are affected by the difficulties of the items involved and by chance success through guessing. The Pearsonian product-moment coefficient does not necessarily give a correct indication of the relation between items or sets of items, since it tends to decrease as the items or tests become less similar in difficulty. It is suggested that the tetrachoric correlation coefficient can properly be used for estimating the correlation between the continua underlying items or sets of items even though they differ in difficulty, and a method for correcting a 2 × 2 table for the effect of chance is proposed.The opinions expressed in this article are the private ones of the writer and are not to be construed as official or reflecting the views of the Navy Department or the naval service at large. The writer is indebted to Lt. C. L. Vaughn, H (S) USNR, for critical comments on this paper. 相似文献

2.

The concept of test and item reliability in relation to factor pattern

Robert J. Wherry Richard H. Gaylord 《Psychometrika》1943,8(4):247-264

It is shown that approaches other than the internal consistency method of estimating test reliability are either less satisfactory or lead to the same general results. The commonly attendant assumption of a single factor throughout the test items is challenged, however. The consideration of a test made up ofK sub-tests each composed of a different orthogonal factor disclosed that the assumption of a single factor produced an erroneous estimate of reliability with a ratio of (n–K)/(n–1) to the correct estimate. Special difficulties arising from this error in application of current techniques to short tests or to test batteries are discussed. Application of this same multi-factor concept to item-analysis discloses similar difficulties in that field. The item-test coefficient approaches 1/K as an upper limit rather than 1.00 and approaches 1/n as a lower limit rather than .00. This latter finding accounts for an over-estimation error in the Kuder-Richardson formula (8). A new method of isolating sub-tests based upon the item-test coefficient is proposed and tentatively outlined. Either this new method or a complete factor analysis is regarded as the only proper approach to the problem of test reliability, and the item-sub-est coefficient is similarly recommended as the proper approach for item analysis. 相似文献

3.

A new rank correlation coefficient with application to the consensus ranking problem

Edward J. Emond David W. Mason 《Journal of Multi-Criteria Decision Analysis》2002,11(1):17-28

The consensus ranking problem has received much attention in the statistical literature. Given m rankings of n objects the objective is to determine a consensus ranking. The input rankings may contain ties, be incomplete, and may be weighted. Two solution concepts are discussed, the first maximizing the average weighted rank correlation of the solution ranking with the input rankings and the second minimizing the average weighted Kemeny–Snell distance. A new rank correlation coefficient called τ_x is presented which is shown to be the unique rank correlation coefficient which is equivalent to the Kemeny‐Snell distance metric. The new rank correlation coefficient is closely related to Kendall's tau but differs from it in the way ties are handled. It will be demonstrated that Kendall's τ_b is flawed as a measure of agreement between weak orderings and should no longer be used as a rank correlation coefficient. The use of τ_x in the consensus ranking problem provides a more mathematically tractable solution than the Kemeny–Snell distance metric because all the ranking information can be summarized in a single matrix. The methods described in this paper allow analysts to accommodate the fully general consensus ranking problem with weights, ties, and partial inputs. Copyright © 2002 John Wiley & Sons, Ltd. 相似文献

4.

The factorial interpretation of test difficulty

George A. Ferguson 《Psychometrika》1941,6(5):323-329

This paper discusses the influence of test difficulty on the correlation between test items and between tests. The greater the difference in difficulty between two test items or between two tests the smaller the maximum correlation between them. In general, the greater the number of degrees of difficulty among the items in a test or among the tests in a battery, the higher the rank of the matrix of intercorrelations; that is, differences in difficulty are represented in the factorial configuration as additional factors. The suggestion is made that if all tests included in a battery are roughly homogeneous with respect to difficulty existing hierarchies will be more clearly defined and meaningful psychological interpretation of factors more readily attained. 相似文献

5.

MMPI–A Forensic Case Studies: Uses in Documented Court Decisions

Tracy O'Connor Pennuto Robert P. Archer 《Journal of personality assessment》2013,95(3):215-226

The preliminary development of a personality test, the Zax Information Profile (ZIP), involving 24 content areas was described. Measures of internal consistency of the items in the separate sub-tests are reported as well-as the factor structure of the test with reference to several different S samples. Although internal consistency measures are not as high as were hoped for, they are consistent with similar measures done on somewhat similar test instruments. Furthermore, an external validation study demonstrated that the test differentiated very effectively between music students and arts college students in the predicted directions. The potential use of the test as a screening device for entering college students, and as an instrument for providing leads for constructing optimal housing arrangements and programs to prevent social maladjustment in college students was discussed. 相似文献

6.

The difficulty of a test and its factor composition

J. P. Guilford 《Psychometrika》1941,6(2):67-77

A factor analysis of the ten sub-tests of the Seashore test of pitch discrimination revealed that more than one ability is involved. One factor, which accounted for the greater share of the variances, had loadings that decreased systematically with increasing difficulty. A second factor had strongest loadings among the more difficult items, particularly those with frequency differences of 2 to 5 cycles per second. A third had strongest loadings at differences of 5 to 12 cycles per second. No explanation for the three factors is apparent, but the hypothesis is accepted that they represent distinct abilities. In tests so homogeneous as to content and form, where a single common factor might well have been expected, the appearance of additional common factors emphasizes the importance of considering the difficulty level of test items, both in the attempt to interpret new factors and in the practice of testing. The same kind of item may measure different abilities according as it is easy or difficult for the individuals to whom it is applied. 相似文献

7.

To Balance or not to Balance: Confirmatory Factor Analysis of the Affect-Balance Scale

Kyung A. Kim Daniel J. Mueller 《Journal of Happiness Studies》2001,2(3):289-306

This study examined the psychometric quality of the Affect-Balance Scale (ABS) (Bradburn, 1969) using data collected from 292 middle-aged and older adults, living independently. The dimensionality of the scale was examined, the quality of individual items was tested, and the validity of the ABS was studied. Using a tetrachoric correlation matrix with the robust weighted least squares (WLSMV) estimation method of the Mplus program, we found that two moderately correlated (r = -0.37) constructs are needed to adequately account for the pattern of item scores in the ABS. Two of the 10 ABS items were found to be problematic. When raw sum scores were used in analysis, the correlation between the positive-affect and the negative-affect subscales was lower (r = -0.17), indicating that random and nonrandom measurement error masked the relationship between the two. While affect-balance correlated substantially with five criterion well-being measures, the negative-affect subscale (which constitutes half of the ABS) had a similar pattern of correlations, with only slightly lower magnitude. The theoretical construct of nobreak 'balance' is also questioned. The 'balance' scoring method (subtracting the negative-affect subscale score from the positive-affect subscale score) nets exactly the same score as does summing scores from both subscales together. Accordingly, the summed scores have the very same correlations with other variables as do the balance scores. 相似文献

8.

Variation in test validity with variation in the distribution of item difficulties,number of items,and degree of their intercorrelation

Hubert E. Brogden 《Psychometrika》1946,11(4):197-214

The relation between item difficulty distributions and the validity and reliability of tests is computed through use of normal correlation surfaces for varying numbers of items and varying degrees of item intercorrelations. Optimal or near optimal item difficulty distributions are thus identified for various possible item difficulty distributions. The results indicate that, if a test is of conventional length, is homogeneous as to content, and has a symmetrical distribution of item difficulties, correlation with a normally distributed perfect measure of the attribute common to the items does not vary appreciably with variation in the item difficulty distribution. Greater variation was evident in correlation with a second duplicate test (reliability). The general implications of these findings and their particular significance for evaluating techniques aimed at increasing reliability are considered. 相似文献

9.

A table for the rapid determination of the tetrachoric correlation coefficient

Melvin D. Davidoff Howard W. Goheen 《Psychometrika》1953,18(2):115-121

A table is developed and presented to facilitate the computation of the PearsonQ ₃ (cosine method) estimate of the tetrachoric correlation coefficient. Data are presented concerning the accuracy ofQ ₃ as an estimate of the tetrachoric correlation coefficient, and it is compared with the results obtainable from the Chesire, Saffir, and Thurstone tables for the same four-fold frequency tables.The authors are indebted to Mr. John Scott, Chief of the Test Development Section of the U.S. Civil Service Commission, for his encouragement and to Miss Elaine Ambrifi and Mrs. Elaine Nixon for the large amount of computational work involved in this paper. 相似文献

10.

The relation of the reliability of multiple-choice tests to the distribution of item difficulties

Frederic M. Lord 《Psychometrika》1952,17(2):181-194

Under certain assumptions an expression, in terms of item difficulties and intercorrelations, is derived for the curvilinear correlation of test score on the ability underlying the test, this ability being defined as the common factor of the item tetrachoric intercorrelations corrected for guessing. It is shown that this curvilinear correlation is equal to the square root of the test reliability. Numerical values for these curvilinear correlations are presented for a number of hypothetical tests, defined in terms of their item parameters. These numerical results indicate that the reliability and the curvilinear correlation will be maximized by (1) minimizing the variability of item difficulty and (2) making the level of item difficulty somewhat easier than the halfway point between a chance percentage of correct answers and 100 per cent correct answers. 相似文献

11.

Factor invariance of the Millon Clinical Multiaxial Inventory

Michael Gibertini Paul D. Retzlaff 《Journal of psychopathology and behavioral assessment》1988,10(1):65-74

Several factor analyses of the Millon Clinical Multiaxial Inventory (MCMI) have resulted in very similar solutions. Interpretation of this consistency is hampered by the fact that the 20 scales of the inventory share items. Overlapping items cause the scales to be linearly dependent and may create structure in the interscale correlation matrix which is separate from the subject response patterns. A factor analysis was performed on the matrix of item-overlap coefficients which describes the underlying artifactual structure of the instrument. Data from two new subject samples were factor analyzed and compared to previously published studies. Similarity coefficients among factors across studies were calculated. 相似文献

12.

Fitting a response model forn dichotomously scored items

R. Darrell Bock Marcus Lieberman 《Psychometrika》1970,35(2):179-197

A method of estimating the parameters of the normal ogive model for dichotomously scored item-responses by maximum likelihood is demonstrated. Although the procedure requires numerical integration in order to evaluate the likelihood equations, a computer implemented Newton-Raphson solution is shown to be straightforward in other respects. Empirical tests of the procedure show that the resulting estimates are very similar to those based on a conventional analysis of item difficulties and first factor loadings obtained from the matrix of tetrachoric correlation coefficients. Problems of testing the fit of the model, and of obtaining invariant parameters are discussed.Research reported in this paper was supported by NSF Grant 1025 to the University of Chicago. 相似文献

13.

An empirical verification of the Wherry-Gaylord iterative factor analysis procedure

Robert J. Wherry Joel T. Campbell Robert Perloff 《Psychometrika》1951,16(1):67-74

A comparison of the Wherry-Gaylord iterative factor analysis procedure and the Thurstone multiple-group analysis of sub-tests shows that the two methods result in the same factors. The Wherry-Gaylord method has the advantage of giving factor loadings for items. The number of iterations needed can be reduced by doing a factor analysis of sub-tests, re-grouping sub-tests according to factors, and using each group as a starting point for iterations.This research was carried out under Contract No. WSW-2503, between the Department of the Army and Ohio State University. This paper is based on the final report PRS No. 827 under that contract. The opinions expressed herein regarding matters relating to the Department of the Army are those of the authors and are not necessarily official. 相似文献

14.

Rank-reducibility of a symmetric matrix and sampling theory of minimum trace factor analysis

Alexander Shapiro 《Psychometrika》1982,47(2):187-199

One of the intriguing questions of factor analysis is the extent to which one can reduce the rank of a symmetric matrix by only changing its diagonal entries. We show in this paper that the set of matrices, which can be reduced to rankr, has positive (Lebesgue) measure if and only ifr is greater or equal to the Ledermann bound. In other words the Ledermann bound is shown to bealmost surely the greatest lower bound to a reduced rank of the sample covariance matrix. Afterwards an asymptotic sampling theory of so-called minimum trace factor analysis (MTFA) is proposed. The theory is based on continuous and differential properties of functions involved in the MTFA. Convex analysis techniques are utilized to obtain conditions for differentiability of these functions. 相似文献

15.

Calculating Impact Factor: How Bibliographical Classification of Journal Items Affects the Impact Factor of Large and Small Journals

Golubic R Rudes M Kovacic N Marusic M Marusic A 《Science and engineering ethics》2008,14(1):41-49

As bibliographical classification of published journal items affects the denominator in this equation, we investigated how the numerator and denominator of the impact factor (IF) equation were generated for representative journals in two categories of the Journal Citation Reports (JCR). We performed a full text search of the 1st-ranked journal in 2004 JCR category “Medicine, General and Internal” (New England Journal of Medicine, NEJM, IF = 38.570) and 61st-ranked journal (Croatian Medical Journal, CMJ, IF = 0.690), 1st-ranked journal in category “Multidisciplinary Sciences” (Nature, IF = 32.182) and journal with a relative rank of CMJ (Anais da Academia Brasileira de Ciencias, AABC, IF = 0.435). Large journals published more items categorized by Web of Science (WoS) as non-research items (editorial material, letters, news, book reviews, bibliographical items, or corrections): 63% out of total 5,193 items in Nature and 81% out of 3,540 items in NEJM, compared with 31% out of 283 items in CMJ and only 2 (2%) out of 126 items in AABC. Some items classified by WoS as non-original contained original research data (9.5% in Nature, 7.2% in NEJM, 13.7% in CMJ and none in AABC). These items received a significant number of citations: 6.9% of total citations in Nature, 14.7% in NEJM and 18.5% in CMJ. IF decreased for all journals when only items presenting original research and citations to them were used for IF calculation. Regardless of the journal’s size or discipline, publication of non-original research and its classification by the bibliographical database have an effect on both numerator and denominator of the IF equation. Preliminary results of the study were presented at the 2006 ORI Research Conference on Research Integrity, Tampa, FL, December 1–3, 2006. 相似文献

16.

TETRA-COM: A comprehensive SPSS program for estimating the tetrachoric correlation

Urbano Lorenzo-Seva Pere J. Ferrando 《Behavior research methods》2012,44(4):1191-1196

相似文献

17.

平衡秤任务复杂性的事前与事后分析 总被引：2，自引：0，他引：2

下载免费PDF全文

张丽辛自强《心理发展与教育》2008,24(2):46-53

任务复杂性的分析和评估是心理测量学和认知心理学都非常关注的重要主题。以264名小学四、五、六年级儿童为被试,平衡秤任务为研究材料,考察任务在未旋转时第一个因素上的载荷(事后分析)能否作为评价任务复杂性的一个指标,以及关系-表征复杂性模型对平衡秤任务复杂性分析(事前分析)的有效性两个问题。结果表明:平衡秤任务施测后所得所有项目的因素载荷与其难度之间没有显著正相关,即因素载荷的高低没有反映平衡秤任务复杂性的大小;而基于关系-表征复杂性模型对任务的事前分析所确定的任务等级复杂性和知识经验对任务难度的解释率为95.0%,可见,关系-表征复杂性模型提供的分析任务复杂性的思路和方法是较为合理的。相似文献

18.

An item‐level investigation of conceptual and empirical distinctiveness of proactivity constructs

Seonghee Cho Nichelle C. Carpenter Bo Zhang 《International Journal of Selection & Assessment》2020,28(3):337-350

Beyond the prior investigations that took scale‐level approaches to determining discriminant validity in proactivity constructs, the current study contributes a much‐needed interrogation of the items used to measure the behaviors in this domain. The substantive validity (SV) assessments (Study 1) showed that many of the items were judged to be inconsistent with the definition of the construct they assess or, alternatively, more consistent with the definition of a different construct in the domain. Further, exploratory factor analysis revealed the difficulty in empirically separating the four behaviors, while BiM results also advocated against the unique variance of them after accounting for a general factor (Study 2). Altogether, our results show that the items are partly to blame for the empirical redundancy issue. 相似文献

19.

Sampling theory in item analysis

Walter W. Merrill Jr. 《Psychometrika》1937,2(4):215-223

Since item values obtained by item analysis procedures are not always stable from one situation to another, it follows that selection of items for validity or difficulty is sometimes useless. An application of Chi Square to testing homogeneity of item values is made, in the case of theUL method, and illustrative data are presented. A method of applying sampling theory to Horst's maximizing function is outlined, as illustrative of author's observation that the results of item analysis by any of various methods may be similarly tested. 相似文献

20.

Interpretation of a Full-Information Item-Level Factor Analysis of the MMPI-2: Normative Sampling and Nonpathognomonic Descriptors

《Journal of personality assessment》2013,95(3):400-422

An exploratory item-level full-information factor analysis was performed on the normative sample for the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). This method of factor analysis, developed by Schilling and Bock (Bock & Schilling, 1997) and based on item response theory, works directly with the response patterns and avoids the artifacts associated with phi coefficients and tetrachoric coefficients. Promax rotation of the factor solution organizes the clinical scale items into 10 factors that we labeled Distrust, Self-Doubt, Fitness, Serenity, Rebelliousness, Instrumentality, Irritability, Artistry, Sociability, and Self-Reliance. A comparison was made to the results of Johnson, Butcher, Null, and Johnson (1984), who performed a principal-component analysis on an item set of 550 items from the previous version of the MMPI (Hathaway & McKinley, 1943). Along with version changes and sampling differences, the essential differences between Johnson et al.'s results and ours may be attributed to differences between the Schilling and Bock method, which uses all information in the item responses, and the principal-component analysis, which uses the partial information contained in pairwise correlation coefficients. This study included 518 of the complete 567 items of the MMPI-2, versus Johnson et al.'s retention of 309 of the initially included 550 items of the previous MMPI. The full-information analysis retained all 518 initially included items and more evenly distributed the items over the 10 resulting factors, all sharply defined by their highest loading items and easy to interpret. Sampling effects and factor label considerations are discussed, along with recommendations for research that would validate the clinical utility of the implied scales for describing normal personality profiles. The full-information procedure provides for Bayes estimation of scores on these scales. 相似文献