
1.

A note on reliability estimation for a test with components of unknown functional lengths

Michelle Liou 《Psychometrika》1989,54(1):153-163

This research note proposes two reliability coefficients for tests with components of unknown functional lengths. The derived coefficients are extensions of the techniques devised by Kristof and Feldt and do not require a reduction of test components into parts. Simulation study indicates that the new coefficients yield reasonably stable reliability estimates when the number of test components is small.

2.

A test of the hypothesis that Cronbach's alpha reliability coefficient is the same for two tests administered to the same sample

Leonard S. Feldt 《Psychometrika》1980,45(1):99-105

In measurement studies the researcher may wish to test the hypothesis that Cronbach's alpha reliability coefficient is the same for two measurement procedures. A statistical test exists for independent samples of subjects. In this paper three procedures are developed for the situation in which the coefficients are determined from the same sample. All three procedures are computationally simple and give tight control of Type I error when the sample size is 50 or greater. The author is indebted to Jerry S. Gilmer for development of the computer programs used in this study.
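
For readers unfamiliar with the coefficient compared in this entry, here is a minimal Python sketch of Cronbach's alpha computed from a subjects-by-parts score matrix. It is not taken from Feldt's paper and does not implement his test of equality. The function name `cronbach_alpha` and the toy data are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per subject, one column per test part).

    Uses sample variances (n - 1 denominator) for both the parts and the total score.
    """
    k = len(scores[0])  # number of test parts
    if k < 2:
        raise ValueError("alpha needs at least two test parts")
    # Variance of each part across subjects
    part_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (summed) score across subjects
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(part_vars) / total_var)


# Hypothetical data: three subjects, two parts
parallel = [[1, 1], [2, 2], [3, 3]]   # parts agree perfectly
noisy = [[1, 2], [2, 1], [3, 3]]      # parts agree only partly
print(cronbach_alpha(parallel), cronbach_alpha(noisy))
```

Two parts that agree perfectly give an alpha of 1, and disagreement between the parts lowers it. Tests of the kind described in this entry ask whether two such coefficients differ by more than sampling error.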

3.

A k-sample significance test for independent alpha coefficients

A. Ralph Hakstian, Thomas E. Whalen 《Psychometrika》1976,41(2):219-231

The earlier two-sample procedure of Feldt [1969] for comparing independent alpha reliability coefficients is extended to the case of K ≥ 2 independent samples. Details of a normalization of the statistic under consideration are presented, leading to computational procedures for the overall K-group significance test and accompanying multiple comparisons. Results based on computer simulation methods are presented, demonstrating that the procedures control Type I error adequately. The results of a power comparison of the case of K = 2 with Feldt's [1969] F test are also presented. The differences in power were negligible. Some final observations, along with suggestions for further research, are noted. The authors gratefully acknowledge the assistance of Michael E. Masson, in the computations performed, and of Leonard S. Feldt, in suggesting the data generation procedures used in the study. In addition, the authors thank James Zidek and the Institute of Applied Mathematics and Statistics, University of British Columbia, for advice concerning some of the theoretical development.

4.

Ability estimation for conventional tests

Jwa K. Kim, W. Alan Nicewander 《Psychometrika》1993,58(4):587-599

Five different ability estimators—maximum likelihood [MLE], weighted likelihood [WLE], Bayesian modal [BME], expected a posteriori [EAP] and the standardized number-right score [Z]—were used as scores for conventional, multiple-choice tests. The bias, standard error and reliability of the five ability estimators were evaluated using Monte Carlo estimates of the unknown conditional means and variances of the estimators. The results indicated that ability estimates based on BME, EAP or WLE were reasonably unbiased for the range of abilities corresponding to the difficulty of a test, and that their standard errors were relatively small. Also, they were as reliable as the old standby—the number-right score.

5.

Some critical observations of the test information function as a measure of local accuracy in ability estimation

Fumiko Samejima 《Psychometrika》1994,59(3):307-329

The test information function serves important roles in latent trait models and in their applications. Among others, it has been used as the measure of accuracy in ability estimation. A question arises, however, if the test information function is accurate enough for all meaningful levels of ability relative to the test, especially when the number of test items is relatively small (e.g., less than 50). In the present paper, using the constant information model and constant amounts of test information for a finite interval of ability, simulated data were produced for eight different levels of ability and for twenty different numbers of test items ranging between 10 and 200. Analyses of these data suggest that it is desirable to consider some modification of the test information function when it is used as the measure of accuracy in ability estimation.

6.

The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality

Jos M. F. Ten Berge, Gregor Sočan 《Psychometrika》2004,69(4):613-625

To assess the reliability of congeneric tests, specifically designed reliability measures have been proposed. This paper emphasizes that such measures rely on a unidimensionality hypothesis, which can neither be confirmed nor rejected when there are only three test parts, and will invariably be rejected when there are more than three test parts. Jackson and Agunwamba's (1977) greatest lower bound to reliability is proposed instead. Although this bound has a reputation for overestimating the population value when the sample size is small, this is no reason to prefer the unidimensionality-based reliability. Firstly, the sampling bias problem of the glb does not play a role when the number of test parts is small, as is often the case with congeneric measures. Secondly, glb and unidimensionality based reliability are often equal when there are three test parts, and when there are more test parts, their numerical values are still very similar. To the extent that the bias problem of the greatest lower bound does play a role, unidimensionality-based reliability is equally affected. Although unidimensionality and reliability are often thought of as unrelated, this paper shows that, from at least two perspectives, they act as antagonistic concepts. A measure, based on the same framework that led to the greatest lower bound, is discussed for assessing how close is a set of variables to unidimensionality. It is the percentage of common variance that can be explained by a single factor. An empirical example is given to demonstrate the main points of the paper. The authors are obliged to Henk Kiers for commenting on a previous version. Gregor Sočan is now at the University of Ljubljana.

7.

Effects of Situational Judgment Test Format on Reliability and Validity

Michelle P. Martin-Raugh, Cristina Anguiano-Carrasco, Teresa Jackson, Meghan W. Brenneman, Lauren Carney, Patrick Barnwell 《International Journal of Testing》2018,18(2):135-154

Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTs) in that they present test takers with edited critical incidents and simply ask test takers to read over the action described and evaluate it according to its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs is thus far extremely limited. The study reported here directly compares forms of a SRSJT and MRSJT and explores the reliability, convergent validity, and predictive validity of each format. Results from this investigation present preliminary evidence to suggest SRSJTs may produce internal consistency reliability, convergent validity, and predictive validity estimates that are comparable to those achieved with many traditional MRSJTs. We conclude by discussing practical implications for personnel selection and assessment, and future research in psychological science more broadly.

8.

Reliability of test scores in nonparametric item response theory

Klaas Sijtsma, Ivo W. Molenaar 《Psychometrika》1987,52(1):79-97

Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken (1971) and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four classical lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods. The authors are grateful for constructive comments from the reviewers and from Charles Lewis.

9.

The effects on the predictive variance of a new subject’s score for a new test

Hidetoshi Ishii, Hiroshi Watanabe 《The Japanese psychological research》2002,44(2):113-119

It is often required to predict the scores or their variations under interest. Ishii and Watanabe (2001) investigated, in the context of psychological measurement, the Bayesian predictive distribution of a new subject’s scores for tests and subjects’ scores for a new test. In this paper, the Bayesian posterior predictive distribution of a new subject’s scores for a new parallel test were considered. And the effects of the number of subjects, the number of the tests, and the test reliability were investigated. Then, it was found that, under assumptions that (co)variance parameters are known, the predictive variance of a new subject’s score for a new test was equal to the predictive variances of the new subject’s scores for the existent tests. It was also found that the effect of the number of subjects was relatively large and the effect of the number of tests was relatively small, when a new subject’s scores for existent tests were not observed.

10.

A reliability generalization analysis of Big Five personality tests as applied in China

Luo Jie, Zhou Yuan, Chen Wei, Pan Yun, Zhao Shouying 《Psychological Development and Education》 (心理发展与教育) 2016,32(1):121-128

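A reliability generalization analysis was conducted on the Chinese research literature on Big Five personality tests from the past 20 years (1994-2013). The results showed: (1) about 68.15% of the retrieved articles exhibited "reliability induction"; (2) in the unweighted estimates, A and O had the lowest means and N and C the highest; the results obtained in China were all slightly lower than those obtained abroad (except for O), while the latter showed slightly greater variability (except for E); using the alpha coefficient effect size method, in the random-effects model, N had the highest estimate and O and A the lowest; (3) regression analysis showed that score mean, scale source, and north-south regional difference were predictors of reliability on the N dimension; scale source, article discipline type, test version, and test scoring predicted reliability on the E dimension; sample size, article discipline type, and scale source were predictors of reliability on the O dimension; scale source, article discipline type, number of items, and sample type predicted reliability on the A dimension; scale source, number of items, article discipline type, and test scoring were predictors of reliability on the C dimension.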

11.

An experimental test of stimulus estimation theory: danger and safety with snake phobic stimuli

Wright LM, Holborn SW, Rezutek PE 《Behaviour research and therapy》2002,40(8):911-922

The stimulus estimation model (Taylor & Rachman, 1994) asserts that fear overprediction stems from: (a) overprediction of the danger elements of a phobic stimulus, and (b) underprediction of existing safety resources. Using a 2x2 factorial design, with danger (high vs low) and safety (high vs low) as between-subjects variables, an experimental test of the model was conducted with 25 snake-fearful participants per condition. The four experimental conditions were matched on initial levels of snake fearfulness, as assessed by the Snake Questionnaire (SNAQ). For the 51 participants who demonstrated overprediction of fear, high danger led to reliably more fear overprediction than low danger; and low safety led to reliably more fear overprediction than high safety. The interaction between danger and safety was not statistically significant. The results offer the first convincing experimental support for the stimulus estimation model of fear overprediction.

12.

Rethinking reliability and reliability generalization research

Guan Dandan, Zhang Houcan 《Journal of Psychological Science》 (心理科学) 2004,27(2):445-448

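This paper first clarifies the concept of reliability, pointing out that reliability is an index for evaluating whether test results are dependable, not an invariant property of the test instrument. Given that reliability estimates of test results vary, it introduces the reliability generalization method proposed by Vacha-Haase at the end of the last century, a meta-analytic method for exploring the variability of score reliability estimates and examining the sources that predict this variation. Finally, through an analysis of reliability generalization methods, it points out that rethinking the concept of reliability, together with reliability generalization research, will bring new insights to those who work in psychological testing.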

13.

Correlation and prediction in ordinal test theory

Robert S. Schulman 《Psychometrika》1976,41(3):329-340

Based on the test theory model for ordinal measurements proposed by Schulman and Haden, the present paper considers correlations between tests, attenuation, regressions involving true and observed scores, and prediction of test reliability. The population correlation between tests is shown to be related to the expected sample correlation for samples of size n₁ and n₂. Errors of estimation, measurement and prediction are found to be similar to their counterparts in interval test theory, while attenuation is identical to its counterpart. The bias in estimating population reliability from sample data is compared for Kendall's tau and Spearman's rho. The author wishes to thank the referees for their helpful comments on an earlier draft of this paper, and in particular, for the suggested alternative methods of establishing some of the presented results.

14.

A modification of Feldt's test of the equality of two dependent alpha coefficients

Yousef M. Alsawalmeh, Leonard S. Feldt 《Psychometrika》1994,59(1):49-57

The available statistical tests of the equality of nonindependent alpha reliability coefficients require that the product of the number of test parts times the number of subjects be quite large—1000 or more. A modification of one of these tests is derived which avoids this limitation. Monte Carlo studies indicate that the modified test effectively controls the Type I error rate with as few as 2 or 3 test parts and 50 subjects. This means the modified test can be safely employed in comparisons between interrater reliabilities.

15.

Attenuation-Corrected Estimators of Reliability

Jari Metsämuuronen 《Applied Psychological Measurement》2022,46(8):720

The estimates of reliability are usually attenuated and deflated because the item–score correlation (ρ_{gX}, Rit) embedded in the most widely used estimators is affected by several sources of mechanical error in the estimation. Empirical examples show that, in some types of datasets, the estimates by traditional alpha may be deflated by 0.40–0.60 units of reliability and those by maximal reliability by 0.40 units of reliability. This article proposes a new kind of estimator of correlation: attenuation-corrected correlation (R_AC): the proportion of observed correlation with the maximal possible correlation reachable by the given item and score. By replacing ρ_{gX} with R_AC in known formulas of estimators of reliability, we get attenuation-corrected alpha, theta, omega, and maximal reliability which all belong to a family of so-called deflation-corrected estimators of reliability.

16.

Meta-analysis of the composite reliability of unidimensional tests

Ye Baojuan, Wen Zhonglin, Hu Zhujing 《Journal of Psychological Science》 (心理科学) 2013,36(6):1464-1469

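Meta-analysis is an important method for drawing relatively accurate and representative conclusions about a topic of interest from existing studies, and it is widely applied in social science research such as psychology, education, management, and medicine. Reliability is an important indicator of test quality, and composite reliability can estimate test reliability fairly accurately. No published work was found that provides a method for the meta-analysis of composite reliability. Building on a comparison of the strengths and weaknesses of three models for meta-analyzing a parameter, this study derives point and interval estimation methods for the meta-analysis of composite reliability under the varying-coefficient model; using interval coverage rate as the criterion, a simulation study shows that the proposed interval estimation method is appropriate; an example illustrates how to meta-analyze the composite reliability of a unidimensional test.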

17.

Pseudo maximum likelihood estimation and a test for misspecification in mean and covariance structure models

Gerhard Arminger, Ronald J. Schoenberg 《Psychometrika》1989,54(3):409-425

Using the theory of pseudo maximum likelihood estimation the asymptotic covariance matrix of maximum likelihood estimates for mean and covariance structure models is given for the case where the variables are not multivariate normal. This asymptotic covariance matrix is consistently estimated without the computation of the empirical fourth order moment matrix. Using quasi-maximum likelihood theory a Hausman misspecification test is developed. This test is sensitive to misspecification caused by errors that are correlated with the independent variables. This misspecification cannot be detected by the test statistics currently used in covariance structure analysis. For helpful comments on a previous draft of the paper we are indebted to Kenneth A. Bollen, Ulrich L. Küsters, Michael E. Sobel and the anonymous reviewers of Psychometrika. For partial research support, the first author wishes to thank the Department of Sociology at the University of Arizona, where he was a visiting professor during the fall semester 1987.

18.

Reliability and validity of a cost-efficient sociometric measure

William T. Riley 《Journal of psychopathology and behavioral assessment》1985,7(3):235-241

Sociometric measures have been used frequently to measure social status; however, reliable sociograms for young children usually involve time-consuming administrations. A group-administered, peer-rating sociogram, the Sociometric Peer-Rating Scale (SPRS), was devised and given to 217 first and second graders. Concomitantly, teacher nominations of children most liked, aggressive, or withdrawn and behavioral observations of the high- and low-SPRS children were obtained. After 7 months, the SPRS was readministered. On a separate population of eight kindergarten children, this sociogram and a similar, individually administered sociogram were given. Normative data, test-retest reliability, and split-half reliability were reported. The test-retest reliability was comparable to the reported reliability of other peer-rating sociograms, and the SPRS correlated significantly with teacher ratings of aggressiveness and likability and with the individually administered sociogram. The number of positive interactions was significantly different for high-versus low-SPRS children. The usefulness of the SPRS as a measure of social competence was discussed. This research was submitted by the author in partial fulfillment of the requirements of a master's degree at the Florida State University. I would like to thank the Master's committee, Wallace Kennedy, William Pelham, and Joseph Torgesen, and the participating schools, Developmental Research School of Florida State University and Woodville Elementary School of the Leon County School District, for their assistance in this study.

19.

Reliability and convergent validity of a symbolic test for authoritarianism

H W Hogan 《The Journal of psychology》1970,76(1):39-43

20.

Extensions of a versatile randomization test for assessing single-case intervention effects

Levin JR, Lall VF, Kratochwill TR 《Journal of School Psychology》2011,49(1):55-79

The purpose of the present study was to investigate the statistical properties of two extensions of the Levin-Wampold (1999) single-case simultaneous start-point model's comparative effectiveness randomization test. The two extensions were (a) adapting the test to situations where there are more than two different intervention conditions and (b) examining the test's performance in classroom-based intervention situations, where the number of time periods (and associated outcome observations) is much smaller than in the contexts for which the test was originally developed. Various Monte Carlo sampling situations were investigated, including from one to five participant blocks per condition and differing numbers of time periods, potential intervention start points, degrees of within-phase autocorrelation, and effect sizes. For all situations, it was found that the Type I error probability of the randomization test was maintained at an acceptable level. With a few notable exceptions, respectable power was observed only in situations where the numbers of observations and potential intervention start points were relatively large, effect sizes were large, and the degree of within-phase autocorrelation was relatively low. It was concluded that the comparative effectiveness randomization test, with its desirable internal validity and statistical-conclusion validity features, is a versatile analytic tool that can be incorporated into a variety of single-case school psychology intervention research situations.