This research note proposes two reliability coefficients for tests with components of unknown functional lengths. The derived coefficients are extensions of the techniques devised by Kristof and Feldt and do not require test components to be divided into parts. A simulation study indicates that the new coefficients yield reasonably stable reliability estimates when the number of test components is small.

In measurement studies, the researcher may wish to test the hypothesis that Cronbach's alpha reliability coefficient is the same for two measurement procedures. A statistical test exists for independent samples of subjects. In this paper, three procedures are developed for the situation in which both coefficients are determined from the same sample. All three procedures are computationally simple and give tight control of Type I error when the sample size is 50 or greater. The author is indebted to Jerry S. Gilmer for development of the computer programs used in this study.
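
For reference, coefficient alpha is conventionally computed from a subjects-by-parts score matrix as alpha = k/(k − 1) · (1 − Σ part variances / total-score variance). The sketch below implements that textbook formula with the standard library. It is not one of the three dependent-sample test procedures described in the abstract. The function name `cronbach_alpha` is illustrative.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a subjects-by-parts score matrix.

    scores[s][j] is subject s's score on test part j.
    alpha = k/(k-1) * (1 - sum of part variances / total-score variance)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two test parts")
    # Variance of each part (column); the n-1 denominator is used throughout,
    # so it cancels in the ratio below.
    part_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total score across subjects
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(part_vars) / total_var)
```

If every part ranks subjects identically with equal spread, alpha reaches 1. Otherwise, it falls below 1.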

A k-sample significance test for independent alpha coefficients   Times cited: 1 (self-citations: 0; citations by others: 1)
The earlier two-sample procedure of Feldt [1969] for comparing independent alpha reliability coefficients is extended to the case of K ≥ 2 independent samples. Details of a normalization of the statistic under consideration are presented, leading to computational procedures for the overall K-group significance test and accompanying multiple comparisons. Results based on computer simulation methods are presented, demonstrating that the procedures control Type I error adequately. The results of a power comparison with Feldt's [1969] F test for the case of K = 2 are also presented; the differences in power were negligible. Some final observations, along with suggestions for further research, are noted. The authors gratefully acknowledge the assistance of Michael E. Masson, in the computations performed, and of Leonard S. Feldt, in suggesting the data generation procedures used in the study. In addition, the authors thank James Zidek and the Institute of Applied Mathematics and Statistics, University of British Columbia, for advice concerning some of the theoretical development.
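
The two-sample starting point, Feldt's [1969] test, is commonly summarized as follows. For two independent samples of sizes n1 and n2, the ratio W = (1 − alpha1)/(1 − alpha2) is referred to an F distribution with (n1 − 1, n2 − 1) degrees of freedom. The sketch below computes only W and its degrees of freedom. It does not implement the K-sample normalization from this paper. The function name `feldt_w` is hypothetical.

```python
def feldt_w(alpha1: float, alpha2: float, n1: int, n2: int) -> tuple[float, int, int]:
    """Feldt-style statistic for two independent alpha coefficients.

    Under H0 (equal population alphas), W = (1 - alpha1) / (1 - alpha2) is
    referred to F with (n1 - 1, n2 - 1) degrees of freedom.
    Returns (W, df1, df2); a p-value needs an F distribution from elsewhere.
    """
    if not (alpha1 < 1 and alpha2 < 1):
        raise ValueError("alpha coefficients must be below 1")
    w = (1 - alpha1) / (1 - alpha2)
    return w, n1 - 1, n2 - 1
```

For example, with SciPy installed, `scipy.stats.f.sf(w, df1, df2)` gives the upper-tail probability of the returned W.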

Five different ability estimators (maximum likelihood [MLE], weighted likelihood [WLE], Bayesian modal [BME], expected a posteriori [EAP], and the standardized number-right score [Z]) were used as scores for conventional multiple-choice tests. The bias, standard error, and reliability of the five ability estimators were evaluated using Monte Carlo estimates of the unknown conditional means and variances of the estimators. The results indicated that ability estimates based on BME, EAP, or WLE were reasonably unbiased for the range of abilities corresponding to the difficulty of a test, and that their standard errors were relatively small. They were also as reliable as the old standby, the number-right score.
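
To make one of these estimators concrete, an EAP ability estimate is the posterior mean of ability given the response pattern. The sketch below makes two illustrative assumptions that do not come from the abstract:

- a Rasch (one-parameter logistic) item model;
- a standard normal prior, with the posterior approximated on a fixed quadrature grid.

Unlike the MLE, the EAP estimate stays finite for all-correct or all-incorrect patterns. The function names are hypothetical.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses: list[int], difficulties: list[float],
                 n_points: int = 81, lo: float = -4.0, hi: float = 4.0) -> float:
    """EAP ability estimate with a N(0, 1) prior, via an evenly spaced grid."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for q in range(n_points):
        theta = lo + q * step
        prior = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        like = 1.0
        for u, b in zip(responses, difficulties):
            p = rasch_p(theta, b)
            like *= p if u == 1 else 1.0 - p
        num += theta * like * prior
        den += like * prior
    return num / den  # posterior mean of theta over the grid
```

With item difficulties placed symmetrically around zero, the all-correct and all-incorrect patterns yield estimates of equal size and opposite sign.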

The test information function serves important roles in latent trait models and their applications. Among other things, it has been used as the measure of accuracy in ability estimation. A question arises, however, as to whether the test information function is accurate enough at all meaningful ability levels relative to the test, especially when the number of test items is relatively small (e.g., fewer than 50). In the present paper, simulated data were produced using the constant information model and constant amounts of test information over a finite interval of ability. The data covered eight different ability levels and twenty different test lengths ranging from 10 to 200 items. Analyses of these data suggest that some modification of the test information function is desirable when it is used as the measure of accuracy in ability estimation.

To assess the reliability of congeneric tests, specifically designed reliability measures have been proposed. This paper emphasizes that such measures rely on a unidimensionality hypothesis. That hypothesis can be neither confirmed nor rejected when there are only three test parts, and will invariably be rejected when there are more than three test parts. Jackson and Agunwamba's (1977) greatest lower bound (glb) to reliability is proposed instead. The glb has a reputation for overestimating the population value when the sample size is small, but this is no reason to prefer unidimensionality-based reliability. First, the sampling bias of the glb does not play a role when the number of test parts is small, as is often the case with congeneric measures. Second, the glb and unidimensionality-based reliability are often equal when there are three test parts, and their numerical values remain very similar when there are more. To the extent that the bias problem of the glb does play a role, unidimensionality-based reliability is equally affected. Unidimensionality and reliability are often thought of as unrelated, but this paper shows that, from at least two perspectives, they act as antagonistic concepts. The paper also discusses a measure, based on the same framework that led to the glb, for assessing how close a set of variables is to unidimensionality: the percentage of common variance that can be explained by a single factor. An empirical example demonstrates the main points of the paper. The authors are obliged to Henk Kiers for commenting on a previous version. Gregor Sočan is now at the University of Ljubljana.

Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTs) in that they present test takers with edited critical incidents. Test takers simply read the action described and evaluate its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs has so far been extremely limited. The study reported here directly compares forms of an SRSJT and an MRSJT and explores the reliability, convergent validity, and predictive validity of each format. The results provide preliminary evidence that SRSJTs may produce internal consistency reliability, convergent validity, and predictive validity estimates comparable to those achieved with many traditional MRSJTs. We conclude by discussing practical implications for personnel selection and assessment, and for future research in psychological science more broadly.

Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were originally proposed by Mokken (1971), and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four classical lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods. The authors are grateful for constructive comments from the reviewers and from Charles Lewis.
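
As an illustration of a classical lower bound to reliability, the sketch below computes Guttman's lambda-2 together with coefficient alpha (Guttman's lambda-3) from the item covariance matrix. The abstract does not say which four classical bounds were compared, so lambda-2 is shown only as a familiar example. It is not necessarily one of them. The function names are hypothetical.

```python
import math


def _cov(x: list[float], y: list[float]) -> float:
    """Sample covariance with the n-1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def guttman_lambda2_and_alpha(scores: list[list[float]]) -> tuple[float, float]:
    """Return (lambda2, alpha) for a subjects-by-items score matrix."""
    k = len(scores[0])
    cols = [[row[j] for row in scores] for j in range(k)]
    cov = [[_cov(cols[i], cols[j]) for j in range(k)] for i in range(k)]
    total_var = sum(sum(r) for r in cov)  # variance of the total score
    item_var_sum = sum(cov[i][i] for i in range(k))
    off_sq = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    lambda1 = 1 - item_var_sum / total_var
    lambda2 = lambda1 + math.sqrt(k / (k - 1) * off_sq) / total_var
    alpha = k / (k - 1) * lambda1  # alpha equals Guttman's lambda3
    return lambda2, alpha
```

Lambda-2 is never smaller than alpha, so it is the tighter of the two bounds.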

Abstract: It is often necessary to predict scores, or their variation, for cases of interest. Ishii and Watanabe (2001) investigated, in the context of psychological measurement, the Bayesian predictive distribution of a new subject's scores on existing tests and of existing subjects' scores on a new test. This paper considers the Bayesian posterior predictive distribution of a new subject's scores on a new parallel test. It investigates the effects of the number of subjects, the number of tests, and test reliability. Under the assumption that the (co)variance parameters are known, the predictive variance of a new subject's score on a new test was found to equal the predictive variances of that subject's scores on the existing tests. When a new subject's scores on the existing tests were not observed, the effect of the number of subjects was relatively large and the effect of the number of tests relatively small.

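A reliability generalization analysis was conducted on research literature published in China over the past 20 years (1994–2013) on Big Five personality tests. The results showed the following.

(1) About 68.15% of the retrieved articles exhibited "reliability induction", that is, citing reliability coefficients from earlier studies rather than reporting them for their own data.

(2) In the unweighted estimates, Agreeableness (A) and Openness (O) had the lowest mean reliabilities and Neuroticism (N) and Conscientiousness (C) the highest. The domestic estimates were all slightly lower than those reported abroad (except for O), while the foreign estimates were slightly more variable (except for E, Extraversion). With coefficient alpha as the effect size in a random-effects model, N had the highest estimate and O and A the lowest.

(3) Regression analyses identified the following predictors:
- N: mean score, scale source, and north–south regional differences;
- E: scale source, the article's disciplinary type, test version, and scoring method;
- O: sample size, disciplinary type, and scale source;
- A: scale source, disciplinary type, number of items, and sample type;
- C: scale source, number of items, disciplinary type, and scoring method.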
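
A random-effects pooling of alpha coefficients, as used in reliability generalization studies like this one, can be sketched as follows. This is one common approach, not necessarily the procedure used in the study:

- Each alpha is transformed to ln(1 − alpha). Bonett's approximate sampling variance of that quantity is 2k/((k − 1)(n − 2)), for k items and n respondents.
- The between-study variance tau² is estimated with the DerSimonian-Laird moment estimator.

The function name is hypothetical.

```python
import math


def pooled_alpha_random_effects(alphas: list[float], ns: list[int],
                                ks: list[int]) -> tuple[float, float]:
    """Random-effects pooled alpha and between-study variance tau^2.

    y = ln(1 - alpha) with approximate variance 2k / ((k - 1)(n - 2));
    tau^2 is the DerSimonian-Laird moment estimate.
    """
    if len(alphas) < 2:
        raise ValueError("need at least two studies")
    y = [math.log(1.0 - a) for a in alphas]
    v = [2.0 * k / ((k - 1) * (n - 2)) for n, k in zip(ns, ks)]
    w = [1.0 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return 1.0 - math.exp(y_re), tau2  # back-transform to the alpha scale
```

Moderator analyses such as the regressions above would then regress the transformed alphas on study characteristics.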

The stimulus estimation model (Taylor & Rachman, 1994) asserts that fear overprediction stems from (a) overprediction of the danger elements of a phobic stimulus and (b) underprediction of existing safety resources. The model was tested experimentally with 25 snake-fearful participants per condition. A 2 × 2 factorial design was used, with danger (high vs. low) and safety (high vs. low) as between-subjects variables. The four experimental conditions were matched on initial levels of snake fearfulness, as assessed by the Snake Questionnaire (SNAQ). Among the 51 participants who demonstrated overprediction of fear, high danger led to reliably more fear overprediction than low danger, and low safety led to reliably more fear overprediction than high safety. The interaction between danger and safety was not statistically significant. The results offer the first convincing experimental support for the stimulus estimation model of fear overprediction.

Guan Dandan (关丹丹), Zhang Houcan (张厚粲). 《心理科学》 (Journal of Psychological Science), 2004, 27(2): 445–448
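This paper first clarifies the concept of reliability. It points out that reliability is an index of how dependable test results are, not an invariant property of the test instrument. Because reliability estimates for test scores vary, the paper introduces the reliability generalization method proposed by Vacha-Haase in the late 1990s. This is a meta-analytic method for exploring the variability of score reliability estimates and examining the predictors of that variation. Finally, based on an analysis of reliability generalization methods, the paper argues that rethinking the concept of reliability, together with reliability generalization research, will offer new insights to psychological test practitioners.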

Based on the test theory model for ordinal measurements proposed by Schulman and Haden, the present paper considers correlations between tests, attenuation, regressions involving true and observed scores, and the prediction of test reliability. The population correlation between tests is shown to be related to the expected sample correlation for samples of sizes n₁ and n₂. Errors of estimation, measurement, and prediction are found to be similar to their counterparts in interval test theory, while attenuation is identical to its counterpart. The bias in estimating population reliability from sample data is compared for Kendall's tau and Spearman's rho. The author wishes to thank the referees for their helpful comments on an earlier draft of this paper, and in particular for the suggested alternative methods of establishing some of the presented results.
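
Since attenuation is said to be identical to its interval-scale counterpart, the classical correction for attenuation is worth recalling. It estimates the correlation between true scores as r_xy / √(r_xx · r_yy). The sketch below implements that interval-scale formula. It does not implement any ordinal-specific machinery from the paper, and the function name is hypothetical.

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Classical correction for attenuation (estimated true-score correlation)."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)
```

For example, an observed correlation of 0.36 between tests with reliabilities 0.81 and 0.64 corresponds to an estimated true-score correlation of 0.36 / 0.72 = 0.5.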

The available statistical tests of the equality of nonindependent alpha reliability coefficients require that the product of the number of test parts times the number of subjects be quite large—1000 or more. A modification of one of these tests is derived which avoids this limitation. Monte Carlo studies indicate that the modified test effectively controls the Type I error rate with as few as 2 or 3 test parts and 50 subjects. This means the modified test can be safely employed in comparisons between interrater reliabilities.

Reliability estimates are usually attenuated and deflated because the item–score correlation (ρ_gX, R_it) embedded in the most widely used estimators is affected by several sources of mechanical error in the estimation. Empirical examples show that, in some types of datasets, estimates from traditional alpha may be deflated by 0.40–0.60 units of reliability, and those from maximal reliability by 0.40 units. This article proposes a new kind of correlation estimator, the attenuation-corrected correlation (R_AC). It is the ratio of the observed correlation to the maximal possible correlation reachable with the given item and score. Replacing ρ_gX with R_AC in the known formulas for reliability estimators yields attenuation-corrected alpha, theta, omega, and maximal reliability. All of these belong to a family of so-called deflation-corrected estimators of reliability.
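
The verbal definition of R_AC, observed correlation divided by the maximal correlation reachable with the given item and score, can be sketched directly. One way to obtain the maximal Pearson correlation for two fixed sets of values is to pair both variables in sorted order, which follows from the rearrangement inequality. That computational route is an assumption here, and the article's own procedure may differ. The function names are hypothetical, and constant inputs would cause a division by zero.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def attenuation_corrected_r(item: list[float], score: list[float]) -> float:
    """Observed item-score correlation divided by its maximum for these values."""
    observed = pearson(item, score)
    # Pairing sorted item values with sorted scores maximizes the correlation
    # while keeping both sets of values (their marginal distributions) fixed.
    maximal = pearson(sorted(item), sorted(score))
    return observed / maximal
```

When the item already orders respondents consistently with the total score, the corrected correlation equals 1, even though the observed correlation of a binary item with a continuous score is below 1.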

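Meta-analysis is an important method for drawing relatively accurate and representative conclusions about a topic of interest from existing studies. It is widely used in social science research in psychology, education, management, medicine, and other fields. Reliability is an important index of test quality, and composite reliability gives a relatively accurate estimate of test reliability. However, no published work has yet provided a method for the meta-analysis of composite reliability. This study first compares the strengths and weaknesses of three models for the meta-analysis of parameters. It then derives point and interval estimation methods for the meta-analysis of composite reliability under the varying-coefficient model. A simulation study, using interval coverage rate as the criterion, shows that the proposed interval estimation method performs well. An example illustrates how to conduct a meta-analysis of the composite reliability of a unidimensional test.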
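
The per-study quantity being pooled is composite reliability, which for a unidimensional test can be computed from a one-factor model. The formula is (Σλ)² / ((Σλ)² + Σθ), where λ are factor loadings and θ are error variances. The sketch below implements only that per-study formula. It does not implement the varying-coefficient meta-analytic estimators derived in the paper. With standardized loadings, it assumes the error variances are 1 − λ². The function names are hypothetical.

```python
def composite_reliability(loadings: list[float], error_variances: list[float]) -> float:
    """Composite reliability (omega) of a unidimensional test from a one-factor model."""
    total_loading = sum(loadings)
    true_part = total_loading ** 2
    return true_part / (true_part + sum(error_variances))


def composite_reliability_standardized(loadings: list[float]) -> float:
    """Same, assuming standardized loadings so that error variance is 1 - lambda^2."""
    return composite_reliability(loadings, [1.0 - lam * lam for lam in loadings])
```

For four items each loading 0.7, this gives 7.84 / (7.84 + 2.04), or about 0.79.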

Using the theory of pseudo maximum likelihood estimation, the asymptotic covariance matrix of maximum likelihood estimates for mean and covariance structure models is given for the case where the variables are not multivariate normal. This asymptotic covariance matrix is consistently estimated without computing the empirical fourth-order moment matrix. Using quasi-maximum likelihood theory, a Hausman misspecification test is developed. This test is sensitive to misspecification caused by errors that are correlated with the independent variables. Such misspecification cannot be detected by the test statistics currently used in covariance structure analysis. For helpful comments on a previous draft of the paper we are indebted to Kenneth A. Bollen, Ulrich L. Küsters, Michael E. Sobel and the anonymous reviewers of Psychometrika. For partial research support, the first author wishes to thank the Department of Sociology at the University of Arizona, where he was a visiting professor during the fall semester 1987.
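
The general idea of a Hausman test is to compare two estimators:

- one that is consistent under both the null and the alternative;
- one that is efficient under the null but inconsistent under misspecification.

The quadratic form H = d′(V_c − V_e)⁻¹d, with d the difference between the two estimates, is referred to a chi-square distribution. The one-parameter sketch below illustrates only that generic construction, not the covariance-structure version developed in the paper. The function name is hypothetical.

```python
def hausman_scalar(b_consistent: float, b_efficient: float,
                   var_consistent: float, var_efficient: float) -> float:
    """One-parameter Hausman statistic, referred to chi-square with 1 df."""
    diff_var = var_consistent - var_efficient
    if diff_var <= 0:
        raise ValueError("consistent estimator's variance must exceed the efficient one's")
    return (b_consistent - b_efficient) ** 2 / diff_var
```

A value above about 3.84 (the 95th percentile of chi-square with 1 df) would signal misspecification at the 5% level.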

Sociometric measures have frequently been used to measure social status; however, reliable sociograms for young children usually require time-consuming administration. A group-administered, peer-rating sociogram, the Sociometric Peer-Rating Scale (SPRS), was devised and given to 217 first and second graders. At the same time, teacher nominations of the children most liked, most aggressive, or most withdrawn were obtained, along with behavioral observations of the high- and low-SPRS children. After 7 months, the SPRS was readministered. In a separate population of eight kindergarten children, this sociogram and a similar, individually administered sociogram were both given. Normative data, test-retest reliability, and split-half reliability are reported. The test-retest reliability was comparable to the reported reliability of other peer-rating sociograms. The SPRS correlated significantly with teacher ratings of aggressiveness and likability and with the individually administered sociogram. The number of positive interactions differed significantly between high- and low-SPRS children. The usefulness of the SPRS as a measure of social competence is discussed. This research was submitted by the author in partial fulfillment of the requirements of a master's degree at the Florida State University. I would like to thank the Master's committee, Wallace Kennedy, William Pelham, and Joseph Torgesen, and the participating schools, Developmental Research School of Florida State University and Woodville Elementary School of the Leon County School District, for their assistance in this study.
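
Split-half reliability is conventionally obtained by correlating the two half-test scores and stepping the result up to full length with the Spearman-Brown formula, r_full = n·r / (1 + (n − 1)·r). The abstract does not state whether the SPRS split-half coefficient was stepped up this way, so the sketch below only shows the standard formula. The function name is hypothetical.

```python
def spearman_brown(r_part: float, factor: float = 2.0) -> float:
    """Spearman-Brown prophecy: reliability of a test `factor` times as long.

    With factor=2 this steps a half-test correlation up to full-test length.
    """
    return factor * r_part / (1 + (factor - 1) * r_part)
```

For instance, a half-test correlation of 0.60 steps up to 2(0.60) / 1.60 = 0.75 for the full test.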

The purpose of the present study was to investigate the statistical properties of two extensions of the Levin-Wampold (1999) single-case simultaneous start-point model's comparative effectiveness randomization test. The two extensions were (a) adapting the test to situations where there are more than two different intervention conditions and (b) examining the test's performance in classroom-based intervention situations, where the number of time periods (and associated outcome observations) is much smaller than in the contexts for which the test was originally developed. Various Monte Carlo sampling situations were investigated, including from one to five participant blocks per condition and differing numbers of time periods, potential intervention start points, degrees of within-phase autocorrelation, and effect sizes. For all situations, it was found that the Type I error probability of the randomization test was maintained at an acceptable level. With a few notable exceptions, respectable power was observed only in situations where the numbers of observations and potential intervention start points were relatively large, effect sizes were large, and the degree of within-phase autocorrelation was relatively low. It was concluded that the comparative effectiveness randomization test, with its desirable internal validity and statistical-conclusion validity features, is a versatile analytic tool that can be incorporated into a variety of single-case school psychology intervention research situations.
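
The logic of a start-point randomization test can be illustrated with its simplest form: a single-unit AB design. The intervention start is drawn at random from a set of admissible points, the statistic is the B-phase mean minus the A-phase mean, and the p-value is the proportion of admissible start points yielding a statistic at least as large as the observed one. This basic single-series test is a swapped-in illustration. It is not the Levin-Wampold comparative-effectiveness test with simultaneous starts and multiple participant blocks. The function name is hypothetical.

```python
def start_point_randomization_test(series: list[float], actual_start: int,
                                   candidate_starts: list[int]) -> float:
    """One-sided randomization p-value for a single AB series.

    Statistic: mean(B phase) - mean(A phase). Its reference distribution is
    the set of statistics at every admissible intervention start point.
    """
    if actual_start not in candidate_starts:
        raise ValueError("the actual start point must be one of the candidates")
    if min(candidate_starts) < 1 or max(candidate_starts) > len(series) - 1:
        raise ValueError("each start point must leave observations in both phases")

    def effect(start: int) -> float:
        a, b = series[:start], series[start:]
        return sum(b) / len(b) - sum(a) / len(a)

    observed = effect(actual_start)
    stats = [effect(s) for s in candidate_starts]
    # Proportion of admissible start points at least as extreme as observed
    return sum(1 for s in stats if s >= observed) / len(stats)
```

With only three admissible start points, the smallest attainable p-value is 1/3. This illustrates why, as the abstract reports, power depends heavily on the number of potential intervention start points.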


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417