Similar Articles
20 similar articles found (search time: 15 ms)
1.
Application of the bootstrap method to interval estimation of composite score reliability (total citations: 1; self-citations: 0; citations by others: 1)
屠金路  金瑜  王庭照 《心理科学》2005,28(5):1199-1200
Building on an introduction to the principles of the bootstrap method, this paper uses simulated data from a homogeneous measurement model to demonstrate how the bootstrap method is applied, within a structural equation model, to interval estimation of composite score reliability.
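The resampling logic described in this abstract can be sketched as follows. For simplicity the sketch uses coefficient alpha as the reliability statistic rather than an SEM-based composite reliability estimate, and the simulated one-factor data, loadings, and sample size are illustrative assumptions, not the paper's.

```python
import numpy as np

def coefficient_alpha(data):
    """Cronbach's alpha for an (n_persons, k_items) score matrix."""
    k = data.shape[1]
    item_var = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)

def bootstrap_percentile_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample persons with replacement,
    recompute the statistic, and take the tail percentiles."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    stats = np.array([statistic(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    lo, hi = np.percentile(stats, [50 * (1 - level), 50 * (1 + level)])
    return lo, hi

# Simulated homogeneous (one-factor) data, loosely in the spirit of the
# paper's demonstration; all numbers are made up for illustration.
rng = np.random.default_rng(1)
factor = rng.normal(size=(500, 1))
loadings = np.array([[0.8, 0.7, 0.6, 0.7]])
scores = factor @ loadings + rng.normal(scale=0.5, size=(500, 4))

alpha_hat = coefficient_alpha(scores)
ci = bootstrap_percentile_ci(scores, coefficient_alpha)
```

Swapping `coefficient_alpha` for an SEM-based omega estimator reproduces the paper's setting; the resampling loop is unchanged.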

2.
Composite measures play an important role in psychology and related disciplines. Composite measures almost always have error. Correspondingly, it is important to understand the reliability of the scores from any particular composite measure. However, the point estimates of the reliability of composite measures are fallible and thus all such point estimates should be accompanied by a confidence interval. When confidence intervals are wide, there is much uncertainty in the population value of the reliability coefficient. Given the importance of reporting confidence intervals for estimates of reliability, coupled with the undesirability of wide confidence intervals, we develop methods that allow researchers to plan sample size in order to obtain narrow confidence intervals for population reliability coefficients. We first discuss composite reliability coefficients and then provide a discussion on confidence interval formation for the corresponding population value. Using the accuracy in parameter estimation approach, we develop two methods to obtain accurate estimates of reliability by planning sample size. The first method provides a way to plan sample size so that the expected confidence interval width for the population reliability coefficient is sufficiently narrow. The second method ensures that the confidence interval width will be sufficiently narrow with some desired degree of assurance (e.g., 99% assurance that the 95% confidence interval for the population reliability coefficient will be less than W units wide). The effectiveness of our methods was verified with Monte Carlo simulation studies. We demonstrate how to easily implement the methods with easy-to-use and freely available software.
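The first method described above, choosing the smallest n whose expected confidence interval width falls below a target W, can be sketched generically. The standard-error function below (proportional to 1/sqrt(n)) is a placeholder assumption; the paper derives model-specific interval widths for reliability coefficients.

```python
import math
from statistics import NormalDist

def planned_n(width_goal, se_at_n, level=0.95, n_start=10, n_max=100000):
    """Smallest sample size whose expected CI width is below width_goal.
    se_at_n: callable n -> approximate standard error of the reliability
    estimate at sample size n (model-dependent; supplied by the user)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    for n in range(n_start, n_max):
        if 2 * z * se_at_n(n) < width_goal:   # expected width = 2 z SE(n)
            return n
    raise ValueError("width goal not reachable below n_max")

# Illustrative-only SE model: se proportional to 1 / sqrt(n).
n_needed = planned_n(0.10, lambda n: 0.5 / math.sqrt(n))
```

The assurance variant (the paper's second method) replaces the expected width with an upper quantile of the width's sampling distribution before comparing against the goal.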

3.
There are two methods for estimating the confidence interval of the composite reliability of a multidimensional test: the bootstrap method and the delta method. A simulation study comparing the two found that their results differ very little. Because the bootstrap method yields empirical results and is commonly regarded as reflecting the true values, while the delta method is much simpler, the delta method can be used to estimate confidence intervals for composite reliability. An example demonstrates how to compute the composite reliability of a multidimensional test and its delta-method confidence interval.

4.
A comparison of three interval estimates of composite reliability for unidimensional tests (total citations: 3; self-citations: 0; citations by others: 3)
叶宝娟  温忠麟 《心理学报》2011,43(4):453-461
Many studies have recommended estimating test reliability with composite reliability and reporting its confidence interval. There are three ways to compute the confidence interval of the composite reliability of a unidimensional test: the bootstrap method, the delta method, and direct computation from the standard errors output by statistical software such as LISREL. A simulation comparison found that the delta method yields confidence intervals very close to those of the bootstrap method, whereas intervals computed from LISREL's standard errors differ greatly from the bootstrap results. The delta method (easily implemented in Mplus) is recommended for estimating confidence intervals of composite reliability; the standard errors output by LISREL should not be used directly. An example illustrates how to compute the composite reliability of a unidimensional test and its delta-method confidence interval.
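A minimal sketch of the delta-method interval for congeneric (unidimensional) composite reliability, omega = (Σλ)² / ((Σλ)² + Σθ). The loadings, error variances, and the diagonal asymptotic covariance matrix below are hypothetical stand-ins for the estimates and covariance matrix that SEM software would output.

```python
import numpy as np
from statistics import NormalDist

def omega(lam, theta):
    """Composite reliability of a congeneric unidimensional test."""
    L = lam.sum()
    return L**2 / (L**2 + theta.sum())

def omega_delta_ci(lam, theta, acov, level=0.95):
    """Delta-method CI. acov: asymptotic covariance matrix of the
    parameter estimates ordered (lam_1..lam_k, theta_1..theta_k)."""
    L, T = lam.sum(), theta.sum()
    D = L**2 + T
    # Analytic gradient: d omega/d lam_i = 2 L T / D^2,
    #                    d omega/d theta_i = -L^2 / D^2.
    grad = np.concatenate([np.full(lam.size, 2 * L * T / D**2),
                           np.full(theta.size, -L**2 / D**2)])
    se = float(np.sqrt(grad @ acov @ grad))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    w = omega(lam, theta)
    return w - z * se, w + z * se

# Hypothetical estimates and an illustrative diagonal acov matrix.
lam = np.array([0.8, 0.7, 0.6])
theta = np.array([0.36, 0.51, 0.64])
acov = np.diag(np.full(6, 0.001))
lo, hi = omega_delta_ci(lam, theta, acov)
```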

5.
叶宝娟  温忠麟 《心理科学》2012,35(5):1213-1217
Numerous studies have shown that, in general, composite reliability estimates test reliability well. Methods for estimating composite reliability and its confidence interval have been studied extensively for unidimensional tests, but interval estimation for multidimensional tests has rarely been discussed. This paper uses the delta method to derive a standard error formula for the composite reliability of a multidimensional test, computes the confidence interval from it, and gives an example showing how to program the estimation of a multidimensional test's composite reliability and its confidence interval.

6.
Many writers implicate perceptions of the opportunity structure in the labor market as essential components of the formation, stability, and enactment of socioeconomic achievement attitudes. These perceptions of opportunity are thought to be observed structural constraints and reflective of more than just pure motivation. Previous attempts at measuring “perceived opportunity” have no consistent approach or conceptualization. This study evaluates a 10-item scale of perceived occupational opportunity in an attempt to overcome many of these problems. Using panel data covering the period of career decision making and labor force entry (adolescence to young adulthood), the internal reliability and construct validity of the linear composite are assessed. The scale's external validity is then further explored within the context of a structural equation model linking perceived opportunity to social origins, adolescent career plans, and early socioeconomic attainments.

7.
Meta-analysis is an important method for drawing relatively accurate and representative conclusions about a topic of interest from existing studies, and it is widely used in psychology, education, management, medicine, and other social science research. Reliability is a key indicator of test quality, and composite reliability estimates test reliability relatively accurately. No literature appears to provide a meta-analysis method for composite reliability. After comparing the strengths and weaknesses of three models for meta-analyzing a parameter, this study derives point and interval estimation methods for meta-analysis of composite reliability under the varying coefficient model. A simulation study, using interval coverage as the criterion, shows that the proposed interval estimation method is sound. An example illustrates how to meta-analyze the composite reliability of unidimensional tests.
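For contrast with the varying coefficient approach the abstract describes, here is a generic inverse-variance pooling sketch of reliability estimates with a normal-approximation interval; this is not the paper's method, and the estimates and standard errors are made up for illustration.

```python
import math
from statistics import NormalDist

def pool_fixed(estimates, ses, level=0.95):
    """Inverse-variance weighted pooling of per-study reliability
    estimates, with a normal-approximation confidence interval."""
    weights = [1 / s**2 for s in ses]          # weight = 1 / SE^2
    wsum = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / wsum
    se = math.sqrt(1 / wsum)                   # SE of the pooled estimate
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return pooled, (pooled - z * se, pooled + z * se)

# Hypothetical per-study composite reliability estimates and SEs.
pooled, ci = pool_fixed([0.82, 0.78, 0.86], [0.02, 0.03, 0.025])
```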

8.
This paper demonstrates that the widely available and routinely used index ‘coefficient alpha if item deleted’ can be misleading in the process of construction and revision of multiple‐component instruments with congeneric measures. An alternative approach to evaluation of scale reliability following deletion of each component in a given composite is outlined that can be recommended in general for scale development purposes. The method provides ranges of plausible values for instrument reliability when dispensing with single components in a tentative composite, and permits testing hypotheses about reliability of resulting scale versions. The proposed procedure is illustrated with an example.

9.
Inter-rater reliability and accuracy are measures of rater performance. Inter-rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter-rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess for factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert-generated criterion ratings and between raters using intraclass correlation (2,1). Inter-rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter-rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors of criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter-rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter-rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.

10.
Heng Li 《Psychometrika》1997,62(2):245-249
A formally simple expression for the maximal reliability of a linear composite is provided; its theoretical implications and its relation to existing results on reliability are discussed.
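Under a congeneric model with a standardized factor, the closed form usually cited for maximal reliability is rho_max = A / (1 + A) with A = Σ λᵢ²/θᵢ, attained by weights proportional to λᵢ/θᵢ; whether this matches Li's notation exactly is an assumption, and the numeric values below are illustrative.

```python
import numpy as np

def maximal_reliability(lam, theta):
    """Closed-form maximal reliability under a congeneric model with
    a standardized factor: rho_max = A / (1 + A), A = sum(lam^2 / theta)."""
    A = (lam**2 / theta).sum()
    return A / (1 + A)

def composite_reliability(w, lam, theta):
    """Reliability of the weighted composite w'x under the same model."""
    true_var = (w @ lam)**2
    return true_var / (true_var + (w**2 * theta).sum())

# Hypothetical loadings and error variances.
lam = np.array([0.8, 0.7, 0.6])
theta = np.array([0.36, 0.51, 0.64])
w_opt = lam / theta                 # the maximizing weights under this model
rho_max = maximal_reliability(lam, theta)
```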

11.
Weights may be determined for combining tests so that the composite has maximum reliability, but the calculation of these weights by means of the original equations is cumbersome. It is shown that the desired weighted composite is the first principal axis of a matrix closely related to the intercorrelation matrix. Thus, simple and straightforward procedures are available for calculating the weights.
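The principal-axis result can be restated in modern terms: the maximizing weights solve a generalized eigenproblem in the true-score and observed covariance matrices, which reduces to an ordinary symmetric eigenproblem after a whitening transform. A sketch under that restatement (the congeneric example matrices are assumptions):

```python
import numpy as np

def max_reliability_weights(true_cov, obs_cov):
    """Weights maximizing composite reliability w'T w / w'S w, found as
    the leading eigenvector of S^{-1/2} T S^{-1/2}; the corresponding
    eigenvalue is the maximal reliability itself."""
    vals, vecs = np.linalg.eigh(obs_cov)
    s_inv_half = vecs @ np.diag(vals**-0.5) @ vecs.T
    evals, evecs = np.linalg.eigh(s_inv_half @ true_cov @ s_inv_half)
    w = s_inv_half @ evecs[:, -1]     # back-transform the leading eigenvector
    return w / np.abs(w).max(), evals[-1]

# Congeneric example: true-score cov = lam lam'; obs cov adds error variances.
lam = np.array([0.8, 0.7, 0.6])
theta = np.array([0.36, 0.51, 0.64])
T = np.outer(lam, lam)
S = T + np.diag(theta)
w, rho_max = max_reliability_weights(T, S)
```

For this congeneric case the eigenvector route agrees with the closed-form optimum (weights proportional to lam/theta).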

12.
This study examined weighting effects on composite reliability in multivariate generalizability theory for evaluating university teachers' teaching. Using the "University Teacher Teaching Evaluation Questionnaire," 534 students rated 16 teachers, and the data were analyzed with a nested multivariate generalizability design. Results: (1) in evaluating university teachers' teaching, estimated weights, a priori weights, and effect weights produce different effects on composite reliability in multivariate generalizability theory; (2) building on a full analysis of how the three kinds of weights affect composite reliability, a schematic model of the three weighting effects on composite reliability is proposed, providing a scientific reference for the correct use of weights in multivariate generalizability theory.

13.
It is often desirable for mental health practitioners to combine standard scores from different tests, raters, or times into a single composite standard score. Most often the result is a more reliable and accurate standard score. This paper describes a computer program that uses two standard scores, score reliability and correlation with a third variable, to yield a composite standard score, reliability and correlation. Trends, limitations, optimum benefits, and examples are discussed. References are provided for calculating composites based on more than two scores.
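Classical (Mosier-type) formulas for an equally weighted sum of two standardized scores give the kind of outputs the described program produces. The sketch below is not that program, and the input values are hypothetical.

```python
import math

def composite_stats(r1, r2, r12, r1y, r2y):
    """Equally weighted sum of two standardized scores: its reliability
    and its correlation with a third variable y, assuming uncorrelated
    errors (classical composite-score formulas)."""
    var_c = 2 + 2 * r12                      # variance of the sum
    rel_c = (r1 + r2 + 2 * r12) / var_c      # composite reliability
    r_cy = (r1y + r2y) / math.sqrt(var_c)    # composite-criterion correlation
    return rel_c, r_cy

# Hypothetical reliabilities, inter-score r, and criterion correlations.
rel_c, r_cy = composite_stats(0.80, 0.85, 0.60, 0.40, 0.45)
```

Note that the composite reliability (about 0.89 here) exceeds either component reliability, which is the "more reliable score" point made in the abstract.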

14.
In this article we show that a particular mathematical learning model, the Bower-Trabasso (1964) concept identification model, taken together with an assumption of independence of replicate measurements, implies the existence of substantial and statistically significant performance differences across individuals. The individual differences in turn imply a sizeable reliability coefficient. These results contradict naive intuition, for this model (like many other mathematical models of learning) assumes that all individuals begin the experiment with identical parameter values for the process under study. Thus at least one such model has the characteristic of implying the generation of individual performance differences among originally identical organisms.

Examination of data from an experiment by Cotton shows that the Hoyt reliability coefficient under classical test theory, a lower bound for the (composite) reliability of total scores for a series of trials, increases with the number of trials analyzed and exceeds the corresponding theoretical values implied by the Bower-Trabasso model. An experiment by Levine was also analyzed because its use of blank trials between feedback trials permitted direct calculation of composite reliability (or, more properly, composite consistency). For this experiment, the theoretical development just discussed (Case I) was used together with Restle's hypothesis selection model specialized to include a local consistency assumption, the so-called P2 model of Gregg and Simon (Case II). Moderate conformity of empirical and theoretical reliabilities was found, with discrepancies between observed and predicted values usually being smaller with Case II. However, the Hoyt reliability coefficient is not a lower bound for composite reliability in Case II, because composite reliability is underestimated when identical stimuli are not used for comparable trials.

Despite the Bower-Trabasso assumption of no initial differences, it seems reasonable to attribute the difference between predicted and obtained reliabilities to preexisting individual differences. Implications of the tentative conclusion that individual differences in concept identification performance are attributable to a combination of preexisting differences and differences induced in a current task are discussed briefly.

16.
A reanalysis of previously published data suggests that the Defense Mechanism Inventory can be utilized to yield a composite measure of reaction to frustration by contrasting linearly the defenses of Turning-against-object and Projection against those of Principalization and Reversal-of-affect. Factor-analytic and correlational data support the exclusion of Turning-against-self from the composite measure. Studies of content validity are presented for the combination of the four defenses into one dimension. Patterns of interitem reliability are charted for the five defenses and the composite measure for both men and women. Internal consistency data are also presented for the standard scoring as well as for a modified method to explore the feasibility of simplifying and shortening the test-taking procedure.

17.
A multivariate generalizability analysis of the reliability of structured interviews for leading cadres (total citations: 1; self-citations: 0; citations by others: 1)
洪自强  涂冬波 《心理学探新》2006,26(1):85-90,95
This study applied multivariate generalizability theory to analyze the measurement reliability of structured-interview data from a qualification assessment for deputy-division-level cadres in a district of Beijing, providing useful empirical evidence for improving the scientific quality of examinations and assessments of leading cadres. Main conclusions: (1) the structured interview was of moderate difficulty and high discrimination; (2) the reliability-like coefficients of each assessed element and of the composite score were high, and the measurement reliability of the composite score exceeded that of any single element; (3) these coefficients increased with the number of interviewers, and, balancing reliability against cost, five to nine interviewers are advisable; (4) the correlations among the assessed elements were high, which supports combining element scores into a composite in selection interviews and indicates that using a composite score as the total score is reasonable.

18.
Two types of interobserver reliability values may be needed in treatment studies in which observers constitute the primary data-acquisition system: trial reliability and the reliability of the composite unit or score which is subsequently analyzed, e.g., daily or weekly session totals. Two approaches to determining interobserver reliability are described: percentage agreement and "correlational" measures of reliability. The interpretation of these estimates, factors affecting their magnitude, and the advantages and limitations of each approach are presented.
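The two approaches can be illustrated side by side: point-by-point percentage agreement on trials, and a product-moment ("correlational") coefficient on composite session totals. The observation records below are hypothetical.

```python
import math

def percent_agreement(obs_a, obs_b):
    """Trial-by-trial (point-by-point) percentage agreement."""
    hits = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * hits / len(obs_a)

def pearson(x, y):
    """Product-moment correlation, the 'correlational' reliability index."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical interval records from two observers (1 = behavior scored).
a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0]
trial_pa = percent_agreement(a, b)                        # trial reliability
totals_a = [sum(a[i:i + 4]) for i in range(0, len(a), 4)] # 3 session totals
totals_b = [sum(b[i:i + 4]) for i in range(0, len(b), 4)]
session_r = pearson(totals_a, totals_b)                   # composite reliability
```

As the abstract notes, the two indices answer different questions: here the observers agree on 10 of 12 trials yet their session totals correlate only moderately.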

19.
Popular computer programs print 2 versions of Cronbach's alpha: unstandardized alpha, α(Σ), based on the covariance matrix, and standardized alpha, α(R), based on the correlation matrix. Sources that accurately describe the theoretical distinction between the 2 coefficients are lacking, which can lead to the misconception that the differences between α(R) and α(Σ) are unimportant and to the temptation to report the larger coefficient. We explore the relationship between α(R) and α(Σ) and the reliability of the standardized and unstandardized composite under 3 popular measurement models; we clarify the theoretical meaning of each coefficient and conclude that researchers should choose an appropriate reliability coefficient based on theoretical considerations. We also illustrate that α(R) and α(Σ) estimate the reliability of different composite scores, and in most cases cannot be substituted for one another.
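The distinction can be made concrete: α(Σ) applies the alpha formula to the covariance matrix, α(R) applies it to the correlation matrix (i.e., the composite of z-scored items), and rescaling items moves only the former. The simulated data are illustrative assumptions.

```python
import numpy as np

def alpha_unstandardized(data):
    """alpha(Sigma): Cronbach's alpha from the covariance matrix."""
    S = np.cov(data, rowvar=False)
    k = S.shape[0]
    return k / (k - 1) * (1 - np.trace(S) / S.sum())

def alpha_standardized(data):
    """alpha(R): the same formula applied to the correlation matrix,
    the reliability of the equally weighted composite of z-scores."""
    R = np.corrcoef(data, rowvar=False)
    k = R.shape[0]
    return k / (k - 1) * (1 - k / R.sum())

# Items with very unequal variances make the two coefficients diverge.
rng = np.random.default_rng(2)
f = rng.normal(size=(400, 1))
raw = f @ np.array([[1.0, 1.0, 1.0]]) + rng.normal(size=(400, 3))
data = raw * np.array([1.0, 3.0, 10.0])   # rescale item variances

a_cov = alpha_unstandardized(data)
a_cor = alpha_standardized(data)
```

With these rescaled items the two coefficients differ substantially, illustrating why reporting whichever is larger is not defensible.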

20.
叶宝娟  温忠麟 《心理学报》2012,44(12):1687-1694
When deciding whether to combine the scores of a multidimensional test into a total score, the homogeneity of the test should be considered; if homogeneity is too low, the total score is of little use. Homogeneity can be measured with a homogeneity coefficient. The model used to compute it is the bifactor model (with both a general factor and group factors), which has attracted attention in recent years; the test's homogeneity coefficient is defined as the proportion of test score variance accounted for by the general factor. This paper uses the delta method to derive a standard error formula for the homogeneity coefficient and from it computes the confidence interval, and provides a simple program for estimating both. An example shows how to estimate the homogeneity coefficient and its confidence interval; a simulation comparing delta-method and bootstrap confidence intervals found very small differences between them.
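Given bifactor parameter estimates, the homogeneity coefficient defined in this abstract (the general factor's share of total-score variance) is a one-line computation. The loadings and uniquenesses below are hypothetical.

```python
import numpy as np

def homogeneity(gen, groups, theta):
    """Homogeneity coefficient under a bifactor model: share of total-score
    variance due to the general factor.
    gen: general-factor loadings; groups: one loading vector per group
    factor (zeros for items not on that factor); theta: uniquenesses."""
    gvar = gen.sum()**2                           # general-factor variance
    svar = sum(g.sum()**2 for g in groups)        # group-factor variance
    total = gvar + svar + theta.sum()             # total-score variance
    return gvar / total

# Hypothetical bifactor solution: 6 items, 2 group factors of 3 items each.
gen = np.array([0.6, 0.6, 0.6, 0.5, 0.5, 0.5])
groups = [np.array([0.4, 0.4, 0.4, 0.0, 0.0, 0.0]),
          np.array([0.0, 0.0, 0.0, 0.3, 0.3, 0.3])]
theta = np.array([0.48, 0.48, 0.48, 0.66, 0.66, 0.66])
h = homogeneity(gen, groups, theta)
```

A delta-method or bootstrap interval, as the paper develops, would be wrapped around this point estimate.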


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号