Similar Documents
1.
A covariance structure analysis method that accounts for specificity variance is outlined for improved point and interval estimation of composite reliability in repeated-measures designs. The approach also permits testing the time-invariance of the reliability of multiple-component instruments, defined as the ratio of "pure" measurement error variance to observed scale score variance. In addition, the procedure allows interval estimation of the difference in composite reliability coefficients across assessment occasions. The method is illustrated with data from a cognitive intervention study.

2.
Inter-rater reliability and accuracy are both measures of rater performance. Inter-rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter-rater reliability and accuracy among a group of raters using a treatment adherence scale, and to assess factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert-generated criterion ratings and between raters using the intraclass correlation ICC(2,1). Inter-rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected both inter-rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors, as reflected in the criterion ratings, correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter-rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e., the therapist and the intensity and frequency of rated behaviors) was shown to influence inter-rater reliability and accuracy. The additional resources needed for a composite rating (a rating based on the average score of paired raters) may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating (a rating generated following discussion between 2 raters) may not be warranted. Further research is needed to determine whether these findings hold for other raters and treatment adherence scales.
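The abstract reports agreement as ICC(2,1). As a minimal sketch, assuming the Shrout and Fleiss (1979) two-way random-effects, absolute-agreement, single-rater form and a complete targets × raters table with no missing ratings, the coefficient can be computed from ANOVA mean squares. The function name `icc_2_1` is illustrative, not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is a list of rows: one row per target, one column per rater."""
    n = len(ratings)        # number of targets (e.g., rated sessions)
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets, raters, and the residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-target mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because this form measures absolute agreement, a rater who is consistently one point higher than another lowers the coefficient: `icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]])` gives 10/13 rather than 1. Placing a rater's scores next to the expert criterion column is one way the same function could index accuracy, though the study's exact procedure is not given in the abstract.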

3.
Drewes DW, Psychological Methods, 2000, 5(2): 214-227
The requirement of parallel parts has long been the cornerstone of classical reliability theory. Recasting reliability in a structural equation framework means that items, raters, or judges no longer need to be treated as equivalent entities. Instead, a unique reliability estimate can be determined for each, and these estimates can be used collectively to assess the maximal reliability of a weighted composite, with the composite reliability submitted to an inferential test. The procedures are shown to generalize from single-factor to multifactor applications. Ramifications of a structural approach to reliability determination are probed, and the dilemma posed by possible falsification of the true-score hypothesis is presented for individual researchers' consideration.

4.
A simple stochastic model is formulated to determine the optimal time between the first and second tests when the test-retest method of assessing reliability is used. A forgetting process and a true-score change process are postulated. The optimal time between tests is derived by maximizing the probability that the respondent has neither remembered the response to the first test nor had a change in true score. The resulting test-retest correlation is then found to be a linear function of the true reliability of the test, where the slope of this function is the key probability of not remembering and having no change in true score. Numerical examples and suggestions for using the results in empirical studies are given, and specific recommendations are presented for improved design and analysis of intentions data. This research was made possible by a grant from the Center for Food Policy Research, Graduate School of Business, Columbia University, New York, New York, 10027.
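The abstract does not specify the forgetting and change processes, so the following is only an illustrative sketch under a hypothetical assumption, not the paper's model: memory of the first response fades at exponential rate `a`, true-score change arrives at exponential rate `b`, and the two are independent. The "clean retest" probability is then p(t) = (1 - e^(-at)) e^(-bt); setting dp/dt = 0 gives a e^(-at) = b(1 - e^(-at)), hence t* = ln((a + b)/b)/a. Per the abstract, this probability is the slope relating the retest correlation to true reliability.

```python
import math

def p_clean_retest(t, a, b):
    """Hypothetical: probability the respondent has forgotten the first
    response (rate a) AND has had no true-score change (rate b) by time t,
    assuming independent exponential processes."""
    return (1.0 - math.exp(-a * t)) * math.exp(-b * t)

def optimal_interval(a, b):
    """Maximizer of p_clean_retest, from dp/dt = 0:
    a*e^(-a*t) = b*(1 - e^(-a*t))  =>  t* = ln((a + b) / b) / a."""
    return math.log((a + b) / b) / a
```

For example, with a = 0.5 and b = 0.05 per week (made-up rates), `optimal_interval(0.5, 0.05)` equals 2 ln 11, roughly 4.8 weeks; faster forgetting or slower true change pulls the optimum in opposite directions.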

5.
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between the latent variables and dichotomous observed variables, which may be responses to tests or questionnaires. It is shown that the multilevel model with measurement error in the observed predictor variables can be estimated in a Bayesian framework using Gibbs sampling. Handling measurement error via the normal ogive model is compared with alternative approaches based on the classical true-score model. Examples using real data are given. This paper is part of the dissertation by Fox (2001) that won the 2002 Psychometric Society Dissertation Award.

6.
The distinction between state and trait sources of variation in psychological variables, familiar in differential psychology since the work of Cattell, is translated into a linear model of state-trait analysis. On the basis of this model, indices of the relative state and trait variance of a measure (independent of measurement error) and the psychometric reliability of the same measure (independent of true state variance) can be determined. The model is applied to a large data set from field research with 149 male high-school students (average age 17 years) from the city of Hamburg. Following a time-sampling plan of data acquisition, participants were asked, over three successive weeks and on average twelve times a day, to report their current mood and feelings on a portable computerized behavioral data recorder and to answer short psychometric tests (in-field psychometrics); in addition, two peripheral psychophysiological measures of activation (heart rate, finger temperature) were recorded. About two thirds of the true-score variance of the psychophysiological measures was due to state variation, whereas about 50 to 80 percent of the true-score variance of the various cognitive tests was due to trait variation. Different scales for recording current mood and feeling states yielded different state-trait variance percentages. The method also allows intraindividual psychometric state reliability coefficients to be determined. Fields of application of the method in different areas of psychology are indicated.

7.
Causal theories of measurement view test items as effects of a common cause. Behavior domain theories view test item responses as behaviors sampled from a common domain. A domain score is a composite score over this domain. The question arises whether latent variables can simultaneously constitute domain scores and common causes of item scores. One argument to the contrary holds that behavior domain theory offers more effective guidance for item construction than a causal theory of measurement. A second argument appeals to the apparent circularity of taking a domain score, which is defined in terms of a domain of behaviors, as a cause of those behaviors. Both arguments require qualification, and behavior domain theory seems to rely on implicit causal relationships in two respects. Three strategies permit reconciliation of the two theories: one can take a causal structure as providing the basis for a homogeneous domain; one can construct a homogeneous domain and then investigate whether a causal structure explains the homogeneity; or one can take the domain score as linked to an existing attribute constrained by indirect measurement.

8.
A method is outlined for examining change in maximal reliability for pre-specified sets of congeneric measures when developing a multi-component instrument. The approach can be used to estimate and test the gain or loss in the maximal reliability coefficient that results from adding or dropping one or more measures from a homogeneous composite with uncorrelated errors, as well as to choose components optimally for the highest increase or, correspondingly, the smallest drop in maximal reliability. The method is compared with a procedure for ascertaining change in unweighted sum score reliability, and implications for instrument construction and revision are discussed. The approach is illustrated with a numerical example.
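To make the quantities concrete, here is a minimal sketch assuming a congeneric one-factor model with the factor variance fixed at 1, loadings λ_i, and uncorrelated error variances θ_i. Under those assumptions the maximal reliability of the optimally weighted composite is S/(1 + S) with S = Σ λ_i²/θ_i, and the unweighted sum score reliability is (Σλ_i)²/((Σλ_i)² + Σθ_i). The function names are hypothetical, and the sketch gives point values only, not the article's interval estimates or tests.

```python
def maximal_reliability(loadings, errors):
    """Maximal reliability of an optimally weighted composite of congeneric
    measures with uncorrelated errors (factor variance fixed at 1)."""
    s = sum(l * l / e for l, e in zip(loadings, errors))
    return s / (1.0 + s)

def sum_score_reliability(loadings, errors):
    """Reliability of the unweighted sum of the same measures."""
    num = sum(loadings) ** 2
    return num / (num + sum(errors))

def drop_one_changes(loadings, errors):
    """Change in maximal reliability when each component is dropped in turn."""
    full = maximal_reliability(loadings, errors)
    changes = []
    for i in range(len(loadings)):
        lam = loadings[:i] + loadings[i + 1:]
        err = errors[:i] + errors[i + 1:]
        changes.append(maximal_reliability(lam, err) - full)
    return changes
```

Under this model every added measure with a nonzero loading raises maximal reliability, because each contributes a positive λ²/θ term to S. The unweighted sum can move the other way: adding a weak measure (λ = 0.1, θ = 0.99) to two strong ones (λ = 0.9, θ = 0.19) lowers sum score reliability from about .90 to about .72, while maximal reliability still rises slightly.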

9.
Latent growth curve models with piecewise functions for continuous repeated-measures data have become increasingly popular and versatile tools for investigating individual behavior that exhibits distinct phases of development in observed variables. As an extension of this framework, this study considers a piecewise function describing segmented change in a latent construct over time, where the latent construct is itself measured by multiple indicators gathered at each measurement occasion. The time of transition from one phase to another is not known a priori and is therefore a parameter to be estimated. The utility of the model is highlighted in two ways. First, a small Monte Carlo simulation shows the model's ability to recover true (known) growth parameters, including the location of the transition point (or knot), under different manipulated conditions. Second, an empirical example using longitudinal reading data is fitted via maximum likelihood and the results are discussed. Mplus (Version 6.1) code is provided in Appendix C to make this class of models accessible to practitioners.

10.
Liu Yuan (刘源), Advances in Psychological Science, 2021, 29(10): 1755-1772
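In longitudinal research, cross-lagged models can examine reciprocal effects among multiple variables, while latent growth models can examine individual growth trajectories. Integrating the two classes of models, for example by attending simultaneously to reciprocal effects and individual growth trends while also defining variance components such as measurement error and random intercepts, has produced the random intercept cross-lagged panel model, the trait-state-error model, the autoregressive latent trajectory model, the latent curve model with structured residuals, and others. Taking the cross-lagged model and the latent growth model in turn as base models, this article reviews these models from the perspective of decomposing between-person and within-person variance, integrates them into a common analytic framework, and extends that framework into a "factor latent curve model with structured reciprocals" as a unifying model. Using empirical data from the Early Childhood Longitudinal Study, Kindergarten Class (ECLS-K), the reciprocal effects and growth trends of reading and mathematics ability were modeled for 21,049 children. The model that separated out stable traits fit best. The article also offers recommendations on modeling strategy and model selection.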

11.
In this article, autoregressive models and growth curve models are compared. Autoregressive models are useful because they allow for random change, permit scores to increase or decrease, and do not require strong assumptions about the level of measurement. Three previously presented designs for estimating stability are described: (a) time-series, (b) simplex, and (c) two-wave, one-factor methods. A two-wave, multiple-factor model is also presented, in which the variables are assumed to be caused by a set of latent variables. The factor structure does not change over time, so the synchronous relationships are temporally invariant. The factors do not cause each other and have the same stability. The parameters of the model are the factor loading structure, each variable's reliability, and the stability of the factors. We apply the model to two data sets. For eight cognitive skill variables measured at four times, the 2-year stability is estimated to be .92 and the 6-year stability .83. For nine personality variables, the 3-year stability is .68. We speculate that many variables have two components, one that changes very slowly (the trait component) and another that changes very rapidly (the state component); thus each variable is a mixture of trait and state. Circumstantial evidence supporting this view is presented.
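The closing speculation, that observed scores mix a slowly changing trait with a rapidly changing state, can be made concrete with a small sketch. Both functions are illustrative assumptions, not the article's two-wave, multiple-factor estimator: the first applies the classical correction for attenuation to an observed cross-wave correlation, and the second gives the lag-k stability implied by a hypothetical mixture of a perfectly stable trait (variance share `trait_share`) and a first-order autoregressive state (per-unit stability `state_ar`).

```python
import math

def disattenuated_stability(r12, rel1, rel2):
    """Classical correction for attenuation: estimated correlation between
    true scores at two waves, given the observed cross-wave correlation and
    each wave's reliability."""
    return r12 / math.sqrt(rel1 * rel2)

def trait_state_stability(lag, trait_share, state_ar):
    """Hypothetical trait-state mixture: a perfectly stable trait plus a state
    following a first-order autoregression with per-unit stability state_ar."""
    return trait_share + (1.0 - trait_share) * state_ar ** lag
```

Under such a mixture, stability levels off at the trait share as the lag grows, whereas a pure autoregressive (simplex) process decays toward zero.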

12.
A Comparison of Three Interval Estimation Methods for the Composite Reliability of Unidimensional Tests
Ye Baojuan (叶宝娟), Wen Zhonglin (温忠麟), Acta Psychologica Sinica, 2011, 43(4): 453-461
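Many studies have recommended using composite reliability to estimate test reliability and reporting its confidence interval. There are three ways to compute a confidence interval for the composite reliability of a unidimensional test: the bootstrap method, the delta method, and direct computation from the standard error output by statistical software (e.g., LISREL). This article compares them in a simulation study and finds that the delta method and the bootstrap method give very similar confidence intervals, whereas intervals computed from the LISREL-output standard error differ greatly from the bootstrap results. The delta method (easily implemented in Mplus) is recommended for estimating the confidence interval of composite reliability; the standard error output by LISREL should not be used directly. An example illustrates how to compute the composite reliability of a unidimensional test and its delta-method confidence interval.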
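As a sketch of the recommended delta method, assume a congeneric one-factor model with the factor variance fixed at 1 and uncorrelated errors, so that composite reliability is ω = (Σλ_i)²/((Σλ_i)² + Σθ_i). Given the parameter estimates and their asymptotic covariance matrix (assumed to come from whatever SEM software fitted the model), the standard error is sqrt(gᵀVg), where g is the gradient of ω. The function names are hypothetical; the article reports implementing the method in Mplus, which this sketch does not reproduce.

```python
import math

def composite_reliability(loadings, errors):
    """Omega for a congeneric one-factor model, factor variance fixed at 1."""
    s = sum(loadings)
    return s * s / (s * s + sum(errors))

def delta_method_ci(loadings, errors, acov, z=1.959963984540054):
    """Wald-type CI for omega via the delta method.
    `acov` is the 2p x 2p asymptotic covariance matrix of the estimates,
    ordered (loading_1..loading_p, error_1..error_p)."""
    s, b = sum(loadings), sum(errors)
    denom = (s * s + b) ** 2
    # d(omega)/d(loading_i) = 2*s*b / denom; d(omega)/d(error_i) = -s^2 / denom
    grad = [2.0 * s * b / denom] * len(loadings) + [-(s * s) / denom] * len(errors)
    var = sum(grad[i] * acov[i][j] * grad[j]
              for i in range(len(grad)) for j in range(len(grad)))
    se = math.sqrt(var)
    omega = composite_reliability(loadings, errors)
    return omega, se, (omega - z * se, omega + z * se)
```

Keeping the full covariance matrix, rather than only the diagonal, matters because loading and error-variance estimates are typically correlated.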

13.
The Social Relations Model (SRM) is a conceptual and analytical approach to examining dyadic behaviors and interpersonal perceptions within groups. In an SRM, the perceiver effect describes a person's tendency to perceive other group members in a certain way, whereas the target effect measures the tendency to be perceived by others in certain ways. In SRM research, it is often of interest to relate these individual SRM effects to covariates. However, the estimated individual SRM effects might not provide a very reliable measure of the true, unobserved SRM effects, resulting in distorted estimates of associations with other variables. This article introduces a plausible values approach that allows users to correct for measurement error when assessing the association of individual SRM effects with other individual difference variables. In the plausible values approach, the latent, true individual SRM effects are treated as missing values and are imputed from an imputation model by applying Bayesian estimation techniques. In a simulation study, the statistical properties of the plausible values approach are compared with two approaches that have been used in previous research. A data example from educational psychology is presented to illustrate how the plausible values approach can be implemented with the software WinBUGS.

15.
Bifactor Model: A New Perspective on Measuring Multidimensional Constructs
Gu Honglei (顾红磊), Wen Zhonglin (温忠麟), Fang Jie (方杰), Journal of Psychological Science, 2014, 37(4): 973-979
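The bifactor model contains both a general factor and group (local) factors and has seen many applications in recent years. This article discusses how the bifactor model relates to the higher-order factor model in its mathematical form and parameters, and how the two differ conceptually and in application. It also surveys applications of the bifactor model in reliability research, balanced scales, exploratory factor analysis, and item response theory. As an example, in a study of the structure of the Rosenberg Self-Esteem Scale, a bifactor model is used to analyze the self-esteem trait effect and the item-wording method effect.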
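One common reliability application of the bifactor model is separating total reliability from the share attributable to the general factor. A minimal sketch, assuming orthogonal factors with unit variances and each item loading on the general factor and at most one group factor, computes omega-total and omega-hierarchical from the model parameters. The function name is hypothetical, and the abstract does not say whether the article uses exactly these indices.

```python
def bifactor_omegas(general, groups, errors):
    """Omega-total and omega-hierarchical for a bifactor model.

    general: general-factor loading of each item
    groups:  list of lists, the group-factor loadings of the items on each
             group factor (each item appears in at most one list)
    errors:  residual variance of each item
    Assumes orthogonal factors with variances fixed at 1."""
    g_var = sum(general) ** 2                    # general-factor variance in the sum score
    s_var = sum(sum(g) ** 2 for g in groups)     # group-factor variance in the sum score
    total = g_var + s_var + sum(errors)          # model-implied sum score variance
    omega_total = (g_var + s_var) / total
    omega_h = g_var / total
    return omega_total, omega_h
```

The gap between the two values indicates how much of the reliable variance in the total score comes from group factors (for example, wording-method factors) rather than the general trait.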

16.
In this article we show that a particular mathematical learning model, the Bower-Trabasso (1964) concept identification model, taken together with an assumption of independence of replicate measurements, implies substantial and statistically significant performance differences across individuals. These individual differences in turn imply a sizeable reliability coefficient. The result contradicts naive intuition, because this model (like many other mathematical models of learning) assumes that all individuals begin the experiment with identical parameter values for the process under study. Thus at least one such model implies that individual performance differences arise among originally identical organisms. Examination of data from an experiment by Cotton shows that the Hoyt reliability coefficient of classical test theory, a lower bound for the (composite) reliability of total scores over a series of trials, increases with the number of trials analyzed and exceeds the corresponding theoretical values implied by the Bower-Trabasso model. An experiment by Levine was also analyzed because its use of blank trials between feedback trials permitted direct calculation of composite reliability (or, more properly, composite consistency). For this experiment, the theoretical development just described (Case I) was used together with Restle's hypothesis selection model specialized to include a local consistency assumption, the so-called P2 model of Gregg and Simon (Case II). Moderate agreement between empirical and theoretical reliabilities was found, with discrepancies between observed and predicted values usually smaller in Case II. However, the Hoyt reliability coefficient is not a lower bound for composite reliability in Case II, because composite reliability is underestimated when identical stimuli are not used for comparable trials. Despite the Bower-Trabasso assumption of no initial differences, it seems reasonable to attribute the difference between predicted and obtained reliabilities to preexisting individual differences. Implications of the tentative conclusion that individual differences in concept identification performance are attributable to a combination of preexisting differences and differences induced by the current task are discussed briefly.
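For readers unfamiliar with the Hoyt coefficient cited above: it is computed from a persons × trials analysis of variance as (MS_persons - MS_residual)/MS_persons and is algebraically identical to Cronbach's alpha. A minimal sketch, assuming a complete persons × trials table (for example, 0/1 correctness on each trial), computes it both ways; the function names are illustrative.

```python
import statistics

def hoyt_reliability(scores):
    """Hoyt ANOVA reliability for a persons x trials table:
    (MS_persons - MS_residual) / MS_persons."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    person_means = [sum(r) / k for r in scores]
    trial_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ss_p = k * sum((m - grand) ** 2 for m in person_means)
    ss_t = n * sum((m - grand) ** 2 for m in trial_means)
    ss_tot = sum((x - grand) ** 2 for r in scores for x in r)
    ms_p = ss_p / (n - 1)
    ms_res = (ss_tot - ss_p - ss_t) / ((n - 1) * (k - 1))
    return (ms_p - ms_res) / ms_p

def cronbach_alpha(scores):
    """Cronbach's alpha from trial variances and total-score variance."""
    k = len(scores[0])
    trial_vars = [statistics.variance([r[j] for r in scores]) for j in range(k)]
    total_var = statistics.variance([sum(r) for r in scores])
    return k / (k - 1) * (1 - sum(trial_vars) / total_var)
```

For the 0/1 table `[[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1], [0, 1, 0, 1]]`, both functions return 20/39, about .513.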

17.
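In the first two decades of the 21st century, 11 Chinese psychology journals published a total of 213 papers on statistical methods. Their topics fall mainly into 10 categories (ordered by number of papers): structural equation modeling, test reliability, mediation effects, effect size and statistical power, longitudinal research, moderation effects, exploratory factor analysis, latent class models, common method bias, and hierarchical linear models. Each category is briefly reviewed. The results show that the breadth and depth of research on psychological statistical methods in China have steadily increased, with research hotspots converging and developing together. However, review papers make up a large share, the proportion of original research papers needs to rise, and research capacity needs strengthening.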

18.
A new multilevel latent state graded response model for longitudinal multitrait-multimethod (MTMM) measurement designs combining structurally different and interchangeable methods is proposed. The model allows researchers to examine construct validity over time and to study the change and stability of constructs and method effects based on ordinal response variables. We show how Bayesian estimation techniques can address a number of important issues that typically arise in longitudinal multilevel MTMM studies and facilitate estimation of the proposed model. Estimation accuracy and the impact of between- and within-level sample sizes, as well as of different prior specifications, on parameter recovery were investigated in a Monte Carlo simulation study. Findings indicate that the parameters of the model can be accurately estimated with Bayesian methods in the case of low convergent validity with as few as 250 clusters and more than two observations within each cluster. The model was applied to well-being data from a longitudinal MTMM study, assessing the change and stability of life satisfaction and subjective happiness in young adults after high-school graduation. Guidelines for empirical applications are provided, and the advantages and limitations of a Bayesian approach to estimating longitudinal multilevel MTMM models are discussed.

19.
Hertzog et al. evaluated the statistical power of linear latent growth curve models (LGCMs) to detect individual differences in change, i.e., variances of latent slopes, as a function of sample size, number of longitudinal measurement occasions, and growth curve reliability. We extend this work by investigating the effect of the number of indicators per measurement occasion on power. We analytically demonstrate that the positive effect of multiple indicators on statistical power is inversely related to the relative magnitude of occasion-specific latent residual variance and is independent of the specific model that constitutes the observed variables, in particular of other parameters in the LGCM. When designing a study, researchers have to consider trade-offs of costs and benefits of different design features. We demonstrate how knowledge about power-equivalent transformations between indicator measurement designs allows researchers to identify the most cost-efficient research design for detecting parameters of interest. Finally, we integrate different formal results to exhibit the trade-off between the number of measurement occasions and the number of indicators per occasion for constant power in LGCMs.

20.
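Meta-analysis is an important method for drawing relatively accurate and representative conclusions about a topic of interest from existing studies, and it is widely used in psychology, education, management, medicine, and other fields. Reliability is an important indicator of test quality, and composite reliability estimates test reliability relatively accurately, yet no published work has provided a method for meta-analyzing composite reliability. After comparing the strengths and weaknesses of three models for meta-analyzing parameters, this study derives point and interval estimators for the meta-analysis of composite reliability under the varying-coefficient model. Using interval coverage as the criterion, a simulation study shows that the proposed interval estimation method performs appropriately. An example illustrates how to meta-analyze the composite reliability of a unidimensional test.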
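As a rough sketch of the varying-coefficient idea, assuming each study supplies a composite reliability estimate and its standard error: the pooled estimate is the unweighted mean of the k study estimates, and its variance is the sum of the study sampling variances divided by k². The abstract does not give the article's exact derivation (for example, whether a transformation of reliability is used), so this untransformed version and the function name are assumptions.

```python
import math

def vc_meta_composite_reliability(estimates, std_errors, z=1.959963984540054):
    """Varying-coefficient style pooling of composite reliability estimates:
    unweighted mean, with variance = sum of study variances / k^2."""
    k = len(estimates)
    mean = sum(estimates) / k
    se = math.sqrt(sum(s * s for s in std_errors)) / k
    return mean, se, (mean - z * se, mean + z * se)
```

Unlike fixed- or random-effects pooling, this approach does not assume a common population value or a particular distribution of true reliabilities across studies; it targets the average of the study-specific parameters.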


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), Beijing ICP No. 09084417