Similar articles
1.
A method for examining change in maximal reliability for pre‐specified sets of congeneric measures when developing a multi‐component instrument is outlined. The approach is applicable for purposes of estimation and testing of gain or loss in the maximal reliability coefficient as a consequence of adding or dropping one or more measures from a homogeneous composite with uncorrelated errors, as well as when one is concerned with optimal component choice for highest increase or correspondingly smallest drop in maximal reliability. The method is compared with a procedure for ascertaining change in unweighted sum score reliability, and implications for instrument construction and revision are discussed. The approach is illustrated with a numerical example.

2.
In covariance structure modelling, the non‐centrality parameter of the asymptotic chi‐squared distribution is typically used as an indicator of asymptotic power for hypothesis tests. When a latent linear regression is of interest, the contribution to power by the maximal reliability coefficient, which is associated with the latent variable indicators used, is examined, and this relationship is further explicated in the case of congeneric measures. It is also shown that item parcelling may reduce the power of tests of latent regression parameters. Recommendations on weights for parcelling to avoid power loss are provided, which are found to be those of optimal linear composites with maximal reliability.

3.
A one‐step covariance structure analysis procedure for estimation of maximal reliability of linear composites with congeneric measures is outlined. The approach is readily employed within a single modelling session using popular covariance structure analysis software, and permits simultaneous estimation of the optimal measure weights with standard errors. The method is illustrated by a numerical example.

4.
Unlike a substantial part of reliability literature in the past, this article is concerned with weighted combinations of a given set of congeneric measures with uncorrelated errors. The relationship between maximal coefficient alpha and maximal reliability for such composites is initially dealt with, and it is shown that the former is a lower bound of the latter. A direct method for obtaining approximate standard error and confidence interval for maximal reliability is then outlined. The procedure is based on a second-order Taylor series approximation and is readily and widely applicable in empirical research via use of covariance structure modeling. The described method is illustrated with a numerical example.

5.
We examine the validity and reliability of a single‐item measure of social identification (SISI). Convergent validity is shown with significant positive correlations with previously published unidimensional and multidimensional measures of in‐group identification and other group‐relevant measures (e.g., entitativity and collective self‐esteem). Divergent validity is shown via nonsignificant correlations with social desirability measures. Predictive validity is shown with positive correlations with group‐relevant behavior (e.g., volunteerism and voting). External validity is shown with correlations with other in‐group identification measures in a community sample. The reliability of the scale is shown by examining scores of the SISI for six different identities at three points in time. Copyright © 2013 John Wiley & Sons, Ltd.

6.
In a recent article in this journal, Lombard, Snyder‐Duch, and Bracken (2002) surveyed 200 content analyses for their reporting of reliability tests, compared the virtues and drawbacks of five popular reliability measures, and proposed guidelines and standards for their use. Their discussion revealed that numerous misconceptions circulate in the content analysis literature regarding how these measures behave and can aid or deceive content analysts in their effort to ensure the reliability of their data. This article proposes three conditions for statistical measures to serve as indices of the reliability of data and examines the mathematical structure and the behavior of the five coefficients discussed by the authors, as well as two others. It compares common beliefs about these coefficients with what they actually do and concludes with alternative recommendations for testing reliability in content analysis and similar data‐making efforts.

7.
A procedure for point and interval estimation of maximal reliability of multiple‐component measuring instruments in multi‐level settings is outlined. The approach is applicable to hierarchical designs in which individuals are nested within higher‐order units and exhibit possibly related performance on components of a given homogeneous scale. The method is developed within the framework of multi‐level factor analysis. The proposed procedure is illustrated with an empirical example.

8.
Reliability can be studied in a generalized way using repeated measurements. Linear mixed models are used to derive generalized test–retest reliability measures. The method allows for repeated measures with a different mean structure due to correction for covariate effects. Furthermore, different variance–covariance structures between measurements can be implemented. When the variance structure reduces to a random intercept (compound symmetry), classical methods are recovered. With more complex variance structures (e.g. including random slopes of time and/or serial correlation), time‐dependent reliability functions are obtained. The effect of time lag between measurements on reliability estimates can be evaluated. The methodology is applied to a psychiatric scale for schizophrenia.

9.
There are few self‐report measures of morality. The Religious Status Inventory—‘Being Ethical’ subscale represents one approach. However, at present there is limited information on the psychometric properties of either the original 20‐item version (RSInv‐20) or the shortened embedded 10‐item version (RSInv‐S10). The aim of the present study was to provide psychometric data on the internal reliability of these two versions of the ‘Being Ethical’ subscale. As part of a larger study, 595 Northern Irish adolescents, drawn from both Grammar and Secondary schools, completed the RSInv‐20. An unsatisfactory level of internal reliability was found for the RSInv‐20 (Cronbach’s alpha = 0.42), but a satisfactory level of internal reliability was found for the RSInv‐S10 (Cronbach’s alpha = 0.70). Subsequent item analysis produced an alternative 10‐item version (RSInv‐A10) that provided the optimum level of internal reliability for a 10‐item measure in the present sample (Cronbach’s alpha = 0.74). In addition, on all three versions of the measure (RSInv‐20, RSInv‐S10, and RSInv‐A10), differences were found in levels of internal reliability among Grammar and Secondary school respondents, with the former producing higher levels of internal reliability.

10.
Fourteen subjects participated in a sleep study designed to document the reliability of measurements of REM-related vaginal blood-flow changes. Several standard sleep parameters were also examined for comparison with vaginal measures. The most reliable vaginal measure was the maximal change in disengorgement for a given REM period averaged across a night of REM periods. The reliability of this measure compared favorably with that of a highly reliable computerized measure of REM density. It is suggested that nocturnal vaginal blood-flow measures have sufficient reliability to be useful in the differential diagnosis of organic and psychogenic sexual dysfunction. This study was supported in part by a training grant to the Department of Clinical Psychology, University of Florida, Gainesville (USPHS532DE07133-02), and a grant from the NIH (USPHSCA26364R073241-29) to the second author.

11.
This paper is concerned with the reliability of weighted combinations of a given set of dichotomous measures. Maximal reliability for such measures has been discussed in the past, but the pertinent estimator exhibits a considerable bias and mean squared error for moderate sample sizes. We examine this bias, propose a procedure for bias correction, and develop a more accurate asymptotic confidence interval for the resulting estimator. In most empirically relevant cases, the bias correction and mean squared error correction can be performed simultaneously. We propose an approximate (asymptotic) confidence interval for the maximal reliability coefficient, discuss the implementation of this estimator, and investigate the mean squared error of the associated asymptotic approximation. We illustrate the proposed methods using a numerical example.

12.
The convergence on the Big Five in personality theory has produced a demand for efficient yet psychometrically sound measures. Therefore, five single‐item measures, using bipolar response scales, were constructed to measure the Big Five and evaluated in terms of their convergent and off‐diagonal divergent properties, their pattern of criterion correlations and their reliability when compared with four longer Big Five measures. In a combined sample (N = 791) the Single‐Item Measures of Personality (SIMP) demonstrated a mean convergence of r = 0.61 with the longer scales. The SIMP also demonstrated acceptable reliability, self–other accuracy, and divergent correlations, and a closely similar pattern of criterion correlations when compared with the longer scales. It is concluded that the SIMP offer a reasonable alternative to longer scales, balancing the demands of brevity versus reliability and validity. Copyright © 2005 John Wiley & Sons, Ltd.

13.
Inter‐rater reliability and accuracy are measures of rater performance. Inter‐rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter‐rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess for factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert‐generated criterion ratings and between raters using intraclass correlation (2,1). Inter‐rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter‐rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors of criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter‐rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter‐rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.

14.
The present study is a reliability‐generalization meta‐analysis of 5 commonly used measures of romantic love: Rubin’s Loving and Liking scales, the Love Attitudes Scale, the Love Attitudes Scale–Short Form, the Passionate Love Scale, and the Triangular Love Scale. Data from 127 studies representing 38,132 participants provided internal consistency reliability estimates. The studies provide information on the average score reliability across studies and the effect of sample characteristics on score reliability across measures. The reliability of scores from several measures proved to be susceptible to the influence of sample characteristics, most notably, the ethnicity of the sample. The discussion provides recommendations to researchers and focuses on the ramifications of score reliability on love research.

15.
A new, short, and easily administered Risk Propensity Scale (RPS) is introduced that measures general risk‐taking tendencies. This paper investigates the reliability and discriminant validity of the RPS. The RPS provided scores that yielded a good internal reliability coefficient and adequate test–retest reliability, and the scores correlated moderately to well with those of the Everyday Risk Inventory and the short Sensation‐Seeking Scale. The correlation with the scores from other scales (Need for Cognition scale, Need for Structure scale, and 2 self‐esteem scales) was low to moderate, indicating good discriminant validity. The findings are discussed in relation to risk‐perception research using gambling experiments and in relation to their usefulness for risky decision‐making research.

16.
For item response theory (IRT) models, which belong to the class of generalized linear or non‐linear mixed models, reliability at the scale of observed scores (i.e., manifest correlation) is more difficult to calculate than latent correlation based reliability, but usually of greater scientific interest. This is not least because it cannot be calculated explicitly when the logit link is used in conjunction with normal random effects. As such, approximations such as Fisher's information coefficient, Cronbach's α, or the latent correlation are calculated, allegedly because it is easy to do so. Cronbach's α has well‐known and serious drawbacks, Fisher's information is not meaningful under certain circumstances, and there is an important but often overlooked difference between latent and manifest correlations. Here, manifest correlation refers to correlation between observed scores, while latent correlation refers to correlation between scores at the latent (e.g., logit or probit) scale. Thus, using one in place of the other can lead to erroneous conclusions. Taylor series based reliability measures, which are based on manifest correlation functions, are derived and a careful comparison of reliability measures based on latent correlations, Fisher's information, and exact reliability is carried out. The latent correlations are virtually always considerably higher than their manifest counterparts, Fisher's information measure shows no coherent behaviour (it is even negative in some cases), while the newly introduced Taylor series based approximations reflect the exact reliability very closely. Comparisons among the various types of correlations, for various IRT models, are made using algebraic expressions, Monte Carlo simulations, and data analysis. Given the light computational burden and the performance of Taylor series based reliability measures, their use is recommended.

17.
We examined the effects of several variations in response rate on the calculation of total, interval, exact‐agreement, and proportional reliability indices. Trained observers recorded computer‐generated data that appeared on a computer screen. In Study 1, target responses occurred at low, moderate, and high rates during separate sessions so that reliability results based on the four calculations could be compared across a range of values. Total reliability was uniformly high, interval reliability was spuriously high for high‐rate responding, proportional reliability was somewhat lower for high‐rate responding, and exact‐agreement reliability was the lowest of the measures, especially for high‐rate responding. In Study 2, we examined the separate effects of response rate per se, bursting, and end‐of‐interval responding. Response rate and bursting had little effect on reliability scores; however, the distribution of some responses at the end of intervals decreased interval reliability somewhat, proportional reliability noticeably, and exact‐agreement reliability markedly.

18.
Two studies using college student samples were conducted to establish reliability and validity for new scales measuring rape victim empathy and rape perpetrator empathy separately. In Experiment 1, two 13‐item measures of rape empathy were developed. Variables examined for purposes of construct validity included personal sexual assault experience, general empathy, and perceived rape victim responsibility. In Experiment 2, we added 5 new items to each scale. The final scales were two 18‐item measures with high reliability. Variables examined in Experiment 2 included personal sexual assault, general empathy, and acquaintanceship with a victim or a perpetrator. Both studies found gender differences for empathy scores, with women tending to be higher on rape victim empathy, and men tending to be higher on rape perpetrator empathy. Personal sexual experience was related to rape empathy scores. Perceived victim responsibility was negatively correlated with rape victim empathy and positively correlated with rape perpetrator empathy.

19.
Background: Although ultra‐brief outcome and process measures have been developed for individual therapy, currently there are no ultra‐brief alliance measures for group therapy. Method: The current study examined 105 clients in group therapy for issues related to substance abuse or to the substance abuse of a significant other. We tested whether a newly developed group therapy alliance measure – the Group Session Rating Scale (GSRS) – would be related to other commonly used group process measures (Working Alliance Inventory, Group Cohesion, Group Climate) and early change (change over the first four sessions of group therapy). Results: The findings provided support for reliability based on Cronbach's alphas and test‐retest coefficients. Additionally, the GSRS was a one‐factor measure that was related to other group process measures and predicted early change. Discussion: Clinical implications for how to utilise ultra‐brief outcome and alliance measures are provided.

20.
This paper demonstrates that the widely available and routinely used index ‘coefficient alpha if item deleted’ can be misleading in the process of construction and revision of multiple‐component instruments with congeneric measures. An alternative approach to evaluation of scale reliability following deletion of each component in a given composite is outlined that can be recommended in general for scale development purposes. The method provides ranges of plausible values for instrument reliability when dispensing with single components in a tentative composite, and permits testing hypotheses about reliability of resulting scale versions. The proposed procedure is illustrated with an example.
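
Several of the abstracts in this list concern the maximal reliability of weighted composites of congeneric measures with uncorrelated errors. The sketch below illustrates the textbook population formulas for that setting; it is not the estimation procedure of any listed article. It assumes a single common factor with variance fixed at 1, so that each measure is loading × factor + error. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def composite_reliability(loadings, error_vars, weights=None):
    """Reliability of a weighted sum of congeneric measures.

    Assumes X_i = loading_i * T + E_i with Var(T) = 1 and uncorrelated errors.
    Unit weights (the unweighted sum score) are used when `weights` is None.
    """
    lam = np.asarray(loadings, dtype=float)
    theta = np.asarray(error_vars, dtype=float)
    w = np.ones_like(lam) if weights is None else np.asarray(weights, dtype=float)
    true_var = (w @ lam) ** 2            # composite variance due to the common factor
    error_var = np.sum(w ** 2 * theta)   # composite variance due to measurement error
    return true_var / (true_var + error_var)


def optimal_weights(loadings, error_vars):
    """Weights proportional to loading / error variance, which maximize reliability."""
    lam = np.asarray(loadings, dtype=float)
    theta = np.asarray(error_vars, dtype=float)
    return lam / theta


def maximal_reliability(loadings, error_vars):
    """Closed form S / (1 + S), where S is the sum of loading^2 / error variance."""
    lam = np.asarray(loadings, dtype=float)
    theta = np.asarray(error_vars, dtype=float)
    s = np.sum(lam ** 2 / theta)
    return s / (1.0 + s)


# Illustrative values only; they are not taken from any study listed above.
lam = [0.5, 0.6, 0.9]
theta = [0.75, 0.64, 0.19]
print("unweighted sum:", composite_reliability(lam, theta))
print("optimal weights:", composite_reliability(lam, theta, optimal_weights(lam, theta)))
print("closed form:", maximal_reliability(lam, theta))
```

Under these assumptions the optimally weighted composite reproduces the closed-form value, and the unweighted sum score can never exceed it, which is why dropping or adding a component can change the two coefficients differently.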
