Similar literature
20 similar documents found.
1.
Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and show how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.
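As a minimal sketch of the correction for attenuation mentioned in this abstract, the classical formula divides an observed correlation by the square root of the product of the two reliabilities. The function name and all numeric values below are illustrative assumptions, not figures from the paper. The abstract's point is that using an interrater correlation as `r_yy` may overcorrect or undercorrect.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Classical disattenuation: r_xy / sqrt(r_xx * r_yy).

    r_xy : observed correlation between predictor x and criterion y
    r_xx : reliability of x
    r_yy : reliability of y
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: an observed validity of .25 corrected with two
# different assumed criterion reliabilities (predictor left uncorrected).
observed = 0.25
for assumed_r_yy in (0.52, 0.80):
    corrected = correct_for_attenuation(observed, 1.0, assumed_r_yy)
    print(f"assumed r_yy = {assumed_r_yy:.2f} -> corrected r = {corrected:.3f}")
```

The lower the value plugged in for `r_yy`, the larger the corrected correlation, which is why the choice of reliability estimate matters for validity coefficients.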

2.
This paper demonstrates and compares methods for estimating the interrater reliability and interrater agreement of performance ratings. These methods can be used by applied researchers to investigate the quality of ratings gathered, for example, as criteria for a validity study, or as performance measures for selection or promotional purposes. While estimates of interrater reliability are frequently used for these purposes, indices of interrater agreement appear to be rarely reported for performance ratings. A recommended index of interrater agreement, the T index (Tinsley & Weiss, 1975), is compared to four methods of estimating interrater reliability (Pearson r, coefficient alpha, mean correlation between raters, and intraclass correlation). Subordinate and superior ratings of the performance of 100 managers were used in these analyses. The results indicated that, in general, interrater agreement and reliability among subordinates were fairly high. Interrater agreement between subordinates and superiors was moderately high; however, interrater reliability between these two rating sources was very low. The results demonstrate that interrater agreement and reliability are distinct indices and that both should be reported. Reasons are discussed as to why interrater reliability should not be reported alone. This paper is based, in part, on a thesis submitted to East Carolina University by the second author. Portions of this study were presented at the American Psychological Association meeting in New Orleans, LA, August, 1989. The authors would like to thank Michael Campion and two anonymous reviewers for their comments on earlier drafts of this paper.

3.
This article examines the test–retest reliability of supervisory ratings for several dimensions of job performance and for overall job performance. We found that the test–retest reliability of overall job performance is .79 (SD = .08), a value very close to the one found by Viswesvaran, Ones and Schmidt (1996), and that the average test–retest reliability for specific dimensions of job performance is .57 (SD = .07). We also found that some dimensions of job performance appear to be easier to rate than others. We suggest some implications of these findings for research and practice of personnel selection.

4.
Transient errors are caused by variations in feelings, moods, and mental states over time. If these errors are present, coefficient alpha is an inflated estimate of reliability. A true-score model is presented that incorporates transient errors for test-retest data, and a reliability estimate is derived. This estimate, referred to as the test-retest alpha, is less than coefficient alpha if transient error is present and is less susceptible to effects due to item recall than a test-retest correlation. An assumption underlying the test-retest alpha is essential tau equivalency of items. A test-retest split-half coefficient is presented as an alternative to the test-retest alpha when this assumption is violated. The test-retest alpha is the mean of all possible test-retest split-half coefficients.
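For orientation, the ordinary single-occasion coefficient alpha that this abstract uses as its baseline can be sketched as follows. This is the standard textbook formula, not the test-retest alpha derived in the paper, and the function name and toy data are assumptions made for illustration.

```python
from statistics import variance


def coefficient_alpha(scores: list[list[float]]) -> float:
    """Standard coefficient alpha from one administration.

    scores: one row per respondent, one column per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variances = [variance(item) for item in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): 4 respondents answering 3 items.
toy = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 5]]
print(round(coefficient_alpha(toy), 3))
```

Because every item is answered on the same occasion, any mood or state shared across items counts as true-score variance here, which is the inflation the abstract describes.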

5.
This study examined the validity and reliability of a Turkish version of the Modified Moral Sensitivity Questionnaire for Student Nurses (MMSQSN). After obtaining permission to adapt the MMSQSN into Turkish, the translation/back-translation method was used with expert opinions to determine content validity. Factor analysis was conducted to examine the construct validity and test–retest was performed on the questionnaire to determine reliability. Cronbach’s alpha coefficients were calculated to assess internal consistency. Participants included 272 baccalaureate degree student nurses who took ethics lessons prior to their clinical internship. The factor analysis revealed that even though the factor structure in the original scale was the same, relevant items were categorized with similar components, and factor loads were sufficient. The correlation coefficient in the analyses of test–retest scores was .66 for the total scale (p < .05) and the Cronbach’s alpha was .73 for the total scale. The translated MMSQSN is a valid and reliable measure of ethical sensitivity in student nurses in Turkey.

6.
The current meta‐analysis of the selection validity of assessment centres aims to update an earlier meta‐analysis of assessment centre validity. To this end, we retrieved 26 studies and 27 validity coefficients (N=5850) relating the Overall Assessment Rating (OAR) to supervisory performance ratings. The current study obtained a corrected correlation of .28 between the OAR and supervisory job performance ratings (95% confidence interval .24≤ρ≤.32). It is further suggested that this validity estimate is likely to be conservative given that assessment centre validities tend to be affected by indirect range restriction.

7.
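Meta-analysis is an important method for drawing relatively accurate and representative conclusions about a topic of interest from existing studies, and it is widely used in social science research in psychology, education, management, medicine, and related fields. Reliability is an important indicator of test quality, and composite reliability gives a relatively accurate estimate of test reliability. No published work appears to provide a method for the meta-analysis of composite reliability. After comparing the strengths and weaknesses of three models for the meta-analysis of parameters, this study derives point and interval estimation methods for the meta-analysis of composite reliability under the varying-coefficient model. With interval coverage rate as the criterion, a simulation study shows that the proposed interval estimation method performs appropriately. An example illustrates how to meta-analyze the composite reliability of a unidimensional test.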

8.
In the present research, we developed and conducted a field test of the computerized adaptive rating scale (CARS) for assessing military officer performance. Participants completed the CARS and a behaviorally anchored rating scale (BARS), both of which were designed to assess five leadership competencies (action orientation/initiative, communication, developing self and others, behavioral flexibility, and teamwork). We obtained data from 116 supervisors and 207 peers who provided ratings on 126 officer ratees. Although interrater reliability estimates were lower for CARS ratings on some competencies, there was a 20–25% improvement in the standard error of measurement (that is, in measurement precision) for CARS ratings compared to the BARS. Results support findings from a previous lab study.

9.
Studies of the relationship between human resource (HR) practices and firm performance typically use a single respondent to assess firm level HR practices or HR effectiveness. However, previous research in other substantive areas suggests that rater differences are a potentially important source of measurement error. We demonstrate analytically the potential consequences of both random and systematic measurement error in research on HR and firm performance. However, our main focus is on random error and we show how generalizability theory can be applied to obtain better estimates of reliability by simultaneously recognizing multiple sources (e.g., items, raters) of random measurement error. These more inclusive reliability estimates, in turn, offer the possibility of more precisely quantifying substantive relationships in the HR and firm performance literature. In our sample, reliabilities (as estimated by generalizability coefficients) for single-rater assessments of HR variables were generally below .50. This degree of measurement error, if present in substantive studies on HR and firm performance, could lead to considerable bias, given that an unstandardized regression coefficient is corrected for measurement error in the independent variable by dividing by its reliability coefficient (not its square root). We also found only limited convergent validity between HR and line managers' ratings of a second type of HR measure, HR effectiveness. In general, our findings suggest that future researchers need to devote greater attention to measurement error and construct validity issues. Our study provides an example of how generalizability theory can be useful in this pursuit.
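The correction described in this abstract can be sketched in a couple of lines: an unstandardized regression coefficient is divided by the reliability of the independent variable, not by its square root. The function name and the observed slope below are hypothetical; only the reliability of .50 echoes the order of magnitude the abstract reports.

```python
def correct_slope(b_observed: float, reliability_x: float) -> float:
    """Correct an unstandardized slope for random measurement error in x.

    Divides by the reliability itself, as the abstract notes, rather than
    by its square root.
    """
    if not 0 < reliability_x <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return b_observed / reliability_x


# Hypothetical observed slope of 0.20 with a single-rater reliability of .50.
b_obs = 0.20
print(correct_slope(b_obs, 0.50))   # slope implied after correction
print(b_obs / 0.50 ** 0.5)          # what dividing by the square root would give
```

Under these assumed numbers the observed slope is half the corrected one, which illustrates why reliabilities below .50 could bias substantive estimates considerably.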

10.
This paper studies the asymptotic distributions of three reliability coefficient estimates: sample coefficient alpha, the reliability estimate of a composite score following a factor analysis, and the estimate of the maximal reliability of a linear combination of item scores following a factor analysis. Results indicate that the asymptotic distribution for each of the coefficient estimates, obtained based on a normal sampling distribution, is still valid within a large class of nonnormal distributions. Therefore, a formula for calculating the standard error of the sample coefficient alpha, recently obtained by van Zyl, Neudecker and Nel, applies to other reliability coefficients and can still be used even with skewed and kurtotic data such as are typical in the social and behavioral sciences. This research was supported by grants DA01070 and DA00017 from the National Institute on Drug Abuse and a University of North Texas faculty research grant. We would like to thank the Associate Editor and two reviewers for suggestions that helped to improve the paper.

11.
The author provides statistical approaches to aid investigators in assuring that sufficiently high test score reliabilities are achieved for specific research purposes. The statistical approaches use tests of statistical significance between the obtained reliability and lowest population reliability that an investigator will tolerate. The statistical approaches work for coefficient alpha and related coefficients and for alternate-forms, split-half (2-part alpha), and retest reliabilities. The author shows that, in some instances, a formula can help to estimate the sample size necessary for the statistical test.

12.
A covariance structure modelling method for the estimation of reliability for composites of congeneric measures in test–retest designs is outlined. The approach also allows an approximate standard error and confidence interval for scale reliability in such settings to be obtained. The procedure further permits measurement error components due to possible transient condition influences to be accounted for and evaluated, and is illustrated with a pair of examples.

13.
This study investigated the psychometric properties of three methods of scoring a Mixed Standard Scale (MSS) performance evaluation: the patterned procedure as corrected by Saal (1979); a simple nonpatterned scoring procedure suggested by Prien, Jones, and Miller (1977), which gives equal weights to the performance statements; and a procedure that assigned differential weights to each statement on the basis of scale values provided by a panel of subject matter experts. Interrater reliabilities, scale variances for averaged ratings, and a convergent/discriminant validity analysis, which included an alternate method of job skill ratings, indicated no difference in the score distribution variance, interrater reliability, or validity of different method scores.

14.
Previous studies have not fully investigated the psychometric properties of the Photographic Figure Rating Scale (PFRS). In 2 studies, we report on the test–retest reliability and convergent validity of ratings derived from the PFRS. In Study 1, 322 female university students in Britain provided self-ratings on the PFRS and objectively measured body mass index (BMI); a subsample of 132 women also completed the task after 5 weeks. In Study 2, 243 women from the community in Austria completed the PFRS along with a battery of other body image scales. Results of Study 1 showed that ratings on the PFRS had good test–retest reliability (all rs > .87) and good convergent validity in relation to BMI. Results of Study 2 showed that PFRS-derived body dissatisfaction scores were significantly correlated with a range of body image variables. These results provide evidence for the convergent validity and good test–retest reliability of the PFRS.

15.
The consistency of individual differences across time has implications for theory building and clinical applications. Indeed, personality psychologists have long worked to place constructs on the continuum of consistency of more trait-like to more state-like constructs. Recently, Chmielewski and Watson highlighted the importance of dependability coefficients for interpreting the results of stability studies. These coefficients provide an estimate of how strongly short-term transient error affects retest correlations for a given measure. In this article, we use a modified version of Kenny and Zautra's STARTS model to estimate dependability of personality, life satisfaction, and affect in a 2-month longitudinal study of 8 waves. Results from 226 undergraduate students indicated that personality ratings were least influenced by transient state factors, whereas affect was most influenced. We discuss these findings in terms of their implications for the continuum of consistency and for the practical issue of selecting retest intervals for dependability analyses.

16.
We examined the psychometric properties of the Big Five personality traits assessed through social networking profiles in 2 studies consisting of 274 and 244 social networking website (SNW) users. First, SNW ratings demonstrated sufficient interrater reliability and internal consistency. Second, ratings via SNWs demonstrated convergent validity with self‐ratings of the Big Five traits. Third, SNW ratings correlated with job performance, hirability, and academic performance criteria; and the magnitude of these correlations was generally larger than for self‐ratings. Finally, SNW ratings accounted for significant variance in the criterion measures beyond self‐ratings of personality and cognitive ability. We suggest that SNWs may provide useful information for potential use in organizational research and practice, taking into consideration various legal and ethical issues.

17.
This paper reports on a study about the reliability and validity of a structured behavioral interview to assess private security personnel. Reliability was estimated using interrater coefficients. Two independent interviewers were used to rate each interviewee. Results show a reliability coefficient of .81 (N = 43) and .89 with Spearman-Brown correction for two raters. Validity was estimated using a content validation approach. This strategy was suggested by Lawshe (1975) to estimate the content validity of selection tests. So far, only two studies carried out by Schmitt and Ostroff (1986) and Carrier et al. (1990) have used Lawshe's strategy in the structured behavioral interview case. The interview consisted of seven questions and each was rated by 11 experts in the job. Results show a significant content validity ratio (CVR) for the majority of the questions in the interview and a content validity index (CVI) of .89. Implications of these findings for the practice of the structured behavioral interview are discussed and future research is suggested.
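The two quantities named in this abstract follow well-known formulas, sketched below under stated assumptions. The Spearman-Brown step uses the reported single-rater coefficient of .81 and two raters; the count of experts endorsing an item in the CVR example is hypothetical, since the abstract gives only the panel size of 11. Function names are illustrative.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Reliability of the average of k parallel raters (Spearman-Brown)."""
    return k * r_single / (1 + (k - 1) * r_single)


def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half


# Reported single-rater reliability stepped up to two raters.
print(f"{spearman_brown(0.81, 2):.3f}")

# Hypothetical: 10 of the 11 experts rate a question as essential.
print(f"{content_validity_ratio(10, 11):.3f}")
```

The stepped-up value comes out just under .90, in line with the reported .89; the small gap is plausibly due to rounding of the single-rater coefficient before publication.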

18.
The authors reviewed 12 studies using the Counselor Burnout Inventory, including the results from their original, large‐sample study of school counselors (N = 1,005). Aggregated internal consistency (coefficient alpha) was .90 (N = 1,708), and subscale alphas ranged from .73 to .86 (N = 2,809). Test–retest reliability was .81 (N = 18; k = 1), with subscale test–retest reliability estimates ranging from .72 to .85. Convergent comparisons were robust across 10 instruments. Structural validity indicated a 5‐factor solution and an adequate to good fit of the model to the current study's data.

19.
Although research has shown that individual job performance changes over time, the extent of such changes is unknown. In this article, the authors define and distinguish between the concepts of temporal consistency, stability, and test-retest reliability when considering individual job performance ratings over time. Furthermore, the authors examine measurement type (i.e., subjective and objective measures) and job complexity in relation to temporal consistency, stability, and test-retest reliability. On the basis of meta-analytic results, the authors found that the test-retest reliability of these ratings ranged from .83 for subjective measures in low-complexity jobs to .50 for objective measures in high-complexity jobs. The stability of these ratings over a 1-year time lag ranged from .85 to .67. The analyses also reveal that correlations between performance measures decreased as the time interval between performance measurements increased, but the estimates approached values greater than zero.

20.
Estimating the reliability of scores on single‐item measures can be difficult because commonly used internal consistency estimates of reliability cannot be calculated. When longitudinal data are available, statistical models can be used to decompose the variability in the latent variable at each wave into trait versus state variance. Then, reliability can be estimated as a ratio of the sum of the trait variance that is captured in repeated assessments over the total variance. The current study used latent trait‐state‐error models on nine‐year longitudinal data (N = 5,003) to estimate the test–retest reliability of scores on a single‐item measure of job satisfaction. Results showed that job satisfaction scores were somewhat unreliable (rxx = .49–.59) and amenable to change.
