Similar articles
 Found 20 similar articles; search took 15 ms
1.
The question as to which structural equation model should be selected when multitrait-multimethod (MTMM) data are analyzed is of interest to many researchers. In the past, attempts to find a well-fitting model have often been data-driven and highly arbitrary. In the present article, the authors argue that the measurement design (type of methods used) should guide the choice of the statistical model to analyze the data. In this respect, the authors distinguish between (a) interchangeable methods, (b) structurally different methods, and (c) the combination of both kinds of methods. The authors present an appropriate model for each type of method. All models allow separating measurement error from trait influences and trait-specific method effects. With respect to interchangeable methods, a multilevel confirmatory factor model is presented. For structurally different methods, the correlated trait-correlated (method − 1) model is recommended. Finally, the authors demonstrate how to appropriately analyze data from MTMM designs that simultaneously use interchangeable and structurally different methods. All models are applied to empirical data to illustrate their proper use. Some implications and guidelines for modeling MTMM data are discussed.

2.
In this study we extend and assess the trifactor model for multiple-ratings data in which two different raters give independent scores for the same responses (e.g., in the GRE essay or to a subset of PISA constructed responses). The trifactor model was extended to incorporate a cross-classified data structure (e.g., items and raters) instead of a strictly hierarchical structure. We present a set of simulations to reflect the incompleteness and imbalance in real-world assessments. The effects of the rate of missingness in the data and of ignoring differences among raters are investigated using two sets of simulations. The use of the trifactor model is also illustrated with empirical data analysis using a well-known international large-scale assessment.

3.
Carrasco, Ling, and Read (2004) showed that transient attention increases perceived contrast. However, Prinzmetal, Long, and Leonhardt (2008) suggest that for targets of low visibility, observers may bias their response toward the cued location, and they propose a cue-bias explanation for our previous results. Our response is threefold. First, we outline several key methodological differences between the studies that could account for the different results. We conclude that the cue-bias hypothesis is a plausible explanation for Prinzmetal et al.'s (2008) results, given the characteristics of their stimuli, but not for the studies by Carrasco and colleagues, in which the stimuli were suprathreshold (Carrasco, Ling, & Read, 2004; Fuller, Rodriguez, & Carrasco, 2008; Ling & Carrasco, 2007). Second, we conduct a study to show that the stimuli used in our previous studies are not near-threshold, but suprathreshold (Experiment 1, Phase 1). Furthermore, we found an increase in apparent contrast for a high-contrast stimulus when it was precued, but not when it was postcued, providing more evidence against a cue-bias hypothesis (Experiment 1, Phase 2). We also show that the visibility of the stimuli in Prinzmetal et al. (2008) was much lower than that of Carrasco, Ling, and Read, rendering their stimuli susceptible to their cue-bias explanation (Experiment 2). Third, we present a comprehensive summary of all the control conditions used in different labs that have ruled out a cue-bias explanation of the appearance studies. We conclude that a cue-bias explanation may operate with near-threshold and low-visibility stimuli, as was the case in Prinzmetal et al. (2008), but that such an explanation has no bearing on studies with suprathreshold stimuli. Consistent with our previous studies, the present data support the claim that attention does alter the contrast appearance of suprathreshold stimuli.

4.
Latent state-trait (LST) analysis is frequently applied in psychological research to determine the degree to which observed scores reflect stable person-specific effects, effects of situations and/or person-situation interactions, and random measurement error. Most LST applications use multiple repeatedly measured observed variables as indicators of latent trait and latent state residual factors. In practice, such indicators often show shared indicator-specific (or method) variance over time. In this article, the authors compare 4 approaches to account for such method effects in LST models and discuss the strengths and weaknesses of each approach based on theoretical considerations, simulations, and applications to actual data sets. The simulation study revealed that the LST model with indicator-specific traits (Eid, 1996) and the LST model with M - 1 correlated method factors (Eid, Schneider, & Schwenkmezger, 1999) performed well, whereas the model with M orthogonal method factors used in the early work of Steyer, Ferring, and Schmitt (1992) and the correlated uniqueness approach (Kenny, 1976) showed limitations under conditions of either low or high method-specificity. Recommendations for the choice of an appropriate model are provided.

5.
To assign an overall performance rating to a target, a rater must weight and combine various pieces of specific performance information about that target. Policy‐capturing research has demonstrated that individual differences in raters can influence the way raters combine specific performance information. The current study examined information processing from a different perspective by exploring the possibility that target differences may also influence the way raters weight and combine performance information. Raters (N = 146) rated each of six targets on six specific performance dimensions and on overall performance. Sequential moderation analyses indicated that targets influenced the way raters, as a group, combined information across targets. These results lend support to the inference that overall performance ratings may not be comparable across targets, that is, they may not reflect the same underlying performance across targets.

6.
Although peer raters of personality traits do tend to agree, the strength of their consensus is often modest. This article focuses on methods for analyzing determinants of consensus. Variance components methods adapted from generalizability theory have some untapped potential for understanding gradations in consensus. The methods allow explicit analysis of how social categories of targets might affect judgments of raters from the same or different social categories. Limitations of the variance components approach are also discussed. The methods are illustrated with artificial data.

7.
Error in performance ratings is typically believed to be due to the cognitive complexity of the rating task. Distributional assessment (DA) is proposed to improve rater accuracy by reducing cognitive load. In two laboratory studies, raters reported perceptions of cognitive effort and difficulty while assessing rating targets using DA or the traditional assessment approach. Across both studies, DA raters showed greater interrater agreement, and Study 2 findings provide some support for DA being associated with greater true score rating accuracy. However, DA raters also reported experiencing greater cognitive load during the rating task, and cognitive load did not mediate the relationship between rating format and rater accuracy. These findings have important implications regarding our understanding of cognitive load in the rating process.

8.
Emerging studies have shown that observers' ratings of personality predict performance behaviors better than do self-ratings. However, it is unclear whether these predictive advantages stem from (a) use of observers who have a frame of reference more closely aligned with the criterion ("narrower scope") or (b) observers having greater accuracy than targets themselves ("clearer lens"). In a primary study of 291 raters of 97 targets, we found predictive advantages even when observers were personal acquaintances who knew targets only outside of the work context. Integrating these findings with previous meta-analyses showed that colleagues' unique perspectives did not predict incrementally beyond commonly held trait perceptions across all raters (except for openness) and that self-raters who overestimate their agreeableness and conscientiousness perform worse on the job. Broadly, our results suggest that observers have clearer lenses for viewing targets' personality traits, and we discuss the theoretical implications of these findings for studying and measuring personality.

9.
Study 1 expands upon previous research by looking at the ability of untrained raters to detect pathological traits within a normal population of college students. In Study 1, 30-s video clips of 81 target persons were shown to 42 raters. Ratings of traits of personality disorders made by thin slice raters reliably predicted scores on the personality pathology measures obtained from the targets themselves and from close peers. Study 2 is a preliminary examination of how pathological rater traits impact thin slice accuracy. In Study 2, peer and self-report data were examined regarding 87 thin slice raters. Raters who exhibited traits of narcissistic personality disorder were significantly less accurate in making personality predictions regarding targets. Three clusters of personality items were identified based on rater characteristics related to accuracy in predicting behavior.

10.
“Shared meaning” is a parameter in Kenny's (1991) rater agreement model concerning the extent to which two raters agree about the trait-implicative meaning of the observations they have made of a target. In the first study, 201 individuals rated observations relevant to friendliness and organization on the meaning dimensions of typicality, difficulty level, and evaluation. They also rated 25 targets on the two constructs. We found strong support for a modest relation between the similarity of meaning ratings and the similarity of target ratings, especially for raw, as opposed to standard score, ratings. In Study 2 we considered shared meaning in a version of Kenny's model that included the consistency and communication parameters. Judge pairs (N = 110) evaluated two targets described by play and openness on several personality dimensions. Shared meaning significantly contributed to rating agreement for both targets, but consistency and communication, as manipulated in this study, did not. Implications of employing the broader consensus model in experimental studies are discussed. If I say “sorrow,” you'll know exactly what I mean only if you've experienced it in the same sense I have. -Joel Peterson, Ravenswood's Winemaker, in Darlington (1991)

11.
Recent studies have noted positive effects of red clothing on success in competitive sports, perhaps arising from an evolutionary predisposition to associate the color red with dominance status. Red may also enhance judgments of women's attractiveness by men, perhaps through a similar association with fertility. Here we extend these studies by investigating attractiveness judgments of both sexes and by contrasting attributions based on six different colors. Furthermore, by photographing targets repeatedly in different colors, we could investigate whether color effects are due to influences on raters or clothing wearers, by either withholding from raters information about clothing color or holding it constant via digital manipulation, while retaining color-associated variation in wearer's expression and posture. When color cues were available, we found color-attractiveness associations when males were judged by either sex, or when males judged females, but not when females judged female images. Both red and black were associated with higher attractiveness judgments and had approximately equivalent effects. Importantly, we also detected significant clothing color-attractiveness associations even when clothing color was obscured from raters and when color was held constant by digital manipulation. These results suggest that clothing color has a psychological influence on wearers at least as much as on raters, and that this ultimately influences attractiveness judgments by others. Our results lend support for the idea that evolutionarily-derived color associations can bias interpersonal judgments, although these are limited neither to effects on raters nor to the color red.

12.
Differential rater functioning (DRF) occurs when raters show evidence of exercising differential severity or leniency when scoring examinees within different subgroups. Previous studies of DRF have examined rater bias using manifest variables (e.g., use of covariates) to determine the subgroups. These manifest variables include gender and the ethnicity of the examinee. For example, a rater may score males more severely. Ideally, each rater’s severity should be invariant across subgroups. This study examines DRF in the context of latent subgroups that classify possible sources of DRF based on raters’ scoring behavior rather than manifest factors. An extension of the latent class signal detection theory (LC-SDT) model for identifying DRF is proposed and examined using real-world data and simulations. Results from real-world data show that the signal detection approach leads to an effective method to identify latent DRF. Simulations with varying sample sizes and conditions of rater precision were shown to recover parameters at an adequate level, supporting its use to identify latent DRF in large-scale data. These findings suggest that the DRF extension of the LC-SDT can be a useful model to examine characteristics of raters and add information that can aid rater training.

13.
This study looks beyond gender to explore the impact of the social status of race and of token difference defined by race. In a 2 × 4 design, 53 African American women and 76 white women undergraduates rated a woman target, of the same race as themselves, who was described as being of the same race and gender as the dominant members of her work group or as a token defined by her gender alone, race alone, or both her race and gender. White women tokens were perceived to experience better social relations, more supportive colleagues, and lower stress than African American targets. Across African American and white raters/targets, token representation, defined by any ascribed status, was associated with expected negative tokenism outcomes relative to those projected for dominants. The omnirelevance of race toward understanding tokenism processes is discussed. We wish to thank Marchell Bass, Paulina Beres, Darya Burns, Roy Carrera, Nicole Cassie, Comilita Jackson, Susan Mathews, Pamela Ramsey, Catina Scott, and Aretha Strickland for their invaluable help with data collection and entry. These findings were presented at the meeting of the Midwestern Psychological Association in May 1995 in Chicago.

14.

Purpose

The purpose of this study was to explore the role rater and target age play in the evaluations of poorly performing workers. Intergroup attribution theory suggests that rater age predicts the attributions made for the poor performance of older workers.

Design/Methodology/Approach

In this study, 203 supervisors in various industries completed measures of causal attribution and evaluations for a poorly performing hypothetical subordinate.

Findings

Compared to the poor performance of younger targets, the poor performance of older targets was more likely to be attributed to external and controllable causes by older raters and more likely to be attributed to stable causes by younger raters. These attributions predicted willingness to punish and likelihood to provide training.

Implications

Our findings were partially supported by intergroup attribution theory and suggest that this theory may be useful in understanding how older workers’ performance is evaluated.

Originality/Value

This is one of the first studies to utilize intergroup attribution theory among supervisors in exploring how older workers are evaluated in the workplace and to demonstrate that the theory predicts how older workers’ poor performance will be attributed. Our study provides evidence that when evaluating a poorly performing older target, older raters will be more inclined to attribute this performance to controllable causes and thus be more punitive than younger raters. Further, we provide some evidence that raters will be more punitive and less willing to provide training when evaluating poorly performing targets to whom they are similar.

15.
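This study examined the parameter recovery and applicability of the graded response multilevel facets model (GR-MLFM) proposed by Kang Chunhua, Sun Xiaojian, and Zeng Pingfei (2016) when predictors at the examinee and rater levels are included (the full model). The results showed that: (1) the full GR-MLFM is logically and mathematically sound, can be used in the scoring of constructed-response items, and detects rater effects, their influencing factors, and the size of those influences reasonably well; (2) in the scoring of mathematical problem solving, raters showed two types of scoring tendency (leniency and severity effects), but for the great majority of raters the degree of leniency or severity was not pronounced; raters' conscientiousness positively predicted their severity and their self-confidence positively predicted their leniency, whereas emotional stability and rating experience had no significant predictive effect.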

16.
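Examination reforms and large-scale assessment practice, both in China and abroad, place increasing emphasis on constructed-response items, and research on rater reliability has therefore again become a topic of wide interest. Building on the generalized multilevel facets model of Wang and Liu (2007), this study proposed and examined a graded response multilevel facets model. The results showed that under both experimental conditions, fixed rater effects and random rater effects, the means and standard deviations of the bias values were small, indicating that under these conditions the parameter estimates show good recovery and robustness and that the model can detect rater effects. Factors influencing rater effects can therefore be added in subsequent work, developing the model into a full model that detects rater effects and their influencing factors simultaneously.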

17.
Few group psychotherapy studies focus on therapists' interventions, and instruments that can measure group psychotherapy treatment fidelity are scarce. The aim of the present study was to evaluate the reliability of the Mentalization‐based Group Therapy Adherence and Quality Scale (MBT‐G‐AQS), which is a 19‐item scale developed to measure adherence and quality in mentalization‐based group therapy (MBT‐G). Eight MBT groups and eight psychodynamic groups (a total of 16 videotaped therapy sessions) were rated independently by five raters. All groups were long‐term, outpatient psychotherapy groups with 1.5‐hour weekly sessions. Data were analysed by a Generalizability Study (G‐study and D‐study). The generalizability models included analyses of reliability for different numbers of raters. The global (overall) ratings for adherence and quality showed high to excellent reliability for all numbers of raters (the reliability by use of five raters was 0.97 for adherence and 0.96 for quality). The mean reliability for all 19 items for a single rater was 0.57 (item range 0.26–0.86) for adherence, and 0.62 (item range 0.26–0.83) for quality. With two raters, the mean absolute G‐coefficient was 0.71 (item range 0.41–0.92) for adherence and 0.76 (item range 0.42–0.91) for quality. With all five raters, the mean absolute G‐coefficient was 0.86 (item range 0.63–0.97) for adherence and 0.88 (item range 0.64–0.96) for quality. The study demonstrates high reliability of ratings of MBT‐G‐AQS. In models differentiating between different numbers of raters, reliability was particularly high when including several raters, but was also acceptable for two raters. For practical purposes, the MBT‐G‐AQS can be used for training, supervision and psychotherapy research.

18.
Multifaceted data are very common in the human sciences. For example, test takers' responses to essay items are marked by raters. If multifaceted data are analyzed with standard facets models, it is assumed there is no interaction between facets. In reality, an interaction between facets can occur, referred to as differential facet functioning. A special case of differential facet functioning is the interaction between ratees and raters, referred to as differential rater functioning (DRF). In existing DRF studies, the group membership of ratees is known, such as gender or ethnicity. However, DRF may occur when the group membership is unknown (latent) and thus has to be estimated from data. To solve this problem, in this study, we developed a new mixture facets model to assess DRF when the group membership is latent and we provided two empirical examples to demonstrate its applications. A series of simulations were also conducted to evaluate the performance of the new model in the DRF assessment in the Bayesian framework. Results supported the use of the mixture facets model because all parameters were recovered fairly well, and the more data there were, the better the parameter recovery.

19.
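Li Guangming (黎光明), Jiang Huan (蒋欢). 《心理科学》, 2019, (3): 731-738
Tests that include a rater facet usually do not conform to any single generalizability theory design. From the perspective of generalizability theory, the data from such tests should therefore be treated as missing data, and the structure of the missingness is determined by the test's rating plan. Data under three rating plans were simulated in R, and the estimation performance of the traditional method, the rating method, and the splitting method was compared under each plan. The results showed that: (1) the traditional method estimates with poor accuracy; (2) when rater agreement is high, the rating method is suitable for estimation; (3) the splitting method gives the most accurate estimates, and only under the fixed-rater rating plan does the ratio of raters to examinees need attention, with estimates being fairly accurate when that ratio is less than or equal to 0.0047.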

20.
National income has a pervasive influence on the perception of ingroup stereotypes, with high status and wealthy targets perceived as more competent. In two studies we investigated the degree to which economic wealth of raters related to perceptions of outgroup competence. Raters’ economic wealth predicted trait ratings when (1) raters in 48 other cultures rated Americans’ competence and (2) Mexican Americans rated Anglo Americans’ competence. Rater wealth also predicted ratings of interpersonal warmth on the culture level. In conclusion, raters’ economic wealth, either nationally or individually, is significantly associated with perception of outgroup members, supporting the notion that ingroup conditions or stereotypes function as frames of reference in evaluating outgroup traits.
