Similar literature
1.
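叶宝娟 (Ye Baojuan), 温忠粦 (Wen Zhonglin). 《心理科学》 (Psychological Science), 2013, 36(3): 728-733
In research fields such as psychology, education, and management, two-level (two-layer) data structures are common, for example students nested within classes or employees nested within companies. In two-level studies, participants are usually not independent, and estimating reliability directly with a single-level formula overestimates test reliability. Previous studies have discussed how to estimate the reliability of unidimensional tests more accurately in two-level research. This study points out the shortcomings of the existing estimation formulas, derives a new reliability formula using two-level confirmatory factor analysis, demonstrates the calculation with an example, and provides a simple computing program.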

2.
While most researchers now agree that situations may have an effect in the assessment of traits, the consequences have been neglected so far: if situations affect the assessment of traits, we have to take this fact into account in studies on reliability and validity of measurement instruments and their application. In the theoretical part of this article we provide a more formal exposition of this point, introducing the basic concepts of latent state–trait (LST) theory. LST theory and the associated models allow for the estimation of the situational impact on trait measures in non-experimental, correlational studies. In the empirical part, LST theory is applied to three well known trait questionnaires: the Freiburg Personality Inventory, the NEO Five-Factor Inventory and the Eysenck Personality Inventory. It is shown that significant proportions of the variances of the scales of these questionnaires are due to situational effects. The following consequences of this finding are discussed: (i) Instead of the reliability coefficient, the proportion of variance due to the latent trait, the consistency coefficient, should be used for the estimation of confidence intervals for trait scores. (ii) To reduce the situational effects on trait estimates it may be useful to base such an estimate on several occasions, i.e., to aggregate data across occasions. (iii) Reliability and validity studies should not only be based on a sample of persons representative of those to whom the test will be applied; they should also be conducted in situational contexts representative of the intended applications.

3.
I describe how multilevel logistic regression can be used to assess the consistency of an individual's response pattern with an item response theory measurement model. Specifically, by treating item responses as being nested within individuals, multilevel logistic regression is used to estimate a person-response curve that models how an individual's item endorsement rate decreases as a function of item difficulty. The slope of an individual's person-response curve is used as an indicator of the degree of response consistency or person-fit. I argue that the proposed multilevel modeling approach to person-fit assessment has several potential advantages over traditional techniques. The most important advantage is that the multilevel modeling approach allows explanatory variables to be entered into the model so that the causes of response inconsistency or differential test functioning can be investigated.

4.
A total of 4 raters, including 2 teachers and 2 research assistants, used Direct Behavior Rating Single Item Scales (DBR-SIS) to measure the academic engagement and disruptive behavior of 7 middle school students across multiple occasions. Generalizability study results for the full model revealed modest to large magnitudes of variance associated with persons (students), occasions of measurement (day), and associated interactions. However, an unexpectedly low proportion of the variance in DBR data was attributable to the facet of rater, as well as a negligible variance component for the facet of rating occasion nested within day (10-min interval within a class period). Results of a reduced model and subsequent decision studies specific to individual rater and rater type (research assistant and teacher) suggested degree of reliability-like estimates differed substantially depending on rater. Overall, findings supported previous recommendations that in the absence of estimates of rater reliability and firm recommendations regarding rater training, ratings obtained from DBR-SIS, and subsequent analyses, be conducted within rater. Additionally, results suggested that when selecting a teacher rater, the person most likely to substantially interact with target students during the specified observation period may be the best choice.

5.
Psychological scaling methods can be applied and can bring substantial knowledge to bear upon problems within the realm of environmental hygiene. Methodological issues related to the possibility of obtaining calibrated scales of perceptual variables such as discomfort, annoyance, and unpleasant odors are critical when the scale values are to be entered into a physical pollution index or a perceived environmental quality index, both being applied in the form of norms or recommendations. The level of measurement (ordinal, interval, or ratio scale) is related to the question of calibration of scales. From a practical point of view, the effects of observer-environment dependence are important to disguise in the scale values. Problems having to do with calibration of scales and with application of knowledge about perceptual processes to the scales are being explored in ongoing research projects. These include the measurement of temperature discomfort in different climates and the measurement of annoyance in areas with different noise exposures. Attempts have been made to solve an odorous air pollution problem with psychological measurements.

6.
A lack of information about disease in children can lead to erroneous views such as children believing that hospital admittance or the presence of a disease is a punishment for a perceived wrong. There has thus far been no standard tool available to measure children's illness conceptualizations from a Leventhalian framework. Three groups of children (with eczema, with asthma, and with both eczema and asthma) between 7 and 12 years of age were recruited. Children were given the Children's Illness Perception Questionnaire (CIPQ), a 26-item instrument adapted from the Illness Perception Questionnaire for adults. A Kuder-Richardson 20 test of reliability for dichotomous data was performed, allowing an estimate of the internal consistency of the measurement scales. For all three illness groups, internal consistency is acceptable for the timeline and consequences scales. The cure/control scale, however, was not internally consistent for any illness group. As health professionals, we need to develop the means to further understand how paediatric illness beliefs relate to specific disease types, age and psychosocial factors, and the utility of this instrument is discussed within this context.

7.
The relations between English spellings and pronunciations have been described as a fractal pattern. Manipulations of word properties are constructed to coincide with the fractal pattern of ambiguity in these relations (sampled as random variables). New word naming and lexical decision experiments replicate previously established effects of relations between word spellings and pronunciations. The structure of these relations establishes a measurement scale, which is nested in a manner that is loosely analogous to the way that centimeters are nested within meters, and so on. The experiments revealed that broader, increasingly stretched (more variable) distributions of pronunciation and response time accompany the introduction of higher resolution measurement scales to the naming and lexical decision measurement process. Moreover, the distribution of lexical decision response times is shown to obey an inverse power-law scaling relation, which implies lexical decision does not conform to a characteristic measurement scale.

8.
The Coping Strategies Scales (COSTS) were developed to provide a means of measuring how depressed persons cope with depression and to identify the behavior which they find to be most or least helpful. Items were rated by eight psychologists, psychiatrists and social workers. Those items achieving 75% level of agreement on scale assignments were included. The COSTS was then administered to 100 depressed outpatients and inpatients currently in psychotherapy. A replication study of 64 patients was also completed. Nine of the 10 scales had acceptable internal reliability, ranging from 0.70 to 0.86. An initial factor analysis of the 10 scale scores showed there to be three primary factors. Internal reliability coefficients for these three factorially-derived scales ranged from 0.86 to 0.91.

9.
Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting quality. A key aspect of the quality of such a summary measure is its reliability. In this paper we derive a confidence interval for reliability, a test for the hypothesis that the reliability meets a minimum standard, and the power of this test against alternative hypotheses. Next, we consider the problem of using data from a preliminary field study of the measurement procedure to inform the design of a later study that will test substantive hypotheses about the correlates of setting quality. The preliminary study is typically called the "generalizability study" or "G study" while the later, substantive study is called the "decision study" or "D study." We show how to use data from the G study to estimate reliability, a confidence interval for the reliability, and the power of tests for the reliability of measurement produced under alternative designs for the D study. We conclude with a discussion of sample size requirements for G studies.

10.
The development of methods to create self‐reported attitude scales has lost momentum, in part because of increased research focused on implicit measures. This paper reviews 162 papers on methodological approaches applied to the validation and assessment of attitude scales. Assessment of methodological approaches applied indicates that neither reliability, validity, nor dimensionality assessments are consistently used according to standard operating procedures or in accordance with best practice. Within current practices in the field of attitude scale development, the full potential of self‐report scales is not met, in part because of such methodological issues. The improvement of existing practices and adoption of promising new developments in attitude scale construction and evaluation are discussed, together with recommendations for best practice in scale validation.

11.
In this article we discuss the applicability of several new measurement models to the construction of personality scales, and we contrast these models with more traditional approaches in common usage, such as the principal factor analysis model. Our goal is to illustrate how nonlinear item-response models can be profitably used in personality research. We describe the development of a 30-item Negative Emotionality scale that was constructed using nonlinear factor analysis and item-response theory. We also show how traditional (linear) factor analysis can produce misleading results when it is applied to personality items with dichotomous response formats (e.g., true/false, agree/disagree). No formal training in modern measurement theory is assumed of the reader as we describe the nonlinear models that are used in this study in nontechnical language with a minimum of mathematics.

12.
Other-ratings of targets’ traits may consist – besides true trait variance (TTV) – of different measurement error sources, particularly due to raters, scales, items, measurement times, and random fluctuations. Using Gnambs’ (2015) and Ones, Wiernik, Wilmot, and Kostal’s (2016) procedures for partitioning variance in scales due to measurement error, available meta-analytical data on Big Five other-ratings were analyzed. They showed relatively little TTV (0–13%), which was especially decreased by both low inter-rater reliability and convergent validity of Big Five measures. Accounting for both, TTV levels rose, but were still small to medium (4–26%). These findings provide important insights on what Big Five other-ratings are composed of and how such scale scores may be interpreted and treated in further analyses (e.g., trait-outcome relations).

13.
One of the central tenets of classical test theory is that scales should have a high degree of internal consistency, as evidenced by Cronbach's α, the mean interitem correlation, and a strong first component. However, there are many instances in which this rule does not apply. Following Bollen and Lennox (1991), I differentiate between questionnaires such as anxiety or depression inventories, which are composed of items that are manifestations of an underlying hypothetical construct (i.e., where the items are called effect indicators) and those such as Scale 6 of the Minnesota Multiphasic Personality Inventory (Hathaway & McKinley, 1943) and ones used to tap quality of life or activities of daily living in which the items or subscales themselves define the construct (these items are called causal indicators). Questionnaires of the first sort, which are referred to as scales in this article, meet the criteria of classical test theory, whereas the second type, which are called indexes here, do not. I discuss the implications of this difference for how items are selected, the relationship among the items, and the statistics that should and should not be used in establishing the reliability of the scale or index.

14.
One of the central tenets of classical test theory is that scales should have a high degree of internal consistency, as evidenced by Cronbach's α, the mean interitem correlation, and a strong first component. However, there are many instances in which this rule does not apply. Following Bollen and Lennox (1991), I differentiate between questionnaires such as anxiety or depression inventories, which are composed of items that are manifestations of an underlying hypothetical construct (i.e., where the items are called effect indicators) and those such as Scale 6 of the Minnesota Multiphasic Personality Inventory (Hathaway & McKinley, 1943) and ones used to tap quality of life or activities of daily living in which the items or subscales themselves define the construct (these items are called causal indicators). Questionnaires of the first sort, which are referred to as scales in this article, meet the criteria of classical test theory, whereas the second type, which are called indexes here, do not. I discuss the implications of this difference for how items are selected, the relationship among the items, and the statistics that should and should not be used in establishing the reliability of the scale or index.

15.
Self-organizing individual differences in brain development (cited 1 time in total; self-citations: 0; citations by others: 1)
Brain development is self-organizing in that the unique structure of each brain evolves in unpredictable ways through recursive modifications of synaptic networks. In this article, I review mechanisms of neural change in real time and over development, and I argue that change at each of these time scales embodies principles of self-organizing systems. I demonstrate how corticolimbic configurations that emerge within occasions lay down synaptic structure across occasions, giving rise to individual trajectories that become entrenched with age. Emotions have a powerful influence on this process. This is because the neural processes mediating emotion consolidate patterns of activation across the brain, through their enhancement of inter-regional coordination in real time and their contribution to synaptic shaping over development. The loss of corticolimbic plasticity with age is an unfortunate fact of development, but it is compensated in part by transitional phases and individual learning experiences through which habits are modified or replaced. I emphasize variations in inter-systemic coupling as a key mediator of developing individual differences, and I discuss the acquisition of anxious/depressive appraisals as an example.

16.
The social relations model (SRM) is a useful tool for measuring relationship effects, defined as the unique perceptions or behaviors of 2 people. The sources of variance in SRM studies are persons (actors and partners), groups, and items; the relationship effect is defined as the actor–partner interaction. By removing variance because of persons and groups, a measure of a “pure” relationship effect is obtained. In this article, generalizability theory (G Theory) is applied to estimate the reliability of SRM components from round‐robin data structures. Using G Theory, reliability formulas for actor, partner, group, and relationship are developed and interpretations for the reliability estimates are provided. The authors also discuss how these formulas can be used in both planning and interpreting results from relationship research.

17.
The aim of this study was to determine the test-retest reliability and internal consistency of the scales of the Spanish version of the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A; Butcher et al., 1992). Two samples of 939 and 109 Spanish adolescents ages 14 to 18 years were assessed with the MMPI-A in their school environment. The first sample responded to the inventory once, whereas the second sample responded to it on 2 occasions with a 2-week interval between sessions. Results showed no significant differences in means or variances between the first and the second test administration for most MMPI-A scales. Test-retest reliability ranged between .62 (Amorality, Ma(1)) and .92 (Immaturity, IMM); most correlations exceeded .70. Internal consistency values for the MMPI-A scales in the pretest and posttest were very similar overall. External validity of the MMPI-A was demonstrated through several significant correlations between its scales and YSR/11-18 syndromes and social interaction measures. The highest correlations were established between the Anxious/Depressed YSR/11-18 scale and other MMPI-A scales such as Schizophrenia (Sc), Welsh's Anxiety (A), Adolescent-Anxiety (A-anx) and Adolescent-Alienation (A-aln), and between the Social Avoidance and Distress Scale and the MMPI-A Adolescent-Social Discomfort (A-sod) scale.

18.
The Team Role Self Perception Inventory (TRSPI) has attracted several studies critical of its psychometric properties. This research uses a large data set and employs confirmatory factor analysis on within‐scale scores to examine the dimensionality and reliability of the TRSPI's scales. Data show that five of the nine scales are unidimensional and that two other scales show generally good fit to a unidimensional solution. The ‘completer‐finisher’ and ‘implementer’ scales show a better fit to a bidimensional structure and would benefit from improved item wording for a small number of items. The ‘shaper’ scale would also benefit from some attention to item wording. Reliability estimates suggest that the reliability of the TRSPI's scales is better than previous estimates imply.

19.
Likert‐type rating scales are among the most widely used tools in psychological research. Different numbers of response categories would likely affect response style, data distribution, reliability, and construct validity. There is a lack of research in factor structure invariance under Likert scales with different numbers of categories. The purpose of this study is to examine the effects of varying numbers of Likert points (4–11) on scale properties such as factor structure, external validity, and latent means based on the Rosenberg Self‐Esteem Scale (M. Rosenberg, 1989). The sample consists of 1,807 students from secondary schools in Macau. Confirmatory factor analysis shows that the correlated two‐factor model is the most appropriate one; longitudinal invariance analysis reveals that measurement invariance across Likert scales was satisfied at the scalar level. In addition, latent mean scores on the two factors as well as observed means on the subscales are comparable across Likert scales. Moreover, the measurement model across Likert scales exhibits similar external validity. Although psychometric properties are mostly similar across different numbers of points, the 4‐point Likert scale is not recommended because of its higher skewness and lower loadings; the 11‐point Likert scale from 0 to 10 is slightly preferred for its higher loadings and composite reliability.

20.
Study designs involving clustering in some study arms, but not all study arms, are common in clinical treatment-outcome and educational settings. For instance, in a treatment arm, persons may be nested in therapy groups, whereas in a control arm there are no groups. Methodological approaches for handling such partially nested designs have recently been developed in a multilevel modeling framework (MLM-PN) and have proved very useful. We introduce two alternative structural equation modeling (SEM) approaches for analyzing partially nested data: a multivariate single-level SEM (SSEM-PN) and a multiple-arm multilevel SEM (MSEM-PN). We show how SSEM-PN and MSEM-PN can produce results equivalent to existing MLM-PNs and can be extended to flexibly accommodate several modeling features that are difficult or impossible to handle in MLM-PNs. For instance, using an SSEM-PN or MSEM-PN, it is possible to specify complex structural models involving cluster-level outcomes, obtain absolute model fit, decompose person-level predictor effects in the treatment arm using latent cluster means, and include traditional factors as predictors/outcomes. Importantly, implementation of such features for partially nested designs differs from that for fully nested designs. An empirical example involving a partially nested depression intervention combines several of these features in an analysis of interest for treatment-outcome studies.
