Similar literature
 20 similar articles found.
1.
The generalizability of behaviors across observational conditions is a critical issue in behavioral assessment. Generalizability theory was used to examine two aspects of audio-recorded parent-child interactions collected over 6 days of home measurement and 1 day of laboratory measurement in a behavioral treatment program for childhood obesity. Families audiotaped parent-child home meetings during which they reviewed self-monitored diet and exercise records. The recordings were coded for the following types of interactions: praise statements, negative statements, prompts for new behaviors, and statements promoting problem solving. A similar meeting was audiotaped in our laboratory. The first question explored was the number of measurements needed to generalize to the universe of the six home measures. Results showed an increase in generalizability over measurements for each behavioral category. Using generalizability coefficients of .60 or more, praise, negative comments and prompts, respectively, could be reliably observed based on 1, 4, or 4 days of measurement. Second, the effects of setting (laboratory versus home) were assessed for 1 day of measurement in each environment. Again using generalizability coefficients of .60, generalizability analysis showed that the lab setting could not be generalized to the home setting based on 1 day of measurement, with generalizability coefficients ranging from .27 for negative comments to .57 for praise. Results suggest that 4 days of behavioral assessment in the home can be used to establish generalizable data for all the dependent measures studied. However, generalizability coefficients suggested that 1 day of laboratory measurement was not adequate to generalize to typical home behavior. This research was supported in part by Grant NIH HD 23713 awarded to the third author.
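
The question of how many days of measurement are needed to reach a coefficient of .60 can be illustrated with a small sketch. It assumes the simplest possible case, a single-facet persons x days design with days treated as random, in which the coefficient for the mean of n days follows the Spearman-Brown form. The function names and the single-day value of 0.30 are hypothetical and are not taken from the study.

```python
def project_coefficient(g1: float, n_days: int) -> float:
    """Projected generalizability coefficient for the mean of n_days,
    given the single-day coefficient g1 (one-facet persons x days design)."""
    return n_days * g1 / (1 + (n_days - 1) * g1)


def days_needed(g1: float, target: float = 0.60) -> int:
    """Smallest number of days whose projected coefficient reaches target."""
    if not (0 < g1 < 1 and 0 < target < 1):
        raise ValueError("g1 and target must lie strictly between 0 and 1")
    n_days = 1
    while project_coefficient(g1, n_days) < target:
        n_days += 1
    return n_days


# Hypothetical single-day coefficient, chosen only for illustration.
print(days_needed(0.30))
```

A design with more facets (for example, days crossed with raters) would need the full set of variance components rather than this one-number shortcut.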

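2.
A Generalizability Theory Study of the "Adolescent Students' Life Satisfaction Scale"   Cited by: 2 (self-citations: 0, citations by others: 2)
He Liguo (何立国), Zhou Aibao (周爱保). 心理科学 (Journal of Psychological Science), 2006, 29(5): 1199-1202, 1218
Generalizability theory is a measurement theory that uses statistical adjustment techniques to analyze measurement error. It addresses the external validity of measurement by examining, at a macro level, the relationship between the conditions under which measurements are actually taken and the range over which the conclusions can be generalized. This study applied generalizability theory to the Adolescent Students' Life Satisfaction Scale (CASLSS) and obtained the following results: (1) for Chinese adolescent students, 6 to 8 dimensions of life satisfaction are appropriate; when the CASLSS uses only 2 dimensions, it is suitable for norm-referenced interpretation only, not for criterion-referenced interpretation; (2) the subscales and the total scale of the CASLSS have high reliability, and the scale supports criterion-referenced as well as norm-referenced interpretation; (3) the environment satisfaction factor has somewhat weaker scale properties than the other five factors and is the main direction for future improvement of the scale. Both the individual factors and the total scale of the CASLSS show excellent scale properties, and the scale merits wider use in practice and research.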

3.
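A Multivariate Generalizability Theory Analysis of a Putonghua (Mandarin) Test   Cited by: 5 (self-citations: 0, citations by others: 5)
Yang Zhiming (杨志明), Zhang Lei (张雷). 心理学报 (Acta Psychologica Sinica), 2002, 34(1): 51-56
Multivariate generalizability theory (MGT) was used to study the Putonghua test developed by the State Language Commission. In the G study, data from Hong Kong test takers were used to estimate the variance and covariance components for the sources of score variation. In the D study, the universe scores of the test's 3 parts and their respective generalizability coefficients and related indices were estimated first, followed by the universe composite score and its generalizability coefficient, signal-to-noise ratio, and related indices. The results show that the reliability of the test is high overall and that combining the universe scores of the three parts into a composite is reasonable, although in detail the reliability of Part 3 is relatively low. In addition, with 3 raters and 28 items, the reliability of Parts 1 and 2 is already high, so reducing the number of items in these two parts in operational testing would not cause much of a problem.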

4.
For item responses fitting the Rasch model, the assumptions underlying the Mokken model of double monotonicity are met. This makes non-parametric item response theory a natural starting point for Rasch item analysis. This paper studies scalability coefficients based on Loevinger's H coefficient, which summarizes the number of Guttman errors in the data matrix. These coefficients are shown to yield efficient tests of the Rasch model, with p-values computed using Markov chain Monte Carlo methods. The power of the tests of unequal item discrimination, and their ability to distinguish between local dependence and unequal item discrimination, are discussed. The methods are illustrated and motivated using a simulation study and a real data example.
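
As a rough illustration of the coefficient this abstract builds on, the sketch below computes a scale-level Loevinger's H for dichotomous items as one minus the ratio of observed Guttman errors to the number expected under independence. It covers only the coefficient, not the paper's Markov chain Monte Carlo tests, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale-level Loevinger's H for a persons x items matrix of 0/1 scores."""
    n_persons = len(data)
    n_items = len(data[0])
    # Item popularity: proportion of persons answering each item positively.
    p = [sum(row[j] for row in data) / n_persons for j in range(n_items)]
    observed = 0.0
    expected = 0.0
    for i, j in combinations(range(n_items), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: failing the easier item while passing the harder one.
        observed += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        # Expected count of such errors if the two items were independent.
        expected += n_persons * (1 - p[easy]) * p[hard]
    if expected == 0:
        raise ValueError("H is undefined when no Guttman errors are possible")
    return 1 - observed / expected


# Toy data forming a perfect Guttman pattern (no errors by construction).
perfect_scale = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(loevinger_h(perfect_scale))
```

Values near 1 indicate few Guttman errors relative to chance, and values near 0 indicate items that behave as if unrelated.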

5.

This study investigated the generalizability of the Global Assessment of Relational Functioning (GARF) Scale. Found in an appendix of the Diagnostic and Statistical Manual of Mental Disorders under "Criteria Sets for Axes Provided for Future Study," the GARF Scale provides a global rating of a relational unit (family or couple). Thirty-two raters assigned GARF ratings to five relational units. Generalizability analyses indicate extremely high dependability of GARF scores across raters. Higher generalizability coefficients were found for raters who had formal education in family systems theory. Overall, these results are an encouraging step towards adopting the GARF for widespread use.

6.
Over the years, research in the social sciences has been dominated by reporting of reliability coefficients that fail to account for key sources of measurement error. Use of these coefficients, in turn, to correct for measurement error can hinder scientific progress by misrepresenting true relationships among the underlying constructs being investigated. In the research reported here, we addressed these issues using generalizability theory (G-theory) in both traditional and new ways to account for the three key sources of measurement error (random-response, specific-factor, and transient) that affect scores from objectively scored measures. Results from 20 widely used measures of personality, self-concept, and socially desirable responding showed that conventional indices consistently misrepresented reliability and relationships among psychological constructs by failing to account for key sources of measurement error and correlated transient errors within occasions. The results further revealed that G-theory served as an effective framework for remedying these problems. We discuss possible extensions in future research and provide code from the computer package R in an online supplement to enable readers to apply the procedures we demonstrate to their own research.

7.
Generalizability theory was applied to the Matching Familiar Figures Test (MFF) to analyze the dependability of the MFF as a measure of reflection-impulsivity at four grade levels: second, third, fourth, and fifth. A completely crossed, two-facet random model design was used to provide a multidimensional framework for examining the dependability of the MFF. Components of variance and coefficients of generalizability were derived from this design for the MFF error and latency scores at each grade level. Results showed that the MFF latency score was a more dependable measure than the MFF error score. In addition, the number of testing occasions made a more significant contribution to the generalizability of the MFF than the number of items. Coefficients of generalizability based on extrapolated items and occasions were also computed, providing the basis for improving the dependability of the MFF in future research. Overall, results indicate that the traditional method of allowing multiple trials for each item contributes to the imprecision of the MFF error score. An alternative procedure for administering the MFF is recommended. Portions of this article were presented at the annual convention of the American Educational Research Association, Toronto, Canada, March 1978. At various times during the conduct and completion of the present article, the first author was affiliated with the Los Angeles Unified School District and the University of California, Los Angeles.

8.
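Multivariate generalizability theory was used to examine the psychometric properties of the Internet addiction scale for college students (CIAS-R) when applied to a college student population. Using a generalizability design with a random measurement model, a questionnaire survey was administered to 1,200 enrolled college students. The correlations among factors were above 0.92 for the two-factor structure and ranged from 0.76 to 0.97 for the five-factor structure. The generalizability coefficient and the index of dependability of the total scale both exceeded 0.94, while those of the individual factors were about 0.90 in the two-factor structure and between 0.74 and 0.85 in the five-factor structure. The total scale and its factors therefore show high reliability and validity among college students and can be used for both norm-referenced and criterion-referenced testing. Under either the two-factor or the five-factor structure, the CIAS-R factors are well designed in terms of score weighting and number of items.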

9.
Studies of the relationship between human resource (HR) practices and firm performance typically use a single respondent to assess firm level HR practices or HR effectiveness. However, previous research in other substantive areas suggests that rater differences are a potentially important source of measurement error. We demonstrate analytically the potential consequences of both random and systematic measurement error in research on HR and firm performance. However, our main focus is on random error and we show how generalizability theory can be applied to obtain better estimates of reliability by simultaneously recognizing multiple sources (e.g., items, raters) of random measurement error. These more inclusive reliability estimates, in turn, offer the possibility of more precisely quantifying substantive relationships in the HR and firm performance literature. In our sample, reliabilities (as estimated by generalizability coefficients) for single-rater assessments of HR variables were generally below .50. This degree of measurement error, if present in substantive studies on HR and firm performance, could lead to considerable bias, given that an unstandardized regression coefficient is corrected for measurement error in the independent variable by dividing by its reliability coefficient (not its square root). We also found only limited convergent validity between HR and line managers' ratings of a second type of HR measure, HR effectiveness. In general, our findings suggest that future researchers need to devote greater attention to measurement error and construct validity issues. Our study provides an example of how generalizability theory can be useful in this pursuit.
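
The point about dividing by the reliability coefficient rather than its square root can be made concrete with a short sketch. It contrasts the classical correction of an unstandardized slope for error in the predictor with the correction of a correlation for error in both variables. The function names and numeric values are hypothetical and do not come from the study.

```python
import math


def disattenuate_slope(b_observed: float, rel_x: float) -> float:
    """Correct an unstandardized slope for random error in the predictor:
    divide by the predictor's reliability itself."""
    return b_observed / rel_x


def disattenuate_correlation(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Correct a correlation for random error in both variables:
    divide by the square root of the product of the reliabilities."""
    return r_observed / math.sqrt(rel_x * rel_y)


# Hypothetical values: an observed slope of 0.20 and a predictor reliability of 0.50.
print(disattenuate_slope(0.20, 0.50))
print(disattenuate_correlation(0.20, 0.50, 1.00))
```

With a reliability of .50, the slope correction doubles the observed coefficient, whereas a square-root correction would raise it by only about 41 percent, which is why low single-rater reliabilities matter so much here.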

10.
We propose an integrative framework for understanding the relationship among 4 closely related issues in human resource (HR) selection: test validity, test bias, selection errors, and adverse impact. One byproduct of our integrative approach is the concept of a previously undocumented source of selection errors we call bias-based selection errors (i.e., errors that arise from using a biased test as if it were unbiased). Our integrative framework provides researchers and practitioners with a unique tool that generates numerical answers to questions such as the following: What are the anticipated consequences for bias-based selection errors of various degrees of test validity and test bias? What are the anticipated consequences for adverse impact of various degrees of test validity and test bias? From a theory point of view, our framework provides a more complete picture of the selection process by integrating 4 key concepts that have not been examined simultaneously thus far. From a practical point of view, our framework provides test developers, employers, and policy makers a broader perspective and new insights regarding practical consequences associated with various selection systems that vary on their degree of validity and bias. We present a computer program available online to perform all needed calculations.

11.
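Improving the Comprehensive Ability Test of the College Entrance Examination: A Multivariate Generalizability Theory Perspective   Cited by: 10 (self-citations: 0, citations by others: 10)
Yang Zhiming (杨志明), Zhang Lei (张雷), Ma Shiye (马世晔). 心理学报 (Acta Psychologica Sinica), 2004, 36(2): 195-200
A multivariate generalizability theory study found that the overall reliability of the college entrance examination Comprehensive Ability Test (2001, Guangdong) reached an acceptable level (0.784). However, the contributions of the test's sections to total variance departed considerably from the intended score weights: geography and politics contributed too little, and chemistry and history contributed too much. This indicates that examinees with lopsided strengths (in history and chemistry) obtained higher composite scores. The decision (D) study further found that increasing the number of geography items would, anomalously, lower the overall reliability of the test, which suggests that many high-scoring examinees answered the geography items incorrectly or deliberately skipped them. How to control the actual contribution of each section effectively and avoid negative washback is therefore an urgent problem for the current Comprehensive Ability Test.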

12.
Hierarchical regression analysis is potentially a very useful statistical technique for establishing the significance of sets of predictor variables. However, when a hierarchical analysis which is based on theory is performed, some estimation procedures for the regression coefficients and their associated standard errors are potentially inappropriate. Specifically, the hierarchical regression equations, the incremental or hierarchical tests, and the parameter estimation of this procedure may not correspond. This problem is investigated by developing four approaches to estimation (simultaneous, stagewise, orthogonal, and hierarchical). For each method, regression coefficient estimators and their standard errors are determined. By comparison of these approaches, the use of orthogonalized sets of predictor variates or a modification to a series of simultaneous analyses is recommended as the most sensible technique for a theory-driven hierarchical analysis.

13.
The Mayer, Salovey, & Caruso Emotional Intelligence Test (MSCEIT) has been reported to provide reliable scores for the four-branch ability model of emotional intelligence [Mayer, J. D., Salovey, P., & Caruso, D. R. (2002). Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT). User's manual. Toronto, Canada: Multi-Health Systems]. However, no studies have yet been reported that have carried out a comprehensive analysis of reliability of scores from MSCEIT, taking into account the different conceptual features of the multifacet measurement design. Results from generalizability analyses of scores from 111 Norwegian executives' responses on the Norwegian version of MSCEIT show that scores reflect considerable amounts of measurement error. Ability scores from Perceiving Emotions are multidimensional, reflecting different types of emotion and the presence or absence of rated emotions in the stimuli. Generalizability (reliability) coefficients for scores from Perceiving Emotions, Facilitating Thought, Understanding Emotion, and Managing Emotions were estimated at .71, .37, .50, and .46, respectively, which is substantially lower than reported in previous studies. The low estimated generalizability coefficients suggest that the scores may not generalize well to intended domains, and the validity of some of the scores may be questioned.

14.
The development of an SR-inventory of achievement-related behaviour for the purpose of managerial selection is reported. SR-inventories stem from interactional personality psychology. As the design of an SR-inventory is two-faceted, Cronbach et al.'s generalizability theory forms a suitable framework to investigate it. Using data from 404 Dutch respondents (mostly applicants), several generalizability analyses were performed to determine under which circumstances the inventory can be a useful tool. Furthermore, confirmatory factor analysis was used to substantiate the suggested SR-structure of the instrument. The relationship with other personality factors was investigated to classify the instrument in the domain of personality assessment.

15.
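Yang Zhiming (杨志明), Zhang Lei (张雷). 心理科学 (Journal of Psychological Science), 2003, 26(2): 305-307
Focusing on the two-factor and three-factor models of the WISC-CR, this study used multivariate generalizability theory to examine the total reliability and the reliability of each subfactor in a sample of 201 primary school students aged 6 to 7. The findings were: (1) under the two-factor model, the reliabilities of the subfactors and of the total test are fairly high; however, when the scale is treated as measuring three factors (verbal comprehension, perceptual organization, and freedom from distractibility), the reliability of the third factor is too low and not easily improved, which indicates that the scale should not be used to measure freedom from distractibility; (2) the scale should not be used as a criterion-referenced test, because its reliability cannot be assured. The article also demonstrates an MGT method for evaluating the total reliability of a test battery and the reliability of its sub-ability factors, which is of value to the development of psychometrics.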

16.
This study examined ratings of fourth graders’ oral reading expression. Randomly assigned participants (n = 36) practiced repeated readings using narrative or informational passages for 7 weeks. After this period raters used the Multidimensional Fluency Scale (MFS) on two separate occasions to rate students’ expressive reading of four equivalent passages. Results of this generalizability study showed that a minimum of two and preferably three equivalent passages, two raters, and one rating occasion are recommended to obtain reliable ratings. This research substantiates the reliability of the MFS and demonstrates the importance of raters collaborating and finding texts at students’ independent reading levels.

17.
Previous research on the effect of class size on student ratings of instruction has primarily investigated the effect of class size on the favorableness of these ratings rather than its effect on their reliability (dependability). A few studies have used "generalizability theory" to demonstrate the relative effect of class size on the dependability of student ratings of instruction. The purpose of the present study was to test the validity of the findings of these studies in a different cultural setting using a different student ratings questionnaire. Using a random-effect analysis of variance to estimate the variance components for a design in which students were nested within classes and crossed with items, it was found that the variance component for class size was appreciably larger than that for items. At least 20 students were needed to obtain a generalizability coefficient for relative decisions of .70 or more. Increasing the number of students has a greater effect on generalizability coefficients than increasing the number of items.
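
The trade-off between adding students and adding items can be sketched with the standard D-study formula for a design in which students are nested within classes and crossed with items, with class means as the object of measurement. The variance components below are hypothetical placeholders rather than estimates from the study, and the function name is an assumption made for illustration.

```python
def class_mean_g_coefficient(var_class, var_student_in_class, var_class_item,
                             var_residual, n_students, n_items):
    """Generalizability coefficient for relative decisions about class means
    in a (students:classes) x items random-effects design."""
    relative_error = (
        var_student_in_class / n_students            # students within classes
        + var_class_item / n_items                   # class x item interaction
        + var_residual / (n_students * n_items)      # residual, confounded with error
    )
    return var_class / (var_class + relative_error)


# Hypothetical variance components, chosen only to illustrate the pattern.
components = dict(var_class=0.10, var_student_in_class=0.60,
                  var_class_item=0.02, var_residual=0.80)

baseline = class_mean_g_coefficient(**components, n_students=20, n_items=10)
more_students = class_mean_g_coefficient(**components, n_students=40, n_items=10)
more_items = class_mean_g_coefficient(**components, n_students=20, n_items=20)
print(baseline, more_students, more_items)
```

When the student-within-class component dominates, as in these made-up values, doubling the number of students raises the coefficient more than doubling the number of items.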

18.
Potential Performance Theory (PPT) is a general theory for parsing observed performance into the underlying strategy and the consistency with which it is used. Although empirical research has supported that PPT is useful, it is desirable to have more information about the bias and standard errors of PPT findings. It also is beneficial to know the effects of violations of PPT assumptions. The authors present computer simulations that evaluate bias and standard errors at varying levels of strategy, consistency, and number of trials per participant. The simulations show that, when the assumptions are true, there is very little bias and the standard errors are low when there are moderate or large numbers of trials per participant (e.g., N = 50 or N = 100). But when the independence assumption is violated, PPT provides biased findings, although the bias is quite small unless the violations are large.

19.
Rater bias is a substantial source of error in psychological research. Bias distorts observed effect sizes beyond the expected level of attenuation due to intrarater error, and the impact of bias is not accurately estimated using conventional methods of correction for attenuation. Using a model based on multivariate generalizability theory, this article illustrates how bias affects research results. The model identifies 4 types of bias that may affect findings in research using observer ratings, including the biases traditionally termed leniency and halo errors. The impact of bias depends on which of 4 classes of rating design is used, and formulas are derived for correcting observed effect sizes for attenuation (due to bias variance) and inflation (due to bias covariance) in each of these classes. The rater bias model suggests procedures for researchers seeking to minimize adverse impact of bias on study findings.

20.
Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and show how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.
