1.
Generalizability Theory (GT) offers increased utility for assessment research given its ability to concurrently examine multiple sources of variance, inform both relative and absolute decision making, and determine both the consistency and generalizability of results. Despite these strengths, assessment researchers within the fields of education and psychology have been slow to adopt and utilize a GT approach. This underutilization may be due to an incomplete understanding of the conceptual underpinnings of GT, of the actual steps involved in designing and implementing generalizability studies, or some combination of both. The goal of the current article is therefore twofold: (a) to provide readers with the conceptual background and terminology related to the use of GT and (b) to facilitate understanding of the range of issues that need to be considered in the design, implementation, and interpretation of generalizability and dependability studies. Given the relevance of this analytic approach to applied assessment contexts, there is a need to ensure that GT is both accessible to, and understood by, researchers in education and psychology. Important methodological and analytical considerations are presented and implications for applied use are described.
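As a concrete illustration of the G-study and D-study quantities this abstract refers to, the following minimal sketch estimates variance components for a hypothetical persons x items (p x i) crossed design and computes the relative (generalizability) and absolute (dependability) coefficients. The data and all numbers are invented for illustration and are not from the article.

```python
import numpy as np

# Hypothetical scores: 5 persons (rows) x 4 items (columns), one score per cell.
X = np.array([
    [7, 6, 7, 5],
    [5, 4, 5, 4],
    [9, 8, 8, 7],
    [4, 3, 4, 2],
    [6, 6, 7, 6],
], dtype=float)

n_p, n_i = X.shape
grand = X.mean()
p_means = X.mean(axis=1)   # person means
i_means = X.mean(axis=0)   # item means

# Sums of squares for the two-way crossed design without replication.
ss_p  = n_i * np.sum((p_means - grand) ** 2)
ss_i  = n_p * np.sum((i_means - grand) ** 2)
ss_pi = np.sum((X - p_means[:, None] - i_means[None, :] + grand) ** 2)

ms_p  = ss_p / (n_p - 1)
ms_i  = ss_i / (n_i - 1)
ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

# Variance components from the expected mean squares (negatives truncated to 0).
var_pi = ms_pi
var_p  = max((ms_p - ms_pi) / n_i, 0.0)
var_i  = max((ms_i - ms_pi) / n_p, 0.0)

# Relative (generalizability) and absolute (dependability) coefficients.
g_coef   = var_p / (var_p + var_pi / n_i)
phi_coef = var_p / (var_p + (var_i + var_pi) / n_i)
print(f"G = {g_coef:.3f}, Phi = {phi_coef:.3f}")
```

The two coefficients differ only in the error term: the relative coefficient ignores the item main effect (which shifts all persons equally), while the absolute coefficient counts it, which is why Phi can never exceed G.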
2.
Based on generalizability theory, this study explores the factors that influence the results of teaching-quality evaluations of university teachers. Evaluation data were collected with the Evaluation Scale for University Teachers' Teaching Quality (Student Form) and analyzed with mGENOVA. Results showed that: (1) evaluations conducted at the beginning of the second semester yielded more reliable results than those conducted at the end of the first semester; (2) sampling only 20 students per teacher is sufficient to ensure the reliability of the evaluation results; (3) students in different types of majors weight the evaluation indicators differently, which in turn affects the reliability of the results; and (4) student evaluations of science courses were more reliable than their evaluations of liberal-arts courses.
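A minimal sketch of the decision-study projection behind finding (2): with assumed variance components for teachers and for pooled relative error per student (invented placeholders, not the study's mGENOVA estimates), the projected coefficient rises with the number of students sampled per teacher and crosses .80 near 20 students under these particular assumptions.

```python
# Hypothetical variance components: teacher (object of measurement) and
# pooled relative-error variance contributed by a single student's rating.
sigma2_t, sigma2_rel = 0.40, 1.60

# D-study projection: averaging over n students shrinks error by 1/n.
for n_students in (5, 10, 20, 40):
    g = sigma2_t / (sigma2_t + sigma2_rel / n_students)
    print(f"n = {n_students:>2} students: E(rho^2) = {g:.3f}")
```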
3.
4.
A total of 4 raters, including 2 teachers and 2 research assistants, used Direct Behavior Rating Single Item Scales (DBR-SIS) to measure the academic engagement and disruptive behavior of 7 middle school students across multiple occasions. Generalizability study results for the full model revealed modest to large magnitudes of variance associated with persons (students), occasions of measurement (days), and their interactions. However, an unexpectedly low proportion of the variance in DBR data was attributable to the rater facet, and the variance component for rating occasion nested within day (a 10-min interval within a class period) was negligible. Results of a reduced model and of subsequent decision studies specific to individual raters and rater type (research assistant vs. teacher) suggested that reliability-like estimates differed substantially across raters. Overall, findings supported previous recommendations that, in the absence of estimates of rater reliability and firm recommendations regarding rater training, ratings obtained from DBR-SIS, and subsequent analyses of them, be conducted within rater. Additionally, results suggested that when selecting a teacher rater, the person most likely to substantially interact with target students during the specified observation period may be the best choice.
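The within-rater decision studies described above project reliability-like coefficients as days of DBR-SIS data accumulate for a single, fixed rater. The sketch below shows that projection for purely illustrative variance components (assumed values, not the study's estimates).

```python
# Hypothetical components with the rater held fixed:
# var_p      = person (student) variance, the object of measurement
# var_error  = person-by-day interaction plus residual, the relative error
var_p, var_error = 0.90, 1.15

# Averaging a student's ratings over n_days days divides the error by n_days.
for n_days in (1, 5, 10, 20):
    g = var_p / (var_p + var_error / n_days)
    print(f"{n_days:>2} days: E(rho^2) = {g:.3f}")
```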
5.
Floyd, Shands, Rafael, Bergeron and McGrew (2009) used generalizability theory to test the reliability of general-factor loadings and to compare three different sources of error in them (test battery size, test battery composition, and factor-extraction technique) as well as their interactions. They found that their general-factor loadings were moderately to strongly dependable. We replicated the methods of Floyd et al. (2009) in a different sample of tests, from the Minnesota Study of Twins Reared Apart (MISTRA). Our first hypothesis was that, given the greater diversity of the tests in MISTRA, the general-factor loadings would be less dependable than in Floyd et al. (2009). Our second hypothesis, contrary to the positions of Floyd et al. (2009) and Jensen and Weng (1994), was that the general factors from the small, randomly formed test batteries would differ substantively from the general factor from a well-specified hierarchical model of all available tests. Subtests from MISTRA were randomly selected to form independent and overlapping batteries of 2, 4 and 8 tests in size, and the general-factor loadings of eight probe tests were obtained in each battery by principal components analysis, principal factor analysis, and maximum likelihood estimation. Results initially indicated that the general-factor loadings were unexpectedly more dependable than in Floyd et al. (2009); however, further analysis revealed that this was due to the greater diversity of our probe tests. After adjustment for this difference in diversity, and consideration of the representativeness of our probe tests versus those of Floyd et al. (2009), our first hypothesis of lower dependability was confirmed in the overlapping batteries, but not the independent ones. To test the second hypothesis, we correlated g factor scores from the random test batteries with g factor scores from the VPR model; we also calculated special coefficients of congruence for the same relations. Consistent with our second hypothesis, the general factors from small non-hierarchical models were found not to be reliable enough for the purposes of theoretical research. We discuss appropriate standards for the construction and factor analysis of intelligence test batteries.
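The congruence analysis mentioned above can be illustrated with Tucker's standard coefficient of congruence between two factor-loading vectors. Note that the authors describe "special" coefficients of congruence, which may differ from this standard form, and the loadings below are invented placeholders rather than MISTRA estimates.

```python
import numpy as np

def congruence(x, y):
    """Tucker's coefficient of congruence between two factor-loading vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

# Hypothetical g loadings for the same eight probe tests under two models,
# e.g., a small random battery vs. the full hierarchical (VPR-style) model.
battery_g = [0.72, 0.65, 0.58, 0.80, 0.61, 0.55, 0.70, 0.66]
vpr_g     = [0.75, 0.60, 0.62, 0.78, 0.58, 0.50, 0.73, 0.64]
print(f"congruence = {congruence(battery_g, vpr_g):.3f}")
```

Unlike a correlation, the congruence coefficient is computed on raw (uncentered) loadings, so it rewards agreement in both the pattern and the overall level of the loadings.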
6.
Although the efficiency with which a wide range of behavioral data can be obtained makes behavior rating scales particularly attractive tools for the purposes of screening and evaluation, feasibility concerns arise in the context of formative assessment. Specifically, informant load, or the amount of time informants are asked to contribute to the assessment process, likely has a negative impact on data quality over time and on informants' willingness to participate. Two important determinants of informant load in progress monitoring are the length of the rating scale (i.e., the number of items) and how frequently informants are asked to provide ratings (i.e., the number of occasions). The purpose of the current study was to investigate the dependability of the IOWA Conners Teacher Rating Scale (Loney & Milich, 1982), which is used to differentiate inattentive-overactive from oppositional-defiant behaviors. Specifically, the facets of items and occasions were examined to identify the combinations of these sources of error necessary to reach an acceptable level of dependability for both absolute and relative decisions. Results from D studies identified a variety of item-occasion combinations that reach the criterion for adequate dependability. Recommendations for research and practice are discussed.
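A sketch of the item-by-occasion D-study logic described above: the dependability coefficient (Phi) for absolute decisions as a joint function of the number of items and occasions, computed from hypothetical variance components for a persons x items x occasions random design. The component values are assumptions for illustration, not the study's estimates for the IOWA Conners scale.

```python
# Hypothetical variance components: person (p), item (i), occasion (o),
# their interactions, and the confounded residual (pio,e).
vc = {"p": 0.50, "i": 0.05, "o": 0.10,
      "pi": 0.15, "po": 0.25, "io": 0.02, "pio,e": 0.40}

def phi(n_i, n_o):
    """Dependability for absolute decisions: every non-person component
    contributes to error, divided by the number of conditions averaged over."""
    abs_error = (vc["i"] / n_i + vc["o"] / n_o
                 + vc["pi"] / n_i + vc["po"] / n_o
                 + (vc["io"] + vc["pio,e"]) / (n_i * n_o))
    return vc["p"] / (vc["p"] + abs_error)

for n_i in (3, 5, 10):
    for n_o in (1, 5, 10):
        print(f"items = {n_i:>2}, occasions = {n_o:>2}: Phi = {phi(n_i, n_o):.3f}")
```

Scanning such a grid is exactly how a D study trades off informant load: a shorter scale rated on more occasions can reach the same dependability criterion as a longer scale rated less often.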
7.
Zhao Qun & Cao Yiwei, 《应用心理学》 (Chinese Journal of Applied Psychology), 2006, 12(3): 258-263
Portfolio assessment is favored because it can effectively promote student development and instructional improvement, but poor measurement reliability and validity have limited its use in instructional evaluation. This article reports an empirical study of the characteristics of rater reliability in portfolio assessment: four raters assigned level ratings to 152 portfolios on two occasions, and rater reliability was computed with several statistical methods. The results indicate that portfolio ratings show high association, moderate-to-weak agreement, and a degree of stability, with ratings of overall portfolio quality being the most reliable. In this study, with three raters, both the generalizability coefficient and the dependability coefficient for ratings of overall portfolio quality exceeded 0.80.
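The distinction this abstract draws between association and agreement can be made concrete with a small sketch: association asks whether raters order portfolios the same way, while agreement asks whether they assign exactly the same level. The two raters' ratings below are invented for illustration and are not the study's data.

```python
import numpy as np

# Hypothetical 1-5 level ratings by two raters for eight portfolios.
rater_a = np.array([3, 4, 2, 5, 3, 4, 1, 3])
rater_b = np.array([3, 5, 2, 4, 4, 4, 2, 3])

# Association: correlation between rating levels (high even with level shifts).
association = np.corrcoef(rater_a, rater_b)[0, 1]
# Agreement: proportion of identical levels (penalizes any level difference).
agreement = np.mean(rater_a == rater_b)
print(f"association r = {association:.3f}, exact agreement = {agreement:.2f}")
```

This is why ratings can show high association yet only moderate-to-weak agreement, the pattern the study reports.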