1.
Generalizability Theory (GT) offers increased utility for assessment research given its ability to concurrently examine multiple sources of variance, inform both relative and absolute decision making, and determine both the consistency and generalizability of results. Despite these strengths, assessment researchers within the fields of education and psychology have been slow to adopt and utilize a GT approach. This underutilization may be due to an incomplete understanding of the conceptual underpinnings of GT, of the actual steps involved in designing and implementing generalizability studies, or some combination of both. The goal of the current article is therefore twofold: (a) to provide readers with the conceptual background and terminology related to the use of GT and (b) to facilitate understanding of the range of issues that need to be considered in the design, implementation, and interpretation of generalizability and dependability studies. Given the relevance of this analytic approach to applied assessment contexts, there is a need to ensure that GT is both accessible to, and understood by, researchers in education and psychology. Important methodological and analytical considerations are presented and implications for applied use are described.
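As a concrete illustration of the G-study and D-study quantities this abstract refers to, the following minimal sketch estimates variance components for a hypothetical persons x items (p x i) crossed design and computes the relative (generalizability) and absolute (dependability) coefficients. The data and all numbers are invented for illustration and are not from the article.

```python
import numpy as np

# Hypothetical scores: 5 persons (rows) x 4 items (columns), one score per cell.
X = np.array([
    [7, 6, 7, 5],
    [5, 4, 5, 4],
    [9, 8, 8, 7],
    [4, 3, 4, 2],
    [6, 6, 7, 6],
], dtype=float)

n_p, n_i = X.shape
grand = X.mean()
p_means = X.mean(axis=1)   # person means
i_means = X.mean(axis=0)   # item means

# Sums of squares for the two-way crossed design without replication.
ss_p  = n_i * np.sum((p_means - grand) ** 2)
ss_i  = n_p * np.sum((i_means - grand) ** 2)
ss_pi = np.sum((X - p_means[:, None] - i_means[None, :] + grand) ** 2)

ms_p  = ss_p / (n_p - 1)
ms_i  = ss_i / (n_i - 1)
ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

# Variance components from the expected mean squares (negatives truncated to 0).
var_pi = ms_pi
var_p  = max((ms_p - ms_pi) / n_i, 0.0)
var_i  = max((ms_i - ms_pi) / n_p, 0.0)

# Relative (generalizability) and absolute (dependability) coefficients.
g_coef   = var_p / (var_p + var_pi / n_i)
phi_coef = var_p / (var_p + (var_i + var_pi) / n_i)
print(f"G = {g_coef:.3f}, Phi = {phi_coef:.3f}")
```

The two coefficients differ only in the error term: the relative coefficient ignores the item main effect (which shifts all persons equally), while the absolute coefficient counts it, which is why Phi can never exceed G.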
2.
Based on generalizability theory, this study explores the factors that influence the results of teaching-quality evaluations of university teachers. Evaluation data were collected with the Evaluation Scale for University Teachers' Teaching Quality (Student Form) and analyzed with mGENOVA. Results showed that: (1) evaluations conducted at the beginning of the second semester yielded more reliable results than those conducted at the end of the first semester; (2) sampling only 20 students per teacher is sufficient to ensure the reliability of the evaluation results; (3) students in different types of majors weight the evaluation indicators differently, which in turn affects the reliability of the results; and (4) student evaluations of science courses were more reliable than their evaluations of liberal-arts courses.
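A minimal sketch of the decision-study projection behind finding (2): with assumed variance components for teachers and for pooled relative error per student (invented placeholders, not the study's mGENOVA estimates), the projected coefficient rises with the number of students sampled per teacher and crosses .80 near 20 students under these particular assumptions.

```python
# Hypothetical variance components: teacher (object of measurement) and
# pooled relative-error variance contributed by a single student's rating.
sigma2_t, sigma2_rel = 0.40, 1.60

# D-study projection: averaging over n students shrinks error by 1/n.
for n_students in (5, 10, 20, 40):
    g = sigma2_t / (sigma2_t + sigma2_rel / n_students)
    print(f"n = {n_students:>2} students: E(rho^2) = {g:.3f}")
```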
3.
4.
A total of 4 raters, including 2 teachers and 2 research assistants, used Direct Behavior Rating Single Item Scales (DBR-SIS) to measure the academic engagement and disruptive behavior of 7 middle school students across multiple occasions. Generalizability study results for the full model revealed modest to large magnitudes of variance associated with persons (students), occasions of measurement (days), and their interactions. However, an unexpectedly low proportion of the variance in DBR data was attributable to the rater facet, and the variance component for rating occasion nested within day (a 10-min interval within a class period) was negligible. Results of a reduced model and of subsequent decision studies specific to individual raters and rater type (research assistant vs. teacher) suggested that reliability-like estimates differed substantially across raters. Overall, findings supported previous recommendations that, in the absence of estimates of rater reliability and firm recommendations regarding rater training, ratings obtained from DBR-SIS, and subsequent analyses of them, be conducted within rater. Additionally, results suggested that when selecting a teacher rater, the person most likely to substantially interact with target students during the specified observation period may be the best choice.
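The within-rater decision studies described above project reliability-like coefficients as days of DBR-SIS data accumulate for a single, fixed rater. The sketch below shows that projection for purely illustrative variance components (assumed values, not the study's estimates).

```python
# Hypothetical components with the rater held fixed:
# var_p      = person (student) variance, the object of measurement
# var_error  = person-by-day interaction plus residual, the relative error
var_p, var_error = 0.90, 1.15

# Averaging a student's ratings over n_days days divides the error by n_days.
for n_days in (1, 5, 10, 20):
    g = var_p / (var_p + var_error / n_days)
    print(f"{n_days:>2} days: E(rho^2) = {g:.3f}")
```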
5.
Floyd, Shands, Rafael, Bergeron and McGrew (2009) used generalizability theory to test the reliability of general-factor loadings and to compare three different sources of error in them (test battery size, test battery composition, and factor-extraction technique) as well as their interactions. They found that their general-factor loadings were moderately to strongly dependable. We replicated the methods of Floyd et al. (2009) in a different sample of tests, from the Minnesota Study of Twins Reared Apart (MISTRA). Our first hypothesis was that, given the greater diversity of the tests in MISTRA, the general-factor loadings would be less dependable than in Floyd et al. (2009). Our second hypothesis, contrary to the positions of Floyd et al. (2009) and Jensen and Weng (1994), was that the general factors from the small, randomly formed test batteries would differ substantively from the general factor from a well-specified hierarchical model of all available tests. Subtests from MISTRA were randomly selected to form independent and overlapping batteries of 2, 4 and 8 tests in size, and the general-factor loadings of eight probe tests were obtained in each battery by principal components analysis, principal factor analysis, and maximum likelihood estimation. Results initially indicated that the general-factor loadings were unexpectedly more dependable than in Floyd et al. (2009); however, further analysis revealed that this was due to the greater diversity of our probe tests. After adjustment for this difference in diversity, and consideration of the representativeness of our probe tests versus those of Floyd et al. (2009), our first hypothesis of lower dependability was confirmed in the overlapping batteries, but not the independent ones. To test the second hypothesis, we correlated g factor scores from the random test batteries with g factor scores from the VPR model; we also calculated special coefficients of congruence for the same relations. Consistent with our second hypothesis, the general factors from small non-hierarchical models were found not to be reliable enough for the purposes of theoretical research. We discuss appropriate standards for the construction and factor analysis of intelligence test batteries.
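The congruence analysis mentioned above can be illustrated with Tucker's standard coefficient of congruence between two factor-loading vectors. Note that the authors describe "special" coefficients of congruence, which may differ from this standard form, and the loadings below are invented placeholders rather than MISTRA estimates.

```python
import numpy as np

def congruence(x, y):
    """Tucker's coefficient of congruence between two factor-loading vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

# Hypothetical g loadings for the same eight probe tests under two models,
# e.g., a small random battery vs. the full hierarchical (VPR-style) model.
battery_g = [0.72, 0.65, 0.58, 0.80, 0.61, 0.55, 0.70, 0.66]
vpr_g     = [0.75, 0.60, 0.62, 0.78, 0.58, 0.50, 0.73, 0.64]
print(f"congruence = {congruence(battery_g, vpr_g):.3f}")
```

Unlike a correlation, the congruence coefficient is computed on raw (uncentered) loadings, so it rewards agreement in both the pattern and the overall level of the loadings.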
6.
Although the efficiency with which a wide range of behavioral data can be obtained makes behavior rating scales particularly attractive tools for the purposes of screening and evaluation, feasibility concerns arise in the context of formative assessment. Specifically, informant load, or the amount of time informants are asked to contribute to the assessment process, likely has a negative impact on data quality over time and on informants' willingness to participate. Two important determinants of informant load in progress monitoring are the length of the rating scale (i.e., the number of items) and how frequently informants are asked to provide ratings (i.e., the number of occasions). The purpose of the current study was to investigate the dependability of the IOWA Conners Teacher Rating Scale (Loney & Milich, 1982), which is used to differentiate inattentive-overactive from oppositional-defiant behaviors. Specifically, the facets of items and occasions were examined to identify the combinations of these sources of error necessary to reach an acceptable level of dependability for both absolute and relative decisions. Results from D studies identified a variety of item-occasion combinations that reach the criterion for adequate dependability. Recommendations for research and practice are discussed.
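A sketch of the item-by-occasion D-study logic described above: the dependability coefficient (Phi) for absolute decisions as a joint function of the number of items and occasions, computed from hypothetical variance components for a persons x items x occasions random design. The component values are assumptions for illustration, not the study's estimates for the IOWA Conners scale.

```python
# Hypothetical variance components: person (p), item (i), occasion (o),
# their interactions, and the confounded residual (pio,e).
vc = {"p": 0.50, "i": 0.05, "o": 0.10,
      "pi": 0.15, "po": 0.25, "io": 0.02, "pio,e": 0.40}

def phi(n_i, n_o):
    """Dependability for absolute decisions: every non-person component
    contributes to error, divided by the number of conditions averaged over."""
    abs_error = (vc["i"] / n_i + vc["o"] / n_o
                 + vc["pi"] / n_i + vc["po"] / n_o
                 + (vc["io"] + vc["pio,e"]) / (n_i * n_o))
    return vc["p"] / (vc["p"] + abs_error)

for n_i in (3, 5, 10):
    for n_o in (1, 5, 10):
        print(f"items = {n_i:>2}, occasions = {n_o:>2}: Phi = {phi(n_i, n_o):.3f}")
```

Scanning such a grid is exactly how a D study trades off informant load: a shorter scale rated on more occasions can reach the same dependability criterion as a longer scale rated less often.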
7.
Zhao Qun & Cao Yiwei, 《应用心理学》 (Chinese Journal of Applied Psychology), 2006, 12(3): 258-263
Portfolio assessment is favored because it can effectively promote student development and instructional improvement, but poor measurement reliability and validity have limited its use in instructional evaluation. This article reports an empirical study of the characteristics of rater reliability in portfolio assessment: four raters assigned level ratings to 152 portfolios on two occasions, and rater reliability was computed with several statistical methods. The results indicate that portfolio ratings show high association, moderate-to-weak agreement, and a degree of stability, with ratings of overall portfolio quality being the most reliable. In this study, with three raters, both the generalizability coefficient and the dependability coefficient for ratings of overall portfolio quality exceeded 0.80.
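The distinction this abstract draws between association and agreement can be made concrete with a small sketch: association asks whether raters order portfolios the same way, while agreement asks whether they assign exactly the same level. The two raters' ratings below are invented for illustration and are not the study's data.

```python
import numpy as np

# Hypothetical 1-5 level ratings by two raters for eight portfolios.
rater_a = np.array([3, 4, 2, 5, 3, 4, 1, 3])
rater_b = np.array([3, 5, 2, 4, 4, 4, 2, 3])

# Association: correlation between rating levels (high even with level shifts).
association = np.corrcoef(rater_a, rater_b)[0, 1]
# Agreement: proportion of identical levels (penalizes any level difference).
agreement = np.mean(rater_a == rater_b)
print(f"association r = {association:.3f}, exact agreement = {agreement:.2f}")
```

This is why ratings can show high association yet only moderate-to-weak agreement, the pattern the study reports.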