Similar literature
 20 similar articles found (search time: 46 ms)
1.
A new method for determining the minimum number of observations per subject needed to achieve a specific generalizability coefficient is presented. This method, which consists of a branch-and-bound algorithm, allows for the employment of constraints specified by the investigator.

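2.
A generalizability theory study of the "Adolescent Students' Life Satisfaction Scale" (青少年学生生活满意度量表)   Cited 2 times (self-citations: 0, citations by others: 2)
He Liguo (何立国), Zhou Aibao (周爱保). 《心理科学》 (Psychological Science), 2006, 29(5): 1199-1202, 1218
Generalizability theory is a measurement theory that analyzes measurement error with statistical adjustment techniques. It takes a macro-level view, examining the relationship between the conditions under which a measurement is actually taken and the range over which its conclusions can be generalized, in order to address the external validity of measurement. This paper applies generalizability theory to the Adolescent Students' Life Satisfaction Scale (CASLSS) and reports the following results: (1) as to the number of dimensions of life satisfaction, 6 to 8 dimensions are appropriate for Chinese adolescent students; when CASLSS uses 2 dimensions, it is suitable only for norm-referenced interpretation, not for criterion-referenced interpretation; (2) the reliability of the CASLSS subscales and of the total scale is high, and the scale supports both norm-referenced and criterion-referenced interpretation; (3) the environment-satisfaction factor of CASLSS has somewhat weaker scale properties than the other five factors and is the main direction for future improvement of the scale. Both the individual factors and the total scale of CASLSS show excellent scale properties, and the scale merits wider use in practical work and research.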

3.
Previous research on the effect of class size on student ratings of instruction has primarily investigated the effect of class size on the favorableness of these ratings rather than its effect on their reliability (dependability). A few studies have used "generalizability theory" to demonstrate the relative effect of class size on the dependability of student ratings of instruction. The purpose of the present study was to test the validity of the findings of these studies in a different cultural setting using a different student ratings questionnaire. Using a random-effect analysis of variance to estimate the variance components for a design in which students were nested within classes and crossed with items, it was found that the variance component for class size was appreciably larger than that for items. At least 20 students were needed to obtain a generalizability coefficient for relative decisions of .70 or more. Increasing the number of students has a greater effect on generalizability coefficients than increasing the number of items.
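
The decision-study reasoning in this abstract (how many students are needed to reach a target coefficient) can be sketched as follows. This is a minimal illustration and not the study's own computation: it assumes classes are the object of measurement in a students-within-classes by items design, and the function names and all variance-component values are hypothetical.

```python
def g_coefficient(var_c, var_ci, var_s_c, var_si_c, n_s, n_i):
    """Generalizability coefficient for relative decisions about class means.

    Assumed design: students nested within classes, crossed with items.
    var_c    : variance component for classes (universe-score variance)
    var_ci   : class-by-item interaction
    var_s_c  : students within classes
    var_si_c : student-by-item within classes, confounded with residual error
    n_s, n_i : number of students per class and number of items
    """
    relative_error = var_ci / n_i + var_s_c / n_s + var_si_c / (n_s * n_i)
    return var_c / (var_c + relative_error)


def min_students(target, var_c, var_ci, var_s_c, var_si_c, n_i, max_n=10_000):
    """Smallest number of students per class that reaches the target coefficient."""
    for n_s in range(1, max_n + 1):
        if g_coefficient(var_c, var_ci, var_s_c, var_si_c, n_s, n_i) >= target:
            return n_s
    return None  # target not reachable within max_n students


# Hypothetical variance components, chosen only to exercise the functions.
needed = min_students(0.70, var_c=1.0, var_ci=0.0, var_s_c=1.0, var_si_c=0.0, n_i=1)
```

Because the number of students divides two of the three error terms while the number of items divides only the interaction terms, adding students can raise the coefficient faster than adding items when the student-within-class component dominates.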

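4.
Improving the college entrance examination comprehensive ability test from the perspective of multivariate generalizability theory   Cited 10 times (self-citations: 0, citations by others: 10)
Yang Zhiming (杨志明), Zhang Lei (张雷), Ma Shiye (马世晔). 《心理学报》 (Acta Psychologica Sinica), 2004, 36(2): 195-200
A multivariate generalizability theory study found that the overall reliability of the college entrance examination (gaokao) comprehensive ability test (2001, Guangdong) reached an acceptable level (0.784). However, the contributions of the test's sections to total variance departed considerably from the intended score weights: geography and politics contributed too little, while chemistry and history contributed too much. This indicates that examinees with lopsided strengths in history and chemistry obtained higher composite scores. A decision (D) study further found that increasing the number of geography items would, anomalously, lower the overall reliability of the test, which suggests that many high-scoring examinees answered the geography items incorrectly or deliberately skipped them. How to control the actual contribution of each section effectively and avoid negative washback is therefore a pressing problem for the comprehensive ability test.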

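5.
A multivariate generalizability theory analysis of a Putonghua test   Cited 5 times (self-citations: 0, citations by others: 5)
Yang Zhiming (杨志明), Zhang Lei (张雷). 《心理学报》 (Acta Psychologica Sinica), 2002, 34(1): 51-56
Multivariate generalizability theory (MGT) was used to study the Putonghua test developed by the State Language Commission. In the G study, data from Hong Kong examinees taking the Putonghua test were used to estimate the variance and covariance components of the sources of score variation. In the D study, the universe scores and generalizability coefficients of the test's three parts were estimated first, followed by the composite universe score and its generalizability coefficient, signal-to-noise ratio, and related indices. The results show that the reliability of the test is fairly high overall and that combining the universe scores of the three parts into a composite is reasonable, although in detail the reliability of Part 3 is low. In addition, with 3 raters and 28 items, the reliability of Parts 1 and 2 is already fairly high, so reducing the number of items in these two parts in operational testing would not cause much of a problem.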

6.
Solutions for the problem of maximizing the generalizability coefficient under a budget constraint are presented. It is shown that the Cauchy-Schwarz inequality can be applied to derive optimal continuous solutions for the number of conditions of each facet. The author thanks Sjoerd Baas and Agnes Broeren for their many helpful remarks.
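
To illustrate how the Cauchy-Schwarz inequality can yield a continuous optimum, here is a worked sketch for a deliberately simplified case. The setting is an assumption and is not taken from the paper: two facets contribute additive error terms with variance components $\sigma_1^2, \sigma_2^2$, each condition of facet $k$ costs $c_k$, and the budget is $C$. Since universe-score variance is fixed, maximizing the coefficient amounts to minimizing the error variance.

```latex
% Assumed simplified setting: additive error variance, linear cost.
\[
\sigma^2_{\delta}(n_1,n_2)=\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2},
\qquad c_1 n_1 + c_2 n_2 \le C .
\]
% Cauchy-Schwarz applied to the vectors
% (\sigma_k/\sqrt{n_k}) and (\sqrt{c_k n_k}), k = 1, 2:
\[
\left(\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}\right)
\left(c_1 n_1 + c_2 n_2\right)
\;\ge\;
\left(\sigma_1\sqrt{c_1}+\sigma_2\sqrt{c_2}\right)^2 ,
\]
% with equality if and only if n_k is proportional to \sigma_k/\sqrt{c_k}.
% Spending the whole budget therefore gives
\[
n_k^{*}=\frac{C\,\sigma_k}{\sqrt{c_k}\,\bigl(\sigma_1\sqrt{c_1}+\sigma_2\sqrt{c_2}\bigr)},
\quad k=1,2,
\qquad
\sigma^2_{\delta,\min}=\frac{\bigl(\sigma_1\sqrt{c_1}+\sigma_2\sqrt{c_2}\bigr)^2}{C}.
\]
```

The $n_k^{*}$ are real-valued, which matches the abstract's wording "optimal continuous solutions"; in practice they would still have to be rounded to integers.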

7.
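Li Guangming (黎光明), Qin Yue (秦越). 《心理学报》 (Acta Psychologica Sinica), 2022, 54(10): 1262-1276
Generalizability theory is widely used in psychological and educational measurement. How to make a measurement procedure as reliable as possible under a budget constraint is an important question for researchers, and it can be recast as a problem of estimating the optimal sample size. This paper proposes a new method, based on evolutionary algorithms, for estimating the optimal sample size under generalizability theory: the constrained evolutionary algorithm. A simulation study compares it with three traditional methods, namely the differential optimization method, the Lagrange method, and the Cauchy inequality method. The results show that the constrained evolutionary algorithm performed better in the two-facet crossed design, the two-facet nested design, and the three-facet crossed design; the authors recommend that researchers give it priority in future studies.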

8.
An important relationship is given for two generalizations of coefficient alpha, Rajaratnam, Cronbach and Gleser's generalizability formula for stratified-parallel tests and Raju's coefficient beta. The author gratefully acknowledges the generous assistance given by reviewers and the editor.

9.
Three pigeons were exposed to a fixed-time response independent food-delivery schedule and a live target pigeon restrained in a holder mounted on a spring and microswitch assembly. This common method of recording aggression was compared with a photocell system, and both were evaluated by observation of video-tape recordings. Dependent variables included the number of interfood intervals with an attack, attacks per minute, and attack duration. The photocell proved more reliable than the microswitch and correlated highly with observations of both the number of interfood intervals with an attack for three subjects and attack duration for two. Neither apparatus provided accurate information about the rate of attacks. The microswitch apparatus was not sensitive to changes in the subject’s attack topography, while both recording devices were susceptible to activation by responses in the attacking pigeon other than discrete pecks or physical blows. In view of these findings, attacks per minute may not be an appropriate measure of aggression in studies using such devices.

10.
Some developments in multivariate generalizability   Cited 2 times (self-citations: 0, citations by others: 2)
This article is concerned with estimation of components of maximum generalizability in multifacet experimental designs involving multiple dependent measures. Within a Type II multivariate analysis of variance framework, components of maximum generalizability are defined as those composites of the dependent measures that maximize universe score variance for persons relative to observed score variance. The coefficient of maximum generalizability, expressed as a function of variance component matrices, is shown to equal the squared canonical correlation between true and observed scores. Emphasis is placed on estimation of variance component matrices, on the distinction between generalizability- and decision-studies, and on extension to multifacet designs involving crossed and nested facets. An example of a two-facet partially nested design is provided. Appreciation is expressed to the Office of Research in Medical Education, University of Texas Medical Branch, for permitting use of their data.

11.
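Yang Zhiming (杨志明), Zhang Lei (张雷). 《心理科学》 (Psychological Science), 2003, 26(2): 305-307
For the two-factor and three-factor models of the WISC-CR, this study used multivariate generalizability theory to examine the total reliability and the reliability of each sub-factor in a sample of 201 primary school pupils aged 6 to 7. The findings were: (1) under the two-factor model, the reliabilities of the sub-factors and of the total test are fairly high. When the test is used as a scale measuring three factors (verbal comprehension, perceptual organization, and freedom from distractibility), however, the reliability of the third factor is too low and is not easily improved, which indicates that the test should not be used to measure freedom from distractibility; (2) the scale should not be used as a criterion-referenced test, because its reliability cannot be assured. The article also demonstrates an MGT method for evaluating the total reliability of a test battery and the reliability of its sub-ability factors, which is of value for the development of psychometrics.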

12.
Monte Carlo techniques were used to evaluate the performance of an on-line paired-comparisons data collection procedure that makes use of a common computer sorting algorithm. The results revealed that the sorting method can reduce the number of trials per subject substantially even when a considerable amount of random error is present. While a complete paired-comparisons design requires N(N-1)/2 trials (where N is the number of objects), the sorting procedure requires a theoretical minimum of N(log2 N) trials. The savings in the number of trials consequently increase with N. Furthermore, the negative effect of random error on the final ordering of the data from the sorting method is small and decreases with the number of stimuli. The data from a small empirical study reinforce the Monte Carlo observations. It is recommended that the sorting method be used in place of the complete paired-comparisons procedure whenever a substantial number of stimuli are included in the design.
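
The trial savings described above can be sketched by counting the comparisons a sort actually makes. The abstract does not name the sorting algorithm, so merge sort is used here purely as an assumption; `prefer`, `merge_sort_trials`, and `complete_design_trials` are hypothetical names, and the judge in the usage lines is error-free, unlike the noisy judgments studied in the paper.

```python
def complete_design_trials(n):
    """Trials in a complete paired-comparisons design: every pair once."""
    return n * (n - 1) // 2


def merge_sort_trials(items, prefer):
    """Order items with merge sort, treating each call to prefer(a, b) as one trial.

    prefer(a, b) returns True when a should be ranked ahead of b.
    Returns (ordered_items, number_of_trials).
    """
    trials = 0

    def sort(seq):
        nonlocal trials
        if len(seq) <= 1:
            return list(seq)
        mid = len(seq) // 2
        left, right = sort(seq[:mid]), sort(seq[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            trials += 1  # one paired comparison presented to the subject
            if prefer(left[i], right[j]):
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(items), trials


# Error-free judge on 8 stimuli; a complete design would need 28 trials.
stimuli = [5, 2, 7, 1, 8, 3, 6, 4]
ordered, used = merge_sort_trials(stimuli, lambda a, b: a < b)
```

For 8 stimuli, merge sort never needs more than 17 comparisons, against 28 for the complete design, and the gap widens as N grows.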

13.
14.
The advent of telemetric devices that sample data extensively over time has facilitated single subject or idiographic research to intensively study a single person over time. One of the challenges of idiographic research is combining single subject results to determine generalizability across subjects. This article demonstrates the first behavioral science application of pooled time series analysis, an extension of time series analysis that allows for the testing of between-person effects. The analysis used cardiovascular data gathered from 4 children with autism between the ages of 10 and 20 while exposed to 6 experimentally manipulated environmental stressors. A pooled time series analysis using the general transformation approach identified 1 general (a difficult learning task) and 3 specific stressors (exposure to a loud noise, unstructured time, and eating a preferred food) across the 4 participants. This application of pooled time series analysis demonstrates the challenges and potential for this method to address the issue of generalizability when using an idiographic research approach in the behavioral sciences.

15.
Experiments in which recognition performance is measured sometimes involve only a small number of observations per subject, rendering d' analysis unreliable (Schooler & Shiffrin, 2005). Here, we introduce, in signal detection models, subject-specific random variables to account for heterogeneous hit and false alarm rates among individuals. Population d' effects for comparing groups are estimated, in this approach, by pooling information from a sample of subjects across experimental conditions. The method is validated by a simulation study and is illustrated with an analysis of the effect of neutral and emotional words on recognition performance, employing the emotional Stroop task (Lee & Shih, 2007).

16.
The generalizability of behaviors across observational conditions is a critical issue in behavioral assessment. Generalizability theory was used to examine two aspects of audio recorded parent-child interactions recorded over 6 days of home measurement and 1 day of laboratory measurement in a behavioral treatment program for childhood obesity. Families audiotaped parent-child home meetings during which they reviewed self-monitored diet and exercise records that were coded for the following types of interactions: praise statements, negative statements, prompts for new behaviors, and statements promoting problem solving. A similar meeting was audiotaped in our laboratory. The first question explored was the number of measurements needed to generalize to the universe of the six home measures. Results showed an increase in generalizability over measurements for each behavioral category. Using generalizability coefficients of .60 or more, praise, negative comments and prompts, respectively, could be reliably observed based on 1, 4, or 4 days of measurement. Second, the effects of setting (laboratory versus home) were assessed for 1 day of measurement in each environment. Again using generalizability coefficients of .60, generalizability analysis showed that the lab setting could not be generalized to the home setting based on 1 day of measurement, with generalizability coefficients ranging from .27 for negative comments to .57 for praise. Results suggest that 4 days of behavioral assessment in the home can be used to establish generalizable data for all the dependent measures studied. However, generalizability coefficients suggested that 1 day of laboratory measurement was not adequate to generalize to typical home behavior. This research was supported in part by Grant NIH HD 23713 awarded to the third author.

17.
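Missing data are common in psychological surveys and experiments, and they create a series of problems for generalizability theory when variance components are estimated from unbalanced data. Working within the generalizability theory framework, the authors used Matlab 7.0 with self-written programs to simulate missing data for the random two-facet crossed design p×i×r, and compared the formula method, the REML method, the splitting method, and the MCMC method in terms of how well each estimates the individual variance components. The results show that: (1) for estimating variance components from missing data in the random two-facet crossed design p×i×r, the MCMC method has a clear advantage over the other three methods; (2) items and raters are important factors affecting the estimation of variance components from missing data.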

18.
Studies of the relationship between human resource (HR) practices and firm performance typically use a single respondent to assess firm level HR practices or HR effectiveness. However, previous research in other substantive areas suggests that rater differences are a potentially important source of measurement error. We demonstrate analytically the potential consequences of both random and systematic measurement error in research on HR and firm performance. However, our main focus is on random error and we show how generalizability theory can be applied to obtain better estimates of reliability by simultaneously recognizing multiple sources (e.g., items, raters) of random measurement error. These more inclusive reliability estimates, in turn, offer the possibility of more precisely quantifying substantive relationships in the HR and firm performance literature. In our sample, reliabilities (as estimated by generalizability coefficients) for single-rater assessments of HR variables were generally below .50. This degree of measurement error, if present in substantive studies on HR and firm performance, could lead to considerable bias, given that an unstandardized regression coefficient is corrected for measurement error in the independent variable by dividing by its reliability coefficient (not its square root). We also found only limited convergent validity between HR and line managers' ratings of a second type of HR measure, HR effectiveness. In general, our findings suggest that future researchers need to devote greater attention to measurement error and construct validity issues. Our study provides an example of how generalizability theory can be useful in this pursuit.
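
The correction mentioned in this abstract (dividing an unstandardized slope by the reliability of the independent variable, rather than by its square root) can be shown with a very small sketch. The function name and the numbers are hypothetical and serve only to show the size of the adjustment at a reliability of .50.

```python
def correct_slope_for_unreliability(observed_slope, reliability_x):
    """Disattenuate an unstandardized regression slope.

    Assumes classical random measurement error in the independent variable
    only; reliability_x is the reliability (e.g. a generalizability
    coefficient) of that variable and must lie in (0, 1].
    """
    if not 0.0 < reliability_x <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_slope / reliability_x


# Hypothetical values: an observed slope of 0.2 with reliability .50
corrected = correct_slope_for_unreliability(0.2, 0.5)
```

At a reliability of .50 the corrected slope is twice the observed one, which is why single-rater reliabilities below .50 imply considerable bias.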

19.
A rank test R_n has been proposed by Cronholm and Revusky which in effect increases the number of observations per subject by the device of randomly assigning a subject to a treated group in each of a series of sub-experiments. In this note further results on the asymptotic normality of the R_n test are discussed both under the null hypothesis H_0 of randomness and under a class of alternatives to H_0. The limiting power of the test is also discussed.

20.
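Views of reliability in three psychometric theories   Cited 5 times (self-citations: 0, citations by others: 5)
Three major theoretical schools currently exist in the field of psychological measurement. This paper briefly introduces the three theories (classical test theory, generalizability theory, and item response theory) and focuses on their views of reliability. It discusses the theoretical foundations and research methods of the three views, compares their similarities and differences, and points out some shortcomings of classical test theory together with the improvements made by generalizability theory and item response theory. Generalizability theory extends classical test theory by replacing its reliability coefficient with a multidimensional reliability index (the generalizability coefficient). Item response theory starts from the perspective of information and uses indices such as the item information function and the test information function to reflect the measurement dependability of items and tests in a more specific and thorough way.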
