1.

Ye Baojuan, Wen Zhonglin, Hu Zhujing 《Journal of Psychological Science》 2013, 36(6): 1464-1469

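Meta-analysis is an important method for drawing relatively accurate and representative conclusions about a topic of interest from existing studies, and it is widely used in psychology, education, management, medicine, and other fields of social-science research. Reliability is an important indicator of test quality, and composite reliability gives a relatively accurate estimate of test reliability. No published method for meta-analyzing composite reliability appears to exist. After comparing the strengths and weaknesses of three models for meta-analyzing parameters, this study derives point and interval estimation methods for meta-analyzing composite reliability under the varying-coefficient model. Using interval coverage rate as the criterion, a simulation study shows that the proposed interval estimation method performs well. An example illustrates how to meta-analyze the composite reliability of a unidimensional test.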
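A minimal sketch of pooling under a varying-coefficient model, in the spirit of Bonett's varying-coefficient approach for reliability coefficients: the pooled estimate is the unweighted mean of the k study estimates, and its variance is the sum of the study-level sampling variances divided by k². Unlike a fixed-effect model, this does not assume all studies share one population reliability, and unlike a random-effects model, it does not assume the studies are a random sample from a superpopulation. The paper's own derivation may differ (for example, it may work on a transformed scale). The function name and input values are hypothetical.

```python
import math
from statistics import NormalDist


def varying_coefficient_meta(estimates, variances, conf=0.95):
    """Pool study-level reliability estimates under a varying-coefficient model.

    Assumed form: the point estimate is the unweighted mean of the k estimates,
    and its sampling variance is sum(var_i) / k**2. Returns (mean, se, (lo, hi)).
    """
    k = len(estimates)
    if k == 0 or k != len(variances):
        raise ValueError("need one sampling variance per estimate")
    mean = sum(estimates) / k
    se = math.sqrt(sum(variances)) / k
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # two-sided normal critical value
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical composite-reliability estimates and sampling variances from 3 studies
mean, se, (lo, hi) = varying_coefficient_meta([0.80, 0.85, 0.90], [0.0004] * 3)
print(mean, se, lo, hi)
```

The interval uses a plain normal approximation. Values near 1 may call for a transformation before the interval is formed.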

2.

Estimating Test Reliability in Longitudinal Studies

Ye Baojuan, Wen Zhonglin, Chen Qishan 《Advances in Psychological Science》 2012, 20(3): 467-474

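The reliability of the instruments used in a longitudinal study is an important indicator of the study's quality. Traditional reliability estimation methods are not suitable for estimating test reliability in longitudinal studies. In recent years, researchers have proposed four coefficients for this purpose: rw and r(Sw), which estimate test reliability at a single time point, and RT and RL, which estimate test reliability for the longitudinal study as a whole. This article reviews the mathematical models, assumptions, strengths, and weaknesses of these four methods. RT and RL can estimate test reliability both at a single time point and for the whole study, and they require fewer assumptions. We therefore recommend using RT and RL together to estimate test reliability in longitudinal studies.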

3.

Rethinking Reliability and Reliability Generalization Studies

Guan Dandan, Zhang Houcan 《Journal of Psychological Science》 2004, 27(2): 445-448

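This article first clarifies the concept of reliability. It points out that reliability is an index of how dependable test scores are, not an invariant property of the test instrument. Because reliability estimates for test scores vary, the article introduces reliability generalization, a method Vacha-Haase proposed in the late 20th century. Reliability generalization is a meta-analytic method for exploring the variability of score reliability estimates and examining the predictors of that variability. Finally, after analyzing the reliability generalization approach, the article argues that rethinking the concept of reliability and conducting reliability generalization studies will offer new insights to psychological testing practitioners.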

4.

The γ Coefficient for Reliability Estimation

Xie Xiaoqing 《Acta Psychologica Sinica》 1998, 31(2): 193-196

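This article proposes the γ coefficient for reliability estimation. Compared with the α coefficient, the most widely used index today, the γ coefficient is less affected by the characteristics of the examinee sample. It also reflects more closely the degree to which the test is affected by sources of error. The γ coefficient is therefore a better index of test reliability.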

5.

A Generalizability Theory Analysis of Criterion-Referenced Tests and the Reliability of Their Cut Scores

Yang Zhiming 《Psychological Exploration》 2003, 23(3): 52-56

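In measurement practice, classical test theory methods are very often misused to estimate the overall reliability and the cut-score decision reliability of criterion-referenced tests. Test practitioners often report only Cronbach's α or a few other classical test theory indices. They do so regardless of whether the measurement design is crossed or nested, and regardless of whether results receive a norm-referenced or criterion-referenced interpretation. Mistaking overall reliability for cut-score reliability is even more widespread. Both practices are highly inappropriate. Using the generalizability theory dependability indices Φ and Φ(λ), this article examines, separately for crossed and nested designs, how to estimate the overall reliability and the cut-score decision reliability of criterion-referenced tests. Illustrative data are used to compare crossed and nested designs in estimating overall reliability and to demonstrate how cut-score decision reliability is estimated.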
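For the crossed persons × items (p × i) design, the two dependability indices can be sketched from ANOVA variance components. The sketch assumes one observation per cell and the usual formulas Φ = σ²_p / (σ²_p + σ²_Δ) and Φ(λ) = (σ²_p + (μ - λ)²) / (σ²_p + (μ - λ)² + σ²_Δ), with absolute error variance σ²_Δ = (σ²_i + σ²_pi,e) / n_i. The nested design the paper also treats is not covered. The plug-in (X̄ - λ)² slightly overstates (μ - λ)². The function names and data are hypothetical.

```python
def variance_components(scores):
    """Estimate G-theory variance components for a crossed p x i design
    (rows = persons, columns = items, one score per cell) from mean squares."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]
    ms_p = n_i * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_i = n_p * sum((m - grand) ** 2 for m in i_means) / (n_i - 1)
    ss_res = sum(
        (scores[p][j] - p_means[p] - i_means[j] + grand) ** 2
        for p in range(n_p)
        for j in range(n_i)
    )
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    var_pi = ms_res                          # person-item interaction + error
    var_p = max((ms_p - ms_res) / n_i, 0.0)  # negative estimates set to 0
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    return var_p, var_i, var_pi, grand


def phi(var_p, var_i, var_pi, n_items):
    """Dependability index Phi for a test of n_items items."""
    abs_err = (var_i + var_pi) / n_items
    return var_p / (var_p + abs_err)


def phi_lambda(var_p, var_i, var_pi, n_items, mean, cut):
    """Phi(lambda) for cut score `cut`, using the plug-in (mean - cut)**2."""
    abs_err = (var_i + var_pi) / n_items
    signal = var_p + (mean - cut) ** 2
    return signal / (signal + abs_err)


data = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]  # hypothetical 4 x 3 scores
vp, vi, vpi, m = variance_components(data)
print(phi(vp, vi, vpi, 3), phi_lambda(vp, vi, vpi, 3, m, cut=2.5))
```

Φ(λ) is never smaller than Φ, and it grows as the cut score moves away from the mean. This is one reason overall reliability and cut-score reliability should not be reported interchangeably.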

6.

Reliability Estimation for Certification Tests and Its Characteristics

Zhao Shiming 《Psychological Exploration》 2006, 26(3): 84-87

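Certification tests are typical criterion-referenced tests and are widely used in China. Yet reports of their psychometric properties rarely mention reliability indices designed for the criterion-referenced model. This article summarizes the system of reliability indices for criterion-referenced tests. It then analyzes which reliability estimates suit certification tests and how those estimates relate to test length, the placement of the cut score, and sample homogeneity.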

7.

Estimating Test Reliability: From Coefficient α to Internal Consistency Reliability

Wen Zhonglin, Ye Baojuan 《Acta Psychologica Sinica》 2011, 43(7): 821-829

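Following the classical definition of test reliability, this article briefly reviews the relationship between reliability and coefficient α and the limitations of α. To recommend reliability estimators that can replace α, it discusses in depth two closely related quantities: homogeneity reliability and internal consistency reliability. Under very general conditions, it proves that neither α nor homogeneity reliability exceeds internal consistency reliability, and that internal consistency reliability does not exceed test reliability. Internal consistency reliability is therefore relatively close to test reliability. The article summarizes a workflow for reliability analysis. The workflow explains when α still has reference value and when α is no longer appropriate, in which case internal consistency reliability (often called composite reliability in the literature) should be used. Programs for computing homogeneity reliability and internal consistency reliability are provided, which applied researchers can use directly.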
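To illustrate the ordering the paper proves (α does not exceed internal consistency reliability), the sketch below builds the covariance matrix implied by a one-factor model with unequal loadings. It then compares Cronbach's α, k/(k - 1) · (1 - Σσ²_i / σ²_X), with composite reliability ω = (Σλ)² / ((Σλ)² + Σθ). It assumes a standardized factor and uncorrelated errors. The loadings are hypothetical, and these are the standard textbook formulas, not the programs supplied with the paper.

```python
def implied_covariance(loadings, uniquenesses):
    """Covariance matrix implied by a one-factor (congeneric) model with the
    factor variance fixed at 1: Sigma = lambda lambda' + diag(theta)."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + (uniquenesses[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def cronbach_alpha(cov):
    """Coefficient alpha computed from an item covariance matrix."""
    k = len(cov)
    total_var = sum(map(sum, cov))            # variance of the sum score
    item_var = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - item_var / total_var)


def composite_reliability(loadings, uniquenesses):
    """Composite reliability (omega) for a unidimensional test."""
    s = sum(loadings)
    return s * s / (s * s + sum(uniquenesses))


# Hypothetical standardized loadings; unequal, so not tau-equivalent
lam = [0.9, 0.7, 0.5, 0.3]
theta = [1 - l * l for l in lam]
print(cronbach_alpha(implied_covariance(lam, theta)),
      composite_reliability(lam, theta))
```

With equal loadings (tau-equivalent items), the two values coincide. The more the loadings differ, the more α understates ω.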

8.

Using the Bootstrap Method to Compute Reliability in Cognitive Diagnostic Assessment

Guo Lei, Zhang Jinming 《Psychological Exploration》 2018, (5): 433-439

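Test reliability is an important indicator of test quality, and it deserves the same attention in cognitive diagnostic assessment. Existing methods for computing reliability in cognitive diagnosis share one assumption: an examinee's posterior probability distributions and marginal probabilities are identical across two administrations of the test. This assumption is too strong because it ignores random error between the two administrations. Based on bootstrap resampling, this study proposes two types of indices for attribute reliability and pattern reliability: a product-moment correlation method and a modified agreement method. A simulation study compares the new and existing methods across different numbers of attributes, correlations among attributes, and numbers of items. Empirical data from the ECPE English proficiency certification exam and from fraction subtraction demonstrate the feasibility of the new methods. Finally, factors that influence reliability estimation are discussed.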

9.

Using the Delta Method to Estimate Confidence Intervals for the Composite Reliability of Multidimensional Tests

Ye Baojuan, Wen Zhonglin 《Journal of Psychological Science》 2012, 35(5): 1213-1217

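A large body of research shows that composite reliability generally estimates test reliability well. Methods for estimating composite reliability and its confidence interval have been studied extensively for unidimensional tests. However, few studies address interval estimation of composite reliability for multidimensional tests. This article uses the Delta method to derive a formula for the standard error of the composite reliability of a multidimensional test and from it computes a confidence interval. An example shows how to write a program that estimates the composite reliability of a multidimensional test and its confidence interval.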
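The Delta method approximates the standard error of a smooth function of parameter estimates as sqrt(gᵀVg). Here g is the gradient of the function and V is the sampling covariance matrix of the estimates. The paper derives this for multidimensional tests. For brevity, the sketch below shows only the unidimensional case, ω = S² / (S² + T) with S = Σλ and T = Σθ. Its partial derivatives are ∂ω/∂λ_i = 2ST / (S² + T)² and ∂ω/∂θ_j = -S² / (S² + T)². In practice V would come from SEM software output. Here V, the function names, and all values are hypothetical.

```python
import math


def omega_gradient(loadings, uniquenesses):
    """Partial derivatives of omega = S^2 / (S^2 + T), S = sum(loadings),
    T = sum(uniquenesses), ordered (lambda_1..lambda_k, theta_1..theta_k)."""
    s, t = sum(loadings), sum(uniquenesses)
    denom = (s * s + t) ** 2
    d_lambda = 2 * s * t / denom   # identical for every loading
    d_theta = -s * s / denom       # identical for every error variance
    return [d_lambda] * len(loadings) + [d_theta] * len(uniquenesses)


def delta_method_ci(loadings, uniquenesses, param_cov, z=1.96):
    """Delta-method SE and Wald-type CI for omega. param_cov is the sampling
    covariance matrix of the estimates, in the same order as the gradient."""
    s, t = sum(loadings), sum(uniquenesses)
    omega = s * s / (s * s + t)
    g = omega_gradient(loadings, uniquenesses)
    n = len(g)
    var = sum(g[i] * param_cov[i][j] * g[j] for i in range(n) for j in range(n))
    se = math.sqrt(var)
    return omega, se, (omega - z * se, omega + z * se)


# Hypothetical estimates and a hypothetical diagonal covariance matrix
lam = [0.9, 0.7, 0.5, 0.3]
theta = [0.19, 0.51, 0.75, 0.91]
V = [[0.001 if i == j else 0.0 for j in range(8)] for i in range(8)]
print(delta_method_ci(lam, theta, V))
```

For a multidimensional test, the same recipe applies to whatever closed form of composite reliability is used. Only the function and its gradient change.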

10.

A Comparative Study of Multiple Indices of Homogeneity Reliability

Gu Haigen, Li Chao 《Journal of Psychological Science》 2005, 28(5): 1196-1198

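This study used an experimental design with three independent variables: the number of examinees, the number of items, and whether variances were equal. It examined how stable each index of homogeneity reliability (the dependent variables) remained as the independent variables changed. The results were then verified further with generalizability theory. Both analyses consistently showed that the α coefficient is sensitive to whether variances are equal, so it is not an ideal index of homogeneity reliability. The β, γ, and ζ coefficients estimate reliability with roughly the same precision and are clearly more robust than the α coefficient, but less robust than the ρ coefficient. The ρ coefficient is stable under all conditions and reflects homogeneity reliability relatively accurately.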

11.

Reliability and Validity of the Chinese Version of the General Self-Efficacy Scale

Hu Xiangling, Tian Chunfeng, Sun Fangjin 《Psychological Exploration》 2014, (1): 53-56

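Using 610 high school students in Shandong Province as participants, this study examined the reliability and validity of the Chinese version of the General Self-Efficacy Scale (GSES). The results showed that (1) some items of the Chinese GSES have low discrimination; (2) the Chinese GSES has high internal consistency and split-half reliability but low test-retest reliability; (3) the unidimensionality of the Chinese GSES was not confirmed; and (4) the Chinese GSES does not show good predictive validity.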

12.

Reliability coefficients for multiple group item response theory models

Björn Andersson, Hao Luo, Kseniia Marcq 《The British Journal of Mathematical and Statistical Psychology》 2022, 75(2): 395-410

Reliability of scores from psychological or educational assessments provides important information regarding the precision of measurement. The reliability of scores is however population dependent and may vary across groups. In item response theory, this population dependence can be attributed to differential item functioning or to differences in the latent distributions between groups and needs to be accounted for when estimating the reliability of scores for different groups. Here, we introduce group-specific and overall reliability coefficients for sum scores and maximum likelihood ability estimates defined by a multiple group item response theory model. We derive confidence intervals using asymptotic theory and evaluate the empirical properties of estimators and the confidence intervals in a simulation study. The results show that the estimators are largely unbiased and that the confidence intervals are accurate with moderately large sample sizes. We exemplify the approach with the Montreal Cognitive Assessment (MoCA) in two groups defined by education level and give recommendations for applied work.

13.

How to find what's in a name: Scrutinizing the optimality of five scoring algorithms for the name‐letter task

Etienne P. LeBel, Bertram Gawronski 《European Journal of Personality》 2009, 23(2): 85-106

Although the name‐letter task (NLT) has become an increasingly popular technique to measure implicit self‐esteem (ISE), researchers have relied on different algorithms to compute NLT scores and the psychometric properties of these differently computed scores have never been thoroughly investigated. Based on 18 independent samples, including 2690 participants, the current research examined the optimality of five scoring algorithms based on the following criteria: reliability; variability in reliability estimates across samples; types of systematic error variance controlled for; systematic production of outliers and shape of the distribution of scores. Overall, an ipsatized version of the original algorithm exhibited the most optimal psychometric properties, which is recommended for future research using the NLT.

14.

The Implicit Association Test: Reliability, Validity, and Underlying Mechanisms

Hou Ke, Zou Hong, Zhang Qiuling 《Advances in Psychological Science》 2004, 12(2): 223-230

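The Implicit Association Test (IAT) is an indirect measure of the strength of automatic associations between two concepts, and in recent years it has been widely used in implicit social cognition research. Considerable evidence shows that the IAT can provide new information beyond explicit tests. However, its reliability and validity still need improvement, and scholars still explain the mechanism behind the test in different ways. Researchers should therefore be cautious when applying the IAT and interpreting its results. The article also briefly introduces new methods for processing IAT data and several variants of the IAT.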

15.

Generalizability Theory and Many-Facet Rasch Analysis of Essay Scoring in the Graduate Entrance Examination

Guan Dandan 《Psychological Exploration》 2014, 34(5): 437-440

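To evaluate and improve essay scoring in the general ability test of the master's degree entrance examination, the researcher applied generalizability theory and many-facet Rasch analysis to writing samples from 113 examinees. The analyses examined sources of scoring error, scoring reliability, and related questions. The generalizability analysis showed that raters and prompts had little effect on scoring accuracy. With the two-prompt exam design, two raters are enough to keep scoring reliability above 0.75. The many-facet Rasch analysis showed that estimates of rater severity and their standard errors fell within acceptable ranges. Raters did not differ significantly in severity, and each rater was generally consistent over the course of scoring. However, a few raters showed specific biases toward particular examinees on particular prompts. Generalizability theory and many-facet Rasch analysis enrich the quantitative indicators available for essay-scoring research. They also confirm that essay scoring in the general ability test of the master's entrance examination is highly reliable.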
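A decision (D) study of the kind behind the "two raters suffice" conclusion can be sketched as follows. The sketch assumes a fully crossed persons × raters × prompts design and the relative G coefficient Eρ² = σ²_p / (σ²_p + σ²_pr/n_r + σ²_pt/n_t + σ²_prt,e/(n_r·n_t)). The paper does not report its variance components here, so the values and function names below are hypothetical, and the paper may use a different design or coefficient.

```python
def g_coefficient(var_p, var_pr, var_pt, var_prt, n_raters, n_tasks):
    """Relative G coefficient for a fully crossed persons x raters x tasks design."""
    rel_err = (var_pr / n_raters
               + var_pt / n_tasks
               + var_prt / (n_raters * n_tasks))
    return var_p / (var_p + rel_err)


def min_raters(var_p, var_pr, var_pt, var_prt, n_tasks, target, max_raters=20):
    """Smallest number of raters whose G coefficient reaches target, or None."""
    for n_r in range(1, max_raters + 1):
        if g_coefficient(var_p, var_pr, var_pt, var_prt, n_r, n_tasks) >= target:
            return n_r
    return None


# Hypothetical variance components, two prompts, target reliability 0.75
print(min_raters(1.0, 0.05, 0.2, 0.5, n_tasks=2, target=0.75))
```

The σ²_pt/n_t term does not shrink as raters are added. When person-by-prompt variance is large, adding prompts does more than adding raters.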

16.

A graphical judgmental aid which summarizes obtained and chance reliability data and helps assess the believability of experimental effects

Birkimer JC, Brown JH 《Journal of Applied Behavior Analysis》 1979, 12(4): 523-533

Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
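The three reliabilities compared above can be sketched with their standard definitions:

- interval-by-interval agreement = all agreements / all intervals;
- scored-interval agreement = occurrence agreements / (occurrence agreements + disagreements);
- unscored-interval agreement = nonoccurrence agreements / (nonoccurrence agreements + disagreements).

The records below are hypothetical. They show the inflation the abstract describes: at a low behavior rate, one disagreement in ten intervals leaves overall agreement at 0.9, while scored-interval agreement falls to 0.5. The chance values and the graphical bandwidth are not implemented.

```python
def interval_agreement(obs_a, obs_b):
    """Agreement indices for two observers' interval records
    (True = target behavior scored in that interval)."""
    if not obs_a or len(obs_a) != len(obs_b):
        raise ValueError("records must be non-empty and of equal length")
    pairs = list(zip(obs_a, obs_b))
    occ = sum(1 for a, b in pairs if a and b)             # both scored
    non = sum(1 for a, b in pairs if not a and not b)     # both unscored
    dis = len(pairs) - occ - non                          # observers disagree

    def ratio(agree):
        return agree / (agree + dis) if agree + dis else float("nan")

    return {
        "interval_by_interval": (occ + non) / len(pairs),
        "scored_interval": ratio(occ),
        "unscored_interval": ratio(non),
        "disagreement_rate": dis / len(pairs),
    }


# Hypothetical records for ten intervals with a low behavior rate
observer_1 = [True] + [False] * 8 + [True]
observer_2 = [True] + [False] * 9
print(interval_agreement(observer_1, observer_2))
```

Reporting the disagreement rate alongside the other three indices reflects the abstract's recommendation to examine disagreements first.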

17.

Ill-structured measurement designs in organizational research: implications for estimating interrater reliability

Putka DJ, Le H, McCloy RA, Diaz T 《Journal of Applied Psychology》 2008, 93(5): 959-981

Organizational research and practice involving ratings are rife with what the authors term ill-structured measurement designs (ISMDs)--designs in which raters and ratees are neither fully crossed nor nested. This article explores the implications of ISMDs for estimating interrater reliability. The authors first provide a mock example that illustrates potential problems that ISMDs create for common reliability estimators (e.g., Pearson correlations, intraclass correlations). Next, the authors propose an alternative reliability estimator--G(q,k)--that resolves problems with traditional estimators and is equally appropriate for crossed, nested, and ill-structured designs. By using Monte Carlo simulation, the authors evaluate the accuracy of traditional reliability estimators compared with that of G(q,k) for ratings arising from ISMDs. Regardless of condition, G(q,k) yielded estimates as precise or more precise than those of traditional estimators. The advantage of G(q,k) over the traditional estimators became more pronounced with increases in the (a) overlap between the sets of raters that rated each ratee and (b) ratio of rater main effect variance to true score variance. Discussion focuses on implications of this work for organizational research and practice.

18.

The Role of Popular Music in the Construction of Alternative Spiritual Identities and Ideologies

Gordon Lynch 《Journal for the Scientific Study of Religion》 2006, 45(4): 481-488

Setting its discussion in the wider context of the decline of institutional religion among young adults, the rise of alternative spiritualities, and the mediatization of religion, the article explores the significance of popular music in the development of alternative spiritual identities and ideologies. A summary is given of leading research conducted in this field by Christopher Partridge and Graham St. John. It is argued that while they demonstrate the encoding of alternative spiritual symbols and ideologies into certain forms of popular music, they fail to give an adequate account of how audiences actively make use of this music to construct alternative spiritual identities or frameworks of meaning. The article concludes that researchers in the field of religion and popular music need to draw more on theories and methods developed in ethno-musicology and the sociology of music, and suggests that the work of Tia De Nora on music in everyday life raises important questions about the qualities and context of the act of listening to music that could generate more nuanced accounts of how popular music shapes alternative spiritual identities and ideologies.

19.

Need to train your rat? There is an App for that: A touchscreen behavioral evaluation system

Joshua E. Wolf, Catherine M. Urbano, Chad M. Ruprecht, Kenneth J. Leising 《Behavior Research Methods》 2014, 46(1): 206-214

The increasing demand for highly automated and flexible tasks capable of assessing visual learning and memory in nonhuman animals has led to the exciting development of a wide array of prefabricated touchscreen-equipped systems. However, the high cost of these prefabricated systems has led many researchers to develop or modify their own preexisting equipment. We developed a freely downloadable App, the Touchscreen Behavioral Evaluation System (TBES) for use in conjunction with an iPad (Apple, Cupertino, California) as an alternative to prefabricated touchscreen systems. TBES allows for stimulus presentation and data collection on an iPad. The touchscreen technology offered by the iPad is attractive to researchers due to its affordability, reliability, and resistance to false inputs. We highlight these, as well as the feasibility and procedural flexibility of TBES, in an effort to promote our system as a competitive alternative to those currently available.