1.

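Jiao Can, Zhang Minqiang, Zhang Jieting, Wu Li, Zhang Wenyi 《心理科学 (Psychological Science)》2011,34(6):1488-1495

A reliability generalization analysis was conducted on EPQ-related research reports and papers published in seven major Chinese psychology journals between 1998 and 2008, and the results were compared with the reliability generalization analysis of the EPQ in other countries by Caruso et al. The results show that users of psychological scales in China and abroad alike engage in serious "reliability induction"; the standard deviation of subscale scores is the most important predictor of the reliability coefficient, although the other predictors differ. The implications are that the reliability coefficient of the current sample must always be reported when a psychological scale is used, and that adding items without regard to the characteristics the scale requires does not necessarily improve the reliability of the test results.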

2.

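A multivariate generalizability theory analysis of the reliability of structured interviews for leading cadres  Total citations: 1; self-citations: 0; citations by others: 1

Hong Ziqiang, Tu Dongbo 《心理学探新 (Psychological Exploration)》2006,26(1):85-90,95

This study applied multivariate generalizability theory to analyze the measurement reliability of structured-interview assessment data from the qualification examination for deputy-division-level cadres in a district of Beijing, providing empirical evidence for making cadre examination and assessment more scientific. The main conclusions are: (1) the structured interview was of moderate difficulty and discriminated well; (2) the reliability-like coefficients of each assessment dimension and of the composite score were all high, and the measurement reliability of the composite score exceeded that of any single dimension; (3) the reliability-like coefficients of each dimension and of the composite score increased with the number of raters, and, balancing reliability against cost, 5 to 9 raters is appropriate; (4) the correlations among the assessment dimensions were high, which supports the current practice of combining dimension scores in selection interviews and indicates that computing a total from the composite score is reasonable.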
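The finding that reliability rises with the number of raters, with diminishing returns, can be illustrated with the standard D-study formula from generalizability theory. The sketch below is a univariate simplification for a persons-by-raters design, not the multivariate analysis the study reports; the function name `g_coefficient` and both variance components are hypothetical values chosen only for illustration.

```python
def g_coefficient(var_person: float, var_residual: float, n_raters: int) -> float:
    """Relative generalizability coefficient for a persons x raters design.

    E(rho^2) = var_person / (var_person + var_residual / n_raters)
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    return var_person / (var_person + var_residual / n_raters)


# Hypothetical variance components, not taken from the study.
VAR_PERSON = 0.50    # true differences between candidates
VAR_RESIDUAL = 0.80  # person-by-rater interaction confounded with error

for n in (1, 3, 5, 7, 9):
    # Each added rater shrinks the error term, so the coefficient rises.
    print(n, round(g_coefficient(VAR_PERSON, VAR_RESIDUAL, n), 3))
```

Because the error term is divided by the number of raters, the gain from each additional rater shrinks, which is the usual argument for settling on a moderate panel size.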

3.

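The reliability coefficient and the correlation ratio between observed scores and the latent trait

Chen Xizhen 《心理学报 (Acta Psychologica Sinica)》1993,26(4):61-65

In the classical true score model, the reliability coefficient is $R = D(T)/D(X)$. The reliability coefficient is usually thought to have little connection with item response theory. In fact, it equals exactly the nonlinear correlation ratio between examinees' observed scores and latent trait scores, $\eta_{x\theta}^{2} = 1 - MD(x\mid\theta)/D(X)$. From this we derive a new way of estimating the reliability coefficient, and we discuss the relationship between the correlation coefficient of $x$ and $\theta$ and the correlation ratio $\eta_{x\theta}^{2}$.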
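The identity between the reliability coefficient and the correlation ratio can be checked numerically. The simulation below is a minimal sketch under stated assumptions: MD(x|θ) is read as the mean of the conditional variances of x given θ, the latent trait takes five discrete values, and the nonlinear link `true_score` and the error standard deviation are hypothetical choices, not taken from the paper.

```python
import math
import random
from collections import defaultdict
from statistics import pvariance

random.seed(0)


def true_score(theta: int) -> float:
    # Hypothetical nonlinear link from latent trait to expected score.
    return 10.0 / (1.0 + math.exp(-theta))


thetas = [random.choice([-2, -1, 0, 1, 2]) for _ in range(50_000)]
true_scores = [true_score(t) for t in thetas]
observed = [t + random.gauss(0.0, 1.5) for t in true_scores]

# Classical definition: R = D(T) / D(X).
reliability = pvariance(true_scores) / pvariance(observed)

# Correlation ratio: 1 - (mean conditional variance of x given theta) / D(X).
groups = defaultdict(list)
for theta, x in zip(thetas, observed):
    groups[theta].append(x)
mean_conditional_var = sum(len(g) * pvariance(g) for g in groups.values()) / len(observed)
eta_squared = 1.0 - mean_conditional_var / pvariance(observed)

print(round(reliability, 3), round(eta_squared, 3))
```

The two printed values should agree up to sampling error, because the true score is the conditional expectation of the observed score given θ.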

4.

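A study of the current state of self-confidence among college students  Total citations: 5; self-citations: 0; citations by others: 5

Huang Zejuan, Xu Dongqing, Li Dongping, Chen Yuebiao, Li Juncheng 《社会心理科学 (Science of Social Psychology)》2005,20(5):84-89

A self-developed College Student Self-Confidence Scale was used to survey 396 college students in Guangzhou. The results show that the scale has good reliability and validity and comprises four dimensions: social interaction, overall affirmation, ability, and personal development. Male and female students differed in self-confidence about personal development, with men scoring significantly higher than women; only children and non-only children differed significantly in overall affirmation, with non-only children scoring significantly higher; student leaders scored markedly higher than other students on every dimension and on the total score; and the better a student's academic performance, the higher the scores on overall affirmation, ability, personal development, and total self-confidence.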

5.

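Regression toward the mean and the recurrence probability of exceptional scores in psychological tests  Total citations: 2; self-citations: 0; citations by others: 2

Wen Zhonglin, Hau Kit-Tai 《心理学报 (Acta Psychologica Sinica)》2003,35(3):419-425

The paper studies regression toward the mean in psychological tests and its properties, and the relationship of the recurrence probability of exceptional scores to the true score distribution, test reliability, and the cut-off for exceptional scores. The results show that the lower the reliability, the greater the regression toward the mean, and the higher the cut-off, the greater the regression toward the mean. Under a normal distribution, the recurrence probability is exponentially related to reliability and linearly related to the cut-off. Ways to avoid and reduce being misled by regression toward the mean in psychological research are discussed in a preliminary fashion.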
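The two qualitative results on regression toward the mean follow from the classical linear regression of a parallel retest score on the observed score. The sketch below assumes that linear form (expected retest = mean + reliability × deviation); the function names and the example scores are hypothetical, and it does not attempt the recurrence-probability results in the paper.

```python
def expected_retest(observed: float, mean: float, reliability: float) -> float:
    """Expected score on a parallel retest, given the observed score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return mean + reliability * (observed - mean)


def regression_amount(observed: float, mean: float, reliability: float) -> float:
    """How far the expected retest score falls back toward the mean."""
    return observed - expected_retest(observed, mean, reliability)


# Hypothetical scale with mean 100.
for rel in (0.9, 0.7):
    for score in (130.0, 145.0):
        print(rel, score, round(regression_amount(score, 100.0, rel), 2))
```

The regression amount equals (1 − reliability) × (observed − mean), so it grows both as reliability falls and as the cut-off moves further from the mean.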

6.

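Rethinking reliability and reliability generalization research

Guan Dandan, Zhang Houcan 《心理科学 (Psychological Science)》2004,27(2):445-448

This paper first clarifies the concept of reliability, pointing out that reliability is an index of whether test results are dependable rather than a fixed property of the test instrument. Because reliability estimates for test results vary, it then introduces the reliability generalization method proposed by Vacha-Haase at the end of the last century: a meta-analytic method for exploring the variability of score reliability estimates and the sources that predict that variability. Finally, through an analysis of the techniques of reliability generalization research, it argues that rethinking the concept of reliability and conducting reliability generalization studies will offer new insights to those who work with psychological tests.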

7.

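A comparative study of multiple indices of homogeneity reliability  Total citations: 1; self-citations: 0; citations by others: 1

Gu Haigen, Li Chao 《心理科学 (Psychological Science)》2005,28(5):1196-1198

This study used an experimental design with three independent variables (number of examinees, number of items, and whether variances were equal) to examine how stable each index of homogeneity reliability, the dependent variable, remained as the independent variables changed. Generalizability theory methods were then used to verify the results. Both sets of results consistently show that the α coefficient is sensitive to whether variances are equal and is not an ideal index of homogeneity reliability. The β, γ, and ζ coefficients estimate reliability with roughly the same precision and are clearly more robust to interference than the α coefficient, but less so than the ρ coefficient. The ρ coefficient is stable under all conditions and reflects homogeneity reliability fairly accurately.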

8.

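Reliability that "lies"

Chen Qishan 《社会心理科学 (Science of Social Psychology)》2009,(6):79-80,97

Within the framework of classical test theory, this paper addresses three related issues concerning the concept and use of reliability: first, it points out that reliability applies to test scores rather than to the instrument; next, it explains why the α coefficient is the most commonly used reliability estimate; finally, it offers recommendations for using the α coefficient correctly.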

9.

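Test reliability estimation: from coefficient α to internal consistency reliability  Total citations: 5; self-citations: 0; citations by others: 5

Wen Zhonglin, Ye Baojuan 《心理学报 (Acta Psychologica Sinica)》2011,43(7):821-829

Following the classical definition of test reliability, the paper briefly reviews the relationship between reliability and the α coefficient and the limitations of α. To recommend reliability estimators that can replace α, it discusses in depth homogeneity reliability and internal consistency reliability, both closely related to α. Under very general conditions, it proves that neither α nor homogeneity reliability exceeds internal consistency reliability, which in turn does not exceed test reliability, so internal consistency reliability is comparatively close to test reliability. It summarizes a workflow for test reliability analysis that indicates when α is still informative and when α no longer applies and internal consistency reliability (often called composite reliability in the literature) should be used instead. Programs for computing homogeneity reliability and internal consistency reliability are provided, which applied researchers can use directly.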
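The ordering between the α coefficient and composite reliability can be seen in a small worked case. The sketch below assumes a single-factor model with uncorrelated errors and uses the common composite reliability formula, (Σλ)² / ((Σλ)² + Σψ); it is not the program supplied with the paper, and the loadings, uniquenesses, and function names are hypothetical.

```python
def implied_covariance(loadings, uniquenesses):
    """Covariance matrix implied by a one-factor model with unit factor variance."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + (uniquenesses[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def cronbach_alpha(cov):
    """Coefficient alpha computed from an item covariance matrix."""
    k = len(cov)
    total = sum(sum(row) for row in cov)       # variance of the sum score
    trace = sum(cov[i][i] for i in range(k))   # sum of item variances
    return k / (k - 1) * (1.0 - trace / total)


def composite_reliability(loadings, uniquenesses):
    """Composite reliability for a one-factor model with uncorrelated errors."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


# Hypothetical standardized items: uniqueness = 1 - loading^2.
unequal = [0.9, 0.7, 0.5, 0.3]
equal = [0.6, 0.6, 0.6, 0.6]
for lam in (unequal, equal):
    psi = [1.0 - l * l for l in lam]
    print(round(cronbach_alpha(implied_covariance(lam, psi)), 4),
          round(composite_reliability(lam, psi), 4))
```

With unequal loadings α falls below composite reliability; with equal loadings the two coincide, which is the case in which α remains a reasonable estimate.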

10.

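Using the Bootstrap method to compute reliability in cognitive diagnostic assessment

Guo Lei, Zhang Jinming 《心理学探新 (Psychological Exploration)》2018,(5):433-439

Test reliability is an important indicator of test quality, and reliability deserves equal attention in cognitive diagnostic assessment. Existing methods for computing reliability in cognitive diagnosis all rest on one assumption: that an examinee's posterior probability distribution and marginal probabilities are identical across two administrations of the test. This assumption is too strong and ignores random error between the two administrations. Based on Bootstrap sampling, the paper proposes two kinds of indices for attribute reliability and pattern reliability: a product-moment correlation method and a modified consistency method. A simulation study compares the new and existing methods under different numbers of attributes, correlations among attributes, and numbers of items, and empirical data from the Examination for the Certificate of Proficiency in English (ECPE) and from a fraction subtraction test confirm the feasibility of the new methods. Finally, factors affecting reliability estimation are discussed.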

11.

Score reliability for tests constructed for African-American populations

Charter RA 《Psychological reports》2006,99(3):997-1000

Test score reliabilities and sample sizes (N) used to establish the reliabilities are described for a variety of tests constructed for African-American populations. The sample size was 341. The average internal consistency reliability was .74 (SD = .16) with a median value of .77. The median N was 131. The mean internal consistency reliability and median N for tests intended for assessment of individuals were only .72 and 96, respectively.

12.

The British Army Recruit Battery Goes Operational: From Theory to Practice in Computer-Based Testing Using Item-Generation Techniques

J.M. Collis P.G.C. Tapsfield S.H. Irvine P.L. Dann D. Wright 《International Journal of Selection & Assessment》1995,3(2):96-104

The British Army Recruit Battery (BARB) was commissioned in 1986 by the Army Personnel Research Establishment (APRE) under a mandate from the Directorate of Army Recruiting as part of a programme of strategic research. Item-generation from computer algorithms and computer delivery of the battery are the two fundamental building blocks of the BARB system and they are described in detail. In addition, reports of the psychometric properties of the battery and the results of validity studies are provided. A true score model of reliability is outlined and its utility demonstrated by comparing predicted reliabilities against operational test–retest reliabilities.

13.

The content reliability of a test

Harold Gulliksen 《Psychometrika》1936,1(3):189-194

The content unreliability of an essay test is the error due to the items used or the content of the test. The reader unreliability is due to variation in judgment of the persons who read and score the essay test. The content reliability of an essay test is accordingly defined as being independent of the reader reliability. Formulae are derived for the reader reliability and for the content reliability. The content reliability is found to be equal to the geometric mean of the test reliabilities computed from the scores assigned by the two readers, divided by the reader reliability.
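The closing sentence of this abstract can be written as a single formula. The symbols below are assumptions introduced here for illustration and are not the paper's own notation: $r_{1}$ and $r_{2}$ are the test reliabilities computed from the scores of reader 1 and reader 2, and $r_{\text{reader}}$ is the reader reliability.

```latex
\[
  r_{\text{content}} \;=\; \frac{\sqrt{r_{1}\, r_{2}}}{r_{\text{reader}}}
\]
% Hypothetical worked example (values chosen only for illustration):
% r_1 = 0.64, r_2 = 0.49, r_reader = 0.80
% sqrt(0.64 * 0.49) = 0.56, so r_content = 0.56 / 0.80 = 0.70
```

Dividing by the reader reliability removes the share of unreliability that comes from scoring, so the quotient reflects only the error due to the content sampled.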

14.

Statistical approaches to achieving sufficiently high test score reliabilities for research purposes

Charter RA 《The Journal of general psychology》2008,135(3):241-251

The author provides statistical approaches to aid investigators in assuring that sufficiently high test score reliabilities are achieved for specific research purposes. The statistical approaches use tests of statistical significance between the obtained reliability and lowest population reliability that an investigator will tolerate. The statistical approaches work for coefficient alpha and related coefficients and for alternate-forms, split-half (2-part alpha), and retest reliabilities. The author shows that, in some instances, a formula can help to estimate the sample size necessary for the statistical test.

15.

Reliability and construct validity of the Dutch psychopathy checklist: youth version: findings from a sample of male adolescents in a juvenile justice treatment institution

Das J de Ruiter C Doreleijers T Hillege S 《Assessment》2009,16(1):88-102

The present study examines the reliability and construct validity of the Dutch version of the Psychopathy Check List: Youth Version (PCL:YV) in a sample of male adolescents admitted to a secure juvenile justice treatment institution (N = 98). Hare's four-factor model is used to examine reliability and validity of the separate dimensions of psychopathy. Interrater reliabilities are good to excellent for the PCL:YV total score and most factor scores, except for the affective factor. Several suggestions are offered for optimizing reliability of this factor. Finally, meaningful associations between PCL:YV scores and scores on the Minnesota Multiphasic Personality Inventory-Adolescent and the Interpersonal Checklist-Revised support the construct validity of the PCL:YV total score as well as the four factors in the Dutch context.

16.

A Confirmatory Analysis of Item Reliability Trends (CAIRT): Differentiating True Score and Error Variance in the Analysis of Item Context Effects

Johannes Hartig Britta Hölzel Helfried Moosbrugger 《Multivariate behavioral research》2013,48(1):157-183

Numerous studies have shown increasing item reliabilities as an effect of the item position in personality scales. Traditionally, these context effects are analyzed based on item-total correlations. This approach neglects that trends in item reliabilities can be caused either by an increase in true score variance or by a decrease in error variance. This article presents the Confirmatory Analysis of Item Reliability Trends (CAIRT) that allows estimating both trends separately within a structural equation modeling framework. Results of a simulation study prove the CAIRT method to provide reliable and independent parameter estimates; the power exceeds the analysis of item-total correlations. We present an empirical application to self- and peer ratings collected in an Internet-based experiment. Results show that reliability trends are caused by increasing true score variance in self-ratings and by decreasing error variance in peer ratings.

17.

Substituting supplementary subtests for core subtests on reliability of WISC-IV Indexes and Full Scale IQ

Ryan JJ Glass LA 《Psychological reports》2006,98(1):187-190

The effects of replacing core subtests with supplementary subtests on composite score reliabilities were evaluated for the WISC-IV Indexes and Full Scale IQ. When Wechsler's guidelines are followed, i.e., only one substitution for each Index; no more than two substitutions from different Indexes when assessing the Full Scale IQ, summary score reliabilities remain high, and measurement error, as defined by confidence intervals around obtained scores, never increases by more than 1 index score point. In three instances, substitution of a supplementary subtest for a core subtest actually increased the reliabilities and decreased the amount of associated measurement error.

18.

The relationship between mean square differences and standard error of measurement: comment on Barchard (2012)

Pan T Yin Y 《Psychological Methods》2012,17(2):309-311

In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than $2(\mathrm{SEM})^{2}$ and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First, strictly speaking, MSD should not be compared to SEM because they measure different things, have different assumptions, and capture different sources of errors. Second, the related proof and conclusions in Barchard hold only under the assumptions of equal reliabilities, homogeneous variances, and independent measurement errors. To address the limitations, we propose that MSD should be compared to the standard error of measurement of difference scores ($\mathrm{SEM}_{x-y}$) so that the comparison can be extended to the conditions when 2 tests have unequal reliabilities and score variances.
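The quantities compared in this comment can be computed from the textbook definitions. The sketch below assumes SEM = SD × sqrt(1 − reliability) and, for independent measurement errors, a standard error of measurement of difference scores equal to the square root of the sum of the two squared SEMs; the function names and the example values are hypothetical and are not taken from the article.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement for one test."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def sem_difference(sd_x: float, rel_x: float, sd_y: float, rel_y: float) -> float:
    """SEM of the difference score x - y, assuming independent errors."""
    return math.hypot(sem(sd_x, rel_x), sem(sd_y, rel_y))


# Equal SDs and reliabilities: squared SEM of the difference is 2 * SEM^2.
print(sem(15.0, 0.91), sem_difference(15.0, 0.91, 15.0, 0.91) ** 2)

# Unequal reliabilities and variances: the 2 * SEM^2 shortcut no longer applies.
print(sem_difference(15.0, 0.91, 10.0, 0.75))
```

When the two tests share the same standard deviation and reliability, the squared SEM of the difference reduces to twice the squared SEM, which is the special case the comment describes as restrictive.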

19.

Examining the reliability of ADAS-Cog change scores

Joseph H. Grochowalski Ying Liu Karen L. Siedlecki 《Neuropsychology, development, and cognition. Section B, Aging, neuropsychology and cognition》2016,23(5):513-529

The purpose of this study was to estimate and examine ways to improve the reliability of change scores on the Alzheimer’s Disease Assessment Scale, Cognitive Subtest (ADAS-Cog). The sample, provided by the Alzheimer’s Disease Neuroimaging Initiative, included individuals with Alzheimer’s disease (AD) (n = 153) and individuals with mild cognitive impairment (MCI) (n = 352). All participants were administered the ADAS-Cog at baseline and 1 year, and change scores were calculated as the difference in scores over the 1-year period. Three types of change score reliabilities were estimated using multivariate generalizability. Two methods to increase change score reliability were evaluated: reweighting the subtests of the scale and adding more subtests. Reliability of ADAS-Cog change scores over 1 year was low for both the AD sample (ranging from .53 to .64) and the MCI sample (.39 to .61). Reweighting the change scores from the AD sample improved reliability (.68 to .76), but lengthening provided no useful improvement for either sample. The MCI change scores had low reliability, even with reweighting and adding additional subtests. The ADAS-Cog scores had low reliability for measuring change. Researchers using the ADAS-Cog should estimate and report reliability for their use of the change scores. The ADAS-Cog change scores are not recommended for assessment of meaningful clinical change.
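Why change scores tend to be less reliable than the scores they are built from can be illustrated with the classical formula for the reliability of a difference score. This is a swapped-in classical test theory analogue, not the multivariate generalizability analysis the study used; the function name and all input values are hypothetical.

```python
def difference_score_reliability(
    sd_x: float, rel_x: float, sd_y: float, rel_y: float, r_xy: float
) -> float:
    """Classical reliability of the difference score D = Y - X.

    rho_D = (sd_x^2*rel_x + sd_y^2*rel_y - 2*r_xy*sd_x*sd_y)
            / (sd_x^2 + sd_y^2 - 2*r_xy*sd_x*sd_y)
    """
    covariance_term = 2.0 * r_xy * sd_x * sd_y
    numerator = sd_x ** 2 * rel_x + sd_y ** 2 * rel_y - covariance_term
    denominator = sd_x ** 2 + sd_y ** 2 - covariance_term
    if denominator <= 0.0:
        raise ValueError("difference scores have no variance")
    return numerator / denominator


# Hypothetical baseline and follow-up scores, each with reliability 0.80.
for r_xy in (0.0, 0.3, 0.6):
    print(r_xy, round(difference_score_reliability(1.0, 0.8, 1.0, 0.8, r_xy), 3))
```

The more strongly the two occasions correlate, the more true score variance cancels in the difference while the error variance remains, so the reliability of the change score drops.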