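This paper proposes a person-fit statistic, R, for polytomously scored items. It examines R's performance in detecting six common aberrant response patterns (cheating, guessing, random responding, carelessness, creative responding, and mixed aberrance) and compares it with the standardized log-likelihood statistic lzp. The results show three things. (1) When the proportion of items affected by aberrant responding is low and the aberrance type is cheating or guessing, the detection rate of R is significantly higher than that of lzp. (2) The detection rates of both statistics rise as test length and the degree of examinee aberrance increase. (3) Under some conditions, R and lzp perform similarly. An empirical data analysis further illustrates how to apply the R statistic, and the results suggest that R has good application prospects.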
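
The abstract does not define R, so the sketch below covers only the comparison statistic lzp. It is a minimal illustration assuming the usual polytomous form of the standardized log-likelihood:

- l0 is the sum over items of ln P for the observed category.
- E(l0) is the sum over items and categories of P ln P.
- Var(l0) is the sum, over items, of each item's variance of ln P.

The category probabilities are assumed to come from an already fitted IRT model evaluated at the examinee's trait estimate. The function name is illustrative.

```python
import math

def lz_polytomous(probs, responses):
    """Standardized log-likelihood person-fit statistic (lzp) for polytomous items.

    probs: per-item lists of category probabilities at the examinee's estimated
        trait level (assumed to come from a fitted IRT model; all entries > 0).
    responses: observed 0-based category index for each item.
    Large negative values flag response patterns that fit the model poorly.
    """
    l0 = expected = variance = 0.0
    for p_item, x in zip(probs, responses):
        if any(p <= 0 for p in p_item):
            raise ValueError("category probabilities must be positive")
        logs = [math.log(p) for p in p_item]
        l0 += logs[x]                                   # observed log-likelihood
        mean_j = sum(p * lp for p, lp in zip(p_item, logs))
        second_j = sum(p * lp * lp for p, lp in zip(p_item, logs))
        expected += mean_j                              # E(l0), item by item
        variance += second_j - mean_j ** 2              # items assumed independent
    if variance <= 0:
        raise ValueError("zero variance: statistic undefined")
    return (l0 - expected) / math.sqrt(variance)
```

With two items that each have probabilities [0.2, 0.5, 0.3], choosing the modal category on both items gives a positive lzp. Choosing the least likely category on both gives a negative lzp.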

Revision of the Cognitive Style Analysis (CSA) Test and an Attempt to Establish Cut-off Points with a College Student Sample
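Using college students as participants, this study carried out a series of analyses of the Cognitive Style Analysis (CSA) test, including item analysis and tests of reliability and validity. It also made a tentative attempt to establish cut-off points from the college student sample. Item discrimination analysis showed that all items discriminate well, and the correlation between the two dimensions further supported the construct validity of the CSA. Reliability analysis showed that the CSA has acceptable internal consistency and test-retest reliability. A cross-cultural comparison showed that the cut-off points for Chinese college students' cognitive styles differ significantly from the British norms.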

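Objective: To revise the Chinese version of the Mindfulness in Teaching Scale and test its reliability and validity among Chinese teachers.

Methods: Each sample served a different analysis:

- Sample 1 (n1 = 302): item analysis and exploratory factor analysis.
- Sample 2 (n2 = 185): confirmatory factor analysis.
- Samples 1 and 2 together: cross-group measurement invariance and criterion validity.
- Sample 3 (n3 = 30): test-retest reliability.

Results: The Chinese version of the Mindfulness in Teaching Scale has a two-factor structure with two subscales, intrapersonal mindfulness and interpersonal mindfulness. It shows good structural validity, criterion validity, and reliability. The scale also achieved partial strong measurement invariance across primary, junior high, and senior high school teachers.

Conclusion: The scale is suitable for use with primary and secondary school teachers in China.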

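One reason findings in psychology journal articles replicate poorly is that the reported effects are generally small. Moreover, articles that do report effect sizes often use the indices inappropriately. The effect sizes most often reported for analysis of variance are η² and partial η² (η²p), but these cannot be compared directly across research designs. Generalized eta squared (η²G) is a recently proposed effect size index that overcomes these shortcomings of η² and η²p. It handles individual-difference variance flexibly across designs, including repeated measures, and so makes effect sizes comparable across designs. Using worked examples, this paper introduces the rationale and computation of η²G and discusses its strengths and weaknesses and how to use and report it. When reporting effect sizes, researchers should take the research design and hypotheses into account and choose an appropriate index so as not to overestimate the effect.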
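
The difference between η²p and η²G shows up clearly in a one-way repeated-measures design. η²p divides the effect sum of squares by the effect plus the residual (condition × subject) sum of squares. η²G also keeps the between-subjects sum of squares in the denominator, as it would be in a between-subjects version of the study. The sketch below assumes the condition factor is manipulated and that subjects are the only measured source of individual differences. The function name is illustrative.

```python
def rm_effect_sizes(data):
    """Partial and generalized eta squared for a one-way repeated-measures design.

    data: rows are subjects, columns are conditions (illustrative helper).
    Assumes the condition factor is manipulated, so subject variance counts
    as individual-difference variance in the generalized index.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj          # condition x subject residual
    partial = ss_cond / (ss_cond + ss_error)
    generalized = ss_cond / (ss_cond + ss_subj + ss_error)
    return partial, generalized
```

For `rm_effect_sizes([[2, 4], [4, 8]])` the sums of squares are 9 (condition), 9 (subjects), and 1 (residual). This gives η²p = 0.90 but η²G ≈ 0.47, which shows how η²p can overstate the effect relative to a between-subjects benchmark.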

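This study translated and revised the Personal Relative Deprivation Scale (PRDS) and tested its reliability and validity among college students.

Exploratory factor analysis showed that two reverse-scored items from the original scale had very low loadings on the main factor. After these two items were removed, every item of the resulting PRDS-3 discriminated well. Both exploratory and confirmatory factor analyses indicated that the PRDS-3 is unidimensional. Its test-retest reliability was 0.89, and its internal consistency coefficients across samples ranged from 0.77 to 0.81.

PRDS-3 scores correlated moderately to highly with criterion measures: relative deprivation, economic relative deprivation, anxiety, stress, depression, and aggression (r > 0.4, p < 0.01). PRDS-3 scores were also significantly correlated with the ability dimension of social comparison orientation (r = 0.46, p < 0.01). They partially mediated the effects of ability-based social comparison orientation on materialism and on life satisfaction.

In sum, the Chinese PRDS-3 shows good reliability and validity, meets psychometric requirements, and can be used to assess relative deprivation.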
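
The internal consistency coefficients reported above are conventionally Cronbach's α. α compares the summed item variances with the variance of the total score: α = k/(k − 1) × (1 − Σ s²_item / s²_total). Below is a minimal sketch assuming a complete respondent-by-item score matrix with no missing data. The function name is illustrative.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha from a respondent-by-item score matrix (illustrative helper).

    scores: one list per respondent; no missing values are allowed.
    Uses sample (n - 1) variances throughout.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Two identical items give α = 1. The two weakly related items in `[[1, 2], [2, 1], [3, 3]]` give α = 2/3.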

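Objective: To translate the Attitude Toward Babies Scale (ABS) into Chinese and test its reliability and validity among married Chinese women of childbearing age.

Methods: Using convenience sampling, 700 women of childbearing age from Guizhou, Shanxi, Hubei, and other provinces were surveyed. Reliability and validity were evaluated with the following analyses:

- item analysis
- content validity analysis
- exploratory and confirmatory factor analysis
- criterion-related validity
- Cronbach's α, split-half reliability, and test-retest reliability

Results:

- Item analysis: every item was significantly correlated with the total scores of the scale dimensions, indicating good discrimination.
- Content validity: interrater agreement (IR) among experts was 1; I-CVI values ranged from 0.83 to 1; S-CVI/UA was 0.82 and S-CVI/Ave was 0.97.
- Exploratory factor analysis: five factors had eigenvalues > 1, together explaining 54.399% of the variance.
- Confirmatory factor analysis: the five-factor model fit well (χ²/df = 2.500, CFI = 0.922, TLI = 0.914, RMSEA = 0.048, SRMR = 0.050).
- Criterion validity: all criterion measures were significantly correlated with the scale.
- Reliability: Cronbach's α for the total scale was 0.748, split-half reliability was 0.661, and test-retest reliability was 0.639.

Conclusion: The revised ABS has good reliability and validity and can serve as a valid instrument for measuring the fertility motivation of married women of childbearing age.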
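
The abstract reports I-CVI, S-CVI/UA, and S-CVI/Ave without defining them. The sketch below assumes the commonly used definitions. Each expert rates each item's relevance on a 4-point scale, and a rating of 3 or 4 counts as "relevant". I-CVI is the proportion of experts rating an item relevant. S-CVI/UA is the proportion of items that all experts rate relevant. S-CVI/Ave is the mean I-CVI. The 4-point scale and the function name are assumptions, not details given in the abstract.

```python
def content_validity(ratings):
    """Item- and scale-level content validity indices (illustrative helper).

    ratings: one list per item, holding each expert's relevance rating on an
    assumed 1-4 scale; ratings of 3 or 4 count as "relevant".
    Returns (list of I-CVI, S-CVI/UA, S-CVI/Ave).
    """
    i_cvi = [sum(1 for r in item if r >= 3) / len(item) for item in ratings]
    s_cvi_ua = sum(1 for v in i_cvi if v == 1.0) / len(i_cvi)  # universal agreement
    s_cvi_ave = sum(i_cvi) / len(i_cvi)                        # average I-CVI
    return i_cvi, s_cvi_ua, s_cvi_ave
```

Because S-CVI/UA only credits items with unanimous agreement, it is always at or below S-CVI/Ave. This matches the ordering of 0.82 versus 0.97 reported above.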

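Objective: To revise the Feminist Identity Development Scale (FIDS) and test its reliability and validity among Chinese female college students.

Methods: 1657 female college students completed three instruments: the Chinese FIDS, the Self-Esteem Scale (SES), and the hostile sexism subscale of the Ambivalent Sexism Inventory (ASI). The data were then subjected to item analysis, reliability analysis, exploratory and confirmatory factor analysis, and criterion validity testing.

Results: The revised Chinese FIDS contains 27 items and retains five subscales: passive acceptance, revelation, embeddedness-emanation, synthesis, and active commitment. Together they explain 56.18% of the variance. The five-factor model fit well (χ²/df = 2.99, IFI = 0.92, CFI = 0.92, GFI = 0.92, TLI = 0.91, RMSEA = 0.05). Internal consistency coefficients (Cronbach's α) ranged from 0.71 to 0.89, and split-half reliabilities ranged from 0.75 to 0.90.

Conclusion: The revised Chinese FIDS has good reliability and validity among Chinese female college students and is a valid instrument for measuring the level of feminist identity development.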

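This paper reports the development and preliminary validation of the Preschool and Primary Chinese Literacy Scale (PPCLS).

Item pool and selection: 2600 Chinese characters, identified from the character lists in the primary school Chinese-language syllabi of Beijing, Hong Kong, and Singapore, formed the PPCLS character pool. From this pool, 200 characters were selected as PPCLS test items by stratified sampling.

Subscales: The PPCLS has four subscales:

- character-picture matching
- selecting characters by sound
- pointing to and naming characters
- reading characters and using them in speech

Psychometric properties described:

1. the structural and content validity of the scale;
2. its reliability (internal consistency and test-retest);
3. its discriminant and convergent validity.

The study shows that the PPCLS has satisfactory reliability and validity in Beijing, Hong Kong, and Singapore. It has the characteristics of a developmental scale and is suitable for testing preschool children in all three regions.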
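
The abstract gives only the pool size (2600), the sample size (200), and the fact that sampling was stratified. The sketch below shows generic proportional stratified sampling of items from a pool. Every stratum label and stratum size in the usage example is hypothetical, and the PPCLS's actual strata and allocation rule are not stated in the source. Largest-remainder rounding is an assumed choice that makes the allocations sum exactly to the target.

```python
import random

def stratified_item_sample(pool_by_stratum, n_total, seed=0):
    """Proportional stratified sampling of test items (illustrative helper).

    pool_by_stratum: dict mapping a stratum label (e.g. a hypothetical
        frequency band) to the list of candidate items in that stratum.
    """
    pool_size = sum(len(items) for items in pool_by_stratum.values())
    if n_total > pool_size:
        raise ValueError("cannot sample more items than the pool contains")
    quotas = {s: n_total * len(items) / pool_size for s, items in pool_by_stratum.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    # Largest-remainder rounding so the allocations add up to n_total exactly.
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    rng = random.Random(seed)  # seeded for a reproducible item selection
    return {s: rng.sample(items, alloc[s]) for s, items in pool_by_stratum.items()}
```

With hypothetical strata of 1300, 800, and 500 characters and a target of 200 items, the allocations come out as 100, 62, and 38.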

An Examination of Multiple Methods for Assessing Coder Reliability in Qualitative Research
Xu Jianping, Zhang Houcan. Psychological Science (《心理科学》), 2005, 28(6): 1430-1432.
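Methods for testing coder reliability in qualitative research include the category agreement index, the coding reliability coefficient, correlation coefficients, the median test, and the generalizability coefficient. These methods were applied to an interview data set on teacher competencies, and each showed a different weakness:

- The category agreement index and the coding reliability coefficient are unstable because they depend on the number of identical codes.
- Correlation coefficients are constrained by the type of data.
- The median test is affected by the research design.
- The generalizability coefficient is affected by the numbers of coders and coded items.

Researchers should therefore choose among these methods with care.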
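
The abstract names the category agreement index and the coding reliability coefficient without giving formulas. The sketch below assumes the definitions usually paired with these names:

- CA = 2S / (T1 + T2), where S is the number of codes two coders assign identically and T1 and T2 are each coder's total number of codes.
- R = n·CA / (1 + (n − 1)·CA), where CA is averaged over all coder pairs and n is the number of coders.

Representing a code as a (segment, category) pair is also an assumption, and the function names are illustrative. The dependence on S is exactly the instability the abstract points out.

```python
from itertools import combinations

def category_agreement(codes_a, codes_b):
    """CA = 2S / (T1 + T2); codes are hypothetical (segment_id, category) pairs."""
    s = len(set(codes_a) & set(codes_b))  # identical codes assigned by both coders
    return 2 * s / (len(codes_a) + len(codes_b))

def coding_reliability(coders):
    """R = n * mean_CA / (1 + (n - 1) * mean_CA), averaging CA over coder pairs."""
    pairs = list(combinations(coders, 2))
    mean_ca = sum(category_agreement(a, b) for a, b in pairs) / len(pairs)
    n = len(coders)
    return n * mean_ca / (1 + (n - 1) * mean_ca)
```

Suppose two coders agree on 2 of their 3 codes each. Then CA = 4/6 ≈ 0.67 and R = 0.8. The stepped-up form means R exceeds CA whenever there is more than one coder.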

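In the first two decades of the 21st century, 11 Chinese psychology journals published a total of 213 papers on statistical methods. Their topics fall mainly into 10 categories, listed here in descending order of number of papers:

1. structural equation modeling
2. test reliability
3. mediation effects
4. effect size and statistical power
5. longitudinal research
6. moderation effects
7. exploratory factor analysis
8. latent class models
9. common method bias
10. hierarchical linear models

Each category is briefly reviewed. The breadth and depth of research on psychological statistical methods in China have grown steadily, and the research hotspots have developed together, each drawing on the others. However, review papers account for a large share of the output, the proportion of original research papers needs to rise, and the research community needs to be strengthened.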



Computerized Cognitive Tests (CCTs) play an increasing role in assessing cognitive impairment in the elderly. In this context, it is important to review the psychometric data of the available CCTs for detecting cognitive decline.


To analyse the psychometric properties of the CCTs, we considered the available data on reliability and validity indices.


Only eleven CCTs used with the elderly were retained for the systematic review. Nine of the 11 CCTs report some reliability results; 8 of 11 report concurrent validity results; only 4 of 11 report criterion validity data; and just 4 of 11 report factor-analytic results.


Only a few published papers report well-structured psychometric data (reliability and validity). Some results have important limitations concerning the adequacy of the reliability and validity indices, and some psychometric properties of these CCTs have not yet been studied. Given these limitations, more research on CCTs is needed, including systematic studies of their psychometric properties, and Item Response Theory should be considered.


A variety of collective phenomena are understood to exist to the extent that workers agree on their perceptions of the phenomena, such as perceptions of their organization’s climate or perceptions of their team’s mental model. Researchers conducting group-level studies of such phenomena measure individuals’ perceptions via surveys and then aggregate data to the group level if the mean within-group agreement for a sample of groups is sufficiently high. Despite this widespread practice, we know little about the factors potentially affecting mean within-group agreement. Here, focusing on work climate, we report an investigation of a number of expected contextual (social interaction) and methodological predictors of mean rWG, a common statistic for judging within-group agreement in applied psychology and management research. We used the novel approach of meta-CART, which allowed us to assess the relative importance and possible interactions of the predictor variables. Notably, mean rWG values are driven by both contextual (average number of individuals per group and cultural individualism-collectivism) and methodological factors (the number of items in a scale and scale reliability). Our findings are largely consistent with expectations concerning how social interaction affects within-group agreement and psychometric arguments regarding why adding more items to a scale will not necessarily increase the magnitude of an index based on a Spearman-Brown “stepped-up correction.” We discuss the key insights from our results, which are relevant to the study of multilevel phenomena relying on the aggregation of individual-level data and informative for how meta-analytic researchers can simultaneously examine multiple moderator variables.
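
The abstract refers to rWG and to the Spearman-Brown "stepped-up" character of its multi-item form. The sketch below assumes the standard uniform-null versions:

- Single item: rWG = 1 − s²/σ²_EU, where σ²_EU = (A² − 1)/12 for an A-point scale.
- J items: rWG(J) = J(1 − s̄²/σ²_EU) / [J(1 − s̄²/σ²_EU) + s̄²/σ²_EU], where s̄² is the mean within-group item variance.

Two further details are assumed choices, not specified in the abstract: truncating out-of-range values to 0 follows a common convention, and using sample (n − 1) variances. The function names are illustrative.

```python
from statistics import variance

def null_variance(n_options):
    """Variance of a uniform 'no agreement' null on an A-point scale: (A^2 - 1) / 12."""
    return (n_options ** 2 - 1) / 12

def r_wg(ratings, n_options):
    """Single-item within-group agreement; negative values truncated to 0 by convention."""
    return max(0.0, 1 - variance(ratings) / null_variance(n_options))

def r_wg_j(item_ratings, n_options):
    """Multi-item rWG(J); item_ratings holds one list of group members' ratings per item."""
    j = len(item_ratings)
    mean_var = sum(variance(item) for item in item_ratings) / j
    ratio = mean_var / null_variance(n_options)
    if ratio >= 1:
        return 0.0  # observed variance at or above the null: treat as no agreement
    agreement = j * (1 - ratio)
    return agreement / (agreement + ratio)  # Spearman-Brown-like in J
```

On a 5-point scale, a group rating one item [3, 3, 4, 4] yields rWG ≈ 0.83. Two such items yield rWG(J) = 10/11 ≈ 0.91. So with mean item variance held fixed, adding items raises the index. In real scales, added items also change the mean variance, which is one reason the increase is not guaranteed.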


This paper demonstrates and compares methods for estimating the interrater reliability and interrater agreement of performance ratings. Applied researchers can use these methods to investigate the quality of ratings gathered, for example, as criteria for a validity study, or as performance measures for selection or promotional purposes. Estimates of interrater reliability are frequently used for these purposes, but indices of interrater agreement appear to be rarely reported for performance ratings. A recommended index of interrater agreement, the T index (Tinsley & Weiss, 1975), is compared with four methods of estimating interrater reliability: Pearson r, coefficient alpha, mean correlation between raters, and intraclass correlation. Subordinate and superior ratings of the performance of 100 managers were used in these analyses.

The results indicated that, in general, interrater agreement and reliability among subordinates were fairly high. Interrater agreement between subordinates and superiors was moderately high, but interrater reliability between these two rating sources was very low. The results demonstrate that interrater agreement and reliability are distinct indices and that both should be reported. Reasons are discussed as to why interrater reliability should not be reported alone.

This paper is based, in part, on a thesis submitted to East Carolina University by the second author. Portions of this study were presented at the American Psychological Association meeting in New Orleans, LA, August, 1989. The authors would like to thank Michael Campion and two anonymous reviewers for their comments on earlier drafts of this paper.
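
The reliability estimates listed above are not interchangeable. The sketch below uses the one-way random-effects ICC(1), (MSB − MSW) / (MSB + (k − 1)·MSW), which is one of several intraclass correlation forms. It assumes a complete ratee-by-rater matrix in which every ratee has the same number k of raters. Which ICC form the study used is not stated, and the function name is illustrative.

```python
def icc1(ratings):
    """One-way random-effects ICC(1) (illustrative helper).

    ratings: rows are ratees, columns are raters; every ratee has k ratings.
    Rater mean differences count as error, so the index penalizes disagreement
    in level, not just in rank order.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Consider two raters who score three ratees as [1, 2, 3] and [3, 4, 5]. Their Pearson r is 1, yet `icc1([[1, 3], [2, 4], [3, 5]])` is 0, because ICC(1) treats the constant two-point offset as error. Different reliability estimates can therefore lead to different conclusions about the same ratings.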

This research dealt with the reliability and validity of the DTVP when used with a sample of economically disadvantaged, predominantly Negro children from a large eastern city. Regarding reliability, test-retest and split-half procedures were employed; for validity the test was correlated with intelligence and achievement measures. The authors concluded that (a) the total test values alone evidence the necessary reliability to be used with confidence for diagnostic purposes, and (b) the validity of the measure has not been sufficiently demonstrated.

Re-injury worry is an important construct in competitive sport that may influence performance and increase the risk of re-injury. However, there are currently no available instruments to measure the causes of re-injury worry. The purpose of this study was to develop the Causes of Re-Injury Worry Questionnaire (CR-IWQ). The study was conducted in three independent research phases to investigate the following: (a) the content relevance, (b) the factor structure and the factorial validity, (c) the concurrent validity, (d) the discriminant validity, and (e) the test-retest reliability (intraclass correlation coefficients; ICC), and the internal consistency of the instrument. Exploratory factor analysis (EFA) was chosen to examine the factor structure of the CR-IWQ. Confirmatory factor analysis (CFA) was used to examine further the factorial validity of the instrument. A number of valid constructs were used to assess the concurrent and discriminant validity of the CR-IWQ. The reliability of the new instrument was examined using Pearson r (ICC) and Cronbach α. Three hundred and seventy athletes with an acute musculoskeletal sport injury in the last year participated in the study. EFA revealed a 12-item model, representing two factors ("Re-injury worry due to rehabilitation" and "Re-injury worry due to opponent's ability"). CFA supported the two-factor model of the CR-IWQ. The concurrent and discriminant validity of the CR-IWQ was confirmed by examining correlations between the CR-IWQ with other constructs. The ICCs and the Cronbach α indices of the CR-IWQ were acceptable. We have demonstrated that the CR-IWQ is a good psychometric instrument that can be used for clinical and research purposes.

This study evaluated the validity and reliability of the Perceived Ethnic Discrimination Questionnaire-Community Version (PEDQ-CV) Lifetime Exposure scale in a multiethnic Asian sample (N = 509). The 34-item scale measures perceived interpersonal racial/ethnic discrimination and includes four subscales assessing different types of discrimination: Social Exclusion, Stigmatization, Discrimination at Work/School, and Threat/Aggression. The Lifetime Exposure scale demonstrated excellent reliability across the full group and in all major subgroups. Subscales displayed good reliability across the full group and moderate-to-good reliability in each subgroup. The Lifetime Exposure scale was significantly correlated with the depression and anxiety subscales of the SCL-90-R, providing preliminary evidence of construct validity. The data suggest the Lifetime Exposure scale, previously validated in Black and Latino adults, is also appropriate for use with Asian samples, and can be used to examine both within-group and between-groups differences in discrimination.

