20 similar documents found (search time: 15 ms).
Differential item functioning (DIF) analysis is important for test fairness. While DIF analyses have mainly been conducted with manifest grouping variables, such as gender or race/ethnicity, it has recently been claimed that contextual variables pertaining to examinees should also be considered in DIF analyses. This study used propensity scores to incorporate contextual variables into a gender DIF analysis, controlling for the contextual variables that potentially affect gender DIF. Subsequent DIF analyses with the Mantel-Haenszel (MH) procedure and the logistic regression (LR) model were run on reference (male) and focal (female) groups formed through propensity score matching. The propensity-score-embedded MH and LR models detected fewer gender DIF items than the conventional MH and LR models. The propensity-score-embedded models, as a confirmatory approach in DIF analysis, could help in forming hypotheses about the potential causes of DIF. Salient advantages of propensity-score-embedded DIF analysis models are also discussed.
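
As a rough illustration of the Mantel-Haenszel procedure named in the abstract above, the sketch below computes the MH common odds ratio across matched score strata and converts it to the ETS delta scale (MH D-DIF = -2.35 ln α). It is a minimal sketch, not the study's implementation: the function names, the tuple layout, and the counts in the usage note are hypothetical, and propensity score matching is assumed to have been done beforehand.

```python
import math
from typing import Iterable, Tuple

# One matched stratum (e.g. one total-score level), as hypothetical counts:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = Tuple[int, int, int, int]


def mh_common_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel common odds ratio pooled over matched strata."""
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        total = ref_right + ref_wrong + foc_right + foc_wrong
        if total == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * foc_wrong / total
        denominator += ref_wrong * foc_right / total
    if denominator == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return numerator / denominator


def mh_d_dif(alpha: float) -> float:
    """Convert the common odds ratio to the ETS delta metric (MH D-DIF)."""
    return -2.35 * math.log(alpha)
```

For example, `mh_common_odds_ratio([(40, 10, 30, 20), (30, 20, 20, 30)])` pools two illustrative strata; a value above 1 (a negative MH D-DIF) indicates that the item favors the reference group after matching.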

A model-based modification (SIBTEST) of the standardization index based upon a multidimensional IRT bias modeling approach is presented that detects and estimates DIF or item bias simultaneously for several items. A distinction between DIF and bias is proposed. SIBTEST detects bias/DIF without the usual Type I error inflation due to group target ability differences. In simulations, SIBTEST performs comparably to Mantel-Haenszel for the one-item case. SIBTEST investigates bias/DIF for several items at the test score level (multiple-item DIF, called differential test functioning: DTF), thereby allowing the study of test bias/DIF, in particular bias/DIF amplification or cancellation and the cognitive bases for bias/DIF. This research was partially supported by Office of Naval Research Cognitive and Neural Sciences Grant N0014-90-J-1940, 4421-548 and National Science Foundation Mathematics Grant NSF-DMS-91-01436. The research reported here is collaborative in every respect and the order of authorship is alphabetical. The assistance of Hsin-hung Li and Louis Roussos in conducting the simulation studies was of great help. Discussions with Terry Ackerman, Paul Holland, and Louis Roussos were very helpful.

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better-calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A real data set was also analyzed to illustrate the use of these DIF detection procedures.

Using an item‐response theory‐based approach (i.e. likelihood ratio test with an iterative procedure), we examined the equivalence of the Rosenberg Self‐Esteem Scale (RSES) in a sample of US and Chinese college students. Results from the differential item functioning (DIF) analysis showed that the RSES was not fully equivalent at the item level, as well as at the scale level. The two cultural groups did not use the scale comparably, with the US students showing more extreme responses than the Chinese students. Moreover, we evaluated the practical impact of DIF and found that cultural differences in average self‐esteem scores disappeared after the DIF was taken into account. In the present study, we discuss the implications of our findings for cross‐cultural research and provide suggestions for future studies using the RSES in China.

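This paper applies the multidimensional testlet response model (MTRM) to the detection of differential item functioning (DIF) in multidimensional testlet-based tests. Through a simulation study and an applied study, it examines the accuracy, validity, and influencing factors of the MTRM in DIF detection and compares it with the multidimensional random coefficients multinomial logit model (MRCMLM), which ignores testlet effects. The results show that (1) as sample size increases, the MTRM's detection rate for valid DIF values rises and its error rate falls, and its results are more stable across conditions; (2) compared with the MRCMLM, the MTRM-based DIF detection model has a higher detection rate and is less affected by other factors; (3) when testlet effects in the test are small, the results of the MTRM and the MRCMLM differ little, but the MTRM shows better model fit.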

Probabilistic reasoning skills are important in various contexts. The aim of the present study was to develop a new instrument (the Probabilistic Reasoning Scale – PRS) to accurately measure low levels of probabilistic reasoning ability in order to identify people with difficulties in this domain. Item response theory was applied to construct the scale, and to investigate differential item functioning (i.e., whether the items were invariant) across genders, educational levels, and languages. Additionally, we tested the validity of the scale by investigating the relationships between the PRS and several other measures. The results revealed that the items had a low level of difficulty. Nonetheless, the discriminative measures showed that the items can discriminate between individuals with different trait levels, and the test information function showed that the scale accurately assesses low levels of probabilistic reasoning ability. Additionally, through investigating differential item functioning, the measurement equivalence of the scale at the item level was confirmed for gender, educational status, and language (i.e., Italian and English). Concerning validity, the results showed the expected correlations with numerical skills, math‐related attitudes, statistics achievement, IQ, reasoning skills, and risky choices both in the Italian and British samples. In conclusion, the PRS is an ideal instrument for identifying individuals who struggle with basic probabilistic reasoning, and who could be targeted by specific interventions. Copyright © 2017 John Wiley & Sons, Ltd.

The PARELLA model is a probabilistic parallelogram model that can be used for the measurement of latent attitudes or latent preferences. The data analyzed are the dichotomous responses of persons to items, with a one (zero) indicating agreement (disagreement) with the content of the item. The model provides a unidimensional representation of persons and items. The response probabilities are a function of the distance between person and item: the smaller the distance, the larger the probability that a person will agree with the content of the item. This paper discusses how the approach to differential item functioning presented by Thissen, Steinberg, and Wainer can be implemented for the PARELLA model. Requests for the PARELLA software should be sent to iec ProGAMMA, P.O. Box 841, 9700 AV Groningen, The Netherlands.

This study tested the Selection Procedural Justice Scale (SPJS) in an educational setting. The sample consisted of 617 students. Four different confirmatory models were tested, followed by an IRT analysis to test the scale structure at the item level in the two different contexts (selection vs. academic exams). Results indicated that the 11‐factor structure is the best factorial solution, and SPJS items were found to be free of DIF.

The present study examined the psychometric properties of a universal screening instrument called the Emotional and Behavioral Screener (EBS), which is designed to identify students exhibiting emotional and behavioral problems. The primary purposes of this study were to assess the measurement invariance of EBS items between Caucasian and African-American students and to assess the impact of differential item functioning (DIF) on EBS scores. The sample consisted of 946 elementary students from throughout the U.S. The findings suggested that EBS items exhibited small to negligible levels of DIF, and that DIF did not significantly impact EBS scores. The results supported the EBS as a universal screening instrument that is fair in measuring the emotional and behavioral risk of elementary students. Research limitations and implications for school professionals are discussed.

The word any may appear in some sentences, but not in others. For example, any is permitted in sentences that contain the word nobody, as in Nobody ate any fruit. However, in a minimally different context any seems strikingly anomalous: *Everybody ate any fruit. The aim of the present study was to investigate how the brain responds to the word any in such minimally different contexts - where it is permitted (licensed) and where it is not permitted (unlicensed). Brain responses were measured from adult readers using magnetoencephalography (MEG). The results showed significantly larger responses to permissible contexts in the left posterior temporal areas between 400-500 ms and 590-660 ms. These results clarify the anatomy and timing of brain processes that contribute to our judgment that a word such as any is or is not permitted in a given context.

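Using everyday conditional reasoning statements as materials and an "inference-judgment" paradigm in which the major premise, minor premise, and conclusion were presented in sequence, this study used event-related potentials (ERPs) to examine the time course in the brain of the belief-bias effect under denial of the antecedent in conditional reasoning. Behaviorally, accuracy was lower and reaction times were longer in the belief-inhibiting condition than in the belief-facilitating condition. Electrophysiologically, the ERP waveforms elicited by the two conditions (belief-inhibiting and belief-facilitating) diverged clearly only during processing of the major premise. This suggests that the belief-bias effect in this type of reasoning may arise as early as the semantic representation of the major premise.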

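Dynamic and comprehensive item selection strategies for computerized adaptive testing with polytomous scoring   Total citations: 1 (self-citations: 0, citations by others: 1)
Luo Fen, Ding Shuliang, Wang Xiaoqing. 《心理学报》 (Acta Psychologica Sinica), 2012, 44(3): 400-412
Polytomous scoring provides more information about examinees and is one direction in which computerized adaptive testing (CAT) is developing; item selection strategies are a central topic in CAT research. For the graded response model under polytomous scoring, this paper uses the idea of interval estimation to improve several recently proposed item selection strategies, and extends the dichotomous b-STR and a-STR methods to polytomous scoring to improve the maximum information item selection strategy. Monte Carlo simulations show that, while matching or approaching the measurement precision of the original strategies, some of the new strategies proposed here effectively reduce test length and others greatly reduce item exposure rates.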

Production studies have shown that speakers of languages with larger phoneme inventories expand their acoustic space relative to languages with smaller inventories [Bradlow, A. (1995). A comparative acoustic study of English and Spanish vowels. Journal of the Acoustical Society of America, 97(3), 1916-1924; Jongman, A., Fourakis, M., & Sereno, J. (1989). The acoustic vowel space of Modern Greek and German. Language and Speech, 32, 221-248]. In this study, we investigated whether this acoustic expansion in production has a perceptual correlate, that is, whether the perceived distance between pairs of sounds separated by equal acoustic distances varies as a function of inventory size or organization. We used magnetoencephalography, specifically the mismatch field response (MMF), and compared two language groups, French and Spanish, whose vowel inventories differ in size and organization. Our results show that the MMF is sensitive to inventory size but not organization, suggesting that speakers of languages with larger inventories perceive the same sounds as less similar than speakers of languages with smaller inventories.

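To explore the application of latent profile analysis (LPA) in identifying psychological and behavioral problems, a mental health survey was administered to 12,718 college students, and the psychological status of 644 of them was rated by psychological counselors, student advisers, and part-time class teachers. Using these ratings and the positive-symptom detection rate as the "gold standard", the sensitivity and specificity of the diagnosis were analyzed. The results were as follows: (1) LPA divided the psychological and behavioral problems of this college sample into three subgroups, a risk group, a distress group, and a healthy group, accounting for 9.86%, 19.15%, and 70.99% of the sample, respectively; (2) the risk group showed prominent psychiatric symptoms (Z≥2.6SD), and 61.21% of its members presented positive symptoms, far more than the 38.28% in the distress group and the 8.36% in the healthy group; in addition, the distress group was characterized mainly by cognitive and emotional symptoms; (3) compared with the traditional cutoff-score method, LPA improved sensitivity by 8.93% to 35.26% and was more scientific and effective.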
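
The sensitivity and specificity referred to in the abstract above can be illustrated with a short sketch that compares a screening classification against a gold-standard rating. This is a generic illustration under assumed inputs, not the authors' procedure: the function name and the boolean lists in the usage note are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    flagged: Sequence[bool], gold: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a screen against a gold standard.

    flagged[i] is True when the screening method marks person i as at risk;
    gold[i] is True when the gold-standard rating marks person i as a case.
    """
    if len(flagged) != len(gold):
        raise ValueError("both sequences must have the same length")
    true_pos = sum(f and g for f, g in zip(flagged, gold))
    false_neg = sum((not f) and g for f, g in zip(flagged, gold))
    true_neg = sum((not f) and (not g) for f, g in zip(flagged, gold))
    false_pos = sum(f and (not g) for f, g in zip(flagged, gold))
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one case and one non-case")
    sensitivity = true_pos / (true_pos + false_neg)  # cases correctly flagged
    specificity = true_neg / (true_neg + false_pos)  # non-cases correctly cleared
    return sensitivity, specificity
```

Calling the function once with LPA-based flags and once with cutoff-score flags, against the same gold-standard ratings, would give the kind of sensitivity comparison the abstract reports.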

The Attitudes and Belief Scale-2 (ABS-2: DiGiuseppe, Leaf, Exner, & Robin, 1988. The development of a measure of rational/irrational thinking. Paper presented at the World Congress of Behavior Therapy, Edinburgh, Scotland.) is a 72-item self-report measure of evaluative rational and irrational beliefs widely used in Rational Emotive Behavior Therapy research contexts. However, little psychometric evidence exists regarding the measure's underlying factor structure. Furthermore, given the length of the ABS-2, there is a need for an abbreviated version that can be administered when there are time demands on the researcher, such as in clinical settings. This study sought to examine a series of theoretical models hypothesized to represent the latent structure of the ABS-2 within an alternative models framework, using traditional confirmatory factor analysis as well as a bifactor modeling approach. This study also sought to develop a psychometrically sound abbreviated version of the ABS-2. Three hundred and thirteen (N = 313) active emergency service personnel completed the ABS-2. Results indicated that for each model, the application of bifactor modeling procedures improved model fit statistics, and a novel eight-factor intercorrelated solution was identified as the best fitting model of the ABS-2. However, the observed fit indices failed to satisfy commonly accepted standards. A 24-item abbreviated version was thus constructed, and an intercorrelated eight-factor solution yielded satisfactory model fit statistics. Current results support the use of a bifactor modeling approach to determining the factor structure of the ABS-2. Furthermore, results provide empirical support for the psychometric properties of the newly developed abbreviated version.

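Wealth from different sources carries different psychological weight, which leads to different attitudes toward spending it and different ways of spending it. Earlier behavioral experiments have found that windfall gains are spent more readily, whereas hard-earned income is spent less readily. Drawing on mental accounting and implicit social cognition theories, this study used the IAT and ERPs to examine the implicit consumption bias elicited by windfall versus hard-earned wealth, thereby indirectly measuring implicit consumption attitudes and brain processing mechanisms. The IAT results showed that windfall gains were more closely associated with easy spending and hard-earned income with reluctant spending, confirming the earlier behavioral findings at the implicit level. The ERP results suggested that the two income sources may engage different brain processing mechanisms, reflected in the P3 and LPC components: P3 may be the ERP component reflecting the hard-earned group's preference for hard-to-spend consumption, and LPC reflected individuals' preference for easy-to-spend consumption in the windfall condition.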

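Fang Huicong, Zhou Lin. 《心理科学》 (Psychological Science), 2012, 35(4): 857-861
Combining behavioral measures with ERP analysis, and using the manner of disparity separation and cue validity as independent variables, this study examined how endogenous attentional orienting in three-dimensional space affects stereoscopic visual processing, and the electrophysiological mechanisms involved. At both short and long SOAs, predictive central symbolic cues in 3D space produced a priming effect on the subsequent identification task, and the size of the priming effect did not change as SOA lengthened. At the short SOA, both the classic cueing effect and the cross-depth cueing effect appeared as larger N1 amplitudes for valid than for invalid cues; at the long SOA, the cueing effect showed a trend toward larger N1 amplitudes in the invalid-cue condition.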

We investigated early behavioural markers of autism spectrum disorder (ASD) using the Autism Observational Scale for Infants (AOSI) in a prospective familial high-risk (HR) sample of infant siblings (N = 54) and low-risk (LR) controls (N = 50). The AOSI was completed at the 7- and 14-month infant visits, and children were seen again at age 24 and 36 months. Diagnostic outcome of ASD (HR-ASD) versus no ASD (HR-No ASD) was determined for the HR sample at the latter timepoint. The HR group scored higher than the LR group at 7 months and marginally but non-significantly higher than the LR group at 14 months, although these differences did not remain when verbal and nonverbal developmental level were covaried. The HR-ASD outcome group had higher AOSI scores than the LR group at 14 months but not 7 months, even when developmental level was taken into account. The HR-No ASD outcome group had scores intermediate between the HR-ASD and LR groups. At both timepoints a few individual items were higher in the HR-ASD and HR-No ASD outcome groups compared to the LR group, and these included both social (e.g. orienting to name) and non-social (e.g. visual tracking) behaviours. AOSI scores at 14 months but not at 7 months were moderately correlated with later scores on the Autism Diagnostic Observation Schedule (ADOS), suggesting continuity of autistic-like behavioural atypicality but only from the second and not first year of life. The scores of HR siblings who did not go on to have ASD were intermediate between the HR-ASD outcome and LR groups, consistent with the notion of a broader autism phenotype.
