Similar Articles
13 similar articles found.
1.
Methods for the identification of differential item functioning (DIF) in Rasch models are typically restricted to the case of two subgroups. A boosting algorithm is proposed that can handle the more general setting in which DIF is induced by several covariates at the same time. The covariates can be both continuous and (multi-)categorical, and interactions between covariates can also be considered. The method works for a general parametric model for DIF in Rasch models. Since the boosting algorithm selects variables automatically, it is able to detect the items that induce DIF. It is demonstrated that boosting competes well with traditional methods in the classical two-subgroup case. The method is illustrated by an extensive simulation study and an application to real data.
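The general setting this abstract describes can be made concrete with a small data-generating sketch: a Rasch model in which one item's difficulty shifts linearly with a continuous covariate. This is only an illustration of the DIF structure, not the proposed boosting algorithm; all variable names and parameter values here are invented for the example.

```python
# Minimal sketch of covariate-induced DIF in a Rasch model: item 3's
# difficulty shifts linearly with a standardized continuous covariate.
# Illustrative data generation only -- not the boosting algorithm itself.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 10
theta = rng.normal(0.0, 1.0, n_persons)   # person abilities
beta = rng.normal(0.0, 1.0, n_items)      # baseline item difficulties
age = rng.uniform(20, 60, n_persons)      # hypothetical DIF covariate
z_age = (age - age.mean()) / age.std()

gamma = np.zeros(n_items)                 # per-item DIF effects
gamma[3] = 0.8                            # only item 3 shows DIF

# Rasch with DIF: P(X_pi = 1) = logistic(theta_p - beta_i - gamma_i * z_p)
eta = theta[:, None] - beta[None, :] - gamma[None, :] * z_age[:, None]
responses = (rng.uniform(size=eta.shape) < 1 / (1 + np.exp(-eta))).astype(int)
```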

2.
When developing and evaluating psychometric measures, a key concern is to ensure that they accurately capture individual differences on the intended construct across the entire population of interest. Inaccurate assessments of individual differences can occur when responses to some items reflect not only the intended construct but also construct-irrelevant characteristics, like a person's race or sex. Unaccounted for, this item bias can lead to apparent differences in scores that do not reflect true differences, invalidating comparisons between people with different backgrounds. Accordingly, empirically identifying which items manifest bias through the evaluation of differential item functioning (DIF) has been a longstanding focus of much psychometric research. The majority of this work has focused on evaluating DIF across two (or a few) groups. Modern conceptualizations of identity, however, emphasize its multi-determined and intersectional nature, with some aspects better represented as dimensional than categorical. Fortunately, many model-based approaches to modelling DIF now exist that allow for simultaneous evaluation of multiple background variables, including both continuous and categorical variables, and potential interactions among background variables. This paper provides a comparative, integrative review of these new approaches to modelling DIF and clarifies both the opportunities and challenges associated with their application in psychometric research.

3.
4.
This report documents relationships between differential item functioning (DIF) identification and (1) item–trait association and (2) scale multidimensionality in personality assessment. Applying the logistic regression DIF model of Zumbo (1999), DIF effect size is found to become increasingly inflated as the investigated items' associations with trait scores decrease. Similar patterns were noted for the influence of scale multidimensionality on DIF identification. The report provides estimates of how the magnitude of item–trait association and scale multidimensionality affect DIF occurrence and effect size in personality assessment applications. The results emphasize the importance of excluding investigated items from focal trait identification prior to conducting DIF analyses and of reporting item and scale psychometric properties in DIF reports.
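Zumbo's logistic regression framework tests an item for DIF by comparing nested models: trait score only, plus group (uniform DIF), plus the group-by-trait interaction (non-uniform DIF). Below is a minimal sketch of that comparison for a binary item using likelihood-ratio tests; function and variable names are invented for the example.

```python
# Sketch of a logistic-regression DIF test in the spirit of Zumbo (1999):
# nested models compared with likelihood-ratio tests, 1 df each.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def logistic_dif_test(item, trait, group):
    """Return p-values for uniform and non-uniform DIF on one binary item."""
    base = sm.Logit(item, sm.add_constant(trait)).fit(disp=0)
    unif = sm.Logit(item, sm.add_constant(
        np.column_stack([trait, group]))).fit(disp=0)
    full = sm.Logit(item, sm.add_constant(
        np.column_stack([trait, group, trait * group]))).fit(disp=0)
    p_uniform = stats.chi2.sf(2 * (unif.llf - base.llf), df=1)
    p_nonuniform = stats.chi2.sf(2 * (full.llf - unif.llf), df=1)
    # Zumbo also recommends a pseudo-R^2 change as a DIF effect size:
    effect_size = full.prsquared - base.prsquared
    return p_uniform, p_nonuniform, effect_size
```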

5.
This paper investigates the psychometric properties of the Values in Action (VIA) character strengths (Peterson & Seligman, 2004). A sample of 904 South African undergraduate students (77% female, 23% male, 70% black; mean age = 21.07 years, SD = 2.73 years) was assessed using a 380-item questionnaire that included the items from the International Personality Item Pool (IPIP) VIA measure of 24 character strengths as well as additional items based on the underlying theory of the particular constructs. Responses were analysed with the Rasch rating scale model. Reliability coefficients were computed for the retained scale items. The majority (21) of the scales demonstrated satisfactory Rasch model fit and good score reliability. The finding that a large proportion of strengths exhibited differential item functioning for at least one of (1) gender, (2) ethnicity and (3) home language group challenges the assumption that character strengths are necessarily acultural, indicating qualitative distinctions in construct conceptualisations and measurement as a function of emic factors.

6.
Using an item response theory-based approach (i.e. a likelihood ratio test with an iterative procedure), we examined the equivalence of the Rosenberg Self-Esteem Scale (RSES) in a sample of US and Chinese college students. Results from the differential item functioning (DIF) analysis showed that the RSES was not fully equivalent at either the item level or the scale level. The two cultural groups did not use the scale comparably, with the US students showing more extreme responses than the Chinese students. Moreover, we evaluated the practical impact of DIF and found that cultural differences in average self-esteem scores disappeared after the DIF was taken into account. We discuss the implications of our findings for cross-cultural research and provide suggestions for future studies using the RSES in China.
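The "iterative procedure" mentioned here is typically a purification loop: items flagged for DIF are dropped from the anchor set used to match groups, and the tests are rerun until the flagged set stabilizes. A minimal sketch follows; the inner rest-score logistic-regression test is a simplified stand-in for the IRT likelihood ratio test, and all names are invented.

```python
# Sketch of iterative anchor purification for DIF testing. The inner test is
# a rest-score logistic-regression LR test standing in for the IRT-LR test.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lr_dif_pvalue(y, rest_score, group):
    m0 = sm.Logit(y, sm.add_constant(rest_score)).fit(disp=0)
    m1 = sm.Logit(y, sm.add_constant(
        np.column_stack([rest_score, group]))).fit(disp=0)
    return stats.chi2.sf(2 * (m1.llf - m0.llf), df=1)

def purified_dif_items(responses, group, alpha=0.05, max_iter=10):
    """Return item indices still flagged once the anchor set stabilizes."""
    n_items = responses.shape[1]
    flagged = set()
    for _ in range(max_iter):
        anchors = [j for j in range(n_items) if j not in flagged]
        new_flags = set()
        for j in range(n_items):
            rest = responses[:, [a for a in anchors if a != j]].sum(axis=1)
            if lr_dif_pvalue(responses[:, j], rest, group) < alpha:
                new_flags.add(j)
        if new_flags == flagged:
            break
        flagged = new_flags
    return sorted(flagged)
```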

7.
This study examined the factor structure and differential item functioning (DIF) of the Depression Anxiety Stress Scales (DASS; Lovibond & Lovibond, 1995) across sex. The DASS was completed by 201 women and 165 men from the general community. Confirmatory factor analysis (CFA) indicated support for the original three-factor oblique model (factors for depression, anxiety and stress). There was, however, more support for a bifactor model with four orthogonal factors: a general factor on which all the depression, anxiety and stress items load, and specific independent factors for the depression, anxiety and stress items. None of the DASS items showed DIF across sex. The practical, theoretical, research and clinical implications of the findings are discussed.

8.
We evaluated the reliability, validity, and differential item functioning (DIF) of a shorter version of the Defining Issues Test-1 (DIT-1), the behavioural DIT (bDIT), which measures the development of moral reasoning. A total of 353 college students (81 males, 271 females, 1 not reported; age M = 18.64 years, SD = 1.20 years) who were taking introductory psychology classes at a public university in a suburban area of the Southern United States participated in the present study. First, we examined the reliability of the bDIT using Cronbach's α and its concurrent validity with the original DIT-1 using the disattenuated correlation. Second, we compared the test duration between the two measures. Third, we tested each question for DIF between males and females. Findings showed that, first, the bDIT had acceptable reliability and good concurrent validity. Second, the test duration could be significantly shortened by employing the bDIT. Third, the DIF results indicated that the bDIT items did not favour either gender. Practical implications based on the reported findings are discussed.
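The two statistics named here are straightforward to compute. Below is a minimal sketch of Cronbach's α and the disattenuated (correction-for-attenuation) correlation; the helper names are invented for the example, not the authors' code.

```python
# Cronbach's alpha and the disattenuated correlation used for reliability
# and concurrent validity. Illustrative helpers only.
import numpy as np

def cronbach_alpha(items):
    """items: (n_persons, n_items) matrix of item scores."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

def disattenuated_r(x, y, rel_x, rel_y):
    """Observed correlation corrected for unreliability of both measures."""
    return np.corrcoef(x, y)[0, 1] / np.sqrt(rel_x * rel_y)
```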

9.
The aim of this study was to determine whether the items of a European Portuguese reading comprehension test function differently across students from rural and urban areas, which would bias the test's validity and the equity of the assessment. The sample comprised 653 students from the second, third and fourth grades. The presence of differential item functioning (DIF) was analysed using logistic regression and the Mantel–Haenszel procedure. Although 17 items were flagged with DIF, only five items showed non-negligible DIF on all effect-size measures. The evidence of invariance across students from rural and urban backgrounds for most of the items supports the validity of the test, though the five identified items should be further investigated.
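The Mantel–Haenszel procedure pools 2×2 (group × correct) tables across strata of the matching total score into a common odds ratio, which is then mapped onto the ETS delta scale to judge effect size. A simplified sketch (no continuity correction, no variance term) follows; names are invented for the example.

```python
# Simplified Mantel-Haenszel common odds ratio for one binary item, pooling
# group-by-correct tables across total-score strata.
import numpy as np

def mantel_haenszel_or(item, total, group):
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        n = m.sum()
        a = np.sum((group[m] == 0) & (item[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))  # focal, incorrect
        num += a * d / n
        den += b * c / n
    return num / den  # assumes at least one stratum contributes to each sum

# ETS effect size: delta_MH = -2.35 * ln(alpha_MH); |delta_MH| >= 1.5 is the
# conventional threshold for non-negligible ("C-level") DIF.
```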

10.
Recent detection methods for Differential Item Functioning (DIF) include approaches such as Rasch Trees, DIFlasso, GPCMlasso and Item Focussed Trees, all of which, in contrast to well-established methods, can handle metric covariates that induce DIF. A new estimation method is proposed to address their shortcomings, chiefly by combining three central virtues: the use of the conditional likelihood for estimation, the incorporation of linear influence of metric covariates on item difficulty, and the ability to detect different DIF types: certain items showing DIF, certain covariates inducing DIF, or certain covariates inducing DIF in certain items. Each of the approaches mentioned lacks two of these properties. We introduce a method for DIF detection that, first, utilizes the conditional likelihood for estimation combined with group-lasso penalization for item or variable selection and L1 penalization for interaction selection; second, incorporates linear effects instead of approximation through step functions; and third, provides the possibility to investigate any of the three DIF types. The method is described theoretically, and implementation challenges are discussed. A dataset is analysed for all DIF types and shows comparable results between methods. Simulation studies per DIF type reveal competitive performance of cmlDIFlasso, particularly when selecting interactions in the case of large sample sizes and numbers of parameters. Coupled with low computation times, cmlDIFlasso seems a worthwhile option for applied DIF detection.
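The penalization idea can be sketched without the conditional-likelihood machinery: build item-by-covariate interaction terms and let a sparsity penalty decide which survive. The stand-in below uses an ordinary logistic likelihood with a plain L1 penalty (scikit-learn) rather than the paper's conditional likelihood with group-lasso terms, so it only illustrates the selection principle; all names, toy data and tuning values are invented.

```python
# Simplified stand-in for the cmlDIFlasso selection idea: L1-penalized
# logistic regression over item dummies plus item-by-covariate interactions.
# (The actual method uses the conditional likelihood and group-lasso terms.)
import numpy as np
from sklearn.linear_model import LogisticRegression

def dif_interaction_design(responses, covariate):
    """Long format: ability proxy, item dummies, item x covariate terms."""
    n_persons, n_items = responses.shape
    total = responses.sum(axis=1)          # crude ability proxy
    rows, y = [], []
    for p in range(n_persons):
        for i in range(n_items):
            dummies = np.zeros(n_items)
            dummies[i] = 1.0
            rows.append(np.concatenate(
                ([total[p]], dummies, dummies * covariate[p])))
            y.append(responses[p, i])
    return np.array(rows), np.array(y)

rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(200, 8))    # toy data, no real DIF
covariate = rng.normal(size=200)

X, y = dif_interaction_design(responses, covariate)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
# Nonzero coefficients among the last n_items entries flag DIF interactions.
dif_flags = np.flatnonzero(model.coef_[0][-responses.shape[1]:])
```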

11.
The Rosenberg Self-Esteem Scale (RSES) was administered to 425 college students, and the Rasch model from item response theory was used to analyse the item statistics and to test for DIF. The results show that the RSES is unidimensional, with a scale reliability of 0.84. All items except item 8 showed good fit, and the scale is best suited to distinguishing individuals with moderate or lower levels of self-esteem. The DIF analysis found DIF on items 1 and 5, with male students showing higher self-esteem than female students. Compared with classical test theory, analysing the RSES with the Rasch model has clear advantages and provides a basis for further refinement and use of the scale.

12.
We conducted two experimental studies with between-subjects and within-subjects designs to investigate the item response process for personality measures administered in high- versus low-stakes situations. Apart from assessing the measurement validity of the item response process, we examined predictive validity; that is, whether different response models entail different selection outcomes. We found that ideal point response models fit slightly better than dominance response models across high- versus low-stakes situations in both studies. Additionally, fitting ideal point models to the data led to fewer items displaying differential item functioning than fitting dominance models. We also identified several items that functioned as intermediate items in both the faking and honest conditions when ideal point models were fitted, suggesting that the ideal point model is "theoretically" more suitable across these contexts for personality inventories. However, the choice of response model (dominance vs. ideal point) did not have any substantial impact on the validity of personality measures in high-stakes situations, or on the effectiveness of selection decisions such as mean performance or the percentage of fakers selected. These findings are significant: although prior research supports the importance and use of ideal point models for measuring personality, in the case of personality faking ideal point models offer only slightly better measurement validity, and dominance models may be adequate with no loss of predictive validity.
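The dominance/ideal point distinction is easy to see in the shape of the item response function: dominance models are monotone in the trait, while ideal point (unfolding) models peak where the person's trait level matches the item's location. The sketch below uses a 2PL-style curve and a simple Gaussian-distance curve as schematic examples, not the exact models fitted in the study.

```python
# Schematic dominance vs. ideal point item response functions.
import numpy as np

def dominance_prob(theta, a=1.5, b=0.0):
    """2PL-style monotone curve: endorsement rises with the trait."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ideal_point_prob(theta, delta=0.0, scale=1.0):
    """Single-peaked curve: endorsement falls with distance from delta."""
    return np.exp(-((theta - delta) ** 2) / (2.0 * scale**2))

theta = np.linspace(-3, 3, 7)
print(dominance_prob(theta))    # monotone increasing in theta
print(ideal_point_prob(theta))  # peaks at theta = delta, falls off both ways
```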

13.
A model-based modification (SIBTEST) of the standardization index, based upon a multidimensional IRT bias modelling approach, is presented that detects and estimates DIF or item bias simultaneously for several items. A distinction between DIF and bias is proposed. SIBTEST detects bias/DIF without the usual Type I error inflation due to group differences in target ability. In simulations, SIBTEST performs comparably to Mantel–Haenszel in the one-item case. SIBTEST investigates bias/DIF for several items at the test score level (multiple-item DIF, called differential test functioning, DTF), thereby allowing the study of test bias/DIF, in particular bias/DIF amplification or cancellation and the cognitive bases of bias/DIF.

This research was partially supported by Office of Naval Research Cognitive and Neural Sciences Grant N0014-90-J-1940, 4421-548 and National Science Foundation Mathematics Grant NSF-DMS-91-01436. The research reported here is collaborative in every respect and the order of authorship is alphabetical. The assistance of Hsin-hung Li and Louis Roussos in conducting the simulation studies was of great help. Discussions with Terry Ackerman, Paul Holland, and Louis Roussos were very helpful.
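At its core, SIBTEST pools the difference in studied-item performance between reference and focal groups across strata of a matching-subtest score. The sketch below computes that weighted index in its simplest form, omitting SIBTEST's regression correction for group ability differences; names are invented for the example.

```python
# Simplified SIBTEST-style weighted DIF index: the studied item's mean
# difference between groups, pooled over matching-subtest score strata.
# Omits the regression correction that full SIBTEST applies.
import numpy as np

def beta_uni(item, matching_score, group):
    """Positive values indicate the item favours the reference group (0)."""
    beta = 0.0
    n_focal = np.sum(group == 1)
    for s in np.unique(matching_score):
        m = matching_score == s
        ref = item[m & (group == 0)]
        foc = item[m & (group == 1)]
        if len(ref) == 0 or len(foc) == 0:
            continue  # a stratum needs both groups to contribute
        beta += (len(foc) / n_focal) * (ref.mean() - foc.mean())
    return beta
```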
