Similar literature
1.
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

2.
This report documents relationships between differential item functioning (DIF) identification and: (1) item–trait association, and (2) scale multidimensionality in personality assessment. Applying the logistic regression model of Zumbo [Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.], DIF effect size was found to become increasingly inflated as the investigated items' associations with trait scores decreased. Similar patterns were noted for the influence of scale multidimensionality on DIF identification. Individuals who investigate DIF in personality assessment applications are provided with estimates regarding the impact of the magnitude of item and trait association and scale multidimensionality on DIF occurrence and effect size. The results emphasize the importance of excluding investigated items in focal trait identification prior to conducting DIF analyses and reporting item and scale psychometric properties in DIF reports.
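As a reading aid for the logistic regression framework named in this abstract, the model is commonly written as below. This is a generic sketch, not a formula quoted from the report: the symbols $X_j$ (matching or trait score of person $j$), $G_j$ (group indicator), and the $\beta$ coefficients are notation assumed here for illustration.

```latex
\[
  \operatorname{logit} P(u_{ij} = 1)
    = \beta_0 + \beta_1 X_j + \beta_2 G_j + \beta_3 X_j G_j
\]
% Uniform DIF:    \beta_2 \neq 0 with \beta_3 = 0 (a constant group shift).
% Nonuniform DIF: \beta_3 \neq 0 (the group effect varies with X_j).
```

Under this formulation, a DIF effect size is typically taken as the change in a pseudo-$R^2$ between the model containing only $X_j$ and the model that adds the group terms.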

3.
The aim of this study was to determine whether the items from a reading comprehension test in European Portuguese function differently across students from rural and urban areas, which would bias the test's validity and undermine equity in assessment. The sample was composed of 653 students from second, third and fourth grades. The presence of differential item functioning (DIF) was analysed using logistic regression and the Mantel–Haenszel procedure. Although 17 items were flagged with DIF, only five items showed non-negligible DIF in all effect-size measures. The evidence of invariance across students with rural or urban backgrounds for most of the items supports the validity of the test, though the five identified items should be further investigated.
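The Mantel–Haenszel procedure mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the analysis used in the study: the function name `mantel_haenszel_dif` and the input layout (one 2x2 table per matched total-score level) are assumptions made for the example. It computes the common odds ratio and its transform to the ETS delta scale, $-2.35 \ln \alpha$.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (alpha_MH, delta_MH) for one item.

    `strata` is a list of (a, b, c, d) counts, one tuple per matched
    total-score level (hypothetical layout for this sketch):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        num += a * d / n
        den += b * c / n
    if den == 0 or num == 0:
        raise ValueError("odds ratio undefined for these counts")
    alpha = num / den                 # common odds ratio across strata
    delta = -2.35 * math.log(alpha)   # ETS delta metric
    return alpha, delta


if __name__ == "__main__":
    # Made-up counts for two score levels, for illustration only.
    example = [(20, 10, 10, 20), (30, 5, 20, 10)]
    alpha, delta = mantel_haenszel_dif(example)
    print(f"alpha_MH = {alpha:.3f}, delta_MH = {delta:.3f}")
```

An odds ratio above 1 (negative delta) indicates that, at matched score levels, the item favours the reference group; values near 1 indicate negligible DIF.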

4.
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns. It also yielded more DIF items with larger effect sizes and more consistent item response patterns by substantive aspects (e.g., reading comprehension processes and cognitive complexity of items). Based on our findings, we suggest empirically evaluating the homogeneity assumption in international assessments because international populations cannot be assumed to have homogeneous item response patterns. Otherwise, differences in response patterns within these populations may be under-detected when conducting manifest DIF analyses. Detecting differences in item responses across international examinee populations has implications for the generalizability and meaningfulness of DIF findings as they apply to heterogeneous examinee subgroups.

5.
A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items through a combination of the usual notion of DIF for correct/incorrect responses and information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as a chi-square random variable.

6.
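A preliminary study of differential item functioning in a Chinese vocabulary test   Total citations: 6, self-citations: 1, citations by others: 5
Cao Yiwei (曹亦薇), Zhang Houcan (张厚粲). Acta Psychologica Sinica (《心理学报》), 1999, 32(4): 460-467
This paper applies two different methods to detect DIF in 36 vocabulary items from an operational Chinese vocabulary test. Comparisons were made between male and female students and between urban and suburban students in a sample of more than 1,400 third-year junior middle school students. The male-female analysis identified 7 items showing uniform DIF. For the urban-suburban comparison, 7 DIF items were identified by both methods, of which 5 showed uniform DIF and 2 showed nonuniform DIF. The paper also discusses possible causes of the DIF.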

7.
The Wisconsin Schizotypy Scales are widely used for assessing schizotypy in nonclinical and clinical samples. However, they were developed using classical test theory (CTT) and have not had their psychometric properties examined with more sophisticated measurement models. The present study employed item response theory (IRT) as well as traditional CTT to examine psychometric properties of four of the schizotypy scales on the item and scale level, using a large sample of undergraduate students (n = 6,137). In addition, we investigated differential item functioning (DIF) for sex and ethnicity. The analyses revealed many strengths of the four scales, but some items had low discrimination values and many items had high DIF. The results offer useful guidance for applied users and for future development of these scales.

8.
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

9.
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.

10.
This research used logistic regression to model item responses from a popular 360-degree-for-development survey used in a leadership development programme given to middle and upper level European managers in Brussels. The survey contained 106 items on 16 scales. The model used gender of ratee and rater group to identify items that exhibited differential item functioning (DIF). The rater groups were self, boss, peer, and direct report. The sample consisted of 356 survey families where a survey family consisted of a matched set of four surveys: one self, one boss, one peer, and one direct report. The sample contained 88% male and 12% female raters. The sample contained 1424 total surveys. The procedure for flagging items exhibiting differential functioning used effect size computed from Wald chi-square statistics rather than statistical significance, resulting in fewer flagged items. One item exhibited rating anomalies due to the gender of the ratee; 55 items exhibited DIF attributable to rater group. The apparent effect of the DIF was small with each item. An examination of the maximum likelihood parameter estimates suggested the rater group DIF was the result of either hierarchical complexity or organizational contingency. The DIF due to gender conformed to prior expectations of gender-related stereotypical interpretations. This research further suggested that DIF due to environmental complexity or organizational contingency could be a naturally occurring phenomenon in some 360-degree assessment, and that the interpretation of some 360-degree feedback could need to include the potential for such DIF to exist.

11.
Cross-cultural researchers have not used cultural dimensions to predict when differential item functioning (DIF) in attitude survey items is likely to occur. Predictive hypotheses for items related to supervision on a global corporate survey were developed based on 3 of Hofstede's (1991a) dimensions. In some cases, greater DIF was found on hypothesized items between countries differing on cultural dimensions. Implications for the use of this framework and DIF in examining multinational employee opinion surveys are discussed.

12.
This study investigated the equivalence of different types of informants, such as children (or early adolescents) and parents, in evaluating child externalizing and internalizing problems. We applied a polytomous item response theory (IRT) model for the Strengths and Difficulties Questionnaire (SDQ). We obtained responses to three subscales (Conduct Problems, Hyperactivity/Inattention, and Emotional Symptoms) from 541 elementary school students aged 10–12 years, fathers for 233 students, mothers for 275 students, and the homeroom teachers for 524 students. Expected values on the individual items, calculated from the discrimination and threshold parameters, were compared among students, fathers, and mothers as an investigation of differential item functioning (DIF) or differential informant functioning. Assessments of both externalizing and internalizing problems were mostly equivalent between fathers and mothers, and most items for externalizing problems functioned equally between students and parents, whereas items for internalizing problems showed DIF between them. The IRT analysis also showed that the intervals between response categories varied across items, particularly for the conduct problems items “fight” and “steal,” and that positively worded items had extremely low thresholds.

13.
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The objective was to provide bounds of the likely DIF effects on these measurement consequences. Five factors were manipulated: test length, percentage of DIF items per form, item type, sample size, and level of group ability difference. Results indicate that the greatest DIF effect was less than 2 points on the 0 to 60 total score scale and about 0.15 on the IRT ability scale. DIF had a limited effect on the ratio of true-score variance to observed-score variance, but its influence on the standard error of estimation for the IRT ability parameter was evident for certain ability values.

14.
This article considers and illustrates a strategy to study effects of school context on differential item functioning (DIF) in large-scale assessment. The approach employs a hierarchical generalized linear modeling framework to (a) detect DIF, and (b) identify school-level correlates of the between-group differences in item performance. To illustrate, I investigated (a) whether any of the civic skills items used in the U.S. Civic Education Study of the International Association for the Evaluation of Educational Achievement displayed ethnic–racial DIF, and (b) the extent to which the ethnic–racial DIF was related to teacher-reported opportunity to learn (OTL) major civic-related topics. The results indicated that 3 of the 13 items displayed ethnic–racial DIF. After adjusting for OTL, two of the three flagged items ceased to exhibit DIF. Implications of this approach are discussed.

15.
In this study, an item response theory-based differential functioning of items and tests (DFIT) framework (N. S. Raju, W. J. van der Linden, & P. F. Fleer, 1995) was applied to a Likert-type scale. Several differential item functioning (DIF) analyses compared the item characteristics of a 10-item satisfaction scale for Black and White examinees and for female and male examinees. F. M. Lord's (1980) chi-square and the extended signed area (SA) measures were also used. The results showed that the DFIT indices consistently performed in the expected manner. The results from Lord's chi-square and the SA procedures were somewhat varied across comparisons. A discussion of these results along with an illustration of an item with significant DIF and suggestions for future DIF research are presented.

16.
This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed for one group than the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items and some effects were quite large. Differentially-functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.

17.
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is proposed to identify DIF items as outliers in the multivariate space. For low dimensionalities, up to 2–3 groups, a simple graphical tool is derived. We illustrate our approach with a reanalysis of data from Kim, Cohen, and Park (1995) on using calculators for a mathematics test.

18.
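Based on item response theory, this study proposes a new method for detecting DIF that has high power and a low Type I error rate: the LP method (Likelihood Procedure). The method is introduced using DIF testing of items under the 2PLM as an example. The effectiveness of the LP method is studied by comparing it with three commonly used DIF detection methods (the MH method, Lord's chi-square test, and Raju's area measure), and the possible influence of related factors such as sample size, test length, differences between the ability distributions of the focal and reference groups, and the magnitude of DIF is also examined. The simulation study led to the following conclusions: (1) the LP method is more sensitive and more robust than the MH method and Lord's chi-square method; (2) the LP method is more reasonable than Raju's area measure; (3) the power of the LP method increases as the examinee sample size or the magnitude of DIF increases; (4) when the reference and focal groups do not differ in ability, the power of the LP method under all conditions is higher than when the two groups differ in ability; (5) the LP method has good power for both uniform and nonuniform DIF, and its power is higher for uniform DIF than for nonuniform DIF. The LP method can readily be extended and applied to multidimensional and polytomously scored items.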
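To make the two-parameter logistic model (2PLM) and the uniform versus nonuniform DIF distinction in this abstract concrete, here is a small sketch. It does not implement the LP method, whose details are not given in the abstract. The function names, the ability grid, and the grid-based classification rule are all assumptions for illustration: the gap between the reference and focal item characteristic curves keeps one sign under uniform DIF and changes sign when the curves cross within the grid.

```python
import math


def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta), with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def icc_gaps(theta_grid, ref, focal):
    """Reference-minus-focal probability at each ability level.
    `ref` and `focal` are (a, b) pairs (hypothetical parameter values)."""
    return [icc_2pl(t, *ref) - icc_2pl(t, *focal) for t in theta_grid]


def classify_dif(gaps, tol=1e-9):
    """Illustrative rule only: 'uniform' if the gap keeps one sign over
    the grid, 'nonuniform' if it changes sign, 'none' if it is zero."""
    has_pos = any(g > tol for g in gaps)
    has_neg = any(g < -tol for g in gaps)
    if has_pos and has_neg:
        return "nonuniform"
    if has_pos or has_neg:
        return "uniform"
    return "none"


if __name__ == "__main__":
    grid = [x / 2.0 for x in range(-6, 7)]  # theta from -3 to 3
    # Same discrimination, different difficulty: curves never cross.
    print(classify_dif(icc_gaps(grid, (1.0, 0.0), (1.0, 0.5))))
    # Different discrimination, same difficulty: curves cross at theta = 0.
    print(classify_dif(icc_gaps(grid, (1.5, 0.0), (0.5, 0.0))))
```

Because the rule only inspects a finite grid, curves that cross outside the grid would be labelled uniform here; a formal test compares the estimated item parameters instead.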

19.
We investigated measurement equivalence in two antisocial behavior scales (i.e., one scale for adolescents and a second scale for young adults) by examining differential item functioning (DIF) for respondents from single-parent (n = 109) and two-parent families (n = 447). Even though one item in the scale for adolescents and two items in the scale for young adults showed significant DIF, the two scales exhibited non-significant differential test functioning (DTF). Both uniform and nonuniform DIF were investigated and examples of each type were identified. Specifically, uniform DIF was exhibited in the adolescent scale whereas nonuniform DIF was shown in the young adult scale. Implications of DIF results for assessment of antisocial behavior, along with strengths and limitations of the study, are discussed.

20.
The authors report differential item functioning (DIF) between Black and White participants completing the 60-item Padua Inventory (PI) for obsessive-compulsive disorder (OCD). The authors use an Internet-generated sample that included 105 Blacks, 67 Hispanics, 582 Whites, and 136 additional participants reporting an OCD diagnosis. Factor analysis replicated prior work indicating the PI consists of four factors: contamination fears, checking behaviors, impaired control over thoughts, and fear of losing control over impulses. On the contamination subscale, nonclinical Black and Hispanic mean scores were as high as the OCD group. Comparing Blacks to Whites, the authors applied an item response theory, DIF-graded response model to each factor and found significant DIF on eight items, with biased items in each factor. Results suggest that extraneous factors contribute to racial differences on scores. Cultural practices and fear of being negatively stereotyped may contribute to item bias.
