Similar literature
1.
We investigated measurement equivalence in two antisocial behavior scales (i.e., one scale for adolescents and a second scale for young adults) by examining differential item functioning (DIF) for respondents from single-parent (n = 109) and two-parent families (n = 447). Even though one item in the scale for adolescents and two items in the scale for young adults showed significant DIF, the two scales exhibited non-significant differential test functioning (DTF). Both uniform and nonuniform DIF were investigated and examples of each type were identified. Specifically, uniform DIF was exhibited in the adolescent scale whereas nonuniform DIF was shown in the young adult scale. Implications of DIF results for assessment of antisocial behavior, along with strengths and limitations of the study, are discussed.

2.
This research used logistic regression to model item responses from a popular 360-degree-for-development survey used in a leadership development programme given to middle and upper level European managers in Brussels. The survey contained 106 items on 16 scales. The model used gender of ratee and rater group to identify items that exhibited differential item functioning (DIF). The rater groups were self, boss, peer, and direct report. The sample consisted of 356 survey families where a survey family consisted of a matched set of four surveys: one self, one boss, one peer, and one direct report. The sample contained 88% male and 12% female raters. The sample contained 1424 total surveys. The procedure for flagging items exhibiting differential functioning used effect size computed from Wald chi-square statistics rather than statistical significance, resulting in fewer flagged items. One item exhibited rating anomalies due to the gender of the ratee; 55 items exhibited DIF attributable to rater group. The apparent effect of the DIF was small for each item. An examination of the maximum likelihood parameter estimates suggested the rater group DIF was the result of either hierarchical complexity or organizational contingency. The DIF due to gender conformed to prior expectations of gender-related stereotypical interpretations. This research further suggested that DIF due to environmental complexity or organizational contingency could be a naturally occurring phenomenon in some 360-degree assessments, and that the interpretation of some 360-degree feedback may need to account for the potential existence of such DIF.

3.
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The objective was to provide bounds of the likely DIF effects on these measurement consequences. Five factors were manipulated: test length, percentage of DIF items per form, item type, sample size, and level of group ability difference. Results indicate that the greatest DIF effect was less than 2 points on the 0 to 60 total score scale and about 0.15 on the IRT ability scale. DIF had a limited effect on the ratio of true-score variance to observed-score variance, but its influence on the standard error of estimation for the IRT ability parameter was evident for certain ability values.

4.
The Wisconsin Schizotypy Scales are widely used for assessing schizotypy in nonclinical and clinical samples. However, they were developed using classical test theory (CTT) and have not had their psychometric properties examined with more sophisticated measurement models. The present study employed item response theory (IRT) as well as traditional CTT to examine psychometric properties of four of the schizotypy scales on the item and scale level, using a large sample of undergraduate students (n = 6,137). In addition, we investigated differential item functioning (DIF) for sex and ethnicity. The analyses revealed many strengths of the four scales, but some items had low discrimination values and many items had high DIF. The results offer useful guidance for applied users and for future development of these scales.

5.
Because of the practical, theoretical, and legal implications of differential item functioning (DIF) for organizational assessments, studies of measurement equivalence are a necessary first step before scores can be compared across individuals from different groups. However, commonly recommended criteria for evaluating results from these analyses have several important limitations. The present study proposes an effect size index for confirmatory factor analytic (CFA) studies of measurement equivalence to address one of these limitations. The application of this index is illustrated with personality data from American English, Greek, and Chinese samples. Results showed a range of nonequivalence across these samples, and these differences were linked to the observed effects of DIF on the outcomes of the assessment (i.e., group-level mean differences and adverse impact).

6.
This study investigated gender-based differential item functioning (DIF) in science literacy items included in the Program for International Student Assessment (PISA) 2012. Prior research has suggested the presence of such DIF in large-scale surveys. Our study extends the empirical literature by examining gender-based DIF differences at the country level in order to gain a better overall picture of how cultural and national differences affect the occurrence of uniform and nonuniform DIF. Our statistical results indicate the existence of widespread gender-based DIF in PISA, with estimates of the percentage of potentially biased items ranging between 2 and 44% (M = 16, SD = 9.9). Our reliance on nationally representative country samples allows these findings to have wide applicability.

7.
In this study, an item response theory-based differential functioning of items and tests (DFIT) framework (N. S. Raju, W. J. van der Linden, & P. F. Fleer, 1995) was applied to a Likert-type scale. Several differential item functioning (DIF) analyses compared the item characteristics of a 10-item satisfaction scale for Black and White examinees and for female and male examinees. F. M. Lord's (1980) chi-square and the extended signed area (SA) measures were also used. The results showed that the DFIT indices consistently performed in the expected manner. The results from Lord's chi-square and the SA procedures were somewhat varied across comparisons. A discussion of these results along with an illustration of an item with significant DIF and suggestions for future DIF research are presented.

8.
This study used an ideal point response model to examine the extent to which applicants and incumbents differ when responding to personality items. It was hypothesized that applicants' responses would exhibit less folding at high trait levels than incumbents' responses. We used sample data from applicants (N = 1,509) and incumbents (N = 1,568) who completed the 16 Personality Questionnaire Select. Differential item functioning (DIF) and differential test functioning (DTF) analyses were conducted using the generalized graded unfolding model, which is based on ideal point model assumptions. Out of the 90 items, 50 showed DIF; however, only 11 were in the hypothesized direction. DTF was significant for 3 of the 12 scales; 2 were in the hypothesized direction.

9.
The aim of this study was to determine whether the items from a reading comprehension test in European Portuguese function differently across students from rural and urban areas, which would threaten the validity of the test and equity in assessment. The sample was composed of 653 students from second, third and fourth grades. The presence of differential item functioning (DIF) was analysed using logistic regression and the Mantel–Haenszel procedure. Although 17 items were flagged with DIF, only five items showed non-negligible DIF in all effect-size measures. The evidence of invariance across students with rural or urban backgrounds for most of the items supports the validity of the test, though the five identified items should be further investigated.
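
As an illustration of the Mantel–Haenszel procedure named in the abstract above, the following is a minimal sketch, not the study's own code. It assumes one dichotomous item, examinees already matched into total-score strata, and the conventional ETS delta transformation (-2.35 times the log of the common odds ratio). The function name `mantel_haenszel_dif` and the example counts are hypothetical.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    strata: one (a, b, c, d) tuple per matched total-score level, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    Returns (common odds ratio, ETS delta, continuity-corrected chi-square).
    """
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d      # group sizes in this stratum
        m1, m0 = a + c, b + d            # correct / incorrect totals
        sum_a += a
        sum_expected += n_ref * m1 / n   # E(a) under the no-DIF hypothesis
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    if num == 0 or den == 0 or sum_var == 0:
        raise ValueError("statistics undefined for these counts")
    alpha = num / den                    # common odds ratio
    delta = -2.35 * math.log(alpha)      # ETS delta scale (assumed convention)
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    return alpha, delta, chi2


# Hypothetical counts: the reference group does better in the single stratum.
alpha, delta, chi2 = mantel_haenszel_dif([(30, 10, 20, 20)])
print(alpha, delta, chi2)
```

A common odds ratio above 1 (negative delta) indicates that the item favours the reference group after matching on total score. The chi-square is referred to a chi-square distribution with one degree of freedom.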

10.
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns. It also yielded more DIF items with larger effect sizes and more consistent item response patterns by substantive aspects (e.g., reading comprehension processes and cognitive complexity of items). Based on our findings, we suggest empirically evaluating the homogeneity assumption in international assessments because international populations cannot be assumed to have homogeneous item response patterns. Otherwise, differences in response patterns within these populations may be under-detected when conducting manifest DIF analyses. Detecting differences in item responses across international examinee populations has implications for the generalizability and meaningfulness of DIF findings as they apply to heterogeneous examinee subgroups.

11.
This study investigated differential item functioning (DIF) mechanisms in the context of differential testlet effects across subgroups. Specifically, we investigated DIF manifestations when the stochastic ordering assumption on the nuisance dimension in a testlet does not hold. DIF hypotheses were formulated analytically using a parametric marginal item response function approach and compared with empirical DIF results from a unidimensional item response theory approach. The comparisons were made in terms of type of DIF (uniform or non-uniform) and direction (whether the focal or reference group was advantaged). In general, the DIF hypotheses were supported by the empirical results, showing the usefulness of the parametric approach in explaining DIF mechanisms. Both analytical predictions of DIF and the empirical results provide insights into conditions where a particular type of DIF becomes dominant in a specific DIF direction, which is useful for the study of DIF causes.

12.
This article considers and illustrates a strategy to study effects of school context on differential item functioning (DIF) in large-scale assessment. The approach employs a hierarchical generalized linear modeling framework to (a) detect DIF, and (b) identify school-level correlates of the between-group differences in item performance. To illustrate, I investigated (a) whether any of the civic skills items used in the U.S. Civic Education Study of the International Association for the Evaluation of Educational Achievement displayed ethnic–racial DIF, and (b) the extent to which the ethnic–racial DIF was related to teacher-reported opportunity to learn (OTL) major civic-related topics. The results indicated that 3 of the 13 items displayed ethnic–racial DIF. After adjusting for OTL, two of the three flagged items ceased to exhibit DIF. Implications of this approach are discussed.

13.
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.

14.
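Detecting Differential Item Functioning Using Mean and Covariance Structure Models (total citations: 1; self-citations: 0; citations by others: 1)
This article explains the principles and procedures for using multi-group mean and covariance structure (MACS) models to detect uniform and nonuniform differential item functioning (DIF) in polytomous items, illustrates the approach by detecting DIF in a moral self-concept scale, and evaluates the method. Compared with item response theory, MACS uses modification indices in a systematic, iterative manner to detect DIF and provides several fit indices that jointly assess model fit. Compared with standard confirmatory factor analysis, MACS can detect not only nonuniform DIF but also uniform DIF. Using MACS to detect DIF is a method worth recommending.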

15.
For detecting differential item functioning (DIF) between two or more groups of test takers in the Rasch model, their item parameters need to be placed on the same scale. Typically this is done by means of choosing a set of so-called anchor items based on statistical tests or heuristics. Here the authors suggest an alternative strategy: By means of an inequality criterion from economics, the Gini Index, the item parameters are shifted to an optimal position where the item parameter estimates of the groups best overlap. Several toy examples, extensive simulation studies, and two empirical application examples are presented to illustrate the properties of the Gini Index as an anchor point selection criterion and compare its properties to those of the criterion used in the alignment approach of Asparouhov and Muthén. In particular, the authors show that, in addition to the globally optimal position for the anchor point, the criterion plot contains valuable additional information and may help discover unaccounted DIF-inducing multidimensionality. They further provide mathematical results that enable an efficient sparse grid optimization and make it feasible to extend the approach, for example, to multiple group scenarios.
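
The idea of using the Gini Index as an anchor point criterion can be sketched as follows. This is an illustrative toy version rather than the authors' implementation. It assumes that the criterion is maximized (so that most between-group differences in item difficulty are near zero and a few are large) and that the candidate shifts are the item-wise differences between the two groups' estimates. The function names `gini_index` and `select_anchor_shift` and the difficulty values are hypothetical.

```python
def gini_index(values):
    """Gini index of the absolute values (0 = equal, near 1 = concentrated)."""
    xs = sorted(abs(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def select_anchor_shift(b_ref, b_foc):
    """Pick the shift of the focal-group difficulties that maximizes the Gini
    index of the remaining item-wise differences.

    Assumption: only shifts that align at least one item exactly are tried.
    """
    candidates = sorted({r - f for r, f in zip(b_ref, b_foc)})
    best_shift, best_gini = None, -1.0
    for c in candidates:
        diffs = [r - (f + c) for r, f in zip(b_ref, b_foc)]
        g = gini_index(diffs)
        if g > best_gini:
            best_shift, best_gini = c, g
    return best_shift, best_gini


# Hypothetical difficulty estimates: three items differ by a constant 0.5,
# the fourth item differs by 1.5 and would be the DIF candidate.
b_reference = [-1.0, 0.0, 1.0, 2.0]
b_focal = [-0.5, 0.5, 1.5, 3.5]
shift, gini = select_anchor_shift(b_reference, b_focal)
print(shift, gini)
```

With these toy values, the selected shift aligns the first three items and leaves all of the remaining difference on the fourth item, which is the kind of sparse pattern the criterion is meant to reward.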

16.
The aim of the current study was to reduce the number of items in the 48-item hypomanic personality scale (HPS) and determine whether a unidimensional scale of the hypomanic trait could be derived. Previously collected HPS data from university students (n = 318) were fitted to the Rasch model (one-parameter item response theory). Overall scale and individual item fit statistics were used to judge fit to the model, and item maps were employed to determine coverage of the trait. Cronbach's alpha and correlations with other questionnaires pre- and post-item reduction were evaluated. Rasch analysis indicated that the original HPS was not unidimensional and showed significant redundancy and differential item functioning by age and gender. An iterative process of item reduction produced a 20-item HPS (HPS-20) that retained the concepts of the original HPS and had excellent fit to the Rasch model (χ² p = 0.27). Unidimensionality of the HPS-20 was confirmed. The traditional psychometric properties of the HPS-20 and coverage of the underlying hypomanic construct were similar to the original. It was possible to derive a unidimensional measure of the hypomanic trait. Further use of the HPS-20 is encouraged as it may increase understanding of the risk factors for affective disorders.

17.
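This article applies the multidimensional testlet response model (MTRM) to the detection of differential item functioning (DIF) in multidimensional testlet-based tests. Through a simulation study and an applied study, it examines the accuracy, effectiveness, and influencing factors of the MTRM in DIF detection, and compares it with the multidimensional random coefficients multinomial logit model (MRCMLM), which ignores testlet effects. The results show that: (1) as sample size increases, the MTRM's detection rate for valid DIF values rises, its error rate falls, and its results are more stable across conditions; (2) compared with the MRCMLM, the MTRM-based DIF detection model has a higher detection rate and is less affected by other factors; (3) when testlet effects in the test are small, the results of the MTRM and the MRCMLM differ little, but the MTRM shows better model fit.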

18.
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%-50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.

19.
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

20.
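Based on item response theory, this study proposes a new method for detecting DIF that has high power and a low Type I error rate: the LP method (Likelihood Procedure). The method is introduced using DIF testing of items under the two-parameter logistic model (2PLM) as an example. The effectiveness of the LP method is examined by comparing it with three commonly used DIF detection methods: the MH method, Lord's chi-square test, and Raju's area measure. The study also explores how sample size, test length, differences between the ability distributions of the focal and reference groups, and DIF magnitude may affect the effectiveness of the LP method. The simulation study led to the following conclusions: (1) the LP method is more sensitive and more robust than the MH method and Lord's chi-square test; (2) the LP method is more reasonable than Raju's area measure; (3) the power of the LP method increases as sample size or DIF magnitude increases; (4) when the reference and focal groups do not differ in ability, the power of the LP method is higher under all conditions than when the groups differ in ability; (5) the LP method has good power for both uniform and nonuniform DIF, and its power is higher for uniform DIF than for nonuniform DIF. The LP method can be readily extended to multidimensional and polytomously scored items.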
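
To make the 2PLM setting and the area-based comparison mentioned in the abstract above concrete, here is a minimal sketch. It does not implement the LP method, whose details are not given here. It assumes the 2PL item response function without the scaling constant D and approximates the signed and unsigned areas between the reference-group and focal-group item characteristic curves with the trapezoid rule on a finite ability range. The names `p_2pl` and `icc_areas` and all parameter values are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response (no D = 1.7 scaling, by assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def icc_areas(a_ref, b_ref, a_foc, b_foc, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoid-rule approximation of the signed and unsigned areas between
    the reference and focal item characteristic curves on [lo, hi]."""
    h = (hi - lo) / steps
    signed = unsigned = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        diff = p_2pl(theta, a_ref, b_ref) - p_2pl(theta, a_foc, b_foc)
        weight = 0.5 if i in (0, steps) else 1.0  # end points count half
        signed += weight * diff
        unsigned += weight * abs(diff)
    return signed * h, unsigned * h


# Uniform DIF: equal discrimination, the item is harder for the focal group.
print(icc_areas(1.0, 0.0, 1.0, 0.5))

# Nonuniform DIF: equal difficulty, different discrimination. The curves cross,
# so the signed area nearly cancels while the unsigned area does not.
print(icc_areas(1.0, 0.0, 2.0, 0.0))
```

The second call shows why signed measures can miss nonuniform DIF: the advantage switches sides at the crossing point, so positive and negative regions offset each other.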
