Found 20 similar articles (search time: 0 ms)
Methods for the identification of differential item functioning (DIF) in Rasch models are typically restricted to the case of two subgroups. A boosting algorithm is proposed that is able to handle the more general setting where DIF can be induced by several covariates at the same time. The covariates can be both continuous and (multi‐)categorical, and interactions between covariates can also be considered. The method works for a general parametric model for DIF in Rasch models. Since the boosting algorithm selects variables automatically, it is able to detect the items which induce DIF. It is demonstrated that boosting competes well with traditional methods in the case of subgroups. The method is illustrated by an extensive simulation study and an application to real data.

A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items by combining the usual notion of DIF for correct/incorrect responses with information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, the difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as a chi-square random variable.

Body Image, 2014, 11(3): 206-209
Many widely used measures of body image were developed using all-female samples and thus may not adequately capture the male experience of body dissatisfaction. The current study examined differential item functioning (DIF) in three commonly used measures of body image: the Body Shape Questionnaire (N = 590, 39.7% male), the Body Dissatisfaction subscale of the Eating Disorders Inventory (N = 529, 44.6% male), and the Shape and Weight Concern subscales of the Eating Disorders Examination Questionnaire (N = 1116, 43.5% male). Participants completed a series of measures evaluating body image and eating pathology. Results showed statistically significant DIF in several of the items; one item met criteria for clinically significant DIF. While most items did not show clinically elevated levels of DIF, additional evaluation is necessary to determine the overall quality of the measures in terms of capturing the experience of male body image concerns.

Differential item functioning (DIF) is one technique for comparing ethnic populations that test makers employ to help ensure the fairness of their tests. The purpose of this ethnic comparison study is to investigate factors that may have a significant influence on DIF values associated with 217 SAT and 234 GRE analogy items obtained by comparing large samples of Black and White examinees matched for total verbal score. In one study, five significant regression predictors of ethnic differences were found to account for approximately 30% of the DIF variance. A second study replicated these findings. These significant ethnic comparisons are interpreted as consistent with a cultural/contextualist framework, although competing explanations involving socioeconomic status and biological contributions could not be ruled out. Practical implications are discussed.

To date, the statistical software designed for assessing differential item functioning (DIF) with Mantel-Haenszel procedures has employed the following statistics: the Mantel-Haenszel chi-square statistic, the generalized Mantel-Haenszel test and the Mantel test. These statistics permit detecting DIF in dichotomous and polytomous items, although they limit the analysis to two groups. In contrast, this article describes a new approach (and the related software) that, using the generalized Mantel-Haenszel statistic proposed by Landis, Heyman, and Koch (1978), permits DIF assessment in multiple groups, both for dichotomous and polytomous items. The program is free of charge and is available in the following languages: Spanish, English and Portuguese.
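
As a rough illustration of the basic two-group Mantel-Haenszel procedure mentioned above (not the generalized multiple-group statistic or the software the article describes), the sketch below computes the common odds ratio and the continuity-corrected chi-square for one dichotomous item. The function name and the table layout are assumptions made for this example.

```python
def mantel_haenszel(strata):
    """Two-group Mantel-Haenszel statistics for one dichotomous item.

    strata: list of (a, b, c, d) tables, one per matched total-score level:
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    Returns (common odds ratio, continuity-corrected chi-square, 1 df).
    """
    num = den = 0.0                  # pieces of the common odds ratio
    sum_a = sum_e = sum_v = 0.0      # observed, expected, variance of a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue                 # stratum carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d  # group sizes in this stratum
        m1, m0 = a + c, b + d        # correct / incorrect totals
        sum_a += a
        sum_e += n_ref * m1 / n
        sum_v += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    odds_ratio = num / den
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    return odds_ratio, chi2


# Hypothetical counts: one score level where the reference group does better.
odds_ratio, chi2 = mantel_haenszel([(15, 5, 10, 10)])
print(odds_ratio, chi2)
```

An odds ratio near 1 suggests no uniform DIF for the item at matched score levels; values far from 1 favor one group.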

The authors describe and use four methods for detecting differential item functioning in polytomous items: Mantel, Generalized Mantel-Haenszel (GMH), Ordinal Logistic Regression (RLO), and Discriminant Logistic Regression (RLD). For each procedure, the theoretical model and the measure of effect size are described. The data from the "Reading Comprehension Test" from the PISA 2000 evaluation program were analyzed using a cross-validation design. Two booklets were independently evaluated in the American and Spanish samples. Adopting as the decision rule the significance of the statistical test and the measurement of the effect size, agreement among the evaluated procedures was total for two of the analyzed items.

Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.

Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item characteristics of the Authority Acceptance scale from the Teacher Observation of Classroom Adaptation-Revised (AA-TOCA-R; L. Larsson-Werthamer, S. G. Kellam, & L. Wheeler, 1991) in 8,820 kindergarten children and estimated the degree of differential item functioning (DIF) by gender and race/urban status. The mean level of latent conduct problems was best represented by behaviors such as being stubborn, breaking rules, and being disobedient, whereas breaking things and taking others' property best represented the construct at one standard deviation above the mean. DIF by gender was detected, such that at equivalent levels of latent conduct problems, males received more endorsements of overt behaviors from teachers, whereas females received more endorsements of nonphysical behaviors. Moreover, overt behaviors were better discriminators of latent conduct problems for males, and nonphysical behaviors were better discriminators of latent conduct problems for females. Differences across race/urban status were not found to be conceptually meaningful. The authors' analyses also suggest that the item scaling of the AA-TOCA-R may be best represented by 5e categories instead of 6. These findings provide support for the use of IRT modeling to examine item characteristics of conduct problem scales and DIF to test for scalar equivalence across diverse subpopulations.  相似文献   

Background/Objective: The Orgasm Rating Scale (ORS) assesses the subjective orgasm experience in the context of a sexual relationship. It is composed of four dimensions attributed to the orgasm (Affective, Sensory, Intimacy, and Rewards). The purpose is to analyse the factorial invariance of the ORS across groups, to examine the metric equivalence across sex, and to present the standard scores. Method: A total of 1,472 Spanish adults (715 men and 757 women) were evaluated. They were distributed across age groups (18-34, 35-49, and 50 years old and older). Factorial invariance across different groups and the differential functioning of the items across sex were analyzed, internal consistency was examined, and the standard scores were developed. Results: The structure of the ORS showed strict measurement invariance across sex, relationship status, sexual orientation and education level. It also reached scalar measurement invariance across age range and duration of the relationship. Some items showed differential functioning between sexes. Conclusions: The Spanish version of the ORS is invariant across different groups at a factorial level, and it shows equivalence across sex in most of its items at a metric level. The standard scores allow a more accurate assessment of the subjective orgasm experience in the context of a sexual relationship.

Anxiety is one of the psychological problems with the highest prevalence, and the State Trait Anxiety Inventory (STAI) is one of the instruments used to measure it. This questionnaire assesses Trait Anxiety (understood as a personality factor that predisposes one to suffer from anxiety) and State Anxiety (which refers to environmental factors that protect from or generate anxiety). The questionnaire was adapted in Spain in 1982. The goal of the study is therefore to review the current psychometric properties of the STAI. A total of 1036 adults took part in the study. Cronbach's alpha reliability was .90 for Trait Anxiety and .94 for State Anxiety. Factor analysis showed similar results compared with the original data. Moreover, a differential item functioning (DIF) analysis was carried out to explore sex bias. Only one of the 40 items showed DIF problems. Lastly, a t-test was run comparing the original and current values; whereas Trait Anxiety varied by 1 point, State Anxiety showed differences of up to 6 points. In general, this result shows that the STAI has maintained adequate psychometric properties and has also been sensitive to increased environmental stimuli that produce stress.
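
For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's alpha is computed from a respondents-by-items score matrix. The function name and the toy data are assumptions for illustration only and are unrelated to the STAI data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: three respondents, two perfectly consistent items.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

The sketch assumes complete data and a total score with nonzero variance; real analyses would also need to handle missing responses and reverse-keyed items.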

In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni-corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact.
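
The decision rule described above can be sketched for the simplest case: a dichotomous item whose loading and intercept are constrained together, which gives a likelihood ratio test with 2 degrees of freedom. This is only an illustration under that assumption; the function name, the log-likelihood values, and the choice of the number of tests used for the Bonferroni correction are hypothetical, and polytomous items would involve more parameters.

```python
from math import exp


def lr_dif_test(loglik_free, loglik_constrained, n_tests, alpha=0.05):
    """Likelihood ratio DIF test for one item against a free baseline.

    loglik_free:        log-likelihood with the studied item's loading and
                        intercept free across groups (free-baseline model).
    loglik_constrained: log-likelihood with both parameters held equal.
    n_tests:            number of items tested (Bonferroni denominator).
    """
    g2 = 2.0 * (loglik_free - loglik_constrained)
    # For 2 degrees of freedom the chi-square survival function has the
    # closed form exp(-x / 2), so no statistics library is needed here.
    p_value = exp(-g2 / 2.0)
    critical_p = alpha / n_tests     # Bonferroni-corrected critical p value
    return g2, p_value, p_value < critical_p


# Hypothetical log-likelihoods for one item in a 10-item scale.
print(lr_dif_test(-1000.0, -1006.0, n_tests=10))
```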

Differential item functioning (DIF) is an important issue of interest in psychometrics and educational measurement. Several methods have been proposed in recent decades for identifying items that function differently between two or more groups of examinees. Starting from a framework for classifying DIF detection methods and from a comparative overview of the most traditional methods, an R package for nine methods, called difR, is presented. The commands and options are briefly described, and the package is illustrated through the analysis of a data set on verbal aggression.

When developing and evaluating psychometric measures, a key concern is to ensure that they accurately capture individual differences on the intended construct across the entire population of interest. Inaccurate assessments of individual differences can occur when responses to some items reflect not only the intended construct but also construct-irrelevant characteristics, like a person's race or sex. Unaccounted for, this item bias can lead to apparent differences on the scores that do not reflect true differences, invalidating comparisons between people with different backgrounds. Accordingly, empirically identifying which items manifest bias through the evaluation of differential item functioning (DIF) has been a longstanding focus of much psychometric research. The majority of this work has focused on evaluating DIF across two (or a few) groups. Modern conceptualizations of identity, however, emphasize its multi-determined and intersectional nature, with some aspects better represented as dimensional than categorical. Fortunately, many model-based approaches to modelling DIF now exist that allow for simultaneous evaluation of multiple background variables, including both continuous and categorical variables, and potential interactions among background variables. This paper provides a comparative, integrative review of these new approaches to modelling DIF and clarifies both the opportunities and challenges associated with their application in psychometric research.  相似文献   

In complex three-dimensional mental rotation tasks males have been reported to score up to one standard deviation higher than females. However, this effect size estimate could be compromised by the presence of gender bias at the item level, which calls the validity of purely quantitative performance comparisons into question. We hypothesized that the effect of gender bias at the level of distinct item design features could lead to either an over- or underestimation of reported effect sizes of the gender difference in three-dimensional mental rotation. Using automatic item generation, we conducted a series of psychometric experiments in which we independently manipulated one out of four different item design features that have exhibited a gender bias in previous studies (study 1). This was done in a between-subjects design. The results indicated that gender bias caused by item design features linked to the perceptual stage of mental rotation led to an overestimation of the effect size of the gender difference, while item design features associated with the encoding and transformational stages resulted in an underestimation of the effect size of the gender difference. In study 2 we tested the hypothesis that the gender difference still remains while controlling for the item design features causing gender bias. The results suggest that a significant portion of the gender difference may be attributable to perceptual and encoding processes involved in mental rotation.

This study examined the factor structure and differential item functioning of the Depression Anxiety Stress Scales (DASS; Lovibond & Lovibond, 1995) across sex. The DASS was completed by 201 women and 165 men from the general community. Confirmatory factor analysis (CFA) indicated support for the original 3-factor oblique model (factors for depression, anxiety and stress). There was, however, more support for a bifactor model with four orthogonal factors: a general factor on which all the depression, anxiety and stress items load, and specific independent factors for depression, anxiety and stress items. None of the DASS items showed DIF. The practical, theoretical, research and clinical implications of the findings are discussed.

This report documents relationships between differential item functioning (DIF) identification and: (1) item–trait association, and (2) scale multidimensionality in personality assessment. Applying Zumbo's (1999) logistic regression model [Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.], DIF effect size is found to become increasingly inflated as investigated item associations with trait scores decrease. Similar patterns were noted for the influence of scale multidimensionality on DIF identification. Individuals who investigate DIF in personality assessment applications are provided with estimates regarding the impact of the magnitude of item and trait association and scale multidimensionality on DIF occurrence and effect size. The results emphasize the importance of excluding investigated items in focal trait identification prior to conducting DIF analyses and reporting item and scale psychometric properties in DIF reports.

The aim of latent variable selection in multidimensional item response theory (MIRT) models is to identify latent traits probed by test items of a multidimensional test. In this paper the expectation model selection (EMS) algorithm proposed by Jiang et al. (2015) is applied to minimize the Bayesian information criterion (BIC) for latent variable selection in MIRT models with a known number of latent traits. Under mild assumptions, we prove the numerical convergence of the EMS algorithm for model selection by minimizing the BIC of observed data in the presence of missing data. For the identification of MIRT models, we assume that the variances of all latent traits are unity and each latent trait has an item that is only related to it. Under this identifiability assumption, the convergence of the EMS algorithm for latent variable selection in the multidimensional two-parameter logistic (M2PL) models can be verified. We give an efficient implementation of the EMS for the M2PL models. Simulation studies show that the EMS outperforms the EM-based L1 regularization in terms of correctly selected latent variables and computation time. The EMS algorithm is applied to a real data set related to the Eysenck Personality Questionnaire.  相似文献   


We evaluated the reliability, validity, and differential item functioning (DIF) of a shorter version of the Defining Issues Test-1 (DIT-1), the behavioural DIT (bDIT), measuring the development of moral reasoning. A total of 353 college students (81 males, 271 females, 1 not reported; age M = 18.64 years, SD = 1.20 years) who were taking introductory psychology classes at a public university in a suburban area in the Southern United States participated in the present study. First, we examined the reliability of the bDIT using Cronbach’s α and its concurrent validity with the original DIT-1 using disattenuated correlation. Second, we compared the test duration between the two measures. Third, we tested the DIF of each question between males and females. The findings indicated that, first, the bDIT showed acceptable reliability and good concurrent validity. Second, the test duration could be significantly shortened by employing the bDIT. Third, DIF results indicated that the bDIT items did not favour any gender. Practical implications of the present study based on the reported findings are discussed.
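
The disattenuated correlation mentioned above corrects an observed correlation for the unreliability of both measures. A minimal sketch of the standard correction follows; the function name and the numeric inputs are hypothetical and are not values from the study.

```python
from math import sqrt


def disattenuated_correlation(r_xy, reliability_x, reliability_y):
    """Correct an observed correlation for measurement error in both scores.

    r_xy:          observed correlation between the two measures.
    reliability_x: reliability of the first measure (e.g. Cronbach's alpha).
    reliability_y: reliability of the second measure.
    """
    return r_xy / sqrt(reliability_x * reliability_y)


# Hypothetical values: observed r = .50, reliabilities .64 and 1.00.
print(disattenuated_correlation(0.50, 0.64, 1.00))
```

Because the divisor is at most 1, the corrected value is never smaller in magnitude than the observed correlation, and with low reliabilities it can exceed 1.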
