Found 20 similar articles (search time: 15 ms)
1.
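Detecting Differential Item Functioning with Logistic Regression (Total citations: 1; self-citations: 0; citations by others: 1)
Differential item functioning (DIF) is an important basis for establishing test fairness, and DIF research bears directly on test validity. After briefly reviewing how the concept of DIF arose, this article focuses on how to use logistic regression to detect uniform and nonuniform DIF, and shows by example that six items of the Learning Adaptability Test (学习适应性测验, AAT) exhibit DIF with respect to gender.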
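The logistic regression approach this abstract describes is commonly written as a single model with a group term and an interaction term. The formulation below is a sketch of that standard setup rather than something taken from the article itself; the symbols (u for the item response, X for the matching score such as the total test score, G for the group indicator, and the β coefficients) are notation assumed here for illustration.

```latex
\[
\operatorname{logit} P(u = 1 \mid X, G) = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (X \times G)
\]
% Uniform DIF:    \beta_2 \neq 0 and \beta_3 = 0  (group curves are shifted but parallel on the logit scale)
% Nonuniform DIF: \beta_3 \neq 0                  (the group difference changes with X, so the curves can cross)
```

Under this setup, each kind of DIF is typically assessed by comparing nested models, for example with a likelihood-ratio test of the model with and without the term in question.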
2.
Improvement in Detection of Differential Item Functioning Using a Mixture Item Response Theory Model
Annette M. Maij-de Meij Henk Kelderman Henk van der Flier 《Multivariate behavioral research》2013,48(6):975-999
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).
3.
《International Journal of Testing》2013,13(3):287-300
Identifying the sources of differential item functioning (DIF) in international assessments is very challenging, because such sources are often nebulous and intertwined. Even though researchers frequently focus on test translation and content area, few actually go beyond these factors to investigate other cultural sources of DIF. This article introduces the multiple-variable matching method using logistic regression analysis to identify sources of DIF. A case study demonstrates how this methodology identified Extra Lesson Hours After School (ELHAS) as a potential source of DIF between Taiwan and the United States in the Third International Mathematics and Science Study (TIMSS) 1999. DIF is not a fixed character of any test item, nor is a cultural factor an inherent source of DIF. The legitimacy of a source of DIF relies on the specific context and purpose for the cross-country comparison.
4.
Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.
5.
Joost van Rosmalen Alex J. Koning Patrick J. F. Groenen 《Multivariate behavioral research》2013,48(1):59-81
This study involved two phases: first, when classification was based on the calibration sample; and second, in a cross-validation setting. Computer generated data were used. Results obtained from rules based on probabilities of group membership were compared for accuracy when classifying in the discriminant space and in the predictor variable spaces. In the first phase accuracy was greater in the predictor variable spaces, while the reverse was true in the second phase. In general, rules based on probabilities of group membership were approximately equally accurate and more accurate than a rule related to a multiple regression analysis. Other findings are also discussed.
6.
Benjamin O. Emmert-Aronson Michael T. Moore Timothy A. Brown 《Journal of psychopathology and behavioral assessment》2014,36(3):424-431
This study examines the psychometric properties, and particularly differential item functioning (DIF) due to racial and ethnic group, of the criteria for a major depressive episode using a large sample (N = 1,063) of outpatients seeking treatment for mood and anxiety disorders. DIF was evaluated using multiple group confirmatory factor analysis. Item thresholds fell along a continuum with the core features of depressed mood and anhedonia, along with fatigue, being endorsed at lower levels of depression, and change in appetite and suicidal ideation endorsed at more severe levels of depression. Item discriminations, reflecting an item’s ability to discriminate between lower and higher levels of depression, were highest for depressed mood and anhedonia, and lowest for change in appetite and suicidal ideation. When examining model fit among the racial groups we did not find differences in symptom functioning, providing support for the use of these symptoms across diverse groups. This is of particular importance given the paucity of studies examining this question using a semi-structured clinician administered instrument to a clinical sample.
7.
XS-DIF is a program for detecting differential item functioning (DIF) using item response theory (IRT). It calculates Lord's chi-square, Raju's signed and unsigned area, and Kim and Cohen's closed-interval signed and unsigned area. XS-DIF was designed to run in Excel 2000 and can analyze up to 100 items. It is useful for supporting data analysis in research projects and for detecting and teaching DIF.
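To make the area measures named in this abstract concrete, here is a minimal Python sketch of a closed-interval signed and unsigned area between two item characteristic curves. It is not XS-DIF's implementation: the two-parameter logistic form, the scaling constant D = 1.7, the trapezoidal integration, and the names `icc_2pl` and `closed_interval_areas` are all assumptions made for illustration.

```python
import math

def icc_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic item characteristic curve (D is an assumed scaling constant)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def closed_interval_areas(ref, foc, lo=-4.0, hi=4.0, steps=4000):
    """Trapezoidal signed and unsigned area between two ICCs on [lo, hi].

    ref and foc are hypothetical (a, b) parameter pairs for the
    reference and focal groups.
    """
    h = (hi - lo) / steps
    signed = 0.0
    unsigned = 0.0
    for k in range(steps + 1):
        theta = lo + k * h
        diff = icc_2pl(theta, *ref) - icc_2pl(theta, *foc)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        signed += weight * diff * h
        unsigned += weight * abs(diff) * h
    return signed, unsigned

# Equal discrimination: the curves never cross, so both areas agree.
print(closed_interval_areas((1.0, 0.0), (1.0, 0.5), lo=-20, hi=20))
# Equal difficulty, different discrimination: the curves cross, so the
# signed area cancels while the unsigned area does not.
print(closed_interval_areas((0.5, 0.0), (1.5, 0.0), lo=-20, hi=20))
```

The second call shows why both measures are usually reported: when the curves cross, positive and negative differences cancel in the signed area, so only the unsigned area reveals the discrepancy.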
8.
Measurement invariance is a fundamental assumption in item response theory models, where the relationship between a latent construct (ability) and observed item responses is of interest. Violation of this assumption would render the scale misinterpreted or cause systematic bias against certain groups of persons. While a number of methods have been proposed to detect measurement invariance violations, they typically require advance definition of problematic item parameters and respondent grouping information. However, these pieces of information are typically unknown in practice. As an alternative, this paper focuses on a family of recently proposed tests based on stochastic processes of casewise derivatives of the likelihood function (i.e., scores). These score-based tests only require estimation of the null model (when measurement invariance is assumed to hold), and they have been previously applied in factor-analytic, continuous data contexts as well as in models of the Rasch family. In this paper, we aim to extend these tests to two-parameter item response models, with strong emphasis on pairwise maximum likelihood. The tests’ theoretical background and implementation are detailed, and the tests’ abilities to identify problematic item parameters are studied via simulation. An empirical example illustrating the tests’ use in practice is also provided.
9.
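This article points out the pervasiveness and harmfulness of item context effects in self-report methods. It discusses cognitive theoretical explanations of item context effects from an information-processing perspective, as well as the key features of measurement instruments that give rise to such effects. The role of an item's serial position is also discussed.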
10.
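Two Classes of Methods for Testing Differential Item Functioning: A Comparison of CFA and IRT (Total citations: 2; self-citations: 0; citations by others: 2)
Both confirmatory factor analysis (CFA) and item response theory (IRT) currently offer methods for identifying differential item functioning (DIF). Focusing on unidimensional, polytomously scored items, this article introduces the CFA and IRT approaches to DIF detection in turn and compares the two.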
11.
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
12.
Psychometrika - When latent variables are used as outcomes in regression analysis, a common approach that is used to solve the ignored measurement error issue is to take a multilevel perspective on...
13.
Mark D. Schluchter 《Multivariate behavioral research》2013,48(2):268-288
In behavioral research, interest is often in examining the degree to which the effect of an independent variable X on an outcome Y is mediated by an intermediary or mediator variable M. This article illustrates how generalized estimating equations (GEE) modeling can be used to estimate the indirect or mediated effect, defined as the amount by which the regression coefficient of X on Y changes after adjusting for M. Advantages of this method are: (a) it applies to the class of generalized linear models, including linear, logistic, and Poisson regression as special cases; (b) it allows multiple independent variables and mediators in the same model; and (c) asymptotically valid standard errors and confidence intervals are obtained using standard software. This methodology is compared with the bootstrap, another general methodology that can be applied to the same broad class of models, and is evaluated using simulation in both linear and logistic regression scenarios. The methods are utilized to examine the degree to which the effect of low birthweight status on internalizing symptoms at age 20 is mediated through IQ at age 8.
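The definition of the mediated effect in this abstract, the change in the coefficient of X on Y after adjusting for M, can be illustrated for the simplest linear case. The sketch below uses plain ordinary least squares with closed-form solutions. It is not the article's GEE procedure and produces no standard errors; the function names `_cov` and `mediated_effect` and the toy data are hypothetical.

```python
def _cov(u, v):
    """Population covariance of two equal-length sequences."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def mediated_effect(x, m, y):
    """Return (c, c_prime, c - c_prime) from two least-squares fits.

    c       : slope of X in the regression Y ~ X
    c_prime : slope of X in the regression Y ~ X + M
    """
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    c = sxy / sxx
    det = sxx * smm - sxm ** 2  # zero only if X and M are perfectly collinear
    c_prime = (sxy * smm - sxm * smy) / det
    return c, c_prime, c - c_prime

# Toy data (made up): Y depends on both X and M, and M is related to X.
x = [0, 1, 2, 3, 4]
m = [1, 0, 2, 1, 3]
y = [2 * xi + 3 * mi for xi, mi in zip(x, m)]
print(mediated_effect(x, m, y))
```

In this linear special case the difference c - c' coincides with the product of the X-to-M slope and the M-to-Y slope; for logistic or Poisson models that equality does not hold in general, which is one reason the difference-in-coefficients definition is stated separately.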
14.
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is proposed to identify DIF items as outliers in the multivariate space. For low dimensionalities, up to 2–3 groups, a simple graphical tool is derived. We illustrate our approach with a reanalysis of data from Kim, Cohen, and Park (1995) on using calculators for a mathematics test.
15.
Joke Van den Broeck Gina Rossi Eva Dierckx Barbara De Clercq 《Journal of psychopathology and behavioral assessment》2012,34(3):361-369
Geriatric researchers and clinicians often have to deal with a lack of valid personality measures for older age groups (e.g., Mroczek, Hurt, & Berman, 1999; Zweig, 2008), which hampers a reliable assessment of personality in later life. An age-neutral measurement system is one of the basic conditions for an accurate personality assessment across the lifespan, both longitudinally and cross-sectionally. In the present study, we empirically investigate the age-neutrality of one of the most widely used personality measures (i.e., the NEO PI-R (Costa & McCrae, 1992)), by examining potential Differential Item Functioning (DIF). Overall, results indicate that the vast majority (92.9% at domain-level and 95% at facet-level) of the NEO PI-R items was similarly endorsed by younger and older age groups with the same position on the personality trait of interest, corroborating the NEO PI-R's age neutrality. However, Differential Test Functioning (DTF) analyses revealed large DTF for Extraversion, and facet A6 (Tender-Mindedness). Results are discussed in terms of their implications for using the current format of the NEO PI-R in older aged samples.
16.
Charles A. Scherbaum Jennifer Sabet Michael J. Kern Paul Agnello 《Journal of personality assessment》2013,95(2):207-216
A concern about personality inventories in diagnostic and decision-making contexts is that individuals will fake. Although there is extensive research on faking, little research has focused on how perceptions of personality items change when individuals are faking or responding honestly. This research demonstrates how the delta parameter from the generalized graded unfolding item response theory model can be used to examine how individuals’ perceptions about personality items might change when responding honestly or when faking. The results indicate that perceptions changed from honest to faking conditions for several neuroticism items. The direction of the change varied, indicating that faking can operate to increase or decrease scores within a personality factor.
17.
Differential item functioning (DIF) analyses of the Beck Depression Inventory-II (BDI-II) were conducted on samples of 267 women with breast cancer and 294 women with clinical depression. Patterns of items in which there was significant and nonsignificant DIF were identified using statistical tests and measures of DIF effect size. At the most general level, 15 of 21 BDI-II items were associated with nontrivial DIF suggesting that the item responses of these samples do not reflect the same underlying construct. Factor analyses of the BDI-II using a psychometrically defensible method for item level factor analysis supported the conclusions from the DIF analyses. These findings suggest that researchers and practitioners should apply caution when interpreting self-report depression symptoms in breast cancer patients.
18.
Johannes Hartig Britta Hölzel Helfried Moosbrugger 《Multivariate behavioral research》2013,48(1):157-183
Numerous studies have shown increasing item reliabilities as an effect of the item position in personality scales. Traditionally, these context effects are analyzed based on item-total correlations. This approach neglects that trends in item reliabilities can be caused either by an increase in true score variance or by a decrease in error variance. This article presents the Confirmatory Analysis of Item Reliability Trends (CAIRT) that allows estimating both trends separately within a structural equation modeling framework. Results of a simulation study prove the CAIRT method to provide reliable and independent parameter estimates; the power exceeds the analysis of item-total correlations. We present an empirical application to self- and peer ratings collected in an Internet-based experiment. Results show that reliability trends are caused by increasing true score variance in self-ratings and by decreasing error variance in peer ratings.
19.
Young I. Cho Monica J. Martin Rand D. Conger Keith F. Widaman 《Journal of psychopathology and behavioral assessment》2010,32(2):157-168
We investigated measurement equivalence in two antisocial behavior scales (i.e., one scale for adolescents and a second scale for young adults) by examining differential item functioning (DIF) for respondents from single-parent (n = 109) and two-parent families (n = 447). Even though one item in the scale for adolescents and two items in the scale for young adults showed significant DIF, the two scales exhibited non-significant differential test functioning (DTF). Both uniform and nonuniform DIF were investigated and examples of each type were identified. Specifically, uniform DIF was exhibited in the adolescent scale whereas nonuniform DIF was shown in the young adult scale. Implications of DIF results for assessment of antisocial behavior, along with strengths and limitations of the study, are discussed.