This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.

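Detecting Differential Item Functioning with Logistic Regression   Total citations: 1, self-citations: 0, citations by others: 1
Differential item functioning (DIF) is an important basis for establishing test fairness, and DIF research is directly related to test validity. After briefly reviewing how the concept of DIF arose, this article focuses on how to use logistic regression to detect uniform and nonuniform DIF, and shows by example that six items of a learning adaptability test (AAT) exhibit DIF with respect to gender.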
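One common way to set up logistic regression DIF detection is to compare three nested models for a single item: matching score only, plus group (uniform DIF), plus the score-by-group interaction (nonuniform DIF). The sketch below follows that formulation on simulated data. It is an assumption about the general technique rather than the article's own procedure, and the function names (`solve`, `fit_logistic`, `lr_dif_test`, `simulate`) and all parameter values are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=25):
    """Newton-Raphson fit; returns (coefficients, maximized log-likelihood).

    No overflow guards: assumes moderate linear predictors, as in the demo data.
    """
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(rows, y):
            eta = sum(b * x for b, x in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for row, yi in zip(rows, y):
        eta = sum(b * x for b, x in zip(beta, row))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def lr_dif_test(score, group, y):
    """Return (uniform, nonuniform) likelihood-ratio statistics for one item.

    Each statistic is referred to a chi-square distribution with 1 df.
    """
    base = [[1.0, s] for s in score]
    unif = [[1.0, s, g] for s, g in zip(score, group)]
    nonu = [[1.0, s, g, s * g] for s, g in zip(score, group)]
    ll1 = fit_logistic(base, y)[1]
    ll2 = fit_logistic(unif, y)[1]
    ll3 = fit_logistic(nonu, y)[1]
    return 2.0 * (ll2 - ll1), 2.0 * (ll3 - ll2)


def simulate(n=2000, group_shift=-1.0, seed=1):
    """Simulate one item with uniform DIF (hypothetical parameter values)."""
    rng = random.Random(seed)
    score, group, y = [], [], []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)            # standardized matching score
        g = float(rng.random() < 0.5)      # 0 = reference, 1 = focal group
        prob = 1.0 / (1.0 + math.exp(-(0.2 + 1.0 * s + group_shift * g)))
        score.append(s)
        group.append(g)
        y.append(float(rng.random() < prob))
    return score, group, y
```

Calling `lr_dif_test(*simulate())` returns the two statistics; each can be compared with 3.841, the 5% critical value of a chi-square distribution with 1 degree of freedom. In practice the matching variable would be an observed total or rest score rather than a simulated one.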

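Detecting Differential Item Functioning with Mean and Covariance Structure Models   Total citations: 1, self-citations: 0, citations by others: 1
This article explains the principles and procedures for using multiple-group mean and covariance structure (MACS) models to detect uniform and nonuniform differential item functioning (DIF) in polytomous items, illustrates them by detecting DIF in a moral self-concept scale, and evaluates the method. Compared with item response theory, MACS detects DIF systematically and iteratively by means of modification indices and provides multiple fit indices for jointly evaluating model fit; compared with standard confirmatory factor analysis, MACS can detect not only nonuniform DIF but also uniform DIF. Using MACS to detect DIF is a method worth recommending.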

Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

Identifying the sources of differential item functioning (DIF) in international assessments is very challenging, because such sources are often nebulous and intertwined. Even though researchers frequently focus on test translation and content area, few actually go beyond these factors to investigate other cultural sources of DIF. This article introduces the multiple-variable matching method using logistic regression analysis to identify sources of DIF. A case study demonstrates how this methodology identified Extra Lesson Hours After School (ELHAS) as a potential source of DIF between Taiwan and the United States in the Third International Mathematics and Science Study (TIMSS) 1999. DIF is not a fixed character of any test item, nor is a cultural factor an inherent source of DIF. The legitimacy of a source of DIF relies on the specific context and purpose for the cross-country comparison.

Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The objective was to provide bounds of the likely DIF effects on these measurement consequences. Five factors were manipulated: test length, percentage of DIF items per form, item type, sample size, and level of group ability difference. Results indicate that the greatest DIF effect was less than 2 points on the 0 to 60 total score scale and about 0.15 on the IRT ability scale. DIF had a limited effect on the ratio of true-score variance to observed-score variance, but its influence on the standard error of estimation for the IRT ability parameter was evident for certain ability values.

Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.

This study involved two phases: first, when classification was based on the calibration sample; and second, in a cross-validation setting. Computer generated data were used. Results obtained from rules based on probabilities of group membership were compared for accuracy when classifying in the discriminant space and in the predictor variable spaces. In the first phase accuracy was greater in the predictor variable spaces, while the reverse was true in the second phase. In general, rules based on probabilities of group membership were approximately equally accurate and more accurate than a rule related to a multiple regression analysis. Other findings are also discussed.

Summary: The possible relationship between masculinity and creativity in college women was investigated through a battery of masculinity-femininity scales that tapped both manifest and latent masculinity, factorially derived clusters, and an ipsative measure. Two samples (n = 45 each) of women who had scored above the 75th percentile and below the 25th percentile respectively on two measures of creativity were used. High creative subjects scored higher on activity and described themselves as more masculine; indications are that they possess a broader, less stereotyped sex-role identity.

The present study examined the psychometric properties of a universal screening instrument called the Emotional and Behavioral Screener (EBS), which is designed to identify students exhibiting emotional and behavioral problems. The primary purposes of this study were to assess the measurement invariance of EBS items between Caucasian and African-American students and to assess the impact of differential item functioning (DIF) on EBS scores. The sample consisted of 946 elementary students from throughout the U.S. The findings suggested that EBS items exhibited small to negligible levels of DIF, and that DIF did not significantly impact EBS scores. The results supported the EBS as a universal screening instrument that is fair in measuring the emotional and behavioral risk of elementary students. Research limitations and implications for school professionals are discussed.

This study examines the psychometric properties, and particularly differential item functioning (DIF) due to racial and ethnic group, of the criteria for a major depressive episode using a large sample (N = 1,063) of outpatients seeking treatment for mood and anxiety disorders. DIF was evaluated using multiple group confirmatory factor analysis. Item thresholds fell along a continuum with the core features of depressed mood and anhedonia, along with fatigue, being endorsed at lower levels of depression, and change in appetite and suicidal ideation endorsed at more severe levels of depression. Item discriminations, reflecting an item’s ability to discriminate between lower and higher levels of depression, were highest for depressed mood and anhedonia, and lowest for change in appetite and suicidal ideation. When examining model fit among the racial groups we did not find differences in symptom functioning, providing support for the use of these symptoms across diverse groups. This is of particular importance given the paucity of studies examining this question using a semi-structured, clinician-administered instrument with a clinical sample.

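Based on improved Wald statistics, this study extends a DIF detection method designed for two groups to differential item functioning (DIF) testing with multiple groups; the improved Wald statistics are obtained by computing the observed information matrix (Obs) and the empirical cross-product information matrix (XPD), respectively. A simulation study examined how these two statistics and the traditional computation perform in DIF testing with multiple groups. The results showed that (1) the Type I error rates of Obs and XPD were clearly lower than those of the traditional method, and under DINA model estimation the Type I error rates of Obs and XPD were close to the nominal level; and (2) when the sample size and the DIF magnitude were large, Obs and XPD had roughly the same statistical power as the traditional Wald statistic.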
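For orientation, the generic form of a Wald statistic for a set of linear constraints on item parameters can be written as below. The notation (the stacked parameter vector $\hat{\beta}$, the constraint matrix $R$, and the covariance estimate $\hat{\Sigma}$) is introduced here as an assumption and is not taken from the article.

```latex
\begin{equation}
W = (R\hat{\beta})^{\top}\,\bigl(R\,\hat{\Sigma}\,R^{\top}\bigr)^{-1}\,(R\hat{\beta}),
\qquad
W \sim \chi^{2}_{\operatorname{rank}(R)} \ \text{asymptotically under } H_0 : R\beta = 0 .
\end{equation}
```

In this form, the choice of information matrix enters only through $\hat{\Sigma}$: an Obs-type variant would take $\hat{\Sigma}$ as the inverse of the observed information matrix, and an XPD-type variant would take it as the inverse of the summed outer products of the casewise score vectors.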

Ordóñez XG, Romero SJ. Psicothema, 2007, 19(1): 171-172
XS-DIF is a program for detecting differential item functioning (DIF) using item response theory (IRT). It calculates Lord's chi-square, Raju's signed area and unsigned area, and Kim and Cohen's closed-interval signed area and closed-interval unsigned area. XS-DIF was designed to run in Excel 2000 and can analyze up to 100 items. It is useful for supporting data analysis in research projects and for detecting and teaching DIF.
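To make the area measures concrete, the sketch below numerically integrates the difference between a reference-group and a focal-group item characteristic curve over a closed ability interval. It is not XS-DIF's implementation: the two-parameter logistic form, the trapezoid rule, the default interval, and the names `icc_2pl` and `closed_interval_areas` are all assumptions made for illustration.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def closed_interval_areas(ref, focal, lo=-4.0, hi=4.0, steps=2000):
    """Trapezoid-rule signed and unsigned areas between two ICCs on [lo, hi].

    ref and focal are (a, b) tuples; the signed area is reference minus focal.
    """
    h = (hi - lo) / steps
    signed = 0.0
    unsigned = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        diff = icc_2pl(theta, *ref) - icc_2pl(theta, *focal)
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the endpoints
        signed += weight * diff * h
        unsigned += weight * abs(diff) * h
    return signed, unsigned
```

When the two curves share the same discrimination, the signed area over a wide interval approaches the difference in difficulties, so `closed_interval_areas((1.0, 0.0), (1.0, 0.5), lo=-20, hi=20)` gives a signed area close to 0.5. When the curves cross, positive and negative regions cancel in the signed area but not in the unsigned one, which is why both are usually reported.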

Measurement invariance is a fundamental assumption in item response theory models, where the relationship between a latent construct (ability) and observed item responses is of interest. Violation of this assumption would render the scale misinterpreted or cause systematic bias against certain groups of persons. While a number of methods have been proposed to detect measurement invariance violations, they typically require advance definition of problematic item parameters and respondent grouping information. However, these pieces of information are typically unknown in practice. As an alternative, this paper focuses on a family of recently proposed tests based on stochastic processes of casewise derivatives of the likelihood function (i.e., scores). These score-based tests only require estimation of the null model (when measurement invariance is assumed to hold), and they have been previously applied in factor-analytic, continuous data contexts as well as in models of the Rasch family. In this paper, we aim to extend these tests to two-parameter item response models, with strong emphasis on pairwise maximum likelihood. The tests’ theoretical background and implementation are detailed, and the tests’ abilities to identify problematic item parameters are studied via simulation. An empirical example illustrating the tests’ use in practice is also provided.

A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

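A Comparison of Two Approaches to Testing Differential Item Functioning: CFA and IRT   Total citations: 2, self-citations: 0, citations by others: 2
At present, both confirmatory factor analysis (CFA) and item response theory (IRT) offer tests for identifying differential item functioning (DIF). Focusing on unidimensional, polytomously scored items, this article introduces the CFA-based and IRT-based methods for detecting DIF and compares the two.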

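This article points out how pervasive and how harmful item context effects are in self-report measures. It discusses the cognitive account of item context effects from an information-processing perspective, as well as the key features of measurement instruments that give rise to such effects. The role of item serial position is also discussed.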

Wang Chun, Xu Gongjun, Zhang Xue. Psychometrika, 2019, 84(3): 673-700
Psychometrika - When latent variables are used as outcomes in regression analysis, a common approach that is used to solve the ignored measurement error issue is to take a multilevel perspective on...

In behavioral research, interest is often in examining the degree to which the effect of an independent variable X on an outcome Y is mediated by an intermediary or mediator variable M. This article illustrates how generalized estimating equations (GEE) modeling can be used to estimate the indirect or mediated effect, defined as the amount by which the regression coefficient of X on Y changes after adjusting for M. Advantages of this method are: (a) it applies to the class of generalized linear models, including linear, logistic, and Poisson regression as special cases; (b) it allows multiple independent variables and mediators in the same model; and (c) asymptotically valid standard errors and confidence intervals are obtained using standard software. This methodology is compared with the bootstrap, another general methodology that can be applied to the same broad class of models, and is evaluated using simulation in both linear and logistic regression scenarios. The methods are utilized to examine the degree to which the effect of low birthweight status on internalizing symptoms at age 20 is mediated through IQ at age 8.
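For the linear-regression special case, the difference-in-coefficients definition of the mediated effect can be written out as follows. The symbols $c$, $c'$, $a$, $b$ and the intercepts $i_1, i_2, i_3$ are notation assumed here for illustration and do not come from the article.

```latex
\begin{align}
Y &= i_1 + c\,X + \varepsilon_1, \\
Y &= i_2 + c'\,X + b\,M + \varepsilon_2, \\
M &= i_3 + a\,X + \varepsilon_3, \\
\text{mediated effect} &= \hat{c} - \hat{c}' .
\end{align}
```

When all three equations are fitted by ordinary least squares on the same sample, $\hat{c} - \hat{c}' = \hat{a}\hat{b}$ holds exactly. In logistic or Poisson regression the two quantities generally differ, which is one reason a definition based directly on the change in the coefficient of X is convenient across the generalized linear model family.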

Differential item functioning (DIF) assessment is key in score validation. When DIF is present scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments such as Programme for International Student Assessment, Trends in International Mathematics and Science Study, and Progress in International Reading Literacy Study (PIRLS), DIF with regard to native language is of particular interest in this context. However, given differences in language and cultures, assuming similar cross-national DIF may lead to mistaken assumptions about the impact of immigration status, and native language on test performance. The purpose of this study was to use model-based recursive partitioning (MBRP) to investigate uniform DIF in PIRLS items across European nations. Results demonstrated that DIF based on mother's language was present for several items on a PIRLS assessment, but that the patterns of DIF were not the same across all nations.
