Abstract: | With the development of psychometrics and the wide application of psychological and educational tests, test fairness has become a concern for educators and psychologists, and differential item functioning (DIF) has been studied in increasing depth. DIF detection is now widely used in routine item analysis, and a number of detection methods have been developed, such as the Mantel-Haenszel (MH) procedure, Standardization (STND), the Simultaneous Item Bias Procedure (SIBTEST), the Likelihood Ratio (LR) test, Lord's chi-square, Raju's area measures, and the MIMIC method. Most of these methods suffer from either low power or a high type I error rate, so a more effective method for detecting DIF is needed. This paper proposes the Likelihood Procedure (LP), an IRT-based DIF detection method, and illustrates it with the two-parameter logistic model (2PLM) as a representative case. The performance of LP was compared with that of the MH method, Lord's chi-square, and Raju's area measure. DIF size, test length, sample size, and the difference between the ability distributions of the focal and reference groups were varied. DIF size had three levels: 0.3, 0.5, and 0.8. Test length had two levels: 40 and 100 items. Sample size had three levels: 500, 1000, and 2000 examinees. Two ability-distribution conditions were used: in the first, both groups followed the standard normal distribution; in the second, the reference group followed the standard normal distribution while the focal group followed a normal distribution with mean -1 and standard deviation 1. In this simulation study, data were generated using the two-parameter logistic model.
In the DIF items, the difficulty value or the discrimination value in the focal group was greater than the corresponding value in the reference group. There were six DIF items in total across the uniform and non-uniform DIF conditions, corresponding to the three true DIF sizes. The simulation study yielded the following results: (1) LP has high power and a low, stable type I error rate. (2) Overall, the power of LP is higher than that of Lord's chi-square and far higher than that of the MH method; the type I error rate of LP is lower than that of Lord's chi-square, and when the test length is 100 the type I error rate of the MH method falls far outside the stable range. (3) LP does not exceed Raju's area measure in power, but the type I error rate of the latter is above 0.1 and far outside the stable range under a variety of conditions. In general, LP has the following properties: (1) LP is more sensitive and more stable than MH. (2) LP is a more reasonable choice for detecting DIF than Raju's area measure. (3) The power of LP increases with sample size and with true DIF size. (4) The power of LP is lower when the focal and reference groups differ in ability than when their ability distributions are the same. (5) The power of LP is high for both uniform and non-uniform DIF, and higher for uniform DIF. Finally, LP is applicable not only to the two-parameter logistic model but also to the one-parameter and three-parameter logistic models. In addition, it can readily be extended to multidimensional items and to items scored in multiple categories. |
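The simulation design summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' code: the item parameter ranges, the reading of sample size as per group, the random seed, and the choice to place uniform DIF on a single item are all hypothetical, and the names `p_correct` and `simulate_group` are introduced here for illustration.

```python
import itertools
import math
import random


def p_correct(theta, a, b):
    """2PLM probability of a correct response for ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_group(n, items, ability_mean, rng):
    """Draw n abilities from N(ability_mean, 1) and score each item as 0/1."""
    data = []
    for _ in range(n):
        theta = rng.gauss(ability_mean, 1.0)
        data.append([int(rng.random() < p_correct(theta, a, b)) for a, b in items])
    return data


# Design factors as listed in the abstract.
DIF_SIZES = (0.3, 0.5, 0.8)
TEST_LENGTHS = (40, 100)
SAMPLE_SIZES = (500, 1000, 2000)
FOCAL_MEANS = (0.0, -1.0)  # focal group ability mean; reference group is N(0, 1)

conditions = list(
    itertools.product(DIF_SIZES, TEST_LENGTHS, SAMPLE_SIZES, FOCAL_MEANS)
)

rng = random.Random(0)  # fixed seed (assumption) so the sketch is repeatable
dif, n_items, n, focal_mean = conditions[0]

# Hypothetical item parameter ranges: (discrimination a, difficulty b).
ref_items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(n_items)]

# Uniform DIF on item 0 only (illustration): shift difficulty for the focal group.
focal_items = list(ref_items)
a0, b0 = ref_items[0]
focal_items[0] = (a0, b0 + dif)

# Sample size is treated as per group here, which is an assumption.
reference = simulate_group(n, ref_items, 0.0, rng)
focal = simulate_group(n, focal_items, focal_mean, rng)
```

Crossing the four factors gives 3 x 2 x 3 x 2 = 36 conditions. Non-uniform DIF could be illustrated the same way by changing the discrimination `a` of an item for the focal group instead of its difficulty `b`.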