Similar articles
 Found 20 similar articles; search time: 15 ms
1.
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

2.
This study examines the psychometric properties, and particularly differential item functioning (DIF) due to racial and ethnic group, of the criteria for a major depressive episode using a large sample (N = 1,063) of outpatients seeking treatment for mood and anxiety disorders. DIF was evaluated using multiple group confirmatory factor analysis. Item thresholds fell along a continuum with the core features of depressed mood and anhedonia, along with fatigue, being endorsed at lower levels of depression, and change in appetite and suicidal ideation endorsed at more severe levels of depression. Item discriminations, reflecting an item’s ability to discriminate between lower and higher levels of depression, were highest for depressed mood and anhedonia, and lowest for change in appetite and suicidal ideation. When examining model fit among the racial groups, we did not find differences in symptom functioning, providing support for the use of these symptoms across diverse groups. This is of particular importance given the paucity of studies examining this question using a semi-structured, clinician-administered instrument in a clinical sample.

3.
The Wisconsin Schizotypy Scales (the Perceptual Aberration, Magical Ideation, Physical Anhedonia, and Revised Social Anhedonia Scales) have been used extensively since their development in the 1970s and 1980s. Based on psychometric analyses using item response theory, the present work presents 15-item short forms of each scale. In addition to being briefer, the short forms omit items with high differential item functioning. Based on data from a sample of young adults (n = 1144), the short forms have strong internal consistency, and they mirror effects found for the longer scales. They thus appear to be a good option for researchers interested in the brief assessment of schizotypic traits. The items are listed in Appendix A.

5.
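New developments in test theory: multidimensional item response theory   Cited by: 3 (self-citations: 0; citations by others: 3)
Multidimensional item response theory (MIRT) is a new test theory that grew out of two traditions: factor analysis and unidimensional item response theory. Depending on how an examinee's multiple abilities interact in completing a task, MIRT models fall into two classes: compensatory and noncompensatory. After systematically introducing the compensatory models in common use today, this article argues that future research should attend to four areas: multidimensional models for polytomous scoring and high-dimensional spaces, the integration of compensatory and noncompensatory models, the development of parameter estimation programs, and the equating of multidimensional tests.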

6.
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The objective was to provide bounds of the likely DIF effects on these measurement consequences. Five factors were manipulated: test length, percentage of DIF items per form, item type, sample size, and level of group ability difference. Results indicate that the greatest DIF effect was less than 2 points on the 0 to 60 total score scale and about 0.15 on the IRT ability scale. DIF had a limited effect on the ratio of true-score variance to observed-score variance, but its influence on the standard error of estimation for the IRT ability parameter was evident for certain ability values.

7.
Little research has been conducted on Loevinger's Washington University Sentence Completion Test of Ego Development in adult psychiatric outpatients. The measure is a promising method of assessing a construct of personality and character functioning that should be useful in research on psychopathology and in choosing treatment modalities. The data presented in this study address the question of the psychometric adequacy of the measure in this segment of the subject population. Specifically, estimates of interrater reliability, internal consistency, and test-retest reliability are presented for a sample of 42 adult outpatients. In addition, the relationship between total protocol ratings and item sum scores is explored.

8.
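An analysis of the Combined Raven's Test using item response theory   Cited by: 1 (self-citations: 0; citations by others: 1)
Data from the Combined Raven's Test for 354 young men of varying ability levels were analyzed with BILOG-MG 3.0, using marginal maximum likelihood estimation and the three-parameter logistic model. Most items of the Combined Raven's Test fit the three-parameter logistic model (6 items did not). The test information function peaked between -3 and -2 on the difficulty scale, with a peak value of 16.82. Eighteen items had information function peaks below 0.2. All 72 items had discrimination parameters above 0.5, which is satisfactory. The difficulty parameters were low for all items: the great majority were below 0, and the highest was only 1.01. Item difficulty was determined mainly by the level of operation required. Pseudo-guessing parameters ranged from 0.07 to 0.24. Taken together, the analysis indicates that the Combined Raven's Test measures the intelligence of normal young adults with poor precision.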

9.
We give an account of Classical Test Theory (CTT) in terms of the more fundamental ideas of Item Response Theory (IRT). This approach views classical test theory as a very general version of IRT, and the commonly used IRT models as detailed elaborations of CTT for special purposes. We then use this approach to CTT to derive some general results regarding the prediction of the true score of a test from an observed score on that test as well as from an observed score on a different test. This leads us to a new view of linking tests that were not developed to be linked to each other. In addition, we propose true-score prediction analogues of the Dorans and Holland measures of the population sensitivity of test linking functions. We illustrate the accuracy of the first-order theory using simulated data from the Rasch model, and illustrate the effect of population differences using a set of real data. This research is collaborative in every respect and the order of authorship is alphabetical. It was begun when both authors were on the faculty of the Graduate School of Education at the University of California, Berkeley. We would like to thank Neil Dorans, Skip Livingston, and two anonymous referees for many suggestions that have greatly improved this paper.

10.
Globally, the COVID-19 pandemic has impaired every aspect of life and has caused considerable psychological damage, for instance by increasing the risk of suicide. Intense fear and anxiety are considered to play a central role in mental health problems. This study examined the psychological properties of the Japanese version of the Fear of COVID-19 Scale (FCV-19S) using classical test theory (CTT) and item response theory (IRT). Five hundred fifty participants aged 18–69 years from across Japan completed questionnaires, including the Japanese FCV-19S, the Japanese Depression Anxiety Stress Scales-15 (DASS-15), and the Japanese version of the Kessler 6 (K6). CTT showed that each item of the Japanese FCV-19S had no ceiling or floor effect and was close to normally distributed, and IRT revealed that each item had appropriate discrimination and difficulty parameters. Finally, the Japanese FCV-19S was shown to have acceptable reliability and moderately good concurrent validity. Consequently, the Japanese FCV-19S has robust psychometric properties and can be useful for early detection of adults impacted by the COVID-19 pandemic.

11.
Gough’s Creative Personality Scale (CPS) has been very widely used to assess creative personality characteristics, and many researchers have argued that it is associated with strong reliability and validity evidence. However, findings vary considerably across the samples used in each study, suggesting that an analysis using the item response theory framework would provide more useful evidence of the instrument’s psychometric integrity. Results suggest that the CPS is multidimensional, and some items have low discrimination indices. Additionally, significant correlations between participants’ trait estimates (i.e., estimated creative personality) and their engagement in a range of creative activities indicate that the CPS is associated with convincing evidence of criterion-related validity.

12.
Ordóñez XG, Romero SJ. Psicothema, 2007, 19(1): 171-172
XS-DIF is a program for detecting differential item functioning (DIF) using item response theory (IRT). It calculates Lord's chi-square, Raju's signed area and unsigned area, and Kim and Cohen's closed-interval signed area and closed-interval unsigned area. XS-DIF was designed to run in Excel 2000 and can analyze up to 100 items. It is useful for supporting data analysis in research projects and for the detection and teaching of DIF.
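
The area measures named above quantify DIF as the area between the item characteristic curves (ICCs) estimated for a reference group and a focal group. The sketch below illustrates the idea only. It is not XS-DIF's code, and it does not use the closed-form expressions of Raju or of Kim and Cohen. It assumes a two-parameter logistic ICC and approximates the signed and unsigned areas over a closed ability interval with the trapezoidal rule. All names and parameter values are hypothetical.

```python
import math

def icc_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic ICC (D = 1.7 is an assumed scaling constant)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def area_between_iccs(ref, foc, lo=-4.0, hi=4.0, n=2000):
    """Approximate (signed, unsigned) area between two ICCs on [lo, hi].

    ref and foc are (a, b) tuples for the reference and focal groups.
    Numerical trapezoidal integration stands in for the analytic formulas.
    """
    h = (hi - lo) / n
    signed = 0.0
    unsigned = 0.0
    for i in range(n + 1):
        theta = lo + i * h
        diff = icc_2pl(theta, *ref) - icc_2pl(theta, *foc)
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        signed += weight * diff * h
        unsigned += weight * abs(diff) * h
    return signed, unsigned

# Hypothetical item: same discrimination, but harder for the focal group.
signed_area, unsigned_area = area_between_iccs(ref=(1.0, 0.0), foc=(1.0, 0.5))
print(signed_area, unsigned_area)
```

When the two curves have equal discrimination they never cross, so the signed and unsigned areas agree. Over a wide interval both approach the difference between the two difficulty parameters. When the curves cross, positive and negative differences partly cancel in the signed area, which is why the unsigned area is reported alongside it.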

13.
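We wrote 235 figural reasoning test items. Using an anchor-test equating design with the 72 items of the Combined Raven's Test as anchor items, we tested 1,733 males at ability levels ranging from junior high school to university. The response data were analyzed with BILOG-MG 3.0 (marginal maximum likelihood estimation) under the three-parameter logistic model. After removing items that fit the model poorly and items whose maximum information was below 0.3, we built an item bank of 181 items. The item bank can be used to screen out young enlistment applicants of low intelligence.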

14.
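Liu Hongyun, Luo Fang. Acta Psychologica Sinica (心理学报), 2008, 40(1): 92-100
The authors briefly introduce multilevel item response models and examine the relationship between multilevel item response theory and conventional item response theory. They derive the relationship between the parameters of multilevel item response models and those of conventional item response models, and discuss extensions of the multilevel item response model. Using a real example, they apply the multilevel item response model to analyze the characteristics of the items in a test, to test the effects of individual-level and group-level predictors on the ability parameter, and to analyze differential item functioning. The article closes with a discussion of the strengths and limitations of multilevel item response theory models.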

15.
16.
Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. To date, however, these methods have not been used frequently by personality assessment researchers, in part because many researchers have not been introduced to the methods and in part because most of the development of IRT has taken place in applied educational assessment settings, resulting in terminology that is ability focused rather than trait focused. The purpose of this article is twofold. First, an overview of IRT is presented, highlighting the concepts of the three-parameter IRT model, item and test information, and conditional standard error of measurement. Second, the psychometric properties of the MMPI-2 PSY-5 scales are examined to demonstrate IRT's value.

17.
Psychometric work on the widely used Depression Anxiety and Stress Scales (DASS) has mostly used classical psychometrics and ignored common internet-administered versions. Therefore, the present study used not only classical, but also modern psychometrics based on item response theory (IRT) to evaluate an internet-administered version of the DASS (Dutch translation). Internet-administered DASS data were collected as part of a large internet-based study in the Dutch adult population (n = 7972). Initially, external correlates (i.e., demographics and other measures) and some classical psychometrics (internal consistency, convergent/divergent validity) of the DASS scales were evaluated. Next, IRT was used to investigate the scales’ dimensionality, discrimination and item-functioning. Finally, the DASS depression scale was further investigated by linking it to the more clinically-oriented Quick Inventory of Depressive Symptomatology (QIDS) using IRT. Initial classical psychometric analyses supported the scales’ internal consistency (alpha = 0.94–0.98) and convergent/divergent validity. IRT analyses showed that each of the DASS scales was only suitable to measure variations in a very narrow and rather mild severity range. Linking the DASS depression scale with the QIDS also showed that the DASS depression scale discriminated best in the mild-moderate severity range, but not at higher severity levels that were covered by the QIDS. In conclusion, the scales of the internet-administered DASS show good internal consistency and validity. However, users should be aware that the scales discriminate best at mild-moderate severity ranges in the general population.

18.
Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.

19.
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

20.
Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of probabilities that students have certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction for students’ needs. We also provide results from a simulation study that demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
