Found 20 similar articles.
1.
Improvement in Detection of Differential Item Functioning Using a Mixture Item Response Theory Model
Annette M. Maij-de Meij Henk Kelderman Henk van der Flier 《Multivariate behavioral research》2013,48(6):975-999
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).
2.
Benjamin O. Emmert-Aronson Michael T. Moore Timothy A. Brown 《Journal of psychopathology and behavioral assessment》2014,36(3):424-431
This study examines the psychometric properties, and particularly differential item functioning (DIF) due to racial and ethnic group, of the criteria for a major depressive episode using a large sample (N = 1,063) of outpatients seeking treatment for mood and anxiety disorders. DIF was evaluated using multiple group confirmatory factor analysis. Item thresholds fell along a continuum with the core features of depressed mood and anhedonia, along with fatigue, being endorsed at lower levels of depression, and change in appetite and suicidal ideation endorsed at more severe levels of depression. Item discriminations, reflecting an item’s ability to discriminate between lower and higher levels of depression, were highest for depressed mood and anhedonia, and lowest for change in appetite and suicidal ideation. When examining model fit among the racial groups we did not find differences in symptom functioning, providing support for the use of these symptoms across diverse groups. This is of particular importance given the paucity of studies examining this question using a semi-structured, clinician-administered instrument in a clinical sample.
3.
Beate P. Winterstein Paul J. Silvia Thomas R. Kwapil James C. Kaufman Roni Reiter-Palmon Benjamin Wigert 《Personality and individual differences》2011,51(8):920-924
The Wisconsin Schizotypy Scales (the Perceptual Aberration, Magical Ideation, Physical Anhedonia, and Revised Social Anhedonia Scales) have been used extensively since their development in the 1970s and 1980s. Based on psychometric analyses using item response theory, the present work presents 15-item short forms of each scale. In addition to being briefer, the short forms omit items with high differential item functioning. Based on data from a sample of young adults (n = 1144), the short forms have strong internal consistency, and they mirror effects found for the longer scales. They thus appear to be a good option for researchers interested in the brief assessment of schizotypic traits. The items are listed in Appendix A.
4.
《Journal of personality assessment》2013,95(3):478-486
Little research has been conducted on Loevinger's Washington University Sentence Completion Test of Ego Development in adult psychiatric outpatients. The measure is a promising method of assessing a construct of personality and character functioning that should be useful in research on psychopathology and in choosing treatment modalities. The data presented in this study address the question of the psychometric adequacy of the measure in this segment of the subject population. Specifically, estimates of interrater reliability, internal consistency, and test-retest reliability are presented for a sample of 42 adult outpatients. In addition, the relationship between total protocol ratings and item sum scores is explored.
5.
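An Analysis of the Combined Raven's Test Using Item Response Theory. Total citations: 1, self-citations: 0, citations by others: 1
Data from the Combined Raven's Test for 354 young men of varying ability levels were analyzed with BILOG-MG 3.0, using marginal maximum likelihood estimation and the three-parameter logistic model. The results showed that most items of the Combined Raven's Test fit the three-parameter logistic model (6 items did not). The peak of the test information function lay between -3 and -2 on the difficulty scale, with a value of 16.82. For 18 items, the peak of the item information function was below 0.2. In terms of discrimination, all 72 items had discrimination parameters above 0.5, which is fairly satisfactory. The difficulty parameters were low for all items: the great majority were below 0, and the highest was only 1.01. Item difficulty was determined mainly by the level of operation required. The pseudo-guessing parameters ranged from 0.07 to 0.24. Taken together, the analysis indicates that the Combined Raven's Test assesses the intelligence of normal young adults with poor precision.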
6.
We give an account of Classical Test Theory (CTT) in terms of the more fundamental ideas of Item Response Theory (IRT). This approach views classical test theory as a very general version of IRT, and the commonly used IRT models as detailed elaborations of CTT for special purposes. We then use this approach to CTT to derive some general results regarding the prediction of the true-score of a test from an observed score on that test as well as from an observed score on a different test. This leads us to a new view of linking tests that were not developed to be linked to each other. In addition we propose true-score prediction analogues of the Dorans and Holland measures of the population sensitivity of test linking functions. We illustrate the accuracy of the first-order theory using simulated data from the Rasch model, and illustrate the effect of population differences using a set of real data. This research is collaborative in every respect and the order of authorship is alphabetical. It was begun when both authors were on the faculty of the Graduate School of Education at the University of California, Berkeley. We would like to thank Neil Dorans, Skip Livingston and two anonymous referees for many suggestions that have greatly improved this paper.
7.
XS-DIF is a program for detection of Differential Item Functioning (DIF) using Item Response Theory (IRT). It calculates Lord's Chi-Square, Raju's Signed Area and Unsigned Area, and Kim and Cohen's Closed-interval signed area and Closed-interval unsigned area. XS-DIF was designed to be executed in Excel 2000 and can analyze up to 100 items. It is useful for supporting data analysis in research projects and for detecting and teaching DIF.
8.
Klaas J. Wardenaar Rob B. K. Wanders Bertus F. Jeronimus Peter de Jonge 《Journal of psychopathology and behavioral assessment》2018,40(2):318-333
Psychometric work on the widely used Depression Anxiety and Stress Scales (DASS) has mostly used classical psychometrics and ignored common internet-administered versions. Therefore, the present study used not only classical, but also modern psychometrics based on item response theory (IRT) to evaluate an internet-administered version of the DASS (Dutch translation). Internet-administered DASS data were collected as part of a large internet-based study in the Dutch adult population (n = 7972). Initially, external correlates (i.e. demographics, other measures) and some classical psychometrics (internal consistency, convergent/divergent validity) of the DASS scales were evaluated. Next, IRT was used to investigate the scales’ dimensionality, discrimination and item-functioning. Finally, the DASS depression scale was further investigated by linking it to the more clinically-oriented Quick Inventory of Depressive Symptomatology (QIDS) using IRT. Initial classical psychometric analyses supported the scales’ internal consistency (alpha = 0.94–0.98) and convergent/divergent validity. IRT analyses showed that each of the DASS scales was only suitable to measure variations in a very narrow and rather mild severity range. Linking the DASS depression scale with the QIDS also showed that the DASS depression scale discriminated best in the mild-moderate severity range, but not at higher severity levels that were covered by the QIDS. In conclusion, the scales of the internet-administered DASS show good internal consistency and validity. However, users should be aware that the scales discriminate best at mild-moderate severity ranges in the general population.
9.
10.
《Journal of personality assessment》2013,95(2):282-307
Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. To date, however, these methods have not been used frequently by personality assessment researchers, in part because many researchers have not been introduced to the methods and in part because most of the development of IRT has taken place in applied education assessment settings, resulting in terminology that is ability focused rather than trait focused. The purpose of this article is twofold. First, an overview of IRT is presented, highlighting the concepts of the three-parameter IRT model, item and test information, and conditional standard error of measurement. Second, the psychometric properties of the MMPI-2 PSY-5 scales are examined to demonstrate IRT's value.
11.
Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.
12.
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
13.
Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions. Total citations: 1, self-citations: 0, citations by others: 1
Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of probabilities that students have certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction for students’ needs. We also provide results from a simulation study that demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
14.
Joke Van den Broeck Gina Rossi Eva Dierckx Barbara De Clercq 《Journal of psychopathology and behavioral assessment》2012,34(3):361-369
Geriatric researchers and clinicians often have to deal with a lack of valid personality measures for older age groups (e.g., Mroczek, Hurt, & Berman, 1999; Zweig 2008), which hampers a reliable assessment of personality in later life. An age-neutral measurement system is one of the basic conditions for an accurate personality assessment across the lifespan, both longitudinally and cross-sectionally. In the present study, we empirically investigate the age-neutrality of one of the most widely used personality measures (i.e., the NEO PI-R (Costa & McCrae, 1992)), by examining potential Differential Item Functioning (DIF). Overall, results indicate that the vast majority (92.9% at domain-level and 95% at facet-level) of the NEO PI-R items was similarly endorsed by younger and older age groups with the same position on the personality trait of interest, corroborating the NEO PI-R's age neutrality. However, Differential Test Functioning (DTF) analyses revealed large DTF for Extraversion, and facet A6 (Tender-Mindedness). Results are discussed in terms of their implications for using the current format of the NEO PI-R in older aged samples.
15.
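Two Classes of Methods for Detecting Differential Item Functioning: A Comparison of CFA and IRT. Total citations: 2, self-citations: 0, citations by others: 2
At present, both confirmatory factor analysis (CFA) and item response theory (IRT) offer tests for identifying differential item functioning (DIF). Focusing on unidimensional, polytomously scored items, this article introduces the CFA and IRT methods for detecting DIF and compares the two.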
16.
Annisa Ahmed Tyler M. Moore Jason Lewis Laura Butler Tami D Benton 《Journal of aggression, maltreatment & trauma》2013,22(10):1110-1124
Bullying involvement is a multidimensional issue and a significant concern for school-aged youth. However, research on bullying involvement within clinical samples is limited. The present study examines the psychometric properties of the University of Illinois Bully, Fighting, and Victimization Scales among children and adolescents presenting for mood symptoms at a behavioral health outpatient clinic. Patients (n = 165) with an age range of 8–18 years were included in this investigation. Item frequencies, internal consistency, and construct validity were examined while confirmatory and exploratory factor analyses were conducted. Results showed that the scales were low to moderately correlated with each other. Internal consistencies were acceptable for the Bully (α = .70), Fighting (α = .84), and Victimization (α = .88) Scales. Exploratory factor analysis revealed clean three-factor solutions with items loading on their intended factors and with few cross-loadings. Fit of the confirmatory factor analysis was good when items were treated as ordinal and two items were allowed to cross-load on more than one factor. The University of Illinois Bully, Fighting, and Victimization Scales show utility for use with clinically referred youth.
17.
《Cognitive behaviour therapy》2013,42(4):194-202
The course of severe anxiety surrounding health issues is unknown. The available literature suggests that adults who are overly anxious about health issues often interpret or misinterpret their bodily signs and symptoms to be indicative of a serious illness. The construct of health anxiety has not been examined in children and, to date, there has not been an instrument developed for this purpose. The Illness Attitude Scales is one of the most commonly used instruments for evaluating fears, beliefs, and attitudes that are associated with hypochondriasis and abnormal illness behaviour in adults. We sought to adapt the Illness Attitude Scales for use with children ages 8–15 years. The adapted Illness Attitude Scales was renamed the Childhood Illness Attitude Scales. Revisions to the adult version consisted of simplification of language, revision of Likert scale (i.e. 5-point to 3-point scale), and the addition of 7 questions to evaluate the role parents/guardians play in facilitating medical attention or treatment. Correlations between Childhood Illness Attitude Scales total scores and other self-report measures were supportive of the construct-related validity of the Childhood Illness Attitude Scales and suggested that it is a useful measure of health anxiety in school-age children. Practical and theoretical implications of the present results are discussed.
18.
19.
Augustine Osman Peter M. Gutierrez Francisco X. Barrios Beverly A. Kopper Christine E. Chiros 《Journal of psychopathology and behavioral assessment》1998,20(3):249-264
The present investigations examined the factor structure and psychometric properties of two new self-report measures of social phobia, the Social Interaction Anxiety Scale (SIAS) and the Social Phobia Scale (SPS). A confirmatory factor analysis in Study I provided support for the fit of a two-factor model of the SIAS and SPS. Internal consistency estimates were high for the original two scales with a sample of 200 undergraduates. Also, using an item parceling procedure, the obtained internal consistency reliability indices for each parcel were acceptable. Results of the CFA in Study II provided support for the factorial stability of the model identified in Study I. Furthermore, multisample analyses showed invariant patterns for factor loadings and factor correlations across 138 men and 272 women. Gender differences were not observed in the mean SIAS and SPS scale and item scores. Both scales correlated negatively and significantly with measures of social desirability. Concurrent validity was established for the scales. The SPS was less specific than the SIAS to symptoms of social phobia.
20.
Shalane K. Sadri Peter M. McEvoy Anthony Pinto Rebecca A. Anderson 《Journal of personality assessment》2019,101(3):284-293
Obsessive-compulsive personality disorder (OCPD) has been subject to numerous definition and classification changes, which has contributed to difficulties in reliable measurement of the disorder. Consequently, OCPD measures have yielded poor validity and inconsistent prevalence estimates. Reliable and valid measures of OCPD are needed. The aim of the current study was to examine the factor structure and psychometric properties of the Pathological Obsessive Compulsive Personality Scale (POPS). Participants (N = 571 undergraduates) completed a series of self-report measures online, including the POPS. Confirmatory factor analysis was used to compare the fit of unidimensional, five factor, and bifactor models of the POPS. Convergent and divergent validity were assessed in relation to other personality dimensions. A bifactor model provided the best fit to the data, indicating that the total POPS scale and four subscales can be scored to obtain reliable indicators of OCPD. The POPS was most strongly associated with a disorder-specific measure of OCPD; however, there were also positive associations with theoretically disparate constructs, so further research is needed to clarify the validity of the scale.