
1.

Illustration of MIMIC-Model DIF Testing with the Schedule for Nonadaptive and Adaptive Personality

Carol M. Woods Thomas F. Oltmanns Eric Turkheimer 《Journal of psychopathology and behavioral assessment》2009,31(4):320-330

This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed for one group than the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items and some effects were quite large. Differentially-functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.
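As a reading aid, the definition of uniform DIF in this abstract can be written in the generic MIMIC form below. This is a textbook-style sketch, not the specification reported in the paper; the symbols ($y_i^*$ for the latent response underlying item $i$, $\eta$ for the latent trait, $z$ for the group indicator) are assumed notation.

```latex
\begin{aligned}
y_i^{*} &= \lambda_i \eta + \beta_i z + \varepsilon_i \\
\eta    &= \gamma z + \zeta
\end{aligned}
```

Here $\gamma$ captures the group mean difference on the trait, and item $i$ shows uniform DIF when the direct effect $\beta_i$ differs from zero, that is, when group membership shifts endorsement of the item even after the trait difference is controlled.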

2.

Cross-country gender DIF in PISA science literacy items

Jehanzeb R. Cheema 《European Journal of Developmental Psychology》2019,16(2):152-166

This study investigated gender based differential item functioning (DIF) in science literacy items included in the Program for International Student Assessment (PISA) 2012. Prior research has suggested the presence of such DIF in large scale surveys. Our study extends the empirical literature by examining gender based DIF differences at the country level in order to gain a better overall picture of how cultural and national differences affect the occurrence of uniform and nonuniform DIF. Our statistical results indicate the existence of widespread gender based DIF in PISA, with estimates of the percentage of potentially biased items ranging between 2 and 44% (M = 16, SD = 9.9). Our reliance on nationally representative country samples allows these findings to have wide applicability.

3.

Examining the Relationship Between Gender DIF and Language Complexity in Mathematics Assessments

Adnan Kan 《International Journal of Testing》2014,14(3):245-264

This study investigated whether the linguistic complexity of items leads to gender differential item functioning (DIF) on mathematics assessments. Two forms of a mathematics test were developed. The first form consisted of algebra items based on mathematical expressions, terms, and equations. In the second form, the same items were written as word problems without changing their contents and solutions. The test forms were given to a sample of 671 sixth-grade students from 10 middle schools in Turkey. The tests were administered to the students with a 4-week interval. Explanatory item response modeling and logistic regression approaches were used to examine gender DIF. Several word problems were flagged as having gender DIF in favor of female examinees, whereas mathematically expressed forms of the same items did not function differently across male and female examinees. The verbal content of word problems seems to influence the way males and females respond to items.

4.

Classification of loneliness using the T‐ILS: Is it as simple as it seems?

Tine Nielsen Ida S. Friderichsen Signe Boe Rayce 《Scandinavian journal of psychology》2021,62(1):104-115

Student well‐being is a growing issue in higher education, and assessment of the prevalence of conditions such as loneliness is therefore important. In higher education and population surveys the Three‐Item Loneliness Scale (T‐ILS) is used increasingly. The T‐ILS is attractive for large multi‐subject surveys, as it consists of only three items (derived from the UCLA Loneliness Scale). Several ways of classifying persons as lonely based on T‐ILS scores exist: dichotomous and trichotomous classification schemes and use of sum scores with rising levels indicating more loneliness. The question remains whether T‐ILS scores are comparable across the different population groups where they are used or across groups of students in the higher education system. The aim was to investigate whether the T‐ILS suffers from differential item functioning (DIF) that might change the loneliness classification among higher education students, using a large sample just admitted to 22 different academy profession degree programs in Denmark (N = 3,757). DIF was tested relative to degree program, age groups and gender. The framework of graphical loglinear Rasch models was applied, as this allows for adjustment of sum scores for uniform DIF, and thus for assessment of whether DIF might change the classification. Two items showed DIF relative to degree program and gender, and adjusting for this DIF changed the classification for some subgroups. The consequences were negligible when using a dichotomous classification and larger when using a trichotomous classification. Therefore, trichotomous classification should be used with caution unless suitable adjustments for DIF are done prior to classification.

5.

Understanding the latent structure of job performance ratings

Scullen SE Mount MK Goff M 《The Journal of applied psychology》2000,85(6):956-970

This study quantified the effects of 5 factors postulated to influence performance ratings: the ratee's general level of performance, the ratee's performance on a specific dimension, the rater's idiosyncratic rating tendencies, the rater's organizational perspective, and random measurement error. Two large data sets, consisting of managers (n = 2,350 and n = 2,142) who received developmental ratings on 3 performance dimensions from 7 raters (2 bosses, 2 peers, 2 subordinates, and self) were used. Results indicated that idiosyncratic rater effects (62% and 53%) accounted for over half of the rating variance in both data sets. The combined effects of general and dimensional ratee performance (21% and 25%) were less than half the size of the idiosyncratic rater effects. Small perspective-related effects were found in boss and subordinate ratings but not in peer ratings. Average random error effects in the 2 data sets were 11% and 18%.

6.

ASSESSING THE EFFECT OF COHORT, GENDER, AND RACE ON DIFFERENTIAL ITEM FUNCTIONING (DIF) IN AN ADAPTIVE TEST DESIGNED FOR MULTI-AGE GROUPS

HYE-SOOK PARK P. DAVID PEARSON MARK D. RECKASE 《Reading Psychology》2013,34(1):81-101

Differential item functioning (DIF) statistics were computed using items from the Peabody Individual Achievement Test (PIAT)-Reading Comprehension subtest for children of the same age group (ages 7 through 12 respectively). The pattern of observed DIF items was determined by comparing each cohort across age groups. Differences related to race and gender were also identified within each cohort. Characteristics of DIF items were identified based on sentence length, vocabulary frequency, and density of a sentence. DIF items were more frequently associated with short sentences than with long sentences. This study explored the potential limitation in the longitudinal use of items in an adaptive test.

7.

Examining differential item functioning due to item difficulty and alternative attractiveness

Paul Westers Henk Kelderman 《Psychometrika》1992,57(1):107-118

A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items through a combination of the usual notion of DIF for correct/incorrect responses and information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as a chi-square random variable.

8.

Evaluating the Equivalence of an Employee Attitude Survey Across Languages, Cultures, and Administration Formats

《International Journal of Testing》2013,13(2):129-150

Modern multinational organizations have found that testing or surveying their employees is difficult when the employees come from a variety of language backgrounds. In such situations, surveys and tests are often adapted for use across multiple languages. However, different language versions of an instrument are not necessarily equivalent, which may lead to misleading interpretations. In this study, we used weighted multidimensional scaling (MDS), analysis of covariance (ANCOVA), and ordinal logistic regression (LR) to evaluate the structural equivalence and differential item functioning (DIF) of an employee attitude survey from a large international corporation. Specifically, we evaluated the functioning of the survey items across 3 different languages, 8 different cultures, and 2 mediums of administration (paper-based and Web-based). MDS was used to evaluate structural equivalence, and ANCOVA and LR were used to evaluate DIF across selected employee groups and the 2 administration formats. The results indicated that the structure of the survey data was consistent and that the items functioned similarly across all groups. The results also illustrate the utility of MDS, ANCOVA, and LR for evaluating translated instruments. The implications of the results for future research in this area are discussed.

9.

Improvement in Detection of Differential Item Functioning Using a Mixture Item Response Theory Model

Annette M. Maij-de Meij Henk Kelderman Henk van der Flier 《Multivariate behavioral research》2013,48(6):975-999

Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

10.

Differential Item Functioning on Antisocial Behavior Scale Items for Adolescents and Young Adults from Single-Parent and Two-Parent Families

Young I. Cho Monica J. Martin Rand D. Conger Keith F. Widaman 《Journal of psychopathology and behavioral assessment》2010,32(2):157-168

We investigated measurement equivalence in two antisocial behavior scales (i.e., one scale for adolescents and a second scale for young adults) by examining differential item functioning (DIF) for respondents from single-parent (n = 109) and two-parent families (n = 447). Even though one item in the scale for adolescents and two items in the scale for young adults showed significant DIF, the two scales exhibited non-significant differential test functioning (DTF). Both uniform and nonuniform DIF were investigated and examples of each type were identified. Specifically, uniform DIF was exhibited in the adolescent scale whereas nonuniform DIF was shown in the young adult scale. Implications of DIF results for assessment of antisocial behavior, along with strengths and limitations of the study, are discussed.

11.

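A Preliminary Study of Differential Item Functioning in a Chinese Vocabulary Test (total citations: 6; self-citations: 1; citations by others: 5)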

Cao Yiwei (曹亦薇) Zhang Houcan (张厚粲) 《Acta Psychologica Sinica》1999,32(4):460-467

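This paper applied two different methods to detect DIF in 36 vocabulary items from an operational Chinese vocabulary test. For a sample of more than 1,400 ninth-grade students, comparisons were made between male and female students and between urban and suburban students. The gender analysis identified 7 items showing uniform DIF. For the urban versus suburban comparison, 7 items were flagged as DIF by both methods, of which 5 showed uniform DIF and 2 showed nonuniform DIF. The paper also discusses possible causes of the DIF.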

12.

HYPOTHESIZING DIFFERENTIAL ITEM FUNCTIONING IN GLOBAL EMPLOYEE OPINION SURVEYS

ANN MARIE RYAN MICHAEL HORVATH ROBERT E. PLOYHART NEAL SCHMITT L. ALLEN SLADE 《Personnel Psychology》2000,53(3):531-562

Cross-cultural researchers have not used cultural dimensions to predict when differential item functioning (DIF) in attitude survey items is likely to occur. Predictive hypotheses for items related to supervision on a global corporate survey were developed based on 3 of Hofstede's (1991a) dimensions. In some cases, greater DIF was found on hypothesized items between countries differing on cultural dimensions. Implications for the use of this framework and DIF in examining multinational employee opinion surveys are discussed.

13.

Gender bias and construct validity in vocational interest measurement: Differential item functioning in the Strong Interest Inventory

Sif Einarsdóttir James Rounds 《Journal of Vocational Behavior》2009,74(3):295-307

Item response theory was used to address gender bias in interest measurement. A differential item functioning (DIF) technique, SIBTEST, and DIMTEST for dimensionality were applied to the items of the six General Occupational Theme (GOT) and 25 Basic Interest (BI) scales in the Strong Interest Inventory. A sample of 1860 women and 1105 men was used. The scales were not unidimensional and contained both primary and minor dimensions. Gender-related DIF was detected in two-thirds of the items. Item type (i.e., occupations, activities, school subjects, types of people) did not differ in DIF. A sex-type dimension was found to influence the responses of men and women differently. When the biased items were removed from the GOT scales, gender differences favoring men were reduced in the R and I scales but gender differences favoring women remained in the A and S scales. Implications for the development, validation and use of interest measures are discussed.

14.

Illustration of Multilevel Explanatory IRT Model DIF Testing with the Creative Thinking Scale

Meihua Qian Xianyong Wang 《The Journal of Creative Behavior》2020,54(4):1021-1027

Creativity has been well studied in the past several decades, and numerous measures have been developed to assess creativity. However, validity evidence associated with each measure is often mixed. In particular, the social consequence aspect of validity has received little attention. This is partly due to the difficulty of testing for differential item functioning (DIF) within the traditional classical test theory framework, which still remains the most popular approach to assessing creativity. Hence, this study provides an example of examining differential item functioning using multilevel explanatory item response theory models. The Creative Thinking Scale was tested for DIF in a sample of 1043 10th–12th graders. Results revealed significant uniform and non-uniform DIF for some items. Differentially functioning items are able to produce measurement bias and should be either deleted or modeled. The detailed implications for researchers and practitioners are discussed.

15.

Differential Item Functioning of the Emotional and Behavioral Screener for Caucasian and African American Elementary School Students

Matthew C. Lambert Allen G. Garcia Michael H. Epstein Douglas Cullinan 《Journal Of Applied School Psychology》2018,34(3):201-214

The present study examined the psychometric properties of a universal screening instrument called the Emotional and Behavioral Screener (EBS), which is designed to identify students exhibiting emotional and behavioral problems. The primary purposes of this study were to assess the measurement invariance of EBS items between Caucasian and African-American students and to assess the impact of differential item functioning (DIF) on EBS scores. The sample consisted of 946 elementary students from throughout the U.S. The findings suggested that EBS items exhibited small to negligible levels of DIF, and that DIF did not significantly impact EBS scores. The results supported the EBS as a universal screening instrument that is fair in measuring the emotional and behavioral risk of elementary students. Research limitations and implications for school professionals are discussed.

16.

Computing interrater reliability on the Apple Macintosh computer

John O. Brooks Laura L. Brooks 《Behavior research methods》1991,23(1):82-84

A program is described for computing interrater reliability by averaging, for each rater, the correlations between one rater’s ratings and every other rater’s ratings. For situations in which raters rate more than one ratee, raters’ reliabilities can be computed for either each item or each ratee. The program reads data from a text file and puts the reliability coefficients in a text file. The standard Macintosh interface is implemented. The Quick-BASIC program is distributed both as a listing and in compiled form; it can be run with advantage with math coprocessors.
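The averaging rule described in this abstract can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' Quick-BASIC program: the function names `pearson` and `rater_reliabilities`, the dictionary input format, and the toy ratings are all assumptions, and file input/output is omitted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rater_reliabilities(ratings):
    """For each rater, average the correlations with every other rater.

    `ratings` maps a rater name to that rater's scores over the same
    ordered set of items (a hypothetical input format).
    """
    result = {}
    for rater, scores in ratings.items():
        correlations = [
            pearson(scores, other_scores)
            for other, other_scores in ratings.items()
            if other != rater
        ]
        result[rater] = sum(correlations) / len(correlations)
    return result

# Toy data: A and B agree perfectly; C rates in the opposite order.
ratings = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
}
reliabilities = rater_reliabilities(ratings)
```

In the toy data, raters A and B each correlate +1 with one another and -1 with C, so their averages are 0, while C averages -1. A rater whose ratings are constant would give a zero variance and a division error, which this sketch does not guard against.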

17.

Are performance appraisal ratings from different rating sources comparable? (total citations: 2; self-citations: 0; citations by others: 2)

Facteau JD Craig SB 《The Journal of applied psychology》2001,86(2):215-227

The purpose of this study was to test whether a multisource performance appraisal instrument exhibited measurement invariance across different groups of raters. Multiple-groups confirmatory factor analysis as well as item response theory (IRT) techniques were used to test for invariance of the rating instrument across self, peer, supervisor, and subordinate raters. The results of the confirmatory factor analysis indicated that the rating instrument was invariant across these rater groups. The IRT analysis yielded some evidence of differential item and test functioning, but it was limited to the effects of just 3 items and was trivial in magnitude. Taken together, the results suggest that the rating instrument could be regarded as invariant across the rater groups, thus supporting the practice of directly comparing their ratings. Implications for research and practice are discussed, as well as for understanding the meaning of between-source rating discrepancies.

18.

The PHQ-9 assesses depression similarly in men and women from the general population

《Personality and individual differences》2014

Gender-based differential item functioning occurs when men and women respond differently to an item despite being similar on the trait assessed by that item. The Patient Health Questionnaire-9 (PHQ-9) is a prominent screening tool for depression. Researchers exploring whether the PHQ-9 exhibits gender-based differential item functioning have used only specialized samples (e.g., individuals with cancer or vision loss). We explored gender bias in the PHQ-9 by means of differential item functioning analyses in a population-based sample. We made use of the National Health and Nutrition Examination Surveys (NHANES, 2008), a population-based sample of the USA including 5995 participants. Differential item functioning was assessed using the Mantel-Haenszel chi-square test and by comparing item characteristic curves between men and women. All items exhibited negligible differential item functioning as demonstrated by the Mantel-Haenszel test, with absolute standardized mean differences ranging from 0.00 to 0.06. Item characteristic curves were similar between genders for all but one item. Item 5 (i.e., changes in appetite) exhibited very minor non-uniform differential item functioning, wherein extremely depressed women endorsed higher response options on this item compared to equally depressed men. Researchers can use the PHQ-9 without concern of gender biases, particularly in epidemiological research.
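To make the Mantel-Haenszel idea concrete, the sketch below computes the common odds ratio and chi-square statistic for a dichotomous item across matched score strata. It is a simplification under stated assumptions: the PHQ-9 items are polytomous and the study's exact procedure is not given here, so the function name `mantel_haenszel`, the 2x2 table layout, and the counts are purely illustrative. Clamping the continuity-corrected numerator at zero is also a choice made for this sketch.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel common odds ratio and chi-square (1 df).

    Each stratum is a tuple (a, b, c, d) for one matched score level:
      a = reference group endorsing, b = reference group not endorsing,
      c = focal group endorsing,     d = focal group not endorsing.
    """
    numerator = denominator = 0.0
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        numerator += a * d / n
        denominator += b * c / n
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = numerator / denominator
    # Continuity-corrected statistic; the numerator is clamped at zero here.
    chi_square = max(0.0, abs(observed - expected) - 0.5) ** 2 / variance
    return odds_ratio, chi_square

# Hypothetical counts: both groups endorse at identical rates in each stratum.
no_dif = [(10, 10, 10, 10), (20, 5, 20, 5)]
or_none, chi_none = mantel_haenszel(no_dif)

# Hypothetical counts: the reference group endorses more often at the same score.
with_dif = [(15, 5, 10, 10)]
or_dif, chi_dif = mantel_haenszel(with_dif)
```

An odds ratio near 1 with a small chi-square indicates no uniform DIF at matched trait levels, while an odds ratio above 1 indicates that the item is easier to endorse for the reference group. The sketch assumes at least one stratum has nonzero off-diagonal counts, otherwise the ratio is undefined.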

19.

Item–trait association, scale multidimensionality, and differential item functioning identification in personality assessment

John T. Kulas Jenny Merriam Yuko Onama 《Journal of research in personality》2008,42(4):1102-1108

This report documents relationships between differential item functioning (DIF) identification and: (1) item–trait association, and (2) scale multidimensionality in personality assessment. Applying Zumbo's logistic regression model [Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.], DIF effect size is found to become increasingly inflated as investigated item associations with trait scores decrease. Similar patterns were noted for the influence of scale multidimensionality on DIF identification. Individuals who investigate DIF in personality assessment applications are provided with estimates regarding the impact of the magnitude of item and trait association and scale multidimensionality on DIF occurrence and effect size. The results emphasize the importance of excluding investigated items in focal trait identification prior to conducting DIF analyses and reporting item and scale psychometric properties in DIF reports.
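For readers unfamiliar with the logistic regression approach to DIF, its general form for a binary item is sketched below. The notation is assumed for illustration ($X$ for the trait or total score, $G$ for the group indicator) and is not taken from the report; for ordinal items the same predictors enter a cumulative-logit model.

```latex
\operatorname{logit} P(Y = 1 \mid X, G)
  = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (X \cdot G)
```

A nonzero $\beta_2$ indicates uniform DIF and a nonzero $\beta_3$ indicates nonuniform DIF. The two are commonly tested together with a 2-df chi-square difference between this model and the model containing $X$ alone, and the change in $R^2$ between those models serves as the effect size.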

20.

Psychometric Properties of the Wisconsin Schizotypy Scales in an Undergraduate Sample: Classical Test Theory, Item Response Theory, and Differential Item Functioning

Beate P. Winterstein Terry A. Ackerman Paul J. Silvia Thomas R. Kwapil 《Journal of psychopathology and behavioral assessment》2011,33(4):480-490

The Wisconsin Schizotypy Scales are widely used for assessing schizotypy in nonclinical and clinical samples. However, they were developed using classical test theory (CTT) and have not had their psychometric properties examined with more sophisticated measurement models. The present study employed item response theory (IRT) as well as traditional CTT to examine psychometric properties of four of the schizotypy scales on the item and scale level, using a large sample of undergraduate students (n = 6,137). In addition, we investigated differential item functioning (DIF) for sex and ethnicity. The analyses revealed many strengths of the four scales, but some items had low discrimination values and many items had high DIF. The results offer useful guidance for applied users and for future development of these scales.