Similar Articles
1.
This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits, with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed by one group than the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items, and some effects were quite large. Differentially functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.
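The uniform/nonuniform distinction can be illustrated numerically. The sketch below does not fit the MIMIC model used in this study; it swaps in a two-parameter logistic (2PL) item response function with hypothetical per-group item parameters (a, b), and labels DIF as uniform when one group's endorsement curve lies above the other's everywhere on an ability grid, and nonuniform when the curves cross. The function names are illustrative only.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of endorsing a binary item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def classify_dif(ref, focal, grid=None, tol=1e-9):
    """Label DIF between reference and focal (a, b) item parameters.

    Returns "none", "uniform" (one curve is never below the other on the grid),
    or "nonuniform" (the curves cross inside the grid).
    """
    if grid is None:
        grid = [x / 10 for x in range(-40, 41)]  # theta from -4.0 to 4.0
    diffs = [irf_2pl(t, *ref) - irf_2pl(t, *focal) for t in grid]
    positive = any(d > tol for d in diffs)
    negative = any(d < -tol for d in diffs)
    if not positive and not negative:
        return "none"
    # A sign change means the curves cross: the group advantage reverses with theta.
    return "nonuniform" if positive and negative else "uniform"


print(classify_dif((1.0, 0.0), (1.0, 0.5)))  # same a, shifted b
print(classify_dif((1.0, 0.0), (2.0, 0.0)))  # different a
```

With equal discriminations, a shift in b alone can only produce uniform DIF. Unequal discriminations make the curves cross at θ = (a₁b₁ − a₂b₂)/(a₁ − a₂), which this grid check detects only when that point falls inside the grid.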

2.
This study investigated gender-based differential item functioning (DIF) in the science literacy items of the Program for International Student Assessment (PISA) 2012. Prior research has suggested that such DIF is present in large-scale surveys. Our study extends the empirical literature by examining gender-based DIF at the country level in order to gain a better overall picture of how cultural and national differences affect the occurrence of uniform and nonuniform DIF. Our statistical results indicate widespread gender-based DIF in PISA, with estimates of the percentage of potentially biased items ranging between 2% and 44% (M = 16, SD = 9.9). Our reliance on nationally representative country samples allows these findings to have wide applicability.

3.
This study investigated whether the linguistic complexity of items leads to gender differential item functioning (DIF) on mathematics assessments. Two forms of a mathematics test were developed. The first form consisted of algebra items based on mathematical expressions, terms, and equations. In the second form, the same items were written as word problems without changing their content or solutions. The test forms were given to a sample of 671 sixth-grade students from 10 middle schools in Turkey, with the two administrations 4 weeks apart. Explanatory item response modeling and logistic regression approaches were used to examine gender DIF. Several word problems were flagged as having gender DIF in favor of female examinees, whereas the mathematically expressed forms of the same items did not function differently across male and female examinees. The verbal content of word problems seems to influence the way males and females respond to items.

4.
Student well-being is a growing issue in higher education, and assessing the prevalence of conditions such as loneliness is therefore important. In higher education and population surveys, the Three-Item Loneliness Scale (T-ILS) is increasingly used. The T-ILS is attractive for large multi-subject surveys, as it consists of only three items (derived from the UCLA Loneliness Scale). Several ways of classifying persons as lonely based on T-ILS scores exist: dichotomous and trichotomous classification schemes, and the use of sum scores, with higher levels indicating more loneliness. The question remains whether T-ILS scores are comparable across the different population groups in which they are used, or across groups of students in the higher education system. The aim was to investigate whether the T-ILS suffers from differential item functioning (DIF) that might change the loneliness classification among higher education students, using a large sample of students just admitted to 22 different academy profession degree programs in Denmark (N = 3,757). DIF was tested relative to degree program, age group, and gender. The framework of graphical loglinear Rasch models was applied, as it allows sum scores to be adjusted for uniform DIF and thus allows assessment of whether DIF might change the classification. Two items showed DIF relative to degree program and gender, and adjusting for this DIF changed the classification for some subgroups. The consequences were negligible when a dichotomous classification was used and larger when a trichotomous classification was used. Therefore, trichotomous classification should be used with caution unless suitable adjustments for DIF are made prior to classification.

5.
This study quantified the effects of 5 factors postulated to influence performance ratings: the ratee's general level of performance, the ratee's performance on a specific dimension, the rater's idiosyncratic rating tendencies, the rater's organizational perspective, and random measurement error. Two large data sets, consisting of managers (n = 2,350 and n = 2,142) who received developmental ratings on 3 performance dimensions from 7 raters (2 bosses, 2 peers, 2 subordinates, and self), were used. Results indicated that idiosyncratic rater effects (62% and 53%) accounted for over half of the rating variance in both data sets. The combined effects of general and dimensional ratee performance (21% and 25%) were less than half the size of the idiosyncratic rater effects. Small perspective-related effects were found in boss and subordinate ratings but not in peer ratings. Average random error effects in the 2 data sets were 11% and 18%.

6.

Differential item functioning (DIF) statistics were computed for items from the Peabody Individual Achievement Test (PIAT) Reading Comprehension subtest, separately for each age cohort of children (ages 7 through 12). The pattern of observed DIF items was determined by comparing each cohort across age groups. Differences related to race and gender were also identified within each cohort. DIF items were characterized by sentence length, vocabulary frequency, and sentence density. DIF items were more frequently associated with short sentences than with long sentences. This study explored a potential limitation of the longitudinal use of items in an adaptive test.

7.
A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items by combining the usual notion of DIF for correct/incorrect responses with the information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, the difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test whose statistic is asymptotically distributed as a chi-square random variable.
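Once a compact (no-DIF) model and an augmented (DIF) model have been fitted, the likelihood ratio test itself is simple arithmetic. The sketch below assumes the two maximized log-likelihoods and the difference in the number of free parameters are already available; fitting the incomplete latent class models is not shown. `lr_test` and `chi2_sf` are hypothetical helper names, and the chi-square tail probability is computed from the series expansion of the regularized lower incomplete gamma function so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function;
    intended for moderate x (roughly below a few hundred).
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s          # n = 0 term of sum z^n / (s (s+1) ... (s+n))
    total = term
    n = 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= z / (s + n)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def lr_test(loglik_restricted, loglik_full, df_diff):
    """Likelihood ratio statistic G2 = 2 (logL_full - logL_restricted) and its p-value."""
    g2 = 2.0 * (loglik_full - loglik_restricted)
    return g2, chi2_sf(g2, df_diff)


# Hypothetical log-likelihoods: the DIF model frees 2 extra parameters.
print(lr_test(-100.0, -98.0, 2))
```

For df = 2 the tail probability reduces to exp(−x/2), which makes a convenient sanity check of the series.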

8.
Modern multinational organizations have found that testing or surveying their employees is difficult when the employees come from a variety of language backgrounds. In such situations, surveys and tests are often adapted for use across multiple languages. However, different language versions of an instrument are not necessarily equivalent, which may lead to misleading interpretations. In this study, we used weighted multidimensional scaling (MDS), analysis of covariance (ANCOVA), and ordinal logistic regression (LR) to evaluate the structural equivalence and differential item functioning (DIF) of an employee attitude survey from a large international corporation. Specifically, we evaluated the functioning of the survey items across 3 different languages, 8 different cultures, and 2 mediums of administration (paper-based and Web-based). MDS was used to evaluate structural equivalence, and ANCOVA and LR were used to evaluate DIF across selected employee groups and the 2 administration formats. The results indicated that the structure of the survey data was consistent and that the items functioned similarly across all groups. The results also illustrate the utility of MDS, ANCOVA, and LR for evaluating translated instruments. The implications of the results for future research in this area are discussed.

9.
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

10.
We investigated measurement equivalence in two antisocial behavior scales (i.e., one scale for adolescents and a second scale for young adults) by examining differential item functioning (DIF) for respondents from single-parent (n = 109) and two-parent families (n = 447). Even though one item in the scale for adolescents and two items in the scale for young adults showed significant DIF, the two scales exhibited non-significant differential test functioning (DTF). Both uniform and nonuniform DIF were investigated and examples of each type were identified. Specifically, uniform DIF was exhibited in the adolescent scale whereas nonuniform DIF was shown in the young adult scale. Implications of DIF results for assessment of antisocial behavior, along with strengths and limitations of the study, are discussed.

11.
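A Preliminary Study of Differential Item Functioning in a Chinese Vocabulary Test   Total citations: 6; self-citations: 1; citations by others: 5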
Cao Yiwei (曹亦薇), Zhang Houcan (张厚粲). Acta Psychologica Sinica (《心理学报》), 1999, 32(4): 460-467
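This paper applied two different methods to detect DIF in the 36 vocabulary items of an operational Chinese vocabulary test, comparing boys with girls and urban with suburban students in a sample of more than 1,400 ninth-grade (third-year junior middle school) students. The gender analysis flagged 7 items with uniform DIF. For the urban versus suburban comparison, 7 items were identified as showing DIF by both methods; 5 of these showed uniform DIF and 2 showed nonuniform DIF. The paper also discusses possible sources of DIF.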

12.
Cross-cultural researchers have not used cultural dimensions to predict when differential item functioning (DIF) in attitude survey items is likely to occur. Predictive hypotheses for items related to supervision on a global corporate survey were developed based on 3 of Hofstede's (1991a) dimensions. In some cases, greater DIF was found on hypothesized items between countries differing on cultural dimensions. Implications for the use of this framework and DIF in examining multinational employee opinion surveys are discussed.

13.
Item response theory was used to address gender bias in interest measurement. A differential item functioning (DIF) technique, SIBTEST, and a dimensionality technique, DIMTEST, were applied to the items of the six General Occupational Theme (GOT) scales and the 25 Basic Interest (BI) scales of the Strong Interest Inventory. A sample of 1,860 women and 1,105 men was used. The scales were not unidimensional and contained both primary and minor dimensions. Gender-related DIF was detected in two-thirds of the items. Item types (i.e., occupations, activities, school subjects, types of people) did not differ in DIF. A sex-type dimension was found to influence the responses of men and women differently. When the biased items were removed from the GOT scales, gender differences favoring men were reduced in the R and I scales, but gender differences favoring women remained in the A and S scales. Implications for the development, validation, and use of interest measures are discussed.

14.
Creativity has been well studied in the past several decades, and numerous measures have been developed to assess creativity. However, validity evidence associated with each measure is often mixed. In particular, the social consequence aspect of validity has received little attention. This is partly due to the difficulty of testing for differential item functioning (DIF) within the traditional classical test theory framework, which remains the most popular approach to assessing creativity. Hence, this study provides an example of examining differential item functioning using multilevel explanatory item response theory models. The Creative Thinking Scale was tested for DIF in a sample of 1,043 10th–12th graders. Results revealed significant uniform and nonuniform DIF for some items. Differentially functioning items can produce measurement bias and should be either deleted or modeled. The detailed implications for researchers and practitioners are discussed.

15.
The present study examined the psychometric properties of a universal screening instrument called the Emotional and Behavioral Screener (EBS), which is designed to identify students exhibiting emotional and behavioral problems. The primary purposes of this study were to assess the measurement invariance of EBS items between Caucasian and African-American students and to assess the impact of differential item functioning (DIF) on EBS scores. The sample consisted of 946 elementary students from throughout the U.S. The findings suggested that EBS items exhibited small to negligible levels of DIF, and that DIF did not significantly affect EBS scores. The results supported the EBS as a universal screening instrument that fairly measures the emotional and behavioral risk of elementary students. Research limitations and implications for school professionals are discussed.

16.
A program is described for computing interrater reliability by averaging, for each rater, the correlations between that rater's ratings and every other rater's ratings. For situations in which raters rate more than one ratee, raters' reliabilities can be computed for either each item or each ratee. The program reads data from a text file and writes the reliability coefficients to a text file. The standard Macintosh interface is implemented. The QuickBASIC program is distributed both as a listing and in compiled form; it can take advantage of a math coprocessor.
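The averaging scheme can be sketched in a few lines. This is not a port of the QuickBASIC program; it is a minimal Python illustration that assumes a single item, with each rater's ratings of the same ratees stored in the same order, and it uses the hypothetical helper names `pearson` and `rater_reliabilities`.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rater_reliabilities(ratings):
    """Average correlation of each rater with every other rater.

    ratings: dict mapping rater name -> list of ratings of the same ratees,
    in the same order for every rater.
    """
    out = {}
    for rater in ratings:
        rs = [pearson(ratings[rater], ratings[other])
              for other in ratings if other != rater]
        out[rater] = sum(rs) / len(rs)
    return out


example = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
print(rater_reliabilities(example))
```

Here rater C, who ranks the ratees in the opposite order from A and B, receives the lowest average. Computing reliabilities per ratee rather than per item would only change which axis of the data is passed in as each rater's list.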

17.
Are performance appraisal ratings from different rating sources comparable?   Total citations: 2; self-citations: 0; citations by others: 2
The purpose of this study was to test whether a multisource performance appraisal instrument exhibited measurement invariance across different groups of raters. Multiple-groups confirmatory factor analysis as well as item response theory (IRT) techniques were used to test for invariance of the rating instrument across self, peer, supervisor, and subordinate raters. The results of the confirmatory factor analysis indicated that the rating instrument was invariant across these rater groups. The IRT analysis yielded some evidence of differential item and test functioning, but it was limited to the effects of just 3 items and was trivial in magnitude. Taken together, the results suggest that the rating instrument could be regarded as invariant across the rater groups, thus supporting the practice of directly comparing their ratings. Implications are discussed for research and practice, as well as for understanding the meaning of between-source rating discrepancies.

18.
Gender-based differential item functioning occurs when men and women respond differently to an item despite being similar on the trait assessed by that item. The Patient Health Questionnaire-9 (PHQ-9) is a prominent screening tool for depression. Researchers exploring whether the PHQ-9 exhibits gender-based differential item functioning have used only specialized samples (e.g., individuals with cancer or vision loss). We explored gender bias in the PHQ-9 by means of differential item functioning analyses in a population-based sample. We made use of the National Health and Nutrition Examination Survey (NHANES, 2008), a population-based sample of the USA including 5,995 participants. Differential item functioning was assessed using the Mantel-Haenszel chi-square test and by comparing item characteristic curves between men and women. All items exhibited negligible differential item functioning according to the Mantel-Haenszel test, with absolute standardized mean differences ranging from 0.00 to 0.06. Item characteristic curves were similar between genders for all but one item. Item 5 (i.e., changes in appetite) exhibited very minor nonuniform differential item functioning, wherein extremely depressed women endorsed higher response options on this item than equally depressed men. Researchers can use the PHQ-9 without concern about gender bias, particularly in epidemiological research.
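To make the Mantel-Haenszel procedure concrete, here is a minimal sketch for a single binary item, stratified on a matching score such as the total score. This is a simplification: PHQ-9 items have four response options, so an analysis like the one described would need a polytomous extension, which this sketch does not implement. The function name and input layout are hypothetical; the quantities computed are the standard continuity-corrected Mantel-Haenszel chi-square, the common odds ratio α_MH, and its ETS delta-scale transform MH D-DIF = −2.35 ln α_MH.

```python
import math
from collections import defaultdict


def mantel_haenszel(records):
    """Mantel-Haenszel DIF statistics for one binary item.

    records: iterable of (stratum, group, correct) tuples, where stratum is the
    matching-score level, group is "ref" or "focal", and correct is 0 or 1.
    Returns (continuity-corrected chi-square, common odds ratio, MH D-DIF).
    Degenerate data (no informative stratum) raise an exception.
    """
    # Per-stratum 2x2 counts [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, group, correct in records:
        offset = 0 if group == "ref" else 2
        tables[stratum][offset + (0 if correct else 1)] += 1

    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in tables.values():
        t = a + b + c + d
        if t < 2:
            continue  # a single examinee carries no information
        n_ref, n_focal = a + b, c + d
        m_correct, m_incorrect = a + c, b + d
        sum_a += a
        sum_expected += n_ref * m_correct / t
        sum_var += n_ref * n_focal * m_correct * m_incorrect / (t * t * (t - 1))
        or_num += a * d / t
        or_den += b * c / t

    chi2 = max(0.0, abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    alpha = or_num / or_den            # > 1 means the item favors the reference group
    mh_d_dif = -2.35 * math.log(alpha)  # negative values favor the reference group
    return chi2, alpha, mh_d_dif
```

In each stratum, examinees are matched on the score, so a common odds ratio near 1 (MH D-DIF near 0) indicates that equally able members of the two groups have similar odds of endorsing the item.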

19.
This report documents the relationships between differential item functioning (DIF) identification and (1) item–trait association and (2) scale multidimensionality in personality assessment. Applying the logistic regression model of Zumbo (1999) [Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.], DIF effect size was found to become increasingly inflated as the association between the investigated items and the trait scores decreased. Similar patterns were noted for the influence of scale multidimensionality on DIF identification. Individuals who investigate DIF in personality assessment applications are provided with estimates of the impact of the magnitude of item–trait association and of scale multidimensionality on DIF occurrence and effect size. The results emphasize the importance of excluding the investigated items when identifying the focal trait prior to conducting DIF analyses, and of reporting item and scale psychometric properties in DIF reports.
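Zumbo's logistic regression approach can be sketched for a binary item as a comparison of nested models: the item response regressed on the matching score alone, versus the matching score plus group membership and a score-by-group interaction. The 2-df likelihood-ratio chi-square tests uniform and nonuniform DIF jointly, and the change in Nagelkerke R² is commonly used as the effect size in this framework. The sketch below is a hypothetical, standard-library-only illustration (hand-rolled Newton-Raphson, binary items only, no handling of separation); an applied analysis would use an established statistics package, and the ordinal case covered by Zumbo's handbook is not shown.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def _mu(beta, xi):
    """Predicted probability 1 / (1 + exp(-x'beta))."""
    eta = sum(b * x for b, x in zip(beta, xi))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X rows must start with 1.0 for the intercept; assumes no separation.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(X, y):
            mu = _mu(beta, xi)
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def _loglik(X, y, beta):
    """Bernoulli log-likelihood of the fitted model."""
    return sum(math.log(_mu(beta, xi)) if yi else math.log(1.0 - _mu(beta, xi))
               for xi, yi in zip(X, y))


def zumbo_dif(total, group, item):
    """Logistic-regression DIF test for one binary item.

    Compares model 1 (total score) with model 3 (total, group, total x group).
    Returns (2-df likelihood-ratio chi-square, change in Nagelkerke R-squared).
    """
    n = len(item)
    X1 = [[1.0, t] for t in total]
    X3 = [[1.0, t, g, t * g] for t, g in zip(total, group)]
    ll1 = _loglik(X1, item, fit_logistic(X1, item))
    ll3 = _loglik(X3, item, fit_logistic(X3, item))
    p_bar = sum(item) / n
    ll0 = n * (p_bar * math.log(p_bar) + (1 - p_bar) * math.log(1 - p_bar))

    def nagelkerke(ll):
        cox_snell = 1.0 - math.exp(2.0 * (ll0 - ll) / n)
        return cox_snell / (1.0 - math.exp(2.0 * ll0 / n))

    return 2.0 * (ll3 - ll1), nagelkerke(ll3) - nagelkerke(ll1)
```

In line with the recommendation above, `total` would ideally be a trait score computed without the item under study.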

20.
The Wisconsin Schizotypy Scales are widely used for assessing schizotypy in nonclinical and clinical samples. However, they were developed using classical test theory (CTT) and have not had their psychometric properties examined with more sophisticated measurement models. The present study employed item response theory (IRT) as well as traditional CTT to examine psychometric properties of four of the schizotypy scales on the item and scale level, using a large sample of undergraduate students (n = 6,137). In addition, we investigated differential item functioning (DIF) for sex and ethnicity. The analyses revealed many strengths of the four scales, but some items had low discrimination values and many items had high DIF. The results offer useful guidance for applied users and for future development of these scales.
