Similar articles
1.
This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed for one group than the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items and some effects were quite large. Differentially functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.
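
As a rough sketch of the idea in entry 1, uniform DIF in a MIMIC model is often written as a direct effect of the grouping covariate on an item, over and above its effect on the latent trait. The notation below ($y_j^*$, $\lambda_j$, $\beta_j$, $\gamma$, $x$) is assumed for illustration and is not taken from the study itself.

```latex
% Latent response for item j, latent trait \eta, group indicator x (e.g., 0/1)
y_j^{*} = \lambda_j \, \eta + \beta_j \, x + \varepsilon_j
% Group mean difference on the latent trait
\eta = \gamma \, x + \zeta
% Uniform DIF for item j: \beta_j \neq 0 after controlling for \gamma
```

Here $\gamma$ absorbs the group mean difference on the trait, so a nonzero $\beta_j$ means that item $j$ is easier or harder to endorse for one group at the same trait level.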

2.
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

3.
This study investigated the equivalence of different types of informants, such as children (or early adolescents) and parents, in evaluating child externalizing and internalizing problems. We applied a polytomous item response theory (IRT) model to the Strengths and Difficulties Questionnaire (SDQ). We obtained responses to three subscales (Conduct Problems, Hyperactivity/Inattention, and Emotional Symptoms) from 541 elementary school students aged 10–12 years, fathers for 233 students, mothers for 275 students, and the homeroom teachers for 524 students. Expected values on each individual item, calculated from the discrimination and threshold parameters, were compared among students, fathers, and mothers as an investigation of differential item functioning (DIF) or differential informant functioning. Assessments of both externalizing and internalizing problems were mostly equivalent between fathers and mothers, and most items for externalizing problems functioned equally between students and parents, whereas items for internalizing problems showed DIF between them. IRT also showed that the intervals between response categories varied across items, particularly for the conduct problems items “fight” and “steal,” and that positively worded items had an extremely low threshold.

4.
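Using a cognitive ability test for 4- to 5-year-old children as an example, this study explored how to analyze the measurement invariance of longitudinal data within an IRT framework. The analysis used a between-item multidimensional item response theory (MIRT) model and a within-item multidimensional two-tier model. Participants were 882 children aged 48 months from across the country, and the instrument was a self-developed cognitive ability test for 4- to 5-year-olds. Test-level and item-level analyses showed that: (1) the approach to measurement invariance analysis of longitudinal data used here is reasonable and effective; (2) the test satisfies partial measurement invariance across the two time points, and its latent structure is stable; (3) both the discrimination and difficulty parameters of the "orientation item" changed, and the difficulty parameters of four other items drifted; (4) children's cognitive ability developed rapidly overall between ages 4 and 5, with significant growth in ability.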

5.
Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No study to date has examined why score equity can be inconsistent across the score range of some tests. The purpose of this study is to explore a source of uneven subpopulation score equity across the score range of a test. It is hypothesized that the difficulty of anchor items displaying differential item functioning (DIF) is directly related to the score location at which issues of score inequity are observed. The simulation study supports the hypothesis that the difficulty of DIF items has a systematic impact on the uneven nature of conditional score equity.

6.
The present study examined the psychometric properties of a universal screening instrument called the Emotional and Behavioral Screener (EBS), which is designed to identify students exhibiting emotional and behavioral problems. The primary purposes of this study were to assess the measurement invariance of EBS items between Caucasian and African-American students and to assess the impact of differential item functioning (DIF) on EBS scores. The sample consisted of 946 elementary students from throughout the U.S. The findings suggested that EBS items exhibited small to negligible levels of DIF, and that DIF did not significantly impact EBS scores. The results supported the EBS as a universal screening instrument that is fair in measuring the emotional and behavioral risk of elementary students. Research limitations and implications for school professionals are discussed.

7.
Differential item functioning (DIF) statistics were computed using items from the Peabody Individual Achievement Test (PIAT)-Reading Comprehension subtest for children of the same age group (ages 7 through 12 respectively). The pattern of observed DIF items was determined by comparing each cohort across age groups. Differences related to race and gender were also identified within each cohort. Characteristics of DIF items were identified based on sentence length, vocabulary frequency, and density of a sentence. DIF items were more frequently associated with short sentences than with long sentences. This study explored the potential limitation in the longitudinal use of items in an adaptive test.

8.
Standardized tests are used widely in comparative studies of clinical populations, either as dependent or control variables. Yet, one cannot always be sure that the test items measure the same constructs in the groups under study. In the present work, 460 participants with intellectual disability of undifferentiated etiology and 488 typical children were tested using Raven's Colored Progressive Matrices (RCPM). Data were analyzed using binomial logistic regression modeling designed to detect differential item functioning (DIF). Results showed that 12 items out of 36 function differentially between the two groups, but only 2 items exhibit at least moderate DIF. Thus, a very large majority of the items have identical discriminative power and difficulty levels across the two groups. It is concluded that RCPM can be used with confidence in studies comparing participants with and without intellectual disability. In addition, it is suggested that methods for investigating internal bias of tests used in cross-cultural, cross-linguistic or cross-gender comparisons should also be regularly employed in studies of clinical populations, particularly in the field of developmental disability, to show the absence of systematic measurement error (i.e., DIF) affecting item responses.

9.
The aim of this study was to determine whether the items from a reading comprehension test in European Portuguese function differently across students from rural and urban areas, which would bias test validity and equity in assessment. The sample was composed of 653 students from second, third and fourth grades. The presence of differential item functioning (DIF) was analysed using logistic regression and the Mantel–Haenszel procedure. Although 17 items were flagged with DIF, only five items showed non-negligible DIF in all effect-size measures. The evidence of invariance across students with rural or urban backgrounds for most of the items supports the validity of the test, though the five identified items should be further investigated.
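
To make the Mantel–Haenszel procedure named in entry 9 concrete, here is a minimal sketch that is not taken from the study. It pools 2×2 tables (group by correct/incorrect) across matching-score strata into a common odds ratio and converts it to the ETS delta scale. The function names, the `"ref"`/`"focal"` labels, and the use of the total score as the matching variable are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses, groups, totals):
    """Pooled odds ratio for one item across matching-score strata.

    responses: 0/1 answers to the studied item
    groups:    "ref" or "focal" for each examinee (hypothetical labels)
    totals:    matching variable, e.g. each examinee's total test score
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, t in zip(responses, groups, totals):
        idx = (0 if y == 1 else 1) + (0 if g == "ref" else 2)
        strata[t][idx] += 1

    num = 0.0
    den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct
    if den == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return num / den


def mh_delta(odds_ratio):
    """ETS delta metric; negative values mean the item favors the reference group."""
    return -2.35 * math.log(odds_ratio)


# Toy data: two score strata with eight examinees each.
responses = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["ref"] * 4 + ["focal"] * 4 + ["ref"] * 4 + ["focal"] * 4
totals = [10] * 8 + [20] * 8
alpha = mantel_haenszel_odds_ratio(responses, groups, totals)
print(alpha, mh_delta(alpha))
```

An odds ratio near 1 (delta near 0) suggests no uniform DIF. In practice the chi-square test and an effect-size classification would be added on top of this, and sparse strata would need handling.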

10.
Item response theory was used to address gender bias in interest measurement. A differential item functioning (DIF) technique, SIBTEST, and a dimensionality test, DIMTEST, were applied to the items of the six General Occupational Theme (GOT) and 25 Basic Interest (BI) scales in the Strong Interest Inventory. A sample of 1860 women and 1105 men was used. The scales were not unidimensional and contained both primary and minor dimensions. Gender-related DIF was detected in two-thirds of the items. Item type (i.e., occupations, activities, school subjects, types of people) did not differ in DIF. A sex-type dimension was found to influence the responses of men and women differently. When the biased items were removed from the GOT scales, gender differences favoring men were reduced in the R and I scales but gender differences favoring women remained in the A and S scales. Implications for the development, validation and use of interest measures are discussed.

11.
Item response theory (IRT) based differential item functioning (DIF) was used to examine the construct and normative invariance of the DSM-IV oppositional defiant disorder (ODD) symptoms for ratings across Malaysian and Australian children, and Malaysian Malay and Malaysian Chinese children. To accomplish these goals, parents completed the Disruptive Behavior Rating Scale, which includes the eight DSM-IV ODD symptoms. Although the comparisons involving Malaysian and Australian children indicated DIF for five symptoms, only the symptom for “touchy” showed notable DIF. This was also the only symptom that showed DIF for the comparisons involving Malay and Chinese children. There were also minimal differences in the latent mean scores across Australian and Malaysian children and also Malay and Chinese children. These results indicate good support for the construct and normative invariance of the ODD symptoms for the samples compared.

12.
This study investigated whether the linguistic complexity of items leads to gender differential item functioning (DIF) on mathematics assessments. Two forms of a mathematics test were developed. The first form consisted of algebra items based on mathematical expressions, terms, and equations. In the second form, the same items were written as word problems without changing their contents and solutions. The test forms were given to a sample of 671 sixth-grade students from 10 middle schools in Turkey. The tests were administered to the students with a 4-week interval. Explanatory item response modeling and logistic regression approaches were used to examine gender DIF. Several word problems were flagged as having gender DIF in favor of female examinees, whereas mathematically expressed forms of the same items did not function differently across male and female examinees. The verbal content of word problems seems to influence the way males and females respond to items.

13.
This research used logistic regression to model item responses from a popular 360-degree-for-development survey used in a leadership development programme given to middle and upper level European managers in Brussels. The survey contained 106 items on 16 scales. The model used gender of ratee and rater group to identify items that exhibited differential item functioning (DIF). The rater groups were self, boss, peer, and direct report. The sample consisted of 356 survey families, where a survey family consisted of a matched set of four surveys: one self, one boss, one peer, and one direct report. The sample contained 88% male and 12% female raters. The sample contained 1424 total surveys. The procedure for flagging items exhibiting differential functioning used effect size computed from Wald chi-square statistics rather than statistical significance, resulting in fewer flagged items. One item exhibited rating anomalies due to the gender of the ratee; 55 items exhibited DIF attributable to rater group. The apparent effect of the DIF was small for each item. An examination of the maximum likelihood parameter estimates suggested the rater group DIF was the result of either hierarchical complexity or organizational contingency. The DIF due to gender conformed to prior expectations of gender-related stereotypical interpretations. This research further suggested that DIF due to environmental complexity or organizational contingency could be a naturally occurring phenomenon in some 360-degree assessments, and that the interpretation of some 360-degree feedback may need to allow for the potential existence of such DIF.

14.
The most widely used instrument to measure alexithymia is the 20-item Toronto Alexithymia Scale (TAS-20). However, different factor structures have been found in different languages. This study tests six published factor models and metric invariance across clinical and nonclinical samples. It also investigates whether there is a method effect of the negatively keyed items. Second-order models with alexithymia as a higher order factor are tested. Confirmatory factor analyses showed that the original factor model with three factors (difficulty identifying feelings, DIF; difficulty describing feelings, DDF; and externally oriented thinking, EOT) is the best-fitting model. Partial measurement invariance across samples was illustrated but requires further study. A weakness of the model is the low internal consistency of the third factor. Because models with a method factor had a better fit, future reconsideration of the negatively formulated items seems necessary. No evidence was found for the second-order models.

15.
The Adolescent Quality of Life-Mental Health Scale (AQOL-MHS) was designed to measure quality of life in clinical samples of Latino adolescents aged 12–18 years, but has also been used in community samples. The original measure included three factors: Emotional Regulation (ER), Self-Concept (SC) and Social Context (SoC). The goals of this study are to replicate the factor structure using confirmatory factor analysis (CFA), shorten the instrument and test the degree of measurement invariance across gender, age, and type of sample. Participants for the analyses (N = 354) came from two populations in the San Juan Metropolitan Area: (1) adolescents from randomly selected households, using a multi-stage probability sampling design (n = 295), and (2) adolescents receiving treatment at mental health clinics (n = 59). We first carried out a conceptual item analysis for item reduction purposes and then assessed dimensional, configural, metric and scalar invariance for each factor using the Mplus software system. The original 3-factor structure was replicated with comparable model fit in each treatment context. Metric invariance was attained for all three scales across groups. Either full or partial scalar invariance was also observed with DIF in a total of 6 items. Invariance testing supports the use of the abridged 21-item version of the AQOL-MHS to compare diverse individuals with little bias using observed scores, but for refined estimates the ideal scoring will be from a latent variable model.

16.
The Student-Teacher Relationship Scale (STRS) is widely used to examine teachers' relationships with young students in terms of closeness, conflict, and dependency. This study aimed to verify the dimensional structure of the STRS with confirmatory factor analysis, test its measurement invariance across child gender and age, improve its measurement of the dependency construct, and extend its age range. Teachers completed a slightly adapted STRS for a Dutch sample of 2335 children aged 3 to 12. Overall, the 3-factor model showed an acceptable fit. Results indicated metric invariance across gender and age up to 8 years. Scalar invariance generally did not hold. Lack of metric invariance at ages 8 to 12 primarily involved Conflict items, whereas scale differences across gender and age primarily involved Closeness items. The adapted Dependency scale showed strong invariance and higher internal consistencies than the original scale for this Dutch sample. Importantly, the revealed non-invariance for gender and age did not influence mean group comparisons.

17.
The importance of parenting styles for children’s outcomes, including cognitive, social, academic, and value-related outcomes, makes this topic a central concern to social researchers and psychologists. However, past research has reported controversial evidence on the relationship between authoritarian parenting and children’s outcomes in non-Western cultural contexts. This raises awareness of the implications of cultural differences in parenting styles. As a result, the training parenting style scale (TPSS) was proposed based on the Confucian concepts of ‘Guan’ and ‘Chiao Shu.’ This scale is allegedly more reflective of the Asian parenting style. The present study examined the psychometric properties and measurement invariance of the Malay version of the TPSS across adolescents’ perceived maternal and paternal training and by adolescent gender. Of the 8 items in the original TPSS, confirmatory factor analysis indicated that a 6-item scale with error correlations was the best-fitting model. Internal consistency was also good for the 6-item scale. Furthermore, support for configural, metric, scalar, residual, and structural invariance emerged across adolescents’ perceived maternal and paternal training and across adolescent gender. Results of this study supported the psychometric properties of the 6-item TPSS after taking into account several cautiously considered limitations.

18.
The present study suggests a modelling methodology for examining measurement invariance of ordered categorical item indicators of latent constructs such as anxiety, coping, motives etc., in research settings with few subgroups and a large sample of individuals. The Hungarian version of the Anxiety Trait scale of the State-Trait Anxiety Inventory for Children (STAIC-H) was administered to 605 boys and 975 girls of age 10–15 in 12 schools. A MIMIC model was suggested for examining measurement invariance across subgroups of schools and ages, while a multi-group analysis was recommended for investigating invariance across gender. A high degree of invariance across groups was obtained for the Anxiety Trait scale in terms of item factor loadings, item thresholds and item homogeneity with respect to group contrast variables. Based on the diagnostic information obtained by the present methodology, the few item indicators showing non-invariance were discussed with reference to methodological and conceptual considerations.

19.
Student well‐being is a growing issue in higher education, and assessment of the prevalence of conditions such as loneliness is therefore important. In higher education and population surveys the Three‐Item Loneliness Scale (T‐ILS) is used increasingly. The T‐ILS is attractive for large multi‐subject surveys, as it consists of only three items (derived from the UCLA Loneliness Scale). Several ways of classifying persons as lonely based on T‐ILS scores exist: dichotomous and trichotomous classification schemes and use of sum scores with rising levels indicating more loneliness. The question remains whether T‐ILS scores are comparable across the different population groups where they are used or across groups of students in the higher education system. The aim was to investigate whether the T‐ILS suffers from differential item functioning (DIF) that might change the loneliness classification among higher education students, using a large sample just admitted to 22 different academy profession degree programs in Denmark (N = 3,757). DIF was tested relative to degree program, age groups and gender. The framework of graphical loglinear Rasch models was applied, as this allows for adjustment of sum scores for uniform DIF, and thus for assessment of whether DIF might change the classification. Two items showed DIF relative to degree program and gender, and adjusting for this DIF changed the classification for some subgroups. The consequences were negligible when using a dichotomous classification and larger when using a trichotomous classification. Therefore, trichotomous classification should be used with caution unless suitable adjustments for DIF are made prior to classification.

20.
Wheeler DL, Vassar M, Hale WD. Body Image, 2011, 8(2): 168-172
The current study sought to explore the measurement invariance of the SATAQ-3 across gender using a single mixed gender sample consisting of 122 men and 268 women. Participants' age ranged from 18 to 36 years (M=19.6, SD=1.9). Preliminary results indicate that the 28-item scale was a poor fit for either gender in the current sample. Reverse scored items were deleted as they formed a unique method factor with low factor loadings. The resulting 21 items were a good fit to the hypothesized four-factor model for both males and females and established evidence of both strict factorial invariance and population heterogeneity across groups. Coefficient alpha estimates of internal consistency reliability ranged from .79 to .94. These findings support use of the SATAQ-3 in mixed gender samples and validate previous research that reported analysis of gender-based mean differences.
