Similar Articles (20 results)
1.
As research continues to document differences in the prevalence of mental health problems such as depression across racial/ethnic groups, the issue of measurement equivalence becomes increasingly important to address. The Mood and Feelings Questionnaire (MFQ) is a widely used screening tool for child and adolescent depression. This study applied a differential item functioning (DIF) framework to data from a sample of 6th and 8th grade students in the Seattle Public School District (N = 3,593) to investigate the measurement equivalence of the MFQ. Several items in the MFQ were found to have DIF, but this DIF was associated with negligible individual- or group-level impact. These results suggest that differences in MFQ scores across groups are unlikely to be caused by measurement non-equivalence.
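Item-level DIF can coexist with negligible group-level impact. One way to see this is to compare expected total scores (test characteristic curves) for two groups at the same latent trait level. The Python sketch below assumes a two-parameter logistic (2PL) IRT model. The item parameters are made up, not estimates from the MFQ study, and the helper names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Test characteristic curve: sum of item endorsement probabilities."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def score_gap(theta, items_ref, items_focal):
    """Expected total-score difference (reference minus focal) at the same theta."""
    return expected_score(theta, items_ref) - expected_score(theta, items_focal)

# Hypothetical (a, b) parameters. Item 2 is harder for the focal group and
# item 3 is easier, so their item-level DIF partly cancels in the total score.
ref = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.8)]
focal = [(1.2, -0.5), (1.0, 0.3), (1.5, 0.5)]

for theta in (-1.0, 0.0, 1.0):
    print(f"theta={theta:+.1f}  gap={score_gap(theta, ref, focal):+.3f}")
```

With these illustrative values, each DIF item shifts endorsement probability at θ = 0 by roughly 0.07 to 0.09. Because the two shifts run in opposite directions, the total-score gap is only about −0.015 points.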

2.
We investigated measurement equivalence in two antisocial behavior scales (i.e., one scale for adolescents and a second scale for young adults) by examining differential item functioning (DIF) for respondents from single-parent (n = 109) and two-parent families (n = 447). Even though one item in the scale for adolescents and two items in the scale for young adults showed significant DIF, the two scales exhibited non-significant differential test functioning (DTF). Both uniform and nonuniform DIF were investigated and examples of each type were identified. Specifically, uniform DIF was exhibited in the adolescent scale whereas nonuniform DIF was shown in the young adult scale. Implications of DIF results for assessment of antisocial behavior, along with strengths and limitations of the study, are discussed.
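The uniform/nonuniform distinction can be read off the item characteristic curves. Under an assumed 2PL model:

- If only the difficulty b differs between groups, one group is favored at every trait level (uniform DIF).
- If the discrimination a differs, the curves cross (nonuniform DIF).

The sketch below is a hypothetical helper that classifies a pair of parameter sets by checking, on a θ grid, whether the sign of the probability difference ever changes.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def dif_type(ref, focal, lo=-4.0, hi=4.0, steps=81):
    """Classify DIF between reference and focal (a, b) pairs on a theta grid.

    "uniform": one group is favored at every grid point (curves do not cross).
    "nonuniform": the sign of the difference changes (curves cross on the grid).
    Unequal slopes always cross somewhere; if that point lies outside [lo, hi],
    this grid-based check will report "uniform" for the range inspected.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    diffs = [p_2pl(t, *ref) - p_2pl(t, *focal) for t in grid]
    if all(abs(d) < 1e-12 for d in diffs):
        return "none"
    if all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs):
        return "uniform"
    return "nonuniform"

print(dif_type((1.0, 0.0), (1.0, 0.5)))  # equal slopes, shifted difficulty -> uniform
print(dif_type((1.5, 0.0), (0.8, 0.0)))  # different slopes, cross at theta = 0 -> nonuniform
```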

3.
Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors fit a two-parameter item response theory (IRT) model to examine item characteristics of the Authority Acceptance scale from the Teacher Observation of Classroom Adaptation-Revised (AA-TOCA-R; L. Larsson-Werthamer, S. G. Kellam, & L. Wheeler, 1991) in 8,820 kindergarten children and estimated the degree of differential item functioning (DIF) by gender and race/urban status. The mean level of latent conduct problems was best represented by behaviors such as being stubborn, breaking rules, and being disobedient, whereas breaking things and taking others' property best represented the construct at one standard deviation above the mean. DIF by gender was detected, such that at equivalent levels of latent conduct problems, males received more endorsements of overt behaviors from teachers, whereas females received more endorsements of nonphysical behaviors. Moreover, overt behaviors were better discriminators of latent conduct problems for males, and nonphysical behaviors were better discriminators of latent conduct problems for females. Differences across race/urban status were not found to be conceptually meaningful. The authors' analyses also suggest that the item scaling of the AA-TOCA-R may be better represented by five categories instead of six. These findings provide support for the use of IRT modeling to examine item characteristics of conduct problem scales and DIF to test for scalar equivalence across diverse subpopulations.
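Saying that an item "best represents" the construct at the mean or at one standard deviation above it corresponds, in a 2PL model, to the item's location b. At θ = b, the endorsement probability is 0.5 and the item information a²P(1 − P) reaches its peak. The sketch below finds that peak by grid search. The two parameter sets and their labels are hypothetical, not the AA-TOCA-R estimates.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def peak_location(a, b, lo=-4.0, hi=4.0, steps=801):
    """Grid search (step 0.01) for the theta where the item is most informative."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: item_information(t, a, b))

# Hypothetical items: one located at the latent mean, one at +1 SD.
items = {"item_at_mean": (2.0, 0.0), "item_at_plus_1sd": (1.8, 1.0)}
for name, (a, b) in items.items():
    print(name, round(peak_location(a, b), 2))
```

The grid search is only for transparency. For the 2PL, the peak is analytically at θ = b, so in practice one would simply read b off the estimates.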

4.
Item response theory (IRT) based differential item functioning (DIF) was used to examine the construct and normative invariance of the DSM-IV oppositional defiant disorder (ODD) symptoms for ratings across Malaysian and Australian children, and Malaysian Malay and Malaysian Chinese children. To accomplish these goals, parents completed the Disruptive Behavior Rating Scale, which includes the eight DSM-IV ODD symptoms. Although the comparisons involving Malaysian and Australian children indicated DIF for five symptoms, only the symptom for “touchy” showed notable DIF. This was also the only symptom that showed DIF for the comparisons involving Malay and Chinese children. There were also minimal differences in the latent mean scores across Australian and Malaysian children and also Malay and Chinese children. These results indicate good support for the construct and normative invariance of the ODD symptoms for the samples compared.

5.
Journal of Creative Behavior, 2017, 51(2): 153–162
Despite significant scholarly attention, the literature on the existence and direction of gender differences in creativity has produced inconsistent findings. In the present paper, we argue that this lack of consensus may be attributable, at least in part, to gender‐specific inconsistencies in the measurement of creative problem‐solving. To explore this possibility, we empirically tested assumptions of multiple‐group measurement invariance using samples borrowed from four recent studies that assessed creative problem‐solving (J.D. Barrett et al., 2013; K.S. Hester et al., 2012; D.R. Peterson et al., 2013; I.C. Robledo et al., 2012). Across the four samples, apparent gender differences emerged on all three components of S.P. Besemer & K. O'Quin's (1999) three‐facet model of creativity (i.e., quality, originality, and elegance) such that, on average, females appeared to exhibit higher baseline levels of creativity. However, in light of violations of measurement invariance assumptions across genders found in these samples, comparisons such as these may not ultimately be appropriate. Although the underlying factor structure and factor loadings on a unitary creativity factor were consistent across gender (i.e., weak factorial invariance), measurement equivalence assumptions were violated at the subfacet level (i.e., strong factorial invariance was not supported). Implications of these findings for understanding gender differences in creative problem‐solving are discussed.

6.
Various definitions of, and approaches to assessing, the complex construct of parental involvement (PI) have led to inconsistent findings regarding the impact of PI on child development. To date, limited information is available regarding the measurement invariance of PI measures across time and groups (e.g., children's gender, ethnicity, and socioeconomic status), raising the concern that group differences in PI might reflect item bias rather than true differences in PI. The present study aimed to obtain a set of optimal items for measuring PI from kindergarten through the elementary school years and to investigate whether they could be used for parents from different groups. A Rasch measurement model was implemented to investigate item difficulty, step calibrations, and measurement invariance (here, differential item functioning; DIF). The results from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999 data set showed that 20 items can be used to measure three dimensions of PI (school/home involvement, family educational investment, and family routines) across four time points. Time of administration, children's gender, ethnicity, and socioeconomic status had effects of varying size on item difficulty for half of these items. Practitioners and researchers should be cautious when using these items and are advised to freely estimate the parameters of DIF items, as well as to add more items to the PI scale to improve reliability.
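In a Rasch framework, DIF shows up as a group difference in an item's difficulty. A common summary is the "DIF contrast": the difference between difficulties calibrated separately in each group, divided by their joint standard error. The sketch below illustrates that calculation. It is not necessarily the exact procedure used with the ECLS-K data, and the numbers are hypothetical.

```python
import math

def p_rasch(theta, b):
    """Rasch (1PL) probability of endorsing an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def dif_contrast(b_ref, se_ref, b_focal, se_focal):
    """Focal-minus-reference difficulty (logits) and its t-like ratio.

    A positive contrast means the item is harder to endorse for the focal group
    at the same trait level under this sign convention.
    """
    contrast = b_focal - b_ref
    t = contrast / math.sqrt(se_ref ** 2 + se_focal ** 2)
    return contrast, t

# Hypothetical separate calibrations of one PI item in two groups (logits).
contrast, t = dif_contrast(b_ref=0.50, se_ref=0.10, b_focal=0.20, se_focal=0.10)
print(round(contrast, 2), round(t, 2))  # -0.3 -2.12
```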

7.
Using an item‐response theory‐based approach (i.e. likelihood ratio test with an iterative procedure), we examined the equivalence of the Rosenberg Self‐Esteem Scale (RSES) in a sample of US and Chinese college students. Results from the differential item functioning (DIF) analysis showed that the RSES was not fully equivalent at the item level, as well as at the scale level. The two cultural groups did not use the scale comparably, with the US students showing more extreme responses than the Chinese students. Moreover, we evaluated the practical impact of DIF and found that cultural differences in average self‐esteem scores disappeared after the DIF was taken into account. In the present study, we discuss the implications of our findings for cross‐cultural research and provide suggestions for future studies using the RSES in China.
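One step of an IRT likelihood-ratio DIF test compares two nested models:

- A constrained model, in which the studied item's parameters are equal across groups.
- A free model, in which they may differ.

The statistic G² = 2(log L_free − log L_constrained) is referred to a chi-square distribution with df equal to the number of freed parameters. The sketch below computes that step from log-likelihoods assumed to come from IRT software. It uses a closed-form chi-square tail for integer df. The iterative anchor-selection procedure mentioned in the abstract is not shown, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: regularized upper gamma Q(k, y) as a finite Poisson sum.
        k = df // 2
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(k))
    # Odd df: Q(k + 1/2, y) = erfc(sqrt(y)) + e^{-y} * sum y^(i-1/2) / Gamma(i+1/2).
    k = (df - 1) // 2
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return tail

def lr_dif_test(loglik_constrained, loglik_free, df):
    """Likelihood-ratio DIF test: G2 = 2 * (logL_free - logL_constrained)."""
    g2 = 2.0 * (loglik_free - loglik_constrained)
    return g2, chi2_sf(g2, df)
```

For example, freeing both 2PL parameters (df = 2) with hypothetical log-likelihoods `lr_dif_test(-1523.4, -1519.1, 2)` gives G² ≈ 8.6 and p ≈ 0.014.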

8.
Sheppard R, Han K, Colarelli SM, Dai G, King DW. Assessment, 2006, 13(4): 442–453
The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories. Although the group mean differences were trivial, more than a third of the items showed DIF by sex (38.4%) and by race (37.3%). A content analysis of potentially biased items indicated that the themes of items displaying DIF were slightly more cohesive for sex than for race. The authors discuss possible explanations for differing clustering tendencies of items displaying DIF and some practical and theoretical implications of DIF in the development and interpretation of personality inventories.

9.
Modern multinational organizations have found that testing or surveying their employees is difficult when the employees come from a variety of language backgrounds. In such situations, surveys and tests are often adapted for use across multiple languages. However, different language versions of an instrument are not necessarily equivalent, which may lead to misleading interpretations. In this study, we used weighted multidimensional scaling (MDS), analysis of covariance (ANCOVA), and ordinal logistic regression (LR) to evaluate the structural equivalence and differential item functioning (DIF) of an employee attitude survey from a large international corporation. Specifically, we evaluated the functioning of the survey items across 3 different languages, 8 different cultures, and 2 mediums of administration (paper-based and Web-based). MDS was used to evaluate structural equivalence, and ANCOVA and LR were used to evaluate DIF across selected employee groups and the 2 administration formats. The results indicated that the structure of the survey data was consistent and that the items functioned similarly across all groups. The results also illustrate the utility of MDS, ANCOVA, and LR for evaluating translated instruments. The implications of the results for future research in this area are discussed.

10.
Although Loss of Control (LOC) is a transdiagnostic factor in eating pathology, there are few standalone assessments of LOC. The objective of this study was to evaluate the uni-dimensionality and measurement equivalence of the Eating Loss of Control Scale (ELOCS). Confirmatory factor analyses were used to achieve a well-fitting uni-dimensional model in clinical (N = 226) and non-clinical (N = 476) samples. Measurement equivalence was tested in a factor analytic framework, and effect sizes were computed to evaluate the impact of non-equivalence. A well-fitting model was achieved in both samples after the removal of 4 items. The instrument showed configural equivalence but not metric equivalence. Results suggest that the ELOCS is a reliable and valid measure of LOC in clinical and non-clinical samples. However, while the nature of the LOC construct is similar across binge eating and non-clinical participants, comparisons of ELOCS across these groups are affected by measurement non-equivalence. This research also revealed novel insights into the relative sensitivity of model fitting and effect size approaches to investigating measurement equivalence.

11.
A growing body of research demonstrates that older individuals tend to score differently on personality measures than younger adults. However, recent research using item response theory (IRT) has questioned these findings, suggesting that apparent age differences in personality traits merely reflect artifacts of the response process rather than true differences in the latent constructs. Conversely, other studies have found the opposite—age differences appear to be true differences rather than response artifacts. Given these contradictory findings, the goal of the present study was to examine the measurement equivalence of personality ratings drawn from large groups of young and middle‐aged adults (a) to examine whether age differences in personality traits could be completely explained by measurement nonequivalence and (b) to illustrate the comparability of IRT and confirmatory factor analysis approaches to testing equivalence in this context. Self‐ratings of personality traits were analyzed in two groups of Internet respondents aged 20 and 50 (n = 15,726 in each age group). Measurement nonequivalence across these groups was negligible. The effect sizes of the mean differences due to nonequivalence ranged from –.16 to .15. Results indicate that personality trait differences across age groups reflect actual differences rather than merely response artifacts.

12.
Practice effects in memory testing complicate the interpretation of score changes over repeated testings, particularly in clinical applications. Consequently, several alternative forms of the Auditory Verbal Learning Test (AVLT) have been developed. Studies of these typically indicate that the forms examined are equivalent. However, the implication that the forms in the literature are interchangeable must be tempered by several caveats. Few studies of equivalence have been undertaken; most are restricted to the comparison of single pairs of forms, and the pairings vary across studies. These limitations are exacerbated by the minimal overlapping across studies in variables reported, or in the analyses of equivalence undertaken. The data generated by these studies are nonetheless valuable, as significant practice effects result from serial use of the same form. The available data on alternative AVLT forms are summarized, and recommendations regarding form development and the determination of form equivalence are offered.

13.
This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed for one group than the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items and some effects were quite large. Differentially-functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.
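Modeling a DIF item "as if separate items were administered to different groups" is usually done by splitting the item. Each group gets its own copy of the item, and that copy is missing by design for everyone else, so each copy receives its own parameters. The sketch below shows the data step on plain Python dicts. The column names and the helper are hypothetical.

```python
def split_item(rows, item, group_key):
    """Replace one DIF item with group-specific copies.

    Each row is a dict of responses plus a group label. The returned rows carry
    one `<item>_<group>` column per group: the original response for members of
    that group and None (missing by design) for everyone else.
    """
    groups = sorted({row[group_key] for row in rows})
    out = []
    for row in rows:
        new = {k: v for k, v in row.items() if k != item}
        for g in groups:
            new[f"{item}_{g}"] = row[item] if row[group_key] == g else None
        out.append(new)
    return out

rows = [{"id": 1, "sex": "F", "q3": 1}, {"id": 2, "sex": "M", "q3": 0}]
for r in split_item(rows, "q3", "sex"):
    print(r)
```

The split columns can then be analyzed with any estimator that handles missing-by-design data. The remaining, non-DIF items stay shared across groups and anchor the common scale.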

14.
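Using the Satisfaction with Life Scale as an example, this study applied confirmatory factor analysis to examine measurement invariance between Internet-based and traditional paper-and-pencil administration in a Chinese cultural context. The results showed weak invariance between the Internet and paper-and-pencil versions, meaning that the two share the same unit of measurement. However, only partial strong invariance and partial strict invariance held, so the influence of the administration environment on test results cannot be ignored. The study indicates that a properly designed Internet-based test is reliable. It also suggests that testing for measurement invariance is necessary whenever a test is used in different settings.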
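The invariance levels named in this abstract (and in several others on this page) are usually defined on a multiple-group common-factor model. A compact statement follows. The notation is generic and is our assumption, not the study's: item j, group g, latent variable ξ_g.

```latex
\begin{aligned}
x_{jg} &= \tau_{jg} + \lambda_{jg}\,\xi_g + \delta_{jg}, \qquad \operatorname{Var}(\delta_{jg}) = \theta_{jg},\\
\text{configural: } & \text{same pattern of zero and nonzero } \lambda_{jg} \text{ in every group},\\
\text{weak (metric): } & \lambda_{j1} = \lambda_{j2} \ \text{for all } j,\\
\text{strong (scalar): } & \lambda_{j1} = \lambda_{j2},\ \tau_{j1} = \tau_{j2},\\
\text{strict: } & \lambda_{j1} = \lambda_{j2},\ \tau_{j1} = \tau_{j2},\ \theta_{j1} = \theta_{j2}.
\end{aligned}
```

Because E(x_{jg} | ξ_g) = τ_{jg} + λ_{jg} ξ_g, equal loadings mean a one-unit change in the latent variable moves the observed score by the same amount in both modes of administration (the "same unit of measurement"). Equal intercepts additionally give a common origin, which latent mean comparisons require. "Partial" invariance means these constraints hold for only a subset of items.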

15.
The current study considers methodological challenges in developmental research with linguistically diverse samples of young adolescents. By empirically examining the cross-language measurement equivalence of a measure assessing three components of ethnic identity development (i.e., exploration, resolution, and affirmation) among Mexican American adolescents, the study both assesses the cross-language measurement equivalence of a common measure of ethnic identity and provides an appropriate conceptual and analytical model for researchers needing to evaluate measurement scales translated into multiple languages. Participants are 678 Mexican-origin early adolescents and their mothers. Measures of exploration and resolution achieve the highest levels of equivalence across language versions. The measure of affirmation achieves high levels of equivalence. Results highlight potential ways to correct for any problems of nonequivalence across language versions of the affirmation measure. Suggestions are made for how researchers working with linguistically diverse samples can use the highlighted techniques to evaluate their own translated measures.

16.
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns. It also yielded more DIF items with larger effect sizes and more consistent item response patterns by substantive aspects (e.g., reading comprehension processes and cognitive complexity of items). Based on our findings, we suggest empirically evaluating the homogeneity assumption in international assessments because international populations cannot be assumed to have homogeneous item response patterns. Otherwise, differences in response patterns within these populations may be under-detected when conducting manifest DIF analyses. Detecting differences in item responses across international examinee populations has implications for the generalizability and meaningfulness of DIF findings as they apply to heterogeneous examinee subgroups.

17.
Although previous research has examined cross-cultural differences in personality, many of these studies neglected to first establish that the measures being used were equivalent in meaning across cultures. Using samples of Chinese, Greek, and American respondents, the measurement equivalence of the Big Five Mini-Markers [Saucier, G. (1994). Mini-markers: A brief version of Goldberg’s unipolar Big-Five markers. Journal of Personality Assessment, 63, 506–516] was assessed using confirmatory factor analysis. The results indicate that all of the scales demonstrate configural invariance, but fail to show metric or scalar invariance. Several adjectives from these scales were found to exhibit bias at the item level. The practical implications of these results are discussed and future research is suggested.

18.
Given the growing interest in the study of subjective well-being as a measure of social progress, instruments that produce valid and reliable scores and that can be used within and across countries are needed. The aim of the present study was to analyze the measurement equivalence of the Day Reconstruction Method in its brief version, using nationally representative samples from Finland, Poland, and Spain obtained within the COURAGE in Europe project. The goodness-of-fit of a two-correlated-factors model and the reliability of the scores obtained were assessed. Cross-country invariance was tested employing a multiple group confirmatory factor analysis, through sequential constraint imposition. In each country, measurement invariance was tested across time frames (morning, afternoon and evening) and days of the week (weekday and weekend). The results found support for the hypothesis of a two-correlated-factors (positive and negative affect) structure; the reliability of the positive, the negative and the net affect scores showed appropriate values. A high equivalence across the three national samples was found: all items except one showed strong measurement invariance indicating that respondents from Finland, Poland, and Spain attribute the same meaning to the latent construct under study, and the levels of the underlying items are equal in all three countries. Similar results were found for the measurement equivalence across time frames and days of the week. Our findings support the assumption of comparability across the different samples considered; in general, higher positive affect and lower negative affect were found in Finland, in the evening and at the weekend.

19.
Differential item functioning (DIF) assessment is key in score validation. When DIF is present, scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments such as Programme for International Student Assessment, Trends in International Mathematics and Science Study, and Progress in International Reading Literacy Study (PIRLS), DIF with regard to native language is of particular interest in this context. However, given differences in language and cultures, assuming similar cross-national DIF may lead to mistaken assumptions about the impact of immigration status and native language on test performance. The purpose of this study was to use model-based recursive partitioning (MBRP) to investigate uniform DIF in PIRLS items across European nations. Results demonstrated that DIF based on mother's language was present for several items on a PIRLS assessment, but that the patterns of DIF were not the same across all nations.

20.
Studies of differential prediction typically examine group differences in linear regression slopes or intercepts for predicting criterion scores from one or more test scores. When there are no group differences in slopes, what are the implications of differences in regression intercepts for the measurement equivalence of the tests or criterion across groups? Measurement equivalence is here defined as factorial invariance under a single-factor model for the tests and criterion. Two theorems are given that describe conditions under which intercept differences can exist under factorial invariance. In such cases, intercept differences do not result from measurement bias in either the tests or criterion. The conditions of the theorems are testable using multiple-group confirmatory factor analysis. These test procedures are illustrated in real data. The implications of the theorems and the test procedures for studies of differential prediction are discussed.
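A short derivation shows how equal slopes and unequal intercepts can coexist with factorial invariance. This is a sketch of one simple case under our own assumptions, not a reproduction of the paper's two theorems. The assumptions are:

- A single test X and a criterion Y load on one factor F, with uncorrelated errors.
- The intercepts τ, loadings λ, and test error variance θ_X are invariant across groups.
- The factor variance φ is equal in both groups, so the slopes match.
- The factor means κ_1 and κ_2 differ.

```latex
\begin{aligned}
X &= \tau_X + \lambda_X F + e_X, \qquad Y = \tau_Y + \lambda_Y F + e_Y,\\
\beta &= \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)} = \frac{\lambda_X \lambda_Y \phi}{\lambda_X^2 \phi + \theta_X},\\
\alpha_g &= E_g(Y) - \beta\,E_g(X) = \tau_Y + \lambda_Y \kappa_g - \beta\,(\tau_X + \lambda_X \kappa_g),\\
\alpha_1 - \alpha_2 &= (\lambda_Y - \beta \lambda_X)(\kappa_1 - \kappa_2)
  = \frac{\lambda_Y \theta_X}{\lambda_X^2 \phi + \theta_X}\,(\kappa_1 - \kappa_2).
\end{aligned}
```

The intercepts therefore differ whenever the test has error variance (θ_X > 0) and the groups differ in factor mean, even though neither X nor Y is biased. Equivalently, the factor λ_X²φ / (λ_X²φ + θ_X) is the reliability of X under this model, so the intercept gap equals λ_Y(1 − reliability)(κ_1 − κ_2).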
