Similar Literature
1.
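The main purpose of this study was to examine whether the urban-rural gap in National College Entrance Examination (Gaokao) English scores stems from differential item functioning (DIF) between urban and rural examinees. An item exhibits DIF when two groups of examinees with the same ability differ in their performance on that item. Using the standardization (standardized score difference) method and the STDIF software, the study analyzed, item by item, whether the objective items of the three 2016 national Gaokao English papers showed urban-rural DIF. After confirming that the objective items were free of DIF, the study used objective-item scores as the matching variable and applied the conditional score plot method to examine whether the writing task showed urban-rural DIF. No items with urban-rural DIF were found in National Papers I, II, or III, which suggests that the national English papers are fair to examinees with urban and rural household registration and that the urban-rural difference in English scores is not caused by item unfairness.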
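The standardization method mentioned in this abstract can be illustrated with a small sketch. It computes the commonly used standardized proportion-correct difference: at each level of the matching score, the focal group's proportion correct minus the reference group's, weighted by the number of focal-group examinees at that level. This is an assumption about the general technique, not a description of how the STDIF software implements it; the function name `std_p_dif`, the record layout, and the group labels are hypothetical.

```python
from collections import defaultdict


def std_p_dif(records):
    """Standardized proportion-correct difference for one dichotomous item.

    records: iterable of (matching_score, group, correct), where group is
    "ref" or "focal" and correct is 0 or 1. (Hypothetical layout.)
    """
    # For each matching-score level, keep [n_examinees, n_correct] per group.
    counts = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for score, group, correct in records:
        cell = counts[score][group]
        cell[0] += 1
        cell[1] += correct

    weighted_diff = 0.0
    total_weight = 0
    for cell in counts.values():
        n_ref, c_ref = cell["ref"]
        n_foc, c_foc = cell["focal"]
        if n_ref == 0 or n_foc == 0:
            continue  # no comparison is possible at this score level
        # Weight the focal-minus-reference difference by the focal count.
        weighted_diff += n_foc * (c_foc / n_foc - c_ref / n_ref)
        total_weight += n_foc

    if total_weight == 0:
        raise ValueError("no score level contains both groups")
    return weighted_diff / total_weight
```

A value near zero means that matched examinees in the two groups answer the item correctly at about the same rate; negative values indicate that the item is harder for the focal group.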

2.
This study investigated whether the linguistic complexity of items leads to gender differential item functioning (DIF) on mathematics assessments. Two forms of a mathematics test were developed. The first form consisted of algebra items based on mathematical expressions, terms, and equations. In the second form, the same items were written as word problems without changing their contents and solutions. The test forms were given to a sample of 671 sixth-grade students from 10 middle schools in Turkey. The tests were administered to the students with a 4-week interval. Explanatory item response modeling and logistic regression approaches were used to examine gender DIF. Several word problems were flagged as having gender DIF in favor of female examinees, whereas mathematically expressed forms of the same items did not function differently across male and female examinees. The verbal content of word problems seems to influence the way males and females respond to items.

3.

Differential item functioning (DIF) statistics were computed using items from the Peabody Individual Achievement Test (PIAT)-Reading Comprehension subtest for children of the same age group (ages 7 through 12 respectively). The pattern of observed DIF items was determined by comparing each cohort across age groups. Differences related to race and gender were also identified within each cohort. Characteristics of DIF items were identified based on sentence length, vocabulary frequency, and density of a sentence. DIF items were more frequently associated with short sentences than with long sentences. This study explored the potential limitation in the longitudinal use of items in an adaptive test.

4.
Student well‐being is a growing issue in higher education, and assessment of the prevalence of conditions such as loneliness is therefore important. In higher education and population surveys, the Three‐Item Loneliness Scale (T‐ILS) is increasingly used. The T‐ILS is attractive for large multi‐subject surveys, as it consists of only three items (derived from the UCLA Loneliness Scale). Several ways of classifying persons as lonely based on T‐ILS scores exist: dichotomous and trichotomous classification schemes and use of sum scores with rising levels indicating more loneliness. The question remains whether T‐ILS scores are comparable across the different population groups where they are used or across groups of students in the higher education system. The aim was to investigate whether the T‐ILS suffers from differential item functioning (DIF) that might change the loneliness classification among higher education students, using a large sample just admitted to 22 different academy profession degree programs in Denmark (N = 3,757). DIF was tested relative to degree program, age groups and gender. The framework of graphical loglinear Rasch models was applied, as this allows for adjustment of sum scores for uniform DIF, and thus for assessment of whether DIF might change the classification. Two items showed DIF relative to degree program and gender, and adjusting for this DIF changed the classification for some subgroups. The consequences were negligible when using a dichotomous classification and larger when using a trichotomous classification. Therefore, trichotomous classification should be used with caution unless suitable adjustments for DIF are made prior to classification.

5.
Differential item functioning (DIF) assessment is key in score validation. When DIF is present, scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments such as Programme for International Student Assessment, Trends in International Mathematics and Science Study, and Progress in International Reading Literacy Study (PIRLS), DIF with regard to native language is of particular interest in this context. However, given differences in language and cultures, assuming similar cross-national DIF may lead to mistaken assumptions about the impact of immigration status and native language on test performance. The purpose of this study was to use model-based recursive partitioning (MBRP) to investigate uniform DIF in PIRLS items across European nations. Results demonstrated that DIF based on mother's language was present for several items on a PIRLS assessment, but that the patterns of DIF were not the same across all nations.

6.
This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.

7.
Creativity has been well studied in the past several decades, and numerous measures have been developed to assess creativity. However, validity evidence associated with each measure is often mixed. In particular, the social consequence aspect of validity has received little attention. This is partly due to the difficulty of testing for differential item functioning (DIF) within the traditional classical test theory framework, which remains the most popular approach to assessing creativity. Hence, this study provides an example of examining differential item functioning using multilevel explanatory item response theory models. The Creative Thinking Scale was tested for DIF in a sample of 1043 10th–12th graders. Results revealed significant uniform and non-uniform DIF for some items. Differentially functioning items can produce measurement bias and should be either deleted or modeled. The detailed implications for researchers and practitioners are discussed.

8.
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

9.
As research continues to document differences in the prevalence of mental health problems such as depression across racial/ethnic groups, the issue of measurement equivalence becomes increasingly important to address. The Mood and Feelings Questionnaire (MFQ) is a widely used screening tool for child and adolescent depression. This study applied a differential item functioning (DIF) framework to data from a sample of 6th and 8th grade students in the Seattle Public School District (N = 3,593) to investigate the measurement equivalence of the MFQ. Several items in the MFQ were found to have DIF, but this DIF was associated with negligible individual- or group-level impact. These results suggest that differences in MFQ scores across groups are unlikely to be caused by measurement non-equivalence.

10.
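A Preliminary Study of Differential Item Functioning in a Chinese Vocabulary Test   Total citations: 6 (self-citations: 1; citations by others: 5)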
曹亦薇 (Cao Yiwei), 张厚粲 (Zhang Houcan). 《心理学报》 (Acta Psychologica Sinica), 1999, 32(4): 460-467
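This paper applied two different methods to detect DIF in 36 vocabulary items from an operational Chinese vocabulary test. Comparisons were made between male and female students and between urban and suburban students in a sample of more than 1,400 third-year junior high school students. In the male-female analysis, 7 items were flagged as showing uniform DIF. For the urban-suburban groups, 7 items were identified as DIF items by both methods, of which 5 showed uniform DIF and 2 showed non-uniform DIF. The paper also discusses possible causes of DIF.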

11.
We evaluated the reliability, validity, and differential item functioning (DIF) of a shorter version of the Defining Issues Test-1 (DIT-1), the behavioural DIT (bDIT), which measures the development of moral reasoning. A total of 353 college students (81 males, 271 females, 1 not reported; age M = 18.64 years, SD = 1.20 years) who were taking introductory psychology classes at a public university in a suburban area in the Southern United States participated in the present study. First, we examined the reliability of the bDIT using Cronbach’s α and its concurrent validity with the original DIT-1 using disattenuated correlation. Second, we compared the test duration between the two measures. Third, we tested the DIF of each question between males and females. The findings showed that, first, the bDIT had acceptable reliability and good concurrent validity. Second, the test duration could be significantly shortened by employing the bDIT. Third, DIF results indicated that the bDIT items did not favour any gender. Practical implications of the present study based on the reported findings are discussed.
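The two statistics named in this abstract, Cronbach's α and the disattenuated correlation, follow textbook formulas, sketched below. This is a minimal illustration under stated assumptions and not the authors' analysis code: the function names and data layout are hypothetical, and the abstract does not say which reliability estimates were used in the correction.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) and of the respondents' total scores.
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Observed correlation corrected for unreliability in both measures."""
    return r_xy / sqrt(rel_x * rel_y)
```

For example, an observed correlation of 0.6 between two measures that each have reliability 0.8 corresponds to a disattenuated correlation of about 0.75.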

12.
This study investigated the equivalence of different types of informants, such as children (or early adolescents) and parents, in evaluating child externalizing and internalizing problems. We applied a polytomous item response theory (IRT) model for the Strengths and Difficulties Questionnaire (SDQ). We obtained responses to three subscales (Conduct Problems, Hyperactivity/Inattention, and Emotional Symptoms) from 541 elementary school students aged 10–12 years, fathers for 233 students, mothers for 275 students, and the homeroom teachers for 524 students. Expected values on the individual items, calculated from the discrimination and threshold parameters, were compared among students, fathers, and mothers as an investigation of differential item functioning (DIF) or differential informant functioning. Assessments of both externalizing and internalizing problems were mostly equivalent between fathers and mothers, and most items for externalizing problems functioned equally between students and parents, whereas items for internalizing problems showed DIF between them. The IRT analysis also showed that the intervals of response categories varied across items, particularly for the conduct problems items “fight” and “steal,” and positively worded items showed an extremely low threshold.

13.
14.
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns. It also yielded more DIF items with larger effect sizes and more consistent item response patterns by substantive aspects (e.g., reading comprehension processes and cognitive complexity of items). Based on our findings, we suggest empirically evaluating the homogeneity assumption in international assessments because international populations cannot be assumed to have homogeneous item response patterns. Otherwise, differences in response patterns within these populations may be under-detected when conducting manifest DIF analyses. Detecting differences in item responses across international examinee populations has implications for the generalizability and meaningfulness of DIF findings as they apply to heterogeneous examinee subgroups.

15.
Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No studies to date have performed research for the purpose of understanding why score equity can be inconsistent across the score range of some tests. The purpose of this study is to explore a source of uneven subpopulation score equity across the score range of a test. It is hypothesized that the difficulty of anchor items displaying differential item functioning (DIF) is directly related to the score location at which issues of score inequity are observed. The simulation study supports the hypothesis that the difficulty of DIF items has a systematic impact on the uneven nature of conditional score equity.

16.
The present study examined the psychometric properties of a universal screening instrument called the Emotional and Behavioral Screener (EBS), which is designed to identify students exhibiting emotional and behavioral problems. The primary purposes of this study were to assess the measurement invariance of EBS items between Caucasian and African-American students and to assess the impact of differential item functioning (DIF) on EBS scores. The sample consisted of 946 elementary students from throughout the U.S. The findings suggested that EBS items exhibited small to negligible levels of DIF, and that DIF did not significantly impact EBS scores. The results supported the EBS as a universal screening instrument that is fair in measuring the emotional and behavioral risk of elementary students. Research limitations and implications for school professionals are discussed.

17.
This study examines the organization and development of 5 domains of reasoning (categorical, quantitative, spatial, causal, and propositional) and the construct validity of a test designed to measure development from early adolescence to early adulthood. The theory underlying the test is first summarized and the conceptual design of the test is then illustrated. Each domain was addressed by tasks tapping abilities known to be acquired in this age period. The test was administered to 629 adolescents ranging in age from 12 to 18 years. Confirmatory factor analysis validated the 5 domains of reasoning and revealed a common factor underlying all domains. The Rasch model was used to scale the items and specify the reliability of the test across the whole sample and within different groups of participants (female, male, students of gymnasium, and students of lyceum). This model showed that the test is highly reliable and invariant across groups. Cluster analysis and the saltus model were applied to uncover successive developmental stage‐like levels of difficulty and showed the presence of five such levels. The procedural and representational characteristics of these levels were also specified and their implications for developmental and cognitive theory were discussed.

18.
Trait Anxiety in College Students: Structure and Characteristics   Total citations: 10 (self-citations: 0; citations by others: 10)
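This study used a self-developed trait anxiety questionnaire for college students and the Spielberger trait anxiety questionnaire to assess trait anxiety in 497 first- to fourth-year college students. The results showed that: (1) five factors could be extracted from the self-developed questionnaire, named study anxiety, employment anxiety, interpersonal anxiety, health anxiety, and fear of negative evaluation; (2) the self-developed questionnaire had good reliability and validity: its Cronbach's α coefficients, the correlations between each dimension and the total score, and its correlation with the Spielberger questionnaire all met accepted psychometric standards; (3) study anxiety differed significantly by year of study, with students in higher years showing lower anxiety, and employment anxiety differed significantly by urban-rural origin and by gender, with rural students more anxious than urban students and female students more anxious than male students.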

19.
Sheppard R, Han K, Colarelli SM, Dai G, King DW. Assessment, 2006, 13(4): 442-453
The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories. Although the group mean differences were trivial, more than a third of the items showed DIF by sex (38.4%) and by race (37.3%). A content analysis of potentially biased items indicated that the themes of items displaying DIF were slightly more cohesive for sex than for race. The authors discuss possible explanations for differing clustering tendencies of items displaying DIF and some practical and theoretical implications of DIF in the development and interpretation of personality inventories.

20.
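This paper defines the concept of DIF for polytomously scored cognitive diagnostic tests and, through a simulation study and an empirical study, explores the applicability of four common polytomous DIF detection methods in theory and in practice. The results show that all four methods can effectively detect DIF in polytomously scored cognitive diagnosis and that their performance is little affected by the model. Using KS as the matching variable is more favorable for DIF detection than using the total score. The LDFA method and the Mantel test, each with KS as the matching variable, have the highest power in detecting DIF items.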
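As a rough illustration of one of the methods named here, the sketch below computes the standard Mantel statistic for ordered item scores. Within each stratum of the matching variable, the focal group's summed item score is compared with its expectation under no DIF, and the result is referred to a chi-square distribution with one degree of freedom. This is an assumption about the general technique, not the paper's implementation: the function name, record layout, and group labels are hypothetical, and the stratum label may be a total score or any other matching category.

```python
from collections import defaultdict


def mantel_chi_square(records):
    """Mantel chi-square (1 df) for one polytomous item.

    records: iterable of (stratum, group, item_score), where group is
    "ref" or "focal". (Hypothetical layout.)
    """
    strata = defaultdict(list)
    for stratum, group, score in records:
        strata[stratum].append((group, score))

    sum_observed = sum_expected = sum_variance = 0.0
    for rows in strata.values():
        n = len(rows)
        n_foc = sum(1 for g, _ in rows if g == "focal")
        n_ref = n - n_foc
        if n < 2 or n_foc == 0 or n_ref == 0:
            continue  # stratum carries no information about group differences
        s1 = sum(y for _, y in rows)        # sum of item scores in the stratum
        s2 = sum(y * y for _, y in rows)    # sum of squared item scores
        sum_observed += sum(y for g, y in rows if g == "focal")
        sum_expected += n_foc * s1 / n
        # Variance of the focal-group sum when group labels are exchangeable.
        sum_variance += n_ref * n_foc * (n * s2 - s1 ** 2) / (n ** 2 * (n - 1))

    if sum_variance == 0:
        raise ValueError("item scores do not vary within any usable stratum")
    return (sum_observed - sum_expected) ** 2 / sum_variance
```

Larger values indicate stronger evidence that matched examinees in the two groups score differently on the item.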

