Similar documents
 Found 20 similar documents; search time: 15 ms
1.
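The authors wrote 235 figural reasoning test items. Using an anchor-test equating design, with 72 items from the Combined Raven's Test as anchor items, they tested 1,733 males at ability levels ranging from junior high school to university. The response data were analyzed with BILOG-MG 3.0 (marginal maximum likelihood estimation) under the three-parameter logistic model. Items that fit the model poorly and items whose maximum information was below 0.3 were removed, yielding a final item bank of 181 items. The item bank can be used to screen out military recruit candidates with low intelligence.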
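The screening rule in entry 1 (drop items whose maximum information is below 0.3 under the three-parameter logistic model) can be sketched as follows. This is an illustration only, not the authors' procedure: the function names are hypothetical, the scaling constant D = 1.7 is an assumption (the abstract does not say which metric BILOG-MG was run in), and the closed-form location of the information maximum is the standard textbook result for the 3PL model.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.
    D = 1.7 is an assumed scaling constant, not taken from the abstract."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.7):
    """Item information function of the 3PL model at ability theta."""
    p = p_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def max_info_3pl(a, b, c, D=1.7):
    """Return (theta_max, maximum information) using the closed-form
    location of the maximum for the 3PL model."""
    theta_max = b + math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / (D * a)
    return theta_max, info_3pl(theta_max, a, b, c, D)

def keep_item(a, b, c, threshold=0.3, D=1.7):
    """Hypothetical screening rule: keep an item only if its maximum
    information reaches the threshold (0.3 in entry 1)."""
    return max_info_3pl(a, b, c, D)[1] >= threshold
```

With these assumptions, a weakly discriminating item such as keep_item(0.5, 0.0, 0.2) is rejected, while keep_item(1.0, 0.0, 0.2) is retained. The model-fit criterion mentioned in the abstract is a separate step and is not covered here.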

2.
You J, Leung F, Lai CM, Fu K. Assessment, 2011, 18(4): 464-475
This study used item response theory (IRT) to examine the Impulsive Behaviors Checklist for Adolescents (IBCL-A) among 6,276 (67.7% girls) Chinese secondary school students. The IBCL-A included 15 maladaptive impulsive behaviors adapted from the Revised Diagnostic Interview for Borderlines. The authors obtained the severity and discrimination parameters for each item in the IBCL-A, examined differential item functioning across gender and age groups, and tested the reliability and concurrent validity of the IBCL-A IRT-scaled score. Most items in the IBCL-A were most accurate in assessing moderate to high levels of impulsivity and discriminated well among adolescents with varied levels of impulsivity. Differential item functioning emerged in several items across gender. The IRT-scaled score showed good construct validity and incremental predictive validity. The findings demonstrate the sound psychometric properties of the IBCL-A and support the clinical utility of this scale.

3.
The cognitive reflection test (CRT) is a short measure of a person's ability to resist intuitive response tendencies and to produce a normatively correct response, which is based on effortful reasoning. Although the CRT is a very popular measure, its psychometric properties have not been extensively investigated. A major limitation of the CRT is the difficulty of the items, which can lead to floor effects in populations other than highly educated adults. The present study aimed at investigating the psychometric properties of the CRT applying item response theory analyses (a two-parameter logistic model) and at developing a new version of the scale (the CRT-long), which is appropriate for participants with both lower and higher levels of cognitive reflection. The results demonstrated the good psychometric properties of the original, as well as the new scale. The validity of the new scale was also assessed by measuring correlations with various indicators of intelligence, numeracy, reasoning and decision-making skills, and thinking dispositions. Moreover, we present evidence for the suitability of the new scale to be used with developmental samples. Finally, by comparing the performance of adolescents and young adults on the CRT and CRT-long, we report the first investigation into the development of cognitive reflection. Copyright © 2015 John Wiley & Sons, Ltd.

4.
We conducted two experimental studies with between-subjects and within-subjects designs to investigate the item response process for personality measures administered in high- versus low-stakes situations. Apart from assessing measurement validity of the item response process, we examined predictive validity; that is, whether or not different response models entail differential selection outcomes. We found that ideal point response models fit slightly better than dominance response models across high- versus low-stakes situations in both studies. Additionally, fitting ideal point models to the data led to fewer items displaying differential item functioning compared to fitting dominance models. We also identified several items that functioned as intermediate items in both the faking and honest conditions when ideal point models were fitted, suggesting that the ideal point model is “theoretically” more suitable across these contexts for personality inventories. However, the use of different response models (dominance vs. ideal point) did not have any substantial impact on the validity of personality measures in high-stakes situations, or on the effectiveness of selection decisions as measured by mean performance or percent of fakers selected. These findings are significant: although prior research supports the importance and use of ideal point models for measuring personality, and ideal point models seem to have slightly better measurement validity in the case of personality faking, the use of dominance models may be adequate with no loss to predictive validity.

5.
This study investigated the equivalence of different types of informants, such as children (or early adolescents) and parents, in evaluating child externalizing and internalizing problems. We applied a polytomous item response theory (IRT) model to the Strengths and Difficulties Questionnaire (SDQ). We obtained responses to three subscales (Conduct Problems, Hyperactivity/Inattention, and Emotional Symptoms) from 541 elementary school students aged 10–12 years, fathers for 233 students, mothers for 275 students, and the homeroom teachers for 524 students. Expected values on each item, calculated from the discrimination and threshold parameters, were compared among students, fathers, and mothers as an investigation of differential item functioning (DIF) or differential informant functioning. Assessments of both externalizing and internalizing problems were mostly equivalent between fathers and mothers, and most items for externalizing problems functioned equally between students and parents, whereas items for internalizing problems showed DIF between them. The IRT analysis also showed that the intervals between response categories varied across items, particularly for the conduct problems items “fight” and “steal,” and that positively worded items had extremely low thresholds.

6.
Visual perceptual skills of school-age children are often assessed using the Supplemental Developmental Test of Visual Perception of the Developmental Test of Visual-Motor Integration. The purpose of the study was to examine the construct validity of this test by evaluating its scalability (interval level measurement), unidimensionality, differential item functioning, and hierarchical ordering of its items. Visual perceptual performance scores from a sample of 356 typically developing children (171 boys and 185 girls ages 5 to 11 years) were used to complete a Rasch analysis of the test. Seven items were discarded for poor fit, while none of the items exhibited differential item functioning by sex. The construct validity, scalability, hierarchical ordering, and lack of differential item functioning requirements were met by the final test version. Since 7 test items did not fit the Rasch analysis specifications, the clinical value of the test is questionable and limited.

7.
The aim of this study was to determine whether the items from a reading comprehension test in European Portuguese function differently across students from rural and urban areas, which would bias the validity of the test and undermine equity in assessment. The sample was composed of 653 students from second, third and fourth grades. The presence of differential item functioning (DIF) was analysed using logistic regression and the Mantel–Haenszel procedure. Although 17 items were flagged with DIF, only five items showed non-negligible DIF in all effect-size measures. The evidence of invariance across students with rural or urban backgrounds for most of the items supports the validity of the test, though the five identified items should be further investigated.
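The Mantel–Haenszel procedure named in entry 7 can be sketched as below. This is a generic illustration under stated assumptions, not the analysis from that study: respondents are assumed to be matched on total score, each stratum is a 2x2 table of counts (reference correct, reference incorrect, focal correct, focal incorrect), the function name is hypothetical, and the -2.35 ln(alpha) rescaling is the commonly used ETS delta metric.

```python
import math

def mantel_haenszel(strata):
    """strata: iterable of (a, b, c, d) counts per matched score level, where
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    Returns (common odds ratio, continuity-corrected chi-square, ETS delta).
    Raises ZeroDivisionError if the tables are degenerate."""
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d      # group sizes in this stratum
        m_right, m_wrong = a + c, b + d  # correct / incorrect totals
        sum_a += a
        sum_expected += n_ref * m_right / n
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    alpha = num / den
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    delta = -2.35 * math.log(alpha)
    return alpha, chi2, delta
```

A common odds ratio near 1 (delta near 0) indicates that matched members of the two groups have similar odds of answering the item correctly. Effect-size cut-offs for "negligible" DIF vary by convention and are not taken from the abstract.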

8.
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

9.
This report documents relationships between differential item functioning (DIF) identification and (1) item–trait association and (2) scale multidimensionality in personality assessment. When the logistic regression model of Zumbo is applied [Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.], DIF effect size becomes increasingly inflated as the association between the investigated item and trait scores decreases. Similar patterns were noted for the influence of scale multidimensionality on DIF identification. Individuals who investigate DIF in personality assessment applications are provided with estimates of the impact of item–trait association and scale multidimensionality on DIF occurrence and effect size. The results emphasize the importance of excluding investigated items from focal trait identification prior to conducting DIF analyses, and of reporting item and scale psychometric properties in DIF reports.

10.
Nonparametric tests of the validity of polytomous ISOP-models (unidimensional ordinal probabilistic polytomous IRT-models) are presented. Since the ISOP-model is a very general nonparametric unidimensional rating scale model, the test statistics apply to a great multitude of latent trait models. A test for the comonotonicity of item sets of two or more items is suggested. Procedures for testing the comonotonicity of two item sets and for item selection are developed. The tests are based on Goodman-Kruskal's gamma index of ordinal association and are generalizations thereof. It is an essential advantage of polytomous ISOP-models within probabilistic IRT-models that the tests of validity of the model can be performed before and without the model being fitted to the data. The new test statistics have the further advantage that no prior order of items or subjects needs to be known.

11.
We evaluated the reliability, validity, and differential item functioning (DIF) of a shorter version of the Defining Issues Test-1 (DIT-1), the behavioural DIT (bDIT), which measures the development of moral reasoning. A total of 353 college students (81 males, 271 females, 1 not reported; age M = 18.64 years, SD = 1.20 years) who were taking introductory psychology classes at a public university in a suburban area of the Southern United States participated in the present study. First, we examined the reliability of the bDIT using Cronbach’s α and its concurrent validity with the original DIT-1 using disattenuated correlation. Second, we compared the test duration between the two measures. Third, we tested the DIF of each question between males and females. The findings showed that, first, the bDIT had acceptable reliability and good concurrent validity. Second, the test duration could be significantly shortened by employing the bDIT. Third, the DIF results indicated that the bDIT items did not favour either gender. Practical implications of the present study based on the reported findings are discussed.
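The two statistics named in entry 11, Cronbach's α and the disattenuated correlation, follow standard formulas that can be sketched as below. The function names and the data layout (one list of item scores per respondent) are assumptions made for illustration; nothing here reproduces the study's data or results.

```python
import math
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of respondents, each a list of k item scores (k >= 2).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

def disattenuated_r(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures:
    r_xy divided by the square root of the product of the reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)
```

For example, an observed correlation of 0.6 between two measures with reliabilities of 0.8 and 0.9 gives disattenuated_r(0.6, 0.8, 0.9), which is larger than the observed value because measurement error attenuates correlations.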

12.
This study investigated the psychometric properties of the Depression Anxiety Stress Scale-21 (DASS-21) in a non-clinical sample of working people. Working South African persons (N = 269; females = 62%; mean age = 33, SD = 11.5) completed the DASS-21, as well as the Center for Epidemiologic Studies Depression Scale-Revised (CESD-R) and the Generalized Anxiety Disorder Scale-7 (GAD-7). Confirmatory factor analysis and correlational analysis supported a three-factor structure (depression, anxiety, and stress) for the DASS-21. The evidence for discriminant and convergent validity was strong. Additionally, we found good reliabilities for the overall scale as well as the subscales. The DASS-21 appears to be a valid and reliable instrument for measuring depression, anxiety, and stress in the workplace. Future studies should investigate differential item functioning and equivalence of items among South African working populations.

13.
Likert-type self-report scales are frequently used in large-scale educational assessment of social-emotional skills. Self-report scales rely on the assumption that their items elicit information only about the trait they are supposed to measure. However, different response biases may threaten this assumption. Specifically, in children, the response style of acquiescence is an important source of systematic error. Balanced scales, including an equal number of positively and negatively keyed items, have been proposed as a solution to control for acquiescence, but the reasons why this design feature works from the perspective of modern psychometric models have been underexplored. Three methods of controlling for acquiescence are compared: a classical method of partialling out the mean; an item response theory method that measures differential person functioning (DPF); and multidimensional item response theory (MIRT) with a random intercept. Comparative analyses are conducted on simulated ratings and on self-ratings provided by 40,649 students (aged 11–18) on a fully balanced 30-item scale assessing conscientious self-management. Acquiescence bias was explained as DPF, and it was demonstrated that the acquiescence index is highly related to DPF; that balanced scales produce scores controlled for DPF; and that MIRT factor scores are highly related to scores controlled for DPF, while the random intercept is highly related to DPF.

14.
Yang Xiangdong (杨向东). Acta Psychologica Sinica (《心理学报》), 2010, 42(7): 802-812
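In automatic item generation, item parameters are predicted from the set of stimulus features specified by the cognitive item design, so their sources of uncertainty are more complex than those of parameters calibrated from empirical data. This empirical study examined how accurately ability parameters are estimated, under computerized adaptive testing conditions, from the predicted parameters of Abstract Reasoning Test (ART) items generated with the cognitive design system approach. The results showed that predicted item parameters were distributed more centrally than the corresponding calibrated parameters. This regression effect influenced the size of the ability estimation error and also led to differences in item selection during adaptive testing. After controlling for the differences in item selection, the ability estimation error was larger than the error based on calibrated item parameters, but the difference was not substantial. The two sets of ability estimates were highly correlated, and the differences between corresponding ability values were small across almost the entire ability range.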

15.
The PARELLA model is a probabilistic parallelogram model that can be used for the measurement of latent attitudes or latent preferences. The data analyzed are the dichotomous responses of persons to items, with a one (zero) indicating agreement (disagreement) with the content of the item. The model provides a unidimensional representation of persons and items. The response probabilities are a function of the distance between person and item: the smaller the distance, the larger the probability that a person will agree with the content of the item. This paper discusses how the approach to differential item functioning presented by Thissen, Steinberg, and Wainer can be implemented for the PARELLA model. Requests for the PARELLA software should be sent to iec ProGAMMA, PO Box 841, 9700 AV Groningen, The Netherlands.

16.
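This study examined the reliability and validity of the revised Meaning in Life Questionnaire among junior high school students and compared its psychometric indices between left-behind and non-left-behind students. A total of 1,300 junior high school students, 636 of whom were left-behind children, completed the revised Meaning in Life Questionnaire, a transcendent meaning scale, an emotion regulation scale, the Rosenberg Self-Esteem Scale, and an index of well-being scale. Exploratory factor analysis, parallel analysis, and the minimum average partial test all indicated a two-factor structure, and confirmatory factor analysis showed good fit in every subgroup. Scores correlated significantly and positively with all of the criterion variables above. A few items showed uniform or non-uniform differential item functioning across gender and left-behind status. The δ coefficients of the total scale and of the Search for Meaning and Presence of Meaning subscales all exceeded 0.9. The revised Meaning in Life Questionnaire has good reliability and validity among junior high school students in general and among left-behind junior high school students; the differential item functioning across gender and left-behind status is negligible; and the questionnaire discriminates well among respondents.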

17.
For detecting differential item functioning (DIF) between two or more groups of test takers in the Rasch model, their item parameters need to be placed on the same scale. Typically this is done by means of choosing a set of so-called anchor items based on statistical tests or heuristics. Here the authors suggest an alternative strategy: By means of an inequality criterion from economics, the Gini Index, the item parameters are shifted to an optimal position where the item parameter estimates of the groups best overlap. Several toy examples, extensive simulation studies, and two empirical application examples are presented to illustrate the properties of the Gini Index as an anchor point selection criterion and compare its properties to those of the criterion used in the alignment approach of Asparouhov and Muthén. In particular, the authors show that, in addition to the globally optimal position for the anchor point, the criterion plot contains valuable additional information and may help discover unaccounted DIF-inducing multidimensionality. They further provide mathematical results that enable an efficient sparse grid optimization and make it feasible to extend the approach, for example, to multiple group scenarios.

18.
Various definitions and different approaches for assessing the complex construct of parental involvement (PI) have led to inconsistent findings regarding the impact of PI on child development. To date, limited information is available regarding the measurement invariance of PI measures across time and groups (e.g., children’s gender, ethnicity, and socioeconomic status), leaving a concern that group differences in PI might reflect item bias instead of true differences in PI. The present study aimed to obtain a set of optimal items for measuring PI from kindergarten through the elementary school years and to investigate whether they could be used for parents from different groups. A Rasch measurement model was implemented to investigate item difficulty, step calibrations, and measurement invariance (examined here as differential item functioning, DIF). The results from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999 data set showed that 20 items can be used to measure three dimensions of PI (school/home involvement, family educational investment, and family routines) across four time points. Administration time, children’s gender, ethnicity, and socioeconomic status showed different levels of effect on item difficulty for half of these items. Practitioners and researchers should be cautious when using these items and are advised to freely estimate the item parameters of DIF items, as well as to add more items to the PI scale to improve reliability.

19.
Creativity has been well studied in the past several decades, and numerous measures have been developed to assess creativity. However, validity evidence associated with each measure is often mixed. In particular, the social consequence aspect of validity has received little attention. This is partly due to the difficulty of testing for differential item functioning (DIF) within the traditional classical test theory framework, which remains the most popular approach to assessing creativity. Hence, this study provides an example of examining differential item functioning using multilevel explanatory item response theory models. The Creative Thinking Scale was tested for DIF in a sample of 1,043 10th–12th graders. Results revealed significant uniform and non-uniform DIF for some items. Differentially functioning items can produce measurement bias and should be either deleted or modeled. The detailed implications for researchers and practitioners are discussed.

20.
Gender-based differential item functioning occurs when men and women respond differently to an item despite being similar on the trait assessed by that item. The Patient Health Questionnaire-9 (PHQ-9) is a prominent screening tool for depression. Researchers exploring whether the PHQ-9 exhibits gender-based differential item functioning have used only specialized samples (e.g., individuals with cancer or vision loss). We explored gender bias in the PHQ-9 by means of differential item functioning analyses in a population-based sample. We made use of the National Health and Nutrition Examination Surveys (NHANES, 2008), a population-based sample of the USA including 5995 participants. Differential item functioning was assessed using the Mantel-Haenszel chi-square test and by comparing item characteristic curves between men and women. All items exhibited negligible differential item functioning as demonstrated by the Mantel-Haenszel test, with absolute standardized mean differences ranging from 0.00 to 0.06. Item characteristic curves were similar between genders for all but one item. Item 5 (i.e., changes in appetite) exhibited very minor non-uniform differential item functioning, wherein extremely depressed women endorsed higher response options on this item compared to equally depressed men. Researchers can use the PHQ-9 without concern about gender bias, particularly in epidemiological research.
