We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from the Self-Perception Profile for Children (Harter, 1985) in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.
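As a rough illustration of what a person-fit statistic computes, the sketch below implements the standardized log-likelihood index usually written l_z, for dichotomous items. This is an assumption for illustration only: the abstract does not say which person-fit indexes the authors used, and the function name `lz_person_fit` and the example probabilities are hypothetical.

```python
import math

def lz_person_fit(responses, probs):
    """Standardized log-likelihood person-fit index (l_z) for 0/1 items.

    responses: list of 0/1 item scores for one respondent.
    probs: model-implied probabilities of a 1 on each item at the
           respondent's trait level (each strictly between 0 and 1).
    Large negative values flag response patterns that are unlikely
    under the model.
    """
    # Observed log-likelihood of the response pattern.
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    # Expected value and variance of the log-likelihood under the model.
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    if variance == 0:
        # Happens when every probability equals 0.5; the index is undefined.
        raise ValueError("l_z is undefined when the variance is zero")
    return (l0 - expected) / math.sqrt(variance)

# Hypothetical probabilities for five items ordered from easy to hard.
probs = [0.9, 0.7, 0.5, 0.3, 0.1]
consistent = lz_person_fit([1, 1, 1, 0, 0], probs)  # fits the item ordering
aberrant = lz_person_fit([0, 0, 0, 1, 1], probs)    # reverses the item ordering
```

The aberrant pattern receives a much lower value than the consistent one, which is the kind of signal that would prompt the cautious interpretation of scale scores described above.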

Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

In using organizational surveys for decision-making, it is essential to consider measurement equivalence/invariance (ME/I), which addresses the question of whether score differences are attributable to differences in the latent variable we intend to measure, or attributable to confounding differences in measurement properties. Due to the tendency for null results to remain unpublished, most articles have focused on findings of, and reasons for, violations of ME/I. On the other hand, little is available to practitioners and researchers concerning situations where ME/I can be expected to hold. This is especially disconcerting because the null is the desired result in such analyses, and allows for unfettered observed-score comparisons. This special issue presents a unique opportunity to provide such a discussion using real-world examples from an organizational culture survey. In doing so we hope to clear up confusion surrounding the concept of ME/I, when it can be expected, and how it relates to actual differences in scores. First, we review the basic tenets and past findings focusing on ME/I, and discuss the item response theory differential item functioning framework used here. Next, we show ME/I being upheld using organizational survey data wherein violations of ME/I would reasonably not be expected (i.e., the null hypothesis was predicted and supported), and simulate the consequences of ignoring ME/I. Finally, we suggest a set of conditions wherein ME/I is likely to be upheld.

An abbreviated Spider Phobia Questionnaire (SPQ) was developed using methods based in item response theory. Fifteen of the 31 SPQ items that demonstrated good to excellent discrimination along the spider fear continuum were retained in Study 1 that consisted of 1,555 nonclinical and clinical participants. The SPQ-15 demonstrated good internal consistency and correlated highly with the full SPQ. Structural equation modeling revealed that the SPQ-15 demonstrated excellent convergent validity, with strong associations with small animal disgust and other phobic symptoms. Supportive evidence was also found for divergent validity in relation to panic-related symptoms. The SPQ-15 was uniquely predictive of avoidance behavior and fear and disgust responding towards spiders in nonclinical, analogue, and treatment-seeking samples in Studies 2, 3, and 4. Lastly, in Study 5, the SPQ-15 was sensitive to the effects of exposure-based treatment. These findings suggest that the SPQ-15 has considerable strengths, including decreased assessment and scoring time while retaining high reliability, validity, and sensitivity.

Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.

This paper provides an introduction to two commonly used item response theory (IRT) models (the two-parameter logistic model and the graded response model). Throughout the paper, the Need for Cognition Scale (NCS) is used to help illustrate different features of the IRT model. After introducing the IRT models, I explore the assumptions these models make as well as ways to assess the extent to which those assumptions are plausible. Next, I describe how adopting an IRT approach to measurement can change how one thinks about scoring, score precision, and scale construction. I briefly introduce the advanced topics of differential item functioning and computerized adaptive testing before concluding with a summary of what was learned about IRT generally, and the NCS specifically.
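For readers unfamiliar with the two models named above, the following minimal sketch shows their standard textbook response functions. It is not taken from the paper: the function names and the parameter values in the usage lines are hypothetical, with `a` denoting discrimination, `b` difficulty, and `thresholds` the ordered category boundaries.

```python
import math

def two_pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing an item
    with discrimination a and difficulty b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def graded_response(theta, a, thresholds):
    """Graded response model: probabilities of each ordered category.

    thresholds must be in ascending order; K - 1 thresholds yield
    K category probabilities.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest
    # category or above) and 0 (above the highest category).
    cumulative = [1.0] + [two_pl(theta, a, b) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical item parameters, for illustration only.
p_endorse = two_pl(theta=0.0, a=1.0, b=0.0)
category_probs = graded_response(theta=0.3, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

The graded response model reuses the two-parameter logistic curve for each cumulative boundary, which is why a single discrimination parameter is shared across all categories of an item.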

The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants’ tendency to respond to an item was affected by the presence of a grid and variations of answer options. The results of an item response theory analysis were consistent with the hypothesized cognitive processes in alternative item formats. The findings suggest that seemingly subtle variations of item design could substantially affect test-taker cognition and psychometric outcomes, emphasizing the need for investigating item format effects at a fine-grained level.

Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in Appl. Psychol. Meas. 21:25–36, 1997; C.R. Rao and S. Sinharay (Eds), Handbook of Statistics, vol. 26, pp. 607–642, North-Holland, Amsterdam, 2007; Beguin & Glas in Psychometrika, 66:471–488, 2001). A MIRT model is fitted using a stabilized Newton–Raphson algorithm (Haberman in The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974; Sociol. Methodol. 18:193–211, 1988) with adaptive Gauss–Hermite quadrature (Haberman, von Davier, & Lee in ETS Research Rep. No. RR-08-45, ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in J. Educ. Behav. Stat. 33:204–229, 2008; Haberman, Sinharay, & Puhan in Br. J. Math. Stat. Psychol. 62:79–95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory.

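Item analysis of the Chinese Soldier Personality Questionnaire using item response theory
Item response theory (IRT) was used to conduct an item analysis of the Chinese Soldier Personality Questionnaire (CSPQ). The computer-administered CSPQ was given to 100,523 young men of enlistment age. From these, 2,676 respondents whose standard scores were below 70 on every dimension were randomly sampled as the qualified group; 274 respondents who scored above 70 on any dimension and were judged unqualified in interviews with professionals formed the unqualified group; and 221 age-matched male patients with schizophrenia in remission, drawn from psychiatric hospitals, formed the psychiatric group and also completed the CSPQ. The data were analyzed with the IRT-based two-parameter logistic model. Respondents' ability estimates and standard scores were significantly correlated both before and after deleting items whose discrimination parameters fell outside the interval (0.30, 4.00). In the IRT analysis, the curves for the psychiatric group's test scores closely matched those of the unqualified group. The results indicate that, at essentially the same measurement precision, applying IRT can reduce the number of items administered, improve testing efficiency, and, to some extent, distinguish respondents' trait levels more precisely.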

Despite recent technological advances in mass survey administration, surveys seeking to measure personality traits still employ procedures identical to the traditional paper-and-pencil scales, in which every respondent is asked exactly the same items, regardless of his or her trait level. We present an empirical application of personality measurement in which the number and sequence of scale items are tailored to the trait level of each respondent and show that the total number of questions asked of each respondent could be reduced substantially, with a measurable and controllable increase in the standard error of measurement.
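One common way to tailor item sequence to a respondent is to administer, at each step, the unused item with the greatest Fisher information at the current trait estimate. The sketch below assumes a two-parameter logistic model and is not necessarily the procedure used in the study; the function names and the three-item bank are hypothetical.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a two-parameter logistic item at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta, items, administered):
    """Index of the not-yet-administered item that is most informative
    at the current trait estimate theta."""
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))

def standard_error(theta, items, administered):
    """Approximate standard error of measurement: one over the square
    root of the total information of the administered items."""
    total = sum(item_information(theta, *items[i]) for i in administered)
    return 1.0 / math.sqrt(total)

# Hypothetical item bank of (discrimination, difficulty) pairs.
bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]
first = next_item(0.0, bank, administered=set())
second = next_item(1.5, bank, administered={first})
se_after_first = standard_error(0.0, bank, administered={first})
```

Stopping once `standard_error` falls below a chosen threshold is what makes the loss of precision from asking fewer questions controllable.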


In an attempt to measure understudied dimensions of spirituality, recent efforts have focused on the transcendent dimension of spirituality. The Spiritual Transcendence Index (STI) was developed to assess a perceived experience of the sacred that affects one’s ability to transcend life’s difficulties. The main focus of the current study was to investigate the psychometric properties of the STI by utilizing the microscopic item-level examination tools unique to item response theory (IRT), as well as its scale-level exploration devices for psychometric properties of an assessment measure. IRT analyses were conducted to investigate the STI’s psychometric properties across samples (N = 712), including how well the measure assesses the latent construct, spiritual transcendence, from the low to high range of the construct. The findings confirm that the 8-item index is a single factor that assesses the latent construct, spiritual transcendence. Instead of the original 6-category version, these findings support a 4-category response version; the 3 categories of disagreement may be collapsed into a single category. These findings not only inform the refinement of the STI but also highlight an important psychometric approach for the refinement of spirituality/religiousness measures, especially those with ceiling effects.

The Harter Self-Perception Profile for Children (SPPC) is one of the most commonly used measures of childhood self-esteem, yet there is little research assessing the psychometric properties of the SPPC for use with an African American population. A sample of 92 African American adolescent females (M age = 12.33) was administered the SPPC in order to assess its suitability for this population in three ways. First, an exploratory factor analysis demonstrated complex components without any factors being identical to the normative factors. The greatest differences were within the behavior and scholastic subscales which had items cross loading on three different factors. Second, the SPPC demonstrated only moderate internal reliability with subscale alpha coefficients ranging from .71 to .82. Third, a comparison with the Rosenberg Self-esteem Scale provided evidence of poor convergent validity. These results raise questions about the validity of the SPPC for use with African American adolescent females.

This article presents further evidence for the psychometric qualities of the Self-Perception Profile for Children (SPPC), a widely used questionnaire for assessing self-esteem in youths. The SPPC was administered to a large sample of Dutch school children (N=1143) in order to study its factor structure, reliability (internal consistency and test–retest stability), and validity. Results showed that the hypothesized factor structure of the SPPC representing five specific domains of self-esteem (i.e. scholastic competence, social acceptance, athletic competence, physical appearance, and behavioral conduct) provided a reasonable fit for the data. Furthermore, the reliability of the scale appeared to be satisfactory with good internal consistency and test–retest stability. Finally, evidence was also obtained for the validity of the SPPC. More specifically, the scale correlated in a theoretically meaningful way with child-, parent-, and teacher-reports of psychopathology and personality. Altogether, the current findings confirm the notion that the SPPC is a reliable and valid self-report measure for assessing children's self-esteem.

The purpose of this paper is to introduce a new method for fitting item response theory models with the latent population distribution estimated from the data using splines. A spline-based density estimation system provides a flexible alternative to existing procedures that use a normal distribution, or a different functional form, for the population distribution. A simulation study shows that the new procedure is feasible in practice, and that when the latent distribution is not well approximated as normal, two-parameter logistic (2PL) item parameter estimates and expected a posteriori scores (EAPs) can be improved over what they would be with the normal model. An example with real data compares the new method and the extant empirical histogram approach.

Creativity is increasingly identified as a key educational outcome at the local, regional, and national levels in several countries. Yet one key issue about the nature of creativity remains controversial: whether creativity is domain specific or domain general. Resolving this issue would significantly impact the way creativity is identified, nurtured, and assessed in our schools. Three hundred fifty-nine undergraduate and graduate students completed measures that assessed their creative achievements in 6 distinct domains. Results based on item response theory models suggested that creativity was domain general, rather than domain specific, whereas part of the evidence provided by the classical test theory models seemed to favor the domain-specific view. These findings have important implications for researchers and practitioners who aim to assess and promote creativity in schools.

Worthington conceptualized a model of religiosity assessment. The dimensions of the model include Religious Norms, Religious Doctrine, and Authority of Leaders. A 10-item scale for Islamic religious assessment was constructed and administered in Kazakhstan and Kyrgyzstan. First-order factor analysis conducted on the 10 items of the religiosity scale revealed a factorial structure corresponding to Worthington’s model. A second-order factor analysis supported a single underlying latent trait. Two-parameter logistic item response theory models were fit to responses collected in Kazakhstan and Kyrgyzstan. Results supported the psychometric soundness of the instrument. The items on the scale discriminated very well between populations of high and low religious commitment. The study offers a short, practical scale for assessment of commitment to Islam in Central Asian countries.

The componential structure of synonym tasks is investigated using confirmatory multidimensional two-parameter IRT models. It was hypothesized that an open synonym task is decomposable into generating synonym candidates and evaluating these candidate words with respect to their synonymy with the stimulus word. Two subtasks were constructed to identify these two components. Different confirmatory models were estimated both with TESTMAP and with NOHARM. The componential hypothesis was supported, but it was found that the generation subtask also involved some evaluation and that generation and evaluation were highly correlated.


Item response theory (IRT) was applied to evaluate the psychometric properties of the Spiritual Assessment Inventory (SAI; Hall & Edwards, 1996, 2002). The SAI is a 49-item self-report questionnaire designed to assess five aspects of spirituality: Awareness of God, Disappointment (with God), Grandiosity (excessive self-importance), Realistic Acceptance (of God), and Instability (in one's relationship to God). IRT analysis revealed that for several scales: (a) two or three items per scale carry the psychometric workload and (b) measurement precision is peaked for all five scales, such that one end of the scale, and not the other, is measured precisely. We considered how sample homogeneity and the possible quasi-continuous nature of the SAI constructs may have affected our results and, in light of this, made suggestions for SAI revisions, as well as for measuring spirituality, in general.

Wiberg, Marie; Ramsay, James O.; Li, Juan. Psychometrika, 2019, 84(1): 310-322
The aim of this paper is to discuss nonparametric item response theory scores in terms of optimal scores as an alternative to parametric item response theory scores and sum scores....

The Adult Attachment Ratings (AAR) include 3 scales for anxious, ambivalent attachment (excessive dependency, interpersonal ambivalence, and compulsive care-giving), 3 for avoidant attachment (rigid self-control, defensive separation, and emotional detachment), and 1 for secure attachment. The scales include items (ranging from 6–16 in their original form) scored by raters using a 3-point format (0 = absent, 1 = present, and 2 = strongly present) and summed to produce a total score. Item response theory (IRT) analyses were conducted with data from 414 participants recruited from psychiatric outpatient, medical, and community settings to identify the most informative items from each scale. The IRT results allowed us to shorten the scales to 5-item versions that are more precise and easier to rate because of their brevity. In general, the effective range of measurement for the scales was 0 to +2 SDs for each of the attachment constructs; that is, from average to high levels of attachment problems. Evidence for convergent and discriminant validity of the scales was investigated by comparing them with the Experiences of Close Relationships–Revised (ECR–R) scale and the Kobak Attachment Q-sort. The best consensus among self-reports on the ECR–R, informant ratings on the ECR–R, and expert judgments on the Q-sort and the AAR emerged for anxious, ambivalent attachment. Given the good psychometric characteristics of the scale for secure attachment, however, this measure alone might provide a simple alternative to more elaborate procedures for some measurement purposes. Conversion tables are provided for the 7 scales to facilitate transformation from raw scores to IRT-calibrated (theta) scores.
