Similar literature
 20 similar documents found; search took 31 ms
1.
Use of subject scores as manifest variables to assess the relationship between latent variables produces attenuated estimates. This has been demonstrated for raw scores from classical test theory (CTT) and factor scores derived from factor analysis. Conclusions on scores have not been sufficiently extended to item response theory (IRT) theta estimates, which are still recommended for estimation of relationships between latent variables. This is because IRT estimates appear to have preferable properties compared to CTT, while structural equation modeling (SEM) is often advised as an alternative to scores for estimation of the relationship between latent variables. The present research evaluates the consequences of using subject scores as manifest variables in regression models to test the relationship between latent variables. Raw scores and three methods for obtaining theta estimates were used and compared to latent variable SEM modeling. A Monte Carlo study was designed by manipulating sample size, number of items, type of test, and magnitude of the correlation between latent variables. Results show that, despite the advantage of IRT models in other areas, estimates of the relationship between latent variables are always more accurate when SEM models are used. Recommendations are offered for applied researchers.
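
The attenuation described in this abstract can be illustrated with the classical test theory correction formula, under which the expected correlation between two observed scores equals the latent correlation multiplied by the square root of the product of the two score reliabilities. The sketch below only illustrates that formula; it is not the Monte Carlo design of the study, and the function names and example reliabilities are hypothetical.

```python
import math


def attenuated_correlation(latent_r, rel_x, rel_y):
    """Expected observed-score correlation under CTT (illustrative helper).

    latent_r: correlation between the two latent variables
    rel_x, rel_y: reliabilities of the two observed scores, each in (0, 1]
    """
    return latent_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Invert the formula above to estimate the latent correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical example: latent correlation 0.5, both scores with reliability 0.8.
observed = attenuated_correlation(0.5, 0.8, 0.8)
recovered = disattenuated_correlation(observed, 0.8, 0.8)
print(observed, recovered)
```

With reliabilities below 1, the observed correlation is always smaller in magnitude than the latent one, which is the bias a latent variable model such as SEM is meant to avoid.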

2.
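Taking a cognitive ability test for 4- to 5-year-old children as an example, this study explored how to analyze the measurement invariance of longitudinal data within the IRT framework. The analysis used a between-item multidimensional item response theory (MIRT) model and a within-item multidimensional two-tier model. The participants were 882 children aged 48 months drawn from across the country, and the instrument was a self-developed cognitive ability test for 4- to 5-year-old children. Test-level and item-level analyses showed that: (1) the proposed method for analyzing the measurement invariance of longitudinal data is reasonable and effective; (2) the test satisfies partial measurement invariance across the two time points, and its latent structure is stable; (3) both the discrimination and difficulty parameters of the "orientation" item changed, and the difficulty parameters of four other items drifted; (4) children's cognitive ability developed rapidly overall between ages 4 and 5, with significant growth in ability.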

3.
This study investigated the equivalence of different types of informants, such as children (or early adolescents) and parents, in evaluating child externalizing and internalizing problems. We applied a polytomous item response theory (IRT) model to the Strengths and Difficulties Questionnaire (SDQ). We obtained responses to three subscales (Conduct Problems, Hyperactivity/Inattention, and Emotional Symptoms) from 541 elementary school students aged 10–12 years, fathers for 233 students, mothers for 275 students, and the homeroom teachers for 524 students. Expected values for individual items, calculated from the discrimination and threshold parameters, were compared among students, fathers, and mothers to investigate differential item functioning (DIF), or differential informant functioning. Assessments of both externalizing and internalizing problems were mostly equivalent between fathers and mothers, and most items for externalizing problems functioned equally between students and parents, whereas items for internalizing problems showed DIF between them. The IRT analysis also showed that the intervals between response categories varied across items, particularly for the conduct problems items “fight” and “steal,” and that positively worded items had extremely low thresholds.

4.
This study demonstrated the reliability and factor structure of the Medical Outcomes Study Short-Form Health Survey (SF-36) among older Americans with traumatic brain injury (TBI), and evaluated the effects of injury severity and race on the SF-36's items and latent domains. A representative sample of 654 older, racially diverse patients with TBI was selected from the South Carolina Traumatic Brain Injury Follow-up Registry. Reliability and factor structure of the SF-36 were evaluated using Cronbach’s alpha and confirmatory factor analysis (CFA). Multiple-indicator multiple-causes (MIMIC) models were used to study the effects of injury severity and race on items (differential item functioning, DIF) and latent domains (population heterogeneity) of the SF-36. The SF-36 was reliable and its current eight-factor structure was confirmed. While TBI severity did not affect latent domain scores of the SF-36, race did. Blacks had higher vitality and lower role-emotional scores than whites. The measurement model was invariant to injury severity and race (free of DIF), and DIF did not contribute to the differences in vitality and role-emotional scores between black and white older TBI patients. The SF-36 was valid for measuring quality of life (QoL) after TBI in a racially diverse elderly population. Blacks tend to assert strong coping behaviors in the presence of physical stress while admitting low performance due to emotional stress. In QoL research, where the primary outcomes are usually composite scores from instruments, MIMIC models have advantages over conventional multivariable regression models in testing the validity of the instruments and assessing covariate effects on the latent traits of instruments while controlling for DIF effects.
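
The reliability index named in this abstract, Cronbach's alpha, follows a standard formula: with k items, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The sketch below is a minimal illustration of that formula on made-up data; the function name and the toy responses are hypothetical and are not taken from the SF-36 study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): four respondents, three items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
print(cronbach_alpha(responses))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances; it says nothing by itself about factor structure, which is why the abstract pairs it with CFA.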

5.
A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of few studies that specifically compared the performance of full-information maximum likelihood (FIML) estimation versus robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when there were one or multiple skewed latent trait distributions. More interestingly, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate the bias, whereas biases of boundary parameters increased as more nonnormal factors were added. The item parameter standard errors recovered well with both estimation algorithms regardless of the number of nonnormal dimensions.

6.
The positive affect and negative affect schedule (PANAS) is a popular measure of positive (PA) and negative affectivity (NA). Developed and validated in Western contexts, the 20‐item scale has been frequently administered to respondents from Asian countries with the assumption of cross‐cultural measurement invariance. We examine this assumption via a rigorous multigroup confirmatory factor analysis, which allows us to assess between‐group differences in both strength of scale item‐to‐latent factor relationship (metric invariance test) and mean of each scale item (scalar invariance test), on a large sample of 1,065 respondents recruited from Singapore (Asian sample) and the United States (Western sample). We found that two items assessing PA (“excited” and “proud”) and three items assessing NA (“guilty,” “hostile,” and “ashamed”) exhibited metric noninvariance whereas 11 of the remaining metric invariant items exhibited scalar noninvariance, suggesting that the PA and NA constructs differ from what the PANAS is expected to measure for Asian respondents. Our findings serve as a cautionary note to researchers who intend to administer the PANAS in future studies as well as to researchers interpreting the results of past studies involving respondents from Asian countries.

7.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact.
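
The decision rule in this abstract (a likelihood ratio test per item against a free baseline, judged against a Bonferroni-corrected critical p value) can be sketched as follows. This is only an illustration, not the authors' implementation. It assumes that each test constrains exactly two parameters (one loading and one intercept), so the statistic is referred to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-x/2); polytomous items with more intercepts would need more degrees of freedom. The function names and log-likelihood values are hypothetical.

```python
import math


def chi2_sf_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


def flag_dif_items(loglik_free, logliks_constrained, alpha=0.05):
    """Flag items whose equality constraints significantly worsen model fit.

    loglik_free: log-likelihood of the free-baseline model
    logliks_constrained: one log-likelihood per studied item, from the model
        in which that item's loading and intercept are constrained equal
        across groups (assumed df = 2 per test)
    """
    critical_p = alpha / len(logliks_constrained)  # Bonferroni correction
    flags = []
    for loglik_constrained in logliks_constrained:
        lr_statistic = max(2.0 * (loglik_free - loglik_constrained), 0.0)
        flags.append(chi2_sf_df2(lr_statistic) < critical_p)
    return flags


# Hypothetical log-likelihoods for three studied items.
print(flag_dif_items(-1000.0, [-1000.25, -1010.0, -1001.5]))
```

Dividing alpha by the number of tests keeps the familywise error rate at or below alpha when many items are tested against the same baseline.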

8.
This article proposes a general mixture item response theory (IRT) framework that allows for classes of persons to differ with respect to the type of processes underlying the item responses. Through the use of mixture models, nonnested IRT models with different structures can be estimated for different classes, and class membership can be estimated for each person in the sample. If researchers are able to provide competing measurement models, this mixture IRT framework may help them deal with some violations of measurement invariance. To illustrate this approach, we consider a two-class mixture model, where a person’s responses to Likert-scale items containing a neutral middle category are either modeled using a generalized partial credit model, or through an IRTree model. In the first model, the middle category (“neither agree nor disagree”) is taken to be qualitatively similar to the other categories, and is taken to provide information about the person’s endorsement. In the second model, the middle category is taken to be qualitatively different and to reflect a nonresponse choice, which is modeled using an additional latent variable that captures a person’s willingness to respond. The mixture model is studied using simulation studies and is applied to an empirical example.

9.
10.
This article describes a generalized longitudinal mixture item response theory (IRT) model that allows for detecting latent group differences in item response data obtained from electronic learning (e-learning) environments or other learning environments that result in large numbers of items. The described model can be viewed as a combination of a longitudinal Rasch model, a mixture Rasch model, and a random-item IRT model, and it includes some features of the explanatory IRT modeling framework. The model assumes the possible presence of latent classes in item response patterns, due to initial person-level differences before learning takes place, to latent class-specific learning trajectories, or to a combination of both. Moreover, it allows for differential item functioning over the classes. A Bayesian model estimation procedure is described, and the results of a simulation study are presented that indicate that the parameters are recovered well, particularly for conditions with large item sample sizes. The model is also illustrated with an empirical sample data set from a Web-based e-learning environment.

11.
Items bundles     
An item bundle is a small group of multiple choice items that share a common reading passage or graph, or a small group of matching items that share distractors. Item bundles are easily identified by paging through a copy of a test. Bundled items may violate the latent conditional independence assumption of unidimensional item response theory (IRT), but such a violation would not typically suggest the existence of a new fundamental human ability to read one specific reading passage or to interpret one specific graph. It is important, therefore, to have theoretical concepts and empirical checks that distinguish between, on the one hand, anticipated violations of latent conditional independence within item bundles, and, on the other hand, violations that cannot be attributed to idiosyncratic features of test format and instead suggest departures from unidimensionality. To this end, two theorems on unidimensional IRT are extended to describe observable item response distributions when there is conditional independence between but not necessarily within item bundles. The author is grateful to Ivo Molenaar and the referees for many helpful suggestions, and to D. Thayer for assistance with computing.

12.
The paper examines the psychometric properties of the leadership practices inventory (LPI) in the framework of item response theory (IRT). The LPI assesses five dimensions (i.e. leadership practices) of transformational leadership and consists of 30 items. IRT is a model‐based theory that relates the characteristics of questionnaire items (item parameters) and characteristics of individuals (latent variables) to the probability of choosing each of the response categories. The theory does not assume that the instrument is equally reliable for all levels of the latent variable examined. Samejima's graded response model was used to estimate LPI item characteristics, such as item difficulty and item discrimination power. The results show that some items are redundant in the sense they contribute little to the overall precision of the instrument. Moreover, the LPI seems to be most precise and reliable for respondents with low to medium leadership competence, whereas it becomes increasingly unreliable for high‐quality leaders. These findings suggest that the LPI is best used for training and development purposes, but not for leader selection purposes.
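
Samejima's graded response model, named in this abstract, gives the probability of each ordered response category as the difference between adjacent cumulative logistic curves: P(X ≥ k | θ) = 1 / (1 + exp(−a(θ − b_k))). The sketch below illustrates that calculation for one item. The parameter values are hypothetical, not LPI estimates, and the logistic metric without a scaling constant is an assumption.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait value of the respondent
    discrimination: item discrimination parameter a (> 0)
    thresholds: increasing threshold parameters b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per ordered category.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the drop between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical three-category item evaluated at theta = 0.
probs = grm_category_probs(0.0, 1.0, [-1.0, 1.0])
print(probs, sum(probs))
```

Because the thresholds must be ordered, every category probability is non-negative and the probabilities sum to 1 at any value of theta.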

13.
Bolt DM, Hare RD, Neumann CS. Assessment, 2007, 14(1): 44-56
David Cooke and colleagues have published a series of item response theory (IRT) studies investigating the equivalence of the Psychopathy Checklist-Revised (PCL-R) for European versus North American (NA) male criminal offenders. They have consistently concluded that PCL-R scores are not equivalent, with European offenders receiving scores up to five points lower than those in NA when matched according to the latent trait. In this article, the authors critique the Cooke et al. analyses and demonstrate how their anchor item selection method is responsible for their final conclusions concerning the apparent lack of equivalence. The authors provide a competing IRT analysis using an iterative purification strategy for anchor item selection and show how this more justifiable approach leads to very different conclusions regarding the equivalence of the PCL-R. More generally, it is argued that strong interpretations of IRT analyses in the presence of uncorroborated anchor items can be highly misleading when evaluating score metric equivalence.

14.
The application of item response theory (IRT) models requires the identification of the data's dimensionality. A popular method for determining the number of latent dimensions is the factor analysis of a correlation matrix. Unlike factor analysis, which is based on a linear model, IRT assumes a nonlinear relationship between item performance and ability. Because multidimensional scaling (MDS) assumes a monotonic relationship, this method may be useful for the assessment of a data set's dimensionality for use with IRT models. This study compared MDS, exploratory and confirmatory factor analysis (EFA and CFA, respectively) in the assessment of the dimensionality of data sets which had been generated to be either one- or two-dimensional. In addition, the data sets differed in the degree of interdimensional correlation and in the number of items defining a dimension. Results showed that MDS and CFA were able to correctly identify the number of latent dimensions for all data sets. In general, EFA was able to correctly identify the data's dimensionality, except for data whose interdimensional correlation was high.

15.
16.
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

17.
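Development of a vocabulary comprehension ability scale for junior middle school (初中词汇理解能力量表的编制)   Total citations: 4, self-citations: 2, citations by others: 2
Cao Yiwei (曹亦薇). Acta Psychologica Sinica (《心理学报》), 1999, 32(2): 215-221
Using item response theory, vocabulary comprehension tests were developed for each grade of junior middle school, comprising 143 multiple-choice vocabulary items in total. After repeated pilot testing and a large-scale formal administration, the scales of the three tests were shown to fit the 2PL model: items whose item characteristic curves fit well accounted for more than 90% of all items, and the unidimensionality of ability was also confirmed. After equating, the mean discrimination values for the three grades were 0.61 (grade 7), 0.59 (grade 8), and 0.55 (grade 9), and the mean difficulty values were -1.61, -1.30, and -0.56, respectively.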
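
The 2PL model referred to in this abstract gives the probability of a correct response as a logistic function of ability: P(θ) = 1 / (1 + exp(−a(θ − b))), where a is the item discrimination and b the item difficulty. The sketch below evaluates that item characteristic curve at the grade-level mean parameters reported above. Treating a mean discrimination and a mean difficulty as the parameters of a single "average" item is purely illustrative, and the logistic metric without the 1.7 scaling constant is an assumption, since the abstract does not state which metric was used.

```python
import math


def icc_2pl(theta, discrimination, difficulty):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Mean (discrimination, difficulty) per grade as reported in the abstract.
grade_means = {
    "grade 7": (0.61, -1.61),
    "grade 8": (0.59, -1.30),
    "grade 9": (0.55, -0.56),
}

# Probability that a respondent of average ability (theta = 0) answers
# an illustrative "average" item of each grade's test correctly.
for grade, (a, b) in grade_means.items():
    print(grade, icc_2pl(0.0, a, b))
```

At θ = b the curve passes through 0.5, so the negative mean difficulties indicate that an average item in each test is answered correctly more often than not by a respondent at θ = 0.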

18.
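Ma Jie (马洁), Liu Hongyun (刘红云). Journal of Psychological Science (《心理科学》), 2018, (6): 1374-1381
Using empirical data from a senior high school English reading test, this study compared the two-parameter logistic model (2PL-IRT) with two-parameter logistic testlet models (2PL-TRT) that include different numbers of testlets, in order to examine how the number of testlets affects parameter estimation and model fit. The results showed that: (1) the 2PL-IRT model produced relatively large bias in ability estimates for examinees with ability between -1.50 and 0.50; (2) treating testlets with a testlet effect greater than 0.50 as locally independent items in the model led to underestimation of the discrimination parameters of some items and overestimation of the difficulty parameters of most items; (3) the larger the testlet effect, the larger the bias in item parameter estimates when the testlet was treated as locally independent items.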

19.
Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

20.
Likert-type scales are commonly used when assessing attitudes, personality characteristics, and other psychological variables. This study examined the effect of varying the number of response options on the same set of 28 attitudinal items. Participants answered items using either a 4-point scale (forced choice), a 5-point scale that included a “neither” mid-point, or a 4-point scale with an option of “no opinion” presented after the item. The questionnaire also included an item asking participants what they believed the midpoint of a scale indicated. As predicted, participants’ interpretations of the midpoint varied widely, with the most common responses being “no opinion,” “don't care,” “unsure,” “neutral,” “equal/both,” and “neither.” The quantitative results showed that participants’ levels of item endorsement varied based on the response options offered. For example, “neither” was chosen more often than “no opinion” on all of the items.
