Similar literature
 20 similar articles found (search time: 31 ms)
1.
Background: Although there have been numerous studies conducted on the psychometric properties of Biggs' Learning Process Questionnaire (LPQ), these have involved the use of traditional omnibus measures of scale quality such as corrected item-total correlations, internal consistency estimates of reliability, and factor analysis. However, these omnibus measures of scale quality are sample dependent and fail to model item responses as a function of trait level. And since the item-trait relationship is typically nonlinear, traditional factor analytic methods are inappropriate. Aims: The purpose of this study was to identify a unidimensional subset of LPQ items and examine the effectiveness of these items and their options in discriminating between changes in the underlying trait level. In addition to assessing item quality, we were interested in assessing overall scale quality with non-sample-dependent measures. Method: The sample was split into two nearly equal halves, and a unidimensional subset of items was identified in one of these samples and cross-validated in the other. The nonlinear relationship between the probability of endorsing an item option and the underlying trait level was modelled using a nonparametric latent trait technique known as kernel smoothing and implemented with the program TestGraf. After item and scale quality were established, maximum likelihood estimates of participants' trait level were obtained and used to examine grade and gender differences. Results: A unidimensional subset of 16 deep and achieving items was identified. Slightly more than half of these items needed some of their options combined so that the probability of endorsing an item option as a function of increasing trait level corresponded to the ideal rank ordering of the item options. With this adjustment, scale quality as measured by the information function and standard error function was found to be good. However, no statistically significant gender differences were observed and, although statistically significant grade differences were observed, they were not substantively meaningful. Conclusions: The use of nonparametric kernel-smoothing techniques is advocated over parametric latent trait methods for the analysis of attitudinal and psychological measures involving polychotomous ordered-response categories. It is also suggested that latent trait methods are more appropriate than traditional test-based measures for studying differential item functioning both within and between cultures. Nonparametric kernel-smoothing techniques hold particular promise in identifying and understanding cross-cultural differences in student approaches to learning at both the item and scale level.
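The kernel-smoothing idea in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel and a Nadaraya-Watson estimator; it is not TestGraf's actual algorithm, and the function names, toy data, and bandwidth are hypothetical.

```python
import math

def gaussian_kernel(u):
    # Unnormalized Gaussian kernel; the constant cancels in the ratio below.
    return math.exp(-0.5 * u * u)

def smooth_option_curve(trait, chose_option, grid, bandwidth):
    """Nadaraya-Watson estimate of P(option chosen | trait) at each grid point."""
    curve = []
    for q in grid:
        weights = [gaussian_kernel((t - q) / bandwidth) for t in trait]
        total = sum(weights)
        curve.append(sum(w * y for w, y in zip(weights, chose_option)) / total)
    return curve

# Hypothetical toy data: trait estimates and 0/1 indicators for one item option.
trait = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
chose_option = [0, 0, 0, 1, 0, 1, 1, 1, 1]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = smooth_option_curve(trait, chose_option, grid, bandwidth=0.8)
```

Each value in `curve` is a locally weighted proportion of respondents choosing the option, so no parametric shape is imposed on the option characteristic curve. The bandwidth controls the trade-off between smoothness and fidelity to the data.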

2.
3.
The application of psychological measures often results in item response data that arguably are consistent with both unidimensional (a single common factor) and multidimensional latent structures (typically caused by parcels of items that tap similar content domains). As such, structural ambiguity leads to seemingly endless "confirmatory" factor analytic studies in which the research question is whether scale scores can be interpreted as reflecting variation on a single trait. An alternative to the more commonly observed unidimensional, correlated traits, or second-order representations of a measure's latent structure is a bifactor model. Bifactor structures, however, are not well understood in the personality assessment community and thus rarely are applied. To address this, herein we (a) describe issues that arise in conceptualizing and modeling multidimensionality, (b) describe exploratory (including Schmid-Leiman [Schmid & Leiman, 1957] and target bifactor rotations) and confirmatory bifactor modeling, (c) differentiate between bifactor and second-order models, and (d) suggest contexts where bifactor analysis is particularly valuable (e.g., for evaluating the plausibility of subscales, determining the extent to which scores reflect a single variable even when the data are multidimensional, and evaluating the feasibility of applying a unidimensional item response theory (IRT) measurement model). We emphasize that the determination of dimensionality is a related but distinct question from either determining the extent to which scores reflect a single individual difference variable or determining the effect of multidimensionality on IRT item parameter estimates. Indeed, we suggest that in many contexts, multidimensional data can yield interpretable scale scores and be appropriately fitted to unidimensional IRT models.
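The Schmid-Leiman transformation mentioned in this abstract can be sketched for the simplest case. The sketch assumes a standardized second-order solution in which each item loads on exactly one first-order (group) factor; the function name and loadings are hypothetical.

```python
import math

def schmid_leiman(item_loadings, item_groups, second_order):
    """Convert a simple-structure second-order solution to bifactor-style loadings.

    item_loadings: loading of each item on its first-order (group) factor
    item_groups:   index of the group factor each item belongs to
    second_order:  loading of each group factor on the general factor
    """
    general, group = [], []
    for lam, g in zip(item_loadings, item_groups):
        gamma = second_order[g]
        general.append(lam * gamma)                        # general-factor loading
        group.append(lam * math.sqrt(1.0 - gamma ** 2))    # residualized group loading
    return general, group

# Hypothetical loadings: four items, two group factors.
item_loadings = [0.7, 0.6, 0.8, 0.5]
item_groups = [0, 0, 1, 1]
second_order = [0.8, 0.6]
general, group = schmid_leiman(item_loadings, item_groups, second_order)
```

For each item, the squared general and group loadings sum to the squared first-order loading, so the transformation redistributes common variance rather than adding any.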

4.
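刘红云, 李冲, 张平平, 骆方. 《心理学报》 (Acta Psychologica Sinica), 2012, 44(8): 1124-1136
Measurement equivalence is a prerequisite for multi-group comparisons. Methods for testing it fall into two main classes: multi-group comparison based on CFA and DIF testing based on IRT. This article compares the CCFA-based DIFFTEST method with the IRT-based IRT-LR method for unidimensional tests, and DIFFTEST with the MIRT-based chi-square test (MIRT-MG) for multidimensional tests. A simulation study compared the power and Type I error of these methods, taking into account total sample size, balance of sample sizes across groups, test length, size of the threshold difference, and the correlation between dimensions. The results show that: (1) For unidimensional tests, IRT-LR is a stricter test than DIFFTEST. For multidimensional tests, when the test is long and the dimensions are highly correlated, MIRT-MG detects differences in item thresholds more readily than DIFFTEST, whereas when the test is short and the dimensions are weakly correlated, the power of DIFFTEST is slightly higher than that of MIRT-MG. (2) As the threshold difference increases, the power of DIFFTEST, IRT-LR, and MIRT-MG all increases; when the threshold difference is medium or large, all three methods effectively detect non-equivalence of thresholds. (3) As total sample size increases, the power of all three methods increases; with total sample size held constant, power is higher when the two groups are balanced than when they are unbalanced. (4) With the number of non-equivalent items held constant, the power of DIFFTEST decreases as the test gets longer, whereas the power of IRT-LR and MIRT-MG increases. (5) The mean Type I error rate of DIFFTEST is close to the nominal value of 0.05, whereas the mean Type I error rates of IRT-LR and MIRT-MG are far below 0.05.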

5.
In this article, we offer some suggestions as to why tetrads and pentads have become the dominant formats for administering multidimensional forced choice (MFC) items but, in turn, raise questions regarding the underlying psychometric model and means of addressing item quality and scoring accuracy. We then focus our attention on multidimensional pairwise preference (MDPP) items and present an item response theory–based approach to constructing and modeling MDPP responses directly, assessing information at the item and scale levels, and a way of computing standard errors for trait scores and estimating scale reliability. To demonstrate the viability of this method for applied use, we show the correspondence between MDPP scores derived from direct modeling and those obtained using single statement and unidimensional pairwise preference measures administered in a laboratory setting. Trait score correlations and criterion-related validities are compared across testing formats and rating sources (i.e., self and other), and the usefulness of our model-based approach is further demonstrated by some illustrative results involving computerized adaptive tests (CAT).

6.
Classical methods for detecting outliers deal with continuous variables. These methods are not readily applicable to categorical data, such as incorrect/correct scores (0/1) and ordered rating scale scores (e.g., 0, …, 4) typical of multi-item tests and questionnaires. This study proposes two definitions of outlier scores suited for categorical data. One definition combines information on outliers from scores on all the items in the test, and the other definition combines information from all pairs of item scores. For a particular item-score vector, an outlier score expresses the degree to which the item-score vector is unusual. For ten real-data sets, the distribution of each of the two outlier scores is inspected by means of Tukey's fences and the extreme studentized deviate procedure. It is investigated whether the outliers that are identified are influential with respect to the statistical analysis performed on these data. Recommendations are given for outlier identification and accommodation in test and questionnaire data.
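Tukey's fences, one of the two inspection rules named in this abstract, can be sketched as follows. The outlier scores here are hypothetical numbers, not the study's definitions, and the quartiles use linear interpolation, which is only one of several common conventions.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def tukey_outliers(scores, k=1.5):
    """Return indices of scores outside [Q1 - k*IQR, Q3 + k*IQR]."""
    ordered = sorted(scores)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]

# Hypothetical outlier scores, one per respondent.
outlier_scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 15]
flagged = tukey_outliers(outlier_scores)
```

With these toy values only the last respondent falls outside the fences. A study interested solely in unusually high outlier scores might apply the upper fence alone.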

7.
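顾红磊, 温忠粦. 《心理科学》 (Journal of Psychological Science), 2014, 37(5): 1245-1252
The item wording effect refers to systematic variance, unrelated to the content being measured, that arises from differences in how items are worded. Statistically, an item wording effect model is a type of bifactor model. Using the Core Self-Evaluations Scale (CSES) as an example, this study examines how item wording effects influence the reliability and validity of personality tests. The CSES, a life satisfaction scale, and a positive and negative affect scale were administered to 340 members of the "ant tribe" (蚁族). The results show that, beyond the core self-evaluation trait, the CSES also contains an item wording effect factor for reverse-worded items. Ignoring the item wording effect has important consequences for the homogeneity reliability and criterion-related validity of the CSES: it overestimates homogeneity reliability, underestimates the positive correlations of core self-evaluations with life satisfaction and positive affect, and overestimates the negative correlation of core self-evaluations with negative affect.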

8.
The aim of the current study was to reduce the number of items in the 48-item hypomanic personality scale (HPS) and determine whether a unidimensional scale of the hypomanic trait could be derived. Previously collected HPS data from university students (n = 318) were applied to the Rasch model (one-parameter item response theory). Overall scale and individual item fit statistics were used to judge fit to the model, and item maps were employed to determine coverage of the trait. Cronbach's alpha and correlations with other questionnaires pre- and post-item reduction were evaluated. Rasch analysis indicated that the original HPS was not unidimensional and had significant redundancy and differential item functioning by age and gender. An iterative process of item reduction produced a 20-item HPS (HPS-20) that retained the concepts of the original HPS and had excellent fit to the Rasch model (χ² p = 0.27). Unidimensionality of the HPS-20 was confirmed. The traditional psychometric properties of the HPS-20 and coverage of the underlying hypomanic construct were similar to the original. It was possible to derive a unidimensional measure of the hypomanic trait. Further use of the HPS-20 is encouraged as it may increase understanding of the risk factors for affective disorders.
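Cronbach's alpha, the traditional reliability estimate this abstract compares before and after item reduction, can be computed as in the sketch below. The response matrix is hypothetical and the function name is an assumption; sample variances (n − 1 denominator) are used throughout.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-person lists of item scores."""
    k = len(responses[0])
    item_vars = [variance([person[i] for person in responses]) for i in range(k)]
    total_var = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical true/false (1/0) answers of six people to four items.
responses = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
alpha = cronbach_alpha(responses)
```

Alpha depends on both the number of items and their intercorrelations, so a shortened scale can keep a similar alpha only if the retained items remain strongly related.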

9.
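汪文义, 宋丽红, 丁树良. 《心理学报》 (Acta Psychologica Sinica), 2016, 48(12): 1612-1624
This article introduces indices of classification accuracy and classification consistency under multidimensional item response theory models, uses Monte Carlo methods to compute the indices under complex decision rules, and proves mathematically that two types of estimators of the classification accuracy index converge in probability to the same true value under a uniform prior and the same decision rule. The results show that: the classification accuracy index evaluates the accuracy of classification results fairly accurately; the classification consistency index evaluates the test-retest consistency of classification results well; under certain conditions, indices based on the ability scale outperform indices based on raw total scores; estimation precision remains fairly good even as the number of test dimensions increases; and classification accuracy and classification consistency are higher as test length and the correlation between dimensions increase. The indices can be used to evaluate classification reliability and validity under a variety of decision rules in criterion-referenced tests or computerized classification tests.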

10.
The scientific study of happiness requires accurate measurement of the construct that satisfies assumptions of parametric statistics and thus allows both researchers and clinicians to make reliable and valid comparisons with the relevant data sources. The 29-item Oxford Happiness Questionnaire (OHQ) is a widely used scale for assessment of personal happiness. While its psychometric properties are acknowledged to be acceptable, it presents scores on an ordinal scale and may thus not discriminate precisely between individual happiness levels. The current study aimed to improve precision and item functioning of the OHQ by applying Rasch analysis to a sample of 281 participants. To correct disordered thresholds, items were rescored in a uniform fashion. Four items displayed poor relationships with the latent trait of happiness and were removed. Best fit to the unidimensional Rasch model was achieved after locally dependent items were combined into subtests and adjusted for personal differences. Using the ordinal-to-interval conversion tables published here, ordinal OHQ scores can now be transformed to interval level data and thus subjected to parametric statistical analysis without violating fundamental assumptions. The precision of the instrument can be improved significantly by these minor modifications without the need to modify the original response format.

11.
A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of few studies that specifically compared the performance of full-information maximum likelihood (FIML) estimation versus robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when there were one or multiple skewed latent trait distributions. More interestingly, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate the bias, whereas biases of boundary parameters increased as more nonnormal factors were added. The item parameter standard errors recovered well with both estimation algorithms regardless of the number of nonnormal dimensions.

12.
We explore the justification and formulation of a four-parameter item response theory model (4PM) and employ a Bayesian approach to successfully recover parameter estimates for items and respondents. For data generated using a 4PM item response model, overall fit is improved when using the 4PM rather than the 3PM or the 2PM. Furthermore, although estimated trait scores under the various models correlate almost perfectly, inferences at the high and low ends of the trait continuum are compromised, with poorer coverage of the confidence intervals when the wrong model is used. We also show in an empirical example that the 4PM can yield new insights into the properties of a widely used delinquency scale. We discuss the implications for building appropriate measurement models in education and psychology to model more accurately the underlying response process.
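For orientation, the item response function of a four-parameter logistic model can be sketched as below. The sketch assumes the commonly used parameterization with discrimination `a`, difficulty `b`, lower asymptote `c`, and upper asymptote `d`; the abstract itself does not state the formula, and the parameter values are hypothetical.

```python
import math

def four_pl(theta, a, b, c, d):
    """Probability of endorsing an item under a four-parameter logistic model."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: even very high-trait respondents endorse it only 90% of the time.
p_low = four_pl(-4.0, a=1.2, b=0.0, c=0.1, d=0.9)
p_mid = four_pl(0.0, a=1.2, b=0.0, c=0.1, d=0.9)
p_high = four_pl(4.0, a=1.2, b=0.0, c=0.1, d=0.9)
```

Setting `d = 1` gives the 3PM, and additionally setting `c = 0` gives the 2PM, which is why a misspecified model mainly distorts inferences at the extremes of the trait continuum.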

13.
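Using a character recognition test for students in compulsory education, this study combined exploratory structural equation modeling (ESEM) with two methods from nonparametric item response theory, Mokken scale analysis and DETECT analysis, to examine the dimensionality of character recognition ability. The ESEM results show that a unidimensional model of character recognition is superior to multidimensional models, and that the multidimensional solutions mainly reflect the characteristics of a difficulty dimension, that is, the effect of character frequency. The Mokken scale analysis shows that the tests for grades 1-2 and grades 3-9 tend toward the characteristics of a unidimensional scale. The DETECT analysis shows that the D values of both tests approach zero, indicating that character recognition is a unidimensional ability. Taken together, the three analyses indicate that character recognition ability is unidimensional.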

14.
This study aims to evaluate a number of procedures that have been proposed to enhance cross-cultural comparability of personality and value data. A priori procedures (anchoring vignettes and direct measures of response styles, i.e. acquiescence, extremity, midpoint responding, and social desirability), a posteriori procedures focusing on data transformations prior to analysis (ipsatization and item parcelling), and two data modelling procedures (treating data as continuous vs as ordered categories) were compared using data collected from university students in 16 countries. We found that (i) anchoring vignettes showed lack of invariance, so they were not bias-free; (ii) anchoring vignettes showed higher internal consistencies than raw scores, whereas all other correction procedures, notably ipsatization, showed lower internal consistencies; (iii) in measurement invariance testing, no procedure yielded scalar invariance; anchoring vignettes and item parcelling slightly improved comparability, response style correction did not affect it, and ipsatization resulted in lower comparability; (iv) treating Likert-scale data as categorical resulted in higher levels of comparability; (v) factor scores of scales extracted from different procedures showed similar correlational patterning; and (vi) response style correction was the only procedure that suggested improvement in external validity of country-level conscientiousness. We conclude that, although no procedure resolves all comparability issues, anchoring vignettes, parcelling, and treating data as ordered categories seem promising to alleviate incomparability. We advise caution in uncritically applying any of these procedures. Copyright © 2017 European Association of Personality Psychology
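Ipsatization, one of the a posteriori transformations compared in this abstract, can be sketched in its simplest form: centering each respondent's item scores on that respondent's own mean. The study may have used a variant (for example, also dividing by the within-person standard deviation); the function name and ratings are hypothetical.

```python
from statistics import mean

def ipsatize(person_scores):
    """Center one respondent's item scores on that respondent's own mean."""
    m = mean(person_scores)
    return [s - m for s in person_scores]

# Hypothetical Likert ratings from two respondents with different response styles.
acquiescent = [5, 5, 4, 5, 4]
moderate = [3, 3, 2, 3, 2]
ipsatized_a = ipsatize(acquiescent)
ipsatized_b = ipsatize(moderate)
```

The two respondents differ by a constant shift, so their ipsatized profiles are numerically indistinguishable. This removes a uniform response style but also any true difference in overall level, which is consistent with the reduced internal consistency and comparability reported above.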

15.
This paper calls into question traditional methods of measuring the social desirability of items and their use in scale construction. First, we make explicit that the proper focus for desirability studies of items and traits is the rated desirabilities of the alternative item responses indicating different trait levels. Second, the results from our first study show that the relation between degree of endorsement of an item and its judged desirability level is often nonlinear and varies across items such that no general model of item desirability can be adopted that will accurately represent the relations across all items, traits, and trait levels. In addition, the nature of these relationships can vary depending on whether desirability is considered in a work or general context. Third, results from a second study indicate specifically that people, when instructed to self-present in a maximally desirable manner, will choose for some attributes a moderate level of endorsement (e.g., "agree") rather than a more extreme response option (e.g., "strongly agree"). Subjects offer several different reasons for viewing the less extreme response options, which yield more moderate trait level scores, as more desirable. These reasons are linked to perceptions of the more extreme response option as being associated with negative behaviors and concerns about how others will view a more extreme response to the item. Both studies indicate that desirable responding to personality items is more complex than previously believed.

16.
A Rasch model for partial credit scoring   Total citations: 24 (self-citations: 0; citations by others: 24)
A unidimensional latent trait model for responses scored in two or more ordered categories is developed. This “Partial Credit” model is a member of the family of latent trait models which share the property of parameter separability and so permit “specifically objective” comparisons of persons and items. The model can be viewed as an extension of Andrich's Rating Scale model to situations in which ordered response alternatives are free to vary in number and structure from item to item. The difference between the parameters in this model and the “category boundaries” in Samejima's Graded Response model is demonstrated. An unconditional maximum likelihood procedure for estimating the model parameters is developed. Preparation of this paper was supported by grants from the Spencer Foundation and the National Institute for Justice. I would like to thank Professor Benjamin D. Wright of the University of Chicago for his very kind help with the various drafts of this paper.
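The category probabilities of the Partial Credit model can be sketched as follows, assuming the standard formulation in which the probability of category x is proportional to the exponential of the summed differences between the person parameter and the first x step parameters. The function name and parameter values are hypothetical.

```python
import math

def pcm_probabilities(theta, deltas):
    """Category probabilities (0..m) for one item with step parameters `deltas`."""
    cumulative = [0.0]  # category 0 has an empty sum
    for d in deltas:
        cumulative.append(cumulative[-1] + (theta - d))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category item with step parameters 0.5 and 1.0.
probs = pcm_probabilities(theta=0.5, deltas=[0.5, 1.0])
```

When the person parameter equals a step parameter, the two adjacent categories are equally likely, which is what distinguishes these step parameters from the cumulative category boundaries of a graded response model. With a single step parameter the function reduces to the dichotomous Rasch model.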

17.
The Team Role Self Perception Inventory (TRSPI) has attracted several studies critical of its psychometric properties. This research uses a large data set and employs confirmatory factor analysis on within-scale scores to examine the dimensionality and reliability of the TRSPI's scales. Data show that five of the nine scales are unidimensional and that two other scales show generally good fit to a unidimensional solution. The ‘completer-finisher’ and ‘implementer’ scales show a better fit to a bidimensional structure and would benefit from improved item wording for a small number of items. The ‘shaper’ scale would also benefit from some attention to item wording. Reliability estimates suggest that the reliability of the TRSPI's scales is better than previous estimates imply.

18.
Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch's original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.

19.
Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.

20.
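Forced-choice (FC) tests are widely used in noncognitive testing because they can control the response biases associated with traditional Likert-type methods. However, the traditional scoring of FC tests yields ipsative data, which have long been criticized as unsuitable for comparisons between individuals. In recent years, the development of several FC IRT models has allowed researchers to obtain near-normative data from FC tests, renewing the interest of researchers and practitioners in FC IRT models. First, six of the more mainstream FC IRT models are classified and introduced according to the decision model and item response model they adopt. Then, the models are compared and summarized from two perspectives: model construction rationale and parameter estimation methods. Next, applied research is reviewed in three areas: parameter invariance testing, computerized adaptive testing (CAT), and validity research. Finally, it is proposed that future research could go further in four directions: model extension, parameter invariance testing, FC CAT, and validity research.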
