20 similar documents found (search time: 15 ms)
1.
Maria Bolsinova, Benjamin Deonovic, Meirav Arieli-Attali, Burr Settles, Masato Hagiwara, Gunter Maris 《Applied Psychological Measurement》2022,46(3):219
Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. Learners' progress is monitored by having them solve items that match their level and target specific learning goals. Scaffolding and providing learners with hints are powerful tools for supporting the learning process. One way of introducing hints is to make hint use the student's choice: when learners are certain of their response, they answer without hints, but if they are uncertain or do not know how to approach the item, they can request a hint. We develop measurement models for applications where such on-demand hints are available. These models take into account that hint use may be informative of ability but may at the same time be influenced by other individual characteristics. Two modeling strategies are considered: (1) the measurement model is based on a scoring rule for ability that includes both response accuracy and hint use; (2) the choice to use hints and response accuracy conditional on this choice are modeled jointly using Item Response Tree models. The properties of the different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. A second dimension in the model accounts for individual differences in the tendency to use hints.
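The preferred scoring rule can be sketched as a small function. The 2/1/0 values below are an illustrative coding of "full", "partial", and "no" credit, not numbers taken from the article.

```python
def hint_score(correct: bool, used_hint: bool) -> int:
    """Polytomous scoring rule from the best-fitting model:
    full credit for a correct response without hints, partial credit
    for a correct response with hints, no credit for any incorrect
    response. The 2/1/0 coding is illustrative."""
    if not correct:
        return 0                      # no credit, with or without hints
    return 1 if used_hint else 2      # partial vs. full credit
```

A second latent dimension would then be needed to absorb individual differences in hint-seeking, since the score conflates accuracy and hint use by design.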
2.
Xue Zhang, Chun Wang 《The British journal of mathematical and statistical psychology》2021,74(Z1):247-274
Among current state-of-the-art estimation methods for multilevel IRT models, the two-stage divide-and-conquer strategy has practical advantages, such as a clearer definition of factors, convenience for secondary data analysis, convenience for model calibration and fit evaluation, and avoidance of improper solutions. However, various studies have shown that, under the two-stage framework, ignoring measurement error in the dependent variable in stage II leads to incorrect statistical inferences. To this end, we propose a novel method that corrects both the measurement bias and the measurement error of latent trait estimates from stage I in the stage II estimation. In this paper, the higher-order IRT (HO-IRT) model is considered as the measurement model, and a linear mixed effects model on the overall (i.e., higher-order) abilities is considered as the structural model. The performance of the proposed correction method is illustrated and compared via a simulation study and a real data example using data from the National Education Longitudinal Study of 1988 (NELS:88). Results indicate that structural parameters are recovered better after correcting for measurement bias and error.
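In generic notation (not the authors'), the two-stage setup treats the stage I ability estimate as the true ability plus estimation error, and the structural model is then fitted to the error-contaminated estimates:

```latex
\hat\theta_p = \theta_p + e_p, \qquad e_p \sim N\!\left(0,\, \mathrm{SE}_p^2\right),
\qquad
\theta_p = \mathbf{x}_p^\top \boldsymbol{\beta} + u_{j(p)} + \varepsilon_p ,
```

where $\theta_p$ is the higher-order ability of person $p$ nested in cluster $j(p)$. Ignoring $e_p$ in stage II is what biases the structural parameters $\boldsymbol\beta$; the correction uses the stage I standard errors to account for it.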
3.
Mark Hansen, Li Cai, Scott Monroe, Zhen Li 《The British journal of mathematical and statistical psychology》2016,69(3):225-252
Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and realistic sample size, full-information test statistics such as Pearson's X2 and the likelihood ratio statistic G2 suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) M2 have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied the M2 statistic to diagnostic classification models. Through a series of simulation studies, we found that M2 is well calibrated across a wide range of diagnostic model structures and is sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, M2 was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of overall model goodness of fit using M2, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The local dependence statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising from specific model misspecifications are illustrated. Finally, we used the M2 and local dependence statistics to evaluate a diagnostic model fit to data from the Trends in Mathematics and Science Study, drawing upon analyses previously conducted by Lee et al. (2011, IJT, 11, 144).
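For reference, Maydeu-Olivares and Joe's limited-information statistic has the general quadratic form (notation simplified from the original):

```latex
M_2 \;=\; N\,\bigl(\mathbf{p}_2 - \boldsymbol{\pi}_2(\hat{\boldsymbol{\theta}})\bigr)^{\!\top}
\hat{\mathbf{C}}_2\,
\bigl(\mathbf{p}_2 - \boldsymbol{\pi}_2(\hat{\boldsymbol{\theta}})\bigr),
```

where $\mathbf{p}_2$ collects the observed first- and second-order margins of the contingency table, $\boldsymbol{\pi}_2(\hat{\boldsymbol{\theta}})$ their model-implied counterparts, and $\hat{\mathbf{C}}_2$ a weight matrix chosen so that $M_2$ is asymptotically chi-squared. Because only univariate and bivariate margins enter, the sparseness that cripples full-information X2 and G2 is avoided.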
4.
5.
Explanatory item response theory models (EIRTM) are item response theory (IRT) models built on generalized linear mixed models and nonlinear mixed models. By allowing predictor variables to enter the IRT model directly, EIRTM can address a wide range of measurement problems. This paper first introduces the relevant concepts of EIRTM and methods for parameter estimation, then shows how EIRTM can be used to handle item position effects, test mode effects, differential item functioning, local person dependence, and local item dependence. An empirical example illustrates the use of EIRTM, and the limitations and application prospects of EIRTM are discussed.
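As a hedged sketch of the general idea (generic notation, not taken from the paper), a dichotomous explanatory IRT model is a GLMM in which item and person covariates enter the linear predictor directly:

```latex
\operatorname{logit} P(y_{pi} = 1)
= \theta_p - b_i + \sum_{k} \beta_k Z_{ik} + \sum_{m} \gamma_m W_{pm},
\qquad \theta_p \sim N(0, \sigma^2),
```

where the $Z_{ik}$ are item covariates (e.g., item position or test mode) and the $W_{pm}$ are person covariates; setting all $\beta_k = \gamma_m = 0$ recovers the ordinary Rasch model, so each measurement problem above corresponds to a particular choice of covariates.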
6.
Roger E. Millsap 《Child Development Perspectives》2010,4(1):5-9
Item response theory (IRT) consists of a set of mathematical models for the probabilities of various responses to test items as a function of item and person characteristics. In longitudinal data, changes in measured variables can only be interpreted if important psychometric features of the measured variables are assumed invariant across time. Measurement invariance is invariance in the relation of a measure to the latent variable underlying it. Measurement invariance in longitudinal studies concerns invariance over time, and IRT provides a useful approach to investigating longitudinal measurement invariance. Commonly used IRT models are described, along with the representation of measurement invariance in IRT. The use of IRT for investigating invariance is then described, along with practical considerations in using IRT for this purpose. Conceptual issues, rather than technical details, are emphasized throughout.
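In the commonly used two-parameter logistic model, for instance, longitudinal measurement invariance amounts to requiring the item parameters to be equal across measurement occasions (a standard formulation, not specific to this article):

```latex
P(X_{it} = 1 \mid \theta_t) = \frac{1}{1 + \exp\!\left[-a_{it}(\theta_t - b_{it})\right]},
\qquad a_{it} = a_i, \;\; b_{it} = b_i \;\; \text{for all occasions } t,
```

so that any change in the response probabilities over time is attributable to change in the latent trait $\theta_t$ rather than to drift in the discrimination $a$ or difficulty $b$ parameters.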
7.
This paper first clarifies the essential nature of a "testlet": a set of items sharing a common stimulus. On this basis, testlet effects are classified into within-item unidimensional testlet effects and within-item multidimensional testlet effects. Building on the Rasch model, we then develop dichotomous and polytomous multidimensional testlet-effect Rasch models intended to better handle within-item multidimensional testlet effects. Finally, simulation results show that the new models are valid and reasonable. Comparisons with the Rasch testlet model and the partial credit model indicate that: (1) when within-item multidimensional testlet effects are present, separating out only the obvious bundle-based testlet effects while ignoring other latent testlet effects still yields biased parameter estimates and may even overestimate test reliability; (2) the new models are more general: even when the response data contain no testlet effects, or only within-item unidimensional testlet effects, analyzing the test with the new models still produces good parameter estimates.
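The standard Rasch testlet formulation that this work extends adds a person-specific testlet effect to the Rasch logit (generic notation):

```latex
\operatorname{logit} P(y_{pi} = 1) = \theta_p - b_i + \gamma_{p\,d(i)},
```

where $d(i)$ indexes the testlet containing item $i$ and the random effect $\gamma_{pd}$ absorbs the local dependence among items sharing stimulus $d$. The multidimensional extension proposed here allows an item to load on more than one such testlet dimension at once, rather than on exactly one.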
8.
Sien Deng, Danielle E. McCarthy, Megan E. Piper, Timothy B. Baker, Daniel M. Bolt 《Multivariate behavioral research》2018,53(2):199-218
Extreme response style (ERS) has the potential to bias the measurement of intra-individual variability in psychological constructs. This paper explores such bias through a multilevel extension of a latent trait model for response styles applied to repeated measures rating scale data. Modeling responses to multi-item scales of positive and negative affect collected from smokers at clinic visits following a smoking cessation attempt revealed considerable ERS bias in the intra-individual sum score variances. In addition, simulation studies suggest that the magnitude and direction of bias due to ERS are heavily dependent on the mean affect level, supporting a model-based approach to the study and control of ERS effects. Application of the proposed model-based adjustment is found to improve intra-individual variability as a predictor of smoking cessation.
9.
When measuring psychological traits, one has to consider that respondents often show content-unrelated response behavior in answering questionnaires. To disentangle the target trait and two such response styles, extreme responding and midpoint responding, Böckenholt (2012a) developed an item response model based on a latent processing tree structure. We propose a theoretically motivated extension of this model to also measure acquiescence, the tendency to agree with both regular and reversed items. Substantively, our approach builds on multinomial processing tree (MPT) models that are used in cognitive psychology to disentangle qualitatively distinct processes. Accordingly, the new model for response styles assumes a mixture distribution of affirmative responses, which are either determined by the underlying target trait or by acquiescence. In order to estimate the model parameters, we rely on Bayesian hierarchical estimation of MPT models. In simulations, we show that the model provides unbiased estimates of response styles and the target trait, and we compare the new model and Böckenholt's model in a recovery study. An empirical example from personality psychology is used for illustrative purposes.
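As a hedged illustration of the processing-tree idea for a five-point item, with latent midpoint ($m$), directional/trait ($t$), and extremity ($e$) processes, the category probabilities factor as products of branch probabilities (the parameterization follows the general Böckenholt-style structure but is simplified here):

```latex
P(3) = m, \qquad
P(5) = (1-m)\,t\,e, \qquad
P(4) = (1-m)\,t\,(1-e),
```
```latex
P(1) = (1-m)(1-t)\,e, \qquad
P(2) = (1-m)(1-t)(1-e).
```

The five probabilities sum to one by construction. The acquiescence extension proposed here mixes an additional agreement process into the affirmative branches, so that an "agree" response may arise either from the target trait or from acquiescence.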
10.
Wendy M. Yen 《Psychometrika》1985,50(4):399-410
When the three-parameter logistic model is applied to tests covering a broad range of difficulty, there frequently is an increase in mean item discrimination and a decrease in the variance of item difficulties and traits as the tests become more difficult. To examine the hypothesis that this unexpected scale shrinkage effect occurs because items increase in complexity as they increase in difficulty, an approximate relationship is derived between the unidimensional model used in data analysis and a multidimensional model hypothesized to be generating the item responses. Scale shrinkage is successfully predicted for several sets of simulated data.

The author is grateful to Robert Mislevy for kindly providing a copy of his computer program, RESOLVE.
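For reference, the three-parameter logistic model under study is

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp\!\left[-1.7\,a_i(\theta - b_i)\right]},
```

with discrimination $a_i$, difficulty $b_i$, and lower asymptote $c_i$ (the 1.7 scaling constant is conventional and sometimes omitted). Scale shrinkage shows up as a rising mean $a_i$ together with shrinking variances of the $b_i$ and of the trait estimates on harder forms.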
11.
New Developments in Test Theory: Multidimensional Item Response Theory
Multidimensional item response theory (MIRT) is a new test theory developed from two foundations: factor analysis and unidimensional item response theory. Depending on how multiple abilities interact when an examinee completes a task, multidimensional item response models fall into two classes: compensatory and noncompensatory models. After systematically reviewing the compensatory models currently in common use, this paper identifies four directions that future research should pursue: multidimensional models for polytomous scoring and high-dimensional spaces, the integration of compensatory and noncompensatory models, the development of parameter estimation procedures, and multidimensional test equating.
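The compensatory/noncompensatory contrast can be sketched in generic two-parameter notation:

```latex
\text{compensatory:}\quad
P(\boldsymbol{\theta}) = \frac{1}{1 + \exp\!\left[-(\mathbf{a}^\top \boldsymbol{\theta} + d)\right]},
\qquad
\text{noncompensatory:}\quad
P(\boldsymbol{\theta}) = \prod_{k} \frac{1}{1 + \exp\!\left[-a_k(\theta_k - b_k)\right]}.
```

In the compensatory form, a high ability on one dimension can offset a low ability on another, because the dimensions are combined additively before the response function is applied; in the noncompensatory (product) form, a deficit on any single dimension caps the success probability and cannot be compensated.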
12.
It is often considered desirable to have the same ordering of the items by difficulty across different levels of the trait or ability. Such an ordering is an invariant item ordering (IIO). An IIO facilitates the interpretation of test results. For dichotomously scored items, earlier research has surveyed the theory and methods of invariant ordering in a nonparametric IRT context. Here the focus is on polytomously scored items, and both nonparametric and parametric IRT models are considered. The absence of the IIO property in two nonparametric polytomous IRT models is discussed, and two nonparametric models are discussed that do imply an IIO. A method is proposed that can be used to investigate whether empirical data imply an IIO. Furthermore, only two parametric polytomous IRT models are found to imply an IIO: the rating scale model (Andrich, 1978) and a restricted rating scale version of the graded response model (Muraki, 1990). Well-known models, such as the partial credit model (Masters, 1982) and the graded response model (Samejima, 1969), do not imply an IIO.
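Formally, an invariant item ordering for items $1, \ldots, k$ means that the expected item scores can be ordered uniformly in the latent trait:

```latex
E(X_1 \mid \theta) \;\le\; E(X_2 \mid \theta) \;\le\; \cdots \;\le\; E(X_k \mid \theta)
\qquad \text{for all } \theta,
```

possibly after relabeling the items; the point of the property is that the same difficulty ordering holds for every respondent, which is what makes statements like "item 3 is harder than item 1" interpretable without qualification.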
13.
A number of models for categorical item response data have been proposed in recent years. The models appear to be quite different. However, they may usefully be organized as members of only three distinct classes, within which the models are distinguished only by assumptions and constraints on their parameters. "Difference models" are appropriate for ordered responses, "divide-by-total" models may be used for either ordered or nominal responses, and "left-side added" models are used for multiple-choice responses with guessing. The details of the taxonomy and the models are described in this paper.

The present study was supported in part by two postdoctoral fellowships awarded to Lynne Steinberg: an Educational Testing Service Postdoctoral Fellowship at ETS, Princeton, NJ, and an NIMH Individual National Research Service Award at Stanford University, Stanford, CA. Helpful comments by the editor and three anonymous reviewers are gratefully acknowledged.
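The first two classes can be illustrated schematically (generic notation; the graded response model is a "difference" model and the partial credit model a "divide-by-total" model):

```latex
\text{difference:}\quad
P(X = k \mid \theta) = F_k(\theta) - F_{k+1}(\theta),
\qquad
\text{divide-by-total:}\quad
P(X = k \mid \theta) =
\frac{\exp \sum_{j=0}^{k} (\theta - \delta_j)}
     {\sum_{h=0}^{m} \exp \sum_{j=0}^{h} (\theta - \delta_j)}.
```

In a difference model, cumulative category-boundary curves $F_k(\theta)$ are subtracted to obtain category probabilities; in a divide-by-total model, each category's exponentiated score is normalized by the sum over all categories. "Left-side added" models then add a guessing floor on top of one of these structures.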
14.
An IRT model based on the Rasch model is proposed for composite tasks, that is, tasks that are decomposed into subtasks of different kinds. There is one subtask for each component that is discerned in the composite tasks. A component is a generic kind of subtask, of which the subtasks resulting from the decomposition are specific instantiations with respect to the particular composite tasks under study. The proposed model constrains the difficulties of the composite tasks to be linear combinations of the difficulties of the corresponding subtask items, which are estimated together with the weights used in the linear combinations, one weight for each kind of subtask. Although the model does not belong to the exponential family, its parameters can be estimated using conditional maximum likelihood estimation. The approach is demonstrated with an application to spelling tasks.

We thank Eric Maris for his helpful comments.
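The key constraint can be written compactly (generic notation): the difficulty of composite task $i$ is a weighted sum of the difficulties of its subtask items,

```latex
\beta_i \;=\; \sum_{k} \omega_k\, \beta_{ik},
```

where $\beta_{ik}$ is the difficulty of the subtask of kind $k$ derived from task $i$, and $\omega_k$ is the weight attached to component $k$, the same for all composite tasks. The weights $\omega_k$ and the subtask difficulties $\beta_{ik}$ are estimated jointly, and it is this multiplicative weight-times-difficulty structure that takes the model outside the exponential family.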
15.
16.
Hyeon-Ah Kang, Ya-Hui Su, Hua-Hua Chang 《The British journal of mathematical and statistical psychology》2018,71(3):523-535
A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales.
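The claim concerns the test characteristic curve linking τ and θ. With item response functions $T_i(\theta) = \sum_k k\,P_{ik}(\theta)$, where $P_{ik}(\theta)$ is the category response curve for score $k$ on item $i$, the true score is

```latex
\tau(\theta) \;=\; \sum_{i} T_i(\theta) \;=\; \sum_{i} \sum_{k} k\, P_{ik}(\theta),
\qquad \frac{d\tau}{d\theta} > 0,
```

so τ is a strictly increasing, hence invertible, transformation of θ under the stated conditions. The subtlety is that individual category curves $P_{ik}(\theta)$ need not be monotone even when their score-weighted sum $T_i(\theta)$ is.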
17.
Hairong Song, Huajian Cai, Jonathon D. Brown, Kevin J. Grimm 《Asian Journal of Social Psychology》2011,14(3):176-188
Using an item response theory-based approach (i.e., a likelihood ratio test with an iterative procedure), we examined the equivalence of the Rosenberg Self-Esteem Scale (RSES) in a sample of US and Chinese college students. Results from the differential item functioning (DIF) analysis showed that the RSES was not fully equivalent at either the item level or the scale level. The two cultural groups did not use the scale comparably, with the US students showing more extreme responses than the Chinese students. Moreover, we evaluated the practical impact of DIF and found that cultural differences in average self-esteem scores disappeared after the DIF was taken into account. We discuss the implications of our findings for cross-cultural research and provide suggestions for future studies using the RSES in China.
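The likelihood ratio DIF test compares a compact model that constrains a studied item's parameters to be equal across groups against an augmented model that frees them:

```latex
G^2 \;=\; -2\left[\ln L_{\text{compact}} - \ln L_{\text{augmented}}\right]
\;\sim\; \chi^2_{df},
```

with degrees of freedom equal to the number of freed parameters. A significant $G^2$ flags the item as functioning differentially; the iterative procedure repeats the test with a purified set of anchor items so that DIF items do not contaminate the linking of the two groups' scales.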
18.
Karl Christoph Klauer 《Psychometrika》1991,56(3):535-547
A commonly used method to evaluate the accuracy of a measurement is to provide a confidence interval that contains the parameter of interest with a given high probability. Smallest exact confidence intervals for the ability parameter of the Rasch model are derived and compared to the traditional, asymptotically valid intervals based on the Fisher information. Tables of the exact confidence intervals, termed Clopper-Pearson intervals, can be routinely drawn up by applying a computer program designed by and obtainable from the author. These tables are particularly useful for tests of only moderate lengths where the asymptotic method does not provide valid confidence intervals.
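The Clopper-Pearson construction inverts exact binomial tail probabilities. A stdlib-only sketch for the plain binomial case follows; the paper's intervals are for the Rasch ability parameter, which layers item parameters on top of this same inversion idea, so this is an illustration of the construction rather than the paper's algorithm.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided Clopper-Pearson interval for a binomial proportion,
    obtained by bisecting the monotone tail probabilities in p."""
    def bisect(f, target, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):                 # 60 halvings: ~1e-18 precision
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # lower bound: p with P(X >= k | p) = alpha/2 (increasing in p)
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # upper bound: p with P(X <= k | p) = alpha/2 (decreasing in p)
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper
```

For 8 successes out of 10 trials this yields an interval of roughly (0.44, 0.97), noticeably wider and asymmetric compared with the Wald interval, which is exactly the regime (moderate n, extreme proportions) where the abstract says the asymptotic method fails.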
19.
When developing and evaluating psychometric measures, a key concern is to ensure that they accurately capture individual differences on the intended construct across the entire population of interest. Inaccurate assessments of individual differences can occur when responses to some items reflect not only the intended construct but also construct-irrelevant characteristics, like a person's race or sex. Unaccounted for, this item bias can lead to apparent differences on the scores that do not reflect true differences, invalidating comparisons between people with different backgrounds. Accordingly, empirically identifying which items manifest bias through the evaluation of differential item functioning (DIF) has been a longstanding focus of much psychometric research. The majority of this work has focused on evaluating DIF across two (or a few) groups. Modern conceptualizations of identity, however, emphasize its multi-determined and intersectional nature, with some aspects better represented as dimensional than categorical. Fortunately, many model-based approaches to modelling DIF now exist that allow for simultaneous evaluation of multiple background variables, including both continuous and categorical variables, and potential interactions among background variables. This paper provides a comparative, integrative review of these new approaches to modelling DIF and clarifies both the opportunities and challenges associated with their application in psychometric research.
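As a baseline for the classical two-group case the abstract contrasts with, the Mantel-Haenszel common odds ratio pools 2×2 tables across matched total-score levels; the function and variable names below are illustrative, not from this paper.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across score-matched strata.
    Each stratum is a tuple (a, b, c, d):
      a = reference group correct,   b = reference group incorrect,
      c = focal group correct,       d = focal group incorrect.
    A ratio near 1 indicates no uniform DIF on the studied item."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den
```

The limitation the paper addresses is visible in the signature: the method presupposes exactly two discrete groups, whereas model-based approaches let background variables enter as covariates, continuous or categorical, with interactions.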
20.
The polytomous unidimensional Rasch model with equidistant scoring, also known as the rating scale model, is extended in such a way that the item parameters are linearly decomposed into certain basic parameters. The extended model is denoted the linear rating scale model (LRSM). A conditional maximum likelihood estimation procedure and a likelihood-ratio test of hypotheses within the framework of the LRSM are presented. Since the LRSM is a generalization of both the dichotomous Rasch model and the rating scale model, the present algorithm is suited for conditional maximum likelihood estimation in these submodels as well. The practicality of the conditional method is demonstrated by means of a dichotomous Rasch example with 100 items, a rating scale example with 30 items and 5 categories, and an empirical application to the measurement of treatment effects in a clinical study.

Work supported in part by the Fonds zur Förderung der Wissenschaftlichen Forschung under Grant No. P6414.
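The linear decomposition at the heart of the LRSM can be sketched in generic notation: the rating scale model with item parameters constrained to linear combinations of basic parameters,

```latex
P(X_{vi} = h) =
\frac{\exp\!\left[h(\theta_v - \beta_i) + \omega_h\right]}
     {\sum_{l=0}^{m} \exp\!\left[l(\theta_v - \beta_i) + \omega_l\right]},
\qquad
\beta_i = \sum_{j} q_{ij}\, \eta_j,
```

where $\theta_v$ is the person parameter, $\omega_h$ the category parameters shared by all items, and the weights $q_{ij}$ are known design constants (e.g., treatment dummies in the clinical application) so that the basic parameters $\eta_j$ carry the substantive effects. Setting $q_{ij}$ to an identity matrix recovers the ordinary rating scale model, and restricting the responses to two categories recovers the dichotomous Rasch model.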