Similar literature
1.
Multidimensional testlet-effect Rasch model   Total citations: 2 (self-citations: 0, citations by others: 2)
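First, this paper interprets the essence of a "testlet" as a set of items that share a common stimulus. On this basis, testlet effects are divided into within-item unidimensional testlet effects and within-item multidimensional testlet effects. Second, building on the Rasch model, the paper develops dichotomous and polytomous multidimensional testlet-effect Rasch models intended to handle within-item multidimensional testlet effects. Finally, simulation results show the new model to be effective and reasonable. Comparison with the Rasch testlet model and the partial credit model indicates that: (1) when a test contains within-item multidimensional testlet effects, separating out only the obvious bundled testlet effects while ignoring other latent testlet effects still leads to biased parameter estimates and may even overestimate test reliability; (2) the new model is more general: even when the response data contain no testlet effects, or only within-item unidimensional testlet effects, analysis with the new model still yields good parameter estimates.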
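As background for the comparison above, the dichotomous Rasch testlet model is commonly written as logit P(X = 1) = θ − b + γ, where θ is the person's ability, b the item difficulty, and γ a person-specific random effect for the testlet containing the item. The sketch below only illustrates that standard form; the function name and the example values are hypothetical, and it does not reproduce the new multidimensional model proposed in the paper.

```python
import math

def rasch_testlet_prob(theta, b, gamma=0.0):
    """Probability of a correct response when logit = theta - b + gamma.

    theta: person ability; b: item difficulty;
    gamma: person-by-testlet effect (0.0 reduces to the plain Rasch model).
    All names are illustrative assumptions, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b + gamma)))

# Ability equal to difficulty: the plain Rasch probability is exactly 0.5.
p_plain = rasch_testlet_prob(theta=0.5, b=0.5)

# A positive testlet effect for this person raises the success probability.
p_testlet = rasch_testlet_prob(theta=0.5, b=0.5, gamma=0.8)
```

Setting gamma to zero for every person and testlet recovers the ordinary Rasch model, which is why ignoring a nonzero testlet effect can distort the remaining parameter estimates.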

2.
The use of multilevel modeling is presented as an alternative to separate item and subject ANOVAs (F1 × F2) in psycholinguistic research. Multilevel modeling is commonly utilized to model variability arising from the nesting of lower level observations within higher level units (e.g., students within schools, repeated measures within individuals). However, multilevel models can also be used when two random factors are crossed at the same level, rather than nested. The current work illustrates the use of the multilevel model for crossed random effects within the context of a psycholinguistic experimental study, in which both subjects and items are modeled as random effects within the same analysis, thus avoiding some of the problems plaguing current approaches.
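A crossed random effects model of the kind described here can be sketched as follows. The notation is an assumption made for illustration, not taken from the article: $y_{si}$ is the response of subject $s$ to item $i$, $x_{si}$ is an experimental predictor, and $u_s$ and $w_i$ are the subject and item random intercepts.

```latex
y_{si} = \beta_0 + \beta_1 x_{si} + u_s + w_i + \varepsilon_{si},
\qquad
u_s \sim N(0, \sigma_u^2), \quad
w_i \sim N(0, \sigma_w^2), \quad
\varepsilon_{si} \sim N(0, \sigma^2)
```

Because $u_s$ and $w_i$ enter the same equation, variability due to subjects and to items is estimated in a single analysis rather than in two separate ANOVAs.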

3.
An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of item discrimination and item difficulty parameters on covariates at both item and item category levels with random residuals at both levels. The AMIS model is useful for explanation purposes and also for prediction purposes as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature, and the performance of this algorithm is evaluated in a simulation study.

4.
It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. In correspondence with these issues, three parts are included. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and in its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined, but instead a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.

5.
Applications of standard item response theory models assume local independence of items and persons. This paper presents polytomous multilevel testlet models for dual dependence due to item and person clustering in testlet-based assessments with clustered samples. Simulation and survey data were analysed with a multilevel partial credit testlet model. This model was compared with three alternative models (a testlet partial credit model (PCM), a multilevel PCM, and a PCM) in terms of model parameter estimation. The results indicated that the deviance information criterion was the fit index that always correctly identified the true multilevel testlet model based on the quantified evidence in model selection, while the Akaike and Bayesian information criteria could not identify the true model. In general, the estimation model and the magnitude of item and person clustering impacted the estimation accuracy of ability parameters, while only the estimation model and the magnitude of item clustering affected the item parameter estimation accuracy. Furthermore, ignoring item clustering effects produced higher total errors in item parameter estimates but did not have much impact on the accuracy of ability parameter estimates, while ignoring person clustering effects yielded higher total errors in ability parameter estimates but did not have much effect on the accuracy of item parameter estimates. When both clustering effects were ignored in the PCM, item and ability parameter estimation accuracy was reduced.

6.
This paper advances nonparametric multidimensional item response theory by reporting experimental results on the use of nonmetric multidimensional scaling (MDS) to synthesize a multidimensional model from several approximating one-dimensional models. A two-dimensional simulation data set contains items in which the two-component traits combine linearly (dominance model items) and items in which the two-component traits combine quadratically (ideal point items). Several unidimensional approximations of the two-dimensional model were obtained by running unidimensional estimation software on the simulated data set. The graphs reconstructed from MDS of the unidimensional approximations at selected points clearly separate dominance items from ideal point items, and also various types of dominance or ideal point models. MDS also succeeded in determining the dimensionality of the simulation model items from the observable item responses.

7.
Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or simply at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.

8.
A Monte Carlo study was used to compare four approaches to growth curve analysis of subjects assessed repeatedly with the same set of dichotomous items: a two-step procedure that first estimates latent trait measures using MULTILOG and then uses a hierarchical linear model to examine the changing trajectories with the estimated abilities as the outcome variable; a structural equation model using modified weighted least squares (WLSMV) estimation; and two approaches in the framework of multilevel item response models, namely a hierarchical generalized linear model using Laplace estimation and Bayesian analysis using Markov chain Monte Carlo (MCMC). These four methods have similar power in detecting the average linear slope across time. MCMC and Laplace estimates perform relatively better on the bias of the average linear slope and corresponding standard error, as well as the item location parameters. For the variance of the random intercept, and the covariance between the random intercept and slope, all estimates are biased in most conditions. For the random slope variance, only Laplace estimates are unbiased when there are eight time points.

9.
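This study developed two new item selection strategies for polytomous multidimensional computerized adaptive testing (PMCAT), the revised continuous entropy method (RCEM) and the modified posterior expected KL information method (MKB), and compared them with existing PMCAT item selection strategies. Monte Carlo results showed that both new strategies achieved higher estimation accuracy than the original methods, and that RCEM had the lowest exposure rate of all the strategies. The new strategies offer good estimation accuracy and exposure control, providing new methodological support for the practical application of PMCAT.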

10.
In responding to rating scale items, respondents may hold different perspectives on the given categories. The random-effect rating scale model (RERSM), developed to account for variations in the category thresholds across respondents, is unidimensional and unilevel. It becomes statistically inefficient when multiple unidimensional tests have to be analyzed and inapplicable when data have a multilevel structure (e.g., respondents nested within organizations, students nested within schools). To resolve these problems, this study develops a multidimensional and multilevel version of the RERSM. The parameters can be estimated with existing computer software. Thus, there is no need to develop estimation procedures or corresponding computer programs. Simulation studies were conducted to evaluate the parameter recovery of the multidimensional RERSM, the multilevel RERSM, and the multidimensional and multilevel RERSM using WinBUGS. The results showed that the parameter recovery was generally satisfactory. An empirical example of the application of the multidimensional and multilevel RERSM to 2006 Program for International Student Assessment inventories about attitudes toward learning sciences is provided.

11.
Zhan Peida (詹沛达). 《心理科学》 [Psychological Science], 2019, (1): 170-178
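With the development of psychological and educational measurement research and advances in technology, computerized (large-scale) testing has attracted growing attention. This study aims to explore how response time data can be used to help assess multidimensional latent abilities in computerized multidimensional tests, and to provide theoretical support, in terms of data analysis methods, for monitoring the quality of compulsory education in China. Taking the computerized mathematics test data from the 2012 and 2015 Programme for International Student Assessment (PISA) as an example, the study proposes a joint response and response-time multidimensional Rasch model that uses response time and response accuracy data simultaneously. Analysis of the PISA data with the new model indicates that introducing response time data not only helps improve the precision of model parameter estimates but also helps analysts use respondents' response time information for further decisions and interventions (e.g., diagnosing aberrant response behavior or prerequisite knowledge).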

12.
This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study was implemented to investigate the behavior of the multilevel reliability measures and the uncertainty associated with the measures in various multilevel designs regarding the number of clusters, cluster sizes, and intraclass correlations (ICCs), and in different test lengths, for two parameterizations of multilevel item response models with separate item discriminations or the same item discrimination over levels. Marginal maximum likelihood estimation (MMLE)-multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering the accuracy of the multilevel reliability measures and the empirical coverage rate of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations over levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage rate of the MC confidence intervals were found under MMLE-multiple imputation in one limited condition only: 200 clusters, a cluster size of 30, an ICC of .2, and 40 items. In the model with the same item discrimination over levels, the accuracy of the multilevel reliability measures and the empirical coverage rate of the MC confidence intervals were acceptable in all multilevel designs we considered with 40 items under MMLE-multiple imputation. We discuss these findings and provide guidelines for reporting multilevel reliability measures.

13.
Six experiments examined the proposal that an item of long-term knowledge can be simultaneously inhibited and activated. In 2 directed forgetting experiments items to-be-forgotten were found to be inhibited in list-cued recall but activated in lexical decision tasks. In 3 retrieval practice experiments, unpracticed items from practiced categories were found to be inhibited in category-cued recall but were primed in lexical decision. If, however, the primes and targets in lexical decision were taken directly from the study list, inhibition was observed. Finally, it was found that when items highly associated with a study list were processed in between study and test, no inhibition in recall was present. These, and a broad range of other findings, can be explained by the concept of "episodic inhibition," which proposes that episodic memories retain copies of semantic knowledge structures that preserve patterns of activation/inhibition originally generated in those structures during encoding. (© 2006 APA, all rights reserved.)

14.
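Examination reform and large-scale assessment practice, in China and abroad, increasingly emphasize constructed-response items, so rater reliability has again become a topic of considerable interest. Building on the generalized multilevel facets model of Wang and Liu (2007), this study proposes and examines a graded response multilevel facets model. The results show that, under both the fixed-effect and random-effect rater conditions, the means and standard deviations of the bias values were small, indicating that under the present experimental conditions the parameter estimates showed good recovery and robustness and that rater effects can be detected. Factors influencing rater effects can subsequently be added, developing the model into a complete one that detects rater effects and their influencing factors simultaneously.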

15.
Lexical and conceptual factors in the naming of relations   Total citations: 1 (self-citations: 1, citations by others: 0)
Recent models of language production distinguish three main stages, the generation of a preverbal (or conceptual) message level representation, the stage of linguistic formulation processes (which access lexical items and generate the syntactic frames in which these items are inserted), and the stage of articulation. This means that at least two sources of difficulty in producing a lexical item must be distinguished. First, the difficulty can be due to properties of the message representation. So, for example, several concepts may compete for expression. Second, a given lexical item might be more difficult to access than another item because of differences in the complexity of the processes translating from conceptual to lexical representations. The present study presents evidence for these two sources of difficulty in producing lexical items for the domain of semantically unmarked versus marked dimensional adjectives (e.g., big versus small). The first set of experiments establishes an effect of semantic markedness in language production which is due to a difference in the difficulty of accessing unmarked versus marked lexical items. The second set of experiments shows that competition between concepts for expression can lead to incorrect selection of an (unintended) lexical item (as reflected in certain types of speech errors), or to a higher processing load for producing the correct (intended) lexical item. Together, these experiments support the distinction between a preverbal conceptual and a lexical level of representation in language production, and show that both levels contribute to the relative difficulty of producing lexical items.

16.
This study explores the foundation of lexical/semantic phoneme binding effects in verbal short-term memory (STM). The immediate serial recall of pure lists of words and nonwords was compared with the recall of mixed lists that had either a predictable, alternating structure (e.g., wnwnwn) or an unpredictable structure (i.e., the serial positions of the words/nonwords could not be known in advance). The study provides evidence for two separate mechanisms by which long-term linguistic knowledge contributes to STM. First, there was evidence for automatic lexical/semantic binding effects that were independent of knowledge of lexical status. The nonwords in both types of mixed list damaged word recall and encouraged the phonological elements of words to migrate. In both alternating and unpredictable mixed lists, the phonemes of words were more likely than the phonemes of nonwords to be recalled together as a coherent item, suggesting that lexical/semantic knowledge encourages the phonological elements of words to emerge together in immediate serial recall, even when lexical status is unknown. Secondly, there was evidence for "strategic redintegration", which was dependent on prior knowledge of the lexical status of the items in mixed lists. When participants recalled items that they knew to be words in advance, they were able to use this knowledge to constrain their responses so that they were more likely to be lexically appropriate. These findings motivate modifications to current theories of the interaction between linguistic knowledge and verbal short-term memory.

17.
In between-item multidimensional item response models, it is often desirable to compare individual latent trait estimates across dimensions. These comparisons are only justified if the model dimensions are scaled relative to each other. Traditionally, this scaling is done using approaches such as standardization, that is, fixing the latent mean and standard deviation to 0 and 1 for all dimensions. However, approaches such as standardization do not guarantee that Rasch model properties hold across dimensions. Specifically, for between-item multidimensional Rasch family models, the unique ordering of items holds within dimensions, but not across dimensions. Previously, Feuerstahler and Wilson described the concept of scale alignment, which aims to enforce the unique ordering of items across dimensions by linearly transforming item parameters within dimensions. In this article, we extend the concept of scale alignment to the between-item multidimensional partial credit model and to models fit using incomplete data. We illustrate this method in the context of the Kindergarten Individual Development Survey (KIDS), a multidimensional survey of kindergarten readiness used in the state of Illinois. We also present simulation results that demonstrate the effectiveness of scale alignment in the context of polytomous item response models and missing data.

18.
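Hierarchical linear modeling is an advanced statistical method for hierarchically structured data, and item response theory is a modern measurement theory for precisely measuring examinee ability. Multilevel item response theory combines the two by nesting an item response model within a hierarchical linear model, which allows item parameters and ability parameters at different levels to be estimated and yields more precise estimates of regression coefficients and error variances. The authors outline the development of multilevel item response theory, review its applications in psychological and educational measurement in terms of differential item functioning, test equating, and school effectiveness research, summarize its value, and discuss future research trends.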
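One common way to write the nesting described above is a Rasch model at the response level with a hierarchical linear model on ability. This is a sketch under assumed notation, not the formulation of any particular study in the review: $Y_{ijk}$ is the response of person $j$ in group $k$ to item $i$, $b_i$ is the item difficulty, and $\theta_{jk}$ is the person's ability.

```latex
\begin{aligned}
\text{Level 1 (responses):}\quad & \operatorname{logit} P(Y_{ijk} = 1) = \theta_{jk} - b_i \\
\text{Level 2 (persons):}\quad   & \theta_{jk} = \beta_{0k} + e_{jk}, \qquad e_{jk} \sim N(0, \sigma^2) \\
\text{Level 3 (groups):}\quad    & \beta_{0k} = \gamma_{00} + u_k, \qquad u_k \sim N(0, \tau^2)
\end{aligned}
```

Person-level or group-level covariates would enter the Level 2 and Level 3 equations as additional regression terms.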

19.
The responses of 2813 individuals to the Personal Globe Inventory (Tracey, 2002) were examined with the goal of developing a shorter, yet valid version of the scale using item response theory to guide the process. A random sample of 1000 individuals was used to select the best items and then the remaining 1813 were used as a validation sample to examine psychometric properties. For items to be included in the shortened form, the option characteristic curves had to conform to theory and there could be no presence of differential item functioning across either gender or ethnicity. The best 80 items were retained, forming the PGI-Short. This instrument demonstrated excellent reliability and adherence to a circular model, and there was no differential item functioning across either gender or ethnicity. The PGI-Short was supported as an alternative to the fuller version of the PGI.

20.
Previous studies have reported that, in contrast to the effect on immediate serial recall, lexical/semantic factors have little effect on immediate serial recognition. This has been taken as evidence that linguistic knowledge contributes to verbal short-term memory in a redintegrative process at recall. Contrary to this view, we found that lexicality, frequency, and imageability all influenced matching span. The standard matching span task, requiring changes in item order to be detected, was less susceptible to lexical/semantic factors than was a novel task involving the detection of phoneme order and hence item identity changes. Therefore, in both immediate recognition and immediate serial recall, lexical/semantic knowledge makes a greater contribution to item identity than to item order memory. Task sensitivity, and not the absence of overt recall, may have underpinned previous failures to show effects of these variables in immediate recognition. We also compared matching span for pure and unpredictable mixed lists of words and nonwords. Lexicality had a larger impact on immediate recognition for pure than for mixed lists, in line with findings for immediate serial recall. List composition affected the detection of phoneme but not item order changes in matching span; similarly, in recall, mixed lists produce more frequent word phoneme migrations but not migrations of entire items. These results point to strong similarities between immediate serial recall and recognition. Lexical/semantic knowledge may contribute to phonological stability in both tasks.

Copyright©北京勤云科技发展有限公司  京ICP备09084417号