Similar literature
 20 similar documents found (search time: 15 ms)
1.
Conjunctive item response models are introduced such that (a) sufficient statistics for latent traits are not necessarily additive in item scores; (b) items are not necessarily locally independent; and (c) existing compensatory (additive) item response models including the binomial, Rasch, logistic, and general locally independent model are special cases. Simple estimates and hypothesis tests for conjunctive models are introduced and evaluated as well. Conjunctive models are also identified with cognitive models that assume the existence of several individually necessary component processes for a global ability. It is concluded that conjunctive models and methods may show promise for constructing improved tests and uncovering conjunctive cognitive structure. It is also concluded that conjunctive item response theory may help to clarify the relationships between local dependence, multidimensionality, and item response function form. I appreciate the many helpful suggestions that were given by the reviewers and Ivo Molenaar.

2.
In optimal design research, designs are optimized with respect to some statistical criterion under a certain model for the data. The ideas from optimal design research have spread into various fields of research, and recently have been adopted in test theory and applied to item response theory (IRT) models. In this paper a generalized variance criterion is used for sequential sampling in the two-parameter IRT model. Some general principles are offered to enable a researcher to select the best sampling design for the efficient estimation of item parameters.
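
A generalized variance criterion can be illustrated with a small sketch. The code below assumes the two-parameter logistic form P = 1 / (1 + exp(-a(θ - b))), treats examinee abilities as known, and uses the standard Fisher information for the item parameters (a, b). The greedy one-step rule and all function names (`info_2pl`, `next_theta`, and so on) are hypothetical simplifications for illustration, not the procedure of the paper.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information for (a, b) from one examinee of ability theta (2PL)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    # Gradient of the logit a * (theta - b) with respect to (a, b).
    g = (theta - b, -a)
    return [[w * g[0] * g[0], w * g[0] * g[1]],
            [w * g[1] * g[0], w * g[1] * g[1]]]

def add(m1, m2):
    """Element-wise sum of two 2x2 matrices."""
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix; maximizing it minimizes the generalized variance."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def next_theta(current_info, candidates, a, b):
    """Greedy step: pick the candidate ability that most increases det(information)."""
    return max(candidates,
               key=lambda t: det2(add(current_info, info_2pl(t, a, b))))
```

In a real sequential design the item parameters are unknown, so `a` and `b` would be replaced by their current estimates at each step.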

3.
Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken (1971) and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four classical lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods. The authors are grateful for constructive comments from the reviewers and from Charles Lewis.

4.
The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability θ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.
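
As a rough illustration of the weighted-sum definition, the sketch below computes the ICRFs of a partial credit item and combines them into an IRF. The function names, the step-parameter list `deltas`, and the use of the integer category scores 0, 1, ..., m as weights are assumptions made for this example; they are not taken from the paper.

```python
import math

def pcm_icrfs(theta, deltas):
    """Category probabilities of a partial credit item with step parameters deltas."""
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + (theta - d))
    # Subtract the maximum before exponentiating for numerical stability.
    mx = max(cum)
    expo = [math.exp(c - mx) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def irf(theta, deltas):
    """IRF as the sum of ICRFs weighted by category score (expected item score)."""
    return sum(k * p for k, p in enumerate(pcm_icrfs(theta, deltas)))
```

With a single step parameter the item has two categories and `irf` reduces to the Rasch dichotomous response probability.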

5.
This paper discusses two forms of separability of item and person parameters in the context of response time (RT) models. The first is separate sufficiency: the existence of sufficient statistics for the item (person) parameters that do not depend on the person (item) parameters. The second is ranking independence: the likelihood of the item (person) ranking with respect to RTs does not depend on the person (item) parameters. For each form, a theorem stating sufficient conditions is proved. The two forms of separability are shown to include several (special cases of) models from the psychometric and biometric literature. Ranking independence imposes restrictions not on the general form of the distribution but on its parametrization. An estimation procedure based upon ranks and pseudolikelihood theory is discussed, as well as the relation of ranking independence to the concept of double monotonicity. I am indebted to Wim van der Linden for bringing Thissen's (1983) paper to my notice, and to Martijn Berger, Frans Tan, and the anonymous reviewers for their constructive comments on earlier drafts of this paper.

6.
The Dutch Identity: A new tool for the study of item response models   Total citations: 1, self-citations: 0, citations by others: 1
The Dutch Identity is a useful way to reexpress the basic equations of item response models that relate the manifest probabilities to the item response functions (IRFs) and the latent trait distribution. The identity may be exploited in several ways. For example: (a) to suggest how item response models behave for large numbers of items: they are approximate submodels of second-order loglinear models for 2^J tables; (b) to suggest new ways to assess the dimensionality of the latent trait: principal components analysis of matrices composed of second-order interactions from loglinear models; (c) to give insight into the structure of latent class models; and (d) to illuminate the problem of identifying the IRFs and the latent trait distribution from sample data. This research was supported in part by contract number N00014-87-K-0730 from the Cognitive Science Program of the Office of Naval Research. I realized the usefulness of the identity in Theorem 1 while lecturing in the Netherlands during October, 1986. Because this was in no small part due to the stimulating psychometric atmosphere there, I call the result the Dutch Identity.

7.
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between the latent variables and dichotomous observed variables, which may be responses to tests or questionnaires. It will be shown that the multilevel model with measurement error in the observed predictor variables can be estimated in a Bayesian framework using Gibbs sampling. In this article, handling measurement error via the normal ogive model is compared with alternative approaches using the classical true score model. Examples using real data are given. This paper is part of the dissertation by Fox (2001) that won the 2002 Psychometric Society Dissertation Award.

8.
We conducted two experimental studies with between-subjects and within-subjects designs to investigate the item response process for personality measures administered in high- versus low-stakes situations. Apart from assessing measurement validity of the item response process, we examined predictive validity; that is, whether or not different response models entail differential selection outcomes. We found that ideal point response models fit slightly better than dominance response models across high- versus low-stakes situations in both studies. Additionally, fitting ideal point models to the data led to fewer items displaying differential item functioning compared to fitting dominance models. We also identified several items that functioned as intermediate items in both the faking and honest conditions when ideal point models were fitted, suggesting that the ideal point model is “theoretically” more suitable across these contexts for personality inventories. However, the use of different response models (dominance vs. ideal point) did not have any substantial impact on the validity of personality measures in high-stakes situations, or the effectiveness of selection decisions such as mean performance or percent of fakers selected. These findings are significant in that although prior research supports the importance and use of ideal point models for measuring personality, we find that in the case of personality faking, though ideal point models seem to have slightly better measurement validity, the use of dominance models may be adequate with no loss to predictive validity.

9.
This paper proposes a multi-objective programming method for determining samples of examinees needed for estimating the parameters of a group of items. In the numerical experiments, optimum samples are compared to uniformly and normally distributed samples. The results show that the samples usually recommended in the literature are well suited for estimating the difficulty parameters. Furthermore, they are also adequate for estimating the discrimination parameters in the three-parameter model, but not for the guessing parameters.

10.
The aim of latent variable selection in multidimensional item response theory (MIRT) models is to identify latent traits probed by test items of a multidimensional test. In this paper the expectation model selection (EMS) algorithm proposed by Jiang et al. (2015) is applied to minimize the Bayesian information criterion (BIC) for latent variable selection in MIRT models with a known number of latent traits. Under mild assumptions, we prove the numerical convergence of the EMS algorithm for model selection by minimizing the BIC of observed data in the presence of missing data. For the identification of MIRT models, we assume that the variances of all latent traits are unity and each latent trait has an item that is only related to it. Under this identifiability assumption, the convergence of the EMS algorithm for latent variable selection in the multidimensional two-parameter logistic (M2PL) models can be verified. We give an efficient implementation of the EMS for the M2PL models. Simulation studies show that the EMS outperforms the EM-based L1 regularization in terms of correctly selected latent variables and computation time. The EMS algorithm is applied to a real data set related to the Eysenck Personality Questionnaire.

11.
Recent progress in mouse genetics has led to an increased interest in developing procedures for assessing mouse behavior, but relatively few of the behavioral procedures developed involve positively reinforced operant behavior. When operant methods are used, nose poking, not lever pressing, is the target response. In the current study differential acquisition of milk-reinforced lever pressing was observed in five inbred strains (C57BL/6J, DBA/2J, 129X1/SvJ, C3H/HeJ, and BALB/cJ) and one outbred stock (CD-1) of mice. Regardless of whether one or two levers (an "operative" and "inoperative" lever) were in the operant chamber, a concomitant variable-time fixed-ratio schedule of milk reinforcement established lever pressing in the majority of mice within two 120-min sessions. Substantial differences in lever pressing were observed across mice and between procedures. Adding an inoperative lever retarded acquisition in C57BL/6J, DBA/2J, 129X1/SvJ, and C3H/HeJ mice, but not in CD-1 and BALB/cJ mice. Locomotor activity was positively correlated with number of lever presses in both procedures. Analyses of durations of the subcomponents (e.g., time to move from hopper to lever) of operant behavior revealed further differences among the six types of mice. Together, the data suggest that appetitively reinforced lever pressing can be acquired rapidly in mice and that a combination of procedural, behavioral, and genetic variables contributes to this acquisition.

12.
Multidimensional computerized adaptive testing (MCAT) has received increasing attention over the past few years in educational measurement. As in all other formats of CAT, item replenishment is an essential part of MCAT for its item bank maintenance and management, which governs retiring overexposed or obsolete items over time and replacing them with new ones. Moreover, calibration precision of the new items will directly affect the estimation accuracy of examinees’ ability vectors. In unidimensional CAT (UCAT) and cognitive diagnostic CAT, online calibration techniques have been developed to effectively calibrate new items. However, there has been very little discussion of online calibration in MCAT in the literature. Thus, this paper proposes new online calibration methods for MCAT based upon some popular methods used in UCAT. Three representative methods, Method A, the ‘one EM cycle’ method and the ‘multiple EM cycles’ method, are generalized to MCAT. Three simulation studies were conducted to compare the three new methods by manipulating three factors (test length, item bank design, and level of correlation between coordinate dimensions). The results showed that all the new methods were able to recover the item parameters accurately, and the adaptive online calibration designs showed some improvements compared to the random design under most conditions.

13.
Examinee‐selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non‐ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two‐dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non‐ignorable and to determine how to apply the new model to the data collected. Two follow‐up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non‐ignorable missing data were mistakenly treated as ignorable.

14.
15.
This study investigated the specificities and communalities of visuospatial and verbal memory contents, as well as short-term memory and working memory (STM and WM) processes with the use of Rasch models as an alternative to the previous studies based on the classical test theory. A sample of 547 undergraduate students executed four computerised tasks, each consisting of verbal-numeric WM, visuospatial WM, verbal-numeric STM, and visuospatial STM content. Confirmatory factor analyses indicated that visuospatial and verbal-numeric memory are distinct, but correlated variables. Findings also support a domain-general view of WM capacity distinct from domain-specific storage. With the use of Rasch models, our results confirm previous experimental, psychometric, and neuropsychological studies, highlighting that memory span tasks can be divided into separate subsystems for content and processes. This study also shows that better results are obtained when models with person parameter estimates (provided from Rasch models) are adopted, rather than summed raw scores.

16.
17.
Pedestrians are vulnerable to death and severe injury in road crashes. Red light running by pedestrians is one of the leading causes of crashes at signalized intersections, where pedestrian crash involvement rates are high. It is therefore important to identify the factors that affect pedestrians' propensity to run red lights. In this study, the effects of both personal factors (pedestrians' demographics and behavior) and environmental factors (presence and behavior of other pedestrians, signal time, and traffic condition) on the individual decision to run a red light are examined, using video observation surveys at signalized crossings in the urban area that are prone to pedestrian-vehicle crashes and have moderate pedestrian and vehicular traffic volumes. The crossing behaviors of 6320 pedestrians are captured. Results of a random parameter logit model indicate that pedestrian gender, age, number of lanes, presence of a companion, number of pedestrians around, presence of other violators in the same cycle, time to green, red time, traffic volume, and percentage of heavy vehicles all affect pedestrians' propensity to run red lights. There are also significant interaction effects of pedestrian gender and age, presence of other violators, presence of a companion, and traffic volume on this propensity. The findings can inform the development of effective engineering, enforcement, and educational initiatives to combat red light running by pedestrians, and thereby enhance pedestrian safety at signalized intersections.

18.
19.
Two experiments are reported in which the ratio of the average times spent in the terminal and initial links (Tt/Ti) in concurrent chains was varied. In Experiment 1, pigeons responded in a three-component procedure in which terminal-link variable-interval schedules were in constant ratio, but their average duration increased across components by a factor of two. The log initial-link response ratio was a negatively accelerated function of Tt/Ti. Overall, the data were well described by Grace's (1994) contextual choice model (CCM) with temporal context represented as (Tt/Ti)^k or 2Tt/(Tt + Ti), and by Mazur's (2001) hyperbolic value-added model (HVA), with each model accounting for approximately 93% of the variance. In Experiment 2, fixed-parameter predictions for each model were generated, based on the data from Experiment 1, for conditions in which Tt/Ti was varied over a more extreme range. Data were consistent with the predictions of CCM with temporal context represented as 2Tt/(Tt + Ti) and to a lesser extent as (Tt/Ti)^k, but not with HVA. Overall, these results suggest that preference increases as a hyperbolic function of Tt/Ti when terminal-link duration is increased relative to initial-link duration, with the terminal-link schedule ratio held constant.
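
The two temporal-context terms named in the abstract can be compared directly. The sketch below only evaluates the two expressions as functions of Tt and Ti; how the term enters the full CCM is not reproduced here, and the function names and any value chosen for `k` are hypothetical.

```python
def context_power(tt, ti, k):
    """Temporal context as a power function of the link-duration ratio: (Tt/Ti)^k."""
    return (tt / ti) ** k

def context_hyperbolic(tt, ti):
    """Temporal context as a hyperbolic function of the ratio: 2*Tt / (Tt + Ti)."""
    return 2.0 * tt / (tt + ti)
```

Both terms equal 1 when Tt = Ti. The hyperbolic form can be written as 2r / (r + 1) with r = Tt/Ti, so it increases with r but never reaches 2, whereas the power form is unbounded for k > 0.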

20.
Subgroup analyses allow us to examine the influence of a categorical moderator on the effect size in meta‐analysis. We conducted a simulation study using a dichotomous moderator, and compared the impact of pooled versus separate estimates of the residual between‐studies variance on the statistical performance of the Q_B(P) and Q_B(S) tests for subgroup analyses assuming a mixed‐effects model. Our results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories. Conversely, when subgroups were unbalanced, the practical consequences of having heterogeneous residual between‐studies variances were more evident, with both tests leading to the wrong statistical conclusion more often than in the conditions with balanced subgroups. A pooled estimate should be preferred for most scenarios, unless the residual between‐studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates.
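
The difference between the pooled and separate variants can be sketched with the usual weighted between-groups statistic, in which each study is weighted by the inverse of its sampling variance plus a residual between-studies variance. The code below takes those residual variances as given inputs rather than estimating them; the function name `q_between` and the data layout are assumptions for illustration, not the authors' implementation.

```python
def q_between(groups, tau2):
    """Between-groups Q statistic for subgroup analysis under a mixed-effects model.

    groups: list of subgroups, each a list of (effect_size, sampling_variance).
    tau2:   one residual between-studies variance per subgroup. Passing the same
            value for every subgroup corresponds to a pooled estimate; passing
            different values corresponds to separate estimates.
    """
    means, weights = [], []
    for studies, t2 in zip(groups, tau2):
        w = [1.0 / (v + t2) for _, v in studies]
        sw = sum(w)
        # Weighted mean effect of the subgroup and its total weight.
        means.append(sum(wi * y for wi, (y, _) in zip(w, studies)) / sw)
        weights.append(sw)
    grand = sum(W * m for W, m in zip(weights, means)) / sum(weights)
    return sum(W * (m - grand) ** 2 for W, m in zip(weights, means))
```

The statistic is typically referred to a chi-square distribution with degrees of freedom equal to the number of subgroups minus one.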


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号