Similar Literature (20 related records)
1.
Conditional Covariance Theory and Detect for Polytomous Items
This paper extends the theory of conditional covariances to polytomous items. It has been proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, given an appropriately chosen composite is positive if, and only if, the two items measure similar constructs besides the composite. The theory provides a theoretical foundation for dimensionality assessment procedures based on conditional covariances or correlations, such as DETECT and DIMTEST, so that the performance of these procedures is theoretically justified when applied to response data with polytomous items. Various estimators of conditional covariances are constructed, and special attention is paid to the case of complex sampling data, such as those from the National Assessment of Educational Progress (NAEP). As such, the new version of DETECT can be applied to response data sets not only with polytomous items but also with missing values, either by design or at random. DETECT is then applied to analyze the dimensional structure of the 2002 NAEP reading samples of grades 4 and 8. The DETECT results show that the substantive test structure based on the purposes for reading is consistent with the statistical dimensional structure for either grade. This research was supported by the Educational Testing Service and the National Assessment of Educational Progress (Grant R902F980001), US Department of Education. The opinions expressed herein are solely those of the author and do not necessarily represent those of the Educational Testing Service. The author would like to thank Ting Lu, Paul Holland, Shelby Haberman, and Feng Yu for their comments and suggestions. Requests for reprints should be sent to Jinming Zhang, Educational Testing Service, MS 02-T, Rosedale Road, Princeton, NJ 08541, USA. E-mail: jzhang@ets.org
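As a rough illustration of the building block this abstract refers to, the sketch below estimates the covariance of an item pair conditional on a simple rest-score composite, pooled over score groups. It is a minimal, hypothetical implementation with illustrative names; the operational DETECT/NAEP version additionally handles polytomous scores, sampling weights, and missing data.

```python
import numpy as np

def conditional_covariance(X, i, j):
    """Estimate cov(X_i, X_j | rest score), a basic DETECT/DIMTEST building block.

    X : (n_examinees, n_items) array of item scores (dichotomous or polytomous).
    The conditioning composite is the rest score (total score on all items except
    i and j); group covariances are pooled with weights proportional to group size.
    """
    X = np.asarray(X, dtype=float)
    rest = X.sum(axis=1) - X[:, i] - X[:, j]
    n_total = X.shape[0]
    pooled = 0.0
    for s in np.unique(rest):
        grp = rest == s
        n_s = grp.sum()
        if n_s < 2:
            continue  # covariance undefined for singleton groups
        pooled += (n_s / n_total) * np.cov(X[grp, i], X[grp, j], ddof=1)[0, 1]
    return pooled

# Toy two-dimensional data: items 0-2 load on dimension 1, items 3-5 on dimension 2.
rng = np.random.default_rng(0)
theta = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], size=2000)
b = np.array([-1.0, 0.0, 1.0])
p1 = 1 / (1 + np.exp(-(theta[:, [0]] - b)))
p2 = 1 / (1 + np.exp(-(theta[:, [1]] - b)))
X = np.hstack([(rng.random(p1.shape) < p1), (rng.random(p2.shape) < p2)]).astype(int)
print(conditional_covariance(X, 0, 1))  # same cluster: expected positive
print(conditional_covariance(X, 0, 3))  # different clusters: expected negative
```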

2.
3.
Using a Chinese character recognition test for students in the compulsory education stage, this study examined the dimensionality of character recognition ability by combining exploratory structural equation modeling (ESEM) with two nonparametric item response theory methods, Mokken scale analysis and DETECT. The ESEM results showed that a unidimensional model of character recognition outperformed multidimensional models; the multidimensional solutions mainly reflected a difficulty dimension, namely the effect of character frequency. Mokken scale analysis indicated that both the Grade 1-2 and the Grade 3-9 tests were better characterized as unidimensional scales. The DETECT analysis showed that the D values of both tests were close to zero, indicating that character recognition ability is unidimensional. Taken together, the three methods support the unidimensionality of character recognition ability.

4.
Various different item response theory (IRT) models can be used in educational and psychological measurement to analyze test data. One of the major drawbacks of these models is that efficient parameter estimation can only be achieved with very large data sets. Therefore, it is often worthwhile to search for designs of the test data that in some way will optimize the parameter estimates. The results from the statistical theory on optimal design can be applied for efficient estimation of the parameters. A major problem in finding an optimal design for IRT models is that the designs are only optimal for a given set of parameters, that is, they are locally optimal. Locally optimal designs can be constructed with a sequential design procedure. In this paper minimax designs are proposed for IRT models to overcome the problem of local optimality. Minimax designs are compared to sequentially constructed designs for the two-parameter logistic model, and the results show that minimax designs can be nearly as efficient as sequentially constructed designs.
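A minimal sketch of the contrast drawn here, under simplifying assumptions: D-optimal calibration designs for a single two-parameter logistic (2PL) item, where a locally optimal design maximizes the log-determinant of the information matrix at one guessed parameter value, while a maximin ("minimax-type") design maximizes the worst-case criterion over a set of plausible parameter values. The grids, the parameter region, and the restriction to equal-weight two-point designs are illustrative choices, not those of the paper.

```python
import numpy as np
from itertools import combinations

def info_2pl(theta, a, b):
    """Fisher information matrix for (a, b) of a 2PL item from one examinee at theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    g = np.array([theta - b, -a])           # gradient of the logit w.r.t. (a, b)
    return p * (1 - p) * np.outer(g, g)

def log_det_design(thetas, a, b):
    """Log-determinant of the pooled information of an equal-weight design on `thetas`."""
    M = sum(info_2pl(t, a, b) for t in thetas) / len(thetas)
    sign, logdet = np.linalg.slogdet(M)
    return logdet if sign > 0 else -np.inf

grid = np.linspace(-3, 3, 13)
designs = list(combinations(grid, 2))                      # candidate two-point designs
param_set = [(a, b) for a in (0.8, 1.2, 1.8) for b in (-1.0, 0.0, 1.0)]  # plausible region

# Locally optimal: best for one guessed parameter value only.
a0, b0 = 1.2, 0.0
local_best = max(designs, key=lambda d: log_det_design(d, a0, b0))

# Maximin ("minimax"): maximize the worst-case criterion over the region.
maximin_best = max(designs, key=lambda d: min(log_det_design(d, a, b) for a, b in param_set))

print("locally optimal design:", local_best)
print("maximin design        :", maximin_best)
```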

5.
In cognitive diagnostic assessment, evaluating the fit between the cognitive model and the response data is crucial. The existing hierarchy consistency index (HCI) can only assess model-data fit under the conjunctive condensation rule, so a consistency index for the disjunctive rule is needed. The HCI assumes that, given a correct response to an item, an incorrect response to any of its prerequisite items indicates misfit. Because item responses are inherently random, this paper proposes an item-level consistency index based on hypothesis testing. The index can be used to distinguish response data generated under the conjunctive rule from data generated under the disjunctive rule, to evaluate the quality of the Q-matrix, and to quantify the noise in the response data; it can also inform the evaluation of cognitive models and the selection of cognitive diagnosis models.

6.
Answer similarity indices were developed to detect pairs of test takers who may have worked together on an exam or instances in which one test taker copied from another. For any pair of test takers, an answer similarity index can be used to estimate the probability that the pair would exhibit the observed response similarity or a greater degree of similarity under the assumption that the test takers worked independently. To identify groups of test takers with unusually similar response patterns, Wollack and Maynes suggested conducting cluster analysis using probabilities obtained from an answer similarity index as measures of distance. However, interpretation of results at the cluster level can be challenging because the method is sensitive to the choice of clustering procedure and only enables probabilistic statements about pairwise relationships. This article addresses these challenges by presenting a statistical test that can be applied to clusters of examinees rather than pairs. The method is illustrated with both simulated and real data.
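The sketch below illustrates only the Wollack-Maynes step described above: treating pairwise probabilities from an answer similarity index as distances and clustering them hierarchically. The similarity index itself and the article's cluster-level statistical test are not implemented; the probability matrix and thresholds are simulated and illustrative.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# p_matrix[i, j]: probability that pair (i, j) would show at least the observed
# similarity under independent work. Small values flag unusually similar pairs,
# so they can serve directly as distances.
rng = np.random.default_rng(1)
n = 12
upper = np.triu(rng.uniform(0.05, 1.0, size=(n, n)), 1)
p_matrix = upper + upper.T                                  # symmetric, zero diagonal
p_matrix[2, 5] = p_matrix[5, 2] = 1e-6                      # plant one suspicious pair

condensed = squareform(p_matrix, checks=False)              # condensed distance vector
Z = linkage(condensed, method="average")                    # hierarchical clustering
labels = fcluster(Z, t=0.01, criterion="distance")
print(labels)   # test takers 2 and 5 fall into the same tight cluster
```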

7.
J. O. Ramsay, Psychometrika, 1995, 60(3): 323-339
The probability that an examinee chooses a particular option within an item is estimated by averaging over the responses to that item of examinees with similar response patterns for the whole test. The approach does not presume any latent variable structure or any dimensionality. Nevertheless, simulated and actual data analyses are presented to show that when the responses are determined by a latent ability variable, this similarity-based smoothing procedure can reveal the dimensionality of ability very satisfactorily. The author wishes to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada through grant A320, and to thank Educational Testing Service for making the data on the Advanced Placement Chemistry Exam available.
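A crude nearest-neighbour sketch of the central idea, averaging over examinees with similar whole-test response patterns. Ramsay's actual procedure is a kernel-smoothing method; the function, parameters, and Hamming-style similarity used here are simplifications for illustration only.

```python
import numpy as np

def smoothed_option_probability(responses, item, option, k=50):
    """Nearest-neighbour estimate of P(choose `option` on `item`) for every examinee.

    responses : (n_examinees, n_items) integer array of selected options.
    For each examinee, the estimate is the proportion of the k examinees with the
    most similar whole-test response patterns (focal item excluded, simple
    match-count similarity) who chose `option` on the focal item.
    """
    R = np.asarray(responses)
    others = np.delete(R, item, axis=1)
    chose = (R[:, item] == option).astype(float)
    est = np.empty(R.shape[0])
    for e in range(R.shape[0]):
        agreement = (others == others[e]).sum(axis=1)   # pattern similarity to examinee e
        neighbours = np.argsort(-agreement)[:k]          # k most similar examinees (incl. e)
        est[e] = chose[neighbours].mean()
    return est

rng = np.random.default_rng(0)
toy = rng.integers(0, 4, size=(500, 20))                 # 500 examinees, 20 four-option items
print(smoothed_option_probability(toy, item=0, option=2, k=40)[:5])
```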

8.
With the advent of web-based technology, online testing is becoming a mainstream mode in large-scale educational assessments. Most online tests are administered continuously in a testing window, which may pose test security problems because examinees who take the test earlier may share information with those who take the test later. Researchers have proposed various statistical indices to assess test security, and one of the most often used indices is the average test-overlap rate, which was further generalized to the item pooling index (Chang & Zhang, 2002, 2003). These indices, however, are all defined as means (that is, the expected proportion of common items among examinees), and they were originally proposed for computerized adaptive testing (CAT). Recently, multistage testing (MST) has become a popular alternative to CAT. The unique features of MST make it important to report not only the mean, but also the standard deviation (SD) of the test overlap rate, as we advocate in this paper. The standard deviation of the test overlap rate adds important information to the test security profile, because for the same mean, a large SD reflects that certain groups of examinees share more common items than other groups. In this study, we analytically derived the lower bounds of the SD under MST, with the results under CAT as a benchmark. It is shown that when the mean overlap rate is the same between MST and CAT, the SD of test overlap tends to be larger in MST. A simulation study was conducted to provide empirical evidence. We also compared the security of MST under the single-pool versus the multiple-pool designs; both analytical and simulation studies show that the non-overlapping multiple-pool design will slightly increase the security risk.
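A minimal sketch of the quantities being reported: the pairwise test-overlap rate, its mean (the usual index), and its SD (the additional quantity advocated above). The toy routing scheme below is an illustrative MST-like setup, not the designs analyzed in the paper.

```python
import numpy as np
from itertools import combinations

def overlap_rate_summary(administered, test_length):
    """Mean and SD of the pairwise test-overlap rate.

    administered : list of sets, the items seen by each examinee.
    The overlap rate for a pair is the number of shared items divided by the
    (fixed) test length.
    """
    rates = np.array([len(a & b) / test_length
                      for a, b in combinations(administered, 2)])
    return rates.mean(), rates.std(ddof=1)

# Toy example: every examinee sees a 10-item routing module plus one of three
# 10-item second-stage modules, so pairwise overlap is either 0.5 or 1.0.
rng = np.random.default_rng(2)
modules = [set(range(10, 20)), set(range(20, 30)), set(range(30, 40))]
routing = [set(range(10)) | modules[rng.integers(0, 3)] for _ in range(200)]
mean, sd = overlap_rate_summary(routing, test_length=20)
print(f"mean overlap = {mean:.3f}, SD = {sd:.3f}")   # same mean can hide a large SD
```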

9.
Some nonparametric dimensionality assessment procedures, such as DIMTEST and DETECT, use nonparametric estimates of item pair conditional covariances given an appropriately chosen subtest score as their basic building blocks. Such conditional covariances given some subtest score can be regarded as an approximation to the conditional covariances given an appropriately chosen unidimensional latent composite, where the composite is oriented in the multidimensional test space direction in which the subtest score measures best. In this paper, the structure and properties of such item pair conditional covariances given a unidimensional latent composite are thoroughly investigated, assuming a semiparametric IRT modeling framework called a generalized compensatory model. It is shown that such conditional covariances are highly informative about the multidimensionality structure of a test. The theory developed here is very useful in establishing properties of dimensionality assessment procedures, current and yet to be developed, that are based upon estimating such conditional covariances. In particular, the new theory is used to justify the DIMTEST procedure. Because of the importance of conditional covariance estimation, a new bias-reducing approach is presented. A byproduct of likely independent importance beyond the study of conditional covariances is a rigorous, score-information-based definition of an item's and a score's direction of best measurement in the multidimensional test space. This paper is based on a chapter of the first author's doctoral dissertation, written at the University of Illinois and supervised by the second author. Part of this research has been presented at the annual meeting of the National Council on Measurement in Education, San Francisco, April 1995. The authors would like to thank Jeff Douglas, Xuming He and Ming-mei Wang for their comments and suggestions. The research of the first author was partially supported by an ETS/GREB Psychometric Fellowship, and by Educational Testing Service Research Allocation Project 884-01. The research of the second author was partially supported by NSF grant DMS 97-04474.

10.
A linking design typically consists of a data collection procedure together with an item linking procedure that places item parameters calibrated from multiple test forms onto a common scale. This study considered two potentially useful item response theory linking designs. The first is characterized by selecting a single set of common items across all test forms; the precalibrated parameters of these common items are kept fixed while the unknown parameters of the other items are being estimated. This linking design will be referred to as the fixed common-precalibrated item parameter design. However, data collected under this design could also be analyzed by the characteristic curve method, which constituted an alternative linking procedure. In this study, the relative merits of the two linking designs were examined with respect to their robustness against three manipulated conditions: when the common items have imprecise estimates, when there is a noticeable difference in the average item difficulty between the common and the noncommon items, and when the examinees are heterogeneous in terms of their abilities. A parameter recovery study was conducted for this purpose. The results indicated that both linking designs were capable of producing accurate linking of items and equivalent estimation of ability parameters under the three conditions. When the two designs were actually utilized in the development of an item bank, it was found that both produced quite consistent solutions despite minor differences in some item and ability estimates. The conditions under which one linking design is preferred over the other are provided in the Discussion section of this article.
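For orientation, here is a minimal sketch of one common instance of a characteristic curve linking method (a Stocking-Lord-style transformation for the 2PL); the abstract does not specify which characteristic curve variant was used, so this is an assumption for illustration, with hypothetical parameter values.

```python
import numpy as np
from scipy.optimize import minimize

def p2pl(theta, a, b):
    """2PL response probabilities for a grid of abilities (rows) and items (columns)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def characteristic_curve_link(a_new, b_new, a_old, b_old, theta=np.linspace(-4, 4, 81)):
    """Find A, B so that new-form common-item parameters (a/A, A*b + B) match the
    old form's test characteristic curve (Stocking-Lord-style least squares)."""
    tcc_old = p2pl(theta, a_old, b_old).sum(axis=1)
    def loss(x):
        A, B = x
        tcc_new = p2pl(theta, a_new / A, A * b_new + B).sum(axis=1)
        return np.sum((tcc_old - tcc_new) ** 2)
    return minimize(loss, x0=np.array([1.0, 0.0]), method="Nelder-Mead").x

# Toy check: new-form parameters are old-form parameters expressed on a shifted,
# stretched scale; the recovered (A, B) should be close to (1.3, 0.4).
a_old = np.array([1.0, 1.5, 0.8, 1.2])
b_old = np.array([-1.0, 0.0, 0.5, 1.0])
A_true, B_true = 1.3, 0.4
a_new, b_new = a_old * A_true, (b_old - B_true) / A_true
print(characteristic_curve_link(a_new, b_new, a_old, b_old))
```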

11.
The Type I error rates and powers of three recent tests for analyzing nonorthogonal factorial designs under departures from the assumptions of homogeneity and normality were evaluated using Monte Carlo simulation. Specifically, this work compared the performance of the modified Brown-Forsythe procedure, the generalization of Box's method proposed by Brunner, Dette, and Munk, and the mixed-model procedure adjusted by the Kenward-Roger solution available in the SAS statistical package. With regard to robustness, the three approaches adequately controlled the Type I error rate when the data were generated from symmetric distributions; however, the results indicate that, when the data were generated from asymmetric distributions, the modified Brown-Forsythe approach controlled the Type I error rate slightly better than the other procedures. With regard to sensitivity, the highest power rates were obtained when the analyses were done with the MIXED procedure of the SAS program. Furthermore, the results also indicated that, when the data were generated from symmetric distributions, little power was sacrificed by using the generalization of Box's method in place of the modified Brown-Forsythe procedure.

12.
王阳  温忠麟  付媛姝 《心理科学进展》2020,28(11):1961-1969
Commonly used structural equation modeling fit indices have notable limitations: the χ² test takes the traditional null hypothesis as its target hypothesis and therefore cannot confirm a model, while descriptive fit indices such as RMSEA and CFI lack inferential statistical properties. Equivalence testing effectively remedies these problems. This paper first explains how equivalence testing evaluates the fit of a single model and how it differs from null hypothesis testing, and then describes how equivalence testing can be used to examine measurement invariance. Empirical data are used to demonstrate the performance of equivalence testing in single-model evaluation and in tests of measurement invariance, in comparison with traditional model evaluation methods.
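To make the logic concrete, here is a minimal sketch of an equivalence-style fit test based on RMSEA: the null hypothesis is poor fit (RMSEA at or above a tolerance), so rejecting it supports the model. This sketch uses the plain noncentral chi-square reference distribution and a user-chosen tolerance; it does not reproduce the adjusted cutoff values developed in the equivalence-testing literature, and the numbers in the example are hypothetical.

```python
from scipy.stats import ncx2

def equivalence_test_close_fit(T, df, n, eps0=0.05, alpha=0.05):
    """Equivalence-style test: H0: population RMSEA >= eps0 vs H1: RMSEA < eps0.

    T   : maximum likelihood chi-square statistic of the fitted model.
    df  : model degrees of freedom.
    n   : sample size.
    Rejecting H0 (T below the critical value) supports acceptable fit.
    """
    ncp0 = (n - 1) * df * eps0 ** 2          # noncentrality at the H0 boundary
    crit = ncx2.ppf(alpha, df, ncp0)         # reject H0 when T falls below this value
    p_value = ncx2.cdf(T, df, ncp0)          # probability of a statistic this small under H0
    return T < crit, crit, p_value

print(equivalence_test_close_fit(T=52.3, df=40, n=500, eps0=0.05))
```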

13.
In cognitive diagnosis, no existing index can directly evaluate an item's attribute classification accuracy (attribute correct-classification rate) without response data. Attribute classification accuracy at the item level depends on the item's attribute vector, the item parameters, the prior distribution, and the responses. Taking these factors into account, this paper defines an item-level expected attribute classification accuracy index and applies it to test assembly. Simulation studies show that the new index evaluates an item's attribute correct-classification rate very accurately and is important for item screening; with pattern classification accuracy as the criterion, the test assembly method based on the new index performs comparably to classical assembly methods.
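The paper's exact index is not reproduced here; the following is a simplified, hypothetical illustration of the underlying idea for a single-attribute item under a DINA-type model: the expected accuracy of a Bayes classification of mastery from the item response, which is computable from the item parameters and a prior alone, without observed data.

```python
def expected_attribute_accuracy(slip, guess, prior_mastery):
    """Expected accuracy of classifying mastery of one attribute from one DINA item.

    A master answers correctly with probability 1 - slip, a non-master with probability
    guess. For each possible response, the Bayes classifier picks the class with the
    larger joint probability; summing the winning joint probabilities over the two
    responses gives the expected classification accuracy. Illustrative simplification,
    not the index defined in the paper.
    """
    acc = 0.0
    for x in (0, 1):
        p_master = (1 - slip) if x == 1 else slip
        p_nonmaster = guess if x == 1 else (1 - guess)
        acc += max(p_master * prior_mastery, p_nonmaster * (1 - prior_mastery))
    return acc

print(expected_attribute_accuracy(slip=0.1, guess=0.2, prior_mastery=0.5))  # 0.85
```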

14.
Neuro-fuzzy networks have been successfully applied to extract knowledge from data in the form of fuzzy rules. However, one drawback with the neuro-fuzzy approach is that the fuzzy rules induced by the learning process are not necessarily understandable. The lack of readability is essentially due to the high dimensionality of the parameter space that leads to excessive flexibility in the modification of parameters during learning. In this paper, to obtain readable knowledge from data, we propose a new neuro-fuzzy model and its learning algorithm that works in a parameter space with reduced dimensionality. The dimensionality of the new parameter space is necessary and sufficient to generate human-understandable fuzzy rules, in the sense formally defined by a set of properties. The learning procedure is based on a gradient descent technique and the proposed model is general enough to be applied to other neuro-fuzzy architectures. Simulation studies on a benchmark and a real-life problem are carried out to embody the idea of the paper.

15.
This article compares the use of single- and multiple-item pools with respect to test security against item sharing among some examinees in computerized testing. A simulation study was conducted to make a comparison among different pool designs using the item selection method of maximum item information with the Sympson-Hetter exposure control and content balance. The results from the simulation study indicate that two-pool designs have a better degree of resistance to item sharing than does the single-pool design in terms of measurement precision in ability estimation. This article further characterizes the conditions under which employing a multiple-pool design is better than using a single, whole pool in terms of minimizing the number of compromised items encountered by examinees under a randomized item selection method. Although no current computerized testing program endorses the randomized item selection method, the results derived in this study can shed some light on item pool designs regarding test security for all item selection algorithms, especially those that try to equalize or balance item exposure rates by employing a randomized item selection method locally, such as the a-stratified-with-b-blocking method.
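For readers unfamiliar with the selection rule named above, here is a minimal sketch of maximum-information selection with the operational Sympson-Hetter exposure filter. The control parameters K_i are assumed to have been obtained beforehand from iterative simulations, and content balancing is omitted; all values are illustrative.

```python
import numpy as np

def select_item_sympson_hetter(info, exposure_k, administered, rng):
    """Maximum-information item selection with Sympson-Hetter exposure control.

    info        : information of every pool item at the current ability estimate.
    exposure_k  : Sympson-Hetter control parameters K_i in (0, 1].
    administered: indices already given to this examinee.
    The most informative eligible item is administered with probability K_i;
    otherwise it is set aside for this examinee and the next candidate is tried.
    """
    available = set(range(len(info))) - set(administered)
    for item in sorted(available, key=lambda i: -info[i]):
        if rng.random() < exposure_k[item]:
            return item
    return None  # pool exhausted by the exposure filter (rare with sensible K values)

rng = np.random.default_rng(3)
info = rng.uniform(0.1, 2.0, size=30)
exposure_k = rng.uniform(0.3, 1.0, size=30)
print(select_item_sympson_hetter(info, exposure_k, administered=[4, 7], rng=rng))
```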

16.
The data obtained from one-way independent groups designs are typically non-normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non-normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non-normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non-normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non-normal.
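As background for the comparison above, here is a minimal implementation of the classical Welch heteroscedastic one-way ANOVA on ordinary means and variances. The robust variant studied in the article additionally substitutes trimmed means and Winsorized variances, and the parametric bootstrap procedures are not shown; the simulated groups are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(groups):
    """Welch's heteroscedastic one-way ANOVA; returns (F, df1, df2, p-value)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([len(g) for g in groups])
    m = np.array([g.mean() for g in groups])
    v = np.array([g.var(ddof=1) for g in groups])
    w = n / v                                   # precision weights
    W = w.sum()
    m_w = (w * m).sum() / W                     # weighted grand mean
    num = (w * (m - m_w) ** 2).sum() / (k - 1)
    lam = (((1 - w / W) ** 2) / (n - 1)).sum()
    F = num / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return F, df1, df2, f_dist.sf(F, df1, df2)

rng = np.random.default_rng(4)
g1 = rng.normal(0.0, 1.0, 20)
g2 = rng.normal(0.4, 2.0, 35)
g3 = rng.normal(0.0, 4.0, 50)
print(welch_anova([g1, g2, g3]))
```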

17.
Human performance in cognitive testing and experimental psychology is expressed in terms of response speed and accuracy. Data analysis is often limited to either speed or accuracy, and/or to crude summary measures like mean response time (RT) or the percentage of correct responses. This paper proposes the use of mixed regression for the psychometric modeling of response speed and accuracy in testing and experiments. Mixed logistic regression of response accuracy extends logistic item response theory modeling to multidimensional models with covariates and interactions. Mixed linear regression of response time extends mixed ANOVA to unbalanced designs with covariates and heterogeneity of variance. Related to mixed regression is conditional regression, which requires no normality assumption but is limited to unidimensional models. Mixed and conditional methods are both applied to an experimental study of mental rotation. Univariate and bivariate analyses show how within-subject correlation between response and RT can be distinguished from between-subject correlation, and how latent traits can be detected, given careful item design or content analysis. It is concluded that both response and RT must be recorded in cognitive testing, and that mixed regression is a versatile method for analyzing test data. I am grateful to Rogier Donders for putting his data at my disposal.
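A minimal sketch of the response-time side of this approach: a mixed linear regression of log RT with a fixed effect of an experimental covariate and a random intercept per subject, using statsmodels. The mixed logistic regression of accuracy is omitted, the data are simulated, and the mental-rotation-style variable names are illustrative rather than those of the original study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: log RT grows with rotation angle, with a subject-specific intercept.
rng = np.random.default_rng(5)
rows = []
for s in range(30):                                   # 30 subjects
    u = rng.normal(0, 0.2)                            # random intercept for this subject
    for angle in (0, 60, 120, 180):
        for _ in range(10):                           # 10 trials per angle
            rows.append({"subject": s, "angle": angle,
                         "log_rt": 6.0 + 0.003 * angle + u + rng.normal(0, 0.3)})
df = pd.DataFrame(rows)

# Mixed linear regression: fixed effect of angle, random intercept per subject.
model = smf.mixedlm("log_rt ~ angle", df, groups=df["subject"]).fit()
print(model.summary())
```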

18.
Inference methods for null hypotheses formulated in terms of distribution functions in general non‐parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set‐up Wald‐type statistics and ANOVA‐type statistics are the current state of the art. The first method is asymptotically exact but a rather liberal statistical testing procedure for small to moderate sample size, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal–Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability.
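To illustrate the permutation principle in its simplest form, the sketch below runs a permutation version of the Kruskal-Wallis test for a one-way layout: the observed rank statistic is compared with its distribution under random reassignment of observations to groups. The paper's studentized statistics and its extension to general factorial designs are not reproduced here, and the simulated groups are illustrative.

```python
import numpy as np
from scipy.stats import kruskal

def permutation_kruskal(groups, n_perm=5000, seed=0):
    """Permutation p-value for the Kruskal-Wallis statistic in a one-way layout."""
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in groups]
    pooled = np.concatenate(groups)
    obs = kruskal(*groups).statistic
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)                        # reassign labels at random
        resampled = np.split(perm, np.cumsum(sizes)[:-1])
        if kruskal(*resampled).statistic >= obs:
            count += 1
    return obs, (count + 1) / (n_perm + 1)                    # permutation p-value

rng = np.random.default_rng(6)
groups = [rng.normal(0, 1, 15), rng.exponential(1.0, 25) - 1.0, rng.normal(0.5, 2, 20)]
print(permutation_kruskal(groups))
```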

19.
In optimal design research, designs are optimized with respect to some statistical criterion under a certain model for the data. The ideas from optimal design research have spread into various fields of research, and recently have been adopted in test theory and applied to item response theory (IRT) models. In this paper a generalized variance criterion is used for sequential sampling in the two-parameter IRT model. Some general principles are offered to enable a researcher to select the best sampling design for the efficient estimation of item parameters.
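A minimal sketch of sequential sampling under a generalized-variance (D-optimality) criterion for calibrating a single two-parameter item: at each step, the examinee whose ability contributes most to the determinant of the accumulated information matrix is sampled. In practice the parameter guesses would be updated as responses arrive; here they are held fixed, and all values are illustrative.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Information matrix contribution of an examinee at `theta` for item parameters (a, b)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    g = np.array([theta - b, -a])
    return p * (1 - p) * np.outer(g, g)

def sequential_d_optimal(pool_thetas, a_guess, b_guess, n_select=40, ridge=1e-6):
    """Greedily pick examinees that maximize det of the accumulated information,
    i.e. minimize the generalized variance of the item-parameter estimates."""
    pool = list(pool_thetas)
    M = np.eye(2) * ridge                      # small ridge so the first determinant is defined
    chosen = []
    for _ in range(n_select):
        dets = [np.linalg.det(M + info_2pl(t, a_guess, b_guess)) for t in pool]
        best = int(np.argmax(dets))
        M += info_2pl(pool[best], a_guess, b_guess)
        chosen.append(pool.pop(best))
    return np.array(chosen)

rng = np.random.default_rng(7)
pool = rng.normal(0, 1.5, size=2000)
selected = sequential_d_optimal(pool, a_guess=1.2, b_guess=0.5)
# Selected abilities should concentrate around two points on either side of b.
print(np.round(np.quantile(selected, [0, .25, .5, .75, 1]), 2))
```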

20.
车文博 《心理科学》2005,28(3):747-754
Response styles are one of the main sources of common method bias. This paper first discusses the definition and types of response styles and reviews their harmful effects, arguing that response styles can bias test scores and distort analyses of test reliability and validity as well as of relationships among variables, so their effects need to be controlled. It then introduces commonly used methods for measuring response styles, which fall into two broad categories, counting methods and model-based methods, and offers suggestions for choosing among them. On this basis, suggestions are given for combining response style measurement with residual regression and partial correlation approaches to control the harmful effects of response styles.
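As a concrete illustration of the counting-method category mentioned above, the sketch below computes simple per-respondent indices for extreme, midpoint, and acquiescent responding on Likert-type items. Column names, scale bounds, and cut-offs are illustrative, and the model-based alternatives discussed in the abstract are not shown.

```python
import pandas as pd

def response_style_counts(ratings, scale_min=1, scale_max=5):
    """Counting-method response style indices for Likert-type items.

    ratings : DataFrame with respondents in rows and items in columns.
    Returns, per respondent, the proportion of extreme responses (ERS),
    midpoint responses (MRS), and agreement responses (ARS).
    """
    midpoint = (scale_min + scale_max) / 2
    ers = ratings.isin([scale_min, scale_max]).mean(axis=1)   # endpoints chosen
    mrs = (ratings == midpoint).mean(axis=1)                  # midpoint chosen
    ars = (ratings > midpoint).mean(axis=1)                   # agreement side chosen
    return pd.DataFrame({"ERS": ers, "MRS": mrs, "ARS": ars})

# Usage: style = response_style_counts(item_responses); these scores can then be
# entered into residual-regression or partial-correlation analyses to control
# the influence of response styles on substantive relationships.
```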
