Similar literature
 Found 20 similar documents.
1.
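Most current research on cognitive diagnostic computerized adaptive testing (CD-CAT), in China and abroad, uses PWKL as the item selection strategy. PWKL weights the KL index with posterior distribution information, which improves classification accuracy, but it weights with examinee-level information only and ignores the information that the items themselves can provide, so it is a single-source index. This study uses item discrimination information from cognitive diagnosis to modify PWKL and proposes four new multi-source item selection strategies: GIDPWKL, AIDPWKL, CIDPWKL, and KLEDPWKL. These are compared with PWKL and the mutual information method (MIM) under exposure control. The simulation results show: (1) In the fixed-length test scenario, the vast majority of experimental results show that the shorter the test, the higher the classification accuracy of the new methods. GIDPWKL has the highest average attribute/pattern classification accuracy, followed by AIDPWKL, while the advantages of CIDPWKL, KLEDPWKL, and MIM vary with the experimental conditions. (2) In the fixed-length test scenario, the vast majority of experimental results show that the higher the item quality, the more pronounced the advantage of the new methods. (3) The complexity of the Q-matrix structure affects the performance of the different item selection strategies. (4) In the variable-length test scenario, the average test lengths of the four new methods and MIM are all lower than that of PWKL, with GIDPWKL performing best. Therefore, if the actual testing scenario resembles the simulated scenarios of this study, the GIDPWKL method is recommended.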
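To make the baseline concrete, the sketch below computes a posterior-weighted KL (PWKL) index for two candidate items. It is only an illustration under stated assumptions: the abstract does not name a diagnostic model, so a DINA model with hypothetical slip and guess parameters is assumed here, and the four new discrimination-weighted strategies are not reproduced because the abstract does not give their formulas.

```python
import itertools
import math

def dina_prob(alpha, q, slip, guess):
    """P(correct) under an assumed DINA model for attribute pattern alpha and Q-matrix row q."""
    has_all_required = all(a >= qk for a, qk in zip(alpha, q))
    return 1.0 - slip if has_all_required else guess

def kl_bernoulli(p, r):
    """KL divergence between two Bernoulli response distributions with success rates p and r."""
    return p * math.log(p / r) + (1.0 - p) * math.log((1.0 - p) / (1.0 - r))

def pwkl(item, alpha_hat, patterns, posterior):
    """Posterior-weighted KL index of one item at the current estimate alpha_hat."""
    q, slip, guess = item
    p_hat = dina_prob(alpha_hat, q, slip, guess)
    return sum(w * kl_bernoulli(p_hat, dina_prob(alpha, q, slip, guess))
               for alpha, w in zip(patterns, posterior))

# Hypothetical setup: two attributes, uniform posterior, two items with the same Q-row.
patterns = list(itertools.product([0, 1], repeat=2))
posterior = [0.25] * len(patterns)
items = [((1, 0), 0.1, 0.1),   # higher-quality item (small slip and guess)
         ((1, 0), 0.3, 0.3)]   # lower-quality item
alpha_hat = (1, 0)

scores = [pwkl(item, alpha_hat, patterns, posterior) for item in items]
best = max(range(len(items)), key=scores.__getitem__)
```

In this toy setup the item with the smaller slip and guess parameters receives the larger index, so it would be administered next.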

2.
Two nomographs are presented for estimating item validity indices identical in value to those obtained from Flanagan's table and to those obtained from Davis' chart. Experience has shown that the use of the nomographs results in the saving of a significant amount of time with no loss in accuracy. The nomographs also provide a method of quick conversion between the familiar coefficients and the Davis indices, which are less familiar but which offer greater flexibility. © Robert M. Colver, 1959.

3.
4.
Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item and ability parameters. Simulated data sets were analyzed via two joint and two marginal Bayesian estimation procedures. The marginal Bayesian estimation procedures yielded consistently smaller root mean square differences than the joint Bayesian estimation procedures for item and ability estimates. As the sample size and test length increased, the four Bayes procedures yielded essentially the same result. The authors wish to thank the Editor and anonymous reviewers for their insightful comments and suggestions.

5.
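For dual-objective CD-CAT, six item discrimination indices (discrimination power D, general discrimination index GDI, odds ratio OR, the 2PL discrimination parameter a, attribute discrimination index ADI, and cognitive diagnostic discrimination index CDI) were each combined with the IPA method to obtain new item selection strategies. A simulation study compared their performance and also examined how stratification by discrimination performs in controlling item exposure. The results show that the new methods all clearly improve the classification accuracy of knowledge states and the precision of ability estimation, and that stratified item selection in every case substantially improves item pool utilization. Overall, OR weighting significantly improves measurement precision, and OR-stratified item selection significantly improves the uniformity of item exposure while maintaining measurement precision.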

6.
Caution indices based on item response theory   Total citations: 2 (self-citations: 0; citations by others: 2)
A new family of indices was introduced earlier as a link between two approaches: one based on item response theory and the other on sample statistics. In this study, the statistical properties of these indices are investigated, and then their relationships to Guttman scales and to item and person response curves are discussed. Further, these indices are standardized, and an example of their potential usefulness for diagnosing students' misconceptions is shown. This research was sponsored by the Personnel and Training Research Program, Psychological Sciences Division, Office of Naval Research, under contract No. N00014-82-K-0604.

7.
te Pas SF, Koenderink JJ. Perception, 2004, 33(12): 1483-1497
Human observers seem to be able to use different features that classify materials with a large degree of accuracy. In this paper, we look at human perception of statistical properties of the spectral distribution in a scene. We investigated whether human observers can discriminate just as accurately between coloured textures that have a spectral distribution due either to shading only or to both shading and specular reflectance as between uniform colours. Thresholds for the discrimination of coloured textures are about 15 times as high as thresholds for the discrimination of uniform colours, provided there is a sharp transition between the two colours. However, the coloured texture thresholds are only 1.5 times higher when we introduce a gradual transition between the two colours. There are also distinct qualitative differences in discrimination thresholds for different base colours. These differences cannot be predicted from discrimination thresholds for uniform colours. Human observers are surprisingly good at discriminating between a material edge and a shadow edge in complex scenes. Statistical differences in the orientation of the colour distributions in colour space might be used to accomplish this. In a second experiment we investigated how well observers can discriminate between two linear distributions in colour space that have the same base colour but different orientations. When we vary the line-length in R, G, B space, thresholds cannot be predicted completely by the conservation of the average distance between the two distributions. This means that observers use not only the maximum colour difference in the stimulus to do the task, but other cues are also involved.

8.
The categorical discrimination of synthetic human speech sounds by rhesus macaques was examined using the cardiac component of the orienting response. A within-category change, consisting of stimuli that differ acoustically in the onset of F2 and F3 transitions but are identified by humans as belonging to the same phonetic category, was responded to differently from a no-change control condition. Stimuli which differed by the same amount in the onset of F2 and F3 transitions, but which human observers identify as belonging to separate phonetic categories, were differentiated to an even greater degree than the within-category stimuli. The results provide ambiguous data for an articulatory model of human speech perception and are interpreted instead in terms of a feature-detector model of auditory perception.

9.
10.
Item response theory (IRT) models are now in common use for the analysis of dichotomous item responses. This paper examines the sampling theory foundations for statistical inference in these models. The discussion includes: some history on the stochastic subject versus the random sampling interpretations of the probability in IRT models; the relationship between three versions of maximum likelihood estimation for IRT models; estimating θ versus estimating θ-predictors; IRT models and loglinear models; the identifiability of IRT models; and the role of robustness and Bayesian statistics from the sampling theory perspective. A presidential address can serve many different functions. This one is a report of investigations I started at least ten years ago to understand what IRT was all about. It is a decidedly one-sided view, but I hope it stimulates controversy and further research. I have profited from discussions of this material with many people including: Brian Junker, Charles Lewis, Nicholas Longford, Robert Mislevy, Ivo Molenaar, Donald Rock, Donald Rubin, Lynne Steinberg, Martha Stocking, William Stout, Dorothy Thayer, David Thissen, Wim van der Linden, Howard Wainer, and Marilyn Wingersky. Of course, none of them is responsible for any errors or misstatements in this paper. The research was supported in part by the Cognitive Science Program, Office of Naval Research under Contract No. N00014-87-K-0730 and by the Program Statistics Research Project of Educational Testing Service.

11.
In optimal design research, designs are optimized with respect to some statistical criterion under a certain model for the data. The ideas from optimal design research have spread into various fields of research, and recently have been adopted in test theory and applied to item response theory (IRT) models. In this paper a generalized variance criterion is used for sequential sampling in the two-parameter IRT model. Some general principles are offered to enable a researcher to select the best sampling design for the efficient estimation of item parameters.
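A generalized variance criterion is commonly read as D-optimality: minimize the determinant of the covariance matrix of the estimates, or equivalently maximize the determinant of the Fisher information matrix. The sketch below shows one sequential step for a single two-parameter logistic item. It is a generic illustration, not the paper's procedure: the provisional parameter values `a_guess` and `b_guess`, the candidate ability levels, and all function names are assumptions made for the example.

```python
import math

def information_2pl(theta, a, b):
    """Fisher information matrix for (a, b) from one response at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    g = (theta - b, -a)  # gradient of the logit a*(theta - b) with respect to (a, b)
    return [[w * g[i] * g[j] for j in range(2)] for i in range(2)]

def add(m, n):
    """Element-wise sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def next_design_point(info, candidates, a, b):
    """Pick the candidate ability whose response would most increase det(information)."""
    return max(candidates, key=lambda t: det2(add(info, information_2pl(t, a, b))))

# Hypothetical provisional item parameters and one examinee already sampled at theta = -1.5.
a_guess, b_guess = 1.0, 0.0
info = information_2pl(-1.5, a_guess, b_guess)
chosen = next_design_point(info, [-1.5, 0.0, 1.5], a_guess, b_guess)
```

With a single prior observation the information matrix is singular, so repeating the same ability level adds nothing to the determinant; the step instead selects an examinee on the opposite side of the item's difficulty.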

12.
A mixture extension of signal detection theory is applied to source discrimination. The basic idea of the approach is that only a portion of the sources (say A or B) of items to be discriminated is encoded or attended to during the study period. As a result, in addition to 2 underlying probability distributions associated with the 2 sources, there is a 3rd distribution that represents items for which sources were not attended to. Thus, over trials, the observed response results from a mixture of an attended (A or B) distribution and a nonattended distribution. The situation differs in an interesting way from detection in that, for detection, there is mixing only on signal trials and not on noise trials, whereas for discrimination, there is mixing on both A and B trials. Predictions of the mixture model are examined for data from several recent studies and in a new experiment.

13.
Previous studies of sampling distributions have been conducted almost exclusively under the assumption that persons behave in accordance with the "fundamental convention" of probability, i.e. that the sum of all probability estimates will equal 1. When this assumption was tested by asking subjects to give "unrestricted" probability estimates of all possible outcomes of samples from a given population, a general tendency of overestimation made the sum of all probabilities exceed 1 to a considerable extent. The subjective sampling distributions appeared to be unaffected by sample size (N = 5 or 10) and number of outcomes, and were flatter than the corresponding "objective" sampling distributions.

14.
A test theory using only ordinal assumptions is presented. It is based on the idea that the test items are a sample from a universe of items. The sum across items of the ordinal relations for a pair of persons on the universe items is analogous to a true score. Using concepts from ordinal multiple regression, it is possible to estimate the tau correlations of test items with the universe order from the taus among the test items. These in turn permit the estimation of the tau of total score with the universe. It is also possible to estimate the odds that the direction of a given observed score difference is the same as that of the true score difference. The estimates of the correlations between items and universe and between total score and universe are found to agree well with the actual values in both real and artificial data. Part of this paper was presented at the June, 1989, Meeting of the Psychometric Society. The authors wish to thank several reviewers for their suggestions. This research was mainly done while the second author was a University Fellow at the University of Southern California.

15.
For a discrimination experiment, a plot of the hit rate against the false-alarm rate--the ROC curve--summarizes performance across a range of confidence levels. In many content areas, ROCs are well described by a normal-distribution model and the z-transformed hit and false-alarm rates are approximately linearly related. We examined the sampling distributions of three parameters of this model when applied to a ratings procedure: the area under the ROC (Az), the normalized difference between the means of the underlying signal and noise distributions (da), and the slope of the ROC on z-coordinates (s). Statistical bias (the degree to which the mean of the sampling distribution differed from the true value) was trivial for Az, small but noticeable for da, and substantial for s. Variability of the sampling distributions decreased with the number of trials and was also affected by the number of response categories available to the participant and by the overall sensitivity level. Figures in the article and tables available on line can be used to construct confidence intervals around ROC statistics and to test statistical hypotheses.
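The three statistics named above follow from the line fitted to the z-transformed rates. With slope s and intercept a on z-coordinates, the usual normal-model relations are da = a·sqrt(2 / (1 + s²)) and Az = Φ(da / sqrt(2)). The sketch below is a minimal illustration that fits the line by ordinary least squares, which is a simplification chosen for brevity and not necessarily the estimation method used in the article; the function names and the example values of `mu`, `sigma`, and `criteria` are assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def zroc_fit(hit_rates, fa_rates):
    """Least-squares line z(H) = intercept + slope * z(F) through the ROC points."""
    zh = [STD.inv_cdf(h) for h in hit_rates]
    zf = [STD.inv_cdf(f) for f in fa_rates]
    n = len(zh)
    mean_f, mean_h = sum(zf) / n, sum(zh) / n
    slope = (sum((x - mean_f) * (y - mean_h) for x, y in zip(zf, zh))
             / sum((x - mean_f) ** 2 for x in zf))
    return slope, mean_h - slope * mean_f

def roc_statistics(hit_rates, fa_rates):
    """Return Az, da and s for the normal-distribution ROC model."""
    s, intercept = zroc_fit(hit_rates, fa_rates)
    d_a = intercept * math.sqrt(2.0 / (1.0 + s * s))
    a_z = STD.cdf(d_a / math.sqrt(2.0))
    return {"Az": a_z, "da": d_a, "s": s}

# Noise-free ROC points from a hypothetical model: noise ~ N(0, 1), signal ~ N(mu, sigma^2).
mu, sigma = 1.5, 1.25
criteria = [0.0, 0.5, 1.0, 1.5]
fa = [1.0 - STD.cdf(c) for c in criteria]
hits = [1.0 - STD.cdf((c - mu) / sigma) for c in criteria]
stats = roc_statistics(hits, fa)
```

Because the example points lie exactly on the model's ROC, the fit recovers s = 1/sigma and da = sqrt(2)·mu / sqrt(1 + sigma²); with real ratings data the estimates would carry the sampling bias and variability the abstract describes.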

16.
17.
18.
The purpose of this note is to reconsider the Kelley-Cureton definition of optimal extreme groups for estimating item-criterion correlations. Optimal tail per cents are derived, using the criterion of minimum sampling variance of the tetrachoric correlation coefficient, and the findings are related to earlier work of Mosteller. It is shown that upper and lower 27 per cent groups yield the most precise estimate of the tetrachoric coefficient only when the population correlation is close to zero. When the population value is .4, extreme 20 per cent groups provide estimates with the smallest sampling error variance. It is further shown, however, that 27 per cent extremes yield highly efficient estimates. Thus no change is recommended in traditional item analysis procedures.

19.
The many null distributions of person fit indices   Total citations: 1 (self-citations: 0; citations by others: 1)
This paper deals with the situation of an investigator who has collected the scores of n persons to a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors consisting of k symbols each equal to 0 or 1, it is desired to classify each respondent as either anomalous or as conforming to the model. As this model is probabilistic, such a classification will possibly involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to be rather different for different values of the person's latent trait value. This value will be called ability parameter, although our results are equally valid for Rasch scales measuring other attributes. As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution.
The paper describes three specifications for the latter: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which is in the Rasch model the sufficient statistic for the ability parameter. Classification rules for aberrance will be worked out for each of the three specifications. Depending on test length, item parameters and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate and a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications. The authors express their gratitude to the reviewers and to many colleagues for comments on an earlier version.
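The recommended specification, conditioning on the total score, can be sketched for a short test by exact enumeration. Under the Rasch model, the probability of a response pattern given its total score is proportional to exp(-Σ x_i b_i) and does not depend on ability, so the null distribution of the pattern likelihood can be tabulated over all patterns with the same score. The code below is a minimal illustration under that assumption; the function name, the item difficulties, and the use of a lower-tail probability as the classification quantity are choices made for the example, not details taken from the paper.

```python
import itertools
import math

def conditional_tail_probability(responses, difficulties):
    """Exact P(pattern likelihood <= observed | total score) under the Rasch model.

    Small values flag response patterns that are unlikely given their total score.
    """
    k = len(responses)
    score = sum(responses)

    def weight(correct_items):
        # Unnormalized conditional probability of a pattern with these items correct.
        return math.exp(-sum(difficulties[i] for i in correct_items))

    observed = weight([i for i, x in enumerate(responses) if x == 1])
    total = 0.0
    tail = 0.0
    for correct_items in itertools.combinations(range(k), score):
        w = weight(correct_items)
        total += w
        if w <= observed * (1.0 + 1e-12):  # small tolerance for floating-point ties
            tail += w
    return tail / total

# Hypothetical five-item test with known difficulties, ordered from easy to hard.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_consistent = conditional_tail_probability([1, 1, 0, 0, 0], difficulties)
p_aberrant = conditional_tail_probability([0, 0, 0, 1, 1], difficulties)
```

Both example patterns have a total score of 2, yet answering only the two hardest items correctly is far less probable under the model than answering only the two easiest. Enumeration grows combinatorially with test length, which is where a Monte Carlo estimate or a moment-based approximation, as mentioned in the abstract, would take over.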

20.
The current study examined the impact of both the tendency to worry (trait worry) and the process of worry (state worry) on subsequent behavioral responding in a schedule discrimination learning task. High and low trait worriers were randomly assigned to a state worry or relaxation incubation condition and completed a test of executive functioning and a dual contingency learning task that utilized neutral discriminative cues over the course of 2 contingency phases. Although state and trait worry did not impact executive functioning, the state worry condition was associated with diminished sensitivity to learning task contingencies over the course of the first contingency learning trials in comparison to the relaxation condition. This relationship was unique to the state worry condition above and beyond shared variance with subjective anxiety level. Results suggest that state worry may lead to a decrement in selective behavioral responding to neutral discriminative cues in the environment. The findings suggest that the process of worry may lead to less adaptive responding to neutral cues and interfere with adaptive behaviors, which may thereby contribute to and maintain anxiety.
