Similar Documents
1.
This paper proposes an on‐line version of the Sympson and Hetter procedure with test overlap control (SHT) that can provide item exposure control at both the item and test levels on the fly without iterative simulations. The on‐line procedure is similar to the SHT procedure in that exposure parameters are used for simultaneous control of item exposure rates and test overlap rate. The exposure parameters for the on‐line procedure, however, are updated sequentially on the fly, rather than through iterative simulations conducted prior to operational computerized adaptive tests (CATs). Unlike the SHT procedure, the on‐line version can control item exposure rate and test overlap rate without time‐consuming iterative simulations even when item pools or examinee populations have been changed. Moreover, the on‐line procedure was found to perform better than the SHT procedure in controlling item exposure and test overlap for examinees who take tests earlier. Compared with two other on‐line alternatives, this proposed on‐line method provided the best all‐around test security control. Thus, it would be an efficient procedure for controlling item exposure and test overlap in CATs.

2.
We show that if overall sample size and effect size are held constant, the power of the F test for a one-way analysis of variance decreases dramatically as the number of groups increases. This reduction in power is even greater when the groups added to the design do not produce treatment effects. If a second independent variable is added to the design, either a split-plot or a completely randomized design may be employed. For the split-plot design, we show that the power of the F test on the between-groups factor decreases as the correlation across the levels of the within-groups factor increases. The attenuation in between-groups power becomes more pronounced as the number of levels of the within-groups factor increases. Sample size and total cost calculations are required to determine whether the split-plot or the completely randomized design is more efficient in a particular application. The outcome hinges on the cost of obtaining (or recruiting) a single subject relative to the cost of obtaining a single observation: we call this the subject-to-observation cost (SOC) ratio. Split-plot designs are less costly than completely randomized designs only when the SOC ratio is high, the correlation across the levels of the within-groups factor is low, and the number of such levels is small.
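The first claim (power falls as groups are added while total N and effect size stay fixed) can be checked with a small Monte Carlo sketch. It assumes normal data with unit within-group variance, total N = 60, an even number of equal-size groups, and group means placed at plus or minus 0.25 so that Cohen's f = 0.25 for every number of groups. The critical value is estimated by simulation under the null so that only the standard library is needed; these settings are illustrative and are not the paper's.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F = MS_between / MS_within for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [statistics.fmean(g) for g in groups]
    grand = sum(len(g) * m for g, m in zip(groups, means)) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_f(k, n_total, f_effect, reps, rng):
    """Simulated F values for k equal-size groups (k even), sigma = 1.

    Half the group means sit at +f_effect and half at -f_effect, so the
    standard deviation of the means (Cohen's f) equals f_effect for every k.
    """
    n = n_total // k
    means = [f_effect if j % 2 == 0 else -f_effect for j in range(k)]
    return [f_statistic([[rng.gauss(mu, 1.0) for _ in range(n)] for mu in means])
            for _ in range(reps)]


def power(k, n_total=60, f_effect=0.25, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo power; the critical F is the empirical null (1 - alpha) quantile."""
    rng = random.Random(seed)
    null = sorted(simulate_f(k, n_total, 0.0, reps, rng))
    critical = null[int((1 - alpha) * reps)]
    alt = simulate_f(k, n_total, f_effect, reps, rng)
    return sum(f > critical for f in alt) / reps
```

Running `for k in (2, 6, 10): print(k, power(k))` should show power shrinking as k grows, because the same noncentrality is spread over more numerator degrees of freedom.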

3.
The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP), a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that GP can find a formula relating the item exposure parameters of a pool to its item parameters. The item exposure parameters predicted from this formula were close to those obtained with the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and for the Sympson and Hetter procedure with content balancing. The proposed GP approach provides a knowledge‐based solution for finding item exposure parameters.

4.
In this article, four item selection methods in computerized adaptive testing are examined in terms of classification accuracy and consistency, including two popular heuristics for constraint management, the maximum priority index (MPI) method and the weighted deviation modeling method, as well as the widely known maximum Fisher information method and randomized item selection as baselines. Results suggest that the MPI method is able to meet constraints and keep test overlap rate low. Among the four methods, it is the only one that manages to produce parallel forms in terms of content coverage and, consequently, the only method to which the idea of classification consistency applies. With tests as short as 12 items, the MPI method does fairly well in classifying examinees accurately and consistently. Its performance improves with longer tests. The effects of number of decision categories and cut score locations are also examined. Recommendations are made in the Discussion section.
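A minimal sketch of the maximum priority index idea, following Cheng and Chang's (2009) formulation as we understand it and restricted to upper-bound content constraints: each item's Fisher information is multiplied, for every content area the item belongs to, by a weight times the remaining room under that area's bound, so items from saturated areas get a zero index. The 2PL model with D = 1, the item dictionaries, and the function names are assumptions for illustration.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (scaling constant D = 1 assumed)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def priority_index(item, theta, counts, upper, weights):
    """PI_i = I_i(theta) * product over the item's content areas k of (w_k * f_k),
    where f_k = (upper_k - counts_k) / upper_k is the remaining room under an
    upper-bound constraint; a saturated area drives the index to zero."""
    pi = info_2pl(theta, item["a"], item["b"])
    for k in item["areas"]:
        f_k = (upper[k] - counts[k]) / upper[k]
        pi *= weights[k] * f_k
    return pi


def select_mpi(pool, theta, counts, upper, weights, used):
    """Return the unused item with the largest priority index."""
    candidates = [item for item in pool if item["id"] not in used]
    return max(candidates,
               key=lambda item: priority_index(item, theta, counts, upper, weights))
```

Because the index shrinks smoothly as an area fills up, the method steers selection toward under-represented content without hard-coding a content sequence.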

5.
6.
Content balancing is one of the most important issues in computerized classification testing. In variable-length forms the test length is unknown during the test, so special treatment is needed to control content constraints successfully. To this end, we propose the notions of ‘look-ahead’ and ‘step size’ to adaptively control content constraints at each item selection step. The step size predicts the number of items to be selected at the current stage, that is, how far ahead we look. Two look-ahead content balancing (LA-CB) methods, one with a constant step size and one with an adaptive step size, are proposed as feasible solutions for balancing content areas in variable-length computerized classification testing. The proposed LA-CB methods are compared with conventional item selection methods in variable-length tests and are examined with different classification methods. Simulation results show that, when integrated with heuristic item selection methods, the proposed LA-CB methods produce fewer constraint violations and maintain higher classification accuracy. In addition, the LA-CB method with an adaptive step size outperforms the one with a constant step size in content management. Furthermore, the LA-CB methods yield higher test efficiency when used with the sequential probability ratio test classification method.

7.
Item pools or item banks used in most testing situations are inherently multidimensional. This is especially a problem in computerized adaptive testing (CAT), which is driven by item response theory; item response theory requires that the item pool be unidimensional. This series of computer simulations demonstrates how alternative item-presentation controls (content-balancing and “mini-CATs”) may be employed in CAT to estimate ability accurately in spite of the violation of unidimensionality. Averaged, shorter mini-CATs provide the most accurate estimation of ability and ameliorate problems intrinsic to violating the unidimensionality assumption of item response theory.

8.
There has recently been much interest in computerized adaptive testing (CAT) for cognitive diagnosis. While there exist various item selection criteria and different asymptotically optimal designs, these are mostly constructed based on asymptotic theory that assumes the test length goes to infinity. In practice, with limited test lengths, the desired asymptotic optimality may not always apply, and there are few studies in the literature concerning optimal designs for a finite number of items. Related questions, such as how many items are needed to identify an examinee's attribute pattern and what types of initial items provide the optimal classification results, are still open. This paper aims to answer these questions by providing a non‐asymptotic theory of the optimal selection of initial items in cognitive diagnostic CAT. In particular, for the optimal design, we provide necessary and sufficient conditions on the Q‐matrix structure of the initial items. The theoretical development is suitable for a general family of cognitive diagnostic models. The results not only provide a guideline for the design of optimal item selection procedures, but also may be applied to guide item bank construction.

9.
In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to maximize item bank security, the most convenient valuating function seems to be the matching criterion, which evaluates the distance between the estimated trait level and the point where the information function reaches its maximum. Recently, it has been proposed not to keep the same valuating function for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that manipulating the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. Selecting several items with the matching criterion can greatly improve item bank security with much smaller losses in accuracy. In general, it seems more appropriate not to stratify the bank.
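The difference between the two valuating functions can be illustrated with a short sketch under the three-parameter logistic model. It assumes the scaling constant D = 1.7 and items given as (a, b, c) tuples, and it uses the standard result that a 3PL item's information peaks at θ = b + ln[(1 + √(1 + 8c)) / 2] / (Da); the function names are hypothetical and the study's exact implementation may differ.

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at theta."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def theta_max(a, b, c):
    """Trait level at which a 3PL item's information is maximal."""
    return b + math.log((1 + math.sqrt(1 + 8 * c)) / 2) / (D * a)


def pick(pool, theta_hat, criterion):
    """Select an item by maximum Fisher information or by the matching criterion."""
    if criterion == "fisher":
        return max(pool, key=lambda item: info_3pl(theta_hat, *item))
    if criterion == "matching":
        return min(pool, key=lambda item: abs(theta_hat - theta_max(*item)))
    raise ValueError(f"unknown criterion: {criterion}")
```

Administering the first few items with `"matching"` and the rest with `"fisher"` mimics the kind of combination the study manipulates. The matching criterion ignores the a parameter's size, so it spreads selection over low-discrimination items and improves bank security.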

10.
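Ren He  Huang Yingshi  Chen Ping 《Advances in Psychological Science》2022,30(5):1168-1182
Computerized classification testing (CCT) classifies examinees efficiently and has been widely used in mastery (pass/fail) testing and in clinical psychology. As a key component of CCT, the termination rule determines when the test stops and which category the examinee is finally assigned to, and therefore directly affects test efficiency and classification accuracy. The core ideas of the three existing families of termination rules (likelihood ratio rules, Bayesian decision theory rules, and confidence interval rules) are, respectively, constructing a hypothesis test, designing a loss function, and comparing the position of a confidence interval relative to the cut score. Meanwhile, CCT termination rules have developed different specific forms in different testing contexts. Future research could further develop Bayesian rules, consider multidimensional and multi-category settings, and incorporate response times and machine learning algorithms. With respect to practical testing needs, all three families of termination rules have potential for application in mastery testing, whereas clinical questionnaires tend to favor Bayesian rules.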
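Of the three families of termination rules described above, the likelihood ratio rule is the easiest to illustrate. The sketch below implements Wald's sequential probability ratio test (SPRT) for a two-category CCT, assuming 2PL items given as (a, b) pairs, an indifference region [cut − δ, cut + δ] around the cut score, and nominal error rates α and β; all names and default values are hypothetical.

```python
import math


def p2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (D = 1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def sprt_decision(responses, items, cut=0.0, delta=0.2, alpha=0.05, beta=0.05):
    """Return 'pass', 'fail', or 'continue' given the scored responses so far."""
    lo, hi = cut - delta, cut + delta
    log_lr = 0.0  # log likelihood ratio of theta = hi versus theta = lo
    for u, (a, b) in zip(responses, items):
        p_hi, p_lo = p2pl(hi, a, b), p2pl(lo, a, b)
        if u == 1:
            log_lr += math.log(p_hi / p_lo)
        else:
            log_lr += math.log((1 - p_hi) / (1 - p_lo))
    upper = math.log((1 - beta) / alpha)  # cross it: classify as above the cut
    lower = math.log(beta / (1 - alpha))  # cross it: classify as below the cut
    if log_lr >= upper:
        return "pass"
    if log_lr <= lower:
        return "fail"
    return "continue"
```

The test stops as soon as the function returns anything other than `"continue"`; in practice a maximum test length is also imposed, with a forced classification if it is reached.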

11.
The goal of this study is to compare the handwriting behaviours of true and false writing. Based on the cognitive load and dis‐automaticity known to be experienced while communicating a deceptive message, we hypothesized that the handwriting of true and false messages would differ in temporal, spatial, and pressure measures and in peak velocities. Thirty‐four participants wrote true and false sentences on a digitizer that is part of a new system called the Computerized Penmanship Evaluation Tool (ComPET). The ComPET evaluates brain‐hand performance as manifested through handwriting behaviour, and has been found to be a valid measure for detecting the dis‐automaticity that is indicative of certain diseases in the clinical field. Differences were found in mean pressure and in spatial measures (mean stroke length and mean stroke height), but not in temporal measures or in the number of peak velocities. The use of ComPET in lie detection is discussed. Copyright © 2009 John Wiley & Sons, Ltd.

12.
A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The Bayesian checks had higher detection rates than the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered. This study received funding from the Law School Admission Council (LSAC). The opinions and conclusions contained in this paper are those of the authors and do not necessarily reflect the policy and position of LSAC. The authors are most indebted to Wim M. M. Tielen for his computational assistance and to the US Defense Manpower Data Center for the permission to use the ASVAB data set in the empirical examples.
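A simple residual check of the classical kind can be sketched under the lognormal response-time model ln T_ij ~ N(β_i − τ_j, 1/α_i²), where β_i is item time intensity, τ_j examinee speed, and α_i an item discrimination-like precision. The sketch assumes the parameters are already estimated and flags any item whose standardized log-time residual exceeds a two-sided 5% normal cutoff; it illustrates the idea and is not necessarily the authors' exact statistic.

```python
import math


def rt_residuals(times, alphas, betas, tau):
    """Standardized log-RT residuals: z_i = alpha_i * (ln t_i - (beta_i - tau))."""
    return [a * (math.log(t) - (b - tau)) for t, a, b in zip(times, alphas, betas)]


def flag_aberrant(times, alphas, betas, tau, z_crit=1.96):
    """Indices of items whose response time is unusually fast or slow."""
    z = rt_residuals(times, alphas, betas, tau)
    return [i for i, zi in enumerate(z) if abs(zi) > z_crit]
```

Negative residuals point to unusually fast responses (for example, possible item preknowledge), positive ones to unusually slow responses; because responses and times are conditionally independent, these flags complement person-fit checks on the response pattern.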

13.
Comparability and validity of computerized adaptive testing with the MMPI-2
The comparability and validity of a computerized adaptive (CA) Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 571 undergraduate college students. The CA MMPI-2 adaptively administered Scales L and F, the 10 clinical scales, and the 15 content scales, using the countdown method (Butcher, Keller, & Bacon, 1985). All subjects completed the MMPI-2 twice, under one of three experimental conditions: booklet test-retest, booklet-CA, and conventional computerized (CC)-CA. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of the three forms. Correlations between MMPI-2 scales and other psychometric measures (Beck Depression Inventory; Symptom Checklist-Revised; State-Trait Anxiety and Anger Scales; and the Anger Expression Scale) support the validity of the CA MMPI-2. Substantial item savings may be realized by implementing the countdown procedure.
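The item savings come from stopping each scale early. A minimal sketch of the classification form of the countdown rule, as we understand it: administer a scale's items one at a time and stop as soon as the keyed raw score reaches the elevation cutoff, or as soon as the remaining items could no longer bring the score up to it. The `answer` callable and the cutoff value are hypothetical inputs for illustration.

```python
def countdown(scale_items, answer, cutoff):
    """Administer one scale with the countdown (classification) rule.

    scale_items: item IDs for the scale, in administration order.
    answer(item): returns 1 if the response is scored in the keyed direction, else 0.
    cutoff: raw score at or above which the scale counts as elevated.
    Returns (elevated, number_of_items_administered).
    """
    score = 0
    for n_given, item in enumerate(scale_items, start=1):
        score += answer(item)
        remaining = len(scale_items) - n_given
        if score >= cutoff:
            return True, n_given   # elevation already reached
        if score + remaining < cutoff:
            return False, n_given  # elevation can no longer be reached
    return score >= cutoff, len(scale_items)
```

For a 10-item scale with a cutoff of 3, an examinee who endorses nothing in the keyed direction is classified as not elevated after 8 items, saving 2 items on that scale.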

14.
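Zhang Xueqin  Mao Xiuzhen  Li Jia 《Advances in Psychological Science》2020,28(11):1970-1978
Item replenishment is an important means of building and maintaining item banks, and calibrating the parameters of new items is a central part of item replenishment. Online calibration designs and online calibration methods, which address how new items are administered and how their parameters are estimated, respectively, are the core techniques for item replenishment in computerized adaptive testing (CAT). This article traces the lines of development of online calibration designs and online calibration methods, and introduces and evaluates their characteristics, interrelations, and performance. Future work should study online calibration designs based on other information indices, explore online calibration methods built on joint estimation and error correction, strengthen research on online calibration techniques for cognitive diagnostic CAT and multidimensional CAT, and carry out more empirical studies of item replenishment methods.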

15.
A computerized adaptive version and the standard version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were administered 1 week apart to a sample of 155 college students to assess the comparability of the two versions. The countdown method was used to adaptively administer Scales L and F, the 10 clinical scales, and the 15 new content scales. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of computerized adaptive and conventional testing with the MMPI-2. Substantial item savings were found with the adaptive version. Future directions in the study of adaptive testing with the MMPI-2 are discussed.

16.
Most computer interviewing and testing systems have adopted paper-and-pencil approaches to information gathering with little modification. However, computer technology offers two fundamental advantages over paper-and-pencil technology for psychological information gathering: (1) A computer can record ancillary data such as latencies and pressure on response keys during an interviewing session, and (2) A computer can react adaptively to special events as these arise during a session. Ways to capitalize on these advantages are outlined. A pilot study of interviewee behavior during a computer problem-screening interview is described, and the implications of the results for future research in the area are discussed. Passive and active computer testing systems occupy positions on a continuum between paper-based psychological testing and the flexible, but less well controlled, technology represented by the human. With its unique capabilities, computer technology has a special role to play in the future of psychological measurement.

17.
Seven hundred three members of the Society for Industrial and Organizational Psychology indicated agreement or disagreement with 49 propositions regarding cognitive ability tests in organizations. There was consensus that cognitive ability tests are valid and fair, that they provide good but incomplete measures, that different abilities are necessary for different jobs, and that diversity is valuable. Items dealing with the unique status of cognitive ability were most likely to generate polarized opinions. A 2-factor model, classifying items as those reflecting societal concerns over the consequences of ability testing and those reflecting an emphasis on the unique status of "g," fit the data well, and these factors proved especially important for predicting responses to the more controversial items.

18.
Several ways of using the traditional analysis of variance to test heterogeneity of spread in factorial designs with equal or unequal n are compared using both theoretical and Monte Carlo results. Two types of spread variables, (1) the jackknife pseudovalues of s² and (2) the absolute deviations from the cell median, are shown to be robust and relatively powerful. These variables seem to be generally superior to the Z-variance and Box-Scheffé procedures. This research was sponsored by Public Health Service Training Grant MH-08258 from the National Institute of Mental Health. The author thanks Mark I. Appelbaum, Elliot M. Cramer, and Scott E. Maxwell for their helpful criticisms of this paper. An earlier version of this work was presented at the Annual Meeting of the Psychometric Society, Murray Hill, New Jersey, April, 1976.
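Both spread variables are computed cell by cell and then replace the raw scores in an ordinary ANOVA; a significant effect on the transformed scores indicates heterogeneity of spread. The sketch below computes the two transformations for one cell, assuming the cell is a list of at least three numbers; the function names are hypothetical.

```python
import statistics


def jackknife_pseudovalues_var(cell):
    """Jackknife pseudovalues of the sample variance s^2 within one cell.

    Pseudovalue i = n * s^2 - (n - 1) * s^2_(-i), where s^2_(-i) omits observation i.
    """
    cell = list(cell)
    n = len(cell)
    s2 = statistics.variance(cell)
    return [n * s2 - (n - 1) * statistics.variance(cell[:i] + cell[i + 1:])
            for i in range(n)]


def abs_dev_from_median(cell):
    """Absolute deviations of each observation from the cell median."""
    med = statistics.median(cell)
    return [abs(x - med) for x in cell]
```

Because s² is unbiased, its pseudovalues average exactly to s² in each cell, so cell means of the pseudovalues estimate the cell variances that the ANOVA compares.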

19.
Barrada JR  Mazuela P  Olea J 《Psicothema》2006,18(1):156-159
The proposal for increasing security in computerized adaptive tests that has received the most attention in recent years is the a-stratified method (AS; Chang and Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, and the values of the a parameters increase as the test goes on. With this method, the distribution of item exposure rates is less skewed, while efficiency in trait-level estimation is maintained. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy relative to the AS method both for item banks in which the a and b parameters are correlated and for banks in which they are uncorrelated. For both kinds of banks, the b-blocking methods (Chang, Qian and Ying, 2001) improve the security of the item bank.
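The basic a-stratified procedure can be sketched in a few lines: sort the bank by a, cut it into strata from low to high discrimination, and at each stage select, within the current stratum, the unused item whose difficulty b is closest to the current trait estimate. The item dictionaries and function names are assumptions, and the b-blocking and MIS refinements are not shown.

```python
def build_strata(pool, n_strata):
    """Sort items by discrimination a and split them into n_strata
    near-equal strata, lowest-a stratum first."""
    ranked = sorted(pool, key=lambda item: item["a"])
    size, extra = divmod(len(ranked), n_strata)
    strata, start = [], 0
    for k in range(n_strata):
        end = start + size + (1 if k < extra else 0)
        strata.append(ranked[start:end])
        start = end
    return strata


def select_item(strata, stage, theta_hat, used):
    """Within the current stage's stratum, pick the unused item whose
    difficulty b is closest to the current trait estimate."""
    candidates = [item for item in strata[stage] if item["id"] not in used]
    return min(candidates, key=lambda item: abs(item["b"] - theta_hat))
```

In a full test, a fixed number of items is typically drawn from each stratum before moving to the next, so high-a items are saved for later stages when the trait estimate is more accurate.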

20.
About 30–40% of stroke patients suffer from visual field defects following injury. These can interfere with the standard neuropsychological assessment and complicate the interpretation of tests that use visual materials. However, information about the integrity of a patient's central visual field is often unavailable. We, therefore, designed a screening tool, the computerized visual field test (c‐VFT), specifically targeted at providing easily available, but rough, information about patients' central visual field. c‐VFT was tested in two samples of stroke patients. Eleven patients were tested on c‐VFT and on the Esterman test. Five patients were tested on c‐VFT and the Humphrey Visual Field Analyzer (HFA), central 10‐2. Criterion validity of the c‐VFT was investigated by calculating quadrantwise intraclass correlation for both comparisons. For the HFA comparison, we also calculated point‐to‐point intraclass correlation, sensitivity, and specificity. Analyses revealed moderately good correspondence between c‐VFT and the Esterman test, and between c‐VFT and HFA 10‐2, respectively. When looking specifically at test points within one degree of visual angle of each other in the two tests, intraclass correlation increased. For these points, the sensitivity of c‐VFT was 0.89 and specificity was 0.97. While the c‐VFT is designed neither to be diagnostic nor to replace detailed visual field analysis, this study shows that it provides a reasonable screening of the central visual field. The test can easily be used and will be made freely available to neuropsychological clinicians and researchers.
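The sensitivity and specificity reported for the point-to-point comparison follow the usual definitions, treating the reference test (here HFA) as ground truth. A minimal sketch, assuming each test point is coded 1 when a defect is detected and 0 otherwise; the data in any usage are illustrative, not the study's.

```python
def sensitivity_specificity(screen, reference):
    """Sensitivity and specificity of a binary screen against a reference test.

    screen, reference: equal-length sequences of 1 (defect) / 0 (no defect),
    one entry per matched test point; the reference is treated as ground truth.
    """
    pairs = list(zip(screen, reference))
    tp = sum(1 for s, r in pairs if s and r)          # defect found by both
    fn = sum(1 for s, r in pairs if not s and r)      # defect missed by the screen
    tn = sum(1 for s, r in pairs if not s and not r)  # normal on both
    fp = sum(1 for s, r in pairs if s and not r)      # false alarm by the screen
    return tp / (tp + fn), tn / (tn + fp)
```

High specificity with somewhat lower sensitivity, as reported for c‐VFT, means the screen rarely flags intact points but misses a minority of true defects.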
