1.

Controlling item exposure and test overlap on the fly in computerized adaptive testing

Shu‐Ying Chen Pui‐Wa Lei Wen‐Han Liao 《The British journal of mathematical and statistical psychology》2008,61(2):471-492

This paper proposes an on‐line version of the Sympson and Hetter procedure with test overlap control (SHT) that can provide item exposure control at both the item and test levels on the fly without iterative simulations. The on‐line procedure is similar to the SHT procedure in that exposure parameters are used for simultaneous control of item exposure rates and test overlap rate. The exposure parameters for the on‐line procedure, however, are updated sequentially on the fly, rather than through iterative simulations conducted prior to operational computerized adaptive tests (CATs). Unlike the SHT procedure, the on‐line version can control item exposure rate and test overlap rate without time‐consuming iterative simulations even when item pools or examinee populations have been changed. Moreover, the on‐line procedure was found to perform better than the SHT procedure in controlling item exposure and test overlap for examinees who take tests earlier. Compared with two other on‐line alternatives, this proposed on‐line method provided the best all‐around test security control. Thus, it would be an efficient procedure for controlling item exposure and test overlap in CATs.
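The two quantities this abstract sets out to control can be made concrete with a small calculation. The sketch below is not the SHT or on‐line procedure itself; it only computes an item exposure rate (the share of examinees who saw an item) and a test overlap rate (the mean share of items two examinees have in common) from administration records. The function names and the toy records are hypothetical, and a fixed test length is assumed.

```python
from itertools import combinations

def exposure_rates(administered, pool_size):
    """Share of examinees who saw each item (items are 0-based indices)."""
    counts = [0] * pool_size
    for items in administered:
        for i in set(items):
            counts[i] += 1
    return [c / len(administered) for c in counts]

def overlap_rate(administered):
    """Mean share of common items over all examinee pairs (fixed test length assumed)."""
    length = len(administered[0])
    pairs = list(combinations(administered, 2))
    shared = sum(len(set(a) & set(b)) for a, b in pairs)
    return shared / (len(pairs) * length)

# Hypothetical records: three examinees, four-item tests, pool of six items.
administered = [[0, 1, 2, 3], [0, 1, 4, 5], [0, 2, 4, 5]]
rates = exposure_rates(administered, pool_size=6)
overlap = overlap_rate(administered)
```

In this toy case item 0 is seen by every examinee, so its exposure rate is 1.0, and the exposure rates sum to the test length.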

2.

Some cautions regarding statistical power in split-plot designs

Drake R. Bradley Ronald L. Russell 《Behavior research methods》1998,30(3):462-477

We show that if overall sample size and effect size are held constant, the power of the F test for a one-way analysis of variance decreases dramatically as the number of groups increases. This reduction in power is even greater when the groups added to the design do not produce treatment effects. If a second independent variable is added to the design, either a split-plot or a completely randomized design may be employed. For the split-plot design, we show that the power of the F test on the between-groups factor decreases as the correlation across the levels of the within-groups factor increases. The attenuation in between-groups power becomes more pronounced as the number of levels of the within-groups factor increases. Sample size and total cost calculations are required to determine whether the split-plot or completely randomized design is more efficient in a particular application. The outcome hinges on the cost of obtaining (or recruiting) a single subject relative to the cost of obtaining a single observation: We call this the subject-to-observation cost (SOC) ratio. Split-plot designs are less costly than completely randomized designs only when the SOC ratio is high, the correlation across the levels of the within-groups factor is low, and the number of such levels is small.

3.

Predicting item exposure parameters in computerized adaptive testing

Shu‐Ying Chen Shing‐Hwang Doong 《The British journal of mathematical and statistical psychology》2008,61(1):75-91

The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP), a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge‐based solution for finding item exposure parameters.

4.

Classification accuracy and consistency of computerized adaptive testing

Ying Cheng Deanna L. Morgan 《Behavior research methods》2013,45(1):132-142

In this article, four item selection methods in computerized adaptive testing are examined in terms of classification accuracy and consistency, including two popular heuristics for constraint management, the maximum priority index (MPI) method and the weighted deviation modeling method, as well as the widely known maximum Fisher information method and randomized item selection as baselines. Results suggest that the MPI method is able to meet constraints and keep test overlap rate low. Among the four methods, it is the only one that manages to produce parallel forms in terms of content coverage and, consequently, the only method to which the idea of classification consistency applies. With tests as short as 12 items, the MPI method does fairly well in classifying examinees accurately and consistently. Its performance improves with longer tests. The effects of number of decision categories and cut score locations are also examined. Recommendations are made in the Discussion section.

5.

Hypergeometric family and item overlap rates in computerized adaptive testing (total citations: 1; self-citations: 0; citations by others: 1)

Hua-Hua Chang Jinming Zhang 《Psychometrika》2002,67(3):387-398

6.

Item-presentation controls for multidimensional item pools in computerized adaptive testing

Thomas J. Thomas 《Behavior research methods》1990,22(2):247-252

Item pools or item banks used in most testing situations are inherently multidimensional. This is especially a problem in computerized adaptive testing (CAT), which is driven by item response theory; item response theory requires that the item pool be unidimensional. This series of computer simulations demonstrates how alternative item-presentation controls (content-balancing and “mini-CATs”) may be employed in CAT to estimate ability accurately in spite of the violation of unidimensionality. Averaged, shorter mini-CATs provide the most accurate estimation of ability and ameliorate problems intrinsic to violating the unidimensionality assumption of item response theory.

7.

Varying the valuating function and the presentable bank in computerized adaptive testing

Barrada JR Abad FJ Olea J 《The Spanish journal of psychology》2011,14(1):500-508

In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, valuating the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that the manipulation of the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. It is possible to greatly improve item bank security with much smaller losses in accuracy by selecting several items with the matching criterion. In general, it seems more appropriate not to stratify the bank.

8.

Controversy and consensus regarding the use of cognitive ability testing in organizations

Murphy KR Cronin BE Tam AP 《The Journal of applied psychology》2003,88(4):660-671

Seven hundred three members of the Society for Industrial and Organizational Psychology indicated agreement or disagreement with 49 propositions regarding cognitive ability tests in organizations. There was consensus that cognitive ability tests are valid and fair, that they provide good but incomplete measures, that different abilities are necessary for different jobs, and that diversity is valuable. Items dealing with the unique status of cognitive ability were most likely to generate polarized opinions. A 2-factor model, classifying items as those reflecting societal concerns over the consequences of ability testing and those reflecting an emphasis on the unique status of "g," fit the data well, and these factors proved especially important for predicting responses to the more controversial items.

9.

Comparability and validity of computerized adaptive testing with the MMPI-2 (total citations: 1; self-citations: 0; citations by others: 1)

Roper BL Ben-Porath YS Butcher JN 《Journal of personality assessment》1995,65(2):358-371

The comparability and validity of a computerized adaptive (CA) Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 571 undergraduate college students. The CA MMPI-2 adaptively administered Scales L, F, the 10 clinical scales, and the 15 content scales, utilizing the countdown method (Butcher, Keller, & Bacon, 1985). All subjects completed the MMPI-2 twice, with three experimental conditions: booklet test-retest, booklet-CA, and conventional computerized (CC)-CA. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of the three forms. Correlations between MMPI-2 scales and other psychometric measures (Beck Depression Inventory; Symptom Checklist-Revised; State-Trait Anxiety and Anger Scales; and the Anger Expression Scale) support the validity of the CA MMPI-2. Substantial item savings may be realized with the implementation of the countdown procedure.

10.

Comparability of computerized adaptive and conventional testing with the MMPI-2

Roper BL Ben-Porath YS Butcher JN 《Journal of personality assessment》1991,57(2):278-290

A computerized adaptive version and the standard version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were administered 1 week apart to a sample of 155 college students to assess the comparability of the two versions. The countdown method was used to adaptively administer Scales L, F, the 10 clinical scales, and the 15 new content scales. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of computerized adaptive and conventional testing with the MMPI-2. Substantial item savings were found with the adaptive version. Future directions in the study of adaptive testing with the MMPI-2 are discussed.

11.

Using response times to detect aberrant responses in computerized adaptive testing

Wim J. van der Linden Edith M. L. A. van Krimpen-Stoop 《Psychometrika》2003,68(2):251-265

A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks outperformed those for the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered. This study received funding from the Law School Admission Council (LSAC). The opinions and conclusions contained in this paper are those of the authors and do not necessarily reflect the policy and position of LSAC. The authors are most indebted to Wim M. M. Tielen for his computational assistance and to the US Defense Manpower Data Center for the permission to use the ASVAB data set in the empirical examples.
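As a rough illustration of a classical check on response times, the sketch below standardizes a log response time against its expected value and flags large residuals. The abstract does not state the model's parametrization, so the one used here is an assumption: log time is taken as normal with mean `beta - tau` (a hypothetical item time intensity minus examinee speed) and standard deviation `1 / alpha`. The 1.96 cutoff and all numbers are illustrative only.

```python
import math

def log_time_residual(t, alpha, beta, tau):
    """Standardized residual of log response time.

    Assumed parametrization (not taken from the abstract):
    ln t ~ Normal(beta - tau, (1 / alpha) ** 2).
    """
    return alpha * (math.log(t) - (beta - tau))

def is_aberrant(z, critical=1.96):
    """Two-sided flag for an unusually short or long response time."""
    return abs(z) > critical

# Hypothetical item and examinee parameters; expected log time is 3.5.
z_fast = log_time_residual(math.exp(2.0), alpha=2.0, beta=4.0, tau=0.5)
z_typical = log_time_residual(math.exp(3.5), alpha=2.0, beta=4.0, tau=0.5)
flags = (is_aberrant(z_fast), is_aberrant(z_typical))
```

Here the first response is far faster than expected and is flagged, while the second matches its expected log time and is not.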

12.

New approaches to the design of computerized interviewing and testing systems

Robert L. Stout 《Behavior research methods》1981,13(4):436-442

Most computer interviewing and testing systems have adopted paper-and-pencil approaches to information gathering with little modification. However, computer technology offers two fundamental advantages over paper-and-pencil technology for psychological information gathering: (1) A computer can record ancillary data such as latencies and pressure on response keys during an interviewing session, and (2) A computer can react adaptively to special events as these arise during a session. Ways to capitalize on these advantages are outlined. A pilot study of interviewee behavior during a computer problem-screening interview is described, and the implications of the results for future research in the area are discussed. Passive and active computer testing systems occupy positions on a continuum between paper-based psychological testing and the flexible, but less well controlled, technology represented by the human. With its unique capabilities, computer technology has a special role to play in the future of psychological measurement.

13.

Robust techniques for testing heterogeneity of variance effects in factorial designs

Ralph G. O'Brien 《Psychometrika》1978,43(3):327-342

Several ways of using the traditional analysis of variance to test heterogeneity of spread in factorial designs with equal or unequal n are compared using both theoretical and Monte Carlo results. Two types of spread variables, (1) the jackknife pseudovalues of s² and (2) the absolute deviations from the cell median, are shown to be robust and relatively powerful. These variables seem to be generally superior to the Z-variance and Box-Scheffé procedures. This research was sponsored by Public Health Service Training Grant MH-08258 from the National Institute of Mental Health. The author thanks Mark I. Appelbaum, Elliot M. Cramer, and Scott E. Maxwell for their helpful criticisms of this paper. An earlier version of this work was presented at the Annual Meeting of the Psychometric Society, Murray Hill, New Jersey, April, 1976.
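The second spread variable mentioned in this abstract can be sketched in a few lines: replace each score by its absolute deviation from the cell median, then run an ordinary analysis of variance on the transformed scores. The example below is a simplified one-way illustration with made-up data, not the factorial analysis studied in the paper, and the function names are hypothetical.

```python
from statistics import mean, median

def abs_dev_from_median(cells):
    """Replace each score by its absolute deviation from its cell median."""
    return [[abs(y - median(cell)) for y in cell] for cell in cells]

def one_way_f(groups):
    """Ordinary one-way ANOVA F statistic."""
    scores = [y for g in groups for y in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up cells with equal medians but visibly different spread.
cells = [[4, 5, 6, 5, 5], [1, 5, 9, 3, 7]]
spread = abs_dev_from_median(cells)
f_spread = one_way_f(spread)
```

A large F on the spread scores points to unequal variability across cells; the statistic would be referred to an F distribution with the usual degrees of freedom (1 and 8 here).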

14.

Maximum information stratification method for controlling item exposure in computerized adaptive testing

Barrada JR Mazuela P Olea J 《Psicothema》2006,18(1):156-159

The proposal for increasing the security in Computerized Adaptive Tests that has received most attention in recent years is the a-stratified method (AS - Chang and Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, with the values of the a parameters increasing as the test goes on. With this method, distribution of the exposure rates of the items is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant, and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy by comparison with the AS, for item banks with a and b parameters correlated and uncorrelated. For both kinds of banks, the blocking b methods (Chang, Qian and Ying, 2001) improve the security of the item bank.
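The a-stratified idea described above can be sketched as follows: sort the bank by discrimination, cut it into strata, and at each stage pick from the current stratum. The within-stratum rule used here (choose the unused item whose difficulty b is closest to the current trait estimate) is the rule usually associated with the AS method, but it is not spelled out in the abstract and should be read as an assumption. The bank, the item ids, and the function names are hypothetical, and the MIS variant is not shown.

```python
def a_stratify(items, n_strata):
    """Sort items by discrimination a and split them into n_strata strata."""
    ordered = sorted(items, key=lambda it: it["a"])
    size = len(ordered) // n_strata
    strata = [ordered[k * size:(k + 1) * size] for k in range(n_strata - 1)]
    strata.append(ordered[(n_strata - 1) * size:])  # last stratum takes any remainder
    return strata

def select_item(stratum, theta, used):
    """Pick the unused item whose difficulty b is closest to theta."""
    candidates = [it for it in stratum if it["id"] not in used]
    return min(candidates, key=lambda it: abs(it["b"] - theta))

# Hypothetical six-item bank.
bank = [
    {"id": 1, "a": 0.5, "b": -1.0},
    {"id": 2, "a": 1.8, "b": 0.2},
    {"id": 3, "a": 0.7, "b": 0.4},
    {"id": 4, "a": 1.2, "b": -0.3},
    {"id": 5, "a": 1.0, "b": 1.1},
    {"id": 6, "a": 2.0, "b": -0.8},
]
strata = a_stratify(bank, n_strata=3)
first = select_item(strata[0], theta=0.0, used=set())
```

Early items come from the low-a stratum, so highly discriminating items are held back until the trait estimate has stabilized.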

15.

Preliminary data on a computerized test of finger speed and endurance

Barry A. Tanner Richard L. Bowles 《Behavior research methods》1995,27(2):160-161

We set out to develop a computer-assisted finger-tapping task (the T3) that would measure motor speed much like the Reitan test, but that would also measure endurance. Data were collected for a convenience sample on both the T3 and the Reitan finger-tapping test. Moderate and significant correlations were obtained between the T3 and the Reitan test for both hands. Mean scores for the first 50 sec of the T3 were approximately 0.15 taps greater than the mean Reitan score for both the preferred and the nonpreferred hands, while the mean scores for the full 2 min of the T3 were 1.52 taps less than those of the Reitan test for the preferred hand, and 1.32 taps less for the nonpreferred hand. The mean for the last 40 sec with the preferred hand averaged 3.93 taps (7.62%) slower than for the first 40 sec, whereas for the nonpreferred hand, the difference was 5.12 taps (11.15%). These results are consistent with our intent to develop measures of (1) relatively pure motor speed (the first 50 sec of the T3); (2) motor speed combined with endurance (the full 2 min of the T3); and (3) finger endurance (the first 40 sec compared with the last 40 sec of the T3).

16.

A simple test for heterogeneity of variance in complex factorial designs

John E. Overall J. Arthur Woodward 《Psychometrika》1974,39(3):311-318

A simple procedure for testing heterogeneity of variance is developed which generalizes readily to complex, multi-factor experimental designs. Monte Carlo studies indicate that the Z-variance test statistic presented here yields results equivalent to other familiar tests for heterogeneity of variance in simple one-way designs where comparisons are feasible. The primary advantage of the Z-variance test is in the analysis of factorial effects on sample variances in more complex designs. An example involving a three-way factorial design is presented.

17.

Verification of hypothesized factors in one hundred and fifteen objective personality test designs

R. B. Cattell S. S. Dubin D. R. Saunders 《Psychometrika》1954,19(3):209-230

This paper reports on the third of a series of four experiments using similar concepts and methods for objective personality measurement, and overlapping test batteries. One hundred students were measured with 115 tests. The scores were correlated and factored by a re-iterated multiple group centroid method. The 17 factors thus obtained were rotated toward a clear simple structure. The relation of the rotated factors to earlier ones is indicated, but no extensive interpretation is attempted.

18.

Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing

Barrada JR Olea J Abad FJ 《The Spanish journal of psychology》2008,11(2):618-625

If examinees were to know, beforehand, part of the content of a computerized adaptive test, their estimated trait levels would then have a marked positive bias. One of the strategies to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2100 items, comparing them, for RMSE and overlap rate, with the same banks divided in two, three... up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting maximum exposure rate of the master bank by means of the Sympson-Hetter method.

19.

Comparing patients treated in special security civil state hospital units with patients in forensic programs

Bruce B. Way 《Behavioral sciences & the law》1985,3(2):227-235

Psychiatric inpatients in special units designed for the treatment/management of violence (secure care) are compared with inpatients in three forensic programs. Although program designers anticipated that secure care and forensic patients would be similar, they were not. Principally, secure care patients were lower functioning in the psychiatric areas and were more likely to have engaged in a physical assault in the last 30 days.

20.

The maximum priority index method for severely constrained item selection in computerized adaptive testing

Ying Cheng Hua‐Hua Chang 《The British journal of mathematical and statistical psychology》2009,62(2):369-383

This paper introduces a new heuristic approach, the maximum priority index (MPI) method, for severely constrained item selection in computerized adaptive testing. Our simulation study shows that it is able to accommodate various non‐statistical constraints simultaneously, such as content balancing, exposure control, answer key balancing, and so on. Compared with the weighted deviation modelling method, it leads to fewer constraint violations and better exposure control while maintaining the same level of measurement precision.