Similar literature
1.
New formulas are developed to give lower bounds to the reliability of a test, whether or not all respondents attempt all items. The formulas therefore apply to completed tests, pure speed tests, pure power tests, and any mixture of speed and power. For completed tests, the formulas give the same answer as certain standard ones; for noncompleted tests, they give a correct answer where previous standard formulas are inappropriate. The formulas hold both in the sense of retest reliability and in the sense of parallel tests. This research was facilitated by an uncommitted grant-in-aid to the writer from the Behavioral Sciences Division of the Ford Foundation.

2.
Four experiments compare the effect of familiarity on item, associative, and plurality recognition on self-paced and speeded tests. The familiarity of test items was enhanced by presenting a prime that matched the subsequent test item. On item and plurality recognition tests, participants were more likely to respond "old" to primed than to unprimed test items. In associative recognition, priming increased the proportion of old responses on a speeded test, but not on a self-paced test. This suggests that familiarity plays a larger role in item and plurality recognition than in associative recognition on self-paced tests. On speeded tests, priming has a similar effect on item, associative, and plurality recognition. Results suggest that item and associative recognition rely differentially on familiarity and recollection. They are also consistent with recent evidence suggesting that different processes underlie plurality and associative recognition.

3.
A method is presented for converting the scores on one form of a test to those on another form of the same test. The method is particularly applicable when each form has been administered to a different group and the only link between the two forms is a subset of items common to both. The proposed method, called the item method of conversion, has been applied to several tests for which other methods of conversion are available for comparison. The necessary data are limited to tests for which the total score serves as the criterion for item analyses. The method gives highly satisfactory results for all the tests to which it has been applied, particularly when the two groups differ considerably, in which case the delta method (a different item method) is inappropriate. The authors are only two of a group, including W. H. Angoff, F. M. Lord, and M. K. Schultz, all of whom have made important contributions to this paper.

4.
Pen-based computers are similar to paper-and-pencil (P&P) tests in the method of responding and thus may match P&P administration more closely in construct equivalence than keyboard-entry computers. A study was conducted comparing P&P, pen-based notebook computer, and keyboard-entry PC versions of two test batteries. Participants completed tests in different administration modes on separate days; construct equivalence was evaluated by comparing Day 1 to Day 2 correlations across conditions. Although construct equivalence was found for the power tests, differences emerged for the speeded tests. For the pen-based computer, solid evidence of equivalence to P&P appeared for all but one of the speeded tests, whereas the keyboard PC showed borderline equivalence for only one of the three speeded tests. These findings suggest that the pen-entry computer may be more capable than the keyboard-entry computer of maintaining construct equivalence to P&P tests.

5.
A speeded item response model is proposed. We consider the situation in which examinees on a time-limited test postpone the harder items until later in the testing period. With such a strategy, examinees may not finish answering some of the harder items within the allotted time. The proposed model describes this mechanism by incorporating a speeded-effect term into the two-parameter logistic item response model. A Bayesian estimation procedure for the model using Markov chain Monte Carlo is presented, and its performance relative to the two-parameter logistic item response model on a speeded test is demonstrated through simulations. For illustration, the methodology is applied to physics examination data from the Department Required Test for college entrance in Taiwan.
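
The abstract does not give the form of the speeded-effect term, so the sketch below covers only the baseline two-parameter logistic (2PL) model that the proposed model extends: P(correct | theta) = 1 / (1 + exp(-a (theta - b))), with discrimination a and difficulty b. It is a minimal Python illustration under that assumption; the function names are hypothetical, the scaling constant D = 1.7 that some texts use is omitted, and nothing here reproduces the authors' speeded extension or their MCMC estimation.

```python
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the standard 2PL model.

    theta: examinee ability; a: item discrimination; b: item difficulty.
    NOTE: the paper's speeded-effect term is NOT included (its form is
    not given in the abstract).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood_2pl(responses, theta, items):
    """Log-likelihood of one examinee's 0/1 responses.

    responses: list of 0/1 scores; items: list of (a, b) pairs in the
    same order. Assumes local independence across items.
    """
    ll = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct_2pl(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll
```

When ability equals item difficulty, `p_correct_2pl(0.0, 1.2, 0.0)` returns 0.5 for any discrimination. A Bayesian treatment would place priors on theta, a, and b and sample from the posterior built from this likelihood.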

6.
Within-dimension conjunction search for red-green targets among red-blue and blue-green nontargets is extremely inefficient (Wolfe et al, 1990 Journal of Experimental Psychology: Human Perception and Performance 16 879-892). We tested whether pairs of red-green conjunction targets can nevertheless be processed spatially in parallel. Participants made speeded detection responses whenever a red-green target was present. Across trials in which a second, identical target was present, the distribution of detection times was compatible with the assumption that the targets were processed in parallel (Miller, 1982 Cognitive Psychology 14 247-279). We show that this was not an artifact of response competition or feature-based processing. We suggest that within-dimension conjunctions can be processed spatially in parallel. Visual search for such items may be inefficient owing to within-dimension grouping between items.
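
The abstract does not describe the exact analysis. Redundant-target detection times are commonly compared against Miller's (1982) race model inequality, F_AB(t) <= F_A(t) + F_B(t), where F is the cumulative distribution of detection times with both targets (AB) or with a single target at either location (A, B). As a hedged sketch, assuming the analysis took this general form, the Python code below compares empirical CDFs at chosen time points and reports where the inequality is violated. The function names are hypothetical, and the sketch does not say which pattern the authors observed.

```python
from bisect import bisect_right

def ecdf_at(sorted_rts, t):
    """Empirical CDF: proportion of (sorted) response times <= t."""
    return bisect_right(sorted_rts, t) / len(sorted_rts)

def race_model_violations(rt_redundant, rt_single_a, rt_single_b, times):
    """Return the time points t at which F_AB(t) > F_A(t) + F_B(t).

    The bound is capped at 1.0 because a CDF cannot exceed 1.
    Point estimates only: no statistical test is performed here.
    """
    red = sorted(rt_redundant)
    a = sorted(rt_single_a)
    b = sorted(rt_single_b)
    violations = []
    for t in times:
        bound = min(1.0, ecdf_at(a, t) + ecdf_at(b, t))
        if ecdf_at(red, t) > bound:
            violations.append(t)
    return violations
```

In practice the comparison is usually made at a set of quantiles of the pooled distribution, and violations are tested statistically across participants rather than read off a single sample.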

7.
The factor structures of two recently developed measures of emotional intelligence, the Situational Test of Emotional Understanding (STEU) and the Situational Test of Emotion Management (STEM; MacCann & Roberts, 2008), were examined. The results did not support the subscale structure implied by the approach used to develop either test's items, and the factors obtained when parallel analysis was used to determine the number of factors to extract were not interpretable. These findings suggest that only total scale scores should be used for these tests, although the general factor extracted from the items was not strong for either test; further development work on these tests is indicated.

8.
A new algorithm for obtaining exact person-fit indexes for the Rasch model is introduced that realizes most powerful tests for a very general family of alternative hypotheses, including tests concerning DIF as well as model-deviating item correlations. The method is also used as a goodness-of-fit test for whole data sets when the item parameters are assumed to be known. For tests with at most 30 items, exact values are obtained; for longer tests, a Monte Carlo algorithm is proposed. Simulated examples and an empirical investigation demonstrate the test power and the applicability to item elimination. The author wishes to thank Elisabeth Ponocny-Seliger and the reviewers for many helpful comments. All exact goodness-of-fit tests proposed in this article are implemented in the menu-driven program T-Rasch 1.0 by Ponocny and Ponocny-Seliger (1999), which can be obtained from ProGAMMA (WWW: http://www.gamma.rug.nl) and which also performs nonparametric tests.

9.
Despite their widespread use in personnel selection, there is concern that cheating could undermine the validity of unproctored Internet‐based tests. This study examined the presence of cheating in a speeded ability test used for personnel selection. The same test was administered to applicants in either proctored or unproctored conditions. Item response theory differential functioning analyses were used to evaluate the equivalence of the psychometric properties of test items across proctored and unproctored conditions. A few items displayed different psychometric properties, and the nature of these differences was not uniform. Theta scores were not reflective of widespread cheating among unproctored examinees. Thus, results were not consistent with what would be expected if cheating on unproctored tests was pervasive.

10.
On a multiple-choice test in which each item has k alternative responses, the test taker is permitted to choose any subset that he believes contains the one correct answer. A scoring system is devised that depends on the size of the subset and on whether or not the correct answer is eliminated. The mean and variance of the score per item are obtained. Methods are derived for determining the total number of items that should be included on the test so that the average score on all items can be regarded as a good measure of the subject's knowledge. Efficiency comparisons between the conventional and the subset-selection scoring procedures are made. The analogous problem of r > 1 correct answers for each item (with r fixed and known) is also considered. The authors are grateful to M. Aitkin, C. Coombs, F. Lord, and the reviewers for their comments and suggestions.

11.
A variety of procedures have been used to assess automatic retrieval effects on memory, including implicit memory tests and the process dissociation approach. Theoretical concerns with each are summarized before a procedure for evaluating automatic retrieval based on retrieval speed is described. Specifically, in a speeded implicit task, participants were encouraged to complete word stems using strictly automatic retrieval: several practice test trials were presented that did not allow responding based on previously studied items, and speed of responding was encouraged. This speeded implicit task was compared with a condition in which conscious retrieval of studied information was not possible and a condition in which conscious retrieval was required, providing converging evidence for the hypothesis that the speeded implicit procedure can yield pure estimates of automatic retrieval. Furthermore, a standard implicit memory task yielded comparable data, suggesting that participants also engaged automatic retrieval processes on this task.

12.
Two experiments investigated the possibility that external statements in the Rotter I-E scale are more depressing in tone than internal statements, so that depressed subjects may respond to external items because of the items' mood level rather than their locus-of-control content. Results of Experiment 1 revealed that the external alternative was rated as more depressing than its internal counterpart for the majority of the 23 I-E items (18 for females and 15 for males), while a small number of I-E items (3 for females and 6 for males) contained internal and external statements rated as balanced in depressing content. For two I-E items, the internal alternative was rated as more depressing. Results of Experiment 2 revealed that endorsement of external items was significantly related to self-reported depression both for the total I-E score and for the item subset in which the external statements (as revealed in Experiment 1) were the more depressing of the item pair. External endorsement was not significantly related to depression for the I-E item subset in which the options are balanced for mood level, while endorsement of internal statements was related to depression only for the item subset in which the internal option was rated as more depressing. These results were interpreted as supporting prior research that demonstrated a mood response set using altered Rotter I-E scale items. Implications for use of the Rotter I-E scale in the study of depression are discussed.

13.
In two experiments, we investigated whether re-exposure to previously studied items at test affects false recognition in the DRM paradigm. Furthermore, we examined whether exposure to the critical lure at test influences memory for subsequently presented study items. In Experiment 1, immediately following each studied DRM list, participants were given a recognition test. The tests were constructed such that the number of studied items preceding the critical lure varied from zero to five. Neither false recognition for critical lures nor accurate memory for studied items was affected by this manipulation. In Experiment 2, we replicated this pattern of results under speeded conditions at test. Both experiments confirm that exposure to previously studied items at test does not affect true or false recognition in the DRM paradigm. This pattern strongly suggests that retrieval processes do not influence false recognition in the DRM paradigm.

15.
This is a historical review and contemporary empirical evaluation of the Motivation Analysis Test (MAT), one of the first tests to take a psychometric approach to the assessment of motivation. Reviews were quite positive, but the test is now over 50 years old. Nevertheless, it employs measurement innovations not widely used in objective measurement then or now: (1) subtests with different formats, (2) disguised items, (3) speeded administration procedures, and (4) ipsative format and scoring procedures. These issues are discussed, and a contemporary sample (N = 360) was obtained to evaluate the Motivation Analysis Test in light of its innovative characteristics.

16.
Five experiments examined whether retrieval-induced forgetting effects are observed on implicit tests of memory. In each experiment, participants first studied category-exemplar paired associates, then practiced retrieval for a subset of items from a subset of categories, and finally completed memory tests for all the studied items. In standard fashion, inhibition was measured as the performance difference between unpracticed items from practiced categories and unpracticed items from unpracticed categories. Across the 5 experiments, poorer performance for unpracticed items from practiced categories was seen in conceptual implicit memory (category generation and category matching) but not in perceptual implicit memory (stem completion, perceptual identification). Thus, retrieval-induced forgetting effects are limited to tests of conceptual memory.

17.
Maximizing the discriminating power of a multiple-score test involves maximizing the homogeneity of each subtest and minimizing the correlations between subtests. A method is presented for constructing such tests from items whose intercorrelations are not too high. Under certain restrictions, the saturation, defined as the proportion of inter-item covariance to total variance, is maximized for each subtest. The nucleus of each subtest is three items with high covariances inter se. All items that would lower the saturation are discarded, and the single item that maximizes the saturation of the resulting test is added. This process is repeated until all the items have been included or discarded for that subtest. If the correlation between any two such subtests approaches the geometric mean of their saturations, their items form a new pool for one or more subtests. Formulas are presented for deciding which items to eliminate in order to reduce further the correlations between subtests. This research was supported in part by the United States Air Force under Contract AF 33(038)-10588 with Human Resources Research Center, Lackland Air Force Base, San Antonio, Texas. Permission is granted for reproduction, translation, publication use and disposal in whole and in part by or for the United States Government.
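
Taking the abstract's definition of saturation literally (the sum of inter-item covariances divided by the total variance of the summed score, i.e. off-diagonal sum over the full sum of the covariance matrix), the item-selection loop can be sketched in Python as below. This is a simplified illustration, not the paper's procedure: the covariance matrix is assumed to be a list of lists, the three-item nucleus is supplied by the caller, and the paper's "certain restrictions" and its later formulas for reducing between-subtest correlations are not modeled. Function names are hypothetical.

```python
def saturation(cov, items):
    """Inter-item covariance as a proportion of total score variance.

    Total variance of the summed score = sum of all covariance entries;
    inter-item covariance = that sum minus the item variances (diagonal).
    """
    total = sum(cov[i][j] for i in items for j in items)
    diag = sum(cov[i][i] for i in items)
    return (total - diag) / total

def build_subtest(cov, nucleus, pool):
    """Greedy build-up of one subtest from a caller-chosen nucleus.

    Each round: permanently discard items that would lower the current
    saturation, then add the single remaining item that maximizes it.
    """
    current = list(nucleus)
    remaining = [i for i in pool if i not in current]
    while remaining:
        base = saturation(cov, current)
        remaining = [i for i in remaining
                     if saturation(cov, current + [i]) >= base]
        if not remaining:
            break
        best = max(remaining, key=lambda i: saturation(cov, current + [i]))
        current.append(best)
        remaining.remove(best)
    return current
```

With items 0, 1, 2, and 4 intercorrelated at 0.6 and item 3 uncorrelated with the rest (unit variances), starting from the nucleus [0, 1, 2] adds item 4 and discards item 3, since including item 3 adds variance but no covariance.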

18.
The use of unproctored internet‐based testing (UIT) for employee selection is quite widespread. Although this mode of testing has advantages over onsite testing, researchers and practitioners continue to be concerned about potential malfeasance (e.g., cheating and response distortion) under high‐stakes conditions. Therefore, the primary objective of the present study was to investigate the magnitude and extent of high‐ and low‐stakes retest effects on the scores of a UIT speeded cognitive ability test and two UIT personality measures. These data permitted inferences about the magnitude and extent of malfeasant responding. The study objectives were accomplished by implementing two within‐subjects design studies (Study 1, N = 296; Study 2, N = 318) in which test takers first completed the tests as job applicants (high‐stakes) or incumbents (low‐stakes) and then as research participants (low‐stakes). For the speeded cognitive ability measure, the pattern of test score differences was more consonant with a psychometric practice effect than with a malfeasance explanation. This result is likely due to the speeded nature of the test. For the UIT personality measures, the pattern of higher high‐stakes scores compared with lower low‐stakes scores is similar to those reported for proctored tests in the extant literature. Thus, our results indicate that UIT administration does not uniquely threaten personality measures: the elevation of scores under high‐stakes testing is no greater than that reported for proctored tests in the extant literature.

19.
LIU Yue, LIU Hongyun 《Journal of Psychological Science》2015,(6):1504-1512
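This study explored using the constructed anchor test method to equate test scores when no anchor items are available. Study 1 and Study 2 examined, respectively, the effects of errors in item difficulty ordering and of differences in anchor item difficulty on the constructed anchor test method. The results showed that: (1) under equivalent-groups conditions, as the proportion of erroneous anchor items, the degree of difficulty-ordering error, and the difference in anchor item difficulty increased, the equating error of the constructed anchor test method gradually increased, whereas the equating error of the random equivalent groups method remained relatively stable; under nonequivalent-groups conditions, the equating error of the constructed anchor test method was always smaller than that of the random equivalent groups method; (2) for the constructed anchor test method under nonequivalent-groups conditions, the shorter the anchor test, the larger the equating error.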

20.
HORST P 《Psychometrika》1948,13(3):125-134
A battery of pencil-and-paper tests is commonly used for predicting a single criterion. If the score on each test is the number of correct answers, the composite battery score would normally be the sum of the weighted test scores, where the weights are the raw-score regression weights. If the reliability of each test is known, the lengths of the tests can be altered so that the weights are all equal. The composite battery score would then simply be the total number of items answered correctly, and scoring would be greatly simplified. Such simplification is particularly desirable where the volume of testing is large. Section I of the article outlines the procedure for altering the lengths of the tests, and Section II gives a proof of the method.

