Similar literature
 20 similar articles found
1.
2.
3.
4.
Tests can be used either diagnostically (i.e., to confirm or rule out the presence of a condition in people suspected of having it) or as a screening instrument (determining who in a large group of people has the condition, often when those people are unaware of it or unwilling to admit to it). Tests that may be useful and accurate for diagnosis may actually do more harm than good when used as a screening instrument. The reason is that the proportion of false negatives may be high when the prevalence is high, and the proportion of false positives tends to be high when the prevalence of the condition is low (the usual situation with screening tests). My first aim in this article is to discuss the effects of the base rate, or prevalence, of a disorder on the accuracy of test results. My second aim is to review some of the many diagnostic efficiency statistics that can be derived from a 2 × 2 table, including the overall correct classification rate, kappa, phi, the odds ratio, positive and negative predictive power and some variants of them, and likelihood ratios. In the last part of this article, I review the recent Standards for Reporting of Diagnostic Accuracy guidelines (Bossuyt et al., 2003) for reporting the results of diagnostic tests and extend them to cover the types of tests used by psychologists.
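
The base-rate effect described in this abstract can be illustrated with a short calculation. The sketch below is not taken from the article; it applies Bayes' rule to a hypothetical test, and the function name and the sensitivity, specificity, and prevalence values are assumptions chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value).

    All three arguments are proportions in (0, 1). The four products
    below are the cell proportions of the 2 x 2 table.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test with 90% sensitivity and 90% specificity.
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same test, the positive predictive value falls sharply as prevalence drops, which is the screening situation the abstract warns about.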

5.
On distractor-identification tests students mark as many distractors as possible on each test item. A grading scale is developed for this type of testing. The scale is optimal in that it is the unique scale giving an unbiased estimate of the student's true score, i.e., the score that would result if no guessing occurred. If the test is administered as a usual multiple choice test and graded using the usual correction for guessing scale, the expected item score is the same as for the distractor-identification testing using the optimal grading scale. However, the variance of the item score is shown to be less for distractor-identification testing than for usual multiple choice testing under certain conditions.
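
The "usual correction for guessing" mentioned in this abstract can be sketched as follows. This shows only the conventional formula score (1 for a correct answer, -1/(k-1) for a wrong one on a k-option item), not the paper's optimal distractor-identification scale, which the abstract does not spell out. The function names and the simple know-or-guess model are assumptions for illustration.

```python
from fractions import Fraction


def corrected_item_score(correct, n_options):
    """Conventional formula score for one multiple-choice item."""
    return Fraction(1) if correct else Fraction(-1, n_options - 1)


def expected_item_score(p_know, n_options):
    """Expected score when the student knows the answer with probability
    p_know and otherwise guesses uniformly among all options."""
    p_lucky = Fraction(1, n_options)
    guess_mean = (p_lucky * corrected_item_score(True, n_options)
                  + (1 - p_lucky) * corrected_item_score(False, n_options))
    return p_know * corrected_item_score(True, n_options) + (1 - p_know) * guess_mean


# Blind guessing contributes zero on average, so the expectation
# equals the probability of actually knowing the answer.
print(expected_item_score(Fraction(3, 5), 4))
```

Under this model the penalty for wrong answers exactly cancels the gain from lucky guesses, which is the sense in which the corrected score is unbiased for the no-guessing score.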

6.
Weanling and mature rats were presented with saccharin or saline solutions for 1 h on alternate days. Following exposure to saccharin, rats were injected with 0, 21, or 37 mg/kg of cyclophosphamide. Injections had no significant effect on saccharin preference in one-stimulus tests, but had a highly significant effect in two-stimulus tests.

7.
Generalizability of stratified-parallel tests   Cited by: 6 (self-citations: 0; citations by others: 6)

8.
This article investigates the origins of the intelligence test item known as the Ball and Field in Lewis M. Terman's Stanford Revision of the Binet-Simon Intelligence Scale. The question was initially raised by the resemblance of paleontological ocean bed floor tracings left by ancient creatures to the responses produced by children given the Ball and Field Test. A version of the Ball and Field Test was invented by Clifton F. Hodge, one of Terman's graduate school instructors, who devised it as a result of his observations about how birds and other animals navigated and found their way. He then tested how humans and children located hidden objects and found that, in many ways, animals and humans used similar strategies for getting home or finding objects.

9.
Loglinear Rasch model tests   Cited by: 1 (self-citations: 0; citations by others: 1)
Existing statistical tests for the fit of the Rasch model have been criticized because they are sensitive only to specific violations of its assumptions. Contingency table methods using loglinear models have been used to test various psychometric models. In this paper, the assumptions of the Rasch model are discussed and the Rasch model is reformulated as a quasi-independence model. The model is a quasi-loglinear model for the incomplete subgroup × score × item 1 × item 2 × ... × item k contingency table. Using ordinary contingency table methods, the Rasch model can be tested generally or against less restrictive quasi-loglinear models to investigate specific violations of its assumptions.
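
The loglinear reformulation rests on a standard property of the Rasch model: given the total score, the probability of a response pattern does not depend on the person's ability. The sketch below checks this numerically for two items. It is not the paper's test procedure, and the function names and difficulty values are assumptions for illustration.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_first_item_given_score_one(theta, b1, b2):
    """P(item 1 correct, item 2 wrong | total score = 1) on a two-item test."""
    p1, p2 = rasch_p(theta, b1), rasch_p(theta, b2)
    p10 = p1 * (1 - p2)  # pattern (1, 0)
    p01 = (1 - p1) * p2  # pattern (0, 1)
    return p10 / (p10 + p01)


b1, b2 = -0.5, 1.0  # hypothetical item difficulties
for theta in (-2.0, 0.0, 1.5):
    print(theta, round(p_first_item_given_score_one(theta, b1, b2), 6))

# The same value follows from the difficulties alone.
print(round(math.exp(-b1) / (math.exp(-b1) + math.exp(-b2)), 6))
```

Because ability cancels out of the conditional probabilities, the counts within each score group can be modelled with item parameters only, which is what makes a contingency-table formulation possible.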

10.
A general critical analysis of the median tests proposed by Wilson for certain analysis of variance hypotheses is presented. Specifically, discrepancies between the purported and actual approximate distributions of some of the test statistics are noted. Validity and power of the resulting tests are discussed. This work was sponsored in part by the Office of Naval Research while the author was at Stanford University. Reproduction in whole or in part is permitted for any purpose of the United States Government. The author wishes to thank Professors Fred C. Andrews, Lincoln E. Moses, and David L. Wallace for their helpful criticisms and suggestions in the writing of this paper.

11.
12.
13.
14.
15.
16.
A large number of experiments have found a moderate degree of dependence between subsequent tests of recognition and cued recall as described by the TW-function. This paper investigates the dependence in word pair recognition. Tests of word pair recognition are conducted with the subsequent test being free recall, cued recall, recognition, and cued recognition. The dependence is compared to subsequent tests of cued recognition (i.e., recognition of a target with the presence of a cue). The results are related to a general theory of memory called TECO (Target, Event, Cue, & Object, see Sikström 1996b). This theory makes different quantitative predictions depending on the number of shared connections in the subsequent tests. Using a function suggested by TECO, different degrees of dependencies are predicted for pair and cued recognition. The predictions of the TECO-function show a non-significant deviation from observed data, whereas those of the TW-function deviate significantly in all conditions.
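
For orientation, the TW-function referred to in this abstract is commonly stated as P(Rn | Rc) = P(Rn) + c[P(Rn) - P(Rn)^2], with c around 0.5, where Rn is recognition and Rc is cued recall. The sketch below assumes that form and that constant; the abstract itself gives neither, and the TECO-function is not reproduced here. The function name is hypothetical.

```python
def tw_conditional_recognition(p_recognition, c=0.5):
    """Predicted P(recognized | recalled) under the assumed TW-function form."""
    return p_recognition + c * (p_recognition - p_recognition ** 2)


# Complete independence would give P(Rn | Rc) = P(Rn); the function
# predicts a moderate lift above that baseline.
for p in (0.2, 0.5, 0.8):
    print(p, round(tw_conditional_recognition(p), 3))
```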

17.
18.
19.
Five different ability estimators, namely maximum likelihood [MLE], weighted likelihood [WLE], Bayesian modal [BME], expected a posteriori [EAP] and the standardized number-right score [Z], were used as scores for conventional, multiple-choice tests. The bias, standard error and reliability of the five ability estimators were evaluated using Monte Carlo estimates of the unknown conditional means and variances of the estimators. The results indicated that ability estimates based on BME, EAP or WLE were reasonably unbiased for the range of abilities corresponding to the difficulty of a test, and that their standard errors were relatively small. Also, they were as reliable as the old standby, the number-right score.
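
As a rough illustration of one of these estimators, the sketch below computes an EAP ability estimate on a grid. The abstract does not state the item response model or the prior, so a Rasch model, a standard normal prior, and the item difficulties are all assumptions here, as are the function names.

```python
import math


def rasch_p(theta, b):
    """Assumed Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses, difficulties, half_width=6.0, n_points=121):
    """Posterior mean of ability on an evenly spaced, symmetric grid.

    responses are 0/1 item scores; the prior is standard normal
    (unnormalized, since the constant cancels in the ratio).
    """
    step = 2 * half_width / (n_points - 1)
    grid = [(i - (n_points - 1) / 2) * step for i in range(n_points)]
    weighted_sum = 0.0
    total = 0.0
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # prior density, up to a constant
        for x, b in zip(responses, difficulties):
            p = rasch_p(theta, b)
            weight *= p if x == 1 else (1.0 - p)  # likelihood of this response
        weighted_sum += theta * weight
        total += weight
    return weighted_sum / total


difficulties = [-1.0, 0.0, 1.0]  # hypothetical three-item test
for pattern in ([0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]):
    print(sum(pattern), round(eap_estimate(pattern, difficulties), 3))
```

Unlike the maximum likelihood estimate, the EAP stays finite for all-correct and all-incorrect patterns because the prior pulls the estimate toward zero.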

20.
Random item effects models provide a natural framework for the exploration of violations of measurement invariance without the need for anchor items. Within the random item effects modelling framework, Bayesian tests (Bayes factor, deviance information criterion) are proposed which enable multiple marginal invariance hypotheses to be tested simultaneously. The performance of the tests is evaluated with a simulation study which shows that the tests have high power and low Type I error rate. Data from the European Social Survey are used to test for measurement invariance of attitude towards immigrant items and to show that background information can be used to explain cross-national variation in item functioning.
