Similar literature
 20 similar documents found; search took 327 ms
1.
Huynh Huynh 《Psychometrika》1980,45(2):167-182
A nonrandomized minimax solution is presented for passing scores in the binomial error model. The computation does not require prior knowledge regarding an individual examinee or group test data for a population of examinees. The optimum passing score minimizes the maximum risk which would be incurred by misclassifications. A closed-form solution is provided for the case of constant losses, and tables are presented for a variety of situations including linear and quadratic losses. A scheme which allows for correction for guessing is also described. This work was performed pursuant to Grant No. NIE-G-78-0087 with the National Institute of Education, Department of Health, Education, and Welfare, Huynh Huynh, Principal Investigator. Points of view or opinions stated do not necessarily reflect NIE position or policy and no official endorsement should be inferred. The editorial assistance and comments of Anthony J. Nitko and of Joseph C. Saunders are gratefully acknowledged.
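
As a rough illustration of the minimax idea in this abstract, the sketch below picks a nonrandomized passing score for the constant-loss case. It is not the paper's closed-form solution: the function names, the mastery threshold `theta0`, and the two loss values are assumptions made for illustration. Because the probability of passing rises with true ability, the worst-case risk of each error type is approached at `theta0`, so only that point needs to be evaluated.

```python
from math import comb


def binom_tail(n: int, c: int, theta: float) -> float:
    """P(X >= c) for X ~ Binomial(n, theta); returns 0.0 when c > n."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(c, n + 1))


def minimax_cutoff(n: int, theta0: float, loss_fp: float, loss_fn: float) -> int:
    """Passing score c in 0..n+1 that minimizes the maximum misclassification risk.

    Assumed setup (hypothetical, for illustration): an examinee is a true master
    when ability >= theta0; passing a nonmaster costs loss_fp and failing a
    master costs loss_fn. c = n + 1 means nobody passes.
    """

    def max_risk(c: int) -> float:
        p_pass = binom_tail(n, c, theta0)
        # Worst case for nonmasters and for masters, both approached at theta0.
        return max(loss_fp * p_pass, loss_fn * (1.0 - p_pass))

    return min(range(n + 2), key=max_risk)


if __name__ == "__main__":
    print(minimax_cutoff(10, 0.7, 1.0, 1.0))
```

Raising `loss_fp` relative to `loss_fn` pushes the cutoff upward, which is the qualitative behavior one would expect when false passes are the costlier error.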

2.
A Bayesian approach for simultaneous optimization of test-based decisions is presented using the example of a selection decision for a treatment followed by a mastery decision. A distinction is made between weak and strong rules where, as opposed to strong rules, weak rules use prior test scores as collateral data. Conditions for monotonicity of optimal weak and strong rules are presented. It is shown that under mild conditions on the test score distributions and utility functions, weak rules are always compensatory by nature. The authors are indebted to Wilbert Kallenberg for his valuable comments and to Jan Gulmans for providing the data for the empirical example. The names of the authors are alphabetical; they are equally responsible for the contents of this paper.

3.
Huynh Huynh 《Psychometrika》1977,42(4):601-608
Two simple classes of mastery scores which are suitable for hand calculations are presented for beta-binomial test score distributions combined with linear and cubic referral success. The models provide a simple way to explore the consequences of selecting an arbitrary mastery score. Such assessment would be useful whenever the test user is not willing to post a priori a loss ratio, but wishes to look at the various consequences before aiming at a particular score.

4.
Recently there has been interest in the problem of determining an optimal passing score for a mastery test when the purpose of the test is to predict success or failure on an external criterion. For the case of constant losses for the two error types, a method of determining an optimal passing score is readily derived using standard techniques. The purpose of this note is to describe a lower bound to the probability of identifying an optimal passing score based on a random sample of N examinees. The work upon which this publication is based was performed pursuant to a grant [contract] with the National Institute of Education, Department of Health, Education and Welfare. Points of view or opinions stated do not necessarily represent official NIE position or policy.

5.
Huynh Huynh 《Psychometrika》1980,45(1):107-120
This paper describes an asymptotic inferential procedure for the estimates of the false positive and false negative error rates. Formulas and tables are described for the computations of the standard errors. A simulation study indicates that the asymptotic standard errors may be used even with samples of 25 cases as long as the Kuder-Richardson Formula 21 reliability is reasonably large. Otherwise, a large sample would be required. This work was performed pursuant to Grant No. NIE-G-78-0087 with the National Institute of Education, Department of Health, Education and Welfare, Huynh Huynh, Principal Investigator. Points of view or opinions stated do not necessarily reflect NIE position or policy and no official endorsement should be inferred. The editorial assistance of Joseph C. Saunders is gratefully acknowledged.
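
The abstract above conditions its advice on the Kuder-Richardson Formula 21 reliability. For readers who want to check that quantity, here is a minimal sketch of the standard KR-21 formula; the function name and the example inputs are illustrative assumptions, and the abstract itself does not say what counts as "reasonably large".

```python
def kr21(n_items: int, mean: float, variance: float) -> float:
    """Kuder-Richardson Formula 21 reliability estimate.

    n_items  -- number of dichotomously scored items (k)
    mean     -- mean of the total test scores (M)
    variance -- variance of the total test scores (s^2)

    KR-21 = k / (k - 1) * (1 - M * (k - M) / (k * s^2))
    """
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    if variance <= 0:
        raise ValueError("score variance must be positive")
    return (n_items / (n_items - 1)) * (1 - mean * (n_items - mean) / (n_items * variance))


if __name__ == "__main__":
    # Hypothetical 20-item test with mean 14 and variance 16.
    print(round(kr21(20, 14.0, 16.0), 4))
```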

6.
The Developmental Test of Visual-motor Integration was administered to 63 children in regular classrooms and 51 children in Special Education. Prediction based on total score (r = .68) was similar to a multiple R utilizing only five scores (.67). The over-all raw test score, based on all 24 designs, correctly classified 85% of the children while a combined abbreviated score utilizing only 5 designs achieved an 80% differentiation. Results were interpreted as confirming the hypothesis of redundancy in this perceptual-motor test.

7.
The impact of response distortion (faking) on selection decisions was investigated. Participants (N = 224) completed the NEO-PI-R under instructions to “make the most favorable impression” and/or “answer honestly.” Those instructed to fake were often over-represented at the top of the score distributions as instructions to fake resulted in higher scores both between and within groups in a test–retest situation. There was significantly lower correspondence between participants’ honest scores and their faked scores as well as multiple instances where participants with unfavorable honest scores subsequently produced the most favorable scores when faking. Response distortion may remain a serious threat to the use of personality test scores in selection.
Adrian Thomas

8.
A modern test that takes advantage of the opportunities provided by advancements in computer technology is the multimedia test. The purpose of this study was to investigate the criterion-related validity of a specific open-ended multimedia test, namely a webcam test, by means of a concurrent validity study. In a webcam test a number of work-related situations are presented and participants have to respond as if these were real work situations. The responses are recorded with a webcam. The aim of the webcam test which we investigated is to measure the effectiveness of social work behaviour. This first field study on a webcam test was conducted in an employment agency in The Netherlands. The sample consisted of 188 consultants who participated in a certification process. For the webcam test, good interrater reliabilities and internal consistencies were found. The results showed the webcam test to be significantly correlated with job placement success. The webcam test scores were also found to be related to job knowledge. Hierarchical regression analysis demonstrated that the webcam test has incremental validity up to and beyond job knowledge in predicting job placement success. The webcam test, therefore, seems a promising type of instrument for personnel selection.

9.
For assigning subjects to treatments the point of intersection of within-group regression lines is ordinarily used as the critical point. This decision rule is criticized and, for several utility functions and any number of treatments, replaced by optimal monotone, nonrandomized (Bayes) rules. Both treatments with and without mastery scores are considered. Moreover, the effect of unreliable criterion scores on the optimal decision rule is examined, and it is illustrated how qualitative information can be combined with aptitude measurements to improve treatment assignment decisions. Although the models in this paper are presented with special reference to the aptitude-treatment interaction problem in education, it is indicated that they apply to a variety of situations in which subjects are assigned to treatments on the basis of some predictor score, as long as there are no allocation quota considerations.

10.
The validity of a univocal multiple-choice test is determined for varying distributions of item difficulty and varying degrees of item precision. Validity is a function of d² + v², where d measures item unreliability and v measures the spread of item difficulties. When this variance is very small, validity is high for one optimum cutting score, but the test gives relatively little valid information for other cutting scores. As this variance increases, eta increases up to a certain point, and then begins to decrease. Screening validity at the optimum cutting score declines as this variance increases, but the test becomes much more flexible, maintaining the same validity for a wide range of cutting scores. For items of the type ordinarily used in psychological tests, the test with uniform item difficulty gives greater over-all validity, and superior validity for most cutting scores, compared to a test with a range of item difficulties. When a multiple-choice test is intended to reject the poorest F per cent of the men tested, items should on the average be located at or above the threshold for men whose true ability is at the Fth percentile. This research was performed under contract Nop 536 with the Bureau of Naval Personnel, and received additional support from the Bureau of Research and Service, College of Education, University of Illinois.

11.
In this paper, we apply sequential one-sided confidence interval estimation procedures with β-protection to adaptive mastery testing. The procedures of fixed-width and fixed proportional accuracy confidence interval estimation can be viewed as extensions of one-sided confidence interval procedures. It can be shown that the adaptive mastery testing procedure based on a one-sided confidence interval with β-protection is more efficient in terms of test length than a testing procedure based on a two-sided/fixed-width confidence interval. Some simulation studies applying the one-sided confidence interval procedure and its extensions mentioned above to adaptive mastery testing are conducted. For the purpose of comparison, we also have a numerical study of adaptive mastery testing based on Wald's sequential probability ratio test. The comparison of their performances is based on the correct classification probability, averages of test length, as well as the width of the “indifference regions.” From these empirical results, we found that applying the one-sided confidence interval procedure to adaptive mastery testing is very promising.
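
The comparison baseline named in this abstract, Wald's sequential probability ratio test, can be sketched for right/wrong item responses as follows. This shows only the textbook SPRT with Wald's approximate boundaries, not the authors' confidence-interval procedure; the function name, the proportion-correct values `p0` and `p1` that bound the indifference region, and the default error rates are assumptions for illustration.

```python
from math import log
from typing import Iterable, Tuple


def sprt_mastery(
    responses: Iterable[bool],
    p0: float,
    p1: float,
    alpha: float = 0.05,
    beta: float = 0.05,
) -> Tuple[str, int]:
    """Classify an examinee with Wald's SPRT on Bernoulli item responses.

    H0: proportion correct = p0 (nonmaster); H1: proportion correct = p1 (master),
    with p0 < p1. Returns the decision and the number of items administered.
    """
    upper = log((1 - beta) / alpha)   # accept H1 at or above this log-likelihood ratio
    lower = log(beta / (1 - alpha))   # accept H0 at or below this log-likelihood ratio
    llr = 0.0
    n_used = 0
    for correct in responses:
        n_used += 1
        llr += log(p1 / p0) if correct else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return ("master", n_used)
        if llr <= lower:
            return ("nonmaster", n_used)
    return ("undecided", n_used)


if __name__ == "__main__":
    print(sprt_mastery([True] * 20, p0=0.6, p1=0.8))
```

The gap between `p0` and `p1` plays the role of the indifference region: narrowing it makes each response less informative and lengthens the test.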

12.
The Counterproductive Behavior Index (CBI) is a 120-item, true-false questionnaire developed to assess five aspects of counterproductive workplace behavior: Dependability Concerns, Aggression, Substance Abuse, Honesty Concerns, and Computer Abuse, plus an overall measure of Total Concerns. It also yields a Good Impression score. To assess predictive validity, undergraduates with significant work experience simulated persons who had each of the five counterproductive behaviors but were exercising care not to get caught trying to conceal that behavior. All differences between simulated and normative responding were highly significant, with a median sensitivity of .89 for a specificity of .90. For similar participants, construct validity correlations ranged from .37 through .72 with a median of .50, and the correlation of CBI Total Concerns with a Total Validity Index was .66. Test-retest reliabilities of the CBI scales ranged from .79 to .94 with a median correlation of .87. These compare favorably with previously reported internal consistencies (Cronbach alphas). Analysis of the CBI scores of the original normative group at different levels of Good Impression showed that none of the six Concerns scores were affected by attempts to make a good impression until the Good Impression score reached the 90th percentile.

13.
The content unreliability of an essay test is the error due to the items used or the content of the test. The reader unreliability is due to variation in judgment of the persons who read and score the essay test. The content reliability of an essay test is accordingly defined as being independent of the reader reliability. Formulae are derived for the reader reliability and for the content reliability. The content reliability is found to be equal to the geometric mean of the test reliabilities computed from the scores assigned by the two readers, divided by the reader reliability.
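
The closing sentence of this abstract states a formula that is easy to write down directly. The sketch below only encodes that sentence; the function and argument names are illustrative, and how each reliability is estimated is left to the paper.

```python
from math import sqrt


def content_reliability(test_rel_reader1: float, test_rel_reader2: float, reader_rel: float) -> float:
    """Content reliability as described in the abstract.

    Geometric mean of the two test reliabilities (one computed from each
    reader's scores), divided by the reader reliability.
    """
    if test_rel_reader1 < 0 or test_rel_reader2 < 0:
        raise ValueError("test reliabilities must be non-negative")
    if reader_rel <= 0:
        raise ValueError("reader reliability must be positive")
    return sqrt(test_rel_reader1 * test_rel_reader2) / reader_rel


if __name__ == "__main__":
    # Hypothetical values: test reliabilities .64 and .81, reader reliability .90.
    print(round(content_reliability(0.64, 0.81, 0.90), 3))
```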

14.
Statistically based banding is often considered a viable method for minimizing adverse impact in test‐based employment decisions. By utilizing the standard error of the difference (SED), scores are equated based on the assumption that there is substantial unreliability in any single observed score. However, based on the derivations of Dudek, the formula commonly used to calculate the standard error of measurement (SEM), a component that is typically used to calculate the SED, is incorrect. Specifically, utilizing the SEM when calculating the SED produces a band of observed scores around a true score, not a band of true scores around an observed score as would be appropriate for banding. This study compares the differences between banding‐based selection decisions when the appropriate SED formula, which utilizes the standard error of estimate, is and is not applied. Overall, results suggest that utilizing the appropriate formula for calculating the SED produces substantial variations in employment decisions. The potential legal and ethical implications of these discrepancies are discussed.
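
To make the distinction in this abstract concrete, the sketch below computes a band width from each of the two standard errors. The SEM and standard error of estimate use their usual classical test theory forms; multiplying by the square root of 2 to obtain an SED, and by a critical value to obtain a band, is an assumption about the common banding recipe rather than something the abstract spells out. All names and example numbers are hypothetical.

```python
from math import sqrt


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - r_xx)."""
    return sd * sqrt(1.0 - reliability)


def see(sd: float, reliability: float) -> float:
    """Standard error of estimate of true scores from observed scores:
    sd * sqrt(r_xx * (1 - r_xx))."""
    return sd * sqrt(reliability * (1.0 - reliability))


def sed(standard_error: float) -> float:
    """Standard error of the difference between two scores (assumed sqrt(2) rule)."""
    return sqrt(2.0) * standard_error


def band_width(standard_error: float, critical_value: float = 1.96) -> float:
    """Band width in score points: critical value times the SED."""
    return critical_value * sed(standard_error)


if __name__ == "__main__":
    sd, rxx = 10.0, 0.84  # hypothetical test statistics
    print(band_width(sem(sd, rxx)), band_width(see(sd, rxx)))
```

Because the standard error of estimate equals the SEM multiplied by the square root of the reliability, a band built on it is never wider than the SEM-based band, and the two diverge more as reliability drops.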

15.
Pan T, Yin Y 《Psychological Methods》2012,17(2):309-311
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)² and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First, strictly speaking, MSD should not be compared to SEM because they measure different things, have different assumptions, and capture different sources of errors. Second, the related proof and conclusions in Barchard hold only under the assumptions of equal reliabilities, homogeneous variances, and independent measurement errors. To address the limitations, we propose that MSD should be compared to the standard error of measurement of difference scores (SEM_{x-y}) so that the comparison can be extended to the conditions when 2 tests have unequal reliabilities and score variances.
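
The quantities compared in this abstract can be written out under classical test theory. The derivation below is a sketch under stated assumptions (independent, zero-mean errors; a common true score in the parallel case) and uses the textbook form of the standard error of measurement of difference scores; the symbols are introduced here for illustration and are not quoted from the paper.

```latex
% Parallel tests: X = T + E_X, Y = T + E_Y, with independent errors
% and Var(E_X) = Var(E_Y) = SEM^2.
\begin{align}
\mathrm{MSD} = \mathbb{E}\big[(X - Y)^2\big]
  &= \mathbb{E}\big[(E_X - E_Y)^2\big]
   = \operatorname{Var}(E_X) + \operatorname{Var}(E_Y)
   = 2\,\mathrm{SEM}^2 . \\
% Unequal reliabilities (r_{xx}, r_{yy}) and score variances (s_x^2, s_y^2),
% errors still independent:
\mathrm{SEM}_{x-y}
  &= \sqrt{\mathrm{SEM}_x^2 + \mathrm{SEM}_y^2}
   = \sqrt{s_x^2\,(1 - r_{xx}) + s_y^2\,(1 - r_{yy})} .
\end{align}
```

When the two tests share the same variance and reliability, the second expression reduces to the square root of 2 times SEM, matching the parallel case.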

16.
Researchers have been documenting the influence of framing upon decision making for more than two decades; decisions appear to change in response to superficial changes in the presentation of possible outcomes. Several studies of medical decision making have revealed, for instance, that clinical decisions differ when options are presented as gains (survival rates) rather than losses (mortality rates). However, most studies of framing effects in the medical domain have utilized a very limited number of clinical problems that have not allowed an adequate test of the prevalence of the phenomena. To extend previous studies, we presented three groups of subjects (experienced internists, residents, and third-year medical students) with booklets containing twelve hypothetical medical cases. Half of the subjects received gain versions and half received loss versions of the same cases. Chi-square analyses revealed that framing did not influence any of the decisions of medical students and influenced the decisions of residents and experienced physicians on only two of the clinical problems (the same two problems). It appears that the prevalence of framing effects in the clinical domain may be limited.

17.
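Xie Yue, Jia Xiaoming 《心理科学》 (Journal of Psychological Science) 2021,(4):1004-1011
To explore the multiple-relationship ethical situations that university counselors face and how they make decisions in them, interview data from 17 university counselors were analyzed. Results: the common situations mainly include new relationships created by giving and receiving gifts, a teacher-student relationship between client and counselor in addition to the counseling relationship, the counselor having a relationship with a third party connected to the client, the counselor meeting the client by chance outside the counseling room, the client having the counselor's contact information, and physical contact between client and counselor. Decisions took two forms: experience-led, in which the counselor decides on experience alone without realizing that the situation is an ethical one, and ethics-led, in which the counselor is aware of being in an ethical situation when deciding. Conclusion: university counseling involves some distinctive multiple relationships, and counselors need to make more of their decisions with ethical awareness.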

18.
An analytic method is presented for optimally classifying individuals into two subgroups on the basis of a cutting score on a test or test composite. The development assumes the test and criterion scores to be normally distributed, and the correlation surface to be bivariate normal. It is further assumed that individuals belong to the first or second sub-group depending on whether their criterion score is above or below a specified value. The predictor cutting score is determined so as to maximize the expected value of the decision procedure, taking gains and losses associated with correct and incorrect assignments into account. The opinions expressed are those of the authors and do not necessarily reflect those of the Navy Department. This research was supported in part by a grant from the National Institute of Mental Health, MH 10449-01.

19.
Some developments in multivariate generalizability   Total citations: 2 (self-citations: 0, citations by others: 2)
This article is concerned with estimation of components of maximum generalizability in multifacet experimental designs involving multiple dependent measures. Within a Type II multivariate analysis of variance framework, components of maximum generalizability are defined as those composites of the dependent measures that maximize universe score variance for persons relative to observed score variance. The coefficient of maximum generalizability, expressed as a function of variance component matrices, is shown to equal the squared canonical correlation between true and observed scores. Emphasis is placed on estimation of variance component matrices, on the distinction between generalizability- and decision-studies, and on extension to multifacet designs involving crossed and nested facets. An example of a two-facet partially nested design is provided. Appreciation is expressed to the Office of Research in Medical Education, University of Texas Medical Branch, for permitting use of their data.

20.