Similar literature
 Found 20 similar documents; search took 15 ms
1.
Huynh Huynh 《Psychometrika》1982,47(3):309-319
A Bayesian framework for making mastery/nonmastery decisions based on multivariate test data is described in this study. Overall, mastery is granted (or denied) if the posterior expected loss associated with such action is smaller than the one incurred by the denial (or grant) of mastery. An explicit form for the cutting contour which separates mastery and nonmastery states in the test score space is given for multivariate normal test scores and for a constant loss ratio. For multiple cutting scores in the true ability space, the test score cutting contour will resemble the boundary defined by multiple test cutting scores when the test reliabilities are reasonably close to unity. For tests with low reliabilities, decisions may very well be based simply on a suitably chosen composite score. This work was performed pursuant to Grant NIE-G-78-0087 with the National Institute of Education, Department of Health, Education, and Welfare, Huynh Huynh, Principal Investigator. Points of view or opinions stated do not necessarily reflect NIE positions or policy and no official endorsement should be inferred. The assistance of Joseph C. Saunders is gratefully acknowledged. The author is indebted to an anonymous referee who pointed out several computational errors in the earlier versions of the paper.
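The decision rule in this abstract (grant mastery when the posterior expected loss of granting is smaller than that of denying) can be illustrated with a minimal univariate sketch. This is not the paper's multivariate cutting contour: the single normal true score, the normal error model, and every name below (`grant_mastery`, `true_cut`, the two loss parameters) are assumptions made only for illustration.

```python
from statistics import NormalDist


def grant_mastery(x, prior_mean, prior_sd, error_sd, true_cut,
                  loss_false_grant=1.0, loss_false_denial=1.0):
    """Return True if granting mastery has the smaller posterior expected loss.

    Assumed model (illustrative only): true score T ~ N(prior_mean, prior_sd^2),
    observed score x = T + e with e ~ N(0, error_sd^2).
    """
    # Reliability: share of observed-score variance due to true scores.
    reliability = prior_sd ** 2 / (prior_sd ** 2 + error_sd ** 2)
    # Normal-normal posterior for the true score given x.
    post_mean = prior_mean + reliability * (x - prior_mean)
    post_sd = (reliability * error_sd ** 2) ** 0.5
    # Posterior probability that the true score reaches the mastery cut.
    p_master = 1.0 - NormalDist(post_mean, post_sd).cdf(true_cut)
    # Expected loss of each action under constant losses.
    expected_loss_grant = (1.0 - p_master) * loss_false_grant
    expected_loss_deny = p_master * loss_false_denial
    return expected_loss_grant < expected_loss_deny
```

With a fixed loss ratio, raising `loss_false_grant` relative to `loss_false_denial` moves the observed-score cutoff upward, which is the one-dimensional analogue of shifting the cutting contour.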

2.
Established results on latent variable models are applied to the study of the validity of a psychological test. When the test predicts a criterion by measuring a unidimensional latent construct, not only must the total score predict the criterion, but the joint distribution of criterion scores and item responses must exhibit a certain pattern. The presence of this population pattern may be tested with sample data using the stratified Wilcoxon rank sum test. Often, criterion information is available only for selected examinees, for instance, those who are admitted or hired. Three cases are discussed: (i) selection at random, (ii) selection based on the current test, and (iii) selection based on other measures of the latent construct. Discriminant validity is also discussed. This work was supported in part by Grant SES-87-01890 from the Measurement Methods and Data Improvement Program of the U.S. National Science Foundation.

3.
When K tests are given to N individuals, and for each individual there are two criterion measures, then (1) the multiple regression weight to be applied to the standard score for each test to predict the criterion-difference score equals the difference of the weights for predicting each criterion separately; (2) the difference between the predicted scores equals the predicted difference (each test being assigned the appropriate multiple regression weight); (3) the square of the multiple correlation between predicted and actual criterion-difference scores equals the sum of squares of the multiple correlations of the battery with each criterion less the product of these correlations and the correlation between predicted scores all divided by twice the quantity one minus the criterion intercorrelation; and (4) the variance of errors of estimating the criterion-difference score equals the sum of the variances of errors of estimating each criterion score minus twice the criterion intercorrelation, plus twice the correlation between predicted scores multiplied by the product of the square root of one minus the variance of errors of estimating one criterion and the corresponding square root for the second criterion. The author wishes to express his appreciation for the suggestions and guidance given by Dr. Harold Gulliksen in the preparation of this article. He also wishes to acknowledge the helpful comments of Dr. Paul Horst and Dr. Ledyard Tucker on certain phases of the development.
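Results (1) and (2) of this abstract follow in a few lines from the normal equations. The notation below is not taken from the paper and is assumed only for this sketch: z is the vector of K standardized test scores with nonsingular correlation matrix R, y_1 and y_2 are the standardized criteria, r_1 and r_2 are the vectors of test-criterion correlations, and the difference score d is left unstandardized.

```latex
% Assumed notation (not from the source): z, R, r_1, r_2, y_1, y_2 as described above.
\begin{align*}
\beta_1 &= R^{-1} r_1, \qquad \beta_2 = R^{-1} r_2, \\
d &= y_1 - y_2 \;\Rightarrow\; \operatorname{Cov}(z, d) = r_1 - r_2, \\
\beta_d &= R^{-1}(r_1 - r_2) = \beta_1 - \beta_2, \\
\hat{d} &= z^{\top}\beta_d = z^{\top}\beta_1 - z^{\top}\beta_2 = \hat{y}_1 - \hat{y}_2 .
\end{align*}
```

The third line is statement (1) and the fourth is statement (2); both rest only on the linearity of covariance and of the least-squares solution.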

4.
The Gleser-DuBois conditions for selecting from a number of test items those which will maximize the correlation between total test score and criterion will degenerate into expressions requiring only item counts on total distributions and the upper halves of distributions. A grouping convention for scores near medians is recommended. The inefficiency of the method is easily compensated for, because, regardless of the size of the sample, only standard test-scoring equipment and brief computations are required. A procedure is outlined, and some applications are discussed.

5.
Mediation analysis uses measures of hypothesized mediating variables to test theory for how a treatment achieves effects on outcomes and to improve subsequent treatments by identifying the most efficient treatment components. Most current mediation analysis methods rely on untested distributional and functional form assumptions for valid conclusions, especially regarding the relation between the mediator and outcome variables. Propensity score methods offer an alternative whereby the propensity score is used to compare individuals in the treatment and control groups who would have had the same value of the mediator had they been assigned to the same treatment condition. This article describes the use of propensity score weighting for mediation with a focus on explicating the underlying assumptions. Propensity scores have the potential to offer an alternative estimation procedure for mediation analysis with alternative assumptions from those of standard mediation analysis. The methods are illustrated investigating the mediational effects of an intervention to improve sense of mastery to reduce depression using data from the Job Search Intervention Study (JOBS II). We find significant treatment effects for those individuals who would have improved sense of mastery when in the treatment condition but no effects for those who would not have improved sense of mastery under treatment.

6.
Self-reported SAT and ACT scores and grades in college-level mathematics and related courses were verified for 494 individuals. Correct responses decline for 2 years and then stabilize. The percentage of correct responses is approximately twice as high for individuals whose grades are in the upper tercile than for those whose grades are in the lowest tercile. Given that an incorrect report is made, the proportion of those overstating grades or test scores is approximately 0.50 for those in the upper tercile of the distribution, and 0.90 and 0.75 for those in the lowest tercile of the grade and test score distribution. The inflated reports of grades and test scores by individuals with low scores are interpreted as reconstructions of memory content based on failure experiences and the affective context of these experiences.

7.
Four rats' choices between two levers were differentially reinforced using a runs‐test algorithm. On each trial, a runs‐test score was calculated based on the last 20 choices. In Experiment 1, the onset of stimulus lights cued when the runs score was smaller than criterion. Following cuing, the correct choice was occasionally reinforced with food, and the incorrect choice resulted in a blackout. Results indicated that this contingency reduced sequential dependencies among successive choice responses. With one exception, subjects' choice rule was well described as biased coin flipping. In Experiment 2, cuing was removed and the reinforcement criterion was changed to a percentile score based on the last 20 reinforced responses. The results replicated those of Experiment 1 in successfully eliminating first‐order dependencies in all subjects. For 2 subjects, choice allocation was approximately consistent with nonbiased coin flipping. These results suggest that sequential dependencies may be a function of reinforcement contingency.
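The abstract does not say how the runs-test score over the last 20 choices was computed. One common choice is the Wald-Wolfowitz runs statistic, sketched below under that assumption; the function name `runs_z` and the `"L"`/`"R"` coding of the two levers are hypothetical.

```python
from math import sqrt


def runs_z(choices):
    """Wald-Wolfowitz runs z-score for a two-valued sequence such as "LRLLR...".

    Returns None when the score is undefined (only one lever was chosen,
    or the variance of the run count is zero).
    """
    n = len(choices)
    n1 = sum(1 for c in choices if c == "L")
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        return None
    # A new run starts at every position where the choice changes.
    runs = 1 + sum(1 for a, b in zip(choices, choices[1:]) if a != b)
    # Mean and variance of the run count under random ordering.
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        return None
    return (runs - mean) / sqrt(var)
```

A strongly negative score means too few runs (perseveration on one lever), a strongly positive score means too many (alternation), and values near zero are consistent with coin-flip-like choice. In a trial loop the function would be applied to a sliding window such as `history[-20:]`.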

8.
HORST P 《Psychometrika》1951,16(2):189-202
Having given a fixed amount of total testing time it is important to know how long each test in the battery should be so that the correlation of the battery with the criterion will be a maximum. The precise solution for the test lengths will depend on a particular set of conditions which may be specified. The writer has previously presented solutions for two sets of conditions. This article presents the solution for a third set of conditions. These are: (1) The total number of items or testing time is fixed. (2) The score is the total number of items correctly answered. (3) The test lengths are determined in such a way that the correlation of total score with the criterion is a maximum. The solutions for the two previous sets of conditions, together with the current set, are summarized. A set of experimental data is submitted to each solution and the three sets of results are compared.

9.
An index is proposed to measure the extent of agreement of the data of a sociometric test with another test made at an earlier time or on another test criterion. The index is used to define an index of concordance between the two tests. It is shown how the index may be used for either individuals or groups. Tests of the hypothesis that agreement is random are given for all cases and applied to an example. Work done under the sponsorship of the Office of Naval Research.

10.
HORST P 《Psychometrika》1948,13(3):125-134
A battery of pencil-and-paper tests is commonly used for predicting a single criterion. If the score on each test is the number of correct answers, the composite battery score would normally be the sum of the weighted test scores, where the weights are the raw score regression weights. Knowing the reliability of each test, it is possible to alter the lengths of the tests in a manner such that the weights will all be equal. The composite battery score would then simply be the total number of items answered correctly and scoring would be greatly simplified. Such simplification is particularly desirable where the volume of testing is large. Section I of the article outlines the procedure for altering the lengths of the tests, and Section II gives a proof of the method.

11.
Three data sets are analyzed that permit double cross-validation of a test battery against criterion variables in a number of educational programs or jobs. The validity of the first general factor score is compared with that obtained from the set of cross-validated regression weights, and is found to account, respectively, for approximately 85, 90 and 120 percent as much criterion variance as the cross-validated regression weights. Small further contributions appear to be made by a mechanical/technical and by a psychomotor factor. However, for a wide range of criterion variables the major role in validity appears to be played by a common general factor.

12.
The present research examined the construct and criterion validities of the Cooperative and Competitive Personality Scale (CCPS) in a social dilemma context. The results from three studies supported the notion that cooperativeness and competitiveness are two independent dimensions, challenging the traditional view that they are two ends of a single continuum. First, confirmatory factor analyses revealed that a two‐factor structure fit the data significantly better than a one‐factor structure. Moreover, cooperativeness and competitiveness were either not significantly correlated (Studies 1 and 3) or only moderately positively correlated (Study 2). Second, cooperativeness and competitiveness were differentially associated with Schwartz's Personal Values. These results further supported the idea that cooperativeness and competitiveness are two distinct constructs. Specifically, the individuals who were highly cooperative emphasized self‐transcendent values (i.e., universalism and benevolence) more, whereas the individuals who were highly competitive emphasized self‐enhancement values (i.e., power and achievement) more. Finally, the CCPS, which adheres to the trait perspective of personality, was found to be a useful supplement to more prevalent social motive measures (i.e., social value orientation) in predicting cooperative behaviors. Specifically, in Study 2, when social value orientation was controlled for, the CCPS significantly predicted cooperative behaviors in a public goods dilemma (individuals who scored higher on the cooperativeness scale contributed more to the public goods). In Study 3, when social value orientation was controlled for, the CCPS significantly predicted cooperative behaviors in commons dilemmas (individuals who scored higher on the cooperativeness scale requested fewer resources from the common resource pool). The practical implications of the CCPS in conflict resolution, as well as in recruitment and selection settings, are discussed.

13.
The validity of a test is often estimated in a nonrandom sample of selected individuals. To accurately estimate the relation between the predictor and the criterion we correct this correlation for range restriction. Unfortunately, this corrected correlation cannot be transformed using Fisher's Z transformation, and asymptotic tests of hypotheses based on small or moderate samples are not accurate. We developed a Fisher r to Z transformation for the corrected correlation for each of two conditions: (a) the criterion data were missing due to selection on the predictor (the missing data were MAR); and (b) the criterion was missing at random, not due to selection (the missing data were MCAR). The two Z transformations were evaluated in a computer simulation. The transformations were accurate, and tests of hypotheses and confidence intervals based on the transformations were superior to those that were not based on the transformations.
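The abstract does not state which range-restriction correction is used. For direct selection on the predictor, a widely used formula is the one usually called Thorndike Case II, and the sketch below assumes that formula; the function and parameter names are hypothetical. The authors' modified Z transformations are not reproduced here because the abstract gives no formula for them.

```python
from math import sqrt


def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Correct a correlation observed in a selected sample (assumed Case II formula).

    r               -- predictor-criterion correlation in the selected sample
    sd_unrestricted -- predictor SD in the full applicant population
    sd_restricted   -- predictor SD in the selected sample
    """
    u = sd_unrestricted / sd_restricted  # ratio of SDs; above 1 under selection
    return r * u / sqrt(1.0 + r * r * (u * u - 1.0))
```

When the two standard deviations are equal the correction leaves `r` unchanged, and as selection becomes more severe (larger `u`) the corrected value moves further above the observed correlation.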

14.
An attempt has been made in this paper to show that culture fair tests have some problems associated with them. These tests should be examined and reviewed closely before being used and should not be regarded as the answer to testing the culturally disadvantaged. The following points were made in this paper: Culture fair tests measure different psychological functions. Culture fair tests today measure such functions as spatial visualization, abstract reasoning, perceptual speed, etc. Culture fair tests vary considerably in format. Some are pencil and paper tests, some are performance tests. Some use verbal instructions, others do not. There are many test parameters along which culture fair tests now vary. Some evidence suggests that culture fair tests possibly increase the differential between the culturally disadvantaged and the more advantaged population. Use of these tests may not be in the best interests of minority groups. It is not yet clear on which kinds of items culturally disadvantaged people perform more poorly. Some evidence suggests that they do better on verbal items and worse on perceptual items, which is in contrast to the assumption of most proponents of culture-fair tests. The validity of culture fair tests has not been shown to be better than that of more traditional tests. In contrast, some research even indicates that they do not show relationships as high. What is to be done, if anything, about the test differentials between the culturally disadvantaged and the majority population? Some individuals (Lorge, 1964; Coffman, 1964) agree that the elimination of group differences on tests is futile and argue that the real task at hand is a realistic attempt to study the behavioral significance of test differences. In essence, this is an all out attempt to collect validation information. Does a particular score for a black have the same behavioral implications as a higher (or lower) test score for a white? Are there criterion differences that are related to test differences? Do differential validities exist for various subgroups? Are the standard errors of estimates different for different groups? This approach is essentially what has been pursued by individuals investigating the “moderating” effects of subgrouping by race and/or socio-economic factors. The investigation of test differences within and between subgroups is called for. In my opinion, attempting to mask test differentials by using culture fair tests may in actuality have the reverse of the intended effect. Test differentials may actually increase, making it more difficult for culturally disadvantaged individuals to be selected into schools, jobs, etc. Clearly, the construction of culture fair tests is not the only answer to testing the disadvantaged.

15.
Recently there has been interest in the problem of determining an optimal passing score for a mastery test when the purpose of the test is to predict success or failure on an external criterion. For the case of constant losses for the two error types, a method of determining an optimal passing score is readily derived using standard techniques. The purpose of this note is to describe a lower bound to the probability of identifying an optimal passing score based on a random sample of N examinees. The work upon which this publication is based was performed pursuant to a grant [contract] with the National Institute of Education, Department of Health, Education and Welfare. Points of view or opinions stated do not necessarily represent official NIE position or policy.
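For constant losses on the two error types, one simple way to pick a passing score from a sample is to try every observed score as the cut and keep the one with the smallest total loss. The sketch below assumes that empirical approach, which the abstract does not spell out; `optimal_passing_score` and its parameters are hypothetical names.

```python
def optimal_passing_score(scores, succeeded,
                          loss_false_pass=1.0, loss_false_fail=1.0):
    """Return (cut, total_loss) minimizing sample loss; an examinee passes if score >= cut.

    scores    -- test scores of the sampled examinees
    succeeded -- parallel booleans: success (True) or failure on the external criterion
    Only observed scores are tried as cuts; ties keep the lowest cut.
    """
    best_cut, best_loss = None, float("inf")
    for cut in sorted(set(scores)):
        loss = 0.0
        for score, ok in zip(scores, succeeded):
            passed = score >= cut
            if passed and not ok:
                loss += loss_false_pass   # passed the test, failed the criterion
            elif not passed and ok:
                loss += loss_false_fail   # failed the test, succeeded on the criterion
        if loss < best_loss:
            best_cut, best_loss = cut, loss
    return best_cut, best_loss
```

Because the cut is chosen from a finite sample, it may differ from the population-optimal cut; the probability of that happening is the quantity the note bounds.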

16.
According to prospect theory, individuals are risk averse regarding gains but risk seeking regarding losses, implying an S-shaped value function. The S-shaped value function hypothesis is based on experiments in which subjects are asked to choose separately between alternatives with either only positive or only negative outcomes, alternatives which rarely exist in the capital market. In addition, the S-shaped findings may be biased by the “certainty effect” and by probability distortion. In this paper we employ the recently developed prospect stochastic dominance criterion to test the prospect theory S-shaped value function hypothesis with mixed outcomes and with no “certainty effect.” Assuming that subjects do not distort moderate probabilities, we strongly reject the prospect theory S-shaped value function, with at least 76–86% of the choices being inconsistent with such preferences. When possible subjective probability distortions are taken into account, we find that at least 50–66% of the choices are inconsistent with an S-shaped value function.

17.
The purpose of this study was to test a priori predictions about the way in which avoidant personality disorder (APD) can be differentiated from depressive personality disorder (DPD) in a clinical population. Psychiatric outpatients were administered two measures of DPD, including the SCID-II for other DSM-IV Axis II personality disorders, along with criterion measures upon which the two disorders would be differentiated. APD was found to be most strongly associated with state and trait measures of anxiety, while DPD was most strongly associated with state and trait measures of hostility. Individuals with DPD had higher mean scores on measures of hostility than those without DPD, and individuals with APD had higher mean scores on measures of anxiety than those without APD. However, DPD measures were also significantly correlated with state and trait measures of anxiety and APD with measures of depressive symptoms. Furthermore, anxiety was found to be higher in some groups of individuals with DPD than those with APD. It is concluded that the level of hostility in this DPD population appears to be an important symptom by which to differentiate the two disorders and that a reconsideration of including DPD criterion #4 -- prone to brooding and worrying -- may be justified. Furthermore, the SCID-II interview may be better at differentiating DPD and APD than a self-report measure of DPD.

18.
Using scores of 1200 students on a long test as a criterion, each of five subtests of different difficulty has maximum correlation with the criterion when the criterion is dichotomized at a value appropriate to the difficulty of the subtest. A 50-item test element is scored on an all-or-none basis with different standards for passing, and the percentage of passes for successive points on the criterion variable is computed. The Constant Method is applied to this relationship. The limen thus computed is a measure of difficulty, the dispersion is a measure of average (or total) validity, and the slope of the curve is a measure of differential validity. The difficulty of a test element is thus directly related to the maximum differential validity.

19.
For 25 years psychologists have measured systematic measurement bias in terms of regression lines. According to this traditional approach a test is an unbiased predictor of a criterion for all subgroups if all subgroups have identical Y regression lines (i.e., identical slopes and identical Y intercepts). This paper shows that the traditional model is fundamentally incorrect and identical Y regression lines are not expected to occur with an unbiased test in a testing situation in which one group scores lower than another group on both the test and criterion. This is the case even if the test is perfectly reliable. The traditional model for measuring bias actually results in a consistent error or bias against groups which score lower than average on both the test and criterion. In practice this bias operates against minority groups. Tests now thought to be unbiased or even biased in favor of minority groups may in fact be biased against minority groups. A new model of test bias, which is based solely on measurement principles, is briefly introduced. In this model unbiased tests produce groups with identical test-criterion common-factor axes having a slope of S_YC/S_XC and with each axis intersecting the group centroids.

20.