Similar literature
 20 similar documents found (search time: 31 ms)
1.
Although difference scores are widely used in classifying children as learning-disabled, their psychometric properties are often not well understood. Such scores generally contain more error than single test scores. Reliability and standard error of measurement figures for several combinations of ability and achievement measures are presented. The rates and types of errors that occur when such scores are used to classify children as learning-disabled are discussed. Three recommendations for using difference scores are given: (a) combinations of ability and achievement tests that yield difference score reliabilities higher than .80 should be used when classifying children; (b) scores should be reported as a band of scores (± one standard error of measurement) to inform decision-makers regarding the amount of error estimated to be in the score; and (c) the criterion score for classifying the learning disabled should be set after consideration of the rate and types of errors likely to occur.
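Recommendations (a) and (b) above can be illustrated with a minimal Python sketch. It assumes the standard classical-test-theory formulas for the reliability and standard error of measurement of a difference score with uncorrelated measurement errors; the abstract does not state which formulas the authors used. The function names and all numeric inputs are hypothetical.

```python
import math

def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Classical-test-theory reliability of the difference X - Y.

    r_xx, r_yy: reliabilities of the two tests; r_xy: their correlation.
    Assumes uncorrelated measurement errors (an assumption of this sketch).
    """
    cov = r_xy * sd_x * sd_y
    true_var = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * cov
    observed_var = sd_x**2 + sd_y**2 - 2 * cov
    return true_var / observed_var

def difference_score_band(diff, r_xx, r_yy, sd_x=1.0, sd_y=1.0):
    """Return (low, high): the difference score +/- one SEM of the difference."""
    sem_diff = math.sqrt(sd_x**2 * (1 - r_xx) + sd_y**2 * (1 - r_yy))
    return diff - sem_diff, diff + sem_diff

# Hypothetical ability/achievement pair, both scaled to SD = 15.
r_dd = difference_score_reliability(0.95, 0.90, 0.60, sd_x=15, sd_y=15)
low, high = difference_score_band(20.0, 0.95, 0.90, sd_x=15, sd_y=15)
print(round(r_dd, 4), round(low, 2), round(high, 2))
```

With these made-up inputs the difference-score reliability is about .81, which just clears the .80 threshold in recommendation (a) even though both component tests have reliabilities of .90 or higher.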

2.
In experimental research, it is not uncommon to assign clusters to conditions. When analysing the data of such cluster-randomized trials, a multilevel analysis should be applied in order to take into account the dependency of first-level units (i.e., subjects) within a second-level unit (i.e., a cluster). Moreover, the multilevel analysis can handle covariates on both levels. If a first-level covariate is involved, usually the within-cluster effect of this covariate will be estimated, implicitly assuming the contextual effect to be equal. However, this assumption may be violated. The focus of the present simulation study is the effects of ignoring the inequality of the within-cluster and contextual covariate effects on parameter and standard error estimates of the treatment effect, which is the parameter of main interest in experimental research. We found that ignoring the inequality of the within-cluster and contextual effects does not affect the estimation of the treatment effect or its standard errors. However, estimates of the variance components, as well as standard errors of the constant, were found to be biased.

3.
The REMBRANDT system for multicriteria decision analysis consists of both the multiplicative variant of the AHP (which employs a method of pairwise comparative judgements by a decision maker to arrive at final impact scores for the alternatives under consideration) and SMART, the simple multiattribute rating technique (which utilizes direct rating of alternatives to achieve final impact scores). This paper examines the effect of imprecision or uncertainty in the decision maker's pairwise judgements or ratings of alternatives by expressing each pairwise judgement or rating as a probability distribution, and the structure of REMBRANDT's component models is exploited to derive interval judgements or interval ratings of the alternatives’ final impact scores. These interval judgements or interval ratings can be used to determine the probability of rank reversal amongst alternatives, i.e. to assess the stability of the final impact score vector. © 1998 John Wiley & Sons, Ltd.

4.
The multivariate rather than the univariate range correction is used for estimating unrestricted applicant population validities in many, but not all, military test validity studies. A Monte Carlo approach compared the standard errors of range-corrected validities under various experimental conditions adhering to the assumptions underlying correction accuracy. The multivariate corrected validities had smaller standard errors than both the univariate-corrected validities and the unrestricted validities. We conclude that using the univariate correction could fail to reveal the most valid selection instrument and that the multivariate correction should be used when scores for relevant predictors are available for the unrestricted population.
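For readers unfamiliar with range correction, the univariate case can be sketched in a few lines of Python. The sketch assumes the standard correction for direct range restriction on the predictor (often called Thorndike's Case II); the abstract does not say which univariate formula the study used, and the multivariate correction it recommends is not shown here. The function name and inputs are hypothetical.

```python
import math

def univariate_range_correction(r_restricted, sd_unrestricted, sd_restricted):
    """Correct a validity coefficient for direct range restriction on the predictor.

    Standard univariate formula, assumed here for illustration:
    r_c = r*u / sqrt(1 - r^2 + r^2 * u^2), with u = SD_unrestricted / SD_restricted.
    """
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return r * u / math.sqrt(1 - r**2 + (r**2) * (u**2))

# Hypothetical values: observed validity .30 in a selected sample whose
# predictor SD is two thirds of the applicant-population SD.
corrected = univariate_range_correction(0.30, sd_unrestricted=15.0, sd_restricted=10.0)
print(round(corrected, 3))
```

When the two standard deviations are equal (no restriction), the function returns the observed correlation unchanged, which is a quick sanity check on the formula.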

5.
The method of selecting among job applicants using statistically based banding has been proposed over the last 10 years as a way to increase workforce diversity. The method continues to be reviewed by academics and considered by practitioners. Although the goal of increasing workforce diversity is important, statistical banding of scores remains controversial. We present a set of unique, statistically and theoretically based criticisms of a form of banding (top‐score‐referenced banding) that is widely used in hundreds of jobs in the public sector throughout the United States. We suggest that even within the premises of such banding, the wrong formula is used to estimate the standard error of measurement and standard error of the difference. One consequence is that too many individuals are labeled as essentially equal with respect to test scores. A related consequence is that test scores within a single band are statistically different and should therefore be treated as such for selection purposes. A more logically and statistically defensible procedure for responding to diversity concerns is to continue to attend to adverse impact issues at each step of the recruiting and test development process.

6.
Eye movements, alternating movements, rapid pointing movements, and various tremors were measured on patients with Parkinson's disease (n = 21), on Cree subjects exposed to methylmercury (n = 36), and on healthy control subjects (n = 30). Neuromotor profiles were created according to thirty characteristics extracted from test results of four subgroups matched for age and composed of six subjects each. Z scores were calculated with respect to the mean and standard deviation of the control group for each of the 30 characteristics. The subgroup with the lower methylmercury blood level had larger z scores than the control subgroup, with a few positive values above one standard deviation. The subgroup with the higher methylmercury blood level had several z scores above two standard deviations. Interestingly, the abnormal values for the subgroup with Parkinson's disease were mostly limited to static tremor recorded with no visual feedback and reached up to 5 standard deviations. These results indicate that neuromotor profiles can be used to summarize information extracted from different neuromotor tests and to differentiate neurological conditions.
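The z-score step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the sample standard deviation (n - 1 denominator) of the control group, which the abstract does not specify, and every value below is made up.

```python
from statistics import mean, stdev

def z_scores_vs_control(values, control_values):
    """Express each value in control-group standard deviation units.

    Uses the sample SD (n - 1 denominator) of the control group; whether the
    study used the sample or population SD is an assumption of this sketch.
    """
    m = mean(control_values)
    s = stdev(control_values)
    return [(v - m) / s for v in values]

# Hypothetical measurements of one characteristic (e.g., a tremor amplitude)
# for a control subgroup of six subjects and two exposed subjects.
control = [7.0, 9.0, 10.0, 10.0, 11.0, 13.0]   # mean 10, sample SD 2
exposed = [14.0, 20.0]
print(z_scores_vs_control(exposed, control))
```

Repeating this for each of the 30 characteristics would yield one profile (a vector of 30 z scores) per subgroup.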

7.
Pan T, Yin Y. Psychological Methods, 2012, 17(2): 309-311
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)^2 and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First, strictly speaking, MSD should not be compared to SEM because they measure different things, have different assumptions, and capture different sources of errors. Second, the related proof and conclusions in Barchard hold only under the assumptions of equal reliabilities, homogeneous variances, and independent measurement errors. To address the limitations, we propose that MSD should be compared to the standard error of measurement of difference scores (SEM_{x-y}) so that the comparison can be extended to the conditions when 2 tests have unequal reliabilities and score variances.
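The relationship between the single-test SEM and the SEM of difference scores can be made concrete with a small Python sketch. It assumes the classical-test-theory definitions SEM = SD * sqrt(1 - reliability) and, for independent measurement errors, a squared SEM of the difference equal to the sum of the two squared SEMs. These are textbook formulas and may differ in detail from the derivation in the commentary; the numbers are hypothetical.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement of a single test (classical test theory)."""
    return sd * math.sqrt(1 - reliability)

def sem_difference(sd_x, r_xx, sd_y, r_yy):
    """SEM of the difference score X - Y, assuming independent measurement errors."""
    return math.sqrt(sem(sd_x, r_xx) ** 2 + sem(sd_y, r_yy) ** 2)

# Equal reliabilities and variances: the squared SEM of the difference
# reduces to 2 * SEM^2, the quantity the MSD was originally compared with.
equal_case = sem_difference(15, 0.90, 15, 0.90) ** 2
print(math.isclose(equal_case, 2 * sem(15, 0.90) ** 2))

# Unequal reliabilities and variances: no single SEM applies, but the
# SEM of the difference is still defined.
print(round(sem_difference(15, 0.95, 10, 0.80), 3))
```

The second call shows why a comparison against the SEM of the difference extends to tests with unequal reliabilities and score variances, where "2 times SEM squared" is ambiguous.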

8.
Green BF. Psychometrika, 1950, 15(3): 251-257
A procedure is proposed for testing the significance of group differences in the standard error of measurement of a psychological test. Wilks' criterion is used to assure that the tests used in ascertaining reliability and hence variance of errors of measurement may be assumed parallel for each group. Votaw's criterion may be used to check whether the test scores of all the groups have the same mean, variance, and covariance. It is possible, however, for the variance and reliability of the test to differ widely from group to group, so that Votaw's criterion is not satisfied even though the variance of errors of measurement stays relatively constant. For this case a modification of Neyman and Pearson's criterion is developed to test agreement among standard errors of measurement despite group differences in mean, variance, and reliability of the test. The author wishes to acknowledge the helpful criticisms of Dr. Harold Gulliksen, who suggested the problem.

9.
Principles and practice in reporting structural equation analyses   (Total citations: 4; self-citations: 0; citations by others: 4)
Principles for reporting analyses using structural equation modeling are reviewed, with the goal of supplying readers with complete and accurate information. It is recommended that every report give a detailed justification of the model used, along with plausible alternatives and an account of identifiability. Nonnormality and missing data problems should also be addressed. A complete set of parameters and their standard errors is desirable, and it will often be convenient to supply the correlation matrix and discrepancies, as well as goodness-of-fit indices, so that readers can exercise independent critical judgment. A survey of fairly representative studies compares recent practice with the principles of reporting recommended here.

10.
In every cross-cultural study, the question as to whether test scores obtained in different cultural populations can be interpreted in the same way across these populations has to be dealt with. Bias and equivalence have become the common terms to refer to the issue. A taxonomy of both bias and equivalence is presented. Bias can be engendered by the theoretical construct (construct bias), the method such as the form of test administration (method bias), and the item content (item bias). Equivalence refers to the measurement level at which scores can be compared across cultures. Three levels of equivalence are possible: the same construct is measured in each cultural group but the functional form of the relationship between scores obtained in various groups is unknown (structural equivalence), scores have the same measurement unit across populations but have different origins (measurement unit equivalence), and scores have the same measurement unit and origin in all populations (full scale equivalence). The most frequently encountered sources of bias and their remedies are described.

11.
The present research contrasts two seemingly complementary decision strategies: acceptance and elimination. In acceptance, a choice set is created by including suitable alternatives from an initial set of alternatives, whereas in elimination it is created by removing inappropriate alternatives from that same initial set. The research used realistic career decision-making scenarios and presented to respondents sets of alternatives that varied in their preexperimental strength values. Whereas complementarity of acceptance and elimination is implied by three standard (normative) assumptions of decision theory, we find a systematic discrepancy between the outcomes of these procedures: choice sets were larger in elimination than in acceptance. This acceptance–elimination discrepancy is directly tied to subcomplementarity. The central tenet of the theoretical framework developed here is that acceptance and elimination procedures imply different types of status quo for the alternatives, thereby invoking a different selection criterion for each procedure. A central prediction of the dual-criterion framework is that middling alternatives should be most susceptible to the type of procedure used. The present studies focus on this prediction which is substantiated by the results showing that middling alternatives yield the greatest discrepancy between acceptance and elimination. The implications of this model and findings for various research domains are discussed.

12.
In many situations, researchers collect multilevel (clustered or nested) data yet analyze the data either ignoring the clustering (disaggregation) or averaging the micro-level units within each cluster and analyzing the aggregated data at the macro level (aggregation). In this study we investigate the effects of ignoring the nested nature of data in confirmatory factor analysis (CFA). The bias incurred by ignoring clustering is examined in terms of model fit and standardized parameter estimates, which are usually of interest to researchers who use CFA. We find that the disaggregation approach increases model misfit, especially when the intraclass correlation (ICC) is high, whereas the aggregation approach results in accurate detection of model misfit in the macro level. Standardized parameter estimates from the disaggregation and aggregation approaches deviate toward the values of the macro- and micro-level standardized parameter estimates, respectively. The degree of deviation depends on ICC and cluster size, particularly for the aggregation method. The standard errors of standardized parameter estimates from the disaggregation approach depend on the macro-level item communalities. Those from the aggregation approach underestimate the standard errors in multilevel CFA (MCFA), especially when ICC is low. Thus, we conclude that MCFA or an alternative approach should be used if possible.

13.
This study examined two modes of administering the Rorschach Inkblot Technique to determine which was more appropriate for a college-educated, deaf population. Twenty-four prelingually deaf adults took the Rorschach in sign language and in written English, using a counterbalanced test-retest design, and their sign and written scores were compared to each other and to 1986 norms for Exner's Comprehensive System. Seventeen variables measuring such areas as perceptual accuracy, perceptual complexity, and self-focus were found to vary more than one standard deviation from Exner's norms. Differences between sign and written conditions on several affective variables were found. Written administration can be used by examiners who are informed about deafness and aware of variables that may be underreported by written inquiry.

14.
A method of IRT observed-score equating that uses chain equating through a third test, without equating coefficients, is presented under the assumption of the three-parameter logistic model. The asymptotic standard errors of the equated scores by this method are obtained using the results given by M. Liou and P.E. Cheng. The asymptotic standard errors of the IRT observed-score equating method using a synthetic examinee group with equating coefficients, which is a currently used method, are also provided. Numerical examples show that the standard errors by these observed-score equating methods are similar to those by the corresponding true score equating methods except in the range of low scores. The author is indebted to Michael J. Kolen for access to the real data used in this article and anonymous reviewers for their corrections and suggestions on this work.

15.
Human Behavior (《人类行为》), 2013, 26(2): 187-207
Incumbents are often used in the development and validation of a wide variety of personnel selection instruments, including noncognitive instruments such as personality tests. However, the degree to which assumed motivational factors impact the measurement equivalence and validity of tests developed using incumbents has not been adequately addressed. This study addressed this issue by examining the measurement equivalence of 6 personality scales between a group applying for jobs as sales managers in a large retail organization (N = 999) and a group of sales managers currently employed in that organization (N = 796). A graded item response theory model (Samejima, 1969) was fit to the personality scales in each group. Results indicated that moderately large differences existed in personality scale scores (approximately 1/2 standard deviation units) but only one of the six scales contained any items that evidenced differential item functioning and no scales evidenced differential test functioning. In addition, person-level analyses showed no apparent differences across groups in aberrant responding. The results suggest that personality measures used for selection retain similar psychometric properties to those used in incumbent validation studies.

16.
Weighted additive evaluation functions are widely used to rank alternatives in decision making under certainty with multiple evaluation attributes. Some researchers have suggested that approximate attribute weights may be adequate to accurately rank alternatives. Use of approximate weights would simplify decision analysis since detailed elicitation of weights can be time consuming and controversial. This article investigates the degree to which partial information about the relative magnitudes of attribute weights is sufficient to rank alternatives as a function of the number of decision alternatives, the number of attributes, and the number of allowed levels for each attribute. A simulation analysis, as well as a reanalysis of an actual application, shows that partial information about weights is often not sufficient to determine the most preferred alternative for realistic decision problems. Hence, approximation procedures for specifying weights may lead to errors. However, our work also shows that a simple analysis procedure can be used to accurately determine whether partial information about weights is adequate to correctly specify the most preferred alternative. This procedure can be useful for identifying situations in which detailed elicitation of weights is not needed.

17.
The use of hierarchical data (also called multilevel data or clustered data) is common in behavioural and psychological research when data of lower-level units (e.g., students, clients, repeated measures) are nested within clusters or higher-level units (e.g., classes, hospitals, individuals). Over the past 25 years we have seen great advances in methods for computing the sample sizes needed to obtain the desired statistical properties for such data in experimental evaluations. The present research provides closed-form and iterative formulas for sample size determination that can be used to ensure the desired width of confidence intervals for hierarchical data. Formulas are provided for a four-level hierarchical linear model that assumes slope variances and inclusion of covariates under both balanced and unbalanced designs. In addition, we address several mathematical properties relating to sample size determination for hierarchical data via the standard errors of experimental effect estimates. These include the relative impact of several indices (e.g., random intercept or slope variance at each level) on standard errors, asymptotic standard errors, minimum required values at the highest level, and generalized expressions of standard errors for designs with any-level randomization under any number of levels. In particular, information on the minimum required values will help researchers to minimize the risk of conducting experiments that are statistically unlikely to show the presence of an experimental effect.

18.
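Composite scores, the maximum score proposed on the basis of the weakest-link hypothesis, and the intraindividual standard deviation proposed on the basis of variability in explanatory style are the operationalizations of cognitive vulnerability most widely used in current research on hopelessness depression. A review of the operationalizations used in hopelessness depression research, and of their implications for understanding the development and treatment of hopelessness depression symptoms, shows that composite scores and maximum scores reflect different aspects of the relationships among the cognitive vulnerability factors of hopelessness depression. Future research could continue to compare these two operationalizations. Explanatory flexibility, computed as the intraindividual standard deviation, may be a new vulnerability factor beyond hopelessness theory and may offer a new perspective on the treatment of depression.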

19.
This computer program will compute standard scores with any desired mean and standard deviation. The program requires the user to enter the raw scores of each member of the sample to be standardized. The program accepts input from the computer keyboard, stores data in files on a floppy disk, and produces output on the printer.
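The computation such a program performs can be sketched in modern Python. This is not the original program, whose language and details the abstract does not give; the function name, the defaults (T-score scaling with mean 50 and SD 10), and the choice of the population standard deviation are assumptions of this sketch.

```python
from statistics import mean, pstdev

def standard_scores(raw_scores, new_mean=50.0, new_sd=10.0):
    """Rescale raw scores to a chosen mean and standard deviation.

    Each score is converted to a z score using the sample's own mean and
    population SD (an assumption), then mapped to new_mean + new_sd * z.
    """
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    if s == 0:
        raise ValueError("all raw scores are identical; cannot standardize")
    return [new_mean + new_sd * (x - m) / s for x in raw_scores]

# Hypothetical raw scores with mean 5 and population SD 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_scores(raw))
```

Passing `new_mean=100, new_sd=15` would give deviation-IQ-style scores from the same raw data.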

20.
Motor activity has the potential to persist after action and influence subsequent behaviour. A standard approach to isolating a motoric influence is to map two stimuli onto each response, so that response and stimulus repetition can be dissociated. A response-only response-repetition (RoRR) effect can then be assessed, arising if the same response made to two unrelated stimuli is nonetheless produced more rapidly. This kind of motoric behavioural influence of one response on the next has proved elusive in reaction time tasks involving choices between key presses, at least when stimuli mapped to each response are difficult to categorise together. However, such tasks have traditionally involved only a few response alternatives. We hypothesised that a larger load on the motor system might prevent participants from holding all possible action plans active throughout an experiment, and thus reveal trial-to-trial motor priming in the form of an RoRR effect. In our first experiment, increasing the number of response alternatives to four or eight yielded a reliable RoRR effect. This effect was replicated in Experiment 2, where it also proved persistent across practice and resistant to changes in response configuration. Our results are consistent with evidence of motoric perseveration in other kinds of motor task, such as reaching and grasping, and have implications for the generation of speeded decisions in a range of activities.
