Similar Literature
Found 20 similar documents (search took 15 ms).
1.
The use of one-way analysis of variance tables for obtaining unbiased estimates of true score variance and error score variance in the classical test theory model is discussed. Attention is paid to both balanced (equal numbers of observations on each person) and unbalanced designs, and estimates are provided for both homoscedastic (common error variance for all persons) and heteroscedastic cases. It is noted that optimality properties (minimum variance) can be claimed for estimates derived from analysis of variance tables only in the balanced, homoscedastic case, and that, in that case, they are essentially a reflection of the symmetry inherent in the situation. Estimates which might be preferable in other cases are discussed. An example is given where a natural analysis of variance table leads to estimates which cannot be derived from the set of statistics which is sufficient under normality assumptions. Reference is made to Bayesian studies which shed light on the difficulties encountered. Work on this paper was carried out at the headquarters of the American College Testing Program, Iowa City, Iowa, while the author was on leave from the University College of Wales.
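A minimal numerical sketch of the balanced, homoscedastic case (simulated data, generic classical-test-theory notation rather than the paper's own symbols): the one-way ANOVA mean squares yield unbiased estimates of error and true score variance.

```python
import numpy as np

# Balanced persons-by-replications design: n persons, k parallel measurements each.
# Under the classical model X_pi = T_p + E_pi,
#   E[MS_within]  = sigma2_error
#   E[MS_between] = sigma2_error + k * sigma2_true
rng = np.random.default_rng(0)
n, k = 200, 4
T = rng.normal(50, 10, size=n)                  # sigma2_true = 100
X = T[:, None] + rng.normal(0, 5, size=(n, k))  # sigma2_error = 25

person_means = X.mean(axis=1)
ms_between = k * person_means.var(ddof=1)
ms_within = ((X - person_means[:, None]) ** 2).sum() / (n * (k - 1))

sigma2_error_hat = ms_within
sigma2_true_hat = (ms_between - ms_within) / k
print(sigma2_true_hat, sigma2_error_hat)        # approximately 100 and 25
```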

2.
In the theory of test validity it is assumed that error scores on two distinct tests, a predictor and a criterion, are uncorrelated. The expected-value concept of true score in the classical test-theory model as formulated by Lord and Novick, Guttman, and others implies mathematically, without further assumptions, that true scores and error scores are uncorrelated. This concept does not imply, however, that error scores on two arbitrary tests are uncorrelated, and an additional axiom of “experimental independence” is needed in order to obtain familiar results in the theory of test validity. The formulas derived in the present paper do not depend on this assumption and can be applied to all test scores. These more general formulas reveal some unexpected and anomalous properties of test validity and have implications for the interpretation of validity coefficients in practice. Under some conditions there is no attenuation produced by error of measurement, and the correlation between observed scores can sometimes exceed the correlation between true scores, so that the usual correction for attenuation may be inappropriate and misleading. Observed scores on two tests can be positively correlated even when true scores are negatively correlated, and the validity coefficient can exceed the index of reliability. In some cases of practical interest, the validity coefficient will decrease with an increase in test length. These anomalies sometimes occur even when the correlation between error scores is quite small, and their magnitude is inversely related to test reliability. The elimination of correlated errors in practice will not enhance a test's predictive value, but will restore the properties of the validity coefficient that are familiar in the classical theory.
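The mechanism behind these anomalies can be restated in generic classical-test-theory notation (a sketch, not the paper's own derivation): the observed correlation carries an error-covariance term that the classical correction for attenuation silently assumes away.

```latex
% Observed correlation without assuming uncorrelated errors, next to the
% classical correction for attenuation (which presumes sigma_{E_X E_Y} = 0):
\rho_{XY} \;=\; \frac{\sigma_{T_X T_Y} + \sigma_{E_X E_Y}}{\sigma_X\,\sigma_Y},
\qquad
\hat{\rho}_{T_X T_Y} \;=\; \frac{\rho_{XY}}{\sqrt{\rho_{XX'}\,\rho_{YY'}}}.
```

When the error covariance is positive, the numerator of the observed correlation is inflated, so the "corrected" value can overshoot the true-score correlation.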

3.
4.
Abstract— A commonly used method for comparing groups of individuals is the analysis of variance (ANOVA) F test. When the assumptions underlying the derivation of this test are true, its power, meaning its probability of detecting true differences among the groups, competes well with all other methods that might be used. But when these assumptions are false, its power can be relatively low. Many new statistical methods have been proposed—ones that aim to achieve about the same amount of power when the assumptions of the F test are true but that have the potential for high power in situations where the F test performs poorly. A brief summary of some relevant issues and recent developments is provided. Some related issues are discussed and implications for future research are described.
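As a hedged illustration of the issue (simulated data; Welch's procedure stands in for the broader family of robust alternatives the article surveys), the sketch below contrasts the classical F test with a heteroscedasticity-robust test when variances and group sizes are unequal:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's heteroscedasticity-robust one-way ANOVA (one member of the
    family of alternatives to the classical F test surveyed in the article)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    w = n / np.array([np.var(g, ddof=1) for g in groups])
    grand = np.sum(w * m) / np.sum(w)
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f = (np.sum(w * (m - grand) ** 2) / (k - 1)) / (1 + 2 * (k - 2) * lam / (k**2 - 1))
    df2 = (k**2 - 1) / (3 * lam)
    return f, stats.f.sf(f, k - 1, df2)

rng = np.random.default_rng(1)
# Equal population means, but very unequal variances and group sizes:
a = rng.normal(0, 1, 50); b = rng.normal(0, 1, 50); c = rng.normal(0, 8, 10)
print(stats.f_oneway(a, b, c))   # classical F: its assumptions are violated here
print(welch_anova(a, b, c))      # Welch's F: better type I error control
```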

5.
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between the latent variables and dichotomous observed variables, which may be responses to tests or questionnaires. It is shown that the multilevel model with measurement error in the observed predictor variables can be estimated in a Bayesian framework using Gibbs sampling. In this article, handling measurement error via the normal ogive model is compared with alternative approaches using the classical true score model. Examples using real data are given. This paper is part of the dissertation by Fox (2001) that won the 2002 Psychometric Society Dissertation Award.
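A minimal sketch of the measurement part only (illustrative parameters; the article embeds this model in a multilevel regression estimated by Gibbs sampling, which is not reproduced here): the normal ogive model links the latent predictor to dichotomous item responses.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_persons, n_items = 500, 20
theta = rng.normal(0, 1, n_persons)            # latent predictor (true score)
a = rng.uniform(0.8, 2.0, n_items)             # discriminations (illustrative)
b = rng.normal(0, 1, n_items)                  # difficulties (illustrative)

# Normal ogive: P(y = 1 | theta) = Phi(a * (theta - b))
p = norm.cdf(a * (theta[:, None] - b))
y = rng.binomial(1, p)                         # observed dichotomous responses

# Using the fallible sum score in place of theta is what introduces
# measurement error into the regression; treating theta as latent avoids it.
sum_score = y.sum(axis=1)
print(np.corrcoef(theta, sum_score)[0, 1])
```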

6.
This article examines a Bayesian nonparametric approach to model selection and model testing, which is based on concepts from Bayesian decision theory and information theory. The approach can be used to evaluate the predictive utility of any model that is either probabilistic or deterministic, with that model analyzed under either the Bayesian or the classical-frequentist approach to statistical inference. Conditional on an observed set of data, generated from some unknown true sampling density, the approach identifies the “best” model as the one that predicts a sampling density that explains the most information about the true density. Furthermore, in the approach, the decision is to reject a model when it does not explain enough information about the true density (according to a straightforward calibration of the Kullback-Leibler divergence measure). The posterior estimate of the true density is based on a Bayesian nonparametric prior that can give positive support to the entire space of sampling densities (defined on some sample space). This article also discusses the theoretical and practical advantages of the Bayesian nonparametric approach over other types of model selection procedures, and over any model testing procedure that depends on interpreting a p-value. Finally, the Bayesian nonparametric approach is illustrated on four real data sets, in the comparison and testing of order-constrained models, cognitive models, models of choice behavior, and a test of a general psychometric model.
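A rough sketch of the selection rule (a kernel density estimate stands in for the Bayesian nonparametric posterior of the true density; data and candidate models are illustrative): prefer the model whose predictive density loses the least information about the estimated true density, as measured by Kullback-Leibler divergence.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.standard_t(df=5, size=400)        # draws from an unknown "true" density

grid = np.linspace(-6, 6, 1201)
dx = grid[1] - grid[0]
f_hat = stats.gaussian_kde(data)(grid)       # stand-in estimate of the true density

candidates = {
    "normal": stats.norm.pdf(grid, data.mean(), data.std(ddof=1)),
    "laplace": stats.laplace.pdf(grid, np.median(data),
                                 np.mean(np.abs(data - np.median(data)))),
}
for name, g in candidates.items():
    kl = np.sum(f_hat * np.log(f_hat / g)) * dx   # KL(f_hat || candidate)
    print(name, round(kl, 4))                     # smaller = loses less information
```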

7.
Klotzke, Konrad, & Fox, Jean-Paul. Psychometrika, 2019, 84(3), 649–672.

A multivariate generalization of the log-normal model for response times is proposed within an innovative Bayesian modeling framework. A novel Bayesian Covariance Structure Model (BCSM) is proposed, in which the inclusion of random-effect variables is avoided and their implied dependencies are modeled directly through an additive covariance structure. This makes it possible to jointly model complex dependencies due to, for instance, the test format (e.g., testlets, complex constructs), time limits, or features of digitally based assessments. A class of conjugate priors is proposed for the random-effect variance parameters in the BCSM framework. These priors support testing for the presence of random effects, reduce boundary effects by allowing non-positive (co)variance parameters, and support accurate estimation even for very small true variance parameters. The conjugate priors under the BCSM lead to efficient posterior computation. Bayes factors and the Bayesian information criterion are discussed for the purpose of model selection in the new framework. Two simulation studies show satisfactory performance of the MCMC algorithm and of the Bayes factor. In comparison with parameter expansion through a half-Cauchy prior, estimates of variance parameters close to zero show no bias, and undercoverage of credible intervals is avoided. An empirical example showcases the utility of the BCSM for response times to test the influence of item presentation formats on the test performance of students in a Latin square experimental design.
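A minimal numerical sketch of the covariance idea (values are illustrative, not from the article): for the k items of one testlet, the testlet random effect is replaced by an additive constant delta in the covariance matrix of the log response times, and delta may even be slightly negative as long as the matrix stays positive definite, which removes the boundary at zero.

```python
import numpy as np

k = 5
sigma2, delta = 0.30, -0.02                  # illustrative BCSM-style parameters
Sigma = sigma2 * np.eye(k) + delta * np.ones((k, k))

# Eigenvalues are sigma2 + k*delta (once) and sigma2 (k-1 times); positive
# definiteness only requires sigma2 + k*delta > 0, so delta < 0 is allowed.
print(np.linalg.eigvalsh(Sigma))             # 0.20, 0.30, 0.30, 0.30, 0.30
```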


8.
A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and the outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for three methods of implementation: propensity score stratification, weighting, and optimal full matching. Three simulation studies and one case study are presented to illustrate the proposed two-step Bayesian propensity score approach. Results of the simulation studies reveal that greater precision in the propensity score equation yields better recovery of the frequentist-based treatment effect. A slight advantage is shown for the Bayesian approach in small samples. Results also reveal that greater precision around the wrong treatment effect can lead to seriously distorted results. However, greater precision around the correct treatment effect parameter yields quite good results, with slight improvement seen with greater precision in the propensity score equation. A comparison of coverage rates for the conventional frequentist approach and the proposed Bayesian approach is also provided. The case study reveals that credible intervals are wider than frequentist confidence intervals when priors are non-informative.
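A minimal frequentist stand-in for the two-step logic (the article does both steps in a Bayesian way, propagating prior information and posterior uncertainty; that machinery is omitted, and all names and values here are illustrative): first fit the propensity score equation, then plug the fitted scores into a weighted outcome analysis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=(n, 2))
p_treat = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.25 * x[:, 1])))
t = rng.binomial(1, p_treat)
y = 2.0 * t + x @ np.array([1.0, -0.5]) + rng.normal(size=n)  # true effect = 2

# Step 1: propensity score equation.
ps = sm.Logit(t, sm.add_constant(x)).fit(disp=0).predict()

# Step 2: inverse-propensity-weighted treatment-effect estimate.
ate = (np.average(y[t == 1], weights=1 / ps[t == 1])
       - np.average(y[t == 0], weights=1 / (1 - ps[t == 0])))
print(ate)  # close to 2
```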

9.
In classical test theory, a high-reliability test always leads to a precise measurement. However, when it comes to predicting test scores, this is not necessarily so. Based on a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the means, variances, and covariances of the predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. For a new subject taking a new test, higher test reliability in this study led to a larger variance when the sample size was smaller than half the number of tests. Classical test theory is reanalyzed from the viewpoint of prediction, and some suggestions are made.
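The core of the argument can be seen in a generic classical-test-theory identity (standard notation, not the paper's own development): reliability pins down only the ratio of error to true score variance, so a highly reliable test can still have a large total, and hence predictive, variance.

```latex
\rho \;=\; \frac{\sigma_T^2}{\sigma_T^2+\sigma_E^2}
\quad\Longrightarrow\quad
\sigma_E^2 \;=\; \sigma_T^2\,\frac{1-\rho}{\rho},
\qquad
\operatorname{Var}(X) \;=\; \sigma_T^2+\sigma_E^2 \;=\; \frac{\sigma_T^2}{\rho}.
```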

10.
First, a brief historical trace of the developments in confirmation theory leading up to Goodman's infamous “grue” paradox is presented. Then, Goodman's argument is analyzed from both Hempelian and Bayesian perspectives. A guiding analogy is drawn between certain arguments against classical deductive logic and Goodman's “grue” argument against classical inductive logic. The upshot of this analogy is that the “New Riddle” is not as vexing as many commentators have claimed (especially from a Bayesian inductive-logical point of view). Specifically, the analogy reveals an intimate connection between Goodman's problem and the “problem of old evidence”. Several other novel aspects of Goodman's argument are also discussed (mainly from a Bayesian perspective).

11.
Formal models in psychology are used to make theoretical ideas precise and allow them to be evaluated quantitatively against data. We focus on one important (but under-used and incorrectly maligned) method for building theoretical assumptions into formal models, offered by the Bayesian statistical approach. This method involves capturing theoretical assumptions about the psychological variables in models by placing informative prior distributions on the parameters representing those variables. We demonstrate this approach of casting basic theoretical assumptions in an informative prior by considering a case study that involves the generalized context model (GCM) of category learning. We capture existing theorizing about the optimal allocation of attention in an informative prior distribution to yield a model that is higher in psychological content and lower in complexity than the standard implementation. We also highlight that formalizing psychological theory within an informative prior distribution allows standard Bayesian model selection methods to be applied without concerns about the sensitivity of results to the prior. We then use Bayesian model selection to test the theoretical assumptions about optimal allocation formalized in the prior. We argue that the general approach of using psychological theory to guide the specification of informative prior distributions is widely applicable and should be routinely used in psychological modeling.
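A toy sketch of the general move (not the GCM itself; the likelihood, parameter, and prior shapes are all invented for illustration): encode a theoretical claim about a psychological parameter as an informative prior, then let ordinary Bayesian model selection compare it against the vague-prior standard implementation.

```python
import numpy as np
from scipy import stats

w_grid = np.linspace(0.001, 0.999, 999)       # a parameter w in (0, 1)
dm = w_grid[1] - w_grid[0]

def likelihood(w, k, n):                      # toy Bernoulli likelihood
    return w ** k * (1 - w) ** (n - k)

k_obs, n_obs = 31, 40                         # hypothetical data

theory_prior = stats.beta.pdf(w_grid, 16, 4)  # theory: w concentrated near 0.8
vague_prior = stats.beta.pdf(w_grid, 1, 1)    # standard flat prior

m_theory = np.sum(likelihood(w_grid, k_obs, n_obs) * theory_prior) * dm
m_vague = np.sum(likelihood(w_grid, k_obs, n_obs) * vague_prior) * dm
print(m_theory / m_vague)                     # Bayes factor for the informative prior
```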

12.
Under classical test theory, the traditional methods for setting cut scores on criterion-referenced tests are graded scoring or directly designating cut scores, and methods for setting cut scores await further development. The Bookmark method is a cut-score setting procedure based on item response theory: subject-matter experts, working from the ability parameter values of the test material and the quantitative relationship between mastery-percentage scores and examinee ability levels, set multiple cut scores, making the method more efficient and precise than traditional approaches. The authors review the basic principles and implementation procedures of the Bookmark method, analyze its application prospects, and survey research on the reliability, validity, and standard-error estimation of cut scores set with the Bookmark method.
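A hedged sketch of the computation at the heart of the method, under an assumed two-parameter logistic (2PL) IRT model with illustrative parameters (operationally, subject-matter experts place bookmarks in an ordered item booklet rather than applying a formula):

```python
import numpy as np

# With response probability RP = 2/3, the ability at which an item is
# "mastered" solves P(theta) = RP, i.e. theta = b + ln(RP / (1 - RP)) / a.
rng = np.random.default_rng(5)
a = rng.uniform(0.8, 2.0, 40)                 # discriminations (illustrative)
b = rng.normal(0, 1, 40)                      # difficulties (illustrative)

RP = 2 / 3
theta_rp = b + np.log(RP / (1 - RP)) / a      # RP-based item locations
booklet = np.sort(theta_rp)                   # booklet pages, easiest to hardest

bookmark_page = 25                            # hypothetical panelist's bookmark
print(booklet[bookmark_page - 1])             # implied cut score on the theta scale
```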

13.
The authors modeled sources of error variance in job specification ratings collected from 3 levels of raters across 5 organizations (N=381). Variance components models were used to estimate the variance in ratings attributable to true score (variance between knowledge, skills, abilities, and other characteristics [KSAOs]) and error (KSAO-by-rater and residual variance). Subsequent models partitioned error variance into components related to the organization, position level, and demographic characteristics of the raters. Analyses revealed that the differential ordering of KSAOs by raters was not a function of these characteristics but rather was due to unexplained rating differences among the raters. The implications of these results for job specification and validity transportability are discussed.

14.
A hybrid procedure for number-correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT, while total test scores are computed based on CTT. What makes the hybrid scoring method attractive is that it accounts for the dimensionality of the test items while test scores remain easy to compute. Further, hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study generated estimated and true scores that were highly correlated. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.
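A schematic sketch of the hybrid idea (the specific weighting scheme below, a reference-direction projection of the MIRT discrimination vectors, is an illustrative assumption, not the authors' exact formula): item weights come from a multidimensional IRT calibration, but the reported score is still a simple weighted number-correct sum.

```python
import numpy as np

rng = np.random.default_rng(6)
n_items, n_dims = 30, 2
A = np.abs(rng.normal(1.0, 0.3, size=(n_items, n_dims)))  # MIRT discriminations

ref = np.ones(n_dims) / np.sqrt(n_dims)   # assumed reference composite direction
w = A @ ref                               # item weights from the MIRT structure
w *= n_items / w.sum()                    # rescale so the weights average to 1

responses = rng.binomial(1, 0.6, size=(500, n_items))     # 0/1 item scores
hybrid_scores = responses @ w             # CTT-style weighted number-correct sum
print(hybrid_scores[:5])
```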

15.
A new measure for the reliability of a rating scale is introduced, based on the classical definition of reliability as the ratio of true score variance to total variance. Clinical trial data can be employed to estimate the reliability of the scale in use whenever repeated measurements are taken. The reliability is estimated from the covariance parameters obtained from a linear mixed model. The method provides a single number to express the reliability of the scale, but also allows study of the reliability's evolution over time. The method is illustrated using a case study in schizophrenia. The authors are grateful to J&J PRD for kind permission to use their data. We gratefully acknowledge support from the Belgian IUAP/PAI network “Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data.”
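A minimal sketch of the estimator on simulated repeated measures (variable names and the simple random-intercept structure are illustrative): reliability is read off the fitted mixed model as the ratio of between-subject variance to total variance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_subj, n_visits = 100, 5
subj = np.repeat(np.arange(n_subj), n_visits)
y = 10 + rng.normal(0, 2.0, n_subj)[subj] + rng.normal(0, 1.0, n_subj * n_visits)
df = pd.DataFrame({"y": y, "subj": subj})   # true reliability = 4 / (4 + 1) = 0.8

fit = smf.mixedlm("y ~ 1", df, groups=df["subj"]).fit()
sigma2_true = fit.cov_re.iloc[0, 0]         # between-subject (true score) variance
sigma2_error = fit.scale                    # residual (error) variance
print(sigma2_true / (sigma2_true + sigma2_error))  # approximately 0.8
```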

16.
Numerous studies have shown increasing item reliabilities as an effect of item position in personality scales. Traditionally, these context effects are analyzed based on item-total correlations. This approach neglects the fact that trends in item reliabilities can be caused either by an increase in true score variance or by a decrease in error variance. This article presents the Confirmatory Analysis of Item Reliability Trends (CAIRT), which allows both trends to be estimated separately within a structural equation modeling framework. Results of a simulation study show that the CAIRT method provides reliable and independent parameter estimates and that its power exceeds that of the analysis of item-total correlations. We present an empirical application to self- and peer ratings collected in an Internet-based experiment. Results show that reliability trends are caused by increasing true score variance in self-ratings and by decreasing error variance in peer ratings.

17.
The analysis of variance (ANOVA) is still one of the most widely used statistical methods in the social sciences. This article is about stochastic group weights in ANOVA models – a neglected aspect in the literature. Stochastic group weights are present whenever the experimenter does not determine the exact group sizes before conducting the experiment. We show that classic ANOVA tests based on estimated marginal means can have an inflated type I error rate when stochastic group weights are not taken into account, even in randomized experiments. We propose two new ways to incorporate stochastic group weights in tests of average effects: one based on the general linear model and one based on multigroup structural equation models (SEMs). We show in simulation studies that our methods have nominal type I error rates in experiments with stochastic group weights, while classic approaches show an inflated type I error rate. The SEM approach can additionally deal with heteroscedastic residual variances and latent variables. An easy-to-use software package with a graphical user interface is provided.

18.
This paper presents a contribution to the sampling theory of a set of homogeneous tests which differ only in length, test length being regarded as an essential test parameter. Observed variance-covariance matrices of such measurements are taken to follow a Wishart distribution. The familiar true-score-and-error concept of classical test theory is employed. Upon formulation of the basic model it is shown that in a combination of such tests forming a “total” test, the signal-to-noise ratio of the components is additive and that the inverse of the population variance-covariance matrix of the component measures has all of its off-diagonal elements equal, regardless of distributional assumptions. This fact facilitates the subsequent derivation of a statistical sampling theory, there being at most m + 1 free parameters when m is the number of component tests. In developing the theory, the cases of known and unknown test lengths are treated separately. For both cases maximum-likelihood estimators of the relevant parameters are derived. It is argued that the resulting formulas will remain reasonable even if the distributional assumptions are too narrow. Under these assumptions, however, maximum-likelihood ratio tests of the validity of the model and of hypotheses concerning the reliability and standard error of measurement of the total test are given. It is shown in each case that the maximum-likelihood equations possess precisely one acceptable solution under rather natural conditions. Application of the methods can be effected without the use of a computer. Two numerical examples are appended by way of illustration. This research was supported in part by The National Institute of Child Health and Human Development, under Research Grant 1 PO1 HDO1762.
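The additivity property can be written out in generic classical-test-theory notation (a restatement under the usual parallel-components assumptions, not the paper's exact formulation): a component of length n_i has true score n_i T and error variance n_i sigma_E^2, so its signal-to-noise ratio is proportional to its length, and the component ratios add.

```latex
r_i \;=\; \frac{n_i^2\,\sigma_T^2}{n_i\,\sigma_E^2} \;=\; n_i\,\frac{\sigma_T^2}{\sigma_E^2},
\qquad
r_{\mathrm{total}} \;=\; \sum_i r_i,
\qquad
\rho_{\mathrm{total}} \;=\; \frac{r_{\mathrm{total}}}{1+r_{\mathrm{total}}}.
```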

19.
Finite sample inference procedures are considered for analyzing the observed scores on a multiple choice test with several items, where, for example, the items are dissimilar or the item responses are correlated. A discrete p-parameter exponential family model leads to a generalized linear model framework and, in a special case, a convenient regression of true score upon observed score. Techniques based upon the likelihood function, Akaike's information criterion (AIC), an approximate Bayesian marginalization procedure based on conditional maximization (BCM), and simulations for exact posterior densities (importance sampling) are used to facilitate finite sample investigations of the average true score, individual true scores, and various probabilities of interest. A simulation study suggests that, when the examinees come from two different populations, the exponential family can adequately generalize Duncan's beta-binomial model. Extensions to regression models, the classical test theory model, and empirical Bayes estimation problems are mentioned. The Duncan, Keats, and Matsumura data sets are used to illustrate potential advantages and flexibility of the exponential family model, and the BCM technique. The authors wish to thank Ella Mae Matsumura for her data set and helpful comments, Frank Baker for his advice on item response theory, Hirotugu Akaike and Taskin Atilgan for helpful discussions regarding AIC, Graham Wood for his advice concerning the class of all binomial mixture models, Yiu Ming Chiu for providing useful references and information on tetrachoric models, and the Editor and two referees for suggesting several references and alternative approaches.
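A small sketch of the beta-binomial special case mentioned via Duncan's model (parameter values are illustrative): with a Beta(alpha, beta) prior on the true proportion-correct and n items, the regression of true score on observed score x is the posterior mean, a linear (Kelley-type) function of x.

```python
import numpy as np
from scipy import stats

alpha, beta, n = 4.0, 6.0, 25               # illustrative hyperparameters

x = np.arange(n + 1)                        # possible observed scores
e_true_given_x = (alpha + x) / (alpha + beta + n)   # E(p | x), linear in x
print(np.round(e_true_given_x[[0, 10, 25]], 3))

# The marginal (predictive) distribution of observed scores is beta-binomial:
pmf = stats.betabinom.pmf(x, n, alpha, beta)
print(pmf.sum())                            # sums to 1
```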

20.
《人类行为》 [Human Behavior], 2013, 26(1): 19–35
Investigations of the construct-related evidence of the validity of performance ratings have been rare, perhaps because researchers are dissuaded by the considerable amount of evidence needed to show construct validity (Landy, 1986). It is argued that generalizability (G) theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) is well-suited to investigations of construct-related evidence of validity because a single generalizability investigation may provide multiple inferences of validity. G theory permits the researcher to partition observed score variance into universe (true) score variance and multiple, distinct estimates of error variance. G theory was illustrated through the analysis of proficiency ratings of 256 Air Force jet engine mechanics. Mechanics were rated on three different rating forms by themselves, peers, and supervisors. Interpretation of G study variance components revealed suitable evidence of construct validity. Ratings within sources were reliable. Proficiency ratings showed strong convergence over rating forms, though not over rating sources. Raters showed adequate discriminant validity across rating dimensions. The expectation of convergence over sources was further questioned.
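The partition described here can be summarized with a generic G-theory decomposition (facet labels are illustrative; the study crosses ratees with rating forms and rating sources): universe score variance is separated from several distinct error components, and a generalizability coefficient is formed for a design averaging over n_r raters and n_f forms.

```latex
\sigma_X^2 \;=\; \sigma_p^2 + \sigma_{pr}^2 + \sigma_{pf}^2 + \sigma_e^2,
\qquad
E\rho^2 \;=\; \frac{\sigma_p^2}
{\sigma_p^2 + \dfrac{\sigma_{pr}^2}{n_r} + \dfrac{\sigma_{pf}^2}{n_f} + \dfrac{\sigma_e^2}{n_r\,n_f}}.
```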
