Similar Literature
20 similar records retrieved.
1.
The average causal treatment effect (ATE) can be estimated from observational data based on covariate adjustment. Even if all confounding covariates are observed, they might not be measured reliably, and adjusting for such fallible covariates may fail to yield an unbiased ATE estimate. Instead of fallible covariates, the respective latent covariates can be used for covariate adjustment. But is it always necessary to use latent covariates? How well do analysis of covariance (ANCOVA) or propensity score (PS) methods estimate the ATE when latent covariates are used? We first analytically delineate the conditions under which latent instead of fallible covariates are necessary to obtain the ATE. Then we empirically examine the difference between ATE estimates when adjusting for fallible or latent covariates in an applied example. We discuss the issue of fallible covariates within a stochastic theory of causal effects and analyse data of a within-study comparison with recently developed ANCOVA and PS procedures that allow for latent covariates. We show that fallible covariates do not necessarily bias ATE estimates, but point out different scenarios in which adjusting for latent covariates is required. In our empirical application, we demonstrate how latent covariates can be incorporated for ATE estimation in ANCOVA and in PS analysis.
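A minimal simulation sketch of the point this abstract makes (not the authors' latent-covariate ANCOVA or PS procedures): under randomized treatment, adjusting for a fallible covariate still recovers the ATE, whereas when assignment depends on the true covariate, the fallible covariate under-adjusts. All parameter values, the reliability level, and the use of plain OLS are illustrative assumptions.

```python
# Sketch: ATE estimation with a fallible covariate under two assignment schemes.
# All parameter values are illustrative assumptions, not taken from the paper.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n, ate, reps = 2000, 0.5, 200
est_random, est_confounded = [], []

for _ in range(reps):
    xi = rng.normal(size=n)                  # true (latent) covariate
    w = xi + rng.normal(scale=0.8, size=n)   # fallible measurement of xi

    # (a) randomized treatment: assignment unrelated to the covariate
    z_r = rng.binomial(1, 0.5, size=n)
    y_r = ate * z_r + 1.0 * xi + rng.normal(size=n)

    # (b) assignment depends on the *true* covariate (confounding)
    z_c = rng.binomial(1, 1 / (1 + np.exp(-1.5 * xi)))
    y_c = ate * z_c + 1.0 * xi + rng.normal(size=n)

    d = pd.DataFrame(dict(w=w, z_r=z_r, y_r=y_r, z_c=z_c, y_c=y_c))
    est_random.append(smf.ols("y_r ~ z_r + w", d).fit().params["z_r"])
    est_confounded.append(smf.ols("y_c ~ z_c + w", d).fit().params["z_c"])

print(f"true ATE = {ate}")
print(f"randomized, fallible covariate: mean estimate = {np.mean(est_random):.3f}")
print(f"confounded, fallible covariate: mean estimate = {np.mean(est_confounded):.3f} (biased)")
```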

2.
Two common methods for adjusting group comparisons for differences in the distribution of confounders, namely analysis of covariance (ANCOVA) and subset selection, are compared using real examples from neuropsychology, theory, and simulations. ANCOVA has potential pitfalls, but the blanket rejection of the method in some areas of empirical psychology is not justified. Assumptions of the methods are reviewed, with issues of selection bias, nonlinearity, and interaction emphasized. Advantages of ANCOVA include better power, improved ability to detect and estimate interactions, and the availability of extensions to deal with measurement error in the covariates. Forms of ANCOVA are advocated that relax the standard assumption of linearity between the outcome and covariates. Specifically, a version of ANCOVA that models the relationship between the covariate and the outcome through a cubic spline with fixed knots outperforms other methods in simulations.
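The spline form of ANCOVA mentioned in the last sentence can be sketched with patsy's B-spline basis (bs) inside a statsmodels formula. The simulated nonlinear covariate-outcome relation, the knot locations, and the group effect below are illustrative assumptions, not the paper's setup.

```python
# Sketch: ANCOVA with a cubic spline (fixed knots) for the covariate-outcome relation.
# The simulated curvature and the knot placement are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(-2, 2, size=n)                       # covariate
group = rng.binomial(1, 0.5, size=n)                 # two groups
y = 0.4 * group + np.sin(1.5 * x) + rng.normal(scale=0.5, size=n)
d = pd.DataFrame(dict(y=y, x=x, group=group))

# Standard (linear) ANCOVA
linear = smf.ols("y ~ group + x", d).fit()

# Spline ANCOVA: cubic B-spline basis with fixed interior knots
spline = smf.ols("y ~ group + bs(x, knots=(-1, 0, 1), degree=3)", d).fit()

print("linear ANCOVA group effect:", round(linear.params["group"], 3))
print("spline ANCOVA group effect:", round(spline.params["group"], 3))
print("AIC, linear vs spline:", round(linear.aic, 1), round(spline.aic, 1))
```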

3.
Four misconceptions about the requirements for proper use of analysis of covariance (ANCOVA) are examined by means of Monte Carlo simulation. The conclusions are that (a) ANCOVA does not require covariates to be measured without error; (b) ANCOVA can be used effectively to adjust for initial group differences that result from nonrandom assignment which depends on observed covariate scores; (c) ANCOVA does not provide unbiased estimates of true treatment effects when initial group differences are due to nonrandom assignment which depends on the true latent covariable and the covariate contains measurement error; and (d) ANCOVA requires no assumption concerning the equality of within-groups and between-groups regression. Where treatments actually influence covariate scores, the hypothesis tested by ANCOVA concerns a weighted combination of effects on the covariate and dependent variables.
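A hedged sketch of conclusion (b): when nonrandom assignment depends only on the observed, error-containing covariate scores, ANCOVA on that observed covariate still recovers the treatment effect. The data-generating values below are illustrative assumptions.

```python
# Sketch: assignment driven by the *observed* covariate scores; ANCOVA on the
# observed covariate remains unbiased. Parameter values are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(15)
n, effect, reps = 2000, 0.5, 200
estimates = []
for _ in range(reps):
    xi = rng.normal(size=n)                      # true covariate
    w = xi + rng.normal(scale=0.8, size=n)       # observed, error-laden covariate
    z = (w + rng.normal(scale=0.3, size=n) > 0).astype(int)   # assignment based on w
    y = effect * z + 1.0 * xi + rng.normal(size=n)
    d = pd.DataFrame(dict(y=y, z=z, w=w))
    estimates.append(smf.ols("y ~ z + w", d).fit().params["z"])

print(f"true effect = {effect}, mean ANCOVA estimate = {np.mean(estimates):.3f}")
```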

4.
An ordinally-observed variable is a variable that is only partially observed through an ordinal surrogate. Although statistical models for ordinally-observed response variables are well known, relatively little attention has been given to the problem of ordinally-observed regressors. In this paper I show that if surrogates to ordinally-observed covariates are used as regressors in a generalized linear model, then the resulting measurement error in the covariates can compromise the consistency of point estimators and standard errors for the effects of fully-observed regressors. To properly account for this measurement error when making inferences concerning the fully-observed regressors, I propose a general modelling framework for generalized linear models with ordinally-observed covariates. I discuss issues of model specification, identification, and estimation, and illustrate these with examples.
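The problem the abstract describes can be sketched as follows: a latent covariate observed only through an ordinal surrogate is entered as a numeric regressor in a logistic model, and the resulting measurement error distorts the coefficient of a fully observed regressor. The thresholds, effect sizes, and naive integer scoring are illustrative assumptions; the sketch shows the problem, not the paper's proposed modelling framework.

```python
# Sketch of the problem: an ordinal surrogate for a latent covariate, used as a
# numeric regressor, biases inference about a fully observed regressor.
# All data-generating values are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
xi = rng.normal(size=n)                      # latent covariate
x2 = 0.5 * xi + rng.normal(size=n)           # fully observed regressor, correlated with xi
eta = -0.3 + 0.8 * xi + 0.6 * x2
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Ordinal surrogate: xi is observed only in four ordered categories
cuts = [-1.0, 0.0, 1.0]
w = np.digitize(xi, cuts)                    # scores 0..3, treated as numeric below

X_true = sm.add_constant(np.column_stack([xi, x2]))
X_surr = sm.add_constant(np.column_stack([w, x2]))

fit_true = sm.GLM(y, X_true, family=sm.families.Binomial()).fit()
fit_surr = sm.GLM(y, X_surr, family=sm.families.Binomial()).fit()

print("coefficient of x2, latent covariate in model: ", round(fit_true.params[2], 3))
print("coefficient of x2, ordinal surrogate in model:", round(fit_surr.params[2], 3))
```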

5.
Implications of random error of measurement for the sensitivity of the F test of differences between means are elaborated. By considering the mathematical models appropriate to design situations involving true and fallible measures, it is shown how measurement error decreases the sensitivity of a test of significance. A method of reducing such loss of sensitivity is described and recommended for general practice. I wish to express my thanks in acknowledgement that the present form of this paper has benefited from editorial comment, and from the advice of Dr. H. Mulhall of the Department of Mathematics, University of Sydney.
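A minimal sketch of the sensitivity loss (the abstract's remedy is not reproduced here): the same one-way design is tested on true scores and on scores with added measurement error. The group means, number of groups, and implied reliability are illustrative assumptions.

```python
# Sketch: measurement error lowers the power of the F test of mean differences.
# Group means, error variance, and the implied reliability are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
reps, n = 2000, 30
means = [0.0, 0.0, 0.5]                  # three groups, one shifted
reject_true = reject_fallible = 0

for _ in range(reps):
    true_scores = [m + rng.normal(size=n) for m in means]
    # fallible scores: add measurement error (reliability roughly 0.5 here)
    fallible = [t + rng.normal(size=n) for t in true_scores]
    reject_true += stats.f_oneway(*true_scores).pvalue < 0.05
    reject_fallible += stats.f_oneway(*fallible).pvalue < 0.05

print("power with true scores:    ", reject_true / reps)
print("power with fallible scores:", reject_fallible / reps)
```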

6.
The pretest-posttest control group design can be analyzed with the posttest as dependent variable and the pretest as covariate (ANCOVA) or with the difference between posttest and pretest as dependent variable (CHANGE). These two methods can give contradictory results if groups differ at pretest, a phenomenon known as Lord's paradox. The literature claims that ANCOVA is preferable if treatment assignment is based on randomization or on the pretest, and questionable for preexisting groups. Some literature suggests that Lord's paradox has to do with measurement error in the pretest. This article shows two new things. First, the claims are confirmed by proving the mathematical equivalence of ANCOVA to a repeated measures model without a group effect at pretest. Second, correction for measurement error in the pretest is shown to lead back to ANCOVA or to CHANGE, depending on the assumed absence or presence of a true group difference at pretest. These two new theoretical results are illustrated with multilevel (mixed) regression and structural equation modeling of data from two studies.
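Lord's paradox can be reproduced in a few lines: with preexisting groups that differ at pretest, a pretest measured with error, and no true change, ANCOVA indicates a group effect while CHANGE does not. The group difference, error variance, and sample size below are illustrative assumptions.

```python
# Sketch of Lord's paradox: no true treatment effect, preexisting group difference,
# fallible pretest. ANCOVA and CHANGE disagree. Values are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 500
group = np.repeat([0, 1], n)
true_pre = rng.normal(loc=np.where(group == 1, 1.0, 0.0))    # groups differ at pretest
pre = true_pre + rng.normal(scale=0.7, size=2 * n)           # pretest with measurement error
post = true_pre + rng.normal(scale=0.7, size=2 * n)          # no treatment effect, no true change

d = pd.DataFrame(dict(group=group, pre=pre, post=post, change=post - pre))

ancova = smf.ols("post ~ group + pre", d).fit()
change = smf.ols("change ~ group", d).fit()

print("ANCOVA group effect:", round(ancova.params["group"], 3),
      "p =", round(ancova.pvalues["group"], 4))
print("CHANGE group effect:", round(change.params["group"], 3),
      "p =", round(change.pvalues["group"], 4))
```

Because the pretest is fallible, the pooled regression of posttest on pretest has a slope below 1, so the group term in ANCOVA absorbs part of the stable preexisting difference; CHANGE implicitly fixes that slope at 1 and therefore shows no effect here.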

7.
The central question considered is: given appropriate precisations of the ideas of an empirical system's approximately satisfying laws of measurement with error at most ε (for some ε ≥ 0), and of a real-valued function over its domain providing an approximate representation of its basic operations and relations with error at most δ, can it be shown that satisfaction of the laws with ‘sufficiently small’ error ensures numerical representability with arbitrarily small error? Positive answers are given in the cases of ordinal and nominal measurement, together with some indications of the sizes of the errors involved. Problems of extending the theory to more complex types of measurement are discussed, some open problems and conjectures are formulated, and a relation between the ‘approximate representation’ and ‘stochastic choice model’ approaches to measurement with fallible data is established.

8.
Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of categorical moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be statistical ramifications to upholding this assumption. We propose that researchers should instead default to assuming unequal between-study variances when analysing categorical moderators. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variance. In two extensive simulation studies, we show that in terms of Type I error and statistical power, little is lost by using the MELSM for moderator tests, but there can be serious costs when an equal variance mixed-effects model (MEM) is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the MEM and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the MEM can be grossly inflated or overly conservative, whereas the MELSM does comparatively well in controlling the Type I error across the majority of cases. A notable exception where the MELSM did not clearly outperform the MEM was in the case of few studies (e.g., 5). With respect to power, the MELSM had similar or higher power than the MEM in conditions where the latter produced non-inflated Type I error rates. Together, our results support the idea that assuming unequal between-study variances is preferred as a default strategy when testing categorical moderators.
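A rough sketch of the contrast the abstract draws (not the MELSM itself; simple DerSimonian-Laird estimation stands in for the paper's models): a categorical moderator is tested once with a common between-study variance and once with subgroup-specific variances, under imbalanced numbers of studies and unequal true heterogeneity. All simulation settings are illustrative assumptions.

```python
# Sketch, not the full mixed-effects location-scale model: moderator test with a
# common tau^2 vs. subgroup-specific tau^2, via DerSimonian-Laird estimation.
# All simulation settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)

def dl_subgroup(yi, vi):
    """Random-effects mean and its variance, with DL tau^2, within one subgroup."""
    w = 1 / vi
    mu_fe = np.sum(w * yi) / np.sum(w)
    q = np.sum(w * (yi - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)
    w_re = 1 / (vi + tau2)
    return np.sum(w_re * yi) / np.sum(w_re), 1 / np.sum(w_re)

def common_tau2(y1, v1, y2, v2):
    """Generalized DL tau^2 assuming one tau^2 across both moderator levels."""
    q = c = 0.0
    k = len(y1) + len(y2)
    for yi, vi in ((y1, v1), (y2, v2)):
        w = 1 / vi
        q += np.sum(w * (yi - np.sum(w * yi) / np.sum(w)) ** 2)
        c += np.sum(w) - np.sum(w**2) / np.sum(w)
    return max(0.0, (q - (k - 2)) / c)

reps, k1, k2 = 2000, 40, 10            # imbalanced numbers of studies per level
bs_sd1, bs_sd2 = 0.05, 0.45            # unequal between-study SDs (true means equal)
rej_equal = rej_unequal = 0
for _ in range(reps):
    v1 = rng.uniform(0.01, 0.05, k1)
    v2 = rng.uniform(0.01, 0.05, k2)
    y1 = rng.normal(0.3, np.sqrt(v1 + bs_sd1**2))
    y2 = rng.normal(0.3, np.sqrt(v2 + bs_sd2**2))

    # subgroup-specific tau^2 (in the spirit of the MELSM)
    m1, var1 = dl_subgroup(y1, v1)
    m2, var2 = dl_subgroup(y2, v2)
    rej_unequal += abs(m1 - m2) / np.sqrt(var1 + var2) > 1.96

    # common tau^2 (equal-variance mixed-effects model)
    t2 = common_tau2(y1, v1, y2, v2)
    w1, w2 = 1 / (v1 + t2), 1 / (v2 + t2)
    m1e, m2e = np.sum(w1 * y1) / np.sum(w1), np.sum(w2 * y2) / np.sum(w2)
    rej_equal += abs(m1e - m2e) / np.sqrt(1 / np.sum(w1) + 1 / np.sum(w2)) > 1.96

print("Type I error, common tau^2:           ", rej_equal / reps)
print("Type I error, subgroup-specific tau^2:", rej_unequal / reps)
```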

9.
Statistical power of latent growth curve models to detect quadratic growth
Latent curve models (LCMs) have been used extensively to analyze longitudinal data. However, little is known about the power of LCMs to detect nonlinear trends when they are present in the data. For this study, we utilized simulated data to investigate the power of LCMs to detect the mean of the quadratic slope, Type I error rates, and rates of nonconvergence during the estimation of quadratic LCMs. Five factors were examined: the number of time points, growth magnitude, interindividual variability, sample size, and the R²s of the measured variables. The results showed that the empirical Type I error rates were close to the nominal value of 5%. The empirical power to detect the mean of the quadratic slope was affected by the simulation factors. Finally, a substantial proportion of samples failed to converge under conditions of no to small variation in the quadratic factor, small sample sizes, and small R² of the repeated measures. In general, we recommend that quadratic LCMs be based on samples of (a) at least 250 but ideally 400, when four measurement points are available; (b) at least 100 but ideally 150, when six measurement points are available; and (c) at least 50 but ideally 100, when ten measurement points are available.
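A hedged stand-in for the kind of power analysis described above: a multilevel (mixed) quadratic growth model fitted with statsmodels replaces the SEM latent curve model, and the random quadratic term is omitted for simplicity. The sample size, growth magnitudes, and variances are illustrative assumptions, not the paper's design cells.

```python
# Sketch: empirical power to detect a mean quadratic trend with a multilevel
# quadratic growth model (a stand-in for the SEM latent curve model).
# All generating values are illustrative assumptions.
import warnings
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

warnings.filterwarnings("ignore")          # silence MixedLM convergence warnings
rng = np.random.default_rng(7)
n_subj, n_time, reps = 250, 4, 100
times = np.arange(n_time, dtype=float)
rejections = 0

for _ in range(reps):
    icepts = rng.normal(0.0, 1.0, n_subj)
    slopes = rng.normal(0.5, 0.3, n_subj)
    quads = rng.normal(0.1, 0.05, n_subj)                  # small quadratic growth
    y = (icepts[:, None] + slopes[:, None] * times
         + quads[:, None] * times**2 + rng.normal(0.0, 1.0, (n_subj, n_time)))
    d = pd.DataFrame({"id": np.repeat(np.arange(n_subj), n_time),
                      "t": np.tile(times, n_subj),
                      "y": y.ravel()})
    d["t2"] = d["t"] ** 2
    fit = smf.mixedlm("y ~ t + t2", d, groups="id", re_formula="~t").fit()
    rejections += fit.pvalues["t2"] < 0.05

print("empirical power for the quadratic mean:", rejections / reps)
```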

10.
In sparse tables for categorical data, well-known goodness-of-fit statistics are not chi-square distributed. A consequence is that model selection becomes a problem. It has been suggested that a way out of this problem is the use of the parametric bootstrap. In this paper, the parametric bootstrap goodness-of-fit test is studied by means of an extensive simulation study; the Type I error rates and power of this test are studied under several conditions of sparseness. In the presence of sparseness, models were used that were likely to violate the regularity conditions. Besides bootstrapping the goodness-of-fit statistics usually used (full information statistics), corrected versions of these statistics and a limited information statistic are bootstrapped. These bootstrap tests were also compared to an asymptotic test using limited information. Results indicate that bootstrapping the usual statistics fails because these tests are too liberal, and that bootstrapping or asymptotically testing the limited information statistic works better with respect to Type I error and outperforms the other statistics by far in terms of statistical power. The properties of all tests are illustrated using categorical Markov models.
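The parametric bootstrap idea can be sketched on a simple case: Pearson's X² for a sparse two-way table under an independence model, a stand-in for the paper's categorical Markov models. The observed table below is an illustrative assumption.

```python
# Sketch: parametric bootstrap goodness-of-fit test for a sparse two-way table
# under an independence model. The observed table is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(8)
obs = np.array([[3, 0, 1, 0],
                [2, 1, 0, 0],
                [0, 0, 2, 1]])                    # sparse: many zero or small cells
n = obs.sum()

def independence_probs(table):
    p_row = table.sum(axis=1) / table.sum()
    p_col = table.sum(axis=0) / table.sum()
    return np.outer(p_row, p_col)

def pearson_x2(table, expected):
    mask = expected > 0                           # zero-margin cells contribute nothing
    return np.sum((table[mask] - expected[mask]) ** 2 / expected[mask])

probs = independence_probs(obs)
x2_obs = pearson_x2(obs, n * probs)

# Parametric bootstrap: simulate from the fitted model, refit, recompute X^2
B, exceed = 2000, 0
for _ in range(B):
    boot = rng.multinomial(n, probs.ravel()).reshape(obs.shape)
    exceed += pearson_x2(boot, n * independence_probs(boot)) >= x2_obs

print("observed X^2:", round(x2_obs, 2))
print("parametric bootstrap p-value:", (exceed + 1) / (B + 1))
```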

11.
The crux in psychometrics is how to estimate the probability that a respondent answers an item correctly on one occasion out of many. Under the current testing paradigm this probability is estimated using all kinds of statistical techniques and mathematical modeling. Multiple evaluation is a new testing paradigm that uses the person's own personal estimates of these probabilities as data. It is compared to multiple choice, which appears to be a degenerate form of multiple evaluation. Multiple evaluation has much less measurement error than multiple choice, and this measurement error is not in favor of the examinee. When the test is used for selection purposes, as it is with multiple choice, the probability of a Type II error (unjustified passes) is almost negligible. Procedures for statistical item-and-test analyses under the multiple evaluation paradigm are presented. These procedures provide more accurate information in comparison to what is possible under the multiple choice paradigm. A computer program that implements multiple evaluation is also discussed.

12.
We examine methods for measuring performance in signal-detection-like tasks when each participant provides only a few observations. Monte Carlo simulations demonstrate that standard statistical techniques applied to a d′ analysis can lead to large numbers of Type I errors (incorrectly rejecting a hypothesis of no difference). Various statistical methods were compared in terms of their Type I and Type II error (incorrectly accepting a hypothesis of no difference) rates. Our conclusions are the same whether these two types of errors are weighted equally or Type I errors are weighted more heavily. The most promising method is to combine an aggregate d′ measure with a percentile bootstrap confidence interval, a computer-intensive nonparametric method of statistical inference. Researchers who prefer statistical techniques more commonly used in psychology, such as a repeated measures t test, should use γ (Goodman & Kruskal, 1954), since it performs slightly better than or nearly as well as d′. In general, when repeated measures t tests are used, γ is more conservative than d′: it makes more Type II errors, but its Type I error rate tends to be much closer to that of the traditional .05 α level. It is somewhat surprising that γ performs as well as it does, given that the simulations that generated the hypothetical data conformed completely to the d′ model. Analyses in which H − FA was used had the highest Type I error rates. Detailed simulation results can be downloaded from www.psychonomic.org/archive/Schooler-BRM-2004.zip.
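A minimal sketch in the spirit of the recommended approach: an aggregate d′ with a percentile bootstrap confidence interval. Trial counts and true hit and false-alarm rates are illustrative assumptions, and a log-linear correction is used to avoid infinite z-scores.

```python
# Sketch: aggregate d' with a percentile bootstrap CI over participants.
# Trial counts and true rates are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)

def dprime(hits, n_signal, fas, n_noise):
    # log-linear correction avoids infinite z-scores for proportions of 0 or 1
    h = (hits + 0.5) / (n_signal + 1)
    f = (fas + 0.5) / (n_noise + 1)
    return norm.ppf(h) - norm.ppf(f)

# Few observations per participant, as in the abstract's setting (assumed numbers)
n_subj, n_sig, n_noise = 20, 8, 8
hits = rng.binomial(n_sig, 0.75, n_subj)
fas = rng.binomial(n_noise, 0.40, n_subj)

point = dprime(hits.sum(), n_subj * n_sig, fas.sum(), n_subj * n_noise)

# Percentile bootstrap: resample participants with replacement
B = 5000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n_subj, n_subj)
    boot[b] = dprime(hits[idx].sum(), n_subj * n_sig, fas[idx].sum(), n_subj * n_noise)

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"aggregate d' = {point:.3f}, 95% percentile bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```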

13.
Latent variable models with many categorical items and multiple latent constructs result in many dimensions of numerical integration, and the traditional frequentist estimation approach, such as maximum likelihood (ML), tends to fail due to model complexity. In such cases, Bayesian estimation with diffuse priors can be used as a viable alternative to ML estimation. This study compares the performance of Bayesian estimation with ML estimation in estimating single or multiple ability factors across 2 types of measurement models in the structural equation modeling framework: a multidimensional item response theory (MIRT) model and a multiple-indicator multiple-cause (MIMIC) model. A Monte Carlo simulation study demonstrates that Bayesian estimation with diffuse priors, under various conditions, produces results quite comparable with ML estimation in the single- and multilevel MIRT and MIMIC models. Additionally, an empirical example utilizing the Multistate Bar Examination is provided to compare the practical utility of the MIRT and MIMIC models. Structural relationships among the ability factors, covariates, and a binary outcome variable are investigated through the single- and multilevel measurement models. The article concludes with a summary of the relative advantages of Bayesian estimation over ML estimation in MIRT and MIMIC models and suggests strategies for implementing these methods.

14.
Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M, a, increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such that elevated Type I error rates occur when the sample size is small and the effect size of the nonzero path is medium or larger. Results from the second simulation found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y, b, was small. Two empirical mediation examples are provided using data from a steroid prevention and health promotion program aimed at high school football players (Athletes Training and Learning to Avoid Steroids; Goldberg et al., 1996), one to illustrate a possible Type I error for the bias-corrected bootstrap test and a second to illustrate a loss in power related to the size of a. Implications of these findings are discussed.
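A hedged sketch of the quantities under study: percentile and bias-corrected (BC) bootstrap intervals for the indirect effect a·b. The sample size, the zero a path, the medium b path, and the use of plain OLS are illustrative assumptions, not the paper's simulation design.

```python
# Sketch: percentile and bias-corrected bootstrap intervals for an indirect effect
# a*b under a true null (a = 0). All generating values are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
n = 50
x = rng.normal(size=n)
m = 0.0 * x + rng.normal(size=n)          # a = 0: the indirect effect is truly zero
y = 0.39 * m + rng.normal(size=n)         # b is medium

def indirect(x, m, y):
    a_hat = np.polyfit(x, m, 1)[0]                       # slope of m on x
    X = np.column_stack([np.ones(len(m)), m, x])
    b_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]      # slope of y on m, given x
    return a_hat * b_hat

est = indirect(x, m, y)
B = 2000
boot = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, n)
    boot[i] = indirect(x[idx], m[idx], y[idx])

perc = np.percentile(boot, [2.5, 97.5])                  # percentile interval

# Bias correction: shift percentile levels by z0 = Phi^-1(P(boot < estimate))
z0 = norm.ppf(np.mean(boot < est))
levels = norm.cdf([2 * z0 + norm.ppf(0.025), 2 * z0 + norm.ppf(0.975)])
bc = np.percentile(boot, 100 * levels)

print("ab =", round(est, 4))
print("percentile 95% CI:    ", np.round(perc, 4))
print("bias-corrected 95% CI:", np.round(bc, 4))
```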

15.
Considering that measurement error is rarely absent in research and that its effects can be dramatic, we examine the impact of measurement error on propensity score (PS) analysis used to minimize selection bias in behavioral and social observational studies. A Monte Carlo study was conducted to explore the effects of measurement error on the treatment effect and balance estimates in PS analysis across seven different PS conditioning methods. In general, the results indicate that even low levels of measurement error in the covariates lead to substantial bias in estimates of treatment effects and a concomitant reduction in confidence interval coverage across all methods of conditioning on the PS.
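A minimal sketch of the mechanism: propensity scores estimated from an error-contaminated covariate and used for inverse-probability weighting (one of many possible conditioning methods, and not necessarily one of the paper's seven) leave residual bias in the treatment effect. The reliability level and effect sizes are illustrative assumptions.

```python
# Sketch: IPW treatment-effect estimates when the propensity score is built from a
# covariate measured with error. All parameter values are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n, ate, reps = 2000, 0.4, 200
err_bias, true_bias = [], []

def ipw_ate(y, z, covariate):
    X = sm.add_constant(covariate)
    ps = sm.Logit(z, X).fit(disp=0).predict(X)
    w = z / ps + (1 - z) / (1 - ps)
    return (np.sum(w * z * y) / np.sum(w * z)
            - np.sum(w * (1 - z) * y) / np.sum(w * (1 - z)))

for _ in range(reps):
    xi = rng.normal(size=n)                        # true confounder
    w_obs = xi + rng.normal(scale=0.6, size=n)     # covariate measured with error
    z = rng.binomial(1, 1 / (1 + np.exp(-xi)))
    y = ate * z + 0.8 * xi + rng.normal(size=n)
    true_bias.append(ipw_ate(y, z, xi) - ate)
    err_bias.append(ipw_ate(y, z, w_obs) - ate)

print("mean bias, PS from true covariate:    ", round(np.mean(true_bias), 3))
print("mean bias, PS from fallible covariate:", round(np.mean(err_bias), 3))
```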

16.
In a variety of measurement situations, the researcher may wish to compare the reliabilities of several instruments administered to the same sample of subjects. This paper presents eleven statistical procedures which test the equality of m coefficient alphas when the sample alpha coefficients are dependent. Several of the procedures are derived in detail, and numerical examples are given for two. Since all of the procedures depend on approximate asymptotic results, Monte Carlo methods are used to assess the accuracy of the procedures for sample sizes of 50, 100, and 200. Both control of Type I error and power are evaluated by computer simulation. Two of the procedures are unable to control Type I errors satisfactorily. The remaining nine procedures perform properly, but three are somewhat superior in power and Type I error control. A more detailed version of this paper is also available.
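A generic sketch of the problem setting, not one of the paper's eleven asymptotic procedures: two coefficient alphas computed on the same respondents are compared by bootstrapping their difference, which preserves the dependence between the coefficients. The simulated scales are illustrative assumptions.

```python
# Sketch: comparing two dependent coefficient alphas with a nonparametric bootstrap.
# The simulated scales and loadings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(12)

def cronbach_alpha(items):
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Two 5-item scales answered by the same 200 respondents (simulated common factor)
n, k = 200, 5
f = rng.normal(size=(n, 1))
scale_a = 0.7 * f + rng.normal(scale=0.7, size=(n, k))   # more reliable items
scale_b = 0.5 * f + rng.normal(scale=1.0, size=(n, k))   # less reliable items

diff = cronbach_alpha(scale_a) - cronbach_alpha(scale_b)
B = 2000
boot = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, n)                # resample respondents, keeping dependence
    boot[i] = cronbach_alpha(scale_a[idx]) - cronbach_alpha(scale_b[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"alpha difference = {diff:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```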

17.
In a pre-test–post-test cluster randomized trial, one of the methods commonly used to detect an intervention effect involves controlling for pre-test scores and other related covariates while estimating an intervention effect at post-test. In many applications in education, the total post-test and pre-test scores, ignoring measurement error, are used as the response variable and covariate, respectively, to estimate the intervention effect. However, these test scores are frequently subject to measurement error, and statistical inferences based on the model ignoring measurement error can yield a biased estimate of the intervention effect. When multiple domains exist in test data, it is sometimes more informative to detect the intervention effect for each domain than for the entire test. This paper presents applications of the multilevel multidimensional item response model with measurement error adjustments in a response variable and a covariate to estimate the intervention effect for each domain.

18.
Shieh, G. (2020). Psychometrika, 85(1), 101–120.

The analysis of covariance (ANCOVA) has notably proven to be an effective tool in a broad range of scientific applications. Despite the well-documented literature about its principal uses and statistical properties, the corresponding power analysis for the general linear hypothesis tests of treatment differences remains a less discussed issue. The frequently recommended procedure is a direct application of the ANOVA formula in combination with reduced degrees of freedom and a correlation-adjusted variance. This article aims to explicate the conceptual problems and practical limitations of the common method. An exact approach is proposed for power and sample size calculations in ANCOVA with random assignment and multinormal covariates. Both theoretical examination and numerical simulation are presented to justify the advantages of the suggested technique over the current formula. The improved solution is illustrated with an example regarding the comparative effectiveness of interventions. In order to facilitate the application of the described power and sample size calculations, accompanying computer programs are also presented.
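For reference, the "frequently recommended" approximation that this abstract critiques (not the exact approach it proposes) can be sketched as below: ANOVA-style power from a noncentral F, with the error degrees of freedom reduced for the covariate and the error variance scaled by 1 − ρ². The group means, correlation, and sample sizes are illustrative assumptions.

```python
# Sketch of the common approximate procedure (the one the abstract critiques):
# noncentral-F power with reduced error df and a correlation-adjusted variance.
# All input values are illustrative assumptions.
import numpy as np
from scipy.stats import f, ncf

def approx_ancova_power(group_means, n_per_group, sigma, rho,
                        n_covariates=1, alpha=0.05):
    g = len(group_means)
    grand = np.mean(group_means)
    sigma_adj2 = sigma**2 * (1 - rho**2)                  # correlation-adjusted variance
    lam = n_per_group * np.sum((np.array(group_means) - grand) ** 2) / sigma_adj2
    df1 = g - 1
    df2 = g * (n_per_group - 1) - n_covariates            # df reduced for the covariate
    f_crit = f.ppf(1 - alpha, df1, df2)
    return 1 - ncf.cdf(f_crit, df1, df2, lam)

# Example: three treatments, covariate-outcome correlation 0.5 (assumed values)
print(round(approx_ancova_power([0.0, 0.0, 0.5], n_per_group=30,
                                sigma=1.0, rho=0.5), 3))
```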


19.
The statistical simulation program DATASIM is designed to conduct large-scale sampling experiments on microcomputers. Monte Carlo procedures are used to investigate the Type I and Type II error rates for statistical tests when one or more assumptions are systematically violated: assumptions, for example, regarding normality, homogeneity of variance or covariance, minimum expected cell frequencies, and the like. In the present paper, we report several initial tests of the data-generating algorithms employed by DATASIM. The results indicate that the uniform and standard normal deviate generators perform satisfactorily. Furthermore, Kolmogorov-Smirnov tests show that the sampling distributions of z, t, F, χ², and r generated by DATASIM simulations follow the appropriate theoretical distributions. Finally, estimates of Type I error rates obtained by DATASIM under various patterns of violations of assumptions are in close agreement with the results of previous analytical and empirical studies. These converging lines of evidence suggest that DATASIM may well prove to be a reliable and productive tool for conducting statistical simulation research.
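A small sketch in the spirit of these validation checks, written in Python rather than DATASIM: t statistics are simulated under the null, their sampling distribution is compared with the theoretical t distribution via a Kolmogorov-Smirnov test, and the empirical Type I error rate is tallied. The number of replications and group sizes are illustrative assumptions.

```python
# Sketch: validating a data-generating algorithm by comparing simulated t statistics
# with the theoretical t distribution. Settings are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(14)
reps, n1, n2 = 5000, 15, 15
t_values = np.empty(reps)
for i in range(reps):
    a = rng.normal(size=n1)
    b = rng.normal(size=n2)                  # equal variances, true null
    t_values[i] = stats.ttest_ind(a, b).statistic

df = n1 + n2 - 2
ks = stats.kstest(t_values, "t", args=(df,))
print(f"KS statistic = {ks.statistic:.4f}, p = {ks.pvalue:.3f}")

# Empirical Type I error at alpha = .05 should be close to .05 under the null
crit = stats.t.ppf(0.975, df)
print("empirical Type I error:", np.mean(np.abs(t_values) > crit))
```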

20.
Extensions of latent state-trait models for continuous observed variables to mixture latent state-trait models with and without covariates of change are presented that can separate individuals differing in their occasion-specific variability. An empirical application to the repeated measurement of mood states (N = 501) revealed that a model with two latent classes fits the data well. The larger class (76%) consists of individuals whose mood is highly variable, whose general well-being is comparatively lower, and whose mood variability is influenced by daily hassles and uplifts. The smaller class (24%) represents individuals who are rather stable and happier and whose mood is influenced only by daily uplifts but not by daily hassles. A simulation study on the model without covariates, with five sample sizes and five numbers of occasions, revealed that the appropriateness of the parameter estimates of this model depends on the number of observations (the higher the better) and the number of occasions (the higher the better). Another simulation study estimated the Type I and II errors of the Lo-Mendell-Rubin test.
