Similar Articles
20 similar articles found (search time: 31 ms)
1.
It is difficult to obtain adequate power to test a small effect size with the set criterion alpha of 0.05. Most likely, an inferential test will indicate non-significance and the result will not be published. Rarely, statistical significance will be obtained, and an exaggerated effect size will be calculated and reported. Accepting all inferential probabilities and their associated effect sizes could solve this exaggeration problem. Graphs generated through Monte Carlo methods are presented to illustrate the point. The first graph presents effect sizes (Cohen's d) as lines from 1 to 0, with probabilities on the Y axis and the number of measures on the X axis; it shows that effect sizes of .5 or less should yield non-significance with sample sizes below 120 measures. The other graphs show results for as many as 10 small-sample replications. As sample size increases, the means converge on the effect size and measurement accuracy emerges.
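As a hedged illustration of the exaggeration mechanism described above (not the authors' own simulation; the true effect, group sizes, and replication count are assumed), the following Python sketch shows that averaging Cohen's d only over statistically significant replications inflates the estimate:

```python
# Minimal sketch (assumed setup): Monte Carlo demonstration that publishing
# only significant results exaggerates Cohen's d.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, alpha, reps = 0.3, 30, 0.05, 20000

all_d, sig_d = [], []
for _ in range(reps):
    x = rng.normal(true_d, 1, n)  # treatment group, true effect = 0.3 SD
    y = rng.normal(0.0, 1, n)     # control group
    t, p = stats.ttest_ind(x, y)
    d = (x.mean() - y.mean()) / np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    all_d.append(d)
    if p < alpha:
        sig_d.append(d)

print(f"mean d over all replications:     {np.mean(all_d):.3f}")
print(f"mean d among significant results: {np.mean(sig_d):.3f}")  # inflated
```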

2.
3.
The statistical significance levels of the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test are substantially biased by heterogeneous variances of treatment groups, even when sample sizes are equal. Under these conditions, the Type I error probabilities of the nonparametric tests, performed at the .01, .05, and .10 significance levels, increase by as much as 40%-50% in many cases and sometimes as much as 300%. The bias increases systematically as the ratio of standard deviations of treatment groups increases and remains fairly constant for various sample sizes. There is no indication that Type I error probabilities approach the significance level asymptotically as sample size increases.
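A minimal sketch of the kind of simulation this finding rests on (assumptions: normal data with equal means, n = 20 per group, a 4:1 ratio of standard deviations; scipy's mannwhitneyu stands in for the Wilcoxon-Mann-Whitney test):

```python
# Estimate the Type I error rate of the Wilcoxon-Mann-Whitney test when the
# two groups share a mean but have unequal spreads.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sd_ratio, alpha, reps = 20, 4.0, 0.05, 10000

rejections = 0
for _ in range(reps):
    x = rng.normal(0, 1.0, n)
    y = rng.normal(0, sd_ratio, n)   # same location, larger spread
    _, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    rejections += p < alpha

print(f"empirical Type I error: {rejections / reps:.3f} (nominal {alpha})")
```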

5.
GPOWER: A general power analysis program
GPOWER is a completely interactive, menu-driven program for IBM-compatible and Apple Macintosh personal computers. It performs high-precision statistical power analyses for the most common statistical tests in behavioral research, that is, t tests, F tests, and χ² tests. GPOWER computes (1) power values for given sample sizes, effect sizes, and α levels (post hoc power analyses); (2) sample sizes for given effect sizes, α levels, and power values (a priori power analyses); and (3) α and β values for given sample sizes, effect sizes, and β/α ratios (compromise power analyses). The program may be used to display graphically the relation between any two of the relevant variables, and it offers the opportunity to compute the effect size measures from basic parameters defining the alternative hypothesis. This article delineates reasons for the development of GPOWER and describes the program's capabilities and handling.
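Two of the analysis types GPOWER offers can be approximated in a few lines of Python. This is a sketch of the standard noncentral-t formulas, not GPOWER's own code, covering the post hoc and a priori cases for an independent-samples t test:

```python
import numpy as np
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Post hoc power for a two-sided, two-sample t test."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

def n_for_power(d, target=0.80, alpha=0.05):
    """A priori analysis: smallest n per group reaching the target power."""
    n = 2
    while power_two_sample_t(d, n, alpha) < target:
        n += 1
    return n

print(f"{power_two_sample_t(0.5, 64):.3f}")  # ~0.80
print(n_for_power(0.5))                      # ~64 per group
```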

6.
The equality of two group variances is frequently tested in experiments. However, criticisms of null hypothesis statistical testing on means have recently arisen and there is interest in other types of statistical tests of hypotheses, such as superiority/non-inferiority and equivalence. Although these tests have become more common in psychology and social sciences, the corresponding sample size estimation for these tests is rarely discussed, especially when the sampling unit costs are unequal or group sizes are unequal for two groups. Thus, for finding optimal sample size, the present study derived an initial allocation by approximating the percentiles of an F distribution with the percentiles of the standard normal distribution and used the exhaustion algorithm to select the best combination of group sizes, thereby ensuring the resulting power reaches the designated level and is maximal with a minimal total cost. In this manner, optimization of sample size planning is achieved. The proposed sample size determination has a wide range of applications and is efficient in terms of Type I errors and statistical power in simulations. Finally, an illustrative example from a report by the Health Survey for England, 1995–1997, is presented using hypertension data. For ease of application, four R Shiny apps are provided and benchmarks for setting equivalence margins are suggested.  相似文献   
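A simplified sketch of the optimization idea (assumptions: a normal approximation to the TOST power, a true difference of zero, and made-up unit costs and equivalence margin; the article's own procedure works from F-distribution percentiles):

```python
import numpy as np
from scipy import stats

def tost_power(n1, n2, delta=0.5, sigma=1.0, alpha=0.05):
    """Approximate power of an equivalence (TOST) test, true difference = 0."""
    se = sigma * np.sqrt(1 / n1 + 1 / n2)
    z = stats.norm.ppf(1 - alpha)
    return max(0.0, 2 * stats.norm.cdf(delta / se - z) - 1)

def optimal_allocation(cost1=4.0, cost2=1.0, target=0.80, n_max=200):
    """Exhaustive search: cheapest (n1, n2) whose power reaches the target."""
    best = None
    for n1 in range(2, n_max):
        for n2 in range(2, n_max):
            if tost_power(n1, n2) >= target:
                cost = cost1 * n1 + cost2 * n2
                if best is None or cost < best[0]:
                    best = (cost, n1, n2)
    return best

print(optimal_allocation())  # (total cost, n1, n2) under the assumed costs
```

With unequal unit costs, the optimum shifts subjects toward the cheaper group, which is the behavior the article's allocation rule formalizes.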

7.
A statistical model for combining p values from multiple tests of significance is used to define rejection and acceptance regions for two-stage and three-stage sampling plans. Type I error rates, power, frequencies of early termination decisions, and expected sample sizes are compared. Both the two-stage and three-stage procedures provide appropriate protection against Type I errors. The two-stage sampling plan with its single interim analysis entails minimal loss in power and provides substantial reduction in expected sample size as compared with a conventional single end-of-study test of significance for which power is in the adequate range. The three-stage sampling plan with its two interim analyses introduces somewhat greater reduction in power, but it compensates with greater reduction in expected sample size. Either interim-analysis strategy is more efficient than a single end-of-study analysis in terms of power per unit of sample size.
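The following toy simulation illustrates a two-stage plan of this general kind under the null hypothesis; the stopping bounds are arbitrary placeholders, not the ones derived from the authors' combined-p model:

```python
# Stop at the interim look for efficacy or futility, otherwise add a second
# batch and test the pooled data. All cutoffs below are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_stage, reject_p, futility_p, reps = 25, 0.01, 0.50, 20000

rejections, total_n = 0, 0
for _ in range(reps):
    x1 = rng.normal(0, 1, n_stage)          # H0 true: no effect
    _, p1 = stats.ttest_1samp(x1, 0)
    if p1 < reject_p:                       # early efficacy stop
        rejections += 1; total_n += n_stage; continue
    if p1 > futility_p:                     # early futility stop
        total_n += n_stage; continue
    x2 = np.concatenate([x1, rng.normal(0, 1, n_stage)])
    _, p2 = stats.ttest_1samp(x2, 0)
    rejections += p2 < 0.04                 # adjusted final-stage criterion
    total_n += 2 * n_stage

print(f"overall Type I error: {rejections / reps:.3f}")
print(f"expected sample size: {total_n / reps:.1f} (vs {2 * n_stage} fixed)")
```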

8.
In this study, eight statistical selection strategies were evaluated for selecting the parameterizations of log‐linear models used to model the distributions of psychometric tests. The selection strategies included significance tests based on four chi‐squared statistics (likelihood ratio, Pearson, Freeman–Tukey, and Cressie–Read) and four additional strategies (Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC), and a measure attributed to Goodman). The strategies were evaluated in simulations for different log‐linear models of univariate and bivariate test‐score distributions and two sample sizes. Results showed that all eight selection strategies were most accurate for the largest sample size considered. For univariate distributions, the AIC selection strategy was especially accurate for selecting the correct parameterization of a complex log‐linear model and the likelihood ratio chi‐squared selection strategy was the most accurate strategy for selecting the correct parameterization of a relatively simple log‐linear model. For bivariate distributions, the likelihood ratio chi‐squared, Freeman–Tukey chi‐squared, BIC, and CAIC selection strategies had similarly high selection accuracies.
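As a sketch of one such selection strategy (the simulated score data and the polynomial log-linear family are assumptions; statsmodels supplies the Poisson GLM), the degree minimizing AIC can be chosen as follows:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
# Simulated 51-point test-score distribution (placeholder for real data).
scores = np.clip(rng.normal(25, 6, 2000).round(), 0, 50).astype(int)
freq = np.bincount(scores, minlength=51)   # observed frequency table
x = (np.arange(51) - 25.0) / 10.0          # standardized score points

for degree in (1, 2, 3, 4):
    X = np.column_stack([x ** k for k in range(degree + 1)])  # 1, x, ..., x^k
    res = sm.GLM(freq, X, family=sm.families.Poisson()).fit()
    print(f"degree {degree}: AIC = {res.aic:9.1f}")   # BIC works analogously
```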

9.
The authors examined statistical practices in 193 randomized controlled trials (RCTs) of psychological therapies published in prominent psychology and psychiatry journals during 1999-2003. Statistical significance tests were used in 99% of RCTs and 84% discussed clinical significance, but only 46% considered, even minimally, statistical power; 31% interpreted effect size and only 2% interpreted confidence intervals. In a second study, 42 respondents to an email survey of the authors of the RCTs analyzed in the first study indicated that they consider it very important to know the magnitude and clinical importance of an effect, in addition to whether a treatment effect exists. The present authors conclude that published RCTs focus on statistical significance tests ("Is there an effect or difference?") and neglect other important questions: "How large is the effect?" and "Is the effect clinically important?" They advocate improved statistical reporting of RCTs, especially by reporting and interpreting clinical significance, effect sizes and confidence intervals.

10.
Calculations of the power of statistical tests are important in planning research studies (including meta-analyses) and in interpreting situations in which a result has not proven to be statistically significant. The authors describe procedures to compute statistical power of fixed- and random-effects tests of the mean effect size, tests for heterogeneity (or variation) of effect size parameters across studies, and tests for contrasts among effect sizes of different studies. Examples are given using 2 published meta-analyses. The examples illustrate that statistical power is not always high in meta-analysis.
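A minimal version of one calculation the authors describe, assuming k studies with equal per-group sizes and the usual large-sample variance of the standardized mean difference:

```python
# Power of the fixed-effect z test that the mean effect size is zero.
import numpy as np
from scipy import stats

def fixed_effect_power(d, n_per_group, k, alpha=0.05):
    v = 2 / n_per_group + d**2 / (4 * n_per_group)  # sampling variance of d
    se_mean = np.sqrt(v / k)                        # SE of the pooled mean
    z_crit = stats.norm.ppf(1 - alpha / 2)
    lam = d / se_mean
    return stats.norm.sf(z_crit - lam) + stats.norm.cdf(-z_crit - lam)

# Ten small studies of a modest effect still give only ~.51 power:
print(f"{fixed_effect_power(d=0.2, n_per_group=20, k=10):.3f}")
```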

11.
Choice of the appropriate model in meta‐analysis is often treated as an empirical question which is answered by examining the amount of variability in the effect sizes. When all of the observed variability in the effect sizes can be accounted for based on sampling error alone, a set of effect sizes is said to be homogeneous and a fixed‐effects model is typically adopted. Whether a set of effect sizes is homogeneous or not is usually tested with the so‐called Q test. In this paper, a variety of alternative homogeneity tests – the likelihood ratio, Wald and score tests – are compared with the Q test in terms of their Type I error rate and power for four different effect size measures. Monte Carlo simulations show that the Q test kept the tightest control of the Type I error rate, although the results emphasize the importance of large sample sizes within the set of studies. The results also suggest under what conditions the power of the tests can be considered adequate.
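For reference, the Q statistic itself is simple to compute; this sketch uses made-up effect sizes and sampling variances:

```python
# Standard Q homogeneity test: weighted squared deviations from the
# fixed-effect mean, referred to a chi-square with k - 1 df.
import numpy as np
from scipy import stats

d = np.array([0.10, 0.35, 0.52, 0.18, 0.44])   # study effect sizes (toy data)
v = np.array([0.04, 0.05, 0.03, 0.06, 0.04])   # their sampling variances

w = 1 / v                                      # inverse-variance weights
d_bar = np.sum(w * d) / np.sum(w)              # fixed-effect mean
Q = np.sum(w * (d - d_bar) ** 2)               # heterogeneity statistic
p = stats.chi2.sf(Q, df=len(d) - 1)

print(f"Q = {Q:.2f}, p = {p:.3f}")
```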

12.
Issues involved in the evaluation of null hypotheses are discussed. The use of equivalence testing is recommended as a possible alternative to the use of simple t or F tests for evaluating a null hypothesis. When statistical power is low and larger sample sizes are not available or practical, consideration should be given to using one-tailed tests or less conservative levels for determining criterion levels of statistical significance. Effect sizes should always be reported along with significance levels, as both are needed to understand results of research. Probabilities alone are not enough and are especially problematic for very large or very small samples. Pre-existing group differences should be tested and properly accounted for when comparing independent groups on dependent variables. If confirmation of a null hypothesis is expected, potential suppressor variables should be considered. If different methods are used to select the samples to be compared, controls for social desirability bias should be implemented. When researchers deviate from these standards or appear to assume that such standards are unimportant or irrelevant, their results should be deemed less credible than when such standards are maintained and followed. Several examples of recent violations of such standards in family social science, comparing gay, lesbian, bisexual, and transgender families with heterosexual families, are provided. Regardless of their political values or expectations, researchers should strive to test null hypotheses rigorously, in accordance with the best professional standards.
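A bare-bones version of the equivalence test recommended above (TOST with pooled variances; the ±0.4 margin and simulated data are illustrative assumptions, since margins must be justified substantively):

```python
import numpy as np
from scipy import stats

def tost_ind(x, y, low, upp):
    """Two one-sided t tests: is the mean difference inside (low, upp)?"""
    n1, n2 = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    sp2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p_lower = stats.t.sf((diff - low) / se, df)   # H0: diff <= low
    p_upper = stats.t.cdf((diff - upp) / se, df)  # H0: diff >= upp
    return max(p_lower, p_upper)   # reject both H0s to conclude equivalence

rng = np.random.default_rng(4)
x, y = rng.normal(0, 1, 80), rng.normal(0.05, 1, 80)
print(f"TOST p = {tost_ind(x, y, -0.4, 0.4):.3f}")  # small p -> equivalent
```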

13.
Kraemer HC. Psychological Methods, 2005, 10(4): 413-419
R. Rosenthal and D. B. Rubin (2003) proposed an effect size, r-equivalent, to be used when (a) only the sample size and p value are known for a study, (b) there are no generally accepted effect size indicators, or (c) sample sizes are so small or the data so non-normal that directly computed effect sizes would be more misleading than this simple effect size. The limitations of their proposal, however, are many and much more serious than the authors suggested; they should be carefully considered before this effect size is applied, as well as in developing other effect sizes using similar methods.
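A sketch of the r-equivalent computation as usually described (a one-tailed p and the sample size are converted back to a t statistic, then to r; readers should verify the definition against Rosenthal and Rubin, 2003):

```python
import numpy as np
from scipy import stats

def r_equivalent(p_one_tailed, n):
    df = n - 2
    t = stats.t.isf(p_one_tailed, df)      # t that reproduces the p value
    return np.sqrt(t**2 / (t**2 + df))

print(f"{r_equivalent(0.025, 40):.3f}")    # small study, p = .025 one-tailed
```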

14.
Communication researchers, along with social scientists from a variety of disciplines, are increasingly recognizing the importance of reporting effect sizes to augment significance tests. Serious errors in the reporting of effect sizes, however, have appeared in recently published articles. This article calls for accurate reporting of estimates of effect size. Eta squared (η²) is the most commonly reported estimate of effect size for the ANOVA. The classical formulation of eta squared (Pearson, 1911; Fisher, 1928) is distinguished from the lesser known partial eta squared (Cohen, 1973), and a mislabeling problem in the statistical software SPSS (1998) is identified: what SPSS reports as eta squared is really partial eta squared. Hence, researchers obtaining estimates of eta squared from SPSS are at risk of reporting incorrect values. Several simulations are reported to demonstrate critical issues. The strengths and limitations of several estimates of effect size used in ANOVA are discussed, as are the implications of the reporting errors. A list of suggestions for researchers is then offered.
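The mislabeling problem reduces to two formulas; the toy sums of squares below are assumptions for illustration:

```python
# For an effect in a multifactor ANOVA: eta squared divides by total SS,
# partial eta squared by effect SS + error SS only.
ss_effect, ss_other, ss_error = 20.0, 60.0, 120.0
ss_total = ss_effect + ss_other + ss_error

eta_sq = ss_effect / ss_total                         # classical eta squared
partial_eta_sq = ss_effect / (ss_effect + ss_error)   # what SPSS labels "eta squared"

print(f"eta^2 = {eta_sq:.3f}, partial eta^2 = {partial_eta_sq:.3f}")  # .100 vs .143
```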

15.
Preliminary tests of equality of variances used before a test of location are no longer widely recommended by statisticians, although they persist in some textbooks and software packages. The present study extends the findings of previous studies and provides further reasons for discontinuing the use of preliminary tests. The study found the Type I error rates of a two-stage procedure consisting of a preliminary Levene test on samples of different sizes with unequal variances, followed by either a Student pooled-variances t test or a Welch separate-variances t test. Simulations disclosed that the two-stage procedure fails to protect the significance level and usually makes the situation worse. Earlier studies have shown that preliminary tests often adversely affect the size of the test, and also that the Welch test is superior to the t test when variances are unequal. The present simulations reveal that changes in Type I error rates are greater when sample sizes are smaller, when the difference in variances is slight rather than extreme, and when the significance level is more stringent. Furthermore, the validity of the Welch test deteriorates if it is used only on those occasions when a preliminary test indicates it is needed. Optimum protection is assured by using a separate-variances test unconditionally whenever sample sizes are unequal.
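A compact simulation in the spirit of the study (one assumed parameter setting: n of 10 and 40, standard deviations of 2 and 1), comparing the two-stage procedure with unconditional use of the Welch test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n1, n2, sd1, sd2, alpha, reps = 10, 40, 2.0, 1.0, 0.05, 10000

two_stage = welch_only = 0
for _ in range(reps):
    x, y = rng.normal(0, sd1, n1), rng.normal(0, sd2, n2)   # H0 true
    _, p_lev = stats.levene(x, y)
    equal_var = p_lev >= 0.05                  # preliminary test decides
    _, p_ts = stats.ttest_ind(x, y, equal_var=equal_var)
    _, p_w = stats.ttest_ind(x, y, equal_var=False)
    two_stage += p_ts < alpha
    welch_only += p_w < alpha

print(f"two-stage procedure: {two_stage / reps:.3f}")
print(f"unconditional Welch: {welch_only / reps:.3f}  (nominal {alpha})")
```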

16.
郑昊敏, 温忠麟, 吴艳. Advances in Psychological Science, 2011, 19(12): 1868-1878
Effect sizes remedy the limitations of null hypothesis significance testing by quantifying the magnitude of effects. Beyond reporting test results, many journals now require research reports to include effect sizes. Effect sizes fall into three broad families: difference-based, correlation-based, and group-overlap measures. They may be computed and used differently under different research designs (e.g., single-factor and multifactor between-subjects, within-subjects, and mixed experimental designs) or under different data conditions (e.g., small samples, heterogeneous variances), but many effect sizes can be converted into one another. We provide a summary table to help practitioners choose an appropriate effect size according to their research purpose and research type.
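Two of the standard conversions such a table draws on, sketched for equal-n designs (standard formulas, not code from the review):

```python
import numpy as np

def d_to_r(d):
    """Cohen's d -> point-biserial r (equal group sizes)."""
    return d / np.sqrt(d**2 + 4)

def r_to_d(r):
    """Point-biserial r -> Cohen's d."""
    return 2 * r / np.sqrt(1 - r**2)

print(f"{d_to_r(0.5):.3f}")    # ~0.243
print(f"{r_to_d(0.243):.3f}")  # back to ~0.5
```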

17.
郭春彦, 朱滢. Journal of Psychological Science, 1997, 20(5): 410-413
A subject population was constructed on a computer and an experimental sampling procedure was simulated to examine the agreement between the number of samples reaching significance on the t test and statistical power. The simulation results show that statistical power agrees closely with the number of samples whose t tests reach significance, but both are affected by the significance level α, the sample size n, and the population effect size δ, which may in turn affect the reliability of statistical inference. Therefore, statistical power should be estimated when conducting significance tests; doing so will aid the accumulation of research findings in psychology.
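A small re-creation of the simulation logic described above, with assumed values of α, n, and δ, comparing the fraction of significant replications against analytic power:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
delta, n, alpha, reps = 0.5, 30, 0.05, 10000   # population effect, sample size

sig = 0
for _ in range(reps):
    x = rng.normal(delta, 1, n)
    _, p = stats.ttest_1samp(x, 0)
    sig += p < alpha

df, ncp = n - 1, delta * np.sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, df)
power = stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
print(f"significant replications: {sig / reps:.3f}, analytic power: {power:.3f}")
```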

18.
When uncertain about the magnitude of an effect, researchers commonly substitute in the standard sample-size-determination formula an estimate of effect size derived from a previous experiment. A problem with this approach is that the traditional sample-size-determination formula was not designed to deal with the uncertainty inherent in an effect-size estimate. Consequently, estimate substitution in the traditional sample-size-determination formula can lead to a substantial loss of power. A method of sample-size determination designed to handle uncertainty in effect-size estimates is described. The procedure uses the t value and sample size from a previous study, which might be a pilot study or a related study in the same area, to establish a distribution of probable effect sizes. The sample size to be employed in the new study is that which supplies an expected power of the desired amount over the distribution of probable effect sizes. A FORTRAN 77 program is presented that permits swift calculation of sample size for a variety of t tests, including independent t tests, related t tests, t tests of correlation coefficients, and t tests of multiple regression b coefficients.
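The following sketch substitutes a parametric bootstrap for the article's FORTRAN procedure, so it approximates the idea rather than the published algorithm: build a distribution of probable effect sizes from a pilot t and n, then increase n until the expected power over that distribution reaches the target:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
t_pilot, n_pilot = 2.1, 20                  # assumed pilot one-sample t test
d_hat = t_pilot / np.sqrt(n_pilot)

# Probable effect sizes: noncentral-t draws rescaled to the d metric.
draws = stats.nct.rvs(n_pilot - 1, d_hat * np.sqrt(n_pilot), size=5000,
                      random_state=rng) / np.sqrt(n_pilot)

def power(d, n, alpha=0.05):
    t_crit = stats.t.ppf(1 - alpha / 2, n - 1)
    ncp = d * np.sqrt(n)
    return stats.nct.sf(t_crit, n - 1, ncp) + stats.nct.cdf(-t_crit, n - 1, ncp)

n = 5
while np.mean(power(draws, n)) < 0.80:      # expected, not plug-in, power
    n += 1
print(f"n for 80% expected power: {n} (plug-in estimate uses d = {d_hat:.2f})")
```

The expected-power n typically exceeds the plug-in answer, which is the power loss the abstract warns about.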

19.
Sequential stopping rules (SSRs) should augment traditional hypothesis tests in many planned experiments, because they can provide the same statistical power with up to 30% fewer subjects without additional education or software. This article includes new Monte-Carlo-generated power curves and tables of stopping criteria based on the p values from simulated t tests and one-way ANOVAs. The tables improve existing SSR techniques by holding alpha very close to a target value when 1–10 subjects are added at each iteration. The emphasis is on small sample sizes (3–40 subjects per group) and large standardized effect sizes (0.8–2.0). The generality of the tables for dependent samples and one-tailed tests is discussed. SSR methods should be of interest to ethics bodies governing research when it is desirable to limit the number of subjects tested, such as in studies of pain, experimental disease, or surgery with animal or human subjects.
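A toy SSR simulation under the null; the stopping bounds below are placeholders rather than the article's calibrated criteria, which should be taken from its tables:

```python
# Test after every added batch; stop for significance or futility; track the
# realized alpha under H0 and the average sample size actually used.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_start, n_add, n_cap = 6, 2, 40           # subjects per group
lower, upper, reps = 0.01, 0.36, 10000     # placeholder stopping bounds on p

rejections, used_n = 0, 0
for _ in range(reps):
    x, y = rng.normal(0, 1, n_start), rng.normal(0, 1, n_start)  # H0 true
    while True:
        _, p = stats.ttest_ind(x, y)
        if p < lower:                       # stop and reject
            rejections += 1
            break
        if p > upper or len(x) >= n_cap:    # stop for futility or at the cap
            break
        x = np.append(x, rng.normal(0, 1, n_add))   # add a batch and retest
        y = np.append(y, rng.normal(0, 1, n_add))
    used_n += len(x)

print(f"realized alpha: {rejections / reps:.3f}")
print(f"mean subjects per group: {used_n / reps:.1f} (fixed design: {n_cap})")
```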

20.
Improved research practice is based on estimation of effect sizes rather than statistical significance. We discuss the challenging task of interpreting effect sizes in the research context, with particular attention to social psychological research. We emphasize the need to acknowledge the uncertainty in an effect size estimate, as signaled by the confidence interval. Interpretation must consider the independent variables, participants, measures, and other aspects of the research. Comparison with other results in the research field and consideration of theoretical and practical implications are useful strategies. Researchers should consider the possible value of agreeing on benchmarks to help guide effect size interpretation, at least within focused research fields. More broadly, researchers should wherever possible think of experimental manipulations as well as results in quantitative terms. Doing so is fundamental for designing ingenious, informative experiments, understanding research results and their implications, developing theory, and building a quantitative cumulative social psychology.
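One concrete way to signal the uncertainty the authors emphasize is a confidence interval for Cohen's d; this sketch uses the large-sample variance approximation (a noncentral-t interval is more exact):

```python
import numpy as np
from scipy import stats

def d_ci(d, n1, n2, conf=0.95):
    """Approximate CI for Cohen's d via its large-sample variance."""
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    half = stats.norm.ppf((1 + conf) / 2) * np.sqrt(var_d)
    return d - half, d + half

lo, hi = d_ci(0.5, 30, 30)
print(f"d = 0.5, 95% CI [{lo:.2f}, {hi:.2f}]")  # wide: roughly [0.0, 1.0]
```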
