20 similar documents found (search time: 93 ms)
Null hypothesis significance testing (NHST) is the most widely accepted and frequently used approach to statistical inference in quantitative communication research. NHST, however, is highly controversial, and several serious problems with the approach have been identified. This paper reviews NHST and the controversy surrounding it. Commonly recognized problems include sensitivity to sample size, the fact that the null is usually literally false, unacceptable Type II error rates, and misunderstanding and abuse. Problems associated with the conditional nature of NHST and the failure to distinguish statistical hypotheses from substantive hypotheses are emphasized. Recommended solutions and alternatives are addressed in a companion article.

Confidence intervals (CIs) for means are frequently advocated as alternatives to null hypothesis significance testing (NHST), for which a common theme in the debate is that conclusions from CIs and NHST should be mutually consistent. The authors examined a class of CIs for which the conclusions are said to be inconsistent with NHST in within-subjects designs and a class for which the conclusions are said to be consistent. The difference between them is a difference in models. In particular, the main issue is that the class for which the conclusions are said to be consistent derives from fixed-effects models with subjects fixed, not mixed models with subjects random. Offered is mixed model methodology that has been popularized in the statistical literature and statistical software procedures. Generalizations to different classes of within-subjects designs are explored, and comments on the future direction of the debate on NHST are offered.

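Over the first 20 years of the new century, methodological research on hypothesis testing in China can be grouped into the following categories: shortcomings of null hypothesis significance testing, problems in the use of p values, the replicability of psychological research, effect size, statistical power, equivalence testing, and other research related to hypothesis testing. Null hypothesis significance testing has developed into a combined procedure. To ensure power and save cost, experimental studies should run an a priori power analysis to estimate the required sample size, although for questionnaire studies with more than 160 respondents this is unnecessary in traditional statistics. When the null hypothesis is rejected, conclusions should be drawn in conjunction with the effect size. When the null hypothesis is not rejected, post hoc power should be reported; if the effect size is medium or large but power is insufficient, additional participants may be recruited and the analysis rerun, but this process should be disclosed proactively, with the final actual p value reported and the possible Type I error rate assessed.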
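
The a priori power analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a two-sided comparison of two independent group means and uses the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)², where `d` is an assumed standardized effect size (Cohen's d). The function name `n_per_group` and its defaults are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison.

    Normal approximation (an assumption of this sketch, not the paper's method):
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    where d is the assumed standardized effect size (Cohen's d).
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Required n per group for conventional small, medium, and large effects.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n per group = {n_per_group(d)}")
```

Because the sketch uses the normal rather than the t distribution, it tends to give a slightly smaller n than exact power software would, so it is best read as a rough planning figure.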

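The familiar null hypothesis significance test has been questioned and defended again and again, yet its status has not been shaken, and reporting test results remains customary practice in statistical analysis. Its limitations, however, have prompted researchers to explore additional statistical methods such as interval estimation, effect size analysis, and power analysis. This article first introduces the relationship between hypothesis testing and confidence intervals; it then discusses how power relates to the two types of error rate and to effect size; finally, having clarified these methods, it provides a practical workflow for statistical analysis.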
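
The relationship between hypothesis testing and confidence intervals that this abstract refers to can be sketched for the simplest case. The example below is an illustration under stated assumptions, not the article's own workflow: it assumes a two-sided z test of a mean with known standard deviation, where rejecting H0 at level α is equivalent to the (1 − α) confidence interval excluding the null value. The names `z_test_and_ci` and `agrees` are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_and_ci(mean, sigma, n, mu0=0.0, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 (sigma known) plus the matching CI."""
    se = sigma / math.sqrt(n)
    z = (mean - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (mean - crit * se, mean + crit * se)
    return p, ci


def agrees(p, ci, mu0=0.0, alpha=0.05):
    """True when 'p < alpha' and 'CI excludes mu0' give the same verdict."""
    rejected = p < alpha
    excluded = not (ci[0] <= mu0 <= ci[1])
    return rejected == excluded


# Illustrative sample means (made-up values), sigma = 1, n = 25.
for m in (0.1, 0.3, 0.5):
    p, ci = z_test_and_ci(m, sigma=1.0, n=25)
    print(f"mean={m}: p={p:.4f}, CI=({ci[0]:.3f}, {ci[1]:.3f}), consistent={agrees(p, ci)}")
```

The equivalence holds here because the test and the interval are built from the same standard error and critical value; with other designs or models the two need not coincide.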

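Statistical inference plays a key role in scientific research, yet the classical method most commonly used today, null hypothesis significance testing (NHST), is misused or abused by some researchers because it is difficult to understand. Some researchers have proposed the Bayes factor as an alternative and/or supplementary statistical method. The Bayes factor is an important tool for model comparison and hypothesis testing in Bayesian statistics, and it can be interpreted as the degree of support for the null hypothesis H0 or the alternative hypothesis H1. Compared with NHST it has the following advantages: it considers both H0 and H1 and can be used to support H0; it is not "severely" biased against H0; it allows changes in the strength of evidence to be monitored; and it is unaffected by the sampling plan. Bayes factors can now be computed conveniently with the open statistical software JASP, and this article demonstrates the approach with a Bayesian t test. The use of Bayes factors is of considerable significance for psychological researchers, but users need to attend to the reasonableness of the chosen prior distribution and keep the data analysis process transparent and open.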
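
To make the idea of a Bayes factor as "degree of support for H0 or H1" concrete, here is a minimal sketch. It deliberately swaps in a simpler case than the article's JASP-based Bayesian t test: a binomial test of H0: θ = θ0 against H1: θ ~ Uniform(0, 1), for which the marginal likelihood under H1 has the closed form 1/(n + 1). The function name `bf10_binomial` and the data values are hypothetical.

```python
import math


def bf10_binomial(k, n, theta0=0.5):
    """Bayes factor BF10 for k successes in n trials.

    H0: theta = theta0 (point null).
    H1: theta ~ Uniform(0, 1) (an assumed prior for this illustration).
    Under the uniform prior the marginal likelihood of the data is 1 / (n + 1).
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    marginal_h1 = 1 / (n + 1)
    likelihood_h0 = math.comb(n, k) * theta0 ** k * (1 - theta0) ** (n - k)
    return marginal_h1 / likelihood_h0


# BF10 > 1 favours H1; BF10 < 1 favours H0 (BF01 = 1 / BF10).
for k in (5, 8, 10):
    bf = bf10_binomial(k, 10)
    print(f"k={k}, n=10: BF10={bf:.3f}, BF01={1 / bf:.3f}")
```

Unlike a p value, the result can express support for H0: with 5 successes in 10 trials BF10 falls below 1, so the data favour the null. The value depends on the prior placed on θ under H1, which is why the abstract stresses choosing the prior sensibly.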

There have been frequent attempts in psychology to reduce the reliance on null hypothesis significance testing (NHST) as the criterion for establishing the importance of results. Many authorities now recommend the reporting of effect sizes (ESs) as a supplement or alternative to NHST. However, there is extensive specialist literature highlighting problems associated with the use and interpretation of ESs. A review of the coverage of ESs in over 100 textbooks on statistical analysis in behavioural science revealed widespread neglect of ESs and the relevant critical issues that have widespread coverage in the more specialist literature. For example, many textbooks claim that ESs should be interpreted as a simple measure of the practical real-world importance of a result despite the fact that ESs are profoundly influenced by features of design and analysis strategy. We seek to highlight areas of misunderstanding about ESs found in the pedagogical literature in the light of the more specialist literature and make recommendations to researchers for the appropriate use and interpretation of ESs. This is critical as statistics textbooks have a crucial role in the education of researchers.

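吕小康 (Lü Xiaokang). 《心理科学》, 2012, 35(6): 1502-1506
Fisher and Neyman–Pearson, the originators of hypothesis testing, disagreed on many points, including the methodological foundations of statistical models, the nature of the two types of error, the interpretation of the significance level, and the function of hypothesis testing. As a result, the null hypothesis significance testing model most commonly used in psychological statistics carries various implicit contradictions, which have led to controversy in its application. Psychological statistics needs not only to examine the ambiguities of the existing testing model and to propose other, supplementary modes of statistical inference, but also to reflect on the educational tradition of psychological statistics, so as to build a more open and pluralistic view of statistical application and allow psychological statistics to better serve psychological research.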

Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students, all in the field of psychology, were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.

Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data.

Some methodologists have recently suggested that scientific psychology's over-reliance on null hypothesis significance testing (NHST) impedes the progress of the discipline. In response, a number of defenders have maintained that NHST continues to play a vital role in psychological research. Both sides of the argument to date have been presented abstractly. The authors take a different approach to this issue by illustrating the use of NHST along with 2 possible alternatives (meta-analysis as a primary data analysis strategy and Bayesian approaches) in a series of 3 studies. Comparing and contrasting the approaches on actual data brings out the strengths and weaknesses of each approach. The exercise demonstrates that the approaches are not mutually exclusive but instead can be used to complement one another.

Because of their historical reliance upon null hypothesis statistical tests (NHST), the human sciences have developed a number of potentially problematic research literatures. While aware of the file drawer effect since the 1970s, scientists have been largely unsuccessful at addressing its pernicious effects. Because significant results have a greater likelihood of being published than do nonsignificant effects, many of our research literatures might currently be constructed upon a series of Type I errors and inflated effect sizes. A method (called Original Replication of Meta-Analyses or ORMA) has recently been developed for identifying problematic research literatures and offering a method to address the problems due to publication bias. Philosophers of science have long argued that a chief reason for science's preeminence as a source of knowledge rests in its ability to self-correct. Researchers in the human sciences are now able to empirically test their research literatures to ascertain which are in need of repair. The use of ORMA serves to lessen the problems that led to the recent calls for bans on significant/nonsignificant statistics in human science research. ORMA will also improve psychology's ability to successfully replicate its research findings.

Running studies with high statistical power, while effect size estimates in psychology are often inaccurate, leads to a practical challenge when designing an experiment. This challenge can be addressed by performing sequential analyses while the data collection is still in progress. At an interim analysis, data collection can be stopped whenever the results are convincing enough to conclude that an effect is present, more data can be collected, or the study can be terminated whenever it is extremely unlikely that the predicted effect will be observed if data collection would be continued. Such interim analyses can be performed while controlling the Type 1 error rate. Sequential analyses can greatly improve the efficiency with which data are collected. Additional flexibility is provided by adaptive designs where sample sizes are increased on the basis of the observed effect size. The need for pre-registration, ways to prevent experimenter bias, and a comparison between Bayesian approaches and null-hypothesis significance testing (NHST) are discussed. Sequential analyses, which are widely used in large-scale medical trials, provide an efficient way to perform high-powered informative experiments. I hope this introduction will provide a practical primer that allows researchers to incorporate sequential analyses in their research. Copyright © 2014 John Wiley & Sons, Ltd.

Geoffrey Loftus, Editor of Memory & Cognition from 1994 to 1997, strongly encouraged presentation of figures with error bars and avoidance of null hypothesis significance testing (NHST). The authors examined 696 Memory & Cognition articles published before, during, and after the Loftus editorship. Use of figures with bars increased to 47% under Loftus's editorship and then declined. Bars were rarely used for interpretation, and NHST remained almost universal. Analysis of 309 articles in other psychology journals confirmed that Loftus's influence was most evident in the articles he accepted for publication, but was otherwise limited. An e-mail survey of authors of papers accepted by Loftus revealed some support for his policy, but allegiance to traditional practices as well. Reform of psychologists' statistical practices would require more than editorial encouragement.

We investigated the way experienced users interpret Null Hypothesis Significance Testing (NHST) outcomes. An empirical study was designed to compare the reactions of two populations of NHST users, psychological researchers and professional applied statisticians, when faced with contradictory situations. The subjects were presented with the results of an experiment designed to test the efficacy of a drug by comparing two groups (treatment/placebo). Four situations were constructed by combining the outcome of the t test (significant vs. nonsignificant) and the observed difference between the two means D (large vs. small). Two of these situations appeared as conflicting (t significant/D small and t nonsignificant/D large). Three fundamental aspects of statistical inference were investigated by means of open questions: drawing inductive conclusions about the magnitude of the true difference from the data in hand, making predictions for future data, and making decisions about stopping the experiment. The subjects were 25 statisticians from pharmaceutical companies in France, subjects well versed in statistics, and 20 psychological researchers from various laboratories in France, all with experience in processing and analyzing experimental data. On the whole, statisticians and psychologists reacted in a similar way and were very impressed by significant results. It should be noted that professional applied statisticians were not immune to misinterpretations, especially in the case of nonsignificance. However, the interpretations that accustomed users attach to the outcome of NHST can vary from one individual to another, and it is hard to conceive that there could be a consensus in the face of seemingly conflicting situations. In fact, beyond the superficial report of “erroneous” interpretations, one can see in the misuses of NHST intuitive judgmental “adjustments” that try to overcome its inherent shortcomings. These findings encourage the many recent attempts to improve the habitual ways of analyzing and reporting experimental data.

Null hypothesis significance testing (NHST) is the researcher's workhorse for making inductive inferences. This method has often been challenged, has occasionally been defended, and has persistently been used through most of the history of scientific psychology. This article reviews both the criticisms of NHST and the arguments brought to its defense. The review shows that the criticisms address the logical validity of inferences arising from NHST, whereas the defenses stress the pragmatic value of these inferences. The author suggests that both critics and apologists implicitly rely on Bayesian assumptions. When these assumptions are made explicit, the primary challenge for NHST (and any system of induction) can be confronted. The challenge is to find a solution to the question of replicability.

Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme or more extreme than the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite (taking multiple parameter values), such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
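
The NHST half of this argument, that using the data to decide when to stop invalidates the nominal error rate, can be shown with a small simulation. This is a sketch under stated assumptions rather than the authors' analysis: data are drawn from a standard normal distribution with H0 true, a two-sided z test with known σ = 1 is applied at each look, and the schedule of looks (every 10 observations up to 100) is arbitrary. The function names and the seed are hypothetical, and the sketch does not cover the paper's Bayes factor results.

```python
import math
import random
from statistics import NormalDist


def rejects(sample_sum, n, alpha=0.05):
    """Two-sided z test of H0: mu = 0 with known sigma = 1."""
    z = sample_sum / math.sqrt(n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha


def false_positive_rate(looks, n_sims=2000, seed=1):
    """Share of simulated null experiments that reject H0 at any look.

    `looks` lists the cumulative sample sizes at which the test is run;
    data collection stops at the first significant result.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for target in looks:
            while n < target:
                total += rng.gauss(0.0, 1.0)  # H0 is true: mean 0, sd 1
                n += 1
            if rejects(total, n):
                hits += 1
                break
    return hits / n_sims


fixed_rate = false_positive_rate(looks=[100])                        # one test at n = 100
peeking_rate = false_positive_rate(looks=list(range(10, 101, 10)))   # test every 10 observations
print(f"fixed n: {fixed_rate:.3f}, optional stopping: {peeking_rate:.3f}")
```

With a single planned test the rejection rate stays near the nominal 5%, whereas testing repeatedly and stopping at the first significant result pushes it well above that level; adding more looks increases it further.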

Lee MD, Wagenmakers EJ. Psychological Review, 2005, 112(3): 662-8; discussion 669-74
D. Trafimow (2003) presented an analysis of null hypothesis significance testing (NHST) using Bayes's theorem. Among other points, he concluded that NHST is logically invalid, but that logically valid Bayesian analyses are often not possible. The latter conclusion reflects a fundamental misunderstanding of the nature of Bayesian inference. This view needs correction, because Bayesian methods have an important role to play in many psychological problems where standard techniques are inadequate. This comment, with the help of a simple example, explains the usefulness of Bayesian inference for psychology.

Null-hypothesis significance tests (NHST), properly used, tell us whether we have sufficient evidence to be confident of the sign of the population effect, but only if we abandon two-valued logic in favor of Kaiser's (1960) three-alternative hypothesis tests. Confidence intervals provide a useful addition to NHSTs, and can be used to provide the same sign-determination function as NHST. However, when so used, confidence intervals are subject to exactly the same Type I, II, and III error rates as NHST. In addition, NHSTs provide two pieces of information about our data (maximum probability of a Type III error and probability of a successful exact replication) that confidence intervals do not. The proposed alternative to NHST is just as susceptible to misinterpretation as is NHST. The problem of bias due to censoring of data collection or publication can be handled by providing archives for all methodologically sound data sets, but reserving interpretations and conclusions for statistically significant results.
