Similar Articles
20 similar articles found.
1.
Traditional null hypothesis significance testing does not yield the probability of the null or its alternative and, therefore, cannot logically ground scientific decisions. The decision theory proposed here calculates the expected utility of an effect on the basis of (1) the probability of replicating it and (2) a utility function on its size. It takes significance tests—which place all value on the replicability of an effect and none on its magnitude—as a special case, one in which the cost of a false positive is revealed to be an order of magnitude greater than the value of a true positive. More realistic utility functions credit both replicability and effect size, integrating them into a single index of merit. The analysis incorporates opportunity cost and is consistent with alternate measures of effect size, such as r2 and information transmission, and with Bayesian model selection criteria. An alternate formulation is functionally equivalent to the formal theory, transparent, and easy to compute.

2.
Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope in case the regression is forced through the origin). This paper shows that this approach yields highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment in which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions reached through inadequate application of regression analyses.
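The inflation just described can be illustrated with a small simulation: when two variables are noisy measures of the same underlying score, the population regression slope is attenuated below 1, so a nominal 5% test of unit slope rejects far too often even though the variables are genuinely equivalent. This is a hedged sketch, not the paper's exact setup — the sampling model, sample size, and approximate t critical value are illustrative assumptions:

```python
import random, math

def slope_test_rejections(n_sims=2000, n=50, seed=1):
    """Simulate pairs (x, y) that are noisy measures of the SAME true score,
    then test H0: slope = 1 in the regression of y on x at roughly alpha = .05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        t = [rng.gauss(0, 1) for _ in range(n)]       # shared true scores
        x = [ti + rng.gauss(0, 1) for ti in t]        # measure 1
        y = [ti + rng.gauss(0, 1) for ti in t]        # measure 2 (equivalent)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx                                 # attenuated toward 0.5 here
        a = my - b * mx
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        se_b = math.sqrt(sse / (n - 2) / sxx)
        if abs(b - 1) / se_b > 2.01:                  # ~t critical value, df = 48
            rejections += 1
    return rejections / n_sims
```

Because measurement error attenuates the slope well below 1, the rejection rate of the unit-slope test is far above the nominal 5% despite true equivalence of the two variables.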

3.
Designing an experiment with high statistical power is a practical challenge when effect size estimates are inaccurate, as they often are in psychology. This challenge can be addressed by performing sequential analyses while data collection is still in progress. At an interim analysis, data collection can be stopped whenever the results are convincing enough to conclude that an effect is present; more data can be collected; or the study can be terminated whenever it is extremely unlikely that the predicted effect would be observed if data collection were continued. Such interim analyses can be performed while controlling the Type I error rate. Sequential analyses can greatly improve the efficiency with which data are collected. Additional flexibility is provided by adaptive designs, where sample sizes are increased on the basis of the observed effect size. The need for pre-registration, ways to prevent experimenter bias, and a comparison between Bayesian approaches and null-hypothesis significance testing (NHST) are discussed. Sequential analyses, which are widely used in large-scale medical trials, provide an efficient way to perform high-powered informative experiments. I hope this introduction will provide a practical primer that allows researchers to incorporate sequential analyses in their research. Copyright © 2014 John Wiley & Sons, Ltd.
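A minimal simulation makes the core point concrete: repeatedly testing accumulating data at an uncorrected threshold inflates the Type I error rate well above the nominal 5%, whereas a stricter interim boundary (here the Pocock constant for four looks, ≈2.361) restores it. The number of looks and batch sizes below are illustrative assumptions:

```python
import random, math

def repeated_looks_error(looks=4, batch=25, crit=1.96, n_sims=4000, seed=7):
    """Under H0 (true mean 0, known sd 1), run a z-test after each batch of
    observations; count a 'rejection' if ANY interim test exceeds crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        s, n = 0.0, 0
        for _ in range(looks):
            for _ in range(batch):
                s += rng.gauss(0, 1)
                n += 1
            if abs(s / math.sqrt(n)) > crit:  # z = sum / sqrt(n) when sd = 1
                hits += 1
                break
    return hits / n_sims
```

With four uncorrected looks at 1.96, the overall false positive rate is roughly 12–13%; replacing the critical value with the Pocock boundary brings it back near 5%.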

4.
Chow, S. L. (1998). The Behavioral and Brain Sciences, 21(2), 169–194; discussion 194–239.
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. 
At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.

5.
We introduce a new, readily computed statistic, the counternull value of an obtained effect size, which is the nonnull magnitude of effect size that is supported by exactly the same amount of evidence as supports the null value of the effect size. In other words, if the counternull value were taken as the null hypothesis, the resulting p value would be the same as the obtained p value for the actual null hypothesis. Reporting the counternull, in addition to the p value, virtually eliminates two common errors: (a) equating failure to reject the null with the estimation of the effect size as equal to zero, and (b) taking the rejection of a null hypothesis on the basis of a significant p value to imply a scientifically important finding. In many common situations with a one-degree-of-freedom effect size, the value of the counternull is simply twice the magnitude of the obtained effect size, but the counternull is defined in general, even with multidegree-of-freedom effect sizes, and therefore can be applied when a confidence interval cannot be. The use of the counternull can be especially useful in meta-analyses when evaluating the scientific importance of summary effect sizes.
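For a symmetric, one-degree-of-freedom effect size the counternull is simply the obtained effect size reflected about the null value — twice the obtained value when the null is zero. A minimal sketch of that arithmetic:

```python
def counternull(effect_size, null_value=0.0):
    """Counternull value of an obtained effect size: the value on the far
    side of the obtained estimate, as distant from it as the null is, and
    therefore supported by the same amount of evidence as the null.
    For a symmetric one-df effect size: 2 * obtained - null."""
    return 2 * effect_size - null_value
```

For example, an observed d of 0.30 against a null of 0 has a counternull of 0.60: the data support "the effect is 0.60" exactly as strongly as they support "the effect is zero."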

6.
Domestic methodological research on hypothesis testing over the first 20 years of the new century can be grouped into the following categories: shortcomings of null hypothesis significance testing, problems with the use of p values, the replicability of psychological research, effect sizes, statistical power, equivalence testing, and other research related to hypothesis testing. Null hypothesis significance testing has developed into a combined procedure: to guarantee power and save costs, experimental studies should conduct an a priori power analysis to estimate the required sample size, although in traditional statistics this is unnecessary for questionnaire studies with more than 160 respondents. When the null hypothesis is rejected, conclusions should be drawn in combination with effect sizes. When the null hypothesis is not rejected, post hoc power should be reported; if the effect size is medium or large but power is insufficient, more participants can be added and the analysis rerun, but this process should be disclosed proactively, the final actual p value reported, and the possible Type I error rate evaluated.

7.
This article concerns acceptance of the null hypothesis that one variable has no effect on another. Despite frequent opinions to the contrary, this null hypothesis can be correct in some situations. Appropriate criteria for accepting the null hypothesis are (1) that the null hypothesis is possible; (2) that the results are consistent with the null hypothesis; and (3) that the experiment was a good effort to find an effect. These criteria are consistent with the meta-rules for psychology. The good-effort criterion is subjective, which is somewhat undesirable, but the alternative—never accepting the null hypothesis—is neither desirable nor practical.

8.
Effect sizes for experimenting psychologists.
This article describes three families of effect size estimators and their use in situations of general and specific interest to experimenting psychologists. The situations discussed include both between- and within-group (repeated measures) designs. Also described is the counternull statistic, which is useful in preventing common errors of interpretation in null hypothesis significance testing. The emphasis is on correlation (r-type) effect size indicators, but a wide variety of difference-type and ratio-type effect size estimators are also described.
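Conversions between test statistics and r-type or difference-type effect sizes are simple arithmetic. A sketch using standard textbook formulas (not taken verbatim from this article):

```python
import math

def r_from_t(t, df):
    """Correlation-type effect size from a t statistic:
       r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))

def cohens_d_from_t(t, n1, n2):
    """Difference-type effect size for two independent groups:
       d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)
```

For instance, t = 2.0 with df = 16 corresponds to r ≈ 0.45, and the same t from two groups of 10 corresponds to d ≈ 0.89.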

9.
Functional magnetic resonance imaging (fMRI) plays an important role in pre-surgical planning for patients with resectable brain lesions such as tumors. With appropriately designed tasks, the results of fMRI studies can guide resection, thereby preserving vital brain tissue. The mass univariate approach to fMRI data analysis consists of performing a statistical test in each voxel, which is used to classify voxels as either active or inactive—that is, related, or not, to the task of interest. In cognitive neuroscience, the focus is on controlling the rate of false positives while accounting for the severe multiple testing problem of searching the brain for activations. However, stringent control of false positives is accompanied by a risk of false negatives, which can be detrimental, particularly in clinical settings where false negatives may lead to surgical resection of vital brain tissue. Consequently, for clinical applications, we argue for a testing procedure with a stronger focus on preventing false negatives. We present a thresholding procedure that incorporates information on false positives and false negatives. We combine two measures of significance for each voxel: a classical p-value, which reflects evidence against the null hypothesis of no activation, and an alternative p-value, which reflects evidence against activation of a prespecified size. This results in a layered statistical map for the brain. One layer marks voxels exhibiting strong evidence against the traditional null hypothesis, while a second layer marks voxels where activation cannot be confidently excluded. The third layer marks voxels where the presence of activation can be rejected.

10.
Although the consequences of ignoring a nested factor on decisions to reject the null hypothesis of no treatment effects have been discussed in the literature, typically researchers in applied psychology and education ignore treatment providers (often a nested factor) when comparing the efficacy of treatments. The incorrect analysis, however, not only invalidates tests of hypotheses, but it also overestimates the treatment effect. Formulas were derived and a Monte Carlo study was conducted to estimate the degree to which the F statistic and treatment effect size measures are inflated by ignoring the effects due to providers of treatments. These untoward effects are illustrated with examples from psychotherapeutic treatments.

11.
12.
王阳, 温忠麟, & 付媛姝 (2020). 心理科学进展, 28(11), 1961–1969.
Commonly used structural equation model fit indices have limitations: the χ² test takes the traditional null hypothesis as the target hypothesis and therefore cannot confirm a model, while descriptive fit indices such as RMSEA and CFI lack inferential statistical properties. Equivalence testing effectively addresses these problems. This paper first explains how equivalence testing evaluates the fit of a single model and how it differs from null hypothesis testing, then introduces how equivalence testing is used to analyze measurement invariance, and finally uses empirical data to demonstrate the performance of equivalence testing in single-model evaluation and measurement invariance testing, comparing it with traditional model evaluation methods.

13.
For comparing nested covariance structure models, the standard procedure is the likelihood ratio test of the difference in fit, where the null hypothesis is that the models fit identically in the population. A procedure for determining statistical power of this test is presented where effect size is based on a specified difference in overall fit of the models. A modification of the standard null hypothesis of zero difference in fit is proposed allowing for testing an interval hypothesis that the difference in fit between models is small, rather than zero. These developments are combined yielding a procedure for estimating power of a test of a null hypothesis of small difference in fit versus an alternative hypothesis of larger difference.

14.
Several studies aimed at testing the validity of Holland's hexagonal and Roe's circular models of interests showed results on which the null hypothesis of random arrangement can be rejected, and the investigators concluded that the tested models were supported. None of these studies, however, tested each model in its entirety. The present study is based on the assumption that the rejection of the null hypothesis of chance is not rigorous enough. Reanalysis of 13 data sets of published studies, using a more rigorous method, reveals that although the random null hypothesis can in fact be rejected in 11 data sets, the hexagonal-circular model was supported by only 2 data sets and was rejected by 11 data sets. The hierarchical model for the structure of vocational interests (I. Gati, Journal of Vocational Behavior, 1979, 15, 90–106) was submitted to an identical test and was supported by 6 out of 10 data sets, including 4 data sets that rejected the hexagonal-circular model. The predictions of each of the models that tend to be disconfirmed by empirical data were identified. The implications of the findings for the structure of interests and occupational choice are discussed.

15.
L. V. Jones and J. W. Tukey (2000) pointed out that the usual 2-sided, equal-tails null hypothesis test at level alpha can be reinterpreted as simultaneous tests of 2 directional inequality hypotheses, each at level alpha/2, and that the maximum probability of a Type I error is alpha/2 if the truth of the null hypothesis is considered impossible. This article points out that in multiple testing with familywise error rate controlled at alpha, the directional error rate (assuming all null hypotheses are false) is greater than alpha/2 and can be arbitrarily close to alpha. Single-step, step-down, and step-up procedures are analyzed, and other error rates, including the false discovery rate, are discussed. Implications for confidence interval estimation and hypothesis testing practices are considered.

16.
Null-hypothesis significance testing remains the standard inferential tool in cognitive science despite its serious disadvantages. Primary among these is the fact that the resulting probability value does not tell the researcher what he or she usually wants to know: How probable is a hypothesis, given the obtained data? Inspired by developments presented by Wagenmakers (Psychonomic Bulletin & Review, 14, 779–804, 2007), I provide a tutorial on a Bayesian model selection approach that requires only a simple transformation of sum-of-squares values generated by the standard analysis of variance. This approach generates a graded level of evidence regarding which model (e.g., effect absent [null hypothesis] vs. effect present [alternative hypothesis]) is more strongly supported by the data. This method also obviates admonitions never to speak of accepting the null hypothesis. An Excel worksheet for computing the Bayesian analysis is provided as supplemental material.
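The transformation at the heart of this approach (following Wagenmakers, 2007) converts the error sums of squares from a standard ANOVA into an approximate posterior probability via BIC. A hedged sketch, assuming equal prior odds and the unit-information prior implicit in BIC; the function and variable names are mine, not the paper's:

```python
import math

def pbic_null(sse_null, sse_alt, n, k_diff=1):
    """Approximate posterior probability of the null hypothesis from ANOVA
    error sums of squares via the BIC approximation:
        dBIC10 = n * ln(SSE_alt / SSE_null) + k_diff * ln(n)
        p(H0 | D) = 1 / (1 + exp(-dBIC10 / 2))
    where sse_alt is the error SS under the alternative (effect-present)
    model, sse_null the error SS under the null, n the number of
    observations, and k_diff the difference in number of free parameters."""
    dbic10 = n * math.log(sse_alt / sse_null) + k_diff * math.log(n)
    return 1.0 / (1.0 + math.exp(-dbic10 / 2.0))
```

When the effect-present model barely reduces the error sum of squares, the penalty term dominates and the posterior favors the null; a substantial reduction in error SS swings the posterior toward the alternative.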

17.
In the first 20 years of the new century, 11 domestic psychology journals published a total of 213 papers on statistical methods. The research mainly falls into the following 10 categories (ordered by number of papers): structural equation modeling, test reliability, mediation effects, effect size and statistical power, longitudinal research, moderation effects, exploratory factor analysis, latent class models, common method bias, and hierarchical linear models. Each category is briefly reviewed. The results show that the breadth and depth of domestic research on psychological statistical methods have continued to increase, with research hotspots developing together through mutual integration; however, the proportion of review papers is large, the proportion of original research papers needs to be raised, and the research workforce needs to be strengthened.

18.
郑昊敏, 温忠麟, & 吴艳 (2011). 心理科学进展, 19(12), 1868–1878.
Effect sizes compensate, in quantitative terms, for the limitations of null hypothesis testing. Beyond reporting test results, many journals now require that research reports include effect sizes. Effect sizes fall into three broad categories: difference-type, correlation-type, and group-overlap measures. They may have different computation methods and uses under different research designs (e.g., single-factor and multifactor between-subjects, within-subjects, and mixed experimental designs) or under different data conditions (e.g., small samples, heterogeneous variances), but many effect sizes can be converted into one another. We provide a table to help practitioners choose an appropriate effect size according to research purpose and research type.
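The convertibility between effect size families is often just closed-form arithmetic; for example, for two equal-sized groups, Cohen's d and the point-biserial r interconvert with standard formulas. An illustrative sketch (these are textbook conversions, not reproduced from this article's table):

```python
import math

def r_from_d(d):
    """Convert Cohen's d to point-biserial r, assuming equal group sizes:
       r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d * d + 4)

def d_from_r(r):
    """Inverse conversion: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r * r)
```

The two functions are exact inverses under the equal-n assumption, so converting an effect size in either direction and back recovers the original value.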

19.
The practice of statistical inference in psychological research is critically reviewed. Particular emphasis is put on the fast pace of change from the sole reliance on null hypothesis significance testing (NHST) to the inclusion of effect size estimates, confidence intervals, and an interest in the Bayesian approach. We conclude that these developments are helpful for psychologists seeking to extract a maximum of useful information from statistical research data, and that seven decades of criticism against NHST is finally having an effect.

20.
