Similar Articles
1.
Bayes factor approaches for testing interval null hypotheses
Psychological theories are statements of constraint. The role of hypothesis testing in psychology is to test whether specific theoretical constraints hold in data. Bayesian statistics is well suited to the task of finding supporting evidence for constraint, because it allows for comparing evidence for 2 hypotheses against each other. One issue in hypothesis testing is that constraints may hold only approximately rather than exactly, and the reason for small deviations may be trivial or uninteresting. In the large-sample limit, these uninteresting, small deviations lead to the rejection of a useful constraint. In this article, we develop several Bayes factor 1-sample tests for the assessment of approximate equality and ordinal constraints. In these tests, the null hypothesis covers a small interval of non-0 but negligible effect sizes around 0. These Bayes factors are alternatives to previously developed Bayes factors, which do not allow for interval null hypotheses, and may especially prove useful to researchers who use statistical equivalence testing. To facilitate adoption of these Bayes factor tests, we provide easy-to-use software.
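The paper's software is not reproduced here, but the core computation can be sketched numerically. A minimal sketch in Python for the one-sample case, assuming a Cauchy prior on effect size and an interval half-width c = 0.1; both c and the prior scale r are placeholder choices, not the authors' defaults:

```python
# Sketch of an interval-null Bayes factor for a one-sample t test:
# H0: |delta| <= c (negligible effect) versus H1: |delta| > c.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def interval_null_bf01(t, n, c=0.1, r=0.707):
    """BF01 for H0: |delta| <= c versus H1: |delta| > c (Cauchy(0, r) prior)."""
    df = n - 1

    def like(delta):
        # Likelihood of the observed t statistic given effect size delta.
        return stats.nct.pdf(t, df, delta * np.sqrt(n))

    def prior(delta):
        return stats.cauchy.pdf(delta, scale=r)

    # Average the likelihood over the prior renormalized to each region.
    inside, _ = quad(lambda d: like(d) * prior(d), -c, c)
    p_inside = stats.cauchy.cdf(c, scale=r) - stats.cauchy.cdf(-c, scale=r)
    lo, _ = quad(lambda d: like(d) * prior(d), -np.inf, -c)
    hi, _ = quad(lambda d: like(d) * prior(d), c, np.inf)
    return (inside / p_inside) / ((lo + hi) / (1.0 - p_inside))

# A small t with a decent n: the data support the "negligible effect" null.
print(interval_null_bf01(t=0.8, n=50))
```

The Bayes factor compares the prior-weighted average likelihood inside the negligible interval with that outside it, so a small true effect inside the interval increasingly favors H0 as n grows instead of rejecting the constraint.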

2.
In the present article, we focus on two indices that quantify directionality and skew-symmetrical patterns in social interactions as measures of social reciprocity: the directional consistency (DC) and skew-symmetry indices. Although both indices enable researchers to describe social groups, most studies require statistical inferential tests. The main aims of the present study are first, to propose an overall statistical technique for testing null hypotheses regarding social reciprocity in behavioral studies, using the DC and skew-symmetry statistics (Phi) at group level; and second, to compare both statistics in order to allow researchers to choose the optimal measure depending on the conditions. In order to allow researchers to make statistical decisions, statistical significance for both statistics has been estimated by means of a Monte Carlo simulation. Furthermore, this study will enable researchers to choose the optimal observational conditions for carrying out their research, since the power of the statistical tests has been estimated.
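As a rough illustration of the approach, using one common formulation of the two indices (which may differ in detail from the paper's), the statistics and a dyad-level Monte Carlo null can be sketched as follows; the sociomatrix X is hypothetical:

```python
# DC and skew-symmetry (Phi) indices with a Monte Carlo significance test.
import numpy as np

rng = np.random.default_rng(1)

def dc_index(X):
    # Excess of dyadic interactions in each pair's dominant direction.
    i, j = np.triu_indices_from(X, k=1)
    hi = np.maximum(X[i, j], X[j, i])
    lo = np.minimum(X[i, j], X[j, i])
    return (hi.sum() - lo.sum()) / (hi.sum() + lo.sum())

def phi_index(X):
    # Share of total variation carried by the skew-symmetric part of X.
    K = (X - X.T) / 2.0
    return (K ** 2).sum() / (X ** 2).sum()

def monte_carlo_p(X, stat, n_sim=10_000):
    # Null model: each dyad's total is split in a random direction (p = .5).
    i, j = np.triu_indices_from(X, k=1)
    totals = (X[i, j] + X[j, i]).astype(int)
    obs, count = stat(X), 0
    for _ in range(n_sim):
        up = rng.binomial(totals, 0.5)
        sim = np.zeros_like(X)
        sim[i, j], sim[j, i] = up, totals - up
        count += stat(sim) >= obs
    return (count + 1) / (n_sim + 1)

X = np.array([[0, 8, 1], [2, 0, 7], [5, 1, 0]], dtype=float)
print(dc_index(X), monte_carlo_p(X, dc_index))
print(phi_index(X), monte_carlo_p(X, phi_index))
```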

3.
Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme or more extreme than the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite—taking multiple parameter values—such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
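The flavor of such a demonstration can be conveyed with a toy model. A sketch assuming a known-variance normal model with H0: mu = 0 and a composite H1 in which mu ~ N(0, tau^2), so the Bayes factor has a closed form (the paper's own simulations use richer settings):

```python
# Toy simulation of optional stopping with a Bayes factor: how often does
# a sequentially monitored BF10 cross an evidence threshold when H0 is true?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, tau, threshold, n_max, step = 1.0, 1.0, 3.0, 500, 10

def bf10(xbar, n):
    # Marginal density of the sample mean under each hypothesis.
    m1 = stats.norm.pdf(xbar, scale=np.sqrt(sigma**2 / n + tau**2))
    m0 = stats.norm.pdf(xbar, scale=np.sqrt(sigma**2 / n))
    return m1 / m0

hits, n_exp = 0, 2000
for _ in range(n_exp):
    x = rng.normal(0.0, sigma, n_max)       # H0 is true
    for n in range(step, n_max + 1, step):  # peek every `step` observations
        if bf10(x[:n].mean(), n) >= threshold:
            hits += 1                       # stopped with "evidence" for H1
            break
print("P(BF10 >= 3 at some peek | H0):", hits / n_exp)
```

The Bayesian reading of any single BF remains coherent, but the simulation shows the frequentist chance of ever reporting threshold-crossing evidence under optional stopping.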

4.
Progress in science often comes from discovering invariances in relationships among variables; these invariances often correspond to null hypotheses. As is commonly known, it is not possible to state evidence for the null hypothesis in conventional significance testing. Here we highlight a Bayes factor alternative to the conventional t test that will allow researchers to express preference for either the null hypothesis or the alternative. The Bayes factor has a natural and straightforward interpretation, is based on reasonable assumptions, and has better properties than other methods of inference that have been advocated in the psychological literature. To facilitate use of the Bayes factor, we provide an easy-to-use, Web-based program that performs the necessary calculations.
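The web-based program itself is not shown here, but a Bayes factor of this kind can be approximated numerically: integrate the noncentral-t likelihood of the observed t against a Cauchy prior on effect size, and divide by the likelihood under the null. A sketch for the one-sample case, with the prior scale r = 0.707 as an assumed default:

```python
# Numerical sketch of a default Bayes factor for a one-sample t test
# (a Cauchy prior on standardized effect size; not the authors' applet).
import numpy as np
from scipy import stats
from scipy.integrate import quad

def default_bf10(t, n, r=0.707):
    df = n - 1

    def like(delta):
        # Noncentral-t likelihood of the observed t given effect size delta.
        return stats.nct.pdf(t, df, delta * np.sqrt(n))

    num, _ = quad(lambda d: like(d) * stats.cauchy.pdf(d, scale=r),
                  -np.inf, np.inf)
    return num / stats.t.pdf(t, df)   # divide by the likelihood under H0

print(default_bf10(t=2.5, n=40))  # modest evidence for the alternative
print(default_bf10(t=0.3, n=40))  # evidence favoring the null
```

Unlike a p value, the second call returns a quantified preference for the null rather than a mere failure to reject it.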

5.
6.
The log-linear model for contingency tables expresses the logarithm of a cell frequency as an additive function of main effects, interactions, etc., in a way formally identical with an analysis of variance model. Exact statistical tests are developed to test hypotheses that specific effects or sets of effects are zero, yielding procedures for exploring relationships among qualitative variables which are suitable for small samples. The tests are analogous to Fisher's exact test for a 2 × 2 contingency table. Given a hypothesis, the exact probability of the obtained table is determined, conditional on fixed marginals or other functions of the cell frequencies. The sum of the probabilities of the obtained table and of all less probable ones is the exact probability to be considered in testing the null hypothesis. Procedures for obtaining exact probabilities are explained in detail, with examples given.
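For the familiar 2 × 2 special case, the procedure reduces to Fisher's exact test: condition on the margins, then sum the probability of the obtained table and of every table no more probable than it. A sketch:

```python
# Exact p value for a 2x2 table by enumerating all tables with the
# observed margins (the (1,1) cell is hypergeometric under the null).
from scipy import stats

def exact_p(a, b, c, d):
    """Two-sided exact p for the table [[a, b], [c, d]]."""
    n, r1, c1 = a + b + c + d, a + b, a + c
    rv = stats.hypergeom(n, r1, c1)
    support = range(max(0, r1 + c1 - n), min(r1, c1) + 1)
    p_obs = rv.pmf(a)
    # Sum the obtained table and all tables no more probable than it
    # (small tolerance guards against floating-point ties).
    return sum(rv.pmf(k) for k in support if rv.pmf(k) <= p_obs + 1e-12)

print(exact_p(8, 2, 1, 5))  # agrees with scipy.stats.fisher_exact, two-sided
```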

7.
Chow, S. L. (1998). The Behavioral and Brain Sciences, 21(2), 169–194; discussion 194–239.
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.

8.
The equality of two group variances is frequently tested in experiments. However, criticisms of null hypothesis statistical testing on means have recently arisen and there is interest in other types of statistical tests of hypotheses, such as superiority/non-inferiority and equivalence. Although these tests have become more common in psychology and social sciences, the corresponding sample size estimation for these tests is rarely discussed, especially when the sampling unit costs are unequal or group sizes are unequal for two groups. Thus, for finding optimal sample size, the present study derived an initial allocation by approximating the percentiles of an F distribution with the percentiles of the standard normal distribution and used the exhaustion algorithm to select the best combination of group sizes, thereby ensuring the resulting power reaches the designated level and is maximal with a minimal total cost. In this manner, optimization of sample size planning is achieved. The proposed sample size determination has a wide range of applications and is efficient in terms of Type I errors and statistical power in simulations. Finally, an illustrative example from a report by the Health Survey for England, 1995–1997, is presented using hypertension data. For ease of application, four R Shiny apps are provided and benchmarks for setting equivalence margins are suggested.
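As an illustration of the general idea (not the authors' algorithm or their normal-approximation starting point), one can compute the power of the two-sided F test for a variance ratio and exhaustively search group sizes for the cheapest combination meeting a target power; the costs, variance ratio, and target below are placeholders:

```python
# Cost-aware exhaustive search for group sizes in a two-sided F test of
# H0: var1 = var2, when the true ratio is rho = var1/var2.
from scipy import stats

def power(n1, n2, rho, alpha=0.05):
    # Under the alternative, S1^2/S2^2 ~ rho * F(n1 - 1, n2 - 1).
    d1, d2 = n1 - 1, n2 - 1
    lo = stats.f.ppf(alpha / 2, d1, d2)
    hi = stats.f.ppf(1 - alpha / 2, d1, d2)
    return stats.f.cdf(lo / rho, d1, d2) + stats.f.sf(hi / rho, d1, d2)

def optimal_sizes(rho=2.0, c1=1.0, c2=4.0, target=0.80, n_max=500):
    best = None  # (cost, n1, n2)
    for n1 in range(2, n_max):
        if best and c1 * n1 + c2 * 2 >= best[0]:
            break  # every remaining combination is at least this expensive
        for n2 in range(2, n_max):
            if power(n1, n2, rho) >= target:
                cost = c1 * n1 + c2 * n2
                if best is None or cost < best[0]:
                    best = (cost, n1, n2)
                break  # a larger n2 at this n1 only adds cost
    return best

print(optimal_sizes())  # cheapest (cost, n1, n2) reaching 80% power
```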

9.
10.
There have been frequent attempts in psychology to reduce the reliance on null hypothesis significance testing (NHST) as the criterion for establishing the importance of results. Many authorities now recommend the reporting of effect sizes (ESs) as a supplement or alternative to NHST. However, there is extensive specialist literature highlighting problems associated with the use and interpretation of ESs. A review of the coverage of ESs in over 100 textbooks on statistical analysis in behavioural science revealed widespread neglect of ESs and of the critical issues that are covered at length in the more specialist literature. For example, many textbooks claim that ESs should be interpreted as a simple measure of the practical real-world importance of a result despite the fact that ESs are profoundly influenced by features of design and analysis strategy. We seek to highlight areas of misunderstanding about ESs found in the pedagogical literature in the light of the more specialist literature and make recommendations to researchers for the appropriate use and interpretation of ESs. This is critical, as statistics textbooks play a crucial role in the education of researchers.

11.
Randomization tests are valid alternatives to parametric tests like the t test and analysis of variance when the normality or random sampling assumptions of these tests are violated. Three SPSS programs are listed and described that will conduct approximate randomization tests for testing the null hypotheses that two or more means or distributions are the same or that two variables are independent (i.e., uncorrelated or “randomly associated”). The programs will work on both desktop and mainframe versions of SPSS. Although the SPSS programs are slower on desktop machines than software designed explicitly for randomization tests, these programs bring randomization tests into the reach of researchers who prefer the SPSS computing environment for data analysis.
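The listed programs are written in SPSS syntax; the same approximate randomization test is easy to express in Python for comparison. A sketch for the two-group mean-difference case, using random reassignment of observations to groups:

```python
# Approximate randomization test for a difference in two group means.
import numpy as np

rng = np.random.default_rng(42)

def randomization_test(x, y, n_perm=9999):
    obs = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                 # random reassignment to groups
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        count += diff >= obs
    return (count + 1) / (n_perm + 1)       # include the observed assignment

x = rng.normal(0.0, 1.0, 15)
y = rng.normal(0.8, 1.0, 15)
print(randomization_test(x, y))
```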

12.
Although the consequences of ignoring a nested factor on decisions to reject the null hypothesis of no treatment effects have been discussed in the literature, typically researchers in applied psychology and education ignore treatment providers (often a nested factor) when comparing the efficacy of treatments. The incorrect analysis, however, not only invalidates tests of hypotheses, but it also overestimates the treatment effect. Formulas were derived and a Monte Carlo study was conducted to estimate the degree to which the F statistic and treatment effect size measures are inflated by ignoring the effects due to providers of treatments. These untoward effects are illustrated with examples from psychotherapeutic treatments.
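A small Monte Carlo conveys the inflation. In the sketch below, providers are nested within two treatments, there is no true treatment effect, and the analysis wrongly pools patients while ignoring providers; the provider and residual standard deviations are illustrative:

```python
# Type I error inflation when treatment providers (a nested factor)
# are ignored in the analysis; there is no true treatment effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, providers_per_arm, patients_per_provider = 2000, 4, 10
tau = 0.5    # provider SD; residual SD = 1

rejections = 0
for _ in range(n_sims):
    groups = []
    for _arm in range(2):
        provider_effects = rng.normal(0, tau, providers_per_arm)
        y = (provider_effects[:, None]
             + rng.normal(0, 1, (providers_per_arm, patients_per_provider)))
        groups.append(y.ravel())
    _, p = stats.ttest_ind(*groups)   # wrong model: provider ignored
    rejections += p < 0.05
print("empirical Type I error:", rejections / n_sims)   # well above .05
```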

13.
吕小康 (2012). 心理科学 [Psychological Science], 35(6), 1502–1506.
Fisher and Neyman–Pearson, the originators of hypothesis-testing thought, disagreed on many points: the methodological foundations of statistical models, the nature of the two types of error, the interpretation of significance levels, and the function of hypothesis testing. These disagreements leave null hypothesis significance testing, the most widely used paradigm in psychological statistics, with a variety of implicit contradictions, which in turn have provoked controversy over its application. Psychological statistics needs not only to examine the ambiguities of existing testing models and propose complementary modes of statistical inference, but also to reflect on its own educational tradition, so as to build a more open and pluralistic view of statistical application and put statistics in better service of psychological research.

14.
The ease with which data can be collected and analyzed via personal computer makes it potentially attractive to “peek” at the data before a target sample size is achieved. This tactic might seem appealing because data collection could be stopped early, which would save valuable resources, if a peek revealed a significant effect. Unfortunately, such data snooping comes with a cost. When the null hypothesis is true, the Type I error rate is inflated, sometimes quite substantially. If the null hypothesis is false, premature significance testing leads to inflated estimates of power and effect size. This program provides simulation results for a wide variety of premature and repeated null hypothesis testing scenarios. It gives researchers the ability to know in advance the consequences of data peeking so that appropriate corrective action can be taken.
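The program referenced above is not reproduced here, but a small version of the kind of simulation it runs is easy to write: under a true null, test after every few subjects and count how often any peek rejects. A sketch:

```python
# Type I error rate of a one-sample t test with repeated data peeking.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sims, n_target, peek_every, alpha = 5000, 100, 10, 0.05

false_alarms = 0
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, n_target)      # null hypothesis is true
    for n in range(peek_every, n_target + 1, peek_every):
        if stats.ttest_1samp(x[:n], 0.0).pvalue < alpha:
            false_alarms += 1               # rejected at some peek
            break
print("Type I error with peeking:", false_alarms / n_sims)  # well above .05
```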

15.
The variable-criteria sequential stopping rule (SSR) is a method for conducting planned experiments in stages, adding new subjects until the experiment is stopped because (a) the p value is at or below a lower criterion and the null hypothesis is rejected, (b) the p value is above an upper criterion, or (c) a maximum sample size has been reached. Alpha is controlled at the expected level. The table of stopping criteria has been validated for a t test or ANOVA with four groups. New simulations in this article demonstrate that the SSR can be used with unequal sample sizes or heterogeneous variances in a t test. As with the usual t test, the use of a separate-variance term instead of a pooled-variance term prevents an inflation of alpha with heterogeneous variances. Simulations validate the original table of criteria for up to 20 groups without a drift of alpha. When used with a multigroup ANOVA, a planned contrast can be substituted for the global F as the focus for the stopping rule. The SSR is recommended when significance tests are appropriate and when the null hypothesis can be tested in stages. Because of its efficiency, the SSR should be used instead of the usual approach to the t test or ANOVA when subjects are expensive, rare, or limited by ethical considerations such as pain or distress.
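The logic of the rule can be sketched as follows; note that the lower and upper criteria in the real method come from the validated published tables, so the values below are placeholders only:

```python
# Sketch of the variable-criteria SSR for a one-sample t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
lower, upper = 0.01, 0.36     # placeholder criteria, NOT the validated table
n_start, n_add, n_max = 10, 5, 40

def ssr_trial(mu=0.0):
    x = list(rng.normal(mu, 1.0, n_start))
    while True:
        p = stats.ttest_1samp(x, 0.0).pvalue
        if p <= lower:
            return "reject"
        if p > upper or len(x) >= n_max:
            return "stop without rejecting"
        x.extend(rng.normal(mu, 1.0, n_add))   # add a stage of subjects

results = [ssr_trial(0.0) for _ in range(2000)]   # null is true
print("alpha estimate:", results.count("reject") / len(results))
```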

16.
Discounting is the process by which outcomes lose value. Much of discounting research has focused on differences in the degree of discounting across various groups. This research has relied heavily on conventional null hypothesis significance tests that are familiar to psychologists, such as t‐tests and ANOVAs. As discounting research questions have become more complex by simultaneously focusing on within‐subject and between‐group differences, conventional statistical testing is often not appropriate for the obtained data. Generalized estimating equations (GEE) are one type of mixed‐effects model that are designed to handle autocorrelated data, such as within‐subject repeated‐measures data, and are therefore more appropriate for discounting data. To determine if GEE provides similar results as conventional statistical tests, we compared the techniques across 2,000 simulated data sets. The data sets were created using a Monte Carlo method based on an existing data set. Across the simulated data sets, the GEE and the conventional statistical tests generally provided similar patterns of results. As the GEE and more conventional statistical tests provide the same pattern of results, we suggest researchers use the GEE because it was designed to handle data that has the structure that is typical of discounting data.
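A hedged sketch of such a GEE analysis with statsmodels, on simulated discounting-style data; the variable names, the hyperbolic generating model, and the exchangeable working correlation are illustrative choices, not the paper's:

```python
# GEE on repeated-measures discounting-style data: subjective value of a
# delayed outcome measured at several delays per subject, two groups.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_subj, delays = 30, [1, 7, 30, 90, 180]
rows = []
for s in range(n_subj):
    group = s % 2
    subj_int = rng.normal(0, 0.1)        # subject-level shift
    for d in delays:
        value = 1 / (1 + (0.05 + 0.03 * group) * d) + subj_int + rng.normal(0, 0.05)
        rows.append({"subject": s, "group": group, "delay": d, "value": value})
df = pd.DataFrame(rows)

# An exchangeable working correlation handles within-subject dependence.
model = smf.gee("value ~ group * delay", groups="subject", data=df,
                cov_struct=sm.cov_struct.Exchangeable(),
                family=sm.families.Gaussian())
print(model.fit().summary())
```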

17.
Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when regression is forced through the origin). This paper shows that this approach renders highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment with which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions reached through inadequate application of regression analyses.
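One classical building block for the proposed approach, sketched below for paired measures: a paired t test for equal means plus the Pitman–Morgan device for equal variances (the variances of two paired variables are equal exactly when x + y and x − y are uncorrelated). The paper's own omnibus procedure may differ in detail:

```python
# Direct tests of equal means and equal variances for two repeated measures.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.normal(0, 1.0, 60)
y = x * 0.9 + rng.normal(0, 0.45, 60)   # two repeated measures per subject

t_mean, p_mean = stats.ttest_rel(x, y)      # H0: equal means
r, p_var = stats.pearsonr(x + y, x - y)     # H0: equal variances
print(f"means:     t = {t_mean:.2f}, p = {p_mean:.3f}")
print(f"variances: r = {r:.2f}, p = {p_var:.3f}")
```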

18.
Significance testing of null hypotheses is the standard epistemological method for advancing scientific knowledge in psychology, even though it has drawbacks and leads to common inferential mistakes. These mistakes include accepting the null hypothesis when it fails to be rejected, automatically interpreting rejected null hypotheses as theoretically meaningful, and failing to consider the likelihood of Type II errors. Although these mistakes have been discussed repeatedly for decades, there is no evidence that the academic discussion has had an impact. A group of methodologists is proposing a new approach: simply ban significance tests in psychology journals. The impact of a similar ban in public-health and epidemiology journals is reported.

19.
This study conducted a statistical power analysis of 64 articles appearing in the first four volumes of Human Communication Research, 1974–1978. Each article was examined, using Cohen's revised handbook, assuming nondirectional null hypotheses and an alpha level of .05. Statistical power, the probability of rejecting a false null hypothesis, was calculated for small, medium, and large experimental effect sizes and averaged by article and volume. Results indicated that the average probability of beta errors appears to have decreased over time, providing a greater chance of rejecting false null hypotheses, but this also raised several power-related issues relevant to communication research in general.
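The per-article calculations follow Cohen's framework: fix alpha and a conventional effect size, then compute the probability of rejecting a false null at the study's sample size. A sketch of one such calculation using statsmodels:

```python
# Power of a two-sample t test at alpha = .05 for Cohen's conventional
# small/medium/large effect sizes, at an assumed n of 30 per group.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    power = analysis.power(effect_size=d, nobs1=30, alpha=0.05,
                           alternative="two-sided")
    print(f"{label} effect (d = {d}): power = {power:.2f} at n = 30/group")
```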

20.
Effect sizes (e.g., Cohen's d, Glass's Δ, η2, adjusted R2, ω2) quantify the extent to which sample results diverge from the expectations specified in the null hypothesis. The present article addresses 5 related questions. First, is the advocacy for reporting and interpreting effect sizes part of the controversy over statistical significance testing? Second, why cannot p values be used as effect sizes? Third, what are the various categories of effect sizes and some commonly used examples of each type? Fourth, how should effect sizes be interpreted? Fifth, what are some recommendations for further reading?
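Several of the effect sizes named above can be computed directly from raw data with standard formulas (the omega-squared below is the one-way ANOVA version); a sketch:

```python
# Cohen's d, eta^2, and omega^2 from raw data, using standard formulas.
import numpy as np

rng = np.random.default_rng(2)
groups = [rng.normal(mu, 1.0, 25) for mu in (0.0, 0.4, 0.8)]

# Cohen's d for the first two groups (pooled SD).
x, y = groups[0], groups[1]
sp = np.sqrt(((len(x) - 1) * x.var(ddof=1) + (len(y) - 1) * y.var(ddof=1))
             / (len(x) + len(y) - 2))
d = (y.mean() - x.mean()) / sp

# eta^2 and omega^2 from a one-way ANOVA over all three groups.
grand = np.concatenate(groups)
ss_total = ((grand - grand.mean()) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand.mean()) ** 2 for g in groups)
df_b, df_w = len(groups) - 1, len(grand) - len(groups)
ms_within = (ss_total - ss_between) / df_w
eta2 = ss_between / ss_total
omega2 = (ss_between - df_b * ms_within) / (ss_total + ms_within)
print(f"d = {d:.2f}, eta^2 = {eta2:.3f}, omega^2 = {omega2:.3f}")
```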
