1.

Bayesian t tests for accepting and rejecting the null hypothesis

Jeffrey N. Rouder Paul L. Speckman Dongchu Sun Richard D. Morey Geoffrey Iverson 《Psychonomic bulletin & review》2009,16(2):225-237

Progress in science often comes from discovering invariances in relationships among variables; these invariances often correspond to null hypotheses. As is commonly known, it is not possible to state evidence for the null hypothesis in conventional significance testing. Here we highlight a Bayes factor alternative to the conventional t test that will allow researchers to express preference for either the null hypothesis or the alternative. The Bayes factor has a natural and straightforward interpretation, is based on reasonable assumptions, and has better properties than other methods of inference that have been advocated in the psychological literature. To facilitate use of the Bayes factor, we provide an easy-to-use, Web-based program that performs the necessary calculations.
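Below is a minimal Python sketch of how a default Bayes factor of this kind can be computed from a one-sample (or paired) t statistic. It assumes the JZS set-up, in which a Cauchy(0, r) prior on effect size is written as a normal prior with variance g, mixed over an inverse-gamma(1/2, r²/2) density. The function name jzs_bf01, the default r = 1 and the log-scale trapezoid integration are illustrative choices. This is not the authors' Web program.

```python
import math


def jzs_bf01(t: float, n: int, r: float = 1.0, steps: int = 4000) -> float:
    """Sketch of a JZS Bayes factor BF01 for a one-sample t statistic (n >= 2).

    Assumes delta | g ~ N(0, g) with g ~ inverse-gamma(1/2, r^2/2),
    which corresponds to a Cauchy(0, r) prior on the effect size delta.
    """
    nu = n - 1
    # Likelihood term under H0 (delta = 0); constants shared with H1 cancel.
    null_term = (1.0 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(g: float) -> float:
        a = 1.0 + n * g
        lik = a ** -0.5 * (1.0 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        return lik * prior

    # Substitute g = exp(u), so dg = g du, and integrate u over [-25, 25]
    # with the trapezoid rule; the integrand is negligible outside that range.
    lo, hi = -25.0, 25.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        g = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * integrand(g) * g
    alt_term = total * h
    return null_term / alt_term
```

Values above 1 favour the null hypothesis. For example, a t of exactly 0 is what H₀ predicts, so jzs_bf01(0.0, 50) should exceed 1, and the value grows with n.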

2.

Bayes factors for testing inequality constrained hypotheses: Issues with prior specification

Joris Mulder 《The British journal of mathematical and statistical psychology》2014,67(1):153-171

Several issues are discussed when testing inequality constrained hypotheses using a Bayesian approach. First, the complexity (or size) of the inequality constrained parameter spaces can be ignored. This is the case when using the posterior probability that the inequality constraints of a hypothesis hold, Bayes factors based on non‐informative improper priors, and partial Bayes factors based on posterior priors. Second, the Bayes factor may not be invariant for linear one‐to‐one transformations of the data. This can be observed when using balanced priors which are centred on the boundary of the constrained parameter space with a diagonal covariance structure. Third, the information paradox can be observed. When testing inequality constrained hypotheses, the information paradox occurs when the Bayes factor of an inequality constrained hypothesis against its complement converges to a constant as the evidence for the first hypothesis accumulates while keeping the sample size fixed. This paradox occurs when using Zellner's g prior as a result of too much prior shrinkage. Therefore, two new methods are proposed that avoid these issues. First, partial Bayes factors are proposed based on transformed minimal training samples. These training samples result in posterior priors that are centred on the boundary of the constrained parameter space with the same covariance structure as in the sample. Second, a g prior approach is proposed by letting g go to infinity. This is possible because the Jeffreys–Lindley paradox is not an issue when testing inequality constrained hypotheses. A simulation study indicated that the Bayes factor based on this g prior approach converges fastest to the true inequality constrained hypothesis.
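The encompassing-prior formulation that is common in this literature illustrates the role of complexity. There, the Bayes factor of an inequality-constrained hypothesis against the unconstrained one equals its fit divided by its complexity. Fit is the posterior probability that the constraint holds, and complexity is the prior probability that it holds.

The sketch below makes several assumptions:
- There is a single parameter.
- The prior and posterior are both normal.
- The function name is hypothetical.

It is a generic illustration, not either of the two methods the paper proposes.

```python
from statistics import NormalDist


def bf_inequality_vs_complement(post_mean, post_sd, prior_mean=0.0, prior_sd=1.0, boundary=0.0):
    """BF of H1: theta > boundary against its complement, via fit / complexity.

    Returns (BF_1c, fit, complexity). Normal prior and posterior are assumed.
    """
    # Fit: posterior probability that the constraint holds.
    fit = 1.0 - NormalDist(post_mean, post_sd).cdf(boundary)
    # Complexity: prior probability that the constraint holds.
    complexity = 1.0 - NormalDist(prior_mean, prior_sd).cdf(boundary)
    bf_1u = fit / complexity                      # H1 against the unconstrained hypothesis
    bf_cu = (1.0 - fit) / (1.0 - complexity)      # complement against unconstrained
    return bf_1u / bf_cu, fit, complexity
```

With a prior centred on the boundary, complexity is 0.5, and the Bayes factor against the complement reduces to fit / (1 − fit). A fit of 0.9, for instance, gives 9. Reporting the posterior probability alone leaves complexity out of the picture.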

3.

Automatic Bayes Factors for Testing Equality- and Inequality-Constrained Hypotheses on Variances

Florian Böing-Messing Joris Mulder 《Psychometrika》2018,83(3):586-617

In comparing characteristics of independent populations, researchers frequently expect a certain structure of the population variances. These expectations can be formulated as hypotheses with equality and/or inequality constraints on the variances. In this article, we consider the Bayes factor for testing such (in)equality-constrained hypotheses on variances. Application of Bayes factors requires specification of a prior under every hypothesis to be tested. However, specifying subjective priors for variances based on prior information is a difficult task. We therefore consider so-called automatic or default Bayes factors. These methods avoid the need for the user to specify priors by using information from the sample data. We present three automatic Bayes factors for testing variances. The first is a Bayes factor with equal priors on all variances, where the priors are specified automatically using a small share of the information in the sample data. The second is the fractional Bayes factor, where a fraction of the likelihood is used for automatic prior specification. The third is an adjustment of the fractional Bayes factor such that the parsimony of inequality-constrained hypotheses is properly taken into account. The Bayes factors are evaluated by investigating different properties such as information consistency and large sample consistency. Based on this evaluation, it is concluded that the adjusted fractional Bayes factor is generally recommendable for testing equality- and inequality-constrained hypotheses on variances.

4.

The frequentist implications of optional stopping on Bayesian hypothesis tests

Adam N. Sanborn Thomas T. Hills 《Psychonomic bulletin & review》2014,21(2):283-300

Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme or more extreme than the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite—taking multiple parameter values—such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
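The guarantee for a simple point null can be checked by simulation. The sketch assumes:
- normal data with known unit variance;
- a point null δ = 0;
- a N(0, τ²) prior on δ under H1, so that BF10 has a closed form.

It monitors BF10 after every observation and records how often BF10 ever reaches k when H0 is true. For a simple, true point null this rate cannot exceed 1/k. As the abstract notes, however, the bound does not cover heterogeneous truths or composite hypotheses. Function names and defaults are hypothetical.

```python
import math
import random


def bf10_normal(xbar, n, tau=1.0):
    """BF10 for H0: delta = 0 vs H1: delta ~ N(0, tau^2), data ~ N(delta, 1)."""
    s2 = 1.0 / n              # variance of the sample mean given delta
    v1 = tau * tau + s2       # marginal variance of the sample mean under H1
    log_bf = 0.5 * math.log(s2 / v1) + 0.5 * xbar * xbar * (1.0 / s2 - 1.0 / v1)
    return math.exp(log_bf)


def misleading_rate(k=3.0, n_max=200, n_sims=1000, seed=1):
    """Share of simulated studies, with H0 true, in which BF10 ever reaches k
    when the Bayes factor is checked after every single observation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)   # data generated under the point null
            if bf10_normal(total / n, n) >= k:
                hits += 1                  # the monitoring researcher stops here
                break
    return hits / n_sims
```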

5.

The Bayes factor and its implementation in JASP

Chuan-Peng Hu Xiang-Zhen Kong Eric-Jan Wagenmakers Alexander Ly Kaiping Peng 《Advances in Psychological Science》2018,26(6):951-965

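Statistical inference plays a key role in scientific research. However, the classical statistical method most commonly used in current research, null hypothesis significance testing (NHST), is misused or abused by some researchers because it is difficult to understand. Some researchers have proposed the Bayes factor as an alternative and/or complementary statistical method. The Bayes factor is an important method in Bayesian statistics for model comparison and hypothesis testing, and it can be interpreted as the degree of support for the null hypothesis H₀ or the alternative hypothesis H₁. Compared with NHST, it has the following advantages:
- it considers H₀ and H₁ simultaneously and can be used to support H₀;
- it is not "severely" biased against H₀;
- it allows researchers to monitor how the strength of evidence changes;
- it is unaffected by the sampling plan.

Bayes factors can now be computed conveniently with the open statistical software JASP, and this article uses the Bayesian t test as a demonstration. The use of Bayes factors is of considerable importance for psychology researchers. Users should, however, take care that the choice of prior distribution is reasonable and keep the data analysis process transparent and open.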
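A Bayes factor converts prior odds into posterior odds, so turning it into a posterior probability is a one-line step. The sketch below does that and attaches a verbal label using a widely cited convention adapted from Jeffreys, with cut-offs at 3, 10, 30 and 100. These labels are a convention, not thresholds taken from this article. The function names are hypothetical.

```python
def posterior_prob_h1(bf10, prior_odds=1.0):
    """Posterior P(H1 | data) from BF10 and prior odds P(H1) / P(H0)."""
    post_odds = bf10 * prior_odds
    return post_odds / (1.0 + post_odds)


def evidence_label(bf10):
    """Verbal label for a Bayes factor using conventional cut-offs 3, 10, 30, 100."""
    if bf10 == 1.0:
        return "no evidence either way"
    favoured = "H1" if bf10 > 1.0 else "H0"
    strength = bf10 if bf10 > 1.0 else 1.0 / bf10   # symmetric on the log scale
    for cutoff, label in ((3, "anecdotal"), (10, "moderate"), (30, "strong"), (100, "very strong")):
        if strength < cutoff:
            return f"{label} evidence for {favoured}"
    return f"extreme evidence for {favoured}"
```

With equal prior odds, a BF10 of 3 corresponds to a posterior probability of 0.75 for H₁. A BF10 of 0.2 (BF01 = 5) is labelled moderate evidence for H₀.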

6.

An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor

Tryon WW Lewis C 《Psychological Methods》2008,13(3):272-277

Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis and failing to reject H₀ is not evidence supportive of it. Tests of statistical equivalence are needed. This article corrects the inferential confidence interval (ICI) reduction factor introduced by W. W. Tryon (2001) and uses it to extend his discussion of statistical equivalence. This method is shown to be algebraically equivalent with D. J. Schuirmann's (1987) use of 2 one-sided t tests, a highly regarded and accepted method of testing for statistical equivalence. The ICI method provides an intuitive graphic method for inferring statistical difference as well as equivalence. Trivial difference occurs when a test of difference and a test of equivalence are both passed. Statistical indeterminacy results when both tests are failed. Hybrid confidence intervals are introduced that impose ICI limits on standard confidence intervals. These intervals are recommended as replacements for error bars because they facilitate inferences.
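Schuirmann's two one-sided tests (TOST) can be sketched as follows. For brevity, this sketch uses a large-sample normal approximation in place of the t distribution, which is a simplifying assumption. It also checks a closely related decision rule: the ordinary (1 − 2α) confidence interval for the difference must lie inside the equivalence margin (−Δ, Δ). That interval is the standard confidence interval, not Tryon's reduced ICI. The function name is hypothetical.

```python
from statistics import NormalDist


def tost_z(diff, se, margin, alpha=0.05):
    """Two one-sided z tests for equivalence within (-margin, margin).

    Returns (equivalent by TOST, equivalent by CI inclusion, (1 - 2*alpha) CI).
    """
    z = NormalDist()
    p_lower = 1.0 - z.cdf((diff + margin) / se)   # tests H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)         # tests H0: diff >= +margin
    tost_equivalent = max(p_lower, p_upper) < alpha
    crit = z.inv_cdf(1.0 - alpha)
    lo, hi = diff - crit * se, diff + crit * se   # (1 - 2*alpha) confidence interval
    ci_equivalent = -margin < lo and hi < margin
    return tost_equivalent, ci_equivalent, (lo, hi)
```

The two rules always agree. Both one-sided tests reject at level α exactly when both ends of the (1 − 2α) interval fall inside the margin.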

7.

Multiplicity, directional (type III) errors, and the null hypothesis

Shaffer JP 《Psychological Methods》2002,7(3):356-369

L. V. Jones and J. W. Tukey (2000) pointed out that the usual 2-sided, equal-tails null hypothesis test at level alpha can be reinterpreted as simultaneous tests of 2 directional inequality hypotheses, each at level alpha/2, and that the maximum probability of a Type I error is alpha/2 if the truth of the null hypothesis is considered impossible. This article points out that in multiple testing with familywise error rate controlled at alpha, the directional error rate (assuming all null hypotheses are false) is greater than alpha/2 and can be arbitrarily close to alpha. Single-step, step-down, and step-up procedures are analyzed, and other error rates, including the false discovery rate, are discussed. Implications for confidence interval estimation and hypothesis testing practices are considered.

8.

Should Significance Tests Be Banned?

Patrick E. Shrout 《Psychological science》1997,8(1):1-2

Significance testing of null hypotheses is the standard epistemological method for advancing scientific knowledge in psychology, even though it has drawbacks and leads to common inferential mistakes. These mistakes include accepting the null hypothesis when it fails to be rejected, automatically interpreting rejected null hypotheses as theoretically meaningful, and failing to consider the likelihood of Type II errors. Although these mistakes have been discussed repeatedly for decades, there is no evidence that the academic discussion has had an impact. A group of methodologists is proposing a new approach: simply ban significance tests in psychology journals. The impact of a similar ban in public-health and epidemiology journals is reported.

9.

Evaluating expectations about negative emotional states of aggressive boys using Bayesian model selection

van de Schoot R Hoijtink H Mulder J Van Aken MA de Castro BO Meeus W Romeijn JW 《Developmental psychology》2011,47(1):203-212

Researchers often have expectations about research outcomes in the form of inequality constraints between, e.g., group means. Consider the example of researchers who investigated the effects of inducing a negative emotional state in aggressive boys. It was expected that highly aggressive boys would, on average, score higher on aggressive responses toward other peers than moderately aggressive boys, who would in turn score higher than nonaggressive boys. In most cases, null hypothesis testing is used to evaluate such hypotheses. We show, however, that hypotheses formulated using inequality constraints between the group means are generally not evaluated properly. The wrong hypothesis is tested, i.e., the null hypothesis that the group means are equal. In this article, we propose an innovative solution to these issues using Bayesian model selection, which we illustrate using a case study.
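An ordering such as μ1 > μ2 > μ3 (highly > moderately > nonaggressive) can be evaluated through the encompassing-prior route. The sketch below assumes independent normal approximations to the posterior of each group mean. It also assumes an exchangeable prior, under which every ordering of k means is equally likely, so the complexity is 1/k!. The fit is estimated by Monte Carlo. The function name and defaults are hypothetical, and this is not necessarily the exact procedure used in the case study.

```python
import math
import random


def ordered_means_bf(means, ses, n_draws=20000, seed=0):
    """BF of H1: mu_1 > mu_2 > ... > mu_k against the unconstrained hypothesis.

    Returns (BF_1u, fit, complexity).
    """
    rng = random.Random(seed)
    k = len(means)
    ordered = 0
    for _ in range(n_draws):
        # One approximate posterior draw per group mean.
        draw = [rng.gauss(m, s) for m, s in zip(means, ses)]
        if all(draw[i] > draw[i + 1] for i in range(k - 1)):
            ordered += 1
    fit = ordered / n_draws
    complexity = 1.0 / math.factorial(k)  # exchangeable prior: all k! orderings equally likely
    return fit / complexity, fit, complexity
```

Because the fit can be at most 1, the Bayes factor for a complete ordering of three means is capped at 3! = 6. Data that clearly follow the predicted order approach that cap.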

10.

Properties of hypothesis testing techniques and (Bayesian) model selection for exploration‐based and theory‐based (order‐restricted) hypotheses

Rebecca M. Kuiper Tim Nederhoff Irene Klugkist 《The British journal of mathematical and statistical psychology》2015,68(2):220-245

In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration‐based set of hypotheses containing equality constraints on the means, or a theory‐based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory‐based hypotheses) has advantages over exploration (i.e., examining all possible equality‐constrained hypotheses). Furthermore, examining reasonable order‐restricted hypotheses has more power to detect the true effect/non‐null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory‐based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number).

11.

Answering two criticisms of hypothesis testing: a comment

Serlin RC 《Psychological reports》2000,87(2):579-581

In a recent article, Leventhal (1999) responds to two criticisms of hypothesis testing. He shows that the one-tailed test and the directional two-tailed test are valid even if all point null hypotheses are false, and that hypothesis tests can provide the probability that decisions based on the tests are correct. Unfortunately, the falseness of all point null hypotheses affects the operating characteristics of the directional two-tailed test, which seems to weaken some of Leventhal's arguments in favor of this procedure.

12.

How to quantify the evidence for the absence of a correlation

Eric-Jan Wagenmakers Josine Verhagen Alexander Ly 《Behavior research methods》2016,48(2):413-426

We present a suite of Bayes factor hypothesis tests that allow researchers to grade the decisiveness of the evidence that the data provide for the presence versus the absence of a correlation between two variables. For concreteness, we apply our methods to the recent work of Donnellan et al. (in press) who conducted nine replication studies with over 3,000 participants and failed to replicate the phenomenon that lonely people compensate for a lack of social warmth by taking warmer baths or showers. We show how the Bayes factor hypothesis test can quantify evidence in favor of the null hypothesis, and how the prior specification for the correlation coefficient can be used to define a broad range of tests that address complementary questions. Specifically, we show how the prior specification can be adjusted to create a two-sided test, a one-sided test, a sensitivity analysis, and a replication test.
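The following is a numerical sketch of a Bayes factor for a correlation. It assumes Jeffreys's approximate likelihood ratio, (1 − ρ²)^((n−1)/2) / (1 − ρr)^(n−3/2), together with a uniform prior on ρ. The prior covers (−1, 1) for the two-sided test or (0, 1) for the one-sided test. Changing the prior's support is the kind of adjustment the abstract describes. The paper's own analysis may use a different likelihood or prior family, so treat this as an approximation. The function name is hypothetical.

```python
import math


def corr_bf10(r, n, one_sided=False, steps=20000):
    """Approximate BF10 for H1: rho != 0 (or rho > 0) against H0: rho = 0.

    Uses Jeffreys's approximate likelihood ratio and a uniform prior on rho.
    """
    lo = 0.0 if one_sided else -1.0
    prior_density = 1.0 / (1.0 - lo)      # uniform density on (lo, 1)
    h = (1.0 - lo) / steps
    total = 0.0
    for i in range(steps):
        rho = lo + (i + 0.5) * h          # midpoint rule avoids the endpoints rho = +/-1
        log_ratio = 0.5 * (n - 1) * math.log(1.0 - rho * rho) - (n - 1.5) * math.log(1.0 - rho * r)
        total += math.exp(log_ratio)      # likelihood ratio p(data | rho) / p(data | rho = 0)
    return prior_density * total * h
```

A one-sided prior moves all prior mass to positive correlations. For a positive observed r, this raises BF10 relative to the two-sided test.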

13.

All for one or some for all? Evaluating informative hypotheses using multiple N = 1 studies

Fayette Klaassen Claire M. Zedelius Harm Veling Henk Aarts Herbert Hoijtink 《Behavior research methods》2018,50(6):2276-2291

Analyses are mostly executed at the population level, whereas in many applications the interest is on the individual level instead of the population level. In this paper, multiple N = 1 experiments are considered, where participants perform multiple trials with a dichotomous outcome in various conditions. Expectations with respect to the performance of participants can be translated into so-called informative hypotheses. These hypotheses can be evaluated for each participant separately using Bayes factors. A Bayes factor expresses the relative evidence for two hypotheses based on the data of one individual. This paper proposes to “average” these individual Bayes factors in the gP-BF, the average relative evidence. The gP-BF can be used to determine whether one hypothesis is preferred over another for all individuals under investigation. This measure provides insight into whether the relative preference of a hypothesis from a pre-defined set is homogeneous over individuals. Two additional measures are proposed to support the interpretation of the gP-BF: the evidence rate (ER), the proportion of individual Bayes factors that support the same hypothesis as the gP-BF, and the stability rate (SR), the proportion of individual Bayes factors that express a stronger support than the gP-BF. These three statistics can be used to determine the relative support in the data for the informative hypotheses entertained. Software is available that can be used to execute the approach proposed in this paper and to determine the sensitivity of the outcomes with respect to the number of participants and within condition replications.
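The three summaries can be computed from a list of individual Bayes factors. This sketch rests on two readings of the abstract:
- the gP-BF is the geometric mean of the individual Bayes factors;
- "stronger support" means further from 1 than the gP-BF, in the same direction.

The function name is hypothetical.

```python
import math


def gpbf_summary(individual_bfs):
    """Return (gP-BF, evidence rate ER, stability rate SR) for a list of BFs."""
    n = len(individual_bfs)
    # Geometric mean, computed on the log scale for numerical stability.
    gpbf = math.exp(sum(math.log(b) for b in individual_bfs) / n)
    if gpbf >= 1.0:
        same_direction = [b > 1.0 for b in individual_bfs]   # individuals favouring the same hypothesis
        stronger = [b > gpbf for b in individual_bfs]        # support stronger than the gP-BF
    else:
        same_direction = [b < 1.0 for b in individual_bfs]
        stronger = [b < gpbf for b in individual_bfs]
    return gpbf, sum(same_direction) / n, sum(stronger) / n
```

For individual Bayes factors 2, 8, 0.5 and 4, the geometric mean is 32^(1/4) ≈ 2.38. Three of the four favour the same hypothesis (ER = 0.75), and two exceed 2.38 (SR = 0.5).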

14.

Testing equivalence with repeated measures: tests of the difference model of two-alternative forced-choice performance

García-Pérez MA Alcalá-Quintana R 《The Spanish journal of psychology》2011,14(2):1023-1049

Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when the regression is forced through the origin). This paper shows that this approach yields highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment with which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions drawn through inadequate application of regression analyses.

15.

Statistical requirements for properly investigating a null hypothesis

Schumm WR 《Psychological reports》2010,107(3):953-971

Issues involved in the evaluation of null hypotheses are discussed. The use of equivalence testing is recommended as a possible alternative to the use of simple t or F tests for evaluating a null hypothesis. When statistical power is low and larger sample sizes are not available or practical, consideration should be given to using one-tailed tests or less conservative levels for determining criterion levels of statistical significance. Effect sizes should always be reported along with significance levels, as both are needed to understand results of research. Probabilities alone are not enough and are especially problematic for very large or very small samples. Pre-existing group differences should be tested and properly accounted for when comparing independent groups on dependent variables. If confirmation of a null hypothesis is expected, potential suppressor variables should be considered. If different methods are used to select the samples to be compared, controls for social desirability bias should be implemented. When researchers deviate from these standards or appear to assume that such standards are unimportant or irrelevant, their results should be deemed less credible than when such standards are maintained and followed. Several examples of recent violations of such standards in family social science, comparing gay, lesbian, bisexual, and transgender families with heterosexual families, are provided. Regardless of their political values or expectations, researchers should strive to test null hypotheses rigorously, in accordance with the best professional standards.
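Cohen's d for two independent groups, using a pooled standard deviation, is one example of an effect size reported next to a significance test. It can be computed as below. This is a standard formula, not one prescribed by the article, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled (n - 1 weighted) variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```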

16.

Approximated adjusted fractional Bayes factors: A general method for testing informative hypotheses

Xin Gu Joris Mulder Herbert Hoijtink 《The British journal of mathematical and statistical psychology》2018,71(2):229-261

Informative hypotheses are increasingly being used in psychological sciences because they adequately capture researchers’ theories and expectations. In the Bayesian framework, the evaluation of informative hypotheses often makes use of default Bayes factors such as the fractional Bayes factor. This paper approximates and adjusts the fractional Bayes factor such that it can be used to evaluate informative hypotheses in general statistical models. In the fractional Bayes factor a fraction parameter must be specified which controls the amount of information in the data used for specifying an implicit prior. The remaining fraction is used for testing the informative hypotheses. We discuss different choices of this parameter and present a scheme for setting it. Furthermore, a software package is described which computes the approximated adjusted fractional Bayes factor. Using this software package, psychological researchers can evaluate informative hypotheses by means of Bayes factors in an easy manner. Two empirical examples are used to illustrate the procedure.
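For a single parameter with a normal approximation to the posterior, the idea can be sketched as follows. The sketch assumes:
- the posterior is N(θ̂, σ²);
- the implicit prior is N(θ₀, σ²/b), centred on the constraint boundary θ₀ and built from fraction b of the information.

The equality Bayes factor is then a ratio of densities at θ₀, and the inequality Bayes factor a ratio of tail probabilities. This is a simplified reading, not the software package described in the paper, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def aafbf_single(estimate, se, b, boundary=0.0):
    """Approximate adjusted fractional BFs for one parameter theta.

    Returns (BF for theta == boundary vs unconstrained,
             BF for theta > boundary vs unconstrained).
    """
    posterior = NormalDist(estimate, se)              # normal approximation to the posterior
    prior = NormalDist(boundary, se / math.sqrt(b))   # implicit prior from fraction b, centred on the boundary
    bf_equal = posterior.pdf(boundary) / prior.pdf(boundary)
    bf_greater = (1.0 - posterior.cdf(boundary)) / (1.0 - prior.cdf(boundary))
    return bf_equal, bf_greater
```

When the estimate sits exactly on the boundary, the equality Bayes factor reduces to 1/√b. In this sketch, the choice of fraction therefore sets the largest support a point hypothesis can receive.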

17.

Bayesian inference for psychology, part IV: parameter estimation and Bayes factors

Jeffrey N. Rouder Julia M. Haaf Joachim Vandekerckhove 《Psychonomic bulletin & review》2018,25(1):102-113

In the psychological literature, there are two seemingly different approaches to inference: that from estimation of posterior intervals and that from Bayes factors. We provide an overview of each method and show that a salient difference is the choice of models. The two approaches as commonly practiced can be unified with a certain model specification, now popular in the statistics literature, called spike-and-slab priors. A spike-and-slab prior is a mixture of a null model, the spike, with an effect model, the slab. The estimate of the effect size here is a function of the Bayes factor, showing that estimation and model comparison can be unified. The salient difference is that common Bayes factor approaches provide for privileged consideration of theoretically useful parameter values, such as the value corresponding to the null hypothesis, while estimation approaches do not. Both approaches, either privileging the null or not, are useful depending on the goals of the analyst.
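Here is a conjugate sketch of the spike-and-slab idea. It assumes normal data with known σ, a spike at δ = 0, and a slab δ ~ N(0, τ²). The model-averaged estimate is the slab's posterior probability times the slab's posterior mean. That probability is itself a function of the Bayes factor and the prior odds, which shows how estimation and model comparison combine. Names and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def spike_and_slab(xbar, n, sigma=1.0, tau=1.0, prior_slab=0.5):
    """Return (BF10, posterior P(slab), model-averaged estimate of delta)."""
    se = sigma / math.sqrt(n)                                       # standard error of the mean
    marg_slab = NormalDist(0.0, math.sqrt(tau**2 + se**2)).pdf(xbar)  # xbar's marginal under the slab
    marg_spike = NormalDist(0.0, se).pdf(xbar)                      # xbar's density under the spike
    bf10 = marg_slab / marg_spike
    post_odds = bf10 * prior_slab / (1.0 - prior_slab)
    p_slab = post_odds / (1.0 + post_odds)
    slab_mean = tau**2 / (tau**2 + se**2) * xbar                    # posterior mean of delta under the slab
    return bf10, p_slab, p_slab * slab_mean                         # the spike contributes an estimate of 0
```

When the Bayes factor favours the spike, the estimate is pulled toward 0. When it strongly favours the slab, the estimate approaches the ordinary shrunken posterior mean.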

18.

Hail the impossible: p-values, evidence, and likelihood

Johansson T 《Scandinavian journal of psychology》2011,52(2):113-125

Significance testing based on p-values is standard in psychological research and teaching. Typically, research articles and textbooks present and use p as a measure of statistical evidence against the null hypothesis (the Fisherian interpretation), although using concepts and tools based on a completely different usage of p as a tool for controlling long-term decision errors (the Neyman-Pearson interpretation). There are four major problems with using p as a measure of evidence and these problems are often overlooked in the domain of psychology. First, p is uniformly distributed under the null hypothesis and can therefore never indicate evidence for the null. Second, p is conditioned solely on the null hypothesis and is therefore unsuited to quantify evidence, because evidence is always relative in the sense of being evidence for or against a hypothesis relative to another hypothesis. Third, p designates probability of obtaining evidence (given the null), rather than strength of evidence. Fourth, p depends on unobserved data and subjective intentions and therefore implies, given the evidential interpretation, that the evidential strength of observed data depends on things that did not happen and subjective intentions. In sum, using p in the Fisherian sense as a measure of statistical evidence is deeply problematic, both statistically and conceptually, while the Neyman-Pearson interpretation is not about evidence at all. In contrast, the likelihood ratio escapes the above problems and is recommended as a tool for psychologists to represent the statistical evidence conveyed by obtained data relative to two hypotheses.
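A standard textbook illustration makes the fourth problem concrete. Take 9 heads and 3 tails, and analyse them twice: once as if n = 12 had been fixed in advance, and once as if sampling had continued until the third tail. The one-sided p-values differ, but the likelihood ratio for any two values of θ does not, because the sampling-plan constants cancel. The alternative θ₁ = 0.75 is an arbitrary choice, and the function names are hypothetical.

```python
from math import comb


def binomial_p(heads, n, theta=0.5):
    """One-sided p with n fixed in advance: P(K >= heads | theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(heads, n + 1))


def neg_binomial_p(heads, tails, theta=0.5):
    """One-sided p when sampling until `tails` tails: P(N >= heads + tails | theta).

    N >= heads + tails means fewer than `tails` tails in the first heads + tails - 1 trials.
    """
    m = heads + tails - 1
    return sum(comb(m, j) * (1 - theta)**j * theta**(m - j) for j in range(tails))


def likelihood_ratio(heads, tails, theta1, theta0, plan_constant):
    """LR of theta1 vs theta0; plan_constant is the sampling plan's combinatorial factor."""
    lik1 = plan_constant * theta1**heads * (1 - theta1)**tails
    lik0 = plan_constant * theta0**heads * (1 - theta0)**tails
    return lik1 / lik0


heads, tails = 9, 3
lr_fixed_n = likelihood_ratio(heads, tails, 0.75, 0.5, comb(12, 9))      # binomial plan
lr_stop_at_tail = likelihood_ratio(heads, tails, 0.75, 0.5, comb(11, 2))  # negative binomial plan
```

The fixed-n p-value is 299/4096 ≈ 0.073, and the stop-at-third-tail p-value is 67/2048 ≈ 0.033. The same data therefore fall on different sides of 0.05, while the two likelihood ratios are identical.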

19.

Hypothesis testing for coefficient alpha: An SEM approach

Alberto Maydeu-Olivares Donna L. Coffman Carlos García-Forero David Gallardo-Pujol 《Behavior research methods》2010,42(2):618-625

We show how to test hypotheses for coefficient alpha in three different situations: (1) hypothesis tests of whether coefficient alpha equals a prespecified value, (2) hypothesis tests involving two statistically independent sample alphas as may arise when testing the equality of coefficient alpha across groups, and (3) hypothesis tests involving two statistically dependent sample alphas as may arise when testing the equality of alpha across time or when testing the equality of alpha for two test scores within the same sample. We illustrate how these hypotheses may be tested in a structural equation-modeling framework under the assumption of normally distributed responses and also under asymptotically distribution free assumptions. The formulas for the hypothesis tests and computer code are given for four different applied examples. Supplemental materials for this article may be downloaded from http://brm.psychonomic-journals.org/content/supplemental.
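All three hypothesis tests start from the sample coefficient alpha. A minimal sketch of that statistic follows. Alpha is k/(k − 1) times one minus the ratio of the summed item variances to the variance of the total scores. The sketch does not implement the SEM-based tests or the computer code supplied with the article. The input layout (one list of scores per item, with respondents in the same order) is an assumption.

```python
from statistics import variance


def cronbach_alpha(items):
    """Sample coefficient alpha; `items` holds one list of scores per item."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]          # each respondent's total score
    sum_item_vars = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - sum_item_vars / variance(totals))
```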

20.

Use them or lose them: Are manipulatives needed to assess numeracy and geometry performance in preschool?

Connor D. O'Rear Erica L. Zippert Patrick Ehrman Lauren Westerberg Christopher J. Lonigan David J. Purpura 《Infant and child development》2023,32(5):e2444

In two studies, we investigated whether using three-dimensional (3D) manipulatives during assessment aided performance on a variety of preschool mathematics tasks compared to pictorial representations. On measures of children's understanding of counting and cardinality (n = 103), there was no difference in performance between manipulatives and pictures, with Bayes factors suggesting moderate evidence in favor of the null hypothesis. On a measure of children's shape identification (n = 93), there was no difference in performance between objects and pictures, with Bayes factors suggesting moderate evidence in favor of the null hypothesis. These results suggest flexibility in the materials that can be used during assessment. Pictures, or 2D renderings of 3D objects, which can be easily printed and reproduced, may be sufficient for assessing counting and shape knowledge without the need for more cumbersome concrete manipulatives.