Similar Documents
20 similar documents found.
1.
In a recent article, Leventhal (1999) responds to two criticisms of hypothesis testing by showing that the one-tailed test and the directional two-tailed test are valid even if all point null hypotheses are false, and that hypothesis tests can provide the probability that decisions based on the tests are correct. Unfortunately, the falseness of all point null hypotheses affects the operating characteristics of the directional two-tailed test, which appears to weaken certain of Leventhal's arguments in favor of that procedure.

2.
L. V. Jones and J. W. Tukey (2000) pointed out that the usual 2-sided, equal-tails null hypothesis test at level alpha can be reinterpreted as simultaneous tests of 2 directional inequality hypotheses, each at level alpha/2, and that the maximum probability of a Type I error is alpha/2 if the truth of the null hypothesis is considered impossible. This article points out that in multiple testing with familywise error rate controlled at alpha, the directional error rate (assuming all null hypotheses are false) is greater than alpha/2 and can be arbitrarily close to alpha. Single-step, step-down, and step-up procedures are analyzed, and other error rates, including the false discovery rate, are discussed. Implications for confidence interval estimation and hypothesis testing practices are considered.
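A minimal sketch (an assumed illustration, not code from the article) of the Jones-Tukey reading of the two-sided test as a three-decision procedure: reject in one direction, reject in the other, or reach no decision, with each directional error rate capped at alpha/2 when the point null is never exactly true. The function name and example inputs are illustrative.

```python
# Sketch: the Jones-Tukey three-decision reading of a two-sided z test.
from scipy.stats import norm

def three_decision_test(xbar, se, alpha=0.05):
    """Return 'mu > 0', 'mu < 0', or 'no decision' when testing mu against 0."""
    z = xbar / se
    crit = norm.ppf(1 - alpha / 2)        # usual two-sided critical value
    if z >= crit:
        return "conclude mu > 0"          # directional error rate at most alpha/2
    if z <= -crit:
        return "conclude mu < 0"
    return "no decision (sign of mu not determined)"

print(three_decision_test(xbar=0.8, se=0.3))   # z = 0.8/0.3 ~ 2.67 -> conclude mu > 0
```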

3.
Null hypothesis significance testing (NHST) is the most widely accepted and frequently used approach to statistical inference in quantitative communication research. NHST, however, is highly controversial, and several serious problems with the approach have been identified. This paper reviews NHST and the controversy surrounding it. Commonly recognized problems include sensitivity to sample size, the fact that the null hypothesis is usually literally false, unacceptable Type II error rates, and widespread misunderstanding and abuse. Problems associated with the conditional nature of NHST and the failure to distinguish statistical hypotheses from substantive hypotheses are emphasized. Recommended solutions and alternatives are addressed in a companion article.

4.
In his classic article on the fallacy of the null hypothesis in soft psychology [J. Consult. Clin. Psychol. 46 (1978)], Paul Meehl claimed that, in nonexperimental settings, the probability of rejecting the null hypothesis of nil group differences in favor of a directional alternative was 0.50—a value that is an order of magnitude higher than the customary Type I error rate. In a series of real data simulations, using Minnesota Multiphasic Personality Inventory-Revised (MMPI-2) data collected from more than 80,000 individuals, I found strong support for Meehl's claim.
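A toy simulation (an assumed illustration, not the MMPI-2 analysis described above) of the logic behind Meehl's figure: when true group differences are never exactly zero and the predicted direction is unrelated to the true direction, the directional rejection rate rises far above the nominal Type I error rate and tends toward .50 as samples grow.

```python
# Toy simulation of Meehl's 0.50 claim (illustrative only; not the MMPI-2 study).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n, reps, alpha = 5000, 2000, 0.05
hits = 0
for _ in range(reps):
    delta = rng.normal(0, 0.05)              # tiny but nonzero true group difference
    x = rng.normal(0, 1, n)
    y = rng.normal(delta, 1, n)
    t, p = ttest_ind(y, x)
    # Directional alternative chosen "blindly": predict y > x.
    if p / 2 < alpha and t > 0:              # one-tailed rejection in the predicted direction
        hits += 1
# Well above the nominal .05; tends toward .50 as n grows.
print("Directional rejection rate:", hits / reps)
```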

5.
We propose a simple modification of Hochberg's step-up Bonferroni procedure for multiple tests of significance. The proposed procedure is always more powerful than Hochberg's procedure for more than two tests, and is more powerful than Hommel's procedure for three and four tests. A numerical analysis of the new procedure indicates that its Type I error is controlled under independence of the test statistics, at a level equal to or just below the nominal Type I error. Examination of various non-null configurations of hypotheses shows that the modified procedure has a power advantage over Hochberg's procedure that increases with the number of false hypotheses.
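For context, a minimal sketch of the standard Hochberg step-up procedure that the proposed modification builds on; the modification itself is not specified in the abstract, so only the baseline procedure is shown, and the function name is illustrative.

```python
# Standard Hochberg step-up procedure (baseline, not the proposed modification).
import numpy as np

def hochberg(pvals, alpha=0.05):
    """Return a boolean rejection vector for Hochberg's step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices of ascending p-values
    sorted_p = p[order]
    # Find the largest i (1-based) with p_(i) <= alpha / (m - i + 1); reject H_(1..i).
    thresholds = alpha / (m - np.arange(1, m + 1) + 1)
    candidates = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if candidates.size:
        k = candidates.max()                  # 0-based index of that largest i
        reject[order[: k + 1]] = True
    return reject

print(hochberg([0.010, 0.015, 0.022, 0.30]))  # rejects the three smallest p-values
```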

6.
According to Bayesians, the null hypothesis significance-testing procedure is not deductively valid because it involves the retention or rejection of the null hypothesis under conditions where the posterior probability of that hypothesis is not known. Other criticisms are that this procedure is pointless and encourages imprecise hypotheses. However, according to non-Bayesians, there is no way of assigning a prior probability to the null hypothesis, and so Bayesian statistics do not work either. Consequently, no procedure has been accepted by both groups as providing a compelling reason to accept or reject hypotheses. The author aims to provide such a method. In the process, the author distinguishes between probability and epistemic estimation and argues that, although both are important in a science that is not completely deterministic, epistemic estimation is most relevant for hypothesis testing. Based on this analysis, the author proposes that hypotheses be evaluated via epistemic ratios and explores the implications of this proposal. One implication is that it is possible to encourage precise theorizing by imposing a penalty for imprecise hypotheses.

7.
Consider two independent groups with K measures for each subject. For the jth group and kth measure, let μtjk be the population trimmed mean, j = 1, 2; k = 1, ..., K. This article compares several methods for testing H0: μt1k = μt2k, k = 1, ..., K, such that the probability of at least one Type I error is α, and simultaneous probability coverage is 1 − α when computing confidence intervals for μt1k − μt2k. The emphasis is on K = 4 and α = .05. For zero trimming the problem reduces to comparing means, but it is well known that when comparing means, arbitrarily small departures from normality can result in extremely low power relative to using, say, 20% trimming. Moreover, when skewed distributions are being compared, conventional methods for comparing means can be biased, for reasons reviewed in the article. A consequence is that in some realistic situations the probability of rejecting can be higher when the null hypothesis is true than when the means differ by half a standard deviation. Switching to robust measures of location is known to reduce this problem, and combining robust measures of location with some type of bootstrap method reduces it even more. Published articles suggest that for the problem at hand the percentile t bootstrap, combined with a 20% trimmed mean, will perform relatively well, but there are known situations where it does not eliminate all problems. In this article we consider an extension of the percentile bootstrap approach that is found to give better results.
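A hedged sketch of the baseline approach the article extends: a percentile bootstrap confidence interval for the difference between two 20% trimmed means on a single measure. The article's actual extension, and its simultaneous coverage over K = 4 measures, are not reproduced; names and data are illustrative.

```python
# Percentile bootstrap CI for the difference of 20% trimmed means (one measure).
import numpy as np
from scipy.stats import trim_mean

def trimmed_diff_ci(x, y, trim=0.20, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)   # resample each group separately
        yb = rng.choice(y, size=len(y), replace=True)
        diffs[b] = trim_mean(xb, trim) - trim_mean(yb, trim)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

rng = np.random.default_rng(1)
x = rng.lognormal(size=40)          # skewed data, group 1
y = rng.lognormal(size=40) + 0.3    # group 2 shifted by 0.3
print(trimmed_diff_ci(x, y))        # reject equal trimmed means if 0 falls outside the CI
```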

8.
There have been many discussions of how Type I errors should be controlled when many hypotheses are tested (e.g., all possible comparisons of means, correlations, proportions, the coefficients in hierarchical models, etc.). By and large, researchers have adopted familywise (FWER) control, though this practice certainly is not universal. Familywise control is intended to deal with the multiplicity issue of computing many tests of significance, yet such control is conservative--that is, less powerful--compared to per test/hypothesis control. The purpose of our article is to introduce the readership, particularly those readers familiar with issues related to controlling Type I errors when many tests of significance are computed, to newer methods that provide protection from the effects of multiple testing, yet are more powerful than familywise controlling methods. Specifically, we introduce a number of procedures that control the k-FWER. These methods--say, 2-FWER instead of 1-FWER (i.e., FWER)--are equivalent to specifying that the probability of 2 or more false rejections is controlled at .05, whereas FWER controls the probability of any (i.e., 1 or more) false rejections at .05. 2-FWER implicitly tolerates 1 false rejection and makes no explicit attempt to control the probability of its occurrence, unlike FWER, which tolerates no false rejections at all. More generally, k-FWER tolerates k - 1 false rejections, but controls the probability of k or more false rejections at α = .05. We demonstrate with two published data sets how more hypotheses can be rejected with k-FWER methods compared to FWER control.
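A minimal sketch, assuming the well-known single-step generalized Bonferroni rule of Lehmann and Romano rather than necessarily the specific k-FWER procedures the article introduces: reject H_i whenever p_i <= k*alpha/m, which controls the probability of k or more false rejections at alpha.

```python
# Single-step k-FWER control via the generalized Bonferroni rule (Lehmann & Romano):
# reject H_i if p_i <= k * alpha / m. Illustration only.
import numpy as np

def k_fwer_single_step(pvals, k=2, alpha=0.05):
    p = np.asarray(pvals, dtype=float)
    m = p.size
    return p <= k * alpha / m              # boolean rejection vector

pvals = [0.001, 0.004, 0.012, 0.020, 0.030, 0.200]
print("1-FWER (Bonferroni):", np.sum(k_fwer_single_step(pvals, k=1)), "rejections")
print("2-FWER:             ", np.sum(k_fwer_single_step(pvals, k=2)), "rejections")
```

With these illustrative p-values, the 2-FWER rule rejects one more hypothesis than ordinary Bonferroni, mirroring the article's point that tolerating k - 1 false rejections buys power.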

9.
Chow, S. L. (1998). Behavioral and Brain Sciences, 21(2), 169-194; discussion 194-239.
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.

10.
11.
A study is reported testing two hypotheses about a close parallel relation between indicative conditionals, if A then B, and conditional bets, I bet you that if A then B. The first is that both the indicative conditional and the conditional bet are related to the conditional probability, P(B|A). The second is that de Finetti's three-valued truth table has psychological reality for both types of conditional—true, false, or void for indicative conditionals and win, lose, or void for conditional bets. The participants were presented with an array of chips in two different colours and two different shapes, and an indicative conditional or a conditional bet about a random chip. They had to make judgements in two conditions: either about the chances of making the indicative conditional true or false or about the chances of winning or losing the conditional bet. The observed distributions of responses in the two conditions were generally related to the conditional probability, supporting the first hypothesis. In addition, a majority of participants in further conditions chose the third option, "void", when the antecedent of the conditional was false, supporting the second hypothesis.
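A tiny illustrative sketch (assumed, not from the study) of de Finetti's three-valued table as used here: a conditional with a false antecedent is void (the bet is off); otherwise it is true (won) or false (lost) according to the consequent.

```python
# De Finetti's three-valued table for "if A then B" and the matching conditional bet.
def de_finetti(a, b):
    """Status of the conditional / conditional bet given truth values of A and B."""
    if not a:
        return "void"                        # antecedent false: no truth value, bet is off
    return "true (win)" if b else "false (lose)"

for a in (True, False):
    for b in (True, False):
        print(f"A={a!s:5} B={b!s:5} ->", de_finetti(a, b))
```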

12.
Statistical tests of the primality of some numbers look similar to statistical tests of many nonmathematical, clearly empirical propositions. Yet interpretations of probability prima facie appear to preclude the possibility of statistical tests of mathematical propositions. For example, it is hard to understand how the statement that n is prime could have a frequentist probability other than 0 or 1. On the other hand, subjectivist approaches appear to be saddled with 'coherence' constraints on rational probabilities that require rational agents to assign extremal probabilities to logical and mathematical propositions. In the light of these problems, many philosophers have come to think that there must be some way to generalize a Bayesian statistical account. In this article I propose that a classical frequentist approach should be reconsidered. I conclude that we can give a conditional justification of statistical testing of at least some mathematical hypotheses: if statistical tests provide us with reasons to believe or bet on empirical hypotheses in the standard situations, then they also provide us with reasons to believe or bet on mathematical hypotheses in the structurally similar mathematical cases.
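For concreteness, a hedged sketch of the kind of procedure at issue: a Fermat-style probabilistic primality test, in which each passed round is statistical evidence that n is prime. The specific test and numbers are illustrative, not taken from the article.

```python
# A Fermat-style probabilistic "statistical test" of primality (illustration only).
import random

def fermat_test(n, trials=20, seed=0):
    """Return False if n is certainly composite, True if n passes every round."""
    rng = random.Random(seed)
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = rng.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:            # Fermat witness: n is composite for certain
            return False
    return True                               # "probably prime" (Carmichael numbers aside)

print(fermat_test(104729))   # a known prime     -> True
print(fermat_test(104730))   # an even composite -> False
```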

13.
This article attempts to summarize a few criteria of progress in philosophy—clarifying problems; rejecting false theories; opening new perspectives in familiar fields; inventing new arguments or thought experiments; and so on—and to apply them to contemporary philosophy of mind. As a result, the article concludes that while some progress was obvious in the past fifty years, there is much work yet to be done. It then tries to outline a transformation of conceptual analysis needed for further developments in this field. The author argues that conceptual analysis might be revived if it is treated as a clarification of the relations among our natural beliefs.  相似文献   

14.
15.
The latent Markov (LM) model is a popular method for identifying distinct unobserved states and transitions between these states over time in longitudinally observed responses. The bootstrap likelihood-ratio (BLR) test yields the most rigorous test for determining the number of latent states, yet little is known about power analysis for this test. Power could be computed as the proportion of the bootstrap p values (PBP) for which the null hypothesis is rejected. This requires performing the full bootstrap procedure for a large number of samples generated from the model under the alternative hypothesis, which is computationally infeasible in most situations. This article presents a computationally feasible shortcut method for power computation for the BLR test. The shortcut method involves the following simple steps: (1) obtaining the parameters of the model under the null hypothesis, (2) constructing the empirical distributions of the likelihood ratio under the null and alternative hypotheses via Monte Carlo simulations, and (3) using these empirical distributions to compute the power. We evaluate the performance of the shortcut method by comparing it to the PBP method and, moreover, show how the shortcut method can be used for sample-size determination.
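A hedged, generic sketch of steps (2) and (3) of the shortcut: build empirical likelihood-ratio distributions under the null and alternative by Monte Carlo, then take power as the proportion of alternative-hypothesis LR values exceeding the (1 - α) quantile of the null distribution. Fitting latent Markov models is beyond this sketch, so a simple nested normal-mean comparison stands in for the LM model pair.

```python
# Generic Monte Carlo power computation in the spirit of steps (2)-(3):
# empirical LR distributions under H0 and H1, power from the null quantile.
# A nested normal-mean test stands in for the latent Markov model pair.
import numpy as np

def lr_stat(data):
    """-2 log likelihood ratio for H0: mu = 0 vs H1: mu free (sigma known = 1)."""
    n = data.size
    return n * data.mean() ** 2

rng = np.random.default_rng(0)
n, reps, alpha, true_mu = 100, 5000, 0.05, 0.3

lr_null = np.array([lr_stat(rng.normal(0.0, 1.0, n)) for _ in range(reps)])
lr_alt  = np.array([lr_stat(rng.normal(true_mu, 1.0, n)) for _ in range(reps)])

crit = np.quantile(lr_null, 1 - alpha)          # empirical critical value under H0
power = np.mean(lr_alt > crit)                  # proportion of H1 statistics exceeding it
print(f"critical value ~ {crit:.2f}, power ~ {power:.2f}")   # roughly 3.84 and 0.85 here
```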

16.
The author compared simulations of the “true” null hypothesis (z) test, in which σ was known and fixed, with the t test, in which s, an estimate of σ, was calculated from the sample, because the t test was used to emulate the “true” test. The true null hypothesis test bears exclusively on calculating the probability that a sample distance (mean) is larger than a specified value. The results showed that the value of t was sensitive to sampling fluctuations in both distance and standard error. Large values of t reflect small standard errors when n is small. The value of t achieves sensitivity primarily to distance only when the sample sizes are large. One cannot make a definitive statement about the probability or “significance” of a distance solely on the basis of the value of t.
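A hedged simulation sketch (assumed, not the author's original simulations) of the contrast described: with σ known, z varies only with the sample mean, whereas for small n the t statistic also tracks chance fluctuations in the estimated standard error s.

```python
# Illustrative simulation: t is sensitive to fluctuations in s, z is not.
import numpy as np

rng = np.random.default_rng(2)
sigma, reps = 1.0, 10000

for n in (5, 100):
    samples = rng.normal(0.0, sigma, size=(reps, n))
    xbar = samples.mean(axis=1)
    s = samples.std(axis=1, ddof=1)
    z = xbar / (sigma / np.sqrt(n))            # "true" test: sigma known and fixed
    t = xbar / (s / np.sqrt(n))                # t test: sigma estimated by s
    # How strongly does |t| reflect the standard-error estimate rather than the distance?
    r = np.corrcoef(np.abs(t), s)[0, 1]
    print(f"n={n:4d}  corr(|t|, s) = {r:+.2f}  sd(t) = {t.std():.2f}  sd(z) = {z.std():.2f}")
```

The negative correlation between |t| and s, and the inflated spread of t relative to z, shrink as n grows, matching the claim that t becomes sensitive primarily to distance only for large samples.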

17.
One familiar form of argument for rejecting entities of a certain kind is that, by rejecting them, we avoid certain difficult problems associated with them. Such problem-avoidance arguments backfire if the problems cited survive the elimination of the rejected entities. In particular, we examine one way problems can survive: a question for the realist about which of a set of inconsistent statements is false may give way to an equally difficult question for the eliminativist about which of a set of inconsistent statements fail to be ‘factual’. Much of the first half of the paper is devoted to explaining a notion of factuality that does not imply truth but still consists in ‘getting the world right’. The second half of the paper is a case study. Some ‘compositional nihilists’ have argued that, by rejecting composite objects (and so by denying that composition ever takes place), we avoid the notorious puzzles of coincidence, for example, the statue/lump and the ship of Theseus puzzles. Using the apparatus developed in the first half of the paper, we explore the question of whether these puzzles survive the elimination of composite objects.

18.
Researchers often have expectations about research outcomes that take the form of inequality constraints between, for example, group means. Consider the example of researchers who investigated the effects of inducing a negative emotional state in aggressive boys. It was expected that highly aggressive boys would, on average, score higher on aggressive responses toward other peers than moderately aggressive boys, who would in turn score higher than nonaggressive boys. In most cases, null hypothesis testing is used to evaluate such hypotheses. We show, however, that hypotheses formulated using inequality constraints between the group means are generally not evaluated properly: the wrong hypothesis is tested, namely the null hypothesis that the group means are equal. In this article, we propose an innovative solution to these issues using Bayesian model selection, which we illustrate using a case study.
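A hedged sketch in the spirit of encompassing-prior Bayes factors for order-constrained hypotheses (e.g., Klugkist and Hoijtink), not necessarily the authors' exact implementation: the Bayes factor of H1: mu1 > mu2 > mu3 against the unconstrained model is estimated as the posterior probability of the ordering divided by its prior probability. The data and the known-variance posterior approximation below are illustrative.

```python
# Encompassing-prior sketch for an order-constrained hypothesis
# H1: mu1 > mu2 > mu3 versus the unconstrained model (illustrative only).
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical group summaries (aggressive-response scores for three groups).
means = np.array([7.0, 5.5, 4.0])
sds   = np.array([2.0, 2.0, 2.0])
ns    = np.array([30, 30, 30])

draws = 100_000
# Approximate independent normal posteriors for the group means (known-variance shortcut).
post = rng.normal(means, sds / np.sqrt(ns), size=(draws, 3))

post_prop = np.mean((post[:, 0] > post[:, 1]) & (post[:, 1] > post[:, 2]))
prior_prop = 1 / 6                     # all 6 orderings equally likely a priori
print("BF(H1 vs unconstrained) ~", post_prop / prior_prop)   # close to its maximum of 6 here
```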

19.
Li, L., & Bentler, P. M. (2011). Psychological Methods, 16(2), 116-126.
MacCallum, Browne, and Cai (2006) proposed a new framework for evaluation and power analysis of small differences between nested structural equation models (SEMs). In their framework, the null and alternative hypotheses for testing a small difference in fit and its related power analyses were defined by some chosen root-mean-square error of approximation (RMSEA) pairs. In this article, we develop a new method that quantifies those chosen RMSEA pairs and allows a quantitative comparison of them. Our method proposes the use of single RMSEA values to replace the choice of RMSEA pairs for model comparison and power analysis, thus avoiding the differential meaning of the chosen RMSEA pairs inherent in the approach of MacCallum et al. (2006). With this choice, the conventional cutoff values in model overall evaluation can directly be transferred and applied to the evaluation and power analysis of model differences.
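A hedged sketch of the noncentral chi-square power computation underlying the MacCallum-Browne-Cai framework (the single-RMSEA refinement proposed in this article is not reproduced): noncentrality parameters are derived from the RMSEA pairs and the models' degrees of freedom, and power is the tail probability of the alternative distribution beyond the null critical value. The sample size, degrees of freedom, and RMSEA values are illustrative.

```python
# Noncentral chi-square power for a test of small difference between nested SEMs,
# in the MacCallum-Browne-Cai style (RMSEA pairs below are illustrative values).
from scipy.stats import ncx2

def power_nested(N, df_a, df_b, rmsea0, rmsea1, alpha=0.05):
    """rmsea0 and rmsea1 are (eps_A, eps_B) pairs under the null and alternative."""
    d = df_a - df_b                                   # df of the difference test
    ncp = lambda ea, eb: (N - 1) * (df_a * ea**2 - df_b * eb**2)
    delta0, delta1 = ncp(*rmsea0), ncp(*rmsea1)
    crit = ncx2.ppf(1 - alpha, d, delta0)             # critical value under the null pair
    return ncx2.sf(crit, d, delta1)                   # power under the alternative pair

# Hypothetical example: N = 200, df 22 vs 20, null pair (.06, .05), alternative pair (.08, .05).
print(round(power_nested(200, 22, 20, (0.06, 0.05), (0.08, 0.05)), 2))
```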

20.
I seem to know that I won't experience spaceflight but also that if I win the lottery, then I will take a flight into space. Suppose I competently deduce from these propositions that I won't win the lottery. Competent deduction from known premises seems to yield knowledge of the deduced conclusion. So it seems that I know that I won't win the lottery; but it also seems clear that I don't know this, despite the minuscule probability of my winning (if I have a lottery ticket). So we have a puzzle. It seems to generalize, for analogues of the lottery-proposition threaten almost all ordinary knowledge attributions. For example, my apparent knowledge that my bike is parked outside seems threatened by the possibility that it's been stolen since I parked it, a proposition with a low but non-zero probability; and it seems that I don't know this proposition to be false. Familiar solutions to this family of puzzles incur unacceptable costs—either by rejecting deductive closure for knowledge, or by yielding untenable consequences for ordinary attributions of knowledge or of ignorance. After canvassing and criticizing these solutions, I offer a new solution free of these costs.

Knowledge that p requires an explanatory link between the fact that p and the belief that p. This necessary but insufficient condition on knowledge distinguishes actual lottery cases from typical, apparently analogous ‘quasi-lottery’ cases. It does yield scepticism about my not winning the lottery and not experiencing spaceflight, but the scepticism doesn't generalize to quasi-lottery cases such as that involving my bike.
