Similar articles (20 results)
1.
A common question of interest to researchers in psychology is the equivalence of two or more groups. Failure to reject the null hypothesis of traditional hypothesis tests such as the ANOVA F‐test (i.e., H0: μ1 = … = μk) does not imply the equivalence of the population means. Researchers interested in determining the equivalence of k independent groups should apply a one‐way test of equivalence (e.g., Wellek, 2003). The goals of this study were to investigate the robustness of the one‐way Wellek test of equivalence to violations of the homogeneity of variance assumption, and to compare the Type I error rates and power of the Wellek test with a heteroscedastic version based on the logic of the one‐way Welch (1951) F‐test. The results indicate that the proposed Wellek–Welch test was insensitive to violations of the homogeneity of variance assumption, whereas the original Wellek test was not appropriate when the population variances were unequal.
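The heteroscedastic building block of the proposed test, Welch's (1951) one-way F test, can be sketched directly from its published formula. This is a minimal illustration, not the Wellek–Welch equivalence procedure itself; the three groups below are made-up data.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's (1951) F test of equal means under unequal variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                        # precision weights n_i / s_i^2
    mw = np.sum(w * m) / np.sum(w)   # variance-weighted grand mean
    a = np.sum(w * (m - mw) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    f = a / b
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * tmp)   # approximate error df
    p = stats.f.sf(f, df1, df2)
    return f, df1, df2, p

rng = np.random.default_rng(0)
g1 = rng.normal(0, 1, 20)
g2 = rng.normal(0, 4, 20)            # deliberately larger variance
g3 = rng.normal(0, 2, 20)
f, df1, df2, p = welch_anova(g1, g2, g3)
print(f"F = {f:.3f}, df = ({df1}, {df2:.1f}), p = {p:.3f}")
```

Unlike the classical F test, the error degrees of freedom here are estimated from the data, which is what keeps the test's Type I error rate stable when group variances differ.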

2.
The two‐sample Student t test of location was performed on random samples of scores and on rank‐transformed scores from normal and non‐normal population distributions with unequal variances. The same test also was performed on scores that had been explicitly selected to have nearly equal sample variances. The desired homogeneity of variance was brought about by repeatedly rejecting pairs of samples having a ratio of standard deviations that exceeded a predetermined cut‐off value of 1.1, 1.2, or 1.3, while retaining pairs with ratios less than the cut‐off value. Despite this forced conformity with the assumption of equal variances, the tests on the selected samples were no more robust than tests on unselected samples, and in most cases substantially less robust. Under conditions where sample sizes were unequal, so that Type I error rates were inflated and power curves were atypical, the selection procedure produced still greater inflation and distortion of the power curves.
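The background phenomenon this study builds on — Type I error inflation of the pooled-variance t test when the smaller group has the larger variance — is easy to reproduce in a small Monte Carlo sketch. The sample sizes, variances, and replication count below are illustrative choices, not the conditions of the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
reps, alpha = 4000, 0.05
rej_pooled = rej_welch = 0
for _ in range(reps):
    x = rng.normal(0, 4, 10)   # small group, large sd: worst case for pooling
    y = rng.normal(0, 1, 40)   # large group, small sd
    if stats.ttest_ind(x, y, equal_var=True).pvalue < alpha:
        rej_pooled += 1        # pooled-variance Student t
    if stats.ttest_ind(x, y, equal_var=False).pvalue < alpha:
        rej_welch += 1         # Welch separate-variance t
print(f"pooled rejection rate: {rej_pooled / reps:.3f}, "
      f"Welch rejection rate: {rej_welch / reps:.3f}")
```

Both null hypotheses are true here, so both rates should be near 0.05; the pooled test's rate is visibly inflated while Welch's stays close to nominal.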

3.
For comparing nested covariance structure models, the standard procedure is the likelihood ratio test of the difference in fit, where the null hypothesis is that the models fit identically in the population. A procedure for determining statistical power of this test is presented where effect size is based on a specified difference in overall fit of the models. A modification of the standard null hypothesis of zero difference in fit is proposed allowing for testing an interval hypothesis that the difference in fit between models is small, rather than zero. These developments are combined yielding a procedure for estimating power of a test of a null hypothesis of small difference in fit versus an alternative hypothesis of larger difference.

4.
Many writers have implicitly or explicitly stated that nonparametric tests are free from the assumption of homogeneity of variance. In fact, nonparametric tests for differences in central tendency generally do involve the assumption of homogeneity of variance. The assumption serves the same purpose for nonparametric tests as it does for the t test: it allows the user to draw more specific inferences when the null hypothesis is rejected.

5.
The variable-criteria sequential stopping rule (SSR) is a method for conducting planned experiments in stages after the addition of new subjects until the experiment is stopped because the p value is less than or equal to a lower criterion and the null hypothesis has been rejected, the p value is above an upper criterion, or a maximum sample size has been reached. Alpha is controlled at the expected level. The table of stopping criteria has been validated for a t test or ANOVA with four groups. New simulations in this article demonstrate that the SSR can be used with unequal sample sizes or heterogeneous variances in a t test. As with the usual t test, the use of a separate-variance term instead of a pooled-variance term prevents an inflation of alpha with heterogeneous variances. Simulations validate the original table of criteria for up to 20 groups without a drift of alpha. When used with a multigroup ANOVA, a planned contrast can be substituted for the global F as the focus for the stopping rule. The SSR is recommended when significance tests are appropriate and when the null hypothesis can be tested in stages. Because of its efficiency, the SSR should be used instead of the usual approach to the t test or ANOVA when subjects are expensive, rare, or limited by ethical considerations such as pain or distress.
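The staged logic described above can be sketched as a simple loop. The lower/upper criteria (0.01 / 0.36) and stage sizes below are illustrative placeholders, not values from the validated variable-criteria table, and a separate-variance (Welch) t is used at each stage, as the article recommends for heterogeneous variances.

```python
import numpy as np
from scipy import stats

def ssr_t_test(sample_a, sample_b, lower=0.01, upper=0.36,
               n_start=10, n_add=5, n_max=40):
    """Staged two-sample Welch t test on pre-generated per-subject data."""
    n = n_start
    while True:
        p = stats.ttest_ind(sample_a[:n], sample_b[:n],
                            equal_var=False).pvalue
        if p <= lower:
            return "reject", n, p          # stop: null rejected
        if p > upper or n >= n_max:
            return "retain", n, p          # stop: futility or max n
        n += n_add                         # add subjects, test again

rng = np.random.default_rng(1)
a = rng.normal(1.0, 1, 40)                 # true difference of 1 sd
b = rng.normal(0.0, 1, 40)
print(ssr_t_test(a, b))
```

The efficiency claim in the abstract comes from the futility criterion: unpromising experiments stop early rather than running to the maximum sample size.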

6.
This paper is a presentation of an essential part of the sampling theory of the error variance and the standard error of measurement. An experimental assumption is that several equivalent tests with equal variances are available. These may be either final forms of the same test or obtained by dividing one test into several parts. The simple model of independent and normally distributed errors of measurement with zero mean is employed. No assumption is made about the form of the distributions of true and observed scores. This implies unrestricted freedom in defining the population. First, maximum-likelihood estimators of the error variance and the standard error of measurement are obtained, their sampling distributions given, and their properties investigated. Then unbiased estimators are defined and their distributions derived. The accuracy of estimation is given special consideration from various points of view. Next, rigorous statistical tests are developed to test hypotheses about error variances on the basis of one and two samples. Also the construction of confidence intervals is treated. Finally, Bartlett's test of homogeneity of variances is used to provide a multi-sample test of equality of error variances.
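The closing step — Bartlett's test as a multi-sample test of equal error variances — is available directly in scipy. The three "parts" below stand in for equivalent forms of a test; the scores are made-up illustration data, not from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
part1 = rng.normal(50, 5, 30)   # three equivalent test forms
part2 = rng.normal(50, 5, 30)
part3 = rng.normal(50, 9, 30)   # this form has an inflated error variance
stat, p = stats.bartlett(part1, part2, part3)
print(f"Bartlett chi-square = {stat:.2f}, p = {p:.4f}")
```

A small p value indicates the error variances cannot be treated as equal across forms, which would undercut the equal-variances assumption the paper's estimators rest on.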

7.
Researchers often want to demonstrate a lack of interaction between two categorical predictors on an outcome. To justify a lack of interaction, researchers typically accept the null hypothesis of no interaction from a conventional analysis of variance (ANOVA). This method is inappropriate, as failure to reject the null hypothesis does not provide statistical evidence to support a lack of interaction. This study proposes a bootstrap‐based intersection–union test for negligible interaction that provides coherent decisions between the omnibus test and post hoc interaction contrast tests and is robust to violations of the normality and variance homogeneity assumptions. Further, a multiple comparison strategy for testing interaction contrasts following a non‐significant omnibus test is proposed. Our simulation study compared the Type I error control, omnibus power and per‐contrast power of the proposed approach to the non‐centrality‐based negligible interaction test of Cheng and Shao (2007, Statistica Sinica, 17, 1441). For 2 × 2 designs, the empirical Type I error rates of the Cheng and Shao test were very close to the nominal α level when the normality and variance homogeneity assumptions were satisfied; however, only our proposed bootstrapping approach was satisfactory under non‐normality and/or variance heterogeneity. For general a × b designs, the omnibus Cheng and Shao test is, as expected, the most powerful, but it is not robust to assumption violation and can produce incoherent omnibus and interaction contrast decisions, which are not possible with the intersection–union approach.

8.
The factorial 2 × 2 fixed‐effect ANOVA is a procedure frequently used in scientific research to test between‐subjects mean differences across groups. If the assumption of homogeneity is violated, the tests for the row, column, and interaction effects may be invalid or less powerful. Therefore, for planning research when variances are unknown and possibly unequal, it is worth developing a sample size formula that attains the desired power. This article suggests a simple formula to determine the sample size for a 2 × 2 fixed‐effect ANOVA with heterogeneous variances across groups. We use the approximate Welch t test and the variance ratio to derive the formula. The sample size determination requires a two‐step iteration, but the approximate sample sizes needed for the main effects and the interaction effect can be determined separately at the specified power. The present study also provides an example and a SAS program to facilitate the calculation process.
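The iterative idea — increase n until a Welch-type noncentral-t power function reaches the target — can be sketched for a single contrast. This is a hedged illustration under assumed effect and variance values, not the article's 2 × 2 formula or its SAS program.

```python
import numpy as np
from scipy import stats

def welch_power(n, delta, var1, var2, alpha=0.05):
    """Approximate power of a two-sided Welch t test, n per group."""
    se = np.sqrt(var1 / n + var2 / n)
    df = se ** 4 / ((var1 / n) ** 2 / (n - 1) +
                    (var2 / n) ** 2 / (n - 1))   # Welch-Satterthwaite df
    nc = delta / se                              # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(tcrit, df, nc) + stats.nct.cdf(-tcrit, df, nc)

def n_for_power(delta, var1, var2, target=0.80):
    n = 2
    while welch_power(n, delta, var1, var2) < target:
        n += 1                                   # step until power reached
    return n

n = n_for_power(delta=1.0, var1=1.0, var2=4.0)
print(f"n per group: {n}, achieved power: {welch_power(n, 1.0, 1.0, 4.0):.3f}")
```

Because the df and noncentrality both depend on n, there is no closed form; the iteration converges quickly since power is monotone in n.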

9.
Bayes factor approaches for testing interval null hypotheses
Psychological theories are statements of constraint. The role of hypothesis testing in psychology is to test whether specific theoretical constraints hold in data. Bayesian statistics is well suited to the task of finding supporting evidence for constraint, because it allows for comparing evidence for 2 hypotheses against one another. One issue in hypothesis testing is that constraints may hold only approximately rather than exactly, and the reason for small deviations may be trivial or uninteresting. In the large-sample limit, these uninteresting, small deviations lead to the rejection of a useful constraint. In this article, we develop several Bayes factor 1-sample tests for the assessment of approximate equality and ordinal constraints. In these tests, the null hypothesis covers a small interval of non-0 but negligible effect sizes around 0. These Bayes factors are alternatives to previously developed Bayes factors, which do not allow for interval null hypotheses, and may especially prove useful to researchers who use statistical equivalence testing. To facilitate adoption of these Bayes factor tests, we provide easy-to-use software.

10.
Bonett DG, Psychological Methods, 2008, 13(2): 99-109
Most psychology journals now require authors to report a sample value of effect size along with hypothesis testing results. The sample effect size value can be misleading because it contains sampling error. Authors often incorrectly interpret the sample effect size as if it were the population effect size. A simple solution to this problem is to report a confidence interval for the population value of the effect size. Standardized linear contrasts of means are useful measures of effect size in a wide variety of research applications. New confidence intervals for standardized linear contrasts of means are developed and may be applied to between-subjects designs, within-subjects designs, or mixed designs. The proposed confidence interval methods are easy to compute, do not require equal population variances, and perform better than the currently available methods when the population variances are not equal.
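The general idea of a heteroscedasticity-robust interval for a linear contrast of independent means can be sketched with a Welch–Satterthwaite df. Note this covers the unstandardized contrast only; Bonett's standardized-contrast intervals involve additional steps not shown here, and the data and contrast weights are made up.

```python
import numpy as np
from scipy import stats

def contrast_ci(groups, c, conf=0.95):
    """CI for sum(c_i * mu_i) across independent groups, unequal variances."""
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    c = np.asarray(c, dtype=float)
    est = np.sum(c * m)
    se = np.sqrt(np.sum(c ** 2 * v / n))
    # Welch-Satterthwaite degrees of freedom for the contrast
    df = se ** 4 / np.sum((c ** 2 * v / n) ** 2 / (n - 1))
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df)
    return est - tcrit * se, est + tcrit * se

rng = np.random.default_rng(3)
groups = [rng.normal(mu, sd, 25) for mu, sd in [(0, 1), (0.5, 2), (1, 3)]]
lo, hi = contrast_ci(groups, [1, -0.5, -0.5])   # group 1 vs mean of 2 and 3
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

Because each group contributes its own variance estimate, the interval stays valid when population variances differ, which is the pooled-variance interval's failure mode.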

11.
When testing hypotheses, two important problems that applied statisticians must consider are whether a large enough sample was used, and what to do when the frequently adopted homogeneity of variance assumption is violated. The first goal in this paper is to briefly review exact solutions to these problems. In the one-way ANOVA, for example, these procedures tell an experimenter whether enough observations were sampled so that the power will be at least as large as some pre-specified level. If too few observations were sampled, the procedure indicates how many more observations are required. The solution is exact, which is in contrast to another well-known procedure described in the paper. Also, the variances are allowed to be unequal. The second goal is to review how the techniques used to test hypotheses have also been used to solve problems in selection.

12.
Several general correlation patterns are shown in this paper which give exact F tests in an ANOVA procedure with dependent observations. This paper presents the most general correlation patterns one can assume in a one-way and two-way layout and still have the F tests be valid. Exact F tests are given for various designs. These include the unbalanced ANOVA design, analysis of covariance, random effects models, and mixed models. Bartlett's test for homogeneity of variances is shown to be exact when the independence assumption is relaxed. An example is provided to illustrate how the general correlation can occur in an experimental design.

13.
Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which assesses evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when regression is forced through the origin). This paper shows that this approach yields highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment in which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions reached through the inadequate application of regression analysis.

14.
Analysis of variance (ANOVA), the workhorse analysis of experimental designs, consists of F-tests of main effects and interactions. Yet testing, including traditional ANOVA, has recently been critiqued on a number of theoretical and practical grounds. In light of these critiques, model comparison and model selection serve as an attractive alternative. Model comparison differs from testing in that one can support a null or nested model vis-à-vis a more general alternative by penalizing more flexible models. We argue that this ability to support simpler models allows for more nuanced theoretical conclusions than traditional ANOVA F-tests provide. We provide a model comparison strategy and show how ANOVA models may be reparameterized to better address substantive questions in data analysis.

15.
Tryon WW, Lewis C, Psychological Methods, 2008, 13(3): 272-277
Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis, and failing to reject H0 is not evidence supportive of it. Tests of statistical equivalence are needed. This article corrects the inferential confidence interval (ICI) reduction factor introduced by W. W. Tryon (2001) and uses it to extend his discussion of statistical equivalence. This method is shown to be algebraically equivalent to D. J. Schuirmann's (1987) use of 2 one-sided t tests, a highly regarded and accepted method of testing for statistical equivalence. The ICI method provides an intuitive graphic method for inferring statistical difference as well as equivalence. Trivial difference occurs when a test of difference and a test of equivalence are both passed. Statistical indeterminacy results when both tests are failed. Hybrid confidence intervals are introduced that impose ICI limits on standard confidence intervals. These intervals are recommended as replacements for error bars because they facilitate inferences.
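Schuirmann's (1987) two one-sided t tests, the benchmark the ICI method is shown equivalent to, can be sketched compactly. The separate-variance (Welch) form and the ±0.5 raw-unit equivalence margin below are illustrative choices, and the data are simulated.

```python
import numpy as np
from scipy import stats

def tost(x, y, margin=0.5):
    """Two one-sided Welch t tests of H0: |mu_x - mu_y| >= margin."""
    nx, ny = len(x), len(y)
    vx, vy = np.var(x, ddof=1), np.var(y, ddof=1)
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(vx / nx + vy / ny)
    df = se ** 4 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    t_lower = (diff + margin) / se       # H0: diff <= -margin
    t_upper = (diff - margin) / se       # H0: diff >= +margin
    p_lower = stats.t.sf(t_lower, df)
    p_upper = stats.t.cdf(t_upper, df)
    return max(p_lower, p_upper)         # equivalence if this p < alpha

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 100)
y = rng.normal(0, 1, 100)                # truly equivalent groups
print(f"TOST p = {tost(x, y):.4f}")
```

Both one-sided nulls must be rejected to claim equivalence, which is why the reported p value is the larger of the two; a nonsignificant ordinary t test, by contrast, claims nothing.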

16.
In a recent article, Leventhal (1999) responds to two criticisms of hypothesis testing by showing that the one-tailed test and the directional two-tailed test are valid even if all point null hypotheses are false, and that hypothesis tests can provide the probability that decisions based on them are correct. Unfortunately, the falseness of all point null hypotheses affects the operating characteristics of the directional two-tailed test, seeming to weaken certain of Leventhal's arguments in favor of this procedure.

17.
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration‐based set of hypotheses containing equality constraints on the means, or a theory‐based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory‐based hypotheses) has advantages over exploration (i.e., examining all possible equality‐constrained hypotheses). Furthermore, examining reasonable order‐restricted hypotheses has more power to detect the true effect/non‐null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory‐based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number).

18.
A simple procedure for testing heterogeneity of variance is developed which generalizes readily to complex, multi-factor experimental designs. Monte Carlo studies indicate that the Z-variance test statistic presented here yields results equivalent to other familiar tests for heterogeneity of variance in simple one-way designs where comparisons are feasible. The primary advantage of the Z-variance test is in the analysis of factorial effects on sample variances in more complex designs. An example involving a three-way factorial design is presented.

19.
J. Roy & V. K. Murthy, Psychometrika, 1960, 25(3): 243-250
Likelihood ratio tests have been proposed by Wilks for testing the hypothesis of equal means, variances, and covariances (H_mvc) and the hypothesis of equal variances and covariances (H_vc) in a p-variate normal distribution. Using exact distributions of the appropriate likelihood ratio statistics, tables of the .05 and .01 points of these distributions are constructed for p = 4, 5, 6, 7 and sample size n = 25 (5) 60 (10) 100. A correction factor is recommended for larger n. Two numerical examples illustrate use of the tables. A nonparametric test is proposed for H_mvc when the multivariate parent population is known to be non-normal. This research was supported partly by the Office of Naval Research under Contract No. Nonr-855(06) and partly by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. 18(600)-83. Reproduction in whole or in part for any purpose of the United States Government is permitted.

20.
C. D. Herrera (1996) introduced an innovative argument against the use of deception in psychological research. In essence, Herrera contended that because of the presumed problems with null hypothesis statistical testing, researchers could not justify their continued use of deception in research. Although this is an interesting argument, there are several alternative perspectives that must be considered. In examining these alternatives, the author concluded that psychologists may continue to use deception under certain circumstances outlined in the American Psychological Association's ethical code of conduct.
