Similar Articles
20 similar articles found.
1.
L. V. Jones and J. W. Tukey (2000) pointed out that the usual 2-sided, equal-tails null hypothesis test at level alpha can be reinterpreted as simultaneous tests of 2 directional inequality hypotheses, each at level alpha/2, and that the maximum probability of a Type I error is alpha/2 if the truth of the null hypothesis is considered impossible. This article points out that in multiple testing with familywise error rate controlled at alpha, the directional error rate (assuming all null hypotheses are false) is greater than alpha/2 and can be arbitrarily close to alpha. Single-step, step-down, and step-up procedures are analyzed, and other error rates, including the false discovery rate, are discussed. Implications for confidence interval estimation and hypothesis testing practices are considered.
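A minimal Python simulation of the phenomenon (not from the article; the all-tiny-effects configuration and the values of m, n, and delta are illustrative assumptions): every null hypothesis is false but only barely so, Holm's step-down procedure controls the familywise error rate at alpha, and the familywise rate of directional (sign) errors is estimated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n, alpha, reps = 10, 20, 0.05, 5000
delta = 0.02   # every null is false, but only barely (true effects are +0.02)

directional_errors = 0
for _ in range(reps):
    x = rng.normal(delta, 1.0, size=(m, n))
    t = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(np.abs(t), df=n - 1)      # two-sided p-values
    # Holm's step-down procedure at familywise level alpha
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] > alpha / (m - rank):
            break
        reject[idx] = True
    # directional error: rejecting while declaring the wrong (negative) sign
    directional_errors += np.any(reject & (t < 0))

print(f"familywise directional error rate ~ {directional_errors / reps:.3f}")
```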

2.
We discuss the statistical testing of three relevant hypotheses involving Cronbach's alpha: one where alpha equals a particular criterion; a second testing the equality of two alpha coefficients for independent samples; and a third testing the equality of two alpha coefficients for dependent samples. For each of these hypotheses, various statistical tests have been proposed. Over the years, these tests have depended on progressively fewer assumptions. We propose a new approach to testing the three hypotheses that relies on even fewer assumptions, is especially suited for discrete item scores, and can be applied easily to tests containing large numbers of items. The new approach uses marginal modelling. We compared the Type I error rate and the power of the marginal modelling approach to several of the available tests in a simulation study using realistic conditions. We found that the marginal modelling approach had the most accurate Type I error rates, whereas the power was similar across the statistical tests.
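The marginal modelling machinery itself is beyond a short example, but the first hypothesis (alpha equals a criterion) can be sketched with a generic percentile bootstrap; the data-generating step and the 0.70 criterion below are invented for illustration and this is not the article's procedure.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_persons, n_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def bootstrap_alpha_test(scores, criterion, n_boot=2000, seed=0):
    """Percentile-bootstrap CI for alpha; H0: alpha == criterion is rejected
    at the 5% level when the criterion falls outside the 95% CI."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    boots = np.array([cronbach_alpha(scores[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return cronbach_alpha(scores), (lo, hi), not (lo <= criterion <= hi)

# toy example: 5 Likert-type items, 200 respondents (simulated)
rng = np.random.default_rng(42)
true_score = rng.normal(size=(200, 1))
items = np.clip(np.round(3 + true_score + rng.normal(0, 1, (200, 5))), 1, 5)
print(bootstrap_alpha_test(items, criterion=0.70))
```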

3.
The conventional approach for testing the equality of two normal mean vectors is to test first the equality of covariance matrices, and if the equality assumption is tenable, then use the two-sample Hotelling T² test. Otherwise one can use one of the approximate tests for the multivariate Behrens–Fisher problem. In this article, we study the properties of the Hotelling T² test, the conventional approach, and one of the best approximate invariant tests (Krishnamoorthy & Yu, 2004) for the Behrens–Fisher problem. Our simulation studies indicated that the conventional approach often leads to inflated Type I error rates. The approximate test not only controlled Type I error rates very satisfactorily when covariance matrices were arbitrary but was also comparable with the T² test when covariance matrices were equal.
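For reference, a sketch of the classical two-sample Hotelling T² test that the conventional approach relies on; the Krishnamoorthy–Yu approximate invariant test is not reproduced here, and the simulated unequal-covariance data are illustrative.

```python
import numpy as np
from scipy import stats

def hotelling_t2(x, y):
    """Classical two-sample Hotelling T^2 with a pooled covariance matrix,
    referred to an F(p, n1 + n2 - p - 1) distribution."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2, p = x.shape[0], y.shape[0], x.shape[1]
    d = x.mean(axis=0) - y.mean(axis=0)
    s_pooled = ((n1 - 1) * np.cov(x, rowvar=False) +
                (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(s_pooled, d)
    f = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, stats.f.sf(f, p, n1 + n2 - p - 1)

rng = np.random.default_rng(0)
x = rng.multivariate_normal([0, 0], [[1, .3], [.3, 1]], size=30)
y = rng.multivariate_normal([0, 0], [[2, .1], [.1, 2]], size=40)  # unequal covariances
print(hotelling_t2(x, y))
```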

4.
The equality of two group variances is frequently tested in experiments. However, criticisms of null hypothesis statistical testing on means have recently arisen, and there is interest in other types of statistical tests of hypotheses, such as superiority/non-inferiority and equivalence tests. Although these tests have become more common in psychology and the social sciences, the corresponding sample size estimation is rarely discussed, especially when the sampling unit costs or group sizes are unequal for the two groups. To find the optimal sample sizes, the present study derived an initial allocation by approximating the percentiles of an F distribution with the percentiles of the standard normal distribution, and then used an exhaustion algorithm to select the best combination of group sizes, ensuring that the resulting power reaches the designated level and is maximal at a minimal total cost. In this manner, sample size planning is optimized. The proposed sample size determination has a wide range of applications and is efficient in terms of Type I error and statistical power in simulations. Finally, an illustrative example using hypertension data from a report of the Health Survey for England, 1995–1997, is presented. For ease of application, four R Shiny apps are provided and benchmarks for setting equivalence margins are suggested.
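A simplified sketch of the cost-constrained search idea, assuming the simplest case of a one-sided superiority test on a variance ratio; this is not the article's full procedure (its normal-approximation initial allocation is omitted), and the cost values and target power are illustrative.

```python
from scipy import stats

def power_var_superiority(n1, n2, theta, alpha=0.05):
    """Power of the one-sided F test of H0: var1 <= var2 when the true
    variance ratio theta = var1/var2 exceeds 1."""
    crit = stats.f.isf(alpha, n1 - 1, n2 - 1)
    return stats.f.sf(crit / theta, n1 - 1, n2 - 1)

def optimal_sizes(theta, c1, c2, target=0.80, alpha=0.05, n_max=300):
    """Exhaustive search for the cheapest (n1, n2) reaching the target power;
    for fixed n1 the smallest adequate n2 is the cheapest, so stop there."""
    best = None
    for n1 in range(2, n_max):
        for n2 in range(2, n_max):
            if power_var_superiority(n1, n2, theta, alpha) >= target:
                cost = c1 * n1 + c2 * n2
                if best is None or cost < best[0]:
                    best = (cost, n1, n2)
                break
    return best   # (total cost, n1, n2)

# unit costs 4:1 and a true variance ratio of 2
print(optimal_sizes(theta=2.0, c1=4, c2=1))
```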

5.
Valid use of the traditional independent samples ANOVA procedure requires that the population variances are equal. Previous research has investigated whether variance homogeneity tests, such as Levene's test, are satisfactory as gatekeepers for identifying when to use or not to use the ANOVA procedure. This research focuses on a novel homogeneity of variance test that incorporates an equivalence testing approach. Instead of testing the null hypothesis that the variances are equal against an alternative hypothesis that the variances are not equal, the equivalence-based test evaluates the null hypothesis that the difference in the variances falls outside or on the border of a predetermined interval against an alternative hypothesis that the difference in the variances falls within the predetermined interval. Thus, with the equivalence-based procedure, the alternative hypothesis is aligned with the research hypothesis (variance equality). A simulation study demonstrated that the equivalence-based test of population variance homogeneity is a better gatekeeper for the ANOVA than traditional homogeneity of variance tests.
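A hedged sketch of an equivalence-based variance-homogeneity test: a TOST on the log variance ratio using the large-sample normal-theory approximation Var(log s²) ≈ 2/(n−1). The article's test is defined on the variance difference, so this ratio-scale version is an illustrative stand-in, not the proposed procedure.

```python
import numpy as np
from scipy import stats

def variance_equivalence_tost(x, y, ratio_margin=1.5, alpha=0.05):
    """TOST on the log variance ratio. Equivalence region:
    1/ratio_margin < var_x/var_y < ratio_margin."""
    n1, n2 = len(x), len(y)
    log_ratio = np.log(np.var(x, ddof=1) / np.var(y, ddof=1))
    se = np.sqrt(2 / (n1 - 1) + 2 / (n2 - 1))
    eps = np.log(ratio_margin)
    p_lower = stats.norm.sf((log_ratio + eps) / se)   # H0: ratio <= 1/margin
    p_upper = stats.norm.cdf((log_ratio - eps) / se)  # H0: ratio >= margin
    p = max(p_lower, p_upper)
    return p, p < alpha   # True -> variances declared equivalent

rng = np.random.default_rng(3)
print(variance_equivalence_tost(rng.normal(0, 1, 100), rng.normal(0, 1.05, 100)))
```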

6.
Bayes factor approaches for testing interval null hypotheses
Psychological theories are statements of constraint. The role of hypothesis testing in psychology is to test whether specific theoretical constraints hold in data. Bayesian statistics is well suited to the task of finding supporting evidence for constraint, because it allows the evidence for 2 hypotheses to be compared against one another. One issue in hypothesis testing is that constraints may hold only approximately rather than exactly, and the reason for small deviations may be trivial or uninteresting. In the large-sample limit, these uninteresting, small deviations lead to the rejection of a useful constraint. In this article, we develop several Bayes factor 1-sample tests for the assessment of approximate equality and ordinal constraints. In these tests, the null hypothesis covers a small interval of nonzero but negligible effect sizes around 0. These Bayes factors are alternatives to previously developed Bayes factors, which do not allow for interval null hypotheses, and may prove especially useful to researchers who use statistical equivalence testing. To facilitate adoption of these Bayes factor tests, we provide easy-to-use software.
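A sketch of an interval-null Bayes factor in a one-sample t design, computed by numerically integrating the noncentral-t likelihood over each prior; the uniform-inside/truncated-Cauchy-outside prior choice is an illustrative assumption, not necessarily the priors used in the article.

```python
import numpy as np
from scipy import stats, integrate

def interval_null_bf(t, n, bound=0.1, prior_scale=0.707):
    """BF01 for H0: |delta| <= bound (uniform prior inside the interval)
    vs H1: delta ~ Cauchy(0, prior_scale) truncated to |delta| > bound."""
    df = n - 1
    like = lambda d: stats.nct.pdf(t, df, d * np.sqrt(n))
    # marginal likelihood under H0: uniform on [-bound, bound]
    m0, _ = integrate.quad(lambda d: like(d) / (2 * bound), -bound, bound)
    # marginal likelihood under H1: Cauchy truncated outside the interval
    cauchy = stats.cauchy(scale=prior_scale)
    tail_mass = 2 * cauchy.sf(bound)
    m1a, _ = integrate.quad(lambda d: like(d) * cauchy.pdf(d) / tail_mass,
                            bound, 10)
    m1b, _ = integrate.quad(lambda d: like(d) * cauchy.pdf(d) / tail_mass,
                            -10, -bound)
    return m0 / (m1a + m1b)

print(interval_null_bf(t=1.2, n=50))   # BF01 > 1 favours the interval null
```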

7.
Researchers often want to demonstrate a lack of interaction between two categorical predictors on an outcome. To justify a lack of interaction, researchers typically accept the null hypothesis of no interaction from a conventional analysis of variance (ANOVA). This method is inappropriate because failure to reject the null hypothesis does not provide statistical evidence to support a lack of interaction. This study proposes a bootstrap-based intersection-union test for negligible interaction that provides coherent decisions between the omnibus test and post hoc interaction contrast tests and is robust to violations of the normality and variance homogeneity assumptions. Further, a multiple comparison strategy for testing interaction contrasts following a non-significant omnibus test is proposed. Our simulation study compared the Type I error control, omnibus power, and per-contrast power of the proposed approach with the non-centrality-based negligible interaction test of Cheng and Shao (2007, Statistica Sinica, 17, 1441). For 2 × 2 designs, the empirical Type I error rates of the Cheng and Shao test were very close to the nominal α level when the normality and variance homogeneity assumptions were satisfied; however, only the proposed bootstrapping approach was satisfactory under non-normality and/or variance heterogeneity. In general a × b designs, the omnibus Cheng and Shao test is, as expected, the most powerful, but it is not robust to assumption violations and can yield incoherent omnibus and interaction contrast decisions, which the intersection-union approach avoids.
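A minimal sketch of the equivalence logic for the 2 × 2 special case, where the interaction reduces to a single contrast: a percentile-bootstrap CI contained in (−margin, margin) supports negligible interaction. The full intersection-union procedure for a × b designs is not reproduced, and the margin and simulated cell means are illustrative.

```python
import numpy as np

def negligible_interaction_2x2(cells, margin, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap equivalence check for the interaction contrast
    psi = (m11 - m12) - (m21 - m22) of a 2x2 design; negligible when the
    whole (1 - 2*alpha) bootstrap CI lies inside (-margin, margin)."""
    rng = np.random.default_rng(seed)
    def contrast(groups):
        m = [g.mean() for g in groups]
        return (m[0] - m[1]) - (m[2] - m[3])
    boots = np.empty(n_boot)
    for b in range(n_boot):
        resampled = [g[rng.integers(0, len(g), len(g))] for g in cells]
        boots[b] = contrast(resampled)
    lo, hi = np.percentile(boots, [100 * alpha, 100 * (1 - alpha)])
    return (lo, hi), (-margin < lo) and (hi < margin)

rng = np.random.default_rng(7)
cells = [rng.normal(mu, 1, 40) for mu in (0.0, 0.5, 0.1, 0.6)]  # near-zero interaction
print(negligible_interaction_2x2(cells, margin=0.5))
```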

8.
Manolov R, Arnau J, Solanas A, Bono R. Psicothema, 2010, 22(4), 1026-1032.
The present study evaluates the performance of four methods for estimating regression coefficients used to make statistical decisions about intervention effectiveness in single-case designs. Ordinary least squares estimation is compared to two correction techniques dealing with general trend and a procedure that eliminates autocorrelation whenever it is present. Type I error rates and statistical power are studied for experimental conditions defined by the presence or absence of treatment effect (change in level or in slope), general trend, and serial dependence. The results show that empirical Type I error rates do not approach the nominal ones in the presence of autocorrelation or general trend when ordinary and generalized least squares are applied. The techniques controlling trend show lower false alarm rates, but prove to be insufficiently sensitive to existing treatment effects. Consequently, the use of the statistical significance of the regression coefficients for detecting treatment effects is not recommended for short data series.
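A small illustration of the baseline approach being evaluated, assuming a simulated AB phase design: OLS with trend and level-change terms, with the Durbin-Watson statistic as a quick check on residual autocorrelation. The trend-correction and autocorrelation-removal techniques compared in the study are not reproduced.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# AB single-case series: 10 baseline + 10 treatment points (simulated)
rng = np.random.default_rng(8)
n_a, n_b = 10, 10
phase = np.r_[np.zeros(n_a), np.ones(n_b)]   # level-change dummy
time = np.arange(n_a + n_b)
y = 2 + 0.1 * time + 1.0 * phase + rng.normal(0, 1, n_a + n_b)

# OLS with trend and level-change terms; inspect residual autocorrelation
fit = sm.OLS(y, sm.add_constant(np.column_stack([time, phase]))).fit()
print(fit.params, durbin_watson(fit.resid))
```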

9.
The ease with which data can be collected and analyzed via personal computer makes it potentially attractive to “peek” at the data before a target sample size is achieved. This tactic might seem appealing because data collection could be stopped early, which would save valuable resources, if a peek revealed a significant effect. Unfortunately, such data snooping comes with a cost. When the null hypothesis is true, the Type I error rate is inflated, sometimes quite substantially. If the null hypothesis is false, premature significance testing leads to inflated estimates of power and effect size. This program provides simulation results for a wide variety of premature and repeated null hypothesis testing scenarios. It gives researchers the ability to know in advance the consequences of data peeking so that appropriate corrective action can be taken.
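A compact simulation in the spirit of the program described, assuming a one-sample t test with interim looks at n = 25, 50, 75, 100 (illustrative choices): data collection stops at the first significant peek, and the empirical Type I error rate under a true null is tallied.

```python
import numpy as np
from scipy import stats

def peeking_type1(n_max=100, looks=(25, 50, 75, 100), alpha=0.05,
                  reps=10000, seed=0):
    """Empirical Type I error of a one-sample t test when the null is true
    and the test is re-run at each interim look, stopping at the first
    p < alpha."""
    rng = np.random.default_rng(seed)
    false_alarms = 0
    for _ in range(reps):
        x = rng.normal(0, 1, n_max)          # H0 is true
        for n in looks:
            if stats.ttest_1samp(x[:n], 0).pvalue < alpha:
                false_alarms += 1
                break
    return false_alarms / reps

print(peeking_type1())   # noticeably above the nominal 0.05
```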

10.
Long JD. Psychological Methods, 2005, 10(3), 329-351.
Often quantitative data in the social sciences have only ordinal justification. Problems of interpretation can arise when least squares multiple regression (LSMR) is used with ordinal data. Two ordinal alternatives are discussed, dominance-based ordinal multiple regression (DOMR) and proportional odds multiple regression. The Q2 statistic is introduced for testing the omnibus null hypothesis in DOMR. A simulation study is discussed that examines the actual Type I error rate and power of Q2 in comparison to the LSMR omnibus F test under normality and non-normality. Results suggest that Q2 has favorable sampling properties as long as the sample size-to-predictors ratio is not too small, and Q2 can be a good alternative to the omnibus F test when the response variable is non-normal.
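Q2 is specific to DOMR and is not reproduced here, but the second ordinal alternative, proportional odds multiple regression, can be sketched; the simulated data and the use of statsmodels' OrderedModel are assumptions of this sketch.

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

# proportional odds regression for a simulated 5-category ordinal response
rng = np.random.default_rng(10)
n = 300
x = rng.normal(size=(n, 2))
latent = x @ np.array([0.8, -0.5]) + rng.logistic(size=n)
y = np.digitize(latent, [-2, -0.5, 0.5, 2])   # ordinal scores 0..4

fit = OrderedModel(y, x, distr="logit").fit(method="bfgs", disp=False)
print(fit.summary())
```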

11.
A common question of interest to researchers in psychology is the equivalence of two or more groups. Failure to reject the null hypothesis of traditional hypothesis tests such as the ANOVA F-test (i.e., H0: μ1 = … = μk) does not imply the equivalence of the population means. Researchers interested in determining the equivalence of k independent groups should apply a one-way test of equivalence (e.g., Wellek, 2003). The goals of this study were to investigate the robustness of the one-way Wellek test of equivalence to violations of the homogeneity of variance assumption, and to compare the Type I error rates and power of the Wellek test with a heteroscedastic version based on the logic of the one-way Welch (1951) F-test. The results indicate that the proposed Wellek-Welch test was insensitive to violations of the homogeneity of variance assumption, whereas the original Wellek test was not appropriate when the population variances were not equal.
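A sketch of the homoscedastic Wellek one-way equivalence test, assuming the usual formulation in which the ordinary ANOVA F statistic is compared against the α quantile of a noncentral F distribution; the heteroscedastic Wellek-Welch variant proposed in the article is not reproduced, and the equivalence bound eps is illustrative.

```python
import numpy as np
from scipy import stats

def wellek_equivalence_anova(groups, eps=0.25, alpha=0.05):
    """One-way equivalence test: the ANOVA F statistic supports equivalence
    when it falls below the alpha quantile of a noncentral F with
    noncentrality N * eps**2."""
    k = len(groups)
    ns = np.array([len(g) for g in groups])
    N = ns.sum()
    means = np.array([np.mean(g) for g in groups])
    grand = np.concatenate(groups).mean()
    ms_between = np.sum(ns * (means - grand) ** 2) / (k - 1)
    ms_within = sum((len(g) - 1) * np.var(g, ddof=1) for g in groups) / (N - k)
    f_stat = ms_between / ms_within
    crit = stats.ncf.ppf(alpha, k - 1, N - k, N * eps ** 2)
    return f_stat, crit, f_stat < crit   # True -> means declared equivalent

rng = np.random.default_rng(2)
print(wellek_equivalence_anova([rng.normal(0, 1, 50) for _ in range(3)]))
```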

12.
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non-normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann-Yuen test of equivalence, which utilizes trimmed means, with those of the previously recommended Schuirmann and Schuirmann-Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann-Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann-Welch tests, and the power of the Schuirmann-Yuen was substantially greater than that of the Schuirmann or Schuirmann-Welch tests when distributions were skewed or outliers were present. The Schuirmann-Yuen test is recommended for assessing clinical significance with normative comparisons.
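A sketch of the Schuirmann-Yuen idea, assuming Yuen's (1974) trimmed-means t statistic inside two one-sided tests; the 20% trimming proportion and equivalence margin are illustrative choices.

```python
import numpy as np
from scipy import stats

def yuen_t(x, y, shift=0.0, trim=0.2):
    """Yuen's trimmed-means t statistic for H0: mu_t(x) - mu_t(y) = shift,
    with Welch-Satterthwaite degrees of freedom."""
    def parts(v):
        v = np.sort(np.asarray(v, float))
        n = len(v)
        g = int(np.floor(trim * n))
        h = n - 2 * g                             # effective sample size
        tmean = v[g:n - g].mean()
        w = v.copy()
        w[:g], w[n - g:] = v[g], v[n - g - 1]     # winsorize the tails
        d = (n - 1) * np.var(w, ddof=1) / (h * (h - 1))
        return tmean, d, h
    m1, d1, h1 = parts(x)
    m2, d2, h2 = parts(y)
    t = (m1 - m2 - shift) / np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df

def schuirmann_yuen(x, y, margin, alpha=0.05):
    """Two one-sided Yuen tests: equivalence is concluded when both
    one-sided p-values fall below alpha."""
    t_lo, df = yuen_t(x, y, shift=-margin)
    t_hi, _ = yuen_t(x, y, shift=margin)
    p = max(stats.t.sf(t_lo, df), stats.t.cdf(t_hi, df))
    return p, p < alpha

rng = np.random.default_rng(12)
x, y = rng.standard_t(3, 60), rng.standard_t(3, 60) + 0.05
print(schuirmann_yuen(x, y, margin=0.4))
```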

13.
A composite step-down procedure, in which a set of step-down tests are summarized collectively with Fisher's combination statistic, was considered to test for multivariate mean equality in two-group designs. An approximate degrees of freedom (ADF) composite procedure based on trimmed/Winsorized estimators and a non-pooled estimate of error variance is proposed, and compared to a composite procedure based on trimmed/Winsorized estimators and a pooled estimate of error variance. The step-down procedures were also compared to Hotelling's T² and Johansen's ADF global procedure based on trimmed estimators in a simulation study. Type I error rates of the pooled step-down procedure were sensitive to covariance heterogeneity in unbalanced designs; error rates were similar to those of Hotelling's T² across all of the investigated conditions. Type I error rates of the ADF composite step-down procedure were insensitive to covariance heterogeneity and less sensitive to the number of dependent variables when sample size was small than error rates of Johansen's test. The ADF composite step-down procedure is recommended for testing hypotheses of mean equality in two-group designs except when the data are sampled from populations with different degrees of multivariate skewness.
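The trimmed/Winsorized step-down tests themselves are not reproduced here, but the combining step can be illustrated: Fisher's statistic −2Σ log pᵢ is referred to a chi-square distribution with 2m degrees of freedom. The p-values below are placeholders.

```python
import numpy as np
from scipy import stats

# placeholder step-down p-values for m dependent variables
p_vals = np.array([0.04, 0.20, 0.11])
chi2, p_combined = stats.combine_pvalues(p_vals, method="fisher")
print(f"X2 = {chi2:.2f} on {2 * len(p_vals)} df, combined p = {p_combined:.3f}")
```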

14.
In a comparison of 2 treatments, if outcome scores are denoted by X in 1 condition and by Y in the other, stochastic equality is defined as P(X < Y) = P(X > Y). Tests of stochastic equality can be affected by characteristics of the distributions being compared, such as heterogeneity of variance. Thus, various robust tests of stochastic equality have been proposed and are evaluated here using a Monte Carlo study with sample sizes ranging from 10 to 30. Three robust tests are identified that perform well in Type I error rates and power except when extremely skewed data co-occur with very small n. When tests of stochastic equality might be preferred to tests of means is also considered.
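One readily available robust test of this kind is the Brunner-Munzel test; the article evaluates several such tests, so this choice and the simulated heteroscedastic data are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(0, 1, 25)
y = rng.normal(0, 3, 25)     # same centre, heterogeneous variances

# relative effect: P(X < Y) + 0.5 * P(X = Y); equals 0.5 under stochastic equality
p_hat = np.less.outer(x, y).mean() + 0.5 * np.equal.outer(x, y).mean()
stat, pval = stats.brunnermunzel(x, y)
print(f"estimated P(X<Y)+.5P(X=Y) = {p_hat:.3f}, Brunner-Munzel p = {pval:.3f}")
```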

15.
The Type I error rates and powers of three recent tests for analyzing nonorthogonal factorial designs under departures from the assumptions of homogeneity and normality were evaluated using Monte Carlo simulation. Specifically, this work compared the performance of the modified Brown-Forsythe procedure, the generalization of Box's method proposed by Brunner, Dette, and Munk, and the mixed-model procedure adjusted by the Kenward-Roger solution available in the SAS statistical package. With regard to robustness, the three approaches adequately controlled Type I error when the data were generated from symmetric distributions; however, this study's results indicate that, when the data were extracted from asymmetric distributions, the modified Brown-Forsythe approach controlled the Type I error slightly better than the other procedures. With regard to sensitivity, the higher power rates were obtained when the analyses were done with the MIXED procedure of the SAS program. Furthermore, results also identified that, when the data were generated from symmetric distributions, little power was sacrificed by using the generalization of Box's method in place of the modified Brown-Forsythe procedure.

16.
Tryon WW, Lewis C. Psychological Methods, 2008, 13(3), 272-277.
Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis, and failing to reject H0 is not evidence supportive of it. Tests of statistical equivalence are needed. This article corrects the inferential confidence interval (ICI) reduction factor introduced by W. W. Tryon (2001) and uses it to extend his discussion of statistical equivalence. This method is shown to be algebraically equivalent to D. J. Schuirmann's (1987) use of 2 one-sided t tests, a highly regarded and accepted method of testing for statistical equivalence. The ICI method provides an intuitive graphic method for inferring statistical difference as well as equivalence. Trivial difference occurs when a test of difference and a test of equivalence are both passed. Statistical indeterminacy results when both tests are failed. Hybrid confidence intervals are introduced that impose ICI limits on standard confidence intervals. These intervals are recommended as replacements for error bars because they facilitate inferences.
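A sketch of Schuirmann's two one-sided t tests together with the confidence-interval view that underlies graphical equivalence methods: equivalence at level α holds exactly when the (1 − 2α) CI for the mean difference lies inside the equivalence interval. Pooled variances and the margin below are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

def schuirmann_tost(x, y, margin, alpha=0.05):
    """Schuirmann's two one-sided t tests with pooled variance, plus the
    equivalent (1 - 2*alpha) confidence interval for the mean difference."""
    n1, n2 = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    sp2 = ((n1 - 1) * np.var(x, ddof=1) +
           (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p_tost = max(stats.t.sf((diff + margin) / se, df),
                 stats.t.cdf((diff - margin) / se, df))
    tcrit = stats.t.isf(alpha, df)
    ci = (diff - tcrit * se, diff + tcrit * se)   # 90% CI when alpha = .05
    return p_tost, ci

rng = np.random.default_rng(6)
print(schuirmann_tost(rng.normal(0, 1, 80), rng.normal(0.05, 1, 80), margin=0.4))
```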

17.
王阳, 温忠麟, 付媛姝. 心理科学进展 (Advances in Psychological Science), 2020, 28(11), 1961-1969.
Commonly used structural equation model fit indices have limitations. For example, the χ² test takes the traditional null hypothesis as its target hypothesis and therefore cannot confirm a model, while descriptive fit indices such as RMSEA and CFI lack inferential statistical properties. Equivalence testing effectively remedies these problems. This article first explains how equivalence testing evaluates the fit of a single model and how it differs from null hypothesis testing, then describes how equivalence testing is used to analyze measurement invariance, and finally uses empirical data to demonstrate the performance of equivalence testing in single-model evaluation and measurement invariance testing, comparing it with traditional model evaluation methods.
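A sketch of the equivalence-testing logic for a single model, assuming the common RMSEA-based formulation in which the model chi-square is compared against the α quantile of a noncentral chi-square; the specific cutoffs and adjustments used in the article may differ.

```python
from scipy import stats

def sem_equivalence_test(chi2_stat, df, n, rmsea0=0.05, alpha=0.05):
    """Equivalence test of close fit: H0 'RMSEA >= rmsea0' is rejected when
    the model chi-square falls below the alpha quantile of a noncentral
    chi-square with noncentrality (n - 1) * df * rmsea0**2."""
    nc = (n - 1) * df * rmsea0 ** 2
    crit = stats.ncx2.ppf(alpha, df, nc)
    return crit, chi2_stat < crit   # True -> close fit supported

print(sem_equivalence_test(chi2_stat=52.3, df=40, n=300))
```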

18.
A previous Monte Carlo study examined the relative powers of several simple and more complex procedures for testing the significance of differences in mean rates of change in a controlled, longitudinal, treatment evaluation study. Results revealed that the relative powers depended on the correlation structure of the simulated repeated measurements. Tests on dropout-weighted linear slope coefficients fitted to all of the available measurements for each participant were found to provide superior power in the presence of compound symmetry (CS), but tests of significance applied to simple baseline-to-endpoint difference scores provided superior power in the presence of a strongly autoregressive (AR) correlation structure. Type I error rates appeared in an acceptable range for both of those analyses. Insofar as the previous study considered only two widely disparate correlation structures, the present work was undertaken to examine where along a continuum of correlation structures lying between strongly AR and CS the power balance shifts from favoring the simple endpoint difference-score analysis to favoring a regression analysis that utilizes all of the available repeated measurements for each participant. With power calculated from the relative frequencies of rejecting H0 at different levels of autoregression, the results indicate superior power for the simple endpoint analysis across more than half the distance from strongly AR to CS. To examine the replicability of the simulation results using real data from a previously published study, sampling with replacement from a double-blind controlled study examining the treatment of depression was used to create a Monte Carlo data set from which power could be calculated from the relative frequencies of rejecting H0.
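A compact Monte Carlo sketch in the same spirit, assuming two groups measured at five time points with either CS or AR(1) errors, comparing the endpoint difference-score t test with a t test on per-subject OLS slopes; all parameter values are illustrative, and dropout weighting is omitted.

```python
import numpy as np
from scipy import stats

def simulate_power(structure, n=30, t_points=5, slope_diff=0.15,
                   reps=2000, alpha=0.05, seed=0):
    """Power of (a) the baseline-to-endpoint difference-score t test and
    (b) a t test on per-subject OLS slopes, under CS or AR(1) errors."""
    rng = np.random.default_rng(seed)
    times = np.arange(t_points, dtype=float)
    if structure == "CS":
        cov = 0.6 * np.ones((t_points, t_points)) + 0.4 * np.eye(t_points)
    else:                                    # AR(1)
        cov = 0.8 ** np.abs(np.subtract.outer(times, times))
    rej = np.zeros(2)
    for _ in range(reps):
        g1 = rng.multivariate_normal(np.zeros(t_points), cov, n)
        g2 = rng.multivariate_normal(slope_diff * times, cov, n)
        # (a) baseline-to-endpoint difference scores
        d1, d2 = g1[:, -1] - g1[:, 0], g2[:, -1] - g2[:, 0]
        rej[0] += stats.ttest_ind(d1, d2).pvalue < alpha
        # (b) per-subject OLS slopes over all time points
        s1 = np.polyfit(times, g1.T, 1)[0]
        s2 = np.polyfit(times, g2.T, 1)[0]
        rej[1] += stats.ttest_ind(s1, s2).pvalue < alpha
    return {"endpoint": rej[0] / reps, "slopes": rej[1] / reps}

for structure in ("CS", "AR1"):
    print(structure, simulate_power(structure))
```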

19.
Repeated measures analyses of variance are the method of choice in many studies from experimental psychology and the neurosciences. Data from these fields are often characterized by small sample sizes, high numbers of factor levels of the within-subjects factor(s), and nonnormally distributed response variables such as response times. For a design with a single within-subjects factor, we investigated Type I error control in univariate tests with corrected degrees of freedom, the multivariate approach, and a mixed-model (multilevel) approach (SAS PROC MIXED) with Kenward-Roger's adjusted degrees of freedom. We simulated multivariate normal and nonnormal distributions with varied population variance-covariance structures (spherical and nonspherical), sample sizes (N), and numbers of factor levels (K). For normally distributed data, as expected, the univariate approach with Huynh-Feldt correction controlled the Type I error rate with only very few exceptions, even if sample sizes as low as three were combined with high numbers of factor levels. The multivariate approach also controlled the Type I error rate, but it requires N > K. PROC MIXED often showed acceptable control of the Type I error rate for normal data, but it also produced several liberal or conservative results. For nonnormal data, all of the procedures showed clear deviations from the nominal Type I error rate in many conditions, even for sample sizes greater than 50. Thus, none of these approaches can be considered robust if the response variable is nonnormally distributed. The results indicate that both the variance heterogeneity and covariance heterogeneity of the population covariance matrices affect the error rates.
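For the univariate corrected-df approach, the Greenhouse-Geisser and Huynh-Feldt epsilons can be sketched directly from the sample covariance matrix; the formulas below are the standard ones, assuming a single within-subjects factor and one group.

```python
import numpy as np

def sphericity_epsilons(data):
    """Greenhouse-Geisser and Huynh-Feldt epsilon estimates for one
    within-subjects factor; data is (n_subjects, k_levels). Corrected
    df for the F test are eps*(k-1) and eps*(k-1)*(n-1)."""
    n, k = data.shape
    s = np.cov(data, rowvar=False)
    c = np.eye(k)[:-1] - np.eye(k)[1:]      # difference contrasts, rows sum to 0
    q, _ = np.linalg.qr(c.T)                # orthonormal basis of contrast space
    v = q.T @ s @ q                         # covariance of contrast scores
    eps_gg = np.trace(v) ** 2 / ((k - 1) * np.trace(v @ v))
    eps_hf = (n * (k - 1) * eps_gg - 2) / ((k - 1) * (n - 1 - (k - 1) * eps_gg))
    return eps_gg, min(eps_hf, 1.0)

rng = np.random.default_rng(4)
print(sphericity_epsilons(rng.normal(size=(20, 6))))
```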

20.
Moderated multiple regression (MMR) arguably is the most popular statistical technique for investigating regression slope differences (interactions) across groups (e.g., aptitude-treatment interactions in training and differential test score-job performance prediction in selection testing). However, heterogeneous error variances can greatly bias the typical MMR analysis, and the conditions that cause heterogeneity are not uncommon. Statistical corrections that have been developed require special calculations and are not conducive to follow-up analyses that describe an interaction effect in depth. For 2-group studies, a weighted least squares (WLS) approach is recommended: it is statistically accurate, is readily executed through popular software packages (e.g., SAS Institute, 1999; SPSS, 1999), and allows follow-up tests.
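A minimal sketch of a 2-group WLS moderated regression, assuming weights equal to the reciprocal of each group's OLS residual variance (a standard WLS remedy; the article's exact recipe may differ in detail), using statsmodels.

```python
import numpy as np
import statsmodels.api as sm

def wls_mmr(y, x, group):
    """Moderated multiple regression of y on x, group, and x*group, with
    WLS weights 1/s2_g taken from per-group OLS residual variances."""
    g = np.asarray(group)
    resid_var = {}
    for level in np.unique(g):
        m = g == level
        fit = sm.OLS(y[m], sm.add_constant(x[m])).fit()
        resid_var[level] = fit.mse_resid
    w = np.array([1.0 / resid_var[level] for level in g])
    design = sm.add_constant(np.column_stack([x, g, x * g]))
    return sm.WLS(y, design, weights=w).fit()

# simulated 2-group data with heterogeneous error variances
rng = np.random.default_rng(5)
n = 120
x = rng.normal(size=n)
grp = rng.integers(0, 2, n)
y = 1 + 0.5 * x + 0.4 * x * grp + rng.normal(0, np.where(grp, 2.0, 1.0), n)
print(wls_mmr(y, x, grp).summary().tables[1])
```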
