Similar literature
20 similar articles found (search time: 31 ms)
1.
The extent to which rank transformations result in the same statistical decisions as their non‐parametric counterparts is investigated. Simulations are presented using the Wilcoxon–Mann–Whitney test, the Wilcoxon signed‐rank test and the Kruskal–Wallis test, together with the rank transformations and t and F tests corresponding to each of those non‐parametric methods. In addition to Type I errors and power over all simulations, the study examines the consistency of the outcomes of the two methods on each individual sample. The results show how acceptance or rejection of the null hypothesis and differences in p‐values of the test statistics depend in a regular and predictable way on sample size, significance level, and differences between means, for normal and various non‐normal distributions.

2.
For one‐way fixed effects ANOVA, it is well known that the conventional F test of the equality of means is not robust to unequal variances, and numerous methods have been proposed for dealing with heteroscedasticity. On the basis of extensive empirical evidence of Type I error control and power performance, Welch's procedure is frequently recommended as the major alternative to the ANOVA F test under variance heterogeneity. To enhance its practical usefulness, this paper considers an important aspect of Welch's method in determining the sample size necessary to achieve a given power. Simulation studies are conducted to compare two approximate power functions of Welch's test for their accuracy in sample size calculations over a wide variety of model configurations with heteroscedastic structures. The numerical investigations show that Levy's (1978a) approach is clearly more accurate than the formula of Luh and Guo (2011) for the range of model specifications considered here. Accordingly, computer programs are provided to implement the technique recommended by Levy for power calculation and sample size determination within the context of the one‐way heteroscedastic ANOVA model.
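As a rough illustration of the test statistic this abstract builds on, the following is a minimal sketch of Welch's heteroscedastic one-way F statistic and its approximate degrees of freedom. It is not the power or sample-size program the paper provides; the function name `welch_anova` is hypothetical, and no p-value is computed because that would need an F distribution routine outside the standard library.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Return (F, df1, df2) for Welch's heteroscedastic one-way test.

    `groups` is a list of samples (each a list of numbers with n >= 2
    and non-zero variance). Hypothetical helper for illustration only.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Precision weights w_i = n_i / s_i^2
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    W = sum(w)
    # Variance-weighted grand mean
    grand = sum(wi * mi for wi, mi in zip(w, m)) / W
    lam = sum((1 - wi / W) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    numerator = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)
```

With two groups the statistic reduces to the square of Welch's two-sample t, and the second degrees of freedom reduce to the Welch–Satterthwaite value, which gives a convenient sanity check.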

3.
We derive the statistical power functions in multi‐site randomized trials with multiple treatments at each site, using multi‐level modelling. An F statistic is used to test multiple parameters in the multi‐level model instead of the Wald chi square test as suggested in the current literature. The F statistic is shown to be more conservative than the Wald statistic in testing any overall treatment effect among the multiple study conditions. In addition, we improvise an easy way to estimate the non‐centrality parameters for the means comparison t‐tests and the F test, using Helmert contrast coding in the multi‐level model. The variance of treatment means, which is difficult to fathom but necessary for power analysis, is decomposed into intuitive simple effect sizes in the contrast tests. The method is exemplified by a multi‐site evaluation study of the behavioural interventions for cannabis dependence.

4.
Yuen's two‐sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non‐normality and unequal sample sizes. Given the specified α and the power (1 − β), the minimum sample size needed by the proposed formulas under various conditions is smaller than that given by the conventional formulas. Moreover, given a specified size of sample calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.
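For readers unfamiliar with the statistic itself, here is a minimal sketch of Yuen's two-sample test based on trimmed means and Winsorized variances. It shows only the test statistic and its approximate degrees of freedom, not the sample-size formulas developed in the paper; the function names and the default 20% trimming proportion are assumptions made for illustration.

```python
import math

def trimmed_stats(x, trim=0.2):
    """Return (trimmed mean, Winsorized variance, effective size h)."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(trim * n)      # observations cut from each tail
    h = n - 2 * g                 # observations left after trimming
    core = xs[g:n - g]
    t_mean = sum(core) / h
    # Winsorized sample: each trimmed value is replaced by the nearest kept value
    wins = [core[0]] * g + core + [core[-1]] * g
    w_mean = sum(wins) / n
    w_var = sum((v - w_mean) ** 2 for v in wins) / (n - 1)
    return t_mean, w_var, h

def yuen(x, y, trim=0.2):
    """Return (t, df) for Yuen's two-sample trimmed-mean test."""
    m1, v1, h1 = trimmed_stats(x, trim)
    m2, v2, h2 = trimmed_stats(y, trim)
    d1 = (len(x) - 1) * v1 / (h1 * (h1 - 1))
    d2 = (len(y) - 1) * v2 / (h2 * (h2 - 1))
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df
```

Setting `trim=0` removes nothing from either tail, so the statistic falls back to Welch's approximate t test, the comparison method named in the abstract.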

5.
Researchers often want to demonstrate a lack of interaction between two categorical predictors on an outcome. To justify a lack of interaction, researchers typically accept the null hypothesis of no interaction from a conventional analysis of variance (ANOVA). This method is inappropriate as failure to reject the null hypothesis does not provide statistical evidence to support a lack of interaction. This study proposes a bootstrap‐based intersection–union test for negligible interaction that provides coherent decisions between the omnibus test and post hoc interaction contrast tests and is robust to violations of the normality and variance homogeneity assumptions. Further, a multiple comparison strategy for testing interaction contrasts following a non‐significant omnibus test is proposed. Our simulation study compared the Type I error control, omnibus power and per‐contrast power of the proposed approach to the non‐centrality‐based negligible interaction test of Cheng and Shao (2007, Statistica Sinica, 17, 1441). For 2 × 2 designs, the empirical Type I error rates of the Cheng and Shao test were very close to the nominal α level when the normality and variance homogeneity assumptions were satisfied; however, only our proposed bootstrapping approach was satisfactory under non‐normality and/or variance heterogeneity. In general a × b designs, although the omnibus Cheng and Shao test, as expected, is the most powerful, it is not robust to assumption violation and results in incoherent omnibus and interaction contrast decisions that are not possible with the intersection–union approach.

6.
Repeated measures designs have been widely employed in psychological experimentation; however, such designs have rarely been analyzed by means of permutation procedures. In the present paper certain aspects of hypothesis tests in a particular repeated measures design (one non-repeated factor (A) and one repeated factor (B) with K subjects per level of A) were investigated by means of permutation rather than sampling processes. The empirical size and power of certain normal theory F-tests obtained under permutation were compared to their nominal normal theory values. Data sets were established in which various combinations of kurtosis of subject means and intra-subject variance heterogeneity existed in order that their effect upon the agreement of these two models could be ascertained. The results indicated that except in cases of high intra-subject variance heterogeneity, the usual F-tests on B and AB exhibited approximately the same size and power characteristics whether based upon a permutation or normal theory sampling basis. This research was prepared under Contract No. 2593 from the Cooperative Research Branch of the U. S. Office of Education.

7.
A common question of interest to researchers in psychology is the equivalence of two or more groups. Failure to reject the null hypothesis of traditional hypothesis tests such as the ANOVA F‐test (i.e., H0: μ1 = … = μk) does not imply the equivalence of the population means. Researchers interested in determining the equivalence of k independent groups should apply a one‐way test of equivalence (e.g., Wellek, 2003). The goals of this study were to investigate the robustness of the one‐way Wellek test of equivalence to violations of the homogeneity of variance assumption, and to compare the Type I error rates and power of the Wellek test with a heteroscedastic version which was based on the logic of the one‐way Welch (1951) F‐test. The results indicate that the proposed Wellek–Welch test was insensitive to violations of the homogeneity of variance assumption, whereas the original Wellek test was not appropriate when the population variances were not equal.

8.
In contrast to prospective power analysis, retrospective power analysis provides an estimate of the statistical power of a hypothesis test after an investigation has been conducted rather than before. In this article, three approaches to obtaining point estimates of power and an interval estimation algorithm are delineated. Previous research on the bias and sampling error of these estimates is briefly reviewed. Finally, an SAS macro that calculates the point and interval estimates is described. The macro was developed to estimate the power of an F test (obtained from analysis of variance, multiple regression analysis, or any of several multivariate analyses), but it may be easily adapted for use with other statistics, such as chi-square tests or t tests.

9.
Research problems that require a non‐parametric analysis of multifactor designs with repeated measures arise in the behavioural sciences. There is, however, a lack of available procedures in commonly used statistical packages. In the present study, a generalization of the aligned rank test for the two‐way interaction is proposed for the analysis of the typical sources of variation in a three‐way analysis of variance (ANOVA) with repeated measures. It can be implemented in the usual statistical packages. Its statistical properties are tested by using simulation methods with two sample sizes (n = 30 and n = 10) and three distributions (normal, exponential and double exponential). Results indicate substantial increases in power for non‐normal distributions in comparison with the usual parametric tests. Similar levels of Type I error for both parametric and aligned rank ANOVA were obtained with non‐normal distributions and large sample sizes. Degrees‐of‐freedom adjustments for Type I error control in small samples are proposed. The procedure is applied to a case study with 30 participants per group where it detects gender differences in linguistic abilities in blind children not shown previously by other methods.

10.
This article proposes an approach to modelling partially cross‐classified multilevel data where some of the level‐1 observations are nested in one random factor and some are cross‐classified by two random factors. A simulation study compares the proposed approach with two other commonly used approaches, which treat the partially cross‐classified data as either fully nested or fully cross‐classified. Results show that the proposed approach demonstrates desirable performance in terms of parameter estimates and statistical inferences. Both the fully nested model and the fully cross‐classified model suffer from biased estimates of some variance components and biased statistical inferences for some fixed effects. Results also indicate that the proposed model is robust against cluster size imbalance.

11.
In ordinary least squares regression, researchers often are interested in knowing whether a set of parameters is different from zero. With complete data, this could be achieved using the gain in prediction test, hierarchical multiple regression, or an omnibus F test. However, in substantive research scenarios, missing data often exist. In the context of multiple imputation, one of the current state-of-the-art missing data strategies, there are several different analogous multi-parameter tests of the joint significance of a set of parameters, and these multi-parameter test statistics can be referenced to various distributions to make statistical inferences. However, little is known about the performance of these tests, and virtually no research study has compared the Type I error rates and statistical power of these tests in scenarios that are typical of behavioral science data (e.g., small to moderate samples). This paper uses Monte Carlo simulation techniques to examine the performance of these multi-parameter test statistics for multiple imputation under a variety of realistic conditions. We provide a number of practical recommendations for substantive researchers based on the simulation results, and illustrate the calculation of these test statistics with an empirical example.

12.
Two statistics, one recent and one well known, are shown to be equivalent. The recent statistic, p_rep, gives the probability that the sign of an experimental effect is replicable by an experiment of equal power. That statistic is equivalent to the well‐known measure for the area under a receiver operating characteristic (ROC) curve for statistical power against significance level. Both statistics can be seen as exemplifying the area theorem of psychophysics.

13.
Gene V Glass 《Psychometrika》1966,31(4):545-561
The relationship between the factor pattern, F, derived from fallible (containing measurement error) observations on variables and the factor pattern, F*, derived from infallible observations on variables is investigated. A widely believed relationship between F and F*, viz., F* = AF where A is a diagonal matrix containing the inverses of the square roots of the reliabilities of the variables, is shown to be false for several factor analytic techniques. Under suitable assumptions, it is shown that for Kaiser and Caffrey's alpha factor analysis F* and F are related by F* = AF. Empirical examples for which the corresponding elements of F* and AF are equal to two decimal places are presented. The implications of the equality of F* and AF for alpha factor analysis are discussed. I wish to acknowledge the generous assistance of Drs. Chester W. Harris and Henry F. Kaiser in the execution of the research reported in this paper.
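The relation F* = AF amounts to dividing each variable's row of loadings by the square root of that variable's reliability. The sketch below shows only that arithmetic, with made-up loadings and reliabilities; the function name `disattenuate` is hypothetical, and, as the abstract stresses, the relation is shown to hold for alpha factor analysis under suitable assumptions and to be false for several other factor analytic techniques.

```python
import math

def disattenuate(F, reliabilities):
    """Compute A F, where A = diag(1 / sqrt(reliability_i)).

    F is a list of rows (one row of loadings per variable);
    reliabilities holds one value in (0, 1] per variable.
    """
    return [[loading / math.sqrt(r) for loading in row]
            for row, r in zip(F, reliabilities)]

# Illustrative (invented) values: two variables, two factors
F = [[0.6, 0.3],
     [0.4, 0.5]]
F_star = disattenuate(F, [0.81, 0.64])
```

Here the first row is divided by 0.9 and the second by 0.8, so every loading grows in absolute value, and the growth is larger for the less reliable variable.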

14.
Unproctored Internet testing (UIT) is becoming more popular in employment settings due to its cost effectiveness and efficiency. However, one of the major concerns with UIT is the possibility of cheating behaviors: a more capable conspirator can sit beside the real applicant and answer test items, or the applicant may use unauthorized materials. The present study examined the effectiveness of using a proctored verification test following the UIT to identify cheating in UIT, where two test statistics, a Z‐test and a likelihood ratio (LR) test, compare the consistency of test performance across the testing conditions. A simulation study was conducted to test the effectiveness of the two test statistics for a computerized adaptive test format. Results indicate that both test statistics have high power to detect dishonest job applicants at low Type I error rates. Compared with the LR test, the Z‐test was more efficient and effective and is therefore recommended for practical applications. The theoretical and practical implications are discussed.

15.
Randomization tests are a class of nonparametric statistics that determine the significance of treatment effects. Unlike parametric statistics, randomization tests do not assume a random sample, or make any of the distributional assumptions that often preclude statistical inferences about single‐case data. A feature that randomization tests share with parametric statistics, however, is the derivation of a p‐value. P‐values are notoriously misinterpreted and are partly responsible for the putative “replication crisis.” Behavior analysts might question the utility of adding such a controversial index of statistical significance to their methods, so it is the aim of this paper to describe the randomization test logic and its potentially beneficial consequences. In doing so, this paper will: (1) address the replication crisis as a behavior analyst views it, (2) differentiate the problematic p‐values of parametric statistics from the, arguably, more useful p‐values of randomization tests, and (3) review the logic of randomization tests and their unique fit within the behavior analytic tradition of studying behavioral processes that cut across species.
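The randomization logic can be sketched in a few lines: the p-value is the share of admissible reassignments of the observed scores that yield a test statistic at least as extreme as the one actually obtained. The example below is a generic two-condition version using a difference in means and every possible split of the pooled data. It assumes complete random assignment of observations to conditions; a single-case design would instead enumerate only the assignments its own randomization scheme permits. The function name is hypothetical.

```python
from itertools import combinations

def randomization_p(a, b):
    """One-sided exact p-value for mean(b) - mean(a).

    a and b are lists of scores from two conditions. All ways of
    choosing len(b) scores from the pooled data are enumerated.
    """
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = sum(b) / n_b - sum(a) / n_a
    count = hits = 0
    for idx in combinations(range(len(pooled)), n_b):
        sum_b = sum(pooled[i] for i in idx)
        diff = sum_b / n_b - (total - sum_b) / n_a
        count += 1
        # Small tolerance so ties with the observed value are counted
        if diff >= observed - 1e-12:
            hits += 1
    return hits / count
```

For `a = [1, 2, 3]` and `b = [4, 5, 6]` there are 20 possible splits and only the observed one reaches the observed difference, so the p-value is 1/20. Because the reference distribution is built from the data at hand, no random-sampling or normality assumption enters the calculation.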

16.
Data obtained from one‐way independent groups designs are typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test and the Welch test, the latter applied either to the usual least squares estimators of central tendency and variability or to robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal.

17.
Random effects meta‐regression is a technique to synthesize results of multiple studies. It allows for a test of an overall effect, as well as for tests of effects of study characteristics, that is, (discrete or continuous) moderator effects. We describe various procedures to test moderator effects: the z, t, likelihood ratio (LR), Bartlett‐corrected LR (BcLR), and resampling tests. We compare the Type I error of these tests, and conclude that the common z test, and to a lesser extent the LR test, do not perform well since they may yield Type I error rates appreciably larger than the chosen alpha. The error rate of the resampling test is accurate, closely followed by the BcLR test. The error rate of the t test is less accurate but arguably tolerable. With respect to statistical power, the BcLR and t tests slightly outperform the resampling test. Therefore, our recommendation is to use either the resampling or the BcLR test. If these statistics are unavailable, then the t test should be used since it is certainly superior to the z test.

18.
19.
Correlated multivariate ordinal data can be analysed with structural equation models. Parameter estimation has been tackled in the literature using limited-information methods including three-stage least squares and pseudo-likelihood estimation methods such as pairwise maximum likelihood estimation. In this paper, two likelihood ratio test statistics and their asymptotic distributions are derived for testing overall goodness-of-fit and nested models, respectively, under the estimation framework of pairwise maximum likelihood estimation. Simulation results show a satisfactory performance of Type I error and power for the proposed test statistics and also suggest that the performance of the proposed test statistics is similar to that of the test statistics derived under the three-stage diagonally weighted and unweighted least squares. Furthermore, the corresponding model selection criteria under the pairwise framework, AIC and BIC, show satisfactory results in selecting the right model in our simulation examples. The derivation of the likelihood ratio test statistics and model selection criteria under the pairwise framework, together with pairwise estimation, provides a flexible framework for fitting and testing structural equation models for ordinal as well as for other types of data. The test statistics derived and the model selection criteria are used on data on ‘trust in the police’ selected from the 2010 European Social Survey. The proposed test statistics and the model selection criteria have been implemented in the R package lavaan.

20.
A Monte Carlo simulation was conducted to compare five pairwise multiple comparison procedures. The number of means varied from 4 to 6 and the sample size ratio varied from 1 to 60. Procedures were evaluated on the basis of Type I errors, any‐pair power and all‐pairs power. Four procedures were shown to be conservative, while the fifth provided adequate control of Type I errors only for restricted values of sample size ratios. No procedure was found to be uniformly most powerful. The Tukey‐Kramer procedure was found to provide the best any‐pair power provided it is applied without requiring a significant overall F test. In most cases, the Hayter‐Fisher modification of the Tukey‐Kramer was found to provide very good any‐pair power and to be uniformly more powerful than the Tukey‐Kramer when a significant overall F test is required. A partition‐based version of Peritz's method usually provided the greatest all‐pairs power. A modification of the Shaffer‐Welsch was found to be useful in certain conditions.
