Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Valid use of the traditional independent samples ANOVA procedure requires that the population variances are equal. Previous research has investigated whether variance homogeneity tests, such as Levene's test, are satisfactory as gatekeepers for identifying when to use or not to use the ANOVA procedure. This research focuses on a novel homogeneity of variance test that incorporates an equivalence testing approach. Instead of testing the null hypothesis that the variances are equal against an alternative hypothesis that the variances are not equal, the equivalence-based test evaluates the null hypothesis that the difference in the variances falls outside or on the border of a predetermined interval against an alternative hypothesis that the difference in the variances falls within the predetermined interval. Thus, with the equivalence-based procedure, the alternative hypothesis is aligned with the research hypothesis (variance equality). A simulation study demonstrated that the equivalence-based test of population variance homogeneity is a better gatekeeper for the ANOVA than traditional homogeneity of variance tests.

2.
Preliminary tests of equality of variances used before a test of location are no longer widely recommended by statisticians, although they persist in some textbooks and software packages. The present study extends the findings of previous studies and provides further reasons for discontinuing the use of preliminary tests. The study found Type I error rates of a two‐stage procedure, consisting of a preliminary Levene test on samples of different sizes with unequal variances, followed by either a Student pooled‐variances t test or a Welch separate‐variances t test. Simulations disclosed that the two‐stage procedure fails to protect the significance level and usually makes the situation worse. Earlier studies have shown that preliminary tests often adversely affect the size of the test, and also that the Welch test is superior to the t test when variances are unequal. The present simulations reveal that changes in Type I error rates are greater when sample sizes are smaller, when the difference in variances is slight rather than extreme, and when the significance level is more stringent. Furthermore, the validity of the Welch test deteriorates if it is used only on those occasions where a preliminary test indicates it is needed. Optimum protection is assured by using a separate‐variances test unconditionally whenever sample sizes are unequal.

3.
We consider the problem of comparing m latent population distributions when the observed values are scores on a test battery with binary items. The latent densities are assumed to be normal densities, and we consider a test for equality of the means as well as a test for equality of the variances. In addition, we consider a longitudinal model, where the test battery has been applied to the same individuals at different points in time. This model allows for correlations between the latent variable at different time points, and methods are discussed for estimating the correlation coefficient. This work was supported in part by a grant from the Danish Social Science Research Council.

4.
This paper is a presentation of an essential part of the sampling theory of the error variance and the standard error of measurement. An experimental assumption is that several equivalent tests with equal variances are available. These may be either final forms of the same test or obtained by dividing one test into several parts. The simple model of independent and normally distributed errors of measurement with zero mean is employed. No assumption is made about the form of the distributions of true and observed scores. This implies unrestricted freedom in defining the population. First, maximum-likelihood estimators of the error variance and the standard error of measurement are obtained, their sampling distributions given, and their properties investigated. Then unbiased estimators are defined and their distributions derived. The accuracy of estimation is given special consideration from various points of view. Next, rigorous statistical tests are developed to test hypotheses about error variances on the basis of one and two samples. Also the construction of confidence intervals is treated. Finally, Bartlett's test of homogeneity of variances is used to provide a multi-sample test of equality of error variances.
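
As a rough illustration of the multi-sample step mentioned in the abstract above, the following is a minimal sketch of the textbook Bartlett statistic. It is not the paper's own derivation: the function name `bartlett_statistic` and the assumption that each group is a plain list of raw observations are hypothetical choices made for this example, and the p-value step is left out to keep the sketch within the standard library.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (statistic, degrees_of_freedom) for Bartlett's test.

    `groups` is a list of samples, each a list of numbers with at least
    two observations and non-zero sample variance (an assumption of this
    sketch). Under equal population variances and normality, the statistic
    is approximately chi-square distributed with k - 1 degrees of freedom.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]  # unbiased sample variances
    n_total = sum(sizes)

    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    # Bartlett's small-sample correction factor.
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, k - 1
```

The statistic would then be compared with a chi-square critical value on k - 1 degrees of freedom; a large value suggests unequal variances.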

5.
J. Roy, V. K. Murthy. Psychometrika, 1960, 25(3): 243-250
Likelihood ratio tests have been proposed by Wilks for testing the hypothesis of equal means, variances, and covariances (H_mvc) and the hypothesis of equal variances and covariances (H_vc) in a p-variate normal distribution. Using exact distributions of the appropriate likelihood ratio statistics, tables of the .05 and .01 points of these distributions are constructed for p = 4, 5, 6, 7 and sample size n = 25 (5) 60 (10) 100. A correction factor is recommended for larger n. Two numerical examples illustrate use of the tables. A nonparametric test is proposed for H_mvc when the multivariate parent population is known to be non-normal. This research was supported partly by the Office of Naval Research under Contract No. Nonr-855(06) and partly by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. 18(600)-83. Reproduction in whole or in part for any purpose of the United States Government is permitted.

6.
The conventional approach for testing the equality of two normal mean vectors is to test first the equality of covariance matrices, and if the equality assumption is tenable, then use the two-sample Hotelling T² test. Otherwise one can use one of the approximate tests for the multivariate Behrens–Fisher problem. In this article, we study the properties of the Hotelling T² test, the conventional approach, and one of the best approximate invariant tests (Krishnamoorthy & Yu, 2004) for the Behrens–Fisher problem. Our simulation studies indicated that the conventional approach often leads to inflated Type I error rates. The approximate test not only controls Type I error rates very satisfactorily when covariance matrices were arbitrary but was also comparable with the T² test when covariance matrices were equal.

7.
Tests of the null hypothesis for comparisons involving sample means use the t test when the conditions of the z test cannot be met. The two tests have different rationales and can lead to different conclusions regarding significance. In the present study, the authors compared the properties of t and z in simulation runs. The differences in the results arise from fluctuations in the t test sample variances that do not exist in the z test, and those differences lead to differences in designating the significance of comparisons.
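
The contrast described in the abstract above can be sketched with a small simulation. This is an illustrative setup, not the authors' design: it assumes a one-sample comparison against a true mean of zero, a fixed sample size of 10, a known population standard deviation for the z test, and hypothetical names such as `compare_z_and_t`. The critical values are the standard two-sided 5% points (1.960 for the normal distribution, 2.262 for t with 9 degrees of freedom).

```python
import math
import random
from statistics import mean, stdev

SAMPLE_SIZE = 10      # fixed so that the t critical value below applies
Z_CRIT = 1.960        # two-sided 5% point of the standard normal
T_CRIT_DF9 = 2.262    # two-sided 5% point of t with 9 degrees of freedom


def compare_z_and_t(n_runs=5000, sigma=1.0, seed=1):
    """Simulate samples under a true null and compare z and t decisions."""
    rng = random.Random(seed)
    z_rejects = t_rejects = disagreements = 0
    for _ in range(n_runs):
        sample = [rng.gauss(0.0, sigma) for _ in range(SAMPLE_SIZE)]
        m = mean(sample)
        # z uses the known population sigma; t uses the sample estimate,
        # which fluctuates from run to run.
        z = m / (sigma / math.sqrt(SAMPLE_SIZE))
        t = m / (stdev(sample) / math.sqrt(SAMPLE_SIZE))
        z_sig = abs(z) > Z_CRIT
        t_sig = abs(t) > T_CRIT_DF9
        z_rejects += z_sig
        t_rejects += t_sig
        disagreements += z_sig != t_sig
    return {
        "z_rate": z_rejects / n_runs,
        "t_rate": t_rejects / n_runs,
        "disagreement_rate": disagreements / n_runs,
    }
```

In this sketch both tests should reject at roughly the nominal rate, yet they need not flag the same samples, because only the t statistic depends on the sample standard deviation.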

8.
A procedure for developing alternate test forms that are parallel in the sense that scores on the different forms have similar means, standard deviations, and factor structures is described and applied to a bio-data inventory and a situational judgment test. Careful consideration of item-by-item parallelism during development resulted in alternate forms that were parallel at the item level. Further, comparison with a biodata test form comprised of items randomly selected from a pool of biodata items revealed that for the types of measures described here it may be necessary to produce parallel forms of each item to create alternate forms that are parallel in the way in which Cronbach (1947) originally defined parallelism.

9.
Assessment centers rely on multiple, carefully constructed behavioral simulation exercises to measure individuals on multiple performance dimensions. Although methods for establishing parallelism among alternate forms of paper-and-pencil tests have been well researched (i.e., to equate tests on difficulty such that the scores can be compared), little research has considered the why and how of parallel simulation exercises. This paper extends established procedures for constructing parallel test forms to dimension-based behavioral simulations. We discuss reasons for establishing comparable, alternate simulation forms and discuss the issues raised when applying traditional procedures to simulation exercises. After proposing a set of guidelines for establishing alternate forms among simulations, we apply these guidelines to simulations used in an operational assessment center.

10.
Many empirical studies measure psychometric functions (curves describing how observers’ performance varies with stimulus magnitude) because these functions capture the effects of experimental conditions. To assess these effects, parametric curves are often fitted to the data and comparisons are carried out by testing for equality of mean parameter estimates across conditions. This approach is parametric and, thus, vulnerable to violations of the implied assumptions. Furthermore, testing for equality of means of parameters may be misleading: Psychometric functions may vary meaningfully across conditions on an observer-by-observer basis with no effect on the mean values of the estimated parameters. Alternative approaches to assess equality of psychometric functions per se are thus needed. This paper compares three nonparametric tests that are applicable in all situations of interest: The existing generalized Mantel–Haenszel test, a generalization of the Berry–Mielke test that was developed here, and a split variant of the generalized Mantel–Haenszel test also developed here. Their statistical properties (accuracy and power) are studied via simulation and the results show that all tests are indistinguishable as to accuracy but they differ non-uniformly as to power. Empirical use of the tests is illustrated via analyses of published data sets and practical recommendations are given. The computer code in MATLAB and R to conduct these tests is available as Electronic Supplemental Material.

11.
Many books on statistical methods advocate a ‘conditional decision rule’ when comparing two independent group means. This rule states that the decision as to whether to use a ‘pooled variance’ test that assumes equality of variance or a ‘separate variance’ Welch t test that does not should be based on the outcome of a variance equality test. In this paper, we empirically examine the Type I error rate of the conditional decision rule using four variance equality tests and compare this error rate to the unconditional use of either of the t tests (i.e. irrespective of the outcome of a variance homogeneity test) as well as several resampling‐based alternatives when sampling from 49 distributions varying in skewness and kurtosis. Several unconditional tests including the separate variance test performed as well as or better than the conditional decision rule across situations. These results extend and generalize the findings of previous researchers who have argued that the conditional decision rule should be abandoned.

12.
The data obtained from one‐way independent groups designs are typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal.

13.
Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when the regression is forced through the origin). This paper shows that this approach renders highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment with which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions raised through inadequate application of regression analyses.

14.
Two test statistics are proposed for testing the equality of two correlated proportions when some observations are missing on both responses. The performance of these tests in terms of size and power is compared with other tests by means of Monte Carlo simulations. The proposed tests are easily computed and compare favorably with other tests.

15.
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non‐normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann–Yuen test of equivalence, which utilizes trimmed means, with that of the previously recommended Schuirmann and Schuirmann–Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann–Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann–Welch tests, and the power of the Schuirmann–Yuen was substantially greater than that of the Schuirmann or Schuirmann–Welch tests when distributions were skewed or outliers were present. The Schuirmann–Yuen test is recommended for assessing clinical significance with normative comparisons.

16.
For one‐way fixed effects ANOVA, it is well known that the conventional F test of the equality of means is not robust to unequal variances, and numerous methods have been proposed for dealing with heteroscedasticity. On the basis of extensive empirical evidence of Type I error control and power performance, Welch's procedure is frequently recommended as the major alternative to the ANOVA F test under variance heterogeneity. To enhance its practical usefulness, this paper considers an important aspect of Welch's method in determining the sample size necessary to achieve a given power. Simulation studies are conducted to compare two approximate power functions of Welch's test for their accuracy in sample size calculations over a wide variety of model configurations with heteroscedastic structures. The numerical investigations show that Levy's (1978a) approach is clearly more accurate than the formula of Luh and Guo (2011) for the range of model specifications considered here. Accordingly, computer programs are provided to implement the technique recommended by Levy for power calculation and sample size determination within the context of the one‐way heteroscedastic ANOVA model.
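
For readers unfamiliar with Welch's procedure named in the abstract above, the following is a minimal sketch of the standard Welch heteroscedastic one-way test statistic and its approximate degrees of freedom. It does not implement the power or sample-size approximations of Levy or of Luh and Guo that the paper compares. The function name `welch_anova` and the list-of-lists input format are assumptions made for this example.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way test of equal means.

    `groups` is a list of samples, each with at least two observations
    and non-zero sample variance (an assumption of this sketch).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Precision weights: each group counts in proportion to n_i / s_i^2.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)  # approximate denominator df
    return f_stat, df1, df2
```

With exactly two groups the statistic reduces to the square of Welch's separate-variances t, and df2 reduces to the Welch–Satterthwaite degrees of freedom, which gives a convenient sanity check for the sketch.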

17.
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration‐based set of hypotheses containing equality constraints on the means, or a theory‐based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory‐based hypotheses) has advantages over exploration (i.e., examining all possible equality‐constrained hypotheses). Furthermore, examining reasonable order‐restricted hypotheses has more power to detect the true effect/non‐null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory‐based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number).

18.
Approximate randomization tests are alternatives to conventional parametric statistical methods used when the normality and homoscedasticity assumptions are violated. This article presents an SAS program that tests the equality of two means using an approximate randomization test. This program can serve as a template for testing other hypotheses, which is illustrated by modifications to test the significance of a correlation coefficient or the equality of more than two means.
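
The idea behind an approximate randomization test of two means can be sketched as follows. This is a Python illustration rather than the SAS program the article presents; the function name `approx_randomization_test`, the default number of shuffles, and the use of the absolute mean difference as the test statistic are all assumptions made for this example.

```python
import random


def approx_randomization_test(x, y, n_shuffles=10_000, seed=0):
    """Two-sided approximate randomization test for equality of two means.

    Group labels are repeatedly shuffled at random; the p-value is the
    share of shuffles whose absolute mean difference is at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    n_x = len(x)
    n_y = len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    pooled = list(x) + list(y)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1

    # Adding 1 to numerator and denominator counts the observed
    # arrangement itself, so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)
```

Because the reference distribution is built from the data themselves, the sketch makes no normality or equal-variance assumption; swapping in a different statistic (for example a correlation coefficient) follows the same template, as the abstract notes for the original program.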

19.
Although substitution tests have been included in tests of intelligence for years, the underlying abilities they measure have still not been clearly determined. This study used componential analysis to investigate the information-processing components underlying substitution test performance. The bases of sex and age differences were also of interest. One hundred subjects from each of three age groups (9–11, 18–25, and 60–89 years) were tested. The componential analysis found that substitution tests measure perceptual speed and, to a lesser extent, memory ability and writing speed. The component “Stimulus Orientation, Response Initiation, and Execution” was related to substitution test performance in the sample of children and the sample of older adults but not in the sample of younger adults. Verbal ability was not significantly related to substitution test performance in the two younger samples but was strongly related to substitution performance in the oldest sample. Although females outperformed males on the Symbol Digit Test, males did as well as females on the computerized tasks. Apparently, sex differences in substitution test performance cannot be explained by the components of the test measured here.

20.
A goodness-of-fit test based on the maximum likelihood criterion is derived for use in evaluating models of choice reaction time that predict choice probabilities and means and variances of latency. Special cases of the test involving models that predict only one or two of these statistics are considered and shown to be asymptotically identical to the traditional goodness-of-fit tests appropriate for these special cases.
