Similar Literature
A total of 20 similar documents were retrieved.
1.
Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is known to be a serious problem when comparing the means of two independent groups because heavy-tailed distributions can substantially reduce power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well to Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but Wilcox's method is based on a statistic that has a finite sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.
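For readers unfamiliar with the estimator discussed above, the following is a minimal sketch of the one-step M-estimator of location as it is commonly defined (Huber's ψ with the constant K = 1.28 and the normal-consistent MAD); the function name and constants are illustrative choices, not taken from the paper.

```python
import numpy as np

def onestep_m(x, k=1.28):
    """One-step M-estimator of location (Huber psi, K = 1.28): flag points far
    from the median in MADN units, then combine the remaining values with a
    correction term for the number flagged in each tail."""
    x = np.asarray(x, float)
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745   # MAD rescaled to estimate sigma under normality
    u = (x - med) / madn
    i1 = np.sum(u < -k)                          # flagged in the lower tail
    i2 = np.sum(u > k)                           # flagged in the upper tail
    kept = x[np.abs(u) <= k]
    return (k * madn * (i2 - i1) + kept.sum()) / (len(x) - i1 - i2)
```

Comparing two groups would then use the difference between the two group estimates together with a bootstrap or asymptotic standard error; those details are not reproduced here.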

2.
Adverse impact evaluations often call for evidence that the disparity between groups in selection rates is statistically significant, and practitioners must choose which test statistic to apply in this situation. To identify the most effective testing procedure, the authors compared several alternate test statistics in terms of Type I error rates and power, focusing on situations with small samples. Significance testing was found to be of limited value because of low power for all tests. Among the alternate test statistics, the widely-used Z-test on the difference between two proportions performed reasonably well, except when sample size was extremely small. A test suggested by G. J. G. Upton (1982) provided slightly better control of Type I error under some conditions but generally produced results similar to the Z-test. Use of the Fisher Exact Test and Yates's continuity-corrected chi-square test is not recommended because of overly conservative Type I error rates and substantially lower power than the Z-test.
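As a point of reference for the comparison above, here is a minimal sketch of the pooled two-proportion Z-test on selection rates; the function name and the example counts are hypothetical.

```python
import numpy as np
from scipy import stats

def two_proportion_z(hired1, n1, hired2, n2):
    """Pooled two-proportion Z-test on the difference in selection rates."""
    p1, p2 = hired1 / n1, hired2 / n2
    p_pool = (hired1 + hired2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * stats.norm.sf(abs(z))          # two-sided p-value

# Hypothetical example: 12 of 60 majority applicants selected vs. 3 of 40 minority applicants
print(two_proportion_z(12, 60, 3, 40))
```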

3.
In a recent article in The Journal of General Psychology, J. B. Hittner, K. May, and N. C. Silver (2003) described their investigation of several methods for comparing dependent correlations and found that all can be unsatisfactory, in terms of Type I errors, even with a sample size of 300. More precisely, when researchers test at the .05 level, the actual Type I error probability can exceed .10. The authors of this article extended J. B. Hittner et al.'s research by considering a variety of alternative methods. They found 3 that avoid inflating the Type I error rate above the nominal level. However, a Monte Carlo simulation demonstrated that when the underlying distribution of scores violated the assumption of normality, 2 of these methods had relatively low power and had actual Type I error rates well below the nominal level. The authors report comparisons with E. J. Williams' (1959) method.
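To make the setting concrete, the sketch below forms a percentile-bootstrap confidence interval for the difference between two dependent, overlapping correlations, r(x, y1) − r(x, y2); this is a generic bootstrap approach and is not claimed to be one of the three methods the authors found satisfactory. The function name and arguments are illustrative.

```python
import numpy as np

def boot_dependent_corr_diff(x, y1, y2, n_boot=2000, seed=0):
    """Percentile-bootstrap CI for r(x, y1) - r(x, y2) when all three variables
    are measured on the same cases (overlapping dependent correlations)."""
    rng = np.random.default_rng(seed)
    x, y1, y2 = map(np.asarray, (x, y1, y2))
    n = len(x)

    def diff(idx):
        return np.corrcoef(x[idx], y1[idx])[0, 1] - np.corrcoef(x[idx], y2[idx])[0, 1]

    boots = np.array([diff(rng.integers(0, n, n)) for _ in range(n_boot)])
    return np.percentile(boots, [2.5, 97.5])     # interval excluding 0 suggests a difference
```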

4.
Wilcox, Keselman, Muska and Cribbie (2000) found a method for comparing the trimmed means of dependent groups that performed well in simulations, in terms of Type I errors, with a sample size as small as 21. Theory and simulations indicate that little power is lost under normality when using trimmed means rather than untrimmed means, and trimmed means can result in substantially higher power when sampling from a heavy‐tailed distribution. However, trimmed means suffer from two practical concerns described in this paper. Replacing trimmed means with a robust M‐estimator addresses these concerns, but control over the probability of a Type I error can be unsatisfactory when the sample size is small. Methods based on a simple modification of a one‐step M‐estimator that address the problems with trimmed means are examined. Several omnibus tests are compared, one of which performed well in simulations, even with a sample size of 11.

5.
Four applications of permutation tests to the single-mediator model are described and evaluated in this study. Permutation tests work by rearranging data in many possible ways in order to estimate the sampling distribution for the test statistic. The four applications to mediation evaluated here are the permutation test of ab, the permutation joint significance test, and the noniterative and iterative permutation confidence intervals for ab. A Monte Carlo simulation study was used to compare these four tests with the four best available tests for mediation found in previous research: the joint significance test, the distribution of the product test, and the percentile and bias-corrected bootstrap tests. We compared the different methods on Type I error, power, and confidence interval coverage. The noniterative permutation confidence interval for ab was the best performer among the new methods. It successfully controlled Type I error, had power nearly as good as the most powerful existing methods, and had better coverage than any existing method. The iterative permutation confidence interval for ab had lower power than do some existing methods, but it performed better than any other method in terms of coverage. The permutation confidence interval methods are recommended when estimating a confidence interval is a primary concern. SPSS and SAS macros that estimate these confidence intervals are provided.
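For orientation, the sketch below implements one of the comparison methods named above, the percentile bootstrap for the indirect effect ab, using ordinary least squares for the a and b paths. It is not the authors' permutation confidence interval, and the function name is illustrative.

```python
import numpy as np

def boot_indirect(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the indirect effect ab in a single-mediator model."""
    rng = np.random.default_rng(seed)
    x, m, y = map(np.asarray, (x, m, y))
    n = len(x)

    def indirect(xi, mi, yi):
        # a path: slope of M on X; b path: partial slope of Y on M, controlling X
        a = np.linalg.lstsq(np.column_stack([np.ones_like(xi), xi]), mi, rcond=None)[0][1]
        b = np.linalg.lstsq(np.column_stack([np.ones_like(xi), xi, mi]), yi, rcond=None)[0][2]
        return a * b

    ab = indirect(x, m, y)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)              # resample cases with replacement
        boots[i] = indirect(x[idx], m[idx], y[idx])
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return ab, (lo, hi)
```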

6.
Adverse impact is often assessed by evaluating whether the success rates for 2 groups on a selection procedure are significantly different. Although various statistical methods have been used to analyze adverse impact data, Fisher's exact test (FET) has been widely adopted, especially when sample sizes are small. In recent years, however, the statistical field has expressed concern regarding the default use of the FET and has proposed several alternative tests. This article reviews Lancaster's mid-P (LMP) test (Lancaster, 1961), an adjustment to the FET that tends to have increased power while maintaining a Type I error rate close to the nominal level. On the basis of Monte Carlo simulation results, the LMP test was found to outperform the FET across a wide range of conditions typical of adverse impact analyses. The LMP test was also found to provide better control over Type I errors than the large-sample Z-test when sample size was very small, but it tended to have slightly lower power than the Z-test under some conditions.
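A minimal sketch of the one-sided Lancaster mid-P calculation for a 2×2 table is shown below: it is the usual Fisher (hypergeometric) tail probability, but with only half the probability of the observed table counted. The function name and the example table are hypothetical, and two-sided versions require an additional convention not shown here.

```python
from scipy import stats

def lancaster_mid_p(a, b, c, d):
    """One-sided (upper-tail) Lancaster mid-P value for the 2x2 table [[a, b], [c, d]]."""
    total, row1, col1 = a + b + c + d, a + b, a + c
    dist = stats.hypergeom(total, col1, row1)    # cell a is hypergeometric under H0
    return dist.sf(a) + 0.5 * dist.pmf(a)        # P(X > a) + 0.5 * P(X = a)

# Hypothetical selection table; compare with the ordinary one-sided Fisher exact p-value
a, b, c, d = 10, 5, 4, 11
fet_one_sided = stats.hypergeom(a + b + c + d, a + c, a + b).sf(a - 1)   # P(X >= a)
print(lancaster_mid_p(a, b, c, d), fet_one_sided)
```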

7.
This paper is concerned with supplementing statistical tests for the Rasch model so that, in addition to the probability of the error of the first kind (Type I probability), the probability of the error of the second kind (Type II probability) can be controlled at a predetermined level by basing the test on an appropriate number of observations. An approach to determining a practically meaningful extent of model deviation is proposed, and the approximate distribution of the Wald test is derived under the extent of model deviation of interest.
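The practical payoff of such a derivation is a sample-size calculation. The sketch below shows the generic form of that calculation for a Wald-type chi-square test, assuming (as is standard under local alternatives) that the noncentrality parameter grows linearly in the number of observations; the per-observation noncentrality value and the function name are assumptions of this sketch, not quantities taken from the paper.

```python
from scipy import stats

def required_n(df, lam_per_obs, alpha=0.05, power=0.80, n_max=100_000):
    """Smallest n such that a Wald-type chi-square test with `df` degrees of freedom
    reaches the target power, assuming noncentrality ncp = n * lam_per_obs, where
    lam_per_obs encodes the chosen extent of model deviation."""
    crit = stats.chi2.ppf(1 - alpha, df)         # critical value under H0
    for n in range(2, n_max):
        if stats.ncx2.sf(crit, df, n * lam_per_obs) >= power:
            return n
    return None

print(required_n(df=3, lam_per_obs=0.01))        # illustrative values only
```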

8.
A simulation study investigated the effects of skewness and kurtosis on level-specific maximum likelihood (ML) test statistics based on normal theory in multilevel structural equation models. The levels of skewness and kurtosis at each level were manipulated in multilevel data, and the effects of skewness and kurtosis on level-specific ML test statistics were examined. When the assumption of multivariate normality was violated, the level-specific ML test statistics were inflated, resulting in Type I error rates that were higher than the nominal level for the correctly specified model. Q-Q plots of the test statistics against a theoretical chi-square distribution showed that skewness led to a thicker upper tail and kurtosis led to a longer upper tail of the observed distribution of the level-specific ML test statistic for the correctly specified model.

9.
Standard least squares analysis of variance methods suffer from poor power under arbitrarily small departures from normality and fail to control the probability of a Type I error when standard assumptions are violated. This article describes a framework for robust estimation and testing that uses trimmed means with an approximate degrees of freedom heteroscedastic statistic for independent and correlated groups designs in order to achieve robustness to the biasing effects of nonnormality and variance heterogeneity. The authors describe a nonparametric bootstrap methodology that can provide improved Type I error control. In addition, the authors indicate how researchers can set robust confidence intervals around a robust effect size parameter estimate. In an online supplement, the authors use several examples to illustrate the application of an SAS program to implement these statistical methods.
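A concrete two-group instance of the approach described above is Yuen's test: trimmed means compared with a Welch-type heteroscedastic statistic whose squared standard errors are based on winsorized variances and whose degrees of freedom are approximated. The sketch below follows the standard textbook formulas; the function name is illustrative and the SAS program mentioned in the abstract is not reproduced.

```python
import numpy as np
from scipy import stats

def yuen_test(x, y, trim=0.2):
    """Yuen's heteroscedastic test comparing two independent 20% trimmed means."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))

    def parts(v):
        n = len(v)
        g = int(np.floor(trim * n))              # number trimmed from each tail
        h = n - 2 * g                            # effective sample size
        tmean = v[g:n - g].mean()                # trimmed mean
        w = v.copy()
        w[:g], w[n - g:] = v[g], v[n - g - 1]    # winsorize the tails
        d = (n - 1) * w.var(ddof=1) / (h * (h - 1))   # squared standard error
        return tmean, d, h

    tx, dx, hx = parts(x)
    ty, dy, hy = parts(y)
    t = (tx - ty) / np.sqrt(dx + dy)
    df = (dx + dy) ** 2 / (dx ** 2 / (hx - 1) + dy ** 2 / (hy - 1))
    return t, df, 2 * stats.t.sf(abs(t), df)
```

With trim = 0 the statistic reduces to Welch's test on ordinary means.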

10.
When sample observations are not independent, the variance estimate in the denominator of the Student t statistic is altered, inflating the value of the test statistic and resulting in far too many Type I errors. Furthermore, how much the Type I error probability exceeds the nominal significance level is an increasing function of sample size. If N is quite large, in the range of 100 to 200 or larger, small apparently inconsequential correlations that are unknown to a researcher, such as .01 or .02, can have substantial effects and lead to false reports of statistical significance when effect size is zero.
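The size of the effect is easy to verify by simulation. The sketch below draws equicorrelated observations with ρ = .02 and N = 200 and records how often a nominal .05-level one-sample t test rejects a true null hypothesis; the specific constants are illustrative. Analytically, Var(x̄) = (σ²/N)[1 + (N − 1)ρ], so with these values the usual standard error understates the true one by a factor of roughly √4.98 ≈ 2.2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, rho, reps, alpha = 200, 0.02, 10_000, 0.05

cov = np.full((n, n), rho)
np.fill_diagonal(cov, 1.0)                       # equicorrelated, unit-variance observations
L = np.linalg.cholesky(cov)

rejections = 0
for _ in range(reps):
    x = L @ rng.standard_normal(n)               # true mean is exactly zero
    rejections += stats.ttest_1samp(x, 0.0).pvalue < alpha

print(f"empirical Type I error rate: {rejections / reps:.3f}")   # well above .05
```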

11.
12.
Consider two independent groups with K measures for each subject. For the jth group and kth measure, let μtjk be the population trimmed mean, j = 1, 2; k = 1, ..., K. This article compares several methods for testing H0: μt1k = μt2k such that the probability of at least one Type I error is α, and simultaneous probability coverage is 1 - α when computing confidence intervals for μt1k - μt2k. The emphasis is on K = 4 and α = .05. For zero trimming the problem reduces to comparing means, but it is well known that when comparing means, arbitrarily small departures from normality can result in extremely low power relative to using, say, 20% trimming. Moreover, when skewed distributions are being compared, conventional methods for comparing means can be biased for reasons reviewed in the article. A consequence is that in some realistic situations, the probability of rejecting can be higher when the null hypothesis is true versus a situation where the means differ by a half standard deviation. Switching to robust measures of location is known to reduce this problem, and combining robust measures of location with some type of bootstrap method reduces the problem even more. Published articles suggest that for the problem at hand, the percentile t bootstrap, combined with a 20% trimmed mean, will perform relatively well, but there are known situations where it does not eliminate all problems. In this article we consider an extension of the percentile bootstrap approach that is found to give better results.

13.
Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's (1984) asymptotically distribution-free method and Satorra and Bentler's (1988, 1994) mean-scaling statistic were developed under the presumption of nonnormality in the factors and errors. This article finds a new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra and Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third-moment-adjusted statistic asymptotically performs on par with previously proposed methods and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent, and Bibby's (1980) study of students tested for their ability in 5 content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

14.
Based on an improved Wald statistic, a differential item functioning (DIF) detection method designed for two groups is extended to DIF testing across multiple groups. The improved Wald statistics are obtained by computing the observed information matrix (Obs) and the empirical cross-product information matrix (XPD), respectively. A simulation study compared these two approaches with the traditional computation method for DIF detection under multiple groups. Results show that: (1) the Type I error rates of Obs and XPD are markedly lower than those of the traditional method, and under DINA model estimation the Type I error rates of Obs and XPD are close to the nominal level; (2) when sample size and the amount of DIF are large, Obs and XPD have roughly the same statistical power as the traditional Wald statistic.

15.
Methods for comparing means are known to be highly nonrobust in terms of Type II errors. The problem is that slight shifts from normal distributions toward heavy-tailed distributions inflate the standard error of the sample mean. In contrast, the standard errors of various robust measures of location, such as the one-step M-estimator, are relatively unaffected by heavy tails. Wilcox recently examined a method of comparing the one-step M-estimators of location corresponding to two independent groups which provided good control over the probability of a Type I error even for unequal sample sizes, unequal variances, and different shaped distributions. There is a fairly obvious extension of this procedure to pairwise comparisons of more than two independent groups, but simulations reported here indicate that it is unsatisfactory. A slight modification of the procedure is found to give much better results, although some caution must be taken when there are unequal sample sizes and light-tailed distributions. An omnibus test is examined as well.

16.
When more than one significance test is carried out on data from a single experiment, researchers are often concerned with the probability of one or more Type I errors over the entire set of tests. This article considers several methods of exercising control over that probability (the so-called family-wise Type I error rate), provides a schematic that can be used by a researcher to choose among the methods, and discusses applications to contingency tables.
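As one familiar example of the kind of control discussed above, the sketch below implements Holm's step-down procedure, which bounds the family-wise Type I error rate at α; it is offered as a generic illustration rather than as the particular set of methods compared in the article.

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: reject hypotheses in order of ascending p-value,
    comparing the i-th smallest p-value with alpha / (m - i + 1), and stop at the
    first non-rejection."""
    p = np.asarray(pvals, float)
    order = np.argsort(p)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                                # all remaining hypotheses are retained
    return reject

print(holm([0.004, 0.030, 0.012, 0.20]))         # -> [ True False  True False]
```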

17.
Many robust regression estimators have been proposed that have a high, finite-sample breakdown point, roughly meaning that a large proportion of points must be altered to drive the value of an estimator to infinity. But despite this, many of them can be inordinately influenced by two properly placed outliers. With one predictor, an estimator that appears to correct this problem to a fair degree, and simultaneously maintain good efficiency when standard assumptions are met, consists of checking for outliers using a projection-type method, removing any that are found, and applying the Theil–Sen estimator to the data that remain. When dealing with multiple predictors, there are two generalizations of the Theil–Sen estimator that might be used, but nothing is known about how their small-sample properties compare. Also, there are no results on testing the hypothesis of zero slopes, and there is no information about the effect on efficiency when outliers are removed. In terms of hypothesis testing, using the more obvious percentile bootstrap method in conjunction with a slight modification of Mahalanobis distance was found to avoid Type I error probabilities above the nominal level, but in some situations the actual Type I error probabilities can be substantially smaller than intended when the sample size is small. An alternative method is found to be more satisfactory.
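The sketch below conveys the flavor of the one-predictor proposal: screen for outliers, drop them, and fit the Theil–Sen estimator to what remains. The screening step shown here is a crude residual-based MADN rule with a conventional 2.24 cutoff, standing in for the paper's projection-type method, which is considerably more involved; the data and constants are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)
y[:2] += 8                                       # two badly placed outliers

# Preliminary Theil-Sen fit, then flag points whose residuals lie more than
# 2.24 MADN units from the median residual (a conventional cutoff, not the
# paper's projection-type check).
slope0, inter0, _, _ = stats.theilslopes(y, x)
res = y - (inter0 + slope0 * x)
madn = np.median(np.abs(res - np.median(res))) / 0.6745
keep = np.abs(res - np.median(res)) <= 2.24 * madn

# Refit on the retained points; lo and hi give a confidence interval for the slope.
slope, intercept, lo, hi = stats.theilslopes(y[keep], x[keep])
print(slope, (lo, hi))
```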

18.
In a variety of measurement situations, the researcher may wish to compare the reliabilities of several instruments administered to the same sample of subjects. This paper presents eleven statistical procedures which test the equality of m coefficient alphas when the sample alpha coefficients are dependent. Several of the procedures are derived in detail, and numerical examples are given for two. Since all of the procedures depend on approximate asymptotic results, Monte Carlo methods are used to assess the accuracy of the procedures for sample sizes of 50, 100, and 200. Both control of Type I error and power are evaluated by computer simulation. Two of the procedures are unable to control Type I errors satisfactorily. The remaining nine procedures perform properly, but three are somewhat superior in power and Type I error control. A more detailed version of this paper is also available.
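For context, the sketch below computes coefficient alpha from a subjects-by-items score matrix and forms a percentile-bootstrap interval for the difference between two dependent alphas (both instruments administered to the same subjects). This is a generic resampling approach, not one of the eleven asymptotic procedures evaluated in the paper; the function names are illustrative.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_subjects x n_items) score matrix."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def boot_alpha_diff(scale1, scale2, n_boot=2000, seed=0):
    """Percentile-bootstrap CI for the difference between two dependent alphas."""
    rng = np.random.default_rng(seed)
    scale1, scale2 = np.asarray(scale1, float), np.asarray(scale2, float)
    n = scale1.shape[0]
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)              # resample subjects, keeping rows paired
        diffs[i] = cronbach_alpha(scale1[idx]) - cronbach_alpha(scale2[idx])
    return np.percentile(diffs, [2.5, 97.5])
```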

19.
The Type I error probability and the power of the independent samples t test, performed directly on the ranks of scores in combined samples in place of the original scores, are known to be the same as those of the non‐parametric Wilcoxon–Mann–Whitney (WMW) test. In the present study, simulations revealed that these probabilities remain essentially unchanged when the number of ranks is reduced by assigning the same rank to multiple ordered scores. For example, if 200 ranks are reduced to as few as 20, or 10, or 5 ranks by replacing sequences of consecutive ranks by a single number, the Type I error probability and power stay about the same. Significance tests performed on these modular ranks consistently reproduce familiar findings about the comparative power of the t test and the WMW tests for normal and various non‐normal distributions. Similar results are obtained for modular ranks used in comparing the one‐sample t test and the Wilcoxon signed ranks test.
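The modular-rank idea is simple to reproduce. The sketch below ranks 200 combined scores, collapses the ranks into 20 modular ranks by giving each block of 10 consecutive ranks a single value, and compares the resulting t tests with the WMW test; the sample sizes, the shift of .3, and the block size are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x, y = rng.standard_normal(100), rng.standard_normal(100) + 0.3
scores = np.concatenate([x, y])
ranks = stats.rankdata(scores)                   # ranks 1..200

# Collapse 200 ranks into 20 "modular" ranks: each block of 10 consecutive
# ordered ranks receives the same value.
modular = np.ceil(ranks / 10)

t_full = stats.ttest_ind(ranks[:100], ranks[100:])
t_mod = stats.ttest_ind(modular[:100], modular[100:])
wmw = stats.mannwhitneyu(x, y, alternative='two-sided')
print(t_full.pvalue, t_mod.pvalue, wmw.pvalue)   # typically very close to one another
```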

20.
The important assumption of independent errors should be evaluated routinely in the application of interrupted time-series regression models. The two most frequently recommended tests of this assumption [Mood's runs test and the Durbin-Watson (D-W) bounds test] have several weaknesses. The former has poor small sample Type I error performance and the latter has the bothersome property that results are often declared to be "inconclusive." The test proposed in this article is simple to compute (special software is not required), there is no inconclusive region, an exact p-value is provided, and it has good Type I error and power properties relative to competing procedures. It is shown that these desirable properties hold when design matrices of a specified form are used to model the response variable. A Monte Carlo evaluation of the method, including comparisons with other tests (viz., runs, D-W bounds, and D-W beta), and examples of application are provided.
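For reference, the widely used Durbin-Watson statistic mentioned above is simply a ratio of summed squared first differences of the residuals to the summed squared residuals, as sketched below; the paper's proposed alternative test is not reproduced here, and the example residuals are simulated.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order autocorrelation in regression residuals;
    values near 2 suggest independence, values well below 2 positive autocorrelation."""
    e = np.asarray(residuals, float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Example: stand-in residuals from an interrupted time-series regression fit
rng = np.random.default_rng(3)
e = rng.standard_normal(60)
print(durbin_watson(e))                          # close to 2 for independent errors
```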
