Similar Articles
1.
This article considers the problem of comparing two independent groups in terms of some measure of location. It is well known that with Student's two-independent-sample t test, the actual level of significance can be well above or below the nominal level, confidence intervals can have inaccurate probability coverage, and power can be low relative to other methods. Welch's (1938) test deals with heteroscedasticity, but it can have poor power under arbitrarily small departures from normality. Yuen (1974) generalized Welch's test to trimmed means; her method provides improved control over the probability of a Type I error, but problems remain. Transformations for skewness improve matters, but the probability of a Type I error remains unsatisfactory in some situations. We find that a transformation for skewness combined with a bootstrap method improves Type I error control and probability coverage even when sample sizes are small.
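Yuen's test is straightforward to implement. Below is a minimal Python sketch, assuming 20% trimming and the usual Winsorized-variance standard error; the function name and defaults are illustrative, not code from the article.

```python
# Sketch of Yuen's (1974) test for trimmed means (assumed reconstruction).
import numpy as np
from scipy import stats

def yuen_test(x, y, trim=0.2):
    """Compare trimmed means of two independent samples (Yuen, 1974)."""
    def trimmed_parts(a):
        a = np.sort(np.asarray(a, dtype=float))
        n = len(a)
        g = int(np.floor(trim * n))           # number trimmed from each tail
        h = n - 2 * g                         # effective sample size
        tmean = a[g:n - g].mean()             # trimmed mean
        w = a.copy()                          # Winsorize: pull tails inward
        w[:g], w[n - g:] = a[g], a[n - g - 1]
        d = (n - 1) * w.var(ddof=1) / (h * (h - 1))
        return tmean, d, h

    t1, d1, h1 = trimmed_parts(x)
    t2, d2, h2 = trimmed_parts(y)
    T = (t1 - t2) / np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    p = 2 * stats.t.sf(abs(T), df)            # two-sided p value
    return T, df, p
```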

2.
There have been many discussions of how Type I errors should be controlled when many hypotheses are tested (e.g., all possible comparisons of means, correlations, proportions, the coefficients in hierarchical models, etc.). By and large, researchers have adopted familywise (FWER) control, though this practice certainly is not universal. Familywise control is intended to deal with the multiplicity issue of computing many tests of significance, yet such control is conservative (that is, less powerful) compared to per test/hypothesis control. The purpose of our article is to introduce the readership, particularly those readers familiar with issues related to controlling Type I errors when many tests of significance are computed, to newer methods that provide protection from the effects of multiple testing yet are more powerful than familywise controlling methods. Specifically, we introduce a number of procedures that control the k-FWER. Such methods, say 2-FWER instead of 1-FWER (i.e., FWER), are equivalent to specifying that the probability of 2 or more false rejections is controlled at .05, whereas FWER controls the probability of any (i.e., 1 or more) false rejections at .05. 2-FWER implicitly tolerates 1 false rejection and makes no explicit attempt to control the probability of its occurrence, unlike FWER, which tolerates no false rejections at all. More generally, k-FWER tolerates k − 1 false rejections but controls the probability of k or more false rejections at α = .05. We demonstrate with two published data sets how more hypotheses can be rejected with k-FWER methods compared to FWER control.
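For concreteness, the simplest member of this family is the single-step generalized Bonferroni rule (reject H_i when p_i ≤ kα/m). The sketch below illustrates that rule under stated assumptions; it is not necessarily the specific k-FWER procedure the article recommends.

```python
# Single-step k-FWER (generalized Bonferroni) rule: an assumed illustration.
import numpy as np

def k_fwer_reject(pvals, k=2, alpha=0.05):
    """Boolean mask of rejected hypotheses under single-step k-FWER control."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    return p <= k * alpha / m    # k = 1 recovers the ordinary Bonferroni bound

# With k = 2, more hypotheses can be rejected than under 1-FWER control:
pvals = [0.001, 0.004, 0.012, 0.03, 0.2]
print(k_fwer_reject(pvals, k=1))  # [ True  True False False False]
print(k_fwer_reject(pvals, k=2))  # [ True  True  True False False]
```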

3.
In a recent article in The Journal of General Psychology, J. B. Hittner, K. May, and N. C. Silver (2003) described their investigation of several methods for comparing dependent correlations and found that all can be unsatisfactory, in terms of Type I errors, even with a sample size of 300. More precisely, when researchers test at the .05 level, the actual Type I error probability can exceed .10. The authors of this article extended J. B. Hittner et al.'s research by considering a variety of alternative methods. They found 3 that avoid inflating the Type I error rate above the nominal level. However, a Monte Carlo simulation demonstrated that when the underlying distribution of scores violated the assumption of normality, 2 of these methods had relatively low power and had actual Type I error rates well below the nominal level. The authors report comparisons with E. J. Williams' (1959) method.

4.
Wilcox, Keselman, Muska and Cribbie (2000) found a method for comparing the trimmed means of dependent groups that performed well in simulations, in terms of Type I errors, with a sample size as small as 21. Theory and simulations indicate that little power is lost under normality when using trimmed means rather than untrimmed means, and trimmed means can result in substantially higher power when sampling from a heavy‐tailed distribution. However, trimmed means suffer from two practical concerns described in this paper. Replacing trimmed means with a robust M‐estimator addresses these concerns, but control over the probability of a Type I error can be unsatisfactory when the sample size is small. Methods based on a simple modification of a one‐step M‐estimator that address the problems with trimmed means are examined. Several omnibus tests are compared, one of which performed well in simulations, even with a sample size of 11.

5.
Researchers can adopt one of many different measures of central tendency to examine the effect of a treatment variable across groups. These include least squares means, trimmed means, M‐estimators and medians. In addition, some methods begin with a preliminary test to determine the shapes of distributions before adopting a particular estimator of the typical score. We compared a number of recently developed adaptive robust methods with respect to their ability to control Type I error and their sensitivity to detect differences between the groups when data were non‐normal and heterogeneous, and the design was unbalanced. In particular, two new approaches to comparing the typical score across treatment groups, due to Babu, Padmanabhan, and Puri, were compared to two new methods presented by Wilcox and by Keselman, Wilcox, Othman, and Fradette. The procedures examined generally resulted in good Type I error control and, therefore, on the basis of this criterion, it would be difficult to recommend one method over the other. However, the power results clearly favour one of the methods presented by Wilcox and Keselman; indeed, in the vast majority of the cases investigated, this most favoured approach had substantially larger power values than the other procedures, particularly when there were more than two treatment groups.

6.
Standard least squares analysis of variance methods suffer from poor power under arbitrarily small departures from normality and fail to control the probability of a Type I error when standard assumptions are violated. This article describes a framework for robust estimation and testing that uses trimmed means with an approximate degrees of freedom heteroscedastic statistic for independent and correlated groups designs in order to achieve robustness to the biasing effects of nonnormality and variance heterogeneity. The authors describe a nonparametric bootstrap methodology that can provide improved Type I error control. In addition, the authors indicate how researchers can set robust confidence intervals around a robust effect size parameter estimate. In an online supplement, the authors use several examples to illustrate the application of an SAS program to implement these statistical methods.

7.
The paper takes up the problem of performing all pairwise comparisons among J independent groups based on 20% trimmed means. Currently, a method that stands out is the percentile-t bootstrap method, where the bootstrap is used to estimate the quantiles of a Studentized maximum modulus distribution when all pairs of population trimmed means are equal. However, a concern is that in simulations, the actual probability of one or more Type I errors can drop well below the nominal level when sample sizes are small. A practical issue is whether a method can be found that corrects this problem while maintaining the positive features of the percentile-t bootstrap. Three new methods are considered here, one of which achieves the desired goal. Another method, which takes advantage of theoretical results by Singh (1998), performs almost as well but is not recommended when the smallest sample size drops below 15. In some situations, however, it gives substantially shorter confidence intervals.

8.
The study investigated the independent and interactive effects of caffeine and expectancy on caffeine-related symptoms. High- and low-caffeine consumers were randomly assigned to either an expectancy or nonexpectancy instructional set and one of four caffeine doses. Subjects were administered the State-Trait Anxiety Inventory (Spielberger & Gorsuch, 1970) and a Symptom Questionnaire (Christensen, White, Krietsch, & Steele, 1990) prior to and 45 min following consumption of one of the four caffeine doses. An analysis of covariance identified a significant main effect for the State-Trait Anxiety Inventory scores and significant main and interaction effects for four Symptom Questionnaire items. However, when the alpha levels were corrected for the increased probability of Type I error, using the Bonferroni procedure, these effects failed to achieve significance. These results suggest that previous reports of subjective caffeine effects are also suspect because of their failure to control for the increased probability of Type I error.

9.
"Perhaps it would be better not to know everything."   总被引:1,自引:0,他引:1  
The advent of statistical methods for evaluating the data of individual-subject designs invites a comparison of the usual research tactics of the group-design paradigm and the individual-subject-design paradigm. That comparison can hinge on the concept of assigning probabilities of Type 1 and Type 2 errors. Individual-subject designs are usually interpreted with implicit, very low probabilities of Type 1 errors, and correspondingly high probabilities of Type 2 errors. Group designs are usually interpreted with explicit, moderately low probabilities of Type 1 errors, and therefore with not such high probabilities of Type 2 errors as in the other paradigm. This difference may seem to be a minor one, considered in terms of centiles on a probability scale. However, when it is interpreted in terms of the substantive kinds of results likely to be produced by each paradigm, it appears that the individual-subject-design paradigm is more likely to contribute to the development of a technology of behavior, and it is suggested that this orientation should not be abandoned.

10.
This paper is concerned with supplementing statistical tests for the Rasch model so that, in addition to the probability of the error of the first kind (Type I probability), the probability of the error of the second kind (Type II probability) can be controlled at a predetermined level by basing the test on the appropriate number of observations. An approach to determining a practically meaningful extent of model deviation is proposed, and the approximate distribution of the Wald test is derived under the extent of model deviation of interest.

11.
Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is known to be a serious problem when comparing the means of two independent groups because heavy-tailed distributions can have a substantial effect on power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well to Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but Wilcox's method is based on a statistic that has a finite sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.
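For readers unfamiliar with the estimator itself, here is a minimal sketch of the one-step M-estimator of location as commonly defined in the robust-statistics literature (Huber-type, with K = 1.28 and MAD-based scaling); this is an assumed reconstruction, not the paper's own code.

```python
# One-step M-estimator of location (assumed Huber-type definition, K = 1.28).
import numpy as np

def one_step_m(x, K=1.28):
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745   # MAD rescaled for normality
    low, high = med - K * madn, med + K * madn
    i1 = np.sum(x < low)                         # points flagged as low outliers
    i2 = np.sum(x > high)                        # points flagged as high outliers
    middle = x[(x >= low) & (x <= high)]         # untouched central values
    return (K * madn * (i2 - i1) + middle.sum()) / (len(x) - i1 - i2)
```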

12.
Methods for comparing means are known to be highly nonrobust in terms of Type II errors. The problem is that slight shifts from normal distributions toward heavy-tailed distributions inflate the standard error of the sample mean. In contrast, the standard errors of various robust measures of location, such as the one-step M-estimator, are relatively unaffected by heavy tails. Wilcox recently examined a method of comparing the one-step M-estimators of location corresponding to two independent groups that provided good control over the probability of a Type I error even for unequal sample sizes, unequal variances, and different shaped distributions. There is a fairly obvious extension of this procedure to pairwise comparisons of more than two independent groups, but simulations reported here indicate that it is unsatisfactory. A slight modification of the procedure is found to give much better results, although some caution must be taken when there are unequal sample sizes and light-tailed distributions. An omnibus test is examined as well.

13.
The direct social perception (DSP) thesis claims that we can directly perceive some mental states of other people. The direct perception of mental states has been formulated phenomenologically and psychologically, and typically restricted to the mental state types of intentions and emotions. I will compare DSP to another account of mindreading: dual process accounts that posit a fast, automatic “Type 1” form of mindreading and a slow, effortful “Type 2” form. I will here analyze whether dual process accounts’ Type 1 mindreading serves as a rival to DSP or whether some Type 1 mindreading can be perceptual. I will focus on Apperly and Butterfill’s dual process account of mindreading epistemic states such as perception, knowledge, and belief. This account posits a minimal form of Type 1 mindreading of belief-like states called registrations. I will argue that general dual process theories fit well with a modular view of perception that is considered a kind of Type 1 process. I will show that this modular view of perception challenges and has significant advantages over DSP’s phenomenological and psychological theses. Finally, I will argue that if such a modular view of perception is accepted, there is significant reason for thinking Type 1 mindreading of belief-like states is perceptual in nature. This would mean extending the scope of DSP to at least one type of epistemic state.

14.
Studies were reviewed in which the psychophysiological responses of Type A and B subjects were studied in various contexts. It appears that Type A's manifest greater psychophysiological arousal than Type B's in solitary as well as interpersonal situations in which there is a moderate external incentive to accomplish something, and there is an intermediate probability of failing to accomplish that something. Further, Type A's appear to manifest greater psychophysiological arousal than Type B's in interpersonal situations in which another person annoys or harasses the subject. Why Type A's respond in these situations with greater psychophysiological arousal was discussed in terms of the possibilities that (a) these situations may engage some defining characteristic(s) of Type A's, (b) Type A's may fear and therefore try to avoid failure more vigorously than Type B's, and (c) Type A's may be more motivated to gain and maintain control over important environmental events and therefore are more aroused by threats to such control than Type B's.

15.
We examine methods for measuring performance in signal-detection-like tasks when each participant provides only a few observations. Monte Carlo simulations demonstrate that standard statistical techniques applied to a d′ analysis can lead to large numbers of Type I errors (incorrectly rejecting a hypothesis of no difference). Various statistical methods were compared in terms of their Type I and Type II error (incorrectly accepting a hypothesis of no difference) rates. Our conclusions are the same whether these two types of errors are weighted equally or Type I errors are weighted more heavily. The most promising method is to combine an aggregate d′ measure with a percentile bootstrap confidence interval, a computer-intensive nonparametric method of statistical inference. Researchers who prefer statistical techniques more commonly used in psychology, such as a repeated measures t test, should use γ (Goodman & Kruskal, 1954), since it performs slightly better than or nearly as well as d′. In general, when repeated measures t tests are used, γ is more conservative than d′: It makes more Type II errors, but its Type I error rate tends to be much closer to that of the traditional .05 α level. It is somewhat surprising that γ performs as well as it does, given that the simulations that generated the hypothetical data conformed completely to the d′ model. Analyses in which H − FA was used had the highest Type I error rates. Detailed simulation results can be downloaded from www.psychonomic.org/archive/Schooler-BRM-2004.zip.
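The recommended combination, an aggregate d′ with a percentile bootstrap confidence interval, can be sketched as follows. The log-linear correction for extreme hit/false-alarm rates and the choice to resample participants are illustrative assumptions, not the paper's exact procedure.

```python
# Aggregate d' with a percentile bootstrap CI (assumed reconstruction).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def dprime(hits, fas, n_signal, n_noise):
    # log-linear correction guards against hit/false-alarm rates of 0 or 1
    h = (hits + 0.5) / (n_signal + 1)
    f = (fas + 0.5) / (n_noise + 1)
    return norm.ppf(h) - norm.ppf(f)

def boot_ci_dprime_diff(condA, condB, n_signal, n_noise, B=2000, alpha=0.05):
    """Percentile bootstrap CI for the d' difference between two conditions.

    condA, condB: per-participant (hits, false alarms) counts, shape (n, 2),
    with n_signal signal trials and n_noise noise trials per participant.
    """
    condA, condB = np.asarray(condA), np.asarray(condB)
    n = len(condA)
    diffs = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)              # resample participants
        hA, fA = condA[idx].sum(axis=0)          # aggregate, then compute d'
        hB, fB = condB[idx].sum(axis=0)
        diffs[b] = (dprime(hA, fA, n * n_signal, n * n_noise)
                    - dprime(hB, fB, n * n_signal, n * n_noise))
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
```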

16.
We propose a simple modification of Hochberg's step‐up Bonferroni procedure for multiple tests of significance. The proposed procedure is always more powerful than Hochberg's procedure for more than two tests, and is more powerful than Hommel's procedure for three and four tests. A numerical analysis of the new procedure indicates that its Type I error is controlled under independence of the test statistics, at a level equal to or just below the nominal Type I error. Examination of various non‐null configurations of hypotheses shows that the modified procedure has a power advantage over Hochberg's procedure which increases with the number of false hypotheses.
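For reference, Hochberg's original step-up procedure, which the proposed method modifies, can be sketched as follows; the modification itself is not reproduced here.

```python
# Hochberg's (1988) step-up Bonferroni procedure (baseline sketch).
import numpy as np

def hochberg(pvals, alpha=0.05):
    """Boolean rejection mask under Hochberg's step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                 # indices of p values, ascending
    reject = np.zeros(m, dtype=bool)
    for rank in range(m, 0, -1):          # step up from the largest p value
        i = order[rank - 1]
        if p[i] <= alpha / (m - rank + 1):
            reject[order[:rank]] = True   # reject this and all smaller p's
            break
    return reject

print(hochberg([0.010, 0.049, 0.049]))   # all rejected: largest p <= .05
```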

17.
We examined nine adaptive methods of trimming, that is, methods that empirically determine when data should be trimmed and the amount to be trimmed from the tails of the empirical distribution. Over the 240 empirical values collected for each method investigated, in which we varied the total percentage of data trimmed, sample size, degree of variance heterogeneity, pairing of variances and group sizes, and population shape, one method resulted in exceptionally good control of Type I errors. However, under less extreme cases of non‐normality and variance heterogeneity a number of methods exhibited reasonably good Type I error control. With regard to the power to detect non‐null treatment effects, we found that the choice among the methods depended on the degree of non‐normality and variance heterogeneity. Recommendations are offered.

18.
In the dual pair method, the subject is presented with two stimuli in two pairs: One pair is composed of two samples of the same stimulus; the other pair is composed of two samples of different stimuli, one being the same as that in the identical pair. The task of the judge is to select the most different pair. The psychometric function for the dual pair method is derived and expressed in terms of a singly noncentral beta distribution. A table is provided that connects a measure of the degree of difference, d, to the probability of a correct response. This table assumes an unbiased observer and a differencing decision rule. A table is provided to give an estimate of the variance of d′, the experimental estimate of d. The power of the dual pair method is also investigated, and a formula to determine the sample size required to meet Type I and Type II error specifications is given. The dual pair method appears to be slightly less powerful than the duo-trio and the triangular methods. Experimental investigation is needed to explore the dual pair in applied research work.

19.
The Type I error probability and the power of the independent samples t test, performed directly on the ranks of scores in combined samples in place of the original scores, are known to be the same as those of the non‐parametric Wilcoxon–Mann–Whitney (WMW) test. In the present study, simulations revealed that these probabilities remain essentially unchanged when the number of ranks is reduced by assigning the same rank to multiple ordered scores. For example, if 200 ranks are reduced to as few as 20, or 10, or 5 ranks by replacing sequences of consecutive ranks by a single number, the Type I error probability and power stay about the same. Significance tests performed on these modular ranks consistently reproduce familiar findings about the comparative power of the t test and the WMW test for normal and various non‐normal distributions. Similar results are obtained for modular ranks used in comparing the one‐sample t test and the Wilcoxon signed ranks test.
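A minimal sketch of the modular-rank idea: rank the combined samples, coarsen the ranks into a small number of bins, and run the ordinary two-sample t test on the binned ranks. The equal-width binning below is one plausible reading of "replacing sequences of consecutive ranks by a single number", not necessarily the authors' exact scheme.

```python
# t test on modular (coarsened) ranks: an assumed illustration.
import numpy as np
from scipy import stats

def modular_rank_t(x, y, n_bins=10):
    combined = np.concatenate([x, y])
    ranks = stats.rankdata(combined)                  # ranks 1..N, ties averaged
    # collapse runs of consecutive ranks into n_bins modular ranks
    modular = np.ceil(ranks * n_bins / len(combined))
    return stats.ttest_ind(modular[:len(x)], modular[len(x):])

x = np.random.default_rng(1).exponential(size=100)
y = np.random.default_rng(2).exponential(size=100) + 0.3
print(modular_rank_t(x, y, n_bins=10))
```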
