Similar articles
20 similar articles found.
1.
In this journal, Zimmerman (2004, 2011) has discussed preliminary tests that researchers often use to choose an appropriate method for comparing locations when the assumption of normality is doubtful. The conceptual problem with this approach is that such a two‐stage process makes both the power and the significance of the entire procedure uncertain, as type I and type II errors are possible at both stages. A type I error at the first stage, for example, will obviously increase the probability of a type II error at the second stage. Based on the idea of Schmider et al. (2010), which proposes that simulated sets of sample data be ranked with respect to their degree of normality, this paper investigates the relationship between population non‐normality and sample non‐normality with respect to the performance of the ANOVA, Brown–Forsythe test, Welch test, and Kruskal–Wallis test when used with different distributions, sample sizes, and effect sizes. The overall conclusion is that the Kruskal–Wallis test is considerably less sensitive to the degree of sample normality when populations are distinctly non‐normal and should therefore be the primary tool used to compare locations when it is known that populations are not at least approximately normal.
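A minimal simulation can make the comparison above concrete. The Python sketch below (assuming numpy and scipy are available; the lognormal population, group size, and replication count are illustrative choices rather than the study's actual conditions) estimates empirical Type I error rates of the classical ANOVA F test and the Kruskal–Wallis test when three groups are drawn from the same distinctly non-normal population.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal

rng = np.random.default_rng(1)
n_rep, n_per_group, alpha = 5000, 20, 0.05   # illustrative settings
rejections = {"ANOVA F": 0, "Kruskal-Wallis": 0}

for _ in range(n_rep):
    # three samples from the same skewed (lognormal) population: H0 is true
    groups = [rng.lognormal(mean=0.0, sigma=1.0, size=n_per_group) for _ in range(3)]
    if f_oneway(*groups).pvalue < alpha:
        rejections["ANOVA F"] += 1
    if kruskal(*groups).pvalue < alpha:
        rejections["Kruskal-Wallis"] += 1

for test, count in rejections.items():
    print(f"{test}: empirical Type I error = {count / n_rep:.3f}")
```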

2.
The data obtained from one‐way independent groups designs are typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal.
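As a rough illustration of the "Welch test with robust estimators" mentioned above, the following sketch implements a Yuen-type statistic: a Welch-style two-group test computed on trimmed means and Winsorized variances. This is not the parametric bootstrap procedure of Krishnamoorthy, Lu, and Mathew; the 20% trimming proportion and the example data are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

def yuen_test(x, y, trim=0.2):
    """Welch-type test on trimmed means with Winsorized variances (Yuen-style)."""
    def parts(a):
        a = np.asarray(a, dtype=float)
        n = len(a)
        g = int(np.floor(trim * n))                # observations trimmed from each tail
        h = n - 2 * g                              # effective sample size
        tm = stats.trim_mean(a, trim)              # trimmed mean
        w = np.asarray(winsorize(a, limits=(trim, trim)))
        sw2 = w.var(ddof=1)                        # Winsorized variance
        d = (n - 1) * sw2 / (h * (h - 1))          # squared standard error component
        return tm, d, h

    tm1, d1, h1 = parts(x)
    tm2, d2, h2 = parts(y)
    t = (tm1 - tm2) / np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    p = 2 * stats.t.sf(abs(t), df)                 # two-sided p value
    return t, df, p

rng = np.random.default_rng(2)
x = rng.lognormal(size=30)           # skewed group
y = rng.lognormal(size=40) * 1.5     # skewed, more variable group
print(yuen_test(x, y))
```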

3.
In the present paper, a general class of heteroscedastic one‐factor models is considered. In these models, the residual variances of the observed scores are explicitly modelled as parametric functions of the one‐dimensional factor score. A marginal maximum likelihood procedure for parameter estimation is proposed under both the assumption of multivariate normality of the observed scores conditional on the single common factor score and the assumption of normality of the common factor score. A likelihood ratio test is derived, which can be used to test the usual homoscedastic one‐factor model against one of the proposed heteroscedastic models. Simulation studies are carried out to investigate the robustness and the power of this likelihood ratio test. Results show that the asymptotic properties of the test statistic hold under both small test length conditions and small sample size conditions. Results also show under what conditions the power to detect different heteroscedasticity parameter values is either small, medium, or large. Finally, for illustrative purposes, the marginal maximum likelihood estimation procedure and the likelihood ratio test are applied to real data.

4.
Preliminary tests of equality of variances used before a test of location are no longer widely recommended by statisticians, although they persist in some textbooks and software packages. The present study extends the findings of previous studies and provides further reasons for discontinuing the use of preliminary tests. The study found Type I error rates of a two‐stage procedure, consisting of a preliminary Levene test on samples of different sizes with unequal variances, followed by either a Student pooled‐variances t test or a Welch separate‐variances t test. Simulations disclosed that the two‐stage procedure fails to protect the significance level and usually makes the situation worse. Earlier studies have shown that preliminary tests often adversely affect the size of the test, and also that the Welch test is superior to the t test when variances are unequal. The present simulations reveal that changes in Type I error rates are greater when sample sizes are smaller, when the difference in variances is slight rather than extreme, and when the significance level is more stringent. Furthermore, the validity of the Welch test deteriorates if it is used only on those occasions where a preliminary test indicates it is needed. Optimum protection is assured by using a separate‐variances test unconditionally whenever sample sizes are unequal.
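The two-stage rule described above is easy to simulate. The sketch below (Python with scipy; the group sizes, variance ratio, and replication count are illustrative) estimates the Type I error rate of "Levene test, then pooled or separate-variances t test" and compares it with applying the Welch test unconditionally.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

rng = np.random.default_rng(3)
n1, n2, sd1, sd2 = 10, 25, 2.0, 1.0      # unequal n paired with unequal variances (illustrative)
n_rep, alpha = 10000, 0.05
reject = {"two-stage (Levene then t)": 0, "Welch unconditionally": 0}

for _ in range(n_rep):
    x = rng.normal(0, sd1, n1)
    y = rng.normal(0, sd2, n2)           # same mean: any rejection is a Type I error
    # stage 1: preliminary Levene test chooses pooled vs. separate variances
    use_pooled = levene(x, y).pvalue >= alpha
    p_two_stage = ttest_ind(x, y, equal_var=use_pooled).pvalue
    p_welch = ttest_ind(x, y, equal_var=False).pvalue
    reject["two-stage (Levene then t)"] += p_two_stage < alpha
    reject["Welch unconditionally"] += p_welch < alpha

for rule, count in reject.items():
    print(f"{rule}: empirical Type I error = {count / n_rep:.3f}")
```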

5.
6.
A one-way random effects model for trimmed means
The random effects ANOVA model plays an important role in many psychological studies, but the usual model suffers from at least two serious problems. The first is that even under normality, violating the assumption of equal variances can have serious consequences in terms of Type I errors or significance levels, and it can affect power as well. The second and perhaps more serious concern is that even slight departures from normality can result in a substantial loss of power when testing hypotheses. Jeyaratnam and Othman (1985) proposed a method for handling unequal variances, under the assumption of normality, but no results were given on how their procedure performs when distributions are nonnormal. A secondary goal in this paper is to address this issue via simulations. As will be seen, problems arise with both Type I errors and power. Another secondary goal is to provide new simulation results on the Rust-Fligner modification of the Kruskal-Wallis test. The primary goal is to propose a generalization of the usual random effects model based on trimmed means. The resulting test of no differences among J randomly sampled groups has certain advantages in terms of Type I errors, and it can yield substantial gains in power when distributions have heavy tails and outliers. This last feature is very important in applied work because recent investigations indicate that heavy-tailed distributions are common. Included is a suggestion for a heteroscedastic Winsorized analog of the usual intraclass correlation coefficient.

7.
Test of homogeneity of covariances (or homoscedasticity) among several groups has many applications in statistical analysis. In the context of incomplete data analysis, tests of homoscedasticity among groups of cases with identical missing data patterns have been proposed to test whether data are missing completely at random (MCAR). These tests of MCAR require large sample sizes n and/or large group sample sizes n_i, and they usually fail when applied to nonnormal data. Hawkins (Technometrics 23:105–110, 1981) proposed a test of multivariate normality and homoscedasticity that is an exact test for complete data when n_i are small. This paper proposes a modification of this test for complete data to improve its performance, and extends its application to test of homoscedasticity and MCAR when data are multivariate normal and incomplete. Moreover, it is shown that the statistic used in the Hawkins test in conjunction with a nonparametric k-sample test can be used to obtain a nonparametric test of homoscedasticity that works well for both normal and nonnormal data. It is explained how a combination of the proposed normal-theory Hawkins test and the nonparametric test can be employed to test for homoscedasticity, MCAR, and multivariate normality. Simulation studies show that the newly proposed tests generally outperform their existing competitors in terms of Type I error rejection rates. Also, a power study of the proposed tests indicates good power. The proposed methods use appropriate missing data imputations to impute missing data. Methods of multiple imputation are described and one of the methods is employed to confirm the result of our single imputation methods. Examples are provided where multiple imputation enables one to identify a group or groups whose covariance matrices differ from the majority of other groups.

8.
Several studies have demonstrated that the fixed-sample stopping rule (FSR), in which the sample size is determined in advance, is less practical and efficient than are sequential-stopping rules. The composite limited adaptive sequential test (CLAST) is one such sequential-stopping rule. Previous research has shown that CLAST is more efficient in terms of sample size and power than are the FSR and other sequential rules and that it reflects more realistically the practice of experimental psychology researchers. The CLAST rule has been applied only to the t test of mean differences with two matched samples and to the chi-square independence test for twofold contingency tables. The present work extends previous research on the efficiency of CLAST to multiple group statistical tests. Simulation studies were conducted to test the efficiency of the CLAST rule for the one-way ANOVA for fixed effects models. The ANOVA general test and two linear contrasts of multiple comparisons among treatment means are considered. The article also introduces four rules for allocating N observations to J groups under the general null hypothesis and three allocation rules for the linear contrasts. Results show that the CLAST rule is generally more efficient than the FSR in terms of sample size and power for one-way ANOVA tests. However, the allocation rules vary in their optimality and have a differential impact on sample size and power. Thus, selecting an allocation rule depends on the cost of sampling and the intended precision.

9.
Student's one-sample t-test is a commonly used method when inference about the population mean is made. As advocated in textbooks and articles, the assumption of normality is often checked by a preliminary goodness-of-fit (GOF) test. In a paper recently published by Schucany and Ng it was shown that, for the uniform distribution, screening of samples by a pretest for normality leads to a more conservative conditional Type I error rate than application of the one-sample t-test without preliminary GOF test. In contrast, for the exponential distribution, the conditional level is even more elevated than the Type I error rate of the t-test without pretest. We examine the reasons behind these characteristics. In a simulation study, samples drawn from the exponential, lognormal, uniform, Student's t-distribution with 2 degrees of freedom (t(2)) and the standard normal distribution that had passed normality screening, as well as the ingredients of the test statistics calculated from these samples, are investigated. For non-normal distributions, we found that preliminary testing for normality may change the distribution of means and standard deviations of the selected samples as well as the correlation between them (if the underlying distribution is non-symmetric), thus leading to altered distributions of the resulting test statistics. It is shown that for skewed distributions the excess in Type I error rate may be even more pronounced when testing one-sided hypotheses.
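A small simulation can reproduce the conditional-level phenomenon described above. The sketch below uses a Shapiro–Wilk pretest as the normality screen and a one-sided one-sample t test on exponential data; the choice of pretest, sample size, and number of replications are illustrative assumptions rather than the exact setup of the cited study.

```python
import numpy as np
from scipy.stats import shapiro, ttest_1samp, expon

rng = np.random.default_rng(4)
n, n_rep, alpha = 20, 20000, 0.05
mu0 = expon.mean()                       # true mean of the standard exponential, so H0 is true
passed, rejected = 0, 0

for _ in range(n_rep):
    x = rng.exponential(scale=1.0, size=n)
    if shapiro(x).pvalue >= alpha:       # sample passes the normality screen
        passed += 1
        # one-sided test: is the mean greater than mu0?
        if ttest_1samp(x, mu0, alternative="greater").pvalue < alpha:
            rejected += 1

print(f"samples passing the pretest: {passed} of {n_rep}")
print(f"conditional Type I error of the t test: {rejected / passed:.3f}")
```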

10.
We study several aspects of bootstrap inference for covariance structure models based on three test statistics, including Type I error, power and sample‐size determination. Specifically, we discuss conditions for a test statistic to achieve a more accurate level of Type I error, both in theory and in practice. Details on power analysis and sample‐size determination are given. For data sets with heavy tails, we propose applying a bootstrap methodology to a transformed sample by a downweighting procedure. One of the key conditions for safe bootstrap inference is generally satisfied by the transformed sample but may not be satisfied by the original sample with heavy tails. Several data sets illustrate that, by combining downweighting and bootstrapping, a researcher may find a nearly optimal procedure for evaluating various aspects of covariance structure models. A rule for handling non‐convergence problems in bootstrap replications is proposed.

11.
Sequential rules are explored in the context of null hypothesis significance testing. Several studies have demonstrated that the fixed-sample stopping rule, in which the sample size used by researchers is determined in advance, is less practical and less efficient than sequential stopping rules. It is proposed that a sequential stopping rule called CLAST (composite limited adaptive sequential test) is a superior variant of COAST (composite open adaptive sequential test), a sequential rule proposed by Frick (1998). Simulation studies are conducted to test the efficiency of the proposed rule in terms of sample size and power. Two statistical tests are used: the one-tailed t test of mean differences with two matched samples, and the chi-square independence test for twofold contingency tables. The results show that the CLAST rule is more efficient than the COAST rule and reflects more realistically the practice of experimental psychology researchers.
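For illustration only, the following sketch shows the general shape of a COAST/CLAST-style sequential rule for the matched-pairs t test: sampling continues while the p value falls between a lower and an upper criterion, and CLAST additionally caps the total sample size. The boundary values and sample-size limits used here are stand-in defaults, not the exact values studied by Frick (1998) or in the CLAST papers.

```python
import numpy as np
from scipy.stats import ttest_rel

def clast_like_t_test(sampler, n_initial=10, n_add=5, n_max=60,
                      p_reject=0.01, p_accept=0.36, rng=None):
    """Sequential matched-pairs t test with illustrative stopping boundaries.

    Keep adding n_add pairs while the p value stays between p_reject and p_accept;
    stop early on a clear result, or stop at the n_max cap (the 'limited' part of CLAST).
    """
    rng = rng or np.random.default_rng()
    x, y = sampler(n_initial, rng)
    while True:
        p = ttest_rel(x, y).pvalue
        n = len(x)
        if p <= p_reject:
            return "reject H0", n, p
        if p >= p_accept or n >= n_max:
            return "do not reject H0", n, p
        x_new, y_new = sampler(n_add, rng)
        x, y = np.concatenate([x, x_new]), np.concatenate([y, y_new])

def null_sampler(n, rng):
    # matched pairs with no true difference
    base = rng.normal(0, 1, n)
    return base + rng.normal(0, 1, n), base + rng.normal(0, 1, n)

print(clast_like_t_test(null_sampler, rng=np.random.default_rng(5)))
```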

12.
The study explores the robustness to violations of normality and sphericity of linear mixed models when they are used with the Kenward–Roger procedure (KR) in split‐plot designs in which the groups have different distributions and sample sizes are small. The focus is on examining the effect of skewness and kurtosis. To this end, a Monte Carlo simulation study was carried out, involving a split‐plot design with three levels of the between‐subjects grouping factor and four levels of the within‐subjects factor. The results show that: (1) the violation of the sphericity assumption did not affect KR robustness when the assumption of normality was not fulfilled; (2) the robustness of the KR procedure decreased as skewness in the distributions increased, there being no strong effect of kurtosis; and (3) the type of pairing between kurtosis and group size was shown to be a relevant variable to consider when using this procedure, especially when pairing is positive (i.e., when the largest group is associated with the largest value of the kurtosis coefficient and the smallest group with its smallest value). The KR procedure can be a good option for analysing repeated‐measures data when the groups have different distributions, provided the total sample sizes are 45 or larger and the data are not highly or extremely skewed.

13.
Research with infants is often slow and time-consuming, so infant researchers face great pressure to use the available participants in an efficient way. One strategy that researchers sometimes use to optimize efficiency is data peeking (or "optional stopping"), that is, doing a preliminary analysis (whether a formal significance test or informal eyeballing) of collected data. Data peeking helps researchers decide whether to abandon or tweak a study, decide that a sample is complete, or decide to continue adding data points. Unfortunately, data peeking can have negative consequences such as increased rates of false positives (wrongly concluding that an effect is present when it is not). We argue that, with simple corrections, the benefits of data peeking can be harnessed to use participants more efficiently. We review two corrections that can be transparently reported: one can be applied at the beginning of a study to lay out a plan for data peeking, and a second can be applied after data collection has already started. These corrections are easy to implement in the current framework of infancy research. The use of these corrections, together with transparent reporting, can increase the replicability of infant research.
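The inflation that uncorrected data peeking produces is easy to demonstrate by simulation (the specific corrections reviewed in the article are not implemented here). In the hypothetical setup below, an experimenter tests after every batch of participants and stops at the first significant result; the batch size and number of peeks are arbitrary illustrative values.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(6)
batch, n_batches, n_rep, alpha = 8, 5, 5000, 0.05
peeking_hits, fixed_hits = 0, 0

for _ in range(n_rep):
    x, y = np.empty(0), np.empty(0)
    p_values = []
    for _ in range(n_batches):
        x = np.concatenate([x, rng.normal(0, 1, batch)])   # no true effect in either group
        y = np.concatenate([y, rng.normal(0, 1, batch)])
        p_values.append(ttest_ind(x, y).pvalue)            # "peek" after every batch
    peeking_hits += min(p_values) < alpha                   # stop at the first significant peek
    fixed_hits += p_values[-1] < alpha                      # test once, at the planned final n

print(f"false-positive rate with uncorrected peeking: {peeking_hits / n_rep:.3f}")
print(f"false-positive rate with a fixed-sample test: {fixed_hits / n_rep:.3f}")
```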

14.
When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non‐normality for Yuen's two‐group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas can minimize the total cost, the total sample size, or the sum of total sample size and total cost. On the other hand, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, the present simulation applies Yuen's test to the sample generated, and then the procedure is validated in terms of Type I errors and power. Simulation results show that the proposed formulas can control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and future research are discussed.

15.
Previous experiments have demonstrated that exposure to faces can change the perception of normality in new faces, such that faces similar to those at exposure appear more normal. Here we examined how experience influences adaptation effects in African Hadza hunter-gatherers, who have limited experience with White faces. We exposed participants to sets of either Hadza or White European faces that were manipulated to possess either wide-spaced or narrow-spaced eyes. We collected normality judgments both pre-exposure and post-exposure by showing pairs of images, one with wide-spaced and one with narrow-spaced eyes. Examining the difference between the pre-exposure and post-exposure judgments revealed that participants selected an increased number of images that were congruent with the faces to which they had been exposed. The change in normality judgments was strongest for White faces, suggesting that representations of White ethnicity faces are more malleable and less robust to adaptation, potentially because of the decreased experience that individuals had with them. A second experiment using the same test stimuli with a sample of White participants revealed equivalent adaptation effects for both Hadza and White faces. These data highlight the role of experience on the high-level visual adaptation of faces.

16.
A split-sample replication criterion originally proposed by J. E. Overall and K. N. Magee (1992) as a stopping rule for hierarchical cluster analysis is applied to multiple data sets generated by sampling with replacement from an original simulated primary data set. An investigation of the validity of this bootstrap procedure was undertaken using different combinations of the true number of latent populations, degrees of overlap, and sample sizes. The bootstrap procedure enhanced the accuracy of identifying the true number of latent populations under virtually all conditions. Increasing the size of the resampled data sets relative to the size of the primary data set further increased accuracy. A computer program to implement the bootstrap stopping rule is made available via a referenced Web site.

17.
This article considers the problem of comparing two independent groups in terms of some measure of location. It is well known that with Student's two-independent-sample t test, the actual level of significance can be well above or below the nominal level, confidence intervals can have inaccurate probability coverage, and power can be low relative to other methods. A solution to deal with heterogeneity is Welch's (1938) test. Welch's test deals with heteroscedasticity but can have poor power under arbitrarily small departures from normality. Yuen (1974) generalized Welch's test to trimmed means; her method provides improved control over the probability of a Type I error, but problems remain. Transformations for skewness improve matters, but the probability of a Type I error remains unsatisfactory in some situations. We find that a transformation for skewness combined with a bootstrap method improves Type I error control and probability coverage even if sample sizes are small.

18.
Organizational and validation researchers often work with data that have been subjected to selection on the predictor and attrition on the criterion. These researchers often use the data observed under these conditions to estimate either the predictor's or the criterion's restricted population mean. We show that the restricted means due to direct or indirect selection are a function of the population means plus the selection ratios. Thus, any difference between selected group means reflects the population difference plus the selection ratio difference. When there is also attrition on the criterion, the estimation of group differences becomes even more complicated. The effect of selection and attrition induces measurement bias when estimating the restricted population mean of either the predictor or criterion. A sample mean observed under selection and attrition does not estimate either the population mean or the restricted population mean. We propose several procedures under normality that yield unbiased estimates of the mean. The procedures focus on correcting the effects of selection and attrition. Each procedure was evaluated with a Monte Carlo simulation to ascertain its strengths and weaknesses. Given appropriate sample size and conditions, we show that these procedures yield unbiased estimators of the restricted and unrestricted population means for both predictor and criterion. We also show how our findings have implications for replicating selected group differences.
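As a narrow illustration of the direct-selection case under normality, the sketch below inverts the truncated-normal mean relation (via the inverse Mills ratio) to recover the unrestricted population mean of a predictor from the mean of the selected group, assuming the selection cutoff and the population standard deviation are known. This is only one ingredient: the article's procedures also cover indirect selection and attrition on the criterion, which are not handled here.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def unrestricted_mean(restricted_mean, cutoff, sigma):
    """Recover the population mean of a normal predictor from the mean of the
    directly selected (X > cutoff) group, assuming sigma is known."""
    def gap(mu):
        a = (cutoff - mu) / sigma
        mills = norm.pdf(a) / norm.sf(a)        # inverse Mills ratio
        # E[X | X > cutoff] = mu + sigma * mills; find mu matching the restricted mean
        return mu + sigma * mills - restricted_mean
    return brentq(gap, restricted_mean - 10 * sigma, restricted_mean)

# illustrative check: select X > 0.5 from N(0, 1) and recover the population mean
rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 200000)
selected = x[x > 0.5]
print("restricted mean:", round(selected.mean(), 3))
print("recovered population mean:",
      round(unrestricted_mean(selected.mean(), cutoff=0.5, sigma=1.0), 3))
```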

19.
Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting quality. A key aspect of the quality of such a summary measure is its reliability. In this paper we derive a confidence interval for reliability, a test for the hypothesis that the reliability meets a minimum standard, and the power of this test against alternative hypotheses. Next, we consider the problem of using data from a preliminary field study of the measurement procedure to inform the design of a later study that will test substantive hypotheses about the correlates of setting quality. The preliminary study is typically called the "generalizability study" or "G study" while the later, substantive study is called the "decision study" or "D study." We show how to use data from the G study to estimate reliability, a confidence interval for the reliability, and the power of tests for the reliability of measurement produced under alternative designs for the D study. We conclude with a discussion of sample size requirements for G studies.

20.
This article develops a procedure based on copulas to simulate multivariate nonnormal data that satisfy a prespecified variance-covariance matrix. The covariance matrix used can comply with a specific moment structure form (e.g., a factor analysis or a general structural equation model). Thus, the method is particularly useful for Monte Carlo evaluation of structural equation models within the context of nonnormal data. The new procedure for nonnormal data simulation is theoretically described and also implemented in the widely used R environment. The quality of the method is assessed by Monte Carlo simulations. A 1-sample test on the observed covariance matrix based on the copula methodology is proposed. This new test for evaluating the quality of a simulation is defined through a particular structural model specification and is robust against normality violations.
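A bare-bones Gaussian-copula generator conveys the core idea, though it is not the article's R implementation: the direct transform below imposes the dependence on the normal scale, so the correlation of the resulting nonnormal variables only approximates the prespecified matrix (the published procedure calibrates the copula to hit the target exactly). The marginal distributions and target matrix are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, gamma, expon

def gaussian_copula_sample(n, corr, marginals, rng=None):
    """Draw n rows whose dependence comes from a Gaussian copula with correlation
    matrix `corr` and whose margins follow the frozen scipy distributions in `marginals`."""
    rng = rng or np.random.default_rng()
    z = rng.multivariate_normal(np.zeros(len(marginals)), corr, size=n)
    u = norm.cdf(z)                                   # map to uniforms (the copula)
    return np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])

target_corr = np.array([[1.0, 0.5, 0.3],
                        [0.5, 1.0, 0.4],
                        [0.3, 0.4, 1.0]])
marginals = [gamma(a=2.0), expon(), norm(loc=0.0, scale=2.0)]   # skewed and normal margins

x = gaussian_copula_sample(50000, target_corr, marginals, rng=np.random.default_rng(8))
print(np.round(np.corrcoef(x, rowvar=False), 3))      # close to, but not exactly, target_corr
```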
