Similar Documents
20 similar documents found.
1.
When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non‐normality for Yuen's two‐group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas minimize the total cost, the total sample size, or the sum of the total sample size and total cost. Conversely, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, a simulation study applies Yuen's test to the generated samples, and the procedure is then validated in terms of Type I errors and power. Simulation results show that the proposed formulas control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and for future research are discussed.
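The cost-minimizing allocation idea can be illustrated with a short R sketch. This is the classical normal-theory allocation for comparing two means, not the paper's Yuen-specific formulas, and all numeric inputs (standard deviations, per-subject costs, effect size) are made-up values.

alpha <- 0.05; power <- 0.80
sd1 <- 1.0; sd2 <- 2.0        # group standard deviations
cost1 <- 10; cost2 <- 25      # per-subject sampling costs
delta <- 0.5                  # mean difference to detect
r <- (sd2 / sd1) * sqrt(cost1 / cost2)      # cost-optimal allocation ratio n2/n1
z <- qnorm(1 - alpha / 2) + qnorm(power)
n1 <- ceiling((sd1^2 + sd2^2 / r) * (z / delta)^2)
n2 <- ceiling(r * n1)
c(n1 = n1, n2 = n2, total_cost = cost1 * n1 + cost2 * n2)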

2.
Yuen's two‐sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non‐normality, and unequal sample sizes. Given a specified α and power (1 − β), the minimum sample size needed by the proposed formulas under various conditions is less than that given by the conventional formulas. Moreover, given a sample size calculated by the proposed formulas, simulation results show that Yuen's test achieves statistical power that is generally superior to that of the approximate t test. A numerical example is provided.
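For reference, Yuen's two-sample trimmed-mean statistic itself can be computed in a few lines of R. The sketch below assumes 20% trimming and follows the usual textbook formulas (trimmed means, Winsorized variances, Welch-type degrees of freedom); the example data are made-up.

winsorize <- function(x, tr = 0.2) {
  x <- sort(x); n <- length(x); g <- floor(tr * n)
  if (g > 0) {
    x[1:g] <- x[g + 1]            # pull in the lowest g values
    x[(n - g + 1):n] <- x[n - g]  # pull in the highest g values
  }
  x
}
yuen_test <- function(x, y, tr = 0.2) {
  hx <- length(x) - 2 * floor(tr * length(x))   # effective sizes after trimming
  hy <- length(y) - 2 * floor(tr * length(y))
  dx <- (length(x) - 1) * var(winsorize(x, tr)) / (hx * (hx - 1))
  dy <- (length(y) - 1) * var(winsorize(y, tr)) / (hy * (hy - 1))
  tstat <- (mean(x, trim = tr) - mean(y, trim = tr)) / sqrt(dx + dy)
  df <- (dx + dy)^2 / (dx^2 / (hx - 1) + dy^2 / (hy - 1))
  c(statistic = tstat, df = df, p.value = 2 * pt(-abs(tstat), df))
}
set.seed(1)
yuen_test(rnorm(25, 0, 1), rnorm(40, 0.8, 2))   # unequal n and unequal variances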

3.
The issue of the sample size necessary to ensure adequate statistical power has been the focus of considerable attention in scientific research. Conventional presentations of sample size determination do not consider budgetary and participant allocation scheme constraints, although there is some discussion in the literature. The introduction of additional allocation and cost concerns complicates study design, although the resulting procedure permits a practical treatment of sample size planning. This article presents exact techniques for optimizing sample size determinations in the context of Welch's (Biometrika, 29, 350–362, 1938) test of the difference between two means under various design and cost considerations. The allocation schemes include cases in which (1) the ratio of group sizes is given and (2) one sample size is specified. The cost considerations include optimally assigning subjects (1) to attain maximum power for a fixed cost and (2) to meet a designated power level for the least cost. The proposed methods provide useful alternatives to the conventional procedures and can be readily implemented with the developed R and SAS programs that are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
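As an illustration of the "maximum power for a fixed cost" scheme, the sketch below computes the approximate power of Welch's test from the noncentral t distribution and searches a grid of allocations that exhaust a fixed budget. The costs, standard deviations, budget, and effect size are made-up, and this is an illustrative grid search, not the authors' exact algorithm.

welch_power <- function(n1, n2, sd1, sd2, delta, alpha = 0.05) {
  se2 <- sd1^2 / n1 + sd2^2 / n2
  df  <- se2^2 / ((sd1^2 / n1)^2 / (n1 - 1) + (sd2^2 / n2)^2 / (n2 - 1))
  ncp <- delta / sqrt(se2)                      # noncentrality parameter
  crit <- qt(1 - alpha / 2, df)
  pt(crit, df, ncp, lower.tail = FALSE) + pt(-crit, df, ncp)
}
budget <- 2000; cost1 <- 10; cost2 <- 25
grid <- data.frame(n1 = 2:((budget - 2 * cost2) %/% cost1))
grid$n2 <- (budget - cost1 * grid$n1) %/% cost2  # spend the rest of the budget on group 2
grid$power <- welch_power(grid$n1, grid$n2, sd1 = 1, sd2 = 2, delta = 0.5)
grid[which.max(grid$power), ]                    # allocation with the highest power within budget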

4.
The equality of two group variances is frequently tested in experiments. However, criticisms of null hypothesis statistical testing on means have recently arisen, and there is growing interest in other types of statistical tests of hypotheses, such as superiority/non-inferiority and equivalence tests. Although these tests have become more common in psychology and the social sciences, the corresponding sample size estimation is rarely discussed, especially when the sampling unit costs or group sizes are unequal for the two groups. To find the optimal sample size, the present study therefore derives an initial allocation by approximating the percentiles of an F distribution with percentiles of the standard normal distribution, and then uses an exhaustion algorithm to select the best combination of group sizes, ensuring that the resulting power reaches the designated level and is maximal at minimal total cost. In this manner, sample size planning is optimized. The proposed sample size determination has a wide range of applications and is efficient in terms of Type I errors and statistical power in simulations. Finally, an illustrative example using hypertension data from the Health Survey for England, 1995–1997, is presented. For ease of application, four R Shiny apps are provided and benchmarks for setting equivalence margins are suggested.
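A minimal sketch of the kind of hypothesis involved: an equivalence comparison of two variances via two one-sided F tests (TOST). The margins and data are made-up, and this covers only the testing step, not the authors' optimal sample size search.

var_tost <- function(x, y, lower = 0.5, upper = 2, alpha = 0.05) {
  n1 <- length(x); n2 <- length(y)
  ratio <- var(x) / var(y)
  p_lower <- pf(ratio / lower, n1 - 1, n2 - 1, lower.tail = FALSE)  # H0: ratio <= lower
  p_upper <- pf(ratio / upper, n1 - 1, n2 - 1)                      # H0: ratio >= upper
  c(ratio = ratio, p.value = max(p_lower, p_upper))  # conclude equivalence if p.value < alpha
}
set.seed(2)
var_tost(rnorm(80, sd = 1), rnorm(100, sd = 1.1))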

5.
Welch's (Biometrika, 29, 350–362, 1938) procedure has emerged as a robust alternative to Student's t test for comparing the means of two normal populations with unknown and possibly unequal variances. To facilitate the advocated statistical practice of reporting confidence intervals and to further improve the applicability of Welch's procedure, the present article considers exact approaches for optimizing sample size determination for precise interval estimation of the difference between two means under various allocation and cost considerations. The desired precision of a confidence interval is assessed with respect to the control of the expected half-width and the assurance probability that the half-width falls within a designated value. The design schemes, in terms of participant allocation and cost constraints, include (a) giving the ratio of group sizes, (b) specifying one sample size, (c) attaining maximum precision for a fixed cost, and (d) meeting a specified precision level for the least cost. The proposed methods provide useful alternatives to the conventional sample size procedures. The developed programs also expand the generality of existing statistical software packages and can be accessed at brm.psychonomic-journals.org/content/supplemental.
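Both precision criteria can be checked by simulation for any candidate pair of sample sizes. The sketch below estimates the expected half-width of the Welch interval and the assurance probability that the half-width stays below a target value; the sample sizes, standard deviations, and target are made-up.

set.seed(3)
welch_halfwidth <- function(n1, n2, sd1, sd2, conf = 0.95) {
  x <- rnorm(n1, 0, sd1); y <- rnorm(n2, 0, sd2)
  ci <- t.test(x, y, conf.level = conf)$conf.int   # Welch interval (R's default for t.test)
  diff(ci) / 2
}
hw <- replicate(5000, welch_halfwidth(n1 = 45, n2 = 90, sd1 = 1, sd2 = 2))
c(expected_halfwidth = mean(hw), assurance = mean(hw <= 0.6))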

6.
Several studies have demonstrated that the fixed-sample stopping rule (FSR), in which the sample size is determined in advance, is less practical and efficient than sequential stopping rules. The composite limited adaptive sequential test (CLAST) is one such sequential stopping rule. Previous research has shown that CLAST is more efficient in terms of sample size and power than the FSR and other sequential rules, and that it reflects more realistically the practice of experimental psychology researchers. The CLAST rule has so far been applied only to the t test of mean differences with two matched samples and to the chi-square independence test for twofold contingency tables. The present work extends previous research on the efficiency of CLAST to multiple-group statistical tests. Simulation studies were conducted to test the efficiency of the CLAST rule for one-way fixed-effects ANOVA. The general (omnibus) ANOVA test and two linear contrasts for multiple comparisons among treatment means are considered. The article also introduces four rules for allocating N observations to J groups under the general null hypothesis and three allocation rules for the linear contrasts. Results show that the CLAST rule is generally more efficient than the FSR in terms of sample size and power for one-way ANOVA tests. However, the allocation rules vary in their optimality and have a differential impact on sample size and power. Thus, selecting an allocation rule depends on the cost of sampling and the intended precision.
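A generic sequential-stopping rule of the same flavour can be sketched as follows: test after each batch of observations, stop early when the p-value crosses a lower (reject) or upper (accept) bound, and otherwise continue up to a ceiling. The bounds, batch size, and ceiling below are made-up, and this is only an illustrative rule, not the exact CLAST procedure.

set.seed(4)
seq_anova <- function(means, sd = 1, batch = 5, n_max = 40,
                      p_low = 0.01, p_high = 0.36, alpha = 0.05) {
  J <- length(means); dat <- vector("list", J); n <- 0
  repeat {
    n <- n + batch
    for (j in seq_len(J)) dat[[j]] <- c(dat[[j]], rnorm(batch, means[j], sd))
    y <- unlist(dat); g <- factor(rep(seq_len(J), each = n))
    p <- anova(lm(y ~ g))[["Pr(>F)"]][1]            # one-way ANOVA p-value so far
    if (p <= p_low)  return(list(n_per_group = n, reject = TRUE))
    if (p >= p_high) return(list(n_per_group = n, reject = FALSE))
    if (n >= n_max)  return(list(n_per_group = n, reject = p <= alpha))
  }
}
seq_anova(means = c(0, 0, 0.8))   # three-group design with one shifted mean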

7.
When the underlying variances are unknown and/or unequal, using the conventional F test is problematic in the two‐factor hierarchical data structure. Prompted by the approximate test statistics of the Welch and Alexander–Govern methods, the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A in the unbalanced fixed‐effect two‐stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than the conventional F test across a variety of conditions. The proposed test statistics are therefore recommended for their robustness and ease of implementation.

8.
Intraclass correlation coefficients are used extensively to measure reliability or the degree of resemblance among group members in multilevel research. This study addresses the sample size necessary to ensure adequate statistical power for hypothesis tests concerning the intraclass correlation coefficient in the one-way random-effects model. In view of the incomplete and problematic numerical results in the literature, the approximate sample size formula constructed from Fisher's transformation is re-evaluated and compared with an exact approach across a wide range of model configurations. These comprehensive examinations show that the Fisher transformation method is appropriate only under limited circumstances, and it is therefore not recommended as a general method in practice. For advance design planning of reliability studies, the exact sample size procedures are fully described and illustrated for various allocation and cost schemes. Corresponding computer programs are also developed to implement the suggested algorithms.
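The exact approach rests on the standard result that, in the balanced one-way random-effects model with k groups of size n, F = (MSB/MSW) * (1 - rho0) / (1 + (n - 1) * rho0) follows an F(k - 1, k(n - 1)) distribution under H0: ICC = rho0. A short R sketch of exact power and the smallest number of groups reaching a target power; the rho values and group size are made-up.

icc_power <- function(k, n, rho0, rho1, alpha = 0.05) {
  crit <- qf(1 - alpha, k - 1, k * (n - 1))
  c0 <- (1 + (n - 1) * rho0) / (1 - rho0)
  c1 <- (1 + (n - 1) * rho1) / (1 - rho1)
  pf(crit * c0 / c1, k - 1, k * (n - 1), lower.tail = FALSE)  # exact power under rho1
}
k <- 2:200
min(k[icc_power(k, n = 10, rho0 = 0.2, rho1 = 0.4) >= 0.80])  # groups needed for 80% power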

9.
The factorial 2 × 2 fixed‐effect ANOVA is frequently used in scientific research to test between‐subjects mean differences across groups. If the assumption of variance homogeneity is violated, however, the tests for the row, column, and interaction effects may be invalid or less powerful. Therefore, when planning research with unknown and possibly unequal variances, it is worth developing a sample size formula that attains the desired power. This article suggests a simple formula for determining the sample size for 2 × 2 fixed‐effect ANOVA when variances are heterogeneous across groups. We use the approximate Welch t test and the variance ratio to derive the formula. The sample size determination requires a two‐step iteration, but the approximate sample sizes needed for the main effects and the interaction effect can be determined separately for a specified power. The study also provides an example and a SAS program to facilitate the calculation process.

10.
For one‐way fixed effects ANOVA, it is well known that the conventional F test of the equality of means is not robust to unequal variances, and numerous methods have been proposed for dealing with heteroscedasticity. On the basis of extensive empirical evidence of Type I error control and power performance, Welch's procedure is frequently recommended as the major alternative to the ANOVA F test under variance heterogeneity. To enhance its practical usefulness, this paper considers an important aspect of Welch's method in determining the sample size necessary to achieve a given power. Simulation studies are conducted to compare two approximate power functions of Welch's test for their accuracy in sample size calculations over a wide variety of model configurations with heteroscedastic structures. The numerical investigations show that Levy's (1978a) approach is clearly more accurate than the formula of Luh and Guo (2011) for the range of model specifications considered here. Accordingly, computer programs are provided to implement the technique recommended by Levy for power calculation and sample size determination within the context of the one‐way heteroscedastic ANOVA model.
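Any analytic power approximation for Welch's heteroscedastic one-way ANOVA can be cross-checked by simulation with R's built-in oneway.test. The sketch below estimates empirical power for made-up group sizes, means, and standard deviations.

set.seed(5)
welch_anova_power <- function(n, means, sds, alpha = 0.05, nsim = 5000) {
  mean(replicate(nsim, {
    y <- unlist(mapply(rnorm, n, means, sds, SIMPLIFY = FALSE))  # heteroscedastic groups
    g <- factor(rep(seq_along(n), n))
    oneway.test(y ~ g, var.equal = FALSE)$p.value < alpha        # Welch's ANOVA
  }))
}
welch_anova_power(n = c(20, 30, 40), means = c(0, 0.4, 0.8), sds = c(1, 2, 3))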

11.
Two ways to reduce the costs of training evaluation are examined. First, we examine the potential for reducing costs by assigning different numbers of subjects to the training and control groups. Given a total N of subjects, statistical power to detect the effectiveness of a training program is maximized by assigning subjects equally to the training and control groups. If the costs of training evaluation are taken into account, however, an unequal-group-size design with a larger total N may achieve the same level of statistical power at lower cost. We derive formulas for the optimal ratio of the control group size to the training group size for both ANOVA and ANCOVA designs, incorporating the differential costs of training and control group participation. Second, we examine whether using a less expensive proxy criterion measure in place of the target criterion measure can pay off when evaluating training effectiveness. We show that using a proxy criterion increases the sample size needed to achieve a given level of statistical power, and we then describe procedures for examining the trade-off between the costs saved by the less expensive proxy criterion and the costs incurred by the larger sample size.
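The first idea can be illustrated with the classical square-root allocation rule: with equal variances, total cost is minimized when the control-to-training size ratio equals the square root of the cost ratio. The R sketch below uses made-up costs and effect size and a normal approximation; it is a sketch of the general principle, not necessarily the authors' ANOVA/ANCOVA formulas.

cost_t <- 500; cost_c <- 50          # per-participant cost: training vs control
d <- 0.5; alpha <- 0.05; power <- 0.80
r <- sqrt(cost_t / cost_c)           # cost-optimal n_control / n_training
z <- qnorm(1 - alpha / 2) + qnorm(power)
n_t <- ceiling((1 + 1 / r) * (z / d)^2)   # training group size for target power
n_c <- ceiling(r * n_t)                   # control group size
c(n_training = n_t, n_control = n_c, total_cost = cost_t * n_t + cost_c * n_c)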

12.
Hertzog et al. evaluated the statistical power of linear latent growth curve models (LGCMs) to detect individual differences in change, i.e., variances of latent slopes, as a function of sample size, number of longitudinal measurement occasions, and growth curve reliability. We extend this work by investigating the effect of the number of indicators per measurement occasion on power. We analytically demonstrate that the positive effect of multiple indicators on statistical power is inversely related to the relative magnitude of occasion‐specific latent residual variance and is independent of the specific model that constitutes the observed variables, in particular of other parameters in the LGCM. When designing a study, researchers have to consider trade‐offs of costs and benefits of different design features. We demonstrate how knowledge about power equivalent transformations between indicator measurement designs allows researchers to identify the most cost‐efficient research design for detecting parameters of interest. Finally, we integrate different formal results to exhibit the trade‐off between the number of measurement occasions and number of indicators per occasion for constant power in LGCMs.

13.
Adverse impact indices are intended to detect potentially discriminatory employment practices, and although the number of such indices continues to increase, with a few exceptions, little is known about the comparative performance of those indices under various conditions. We examined the performance of eight such indices, including statistical significance tests, ad hoc “practical” tests proposed by the courts, and effect size measures. The power of these indices was examined under various levels of adverse impact, applicant pool sizes, and selection ratios. Although no index performed well in all circumstances, typically, Fisher’s exact test and the Z test for proportions (with correction for continuity) performed adequately most often. However, in some conditions, these indices were outperformed by some of the “practical” tests.
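Three of the index types compared in such studies can be computed directly in base R for a made-up selection table: the 4/5ths (80 per cent) "practical" rule, the two-proportion test with continuity correction, and Fisher's exact test.

selected   <- c(minority = 12, majority = 40)    # hires per group (made-up)
applicants <- c(minority = 60, majority = 120)   # applicants per group (made-up)
rates <- selected / applicants
rates["minority"] / rates["majority"] < 0.8              # flags adverse impact under the 4/5ths rule
prop.test(selected, applicants, correct = TRUE)$p.value  # two-proportion test with continuity correction
fisher.test(cbind(selected, applicants - selected))$p.value  # Fisher's exact test on the 2 x 2 table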

14.
The underlying statistical models for multiple regression analysis are typically of two types: fixed and random. The procedures for calculating power and sample size under fixed regression models are well known. However, the literature on random regression models is limited and has been confined to the case in which all variables have a joint multivariate normal distribution. This paper presents a unified approach to determining power and sample size for random regression models with arbitrary distribution configurations for the explanatory variables. Numerical examples illustrate the usefulness of the proposed method, and Monte Carlo simulation studies are conducted to assess its accuracy. The results show that the proposed method performs well for various model specifications and explanatory variable distributions.

15.
Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is a serious problem when comparing the means of two independent groups because heavy-tailed distributions can substantially reduce power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well with Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but it is based on a statistic with a finite-sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.
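A one-step M-estimator of location with Huber's psi (the Wilcox-style version with bend constant K = 1.28, starting from the median and the normalized MAD) is short to compute in R. The sketch below is only the estimator, not the full two-group test studied in the paper, and the example data are made-up.

onestep <- function(x, K = 1.28) {
  M <- median(x); s <- mad(x)          # mad() in R already rescales by 1/0.6745
  lo  <- sum(x < M - K * s)            # observations flagged as low outliers
  hi  <- sum(x > M + K * s)            # observations flagged as high outliers
  mid <- x[x >= M - K * s & x <= M + K * s]
  (K * s * (hi - lo) + sum(mid)) / (length(x) - lo - hi)
}
set.seed(6)
onestep(c(rnorm(30), 15, 20))          # resistant to the two extreme values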

16.
Adverse impact is often assessed by evaluating whether the success rates of two groups on a selection procedure are significantly different. Although various statistical methods have been used to analyze adverse impact data, Fisher's exact test (FET) has been widely adopted, especially when sample sizes are small. In recent years, however, the statistical field has expressed concern regarding the default use of the FET and has proposed several alternative tests. This article reviews Lancaster's mid-P (LMP) test (Lancaster, 1961), an adjustment to the FET that tends to have increased power while maintaining a Type I error rate close to the nominal level. On the basis of Monte Carlo simulation results, the LMP test was found to outperform the FET across a wide range of conditions typical of adverse impact analyses. The LMP test was also found to provide better control over Type I errors than the large-sample Z-test when the sample size was very small, but it tended to have slightly lower power than the Z-test under some conditions.
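For a 2 x 2 table, the one-sided Fisher exact p-value and its Lancaster mid-P adjustment both come straight from the hypergeometric distribution; the mid-P version counts only half of the probability of the observed table. The table values below are made-up.

tab <- matrix(c(5, 15,    # group 1: selected, rejected
               14,  6),   # group 2: selected, rejected
              nrow = 2, byrow = TRUE)
x  <- tab[1, 1]           # selected in group 1
R1 <- sum(tab[1, ])       # group 1 applicants
C1 <- sum(tab[, 1])       # total selected
N  <- sum(tab)
p_fet  <- phyper(x, C1, N - C1, R1)            # one-sided (lower-tail) Fisher exact p-value
p_midp <- phyper(x - 1, C1, N - C1, R1) + 0.5 * dhyper(x, C1, N - C1, R1)  # Lancaster mid-P
c(fisher = p_fet, mid_p = p_midp)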

17.
Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of categorical moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be statistical ramifications to upholding this assumption. We propose that researchers should instead default to assuming unequal between-study variances when analysing categorical moderators. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variance. In two extensive simulation studies, we show that in terms of Type I error and statistical power, little is lost by using the MELSM for moderator tests, but there can be serious costs when an equal variance mixed-effects model (MEM) is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the MEM and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the MEM can be grossly inflated or overly conservative, whereas the MELSM does comparatively well in controlling the Type I error across the majority of cases. A notable exception where the MELSM did not clearly outperform the MEM was in the case of few studies (e.g., 5). With respect to power, the MELSM had similar or higher power than the MEM in conditions where the latter produced non-inflated Type I error rates. Together, our results support the idea that assuming unequal between-study variances is preferred as a default strategy when testing categorical moderators.

18.
Subgroup analyses allow us to examine the influence of a categorical moderator on the effect size in meta‐analysis. We conducted a simulation study using a dichotomous moderator, and compared the impact of pooled versus separate estimates of the residual between‐studies variance on the statistical performance of the Q_B(P) and Q_B(S) tests for subgroup analyses assuming a mixed‐effects model. Our results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories. Conversely, when subgroups were unbalanced, the practical consequences of having heterogeneous residual between‐studies variances were more evident, with both tests leading to the wrong statistical conclusion more often than in the conditions with balanced subgroups. A pooled estimate should be preferred for most scenarios, unless the residual between‐studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates.
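Assuming the metafor package is available, the two strategies can be contrasted in a few lines of R: a single mixed-effects model with one pooled residual between-studies variance versus separate random-effects fits per subgroup combined through a Wald-type between-groups statistic. The data below are made-up, and the second computation is a Q_B(S)-style illustration rather than the exact statistic from the paper.

library(metafor)
set.seed(7)
dat <- data.frame(yi  = c(rnorm(12, 0.2, 0.25), rnorm(12, 0.5, 0.45)),  # effect sizes
                  vi  = runif(24, 0.02, 0.08),                          # sampling variances
                  grp = rep(c("A", "B"), each = 12))
pooled <- rma(yi, vi, mods = ~ grp, data = dat)            # one pooled tau^2, Q_B(P)-style test
fits <- lapply(split(dat, dat$grp), function(d) rma(yi, vi, data = d))
b  <- sapply(fits, coef)                                   # subgroup mean effects
se <- sapply(fits, function(f) f$se)                       # SEs based on separate tau^2 estimates
w  <- 1 / se^2
QB <- sum(w * (b - weighted.mean(b, w))^2)                 # Q_B(S)-style between-groups statistic
c(p_pooled = pooled$QMp, p_separate = pchisq(QB, df = 1, lower.tail = FALSE))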

19.
Wilcox, Keselman, Muska and Cribbie (2000) found a method for comparing the trimmed means of dependent groups that performed well in simulations, in terms of Type I errors, with a sample size as small as 21. Theory and simulations indicate that little power is lost under normality when using trimmed means rather than untrimmed means, and trimmed means can result in substantially higher power when sampling from a heavy‐tailed distribution. However, trimmed means suffer from two practical concerns described in this paper. Replacing trimmed means with a robust M‐estimator addresses these concerns, but control over the probability of a Type I error can be unsatisfactory when the sample size is small. Methods based on a simple modification of a one‐step M‐estimator that address the problems with trimmed means are examined. Several omnibus tests are compared, one of which performed well in simulations, even with a sample size of 11.

20.
Data obtained from one‐way independent-groups designs are typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic used to assess statistical significance (the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied to the usual least squares estimators of central tendency and variability, and the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances though not with non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it with the Welch test based on robust estimators. We therefore investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicate that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal.
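A rough reconstruction of a parametric bootstrap test for equal means under heteroscedasticity follows; it is an illustrative sketch in the spirit of the statistic discussed, not necessarily the published algorithm, and the data are made-up.

set.seed(8)
pb_anova <- function(y, g, B = 2000) {
  g <- factor(g)
  n <- tapply(y, g, length); m <- tapply(y, g, mean); v <- tapply(y, g, var)
  stat <- function(m, v, n) {              # weighted between-group variability
    w <- n / v
    sum(w * (m - sum(w * m) / sum(w))^2)
  }
  t_obs <- stat(m, v, n)
  t_boot <- replicate(B, {
    m_b <- rnorm(length(n), 0, sqrt(v / n))          # bootstrap group means under H0
    v_b <- v * rchisq(length(n), n - 1) / (n - 1)    # bootstrap group variances
    stat(m_b, v_b, n)
  })
  mean(t_boot >= t_obs)                              # parametric bootstrap p-value
}
y <- c(rnorm(15, 0, 1), rnorm(20, 0.6, 2), rnorm(25, 1.2, 3))
g <- rep(1:3, c(15, 20, 25))
pb_anova(y, g)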
