Similar Articles
 20 similar articles found (search time: 46 ms)
1.
Yuen's two-sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non-normality and unequal sample sizes. Given the specified α and the power (1 − β), the minimum sample size needed by the proposed formulas under various conditions is smaller than that given by the conventional formulas. Moreover, given a specified sample size calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.
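For readers who want to try the test described above, SciPy (≥ 1.7) exposes Yuen's trimmed t test through the `trim` argument of `ttest_ind`; the data here are simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two groups with unequal variances and unequal sample sizes
a = rng.normal(0.0, 1.0, size=20)
b = rng.normal(0.8, 3.0, size=35)

# Yuen's test: a Welch-type t test on 20% trimmed means with
# Winsorized variances (trim= requires SciPy >= 1.7)
t, p = stats.ttest_ind(a, b, equal_var=False, trim=0.2)
print(round(t, 3), round(p, 3))
```

With `trim=0` the call reduces to the ordinary Welch test, which makes it easy to compare the two procedures on the same data.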

2.
The data obtained from one-way independent groups designs are typically non-normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non-normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non-normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non-normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non-normal.
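As a rough illustration of the parametric bootstrap idea discussed above — a simplified sketch of the Krishnamoorthy–Lu–Mathew approach, not their exact statistic — one can resample group means and variances under the null and compare a precision-weighted between-group statistic to its bootstrap distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

def pb_anova(groups, B=2000, rng=rng):
    """Simplified parametric-bootstrap test of equal means under
    heteroscedasticity (a sketch of the idea, not the published
    formulation)."""
    ns = np.array([len(g) for g in groups])
    means = np.array([np.mean(g) for g in groups])
    vars_ = np.array([np.var(g, ddof=1) for g in groups])
    w = ns / vars_                          # precision weights
    grand = np.sum(w * means) / np.sum(w)
    t_obs = np.sum(w * (means - grand) ** 2)
    count = 0
    for _ in range(B):
        # simulate group means and variances under H0: all means equal
        m_b = rng.normal(0.0, np.sqrt(vars_ / ns))
        v_b = vars_ * rng.chisquare(ns - 1) / (ns - 1)
        w_b = ns / v_b
        g_b = np.sum(w_b * m_b) / np.sum(w_b)
        count += np.sum(w_b * (m_b - g_b) ** 2) >= t_obs
    return count / B                        # bootstrap p-value

groups = [rng.normal(0, 1, 15), rng.normal(0, 2, 25), rng.normal(1.5, 3, 20)]
p = pb_anova(groups)
```

The trimmed-mean modification studied in the abstract would replace the ordinary means and variances above with trimmed means and Winsorized variances.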

3.
Preliminary tests of equality of variances used before a test of location are no longer widely recommended by statisticians, although they persist in some textbooks and software packages. The present study extends the findings of previous studies and provides further reasons for discontinuing the use of preliminary tests. The study found Type I error rates of a two-stage procedure, consisting of a preliminary Levene test on samples of different sizes with unequal variances, followed by either a Student pooled-variances t test or a Welch separate-variances t test. Simulations disclosed that the two-stage procedure fails to protect the significance level and usually makes the situation worse. Earlier studies have shown that preliminary tests often adversely affect the size of the test, and also that the Welch test is superior to the t test when variances are unequal. The present simulations reveal that changes in Type I error rates are greater when sample sizes are smaller, when the difference in variances is slight rather than extreme, and when the significance level is more stringent. Furthermore, the validity of the Welch test deteriorates if it is used only on those occasions where a preliminary test indicates it is needed. Optimum protection is assured by using a separate-variances test unconditionally whenever sample sizes are unequal.
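The two-stage procedure under study takes only a few lines of SciPy; it is shown here to make the conditional logic concrete, not as a recommendation (the abstract's advice is the opposite):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def two_stage_t(a, b, alpha_pre=0.05):
    """The two-stage procedure examined above (for illustration only):
    a preliminary Levene test decides between the pooled-variance
    Student t and the Welch separate-variances t."""
    _, p_levene = stats.levene(a, b)
    equal_var = p_levene >= alpha_pre      # fail to reject -> pool
    return stats.ttest_ind(a, b, equal_var=equal_var)

# Unequal sizes with unequal variances -- the problematic case
a = rng.normal(0, 1, 10)
b = rng.normal(0, 2, 40)
t, p = two_stage_t(a, b)
```

The paper's recommendation corresponds to calling `stats.ttest_ind(a, b, equal_var=False)` unconditionally whenever sample sizes are unequal.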

4.
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration-based set of hypotheses containing equality constraints on the means, or a theory-based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory-based hypotheses) has advantages over exploration (i.e., examining all possible equality-constrained hypotheses). Furthermore, examining reasonable order-restricted hypotheses has more power to detect the true effect/non-null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory-based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number).

5.
Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of categorical moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be statistical ramifications to upholding this assumption. We propose that researchers should instead default to assuming unequal between-study variances when analysing categorical moderators. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variance. In two extensive simulation studies, we show that in terms of Type I error and statistical power, little is lost by using the MELSM for moderator tests, but there can be serious costs when an equal-variance mixed-effects model (MEM) is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the MEM and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the MEM can be grossly inflated or overly conservative, whereas the MELSM does comparatively well in controlling the Type I error across the majority of cases. A notable exception where the MELSM did not clearly outperform the MEM was in the case of few studies (e.g., 5). With respect to power, the MELSM had similar or higher power than the MEM in conditions where the latter produced non-inflated Type I error rates. Together, our results support the idea that assuming unequal between-study variances is preferred as a default strategy when testing categorical moderators.

6.
The factorial 2 × 2 fixed-effect ANOVA is a procedure frequently used in scientific research to test between-subjects mean differences across all of the groups. But if the assumption of homogeneity of variance is violated, the tests for the row, column, and interaction effects might be invalid or less powerful. Therefore, for planning research in the case of unknown and possibly unequal variances, it is worth developing a sample size formula to obtain the desired power. This article suggests a simple formula to determine the sample size for 2 × 2 fixed-effect ANOVA with heterogeneous variances across groups. We use the approximate Welch t test and consider the variance ratio to derive the formula. The sample size determination requires two-step iterations, but the approximate sample sizes needed for the main effect and the interaction effect can be determined separately with the specified power. The present study also provides an example and a SAS program to facilitate the calculation process.

7.
Efforts to change power differences with others who are equal and unequal in power were examined. According to social comparison theory (Festinger, 1954; Rijsman, 1983), people prefer slight superiority in power over comparison others. In Experiment 1, 93 participants imagined working with two others in a group. Group members varied in hierarchical rank and on exact power scores. Participants indicated their preferred changes in power differences. Social comparison theory was supported regarding rank differences, but not regarding power scores. In Experiment 2, 145 participants imagined a similar group setting. Group members were equal, unequal, or very unequal in power. Social comparison theory was supported regarding ranks: power differences with an equally powerful person were increased more often than with a less powerful person. Power scores again yielded no effects. This suggests that social comparisons of power are based on rank and not interval information.

8.
Pan T, Yin Y. Psychological Methods, 2012, 17(2): 309-311
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)² and that SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First, strictly speaking, MSD should not be compared to SEM because they measure different things, have different assumptions, and capture different sources of errors. Second, the related proof and conclusions in Barchard hold only under the assumptions of equal reliabilities, homogeneous variances, and independent measurement errors. To address the limitations, we propose that MSD should be compared to the standard error of measurement of difference scores (SEMX−Y) so that the comparison can be extended to conditions where the 2 tests have unequal reliabilities and score variances.

9.
The Kimberley Indigenous Cognitive Assessment (KICA) was initially developed and validated as a culturally appropriate dementia screening tool for older Indigenous people living in the Kimberley. This paper describes the re-evaluation of the psychometric properties of the cognitive section (KICA-Cog) of this tool in two different populations, including a Northern Territory sample, and a larger population-based cohort from the Kimberley. In both populations, participants were evaluated on the KICA-Cog tool, and independently assessed by expert clinical raters blinded to the KICA scores, to determine validity and reliability of dementia diagnosis for both groups. Community consultation, feedback and education were integral parts of the research. For the Northern Territory sample, 52 participants were selected primarily through health services. Sensitivity was 82.4% and specificity was 87.5% for diagnosis of dementia, with an area under the curve (AUC) of .95, based on a cut-off score of 31/32 out of a possible 39. For the Kimberley sample, 363 participants from multiple communities formed part of a prevalence study of dementia. Sensitivity was 93.3% and specificity was 98.4% for a cut-off score of 33/34, with AUC = .98 (95% confidence interval: 0.97–0.99). There was no education bias found. The KICA-Cog appears to be most reliable at a cut-off of 33/39.

10.
The Type I error probability and the power of the independent samples t test, performed directly on the ranks of scores in combined samples in place of the original scores, are known to be the same as those of the non-parametric Wilcoxon–Mann–Whitney (WMW) test. In the present study, simulations revealed that these probabilities remain essentially unchanged when the number of ranks is reduced by assigning the same rank to multiple ordered scores. For example, if 200 ranks are reduced to as few as 20, or 10, or 5 ranks by replacing sequences of consecutive ranks by a single number, the Type I error probability and power stay about the same. Significance tests performed on these modular ranks consistently reproduce familiar findings about the comparative power of the t test and the WMW tests for normal and various non-normal distributions. Similar results are obtained for modular ranks used in comparing the one-sample t test and the Wilcoxon signed ranks test.
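The rank-collapsing step described above ("modular ranks") is easy to reproduce; this sketch maps the combined-sample ranks onto k bins and runs the t test on them:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def modular_ranks(x, y, k):
    """Rank the combined sample, then collapse consecutive ranks into
    k bins ('modular ranks'), as described in the abstract."""
    combined = np.concatenate([x, y])
    r = stats.rankdata(combined)
    n = len(combined)
    mod = np.ceil(r * k / n)           # map ranks 1..n onto 1..k
    return mod[: len(x)], mod[len(x):]

x = rng.normal(0.0, 1.0, 100)
y = rng.normal(0.5, 1.0, 100)
rx, ry = modular_ranks(x, y, k=10)
# A t test on the modular ranks approximates the WMW test
t, p = stats.ttest_ind(rx, ry)
```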

11.
The equality of two group variances is frequently tested in experiments. However, criticisms of null hypothesis statistical testing on means have recently arisen and there is interest in other types of statistical tests of hypotheses, such as superiority/non-inferiority and equivalence. Although these tests have become more common in psychology and social sciences, the corresponding sample size estimation for these tests is rarely discussed, especially when the sampling unit costs are unequal or group sizes are unequal for two groups. Thus, to find the optimal sample size, the present study derived an initial allocation by approximating the percentiles of an F distribution with the percentiles of the standard normal distribution and used an exhaustion algorithm to select the best combination of group sizes, thereby ensuring that the resulting power reaches the designated level and is maximal with a minimal total cost. In this manner, optimization of sample size planning is achieved. The proposed sample size determination has a wide range of applications and is efficient in terms of Type I errors and statistical power in simulations. Finally, an illustrative example from a report by the Health Survey for England, 1995–1997, is presented using hypertension data. For ease of application, four R Shiny apps are provided and benchmarks for setting equivalence margins are suggested.

12.
The allocation of sufficient participants into different experimental groups for various research purposes under given constraints is an important practical problem faced by researchers. We address the problem of sample size determination between two independent groups for unequal and/or unknown variances when both the power and the differential cost are taken into consideration. We apply the well-known Welch approximate test to derive various sample size allocation ratios by minimizing the total cost or, equivalently, maximizing the statistical power. Two types of hypotheses including superiority/non-inferiority and equivalence of two means are each considered in the process of sample size planning. A simulation study is carried out and the proposed method is validated in terms of Type I error rate and statistical power. As a result, the simulation study reveals that the proposed sample size formulas are very satisfactory under various variances and sample size allocation ratios. Finally, a flowchart, tables, and figures of several sample size allocations are presented for practical reference.
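A classical reference point for the cost-aware allocation discussed above is the square-root rule, under which the optimal group-size ratio depends only on the ratio of standard deviations and the inverse square root of the cost ratio. The paper's Welch-based formulas refine this idea, so the sketch below should be read as the textbook baseline, not the authors' method:

```python
import math

def optimal_allocation_ratio(sigma1, sigma2, c1, c2):
    """Classical cost-optimal allocation for a two-sample design:
    n1/n2 = (sigma1/sigma2) * sqrt(c2/c1), where c1 and c2 are
    per-subject sampling costs.  A standard result, shown for context."""
    return (sigma1 / sigma2) * math.sqrt(c2 / c1)

# Group 1 is twice as variable, but each of its subjects costs
# four times as much; the two effects cancel exactly:
r = optimal_allocation_ratio(2.0, 1.0, 4.0, 1.0)  # -> 1.0
```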

13.
In classical test theory, a high-reliability test always leads to a precise measurement. However, when it comes to the prediction of test scores, it is not necessarily so. Based on a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the predicted means, variances, and covariances of predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. Regarding a new subject taking a new test, in this study, higher test reliability led to a large variance when the sample size was smaller than half the number of tests. Classical test theory is reanalyzed from the viewpoint of predictions and some suggestions are made.

14.
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non-normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann–Yuen test of equivalence, which utilizes trimmed means, with that of the previously recommended Schuirmann and Schuirmann–Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann–Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann–Welch tests, and the power of the Schuirmann–Yuen was substantially greater than that of the Schuirmann or Schuirmann–Welch tests when distributions were skewed or outliers were present. The Schuirmann–Yuen test is recommended for assessing clinical significance with normative comparisons.
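A Schuirmann-style TOST built from Yuen trimmed t tests can be sketched with SciPy's `trim` option; the symmetric margin and the 20% trimming level below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def tost_yuen(a, b, margin, trim=0.2):
    """Sketch of a Schuirmann-style TOST using Yuen trimmed t tests.
    `margin` is an equivalence bound assumed symmetric around zero;
    equivalence is declared when both one-sided tests reject."""
    # Lower test: H0: diff <= -margin  vs  H1: diff > -margin
    _, p_lower = stats.ttest_ind(a, b - margin, equal_var=False,
                                 trim=trim, alternative='greater')
    # Upper test: H0: diff >= +margin  vs  H1: diff < +margin
    _, p_upper = stats.ttest_ind(a, b + margin, equal_var=False,
                                 trim=trim, alternative='less')
    return max(p_lower, p_upper)   # TOST p-value

a = rng.normal(0.0, 1.0, 60)
b = rng.normal(0.1, 1.5, 80)
p = tost_yuen(a, b, margin=0.5)
```

Setting `trim=0` recovers the Schuirmann–Welch variant, so the three tests compared in the abstract differ only in these keyword arguments.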

15.
For one-way fixed effects ANOVA, it is well known that the conventional F test of the equality of means is not robust to unequal variances, and numerous methods have been proposed for dealing with heteroscedasticity. On the basis of extensive empirical evidence of Type I error control and power performance, Welch's procedure is frequently recommended as the major alternative to the ANOVA F test under variance heterogeneity. To enhance its practical usefulness, this paper considers an important aspect of Welch's method in determining the sample size necessary to achieve a given power. Simulation studies are conducted to compare two approximate power functions of Welch's test for their accuracy in sample size calculations over a wide variety of model configurations with heteroscedastic structures. The numerical investigations show that Levy's (1978a) approach is clearly more accurate than the formula of Luh and Guo (2011) for the range of model specifications considered here. Accordingly, computer programs are provided to implement the technique recommended by Levy for power calculation and sample size determination within the context of the one-way heteroscedastic ANOVA model.
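SciPy has no built-in Welch ANOVA, but the statistic is short to implement, and a Monte Carlo loop gives a brute-force power estimate against which analytic approximations such as Levy's can be checked. The group means, SDs, and sizes below are arbitrary illustrations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def welch_anova_p(groups):
    """One-way Welch (heteroscedastic) ANOVA, implemented directly
    from the standard formulas."""
    k = len(groups)
    ns = np.array([len(g) for g in groups])
    means = np.array([np.mean(g) for g in groups])
    vs = np.array([np.var(g, ddof=1) for g in groups])
    w = ns / vs
    W = w.sum()
    grand = (w * means).sum() / W
    num = (w * (means - grand) ** 2).sum() / (k - 1)
    h = (((1 - w / W) ** 2) / (ns - 1)).sum()
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * h
    df2 = (k ** 2 - 1) / (3 * h)
    return stats.f.sf(num / den, k - 1, df2)

def empirical_power(means, sds, ns, alpha=0.05, reps=1000, rng=rng):
    """Monte Carlo power of Welch's test under given heteroscedastic
    configuration -- a simulation companion to analytic formulas."""
    hits = 0
    for _ in range(reps):
        groups = [rng.normal(m, s, n) for m, s, n in zip(means, sds, ns)]
        hits += welch_anova_p(groups) < alpha
    return hits / reps

power = empirical_power([0.0, 0.4, 0.8], [1.0, 2.0, 3.0], [30, 30, 30])
```

In a sample-size search, `empirical_power` would be called with increasing `ns` until the designated power level is reached.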

16.
Methods for comparing means are known to be highly nonrobust in terms of Type II errors. The problem is that slight shifts from normal distributions toward heavy-tailed distributions inflate the standard error of the sample mean. In contrast, the standard errors of various robust measures of location, such as the one-step M-estimator, are relatively unaffected by heavy tails. Wilcox recently examined a method of comparing the one-step M-estimators of location corresponding to two independent groups which provided good control over the probability of a Type I error even for unequal sample sizes, unequal variances, and different shaped distributions. There is a fairly obvious extension of this procedure to pairwise comparisons of more than two independent groups, but simulations reported here indicate that it is unsatisfactory. A slight modification of the procedure is found to give much better results, although some caution must be taken when there are unequal sample sizes and light-tailed distributions. An omnibus test is examined as well.

17.
In the present paper, a general class of heteroscedastic one-factor models is considered. In these models, the residual variances of the observed scores are explicitly modelled as parametric functions of the one-dimensional factor score. A marginal maximum likelihood procedure for parameter estimation is proposed under both the assumption of multivariate normality of the observed scores conditional on the single common factor score and the assumption of normality of the common factor score. A likelihood ratio test is derived, which can be used to test the usual homoscedastic one-factor model against one of the proposed heteroscedastic models. Simulation studies are carried out to investigate the robustness and the power of this likelihood ratio test. Results show that the asymptotic properties of the test statistic hold under both small test length conditions and small sample size conditions. Results also show under what conditions the power to detect different heteroscedasticity parameter values is either small, medium, or large. Finally, for illustrative purposes, the marginal maximum likelihood estimation procedure and the likelihood ratio test are applied to real data.

18.
The ability scores from all complete twin pairs recruited to the compulsory Norwegian military service during the period 1950–1954 were analysed, using a multivariate design. Three separate scores were obtained from the Army tests: general ability, technical comprehension and arithmetical skills. A chi-square model test and a maximum likelihood estimation were performed using the LISREL computer program. The total genetic variances varied from 40% to 66% in the three subtests. The environmental within-family variances and the environmental between-family variances were about equally large. The intra-correlations between the subtests were generally high, and the major part of the genetic variance was common, but specific genetic effects and specific environmental within-family effects each explained about 20% of the total variances in technical comprehension and in arithmetical skills. Nearly no variance was specific for general ability. All the environmental between-family variance was common to the three subtests.

19.
The use of cut-off values is common in research on the effort-reward imbalance (ERI) model. They are often used to identify health risk situations or behaviour at work, although little is known about their diagnostic properties. The aim of the study was to investigate empirically cut-off points for the effort-reward ratio and the overcommitment (OC) scale. The study was based on a sample of 302 teachers. According to the International Classification of Mental and Behavioural Disorders-10 (ICD-10), 115 subjects suffered from a mood disorder. The control group consisted of 187 matched healthy subjects. Receiver-operating characteristic analyses were conducted using the ERI ratio and OC as diagnostic variables. A mood disorder served as gold standard reference test. Results demonstrated the ability of the effort-reward ratio and OC to discriminate between diseased and healthy individuals. However, a comparison of the areas under the curve revealed a significantly higher diagnostic power for the effort-reward ratio. According to the Youden index, optimal cut-off points were ERI>0.715 and OC>16. Furthermore, sensitivity and specificity for different cut-off values are presented. Results point to shortcomings in the ERI literature using established approaches to define cut-off points. Validating cut-off values is of particular importance in order to ensure valid results in ERI research.

20.
International comparisons of IQ test norms show differences between nations. In the present study, nonverbal reasoning, processing speed and working memory subtest scores of the US, German, French, Finnish, and Scandinavian (combined Swedish-Norwegian-Danish sample) WAIS IV standardization samples were compared. The European samples had higher scores on the reasoning subtests compared to the American sample, corroborating earlier studies. The Finnish and Scandinavian samples had lower processing speed and working memory scores than the American, German, and French samples. Mechanisms that may underlie the observed national IQ profiles include: (1) test-taking attitudes—in tests that require balancing speed and accuracy of performance Americans may prioritize fast performance while Europeans avoid mistakes; (2) differences between languages in digit articulation times; and (3) educational factors—the European advantage on reasoning subtests may be based on there being better educational systems in Europe as compared to the US.
