Similar Literature
Found 20 similar articles (search time: 31 ms)
1.
Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of categorical moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be statistical ramifications to upholding this assumption. We propose that researchers should instead default to assuming unequal between-study variances when analysing categorical moderators. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variance. In two extensive simulation studies, we show that in terms of Type I error and statistical power, little is lost by using the MELSM for moderator tests, but there can be serious costs when an equal variance mixed-effects model (MEM) is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the MEM and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the MEM can be grossly inflated or overly conservative, whereas the MELSM does comparatively well in controlling the Type I error across the majority of cases. A notable exception where the MELSM did not clearly outperform the MEM was in the case of few studies (e.g., 5). With respect to power, the MELSM had similar or higher power than the MEM in conditions where the latter produced non-inflated Type I error rates. Together, our results support the idea that assuming unequal between-study variances is preferred as a default strategy when testing categorical moderators.
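The core idea, letting each moderator level carry its own between-study variance rather than pooling, can be illustrated outside the MELSM framework by estimating tau-squared separately per subgroup. A minimal sketch using the DerSimonian-Laird estimator on simulated data (the group labels, true variances, and sample sizes are hypothetical, not taken from the article):

```python
import numpy as np

def dl_tau2(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance tau^2."""
    w = 1.0 / np.asarray(variances, dtype=float)
    y = np.asarray(effects, dtype=float)
    y_bar = np.sum(w * y) / np.sum(w)            # fixed-effects weighted mean
    q = np.sum(w * (y - y_bar) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)

rng = np.random.default_rng(11)
v = np.full(20, 0.02)                                  # known sampling variances
# two moderator levels with very different true between-study variance
low = rng.normal(0.3, np.sqrt(0.01 + 0.02), size=20)   # true tau^2 = 0.01
high = rng.normal(0.3, np.sqrt(0.20 + 0.02), size=20)  # true tau^2 = 0.20
tau2_low, tau2_high = dl_tau2(low, v), dl_tau2(high, v)
```

Fitting a single pooled tau-squared to both groups would average these two very different quantities, which is the situation the MELSM is designed to avoid.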

2.
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic strategy to isolate, examine, and remove suppression effects has been offered. In this article such an approach, rooted in confirmatory factor analysis theory and employing matrix algebra, is developed. Suppression is viewed as the result of criterion-irrelevant variance operating among predictors. Decomposition of predictor variables into criterion-relevant and criterion-irrelevant components using structural equation modeling permits derivation of regression weights with the effects of criterion-irrelevant variance omitted. Three examples with data from applied research are used to illustrate the approach: the first assesses child and parent characteristics to explain why some parents of children with obsessive-compulsive disorder accommodate their child's compulsions more so than do others, the second examines various dimensions of personal health to explain individual differences in global quality of life among patients following heart surgery, and the third deals with quantifying the relative importance of various aptitudes for explaining academic performance in a sample of nursing students. The approach is offered as an analytic tool for investigators interested in understanding predictor-criterion relationships when complex patterns of intercorrelation among predictors are present and is shown to augment dominance analysis.

3.
In a recent article in The Journal of General Psychology, J. B. Hittner, K. May, and N. C. Silver (2003) described their investigation of several methods for comparing dependent correlations and found that all can be unsatisfactory, in terms of Type I errors, even with a sample size of 300. More precisely, when researchers test at the .05 level, the actual Type I error probability can exceed .10. The authors of this article extended J. B. Hittner et al.'s research by considering a variety of alternative methods. They found 3 that avoid inflating the Type I error rate above the nominal level. However, a Monte Carlo simulation demonstrated that when the underlying distribution of scores violated the assumption of normality, 2 of these methods had relatively low power and had actual Type I error rates well below the nominal level. The authors report comparisons with E. J. Williams' (1959) method.
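For reference, one common statement of Williams' (1959) t statistic for comparing two dependent correlations that share a variable (as presented in Steiger's 1980 review) can be sketched as follows; the example correlations and sample size are hypothetical:

```python
import numpy as np
from scipy import stats

def williams_t(r12, r13, r23, n):
    """Williams' t for H0: rho12 = rho13, where variables 2 and 3 are both
    correlated with variable 1 (one common formulation; cf. Steiger, 1980).
    Referred to a t distribution with n - 3 degrees of freedom."""
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23  # |R|
    r_bar = (r12 + r13) / 2
    t = (r12 - r13) * np.sqrt(
        (n - 1) * (1 + r23)
        / (2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3)
    )
    p = 2 * stats.t.sf(abs(t), n - 3)
    return t, p

t, p = williams_t(0.5, 0.3, 0.4, n=100)
```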

4.
Choice of the appropriate model in meta-analysis is often treated as an empirical question which is answered by examining the amount of variability in the effect sizes. When all of the observed variability in the effect sizes can be accounted for based on sampling error alone, a set of effect sizes is said to be homogeneous and a fixed-effects model is typically adopted. Whether a set of effect sizes is homogeneous or not is usually tested with the so-called Q test. In this paper, a variety of alternative homogeneity tests - the likelihood ratio, Wald and score tests - are compared with the Q test in terms of their Type I error rate and power for four different effect size measures. Monte Carlo simulations show that the Q test kept the tightest control of the Type I error rate, although the results emphasize the importance of large sample sizes within the set of studies. The results also suggest under what conditions the power of the tests can be considered adequate.
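The Q test referred to here is Cochran's homogeneity test: weight each effect size by its inverse sampling variance and compare the weighted sum of squared deviations to a chi-square distribution with k - 1 degrees of freedom. A minimal sketch (the effect sizes and variances are hypothetical):

```python
import numpy as np
from scipy import stats

def q_test(effects, variances):
    """Cochran's Q homogeneity test for k effect sizes with known sampling variances."""
    w = 1.0 / np.asarray(variances, dtype=float)
    y = np.asarray(effects, dtype=float)
    y_bar = np.sum(w * y) / np.sum(w)        # fixed-effects weighted mean
    q = np.sum(w * (y - y_bar) ** 2)         # Q statistic
    df = len(y) - 1
    p = stats.chi2.sf(q, df)                 # reference chi-square with k-1 df
    return q, p

# tightly clustered effects: Q should be small and p large (homogeneity retained)
q, p = q_test([0.30, 0.32, 0.29, 0.31], [0.01, 0.01, 0.01, 0.01])
```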

5.
One approach to the analysis of repeated measures data allows researchers to model the covariance structure of the data rather than presume a certain structure, as is the case with conventional univariate and multivariate test statistics. This mixed-model approach was evaluated for testing all possible pairwise differences among repeated measures marginal means in a Between-Subjects × Within-Subjects design. Specifically, the authors investigated Type I error and power rates for a number of simultaneous and stepwise multiple comparison procedures using SAS (1999) PROC MIXED in unbalanced designs when normality and covariance homogeneity assumptions did not hold. J. P. Shaffer's (1986) sequentially rejective step-down and Y. Hochberg's (1988) sequentially acceptive step-up Bonferroni procedures, based on an unstructured covariance structure, had superior Type I error control and power to detect true pairwise differences across the investigated conditions.
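Hochberg's (1988) step-up Bonferroni procedure itself is simple to state: sort the p-values from largest to smallest, test the largest at alpha, the next at alpha/2, then alpha/3, and so on; as soon as one hypothesis is rejected, reject it and every hypothesis with a smaller p-value. A minimal sketch with hypothetical p-values:

```python
import numpy as np

def hochberg(pvals, alpha=0.05):
    """Hochberg (1988) step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)[::-1]              # indices, largest p first
    reject = np.zeros(len(p), dtype=bool)
    for step, idx in enumerate(order):       # step 0 tests at alpha/1, step 1 at alpha/2, ...
        if p[idx] <= alpha / (step + 1):
            reject[p <= p[idx]] = True       # reject this and all smaller p-values
            break
    return reject

mask = hochberg([0.001, 0.013, 0.044, 0.05])   # all four rejected (largest p <= .05)
mask2 = hochberg([0.001, 0.02, 0.03, 0.20])    # only the smallest p is rejected
```

Note how the first set illustrates the step-up logic: a plain Bonferroni correction at .05/4 would reject only the first hypothesis, but Hochberg's procedure rejects all four because the largest p-value already passes its own threshold.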

6.
Repeated measures analyses of variance are the method of choice in many studies from experimental psychology and the neurosciences. Data from these fields are often characterized by small sample sizes, high numbers of factor levels of the within-subjects factor(s), and nonnormally distributed response variables such as response times. For a design with a single within-subjects factor, we investigated Type I error control in univariate tests with corrected degrees of freedom, the multivariate approach, and a mixed-model (multilevel) approach (SAS PROC MIXED) with Kenward–Roger's adjusted degrees of freedom. We simulated multivariate normal and nonnormal distributions with varied population variance–covariance structures (spherical and nonspherical), sample sizes (N), and numbers of factor levels (K). For normally distributed data, as expected, the univariate approach with Huynh–Feldt correction controlled the Type I error rate with only very few exceptions, even if sample sizes as low as three were combined with high numbers of factor levels. The multivariate approach also controlled the Type I error rate, but it requires N > K. PROC MIXED often showed acceptable control of the Type I error rate for normal data, but it also produced several liberal or conservative results. For nonnormal data, all of the procedures showed clear deviations from the nominal Type I error rate in many conditions, even for sample sizes greater than 50. Thus, none of these approaches can be considered robust if the response variable is nonnormally distributed. The results indicate that both the variance heterogeneity and covariance heterogeneity of the population covariance matrices affect the error rates.

7.
Ignoring a nested factor can influence the validity of statistical decisions about treatment effectiveness. Previous discussions have centered on consequences of ignoring nested factors versus treating them as random factors on Type I errors and measures of effect size (B. E. Wampold & R. C. Serlin). The authors (a) discuss circumstances under which the treatment of nested provider effects as fixed as opposed to random is appropriate; (b) present 2 formulas for the correct estimation of effect sizes when nested factors are fixed; (c) present the results of Monte Carlo simulations of the consequences of treating providers as fixed versus random on effect size estimates, Type I error rates, and power; and (d) discuss implications of mistaken considerations of provider effects for the study of differential treatment effects in psychotherapy research.

8.
This article proposes 2 new approaches to test a nonzero population correlation (ρ): the hypothesis-imposed univariate sampling bootstrap (HI) and the observed-imposed univariate sampling bootstrap (OI). The authors simulated correlated populations with various combinations of normal and skewed variates. With α = .05, N ≥ 10, and ρ ≤ .40, empirical Type I error rates of the parametric r and the conventional bivariate sampling bootstrap reached .168 and .081, respectively, whereas the largest error rates of the HI and the OI were .079 and .062. On the basis of these results, the authors suggest that the OI is preferable in α control to parametric approaches if the researcher believes the population is nonnormal and wishes to test for nonzero ρ of moderate size.
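The conventional bivariate-sampling bootstrap that HI and OI are compared against resamples (x, y) pairs jointly and recomputes r each time; a null value outside the resulting percentile interval is rejected. A minimal sketch on simulated data (the HI/OI resampling schemes themselves are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(7)

def bootstrap_ci_r(x, y, n_boot=2000, alpha=0.05):
    """Conventional bivariate-sampling bootstrap percentile CI for a correlation.
    H0: rho = rho0 is rejected when rho0 falls outside the interval."""
    n = len(x)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample pairs with replacement
        rs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.quantile(rs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# strongly correlated data (true rho = .8): the interval should exclude 0
x = rng.normal(size=200)
y = 0.8 * x + 0.6 * rng.normal(size=200)
lo, hi = bootstrap_ci_r(x, y, n_boot=1000)
```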

9.
Our goal is to provide empirical scientists with practical tools and advice with which to test hypotheses related to individual differences in intra-individual variability using the mixed-effects location-scale model. To that end, we evaluate Type I error rates and power to detect and predict individual differences in intra-individual variability using this model and provide empirically-based guidelines for building scale models that include random and/or systematically-varying fixed effects. We also provide two power simulation programs that allow researchers to conduct a priori empirical power analyses. Our results aligned with statistical power theory, in that greater power was observed for designs with more individuals, more repeated occasions, greater proportions of variance available to be explained, and larger effect sizes. In addition, our results indicated that Type I error rates were acceptable in situations when individual differences in intra-individual variability were not initially detectable as well as when the scale-model individual-level predictor explained all initially detectable individual differences in intra-individual variability. We conclude our paper by providing study design and model building advice for those interested in using the mixed-effects location-scale model in practice.

10.
The partial correlation and the semi-partial correlation can be seen as measures of partial effect sizes for the correlational family. Thus, both indices have been used in the meta-analysis literature to represent the relationship between an outcome and a predictor of interest, controlling for the effect of other variables in the model. This article evaluates the accuracy of synthesizing these two indices under different situations. Both partial correlation and the semi-partial correlation appear to behave as expected with respect to bias and root mean squared error (RMSE). However, the partial correlation seems to outperform the semi-partial correlation regarding Type I error of the homogeneity test (Q statistic). Although further investigation is needed to fully understand the impact of meta-analyzing partial effect sizes, the current study demonstrates the accuracy of both indices.
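The two indices differ only in whether the outcome is also residualized on the control variables: the partial correlation residualizes both the predictor and the outcome, while the semi-partial correlation residualizes only the predictor. A minimal illustration on simulated data (variable names and effect sizes are hypothetical):

```python
import numpy as np

def residualize(v, covariates):
    """Residuals of v after least-squares regression on covariates (plus intercept)."""
    X = np.column_stack([np.ones(len(v)), covariates])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + rng.normal(size=500)        # predictor of interest, confounded with z
y = 2 * z + rng.normal(size=500)    # outcome driven by z only

r_xy = np.corrcoef(x, y)[0, 1]                                 # raw correlation
r_partial = np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]
r_semipartial = np.corrcoef(residualize(x, z), y)[0, 1]
```

Because y depends on x only through the shared confounder z, the raw correlation is sizeable while both partial indices are near zero.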

11.
Hou, de la Torre, and Nandakumar (2014) proposed using the Wald statistic to test DIF, but its Type I error rate suffers from severe inflation. This study proposes an improved Wald statistic computed from the observed information matrix. The results show: (1) the improved Wald statistic computed with the observed information matrix provides good Type I error control in DIF testing, especially when items have high discrimination, resolving the Type I error inflation reported in previous research; (2) as sample size and the amount of DIF increase, the statistical power of the Wald statistic computed with the observed information matrix also increases.

12.
Inconsistencies in the research findings on F-test robustness to variance heterogeneity could be related to the lack of a standard criterion to assess robustness or to the different measures used to quantify heterogeneity. In the present paper we use Monte Carlo simulation to systematically examine the Type I error rate of the F-test under heterogeneity. One-way, balanced, and unbalanced designs with monotonic patterns of variance were considered. Variance ratio (VR) was used as a measure of heterogeneity (1.5, 1.6, 1.7, 1.8, 2, 3, 5, and 9), the coefficient of sample size variation as a measure of inequality between group sizes (0.16, 0.33, and 0.50), and the correlation between variance and group size as an indicator of the pairing between them (1, .50, 0, −.50, and −1). Overall, the results suggest that in terms of Type I error a VR above 1.5 may be established as a rule of thumb for considering a potential threat to F-test robustness under heterogeneity with unequal sample sizes.
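The simulation design described above is easy to reproduce in miniature: generate groups with equal means (so H0 is true) but unequal SDs, run the one-way F-test repeatedly, and count rejections. A minimal sketch (group sizes, SDs, and replication count are illustrative, not the article's conditions); note how negative pairing, where the smallest group has the largest variance, inflates the empirical Type I error rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def f_test_type1(ns, sds, n_sim=2000, alpha=0.05):
    """Empirical Type I error of the one-way ANOVA F-test when all group
    means are equal (H0 true) but group SDs differ."""
    rejections = 0
    for _ in range(n_sim):
        groups = [rng.normal(0, sd, size=n) for n, sd in zip(ns, sds)]
        _, p = stats.f_oneway(*groups)
        rejections += p < alpha
    return rejections / n_sim

rate_equal = f_test_type1(ns=[20, 20, 20], sds=[1, 1, 1])   # should be near .05
rate_paired = f_test_type1(ns=[10, 20, 30], sds=[3, 1, 1])  # negative pairing, VR = 9
```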

13.
Building on the improved Wald statistic, the two-group DIF detection method is extended to differential item functioning (DIF) testing across multiple groups; the improved Wald statistics are obtained by computing the observed information matrix (Obs) and the empirical cross-product information matrix (XPD), respectively. A simulation study compared these two approaches with the traditional computation method under multiple groups. The results show: (1) the Type I error rates of Obs and XPD are clearly lower than those of the traditional method, and under DINA model estimation the Type I error rates of Obs and XPD are close to the nominal level; (2) when sample size and the amount of DIF are large, Obs and XPD have roughly the same statistical power as the traditional Wald statistic.

14.
This article considers the problem of comparing two independent groups in terms of some measure of location. It is well known that with Student's two-independent-sample t test, the actual level of significance can be well above or below the nominal level, confidence intervals can have inaccurate probability coverage, and power can be low relative to other methods. Welch's (1938) test deals with heteroscedasticity but can have poor power under arbitrarily small departures from normality. Yuen (1974) generalized Welch's test to trimmed means; her method provides improved control over the probability of a Type I error, but problems remain. Transformations for skewness improve matters, but the probability of a Type I error remains unsatisfactory in some situations. We find that a transformation for skewness combined with a bootstrap method improves Type I error control and probability coverage even if sample sizes are small.
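Yuen's (1974) test itself, trimmed means with Winsorized variances and Welch-type degrees of freedom, can be sketched as follows (without the skewness transformation or bootstrap step that the article adds on top; the example data are simulated):

```python
import numpy as np
from scipy import stats

def yuen(x, y, trim=0.2):
    """Yuen's (1974) two-sample test on trimmed means with Winsorized variances."""
    def trimmed_stats(a):
        a = np.sort(np.asarray(a, dtype=float))
        n = len(a)
        g = int(np.floor(trim * n))              # observations trimmed per tail
        h = n - 2 * g                            # effective sample size
        t_mean = a[g:n - g].mean()
        w = a.copy()
        w[:g], w[n - g:] = a[g], a[n - g - 1]    # Winsorize the tails
        sw2 = w.var(ddof=1)                      # Winsorized variance
        d = (n - 1) * sw2 / (h * (h - 1))        # squared standard error term
        return t_mean, d, h
    m1, d1, h1 = trimmed_stats(x)
    m2, d2, h2 = trimmed_stats(y)
    t = (m1 - m2) / np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))  # Welch-type df
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

rng = np.random.default_rng(1)
t, p = yuen(rng.normal(0, 1, 40), rng.normal(1.5, 1, 40))  # clear location shift
```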

15.
Adverse impact is often assessed by evaluating whether the success rates for 2 groups on a selection procedure are significantly different. Although various statistical methods have been used to analyze adverse impact data, Fisher's exact test (FET) has been widely adopted, especially when sample sizes are small. In recent years, however, the statistical field has expressed concern regarding the default use of the FET and has proposed several alternative tests. This article reviews Lancaster's mid-P (LMP) test (Lancaster, 1961), an adjustment to the FET that tends to have increased power while maintaining a Type I error rate close to the nominal level. On the basis of Monte Carlo simulation results, the LMP test was found to outperform the FET across a wide range of conditions typical of adverse impact analyses. The LMP test was also found to provide better control over Type I errors than the large-sample Z-test when sample size was very small, but it tended to have slightly lower power than the Z-test under some conditions.
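The mid-P adjustment is small but consequential: instead of summing the full probability of the observed 2×2 table into the tail, only half of it is counted, which makes the test less conservative than the FET. A one-sided sketch using the hypergeometric null distribution (the example table is hypothetical):

```python
from scipy.stats import hypergeom

def lancaster_mid_p(a, b, c, d):
    """One-sided Lancaster mid-P for a 2x2 table [[a, b], [c, d]]:
    P(X < a) + 0.5 * P(X = a) under the hypergeometric null, where X is
    the number of successes landing in group 1."""
    n1, n2 = a + b, c + d                    # group sizes
    k = a + c                                # total successes
    dist = hypergeom(n1 + n2, k, n1)
    return dist.cdf(a) - 0.5 * dist.pmf(a)   # = P(X < a) + 0.5 * P(X = a)

# selection table: 3/10 selected in group 1 vs 8/10 in group 2
p_mid = lancaster_mid_p(3, 7, 8, 2)
```

By construction the mid-P value is always smaller than the corresponding one-sided Fisher p-value (which counts the observed table in full), which is the source of the power gain described above.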

16.
Adverse impact evaluations often call for evidence that the disparity between groups in selection rates is statistically significant, and practitioners must choose which test statistic to apply in this situation. To identify the most effective testing procedure, the authors compared several alternate test statistics in terms of Type I error rates and power, focusing on situations with small samples. Significance testing was found to be of limited value because of low power for all tests. Among the alternate test statistics, the widely-used Z-test on the difference between two proportions performed reasonably well, except when sample size was extremely small. A test suggested by G. J. G. Upton (1982) provided slightly better control of Type I error under some conditions but generally produced results similar to the Z-test. Use of the Fisher Exact Test and Yates's continuity-corrected chi-square test are not recommended because of overly conservative Type I error rates and substantially lower power than the Z-test.
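The Z-test on the difference between two proportions uses the pooled success rate under H0 to form the standard error. A minimal sketch with a hypothetical selection table:

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample Z-test for the difference between two independent
    proportions, using the pooled estimate under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))            # two-sided p-value

z, p = two_proportion_z(10, 50, 25, 50)      # selection rates 20% vs 50%
```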

17.
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected copied responses through probability sampling and bootstrapping. In doing so, the bias in copier ability estimation will be determined and used to update the ability estimate for calculating the modified omega (ωm), a new statistic based on the ω. The performance of ωm and ω were compared in a Monte Carlo simulation study under 40 typical testing conditions (2 test lengths × 4 sample sizes × 5 levels of copying). In almost all conditions, the ωm had the same or better controlled Type I error and higher power than ω. The increase in power was particularly evident when the source's estimated ability was higher than the copier's and when 20% or 30% of items were copied. These findings support the use of the ωm as a replacement for ω to detect answer copying in multiple choice exams.

18.
A Monte Carlo study compared 14 methods to test the statistical significance of the intervening variable effect. An intervening variable (mediator) transmits the effect of an independent variable to a dependent variable. The commonly used R. M. Baron and D. A. Kenny (1986) approach has low statistical power. Two methods based on the distribution of the product and 2 difference-in-coefficients methods have the most accurate Type I error rates and greatest statistical power except in 1 important case in which Type I error rates are too high. The best balance of Type I error and statistical power across all cases is the test of the joint significance of the two effects comprising the intervening variable effect.
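The joint-significance test declares mediation when both the a path (independent variable to mediator) and the b path (mediator to dependent variable, controlling for the independent variable) are individually significant. A minimal sketch on simulated data (variable names and effect sizes are hypothetical):

```python
import numpy as np
from scipy import stats

def coef_pvalue(X, y, j):
    """Two-sided p-value for coefficient j in an OLS fit (intercept added first)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    s2 = resid @ resid / (n - Xd.shape[1])
    cov = s2 * np.linalg.inv(Xd.T @ Xd)
    t = beta[j + 1] / np.sqrt(cov[j + 1, j + 1])
    return 2 * stats.t.sf(abs(t), n - Xd.shape[1])

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)      # a path: x -> mediator
y = 0.5 * m + rng.normal(size=n)      # b path: mediator -> y, no direct effect

p_a = coef_pvalue(x[:, None], m, 0)                # test a in m ~ x
p_b = coef_pvalue(np.column_stack([m, x]), y, 0)   # test b in y ~ m + x
mediation = (p_a < 0.05) and (p_b < 0.05)          # joint-significance criterion
```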

19.
Experiments often produce a hit rate and a false alarm rate in each of two conditions. These response rates are summarized into a single-point sensitivity measure such as d', and t tests are conducted to test for experimental effects. Using large-scale Monte Carlo simulations, we evaluate the Type I error rates and power that result from four commonly used single-point measures: d', A', percent correct, and gamma. We also test a newly proposed measure called gammaC. For all measures, we consider several ways of handling cases in which false alarm rate = 0 or hit rate = 1. The results of our simulations indicate that power is similar for these measures but that the Type I error rates are often unacceptably high. Type I errors are minimized when the selected sensitivity measure is theoretically appropriate for the data.
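The problem with extreme rates is concrete: d' = z(hit rate) − z(false-alarm rate), and z is infinite at 0 or 1. One standard remedy (not necessarily among those compared in the article) is the log-linear correction, which adds 0.5 to every cell count. A minimal sketch:

```python
from scipy.stats import norm

def d_prime(hits, misses, fas, crs, correction="loglinear"):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate). Rates of exactly
    0 or 1 make z infinite; the log-linear correction adds 0.5 per cell."""
    if correction == "loglinear":
        h = (hits + 0.5) / (hits + misses + 1)
        f = (fas + 0.5) / (fas + crs + 1)
    else:
        h = hits / (hits + misses)
        f = fas / (fas + crs)
    return norm.ppf(h) - norm.ppf(f)

d_perf = d_prime(20, 0, 0, 20)     # perfect performance: finite only with correction
d_chance = d_prime(10, 10, 10, 10) # chance performance: d' = 0
```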

20.
In a variety of measurement situations, the researcher may wish to compare the reliabilities of several instruments administered to the same sample of subjects. This paper presents eleven statistical procedures which test the equality of m coefficient alphas when the sample alpha coefficients are dependent. Several of the procedures are derived in detail, and numerical examples are given for two. Since all of the procedures depend on approximate asymptotic results, Monte Carlo methods are used to assess the accuracy of the procedures for sample sizes of 50, 100, and 200. Both control of Type I error and power are evaluated by computer simulation. Two of the procedures are unable to control Type I errors satisfactorily. The remaining nine procedures perform properly, but three are somewhat superior in power and Type I error control. A more detailed version of this paper is also available.
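The quantity being compared across instruments is coefficient alpha, computed per instrument from its item-score matrix; the dependent-alphas test procedures themselves are beyond this sketch. A minimal computation on simulated parallel items (the trait model and loadings are hypothetical):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(3)
trait = rng.normal(size=300)
# five parallel items: a common trait plus independent noise per item
scores = trait[:, None] + 0.8 * rng.normal(size=(300, 5))
alpha = cronbach_alpha(scores)
```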
