1.

Confidence intervals for standardized linear contrasts of means

Bonett DG 《Psychological Methods》2008,13(2):99-109

Most psychology journals now require authors to report a sample value of effect size along with hypothesis testing results. The sample effect size value can be misleading because it contains sampling error. Authors often incorrectly interpret the sample effect size as if it were the population effect size. A simple solution to this problem is to report a confidence interval for the population value of the effect size. Standardized linear contrasts of means are useful measures of effect size in a wide variety of research applications. New confidence intervals for standardized linear contrasts of means are developed and may be applied to between-subjects designs, within-subjects designs, or mixed designs. The proposed confidence interval methods are easy to compute, do not require equal population variances, and perform better than the currently available methods when the population variances are not equal.

2.

Comparing one-step m-estimators of location corresponding to two independent groups

Rand R. Wilcox 《Psychometrika》1992,57(1):141-154

Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is known to be a serious problem when comparing the means of two independent groups because heavy-tailed distributions can have a serious effect on power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well to Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but Wilcox's method is based on a statistic that has a finite sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.

3.

Robust and powerful nonorthogonal analyses

H. J. Keselman K. C. Carriere Lisa M. Lix 《Psychometrika》1995,60(3):395-418

Numerous types of analyses for factorial designs having unequal cell frequencies have been discussed in the literature. These analyses test either weighted or unweighted marginal means which, in turn, correspond to different model comparisons. Previous research has indicated, however, that these analyses result in biased (liberal or conservative) tests when cell variances are heterogeneous. We show how to obtain a generally robust and powerful analysis with any of the recommended nonorthogonal solutions by adapting a modification of the Welch-James procedure for comparing means when population variances are heterogeneous. This research was supported by a Social Sciences and Humanities Research Council (SSHRC) grant (# 410-92-0430) to the first author, a Manitoba Health Research Council Scholar Award and a grant from Natural Sciences and Engineering Research Council to the second author, and a SSHRC Doctoral Fellowship (# 752-92-1628) to the third author. The authors would like to express their gratitude to Joanne Keselman and three anonymous reviewers for their many helpful substantive comments on earlier drafts of this paper.

4.

Confidence intervals and sample size calculations for the weighted eta-squared effect sizes in one-way heteroscedastic ANOVA (total citations: 1; self-citations: 0; citations by others: 1)

Gwowen Shieh 《Behavior research methods》2013,45(1):25-37

Effect size reporting and interpreting practices have been extensively recommended in academic journals when primary outcomes of all empirical studies have been analyzed. This article presents an alternative approach to constructing confidence intervals of the weighted eta-squared effect size within the context of one-way heteroscedastic ANOVA models. It is shown that the proposed interval procedure has advantages over an existing method in its theoretical justification, computational simplicity, and numerical performance. For design planning, the corresponding sample size procedures for precise interval estimation of the weighted eta-squared association measure are also delineated. Specifically, the developed formulas compute the necessary sample sizes with respect to the considerations of expected confidence interval width and tolerance probability of interval width within a designated value. Supplementary computer programs are provided to aid the implementation of the suggested techniques in practical applications of ANOVA designs when the assumption of homogeneous variances is not tenable.

5.

Confidence intervals and sample size calculations for the standardized mean difference effect size between two normal populations under heteroscedasticity

G. Shieh 《Behavior research methods》2013,45(4):955-967

The use of effect sizes and associated confidence intervals in all empirical research has been strongly emphasized by journal publication guidelines. To help advance theory and practice in the social sciences, this article describes an improved procedure for constructing confidence intervals of the standardized mean difference effect size between two independent normal populations with unknown and possibly unequal variances. The presented approach has advantages over the existing formula in both theoretical justification and computational simplicity. In addition, simulation results show that the suggested one- and two-sided confidence intervals are more accurate in achieving the nominal coverage probability. The proposed estimation method provides a feasible alternative to the most commonly used measure of Cohen’s d and the corresponding interval procedure when the assumption of homogeneous variances is not tenable. To further improve the potential applicability of the suggested methodology, the sample size procedures for precise interval estimation of the standardized mean difference are also delineated. The desired precision of a confidence interval is assessed with respect to the control of expected width and to the assurance probability of interval width within a designated value. Supplementary computer programs are developed to aid in the usefulness and implementation of the introduced techniques.

6.

Point-biserial correlation: Interval estimation, hypothesis testing, meta-analysis, and sample size determination

Douglas G. Bonett 《The British journal of mathematical and statistical psychology》2020,73(Z1):113-144

The point-biserial correlation is a commonly used measure of effect size in two-group designs. New estimators of point-biserial correlation are derived from different forms of a standardized mean difference. Point-biserial correlations are defined for designs with either fixed or random group sample sizes and can accommodate unequal variances. Confidence intervals and standard errors for the point-biserial correlation estimators are derived from the sampling distributions for pooled-variance and separate-variance versions of a standardized mean difference. The proposed point-biserial confidence intervals can be used to conduct directional two-sided tests, equivalence tests, directional non-equivalence tests, and non-inferiority tests. A confidence interval for an average point-biserial correlation in meta-analysis applications performs substantially better than the currently used methods. Sample size formulas for estimating a point-biserial correlation with desired precision and testing a point-biserial correlation with desired power are proposed. R functions are provided that can be used to compute the proposed confidence intervals and sample size formulas.

7.

TRACKING THE GENDER PAY GAP: A CASE STUDY

Cheryl B. Travis Louis J. Gross Bruce A. Johnson 《Psychology of women quarterly》2009,33(4):410-418

This article provides a short introduction to standard considerations in the formal study of wages and illustrates the use of multiple regression and resampling simulation approaches in a case study of faculty salaries at one university. Multiple regression is especially beneficial where it provides information on strength of association, specific dollar estimates, and the option to identify outliers by gender. Resampling simulation allows for analysis at the department level and is beneficial where distributions depart substantially from normal, particularly where there are unequal error variances. Results indicate that both regression and simulation methods provided evidence of a sizable pay gap associated with gender, even after controlling for rank, academic field, and years of service. The gap occurs in fields traditionally viewed as female as well as science fields with typically lower female representation. Finally, we discuss implications for remediation based on these models.

8.

Sample size planning with the cost constraint for testing superiority and equivalence of two independent groups

Jiin‐Huarng Guo Hubert J. Chen Wei‐Ming Luh 《The British journal of mathematical and statistical psychology》2011,64(3):439-461

The allocation of sufficient participants into different experimental groups for various research purposes under given constraints is an important practical problem faced by researchers. We address the problem of sample size determination between two independent groups for unequal and/or unknown variances when both the power and the differential cost are taken into consideration. We apply the well‐known Welch approximate test to derive various sample size allocation ratios by minimizing the total cost or, equivalently, maximizing the statistical power. Two types of hypotheses including superiority/non‐inferiority and equivalence of two means are each considered in the process of sample size planning. A simulation study is carried out and the proposed method is validated in terms of Type I error rate and statistical power. As a result, the simulation study reveals that the proposed sample size formulas are very satisfactory under various variances and sample size allocation ratios. Finally, a flowchart, tables, and figures of several sample size allocations are presented for practical reference.
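
As a rough illustration of cost-constrained allocation, the sketch below uses the classical result that the variance of a difference in means, for a fixed budget, is minimized when each group size is proportional to its standard deviation divided by the square root of its per-participant cost. This is a textbook approximation, not the article's Welch-based formulas; the function name and the `budget` parameter are hypothetical.

```python
import math

def optimal_allocation(sd1, sd2, cost1, cost2, budget):
    """Real-valued group sizes minimizing sd1^2/n1 + sd2^2/n2
    subject to cost1*n1 + cost2*n2 == budget (classical result,
    assumed here for illustration only)."""
    # Each n_i is proportional to sd_i / sqrt(cost_i).
    w1 = sd1 / math.sqrt(cost1)
    w2 = sd2 / math.sqrt(cost2)
    # Normalizing constant chosen so the total cost equals the budget.
    denom = sd1 * math.sqrt(cost1) + sd2 * math.sqrt(cost2)
    n1 = budget * w1 / denom
    n2 = budget * w2 / denom
    return n1, n2
```

In practice the returned values would be rounded up to integers, and the resulting power would still need to be checked against the planned test.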

9.

The Fisher-Pitman permutation test when testing for differences in mean and variance

Neuhäuser M Manly BF 《Psychological reports》2004,94(1):189-194

The Fisher-Pitman permutation test can detect any type of difference between two samples: hence, a significant Fisher-Pitman permutation test does not necessarily provide evidence for a difference in means. It is possible, however, to test separately for differences in means and variances. Here, we present a recently proposed two-stage procedure to decide whether there are differences in means or variances that can be applied when samples may come from nonnormal distributions with possibly unequal variances.

10.

Does familiarity change in the revelation effect?

Verde MF Rotello CM 《Journal of experimental psychology. Learning, memory, and cognition》2003,29(5):739-746

The revelation effect describes the increased tendency to call items "old" when a recognition judgment is preceded by an incidental task. Past findings show that d' for recognition decreases following revelation, evidence that the revelation effect is due to familiarity change. However, data from receiver operating characteristic curves from 3 experiments produced no evidence of changes in recognition sensitivity. The authors illustrate how the use of a single-point measure like d' can be misleading when familiarity distribution variances are unequal. Also investigated was whether the effect depends on the revelation materials used. Neither the memorability of the revelation items, their similarity to recognition probes, nor the difficulty of the task changed the size of the effect. Thus, the revelation effect is not the result of a memory retrieval mechanism and seems to be generic and all-or-nothing. These characteristics are consistent with response bias rather than familiarity change.
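
The point about single-point measures can be sketched numerically. The code below assumes a Gaussian signal detection model with noise distributed N(0, 1) and signal distributed N(mu, sigma), and computes d' = z(hit rate) - z(false-alarm rate) at a given criterion. The model parameters and function names are illustrative assumptions, not values from the article.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # noise (new-item) distribution, N(0, 1)

def rates(criterion, mu, sigma):
    """Hit and false-alarm rates for a given response criterion."""
    hit = 1 - NormalDist(mu, sigma).cdf(criterion)
    false_alarm = 1 - STD_NORMAL.cdf(criterion)
    return hit, false_alarm

def d_prime(hit, false_alarm):
    """Single-point sensitivity index: z(H) - z(F)."""
    return STD_NORMAL.inv_cdf(hit) - STD_NORMAL.inv_cdf(false_alarm)

# Same underlying distributions, two different criteria.
liberal = d_prime(*rates(0.0, mu=1.0, sigma=1.25))
conservative = d_prime(*rates(1.0, mu=1.0, sigma=1.25))
```

Under this model d' works out to criterion * (1 - 1/sigma) + mu/sigma, so with sigma = 1.25 the two criteria give about 0.8 and 1.0 even though the distributions are identical; with sigma = 1 the criterion drops out and d' equals mu.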

11.

Measurement invariance versus selection invariance: is fair selection possible?

Borsboom D Romeijn JW Wicherts JM 《Psychological Methods》2008,13(2):75-98

This article shows that measurement invariance (defined in terms of an invariant measurement model in different groups) is generally inconsistent with selection invariance (defined in terms of equal sensitivity and specificity across groups). In particular, when a unidimensional measurement instrument is used and group differences are present in the location but not in the variance of the latent distribution, sensitivity and positive predictive value will be higher in the group at the higher end of the latent dimension, whereas specificity and negative predictive value will be higher in the group at the lower end of the latent dimension. When latent variances are unequal, the differences in these quantities depend on the size of group differences in variances relative to the size of group differences in means. The effect originates as a special case of Simpson's paradox, which arises because the observed score distribution is collapsed into an accept-reject dichotomy. Simulations show the effect can be substantial in realistic situations. It is suggested that the effect may be partly responsible for overprediction in minority groups as typically found in empirical studies on differential academic performance. A methodological solution to the problem is suggested, and social policy implications are discussed.

12.

Optimal sample sizes for precise interval estimation of Welch’s procedure under various allocation and cost considerations

Shieh G Jan SL 《Behavior research methods》2012,44(1):202-212

Welch’s (Biometrika 29: 350–362, 1938) procedure has emerged as a robust alternative to the Student’s t test for comparing the means of two normal populations with unknown and possibly unequal variances. To facilitate the advocated statistical practice of confidence intervals and further improve the potential applicability of Welch’s procedure, in the present article, we consider exact approaches to optimize sample size determinations for precise interval estimation of the difference between two means under various allocation and cost considerations. The desired precision of a confidence interval is assessed with respect to the control of expected half-width, and to the assurance probability of interval half-width within a designated value. Furthermore, the design schemes in terms of participant allocation and cost constraints include (a) giving the ratio of group sizes, (b) specifying one sample size, (c) attaining maximum precision performance for a fixed cost, and (d) meeting a specified precision level for the least cost. The proposed methods provide useful alternatives to the conventional sample size procedures. Also, the developed programs expand the degree of generality for the existing statistical software packages and can be accessed at brm.psychonomic-journals.org/content/supplemental.
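
For orientation, a minimal sketch of the quantities behind Welch's procedure follows: the separate-variance standard error, the t statistic, and the Welch-Satterthwaite approximate degrees of freedom. It does not reproduce the article's sample size methods, and the function name is hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Return (t statistic, approximate df, standard error) for two samples."""
    n1, n2 = len(x), len(y)
    # Squared standard error of each sample mean (unbiased variances).
    v1 = variance(x) / n1
    v2 = variance(y) / n2
    se = math.sqrt(v1 + v2)
    t = (mean(x) - mean(y)) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se
```

The half-width of a two-sided confidence interval for the mean difference is then the appropriate t quantile at `df` degrees of freedom multiplied by `se`; the quantile could be taken from a statistics library such as `scipy.stats.t.ppf`.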

13.

Approximate sample size formulas for the two‐sample trimmed mean test with unequal variances

《The British journal of mathematical and statistical psychology》2007,60(1):137-146

Yuen's two‐sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non‐normality and unequal sample sizes. Given the specified α and the power (1 − β), the minimum sample size needed by the proposed formulas under various conditions is less than is given by the conventional formulas. Moreover, given a specified size of sample calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.
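
The test statistic itself can be sketched as follows, using the standard textbook form of Yuen's test: trimmed means in the numerator and Winsorized variances in the standard error. The 20% default trimming proportion and the function names are assumptions for illustration; the article's sample size formulas are not reproduced here.

```python
import math

def trimmed_stats(x, trim=0.2):
    """Return (trimmed mean, Winsorized variance, effective size h, n)."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(trim * n)      # observations trimmed from each tail
    h = n - 2 * g                 # observations left after trimming
    core = xs[g:n - g]
    tmean = sum(core) / h
    # Winsorize: replace each trimmed value with the nearest retained one.
    wins = [core[0]] * g + core + [core[-1]] * g
    wmean = sum(wins) / n
    wvar = sum((v - wmean) ** 2 for v in wins) / (n - 1)
    return tmean, wvar, h, n

def yuen(x, y, trim=0.2):
    """Return (t statistic, approximate df) for Yuen's two-sample test."""
    m1, w1, h1, n1 = trimmed_stats(x, trim)
    m2, w2, h2, n2 = trimmed_stats(y, trim)
    d1 = (n1 - 1) * w1 / (h1 * (h1 - 1))
    d2 = (n2 - 1) * w2 / (h2 * (h2 - 1))
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df
```

With `trim=0` no observations are removed and the statistic reduces to Welch's approximate t test, which is a convenient sanity check.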

14.

Evaluating clinical significance: Incorporating robust statistics with normative comparison tests

Katrina van Wieringen Robert A. Cribbie 《The British journal of mathematical and statistical psychology》2014,67(2):213-230

The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non‐normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann–Yuen test of equivalence, which utilizes trimmed means, with that of the previously recommended Schuirmann and Schuirmann–Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann–Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann–Welch tests, and the power of the Schuirmann–Yuen was substantially greater than that of the Schuirmann or Schuirmann–Welch tests when distributions were skewed or outliers were present. The Schuirmann–Yuen test is recommended for assessing clinical significance with normative comparisons.

15.

A one-way random effects model for trimmed means (total citations: 1; self-citations: 0; citations by others: 1)

Rand R. Wilcox 《Psychometrika》1994,59(3):289-306

The random effects ANOVA model plays an important role in many psychological studies, but the usual model suffers from at least two serious problems. The first is that even under normality, violating the assumption of equal variances can have serious consequences in terms of Type I errors or significance levels, and it can affect power as well. The second and perhaps more serious concern is that even slight departures from normality can result in a substantial loss of power when testing hypotheses. Jeyaratnam and Othman (1985) proposed a method for handling unequal variances, under the assumption of normality, but no results were given on how their procedure performs when distributions are nonnormal. A secondary goal in this paper is to address this issue via simulations. As will be seen, problems arise with both Type I errors and power. Another secondary goal is to provide new simulation results on the Rust-Fligner modification of the Kruskal-Wallis test. The primary goal is to propose a generalization of the usual random effects model based on trimmed means. The resulting test of no differences among J randomly sampled groups has certain advantages in terms of Type I errors, and it can yield substantial gains in power when distributions have heavy tails and outliers. This last feature is very important in applied work because recent investigations indicate that heavy-tailed distributions are common. Included is a suggestion for a heteroscedastic Winsorized analog of the usual intraclass correlation coefficient.

16.

Generalizations and Extensions of the Probability of Superiority Effect Size Estimator

John Ruscio Benjamin Lee Gera 《Multivariate behavioral research》2013,48(2):208-219

Researchers are strongly encouraged to accompany the results of statistical tests with appropriate estimates of effect size. For 2-group comparisons, a probability-based effect size estimator (A) has many appealing properties (e.g., it is easy to understand, robust to violations of parametric assumptions, insensitive to outliers). We review generalizations of the A statistic to extend its use to applications with discrete data, with weighted data, with k > 2 groups, and with correlated samples. These generalizations are illustrated through reanalyses of data from published studies on sex differences in the acceptance of hypothetical offers of casual sex and in scores on a measure of economic enlightenment, on age differences in reported levels of Authentic Pride, and in differences between the numbers of promises made and kept in romantic relationships. Drawing from research on the construction of confidence intervals for the A statistic, we recommend a bootstrap method that can be used for each generalization of A. We provide a suite of programs that should make it easy to use the A statistic and accompany it with a confidence interval in a wide variety of research contexts.
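
A minimal sketch of the basic two-group A statistic follows: the proportion of cross-group pairs in which the first group's score exceeds the second's, with ties counted as one half. The abstract does not say which bootstrap variant the authors recommend, so the simple percentile bootstrap shown here is an assumption, as are the function names and defaults.

```python
import random

def prob_superiority(x, y):
    """A = [#(x > y) + 0.5 * #(x == y)] / (n_x * n_y)."""
    wins = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (wins + 0.5 * ties) / (len(x) * len(y))

def percentile_bootstrap_ci(x, y, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for A (illustrative choice only)."""
    rng = random.Random(seed)
    # Resample each group independently, with replacement.
    boots = sorted(
        prob_superiority(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(reps)
    )
    lower = boots[int(alpha / 2 * reps)]
    upper = boots[int((1 - alpha / 2) * reps) - 1]
    return lower, upper
```

A value of 0.5 indicates no tendency for either group to score higher, and 1.0 means every score in the first group exceeds every score in the second.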

17.

On sample size calculation for 2×2 fixed‐effect ANOVA when variances are unknown and possibly unequal

Jiin‐Huarng Guo Dr Wei‐Ming Luh 《The British journal of mathematical and statistical psychology》2009,62(2):417-425

The factorial 2 × 2 fixed‐effect ANOVA is a procedure used frequently in scientific research to test mean differences between‐subjects in all of the groups. But if the assumption of homogeneity is violated, the test for the row, column, and the interaction effect might be invalid or less powerful. Therefore, for planning research in the case of unknown and possibly unequal variances, it is worth developing a sample size formula to obtain the desired power. This article suggests a simple formula to determine the sample size for 2 × 2 fixed‐effect ANOVA for heterogeneous variances across groups. We use the approximate Welch t test and consider the variance ratio to derive the formula. The sample size determination requires two‐step iterations but the approximate sample sizes needed for the main effect and the interaction effect can be determined separately with the specified power. The present study also provides an example and a SAS program to facilitate the calculation process.

18.

On the statistical and theoretical basis of signal detection theory and extensions: Unequal variance, random coefficient, and mixture models

Lawrence T. DeCarlo 《Journal of mathematical psychology》2010,54(3):304-313

Basic results for conditional means and variances, as well as distributional results, are used to clarify the similarities and differences between various extensions of signal detection theory (SDT). It is shown that a previously presented motivation for the unequal variance SDT model (varying strength) actually leads to a related, yet distinct, model. The distinction has implications for other extensions of SDT, such as models with criteria that vary over trials. It is shown that a mixture extension of SDT is also consistent with unequal variances, but provides a different interpretation of the results; mixture SDT also offers a way to unify results found across several types of studies.

19.

Properties of hypothesis testing techniques and (Bayesian) model selection for exploration‐based and theory‐based (order‐restricted) hypotheses

Rebecca M. Kuiper Tim Nederhoff Irene Klugkist 《The British journal of mathematical and statistical psychology》2015,68(2):220-245

In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration‐based set of hypotheses containing equality constraints on the means, or a theory‐based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory‐based hypotheses) has advantages over exploration (i.e., examining all possible equality‐constrained hypotheses). Furthermore, examining reasonable order‐restricted hypotheses has more power to detect the true effect/non‐null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory‐based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number).

20.

Heterogeneous heterogeneity by default: Testing categorical moderators in mixed-effects meta-analysis

Josue E. Rodriguez Donald R. Williams Paul-Christian Bürkner 《The British journal of mathematical and statistical psychology》2023,76(2):402-433

Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of categorical moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be statistical ramifications to upholding this assumption. We propose that researchers should instead default to assuming unequal between-study variances when analysing categorical moderators. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variance. In two extensive simulation studies, we show that in terms of Type I error and statistical power, little is lost by using the MELSM for moderator tests, but there can be serious costs when an equal variance mixed-effects model (MEM) is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the MEM and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the MEM can be grossly inflated or overly conservative, whereas the MELSM does comparatively well in controlling the Type I error across the majority of cases. A notable exception where the MELSM did not clearly outperform the MEM was in the case of few studies (e.g., 5). With respect to power, the MELSM had similar or higher power than the MEM in conditions where the latter produced non-inflated Type I error rates. Together, our results support the idea that assuming unequal between-study variances is preferred as a default strategy when testing categorical moderators.