期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Detecting Treatment Effects with Small Samples: The Power of Some Tests Under the Randomization Model

Bryan?Keller Email author 《Psychometrika》2012,77(2):324-338

Randomization tests are often recommended when parametric assumptions may be violated because they require no distributional or random sampling assumptions in order to be valid. In addition to being exact, a randomization test may also be more powerful than its parametric counterpart. This was demonstrated in a simulation study which examined the conditional power of three nondirectional tests: the randomization t test, the Wilcoxon–Mann–Whitney (WMW) test, and the parametric t test. When the treatment effect was skewed, with degree of skewness correlated with the size of the effect, the randomization t test was systematically more powerful than the parametric t test. The relative power of the WMW test under the skewed treatment effect condition depended on the sample size ratio. 相似文献

2.

Error of measurement and the sensitivity of a test of significance

J. P. Sutcliffe 《Psychometrika》1958,23(1):9-17

Implications of random error of measurement for the sensitivity of theF test of differences between means are elaborated. By considering the mathematical models appropriate to design situations involving true and fallible measures, it is shown how measurement error decreases the sensitivity of a test of significance. A method of reducing such loss of sensitivity is described and recommended for general practice.I wish to express my thanks in acknowledgement that the present form of this paper has benefited from editorial comment, and from the advice of Dr. H. Mulhall of the Department of Mathematics, University of Sydney. 相似文献

3.

New heterogeneous test statistics for the unbalanced fixed‐effect nested design

Jiin‐Huarng Guo L. Billard Wei‐Ming Luh 《The British journal of mathematical and statistical psychology》2011,64(2):259-276

When the underlying variances are unknown or/and unequal, using the conventional F test is problematic in the two‐factor hierarchical data structure. Prompted by the approximate test statistics (Welch and Alexander–Govern methods), the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A for the unbalanced fixed‐effect two‐stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than those obtained by the conventional F test in various conditions. Therefore, the proposed test statistics are recommended in terms of robustness and easy implementation. 相似文献

4.

Randomization test for coupled data

Bruno D. Zumbo 《Attention, perception & psychophysics》1996,58(3):471-478

Coupled data arise in perceptual research when subjects are contributing two scores to the data pool. These two scores, it can be reasonably argued, cannot be assumed to be independent of one another; therefore, special treatment is needed when performing statistical inference. This paper shows how the Type I error rate of randomization-based inference is affected by coupled data. It is demonstrated through Monte Carlo simulation that a randomization test behaves much like its parametric counterpart except that, for the randomization test, a negative correlation results in an inflation in the Type I error rate. A new randomization test, the couplet-referenced randomization test, is developed and shown to work for sample sizes of 8 or more observations. An example is presented to demonstrate the computation and interpretation of the new randomization test. 相似文献

5.

Sample size determinations for Welch's test in one‐way heteroscedastic ANOVA

Show‐Li Jan Gwowen Shieh 《The British journal of mathematical and statistical psychology》2014,67(1):72-93

For one‐way fixed effects ANOVA, it is well known that the conventional F test of the equality of means is not robust to unequal variances, and numerous methods have been proposed for dealing with heteroscedasticity. On the basis of extensive empirical evidence of Type I error control and power performance, Welch's procedure is frequently recommended as the major alternative to the ANOVA F test under variance heterogeneity. To enhance its practical usefulness, this paper considers an important aspect of Welch's method in determining the sample size necessary to achieve a given power. Simulation studies are conducted to compare two approximate power functions of Welch's test for their accuracy in sample size calculations over a wide variety of model configurations with heteroscedastic structures. The numerical investigations show that Levy's (1978a) approach is clearly more accurate than the formula of Luh and Guo (2011) for the range of model specifications considered here. Accordingly, computer programs are provided to implement the technique recommended by Levy for power calculation and sample size determination within the context of the one‐way heteroscedastic ANOVA model. 相似文献

6.

A statistical test for means of samples from skew populations

Leon Festinger 《Psychometrika》1943,8(4):205-210

This paper presents a test for determining significance of differences between means of samples which are drawn from positively skewed populations, more specifically, those having a Pearson Type III distribution function. The quantity 2npx _g/x _p (wherep equals the mean squared divided by the variance andn is the number of cases in the sample), which distributes itself as Chi Square for 2np degrees of freedom, may be referred to the tables of Chi Square for testing hypotheses about the value of the true mean. For two independent samples, the larger mean divided by the smaller mean, which distributes itself asF for 2n ₁ p ₁ and 2n ₂ p ₂ degrees of freedom, may be referred to theF distribution tables for testing significance of difference between means. The test assumes that the range of possible scores is from zero to infinity. When a lower theoretical score limit (c) exists which is not zero, the quantity (Mean —c) should be used instead of the mean in all calculations. 相似文献

7.

Bootstrapping to test for nonzero population correlation coefficients using univariate sampling

Beasley WH Deshea L Toothaker LE Mendoza JL Bard DE Rodgers JL 《心理学方法》2007,12(4):414-433

This article proposes 2 new approaches to test a nonzero population correlation (rho): the hypothesis-imposed univariate sampling bootstrap (HI) and the observed-imposed univariate sampling bootstrap (OI). The authors simulated correlated populations with various combinations of normal and skewed variates. With alpha set=.05, N> or =10, and rho< or =0.4, empirical Type I error rates of the parametric r and the conventional bivariate sampling bootstrap reached .168 and .081, respectively, whereas the largest error rates of the HI and the OI were .079 and .062. On the basis of these results, the authors suggest that the OI is preferable in alpha control to parametric approaches if the researcher believes the population is nonnormal and wishes to test for nonzero rhos of moderate size. 相似文献

8.

Replicability and randomization test logic in behavior analysis

Kenneth W. Jacobs 《Journal of the experimental analysis of behavior》2019,111(2):329-341

Randomization tests are a class of nonparametric statistics that determine the significance of treatment effects. Unlike parametric statistics, randomization tests do not assume a random sample, or make any of the distributional assumptions that often preclude statistical inferences about single‐case data. A feature that randomization tests share with parametric statistics, however, is the derivation of a p‐value. P‐values are notoriously misinterpreted and are partly responsible for the putative “replication crisis.” Behavior analysts might question the utility of adding such a controversial index of statistical significance to their methods, so it is the aim of this paper to describe the randomization test logic and its potentially beneficial consequences. In doing so, this paper will: (1) address the replication crisis as a behavior analyst views it, (2) differentiate the problematic p‐values of parametric statistics from the, arguably, more useful p‐values of randomization tests, and (3) review the logic of randomization tests and their unique fit within the behavior analytic tradition of studying behavioral processes that cut across species. 相似文献

9.

Identifying Cheating on Unproctored Internet Tests: The Z‐test and the likelihood ratio test

Jing Guo Fritz Drasgow 《International Journal of Selection & Assessment》2010,18(4):351-364

Unproctored Internet testing (UIT) is becoming more popular in employment settings due to its cost effectiveness and efficiency. However, one of the major concerns with UIT is the possibility of cheating behaviors: a more capable conspirator can sit beside the real applicant and answer test items, or the applicant may use unauthorized materials. The present study examined the effectiveness of using a proctored verification test following the UIT to identify cheating in UIT, where 2 test statistics, a Z‐test and a likelihood ratio (LR) test, compare the consistency of test performance across the testing conditions. A simulation study was conducted to test the effectiveness of the two test statistics for a computerized adaptive test format. Results indicate that both test statistics have high power to detect dishonest job applicants at low Type I error rates. Compared with the LR test, the Z‐test was more efficient and effective and is therefore recommended for practical applications. The theoretical and practical implications are discussed. 相似文献

10.

Extensions of a versatile randomization test for assessing single-case intervention effects

Levin JR Lall VF Kratochwill TR 《Journal of School Psychology》2011,49(1):55-79

The purpose of the present study was to investigate the statistical properties of two extensions of the Levin-Wampold (1999) single-case simultaneous start-point model's comparative effectiveness randomization test. The two extensions were (a) adapting the test to situations where there are more than two different intervention conditions and (b) examining the test's performance in classroom-based intervention situations, where the number of time periods (and associated outcome observations) is much smaller than in the contexts for which the test was originally developed. Various Monte Carlo sampling situations were investigated, including from one to five participant blocks per condition and differing numbers of time periods, potential intervention start points, degrees of within-phase autocorrelation, and effect sizes. For all situations, it was found that the Type I error probability of the randomization test was maintained at an acceptable level. With a few notable exceptions, respectable power was observed only in situations where the numbers of observations and potential intervention start points were relatively large, effect sizes were large, and the degree of within-phase autocorrelation was relatively low. It was concluded that the comparative effectiveness randomization test, with its desirable internal validity and statistical-conclusion validity features, is a versatile analytic tool that can be incorporated into a variety of single-case school psychology intervention research situations. 相似文献

11.

Effect of non-normality on test statistics for one-way independent groups designs

Cribbie RA Fiksenbaum L Keselman HJ Wilcox RR 《The British journal of mathematical and statistical psychology》2012,65(1):56-73

The data obtained from one‐way independent groups designs is typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal. 相似文献

12.

Testing overall and moderator effects in random effects meta‐regression

Hilde M. Huizenga Ingmar Visser Conor V. Dolan 《The British journal of mathematical and statistical psychology》2011,64(1):1-19

Random effects meta‐regression is a technique to synthesize results of multiple studies. It allows for a test of an overall effect, as well as for tests of effects of study characteristics, that is, (discrete or continuous) moderator effects. We describe various procedures to test moderator effects: the z, t, likelihood ratio (LR), Bartlett‐corrected LR (BcLR), and resampling tests. We compare the Type I error of these tests, and conclude that the common z test, and to a lesser extent the LR test, do not perform well since they may yield Type I error rates appreciably larger than the chosen alpha. The error rate of the resampling test is accurate, closely followed by the BcLR test. The error rate of the t test is less accurate but arguably tolerable. With respect to statistical power, the BcLR and t tests slightly outperform the resampling test. Therefore, our recommendation is to use either the resampling or the BcLR test. If these statistics are unavailable, then the t test should be used since it is certainly superior to the z test. 相似文献

13.

A k-sample significance test for independent alpha coefficients 总被引：1，自引：0，他引：1

A. Ralph Hakstian Thomas E. Whalen 《Psychometrika》1976,41(2):219-231

The earlier two-sample procedure of Feldt [1969] for comparing independent alpha reliability coefficients is extended to the case ofK 2 independent samples. Details of a normalization of the statistic under consideration are presented, leading to computational procedures for the overallK-group significance test and accompanying multiple comparisons. Results based on computer simulation methods are presented, demonstrating that the procedures control Type I error adequately. The results of a power comparison of the case ofK=2 with Feldt's [1969]F test are also presented. The differences in power were negligible. Some final observations, along with suggestions for further research, are noted.The authors gratefully acknowledge the assistance of Michael E. Masson, in the computations performed, and of Leonard S. Feldt, in suggesting the data generation procedures used in the study. In addition, the authors thank James Zidek and the Institute of Applied Mathematics and Statistics, University of British Columbia, for advice concerning some of the theoretical development. 相似文献

14.

Is Human Behavior Autocorrelated? An Empirical Analysis

Georgios D. Sideridis Charles R. Greenwood 《Journal of Behavioral Education》1997,7(3):273-293

The current review and analysis investigated the presence of serial dependency (or autocorrelation) in single-subject applied behavior-analytic research. While well researched, few studies have controlled for the number of data points that appeared in the time-series and, thus, the negative bias of the r coefficient, and the power to detect true serial dependency effects. Therefore, all baseline graphs that appeared in the Journal of Applied Behavior Analysis (JABA) between 1968 and 1993 that provided more than 30 data points were examined for the presence of serial dependency (N = 103). Results indicated that 12% of the baseline graphs provided a significant lag-1 autocorrelation, and over 83% of them had coefficient values less than or equal to (±.25). The distribution of the lag-1 autocorrelation coefficients had a mean of .10. Subsequent distributions of partial autocorrelations at lags two through seven had smaller means indicating that as the distance between observations increases (i.e., the lag), serial dependency decreased. Although serial dependency did not appear to be a common property of the single-subject behavioral experiments, it is recommended that, whenever statistical analyses are contemplated, its presence should always be examined. Alternatives for coping with the presence of significant levels of serial dependency were discussed in terms of: (a) using alternative statistical procedures (e.g., ARIMA models, randomization tests, the Shewhart quality-control charts); (b) correcting statistics of traditional parametric procedures (e.g., t, F); or (c) using the autocorrelation coefficient as an indicator and estimate of reliable intervention effects. 相似文献

15.

An exact test of significance for means of samples drawn from populations with an exponential frequency distribution

Leon Festinger 《Psychometrika》1943,8(3):153-160

相似文献

16.

Robust tests of equivalence for k independent groups

Andy Koh Robert Cribbie 《The British journal of mathematical and statistical psychology》2013,66(3):426-434

A common question of interest to researchers in psychology is the equivalence of two or more groups. Failure to reject the null hypothesis of traditional hypothesis tests such as the ANOVA F‐test (i.e., H₀: μ₁ = … = μ_k) does not imply the equivalence of the population means. Researchers interested in determining the equivalence of k independent groups should apply a one‐way test of equivalence (e.g., Wellek, 2003). The goals of this study were to investigate the robustness of the one‐way Wellek test of equivalence to violations of homogeneity of variance assumption, and compare the Type I error rates and power of the Wellek test with a heteroscedastic version which was based on the logic of the one‐way Welch (1951) F‐test. The results indicate that the proposed Wellek–Welch test was insensitive to violations of the homogeneity of variance assumption, whereas the original Wellek test was not appropriate when the population variances were not equal. 相似文献

17.

An empirical study of the normality and independence of errors of measurement in test scores

Frederic M. Lord 《Psychometrika》1960,25(1):91-104

An empirical study of test scores shows the variance of the errors of measurement to be significantly associated with true score in each of four groups studied; it also shows the distribution of the errors of measurement to be significantly skewed in three of these four groups. The mathematical rationale underlying the statistical treatment is presented. Standard error formulas are given for making the necessary significance tests.This research was in part carried out under Contracts Nonr-2214(00) and Nonr-2752(00) with the Office of Naval Research, Department of the Navy. 相似文献

18.

Robust step‐down tests for multivariate independent group designs

《The British journal of mathematical and statistical psychology》2007,60(2):245-265

A composite step‐down procedure, in which a set of step‐down tests are summarized collectively with Fisher's combination statistic, was considered to test for multivariate mean equality in two‐group designs. An approximate degrees of freedom (ADF) composite procedure based on trimmed/Winsorized estimators and a non‐pooled estimate of error variance is proposed, and compared to a composite procedure based on trimmed/Winsorized estimators and a pooled estimate of error variance. The step‐down procedures were also compared to Hotelling's T² and Johansen's ADF global procedure based on trimmed estimators in a simulation study. Type I error rates of the pooled step‐down procedure were sensitive to covariance heterogeneity in unbalanced designs; error rates were similar to those of Hotelling's T² across all of the investigated conditions. Type I error rates of the ADF composite step‐down procedure were insensitive to covariance heterogeneity and less sensitive to the number of dependent variables when sample size was small than error rates of Johansen's test. The ADF composite step‐down procedure is recommended for testing hypotheses of mean equality in two‐group designs except when the data are sampled from populations with different degrees of multivariate skewness. 相似文献

19.

A note on statistical power in multi‐site randomized trials with multiple treatments at each site

Xiaofeng Steven Liu 《The British journal of mathematical and statistical psychology》2014,67(2):231-247

We derive the statistical power functions in multi‐site randomized trials with multiple treatments at each site, using multi‐level modelling. An F statistic is used to test multiple parameters in the multi‐level model instead of the Wald chi square test as suggested in the current literature. The F statistic is shown to be more conservative than the Wald statistic in testing any overall treatment effect among the multiple study conditions. In addition, we improvise an easy way to estimate the non‐centrality parameters for the means comparison t‐tests and the F test, using Helmert contrast coding in the multi‐level model. The variance of treatment means, which is difficult to fathom but necessary for power analysis, is decomposed into intuitive simple effect sizes in the contrast tests. The method is exemplified by a multi‐site evaluation study of the behavioural interventions for cannabis dependence. 相似文献

20.

A priori tests in repeated measures designs: Effects of nonsphericity

Robert J. Boik 《Psychometrika》1981,46(3):241-255

The validity conditions for univariate repeated measures designs are described. Attention is focused on the sphericity requirement. For av degree of freedom family of comparisons among the repeated measures, sphericity exists when all contrasts contained in thev dimensional space have equal variances. Under nonsphericity, upper and lower bounds on test size and power of a priori, repeated measures,F tests are derived. The effects of nonsphericity are illustrated by means of a set of charts. The charts reveal that small departures from sphericity (.97 <1.00) can seriously affect test size and power. It is recommended that separate rather than pooled error term procedures be routinely used to test a priori hypotheses.Appreciation is extended to Milton Parnes for his insightful assistance. 相似文献