1.

Power and Type I errors for pairwise comparisons of means in the unequal variances case

Philip H. Ramsey Patricia P. Ramsey 《The British journal of mathematical and statistical psychology》2009,62(2):263-281

A Monte Carlo simulation was conducted to compare pairwise multiple comparison procedures. The number of means varied from 4 to 8 and the sample sizes varied from 2 to 500. Procedures were evaluated on the basis of Type I errors, any‐pair power and all‐pairs power. Two modifications of the Games and Howell procedure were shown to make it conservative. No procedure was found to be uniformly most powerful. For any‐pair power, the Games and Howell procedure was found to be generally most powerful even when applied at more stringent levels to control Type I errors. For all‐pairs power, the Peritz procedure applied with modified Brown–Forsythe tests was found to be most powerful in most conditions.

2.

Comparing score tests and other local dependence diagnostics for the graded response model

Yang Liu David Thissen 《The British journal of mathematical and statistical psychology》2014,67(3):496-513

Score tests for identifying locally dependent item pairs have been proposed for binary item response models. In this article, both the bifactor and the threshold shift score tests are generalized to the graded response model. For the bifactor test, the generalization is straightforward; it adds one secondary dimension associated only with one pair of items. For the threshold shift test, however, multiple generalizations are possible: in particular, conditional, uniform, and linear shift tests are discussed in this article. Simulation studies show that all of the score tests have accurate Type I error rates given large enough samples, although their small‐sample behaviour is not as good as that of Pearson's Χ² and M₂ as proposed in other studies for the purpose of local dependence (LD) detection. All score tests have the highest power to detect the LD which is consistent with their parametric form, and in this case they are uniformly more powerful than Χ² and M₂; even wrongly specified score tests are more powerful than Χ² and M₂ in most conditions. An example using empirical data is provided for illustration.

3.

Comparison of closed testing procedures for pairwise testing of means

Ramsey PH 《Psychological Methods》2002,7(4):504-523

A Monte Carlo simulation was conducted to compare 9 pairwise multiple comparison procedures. Procedures were evaluated on the basis of any-pair power and all-pairs power. No procedure was found to be uniformly most powerful. A modification due to A. J. Hayter (1986) of Fisher's least significant difference was found to provide the best combination of ease of use and moderately high any-pair power in most cases. Pilot or exploratory studies can expect good power results with this relatively simple procedure. The greatest all-pairs power was usually provided by 1 of 2 partition-based versions of E. Peritz's (1970) procedure. Confirmatory studies will require such complex methods but may also need larger sample sizes than have been customary in psychological research.

4.

A Thurstonian model for the dual pair (4IAX) discrimination method

Benoît Rousseau Daniel M. Ennis 《Attention, perception & psychophysics》2001,63(6):1083-1090

The Institute for Perception, Richmond, Virginia. In the dual pair method, the subject is presented with two stimuli in two pairs: One pair is composed of two samples of the same stimulus; the other pair is composed of two samples of different stimuli, one being the same as that in the identical pair. The task of the judge is to select the most different pair. The psychometric function for the dual pair method is derived and expressed in terms of a singly noncentral beta distribution. A table is provided that connects a measure of the degree of difference, d, to the probability of a correct response. This table assumes an unbiased observer and differencing decision rule. A table is provided to give an estimate of the variance of d′, the experimental estimate of d. The power of the dual pair method is also investigated, and a formula to determine the sample size required to meet Type I and Type II error specifications is given. The dual pair method appears to be slightly less powerful than the duo-trio and the triangular methods. Experimental investigation is needed to explore the dual pair in applied research work.

5.

A Thurstonian model for the dual pair (4IAX) discrimination method

Rousseau B Ennis DM 《Perception & psychophysics》2001,63(6):1083-1090

In the dual pair method, the subject is presented with two stimuli in two pairs: One pair is composed of two samples of the same stimulus; the other pair is composed of two samples of different stimuli, one being the same as that in the identical pair. The task of the judge is to select the most different pair. The psychometric function for the dual pair method is derived and expressed in terms of a singly noncentral beta distribution. A table is provided that connects a measure of the degree of difference, d, to the probability of a correct response. This table assumes an unbiased observer and differencing decision rule. A table is provided to give an estimate of the variance of d′, the experimental estimate of d. The power of the dual pair method is also investigated, and a formula to determine the sample size required to meet Type I and Type II error specifications is given. The dual pair method appears to be slightly less powerful than the duo-trio and the triangular methods. Experimental investigation is needed to explore the dual pair in applied research work.

6.

Variable criteria sequential stopping rule: Validity and power with repeated measures ANOVA, multiple correlation, MANOVA and relation to Chi-square distribution

Douglas A. Fitts 《Behavior research methods》2018,50(5):1988-2003

The variable criteria sequential stopping rule (vcSSR) is an efficient way to add sample size to planned ANOVA tests while holding the observed rate of Type I errors, α_o, constant. The only difference from regular null hypothesis testing is that criteria for stopping the experiment are obtained from a table based on the desired power, rate of Type I errors, and beginning sample size. The vcSSR was developed using between-subjects ANOVAs, but it should work with p values from any type of F test. In the present study, the α_o remained constant at the nominal level when using the previously published table of criteria with repeated measures designs with various numbers of treatments per subject, Type I error rates, values of ρ, and four different sample size models. New power curves allow researchers to select the optimal sample size model for a repeated measures experiment. The criteria held α_o constant either when used with a multiple correlation that varied the sample size model and the number of predictor variables, or when used with MANOVA with multiple groups and two levels of a within-subject variable at various levels of ρ. Although not recommended for use with χ² tests such as the Friedman rank ANOVA test, the vcSSR produces predictable results based on the relation between F and χ². Together, the data confirm the view that the vcSSR can be used to control Type I errors during sequential sampling with any t- or F-statistic rather than being restricted to certain ANOVA designs.

7.

New heterogeneous test statistics for the unbalanced fixed‐effect nested design

Jiin‐Huarng Guo L. Billard Wei‐Ming Luh 《The British journal of mathematical and statistical psychology》2011,64(2):259-276

When the underlying variances are unknown and/or unequal, using the conventional F test is problematic in the two‐factor hierarchical data structure. Prompted by the approximate test statistics (Welch and Alexander–Govern methods), the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A for the unbalanced fixed‐effect two‐stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than those obtained by the conventional F test in various conditions. Therefore, the proposed test statistics are recommended in terms of robustness and easy implementation.

8.

Effect of non-normality on test statistics for one-way independent groups designs

Cribbie RA Fiksenbaum L Keselman HJ Wilcox RR 《The British journal of mathematical and statistical psychology》2012,65(1):56-73

The data obtained from one‐way independent groups designs is typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal.
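
The robust estimators named in this abstract (trimmed means and Winsorized variances) can be illustrated with a minimal sketch. The function names and the 20% default trimming proportion are assumptions made for illustration; the abstract does not state the trimming amount or the implementation the authors used.

```python
import math
from statistics import mean, variance


def trimmed_mean(data, proportion=0.2):
    """Mean after dropping the g smallest and g largest values,
    where g = floor(proportion * n). The 0.2 default is an assumption."""
    x = sorted(data)
    g = math.floor(proportion * len(x))
    return mean(x[g:len(x) - g])


def winsorized_variance(data, proportion=0.2):
    """Sample variance after pulling the g lowest values up to the
    (g+1)-th smallest and the g highest down to the (g+1)-th largest."""
    x = sorted(data)
    n = len(x)
    g = math.floor(proportion * n)
    winsorized = [x[g]] * g + x[g:n - g] + [x[n - g - 1]] * g
    return variance(winsorized)


# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
tm = trimmed_mean(sample)
wv = winsorized_variance(sample)
```

In this hypothetical sample the single extreme value (100) falls in the trimmed tail, so it does not affect either estimate.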

9.

A note on consistency of non-parametric rank tests and related rank transformations

Zimmerman DW 《The British journal of mathematical and statistical psychology》2012,65(1):122-144

The extent to which rank transformations result in the same statistical decisions as their non‐parametric counterparts is investigated. Simulations are presented using the Wilcoxon–Mann–Whitney test, the Wilcoxon signed‐rank test and the Kruskal–Wallis test, together with the rank transformations and t and F tests corresponding to each of those non‐parametric methods. In addition to Type I errors and power over all simulations, the study examines the consistency of the outcomes of the two methods on each individual sample. The results show how acceptance or rejection of the null hypothesis and differences in p‐values of the test statistics depend in a regular and predictable way on sample size, significance level, and differences between means, for normal and various non‐normal distributions.

10.

Navigational place learning in children and young adults as assessed with a standardized locomotor search task

《British journal of psychology (London, England : 1953)》2003,94(3):299-317

Spatial behaviour was investigated using a spatial learning task based on the Radial Arm Maze, the Morris Water Maze, and open‐field search‐task procedures. Ninety‐six healthy children from six age groups (3, 4, 5, 7, 10 and 12 years) with no history of CNS disorders were studied with respect to the emergence of position‐, cue‐ and place responses. Participants were to detect x out of n hidden locations, frames of reference could be varied systematically, and three spatial memory errors and speed of navigation were recorded automatically. Task difficulties were equivalent for each age group. Results showed that navigational place learning was fully developed by the age of 10, whereas participants relied on cue orientation up to age 7. Even in the youngest group, the task could be achieved without relying on egocentric orientation, provided that proximal cues were presented. Most of the errors were of the reference memory type, whereas working memory errors were extremely rare. Speed of navigation markedly improved between age 5 and 7. An additional experiment showed that navigational place‐learning behaviour was clearly dependent on distal cues. A third study showed that in young adults, learning of the spatial layout improved, but performance on the place task did not improve any further. No sex differences were observed.

11.

Research Quality: Critique of Quantitative Articles in the Journal of Counseling & Development

Kelly L. Wester L. DiAnne Borders Steven Boul Evette Horton 《Journal of counseling and development : JCD》2013,91(3):280-290

The purpose of this study was to examine the quality of quantitative articles published in the Journal of Counseling & Development. Quality concerns arose in regard to omissions of psychometric information of instruments, effect sizes, and statistical power. Type VI and II errors were found. Strengths included stated research questions and appropriateness of analyses. Implications of these results are provided.

12.

An improved Hochberg procedure for multiple tests of significance

Dror M. Rom 《The British journal of mathematical and statistical psychology》2013,66(1):189-196

We propose a simple modification of Hochberg's step‐up Bonferroni procedure for multiple tests of significance. The proposed procedure is always more powerful than Hochberg's procedure for more than two tests, and is more powerful than Hommel's procedure for three and four tests. A numerical analysis of the new procedure indicates that its Type I error is controlled under independence of the test statistics, at a level equal to or just below the nominal Type I error. Examination of various non‐null configurations of hypotheses shows that the modified procedure has a power advantage over Hochberg's procedure which increases in relationship to the number of false hypotheses.
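
For orientation, the baseline that this abstract modifies, Hochberg's step‐up procedure, can be sketched as follows. This is the standard Hochberg procedure only, not the improved procedure proposed in the article, whose critical values the abstract does not give. The function name `hochberg` is hypothetical.

```python
def hochberg(p_values, alpha=0.05):
    """Standard Hochberg step-up procedure (NOT the modified version
    described in the abstract). Returns a list of booleans in the
    original order: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    # Step up from the largest p-value: the k-th smallest p-value
    # (1-based) is compared with alpha / (m - k + 1).
    for k in range(m, 0, -1):
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            # Reject this hypothesis and every one with a smaller p-value.
            for i in order[:k]:
                reject[i] = True
            break
    return reject


# Hypothetical p-values for three tests.
decisions = hochberg([0.06, 0.02, 0.01], alpha=0.05)
```

Here the largest p-value (0.06) exceeds 0.05, but the second largest (0.02) is below 0.05/2, so the two smaller p-values lead to rejection.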

13.

Tests for equality of several alpha coefficients when their sample estimates are dependent

David J. Woodruff Leonard S. Feldt 《Psychometrika》1986,51(3):393-413

In a variety of measurement situations, the researcher may wish to compare the reliabilities of several instruments administered to the same sample of subjects. This paper presents eleven statistical procedures which test the equality of m coefficient alphas when the sample alpha coefficients are dependent. Several of the procedures are derived in detail, and numerical examples are given for two. Since all of the procedures depend on approximate asymptotic results, Monte Carlo methods are used to assess the accuracy of the procedures for sample sizes of 50, 100, and 200. Both control of Type I error and power are evaluated by computer simulation. Two of the procedures are unable to control Type I errors satisfactorily. The remaining nine procedures perform properly, but three are somewhat superior in power and Type I error control. A more detailed version of this paper is also available.

14.

Sample size determinations for Welch's test in one‐way heteroscedastic ANOVA

Show‐Li Jan Gwowen Shieh 《The British journal of mathematical and statistical psychology》2014,67(1):72-93

For one‐way fixed effects ANOVA, it is well known that the conventional F test of the equality of means is not robust to unequal variances, and numerous methods have been proposed for dealing with heteroscedasticity. On the basis of extensive empirical evidence of Type I error control and power performance, Welch's procedure is frequently recommended as the major alternative to the ANOVA F test under variance heterogeneity. To enhance its practical usefulness, this paper considers an important aspect of Welch's method in determining the sample size necessary to achieve a given power. Simulation studies are conducted to compare two approximate power functions of Welch's test for their accuracy in sample size calculations over a wide variety of model configurations with heteroscedastic structures. The numerical investigations show that Levy's (1978a) approach is clearly more accurate than the formula of Luh and Guo (2011) for the range of model specifications considered here. Accordingly, computer programs are provided to implement the technique recommended by Levy for power calculation and sample size determination within the context of the one‐way heteroscedastic ANOVA model.
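
As background to this abstract, the Welch test statistic itself can be computed as in the sketch below. It shows only the standard Welch statistic and its approximate degrees of freedom. It does not implement the power or sample size approximations of Levy or of Luh and Guo, which the abstract does not spell out. The function name `welch_anova` is hypothetical.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's heteroscedastic one-way test.
    Returns (F, df1, df2); each group needs at least two observations
    and a non-zero sample variance."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weights n_i / s_i^2 use each group's own sample variance.
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    total_w = sum(w)
    # Variance-weighted grand mean.
    grand = sum(wi * mi for wi, mi in zip(w, means)) / total_w
    # Correction term shared by the denominator and by df2.
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    numerator = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical data: two small groups with equal variances.
f_stat, df1, df2 = welch_anova([[1, 2, 3], [2, 3, 4]])
```

With two groups the statistic reduces to the square of Welch's t, which gives a quick sanity check on the formula.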

15.

Adaptive robust estimation and testing

《The British journal of mathematical and statistical psychology》2007,60(2):267-293

We examined nine adaptive methods of trimming, that is, methods that empirically determine when data should be trimmed and the amount to be trimmed from the tails of the empirical distribution. Over the 240 empirical values collected for each method investigated, in which we varied the total percentage of data trimmed, sample size, degree of variance heterogeneity, pairing of variances and group sizes, and population shape, one method resulted in exceptionally good control of Type I errors. However, under less extreme cases of non‐normality and variance heterogeneity a number of methods exhibited reasonably good Type I error control. With regard to the power to detect non‐null treatment effects, we found that the choice among the methods depended on the degree of non‐normality and variance heterogeneity. Recommendations are offered.

16.

Repeated measures one‐way ANOVA based on a modified one‐step M‐estimator

《The British journal of mathematical and statistical psychology》2003,56(1):15-25

Wilcox, Keselman, Muska and Cribbie (2000) found a method for comparing the trimmed means of dependent groups that performed well in simulations, in terms of Type I errors, with a sample size as small as 21. Theory and simulations indicate that little power is lost under normality when using trimmed means rather than untrimmed means, and trimmed means can result in substantially higher power when sampling from a heavy‐tailed distribution. However, trimmed means suffer from two practical concerns described in this paper. Replacing trimmed means with a robust M‐estimator addresses these concerns, but control over the probability of a Type I error can be unsatisfactory when the sample size is small. Methods based on a simple modification of a one‐step M‐estimator that address the problems with trimmed means are examined. Several omnibus tests are compared, one of which performed well in simulations, even with a sample size of 11.

17.

A local model of concurrent performance

Macdonall J 《Journal of the experimental analysis of behavior》1999,71(1):57-74

Concurrent procedures may be conceptualized as consisting of two pairs of schedules with only one pair operating at a time. One schedule of each pair arranges reinforcers for staying in the current alternative, and the other schedule arranges reinforcers for switching to the other alternative. These pairs alternate operation as the animal switches between choices. This analysis of the contingencies suggests that variables operating within an alternative produce behavior that conforms to the generalized matching law. Rats were exposed to one pair of stay and switch schedules in each condition, and the probabilities of reinforcement varied across conditions. Both run length and visit duration were power functions of the ratio of the probabilities of reinforcement for staying and switching. The local model, a model of performance on concurrent procedures, was derived from this power function. Performance on concurrent schedules was synthesized from the performances on the separate pairs. Both the generalized matching law and the local model fitted the synthesized concurrent performances. These results are consistent with the view that the contingencies in the alternative, the probability of stay and switch reinforcement, are responsible for performance consistent with the generalized matching law. These results are compatible with momentary maximizing and molar maximizing accounts of concurrent performance. Models of concurrent performance that posit comparisons among the alternatives are not easily applied to these results.

18.

A note on statistical power in multi‐site randomized trials with multiple treatments at each site

Xiaofeng Steven Liu 《The British journal of mathematical and statistical psychology》2014,67(2):231-247

We derive the statistical power functions in multi‐site randomized trials with multiple treatments at each site, using multi‐level modelling. An F statistic is used to test multiple parameters in the multi‐level model instead of the Wald chi square test as suggested in the current literature. The F statistic is shown to be more conservative than the Wald statistic in testing any overall treatment effect among the multiple study conditions. In addition, we improvise an easy way to estimate the non‐centrality parameters for the means comparison t‐tests and the F test, using Helmert contrast coding in the multi‐level model. The variance of treatment means, which is difficult to fathom but necessary for power analysis, is decomposed into intuitive simple effect sizes in the contrast tests. The method is exemplified by a multi‐site evaluation study of the behavioural interventions for cannabis dependence.

19.

THE ALPHA PERCENTAGE AND EXPERIMENTWISE ERROR RATES IN COMMUNICATION RESEARCH

THOMAS M. STEINFATT 《Human Communication Research》1979,5(4):366-374

Experimentwise error rates of the type proposed by Ryan (1959) are discussed and contrasted with a new measure of the likelihood that the results of a series of significance tests are Type I errors. This new measure, the Alpha Percentage (α%), shares the advantages of experimentwise error rates over individual alpha levels in reducing Type I errors in communication research, but the Alpha Percentage has much greater power than currently used experimentwise error rates to detect significant effects. Four arguments against the use of experimentwise error procedures are discussed and EW, EP, and α% rates are reported for Communication Monographs and Human Communication Research.

20.

Properties of bootstrap tests for N‐of‐1 studies

Sharon X. Lin Leanne Morrison Peter W. F. Smith Charlie Hargood Mark Weal Lucy Yardley 《The British journal of mathematical and statistical psychology》2016,69(3):276-290

N‐of‐1 study designs involve the collection and analysis of repeated measures data from an individual not using an intervention and using an intervention. This study explores the use of semi‐parametric and parametric bootstrap tests in the analysis of N‐of‐1 studies under a single time series framework in the presence of autocorrelation. When the Type I error rates of bootstrap tests are compared to Wald tests, our results show that the bootstrap tests have more desirable properties. We compare the results for normally distributed errors with those for contaminated normally distributed errors and find that, except when there is relatively large autocorrelation, there is little difference between the power of the parametric and semi‐parametric bootstrap tests. We also experiment with two intervention designs: ABAB and AB, and show the ABAB design has more power. The results provide guidelines for designing N‐of‐1 studies, in the sense of how many observations and how many intervention changes are needed to achieve a certain level of power and which test should be performed.