Similar literature
Found 20 similar articles (search time: 15 ms)
1.
The purpose of the present study was to investigate the statistical properties of two extensions of the Levin-Wampold (1999) single-case simultaneous start-point model's comparative effectiveness randomization test. The two extensions were (a) adapting the test to situations where there are more than two different intervention conditions and (b) examining the test's performance in classroom-based intervention situations, where the number of time periods (and associated outcome observations) is much smaller than in the contexts for which the test was originally developed. Various Monte Carlo sampling situations were investigated, including from one to five participant blocks per condition and differing numbers of time periods, potential intervention start points, degrees of within-phase autocorrelation, and effect sizes. For all situations, it was found that the Type I error probability of the randomization test was maintained at an acceptable level. With a few notable exceptions, respectable power was observed only in situations where the numbers of observations and potential intervention start points were relatively large, effect sizes were large, and the degree of within-phase autocorrelation was relatively low. It was concluded that the comparative effectiveness randomization test, with its desirable internal validity and statistical-conclusion validity features, is a versatile analytic tool that can be incorporated into a variety of single-case school psychology intervention research situations.

2.
Manolov R, Arnau J, Solanas A, Bono R. Psicothema, 2010, 22(4): 1026-1032
The present study evaluates the performance of four methods for estimating regression coefficients used to make statistical decisions about intervention effectiveness in single-case designs. Ordinary least squares estimation is compared to two correction techniques dealing with general trend and a procedure that eliminates autocorrelation whenever it is present. Type I error rates and statistical power are studied for experimental conditions defined by the presence or absence of treatment effect (change in level or in slope), general trend, and serial dependence. The results show that empirical Type I error rates do not approach the nominal ones in the presence of autocorrelation or general trend when ordinary and generalized least squares are applied. The techniques controlling trend show lower false alarm rates, but prove to be insufficiently sensitive to existing treatment effects. Consequently, the use of the statistical significance of the regression coefficients for detecting treatment effects is not recommended for short data series.

3.
It is well known that parametric tests are statistically optimal for normally distributed errors, but perhaps less well known is that when normality does not hold, nonparametric tests frequently possess greater statistical power than parametric tests while controlling the Type I error rate. However, the use of nonparametric procedures has been limited by the absence of easily performed tests for complex experimental designs and analyses and by limited information about their statistical behavior under realistic conditions. A Monte Carlo study of tests of predictor subsets in multiple regression analysis indicates that various nonparametric tests show greater power than the F test for skewed and heavy-tailed data. These nonparametric tests can be computed with available software.

4.
A Monte Carlo study compared 14 methods to test the statistical significance of the intervening variable effect. An intervening variable (mediator) transmits the effect of an independent variable to a dependent variable. The commonly used R. M. Baron and D. A. Kenny (1986) approach has low statistical power. Two methods based on the distribution of the product and 2 difference-in-coefficients methods have the most accurate Type I error rates and greatest statistical power except in 1 important case in which Type I error rates are too high. The best balance of Type I error and statistical power across all cases is the test of the joint significance of the two effects comprising the intervening variable effect.

5.
In this commentary, we add to the spirit of the articles appearing in the special series devoted to meta- and statistical analysis of single-case intervention-design data. Following a brief discussion of historical factors leading to our initial involvement in statistical analysis of such data, we discuss: (a) the value added by including statistical-analysis recommendations in the What Works Clearinghouse Standards for single-case intervention designs; (b) the importance of visual analysis in single-case intervention research, along with the distinctive role that could be played by single-case effect-size measures; and (c) the elevated internal validity and statistical-conclusion validity afforded by the incorporation of various forms of randomization into basic single-case design structures. For the future, we envision more widespread application of quantitative analyses, as critical adjuncts to visual analysis, in both primary single-case intervention research studies and literature reviews in the behavioral, educational, and health sciences.

6.
The Type I error probability and the power of the independent samples t test, performed directly on the ranks of scores in combined samples in place of the original scores, are known to be the same as those of the non‐parametric Wilcoxon–Mann–Whitney (WMW) test. In the present study, simulations revealed that these probabilities remain essentially unchanged when the number of ranks is reduced by assigning the same rank to multiple ordered scores. For example, if 200 ranks are reduced to as few as 20, or 10, or 5 ranks by replacing sequences of consecutive ranks by a single number, the Type I error probability and power stay about the same. Significance tests performed on these modular ranks consistently reproduce familiar findings about the comparative power of the t test and the WMW tests for normal and various non‐normal distributions. Similar results are obtained for modular ranks used in comparing the one‐sample t test and the Wilcoxon signed ranks test.
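The rank-collapsing idea in this abstract can be illustrated with a minimal Python sketch. It assumes that modular ranks are formed by mapping equal-width runs of consecutive ranks to a single number; the abstract does not spell out the exact scheme, and the function names `midranks`, `modular_ranks`, and `t_statistic` are hypothetical, not taken from the study.

```python
import math
import random
from statistics import mean, variance

def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def modular_ranks(ranks, n_levels):
    """Collapse ranks into n_levels runs of consecutive ranks (assumed scheme)."""
    width = math.ceil(len(ranks) / n_levels)
    return [math.ceil(r / width) for r in ranks]

def t_statistic(x, y):
    """Pooled-variance independent-samples t statistic."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))

# Illustrative data only: two samples of 100 scores, 200 ranks in total.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(100)]
b = [rng.gauss(0.5, 1.0) for _ in range(100)]
full = midranks(a + b)
coarse = modular_ranks(full, 10)  # 200 ranks reduced to 10 levels
t_full = t_statistic(full[:100], full[100:])
t_coarse = t_statistic(coarse[:100], coarse[100:])
```

Comparing `t_full` with `t_coarse` over many simulated data sets is one way to check the abstract's claim that the two tests behave about the same; a single data set, as here, only shows the mechanics.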

7.
N‐of‐1 study designs involve the collection and analysis of repeated measures data from an individual both while not using and while using an intervention. This study explores the use of semi‐parametric and parametric bootstrap tests in the analysis of N‐of‐1 studies under a single time series framework in the presence of autocorrelation. When the Type I error rates of bootstrap tests are compared to those of Wald tests, our results show that the bootstrap tests have more desirable properties. We compare the results for normally distributed errors with those for contaminated normally distributed errors and find that, except when there is relatively large autocorrelation, there is little difference between the power of the parametric and semi‐parametric bootstrap tests. We also experiment with two intervention designs, ABAB and AB, and show that the ABAB design has more power. The results provide guidelines for designing N‐of‐1 studies, in the sense of how many observations and how many intervention changes are needed to achieve a certain level of power and which test should be performed.

8.
Our goal is to provide empirical scientists with practical tools and advice with which to test hypotheses related to individual differences in intra-individual variability using the mixed-effects location-scale model. To that end, we evaluate Type I error rates and power to detect and predict individual differences in intra-individual variability using this model and provide empirically-based guidelines for building scale models that include random and/or systematically-varying fixed effects. We also provide two power simulation programs that allow researchers to conduct a priori empirical power analyses. Our results aligned with statistical power theory, in that greater power was observed for designs with more individuals, more repeated occasions, greater proportions of variance available to be explained, and larger effect sizes. In addition, our results indicated that Type I error rates were acceptable in situations when individual differences in intra-individual variability were not initially detectable as well as when the scale-model individual-level predictor explained all initially detectable individual differences in intra-individual variability. We conclude our paper by providing study design and model building advice for those interested in using the mixed-effects location-scale model in practice.

9.
10.
R² effect-size measures are presented to assess variance accounted for in mediation models. The measures offer a means to evaluate both component paths and the overall mediated effect in mediation models. Statistical simulation results indicate acceptable bias across varying parameter and sample-size combinations. The measures are applied to a real-world example using data from a team-based health promotion program to improve the nutrition and exercise habits of firefighters. SAS and SPSS computer code are also provided for researchers to compute the measures in their own data.

11.
Because behavior analysis is a data-driven process, a critical skill for behavior analysts is accurate visual inspection and interpretation of single-case data. Study 1 was a basic study in which we increased the accuracy of visual inspection methods for A-B designs through two refinements of the split-middle (SM) method called the dual-criteria (DC) and conservative dual-criteria (CDC) methods. The accuracy of these visual inspection methods was compared with one another and with two statistical methods (Allison & Gorman, 1993; Gottman, 1981) using a computer-simulated Monte Carlo study. Results indicated that the DC and CDC methods controlled Type I error rates much better than the SM method and had considerably higher power (to detect real treatment effects) than the two statistical methods. In Study 2, brief verbal and written instructions with modeling were used to train 5 staff members to use the DC method, and in Study 3, these training methods were incorporated into a slide presentation and were used to rapidly (i.e., 15 min) train a large group of individuals (N = 87). Interpretation accuracy increased from a baseline mean of 55% to a treatment mean of 94% in Study 2 and from a baseline mean of 71% to a treatment mean of 95% in Study 3. Thus, Study 1 answered basic questions about the accuracy of several methods of interpreting A-B designs; Study 2 showed how that information could be used to increase the accuracy of human visual inspectors; and Study 3 showed how the training procedures from Study 2 could be modified into a format that would facilitate rapid training of large groups of individuals to interpret single-case designs.

12.
Multiple-baseline designs are an extension of the basic single-case AB phase designs, in which several of those AB designs are implemented simultaneously with different persons, behaviors, or settings, and the intervention is introduced in a staggered way to the different units. These designs are well-suited for research in the behavioral sciences. We discuss the advantages and limitations for valid inferences, and suggest a statistical technique, randomization tests, for use with multiple-baseline data to complement visual analysis. In addition, we provide an extension of our SCRT-R package (which already contained means for conducting randomization tests on single-case phase and alternation designs) for multiple-baseline AB data.

13.
The Type I error rates and powers of three recent tests for analyzing nonorthogonal factorial designs under departures from the assumptions of homogeneity and normality were evaluated using Monte Carlo simulation. Specifically, this work compared the performance of the modified Brown-Forsythe procedure, the generalization of Box's method proposed by Brunner, Dette, and Munk, and the mixed-model procedure adjusted by the Kenward-Roger solution available in the SAS statistical package. With regard to robustness, the three approaches adequately controlled Type I error when the data were generated from symmetric distributions; however, this study's results indicate that, when the data were extracted from asymmetric distributions, the modified Brown-Forsythe approach controlled the Type I error slightly better than the other procedures. With regard to sensitivity, the higher power rates were obtained when the analyses were done with the MIXED procedure of the SAS program. Furthermore, results also identified that, when the data were generated from symmetric distributions, little power was sacrificed by using the generalization of Box's method in place of the modified Brown-Forsythe procedure.

14.
The purpose of this study was to develop and compare tests of independent groups' slopes for non-normal distributions and heteroscedastic variances, including slope tests based on least squares, Theil-Sen, and trimmed estimation approaches. A slope test based on jackknife standard error estimates is proposed that can utilize each of the traditional estimation methods while also addressing problematic aspects of methods' standard error estimates. Simulations demonstrate that the proposed jackknife-based slope tests can improve standard error estimation, Type I error, and power relative to the traditional slope tests.
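To make the two ingredients named in this abstract concrete, here is a minimal Python sketch of a Theil-Sen slope (the median of all pairwise slopes) combined with the standard delete-one jackknife standard error. The abstract does not give the study's exact test statistic, so this is only an illustration of the building blocks; the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, median

def theil_sen_slope(x, y):
    """Median of the slopes through every pair of points with distinct x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return median(slopes)

def jackknife_se(x, y, estimator=theil_sen_slope):
    """Delete-one jackknife standard error of a slope estimator.

    x and y must be lists, because each leave-one-out sample is
    built by list slicing and concatenation.
    """
    n = len(x)
    leave_one_out = [estimator(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
                     for i in range(n)]
    centre = mean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out))

# Illustrative data: a slope of 2 with one gross outlier in y.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 100]
slope = theil_sen_slope(x, y)  # the outlier does not move the median slope
se = jackknife_se(x, y)
```

Because `jackknife_se` takes the estimator as a parameter, a least squares or trimmed slope function could be passed in its place, which mirrors the abstract's point that the jackknife approach can be paired with each estimation method.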

15.
We discuss the statistical testing of three relevant hypotheses involving Cronbach's alpha: one where alpha equals a particular criterion; a second testing the equality of two alpha coefficients for independent samples; and a third testing the equality of two alpha coefficients for dependent samples. For each of these hypotheses, various statistical tests have been proposed. Over the years, these tests have depended on progressively fewer assumptions. We propose a new approach to testing the three hypotheses that relies on even fewer assumptions, is especially suited for discrete item scores, and can be applied easily to tests containing large numbers of items. The new approach uses marginal modelling. We compared the Type I error rate and the power of the marginal modelling approach to several of the available tests in a simulation study using realistic conditions. We found that the marginal modelling approach had the most accurate Type I error rates, whereas the power was similar across the statistical tests.
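For readers unfamiliar with the coefficient being tested, the following Python sketch computes Cronbach's alpha from its usual definition, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It does not implement the marginal modelling tests described in the abstract; the function name and data layout are assumptions made for illustration.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Illustrative data: four respondents answering three discrete items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
```

When every item gives identical scores for each respondent, the total-score variance is k² times a single item's variance and the formula returns exactly 1.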

16.
The data obtained from one‐way independent groups designs are typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test and the Welch test, the latter applied either to the usual least squares estimators of central tendency and variability or to robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal.

17.
Type I error is a risk undertaken whenever significance tests are conducted, and the chances of committing a Type I error increase as the number of significance tests increases. But adjusting the alpha level because of the number of tests conducted in a given study has no principled basis, commits one to absurd beliefs and practices, and reduces statistical power. The practice of requiring or employing such adjustments should be abandoned.
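The arithmetic behind the first sentence of this abstract can be sketched in a few lines of Python. The sketch assumes the tests are independent and all null hypotheses are true, in which case the chance of at least one Type I error across m tests is 1 - (1 - alpha)^m. The Bonferroni function is included only as an example of the kind of adjustment the abstract argues against; neither function name comes from the source.

```python
def familywise_error(alpha, n_tests):
    """P(at least one Type I error) across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha under a Bonferroni adjustment."""
    return alpha / n_tests

for m in (1, 5, 10, 20):
    print(m, round(familywise_error(0.05, m), 3), bonferroni_alpha(0.05, m))
```

With alpha = 0.05 the familywise rate rises to roughly 0.40 at 10 tests and 0.64 at 20 tests, while the Bonferroni per-test alpha shrinks to 0.005 and 0.0025, which is the source of the power loss the abstract mentions.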

18.
Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M, a, increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such that elevated Type I error rates occur when the sample size is small and the effect size of the nonzero path is medium or larger. Results from the second simulation found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y, b, was small. Two empirical mediation examples are provided using data from a steroid prevention and health promotion program aimed at high school football players (Athletes Training and Learning to Avoid Steroids; Goldberg et al., 1996), one to illustrate a possible Type I error for the bias-corrected bootstrap test and a second to illustrate a loss in power related to the size of a. Implications of these findings are discussed.

19.
Randomization tests are nonparametric statistical tests that obtain their validity by computationally mimicking the random assignment procedure that was used in the design phase of a study. Because randomization tests do not rely on a random sampling assumption, they can provide a better alternative than parametric statistical tests for analyzing data from single-case designs. In this article, an R package is described for use in designing single-case phase (AB, ABA, and ABAB) and alternation (completely randomized, alternating treatments, and randomized block) experiments, as well as for conducting statistical analyses on data gathered by means of such designs. The R code is presented in a step-by-step way, which at the same time clarifies the rationale behind single-case randomization tests.
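The logic of a single-case randomization test can be sketched in Python for the simplest case, an AB phase design in which the intervention start point was chosen at random from the admissible start points. This is not the R package's code or API; the function name, the `min_phase_len` parameter, and the choice of "B mean minus A mean" as the test statistic are assumptions made for illustration.

```python
from statistics import mean

def ab_randomization_test(scores, observed_start, min_phase_len=3):
    """One-sided randomization test for an AB design with a random start point.

    observed_start is the 0-based index of the first B-phase observation.
    Returns the observed statistic and its randomization p value.
    """
    n = len(scores)
    # Every start point that leaves at least min_phase_len observations per phase.
    starts = range(min_phase_len, n - min_phase_len + 1)
    if observed_start not in starts:
        raise ValueError("observed_start is not an admissible start point")

    def statistic(start):
        return mean(scores[start:]) - mean(scores[:start])

    observed = statistic(observed_start)
    # Recompute the statistic for each start point the design could have produced.
    distribution = [statistic(s) for s in starts]
    p_value = sum(d >= observed for d in distribution) / len(distribution)
    return observed, p_value

# Illustrative data: 5 baseline and 5 intervention observations.
data = [2, 3, 2, 3, 2, 8, 9, 8, 9, 8]
observed, p_value = ab_randomization_test(data, observed_start=5)
```

Here only five start points are admissible, so the smallest attainable p value is 1/5 = 0.2 even though the observed difference is the largest possible. This shows why power depends on the number of potential intervention start points.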

20.
Research problems that require a non‐parametric analysis of multifactor designs with repeated measures arise in the behavioural sciences. There is, however, a lack of available procedures in commonly used statistical packages. In the present study, a generalization of the aligned rank test for the two‐way interaction is proposed for the analysis of the typical sources of variation in a three‐way analysis of variance (ANOVA) with repeated measures. It can be implemented in the usual statistical packages. Its statistical properties are tested by using simulation methods with two sample sizes (n = 30 and n = 10) and three distributions (normal, exponential and double exponential). Results indicate substantial increases in power for non‐normal distributions in comparison with the usual parametric tests. Similar levels of Type I error for both parametric and aligned rank ANOVA were obtained with non‐normal distributions and large sample sizes. Degrees‐of‐freedom adjustments for Type I error control in small samples are proposed. The procedure is applied to a case study with 30 participants per group where it detects gender differences in linguistic abilities in blind children not shown previously by other methods.
