Similar documents
Found 20 similar documents (search time: 15 ms)
1.
The program DATASIM is used to simulate the classic “horn-honking” study by Doob and Gross (1968). In a 2×2 field experiment, Doob and Gross investigated the effects of status of frustrator—a low- or high-status car blocking an intersection—on the latency to honk among male and female drivers. The present paper illustrates how to extract the values of simulation parameters from the published study, how to initialize the simulation in DATASIM, and how to generate and analyze the simulated data. Certain complications arise because the latency data collected by Doob and Gross were nonnormally distributed, cell variances were heterogeneous, and sample sizes were unequal. DATASIM is able to incorporate these features in the simulation, and several methods for assessing the quality of the simulation are illustrated. In addition, sampling experiments are reported, which were performed to assess the joint and individual effects of nonnormality and heterogeneity on the Type I and Type II error rates of the F test. The paper concludes with some practical suggestions regarding how researchers can evaluate, and adjust for, the effects of such violations.
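As a concrete illustration of the simulation features the abstract describes, here is a minimal Python sketch (the cell parameters below are invented placeholders; the real values would be extracted from the published study) that generates lognormal latency data for a 2×2 design with unequal cell sizes and heterogeneous variances:

```python
import random
import statistics

def simulate_honk_latency(cell_specs, seed=1):
    """Generate latency data for a 2x2 design (status x driver sex).

    Each cell spec gives (n, mu, sigma) for a lognormal latency
    distribution, so cells can have unequal n and unequal variance,
    mimicking the features of the Doob and Gross data noted above.
    """
    rng = random.Random(seed)
    data = {}
    for cell, (n, mu, sigma) in cell_specs.items():
        data[cell] = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return data

# Hypothetical parameter values, chosen only to show the interface.
specs = {
    ("low", "male"):    (20, 1.6, 0.40),
    ("low", "female"):  (15, 1.7, 0.55),
    ("high", "male"):   (18, 1.9, 0.40),
    ("high", "female"): (12, 2.0, 0.60),
}
sim = simulate_honk_latency(specs)
for cell, values in sim.items():
    print(cell, len(values), round(statistics.mean(values), 2))
```

The simulated cells can then be passed to whatever ANOVA routine is under study, which is how sampling experiments on Type I and Type II error rates are assembled.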

2.
The Type I error rates and powers of three recent tests for analyzing nonorthogonal factorial designs under departures from the assumptions of homogeneity and normality were evaluated using Monte Carlo simulation. Specifically, this work compared the performance of the modified Brown-Forsythe procedure, the generalization of Box's method proposed by Brunner, Dette, and Munk, and the mixed-model procedure adjusted by the Kenward-Roger solution available in the SAS statistical package. With regard to robustness, the three approaches adequately controlled Type I error when the data were generated from symmetric distributions; however, this study's results indicate that, when the data were extracted from asymmetric distributions, the modified Brown-Forsythe approach controlled the Type I error slightly better than the other procedures. With regard to sensitivity, the highest power rates were obtained when the analyses were done with the MIXED procedure of the SAS program. Furthermore, results also identified that, when the data were generated from symmetric distributions, little power was sacrificed by using the generalization of Box's method in place of the modified Brown-Forsythe procedure.

3.
Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M, a, increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such that elevated Type I error rates occur when the sample size is small and the effect size of the nonzero path is medium or larger. Results from the second simulation found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y, b, was small. Two empirical mediation examples are provided using data from a steroid prevention and health promotion program aimed at high school football players (Athletes Training and Learning to Avoid Steroids; Goldberg et al., 1996), one to illustrate a possible Type I error for the bias-corrected bootstrap test and a second to illustrate a loss in power related to the size of a. Implications of these findings are discussed.
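The bias-corrected bootstrap test at issue can be sketched in a few lines of Python. This is a simplified version under stated assumptions: each path is estimated by a simple regression (a full mediation model would estimate b from the regression of Y on both M and X), and the data are synthetic:

```python
import random
import statistics
from statistics import NormalDist

def slope(x, y):
    """OLS slope of y on x (simple regression, no covariates)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def bc_bootstrap_indirect(x, m, y, n_boot=2000, alpha=0.05, seed=7):
    """Bias-corrected percentile bootstrap CI for the indirect
    effect a*b (slope of M on X times slope of Y on M)."""
    rng = random.Random(seed)
    nd = NormalDist()
    est = slope(x, m) * slope(m, y)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases jointly
        boots.append(slope([x[i] for i in idx], [m[i] for i in idx])
                     * slope([m[i] for i in idx], [y[i] for i in idx]))
    boots.sort()
    # Bias-correction constant from the share of bootstrap estimates below est.
    prop = sum(b < est for b in boots) / n_boot
    z0 = nd.inv_cdf(min(max(prop, 1e-6), 1 - 1e-6))
    lo_p = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi_p = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    return est, (boots[int(lo_p * (n_boot - 1))], boots[int(hi_p * (n_boot - 1))])

# Synthetic data with a = 0.5 and b = 0.4.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(50)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + rng.gauss(0, 1) for mi in m]
est, (lo, hi) = bc_bootstrap_indirect(x, m, y)
print(f"ab = {est:.3f}, 95% BC CI = ({lo:.3f}, {hi:.3f})")
```

The bias correction is the `z0` shift of the percentile cut points; dropping it recovers the plain percentile bootstrap that did not show the elevated Type I error rates.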

4.
The goal of this study was to investigate the performance of Hall’s transformation of the Brunner-Dette-Munk (BDM) and Welch-James (WJ) test statistics and Box-Cox’s data transformation in factorial designs when normality and variance homogeneity assumptions were violated separately and jointly. On the basis of unweighted marginal means, we performed a simulation study to explore the operating characteristics of the methods proposed for a variety of distributions with small sample sizes. Monte Carlo simulation results showed that when data were sampled from symmetric distributions, the error rates of the original BDM and WJ tests were scarcely affected by the lack of normality and homogeneity of variance. In contrast, when data were sampled from skewed distributions, the original BDM and WJ rates were not well controlled. Under such circumstances, the results clearly revealed that Hall’s transformation of the BDM and WJ tests provided generally better control of Type I error rates than did the same tests based on Box-Cox’s data transformation. Among all the methods considered in this study, we also found that Hall’s transformation of the BDM test yielded the best control of Type I errors, although it was often less powerful than either of the WJ tests when both approaches reasonably controlled the error rates.

5.
We discuss the statistical testing of three relevant hypotheses involving Cronbach's alpha: one where alpha equals a particular criterion; a second testing the equality of two alpha coefficients for independent samples; and a third testing the equality of two alpha coefficients for dependent samples. For each of these hypotheses, various statistical tests have been proposed. Over the years, these tests have depended on progressively fewer assumptions. We propose a new approach to testing the three hypotheses that relies on even fewer assumptions, is especially suited for discrete item scores, and can be applied easily to tests containing large numbers of items. The new approach uses marginal modelling. We compared the Type I error rate and the power of the marginal modelling approach to several of the available tests in a simulation study using realistic conditions. We found that the marginal modelling approach had the most accurate Type I error rates, whereas the power was similar across the statistical tests.
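For orientation, the coefficient these hypotheses concern is straightforward to compute from item-score columns. A minimal Python sketch with toy data (the tests discussed above operate on this quantity, not on the computation itself):

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score columns.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
    where k is the number of items and variances are sample variances.
    """
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # total score per respondent
    item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance(totals)
    return k / (k - 1) * (1 - item_var / total_var)

# Five respondents answering three Likert-type items (toy data).
items = [
    [3, 4, 5, 2, 4],   # item 1 scores across the 5 respondents
    [2, 4, 5, 3, 4],   # item 2
    [3, 5, 4, 2, 5],   # item 3
]
print(round(cronbach_alpha(items), 3))  # → 0.886
```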

6.
The variable criteria sequential stopping rule (vcSSR) is an efficient way to add sample size to planned ANOVA tests while holding the observed rate of Type I errors, αo, constant. The only difference from regular null hypothesis testing is that criteria for stopping the experiment are obtained from a table based on the desired power, rate of Type I errors, and beginning sample size. The vcSSR was developed using between-subjects ANOVAs, but it should work with p values from any type of F test. In the present study, the αo remained constant at the nominal level when using the previously published table of criteria with repeated measures designs with various numbers of treatments per subject, Type I error rates, values of ρ, and four different sample size models. New power curves allow researchers to select the optimal sample size model for a repeated measures experiment. The criteria held αo constant either when used with a multiple correlation that varied the sample size model and the number of predictor variables, or when used with MANOVA with multiple groups and two levels of a within-subject variable at various levels of ρ. Although not recommended for use with χ2 tests such as the Friedman rank ANOVA test, the vcSSR produces predictable results based on the relation between F and χ2. Together, the data confirm the view that the vcSSR can be used to control Type I errors during sequential sampling with any t- or F-statistic rather than being restricted to certain ANOVA designs.
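The vcSSR decision flow, stripped of the ANOVA itself, looks roughly like the Python sketch below. The criteria shown are placeholders, not entries from the published table, and `get_p` stands in for whatever planned test the researcher runs:

```python
def vc_sequential_test(get_p, criteria, start_n, add_n, max_n):
    """Skeleton of the vcSSR decision flow.

    get_p(n) runs the planned test at sample size n and returns its
    p value. criteria = (lower, upper) would be looked up in the
    published vcSSR table for the chosen power, alpha, and starting
    sample size; the demo values below are placeholders only.
    """
    lower, upper = criteria
    n = start_n
    while True:
        p = get_p(n)
        if p <= lower:                 # significant: stop and reject
            return "reject H0", n
        if p >= upper or n >= max_n:   # hopeless or out of budget: stop
            return "stop, do not reject", n
        n += add_n                     # otherwise add subjects and retest

# Demo with artificial p-value curves (no real data involved).
print(vc_sequential_test(lambda n: 2.0 / n ** 2, (0.01, 0.36), 10, 5, 40))
print(vc_sequential_test(lambda n: 0.5, (0.01, 0.36), 10, 5, 40))
```

The point of the published table is that `lower` and `upper` are tuned so the observed Type I error rate of this whole sequential procedure stays at the nominal level.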

7.
Research problems that require a non‐parametric analysis of multifactor designs with repeated measures arise in the behavioural sciences. There is, however, a lack of available procedures in commonly used statistical packages. In the present study, a generalization of the aligned rank test for the two‐way interaction is proposed for the analysis of the typical sources of variation in a three‐way analysis of variance (ANOVA) with repeated measures. It can be implemented in the usual statistical packages. Its statistical properties are tested by using simulation methods with two sample sizes (n = 30 and n = 10) and three distributions (normal, exponential and double exponential). Results indicate substantial increases in power for non‐normal distributions in comparison with the usual parametric tests. Similar levels of Type I error for both parametric and aligned rank ANOVA were obtained with non‐normal distributions and large sample sizes. Degrees‐of‐freedom adjustments for Type I error control in small samples are proposed. The procedure is applied to a case study with 30 participants per group where it detects gender differences in linguistic abilities in blind children not shown previously by other methods.
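The alignment-and-rank step that underlies such tests can be sketched as follows (Python; one observation per cell and naive tie handling, purely for illustration, since a real analysis would align on cell means and use midranks for ties):

```python
def aligned_ranks(table):
    """Alignment step of an aligned rank test for interaction in a
    two-way layout: strip the row and column (main) effects from each
    observation, then rank the aligned residuals. The ranks would then
    be submitted to a standard ANOVA on the interaction term."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(r) for r in table) / (rows * cols)
    rmean = [sum(r) / cols for r in table]
    cmean = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]
    aligned = [[table[i][j] - rmean[i] - cmean[j] + grand
                for j in range(cols)] for i in range(rows)]
    # Rank the aligned values (ties broken by position; midranks in practice).
    flat = sorted((v, i, j) for i, r in enumerate(aligned) for j, v in enumerate(r))
    ranks = [[0] * cols for _ in range(rows)]
    for rank, (_, i, j) in enumerate(flat, start=1):
        ranks[i][j] = rank
    return ranks

ranks = aligned_ranks([[3, 7, 6], [2, 9, 5]])  # toy 2x3 layout
print(ranks)
```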

8.
The conventional approach for testing the equality of two normal mean vectors is to test first the equality of covariance matrices, and if the equality assumption is tenable, then use the two-sample Hotelling T2 test. Otherwise one can use one of the approximate tests for the multivariate Behrens–Fisher problem. In this article, we study the properties of the Hotelling T2 test, the conventional approach, and one of the best approximate invariant tests (Krishnamoorthy & Yu, 2004) for the Behrens–Fisher problem. Our simulation studies indicated that the conventional approach often leads to inflated Type I error rates. The approximate test not only controls Type I error rates very satisfactorily when covariance matrices were arbitrary but was also comparable with the T2 test when covariance matrices were equal.

9.
We examine methods for measuring performance in signal-detection-like tasks when each participant provides only a few observations. Monte Carlo simulations demonstrate that standard statistical techniques applied to a d′ analysis can lead to large numbers of Type I errors (incorrectly rejecting a hypothesis of no difference). Various statistical methods were compared in terms of their Type I and Type II error (incorrectly accepting a hypothesis of no difference) rates. Our conclusions are the same whether these two types of errors are weighted equally or Type I errors are weighted more heavily. The most promising method is to combine an aggregate d′ measure with a percentile bootstrap confidence interval, a computer-intensive nonparametric method of statistical inference. Researchers who prefer statistical techniques more commonly used in psychology, such as a repeated measures t test, should use γ (Goodman & Kruskal, 1954), since it performs slightly better than or nearly as well as d′. In general, when repeated measures t tests are used, γ is more conservative than d′: It makes more Type II errors, but its Type I error rate tends to be much closer to that of the traditional .05 α level. It is somewhat surprising that γ performs as well as it does, given that the simulations that generated the hypothetical data conformed completely to the d′ model. Analyses in which H − FA was used had the highest Type I error rates. Detailed simulation results can be downloaded from www.psychonomic.org/archive/Schooler-BRM-2004.zip.
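The d′ measure in question is the difference of z-transformed hit and false-alarm rates. A minimal Python sketch follows; the add-0.5 correction used here is a common convention and an assumption of this sketch, not necessarily the article's choice:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    Adding 0.5 to each cell (and 1 to each denominator) keeps the
    rates away from 0 and 1, which would otherwise make z(.) infinite;
    with only a few observations per participant, extreme rates are
    exactly the situation the abstract describes."""
    nd = NormalDist()
    h = (hits + 0.5) / (hits + misses + 1)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return nd.inv_cdf(h) - nd.inv_cdf(fa)

# A participant with 10 signal trials and 10 noise trials.
print(round(d_prime(8, 2, 3, 7), 2))
```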

10.
Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of categorical moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be statistical ramifications to upholding this assumption. We propose that researchers should instead default to assuming unequal between-study variances when analysing categorical moderators. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variance. In two extensive simulation studies, we show that in terms of Type I error and statistical power, little is lost by using the MELSM for moderator tests, but there can be serious costs when an equal variance mixed-effects model (MEM) is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the MEM and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the MEM can be grossly inflated or overly conservative, whereas the MELSM does comparatively well in controlling the Type I error across the majority of cases. A notable exception where the MELSM did not clearly outperform the MEM was in the case of few studies (e.g., 5). With respect to power, the MELSM had similar or higher power than the MEM in conditions where the latter produced non-inflated Type I error rates. Together, our results support the idea that assuming unequal between-study variances is preferred as a default strategy when testing categorical moderators.

11.
Researchers often want to demonstrate a lack of interaction between two categorical predictors on an outcome. To justify a lack of interaction, researchers typically accept the null hypothesis of no interaction from a conventional analysis of variance (ANOVA). This method is inappropriate as failure to reject the null hypothesis does not provide statistical evidence to support a lack of interaction. This study proposes a bootstrap‐based intersection–union test for negligible interaction that provides coherent decisions between the omnibus test and post hoc interaction contrast tests and is robust to violations of the normality and variance homogeneity assumptions. Further, a multiple comparison strategy for testing interaction contrasts following a non‐significant omnibus test is proposed. Our simulation study compared the Type I error control, omnibus power and per‐contrast power of the proposed approach to the non‐centrality‐based negligible interaction test of Cheng and Shao (2007, Statistica Sinica, 17, 1441). For 2 × 2 designs, the empirical Type I error rates of the Cheng and Shao test were very close to the nominal α level when the normality and variance homogeneity assumptions were satisfied; however, only our proposed bootstrapping approach was satisfactory under non‐normality and/or variance heterogeneity. In general a × b designs, although the omnibus Cheng and Shao test, as expected, is the most powerful, it is not robust to assumption violation and results in incoherent omnibus and interaction contrast decisions that are not possible with the intersection–union approach.

12.
Recently, a nonparametric technique called bootstrapping has been recommended over the better-known analysis of variance (ANOVA) for analyzing repeated measures data. Advocates cite as bootstrap’s advantages over ANOVA the fact that the former uses distributional information and is free of normal theory assumptions. The present study used a computer simulation to compare the two techniques calculated using data sampled from normal and nonnormal distributions. The parametric test had adequate control of Type I error rates; the nonparametric test had overly liberal Type I error rates and therefore is not recommended.

13.
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non‐normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann–Yuen test of equivalence, which utilizes trimmed means, with that of the previously recommended Schuirmann and Schuirmann–Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann–Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann–Welch tests, and the power of the Schuirmann–Yuen was substantially greater than that of the Schuirmann or Schuirmann–Welch tests when distributions were skewed or outliers were present. The Schuirmann–Yuen test is recommended for assessing clinical significance with normative comparisons.
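Yuen-style trimming, the ingredient that distinguishes the Schuirmann–Yuen test, can be sketched as follows (Python; the critical t values are not computed here, since that requires the t distribution, and the data are toy values with a deliberate outlier). Equivalence is concluded when both one-sided statistics exceed their one-sided critical t:

```python
import math

def yuen_parts(x, trim=0.2):
    """Trimmed mean and Yuen's squared-SE contribution for one group."""
    xs = sorted(x)
    n = len(xs)
    g = int(trim * n)
    h = n - 2 * g                                    # effective sample size
    tmean = sum(xs[g:n - g]) / h                     # trimmed mean
    wins = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g   # Winsorized sample
    wmean = sum(wins) / n
    swv = sum((w - wmean) ** 2 for w in wins) / (n - 1)      # Winsorized variance
    return tmean, (n - 1) * swv / (h * (h - 1))

def schuirmann_yuen(x, y, delta, trim=0.2):
    """Two one-sided Yuen t statistics for equivalence bounds (-delta, +delta).

    Both statistics are oriented so that large positive values favor
    equivalence; each is compared against a one-sided critical t."""
    mx, dx = yuen_parts(x, trim)
    my, dy = yuen_parts(y, trim)
    se = math.sqrt(dx + dy)
    diff = mx - my
    return diff, (diff + delta) / se, (delta - diff) / se

x = [0.9, 1.0, 1.1, 1.0, 0.8, 1.2, 1.0, 0.95, 1.05, 6.0]  # 6.0 is an outlier
y = [1.0, 0.9, 1.1, 1.05, 0.95, 1.0, 1.15, 0.85, 1.0, 1.02]
diff, t_lower, t_upper = schuirmann_yuen(x, y, delta=0.5)
print(round(diff, 3), round(t_lower, 2), round(t_upper, 2))
```

With 20% trimming the outlier never enters the trimmed mean, which is why the test keeps its properties under skew and outliers.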

14.
Repeated measures analyses of variance are the method of choice in many studies from experimental psychology and the neurosciences. Data from these fields are often characterized by small sample sizes, high numbers of factor levels of the within-subjects factor(s), and nonnormally distributed response variables such as response times. For a design with a single within-subjects factor, we investigated Type I error control in univariate tests with corrected degrees of freedom, the multivariate approach, and a mixed-model (multilevel) approach (SAS PROC MIXED) with Kenward–Roger’s adjusted degrees of freedom. We simulated multivariate normal and nonnormal distributions with varied population variance–covariance structures (spherical and nonspherical), sample sizes (N), and numbers of factor levels (K). For normally distributed data, as expected, the univariate approach with Huynh–Feldt correction controlled the Type I error rate with only very few exceptions, even if sample sizes as low as three were combined with high numbers of factor levels. The multivariate approach also controlled the Type I error rate, but it requires N > K. PROC MIXED often showed acceptable control of the Type I error rate for normal data, but it also produced several liberal or conservative results. For nonnormal data, all of the procedures showed clear deviations from the nominal Type I error rate in many conditions, even for sample sizes greater than 50. Thus, none of these approaches can be considered robust if the response variable is nonnormally distributed. The results indicate that both the variance heterogeneity and covariance heterogeneity of the population covariance matrices affect the error rates.

15.
The authors conducted a Monte Carlo simulation of 8 statistical tests for comparing dependent zero-order correlations. In particular, they evaluated the Type I error rates and power of a number of test statistics for sample sizes (Ns) of 20, 50, 100, and 300 under 3 different population distributions (normal, uniform, and exponential). For the Type I error rate analyses, the authors evaluated 3 different magnitudes of the predictor-criterion correlations (rho(y,x1) = rho(y,x2) = .1, .4, and .7). For the power analyses, they examined 3 different effect sizes or magnitudes of discrepancy between rho(y,x1) and rho(y,x2) (values of .1, .3, and .6). They conducted all of the simulations at 3 different levels of predictor intercorrelation (rho(x1,x2) = .1, .3, and .6). The results indicated that both Type I error rate and power depend not only on sample size and population distribution, but also on (a) the predictor intercorrelation and (b) the effect size (for power) or the magnitude of the predictor-criterion correlations (for Type I error rate). When the authors considered Type I error rate and power simultaneously, the findings suggested that O. J. Dunn and V. A. Clark's (1969) z and E. J. Williams's (1959) t have the best overall statistical properties. The findings extend and refine previous simulation research and as such, should have greater utility for applied researchers.
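Williams's (1959) t, one of the two recommended statistics, has a closed form in the three sample correlations. A Python sketch on synthetic data (the formula follows the standard presentation; a full analysis would also look up the p value on n − 3 degrees of freedom):

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = math.sqrt(sum((ai - ma) ** 2 for ai in a)
                    * sum((bi - mb) ** 2 for bi in b))
    return num / den

def williams_t(y, x1, x2):
    """Williams's (1959) t for H0: rho(y,x1) = rho(y,x2) when both
    correlations are measured on the same sample; df = n - 3."""
    n = len(y)
    r12, r13, r23 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    rbar = (r12 + r13) / 2
    det_r = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    return (r12 - r13) * math.sqrt(
        (n - 1) * (1 + r23)
        / (2 * det_r * (n - 1) / (n - 3) + rbar**2 * (1 - r23) ** 3))

# Synthetic predictors where y is built to correlate more with x1.
rng = random.Random(5)
x1 = [rng.gauss(0, 1) for _ in range(100)]
x2 = [0.3 * a + rng.gauss(0, 1) for a in x1]
y = [0.6 * a + 0.2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
print(round(williams_t(y, x1, x2), 3))
```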

16.
Random effects meta‐regression is a technique to synthesize results of multiple studies. It allows for a test of an overall effect, as well as for tests of effects of study characteristics, that is, (discrete or continuous) moderator effects. We describe various procedures to test moderator effects: the z, t, likelihood ratio (LR), Bartlett‐corrected LR (BcLR), and resampling tests. We compare the Type I error of these tests, and conclude that the common z test, and to a lesser extent the LR test, do not perform well since they may yield Type I error rates appreciably larger than the chosen alpha. The error rate of the resampling test is accurate, closely followed by the BcLR test. The error rate of the t test is less accurate but arguably tolerable. With respect to statistical power, the BcLR and t tests slightly outperform the resampling test. Therefore, our recommendation is to use either the resampling or the BcLR test. If these statistics are unavailable, then the t test should be used since it is certainly superior to the z test.

17.
Two types of global testing procedures for item fit to the Rasch model were evaluated using simulation studies. The first type incorporates three tests based on first‐order statistics: van den Wollenberg's Q1 test, Glas's R1 test, and Andersen's LR test. The second type incorporates three tests based on second‐order statistics: van den Wollenberg's Q2 test, Glas's R2 test, and a non‐parametric test proposed by Ponocny. The Type I error rates and the power against the violation of parallel item response curves, unidimensionality and local independence were analysed in relation to sample size and test length. In general, the outcomes indicate a satisfactory performance of all tests, except the Q2 test which exhibits an inflated Type I error rate. Further, it was found that both types of tests have power against all three types of model violation. A possible explanation is the interdependencies among the assumptions underlying the model.

18.
In this paper, we describe a general purpose data simulator, Datasim, which is useful for anyone conducting computer-based laboratory assignments in statistics. Simulations illustrating sampling distributions, the central limit theorem, Type I and Type II decision errors, the power of a test, the effects of violating assumptions, and the distinction between orthogonal and non-orthogonal contrasts are discussed. Simulations illustrating other statistical concepts—partial correlation, regression to the mean, heteroscedasticity, the partitioning of error terms in split-plot designs, and so on—can be developed easily. Simulations can be assigned as laboratory exercises, or the instructor can execute the simulations during class, integrate the results into an ongoing lecture, and use the results to initiate class discussion of the relevant statistical concepts.
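One of the listed classroom simulations, the sampling distribution of the mean under a skewed parent distribution, can be reproduced in a few lines (a Python sketch of the idea, not Datasim itself):

```python
import random
import statistics

def sampling_distribution(draw, n, reps, seed=3):
    """Means of `reps` independent samples of size n drawn with `draw`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Skewed parent: exponential with mean 1 and SD 1. The central limit
# theorem predicts the sample means are approximately normal with
# mean 1 and SD 1/sqrt(n), despite the skew of the parent.
means = sampling_distribution(lambda rng: rng.expovariate(1.0), n=30, reps=2000)
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` against the parent distribution makes the convergence visible, which is exactly the kind of in-class demonstration the abstract describes.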

19.
A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier statistic can take both the effects of estimation of the item parameters and the estimation of the person parameters into account. The Lagrange multiplier statistic has an asymptotic χ2-distribution. The Type I error rate and power are investigated using simulation studies. Results show that test statistics that ignore the effects of estimation of the persons’ ability parameters have decreased Type I error rates and power. Incorporating a correction to account for the effects of the estimation of the persons’ ability parameters results in acceptable Type I error rates and power characteristics; incorporating a correction for the estimation of the item parameters has very little additional effect. It is investigated to what extent the three models give comparable results, both in the simulation studies and in an example using data from the NEO Personality Inventory-Revised.  相似文献   

20.
A composite step‐down procedure, in which a set of step‐down tests are summarized collectively with Fisher's combination statistic, was considered to test for multivariate mean equality in two‐group designs. An approximate degrees of freedom (ADF) composite procedure based on trimmed/Winsorized estimators and a non‐pooled estimate of error variance is proposed, and compared to a composite procedure based on trimmed/Winsorized estimators and a pooled estimate of error variance. The step‐down procedures were also compared to Hotelling's T2 and Johansen's ADF global procedure based on trimmed estimators in a simulation study. Type I error rates of the pooled step‐down procedure were sensitive to covariance heterogeneity in unbalanced designs; error rates were similar to those of Hotelling's T2 across all of the investigated conditions. Type I error rates of the ADF composite step‐down procedure were insensitive to covariance heterogeneity and less sensitive to the number of dependent variables when sample size was small than error rates of Johansen's test. The ADF composite step‐down procedure is recommended for testing hypotheses of mean equality in two‐group designs except when the data are sampled from populations with different degrees of multivariate skewness.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号