Similar Documents
1.
Valid use of the traditional independent samples ANOVA procedure requires that the population variances are equal. Previous research has investigated whether variance homogeneity tests, such as Levene's test, are satisfactory as gatekeepers for identifying when to use or not to use the ANOVA procedure. This research focuses on a novel homogeneity of variance test that incorporates an equivalence testing approach. Instead of testing the null hypothesis that the variances are equal against an alternative hypothesis that the variances are not equal, the equivalence-based test evaluates the null hypothesis that the difference in the variances falls outside or on the border of a predetermined interval against an alternative hypothesis that the difference in the variances falls within the predetermined interval. Thus, with the equivalence-based procedure, the alternative hypothesis is aligned with the research hypothesis (variance equality). A simulation study demonstrated that the equivalence-based test of population variance homogeneity is a better gatekeeper for the ANOVA than traditional homogeneity of variance tests.
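The abstract does not specify the test statistic, so the following is only a minimal sketch of the equivalence idea, implemented as a TOST (two one-sided tests) on the ratio of two normal variances; the equivalence bound `delta` and the simulated data are hypothetical choices, not the authors' procedure.

```python
import numpy as np
from scipy import stats

def variance_equivalence_tost(x, y, delta=1.5, alpha=0.05):
    """Reject H0 ("variance ratio outside [1/delta, delta]") to claim equivalence."""
    n1, n2 = len(x), len(y)
    f_obs = np.var(x, ddof=1) / np.var(y, ddof=1)
    # One-sided test of H0: ratio >= delta (rejected when f_obs is small).
    p_upper = stats.f.cdf(f_obs / delta, n1 - 1, n2 - 1)
    # One-sided test of H0: ratio <= 1/delta (rejected when f_obs is large).
    p_lower = stats.f.sf(f_obs * delta, n1 - 1, n2 - 1)
    p = max(p_upper, p_lower)      # TOST decision uses the larger p value
    return f_obs, p, p < alpha     # True: declare the variances equivalent

rng = np.random.default_rng(1)
print(variance_equivalence_tost(rng.normal(0, 1, 80), rng.normal(0, 1, 80)))
```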

2.
Ayala Cohen. Psychometrika, 1986, 51(3):379-391
A test is proposed for the equality of the variances of k ≥ 2 correlated variables. Pitman's test for k = 2 reduces the null hypothesis to zero correlation between their sum and their difference. Its extension, eliminating nuisance parameters by a bootstrap procedure, is valid for any correlation structure between the k normally distributed variables. A Monte Carlo study for several combinations of sample sizes and number of variables is presented, comparing the level and power of the new method with previously published tests. Some nonnormal data are included, for which the empirical level tends to be slightly higher than the nominal one. The results show that our method is close in power to the asymptotic tests, which are extremely sensitive to nonnormality, yet it is robust and much more powerful than other robust tests. This research was supported by the fund for the promotion of research at the Technion.
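A sketch of Pitman's k = 2 case as described above: with two correlated variables, equal variances imply zero correlation between their sum and their difference. The bootstrap extension for k > 2 is not reproduced; the data are simulated for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 100)
y = 0.6 * x + rng.normal(0.0, 0.8, 100)   # correlated with x, variance also 1.0

# cov(x + y, x - y) = var(x) - var(y), so H0 (equal variances) becomes
# H0: corr(sum, difference) = 0, testable with an ordinary correlation test.
r, p = stats.pearsonr(x + y, x - y)
print(f"r = {r:.3f}, p = {p:.3f}")        # a large p gives no evidence of inequality
```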

3.
Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when regression is forced through the origin). This paper shows that this approach renders highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment with which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions drawn through inadequate application of regression analyses.
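For concreteness, a sketch of the regression-based check that the paper shows to be flawed: regress one measure on the other and jointly test zero intercept and unit slope. The data-generating model here is hypothetical, chosen only to make the code run.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
m1 = rng.normal(50, 10, 60)            # first repeated measure
m2 = m1 + rng.normal(0, 5, 60)         # second measure, equal on average

fit = sm.OLS(m2, sm.add_constant(m1)).fit()
# The criticized procedure: joint F test of intercept = 0 and slope = 1,
# treating non-rejection as evidence of equivalence.
print(fit.f_test("const = 0, x1 = 1"))
```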

4.
In structural equation modeling, incremental fit indices are based on the comparison of the fit of a substantive model to that of a null model. The standard null model yields unconstrained estimates of the variance (and mean, if included) of each manifest variable. For many models, however, the standard null model is an improper comparison model. In these cases, incremental fit index values reported automatically by structural modeling software have no interpretation and should be disregarded. The authors explain how to formulate an acceptable, modified null model, predict changes in fit index values accompanying its use, provide examples illustrating effects on fit index values when using such a model, and discuss implications for theory and practice of structural equation modeling.
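An incremental fit index is easy to recompute against a modified null model once both models' chi-squares are available; a sketch for the CFI, with hypothetical fit values, follows.

```python
def cfi(chisq_model, df_model, chisq_null, df_null):
    """Comparative fit index of a substantive model against a chosen null model."""
    num = max(chisq_model - df_model, 0.0)
    den = max(chisq_null - df_null, chisq_model - df_model, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

# The same substantive model judged against two different null models:
print(cfi(48.3, 24, 480.0, 36))   # standard independence null
print(cfi(48.3, 24, 160.0, 30))   # a modified, more appropriate null; CFI drops
```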

5.
The author compared simulations of the "true" null hypothesis (z) test, in which σ was known and fixed, with the t test, in which s, an estimate of σ, was calculated from the sample because the t test was used to emulate the "true" test. The true null hypothesis test bears exclusively on calculating the probability that a sample distance (mean) is larger than a specified value. The results showed that the value of t was sensitive to sampling fluctuations in both distance and standard error. Large values of t reflect small standard errors when n is small. The value of t achieves sensitivity primarily to distance only when the sample sizes are large. One cannot make a definitive statement about the probability or "significance" of a distance solely on the basis of the value of t.
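A small simulation in the spirit of this comparison: with σ known, z varies only with the sample mean, whereas t also varies with the re-estimated standard error. The sample size and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, sigma = 5, 10_000, 1.0
samples = rng.normal(0.0, sigma, size=(reps, n))
means = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)

z = means / (sigma / np.sqrt(n))   # "true" test: standard error known and fixed
t = means / (s / np.sqrt(n))       # t: standard error re-estimated every sample
print(z.std(), t.std())            # t is markedly more variable when n is small
```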

6.
The conventional procedure for null hypothesis significance testing has long been the target of appropriate criticism. A more reasonable alternative is proposed, one that not only avoids the unrealistic postulation of a null hypothesis but also, for a given parametric difference and a given error probability, is more likely to report the detection of that difference.

7.
The squared multiple correlation coefficient has been widely employed to assess the goodness-of-fit of linear regression models in many applications. Although there are numerous published sources that present inferential issues and computing algorithms for multinormal correlation models, the statistical procedure for testing substantive significance by specifying the nonzero-effect null hypothesis has received little attention. This article emphasizes the importance of determining whether the squared multiple correlation coefficient is small or large in comparison with some prescribed standard and develops corresponding Excel worksheets that facilitate the implementation of various aspects of the suggested significance tests. In view of the extensive accessibility of Microsoft Excel software and the ultimate convenience of general-purpose statistical packages, the associated computer routines for interval estimation, power calculation, and sample-size determination are also provided for completeness. The statistical methods and available programs of multiple correlation analysis described in this article purport to enhance pedagogical presentation in academic curricula and practical application in psychological research.
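The article's Excel worksheets are not reproduced here. As a rough stand-in for the conventional zero-null case, power for the overall test of the squared multiple correlation can be sketched with the textbook noncentral-F approximation, using Cohen's noncentrality convention λ = f²(u + v + 1) with f² = ρ²/(1 − ρ²); the article's nonzero-effect null requires a different noncentral distribution not shown here. Inputs are hypothetical.

```python
from scipy import stats

def r2_power(rho2, n, p, alpha=0.05):
    """Approximate power of the overall F test of R^2 with p predictors."""
    u, v = p, n - p - 1                  # numerator and denominator df
    f2 = rho2 / (1.0 - rho2)
    lam = f2 * (u + v + 1)               # Cohen's (1988) noncentrality convention
    f_crit = stats.f.ppf(1 - alpha, u, v)
    return stats.ncf.sf(f_crit, u, v, lam)

print(r2_power(rho2=0.13, n=80, p=3))    # a medium effect with 3 predictors
```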

8.
The many null distributions of person fit indices
This paper deals with the situation of an investigator who has collected the scores of n persons on a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one-parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors consisting of k symbols each equal to 0 or 1, it is desired to classify each respondent as either anomalous or as conforming to the model. As this model is probabilistic, such a classification will possibly involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to be rather different for different values of the person's latent trait value. This value will be called the ability parameter, although our results are equally valid for Rasch scales measuring other attributes. As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution. The paper describes three specifications for the latter: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which is in the Rasch model the sufficient statistic for the ability parameter. Classification rules for aberrance will be worked out for each of the three specifications. Depending on test length, item parameters, and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate, and a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications. The authors express their gratitude to the reviewers and to many colleagues for comments on an earlier version.
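A sketch of the first specification (treat the estimated ability as true), with a Monte Carlo null distribution for the log-likelihood person-fit statistic under the Rasch model; the item difficulties and the 5% cutoff are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(8)
b = np.linspace(-2.0, 2.0, 20)           # hypothetical item difficulties
theta = 0.5                              # estimated ability, treated here as true
p = 1.0 / (1.0 + np.exp(-(theta - b)))   # Rasch response probabilities

def loglik(x):
    """Log-likelihood person-fit statistic for one 0/1 response vector."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

sims = (rng.random((5000, b.size)) < p).astype(int)  # model-conforming persons
null = np.array([loglik(x) for x in sims])           # estimated null distribution
crit = np.quantile(null, 0.05)           # low log-likelihood signals misfit

x_obs = (rng.random(b.size) < 0.5).astype(int)       # a random responder
print(loglik(x_obs) < crit)              # True: classify the person as aberrant
```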

9.
王阳, 温忠麟, 付媛姝. Advances in Psychological Science (《心理科学进展》), 2020, 28(11):1961-1969
Commonly used fit indices for structural equation models have certain limitations. For example, the χ2 test takes the traditional null hypothesis as its target hypothesis and therefore cannot confirm a model, while descriptive fit indices such as RMSEA and CFI lack inferential statistical properties. Equivalence testing effectively remedies these problems. This article first explains how equivalence testing evaluates the fit of a single model and how it differs from null hypothesis testing, then introduces how equivalence testing is used to analyze measurement invariance, and finally uses empirical data to demonstrate the performance of equivalence testing in single-model evaluation and measurement invariance testing, comparing it with traditional model evaluation methods.
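Equivalence tests of single-model fit of the kind discussed above are commonly built on the noncentral chi-square distribution with the noncentrality implied by an RMSEA bound. A minimal sketch follows, with hypothetical fit values and an assumed bound eps0 = .08, not the article's exact adjusted cutoffs.

```python
from scipy import stats

def rmsea_equivalence_test(chisq, df, n, eps0=0.08, alpha=0.05):
    """Reject H0 ("RMSEA >= eps0", i.e., unacceptable fit) for small chi-square."""
    ncp = (n - 1) * df * eps0**2           # noncentrality at the RMSEA bound
    crit = stats.ncx2.ppf(alpha, df, ncp)  # lower-tail critical value
    return chisq, crit, chisq < crit       # True: fit shown equivalently close

print(rmsea_equivalence_test(chisq=52.1, df=40, n=300))
```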

10.
This study aims at investigating whether education and poverty have a causal effect on terrorist attacks in Turkey from 1980 to 2006. The ARDL bounds testing procedure and Granger-causality analysis within vector error-correction models (VECM) are used. Results of the ARDL bounds testing procedure suggest the rejection of the null hypothesis of no long-run relationship when education is the long-run forcing variable for terrorist attacks. The evidence obtained from the Granger-causality analysis within VECM confirms the findings from the ARDL bounds test that there exists a long-run equilibrium relationship between these two series. Results suggest no evidence of a long-run relationship between terrorist attacks and poverty.
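The study's data are not reproduced here. As a sketch of the Granger-causality step alone (statsmodels' standard test, not the VECM variant used in the paper), simulated series stand in for the education and attack data.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(10)
education = rng.normal(size=200).cumsum()          # hypothetical trending series
attacks = 0.5 * np.roll(education, 1) + rng.normal(size=200)

# Tests whether the second column Granger-causes the first.
data = np.column_stack([attacks, education])[1:]   # drop the wrapped first row
res = grangercausalitytests(data, maxlag=2)
```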

11.
Li L, Bentler PM. Psychological Methods, 2011, 16(2):116-126
MacCallum, Browne, and Cai (2006) proposed a new framework for evaluation and power analysis of small differences between nested structural equation models (SEMs). In their framework, the null and alternative hypotheses for testing a small difference in fit and its related power analyses were defined by some chosen root-mean-square error of approximation (RMSEA) pairs. In this article, we develop a new method that quantifies those chosen RMSEA pairs and allows a quantitative comparison of them. Our method proposes the use of single RMSEA values to replace the choice of RMSEA pairs for model comparison and power analysis, thus avoiding the differential meaning of the chosen RMSEA pairs inherent in the approach of MacCallum et al. (2006). With this choice, the conventional cutoff values in model overall evaluation can directly be transferred and applied to the evaluation and power analysis of model differences.
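In the MacCallum, Browne, and Cai (2006) framework that the article builds on, the difference test's noncentrality follows from the two models' RMSEA values as delta = (n − 1)(df0·eps0² − df1·eps1²). A sketch of the resulting power calculation, with hypothetical inputs rather than the article's single-RMSEA replacement rule:

```python
from scipy import stats

def nested_diff_power(n, df0, df1, eps0, eps1, alpha=0.05):
    """Power of the chi-square difference test for a small difference in fit."""
    df_d = df0 - df1
    delta = (n - 1) * (df0 * eps0**2 - df1 * eps1**2)  # alternative noncentrality
    crit = stats.chi2.ppf(1 - alpha, df_d)             # null of no difference
    return stats.ncx2.sf(crit, df_d, delta)

print(nested_diff_power(n=500, df0=24, df1=20, eps0=0.06, eps1=0.04))
```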

12.
Tryon WW, Lewis C. Psychological Methods, 2008, 13(3):272-277
Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis, and failing to reject H0 is not evidence supportive of it. Tests of statistical equivalence are needed. This article corrects the inferential confidence interval (ICI) reduction factor introduced by W. W. Tryon (2001) and uses it to extend his discussion of statistical equivalence. This method is shown to be algebraically equivalent to D. J. Schuirmann's (1987) use of 2 one-sided t tests, a highly regarded and accepted method of testing for statistical equivalence. The ICI method provides an intuitive graphic method for inferring statistical difference as well as equivalence. Trivial difference occurs when a test of difference and a test of equivalence are both passed. Statistical indeterminacy results when both tests are failed. Hybrid confidence intervals are introduced that impose ICI limits on standard confidence intervals. These intervals are recommended as replacements for error bars because they facilitate inferences.
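Schuirmann's two one-sided t tests, to which the corrected ICI method is shown to be equivalent, can be sketched directly; the equivalence margin `d` and the data are hypothetical.

```python
import numpy as np
from scipy import stats

def tost_means(x, y, d=2.0, alpha=0.05):
    """Two one-sided tests of H0: |mean(x) - mean(y)| >= d (nonequivalence)."""
    _, p1 = stats.ttest_ind(x, y - d, alternative="greater")
    _, p2 = stats.ttest_ind(x, y + d, alternative="less")
    p = max(p1, p2)
    return p, p < alpha       # True: the groups are statistically equivalent

rng = np.random.default_rng(12)
x, y = rng.normal(100, 5, 40), rng.normal(100.5, 5, 40)
print(tost_means(x, y))
```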

13.
A. J. Riopelle (2003) has eloquently demonstrated that the null hypothesis assessed by the t test involves not only mean differences but also error in the estimation of the within-group standard deviation, s. He is correct in his conclusion that the precision of the interpretation of a significant t and the null hypothesis tested is complex, particularly when sample sizes are small. In this article, the author expands on Riopelle's thoughts by comparing t with some equivalent or closely related tests that make the reliance of t on the accurate estimation of error perhaps more salient and by providing a simulation that may address more directly the magnitude of the interpretational problem.

14.
We propose a default Bayesian hypothesis test for the presence of a correlation or a partial correlation. The test is a direct application of Bayesian techniques for variable selection in regression models. The test is easy to apply and yields practical advantages that the standard frequentist tests lack; in particular, the Bayesian test can quantify evidence in favor of the null hypothesis and allows researchers to monitor the test results as the data come in. We illustrate the use of the Bayesian correlation test with three examples from the psychological literature. Computer code and example data are provided in the journal archives.
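The article's default Bayes factor rests on variable-selection priors not reproduced here. As a crude stand-in, a Bayes factor for a correlation can be computed by numerically averaging the likelihood ratio over a uniform prior on rho (with plug-in standardization of the data), which likewise lets evidence favor the null as well as the alternative.

```python
import numpy as np
from scipy import stats

def bf10_correlation(x, y, grid=np.linspace(-0.99, 0.99, 199)):
    """Bayes factor for rho != 0 vs rho = 0 under a uniform prior (a sketch)."""
    data = np.column_stack([stats.zscore(x), stats.zscore(y)])
    def loglik(rho):
        cov = np.array([[1.0, rho], [rho, 1.0]])
        return stats.multivariate_normal(cov=cov).logpdf(data).sum()
    ll0 = loglik(0.0)
    # Average likelihood ratio over the grid approximates the marginal likelihood.
    return float(np.mean([np.exp(loglik(r) - ll0) for r in grid]))

rng = np.random.default_rng(14)
x = rng.normal(size=50)
y = 0.4 * x + rng.normal(size=50)
print(bf10_correlation(x, y))     # BF10 > 1 favors a nonzero correlation
```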

15.
The log-linear model for contingency tables expresses the logarithm of a cell frequency as an additive function of main effects, interactions, etc., in a way formally identical with an analysis of variance model. Exact statistical tests are developed to test hypotheses that specific effects or sets of effects are zero, yielding procedures for exploring relationships among qualitative variables which are suitable for small samples. The tests are analogous to Fisher's exact test for a 2 × 2 contingency table. Given a hypothesis, the exact probability of the obtained table is determined, conditional on fixed marginals or other functions of the cell frequencies. The sum of the probabilities of the obtained table and of all less probable ones is the exact probability to be considered in testing the null hypothesis. Procedures for obtaining exact probabilities are explained in detail, with examples given.
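For the 2 × 2 case the procedure reduces to Fisher's exact test, which is available directly; the table below is hypothetical.

```python
from scipy import stats

table = [[8, 2],
         [1, 5]]
# Exact p value, conditional on the fixed marginals, as in the tests above.
odds_ratio, p = stats.fisher_exact(table)
print(odds_ratio, p)
```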

16.
A family of scaling corrections aimed to improve the chi-square approximation of goodness-of-fit test statistics in small samples, large models, and nonnormal data was proposed in Satorra and Bentler (1994). For structural equation models, Satorra-Bentler's (SB) scaling corrections are available in standard computer software. Often, however, the interest is not in the overall fit of a model, but in a test of the restrictions that a null model, say M0, implies on a less restricted one, M1. If T0 and T1 denote the goodness-of-fit test statistics associated with M0 and M1, respectively, then typically the difference Td = T0 − T1 is used as a chi-square test statistic with degrees of freedom equal to the difference in the number of independent parameters estimated under the models M0 and M1. As in the case of the goodness-of-fit test, it is of interest to scale the statistic Td in order to improve its chi-square approximation in realistic, that is, nonasymptotic and nonnormal, applications. In a recent paper, Satorra (2000) shows that the difference between two SB scaled test statistics for overall model fit does not yield the correct SB scaled difference test statistic. Satorra developed an expression that permits scaling the difference test statistic, but his formula has some practical limitations, since it requires heavy computations that are not available in standard computer software. The purpose of the present paper is to provide an easy way to compute the scaled difference chi-square statistic from the scaled goodness-of-fit test statistics of models M0 and M1. A Monte Carlo study is provided to illustrate the performance of the competing statistics. This research was supported by the Spanish grants PB96-0300 and BEC2000-0983, and USPHS grants DA00017 and DA01070.
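The computation the paper makes practical can be sketched in a few lines: recover each model's scaling constant from its usual and SB-scaled statistics, then scale the difference. The fit values below are hypothetical.

```python
def scaled_diff_chisq(T0, T0_sb, df0, T1, T1_sb, df1):
    """Scaled difference chi-square from the fit statistics of nested models."""
    c0 = T0 / T0_sb                    # scaling constant of the null model M0
    c1 = T1 / T1_sb                    # scaling constant of the wider model M1
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)
    return (T0 - T1) / cd, df0 - df1   # scaled statistic and its degrees of freedom

print(scaled_diff_chisq(T0=95.2, T0_sb=80.1, df0=30, T1=70.4, T1_sb=60.3, df1=26))
```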

17.
According to Bayesians, the null hypothesis significance-testing procedure is not deductively valid because it involves the retention or rejection of the null hypothesis under conditions where the posterior probability of that hypothesis is not known. Other criticisms are that this procedure is pointless and encourages imprecise hypotheses. However, according to non-Bayesians, there is no way of assigning a prior probability to the null hypothesis, and so Bayesian statistics do not work either. Consequently, no procedure has been accepted by both groups as providing a compelling reason to accept or reject hypotheses. The author aims to provide such a method. In the process, the author distinguishes between probability and epistemic estimation and argues that, although both are important in a science that is not completely deterministic, epistemic estimation is most relevant for hypothesis testing. Based on this analysis, the author proposes that hypotheses be evaluated via epistemic ratios and explores the implications of this proposal. One implication is that it is possible to encourage precise theorizing by imposing a penalty for imprecise hypotheses.

18.
When participants are asked to respond in the same way to stimuli from different sources (e.g., auditory and visual), responses are often observed to be substantially faster when both stimuli are presented simultaneously (redundancy gain). Different models account for this effect, the two most important being race models and coactivation models. Redundancy gains consistent with the race model have an upper limit, however, which is given by the well-known race model inequality (Miller, 1982). A number of statistical tests have been proposed for testing the race model inequality in single participants and groups of participants. All of these tests use the race model as the null hypothesis, and rejection of the null hypothesis is considered evidence in favor of coactivation. We introduce a statistical test in which the race model prediction is the alternative hypothesis. This test controls the Type I error if a theory predicts that the race model prediction holds in a given experimental condition.
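Miller's (1982) race model inequality bounds the redundant-target CDF by the sum of the single-target CDFs. Checking it empirically is straightforward, as sketched below; the inferential test proposed above (with the race model as the alternative hypothesis) is not reproduced, and the RT data are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(19)
rt_a = rng.normal(400, 40, 200)     # auditory-only reaction times (ms)
rt_v = rng.normal(420, 40, 200)     # visual-only reaction times (ms)
rt_av = rng.normal(360, 35, 200)    # redundant-target reaction times (ms)

def ecdf(sample, t):
    """Empirical CDF of `sample` evaluated at each time point in `t`."""
    return np.mean(sample[:, None] <= t, axis=0)

t = np.linspace(250, 500, 26)
violation = ecdf(rt_av, t) - (ecdf(rt_a, t) + ecdf(rt_v, t))
print(t[violation > 0])             # time points where the inequality is violated
```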

19.
Consider two independent groups with K measures for each subject. For the jth group and kth measure, let μtjk be the population trimmed mean, j = 1, 2; k = 1, ..., K. This article compares several methods for testing H0: μt1k = μt2k, k = 1, ..., K, such that the probability of at least one Type I error is α, and simultaneous probability coverage is 1 − α when computing confidence intervals for μt1k − μt2k. The emphasis is on K = 4 and α = .05. For zero trimming the problem reduces to comparing means, but it is well known that when comparing means, arbitrarily small departures from normality can result in extremely low power relative to using, say, 20% trimming. Moreover, when skewed distributions are being compared, conventional methods for comparing means can be biased for reasons reviewed in the article. A consequence is that in some realistic situations, the probability of rejecting can be higher when the null hypothesis is true versus a situation where the means differ by a half standard deviation. Switching to robust measures of location is known to reduce this problem, and combining robust measures of location with some type of bootstrap method reduces the problem even more. Published articles suggest that for the problem at hand, the percentile t bootstrap, combined with a 20% trimmed mean, will perform relatively well, but there are known situations where it does not eliminate all problems. In this article we consider an extension of the percentile bootstrap approach that is found to give better results.
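A sketch of the basic building block, a percentile bootstrap confidence interval for the difference of two 20% trimmed means on a single measure; the article's extension with familywise error control across the K measures is not reproduced, and the data are simulated.

```python
import numpy as np
from scipy import stats

def trimmed_mean_boot_ci(x, y, trim=0.2, n_boot=2000, alpha=0.05, seed=20):
    """Percentile bootstrap CI for the difference of two trimmed means."""
    rng = np.random.default_rng(seed)
    diffs = np.array([
        stats.trim_mean(rng.choice(x, x.size, replace=True), trim)
        - stats.trim_mean(rng.choice(y, y.size, replace=True), trim)
        for _ in range(n_boot)
    ])
    return tuple(np.quantile(diffs, [alpha / 2, 1 - alpha / 2]))

rng = np.random.default_rng(0)
x, y = rng.normal(0.0, 1.0, 40), rng.normal(0.3, 1.0, 40)
print(trimmed_mean_boot_ci(x, y))   # an interval excluding 0 suggests a difference
```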
