Similar Articles
1.
Based on an improved Wald statistic, a two-group DIF detection method is extended to testing differential item functioning (DIF) across multiple groups. The improved Wald statistics are obtained by computing, respectively, the observed information matrix (Obs) and the empirical cross-product information matrix (XPD). A simulation study compared these two variants with the traditional computation in multi-group DIF testing. The results show that: (1) the Type I error rates of Obs and XPD are markedly lower than those of the traditional method, and under DINA model estimation they are close to the nominal level; (2) when the sample size and the amount of DIF are large, Obs and XPD have roughly the same statistical power as the traditional Wald statistic.
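
The abstract includes no code; the following is a minimal sketch of the generic multi-group Wald test it builds on, assuming per-group item-parameter estimates and their covariance matrices (from whichever information matrix, Obs or XPD, one chooses) are already available. The function name and interface are illustrative, not the authors'.

```python
import numpy as np
from scipy import stats

def wald_dif_test(theta_hats, covs):
    """Generic multi-group Wald test that item parameters are equal
    across G groups, H0: theta_1 = ... = theta_G.
    theta_hats: list of G parameter vectors (length p), one per group.
    covs: list of G (p x p) covariance matrices (e.g., the inverse of
    an information matrix; the choice of information matrix is what
    distinguishes the Obs and XPD variants discussed in the abstract)."""
    G = len(theta_hats)
    p = len(theta_hats[0])
    theta = np.concatenate(theta_hats)           # stacked (G*p,) vector
    V = np.zeros((G * p, G * p))                 # block-diagonal covariance
    for g, S in enumerate(covs):
        V[g*p:(g+1)*p, g*p:(g+1)*p] = S
    # Contrast matrix comparing group g with group 1, for g = 2..G
    C = np.zeros(((G - 1) * p, G * p))
    for g in range(1, G):
        C[(g-1)*p:g*p, 0:p] = -np.eye(p)
        C[(g-1)*p:g*p, g*p:(g+1)*p] = np.eye(p)
    diff = C @ theta
    W = diff @ np.linalg.solve(C @ V @ C.T, diff)
    df = (G - 1) * p
    return W, df, stats.chi2.sf(W, df)           # reject DIF-free H0 if p small
```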

2.
We study several aspects of bootstrap inference for covariance structure models based on three test statistics, including Type I error, power and sample‐size determination. Specifically, we discuss conditions for a test statistic to achieve a more accurate level of Type I error, both in theory and in practice. Details on power analysis and sample‐size determination are given. For data sets with heavy tails, we propose applying a bootstrap methodology to a transformed sample by a downweighting procedure. One of the key conditions for safe bootstrap inference is generally satisfied by the transformed sample but may not be satisfied by the original sample with heavy tails. Several data sets illustrate that, by combining downweighting and bootstrapping, a researcher may find a nearly optimal procedure for evaluating various aspects of covariance structure models. A rule for handling non‐convergence problems in bootstrap replications is proposed.
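
As a concrete, hypothetical illustration of bootstrapping a covariance structure test statistic, here is a sketch in the spirit of the Bollen-Stine model-based bootstrap: the sample is transformed so the null model holds exactly, and the statistic is then bootstrapped from the transformed data. The abstract's three statistics and its heavy-tail downweighting transform are not reproduced here.

```python
import numpy as np

def bollen_stine_pvalue(X, Sigma0, fit_stat, B=500, seed=0):
    """Model-based bootstrap p value for a covariance structure test.
    X: (n, p) data matrix; Sigma0: model-implied covariance under H0;
    fit_stat: callable mapping a data matrix to the test statistic."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    def msqrt(M, inv=False):
        w, V = np.linalg.eigh(M)                 # symmetric matrix square root
        w = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
        return (V * w) @ V.T
    Y = Xc @ msqrt(S, inv=True) @ msqrt(Sigma0)  # transformed so cov(Y) == Sigma0
    t_obs = fit_stat(X)
    t_boot = np.empty(B)
    for b in range(B):
        t_boot[b] = fit_stat(Y[rng.integers(0, n, n)])  # resample rows
    return t_obs, np.mean(t_boot >= t_obs)       # bootstrap p value
```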

3.
The present study evaluated the accuracy of seven techniques for ascertaining, after a meta-analysis, whether moderators are present: (a) the SH-75% rule for uncorrected r, (b) the SH-75% rule for corrected r, (c) the SH-95% rule for uncorrected r, (d) the SH-95% rule for corrected r, (e) the Q statistic, (f) inclusion of 0 in the credibility interval around the corrected r, and (g) the size of that interval. Using Monte Carlo data defined by various parameters, including sample-based artifacts, comparisons of Type I error and power were generated. Findings showed that when differences between population correlations were small, power levels for all techniques were relatively low. Overall, the SH rules and the Q statistic had greater power but a higher Type I error rate than the credibility intervals. Because of the high Type I error rate associated with both SH-95% techniques and the low power found with the credibility intervals, the SH-75% rules and the Q statistic are recommended. Limitations and some practical implications of the findings are discussed.
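
A simplified sketch of two of the compared techniques, using standard textbook formulas (the study's artifact corrections are not reproduced): a Hunter-Schmidt style 75% rule and the Q homogeneity test.

```python
import numpy as np
from scipy import stats

def moderator_checks(rs, ns):
    """Post-meta-analysis moderator checks.
    rs, ns: per-study correlations and sample sizes."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    rbar = np.sum(ns * rs) / np.sum(ns)              # N-weighted mean r
    var_obs = np.sum(ns * (rs - rbar) ** 2) / np.sum(ns)
    var_err = (1 - rbar**2) ** 2 / (ns.mean() - 1)   # expected sampling-error variance
    pct_explained = var_err / var_obs
    # Q homogeneity test on Fisher-z transformed correlations
    z, w = np.arctanh(rs), ns - 3
    zbar = np.sum(w * z) / np.sum(w)
    Q = np.sum(w * (z - zbar) ** 2)
    return {"SH75_moderator": pct_explained < 0.75,  # <75% explained -> suspect moderator
            "Q": Q, "Q_p": stats.chi2.sf(Q, len(rs) - 1)}
```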

4.
A statistical model for combining p values from multiple tests of significance is used to define rejection and acceptance regions for two-stage and three-stage sampling plans. Type I error rates, power, frequencies of early termination decisions, and expected sample sizes are compared. Both the two-stage and three-stage procedures provide appropriate protection against Type I errors. The two-stage sampling plan with its single interim analysis entails minimal loss in power and provides substantial reduction in expected sample size as compared with a conventional single end-of-study test of significance for which power is in the adequate range. The three-stage sampling plan with its two interim analyses introduces somewhat greater reduction in power, but it compensates with greater reduction in expected sample size. Either interim-analysis strategy is more efficient than a single end-of-study analysis in terms of power per unit of sample size.
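
The abstract does not name its combination model; Fisher's method is one standard instance, sketched below together with a toy two-stage plan whose stage criteria are purely illustrative placeholders, not the paper's boundaries.

```python
import numpy as np
from scipy import stats

def fisher_combine(pvals):
    """Fisher's method for combining independent p values.
    Under H0, -2 * sum(ln p_i) ~ chi-square with 2k df."""
    pvals = np.asarray(pvals, float)
    X = -2.0 * np.sum(np.log(pvals))
    return X, stats.chi2.sf(X, 2 * len(pvals))

def two_stage_decision(p_stage1, p_stage2, reject_at=0.01, accept_at=0.50):
    """Toy two-stage plan: stop early on a clear result at the interim
    look, otherwise combine both stages with Fisher's method."""
    if p_stage1 <= reject_at:
        return "reject early"                    # early-termination decision
    if p_stage1 >= accept_at:
        return "accept early"
    _, p12 = fisher_combine([p_stage1, p_stage2])
    return "reject" if p12 <= 0.05 else "accept"
```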

5.
The statistical significance levels of the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test are substantially biased by heterogeneous variances of treatment groups, even when sample sizes are equal. Under these conditions, the Type I error probabilities of the nonparametric tests, performed at the .01, .05, and .10 significance levels, increase by as much as 40%-50% in many cases and sometimes as much as 300%. The bias increases systematically as the ratio of standard deviations of treatment groups increases and remains fairly constant for various sample sizes. There is no indication that Type I error probabilities approach the significance level asymptotically as sample size increases.
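
The inflation described here is easy to reproduce; the sketch below runs a small Monte Carlo check of the Wilcoxon-Mann-Whitney Type I error rate under equal means but unequal variances (the parameters are illustrative, not the study's design).

```python
import numpy as np
from scipy import stats

def wmw_type1_rate(n=20, sd_ratio=4.0, alpha=0.05, reps=10000, seed=1):
    """Empirical Type I error of the Wilcoxon-Mann-Whitney test when
    H0 (equal means) is true but the group variances differ."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(0.0, sd_ratio, n)   # same mean, sd_ratio times the SD
        if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
            hits += 1
    return hits / reps   # ~alpha when sd_ratio=1; exceeds it as sd_ratio grows
```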

6.
Hou, de la Torre, and Nandakumar (2014) proposed using the Wald statistic to detect DIF, but its results suffer from an inflated Type I error rate. This study proposes an improved Wald statistic computed from the observed information matrix. The results show that: (1) the improved Wald statistic computed with the observed information matrix provides good Type I error control in DIF detection, especially when items are highly discriminating, resolving the Type I error inflation reported in earlier studies; and (2) the statistical power of the Wald statistic computed with the observed information matrix increases with sample size and with the amount of DIF.

7.
Adverse impact evaluations often call for evidence that the disparity between groups in selection rates is statistically significant, and practitioners must choose which test statistic to apply in this situation. To identify the most effective testing procedure, the authors compared several alternate test statistics in terms of Type I error rates and power, focusing on situations with small samples. Significance testing was found to be of limited value because of low power for all tests. Among the alternate test statistics, the widely-used Z-test on the difference between two proportions performed reasonably well, except when sample size was extremely small. A test suggested by G. J. G. Upton (1982) provided slightly better control of Type I error under some conditions but generally produced results similar to the Z-test. Use of the Fisher Exact Test and Yates's continuity-corrected chi-square test is not recommended because of overly conservative Type I error rates and substantially lower power than the Z-test.
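
A minimal implementation of the pooled two-proportion Z test discussed here, in standard textbook form:

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sample Z test on the difference between selection
    rates. x1/n1 and x2/n2: selections and applicants per group."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                    # pooled proportion
    se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * stats.norm.sf(abs(z))          # two-sided p value

# e.g. 10 of 50 minority vs 20 of 50 majority applicants selected:
# z, p = two_proportion_ztest(10, 50, 20, 50)
```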

8.
Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's (1984) asymptotically distribution-free method and Satorra and Bentler's (1988, 1994) mean scaling statistic were developed under the presumption of nonnormality in the factors and errors. This article finds a new application in the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra and Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third-moment-adjusted statistic asymptotically performs on par with previously proposed methods and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent, and Bibby's (1980) study of students tested for their ability in 5 content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
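
The article's own adjustment is not reproduced here; as a generic stand-in for the idea of matching the statistic's skewness, the sketch below uses the classical three-moment chi-square approximation, fitting a shifted, scaled chi-square to the first three moments of the statistic's null distribution (however those moments are estimated).

```python
import numpy as np
from scipy import stats

def three_moment_pvalue(t_obs, mean, var, skew):
    """Approximate P(T > t_obs) by matching a*X + c, X ~ chi2(b), to
    the mean, variance, and skewness of T's null distribution."""
    b = 8.0 / skew**2                 # chi2(b) has skewness sqrt(8/b)
    a = np.sqrt(var / (2.0 * b))      # matches the variance 2*a^2*b
    c = mean - a * b                  # matches the mean a*b + c
    return stats.chi2.sf((t_obs - c) / a, b)
```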

9.
When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non‐normality for Yuen's two‐group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas can minimize the total cost, the total sample size, or the sum of total sample size and total cost. On the other hand, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, the present simulation applies Yuen's test to the sample generated, and then the procedure is validated in terms of Type I errors and power. Simulation results show that the proposed formulas can control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and future research are discussed.
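
A textbook implementation of the Yuen statistic that the proposed sample size formulas are built around (the formulas themselves are not reproduced):

```python
import numpy as np
from scipy import stats

def yuen_test(x, y, trim=0.2):
    """Yuen's two-group test on trimmed means with Winsorized variances."""
    def trimmed_parts(a):
        a = np.sort(np.asarray(a, float))
        n = len(a)
        g = int(np.floor(trim * n))              # observations trimmed per tail
        h = n - 2 * g                            # effective sample size
        tmean = a[g:n-g].mean()                  # trimmed mean
        w = a.copy()
        w[:g], w[n-g:] = a[g], a[n-g-1]          # Winsorize the tails
        d = (n - 1) * w.var(ddof=1) / (h * (h - 1))
        return tmean, d, h
    m1, d1, h1 = trimmed_parts(x)
    m2, d2, h2 = trimmed_parts(y)
    t = (m1 - m2) / np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1**2 / (h1 - 1) + d2**2 / (h2 - 1))
    return t, df, 2 * stats.t.sf(abs(t), df)
```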

10.
Preliminary tests of equality of variances used before a test of location are no longer widely recommended by statisticians, although they persist in some textbooks and software packages. The present study extends the findings of previous studies and provides further reasons for discontinuing the use of preliminary tests. The study found Type I error rates of a two‐stage procedure, consisting of a preliminary Levene test on samples of different sizes with unequal variances, followed by either a Student pooled‐variances t test or a Welch separate‐variances t test. Simulations disclosed that the two‐stage procedure fails to protect the significance level and usually makes the situation worse. Earlier studies have shown that preliminary tests often adversely affect the size of the test, and also that the Welch test is superior to the t test when variances are unequal. The present simulations reveal that changes in Type I error rates are greater when sample sizes are smaller, when the difference in variances is slight rather than extreme, and when the significance level is more stringent. Furthermore, the validity of the Welch test deteriorates if it is used only on those occasions where a preliminary test indicates it is needed. Optimum protection is assured by using a separate‐variances test unconditionally whenever sample sizes are unequal.
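
The two-stage procedure criticized here is straightforward to simulate; the sketch below estimates its empirical Type I error rate under unequal variances and unequal sample sizes (the parameters are illustrative).

```python
import numpy as np
from scipy import stats

def two_stage_type1(n1=10, n2=40, sd2=2.0, alpha=0.05, reps=10000, seed=2):
    """Simulate: Levene's test first, then pooled t if 'equal
    variances' is not rejected, Welch t otherwise. Returns the
    empirical Type I error rate under a true H0 of equal means."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(0, 1.0, n1)
        y = rng.normal(0, sd2, n2)               # H0 true: equal means
        _, p_lev = stats.levene(x, y)
        equal_var = p_lev >= alpha               # preliminary test decision
        _, p = stats.ttest_ind(x, y, equal_var=equal_var)
        hits += p < alpha
    return hits / reps

# Baseline for comparison: the unconditional Welch test,
# stats.ttest_ind(x, y, equal_var=False), applied to every sample.
```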

11.
The variable criteria sequential stopping rule (vcSSR) is an efficient way to add sample size to planned ANOVA tests while holding the observed rate of Type I errors, α_o, constant. The only difference from regular null hypothesis testing is that criteria for stopping the experiment are obtained from a table based on the desired power, rate of Type I errors, and beginning sample size. The vcSSR was developed using between-subjects ANOVAs, but it should work with p values from any type of F test. In the present study, the α_o remained constant at the nominal level when using the previously published table of criteria with repeated measures designs with various numbers of treatments per subject, Type I error rates, values of ρ, and four different sample size models. New power curves allow researchers to select the optimal sample size model for a repeated measures experiment. The criteria held α_o constant either when used with a multiple correlation that varied the sample size model and the number of predictor variables, or when used with MANOVA with multiple groups and two levels of a within-subject variable at various levels of ρ. Although not recommended for use with χ² tests such as the Friedman rank ANOVA test, the vcSSR produces predictable results based on the relation between F and χ². Together, the data confirm the view that the vcSSR can be used to control Type I errors during sequential sampling with any t- or F-statistic rather than being restricted to certain ANOVA designs.
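
A sketch of the sequential loop the vcSSR prescribes. The real procedure reads its stopping criteria from the published table for a chosen alpha, power, and starting sample size; the `lower` and `upper` values below are placeholders, not table entries.

```python
import numpy as np
from scipy import stats

def vcssr_experiment(draw_pair, n0=10, n_add=5, n_max=40,
                     lower=0.01, upper=0.36):
    """Sequential stopping loop: test, then either reject, stop and
    retain, or add subjects and retest. lower/upper are PLACEHOLDER
    criteria for illustration only. draw_pair(k) must return k new
    observations for each of two groups."""
    x, y = draw_pair(n0)
    while True:
        p = stats.f_oneway(x, y).pvalue          # p from any F (or t) test
        if p <= lower:
            return "reject H0", len(x)
        if p >= upper or len(x) >= n_max:
            return "stop, retain H0", len(x)
        xn, yn = draw_pair(n_add)                # add subjects, retest
        x, y = np.concatenate([x, xn]), np.concatenate([y, yn])

# e.g. rng = np.random.default_rng(3)
# vcssr_experiment(lambda k: (rng.normal(0, 1, k), rng.normal(0.5, 1, k)))
```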

12.
Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M, a, increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such that elevated Type I error rates occur when the sample size is small and the effect size of the nonzero path is medium or larger. Results from the second simulation found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y, b, was small. Two empirical mediation examples are provided using data from a steroid prevention and health promotion program aimed at high school football players (Athletes Training and Learning to Avoid Steroids; Goldberg et al., 1996), one to illustrate a possible Type I error for the bias-corrected bootstrap test and a second to illustrate a loss in power related to the size of a. Implications of these findings are discussed.
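
A textbook implementation of the bias-corrected bootstrap test for the indirect effect (not the authors' code):

```python
import numpy as np
from scipy import stats

def bc_bootstrap_indirect(x, m, y, B=2000, alpha=0.05, seed=4):
    """Bias-corrected bootstrap CI for the indirect effect a*b in the
    simple mediation model X -> M -> Y."""
    rng = np.random.default_rng(seed)
    x, m, y = (np.asarray(v, float) for v in (x, m, y))
    def ab(xi, mi, yi):
        a = np.polyfit(xi, mi, 1)[0]                   # slope of M on X
        Z = np.column_stack([np.ones_like(xi), xi, mi])
        b = np.linalg.lstsq(Z, yi, rcond=None)[0][2]   # slope of Y on M given X
        return a * b
    est = ab(x, m, y)
    n = len(x)
    boots = np.empty(B)
    for i in range(B):
        idx = rng.integers(0, n, n)                    # resample cases
        boots[i] = ab(x[idx], m[idx], y[idx])
    z0 = stats.norm.ppf(np.mean(boots < est))          # bias-correction term
    zc = stats.norm.ppf(1 - alpha / 2)
    lo, hi = stats.norm.cdf(2 * z0 - zc), stats.norm.cdf(2 * z0 + zc)
    ci = np.quantile(boots, [lo, hi])
    return est, ci   # "no mediation" rejected when 0 falls outside ci
```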

13.
The data obtained from one‐way independent groups designs are typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal.
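
A simplified rendering of a parametric bootstrap test for equal means under heteroscedasticity, in the spirit of the Krishnamoorthy-Lu-Mathew procedure; their exact pivotal quantity may differ in detail, and the trimmed-mean modification is not shown.

```python
import numpy as np

def pb_anova_hetero(groups, B=5000, seed=5):
    """Parametric bootstrap test of equal group means with unequal
    variances. groups: list of 1-D arrays of observations."""
    rng = np.random.default_rng(seed)
    ns = np.array([len(g) for g in groups], float)
    ms = np.array([np.mean(g) for g in groups])
    vs = np.array([np.var(g, ddof=1) for g in groups])
    def stat(means, varis):
        w = ns / varis
        grand = np.sum(w * means) / np.sum(w)    # precision-weighted grand mean
        return np.sum(w * (means - grand) ** 2)
    t_obs = stat(ms, vs)
    count = 0
    for _ in range(B):
        mb = rng.normal(0.0, np.sqrt(vs / ns))           # group means under H0
        vb = vs * rng.chisquare(ns - 1) / (ns - 1)       # bootstrap variances
        count += stat(mb, vb) > t_obs
    return t_obs, count / B                      # parametric bootstrap p value
```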

14.
The allocation of sufficient participants into different experimental groups for various research purposes under given constraints is an important practical problem faced by researchers. We address the problem of sample size determination between two independent groups for unequal and/or unknown variances when both the power and the differential cost are taken into consideration. We apply the well‐known Welch approximate test to derive various sample size allocation ratios by minimizing the total cost or, equivalently, maximizing the statistical power. Two types of hypotheses including superiority/non‐inferiority and equivalence of two means are each considered in the process of sample size planning. A simulation study is carried out and the proposed method is validated in terms of Type I error rate and statistical power. As a result, the simulation study reveals that the proposed sample size formulas are very satisfactory under various variances and sample size allocation ratios. Finally, a flowchart, tables, and figures of several sample size allocations are presented for practical reference.
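
A large-sample sketch of the cost-aware allocation idea, assuming the standard normal-approximation power formula and the classical cost-optimal ratio r = n2/n1 = (sd2/sd1) * sqrt(c1/c2); the paper's exact Welch-based formulas may differ.

```python
import numpy as np
from scipy import stats

def welch_n_for_power(delta, sd1, sd2, c1=1.0, c2=1.0,
                      alpha=0.05, power=0.80):
    """Group sizes for a two-sided test of a mean difference `delta`
    under unequal variances, allocating to minimize total cost
    c1*n1 + c2*n2 for the requested power."""
    r = (sd2 / sd1) * np.sqrt(c1 / c2)           # cost-optimal allocation ratio
    za = stats.norm.ppf(1 - alpha / 2)
    zb = stats.norm.ppf(power)
    # Var(xbar1 - xbar2) = sd1^2/n1 + sd2^2/n2 = (sd1^2 + sd2^2/r)/n1
    n1 = (za + zb) ** 2 * (sd1**2 + sd2**2 / r) / delta**2
    return int(np.ceil(n1)), int(np.ceil(r * n1))

# e.g. welch_n_for_power(delta=0.5, sd1=1.0, sd2=2.0, c1=1.0, c2=4.0)
```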

15.
A great deal of educational and social data arises from cluster sampling designs where clusters involve schools, classrooms, or communities. A mistake that is sometimes encountered in the analysis of such data is to ignore the effect of clustering and analyse the data as if it were based on a simple random sample. This typically leads to an overstatement of the precision of results and too liberal conclusions about precision and statistical significance of mean differences. This paper gives simple corrections to the test statistics that would be computed in an analysis of variance if clustering were (incorrectly) ignored. The corrections are multiplicative factors depending on the total sample size, the cluster size, and the intraclass correlation structure. For example, the corrected F statistic has Fisher's F distribution with reduced degrees of freedom. The corrected statistic reduces to the F statistic computed by ignoring clustering when the intraclass correlations are zero. It reduces to the F statistic computed using cluster means when the intraclass correlations are unity, and it is in between otherwise. A similar adjustment to the usual statistic for testing a linear contrast among group means is described.
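
As a crude stand-in for the paper's corrections, the sketch below applies the familiar design-effect deflation 1 + (m - 1)ρ to a naive F and tests it against cluster-level degrees of freedom; the paper's exact multiplicative factors also involve the total sample size and do not reduce the df when ρ = 0.

```python
from scipy import stats

def cluster_adjusted_f(F, k, n_clusters, m, rho):
    """Design-effect adjustment of an ANOVA F computed while
    (incorrectly) ignoring clustering.
    F: naive F; k: number of groups; n_clusters: total clusters;
    m: common cluster size; rho: intraclass correlation."""
    deff = 1.0 + (m - 1) * rho                   # design effect
    F_adj = F / deff
    df1, df2 = k - 1, n_clusters - k             # cluster-level denominator df
    return F_adj, stats.f.sf(F_adj, df1, df2)
```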

16.
Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is known to be a serious problem when comparing the means of two independent groups because heavy-tailed distributions can have a serious effect on power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well to Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but Wilcox's method is based on a statistic that has a finite sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.
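
A textbook form of the one-step M-estimator of location that the proposed procedure is built on (the test's standard errors and critical values are not reproduced):

```python
import numpy as np

def one_step_m(x, K=1.28):
    """One-step M-estimator of location with Huber's psi."""
    x = np.sort(np.asarray(x, float))
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745   # MAD rescaled for normality
    i1 = np.sum(x < med - K * madn)              # low observations flagged
    i2 = np.sum(x > med + K * madn)              # high observations flagged
    core = x[i1:len(x) - i2]                     # observations kept as-is
    return (K * madn * (i2 - i1) + core.sum()) / (len(x) - i1 - i2)
```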

17.
The extent to which rank transformations result in the same statistical decisions as their non‐parametric counterparts is investigated. Simulations are presented using the Wilcoxon–Mann–Whitney test, the Wilcoxon signed‐rank test and the Kruskal–Wallis test, together with the rank transformations and t and F tests corresponding to each of those non‐parametric methods. In addition to Type I errors and power over all simulations, the study examines the consistency of the outcomes of the two methods on each individual sample. The results show how acceptance or rejection of the null hypothesis and differences in p‐values of the test statistics depend in a regular and predictable way on sample size, significance level, and differences between means, for normal and various non‐normal distributions.
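
The case-by-case comparison the abstract describes is easy to set up; the sketch below computes, for one pair of samples, the p value of the t test on pooled ranks and of the Wilcoxon-Mann-Whitney test.

```python
import numpy as np
from scipy import stats

def rank_transform_vs_wmw(x, y):
    """Compare the rank-transform t test with the Wilcoxon-Mann-
    Whitney test on the same two samples."""
    ranks = stats.rankdata(np.concatenate([x, y]))   # rank the pooled data
    rx, ry = ranks[:len(x)], ranks[len(x):]
    t_p = stats.ttest_ind(rx, ry).pvalue             # t test on the ranks
    w_p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    return t_p, w_p   # compare the two decisions at a chosen alpha
```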

18.
A problem arises in analyzing the existence of interdependence between the behavioral sequences of two individuals: tests involving a statistic such as chi-square assume independent observations within each behavioral sequence, a condition which may not exist in actual practice. Using Monte Carlo simulations of binomial data sequences, we found that the use of a chi-square test frequently results in unacceptable Type I error rates when the data sequences are autocorrelated. We compared these results to those from two other methods designed specifically for testing for intersequence independence in the presence of intrasequence autocorrelation. The first method directly tests the intersequence correlation using an approximation of the variance of the intersequence correlation estimated from the sample autocorrelations. The second method uses tables of critical values of the intersequence correlation computed by Nakamura et al. (J. Am. Stat. Assoc., 1976, 71, 214–222). Although these methods were originally designed for normally distributed data, we found that both methods produced much better results than the uncorrected chi-square test when applied to binomial autocorrelated sequences. The superior method appears to be the variance approximation method, which resulted in Type I error rates that were generally less than or equal to 5% when the level of significance was set at .05.
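
A sketch of the variance-approximation approach, using the Bartlett-type formula Var(r_xy) ≈ (1/n)[1 + 2 Σ ρ_x(k)ρ_y(k)]; the paper's exact estimator may differ, and the two sequences are assumed to be of equal length.

```python
import numpy as np
from scipy import stats

def intersequence_independence_test(x, y, max_lag=None):
    """Test H0 of zero cross-correlation between two autocorrelated
    sequences via a variance approximation built from the sample
    autocorrelations of each sequence."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    max_lag = max_lag or n // 5
    def acf(a, k):
        a = a - a.mean()
        return np.dot(a[:n - k], a[k:]) / np.dot(a, a)
    r_xy = np.corrcoef(x, y)[0, 1]
    var = (1.0 + 2.0 * sum(acf(x, k) * acf(y, k)
                           for k in range(1, max_lag + 1))) / n
    z = r_xy / np.sqrt(var)                      # approximately N(0, 1) under H0
    return r_xy, z, 2 * stats.norm.sf(abs(z))
```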

19.
Through Monte Carlo simulation, small sample methods for evaluating overall data-model fit in structural equation modeling were explored. Type I error behavior and power were examined using maximum likelihood (ML), Satorra-Bentler scaled and adjusted (SB; Satorra & Bentler, 1988, 1994), residual-based (Browne, 1984), and asymptotically distribution free (ADF; Browne, 1982, 1984) test statistics. To accommodate small sample sizes the ML and SB statistics were adjusted using a k-factor correction (Bartlett, 1950); the residual-based and ADF statistics were corrected using modified χ² and F statistics (Yuan & Bentler, 1998, 1999). Design characteristics include model type and complexity, ratio of sample size to number of estimated parameters, and distributional form. The k-factor-corrected SB scaled test statistic was especially stable at small sample sizes with both normal and nonnormal data. Methodologists are encouraged to investigate its behavior under a wider variety of models and distributional forms.
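
A sketch of the Bartlett (1950) k-factor correction as it is usually written for a k-factor model with p observed variables, assuming T_ml is (n - 1) times the minimized ML discrepancy; the study's residual-based and ADF corrections are not shown.

```python
def bartlett_corrected_stat(T_ml, n, p, k):
    """Bartlett k-factor correction of an ML test statistic; the
    corrected value is referred to the usual chi-square df."""
    factor = (n - 1 - (2 * p + 5) / 6.0 - 2 * k / 3.0) / (n - 1)
    return factor * T_ml

# e.g. a T_ml of 30 from n=60 cases, p=9 observed variables, k=3 factors:
# bartlett_corrected_stat(30.0, 60, 9, 3)
```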
