期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Omnibus hypothesis testing in dominance-based ordinal multiple regression

Long JD 《心理学方法》2005,10(3):329-351

Often quantitative data in the social sciences have only ordinal justification. Problems of interpretation can arise when least squares multiple regression (LSMR) is used with ordinal data. Two ordinal alternatives are discussed, dominance-based ordinal multiple regression (DOMR) and proportional odds multiple regression. The Q2 statistic is introduced for testing the omnibus null hypothesis in DOMR. A simulation study is discussed that examines the actual Type I error rate and power of Q2 in comparison to the LSMR omnibus F test under normality and non-normality. Results suggest that Q2 has favorable sampling properties as long as the sample size-to-predictors ratio is not too small, and Q2 can be a good alternative to the omnibus F test when the response variable is non-normal. 相似文献

2.

Type I errors and power of the parametric bootstrap goodness‐of‐fit test: Full and limited information

《The British journal of mathematical and statistical psychology》2003,56(2):271-288

In sparse tables for categorical data well‐known goodness‐of‐fit statistics are not chi‐square distributed. A consequence is that model selection becomes a problem. It has been suggested that a way out of this problem is the use of the parametric bootstrap. In this paper, the parametric bootstrap goodness‐of‐fit test is studied by means of an extensive simulation study; the Type I error rates and power of this test are studied under several conditions of sparseness. In the presence of sparseness, models were used that were likely to violate the regularity conditions. Besides bootstrapping the goodness‐of‐fit usually used (full information statistics), corrected versions of these statistics and a limited information statistic are bootstrapped. These bootstrap tests were also compared to an asymptotic test using limited information. Results indicate that bootstrapping the usual statistics fails because these tests are too liberal, and that bootstrapping or asymptotically testing the limited information statistic works better with respect to Type I error and outperforms the other statistics by far in terms of statistical power. The properties of all tests are illustrated using categorical Markov models. 相似文献

3.

Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

《The British journal of mathematical and statistical psychology》2003,56(1):93-110

We study several aspects of bootstrap inference for covariance structure models based on three test statistics, including Type I error, power and sample‐size determination. Specifically, we discuss conditions for a test statistic to achieve a more accurate level of Type I error, both in theory and in practice. Details on power analysis and sample‐size determination are given. For data sets with heavy tails, we propose applying a bootstrap methodology to a transformed sample by a downweighting procedure. One of the key conditions for safe bootstrap inference is generally satisfied by the transformed sample but may not be satisfied by the original sample with heavy tails. Several data sets illustrate that, by combining downweighting and bootstrapping, a researcher may find a nearly optimal procedure for evaluating various aspects of covariance structure models. A rule for handling non‐convergence problems in bootstrap replications is proposed. 相似文献

4.

认知诊断测验中基于信息矩阵的多群组DIF检验

孙小坚刘彦楼王诗梦辛涛宋乃庆周蔓《心理科学》2022,45(3):710-717

基于改进的Wald统计量,将适用于两群组的DIF检测方法拓展至多群组的项目功能差异（DIF）检验;改进的Wald统计量将分别通过计算观察信息矩阵（Obs）和经验交叉相乘信息矩阵（XPD）而得到。模拟研究探讨了此二者与传统计算方法在多个群组下的DIF检验情况,结果表明：（1）Obs和XPD的一类错误率明显低于传统方法,DINA模型估计下Obs和XPD的一类错误率接近理论水平;（2）样本量和DIF量较大时,Obs和XPD具有与传统Wald统计量大体相同的统计检验力。相似文献

5.

Four applications of permutation methods to testing a single-mediator model

Taylor AB MacKinnon DP 《Behavior research methods》2012,44(3):806-844

Four applications of permutation tests to the single-mediator model are described and evaluated in this study. Permutation tests work by rearranging data in many possible ways in order to estimate the sampling distribution for the test statistic. The four applications to mediation evaluated here are the permutation test of ab, the permutation joint significance test, and the noniterative and iterative permutation confidence intervals for ab. A Monte Carlo simulation study was used to compare these four tests with the four best available tests for mediation found in previous research: the joint significance test, the distribution of the product test, and the percentile and bias-corrected bootstrap tests. We compared the different methods on Type I error, power, and confidence interval coverage. The noniterative permutation confidence interval for ab was the best performer among the new methods. It successfully controlled Type I error, had power nearly as good as the most powerful existing methods, and had better coverage than any existing method. The iterative permutation confidence interval for ab had lower power than do some existing methods, but it performed better than any other method in terms of coverage. The permutation confidence interval methods are recommended when estimating a confidence interval is a primary concern. SPSS and SAS macros that estimate these confidence intervals are provided. 相似文献

6.

改进的认知诊断模型项目功能差异检验方法——基于观察信息矩阵的Wald统计量

刘彦楼辛涛李令青田伟刘笑笑《心理学报》2016,(5):588-598

Hou,de la Torre和Nandakumar(2014)提出可以使用Wald统计量检验DIF,但其结果的一类错误率存在过度膨胀的问题。本研究中提出了一个使用观察信息矩阵进行计算的改进后的Wald统计量。结果表明:(1)使用观察信息矩阵计算的这一改进后的Wald统计量在DIF检验中具有良好的一类错误控制率,尤其是在项目具有较高区分能力的时候,解决了以往研究中一类错误率过度膨胀的问题。(2)随着样本量的增加以及DIF量的增大,使用观察信息矩阵计算Wald统计量的统计检验力也在增加。相似文献

7.

Testing for adverse impact when sample size is small

Collins MW Morris SB 《The Journal of applied psychology》2008,93(2):463-471

Adverse impact evaluations often call for evidence that the disparity between groups in selection rates is statistically significant, and practitioners must choose which test statistic to apply in this situation. To identify the most effective testing procedure, the authors compared several alternate test statistics in terms of Type I error rates and power, focusing on situations with small samples. Significance testing was found to be of limited value because of low power for all tests. Among the alternate test statistics, the widely-used Z-test on the difference between two proportions performed reasonably well, except when sample size was extremely small. A test suggested by G. J. G. Upton (1982) provided slightly better control of Type I error under some conditions but generally produced results similar to the Z-test. Use of the Fisher Exact Test and Yates's continuity-corrected chi-square test are not recommended because of overly conservative Type I error rates and substantially lower power than the Z-test. 相似文献

8.

A Person Fit Test For Irt Models For Polytomous Items

C. A. W. Glas Anna Villa T. Dagohoy 《Psychometrika》2007,72(2):159-180

A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier statistic can take both the effects of estimation of the item parameters and the estimation of the person parameters into account. The Lagrange multiplier statistic has an asymptotic χ²-distribution. The Type I error rate and power are investigated using simulation studies. Results show that test statistics that ignore the effects of estimation of the persons’ ability parameters have decreased Type I error rates and power. Incorporating a correction to account for the effects of the estimation of the persons’ ability parameters results in acceptable Type I error rates and power characteristics; incorporating a correction for the estimation of the item parameters has very little additional effect. It is investigated to what extent the three models give comparable results, both in the simulation studies and in an example using data from the NEO Personality Inventory-Revised. 相似文献

9.

Testing for robustness in Monte Carlo studies

Serlin RC 《心理学方法》2000,5(2):230-240

Monte Carlo studies provide the information needed to help researchers select appropriate analytical procedures under design conditions in which the underlying assumptions of the procedures are not met. In Monte Carlo studies, the 2 errors that one could commit involve (a) concluding that a statistical procedure is robust when it is not or (b) concluding that it is not robust when it is. In previous attempts to apply standard statistical design principles to Monte Carlo studies, the less severe of these errors has been wrongly designated the Type I error. In this article, a method is presented for controlling the appropriate Type I error rate; the determination of the number of iterations required in a Monte Carlo study to achieve desired power is described; and a confidence interval for a test's true Type I error rate is derived. A robustness criterion is also proposed that is a compromise between W. G. Cochran's (1952) and J. V. Bradley's (1978) criteria. 相似文献

10.

Testing for independence between pairs of autocorrelated binomial data sequences

David G. Schlundt Clyde P. Donahoe Jr. 《Journal of psychopathology and behavioral assessment》1983,5(4):309-316

A problem arises in analyzing the existence of interdependence between the behavioral sequences of two individuals: tests involving a statistic such as chi-square assume independent observations within each behavioral sequence, a condition which may not exist in actual practice. Using Monte Carlo simulations of binomial data sequences, we found that the use of a chi-square test frequently results in unacceptable Type I error rates when the data sequences are autocorrelated. We compared these results to those from two other methods designed specifically for testing for intersequence independence in the presence of intrasequence autocorrelation. The first method directly tests the intersequence correlation using an approximation of the variance of the intersequence correlation estimated from the sample autocorrelations. The second method uses tables of critical values of the intersequence correlation computed by Nakamuraet al. (J. Am. Stat. Assoc., 1976,71, 214–222). Although these methods were originally designed for normally distributed data, we found that both methods produced much better results than the uncorrected chi-square test when applied to binomial autocorrelated sequences. The superior method appears to be the variance approximation method, which resulted in Type I error rates that were generally less than or equal to 5% when the level of significance was set at .05. 相似文献

11.

A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes

Keselman HJ Algina J Lix LM Wilcox RR Deering KN 《心理学方法》2008,13(2):110-129

Standard least squares analysis of variance methods suffer from poor power under arbitrarily small departures from normality and fail to control the probability of a Type I error when standard assumptions are violated. This article describes a framework for robust estimation and testing that uses trimmed means with an approximate degrees of freedom heteroscedastic statistic for independent and correlated groups designs in order to achieve robustness to the biasing effects of nonnormality and variance heterogeneity. The authors describe a nonparametric bootstrap methodology that can provide improved Type I error control. In addition, the authors indicate how researchers can set robust confidence intervals around a robust effect size parameter estimate. In an online supplement, the authors use several examples to illustrate the application of an SAS program to implement these statistical methods. 相似文献

12.

Evaluating Small Sample Approaches for Model Test Statistics in Structural Equation Modeling

《Multivariate behavioral research》2013,48(3):439-478

Through Monte Carlo simulation, small sample methods for evaluating overall data-model fit in structural equation modeling were explored. Type I error behavior and power were examined using maximum likelihood (ML), Satorra-Bentler scaled and adjusted (SB; Satorra & Bentler, 1988, 1994), residual-based (Browne, 1984), and asymptotically distribution free (ADF; Browne, 1982, 1984) test statistics. To accommodate small sample sizes the ML and SB statistics were adjusted using a k-factor correction (Bartlett, 1950); the residual-based and ADF statistics were corrected using modified x² and F statistics (Yuan & Bentler, 1998, 1999). Design characteristics include model type and complexity, ratio of sample size to number of estimated parameters, and distributional form. The k-factor-corrected SB scaled test statistic was especially stable at small sample sizes with both normal and nonnormal data. Methodologists are encouraged to investigate its behavior under a wider variety of models and distributional forms. 相似文献

13.

A Monte Carlo evaluation of tests for comparing dependent correlations

Hittner JB May K Silver NC 《The Journal of general psychology》2003,130(2):149-168

The authors conducted a Monte Carlo simulation of 8 statistical tests for comparing dependent zero-order correlations. In particular, they evaluated the Type I error rates and power of a number of test statistics for sample sizes (Ns) of 20, 50, 100, and 300 under 3 different population distributions (normal, uniform, and exponential). For the Type I error rate analyses, the authors evaluated 3 different magnitudes of the predictor-criterion correlations (rho(y,x1) = rho(y,x2) = .1, .4, and .7). For the power analyses, they examined 3 different effect sizes or magnitudes of discrepancy between rho(y,x1) and rho(y,x2) (values of .1, .3, and .6). They conducted all of the simulations at 3 different levels of predictor intercorrelation (rho(x1,x2) = .1, .3, and .6). The results indicated that both Type I error rate and power depend not only on sample size and population distribution, but also on (a) the predictor intercorrelation and (b) the effect size (for power) or the magnitude of the predictor-criterion correlations (for Type I error rate). When the authors considered Type I error rate and power simultaneously, the findings suggested that O. J. Dunn and V. A. Clark's (1969) z and E. J. Williams's (1959) t have the best overall statistical properties. The findings extend and refine previous simulation research and as such, should have greater utility for applied researchers. 相似文献

14.

A comment on sampling error in the standardized mean difference with unequal sample sizes: avoiding potential errors in meta-analytic and primary research

Laczo RM Sackett PR Bobko P Cortina JM 《The Journal of applied psychology》2005,90(4):758-764

The authors discuss potential confusion in conducting primary studies and meta-analyses on the basis of differences between groups. First, the authors show that a formula for the sampling error of the standardized mean difference (d) that is based on equal group sample sizes can produce substantially biased results if applied with markedly unequal group sizes. Second, the authors show that the same concerns are present when primary analyses or meta-analyses are conducted with point-biserial correlations, as the point-biserial correlation (r) is a transformation of d. Third, the authors examine the practice of correcting a point-biserial r for unequal sample sizes and note that such correction would also increase the sampling error of the corrected r. Correcting rs for unequal sample sizes, but using the standard formula for sampling error in uncorrected r, can result in bias. The authors offer a set of recommendations for conducting meta-analyses of group differences. 相似文献

15.

A comparison of methods to test mediation and other intervening variable effects 总被引：11，自引：0，他引：11

MacKinnon DP Lockwood CM Hoffman JM West SG Sheets V 《心理学方法》2002,7(1):83-104

A Monte Carlo study compared 14 methods to test the statistical significance of the intervening variable effect. An intervening variable (mediator) transmits the effect of an independent variable to a dependent variable. The commonly used R. M. Baron and D. A. Kenny (1986) approach has low statistical power. Two methods based on the distribution of the product and 2 difference-in-coefficients methods have the most accurate Type I error rates and greatest statistical power except in 1 important case in which Type I error rates are too high. The best balance of Type I error and statistical power across all cases is the test of the joint significance of the two effects comprising the intervening variable effect. 相似文献

16.

VALCOR: A program for estimating standard error, confidence intervals, and probability of corrected validity

Jesús F. Salgado 《Behavior research methods》1997,29(3):464-467

VALCOR is a Turbo-Basic program that corrects the observed (uncorrected) validity coefficients for criterion and predictor unreliability and range restriction in the predictor. Furthermore, using the formulas for the standard error of functions of correlations derived by Bobko and Rieck (1980), the program provides an estimation of the standard error, the confidence intervals, and the probability of the corrected validity coefficients. In this way, the probability and the boundaries of the corrected validity coefficients may be reported together with the probability of the uncorrected validity coefficients. The results are presented on the computer screen and may be saved in an external file. 相似文献

17.

Conceptualizing Interrater Agreement as Testing the Existence of Extra Variation in the Multinomial Model

《International Journal of Testing》2013,13(2):151-161

相似文献

18.

Regression-based techniques for statistical decision making in single-case designs

Manolov R Arnau J Solanas A Bono R 《Psicothema》2010,22(4):1026-1032

The present study evaluates the performance of four methods for estimating regression coefficients used to make statistical decisions about intervention effectiveness in single-case designs. Ordinary least square estimation is compared to two correction techniques dealing with general trend and a procedure that eliminates autocorrelation whenever it is present. Type I error rates and statistical power are studied for experimental conditions defined by the presence or absence of treatment effect (change in level or in slope), general trend, and serial dependence. The results show that empirical Type I error rates do not approach the nominal ones in the presence of autocorrelation or general trend when ordinary and generalized least squares are applied. The techniques controlling trend show lower false alarm rates, but prove to be insufficiently sensitive to existing treatment effects. Consequently, the use of the statistical significance of the regression coefficients for detecting treatment effects is not recommended for short data series. 相似文献

19.

Effect of non-normality on test statistics for one-way independent groups designs

Cribbie RA Fiksenbaum L Keselman HJ Wilcox RR 《The British journal of mathematical and statistical psychology》2012,65(1):56-73

The data obtained from one‐way independent groups designs is typically non‐normal in form and rarely equally variable across treatment populations (i.e. population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e. the analysis of variance F test) typically provides invalid results (e.g. too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non‐normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e. trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non‐normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non‐normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non‐normal. 相似文献

20.

Heterogeneous heterogeneity by default: Testing categorical moderators in mixed-effects meta-analysis

Josue E. Rodriguez Donald R. Williams Paul-Christian Bürkner 《The British journal of mathematical and statistical psychology》2023,76(2):402-433

Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of categorical moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be statistical ramifications to upholding this assumption. We propose that researchers should instead default to assuming unequal between-study variances when analysing categorical moderators. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variance. In two extensive simulation studies, we show that in terms of Type I error and statistical power, little is lost by using the MELSM for moderator tests, but there can be serious costs when an equal variance mixed-effects model (MEM) is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the MEM and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the MEM can be grossly inflated or overly conservative, whereas the MELSM does comparatively well in controlling the Type I error across the majority of cases. A notable exception where the MELSM did not clearly outperform the MEM was in the case of few studies (e.g., 5). With respect to power, the MELSM had similar or higher power than the MEM in conditions where the latter produced non-inflated Type 1 error rates. Together, our results support the idea that assuming unequal between-study variances is preferred as a default strategy when testing categorical moderators. 相似文献