Similar articles
 20 similar articles found
1.
A well-known concern regarding the usual linear regression model is multicollinearity. As the strength of the association among the independent variables increases, the squared standard error of regression estimators tends to increase, which can seriously impact power. This paper examines heteroscedastic methods for dealing with this issue when testing the hypothesis that all of the slope parameters are equal to zero via a robust ridge estimator that guards against outliers among the dependent variable. Included are results related to leverage points, meaning outliers among the independent variables. In various situations, the proposed method increases power substantially.

2.
This study examined the degree to which outliers were present in a convenience sample of published single-case research. Using a procedure for analyzing single-case data (Allison & Gorman, Behaviour Research and Therapy, 31, 621–631, 1993), this study compared the effect of outliers using ordinary least squares (OLS) regression to a robust regression method and attempted to answer four questions: (1) To what degree does outlier detection vary from OLS to robust regression? (2) How much do effect sizes differ from OLS to robust regression? (3) Are the differences produced by robust regression in more or less agreement with visual judgments of treatment effectiveness? (4) What is a typical range of effect sizes for robust regression versus OLS regression for data from “effective interventions”? Results suggest that outliers are common in single-case data. The effects of outliers in single-case data are explored, and the implications for researchers and practitioners using single-case designs are discussed.

3.
As Bayesian methods become more popular among behavioral scientists, they will inevitably be applied in situations that violate the assumptions underpinning typical models used to guide statistical inference. With this in mind, it is important to know something about how robust Bayesian methods are to the violation of those assumptions. In this paper, we focus on the problem of contaminated data (such as data with outliers or conflicts present), with specific application to the problem of estimating a credible interval for the population mean. We evaluate five Bayesian methods for constructing a credible interval, using toy examples to illustrate the qualitative behavior of different approaches in the presence of contaminants, and an extensive simulation study to quantify the robustness of each method. We find that the “default” normal model used in most Bayesian data analyses is not robust, and that approaches based on the Bayesian bootstrap are only robust in limited circumstances. A simple parametric model based on Tukey’s “contaminated normal model” and a model based on the t-distribution were markedly more robust. However, the contaminated normal model had the added benefit of estimating which data points were discounted as outliers and which were not.

4.
Many robust regression estimators have been proposed that have a high, finite-sample breakdown point, roughly meaning that a large proportion of points must be altered to drive the value of an estimator to infinity. But despite this, many of them can be inordinately influenced by two properly placed outliers. With one predictor, an estimator that appears to correct this problem to a fair degree, and simultaneously maintain good efficiency when standard assumptions are met, consists of checking for outliers using a projection-type method, removing any that are found, and applying the Theil-Sen estimator to the data that remain. When dealing with multiple predictors, there are two generalizations of the Theil-Sen estimator that might be used, but nothing is known about how their small-sample properties compare. Also, there are no results on testing the hypothesis of zero slopes, and there is no information about the effect on efficiency when outliers are removed. In terms of hypothesis testing, using the more obvious percentile bootstrap method in conjunction with a slight modification of Mahalanobis distance was found to avoid Type I error probabilities above the nominal level, but in some situations the actual Type I error probabilities can be substantially smaller than intended when the sample size is small. An alternative method is found to be more satisfactory.
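
For readers unfamiliar with the Theil-Sen estimator mentioned in this abstract, the single-predictor case can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it omits the projection-type outlier removal step, and the function name `theil_sen` and the intercept convention (median of the residuals y - slope * x) are assumptions made here.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) for one predictor.

    The slope is the median of all pairwise slopes; pairs with equal
    x values are skipped because their slope is undefined.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    # Assumed convention: intercept is the median of y_i - slope * x_i.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# One gross outlier in y leaves the fitted line unchanged.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 100]
print(theil_sen(x, y))
```

In this toy data set the first four points lie exactly on y = 1 + 2x, so the majority of pairwise slopes equal 2 and the single extreme response does not move the median.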

5.
Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to identify outliers in MSA. This adaptation involves choices with respect to the algorithm's objective function, selection of items from samples without outliers, and scalability criteria to be used in the forward search algorithm. The application of the adapted forward search algorithm for MSA is demonstrated using real data. Recommendations are given for its use in practical scale analysis.

6.
By means of more than a dozen user-friendly packages, structural equation models (SEMs) are widely used in behavioral, education, social, and psychological research. As the underlying theory and methods in these packages are vulnerable to outliers and distributions with longer-than-normal tails, a fundamental problem in the field is the development of robust methods to reduce the influence of outliers and the distributional deviation in the analysis. In this paper we develop a maximum likelihood (ML) approach that is robust to outliers and symmetrically heavy-tailed distributions for analyzing nonlinear SEMs with ignorable missing data. The analytic strategy is to incorporate a general class of distributions into the latent variables and the error measurements in the measurement and structural equations. A Monte Carlo EM (MCEM) algorithm is constructed to obtain the ML estimates, and a path sampling procedure is implemented to compute the observed-data log-likelihood and then the Bayesian information criterion for model comparison. The proposed methodologies are illustrated with simulation studies and an example. The research described herein was fully supported by a grant (CUHK 4243/03H) from the Research Grants Council of the Hong Kong Special Administrative Region. The authors are thankful to the Editor, the Associate Editor, and anonymous reviewers for valuable comments which improve the paper significantly, and are grateful to ICPSR and the relevant funding agency for allowing the use of their data. Requests for reprints should be sent to S. Y. Lee, Department of Statistics, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong.

7.
A simulation study compared the performance of robust normal theory maximum likelihood (ML) and robust categorical least squares (cat-LS) methodology for estimating confirmatory factor analysis models with ordinal variables. Data were generated from 2 models with 2-7 categories, 4 sample sizes, 2 latent distributions, and 5 patterns of category thresholds. Results revealed that factor loadings and robust standard errors were generally most accurately estimated using cat-LS, especially with fewer than 5 categories; however, factor correlations and model fit were assessed equally well with ML. Cat-LS was found to be more sensitive to sample size and to violations of the assumption of normality of the underlying continuous variables. Normal theory ML was found to be more sensitive to asymmetric category thresholds and was especially biased when estimating large factor loadings. Accordingly, we recommend cat-LS for data sets containing variables with fewer than 5 categories and ML when there are 5 or more categories, sample size is small, and category thresholds are approximately symmetric. With 6-7 categories, results were similar across methods for many conditions; in these cases, either method is acceptable. (PsycINFO Database Record (c) 2012 APA, all rights reserved).

8.
This article compares several methods for performing robust principal component analysis, two of which have not been considered in previous articles. The criterion here, unlike that of extant articles aimed at comparing methods, is how well a method maximizes a robust version of the generalized variance of the projected data. This is in contrast to maximizing some measure of scatter associated with the marginal distributions of the projected scores, which does not take into account the overall structure of the projected data. Included are comparisons in which distributions are not elliptically symmetric. One of the new methods simply removes outliers using a projection-type multivariate outlier detection method that has been found to perform well relative to other outlier detection methods that have been proposed. The other new method belongs to the class of projection pursuit techniques and differs from other projection pursuit methods in terms of the function it tries to maximize. The comparisons include the method derived by Maronna (2005), the spherical method derived by Locantore et al. (1999), as well as a method proposed by Hubert, Rousseeuw, and Vanden Branden (2005). From the perspective used, the method by Hubert et al. (2005), the spherical method, and one of the new methods dominate the method derived by Maronna.

9.
During the last half century, hundreds of papers published in statistical journals have documented general conditions where reliance on least squares regression and Pearson's correlation can result in missing even strong associations between variables. Moreover, highly misleading conclusions can be made, even when the sample size is large. There are, in fact, several fundamental concerns related to non-normality, outliers, heteroscedasticity, and curvature that can result in missing a strong association. Simultaneously, a vast array of new methods has been derived for effectively dealing with these concerns. The paper (i) reviews why least squares regression and classic inferential methods can fail, (ii) provides an overview of the many modern strategies for dealing with known problems, including some recent advances, and (iii) illustrates that modern robust methods can make a practical difference in our understanding of data. Included are some general recommendations regarding how modern methods might be used. Copyright © 2011 John Wiley & Sons, Ltd.

10.
Cocaine is a type of drug that functions to increase the availability of the neurotransmitter dopamine in the brain. However, cocaine dependence or abuse is highly related to an increased risk of psychiatric disorders and deficits in cognitive performance, attention, and decision-making abilities. Given the chronic and persistent features of drug addiction, the progression of abstaining from cocaine often evolves across several states, such as addiction to, moderate dependence on, and swearing off cocaine. Hidden Markov models (HMMs) are well suited to the characterization of longitudinal data in terms of a set of unobservable states, and have increasingly been used to uncover the dynamic heterogeneity in progressive diseases or activities. However, the existence of outliers or influential points may misidentify the hidden states and distort the associated inference. In this study, we develop a Bayesian local influence procedure for HMMs with latent variables in the presence of missing data. The proposed model enables us to investigate the dynamic heterogeneity of multivariate longitudinal data, reveal how the interrelationships among latent variables change from one state to another, and simultaneously conduct statistical diagnosis for the given data, model assumptions, and prior inputs. We apply the proposed procedure to analyze a dataset collected by the UCLA Center for Advancing Longitudinal Drug Abuse Research. Several outliers or influential points that seriously influence estimation results are identified and removed. The proposed procedure also discovers the effects of treatment and individuals’ psychological problems on cocaine use behavior and delineates their dynamic changes across the cocaine-addiction states.

11.
Traditional structural equation modeling (SEM) techniques have trouble dealing with incomplete and/or nonnormal data that are often encountered in practice. Yuan and Zhang (2011a) developed a two-stage procedure for SEM to handle nonnormal missing data and proposed four test statistics for overall model evaluation. Although these statistics have been shown to work well with complete data, their performance for incomplete data has not been investigated in the context of robust statistics.

Focusing on a linear growth curve model, a systematic simulation study is conducted to evaluate the accuracy of the parameter estimates and the performance of five test statistics including the naive statistic derived from normal distribution based maximum likelihood (ML), the Satorra-Bentler scaled chi-square statistic (RML), the mean- and variance-adjusted chi-square statistic (AML), Yuan-Bentler residual-based test statistic (CRADF), and Yuan-Bentler residual-based F statistic (RF). Data are generated and analyzed in R using the package rsem (Yuan & Zhang, 2011b).

Based on the simulation study, we can observe the following: (a) The traditional normal distribution-based method cannot yield accurate parameter estimates for nonnormal data, whereas the robust method obtains much more accurate model parameter estimates for nonnormal data and performs almost as well as the normal distribution-based method for normally distributed data. (b) With the increase of sample size, or the decrease of missing rate or the number of outliers, the parameter estimates are less biased and the empirical distributions of test statistics are closer to their nominal distributions. (c) The ML test statistic does not work well for nonnormal or missing data. (d) For nonnormal complete data, CRADF and RF work relatively better than RML and AML. (e) For missing completely at random (MCAR) missing data, in almost all the cases, RML and AML work better than CRADF and RF. (f) For nonnormal missing at random (MAR) missing data, CRADF and RF work better than AML. (g) The performance of the robust method does not seem to be influenced by the symmetry of outliers.

12.
Robust multidimensional scaling   Total citations: 3, self-citations: 0, citations by others: 3
A method for multidimensional scaling that is highly resistant to the effects of outliers is described. To illustrate the efficacy of the procedure, some Monte Carlo simulation results are presented. The method is shown to perform well when outliers are present, even in relatively large numbers, and also to perform comparably to other approaches when no outliers are present. This research was supported by Grant A8351 from the Natural Sciences and Engineering Research Council of Canada to Ian Spence.

13.
Statistical methodology for handling omitted variables is presented in a multilevel modeling framework. In many nonexperimental studies, the analyst may not have access to all requisite variables, and this omission may lead to biased estimates of model parameters. By exploiting the hierarchical nature of multilevel data, a battery of statistical tools is developed to test various forms of model misspecification as well as to obtain estimators that are robust to the presence of omitted variables. The methodology allows for tests of omitted effects at single and multiple levels. The paper also introduces intermediate-level tests; these are tests for omitted effects at a single level, regardless of the presence of omitted effects at a higher level. A simulation study shows, not surprisingly, that the omission of variables yields bias in both regression coefficients and variance components; it also suggests that omitted effects at lower levels may cause more severe bias than at higher levels. Important factors resulting in bias were found to be the level of an omitted variable, its effect size, and sample size. A real data study illustrates that an omitted variable at one level may yield biased estimators at any level and, in this study, one cannot obtain reliable estimates for school-level variables when omitted child effects exist. However, robust estimators may provide unbiased estimates for effects of interest even when the efficient estimators fail, and the one-degree-of-freedom test helps one to understand where the problem is located. It is argued that multilevel data typically contain rich information to deal with omitted variables, offering yet another appealing reason for the use of multilevel models in the social sciences. This research was supported by the National Academy of Education/Spencer Foundation and the National Science Foundation, Grant Number SES-0436274.

14.
Mediation analysis investigates how certain variables mediate the effect of predictors on outcome variables. Existing studies of mediation models have been limited to normal theory maximum likelihood (ML) or least squares with normally distributed data. Because real data in the social and behavioral sciences are seldom normally distributed and often contain outliers, classical methods can result in biased and inefficient estimates, which lead to inaccurate or unreliable tests of the mediated effect. The authors propose two approaches for better mediation analysis. One is to identify cases that strongly affect test results of mediation using local influence methods and robust methods. The other is to use robust methods for parameter estimation, and then test the mediated effect based on the robust estimates. Analytic details of both local influence and robust methods specific to mediation models were provided and one real data example was given. We first used local influence and robust methods to identify influential cases. Then, for the original data and the data with the identified influential cases removed, the mediated effect was tested using two estimation methods: normal theory ML and the robust method, crossing two tests of mediation: the Sobel (1982) test using information-based standard error (z_I) and sandwich-type standard error (z_SW). Results show that local influence and robust methods rank the influence of cases similarly, while the robust method is more objective. The widely used z_I statistic is inflated when the distribution is heavy-tailed. Compared to normal theory ML, the robust method provides estimates with smaller standard errors and more reliable tests.
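
As a reminder of what the Sobel test in this abstract computes, here is a minimal Python sketch of the standard first-order Sobel z statistic for an indirect effect a*b. The function name `sobel_z` and its argument names are illustrative assumptions; the abstract's z_I and z_SW variants differ only in which standard errors (information-based or sandwich-type) are supplied, and how those are estimated is not shown here.

```python
import math


def sobel_z(a, b, se_a, se_b):
    """First-order Sobel z statistic for the indirect effect a * b.

    a, se_a: estimate and standard error of the predictor -> mediator path.
    b, se_b: estimate and standard error of the mediator -> outcome path.
    """
    # Delta-method standard error of the product a * b.
    se_ab = math.sqrt(b * b * se_a * se_a + a * a * se_b * se_b)
    if se_ab == 0:
        raise ValueError("standard error of the indirect effect is zero")
    return (a * b) / se_ab


# Hypothetical path estimates, for illustration only.
z = sobel_z(a=0.5, b=0.4, se_a=0.1, se_b=0.1)
print(round(z, 3))
```

Because the statistic divides by the standard error, an understated standard error under heavy tails inflates z, which is consistent with the abstract's remark about z_I.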

15.
Classical methods for detecting outliers deal with continuous variables. These methods are not readily applicable to categorical data, such as incorrect/correct scores (0/1) and ordered rating scale scores (e.g., 0, …, 4) typical of multi-item tests and questionnaires. This study proposes two definitions of outlier scores suited for categorical data. One definition combines information on outliers from scores on all the items in the test, and the other definition combines information from all pairs of item scores. For a particular item-score vector, an outlier score expresses the degree to which the item-score vector is unusual. For ten real-data sets, the distribution of each of the two outlier scores is inspected by means of Tukey's fences and the extreme studentized deviate procedure. It is investigated whether the outliers that are identified are influential with respect to the statistical analysis performed on these data. Recommendations are given for outlier identification and accommodation in test and questionnaire data.
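
Tukey's fences, one of the two inspection rules named in this abstract, can be sketched as follows. This is a generic illustration applied to any list of numeric scores, not the study's outlier-score definitions; the two-sided rule, the multiplier k = 1.5, the quartile method, and the function names are assumptions made here.

```python
from statistics import quantiles


def tukey_fences(scores, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions give slightly different fences.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(scores, k=1.5):
    """Return the scores that fall outside the fences."""
    lower, upper = tukey_fences(scores, k)
    return [s for s in scores if s < lower or s > upper]


# Hypothetical outlier scores for ten respondents.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(scores))
print(flag_outliers(scores))
```

If only unusually high outlier scores matter, a one-sided variant would check the upper fence alone.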

16.
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is proposed to identify DIF items as outliers in the multivariate space. For low dimensionalities, up to 2–3 groups, a simple graphical tool is derived. We illustrate our approach with a reanalysis of data from Kim, Cohen, and Park (1995) on using calculators for a mathematics test.

17.
Ayala Cohen. Psychometrika, 1986, 51(3): 379-391
A test is proposed for the equality of the variances of k ≥ 2 correlated variables. Pitman's test for k = 2 reduces the null hypothesis to zero correlation between their sum and their difference. Its extension, eliminating nuisance parameters by a bootstrap procedure, is valid for any correlation structure between the k normally distributed variables. A Monte Carlo study for several combinations of sample sizes and number of variables is presented, comparing the level and power of the new method with previously published tests. Some nonnormal data are included, for which the empirical level tends to be slightly higher than the nominal one. The results show that our method is close in power to the asymptotic tests which are extremely sensitive to nonnormality, yet it is robust and much more powerful than other robust tests. This research was supported by the fund for the promotion of research at the Technion.
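
The k = 2 case described in this abstract (Pitman's test) can be sketched in Python as below. It covers only the classical two-variable test, not the bootstrap extension to k variables proposed in the paper; the function name `pitman_test` is an assumption. Under bivariate normality, the returned t statistic is referred to a t distribution with n - 2 degrees of freedom.

```python
import math


def pitman_test(x, y):
    """Pitman's test of equal variances for two correlated variables.

    Returns (r, t), where r is the Pearson correlation between the
    sums x + y and the differences x - y, and t = r * sqrt((n-2)/(1-r^2)).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with at least 3 observations")
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    ms, md = sum(s) / n, sum(d) / n
    ssd = sum((si - ms) * (di - md) for si, di in zip(s, d))
    sss = sum((si - ms) ** 2 for si in s)
    sdd = sum((di - md) ** 2 for di in d)
    if sss == 0 or sdd == 0 or ssd * ssd == sss * sdd:
        raise ValueError("correlation undefined or exactly +/-1")
    r = ssd / math.sqrt(sss * sdd)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


# Equal sample variances give r = 0; a larger variance in x gives r > 0.
print(pitman_test([1, 2, 3, 4], [4, 2, 1, 3]))
print(pitman_test([0, 10, 20, 30], [1, 2, 1, 2]))
```

The test works because the covariance of the sum and the difference equals var(x) - var(y), so zero correlation between them is equivalent to equal variances.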

18.
A method for selecting between K-dimensional linear factor models and (K + 1)-class latent profile models is proposed. In particular, it is shown that the conditional covariances of observed variables are constant under factor models but nonlinear functions of the conditioning variable under latent profile models. The performance of a convenient inferential method suggested by the main result is examined via data simulation and is shown to have acceptable error rate control when deciding between the 2 types of models. The proposed test is illustrated using examples from vocational assessment and developmental psychology.

19.
Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is known to be a serious problem when comparing the means of two independent groups because heavy-tailed distributions can have a serious effect on power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well to Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but Wilcox's method is based on a statistic that has a finite sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.

20.
Researchers are strongly encouraged to accompany the results of statistical tests with appropriate estimates of effect size. For 2-group comparisons, a probability-based effect size estimator (A) has many appealing properties (e.g., it is easy to understand, robust to violations of parametric assumptions, insensitive to outliers). We review generalizations of the A statistic to extend its use to applications with discrete data, with weighted data, with k > 2 groups, and with correlated samples. These generalizations are illustrated through reanalyses of data from published studies on sex differences in the acceptance of hypothetical offers of casual sex and in scores on a measure of economic enlightenment, on age differences in reported levels of Authentic Pride, and in differences between the numbers of promises made and kept in romantic relationships. Drawing from research on the construction of confidence intervals for the A statistic, we recommend a bootstrap method that can be used for each generalization of A. We provide a suite of programs that should make it easy to use the A statistic and accompany it with a confidence interval in a wide variety of research contexts.
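
For the basic 2-group case, the probability-based effect size A is usually defined as P(X > Y) + 0.5 P(X = Y). The sketch below assumes that definition and uses an illustrative function name, `a_statistic`; it does not reproduce the authors' programs, the generalizations to weighted data, k > 2 groups, or correlated samples, or their recommended bootstrap interval.

```python
def a_statistic(x, y):
    """Estimate A = P(X > Y) + 0.5 * P(X = Y) from two independent samples.

    Every x value is compared with every y value; ties count one half.
    A = 0.5 indicates no tendency for one group to score higher.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    wins = sum(1 for xi in x for yi in y if xi > yi)
    ties = sum(1 for xi in x for yi in y if xi == yi)
    return (wins + 0.5 * ties) / (len(x) * len(y))


# Hypothetical scores for two groups.
group_1 = [3, 4, 5]
group_2 = [1, 2, 3]
print(a_statistic(group_1, group_2))
```

Because A depends only on the ordering of observations, replacing the largest score with an extreme value leaves the estimate unchanged, which illustrates the insensitivity to outliers mentioned in the abstract.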


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号