11.

Examining the effect of initialization strategies on the performance of Gaussian mixture modeling

Emilie Shireman Douglas Steinley Michael J. Brusco 《Behavior research methods》2017,49(1):282-293

Mixture modeling is a popular technique for identifying unobserved subpopulations (i.e., components) within a data set, with Gaussian (normal) mixture modeling being the form most widely used. Generally, the parameters of these Gaussian mixtures cannot be estimated in closed form, so estimates are typically obtained via an iterative process. The most common estimation procedure is maximum likelihood via the expectation-maximization (EM) algorithm. Like many approaches for identifying subpopulations, finite mixture modeling can suffer from locally optimal solutions, and the final parameter estimates depend on the initial starting values of the EM algorithm. Initial values have been shown to significantly impact the quality of the solution, and researchers have proposed several approaches for selecting the set of starting values. Five techniques for obtaining starting values that are implemented in popular software packages are compared. Their performances are assessed in terms of the following four measures: (1) the ability to find the best observed solution, (2) settling on a solution that classifies observations correctly, (3) the number of local solutions found by each technique, and (4) the speed at which the start values are obtained. On the basis of these results, a set of recommendations is provided to the user.
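To make the dependence on starting values concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture that is run from several random starts, keeping the run with the highest log-likelihood. The start strategy shown (k data points drawn at random as the initial means, common initial variance and equal weights) is an illustrative assumption and is not necessarily one of the five techniques the paper compares; all function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, init_means, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM, starting from the given component means.

    Starting variances are set to the overall sample variance and starting
    weights to 1/k. Returns (log_likelihood, weights, means, variances), where
    the log-likelihood is the one computed in the last E-step.
    """
    n, k = len(data), len(init_means)
    means = list(init_means)
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities resp[i][j] = P(component j | x_i).
        resp, ll = [], 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor guards against a collapsing component
        if ll - prev_ll < tol:  # stop once the likelihood gain falls below tol
            break
        prev_ll = ll
    return ll, weights, means, variances


def best_of_random_starts(data, k, n_starts, seed=0):
    """Run EM from n_starts random starts; keep the fit with the highest log-likelihood."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        init_means = rng.sample(data, k)  # k data points drawn without replacement
        fit = em_gmm_1d(data, init_means)
        if best is None or fit[0] > best[0]:
            best = fit
    return best


# Simulated data: two normal components centred at 0 and 5.
gen = random.Random(42)
data = [gen.gauss(0, 1) for _ in range(100)] + [gen.gauss(5, 1) for _ in range(100)]
ll, weights, means, variances = best_of_random_starts(data, k=2, n_starts=5)
print(round(ll, 2), [round(m, 2) for m in sorted(means)])
```

Comparing the log-likelihoods of the individual runs, rather than only the best one, is a simple way to see how many distinct local solutions a given start strategy reaches.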

12.

Alcohol use trajectories and the ubiquitous cat's cradle: cause for concern?

Sher KJ Jackson KM Steinley D 《Journal of abnormal psychology》2011,120(2):322-335

In recent years, trajectory approaches to characterizing individual differences in the onset and course of substance involvement have gained popularity. Previous studies have sometimes reported 4 prototypic courses: (a) a consistently "low" group, (b) an "increase" group, (c) a "decrease" group, and (d) a consistently "high" group. Although not always recovered, these trajectories are often found, despite these studies varying in the ages of the samples studied and the duration of the observation periods employed. Here, the authors examined the consistency with which these longitudinal patterns of heavy drinking were recovered in a series of latent class growth analyses that systematically varied the age of the sample at baseline, the duration of observation, and the number and frequency of measurement occasions. Data were drawn from a 4-year, 8-wave panel study of college student drinking (N = 3,720). Despite some variability across analyses, there was a strong tendency for these prototypes to emerge regardless of the participants' age at baseline and the duration of observation. These findings highlight potential problems with commonly employed trajectory-based approaches and the need to not over-reify these constructs.

13.

A Tabu-Search Heuristic for Deterministic Two-Mode Blockmodeling of Binary Network Matrices

Michael Brusco Douglas Steinley 《Psychometrika》2011,76(4):612-633

Two-mode binary data matrices arise in a variety of social network contexts, such as the attendance or non-attendance of individuals at events, the participation or lack of participation of groups in projects, and the votes of judges on cases. A popular method for analyzing such data is two-mode blockmodeling based on structural equivalence, where the goal is to identify partitions for the row and column objects such that the clusters of the row and column objects form blocks that are either complete (all 1s) or null (all 0s) to the greatest extent possible. Multiple restarts of an object relocation heuristic that seeks to minimize the number of inconsistencies (i.e., 1s in null blocks and 0s in complete blocks) with ideal block structure is the predominant approach for tackling this problem. As an alternative, we propose a fast and effective implementation of tabu search. Computational comparisons across a set of 48 large network matrices revealed that the new tabu-search heuristic always provided objective function values that were better than those of the relocation heuristic when the two methods were constrained to the same amount of computation time.
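The objective described above can be written down directly. The sketch below assumes the row and column partitions are given as lists of integer cluster labels and scores each block as whichever ideal type (complete or null) it fits better. It then wraps that count in a basic single-object relocation loop of the kind the abstract calls the predominant approach. It is not the authors' tabu search, which in general also accepts non-improving moves while forbidding recently made ones; all function names are hypothetical.

```python
def block_inconsistencies(matrix, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Inconsistency count for given row/column cluster labels.

    Each block is scored as whichever ideal type fits it better: as a null
    block it costs its number of 1s, as a complete block its number of 0s,
    so its contribution is min(ones, zeros).
    """
    ones = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    cells = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            r, c = row_labels[i], col_labels[j]
            ones[r][c] += value
            cells[r][c] += 1
    return sum(
        min(ones[r][c], cells[r][c] - ones[r][c])
        for r in range(n_row_clusters)
        for c in range(n_col_clusters)
    )


def relocation(matrix, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Move single row or column objects to other clusters while that lowers the count."""
    row_labels, col_labels = list(row_labels), list(col_labels)
    best = block_inconsistencies(matrix, row_labels, col_labels, n_row_clusters, n_col_clusters)
    improved = True
    while improved:
        improved = False
        for labels, k in ((row_labels, n_row_clusters), (col_labels, n_col_clusters)):
            for idx in range(len(labels)):
                current = labels[idx]
                for target in range(k):
                    if target == current:
                        continue
                    labels[idx] = target  # trial move
                    value = block_inconsistencies(matrix, row_labels, col_labels,
                                                  n_row_clusters, n_col_clusters)
                    if value < best:
                        best, current, improved = value, target, True  # keep the move
                    else:
                        labels[idx] = current  # undo the move
    return best, row_labels, col_labels


# Toy 4 x 4 matrix with a perfect 2 x 2 block structure.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(block_inconsistencies(matrix, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2))
print(relocation(matrix, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2))
```

For clarity the full count is recomputed after every trial move. A practical implementation would update only the block counts affected by the moved object.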

14.

A New Variable Weighting and Selection Procedure for K-means Cluster Analysis

Douglas Steinley Michael J. Brusco 《Multivariate behavioral research》2013,48(1):77-108

A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these procedures are demonstrated in a simulation study, showing favorable results when compared with existing standardization methods. A detailed demonstration of the weighting and selection procedure is provided for the well-known Fisher Iris data and several synthetic data sets.
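The abstract does not state the weighting formula, so the following is only an illustration of the intuition it points to and not the authors' procedure. It assumes that variance is compared against the squared range, which makes the ratio unit-free, and it neither converts the ratio into weights nor performs the selection step. For any variable the ratio lies between 0 and 1/4 (Popoviciu's inequality). A uniformly spread variable gives about 1/12, while a variable split into well-separated clusters moves toward 1/4. The function names are hypothetical.

```python
import statistics


def variance_to_squared_range(values):
    """Population variance of a variable divided by its squared range.

    By Popoviciu's inequality this lies in [0, 1/4]; a uniformly spread
    variable gives about 1/12, well-separated clusters push it toward 1/4.
    """
    spread = max(values) - min(values)
    if spread == 0:
        raise ValueError("constant variable: range is zero")
    return statistics.pvariance(values) / spread ** 2


def ratios_by_variable(rows):
    """Apply the ratio to each column of a row-major data set."""
    return [variance_to_squared_range(list(column)) for column in zip(*rows)]


uniform = list(range(101))             # evenly spread over 0..100
two_clusters = [0] * 50 + [100] * 51   # two tight groups at the ends of the range
data = list(zip(uniform, two_clusters))
print(ratios_by_variable(data))
```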

15.

Using Cohen's κ for Community Detection in Social Networks

Michaela Hoffman Douglas Steinley Kathleen M. Gates Mitchell J. Prinstein Michael J. Brusco 《Multivariate behavioral research》2013,48(6):740-741

相似文献

16.

Evaluating mixture modeling for clustering: recommendations and cautions

Steinley D Brusco MJ 《Psychological Methods》2011,16(1):63-79

This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson, 2002). Focus is given to the multivariate normal distribution, and 9 separate decompositions (i.e., class structures) of the covariance matrix are investigated. To provide a link to the current literature, comparisons are made with K-means clustering in 3 detailed Monte Carlo studies. The findings have implications for applied researchers in that mixture-model clustering techniques performed best when the covariance structure and number of clusters were known. However, as the information about the shape and number of clusters became unknown, degraded performance was observed for both K-means clustering and mixture-model clustering.

17.

Properties of the Hubert-Arabie adjusted Rand index

Steinley D 《Psychological Methods》2004,9(3):386-396

This article provides an investigation of cluster validation indices that relates 4 of the indices to the L. Hubert and P. Arabie (1985) adjusted Rand index--the cluster validation measure of choice (G. W. Milligan & M. C. Cooper, 1986). It is shown how these other indices can be "roughly" transformed into the same scale as the adjusted Rand index. Furthermore, in-depth explanations are given of why classification rates should not be used in cluster validation research. The article concludes by summarizing several properties of the adjusted Rand index across many conditions and provides a method for testing the significance of observed adjusted Rand indices.
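For readers who want to compute the index itself, here is a minimal sketch of the standard Hubert-Arabie formula, built from the contingency table of two partitions of the same objects. The function name is hypothetical, and the paper's rough transformations of other indices and its significance test are not reproduced.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same objects.

    With n_ij the contingency-table counts and a_i, b_j its row and column totals:
        index    = sum_ij C(n_ij, 2)
        expected = sum_i C(a_i, 2) * sum_j C(b_j, 2) / C(n, 2)
        maximum  = (sum_i C(a_i, 2) + sum_j C(b_j, 2)) / 2
        ARI      = (index - expected) / (maximum - expected)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects")
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        # Only when both partitions are all singletons or both a single cluster,
        # i.e. the partitions are identical.
        return 1.0
    return (index - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # one cluster split in two
```

Because the index works with pairs of objects rather than matched labels, relabelling clusters does not change it. This is one reason it avoids the label-matching problem that affects raw classification rates.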

18.

An evaluation of exact methods for the multiple subset maximum cardinality selection problem

下载免费PDF全文

Michael J. Brusco Hans‐Friedrich Köhn Douglas Steinley 《The British journal of mathematical and statistical psychology》2016,69(2):194-213

The maximum cardinality subset selection problem requires finding the largest possible subset from a set of objects, such that one or more conditions are satisfied. An important extension of this problem is to extract multiple subsets, where the addition of one more object to a larger subset would always be preferred to increases in the size of one or more smaller subsets. We refer to this as the multiple subset maximum cardinality selection problem (MSMCSP). A recently published branch‐and‐bound algorithm solves the MSMCSP as a partitioning problem. Unfortunately, the computational requirement associated with the algorithm is often enormous, thus rendering the method infeasible from a practical standpoint. In this paper, we present an alternative approach that successively solves a series of binary integer linear programs to obtain a globally optimal solution to the MSMCSP. Computational comparisons of the methods using published similarity data for 45 food items reveal that the proposed sequential method is computationally far more efficient than the branch‐and‐bound approach.
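The abstract leaves the feasibility condition and the integer programs unspecified. Purely for illustration, the sketch below assumes that a subset is feasible when every pair of its objects has similarity at least a threshold. It also replaces the integer-programming step with exhaustive enumeration, which is viable only for a handful of objects. Under those assumptions, the priority rule in the abstract (a larger subset always outranks gains in smaller ones) amounts to maximizing the subset sizes lexicographically. The function names and the toy similarity matrix are hypothetical.

```python
from itertools import combinations


def is_feasible(subset, similarity, threshold):
    """Assumed condition: every pair in the subset has similarity >= threshold."""
    return all(similarity[i][j] >= threshold for i, j in combinations(subset, 2))


def largest_feasible_subsets(objects, similarity, threshold):
    """Every feasible subset of maximum size (exhaustive search; tiny inputs only)."""
    objects = sorted(objects)
    for size in range(len(objects), 0, -1):
        found = [set(c) for c in combinations(objects, size)
                 if is_feasible(c, similarity, threshold)]
        if found:
            return found
    return []


def multiple_subsets(objects, similarity, threshold, k):
    """k disjoint feasible subsets whose sizes are lexicographically largest.

    The first subset is as large as possible, the second is as large as possible
    given the first, and so on; every tie for the first subset is tried so that
    an early choice cannot shrink a later subset unnecessarily.
    """
    objects = set(objects)
    if k == 0 or not objects:
        return []
    best = None
    for first in largest_feasible_subsets(objects, similarity, threshold):
        candidate = [first] + multiple_subsets(objects - first, similarity, threshold, k - 1)
        if best is None or [len(s) for s in candidate] > [len(s) for s in best]:
            best = candidate
    return best


# Hypothetical 6-object similarity matrix: pairs inside a listed group are similar (1.0).
groups = [{0, 1, 2}, {3, 4}, {2, 3}]
similarity = [[1.0 if i == j or any({i, j} <= g for g in groups) else 0.0
               for j in range(6)] for i in range(6)]
print(multiple_subsets(range(6), similarity, threshold=0.5, k=3))
```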

19.

Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis

Michael J. Brusco Renu Singh Douglas Steinley 《Psychometrika》2009,74(4):705-726

The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the components. In this paper, we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors. In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to demonstrate the importance of variable selection in the context of PCA.

20.

F. Murtagh (2005). Correspondence analysis and data coding with Java and R. 230 pp., US$76.00. ISBN 1584885289

Douglas Steinley 《Psychometrika》2009,74(1):181-183