11.
Mixture modeling is a popular technique for identifying unobserved subpopulations (e.g., components) within a data set, with Gaussian (normal) mixture modeling being the form most widely used. Generally, the parameters of these Gaussian mixtures cannot be estimated in closed form, so estimates are typically obtained via an iterative process. The most common estimation procedure is maximum likelihood via the expectation-maximization (EM) algorithm. Like many approaches for identifying subpopulations, finite mixture modeling can suffer from locally optimal solutions, and the final parameter estimates are dependent on the initial starting values of the EM algorithm. Initial values have been shown to significantly impact the quality of the solution, and researchers have proposed several approaches for selecting the set of starting values. Five techniques for obtaining starting values that are implemented in popular software packages are compared. Their performances are assessed in terms of the following four measures: (1) the ability to find the best observed solution, (2) settling on a solution that classifies observations correctly, (3) the number of local solutions found by each technique, and (4) the speed at which the start values are obtained. On the basis of these results, a set of recommendations is provided to the user.
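To make the dependence on starting values concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture that is run from several random starts, keeping the fit with the highest log-likelihood. It is not one of the five techniques compared in the abstract; the function names (`em_1d`, `fit_with_random_starts`), the fixed iteration count, and the choice of random data points as initial means are all assumptions made for illustration.

```python
import math
import random


def mixture_density(x, weights, means, variances):
    """Weighted normal density of x under each mixture component."""
    return [
        w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_likelihood(data, weights, means, variances):
    # max(..., 1e-300) guards against log(0) after numerical underflow.
    return sum(
        math.log(max(sum(mixture_density(x, weights, means, variances)), 1e-300))
        for x in data
    )


def em_1d(data, init_means, n_iter=200, var_floor=1e-6):
    """Run a fixed number of EM iterations from the given initial means."""
    n, k = len(data), len(init_means)
    grand_mean = sum(data) / n
    pooled_var = max(sum((x - grand_mean) ** 2 for x in data) / n, var_floor)
    weights, means, variances = [1.0 / k] * k, list(init_means), [pooled_var] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = mixture_density(x, weights, means, variances)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of the parameters.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(spread / nj, var_floor)
            weights[j] = nj / n
    return {
        "weights": weights,
        "means": means,
        "variances": variances,
        "loglik": log_likelihood(data, weights, means, variances),
    }


def fit_with_random_starts(data, k, n_starts=10, seed=0):
    """Start EM from k randomly chosen data points, n_starts times."""
    rng = random.Random(seed)
    fits = [em_1d(data, rng.sample(data, k)) for _ in range(n_starts)]
    return max(fits, key=lambda f: f["loglik"]), fits
```

Comparing the `loglik` values across the returned `fits` gives a rough view of how many distinct local solutions the starts reached.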
12.
In recent years, trajectory approaches to characterizing individual differences in the onset and course of substance involvement have gained popularity. Previous studies have sometimes reported 4 prototypic courses: (a) a consistently "low" group, (b) an "increase" group, (c) a "decrease" group, and (d) a consistently "high" group. Although not always recovered, these trajectories are often found, despite these studies varying in the ages of the samples studied and the duration of the observation periods employed. Here, the authors examined the consistency with which these longitudinal patterns of heavy drinking were recovered in a series of latent class growth analyses that systematically varied the age of the sample at baseline, the duration of observation, and the number and frequency of measurement occasions. Data were drawn from a 4-year, 8-wave panel study of college student drinking (N = 3,720). Despite some variability across analyses, there was a strong tendency for these prototypes to emerge regardless of the participants' age at baseline and the duration of observation. These findings highlight potential problems with commonly employed trajectory-based approaches and the need to not over-reify these constructs.
13.
Two-mode binary data matrices arise in a variety of social network contexts, such as the attendance or non-attendance of individuals
at events, the participation or lack of participation of groups in projects, and the votes of judges on cases. A popular method
for analyzing such data is two-mode blockmodeling based on structural equivalence, where the goal is to identify partitions
for the row and column objects such that the clusters of the row and column objects form blocks that are either complete (all
1s) or null (all 0s) to the greatest extent possible. Multiple restarts of an object relocation heuristic that seeks to minimize
the number of inconsistencies (i.e., 1s in null blocks and 0s in complete blocks) with ideal block structure is the predominant
approach for tackling this problem. As an alternative, we propose a fast and effective implementation of tabu search. Computational
comparisons across a set of 48 large network matrices revealed that the new tabu-search heuristic always provided objective
function values that were better than those of the relocation heuristic when the two methods were constrained to the same
amount of computation time.
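The objective described above (1s in null blocks plus 0s in complete blocks) can be sketched as follows. This is only an illustration of the inconsistency count, not the tabu-search or relocation heuristic itself. It assumes each block is labeled complete or null according to whichever label yields fewer inconsistencies, and the function name `count_inconsistencies` is hypothetical.

```python
from collections import defaultdict


def count_inconsistencies(matrix, row_labels, col_labels):
    """Count departures from an ideal complete/null block structure.

    matrix      -- list of rows of 0/1 values (two-mode binary data)
    row_labels  -- cluster label for each row object
    col_labels  -- cluster label for each column object
    """
    ones = defaultdict(int)   # number of 1s in each block
    cells = defaultdict(int)  # number of cells in each block
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            block = (row_labels[i], col_labels[j])
            cells[block] += 1
            ones[block] += value
    # Assumption: a block is treated as complete if that costs fewer
    # inconsistencies (its 0s) than treating it as null (its 1s).
    return sum(min(ones[b], cells[b] - ones[b]) for b in cells)


example = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
total = count_inconsistencies(example, [0, 0, 1, 1], [0, 0, 1])
```

A relocation or tabu-search heuristic would evaluate this count repeatedly while moving row or column objects between clusters.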
14.
A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these procedures are demonstrated in a simulation study, showing favorable results when compared with existing standardization methods. A detailed demonstration of the weighting and selection procedure is provided for the well-known Fisher Iris data and several synthetic data sets.
15.
16.
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson, 2002). Focus is given to the multivariate normal distribution, and 9 separate decompositions (i.e., class structures) of the covariance matrix are investigated. To provide a link to the current literature, comparisons are made with K-means clustering in 3 detailed Monte Carlo studies. The findings have implications for applied researchers in that mixture-model clustering techniques performed best when the covariance structure and number of clusters were known. However, as the information about the shape and number of clusters became unknown, degraded performance was observed for both K-means clustering and mixture-model clustering.
17.
Properties of the Hubert-Arabie adjusted Rand index (citations: 1 total, 0 self-citations, 1 by others)
Steinley D 《Psychological Methods》2004,9(3):386-396
This article provides an investigation of cluster validation indices that relates 4 of the indices to the L. Hubert and P. Arabie (1985) adjusted Rand index--the cluster validation measure of choice (G. W. Milligan & M. C. Cooper, 1986). It is shown how these other indices can be "roughly" transformed into the same scale as the adjusted Rand index. Furthermore, in-depth explanations are given of why classification rates should not be used in cluster validation research. The article concludes by summarizing several properties of the adjusted Rand index across many conditions and provides a method for testing the significance of observed adjusted Rand indices.
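For readers unfamiliar with the index, the following is a minimal sketch of the standard pair-counting formula for the Hubert-Arabie adjusted Rand index. It does not reproduce the transformations or the significance test discussed in the article. The function name `adjusted_rand_index` and the convention of returning 1.0 for degenerate partitions are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two objects counts as agreement

    # Pairs of objects placed together in both, in A, and in B, respectively.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2           # largest attainable agreement
    if maximum == expected:
        return 1.0  # assumption: degenerate case (e.g., one cluster in both)
    return (together_both - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # relabeled copy
cross = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # crossed partitions
```

Here `same` is 1.0 because the index ignores label names, while `cross` is -0.5, showing that the index can fall below zero when agreement is worse than chance.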
18.
An evaluation of exact methods for the multiple subset maximum cardinality selection problem
Michael J. Brusco, Hans‐Friedrich Köhn, Douglas Steinley 《The British journal of mathematical and statistical psychology》2016,69(2):194-213
The maximum cardinality subset selection problem requires finding the largest possible subset from a set of objects, such that one or more conditions are satisfied. An important extension of this problem is to extract multiple subsets, where the addition of one more object to a larger subset would always be preferred to increases in the size of one or more smaller subsets. We refer to this as the multiple subset maximum cardinality selection problem (MSMCSP). A recently published branch‐and‐bound algorithm solves the MSMCSP as a partitioning problem. Unfortunately, the computational requirement associated with the algorithm is often enormous, thus rendering the method infeasible from a practical standpoint. In this paper, we present an alternative approach that successively solves a series of binary integer linear programs to obtain a globally optimal solution to the MSMCSP. Computational comparisons of the methods using published similarity data for 45 food items reveal that the proposed sequential method is computationally far more efficient than the branch‐and‐bound approach.
19.
The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate
statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection
is crucial for identifying those variables that are required for correct interpretation of the components. In this paper,
we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances
of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward
stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the
VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors.
In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions
than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to
demonstrate the importance of variable selection in the context of PCA.
20.