Similar literature
 20 similar articles found; search took 31 ms
1.
Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish between 1 cluster (i.e., a single distribution) versus more than 1 cluster. Additionally, conditional on indicating there are multiple clusters, the procedure is extended to determine the number of clusters. Through a series of simulations, the proposed methodology is shown to outperform several other commonly used procedures for determining both the presence of clusters and their number.
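For orientation, the sum-of-squares error criterion named in this abstract can be computed as below. This is a minimal Python sketch of the standard K-means criterion only; the function name `sse` and the toy data are illustrative assumptions, and Steinley's lower bound is not reproduced because the abstract does not state it.

```python
def sse(points, labels):
    """Within-cluster sum of squared Euclidean distances to each cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(sse(points, [0, 0, 0, 0]))  # everything in one cluster
print(sse(points, [0, 0, 1, 1]))  # the two-cluster partition
```

The criterion drops sharply when the partition matches the two groups, which is the kind of contrast a one-versus-many-clusters decision has to evaluate.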

2.
Replenishing item pools for on-line ability testing requires innovative and efficient data collection designs. By generating local D-optimal designs for selecting individual examinees, and consistently estimating item parameters in the presence of error in the design points, sequential procedures are efficient for on-line item calibration. The estimating error in the on-line ability values is accounted for with an item parameter estimate studied by Stefanski and Carroll. Locally D-optimal n-point designs are derived using the branch-and-bound algorithm of Welch. In simulations, the overall sequential designs appear to be considerably more efficient than random seeding of items. This report was prepared under the Navy Manpower, Personnel, and Training R&D Program of the Office of the Chief of Naval Research under Contract N00014-87-0696. The authors wish to acknowledge the valuable advice and consultation given by Ronald Armstrong, Charles Davis, Bradford Sympson, Zhaobo Wang, Ing-Long Wu and three anonymous reviewers.

3.
When comparing examinees to a control, the examiner usually does not know the probability of correctly classifying the examinees based on the number of items used and the number of examinees tested. Using ranking and selection techniques, a general framework is described for deriving a lower bound on this probability. We illustrate how these techniques can be applied to the binomial error model. New exact results are given for normal populations having unknown and unequal variances. The work upon which this publication is based was performed pursuant to a grant [Grant No. NIE-G-76-0083] with the National Institute of Education, Department of Health, Education and Welfare. Points of view or opinions stated do not necessarily represent official NIE position or policy.

4.
At least four approaches have been used to estimate communalities that will leave an observed correlation matrix R Gramian and with minimum rank. It has long been known that the square of the observed multiple-correlation coefficient is a lower bound to any communality of a variable of R. This lower bound actually provides a best possible estimate in several senses. Furthermore, under certain conditions basic to the Spearman-Thurstone common-factor theory, the bound must equal the communality in the limit as the number of observed variables increases. Otherwise, this type of theory cannot hold for R. This research was facilitated by a grant from the Lucius N. Littauer Foundation to the American Committee for Social Research in Israel in order to promote methodological work of the Israel Institute of Applied Social Research.

5.
Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five. Acknowledgements: Sara Dickson, Vidya Nair, and Beth Means assisted with the neural network analyses.

6.
Recently there has been interest in the problem of determining an optimal passing score for a mastery test when the purpose of the test is to predict success or failure on an external criterion. For the case of constant losses for the two error types, a method of determining an optimal passing score is readily derived using standard techniques. The purpose of this note is to describe a lower bound to the probability of identifying an optimal passing score based on a random sample of N examinees. The work upon which this publication is based was performed pursuant to a grant [contract] with the National Institute of Education, Department of Health, Education and Welfare. Points of view or opinions stated do not necessarily represent official NIE position or policy.

7.
Clustering individuals by the similarity or dissimilarity of their trajectories of change in longitudinal data makes it possible to identify typical patterns of development and growth. The present research proposes a new constrained k‐means method with lower bound constraints on cluster proportions and on distances among clusters at focused variables and time points, to meet various needs in clustering longitudinal data. The method assumes a large number of clusters at the outset and iteratively deletes and combines clusters according to these constraints. An additional property of the proposed constrained k‐means is direct estimation of the unknown number of clusters. Simulation results show that the method is useful for extracting clusters in plausible, real‐life analyses, including non‐normality within clusters; the proposed algorithm works well, and convergence of the estimates is satisfactory. An example using Japanese longitudinal data on sleep habits and mental health is presented to verify the utility of the proposed constrained k‐means.

8.
One of the intriguing questions of factor analysis is the extent to which one can reduce the rank of a symmetric matrix by only changing its diagonal entries. We show in this paper that the set of matrices which can be reduced to rank r has positive (Lebesgue) measure if and only if r is greater than or equal to the Ledermann bound. In other words, the Ledermann bound is shown to be almost surely the greatest lower bound to a reduced rank of the sample covariance matrix. Afterwards an asymptotic sampling theory of so-called minimum trace factor analysis (MTFA) is proposed. The theory is based on continuous and differential properties of functions involved in the MTFA. Convex analysis techniques are utilized to obtain conditions for differentiability of these functions.

9.
In the signal detection paradigm, the non-parametric index of sensitivity A′, as first introduced by Pollack and Norman (1964), is a popular alternative to the more traditional d′ measure of sensitivity. Smith (1995) clarified a confusion about the interpretation of A′ in relation to the area beneath proper receiver operating characteristic (ROC) curves, and provided a formula (which he called A′′) for this commonly held interpretation. However, he made an error in his calculations. Here, we rectify this error by providing the correct formula (which we call A) and compare the discrepancy that would have resulted. The corresponding measure for bias b is also provided. Since all such calculations apply to “proper” ROC curves with non-decreasing slopes, we also prove, as a separate result, the slope-monotonicity of ROC curves generated by likelihood-ratio criterion. This revised article was published online in August 2005 with the PDF paginated correctly.
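The abstract does not give any formula. As a reference point, the sketch below computes A′ with the commonly cited computing formula (usually attributed to Grier, 1971): A′ = 1/2 + (H − F)(1 + H − F) / (4H(1 − F)) for hit rate H ≥ false-alarm rate F, with the mirrored form when H < F. The function name `a_prime` is an assumption, and the corrected index A proposed in the article is deliberately not reproduced here.

```python
def a_prime(hit_rate, false_alarm_rate):
    """Non-parametric sensitivity index A' from a single (F, H) point."""
    h, f = hit_rate, false_alarm_rate
    if h == f:
        return 0.5  # chance performance; also avoids 0/0 at (0, 0) and (1, 1)
    if h > f:
        return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    # Below-chance performance: mirror the formula around 0.5.
    return 0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))


print(a_prime(0.8, 0.2))
print(a_prime(0.2, 0.8))
```

The index runs from 0 to 1, with 0.5 at chance and 1 for perfect discrimination (H = 1, F = 0).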

10.
The test-retest reliability of qualitative items, such as occur in achievement tests, attitude questionnaires, public opinion surveys, and elsewhere, requires a different technique of analysis from that of quantitative variables. Definitions appropriate to the qualitative case are made both for the reliability coefficient of an individual on an item and for the reliability coefficient of a population on the item. From but a single trial of a large population on the item, it is possible to compute a lower bound to the group reliability coefficient. Two kinds of lower bounds are presented. From two experimentally independent trials of the population on the item, it is possible to compute an upper bound to the group reliability coefficient. Two upper bounds are presented. The computations for the lower and upper bounds are all very simple. Numerical examples are given.

11.
A rationale and test for the number of factors in factor analysis   Cited by: 7 (self-citations: 0; citations by others: 7)
John L. Horn. Psychometrika, 1965, 30(2): 179-185
It is suggested that if Guttman's latent-root-one lower bound estimate for the rank of a correlation matrix is accepted as a psychometric upper bound, following the proofs and arguments of Kaiser and Dickman, then the rank for a sample matrix should be estimated by subtracting out the component in the latent roots which can be attributed to sampling error, and least-squares capitalization on this error, in the calculation of the correlations and the roots. A procedure based on the generation of random variables is given for estimating the component which needs to be subtracted. I wish to acknowledge the valuable help given by J. Jaspers and L. G. Humphreys in the development of the ideas presented in this paper.
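The procedure described in this abstract is now usually called parallel analysis. The sketch below shows one common reading of it: eigenvalues of the observed correlation matrix are compared with the mean eigenvalues obtained from random normal data of the same size, and components are retained for as long as the observed value is larger. Everything here (function names, the 50 random replications, the Jacobi eigenvalue routine, the simulated one-factor data) is an illustrative assumption, not Horn's own computation.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlations between the columns of data (a list of rows)."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    devs = [[x - sum(c) / n for x in c] for c in cols]
    norms = [math.sqrt(sum(d * d for d in dv)) for dv in devs]
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-18:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(data, n_random=50, seed=0):
    """Count leading observed eigenvalues that exceed the mean random eigenvalue."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = symmetric_eigenvalues(correlation_matrix(data))
    mean_random = [0.0] * p
    for _ in range(n_random):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for i, ev in enumerate(symmetric_eigenvalues(correlation_matrix(noise))):
            mean_random[i] += ev / n_random
    retained = 0
    for obs, rnd in zip(observed, mean_random):
        if obs <= rnd:
            break
        retained += 1
    return retained, observed, mean_random


# Hypothetical data: 200 cases, 4 variables sharing one common factor.
demo_rng = random.Random(1)
data = []
for _ in range(200):
    factor = demo_rng.gauss(0, 1)
    data.append([factor + demo_rng.gauss(0, 1) for _ in range(4)])

retained, observed, mean_random = parallel_analysis(data)
print(retained, [round(v, 2) for v in observed], [round(v, 2) for v in mean_random])
```

Comparing against the random-data mean rather than against 1 is what distinguishes this from the plain latent-root-one rule: the random eigenvalues estimate how far sampling error alone pushes the leading roots above 1.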

12.
A number of people have suggested that there is a link between information integration and consciousness, and a number of algorithms for calculating information integration have been put forward. The most recent of these is Balduzzi and Tononi’s state-based Φ algorithm, which has factorial dependencies that severely limit the number of neurons that can be analyzed. To address this issue an alternative state-based measure known as liveliness has been developed, which uses the causal relationships between neurons to identify the areas of maximum information integration. This paper outlines the state-based Φ and liveliness algorithms and sets out a number of test networks that were used to compare their accuracy and performance. The results show that liveliness is a reasonable approximation to state-based Φ for some network topologies, and it has a much more scalable performance than state-based Φ.

13.
Several authors have suggested that prior to conducting a confirmatory factor analysis it may be useful to group items into a smaller number of item ‘parcels’ or ‘testlets’. The present paper mathematically shows that coefficient alpha based on these parcel scores will only exceed alpha based on the entire set of items if W, the ratio of the average covariance of items between parcels to the average covariance of items within parcels, is greater than unity. If W is less than unity, however, and errors of measurement are uncorrelated, then stratified alpha will be a better lower bound to the reliability of a measure than the other two coefficients. Stratified alpha is also equal to the true reliability of a test when items within parcels are essentially tau‐equivalent, if one assumes that errors of measurement are not correlated.
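The three coefficients compared in this abstract can be sketched as follows. The code uses the standard definitions (coefficient alpha as k/(k − 1) times one minus the ratio of summed item variances to total-score variance; stratified alpha as one minus the summed unreliable parcel variance over total variance). The function names and the small data set are illustrative assumptions, and each parcel is assumed to hold at least two items.

```python
from statistics import variance


def coefficient_alpha(items):
    """items: list of score columns (one list of person scores per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def parcel_scores(parcels):
    """Sum the items inside each parcel into one score column per parcel."""
    return [[sum(scores) for scores in zip(*parcel)] for parcel in parcels]


def stratified_alpha(parcels):
    """parcels: list of parcels, each a list of item columns (two or more items)."""
    all_items = [col for parcel in parcels for col in parcel]
    total_var = variance([sum(scores) for scores in zip(*all_items)])
    unreliable = 0.0
    for parcel, scores in zip(parcels, parcel_scores(parcels)):
        unreliable += variance(scores) * (1 - coefficient_alpha(parcel))
    return 1 - unreliable / total_var


# Hypothetical scores: 5 persons, 4 items grouped into 2 parcels.
parcels = [
    [[2, 4, 3, 5, 1], [3, 5, 3, 4, 2]],
    [[1, 3, 4, 4, 2], [2, 2, 5, 5, 1]],
]
all_items = [col for parcel in parcels for col in parcel]
print(coefficient_alpha(all_items))               # alpha on all items
print(coefficient_alpha(parcel_scores(parcels)))  # alpha on parcel scores
print(stratified_alpha(parcels))                  # stratified alpha
```

Running the three on the same data makes the comparison in the abstract concrete: which coefficient is largest depends on how between-parcel covariances compare with within-parcel covariances.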

14.
Let Σ_x be the (population) dispersion matrix, assumed well-estimated, of a set of non-homogeneous item scores. Finding the greatest lower bound for the reliability of the total of these scores is shown to be equivalent to minimizing the trace of Σ_x by reducing the diagonal elements while keeping the matrix non-negative definite. Using this approach, Guttman's bounds are reviewed, a method is established to determine whether his λ4 (maximum split-half coefficient alpha) is the greatest lower bound in any instance, and three new bounds are discussed. A geometric representation, which sheds light on many of the bounds, is described. Present affiliation of the second author: Department of Statistics, University of Nigeria (Nsukka Campus). Work on this paper was carried out while on study leave in Aberystwyth.

15.
We show that every proper normal extension of the bi-modal system S5² has the poly-size model property. In fact, to every proper normal extension L of S5² corresponds a natural number b(L), the bound of L. For every L, there exists a polynomial P(·) of degree b(L) + 1 such that every L-consistent formula φ is satisfiable on an L-frame whose universe is bounded by P(|φ|), where |φ| denotes the number of subformulas of φ. It is shown that this bound is optimal.

16.
In theory, the greatest lower bound (g.l.b.) to reliability is the best possible lower bound to the reliability based on a single test administration. Yet the practical use of the g.l.b. has been severely hindered by sampling bias problems. It is well known that the g.l.b. based on small samples (even a sample of one thousand subjects is not generally enough) may severely overestimate the population value, and statistical treatment of the bias has been badly missing. The only results obtained so far are concerned with the asymptotic variance of the g.l.b. and of its numerator (the maximum possible error variance of a test), based on first order derivatives and the assumption of multivariate normality. The present paper extends these results by offering explicit expressions for the second order derivatives. This yields a closed form expression for the asymptotic bias of both the g.l.b. and its numerator, under the assumptions that the rank of the reduced covariance matrix is at or above the Ledermann bound, and that the nonnegativity constraints on the diagonal elements of the matrix of unique variances are inactive. It is also shown that, when the reduced rank is at its highest possible value (i.e., the number of variables minus one), the numerator of the g.l.b. is asymptotically unbiased, and the asymptotic bias of the g.l.b. is negative. The latter results are contrary to common belief, but apply only to cases where the number of variables is small. The asymptotic results are illustrated by numerical examples. This research was supported by grant DMI-9713878 from the National Science Foundation.

17.
In one well-known model for psychological distances, objects such as stimuli are placed in a hierarchy of clusters like a phylogenetic tree; in another common model, objects are represented as points in a multidimensional Euclidean space. These models are shown theoretically to be mutually exclusive and exhaustive in the following sense. The distances among a set of n objects will be strictly monotonically related either to the distances in a hierarchical clustering system, or else to the distances in a Euclidean space of fewer than n − 1 dimensions, but not to both. Consequently, a lower bound on the number of Euclidean dimensions necessary to represent a set of objects is one less than the size of the largest subset of objects whose distances satisfy the ultrametric inequality, which characterizes the hierarchical model. This work was supported in part by Grant GB-13588X from the National Science Foundation. I would like to thank L. M. Kelly and A. A. J. Marley for their helpful comments and suggestions.
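The lower bound stated in this abstract can be illustrated with a brute-force sketch. It checks the ultrametric inequality, d(x, z) ≤ max(d(x, y), d(y, z)), which for any triple is equivalent to the two largest of its three distances being equal, and then searches for the largest subset that satisfies it. The function names, the tolerance, and the 4-object distance matrix are illustrative assumptions; the exhaustive search is only practical for small n.

```python
from itertools import combinations


def is_ultrametric(dist, subset, tol=1e-12):
    """True if every triple in subset has its two largest distances equal."""
    for i, j, k in combinations(subset, 3):
        _, middle, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if largest - middle > tol:
            return False
    return True


def largest_ultrametric_subset(dist):
    """Exhaustive search, largest subsets first (any pair is trivially ultrametric)."""
    n = len(dist)
    for size in range(n, 2, -1):
        for subset in combinations(range(n), size):
            if is_ultrametric(dist, subset):
                return subset
    return tuple(range(min(n, 2)))


# Hypothetical symmetric distance matrix for 4 objects.
dist = [
    [0.0, 1.0, 2.0, 1.5],
    [1.0, 0.0, 2.0, 2.5],
    [2.0, 2.0, 0.0, 3.0],
    [1.5, 2.5, 3.0, 0.0],
]
subset = largest_ultrametric_subset(dist)
print(subset, "lower bound on dimensions:", len(subset) - 1)
```

Because the check only compares distances with each other, it gives the same answer under any strictly monotone transformation of the distances, which matches the ordinal setting of the abstract.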

18.
A model is proposed which treats rankings given by a group of judges as representing regions in an isotonic space of dimensionality r. Three possible criteria for estimating lower bound dimensionality are discussed: mutual boundary, cardinality, and the occurrence of transposition groups. Problems associated with each criterion are mentioned. Deceased.

19.
Summary: A new, elaborated version of a time-quantum model (TQM) is outlined and illustrated by applying it to different experimental paradigms. As a basic prerequisite, TQM adopts the coexistence of different discrete time units or (perceptual) intermittencies as constituent elements of the temporal architecture of mental processes. Unlike other similar approaches, TQM assumes the existence of an absolute lower bound for intermittencies, the time quantum T, which is an (approximately) universal constant with a duration of approximately 4.5 ms. Intermittencies of TQM must be multiples T_k = k·T* within the interval T* ≤ T_k ≤ L·T* ≤ M·T*, with T* = q·T and integer q, k, L, and M. Here M denotes an upper bound for multipliers characteristic of individuals, the so-called coherence length; q and L may depend on task, individual, and other factors. A second constraint is that admissible intermittencies must be integer fractions of L, the operative upper bound. In addition, M is assumed to determine the number of elementary information units to be stored in short-term memory.

20.
I compared the randomization/permutation test and the F test for a two-cell comparative experiment. I varied (1) the number of observations per cell, (2) the size of the treatment effect, (3) the shape of the underlying distribution of error and, (4) for cases with skewed error, whether or not the skew was correlated with the treatment. With normal error, there was little difference between the tests. When error was skewed, by contrast, the randomization test was more sensitive than the F test, and if the amount of skew was correlated with the treatment, the advantage for the randomization test was both large and positively correlated with the treatment. I conclude that, because the randomization test was never less powerful than the F test, it should replace the F test in routine work.
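A randomization test for a two-cell experiment can be sketched in a few lines. The version below is a Monte Carlo approximation that uses the absolute difference in cell means as the test statistic; the function name, the number of shuffles, the statistic, and the skewed toy data are all assumptions for illustration and are not taken from the study.

```python
import random


def randomization_test(cell_a, cell_b, n_shuffles=10000, seed=0):
    """Two-sided Monte Carlo p-value for a difference in means between two cells."""
    rng = random.Random(seed)
    pooled = list(cell_a) + list(cell_b)
    n_a = len(cell_a)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random reassignment of scores to the two cells
        if mean_difference(pooled) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical skewed scores for a control and a treatment cell.
control = [0.2, 0.3, 0.5, 0.6, 0.9, 4.1]
treatment = [1.1, 1.4, 1.9, 2.5, 3.8, 9.7]
print(randomization_test(control, treatment))
```

The test makes no assumption about the shape of the error distribution, since the reference distribution is built from the observed scores themselves; with only two cells, the F test it is compared against is equivalent to a two-sample t test.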
