1.

An Empirical Investigation of Population Invariance in the Value of Subscores

Sandip Sinharay Shelby J. Haberman 《International Journal of Testing》2014,14(1):22-48

Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups (for example, those based on gender or ethnicity) on subtests. Several researchers found that the difference in performance between the gender-based subgroups varied over the different subtests. In this article, we examine whether the added values of the subscores vary between subgroups using data from several operational tests, including an international English proficiency test. For these data sets, the added values of the subscores occasionally vary over the subgroups, but the added values of the augmented subscores are invariant over the subgroups.
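The added-value check attributed to Haberman (2008) is usually described as a comparison of two proportional reductions in mean squared error (PRMSE) for predicting the true subscore: one based on the observed subscore and one based on the observed total score. The following is a minimal sketch under classical test theory, assuming the subscore reliability, the total-score reliability, and the correlation between the true subscore and the true total score have already been estimated. The function names and the input values in the usage note are hypothetical, not taken from the article.

```python
def prmse_from_subscore(subscore_reliability: float) -> float:
    """PRMSE when the true subscore is predicted from the observed subscore.

    Under classical test theory this equals the subscore reliability.
    """
    return subscore_reliability


def prmse_from_total(true_score_corr: float, total_reliability: float) -> float:
    """PRMSE when the true subscore is predicted from the observed total score.

    Equals the squared correlation between the true subscore and the observed
    total, which factors into the squared true-score correlation times the
    reliability of the total score.
    """
    return true_score_corr ** 2 * total_reliability


def has_added_value(subscore_reliability: float,
                    true_score_corr: float,
                    total_reliability: float) -> bool:
    """Return True when the subscore predicts its own true score better than the total does."""
    return prmse_from_subscore(subscore_reliability) > prmse_from_total(
        true_score_corr, total_reliability
    )
```

For instance, `has_added_value(0.70, 0.80, 0.90)` compares 0.70 with 0.64 × 0.90 and returns `True`, whereas a short subscore that is nearly collinear with the total, such as `has_added_value(0.60, 0.90, 0.95)`, returns `False`.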

2.

Prediction of true test scores from observed item scores and ancillary data

Shelby J. Haberman Lili Yao Sandip Sinharay 《The British journal of mathematical and statistical psychology》2015,68(2):363-385

In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.

3.

Evaluating interobserver reliability of interval data

Hopkins BL Hermann JA 《Journal of applied behavior analysis》1977,10(1):121-126

Previous recommendations to employ occurrence, nonoccurrence, and overall estimates of interobserver reliability for interval data are reviewed. A rationale for comparing obtained reliability to reliability that would result from a random-chance model is explained. Formulae and graphic functions are presented to allow for the determination of chance agreement for each of the three indices, given any obtained per cent of intervals in which a response is recorded to occur. All indices are interpretable throughout the range of possible obtained values for the per cent of intervals in which a response is recorded. The level of chance agreement simply changes with changing values. Statistical procedures that could be used to determine whether obtained reliability is significantly superior to chance reliability are reviewed. These procedures are rejected because they yield significance levels that are partly a function of sample sizes and because there are no general rules to govern acceptable significance levels depending on the sizes of samples employed.
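How chance agreement shifts with the recorded rate of behavior can be illustrated with a short sketch. It assumes a simple random-chance model in which two observers score intervals independently, with `p1` and `p2` the proportions of intervals each observer scores as an occurrence. The function name and this independence model are assumptions made for illustration and are not necessarily the exact formulae presented in the article.

```python
def chance_agreement(p1: float, p2: float) -> dict:
    """Expected agreement indices if two observers scored intervals independently.

    p1, p2: proportion of intervals each observer records as an occurrence.
    """
    both = p1 * p2                     # both observers score the interval
    neither = (1.0 - p1) * (1.0 - p2)  # neither observer scores the interval

    # Overall agreement counts every interval on which the observers match.
    overall = both + neither

    # Occurrence agreement only considers intervals scored by at least one observer.
    scored_by_any = p1 + p2 - both
    occurrence = both / scored_by_any if scored_by_any > 0 else float("nan")

    # Nonoccurrence agreement only considers intervals left unscored by at least one observer.
    unscored_by_any = 1.0 - both
    nonoccurrence = neither / unscored_by_any if unscored_by_any > 0 else float("nan")

    return {"overall": overall, "occurrence": occurrence, "nonoccurrence": nonoccurrence}
```

Under this model, calling `chance_agreement(0.9, 0.9)` gives a chance overall agreement of 0.82, which shows why a high obtained overall figure can be uninformative when behavior is recorded in most intervals.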

4.

Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability

Frederic M. Lord 《Psychometrika》1983,48(2):233-245

Given known item parameters, unbiased estimators are derived i) for an examinee's ability parameter θ and for his proportion-correct true score ζ, ii) for the variances of θ and ζ across examinees in the group tested, and iii) for the parallel-forms reliability of the maximum likelihood estimator θ̂. This work was supported in part by contract N00014-80-C-0402, project designation NR 150-453 between the Office of Naval Research and Educational Testing Service. Reproduction in whole or in part is permitted for any purpose of the United States Government.

5.

Estimating latent distributions in recurrent choice data

Ulf Böckenholt 《Psychometrika》1993,58(3):489-509

This paper introduces a flexible class of stochastic mixture models for the analysis and interpretation of individual differences in recurrent choice and other types of count data. These choice models are derived by specifying elements of the choice process at the individual level. Probability distributions are introduced to describe variations in the choice process among individuals and to obtain a representation of the aggregate choice behavior. Due to the explicit consideration of random effect sources, the choice models are parsimonious and readily interpretable. An easy to implement EM algorithm is presented for parameter estimation. Two applications illustrate the proposed approach.
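To give a sense of what an EM algorithm for a mixture model of count data looks like, the sketch below fits a two-component Poisson mixture. This is a generic textbook illustration, not the model class or the algorithm of the paper; the two-component restriction, the starting values, and all function names are assumptions.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of count k with rate lam (lam must be positive)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def em_poisson_mixture(counts, lam_init=(1.0, 5.0), n_iter=200):
    """Fit a two-component Poisson mixture by EM.

    Returns (weights, rates, loglik_trace), where loglik_trace holds the
    log-likelihood evaluated at the start of each iteration.
    """
    rates = list(lam_init)
    weights = [0.5, 0.5]
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from each component.
        responsibilities = []
        loglik = 0.0
        for k in counts:
            joint = [weights[j] * poisson_pmf(k, rates[j]) for j in range(2)]
            total = sum(joint)
            loglik += math.log(total)
            responsibilities.append([v / total for v in joint])
        loglik_trace.append(loglik)

        # M-step: update mixing weights and rates from the weighted counts.
        for j in range(2):
            mass = sum(r[j] for r in responsibilities)
            weights[j] = mass / len(counts)
            rates[j] = sum(r[j] * k for r, k in zip(responsibilities, counts)) / mass
    return weights, rates, loglik_trace
```

The sketch does not guard against a component collapsing to a zero rate, so it is only suitable for data in which both components receive some positive counts.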

6.

Empirical bayes estimation of coefficients in the general linear model from data of deficient rank

Henry I. Braun Ph.D. Douglas H. Jones Donald B. Rubin Dorothy T. Thayer 《Psychometrika》1983,48(2):171-181

Empirical Bayes methods are shown to provide a practical alternative to standard least squares methods in fitting high dimensional models to sparse data. An example concerning prediction bias in educational testing is presented as an illustration. The authors would like to thank the referees for several useful comments. The analysis of the data discussed in this report was part of a study funded jointly by the Graduate Management Admission Council and Educational Testing Service.

7.

A comparison of latent trait and latent class analyses of Likert-type data

Geofferey N. Masters 《Psychometrika》1985,50(1):69-82

This paper brings together and compares two developments in the analysis of Likert attitude scales. The first is the generalization of latent class models to ordered response categories. The second is the introduction of latent trait models with multiplicative parameter structures for the analysis of rating scales. Key similarities and differences between these two methods are described and illustrated by applying a latent trait model and a latent class model to the analysis of a set of life satisfaction data. The way in which the latent trait model defines a unit of measurement, takes into account the order of the response categories, and scales the latent classes, is discussed. While the latent class model provides better fit to these data, this is achieved at the cost of a logically inconsistent assignment of individuals to latent classes. The author wishes to thank Clifford C. Clogg, Otis Dudley Duncan and Benjamin D. Wright for their helpful comments on an earlier version of this paper.

8.

Continuous and discrete latent structure models for item response data

Edward H. Haertel 《Psychometrika》1990,55(3):477-494

Relations are examined between latent trait and latent class models for item response data. Conditions are given for the two-latent class and two-parameter normal ogive models to agree, and relations between their item parameters are presented. Generalizations are then made to continuous models with more than one latent trait and discrete models with more than two latent classes, and methods are presented for relating latent class models to factor models for dichotomized variables. Results are illustrated using data from the Law School Admission Test, previously analyzed by several authors.

9.

Mixed-effects analyses of rank-ordered data

Ulf Böckenholt 《Psychometrika》2001,66(1):45-62

10.

Constrained latent class analysis: Simultaneous classification and scaling of discrete choice data

Ulf Böckenholt Ingo Böckenholt 《Psychometrika》1991,56(4):699-716

A reparameterization of a latent class model is presented to simultaneously classify and scale nominal and ordered categorical choice data. Latent class-specific probabilities are constrained to be equal to the preference probabilities from a probabilistic ideal-point or vector model that yields a graphical, multidimensional representation of the classification results. In addition, background variables can be incorporated as an aid to interpreting the latent class-specific response probabilities. The analyses of synthetic and real data sets illustrate the proposed method. The authors thank Yoshio Takane, the editor and referees for their valuable suggestions. Authors are listed in reverse alphabetical order.

11.

A unifying expression for the maximal reliability of a linear composite

Heng Li 《Psychometrika》1997,62(2):245-249

A formally simple expression for the maximal reliability of a linear composite is provided; its theoretical implications and its relation to existing results for reliability are discussed.
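As a concrete illustration of what maximal reliability of a linear composite means, the sketch below covers only the well-known single-factor special case, in which each component has a loading on one common factor with unit variance and the errors are uncorrelated. It is not the general expression derived in the article, and the function names and example values are hypothetical.

```python
def composite_reliability(weights, loadings, error_vars):
    """Reliability of the weighted sum under a one-factor model.

    Assumes a common factor with unit variance and mutually uncorrelated errors.
    """
    true_var = sum(w * l for w, l in zip(weights, loadings)) ** 2
    error_var = sum(w * w * e for w, e in zip(weights, error_vars))
    return true_var / (true_var + error_var)


def optimal_weights(loadings, error_vars):
    """Weights proportional to loading / error variance maximize reliability in this case."""
    return [l / e for l, e in zip(loadings, error_vars)]


def maximal_reliability(loadings, error_vars):
    """Closed form S / (1 + S), with S the sum of loading^2 / error variance."""
    s = sum(l * l / e for l, e in zip(loadings, error_vars))
    return s / (1.0 + s)
```

With loadings `[0.8, 0.6, 0.4]` and error variances `[0.36, 0.64, 0.84]`, the optimally weighted composite reaches the closed-form value, and the unit-weighted sum falls below it.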

12.

The development of children's rule use on the balance scale task

Jansen BR van der Maas HL 《Journal of experimental child psychology》2002,81(4):383-416

Cognitive development can be characterized by a sequence of increasingly complex rules or strategies for solving problems. Our work focuses on the development of children's proportional reasoning, assessed by the balance scale task using Siegler's (1976, 1981) rule assessment methodology. We studied whether children use rules, whether children of different ages use qualitatively different rules, and whether rules are used consistently. Nonverbal balance scale problems were administered to 805 participants between 5 and 19 years of age. Latent class analyses indicate that children use rules, that children of different ages use different rules, and that both consistent and inconsistent use of rules occurs. A model for the development of reasoning about the balance scale task is proposed. The model is a restricted form of the overlapping waves model (Siegler, 1996) and predicts both discontinuous and gradual transitions between rules.

13.

On the reliability of the extreme score

Huynh Huynh 《Psychometrika》1986,51(3):475-478

Under the assumption of normality, a formula is derived for the reliability of the maximum score. It is shown that the maximum score is more reliable than each of the single observations, but less reliable than their composite score.
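The ordering stated in the abstract can be checked numerically. The sketch below is a Monte Carlo illustration, not the derived formula: it assumes each person has a normally distributed true score, that every observation adds independent normal error, and that reliability is estimated as the correlation between the same statistic computed on two parallel sets of observations. All names and default settings are assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_reliabilities(n_obs=3, n_persons=20000, error_sd=1.0, seed=0):
    """Estimate parallel-forms reliability of a single score, the maximum, and the sum."""
    rng = random.Random(seed)
    single_a, single_b = [], []
    max_a, max_b = [], []
    sum_a, sum_b = [], []
    for _ in range(n_persons):
        true_score = rng.gauss(0.0, 1.0)
        # Two parallel sets of n_obs observations sharing the same true score.
        form_a = [true_score + rng.gauss(0.0, error_sd) for _ in range(n_obs)]
        form_b = [true_score + rng.gauss(0.0, error_sd) for _ in range(n_obs)]
        single_a.append(form_a[0])
        single_b.append(form_b[0])
        max_a.append(max(form_a))
        max_b.append(max(form_b))
        sum_a.append(sum(form_a))
        sum_b.append(sum(form_b))
    return {
        "single": pearson(single_a, single_b),
        "maximum": pearson(max_a, max_b),
        "composite": pearson(sum_a, sum_b),
    }
```

With the default settings the estimated reliability of the maximum falls between that of a single observation and that of the composite, in line with the result stated above.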

14.

Beyond the qualitative paradigm: A framework for introducing diversity within qualitative psychology

Karen Henwood Nick Pidgeon 《Journal of community & applied social psychology》1994,4(4):225-238

The case for qualitative research in psychology is considered. We argue against the idea that qualitative research is merely a matter of technique or method, and question the utility of viewing it as a unitary paradigm. Rather, the links between epistemology, methodology, and method are explored within three theorized strands of qualitative inquiry, making reference to illustrative projects. Each strand is organized around a different approach to the issues of justifying and warranting psychological knowledge: (1) reliability and validity; (2) generativity and grounding; and (3) discourse and reflexivity. These are exemplified in Miles and Huberman's ‘data display’ model, Glaser and Strauss' method of ‘grounded theory’, and in various forms of ‘discourse’ analysis. Reflections upon points of contact between the three strands address two main issues: (1) rendering research publicly accountable; and (2) challenging relativism.

15.

On the estimation of parameters in latent structure analysis

Leo A. Goodman 《Psychometrika》1979,44(1):123-128

In this note, we describe the iterative procedure introduced earlier by Goodman to calculate the maximum likelihood estimates of the parameters in latent structure analysis, and we provide here a simple and direct proof of the fact that the parameter estimates obtained with the iterative procedure cannot lie outside the allowed interval. Formann recently stated that Goodman's algorithm can yield parameter estimates that lie outside the allowed interval, and we prove in the present note that Formann's contention is incorrect. This research was supported in part by Research Contract No. NSF SOC 76-80389 from the Division of the Social Sciences of the National Science Foundation. The author is indebted to C. C. Clogg for helpful comments and for the numerical results reported here (see, e.g., Table 1).

16.

Four bootstrap confidence intervals for the binomial-error model

Miao-Hsiang Lin Chao A. Hsiung 《Psychometrika》1992,57(4):499-520

Confidence intervals for the mean function of the true proportion score (μ_{ζ|x}), where ζ and x respectively denote the true proportion and observed test scores, can be approximated by the Efron, Bayesian, and parametric empirical Bayes (PEB) bootstrap procedures. The similarity of results yielded by all the bootstrap methods suggests the following: the unidentifiability problem of the prior distribution g(ζ) can be bypassed with respect to the construction of confidence intervals for the mean function, and a beta distribution for g(ζ) is a reasonable assumption for the test scores in compliance with a negative hypergeometric distribution. The PEB bootstrap, which reflects the construction of Morris intervals, is introduced for computing predictive confidence bands for ζ|x. It is noted that the effect of test reliability on the precision of interval estimates varies with the two types of confidence statements concerned. The authors are indebted to the Editor and anonymous reviewers for constructive suggestions and comments. The authors wish to thank Min-Te Chao and Cheng-Der Fuh for some useful suggestions at earlier stages of writing this paper.

17.

A graphical judgmental aid which summarizes obtained and chance reliability data and helps assess the believability of experimental effects

Birkimer JC Brown JH 《Journal of applied behavior analysis》1979,12(4):523-533

Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.

18.

A latent class distance association model for cross‐classified data with a categorical response variable

José Fernando Vera Mark de Rooij Willem J. Heiser 《The British journal of mathematical and statistical psychology》2014,67(3):514-540

In this paper we propose a latent class distance association model for clustering in the predictor space of large contingency tables with a categorical response variable. The rows of such a table are characterized as profiles of a set of explanatory variables, while the columns represent a single outcome variable. In many cases such tables are sparse, with many zero entries, which makes traditional models problematic. By clustering the row profiles into a few specific classes and representing these together with the categories of the response variable in a low‐dimensional Euclidean space using a distance association model, a parsimonious prediction model can be obtained. A generalized EM algorithm is proposed to estimate the model parameters and the adjusted Bayesian information criterion statistic is employed to test the number of mixture components and the dimensionality of the representation. An empirical example highlighting the advantages of the new approach and comparing it with traditional approaches is presented.

19.

Model based clustering of large data sets: Tracing the development of spelling ability

Herbert Hoijtink Annelise Notenboom 《Psychometrika》2004,69(3):481-498

There are two main theories with respect to the development of spelling ability: the stage model and the model of overlapping waves. In this paper exploratory model based clustering will be used to analyze the responses of more than 3500 pupils to subsets of 245 items. To evaluate the two theories, the resulting clusters will be ordered along a developmental dimension using an external criterion. Solutions for three statistical problems will be given: (1) an algorithm that can handle large data sets and only renders non-degenerate clusters; (2) a goodness of fit test that is not affected by the fact that the number of possible response vectors by far outweighs the number of observed response vectors; and (3) a new technique, data expunction, that can be used to evaluate goodness-of-fit tests if the missing data mechanism is known. Research supported by a grant (NWO 411-21-006) of the Dutch Organization for Scientific Research.

20.

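Performance of the classification accuracy index Entropy in latent profile analysis: A Monte Carlo simulation study

Wang Mengcheng Deng Qiaowen Bi Xiangyang Ye Haosheng Yang Wendeng 《心理学报》 (Acta Psychologica Sinica) 2017,(11):1473-1482

This study used Monte Carlo simulation to examine how the classification accuracy index Entropy and its variants are affected by sample size, number of latent classes, class distance, number of indicators, and their combinations. The results show that: (1) although Entropy values are highly correlated with classification accuracy, they change with the number of classes, the sample size, and the number of indicators, so it is difficult to fix a single cutoff value; (2) with other conditions held constant, the larger the sample size, the smaller the Entropy value and the poorer the classification accuracy; (3) the effect of class distance on classification accuracy is consistent across sample sizes and numbers of classes; (4) with small samples (N = 50 to 100), the more indicators there are, the better the Entropy results; (5) under all conditions, Entropy is more sensitive to the classification error rate than its other variants.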
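For readers unfamiliar with the Entropy index, the sketch below computes the relative entropy statistic as it is commonly defined in mixture-modeling software: one minus the summed entropy of the posterior class probabilities, divided by N times the log of the number of classes. Whether the study used exactly this normalization is an assumption, and the function name is hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy of a classification, scaled to lie between 0 and 1.

    posteriors: one row per case, each row holding that case's posterior
    probabilities of belonging to each of the K latent classes.
    Values near 1 indicate that cases are assigned with high certainty.
    """
    n_cases = len(posteriors)
    n_classes = len(posteriors[0])
    if n_classes < 2:
        raise ValueError("at least two classes are required")

    total_entropy = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # the term p * log(p) is taken as 0 when p is 0
                total_entropy -= p * math.log(p)
    return 1.0 - total_entropy / (n_cases * math.log(n_classes))
```

A matrix in which every case has posterior probability 1 for a single class yields a value of 1, while uniform posteriors yield 0.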