Found 20 similar articles (search time: 31 ms)
An update is presented on the steady decline in the education and training of psychometricians, quantitative psychologists, and personality assessment psychologists in North America.

Mark Wilson 《Psychometrika》2013,78(2):211-236
In this paper, I will review some aspects of psychometric projects that I have been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work. The intent is to seek to understand where psychometrics, as a discipline, has been and where it might be headed, in part at least, by considering one particular journey (my own). In contemplating this, I also look to psychometrics journals to see how psychometricians represent themselves to themselves, and in a complementary way, look to substantive journals to see how psychometrics is represented there (or perhaps, not represented, as the case may be). I present a series of questions in order to consider the issue of what are the appropriate foci of the psychometric discipline. As an example, I present one recent project at the end, where the roles of the psychometricians and the substantive researchers have had to become intertwined in order to make satisfactory progress. In the conclusion I discuss the consequences of such a view for the future of psychometrics.

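Decision consistency refers to the degree to which examinees are classified consistently across two parallel test administrations, and it is an important index of the quality of criterion-referenced tests. To date, researchers have proposed dozens of methods for estimating decision consistency indices, based on classical measurement models and item response models, and have compared the strengths and weaknesses of these methods. Because the methods differ in their model foundations and in their assumptions about the score distribution, they suit different testing contexts. Future research should validate the existing methods and explore ways of applying decision consistency in educational measurement, so as to give educational and psychological measurement practitioners a basis for estimating the decision consistency indices of their tests.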
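The definition above can be illustrated with a minimal sketch. It assumes that scores from two parallel forms are actually available for the same examinees, which is the definitional case rather than one of the single-administration estimation methods the abstract surveys. The function name `decision_consistency`, the cut score, and the sample data are hypothetical; the two quantities computed are the raw agreement proportion and Cohen's kappa.

```python
def decision_consistency(scores_a, scores_b, cut):
    """Return (p0, kappa) for pass/fail decisions on two parallel forms.

    p0    : proportion of examinees classified the same way on both forms
    kappa : agreement corrected for chance (Cohen's kappa)
    An examinee passes a form when the score is >= cut (an assumption).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(scores_a)
    pass_a = [s >= cut for s in scores_a]
    pass_b = [s >= cut for s in scores_b]

    # Observed agreement: same decision on both forms.
    p0 = sum(a == b for a, b in zip(pass_a, pass_b)) / n

    # Chance agreement from the marginal pass rates of each form.
    rate_a = sum(pass_a) / n
    rate_b = sum(pass_b) / n
    pc = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else float("nan")
    return p0, kappa


# Hypothetical scores for eight examinees on two parallel forms.
form_a = [12, 18, 25, 9, 30, 21, 15, 27]
form_b = [14, 17, 22, 11, 28, 19, 16, 26]
p0, kappa = decision_consistency(form_a, form_b, cut=20)
print(p0, kappa)
```

In practice a second administration is rarely available, which is why the methods reviewed in the abstract estimate these indices from a single test under model assumptions.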

T. Krishnan 《Psychometrika》1973,38(3):291-304
A method is given for finding a linear combination of binary item scores that minimizes the expected frequency of misclassification in discriminating between two groups. The item scores are not assumed to be stochastically independent. The method uses the theory of threshold functions, developed by electrical engineers. Since psychometricians may not be familiar with this theory, an elementary introduction to the required material is also given.

Despite significant advances over the past three decades in our understanding of the implicit dangers in ad hoc psychometric procedures, some important questions remain, particularly as regards the nature of the underlying reasoning process by which subjectively meaningful theoretical impressions are formulated and expressed. The present article seeks to address this issue, with particular attention being given to the distinction between demonstrative and dialectical reasoning. Evidence is offered to show that, at least under certain conditions, intuitive psychometricians formulate and express theoretical impressions on the basis of a reasoning process that is essentially dialectical in nature. Some major implications of this point of view for the study of personality theory are discussed. The limitations of the evidence presented, and the need for further research, are forgone.

This article provides the theory and application of the 2-stage maximum likelihood (ML) procedure for structural equation modeling (SEM) with missing data. The validity of this procedure does not require the assumption of a normally distributed population. When the population is normally distributed and all missing data are missing at random (MAR), the direct ML procedure is nearly optimal for SEM with missing data. When missing data mechanisms are unknown, including auxiliary variables in the analysis will make the missing data mechanism more likely to be MAR. It is much easier to include auxiliary variables in the 2-stage ML than in the direct ML. Based on the most recent developments for missing data with an unknown population distribution, the article first provides the least technical material on why the normal distribution-based ML generates consistent parameter estimates when the missing data mechanism is MAR. The article also provides sufficient conditions for the 2-stage ML to be a valid statistical procedure in the general case. For the application of the 2-stage ML, an SAS IML program is given to perform the first-stage analysis and EQS codes are provided to perform the second-stage analysis. An example with open- and closed-book examination data is used to illustrate the application of the provided programs. One aim is for quantitative graduate students/applied psychometricians to understand the technical details for missing data analysis. Another aim is for applied researchers to use the method properly.

The Reduced Reparameterized Unified Model (Reduced RUM) is a diagnostic classification model for educational assessment that has received considerable attention among psychometricians. However, the computational options for researchers and practitioners who wish to use the Reduced RUM in their work, but do not feel comfortable writing their own code, are still rather limited. One option is to use a commercial software package, such as Latent GOLD or Mplus, that offers an implementation of the expectation maximization (EM) algorithm for fitting (constrained) latent class models. But using a latent class analysis routine as a vehicle for fitting the Reduced RUM requires that it be re-expressed as a logit model, with constraints imposed on the parameters of the logistic function. This tutorial demonstrates how to implement marginal maximum likelihood estimation using the EM algorithm in Mplus for fitting the Reduced RUM.

This article is based largely upon the author's invited address at the 113th annual convention of the American Psychological Association, Washington, DC, as the 2005 recipient of the Samuel J. Messick Award bestowed by APA Division 5 and the Educational Testing Service. The author summarizes the growth of graduate training in psychometrics and quantitative psychology in the years prior to and following the end of WWII. He then laments the steady decline in the training of psychometricians and quantitative psychologists beginning in the 1970s and continuing into the 21st century. Likely causes of the decline are inferred and prospects for strengthening the quantitative skills of doctorates are discussed, including recommendations for reversing the downward trend.

Borsboom (2006) attacks psychologists for failing to incorporate psychometric advances in their work, discusses factors that contribute to this regrettable situation, and offers suggestions for ameliorating it. This commentary applauds Borsboom for calling the field to task on this issue and notes additional problems in the field regarding measurement that he could add to his critique. It also chastises Borsboom for occasionally being unnecessarily pejorative in his critique, noting that negative rhetoric is unlikely to make converts of offenders. Finally, it exhorts psychometricians to make their work more accessible and points to Borsboom, Mellenbergh, and Van Heerden (2003) as an excellent example of how this can be done. I wish to thank Frank Schmidt for his help in preparing this paper. Requests for reprints should be sent to la-clark@uiowa.edu.

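Chen Ping, Xin Tao 《Acta Psychologica Sinica》2011,43(7):836-850
Item replenishment is essential to the development and maintenance of item banks for cognitive diagnostic computerized adaptive testing (CD-CAT). Borrowing the idea of joint maximum likelihood estimation (JMLE) from unidimensional item response theory (IRT), this study proposes the joint estimation algorithm (JEA), which relies only on examinees' responses to the old items and the new items to jointly and automatically estimate the attribute vectors and the item parameters of the new items. The results show that when the item parameters are relatively small and the sample size is relatively large, JEA performs well in terms of the estimation accuracy of both the attribute vectors and the item parameters of the new items. In addition, the sample size, the magnitude of the item parameters, and the initial values of the item parameters all affect the performance of JEA.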

The American Board of Genetic Counseling (ABGC) performed a genetic counseling practice analysis (PA) to determine the content of the certification examination. The ABGC-appointed PA Advisory Committee worked with psychometricians to develop a survey which was distributed to 2,038 genetic counselors in the United States and Canada. The survey was also accessible on the ABGC website. Multiple criteria were used to establish the significance of the tasks included in the survey. A total of 677 responses were used in the analysis, representing a 37.1% corrected response rate. Five major content domains with 143 tasks were identified in the PA. New certification test specifications were developed on the basis of PA results and will be used in developing future examination forms. In keeping with credentialing standards, ABGC plans to conduct a PA on a regular basis so that the content of the examination reflects current practice.

The Kaufman Assessment Battery for Children (K-ABC) has been touted in the popular press and the publisher's advertising as a nonbiased measure of children's intellectual functioning. The claims are based substantially on a reduction of the mean differences in overall test performance between blacks and whites on the K-ABC. However, mean differences are the most often rejected indicators of test bias by most serious psychometricians, and while the issue of bias is complex and requires multiple approaches, one necessary test of bias requires a comparison of the predictive validity of the scale for blacks and for whites. The present study reports on the results of an investigation of the criterion-related validity of the K-ABC, predicting reading comprehension, arithmetic, and general achievement, for large samples of blacks and whites tested during the standardization of the battery. The statistical technique of Potthoff is used to test for bias under the regression definition of Cleary. The results are reported separately for several age groups. The Sequential and Mental Processing Composite scales tended to overpredict black children's academic levels, especially on the K-ABC achievement scales. Some differences also occurred with the Simultaneous Processing scale but with less frequency.

Current psychometric models of choice behavior are strongly influenced by Thurstone’s (1927, 1931) experimental and statistical work on measuring and scaling preferences. Aided by advances in computational techniques, choice models can now accommodate a wide range of different data types and sources of preference variability among respondents induced by such diverse factors as person-specific choice sets or different functional forms for the underlying utility representations. At the same time, these models are increasingly challenged by behavioral work demonstrating the prevalence of choice behavior that is not consistent with the underlying assumptions of these models. I discuss new modeling avenues that can account for such seemingly inconsistent choice behavior and conclude by emphasizing the interdisciplinary frontiers in the study of choice behavior and the resulting challenges for psychometricians. The author would like to thank R. Darrell Bock whose work inspired many of the ideas presented here. The paper benefitted from helpful comments by Albert Maydeu-Olivares and Rung-Ching Tsai. The reported research was supported in part by the Social Sciences and Humanities Research Council of Canada.

An analysis of social desirability in personality assessment is presented. Starting with the symptoms, Study 1 showed that mean ratings of graded personality items are moderately to strongly linearly related to social desirability (Self Deception, Impression formation, and the first Principal Component), suggesting that item popularity may be a useful heuristic tool for identifying items which elicit socially desirable responding. We diagnose the cause of socially desirable responding as an interaction between the evaluative content of the item and enhancement motivation in the rater. Study 2 introduced a possible cure: evaluative neutralization of items. To test the feasibility of the method, lay psychometricians (undergraduates) reformulated existing personality test items according to written instructions. The new items were indeed lower in social desirability while essentially retaining the five factor structure and reliability of the inventory. We conclude that although neutralization is no miracle cure, it is simple and has beneficial effects.

Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.

Despite definitions in standard sources, personnel managers, psychologists, and psychometricians persistently encounter problems that are best referred to as the ambiguous nature of validity. The purpose of this article is to provide an overview of construct validity and personnel testing, to demonstrate its practical utility, and to clarify with concrete examples certain theories and models, as well as to illustrate the meaning of the terminology used by commentators on the topic. A brief historical overview of testing and validation is presented; the progress of construct validity and its acceptance by various sectors of society is discussed parsimoniously in the section "The Seven Wonders of Personnel Psychology." In the past, personnel psychologists have not done a very good job of understanding the constructs that underlie test performance. Some new approaches can help to correct this. A process should be routinely used on all tests in order to develop an understanding of the constructs that underlie performance on an employment test; only by knowing the correct criterion and method of measuring it can we ascertain the intrinsic validity of our measures.

The paper surveys 15 years of progress in three psychometric research areas: latent dimensionality structure, test fairness, and skills diagnosis of educational tests. It is proposed that one effective model for selecting and carrying out research is to choose one's research questions from practical challenges facing educational testing, then bring to bear sophisticated probability modeling and statistical analyses to solve these questions, and finally to make the effectiveness of the research answers in meeting the educational testing challenges the ultimate criterion for judging the value of the research. The problem-solving power and the joy of working with a dedicated, focused, and collegial group of colleagues are emphasized. Finally, it is suggested that the summative assessment testing paradigm that has driven test measurement research for over half a century is giving way to a new paradigm that in addition embraces skills level formative assessment, opening up a plethora of challenging, exciting, and societally important research problems for psychometricians.

Assessing item fit for unidimensional item response theory models for dichotomous items has always been an issue of enormous interest, but there exists no unanimously agreed item fit diagnostic for these models, and hence there is room for further investigation of the area. This paper employs the posterior predictive model‐checking method, a popular Bayesian model‐checking tool, to examine item fit for the above‐mentioned models. An item fit plot, comparing the observed and predicted proportion‐correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p‐values (which are natural Bayesian p‐values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in the above‐mentioned item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques seem to have adequate power and reasonable Type I error rate, and psychometricians will find them promising.
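The observed side of the item fit plot described above can be sketched in a few lines. This is only an illustration under stated assumptions: it groups examinees by raw score and reports the proportion answering one item correctly, and it does not compute the predicted proportions, which in the abstract's approach would come from posterior draws of a fitted model. The function name `observed_proportion_correct` and the response matrix are hypothetical.

```python
from collections import defaultdict


def observed_proportion_correct(responses, item):
    """Map each raw score to the observed proportion correct on `item`.

    responses : list of rows, one per examinee, with 0/1 item scores
    item      : zero-based index of the item being examined
    """
    by_score = defaultdict(list)
    for row in responses:
        # Raw score is the number of items answered correctly.
        by_score[sum(row)].append(row[item])
    return {score: sum(vals) / len(vals)
            for score, vals in sorted(by_score.items())}


# Hypothetical 0/1 responses of five examinees to three items.
data = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(observed_proportion_correct(data, item=0))
```

Plotting these observed values against model-predicted values for each raw-score group would give the kind of comparison the abstract refers to.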
