Similar literature
 Found 20 similar documents; search took 15 ms
1.
SUMMARY

The validity and reliability of the new high-stakes testing systems initiated in school systems across the United States in recent years, in response to the accountability features mandated by the No Child Left Behind legislation, depend largely on item response theory and new rules of measurement. Reliability and validity in item response theory and classical test theory are reviewed. Additionally, practices in the states are considered. The conclusion of the paper is that the new test technology is theoretically better suited to assess achievement than classical test theory, but has not been shown to be valid and reliable enough for use as the sole criterion for determining what was learned in school. Further, there is no evidence that it will ever be found to be valid and reliable enough for that purpose. Areas of additional needed research are considered.

2.
3.
Although paper-and-pencil tests of employee honesty are becoming increasingly widespread in industry, a paucity of research exists regarding them. In a recent review of this literature, Sackett and Harris (1984) noted that scant psychometric evidence is available as to their merits or weaknesses. The aim of this paper is to report on the factor and item analysis of one such test. A principal axis solution and a 1-parameter item response theory model were used to examine the data. The factor analysis revealed four readily interpretable factors. With regard to the item analysis, the results indicated that most of the 40 items showed a reasonable fit to the model. The implications of this research are addressed.

4.
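This paper first analyzes the limitations of classical test theory. Building on two key concepts, latent trait theory and the item characteristic curve, it then explains the measurement principles of item response theory and its basic models, and introduces several of those basic models. Finally, it briefly introduces seven current hot topics in applying item response theory to guide the construction of large item banks and the development of new types of tests.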
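The item characteristic curve mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which models the paper covers, so the standard logistic family (1PL, 2PL, 3PL) is assumed here; the function name `icc` and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def icc(theta, a=1.0, b=0.0, c=0.0):
    """Item characteristic curve of the three-parameter logistic model.

    theta: latent trait (ability) of the examinee
    a: discrimination, b: difficulty, c: lower asymptote (guessing)
    With c = 0 this reduces to the 2PL model; with a = 1 and c = 0, to the 1PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderately discriminating, average difficulty, no guessing.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(icc(theta, a=1.2, b=0.0), 3))
```

The probability of a correct response rises monotonically with the latent trait, and an examinee whose ability equals the item difficulty has a success probability halfway between the lower asymptote and 1.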

5.
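New developments in test theory: multidimensional item response theory   Total citations: 3, self-citations: 0, citations by others: 3
Multidimensional item response theory is a new test theory that developed against the twin backgrounds of factor analysis and unidimensional item response theory. Depending on how multiple abilities interact when an examinee completes a task, multidimensional item response models fall into two classes: compensatory models and noncompensatory models. After systematically introducing the compensatory models in common use today, this paper points out that future research should attend to four areas of multidimensional item response theory: multidimensional models for polytomous scoring and high-dimensional spaces, the integration of compensatory and noncompensatory models, the development of parameter estimation programs, and multidimensional test equating.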
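The distinction between compensatory and noncompensatory models can be sketched as follows. This is only an illustration under assumed textbook forms (a logistic of a weighted sum of abilities for the compensatory case, a product of per-dimension logistics for the noncompensatory case); the abstract does not give the paper's exact parameterization, and all function names and values below are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def compensatory(theta, a, d):
    """Assumed compensatory form: abilities are summed before the link,
    so a high ability on one dimension can offset a low one on another."""
    return logistic(sum(ak * tk for ak, tk in zip(a, theta)) + d)

def noncompensatory(theta, a, b):
    """Assumed noncompensatory form: per-dimension probabilities are multiplied,
    so a low ability on any one dimension caps the overall probability."""
    p = 1.0
    for tk, ak, bk in zip(theta, a, b):
        p *= logistic(ak * (tk - bk))
    return p

# Hypothetical examinee: very strong on dimension 1, very weak on dimension 2.
theta = [3.0, -3.0]
print(round(compensatory(theta, a=[1.0, 1.0], d=0.0), 3))
print(round(noncompensatory(theta, a=[1.0, 1.0], b=[0.0, 0.0]), 3))
```

Under these assumptions the two abilities cancel out in the compensatory model, while in the noncompensatory model the weak dimension keeps the success probability low no matter how strong the other dimension is.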

6.
Signal Detection Theory (SDT; MacMillan & Creelman, 1991) is a method of data collection that has been used for several years, which describes the decision-making strategies of individuals. However, its use has been largely restricted to experiments involving sensation and perception. The Overclaiming Questionnaire (OCQ; Paulhus & Bruce, 1990) is a scale that has been developed to measure intellectual ability and personality, using SDT as a guideline. Although the scale has been successful in measuring human characteristics such as narcissism and intelligence, it is still unclear how to measure the characteristics of the various stimuli used (e.g., item difficulty, item discrimination, etc.). In some ways, this is a direct consequence of the general lack of research involved in item parameter estimation in the field of SDT. Using the OCQ, this article presents a graphical and nonparametric form of item response modeling to address this issue. In many ways, the approach is influenced by and structured around item response theory (IRT; Hambleton, Swaminathan, & Rogers, 1991). The general features of both SDT and IRT are described. Results suggest that this method is indeed a reasonable approach to describing item functioning, and there are several advantages to using this method over traditional IRT methods. Furthermore, SDT appears to be a fruitful approach to assessing intelligence, ability, and other psychological constructs, with advantages over traditional approaches. Overall, the results provide interesting implications for item selection and test development in several scientific and academic fields.

7.
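Views of reliability in three psychometric theories   Total citations: 5, self-citations: 0, citations by others: 5
At present, three major theoretical schools exist in the field of psychological measurement. This paper briefly introduces these three theories, namely classical test theory, generalizability theory, and item response theory, and focuses on their views of reliability. It discusses the theoretical foundations and research methods of the three views, compares their similarities and differences, and points out some shortcomings of classical test theory and the improvements made by generalizability theory and item response theory. Generalizability theory extends classical test theory, replacing its reliability coefficient with a multidimensional reliability index (the generalizability coefficient). Item response theory instead starts from the perspective of information, using indices such as the item information function and the test information function to reflect the measurement reliability of items and tests more specifically and in greater depth.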
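The information-based view of reliability described in this abstract can be illustrated with a short sketch. It assumes the two-parameter logistic model, for which item information is a² · P(θ) · (1 − P(θ)); the abstract itself does not commit to a particular model, and the function names and item parameters below are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def total_information(theta, items):
    """Test information is the sum of the item information functions."""
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(theta, items):
    """Conditional standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(total_information(theta, items))

# Hypothetical three-item test, each item given as (discrimination, difficulty).
items = [(1.0, -1.0), (1.5, 0.0), (2.0, 1.0)]
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(total_information(theta, items), 3),
          round(standard_error(theta, items), 3))
```

Unlike a single reliability coefficient in classical test theory, the precision here varies with the ability level: each item is most informative near its own difficulty, so the standard error differs across the ability range.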

8.
Scale construction is a growth enterprise in the psychological literature. Unfortunately, many measures promise much but are severely limited by the inadequacies of their conceptualization and execution. In this paper, a model for developing psychological scales is presented that is rooted in the traditions of construct validity and classical test theory but informed by modern psychometric methods. Construct validity is conceptualized as a guiding principle in each of three phases of scale development, focused on (i) construct conceptualization and development of the initial item pool, (ii) item selection and structural validity, and (iii) assessment of external validity vis‐à‐vis other measures and relevant nontest criteria.

9.
Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate item-level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

10.
Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses provided strong support for the Hopelessness Assessment in Illness Questionnaire, which showed greater explanatory power than existing measures of hopelessness, and found little evidence that the assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts.

11.
Knowles ES, Condon CA. Psychological Assessment, 2000, 12(3): 245-252
This article examines item stability when the same item appears in different contexts. The 1st section considers the assumptions in classical test theory and item response theory concerning the relationship between the item and the trait it is presumed to measure. The 2nd section presents contextualist challenges to the measurement theory assumptions about item properties and shows the instability of item characteristics across different testing contexts. The 3rd section describes methods for checking the relationship between items and traits. Classical test methods, item response methods, and structural equation methods for assessing item stability are reviewed. The instability of item characteristics across contexts should caution researchers to assess, and not assume, that items operate the same way on different test versions. Item instability also indicates the need for a more detailed understanding of the psychological processes that occur between item and answer.

12.
The Wisconsin Schizotypy Scales are widely used for assessing schizotypy in nonclinical and clinical samples. However, they were developed using classical test theory (CTT) and have not had their psychometric properties examined with more sophisticated measurement models. The present study employed item response theory (IRT) as well as traditional CTT to examine psychometric properties of four of the schizotypy scales on the item and scale level, using a large sample of undergraduate students (n = 6,137). In addition, we investigated differential item functioning (DIF) for sex and ethnicity. The analyses revealed many strengths of the four scales, but some items had low discrimination values and many items had high DIF. The results offer useful guidance for applied users and for future development of these scales.

13.
Little research has been conducted on the psychometrics of the very short scale (36 items) of the Children's Behavior Questionnaire, and no one-item temperament scale has been tested for use in applied work. In this study, 237 United States caregivers completed a survey to define their child's behavioral patterns (i.e., Surgency, Negative Affectivity, Effortful Control) using both scales. Psychometrics of the 36-item Children's Behavior Questionnaire were examined using classical test theory, principal factor analysis, and item response modeling. Classical test theory analysis demonstrated adequate internal consistency, and factor analysis confirmed a three-factor structure. Potential improvements to the measure were identified using item response modeling. A one-item (three response categories) temperament scale was validated against the three temperament factors of the 36-item scale. The temperament response categories correlated with the temperament factors of the 36-item scale, as expected. The one-item temperament scale may be applicable for clinical use.

14.
We use classical test theory (CTT) and item response theory (IRT) methodologies to examine the psychometric and measurement properties of an instrument designed to assess sexual orientation harassment among military personnel (N = 71,989). CTT analyses indicated that items were unidimensional and exhibited adequate levels of reliability. IRT analyses demonstrated that the items functioned similarly and exhibited appropriate levels of item discrimination. However, the analyses also suggested that the sensitivity of the items may be limited. Differential test functioning analyses provided evidence of the measurement equivalence of the instrument across male and female respondents. The findings provide support for the psychometric properties and measurement equivalence of the instrument for measuring sexual orientation harassment among male and female military personnel. We discuss the implications of our findings for future research on sexual orientation harassment in the workplace.

15.
In this study, we compared classical test theory (CTT) and item response theory (IRT) approaches in analyzing the Center for Epidemiological Studies Depression (CES-D) Scale (Radloff, 1977). Standard item analyses, as well as Rasch (1960) analyses, both revealed item departures from unidimensionality in a sample of 2,455 older persons responding to the CES-D. Positive affect items in the scale performed poorly overall, their removal reducing the scale's bandwidth only slightly. Modeling depression scores derived from Rasch measures and raw totals showed subtle but important differences for statistical inference. The assessment of depressive risk was slightly enhanced by using 16-item scale measures obtained from the results of the Rasch analysis as the dependent variable. Confirmatory factor analysis and parallel analysis verified the advantages of removing positively worded items. IRT and CTT techniques proved to be complementary in this study and can be usefully combined to improve measuring depression.

16.
The main aim of this article is to explicate why a transition to ideal point methods of scale construction is needed to advance the field of personality assessment. The study empirically demonstrated the substantive benefits of ideal point methodology as compared with the dominance framework underlying traditional methods of scale construction. Specifically, using a large, heterogeneous pool of order items, the authors constructed scales using traditional classical test theory, dominance item response theory (IRT), and ideal point IRT methods. The merits of each method were examined in terms of item pool utilization, model-data fit, measurement precision, and construct and criterion-related validity. Results show that adoption of the ideal point approach provided a more flexible platform for creating future personality measures, and this transition did not adversely affect the validity of personality test scores.

17.
The aim of this paper is to develop a new understanding of children's drawings and to provide ideas for future research in early childhood. Starting from classic theories on child graphical development, we proceed to analyze them and provide our own views on the subject. We will also recount a number of relevant empirical studies that appear to validate our theory. Our belief is that emotion and self-expression through movement play a key role in the development of child art, and that this may be already visible during the scribbling stage of drawing.

18.
Creativity is increasingly identified as a key educational outcome at the local, regional, and national levels in several countries. Yet one key issue about the nature of creativity remains controversial: whether creativity is domain specific or domain general. Resolving this issue would significantly impact the way creativity is identified, nurtured, and assessed in our schools. Three hundred and fifty-nine undergraduate and graduate students completed measures that assessed their creative achievements in 6 distinct domains. Results based on item response theory models suggested that creativity was domain general, rather than domain specific, whereas part of the evidence provided by the classical test theory models seemed to favor the domain-specific view. These findings have great implications for researchers and practitioners who aim to assess and promote creativity in schools.

19.
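Liu Hongyun (刘红云), Luo Fang (骆方), Wang Yue (王玥), Zhang Yu (张玉). Acta Psychologica Sinica (心理学报), 2012, 44(1): 121-132
The authors briefly review the factor analysis model for categorical data (CCFA) under the SEM framework and the models relating test items to latent abilities under the MIRT framework, and summarize the main parameter estimation methods under the two frameworks. A simulation study compared the WLSc and WLSMV estimation methods under the SEM framework with the MLR and MCMC estimation methods under the MIRT framework. The results show that: (1) WLSc yielded the largest bias in parameter estimates and had convergence problems; (2) as sample size increased, the precision of all item parameter estimates improved, and the precision of WLSMV and MLR estimates differed very little and in most cases was no worse than that of MCMC; (3) except for WLSc, the precision of parameter estimates gradually increased as the number of test items per dimension increased; (4) the number of test dimensions had a relatively large effect on the discrimination and difficulty parameters, and a relatively small effect on item factor loadings and thresholds; (5) the precision of item parameter estimates was affected by the number of dimensions an item measures, with items that measure only one dimension estimated more precisely. The article also offers some suggestions on issues to watch for when applying the two approaches in practice.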

20.
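Hierarchical linear modeling is an advanced statistical method for handling hierarchically structured data, and item response theory is a modern measurement theory for precisely measuring examinee ability. Multilevel item response theory combines the two by nesting the item response model within the hierarchical linear model. This makes it possible to estimate item parameters and ability parameters at different levels, and it yields more precise estimates of regression coefficients and error variances. The authors outline the development of multilevel item response theory, review its applications in psychological and educational measurement in terms of differential item functioning, test equating, and school effectiveness research, summarize its value, and look ahead to future research trends.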
