Similar literature
20 similar articles found.
1.
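Reliability estimation formulas for criterion-referenced tests   Total citations: 4 (self-citations: 0, citations by others: 4)
Chen Xizhen. Acta Psychologica Sinica, 1996, 29(4): 436-442
A criterion-referenced test differs from a norm-referenced test: an examinee's score is compared not with other examinees' scores but with a preset standard. If the test is built by random sampling from a course domain, users want to know how close an examinee's observed score on the test is to his or her score on that domain (were it known). If users want to classify examinees by mastery on the basis of test scores, they care about how consistent that inference is with the one that would be made if the examinee's domain score were known. This paper examines reliability estimation for these two questions and derives several useful estimation formulas.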

2.
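Qualification and certification tests are typical criterion-referenced tests and are widely used in China, yet reports of their psychometric properties seldom include reliability indices for the criterion-referenced model. This paper summarizes the system of reliability indices for criterion-referenced tests and discusses the reliability estimates suited to certification tests, including how they relate to test length, the distribution of the cut-off standard, and sample homogeneity.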

3.
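Under classical test theory, the traditional ways of setting cut scores for criterion-referenced tests are grade-based scoring or direct designation of a cut score, and the range of available methods needs to be expanded. The Bookmark method is a standard-setting method based on item response theory: subject-matter experts start from the ability parameter values of the test materials and, using the quantitative relationship between mastery percentage scores and examinee ability levels, set multiple cut scores. It is more efficient and precise than the traditional methods. The authors review the basic principles and implementation procedure of the Bookmark method, analyze its application prospects, and review research on the reliability, validity, and standard-error estimation of cut scores set with it.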

4.
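By manipulating the conditions of memory retrieval, this experiment compared memory performance under self-reference, other-reference, and no-reference (positive/negative valence) tasks. The results were: (1) self-referential encoding showed a significant retrieval advantage in the recognition test (F(2,41)=6.097, p<0.01); (2) self-referential encoding showed no significant retrieval advantage in the color congruent/incongruent judgment (F(2,41)=1.039, p=0.363); (3) in the recognition test, recognition discriminability for color-incongruent old words was higher than in the congruent condition (F(1,41)=3.139, p=0.084).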

5.
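To address problems in research on setting passing scores for criterion-referenced tests, this paper first analyzes the basic principles of Monte Carlo simulation and then proposes a simulation approach to passing-score research: treat each expert's subjective judgment as a probabilistic event, make reasonable assumptions about the probability distributions of the various errors in the judgment process, simulate the experts' judgments from an item response theory model, and then use repeated sampling to obtain the error distribution of the passing score set by the experts, which is used to gauge how well the passing score is recovered. A worked example is given, and the paper closes by discussing the strengths and weaknesses of the simulation approach and the outlook for follow-up research.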
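The simulation logic described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes an Angoff-style judgment task, a 2PL response model, normally distributed judgment error, and made-up item parameters. The names `p_correct` and `simulate_cut_scores` are hypothetical.

```python
import math
import random
import statistics

def p_correct(theta, a, b):
    """2PL probability that an examinee at ability theta answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_cut_scores(items, theta_cut, n_experts, sd_error, n_reps, seed=0):
    """Simulate panel cut scores; returns (error-free cut score, list of simulated cut scores).

    Assumption: each expert rates every item as the true probability for a
    borderline examinee plus N(0, sd_error) noise, clipped to [0, 1].
    """
    rng = random.Random(seed)
    true_probs = [p_correct(theta_cut, a, b) for a, b in items]
    cuts = []
    for _ in range(n_reps):
        panel_total = 0.0
        for _ in range(n_experts):
            ratings = [min(1.0, max(0.0, p + rng.gauss(0.0, sd_error)))
                       for p in true_probs]
            panel_total += sum(ratings)
        cuts.append(panel_total / n_experts)  # panel cut score = mean of expert sums
    return sum(true_probs), cuts

# Hypothetical (discrimination, difficulty) pairs for a four-item test.
items = [(1.0, -1.0), (0.8, 0.0), (1.2, 0.5), (0.9, 1.0)]
true_cut, cuts = simulate_cut_scores(items, theta_cut=0.0, n_experts=10,
                                     sd_error=0.1, n_reps=500)
print(true_cut, statistics.mean(cuts), statistics.stdev(cuts))
```

The spread of `cuts` around `true_cut` plays the role of the error distribution mentioned in the abstract; other error models (for example, systematic leniency of individual experts) would need their own assumptions.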

6.
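The self-reference paradigm is an important paradigm for exploring the development of self-cognition, and finding a reliable method that yields stable results has long been both a focus and a difficulty in research with children. This study examined children aged 3 to 5, using ownership reference for encoding and picture recognition and source judgment for testing. To ensure stable and reliable results, each child took part in four test sessions. The results showed a self-reference effect in picture recognition for 4- and 5-year-olds and in source judgment for 3- and 4-year-olds. Analysis of the source-judgment results showed that self-reference performance remained constant across age groups whereas other-reference performance improved gradually with age; this difference in developmental trend is the main reason 5-year-olds showed no self-reference effect in source judgment. These results indicate, first, that the self-reference effect is generally present in children aged 3 to 5, although the test method affects the age at which it appears; second, that self source-judgment ability develops earlier than other source-judgment ability, with the latter improving continuously between ages 3 and 5; and third, as the comparison across the four sessions shows, that source-judgment tests are more stable than recognition tests.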

7.
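A study of the Putonghua test using multivariate generalizability theory   Total citations: 5 (self-citations: 0, citations by others: 5)
Yang Zhiming, Zhang Lei. Acta Psychologica Sinica, 2002, 34(1): 51-56
Multivariate generalizability theory (MGT) was used to study the Putonghua test developed by the State Language Commission. In the G study, data from Putonghua testing of Hong Kong examinees were used to estimate the variance and covariance components for the sources of score variation. In the D study, the universe scores of the test's three parts and their respective generalizability coefficients and other technical indices were estimated first, followed by the composite universe score and its generalizability coefficient, signal-to-noise ratio, and other indices. The results show that the test's reliability is high overall and that combining the universe scores of the three parts is reasonable, but that, in detail, the reliability of Part 3 is low. In addition, with 3 raters and 28 items, the reliability of Parts 1 and 2 is already high, so reducing the number of items in these two parts in operational testing would not cause much of a problem.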

8.
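Chen Ping, Dai Yi, Huang Yingshi. Advances in Psychological Science, 2023, (10): 1966-1980
The test mode effect (TME) refers to differences in test functioning that arise when the same test is administered in different modes. TME can affect test fairness, selection standards, and test equating, so detecting it accurately and interpreting it soundly is important. This paper systematically reviews the sources of TME, its detection (including experimental designs and detection methods), and research findings, giving a full picture of the methodology of TME research. Further interpretation of TME models, extension of the test modes studied, and application of TME findings to large-scale educational assessment programs in China are all important future directions for the field.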

9.
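Normative analysis describes in detail the criteria and features that an ideal implicit measure should have. An examination of how well three new variants of the Implicit Association Test (IAT) meet these normative criteria and features found that, compared with the Brief IAT, the Recoding-Free IAT and the Single-Block IAT better meet the What criterion and the How criterion. Future research should pay more attention to the Implicit criterion.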

10.
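Lu Xiefeng, Tang Yuanhong, Wang Mengcheng. Journal of Psychological Science, 2012, 35(6): 1453-1458
The frame-of-reference effect in personality testing refers to the improvement in criterion-related validity obtained when a specific frame of reference is added to a general personality test. Over the past decade or so, the focus of research has shifted from the early gathering of validity evidence to the underlying mechanisms. Researchers have tried to explain the measurement principles behind the phenomenon through the logical link between the frame of reference and the criterion, and through between-person and within-person variability in the frame of reference. At the construct level, a "hierarchical model of personality and role identity" has been proposed to account for the personality mechanism of the effect. Research on this topic is still at an early stage, however; future work could seek breakthroughs in the operational paradigm for frames of reference and in the moderating mechanisms of the effect.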

11.
This paper provides an empirical comparison of two methods of attribute valuation: the analytic hierarchy process (AHP) and conjoint analysis. Variants within each approach are also examined. The results of two empirical studies indicate that the methods differ in their predictive and convergent validity. Within the AHP methods no significant difference in predictive validity was found. Within the conjoint methods, the ranking method significantly outperformed the rating method. The difference in predictive validity between the AHP and conjoint methods was significant in the second study but not in the first study, suggesting superior performance of the AHP over conjoint analysis in complex problems. Copyright © 1998 John Wiley & Sons, Ltd.

12.
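Construction of a safe-driving aptitude test for automobile drivers   Total citations: 11 (self-citations: 0, citations by others: 11)
The aim of this study was to construct a safe-driving aptitude test for automobile drivers suited to China. Various checks showed that both the ability tests and the personality tests met the requirements for reliability and validity, laying a foundation for further standardization.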

13.
Miller and Rohling (2001) proposed a 24-step algorithm, the Rohling Interpretive Method (RIM), for quantitative interpretation of results from flexible neuropsychological test batteries. We believe that the RIM as presented in that paper has several conceptual problems, including (a) a failure to distinguish "statistically significant" from pathological differences, (b) an assumption that declines in specific abilities can be inferred when a particular test score deviates from an estimate of general premorbid ability, and (c) confusion between the standard deviation associated with individual test scores versus that of a composite of those scores. As an alternative, we suggest the value of developing and using co-normed comprehensive neuropsychological test batteries from which test users might select subsets of tests.

14.
A multiple-answer multiple-choice test item has a certain number of alternatives, any number of which might be keyed. The examinee is also allowed to mark any number of alternatives. This increased flexibility over the one keyed alternative case is useful in practice but raises questions about appropriate scoring rules. In this article a certain class of item scoring rules called the binary class is considered. The concepts of standard scoring rules and equivalence among these scoring rules are introduced in the misinformation model for which the traditional knowledge model is a special case. The examinee's strategy with respect to a scoring rule is examined. The critical role of a quantity called the scoring ratio is emphasized. In the case of examinee uncertainty about the number of correct alternatives on an item, a Bayes and a minimax strategy for the examinee are developed. Also an appropriate response for the examiner to the minimax strategy is outlined. Research partially supported under Grants N00014-67-A-0314-0022 from the Office of Naval Research and GS-32514 and MPS 75-07539 from the National Science Foundation.

15.
Current research on the nature of attentional deficits in attention-deficit hyperactivity disorder (ADHD) is reviewed with a focus on studies using event-related potentials (ERPs). A robust effect is the smaller amplitude of the P3 wave to both auditory and visual stimuli for individuals with ADHD than for normal controls. This effect is indicative of deficits in the allocation of attentional resources to task-relevant information. The Nd wave is also smaller for individuals with ADHD, an effect that suggests impairments in the discrimination and preferential processing of task-relevant information, rather than distractibility by irrelevant stimuli. Studies of the effects of methylphenidate on ERPs in ADHD indicate that this stimulant enhances the quality of stimulus evaluation and speeds response time. These specific effects support the view that ADHD involves deficits in energetic processes that are required for processing task-relevant stimuli.

16.
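Development of vocabulary comprehension scales for junior middle school   Total citations: 4 (self-citations: 2, citations by others: 2)
Cao Yiwei. Acta Psychologica Sinica, 1999, 32(2): 215-221
Using item response theory, vocabulary comprehension tests were developed for each junior middle school grade, comprising 143 multiple-choice vocabulary items in total. After repeated pretesting and large-scale formal testing, the scales of the three tests were confirmed to fit the 2PL model: items whose item characteristic curves fit well made up more than 90% of all items, and the unidimensionality of ability was also confirmed. After equating, the mean discrimination for each grade was 0.61 (Junior 1), 0.59 (Junior 2), and 0.55 (Junior 3), and the mean difficulty was -1.61, -1.30, and -0.56, respectively.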
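For readers unfamiliar with the 2PL model named in this abstract, the sketch below evaluates its item characteristic curve at the mean discrimination and difficulty values reported for the three grades. It assumes the plain logistic form without the scaling constant D = 1.7, which the abstract does not specify. The function name `icc_2pl` and the use of grade means as if they described a single item are illustrative choices only.

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta) for discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Mean (discrimination, difficulty) per grade as reported in the abstract.
grade_means = {
    "Junior 1": (0.61, -1.61),
    "Junior 2": (0.59, -1.30),
    "Junior 3": (0.55, -0.56),
}

# Probability of success for an examinee at theta = 0 on an "average" item of each form.
for grade, (a, b) in grade_means.items():
    print(grade, round(icc_2pl(0.0, a, b), 3))
```

By construction the curve passes through 0.5 at theta = b, so a higher mean difficulty shifts the curve to the right, as the reported values do from the first to the third grade.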

17.
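Test self-efficacy is the mediator through which test anxiety affects test performance   Total citations: 23 (self-citations: 2, citations by others: 21)
Tian Bao, Guo Dejun. Journal of Psychological Science, 2004, 27(2): 340-343
Using structural equation modeling, and following the rules and criteria for identifying mediators proposed by Baron (1986), this study examined the relationships among trait test anxiety, test self-efficacy, and final mathematics examination scores in 265 middle school students. Test anxiety, test self-efficacy, and mathematics test performance were treated as three latent variables. The results show that test anxiety affects test performance through the mediator of test self-efficacy, and that test self-efficacy has a direct effect on test performance; it is the mediating variable through which test anxiety affects test performance.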

18.
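Using data collected under a nonequivalent groups with anchor test design, this study compared four equating methods based on classical test theory. The data came from the TIMSS 1999 database; both the standard error of equating and cross-validation served as criteria for comparing the methods, and the CIPE program was used for the analysis. The results show that, for the equating situation set up in this study, linear equating outperformed equipercentile equating: the Tucker linear method was somewhat better than the Levine observed-score linear method, the Braun-Holland linear method should not be used, and the frequency-estimation equipercentile method had large equating error and is likewise not recommended.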
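As background for the linear methods compared here, the sketch below shows only the generic linear conversion that such methods share: a Form X score is mapped to the Form Y score with the same standardized deviation. It is not the Tucker or Levine procedure, which differ in how they use the anchor test to estimate the means and standard deviations for a synthetic population. The score lists and the function name `linear_equate` are made up for illustration.

```python
import statistics

def linear_equate(x, scores_x, scores_y):
    """Map score x on Form X to the Form Y scale by matching z-scores."""
    mean_x, mean_y = statistics.mean(scores_x), statistics.mean(scores_y)
    sd_x, sd_y = statistics.pstdev(scores_x), statistics.pstdev(scores_y)
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Hypothetical score samples for two forms (not TIMSS data).
form_x = [10, 12, 14, 16, 18]
form_y = [20, 25, 30, 35, 40]
print(linear_equate(16, form_x, form_y))
```

In an anchor-test design the four moments passed into this conversion would come from the chosen estimation method rather than directly from the raw samples.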

19.
A. J. Riopelle (2003) has eloquently demonstrated that the null hypothesis assessed by the t test involves not only mean differences but also error in the estimation of the within-group standard deviation, s. He is correct in his conclusion that the precision of the interpretation of a significant t and the null hypothesis tested is complex, particularly when sample sizes are small. In this article, the author expands on Riopelle's thoughts by comparing t with some equivalent or closely related tests that make the reliance of t on the accurate estimation of error perhaps more salient and by providing a simulation that may address more directly the magnitude of the interpretational problem.
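The dependence of t on the estimated within-group standard deviation can be seen directly in the pooled two-sample formula. The sketch below is an illustration with made-up data, not the simulation reported in the article; the function name `pooled_t` is hypothetical.

```python
import math
import statistics

def pooled_t(x, y):
    """Return (t, s): the two-sample t statistic and the pooled within-group SD."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    s = math.sqrt(pooled_var)
    t = (statistics.mean(x) - statistics.mean(y)) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, s

# Same mean difference (-1) in both comparisons, but different within-group spread.
t_wide, s_wide = pooled_t([1, 2, 3], [2, 3, 4])
t_narrow, s_narrow = pooled_t([1.5, 2, 2.5], [2.5, 3, 3.5])
print(t_wide, s_wide)
print(t_narrow, s_narrow)
```

Because s sits in the denominator, halving the estimated s doubles t for the same mean difference, which is why an s that happens to be underestimated in a small sample can by itself push t past a significance threshold.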

20.
Answer similarity indices were developed to detect pairs of test takers who may have worked together on an exam or instances in which one test taker copied from another. For any pair of test takers, an answer similarity index can be used to estimate the probability that the pair would exhibit the observed response similarity or a greater degree of similarity under the assumption that the test takers worked independently. To identify groups of test takers with unusually similar response patterns, Wollack and Maynes suggested conducting cluster analysis using probabilities obtained from an answer similarity index as measures of distance. However, interpretation of results at the cluster level can be challenging because the method is sensitive to the choice of clustering procedure and only enables probabilistic statements about pairwise relationships. This article addresses these challenges by presenting a statistical test that can be applied to clusters of examinees rather than pairs. The method is illustrated with both simulated and real data.
