Similar literature
 20 similar documents found
1.
Sampling fluctuations resulting from the sampling of test items rather than of examinees are discussed. It is shown that the Kuder-Richardson reliability coefficients actually are measures of this type of sampling fluctuation. Formulas for certain standard errors are derived; in particular, a simple formula is given for the standard error of measurement of an individual examinee's score. A common misapplication of the Wilks-Votaw criterion for parallel tests is pointed out. It is shown that the Kuder-Richardson formula-21 reliability coefficient should be used instead of the formula-20 coefficient in certain common practical situations. Most of the work reported here was carried out under contract with the Office of Naval Research. The writer is indebted to Professor S. S. Wilks, who has checked over certain critical portions of a draft of this paper.
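The two Kuder-Richardson coefficients named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook formulas, not the paper's own derivation; the function names, the 0/1 data layout, and the use of the population variance (dividing by the number of examinees) are assumptions made for the example.

```python
def _total_score_stats(scores):
    """Return (k, mean, variance) of total scores for a 0/1 item matrix.

    scores: list of examinee rows, each a list of 0/1 item scores.
    The variance divides by n (population variance); this is an assumption.
    """
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return k, mean_total, var_total


def kr20(scores):
    """Kuder-Richardson formula 20: uses each item's own difficulty p_j."""
    n = len(scores)
    k, _, var_total = _total_score_stats(scores)
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def kr21(scores):
    """Kuder-Richardson formula 21: treats all items as equally difficult."""
    k, mean_total, var_total = _total_score_stats(scores)
    return (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))


if __name__ == "__main__":
    # Hypothetical data: 5 examinees, 4 items of varying difficulty.
    data = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(kr20(data), kr21(data))
```

Formula 21 never exceeds formula 20 on the same data, and the two agree only when all items have the same difficulty.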

2.
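Testing Putonghua Proficiency with Multivariate Generalizability Theory   Cited by: 5 (self-citations: 0; cited by others: 5)
杨志明 (Yang Zhiming), 张雷 (Zhang Lei). 《心理学报》 (Acta Psychologica Sinica), 2002, 34(1): 51-56
The Putonghua (Mandarin) test developed by the State Language Commission was studied with multivariate generalizability theory (MGT). In the G study, data from Putonghua tests taken by Hong Kong examinees were used to estimate the variance and covariance components for the various sources of score variation. In the D study, technical indices such as the universe scores and generalizability coefficients of the test's 3 parts were estimated first, followed by the universe composite score and its generalizability coefficient, signal-to-noise ratio, and other indices. The results show that the reliability of the test is high overall and that combining the universe scores of the three parts into a composite is reasonable, although on closer inspection the reliability of Part 3 is low. In addition, with 3 raters and 28 items, the reliability of Parts 1 and 2 is already high, so reducing the number of items in these two parts in actual testing would not cause much of a problem.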

3.
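曾祥星 (Zeng Xiangxing), 丁道群 (Ding Daoqun). 《心理科学》 (Journal of Psychological Science), 2017, 40(5): 1061-1067
In the communication of negative risk information, studies comparing text and graphical representations have found that graphical representation is more likely to elicit risk-avoidant behavior. Building on this, the present study used gain options as the risk-communication information and presented them in text and in graphical form to explore how the mode of representation affects risk seeking. The results showed that with text representation, individual decisions were influenced mainly by the difference in risk between the alternatives; with graphical representation, individuals were influenced not only by the difference in risk between the alternatives but even more by the difference in gain. These results indicate that, relative to text, graphical representation is more likely to lead decision makers to take greater risks for the sake of gain and thus to show a risk-seeking preference. This confirms that the "graph effect" is general in risk decision making and provides a basis for presenting risk-decision information.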

4.
A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items through a combination of the usual notion of DIF for correct/incorrect responses and information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as a chi-square random variable.

5.
Cronbach's alpha is systematically used in the social sciences to estimate internal consistency. However, this coefficient assumes the continuity of the variables and this assumption is not met by ordinal response items or Likert scales. This work shows two alternatives to the alpha coefficient, the ordinal alpha and beta coefficients. Both of them take into account the ordinal nature of the data. The new coefficients were applied to several scales and compared with the traditional estimation of alpha. This paper also shows how to estimate those coefficients.
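For reference, the traditional alpha that this abstract takes as its baseline can be sketched as below. This is only the classical variance-based coefficient; the ordinal alpha and beta coefficients described in the abstract are not implemented here, since the abstract does not give their estimation details. The function name and the data layout (one row per respondent, one column per item) are assumptions for the example.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Classical Cronbach's alpha for a respondents-by-items score matrix.

    responses: list of rows, each row holding one respondent's item scores.
    Population variances are used throughout (an assumption; using sample
    variances for both terms gives the same ratio).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 5-point Likert responses: 5 respondents, 3 items.
    likert = [
        [4, 5, 4],
        [2, 3, 3],
        [5, 5, 4],
        [1, 2, 2],
        [3, 3, 4],
    ]
    print(round(cronbach_alpha(likert), 4))
```

Because the computation treats the category codes 1 to 5 as interval-level numbers, it illustrates exactly the assumption the abstract questions for ordinal items.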

6.
A fundamental assumption of most IRT models is that items measure the same unidimensional latent construct. For the polytomous Rasch model two ways of testing this assumption against specific multidimensional alternatives are discussed. The first is a marginal approach assuming a multidimensional parametric latent variable distribution; the second is a conditional approach with no distributional assumptions about the latent variable. The second approach generalizes the Martin-Löf test for the dichotomous Rasch model in two ways: to polytomous items and to a test against an alternative that may have more than two dimensions. A study on occupational health is used to motivate and illustrate the methods. The authors would like to thank Niels Keiding, Klaus Larsen and the anonymous reviewers for valuable comments on a previous version of this paper. This research was supported by a grant from the Danish Research Academy and by a general research grant from Quality Metric, Inc.

7.
What type of items, keyed positively or negatively, makes social-emotional skill or personality scales more valid? The present study examines the different criterion validities of true- and false-keyed items, before and after correction for acquiescence. The sample included 12,987 children and adolescents from 425 schools of the State of São Paulo, Brazil (ages 11–18 attending grades 6–12). They answered a computerized 162-item questionnaire measuring 18 facets grouped into five broad domains of social-emotional skills: Open-mindedness (O), Conscientious Self-Management (C), Engaging with others (E), Amity (A), and Negative-Emotion Regulation (N). All facet scales were fully balanced (3 true-keyed and 3 false-keyed items per facet). Criterion validity coefficients of scales composed of only true-keyed items versus only false-keyed items were compared. The criterion measure was a standardized achievement test of language and math ability. We found that coefficients were almost twice as large for false-keyed items’ scales as for true-keyed items’ scales. After correcting for acquiescence, coefficients became more similar. Acquiescence suppresses the criterion validity of unbalanced scales composed of true-keyed items. We conclude that balanced scales with pairs of true- and false-keyed items make a better scale in terms of internal structural and predictive validity.

8.
The self-choice effect is the phenomenon whereby self-chosen items are remembered better than experimenter-assigned items. This study examined whether the effect occurs when the choice is constrained by cuing, and whether the effect also occurs for unchosen items. In the experiment, 33 participants chose (choice condition) or were assigned (force condition) a target from three alternatives that were followed by a cue sentence as a criterion for the choice. Cue sentences corresponded to any of the three alternatives (free cuing) or to only one (constrained cuing). Participants then engaged in free recall of targets and subsequent recognition of all alternatives (chosen and unchosen items). Memory performance was enhanced by choice regardless of the constraints, but was also enhanced for unchosen items. These results indicate that "free choice" is not always critical for the self-choice effect, and that multiple cuing involving unchosen items is a plausible account for the retention advantage of choice procedures.

9.
The concept of test reliability is examined in terms of general, group, and specific factors among the items, and the stability of scores in these factors from trial to trial. Four essentially different definitions of reliability are distinguished, which may be called the hypothetical self-correlation, the coefficient of equivalence, the coefficient of stability, and the coefficient of stability and equivalence. The possibility of estimating each of these coefficients is discussed. The coefficients are not interchangeable and have different values in corrections for attenuation, standard errors of measurement, and other practical applications.

10.
This article describes the 1997 revision of the Dutch Rating System for Test Quality used by the Committee of Test Affairs of the Dutch Association of Psychologists (COTAN). The revised rating system evaluates the quality of a test on 7 criteria: Theoretical basis and the soundness of the test development procedure, Quality of the testing materials, Comprehensiveness of the manual, Norms, Reliability, Construct validity, and Criterion validity. For each criterion, a checklist with a number of items is provided. Some items (for each criterion at least 1) are so-called key questions, which check whether certain minimum conditions are met. If a key question is rated negative, the rating for that criterion will automatically be "insufficient." To enhance a uniform interpretation of the items by the raters and to explain the system to test users and test developers, comment sections provide detailed information on rating and weighting the items. Once the items have been rated, the final grades (insufficient, sufficient, or good) for the 7 criteria are established by means of weighting rules.

11.
This paper discusses rowwise matrix correlation, based on the weighted sum of correlations between all pairs of corresponding rows of two proximity matrices, which may both be square (symmetric or asymmetric) or rectangular. Using the correlation coefficients usually associated with Pearson, Spearman, and Kendall, three different rowwise test statistics and their normalized coefficients are discussed, and subsequently compared with their nonrowwise alternatives like Mantel's Z. It is shown that the rowwise matrix correlation coefficient between two matrices X and Y is the partial correlation between the entries of X and Y controlled for the nominal variable that has the row objects as categories. Given this fact, partial rowwise correlations (as well as multiple regression extensions in the case of Pearson's approach) can be easily developed. The author wishes to thank the Editor, two referees, Jan van Hooff, and Ruud Derix for their useful comments, and E. J. Dietz for a copy of the algorithm of the Mantel permutation test.

12.
Horst, P. Psychometrika, 1951, 16(2): 189-202
Given a fixed amount of total testing time, it is important to know how long each test in the battery should be so that the correlation of the battery with the criterion will be a maximum. The precise solution for the test lengths will depend on a particular set of conditions which may be specified. The writer has previously presented solutions for two sets of conditions. This article presents the solution for a third set of conditions. These are: (1) The total number of items or testing time is fixed. (2) The score is the total number of items correctly answered. (3) The test lengths are determined in such a way that the correlation of total score with the criterion is a maximum. The solutions for the two previous sets of conditions, together with the current set, are summarized. A set of experimental data is submitted to each solution and the three sets of results are compared.

13.
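魏知超 (Wei Zhichao), 杨靖 (Yang Jing). 《心理科学》 (Journal of Psychological Science), 2006, 29(2): 401-405
This study developed a nonword repetition test for measuring children's phonological working memory and carried out a preliminary examination of its reliability and validity, together with an item analysis, in 48 fourth-grade primary school pupils. The results showed that: (1) the test has high test-retest reliability; (2) the test has high construct validity and criterion validity; (3) the item difficulty distribution of Subtest 2 is fairly reasonable and most of its items discriminate well, whereas the item difficulty distribution and item discrimination of Subtest 1 need to be improved in future research.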

14.
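A Trial Report on the Self-Directed Search (SDS)   Cited by: 12 (self-citations: 0; cited by others: 12)
This study revised the 1985 edition of the Self-Directed Search (SDS) and verified its applicability among middle school students in Wuhan. Starting from the Chinese translation of the original test, standardization work was carried out, including item revision, item analysis, and reliability and validity testing. The results show that: (1) the test has good item properties; (2) its homogeneity reliability and split-half reliability both meet the standards generally required of psychological tests; (3) its construct validity and criterion-related validity are also satisfactory; (4) a few items still need further revision, and sampling should be extended nationwide to support wider use. The trial among Wuhan middle school students shows that: (1) the test can serve as an instrument for the career guidance of middle school students; (2) using standard scores in place of raw scores in this test is more scientifically sound.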

15.
It is shown that a unidimensional monotone latent variable model for binary items implies a restriction on the relative sizes of item correlations: The negative logarithm of the correlations satisfies the triangle inequality. This inequality is not implied by the condition that the correlations are nonnegative, the criterion that coefficient H exceeds 0.30, or manifest monotonicity. The inequality implies both a lower bound and an upper bound for each correlation between two items, based on the correlations of those two items with every possible third item. It is discussed how this can be used in Mokken’s (A theory and procedure of scale-analysis, Mouton, The Hague, 1971) scale analysis.
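The triangle inequality stated in this abstract can be checked directly on a correlation matrix. For positive correlations, -log r_ik <= -log r_ij + -log r_jk is equivalent to r_ik >= r_ij * r_jk, which is the form used in the sketch below. The function names, the tolerance, and the explicit bound formulas are assumptions derived from that inequality for illustration; they are not taken from the paper itself.

```python
from itertools import permutations


def triangle_violations(corr, tol=1e-12):
    """List triples (i, j, k) where r_ik < r_ij * r_jk.

    corr: symmetric matrix (list of lists) of positive item correlations.
    Each item pair (i, k) with i < k is tested against every third item j.
    """
    m = len(corr)
    bad = []
    for i, j, k in permutations(range(m), 3):
        if i < k and corr[i][k] < corr[i][j] * corr[j][k] - tol:
            bad.append((i, j, k))
    return bad


def correlation_bounds(corr, i, k):
    """Bounds on r_ik implied by the correlations of i and k with third items.

    Lower bound: largest product r_ij * r_jk over third items j.
    Upper bound: smallest of r_ij / r_jk and r_jk / r_ij over third items j,
    which follows from applying the same inequality to the pairs (i, j)
    and (j, k).
    """
    others = [j for j in range(len(corr)) if j not in (i, k)]
    lower = max(corr[i][j] * corr[j][k] for j in others)
    upper = min(
        min(corr[i][j] / corr[j][k], corr[j][k] / corr[i][j]) for j in others
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical correlations: items 0 and 2 correlate too weakly given
    # how strongly each correlates with item 1.
    r = [
        [1.0, 0.8, 0.1],
        [0.8, 1.0, 0.8],
        [0.1, 0.8, 1.0],
    ]
    print(triangle_violations(r))
    print(correlation_bounds(r, 0, 2))
```

A non-empty result flags item triples that are inconsistent with the inequality, which is the kind of check the abstract suggests could supplement a scale analysis.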

16.
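A higher-order factor model is in essence a special case of the bifactor model, yet in applications it is often treated as a competitor to the bifactor model. Previous research generated simulated data with a bifactor model satisfying the proportionality constraint (in which case it is equivalent to a higher-order factor model) as the true measurement model, and compared the prediction results obtained when the bifactor model and the higher-order factor model were each used as the measurement model. This paper instead generates simulated data with a bifactor model that does not satisfy the proportionality constraint (and is therefore not equivalent to any higher-order factor model) as the true measurement model. The results differ considerably from those obtained under the proportionality constraint: the bifactor model yields smaller relative bias in the structural coefficients and higher power, but a slightly higher Type I error rate. The conclusion is that a higher-order factor model can be used when the proportionality constraint holds; otherwise, from a statistical standpoint, the bifactor model is generally the better choice for prediction.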

17.
Studying Deese–Roediger–McDermott (DRM) lists using a distinctive encoding task can reduce the DRM false memory illusion. Reductions for both distinctively encoded lists and non-distinctively encoded lists in a within-group design have been ascribed to use of a distinctiveness heuristic by which participants monitor their memories at test for distinctive-task details. Alternatively, participants might simply set a more conservative response criterion, which would be exceeded by distinctive list items more often than all other test items, including the critical non-studied items. To evaluate these alternatives, we compared a within group that studied 5 lists by reading, 5 by anagram generation, and 5 by imagery with a control group that studied all 15 lists by reading. Generation and imagery improved recognition accuracy by impairing relational encoding, but the within group did not show greater memory monitoring at test relative to the read control group. Critically, the within group’s pattern of list-based source judgments provided new evidence that participants successfully monitored for distinctive-task details at test. Thus, source judgments revealed evidence of qualitative, recollection-based monitoring in the within group, to which our quantitative signal-detection measure of monitoring was blind.

18.
It is pointed out that the scoring weights for test items should be approximations to regression-equation weights. For this reason any estimate of reliability of the weight should not be permitted to influence the size of the weight but should be used in determining the limit of acceptability of an item. A simple approximation weight is recommended for general use, and an abac is provided for the estimation of it when the correlation between item and criterion is the phi coefficient. A formula for the standard error of this weight is derived and tables of significant and very significant weights are presented in terms of deviations from the median weight.

19.
Memory strength and the decision process in recognition memory   Cited by: 1 (self-citations: 0; cited by others: 1)
We investigated the role that memory strength plays in the decision process by examining the extent to which strength is used as a cue to dynamically modify recognition criteria. The study list consisted of strong and weak items, with strength a function of study duration or repetition. The recognition test list was divided into two consecutive blocks; strong items appeared in one block, weak items in the other. If the change in item strength across blocks leads to a shift in criterion, the false alarm rate should change accordingly. In four experiments, the false alarm rates did not change across blocks, even when the difference between the strong and the weak items was magnified and marked with semantic cues. However, the strength of the items in the first test block affected the false alarm rate. Thus, strength cues influence initial criterion placement but fail to induce criterion shifts following permanent and even dramatic changes in item strength. These null findings are contrasted with those in a fifth experiment, in which accuracy feedback produced dynamic criterion shifts.
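The link this abstract draws between false alarm rates and criterion placement can be illustrated with the standard equal-variance signal detection indices. The sketch below is an assumption about how such a criterion could be quantified; the abstract does not state which measure the authors used, and the function name and example rates are hypothetical.

```python
from statistics import NormalDist


def sdt_indices(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian signal detection indices.

    Returns (d_prime, criterion_c), where
      d' = z(H) - z(F)          (sensitivity)
      c  = -(z(H) + z(F)) / 2   (criterion; larger values are more conservative)
    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


if __name__ == "__main__":
    # Hypothetical test blocks with the same hit rate: a lower false alarm
    # rate corresponds to a more conservative (larger) criterion c.
    for fa in (0.30, 0.10):
        d_prime, c = sdt_indices(0.80, fa)
        print(f"FA={fa:.2f}  d'={d_prime:.3f}  c={c:.3f}")
```

Under this model, a false alarm rate that stays constant across test blocks, as reported for the first four experiments, is what an unshifted criterion would produce.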

20.
Simple statistical methods are developed to test whether coefficients of reproducibility, of homogeneity, or of consistency differ significantly from what can be expected if responses to different items are statistically independent. Simple methods are also developed for estimating the variance of coefficients of reproducibility when it is not assumed that responses to different items are independent. These estimates are used to test whether a coefficient differs significantly from any prescribed value, and also to obtain confidence intervals for these coefficients. The rationale for the measurement of reproducibility is also discussed. This research was carried out at the Statistical Research Center, University of Chicago, under sponsorship of the Statistics Branch, Office of Naval Research, and of the Social Science Research Committee, University of Chicago. Reproduction in whole or in part is permitted for any purpose of the United States Government. The author is indebted to Jacob Gewirtz for some very helpful comments. This paper is dedicated to Professor Jacob Marschak on the occasion of his sixtieth birthday, July 23, 1958.
