Similar Articles
1.
A hybrid procedure for number-correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT, and the total test scores are computed based on CTT. What makes the hybrid scoring method attractive, therefore, is that it accounts for the dimensionality of the test items while test scores remain easy to compute. Further, the hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study produced estimated scores that were highly correlated with the true scores. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores than the other procedures.
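
The abstract does not say how the MIRT-based item weights are derived. A minimal sketch, assuming (hypothetically) that each item's weight is its multidimensional discrimination (MDISC, the Euclidean norm of its MIRT slope vector) and that the CTT-style total is the weighted count of correct answers; the function names and slope values are illustrative only:

```python
import math

def mdisc(slopes):
    """Multidimensional discrimination (MDISC): Euclidean norm of an item's MIRT slope vector."""
    return math.sqrt(sum(a * a for a in slopes))

def hybrid_score(responses, item_slopes):
    """Weighted number-correct score (hypothetical weighting rule).

    responses:   0/1 item scores for one examinee
    item_slopes: one already-estimated MIRT slope vector per item
    Each correct answer contributes its item's MDISC weight.
    """
    if len(responses) != len(item_slopes):
        raise ValueError("need one slope vector per response")
    return sum(u * mdisc(a) for u, a in zip(responses, item_slopes))

# Hypothetical two-dimensional slopes for a three-item test.
slopes = [(3.0, 4.0), (0.0, 1.0), (1.0, 0.0)]
print(hybrid_score([1, 0, 1], slopes))  # 6.0
```

Once the slopes are known, scoring is only a weighted sum, which is consistent with the claim that no large sample is needed at scoring time.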

2.
For a sample of 300 patients who had been administered the Minnesota Multiphasic Personality Inventory (MMPI), the MMPI-168 was extracted from the full MMPI and scored to incorporate those items normally excluded by Form R keys. MMPI-168 correlations with the full MMPI ranged from .80 to .97 with a mean of .90, indicating satisfactory statistical validity, and modified scoring was shown to improve predictability for Pa and Sc. Using these data, substitution equations for transforming MMPI-168 raw scores to estimates of full-scale scores were calculated. These transformations did not differ greatly from those reported in previous research except on Pa and Sc, where additional items increase scale length substantially.
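
The abstract does not give the form of the substitution equations. A minimal sketch of one plausible form, assuming a simple least-squares line predicting the full-scale raw score from the MMPI-168 raw score, fitted separately for each scale; the function names and example numbers are hypothetical:

```python
def fit_substitution_equation(short_raw, full_raw):
    """Least-squares line predicting full-scale raw scores from short-form raw scores.

    Returns (intercept, slope). Intended to be fitted separately for each scale.
    """
    n = len(short_raw)
    if n != len(full_raw) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_x = sum(short_raw) / n
    mean_y = sum(full_raw) / n
    sxx = sum((x - mean_x) ** 2 for x in short_raw)
    if sxx == 0:
        raise ValueError("short-form scores have no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(short_raw, full_raw))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def estimate_full_scale(short_score, intercept, slope):
    """Apply a fitted substitution equation to one short-form raw score."""
    return intercept + slope * short_score

# Toy paired scores, not data from the study.
intercept, slope = fit_substitution_equation([1, 2, 3], [3, 5, 7])
print(estimate_full_scale(10, intercept, slope))  # 21.0
```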

3.
The Remote Associates Test (RAT; Mednick, 1962; Mednick & Mednick, 1967) is a commonly employed test of creative convergent thinking. The RAT is scored dichotomously: correct answers are scored as 1 and all other answers as 0. Based on recent research into the information processing underlying RAT performance, we argued that dichotomous scoring may lose potentially relevant information. We therefore proposed an alternative scoring based on the semantic similarity between the participant's answer and the correct solution, computed with Latent Semantic Analysis (LSA; Landauer & Dumais, 1997). We evaluated the psychometric properties of the alternative LSA scoring and found evidence of construct validity for it that was comparable to the findings for the standard scoring but, contrary to our expectations, not better. Thus, our expectation that LSA-based scoring of the RAT would counteract potential information loss was not met. However, LSA-based scoring appears to be a promising alternative for RAT items that are hard to solve. We conducted additional analyses comparing the validity of different RAT item types and evaluating the information uniquely contained in the LSA scoring. Implications of all findings for existing research using RAT items are discussed.
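
The two scorings can be contrasted in a few lines. A minimal sketch, assuming answer and solution words have already been mapped to vectors in an LSA space (the toy 3-dimensional vectors below are hypothetical stand-ins) and using cosine similarity, the usual LSA similarity measure:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def dichotomous_score(answer, solution):
    """Standard RAT scoring: 1 for the correct solution word, otherwise 0."""
    return 1 if answer.strip().lower() == solution.strip().lower() else 0

def lsa_score(answer_vec, solution_vec):
    """Graded scoring: semantic similarity between the given answer and the solution."""
    return cosine_similarity(answer_vec, solution_vec)

# Hypothetical vectors: a wrong but related answer ("milk") for the solution "cheese".
milk_vec, cheese_vec = [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]
print(dichotomous_score("milk", "cheese"), round(lsa_score(milk_vec, cheese_vec), 3))  # 0 0.707
```

The wrong answer earns 0 under dichotomous scoring but partial credit under the graded scoring, which is the information the authors expected dichotomous scoring to discard.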

4.
Summary: On formula-scored exams, students receive points for correct answers and penalties for incorrect answers, but they can avoid the penalty by withholding incorrect answers. However, test-takers have difficulty strategically regulating their accuracy and often set an overly conservative metacognitive response bias (e.g., Higham, 2007). The current experiments extended these findings by exploring whether the comparative difficulty of surrounding test questions (i.e., easy vs. hard), a factor unrelated to the knowledge being tested, affects metacognitive response bias for medium-difficulty test questions. Comparative difficulty had no significant influence on participants' ability to choose correct answers for medium questions, but it did affect their willingness to report answers and their confidence ratings. This difference carried over to corrected scores (scores after penalties are applied) when comparative difficulty was manipulated within subjects: scores were higher in the hard condition. Results are discussed in terms of their implications for interpreting formula-scored tests and the underlying mechanisms of performance. Copyright © 2017 John Wiley & Sons, Ltd.
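
The penalty logic behind "response bias" can be made concrete. A minimal sketch, assuming +1 per correct answer, a fixed penalty `penalty` per reported incorrect answer, and 0 for omissions (the penalty size is a hypothetical parameter; the abstract does not state it). A risk-neutral test-taker should report whenever the expected gain p - (1 - p) * penalty is positive, i.e., whenever confidence p exceeds penalty / (1 + penalty); withholding answers above that threshold is the conservative bias described above:

```python
def corrected_score(n_correct, n_incorrect, penalty):
    """Formula score: +1 per correct answer, -penalty per reported incorrect answer; omissions score 0."""
    return n_correct - penalty * n_incorrect

def report_threshold(penalty):
    """Confidence above which reporting has positive expected value: penalty / (1 + penalty)."""
    return penalty / (1 + penalty)

def should_report(p_correct, penalty):
    """Risk-neutral rule: report when p * 1 - (1 - p) * penalty > 0."""
    return p_correct - (1 - p_correct) * penalty > 0

print(corrected_score(30, 8, 0.25), report_threshold(0.25))  # 28.0 0.2
```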

5.
The Balanced Inventory of Desirable Responding (BIDR; Paulhus, 1994) is a widely used instrument to measure the 2 components of social desirability: self-deceptive enhancement and impression management. With respect to scoring of the BIDR, Paulhus (1994) authorized 2 methods, namely continuous scoring (all answers on the continuous answer scale are counted) and dichotomous scoring (only extreme answers are counted). In this article, we report 3 studies with student samples, and continuous and dichotomous scoring of BIDR subscales are compared with respect to reliability, convergent validity, sensitivity to instructional variations, and correlations with personality. Across studies, the scores from continuous scoring (continuous scores) showed higher Cronbach's alphas than those from dichotomous scoring (dichotomous scores). Moreover, continuous scores showed higher convergent correlations with other measures of social desirability and more consistent effects with self-presentation instructions (fake-good vs. fake-bad instructions). Finally, continuous self-deceptive enhancement scores showed higher correlations with those traits of the Five-factor model for which substantial correlations were expected (i.e., Neuroticism, Extraversion, and Conscientiousness). Consequently, these findings indicate that continuous scoring may be preferable to dichotomous scoring when assessing socially desirable responding with the BIDR.
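
The two scoring methods differ only in what is summed. A minimal sketch, assuming a 7-point response scale, reverse keying of negatively keyed items, and "extreme" meaning a keyed response of 6 or 7; these are assumptions about the instrument format, and the item positions below are hypothetical:

```python
def reverse_key(response, scale_max=7):
    """Reverse a response on a 1..scale_max scale (e.g., 2 -> 6 on a 7-point scale)."""
    return scale_max + 1 - response

def bidr_scores(responses, reversed_items, scale_max=7, extreme_min=6):
    """Return (continuous, dichotomous) subscale scores.

    continuous:  sum of all keyed responses
    dichotomous: one point per keyed response at or above extreme_min
    reversed_items: set of 0-based positions of negatively keyed items
    """
    keyed = [reverse_key(r, scale_max) if i in reversed_items else r
             for i, r in enumerate(responses)]
    continuous = sum(keyed)
    dichotomous = sum(1 for r in keyed if r >= extreme_min)
    return continuous, dichotomous

print(bidr_scores([7, 5, 2, 6], reversed_items={2}))  # (24, 3)
```

The moderate response of 5 adds to the continuous score but is discarded by dichotomous scoring, which is one way to see why continuous scores retain more variance.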

6.
Best-worst scaling is a judgment format in which participants are presented with a set of items and have to choose the superior and inferior items in the set. Best-worst scaling generates a large quantity of information per judgment because each judgment allows for inferences about the rank value of all unjudged items. This property of best-worst scaling makes it a promising judgment format for research in psychology and natural language processing concerned with estimating the semantic properties of tens of thousands of words. A variety of different scoring algorithms have been devised in the previous literature on best-worst scaling. However, due to problems of computational efficiency, these scoring algorithms cannot be applied efficiently to cases in which thousands of items need to be scored. New algorithms are presented here for converting responses from best-worst scaling into item scores for thousands of items (many-item scoring problems). These scoring algorithms are validated through simulation and empirical experiments, and considerations related to noise, the underlying distribution of true values, and trial design are identified that can affect the relative quality of the derived item scores. The newly introduced scoring algorithms consistently outperformed scoring algorithms used in the previous literature on scoring many-item best-worst data.
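
For orientation, a minimal sketch of the simple "best minus worst" counting score often used as a baseline for best-worst data. This is not the new algorithms introduced in the abstract, whose details are not given there; the trial format and item labels are hypothetical:

```python
from collections import defaultdict

def best_minus_worst(trials):
    """Score items as (times chosen best - times chosen worst) / times shown.

    trials: iterable of (items_shown, best_item, worst_item)
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    shown = defaultdict(int)
    for items, b, w in trials:
        if b not in items or w not in items or b == w:
            raise ValueError("best and worst must be two different items from the set")
        for item in items:
            shown[item] += 1
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

trials = [(("a", "b", "c", "d"), "a", "d"),
          (("a", "b", "c", "e"), "b", "e")]
print(best_minus_worst(trials))  # {'a': 0.5, 'b': 0.5, 'c': 0.0, 'd': -1.0, 'e': -1.0}
```

Counting ignores the inferences each judgment licenses about the unjudged items (here "c" gets 0.0 even though it beat "d" and "e"), which is part of why more elaborate scoring algorithms exist.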

7.
Internal and external validity tests were completed for an inventory that has been used to infer signs of temporal lobe lability. Strong, positive correlations were reported for a normal (reference) population between the numbers of responses that referred to paranormal experiences (including feelings of a "presence") and separately to religious beliefs and the numbers of spikes per minute within electroencephalographic recordings from the temporal lobe. Numbers of spikes were also correlated with the subjects' scores on the hysteria, schizophrenia, and psychasthenia scales from the MMPI. These clusters of items were not correlated with electrical activity from the occipital lobe (the comparison region). Numbers of responses to control clusters of mundane experiences were not correlated with the temporal lobe measures. A group of student poets scored higher on different subclusters of temporal lobe signs and on the schizophrenia and mania scales of the MMPI than the reference group. For both groups, there were positive correlations between the amount of alpha activity in the temporal lobe only and answers to items such as "hearing inner voices" and "feeling as if things were not real." These results demonstrate that quantitative measures of electrical changes in the temporal lobe are correlated with (or with the report of) specific experiences that are prevalent during surgical or epileptic stimulation of this brain region.

8.
A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian–English word pairs were learned via test–restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.

9.
There are 16 items in the standard MMPI group forms which are included twice. It was found that a number of computerized scoring services use only the first occurrence of repeated items in scoring the MMPI scales, whereas the handscoring templates use an arbitrary combination of the first and second occurrence of these items. Comparison of these conventions suggested a potential for significant differences in score, particularly on the Sc scale. Scoring a set of 126 MMPIs of chronic pain patients by both of these scoring conventions revealed differences of up to 10 T points on the Sc scale. It is recommended that a single scoring convention for the MMPI be adopted by psychologists. For several reasons we suggest that only the first occurrence of repeated items should be used for scoring purposes. In the absence of a single convention, comparisons between Sc scores on different protocols should not be made without first ensuring that the protocols were scored in the same manner.
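
A minimal sketch of the first-occurrence convention recommended above. The item numbers, the repeat mapping, and the scale key are hypothetical placeholders, not actual MMPI item numbers or keys:

```python
def score_scale(responses, key, repeats):
    """Raw score for one scale under the first-occurrence convention.

    responses: dict item_number -> True/False answer
    key:       dict item_number -> keyed (scored) answer for this scale
    repeats:   dict second_occurrence -> first_occurrence for duplicated items
    """
    raw = 0
    for item, keyed_answer in key.items():
        source = repeats.get(item, item)  # a second occurrence is scored from its first occurrence
        if responses[source] == keyed_answer:
            raw += 1
    return raw

# Hypothetical protocol in which a repeated item was answered inconsistently.
responses = {1: True, 2: False, 10: False}   # item 10 repeats item 1
key = {2: False, 10: True}
print(score_scale(responses, key, repeats={10: 1}))  # first occurrence: 2
print(score_scale(responses, key, repeats={}))       # second occurrence: 1
```

The two conventions diverge only when a respondent answers a repeated item inconsistently, which is how differently scored protocols can end up with different raw and T scores.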

11.
Men score higher than women on the Mental Rotations Test (MRT), and this gender difference is the largest found on any spatial test. Goldstein, Haldane, and Mitchell (1990) reported that the gender difference on the MRT disappears when "performance factors" are controlled, specifically when subjects are allowed sufficient time to attempt all items on the test or when a scoring procedure that controls for the number of items attempted is used. The present experiment also explored whether eliminating these performance factors makes the gender difference on the test disappear. Male and female college students were allowed a short time period or unlimited time on the MRT. The tests were scored according to three different procedures. The results showed no evidence that the gender difference on the MRT was affected by the scoring method or the time limit. Regardless of the scoring procedure, men scored higher than women, and the magnitude of the gender difference persisted undiminished when subjects completed all items on the test. Thus there was no evidence that performance factors produced the gender difference on the MRT. These results are consistent with those of other investigators who have attempted to replicate Goldstein et al.'s findings.

12.
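To improve the power to detect high-scoring cheaters and cheaters who copy only a small proportion of answers, the responses of 600 examinees were simulated under 3 copying proportions (60%, 80%, and 100%) and 3 levels of copying-source ability (ability percentile ranks of 60%, 80%, and 100%), and a two-stage cheating-detection procedure was designed. In the first stage, the lz index is used to screen for aberrant examinees with poor person fit; in the second stage, the ω index is applied to these aberrant examinees for precise detection. The results show that this procedure has greater power to detect high-scoring cheaters and low-copying-proportion cheaters than using the answer-copying detection method alone.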
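
A minimal sketch of the first (screening) stage, assuming a 2PL model with known item parameters and the standard lz person-fit formula: lz = (l0 - E[l0]) / sqrt(Var[l0]), where l0 is the response-pattern log-likelihood. The cutoff of -1.645 (a one-tailed 5% normal critical value) and the item parameters are hypothetical choices; the second-stage ω index is not implemented here:

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lz_index(responses, theta, items):
    """Standardized log-likelihood person-fit statistic (large negative values indicate misfit).

    responses: 0/1 scores; items: (a, b) pairs; theta: the examinee's ability estimate.
    """
    l0 = expected = variance = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    if variance == 0:
        raise ValueError("lz is undefined when every item has p = 0.5")
    return (l0 - expected) / math.sqrt(variance)

def stage_one_screen(examinees, items, cutoff=-1.645):
    """Return ids whose lz falls below the (hypothetical) cutoff; these would go on to the second stage."""
    return [eid for eid, (responses, theta) in examinees.items()
            if lz_index(responses, theta, items) < cutoff]

items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]            # hypothetical (a, b) pairs
examinees = {"fit": ([1, 1, 0], 0.0), "misfit": ([0, 1, 1], 0.0)}
print(stage_one_screen(examinees, items))  # ['misfit']
```

The "misfit" examinee misses the easy item but answers the hard one, a pattern that is improbable at its ability level and so receives a large negative lz.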

13.
This study assesses the effects of member expertise on group decision-making and group performance. Three-person cooperative groups and three independent individuals solved either an easy or moderately difficult version of the deductive logic game Mastermind. Experimental groups were given veridical performance information, i.e., the members' rankings on prior individual administrations of the task. Control groups were not provided with this information. Results supported the predictions of this study: (1) groups gave more weight to the input of their highest performing members with the group decision-making process being best approximated by post hoc "expert weighted" social decision schemes and (2) groups performed at the level of the best of an equivalent number of individuals.

14.
This paper assesses framing effects on decision making with internal uncertainty, i.e., partial knowledge, by focusing on examinees' behavior in multiple-choice (MC) tests with different scoring rules. In two experiments participants answered a general-knowledge MC test that consisted of 34 solvable and 6 unsolvable items. Experiment 1 studied two scoring rules involving Positive (only gains) and Negative (only losses) scores. Although answering all items was the dominating strategy for both rules, the results revealed a greater tendency to answer under the Negative scoring rule. These results are in line with the predictions derived from Prospect Theory (PT) [Econometrica 47 (1979) 263]. The second experiment studied two scoring rules, which allowed respondents to exhibit partial knowledge. Under the Inclusion-scoring rule the respondents mark all answers that could be correct, and under the Exclusion-scoring rule they exclude all answers that might be incorrect. As predicted by PT, respondents took more risks under the Inclusion rule than under the Exclusion rule. The results illustrate that the basic process that underlies choice behavior under internal uncertainty and especially the effect of framing is similar to the process of choice under external uncertainty and can be described quite accurately by PT.

15.
Jian Xiaozhu (简小珠), Dai Buyun (戴步云), Dai Haiqi (戴海琦). Acta Psychologica Sinica (《心理学报》), 2016, 48(12): 1625-1630
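Item difficulty and the weight reflecting the importance of what an item assesses are two basic attributes of polytomously scored items, and they therefore need to be represented by different parameters in the IRT item characteristic function. Previous polytomous models describe the difficulty of polytomous items with multiple difficulty parameters and cannot effectively express the score-weighting role of such items. Starting from this score-weighting role, this paper proposes a logistic weighted model and discusses the rationale for its construction. Under the logistic weighted model, an EM algorithm for item parameter estimation is derived and a corresponding parameter-estimation program is written. Test simulations under the logistic weighted model show that the item parameters are recovered well.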

16.
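Aberrant responding occurs in psychological and educational tests (guessing and "sleeping" in ability tests; non-zero lower asymptotes and non-one upper asymptotes in personality tests) and biases the measurement of examinees' abilities or personality traits. For ability tests, researchers have proposed various methods to correct for guessing and sleeping, but these methods usually require adjusting or deleting examinees' response data, whereas the four-parameter model can effectively correct the overestimation or underestimation of ability without altering the response data. Personality tests exhibit non-zero lower asymptotes and non-one upper asymptotes; there, the four-parameter model can improve item fit and increase the accuracy of the test.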
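
The four-parameter logistic (4PL) item response function adds a lower asymptote c and an upper asymptote d to the 2PL model: P(θ) = c + (d - c) / (1 + exp(-a(θ - b))). A minimal sketch; the parameter values in the example are hypothetical:

```python
import math

def p_4pl(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic IRT model.

    a: discrimination, b: difficulty,
    c: lower asymptote (P stays above c even at very low theta, e.g. guessing),
    d: upper asymptote (P stays below d even at very high theta, e.g. careless errors).
    With c = 0 and d = 1 this reduces to the 2PL model.
    """
    if not 0.0 <= c < d <= 1.0:
        raise ValueError("require 0 <= c < d <= 1")
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta = b the probability is halfway between the two asymptotes.
print(round(p_4pl(0.0, a=1.0, b=0.0, c=0.2, d=0.9), 3))  # 0.55
```

Because the asymptotes absorb guessing and slips in the response function itself, the response data can be left unaltered, which is the advantage the abstract attributes to the model.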

17.
This study investigated the relation between expert and target scoring of a video‐based social understanding test (VSU) under two different types of instructions (internal and observer). The effects of the scoring methods and instructions on the VSU's construct validity were also examined. A total of 529 pilot applicants completed the VSU (some with internal and some with observer instructions), cognitive ability and knowledge tests, and a personality questionnaire. A subsample (n = 132) completed the VSU again with the other instructions and participated in an assessment center (AC). The two scores were moderately correlated; correlations decreased when the instructions were considered. Neither expert nor target scores showed convergent validity with AC variables; none of the scoring‐instruction combinations showed significant associations with the remaining measures.

18.
Two studies were designed to measure the cathartic effects of humor on aggressive responses. In the first study, two versions (easy and difficult) of Raven's intelligence test were administered to two groups of high school students. Only the easy version could be solved in the allotted time. Rosenzweig's (1951) Picture Frustration test was then administered and the students' aggressive responses were scored. Results showed that those who did not solve the problems had significantly higher scores on aggressivity than did the others. The second study, using four different groups, was planned according to a modified Solomon design. Two of the four groups of students completed the difficult part of the Raven test, and then two videotapes were presented: a humorous one to two groups and a neutral one to the others. Finally, the Rosenzweig Picture Frustration test was administered to all four groups. An analysis of variance computed on the aggressivity scores showed one significant difference: frustrated students who viewed the humorous videotape had lower scores than those viewing the neutral one.

19.
Liu Yue (刘玥), Liu Hongyun (刘红云). Acta Psychologica Sinica (《心理学报》), 2017, (9): 1234-1246
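The bifactor model can simultaneously include one general factor and multiple specific (local) factors. It has distinct advantages for describing multidimensional test structures and has been applied increasingly widely in recent years. Based on the bifactor model, this article proposes four methods for composing total scores and dimension scores: the raw-score method, the summation method, the general-item-weighted summation method, and the local-item-weighted summation method. Using simulation, it examines how these methods and the traditional multidimensional IRT (MIRT) method perform under varying sample sizes, test lengths, and correlations between dimensions. Finally, the results are validated in an empirical study. The results show that: (1) the total and dimension scores composed with the general-weighted and local-weighted summation methods, especially the local-weighted summation method, are closest to the true values and have the highest reliability; (2) when the correlations between dimensions are high and the test is long, the local-weighted summation method performs well, in some conditions even better than MIRT; (3) only the dimension scores composed with the local-weighted summation method reflect the true correlations between dimensions.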

20.
《人类行为》2013,26(4):371-388
We evaluated the effects of faking on mean scores and correlations with self-reported counterproductive behavior of integrity-related personality items administered in single-stimulus and forced-choice formats. In laboratory studies, we found that respondents instructed to respond as if applying for a job scored higher than when given standard or "straight-take" instructions. The size of the mean shift was nearly a full standard deviation for the single-stimulus integrity measure, but less than one third of a standard deviation for the same items presented in a forced-choice format. The correlation between the personality questionnaire administered in the single-stimulus condition and self-reported workplace delinquency was much lower in the job applicant condition than in the straight-take condition, whereas the same items administered in the forced-choice condition maintained their substantial correlations with workplace delinquency.
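
The faking effect above is expressed as a mean shift in standard-deviation units. A minimal sketch of one common way to compute such a shift (Cohen's d with a pooled SD), assuming two independent sets of scores; the abstract does not say which standardizer the authors used, and repeated-measures designs often standardize differently. The scores in the example are made up:

```python
import math
from statistics import mean, stdev

def standardized_shift(instructed, standard):
    """Cohen's d: (mean of instructed-faking scores - mean of standard scores) / pooled SD."""
    n1, n2 = len(instructed), len(standard)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    s1, s2 = stdev(instructed), stdev(standard)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(instructed) - mean(standard)) / pooled

print(round(standardized_shift([4, 5, 6], [3, 4, 5]), 2))  # 1.0
```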
