Similar Articles
20 similar articles found (search time: 15 ms)
1.
Maximum validity of a test with equivalent items   Times cited: 1 (self-citations: 0; citations by others: 1)
It is assumed that a scale of true scores on a function exists and that the probability of answering an item correctly follows a normal-ogive curve (the integral of the normal curve). The product-moment correlation between test score and true score is derived for a normally distributed population of subjects and a test composed of equivalent items. Numerical examples demonstrate that, for a 100-item test, the correlation between test scores and true scores is maximized when the point correlation between items is less than .30.
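
A minimal Monte Carlo sketch of the setup in entry 1, under assumptions that are ours rather than the paper's: true scores are standard normal, every item shares the same normal-ogive curve Φ(a(θ − b)), and responses are independent given θ. The helper names (`normal_cdf`, `pearson`, `simulate_validity`) and all default parameter values are hypothetical. It illustrates why validity can peak at a moderate level of item intercorrelation: raising the common slope `a` makes items more highly intercorrelated, but in the limit every item splits examinees at the same point and the test collapses into a single pass/fail cut.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_validity(n_items=100, a=1.0, b=0.0, n_subjects=2000, seed=0):
    """Monte Carlo estimate of corr(number-correct score, true score).

    Hypothetical assumptions: theta ~ N(0, 1); all items are equivalent, each
    answered correctly with probability Phi(a * (theta - b)); responses are
    independent given theta. Larger `a` means higher inter-item correlation.
    """
    rng = random.Random(seed)
    thetas, scores = [], []
    for _ in range(n_subjects):
        theta = rng.gauss(0.0, 1.0)
        p = normal_cdf(a * (theta - b))
        # Number-correct score: count of items answered correctly.
        score = sum(rng.random() < p for _ in range(n_items))
        thetas.append(theta)
        scores.append(score)
    return pearson(scores, thetas)
```

Comparing, say, `simulate_validity(a=1.0)` with `simulate_validity(a=50.0)` should show the steeper, more intercorrelated items giving the lower correlation, close to the value of about .80 expected for a single cut at the mean.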

2.
Under certain assumptions an expression, in terms of item difficulties and intercorrelations, is derived for the curvilinear correlation of test score on the ability underlying the test, this ability being defined as the common factor of the item tetrachoric intercorrelations corrected for guessing. It is shown that this curvilinear correlation is equal to the square root of the test reliability. Numerical values for these curvilinear correlations are presented for a number of hypothetical tests, defined in terms of their item parameters. These numerical results indicate that the reliability and the curvilinear correlation will be maximized by (1) minimizing the variability of item difficulty and (2) making the level of item difficulty somewhat easier than the halfway point between a chance percentage of correct answers and 100 per cent correct answers.
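
Entry 2's result that the correlation with the underlying ability equals the square root of reliability has a familiar linear counterpart in classical test theory, the index of reliability. The sketch below assumes the linear model X = T + E with independent normal T and E; it is not the paper's curvilinear, guessing-corrected derivation, and the function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def index_of_reliability_demo(var_true=1.0, var_error=0.5, n=20000, seed=1):
    """Compare corr(X, T) with sqrt(reliability) under X = T + E.

    Hypothetical linear CTT setup: T ~ N(0, var_true), E ~ N(0, var_error),
    T and E independent, so reliability = var_true / (var_true + var_error).
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, math.sqrt(var_true)) for _ in range(n)]
    observed = [t + rng.gauss(0.0, math.sqrt(var_error)) for t in true_scores]
    reliability = var_true / (var_true + var_error)
    return pearson(observed, true_scores), math.sqrt(reliability)
```

With the defaults, the empirical and theoretical values returned should both lie close to √(2/3), about .816.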

3.
Researchers often include a social desirability measure, commonly the Balanced Inventory of Desirable Responding (BIDR), in personality questionnaires and correlate it with personality items to probe the items' social desirability. A strong correlation between BIDR scores and a personality item would indicate high item social desirability. The current research assesses the validity of this practice. Results showed that these correlations have high validity only when BIDR scores are calculated as a continuous variable rather than from dichotomized item scores. In addition, self-deception scores have higher validity for detecting item social desirability than do impression management scores. The current research supports the use of self-deception scores, in particular, to detect highly desirable or undesirable items.

4.
An experiment is reported that examines the role of item strength in output interference. Subjects studied two types of categorized item lists: lists in which each category consisted of strong and moderate items, and lists in which each category consisted of weak and moderate items. Different degrees of item strength were accomplished by varying the items’ taxonomic frequency within a category. The subjects either recalled a category’s strong and weak items before its moderate items, or vice versa. The prior recall of the moderate items impaired the later recall of the strong items, but did not impair the later recall of the weak items. This effect of item strength indicates that output interference is caused by a process of retrieval suppression. It additionally suggests that, in order to minimize output-interference effects in recall, a list’s strong items should be recalled before its weak items.

5.
The relative contributions of subtle and obvious item endorsements to the prediction of a relevant criterion were assessed under faking and control ("honest") conditions. The MMPI and a nonconformity questionnaire were first administered to 100 male college students. Items on the Pd scale and 101 additional MMPI items that correlated significantly with the nonconformity questionnaire were then rated by 38 other male college students for apparent relationship to psychopathology. From these ratings, a scale (designated PdX) was constructed, which consisted of 21 subtle and 21 obvious items. After a third group of 98 male college students completed the nonconformity questionnaire, they were asked to respond to the items of the Pd and PdX subscales under control, fake-good, and fake-bad instructions. Significant correlations between the nonconformity scale and certain PdX and Pd subscales were found only for the control group. Implications for test construction and for clinical interpretation under faking conditions are discussed.

8.
We evaluated the reliability, validity, and differential item functioning (DIF) of the behavioural DIT (bDIT), a shorter version of the Defining Issues Test-1 (DIT-1) that measures the development of moral reasoning. A total of 353 college students (81 males, 271 females, 1 not reported; age M = 18.64 years, SD = 1.20 years) taking introductory psychology classes at a public university in a suburban area of the Southern United States participated in the study. First, we examined the reliability of the bDIT using Cronbach’s α and its concurrent validity with the original DIT-1 using disattenuated correlation. Second, we compared test duration between the two measures. Third, we tested each question for DIF between males and females. The findings showed that, first, the bDIT had acceptable reliability and good concurrent validity; second, test duration could be significantly shortened by using the bDIT; and third, the DIF results indicated that no bDIT item favoured either gender. Practical implications of these findings are discussed.
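
The two statistics named in entry 8 can be computed as below: Cronbach's α = k/(k − 1) · (1 − Σσᵢ²/σ_X²) and the disattenuated correlation r_xy / √(r_xx · r_yy). This is a generic sketch with hypothetical function names and toy data; it does not reproduce the bDIT scoring procedure, which the abstract does not describe.

```python
import math
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix given as a list of rows."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item (column) and of the total score (row sums).
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def disattenuated_r(r_xy, r_xx, r_yy):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Toy data (hypothetical): 3 respondents x 2 items.
print(round(cronbach_alpha([[1, 2], [2, 3], [3, 5]]), 3))
print(round(disattenuated_r(0.5, 0.64, 0.81), 3))
```

Here α works out to 36/38 (about .947) and the corrected correlation to .5/.72 (about .694). Using population or sample variances gives the same α, since the scaling factor cancels in the ratio.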

9.
Subjects were asked to report the number of items in a display as the items moved along a circular path around the fixation point. As the rotation speed increased, the apparent number of items also increased. This motion-induced overestimation (MIO) effect was investigated in three experiments. In the first experiment, the effect of rotation speed and set size was explored with an enumeration task. The overestimation error increased with an increase in speed or in the number of items in the display. In the second experiment, we used an adjustment paradigm to measure the speed threshold for the onset of the MIO effect. The temporal rate of the display, defined as the product of rotation speed and the number of rotating items, was the determining factor of MIO onset. In the third experiment, moving items were marked with different colours. Surprisingly, the number of perceived items was still overestimated even though the number of perceived colours was not.

10.
This study provided a comprehensive examination of the full range of transformational, transactional, and laissez-faire leadership. Results (based on 626 correlations from 87 sources) revealed an overall validity of .44 for transformational leadership, and this validity generalized over longitudinal and multisource designs. Contingent reward (.39) and laissez-faire (-.37) leadership had the next highest overall relations; management by exception (active and passive) was inconsistently related to the criteria. Surprisingly, there were several criteria for which contingent reward leadership had stronger relations than did transformational leadership. Furthermore, transformational leadership was strongly correlated with contingent reward (.80) and laissez-faire (-.65) leadership. Transformational and contingent reward leadership generally predicted criteria controlling for the other leadership dimensions, although transformational leadership failed to predict leader job performance.

12.
We conducted two experimental studies with between-subjects and within-subjects designs to investigate the item response process for personality measures administered in high- versus low-stakes situations. Apart from assessing the measurement validity of the item response process, we examined predictive validity, that is, whether different response models entail different selection outcomes. We found that ideal point response models fit slightly better than dominance response models across high- and low-stakes situations in both studies. Additionally, fitting ideal point models to the data led to fewer items displaying differential item functioning than fitting dominance models. We also identified several items that functioned as intermediate items in both the faking and honest conditions when ideal point models were fitted, suggesting that the ideal point model is “theoretically” more suitable for personality inventories across these contexts. However, the choice of response model (dominance vs. ideal point) had no substantial impact on the validity of personality measures in high-stakes situations or on the effectiveness of selection decisions, such as mean performance or the percentage of fakers selected. These findings are significant: although prior research supports the importance and use of ideal point models for measuring personality, we find that when personality faking is at issue, ideal point models show only slightly better measurement validity, and dominance models may be adequate with no loss of predictive validity.
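
To make the contrast in entry 12 concrete, the sketch compares a dominance item response function (here a two-parameter logistic curve) with an ideal point function. The ideal point curve is a deliberately simplified single-peaked stand-in, not a full ideal point model such as the generalized graded unfolding model (GGUM); the function names and parameter values are hypothetical.

```python
import math


def dominance_irf(theta, a=1.5, b=0.0):
    """Two-parameter logistic curve: endorsement rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ideal_point_irf(theta, a=1.0, delta=0.0):
    """Simplified single-peaked curve (illustrative only, NOT the GGUM).

    Endorsement is highest when the respondent's trait level equals the item
    location `delta` and falls off on both sides.
    """
    return math.exp(-a * (theta - delta) ** 2)


# Tabulate both curves over a trait grid from -3 to 3 in steps of 0.5.
for theta in [x / 2 for x in range(-6, 7)]:
    print(f"{theta:5.1f}  dominance={dominance_irf(theta):.3f}  "
          f"ideal_point={ideal_point_irf(theta):.3f}")
```

An "intermediate" item in the abstract's sense behaves like the second curve: respondents well above its location endorse it less often, a pattern a monotone dominance curve cannot represent.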

15.
This study sought to determine whether changing the time orientation of biodata items from past to present would reduce the items' validity. It was predicated on the notion that the traditionally employed measures of past performance were potentially unfair, especially to minority applicants. When administered to 192 members of the Air National Guard, the set of biodata items measuring present behavior had validity coefficients at least comparable, if not superior, to those of the set measuring past behavior.

17.
This study assessed the efficiency of the Haine and Koppitz scoring systems for the Bender-Gestalt Test (B-G) in differentiating between adolescents with and without central nervous system (CNS) impairment who were achieving below age expectations. In a population of 84 adolescents enrolled in a residential treatment center, both the Haine and the Koppitz systems differentiated the 25 Ss with CNS impairment from the 59 Ss without such impairment. The results indicated, however, that neither scoring system was useful for individual classification, whether the B-G was used alone or in combination with intelligence test results.

18.
The predictive validity of a psychological measure can be improved by minimizing measurement errors through increases in the length of the assessment (aggregation) and, for an assessment of finite length, by making use of objective strategies for choosing from all available component measures. Two prominent considerations in selecting individual measures to be aggregated involve standards of (a) item content (construct approach) and (b) item/criterion association (empirical approach). Personality trait scales of different lengths were assembled for this study in order to represent features of the construct and empirical methods of selection. It was observed that (a) although reliability and validity generally increased with test length, aggregation beyond a certain point can fail to be expedient; and (b) although the prediction performance of empirically derived measures initially surpassed that of construct based assessments, the superiority of the empirical scales did not generalize to trait criteria that were not used as a basis for item selection. The data are interpreted as providing support for a theory-based program of test development where substantive considerations involving item content play a major role. The findings are also viewed as encouragement for conventional conceptualizations about organized dimensions of behavior.
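
Entry 18's observation that reliability and validity rise with test length but with diminishing returns is consistent with classical test theory's Spearman-Brown formula and its companion for the validity of a lengthened test. The sketch assumes the added components are parallel to the original ones, which the study itself does not claim; the function names and starting values are hypothetical.

```python
import math


def spearman_brown(r_xx, k):
    """Reliability of a test lengthened by factor k with parallel components."""
    return k * r_xx / (1 + (k - 1) * r_xx)


def lengthened_validity(r_xy, r_xx, k):
    """Validity of the lengthened test: r_xy * sqrt(k / (1 + (k - 1) * r_xx))."""
    return r_xy * math.sqrt(k / (1 + (k - 1) * r_xx))


# Hypothetical starting point: a short scale with r_xx = .50 and r_xy = .30.
for k in (1, 2, 4, 8, 16, 32):
    print(k, round(spearman_brown(0.5, k), 3), round(lengthened_validity(0.3, 0.5, k), 3))
# Validity approaches, but never exceeds, r_xy / sqrt(r_xx) (about .424 here).
```

Each doubling of length buys a smaller gain than the last, which is one way to read the finding that aggregation beyond a certain point can fail to be expedient.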

20.
The Profile of Mood States was administered to samples of 182 college males, 179 college females, and 257 prison inmates. College males and females did not differ significantly from each other in terms of scale elevation but differed from prison inmates on all scales except Fatigue-Inertia. The college samples differed from the published normative college samples, suggesting the importance of using local norms. A confirmatory item factor analysis suggested convergent item validity with the scoring key and similarity of structure across samples. Discriminant item validity, however, suggested that a smaller number of mood scales would offer a more justifiable interpretation of this inventory. This study was supported by the Alberta Hospital Edmonton, the Solicitor General of Canada, and Social Sciences and Humanities Research Council of Canada Grant 410-80-0576-XI.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417