Similar literature
1.
The issue of the absence of parallel forms for the traditional individual intelligence tests has received little attention in the area of psychological testing ever since the early demise of the Wechsler Bellevue Form II and the delayed discontinuance of Form M of the Stanford-Binet Intelligence Scale. Five reasons have been presented here to argue that the availability of parallel forms could have both theoretical and practical benefits, especially if the constructors of individual ability and/or achievement tests employ the recent advances in item response theory and computer technology.

2.
Assessment centers rely on multiple, carefully constructed behavioral simulation exercises to measure individuals on multiple performance dimensions. Although methods for establishing parallelism among alternate forms of paper-and-pencil tests have been well researched (i.e., to equate tests on difficulty such that the scores can be compared), little research has considered the why and how of parallel simulation exercises. This paper extends established procedures for constructing parallel test forms to dimension-based behavioral simulations. We discuss reasons for establishing comparable, alternate simulation forms and discuss the issues raised when applying traditional procedures to simulation exercises. After proposing a set of guidelines for establishing alternate forms among simulations, we apply these guidelines to simulations used in an operational assessment center.

3.
In a concurrent validity study, a comprehensive job analysis of a mid-level secretarial position resulted in the development of highly valid employment selection instruments. Six hundred fifty-nine supervisors and 883 incumbents in 20 locations participated in the job analysis. Scores from the selection test correlated with composite ratings from a research performance appraisal (RPA) .41 (p<.001). Two forms of the test (A and B) were developed. Form A and Form B test scores and RPA composite ratings correlated .55 and .48 (p<.001) respectively. The unbiased estimate of equivalence reliability of Form A and Form B was .94. The two tests correlated .89 (p<.001).

4.
The interpretation of retest scores is problematic because they are potentially affected by measurement and predictive bias, which impact construct validity, and because their size differs as a function of various factors. This paper investigates the construct stability of scores on a figural matrices test and models retest effects at the level of the individual test taker as a function of covariates (simple retest vs. training, use of identical vs. parallel retest forms, and general mental ability). A total of N = 189 subjects took two tests of matrix items that were automatically generated according to a strict construction rationale. Between test administrations, participants in the intervention groups received training, while controls did not. The Rasch model fit the data at both time points, but there was a lack of item difficulty parameter invariance across time. Training increased test performance beyond simple retesting, but there was no large difference between the identical and parallel retest forms at the individual level. Individuals varied greatly in how they profited from retest experience, training, and the use of identical vs. parallel retest forms. The results suggest that even with carefully designed tasks, it is problematic to directly compare scores from initial tests and retests. Test administrators should emphasize learning potential instead of state level assessment, and inter-individual differences with regard to test experience should be taken into account when interpreting test results.

6.
A rationale for, and data from, a trial of a theory of item generation by algorithms whose origins are cognitive models of task performance are presented. Since Spearman (1904), intelligence has been operationally defined and assessed in human subjects by administering identical test items whose content and order have been fixed only after empirical iterations. In our approach, intelligence is ostensively defined by theoretically determined algorithms used for item construction and presentation. Knowledge of what cognitive factors limit human performance makes it possible to vary within tightly specified parameters those features of the tasks that contribute to difficulty, which we call radicals, to let those components of the tasks that do not contribute to difficulty vary randomly, and to counterbalance aspects of answer production that might induce biases of response. Empirical data are based on the generation of five different short tests demanding only functional literacy as a prerequisite for their execution. Four parallel forms of each test were administered to young male Army recruits whose scores were collated with their Army Entrance Test results, which were not previously known to us. Results show that the parallel, algorithm-generated item sets are statistically invariant, which item generation theory demands; and that the individual tests differentially predict Army Entrance Test scores. We conclude that IQ test performances are parsimoniously explained by individual differences in encoding, comparison and reconstructive memory processes.

7.
In this pilot study, 20 middle-school-age children classified as emotionally handicapped were administered Forms L and M of the Peabody Picture Vocabulary Test--Revised in test-retest fashion. Pearson correlations for Form L were .90, for Form M, .69, and these dependent correlations were significantly different from each other. As triennial school psychological evaluations typically contain tests which have been administered previously, e.g., WISC-R, WRAT-R, we suggest that psychologists use caution when using Form M to test or retest the receptive vocabulary of emotionally handicapped or disturbed middle-school-age children.

8.
The assumption is presented of the test-taker as a hypothesis-generating organism who can become "testwise." Testwiseness is defined as a stable skill, acquired by test-taking experiences, by which an individual can make test responses conform to a desired response pattern. Forty-three college students completed two forms of The Personality Research Form (PRF) and a rank ordering of their predicted personality need pattern. Results show significantly higher correlations of PRF predictions in the second administration. Analyses show PRF profiles, not predictions, to have been modified. Furthermore, high testwise subjects had higher needs for Understanding and Nurturance, and lower needs for Aggression and Defendence than low testwise persons. The importance of considering testwiseness, given trends in society encouraging access to psychological records, is discussed.

10.
A procedure for developing alternate test forms that are parallel in the sense that scores on the different forms have similar means, standard deviations, and factor structures is described and applied to a bio-data inventory and a situational judgment test. Careful consideration of item-by-item parallelism during development resulted in alternate forms that were parallel at the item level. Further, comparison with a biodata test form comprised of items randomly selected from a pool of biodata items revealed that for the types of measures described here it may be necessary to produce parallel forms of each item to create alternate forms that are parallel in the way in which Cronbach (1947) originally defined parallelism.

11.
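Population invariance of equating for parallel tests under different definitions (total citations: 1; self-citations: 0; citations by others: 1)
Population invariance is an important assumption of equating: the equating function should be the same across different subgroups of examinees. This study presents a theoretical analysis and a simulation study of the population invariance of linear equating under different definitions of parallel tests; in the simulation, the REMSD index was computed under six different weighting schemes. The results show that, for strictly parallel tests, the REMSD index is larger when reliability is lower; that subgroup mean differences and reliability differences have a clear interaction effect on REMSD; and that REMSD is largest when the expectation weights are equal and smallest when the score weights are proportional to subgroup size. Finally, the results are discussed, and recommendations are given on the choice of REMSD weights and on further research.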

12.
Evidence from 85 adult medical outpatients supported psychometric comparability of the 2 halves of the Washington University Sentence Completion Test (SCT) Form 81 and of the female and male forms of the SCT. There was slightly stronger internal consistency for the first versus the second half of the SCT. Each half correlated highly with the ogive total protocol rating and 36-item-sum rating. Intercorrelations of the 2 halves with external measures also suggested essentially equivalent relations. For the 30 identical items across gender, the median correlation between individual item ratings with the item-sum ratings was nearly equal for women and men. When the 6 nonidentical items were considered with the identical items, the median item-total correlation was slightly higher for men (45) than women (41). This difference was accounted for by the slightly larger variability in the male subsample. Practically speaking, the 2 halves and the female and male forms may be used with minimal concern regarding psychometric comparability in similar medical outpatient settings.

15.
Hospitalized psychiatric patients (n = 115) completed either Form A or Form B of the Whitaker Index of Schizophrenic Thinking (WIST), along with the Beck Depression Inventory, State Anxiety Inventory, and the MMPI. Only error scores on the WIST were calculated in an effort to assess validity of the WIST for use in group testing situations where individual timing and administration are cumbersome. Results supported the convergent and discriminant validity of Form A, where significant correlations were found with measures of thought disorder (MMPI F, Pt, Sc and Pa) but not with indices of other symptomatology, such as depression and anxiety. Form B did not show such validity, with only one significant correlation with other measures (MMPI Pd). Both WIST Forms correctly identified nonschizophrenics (76% for Form A and 73% for Form B) more often than schizophrenics (57% for each form). Also, Form A was found to be negatively related to years of education. Suggestions for further research on the influence of intelligence and social class variables on WIST scores were made. Overall, Form A emerged as the most valid WIST form, with suggestions for its clinical use being offered.

16.
While individual and group psychotherapy are often referred to as forms of secular confession, the relationship of early religious confessional practices to the psychology of contemporary helping group processes needs further exploration. An examination of the theology and form of the Catholic rites of reconciliation indicates that their psychology and structure clearly parallel many of the healing processes at work in group psychotherapy.

17.
H. M. Fowler, Psychometrika, 1947, 12(3): 221-232
Results of an experiment to obtain data on the consistency of the items of two forms of an Activity Preference Blank are presented. Both Form I and Form II, which was a revised edition of Form I, were administered twice, so consistency data are available for both forms. A sub-item is said to be consistent if a high proportion of men marked it the same way, M for preferred Most and L for preferred Least, on both administrations of the test. The data of the experiment were investigated to see what happens to the consistency of sub-items when the items are changed in context, when the number of sub-items in an item is reduced, and when the time-interval between the administration and the re-administration of the test is increased. The author also gives data on the consistency of the responses made to particular combinations of sub-items and data on item consistency when all sub-item combinations are taken into consideration.

18.
The assumption that individual differences in recognition memory are associated with individual differences in intelligence was explored by administering intelligence tests and tests of immediate visual recognition memory to a sample of 52 5-year-old children expected to vary widely from one another in intelligence. Each child was given the Peabody Picture Vocabulary Test (Form L) and two tests of immediate recognition memory: one test for 27 abstract patterns and one test for 27 unfamiliar cartoon faces. The mean PPVT-IQ for the sample was in the average range at 98.1. Interindividual variability in IQ proved to be high as reflected in the group SD of 22.6, with scores ranging from 40 to 136. The recognition tasks proved to be of moderate difficulty. Individual differences in memory for patterns were highly related to memory for faces (r = .76), indicating that the overall recognition test was reliable. The most important result of the present study was the strong association between recognition memory performance and PPVT-IQ of .70. The relation between recognition memory and IQ could not be accounted for by the inclusion of a few very low IQ children, since the association remained high at .61 when children with IQs below 75 were omitted from analysis. In short, the present results indicate that immediate recognition memory is highly associated with intelligence.

19.
Aphasic patients were given digit and noun dichotic listening tests at monthly intervals during the first 6 months postonset of left hemisphere ischemic infarctions. Significant changes in performance were observed that consisted of parallel increases in scores for the two ears, and thus did not support the hypothesis that language recovery is mediated by transfer of language dominance from the left to the right hemisphere. Reliable differences also were noted between performance on the two tests. RE scores accounted for most of the variance among patients in performance on dichotic tests and, as such, appeared to be the best measure for characterizing individual differences.

20.
Sixty-five Mexican American undergraduates completed a battery of tests, including the Expectations About Counseling-Brief Form B (EAC-B) and the Marlowe-Crowne Social Desirability Scale-Form XX. Statistical analyses showed significant counselor ethnicity and participant gender main and interaction effects on EAC-B ratings related to client attitudes and behaviors, counselor attitudes and behaviors, counselor characteristics, and counseling process.
