1.

The theory of test validity and correlated errors of measurement

Donald W. Zimmerman Richard H. Williams 《Journal of mathematical psychology》1977,16(2):135-152

In the theory of test validity it is assumed that error scores on two distinct tests, a predictor and a criterion, are uncorrelated. The expected-value concept of true score in the classical test-theory model as formulated by Lord and Novick, Guttman, and others, implies mathematically, without further assumptions, that true scores and error scores are uncorrelated. This concept does not imply, however, that error scores on two arbitrary tests are uncorrelated, and an additional axiom of “experimental independence” is needed in order to obtain familiar results in the theory of test validity. The formulas derived in the present paper do not depend on this assumption and can be applied to all test scores. These more general formulas reveal some unexpected and anomalous properties of test validity and have implications for the interpretation of validity coefficients in practice. Under some conditions there is no attenuation produced by error of measurement, and the correlation between observed scores sometimes can exceed the correlation between true scores, so that the usual correction for attenuation may be inappropriate and misleading. Observed scores on two tests can be positively correlated even when true scores are negatively correlated, and the validity coefficient can exceed the index of reliability. In some cases of practical interest, the validity coefficient will decrease with increase in test length. These anomalies sometimes occur even when the correlation between error scores is quite small, and their magnitude is inversely related to test reliability. The elimination of correlated errors in practice will not enhance a test's predictive value, but will restore the properties of the validity coefficient that are familiar in the classical theory.
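The anomalies described in this abstract can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes observed scores X = Tx + Ex and Y = Ty + Ey, reliability defined as the ratio of true-score variance to observed-score variance, and (an added assumption) that each test's true scores are uncorrelated with the other test's errors. The function names `observed_correlation` and `corrected_for_attenuation` are hypothetical.

```python
import math


def observed_correlation(rho_true, rel_x, rel_y, rho_error=0.0):
    """Correlation between observed scores on two tests.

    Assumption (ours, for illustration): cov(Tx, Ey) = cov(Ty, Ex) = 0, so
    cov(X, Y) = cov(Tx, Ty) + cov(Ex, Ey). Dividing by sd(X) * sd(Y) and
    using reliability = var(T) / var(X) gives the two terms below.
    """
    true_part = rho_true * math.sqrt(rel_x * rel_y)
    error_part = rho_error * math.sqrt((1.0 - rel_x) * (1.0 - rel_y))
    return true_part + error_part


def corrected_for_attenuation(rho_observed, rel_x, rel_y):
    """Classical correction, which presumes uncorrelated errors."""
    return rho_observed / math.sqrt(rel_x * rel_y)


# Modest true correlation, low reliabilities, positively correlated errors.
r_obs = observed_correlation(rho_true=0.3, rel_x=0.5, rel_y=0.5, rho_error=0.5)
r_corrected = corrected_for_attenuation(r_obs, rel_x=0.5, rel_y=0.5)

# Negative true correlation, same reliabilities and error correlation.
r_sign_flip = observed_correlation(rho_true=-0.1, rel_x=0.5, rel_y=0.5, rho_error=0.5)
```

Under these assumed values the observed correlation (about 0.40) exceeds the true-score correlation of 0.30, the classical correction pushes the estimate further away (about 0.80), and a true-score correlation of -0.10 yields a positive observed correlation (about 0.20). With `rho_error=0.0` the function reduces to the familiar attenuation formula.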

2.

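Effects of inducing different attributional styles on the post-failure test performance of students with different levels of self-esteem (total citations: 8; self-citations: 0; citations by others: 8)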

Tian Lumei (田录梅) 《Psychological Development and Education (心理发展与教育)》2003,19(4):62-65

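A 2×2 two-factor between-subjects design was used to test the effects of inducing different attributional styles on the post-failure test performance of students with different levels of self-esteem. The results showed that: (1) overall, the high self-esteem group performed significantly better on the test after failure than the low self-esteem group; (2) after internal attribution, the subsequent test performance of the high self-esteem group was very significantly better than that of the low self-esteem group, whereas after external attribution there was no significant difference between the high and low self-esteem groups; (3) for the high self-esteem group, subsequent test performance was better after internal attribution than after external attribution, but the difference did not reach significance; for the low self-esteem group, performance after external attribution was significantly better than after internal attribution.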

3.

The effect of humor on aggression catharsis in the classroom

A Ziv 《The Journal of psychology》1987,121(4):359-364

Two studies were designed to measure the cathartic effects of humor on aggressive responses. In the first study, two versions (easy and difficult) of Raven's intelligence test were administered to two groups of high school students. Only the easy version could be solved in the allotted time. Rosenzweig's (1951) Picture Frustration test was then administered and the students' aggressive responses were scored. Results showed that those who did not solve the problems had significantly higher scores on aggressivity than did the others. The second study, using four different groups, was planned according to a modified Solomon design. Two of the four groups of students completed the difficult part of the Raven test, and then two videotapes were presented: a humorous one to two groups and a neutral one to the others. Finally, the Rosenzweig Picture Frustration test was administered to all four groups. An analysis of variance computed on the aggressivity scores showed one significant difference: frustrated students who viewed the humorous videotape had lower scores than those viewing the neutral one.

4.

Apparatus for Smoking Kymograph Drum Papers

Shepherd Ivory Franz Thomas A. Watson 《The Journal of general psychology》2013,140(4):509-515

In a computer simulation study, random samples from a uniform density were substituted for each of two independent samples from normal and various nonnormal densities. This procedure was compared with conventional ranking and with Bell and Doksum's (1965) procedure, which substituted random normal deviates for initial sample values. After performing the Student t test, the program transformed the initial scores and performed additional t tests on ranks, random uniform scores, and random normal scores. For several distributions, the test on random normal scores was more powerful than the others, consistent with known asymptotic results. The probabilities of Type I and Type II errors of the test on random uniform scores were nearly the same as those of the Mann-Whitney-Wilcoxon test, for all distributions examined.
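The substitution step described in this abstract can be sketched as follows. This is one plausible reading, not the study's program: the pooled observations are replaced by order-matched random draws (the smallest observation receives the smallest draw, and so on) before a pooled-variance Student t statistic is computed. The names `substitute_random_scores` and `student_t`, the seed, and the sample values are hypothetical.

```python
import math
import random
import statistics


def substitute_random_scores(x, y, draw, rng):
    """Replace pooled observations with order-matched random scores.

    `draw(rng)` returns one random score, e.g. a uniform or a normal deviate.
    Ties in the data are not handled specially (continuous data assumed).
    """
    pooled = [(value, 0) for value in x] + [(value, 1) for value in y]
    order = sorted(range(len(pooled)), key=lambda i: pooled[i][0])
    scores = sorted(draw(rng) for _ in pooled)
    new_x, new_y = [], []
    for rank, index in enumerate(order):
        target = new_x if pooled[index][1] == 0 else new_y
        target.append(scores[rank])
    return new_x, new_y


def student_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [1.2, 0.4, 2.9, 1.7]
y = [3.8, 5.1, 4.4, 6.0]

# Random uniform scores, then random normal scores (Bell-Doksum style).
ux, uy = substitute_random_scores(x, y, lambda r: r.random(), rng)
nx, ny = substitute_random_scores(x, y, lambda r: r.gauss(0.0, 1.0), rng)
t_uniform = student_t(ux, uy)
t_normal = student_t(nx, ny)
```

Because the substitution preserves the ordering of the pooled data, any rank-based information survives while the original scale is discarded. Repeating the procedure over many simulated samples would be needed to estimate Type I and Type II error rates.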

5.

Assessment of a practice effect in serial sensory organization testing scores of healthy adults

Grindstaff TL Christiano KE Broos AM Straub DA Darr NS Westphal KA 《Perceptual and motor skills》2006,102(2):379-386

This study assessed whether a practice effect occurs across five serial administrations of the sensory organization test. Composite equilibrium scores and mean equilibrium scores from 30 healthy volunteers (M age = 36.9, SD = 12.2 yr.) performing each of the six test conditions were examined using a two-way repeated-measures analysis of variance. Analysis yielded a significant interaction between testing condition and time, as well as significant main effects for both condition and time. Pairwise comparisons showed significant differences among test conditions and the first and second times of test administration. Analysis of simple effects between the two administrations identified significant increases in composite equilibrium scores and mean equilibrium scores on two sway-referenced support surface conditions, vision removed and sway-referenced visual surround. An immediate increase in equilibrium scores suggests clinicians and researchers allow one practice trial before recording test scores for baseline measurements.

6.

The Effect of Humor on Aggression Catharsis in the Classroom

Avner Ziv 《The Journal of psychology》2013,147(4):359-364

Two studies were designed to measure the cathartic effects of humor on aggressive responses. In the first study, two versions (easy and difficult) of Raven's intelligence test were administered to two groups of high school students. Only the easy version could be solved in the allotted time. Rosenzweig's (1951) Picture Frustration test was then administered and the students' aggressive responses were scored. Results showed that those who did not solve the problems had significantly higher scores on aggressivity than did the others. The second study, using four different groups, was planned according to a modified Solomon design. Two of the four groups of students completed the difficult part of the Raven test, and then two videotapes were presented: a humorous one to two groups and a neutral one to the others. Finally, the Rosenzweig Picture Frustration test was administered to all four groups. An analysis of variance computed on the aggressivity scores showed one significant difference: frustrated students who viewed the humorous videotape had lower scores than those viewing the neutral one.

7.

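Effects of overviews and music on cognitive load and multimedia learning (total citations: 1; self-citations: 0; citations by others: 1)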

Gong Deying (龚德英), Liu Dianzhi (刘电芝), Zhang Dajun (张大均) 《Psychological Development and Education (心理发展与教育)》2008,24(1):83-87

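Students studied multimedia material on the basics of mobile communications. A 2 (overview: overview vs. no overview) × 2 (background music: music vs. no music) between-subjects design was used to examine the effects of the two factors on self-rated cognitive load, retention, and transfer performance. The results showed no interaction between the two factors. The overview group reported significantly lower cognitive load and achieved significantly higher transfer scores than the no-overview group, with no significant difference in retention scores between the two groups. The background-music group had significantly lower retention scores than the no-music group, with no significant differences in cognitive load or transfer scores between the two groups.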

8.

Intelligence as measured by the WAIS and a military draft board group test (total citations: 1; self-citations: 0; citations by others: 1)

E. L. MORTENSEN J. M. REINISCH T. W. TEASDALE 《Scandinavian journal of psychology》1989,30(4):315-318

Intelligence test scores derived from individual administration of the WAIS and group administration of a military draft board screening test were obtained for a sample of 232 young Danish males. The means on both tests show the sample was somewhat above the Danish average. Despite a more than four-year time interval between the two testings, and procedural and content differences between the tests themselves, the correlation between the two test scores was substantial (0.82). Consequently, we conclude that the group administered draft board test measures the same general intelligence as does the individually administered WAIS and is therefore well suited to large-scale epidemiological and demographic studies of intelligence.

9.

Reliability of a sentence completion measure of ego development.

C Redmore K Waldman 《Journal of personality assessment》1975,39(3):236-243

Studied the reliability of the Washington University Sentence Completion Test by giving 51 9th graders and 26 college students the test twice, a week apart. For 9th graders the design included a test-retest group and two groups given half of the test at each session. Although test-retest correlations were high for the 9th graders, retest scores dropped significantly. With college students (a) test-retest correlations, though positive and significant, were lower, (b) retest scores did not change systematically, and (c) percentage agreement between test and retest scores was high. Discrepant results were related to motivational set and variance in test scores. Split-half correlations and internal consistency coefficients were high. Likelihood of lower retest scores makes problematic the use of this test for short term pretest-posttest studies seeking to stimulate ego development.

10.

Diagnosing written language disabilities using the Woodcock-Johnson Tests of Educational Achievement-Revised and the Wechsler Individual Achievement Test

Brown MB Giandenoto MJ Bolen LM 《Psychological reports》2000,87(1):197-204

The writing portions of the Woodcock-Johnson Tests of Educational Achievement-Revised and the Wechsler Individual Achievement Test are often administered when establishing eligibility for special education services due to learning disabilities. The scores on these measures are typically regarded as equivalent although little is known about how scores on the two measures differ for the same students. Differences of only a few points, however, may affect eligibility for special education services. These tests were administered to 25 sixth grade students previously diagnosed with learning disabilities in written expression only. Students' Wechsler scores were consistently higher on the overall writing composite, while there was no difference in the mean scores on the language mechanics subtests. The WIAT Written Expression subtest mean, however, was significantly higher than the Woodcock-Johnson Writing Samples subtest mean. Use of the Wechsler test would be less likely to identify children for special education services in written expression when point discrepancy criteria are utilized for eligibility. Clinicians should be cognizant of the effect of the specific test chosen on eligibility outcome.

11.

Influence of test anxiety on measurement of intelligence (total citations: 1; self-citations: 0; citations by others: 1)

Oostdam R Meijer J 《Psychological reports》2003,92(1):3-20

In this study a measurement model for a test anxiety questionnaire was investigated in a sample of 207 Dutch students in the first grade of junior secondary vocational education. The results of a confirmatory factor analysis showed that a model for test anxiety with three factors for worry, emotionality, and lack of self-confidence is associated with a significantly better fit than a model comprised of only the first two factors. The relations of the three test anxiety factors to scores on intelligence tests for measuring verbal ability, reasoning, and spatial ability were examined. The results indicated that test anxiety appears to be transitory: the negative relation between test anxiety and test performance promptly fades away. Finally, we examined whether a distinction can be made between highly test anxious students with low performance due to worrisome thoughts (interference hypothesis) or low ability (deficit hypothesis). Results do not support the deficit hypothesis because the scores of all highly test anxious students increased in a less stressful situation.

12.

The effect on the test behavior of children, as reflected in the I.Q. scores, when reinforced after each correct response

Edlund CV 《Journal of applied behavior analysis》1972,5(3):317-319

This experiment studied the effect on intelligence test scores of a probable reinforcer given for correct responses. Eleven pairs of 5- to 7-yr-old children were matched on the basis of a strong liking of candy, no physical problems associated with eating it, parent permission to receive and eat the candy, age, sex, and a revised Stanford-Binet Scale Form L IQ score. The control group was given the revised Stanford-Binet Scale Form M, as prescribed in the test manual. The experimental group was also given Form M according to the manual, except M&M candy was given for each plus or correct response. There was an appreciable, statistically significant difference between the resulting IQ test scores of the two groups.

13.

Assessing the Effect of Language Demand in Bundles of Math Word Problems

Kathleen Banks Ahmad Jeddeeni Cindy M. Walker 《International Journal of Testing》2016,16(4):269-287

Differential bundle functioning (DBF) analyses were conducted to determine whether seventh and eighth grade second language learners (SLLs) had lower probabilities of answering bundles of math word problems correctly that had heavy language demands, when compared to non-SLLs of equal math proficiency. Math word problems on each of four test forms (two at Grade 7 and two at Grade 8) were bundled together if they used the passive voice, conditional clause, relative clause, or a combination of any two. The results showed that the average total score for SLLs was significantly lower than that for non-SLLs on each test form. However, only two bundles (passive voice and conditional clause at Grade 7 on Form 2) indicated statistically significant DBF against SLLs in favor of matched non-SLLs. An additional step was taken to determine whether the two bundles that showed statistically significant DBF against SLLs had biased the mean total scores for this group. The Walker, Zhang, Banks, and Cappaert (2012) procedure established for this purpose showed that this was not the case. Implications of the results are provided as well as suggestions for future research.

14.

Exploratory study of the relations between spatial ability and drawing from memory

Czarnolewski MY Eliot J 《Perceptual and motor skills》2012,114(2):627-640

Test scores of 119 students, attending either a public four-year college or a technical school, were related to their proportionality and detail drawing scores on the Memory for Designs Test. In regression models, the ETS Maze Tracing, Eliot-Price Mental Rotations, and Bender-Gestalt tests were consistent predictors of proportionality scores, with the latter two tests uniquely related to these. The ETS Shapes Memory Test and the Form Board Test were the strongest predictors for detail accuracy scores. The Shapes test predicted proportionality when the CTY Visual Memory Test BB was excluded. The models then provided support for the hypothesis that drawing designs from memory, a critical skill in drawing, regardless of whether one focuses on accuracy for proportionality scores or for detail scores, is jointly related to the measures of recognition, production, and traditional spatial ability measures. This study identified multifaceted skills in drawing from memory.

15.

A base-free measure of change

Ledyard R Tucker Fred Damarin Samuel Messick 《Psychometrika》1966,31(4):457-473

A model for the measurement of the discrepancy between two scores is presented and discussed as a paradigm for the study of growth or experimentally produced change. The model assumes two tests or measures differing in complexity, and it analyzes the true difference between the test scores into a component that is entirely dependent on the first or base-line test and a second component that is entirely independent of it. Equations for estimating both components are given and these are compared with other measurement efforts with similar goals.

16.

The effect of interactive multimedia illustrations on learning from a scientific expository text

Liu Rude (刘儒德), Chen Qi (陈琦), David Reid 《Applied Psychology (应用心理学)》2002,8(2):39-44

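Forty-eight 13-year-old secondary school students were divided into high and low learning-ability groups, and each group was further split into two subgroups that studied a biology expository text accompanied by one of two types of illustration (static multimedia illustrations or interactive multimedia illustrations). The results showed that: (1) on the picture test, which mainly assessed the level of illustration processing, the main effects of learning ability and illustration type were both significant, but there was no significant interaction between them; (2) on the text test, which mainly assessed the integrated processing of text and illustrations, there was a significant interaction between learning ability and illustration type, in that the two types of illustration differed significantly only when learning ability was high; (3) during the diagram-completion task, the number of attempts participants made correlated significantly with their illustration processing time and learning outcomes. These findings suggest that interactive multimedia illustrations promote illustration processing in all participants but promote the integrated processing of text and illustrations only in high-ability learners, and that the benefit of interactive multimedia illustrations for learning depends on the actual depth at which participants process the illustrations.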

17.

Test anxiety in UK schoolchildren: Prevalence and demographic patterns

《The British journal of educational psychology》2007,77(3):579-593

Background. Despite a large body of international literature concerning the antecedents, correlates of and treatments for test anxiety, there has been little research until recently using samples of students drawn from the UK. There is a need to establish some basic normative data for test anxiety scores in this population of students, in order to establish whether international research findings may generalize to UK schoolchildren. Aim. To collect some exploratory data regarding test anxiety scores in a sample of UK schoolchildren, along with socio-demographic variables identified in the existing literature as theoretically significant sources of individual and group differences in test anxiety scores. Sample. Key Stage 4 students (1348): 690 students in the Year 10 cohort and 658 students in the Year 11 cohort, drawn from seven secondary schools in the North of the UK. Method. Data on test anxiety were collected using a self-report questionnaire, the Test Anxiety Inventory (Spielberger, 1980) and additional demographic variables through the Student Profile Questionnaire. The factor structure of the Test Anxiety Inventory was explored using principal components analysis and multiple regression analysis used to predict variance in self-reported test anxiety scores from individual and group variables. Results. The principal components analysis extracted two factors, worry and emotionality, in line with theoretical predictions. Gender, ethnic and socio-economic background were identified as significant predictors of variance in test anxiety scores in this dataset. Whether English was an additional, or native, language of students did not predict variance in test anxiety scores and year group was identified as a predictor of emotionality scores only. Conclusion. Variance in the test anxiety scores of Key Stage 4 students can be predicted from a number of socio-demographic variables. Further research is now required to assess the implications for assessment performance, examination arrangements and appropriateness of using a North American measure of test anxiety in a UK context.

18.

Predicting one’s own and others’ 16 PF scores

Adrian Furnham 《Current Psychology》1989,8(1):30-37

This study examined the relationship between subjects’ actual test derived scores and their estimates of what those scores would be. Sixty subjects completed the 16 PF (form D) and then estimated the scores on each dimension for themselves and another person they knew well. The results showed significant positive correlations on 9 of the 16 dimensions for themselves. The dimensions they were best at estimating were Desurgency-Surgency, Untroubled adequacy-guilt proneness and Threctia-Parmia. Only two correlations (both negative) reached significance concerning their ability to predict another known person’s scores. Whereas subjects believed they were like the other person they nominated (13 of the 16 correlations were significantly positive), in actual fact their test derived scores showed only two significant findings, one positive and the other negative. Results are discussed in terms of lay theories of personality and their relationship to personality assessment.

19.

Predicting one’s own and others’ 16 PF scores

Adrian Furnham 《Current psychology (New Brunswick, N.J.)》1989,8(1):30-37

This study examined the relationship between subjects’ actual test derived scores and their estimates of what those scores would be. Sixty subjects completed the 16 PF (form D) and then estimated the scores on each dimension for themselves and another person they knew well. The results showed significant positive correlations on 9 of the 16 dimensions for themselves. The dimensions they were best at estimating were Desurgency-Surgency, Untroubled adequacy-guilt proneness and Threctia-Parmia. Only two correlations (both negative) reached significance concerning their ability to predict another known person’s scores. Whereas subjects believed they were like the other person they nominated (13 of the 16 correlations were significantly positive), in actual fact their test derived scores showed only two significant findings, one positive and the other negative. Results are discussed in terms of lay theories of personality and their relationship to personality assessment.

20.

The development of orthographic awareness in Chinese preschool children

Qian Yi (钱怡), Zhao Jing (赵婧), Bi Hongyan (毕鸿燕) 《Acta Psychologica Sinica (心理学报)》2013,45(1):60-69

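This study recruited 31 three-year-olds, 48 four-year-olds, and 33 five-year-olds from kindergartens in the Beijing area and used a single-radical awareness test and a radical position and function awareness test to systematically examine the development of each level of orthographic awareness in preschool children. The single-radical awareness test comprised three parts: radical substitution, radical omission, and radical rotation. The radical position and function awareness test comprised pseudocharacters and noncharacters, and the noncharacters in turn included noncharacters built from two semantic radicals and semantic-phonetic noncharacters. On the single-radical awareness test, five-year-olds scored significantly higher than three-year-olds on radical substitution, with no significant differences between the three- and four-year-old groups or between the four- and five-year-old groups; on radical omission and radical rotation, scores showed a clear upward trend across the three age groups. On the radical position and function awareness test, pseudocharacter scores did not differ significantly across the three age groups, whereas noncharacter scores increased significantly with age. These results indicate that single-radical awareness is still developing throughout the preschool period: the ability to reject noncharacters formed by radical substitution develops earlier, and the ability to reject noncharacters with omitted or rotated radicals develops later. Awareness of radical position and function has already begun to develop in the preschool period: three-year-olds already recognize that pseudocharacters conform to orthographic rules, but recognition that noncharacters violate the positional legality and functional completeness of radicals only begins to emerge at around age four and is still immature at age five.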