Similar articles
 20 similar articles found (search time: 119 ms)
1.
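The main purpose of this study is to examine whether the urban-rural gap in National College Entrance Examination (Gaokao) English scores stems from differential item functioning (DIF) between urban and rural examinees. An item exhibits DIF when two groups of examinees with the same ability perform differently on that item. The study used the standardization method (standardized score differences) with the STDIF software to analyze, item by item, whether the objective items of the three 2016 national Gaokao English papers showed urban-rural DIF. After establishing that the objective items were free of DIF, the objective-item score was used as the matching variable, and the conditional score plot method was used to analyze whether the writing task showed urban-rural DIF. No items with urban-rural DIF were found in National Papers I, II, or III. The national Gaokao English papers can therefore be regarded as fair and impartial to examinees with urban and rural household registration, and the urban-rural difference in English scores is not attributable to item fairness.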
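
The standardization idea described in this abstract can be sketched roughly as follows. This is a minimal illustration of a standardized p-difference for a single dichotomous item, assuming focal-group counts are used as the weights at each matching-score level; it is not the STDIF software's implementation, and the function name `std_p_dif` and the group labels `"rural"` and `"urban"` are hypothetical.

```python
from collections import defaultdict


def std_p_dif(scores, groups, item_correct, focal="rural", reference="urban"):
    """Standardized p-difference for one dichotomous (0/1) item.

    scores:       matching-variable score for each examinee
    groups:       group label for each examinee (focal or reference)
    item_correct: 0/1 response of each examinee on the studied item
    Weights are the focal-group counts at each score level (an assumption).
    """
    # counts[level][group] = [number of examinees, number answering correctly]
    counts = defaultdict(lambda: {focal: [0, 0], reference: [0, 0]})
    for s, g, x in zip(scores, groups, item_correct):
        counts[s][g][0] += 1
        counts[s][g][1] += x

    weighted_diff = 0.0
    total_weight = 0.0
    for level in counts.values():
        n_f, k_f = level[focal]
        n_r, k_r = level[reference]
        if n_f == 0 or n_r == 0:
            continue  # a level needs both groups to allow a matched comparison
        weighted_diff += n_f * (k_f / n_f - k_r / n_r)
        total_weight += n_f

    if total_weight == 0:
        raise ValueError("no score level contains both groups")
    return weighted_diff / total_weight
```

A value near zero means that examinees matched on the criterion score answer the item correctly at similar rates in both groups. A negative value means the item is harder for the focal group than for matched reference-group examinees.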

2.
A method is presented for converting the scores on one form of a test to those on another form of the same test. The method is particularly applicable to the case where each form has been administered to a different group and the only link between the two forms is a subset of items common to both. The proposed method, called the item method of conversion, has been applied to several tests for which other methods of conversion are available for comparison. The necessary data are limited to tests for which the total score is the criterion for item analyses. The method gives highly satisfactory results for all the tests to which it has been applied, particularly when the two groups are rather different, in which case the delta method (a different item method) is inappropriate. The authors are only two of a group, including W. H. Angoff, F. M. Lord, and M. K. Schultz, all of whom have made important contributions to this paper.

3.
The item difficulty and discrimination index values for the Tactual Performance Test Location scores are presented. The difference between this and the previous item analysis of the Location score is that in this analysis the subjects' Location scores were used only if the subject correctly remembered the block.

4.
Woodrow  Herbert 《Psychometrika》1937,2(4):237-247
It is shown that in certain cases practice data approximately meet the assumptions involved in Thurstone's method of absolute scaling. An application of the method was accordingly made in the case of four test performances practiced for 39 days by a group of 56 subjects. The manner in which the practice data were scaled is described by using the data on practice in anagrams as an illustration. Scaling had little effect upon the correlations between initial and final score, but produced marked changes in the apparent effect of practice upon individual differences and in the correlations between initial score and gain. This is one of a series of studies of the application of scaling methods to problems of experimental psychology in which the data are obtained from but a single group of subjects. For other studies, see (4), (5), (6), and (8).

5.
By the use of the Seashore tests of Pitch Discrimination, Intensity Discrimination, Time Discrimination, and the test of Tonal Memory, it is shown that the easiness of an item, as determined by absolute scaling methods, is proportional to the logarithm of the magnitude of the stimulus. It is proposed that this is a case of Fechner's psychophysical law and that the unit of absolute scaling as applied to test items may become a satisfactory unit of all S-scales in the more traditional psychophysical problem.

6.
The concept of "absolute scaling" (Zwislocki & Goodman, 1980) implies that direct judgments of sensory magnitude not only reflect the relative positions of the stimuli being judged, but also permit us to assess level differences in sensation. In order to explore this notion for different scaling methods, in the present investigation we compared magnitude estimation with category partitioning, a verbally anchored categorization procedure, in scaling painful pressure stimuli covering different intensity ranges. The results indicate that when the same stimulus range was presented after 1 week, both methods appeared to be highly reliable, with category partitioning faring somewhat better than magnitude estimation. When the stimulus range was unobtrusively changed between sessions, both methods reflected the within-subjects shift in absolute level. When two different sets of subjects judged the slightly different stimulus ranges, both methods resulted in scale values consistent with absolute scaling, though only category partitioning was sensitive enough to differentiate the two stimulus ranges. The results are discussed in the context of different possibilities of anchoring direct scaling methods in order to obtain "absolute" level information.

7.
The additive constant problem in multidimensional scaling   Total citations: 1, self-citations: 0, citations by others: 1
The problem of choosing the correct additive constant to convert relative interstimulus distances to absolute interstimulus distances in multidimensional scaling is investigated. An artificial numerical example is constructed, and various trial values of the constant are inserted to demonstrate the effect on the multidimensional map of making a variety of incorrect choices. Finally, a general solution to the problem, suggested by Dr. Ledyard R Tucker, is presented; each of the computational steps in this solution is set down for easy reference. This study was supported in part by Office of Naval Research Contract N6onr-270-20 and by National Science Foundation Grant G-642 to Princeton University.

8.
The purpose of the present study was to determine whether the results obtained by the scaling methods of magnitude estimation and magnitude production could be influenced by providing subjects with prior exposure to psychophysical scaling in the form of magnitude estimation or magnitude production. Group 1 (n = 10, M age = 21.1 yr.) performed lingual vibrotactile magnitude estimation followed by lingual vibrotactile magnitude production. Group 2 (n = 10, M age = 19.7 yr.) performed lingual vibrotactile magnitude production (using the magnitude-estimation responses provided by Group 1), followed by lingual vibrotactile magnitude estimation. For the magnitude estimations there was no overall statistically significant difference between the two groups, but there was for the magnitude-production values. Magnitude-estimation scaling was apparently not influenced by prior exposure to magnitude production, while magnitude-production scaling was influenced by prior exposure to magnitude estimation. The results are discussed in terms of how subjective scaling behavior in psychophysical experimentation may be influenced by the interaction between an absolute internal scaling mechanism and parameters set by the experimenter, such as scaling method and range of stimulus intensity.

9.
Multidimensional scaling: I. Theory and method   Total citations: 19, self-citations: 0, citations by others: 19
Torgerson  Warren S. 《Psychometrika》1952,17(4):401-419
Multidimensional scaling can be considered as involving three basic steps. In the first step, a scale of comparative distances between all pairs of stimuli is obtained. This scale is analogous to the scale of stimuli obtained in the traditional paired comparisons methods. In this scale, however, instead of locating each stimulus-object on a given continuum, the distances between each pair of stimuli are located on a distance continuum. As in paired comparisons, the procedures for obtaining a scale of comparative distances leave the true zero point undetermined. Hence, a comparative distance is not a distance in the usual sense of the term, but is a distance minus an unknown constant. The second step involves estimating this unknown constant. When the unknown constant is obtained, the comparative distances can be converted into absolute distances. In the third step, the dimensionality of the psychological space necessary to account for these absolute distances is determined, and the projections of stimuli on axes of this space are obtained. A set of analytical procedures was developed for each of the three steps given above, including a least-squares solution for obtaining comparative distances by the complete method of triads, two practical methods for estimating the additive constant, and an extension of Young and Householder's Euclidean model to include procedures for obtaining the projections of stimuli on axes from fallible absolute distances. This study was carried out while the author was an Educational Testing Service Psychometric Fellow at Princeton University. The author expresses his appreciation to his thesis adviser, Dr. H. Gulliksen, for his guidance throughout the study and to Dr. B. F. Green, Jr., for valuable assistance on several of the derivations.
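
The third step described in this abstract (recovering projections on axes from absolute distances) is commonly illustrated with the double-centering and eigendecomposition procedure usually called classical MDS. The sketch below is a minimal version under the assumption that the additive constant has already been estimated, so that the input holds absolute distances; the function name `classical_mds` and the choice of NumPy are illustrative, not taken from the paper.

```python
import numpy as np


def classical_mds(distances, n_dims=2):
    """Recover coordinates from a symmetric matrix of absolute distances.

    Assumes the additive constant has already been applied, i.e. the
    entries are ordinary (approximately Euclidean) distances.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]

    # Double-center the squared distances to obtain scalar products
    # taken about the centroid of the stimuli.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering

    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]

    # Fallible distances can produce small negative eigenvalues; clip them.
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)
```

The recovered coordinates are determined only up to rotation, reflection, and translation, so a check should compare the interpoint distances of the solution with the input rather than the coordinates themselves.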

10.
Best-worst scaling is a judgment format in which participants are presented with a set of items and have to choose the superior and inferior items in the set. Best-worst scaling generates a large quantity of information per judgment because each judgment allows for inferences about the rank value of all unjudged items. This property of best-worst scaling makes it a promising judgment format for research in psychology and natural language processing concerned with estimating the semantic properties of tens of thousands of words. A variety of different scoring algorithms have been devised in the previous literature on best-worst scaling. However, due to problems of computational efficiency, these scoring algorithms cannot be applied efficiently to cases in which thousands of items need to be scored. New algorithms are presented here for converting responses from best-worst scaling into item scores for thousands of items (many-item scoring problems). These scoring algorithms are validated through simulation and empirical experiments, and considerations related to noise, the underlying distribution of true values, and trial design are identified that can affect the relative quality of the derived item scores. The newly introduced scoring algorithms consistently outperformed scoring algorithms used in the previous literature on scoring many-item best-worst data.
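
For orientation, the simplest way to turn best-worst judgments into item scores is a counting rule: the number of times an item was chosen as best, minus the number of times it was chosen as worst, divided by the number of times it was shown. The sketch below illustrates only this baseline and is not one of the new algorithms the abstract refers to; the function name `best_worst_counts` and the `(items, best, worst)` trial layout are assumptions made for the example.

```python
from collections import Counter


def best_worst_counts(trials):
    """Score items by (times best - times worst) / times shown.

    trials: iterable of (items, best, worst), where items is the set of
    items shown on that trial and best/worst are the chosen items.
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, chosen_best, chosen_worst in trials:
        shown.update(items)
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    # Scores fall in [-1, 1]; items never chosen either way score 0.
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}
```

The rule takes a single pass over the trials, so it scales to large item pools, but it uses only the direct choices and ignores the indirect rank information that each judgment carries about the unchosen items.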

11.
A formulation different from Guttman's is presented. The two formulations are both called the optimal scaling approach, and are proven to provide identical scale values. The proposed formulation has at least two advantages over Guttman's: (i) it serves to clarify close relations of the optimal scaling approach to those of Slater and the vector model of preferential choice, and (ii) in addition to the stimulus scale values, it provides scores for the subjects, which indicate the degrees of response consistency (transitivity) relative to the optimum solution. The method is assumption-free and capable of multidimensional analysis. This study was partly supported by the National Research Council Grant (No. A4581) to S. Nishisato. The author is indebted to Dr. Bert F. Green, Jr., Mr. Tomoichi Ishizuka, and anonymous reviewers for their valuable comments on an earlier draft.

12.
An equation is derived for predicting the effect of chance success, relative to item difficulty, on item-test correlation. The values predicted by this equation and by equations derived by Guilford and Carroll for predicting the effect of chance success on item difficulty and test reliability are compared with empirical values in an experiment which used identical test items in multiple-choice and answer-only form. Condensation of a dissertation presented in partial fulfillment of the requirements for the Ph.D. degree to the University of Chicago. Grateful acknowledgment is made to Professor Harold Gulliksen for his guidance as thesis advisor and to Professor L. L. Thurstone and Dr. D. W. Fiske of the University of Chicago who served as members of the thesis committee. The author is also indebted to Professor S. S. Wilks for review of the derivations and development of statistical tests used in the thesis, to Dr. L. R Tucker for technical advice, and to Dr. W. G. Mollenkopf for critical comments on the derivations and interpretations. The writer expresses appreciation to the Educational Testing Service for making available its technical facilities, and to the University of Chicago for the flexible administrative arrangement which made this thesis possible.

13.
The concept of "absolute scaling" (Zwislocki & Goodman, 1980) implies that direct judgments of sensory magnitude not only reflect the relative positions of the stimuli being judged, but also permit us to assess level differences in sensation. In order to explore this notion for different scaling methods, in the present investigation we compared magnitude estimation with category partitioning, a verbally anchored categorization procedure, in scaling painful pressure stimuli covering different intensity ranges. The results indicate that when the same stimulus range was presented after 1 week, both methods appeared to be highly reliable, with category partitioning faring somewhat better than magnitude estimation. When the stimulus range was unobtrusively changed between sessions, both methods reflected the within-subjects shift in absolute level. When two different sets of subjects judged the slightly different stimulus ranges, both methods resulted in scale values consistent with absolute scaling, though only category partitioning was sensitive enough to differentiate the two stimulus ranges. The results are discussed in the context of different possibilities of anchoring direct scaling methods in order to obtain "absolute" level information.

14.
Residuals for checking model fit in the polytomous Rasch model are examined. Comparisons are made between using counts for all response patterns and using item totals for score groups for the construction of the residuals. For the residuals based on score group totals, comparisons are also made between using the item totals and using the estimated item parameters as the basis. The developed methods are illustrated by two examples, one from a psychiatric rating scale and one from a Danish Welfare Study.

15.
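This study used the Chinese revised Rosenberg Self-Esteem Scale (RSES-R) to examine how the random intercept factor analysis model performs in controlling item wording effects. A questionnaire comprising the RSES-R and the Over-Claiming Questionnaire was administered to 621 secondary school students. The results show that the random intercept model had good fit indices and reasonable factor variances and loadings, and that the self-esteem factor scores correlated very highly with RSES-R total scores, indicating that the model can effectively separate the trait and wording effects in RSES-R scores. The separated wording-effect factor scores correlated significantly but weakly with respondents' level of self-enhancement, indicating that the wording effect shares a common component with respondents' social desirability.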

16.
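Ding Shuliang  Luo Fen  Dai Haiqi  Zhu Wei 《Acta Psychologica Sinica》2007,39(4):730-736
Within the IRT framework, a unidimensional two-parameter logistic multiple-attempt, multiple-item (MAMI) test model under 0-1 scoring is established. Compared with the multiple-attempt, single-item (MASI) model given by Spray, MAMI is not only a more refined model but also has a wider range of application; its parameter estimation method also differs, using the EM algorithm to obtain the item parameters. Monte Carlo simulation results show that, compared with correspondingly increasing the test length, the MAMI model yields the same precision of ability estimates but higher precision of item parameter estimates. Compared with correspondingly increasing the number of examinees, the item parameter estimates are equally precise, but MAMI gives more precise ability estimates. This finding shows that, under certain conditions, if answer changes are allowed and cumulative scoring is used, then even with an unchanged number of items the precision of ability estimation can match that obtained by doubling the number of items, and the precision of item parameter estimates also improves. These findings are a useful reference not only for skill assessment and cognitive ability assessment but also for data processing methods.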

17.
Many questions in the social sciences reduce to a comparison of mean values across groups in a classical analysis of variance F test. Often the original data may come from a set of items in a questionnaire or personality inventory. When this occurs, some sort of data reduction, combining of items, or scaling procedure is first performed before the hypothesis of no difference in mean values across groups can be tested. In many cases, this problem causes undue concern to a researcher because the effect of the scoring procedure on the distribution of F is not clear. To help solve this problem, this study was undertaken to investigate whether the method used to calculate scores has any effect on the magnitude of the F ratio in an analysis of variance, for, if it were shown that no statistical difference existed, then a researcher would have some justification for showing the procedure having minimal messes. On the other hand, if statistical differences were to arise because of the kind of scaling procedure employed, then a researcher would have to be more cautious in his choice. For this empirical investigation, Guttman, Saaotor, and simple sum scores were generated using item responses from a large pool of high school seniors. No difference in scoring method was detected when the F ratios resulting from each of the three scoring methods were analyzed. This suggests that, for chin analyses, a simple sum score may be as effective as scores derived by more complicated methods.

18.
Lord  Frederic M. 《Psychometrika》1960,25(4):325-342
Formulas are derived for using the available item statistics and score statistics on a test to estimate the moments of the score distribution of a lengthened (or shortened) form of the same test. Other formulas are derived for estimating the bivariate moments of the scatterplot between two parallel test forms using only the data available on either form alone. An empirical study is made showing in each case satisfactory agreement between the theoretical values predicted from the formulas and the values actually observed. These results suggest the utility of the true-score model used in deriving the formulas. This work was supported by contract Nonr-2752(00) between the Office of Naval Research and Educational Testing Service. Reproduction in whole or in part for any purpose of the United States Government is permitted.

19.
A discriminant-analysis method for dichotomized data, based on the weighted H-index as the similarity measure between two persons, is introduced. The weight assigned to each item is a strictly increasing function of the absolute value of its D-estimate. Here, only power functions are used. The method, called WHIDD-analysis, is applied to some clinical data (Jonsson, 1975). The power of 3 produces a correct classification of all 32 persons in the validation group.

20.
Under certain assumptions an expression, in terms of item difficulties and intercorrelations, is derived for the curvilinear correlation of test score on the ability underlying the test, this ability being defined as the common factor of the item tetrachoric intercorrelations corrected for guessing. It is shown that this curvilinear correlation is equal to the square root of the test reliability. Numerical values for these curvilinear correlations are presented for a number of hypothetical tests, defined in terms of their item parameters. These numerical results indicate that the reliability and the curvilinear correlation will be maximized by (1) minimizing the variability of item difficulty and (2) making the level of item difficulty somewhat easier than the halfway point between a chance percentage of correct answers and 100 per cent correct answers.
