Similar literature
 20 similar articles found (search time: 0 ms)
1.
2.
Individually tailored behavioral ratings were introduced as a possible alternative to time sampling or counting each occurrence of the behavior. Interrater reliability was found to be good for the more specific behavior categories, and reliability between ratings obtained from direct observation and those based on unit records was acceptable. The usefulness of individually tailored behavioral rating scales in demonstrating behavior change is illustrated through the presentation of a representative case.

3.
4.
Garb HN. Psychological Assessment, 2007, 19(1): 4-13
To evaluate the value of computer-administered interviews and rating scales, the following topics are reviewed in the present article: (a) strengths and weaknesses of structured and unstructured assessment instruments, (b) advantages and disadvantages of computer administration, and (c) the validity and utility of computer-administered interviews and rating scales. Computer-administered evaluations are more comprehensive and reliable and less biased than evaluations routinely conducted in clinical practice. Also, the use of continuous monitoring systems, which increasingly entail the use of computer administration, has been related to improved treatment outcome. However, the use of computer-administered interviews and rating scales will sometimes lead to false positive diagnoses, and for this reason, it is recommended that computer assessment be combined with clinical judgment.

5.
For tests in which the score on an item is not restricted to 0 and 1 but may be any number on a continuous scale, a procedure for estimating an examinee's true score is given. For the case of 0, 1 item scoring, this problem was considered by Lord [1959]. Following Lord, the least squares estimation procedure is used and the regression coefficient is obtained, which is compared with the generalized KR(20) and KR(21) formulas. Also, results are discussed using analysis of variance models. Now at Brooklyn College of the City University of New York.
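
For orientation, the standard textbook forms of the quantities named in this abstract are sketched below. They cover the 0, 1 scoring case only; the generalized formulas derived in the paper are not reproduced here. The symbols are assumptions made for this illustration: k items, p_i the proportion of examinees passing item i, q_i = 1 - p_i, observed total score X with mean \bar{X} and variance \sigma_X^2, and \rho_{XX'} the reliability of X.

```latex
% KR-20 and KR-21 reliability coefficients for k dichotomous items
\mathrm{KR}(20) = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right),
\qquad
\mathrm{KR}(21) = \frac{k}{k-1}\left(1 - \frac{\bar{X}\,(k - \bar{X})}{k\,\sigma_X^2}\right)

% Least squares (regression) estimate of an examinee's true score T
\hat{T} = \bar{X} + \rho_{XX'}\,(X - \bar{X})
```

In the regression estimate the reliability plays the role of the regression coefficient, which is why a coefficient obtained by least squares can be compared with KR(20) and KR(21).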

6.
This paper argues that test data are ordinal, that latent trait scores are only determined ordinally, and that test data are used largely for ordinal purposes. Therefore it is desirable to develop a test theory based only on ordinal assumptions. A set of ordinal assumptions is presented, including an ordinal version of local independence. From these assumptions it is first shown that the gamma-correlation between two tests is the product of their gamma-correlations with the true latent order. The theory is generalized to allow for heterogeneous tests by defining a weighted average local independence. The tau-correlations between total score and the latent order can be found in both homogeneous and heterogeneous cases, and a system of differential item weighting to maximize the tau-correlation between weighted items and the latent order is provided. Thus a purely ordinal test theory seems possible. Part of this work was done while the author was a Visiting Fellow at Macquarie University. The paper has benefitted from discussions with Professors Thomas J. Reynolds and Roderick P. McDonald and from the comments of several anonymous reviewers.
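
As a rough illustration of the gamma-correlation mentioned above, the sketch below computes the usual sample Goodman-Kruskal gamma between two sets of test scores: concordant pairs minus discordant pairs, divided by the number of untied pairs. The function name and the example scores are hypothetical, and this is the ordinary sample statistic rather than the paper's derivation, which concerns gamma-correlations with an unobservable latent order.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Return the sample Goodman-Kruskal gamma for two equal-length score lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = 0
    discordant = 0
    # Compare every pair of examinees once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both tests order the pair the same way
        elif product < 0:
            discordant += 1  # the tests order the pair in opposite ways
        # product == 0 means a tie on at least one test; gamma ignores ties
    untied = concordant + discordant
    if untied == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / untied


# Hypothetical scores for three examinees on two tests.
gamma_example = goodman_kruskal_gamma([1, 2, 3], [1, 3, 2])
```

Because only the ordering of each pair enters the computation, any strictly increasing rescaling of either test leaves gamma unchanged, which is the property that makes it suitable for a purely ordinal theory.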

7.
Rating scales constitute one of the most widely employed techniques in research on personality and individual differences. The historical background of rating scales is therefore a matter of considerable interest. Though Galton has generally been given credit for originating rating scale methodology, several applications of rating scales prior to Galton can be identified, and the seminal idea of rating scales can be traced back to Galen.

8.
9.
10.
A psychophysical experiment compared the effects of two different kinds of anchoring upon category ratings of the sizes of squares: (1) single anchoring in which the same square was presented on every anchoring trial, and (2) multiple anchoring in which squares of different sizes were presented on anchoring trials. Subjects did not rate the anchors, only those squares presented on alternate trials as the series stimuli. The major finding was that the two kinds of anchoring have similar effects. As with the single anchor, the multiple anchor establishes a new endpoint for the scale of judgment. The previously demonstrated relationship of increasing and then decreasing contrast as a function of the remoteness of the single anchor (Sarris, 1967, 1976) was found also for multiple anchoring.

11.
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e‐rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.

12.
13.
We concern ourselves with the hypothesis that two variables have a perfect disattenuated correlation, hence measure the same trait except for errors of measurement. This hypothesis is equivalent to saying, within the adopted model, that true scores of two psychological tests satisfy a perfect linear relation. Statistical tests of this hypothesis are derived when the relation is specified with the exception of the additive constant. Two approaches are presented and various assumptions concerning the error parameters are used. Then the results are reinterpreted in terms of the possible existence of an unspecified perfect linear relation between true scores of two psychological tests. A numerical example is appended by way of illustration. Research reported in this paper has been supported by grant GB-18230 from the National Science Foundation.
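
The hypothesis in this abstract can be written out with the classical correction for attenuation. The notation is an assumption made for this illustration and need not match the paper's: X and Y are the observed scores, T_X and T_Y their true scores, \rho_{XX'} and \rho_{YY'} the reliabilities, and a, b the constants of the linear relation.

```latex
% Classical disattenuated (true-score) correlation
\rho_{T_X T_Y} = \frac{\rho_{XY}}{\sqrt{\rho_{XX'}\,\rho_{YY'}}}

% A perfect disattenuated correlation is equivalent to an exact
% linear relation between the true scores
\left|\rho_{T_X T_Y}\right| = 1
\quad\Longleftrightarrow\quad
T_Y = a + b\,T_X \ \text{ for some constants } a \text{ and } b \neq 0
```

Read this way, the case in which the relation is "specified with the exception of the additive constant" corresponds to b being known and a left free.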

14.
The purpose of this study was to generate normative data by grade and sex to accompany behavior rating scales. Teachers rated 483 boys and girls in Grades 1 through 4. The findings suggest that rating scales be re-examined, since norms by grade level and sex may be desirable attributes.

15.
Walkup and Abbott (1978) stated that Edwards and Ashworth's (1977) failure to replicate Bem's (1974) selection of items for the Masculinity and Femininity Scales of the Bem Sex Role Inventory (BSRI) may be attributed to differences in the instructions and anchored rating scales used in the two studies. The present study tested the hypothesis that presence of various interaction effects involving instructions and rating scales would influence the acceptability of items for the BSRI Masculinity and Femininity Scales. Results based on the evaluation of individual items by Bem's item selection criteria in each of the four experimental conditions obtained by systematically manipulating two instructions (Bem's and Edwards' instructions) and two rating scales (Bem's and Edwards' rating scales) and also those based on the analysis of variance of item mean desirability ratings from the four experimental conditions supported the hypothesis.

16.
17.
Normative score performances on the Child's Attitude toward Mother and Child's Attitude toward Father scales by several adolescent subpopulations important to family therapists and researchers are reported for use in clinical assessment and future research. The instruments were administered to a representative sample of 2,419 Florida adolescents, and subpopulations were constructed based upon parental structure and sex. A previous study investigating psychometric properties of the two instruments was partially replicated. Results indicated that both scales are reliable and valid measures of the magnitude of problems in parent-child relationships from the child's point of view. The scales are recommended for both clinical and research applications.

18.
Shrinkage estimation of linear combinations of true scores   Total citations: 1, self-citations: 0, citations by others: 1
This paper is concerned with combining observed scores from sections of tests. It is demonstrated that in the presence of population information a linear combination of true scores can be estimated more efficiently than by the same linear combination of the observed scores. Three criteria for optimality are discussed, but they yield the same solution which can be described and motivated as a multivariate shrinkage estimator. Input from Eric Bradlow, Charles Lewis, and Linda Zeger is acknowledged. Research for this paper was funded by the Program Research Council (ETS). Suggestions of the Editor and of anonymous referees were instrumental in several improvements of the paper.
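
One common way to write a multivariate shrinkage estimator of this kind is sketched below. It is the standard best linear predictor under the classical model and is offered as an assumption about the general form, not as the paper's exact solution. The notation is introduced for this illustration only: X is the vector of observed section scores, T the vector of true scores, E the measurement errors (uncorrelated with T), \mu the population mean vector, \Sigma_T and \Sigma_E the true-score and error covariance matrices, and c the weights of the linear combination.

```latex
% Classical model for the vector of section scores
X = T + E, \qquad \operatorname{E}(X) = \mu, \qquad \operatorname{Var}(X) = \Sigma_T + \Sigma_E

% Best linear predictor of the true-score vector: X is shrunk toward \mu
\hat{T} = \mu + \Sigma_T\,(\Sigma_T + \Sigma_E)^{-1}\,(X - \mu)

% Estimator of the linear combination c^\top T, to be compared with c^\top X
\widehat{c^\top T} = c^\top \hat{T}
```

With a single section this reduces to the familiar univariate case in which the observed score is pulled toward the population mean in proportion to one minus the reliability.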

19.
Behavioral (semantic differential) and neural (Evoked Potentials, EPs) responses were related to connotative meaning. The approach was based on Osgood's semantic analyses and dimensions of Evaluation (E), Potency (P), and Activity (A). The experimental variables were (1) the semantic class of the stimulus word (E+, E-, P+, P-, A+, A-) and (2) the dimension of the semantic scale (E, P, A) which the subject used to rate the stimulus words. These variables were experimentally combined such that on each trial the subject used a designated semantic scale to judge a specified stimulus word while brain activity was recorded. Using multivariate analyses, the effects on the EPs of stimulus word class, scale dimension, and their interaction were analyzed. The EP effects of stimulus word class were similar whether the subjects were merely saying the words or rating the words on a variety of semantic scales. Different EPs were found for six word classes, three semantic scale dimensions, and the 18 groups formed by their combination. The success rates in EP identification of (1) word class and (2) scale dimension did not depend on whether these two kinds of semantic variables involved the same or different semantic dimensions. The two kinds of semantic effects in EPs were largely independent. The behavioral data supported Osgood's results and showed that our subjects were appropriately processing the semantic information. The common analyses of data from all subjects suggest the universality of the connotative EP effects across individuals. This parallels, at the neural level, the universality of the connotative dimensions found at the behavioral level by semantic differential ratings. The EP effects imply that the neural representation of meaning is similar in different individuals.

20.
Judgmental relativism is a threat to the replicability and validity of measures of client behavior from direct rating scales whenever raters are exposed to different levels of client functioning, since the internal standards, or anchor points, used to judge dimensional continua may vary on the basis of prior experience. Traditional interrater reliability indexes fail to identify such effects. The influence of judgmental relativism on summated ratings from the Nurses Observational Scale for Inpatient Evaluation (NOSIE-30) for 1040 adult mentally ill clients was examined with clinical staff raters from 24 treatment units in which the Time-Sample Behavioral Checklist (TSBC) provided full-week objective measures of actual client functioning via hourly direct observational coding (DOC). Regression analyses found that the same level of objective performance received higher or lower ratings across treatment units dependent on the raters' exposure to client groups that differed in level of functioning. Analyses of rating errors found that clients with better levels of functioning relative to others within treatment units were rated even higher than performance warranted. The operation of halo and contrast effects is explored and guidelines are provided for determining when judgmental relativism may produce or nullify significant differences. DOC assessments should be used instead of retrospective ratings to support most decisions in residential settings. Specific recommendations for the application of rating scales and improving data quality are provided. This study was the basis of a master's thesis at the University of Houston by Betty E. Rich under the direction of Gordon L. Paul and Marco J. Mariotto. Richard M. Rozelle, to whom appreciation is expressed for helpful comments, served on the examination committee. This study was partially supported by grants to Gordon L. Paul from the National Institute of Mental Health, Public Health Service (MH-15353; MH-25464); the Illinois Department of Mental Health and Developmental Disabilities; the Joyce Foundation; the MacArthur Foundation; the Owsley Foundation; the Cullen Foundation; and the Center for Public Policy, University of Houston.
