Similar literature
1.
Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
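The point that the three reliabilities compare the same disagreements to different bases can be illustrated with a minimal sketch. It assumes the conventional definitions of the indices (the abstract does not spell out formulas), and the function name `interval_agreement` and the sample records are hypothetical.

```python
def interval_agreement(obs_a, obs_b):
    """Compare two observers' interval records (1 = behavior scored, 0 = not scored).

    Assumed conventional definitions:
      interval-by-interval = agreements / all intervals
      scored-interval      = both scored / (both scored + disagreements)
      unscored-interval    = neither scored / (neither scored + disagreements)
    """
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("records must be non-empty and of equal length")

    total = len(obs_a)
    both = sum(1 for a, b in zip(obs_a, obs_b) if a and b)
    neither = sum(1 for a, b in zip(obs_a, obs_b) if not a and not b)
    disagree = total - both - neither

    def ratio(numerator, denominator):
        # An index is undefined when its base is empty.
        return numerator / denominator if denominator else None

    return {
        "disagreement_rate": disagree / total,
        "interval_by_interval": (both + neither) / total,
        "scored_interval": ratio(both, both + disagree),
        "unscored_interval": ratio(neither, neither + disagree),
    }


# Hypothetical low-rate record: one disagreement in ten intervals.
result = interval_agreement(
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
)
for name, value in result.items():
    print(name, value)
```

With a single disagreement in ten intervals, the three indices differ widely because each divides by a different base, which is why examining the disagreement rate first can be informative.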

2.
An observational system, which has been developed to facilitate recording of the total behavioral repertoire of autistic children, involves time-sampling recording of behavior with the help of a common Stenograph machine. Categories which exhausted all behavior were defined. Each category corresponded with a designated key on the Stenograph machine. The observer depressed one key at each 1-sec interval. The observer was paced by audible beats from a metronome. A naive observer can be used with this method. The observer is not mechanically limited and a minimum of observer training is required to obtain reliable measures. The data sampled during a five-week observation period indicated the stability of a taxonomic instrument of behavior based upon direct, time-sampling observations and the stability of spontaneous autistic behavior. Results showed that the behavior of the subjects was largely nonrandom and unsocialized in character.

3.
Four studies examined a new measure of compulsive hoarding (Saving Inventory-Revised; SI-R). Factor analysis using 139 hoarding participants identified 3 factors: difficulty discarding, excessive clutter, and excessive acquisition. Additional studies were conducted with hoarding participants, OCD participants without hoarding, community controls and an elderly sample exhibiting a range of hoarding behavior. Internal consistencies and test-retest reliabilities were good. The SI-R distinguished hoarding participants from all other non-hoarding comparison groups. The SI-R showed strong correlations with other indices and methods of measuring hoarding (beliefs, activity dysfunction from clutter, observer ratings of clutter in the home) and relatively weaker correlations with non-hoarding measures (positive and negative affect and OCD symptoms). The SI-R appears to be an appropriate instrument for assessing symptoms of compulsive hoarding in clinical and non-clinical samples.

4.
Bales (1981) tied his new method of group observation (SYMLOG) to several well-established theories and methods in the social sciences. Among these was Eysenck's Social Attitude space. With two reservations, Bales' initial assumptions concerning the isomorphism of the image or value level of SYMLOG and the Eysenck space were confirmed. First, conservatism was more strongly linked to self-interest and concern for material success and power than to conventionality. Second, the two dimensions of the Eysenck space were correlated in our sample. It is suggested that, while the structure of attitudes and values may remain constant, the correlations between dimensions can be used as an indication of primary and secondary polarizations in attitudes and values and that these polarizations may well reflect the political structure of the specific context. Implications of this observation for the analysis of issue polarization and policy formation are discussed, as are several points of comparison between the two systems.

5.
This study is about agreement on the assignment into the three basic classes or categories (A, B, C) of the Arbeitsgemeinschaft für Osteosynthesefragen/Association for the Study of Internal Fixation's (AO/ASIF) classification system for distal radial fractures. A random sample of 124 distal radial fractures was classified by two experienced observers. Their agreement was calculated according to Cohen's kappa statistic. To investigate the possible bases for disagreement, all conflicting X-ray assessments were discussed in a consensus meeting. It appeared that the kappa value was .65 (good agreement) before the meeting; kappa rose to .86 (excellent agreement) after the consensus meeting. It appeared that the undisplaced fractures were a major source of disagreement. Further, the presence of articular involvement was an important issue. It was frequently noted that one observer classified the fracture as extraarticular (basic Class A), while the other observer chose classification as an intra-articular fracture (basic Class C) or vice versa. This phenomenon has been called the A/C reversal shift. It is concluded that radiological innovations might enhance agreement on articular involvement, and a separate category for undisplaced fractures should be defined in the Arbeitsgemeinschaft für Osteosynthesefragen (AO) system. However, agreement on relevant distinctive features and discussion of conflicting assessments may also be important in achieving excellent agreement.
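For readers unfamiliar with the statistic, Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance from their marginal totals. The following is a minimal sketch of the standard formula. The function name `cohens_kappa` and the 2 × 2 table are hypothetical and are not the study's data.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of cases rater 1 put in class i and rater 2 in class j.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one case")

    # Observed proportion of agreement: the diagonal of the table.
    observed = sum(table[i][i] for i in range(k)) / n

    # Chance agreement expected from the two raters' marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical table for two raters and two classes (not the study's data).
example = [[20, 5],
           [10, 15]]
print(round(cohens_kappa(example), 3))
```

In this hypothetical table the raters agree on 70% of cases, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw percentage.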

6.
Although a fully general extension of ROC analysis to classification tasks with more than two classes has yet to be developed, the potential benefits to be gained from a practical performance evaluation methodology for classification tasks with three classes have motivated a number of research groups to propose methods based on constrained or simplified observer or data models. Here we consider an ideal observer in a task with underlying data drawn from three univariate normal distributions. We investigate the behavior of the resulting ideal observer’s decision variables and ROC surface. In particular, we show that the pair of ideal observer decision variables is constrained to a parametric curve in two-dimensional likelihood ratio space, and that the decision boundary line segments used by the ideal observer can intersect this curve in at most six places. From this, we further show that the resulting ROC surface has at most four degrees of freedom at any point, and not the five that would be required, in general, for a surface in a six-dimensional space to be non-degenerate. In light of the difficulties we have previously pointed out in generalizing the well-known area under the ROC curve performance metric to tasks with three or more classes, the problem of developing a suitable and fully general performance metric for classification tasks with three or more classes remains unsolved.
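The statement that the two decision variables lie on a parametric curve can be illustrated with a rough sketch: with univariate data, both likelihood ratios are functions of the single observation x, so sweeping x traces a curve in the plane. The choice of class 3 as the reference class, the function names, and the distribution parameters below are assumptions for illustration only and are not taken from the paper.

```python
import math


def normal_log_pdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def likelihood_ratios(x, params):
    """Return (f1(x)/f3(x), f2(x)/f3(x)) for three (mean, sd) pairs.

    Using class 3 as the reference is an assumption of this sketch.
    Working with log densities avoids underflow in the tails.
    """
    log_f1, log_f2, log_f3 = (normal_log_pdf(x, m, s) for m, s in params)
    return math.exp(log_f1 - log_f3), math.exp(log_f2 - log_f3)


# Hypothetical class distributions: equal variances, means -1, 0 and 1.
params = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]

# Sweeping the single observation x traces a curve in likelihood ratio space.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    lr1, lr2 = likelihood_ratios(x, params)
    print(x, lr1, lr2)
```

For these particular hypothetical parameters the curve has a closed form, LR2 = sqrt(e) · sqrt(LR1), which makes the one-dimensional constraint easy to see; other parameter choices give different curves.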

7.
Using self- and observer reports on the Personality Inventory for DSM–5 (PID–5) and the HEXACO Personality Inventory–Revised (HEXACO–PI–R), we identified for each inventory several trait dimensions (each defined by both self- and observer reports on the facet-level scales belonging to the same domain) and 2 source dimensions (each defined by self-reports or by observer reports, respectively, on all facet-level scales). Results (N = 217) showed that the source dimensions of the PID–5 were very large (much larger than those of the HEXACO–PI–R), and suggest that self-report (or observer report) response styles substantially inflate the intercorrelations and the alpha reliabilities of the PID–5 scales. We discuss the meaning and the implications of the large PID–5 source components, and we suggest some methods of controlling their influence.

8.
The present study assesses and evaluates the psychometric properties of a Swedish version of the Job Stress Survey (JSS; Spielberger, 1991; Spielberger & Vagg, 1999). This instrument is constructed to measure generic sources of occupational stress encountered by employees in a wide variety of work settings, settings that often result in psychological strain. The JSS was administered to metal assembly industry workers and medical service personnel in northern Sweden (n = 1186). The exploratory factor analysis showed that there is a high similarity between the present Swedish version and the original American version. Internal reliabilities of the scales, as well as test-retest reliabilities, were shown to be high, and concurrent validity, as examined by comparisons with the Perceived Stress Questionnaire Index (Levenstein et al., 1993), was found to be satisfactory. The consistency of these findings is discussed with particular focus on groups of employees, gender, and cross-cultural evaluations.

9.
The preliminary development of a new scale to measure attitudes towards medicine and doctors is described. The scale comprises four factors: “positive attitude towards doctors,” “positive attitude towards medicine,” “negative attitude towards doctors,” and “negative attitude towards medicine.” Alpha coefficients for the four scales were satisfactory, ranging from 0.69 to 0.76. Test-retest reliabilities ranged from 0.69 to 0.81. Evidence of criterion-related validity was obtained from comparison of the attitudes of five groups involved in health care: nurses, medical students, patients, and sociologists and psychologists who are teachers in medical schools. Patients and medical students held significantly more positive attitudes towards doctors and medicine than did nurses and behavioural scientists. While nurses were as negative towards doctors as sociologists and psychologists, they were significantly less negative about medicine.

10.
Nine hundred and twenty-one males and 555 females in Uganda completed the Eysenck Personality Questionnaire. Indices of factor comparison indicated that the personality dimensions of P, E, N and L were virtually identical in Uganda and England. Some item changes were required to establish satisfactory reliabilities (alpha) for all factors. Sex differences revealed that males scored higher than females on E but lower on N, which is the usual finding. Strikingly, however, there was no sex difference for P and L, there being, in fact, a very slight tendency for females to score above males on P and below them on L. Cross-cultural comparisons, using only items both Ugandan and English scoring keys had in common, showed Ugandan Ss to score higher on L than their English counterparts, Ugandan males also scoring higher on E and N and Ugandan females scoring higher on P.

11.
12.
Parents used self-instructional booklets to decrease their children's (ages 4-8) whining. In each of 9 families, a multiple-baseline design across three problems, whining and two others, was used. Parent data indicate mean improvement of 26% of the maximum possible from baseline means, with 8 of 9 children showing improvement. All parent final consumer ratings were positive. All interobserver reliabilities exceeded 80% agreement weighted for occurrence and nonoccurrence. Correlations for two sets of data between frequencies of whining estimated by parents twice an hour and percent of intervals recorded for whining from observer interval data for the hour produced median correlations of .62 and .51. Percent agreement between observer and parent data, both using interval recording at the same time, produced a median coefficient of agreement weighted for occurrence and nonoccurrence of 59%. Results suggest that parents using self-instructional materials alone could reduce children's whining from levels originally considered excessive to levels parents considered acceptable.

13.
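A Field Study of the Validity of Structured Interview Assessment for Middle Managers (total citations: 2; self-citations: 0; citations by others: 2)
Through a field study of competency assessment for 43 middle managers randomly selected from a listed company, this paper examines the reliability and validity of structured interviews. The research design was based on job analysis and critical incident analysis, used three-person panel interviews, and administered situational interviews and behavior description interviews together to give a comprehensive assessment of participants' job competency. The results show that both the internal consistency of raters' dimension ratings and the consistency among raters were fairly high, and that the ratings were significantly correlated with task performance and overall performance as rated by supervisors six months after the interview. Structured interviews therefore have fairly high reliability and predictive validity. A further comparison of situational interviews and behavior description interviews found that the two types of structured interview have similar reliability, but that the behavior description interview has higher validity.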

14.
THE OBJECTIFIED BODY CONSCIOUSNESS SCALE: Development and Validation (total citations: 9; self-citations: 0; citations by others: 9)
Using feminist theory about the social construction of the female body, a scale was developed and validated to measure objectified body consciousness (OBC) in young women (N = 502) and middle-aged women (N = 151). Scales used were (a) surveillance (viewing the body as an outside observer), (b) body shame (feeling shame when the body does not conform), and (c) appearance control beliefs. The three scales were demonstrated to be distinct dimensions with acceptable reliabilities. Surveillance and body shame correlated negatively with body esteem. Control beliefs correlated positively with body esteem in young women and were related to frequency of restricted eating in all samples. All three scales were positively related to disordered eating. The relationship of OBC to women's body experience is discussed.

15.
The theory of motivated cheating postulates that test takers may cheat when they do not know an answer. With probability k, an “observer” is unsure of an answer and will copy from a nearby “target” with probability c. The corresponding parameters for the target may be entirely unrelated to those of the observer. Thus, the undesirable feature of bidirectionality of parameters found in correlational techniques is not an inherent feature of this theory of cheating. Predictions are derived, and estimates of k and c are proposed. Statistically large values of c suggest that an observer was copying from a target. High values of c for both the observer and the target suggest collusion. The theory is applied to a 40-item five-choice test taken by students in an introductory psychology section. From the full paired comparison matrix of target × observer parameter estimates, the method identifies 2 students who were probably in collusion.

16.
To obtain estimates of observer reliability, the Fagan Test of Infant Intelligence (FTII) apparatus was modified to allow the infants' performance to be videotaped. Based upon results of 25 infants scored once during time of testing and again 2 years later using the videotape version, interobserver and intraobserver reliabilities obtained for percent novelty preference (test rounds), total number of looks (familiarization and test rounds), and mean fixation (familiarization and test rounds) were mostly very high for each round (M r = .92, SD = .04). The videotaped infants' scores did not differ significantly from those of a comparable sample of infants who were tested using an unmodified apparatus.

17.
This paper presents the theoretical and methodological basis of a therapist's verbal behavior category system that allows us to study clinical psychologists' language from a functional-analytic framework and with a rigorous observation method. The procedure to develop the coding system is explained in detail from a very early stage of exploratory observation, to the systematic observation through the use of The Observer XT software. An analysis of intra- and inter-rater reliability using the kappa coefficient and taking into account the factors that affect the values of Cohen's index was carried out. Results show high levels of observer accuracy (between approximately 87% and 93%) that justify the application of this category system to study therapists' verbal behavior in session.

18.
The effects of progressive intoxication were studied in male social drinkers classified from prior histories as either aggressively (A) or nonaggressively (NA) predisposed while intoxicated. Two groups of two A and two NA subjects engaged in videotaped group discussions that were analyzed by Bales interaction process analysis (IPA). At comparable levels of ad libitum alcohol intake in a natural drinking environment, significantly more verbal activity was displayed by the A subjects than by the NA subjects (P < .001), including IPA category D (P < .025). The A subjects tended to address the group as a whole rather than individual members (P < .001) and NA subjects rather than other A subjects (P < .01). Free testosterone levels assessed from saliva were higher among A subjects than among NA subjects (P < .05) with no significant changes related to time and progressive intoxication. The results suggest that the tendency to behave aggressively while intoxicated may be a fairly stable individual trait, possibly related to androgen levels and active or coercive modes of social communication.

19.
The aim of this paper was to present a method to enable the analysis of the process of categorization of patients’ testimonials and the comparison of individual categories created by professionals. A complex diagnostic task (case conceptualization) was employed to study the categorization function in professional thinking. Two groups of psychotherapists (30 people in each group) served as subjects of the research. The main objective of the study was to find an appropriate representation of concept maps enabling a comparison of both the categories and the structures between experts. In the comparison process, only the information about the premises justifying each given category was taken into account and represented by a concept-testimonials matrix. Three different element-weighting schemes and matrix factorization-based unsupervised clustering methods were analyzed in the context of consistency and ability to establish main semantic groups of concepts common to the majority of experts. Moreover, special attention was paid to determining the number of main semantic classes. The study showed that, although the representation used was similar to that of the document-indexing task, there was some discrepancy. The highest accuracy in generating main semantic groups was achieved using PCA and K-Means (nKM) (the average false positive rate in clusters was 32%). This method outperformed Tempered PLSA (the average false positive rate per cluster was 52%). It was demonstrated that in the analyzed task the nKM method allowed comparing the similarity of concepts even when they were created by various experts using different conceptual apparatus.

20.
Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words of varying length and frequency of occurrence are presented. The words can all be used as nouns. Intergroup reliabilities are satisfactory on all attributes. Correlations with previous word lists are significant, and the intercorrelations between measures match previous findings.
