Similar Articles
1.
Statistical significance tests are derived and evaluated for measuring apparent differences between an obtained and an expected binormal ROC curve, between two independent binormal ROC curves, and among groups of independent binormal ROC curves. A binormal ROC curve is described by two parameters, which represent the spread of the means and the ratio of the standard deviations of the two underlying Gaussian decision-variable distributions. To test the significance of apparent differences between or among ROC curves, approximate χ² statistics for each of the three tests were constructed from maximum likelihood estimates of the two parameters defining the binormal ROC curve. The performance of each test statistic was evaluated by simulating five-category rating-scale data with equal numbers of noise and signal-plus-noise trials (set at 50, 250, and 500) for each of three typical ROC curves. For the significance test involving only one ROC curve, rating-scale data were also generated from the chance diagonal of the ROC space. Although test performance was found to depend somewhat on the number of trials and on the location of the ROC curve in the ROC space, comparisons of the obtained and expected fractions of (falsely) significant results at various α levels showed the proposed statistical significance tests to be reliable under practical experimental conditions.
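The binormal curve and the two-curve comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses the conventional parameterization a = (μ_s − μ_n)/σ_s and b = σ_n/σ_s, so that TPF = Φ(a + b·Φ⁻¹(FPF)), and it assumes the two-curve statistic takes the usual quadratic form in the parameter differences, with the covariance matrices of the two independent estimates summed and the result referred to χ² with 2 degrees of freedom. The function names are hypothetical, and the (a, b) estimates and their covariances would come from a maximum likelihood fit that is not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, sd 1


def binormal_tpf(fpf: float, a: float, b: float) -> float:
    """True-positive fraction on a binormal ROC at a given false-positive fraction.

    a: separation of the means, in units of the signal-plus-noise SD (assumed convention)
    b: ratio of the noise SD to the signal-plus-noise SD (assumed convention)
    """
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpf))


def chi2_two_curves(est1, cov1, est2, cov2):
    """Approximate chi-square (2 df) for the difference between two independent
    binormal ROC curves, given ML estimates (a, b) and their 2x2 covariance matrices."""
    d0, d1 = est1[0] - est2[0], est1[1] - est2[1]
    # Independent estimates: the covariance of the difference is the sum of covariances.
    s11 = cov1[0][0] + cov2[0][0]
    s12 = cov1[0][1] + cov2[0][1]
    s22 = cov1[1][1] + cov2[1][1]
    det = s11 * s22 - s12 * s12
    # Quadratic form d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
    x2 = (s22 * d0 ** 2 - 2 * s12 * d0 * d1 + s11 * d1 ** 2) / det
    p_value = math.exp(-x2 / 2)  # chi-square survival function for exactly 2 df
    return x2, p_value
```

With a = 0 and b = 1 the curve reduces to the chance diagonal, which is the null case used for the single-curve test.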

2.
An introductory study of the perception of stochastically specified events is reported. The initial problem was to determine whether the perceiver can split visual input data of this kind into random and determined components. Subjects' inability to do so with the stimulus material used (a filmlike sequence of dot patterns) led to the more general question of how subjects code this kind of visual material. To meet the difficulty of defining the subjects' responses, two experiments were designed. In both, patterns were presented as a rapid sequence of dots on a screen. The patterns were disturbed to varying degrees by “noise,” i.e., the dots did not appear exactly at their proper places. In the first experiment the response was a rating on a semantic scale; in the second it was an identification from among a set of alternative patterns. The results of these experiments give some insight into the coding systems adopted by the subjects. First, noise appears to be detrimental to pattern recognition, especially for patterns with little spread. Second, this effect is connected with the factors obtained from analysis of the semantic ratings; for example, easily disturbed patterns show a large drop in the semantic regularity factor when only a little noise is added.

3.
Eighty-three undergraduate subjects (58 women and 25 men) participated in a prospective study in which they (a) completed widely used objective and projective measures of dependency, and then (b) provided monthly reports of the frequency and impact of various types of life events during a 1-semester (3-month) period. As expected, subjects' projective dependency scores predicted their frequency estimates and impact ratings of interpersonal life events but were unrelated to frequency estimates and impact ratings of other types of life events (e.g., achievement-related, legal). Objective dependency scores were unrelated to all life event frequency estimates and impact ratings. Findings are discussed in the context of recent theoretical frameworks that distinguish implicit dependency needs (which are assessed via projective measures) from self-attributed dependency needs (which are assessed via self-report tests). The importance of the type of dependency measure used in studies of the dependency-life events relationship is emphasized.

4.
Norms of rated subjective frequency of use and imagery on seven-point scales are reported for 1,916 French nouns. Subjective frequency was defined as the rated frequency of occurrence of words in spoken French, and imagery was defined as the rated ease with which a word aroused a mental image. The mean, standard deviation, and percentile rank of the frequency and imagery ratings for each item are presented in the Appendix together with their objective frequency of occurrence in Baudot's (1992) dictionary. Interjudge reliability was assessed by calculating correlations between the mean ratings of items repeated in the booklet and between the mean ratings obtained from odd-numbered and even-numbered respondents, and by computing Cronbach's alpha for each page of the booklet. These reliability estimates were equal to or greater than .92 for frequency and for imagery, confirming the high level of interjudge consistency. Although the estimates provided by female and male participants were highly correlated (r = .97), the former gave a slightly higher frequency rating to the word sample but a slightly lower imagery rating than the latter did. Moreover, female respondents gave slightly more extreme ratings on the frequency and imagery scales. An analysis of the absolute difference between female and male ratings revealed a discrepancy of one half point or more on 20% of the word sample for frequency and 13% for imagery. On both scales, the mean absolute difference between male and female ratings was larger than that obtained by chance alone. This finding highlights the possibility that some words may not be equally familiar to women and men or may not evoke imagery with the same ease in these groups. Validity estimates for the frequency and imagery ratings were derived from correlations with scale values drawn from other normative studies.
These correlation coefficients were equal to or greater than .78 for frequency and .86 for imagery, confirming the high level of consistency between this and other studies. An analysis of the relationship between subjective frequency and imagery ratings indicated that these variables are generally uncorrelated, although exceptions occur. In the present study, the correlation between subjective frequency and imagery was .24. However, when items with extreme mean frequency were excluded from the calculation, the correlation coefficient dropped to .04 and was no longer significant. Imagery ratings from five independent studies were all positively and significantly correlated with Vikis-Freibergs's (1974) frequency estimates, which were obtained from a free-association task. This finding suggests that word association, as a form of cued recall, may be influenced by several stimulus attributes, including prior frequency of association and imagery-evoking value. The pattern of correlations between imagery ratings and text-based frequency estimates is not coherent: significant correlations appear only in select cases, with no consistent direction of linear relationship. The main contribution of this research is to provide reliable estimates of subjective frequency and imagery value for a word sample that is larger than those included in previous studies. A close examination of the linear relationship among the various sources of frequency and imagery data underscores the risk of confounding these variables in the selection of lexical stimuli for research.

5.
In a field study, models for magnitude estimation and for category ratings are applied to the scaling of occupational prestige. The two models provide sufficient conditions for magnitude estimates to yield logarithmic interval scales and for category ratings to yield interval scales, respectively. Both models are found to hold reasonably well for the majority of respondents. As implied by a third model, the relation between magnitude estimation and category rating scales can be well described by a generalized power function. Although overall results do not favour one method over the other, individual data analyses reveal substantial interindividual differences in respondents' ability to perform magnitude estimation and category rating. The findings are compared to results recently found in psychophysical laboratory experiments, and it is concluded that the individual scale properties the two methods provide do not differ across the attitudinal and the sensory domains.

6.
A rating method was used to obtain operating characteristics for 60 heterogeneous words. A single message was heard in noise or seen briefly in a tachistoscope. It was repeated until it had been assigned to the highest accuracy category (“confirmed”) or had been presented on a maximum of six trials. The comparisons showed that it matters little whether reception is by eye or by ear. Whether within a trial or over successive repetitions, accuracy of reception is a direct function of the confidence rating and is relatively independent of the intelligibility level. Neither accuracy of reception nor the distribution of rating categories changes markedly over trials. Although no direct test was made, it appears that accuracy of reception is not lessened by the task of rating. Both visual and auditory data are fitted reasonably well by predictions made from a simple stochastic model based on the assumptions that (1) intelligibility, (2) probability of a correct acceptance, and (3) probability of an incorrect acceptance remain constant over successive repetitions. The model fits the visual better than the auditory data, as might be expected, since conditions of reception are more homogeneous over trials for vision than for audition.
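The constancy assumptions of the stochastic model above have a simple consequence that can be sketched directly. This is one possible reading, not the authors' exact model: it assumes p is the probability that a presentation is received correctly, c is the probability that a correct reception is accepted as "confirmed", and w is the probability that an incorrect reception is wrongly accepted, with all three constant over up to six repetitions. Under that reading, confirmation follows a truncated geometric distribution, and the accuracy of confirmed messages is the same on every trial. The function and parameter names are hypothetical.

```python
def confirmation_profile(p, c, w, max_trials=6):
    """Return (per-trial confirmation probabilities, accuracy of confirmed messages).

    p: intelligibility (probability a presentation is received correctly)
    c: probability that a correct reception is accepted as 'confirmed'
    w: probability that an incorrect reception is (wrongly) accepted
    All three are assumed constant over successive repetitions.
    """
    q = p * c + (1 - p) * w  # chance of confirmation on any single trial
    # Confirmation first occurs on trial k: k - 1 non-confirmations, then one.
    by_trial = [(1 - q) ** (k - 1) * q for k in range(1, max_trials + 1)]
    # P(correct | confirmed) does not depend on the trial number under constancy.
    accuracy = p * c / q
    return by_trial, accuracy


by_trial, accuracy = confirmation_profile(p=0.5, c=0.8, w=0.2)
```

Here q = 0.5, so half of the messages would be confirmed on the first trial and a quarter on the second, and 80% of confirmed messages would be correct regardless of the trial on which they were confirmed. This is consistent with the reported stability of accuracy over trials.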

8.
Rater bias in the EASI temperament scales: a twin study
Under trait theory, ratings may be modeled as a function of the temperament of the child and the bias of the rater. Two linear structural equation models are described, one for mutual self- and partner ratings, and one for multiple ratings of related individuals. Application of the first model to EASI temperament data collected from spouses rating each other shows moderate agreement between raters and little rating bias. Spouse pairs agree moderately when rating their twin children, but there is significant rater bias, with greater bias for monozygotic than for dizygotic twins. Maximum likelihood estimates (MLEs) of heritability are approximately .5 for all temperament scales, with no common environmental variance. Results are discussed with reference to trait validity, the person-situation debate, halo effects, and stereotyping. Questionnaire development using ratings on family members permits increased rater agreement and reduced rater bias.
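Why rater bias matters for the twin heritability estimates above can be illustrated with a much simpler tool than the authors' structural equation models. The sketch below swaps in Falconer's classical approximation, which derives variance proportions from monozygotic (MZ) and dizygotic (DZ) twin correlations; it is not the maximum likelihood procedure used in the study. The function name and the example correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Rough ACE decomposition from twin correlations (Falconer's approximation).

    Returns (a2, c2, e2):
      a2 - additive genetic variance (heritability)
      c2 - common (shared) environmental variance
      e2 - nonshared environment plus measurement error
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


# Hypothetical correlations, not data from the study.
unbiased = falconer_ace(0.50, 0.25)  # heritability .5, no shared environment
biased = falconer_ace(0.60, 0.25)    # extra MZ similarity raises a2
```

Because the abstract reports greater rater bias for monozygotic than for dizygotic twins, any bias that makes MZ ratings more alike raises r_MZ relative to r_DZ and, under this simple formula, inflates the heritability estimate. This is one reason to model rater bias explicitly.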

9.
In a recognition memory experiment, Mickes, Wixted, and Wais (2007) reported that distributional statistics computed from ratings made using a 20-point confidence scale (which showed that the standard deviation of the ratings made to lures was approximately 0.80 times that of the targets) essentially matched the distributional statistics estimated indirectly by fitting a Gaussian signal-detection model to the receiver-operating characteristic (ROC). We argued that the parallel results serve to increase confidence in the Gaussian unequal-variance model of recognition memory. Rouder, Pratte, and Morey (2010) argue that the results are instead uninformative. In their view, parametric models of latent memory strength are not empirically distinguishable. As such, they argue, our conclusions are arbitrary, and parametric ROC analysis should be abandoned. In an attempt to demonstrate the inherent untestability of parametric models, they describe a non-Gaussian equal-variance model that purportedly accounts for our findings just as well as the Gaussian unequal-variance model does. However, we show that their new model—despite being contrived after the fact and in full view of the to-be-explained data—does not account for the results as well as the unequal-variance Gaussian model does. This outcome manifestly demonstrates that parametric models are, in fact, testable. Moreover, the results differentially favor the Gaussian account over the probit model and over several other reasonable distributional forms (such as the Weibull and the lognormal).

10.
The Gaussian model of signal detection cannot fit asymmetric data as long as the variances of the distributions are kept equal. It is therefore common practice to assume unequal variances in order to fit these data. But this assumption leads to the well-known crossover problem. The present paper provides new arguments for the abandonment of the Gaussian model with unequal variances. In its stead, this paper reevaluates multiple-parallel-threshold models. In particular, the Poisson model turns out to be very useful: it can handle data with any degree of asymmetry, giving a reasonable interpretation of the two parameters of the receiver-operating characteristic. The three-state-threshold model (Krantz, 1969) is given a new interpretation in light of the Poisson model. The slope of Poisson double-probability plots turns out to be much closer to unity than is predicted by the Gaussian approximation.
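The crossover problem mentioned above can be made concrete. This sketch does not implement the Poisson model, whose exact form the abstract does not give. It only shows the Gaussian issue: if z(H) = a + b·z(F) with b ≠ 1, the ROC meets the chance diagonal where a + b·z = z, that is, at z* = a/(1 − b). For b < 1, the curve falls below chance (H < F) at false-alarm rates beyond Φ(z*). The parameter values and function names are hypothetical.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard Gaussian


def hit_rate(f, a, b):
    """Hit rate on a Gaussian ROC with z(H) = a + b * z(F)."""
    return N01.cdf(a + b * N01.inv_cdf(f))


def crossover_fpf(a, b):
    """False-alarm rate at which the ROC crosses the chance diagonal H = F.

    Returns None when b == 1 (equal variances), where no crossover occurs
    for a != 0.
    """
    if b == 1:
        return None
    z_star = a / (1 - b)  # solve a + b*z = z for z
    return N01.cdf(z_star)


f_star = crossover_fpf(0.5, 0.5)  # Phi(1), about 0.84
# Below f_star the curve is above chance; beyond it, hits drop below false alarms.
```

With a = 0.5 and b = 0.5, a false-alarm rate of .95 gives a hit rate of about .91. An ROC that predicts below-chance performance at liberal criteria is the anomaly that motivates the alternative threshold models.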

11.
Sense of agency, the feeling of generating actions and events oneself, stems from action–outcome congruence. An implicit marker of the sense of agency is intentional binding, the compression of the subjective temporal interval between action and outcome. We investigated the relationship between intentional binding and the explicit sense of agency. Participants pressed a key that triggered an auditory (Experiment 1) or visual (Experiment 2) outcome after a variable delay. On each trial, participants rated their agency over the outcome and estimated the keypress–outcome interval. Results showed that delays decreased both agency ratings and intentional binding. Across individuals, sensitivity to outcome delay (i.e., the regression slope) of agency ratings was correlated with that of intentional binding in the auditory but not the visual domain. Importantly, we found intra-individual, trial-by-trial correlations between agency ratings and intentional binding in both outcome modalities. These results suggest that intentional binding coincides with the explicit sense of agency.

12.
The ability to perceive others’ actions and coordinate our own body movements accordingly is essential for humans to interact with the social world. However, it is still unclear how the visual system achieves the remarkable feat of identifying temporally coordinated joint actions between individuals. Specifically, do humans rely on certain visual features of coordinated movements to facilitate the detection of meaningful interactivity? To address this question, participants viewed short video sequences of two actors performing different joint actions, such as handshakes and high fives. Temporal misalignments were introduced to shift one actor’s movements forward or backward in time relative to the partner actor. Participants rated the degree of interactivity for the temporally shifted joint actions. The impact of temporal offsets on human interactivity ratings varied for different types of joint actions. Based on human rating distributions, we used a probabilistic cluster model to infer latent categories, each revealing shared characteristics of coordinated movements among sets of joint actions. Further analysis of the clustered structure suggested that global motion synchrony, spatial proximity between actors, and highly salient moments of interpersonal coordination are critical features that impact judgments of interactivity.

14.
In four experiments involving 184 participants, people rated their confidence that particular events had happened in their childhood (e.g., "Broke a window playing ball"). If participants had to unscramble a key word in a phrase just before rating it (e.g., "Broke a nwidwo [window] playing ball"), confidence ratings increased, a phenomenon known as the revelation effect. However, the pattern of revelation effects depended on the particular way in which participants processed key words (e.g., visualizing vs. counting vowels in the word "window") approximately 10 min prior to rating life events that contained those words. Prior exposure to key words never in itself directly affected confidence ratings. These results demonstrate that one can manipulate the revelation effect by altering the processing that participants perform on words prior to unscrambling them. These results also pose difficulties for many accounts of the revelation effect. The major puzzle posed by our present findings is that unscrambling key words increases confidence that an event happened in childhood, whereas prior exposure to these words does not.

15.
In this study, incumbents from three different jobs were asked to rate lists of their job tasks on various constructs (e.g., time, importance) and then to estimate the percentage of their job tasks (task coverage) included in the task list. Incumbents made these ratings under one of two conditions. In two instances, two months after making an initial task coverage rating for the entire list of tasks, the same incumbents were asked to estimate the task coverage of a reduced list of tasks (i.e., half to two-thirds of the tasks were removed from the list presented for rating). In a third instance, one group of incumbents completed an entire inventory while a second group completed a reduced inventory. The average task coverage ratings for the entire inventories were high (percent estimates in the 80s–90s), and the average for the reduced inventories was much higher than expected (percent estimates in the 70s). It was concluded that incumbents and supervisors were not able to accurately estimate task coverage.

16.
Retrospective rating scales are widely used for formal assessment of typical performance. Raters who are the most familiar/interactive with ratees are routinely recommended to maximize the quality of ratings. This recommendation to use the most familiar/interactive raters fails to distinguish sampling parameters of the observations on which ratings are based, which may be important to assessing different classes of behavior. We hypothesized that systematic observational schedules would be of greater importance to ratings of public events than familiarity/interaction per se, while the recommendation would hold for ratings of private events. We used the Psychotic Inpatient Profile (PIP), which provides separate factor scores for ratings of public and private events, to examine these hypotheses in a quasi-experimental study with adult inpatients of mental hospitals. A large multi-institutional data set provided retrospective PIP ratings by two types of raters. The most familiar/interactive local clinical staff for each client completed the PIP after observing on an ad lib schedule alongside their ongoing job duties. Unfamiliar, noninteractive raters completed the PIP for each client after observing on a systematic time-sampling schedule for purposes of coding an entirely different instrument. Data were selected so that each of 189 clients received PIP scores from four raters, reflecting functioning during the same time period based on day-shift observations by one rater of each type and evening-shift observations by one rater of each type. Analyses of variance, consistency/discriminability of ratings, and prediction of social-action outcomes all supported the hypotheses. We discuss alternative strategies that are better for assessing typical performance in most circumstances.
We also provide recommendations for improving the adequacy of observations for those circumstances in which the standardized retrospective rating scale could be a cost-effective assessment strategy.
This study was the basis of a master's thesis at the University of Houston by the senior author under the direction of the junior authors. Richard M. Rozelle served on the examination committee. This study was partially supported by grants to Gordon L. Paul from the National Institute of Mental Health, Public Health Service (MH-15353; MH-25464); the Illinois Department of Mental Health and Developmental Disabilities; the Joyce Foundation; the MacArthur Foundation; the Owsley Foundation; the Cullen Foundation; and the Center for Public Policy of the University of Houston.

17.
The Adaptive Visual Analog Scales is a freely available computer software package designed to be a flexible tool for the creation, administration, and automated scoring of both continuous and discrete visual analog scale formats. The continuous format is a series of individual items that are rated along a solid line and scored as a percentage of distance from one of the two anchors of the rating line. The discrete format is a series of individual items that use a specific number of ordinal choices for rating each item. This software offers separate options for the creation and use of standardized instructions, practice sessions, and rating administration, all of which can be customized by the investigator. A unique participant/patient ID is used to store scores for each item, and individual data from each administration are automatically appended to that scale’s data storage file. This software provides flexible, time-saving access for data management and/or importing data into statistical packages. This tool can be adapted so as to gather ratings for a wide range of clinical and research uses and is freely available at www.nrlc-group.net.
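The continuous-format scoring rule described above (percentage of distance from one anchor along the line) can be sketched as a small function. This is not the package's own code or API. The coordinate convention (horizontal line, left anchor at the smaller x), the choice of scoring from either anchor, and the clamping of marks that fall just outside the line are assumptions, and the function name is hypothetical.

```python
def score_continuous(mark_x, left_x, right_x, from_right=False):
    """Score a continuous visual analog item as a percentage of line length.

    mark_x, left_x, right_x: horizontal positions (e.g., pixels) of the
    respondent's mark and of the two anchors. Scores run 0-100 from the
    chosen anchor.
    """
    length = right_x - left_x
    if length <= 0:
        raise ValueError("right anchor must lie to the right of the left anchor")
    # Assumption: marks slightly outside the line are clamped to the nearest anchor.
    mark = min(max(mark_x, left_x), right_x)
    pct = 100.0 * (mark - left_x) / length
    return 100.0 - pct if from_right else pct
```

For a line running from x = 100 to x = 300, a mark at x = 150 scores 25 from the left anchor, or 75 when scored from the right anchor.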

18.
Human Behavior (《人类行为》), 2013, 26(1): 19–35
Investigations of the construct-related evidence of the validity of performance ratings have been rare, perhaps because researchers are dissuaded by the considerable amount of evidence needed to show construct validity (Landy, 1986). It is argued that generalizability (G) theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) is well-suited to investigations of construct-related evidence of validity because a single generalizability investigation may provide multiple inferences of validity. G theory permits the researcher to partition observed score variance into universe (true) score variance and multiple, distinct estimates of error variance. G theory was illustrated through the analysis of proficiency ratings of 256 Air Force jet engine mechanics. Mechanics were rated on three different rating forms by themselves, peers, and supervisors. Interpretation of G study variance components revealed suitable evidence of construct validity. Ratings within sources were reliable. Proficiency ratings showed strong convergence over rating forms, though not over rating sources. Raters showed adequate discriminant validity across rating dimensions. The expectation of convergence over sources was further questioned.
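The variance partitioning that G theory performs can be sketched for the simplest case. The study itself crossed mechanics with rating forms and rating sources. This sketch assumes a single facet instead: a persons × raters design with one score per cell. Variance components are estimated from the usual expected-mean-square equations of a two-way ANOVA without replication. Negative estimates are truncated to zero, which is a common but not universal convention. The relative G coefficient is for the mean over the observed raters. The function name and data layout are hypothetical.

```python
from statistics import fmean


def g_study_p_by_r(scores):
    """One-facet crossed G study (persons x raters), one score per cell.

    scores[i][j]: rating of person i by rater j. Returns variance components
    and the relative G coefficient for the mean over the n_r raters.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    p_means = [fmean(row) for row in scores]
    r_means = [fmean(col) for col in zip(*scores)]

    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
        for i in range(n_p) for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates truncated to zero.
    var_pr_e = ms_res                       # person x rater interaction + error
    var_p = max((ms_p - ms_res) / n_r, 0.0)  # universe-score (true) variance
    var_r = max((ms_r - ms_res) / n_p, 0.0)  # rater leniency/severity
    denom = var_p + var_pr_e / n_r
    g_coef = var_p / denom if denom > 0 else float("nan")
    return {"person": var_p, "rater": var_r, "residual": var_pr_e, "g": g_coef}
```

Rater main effects (leniency or severity) do not enter the relative G coefficient, because they shift everyone's scores equally. The person × rater interaction, by contrast, counts as error. Adding facets such as form and source extends the same logic with more variance components.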

19.
We obtain a generalized increment threshold law that describes the relationship between the just-detectable average signal increment and the noise when the threshold is stabilized, for the usual yes-no (nonorthogonal) signal format. The results are applicable to audition and vision, for arbitrary (discrete or continuous) signal and noise statistics, and for arbitrary detection probabilities. The results are most useful when the noise and signal-plus-noise probability densities have the same form. Previous treatments of this problem have been restricted to Gaussian and Poisson signal and noise densities.

20.
We tested the effects of rater agreeableness on the rating of others’ poor performance in performance appraisal (PA). We also examined the interactions between rater agreeableness and two aspects of the rating context: ratee self‐ratings and the prospect of future collaboration with the ratee. Participants (n = 230) were allocated to one of six experimental groups (a 3 × 2 between‐groups design) or a control group (n = 20). Participants received accurate, low‐deviated, or high‐deviated self‐ratings from the ratee. Half were notified that they would collaborate with the ratee in a future task. High rater agreeableness, positive deviations in self‐rating, and the prospect of future collaboration were all independent predictors of higher PA ratings. The interactions between rater agreeableness and rating context were very small. We argue that conflict avoidance is an important motivation in the PA process.


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417