Similar literature
 Found 20 similar documents (search took 31 ms)
1.
A new measure for the reliability of a rating scale is introduced, based on the classical definition of reliability as the ratio of the true score variance to the total variance. Clinical trial data can be employed to estimate the reliability of the scale in use whenever repeated measurements are taken. The reliability is estimated from the covariance parameters obtained from a linear mixed model. The method provides a single number to express the reliability of the scale, but also allows for the study of the reliability's evolution over time. The method is illustrated using a case study in schizophrenia. The authors are grateful to J&J PRD for kind permission to use their data. We gratefully acknowledge support from the Belgian IUAP/PAI network "Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data."
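The classical definition quoted in this abstract can be illustrated with a minimal sketch. It assumes the total variance splits into a true-score part and an error part; the function name and the variance values are hypothetical, and this is not the estimator the authors derive from the linear mixed model.

```python
def classical_reliability(true_score_variance: float, error_variance: float) -> float:
    """Return true-score variance divided by total variance (true + error)."""
    if true_score_variance < 0 or error_variance < 0:
        raise ValueError("variances must be non-negative")
    total_variance = true_score_variance + error_variance
    if total_variance == 0:
        raise ValueError("total variance must be positive")
    return true_score_variance / total_variance


# Hypothetical variance components, chosen only for illustration.
print(classical_reliability(3.0, 1.0))  # 0.75
```

In practice the two variance components would come from a fitted model rather than being typed in by hand.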

2.
A covariance structure analysis method for improved point and interval estimation of composite reliability in repeated measure designs is outlined that accounts for specificity variance. The approach also permits the testing of time‐invariance in reliability of multiple‐component instruments in terms of the ratio of ‘pure’ measurement error variance to observed scale score variance. In addition, the procedure allows interval estimation of the difference in composite reliability coefficients across assessment occasions. The method described is illustrated with data from a cognitive intervention study.

3.
Two studies are reported addressing the reliability of the Behavioral and Emotional Rating Scale (BERS). The first study investigated test-retest reliability over a two-week period to determine the stability of the measure over time. The second study investigated inter-rater reliability between two teachers or classroom aides who were familiar with a student to determine the consistency with which the measure can be used by different individuals. In each study, samples were drawn from populations of students identified with emotional or behavioral disorders as specified by federal statutes. Reliability coefficients in each study were above .80, the standard recommended for screening tests that are reported individually, and in most cases above .90. Implications for use of the BERS are discussed.

4.
OBJECTIVE: The objective of the present study was to demonstrate the reciprocal relationships between family adaptation to illness and children's medication use over time among children who presented with wheezing illness in infancy but have varying illness outcomes by age 4. DESIGN: A longitudinal design and latent growth curve models (LGM) were used to predict change in family and caregiver adaptation to illness and children's medication use over three years among 140 infants with wheezing, among families from low socioeconomic, multi-ethnic backgrounds. MAIN OUTCOME MEASURES: One LGM predicted level and change (slope) of family adaptation to illness from children's baseline medication use. The second LGM predicted level and change (slope) of children's medication use from baseline family adjustment to illness. In both models, illness severity, caregivers' psychological resources, and emergency department use were covaried with the independent variable. RESULTS: Two latent growth models were found to adequately fit the data and demonstrate full reciprocal relations between family adaptation to illness and children's medication use while accounting for baseline variables. Baseline measures of caregiver psychological functioning and illness severity were also significant predictors of family adaptation and children's medication use over time. The two models were not statistically different for children with and without active asthma at 4 years of age. CONCLUSION: Findings support the reciprocal effects model of child and family influences on pediatric illness and underscore the importance of early indicators of individual and family functioning.

5.
This study examined the extent to which curriculum-based measurement (CBM) procedures could be implemented in nonbasal reading curricula. Participants included 160 students from 31 second, third, fourth, and fifth grades, located in two school districts. Half of the students (20 from each grade) were instructed primarily in a literature-based reading series, while the remaining 80 participants (20 from each grade) were instructed primarily in a traditional skills-based reading program. CBM passage probes from each reading series were administered to all students twice weekly over an 8-week period. Students' rate of progress in each reading series was indexed using the slope of their data series, calculated by ordinary least squares regression. Results showed small yet significant main effects for the type of probe used for progress monitoring; however, this effect was not consistent across grades. In addition, significant main effects were found for grade. Students' growth in oral reading rate was found to increase linearly with grade until fifth grade, where a leveling off in growth rate was observed. Results are discussed in relation to CBM as an applied measurement methodology for use in both practical and research applications.
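Indexing progress by the ordinary least squares slope of a student's data series can be sketched as follows. The function name, the session numbering, and the scores are hypothetical and only illustrate the calculation; the study's actual data and software are not given in the abstract.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x (single predictor)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    if sum_sq_x == 0:
        raise ValueError("x must contain at least two distinct values")
    sum_cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sum_cross / sum_sq_x


# Hypothetical series: 16 probes (twice weekly for 8 weeks),
# with a made-up gain of 1.5 words read correctly per probe.
sessions = list(range(1, 17))
scores = [40 + 1.5 * s for s in sessions]
print(ols_slope(sessions, scores))  # close to 1.5 for this noise-free series
```

A larger slope means faster growth across the monitoring period; real probe data would be noisy, so the slope would only approximate the underlying rate.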

6.
Estimating the reliability of scores on single-item measures can be difficult because commonly used internal consistency estimates of reliability cannot be calculated. When longitudinal data are available, statistical models can be used to decompose the variability in the latent variable at each wave into trait versus state variance. Reliability can then be estimated as the ratio of the sum of the trait variance captured in repeated assessments to the total variance. The current study used latent trait-state-error models on nine-year longitudinal data (N = 5,003) to estimate the test–retest reliability of scores on a single-item measure of job satisfaction. Results showed that job satisfaction scores were somewhat unreliable (r_xx = .49–.59) and amenable to change.

7.
A Web-based coding application was designed to improve coding efficiency and to provide a systematic means of evaluating responses to open-ended assessments. The system was developed for use by multiple raters to assign open-ended responses to predetermined categories. The application provides a software environment for efficiently supervising the work of coders and evaluating the quality of the coding by (1) systematically presenting open-ended responses to coders, (2) tracking each coder’s categorized responses, and (3) assessing interrater consistency at any time in order to identify coders in need of further training. In addition, the application can be set to automatically assign repeated responses to categories previously identified as appropriate for those responses. To evaluate the efficacy of the coding application and to determine the statistical reliability of coding open-ended data within this application, we examined data from two empirical studies. The results demonstrated substantial interrater agreement on items assigned to various categories across free and controlled association tasks. Overall, this new coding application provides a feasible method of reliably coding open-ended data and makes the task of coding these data more manageable.

8.
This article presents a case study exemplifying the use of curriculum-based measurement (CBM) in educational decision making. CBM is portrayed as more than just a set of brief fluency tests with strong conceptual ties to three related assessment models, viz., behavioral, ecological, and problem-solving assessment. The key elements of each of these models are delineated. A case study is presented to show how CBM is integrated into problem-solving educational assessment and decision-making practices.

9.
Reliability captures the influence of error on a measurement and, in the classical setting, is defined as one minus the ratio of the error variance to the total variance. Laenen, Alonso, and Molenberghs (Psychometrika 73:443–448, 2007) proposed an axiomatic definition of reliability and introduced the R_T coefficient, a measure of reliability extending the classical approach to a more general longitudinal scenario. The R_T coefficient can be interpreted as the average reliability over different time points and can also be calculated for each time point separately. In this paper, we introduce a new and complementary measure, the so-called R_Λ, which implies a new way of thinking about reliability. In a longitudinal context, each measurement brings additional knowledge and leads to more reliable information. The R_Λ captures this intuitive idea and expresses the reliability of the entire longitudinal sequence, in contrast to an average or occasion-specific measure. We study the measure's properties using both theoretical arguments and simulations, establish its connections with previous proposals, and elucidate its performance in a real case study. The authors are grateful to J&J PRD for kind permission to use their data. We gratefully acknowledge support from Belgian IUAP/PAI network "Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data."

10.
This paper presents a latent variable approach for the estimation of treatment effects within a pooled interrupted time series (ITS) design. Although considered quasi-experimental, the ITS design has been noted as representing one of the strongest alternatives to the randomized experiment, making it highly appropriate for use in documenting the presence of effects that might warrant further evaluation in a large-scale randomized study. Results suggest that latent variable growth modeling (LGM) is capable of detecting simultaneous differences in both level and slope, and provides tests of significance for these two necessary indicators of an ITS intervention effect. As shown in the analyses, the LGM framework provides a comprehensive and flexible approach to research design and data analysis, making available to a wide audience of researchers an analytical framework for a variety of analyses of growth and developmental processes.

11.
This study describes the development and evaluation of a multi-item scale for analyzing the genetic counseling process, the Manchester Observation Code (MOC) for genetic counseling. The instrument is specific to the field of genetic counseling and is designed for analysis of the communication between counselor and client. Coding is done directly from videotaped sessions. Because communication is the means by which genetic counseling is accomplished, the method measures four relevant components of communication: (1) grammatical form, (2) purpose, (3) subject, and (4) cue source. The instrument enables an observer to code the counselor's statements into these four components. Three videotaped sessions were used to measure interrater reliability, or the consistency of rating for each of the four communication domains using this method. Three videotaped sessions were also used to measure test-retest reliability, or the consistency of the designed method from one time to another. A total of 21 videotaped sessions were tested using the method. A statistical measure of reliability established consistency of the designed method; Cohen's kappa yielded 0.7 for interrater reliability and 0.79 for test-retest reliability. These findings suggest this instrument may be used to identify important elements of the genetic counseling process.
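Cohen's kappa, the agreement statistic reported in this abstract, corrects raw agreement between two raters for the agreement expected by chance. The sketch below is a minimal illustration with hypothetical category labels and codes; it does not reproduce the MOC coding scheme or the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category to each item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four statements ("q" = question, "s" = statement).
coder_1 = ["q", "q", "s", "s"]
coder_2 = ["q", "q", "s", "q"]
print(cohens_kappa(coder_1, coder_2))  # 0.5
```

Here the raters agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.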

12.
The work of such psychologists as Kelly, McReynolds, Epstein, and Lazarus suggested the need for a measure of cognitive anxiety and provided a definition of that construct. A method of content analysis of verbal samples was devised and found to have adequate interjudge reliability. Normative data for five groups of subjects were provided. The validity of the measure as representative of a reaction to being unable to anticipate and integrate experience meaningfully was demonstrated in (a) the higher scores of groups of subjects who were currently coping with new experiences than those who were not, (b) the significant correlation of its scores with state rather than trait anxiety measures, (c) the variability of its scores over time as observed in a generalizability study, and (d) the higher scores of subjects when they were dealing with experiences for which meaningful anticipation was relatively difficult.

13.
Background and Objectives: Chronic stress is implicated in many theories as a contributor to a wide range of physical and mental health problems. The current study describes the development of a chronic stress measure that was based on the UCLA Life Stress Interview (LSI) and adapted in collaboration with community partners for use in a large community health study of low-income, ethnically diverse parents of infants in the USA (Community Child Health Network [CCHN]). We describe the instrument, its purpose and adaptations, implementation, and results of a reliability study in a subsample of the larger study cohort. Design and Methods: Interviews with 272 mothers were included in the present study. Chronic stress was assessed using the CCHN LSI, an instrument designed for administration by trained community interviewers to assess four domains of chronic stress, each rated by interviewers. Results: Significant correlations ranging from small to moderate in size between chronic stress scores on this measure, other measures of stress, biomarkers of allostatic load, and mental health provide initial evidence of construct and concurrent validity. Reliability data for interviewer ratings are also provided. Conclusions: This relatively brief interview (15 minutes) is available for use and may be a valuable tool for researchers seeking to measure chronic stress reliably and validly in future studies with time constraints.

14.
It is known that visual noise added to sinusoidal gratings changes the typical U-shaped threshold curve, which becomes flat in log-log scale for frequencies below 10 c/deg when gratings are masked with white noise of high power spectral density level. These results have been explained using the critical-band-masking (CBM) model by supposing a visual filter-bank of constant relative bandwidth. However, some psychophysical and biological data support the idea of variable octave bandwidth. The CBM model has been used here to explain the progressive change of threshold curves with the noise mask level and to estimate the bandwidth of visual filters. Bayesian staircases were used in a 2IFC paradigm to measure contrast thresholds of horizontal sinusoidal gratings (0.25–8 c/deg) within a fixed Gaussian window and masked with one-dimensional, static, broadband white noise with each of five power density levels. Raw data showed that the contrast threshold curve progressively shifts upward and flattens out as the mask noise level increases. Theoretical thresholds from the CBM model were fitted simultaneously to the data at all five noise levels using visual filters with log-Gaussian gain functions. If we assume a fixed-channel detection model, the best fit was obtained when the octave bandwidth of visual filters decreases as a function of peak spatial frequency.

15.
In an effort to duplicate high interrater reliability coefficients reported in the use of Epley and Ricks' (1963) time orientation scoring system with the Thematic Apperception Test (TAT), two pairs of judges and two different training procedures were employed. Reliability coefficients considerably lower than those quoted by other researchers were found. One method of using the system was to have judges discuss scoring differences during training and at various times during a research project until perfect agreement was reached. When used as an adjunct with periodic assessment of reliability as judges scored a large number of stories, reliability coefficients within a range acceptable for research purposes were obtained. This procedure is presented with correlational evidence for the presence of the time factor that the scoring system purports to measure.

16.
The Go/No Go Association Task (GNAT; Nosek & Banaji, 2001) is an implicit measure with broad application in social psychology. It has several conceptual strengths to recommend it over other implicit methods, but the belief that it has poor reliability, coupled with the absence of a method for calculating this important psychometric property, has hindered its wider acceptance and use. Using data obtained from six GNAT studies covering a wide range of content areas, Study 1 compares the properties of different methods for estimating reliability of the GNAT. Study 2 demonstrates a resampling procedure to investigate how reliability varies as a function of block length. Study 1 shows that with appropriately chosen stimuli the GNAT can be a very reliable measure, while Study 2 indicates that as an empirical rule of thumb 50 to 80 trials per block should yield adequate to very good reliability. However, researchers are urged to calculate their own reliability coefficients. To this end, we discuss GNAT design issues and provide procedures for calculating GNAT reliability, which we hope will enhance the utility of the GNAT as a measure and promote its use in studying implicit cognition.

17.
Goal attainment scaling (GAS) is an individually tailored way to measure treatment gains, using a highly standardized procedure. An advantage of the method is that it takes into account individual characteristics of the patients, and at the same time the data are suitable for quantitative analysis and comparable across patients. Despite the wide acceptance and use of the method in the evaluation of psychotherapy, data on its psychometric properties are rather scarce. In the current study, GAS was used as one of several outcome measures in a research project on the effectiveness of various treatments for panic disorder with agoraphobia. Guidelines for GAS are presented as well as data on the reliability and validity of the procedure. Results indicate that the procedure is reliable, valid, and sensitive to the improvement of patients during treatment. Comparison of GAS with standardized measures revealed considerable concordance, although the clinical end status of patients diverged somewhat dependent on the measure considered.

18.
The Strongin-Hinsie Peck whole-mouth salivation measure (Peck, 1959) is typically collected for a 2-min duration. This study compared saliva collected for 120 sec with saliva collected for shorter durations (30 and 60 sec) over repeated presentation of gustatory cues. Results showed reliable increases in salivation from a water stimulus baseline to the first presentation of lemon juice as a function of measurement duration. Repeated measures analysis of variance showed overall decreases in salivation across each measurement duration, with a greater rate of habituation for the 120-sec interval than for the 30- and 60-sec intervals. These data suggest that shorter measurement intervals can be used to measure salivation in acute and repeated measurement paradigms, but the change in response to repeated stimulus presentations is more pronounced for the longer measurement duration.

19.
Curriculum-based measurement (CBM) has evolved as a reliable and valid method for measuring and monitoring student performance in basic academic skills. While the efficacy of CBM for assessing reading skills is not in question, issues remain regarding whether or not a difference exists between CBM probes derived directly from the instructional curriculum and generic probes. The current study extends previous research comparing the utility of two types of CBM reading probe materials. Both types of probes were administered to 13 second-grade students twice weekly for 5 weeks. No significant differences were found between the two probe types' measurement of performance or progress over time, which suggests that school psychologists and educational professionals can use generic or curriculum-dependent probes in curriculum-based measurement.

20.
In order to assess the reliability of psychophysiological recording, 15 subjects were assessed on multiple response measures (forehead EMG and forearm flexor EMG, heart rate, skin resistance level, hand surface temperature and cephalic vasomotor response), under multiple stimulus conditions (baseline, self-control, cognitive and physical stressors), on multiple occasions (Days 1, 2, 8 and 28). Three forms of reliability coefficients were computed for each response measure: coefficients on absolute scores, coefficients on change scores from baseline to stressful conditions and coefficients on percent change from baseline. Only frontal EMG appeared to have consistently high absolute reliability coefficients, with hand surface temperature having high reliability if sessions are repeated within 1 week. Heart rate was less consistently reliable. Treating the responses as relative measures did not increase their reliability; indeed, hand surface temperature was completely unreliable when examined in this fashion. Implications of this study for behavioral medicine, biofeedback and anxiety-based disorders research, as well as Lang's tripartite response system model of fear and emotion, are discussed.
