1.

Testing the significance of interobserver agreement measures in the presence of autocorrelation: The jackknife procedure

Stephen V. Faraone Donald D. Dorfman 《Journal of psychopathology and behavioral assessment》1988,10(1):39-47

Users of interobserver agreement statistics have heretofore ignored the problem of autocorrelation in behavior sequences when testing the statistical significance of agreement measures. Due to autocorrelation traditional reliability tests based on the 2 × 2 contingency-table model (e.g., kappa, phi) are incorrect. Correct tests can be developed by using the bivariate time series as a statistical model. Seen from this perspective, testing the significance of interobserver agreement becomes formally equivalent to testing the significance of the lag-zero cross-correlation between two time series. The robust procedure known as the jackknife is suggested for this purpose.
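
Editorial sketch: the abstract above does not spell out the jackknife procedure, so the following is only one plausible reading. It computes the lag-zero cross-correlation between two observers' records and attaches a standard error with a delete-one-block jackknife. The block length, the simulated data, and the function names `lag0_corr` and `block_jackknife` are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def lag0_corr(x, y):
    """Lag-zero cross-correlation (Pearson r) of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def block_jackknife(x, y, block_len):
    """Delete-one-block jackknife estimate and standard error of r.

    block_len is an assumed tuning choice; deleting whole blocks rather than
    single points is one way to respect autocorrelation within a series.
    """
    g = len(x) // block_len  # whole blocks; a trailing remainder is never deleted
    full = lag0_corr(x, y)
    pseudo = []
    for i in range(g):
        lo, hi = i * block_len, (i + 1) * block_len
        r_i = lag0_corr(x[:lo] + x[hi:], y[:lo] + y[hi:])
        pseudo.append(g * full - (g - 1) * r_i)  # jackknife pseudo-value
    est = sum(pseudo) / g
    se = math.sqrt(sum((p - est) ** 2 for p in pseudo) / (g * (g - 1)))
    return est, se


# Hypothetical data: an autocorrelated 0/1 behavior stream and a second
# observer who miscodes roughly 10% of intervals.
rng = random.Random(0)
obs_a = [0]
for _ in range(199):
    stay = rng.random() < 0.8
    obs_a.append(obs_a[-1] if stay else 1 - obs_a[-1])
obs_b = [v if rng.random() > 0.1 else 1 - v for v in obs_a]

est, se = block_jackknife(obs_a, obs_b, block_len=10)
print(f"jackknife r = {est:.3f}, SE = {se:.3f}, t = {est / se:.2f}")
```

The ratio of the estimate to its standard error can be referred to a t distribution with (number of blocks minus 1) degrees of freedom; whether that matches the test the authors propose is not stated in the abstract.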

2.

Frequency, complexity, and clarity as covariates of observer reliability

Betty J. House Alvin E. House 《Journal of psychopathology and behavioral assessment》1979,1(2):149-165

Factors affecting interobserver agreement (reliability) with a comprehensive coding system in the naturalistic observation of children were examined. Data from 117 pairs of observations on 35 children and their families were examined with respect to reliability and three possible covariates: response frequency, observation complexity, and code definition clarity. Analysis of results strongly supported response frequency as a positive covariate of interobserver agreement. Complexity was found to negatively covary with interobserver agreement. The relationship between code clarity and reliability was in the predicted direction but failed to obtain statistical significance. Implications for observer training and data collection in observational studies are discussed. An earlier version of this article was prepared for presentation at The Association for the Advancement of Behavior Therapy, Atlanta, Georgia, December 1977.

3.

TEMPORAL DISTRIBUTIONS OF PROBLEM BEHAVIOR BASED ON SCATTER PLOT ANALYSIS

SungWoo Kahng Brian A. Iwata Sonya M. Fischer Terry J. Page Kimberli R. H. Treadwell Don E. Williams Richard G. Smith 《Journal of applied behavior analysis》1998,31(4):593-604

4.

Back to basics: Percentage agreement measures are adequate, but there are easier ways

Birkimer JC Brown JH 《Journal of applied behavior analysis》1979,12(4):535-543

Percentage agreement measures of interobserver agreement or "reliability" have traditionally been used to summarize observer agreement from studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether to continue using these percentage agreement measures, and on which ones to use, and what to do about chance agreements if their use is continued. Much of the disagreement derives from the need to be reasonably certain we do not accept as evidence of true interobserver agreement those agreement levels which are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier ways are discussed. Tables are given to permit checking to see if obtained disagreements are unlikely due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary. If reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance agreement.

5.

Nominal scale agreement among observers

Schouten Hubert J. A. 《Psychometrika》1986,51(3):453-466

Experiments are considered where each of a sample of subjects is assigned to one of C categories separately by each of a fixed or varying group of observers. Building on earlier publications, general procedures are proposed to analyze agreements and disagreements among observers. In the case of a varying group of observers, it is shown that it is not necessary to demand a constant number of observers per subject. In the case of a fixed group of observers, the problem of missing data is considered. The procedures are illustrated within the context of two clinical diagnosis examples. In the first example it is investigated which categories are relatively hard to distinguish from one another; a new theorem is applied that shows a useful property of the statistic kappa. In the second example it is investigated if a subgroup of observers can be found with a significantly higher degree of interobserver agreement. The author gratefully acknowledges the valuable suggestions by W. Molenaar, R. van Strik, R. Popping and the referees.

6.

Modeling psychometric functions in R

Yssaad-Fesselier R Knoblauch K 《Behavior research methods》2006,38(1):28-41

We demonstrate some procedures in the statistical computing environment R for obtaining maximum likelihood estimates of the parameters of a psychometric function by fitting a generalized nonlinear regression model to the data. A feature for fitting a linear model to the threshold (or other) parameters of several psychometric functions simultaneously provides a powerful tool for testing hypotheses about the data and, potentially, for reducing the number of parameters necessary to describe them. Finally, we illustrate procedures for treating one parameter as a random effect that would permit a simplified approach to modeling stimulus-independent variability due to factors such as lapses or interobserver differences. These tools will facilitate a more comprehensive and explicit approach to the modeling of psychometric data.

7.

Interobserver agreement and procedural fidelity: An odd asymmetry

Lindsay Essig Katarina Rotta Alan Poling 《Journal of applied behavior analysis》2023,56(1):78-85

We examined articles with experiments published in the Journal of Applied Behavior Analysis and in Behavior Analysis in Practice from 2017 through 2021 to determine how frequently procedural fidelity was assessed. When procedural fidelity was assessed, we determined how often a measure of interobserver agreement for those fidelity data was provided. We also determined how often a measure of interobserver agreement for participants' behavior was provided. Across both journals and all years, 54.7% of relevant articles provided a measure of procedural fidelity. Of them, 17.7% provided a measure of interobserver agreement for procedural fidelity. In marked contrast, 96.4% provided interobserver agreement data for participants' behavior. It is unfortunate that applied behavior analysts frequently fail to provide procedural fidelity data and, when they do, often fail to provide interobserver agreement data for the fidelity data. Reviewers for, and editors of, behavior-analytic journals are encouraged to strongly consider the relative value of procedural fidelity and agreement on procedural fidelity measures when rendering recommendations on the suitability of a given submission.

8.

A multiple option observation system for experimental studies: MOOSES

Jon Tapp Joseph Wehby David Ellis 《Behavior research methods》1995,27(1):25-31

The purpose of this article is to outline the development of the Multiple Option Observation System for Experimental Studies (MOOSES), a flexible data collection package for applied behavioral research. Several data collection options are available to users of MOOSES. Event-based recording, interaction-based recording, duration recording, and interval recording are available to the users and can be used individually or together, depending upon the research question. The collection program can incorporate any of the keys on the keyboard. Function keys on the top or side are used for toggle (duration states) type data collection. Types of analysis include frequency and duration of discrete events, frequency of general behavior states, frequency and duration of events within behavioral states, percent interval analysis, sequential analysis, and interobserver agreement. Data obtained from MOOSES is easily incorporated with other data for further statistical analysis with standard statistical packages or popular spreadsheet programs. Applications of MOOSES and its uses in social interaction research are presented. Comparisons with other similar systems are provided.

9.

Statistical versus clinical significance in research with the MMPI

C B Holmes J S Kixmiller R K Larsen 《Psychological reports》1989,64(1):159-162

Some of the contradictions in psychological research may be attributable to failure to distinguish statistical from clinical significance. 82 articles in which the MMPI was the research instrument were analyzed to see how often the results reported as significant were in fact large enough to warrant such a conclusion. Articles were classified as to whether or not the clinical interpretations were consistent with the statistical results. Excluding articles in which data were insufficient to reach an independent conclusion, 54.90% of the articles presented conclusions of clinical significance that were not supported by the data, while 45.10% reported clinical results that were supported by the data.

10.

Alternate methods and software for calculating interobserver agreement for continuous observation data

William E. MacLean Jr. Jon T. Tapp Sr. Willard L. Johnson 《Journal of psychopathology and behavioral assessment》1985,7(1):65-73

Portable electronic data collection devices permit investigators to collect large amounts of observational data in a form ready for computer analysis. These devices are particularly efficient for gathering continuous data on multiple behavior categories. We expect that the increasing availability of these devices will lead to greater use of continuous data collection methods in observational research. This paper addresses the difficulties encountered when calculating traditional interobserver agreement statistics for continuous, multiple-code scoring. Two alternative strategies are described that yield interobserver agreement values based on the exact time of behavior code entries by the primary and secondary observers. Work on this paper was supported in part by NICHD Grants P01HD15051 and R01HD17650 and Office of Special Education and Rehabilitation Services Grant G008302980.

11.

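A comparative study of the significance t-test for differences and meta-analysis (Total citations: 5; self-citations: 0; citations by others: 5)

郭春彦 (Guo Chunyan) 朱滢 (Zhu Ying) 《心理学报》(Acta Psychologica Sinica) 1997,30(4):436-442

A subject population was constructed by computer and the experimental procedure was simulated in order to carry out a sampling study of how the significance t-test and meta-analysis differ when testing experimental data. In the simulated experiments, the t-test was affected by the significance level, the sample size, and the population effect size, which ultimately affected the reliability of the statistical inference. The authors recommend that statistical power be estimated whenever a significance test is carried out. Meta-analysis treats samples as the elements from which inferences about the population are drawn and therefore has high accuracy and reliability; it may well become an important statistical tool in future psychological research.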

12.

Précis of statistical significance: rationale, validity, and utility

Chow SL 《The Behavioral and brain sciences》1998,21(2):169-94; discussion 194-239

The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.

13.

Evaluating interobserver reliability of interval data

Hopkins BL Hermann JA 《Journal of applied behavior analysis》1977,10(1):121-126

Previous recommendations to employ occurrence, nonoccurrence, and overall estimates of interobserver reliability for interval data are reviewed. A rationale for comparing obtained reliability to reliability that would result from a random-chance model is explained. Formulae and graphic functions are presented to allow for the determination of chance agreement for each of the three indices, given any obtained per cent of intervals in which a response is recorded to occur. All indices are interpretable throughout the range of possible obtained values for the per cent of intervals in which a response is recorded. The level of chance agreement simply changes with changing values. Statistical procedures that could be used to determine whether obtained reliability is significantly superior to chance reliability are reviewed. These procedures are rejected because they yield significance levels that are partly a function of sample sizes and because there are no general rules to govern acceptable significance levels depending on the sizes of samples employed.
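
Editorial sketch: the paper's own formulae are not reproduced in the abstract, so the code below uses one simple random-chance model as a stand-in, namely two observers who score occurrences independently at their own observed rates. The function name `agreement_indices` and the cell labels `a`, `b`, `c`, `d` are assumptions for illustration.

```python
def agreement_indices(a, b, c, d):
    """Obtained and chance-expected agreement for interval data.

    a: intervals both observers scored as occurrence
    b: intervals only observer 1 scored as occurrence
    c: intervals only observer 2 scored as occurrence
    d: intervals both observers scored as nonoccurrence
    Assumes each denominator below is nonzero.
    """
    n = a + b + c + d
    p1 = (a + b) / n  # observer 1's proportion of occurrence intervals
    p2 = (a + c) / n  # observer 2's proportion of occurrence intervals
    obtained = {
        "overall": (a + d) / n,
        "occurrence": a / (a + b + c),
        "nonoccurrence": d / (b + c + d),
    }
    # Expected values if the two observers scored independently (assumed model).
    chance = {
        "overall": p1 * p2 + (1 - p1) * (1 - p2),
        "occurrence": p1 * p2 / (p1 + p2 - p1 * p2),
        "nonoccurrence": (1 - p1) * (1 - p2) / (1 - p1 * p2),
    }
    return obtained, chance


# Hypothetical session: 100 intervals, behavior scored in about a quarter of them.
obtained, chance = agreement_indices(a=20, b=5, c=5, d=70)
for name in obtained:
    print(f"{name}: obtained {obtained[name]:.3f}, chance {chance[name]:.3f}")
```

Under this model the chance level of each index shifts with the recorded rate of behavior, which is the dependence the abstract describes.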

14.

A probability-based formula for calculating interobserver agreement

Yelton AR Wildman BG Erickson MT 《Journal of applied behavior analysis》1977,10(1):127-131

Estimates of observer agreement are necessary to assess the acceptability of interval data. A common method for assessing observer agreement, per cent agreement, includes several major weaknesses and varies as a function of the frequency of behavior recorded and the inclusion or exclusion of agreements on nonoccurrences. Also, agreements that might be expected to occur by chance are not taken into account. An alternative method for assessing observer agreement that determines the exact probability that the obtained number of agreements or better would have occurred by chance is presented and explained. Agreements on both occurrences and nonoccurrences of behavior are considered in the calculation of this probability.
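
Editorial sketch: the abstract does not give the formula, so what follows is one way to formalize "the exact probability that the obtained number of agreements or better would have occurred by chance". It assumes each observer's total number of scored occurrences is fixed, which makes the number of joint occurrences hypergeometric (the one-sided tail of a Fisher-type exact test). The function name and argument names are hypothetical.

```python
from math import comb


def chance_probability(n, k1, k2, a_obs):
    """P(at least a_obs joint occurrences) under chance, margins fixed.

    n:     number of observation intervals
    k1:    intervals observer 1 scored as occurrence
    k2:    intervals observer 2 scored as occurrence
    a_obs: intervals both scored as occurrence
    With fixed margins, total agreements equal n - k1 - k2 + 2 * a, so
    "a_obs or more joint occurrences" is the same event as "the obtained
    number of agreements or better", counting nonoccurrences as well.
    """
    total = comb(n, k2)
    tail = sum(
        comb(k1, j) * comb(n - k1, k2 - j)
        for j in range(a_obs, min(k1, k2) + 1)
    )
    return tail / total


# Hypothetical check: 50 intervals, each observer scored 10 occurrences,
# and they coincided on 8 of them.
print(chance_probability(n=50, k1=10, k2=10, a_obs=8))
```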

15.

Determining the significance of scale values from multidimensional scaling profile analysis using a resampling method

Ding CS 《Behavior research methods》2005,37(1):37-47

Although multidimensional scaling (MDS) profile analysis is widely used to study individual differences, there is no objective way to evaluate the statistical significance of the estimated scale values. In the present study, a resampling technique (bootstrapping) was used to construct confidence limits for scale values estimated from MDS profile analysis. These bootstrap confidence limits were used, in turn, to evaluate the significance of marker variables of the profiles. The results from analyses of both simulation data and real data suggest that the bootstrap method may be valid and may be used to evaluate hypotheses about the statistical significance of marker variables of MDS profiles.
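
Editorial sketch: a generic percentile bootstrap, shown only to illustrate how resampled confidence limits can be read as a significance check. It does not perform MDS; a plain mean stands in for an estimated scale value, and the data, the function name `bootstrap_ci`, and the choice of the percentile method are assumptions rather than details from the study.

```python
import random


def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence limits for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample cases with replacement and recompute the statistic each time.
    reps = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical per-person values for one marker variable.
scores = [1.2, 0.8, 1.5, 0.9, 1.1, 1.7, 0.6, 1.3, 1.0, 1.4]
lo, hi = bootstrap_ci(scores, mean)
print(f"95% bootstrap limits: [{lo:.3f}, {hi:.3f}]")
print("limits exclude zero" if lo > 0 or hi < 0 else "limits include zero")
```

A marker would be treated as significant in this reading when its limits exclude zero.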

16.

A Monte Carlo investigation of the statistical significance of Kruskal's nonmetric scaling procedure (Total citations: 1; self-citations: 0; citations by others: 1)

David Klahr 《Psychometrika》1969,34(3):319-330

Recent advances in computer based psychometric techniques have yielded a collection of powerful tools for analyzing nonmetric data. These tools, although particularly well suited to the behavioral sciences, have several potential pitfalls. Among other things, there is no statistical test for evaluating the significance of the results. This paper provides estimates of the statistical significance of results yielded by Kruskal's nonmetric multidimensional scaling. The estimates, obtained from attempts to scale many randomly generated sets of data, reveal the relative frequency with which apparent structure is erroneously found in unstructured data. For a small number of points (i.e., six or seven) it is very likely that a good fit will be obtained in two or more dimensions when in fact the data are generated by a random process. The estimates presented here can be used as a bench mark against which to evaluate the significance of the results obtained from empirically based nonmetric multidimensional scaling. A preliminary version of this paper was presented at the International Federation for Information Processing Congress 68 in Edinburgh, Scotland, August 5–10, 1968.

17.

Outcomes of a Day Treatment Program for Eating Disorders Using Clinical and Statistical Significance

Denise D. Ben-Porath Lucene Wisniewski Mark Warren 《Journal of Contemporary Psychotherapy》2010,40(2):115-123

The present study was designed to evaluate the outcomes of a day treatment program for 55 eating disordered (ED) patients using clinical and statistical significance testing. Results indicated a statistically significant reduction on all eating disordered outcomes. With respect to clinical significance testing, analysis of these data indicated that the majority of the individuals in the day treatment program made clinically significant and reliable change by the termination of treatment on all eating disorder measures. However, considerably less patients improved to such a point that they were asymptomatic. The importance of combining clinical significance testing with traditional significance testing is discussed.

18.

Sampling distributions of the entropy in visual scanning

Robin S. Weiss Roger Remington Stephen R. Ellis 《Behavior research methods》1989,21(3):348-352

Identifying true statistical dependencies in visual-scanning data involves showing that the observed scanning pattern is significantly more ordered than that which would be produced by a stratified random-sampling model. In the past, entropy has been used as the index to measure statistical order or dependency. Due to the unknown nature of the underlying sampling distributions of entropy, however, researchers have had to use relatively less powerful nonparametric statistical tests to determine significance. In this paper we present relevant portions of the family of sampling distributions of entropy and show that they are sufficiently normally distributed to allow the use of a more powerful parametric statistical test when attempting to distinguish among the different models of sampling.

19.

A Monte Carlo method for comparing generalized estimating equations to conventional statistical techniques for discounting data

Jonathan E. Friedel William B. DeHart Anne M. Foreman Michael E. Andrew 《Journal of the experimental analysis of behavior》2019,111(2):207-224

Discounting is the process by which outcomes lose value. Much of discounting research has focused on differences in the degree of discounting across various groups. This research has relied heavily on conventional null hypothesis significance tests that are familiar to psychologists, such as t‐tests and ANOVAs. As discounting research questions have become more complex by simultaneously focusing on within‐subject and between‐group differences, conventional statistical testing is often not appropriate for the obtained data. Generalized estimating equations (GEE) are one type of mixed‐effects model that are designed to handle autocorrelated data, such as within‐subject repeated‐measures data, and are therefore more appropriate for discounting data. To determine if GEE provides similar results as conventional statistical tests, we compared the techniques across 2,000 simulated data sets. The data sets were created using a Monte Carlo method based on an existing data set. Across the simulated data sets, the GEE and the conventional statistical tests generally provided similar patterns of results. As the GEE and more conventional statistical tests provide the same pattern of result, we suggest researchers use the GEE because it was designed to handle data that has the structure that is typical of discounting data.

20.

Methodological implications of interaural correlation: count heads not ears

S Coren A R Hakstian 《Perception & psychophysics》1990,48(3):291-294

In a sample of 425 subjects, pure-tone hearing thresholds between the right and left ears were shown to have an average correlation of .885 (or .783 with age partialed out). This high interaural correlation is shown to invalidate the experimental procedure of entering data on the basis of "ears," where each subject can contribute one or two audiograms to the data pool, since such aggregation is demonstrated to produce spuriously high levels of apparent statistical significance in inferential statistical tests.