Similar documents (20 results found)
1.
Hierarchical data sets arise when the data for lower units (e.g., individuals such as students, clients, and citizens) are nested within higher units (e.g., groups such as classes, hospitals, and regions). In data collection for experimental research, estimating the required sample size beforehand is a fundamental question for obtaining sufficient statistical power and precision for the parameters of interest. The present research extends previous work by Heo and Leon (2008) and Usami (2011b) by deriving closed-form formulas for determining the required sample size to test effects in experimental research with hierarchical data, covering both multisite-randomized trials (MRTs) and cluster-randomized trials (CRTs). These formulas consider both statistical power and the width of the confidence interval of a standardized effect size, on the basis of estimates from a random-intercept model for three-level data, and they accommodate both balanced and unbalanced designs. The formulas also yield some important results, such as lower bounds on the number of units needed at the highest level.
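As a rough illustration of the kind of planning these formulas support, the sketch below uses the standard two-level cluster-randomized-trial approximation (a simple-randomization sample size inflated by the design effect), not the article's three-level closed-form formulas; the function name and default values are illustrative assumptions.

crt_clusters <- function(delta = 0.30, icc = 0.05, m = 20,
                         power = 0.80, alpha = 0.05) {
  # per-arm n under individual randomization (normal approximation)
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  n_simple <- 2 * (z / delta)^2
  # inflate by the design effect for clusters of size m, then convert to clusters
  deff <- 1 + (m - 1) * icc
  ceiling(n_simple * deff / m)
}
crt_clusters()   # clusters needed per arm for d = 0.30, ICC = .05, m = 20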

2.
Factorial experimental designs have many potential advantages for behavioral scientists. For example, such designs may be useful in building more potent interventions by helping investigators to screen several candidate intervention components simultaneously and to decide which are likely to offer greater benefit before evaluating the intervention as a whole. However, sample size and power considerations may challenge investigators attempting to apply such designs, especially when the population of interest is multilevel (e.g., when students are nested within schools, or when employees are nested within organizations). In this article, we examine the feasibility of factorial experimental designs with multiple factors in a multilevel, clustered setting (i.e., of multilevel, multifactor experiments). We conduct Monte Carlo simulations to demonstrate how design elements, such as the number of clusters, the number of lower-level units, and the intraclass correlation, affect power. Our results suggest that multilevel, multifactor experiments are feasible for factor-screening purposes because of the economical properties of complete and fractional factorial experimental designs. We also discuss resources for sample size planning and power estimation for multilevel factorial experiments. These results are discussed from a resource management perspective, in which the goal is to choose a design that maximizes the scientific benefit using the resources available for an investigation.
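A minimal sketch of such a simulation, under assumed values for the number of clusters, cluster size, intraclass correlation, and one main effect; because whole clusters are assigned to cells, the power check here analyzes cluster means with ordinary least squares rather than fitting a full multilevel model.

set.seed(1)
mc_power <- function(J = 40, n = 20, icc = 0.10, eff_a = 0.25,
                     reps = 2000, alpha = 0.05) {
  # J clusters (divisible by 4) split evenly over the cells of a 2 x 2 factorial
  cells  <- expand.grid(a = c(-0.5, 0.5), b = c(-0.5, 0.5))
  design <- cells[rep(1:4, each = J / 4), ]
  sig_u <- sqrt(icc); sig_e <- sqrt(1 - icc)          # total variance fixed at 1
  hits <- replicate(reps, {
    ybar <- eff_a * design$a + rnorm(J, 0, sig_u) +    # cluster random effects
            rnorm(J, 0, sig_e / sqrt(n))               # averaged within-cluster error
    fit <- summary(lm(ybar ~ a * b, data = cbind(design, ybar)))
    fit$coefficients["a", "Pr(>|t|)"] < alpha          # main effect of factor A detected?
  })
  mean(hits)                                           # Monte Carlo power estimate
}
mc_power()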

3.
4.
A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters, producing an effect size that represents the magnitude of the treatment effect from baseline to treatment phases in standard deviation units. In this paper, the method is applied to two published examples using common single-case designs (i.e., withdrawal and multiple-baseline). The results from these studies are described, and the method is compared against ten desirable criteria for single-case effect sizes. Based on the results of this application, we conclude with observations about the use of GLS as a support to visual analysis, provide recommendations for future research, and describe implications for practice.
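A minimal sketch of the general idea (not the authors' implementation), assuming a simulated AB series of 30 observations and using nlme::gls with an AR(1) error structure; the effect size divides the phase coefficient by the residual standard deviation.

library(nlme)
set.seed(1)
time  <- 1:30
phase <- rep(c(0, 1), each = 15)                         # 0 = baseline, 1 = treatment
y     <- 5 + 2 * phase + as.numeric(arima.sim(list(ar = 0.3), n = 30))
dat   <- data.frame(y, time, phase)
fit   <- gls(y ~ phase, data = dat, correlation = corAR1(form = ~ time))
es    <- as.numeric(coef(fit)["phase"]) / fit$sigma      # phase shift in residual-SD units
es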

5.
The point-biserial correlation is a commonly used measure of effect size in two-group designs. New estimators of point-biserial correlation are derived from different forms of a standardized mean difference. Point-biserial correlations are defined for designs with either fixed or random group sample sizes and can accommodate unequal variances. Confidence intervals and standard errors for the point-biserial correlation estimators are derived from the sampling distributions for pooled-variance and separate-variance versions of a standardized mean difference. The proposed point-biserial confidence intervals can be used to conduct directional two-sided tests, equivalence tests, directional non-equivalence tests, and non-inferiority tests. A confidence interval for an average point-biserial correlation in meta-analysis applications performs substantially better than the currently used methods. Sample size formulas for estimating a point-biserial correlation with desired precision and testing a point-biserial correlation with desired power are proposed. R functions are provided that can be used to compute the proposed confidence intervals and sample size formulas.
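For orientation, a minimal sketch that computes the sample point-biserial correlation directly and wraps it in a simple Fisher-z interval; this is a generic approximation, not the article's proposed estimators or confidence intervals.

r_pb_ci <- function(y, group, conf = 0.95) {
  g  <- as.numeric(factor(group)) - 1            # 0/1 group indicator
  r  <- cor(y, g)                                # point-biserial = Pearson r with the indicator
  se <- 1 / sqrt(length(y) - 3)                  # Fisher-z standard error
  ci <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - (1 - conf) / 2) * se)
  c(r = r, lower = ci[1], upper = ci[2])
}
set.seed(1)
r_pb_ci(y = c(rnorm(30), rnorm(30, 0.5)), group = rep(c("a", "b"), each = 30))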

6.
When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non-normality for Yuen's two-group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas can minimize the total cost, the total sample size, or the sum of total sample size and total cost. On the other hand, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, the present simulation applies Yuen's test to the samples generated, and then the procedure is validated in terms of Type I errors and power. Simulation results show that the proposed formulas can control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and future research are discussed.
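For reference, a minimal base-R sketch of Yuen's two-group trimmed-means test itself (20% trimming by default); the article's cost- and power-optimal sample size formulas are not reproduced here.

yuen_test <- function(x, y, trim = 0.20) {
  stat <- function(v) {
    n <- length(v); g <- floor(trim * n); h <- n - 2 * g
    v <- sort(v)
    w <- c(rep(v[g + 1], g), v[(g + 1):(n - g)], rep(v[n - g], g))   # winsorized sample
    list(tm = mean(v, trim = trim),                                  # trimmed mean
         d  = (n - 1) * var(w) / (h * (h - 1)),                      # squared SE component
         h  = h)
  }
  a <- stat(x); b <- stat(y)
  t  <- (a$tm - b$tm) / sqrt(a$d + b$d)
  df <- (a$d + b$d)^2 / (a$d^2 / (a$h - 1) + b$d^2 / (b$h - 1))
  c(t = t, df = df, p = 2 * pt(-abs(t), df))
}
set.seed(1)
yuen_test(rnorm(25), rnorm(40, 0.6, 2))   # unequal sizes and unequal variances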

7.
Numerous rules-of-thumb have been suggested for determining the minimum number of subjects required to conduct multiple regression analyses. These rules-of-thumb are evaluated by comparing their results against those based on power analyses for tests of hypotheses of multiple and partial correlations. The results did not support the use of rules-of-thumb that simply specify some constant (e.g., 100 subjects) as the minimum number of subjects or a minimum ratio of number of subjects (N) to number of predictors (m). Some support was obtained for a rule-of-thumb that N ≥ 50 + 8m for the multiple correlation and N ≥ 104 + m for the partial correlation. However, the rule-of-thumb for the multiple correlation yields values too large for N when m ≥ 7, and both rules-of-thumb assume all studies have a medium-size relationship between criterion and predictors. Accordingly, a slightly more complex rule-of-thumb is introduced that estimates minimum sample size as a function of effect size as well as the number of predictors. It is argued that researchers should use methods to determine sample size that incorporate effect size.
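The comparison the authors describe can be sketched directly: the power-based N for the overall R-squared F test (Cohen's f-squared with a noncentral F distribution) set against the 50 + 8m rule; the assumed R-squared of .13 corresponds to a medium effect and is an illustrative choice.

n_for_R2 <- function(m, R2 = 0.13, power = 0.80, alpha = 0.05) {
  f2 <- R2 / (1 - R2)                      # Cohen's f-squared
  for (N in (m + 2):10000) {
    df2   <- N - m - 1
    fcrit <- qf(1 - alpha, m, df2)
    # noncentrality lambda = f2 * (df1 + df2 + 1) = f2 * N
    if (pf(fcrit, m, df2, ncp = f2 * N, lower.tail = FALSE) >= power) return(N)
  }
  NA
}
m <- 7
c(power_based = n_for_R2(m), rule_of_thumb = 50 + 8 * m)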

8.
Calculating and reporting appropriate measures of effect size are becoming standard practice in psychological research. One of the most common scenarios encountered involves the comparison of 2 groups, which includes research designs that are experimental (e.g., random assignment to treatment vs. placebo conditions) and nonexperimental (e.g., testing for gender differences). Familiar measures such as the standardized mean difference (d) or the point-biserial correlation (rpb) characterize the magnitude of the difference between groups, but these effect size measures are sensitive to a number of additional influences. For example, R. E. McGrath and G. J. Meyer (2006) showed that rpb is sensitive to sample base rates, and extending their analysis to situations of unequal variances reveals that d is, too. The probability-based measure A, the nonparametric generalization of what K. O. McGraw and S. P. Wong (1992) called the common language effect size statistic, is insensitive to base rates and more robust to several other factors (e.g., extreme scores, nonlinear transformations). In addition to its excellent generalizability across contexts, A is easy to understand and can be obtained from standard computer output or through simple hand calculations.
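A minimal sketch of the hand calculation for A: the proportion of cross-group pairs in which the first group's score exceeds the second's, with ties counted half.

A_stat <- function(x, y) {
  # estimate of P(X > Y) + 0.5 * P(X = Y) over all (x, y) pairs
  (sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))) / (length(x) * length(y))
}
set.seed(1)
A_stat(rnorm(40, 0.5), rnorm(40))   # about .64, roughly matching d = 0.5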

9.
Experiments that involve nested structures may assign treatment conditions either to entire groups (such as classrooms or schools) or individuals within groups (such as students). Although typically the interest in field experiments is in determining the significance of the overall treatment effect, it is equally important to examine the inconsistency of the treatment effect in different groups. This study provides methods for computing power of tests for the variability of treatment effects across level-2 and level-3 units in three-level designs, where, for example, students are nested within classrooms and classrooms are nested within schools and random assignment takes place at the first or the second level. The power computations take into account nesting effects at the second (e.g., classroom) and at the third (e.g., school) level as well as sample size effects (e.g., number of level-1 and level-2 units). The methods can also be applied to quasi-experimental studies that examine the significance of the variation of group differences in an outcome or associations between predictors and outcomes across level-2 and level-3 units.

10.
It is common practice in both randomized and quasi-experiments to adjust for baseline characteristics when estimating the average effect of an intervention. The inclusion of a pre-test, for example, can reduce both the standard error of this estimate and, in non-randomized designs, its bias. At the same time, it is also standard to report the effect of an intervention in standardized effect size units, thereby making it comparable to other interventions and studies. Curiously, the estimation of this effect size, including covariate adjustment, has received little attention. In this article, we provide a framework for defining effect sizes in designs with a pre-test (e.g., difference-in-differences and analysis of covariance) and propose estimators of those effect sizes. The estimators and approximations to their sampling distributions are evaluated using a simulation study and then demonstrated using an example from published data.
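One of the simpler estimators in this family can be sketched as follows: an ANCOVA-adjusted group difference scaled by an unadjusted (control-group) posttest standard deviation; the simulated data and the choice of standardizer are illustrative assumptions, not the article's recommended estimator.

set.seed(1)
n    <- 100
pre  <- rnorm(n)
grp  <- rep(0:1, each = n / 2)                 # 0 = control, 1 = treatment
post <- 0.6 * pre + 0.4 * grp + rnorm(n, sd = 0.8)
fit  <- lm(post ~ grp + pre)                   # covariate-adjusted mean difference
d_adj <- coef(fit)["grp"] / sd(post[grp == 0]) # scaled by control-group posttest SD
d_adj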

11.
While conventional hierarchical linear modeling is applicable to purely hierarchical data, a multiple membership random effects model (MMrem) is appropriate for non-purely nested data wherein some lower-level units manifest mobility across higher-level units. Although a few recent studies have investigated the influence of cluster-level residual non-normality on hierarchical linear modeling estimation for purely hierarchical data, no research has examined the statistical performance of an MMrem given residual non-normality. The purpose of the present study was to extend prior research on the influence of residual non-normality from purely nested data structures to multiple membership data structures. Employing a Monte Carlo simulation study, this research examined two-level MMrem parameter estimate biases and inferential errors. Simulation factors included the level-two residual distribution, sample sizes, intracluster correlation coefficient, and mobility rate. Results showed that estimates of fixed effect parameters and the level-one variance component were robust to level-two residual non-normality. The level-two variance component, however, was sensitive to level-two residual non-normality and sample size. Coverage rates of the 95% credible intervals deviated from the nominal level when level-two residuals were non-normal. These findings can be useful in the application of an MMrem to account for the contextual effects of multiple higher-level units.

12.
Behavior Therapy, 2018, 49(6), 981-994
This article describes the development of an effect size measure called Ratio of Distances (RD). The goal was to develop a measure of level change for single case experimental research that met several practical requirements: (a) the measure is adaptable to designs with varying numbers of observations per, and across, phases; (b) the measure is adaptable to situations in which slope does and does not exist; (c) the measure has no ceiling, as is the limitation with commonly used overlap-based measures of effect size; and (d) the measure is computationally transparent and easily computed using widely available analysis tools (e.g., Microsoft Excel). The measure is applicable to single cases and meta-analyses.

13.
郑昊敏, 温忠麟, 吴艳. 《心理科学进展》 (Advances in Psychological Science), 2011, 19(12), 1868-1878
Effect sizes compensate, in quantitative terms, for the limitations of null hypothesis significance testing. Beyond reporting test results, many journals also require that research reports include effect sizes. Effect sizes can be grouped into three broad classes: difference-based, correlation-based, and group-overlap measures. They may have different computational methods and uses under different research designs (e.g., single-factor and multifactor between-subjects, within-subjects, and mixed experimental designs) or under different data conditions (e.g., small samples, heterogeneous variances), but many effect sizes can be converted into one another. We provide a table to help applied researchers choose an appropriate effect size according to their research purpose and research type.
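As a small illustration of the conversions the review refers to, the sketch below moves between a standardized mean difference d and a point-biserial r for a two-group design; the formulas are standard textbook conversions, assumed here rather than taken from the review's table.

d_to_r <- function(d, n1, n2) {
  a <- (n1 + n2)^2 / (n1 * n2)        # correction term for the group-size split
  d / sqrt(d^2 + a)
}
r_to_d <- function(r, n1, n2) {
  a <- (n1 + n2)^2 / (n1 * n2)
  sqrt(a) * r / sqrt(1 - r^2)
}
d_to_r(0.5, 50, 50)    # about .24
r_to_d(0.24, 50, 50)   # back to about 0.5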

14.
This research was motivated by a clinical trial design for a cognitive study. The pilot study was a matched-pairs design in which some data are missing, with the missing observations occurring at the end of the study. Existing approaches to sample size determination are all based on asymptotic approximations (e.g., the generalized estimating equation (GEE) approach). When the sample size in a clinical trial is small to medium, these asymptotic approaches may not be appropriate because of unsatisfactory Type I and Type II error rates. For this reason, we consider the exact unconditional approach to compute the sample size for a matched-pairs study with incomplete data. Recommendations are made for each possible missingness pattern by comparing the exact sample sizes based on three commonly used test statistics with the existing sample size calculation based on the GEE approach. An example from a real surgeon-reviewers study is used to illustrate the application of the exact sample size calculation in study designs.

15.
The accuracy in parameter estimation approach to sample size planning is developed for the coefficient of variation, where the goal of the method is to obtain an accurate parameter estimate by achieving a sufficiently narrow confidence interval. The first method allows researchers to plan sample size so that the expected width of the confidence interval for the population coefficient of variation is sufficiently narrow. A modification allows a desired degree of assurance to be incorporated into the method, so that the obtained confidence interval will be sufficiently narrow with some specified probability (e.g., 85% assurance that the 95% confidence interval will be no wider than the desired width). Tables of necessary sample size are provided for a variety of scenarios, which may help researchers plan an appropriate sample size for studies in which the coefficient of variation is of interest so as to obtain a sufficiently narrow confidence interval, optionally with some specified assurance that the interval will be sufficiently narrow. Freely available computer routines have been developed that allow researchers to easily implement all of the methods discussed in the article.
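A crude back-of-the-envelope version of the expected-width calculation can be sketched with a delta-method approximation for normal data, Var(cv-hat) roughly equal to (cv^2 / n) * (0.5 + cv^2); the article's own routines use more accurate interval procedures and add the assurance adjustment.

n_for_cv <- function(cv = 0.25, w = 0.05, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  # expected full CI width: 2 * z * cv * sqrt((0.5 + cv^2) / n) <= w, solved for n
  ceiling((2 * z / w)^2 * cv^2 * (0.5 + cv^2))
}
n_for_cv()   # planned n for cv = .25 and a desired full interval width of .05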

16.
In experimental research, it is not uncommon to assign clusters to conditions. When analysing the data of such cluster-randomized trials, a multilevel analysis should be applied in order to take into account the dependency of first-level units (i.e., subjects) within a second-level unit (i.e., a cluster). Moreover, the multilevel analysis can handle covariates on both levels. If a first-level covariate is involved, usually the within-cluster effect of this covariate will be estimated, implicitly assuming the contextual effect to be equal to it. However, this assumption may be violated. The focus of the present simulation study is the effects of ignoring the inequality of the within-cluster and contextual covariate effects on parameter and standard error estimates of the treatment effect, which is the parameter of main interest in experimental research. We found that ignoring the inequality of the within-cluster and contextual effects does not affect the estimation of the treatment effect or its standard errors. However, estimates of the variance components, as well as standard errors of the constant, were found to be biased.
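The model that separates the two effects can be sketched with lme4 by splitting the level-1 covariate into its cluster mean and the within-cluster deviation; the simulated data and coefficient values below are illustrative assumptions.

library(lme4)
set.seed(1)
J <- 40; n <- 15
cluster <- rep(1:J, each = n)
treat   <- rep(rep(0:1, each = J / 2), each = n)            # condition assigned per cluster
x       <- rnorm(J)[cluster] + rnorm(J * n)                 # level-1 covariate with a cluster component
y       <- 0.4 * treat + 0.3 * x + 0.2 * ave(x, cluster) +  # contextual part differs from within part
           rnorm(J, 0, 0.5)[cluster] + rnorm(J * n)
dat <- data.frame(y, x, treat, cluster)
dat$x_mean <- ave(dat$x, dat$cluster)                       # contextual (cluster-mean) covariate
dat$x_dev  <- dat$x - dat$x_mean                            # within-cluster deviation
fit <- lmer(y ~ treat + x_dev + x_mean + (1 | cluster), data = dat)
fixef(fit)   # x_dev and x_mean estimate the within-cluster and contextual effects separately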

17.
Lyn, H. Animal Cognition, 2007, 10(4), 461-475
Error analysis has been used in humans to detect implicit representations and categories in language use. The present study utilizes the same technique to report on mental representations and categories in symbol use by two bonobos (Pan paniscus). These bonobos have been shown in published reports to comprehend English at the level of a two-and-a-half-year-old child and to use a keyboard with over 200 visuographic symbols (lexigrams). In this study, vocabulary test errors from over 10 years of data revealed auditory, visual, and spatio-temporal generalizations (errors were more likely to be items that looked like, sounded like, or were frequently associated with the sample item in space or in time), as well as hierarchical and conceptual categorizations. These error data, like those of humans, are a result of spontaneous responding rather than specific training and do not solely depend upon the sample mode (e.g., auditory similarity errors are not universally more frequent with an English sample, nor were visual similarity errors universally more frequent with a photograph sample). However, unlike humans, these bonobos do not make errors based on syntactical confusions (e.g., confusing semantically unrelated nouns), suggesting that they may not separate syntactical and semantic information. These data suggest that apes spontaneously create a complex, hierarchical web of representations when exposed to a symbol system. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.

18.
Many statistics packages print skewness and kurtosis statistics with estimates of their standard errors. The function most often used for the standard errors (e.g., in SPSS) assumes that the data are drawn from a normal distribution, an unlikely situation. Some textbooks suggest that if the statistic is more than about 2 standard errors from the hypothesized value (i.e., an approximate critical value from the t distribution for moderate or large sample sizes when α = 5%), the hypothesized value can be rejected. This is an inappropriate practice unless the standard error estimate is accurate and the sampling distribution is approximately normal. We show distributions where the traditional standard errors provided by the function underestimate the actual values, often being 5 times too small, and distributions where the function overestimates the true values. Bootstrap standard errors and confidence intervals are more accurate than the traditional approach, although still imperfect. The reasons for this are discussed. We recommend that if you are using skewness and kurtosis statistics based on the 3rd and 4th moments, bootstrapping should be used to calculate standard errors and confidence intervals rather than the traditional standard errors. Software for this article, written in the freeware R, provides these estimates.
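The recommendation is easy to follow in base R; a minimal sketch for skewness (the same pattern applies to kurtosis), not the article's accompanying software:

skew <- function(x) {
  m <- mean(x); s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3                        # moment-based skewness
}
set.seed(1)
x <- rexp(60)                                  # a clearly non-normal sample
boot_skew <- replicate(5000, skew(sample(x, replace = TRUE)))
c(estimate = skew(x), boot_se = sd(boot_skew),
  quantile(boot_skew, c(0.025, 0.975)))        # percentile bootstrap interval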

19.
Previous studies of semantic memory have overlooked an important distinction among so-called “property statements”. Statements with relative adjectives (e.g., Flamingos are big) imply a comparison to a standard or reference point associated with an immediate superordinate category (a flamingo is big for a bird), while the truth of statements with absolute adjectives (e.g., Flamingos are pink) is generally independent of such a standard. To examine the psychological consequences of this distinction, we asked subjects in Experiment 1 to verify sentences containing either relative or absolute adjectives embedded in either predicate-adjective (PA) constructions (e.g., A flamingo is big (pink)) or predicate-noun (PN) constructions (e.g., A flamingo is a big (pink) bird), where the predicate noun was the immediate superordinate. Reaction times (RTs) and errors for relative sentences decreased when the superordinate was specified, but remained constant for absolute sentences. These data also suggest that the truth value of relative sentences depends, not just on the superordinate, but also on a more global standard for everyday, human-oriented objects. Experiment 2 extends these results in showing that ratings of the truth of relative sentences are a function of the difference in size between an instance and its superordinate standard (e.g., between the size of a flamingo and that of an average bird) and the difference between the instance and the standard for everyday objects. Experiment 3 replicated these findings using reaction time as the dependent measure.

20.
Replication studies frequently fail to detect genuine effects because too few subjects are employed to yield an acceptable level of power. To remedy this situation, a method of sample size determination in replication attempts is described that uses information supplied by the original experiment to establish a distribution of probable effect sizes. The sample size to be employed is that which supplies an expected power of the desired amount over the distribution of probable effect sizes. The method may be used in replication attempts involving the comparison of means, the comparison of correlation coefficients, and the comparison of proportions. The widely available equation-solving program EUREKA provides a rapid means of executing the method on a microcomputer. Only ten lines are required to represent the method as a set of equations in EUREKA’s language. Such an equation file is readily modified, so that even inexperienced users find it a straightforward means of obtaining the sample size for a variety of designs.
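The core idea, averaging power over a distribution of plausible effect sizes rather than plugging in a single point estimate, can be sketched in a few lines of R for the two-group mean comparison; the approximate standard error of d and the normal distribution of plausible effects are simplifying assumptions, and EUREKA itself is not used here.

expected_power_n <- function(d_obs, n_orig, target = 0.80, reps = 5000) {
  se_d    <- sqrt(2 / n_orig + d_obs^2 / (4 * n_orig))    # approximate SE of the original d
  d_draws <- rnorm(reps, d_obs, se_d)                     # distribution of probable effect sizes
  for (n in 5:2000) {                                     # n per group in the replication
    df   <- 2 * n - 2
    crit <- qt(0.975, df)
    pw   <- pt(-crit, df, ncp = d_draws * sqrt(n / 2)) +  # two-sided t-test power
            pt(crit, df, ncp = d_draws * sqrt(n / 2), lower.tail = FALSE)
    if (mean(pw) >= target) return(n)
  }
  NA
}
set.seed(1)
expected_power_n(d_obs = 0.45, n_orig = 30)   # smallest per-group n with expected power of .80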
