Similar literature
Found 20 similar documents (search time: 31 ms)
1.
Neil Gourlay 《Psychometrika》1955,20(4):273-287
In an earlier paper, a method of analysis, due to Neyman and now known generally as variance component analysis, was used to examine F-test bias for experimental designs in education of the randomized block type. The same method is now applied to study F-test bias for designs of the Latin square type. The results, in general, disprove the view that, for a valid application of Latin square techniques, it is necessary that all interactions are zero.

2.
We derive the statistical power functions in multi‐site randomized trials with multiple treatments at each site, using multi‐level modelling. An F statistic is used to test multiple parameters in the multi‐level model instead of the Wald chi square test as suggested in the current literature. The F statistic is shown to be more conservative than the Wald statistic in testing any overall treatment effect among the multiple study conditions. In addition, we improvise an easy way to estimate the non‐centrality parameters for the means comparison t‐tests and the F test, using Helmert contrast coding in the multi‐level model. The variance of treatment means, which is difficult to fathom but necessary for power analysis, is decomposed into intuitive simple effect sizes in the contrast tests. The method is exemplified by a multi‐site evaluation study of the behavioural interventions for cannabis dependence.

3.
Neil Gourlay 《Psychometrika》1955,20(3):227-248
Reference is made to Neyman's study of F-test bias for the randomized blocks and Latin square designs employed in agriculture, and some account is given of later statistical developments which sprang from his work, in particular the classification of model-types and the technique of variance component analysis. It is claimed that there is a need to carry out an examination of F-test bias for experimental designs in education and psychology which will utilize the method and, where appropriate, the known results of this new branch of variance analysis. In the present paper, such an investigation is carried out for designs which may be regarded as derivatives of the agricultural randomized blocks design. In a paper to follow, a similar investigation will be carried out for experimental designs of the Latin square type.

4.
The specification of sample size is an important aspect of the planning of every experiment. When the investigator intends to use the techniques of analysis of variance in the study of treatment effects, he should, in specifying sample size, take into consideration the power of the F tests which will be made. The charts presented in this paper make possible a simple and direct estimate of the sample size required for F tests of specified power.

5.
In contrast to prospective power analysis, retrospective power analysis provides an estimate of the statistical power of a hypothesis test after an investigation has been conducted rather than before. In this article, three approaches to obtaining point estimates of power and an interval estimation algorithm are delineated. Previous research on the bias and sampling error of these estimates is briefly reviewed. Finally, an SAS macro that calculates the point and interval estimates is described. The macro was developed to estimate the power of an F test (obtained from analysis of variance, multiple regression analysis, or any of several multivariate analyses), but it may be easily adapted for use with other statistics, such as chi-square tests or t tests.

6.
I compared the randomization/permutation test and the F test for a two-cell comparative experiment. I varied (1) the number of observations per cell, (2) the size of the treatment effect, (3) the shape of the underlying distribution of error and, (4) for cases with skewed error, whether or not the skew was correlated with the treatment. With normal error, there was little difference between the tests. When error was skewed, by contrast, the randomization test was more sensitive than the F test, and if the amount of skew was correlated with the treatment, the advantage for the randomization test was both large and positively correlated with the treatment. I conclude that, because the randomization test was never less powerful than the F test, it should replace the F test in routine work.
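
For a two-cell comparison like the one described above, a randomization test can be sketched as follows. This is a minimal illustration, not the author's procedure: the function name `permutation_test`, the choice of the absolute difference of cell means as the test statistic, the number of reshuffles, and the seed are all assumptions made for the example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided randomization test on the difference of two cell means.

    Hypothetical helper for illustration only; the statistic and the
    Monte Carlo settings are assumptions, not taken from the abstract.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Reassign observations to the two cells at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    control = [4.1, 5.0, 3.8, 4.6, 5.2]
    treated = [5.9, 6.3, 5.1, 6.8, 6.0]
    print(permutation_test(control, treated))
```

Because the reference distribution is built from the data themselves, no normality assumption enters the p value, which is the property the comparison with the F test turns on.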

7.
When the purpose of the experiment is to compare treatments, the Sequences × Positions Latin Square has been employed to control unwanted effects attributable to individuals, position, and sequence. This particular Latin Square has been subjected to criticism on the grounds that there is confounding due to structure, random variables, and subject interactions. Special Latin Square, a subclass of the Sequences × Positions Latin Square, is basically a p × p factorial design in blocks of size p. The two factors are treatments (T) and positions (P). Sequence is one component of the TP interaction, and square uniqueness is the sum of the remaining components. This completely replicated factorial design has no structural or random variable confounding; if subject interactions are present, square uniqueness may be used as the error term and the bias in the test of treatments will be conservative.

8.
Hooker, Finkelman, and Schwartzman (Psychometrika, 2009, in press) defined a paradoxical result as the attainment of a higher test score by changing answers from correct to incorrect and demonstrated that such results are unavoidable for maximum likelihood estimates in multidimensional item response theory. The potential for these results to occur leads to the undesirable possibility of a subject’s best answer being detrimental to them. This paper considers the existence of paradoxical results in tests composed of item bundles when compensatory models are used. We demonstrate that paradoxical results can occur when bundle effects are modeled as nuisance parameters for each subject. However, when these nuisance parameters are modeled as random effects, or used in a Bayesian analysis, it is possible to design tests comprised of many short bundles that avoid paradoxical results and we provide an algorithm for doing so. We also examine alternative models for handling dependence between item bundles and show that using fixed dependency effects is always guaranteed to avoid paradoxical results.

9.
A great deal of educational and social data arises from cluster sampling designs where clusters involve schools, classrooms, or communities. A mistake that is sometimes encountered in the analysis of such data is to ignore the effect of clustering and analyse the data as if it were based on a simple random sample. This typically leads to an overstatement of the precision of results and too liberal conclusions about precision and statistical significance of mean differences. This paper gives simple corrections to the test statistics that would be computed in an analysis of variance if clustering were (incorrectly) ignored. The corrections are multiplicative factors depending on the total sample size, the cluster size, and the intraclass correlation structure. For example, the corrected F statistic has Fisher's F distribution with reduced degrees of freedom. The corrected statistic reduces to the F statistic computed by ignoring clustering when the intraclass correlations are zero. It reduces to the F statistic computed using cluster means when the intraclass correlations are unity, and it is in between otherwise. A similar adjustment to the usual statistic for testing a linear contrast among group means is described.

10.
Several theorems concerning properties of the communality of a test in the Thurstone multiple factor theory are established. The following theorems are applicable to a battery of n tests which are describable in terms of r common factors, with orthogonal reference vectors. 1. The communality of a test j is equal to the square of the multiple correlation of test j with the r reference vectors. 2. The communality of a test j is equal to the square of the multiple correlation of test j with the r reference vectors and the n − 1 remaining tests. Corollary: The square of the multiple correlation of a test j with the n − 1 remaining tests is equal to or less than the communality of test j. It cannot exceed the communality. 3. The square of the multiple correlation of a test j with the n − 1 remaining tests equals the communality of test j if the group of tests contains r statistically independent tests each with a communality of unity. 4. With correlation coefficients corrected for attenuation, when the number of tests increases indefinitely while the rank of the correlational matrix remains unchanged, the communality of a test j equals the square of the multiple correlation of test j with the n − 1 remaining tests. 5. With raw correlation coefficients, it is shown in a special case that the square of the multiple correlation of a test j with the n − 1 remaining tests approaches the communality of test j as a limit when the number of tests increases indefinitely while the rank of the correlational matrix remains the same. This has not yet been proved for the general case. The author wishes to express his appreciation of the encouragement and assistance given him by Dr. L. L. Thurstone.
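
The first theorem in the abstract above can be illustrated with a short derivation. The notation is assumed here and does not come from the abstract: z_j is the standardized score on test j, F_1, ..., F_r are mutually uncorrelated common factors with unit variance, U_j is a unique factor uncorrelated with every F_k, a_{jk} are the factor loadings, and h_j^2 is the communality.

```latex
\begin{align}
z_j &= \sum_{k=1}^{r} a_{jk} F_k + u_j U_j,
      \qquad \operatorname{corr}(F_k, F_l) = 0 \quad (k \neq l), \\
\operatorname{corr}(z_j, F_k) &= a_{jk}, \\
R^2_{j \cdot F_1 \dots F_r} &= \sum_{k=1}^{r} \operatorname{corr}(z_j, F_k)^2
      = \sum_{k=1}^{r} a_{jk}^2 = h_j^2 .
\end{align}
```

The middle step relies on the predictors being uncorrelated, in which case the squared multiple correlation is the sum of the squared simple correlations.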

11.
The variable-criteria sequential stopping rule (SSR) is a method for conducting planned experiments in stages after the addition of new subjects until the experiment is stopped because the p value is less than or equal to a lower criterion and the null hypothesis has been rejected, the p value is above an upper criterion, or a maximum sample size has been reached. Alpha is controlled at the expected level. The table of stopping criteria has been validated for a t test or ANOVA with four groups. New simulations in this article demonstrate that the SSR can be used with unequal sample sizes or heterogeneous variances in a t test. As with the usual t test, the use of a separate-variance term instead of a pooled-variance term prevents an inflation of alpha with heterogeneous variances. Simulations validate the original table of criteria for up to 20 groups without a drift of alpha. When used with a multigroup ANOVA, a planned contrast can be substituted for the global F as the focus for the stopping rule. The SSR is recommended when significance tests are appropriate and when the null hypothesis can be tested in stages. Because of its efficiency, the SSR should be used instead of the usual approach to the t test or ANOVA when subjects are expensive, rare, or limited by ethical considerations such as pain or distress.

12.
Huynh Huynh 《Psychometrika》1978,43(2):161-175
Four approximate tests are considered for repeated measurement designs in which observations are multivariate normal with arbitrary covariance matrices. In these tests traditional within-subject mean square ratios are compared with critical values derived from F distributions with adjusted degrees of freedom. Two of them, the approximate and the improved general approximate (IGA) tests, behave adequately in terms of Type I error. Generally, the IGA test functions better than the approximate test; however, the latter involves fewer computations. With regard to power, the IGA test may compete with one multivariate procedure when the assumptions of the latter are tenable. The author wishes to thank Garrett K. Mandeville for his careful reading of the final version of the paper.

13.
Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M, a, increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such that elevated Type I error rates occur when the sample size is small and the effect size of the nonzero path is medium or larger. Results from the second simulation found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y, b, was small. Two empirical mediation examples are provided using data from a steroid prevention and health promotion program aimed at high school football players (Athletes Training and Learning to Avoid Steroids; Goldberg et al., 1996), one to illustrate a possible Type I error for the bias-corrected bootstrap test and a second to illustrate a loss in power related to the size of a. Implications of these findings are discussed.

14.
Contrary to conventional educational testing, in so-called dynamic assessment subjects are allowed to consult help during testing or are offered prior training. The differential results of both testing procedures are sometimes ascribed to the idea that dynamic tests reflect the breadth of the zone of proximal development on top of independent achievement. Alternative explanations claim that conventional tests are more strongly biased towards various characteristics of persons, which have a negative influence on performance, when compared to dynamic tests. In this study, it was hypothesised that static as well as dynamic assessment is biased towards anxious tendencies of subjects, but the former more strongly than the latter. In order to investigate this supposition, the performance of subjects on dynamic and static tests was systematically compared and related to measures of test anxiety in a longitudinal experiment. In the experiment, repeated measures of independent mathematics achievement as well as mathematics learning potential were gathered among students of secondary education in the Netherlands. Prior to every mathematics test, subjects filled out a test anxiety questionnaire. After every mathematics test, subjects filled out a general state anxiety questionnaire. The participating subjects were students from secondary education, either preparing for higher vocational training or university, aged approximately 15 years on average.

The results of the experiment showed that lack of self-confidence is an important constituent factor of test anxiety, apart from worry and emotionality. The data supported the assumption that such testing procedures are less biased towards anxiety than conventional tests, but it was not established that dynamic testing procedures render results that are not biased by test anxious tendencies.

15.
Two cognitive biases might partially account for public support of the ineffective AMBER Alert system. Hindsight bias is a cognitive error in which people with outcome knowledge overestimate the likelihood that this particular outcome would occur; outcome bias is an error made in evaluating the quality of a decision once the outcome is known. Two experiments assessed whether hindsight and outcome bias occur in child abduction scenarios. Study 1 was a pre/posttest experiment that examined whether hindsight bias occurs in situations in which the identity of the abductor (stranger or parent) is manipulated between groups, and all participants are told the child was killed. Study 2, a between-subjects experiment, examined whether hindsight and outcome biases occur in situations in which no AMBER Alert was issued (because the situation did not meet the legal requirements to issue an Alert), and manipulated the identity of the abductor and the outcome (child safely returned, killed, or no outcome provided). Hindsight and outcome biases occurred in both studies, given the correct set of circumstances. Abductor identity also impacted outcome estimates. Results from the two studies indicate that hindsight and outcome bias occur, but this is dependent on the outcome (child killed, child returned safely, no outcome provided) and the identity of the abductor (stranger, dangerous parent, non-dangerous parent). Limitations and future directions are discussed.

16.
For 25 years psychologists have measured systematic measurement bias in terms of regression lines. According to this traditional approach a test is an unbiased predictor of a criterion for all subgroups if all subgroups have identical Y regression lines (i.e., identical slopes and identical Y intercepts). This paper shows that the traditional model is fundamentally incorrect and identical Y regression lines are not expected to occur with an unbiased test in a testing situation in which one group scores lower than another group on both the test and criterion. This is the case even if the test is perfectly reliable. The traditional model for measuring bias actually results in a consistent error or bias against groups which score lower than average on both the test and criterion. In practice this bias operates against minority groups. Tests now thought to be unbiased or even biased in favor of minority groups may in fact be biased against minority groups. A new model of test bias, which is based solely on measurement principles, is briefly introduced. In this model unbiased tests produce groups with identical test-criterion common-factor axes having a slope of S_YC/S_XC and with each axis intersecting the group centroids.

17.
The validity conditions for univariate repeated measures designs are described. Attention is focused on the sphericity requirement. For a v degree of freedom family of comparisons among the repeated measures, sphericity exists when all contrasts contained in the v dimensional space have equal variances. Under nonsphericity, upper and lower bounds on test size and power of a priori, repeated measures, F tests are derived. The effects of nonsphericity are illustrated by means of a set of charts. The charts reveal that small departures from sphericity (.97 <1.00) can seriously affect test size and power. It is recommended that separate rather than pooled error term procedures be routinely used to test a priori hypotheses. Appreciation is extended to Milton Parnes for his insightful assistance.

18.
When the process of publication favors studies with small p-values, and hence large effect estimates, combined estimates from many studies may be biased. This paper describes a model for estimation of effect size when there is selection based on one-tailed p-values. The model employs the method of maximum likelihood in the context of a mixed (fixed and random) effects general linear model for effect sizes. It offers a test for the presence of publication bias, and corrected estimates of the parameters of the linear model for effect magnitude. The model is illustrated using a well-known data set on the benefits of psychotherapy. Authors' note: The contributions of the authors are considered equal, and the order of authorship was chosen to be reverse-alphabetical.

19.
An experiment with rats compared the ability of fixed and variable duration cues to produce blocking. Rats in group B (Blocking) were trained that both fixed- (F) and variable- (V) duration cues would be followed by food delivery. In a subsequent training stage F and V continued to be reinforced, but F was accompanied by X, and V by Y. In the test phase responding to X and Y was examined. Control group O (Overshadowing) received identical treatment, except that F and V were nonreinforced in the first training stage. In group B there was evidence for blocking, but only of X, which had been conditioned in compound with the fixed-duration F; there was no evidence for blocking of Y, which had been conditioned in compound with the variable-duration V. It is suggested that this result may occur because fixed cues reach a higher, more stable asymptote of associative strength than do their variable equivalents.

20.
In the application of the analysis of variance to data obtained in educational methods experiments which involve several classes of several schools, one assumption is that of homogeneity in the variances of pupil scores from school to school. It is shown that such variances on representative educational achievement tests are heterogeneous. The effects of this heterogeneity upon the F-tests of significance commonly employed in methods experiments are investigated by comparing the actual distribution of F values for a large number of experiments involving marked heterogeneity with a theoretical distribution based on the assumption of homogeneity. Although the findings, which vary somewhat with the type of variance ratio, are not entirely conclusive, they apparently demonstrate that departure from homogeneity does not invalidate the use of the customary F-tests for evaluating results of the typical methods experiment.
