期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Testing hypotheses involving Cronbach's alpha using marginal models

Renske E. Kuijpers L. Andries van der Ark Marcel A. Croon 《The British journal of mathematical and statistical psychology》2013,66(3):503-520

We discuss the statistical testing of three relevant hypotheses involving Cronbach's alpha: one where alpha equals a particular criterion; a second testing the equality of two alpha coefficients for independent samples; and a third testing the equality of two alpha coefficients for dependent samples. For each of these hypotheses, various statistical tests have been proposed. Over the years, these tests have depended on progressively fewer assumptions. We propose a new approach to testing the three hypotheses that relies on even fewer assumptions, is especially suited for discrete item scores, and can be applied easily to tests containing large numbers of items. The new approach uses marginal modelling. We compared the Type I error rate and the power of the marginal modelling approach to several of the available tests in a simulation study using realistic conditions. We found that the marginal modelling approach had the most accurate Type I error rates, whereas the power was similar across the statistical tests. 相似文献

2.

SAS procedures for approximate randomization tests

Ru San Chen William P. Dunlap 《Behavior research methods》1993,25(3):406-409

Approximate randomization tests are alternatives to conventional parametric statistical methods used when the normality and homoscedasticity assumptions are violated This article presents an SAS program that tests the equality of two means using an approximate randomization test This program can serve as a template for testing other hypotheses, which is illustrated by modifications to test the significance of a correlation coefficient or the equality of more than two means. 相似文献

3.

Simultaneous testing of McNemar's problem for several populations

M. A. Hamdan W. R. Pirie J. C. Arnold 《Psychometrika》1975,40(2):153-161

McNemar's problem concerns the hypothesis of equal probabilities for the unlike pairs of correlated binary variables. We consider four different extensions to this problem, each for testing simultaneous equality of proportions of unlike pairs inc independent populations of correlated binary variables, but each under different assumptions and/or additional hypotheses. For each extension both the likelihood ratio test and the goodness-of-fit chi-square test are given. Whenc=1, all cases reduce to McNemar's problem. Forc ≥ 2, however, the tests are quite different, depending on exactly how the hypothesis and alternatives of McNemar are extended. An example illustrates how widely the results may differ, depending on which extended framework is appropriate. 相似文献

4.

Tests for equality of several alpha coefficients when their sample estimates are dependent

David J. Woodruff Leonard S. Feldt 《Psychometrika》1986,51(3):393-413

In a variety of measurement situations, the researcher may wish to compare the reliabilities of several instruments administered to the same sample of subjects. This paper presents eleven statistical procedures which test the equality ofm coefficient alphas when the sample alpha coefficients are dependent. Several of the procedures are derived in detail, and numerical examples are given for two. Since all of the procedures depend on approximate asymptotic results, Monte Carlo methods are used to assess the accuracy of the procedures for sample sizes of 50, 100, and 200. Both control of Type I error and power are evaluated by computer simulation. Two of the procedures are unable to control Type I errors satisfactorily. The remaining nine procedures perform properly, but three are somewhat superior in power and Type I error control.A more detailed version of this paper is also available. 相似文献

5.

Properties of hypothesis testing techniques and (Bayesian) model selection for exploration‐based and theory‐based (order‐restricted) hypotheses

下载免费PDF全文

Rebecca M. Kuiper Tim Nederhoff Irene Klugkist 《The British journal of mathematical and statistical psychology》2015,68(2):220-245

In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration‐based set of hypotheses containing equality constraints on the means, or a theory‐based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory‐based hypotheses) has advantages over exploration (i.e., examining all possible equality‐constrained hypotheses). Furthermore, examining reasonable order‐restricted hypotheses has more power to detect the true effect/non‐null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory‐based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number). 相似文献

6.

Bayes factor approaches for testing interval null hypotheses 总被引：1，自引：0，他引：1

Morey RD Rouder JN 《心理学方法》2011,16(4):406-419

Psychological theories are statements of constraint. The role of hypothesis testing in psychology is to test whether specific theoretical constraints hold in data. Bayesian statistics is well suited to the task of finding supporting evidence for constraint, because it allows for comparing evidence for 2 hypotheses against each another. One issue in hypothesis testing is that constraints may hold only approximately rather than exactly, and the reason for small deviations may be trivial or uninteresting. In the large-sample limit, these uninteresting, small deviations lead to the rejection of a useful constraint. In this article, we develop several Bayes factor 1-sample tests for the assessment of approximate equality and ordinal constraints. In these tests, the null hypothesis covers a small interval of non-0 but negligible effect sizes around 0. These Bayes factors are alternatives to previously developed Bayes factors, which do not allow for interval null hypotheses, and may especially prove useful to researchers who use statistical equivalence testing. To facilitate adoption of these Bayes factor tests, we provide easy-to-use software. 相似文献

7.

Asymptotically distribution-free (ADF) interval estimation of coefficient alpha

Maydeu-Olivares A Coffman DL Hartmann WM 《心理学方法》2007,12(2):157-176

The point estimate of sample coefficient alpha may provide a misleading impression of the reliability of the test score. Because sample coefficient alpha is consistently biased downward, it is more likely to yield a misleading impression of poor reliability. The magnitude of the bias is greatest precisely when the variability of sample alpha is greatest (small population reliability and small sample size). Taking into account the variability of sample alpha with an interval estimator may lead to retaining reliable tests that would be otherwise rejected. Here, the authors performed simulation studies to investigate the behavior of asymptotically distribution-free (ADF) versus normal-theory interval estimators of coefficient alpha under varied conditions. Normal-theory intervals were found to be less accurate when item skewness >1 or excess kurtosis >1. For sample sizes over 100 observations, ADF intervals are preferable, regardless of item skewness and kurtosis. A formula for computing ADF confidence intervals for coefficient alpha for tests of any size is provided, along with its implementation as an SAS macro. 相似文献

8.

Resampling-Based Inference Methods for Comparing Two Coefficients Alpha

Markus Pauly Maria Umlauft Ali Ünlü 《Psychometrika》2018,83(1):203-222

The two-sample problem for Cronbach’s coefficient \(\alpha _C\), as an estimate of test or composite score reliability, has attracted little attention compared to the extensive treatment of the one-sample case. It is necessary to compare the reliability of a test for different subgroups, for different tests or the short and long forms of a test. In this paper, we study statistical procedures of comparing two coefficients \(\alpha _{C,1}\) and \(\alpha _{C,2}\). The null hypothesis of interest is \(H_0 : \alpha _{C,1} = \alpha _{C,2}\), which we test against one-or two-sided alternatives. For this purpose, resampling-based permutation and bootstrap tests are proposed for two-group multivariate non-normal models under the general asymptotically distribution-free (ADF) setting. These statistical tests ensure a better control of the type-I error, in finite or very small sample sizes, when the state-of-affairs ADF large-sample test may fail to properly attain the nominal significance level. By proper choice of a studentized test statistic, the resampling tests are modified in order to be valid asymptotically even in non-exchangeable data frameworks. Moreover, extensions of this approach to other designs and reliability measures are discussed as well. Finally, the usefulness of the proposed resampling-based testing strategies is demonstrated in an extensive simulation study and illustrated by real data applications. 相似文献

9.

Testing equivalence with repeated measures: tests of the difference model of two-alternative forced-choice performance

García-Pérez MA Alcalá-Quintana R 《The Spanish journal of psychology》2011,14(2):1023-1049

Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope in case that regression is forced through the origin). This paper shows that this approach renders highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and in subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment with which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradict the conclusions raised through inadequate application of regression analyses. 相似文献

10.

Statistical criteria for parallel tests: A comparison of accuracy and power

Miguel A. García-Pérez 《Behavior research methods》2013,45(4):999-1010

Parallel tests are needed so that alternate forms can be applied to different groups or on different occasions, but also in the context of split-half reliability estimation for a given test. Statistically, parallelism holds beyond reasonable doubt when the null hypotheses of equality of observed means and variances across the two forms (or halves) are not rejected. Several statistical tests have been proposed for this purpose, but their performance has never been compared. This study assessed the relative performance (type I error rate and power) of the Student–Pitman–Morgan, Bradley–Blackwood, and Wilks tests of equality of means and variances in the typical conditions surrounding studies of parallelism—namely, integer-valued and bounded test scores with distributions that may not be bivariate normal. The results advise against the use of the Wilks test and support the use of the Bradley–Blackwood test because of its simplicity and its minimally better performance in comparison with the more cumbersome Student–Pitman–Morgan test. 相似文献

11.

Multiplicity,directional (type III) errors,and the null hypothesis

Shaffer JP 《心理学方法》2002,7(3):356-369

L. V. Jones and J. W. Tukey (2000) pointed out that the usual 2-sided, equal-tails null hypothesis test at level alpha can be reinterpreted as simultaneous tests of 2 directional inequality hypotheses, each at level alpha/2, and that the maximum probability of a Type I error is alpha/2 if the truth of the null hypothesis is considered impossible. This article points out that in multiple testing with familywise error rate controlled at alpha, the directional error rate (assuming all null hypotheses are false) is greater than alpha/2 and can be arbitrarily close to alpha. Single-step, step-down, and step-up procedures are analyzed, and other error rates, including the false discovery rate, are discussed. Implications for confidence interval estimation and hypothesis testing practices are considered. 相似文献

12.

Seeking Confirmation Is Rational for Deterministic Hypotheses

Joseph L. Austerweil Thomas L. Griffiths 《Cognitive Science》2011,35(3):499-526

The tendency to test outcomes that are predicted by our current theory (the confirmation bias) is one of the best‐known biases of human decision making. We prove that the confirmation bias is an optimal strategy for testing hypotheses when those hypotheses are deterministic, each making a single prediction about the next event in a sequence. Our proof applies for two normative standards commonly used for evaluating hypothesis testing: maximizing expected information gain and maximizing the probability of falsifying the current hypothesis. This analysis rests on two assumptions: (a) that people predict the next event in a sequence in a way that is consistent with Bayesian inference; and (b) when testing hypotheses, people test the hypothesis to which they assign highest posterior probability. We present four behavioral experiments that support these assumptions, showing that a simple Bayesian model can capture people's predictions about numerical sequences (Experiments 1 and 2), and that we can alter the hypotheses that people choose to test by manipulating the prior probability of those hypotheses (Experiments 3 and 4). 相似文献

13.

Effects of training with added difficulties on RADAR detection

Michael D. Young Alice F. Healy Cleotilde Gonzalez Varun Dutt Lyle E. Bourne Jr. 《Applied cognitive psychology》2011,25(3):395-407

Three experiments simulating military RADAR detection addressed a training difficulty hypothesis (training with difficulty promotes superior later testing performance) and a procedural reinstatement hypothesis (test performance improves when training conditions match test conditions). Training and testing were separated by 1 week. Participants detected targets (either alphanumeric characters or vehicle pictures) occurring among distractors. Two secondary tasks were used to increase difficulty (a concurrent, irrelevant tone‐counting task and a sequential, relevant action‐firing response). In Experiment 1 , involving alphanumeric targets with rapid displays, tone counting during training degraded test performance. In Experiment 2, involving vehicle targets with both sources of difficulty and slower presentation times, training under relevant difficulty aided test accuracy. In Experiment 3, involving vehicle targets and action firing with slow presentation times, test accuracy tended to be worst when neither training nor testing involved difficult conditions. These results show boundary conditions for the training difficulty and procedural reinstatement hypotheses. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

14.

Robust step‐down tests for multivariate independent group designs

《The British journal of mathematical and statistical psychology》2007,60(2):245-265

A composite step‐down procedure, in which a set of step‐down tests are summarized collectively with Fisher's combination statistic, was considered to test for multivariate mean equality in two‐group designs. An approximate degrees of freedom (ADF) composite procedure based on trimmed/Winsorized estimators and a non‐pooled estimate of error variance is proposed, and compared to a composite procedure based on trimmed/Winsorized estimators and a pooled estimate of error variance. The step‐down procedures were also compared to Hotelling's T² and Johansen's ADF global procedure based on trimmed estimators in a simulation study. Type I error rates of the pooled step‐down procedure were sensitive to covariance heterogeneity in unbalanced designs; error rates were similar to those of Hotelling's T² across all of the investigated conditions. Type I error rates of the ADF composite step‐down procedure were insensitive to covariance heterogeneity and less sensitive to the number of dependent variables when sample size was small than error rates of Johansen's test. The ADF composite step‐down procedure is recommended for testing hypotheses of mean equality in two‐group designs except when the data are sampled from populations with different degrees of multivariate skewness. 相似文献

15.

A replication of Rorschach and MMPI-2 convergent validity 总被引：1，自引：0，他引：1

Meyer GJ Riethmiller RJ Brooks RD Benoit WA Handler L 《Journal of personality assessment》2000,74(2):175-215

We replicated prior research on Rorschach and MMPI-2 convergent validity by testing 8 hypotheses in a new sample of patients. We also extended prior research by developing criteria to include more patients and by applying the same procedures to 2 self-report tests: the MMPI-2 and the MCMI-II. Results supported our hypotheses and paralleled the prior findings. Furthermore, 3 different tests for methodological artifacts could not account for the results. Thus, the convergence of Rorschach and MMPI-2 constructs seems to be partially a function of how patients interact with the tests. When patients approach each test with a similar style, conceptually aligned constructs tend to correlate. Although this result is less robust, when patients approach each test in an opposing manner, conceptually aligned constructs tend to be negatively correlated. When test interaction styles are ignored, MMPI-2 and Rorschach constructs tend to be uncorrelated, unless a sample just happens to possess a correlation between Rorschach and MMPI-2 stylistic variables. Remaining ambiguities and suggestions for further advances are discussed. 相似文献

16.

Nonparametric tests for equality of psychometric functions

Miguel A. García-Pérez Vicente Núñez-Antón 《Behavior research methods》2018,50(6):2226-2255

Many empirical studies measure psychometric functions (curves describing how observers’ performance varies with stimulus magnitude) because these functions capture the effects of experimental conditions. To assess these effects, parametric curves are often fitted to the data and comparisons are carried out by testing for equality of mean parameter estimates across conditions. This approach is parametric and, thus, vulnerable to violations of the implied assumptions. Furthermore, testing for equality of means of parameters may be misleading: Psychometric functions may vary meaningfully across conditions on an observer-by-observer basis with no effect on the mean values of the estimated parameters. Alternative approaches to assess equality of psychometric functions per se are thus needed. This paper compares three nonparametric tests that are applicable in all situations of interest: The existing generalized Mantel–Haenszel test, a generalization of the Berry–Mielke test that was developed here, and a split variant of the generalized Mantel–Haenszel test also developed here. Their statistical properties (accuracy and power) are studied via simulation and the results show that all tests are indistinguishable as to accuracy but they differ non-uniformly as to power. Empirical use of the tests is illustrated via analyses of published data sets and practical recommendations are given. The computer code in matlab and R to conduct these tests is available as Electronic Supplemental Material. 相似文献

17.

Testing two variances for superiority/non-inferiority and equivalence: Using the exhaustion algorithm for sample size allocation with cost

Jiin-huarng Guo Wei-ming Luh 《The British journal of mathematical and statistical psychology》2020,73(2):316-332

The equality of two group variances is frequently tested in experiments. However, criticisms of null hypothesis statistical testing on means have recently arisen and there is interest in other types of statistical tests of hypotheses, such as superiority/non-inferiority and equivalence. Although these tests have become more common in psychology and social sciences, the corresponding sample size estimation for these tests is rarely discussed, especially when the sampling unit costs are unequal or group sizes are unequal for two groups. Thus, for finding optimal sample size, the present study derived an initial allocation by approximating the percentiles of an F distribution with the percentiles of the standard normal distribution and used the exhaustion algorithm to select the best combination of group sizes, thereby ensuring the resulting power reaches the designated level and is maximal with a minimal total cost. In this manner, optimization of sample size planning is achieved. The proposed sample size determination has a wide range of applications and is efficient in terms of Type I errors and statistical power in simulations. Finally, an illustrative example from a report by the Health Survey for England, 1995–1997, is presented using hypertension data. For ease of application, four R Shiny apps are provided and benchmarks for setting equivalence margins are suggested. 相似文献

18.

Automatic Bayes Factors for Testing Equality- and Inequality-Constrained Hypotheses on Variances

Florian Böing-Messing Joris Mulder 《Psychometrika》2018,83(3):586-617

In comparing characteristics of independent populations, researchers frequently expect a certain structure of the population variances. These expectations can be formulated as hypotheses with equality and/or inequality constraints on the variances. In this article, we consider the Bayes factor for testing such (in)equality-constrained hypotheses on variances. Application of Bayes factors requires specification of a prior under every hypothesis to be tested. However, specifying subjective priors for variances based on prior information is a difficult task. We therefore consider so-called automatic or default Bayes factors. These methods avoid the need for the user to specify priors by using information from the sample data. We present three automatic Bayes factors for testing variances. The first is a Bayes factor with equal priors on all variances, where the priors are specified automatically using a small share of the information in the sample data. The second is the fractional Bayes factor, where a fraction of the likelihood is used for automatic prior specification. The third is an adjustment of the fractional Bayes factor such that the parsimony of inequality-constrained hypotheses is properly taken into account. The Bayes factors are evaluated by investigating different properties such as information consistency and large sample consistency. Based on this evaluation, it is concluded that the adjusted fractional Bayes factor is generally recommendable for testing equality- and inequality-constrained hypotheses on variances. 相似文献

19.

Answering two criticisms of hypothesis testing: a comment

Serlin RC 《Psychological reports》2000,87(2):579-581

In a recent article, Leventhal (1999) responds to two criticisms of hypothesis testing by showing that the one-tailed test and the directional two-tailed test are valid, even if all point null hypotheses are false and that hypothesis tests can provide the probability of decisions being correct which are based on the tests. Unfortunately, the falseness of all point null hypotheses affects the operating characteristics of the directional two-tailed test, seeming to weaken certain of Leventhal's arguments in favor of this procedure. 相似文献

20.

A test of the hypothesis that Cronbach's alpha reliability coefficient is the same for two tests administered to the same sample

Leonard S. Feldt 《Psychometrika》1980,45(1):99-105

In measurement studies the researcher may wish to test the hypothesis that Cronbach's alpha reliability coefficient is the same for two measurement procedures. A statistical test exists for independent samples of subjects. In this paper three procedures are developed for the situation in which the coefficients are determined from the same sample. All three procedures are computationally simple and give tight control of Type I error when the sample size is 50 or greater.The author is indebted to Jerry S. Gilmer for development of the computer programs used in this study. 相似文献