Similar Literature
20 similar documents found.
1.
Consider a multivariate context with p variates and k independent samples, each of size n. To test equality of the k population covariance matrices, the likelihood ratio test is commonly employed. Box's F-approximation to the null distribution of the test statistic can be used to compute p-values, if sample sizes are not too small. It is suggested to regard the F-approximation as accurate if the sample sizes n satisfy n ≥ 1 + 0.0613p^2 + 2.7265p − 1.4182p^0.5 + 0.235p^1.4·ln(k), for 5 ≤ p ≤ 30 and k ≤ 20. This research was supported by the Deutsche Forschungsgemeinschaft through Ste 405/2-1.
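As a quick check, the sample-size threshold above can be evaluated directly. This is a minimal sketch assuming the reconstructed rule n ≥ 1 + 0.0613p^2 + 2.7265p − 1.4182p^0.5 + 0.235p^1.4·ln(k); the function name is ours:

```python
import math

def box_min_sample_size(p: int, k: int) -> float:
    """Minimum per-sample size n for Box's F-approximation to be regarded
    as accurate (rule stated for 5 <= p <= 30 and k <= 20)."""
    return (1 + 0.0613 * p**2 + 2.7265 * p
            - 1.4182 * p**0.5
            + 0.235 * p**1.4 * math.log(k))

# e.g. p = 5 variates, k = 2 samples: threshold is about 14.5,
# so samples of size n >= 15 satisfy the rule
n_min = box_min_sample_size(5, 2)
```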

2.
We present a simple approximation to the conditional distribution of goodness-of-fit statistics for the Rasch model, assuming that the item difficulties are known. The approximation is easily programmed, and gives relatively accurate assessments of conditional p-values for tests of length 10 or more. A few generalizations are discussed.

3.
When the process of publication favors studies with small p-values, and hence large effect estimates, combined estimates from many studies may be biased. This paper describes a model for estimation of effect size when there is selection based on one-tailed p-values. The model employs the method of maximum likelihood in the context of a mixed (fixed and random) effects general linear model for effect sizes. It offers a test for the presence of publication bias, and corrected estimates of the parameters of the linear model for effect magnitude. The model is illustrated using a well-known data set on the benefits of psychotherapy. Authors' note: The contributions of the authors are considered equal, and the order of authorship was chosen to be reverse-alphabetical.

4.
5.
Null hypothesis significance testing is criticised for emphatically focusing on using the appropriate statistic for the data and an overwhelming concern with low p-values. Here, we present a new technique, Observation Oriented Modeling (OOM), as an alternative to traditional techniques in the social sciences. Ten experiments on judgements of associative memory (JAM) were analysed with OOM to show data analysis procedures and the consistency of JAM results across several types of experimental manipulations. In a typical JAM task, participants are asked to rate the frequency of word pairings, such as LOST-FOUND; these ratings are then compared to actual normed associative frequencies to measure how accurately participants can judge word pairs. Three types of JAM tasks are outlined (traditional, paired, and instructional manipulations) to demonstrate how modelling complex hypotheses can be applied through OOM to this type of data that would be conventionally analysed with null hypothesis significance testing.

6.
J. V. Howard, Erkenntnis, 2009, 70(2): 253-270
A pure significance test would check the agreement of a statistical model with the observed data even when no alternative model was available. The paper proposes the use of a modified p-value to make such a test. The model will be rejected if something surprising is observed (relative to what else might have been observed). It is shown that the relation between this measure of surprise (the s-value) and the surprise indices of Weaver and Good is similar to the relationship between a p-value, a corresponding odds-ratio, and a logit or log-odds statistic. The s-value is always larger than the corresponding p-value, and is not uniformly distributed. Difficulties with the whole approach are discussed.

7.
In practice, the sum of the item scores is often used as a basis for comparing subjects. For items that have more than two ordered score categories, only the partial credit model (PCM) and special cases of this model imply that the subjects are stochastically ordered on the common latent variable. However, the PCM is very restrictive with respect to the constraints that it imposes on the data. In this paper, sufficient conditions for the stochastic ordering of subjects by their sum score are obtained. These conditions define the isotonic (nonparametric) PCM. The isotonic PCM is more flexible than the PCM, which makes it useful for a wider variety of tests. Also, observable properties of the isotonic PCM are derived in the form of inequality constraints. It is shown how to obtain estimates of the score distribution under these constraints by using the Gibbs sampling algorithm. A small simulation study shows that the Bayesian p-values based on the log-likelihood ratio statistic can be used to assess the fit of the isotonic PCM to the data, where model-data fit can be taken as a justification of the use of the sum score to order subjects.

8.
Previous studies have concluded that cognitive ability tests are not predictively biased against Hispanic American job applicants because test scores generally overpredict, rather than underpredict, their job performance. However, we highlight two important shortcomings of these past studies and use meta-analytic and computation modeling techniques to address these two shortcomings. In Study 1, an updated meta-analysis of the Hispanic–White mean difference (d-value) on job performance was carried out. In Study 2, computation modeling was used to correct the Study 1 d-values for indirect range restriction and combine them with other meta-analytic parameters relevant to predictive bias to determine how often cognitive ability test scores underpredict Hispanic applicants’ job performance. Hispanic applicants’ job performance was underpredicted by a small to moderate amount in most conditions of the computation model. In contrast to previous studies, this suggests cognitive ability tests can be expected to exhibit predictive bias against Hispanic applicants much of the time. However, some conditions did not exhibit underprediction, highlighting that predictive bias depends on various selection system parameters, such as the criterion-related validity of cognitive ability tests and other predictors used in selection. Regardless, our results challenge “lack of predictive bias” as a rationale for supporting test use.

9.
Mean comparisons are of great importance in the application of statistics. Procedures for mean comparison with manifest variables have been well studied. However, few rigorous studies have been conducted on mean comparisons with latent variables, although the methodology has been widely used and documented. This paper studies the commonly used statistics in latent variable mean modeling and compares them with parallel manifest variable statistics. Our results indicate that, under certain conditions, the likelihood ratio and Wald statistics used for latent mean comparisons do not always have greater power than the Hotelling T2 statistics used for manifest mean comparisons. The noncentrality parameter corresponding to the T2 statistic can be much greater than those corresponding to the likelihood ratio and Wald statistics, which we find to be different from those provided in the literature. Under a fixed alternative hypothesis, our results also indicate that the likelihood ratio statistic can be stochastically much greater than the corresponding Wald statistic. The robustness property of each statistic is also explored when the model is misspecified or when data are nonnormally distributed. Recommendations and advice are provided for the use of each statistic. The research was supported by NSF grant DMS-0437167 and Grant DA01070 from the National Institute on Drug Abuse. We would like to thank three referees for suggestions that helped in improving the paper.

10.
The problem of comparing the agreement of two n × n matrices has a variety of applications in experimental psychology. A well-known index of agreement is based on the sum of the element-wise products of the matrices. Although less familiar to many researchers, measures of agreement based on within-row and/or within-column gradients can also be useful. We provide a suite of MATLAB programs for computing agreement indices and performing matrix permutation tests of those indices. Programs for computing exact p-values are available for small matrices, whereas resampling programs for approximate p-values are provided for larger matrices.
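For small matrices, the exact permutation test described above can be sketched in a few lines. This is a stdlib-only illustration, not the authors' MATLAB suite; the raw element-wise-product index is the one defined in the abstract:

```python
import itertools

def agreement(a, b):
    """Agreement index: sum of element-wise products of two square matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))

def exact_permutation_pvalue(a, b):
    """Exact one-sided p-value: the fraction of simultaneous row/column
    relabellings of b whose agreement with a reaches the observed value."""
    n = len(a)
    observed = agreement(a, b)
    hits = total = 0
    for perm in itertools.permutations(range(n)):
        # permute rows and columns of b together (relabel the objects)
        permuted = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        total += 1
        if agreement(a, permuted) >= observed:
            hits += 1
    return hits / total
```

For larger matrices one would sample random permutations instead of enumerating all n! of them, matching the abstract's exact-versus-resampling distinction.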

11.
A crisis of confidence in research findings in consumer psychology and other academic disciplines has led to various proposals to abandon, replace, strengthen, or supplement the null hypothesis significance testing paradigm. The proliferation of such proposals, and their often-conflicting recommendations, can increase confusion among researchers. We aim to bring some clarity by proposing five simple principles for the new era of data analysis and reporting of research in consumer psychology. We avoid adding to researchers' confusion or proposing more onerous or rigid standards. Our goal is to offer straightforward practical principles that are easy for researchers to keep in mind while analyzing their data and reporting their findings. These principles involve (1) interpreting p-values as continuous measures of the strength of evidence, (2) being aware of assumptions that determine whether one can rely on p-values, (3) using theory to establish the applicability of findings to new settings, (4) employing multiple measures of evidence and various processes to obtain them, but assigning special privilege to none, and (5) reporting procedures and findings transparently and completely. We hope that these principles provide researchers with some guidance and help to strengthen the reliability of the conclusions derived from their data, analyses, and findings.

12.
Effects of behaviour change on cognitions are rarely examined within the Theory of Planned Behaviour. We tested whether increases in physical activity resulted in more positive beliefs about further change among a cohort of sedentary adults participating in a behavioural intervention trial (ProActive). At baseline, 6 and 12 months, 365 adults completed questionnaires assessing physical activity and cognitions about becoming more active over the coming year. Objective activity was assessed at baseline and 12 months. Participants reporting larger increases in activity were no more positive about making further increases than those reporting less behaviour change (p-values > 0.05). Participants with larger increases in objective activity reported weaker perceived control (β = −0.342, p = 0.001) and more negative instrumental attitudes (β = −0.230, p = 0.017) at 12 months. Participants may have felt that they had changed enough, or measures of perceived success may be more sensitive to behaviour change. Alternatively, long measurement intervals may have missed immediate cognitive and affective consequences of behaviour change, or such effects may require participants to consistently self-monitor or receive feedback on performance. Future studies could test the effect of such techniques on physical activity and a wider range of cognitive, affective and physiological consequences, using more frequent measurement intervals.

13.
Survey data often contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. With typical nonnormally distributed data in practice, a rescaled statistic Trml proposed by Satorra and Bentler was recommended in the literature of SEM. However, Trml has been shown to be problematic when the sample size N is small and/or the number of variables p is large. There does not exist a reliable test statistic for SEM with small N or large p, especially with nonnormally distributed data. Following the principle of Bartlett correction, this article develops empirical corrections to Trml so that the mean of the empirically corrected statistics approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that the empirically corrected statistics control type I errors reasonably well even when N is smaller than 2p, where Trml may reject the correct model 100% of the time even for normally distributed data. The application of the empirically corrected statistics is illustrated via a real data example.
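The Bartlett-correction principle (rescale the statistic so that the mean of corrected null statistics matches the nominal chi-square degrees of freedom) can be illustrated schematically. This is our sketch of the general idea, not the article's fitted empirical corrections for Trml:

```python
def empirical_bartlett_correction(t_observed, t_null_sims, df):
    """Rescale a test statistic by c = df / mean(null simulations), so
    that corrected null statistics have mean equal to the nominal df."""
    mean_null = sum(t_null_sims) / len(t_null_sims)
    c = df / mean_null
    return c * t_observed

# Null simulations averaging 12 with nominal df = 6 give c = 0.5,
# deflating an observed statistic of 20 down to 10.
corrected = empirical_bartlett_correction(20.0, [10.0, 14.0, 12.0], 6)
```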

14.
Researchers are strongly encouraged to accompany the results of statistical tests with appropriate estimates of effect size. For 2-group comparisons, a probability-based effect size estimator (A) has many appealing properties (e.g., it is easy to understand, robust to violations of parametric assumptions, insensitive to outliers). We review generalizations of the A statistic to extend its use to applications with discrete data, with weighted data, with k > 2 groups, and with correlated samples. These generalizations are illustrated through reanalyses of data from published studies on sex differences in the acceptance of hypothetical offers of casual sex and in scores on a measure of economic enlightenment, on age differences in reported levels of Authentic Pride, and in differences between the numbers of promises made and kept in romantic relationships. Drawing from research on the construction of confidence intervals for the A statistic, we recommend a bootstrap method that can be used for each generalization of A. We provide a suite of programs that should make it easy to use the A statistic and accompany it with a confidence interval in a wide variety of research contexts.
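The core two-group A statistic, P(X > Y) + 0.5·P(X = Y), and a percentile-bootstrap interval can be sketched as follows. The function names and bootstrap settings are ours; the article reviews several interval constructions, and this shows only the basic percentile approach:

```python
import random

def a_statistic(x, y):
    """Probability-based effect size: P(X > Y) + 0.5 * P(X == Y)."""
    greater = sum(1 for xi in x for yi in y if xi > yi)
    ties = sum(1 for xi in x for yi in y if xi == yi)
    return (greater + 0.5 * ties) / (len(x) * len(y))

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the A statistic."""
    rng = random.Random(seed)
    stats = sorted(
        a_statistic([rng.choice(x) for _ in x], [rng.choice(y) for _ in y])
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

A = 0.5 indicates no group separation; A = 1 means every x exceeds every y.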

15.
Two statistics, one recent and one well known, are shown to be equivalent. The recent statistic, prep, gives the probability that the sign of an experimental effect is replicable by an experiment of equal power. That statistic is equivalent to the well‐known measure for the area under a receiver operating characteristic (ROC) curve for statistical power against significance level. Both statistics can be seen as exemplifying the area theorem of psychophysics.  
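prep is commonly computed from a one-tailed p-value as Φ(Φ⁻¹(1 − p)/√2). A minimal sketch using Python's standard library; this standard formula is an assumption on our part, not taken from the article above:

```python
import math
from statistics import NormalDist

def p_rep(p_one_tailed: float) -> float:
    """Killeen's probability of replication, computed from a one-tailed
    p-value via the standard normal cdf and quantile function."""
    z = NormalDist().inv_cdf(1 - p_one_tailed)
    return NormalDist().cdf(z / math.sqrt(2))

# p = .05 one-tailed corresponds to prep of roughly .88
```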

16.
Relationship researchers regularly gather data from both members of the dyad, and these two scores are likely to be correlated. This nonindependence of observations can bias p values in significance testing if person is the unit in the statistical analysis. A method for determining how much bias results from dyadic interdependence is presented. Correction factors based on the degree of interdependence, design type, and the number of dyads are used to adjust the F statistic and its degrees of freedom to produce a corrected p value. Bias depends on the type of design and the degree of nonindependence, while the number of dyads in the study ordinarily has only a small effect on bias. Various strategies for controlling for nonindependence are briefly reviewed.
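As a generic illustration of why nonindependence matters (this is a Kish-style design effect for clusters of size two, not the article's actual correction factors), a positive dyadic intraclass correlation r shrinks the effective sample size by a factor of 1 + r:

```python
def effective_n(n_individuals: int, icc: float) -> float:
    """Kish design effect for clusters of size 2: n / (1 + icc).
    Positive icc means fewer effectively independent observations,
    so tests treating persons as independent are too liberal."""
    return n_individuals / (1 + icc)

# 100 people from 50 dyads with intraclass correlation .4 carry
# only about 71 independent observations' worth of information
n_eff = effective_n(100, 0.4)
```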

17.
Basic fact acquisition is an important component for developing higher-order math skills. However, getting students with a history of academic noncompliance to engage in activities related to skills acquisition can be difficult. Prior research demonstrates that engagement increases when nonpreferred activities are preceded by a series of brief activities with a high probability of completion. This technique, called high-p task/request sequences, was not fully explored within the context of skill acquisition. The purpose of this study was to examine the effects of adding high-p sequences to explicit instruction on the math fact acquisition of three elementary-age students in a learning support classroom. Results showed no differences in fact acquisition between explicit instruction and explicit instruction with an added high-p component. However, the high-p sessions took nearly twice as long to complete when compared to explicit instruction alone. Implications for instructional efficiency and limitations of the high-p procedures for acquisition tasks are discussed.

18.
According to Wollack and Schoenig (2018, The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage, 260), benefiting from item preknowledge is one of the three broad types of test fraud that occur in educational assessments. We use tools from constrained statistical inference to suggest a new statistic that is based on item scores and response times and can be used to detect examinees who may have benefited from item preknowledge for the case when the set of compromised items is known. The asymptotic distribution of the new statistic under no preknowledge is proved to be a simple mixture of two χ2 distributions. We perform a detailed simulation study to show that the Type I error rate of the new statistic is very close to the nominal level and that the power of the new statistic is satisfactory in comparison to that of the existing statistics for detecting item preknowledge based on both item scores and response times. We also include a real data example to demonstrate the usefulness of the suggested statistic.

19.
This paper presents the asymptotic expansions of the distributions of the two-sample t-statistic and the Welch statistic, for testing the equality of the means of two independent populations under non-normality. Unlike other approaches, we obtain the null distributions in terms of the distribution and density functions of the standard normal variable up to order n^−1, where n is the pooled sample size. Based on these expansions, monotone transformations are employed to remove the higher-order cumulant effect. We show that the new statistics can improve the precision of statistical inference to the level of o(n^−1). Numerical studies are carried out to demonstrate the performance of the improved statistics. Some general rules for practitioners are also recommended.
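For reference, the uncorrected Welch statistic with Welch–Satterthwaite degrees of freedom can be computed with the standard library. This is a sketch of the classical statistic only; the paper's expansion-based monotone transformations are not reproduced here:

```python
import statistics

def welch(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    nx, ny = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances
    se2 = vx / nx + vy / ny
    t = (mx - my) / se2 ** 0.5
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df
```

With equal sample variances and sizes, df reduces to nx + ny − 2, the pooled-test value.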

20.
Iverson, Lee, and Wagenmakers (2009) claimed that Killeen’s (2005) statistic prep overestimates the “true probability of replication.” We show that Iverson et al. confused the probability of replication of an observed direction of effect with a probability of coincidence—the probability that two future experiments will return the same sign. The theoretical analysis is punctuated with a simulation of the predictions of prep for a realistic random effects world of representative parameters, when those are unknown a priori. We emphasize throughout that prep is intended to evaluate the probability of a replication outcome after observations, not to estimate a parameter. Hence, the usual conventional criteria (unbiasedness, minimum variance estimator) for judging estimators are not appropriate for probabilities such as p and prep.
