Similar literature
 20 similar articles found (search time: 31 ms)
1.
Traditional Null Hypothesis Testing procedures are poorly adapted to theory testing. The methodology can mislead researchers in several ways, including: (a) a lack of power can result in an erroneous rejection of the theory; (b) the focus on directionality (ordinal tests) rather than more precise quantitative predictions limits the information gained; and (c) probability values are misused to indicate effect size. An alternative approach is proposed which involves employing the theory to generate explicit effect size predictions that are compared to the effect size estimates and related confidence intervals to test the theoretical predictions. This procedure is illustrated employing the Transtheoretical Model. Data from a sample (N = 3,967) of smokers from a large New England HMO system were used to test the model. There were a total of 15 predictions evaluated, each involving the relation between Stage of Change and one of the other 15 Transtheoretical Model variables. For each variable, omega-squared and the related confidence interval were calculated and compared to the predicted effect sizes. Eleven of the 15 predictions were confirmed, providing support for the theoretical model. Quantitative predictions represent a much more direct, informative, and strong test of a theory than the traditional test of significance.
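The comparison described in this abstract (an omega-squared estimate, its confidence interval, and a theory-derived prediction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say how the interval was computed, so a percentile bootstrap is swapped in, and the rule "the prediction counts as confirmed when it falls inside the interval" is likewise assumed. The function names and the data are hypothetical.

```python
import random
from statistics import mean


def omega_squared(groups):
    """Omega-squared for a one-way between-groups design."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_within = ss_within / (n_total - k)
    denom = ss_between + ss_within + ms_within
    if denom == 0:  # every score identical: nothing to explain
        return 0.0
    return (ss_between - (k - 1) * ms_within) / denom


def bootstrap_ci(groups, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    estimates = sorted(
        omega_squared([rng.choices(g, k=len(g)) for g in groups])
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_boot)]
    upper = estimates[int((1 - tail) * n_boot) - 1]
    return lower, upper


def prediction_confirmed(predicted, interval):
    """Assumed decision rule: the predicted effect lies in the interval."""
    lower, upper = interval
    return lower <= predicted <= upper


# Hypothetical scores for three stages of change (not data from the study).
stages = [[12, 15, 11, 14, 13], [16, 18, 15, 17, 19], [21, 20, 23, 22, 19]]
estimate = omega_squared(stages)
interval = bootstrap_ci(stages)
print(estimate, interval, prediction_confirmed(0.80, interval))
```

With samples as small as the hypothetical one above, a bootstrap interval is rough at best; the sketch only shows the shape of the comparison.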

2.
The practice of statistical inference in psychological research is critically reviewed. Particular emphasis is put on the fast pace of change from the sole reliance on null hypothesis significance testing (NHST) to the inclusion of effect size estimates, confidence intervals, and an interest in the Bayesian approach. We conclude that these developments are helpful for psychologists seeking to extract a maximum of useful information from statistical research data, and that seven decades of criticism of NHST are finally having an effect.

3.
We investigated the way experienced users interpret Null Hypothesis Significance Testing (NHST) outcomes. An empirical study was designed to compare the reactions of two populations of NHST users, psychological researchers and professional applied statisticians, when faced with contradictory situations. The subjects were presented with the results of an experiment designed to test the efficacy of a drug by comparing two groups (treatment/placebo). Four situations were constructed by combining the outcome of the t test (significant vs. nonsignificant) and the observed difference between the two means D (large vs. small). Two of these situations appeared as conflicting (t significant/D small and t nonsignificant/D large). Three fundamental aspects of statistical inference were investigated by means of open questions: drawing inductive conclusions about the magnitude of the true difference from the data in hand, making predictions for future data, and making decisions about stopping the experiment. The subjects were 25 statisticians from pharmaceutical companies in France, subjects well versed in statistics, and 20 psychological researchers from various laboratories in France, all with experience in processing and analyzing experimental data. On the whole, statisticians and psychologists reacted in a similar way and were very impressed by significant results. It should be noted that professional applied statisticians were not immune to misinterpretations, especially in the case of nonsignificance. However, the interpretations that accustomed users attach to the outcome of NHST can vary from one individual to another, and it is hard to conceive that there could be a consensus in the face of seemingly conflicting situations. In fact, beyond the superficial report of “erroneous” interpretations, the misuses of NHST can be seen as intuitive judgmental “adjustments” that try to overcome its inherent shortcomings. These findings encourage the many recent attempts to improve the habitual ways of analyzing and reporting experimental data.

4.
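The familiar null hypothesis significance test has been questioned and defended time and again, yet its standing has not been shaken, and reporting test results remains the customary practice in statistical analysis. Its limitations, however, have prompted researchers to explore further statistical methods such as interval estimation, effect size analysis, and power analysis. This article first introduces the relationship between hypothesis testing and confidence intervals; it then discusses how power relates to the two types of error rate and to effect size; finally, having put these methods in order, it provides a workable statistical analysis procedure.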
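The relationship between hypothesis testing and confidence intervals mentioned in this abstract can be illustrated with the simplest case. The sketch below assumes a two-sided one-sample z test with a known population standard deviation, a setting chosen here for clarity and not taken from the article. In that setting the test rejects the null value at level alpha exactly when the (1 - alpha) confidence interval excludes it. All function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p value for H0: mu = mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for the mean with known sigma."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def decisions_agree(sample_mean, mu0, sigma, n, alpha=0.05):
    """True when 'p < alpha' and 'interval excludes mu0' coincide."""
    rejected = z_test_p_value(sample_mean, mu0, sigma, n) < alpha
    lower, upper = z_confidence_interval(sample_mean, sigma, n, 1 - alpha)
    excluded = not (lower <= mu0 <= upper)
    return rejected == excluded


# Hypothetical example: same observed mean, two different sample sizes.
for n in (25, 100):
    print(n, z_test_p_value(103, 100, 15, n),
          z_confidence_interval(103, 15, n))
```

The interval carries the same reject/retain decision as the test and additionally shows the range of values compatible with the data, which is one reason interval estimation is recommended alongside the test.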

5.
Null hypothesis significance testing (NHST) is the most widely accepted and frequently used approach to statistical inference in quantitative communication research. NHST, however, is highly controversial, and several serious problems with the approach have been identified. This paper reviews NHST and the controversy surrounding it. Commonly recognized problems include a sensitivity to sample size, the fact that the null is usually literally false, unacceptable Type II error rates, and misunderstanding and abuse. Problems associated with the conditional nature of NHST and the failure to distinguish statistical hypotheses from substantive hypotheses are emphasized. Recommended solutions and alternatives are addressed in a companion article.

6.
SIGNIFICANCE TESTS HAVE THEIR PLACE   Cited by: 1 (self-citations: 0; citations by others: 1)
Abstract: Null-hypothesis significance tests (NHST), properly used, tell us whether we have sufficient evidence to be confident of the sign of the population effect, but only if we abandon two-valued logic in favor of Kaiser's (1960) three-alternative hypothesis tests. Confidence intervals provide a useful addition to NHSTs, and can be used to provide the same sign-determination function as NHST. However, when so used, confidence intervals are subject to exactly the same Type I, II, and III error rates as NHST. In addition, NHSTs provide two pieces of information about our data (maximum probability of a Type III error and probability of a successful exact replication) that confidence intervals do not. The proposed alternative to NHST is just as susceptible to misinterpretation as is NHST. The problem of bias due to censoring of data collection or publication can be handled by providing archives for all methodologically sound data sets, but reserving interpretations and conclusions for statistically significant results.

7.
Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students, all in the field of psychology, were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.

8.
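Methodological research on hypothesis testing in China over the first 20 years of the new century can be grouped into the following categories: the shortcomings of null hypothesis significance testing, problems in the use of p values, the replicability of psychological research, effect size, statistical power, equivalence testing, and other research related to hypothesis testing. Null hypothesis significance testing has developed into a combined procedure. To ensure power and save costs, experimental studies need an a priori power analysis to estimate the required sample size, although for questionnaire studies with more than 160 respondents this is unnecessary in traditional statistics. When the null hypothesis is rejected, the conclusion should be drawn in conjunction with the effect size. When the null hypothesis is not rejected, post hoc power should be reported; if the effect size is medium or large but power is not high enough, more participants may be added and the data reanalyzed, but this process should be proactively disclosed, with the final actual p value reported and the possible Type I error rate assessed.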
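The a priori power analysis mentioned in this abstract can be sketched for one common case. The example assumes a two-sided comparison of two independent group means with a standardized effect size d, and it uses the normal approximation n = 2((z_{1-alpha/2} + z_{power}) / d)^2 per group. Neither the design nor the approximation comes from the article, and exact t-based calculations give slightly larger samples. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-group test.

    d is the standardized mean difference and must be nonzero.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power with n participants per group.

    The negligible probability of rejecting in the wrong direction
    is ignored.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(d) * sqrt(n / 2) - z_alpha)


# Small, medium, and large effects by the usual conventions.
for d in (0.2, 0.5, 0.8):
    n = n_per_group(d)
    print(d, n, approx_power(d, n))
```

The same `approx_power` function can be applied after data collection to the observed effect size, which corresponds to the post hoc power the abstract asks researchers to report when the null hypothesis is not rejected.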

9.
Confidence intervals (CIs) for means are frequently advocated as alternatives to null hypothesis significance testing (NHST), for which a common theme in the debate is that conclusions from CIs and NHST should be mutually consistent. The authors examined a class of CIs for which the conclusions are said to be inconsistent with NHST in within-subjects designs and a class for which the conclusions are said to be consistent. The difference between them is a difference in models. In particular, the main issue is that the class for which the conclusions are said to be consistent derives from fixed-effects models with subjects fixed, not mixed models with subjects random. Offered is mixed model methodology that has been popularized in the statistical literature and statistical software procedures. Generalizations to different classes of within-subjects designs are explored, and comments on the future direction of the debate on NHST are offered.

10.
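Lü Xiaokang (吕小康). 《心理科学》 (Psychological Science), 2012, 35(6): 1502-1506
Fisher and Neyman–Pearson, the originators of hypothesis testing, disagreed on many points, including the methodological foundations of statistical models, the nature of the two types of error, the interpretation of the significance level, and the function of hypothesis testing. As a result, the null hypothesis significance testing procedure most commonly used in psychological statistics harbors a number of implicit contradictions, which have given rise to controversy over its application. Psychological statistics needs not only to examine the ambiguities of the current testing model and to propose supplementary modes of statistical inference, but also to reflect on the teaching tradition of psychological statistics, so as to build a more open and pluralistic view of statistical application and make psychological statistics serve psychological research better.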

11.
12.
Recently, we have shown that two types of initial testing (recall of a list or guessing of critical items repeated over 12 study/test cycles) improved final recognition of related and unrelated word lists relative to restudy. These benefits were eliminated, however, when test instructions were manipulated within subjects and presented after study of each list, procedures designed to minimise expectancy of a specific type of upcoming test [Huff, Balota, & Hutchison, 2016. The costs and benefits of testing and guessing on recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42, 1559–1572. doi:10.1037/xlm0000269], suggesting that testing and guessing effects may be influenced by encoding strategies specific for the type of upcoming task. We follow up these experiments by examining test-expectancy processes in guessing and testing. Testing and guessing benefits over restudy were not found when test instructions were presented either after (Experiment 1) or before (Experiment 2) a single study/task cycle was completed, nor were benefits found when instructions were presented before study/task cycles and the task was repeated three times (Experiment 3). Testing and guessing benefits emerged only when instructions were presented before a study/task cycle and the task was repeated six times (Experiments 4A and 4B). These experiments demonstrate that initial testing and guessing can produce memory benefits in recognition, but only following substantial task repetitions which likely promote task-expectancy processes.

13.
Abstract

This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study was implemented to investigate the behavior of the multilevel reliability measures and the uncertainty associated with the measures in various multilevel designs regarding the number of clusters, cluster sizes, and intraclass correlations (ICCs), and in different test lengths, for two parameterizations of multilevel item response models with separate item discriminations or the same item discrimination over levels. Marginal maximum likelihood estimation (MMLE)-multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering the accuracy of the multilevel reliability measures and the empirical coverage rate of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations over levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage rate of the MC confidence intervals were found under MMLE-multiple imputation in only one condition: 200 clusters, a cluster size of 30, an ICC of .2, and 40 items. In the model with the same item discrimination over levels, the accuracy of the multilevel reliability measures and the empirical coverage rate of the MC confidence intervals were acceptable in all multilevel designs we considered with 40 items under MMLE-multiple imputation. We discuss these findings and provide guidelines for reporting multilevel reliability measures.

14.
15.
Geoffrey Loftus, Editor of Memory & Cognition from 1994 to 1997, strongly encouraged presentation of figures with error bars and avoidance of null hypothesis significance testing (NHST). The authors examined 696 Memory & Cognition articles published before, during, and after the Loftus editorship. Use of figures with bars increased to 47% under Loftus’s editorship and then declined. Bars were rarely used for interpretation, and NHST remained almost universal. Analysis of 309 articles in other psychology journals confirmed that Loftus’s influence was most evident in the articles he accepted for publication, but was otherwise limited. An e-mail survey of authors of papers accepted by Loftus revealed some support for his policy, but allegiance to traditional practices as well. Reform of psychologists’ statistical practices would require more than editorial encouragement.

16.
Estimation based on effect sizes, confidence intervals, and meta-analysis usually provides a more informative analysis of empirical results than does statistical significance testing, which has long been the conventional choice in psychology. The sixth edition of the American Psychological Association Publication Manual now recommends that psychologists should, wherever possible, use estimation and base their interpretation of research results on point and interval estimates. We outline the Manual's recommendations and suggest how they can be put into practice: adopt an estimation framework, starting with the formulation of research aims as ‘How much?’ or ‘To what extent?’ questions. Calculate from your data effect size estimates and confidence intervals to answer those questions, then interpret. Wherever appropriate, use meta-analysis to integrate evidence over studies. The Manual's recommendations can assist psychologists in improving the way they do their statistics and help build a more quantitative and cumulative discipline.

17.
Behavior therapy has been widely used as a treatment for trichotillomania (TTM). However, behavioral treatments for TTM have tended to focus on behavior reduction while paying less attention to social and economic impact. The current study sought to clarify the social and economic impact of TTM in two samples of persons with TTM. Members of the first sample attended a TTM patient conference (N = 36) and members of the second responded to an online survey (N = 381). Both samples completed self-report measures that examined the impact of TTM on avoiding activities and relationships, as well as financial costs. Results indicated that both groups reported similar amounts of avoidance in social situations, sought help from multiple health professionals, spent considerable time engaged in hair pulling activities, and had interference in both work and school. The study suggests a number of ways to decrease the negative impact of TTM. Copyright © 2006 John Wiley & Sons, Ltd.

18.
This study assessed the accuracy of predictions of freshman and overall college scholastic performance made by groups of high school counselors, college advisors, and counseling psychologists from a university counseling center in relation to the confidence of these judges that their prognoses were accurate. Predictions were made from three sets of case information. The results revealed that: (1) the degree of confidence counselors indicated in their freshman and overall college “pass” predictions was appropriately related to accuracy; (2) counselor confidence in freshman “fail” predictions was not related to accuracy although the “fail” judgments tended to be more accurate than the “pass” prognoses; (3) counselor confidence in their overall “fail” predictions was not significantly related to accuracy and, unlike the results for the freshman judgments, the overall “fail” predictions were not more accurate than the “pass” predictions; (4) the amount of case data available was not related to counselor predictive accuracy.

19.
In null hypothesis significance testing (NHST), p values are judged relative to an arbitrary threshold for significance (.05). The present work examined whether that standard influences the distribution of p values reported in the psychology literature. We examined a large subset of papers from three highly regarded journals. Distributions of p were found to be similar across the different journals. Moreover, p values were much more common immediately below .05 than would be expected based on the number of p values occurring in other ranges. This prevalence of p values just below the arbitrary criterion for significance was observed in all three journals. We discuss potential sources of this pattern, including publication bias and researcher degrees of freedom.

20.
Human choice under uncertainty is influenced by erroneous beliefs about randomness. In simple binary choice tasks, such as red/black predictions in roulette, long outcome runs (e.g. red, red, red) typically increase the tendency to predict the other outcome (i.e. black), an effect labeled the “gambler's fallacy.” In these settings, participants may also attend to streaks in their predictive performance. Winning and losing streaks are thought to affect decision confidence, although prior work indicates conflicting directions. Over three laboratory experiments involving red/black predictions in a sequential roulette task, we sought to identify the effects of outcome runs and winning/losing streaks upon color predictions, decision confidence and betting behavior. Experiments 1 (n = 40) and 3 (n = 40) obtained trial-by-trial confidence ratings, with a win/no win payoff and a no loss/loss payoff, respectively. Experiment 2 (n = 39) obtained a trial-by-trial bet amount on an equivalent scale. In each experiment, the gambler's fallacy was observed on choice behavior after color runs and, in Experiment 2, on betting behavior after color runs. Feedback streaks exerted no reliable influence on confidence ratings, in either payoff condition. Betting behavior, on the other hand, increased as a function of losing streaks. The increase in betting on losing streaks is interpreted as a manifestation of loss chasing; these data help clarify the psychological mechanisms underlying loss chasing and caution against the use of betting measures (“post-decision wagering”) as a straightforward index of decision confidence. © 2014 The Authors. Journal of Behavioral Decision Making published by John Wiley & Sons Ltd.
