Similar Documents
20 similar documents were retrieved for this query.
1.
When a simple random sample of size n is employed to establish a classification rule for predicting a polytomous variable from an independent variable, the best achievable misclassification rate is higher than the corresponding best achievable rate when the conditional probability distribution of the predicted variable given the independent variable is known. In typical cases, this increase in misclassification rate due to sampling is remarkably small relative to other increases in expected measures of prediction accuracy due to sampling that are typically encountered in statistical analysis. The issue is particularly striking when a polytomous variable predicts a polytomous variable, for the excess misclassification rate due to estimation approaches 0 at an exponential rate as n increases. Even with a continuous real predictor and with simple nonparametric methods, it is typically not difficult to achieve an excess misclassification rate on the order of n^{-1}. Although reduced excess error is normally desirable, it may reasonably be argued that, in the case of classification, the reduction in bias reflects a more fundamental lack of sensitivity of misclassification error to the quality of the prediction. This lack of sensitivity is not an issue if criteria based on probability prediction, such as logarithmic penalty or least squares, are employed, but the latter measures typically involve more substantial issues of bias. With polytomous predictors, excess expected errors due to sampling are typically of order n^{-1}. For a continuous real predictor, the increase in expected error is typically of order n^{-2/3}.
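To make the notion of excess misclassification error concrete, here is a minimal simulation sketch, not the paper's analysis: it assumes a made-up joint distribution of a three-level polytomous predictor and a binary outcome, fits the plug-in rule that predicts the within-level majority class, and compares that rule's error with the Bayes rate as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution of a three-level polytomous predictor X and a
# binary outcome Y; none of these numbers come from the paper.
p_x = np.array([0.5, 0.3, 0.2])
p_y_given_x = np.array([[0.8, 0.2],    # P(Y = 0 | X = 0), P(Y = 1 | X = 0)
                        [0.4, 0.6],
                        [0.3, 0.7]])

# Best achievable (Bayes) misclassification rate when the distribution is known.
bayes_error = np.sum(p_x * (1 - p_y_given_x.max(axis=1)))

def mean_excess_error(n, reps=2000):
    """Average excess error of the plug-in rule built from a sample of size n."""
    excess = []
    for _ in range(reps):
        x = rng.choice(3, size=n, p=p_x)
        y = np.array([rng.choice(2, p=p_y_given_x[xi]) for xi in x])
        # Plug-in rule: within each level of X, predict the sample-majority class of Y.
        rule = np.zeros(3, dtype=int)
        for level in range(3):
            ys = y[x == level]
            if ys.size:
                rule[level] = int(ys.mean() > 0.5)
        # True misclassification rate of that rule under the known distribution.
        error = np.sum(p_x * p_y_given_x[np.arange(3), 1 - rule])
        excess.append(error - bayes_error)
    return np.mean(excess)

for n in (50, 200, 800):
    print(n, mean_excess_error(n))   # excess error shrinks rapidly as n grows
```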

2.
By proposing that the latent or true nature of subjects is identified with a limited number of response patterns (the Guttman scale patterns), the probability of an observed response pattern can be written as the sum, over true types, of the probability of the true type multiplied by the probability of sufficient response error to cause the observed pattern to appear. The parameters of this model are the proportions of the true types together with a set of misclassification probabilities. Maximum likelihood methods are used to obtain estimates and to test the fit for some examples.
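As a small illustration of how such pattern probabilities compose, the sketch below assumes three dichotomous items, the four permissible Guttman true types, and a single common response-error rate; all numbers are invented and are not the paper's data or parameterization.

```python
import itertools
import numpy as np

# Three dichotomous items, the four permissible Guttman true types, and a single
# common response-error rate. The proportions and error rate are invented.
guttman_types = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
pi = np.array([0.2, 0.3, 0.3, 0.2])   # proportions of the true types
e = 0.1                               # probability an item response is misrecorded

def pattern_prob(observed):
    """P(observed) = sum over true types of P(type) * P(errors turning type into observed)."""
    total = 0.0
    for p_type, true in zip(pi, guttman_types):
        flips = sum(o != t for o, t in zip(observed, true))
        total += p_type * (e ** flips) * ((1 - e) ** (len(true) - flips))
    return total

probs = {obs: pattern_prob(obs) for obs in itertools.product((0, 1), repeat=3)}
print(probs)                 # probabilities of all 2^3 observable response patterns
print(sum(probs.values()))   # they sum to 1
```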

3.
Vitevitch and Luce (1998) showed that the probability with which phonemes co-occur in the language (phonotactic probability) affects the speed with which words and nonwords are named. Words with high phonotactic probabilities between phonemes were named more slowly than words with low probabilities, whereas with nonwords, just the opposite was found. To reproduce this reversal in performance, a model would seem to require not merely sublexical representations, but sublexical representations that are relatively independent of lexical representations. ARTphone (Grossberg, Boardman, & Cohen, 1997) is designed to meet these requirements. In this study, we used a technique called parameter space partitioning to analyze ARTphone's behavior, to learn whether it can mimic human behavior and, if so, to understand how. For the model to perform best, differences in sublexical node probabilities must be amplified relative to lexical node probabilities, to offset the additional source of inhibition (from top-down masking) found at the sublexical level.

4.
It is difficult to obtain adequate power to test a small effect size with a set criterion alpha of 0.05. Most likely, an inferential test will indicate statistical non-significance and the result will not be published. Rarely, statistical significance will be obtained, and an exaggerated effect size will be calculated and reported. Accepting all inferential probabilities and their associated effect sizes could solve these exaggeration problems. Graphs generated through Monte Carlo methods are presented to illustrate this. The first graph presents effect sizes (Cohen's d) as lines from 1 to 0, with probabilities on the Y axis and the number of measures on the X axis. This graph shows that effect sizes of .5 or less should yield non-significance with sample sizes below 120 measures. The other graphs show results with as many as 10 small-sample-size replications. As sample size increases, the means converge on the effect size and measurement accuracy emerges.
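The exaggeration of published effect sizes under an all-or-nothing alpha of .05 can be illustrated with a quick simulation sketch (arbitrary settings, not the graphs described in the paper): with a true Cohen's d of 0.3, the replications that happen to reach p < .05 report an inflated d, and the inflation shrinks as the sample size grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, reps = 0.3, 5000   # small true effect; illustrative values only

for n in (20, 50, 120):
    d_significant = []
    for _ in range(reps):
        x = rng.normal(true_d, 1, n)   # group carrying the true effect
        y = rng.normal(0, 1, n)        # control group
        t, p = stats.ttest_ind(x, y)
        if p < 0.05 and t > 0:
            # pooled-SD Cohen's d for the replications that reach significance
            sp = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)   # equal group sizes
            d_significant.append((x.mean() - y.mean()) / sp)
    print(n, round(np.mean(d_significant), 2) if d_significant else "no significant results")
```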

5.
Many variables used in social and behavioural science research are ordinal categorical or polytomous variables. When more than one polytomous variable is involved in an analysis, observations are classified in a contingency table, and a commonly used statistic for describing the association between two such variables is the polychoric correlation. This paper investigates the estimation of the polychoric correlation when the data set consists of misclassified observations. Two approaches for estimating the polychoric correlation are developed: one assumes that the misclassification probabilities are known, and the other uses a double sampling scheme to obtain information on misclassification. A parameter estimation procedure is developed, and the statistical properties of the estimates are discussed. The practicability and applicability of the proposed approaches are illustrated by analysing data sets based on real and generated data. Excel programmes with Visual Basic for Applications (VBA) have been developed to compute the estimate of the polychoric correlation and its standard error. The use of the structural equation modelling programme Mx to find parameter estimates in the double sampling scheme is discussed.

6.
When sample observations are not independent, the variance estimate in the denominator of the Student t statistic is altered, inflating the value of the test statistic and resulting in far too many Type I errors. Furthermore, how much the Type I error probability exceeds the nominal significance level is an increasing function of sample size. If N is quite large, in the range of 100 to 200 or larger, small apparently inconsequential correlations that are unknown to a researcher, such as .01 or .02, can have substantial effects and lead to false reports of statistical significance when effect size is zero.
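A small simulation sketch (assuming equicorrelated normal data and arbitrary settings, not the article's own derivations) shows the pattern: even a correlation of .02 among observations pushes the one-sample t test's Type I error rate above .05, and more so at larger N.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def type1_rate(n, rho, reps=5000, alpha=0.05):
    """Empirical Type I error of the one-sample t test when the n observations
    share a small equicorrelation rho and the true mean is zero."""
    cov = np.full((n, n), rho) + (1 - rho) * np.eye(n)
    chol = np.linalg.cholesky(cov)
    rejections = 0
    for _ in range(reps):
        x = chol @ rng.standard_normal(n)
        if stats.ttest_1samp(x, 0.0).pvalue < alpha:
            rejections += 1
    return rejections / reps

for n in (25, 100, 200):
    print(n, type1_rate(n, rho=0.02))   # inflation above .05 grows with n
```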

7.
T. Krishnan, Psychometrika, 1973, 38(3), 291-304
A method is given for finding a linear combination of binary item scores that minimizes the expected frequency of misclassification in discriminating between two groups. The item scores are not assumed to be stochastically independent. The method uses the theory of threshold functions developed by electrical engineers. Since psychometricians may not be familiar with this theory, an elementary introduction to the required material is also given.

8.
McNemar's problem concerns the hypothesis of equal probabilities for the unlike pairs of correlated binary variables. We consider four different extensions to this problem, each testing the simultaneous equality of the proportions of unlike pairs in c independent populations of correlated binary variables, but each under different assumptions and/or additional hypotheses. For each extension, both the likelihood ratio test and the goodness-of-fit chi-square test are given. When c = 1, all cases reduce to McNemar's problem. For c ≥ 2, however, the tests are quite different, depending on exactly how McNemar's hypothesis and alternatives are extended. An example illustrates how widely the results may differ, depending on which extended framework is appropriate.
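For reference, the single-population case (c = 1) reduces to the familiar McNemar test on the two discordant cells of a paired 2 x 2 table. The sketch below uses made-up counts and standard scipy routines; it is not the paper's extended tests.

```python
from scipy import stats

# Paired 2x2 table with made-up counts; only the discordant ("unlike") cells
# b and c enter the hypothesis of equal probabilities for the unlike pairs.
b, c = 25, 12                        # counts of (0,1) and (1,0) pairs
chi2 = (b - c) ** 2 / (b + c)        # classical McNemar chi-square, 1 df
p_chi2 = stats.chi2.sf(chi2, df=1)

# Exact version: under H0 the b "successes" among the b + c discordant pairs
# follow a Binomial(b + c, 1/2) distribution.
p_exact = stats.binomtest(b, b + c, 0.5).pvalue

print(chi2, p_chi2, p_exact)
```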

9.
The problem of comparing two independent groups based on multivariate data is considered. Many such methods have been proposed, but it is difficult to gain a perspective on the extent to which the groups differ. The basic strategy here is to determine a robust measure of location for each group, project the data onto the line connecting these measures of location, and then compare the groups based on the ordering of the projected points. In the univariate case the method uses the same measure of effect size employed by the Wilcoxon-Mann-Whitney test. Under general conditions the projected points are dependent, causing difficulties when testing hypotheses. Two methods are found to be effective at avoiding Type I error probabilities above the nominal level, and their relative merits are discussed. The projected data provide not only a useful (numerical) measure of effect size, but also a graphical indication of the extent to which the groups differ.
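The core projection idea can be sketched in a few lines, using coordinatewise medians as the robust locations and made-up bivariate normal data; this sketch deliberately ignores the dependence among projected points that the paper's two testing methods are designed to handle.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Two made-up bivariate normal samples with a modest shift between them.
x = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=60)
y = rng.multivariate_normal([0.7, 0.4], np.eye(2), size=60)

# Robust locations (coordinatewise medians) and the line connecting them.
loc_x, loc_y = np.median(x, axis=0), np.median(y, axis=0)
direction = (loc_y - loc_x) / np.linalg.norm(loc_y - loc_x)

# Project both samples onto that line and compare the projected scores.
proj_x, proj_y = x @ direction, y @ direction
u, p = stats.mannwhitneyu(proj_x, proj_y, alternative="two-sided")
p_hat = u / (len(proj_x) * len(proj_y))   # estimate of P(projected X > projected Y)

# Note: the projected points are dependent, so this naive p value ignores the
# very issue the paper's two methods address.
print(p_hat, p)
```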

10.
A previous Behavioral Sciences and the Law article (Mossman & Hart, 1996) asserted that information from malingering tests is best conceptualized using Bayes' theorem, and that courts therefore deserve Bayesian interpretations when mental health professionals present evidence about malingering. Mossman and Hart gave several examples of estimated Bayesian posterior probabilities, but they did not systematically address how one constructs confidence intervals for these estimates. This article explains how the usually imperfect nature of humanly created diagnostic tests mandates Bayesian interpretations of test results, and describes methods for generating confidence intervals for posterior probabilities. Sample calculations show that Bayesian reasoning is quite feasible and would not require investigators to expend unusual efforts when constructing and validating malingering instruments. Bayesian interpretations most accurately capture what malingering tests do: provide information that alters one's beliefs about the likelihood of malingering.
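As a rough illustration of the Bayesian reading of a positive test result: all numbers below are invented, and the simple parametric bootstrap is only one way to attach an interval to the posterior, not necessarily the method the article describes.

```python
import numpy as np

# Invented operating characteristics and base rate for a malingering test.
sens, spec, base_rate = 0.80, 0.90, 0.20

def posterior(sens, spec, prior):
    """P(malingering | positive test) by Bayes' theorem."""
    return sens * prior / (sens * prior + (1 - spec) * (1 - prior))

print(posterior(sens, spec, base_rate))        # about 0.67

# One simple way to bracket the estimate: a parametric bootstrap over the
# validation-sample estimates of sensitivity and specificity (n = 100 each).
rng = np.random.default_rng(4)
n_val = 100
sens_draws = rng.binomial(n_val, sens, size=10_000) / n_val
spec_draws = rng.binomial(n_val, spec, size=10_000) / n_val
post_draws = posterior(sens_draws, spec_draws, base_rate)
print(np.percentile(post_draws, [2.5, 97.5]))  # rough 95% interval for the posterior
```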

11.
The present study builds on a previous proposal for assigning probabilities to the outcomes computed with different primary indicators in single-case studies. These probabilities are obtained by comparing the outcome to previously tabulated reference values, and they reflect the likelihood of the results when no intervention effect is present. In the present study, we explored how well different metrics are translated into p values in the context of simulation data. Furthermore, two published multiple-baseline data sets were used to illustrate how well the probabilities reflect intervention effectiveness, as assessed by the original authors. Finally, we explored how much it matters which primary indicator is used in each data set to be integrated; two ways of combining probabilities were used: a weighted average and a binomial test. The results indicated that the translation into p values worked well for the two nonoverlap procedures, with the results for the regression-based procedure diverging owing to some undesirable features of its performance. These p values, whether taken individually or combined, were well aligned with effectiveness for the real-life data. These results suggest that assigning probabilities can be useful for translating the primary measure into a common metric and for using these probabilities as additional evidence of the importance of behavioral change, complementing visual analysis and professionals' judgments.

12.
On the shape of the probability weighting function
Empirical studies have shown that decision makers do not usually treat probabilities linearly. Instead, people tend to overweight small probabilities and underweight large probabilities. One way to model such distortions in decision making under risk is through a probability weighting function. We present a nonparametric estimation procedure for assessing the probability weighting function and value function at the level of the individual subject. The evidence in the domain of gains supports a two-parameter weighting function, where each parameter is given a psychological interpretation: one parameter measures how the decision maker discriminates probabilities, and the other parameter measures how attractive the decision maker views gambling. These findings are consistent with a growing body of empirical and theoretical work attempting to establish a psychological rationale for the probability weighting function.
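One common two-parameter form consistent with this description, with curvature and elevation controlled by separate parameters, is the linear-in-log-odds weighting function. The sketch below plots it for a few illustrative parameter values; it is offered as an example of the general shape, not as the paper's exact specification.

```python
import numpy as np
import matplotlib.pyplot as plt

def weight(p, gamma, delta):
    """Linear-in-log-odds weighting function: gamma governs curvature
    (probability discrimination), delta governs elevation (attractiveness
    of gambling). The parameter values used below are purely illustrative."""
    num = delta * p ** gamma
    return num / (num + (1 - p) ** gamma)

p = np.linspace(0.001, 0.999, 200)
for gamma, delta in [(0.6, 0.8), (0.4, 1.0), (0.9, 0.6)]:
    plt.plot(p, weight(p, gamma, delta), label=f"gamma={gamma}, delta={delta}")
plt.plot(p, p, "k--", label="linear weighting")
plt.xlabel("stated probability p")
plt.ylabel("decision weight w(p)")
plt.legend()
plt.show()
```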

13.
It is known that one cannot construct an independent joint random utility model for which both the preference and aversion probabilities satisfy the strict utility model. Given this result, it is natural to ask what kinds of relations are possible in such contexts. Here, for instance, we demonstrate that if a structure of preference and aversion probabilities is generated by a common independent random utility model, and if the preference probabilities satisfy the strict utility model, then the preference and aversion probabilities are related by what is known as the strong acceptance condition. We also prove an appropriate converse of this result. Finally, an example is given of a process (other than the random utility model) that is compatible with the assumption that a structure of preference and aversion probabilities satisfies a common strict utility model.
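For readers unfamiliar with the terminology, the strict utility model referred to here is the Luce-type choice representation in which binary choice probabilities arise from a positive scale; stated in generic notation, not the paper's own:

```latex
% Strict utility (Luce-type) representation of binary preference probabilities:
P(x \succ y) \;=\; \frac{v(x)}{v(x) + v(y)}, \qquad v(z) > 0 \ \text{for every alternative } z.
```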

14.
The log-linear model for contingency tables expresses the logarithm of a cell frequency as an additive function of main effects, interactions, etc., in a way formally identical with an analysis of variance model. Exact statistical tests are developed to test hypotheses that specific effects or sets of effects are zero, yielding procedures for exploring relationships among qualitative variables which are suitable for small samples. The tests are analogous to Fisher's exact test for a 2 × 2 contingency table. Given a hypothesis, the exact probability of the obtained table is determined, conditional on fixed marginals or other functions of the cell frequencies. The sum of the probabilities of the obtained table and of all less probable ones is the exact probability to be considered in testing the null hypothesis. Procedures for obtaining exact probabilities are explained in detail, with examples given.
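The 2 × 2 special case the article generalizes is Fisher's exact test. A minimal sketch with invented counts, using scipy's implementation rather than the article's own computational procedures:

```python
from scipy import stats

# Fisher's exact test for a 2x2 table with invented counts. With the margins
# fixed, the two-sided p value sums the probability of the observed table and
# of every table that is no more probable under the null hypothesis.
table = [[8, 2],
         [1, 5]]
odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
print(odds_ratio, p_value)
```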

15.
Two experiments investigated how individuals use explicit memory cues that designate different probabilities of test. As in typical directed forgetting studies, subjects received words explicitly cued as having either a 0% or a 100% chance of being on a subsequent memory test (i.e. forget and remember cues, respectively). In addition, some words were explicitly cued as having the potential to be either forgotten or remembered (i.e. a 50% cue). Recall of 50% words was between that of 0% and 100% words. In addition, the presence of 50% words lowered recall of the 100% words compared to that of a control group that did not receive the 50% words, but received the same number of 100% words. A think-aloud task indicated that these results were due to the 50% words being treated like either 100% or 0% words at encoding. The results are discussed in terms of the effect of different probabilities of test on the strategic processing and representation of information.

16.
Many researchers have claimed that the study of suicide and the formation of public policy are not undermined by the misclassification of suicide as other causes of death. We evaluated this claim using a new technique and causes of death not previously considered. We examined computerized California death certificates, 1966-1990. Mortality peaks at symbolic ages are a characteristic feature of suicide. We sought such peaks in (1) causes of death commonly suspected of containing misclassified suicides (e.g., accidental barbiturate poisoning), (2) causes of death not hitherto suspected (e.g., pedestrian deaths), and (3) control groups. The first two categories displayed peaks at symbolic ages, but control groups did not. The size of the peak, indicative of misclassified suicides, varied markedly by race (p < .0001) and sex (p < .0001). Misclassification is evident for all time periods examined, large and small counties, and each race and sex. The maximum misclassification occurs for Blacks (14.92% of officially recorded suicides). We conclude that suicides are misallocated to at least five other causes of death (two of which have not been previously considered in the literature) and are most likely to be underreported for groups with low official suicide rates, that is, Blacks and females. Consequently, Blacks and females are not as protected from suicide as was previously supposed. It may be inadvisable to use official suicide data to test scientific hypotheses about suicide, unless the effects of underreporting are estimated and, if necessary, corrected for.

17.
Verbal probability expressions (e.g., it is possible or doubtful) convey not only vague numerical meanings (i.e., probability) but also semantic functions, called directionality. We performed two experiments to examine whether preferential judgments are consistent with the numerical meanings of verbal probabilities regardless of directionality. The results showed that, because of the effects of directionality, perceived degrees of certainty for verbal probabilities differed between a binary choice and a numerical translation (Experiment 1), and that decisions based on a verbal probability did not correspond to those based on a numerical translation of that probability (Experiment 2). These findings suggest that the directionality of verbal probabilities is a feature independent of their numerical meanings; hence the numerical meanings of verbal probabilities alone are insufficient to explain the effects of directionality on preferential judgments.

18.
One reason for the low replicability of findings reported in psychology journals is that effect sizes are generally small. Moreover, in articles that do report effect sizes, the effect-size indices are often used inappropriately. In analysis of variance, η² and partial η² (η²p) are the most frequently reported indices, but these effect sizes cannot be compared directly across different research designs. Generalized eta squared (η²G) is a more recently proposed effect-size index that overcomes the shortcomings of η² and η²p: it handles the computation of individual differences flexibly under repeated-measures and other research designs, making effect sizes comparable across designs. This paper uses worked examples to introduce the rationale and computation of η²G, and discusses its strengths and weaknesses, its use, and how it should be reported. When reporting effect sizes, researchers should take the research design and research hypotheses into account and choose an appropriate index to avoid overestimating the effect size.
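To illustrate the computation, here is a minimal sketch for a one-way repeated-measures design with invented scores. For this design, generalized eta squared is SS_effect / (SS_effect + SS_subjects + SS_error), so it can never exceed partial eta squared.

```python
import numpy as np

# Invented scores for a one-way repeated-measures design: 6 subjects x 3 conditions.
scores = np.array([[5., 7., 9.],
                   [4., 6., 8.],
                   [6., 6., 7.],
                   [5., 8., 9.],
                   [3., 5., 6.],
                   [4., 7., 8.]])
n_subjects, n_conditions = scores.shape
grand_mean = scores.mean()

ss_total = ((scores - grand_mean) ** 2).sum()
ss_effect = n_subjects * ((scores.mean(axis=0) - grand_mean) ** 2).sum()      # conditions
ss_subjects = n_conditions * ((scores.mean(axis=1) - grand_mean) ** 2).sum()  # individual differences
ss_error = ss_total - ss_effect - ss_subjects                                 # conditions x subjects

eta2_partial = ss_effect / (ss_effect + ss_error)
eta2_generalized = ss_effect / (ss_effect + ss_subjects + ss_error)
print(eta2_partial, eta2_generalized)   # eta^2_G does not exceed partial eta^2 here
```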

19.
The statistical significance levels of the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test are substantially biased by heterogeneous variances of treatment groups, even when sample sizes are equal. Under these conditions, the Type I error probabilities of the nonparametric tests, performed at the .01, .05, and .10 significance levels, increase by as much as 40%-50% in many cases, and sometimes by as much as 300%. The bias increases systematically as the ratio of the standard deviations of the treatment groups increases and remains fairly constant across various sample sizes. There is no indication that the Type I error probabilities approach the significance level asymptotically as sample size increases.
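The pattern can be reproduced with a quick simulation sketch (normal data with equal means and unequal standard deviations, arbitrary settings; not the study's full design):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def wmw_type1(n, sd_ratio, reps=5000, alpha=0.05):
    """Empirical Type I error of the Wilcoxon-Mann-Whitney test for two normal
    groups with equal means, equal sample sizes, and unequal standard deviations."""
    rejections = 0
    for _ in range(reps):
        x = rng.normal(0, 1, n)
        y = rng.normal(0, sd_ratio, n)
        if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
            rejections += 1
    return rejections / reps

for ratio in (1, 2, 4):
    print(ratio, wmw_type1(n=30, sd_ratio=ratio))   # inflation grows with the SD ratio
```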

20.
A meta-analysis of Asian-disease-like studies is presented to identify the factors that determine risk preference. First, the confounds among probability levels, payoffs, and framing conditions are clarified in a task analysis. Then the roles of framing, reflection, probability, and the type and size of payoff are evaluated in a meta-analysis. It is shown that bidirectional framing effects exist for gains and for losses. Presenting outcomes as gains tends to induce risk aversion, while presenting outcomes as losses tends to induce risk seeking. Risk preference is also shown to depend on the size of the payoffs, on the probability levels, and on the type of good at stake (money/property vs. human lives). In general, higher payoffs lead to increasing risk aversion. Higher probabilities lead to increasing risk aversion for gains and to increasing risk seeking for losses. These findings are confirmed by a subsequent empirical test. Shortcomings of existing formal theories, such as prospect theory, cumulative prospect theory, venture theory, and Markowitz's utility theory, are identified. It is shown that it is not probabilities or payoffs but the framing condition that explains most of the variance. These findings are interpreted as showing that no linear combination of formally relevant predictors is sufficient to capture the essence of the framing phenomenon.
