This paper evaluates an adaptive staircase procedure for threshold estimation that is suitable for unforced-choice tasks, that is, tasks with the additional response alternative don't know. Within the framework of a theory of indecision, evidence is developed that fluctuations of the response criterion are much less detrimental to unforced-choice tasks than to yes/no tasks. An adaptive staircase procedure for unforced-choice tasks is presented. Computer simulations show a slight gain in efficiency if don't know responses are allowed, even if response criteria vary. A behavioral comparison with forced-choice and yes/no procedures shows that the new procedure outdoes the other two with respect to reliability. This is especially true for naive participants. For well-trained participants it is also slightly more efficient than the forced-choice procedure, and it produces a smaller systematic error than the yes/no procedure. Moreover, informal observations suggest that participants are more comfortable with unforced tasks than with forced ones.

Estimation of effect size is of interest in many applied fields such as Psychology, Sociology and Education. However, there are few nonparametric estimators of effect size proposed in the existing literature, and little is known about the distributional characteristics of these estimators. In this article, two estimators based on the sample quantiles are proposed and studied. The first one is the estimator suggested by Hedges and Olkin (see page 93 of Hedges & Olkin, 1985) for the situation where a treatment effect is evaluated against a control group (Case A). A modified version of the robust estimator by Hedges and Olkin is also proposed for the situation where two parallel treatments are compared (Case B). Large sample distributions of both estimators are derived. Their asymptotic relative efficiencies with respect to the normal maximum likelihood estimators under several common distributions are evaluated. The robustness properties of the proposed estimators are discussed with respect to the sample-wise breakdown points proposed by Akritas (1991). Simulation studies are provided in which the performance characteristics of the proposed estimators are compared with those of the nonparametric estimators by Kraemer and Andrews (1982). Interval estimation of the effect sizes is also discussed. In an example, interval estimates for the data set in Kraemer and Andrews (1982) are calculated for both Cases A and B.
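
As a rough illustration of a quantile-based effect size for Case A, the sketch below divides the difference in sample medians by a robust scale estimate taken from the control group. The choice of scale (the interquartile range divided by 1.349, which matches the standard deviation under normality), the function name, and the example data are assumptions made for illustration; they are not taken from Hedges and Olkin (1985) or from the article.

```python
import statistics


def quantile_effect_size(treatment, control):
    """Median difference scaled by a robust spread estimate of the control group.

    Hypothetical helper: the scaling rule is an assumption, not the
    estimator defined in the article.
    """
    q1, _, q3 = statistics.quantiles(control, n=4)  # sample quartiles
    robust_sd = (q3 - q1) / 1.349  # a normal IQR is about 1.349 standard deviations
    return (statistics.median(treatment) - statistics.median(control)) / robust_sd


# Made-up data: the treatment group is the control group shifted upward by 2.
control = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
treatment = [x + 2 for x in control]
print(quantile_effect_size(treatment, control))
```

Because only medians and quartiles enter the calculation, a single extreme observation in either group leaves the estimate largely unchanged, which is the practical appeal of quantile-based estimators.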

The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, for example, output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errors-in-variables model. We thank David Spiegelhalter for suggesting applying the Gibbs sampler to structural equation models to the first author at a 1994 workshop in Wiesbaden. We thank Ulf Böckenholt, Chris Meek, Marijtje van Duijn, Clark Glymour, Ivo Molenaar, Steve Klepper, Thomas Richardson, Teddy Seidenfeld, and Tom Snijders for helpful discussions, mathematical advice, and critiques of earlier drafts of this paper.

On estimation and hypothesis testing problems for correlation coefficients
A selection of statistical problems commonly encountered in psychological or psychiatric research concerning correlation coefficients are re-evaluated in the light of recently developed simplifications in the forms of the distribution theory of the intraclass correlation coefficient (exact theory), of the product-moment correlation coefficient and of the Spearman rank correlation coefficient (approximate).

A state-of-the-art data analysis procedure is presented to conduct hierarchical Bayesian inference and hypothesis testing on delay discounting data. The delay discounting task is a key experimental paradigm used across a wide range of disciplines, including economics, cognitive science, and neuroscience, all of which seek to understand how humans or animals trade off the immediacy versus the magnitude of a reward. Bayesian estimation allows rich inferences to be drawn, along with measures of confidence, based upon limited and noisy behavioural data. Hierarchical modelling allows more precise inferences to be made, thus using sometimes expensive or difficult to obtain data in the most efficient way. The proposed probabilistic generative model describes how participants compare the present subjective value of reward choices on a trial-to-trial basis, and estimates participant- and group-level parameters. We infer discount rate as a function of reward size, allowing the magnitude effect to be measured. Demonstrations are provided to show how this analysis approach can aid hypothesis testing. The analysis is demonstrated on data from the popular 27-item monetary choice questionnaire (Kirby, Psychonomic Bulletin & Review, 16(3), 457–462, 2009), but will accept data from a range of protocols, including adaptive procedures. The software is made freely available to researchers.
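
To make the idea of comparing present subjective values concrete, here is a minimal sketch that assumes the widely used hyperbolic discount function V = A / (1 + kD) together with a logistic choice rule. Both functional forms, the parameter names (`k`, `sensitivity`), and the example amounts are assumptions for illustration; the abstract does not specify the model's exact form, and the sketch performs no Bayesian estimation.

```python
import math


def present_value(amount, delay, k):
    """Hyperbolically discounted value of `amount` received after `delay`."""
    return amount / (1.0 + k * delay)


def prob_choose_delayed(amount_now, amount_later, delay, k, sensitivity=1.0):
    """Probability of picking the delayed reward under a logistic choice rule.

    `sensitivity` (hypothetical name) controls how deterministically the
    larger subjective value is chosen.
    """
    diff = present_value(amount_later, delay, k) - present_value(amount_now, 0.0, k)
    return 1.0 / (1.0 + math.exp(-sensitivity * diff))


# Made-up trial: 50 units now versus 100 units after a delay of 10 time units.
for k in (0.01, 0.1, 1.0):
    print(k, prob_choose_delayed(50, 100, 10, k))
```

A larger discount rate `k` shrinks the subjective value of the delayed reward, so the probability of choosing it falls as `k` grows; with `k = 0.1` the two options in this example have equal subjective value.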

Standard least squares analysis of variance methods suffer from poor power under arbitrarily small departures from normality and fail to control the probability of a Type I error when standard assumptions are violated. This article describes a framework for robust estimation and testing that uses trimmed means with an approximate degrees of freedom heteroscedastic statistic for independent and correlated groups designs in order to achieve robustness to the biasing effects of nonnormality and variance heterogeneity. The authors describe a nonparametric bootstrap methodology that can provide improved Type I error control. In addition, the authors indicate how researchers can set robust confidence intervals around a robust effect size parameter estimate. In an online supplement, the authors use several examples to illustrate the application of an SAS program to implement these statistical methods.
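
The two building blocks of trimmed-mean methods, the trimmed mean and the Winsorized variance, can be sketched in a few lines. The 20% symmetric trimming proportion, the function names, and the example data are assumptions for illustration; this is not the authors' SAS program, and it does not compute the heteroscedastic test statistic or the bootstrap.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of observations."""
    xs = sorted(values)
    g = math.floor(proportion * len(xs))  # observations trimmed from each tail
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_variance(values, proportion=0.2):
    """Sample variance after pulling each tail in to the nearest retained value."""
    xs = sorted(values)
    n = len(xs)
    g = math.floor(proportion * n)
    winsorized = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    mean = sum(winsorized) / n
    return sum((x - mean) ** 2 for x in winsorized) / (n - 1)


# Made-up data with one gross outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data), trimmed_mean(data), winsorized_variance(data))
```

The ordinary mean of the example data is pulled far upward by the single outlier, whereas the trimmed mean stays near the centre of the bulk of the observations.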

Discretized multivariate normal structural models are often estimated using multistage estimation procedures. The asymptotic properties of parameter estimates, standard errors, and tests of structural restrictions on thresholds and polychoric correlations are well known. It was not clear how to assess the overall discrepancy between the contingency table and the model for these estimators. It is shown that the overall discrepancy can be decomposed into a distributional discrepancy and a structural discrepancy. A test of the overall model specification is proposed, as well as a test of the distributional specification (i.e., discretized multivariate normality). Also, the small sample performance of overall, distributional, and structural tests, as well as of parameter estimates and standard errors is investigated under conditions of correct model specification and also under mild structural and/or distributional misspecification. It is found that relatively small samples are needed for parameter estimates, standard errors, and structural tests. Larger samples are needed for the distributional and overall tests. Furthermore, parameter estimates, standard errors, and structural tests are surprisingly robust to distributional misspecification. This research was supported by the Department of Universities, Research and Information Society (DURSI) of the Catalan Government, and by grants BSO2000-0661 and BSO2003-08507 of the Spanish Ministry of Science and Technology.

Quantile maximum likelihood (QML) is an estimation technique, proposed by Heathcote, Brown, and Mewhort (2002), that provides robust and efficient estimates of distribution parameters, typically for response time data, in sample sizes as small as 40 observations. In view of the computational difficulty inherent in implementing QML, we provide open-source Fortran 90 code that calculates QML estimates for parameters of the ex-Gaussian distribution, as well as standard maximum likelihood estimates. We show that parameter estimates from QML are asymptotically unbiased and normally distributed. Our software provides asymptotically correct standard error and parameter intercorrelation estimates, as well as producing the outputs required for constructing quantile-quantile plots. The code is parallelizable and can easily be modified to estimate parameters from other distributions. Compiled binaries, as well as the source code, example analysis files, and a detailed manual, are available for free on the Internet.

This paper presents a row-column (RC) association model in which the estimated row and column scores are forced to be in agreement with an a priori specified ordering. Two efficient algorithms for finding the order-restricted maximum likelihood (ML) estimates are proposed and their reliability under different degrees of association is investigated by a simulation study. We propose testing order-restricted RC models using a parametric bootstrap procedure, which turns out to yield reliable p values, except for situations in which the association between the two variables is very weak. The use of order-restricted RC models is illustrated by means of an empirical example. Francisca Galindo performed this research as a part of her PhD dissertation project at Tilburg University.

Computerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. Two approaches have been explored for adaptive testing in computerized personality assessment: item response theory and the countdown method. In this article, the authors review the literature on each and report the results of an investigation designed to explore the utility, in terms of item and time savings, and validity, in terms of correlations with external criterion measures, of an expanded countdown method-based research version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), the MMPI-2 Computerized Adaptive Version (MMPI-2-CA). Participants were 433 undergraduate college students (170 men and 263 women). Results indicated considerable item savings and corresponding time savings for the adaptive testing modalities compared with a conventional computerized MMPI-2 administration. Furthermore, computerized adaptive administration yielded comparable results to computerized conventional administration of the MMPI-2 in terms of both test scores and their validity. Future directions for computerized adaptive personality testing are discussed.

The following problem is considered: Given that the frequency distribution of the errors of measurement is known, determine or estimate the distribution of true scores from the distribution of observed scores for a group of examinees. Typically this problem does not have a unique solution. However, if the true-score distribution is smooth, then any two smooth solutions to the problem will differ little from each other. Methods for finding smooth solutions are developed a) for a population and b) for a sample of examinees. The results of a number of tryouts on actual test data are summarized. The writer wishes to thank Diana Lees and Virginia Lennon, who wrote the computer programs, carried out some of the mathematical derivations, and helped with other important aspects of the work. This work was supported in part by contract Nonr-2752(00) between the Office of Naval Research and Educational Testing Service. Reproduction, translation, use and disposal in whole or in part by or for the United States Government is permitted.

Abstract: In test operations using IRT (item response theory), items are included in a test before being used to rate subjects and the response data is used to estimate their item parameters. However, this method of test operation may lead to item content leakage and an adequate test operation can become difficult. To address this problem, Ozaki and Toyoda (2005, 2006) developed item difficulty parameter estimation methods that use paired comparison data from the perspective of the difficulty of items as judged by raters familiar with the field. In the present paper, an improved method of item difficulty parameter estimation is developed. In this new method, an item for which the difficulty parameter is to be estimated is compared with multiple items simultaneously, from the perspective of their difficulty. This is not a one-to-one comparison but a one-to-many comparison. In the comparisons, raters are informed that items selected from an item pool are ordered according to difficulty. The order will provide insight to improve the accuracy of judgment.

We relate Thurstonian models for paired comparisons data to Thurstonian models for ranking data, which assign zero probabilities to all intransitive patterns. We also propose an intermediate model for paired comparisons data that assigns nonzero probabilities to all transitive patterns and to some but not all intransitive patterns. There is a close correspondence between the multidimensional normal ogive model employed in educational testing and Thurstone's model for paired comparisons data under multiple judgment sampling with minimal identification restrictions. Like the normal ogive model, Thurstonian models have two formulations, a factor analytic and an IRT formulation. We use the factor analytic formulation to estimate this model from the first and second order marginals of the contingency table using estimators proposed by Muthén. We also propose a statistic to assess the fit of these models to the first and second order marginals of the contingency table. This is important, as a model may reproduce well the estimated thresholds and tetrachoric correlations, yet fail to reproduce the marginals of the contingency table if the assumption of multivariate normality is incorrect. A simulation study is performed to investigate the performance of three alternative limited information estimators which differ in the procedure used in their final stage: unweighted least squares (ULS), diagonally weighted least squares (DWLS), and full weighted least squares (WLS). Both the ULS and DWLS show a good performance with medium size problems and small samples, with a slightly better performance of the ULS estimator. This paper is based on the author's doctoral dissertation; Ulf Böckenholt, advisor. The final stages of this research took place while the author was at the Department of Statistics and Econometrics, Universidad Carlos III de Madrid. The author is indebted to Adolfo Hernández for stimulating discussions that helped improve this paper, and to Ulf Böckenholt and the Associate Editor for a number of helpful suggestions on a previous draft.

A measure of the discrepancy between observed transition frequencies and those predicted by a learning model, an average error of a learning model, is presented. The maximum-likelihood estimator of the average error is derived and its use in a modified test of goodness of fit is demonstrated.

In this paper we introduce a novel reinforcement learning algorithm called event-learning. The algorithm uses events, ordered pairs of two consecutive states. We define an event-value function and derive learning rules. Combining our method with a well-known robust control method, the SDS algorithm, we introduce Robust Policy Heuristics (RPH). It is shown that RPH, a fast-adapting non-Markovian policy, is particularly useful for coarse models of the environment and could be useful for some partially observed systems. RPH may be of help in alleviating the ‘curse of dimensionality’ problem. Event-learning and RPH can be used to separate time scales of learning of value functions and adaptation. We argue that the definition of modules is straightforward for event-learning and that event-learning makes planning feasible in the RL framework. Computer simulations of a rotational inverted pendulum with coarse discretization are shown to demonstrate the principle.

