Similar Articles
1.
A test theory using only ordinal assumptions is presented. It is based on the idea that the test items are a sample from a universe of items. The sum across items of the ordinal relations for a pair of persons on the universe items is analogous to a true score. Using concepts from ordinal multiple regression, it is possible to estimate the tau correlations of test items with the universe order from the taus among the test items. These in turn permit the estimation of the tau of total score with the universe. It is also possible to estimate the odds that the direction of a given observed score difference is the same as that of the true score difference. The estimates of the correlations between items and universe and between total score and universe are found to agree well with the actual values in both real and artificial data. Part of this paper was presented at the June 1989 Meeting of the Psychometric Society. The authors wish to thank several reviewers for their suggestions. This research was mainly done while the second author was a University Fellow at the University of Southern California.
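The tau correlations on which this ordinal test theory rests can be illustrated with a short, stdlib-only Python sketch of Kendall's tau-a (the function name and toy data are mine, not the paper's):

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:        # pair ordered the same way on both variables
                score += 1
            elif prod < 0:      # pair ordered oppositely on the two variables
                score -= 1
    return score / (n * (n - 1) / 2)
```

In the paper's framework, taus of this kind among observed items are the raw material from which the item-universe and total-score-universe taus are estimated.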

2.
E. Maris 《Psychometrika》1998,63(1):65-71
In the context of conditional maximum likelihood (CML) estimation, confidence intervals can be interpreted in three different ways, depending on the sampling distribution under which these confidence intervals contain the true parameter value with a certain probability. These sampling distributions are (a) the distribution of the data given the incidental parameters, (b) the marginal distribution of the data (i.e., with the incidental parameters integrated out), and (c) the conditional distribution of the data given the sufficient statistics for the incidental parameters. Results on the asymptotic distribution of CML estimates under sampling scheme (c) can be used to construct asymptotic confidence intervals using only the CML estimates. This is not possible for the results on the asymptotic distribution under sampling schemes (a) and (b). However, it is shown that the conditional asymptotic confidence intervals are also valid under the other two sampling schemes. I am indebted to Theo Eggen, Norman Verhelst and one of Psychometrika's reviewers for their helpful comments.

3.
Meta-analyses of correlation coefficients are an important technique to integrate results from many cross-sectional and longitudinal research designs. Uncertainty in pooled estimates is typically assessed with the help of confidence intervals, which can double as hypothesis tests for two-sided hypotheses about the underlying correlation. A standard approach to construct confidence intervals for the main effect is the Hedges-Olkin-Vevea Fisher-z (HOVz) approach, which is based on the Fisher-z transformation. Results from previous studies (Field, 2005, Psychol. Meth., 10, 444; Hafdahl and Williams, 2009, Psychol. Meth., 14, 24), however, indicate that in random-effects models the performance of the HOVz confidence interval can be unsatisfactory. To address this, we propose improvements of the HOVz approach, which are based on enhanced variance estimators for the main effect estimate. In order to study the coverage of the new confidence intervals in both fixed- and random-effects meta-analysis models, we perform an extensive simulation study, comparing them to established approaches. Data were generated via a truncated normal and beta distribution model. The results show that our newly proposed confidence intervals based on a Knapp-Hartung-type variance estimator or robust heteroscedasticity consistent sandwich estimators in combination with the integral z-to-r transformation (Hafdahl, 2009, Br. J. Math. Stat. Psychol., 62, 233) provide more accurate coverage than existing approaches in most scenarios, especially in the more appropriate beta distribution simulation model.
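For concreteness, the baseline HOVz interval (before the enhancements the paper proposes) can be sketched in a few lines of stdlib Python; the fixed-effect inverse-variance weighting below is the textbook version, and the function name is mine:

```python
import math

def hovz_ci(rs, ns, z_crit=1.96):
    """Fixed-effect HOVz confidence interval for a pooled correlation.

    rs: per-study correlations, ns: per-study sample sizes.
    Each Fisher-z value atanh(r_i) has approximate variance 1/(n_i - 3).
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]                    # inverse-variance weights
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    lo_z, hi_z = z_bar - z_crit * se, z_bar + z_crit * se
    # back-transform the bounds to the correlation scale
    return math.tanh(lo_z), math.tanh(z_bar), math.tanh(hi_z)
```

The paper's improvements replace the naive variance 1/sum(ws) with Knapp-Hartung-type or sandwich estimators, and the point back-transform with the integral z-to-r transformation.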

4.
Reliability of scores from psychological or educational assessments provides important information regarding the precision of measurement. The reliability of scores is however population dependent and may vary across groups. In item response theory, this population dependence can be attributed to differential item functioning or to differences in the latent distributions between groups and needs to be accounted for when estimating the reliability of scores for different groups. Here, we introduce group-specific and overall reliability coefficients for sum scores and maximum likelihood ability estimates defined by a multiple group item response theory model. We derive confidence intervals using asymptotic theory and evaluate the empirical properties of estimators and the confidence intervals in a simulation study. The results show that the estimators are largely unbiased and that the confidence intervals are accurate with moderately large sample sizes. We exemplify the approach with the Montreal Cognitive Assessment (MoCA) in two groups defined by education level and give recommendations for applied work.

5.
Formulas are derived for unbiased sample estimators of any raw or central moment of the frequency distribution of true scores. A general method is developed for obtaining from each examinee's observed score a least squares estimate of his true score. This research was carried out under contract Nonr-2214(00) with the Office of Naval Research, Department of the Navy.
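In the simplest linear case, the least-squares estimate of a true score from an observed score reduces to Kelley's classical regression toward the group mean; a minimal Python sketch (function name mine):

```python
def kelley_true_score(x, group_mean, reliability):
    """Least-squares linear estimate of a true score from observed score x:
    shrink the observed score toward the group mean by the reliability."""
    return reliability * x + (1 - reliability) * group_mean
```

With reliability 1 the observed score is taken at face value; with reliability 0 every examinee is estimated at the group mean.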

6.
7.
Principal covariate regression (PCOVR) is a method for regressing a set of criterion variables with respect to a set of predictor variables when the latter are many in number and/or collinear. This is done by extracting a limited number of components that simultaneously synthesize the predictor variables and predict the criterion ones. So far, no procedure has been offered for estimating statistical uncertainties of the obtained PCOVR parameter estimates. The present paper shows how this goal can be achieved, conditionally on the model specification, by means of the bootstrap approach. Four strategies for estimating bootstrap confidence intervals are derived and their statistical behaviour in terms of coverage is assessed by means of a simulation experiment. Such strategies are distinguished by the use of the varimax and quartimin procedures and by the use of Procrustes rotations of bootstrap solutions towards the sample solution. In general, the four strategies showed appropriate statistical behaviour, with coverage tending to the desired level for increasing sample sizes. The main exception involved strategies based on the quartimin procedure in cases characterized by complex underlying structures of the components. The appropriateness of the statistical behaviour was higher when the proper number of components was extracted.
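The bootstrap machinery underlying such intervals is generic; a percentile-interval sketch in stdlib Python is below (the rotation and Procrustes steps specific to PCOVR are omitted, and all names are illustrative):

```python
import random

def bootstrap_percentile_ci(data, stat, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])  # one resample
        for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

In the paper's setting, `stat` would refit the PCOVR model on each resample and rotate the bootstrap solution toward the sample solution before the percentiles are taken.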

8.
A modified beta-binomial model is presented for use in analyzing random-guessing multiple choice tests and certain forms of taste tests. Detection probabilities for each item are distributed beta across the population of subjects. Properties of the observable distribution of correct responses are derived. Two concepts of true score estimates are presented. One, analogous to Duncan's empirical Bayes posterior mean score, is appropriate for assessing the subject's performance on that particular test. The second is more suitable for predicting outcomes on similar tests. This research was made possible by a grant from the Center for Food Policy Research, Graduate School of Business, Columbia University.
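A stdlib Python sketch may make the beta-binomial setup concrete; the Beta(a, b) parameterization and all function names are mine, and the posterior mean shown is the standard conjugate result that the abstract's first estimator is analogous to:

```python
from math import comb, gamma

def beta_fn(x, y):
    """Beta function B(x, y) via the gamma function."""
    return gamma(x) * gamma(y) / gamma(x + y)

def beta_binomial_pmf(k, n, a, b):
    """P(K = k correct out of n) when the detection probability is Beta(a, b)."""
    return comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)

def posterior_mean_score(k, n, a, b):
    """Posterior mean of the detection probability after observing k of n correct."""
    return (a + k) / (a + b + n)
```

With a = b = 1 (a uniform prior on the detection probability), the score distribution is uniform on 0..n, a useful sanity check.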

9.
On cyclic-interval reinforcement schedules, animals typically show a postreinforcement pause that is a function of the immediately preceding time interval (temporal tracking). Animals, however, do not track single-alternation schedules, in which two different intervals are presented in strict alternation on successive trials. In this experiment, pigeons were first trained with a cyclic schedule consisting of alternating blocks of 12 short intervals (5 s or 30 s) and 12 long intervals (180 s), followed by three different single-alternation interval schedules: (a) 30 s and 180 s, (b) 5 s and 180 s, and (c) 5 s and 30 s. Pigeons tracked both schedules with alternating blocks of 12 intervals. With the single-alternation schedules, when the short interval duration was 5 s, regardless of the duration of the longer interval, pigeons learned the alternation pattern, and their pause anticipated the upcoming interval. When the shorter interval was 30 s, even when the ratio of short to long intervals was kept at 6:1, pigeons did not initially show anticipatory pausing, a violation of the principle of timescale invariance.

10.
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
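The profile-likelihood idea is easiest to see in a one-parameter toy case; the sketch below inverts the likelihood-ratio test for a binomial proportion (a stand-in, not an IRT model) on a grid, keeping every value whose log-likelihood is within chi-square(1)/2 of the maximum:

```python
import math

def profile_likelihood_ci(k, n, chi2_crit=3.84, grid=2000):
    """Grid-based profile-likelihood CI for a binomial proportion (0 < k < n)."""
    def loglik(p):
        return k * math.log(p) + (n - k) * math.log(1 - p)
    cutoff = loglik(k / n) - chi2_crit / 2   # Wilks: drop of chi2_{1,.95} / 2
    inside = [i / grid for i in range(1, grid) if loglik(i / grid) >= cutoff]
    return min(inside), max(inside)
```

Unlike a Wald interval, this interval is asymmetric around the estimate and respects the [0, 1] boundary, which is the kind of distinction the article demonstrates for IRT parameters.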

11.
The author compared simulations of the "true" null hypothesis (z) test, in which σ was known and fixed, with the t test, in which s, an estimate of σ, was calculated from the sample because the t test was used to emulate the "true" test. The true null hypothesis test bears exclusively on calculating the probability that a sample distance (mean) is larger than a specified value. The results showed that the value of t was sensitive to sampling fluctuations in both distance and standard error. Large values of t reflect small standard errors when n is small. The value of t achieves sensitivity primarily to distance only when the sample sizes are large. One cannot make a definitive statement about the probability or "significance" of a distance solely on the basis of the value of t.
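The contrast the author describes can be reproduced with a small Monte Carlo sketch (stdlib Python; the sample sizes, seed, and function name are mine): with σ known, the z statistic has unit variance under H0, while the t statistic, which re-estimates σ by s in every sample, is visibly more variable when n is small.

```python
import math, random

def simulate_z_and_t(n, n_reps=4000, sigma=1.0, seed=7):
    """Empirical H0 variances of z (sigma known) and t (sigma estimated by s)."""
    rng = random.Random(seed)
    zs, ts = [], []
    for _ in range(n_reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
        zs.append(m / (sigma / math.sqrt(n)))
        ts.append(m / (s / math.sqrt(n)))
    def var(v):
        mu = sum(v) / len(v)
        return sum((u - mu) ** 2 for u in v) / len(v)
    return var(zs), var(ts)
```

At n = 5 the excess variance of t over z comes entirely from sampling fluctuation in s, which is the author's point: a large t at small n may reflect a small standard error rather than a large distance.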

12.
The defects of the least-squares or multiple-regression equation approach to estimating orthogonal factors are discussed and transformations of the beta weights are derived which remove these defects with minimum loss in correlations between estimators and true factor scores. The opinions expressed in this paper are those of the author and do not necessarily reflect official Department of the Army policy.

13.
14.
In a survey of journal articles, test manuals, and test critique books, the author found that a mean sample size (N) of 260 participants had been used for reliability studies on 742 tests. The distribution was skewed because the median sample size for the total sample was only 90. The median sample sizes for the internal consistency, retest, and interjudge reliabilities were 182, 64, and 36, respectively. The author presented sample size statistics for the various internal consistency methods and types of tests. In general, the author found that the sample sizes that were used in the internal consistency studies were too small to produce sufficiently precise reliability coefficients, which in turn could cause imprecise estimates of examinee true-score confidence intervals. The results also suggest that larger sample sizes have been used in the last decade compared with those that were used in earlier decades.

15.
Classical reliability theory assumes that individuals have identical true scores on both testing occasions, a condition described as stable. If some individuals' true scores are different on different testing occasions, described as unstable, the estimated reliability can be misleading. A model called stable unstable reliability theory (SURT) frames stability or instability as an empirically testable question. SURT assumes a mixed population of stable and unstable individuals in unknown proportions, with w(i) the probability that individual i is stable. w(i) becomes i's test score weight which is used to form a weighted correlation coefficient r(w) which is reliability under SURT. If all w(i) = 1 then r(w) is the classical reliability coefficient; thus classical theory is a special case of SURT. Typically r(w) is larger than the conventional reliability r, and confidence intervals on true scores are typically shorter than conventional intervals. r(w) is computed with routines in a publicly available R package.
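The coefficient r(w) is a weighted Pearson correlation over the two testing occasions; a stdlib Python sketch follows (estimating the stability weights w(i) themselves, which the paper's R package does, is the hard part and is not shown):

```python
import math

def weighted_correlation(x, y, w):
    """Weighted Pearson correlation; with all weights equal to 1 it reduces
    to the classical test-retest reliability coefficient."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)
```

Down-weighting unstable individuals (w(i) near 0) removes their discordant score pairs from the correlation, which is why r(w) typically exceeds the conventional r.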

16.
Forecasts of future outcomes, such as the consequences of climate change, are given with different degrees of precision. Logically, more precise forecasts (e.g., a temperature increase of 3–4°) have a smaller probability of capturing the actual outcome than less precise forecasts (e.g., a temperature increase of 2–6°). Nevertheless, people often trust precise forecasts more than vague forecasts, perhaps because precision is associated with knowledge and expertise. In five experiments, we ask whether people expect highly confident forecasts to be associated with wider or narrower outcome ranges than less confident forecasts (Experiments 1, 2, and 5), and, conversely, whether they expect precise forecasts to be issued with higher or lower confidence than vague forecasts (Experiments 3 and 4). The results revealed two distinct ways of thinking about confidence intervals, labeled distributional (wide intervals seen as more probable than narrow intervals) and associative (wide intervals seen as more uncertain than narrow intervals). Distributional responses occurred somewhat more often in within‐subjects designs, where wide and narrow prediction intervals and high and low probability estimates can be directly compared, whereas separate evaluations (in between‐subjects design) suggested associative responses to be slightly more frequent. These findings are relevant for experts communicating forecasts through confidence intervals. Copyright © 2017 John Wiley & Sons, Ltd.

17.
The point estimate of sample coefficient alpha may provide a misleading impression of the reliability of the test score. Because sample coefficient alpha is consistently biased downward, it is more likely to yield a misleading impression of poor reliability. The magnitude of the bias is greatest precisely when the variability of sample alpha is greatest (small population reliability and small sample size). Taking into account the variability of sample alpha with an interval estimator may lead to retaining reliable tests that would be otherwise rejected. Here, the authors performed simulation studies to investigate the behavior of asymptotically distribution-free (ADF) versus normal-theory interval estimators of coefficient alpha under varied conditions. Normal-theory intervals were found to be less accurate when item skewness >1 or excess kurtosis >1. For sample sizes over 100 observations, ADF intervals are preferable, regardless of item skewness and kurtosis. A formula for computing ADF confidence intervals for coefficient alpha for tests of any size is provided, along with its implementation as an SAS macro.
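The point estimate in question is ordinary coefficient alpha; a stdlib Python sketch of the point estimate is below (the ADF interval itself requires fourth-order moments and is left to the paper's SAS macro):

```python
def cronbach_alpha(scores):
    """Coefficient alpha from an item-score matrix (rows = persons, columns = items)."""
    k = len(scores[0])                     # number of items
    def var(xs):                           # unbiased sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

When items are perfectly correlated the item variances sum to less than the total-score variance in exactly the ratio that drives alpha to 1.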

18.
This study investigated the influence of intertrial interval duration on the performance of autistic children during teaching situations. The children were taught under the same conditions existing in their regular programs, except that the length of time between trials was systematically manipulated. With both multiple baseline and repeated reversal designs, two lengths of intertrial interval were employed: short intervals with the SD for any given trial presented approximately one second following the reinforcer for the previous trial versus long intervals with the SD presented four or more seconds following the reinforcer for the previous trial. The results showed that: (1) the short intertrial intervals always produced higher levels of correct responding than the long intervals; and (2) there were improving trends in performance and rapid acquisition with the short intertrial intervals, in contrast to minimal or no change with the long intervals. The results are discussed in terms of utilizing information about child and task characteristics when selecting optimal intervals. The data suggest that manipulations made between trials have a large influence on autistic children's learning.

19.
Finite sample inference procedures are considered for analyzing the observed scores on a multiple choice test with several items, where, for example, the items are dissimilar, or the item responses are correlated. A discrete p-parameter exponential family model leads to a generalized linear model framework and, in a special case, a convenient regression of true score upon observed score. Techniques based upon the likelihood function, Akaike's information criteria (AIC), an approximate Bayesian marginalization procedure based on conditional maximization (BCM), and simulations for exact posterior densities (importance sampling) are used to facilitate finite sample investigations of the average true score, individual true scores, and various probabilities of interest. A simulation study suggests that, when the examinees come from two different populations, the exponential family can adequately generalize Duncan's beta-binomial model. Extensions to regression models, the classical test theory model, and empirical Bayes estimation problems are mentioned. The Duncan, Keats, and Matsumura data sets are used to illustrate potential advantages and flexibility of the exponential family model, and the BCM technique. The authors wish to thank Ella Mae Matsumura for her data set and helpful comments, Frank Baker for his advice on item response theory, Hirotugu Akaike and Taskin Atilgan, for helpful discussions regarding AIC, Graham Wood for his advice concerning the class of all binomial mixture models, Yiu Ming Chiu for providing useful references and information on tetrachoric models, and the Editor and two referees for suggesting several references and alternative approaches.
