Similar literature
20 similar documents found (search time: 0 ms)
1.
We present a general sampling procedure to quantify model mimicry, defined as the ability of a model to account for data generated by a competing model. This sampling procedure, called the parametric bootstrap cross-fitting method (PBCM; cf. Williams (J. R. Statist. Soc. B 32 (1970) 350; Biometrics 26 (1970) 23)), generates distributions of differences in goodness-of-fit expected under each of the competing models. In the data-informed version of the PBCM, the generating models have specific parameter values obtained by fitting the experimental data under consideration. The data-informed difference distributions can be compared to the observed difference in goodness-of-fit to allow a quantification of model adequacy. In the data-uninformed version of the PBCM, the generating models have a relatively broad range of parameter values based on prior knowledge. Application of both the data-informed and the data-uninformed PBCM is illustrated with several examples.
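The data-informed procedure described above can be sketched in a few lines. Everything concrete here is an illustrative assumption, not the paper's setup: two toy competing models for positive data (exponential vs. half-normal), maximized log-likelihood as the goodness-of-fit index, and arbitrary bootstrap counts.

```python
import numpy as np

def ll_exp(x):
    # Maximized exponential log-likelihood (MLE rate = 1 / mean).
    lam = 1.0 / x.mean()
    return len(x) * np.log(lam) - lam * x.sum()

def ll_halfnorm(x):
    # Maximized half-normal log-likelihood (MLE sigma^2 = mean of x^2).
    s2 = (x ** 2).mean()
    return 0.5 * len(x) * np.log(2.0 / (np.pi * s2)) - (x ** 2).sum() / (2.0 * s2)

def pbcm(observed, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(observed)
    # Step 1: fit both models to the observed data (data-informed parameters).
    scale_hat = observed.mean()
    sigma_hat = np.sqrt((observed ** 2).mean())
    diffs = {"gen_exp": [], "gen_halfnorm": []}
    for _ in range(n_boot):
        # Step 2: simulate from each fitted generator and refit BOTH models,
        # recording the goodness-of-fit difference under each generator.
        x_a = rng.exponential(scale_hat, n)
        x_b = np.abs(rng.normal(0.0, sigma_hat, n))
        diffs["gen_exp"].append(ll_exp(x_a) - ll_halfnorm(x_a))
        diffs["gen_halfnorm"].append(ll_exp(x_b) - ll_halfnorm(x_b))
    # Step 3: the observed fit difference is then located relative to these
    # two difference distributions to quantify model adequacy.
    return {k: np.array(v) for k, v in diffs.items()}
```

The two returned distributions play the role of the PBCM reference distributions: if the observed difference falls squarely inside one of them, the corresponding generator is the better account.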

2.
Experiments often produce a hit rate and a false alarm rate in each of two conditions. These response rates are summarized into a single-point sensitivity measure such as d', and t tests are conducted to test for experimental effects. Using large-scale Monte Carlo simulations, we evaluate the Type I error rates and power that result from four commonly used single-point measures: d', A', percent correct, and gamma. We also test a newly proposed measure called gammaC. For all measures, we consider several ways of handling cases in which false alarm rate = 0 or hit rate = 1. The results of our simulations indicate that power is similar for these measures but that the Type I error rates are often unacceptably high. Type I errors are minimized when the selected sensitivity measure is theoretically appropriate for the data.
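A minimal sketch of the d' computation and of one common way of handling the boundary cases the abstract mentions (hit rate = 1 or false alarm rate = 0): the log-linear correction, which adds 0.5 to every response-count cell. The abstract does not specify which corrections were compared, so treat this particular rule as an illustrative choice.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections, loglinear=True):
    # d' = z(hit rate) - z(false alarm rate). The raw estimate is infinite
    # when hit rate = 1 or false alarm rate = 0; the log-linear correction
    # (add 0.5 to every cell) is one common way to keep it finite.
    if loglinear:
        h = (hits + 0.5) / (hits + misses + 1.0)
        f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    else:
        h = hits / (hits + misses)
        f = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf
    return z(h) - z(f)
```

Because the correction shrinks both rates toward 0.5, perfect performance yields a large but finite d' rather than infinity, which is what makes the measure usable in the t tests the abstract describes.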

3.
4.
Although many common uses of p-values for making statistical inferences in contemporary scientific research have been shown to be invalid, no one, to our knowledge, has adequately assessed the main original justification for their use, which is that they can help to control the Type I error rate (Neyman & Pearson, 1928, 1933). We address this issue head-on by asking a specific question: Across what domain, specifically, do we wish to control the Type I error rate? For example, do we wish to control it across all of science, across all of a specific discipline such as psychology, across a researcher's active lifetime, across a substantive research area, across an experiment, or across a set of hypotheses? In attempting to answer these questions, we show that each one leads to troubling dilemmas wherein controlling the Type I error rate turns out to be inconsistent with other scientific desiderata. This inconsistency implies that we must make a choice. In our view, the other scientific desiderata are much more valuable than controlling the Type I error rate and so it is the latter, rather than the former, with which we must dispense. But by doing so—that is, by eliminating the Type I error justification for computing and using p-values—there is even less reason to believe that p is useful for validly rejecting null hypotheses than previous critics have suggested.

5.
Many books on statistical methods advocate a ‘conditional decision rule’ when comparing two independent group means. This rule states that the decision as to whether to use a ‘pooled variance’ test that assumes equality of variance or a ‘separate variance’ Welch t test that does not should be based on the outcome of a variance equality test. In this paper, we empirically examine the Type I error rate of the conditional decision rule using four variance equality tests and compare this error rate to the unconditional use of either of the t tests (i.e. irrespective of the outcome of a variance homogeneity test) as well as several resampling‐based alternatives when sampling from 49 distributions varying in skewness and kurtosis. Several unconditional tests including the separate variance test performed as well as or better than the conditional decision rule across situations. These results extend and generalize the findings of previous researchers who have argued that the conditional decision rule should be abandoned.
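The conditional decision rule under study can be written down directly. This sketch uses Levene's test as the variance-equality pre-test; the paper compares four such tests, and which four is not stated in the abstract, so Levene is an assumption here.

```python
import numpy as np
from scipy import stats

def conditional_t(x, y, alpha=0.05):
    # The 'conditional decision rule': pre-test variance equality, then run
    # the pooled-variance t test if equality is retained and the
    # separate-variance Welch t test otherwise.
    _, p_var = stats.levene(x, y)
    equal_var = bool(p_var >= alpha)
    t, p = stats.ttest_ind(x, y, equal_var=equal_var)
    return t, p, "pooled" if equal_var else "welch"
```

The paper's conclusion points the other way: simply calling `stats.ttest_ind(x, y, equal_var=False)` unconditionally, i.e. always using the Welch test, performed as well as or better than this two-stage rule.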

6.
A Monte Carlo simulation was conducted to compare pairwise multiple comparison procedures. The number of means varied from 4 to 8 and the sample sizes varied from 2 to 500. Procedures were evaluated on the basis of Type I errors, any-pair power and all-pairs power. Two modifications of the Games and Howell procedure were shown to make it conservative. No procedure was found to be uniformly most powerful. For any-pair power the Games and Howell procedure was found to be generally most powerful even when applied at more stringent levels to control Type I errors. For all-pairs power the Peritz procedure applied with modified Brown–Forsythe tests was found to be most powerful in most conditions.

7.
Bootstrap and jackknife techniques are used to estimate ellipsoidal confidence regions of group stimulus points derived from INDSCAL. The validity of these estimates is assessed through Monte Carlo analysis. Asymptotic estimates of confidence regions based on a MULTISCALE solution are also evaluated. Our findings suggest that the bootstrap and jackknife techniques may be used to provide statements regarding the accuracy of the relative locations of points in space. Our findings also suggest that MULTISCALE asymptotic estimates of confidence regions based on small samples provide an optimistic view of the actual statistical reliability of the solution.

8.
In sparse tables for categorical data well-known goodness-of-fit statistics are not chi-square distributed. A consequence is that model selection becomes a problem. It has been suggested that a way out of this problem is the use of the parametric bootstrap. In this paper, the parametric bootstrap goodness-of-fit test is studied by means of an extensive simulation study; the Type I error rates and power of this test are studied under several conditions of sparseness. In the presence of sparseness, models were used that were likely to violate the regularity conditions. Besides bootstrapping the goodness-of-fit statistics usually used (full-information statistics), corrected versions of these statistics and a limited-information statistic are bootstrapped. These bootstrap tests were also compared to an asymptotic test using limited information. Results indicate that bootstrapping the usual statistics fails because these tests are too liberal, and that bootstrapping or asymptotically testing the limited-information statistic works better with respect to Type I error and outperforms the other statistics by far in terms of statistical power. The properties of all tests are illustrated using categorical Markov models.
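The core mechanics of a parametric bootstrap goodness-of-fit test can be sketched for the simplest case, a multinomial table and the Pearson X² statistic. The `model_probs` argument stands in for cell probabilities from a fitted model; the paper's models and its corrected and limited-information statistics are more elaborate than this.

```python
import numpy as np

def bootstrap_gof_p(observed, model_probs, n_boot=500, seed=0):
    # Parametric-bootstrap p-value for a Pearson X^2 statistic: instead of
    # trusting the chi-square reference distribution (which breaks down in
    # sparse tables), simulate tables from the fitted model and locate the
    # observed statistic within that simulated distribution.
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    n = observed.sum()
    expected = n * np.asarray(model_probs)

    def x2(table):
        return ((table - expected) ** 2 / expected).sum()

    stat = x2(observed)
    boot = np.array([x2(rng.multinomial(n, model_probs)) for _ in range(n_boot)])
    return (boot >= stat).mean()
```

The p-value is simply the proportion of model-generated tables that fit as badly as the observed one, so no distributional regularity conditions on X² are needed.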

9.
Controversy surrounds the relationship between the three Synoptic Gospels written by Mark, Matthew and Luke. In particular, researchers on the New Testament disagree about whether the similarities between these three Gospels can be attributed to copying from a common information source, or whether they are due to a reliance on common oral traditions. Two experiments were conducted to investigate the characteristics that might distinguish material that was orally transmitted from that which was copied. In Experiment 1, participants were asked to write short narratives about recent and historic events, using (a) no external sources, (b) an external source which was to be returned before the narrative was written, or (c) an external source which was retained while the narrative was written. Results showed that long sequences of common verbatim text occurred only when external sources could be retained while the account was written, suggesting behaviour indicative of copying. In Experiment 2, however, different genres of material were examined (jokes, aphorisms, and poetry). Results showed that while long sequences of more than 18 words in verbatim sequence might be evidence of copying where narrative material is concerned, it is not necessarily true for poetry or aphorisms, where it is possible to transmit from memory more than 18 words in exact sequence. We conclude that when Gospel instances of long sequences of similar material are examined, copying is a likely explanation. But such instances only represent a small proportion of the total number of parallels. The majority of parallel traditions appear to rely on memory, consistent with the experimental evidence presented here.

10.
11.
Principal components analysis (PCA) is used to explore the structure of data sets containing linearly related numeric variables. Alternatively, nonlinear PCA can handle possibly nonlinearly related numeric as well as nonnumeric variables. For linear PCA, the stability of its solution can be established under the assumption of multivariate normality. For nonlinear PCA, however, standard options for establishing stability are not provided. The authors use the nonparametric bootstrap procedure to assess the stability of nonlinear PCA results, applied to empirical data. They use confidence intervals for the variable transformations and confidence ellipses for the eigenvalues, the component loadings, and the person scores. They discuss the balanced version of the bootstrap, bias estimation, and Procrustes rotation. To provide a benchmark, the same bootstrap procedure is applied to linear PCA on the same data. On the basis of the results, the authors advise using at least 1,000 bootstrap samples, using Procrustes rotation on the bootstrap results, examining the bootstrap distributions along with the confidence regions, and merging categories with small marginal frequencies to reduce the variance of the bootstrap results.
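A sketch of the linear-PCA benchmark case only: nonparametric bootstrap of the component loadings with an orthogonal Procrustes rotation of each bootstrap solution toward the original solution, so components stay comparable across resamples. The nonlinear PCA machinery and the confidence-ellipse construction are beyond this fragment.

```python
import numpy as np

def bootstrap_pca_loadings(X, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Original loadings (rows of V^T from the SVD of the centered data).
    _, _, vt0 = np.linalg.svd(Xc, full_matrices=False)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))          # resample observations
        Xb = X[idx] - X[idx].mean(axis=0)
        _, _, vt = np.linalg.svd(Xb, full_matrices=False)
        # Orthogonal Procrustes: rotation R minimizing ||vt0 - R @ vt||_F,
        # which fixes arbitrary sign flips and component rotations.
        u, _, w = np.linalg.svd(vt0 @ vt.T)
        boot.append((u @ w) @ vt)
    return vt0, np.stack(boot)
```

Confidence regions for each loading would then be read off the spread of `boot` across resamples; per the article's advice, one would use at least 1,000 bootstrap samples in practice and inspect the full bootstrap distributions, not only the regions.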

12.
I compared the randomization/permutation test and the F test for a two-cell comparative experiment. I varied (1) the number of observations per cell, (2) the size of the treatment effect, (3) the shape of the underlying distribution of error, and (4) for cases with skewed error, whether or not the skew was correlated with the treatment. With normal error, there was little difference between the tests. When error was skewed, by contrast, the randomization test was more sensitive than the F test, and if the amount of skew was correlated with the treatment, the advantage for the randomization test was both large and positively correlated with the treatment. I conclude that, because the randomization test was never less powerful than the F test, it should replace the F test in routine work.
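For a two-cell design the randomization test is short enough to write out in full. This is a generic Monte Carlo approximation to the full permutation distribution, using the absolute mean difference as the test statistic; the statistic and permutation count are conventional choices, not details taken from the paper.

```python
import numpy as np

def randomization_p(x, y, n_perm=2000, seed=0):
    # Two-sided randomization test for a two-cell comparison: shuffle the
    # cell labels many times and ask how often the shuffled absolute mean
    # difference reaches the observed one. The +1 terms give the standard
    # Monte Carlo correction so the p-value is never exactly zero.
    rng = np.random.default_rng(seed)
    obs = abs(np.mean(x) - np.mean(y))
    pooled = np.concatenate([x, y])
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:n_x].mean() - pooled[n_x:].mean()) >= obs
    return (hits + 1) / (n_perm + 1)
```

Unlike the F test, this procedure makes no normality assumption about the error distribution, which is why skewed error is where the two tests diverge.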

13.
Advance preparation reduces RT task-switching cost, which is thought to be evidence of preparatory control in the cuing task-switching paradigm. In the present study, we emphasize errors in relation to response speed. In two experiments, we show that (1) task switching increased the rate at which the currently irrelevant task was erroneously executed ("task errors") and (2) advance task preparation reduced the task error rate to that seen in nonswitch trials. The implications of the results to the hypothesis concerning task-specific preparation are discussed.

14.
15.
In the course of running an eye-tracking experiment, one computer system or subsystem typically presents the stimuli to the participant and records manual responses, and another collects the eye movement data, with little interaction between the two during the course of the experiment. This article demonstrates how the two systems can interact with each other to facilitate a richer set of experimental designs and applications and to produce more accurate eye tracking data. In an eye-tracking study, a participant is periodically instructed to look at specific screen locations, or explicit required fixation locations (RFLs), in order to calibrate the eye tracker to the participant. The design of an experimental procedure will also often produce a number of implicit RFLs: screen locations that the participant must look at within a certain window of time or at a certain moment in order to successfully and correctly accomplish a task, but without explicit instructions to fixate those locations. In these windows of time or at these moments, the disparity between the fixations recorded by the eye tracker and the screen locations corresponding to implicit RFLs can be examined, and the results of the comparison can be used for a variety of purposes. This article shows how the disparity can be used to monitor the deterioration in the accuracy of the eye tracker calibration and to automatically invoke a re-calibration procedure when necessary. This article also demonstrates how the disparity will vary across screen regions and participants and how each participant's unique error signature can be used to reduce the systematic error in the eye movement data collected for that participant.
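The drift-monitoring idea reduces to comparing recorded fixations against the implicit RFLs active at the same moments. In this sketch the pixel threshold, the use of a plain mean disparity, and the function name are all illustrative assumptions; the article's actual criterion is not given in the abstract.

```python
import numpy as np

def drift_check(fixations, rfl_points, threshold=40.0):
    # Mean Euclidean disparity (in pixels) between recorded fixations and
    # the implicit required fixation locations active at the same moments.
    # If accuracy has drifted past `threshold`, signal that the
    # re-calibration procedure should be invoked.
    fixations = np.asarray(fixations, dtype=float)
    rfl_points = np.asarray(rfl_points, dtype=float)
    d = np.linalg.norm(fixations - rfl_points, axis=1)
    return bool(d.mean() > threshold), float(d.mean())
```

Keeping each participant's per-region disparities (their error signature) would additionally allow subtracting systematic error from later recordings, as the article describes.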

16.
The correlation between a short-form (SF) test and its full-scale (FS) counterpart is a mainstay in the evaluation of SF validity. However, in correcting for overlapping error variance in this measure, investigators have overattenuated the validity coefficient through an intuitive misapplication of P. Levy's (1967) formula. The authors of the present article clarify that such corrections should be based on subtest-level versus FS-level data. Additionally, the authors propose a simple, modified equation incorporating FS-level scores that provides liberal and conservative validity measures for comparison across estimation methods, and they demonstrate its use in both a normative (N = 2,450) and clinical psychiatric (N = 216) sample.

17.
Nonparametric item response theory methods were applied to the responses of 1,000 college students on the 64 items of the Inventory of Interpersonal Problems-Circumplex (IIP-C; Alden, Wiggins, & Pincus, 1990) to develop an abbreviated 32-item version of the instrument. In a separate validation sample of 981 students, the newly selected scale items did not show evidence of differential item functioning across males and females. There was high convergence found between the new scales and IIP-C parent scales, along with commensurate or improved fits to the circular structural model relative to the full scale and its existing brief derivatives, the IIP-32 and the IIP-SC. Results provide evidence that the new brief scales can improve the level of precision and information yielded in brief assessments of interpersonal problems without gender bias.

18.
Averaging across observers is common in psychological research. Often, averaging reduces the measurement error and, thus, does not affect the inference drawn about the behavior of individuals. However, in other situations, averaging alters the structure of the data qualitatively, leading to an incorrect inference about the behavior of individuals. In this research, the influence of averaging across observers on the fits of decision bound models (Ashby, 1992a) and generalized context models (GCM; Nosofsky, 1986) was investigated through Monte Carlo simulation of a variety of categorization conditions, perceptual representations, and individual difference assumptions and in an experiment. The results suggest that (1) averaging has little effect when the GCM is the correct model, (2) averaging often improves the fit of the GCM and worsens the fit of the decision bound model when the decision bound model is the correct model, (3) the GCM is quite flexible and, under many conditions, can mimic the predictions of the decision bound model, whereas the decision bound model is generally unable to mimic the predictions of the GCM, (4) the validity of the decision bound model's perceptual representation assumption can have a large effect on the inference drawn about the form of the decision bound, and (5) the experiment supported the claim that averaging improves the fit of the GCM. These results underscore the importance of performing single-observer analysis if one is interested in understanding the categorization performance of individuals.

19.
English, French, and bilingual English-French 17-month-old infants were compared for their performance on a word learning task using the Switch task. Object names presented a /b/ vs. /g/ contrast that is phonemic in both English and French, and auditory strings comprised English and French pronunciations by an adult bilingual. Infants were habituated to two novel objects labeled 'bowce' or 'gowce' and were then presented with a switch trial where a familiar word and familiar object were paired in a novel combination, and a same trial with a familiar word–object pairing. Bilingual infants looked significantly longer to switch vs. same trials, but English and French monolinguals did not, suggesting that bilingual infants can learn word–object associations when the phonetic conditions favor their input. Monolingual infants likely failed because the bilingual mode of presentation increased phonetic variability and did not match their real-world input. Experiment 2 tested this hypothesis by presenting monolingual infants with nonce word tokens restricted to native language pronunciations. Monolinguals succeeded in this case. Experiment 3 revealed that the presence of unfamiliar pronunciations in Experiment 2, rather than a reduction in overall phonetic variability was the key factor to success, as French infants failed when tested with English pronunciations of the nonce words. Thus phonetic variability impacts how infants perform in the switch task in ways that contribute to differences in monolingual and bilingual performance. Moreover, both monolinguals and bilinguals are developing adaptive speech processing skills that are specific to the language(s) they are learning.

20.
A log-linear analysis (Grimshaw et al., 1994) of Fused Dichotic Words Test data was examined in a sample of 28 children with epilepsy who had undergone the intracarotid amobarbital procedure to determine the nature of speech representation. The analysis yields a measure of ear dominance (lambda*) that controls for stimulus dominance confounds. Most patients with unilateral speech obtained statistically significant ear advantages, whereas most with bilateral speech displayed no significant ear advantage. Despite controlling for stimulus dominance confounds, some of the scores from patients with left-hemisphere speech overlapped with those from patients with bilateral speech representation.
