Similar Articles
20 similar articles found
1.
Simulation studies have shown that the three-form planned missing data design efficiently collects high-quality data while reducing participant burden. This methodology is rarely used in sport and exercise psychology. Therefore, we conducted a re-sampling study with existing sport and exercise psychology survey data to test how a three-form planned missing data survey design, implemented with different item distribution approaches, affects constructs' internal measurement structure and validity. The results supported the efficacy of the three-form planned missing data survey design for cross-sectional data collection. Sample sizes of at least 300 (i.e., 100 per form) are recommended to obtain unbiased parameter estimates. It is also recommended that items be distributed across survey forms so that every facet of a construct is represented on each form, and that a select few of these items be included on all survey forms. Further guidelines for three-form surveys based on the results of this resampling study are provided.
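
To make the recommended item distribution concrete, here is a minimal sketch of the classic three-form allocation. The function name, the 25% common-block share, and the shuffle-then-split logic are our own illustrative choices, not prescriptions from the study:

```python
import random

def three_form_assignment(items, common_frac=0.25, seed=1):
    """Split a questionnaire into the X, A, B, C blocks of a
    three-form planned missing data design. Each form contains the
    common X block plus two of the three rotating blocks."""
    rng = random.Random(seed)
    pool = list(items)
    rng.shuffle(pool)
    n_common = round(len(pool) * common_frac)
    x, rest = pool[:n_common], pool[n_common:]
    third = len(rest) // 3
    a, b, c = rest[:third], rest[third:2 * third], rest[2 * third:]
    return {
        "Form 1": x + a + b,   # block C is planned missing
        "Form 2": x + a + c,   # block B is planned missing
        "Form 3": x + b + c,   # block A is planned missing
    }

forms = three_form_assignment([f"item_{i:02d}" for i in range(1, 25)])
for name, block in forms.items():
    print(name, len(block), "items")
```

Because every pair of rotating blocks co-occurs on some form, all pairwise item covariances remain estimable, which is what lets maximum likelihood or multiple imputation recover the planned-missing information.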

2.
Previous designs for online calibration have considered only examinees' responses to items. Response time, a useful metric that computers can easily collect, has not yet been embedded in calibration designs. In this article, we use response time to optimize the assignment of new items online and accordingly propose two new adaptive designs: the D-optimal per expectation time unit (D-ET) design and the D-optimal per time unit (D-T) design. The former uses the conditional maximum likelihood estimation (CMLE) method to estimate expected response times, while the latter employs the nonparametric k-nearest-neighbour method to predict response times. Simulations were conducted to compare the two new designs with the D-optimal online calibration design (D design) in the context of continuous online calibration. In addition, a preliminary study evaluated the performance of CMLE prior to its application in the D-ET design. The results showed that, compared to the D design, the D-ET and D-T designs saved response time and accrued more calibration information per time unit without sacrificing item calibration precision.
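
The per-time-unit idea can be sketched for a 2PL item whose parameters (a, b) are being calibrated. This is a toy paraphrase, not the paper's exact criterion: the actual D-ET and D-T designs predict response times with CMLE or k-nearest-neighbour models, which are omitted here, and the function name is ours:

```python
import numpy as np

def d_t_gain(info_so_far, theta, a, b, predicted_time):
    """Determinant gain in the accumulated Fisher information for the
    item parameters (a, b) if an examinee at ability theta answers the
    new item, divided by that examinee's predicted response time."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # 2PL response probability
    g = np.array([theta - b, -a])                # gradient of a * (theta - b)
    new_info = info_so_far + p * (1 - p) * np.outer(g, g)
    return (np.linalg.det(new_info) - np.linalg.det(info_so_far)) / predicted_time

info = 0.1 * np.eye(2)                           # information accumulated so far
for theta, t in [(-1.0, 30.0), (0.0, 45.0), (1.5, 20.0)]:
    print(theta, round(d_t_gain(info, theta, a=1.2, b=0.5, predicted_time=t), 5))
```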

3.
He, Yinhong, & Chen, Ping. (2020). Psychometrika, 85(1), 35-55.

Maintaining the item bank is essential for the continuous implementation of adaptive tests. Calibrating new items online provides an opportunity to efficiently replenish items in the operational item bank. In this study, a new optimal design for online calibration (referred to as D-c) is proposed by incorporating the idea of the original D-optimal design into the reformed D-optimal design of van der Linden and Ren (Psychometrika 80:263-288, 2015) (denoted as the D-VR design). To deal with the dependence of the design criteria on the unknown item parameters of new items, Bayesian versions of the locally optimal designs (e.g., D-c and D-VR) are put forward by adding prior information about the new items. In the simulation implementation of the locally optimal designs, five calibration sample sizes were used to obtain different levels of estimation precision for the initial item parameters, and two approaches were used to obtain the prior distributions in the Bayesian optimal designs. Results showed that the D-c design performed well and retired fewer new items than the D-VR design at almost all levels of examinee sample size; the Bayesian version of D-c using the prior obtained from the operational items worked better than that using the default priors in BILOG-MG and PARSCALE; and the Bayesian optimal designs generally outperformed the locally optimal designs when the initial item parameters of the new items were poorly estimated.
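
The Bayesian twist on local D-optimality can be sketched by averaging the design criterion over prior draws of the new item's parameters rather than plugging in a single point estimate. A minimal 2PL illustration with our own function name and a toy prior; this is not the paper's D-c criterion itself:

```python
import numpy as np

def bayesian_d_gain(info_so_far, theta, prior_draws):
    """Expected determinant gain in the item-parameter information at
    ability theta, averaged over prior draws (a, b) for the new item."""
    gains = []
    for a, b in prior_draws:
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        g = np.array([theta - b, -a])
        new_info = info_so_far + p * (1 - p) * np.outer(g, g)
        gains.append(np.linalg.det(new_info) - np.linalg.det(info_so_far))
    return float(np.mean(gains))

rng = np.random.default_rng(0)
draws = np.column_stack([rng.lognormal(0.0, 0.3, 200),   # discrimination prior
                         rng.normal(0.0, 1.0, 200)])     # difficulty prior
print(bayesian_d_gain(0.1 * np.eye(2), theta=0.4, prior_draws=draws))
```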

4.
A major challenge for representative longitudinal studies is panel attrition, because some respondents refuse to continue participating across all measurement waves. Depending on the nature of this selection process, statistical inferences based on the observed sample can be biased. Statistical analyses therefore need to consider the missing-data mechanism. Because each missing-data model hinges on frequently untestable assumptions, sensitivity analyses are indispensable for gauging the robustness of statistical inferences. This article highlights contemporary approaches for applied researchers to acknowledge missing data in longitudinal multilevel modeling and shows how sensitivity analyses can guide their interpretation. Using a representative sample of N = 13,417 German students, we examined the development of mathematical competence across three years by contrasting seven missing-data models, including listwise deletion, full-information maximum likelihood estimation, inverse probability weighting, multiple imputation, selection models, and pattern mixture models. These analyses identified strong selection effects related to various individual and context factors. Comparative analyses revealed that inverse probability weighting performed rather poorly in growth curve modeling. Moreover, school-specific effects should be acknowledged in missing-data models for educational data. Finally, we demonstrate how sensitivity analyses can be used to gauge the robustness of the identified effects.
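
A toy simulation makes the stakes concrete: when attrition depends on prior competence, the complete-case mean is biased, and inverse probability weighting corrects it. All variable names and numbers below are our own illustration; the response model is known by construction here, whereas in practice it would be estimated, e.g., with a logistic regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)                        # e.g., prior competence
y = 0.6 * x + rng.normal(scale=0.8, size=n)   # later competence

# Selective panel attrition: weaker students drop out more often
p_respond = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))
observed = rng.uniform(size=n) < p_respond

print("true mean          :", round(float(y.mean()), 3))
print("complete-case mean :", round(float(y[observed].mean()), 3))  # biased up

# Inverse probability weighting with the (here, known) response model
w = 1 / p_respond[observed]
print("IPW-adjusted mean  :", round(float(np.average(y[observed], weights=w)), 3))
```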

5.
The authors describe 2 efficiency (planned missing data) designs for measurement: the 3-form design and the 2-method measurement design. The 3-form design, a kind of matrix sampling, allows researchers to leverage limited resources to collect data for 33% more survey questions than can be answered by any 1 respondent. Power tables for estimating correlation effects illustrate the benefit of this design. The 2-method measurement design involves a relatively cheap, less valid measure of a construct and an expensive, more valid measure of the same construct. The cost effectiveness of this design stems from the fact that few cases have both measures and many cases have just the cheap measure. With 3 brief simulations involving structural equation models, the authors show that, compared with a complete-cases design of the same cost, a 2-method measurement design yields lower standard errors and a higher effective sample size for testing important study parameters. With a large cost differential between cheap and expensive measures and small effect sizes, the benefits of the design can be enormous. Strategies for using these 2 designs are suggested.

6.
We show that if overall sample size and effect size are held constant, the power of the F test for a one-way analysis of variance decreases dramatically as the number of groups increases. This reduction in power is even greater when the groups added to the design do not produce treatment effects. If a second independent variable is added to the design, either a split-plot or a completely randomized design may be employed. For the split-plot design, we show that the power of the F test on the between-groups factor decreases as the correlation across the levels of the within-groups factor increases. The attenuation in between-groups power becomes more pronounced as the number of levels of the within-groups factor increases. Sample size and total cost calculations are required to determine whether the split-plot or completely randomized design is more efficient in a particular application. The outcome hinges on the cost of obtaining (or recruiting) a single subject relative to the cost of obtaining a single observation: we call this the subject-to-observation cost (SOC) ratio. Split-plot designs are less costly than completely randomized designs only when the SOC ratio is high, the correlation across the levels of the within-groups factor is low, and the number of such levels is small.
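
The power loss from adding groups at fixed total N and effect size can be reproduced directly from the noncentral F distribution. A minimal sketch using Cohen's effect size f, with noncentrality parameter f²N; the numbers are illustrative, not taken from the article:

```python
from scipy.stats import f, ncf

def anova_power(n_total, k_groups, effect_f, alpha=0.05):
    """Power of the one-way ANOVA F test for Cohen's effect size f,
    holding total N fixed while the number of groups varies."""
    df1, df2 = k_groups - 1, n_total - k_groups
    crit = f.ppf(1 - alpha, df1, df2)
    lam = effect_f ** 2 * n_total          # noncentrality parameter
    return 1 - ncf.cdf(crit, df1, df2, lam)

for k in (2, 3, 4, 6, 8):                  # power falls as k grows
    print(k, round(anova_power(n_total=120, k_groups=k, effect_f=0.25), 3))
```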

7.
When a meta-analysis of results from experimental studies is conducted, differences in study design must be taken into consideration. A method for combining results across independent-groups and repeated measures designs is described, and the conditions under which such an analysis is appropriate are discussed. Combining results across designs requires that (a) all effect sizes be transformed into a common metric, (b) effect sizes from each design estimate the same treatment effect, and (c) meta-analysis procedures use design-specific estimates of sampling variance to reflect the precision of the effect size estimates.
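
Condition (a) typically amounts to rescaling change-score effect sizes into raw-score (independent-groups) standard deviation units. A minimal sketch of that common conversion, assuming the pre-post correlation r is known or can be imputed:

```python
import math

def rm_to_ig_effect_size(d_rm, r):
    """Convert a repeated measures effect size (mean change divided by
    the SD of difference scores) to the independent-groups metric
    (raw-score SD units), given the pre-post correlation r."""
    return d_rm * math.sqrt(2 * (1 - r))

print(round(rm_to_ig_effect_size(d_rm=0.80, r=0.70), 2))  # ~0.62
```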

8.
Recommended effect size statistics for repeated measures designs
Investigators, who are increasingly implored to present and discuss effect size statistics, might comply more often if they understood more clearly what is required. When investigators wish to report effect sizes derived from analyses of variance that include repeated measures, past advice has been problematic. Only recently has a generally useful effect size statistic been proposed for such designs: generalized eta squared (ηG²; Olejnik & Algina, 2003). Here, we present this method, explain that ηG² is preferred to eta squared and partial eta squared because it provides comparability across between-subjects and within-subjects designs, show that it can easily be computed from the information provided by standard statistical packages, and recommend that investigators provide it routinely in their research reports when appropriate.
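
For the simplest case, a one-way within-subjects design with a manipulated factor, ηG² puts the subject variance and the error variance in the denominator. A minimal sketch under that assumption; the general rules for mixed designs and measured factors are in Olejnik and Algina (2003):

```python
def generalized_eta_squared(ss_effect, ss_subjects, ss_error_terms):
    """Generalized eta squared for a manipulated within-subjects effect:
    effect SS over effect SS plus all subject-related variance sources."""
    return ss_effect / (ss_effect + ss_subjects + sum(ss_error_terms))

# SS_condition = 120, SS_subjects = 300, SS_condition-by-subject error = 180
print(generalized_eta_squared(120.0, 300.0, [180.0]))  # 0.2
```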

9.
In this paper, we study the effect of eliminating items from a scale so that only items that correlate highly are retained. Using a simulation, we estimate the impact on Cronbach's alpha as a function of the total number of items in a scale, the number of items chosen, the true correlation among the items, and the sample size. The results suggest that a substantial effect can exist. Not surprisingly, the effect is larger when sample sizes are smaller, when a smaller fraction of the original items is retained, and when there is greater variation in the true item-total correlations of the measures.
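
The capitalization-on-chance mechanism is easy to reproduce: pick the items with the highest item-total correlations in one sample, and alpha computed on that same sample will tend to exceed alpha for the same items in a fresh sample. A minimal simulation with illustrative parameter values of our own choosing:

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = data.shape[1]
    item_var = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(1)
n, k, keep = 100, 20, 8
true_score = rng.normal(size=(n, 1))
items = 0.5 * true_score + rng.normal(size=(n, k))        # congeneric items

# Keep the items with the highest (uncorrected) item-total correlations
r = [np.corrcoef(items[:, j], items.sum(axis=1))[0, 1] for j in range(k)]
best = np.argsort(r)[-keep:]

# Alpha for the same selected items is typically lower in an independent sample
fresh = 0.5 * rng.normal(size=(n, 1)) + rng.normal(size=(n, k))
print("alpha, selected items, same sample :", round(cronbach_alpha(items[:, best]), 3))
print("alpha, same items, fresh sample    :", round(cronbach_alpha(fresh[:, best]), 3))
```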

10.
This article reports the results of a study that located, digitized, and coded all 809 single-case designs appearing in 113 studies in the year 2008 in 21 journals in a variety of fields in psychology and education. Coded variables included the specific kind of design, number of cases per study, number of outcomes, data points and phases per case, and autocorrelations for each case. Although studies of the effects of interventions are a minority in these journals, within that category, single-case designs are used more frequently than randomized or nonrandomized experiments. The modal study uses a multiple-baseline design with 20 data points for each of three or four cases, where the aim of the intervention is to increase the frequency of a desired behavior; but these characteristics vary widely over studies. The average autocorrelation is near to but significantly different from zero; but autocorrelations are significantly heterogeneous. The results have implications for the contributions of single-case designs to evidence-based practice and suggest a number of future research directions.

11.
In the present study, we investigated the role of list composition in the testing effect. Across three experiments, participants learned items through study and initial testing or through study and restudy. List composition was manipulated such that tested and restudied items appeared either intermixed in the same lists (mixed lists) or in separate lists (pure lists). In Experiment 1, half of the participants received mixed lists and half received pure lists. In Experiment 2, all participants were given both mixed and pure lists. Experiment 3 followed Erlebacher's (Psychological Bulletin, 84, 212-219, 1977) method, such that mixed lists, pure tested lists, and pure restudied lists were given to independent groups. Across all three experiments, the final recall results revealed significant testing effects for both mixed and pure lists, with no reliable difference in the magnitude of the testing advantage across list designs. This finding suggests that the testing effect is not subject to a key boundary condition, list design, that impacts other memory phenomena, including the generation effect.

12.
Arnau, J., Bendayan, R., Blanca, M. J., & Bono, R. (2012). Psicothema, 24(3), 449-454.
This study aimed to evaluate the robustness of the linear mixed model, with the Kenward-Roger correction for degrees of freedom, when implemented in SAS PROC MIXED, using split-plot designs with small sample sizes. A Monte Carlo simulation design involving three groups and four repeated measures was used, assuming an unstructured covariance matrix to generate the data. The study variables were: sphericity, with epsilon values of 0.75 and 0.57; group sizes, equal or unequal; and shape of the distribution. As regards the latter, non-normal distributions were introduced, combining different values of kurtosis in each group. In the case of unbalanced designs, the effect of pairing (positive or negative) the degree of kurtosis with group size was also analysed. The results show that the Kenward-Roger procedure is liberal, particularly for the interaction effect, under certain conditions in which normality is violated. The relationship between the values of kurtosis in the groups and the pairing of kurtosis with group size are found to be relevant variables to take into account when applying this procedure.

13.
This article compares the use of single- and multiple-item pools with respect to test security against item sharing among some examinees in computerized testing. A simulation study was conducted to compare different pool designs using the item selection method of maximum item information with Sympson-Hetter exposure control and content balancing. The results from the simulation study indicate that two-pool designs are more resistant to item sharing than the single-pool design in terms of measurement precision in ability estimation. This article further characterizes the conditions under which employing a multiple-pool design is better than using a single, whole pool in terms of minimizing the number of compromised items encountered by examinees under a randomized item selection method. Although no current computerized testing program endorses the randomized item selection method, the results derived in this study can shed some light on item pool designs regarding test security for all item selection algorithms, especially those that try to equalize or balance item exposure rates by employing a randomized item selection method locally, such as the a-stratified-with-b-blocking method.
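
For reference, the administration step of Sympson-Hetter exposure control is a single pass down the information-ordered candidate list. A minimal sketch with hypothetical item names; in practice the K_i parameters are tuned iteratively in pre-operational simulations so that realized exposure rates stay below a target:

```python
import random

def sympson_hetter_select(candidates, k_params, rng):
    """Administration step of Sympson-Hetter exposure control: walk the
    candidate list (sorted by item information, best first) and
    administer item i with probability K_i, its exposure parameter."""
    for item in candidates:
        if rng.random() < k_params[item]:
            return item
    return candidates[-1]      # fallback if every candidate is skipped

k = {"item_a": 0.4, "item_b": 0.9, "item_c": 1.0}
print(sympson_hetter_select(["item_a", "item_b", "item_c"], k, random.Random(2)))
```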

14.
It is common practice in both randomized experiments and quasi-experiments to adjust for baseline characteristics when estimating the average effect of an intervention. The inclusion of a pre-test, for example, can reduce both the standard error of this estimate and, in non-randomized designs, its bias. At the same time, it is also standard to report the effect of an intervention in standardized effect size units, thereby making it comparable to other interventions and studies. Curiously, the estimation of this effect size, including covariate adjustment, has received little attention. In this article, we provide a framework for defining effect sizes in designs with a pre-test (e.g., difference-in-differences and analysis of covariance) and propose estimators of those effect sizes. The estimators and approximations to their sampling distributions are evaluated using a simulation study and then demonstrated using an example from published data.
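
One natural estimator in this framework divides the covariate-adjusted mean difference by an unadjusted pooled posttest SD, so the result stays on the same scale as an ordinary Cohen's d. A minimal sketch of that idea, not necessarily the exact estimator proposed in the article:

```python
import numpy as np

def ancova_effect_size(pre, post, group):
    """Covariate-adjusted standardized mean difference: the treatment
    coefficient from regressing post on group and pre, divided by the
    unadjusted pooled SD of the posttest."""
    X = np.column_stack([np.ones_like(pre), group, pre])
    beta = np.linalg.lstsq(X, post, rcond=None)[0]
    n0, n1 = (group == 0).sum(), (group == 1).sum()
    v0 = post[group == 0].var(ddof=1)
    v1 = post[group == 1].var(ddof=1)
    pooled_sd = np.sqrt(((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2))
    return beta[1] / pooled_sd

rng = np.random.default_rng(4)
n = 200
pre = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
post = 0.7 * pre + 0.4 * group + rng.normal(scale=0.7, size=n)
print(round(ancova_effect_size(pre, post, group), 2))
```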

15.
A composite step-down procedure, in which a set of step-down tests are summarized collectively with Fisher's combination statistic, was considered to test for multivariate mean equality in two-group designs. An approximate degrees of freedom (ADF) composite procedure based on trimmed/Winsorized estimators and a non-pooled estimate of error variance is proposed and compared to a composite procedure based on trimmed/Winsorized estimators and a pooled estimate of error variance. The step-down procedures were also compared to Hotelling's T² and Johansen's ADF global procedure based on trimmed estimators in a simulation study. Type I error rates of the pooled step-down procedure were sensitive to covariance heterogeneity in unbalanced designs; error rates were similar to those of Hotelling's T² across all of the investigated conditions. Type I error rates of the ADF composite step-down procedure were insensitive to covariance heterogeneity and, when sample size was small, less sensitive to the number of dependent variables than the error rates of Johansen's test. The ADF composite step-down procedure is recommended for testing hypotheses of mean equality in two-group designs, except when the data are sampled from populations with different degrees of multivariate skewness.
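
Fisher's combining function itself is one line: under the null, X = -2 Σ ln p_i is chi-squared with 2k degrees of freedom when the k component tests are independent. A minimal sketch of the combining statistic under that textbook assumption (the article's step-down construction handles the dependence among the component tests):

```python
import math
from scipy.stats import chi2

def fisher_combination(p_values):
    """Fisher's combining statistic X = -2 * sum(ln p_i), which is
    chi-squared with 2k df under the joint null for k independent tests."""
    x = -2 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return x, chi2.sf(x, df)

stat, p = fisher_combination([0.04, 0.20, 0.61])
print(round(stat, 2), round(p, 4))
```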

16.
This paper proposes a solution that generalizes the ideas of Brown and Forsythe to the problem of comparing hypotheses in two-way classification designs with heteroscedastic error structure. Unlike the standard analysis of variance, the proposed approach does not require the homogeneity assumption. A comprehensive simulation study, in which the sample size of the cells, the relationship between cell sizes and unequal variances, the degree of variance heterogeneity, and the population distribution shape were systematically manipulated, shows that the proposed approximation is generally robust when normality and variance homogeneity are jointly violated.
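
For orientation, the original one-way Brown-Forsythe statistic that this work generalizes replaces the ANOVA error term with a variance estimate that does not assume homogeneity. A sketch of the one-way version with a Satterthwaite-style approximate denominator df; the two-way generalization in the article is more involved:

```python
import numpy as np
from scipy.stats import f

def brown_forsythe(groups):
    """One-way Brown-Forsythe F* test for mean equality without assuming
    equal variances: between-groups SS over sum of (1 - n_j/N) * s_j^2."""
    means = np.array([np.mean(g) for g in groups])
    ns = np.array([len(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    n = ns.sum()
    grand = np.sum(ns * means) / n
    num = np.sum(ns * (means - grand) ** 2)
    c = (1 - ns / n) * variances
    f_star = num / c.sum()
    df1 = len(groups) - 1
    df2 = c.sum() ** 2 / np.sum(c ** 2 / (ns - 1))   # Satterthwaite approximation
    return f_star, df1, df2, f.sf(f_star, df1, df2)

rng = np.random.default_rng(5)
groups = [rng.normal(0, 1, 30), rng.normal(0.3, 2, 20), rng.normal(0, 3, 25)]
print(brown_forsythe(groups))
```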

17.
Convergent and discriminant validity of psychological constructs can best be examined in the framework of multitrait-multimethod (MTMM) analysis. To gain information at the level of single items, MTMM models for categorical variables have to be applied. The CTC(M−1) model is presented as an example of an MTMM model for ordinal variables. Based on an empirical application of the CTC(M−1) model, a complex simulation study was conducted to examine the sample size requirements of the robust weighted least squares mean- and variance-adjusted χ² test of model fit (WLSMV estimator) implemented in Mplus. In particular, the simulation study analysed the χ² approximation, the parameter estimation bias, the standard error bias, and the reliability of the WLSMV estimator depending on the varying number of items per trait-method unit (ranging from 2 to 8) and varying sample sizes (250, 500, 750, and 1000 observations). The results showed that the WLSMV estimator provided a good, albeit slightly liberal, χ² approximation and stable and reliable parameter estimates for models of reasonable complexity (2-4 items) and small sample sizes (at least 250 observations). When more complex models with 5 or more items were analysed, larger sample sizes of at least 500 observations were needed. The most complex model, with 9 trait-method units and 8 items (72 observed variables), requires sample sizes of at least 1000 observations.

18.
Four types of analysis are commonly applied to data from structured Rater x Ratee designs. These types are characterized by the unit of analysis, which is either raters or ratees, and by the design used, which is either a between-units or within-unit design. The 4 types of analysis are quite different, and therefore they give rise to effect sizes that differ in their substantive interpretations. In most cases, effect sizes based on between-ratee analysis have the least ambiguous meaning and will best serve the aims of meta-analysts and primary researchers. Effect sizes that arise from within-unit designs confound the strength of an effect with its homogeneity. Nonetheless, the authors identify how a range of effect-size types such as these serve the aims of meta-analysis appropriately.

19.
Different random or purposive allocations of items to parcels within a single sample are thought not to alter structural parameter estimates as long as items are unidimensional and congeneric. If, additionally, the numbers of items per parcel and parcels per factor are held fixed across allocations, different allocations of items to parcels within a single sample are thought not to meaningfully alter model fit, at least when items are normally distributed. We show analytically that, although these statements hold in the population, they do not necessarily hold in the sample. We show via a simulation that, even under these conservative conditions, the magnitude of within-sample item-to-parcel-allocation variability in structural parameter estimates and model fit can alter substantive conclusions when sampling error is high (e.g., low N, low item communalities, few items per few parcels). We supply a software tool that facilitates reporting and ameliorating the consequences of item-to-parcel-allocation variability. The tool's utility is demonstrated on an empirical example involving the Neuroticism-Extroversion-Openness (NEO) Personality Inventory and the Computer Assisted Panel Study data set.
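
The allocation step at the heart of the problem is trivially easy to re-randomize, which is exactly what makes allocation variability worth reporting. A minimal sketch of repeated random item-to-parcel allocations with fixed parcel sizes; refitting the SEM to each allocation, as the authors' tool does, is omitted:

```python
import random

def random_parcels(items, n_parcels, rng):
    """One random item-to-parcel allocation: shuffle, then deal items
    round-robin so parcel sizes stay fixed across allocations."""
    pool = list(items)
    rng.shuffle(pool)
    return [pool[i::n_parcels] for i in range(n_parcels)]

# Repeating the allocation shows how much a downstream estimate can vary
rng = random.Random(3)
for _ in range(3):
    print(random_parcels(range(9), 3, rng))
```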

20.
In their recent paper, Marchant, Simons, and De Fockert (2013) claimed that the ability to average between multiple items of different sizes is limited by small samples of arbitrarily attended members of a set. This claim is based on the finding that observers are good at representing the average when an ensemble includes only two sizes distributed among all items (regular sets), but their performance gets worse when the number of sizes increases with the number of items (irregular sets). We argue that an important factor not considered by Marchant et al. (2013) is the range of size variation, which was much larger in their irregular sets. We manipulated this factor across our experiments and found almost the same efficiency of averaging for both regular and irregular sets when the range was held constant. Moreover, highly regular sets consisting only of small and large items (two-peak distributions) were averaged with greater error than sets with small, large, and intermediate items, suggesting a segmentation threshold that determines whether all the variable items are perceived as a single ensemble or as distinct subsets. Our results demonstrate that averaging can indeed operate in parallel, but the visual system has difficulty with it when some items differ too much from others.
