Methods of sample size planning are developed from the accuracy in parameter estimation approach in the multiple regression context in order to obtain a sufficiently narrow confidence interval for the population squared multiple correlation coefficient when regressors are random. Approximate and exact methods are developed that provide the necessary sample size so that the expected width of the confidence interval will be sufficiently narrow. Modifications of these methods are then developed so that the necessary sample size will lead to sufficiently narrow confidence intervals with no less than some desired degree of assurance. Computer routines have been developed and are included within the MBESS R package so that the methods discussed in the article can be implemented. The methods and computer routines are demonstrated using an empirical example linking innovation in the health services industry with previous innovation, personality factors, and group climate characteristics.

Gwowen Shieh. Psychometrika, 2006, 71(3): 529-540
This paper considers the problem of analysis of correlation coefficients from a multivariate normal population. A unified theorem is derived for the regression model with normally distributed explanatory variables and the general results are employed to provide useful expressions for the distributions of simple, multiple, and partial-multiple correlation coefficients. The inversion principle and monotonicity property of the proposed formulations are used to describe alternative approaches to the exact interval estimation, power calculation, and sample size determination for correlation coefficients. The author thanks the referees for their constructive comments and helpful suggestions and especially the associate editor for drawing attention to several critical results which led to substantial improvements of the exposition. The work for this paper was initiated while the author was visiting the Department of Statistics, Stanford University. This research was partially supported by National Science Council Grant NSC-94-2118-M-009-004. Requests for reprints should be sent to Gwowen Shieh, Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan 30050, ROC.

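Generalizability theory is widely applied in many kinds of psychological measurement and assessment practice. Under a budget constraint, generalizability theory has to consider how to design a measurement procedure that is both relatively reliable and relatively feasible, which requires estimating the optimal sample size by some means. The Lagrange multiplier method is a fairly mature approach to estimating the optimal sample size under budget constraints in generalizability theory. This paper discusses several factors that influence optimal sample size estimation under budget constraints in generalizability theory, such as the effect of rounding with respect to the total budget, and offers suggestions for subsequent improvement, such as deriving a unified formula for the Lagrange multiplier method.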
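The Lagrange multiplier idea can be sketched in a few lines. The abstract does not state a design or a cost model, so everything below is an assumption for illustration: a two-facet crossed design whose relative error variance is var_pi/n_i + var_pj/n_j + var_pij/(n_i*n_j), and a budget constraint cost_per_obs * n_i * n_j = budget. Under those assumptions the last term is fixed by the budget, and the Lagrange conditions reduce to n_i/n_j = var_pi/var_pj. The function name and its arguments are hypothetical.

```python
import math


def optimal_facet_sizes(var_pi, var_pj, budget, cost_per_obs):
    """Continuous optimum for an assumed two-facet crossed design.

    Minimizes var_pi/n_i + var_pj/n_j subject to
    cost_per_obs * n_i * n_j = budget (illustrative cost model only).
    """
    k = budget / cost_per_obs          # affordable number of i-by-j cells
    n_i = math.sqrt(k * var_pi / var_pj)
    n_j = math.sqrt(k * var_pj / var_pi)
    return n_i, n_j


# Hypothetical variance components and costs, not taken from any study.
n_i, n_j = optimal_facet_sizes(var_pi=4.0, var_pj=1.0, budget=1000, cost_per_obs=10)
print(n_i, n_j)
```

The continuous optimum is generally not an integer. Rounding it changes the amount actually spent, which is one way the rounding effect mentioned in the abstract can arise.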

The underlying statistical models for multiple regression analysis are typically attributed to two types of modeling: fixed and random. The procedures for calculating power and sample size under the fixed regression models are well known. However, the literature on random regression models is limited and has been confined to the case of all variables having a joint multivariate normal distribution. This paper presents a unified approach to determining power and sample size for random regression models with arbitrary distribution configurations for explanatory variables. Numerical examples are provided to illustrate the usefulness of the proposed method and Monte Carlo simulation studies are also conducted to assess the accuracy. The results show that the proposed method performs well for various model specifications and explanatory variable distributions. The author would like to thank the editor, the associate editor, and the referees for drawing attention to pertinent references that led to improved presentation. This research was partially supported by National Science Council grant NSC-94-2118-M-009-004.

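Using the "University Teacher Teaching Level Evaluation Questionnaire", 566 students were asked to evaluate 19 teachers, and the collected data were analyzed under four different generalizability designs: t×i, (st)×i, (st)×(iv), and (st)×(iv)×o. Based on generalizability theory and a budget constraint, a unified Lagrange multiplier formula was used to derive the optimal sample size formulas for the different designs, which were combined with the estimated variance components to compute the optimal sample size for each design. The results show that: (1) the unified Lagrange multiplier formula is highly general and applies to a variety of generalizability designs under budget constraints; (2) the evaluation occasion is a rather important factor in the evaluation of university teachers' teaching level; (3) (st)×(iv)×o is the optimal generalizability design under budget constraints for evaluating university teachers' teaching level; (4) under budget constraints, the optimal number of student raters per teacher is 20 and the optimal number of items per dimension is 3.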

This paper is concerned with supplementing statistical tests for the Rasch model so that, in addition to the probability of the error of the first kind (Type I probability), the probability of the error of the second kind (Type II probability) can be controlled at a predetermined level by basing the test on the appropriate number of observations. An approach to determining a practically meaningful extent of model deviation is proposed, and the approximate distribution of the Wald test is derived under the extent of model deviation of interest.

There is no shortage of recommendations regarding the appropriate sample size to use when conducting a factor analysis. Suggested minimums for sample size include from 3 to 20 times the number of variables and absolute ranges from 100 to over 1,000. For the most part, there is little empirical evidence to support these recommendations. This simulation study addressed minimum sample size requirements for 180 different population conditions that varied in the number of factors, the number of variables per factor, and the level of communality. Congruence coefficients were calculated to assess the agreement between population solutions and sample solutions generated from the various population conditions. Although absolute minimums are not presented, it was found that, in general, minimum sample sizes appear to be smaller for higher levels of communality; minimum sample sizes appear to be smaller for higher ratios of the number of variables to the number of factors; and when the variables-to-factors ratio exceeds 6, the minimum sample size begins to stabilize regardless of the number of factors or the level of communality.
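As an illustration of how agreement between a population solution and a sample solution can be scored, the sketch below computes Tucker's congruence coefficient for one pair of factor-loading vectors. The abstract does not say which congruence coefficient was used, so Tucker's version is an assumption here, and the function name and loadings are hypothetical.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Made-up loadings for one factor: population values vs. a sample estimate.
population = [0.80, 0.70, 0.60, 0.10]
sample = [0.75, 0.72, 0.55, 0.20]
print(round(congruence(population, sample), 3))
```

Values near 1 indicate that the two loading patterns are nearly proportional; the coefficient ignores overall scale, so it measures pattern similarity rather than exact equality.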

When designing a study that uses structural equation modeling (SEM), an important task is to decide on an appropriate sample size. Historically, this task has been approached from the power analytic perspective, where the goal is to obtain sufficient power to reject a false null hypothesis. However, hypothesis testing only indicates whether a population effect is zero and does not address the size of the population effect. Moreover, significance tests in the SEM context often reject the null hypothesis too easily, and therefore the problem in practice is having too much power instead of not enough power.

An alternative means to infer the population effect is forming confidence intervals (CIs). A CI is more informative than hypothesis testing because a CI provides a range of plausible values for the population effect size of interest. Given the close relationship between CI and sample size, the sample size for an SEM study can be planned with the goal to obtain sufficiently narrow CIs for the population model parameters of interest.

Latent curve models (LCMs) are an application of SEM with mean structure to studying change over time. The sample size planning method for LCMs from the CI perspective is based on maximum likelihood and the expected information matrix. Given a sample, forming a CI for the model parameter of interest in an LCM requires the sample covariance matrix S, the sample mean vector, and the sample size N. Therefore, the width (w) of the resulting CI can be considered a function of S, the sample mean vector, and N. Inverting the CI formation process gives the sample size planning process. The inverted process requires a proxy for the population covariance matrix Σ, the population mean vector μ, and the desired width ω as input, and it returns N as output. The input information for sample size planning should be specified on the basis of a systematic literature review. In the context of covariance structure analysis, Lai and Kelley (2011) discussed several practical methods to facilitate specifying Σ and ω for the sample size planning procedure.
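The idea of inverting the CI formation process can be shown on a much simpler problem than an LCM. The sketch below is not the method described above: it assumes a normal-theory CI for a single mean with a known standard deviation sigma, where the width is 2*z*sigma/sqrt(N), and solves for the smallest N that reaches a desired width omega. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def ci_width(sigma, n, conf=0.95):
    """Width of a normal-theory CI for a mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return 2 * z * sigma / math.sqrt(n)


def n_for_width(sigma, omega, conf=0.95):
    """Smallest N whose CI width does not exceed omega (inverse of ci_width)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil((2 * z * sigma / omega) ** 2)


n = n_for_width(sigma=1.0, omega=0.5)
print(n, ci_width(1.0, n))
```

In the LCM setting the width depends on the whole model through the expected information matrix, so no closed form like this is available in general, but the planning logic is the same: supply plausible population inputs and a target width, and obtain N.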

Analysis of covariance (ANCOVA) is commonly used in behavioral and educational research to reduce the error variance and improve the power of analysis of variance by adjusting for covariate effects. For planning and evaluating randomized ANCOVA designs, a simple sample-size formula has been proposed to account for the variance deflation factor in the comparison of two treatment groups. The objective of this article is to highlight an overlooked potential problem of the existing approximation and to provide an alternative and exact solution of power and sample size assessments for testing treatment contrasts. Numerical investigations are conducted to reveal the relative performance of the two procedures as a reliable technique to accommodate the covariate features that make ANCOVA design particularly distinctive. The described approach has important advantages over the current method in general applicability, methodological justification, and overall accuracy. To enhance the practical usefulness, computer algorithms are presented to implement the recommended power calculations and sample-size determinations.

Computerized cognitive testing with software programs such as the Automated Neuropsychological Assessment Metrics (ANAM) has long been used to assess cognition in military samples. This study describes demographic influences on computerized testing performance in a large active duty military sample (n = 2366). Performance differences between men and women were minimal on most ANAM subtests, but there was a clear speed/accuracy trade-off, with men favoring speed and women favoring accuracy on the Continuous Performance Test (CPT) subtest. As expected, reaction time increased with age on most subtests, with the exception of Mathematical Processing Test (MTH). Higher education resulted in significant but minimal performance increases on Code Substitution (CDS), Matching to Sample (MSP), and Memory Search (STN) subtests. In contrast, substantial performance differences were seen between education groups on the MTH subtest. These data reveal that it is important to consider demographic factors, particularly age, when using ANAM to draw conclusions about military samples. These results also point to the importance of exploring demographic influences for all reaction time–based computerized assessment batteries.


When planning mediation studies, researchers are often interested in the sample size needed to achieve adequate power for testing mediation. Power depends on population effect sizes, which are unknown in practice. In conventional power analysis, however, effect size estimates are often used as population values, which can result in underpowered studies. Uncertainty in effect size estimates has been considered in other sample size planning contexts (e.g., t-test, ANOVA), but has not been handled properly for planning mediation studies. In the current study, we proposed an easy-to-use sample size planning method for testing mediation that takes uncertainty in effect size estimates into account. We conducted simulation studies to demonstrate the impact of uncertainty in effect size estimates on the power of testing mediation, and to provide sample size suggestions under different levels of uncertainty. Empirical examples were provided to illustrate the application of our method. R functions and a web application were developed to facilitate implementation.

A simplified method is used to estimate the appropriate sample sizes needed to detect main effects and an interaction effect in analysis of variance, using the IQ data from the Capron and Duyme (1991) adoption study as an example. To achieve power of 80% to reject a hypothesis of no interaction when there is in reality a modest interaction requires about 215 children in each of four groups in a 2 × 2 design, whereas only 9 to 10 children per group are needed to detect main effects. Only a transnational collaborative study could hope to find this many children in the condition where a child from a high socioeconomic status background is adopted into a low status family.

Assuming that subject responses rank order stimuli by preference, statistical methods are presented for testing the hypothesis that responses conform to a unidimensional, qualitative unfolding model and to an a priori stimulus ordering. The model postulates that persons and stimulus variables are ordered along a single continuum and that subjects most prefer stimuli nearest their own position. The underlying continuum need not form an interval scale of the stimulus attribute. The general assumptions of the test for the unfolding model make it suitable for the analysis of structure in attitude responses, preference data, and developmental stage data. This research was supported by a grant from the U.S. Public Health Service (Grant No. 1-R01-MH27861-01) to the University of Minnesota. I wish to thank Sanford Weisberg for his helpful suggestions. I also wish to thank Karen Kitchener and Patricia King for letting me use their data.

The Dirty Dozen (Jonason & Webster, 2010) is a frequently used concise measure of the Dark Triad, three socially aversive personality traits: Machiavellianism, psychopathy, and narcissism. The present study assessed measurement invariance of the Dutch version of the Dirty Dozen measure across gender in a large city-based representative adult sample in Belgium (N = 1587). Multi-group first-order confirmatory factor analysis for categorical indicators was utilized. In addition, unique associations between Dirty Dozen traits, trait self-control, and acceptance of illegitimate norms were examined in a series of structural equation models. Results indicated that the internal consistency of the Dirty Dozen subscales was good for Machiavellianism (α = 0.80) and narcissism (α = 0.80), but modest for psychopathy (α = 0.64). The hypothesized three correlated factors model with separate factors for Machiavellianism, psychopathy, and narcissism provided a poor fit for men and women. Invariance testing across gender showed evidence for weak invariance only, indicating that the underlying latent factors are measured the same way with the same metric in the two populations. However, we were not able to establish strong measurement invariance. Observed group differences should therefore be interpreted with caution. Furthermore, Machiavellianism and psychopathy were strongly associated with trait self-control in both men and women. Strong correlations were found between acceptance of illegitimate norms and the Dirty Dozen traits Machiavellianism and psychopathy, but not narcissism.
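The internal consistency values (α) reported above are Cronbach's alpha coefficients. As a reminder of what that statistic computes, here is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy scores are hypothetical and unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, one per item, each holding the scores of the
    same respondents in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-respondent sums
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Toy data: four items answered by five respondents (made-up numbers).
toy = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
    [2, 1, 4, 4, 5],
]
print(round(cronbach_alpha(toy), 2))
```

A four-item subscale, as in the Dirty Dozen, can yield a modest alpha even when items are reasonably related, because alpha increases with the number of items.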

In recent years, the military has devoted considerable effort to the development of empirically keyed biodata instruments for use in selection. Although studies using empirical keying procedures are common in the personnel selection literature, relatively few studies have compared these procedures. Using data collected from Naval Academy midshipmen, we compared nine empirical keying procedures: vertical percent (five strategies), horizontal percent, mean criterion, phi coefficient, and rare response. For each keying procedure, five different sample sizes were used to determine the minimum sample size needed to obtain stable results. For the three largest samples, all of the criterion-based methods yielded scales with significant cross-validities. Among these methods, two vertical percent strategies generally produced the most valid scales for the four largest samples. Without exception, the cross-validities for the only noncriterion-based method (rare response) failed to reach significance. The effects of unit versus differential weighting and scale length versus item-alternative validity are discussed.

Comparisons of Child Behavior Checklist (CBCL) scores from 31 societies (Rescorla et al. Journal of Emotional and Behavioral Disorders 15:13–142 2007) supported the instrument’s multicultural robustness, but none of these societies was in South America. The present study tested the multicultural robustness of the 2001 CBCL using data from a national epidemiological survey in Uruguay. Participants were 1,374 6- to 11-year-olds recruited through 65 schools nationwide; 1,098 (80%) had received no mental health or special education services in the past year (non-referred group), whereas 276 (referred group) had been referred for mental health services, had repeated ≥2 grades, or had significant developmental disabilities. Mean item ratings, factor structure, and scale internal consistencies were very similar to findings reported by Rescorla et al. (Journal of Emotional and Behavioral Disorders 15:13–142 2007) and Ivanova et al. (Journal of Clinical Child and Adolescent Psychology 36: 405–417 2007). Children from low SES school environments obtained higher problem scores, especially in the referred group. Gender, age, and referral status effects paralleled those in the U.S. Non-referred children obtained somewhat higher mean problem scores in Uruguay than in the U.S., but mean score differences between non-referred and referred children were smaller in Uruguay than in the U.S. Findings supporting the CBCL’s multicultural robustness in a South American country extend the generalizability of findings reported by Rescorla et al. (Journal of Emotional and Behavioral Disorders 15:13–142 2007) for 31 societies.

Past methodological research on mediation analysis has mainly focused on situations where all variables are complete and continuous. When categorical data occur in combination with missing data, more methodological considerations are involved. Specifically, appropriate decisions need to be made about how to estimate the indirect effects and how to form confidence intervals for testing them while accommodating missing data. We compare strategies that address these issues based on a model with a dichotomous mediator, aiming to provide guidelines for researchers facing such challenges in practice.

Although latent attributes that follow a hierarchical structure are anticipated in many areas of educational and psychological assessment, current psychometric models are limited in their capacity to objectively evaluate the presence of such attribute hierarchies. This paper introduces the Hierarchical Diagnostic Classification Model (HDCM), which adapts the Log-linear Cognitive Diagnosis Model to cases where attribute hierarchies are present. The utility of the HDCM is demonstrated through simulation and by an empirical example. Simulation study results show the HDCM is efficiently estimated and can accurately test for the presence of an attribute hierarchy statistically, a feature not possible when using more commonly used DCMs. Empirically, the HDCM is used to test for the presence of a suspected attribute hierarchy in a test of English grammar, confirming the data is more adequately represented by hierarchical attribute structure when compared to a crossed, or nonhierarchical structure.

