Similar Documents
 A total of 20 similar documents were found.
1.
We show how to test hypotheses for coefficient alpha in three different situations: (1) hypothesis tests of whether coefficient alpha equals a prespecified value, (2) hypothesis tests involving two statistically independent sample alphas, as may arise when testing the equality of coefficient alpha across groups, and (3) hypothesis tests involving two statistically dependent sample alphas, as may arise when testing the equality of alpha across time or when testing the equality of alpha for two test scores within the same sample. We illustrate how these hypotheses may be tested in a structural equation modeling framework under the assumption of normally distributed responses and also under asymptotically distribution-free assumptions. The formulas for the hypothesis tests and computer code are given for four different applied examples. Supplemental materials for this article may be downloaded from http://brm.psychonomic-journals.org/content/supplemental.
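
As a minimal illustration of the first case, the sketch below tests H0: alpha = alpha0 with the classical Feldt (1965) F test rather than the SEM or asymptotically distribution-free machinery the article develops; it assumes complete, roughly normal item scores, and all names are illustrative.

import numpy as np
from scipy.stats import f

def cronbach_alpha(scores):
    # scores: n persons x k items matrix of item scores
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def feldt_test(scores, alpha0):
    # H0: population alpha equals alpha0 (Feldt, 1965), assuming
    # normally distributed, compound-symmetric item scores.
    n, k = scores.shape
    a_hat = cronbach_alpha(scores)
    w = (1 - alpha0) / (1 - a_hat)
    df1, df2 = n - 1, (n - 1) * (k - 1)
    p = 2 * min(f.cdf(w, df1, df2), f.sf(w, df1, df2))  # two-sided
    return a_hat, w, p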

2.
Current interest in the assessment of measurement equivalence emphasizes 2 major methods of analysis. The authors offer a comparison of a linear method (confirmatory factor analysis) and a nonlinear method (differential item and test functioning using item response theory) with an emphasis on their methodological similarities and differences. The 2 approaches test for the equality of true scores (or expected raw scores) across 2 populations when the latent (or factor) score is held constant. Both approaches can provide information about when measurement nonequivalence exists and the extent to which it is a problem. An empirical example is used to illustrate the 2 approaches.

3.
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration-based set of hypotheses containing equality constraints on the means, or a theory-based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory-based hypotheses) has advantages over exploration (i.e., examining all possible equality-constrained hypotheses). Furthermore, examining reasonable order-restricted hypotheses has more power to detect the true effect/non-null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory-based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number).

4.
Parallel tests are needed so that alternate forms can be applied to different groups or on different occasions, but also in the context of split-half reliability estimation for a given test. Statistically, parallelism holds beyond reasonable doubt when the null hypotheses of equality of observed means and variances across the two forms (or halves) are not rejected. Several statistical tests have been proposed for this purpose, but their performance has never been compared. This study assessed the relative performance (Type I error rate and power) of the Student–Pitman–Morgan, Bradley–Blackwood, and Wilks tests of equality of means and variances in the typical conditions surrounding studies of parallelism, namely, integer-valued and bounded test scores with distributions that may not be bivariate normal. The results advise against the use of the Wilks test and support the use of the Bradley–Blackwood test because of its simplicity and its minimally better performance in comparison with the more cumbersome Student–Pitman–Morgan test.
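
For context, a minimal sketch of the Bradley–Blackwood procedure that the abstract recommends: regress the difference of the two forms on their sum and jointly test that intercept and slope are zero, which simultaneously tests equality of means and of variances. The implementation below is illustrative, not taken from the article.

import numpy as np
from scipy.stats import f

def bradley_blackwood(x, y):
    # Joint test of equal means and equal variances for paired scores:
    # regress d = x - y on s = x + y and test intercept = slope = 0.
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    d, s = x - y, x + y
    design = np.column_stack([np.ones(n), s])
    beta, *_ = np.linalg.lstsq(design, d, rcond=None)
    sse = ((d - design @ beta) ** 2).sum()
    F = ((d @ d - sse) / 2) / (sse / (n - 2))
    return F, f.sf(F, 2, n - 2)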

5.
A numerical procedure is outlined for obtaining an interval estimate of the regression of true score on observed score. Only the frequency distribution of observed scores is needed for this. The procedure assumes that the conditional distribution of observed scores for fixed true score is binomial. The procedure is applied to several sets of test data. This research was sponsored in part by the Personnel and Training Research Programs, Psychological Sciences Division, Office of Naval Research, under Contract No. N00014-69-C-0017, Contract Authority Identification Number, NR No. 150-303, and Educational Testing Service. Reproduction in whole or in part is permitted for any purpose of the United States Government.

6.
Finite sample inference procedures are considered for analyzing the observed scores on a multiple choice test with several items, where, for example, the items are dissimilar, or the item responses are correlated. A discrete p-parameter exponential family model leads to a generalized linear model framework and, in a special case, a convenient regression of true score upon observed score. Techniques based upon the likelihood function, Akaike's information criterion (AIC), an approximate Bayesian marginalization procedure based on conditional maximization (BCM), and simulations for exact posterior densities (importance sampling) are used to facilitate finite sample investigations of the average true score, individual true scores, and various probabilities of interest. A simulation study suggests that, when the examinees come from two different populations, the exponential family can adequately generalize Duncan's beta-binomial model. Extensions to regression models, the classical test theory model, and empirical Bayes estimation problems are mentioned. The Duncan, Keats, and Matsumura data sets are used to illustrate potential advantages and flexibility of the exponential family model, and the BCM technique. The authors wish to thank Ella Mae Matsumura for her data set and helpful comments, Frank Baker for his advice on item response theory, Hirotugu Akaike and Taskin Atilgan for helpful discussions regarding AIC, Graham Wood for his advice concerning the class of all binomial mixture models, Yiu Ming Chiu for providing useful references and information on tetrachoric models, and the Editor and two referees for suggesting several references and alternative approaches.

7.
A hybrid procedure for number-correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses item weights derived from MIRT, while total test scores are computed according to CTT. Thus, what makes the hybrid scoring method attractive is that it accounts for the dimensionality of the test items while test scores remain easy to compute. Further, the hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study generated estimated and true scores that were highly correlated. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.

8.
Four misconceptions about the requirements for proper use of analysis of covariance (ANCOVA) are examined by means of Monte Carlo simulation. Conclusions are that ANCOVA does not require covariates to be measured without error, that ANCOVA can be used effectively to adjust for initial group differences that result from nonrandom assignment that depends on observed covariate scores, that ANCOVA does not provide unbiased estimates of true treatment effects when initial group differences are due to nonrandom assignment that depends on the true latent covariable and the covariate contains measurement error, and that ANCOVA requires no assumption concerning the equality of within-groups and between-groups regression. Where treatments actually influence covariate scores, the hypothesis tested by ANCOVA concerns a weighted combination of effects on the covariate and dependent variables.

9.
This paper is a presentation of an essential part of the sampling theory of the error variance and the standard error of measurement. An experimental assumption is that several equivalent tests with equal variances are available. These may be either final forms of the same test or obtained by dividing one test into several parts. The simple model of independent and normally distributed errors of measurement with zero mean is employed. No assumption is made about the form of the distributions of true and observed scores. This implies unrestricted freedom in defining the population. First, maximum-likelihood estimators of the error variance and the standard error of measurement are obtained, their sampling distributions given, and their properties investigated. Then unbiased estimators are defined and their distributions derived. The accuracy of estimation is given special consideration from various points of view. Next, rigorous statistical tests are developed to test hypotheses about error variances on the basis of one and two samples. Also the construction of confidence intervals is treated. Finally, Bartlett's test of homogeneity of variances is used to provide a multi-sample test of equality of error variances.
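
As a concrete special case of the model in this abstract, the sketch below estimates the error variance from two parallel forms and builds a chi-square confidence interval for it; it assumes normal, independent errors and equal true scores across forms, and it is not the paper's full maximum-likelihood treatment.

import numpy as np
from scipy.stats import chi2

def error_variance_ci(form1, form2, conf=0.95):
    # With two parallel forms x = T + E1, y = T + E2 and independent normal
    # errors, Var(x - y) = 2 * sigma_E^2, so half the difference-score
    # variance estimates the error variance; a chi-square pivot gives a CI.
    d = np.asarray(form1, float) - np.asarray(form2, float)
    n = d.size
    s2_d = d.var(ddof=1)
    sigma2_e = s2_d / 2
    lo = (n - 1) * s2_d / (2 * chi2.ppf(1 - (1 - conf) / 2, n - 1))
    hi = (n - 1) * s2_d / (2 * chi2.ppf((1 - conf) / 2, n - 1))
    return sigma2_e, (lo, hi), np.sqrt(sigma2_e)  # last value: estimated SEM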

10.
Approximate randomization tests are alternatives to conventional parametric statistical methods used when the normality and homoscedasticity assumptions are violated. This article presents an SAS program that tests the equality of two means using an approximate randomization test. This program can serve as a template for testing other hypotheses, which is illustrated by modifications to test the significance of a correlation coefficient or the equality of more than two means.
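
The article's program is written in SAS; for readers outside SAS, a roughly equivalent sketch of an approximate randomization test of two means looks like the following (the function name and permutation count are illustrative).

import numpy as np

def randomization_test_means(a, b, n_perm=9999, seed=None):
    # Approximate (Monte Carlo) randomization test of H0: equal means.
    # Group labels are repeatedly shuffled and the observed mean
    # difference is compared with the permutation distribution.
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.concatenate([a, b])
    n_a = a.size
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:n_a].mean() - pooled[n_a:].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)  # two-sided p-value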

11.
The study of prediction bias is important, and research over the last five decades has examined whether test scores differentially predict academic or employment performance. Previous studies used ordinary least squares (OLS) to assess whether groups differ in intercepts and slopes. This study shows that OLS yields inaccurate inferences for prediction bias hypotheses. This paper builds upon the criterion-predictor factor model by demonstrating the effect of selection, measurement error, and measurement bias on prediction bias studies that use OLS. The range-restricted criterion-predictor factor model is used to compute Type I error and power rates associated with using regression to assess prediction bias hypotheses. In short, OLS is not capable of testing hypotheses about group differences in latent intercepts and slopes. Additionally, a theorem is presented showing that researchers should not employ hierarchical regression to assess intercept differences with selected samples.
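
For reference, the conventional OLS approach that the paper critiques is typically a Cleary-type moderated regression: regress the criterion on the predictor, a group dummy, and their interaction, then test the dummy (intercept difference) and the interaction (slope difference). The sketch below is a hedged illustration of that conventional procedure, not the paper's factor-model approach; names are illustrative.

import numpy as np
from scipy import stats

def ols_prediction_bias(criterion, predictor, group):
    # Moderated regression with a group dummy and interaction; returns the
    # estimate, t statistic, and p-value for each coefficient.
    y, x, g = (np.asarray(v, float) for v in (criterion, predictor, group))
    n = y.size
    X = np.column_stack([np.ones(n), x, g, x * g])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n - X.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    t = beta / np.sqrt(np.diag(cov))
    p = 2 * stats.t.sf(np.abs(t), df)
    labels = ["intercept", "predictor", "group", "predictor_x_group"]
    return {lab: (b, tt, pp) for lab, b, tt, pp in zip(labels, beta, t, p)}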

12.
Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when the regression is forced through the origin). This paper shows that this approach renders highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment with which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions raised through inadequate application of regression analyses.
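
To make the criticized procedure concrete, the regression-based check usually amounts to a joint F test of zero intercept and unit slope; a hedged sketch of that test (the procedure the paper warns against, not the authors' recommended alternative) follows.

import numpy as np
from scipy.stats import f

def zero_intercept_unit_slope_test(x, y):
    # Joint F test of H0: intercept = 0 and slope = 1 in y = b0 + b1*x + e;
    # the restricted model under H0 is simply y = x + e.
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse_full = ((y - X @ beta) ** 2).sum()
    sse_restricted = ((y - x) ** 2).sum()
    F = ((sse_restricted - sse_full) / 2) / (sse_full / (n - 2))
    return F, f.sf(F, 2, n - 2)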

13.
This study was an investigation of the relationship between potential creativity, as measured by fluency scores on the Alternate Uses Test, and performance on Wason's 2-4-6 task. As hypothesized, participants who were successful in discovering the rule had significantly higher fluency scores. Successful participants also generated higher frequencies of confirmatory and disconfirmatory hypotheses, but a multiple regression analysis using the stepwise method revealed that the frequency of generating disconfirmatory hypotheses and fluency scores were the only two significant factors in task outcome. The results also supported earlier studies where disconfirmation was shown to play a more important role in the later stages of hypothesis testing. This was especially true of successful participants, who employed a higher frequency of disconfirmatory hypotheses after receiving feedback on the first announcement. These results imply that successful participants benefited from the provision of feedback on the first announcement by switching to a more successful strategy in the hypothesis-testing sequence.

14.
A composite step-down procedure, in which a set of step-down tests are summarized collectively with Fisher's combination statistic, was considered to test for multivariate mean equality in two-group designs. An approximate degrees of freedom (ADF) composite procedure based on trimmed/Winsorized estimators and a non-pooled estimate of error variance is proposed, and compared to a composite procedure based on trimmed/Winsorized estimators and a pooled estimate of error variance. The step-down procedures were also compared to Hotelling's T² and Johansen's ADF global procedure based on trimmed estimators in a simulation study. Type I error rates of the pooled step-down procedure were sensitive to covariance heterogeneity in unbalanced designs; error rates were similar to those of Hotelling's T² across all of the investigated conditions. Type I error rates of the ADF composite step-down procedure were insensitive to covariance heterogeneity and less sensitive to the number of dependent variables when sample size was small than error rates of Johansen's test. The ADF composite step-down procedure is recommended for testing hypotheses of mean equality in two-group designs except when the data are sampled from populations with different degrees of multivariate skewness.
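
The Fisher combination statistic referred to in the abstract aggregates a set of p-values as X2 = -2 * sum(ln p_i), which is referred to a chi-square distribution with 2k degrees of freedom when the component tests are independent. A minimal sketch of that combining step only (not of the full step-down procedure):

import numpy as np
from scipy.stats import chi2

def fisher_combination(p_values):
    # Combine k (independent) p-values: X2 = -2 * sum(ln p_i) ~ chi2(2k) under H0.
    p = np.asarray(p_values, float)
    stat = -2.0 * np.log(p).sum()
    return stat, chi2.sf(stat, 2 * p.size)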

15.
We discuss the statistical testing of three relevant hypotheses involving Cronbach's alpha: one where alpha equals a particular criterion; a second testing the equality of two alpha coefficients for independent samples; and a third testing the equality of two alpha coefficients for dependent samples. For each of these hypotheses, various statistical tests have been proposed. Over the years, these tests have depended on progressively fewer assumptions. We propose a new approach to testing the three hypotheses that relies on even fewer assumptions, is especially suited for discrete item scores, and can be applied easily to tests containing large numbers of items. The new approach uses marginal modelling. We compared the Type I error rate and the power of the marginal modelling approach to several of the available tests in a simulation study using realistic conditions. We found that the marginal modelling approach had the most accurate Type I error rates, whereas the power was similar across the statistical tests.

16.
Given the substantial rise in the number of students identified as learning-disabled, increasing attention has centered on methods for determining a severe discrepancy between ability and achievement. Using scores from 86 learning disabilities referrals, we compared four such methods (a z-score discrepancy, an estimated true score discrepancy, an unadjusted regression procedure, and an adjusted regression procedure). Each student was evaluated with the WISC-R, PIAT, and K-ABC. A high degree of agreement was found between z-score and estimated true score difference approaches. Less agreement was found between the unadjusted regression procedure and the other methods. It was concluded that the four methods cannot be used interchangeably in the calculation of severe discrepancies. Of the four methods that were analyzed, the unadjusted regression procedure selected the smallest percentage of students.
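
To illustrate the kind of computation involved (the abstract gives no formulas, so the forms below are common textbook versions and not necessarily those used in the study): a z-score discrepancy compares standardized ability and achievement directly, while a simple regression procedure compares achievement with its value predicted from ability and scales the difference by the standard error of estimate. The constants and the cutoff are assumptions.

import numpy as np

def discrepancy_flags(iq, achievement, r_xy, mean=100.0, sd=15.0, cutoff=1.5):
    # z-score discrepancy vs. a simple (unadjusted) regression discrepancy;
    # r_xy is the ability-achievement correlation, cutoff is in SD units.
    z_iq = (np.asarray(iq, float) - mean) / sd
    z_ach = (np.asarray(achievement, float) - mean) / sd
    z_discrepancy = z_iq - z_ach
    predicted_ach = r_xy * z_iq                 # regression-based expectation
    se_est = np.sqrt(1.0 - r_xy ** 2)           # standard error of estimate
    reg_discrepancy = (predicted_ach - z_ach) / se_est
    return z_discrepancy > cutoff, reg_discrepancy > cutoff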

17.
In their work on the human development sequence, Inglehart and Welzel (Modernization, cultural change, and democracy: the human development sequence. Cambridge University Press, New York, 2005) argue that there is a “rising tide” of gender equality across various countries in the system. While the authors propose that the process that holds true for a rising tide in women's rights is also true for other outgroups including minorities and homosexuals, they do not test their proposed relationship on feelings toward these groups. At the same time, studies on sexuality and tolerance suggest that religious beliefs and government institutions play a significant role in shaping societal attitudes about homosexuality, promulgating beliefs and policies that place homosexuality in a negative light. In the case of government institutions, sexuality may also be framed as a security issue, making homosexuality appear as a threat. The present work performs an empirical test of the mechanisms of the human development sequence on tolerance toward homosexuality, and compares this theory to rival hypotheses regarding the effects of religion and heteronormative policies. Empirical testing using hierarchical linear models shows mixed support for hypotheses drawn from work on the human development sequence, but indicates that religious belief and heteronormativity in government policies have a significant relationship to levels of tolerance.

18.
We develop a new method for assessing the adequacy of a smooth regression function based on nonparametric regression and the bootstrap. Our methodology allows users to detect systematic misfit and to test hypotheses of the form “the proposed smooth regression model is not significantly different from the smooth regression model that generated these data.” We also provide confidence bands on the location of nonparametric regression estimates assuming that the proposed regression function is true, allowing users to pinpoint regions of misfit. We illustrate the application of the new method, using local linear nonparametric regression, both where an error model is assumed and where the error model is an unknown non-stationary function of the predictor.
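
As one way to picture a confidence band built "assuming that the proposed regression function is true", the sketch below uses a residual bootstrap around a lowess smoother; it is an illustration under simplifying assumptions (i.i.d. residuals, a lowess rather than local linear smoother) and is not the authors' procedure.

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def bootstrap_band_under_model(x, y, proposed_fit, n_boot=500, frac=0.5,
                               level=0.95, seed=None):
    # Simulate data from the proposed fit plus resampled residuals, re-smooth
    # each bootstrap sample, and take pointwise quantiles; where the smooth of
    # the observed data leaves the band, the proposed model is suspect.
    rng = np.random.default_rng(seed)
    x, y, proposed_fit = (np.asarray(v, float) for v in (x, y, proposed_fit))
    resid = y - proposed_fit
    curves = np.empty((n_boot, x.size))
    for b in range(n_boot):
        y_star = proposed_fit + rng.choice(resid, size=resid.size, replace=True)
        curves[b] = lowess(y_star, x, frac=frac, return_sorted=False)
    alpha = 1.0 - level
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    observed = lowess(y, x, frac=frac, return_sorted=False)
    return observed, lower, upper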

19.
20.
Response styles are one of the main sources of common method bias. This paper first discusses the definition and types of response styles and reviews their harmful effects, arguing that response styles can bias test scores and distort analyses of test reliability and validity as well as analyses of relationships among variables, so these effects need to be controlled. It then introduces commonly used methods for measuring response styles, which fall into two broad classes, counting-based methods and model-based methods, and offers suggestions for choosing among them. On this basis, recommendations are given on how to combine response-style measurement with residual regression and partial-correlation approaches to control the harmful effects of response styles.
