Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
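Based on an improved Wald statistic, a differential item functioning (DIF) detection method designed for two groups is extended to DIF testing across multiple groups; the improved Wald statistic is obtained by computing, respectively, the observed information matrix (Obs) and the empirical cross-product information matrix (XPD). A simulation study examined how these two approaches and the traditional computation method perform in DIF detection with multiple groups. The results show that: (1) the Type I error rates of Obs and XPD are markedly lower than those of the traditional method, and under DINA model estimation the Type I error rates of Obs and XPD are close to the nominal level; (2) when the sample size and the DIF magnitude are large, Obs and XPD have roughly the same statistical power as the traditional Wald statistic.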
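For reference, the general textbook form of a Wald statistic for a linear hypothesis is sketched below. This is not necessarily the exact parameterization used in the study. The symbols are notation assumed here: $\hat{\beta}$ is the stacked vector of item-parameter estimates across groups, $R$ is a restriction matrix encoding equality of parameters across groups, and $\hat{\Sigma}$ is the estimated covariance matrix, i.e. the inverse of the chosen information matrix (for example Obs or XPD).

```latex
W = (R\hat{\beta})^{\top} \left( R\,\hat{\Sigma}\,R^{\top} \right)^{-1} (R\hat{\beta}),
\qquad
W \overset{a}{\sim} \chi^{2}_{\operatorname{rank}(R)}
\quad \text{under } H_0 : R\beta = 0 .
```

Under this form, the choice of information matrix only changes $\hat{\Sigma}$, which is consistent with the abstract's comparison of Obs, XPD and the traditional computation.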

2.
Empirical research on the relationship between culture and creativity has thus far yielded no consistent results. Investigations of the differences are mostly post-hoc, and results are inconclusive. A creativity-value-oriented theory is proposed to explain cultural differences, as an alternative to ethnic and language effects. This study was conducted to compare the performances of artistic creativity of Germans and Chinese. Results revealed that the four groups of students examined (German students of Caucasian descent, German students of Asian descent, Chinese students studying abroad, and Chinese students studying in China) differed in their artistic creativity. German participants (Caucasian Germans and Asian Germans) produced more creative and aesthetically pleasing artwork than did their Chinese counterparts (Chinese studying abroad and domestic Chinese). This difference was observed by both German and Chinese judges. There were no significant subgroup differences in creative performance: no difference between the two German groups, and no difference between the two Chinese groups. Finally, although there were significant differences between German judges, Chinese judges studying abroad, and domestic Chinese judges in judging the artworks, these were not due to a preference for artwork from students from their own cultural background. Chinese and German judges roughly agreed on what constitutes creativity. These results suggest that cultural differences affect creative performances.

3.
Summary

Consensus ratings for beauty or attractiveness yielded comparatively low, though mostly positive, correlations with intelligence and educational achievement. Most of the correlations between beauty and intelligence and also between beauty and scholarship were in the neighborhood of +.20. Four groups of college students, two groups of girls and two groups of boys, served as subjects. The consensus ratings for beauty were secured from 2 groups of judges, each group composed of 12 boys and 12 girls. These consensus ratings were correlated with ratings for intelligence and scholarship, as determined by intelligence test scores and by grades received in at least three semesters of college work. The ratings for beauty showed a high degree of variability. On the average, individual judgments deviated from the consensus ratings by about four steps. Deviations were greater for the middle group than for those taking a high or low position in the consensus ratings. The judges showed higher variability in rating their own sex than when rating the opposite sex.

4.
To date, the statistical software designed for assessing differential item functioning (DIF) with Mantel-Haenszel procedures has employed the following statistics: the Mantel-Haenszel chi-square statistic, the generalized Mantel-Haenszel test and the Mantel test. These statistics permit detecting DIF in dichotomous and polytomous items, although they limit the analysis to two groups. In contrast, this article describes a new approach (and the related software) that, using the generalized Mantel-Haenszel statistic proposed by Landis, Heyman, and Koch (1978), permits DIF assessment in multiple groups, both for dichotomous and polytomous items. The program is free of charge and is available in the following languages: Spanish, English and Portuguese.
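As a point of comparison, the classical two-group Mantel-Haenszel chi-square for a dichotomous item can be sketched as follows. This is only the standard two-group statistic that the abstract says earlier software was limited to, not the generalized multi-group statistic of Landis, Heyman, and Koch (1978) and not the program described in the article. The function name `mantel_haenszel` and the input layout (one 2 × 2 table per matching-score stratum) are assumptions made for illustration.

```python
def mantel_haenszel(tables):
    """Two-group Mantel-Haenszel DIF statistic for one dichotomous item.

    tables: one (a, b, c, d) tuple per score stratum, where
        a = reference group correct,  b = reference group incorrect,
        c = focal group correct,      d = focal group incorrect.
    Returns (chi2, alpha): the continuity-corrected MH chi-square (1 df)
    and the MH common odds-ratio estimate.
    """
    sum_a = sum_e = sum_v = num = den = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        if t < 2:
            continue  # a stratum with fewer than 2 examinees carries no information
        n_ref, n_foc = a + b, c + d          # group sizes in this stratum
        m1, m0 = a + c, b + d                # totals correct / incorrect
        sum_a += a                           # observed reference-correct count
        sum_e += n_ref * m1 / t              # its expectation under no DIF
        sum_v += n_ref * n_foc * m1 * m0 / (t * t * (t - 1))  # its variance
        num += a * d / t
        den += b * c / t
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    alpha = num / den
    return chi2, alpha


# Hypothetical data: two strata with identical performance in both groups,
# then a single stratum in which the reference group does far better.
print(mantel_haenszel([(10, 10, 10, 10), (10, 10, 10, 10)]))
print(mantel_haenszel([(30, 10, 10, 30)]))
```

A chi-square above 3.84 would be significant at the 5% level with one degree of freedom; an odds ratio near 1 indicates no DIF. The sketch does not guard against strata in which every examinee answers the same way, where the variance and odds-ratio denominators can be zero.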

5.
The effect of changing the amount of information on judges' predictive efficiency in a clinical prediction task was studied. Thirty judges predicted 30 students' average achievement scores on the basis of different amounts of test data. One group of judges had information about the intercorrelations among the tests and the ecological validity of the tests. Another group of judges had only information about which tests were used. The predictive efficiency was not a monotonically increasing function of amount of test data. The most marked result was that the relative predictive efficiency decreased from four to six tests in both groups.

6.
Computerized classification testing (CCT) aims to classify persons into one of two or more possible categories to make decisions such as mastery/non-mastery or meet most/meet all/exceed. A defining feature of CCT is its stopping criterion: the test terminates when there is enough confidence to make a decision. There is abundant research on CCT with a single cut-off, and two common stopping criteria are the sequential probability ratio test (SPRT) statistic and the generalized likelihood ratio statistic (GLR). However, there is a relative scarcity of research extending the SPRT to the multi-hypothesis case for when there is more than one cut-off. In this paper, we propose a new multi-category GLR (mGLR) statistic as well as a stochastically curtailed version of the CCT with three or more categories. A simulation study was conducted to show that the mGLR statistic outperformed the existing stopping rules by generating shorter average test length without sacrificing classification accuracy. Results also revealed that the stochastically curtailed mGLR successfully increased test efficiency in certain testing conditions.
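The single-cut-off SPRT stopping rule mentioned above can be illustrated with a minimal sketch. It covers only the classical two-category case, not the mGLR statistic or the stochastic curtailment proposed in the paper. The Rasch response model, the indifference half-width `delta`, and the function names are assumptions made for illustration; the decision bounds are Wald's usual approximations.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under a Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def sprt_classify(responses, difficulties, cut, delta=0.5, alpha=0.05, beta=0.05):
    """Classify an examinee relative to `cut` with a sequential probability ratio test.

    Tests H0: theta = cut - delta against H1: theta = cut + delta and stops as
    soon as the log-likelihood ratio crosses one of Wald's bounds.
    Returns (decision, number_of_items_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this bound favours H1
    lower = math.log(beta / (1 - alpha))   # crossing this bound favours H0
    llr = 0.0
    used = 0
    for x, b in zip(responses, difficulties):
        used += 1
        p_hi = rasch_p(cut + delta, b)
        p_lo = rasch_p(cut - delta, b)
        if x == 1:
            llr += math.log(p_hi / p_lo)
        else:
            llr += math.log((1 - p_hi) / (1 - p_lo))
        if llr >= upper:
            return "above", used
        if llr <= lower:
            return "below", used
    return "undecided", used   # item pool exhausted before a decision


# Hypothetical examinee answering ten items of difficulty 0 correctly.
print(sprt_classify([1] * 10, [0.0] * 10, cut=0.0))
```

Here the test stops early once enough evidence has accumulated, which is the efficiency property that the stopping rules in the abstract compete on.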

7.
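Inspired by Berkson's use of test procedures to estimate unknown parameters, this paper derives three new methods for obtaining equating coefficients from three goodness-of-fit statistics: the Square Root criterion (SQRTcrit), the Symmetric Relative Entropy criterion (SREcrit), and the Weighted criterion (Wcrit), the last being a weighted form of the Haebara criterion. Although these three multinomial goodness-of-fit tests are asymptotically equivalent when the two distributions being compared are very close, Monte Carlo simulation results show that the three new equating methods behave differently when used to obtain equating coefficients. The differences among them are closely related to the magnitude of random error, that is, to the precision of the item parameter estimates, and also to the range of the equating coefficient A.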

8.
Increasingly, behavioral researchers are soliciting cognitive responses in addition to standard attitudinal measures when attempting to assess the effects of persuasive communications. The coding of the elicited cognitive responses generally involves some sort of categorization, typically undertaken by independent judges, and the quality of the data is, to a large degree, evaluated in terms of some reliability coefficient which reflects the extent to which the independent judges agreed. The purpose of this paper is to present and illustrate a probabilistic model for assessing inter-judge reliability. The proposed probabilistic model allows one to (a) use formal test statistics to evaluate the extent and character of inter-judge reliability, (b) estimate the assignment error rates and their standard errors, and (c) test for simultaneous agreement for more than two judges. The probabilistic model is operationalized in terms of restricted latent class models.

9.
To test the hypothesis that personality structure differs across levels of cognitive ability, personality traits of 154 participants of various ages and educational backgrounds were rated by themselves and two well-informed judges using the Estonian Personality Item Pool NEO (EPIP-NEO; Mõttus, Pullmann, & Allik, 2006). When participants were divided into two groups on the basis of their ability test scores, a relatively high cross-observer agreement was observed in both ability groups. Although in the high-ability group personality traits were slightly less correlated and factor structures were somewhat more similar to the normative American self-report structure of the NEO-PI-R, there was no evidence that personality structure differs substantially across ability groups.

10.
A nonparametric, small-sample-size test for the homogeneity of two psychometric functions against the left- and right-shift alternatives has been developed. The test is designed to determine whether it is safe to amalgamate psychometric functions obtained in different experimental sessions. The sum of the lower and upper p-values of the exact (conditional) Fisher test for several 2 × 2 contingency tables (one for each point of the psychometric function) is employed as the test statistic. The probability distribution of the statistic under the null (homogeneity) hypothesis is evaluated to obtain corresponding p-values. Power functions of the test have been computed by randomly generating samples from Weibull psychometric functions. The test is free of any assumptions about the shape of the psychometric function; it requires only that all observations are statistically independent.
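One possible reading of the test statistic described above is sketched here: for each stimulus level, compute the lower and upper one-sided p-values of Fisher's exact test, then sum each across levels. The abstract does not give the exact construction, so this layout, the function names, and the table orientation are assumptions. The sketch also stops at the raw sums and does not evaluate their null distribution, which the paper does.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Lower and upper one-sided p-values of Fisher's exact test for a 2 x 2 table.

    The table is [[a, b], [c, d]], e.g. session 1 (correct, incorrect) over
    session 2 (correct, incorrect). With all margins fixed, cell `a` follows
    a hypergeometric distribution.
    Returns (P(X <= a), P(X >= a)).
    """
    r1, c1, n = a + b, a + c, a + b + c + d
    k_min, k_max = max(0, r1 + c1 - n), min(r1, c1)   # support of cell a
    total = comb(n, r1)
    pmf = {k: comb(c1, k) * comb(n - c1, r1 - k) / total
           for k in range(k_min, k_max + 1)}
    lower = sum(p for k, p in pmf.items() if k <= a)
    upper = sum(p for k, p in pmf.items() if k >= a)
    return lower, upper


def summed_p_values(tables):
    """Sum the lower and the upper p-values over all stimulus levels."""
    pairs = [fisher_one_sided(*t) for t in tables]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)


# Hypothetical data: two stimulus levels, one 2 x 2 table each.
print(summed_p_values([(3, 0, 0, 3), (1, 1, 1, 1)]))
```

Under this reading, a small sum of upper p-values would point to a shift in one direction and a small sum of lower p-values to a shift in the other.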

11.
Three expert MMPI judges classified 100 psychiatric inpatients as psychotic or non-psychotic on the basis of their MMPIs. Validity scale data, as well as clinical scale scores, were included for 50 of the profiles, while the validity scale scores were withheld from the judges for the remaining 50 profiles. Within each of the above two groups, half had a "positive" validity scale sign (a defensive validity scale configuration defined as L or K greater than or equal to 70, or both greater than or equal to 60) and half had a negative validity scale sign, indicating a lack of defensiveness. Using actual diagnosis as the external criterion, results indicated that the majority of defensive psychotic patients produced clinical scale configurations which appeared nonpsychotic to the judges. Conversely, the majority of nondefensive nonpsychotics produced psychotic-appearing clinical scale configurations. These two types of test misses suggest that K corrections on MMPI scales relating to psychosis are not optimal for psychiatric inpatients. Guidelines were developed for interpreting defensive profiles.

12.
The Draw-A-Person test was administered to three matched groups of 32 male Ss each: reactive schizophrenics, process schizophrenics, and normal control subjects. Patients were rated for prognosis using the Premorbid Subscale of the Phillips Scale. Drawings were rated by two judges on 80 diagnostic signs culled from the literature. No signs were found to significantly differentiate reactive and process schizophrenics, and only three signs significantly differentiated normals from schizophrenics. It was concluded that a sign approach to the DAP is insensitive to the reaction-process dimension of schizophrenia, and of only limited value in differentiating between normals and schizophrenics in general.

13.
Testing the fit of finite mixture models is a difficult task, since asymptotic results on the distribution of likelihood ratio statistics do not hold; for this reason, alternative statistics are needed. This paper applies the π* goodness of fit statistic to finite mixture item response models. The π* statistic assumes that the population is composed of two subpopulations: those that follow a parametric model and a residual group outside the model; π* is defined as the proportion of population in the residual group. The population was divided into two or more groups, or classes. Several groups followed an item response model and there was also a residual group. The paper presents maximum likelihood algorithms for estimating item parameters, the probabilities of the groups and π*. The paper also includes a simulation study on goodness of recovery for the two- and three-parameter logistic models and an example with real data from a multiple choice test.

14.
A multivariate permutation test of similarity between two populations with corresponding unordered disjoint categories is described. The test statistic, resampling probability value, and measure of effect size are described.

15.
ABSTRACT: This article reports two studies in which the accuracy of implicit personality theory (IPT) was investigated using on-line behavior counts as well as retrospective frequency estimates as standards of comparison. Eight discussion groups, each comprising six members, were videotaped. Their act frequencies with respect to 16 types of behavior were judged on-line using two coding schemes, each one being applied by two independent raters. Five other judges estimated the act frequencies retrospectively. Furthermore, judges revealed their IPT by estimating the conditional likelihood of these types of behavior. It turned out that (a) retrospective judges perceive different base rates accurately, (b) the correlations among retrospectively estimated and among on-line recorded act frequencies show high correspondences, (c) IPT accurately mirrors the correlations among retrospectively estimated as well as among on-line recorded act frequencies, and (d) judges do not appropriately consider perceived base rates when estimating conditional probabilities. It is concluded that IPT is considerably accurate in those respects that are important for the validity and structural fidelity of personality ratings.

16.
A procedure for evaluating a variety of rater reliability models is presented. A multivariate linear model is utilized to describe and assess a set of ratings. The parameters of such a model are reexpressed in terms of a factor-analytic model. Maximum likelihood methods are employed to estimate and test the parameters in this factor-analytic model. The approach is related to the use of the intraclass correlation coefficient to estimate reliability. Two examples are presented, and the results contrasted to those found with an intraclass correlation approach. Extensions of the procedure to multiple sets of judges, multiple measures, and multiple groups are introduced.
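For readers unfamiliar with the intraclass correlation approach that the abstract uses as its point of comparison, a minimal sketch of the one-way random-effects ICC for a single rating is given below. This is the conventional ANOVA-based coefficient, not the factor-analytic maximum likelihood procedure proposed in the paper; the function name and the input layout (one row per rated target, one column per judge) are assumptions.

```python
def icc_oneway(ratings):
    """One-way random-effects intraclass correlation for a single rating.

    ratings: list of rows, one row per rated target, each holding the
    scores given by k judges (every row must have the same length k >= 2).
    Computes (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of judges per target
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # Between-target and within-target mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical ratings: two judges who agree perfectly on three targets.
print(icc_oneway([[1, 1], [2, 2], [3, 3]]))
```

The coefficient is undefined when all ratings are identical, since both mean squares are then zero; the sketch does not handle that case.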

17.
Two studies were designed to compare (a) the rated creativity of artworks created by American and Chinese college students, and (b) the criteria used by American and Chinese judges to evaluate these artworks. The study demonstrated that the two groups of students differed in their artistic creativity. American participants produced more creative and aesthetically pleasing artworks than did their Chinese counterparts, and this difference in performance was recognized by both American and Chinese judges. The difference between the use of criteria by American and Chinese judges was small, and consisted mainly of the American judges' use of stricter standards in evaluating overall creativity. Moreover, in general, there was a greater consensus among Chinese judges regarding what constitutes creativity than among American judges. The study also revealed, albeit preliminarily, that the artistic creativity of Chinese students was more likely to be reduced as a function of restrictive task constraints or of the absence of explicit instructions to be creative. The results of this study seem to support the hypothesis that an independent self-oriented culture is more encouraging of the development of artistic creativity than is an interdependent self-oriented culture. Other possible explanations, such as differences in people's attitudes toward and motivation for engaging in art activities, or socioeconomic factors, might also account for differences in people's artistic creativity.

18.
A high-level language program to obtain the bootstrap-corrected asymptotic distribution-free (ADF) test statistic proposed by Yung and Bentler (1994) is reviewed. The program uses the Gauss-Newton algorithm, first to obtain the ADF test statistic from the raw data, and second, to achieve the corrected test statistic from 500 independent bootstrap samples. A generator of nonnormal random samples was also implemented, according to the algorithms of Fleishman (1978) and Vale and Maurelli (1983), which permits the realization of Monte Carlo simulations. Furthermore, the open nature of the program facilitates the inclusion of new procedures as well as the possibility of increased control of the procedures, variables, and equations.

19.
Legal Audiences     
This paper approaches legal argumentation from a rhetorical perspective. It discusses the nature of the audiences that are (and should be) targeted by judges in the legal process. Judicial opinions reach diverse groups of people with very different attitudes and expectations: other judges, lawyers, litigants, concerned citizens, etc. One important way in which these groups differ is that some of them are more likely to be persuaded by legalistic, precedent or statute-based arguments, while others expect judges to decide on grounds of justice or equity. So, judges face the challenge of determining whether they should select particular groups for special attention, or whether they have alternative rhetorical means to approach the problem of audience diversity. One strategy that is likely to be recommended by rhetorical scholars is that judges should not try to accommodate the various preferences of their actual readership, but that they should rather invoke an idealized audience or some version of Chaïm Perelman’s universal audience. However, the paper tries to show that the universal audience is of limited value for a discussion about how judges ought to proceed in the face of audience diversity. In particular, the idea of a universal audience does not help judges to make the choice between a legalistic or an equity-based approach to legal decision-making. By showing that this is so, the paper also raises doubts about the common thought that to invoke the universal audience in law is to appeal to natural law (as distinct from written, positive law).

20.