Similar articles
 20 similar articles found (search time: 31 ms)
1.
In the theory of test validity it is assumed that error scores on two distinct tests, a predictor and a criterion, are uncorrelated. The expected-value concept of true score in the classical test-theory model as formulated by Lord and Novick, Guttman, and others, implies mathematically, without further assumptions, that true scores and error scores are uncorrelated. This concept does not imply, however, that error scores on two arbitrary tests are uncorrelated, and an additional axiom of “experimental independence” is needed in order to obtain familiar results in the theory of test validity. The formulas derived in the present paper do not depend on this assumption and can be applied to all test scores. These more general formulas reveal some unexpected and anomalous properties of test validity and have implications for the interpretation of validity coefficients in practice. Under some conditions there is no attenuation produced by error of measurement, and the correlation between observed scores sometimes can exceed the correlation between true scores, so that the usual correction for attenuation may be inappropriate and misleading. Observed scores on two tests can be positively correlated even when true scores are negatively correlated, and the validity coefficient can exceed the index of reliability. In some cases of practical interest, the validity coefficient will decrease with increase in test length. These anomalies sometimes occur even when the correlation between error scores is quite small, and their magnitude is inversely related to test reliability. The elimination of correlated errors in practice will not enhance a test's predictive value, but will restore the properties of the validity coefficient that are familiar in the classical theory.
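The anomalies described in this abstract can be illustrated with a short worked derivation. This is only a sketch in notation chosen here, not the paper's own formulas: it assumes observed scores decompose as true score plus error, and that true scores on one test are uncorrelated with error scores on the other, while the error scores themselves may be correlated. The symbols \rho_{XX'} and \rho_{YY'} denote the reliabilities of the two tests.

```latex
\begin{align}
X &= T_X + E_X, \qquad Y = T_Y + E_Y \\
\operatorname{Cov}(X,Y) &= \operatorname{Cov}(T_X,T_Y) + \operatorname{Cov}(E_X,E_Y) \\
\rho_{XY} &= \rho_{T_X T_Y}\sqrt{\rho_{XX'}\,\rho_{YY'}}
           + \rho_{E_X E_Y}\sqrt{(1-\rho_{XX'})(1-\rho_{YY'})}
\end{align}
% Illustrative numbers (hypothetical, not taken from the paper):
% \rho_{T_X T_Y} = -0.10,\ \rho_{XX'} = \rho_{YY'} = 0.50,\ \rho_{E_X E_Y} = 0.50
% \rho_{XY} = (-0.10)(0.50) + (0.50)(0.50) = 0.20
```

With uncorrelated errors the second term vanishes and the expression reduces to the classical attenuation formula. In the hypothetical numbers above, the observed correlation is positive (0.20) although the true-score correlation is negative (-0.10), which is one of the anomalies the abstract mentions.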

2.
A model for longitudinal latent structure analysis is proposed. We assume that test scores for a given mental or attitudinal test are observed for the same individuals at two different points in time. The purpose of the analysis is to fit a model that combines the values of the latent variable at the two time points in a two-dimensional latent density. The correlation coefficient between the two values of the latent variable can then be estimated. The theory and methods are illustrated by a Danish dataset concerning psychic vulnerability.

3.
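Wang Wenyi (汪文义), Song Lihong (宋丽红), Ding Shuliang (丁树良). Acta Psychologica Sinica (《心理学报》), 2016, 48(12): 1612-1624
This paper introduces classification accuracy and classification consistency indices under multidimensional item response theory models, uses Monte Carlo methods to compute the indices under complex decision rules, and proves mathematically that the two types of estimators of the classification accuracy index converge in probability to the same true value under a uniform prior and identical decision rules. The results show that: the classification accuracy index evaluates the accuracy of classification results fairly precisely; the classification consistency index evaluates the test-retest consistency of classification results well; under certain conditions, indices based on the ability scale outperform indices based on raw total scores; estimation precision remains fairly good even as the number of test dimensions increases; and classification accuracy and classification consistency increase with test length and with the correlation between dimensions. The indices can be used to evaluate classification reliability and validity under a variety of decision rules in criterion-referenced tests or computerized classification tests.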

4.
Influence of test anxiety on measurement of intelligence   Cited by: 1 (self-citations: 0, citations by others: 1)
In this study a measurement model for a test anxiety questionnaire was investigated in a sample of 207 Dutch students in the first grade of junior secondary vocational education. The results of a confirmatory factor analysis showed that a model for test anxiety with three factors for worry, emotionality, and lack of self-confidence is associated with a significantly better fit than a model comprised of only the first two factors. The relations of the three test anxiety factors to scores on intelligence tests for measuring verbal ability, reasoning, and spatial ability were examined. The results indicated that test anxiety appears to be transitory: the negative relation between test anxiety and test performance promptly fades away. Finally, we examined whether a distinction can be made between highly test anxious students with low performance due to worrisome thoughts (interference hypothesis) or low ability (deficit hypothesis). Results do not support the deficit hypothesis because the scores of all highly test anxious students increased in a less stressful situation.

5.
Mean gain scores for cognitive ability tests between two sessions in a selection setting are now a robust finding, yet not fully understood. Many authors do not attribute such gain scores to an increase in the target abilities. Our approach consists of testing a longitudinal SEM model suitable to this view. We propose to model the score changes of a battery of tests between two sessions with a single factor, namely the change in the situational component of the scores. The situational component encompasses all effects due to the specificity of the state of the person in the current situation (e.g., anxiety level, tiredness, test-taking practice) and is allowed to vary from one session to another. By definition, this single component is supposed to influence all tests at a given session. In particular cases such as high-stakes selection settings, where applicants are likely to train themselves before retaking the tests, situational factors might even suffice to explain mean score increases. Empirically, our latent change model closely fitted the scores of 752 applicants for entry into the French Aircraft Pilot Training, gathered on a set of three tests (visual perception, mechanical comprehension, and selective attention). Gain scores of moderate to strong effect sizes could be explained by common situational effects, with no need for admitting change on ability components. Therefore, gain scores may be understood as construct-irrelevant changes.

6.
This study examines how audiovisual signals are combined in time for a temporal analogue of the ventriloquist effect in a purely temporal context, that is, no spatial grounding of signals or other spatial facilitation. Observers were presented with two successive intervals, each defined by a 1250-ms tone, and indicated in which interval a brief audiovisual stimulus (visual flash + noise burst) occurred later. In "test" intervals, the audiovisual stimulus was presented with a small asynchrony, while in "probe" intervals it was synchronous and presented at various times guided by an adaptive staircase to find the perceived temporal location of the asynchronous stimulus. As in spatial ventriloquism, and consistent with maximum likelihood estimation (MLE), the asynchronous audiovisual signal was shifted toward the more reliably localized component (audition, for all observers). Moreover, these temporal shifts could be forward or backward in time, depending on the asynchrony order, suggesting perceived timing is not entirely determined by physical timing. However, the critical signature of MLE combination (better bimodal than unimodal precision) was not found. Regardless of the underlying model, these results demonstrate temporal ventriloquism in a paradigm that is defined in a purely temporal context.
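The MLE prediction that this abstract tests can be sketched in a few lines. The function below is an illustrative assumption, not the study's analysis code: it uses the standard reliability-weighted combination rule for two cues with independent Gaussian noise, and the name `mle_combine` and all numeric values are hypothetical.

```python
import math


def mle_combine(t_a, sigma_a, t_v, sigma_v):
    """Reliability-weighted (maximum likelihood) combination of two timing cues.

    t_a, t_v: perceived times of the auditory and visual components (ms).
    sigma_a, sigma_v: standard deviations of each unimodal estimate (ms).
    Assumes independent Gaussian noise on the two cues.
    """
    # Each cue is weighted by its reliability (inverse variance).
    w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)
    w_v = 1 - w_a
    t_av = w_a * t_a + w_v * t_v
    # Predicted bimodal SD is never larger than the better unimodal SD.
    sigma_av = math.sqrt((sigma_a**2 * sigma_v**2) / (sigma_a**2 + sigma_v**2))
    return t_av, sigma_av


# Hypothetical values: audition is three times more precise than vision,
# and the flash lags the noise burst by 40 ms.
t_av, sigma_av = mle_combine(t_a=0.0, sigma_a=10.0, t_v=40.0, sigma_v=30.0)
print(t_av, sigma_av)
```

With these hypothetical inputs the combined estimate lies close to the auditory time (about 4 ms from it), which corresponds to the shift toward audition reported in the abstract. The predicted bimodal standard deviation (about 9.5 ms) is smaller than either unimodal one, and this is the precision benefit the study did not find.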

7.
Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.
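The sufficiency property stated at the start of this abstract can be checked numerically for the dichotomous case. The sketch below is not the pairwise pseudo-likelihood derived in the paper; it only illustrates, under the standard dichotomous Rasch model, that the probability of a response pattern given the person's total score does not depend on the person parameter. The function names and the item difficulties are hypothetical.

```python
import math
from itertools import product


def pattern_prob(pattern, theta, deltas):
    """Probability of a 0/1 response pattern under the dichotomous Rasch model."""
    p = 1.0
    for x, delta in zip(pattern, deltas):
        p_correct = 1 / (1 + math.exp(-(theta - delta)))
        p *= p_correct if x == 1 else 1 - p_correct
    return p


def conditional_pattern_prob(pattern, theta, deltas):
    """Probability of the pattern given its total score r = sum(pattern)."""
    r = sum(pattern)
    # Enumerate every pattern that yields the same total score.
    same_score = [y for y in product((0, 1), repeat=len(deltas)) if sum(y) == r]
    total = sum(pattern_prob(y, theta, deltas) for y in same_score)
    return pattern_prob(pattern, theta, deltas) / total


deltas = [-1.0, 0.0, 1.5]   # hypothetical item difficulties
pattern = (1, 0, 1)
low = conditional_pattern_prob(pattern, theta=-1.0, deltas=deltas)
high = conditional_pattern_prob(pattern, theta=2.0, deltas=deltas)
print(low, high)
```

The two printed values agree up to floating-point error even though the unconditional pattern probabilities differ between the two persons. Conditioning on the total score removes the person parameter, and the paper applies the same idea in the other direction, conditioning out item parameters when estimating person parameters.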

8.
In selection research and practice, there have been many attempts to correct scores on noncognitive measures for applicants who may have faked their responses somehow. A related approach with more impact would be identifying and removing faking applicants from consideration for employment entirely, replacing them with high-scoring alternatives. The current study demonstrates that under typical conditions found in selection, even this latter approach has minimal impact on mean performance levels. Results indicate about .1 SD change in mean performance across a range of typical correlations between a faking measure and the criterion. Where trait scores were corrected only for suspected faking, and applicants not removed or replaced, the minimal impact the authors found on mean performance was reduced even further. By comparison, the impact of selection ratio and test validity is much larger across a range of realistic levels of selection ratios and validities. If selection researchers are interested only in maximizing predicted performance or validity, the use of faking measures to correct scores or remove applicants from further employment consideration will produce minimal effects.

9.
According to Helms, "test fairness" is defined as "removal from test scores of systematic variance attributable to experiences of racial or cultural socialization." Some of Helms's reasoning is based on earlier work, which recommended that racial group or category variables be replaced entirely with individual-level constructs, to reflect racial socialization experiences that vary within racial groups. Treatment of the test fairness issue (a social and political issue) will benefit from explicitly considering historical events that contributed to group-level race differences. In light of this history, D. A. Newman et al. suggest (a) retaining a group-level conceptualization of race/racial socialization and also (b) focusing on criterion-irrelevant variance in test scores that is attributable to race.

10.
Two experiments were conducted to determine if the manner in which information is combined to make personnel evaluations might depend on the correlation between the pieces of information. Introductory psychology students evaluated hypothetical employees for promotion to a supervisory position in a computer-systems department based on personnel test scores in management and computer programming. The management scores were from either a high or a low validity test. In addition some employees were missing test scores. The correlation between the test scores varied across groups of subjects (−.84, .00, .84). Responses were consistent with a relative-weight averaging model in the positive-correlation condition, but not in the negative- or zero-correlation conditions. A constant-weight averaging model seemed most appropriate in the zero-correlation condition, but it was not really possible to distinguish an additive from a constant-weight averaging strategy in the negative-correlation condition. The test to distinguish these strategies involves comparing the ratings of single-test employees to those of two-test employees. However, subjects developed various strategies for rating the single-test employees which tended to invalidate the test. Configural strategies were evident in the negative- and zero-correlation conditions for employees with lopsided score combinations. Employees with one test score were penalized particularly in the negative-correlation condition and when the available score was low in validity.

11.
This experiment tested the hypothesis that the number of model presentations and verbal coding of modeled actions affect reproduction accuracy through their effect on cognitive representation. Subjects viewed a complex action pattern either two or eight times with or without verbal coding to highlight the dynamic structure of the component actions and their temporal sequencing. They then received, in order, a recognition test and a pictorial-arrangement test to assess the accuracy of their cognitive representations of the modeled actions. Subsequently, all subjects were tested for their ability to reproduce the action pattern from memory. Results showed that increased exposure to modeled actions enhanced the accuracy of both the cognitive representation and the behavioral reproduction. Verbal coding also increased cognitive and reproduction accuracy, but only when combined with multiple opportunities to observe the modeled actions. A causal analysis confirmed that the effects of multiple exposures and verbal coding were entirely mediated by changes produced in the accuracy of cognitive representation.

12.
Available research suggests that fear of negative evaluation and fear of positive evaluation are related but distinct constructs that each contribute to social anxiety, implying a need to focus on these fears in treatment. Yet, this research is almost entirely based on cross-sectional data. We examined the longitudinal relationship between fears of positive and negative evaluation over three time points in a sample of undergraduate students. We tested competing models consistent with two basic positions regarding these fears: (1) that fear of positive evaluation only appears to affect social anxiety because it arises from the same, single underlying trait as fear of negative evaluation, and (2) fears of positive and negative evaluation are correlated, but clearly distinct, constructs. The best-fitting model was an autoregressive latent-trajectory model in which each type of fear had a separate trait-like component. The correlation between these trait-like components appeared to fully account for the relationships between these constructs over time. This investigation adds to the evidence in support of the second position described above: fear of positive evaluation is best interpreted as a separate construct from fear of negative evaluation.

13.
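Liu Yue (刘玥), Liu Hongyun (刘红云). Acta Psychologica Sinica (《心理学报》), 2017, (9): 1234-1246
The bifactor model can include one general (global) factor and several specific (local) factors at the same time, which gives it a distinctive advantage in describing the structure of multidimensional tests, and it has been applied increasingly widely in recent years. Based on the bifactor model, this article proposes four methods for computing composite total scores and dimension scores: the raw score method, the summation method, the general-item weighted summation method, and the specific-item weighted summation method. A simulation study compared these methods with the traditional multidimensional IRT method under varying sample size, test length, and correlation between dimensions. Finally, the results were verified in an empirical study. The results show that: (1) total scores and dimension scores composed by the general weighted summation and specific weighted summation methods, especially the latter, are closest to the true values and have the highest reliability; (2) when the correlation between dimensions is high and the test is long, the specific weighted summation method performs well, and under some conditions even outperforms the multidimensional IRT method; (3) only the dimension scores composed by the specific weighted summation method reflect the true correlations between dimensions.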

14.
This study examines the predictive criterion-related validity of a series of professional certification tests for water and wastewater management operators. Certification test data were obtained on 164 operators holding one of three jobs in water or wastewater management facilities. The certification test scores were broken down into four component scores and a total score. Criterion data consisted of performance evaluations obtained from the operator's supervisor, a self-rating from the operator, or both. Test scores were correlated with the job performance evaluations. The results indicated that scores on the certification tests were not related to rated job performance by job type, job level, source of performance evaluation, or component of job performance. The findings are discussed in the context of establishing an appropriate criterion against which to validate certification tests, and the practical problems of doing so in an applied setting.

15.
It has been claimed that the short-term forgetting shown by the Peterson technique is entirely due to proactive interference from prior experimental items. Two experiments investigated this by studying forgetting when prior items were avoided by testing subjects only once. Both experiments showed significant forgetting, although the degree of forgetting was less than with a multitrial procedure. On the basis of this and other results it is suggested that the Peterson technique comprises two components, a primary memory component which decays within 6 sec, and a more stable secondary memory component. Forgetting with the multitrial procedure is attributed principally to the need to use temporal retrieval cues to avoid confusion between successive items; longer retention intervals are associated with reduced temporal discriminability and hence poorer recall.

16.
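Inspection time and intelligence in children   Cited by: 4 (self-citations: 1, citations by others: 3)
Liu Zhengkui (刘正奎), Shi Jiannong (施建农), Cheng Li (程黎). Acta Psychologica Sinica (《心理学报》), 2003, 35(6): 823-829
Using three visual inspection time tasks, this study examined the characteristics of children's inspection time and the relationship between children's inspection time and intelligence. The results showed that children's inspection time tends to decrease gradually with age and depends on the type of processing task. Inspection time correlated moderately and negatively with intelligence test scores, and the strength of this correlation was influenced by the processing task and by age. Compared with the group of children with fast inspection times, inspection time in the slow group was a better predictor of intelligence test scores.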

17.
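Forced-choice (FC) tests are widely used in noncognitive assessment because they can control the response biases associated with traditional Likert-type methods. However, the traditional scoring of FC tests produces ipsative data, which have long been criticized as unsuitable for comparisons between individuals. In recent years, the development of several FC IRT models has allowed researchers to obtain approximately normative data from FC tests, renewing the interest of researchers and practitioners in these models. This review first classifies and introduces six mainstream FC IRT models according to the decision model and item response model they adopt. It then compares and summarizes the models in terms of model-building rationale and parameter estimation methods. Next, it reviews applied research in three areas: tests of parameter invariance, computerized adaptive testing (CAT), and validity studies. Finally, it suggests that future research pursue four directions: model extension, tests of parameter invariance, FC CAT, and validity studies.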

18.
This study introduces the Sentiment Analysis and Cognition Engine (SEANCE), a freely available text analysis tool that is easy to use, works on most operating systems (Windows, Mac, Linux), is housed on a user’s hard drive (as compared to being accessed via an Internet interface), allows for batch processing of text files, includes negation and part-of-speech (POS) features, and reports on thousands of lexical categories and 20 component scores related to sentiment, social cognition, and social order. In the study, we validated SEANCE by investigating whether its indices and related component scores can be used to classify positive and negative reviews in two well-known sentiment analysis test corpora. We contrasted the results of SEANCE with those from Linguistic Inquiry and Word Count (LIWC), a similar tool that is popular in sentiment analysis, but is pay-to-use and does not include negation or POS features. The results demonstrated that both the SEANCE indices and component scores outperformed LIWC on the categorization tasks.

19.
Objective tests of personality typically include a number of items or trials; the total score on the test is the sum of the subject's “correct” responses across all such trials. Normally, the trials are varied systematically across various facets of the test design, so that the total score represents a composite measure of accuracy averaged across these test facets. However, since only one score is computed for each subject, some potentially important kinds of individual differences (namely, all those associated with each particular variation in the test design) are treated solely as measurement unreliability. Such a psychometric stance may serve to obscure more differentiated types of individual differences, with the result that composite scores from trials based on one type of experimental design may not be highly related to such scores from trials using a somewhat different design. The present paper presents a general procedure for scoring objective tests more analytically. To illustrate this general rationale, and to demonstrate its potential utility, data have been reanalyzed from two previous studies, one using the Rod-and-Frame test, the other the Müller-Lyer illusion. In both cases, the traditional global accuracy score did not correlate significantly with other theoretically related variables, while a number of component scores were quite highly related.

20.
Both the sensitivity and administration time of a test are important in evaluating visuospatial attention in clinical settings, especially with respect to external validity. The purpose of the present study was to propose an adaptive model that provides a reference for test modification by manipulating target-to-distractor (T/D) ratios and the number of stimuli on the computerized cancellation test system. Tasks with different T/D ratios and numbers of stimuli were presented to two groups, children with and without dyslexia (n=41 and 65, respectively), to determine whether their visuospatial attention performance differed on different test forms. In general, there were significant differences between the two groups in hit rates, completion times, and performance quality (PQ) scores. The PQ score of visual attention was affected by the T/D ratios rather than by the number of stimuli. The findings suggested that the T/D ratio has a strong effect on PQ scores, and that it should be taken into consideration in test and task design.
