Similar articles
20 similar articles found
1.
This article describes the 1997 revision of the Dutch Rating System for Test Quality used by the Committee of Test Affairs of the Dutch Association of Psychologists (COTAN). The revised rating system evaluates the quality of a test on 7 criteria: Theoretical basis and the soundness of the test development procedure, Quality of the testing materials, Comprehensiveness of the manual, Norms, Reliability, Construct validity, and Criterion validity. For each criterion, a checklist with a number of items is provided. Some items (at least 1 per criterion) are so-called key questions, which check whether certain minimum conditions are met. If a key question is rated negatively, the rating for that criterion is automatically "insufficient." To promote uniform interpretation of the items by raters and to explain the system to test users and test developers, comment sections provide detailed information on rating and weighting the items. Once the items have been rated, the final grades (insufficient, sufficient, or good) for the 7 criteria are established by means of weighting rules.
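The key-question veto and weighting logic described above can be sketched in a few lines. This is an illustrative reconstruction, not the actual COTAN checklist: the item names, thresholds, and equal weighting are all invented for the example; only the veto rule (one negative key question forces "insufficient") comes from the abstract.

```python
def grade_criterion(item_ratings, key_items,
                    threshold_sufficient=0.5, threshold_good=0.8):
    """Grade one criterion from checklist ratings (0 = negative, 1 = positive).

    Thresholds and equal item weights are illustrative assumptions.
    """
    # Key-question veto: a single negative key question forces "insufficient".
    if any(item_ratings[item] == 0 for item in key_items):
        return "insufficient"
    score = sum(item_ratings.values()) / len(item_ratings)
    if score >= threshold_good:
        return "good"
    if score >= threshold_sufficient:
        return "sufficient"
    return "insufficient"

# Hypothetical ratings for a "Norms" criterion:
ratings = {"norm_sample_size": 1, "norm_recency": 1, "norm_representativeness": 0}
print(grade_criterion(ratings, key_items=["norm_sample_size"]))  # sufficient
```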

2.
This study investigated the validity and incremental validity of a situational interview beyond that of a composite measure of cognitive ability. Forty-seven factory service technicians underwent an interview and took four cognitive ability tests. Supervisors rated the performance of these subjects in a concurrent validation study. The interview was a valid predictor of the supervisor rating of performance (r = 0.32, p < 0.05, uncorrected) but showed no incremental validity over the ability tests (incremental R² = 0.05, n.s.). Limitations of the present study and directions for future research are discussed.
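Incremental validity of the kind reported above is the gain in R² when the interview score is added to a regression that already contains the ability tests. A minimal sketch with simulated data (the coefficients, sample, and seed are invented; this is not the study's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 47
ability = rng.normal(size=(n, 4))            # four cognitive ability tests
interview = rng.normal(size=n)               # situational interview score
performance = (ability @ np.array([0.3, 0.2, 0.1, 0.1])
               + 0.2 * interview + rng.normal(size=n))

def r_squared(X, y):
    """R² of an OLS regression of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

r2_ability = r_squared(ability, performance)
r2_full = r_squared(np.column_stack([ability, interview]), performance)
print(f"Incremental R² = {r2_full - r2_ability:.3f}")
```

Because the models are nested, the increment is never negative; the question the study asks is whether it is large enough to matter.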

3.
The validity of cognitive ability tests is often interpreted solely as a function of the cognitive abilities that these tests are supposed to measure, but other factors may be at play. The effects of test anxiety on the criterion-related validity (CRV) of tests were the topic of a recent study by Reeve, Heggestad, and Lievens (2009; Intelligence, 37, 34–41). They proposed a model based on classical test theory and concluded from data simulations that test anxiety typically decreases the CRV. In this paper, we view the effects of test anxiety on cognitive ability test scores and their implications for validity coefficients from the perspective of confirmatory factor analysis. We argue that CRV will be increased above the effect of targeted constructs if test anxiety affects both predictor and criterion performance. This prediction is tested empirically by considering the convergent validity of subtests in five experimental studies of the effect of stereotype threat on test performance. Results show that the effects of test anxiety on cognitive test performance may actually enhance the validity of tests.

4.
Nonparametric tests for testing the validity of polytomous ISOP-models (unidimensional ordinal probabilistic polytomous IRT-models) are presented. Since the ISOP-model is a very general nonparametric unidimensional rating scale model, the test statistics apply to a great multitude of latent trait models. A test for the comonotonicity of item sets of two or more items is suggested. Procedures for testing the comonotonicity of two item sets and for item selection are developed. The tests are based on Goodman-Kruskal's gamma index of ordinal association and are generalizations thereof. It is an essential advantage of polytomous ISOP-models within probabilistic IRT-models that the tests of validity of the model can be performed before and without the model being fitted to the data. The new test statistics have the further advantage that no prior order of items or subjects needs to be known.
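The Goodman-Kruskal gamma index that these tests generalize is simple to compute: gamma = (C − D) / (C + D), where C and D count concordant and discordant pairs of observations (pairs tied on either variable are ignored). A minimal sketch, not the generalized ISOP test statistics themselves:

```python
from itertools import combinations

def gk_gamma(x, y):
    """Goodman-Kruskal's gamma for two ordinal variables of equal length."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1       # pair ordered the same way on both variables
        elif s < 0:
            discordant += 1       # pair ordered oppositely
        # ties on either variable are ignored
    return (concordant - discordant) / (concordant + discordant)

print(gk_gamma([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0 — perfect ordinal agreement
```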

5.
Recent years have shown increased awareness of the importance of personality tests in educational, clinical, and occupational settings, and developing faking-resistant personality tests is a very pragmatic issue for achieving more precise measurement. Inspired by Stark (2002) and Stark, Chernyshenko, and Drasgow (2005), we develop a pairwise preference-based personality test that aims to measure multidimensional personality traits using a large-scale statement bank. An experiment compares the developed test's resistance to faking with that of rating scale-based personality tests within the item response theory framework. Results show that latent traits estimated from the rating scale-based personality test are severely biased, whereas the faking effect can, for practical purposes, be ignored in the test developed with the pairwise preference method.

6.
This study introduces the Amsterdam Chess Test (ACT). The ACT measures chess playing proficiency through 5 tasks: a choose-a-move task (comprising two parallel tests), a motivation questionnaire, a predict-a-move task, a verbal knowledge questionnaire, and a recall task. The validity of these tasks was established using external criteria based on the Elo chess rating system. Results from a representative sample of active chess players showed that the ACT is a very reliable test of chess expertise and that the ACT has high predictive validity. Several hypotheses about the relationships between chess expertise, chess knowledge, motivation, and memory were tested. Incorporating response latencies in test scores is shown to increase criterion validity, particularly for easy items.

7.
The strong controversy over the Thematic Apperception Test's (TAT) validity may be partly due to the divergent results critics and advocates have obtained in their own research. It is noted that conditions of test administration are closely associated with quality of TAT research results. To demonstrate a cause of TAT invalidity, 199 adolescents were given the TAT following one of four instructional sets: neutral, following a personality test, emphasizing that the TAT is a personality test, and in a nonthreatening but structured setting. As expected, stories written after neutral instructions were valid predictors of need for achievement, affiliation, and power criteria; whereas other stories yielded nonsignificant and sometimes negative validities. It was concluded that the instructions are crucial to the quality of TAT results, and suggestions were offered to help ensure validity.

8.
In recent years, there has been debate about the validity of figure drawings, although surveys of clinicians in both general and forensic practice still find them to be one of the most widely used tests of personality functioning. Using both Heilbrun's (1992) guidelines for the use of psychological tests in a forensic evaluation and the U.S. Supreme Court's Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993) criteria for the admission of scientific evidence, I examine the admissibility of human figure drawings in court. The results suggest that the most commonly used methods for interpreting human figure drawings fall short of meeting the standards for admissibility. The use of overall rating scales, although they are weak in validity, appears to meet these standards minimally.

9.
The aim of this article is to provide empirical psychometric evidence of the (longitudinal) predictive validity of a learning potential measure—the Learning Potential Computerised Adaptive Test (LPCAT)—in comparison with standard static tests with school aggregate results as the criterion measure. Participants were 79 boys (mean age 12.44, SD = 0.44) and 72 girls (mean age 11.18, SD = 0.42) attending two private schools. Correlation and regression analyses were used to evaluate the predictive validity of the learning potential and standard test scores for school aggregate academic results as criterion measure. Results indicate that learning potential scores were statistically significant predictors of aggregate academic results and provided results that were comparable to those of the standard test results—providing empirical support for the use of learning potential tests in mainstream educational settings.

10.
A Rasch-Based Experimental Analysis of Scoring on the HSK Subjective Test
Inconsistency in subjective scoring lowers its reliability. The many-facet Rasch model, based on item response theory, can be applied to identify and remove rater effects, thereby improving the reliability of subjective scores. This article introduces the theory and application framework of the many-facet Rasch model, designs a quality-control framework for subjective HSK scoring based on the model, and validates it experimentally using HSK essay-scoring data.
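At the core of the many-facet Rasch model is a logit equation in which rater severity enters as an additional facet alongside examinee ability and item (or task) difficulty. A minimal sketch of the dichotomous case follows; the parameter values are illustrative, and real applications (such as HSK essay scoring) use polytomous rating-scale extensions and estimate the parameters from data:

```python
import math

def mfr_probability(ability, difficulty, severity):
    """P(score = 1) under a dichotomous many-facet Rasch model (logit scale)."""
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))

# A harsher rater (higher severity) lowers the expected score for the same
# examinee and task — exactly the rater effect the model identifies and removes.
lenient = mfr_probability(ability=1.0, difficulty=0.0, severity=-0.5)
harsh = mfr_probability(ability=1.0, difficulty=0.0, severity=0.5)
print(f"{lenient:.2f} vs {harsh:.2f}")  # 0.82 vs 0.62
```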

11.
Various stimulus components (video, orally presented questions) and response components (multiple-choice, written, and orally given replies) of situational judgement tests of occupational social competency were investigated for their impact on validity against a behavior-oriented role-playing criterion, with test content held constant. The stimulus component video alone had no impact on validity. The response components contributed to validity, and validity increased with the fidelity of the response components. Concerning stimulus-response combinations, the validity of two video tests (r = 0.17 and r = 0.36) was not higher than that of similar oral questioning (r = 0.13 and r = 0.37) but was significantly lower than that of a situational interview (r = 0.59). Response fidelity proved to be a bottleneck for the validity of video tests. It is therefore recommended that developers of video and multimedia tests focus special attention on response fidelity in order to maximize validity.

12.
Several recent articles have suggested that assessments of the relative importance of different abilities or competencies to a job have little bearing on the criterion-related validity of the selection tests that measure those abilities. We hypothesized that selection test batteries chosen to maximize the judged importance of knowledge, skills, and abilities would not predict performance better than batteries of tests chosen at random. The results in two independent samples consistently show that the validity of test batteries chosen on the basis of subject matter experts' judgments of importance does not differ from the validity of batteries of a comparable number of tests chosen at random from a set of intercorrelated tests, or even of those chosen to provide the worst possible match between test content and job content.

13.
This study empirically evaluates the relationships between rationally and empirically estimated item indices used in the development of a test of land navigation knowledge for the US Marine Corps. Three land navigation instructors provided ratings on three item indices (item content validity ratios [CVRs], difficulty, and discrimination). These ratings were correlated with empirically derived item indices (difficulty, discrimination, and item-criterion correlations) obtained from the scores of 359 Marines on measures of land navigation knowledge, skill, and performance. Contrary to previous research, item CVRs were significantly correlated with all three empirically determined item indices. The increased effectiveness of rational item indices obtained in the present study over those from previous research appears to be related to the following areas: qualifications of expert raters, match of raters to the rating task, and the nature of the rating task. The implications for content-oriented test design and for judgmental test validation strategies are discussed.

14.
The resurgence of personality tests in selection has sparked interest in factors that may increase the utility and acceptability of these tests. Following a justice framework, the present study explores two possible methods for improving the psychometric properties and test-taker perceptions of a widely used measure of personality, the NEO Five-Factor Inventory. The first manipulation altered respondents' frame of reference (FOR) by adding "at work" tags to the personality test. The second provided information about the validity and appropriateness of the personality test for selection. In a controlled laboratory experiment, participants (N = 345) were randomly assigned to one condition of a 2 (FOR: work specific vs. generic) × 2 (information: validity vs. control) between-subjects design. The FOR manipulation produced consistent effects on personality test responses but, in contrast to recent claims, no effect on test perceptions. The information manipulation, by contrast, primarily influenced job-relatedness perceptions but had a modest negative effect on the psychometric properties of the personality test. These results show some possibilities, and difficulties, for enhancing perceptions of personality tests. They also have important implications for justice theory because they suggest that interactions among procedural justice rules may yield unexpected and contradictory effects.

15.
16.
A Study of Situational Assessment Methods for Characteristics of Interpersonal Adaptation
This study used a situational assessment method to investigate characteristics of interpersonal adaptation. The results showed that: (1) using Schutz's six interpersonal-relations factors as evaluation criteria, the situational assessment method reveals interpersonal adaptation characteristics more effectively than questionnaires; (2) in situational assessment, cooperation-oriented scenarios reflect interpersonal adaptation characteristics better than competition-oriented ones; and (3) a process design with orientation, organization, communication, and problem-solving stages matches the actual course of situational assessment, helps elicit participants' behavioral characteristics stage by stage, and improves the controllability and accuracy of the assessment.

17.
One hundred ninety-three manufacturing employees who produce electro-mechanical components participated in a concurrent criterion-related validity study. The employees were administered three tests: the Bennett Mechanical Comprehension Test (Form S), the Flanagan Aptitude Classification Test-Mechanics, and the Thurstone Test of Mental Alertness (Form A). Job performance was measured by supervisor ratings of fifteen job dimensions, assessed at two points in time separated by 60 days. Correlational and multiple regression analyses were used to assess the relationship between test scores and job performance ratings. The results revealed that the Bennett Mechanical Comprehension Test was the best single predictor of job performance (uncorrected r = .38) and that the incremental gain in predictability from additional tests was not significant. The results are discussed in the context of the changing nature of manufacturing jobs and the failure of conventional mechanical aptitude tests to be sensitive to these changes.

18.
Art aptitude tests are important for identifying and selecting artistic talent. Existing tests fall into two categories, measures of aesthetic ability and measures of artistic creation ability; aesthetic-ability measures can in turn take two forms, aesthetic judgment tests and tests in which a judgment is followed by a choice of reasons for it. However, previous research lacks empirical studies of the dimensions of aesthetic perceptual ability, does not distinguish "subjective beauty" from "objective beauty," and lacks comparative studies of the validity of the various types of art aptitude tests. Future work could deepen empirical research on the dimensions of aesthetic perceptual ability, develop multiple types of art-ability judgment tests and compare their validity, develop art aptitude tests suitable for primary school students, and develop art aptitude tests appropriate to China's national conditions.

19.
Despite widespread and growing acceptance that published personality tests are valid predictors of job performance, Morgeson et al. (2007) propose they be abandoned in personnel selection because average validity estimates are low. Our review of the literature shows that Morgeson et al.'s skepticism is unfounded. Meta-analyses have demonstrated that published personality tests, in fact, yield useful validity estimates when validation is based on confirmatory research using job analysis and taking into account the bidirectionality of trait–performance linkages. Further gains are likely by use of narrow over broad measures, multivariate prediction, and theory attuned to the complexities of trait expression and evaluation at work. Morgeson et al. also suggest that faking has little, if any, impact on personality test validity and that it may even contribute positively to job performance. Job applicant research suggests that faking under true hiring conditions attenuates personality test validity but that validity is still sufficiently strong to warrant personality test use in hiring. Contrary to Morgeson et al., we argue that the full value of published personality tests in organizations has yet to be realized, calling for programmatic theory-driven research.

20.
Military Psychology, 2013, 25(1), 97–120
This investigation evaluated potential revisions to the Armed Services Vocational Aptitude Battery (ASVAB). The data analyzed were collected from trainees in 17 U.S. Air Force, Army, and Navy jobs as part of the Joint-Services Enhanced Computer-Administered Test (ECAT) battery validation study. Predictors included the trainees’ preenlistment scores for the 10 tests in the current ASVAB, plus the 9 experimental ECAT battery tests. The criteria were measures of training performance. All possible combinations of tests that (a) included the Word Knowledge and Arithmetic Reasoning tests of the ASVAB and (b) could be administered in a 134- to 164-min interval were evaluated with respect to 5 indexes of test battery performance: criterion-related validity, classification efficiency, and 3 types of subgroup differences (White vs. Black, White vs. Hispanic, and male vs. female). The 5 indexes were calculated for each of the 16,437 possible combinations of tests. The standard deviations of the indexes across the combinations of tests showed that (a) values on the validity index varied little, (b) values on the classification efficiency and White versus Black and White versus Hispanic subgroup differences indexes varied moderately, and (c) values on the male versus female difference index varied substantially. The validity index of the combinations showed a moderate correlation with the classification efficiency index and a nearly zero correlation with subgroup differences. However, the classification efficiency index showed a small-to-moderate positive correlation with the subgroup difference indexes. The subgroup difference indexes showed moderate-to-high positive correlations with one another. Examinations of the top 20 combinations of tests identified by each index demonstrated that tests that optimize one type of index usually do not optimize each of the other indexes. 
In particular, trade-offs were observed between (a) the maximization of validity (and classification efficiency) versus the minimization of all 3 types of subgroup differences and (b) the minimization of differences between Whites and Blacks (or between Whites and Hispanics) versus the minimization of differences between men and women. These results suggest that no combination of the tests considered in this investigation simultaneously optimizes all 5 test battery performance indexes.
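The enumeration step in the study above — every battery that contains two mandatory tests and fits an administration-time window — can be sketched with a simple combinatorial search. The test names, times, and window below are invented toy values (the real study used the 10 ASVAB plus 9 ECAT tests and a 134- to 164-min window, yielding 16,437 combinations):

```python
from itertools import combinations

# Hypothetical tests with administration times in minutes.
tests = {"WK": 20, "AR": 30, "T1": 15, "T2": 25, "T3": 10, "T4": 20}
mandatory = {"WK", "AR"}                      # always-included tests
optional = [t for t in tests if t not in mandatory]

valid_batteries = []
for r in range(len(optional) + 1):
    for combo in combinations(optional, r):
        battery = mandatory | set(combo)
        total = sum(tests[t] for t in battery)
        if 60 <= total <= 90:                 # stand-in for the time window
            valid_batteries.append((sorted(battery), total))

for battery, total in valid_batteries:
    print(battery, total, "min")
```

Each valid battery would then be scored on the five indexes (validity, classification efficiency, and the three subgroup differences) to expose the trade-offs the abstract reports.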


Copyright©北京勤云科技发展有限公司  京ICP备09084417号