Similar articles
 Found 20 similar articles; search took 406 ms
1.
Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven’s progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.

2.
The assumption that individual differences in recognition memory are associated with individual differences in intelligence was explored by administering intelligence tests and tests of immediate visual recognition memory to a sample of 52 5-year-old children expected to vary widely from one another in intelligence. Each child was given the Peabody Picture Vocabulary Test (Form L) and two tests of immediate recognition memory: one test for 27 abstract patterns and one test for 27 unfamiliar cartoon faces. The mean PPVT-IQ for the sample was in the average range at 98.1. Interindividual variability in IQ proved to be high as reflected in the group SD of 22.6, with scores ranging from 40 to 136. The recognition tasks proved to be of moderate difficulty. Individual differences in memory for patterns were highly related to memory for faces (r = .76), indicating that the overall recognition test was reliable. The most important result of the present study was the strong association between recognition memory performance and PPVT-IQ of .70. The relation between recognition memory and IQ could not be accounted for by the inclusion of a few very low IQ children, since the association remained high at .61 when children with IQs below 75 were omitted from analysis. In short, the present results indicate that immediate recognition memory is highly associated with intelligence.

3.
Scores on a test built on Raaheim's (1974) theory of problem solving and intelligence (the Family Test, Part I) were correlated with scores of divergent and convergent production within the same ideational area (the Family Test, Parts II and III). The results indicate that both divergent and convergent production contribute to the solution of the problem-solving tasks. To expand the findings to a broader field of intelligence research, scores on the tests of divergent and convergent production were correlated with school achievement. Multiple correlations of 0.61 (males) and 0.67 (females) were found between school achievement and the two tests. Comparisons of groups with scores above and below the means of the two tests show that school achievement depends on the combination of divergent and convergent production, suggesting that, although the two types of production reflect two different aspects of intellectual activity, as a rule they work together in the process of intelligent adjustment.

4.
The interpretation of retest scores is problematic because they are potentially affected by measurement and predictive bias, which impact construct validity, and because their size differs as a function of various factors. This paper investigates the construct stability of scores on a figural matrices test and models retest effects at the level of the individual test taker as a function of covariates (simple retest vs. training, use of identical vs. parallel retest forms, and general mental ability). A total of N = 189 subjects took two tests of matrix items that were automatically generated according to a strict construction rationale. Between test administrations, participants in the intervention groups received training, while controls did not. The Rasch model fit the data at both time points, but there was a lack of item difficulty parameter invariance across time. Training increased test performance beyond simple retesting, but there was no large difference between the identical and parallel retest forms at the individual level. Individuals varied greatly in how they profited from retest experience, training, and the use of identical vs. parallel retest forms. The results suggest that even with carefully designed tasks, it is problematic to directly compare scores from initial tests and retests. Test administrators should emphasize learning potential instead of state level assessment, and inter-individual differences with regard to test experience should be taken into account when interpreting test results.

5.
Situational judgment tests (SJTs) pose unique cognitive demands on test takers in that, when presented in written form, they require a great deal of reading and cognitive effort. Because of this cognitive demand, responses to test items toward the end of the test may be influenced by an order effect produced by responding to a large quantity of previous test items. This construct-irrelevant order effect may increase measurement error and threaten the validity of SJT scores. To test this phenomenon, data were obtained from 1,089 applicants who had completed a lengthy SJT as part of a selection process for an hourly safety and surveillance job at a large international corporation. Results showed that local item dependence, item difficulty, and the rate of omitted responses all increased when items were placed toward the end. The order effect alone was not strong enough to influence subgroup mean score differences in the second half of the test. However, this effect did vary by race: African-Americans were most strongly affected by the order effect, followed by Caucasians, in their number of omitted responses. Implications and future research of this effect for SJTs and similar types of assessments are discussed.

6.
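A study of item generation for matrix completion problems   Total citations: 1, self-citations: 0, citations by others: 1
Following the cognitive design system approach proposed by Embretson, an item generation system for matrix completion problems was designed and built, and a matrix completion test was generated with it. The study examined the relationship between the matrix test and Raven's test, and the ability of the cognitive model to predict the difficulty and discrimination of matrix problems. The results showed that the cognitive model has some ability to predict the item parameters of matrix items, and that the generated matrix test has essentially the same psychometric properties as Raven's test. Matrix items generated by the system can be used to measure examinees' abstract reasoning ability.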

7.
Changes in the factor structure of intelligence tests between early and later stages of performance and between testing and retesting were studied. In addition to factor analyses, correlations between the test scores of various time periods and the final factor scores were computed. The principal findings were the powerful influence of verbal ability in the initial stages of most tests, the gradual 'purification' of the factors, and the occurrence of a shift of the highest factor loadings of some tests from one factor to another, particularly in tests with increasing item difficulty.

8.
It has been demonstrated in a number of experiments that the difficulty level of several performance-type intelligence test tasks is determined directly by stimulus and task variables that vary the information to be processed. The variables are quantifiable. The implications of these findings for intelligence and the problems of an experimental approach to the measurement of intelligence are discussed.

9.
In a picture-matching task pictures of objects had to be arranged into pairs by aphasic and nonaphasic patients and normal controls. Aphasic patients were also given the Token Test. Correlation between the rank order of error scores in both tests was highly significant in aphasic patients. The pictures were also given to a normal group for free matching. Overlapping of normal performance on free matching and aphasic performance on bound matching occurred. We hypothesized that aphasic impairment was due to a difficulty in calling up associations, difficulty in feature analysis, and in moving from one concept to another. These findings are discussed in the light of abilities needed for Token Test performance. The results indicate that the traditionally presumed fundamental difference between verbal and nonverbal cognitive tasks is rather unsatisfactory.

10.
Generating items during testing: Psychometric issues and models   Total citations: 2, self-citations: 0, citations by others: 2
On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well established principle of measuring persons from items with known psychometric properties. This paper examines psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles to generate items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed and a new IRT model that includes the impact on item discrimination, as well as difficulty, is developed. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed content and adaptive testing are presented. Editor's note: This article is based on the Presidential Address Susan E. Embretson gave on June 26, 1999 at the 1999 Annual Meeting of the Psychometric Society held at the University of Kansas in Lawrence, Kansas.

11.
Individual differences in decision speed have been regarded as direct reflections of a "primitive" functional neurophysiological characteristic, which affects performance on all cognitive tasks and so may be regarded as the "biological basis of intelligence", or of age-related changes in mental abilities. More detailed analyses show that variability within an experimental session (WSV) is a stable individual difference characteristic and that mean choice reaction times (CRTs) are gross summary statistics that reflect variability, rather than maximum speed of performance. A total of 98 people aged from 60 to 80 years completed 36 weekly sessions on six different letter categorization tasks. After effects of practice and of circadian variability had been eliminated, individuals with lower scores on the Cattell Culture Fair intelligence test had slower CRTs and greater WSV on all tasks. A simulation study showed that the greater WSVs of low Cattell scorers led directly to the significantly greater variability of their mean CRTs from session to session. However, because CRTs on tasks co-varied from session to session, it was apparent that, besides being affected by WSV, individuals' between-session variabilities (BSVs) also vary because of state changes that affect their performance from day to day. It seems that both variability in performance from trial to trial during a session and variability in average performance from day to day are correlated, stable, individual difference characteristics that vary inversely with intelligence test performance. Methodological consequences of these results for interpretations of age-related cognitive changes, for variability between as well as within individuals, for individual differences in decision speed, and for circadian variability in performance are discussed.

12.
In Experiment I a group of pupils from a secondary school was given a test of general intelligence, a test of the ability to categorize objects in a flexible way, and five different problem-solving tasks. Subjects who were successful on the problems had higher scores on the intelligence test than the rest. The Categorizing Test was not, however, a good indicator of success. In Experiment II a comparison was made between scores on different parts of the so-called Family Test. With high school pupils and university students as subjects, correlation coefficients in the neighborhood of 0.40 were found between scores when suggesting possible classifications of objects, and scores when choosing a classification to fit different sets of objects. The triangular form of the scatterplots was taken as an indication that an ability to present different interpretations of one's experience is a necessary, but not sufficient condition for success in problem-solving tasks.

13.
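The authors wrote 235 figural reasoning test items. Using an anchor-test equating design, with 72 items from the Combined Raven's Test as anchor items, the test was administered to 1,733 males at ability levels ranging from junior high school to university. The response data were analyzed with BILOG MG3.0 (marginal maximum likelihood estimation) under the three-parameter logistic model. After removing items that fit the model poorly and items whose information function had a maximum below 0.3, an item bank of 181 items was established. The item bank can be used to screen out young enlistment applicants of lower intelligence.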
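
The item-screening rule in the abstract above (a three-parameter logistic model, with items dropped when the maximum of their information function falls below 0.3) can be illustrated with a minimal sketch. This is not the BILOG procedure itself: the function names, the scaling constant D = 1.7, the ability grid, and the example parameter values are all assumptions made for illustration, and the model-fit screening step is not shown.

```python
import math

# Assumption: the conventional scaling constant 1.7; the abstract does not
# say which metric was used in the calibration.
D = 1.7


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model.

    a = discrimination, b = difficulty, c = lower asymptote (guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def max_information(a, b, c, lo=-4.0, hi=4.0, steps=801):
    """Approximate the peak of the information function on an ability grid.

    The grid range and resolution are hypothetical choices.
    """
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(item_information(t, a, b, c) for t in grid)


def keep_item(a, b, c, threshold=0.3):
    """Retain an item only if its maximum information reaches the threshold."""
    return max_information(a, b, c) >= threshold


if __name__ == "__main__":
    # Hypothetical item parameters, not taken from the study.
    for a, b, c in [(1.0, 0.0, 0.2), (0.5, 0.0, 0.2)]:
        print(a, b, c, round(max_information(a, b, c), 3), keep_item(a, b, c))
```

Under these assumptions, a weakly discriminating item (a = 0.5) is dropped while a moderately discriminating one (a = 1.0) is kept, which is the kind of item the 0.3 cutoff is meant to filter.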

14.
We administered a comprehensive attentional battery to an epidemiologically defined sample of 435 first- and second-grade children to assess the influence of gender and verbal intelligence on attention. The battery included three versions of the continuous performance test (CPT), two digit cancellation tasks, three subtests from the WISC-R, and the Wisconsin Card Sorting Test. The results indicated that both gender and intelligence had an impact on attentional performance. Girls performed better than boys; they made fewer errors on the CPT and obtained higher scores on the digit cancellation task and the Coding subtest of the WISC-R. Children with higher verbal intelligence also performed better on the attentional tests, but this advantage was not observed across measures or levels of performance. For example, children with limited verbal skills performed significantly worse than their peers only in measures with high processing demands (the degraded CPT and the distraction version of the digit cancellation task).

15.
An operational definition of intellectual privilege and deprivation was used to develop measures of that construct in each sex from information furnished by 10th-grade students in Project Talent about their family and health background. Information such as reports of high school grades that directly reflected the ability of the students was excluded. The correlation between the experimental measure and the Talent intelligence composite is about .65 in contrast to the usual value of about .40 for a traditional measure of socioeconomic status. Sex differences in the items keyed are minimal. Using a similar definition, measures were also developed for Vernon's mechanical-spatial major group factor (1960), but the level of validity achieved is lower. The measures of privilege/deprivation developed for the two criteria show some degree of differential validity, but there is more generality in the two kinds of privilege than in tested abilities. The measures of intellectual privilege are almost congeneric measures of general intelligence, but there are significant departures from parallel profiles of relationships with other cognitive tests in Talent. The measures of intellectual privilege for both sexes add to the accuracy of prediction obtained from the intelligence criterion of scores on verbal academic and aesthetic information, but add nothing in predicting scores on measures of spatial visualization and rote memory. The possibility of constructing a test of general intelligence using item types that would minimize the correlation with privilege/deprivation is discussed.

16.
The factor structures of two recently developed measures of emotional intelligence, the Situational Test of Emotional Understanding and Situational Test of Emotion Management (STEU, STEM; MacCann & Roberts, 2008) were examined. The results did not support a factor structure of either measure’s subscales indicated by the approach used in developing the test items, and examination of the factors obtained using parallel analysis to determine the number of factors to extract did not yield interpretable factors. These findings suggest that only total scale scores should be used for these tests, although the general factor extracted from the items was not strong for either test; further development work on these tests is indicated.

17.
This paper presents the results of three studies in which scores obtained on single and competing tests were correlated with the composites representing fluid intelligence, crystallized intelligence, and the short-term acquisition and retrieval function. The results indicate that competing tasks have higher correlations with intelligence than single tests. Since in two studies of this paper no decrements in performance were observed on the competing as compared with the single task, the concept of limited central processing capacity cannot account for individual differences in performance of these cognitive tasks. It is suggested that the concept of efficient processing of information, perhaps efficient encoding, may be the basis for individual differences in cognitive abilities.

18.
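The relationship of cognitive operations and cognitive style to the personality trait of extraversion   Total citations: 1, self-citations: 0, citations by others: 1
Zhang Liyan, Zheng Xue. 《心理科学》 (Psychological Science), 2007, 30(3): 604-608
Experimental cognitive tests were administered to 28 extraverted and 28 introverted participants to examine the relationship of cognitive operations and cognitive style to the personality trait of extraversion. The results showed that extraverts and introverts did not differ in total scores on cognitive tasks involving social and nonsocial cognitive operations, but differed significantly in their scores on the measure of social versus nonsocial cognitive style. Extraverts tended more toward a social cognitive style, and introverts more toward a nonsocial cognitive style. The findings support the hypothesis about the relationship between extraversion and intelligence: extraversion is correlated with social versus nonsocial cognitive style but is unrelated to total scores on the cognitive operation tests.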

19.
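From the perspective of new-generation test theory, this paper takes geometric analogy reasoning tests as its object of study and the diagnosis of cognitive strategies as its aim, and examines the features of items that are more likely to elicit either a rule-construction strategy or an option-elimination strategy from examinees. The results show that the number of elements in an item is the key factor influencing which of the two strategies examinees use: the more elements an item contains, the more examinees tend to use the rule-construction strategy, and the fewer elements it contains, the more they tend to use the option-elimination strategy. The results can be applied directly to test design, so that a test can analyze and describe examinee characteristics more fully within the framework of a particular strategy.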

20.
The Children's Embedded Figures Test, the Rod and Frame Test to measure the field dependence-independence cognitive style, Cattell's Culture Fair Intelligence Tests to measure cognitive ability, and two cancellation tasks (Zazzo task and Bourdon task) to assess sustained attention were administered to 179 boys and 110 girls whose average age was 9.0 yr. Correlations between scores on measures of field dependence-independence and cognitive ability were moderate. The average correlation of scores on measures of field dependence-independence and cognitive ability with measures of sustained attention was .23 for the Zazzo task and quite weak (.06) for the Bourdon task.
