Similar Literature
Found 20 similar documents; search took 31 ms
1.
The purpose of this investigation was to examine the extent to which item and text characteristics predict item difficulty on the comprehension portion of the Gates-MacGinitie Reading Tests for the 7th–9th and 10th–12th grade levels. Detailed item-based analyses were performed on 192 comprehension questions on the basis of the cognitive processing model framework proposed by Embretson and colleagues (Embretson & Wetzel, 1987). Item difficulty was analyzed in terms of various passage features (e.g., word frequency and number of propositions) and individual-question characteristics (e.g., abstractness and degree of inferential processing), using hierarchical linear modeling. The results indicated that the difficulty of the items in the test for the 7th–9th grade level is primarily influenced by text features, in particular vocabulary difficulty, whereas the difficulty of the items in the test for the 10th–12th grade level is less systematically influenced by text features.

2.
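Liu Yue (刘玥), Liu Hongyun (刘红云). Journal of Psychological Science (《心理科学》), 2015, (6): 1504-1512
This study explores how to equate test scores when no anchor items are available, using the constructed anchor test method. Study 1 and Study 2 examine, respectively, how errors in ranking item difficulty and differences in anchor item difficulty affect the method. The results showed that: (1) under equivalent-groups conditions, the equating error of the constructed anchor test method increased as the proportion of erroneous anchor items, the degree of difficulty-ranking error, and the difference in anchor item difficulty increased, whereas the equating error of the random groups method remained fairly stable; under nonequivalent-groups conditions, the equating error of the constructed anchor test method was consistently smaller than that of the random groups method; (2) for the constructed anchor test method under nonequivalent-groups conditions, the shorter the anchor test, the larger the equating error.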

3.
Is learning of a complex functional relationship enhanced by trying to predict what output will go with a given input, as compared to studying an input–output pair? We examined learning of a bilinear function and transfer to new items outside the trained range. Subjects either saw the input–output pairs (study-only condition) or attempted to guess the output and then saw the pair (test/study condition). The total study times were equated, and motivation was enhanced with a monetary bonus. Performance was markedly better for the test/study condition, both within the trained range and in the transfer test. This benefit of testing during training was observed on a criterial test administered shortly after training. Testing has long been shown to enhance the explicit learning and retention of verbal material; our present findings reveal a novel domain for which testing can also be advantageous, namely function learning.

4.
Abstract: In test operations using IRT (item response theory), items are included in a test before being used to rate subjects, and the response data are used to estimate their item parameters. However, this method of test operation may lead to item content leakage, and adequate test operation can become difficult. To address this problem, Ozaki and Toyoda (2005, 2006) developed item difficulty parameter estimation methods that use paired comparison data on the difficulty of items as judged by raters familiar with the field. In the present paper, an improved method of item difficulty parameter estimation is developed. In this new method, an item for which the difficulty parameter is to be estimated is compared with multiple items simultaneously, from the perspective of their difficulty. This is not a one-to-one comparison but a one-to-many comparison. In the comparisons, raters are informed that items selected from an item pool are ordered according to difficulty. The order provides information intended to improve the accuracy of judgment.

5.
6.
The study attempted an understanding of the cognitive process involved in appreciation of history and the developmental pattern of the same. A test of Historical Understanding (HU) was constructed consisting of items which were similar to historical situations, but real historical episodes were not included in order to avoid any effect of prior knowledge and memory of historical facts. The test items were pilot tested and refined. A random sample of 15 children, 9–14 years of age (Grades 4, 6 and 8), was administered the test with clinical probing followed by an interview to assess children’s idea of past and history. The findings revealed that appreciation of the difference between past and history, chronology, and historical imagination emerged early by 9 years of age developing further with age/Grade. Development of some dimensions such as empathy and critical analysis appeared late by 13–14 years.

7.
The aim of the present study was to compare the activation levels of true and false memories in the Deese–Roediger–McDermott (DRM) paradigm. For this purpose, we used a lexical decision task (LDT) that can be considered a relatively pure measure of activation. Participants had to study a list of words that were semantically associated with a critical non-presented word (CI), and afterwards had to classify the actually studied words, the CI, and new words in the LDT. Results indicated that the classification latency of the CI was the same as that of actually studied words and shorter than that of new words. The results might be interpreted as evidence that the false and true memory items have the same activation level and that the false memory effect can be based on the indirect activation of the CI at encoding.

8.
The remote association test (RAT) has been applied in various fields; however, evidence of construct validity for the original version and subsequent extensions of the RAT remains limited. This study aimed to elucidate the dimensionality and the relationship between item features and item difficulties for the RAT, Chinese Version (RAT-C) using the Rasch model and the linear logistic test model (LLTM). The revised 30-item RAT-C was administered to 475 undergraduates (263 women and 212 men) in 8 universities in Taiwan. Item features (including types of associations among stimulus words, and frequency and concreteness of target words) were recoded. The analysis found that the RAT-C measured a single latent construct, with all 30 items conforming to the Rasch model’s expectation. Furthermore, according to the LLTM analysis, most item features predicted Rasch item difficulty, suggesting that these features can explain why some items were more difficult than others and can be used to create new items with known item difficulty to tailor the difficulty level for different groups of participants in the future.

9.
The identical elements (IE) model (Rickard, Healy, & Bourne, Learning, Memory, and Cognition 32:734–748, 1994) of fact representation predicts that, in both verbal and numerical domains, performance gains with retrieval practice on multielement items will be specific to the practiced stimulus–response combinations, failing to transfer even to altered stimulus–response mappings of practiced items. In the case of arithmetic, the model predicts no transfer across either complementary operations (e.g., 4 × 7 to 28 / 4) or complementary division or subtraction problems (e.g., 28 / 4 to 28 / 7). Although that model has successfully described transfer effects in the domains of multiplication–division and episodic cued recall, it is challenged by a recent demonstration of positive cross-operation transfer for addition and subtraction (Campbell & Agnew, Psychonomic Bulletin & Review 16:938–944, 2009). We report results of a new addition–subtraction transfer experiment, the design of which closely matched that of a prior multiplication–division experiment that supported the model. The transfer results were consistent with the IE model. A two-component model of memory retrieval practice effects is proposed to account for the discrepant experimental results for addition and subtraction and to guide future work.

10.
Assessment of irrational beliefs by such measures as the Common Beliefs Survey III (CBS) has traditionally relied upon classical test theory assumptions, in which the properties of specific test items are less important than the total test score as the aggregate of all item responses. An alternative approach using item response theory (IRT) methodology allows one to specify the parameters of difficulty and discrimination for each test item. Difficulty levels of CBS items range along a continuum of irrationality, the implied latent trait measured by responses to the questionnaire as a whole. We evaluated the CBS responses of 605 individuals from clinical and college settings, drawing from current and archival data. The original Likert scale ratings were recoded into dichotomous scores. Fourteen of the 54 items were highly or very highly discriminating in distinguishing respondents with high and low irrationality levels. However, discriminating items exhibited a very narrow range of difficulty; most functioned at a point a little above the halfway mark on the continuum of irrationality. Item characteristic curves and test information curves were very similar for female (n = 424) and male (n = 179) respondents. We derived a 4-item screening test for irrationality from our IRT analyses of the 54 CBS items. Further test development, focused on the selection and scaling of items with a much broader range of difficulty, would facilitate evaluation of the hierarchical structure of irrational beliefs. Portions of this paper were presented at the 39th Annual Convention of the Association for Behavioral and Cognitive Therapies, Washington, DC, November, 2005.

11.
Recent work by Hupbach, Gomez, Hardt, and Nadel (Learning & Memory, 14, 47–53, 2007) and Hupbach, Gomez, and Nadel (Memory, 17, 502–510, 2009) suggests that episodic memory for a previously studied list can be updated to include new items, if participants are reminded of the earlier list just prior to learning a new list. The key finding from the Hupbach studies was an asymmetric pattern of intrusions, whereby participants intruded numerous items from the second list when trying to recall the first list, but not vice versa. Hupbach et al. (2007; 2009) explained this pattern in terms of a cellular reconsolidation process, whereby first-list memory is rendered labile by the reminder and the labile memory is then updated to include items from the second list. Here, we show that the temporal context model of memory, which lacks a cellular reconsolidation process, can account for the asymmetric intrusion effect, using well-established principles of contextual reinstatement and item–context binding.

12.
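Ye Meng (叶萌), Xin Tao (辛涛). Journal of Psychological Science (《心理科学》), 2015, (1): 209-215
Taking "anchor item representativeness" as its point of entry, this paper explores what relationship anchor items, as the key vehicle for test linking under the non-equivalent groups anchor test design, should have with the associated test forms and levels. The paper first points out that the concept of anchor item representativeness carries different meanings in equating and in vertical scaling, and gives its meaning in vertical scaling. By reviewing existing research on anchor item representativeness in test linking and systematically summarizing its findings, the paper outlines possible ways to improve current anchor item construction practice and analyzes future directions for research on anchor item representativeness.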

13.
Despite the existence of numerous health-related quality of life (HRQoL) measures, few if any are geared to evaluating the impact of consumer products. We describe the development and initial implementation of the Farage Quality of Life (FQoL™) general questionnaire, a self-administered questionnaire to assess the potential impact of a variety of consumer products on overall well-being and HRQoL. We developed the 27-item FQoL™ measure, scored on a Likert scale and covering Overall Quality of Life (1 item), Well-Being (12 items), and Energy and Vitality (14 items), and a 3-item Menstrual Module for use with menstruating women. We assessed test-retest reliability by administering the items twice to a sample of 20 women 3 days apart, calculating mean absolute differences in responses. Then, in a study of 119 women ages 18–55 years who were randomly assigned to use a new brand of menstrual pad vs. their usual menstrual pads for 1 menstrual period, we administered the FQoL™ questionnaire 5–7 days before their menstrual period and 5–7 days after the start of their period. We compared changes in responses within groups and between groups pre- vs. during menstruation. Overall, test-retest reliability was good, with a mean (SD) absolute difference for the 27 general items of 0.51 (0.31). In the menstrual pad study, the mean (SD) age of responders was 35.3 (7.9) years; 59 (50%) were age 18–35 and 60 (50%) were age 36–55. Relative to the intervention group, the usual pads group reported worse HRQoL during vs. pre-menstruation on items addressing self-confidence, managing stress, energy, and fatigue (P ≤ 0.05 for each comparison). In subgroup analyses, relative to intervention patients, women age 18–35 in the usual pads group reported greater changes for the worse during vs. pre-menstruation in managing stress; energy; and fatigue, but relatively better work or school attendance during vs. pre-menstruation, and women age 36–55 in the usual pads group reported greater changes for the worse in self-confidence and in desire to go out in public (P ≤ 0.03 for each comparison). The general FQoL™ is a new measure of HRQoL applicable to consumer product evaluation. It has good test-retest reliability. The FQoL™ menstrual module detects changes in HRQoL during vs. before the menstrual period associated with menstrual pad use. Further research is needed to assess the construct validity of the HRQoL.

14.
This paper discusses the influence of test difficulty on the correlation between test items and between tests. The greater the difference in difficulty between two test items or between two tests, the smaller the maximum correlation between them. In general, the greater the number of degrees of difficulty among the items in a test or among the tests in a battery, the higher the rank of the matrix of intercorrelations; that is, differences in difficulty are represented in the factorial configuration as additional factors. The suggestion is made that if all tests included in a battery are roughly homogeneous with respect to difficulty, existing hierarchies will be more clearly defined and meaningful psychological interpretation of factors more readily attained.
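The claim that a difficulty gap caps the attainable correlation can be illustrated for two pass/fail items. The sketch below is not taken from the paper; it assumes the correlation in question is the phi coefficient and uses the standard bound obtained when everyone who passes the harder item also passes the easier one. The function name `max_phi` is hypothetical.

```python
import math


def max_phi(p1: float, p2: float) -> float:
    """Largest possible phi coefficient between two dichotomous items.

    p1 and p2 are the proportions passing each item (assumed strictly
    between 0 and 1). The bound is reached when the joint pass rate
    equals the smaller of the two marginal pass rates.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("pass rates must lie strictly between 0 and 1")
    lo, hi = sorted((p1, p2))
    # phi = (p11 - p1*p2) / sqrt(p1*q1*p2*q2) with p11 = lo simplifies to this.
    return math.sqrt(lo * (1.0 - hi) / (hi * (1.0 - lo)))


if __name__ == "__main__":
    # The ceiling falls as the two pass rates move apart.
    for easy, hard in [(0.5, 0.5), (0.6, 0.4), (0.8, 0.2), (0.9, 0.1)]:
        print(easy, hard, round(max_phi(easy, hard), 3))
```

Items of equal difficulty can correlate perfectly, while an item passed by 80% and one passed by 20% cannot exceed a phi of 0.25 however similar their content.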

15.
A factor analysis of the ten sub-tests of the Seashore test of pitch discrimination revealed that more than one ability is involved. One factor, which accounted for the greater share of the variances, had loadings that decreased systematically with increasing difficulty. A second factor had strongest loadings among the more difficult items, particularly those with frequency differences of 2 to 5 cycles per second. A third had strongest loadings at differences of 5 to 12 cycles per second. No explanation for the three factors is apparent, but the hypothesis is accepted that they represent distinct abilities. In tests so homogeneous as to content and form, where a single common factor might well have been expected, the appearance of additional common factors emphasizes the importance of considering the difficulty level of test items, both in the attempt to interpret new factors and in the practice of testing. The same kind of item may measure different abilities according as it is easy or difficult for the individuals to whom it is applied.

16.
This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach’s alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user’s manual that contains instructions and examples are downloadable from suen.ed.psu.edu/~pwlei/plei.htm.
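For readers without SAS or SPSS, three of the statistics named above can be sketched in a few lines of Python. This is not the authors' macro or syntax; it is a minimal illustration using textbook formulas, with hypothetical function names, sample (n − 1) variances, and a corrected item-total correlation (item against the total of the remaining items). The D-index and biserial correlation are omitted.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_analysis(scores):
    """Conventional item statistics for a respondents-by-items score table.

    scores: list of rows, one per respondent, each a list of item scores
    (0/1 for dichotomous items). Every item and every rest score is
    assumed to have nonzero variance.
    Returns (difficulty, alpha, item_rest_correlations).
    """
    k = len(scores[0])
    items = [[row[j] for row in scores] for j in range(k)]
    totals = [sum(row) for row in scores]
    # Item difficulty index: the item mean (proportion passing for 0/1 items).
    difficulty = [mean(col) for col in items]
    # Cronbach's alpha from item variances and total-score variance.
    alpha = k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))
    # Discrimination: correlation of each item with the total of the other items.
    item_rest = [pearson(col, [t - c for t, c in zip(totals, col)]) for col in items]
    return difficulty, alpha, item_rest


if __name__ == "__main__":
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    difficulty, alpha, item_rest = item_analysis(data)
    print(difficulty, alpha, item_rest)
```

For dichotomous items the Pearson correlation between an item and a continuous score is the point-biserial correlation, so no separate formula is needed here.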

17.
An experiment was conducted to investigate people’s ability to vary a response criterion strategically, in a recognition memory task, as a function of the length of time given to process the test stimuli (from 100 to 1,500 msec). The experiment used the response signal procedure, in which the participants responded after a signal that came at a variable time delay from stimulus onset. The proportion of new versus old test items was varied systematically with the time of the response signal, with the proportion of new test items rising, falling, or staying constant at later signals. It was found that the participants’ response biases changed adaptively, becoming more conservative at later signals in the rising condition, becoming less conservative in the falling condition, and not changing significantly in the constant condition. Theoretical and methodological implications for recognition memory research are discussed.

18.
A dilemma was created for factor analysts by Ferguson (Psychometrika, 1941, 6, 323–329) when he demonstrated that test items or sub-tests of varying difficulty will yield a correlation matrix of rank greater than 1, even though the material from which the items or sub-tests are drawn is homogeneous, although homogeneity of such material had been defined operationally by factor analysts as having a correlation matrix of rank 1. This dilemma has been resolved as a case of ambiguity, which lay in (1) failure to specify whether homogeneity was to apply to content, difficulty, or both, and (2) failure to state explicitly the kind of correlation to be used in obtaining the matrix. It is demonstrated that (1) if the material but (2) if content is homogeneous but difficulty is not, the homogeneity of the content can be demonstrated only by using the tetrachoric correlation coefficient in deriving the matrix; and that the use of the phi-coefficient (Pearsonian r) will disclose only the nonhomogeneity of the difficulty and lead to a series of constant error factors as contrasted with content factors. Since varying difficulty of items (and possibly of sub-tests) is desirable as well as practically unavoidable, it is recommended that all factor analysis problems be carried out with tetrachoric correlations. While no one would want to obtain the constant error factors by factor analysis (difficulty being more easily obtained by counting passes), their importance for test construction is pointed out.

19.
An experiment was done to test a context-matching explanation of memory for recency under steady-state conditions. Subjects went through a list of 550 names, in which individual names were repeated at lags of 5–30 other items. The names were shown in two different styles or contexts. An old versus new recognition decision was made on each name, and each old response was followed by a numerical judgment of recency (JOR). When first- and second-presentation contexts were the same, recognition hit rates were higher, and mean JORs were shorter (more recent), than when the two contexts were different. The JOR result is as predicted by the context-matching hypothesis.

20.
In educational practice, a test assembly problem is formulated as a system of inequalities induced by test specifications. Each solution to the system is a test, represented by a 0–1 vector, where each element corresponds to an item included (1) or not included (0) in the test. Therefore, the size of a 0–1 vector equals the number of items n in a given item pool. All solutions form a feasible set, a subset of the 2^n vertices of the unit cube in an n-dimensional vector space. Test assembly is uniform if each test from the feasible set has an equal probability of being assembled. This paper demonstrates several important applications of uniform test assembly for educational practice. Based on Slepian’s inequality, a binary program was analytically studied as a candidate for uniform test assembly. The results of this study establish a connection between combinatorial optimization and probability inequalities. They identify combinatorial properties of the feasible set that control the uniformity of the binary programming test assembly. Computer experiments illustrating the concepts of this paper are presented.
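The feasible-set picture can be made concrete for a toy pool. The sketch below is not the binary program studied in the paper: it simply enumerates every 0–1 vector, keeps those that satisfy two illustrative specifications (a fixed test length and a bound on summed difficulty, both assumptions made here), and then draws one test with equal probability. Brute-force enumeration is practical only for very small n, and all names are hypothetical.

```python
import random
from itertools import product


def feasible_tests(difficulty, length, lo, hi):
    """Enumerate the feasible set of a toy test assembly problem.

    difficulty: difficulty value of each item in the pool (n items).
    A 0-1 vector x is feasible when it selects exactly `length` items and
    the summed difficulty of the selected items lies in [lo, hi].
    """
    n = len(difficulty)
    feasible = []
    for x in product((0, 1), repeat=n):  # all 2^n vertices of the unit cube
        if sum(x) != length:
            continue
        total = sum(d for d, xi in zip(difficulty, x) if xi)
        if lo <= total <= hi:
            feasible.append(x)
    return feasible


def assemble_uniform(feasible, rng):
    """Draw one test so that every feasible test is equally likely."""
    return rng.choice(feasible)


if __name__ == "__main__":
    pool = [-1.0, -0.5, 0.0, 0.5, 1.0]
    tests = feasible_tests(pool, length=2, lo=-0.5, hi=0.5)
    print(len(tests), "feasible tests out of", 2 ** len(pool), "vertices")
    print(assemble_uniform(tests, random.Random(0)))
```

Sampling from the enumerated set is uniform by construction, which makes it a convenient baseline against which a solver-based assembly procedure could be checked on small pools.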
