Similar articles
Found 20 similar articles (search time: 46 ms)
1.
2.
A nonparametric item response theory model—the Mokken scale analysis (a stochastic elaboration of the deterministic Guttman scale)—and a computer program that performs this analysis are described. Three scaling procedures are distinguished: a search procedure, an evaluation of the whole set of items, and an extension of an existing scale. All procedures provide a coefficient of scalability for the set of items that meet the criteria of the Mokken model, and an item coefficient of scalability for every item. Four different types of reliability coefficient are computed, both for the entire set of items and for the scalable items. A robustness test of the resulting scale can be performed to analyze whether the scale is invariant across different subgroups or samples; this robustness test serves as a goodness-of-fit test for the established scale. The program is written in FORTRAN 77. Two versions are available: an SPSS-X procedure program (which can be used with the SPSS-X mainframe package) and a stand-alone program suitable for both mainframes and microcomputers.
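The scalability coefficients such a program reports are Loevinger's H. As a rough illustration — a minimal sketch of the standard definition in terms of observed versus expected Guttman errors, not the FORTRAN program described above:

```python
import numpy as np

def loevinger_h(X):
    """Scale-level scalability coefficient H for a binary item-score matrix X
    (rows = respondents, columns = items): H = 1 - observed Guttman errors /
    errors expected under marginal independence, pooled over item pairs."""
    n, k = X.shape
    X = X[:, np.argsort(-X.mean(axis=0))]  # order items from easiest to hardest
    observed, expected = 0.0, 0.0
    for i in range(k):
        for j in range(i + 1, k):
            # Guttman error: passing the harder item j while failing the easier item i
            observed += np.sum((X[:, i] == 0) & (X[:, j] == 1))
            expected += n * (1 - X[:, i].mean()) * X[:, j].mean()
    return 1.0 - observed / expected

# A perfect Guttman pattern produces no errors, so H = 1
perfect = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]])
print(loevinger_h(perfect))  # → 1.0
```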

3.
To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed-length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.
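The relationships themselves are derived analytically in the paper; their combinatorial core can be checked empirically. A minimal sketch with made-up fixed-length administrations (not the paper's derivation), showing that total pairwise item sharing is fully determined by the item-exposure counts:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_examinees, pool_size, test_len = 6, 20, 5

# Hypothetical fixed-length CATs: the set of items each examinee received
tests = [set(rng.choice(pool_size, size=test_len, replace=False).tolist())
         for _ in range(n_examinees)]

# Item exposure: how many (and what fraction of) examinees saw each item
counts = [sum(item in t for t in tests) for item in range(pool_size)]
exposure_rates = [c / n_examinees for c in counts]

# Item sharing for a pair of examinees = size of their tests' intersection
pair_overlaps = [len(a & b) for a, b in itertools.combinations(tests, 2)]

# An item seen by c examinees is shared by C(c, 2) pairs, so total pairwise
# overlap is a function of the exposure distribution alone
assert sum(c * (c - 1) // 2 for c in counts) == sum(pair_overlaps)

mean_overlap = sum(pair_overlaps) / len(pair_overlaps)
print(0.0 <= mean_overlap <= test_len)  # → True
```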

4.
Classical item analysis procedures were developed for dichotomously scored items and do not apply to items allowing multiple correct responses. Maximum likelihood procedures analogous to those employed in polychotomous bio-assay are presented which yield estimates of the sets of parameters for items having multiple nonordered responses. Expressions for the estimates of the asymptotic variances of the item parameters and an overall chi-square goodness-of-fit test are also provided.

5.
Students enrolled in a Psychology of Learning course were assigned to either a lecture section, one of two similar personalized instruction sections, or a fourth section that rotated across all three teaching procedures. All students took identical midterms and a final examination. After correcting test performance for differences in the cumulative grade point average of students in the four sections, examination performance of students in the personalized sections was found to be superior to that of students in the lecture section. An analysis of class section examination performance by item type revealed that students in the lecture section scored lower on all item types, but the greatest differences occurred on items that required written responses (essay and fill-in items) rather than recognition responses (multiple choice items). A gross analysis of student performance in the class rotated across the instructional procedures suggests that personalized instruction had its greatest impact on students with "average" to "poor" academic records.

6.
Cognitive diagnosis models of educational test performance rely on a binary Q-matrix that specifies the associations between individual test items and the cognitive attributes (skills) required to answer those items correctly. Current methods for fitting cognitive diagnosis models to educational test data and assigning examinees to proficiency classes are based on parametric estimation methods such as expectation maximization (EM) and Markov chain Monte Carlo (MCMC) that frequently encounter difficulties in practical applications. In response to these difficulties, non-parametric classification techniques (cluster analysis) have been proposed as heuristic alternatives to parametric procedures. These non-parametric classification techniques first aggregate each examinee's test item scores into a profile of attribute sum scores, which then serve as the basis for clustering examinees into proficiency classes. Like the parametric procedures, the non-parametric classification techniques require that the Q-matrix underlying a given test be known. Unfortunately, in practice, the Q-matrix for most tests is not known and must be estimated to specify the associations between items and attributes, risking a misspecified Q-matrix that may then result in the incorrect classification of examinees. This paper demonstrates that clustering examinees into proficiency classes based on their item scores rather than on their attribute sum-score profiles does not require knowledge of the Q-matrix, and results in a more accurate classification of examinees.
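A minimal sketch of the core idea — clustering examinees directly on their raw item-score vectors, so that no Q-matrix is needed. It uses a plain 2-means with a farthest-pair initialization and toy data; this is an illustration of the approach, not the paper's procedure:

```python
import numpy as np

def two_means(X, iters=20):
    """Plain 2-means on raw item-score vectors: no Q-matrix required."""
    # Initialize centers with the most dissimilar pair of examinees
    d = ((X[:, None, :] - X[None]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = X[[i, j]].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Toy data: three high-proficiency and three low-proficiency examinees
X = np.array([[1, 1, 1, 1, 1, 1, 1, 0],
              [1, 1, 1, 1, 1, 1, 0, 1],
              [1, 1, 1, 1, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0]])
labels = two_means(X)
print(labels)  # → [0 0 0 1 1 1]
```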

7.
College students’ ability to judge whether a studied item had been learned well enough to be recalled on a later test was examined in three experiments with self-paced learning procedures. Generally, these learners compensated for item difficulty when allocating study time, studying hard items longer than easy items, but they still recalled more easy items than hard items and tended to drop items out too soon. When provided with test opportunities during study or a delay between study and judgment, learners compensated significantly more for item difficulty and recalled substantially more. Paradoxically, good and poor learners compensated similarly for item difficulty and benefited similarly from testing during study and from delayed decision making. Thus, although the ability to make metamemory decisions was shown to be important for effective learning, these decisions were made equally well by good and poor associative learners. An analysis of tasks used to investigate metamemory-memory relationships in adult learning may provide an account for this apparent learning ability paradox.

8.
This paper defines DIF (differential item functioning) for polytomously scored cognitive diagnostic tests and, through simulation experiments and an empirical study, explores the theoretical and practical applicability of four common polytomous DIF detection methods. The results show that all four methods can effectively detect DIF in polytomous cognitive diagnosis, and that their performance is largely unaffected by the underlying model; using knowledge states (KS) rather than total scores as the matching variable is more favorable for DIF detection; and the LDFA method and the Mantel test, each with KS as the matching variable, have the highest power for detecting DIF items.

9.
Men score higher than women on the Mental Rotations Test (MRT), and the magnitude of this gender difference is the largest of that on any spatial test. Goldstein, Haldane, and Mitchell (1990) reported that the gender difference on the MRT disappears when “performance factors” are controlled—specifically, when subjects are allowed sufficient time to attempt all items on the test, or when a scoring procedure that controls for the number of items attempted is used. The present experiment also explored whether eliminating these performance factors results in a disappearance of the gender difference on the test. Male and female college students were allowed a short time period or unlimited time on the MRT. The tests were scored according to three different procedures. The results showed no evidence that the gender difference on the MRT was affected by the scoring method or the time limit. Regardless of the scoring procedure, men scored higher than women, and the magnitude of the gender difference persisted undiminished when subjects completed all items on the test. Thus there was no evidence that performance factors produced the gender difference on the MRT. These results are consistent with those of other investigators who have attempted to replicate Goldstein et al.'s findings.

10.
Two experiments are reported which investigated the effects of data-driven generation of study items on direct and indirect measures of memory. Previous research in the field of implicit memory has traditionally employed generation procedures at encoding which focused on conceptually driven processing. The present study undertook to devise data-driven generation procedures that were predicted to lead to a generation effect on word-stem completion.

In Experiment 1, subjects had to generate target items from anagrams and from newly developed “assemblograms”, requiring mainly data-driven processing, as well as from semantic cues and definitions, involving mainly conceptually driven processing. The effects of these generate conditions were compared with the usual name condition on a direct word-stem cued-recall test and on an indirect word-stem completion test. On the stem-completion task, differences in retention between data-driven generation and the name condition failed to reach significance.

In Experiment 2 subjects generated targets from assemblograms and from semantic cues. The data revealed the predicted occurrence of a generation effect on an indirect memory test following data-driven generation. The finding of a generation effect in an indirect as opposed to a direct memory test was seen as support for the view that generating a study item may enhance data-driven as well as conceptually driven processing, depending on the processing demands made by generation procedures. The results were interpreted within the transfer-appropriate processing framework, with additional reference to Glisky and Rabinowitz's two-component account of generation effects (Glisky & Rabinowitz, 1985).

11.
To date, the statistical software designed for assessing differential item functioning (DIF) with Mantel-Haenszel procedures has employed the following statistics: the Mantel-Haenszel chi-square statistic, the generalized Mantel-Haenszel test, and the Mantel test. These statistics permit detecting DIF in dichotomous and polytomous items, although they limit the analysis to two groups. In contrast, this article describes a new approach (and the related software) that, using the generalized Mantel-Haenszel statistic proposed by Landis, Heyman, and Koch (1978), permits DIF assessment in multiple groups, both for dichotomous and polytomous items. The program is free of charge and is available in Spanish, English, and Portuguese.
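For background, the two-group case that the generalized statistic extends: the Mantel-Haenszel common odds ratio for a dichotomous item, pooled over matched total-score levels, with hypothetical 2×2 tables (a sketch of the classical statistic, not the program described above):

```python
import math

# Hypothetical 2x2 tables, one per matched total-score level:
# (ref_correct, ref_wrong, focal_correct, focal_wrong)
tables = [(40, 10, 30, 20), (25, 25, 20, 30), (10, 40, 5, 45)]

num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
alpha_mh = num / den                    # MH common odds ratio across levels
delta_mh = -2.35 * math.log(alpha_mh)   # ETS delta metric (negative favors the reference group)

print(round(alpha_mh, 2), round(delta_mh, 2))  # → 2.0 -1.63
```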

12.
Can things that were never experienced be more readily accepted on recognition tests than things that were experienced? A current explanation of false memory predicts that this can happen when things that were never experienced provide superior access to the gist of events. This prediction was tested in three experiments in which the task was to accept all test items that were consistent with the substance of previously studied material, regardless of whether they had been studied. Acceptance rates were consistently higher for some never-studied items (those that provided superior access to gist memories) than for studied items. This effect varied predictably as a function of manipulations of the strength of gist memories and their accessibility. These results have implications for the use of exploratory memory-interrogation procedures in psychotherapy and the law.

13.
The use of well-documented procedures such as shaping, differential reinforcement, and fading may not be the most practical for teaching certain academic behaviors. An alternative procedure of interspersing trials on previously trained items with trials on unknown items has been suggested, but its effects on acquisition and retention have not been systematically examined. This study investigated the effects of interspersing known items during training on new tasks. Six mentally retarded adolescents were given pretests on spelling and sightreading words, which were divided into pools of learned and unlearned items. Training and baseline conditions were implemented concurrently, using a multi-element design. During interspersal training sessions, 10 known words from the pretest were alternately presented with each of 10 test words that were incorrect on the pretest. The ratio of previously mastered words to test words was gradually reduced. During baseline sessions, 10 different test words were presented without alternation of previously known words. During this condition, a procedure involving high-density social reinforcement contingent on task-related behaviors, but not necessarily correct responses, was later introduced, followed by a return to the original noninterspersal baseline. During all conditions, test words were deleted and replaced after meeting a mastery criterion of three consecutive correct trials. Retention tests were administered over learned test words for all conditions, at specified intervals. Results showed that both acquisition and retention of spelling and sightreading words were facilitated by the interspersal procedure. All subjects acquired more words during the interspersal condition than either the high-density or baseline conditions. 
The effectiveness of the procedure may be attributed to better maintenance of attending behavior to unknown items as a function of the inclusion of known items, which directly increases the amount of reinforcement for correct responses during the early stages of skill acquisition.

14.
In an attempt to assess the degree to which specific stimulus-response associations are gradually acquired in learning a serial list, the order of the middle items was altered during acquisition. Five groups with 16 Ss per group had either no items switched, two items switched after four or eight test trials, or four items switched after four or eight test trials. The nonsense syllables were presented with slide projectors by means of standard serial anticipation procedures. Contrary to hypotheses, there were no overall differences between the four experimental groups and the control in trials to criterion or in total errors. However, although few experimental Ss reported noticing the switch, they made more errors on the trials immediately following the switch in comparison with the control group. These results are interpreted as disconfirming continuous, stimulus-specific association assumptions and supporting noncontinuous, nonassociative approaches.

15.
This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory), to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and the Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data was also analyzed to illustrate the use of these DIF detection procedures.

16.
A model that describes the construction and execution of decimal computation procedures is presented. Our hypothesis is that students compute by relying solely on syntax-based rules; semantic knowledge has no effect on performance. To test the claim, a model is developed in which computation procedures are viewed as chains of component symbol manipulation rules. The model assumes that students acquire through instruction the individual rules that achieve subgoals in the computation process. The task for the procedural system is to select rules that satisfy each subgoal in sequence. The model specifies the rules of the system and identifies the syntactic features of the task that affect the selection of individual rules at each decision point. It then predicts the relative difficulty of decimal computation items and predicts the procedural flaw that will occur most frequently on each item. Written test and interview data are presented to test the predictions. Concluding comments discuss the nature of students' computation procedures, compare the model with other models of computation performance, and outline how the model might inform instruction.

17.
A hybrid procedure for number-correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT, and the total test scores are computed based on CTT. What makes the hybrid scoring method attractive is that it accounts for the dimensionality of the test items while test scores remain easy to compute. Further, hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study generated estimated and true scores that were highly correlated. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.
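The abstract does not spell out the weighting, so the following is only a plausible sketch of the hybrid idea: item weights derived from hypothetical MIRT discrimination vectors, combined with a CTT-style weighted number-correct total:

```python
import numpy as np

# Hypothetical 2-dimensional MIRT discrimination vectors for four items
a = np.array([[1.2, 0.1],
              [0.3, 1.5],
              [0.8, 0.9],
              [1.0, 0.2]])

# One plausible weighting: each item's overall discrimination (vector norm),
# rescaled so the weights sum to the number of items; equal weights would
# reduce this to plain number-correct scoring
w = np.linalg.norm(a, axis=1)
w = w * len(w) / w.sum()

responses = np.array([1, 0, 1, 1])   # 0/1 scored responses, as in CTT
hybrid_score = float(w @ responses)  # weighted number-correct total
print(round(hybrid_score, 2))  # → 2.77
```

Because the weights are fixed once the item parameters are known, the total remains a simple weighted sum, which is the property the abstract highlights.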

18.
Examining multiple methods for intercoder reliability in qualitative research
徐建平, 张厚粲. 《心理科学》 (Psychological Science), 2005, 28(6): 1430-1432
Methods for assessing intercoder reliability in qualitative research include the categorization agreement index, the coding reliability coefficient, correlation coefficients, the median test, and the generalizability coefficient. An examination of intercoder reliability based on an interview data set on teacher competency showed that the categorization agreement index and the coding reliability coefficient are unstable because they depend on the number of identical codes, correlation coefficients are constrained by the data type, the median test is affected by the research design, and the generalizability coefficient is affected by the numbers of coders and coded items. These methods must therefore be chosen appropriately in a given study.
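Two standard indices in this family, raw percent agreement and Cohen's kappa (a chance-corrected coefficient), can be computed directly; a minimal sketch with made-up codes for ten interview segments:

```python
from collections import Counter

# Hypothetical codes assigned by two coders to the same ten interview segments
coder1 = ["A", "A", "B", "B", "C", "A", "B", "C", "C", "A"]
coder2 = ["A", "B", "B", "B", "C", "A", "A", "C", "C", "A"]

n = len(coder1)
p_obs = sum(x == y for x, y in zip(coder1, coder2)) / n   # raw agreement

# Chance agreement from the coders' marginal category frequencies
c1, c2 = Counter(coder1), Counter(coder2)
p_exp = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2

kappa = (p_obs - p_exp) / (1 - p_exp)   # Cohen's kappa: chance-corrected agreement
print(round(p_obs, 2), round(kappa, 3))  # → 0.8 0.697
```

Kappa's dependence on the marginal code frequencies is one concrete form of the instability the abstract notes for agreement-based indices.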

19.
Lihua Yao. Psychometrika, 2012, 77(3): 495-523
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability, or reduce test length, when compared with unidimensional CAT or with a paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation, varying the structure of item pools, the population distribution of the simulees, the number of items selected, and the content area. The existing procedures, such as Volume (Segall in Psychometrika, 61:331-354, 1996), Kullback-Leibler information (Veldkamp & van der Linden in Psychometrika, 67:575-588, 2002), Minimize the error variance of the linear combination (van der Linden in J. Educ. Behav. Stat., 24:398-412, 1999), and Minimum Angle (Reckase in Multidimensional Item Response Theory, Springer, New York, 2009), are compared to a new procedure, Minimize the error variance of the composite score with the optimized weight, proposed for the first time in this study. The intent is to find an item selection procedure that yields higher precision for both the domain and composite abilities and a higher percentage of selected items from the item pool. The comparison is performed by examining the absolute bias, correlation, test reliability, time used, and item usage. Three sets of item pools are used, with the item parameters estimated from real live CAT data. Results show that Volume and Minimum Angle performed similarly, balancing information for all content areas, while the other three procedures performed similarly, with high precision for both domain and overall scores when selecting items with the required number of items for each domain. The new item selection procedure has the highest percentage of item usage.
Moreover, for the overall score, it produces similar or even better results compared to those from the method that selects items favoring the general dimension using the general model (Segall in Psychometrika, 66:79-97, 2001); the general dimension method has low precision for the domain scores. In addition to the simulation study, the mathematical theories for certain procedures are derived. The theories are confirmed by the simulation applications.
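A minimal sketch of the Volume (determinant) rule for a multidimensional 2PL, with made-up item parameters; this illustrates the criterion itself, not this study's implementation:

```python
import numpy as np

def item_info(a, b, theta):
    """Fisher information matrix of a multidimensional 2PL item at theta
    (a: discrimination vector, b: scalar difficulty)."""
    p = 1.0 / (1.0 + np.exp(-(a @ theta - b)))
    return p * (1 - p) * np.outer(a, a)

def select_by_volume(pool, theta, info_so_far):
    """Volume rule: pick the item maximizing the determinant of the
    accumulated information matrix."""
    dets = [np.linalg.det(info_so_far + item_info(a, b, theta)) for a, b in pool]
    return int(np.argmax(dets))

theta = np.zeros(2)
pool = [(np.array([1.5, 0.1]), 0.0),   # measures mainly dimension 1
        (np.array([0.1, 1.5]), 0.0),   # measures mainly dimension 2
        (np.array([0.1, 0.1]), 2.0)]   # weakly informative everywhere

# Suppose dimension 1 is already well measured; the volume rule then
# prefers the item informative about dimension 2
info_so_far = np.diag([4.0, 1.0])
print(select_by_volume(pool, theta, info_so_far))  # → 1
```

Swapping the accumulated information to favor dimension 2 makes the rule pick the dimension-1 item instead, which is the balancing behavior the study attributes to Volume.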

20.
Ke-Hai Yuan. Psychometrika, 2009, 74(2): 233-256
When data are not missing at random (NMAR), the maximum likelihood (ML) procedure will not generate consistent parameter estimates unless the missing-data mechanism is correctly modeled. Understanding the NMAR mechanism in a data set would allow one to make better use of the ML methodology. A survey or questionnaire may contain many items; certain items may be responsible for NMAR values in other items. The paper develops statistical procedures to identify the responsible items. By comparing ML estimates (MLEs), statistics are developed to test whether the MLEs change when items are excluded. The items that cause a significant change in the MLEs are responsible for the NMAR mechanism. The normal distribution is used for obtaining the MLEs; a sandwich-type covariance matrix is used to account for distribution violations. The class of nonnormal distributions within which the procedure is valid is provided. Both saturated and structural models are considered. Effect sizes are also defined and studied. The results indicate that more missing data in a sample does not necessarily imply more significant test statistics, owing to smaller effect sizes. Knowing the true population means and covariances, or the parameter values in structural equation models, may not make things easier either. The research was supported by NSF grant DMS04-37167 and the James McKeen Cattell Fund.
