Similar literature
 Found 20 similar documents; search took 15 ms
1.
In some popular test designs (including computerized adaptive testing and multistage testing), many item pairs are not administered to any test takers, which may result in some complications during dimensionality analyses. In this paper, a modified DETECT index is proposed in order to perform dimensionality analyses for response data from such designs. It is proven in this paper that under certain conditions, the modified DETECT can successfully find the dimensionality-based partition of items. Furthermore, the modified DETECT index is decomposed into two parts, which can serve as indices of the reliability of results from the DETECT procedure when response data are judged to be multidimensional. A simulation study shows that the modified DETECT can successfully recover the dimensional structure of response data under reasonable specifications. Finally, the modified DETECT procedure is applied to real response data from two-stage tests to demonstrate how to utilize these indices and interpret their values in dimensionality analyses.

2.
This paper investigates the double‐rating method (DRM) as a way to reduce test takers' social desirability response set. This involves the introduction of a pre‐assessment task, in which respondents indicate how others would probably answer the test or survey questionnaire presented. Two studies conducted in Hong Kong and Canada evaluate the effectiveness of the DRM. Results show that social desirability responses obtained using this method are significantly less frequent than those obtained under a conventional instruction. The pre‐assessment task induces test takers to realize that other people will probably respond truthfully, and report some socially undesirable information. The test takers subsequently conform to this frankness in their own self‐report. The merits and limitations of this method are discussed.

3.
In high-stakes testing, multiple test forms are often used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation was conducted, which showed that test forms with different average speed sensitivity yielded substantially different ability estimates for slow test takers, especially for test takers with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations to the proposed approach and further research questions are discussed.

4.
Motor interference was measured in terms of average response time in sorting 15 pairs of cards, each pair containing words which were either unrelated or identical but printed in different colors. The two words of each pair were used in labeling two cubicles in such a way that they were apart by 0, 1, 2, or 3 intervening cubicles in quasirandomly chosen directions. Interference was inversely related to response similarity, but this relationship may not appear in the absence of sufficient stimulus similarity. The findings are interpreted in the light of a hypothesis which views motor interference as a tendency for responses to deflect from their own courses and be pulled towards those of other responses with which they are in conflict.

5.
In an experiment employing the symbol-element recognition task (Mohs, Wescourt, & Atkinson, 1975), subjects first learned six lists consisting of four words (elements) each. Each list was associated with a unique consonant (symbol). Subsequently, on each of a series of test trials, subjects were presented with one, two, or four symbol-element pairs. A positive response was required if all test words were correctly paired with their associated consonants and a negative response if any one test word was incorrectly paired with a consonant. Of primary concern was the way reaction time (RT) varied with number of pairs presented, the type of response required, and, on negative trials, the position of the mismatched pair in the test display. RT increased with the number of pairs presented on a trial and the increase was greater for positive than for negative trials. For negative pair trials, RT increased with the distance of the mismatched pair from the top of the test display. On negative trials in which the top pair in the test display was the mismatched pair, RT increased with the total number of pairs presented on the trial. A serial, probabilistic order of processing model is proposed to account for these results, and applications of the model to other paradigms are discussed.

6.
The purpose of this study is to explore patterns in model-data fit related to subgroups of test takers from a large-scale writing assessment. Using data from the SAT, a calibration group was randomly selected to represent test takers who reported that English was their best language (EBL) from the total population of test takers (N = 322,011). A reference scale for the items was constructed based on EBL responses. Response behaviors of test takers who reported that English was not their best language (ENBL) were examined in relationship to this reference scale. This study illustrates the use of differential subgroup analyses to identify patterns related to person misfit within subgroups, as well as subsets of items, that may affect the validity of writing scores for ENBL test takers. The methodology described here offers an approach that can be used to explore, understand, and improve the validity of scores obtained from ENBL test takers in large-scale writing assessments.

7.
The experiment reported was concerned with impression formation in children. Twelve subjects in each of Grades K, 2, 4, and 6 rated several sets of single trait words and trait pairs. The response scale consisted of a graded series of seven schematic faces which ranged from a deep frown to a happy smile. A basic question was whether children use an orderly integration rule in forming impressions of trait pairs. The answer was clear. At all grade levels a simple averaging model adequately accounted for pair ratings. A second question concerned how children resolve semantic inconsistencies. Responses to two highly inconsistent trait pairs suggested that subjects responded in the same fashion, essentially averaging the two traits in a pair. Overall, the data strongly supported an averaging model, and indicated that impression formation in children is similar to that found in previous results obtained from adults.

8.
Two new tests for a model for the response times on pure speed tests by Rasch (1960) are proposed. The model is based on the assumption that the test response times are approximately gamma distributed, with known index parameters and unknown rate parameters. The rate parameters are decomposed into a subject ability parameter and a test difficulty parameter. By treating the ability as a gamma distributed random variable, maximum marginal likelihood (MML) estimators for the test difficulty parameters and the parameters of the ability distribution are easily derived. The model tests proposed here also pertain to the framework of MML. Two tests or modification indices are proposed. The first one is focused on the assumption of local stochastic independence, the second one on the assumption of the test characteristic functions. The tests are based on Lagrange multiplier statistics, and can therefore be computed using the parameter estimates under the null model. Therefore, model violations for all items and pairs of items can be assessed as a by-product of one single estimation run. Power studies and applications to real data are included as numerical examples.

9.
The theory of motivated cheating postulates that test takers may cheat when they do not know an answer. With probability k, an “observer” is unsure of an answer and will copy from a nearby “target” with probability c. The corresponding parameters for the target may be entirely unrelated to those of the observer. Thus, the undesirable feature of bidirectionality of parameters found in correlational techniques is not an inherent feature of this theory of cheating. Predictions are derived, and estimates of k and c are proposed. Statistically large values of c suggest that an observer was copying from a target. High values of c for both the observer and the target suggest collusion. The theory is applied to a 40-item five-choice test taken by students in an introductory psychology section. From the full paired comparison matrix of target × observer parameter estimates, the method identifies 2 students who were probably in collusion.

10.
Many students and applicants take multiple‐choice tests to demonstrate their competence and achievement. When they are unsure, they guess the most likely answer to maximize their score. Despite the impact of guessing on test reliability and individual performance, studies have not examined how patterns of answer sequences in multiple‐choice tests affect guessing. This research presents the test taker's fallacy, which refers to an individual's tendency to expect a different answer to appear for the next question given a run of the same answer choices. The test taker's fallacy exhibits negative recency, similar to the gambler's fallacy. However, extending the sequential judgment literature, the test taker's fallacy shows that negative recency arises even when sequences may or may not be randomly generated. In three studies, including a survey and experiments, the test taker's fallacy is robustly observed. The test taker's fallacy is consistent with the operation of the representativeness heuristic. This research explains what and how test takers guess given a streak of answers and extends judgment under uncertainty to the test‐taking context.

11.
In 3 experiments, the authors examined the role of memory for prior instances for making relative judgments in conflict detection. Participants saw pairs of aircraft either repeatedly conflict with each other or pass safely before being tested on new aircraft pairs, which varied in similarity to the training pairs. Performance was influenced by the similarity between aircraft pairs. Detection time was faster when a conflict pair resembled a pair that had repeatedly conflicted. Detection time was slower, and participants missed conflicts, when a conflict pair resembled a pair that had repeatedly passed safely. The findings identify aircraft features that are used as inputs into the memory decision process and provide an indication of the processes involved in the use of memory for prior instances to make relative judgments.

12.
Test collusion (TC) is the sharing of test materials or answers to test questions before or during the test (an important special case of TC is item preknowledge). Because of potentially large advantages for the examinees involved, TC poses a serious threat to the validity of score interpretations. The proposed approach applies graph theory methodology to response similarity analyses for identifying groups of examinees involved in TC without using any knowledge about the parts of the test that were affected by TC. The approach supports different response similarity indices (specific to a particular type of TC) and different types of groups (connected components, cliques, or near-cliques). A comparison with an up-to-date method using real and simulated data is presented. Possible extensions and practical recommendations are given.
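
As a rough illustration of the graph-based grouping described in this abstract, the sketch below links two examinees whenever a response similarity index reaches a threshold and reports the connected components of the resulting graph. The similarity index (a raw count of matching answers), the threshold, the function names, and the sample data are all assumptions made for illustration; they are not the indices or procedure of the cited paper.

```python
from itertools import combinations


def matching_answers(a, b):
    """Hypothetical similarity index: number of items answered identically."""
    return sum(x == y for x, y in zip(a, b))


def collusion_groups(responses, threshold):
    """Return connected components (size >= 2) of the similarity graph.

    An edge joins two examinees when their similarity index is at least
    `threshold` (an assumed cut-off, not a value from the source).
    """
    ids = list(responses)
    neighbors = {i: set() for i in ids}
    for i, j in combinations(ids, 2):
        if matching_answers(responses[i], responses[j]) >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    seen, groups = set(), []
    for start in ids:
        # Skip examinees already grouped or with no similar partner.
        if start in seen or not neighbors[start]:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Made-up answer strings for four examinees on a five-item test.
responses = {
    "A": "ABCDA",
    "B": "ABCDB",
    "C": "ABCCB",
    "D": "DCBAC",
}
groups = collusion_groups(responses, threshold=4)
print(groups)
```

Connected components are the loosest of the three group types the abstract mentions: here A and C end up in one group through B even though they are not directly linked, which a clique-based definition would not allow.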

13.
Faster same than different judgments typically are obtained when two letters are compared. When two tones that might differ only on frequency are compared, however, same judgments typically are slower than different judgments. A uniprocessor, unidimensional model, based on Krueger's noisy-operator theory, was fitted satisfactorily to data from four published studies of tone comparison. The model predicts faster response time on different judgments because of heterogeneity of difference. Because the second tone in a pair typically may be either higher or lower in frequency than the first, there will be a greater variety of perceived difference counts on different pairs than on same pairs. As a result, a large difference count will be decisive and will lead to an immediate "different" response, because it can be produced only by a different pair, whereas a small difference count will not be so decisive because it can be produced by either a same or a different pair. Consequently, there generally will be more rechecking on same than different pairs, and thus longer RT on same pairs.

14.
Individuals’ propensity not to override the first answer that comes to mind is thought to be a crucial cause behind many failures in reasoning. In the present study, we aimed to explore the strategies used and the abilities employed when individuals solve the cognitive reflection test (CRT), the most widely used measure of this tendency. Alongside individual differences measures, protocol analysis was employed to unfold the steps of the reasoning process in solving the CRT. This exploration revealed that there are several ways people solve or fail the test. Importantly, in 77% of the cases in which reasoners gave the correct final answer in our protocol analysis, they started their response with the correct answer or with a line of thought which led to the correct answer. We also found that 39% of the incorrect responders reflected on their first response. The findings indicate that the suppression of the first answer may not be the only crucial feature of reflectivity in the CRT and that the lack of relevant knowledge is a prominent cause of the reasoning errors. Additionally, we confirmed that the CRT is a multi-faceted construct: both numeracy and reflectivity account for performance. The results can help to better apprehend the “whys and whens” of the decision errors in heuristics and biases tasks and to further refine existing explanatory models.

15.
16.
Short-term memory in the pigeon: stimulus-response associations (total citations: 3; self-citations: 3; citations by others: 0)
Three pigeons pecked for food in two experiments in which each trial consisted of two phases: a study and a test phase. The study phase in Experiment I consisted of two stimulus-response pairs presented successively. Each pair consisted of the illumination of a left or right key (the stimulus) and a peck on the lighted side key (the response). The study phase in Experiment II consisted of three such pairs presented successively. A retention interval, varied between 0.1 and 4.0 sec, separated the study phase from the test phase. The test phase of a trial began with the illumination of the center key by one of two (Experiment I) or three (Experiment II) colors. This color was the same as the stimulus element of one of the pairs in the study phase. A reinforcer was presented if a subject then emitted the response element of the indicated stimulus-response pair. The results provide information on the conditions that enable a pigeon to remember the responses most recently emitted in the presence of various stimuli. The results suggest an account of the maintenance of behavior that is temporally noncontiguous with reinforcement.

17.
Four experiments tested the hypothesis that successful retrieval of an item from memory affects retention only because the retrieval provides an additional presentation of the target item. Two methods of learning paired associates were compared. In the pure study trial (pure ST condition) method, both items of a pair were presented simultaneously for study. In the test trial/study trial (TTST condition) method, subjects attempted to retrieve the response term during a period in which only the stimulus term was present (and the response term of the pair was presented after a 5-sec delay). Final retention of target items was tested with cued-recall tests. In Experiment 1, there was a reliable advantage in final testing for nonsense-syllable/number pairs in the TTST condition over pairs in the pure ST condition. In Experiment 2, the same result was obtained with Eskimo/English word pairs. This benefit of the TTST condition was not apparently different for final retrieval after 5 min or after 24 h. Experiments 3 and 4 ruled out two artifactual explanations of the TTST advantage observed in the first two experiments. Because performing a memory retrieval (TTST condition) led to better performance than pure study (pure ST condition), the results reject the hypothesis that a successful retrieval is beneficial only to the extent that it provides another study experience.

18.
The brain’s processing of synonymity and antonymy was explored by examining the cortical evoked responses to correct judgments that a test word was a synonym or an antonym of a standard word presented 1 sec previously. Each of five subjects judged 256 pairs of words in each of two sessions. The evoked response to the second word was averaged separately for synonym and antonym pairs. Presentation of each test word as a synonym or an antonym, the order of presentation of each pair, and the side of the “synonym” response key were counter-balanced within subjects. The difference between the averaged response to antonym test words and that to synonym test words differed biphasically over the interval 250-650 msec after the stimulus. The demonstration of an evoked response difference between synonyms and antonyms extends the applicability of evoked potentials from attributes of individual word meaning to the semantic relationships between words.

19.
WordNet, an electronic dictionary (or lexical database), is a valuable resource for computational and cognitive scientists. Recent work on the computing of semantic distances among nodes (synsets) in WordNet has made it possible to build a large database of semantic distances for use in selecting word pairs for psychological research. The database now contains nearly 50,000 pairs of words that have values for semantic distance, associative strength, and similarity based on co-occurrence. Semantic distance was found to correlate weakly with these other measures but to correlate more strongly with another measure of semantic relatedness, featural similarity. Hierarchical clustering analysis suggested that the knowledge structure underlying semantic distance is similar in gross form to that underlying featural similarity. In experiments in which semantic similarity ratings were used, human participants were able to discriminate semantic distance. Thus, semantic distance as derived from WordNet appears distinct from other measures of word pair relatedness and is psychologically functional. This database may be downloaded from www.psychonomic.org/archive/.
