Similar Literature
 20 similar articles found (search time: 15 ms)
1.
The original data of McGurk's classic study of black-white differences on cultural and noncultural tests is re-analyzed at the item level to investigate the role of possible item biases that would cause the noncultural items to be relatively more difficult than the cultural items for blacks than for whites. The evidence indicates that McGurk's results cannot be explained in terms of item biases, but appear to be the result of the noncultural items requiring more sheer reasoning ability than the cultural items, which depend more on acquired information.

2.
This study was designed to investigate the effects of item sampling on hindsight bias in experiments using general knowledge material. The results show that the use of random versus traditional experimenter-selected item samples can have different effects on hindsight bias. In a within-subjects study almost twice as many items in a random sample were connected with a reversed effect rather than with a traditional hindsight bias. The same items that resulted in overconfidence in foresight led to a higher degree of hindsight bias than others. The results suggest that earlier findings of unusually large hindsight effects with general knowledge tasks may be explained by the selection of items used. No hindsight effect was found on confidence scores in a within-subjects design, but one was obtained in a between-subjects design. Results suggest that the use of a within-subjects design itself can moderate hindsight bias by familiarizing subjects with the task. The study shows the importance of two conditions for decreasing the hindsight bias: (1) the use of randomly sampled items, and (2) the use of a within-subjects procedure. When these conditions were met, the "knew-it-all-along effect" was completely eliminated.

3.
Lord developed an approximation for the bias function for the maximum likelihood estimate in the context of the three-parameter logistic model. Using Taylor's expansion of the likelihood equation, he obtained an equation that includes the conditional expectation, given true ability, of the discrepancy between the maximum likelihood estimate and true ability. All terms of orders higher than $n^{-1}$ are ignored, where $n$ indicates the number of items. Lord assumed that all item and individual parameters are bounded, all item parameters are known or well-estimated, and the number of items is reasonably large. In the present paper, an approximation for the bias function of the maximum likelihood estimate of the latent trait, or ability, will be developed using the same assumptions for the more general case where item responses are discrete. This will include the dichotomous response level, for which the three-parameter logistic model has been discussed, the graded response level and the nominal response level. Some observations will be made for both dichotomous and graded response levels.

4.
The authors examined gender bias in the diagnostic criteria for Diagnostic and Statistical Manual of Mental Disorders (4th ed., text revision; American Psychiatric Association, 2000) personality disorders. Participants (N=599) were selected from 2 large, nonclinical samples on the basis of information from self-report questionnaires and peer nominations that suggested the presence of personality pathology. All were interviewed with the Structured Interview for DSM-IV Personality (B. Pfohl, N. Blum, & M. Zimmerman, 1997). Using item response theory methods, the authors compared data from 315 men and 284 women, searching for evidence of differential item functioning in the diagnostic features of 10 personality disorder categories. Results indicated significant but moderate measurement bias pertaining to gender for 6 specific criteria. In other words, men and women with equivalent levels of pathology endorsed the items at different rates. For 1 paranoid personality disorder criterion and 3 antisocial criteria, men were more likely to endorse the biased items. For 2 schizoid personality disorder criteria, women were more likely to endorse the biased items.

5.
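The many-facet Rasch model (MFRM) was used to analyze rater bias among 28 raters in the evaluation of educational quality in nurseries and kindergartens. The results showed that the 28 raters differed significantly in severity; 3 raters showed poor internal consistency, while the remaining 25 were relatively stable; the interaction between raters and the classrooms evaluated was not significant, whereas the interaction between raters and evaluation items was significant. The findings indicate that MFRM allows rater bias in such evaluations to be analyzed specifically at the level of the individual rater. From the perspective of item response theory, it provides a basis in modern educational and psychological measurement for targeted rater training and for assessing rater qualification in order to build a pool of qualified raters.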

6.
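Application of Differential Item Functioning in the Analysis of Cross-Cultural Personality Questionnaires   Total citations: 2, self-citations: 0, citations by others: 2
Cao Yiwei (曹亦薇), Acta Psychologica Sinica (《心理学报》), 2003, 35(1): 120-126
Using the graded model of IRT, this study investigated differential item functioning (DIF) between Chinese and Japanese samples on the "environmental sensitivity" scale of the SHIBA brief personality inventory. The findings were: (1) a large proportion of the scale's items (3/4) showed DIF; (2) DIF was related to item content and thresholds but had little relation to the size of the discrimination parameter; (3) among the DIF items, the characteristic curves of the Japanese group were more coherent than those of the Chinese group. The study used the DIF results to make a new attempt at cross-cultural personality comparison. Finally, new topics for deepening DIF research are proposed.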

7.
Viren Swami 《Body image》2009,6(2):149-151
The present study examined evidence of the ‘love-is-blind bias’ (a tendency to perceive romantic partners as more attractive than the self) among gay men and lesbians. In total, 93 gay men and 140 lesbians provided self- and partner-ratings of physical attractiveness for a range of body components. Results of a series of t-tests showed that both gay men and lesbians rated their partners as significantly more attractive than themselves on all but one item, respectively. Effect sizes for these differences were moderate to large. Further analyses showed that lesbians provided higher self- and partner-ratings than gay men on some items, although effect sizes for these differences were small. Overall, these results provide evidence for the existence of a robust positive illusion in self–partner perceptions.

8.
In a recent empirical study, Starns, Hicks, Brown, and Martin (Memory & Cognition, 36, 1–8, 2008) collected source judgments for old items that participants had claimed to be new and found residual source discriminability depending on the old-new response bias. The authors interpreted their finding as evidence in favor of the bivariate signal-detection model, but against the two-high-threshold model of item/source memory. According to the latter, new responses only follow from the state of old-new uncertainty for which no source discrimination is possible, and the probability of entering this state is independent of the old-new response bias. However, when missed old items were presented for source discrimination, the participants could infer that the items had been previously studied. To test whether this implicit feedback led to second retrieval attempts and thus to source memory for presumably unrecognized items, we replicated Starns et al.’s (Memory & Cognition, 36, 1–8, 2008) finding and compared their procedure to a procedure without such feedback. Our results challenge the conclusion to abandon discrete processing in source memory; source memory for unrecognized items is probably an artifact of the procedure, by which implicit feedback prompts participants to reconsider their recognition judgment when asked to rate the source of old items in the absence of item memory.

9.
Despite a large literature on infants’ memory for visually presented stimuli, the processes underlying visual memory are not well understood. Two studies with 4-month-olds (N = 60) examined the effects of providing opportunities for comparison of items on infants’ memory for those items. Experiment 1 revealed that 4-month-olds failed to show evidence of memory for an item presented during familiarization in a standard task (i.e., when only one item was presented during familiarization). In Experiment 2, infants showed robust memory for one of two different items presented during familiarization. Thus, infants’ memory for the distinctive features of individual items was enhanced when they could compare items.

10.
11.
A Hypermasculinity Inventory was developed to measure a macho personality constellation consisting of three components: (a) calloused sex attitudes toward women, (b) violence as manly, and (c) danger as exciting. The 30 forced-choice items were selected by a two-stage internal consistency item analysis. Issues of substantive and structural validity were addressed by considering item content, test format, homogeneity of items, and the factor structure of items. The Cronbach α coefficient for the Hypermasculinity Inventory was .89 in the present sample of 135 college men. External validity was assessed by correlating scores of the Hypermasculinity Inventory with self-reported drug use, r(135) = .26, p < .01, aggressive behavior r(135) = .65, p < .001, and dangerous driving, r(136) = .47, p < .001, following alcohol consumption, and delinquent behavior during the high school years, r(135) = .38, p < .01. Construct validity was supported further by a pattern of theoretically meaningful correlations with the Personality Research Form (D. N. Jackson, 1974, Goshen, NY: Research Psychologists Press). The discussion considered further research that is needed to adduce additional evidence for the construct validity of the Hypermasculinity Inventory as a measure of the macho personality pattern.

12.
Test validity is predicated on there being a lack of bias in tasks, items, or test content. It is well-known that factors such as test candidates' mother tongue, life experiences, and socialization practices of the wider community may serve to inject subtle interactions between individuals' background and the test content. When the gender of the test candidate interacts further with these factors, the potential for item bias to influence test performances grows. A dilemma faced by test designers concerns how they can proactively screen test content for possible sources of bias. Conventional practices in many contexts rely on the subjective opinion of review panels in detecting sensitive topical content and potentially biased material and items. In the last 2 decades this practice has been rivaled by the increased availability of item bias diagnostic software. Few studies have compared the relative accuracy and cost utility of the two approaches in the domain of language assessment. This study makes just that comparison. A 4-passage, 20-item reading comprehension test was given to a stratified sample of 825 high school students and college undergraduates at 5 Japanese institutions. The sampling included a focus group of 468 female students compared to a reference group of 357 male English as a foreign language (EFL) learners. The test passages and items were also given to a panel of 97 in-service and preservice EFL teachers for subjective ratings of potential gender bias. The results of the actual item responses were then empirically checked for evidence of differential item functioning using Simultaneous Item Bias analysis, the Mantel-Haenszel Delta method, and logistic regression. Concordance analyses of the subjective and objective methods suggest that subjective screening of bias overestimates the extent of actual item bias. Implications for cost-effective approaches to item bias detection are discussed.

13.
Researchers (e.g., Ironson, 1982; Tenopyr, 1990) have suggested that item bias investigators equate subgroups on external criteria such as job performance rather than total test scores before considering subgroup passing rates on test items. In a study comparing these two approaches to studies of item bias, we found little evidence of bias using total test score as the estimate of overall examinee ability, but nearly all items were biased in comparisons of white and African-American subgroups on Numerical, Verbal, and Mechanical Reasoning tests and in male-female comparisons on a Mechanical Reasoning test when job performance was used to select "equally able" examinees. However, the use of job performance as the ability index is analogous to performance-based approaches to test bias (Hartigan & Wigdor, 1989; Thorndike, 1971) and directly equivalent to the Darlington (1971) and Cole (1973) test bias definition, the logical inconsistencies of which have been previously described (Hunter & Schmidt, 1976; Peterson & Novick, 1976). We conclude that performance matching as a basis of forming "equal ability" groups is inappropriate.

14.
This paper considers testing in which the goal is to minimize the number of test items required to establish a learner's state of ability. Focus is on optimal or near optimal selection over a well-defined universe of items or stimuli. Selection policies are determined for the case in which the items have hierarchical or partial hierarchical relationships. Derivation of an optimal policy rests upon techniques from dynamic programming. For situations in which an optimal policy may be too costly to compute, two heuristic approximations are offered. One heuristic counts the hypothetical estimates of ability that remain tenable following a response to each item and chooses the item that minimizes the expectation of that number. The other selects the item that maximizes the statistic of information.
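
The first heuristic in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes, purely for illustration, that an ability state is the set of items a learner can pass, that all tenable states are equally likely, and that responses are error-free. The names `expected_remaining`, `choose_item`, and `STATES` are hypothetical.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical representation: an ability state is the set of items the learner can pass.
State = FrozenSet[str]


def expected_remaining(item: str, states: List[State]) -> float:
    """Expected number of states still tenable after the response to `item`.

    Assumes (for illustration only) that every tenable state is equally
    likely and that responses are error-free.
    """
    n = len(states)
    n_pass = sum(1 for s in states if item in s)
    n_fail = n - n_pass
    # A pass (probability n_pass / n) leaves n_pass states tenable;
    # a fail (probability n_fail / n) leaves n_fail states tenable.
    return (n_pass * n_pass + n_fail * n_fail) / n


def choose_item(items: Iterable[str], states: List[State]) -> str:
    """Pick the item that minimizes the expected number of tenable states."""
    return min(items, key=lambda item: expected_remaining(item, states))


# Hypothetical linear hierarchy a < b < c: passing c implies passing b and a.
STATES = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc")]
```

Under these assumptions the heuristic prefers the item that splits the tenable states most evenly, which in the four-state hierarchy here is the middle item `b`.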

15.
Kindergarteners and third graders were given a continuous recognition memory task involving two-digit numbers. In addition, a rating scale consisting of photographs of various facial expressions was used to obtain confidence judgments from the Ss. Conventional analyses as well as signal detection analyses of the data revealed the following results: (a) the overall performance of the third graders was superior to that of the kindergarteners; (b) memory strength decreased as the number of intervening items increased; (c) there was no difference in the forgetting rates of the two grade levels; (d) the third graders exhibited a more liberal response bias than the kindergarteners; (e) both the hit rate (probability of correctly labeling an old stimulus as old) and the false-alarm rate (probability of incorrectly labeling a new stimulus as old) increased across blocks of items; (f) the increases in the hit rate and the false-alarm rate over blocks were due to a change in criterion from a relatively conservative level to a more lenient one; (g) the lower the S's level of confidence in judging an item as old, the lower was the probability of that item actually being old; (h) the third graders were better than the kindergarteners at gauging the accuracy of their recognition responses. It was concluded that with respect to recognition memory, children as young as 5½ years old are capable, to some extent, of monitoring their own memory states.
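
The hit rate and false-alarm rate defined in the abstract above are the inputs to the standard signal detection measures. The sketch below assumes the equal-variance Gaussian model, which the abstract does not specify, and the function name `sdt_measures` is hypothetical. It shows how sensitivity ($d'$) and criterion ($c$) can be separated, so that a rise in both rates can reflect a more lenient criterion rather than a change in memory strength.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple:
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa             # sensitivity: separation of old and new items
    criterion = -0.5 * (z_hit + z_fa)  # positive = conservative, negative = liberal
    return d_prime, criterion
```

For instance, the hypothetical pairs (0.6, 0.1) and (0.9, 0.4) yield the same $d'$ but criteria of opposite sign: both rates are higher in the second pair only because the criterion is more lenient.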

16.
Standardized tests are used widely in comparative studies of clinical populations, either as dependent or control variables. Yet, one cannot always be sure that the test items measure the same constructs in the groups under study. In the present work, 460 participants with intellectual disability of undifferentiated etiology and 488 typical children were tested using Raven's Colored Progressive Matrices (RCPM). Data were analyzed using binomial logistic regression modeling designed to detect differential item functioning (DIF). Results showed that 12 items out of 36 function differentially between the two groups, but only 2 items exhibit at least moderate DIF. Thus, a very large majority of the items have identical discriminative power and difficulty levels across the two groups. It is concluded that RCPM can be used with confidence in studies comparing participants with and without intellectual disability. In addition, it is suggested that methods for investigating internal bias of tests used in cross-cultural, cross-linguistic or cross-gender comparisons should also be regularly employed in studies of clinical populations, particularly in the field of developmental disability, to show the absence of systematic measurement error (i.e. DIF) affecting item responses.

17.
Michael Braun 《Sex roles》2008,59(9-10):644-656
This study was designed to show that measuring gender ideology by egalitarian items might be problematic, as gender egalitarianism is not simply the reverse of gender traditionalism and includes very different stances. Egalitarian items are likely to lead to an insufficient discrimination between traditional and non-traditional respondents. While the former often ignore the egalitarian stance, the latter hold different egalitarian positions which prevents agreement with any specific item. Empirical evidence is reported from quantitative and qualitative data: an international survey based on random samples (n = 5.692) and a cognitive study, which was conducted in Germany by both internet (n = 889) and telephone (n = 285). Here, the selection of a response category was analyzed by using probing questions.

18.
This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed for one group than the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items and some effects were quite large. Differentially-functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.

19.
A promising approach to understanding the processes involved when subjects respond to personality items is provided by the investigation of the causes of inconsistent responses when subjects answer the same item on two occasions. Among these causes are the properties of the item. Previous item research focused almost exclusively on properties which are not highly specific to the item, such as endorsement rate (ER) and social desirability scale value (SDSV). Although past studies found that items with ‘extreme’ SDSVs and/or ERs elicit fewer inconsistencies, these studies ignored more item-specific properties such as item content and item ambiguity. The present study demonstrates that contrary results regarding consistency may be obtained when more item-specific properties are taken into consideration. These results are interpreted as evidence that certain kinds of item content can increase the indecision and conflict that characterize some subjects' response processes.

20.
Frank Miele 《Intelligence》1979,3(2):149-163
It is often argued that IQ differences between groups of American Blacks and Whites are the result of IQ tests being culturally biased instruments. The present study attempts to determine the existence of cultural bias in the Wechsler Intelligence Scale for Children (WISC) by comparing Black and White children on: (1) loadings of the first principal component on the WISC subtests; (2) the rank order of item difficulty; (3) the contribution to the total variance of the Race by Item interaction obtained by ANOVA; and (4) an attempt to simulate race differences by within-race age differences. The results indicate that: (1) there is no evidence of specific factors peculiar to either racial group (the groups differ on what is common to all subtests); (2) the rank order of item difficulties is similar in both racial groups; (3) ANOVA reveals a significant Race by Item interaction, but one which accounts for less than five percent of the total variance, less than one percent in the case of age offset analysis; (4) the items which best discriminate between Blacks and Whites at any age level are also the items which best discriminate between age groups within-race. The data support the view that race differences are differences in mental maturity rather than an artifact of biased testing instruments.
