Similar articles
Found 20 similar articles (search time: 187 ms)
1.
The current research investigated questions that persist regarding the criterion‐related and construct validity of situational (SI) versus past‐behaviour (PBI) structured interview formats in predicting the job performance of managers. Analyses of data collected from 157 applicants to managerial positions showed that the PBI format significantly predicted job performance ratings (r = .32, p < .01), whereas the SI format did not (r = .09, ns). Investigation of potential construct differences between the SI and PBI formats showed that the PBI was more highly related to manager‐relevant cognitive ability measures, assessment centre exercises and personality traits, as compared with the SI. Such differences help to explain the predictive validity differences between the SI and PBI observed in current and previous research.

2.
In this article, we offer some suggestions as to why tetrads and pentads have become the dominant formats for administering multidimensional forced choice (MFC) items but, in turn, raise questions regarding the underlying psychometric model and means of addressing item quality and scoring accuracy. We then focus our attention on multidimensional pairwise preference (MDPP) items and present an item response theory–based approach to constructing and modeling MDPP responses directly, assessing information at the item and scale levels, computing standard errors for trait scores, and estimating scale reliability. To demonstrate the viability of this method for applied use, we show the correspondence between MDPP scores derived from direct modeling and those obtained using single statement and unidimensional pairwise preference measures administered in a laboratory setting. Trait score correlations and criterion related validities are compared across testing formats and rating sources (i.e., self and other), and the usefulness of our model-based approach is further demonstrated by some illustrative results involving computerized adaptive tests (CAT).

3.
Multiple‐choice response formats are troublesome, as an item is often scored as solved simply because the examinee may be lucky at guessing the correct option. Instead of pertinent Item Response Theory models, which take guessing effects into account, this paper considers a psycho‐technological approach to re‐conceptualizing multiple‐choice response formats. The free‐response format is compared with two different multiple‐choice formats: a traditional format with a single correct response option and five distractors (‘1 of 6’), and another with five response options, three of them being distractors and two of them being correct (‘2 of 5’). For the latter format, an item is scored as mastered only if both correct response options and none of the distractors are marked. After the exclusion of a few items, the Rasch model analyses revealed appropriate fit for 188 items altogether. The resulting item‐difficulty parameters were used for comparison. The multiple‐choice format ‘1 of 6’ differs significantly from the multiple‐choice format ‘2 of 5’, while the latter does not differ significantly from the free‐response format. The lower difficulty of items ‘1 of 6’ suggests guessing effects.
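The all-or-nothing scoring rule described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' scoring code: the function names are hypothetical, and the guessing probability assumes an examinee who knows how many options are correct and marks that many uniformly at random.

```python
from math import comb


def score_all_or_nothing(marked, correct):
    """Return 1 only if exactly the correct options are marked, else 0.

    Marking a distractor or missing a correct option both yield 0.
    """
    return int(set(marked) == set(correct))


def blind_guess_probability(n_options, n_correct):
    """Chance of a fully correct answer by pure guessing.

    Assumption (not from the abstract): the examinee marks exactly
    n_correct options, chosen uniformly at random.
    """
    return 1 / comb(n_options, n_correct)
```

Under these assumptions a blind guess succeeds with probability 1/6 in the '1 of 6' format and 1/10 in the '2 of 5' format, which is in line with the abstract's reading that the easier '1 of 6' items reflect guessing.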

4.
Across three samples (N = 475, 358, and 112), the authors examined the criterion‐related validity of the Employee Screening Questionnaire (ESQ), a brief forced‐choice measure of integrity in the workplace. Results suggested that ESQ scores correlate highly with self‐ and other‐reports of counterproductive work behaviors (rs of .59, .50, and .47 on the consolidated scores), as well as self‐reports of job satisfaction (rs of −.41 and −.22 on the consolidated scores), and intention to leave the organization (rs of .30 and .21 on the consolidated scores). No predictive bias by gender was found for the ESQ scores. Based on these results, the authors encourage more research on the use of personality‐based (covert), forced‐choice integrity tests in selection contexts.

5.
While the Angoff (1971) method is a commonly used cut score method, critics (Berk, 1996; Impara & Plake, 1997) argue that it places excessively high cognitive demands on raters. In response to criticisms of the Angoff, a number of modifications to the method have been proposed. Some suggested Angoff modifications include using an iterative rating process, presenting judges with normative data about item performance, revising the rating judgment into a Yes/No decision, assigning relative weights to dimensions within a test, and using item response theory in setting cut scores. In this study, subject matter expert raters were provided with a ‘difficulty anchored’ rating scale to use while making Angoff ratings; this scale can be viewed as a variation of the Angoff normative data modification. The rating scale presented test items having known p‐values as anchors, and served as a simple means of providing normative information to guide the Angoff rating process. Results are discussed regarding reliability of the mean Angoff rating (.73) and the correlation of mean Angoff ratings with item difficulty (observed r ranges from .65 to .73).

6.
Data are described as ipsative if a given set of responses always sum to the same total. However, there are many properties of data collection that can give rise to different types of ipsative data. In this study, the most common type of ipsative data used in employee selection (forced‐choice ipsative data; FCID) is discussed as a special case of other types of ipsative data. Although all ipsative data contain constraints on covariance matrices (covariance‐level interdependence), FCID contain additional item‐level interdependencies as well. The psychological processes that give rise to FCID and the resultant psychometric properties are discussed. In addition, data in which job applicants provided both normative and ipsative responses illustrate very different patterns of correlations as well as very different selection decisions between normative, FCID and ipsatized measures.
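The defining property stated in this abstract (every respondent's responses sum to the same total) can be sketched as follows. This is a minimal illustration with hypothetical function names; `ipsatize` uses one common convention, subtracting each respondent's own mean, which is an assumption rather than the procedure reported in the study.

```python
def is_ipsative(rows):
    """True if every respondent's scores sum to the same total."""
    totals = {sum(row) for row in rows}
    return len(totals) == 1


def ipsatize(scores):
    """Centre one respondent's normative scores on their own mean.

    The result always sums to zero, so only the relative ordering of
    the scales within the person is retained.
    """
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]
```

For example, forced-choice rankings such as `[1, 2, 3]` and `[3, 1, 2]` are ipsative because both sum to 6, whereas normative ratings such as `[4, 2, 3]` and `[5, 5, 1]` are not until each row is ipsatized.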

7.
This is a response to Gray and Wilson’s (2007) article: “A detailed analysis of the reliability and validity of the sensation seeking scale in a UK sample”. Gray and Wilson analysed the items in the four subscales of the SSS-V, using a Likert-type response format and deconstructing the forced choice format of the original. However, they used some anachronistic items from the old 1978 form rather than the revisions of these items in the newer form. Even excluding the 19 items from the 80-item test that did not meet their internal reliability criterion did not improve the reliabilities of the old scales in their Likert format. Validity of the SSS is not really addressed despite the title of the article.

8.
The Adaptive Visual Analog Scales is a freely available computer software package designed to be a flexible tool for the creation, administration, and automated scoring of both continuous and discrete visual analog scale formats. The continuous format is a series of individual items that are rated along a solid line and scored as a percentage of distance from one of the two anchors of the rating line. The discrete format is a series of individual items that use a specific number of ordinal choices for rating each item. This software offers separate options for the creation and use of standardized instructions, practice sessions, and rating administration, all of which can be customized by the investigator. A unique participant/patient ID is used to store scores for each item, and individual data from each administration are automatically appended to that scale’s data storage file. This software provides flexible, time-saving access for data management and/or importing data into statistical packages. This tool can be adapted so as to gather ratings for a wide range of clinical and research uses and is freely available at www.nrlc-group.net.
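The continuous scoring rule mentioned in this abstract (a mark scored as a percentage of distance from one anchor) might look like the sketch below. The function name, its parameters, and the clamping of out-of-range marks are assumptions made for illustration; nothing here is taken from the software itself.

```python
def vas_percent(mark_position, line_start, line_end):
    """Score a mark as percent of the distance from line_start to line_end.

    Positions are in any consistent unit (e.g. screen pixels).
    Marks outside the line are clamped to the 0-100 range (an assumption).
    """
    if line_end == line_start:
        raise ValueError("the rating line must have non-zero length")
    pct = 100 * (mark_position - line_start) / (line_end - line_start)
    return min(100.0, max(0.0, pct))
```

With a line running from pixel 100 to pixel 300, a mark at pixel 150 lies a quarter of the way along, so `vas_percent(150, 100, 300)` gives 25.0.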

9.
The choice of performance rating format may influence employees' fairness perceptions. Participants in two studies, one consisting of 208 participants and the other of 393 participants, evaluated the fairness of common relative and absolute rating formats. The participants in the second study also evaluated the fairness of two rating formats, one absolute and one relative, presented in organizational contexts of varying procedural and distributive justice. Results indicate that not only are absolute formats perceived as more fair than relative formats, but differences in fairness perceptions also occur among relative and absolute formats. Furthermore, it appears that rating format influences procedural justice, especially when outcomes are perceived as fair. Implications for organizations' appraisal practices are discussed.

10.
We replicated the response‐restriction (RR) preference assessment and compared results in terms of preference hierarchies to those from free‐operant and multiple stimulus without replacement (MSWO) formats with six children with autism spectrum disorders (ASDs). We also assessed social validity of each format with teachers and clinicians who work with children with ASDs. Complete hierarchies were produced in four of 18 assessments, and only with the MSWO and RR formats. Results of the social validity assessment varied across raters, with each preference assessment format receiving the highest rating from at least one rater. Results are discussed in terms of practical recommendations and relative to the preference assessment literature as a whole.

11.
12.
A total of 887 respondents completed ipsative and normative versions of the PAL-TOPAS personality questionnaire. Data were analysed to test for (1) systematic bias in scores associated with the two response formats and (2) predictors of the magnitude of the discrepancy in the individual's ipsative and normative scores. Discrepancy was assessed for both item responses and scale scores. Sources of bias investigated included ipsative scaling artifact, extremeness of scores on the normative scales and response variability. Results showed that systematic bias in scale scores and magnitude of discrepancy were predicted by different factors. One source of systematic bias was associated with ipsative scaling artifact: the ipsative scales measure both the scale itself and rejection of other alternatives. A second source of systematic bias was acquiescence in response to normative items. A confirmatory factor analysis showed that a good but imperfect fit to the data may be obtained by constructing a structural model of the inter-relationship between normative and ipsative scores which accommodates both sources of bias. The strongest influence on discrepancy in scale scores was extremeness of normative scoring, associated with a bias towards either general acceptance or rejection of trait adjectives. It is concluded that both normative and ipsative response formats have limitations, and it may often be desirable to assess both.

13.
Multiple‐choice (MC) tests are arguably the most widely used testing format in applied settings. In the psychometric and education literatures, research on the optimal number of options for knowledge and ability MC tests has revealed that three‐option tests are psychometrically equivalent to and, in some cases, superior to five‐option tests. In addition, there are a number of practical, economic, and administrative advantages associated with the use of three‐option MC tests. Yet, despite its advantages, the three‐option format is underutilized in personnel selection. Across two studies, we compared test‐taker perceptions, criterion‐related validity, and sex‐based subgroup differences, and in Study 1, we compared race‐based subgroup differences on three‐ and five‐option tests. Participants in the two studies completed a three‐ or five‐option version of the ACT. Test perceptions, criterion‐related validity, and race‐ and sex‐based subgroup differences were similar across test formats. The implications for the expanded use of three‐option tests in applied settings and future directions for research are discussed.

14.
This study describes the development of a multidimensional biodata form which used explicit constructs to guide item generation and rational scale development, construct validation, criterion measurement and empirical keying. These constructs were goal-orientation, teamwork, customer service, resourcefulness, learning ability and leadership. Exploratory and confirmatory factor analyses in both applicant and incumbent samples were used to identify and test the model, which included the thirteen more differentiated rational scales relating to these six broader constructs. Empirical keying of the rationally developed scales was conducted against criterion construct scales conceptually related to each predictor construct. Empirical keying at the item level was found to result in higher validities and cross-validities than either empirical keying at the scale level or rational keying. The item-keyed instrument also demonstrated incremental validity over a test of cognitive ability for specific work performance domains as well as overall work performance.

15.
While it is known that client factors account for the largest proportion of outcome variance across treatment modalities, little is known about how clients’ characteristics affect the process and effectiveness of couple therapy. To further knowledge in this area, we created a brief, practice‐friendly measure, the Expectation and Preference Scales for Couple Therapy (EPSCT). Three self‐report scales assess clients’ Outcome expectations (e.g., I expect our relationship to improve as a result of couple therapy) and role expectations for Self (e.g., I expect to listen to my partner's concerns) and Partner (e.g., I expect my partner to blame me). Three Cognitive‐Behavioral, Emotionally Focused, and Family Systems preference scales use a forced‐choice format to measure the comparative strength of respondents’ preferences for interventions broadly reflective of each approach. A large item pool was developed from relevant literature and clinical experience and refined based on face and content analyses with two panels of experienced couple therapists and researchers. Across four studies with 1,175 participants, the scales’ internal consistency reliabilities were similar and their construct validity was supported with confirmatory factor analyses and significant correlations with several established measures, including expectation measures developed for individual psychotherapy and measures of attitudes toward professional help seeking and valuing personal growth. Across all studies, participants had stronger role expectations for themselves than their partners, although gender effects differed by sample. We discuss how to use the 15‐item EPSCT in clinical practice and in future research as a predictor of couple therapy processes and outcomes.

16.
Cancer is recognized to have a multifaceted stressful impact on all areas of a patient's life. Researchers commonly use self-report questionnaires, intended to measure stressors objectively. However, the item-content and response-format of such scales often tap physical and mental responses to stress, thereby contaminating prediction of adverse impact. This article reports the development and validation of English and French versions of the Inventory of Recent Life Experiences for Cancer Patients (IRLE-C), which is designed to minimize such “criterion-contamination”. This entailed (1) avoiding items reflecting physical or subjective distress; (2) rating stressors for degree of exposure only; and (3) use of an innocuous scale title. The initial item pool was administered serially to a sample of 100 Francophone breast-cancer and prostate-cancer patients. To guard against inflating reliability and validity estimates through capitalizing on chance, we administered the 30-item final scale to an independent sample of 96 Francophone breast-cancer and prostate-cancer patients undergoing radiation treatment. Following the item-selection step, factorial structure and validity analyses were performed using the combined French-speaking sample (n = 196). Second, we administered the English version of the scale to an English-speaking sample of 127 cancer patients (various cancer sites and stages). The measure showed good internal consistency (.94 and .89 for the Francophone and Anglophone samples respectively) and met criteria for a 2-week test-retest reliability (r = .70 for the item-selection subsample and .80 for the cross-replication subsample). Correlations between the IRLE-C and the POMS Total Mood Disturbance were around .60 for both the Francophone and Anglophone samples. Avoiding contamination (through content and format) without losing its relationship to subjective distress, the IRLE-C appears to be a useful instrument for applying the stress-process model in oncology to establish clear distinctions among stressors, mediators, reactions, and consequences.

17.
Understanding the nature of science (NOS) is a critical aspect of scientific reasoning, yet few studies have investigated its developmental beginnings and initial structure. One contributing reason is the lack of an adequate instrument. Two studies assessed NOS understanding among third graders using a multiple‐select (MS) paper‐and‐pencil test. Study 1 investigated the validity of the MS test by presenting the items to 68 third graders (9‐year‐olds) and subsequently interviewing them on their underlying NOS conception of the items. All items were significantly related between formats, indicating that the test was valid. Study 2 applied the same instrument to a larger sample of 243 third graders, and their performance was compared to a multiple‐choice (MC) version of the test. Although the MC format inflated the guessing probability, there was a significant relation between the two formats. In summary, the MS format was a valid method revealing third graders' NOS understanding, thereby representing an economical test instrument. A latent class analysis identified three groups of children with expertise in qualitatively different aspects of NOS, suggesting that there is not a single common starting point for the development of NOS understanding; instead, multiple developmental pathways may exist.

18.
To examine the appropriateness of a Multi‐Trait–Multi‐Method framework for testing the construct validity of Assessment Centers (ACs) and to derive practical implications for improved AC design, the degree to which AC dimension‐related performance behaviors manifest consistently across multiple AC rating situations was investigated. The present study used a large sample (N = 5,006) to apply a measurement invariance analysis. AC rating situations generally produced consistent factor loadings for items on AC dimensions, item residuals, dimension factor variances, and covariance between dimensions. The interview rating situation tended to produce higher ratings and smaller item residuals. These findings support the consistency in constructs assessed across different AC rating situations, while some exercises may be better for teasing apart particular dimensions than others.

19.
This study investigates the effects of rater personality (Conscientiousness and Agreeableness), rating format (graphic rating scale vs. behavioral checklist), and the social context of rating (face‐to‐face feedback vs. no face‐to‐face feedback) on the elevation of performance ratings. As predicted, raters high on Agreeableness showed more elevated ratings than those low on Agreeableness when they expected to have a face‐to‐face feedback meeting. Furthermore, rating format moderated the relationship between Agreeableness and rating elevation, such that raters high on Agreeableness provided less elevated ratings when using the behavioral checklist than the graphic rating scale, whereas raters low on Agreeableness showed little difference in elevation across different rating formats. Results also suggest that the interactive effects of rater personality, rating format, and social context may depend on the performance level of the ratee. The implications of these findings are discussed.

20.
The present research investigated whether an item response theory (IRT)‐scored forced‐choice personality questionnaire has the same normative data structures as a similar version that uses a 5‐point Likert scale instead. The study was conducted using a sample of 349 training delegates who completed both an IRT‐scored forced‐choice and a normative single‐stimulus version of the questionnaire. Results largely supported the scaling properties, measurement precision, and equivalence of the data structures of the two scoring methods.

