Similar articles
1.
Despite recent interest in the practice of allowing job applicants to retest, surprisingly little is known about how retesting affects two of the most critical factors on which staffing procedures are evaluated: subgroup differences and criterion-related validity. We examined these issues in a sample of internal candidates who completed a job-knowledge test for a within-job promotion. This was a useful context for these questions because we had job-performance data on all candidates (N = 403), regardless of whether they passed or failed the promotion test (i.e., there was no direct range restriction). We found that retest effects varied by subgroup, such that female and younger candidates improved more upon retesting than did male and older candidates. There was also some evidence that Black candidates did not improve as much as candidates from other racial groups. In addition, among candidates who retested, retest scores were somewhat better predictors of subsequent job performance than initial test scores (rs = .38 vs. .27). Overall, the results suggest that retesting does not harm criterion-related validity and may even enhance it. Furthermore, retesting may reduce the likelihood of adverse impact against some subgroups (e.g., female candidates) but increase it against others (e.g., older candidates).

2.
This study investigates the test–retest reliability of a battery of executive function (EF) tasks, with a specific interest in whether the method used to create a battery-wide score affects the apparent test–retest reliability of children’s performance. A total of 188 four-year-olds completed a battery of computerized EF tasks twice, approximately two weeks apart. Two approaches were used to create a score indexing children’s overall performance on the battery: (1) the mean score of all completed tasks and (2) a factor score estimate derived from confirmatory factor analysis (CFA). Pearson and intraclass correlations were used to investigate the test–retest reliability of individual EF tasks as well as of the overall battery score. Consistent with previous studies, the test–retest reliability of individual tasks was modest (rs ≈ .60). The test–retest reliability of the overall battery score differed depending on the scoring approach (r(mean) = .72; r(factor score) = .99). We conclude that children’s performance on individual EF tasks exhibits modest test–retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across them to improve measurement precision. However, the specific aggregation strategy has a large impact on the apparent test–retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor analytic approaches for representing individual performance across a battery of EF tasks.

3.
Previous research on measurement error in job performance ratings has estimated reliability using three coefficients: alpha, test–retest, and interrater correlation. None of these three coefficients controls for all four main sources of error in performance ratings. For this reason, the coefficient of equivalence and stability (CES) has been suggested as the ideal estimate of reliability. This article presents CES estimates for time intervals of 1, 2, and 3 years. The values obtained for a single rater were .51, .48, and .44, respectively; for two raters, the values were .59, .55, and .51. The findings suggest that previous reliability estimates based on alpha, test–retest, and interrater coefficients overestimated the reliability of job performance ratings. In the present study, the interrater coefficient overestimates reliability by 13.6–25.4% for intervals of 1–3 years because it does not control for transient error. Results also showed that the importance of transient error increases as the interval between measurements lengthens. Based on these results, it is suggested that validities corrected using interrater reliability underestimate the true magnitude of validity. The implications of these findings for future efforts to estimate criterion reliability and predictor validity are discussed.
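The closing claim follows from the classical correction for attenuation due to criterion unreliability, ρ = r_xy / √r_yy: the larger the reliability estimate in the denominator, the smaller the corrected validity. A minimal sketch; the observed validity (.25) and the interrater reliability (.60) are hypothetical values, and only the CES of .51 (one rater, 1-year interval) comes from the abstract.

```python
from math import sqrt

def correct_for_criterion_unreliability(observed_r, criterion_reliability):
    """Classical disattenuation for criterion unreliability only: r / sqrt(r_yy)."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return observed_r / sqrt(criterion_reliability)

OBSERVED_VALIDITY = 0.25  # hypothetical observed predictor-criterion correlation
INTERRATER = 0.60         # hypothetical interrater reliability
CES_ONE_YEAR = 0.51       # single rater, 1-year interval (from the abstract)

with_interrater = correct_for_criterion_unreliability(OBSERVED_VALIDITY, INTERRATER)
with_ces = correct_for_criterion_unreliability(OBSERVED_VALIDITY, CES_ONE_YEAR)
print(round(with_interrater, 3), round(with_ces, 3))
```

Dividing by the larger interrater value yields the smaller corrected validity, which is why corrections based on interrater reliability would understate validity if CES is the more appropriate reliability estimate.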

4.
The purpose of the present study is to examine the test-retest reliability of the Halstead-Reitan Battery (HRB) in an acutely psychotic population. Ten acutely psychotic patients, initially tested upon admission to an inpatient unit of the Austin State Hospital, were selected for retesting because they showed the most complete remission of psychotic symptoms of all patients tested over the 14-month period from June 1978 to August 1979. Only patients showing good remission were selected in order to maximize the potential for changes in test performance and thus provide a conservative estimate of test-retest reliability. The average retest interval was 10.4 weeks (SD = 6.67), with retest completed just prior to discharge. Reliability was examined for each HRB subtest across subjects, as well as for each subject across subtests. Although the patients generally performed better at retest, the reliability of the HRB was relatively high.

5.
Eyetracking is commonly used to investigate attentional bias. Although some studies have investigated the internal consistency of eyetracking, data on the test–retest reliability and agreement of eyetracking measures of attentional bias are scarce. This study reports the test–retest reliability, measurement error, and internal consistency of 12 commonly used outcome measures thought to reflect different components of attentional bias: overall attention, early attention, and late attention. Healthy participants completed a preferential-looking eyetracking task that presented threatening words (sensory words, general threat words, and affective words) and nonthreatening words. We used intraclass correlation coefficients (ICCs) to measure test–retest reliability (ICC > .70 indicates adequate reliability). The ICCs(2, 1) ranged from –.31 to .71. Reliability varied according to the outcome measure and threat word category. Sensory words had a lower mean ICC (.08) than either affective words (.32) or general threat words (.29). A longer exposure time was associated with higher test–retest reliability. All of the outcome measures except second-run dwell time demonstrated low measurement error (<6%). Most of the outcome measures showed high internal consistency (α > .93). Recommendations for improving the reliability of eyetracking tasks in future research are discussed.
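ICC(2, 1) in Shrout and Fleiss's notation is the two-way random-effects, absolute-agreement, single-measure intraclass correlation. A minimal sketch computing it from the ANOVA mean squares, with test and retest treated as the two "raters"; the function name `icc_2_1` and the example dwell-time data are hypothetical, not taken from the study.

```python
from statistics import mean

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` holds one row per participant; each row has that participant's
    value on each of k occasions (k = 2 for test and retest).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-participants mean square
    jms = ss_cols / (k - 1)               # between-occasions mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical proportion-of-dwell-time scores at session 1 and session 2.
dwell = [[0.52, 0.55], [0.48, 0.50], [0.61, 0.58], [0.45, 0.47], [0.57, 0.60]]
print(round(icc_2_1(dwell), 2))
```

The between-occasions term penalizes a systematic shift from test to retest, which a Pearson test–retest r ignores; this is one reason absolute-agreement ICCs can fall below the corresponding Pearson coefficients.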

6.
Reference checking is a near‐universal practice within personnel selection systems, and legal pressures to gather job‐relevant and structured feedback from references are mounting. Despite this state of affairs, reference checking remains woefully under‐researched as a method for obtaining psychometrically sound and behaviorally informative data that predict task, team, and leadership behavior at work. Drawing on studies of job candidates in applied settings, this article reports on the reliability, validity, and compliance of multisource reference feedback gathered using a web‐based methodology. The reference‐checking instrument achieved acceptable levels of internal consistency, inter‐rater reliability, and test–retest reliability. Survival analyses supported prediction of involuntary, but not voluntary, turnover. No practically significant differences were found in overall mean scores across demographic subgroups. Finally, the web‐based reference‐checking system was highly efficient across a range of metrics (e.g., reference response time, reference response rate, candidate response time).

7.
Three studies were conducted to develop and validate a mental toughness instrument for use in military training environments. Study 1 (n = 435) focused on item generation and on testing the structural integrity of the Military Training Mental Toughness Inventory (MTMTI). The measure assessed the ability to maintain optimal performance under pressure from a range of stressors experienced by recruits during infantry basic training. Study 2 (n = 104) examined the concurrent validity, predictive validity, and test–retest reliability of the measure. Study 3 (n = 106) confirmed the predictive validity of the measure with a sample of more specialized infantry recruits. Overall, the MTMTI demonstrated sound psychometric properties and structural validity. Furthermore, it showed good test–retest reliability and concurrent validity, and it predicted performance in two different training contexts with two separate samples.

8.
This paper analyzes existing research on the test–retest reliability of human judgment, i.e. the extent to which a judge makes identical judgments when presented with identical stimuli on two occasions. Only research involving professional judges who make experimental judgments in a reasonable analog of their everyday experience is included. Studies of both internal consistency reliability and temporal stability reliability are analyzed (where the former refers to the inclusion of repeat stimuli in the same experimental session, and the latter refers to the repeating of the experimental task from a few days to several months later). It is found that (1) the test–retest reliability literature is concentrated in four substantive judgment areas (medicine/psychology, meteorology, human resources management, and business), (2) the literature is extremely variable in terms of research approach/design, the determinants or correlates of test–retest reliability that have been studied, and the quality of the execution and analysis, and (3) mean test–retest reliability differs across both substantive judgment areas and the internal consistency versus temporal stability distinction. An inescapable conclusion from the analysis is that our knowledge of this fundamental property of human judgment is quite meager. Therefore, the paper concludes with suggestions about future research that would address test–retest reliability more systematically. Copyright © 2000 John Wiley & Sons, Ltd.

9.
This article explores the psychometric properties of a new scale aimed at quantifying passion, that is, passion related to becoming good at or achieving in some area, theme, or skill. The Passion Scale was designed to be quantitative, simple to administer, suitable for large-group testing, and reliable in monitoring passion. A total of 126 participants between 18 and 47 years of age (mean age = 21.65, SD = 3.45) completed the Passion Scale, enabling us to investigate its feasibility, internal consistency, construct validity, and test-retest reliability. Feasibility: The overall pattern of results suggests that the scale is applicable for the age range studied (18–47). Internal consistency: All individual item scores correlated positively with the total score, with correlations ranging from 0.51 to 0.69. Cronbach's alpha for the standardized items was 0.86. Construct validity: The Pearson correlation coefficient between the Passion Scale total score and the Grit-S scale was 0.39 for adults (mean age = 21.23, SD = 3.45, N = 107). Test-retest reliability: The intraclass correlation coefficient (ICC) between test and retest total scores was 0.92. These promising results warrant further development of the Passion Scale, including norms based on a large, representative sample.

10.
The use of reliability estimates is increasingly scrutinized as scholars become more aware that test–retest stability and self–other agreement approximate the theoretical and practical usefulness of an instrument better than its internal reliability does. In this study, we investigate item characteristics that potentially affect single‐item internal reliability, retest reliability, and self–other agreement. Across two large samples (N = 6690 and N = 4396), two countries (Estonia and The Netherlands), and two personality inventories (the NEO PI‐3 and the HEXACO‐PI‐R), results show that (i) item variance is a strong predictor of self–other agreement and retest reliability but not of single‐item internal reliability; (ii) item variance mediates the relations between evaluativeness and self–other agreement; and (iii) self–other agreement is predicted by observability and item domain. On the whole, only weak relations were observed between item length, negations, and item position (indicating effects of questionnaire length) on the one hand, and single‐item internal reliability, retest reliability, and self–other agreement on the other. To increase the predictive validity of personality scales, our findings suggest that researchers constructing questionnaire items should pay close attention to item variance in particular, and also to evaluativeness and observability. Copyright © 2016 European Association of Personality Psychology

11.
A Review of Research on Integrity Tests in the West   (times cited: 1; self-citations: 0; citations by others: 1)
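Integrity tests are assessment instruments, mostly paper-and-pencil, used in recruitment and selection to evaluate applicants' honesty, integrity, and dependability, and thereby to predict theft, disciplinary violations, counterproductive work behavior, and future job performance. This article reviews the development and current application of integrity tests in the West and concludes that integrity tests have good reliability and validity. Conscientiousness, agreeableness, and emotional stability from the Big Five personality model are the constructs that integrity tests potentially measure, but integrity tests also correlate fairly highly with personality dimensions outside the Big Five model. Integrity tests show good predictive validity for counterproductive work behavior and overall job performance. After summarizing the controversies and problems surrounding integrity tests, the article offers several recommendations for their use in employee recruitment and selection in Chinese companies.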

12.
We examined the factor structure, reliability, and validity of the Chinese version of the Personal Growth Initiative Scale–II (CPGIS–II) using data from a sample of 927 Chinese university students. Consistent with previous findings, confirmatory factor analyses supported a four-factor model of the CPGIS–II. Reliability analyses indicated that the four CPGIS–II subscales, namely Readiness for Change, Planfulness, Using Resources, and Intentional Behavior, demonstrated good internal consistency reliability and adequate test–retest reliability across a 4-week period. In addition, evidence for convergent and incremental validity was found in relation to measures of positive and negative psychological adjustment. Finally, hierarchical regression analyses indicated that the four personal growth initiative dimensions, especially planfulness, accounted for additional unique variance in psychological adjustment beyond resilience. Implications for using the CPGIS–II with Chinese samples are discussed.

13.
Until now, no very short instrument measuring the six personality dimensions of the HEXACO model has been available. In two studies, I report the construction of the Brief HEXACO Inventory (BHI), which represents the 24 HEXACO facets with one item per facet (i.e., four items per domain) and takes approximately 2–3 min to complete. Although its alpha reliability is relatively low, its test–retest stability, self-other agreement, and convergent correlations with full-length scales are relatively high, and its loss of validity is only modest. When correcting for attenuation using a weighted average of alpha reliability, test–retest stability, and self-other agreement, the BHI re-estimates the original construct validity correlations of the HEXACO-PI-R with relatively high accuracy.

14.
The relationship between conscientiousness and job performance has been found to be nonlinear in the West, which challenges, both conceptually and empirically, the traditional assumption of a single linear relationship. In this research, we examined the nonlinear effects of conscientiousness on both overall job performance and performance dimensions (i.e., task performance, adaptive performance, and contextual performance) in the Chinese context. The results of our two studies provided some evidence for a nonlinear effect of conscientiousness on overall job performance. In addition, conscientiousness was found to have different (linear or nonlinear) effects on different performance dimensions. These findings suggest that the nonlinear effects of conscientiousness on job performance deserve further investigation and that distinctions should be made among job performance dimensions in personnel evaluation. Results are discussed in terms of the significance of considering the nonlinear relationship between conscientiousness and performance criteria.

15.
This study examined whether declarative memory in infants can be reliably assessed using the deferred imitation task. Twenty-four 12-month-old infants were given the same deferred imitation task twice within a short period of time (week-to-week assessment). Replicating the results of earlier studies, the second memory test yielded better memory performance at the group level than the first, indicating a memory benefit of the kind typically found in older children and adults. Stability of memory performance level was analysed using two indicators: test–retest correlations assessing the stability of individual memory performance for the whole sample, and corrected test–retest correlations using individual consistency scores. Both test–retest reliability (r = .52, p = .009) and corrected test–retest reliability (r = .62, p = .001) were highly significant, demonstrating that individual memory performance level in infants can be reliably assessed using the deferred imitation task.
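The significance of a test–retest correlation is commonly tested with t = r·√(n − 2) / √(1 − r²) on n − 2 degrees of freedom. A small sketch using the reported r = .52 and n = 24; that the authors used this particular test is an assumption, since the abstract does not name it.

```python
from math import sqrt

def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

N_INFANTS = 24
t = r_to_t(0.52, N_INFANTS)  # r and n from the abstract
df = N_INFANTS - 2
print(round(t, 2), "with df =", df)
```

With df = 22 this gives t ≈ 2.86; the two-sided p-value could then be read from the t distribution, for example as `2 * scipy.stats.t.sf(t, df)`, and it falls just under .01, in line with the reported p = .009.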

16.
Test-retest reliability of the Test of Variables of Attention (T.O.V.A.) was investigated in two studies using two different time intervals: 90 min and 1 week (±2 days). To investigate the 90-min reliability, 31 school-age children (mean age = 10 years, SD = 2.66) were administered the T.O.V.A. and then readministered the test 90 min later. Significant reliability coefficients were obtained for omission (.70), commission (.78), response time (.84), and response time variability (.87). For the second study, a different sample of 33 school-age children (mean age = 10.01 years, SD = 2.59) was administered the test and then readministered it 1 week later. Significant reliability coefficients were obtained for omission (.86), commission (.74), response time (.79), and response time variability (.87). Standard error of measurement statistics were calculated using the obtained coefficients. Commission scores were significantly higher on second trials for each retest interval.
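The standard error of measurement follows from a reliability coefficient as SEM = SD·√(1 − r). A minimal sketch using the 90-min coefficients from the abstract; the SD of 15 is an assumption (a standard-score metric), since the abstract does not report score SDs.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Classical SEM: the score SD scaled by the square root of unreliability."""
    return sd * sqrt(1 - reliability)

SD = 15.0  # assumed standard-score SD; not stated in the abstract

# 90-min test-retest coefficients reported in the abstract.
COEFFICIENTS_90_MIN = {
    "omission": 0.70,
    "commission": 0.78,
    "response time": 0.84,
    "response time variability": 0.87,
}

sems = {name: standard_error_of_measurement(SD, r) for name, r in COEFFICIENTS_90_MIN.items()}
for name, sem in sems.items():
    print(f"{name}: SEM = {sem:.2f}")
```

A rough 95% band around an observed score is then about ±1.96·SEM; for omission under these assumptions that is roughly ±16 points.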

17.
There is considerable debate about the sociocognitive features of autism spectrum conditions (ASC), and a tool for profiling the sociocognitive characteristics of children and adolescents with ASC is needed. The aim of this research was to evaluate the psychometric properties of a new questionnaire, the Australian Scale for Autism Spectrum Conditions (ASASC). Participants were 322 parents of children with autistic disorder (n = 76), Asperger's disorder (n = 188), or pervasive developmental disorder not otherwise specified (n = 21), or of children in a clinical group with subclinical ASC features and no ASC diagnosis (n = 37). Measures included an initial scale measuring eight potential dimensions of ASC, a related screening tool for autism, and two previously validated social skills questionnaires. The questionnaires were administered online. The ASASC was factor‐analysed, internal and test–retest reliabilities (the latter for 84 randomly selected parents) were calculated, and preliminary tests of convergent and divergent validity were conducted. The resulting measure (44 items) contained five coherent and reliable dimensions: understand and express emotion, fact orientation, sensory sensitivity, social communication, and rigidity. The questionnaire had good test–retest reliability and convergent/divergent validity. The ASASC enables profiling of ASC symptomatology, which should be useful in adjusting interventions to individual needs.

18.
As scientists, we must understand not only the power of our research tools to yield results but also their ability to obtain similar results over time. This study investigates how common decisions made during the design and analysis of a functional magnetic resonance imaging (fMRI) study can influence the reliability of the statistical results. To that end, we gathered back-to-back test–retest fMRI data during an experiment involving multiple cognitive tasks (episodic recognition and two-back working memory) and multiple fMRI experimental designs (block, event-related genetic sequence, and event-related m-sequence). Using these data, we investigated the relative influences of task, design, statistical contrast (task vs. rest, target vs. nontarget), and statistical thresholding (unthresholded, thresholded) on fMRI reliability, as measured by the intraclass correlation coefficient (ICC). We also used data from a second study to investigate test–retest reliability after an extended, six-month interval. We found that all of the factors above were statistically significant but had varying levels of influence on the observed ICC values. We also found that these factors could interact, increasing or decreasing the relative reliability of certain Task × Design combinations. The results suggest that fMRI reliability is a complex construct whose value may be increased or decreased by specific combinations of factors.

19.
Peterson, Deary, and Austin (2003) considered the reliability of the Cognitive Styles Analysis (CSA) (Riding, 1991). The CSA seeks to assess an individual’s position on each of two fundamental style dimensions: the Wholist-Analytic and the Verbal-Imagery dimensions. It presents a series of simple cognitive tasks, which subjects may choose to process according to their preferred style. Performance on these test items is measured in terms of response times. The CSA comprises 40 items assessing the Wholist-Analytic dimension and 48 assessing the Verbal-Imagery dimension, and typically takes 15–20 min to complete. It is intended to be suitable for a wide age and ability range and applicable to a variety of contexts and cultures. The most important characteristic of any test of cognitive style is its temporal stability. Studies that attempt to establish test validity without definitive evidence of test reliability lack a basic foundation. Riding has not published any statistical data on the test–retest reliability of the CSA. Peterson et al. (2003) and Peterson (2003) claim to have carried out the primary evaluation of the CSA’s reliability. However, we were the first to publish accurate test–retest reliability data on Riding’s CSA (Redmond, Mullally, & Parkinson, 2002). This brief report addresses the question of who first established the unreliability of the CSA and why Peterson, Deary, and Austin’s claims are misleading and unsubstantiated.

20.
A review of the extant literature and new empirical research suggests that social desirability is not much of a concern in personality and integrity testing for personnel selection. In particular, based on meta-analytically derived evidence, it appears that social desirability influences do not destroy the convergent and discriminant validity of the Big Five dimensions of personality (Emotional Stability, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness). We also present new empirical evidence regarding gender and age differences in socially desirable responding. Although social desirability predicts a number of important work variables such as job satisfaction, organizational commitment, and supervisor ratings of training success, it does not seem to predict overall job performance and is only very weakly related to specific dimensions of job performance such as technical proficiency (r = -.07) and personal discipline (r = .05). Large-sample investigations of the moderating influences of social desirability in actual work settings indicate that social desirability does not moderate the criterion-related validities of personality variables or integrity tests. The criterion-related validity of integrity tests for overall job performance with applicant samples in predictive studies is .41. Controlling for social desirability in integrity or personality test scores leaves the operational validities intact, suggesting that social desirability functions neither as a mediator nor as a suppressor variable in personality–performance relationships.
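"Controlling for" social desirability can be illustrated with a first-order partial correlation, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)), where x is the test score, y is job performance, and z is social desirability. In this sketch only the .41 validity comes from the abstract; the two correlations involving social desirability are hypothetical, and a simple partial correlation is a simplification of the meta-analytic procedures such studies use.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

VALIDITY = 0.41      # integrity test vs. overall job performance (from the abstract)
TEST_WITH_SD = 0.20  # hypothetical test vs. social desirability correlation
PERF_WITH_SD = 0.02  # hypothetical performance vs. social desirability correlation

controlled = partial_correlation(VALIDITY, TEST_WITH_SD, PERF_WITH_SD)
print(round(controlled, 3))
```

Because social desirability barely correlates with performance in this hypothetical setup, the partial validity stays close to .41, the pattern the abstract describes.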


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417