
1.

Validity Stability Across Entering College Cohorts: Exploring the temporal generalizability of local validity estimates

Christopher R. Huber Nathan R. Kuncel Paul R. Sackett Adam S. Beatty 《International Journal of Selection & Assessment》2015,23(3):237-246

Local validity studies rely on the assumption that validity estimates from one incumbent sample approximate validity for future applicant pools. We test this assumption using SAT scores and high school grades as predictors of first year college grade point average across multiple college applicant pools for over 100 schools. We present evidence for substantial absolute and rank order consistency in validity estimates. However, this consistency is far less than perfect, resulting in potentially meaningful utility differences over time. In addition, observed fluctuations are not fully explained by sampling error alone.

2.

The First Twenty Years of the Will and the Ways: An Examination of Score Reliability Distribution on Snyder’s Dispositional Hope Scale

Chan M. Hellman Megan K. Pittman Ricky T. Munoz 《Journal of Happiness Studies》2013,14(3):723-729

C. R. Snyder has established hope theory as an important contributor to positive psychology. As the empirical evidence continues to grow, hope researchers need to have confidence that their measures will produce reliable scores. This study presents a reliability generalization on both the internal consistency and test–retest reliability estimates from Snyder’s dispositional hope scale. Although over 300 published works were found to have cited the target article, only 74 reported internal consistency scores and 17 reported scores for test–retest reliability. The results of the reliability generalization suggest support for the score reliabilities produced by the dispositional hope scale. However, internal consistency was higher for studies using the eight-item response format (α = 0.82) compared to those using the four-item response format (α = 0.77). Additionally, the test–retest score reliability was high (0.80), with no statistically significant differences by response format. Findings also demonstrated that score reliability estimates were not significantly influenced by the coded sample characteristics.

3.

A Reliability Generalization Meta-Analysis of Self-Report Measures of Adult Attachment

James M. Graham Marta S. Unterschute 《Journal of personality assessment》2015,97(1):31-41

This study is a reliability generalization meta-analysis that reviews 5 of the most frequently used continuous measures of adult attachment security: the Adult Attachment Scale, Revised Adult Attachment Scale, Adult Attachment Questionnaire, Experiences in Close Relationships, and Experiences in Close Relationships–Revised. A total of 313,462 individuals from 564 studies provided 1,629 internal consistency reliability estimates for this meta-analysis. We present the average internal consistency reliability of scores for each measure and test the consistency of score reliabilities across a wide variety of sample characteristics. In light of this, we highlight several issues in the measurement of adult attachment security and make concrete recommendations for researchers seeking to measure adult attachment.

4.

Item Difficulty, Discrimination, and the Confidence-Frequency Effect in a Categorical Judgment Task

《Organizational behavior and human decision processes》1995,61(2):148-167

Three experiments are presented to examine the role of discrimination abilities in the relationship between confidence and performance across items that vary in difficulty. The studies also test the confidence-frequency effect, predicted in the theory of probabilistic mental models (Gigerenzer, Hoffrage, & Kleinbolting, 1991), by investigating the relationship between performance estimates provided as confidence judgments and as estimates of the frequency of correct responding. The categorical judgment task involved predicting whether a handwriting sample, generated using naturalistic sampling, had been written by a female or male. The results suggest that, when outcome variability is taken into account, discrimination ability differs drastically between easy, medium, and hard items. Discriminability is greatest for easy items and decreases until it disappears for hard items. There was also evidence of a confidence-frequency effect, with mean confidence judgments showing a slight tendency toward overconfidence and frequency estimates showing a slight tendency to underestimate performance.

5.

Development and validation of the Male Body Talk Scale: A psychometric investigation

《Body image》2014,11(3):233-244

This paper details the development of the Male Body Talk (MBT) scale and five studies supporting the psychometric soundness of scores on this new measure. Participants were 18–65-year-old men recruited via Amazon's Mechanical Turk, introductory psychology courses, and snowball sampling. The MBT scale assesses the frequency with which men engage in negatively valenced body-related conversations with others. Two subscales were identified through a combination of exploratory and confirmatory factor analysis. The Muscle Talk subscale assesses men's tendency to express concerns regarding degree of muscularity and being too small. The Fat Talk subscale assesses men's tendency to express concerns regarding level of body fat and being overweight. Scores on the MBT scale demonstrated strong internal consistency and moderate test–retest reliability. Evidence of convergent, discriminant, and incremental validity of scores on the MBT scale is presented. This new measure is a useful tool for examining how often men engage in negative body talk.

6.

Assessing individual differences: effects of responding to prior questionnaires on the substantive and psychometric properties of self-esteem and depression assessments

G H Brody Z Stoneman M Millar J K McCoy 《Journal of personality assessment》1990,54(1-2):401-411

It has become a common practice among psychological researchers to administer batteries of individual difference assessments to research participants, although little is known about whether the substantive and psychometric integrity of the questionnaires is maintained when they are administered after the subject has completed other instruments. The studies presented here consider these issues in relation to the assessment of self-esteem and depression. In the first study, college students responded to a self-esteem inventory (a) by itself (control group), (b) after one prior questionnaire, (c) after three prior questionnaires, or (d) after five prior questionnaires. Results indicated that filling out one or more questionnaires before an assessment of self-esteem resulted in reports of lower self-esteem relative to the control condition. Additional analyses revealed that filling out three or five prior questionnaires created lower reliabilities of subscale scores and lower estimates of concurrent validity between self-esteem and depression. When the effect of prior questionnaires on the General Self-Esteem subscale was examined, the aforementioned results were replicated, and the prior questionnaire treatment created heterogeneous variances across the experimental groups. The second study was designed as a replication of the first study, using an assessment of depression as the target questionnaire. These results revealed that reports of depressive symptomatology increased as the number of prior questionnaires increased. Again, the prior questionnaire treatment created heterogeneity of variance between the groups, but did not adversely affect its internal consistency.

7.

Statistical inferences about the error variance

Walter Kristof 《Psychometrika》1963,28(2):129-143

This paper is a presentation of an essential part of the sampling theory of the error variance and the standard error of measurement. An experimental assumption is that several equivalent tests with equal variances are available. These may be either final forms of the same test or obtained by dividing one test into several parts. The simple model of independent and normally distributed errors of measurement with zero mean is employed. No assumption is made about the form of the distributions of true and observed scores. This implies unrestricted freedom in defining the population. First, maximum-likelihood estimators of the error variance and the standard error of measurement are obtained, their sampling distributions given, and their properties investigated. Then unbiased estimators are defined and their distributions derived. The accuracy of estimation is given special consideration from various points of view. Next, rigorous statistical tests are developed to test hypotheses about error variances on the basis of one and two samples. Also the construction of confidence intervals is treated. Finally, Bartlett's test of homogeneity of variances is used to provide a multi-sample test of equality of error variances.

8.

Study samples are too small to produce sufficiently precise reliability coefficients

Charter RA 《The Journal of general psychology》2003,130(2):117-129

In a survey of journal articles, test manuals, and test critique books, the author found that a mean sample size (N) of 260 participants had been used for reliability studies on 742 tests. The distribution was skewed because the median sample size for the total sample was only 90. The median sample sizes for the internal consistency, retest, and interjudge reliabilities were 182, 64, and 36, respectively. The author presented sample size statistics for the various internal consistency methods and types of tests. In general, the author found that the sample sizes that were used in the internal consistency studies were too small to produce sufficiently precise reliability coefficients, which in turn could cause imprecise estimates of examinee true-score confidence intervals. The results also suggest that larger sample sizes have been used in the last decade compared with those that were used in earlier decades.
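To make the link between sample size and precision concrete, the following minimal sketch computes an approximate confidence interval for a correlation-type reliability coefficient (for example, a retest correlation) using the Fisher z transformation. This is a standard textbook approximation, not necessarily the method used in the surveyed article; the function name `fisher_ci` and the illustrative coefficient of 0.80 are assumptions introduced here, while the sample sizes 36, 64, and 182 are the medians quoted in the abstract.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation r based on n cases.

    Uses the Fisher z transformation, whose standard error is 1/sqrt(n - 3).
    Hypothetical helper for illustration only.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


if __name__ == "__main__":
    # Assumed coefficient of 0.80 at the three median sample sizes.
    for n in (36, 64, 182):
        lower, upper = fisher_ci(0.80, n)
        print(f"N={n}: {lower:.3f} to {upper:.3f}")
```

The interval narrows as N grows, which is the sense in which small reliability samples yield imprecise coefficients.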

9.

Psychometric Properties of Loevinger's Sentence Completion Test in an Adult Psychiatric Outpatient Sample

《Journal of personality assessment》2013,95(3):478-486

Little research has been conducted on Loevinger's Washington University Sentence Completion Test of Ego Development in adult psychiatric outpatients. The measure is a promising method of assessing a construct of personality and character functioning that should be useful in research on psychopathology and in choosing treatment modalities. The data presented in this study address the question of the psychometric adequacy of the measure in this segment of the subject population. Specifically, estimates of interrater reliability, internal consistency, and test-retest reliability are presented for a sample of 42 adult outpatients. In addition, the relationship between total protocol ratings and item sum scores is explored.

10.

Situational effects in trait assessment: The FPI, NEOFFI, and EPI questionnaires

Renate Deinzer Rolf Steyer Michael Eid Peter Notz Peter Schwenkmezger Fritz Ostendorf Aljoscha Neubauer 《European Journal of Personality》1995,9(1):1-23

While most researchers now agree that situations may have an effect in the assessment of traits, the consequences have so far been neglected: if situations affect the assessment of traits, we have to take this fact into account in studies on the reliability and validity of measurement instruments and their application. In the theoretical part of this article we provide a more formal exposition of this point, introducing the basic concepts of latent state–trait (LST) theory. LST theory and the associated models allow for the estimation of the situational impact on trait measures in non-experimental, correlational studies. In the empirical part, LST theory is applied to three well known trait questionnaires: the Freiburg Personality Inventory, the NEO Five-Factor Inventory and the Eysenck Personality Inventory. It is shown that significant proportions of the variances of the scales of these questionnaires are due to situational effects. The following consequences of this finding are discussed. (i) Instead of the reliability coefficient, the proportion of variance due to the latent trait, the consistency coefficient, should be used for the estimation of confidence intervals for trait scores. (ii) To reduce the situational effects on trait estimates it may be useful to base such an estimate on several occasions, i.e., to aggregate data across occasions. (iii) Reliability and validity studies should not only be based on a sample of persons representative of those to whom the test will be applied; they should also be conducted in situational contexts representative of the intended applications.

11.

Consistency and strength of grapheme-color associations are separable aspects of synesthetic experience

《Consciousness and cognition》2021

Consistency of synesthetic associations over time is a widely used test of synesthesia. Since many studies suggest that consistency is not a completely reliable feature, we compared the consistency and strength of synesthetes’ grapheme-color associations. Consistency was measured by scores on the Synesthesia Battery and by the Euclidean distance in color space for the specific graphemes tested for each participant. Strength was measured by congruency magnitudes on the Implicit Association Test. The strength of associations was substantially greater for synesthetes than non-synesthetes, suggesting that this is a novel, objective marker of synesthesia. Although, intuitively, strong associations should also be consistent, consistency and strength were uncorrelated, indicating that they are likely independent, at least for grapheme-color synesthesia. These findings have implications for our understanding of synesthesia and for estimates of its prevalence since synesthetes who experience strong, but inconsistent, associations may not be identified by tests that focus solely on consistency.

12.

Broader Autism Phenotype in Parents of Children with Autism: A Systematic Review of Percentage Estimates

Eric Rubenstein Devika Chawla 《Journal of child and family studies》2018,27(6):1705-1720

The broader autism phenotype (BAP) is a collection of sub-diagnostic autistic traits more common in families of individuals with autism spectrum disorder (ASD) than in the general population. BAP is a latent construct that can be defined using different domains, measured using multiple instruments, and reported using different techniques. Therefore, estimates of BAP may vary greatly across studies. Our objective was to systematically review studies that reported occurrence of BAP in parents of children with ASD in order to quantify and describe heterogeneity in estimates. We systematically searched PubMed and Scopus using PRISMA guidelines for studies quantifying percentage of parents of children with ASD who had BAP. We identified 41 studies that measured BAP in parents of children with ASD. These studies used eight different instruments, four different forms of data collection, and had a wide range of sample sizes (N = 4 to N = 3299). Percentage with BAP ranged from 2.6% to 80%. BAP was more prevalent in fathers than mothers. Parental BAP may be an important tool for parsing heterogeneity in ASD etiology and for developing parent-mediated ASD interventions. However, the variety in measurement instruments and variability in study samples limits our ability to synthesize estimates. To improve measurement of BAP and increase consistency across studies, universal methods should be accepted and adopted across studies. A more consistent approach to BAP measurement may enable efficient etiologic research that can be meta-analyzed and may allow for a larger evidence base that can be used to account for BAP when developing parent-mediated interventions.

13.

Assessing Individual Differences: Effects of Responding to Prior Questionnaires on the Substantive and Psychometric Properties of Self-Esteem and Depression Assessments

Gene H. Brody Zolinda Stoneman Murray Millar J. Kelly McCoy 《Journal of personality assessment》2013,95(1-2):401-411

It has become a common practice among psychological researchers to administer batteries of individual difference assessments to research participants, although little is known about whether the substantive and psychometric integrity of the questionnaires is maintained when they are administered after the subject has completed other instruments. The studies presented here consider these issues in relation to the assessment of self-esteem and depression. In the first study, college students responded to a self-esteem inventory (a) by itself (control group), (b) after one prior questionnaire, (c) after three prior questionnaires, or (d) after five prior questionnaires. Results indicated that filling out one or more questionnaires before an assessment of self-esteem resulted in reports of lower self-esteem relative to the control condition. Additional analyses revealed that filling out three or five prior questionnaires created lower reliabilities of subscale scores and lower estimates of concurrent validity between self-esteem and depression. When the effect of prior questionnaires on the General Self-Esteem subscale was examined, the aforementioned results were replicated, and the prior questionnaire treatment created heterogeneous variances across the experimental groups. The second study was designed as a replication of the first study, using an assessment of depression as the target questionnaire. These results revealed that reports of depressive symptomatology increased as the number of prior questionnaires increased. Again, the prior questionnaire treatment created heterogeneity of variance between the groups, but did not adversely affect its internal consistency.

14.

Use of internal consistency coefficients for estimating reliability of experimental task scores

Samuel B. Green Yanyun Yang Mary Alt Shara Brinkley Shelley Gray Tiffany Hogan Nelson Cowan 《Psychonomic bulletin & review》2016,23(3):750-763

Reliabilities of scores for experimental tasks are likely to differ from one study to another to the extent that the task stimuli change, the number of trials varies, the type of individuals taking the task changes, the administration conditions are altered, or the focal task variable differs. Given that reliabilities vary as a function of the design of these tasks and the characteristics of the individuals taking them, making inferences about the reliability of scores in an ongoing study based on reliability estimates from prior studies is precarious. Thus, it would be advantageous to estimate reliability based on data from the ongoing study. We argue that internal consistency estimates of reliability are underutilized for experimental task data and in many applications could provide this information using a single administration of a task. We discuss different methods for computing internal consistency estimates with a generalized coefficient alpha and the conditions under which these estimates are accurate. We illustrate use of these coefficients using data for three different tasks.
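As a rough illustration of estimating reliability from a single administration, the sketch below computes the ordinary coefficient alpha from a participants-by-items (or participants-by-trials) score table. It is the classical formula, not the generalized coefficient alpha the abstract refers to; the function name `cronbach_alpha` and the toy data are assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Classical coefficient alpha.

    scores: list of rows, one row per participant, each row holding that
    participant's scores on the same k items or trials.
    """
    if len(scores) < 2:
        raise ValueError("need at least two participants")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every participant needs the same number of items")
    # Sample variance of each item (column) and of the total scores.
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: four participants, two items.
    toy = [[1, 2], [2, 1], [3, 3], [4, 5]]
    print(round(cronbach_alpha(toy), 3))
```

Computing the coefficient on the data from the ongoing study, rather than citing a value from earlier work, is the practice the abstract argues for.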

15.

Reliability of a sentence completion measure of ego development.

C Redmore K Waldman 《Journal of personality assessment》1975,39(3):236-243

Studied the reliability of the Washington University Sentence Completion Test by giving 51 9th graders and 26 college students the test twice, a week apart. For 9th graders the design included a test-retest group and two groups given half of the test at each session. Although test-retest correlations were high for the 9th graders, retest scores dropped significantly. With college students (a) test-retest correlations, though positive and significant, were lower, (b) retest scores did not change systematically, and (c) percentage agreement between test and retest scores was high. Discrepant results were related to motivational set and variance in test scores. Split-half correlations and internal consistency coefficients were high. Likelihood of lower retest scores makes problematic the use of this test for short term pretest-posttest studies seeking to stimulate ego development.

16.

Heterogeneous factor analysis models: A bayesian approach

Asim Ansari Kamel Jedidi Laurette Dube 《Psychometrika》2002,67(1):49-77

Multilevel factor analysis models are widely used in the social sciences to account for heterogeneity in mean structures. In this paper we extend previous work on multilevel models to account for general forms of heterogeneity in confirmatory factor analysis models. We specify various models of mean and covariance heterogeneity in confirmatory factor analysis and develop Markov Chain Monte Carlo (MCMC) procedures to perform Bayesian inference, model checking, and model comparison. We test our methodology using synthetic data and data from a consumption emotion study. The results from synthetic data show that our Bayesian model performs well in recovering the true parameters and selecting the appropriate model. More importantly, the results clearly illustrate the consequences of ignoring heterogeneity. Specifically, we find that ignoring heterogeneity can lead to sign reversals of the factor covariances, inflation of factor variances and underappreciation of uncertainty in parameter estimates. The results from the emotion study show that subjects vary both in means and covariances. Thus traditional psychometric methods cannot fully capture the heterogeneity in our data.

17.

Using reliability generalization methods to explore measurement error: an illustration using the MMPI-2 PSY-5 scales

Rouse SV 《Journal of personality assessment》2007,88(3):264-275

Reliability generalization (RG) is a meta-analytic technique that allows for the systematic examination of variation in score reliability for different samples of test takers; this procedure is based on the recognition that reliability is not a stable property of a test but is sample dependent. As a demonstration of an RG analysis, I obtained 63 reliability coefficients for each of the MMPI-2 (Butcher et al., 2001) Personality Psychopathology 5 (Harkness, McNulty, & Ben-Porath, 1995) scales. The overall variability of alpha coefficients supports the argument that reliability is sample dependent and underscores the need for researchers to calculate reliability estimates based on their research samples rather than simply citing published alpha coefficients as evidence of score reliability. I observed statistically significant mean reliability differences for scores across the 5 scales, with the highest level of reliability observed for scores on the measure of Negative Emotionality and the lowest levels of reliability observed for scores on the measures of Aggression and Disconstraint. There was no evidence that the sex-composition of a sample was systematically related to score reliability, and there were no statistically significant differences in reliability between scores obtained with the English version of the test and those obtained with translated forms. However, reliability was consistently lower for scores on some scales when the data were obtained in nonclinical settings as opposed to clinical ones. Sample size was not significantly correlated with reliability estimates. RG methods have the potential for deepening the level of understanding about the role of reliability in the evaluation and use of personality tests.

18.

Bilingual computerized speech-recognition screening for clinical depression: Evaluating a cellular telephone prototype

Gerardo M. González Craig R. Costello Mario Valenzuela Beverly Chaidez Arcela Nuñez-alvarez 《Behavior research methods》1995,27(4):476-482

This exploratory field study evaluated a bilingual computerized speech-recognition cellular telephone prototype of the Center for Epidemiological Studies—Depression scale (CES-D). Thirty Spanish and 22 English speakers completed both computer-telephone and face-to-face CES-D methods and an oral depression checklist in counterbalanced order. Both language groups reported high positive ratings for the computer-telephone method, with the English sample preferring the computer-telephone over the face-to-face method. In both samples, the computer-telephone method yielded high internal consistency estimates, strong alternate form reliabilities, and similar high correlations to the depression checklist. Both groups reported significantly elevated scores with the computer-telephone method, but total score variances for both methods did not differ. Computer-telephone limitations included occasional misrecognitions and template training constraints.

19.

Application of the bootstrap methods in factor analysis

Masanori Ichikawa Sadanori Konishi 《Psychometrika》1995,60(1):77-93

A Monte Carlo experiment is conducted to investigate the performance of the bootstrap methods in normal theory maximum likelihood factor analysis both when the distributional assumption is satisfied and when it is not. The parameters and their functions of interest include unrotated loadings, analytically rotated loadings, and unique variances. The results reveal that (a) bootstrap bias estimation sometimes performs poorly for factor loadings and nonstandardized unique variances; (b) bootstrap variance estimation performs well even when the distributional assumption is violated; (c) bootstrap confidence intervals based on the Studentized statistics are recommended; (d) if a structural hypothesis about the population covariance matrix is taken into account, then the bootstrap distribution of the normal theory likelihood ratio test statistic is close to the corresponding sampling distribution, with a slightly heavier right tail. This study was carried out in part under the ISM cooperative research program (91-ISM · CRP-85, 92-ISM · CRP-102). The authors would like to thank the editor and three reviewers for their helpful comments and suggestions which improved the quality of this paper considerably.

20.

Internal Consistency and Power When Comparing Total Scores from Two Groups

Kimberly A. Barchard Vincent Brouwers 《Multivariate behavioral research》2016,51(4):482-494

Researchers now know that when theoretical reliability increases, power can increase, decrease, or stay the same. However, no analytic research has examined the relationship of power to the most commonly used type of reliability—internal consistency—and the most commonly used measures of internal consistency, coefficient alpha and ICC(A,k). We examine the relationship between the power of independent samples t tests and internal consistency. We explicate the mathematical model upon which researchers usually calculate internal consistency, one in which total scores are calculated as the sum of observed scores on K measures. Using this model, we derive a new formula for effect size to show that power and internal consistency are influenced by many of the same parameters but not always in the same direction. Changing an experiment in one way (e.g., lengthening the measure) is likely to influence multiple parameters simultaneously; thus, there are no simple relationships between such changes and internal consistency or power. If researchers revise measures to increase internal consistency, this might not increase power. To increase power, researchers should increase sample size, select measures that assess areas where group differences are largest, and use more powerful statistical procedures (e.g., ANCOVA).