1.
The SCORE (Systemic Clinical Outcome and Routine Evaluation) is a 40‐item questionnaire for completion by family members 12 years and older to assess outcome in systemic therapy. This study aimed to investigate psychometric properties of two short versions of the SCORE and their responsiveness to therapeutic change. Data were collected at 19 centers from 701 families at baseline and from 433 of these 3–5 months later. Results confirmed the three‐factor structure (strengths, difficulties, and communication) of the 15‐ and 28‐item versions of the SCORE. Both instruments had good internal consistency and test–retest reliability. They also showed construct and criterion validity, correlating with measures of parent, child, and family adjustment, and discriminating between clinical and nonclinical cases. Total and factor scales of the SCORE‐15 and ‐28 were responsive to change over 3–5 months of therapy. The SCORE‐15 and SCORE‐28 are brief psychometrically robust family assessment instruments which may be used to evaluate systemic therapy.  相似文献   

2.
The use of reliability estimates is increasingly scrutinized as scholars become more aware that test–retest stability and self–other agreement provide a better approximation of the theoretical and practical usefulness of an instrument than its internal reliability. In this study, we investigate item characteristics that potentially impact single‐item internal reliability, retest reliability, and self–other agreement. Across two large samples (N = 6690 and N = 4396), two countries (Estonia and The Netherlands), and two personality inventories (the NEO PI‐3 and the HEXACO‐PI‐R), results show that (i) item variance is a strong predictor of self–other agreement and retest reliability but not of single‐item internal reliability; (ii) item variance mediates the relations between evaluativeness and self–other agreement; and (iii) self–other agreement is predicted by observability and item domain. On the whole, weak relations between item length, negations, and item position (indicating effects of questionnaire length) on the one hand, and single‐item internal reliability, retest reliability, and self–other agreement on the other, were observed. In order to increase the predictive validity of personality scales, our findings suggest that during the construction of questionnaire items, researchers are advised to pay close attention especially to item variance, but also to evaluativeness and observability. Copyright © 2016 European Association of Personality Psychology  相似文献   

3.
The study examined the effects of gender and item content of domain‐general and domain‐specific creative‐thinking tests on four subscale scores of creative thinking (fluency, flexibility, originality, and elaboration). Chinese tenth‐grade students (234 males and 244 females) participated in the study. Domain‐general creative thinking was measured using two domain‐independent items (box and newspaper). Domain‐specific creative thinking was measured in the domain of history by two history‐specific items (school uniform and health food) that were part of lessons in modern Chinese history. Domain‐general creative‐thinking scores did not differ across gender in any of the four subscales. In domain‐specific creative thinking, female students produced more responses (fluency), more categories of ideas (flexibility), and more detailed answers (elaboration) on both items than did males. No gender difference was found in originality. Item effects were significant in both general and specific creative‐thinking scores, with higher fluency, flexibility, and elaboration for the newspaper than the box item, and higher fluency, flexibility, originality, and elaboration for the school uniform than the health food item. The findings on both gender and item effects support the contention that personal interest and life experience influence the generation of creative solutions. The absence of gender differences in domain‐general creative thinking was expected, as the two general items (box and newspaper) are experienced similarly by both genders. As most creative‐thinking tests are influenced by individuals' experience beyond creative‐thinking ability, the need for judicious evaluation and use of creative‐thinking scores is underscored.

4.
We designed this study to evaluate several data collection and equating designs in the context of item response theory (IRT) equating. The random‐groups design and the common‐item design have been widely used for collecting data for IRT equating. In this study, we investigated four equating methods based upon these two data collection designs, using empirical data from a number of different testing programs. When the randomly equivalent group assumption was reasonably met, the four equating methods tended to produce highly comparable results. On the other hand, equating methods based upon either of the equating designs produced dissimilar results. Sample size can have differential effects on the equating results produced by the different equating methods. In practice, a common‐item equivalent‐groups design often produces unacceptably large differences in the group mean due to various anomalies such as context effects, poor quality of common items, or a very small number of common items. In such cases, a random‐groups design would produce more stable equating results.  相似文献   

5.
The present study investigates the impact of item characteristics on multi‐source performance assessment. Three item characteristics (syntax, double‐barreledness, and behavioral specificity) were linked to the psychometric properties of items used by self, subordinates, peers, and supervisors, as operationalized by the relationship between the item and the performance dimension it is intended to measure. Results show that syntax, a linguistic index that pertains to the length of items, is related to the psychometric properties of all rating sources except subordinates. The implications of this effect for the design of multi‐source assessment instruments are discussed.

6.
It has recently been proposed that pregnant women would perform memory tasks by focusing more on item‐specific processes and less on relational processing, compared to post‐partum women (Mickes, Wixted, Shapiro & Scarff, 2009). The present cross‐sectional study tested this hypothesis by directly manipulating the type of encoding employed in the study phase. Pregnant, post‐partum and control women either rated the pleasantness of word meaning (which induced item‐specific elaboration) or named the semantic category to which they belonged (which induced relational elaboration). Memory for the encoded words was later tested in free recall (which emphasizes relational processing) and in recognition (which emphasizes item‐specific processing). In line with Mickes et al.'s (2009) conclusions, pregnant women in the item‐specific condition performed worse than post‐partum women in the relational condition in free recall, but not in recognition. However, compared to the other two groups, pregnant women also exhibited lower recognition accuracy in the item‐specific condition. Overall, these results confirm that pregnant women rely on relational encoding less than post‐partum women, but additionally suggest that the former group might use item‐specific processes less efficiently than post‐partum and control women.

7.
Test anxiety (TA) is a prevalent issue among students that can result in deleterious consequences, such as underachievement. However, a contemporary measure that has been validated for use with Australian students seems to be lacking. This study, therefore, investigated the suitability of the German Test Anxiety Inventory (TAI‐G) for use with Australian university students. While the original TAI‐G contains 30 items and was designed to measure four factors (worry, emotionality, interference, and lack of confidence), differing factorial models have been supported in the literature using either the original or a shortened 17‐item version of the measure. These differing TAI‐G models were tested and compared in the current study via confirmatory factor analysis using 224 Australian university students. As expected, results supported the superior fit of the 17‐item four‐factor model. Additionally, the convergent validity of the measure was supported since measures of self‐esteem, self‐efficacy, and general anxiety were all found to correlate significantly with the TAI‐G in the hypothesised directions. Finally, the finding that all of the TAI‐G subscales had acceptably high reliabilities led to the conclusion that the 17‐item TAI‐G is a valid and reliable measure of TA in an Australian university population.  相似文献   

8.
Agreement between multiple informants on child personality has received limited attention. Focusing on factor structure, gender differences and the influence of socially desirable responding (SDR), we compared parent and teacher Big Five personality ratings of around 600 7‐year olds. Although parent ratings were more desirable than teacher ratings, differential agreement was generally similar to that found for adults, and especially high for ratings of boys. The more evaluative the personality item, the larger the mean‐level difference between parents and teachers on that item. However, undesirable items showed the highest levels of differential agreement. In parent ratings, the two poles of Agreeableness formed separate factors. To view Pro‐sociality as independent of Antagonism could enable parents to view their child more positively. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

9.
We examined the differential item functioning (DIF) of Rosenberg's (1965) Self‐Esteem Scale (RSES) and compared scores from U.S. participants with those from 7 other countries: Canada, Germany, New Zealand, Kenya, South Africa, Singapore, and Taiwan. Results indicate that DIF was present in all comparisons. Moreover, controlling for latent self‐esteem, participants from individualistic countries had an easier time reporting high self‐esteem on self‐competence‐related items, whereas participants from communal countries had an easier time reporting high self‐esteem on self‐liking items (Tafarodi & Milne, 2002). After adjusting for DIF, we found larger mean self‐esteem differences between the countries than observed scores initially indicated. The suitability of the RSES, and the importance of examining DIF, for cross‐cultural research are discussed.

10.
This study examined the interactions of stimulus type (high‐ vs. low‐tech) and magnitude (duration of access) on preference and reinforcer efficacy. Two preference assessments were conducted to identify highly preferred high‐tech and low‐tech items for each participant. A subsequent assessment examined preference for those items when provided at 30‐s and 600‐s durations. We then evaluated reinforcer efficacy for those same items when provided for a range of durations using progressive‐ratio schedules. Results suggested item type and access duration interacted to influence preference and reinforcer efficacy. Participants preferred high‐tech items at longer durations of access and engaged in more responding when the high‐tech item was provided for long durations, but these patterns were reversed for the low‐tech item. In addition, participants engaged in less responding when the high‐tech item was provided for short durations and when the low‐tech item was provided for long durations.  相似文献   

11.
Parameters of the two‐parameter logistic model are generally estimated via the expectation–maximization (EM) algorithm by the maximum‐likelihood (ML) method. In so doing, it is beneficial to estimate the common prior distribution of the latent ability from data. Full non‐parametric ML (FNPML) estimation allows estimation of the latent distribution with maximum flexibility, as the distribution is modelled non‐parametrically on a number of (freely moving) support points. It is generally assumed that EM estimation of the two‐parameter logistic model is not influenced by initial values, but studies on this topic are unavailable. Therefore, the present study investigates the sensitivity to initial values in FNPML estimation. In contrast to the common assumption, initial values are found to have notable influence: for a standard convergence criterion, item discrimination and difficulty parameter estimates as well as item characteristic curve (ICC) recovery were influenced by initial values. For more stringent criteria, item parameter estimates were mainly influenced by the initial latent distribution, whilst ICC recovery was unaffected. The reason for this might be a flat surface of the log‐likelihood function, which would necessitate setting a sufficiently tight convergence criterion for accurate recovery of item parameters.  相似文献   
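As a rough illustration of the model this abstract refers to, the Python sketch below evaluates the two‐parameter logistic item characteristic curve and the marginal log‐likelihood when the latent ability distribution is represented by probability weights on a set of support points. The function names, item parameters, support points, and weights are all assumptions made for illustration; the abstract does not describe the authors' implementation, and the EM updates themselves are not shown.

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta) for discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def marginal_loglik(responses, items, support, weights):
    """Marginal log-likelihood under a discrete latent ability distribution.

    responses: list of 0/1 response vectors, one per person
    items:     list of (a, b) pairs, one per item (hypothetical values)
    support:   ability values that carry probability mass
    weights:   probability mass at each support point (should sum to 1)
    """
    total = 0.0
    for x in responses:
        person_lik = 0.0
        for theta, w in zip(support, weights):
            lik = 1.0
            for xi, (a, b) in zip(x, items):
                p = icc_2pl(theta, a, b)
                lik *= p if xi == 1 else 1.0 - p
            person_lik += w * lik  # average the likelihood over the latent distribution
        total += math.log(person_lik)
    return total

# Hypothetical two-item test with three support points
items = [(1.0, -0.5), (1.5, 0.5)]
support = [-1.0, 0.0, 1.0]
weights = [0.25, 0.5, 0.25]
print(marginal_loglik([[1, 0], [1, 1], [0, 0]], items, support, weights))
```

An EM routine would iterate to increase a quantity of this kind. Given the sensitivity the abstract reports, it seems prudent to compare runs from several starting values under a tight convergence tolerance.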

12.
What happens when people try to forget something? What are the consequences of instructing people to intentionally forget a sentence? Recent studies employing the item‐method directed forgetting paradigm have shown that to‐be‐forgotten (TBF) items are, in a subsequent task, emotionally devaluated relative to to‐be‐remembered (TBR) items, an aftereffect of memory selection (Vivas, Marful, Panagiotidou & Bajo, 2016). As such, distractor devaluation by attentional selection generalizes to memory selection. In this study, we use the item‐method directed forgetting paradigm to test the effects of memory selection and inhibition on truth judgments of ambiguous sentences. We expected the relative standing of an item in the task (i.e., whether it was instructed to be remembered or forgotten) to affect the truthfulness value of that item, making TBF items less valid/truthful than TBR items. As predicted, ambiguous sentences associated with a “Forget” cue were subsequently judged as less true than sentences associated with a “Remember” cue, suggesting that instructions to intentionally forget a statement can produce changes in the validity/truthfulness of that statement. To our knowledge, this is the first study to show an influence of memory processes involved in selection and forgetting on the perceived truthfulness of sentences.  相似文献   

13.
This article describes the development, in an Irish context, of a three‐factor, twenty‐eight‐item version of the Systemic Clinical Outcome and Routine Evaluation (SCORE) questionnaire for assessing progress in family therapy. The forty‐item version of the SCORE was administered to over 700 Irish participants, including non‐clinical adolescents and young adults, families attending family therapy, and parents of young people with physical and intellectual disabilities and cystic fibrosis. For validation purposes, data were also collected using brief measures of family and personal adjustment. A twenty‐eight‐item version of the SCORE (the SCORE‐28) containing three factor scales that assess family strengths, difficulties and communication was identified through exploratory principal components analysis. Confirmatory factor analysis showed that the factor structure of the SCORE‐28 was stable. The SCORE‐28 and its three factor scales were shown to have excellent internal consistency reliability, satisfactory test‐retest reliability and construct validity. The SCORE‐28 scales correlated highly with the General Functioning Scale of the Family Assessment Device, and moderately with the Global Assessment of Relational Functioning Scale, the Kansas Marital and Parenting Satisfaction Scales, the Satisfaction with Life Scale, the Mental Health Inventory – 5, and the total problems scale of the Strengths and Difficulties Questionnaire. Correlational analyses also showed that the SCORE‐28 scales were not strongly associated with demographic characteristics or social desirability response set. The SCORE‐28 may routinely be administered to literate family members aged over 12 years before and after family therapy to evaluate therapy outcome.

14.
Few group psychotherapy studies focus on therapists' interventions, and instruments that can measure group psychotherapy treatment fidelity are scarce. The aim of the present study was to evaluate the reliability of the Mentalization‐based Group Therapy Adherence and Quality Scale (MBT‐G‐AQS), a 19‐item scale developed to measure adherence and quality in mentalization‐based group therapy (MBT‐G). Eight MBT groups and eight psychodynamic groups (a total of 16 videotaped therapy sessions) were rated independently by five raters. All groups were long‐term, outpatient psychotherapy groups with weekly 1.5‐hour sessions. Data were analysed in a generalizability study (G‐study and D‐study). The generalizability models included analyses of reliability for different numbers of raters. The global (overall) ratings for adherence and quality showed high to excellent reliability for all numbers of raters (with five raters, reliability was 0.97 for adherence and 0.96 for quality). The mean reliability across all 19 items for a single rater was 0.57 (item range 0.26–0.86) for adherence and 0.62 (item range 0.26–0.83) for quality. For two raters, the mean absolute G‐coefficient was 0.71 (item range 0.41–0.92) for adherence and 0.76 (item range 0.42–0.91) for quality. With all five raters, the mean absolute G‐coefficient was 0.86 (item range 0.63–0.97) for adherence and 0.88 (item range 0.64–0.96) for quality. The study demonstrates high reliability of MBT‐G‐AQS ratings. In models differentiating between numbers of raters, reliability was particularly high when several raters were included, but was also acceptable for two raters. For practical purposes, the MBT‐G‐AQS can be used for training, supervision and psychotherapy research.
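To make the D‐study idea concrete, the Python sketch below projects reliability for different numbers of raters from a single‐rater coefficient. It assumes a simplified one‐facet design (rated sessions crossed with raters), in which averaging over n raters divides the error variance by n. The study's actual design and variance components are not given in the abstract, so the function name and this simplification are illustrative assumptions only.

```python
def project_reliability(single_rater_coef, n_raters):
    """Project a one-facet G-coefficient from 1 rater to n raters.

    With coefficient c = var_true / (var_true + var_error) for one rater,
    averaging over n raters gives n*c / (1 + (n - 1)*c).
    """
    if not 0.0 < single_rater_coef < 1.0:
        raise ValueError("single_rater_coef must lie strictly between 0 and 1")
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    c = single_rater_coef
    return n_raters * c / (1.0 + (n_raters - 1) * c)

# Mean single-rater adherence coefficient reported in the abstract
for n in (1, 2, 5):
    print(n, round(project_reliability(0.57, n), 2))
```

Starting from 0.57, this projection gives roughly 0.73 for two raters and 0.87 for five, close to the reported means of 0.71 and 0.86. Exact agreement is not expected, since the reported values are averages over 19 items.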

15.
Visual working memory (VWM) is a central bottleneck in human information processing. Its capacity is most often measured in terms of how many individual‐item representations VWM can hold (k). In the standard task employed to estimate k, an array of highly discriminable colour patches is maintained and, after a short retention interval, compared to a test display (change detection). Recent research has shown that with more complex, structured displays, change‐detection performance is supported not only by individual‐item representations but also by ensemble representations formed as a result of spatial subgroupings. Here, by asking participants to additionally localize the change, we find evidence for an influence of ensemble representations even in the very simple, unstructured displays of the colour‐patch change‐detection task. Critically, the pure‐item models from which standard formulae of k are derived do not consider ensemble representations and, therefore, potentially overestimate k. To gauge this overestimation, we develop an item‐plus‐ensemble model of change detection and change localization. Estimates of k from this new model are about 1 item (~30%) lower than the estimates from traditional pure‐item models, even when derived from the same data sets.
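For context, one widely used pure‐item estimate of k in single‐probe change detection is Cowan's K: set size multiplied by the difference between hit rate and false‐alarm rate. The sketch below illustrates that formula only. It is an assumption that this is among the "standard formulae" the abstract refers to, the example rates are made up, and the item‐plus‐ensemble model developed in the study is not reproduced here.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K for single-probe change detection: K = N * (H - FA)."""
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= false_alarm_rate <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return set_size * (hit_rate - false_alarm_rate)

# Hypothetical data: 6 colour patches, 70% hits, 20% false alarms
print(cowan_k(6, 0.70, 0.20))
```

Because the formula attributes every correct detection to stored individual items, any contribution from ensemble representations would inflate the estimate, which is the concern the abstract raises.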

16.
Personality development research heavily relies on the comparison of scale means across age. This approach implicitly assumes that the scales are strictly measurement invariant across age. We questioned this assumption by examining whether appropriate personality indicators change over the lifespan. Moreover, we identified which types of items (e.g. dispositions, behaviours, and interests) are particularly prone to age effects. We reanalyzed the German Revised NEO Personality Inventory normative sample (N = 11,724) and applied a genetic algorithm to select short scales that yield acceptable model fit and reliability across locally weighted samples ranging from 16 to 66 years of age. We then examined how the item selection changes across age points and item types. Emotion‐type items seemed to be interchangeable and generally applicable to people of all ages. Specific interests, attitudes, and social effect items—most prevalent within the domains of Extraversion, Agreeableness, and Openness—seemed to be more prone to measurement variations over age. A large proportion of items were systematically discarded by the item‐selection procedure, indicating that, independent of age, many items are problematic measures of the underlying traits. The implications for personality assessment and personality development research are discussed. © 2019 European Association of Personality Psychology  相似文献   

17.
Research on sense of community (SOC) has traditionally been approached from a resource perspective. Recently, however, research on the experience of SOC has evolved to include a related but distinct construct of sense of community responsibility (SOC‐R), or feelings of accountability for the well‐being of a community. This study applied item response theory to examine the psychometric properties of a SOC‐R scale used in an evaluation of community‐based substance abuse prevention coalitions. Data were collected in 2017 from coalition members (analytic sample = 309) in the northeastern United States. Findings indicate that the scale was reliable, unidimensional, and functioned well, particularly at low and moderate levels of the construct. The addition of two items intended to capture higher levels of the construct improved the scale's functioning at higher levels of SOC‐R. The adapted SOC‐R scale was also shown to have moderately strong relationships with conceptually relevant variables, including SOC, coalition participation, number of roles performed in the coalition, and engagement in community action activities. These findings provide empirical evidence to support the reliability and validity of the SOC‐R scale, and have critical implications for our conceptualization of the SOC construct, its measurement, and for the evaluation of community‐based prevention interventions.  相似文献   

18.
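This study examined the performance of four item selection indices in multidimensional computerized adaptive testing (MCAT), with and without exposure control. The four indices were Bayesian D‐optimality, the posterior expected Kullback–Leibler method (KLP), the minimized error variance of the linear combination score with equal weight (V1), and the minimized error variance of the composite score with optimized weight (V2). The Restrictive Threshold (RT) and Restrictive Progressive (RPG) methods, developed for item exposure control in cognitive diagnostic CAT, and the Maximum Priority Index (MPI) method from unidimensional CAT were extended to MCAT. The simulation study showed that: (1) KLP, D‐optimality, and V1 estimated domain scores accurately and recovered ability better than V2; (2) although V1 and V2 improved item pool utilization relative to KLP and D‐optimality, all four selection indices produced uneven item exposure rate distributions; (3) all three exposure control strategies greatly improved the uniformity of item exposure without noticeably reducing measurement precision; and (4) MPI and RPG performed similarly in exposure control, and both outperformed RT.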

19.
The purpose of this study was to validate the 7‐item long‐term mating orientation scale (LTMO) as translated into Japanese. Two samples of Japanese adults (N = 2000; 50.0% male; Mage = 40.9 for the first survey; N = 300; 54.7% male; Mage = 42.4 for the second survey) completed a web‐based questionnaire, including the Japanese translation of the LTMO. The results showed that the psychometric properties of the Japanese LTMO scale were comparable to those of the original English version. The scale had adequate reliability based on Cronbach's α and McDonald's ω. Convergent validity was demonstrated by the correlation between the LTMO scores and related variables: human life history strategies, short‐term sociosexual orientation, attitude to infidelity, romantic attachment style, and so on. The translated scale provides a valid and reliable instrument in Japanese that measures human mating strategy.  相似文献   
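As a reminder of what the reliability index named here computes, the following Python sketch calculates Cronbach's α from a respondents‐by‐items score matrix using the usual formula based on item variances and the variance of the total score. The function name and the toy data are hypothetical; none of the numbers come from the study, and McDonald's ω (which requires a fitted factor model) is not covered.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)

# Made-up responses: 4 respondents, 2 items
toy = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(cronbach_alpha(toy))
```

The function assumes the total score varies across respondents; if every respondent has the same total, the division fails.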

20.
The Positive and Negative Affect Schedule (PANAS) is a popular measure of positive affectivity (PA) and negative affectivity (NA). Developed and validated in Western contexts, the 20‐item scale has frequently been administered to respondents from Asian countries on the assumption of cross‐cultural measurement invariance. We examine this assumption via a rigorous multigroup confirmatory factor analysis, which allows us to assess between‐group differences in both the strength of the scale item‐to‐latent factor relationship (metric invariance test) and the mean of each scale item (scalar invariance test), on a large sample of 1,065 respondents recruited from Singapore (Asian sample) and the United States (Western sample). We found that two items assessing PA (“excited” and “proud”) and three items assessing NA (“guilty,” “hostile,” and “ashamed”) exhibited metric noninvariance, whereas 11 of the remaining metric‐invariant items exhibited scalar noninvariance, suggesting that the PA and NA constructs differ from what the PANAS is expected to measure for Asian respondents. Our findings serve as a cautionary note to researchers who intend to administer the PANAS in future studies as well as to researchers interpreting the results of past studies involving respondents from Asian countries.
