Multiple baseline designs, both concurrent and nonconcurrent, are the predominant experimental design in modern applied behavior analytic research and are increasingly employed in other disciplines. In the past, there was significant controversy regarding the relative rigor of concurrent and nonconcurrent multiple baseline designs. The consensus in recent textbooks and methodological papers is that nonconcurrent designs are less rigorous than concurrent designs because of their presumed limited ability to address the threat of coincidental events (i.e., history). This skepticism of nonconcurrent designs stems from an emphasis on the importance of across-tier comparisons and relatively low importance placed on replicated within-tier comparisons for addressing threats to internal validity and establishing experimental control. In this article, we argue that the primary reliance on across-tier comparisons and the resulting deprecation of nonconcurrent designs are not well-justified. We first define multiple baseline designs, describe common threats to internal validity, and delineate the two bases for controlling these threats. Second, we briefly summarize historical methodological writing and current textbook treatment of these designs. Third, we explore how concurrent and nonconcurrent multiple baselines address each of the main threats to internal validity. Finally, we make recommendations for more rigorous use, reporting, and evaluation of multiple baseline designs.

Important findings are often a balance between the rigor of the experimental design and innovativeness of the experimental question. One broad topic area that has received a great deal of discussion, but little empirical study, is the evaluation of educational systems. Experimental designs that permit the analysis of practices used by state education agencies, local education agencies, and schools have the potential for yielding socially significant findings that could improve education. In this article we discuss the use of nonconcurrent multiple baseline designs as an option for studying the activities and effects of educational programs. Nonconcurrent multiple baseline designs stagger the timing of baseline-to-intervention changes across various entities, but the baselines and intervention phases are not contemporaneous across each of the tiers. Although considered less rigorous than concurrent multiple baseline designs, nonconcurrent designs have a degree of flexibility that may allow for their use in studying complex social contexts, such as educational settings, that might otherwise go unanalyzed.

In order to examine the concurrent and criterion validity of the questionnaire version of the Eating Disorders Examination (EDE-Q), self-report and interview formats were administered to a community sample of women aged 18-45 (n = 208). Correlations between EDE-Q and EDE subscales ranged from 0.68 for Eating Concern to 0.78 for Shape Concern. Scores on the EDE-Q were significantly higher than those of the EDE for all subscales, with the mean difference ranging from 0.25 for Restraint to 0.85 for Shape Concern. Frequency of both objective bulimic episodes (OBEs) and subjective bulimic episodes (SBEs) was significantly correlated between measures. Chance-corrected agreement between EDE-Q and EDE ratings of the presence of OBEs was fair, while that for SBEs was poor. Receiver operating characteristic (ROC) analysis, based on a sample of 13 cases, indicated that a score of 2.3 on the global scale of the EDE-Q in conjunction with the occurrence of any OBEs and/or use of exercise as a means of weight control, yielded optimal validity coefficients (sensitivity = 0.83, specificity = 0.96, positive predictive value = 0.56). A stepwise discriminant function analysis yielded eight EDE-Q items which best distinguished cases from non-cases, including frequency of OBEs, use of exercise as a means of weight control, use of self-induced vomiting, use of laxatives and guilt about eating. The EDE-Q has good concurrent validity and acceptable criterion validity. The measure appears well-suited to use in prospective epidemiological studies.
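
As a reading aid for the validity coefficients reported above, the following minimal Python sketch shows how sensitivity, specificity, and positive predictive value are derived from a two-by-two classification table. The function name and the counts in the usage line are hypothetical illustrations and are not the study's data.

```python
def validity_coefficients(tp, fp, fn, tn):
    """Return (sensitivity, specificity, positive predictive value).

    tp: cases flagged by the screen        fn: cases missed by the screen
    fp: non-cases flagged by the screen    tn: non-cases correctly passed
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each denominator needs at least one observation")
    sensitivity = tp / (tp + fn)   # proportion of true cases detected
    specificity = tn / (tn + fp)   # proportion of non-cases correctly passed
    ppv = tp / (tp + fp)           # proportion of flagged people who are cases
    return sensitivity, specificity, ppv


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, ppv = validity_coefficients(tp=8, fp=5, fn=2, tn=85)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f}")
```

Because positive predictive value depends on how many non-cases are flagged, it can stay modest when cases are rare in the sample, even if sensitivity and specificity are both high.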

A dimensional approach was used to evaluate the internal validity of the DSM-III-R ADHD-inattention, ADHD-hyperactivity/impulsivity, oppositional defiant disorder (ODD), and conduct disorder (CD) symptoms (i.e., whether a symptom has a stronger correlation with its own dimension than the other three). Parents rated 4,019 children between the ages of 2 and 19 on these symptoms. The results showed that 5 of the 6 inattention symptoms, 3 of the 4 hyperactivity symptoms, 1 of the 4 impulsivity symptoms, 6 of the 9 oppositional defiant disorder symptoms, and 8 of the 11 CD symptoms had significant internal validity. Confirmatory factor analysis (CFA) found support for inattention, hyperactivity/impulsivity, oppositional defiant, and conduct disorder dimensions. Multiple-group CFA also found support for factor pattern and loading invariance across gender. The implications of these results as well as the merits of the dimensional approach to symptom validity are discussed in the context of the DSM-IV changes in ADHD, ODD, and CD.

In Germany, Stober et al. (1999, 2001) presented evidence for the validity of the SDS-17, a new measure of social desirability bias. In the current investigation, three experiments (n = 800) assessed the SDS-17’s validity in the US environment. In all conditions SDS-17 scores correlated highly with Marlowe–Crowne scores. In Study 1, a group administration of a paper and pencil booklet, SDS-17 scores of 327 college students were higher under Fake Good than Standard conditions, and both were higher than scores in the Honest condition. Study 2, an online survey of a demographically diverse adult sample (n = 257), showed that the increase in SDS-17 scores under Fake Good conditions occurs also in a Web survey and that SDS-17 scores were unrelated to one’s demographic profile. Study 3, a group administration to 216 college students, revealed again that scores under Fake Good were higher than those under Standard administration and that SDS-17 scores correlated more highly with the Impression Management than with the Self-Deception subscales of the BIDR. The SDS-17 appeared valid for the US environment as a measure of socially desirable responding. The evidence, however, encourages its further assessment as an index of social desirability bias per se.

In this study, we propose that the unique needs and characteristics of linguistic minorities should be considered throughout the test development process. Unlike most measurement invariance investigations in the assessment of linguistic minorities, which typically are conducted after test administration, we propose strategies that focus on the early stages of test development. Our approaches involve considering linguistic minorities in the selection of the test and sampling designs. We posit that joint consideration of these issues preemptively can strengthen the claims derived from tests used to assess linguistically diverse populations. This also will contribute to more psychometrically robust assessments, which can yield accurate and valid score-based inferences for linguistic minorities. To this end, we provide various examples and methodological approaches that can be used in the selection of the test and sampling designs that take these issues into consideration.

Studies on adults have revealed a disadvantageous effect of negative emotional stimuli on executive functions (EF), and it is suggested that this effect is amplified in children. The present study’s aim was to assess how emotional facial expressions affected working memory in 9- to 12-year-olds, using a working memory task with emotional facial expressions as stimuli. Additionally, we explored how degree of internalizing and externalizing symptoms in typically developing children was related to performance on the same task. Before employing the working memory task with emotional facial expressions as stimuli, an independent sample of 9- to 12-year-olds was asked to recognize the facial expressions intended to serve as stimuli for the working memory task and to rate the facial expressions on the degree to which the emotion was expressed and for arousal, to obtain a baseline for how children of this age recognize and react to facial expressions. The first study revealed that children rated the facial expressions with similar intensity and arousal across age. When employing the working memory task with facial expressions, results revealed that negatively valenced expressions impaired working memory more than neutral and positively valenced expressions. The ability to successfully complete the working memory task increased between 9 and 12 years of age. Children’s total problems were associated with poorer performance on the working memory task with facial expressions. Results on the effect of emotion on working memory are discussed in light of recent models and empirical findings on how emotional information might interact and interfere with cognitive processes such as working memory.

This article proposes and demonstrates a methodology for test score validation through abductive reasoning. It describes how abductive reasoning can be utilized in support of the claims made about test score validity. This methodology is demonstrated with a real data example of the Canadian English Language Proficiency Index Program (CELPIP)-General test, a program assessing functional English language ability in the community and workplace. Abductive reasoning seeks the enabling conditions through which a claim about a person's ability makes sense. For example, it makes sense that a person has strong functional language proficiency if he or she has been regularly using English to write emails and meet with colleagues at work. A valid test score should be affected by the extent of a person's engagement with such enabling conditions. Empirical evidence that warrants such an abductively reasoned claim is illustrated through a latent class analysis within a structural equation model. Evidence is examined to investigate whether certain classes of test takers who have been differentially engaging in the enabling conditions do, in fact, predict a person's CELPIP-General performance. The steps of the methodology are summarized in the closing section.

Confirmatory factor analysis was used to model a multitrait (ADHD-inattention and hyperactivity/impulsivity) by multisource (teachers and parents) design across a 3-month interval in a sample of 360 Australian elementary school children. The purpose was to evaluate the convergent and discriminant validity of the ADHD-inattention (IN) and hyperactivity/impulsivity (H/I) measures. Although similar traits and similar sources showed stronger correlations across time than dissimilar traits and dissimilar sources, the amount of source variance in the ADHD-IN and ADHD-H/I measures was substantial and consistent across the interval (M = 59%; range = 35–84%). This large amount of source variance raises the possibility that the correlations of the IN and H/I rating scales with other constructs (e.g., social competence, conduct problems) represent mostly source rather than trait effects. Multitrait by multisource analyses provide a means to answer this question and further advance understanding of ADHD.

Objective: Adaptive tasks, referring to the subjective evaluation of disease-related stressors in relation to personal concerns, have been neglected in the extensive literature on coping with chronic disease. In this study, the development of an instrument for measuring adaptive tasks is described: the Questionnaire Adaptive Tasks in Multiple Sclerosis (QuAT-MS). Method: The QuAT-MS is based on a bottom-up categorization of patients' statements on the losses, threats, and challenges brought about by their disease, and employs 10 scales to measure the importance attached to particular disease-related stressors. Validity and reliability of this bottom-up categorization were established in a sample of MS patients (N = 259) by examining their associations with related concepts relevant in adaptation to disease, such as coping (CISS), coping resources (LOT, self-efficacy), and quality of life (SIP). We also investigated whether patients' backgrounds and disease characteristics were related to adaptive tasks. Results: Adaptive tasks are more closely related with concepts relevant for adaptation (coping and coping resources) than with physical functioning (SIP) and disease-related characteristics (illness duration). Adaptive tasks are also associated with gender and level of education. Conclusion: It is concluded that adaptive tasks can be distinguished from related concepts like coping and quality of life. Furthermore, the QuAT-MS offers a reliable and patient-centred instrument for measuring the tasks which MS patients identify in their adaptation process.

This study explored the validity of the Values-In-Action Inventory of Strengths (VIA-IS) in an African context. A convenience sample of 256 African students completed the VIA-IS in English. The majority of strengths subscales had good reliability coefficients and mean scores comparable to those reported in a Western context. Satisfactory criterion-related validity was established through correlations with other well-being indices. First and second order confirmatory factor analyses only partly supported construct validity. All strengths subscales consisted of more than one factor. The hypothesised six-virtue cluster pattern was partially supported. Exploratory factor analysis suggested the possibility of an emic factor pattern of strengths consisting of 3 components: Within the first factor, Intrapersonal and Relationship Strengths, two clusters are distinguished, namely, Intrapersonal Strengths, and Horizontal and Vertical Relationship Strengths. The second factor was Integrity in Group Context. Thus, the VIA-IS has merit, but is not completely valid in its original form.

The present investigation examined the incremental predictive validity of mindfulness skills, as measured by the Kentucky Inventory of Mindfulness Skills (KIMS), in relation to multiple facets of emotional dysregulation, as indexed by the Difficulties in Emotion Regulation Scale (DERS), above and beyond variance explained by negative affectivity, anxiety sensitivity, and distress tolerance. Participants were a nonclinical community sample of 193 young adults (106 women, 87 men; mean age = 23.91 years). The KIMS Accepting without Judgment subscale was incrementally negatively predictive of all facets of emotional dysregulation, as measured by the DERS. Furthermore, KIMS Acting with Awareness was incrementally negatively related to difficulties engaging in goal-directed behavior. Additionally, both observing and describing mindfulness skills were incrementally negatively related to lack of emotional awareness, and describing skills also were incrementally negatively related to lack of emotional clarity. Findings are discussed in relation to advancing scientific understanding of emotional dysregulation from a mindfulness skills-based framework.

The present work analyses the predictive validity of measures provided by several available self‐report and indirect measurement instruments to assess risk propensity (RP) and proposes a measurement instrument using the Implicit Association Test: the IAT of Risk Propensity Self‐Concept (IAT‐RPSC), an adaptation of the prior IAT‐RP of Dislich et al. Study 1 analysed the relationship between IAT‐RPSC scores and several RP self‐report measures. Participants' risk‐taking behaviour in a natural setting was also assessed, analysing the predictive validity of the IAT‐RPSC scores on risk‐taking behaviour compared with the self‐report measures. Study 2 analysed the predictive validity of the IAT‐RPSC scores in comparison with other indirect measures. Results of these studies showed that the IAT‐RPSC scores exhibited good reliability and were positively correlated with several self‐report and indirect measures, providing evidence for convergent validity. Most importantly, the IAT‐RPSC scores predicted risk‐taking behaviour in a natural setting with real consequences above and beyond all other self‐report and indirect measures analysed. Copyright © 2013 European Association of Personality Psychology

Although it is common in community psychology research to have data at both the community, or cluster, and individual level, the analysis of such clustered data often presents difficulties for many researchers. Since the individuals within the cluster cannot be assumed to be independent, the use of many traditional statistical techniques that assume independence of observations is problematic. Further, there is often interest in assessing the degree of dependence in the data resulting from the clustering of individuals within communities. In this paper, a random-effects regression model is described for analysis of clustered data. Unlike ordinary regression analysis of clustered data, random-effects regression models do not assume that each observation is independent, but do assume data within clusters are dependent to some degree. The degree of this dependency is estimated along with estimates of the usual model parameters, thus adjusting these effects for the dependency resulting from the clustering of the data. Models are described for both continuous and dichotomous outcome variables, and available statistical software for these models is discussed. An analysis of a data set where individuals are clustered within firms is used to illustrate features of random-effects regression analysis, relative to both individual-level analysis which ignores the clustering of the data, and cluster-level analysis which aggregates the individual data. Preparation of this article was supported by National Heart, Lung, and Blood Institute Grant R18 HL42987-01A1, National Institutes of Mental Health Grant MH44826-01A2, and University of Illinois at Chicago Prevention Research Center Developmental Project CDC Grant R48/CCR505025.
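
The "degree of dependence" within clusters mentioned above is commonly summarized as an intraclass correlation. The sketch below is a simplified stand-in and not the random-effects regression model the article describes: it uses the classical one-way ANOVA estimator for balanced clusters, with no covariates. The function name and the example data are hypothetical.

```python
from statistics import mean


def icc_oneway(clusters):
    """One-way ANOVA intraclass correlation for equally sized clusters.

    clusters: list of lists, one inner list of outcome values per cluster.
    Returns (MSB - MSW) / (MSB + (n - 1) * MSW), where MSB and MSW are the
    between- and within-cluster mean squares and n is the cluster size.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    n = len(clusters[0])
    if n < 2 or any(len(c) != n for c in clusters):
        raise ValueError("clusters must share one size of at least two")

    grand = mean([x for c in clusters for x in c])
    cluster_means = [mean(c) for c in clusters]

    ms_between = n * sum((m - grand) ** 2 for m in cluster_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for c, m in zip(clusters, cluster_means) for x in c
    ) / (k * (n - 1))

    denom = ms_between + (n - 1) * ms_within
    if denom == 0:
        raise ValueError("no variation in the data")
    return (ms_between - ms_within) / denom


# Hypothetical outcomes for three firms with four employees each.
firms = [[4.0, 5.0, 4.5, 5.5], [7.0, 6.5, 7.5, 8.0], [2.0, 3.0, 2.5, 3.5]]
print(round(icc_oneway(firms), 3))
```

A value near 1 indicates that members of the same cluster resemble one another strongly, which is the situation in which analyses that assume independent observations are most misleading. A value near 0 or below suggests little clustering effect.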

The purpose of the present study was to empirically test the suggestion that experiential avoidance in an emotion regulation context is best understood as an emotion regulatory function of topographically distinct strategies. To do this we examined whether a measure of experiential avoidance could statistically account for the effects of emotion regulation strategies intervening at different points of the emotion-generative process as conceptualized by Gross' (1998) process model of emotion regulation. The strategies under examination were behavioral avoidance, cognitive reappraisal, and response suppression. The specific hypotheses to be tested were (1) that behavioral avoidance, cognitive reappraisal, and response suppression would statistically mediate the differences in measures of psychological well-being between a clinical and nonclinical sample, but that (2) these indirect effects would be reduced to nonsignificant levels when controlling for differences in experiential avoidance. The results provide clear support for the first hypothesis with regard to all the studied strategies. In contrast to the second hypothesis, the results showed the predicted outcome pattern only for the response-focused strategy “response suppression” and not for cognitive reappraisal or behavioral avoidance. The results are interpreted and discussed in relation to theories on experiential avoidance and emotion regulation.

