Similar Literature
1.
Curriculum-based measurement of reading (CBM-R) is used to estimate oral reading fluency. Unlike many traditional published tests, CBM-R materials often consist of 20 to 30 alternate forms (passages). Historically, CBM-R assessment materials were sampled from curricular materials. Recent research has documented the potentially deleterious effects of poorly controlled alternate forms on CBM-R outcomes. The purpose of this study was to examine alternative procedures for selecting the passages that make up CBM-R passage sets. The study examined four procedures for evaluating and selecting passages: random sampling, the Spache readability formula, mean level of performance evaluation, and Euclidean Distance evaluation. The latter two procedures relied on field testing and evaluation of student performance. Each of 88 second- and third-grade students was administered 50 CBM-R passages. Generalizability and dependability studies were used to examine students' performance on these passages and to evaluate the passage selection procedures. Results support the use of field testing methods (i.e., calculating performance means and Euclidean Distances) for passage selection. Implications for future research and practice are discussed.
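The abstract does not give the formula behind the Euclidean Distance evaluation. The sketch below is one plausible reading and is offered as an assumption. Each passage is treated as a vector of field-test scores, with one entry per student. The code measures each vector's distance from a reference vector made of each student's mean score across all passages, and it retains the passages closest to that reference. The data layout, the choice of reference, and the function names are hypothetical.

```python
import math


def passage_distances(scores: list[list[float]]) -> list[float]:
    """Euclidean distance of each passage's score column from a reference profile.

    scores[s][p] is the score of student s on passage p (hypothetical layout).
    The reference profile is each student's mean across all passages (an assumption).
    """
    n_passages = len(scores[0])
    reference = [sum(row) / n_passages for row in scores]
    return [math.dist([row[p] for row in scores], reference) for p in range(n_passages)]


def select_passages(scores: list[list[float]], k: int) -> list[int]:
    """Indices of the k passages whose profiles lie closest to the reference."""
    distances = passage_distances(scores)
    return sorted(range(len(distances)), key=distances.__getitem__)[:k]


field_test = [[50, 52, 70], [40, 41, 60]]  # two students x three passages, hypothetical
print(select_passages(field_test, k=2))     # [1, 0]
```

In this toy field test, passage 2 draws scores far above each student's typical level, so it is the one left out.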

2.
Single-case designs provide an established technology for evaluating the effects of academic interventions. Researchers interested in the long-term effects of reading interventions often use curriculum-based measures of reading (CBM-R) because these measures possess many of the characteristics desirable in a time-series design. The reliability of CBM-R scores is often supported by research from group designs. However, making idiographic interpretations about change in a student's oral reading rate requires attention to the precision of both static scores and growth estimates. The purpose of this paper is twofold. First, we discuss how recent empirical work on the technical adequacy of CBM-R scores has revealed multiple threats to data-evaluation validity when CBM-R passages are used to measure oral reading rate. Second, we identify pertinent considerations for conducting a visual analysis of intervention effects based on CBM-R data. We conclude with a brief discussion of implications for researchers considering the use of CBM-R within multiple-baseline designs.

3.
Interventionists often monitor the progress of students receiving supplemental interventions with general outcome measures (GOMs) such as curriculum-based measurement of reading (CBM-R). However, some researchers have suggested that interventionists should instead collect data more closely tied to instructional targets, namely specific subskill mastery measures (SSMMs), because outcomes from GOMs such as CBM-R may not be sensitive enough to gauge intervention effects. If interventionists do not monitor student progress with the appropriate measure, they may prematurely terminate an effective intervention or continue to deliver an ineffective one. However, such recommendations are based on expert opinion or on studies with serious methodological shortcomings. We used multivariate multilevel modeling to compare pre-intervention intercepts and intervention slopes between GOM and SSMM data. These data were collected concurrently in a sample of 96 first-grade, 44 second-grade, and 53 third-grade students receiving Tier 2 phonics interventions. Statistically significant differences were observed between slopes from the consonant-vowel-consonant (CVC) word SSMM and CBM-R slopes. Statistically significant differences in slopes were not observed for the consonant blend, digraph, or consonant-vowel-consonant-silent e (CVCe) SSMMs. Results suggest that using word lists to monitor early struggling readers' response to instruction is beneficial. As students are exposed to more complex phonetic patterns, however, the distinction between SSMMs and CBM-R becomes less meaningful.

4.
Although curriculum-based measures of oral reading (CBM-R) have strong technical adequacy, there is still reason to believe that student performance may be influenced by features of the testing situation, such as errors examiners make in administering and scoring the test. This study used a cross-classified multilevel model to examine the construct-irrelevant variance introduced by examiners. We sought to determine how much of the variance in student CBM-R scores was attributable to examiners and, if such variance was present, whether it was moderated by students' grade level and English learner (EL) status. Fit indices indicated that a cross-classified random effects model (CCREM) best fit the data. In this model, measures were nested within students, students were nested within schools, and examiners were crossed with schools. Intraclass correlations from the CCREM revealed that roughly 16% of the variance in student CBM-R scores was attributable to differences between examiners. The remaining variance was distributed across the measurement level (3.59%), students (75.23%), and schools (5.21%). Results were moderated by grade level but not by EL status. The discussion addresses the implications of this error for low-stakes and high-stakes decisions about students, teacher evaluation systems, and hypothesis testing in reading intervention research.
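The percentages reported above are intraclass correlations. Each one is a level's variance component divided by the sum of all components. The sketch below shows that arithmetic with hypothetical component values, which are not the study's estimates. It also checks that the reported shares leave about 16% for examiners.

```python
def variance_shares(components: dict[str, float]) -> dict[str, float]:
    """Convert variance components into percentages of total variance.

    Each share is the intraclass correlation for that level, expressed in percent.
    """
    total = sum(components.values())
    return {level: 100.0 * var / total for level, var in components.items()}


# Hypothetical variance components (illustrative only, not the study's estimates).
shares = variance_shares({"examiners": 2.0, "measures": 0.5, "students": 7.0, "schools": 0.5})
print(shares)  # {'examiners': 20.0, 'measures': 5.0, 'students': 70.0, 'schools': 5.0}

# Consistency check on the reported percentages: the examiner share is
# whatever remains after the other three levels.
examiner_share = round(100 - (3.59 + 75.23 + 5.21), 2)
print(examiner_share)  # 15.97
```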

5.
Curriculum-based measurement of oral reading (CBM-R) is used to monitor the effects of academic interventions for individual students. Decisions to continue, modify, or terminate these interventions are made by interpreting time-series CBM-R data. Such interpretation relies on visual analysis or on the application of decision rules. The purpose of this study was to compare the accuracy of visual analysis and decision rules. Visual analysts interpreted 108 CBM-R progress monitoring graphs in one of three ways: (a) without graphic aids, (b) with a goal line, or (c) with a goal line and a trend line. Graphs differed along three dimensions: trend magnitude, variability of observations, and duration of data collection. Automated trend line and data point decision rules were also applied to each graph. Inferential analyses estimated the probability of a correct decision for each evaluation method as a function of trend magnitude, variability of observations, and duration of data collection. A correct decision was either that the student is improving, so the intervention should continue, or that the student is not improving, so it should be discontinued. All evaluation methods performed better when students made adequate progress. Visual analysis and decision rules performed similarly when observations were less variable. Results suggest that educators should collect data for more than six weeks, take steps to control measurement error, and visually analyze graphs when data are variable. Implications for practice and research are discussed.

6.
This study examined the measurement equivalence of a global organizational survey measuring six work-climate factors. The survey was administered across 25 countries (N = 31,315) in all regions of the world: West Europe, East Europe, North America, Latin America, South America, the Middle East, Africa, and Asia-Pacific. Across all countries, the survey instrument exhibited 'form equivalence' and 'metric equivalence', suggesting that respondents completed the survey using the same frame of reference and interpreted the rating scale intervals similarly. Schwartz's (1994, 1999, 2004) cultural value theory was then used to group the countries into cultural regions and to anticipate the measurement equivalence of the survey data within and between these regions. Results partially supported Schwartz's theory. The English-speaking region was the only region with empirical evidence for 'scalar equivalence'. The study predicted that measurement equivalence would be higher among countries in cultural regions separated by a small cultural distance than among countries in regions separated by a large cultural distance. No support was found for this prediction. However, the use of a common language within a cultural region reduced the bias present in cross-country comparisons within that region.

7.
ABSTRACT: Establishing the measurement equivalence of instruments is a prerequisite for making meaningful comparisons between individuals or within individuals over time. Previous research has investigated the effects of rater characteristics on the measurement equivalence of performance ratings. The current study instead investigated a ratee characteristic: ratee job experience. Using confirmatory factor analysis and item response theory methods with replication, we assessed the measurement equivalence of supervisor ratings of 7,200 managers with differing levels of managerial experience. Overall, results indicated a high degree of measurement equivalence, suggesting that meaningful comparisons can be made across ratees with different levels of job experience.

8.
In the search for cheaper, faster approaches to data collection, crowdsourcing methods have risen in popularity as a tool for personality researchers. These methods are online labor portals that allow independent workers to complete surveys for compensation. Their popularity has grown despite a lack of evidence that crowdsourced data are equivalent to data from traditional collection methods. The purpose of this study was to evaluate crowdsourcing as a data collection tool. We examined the measurement equivalence of crowdsourced data (from Amazon.com's MTurk) with more traditional samples (an undergraduate sample and a sample of organizational employees). Using a popular measure of the Big Five personality traits, our results provided evidence of measurement equivalence across all three samples, with one important exception. The crowdsourced MTurk data exhibited measurement invariance with traditionally collected data only when responses were restricted to participants from native-English-speaking countries. Although MTurk appears to be an easy, cost-effective data collection tool, our results suggest that MTurk data resemble traditionally collected data only when the MTurk sample is restricted to IP addresses from English-speaking countries.

9.
David Lewis's modal realism claims that nothing can exist in more than one world or time. It also holds that statements about how something would have been are to be analysed in terms of that thing's counterpart. I first explain why the counterpart relation depends on de re modal statements in an intensional language. As a result, intuitive properties of similarity relations cannot be used to show that the counterpart relation is not an equivalence relation. I then examine test sentences in natural language, which is itself intensional, and show that none of them provides compelling evidence that a counterpart semantics is needed.

10.
Estimating a trend line through words-read-correctly-per-minute scores collected across successive weeks is a preferred method for evaluating student response to instruction with curriculum-based measurement of reading (CBM-R). This is partly because the slope of that line of best fit is used to predict the trajectory of student performance if the current intervention is maintained. Accordingly, trend lines should predict future scores with a high degree of accuracy when an intervention is maintained. Using actual student data, we evaluated the forecasting accuracy of a trend estimation method currently used in practice, ordinary least squares (OLS). We also evaluated five alternative methods recently examined in CBM-R simulation studies. Results suggest that the alternative methods predicted future performance about as accurately as OLS trend lines across most conditions. The exception was slopes estimated via Bayesian analysis. Bayesian trend lines estimated with informed prior distributions yielded noticeably less biased and more precise predictions for short data series than all other estimation methods across most conditions. These outcomes highlight the need to further explore the viability of Bayesian analysis for evaluating individual time-series data.

11.
The pretest-posttest control group design can be analyzed in two ways. One uses the posttest as the dependent variable and the pretest as a covariate (ANCOVA). The other uses the difference between posttest and pretest as the dependent variable (CHANGE). These two methods can give contradictory results if groups differ at pretest, a phenomenon known as Lord's paradox. The literature claims that ANCOVA is preferable if treatment assignment is based on randomization or on the pretest, and that it is questionable for preexisting groups. Some of the literature also suggests that Lord's paradox has to do with measurement error in the pretest. This article presents two new results. First, the claims are confirmed by proving that ANCOVA is mathematically equivalent to a repeated measures model without a group effect at pretest. Second, correcting for measurement error in the pretest is shown to lead back to ANCOVA or to CHANGE, depending on whether a true group difference at pretest is assumed to be absent or present. These two theoretical results are illustrated with multilevel (mixed) regression and structural equation modeling of data from two studies.
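The contradiction described above can be reproduced with a few numbers. Assume a common within-group slope. The ANCOVA group effect then equals the posttest mean difference minus the pooled within-group slope b_w times the pretest mean difference. CHANGE is the same expression with the slope fixed at 1. The sketch below computes both estimates from hypothetical data in which the groups differ at pretest and b_w = 0.5, so the two methods disagree. The function names are illustrative.

```python
from statistics import fmean


def change_effect(pre0, post0, pre1, post1) -> float:
    """CHANGE: difference between the groups' mean gains (slope on pretest fixed at 1)."""
    return (fmean(post1) - fmean(pre1)) - (fmean(post0) - fmean(pre0))


def ancova_effect(pre0, post0, pre1, post1) -> float:
    """ANCOVA group effect via the pooled within-group slope of posttest on pretest."""
    def within(pre, post):
        mx, my = fmean(pre), fmean(post)
        sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx = sum((x - mx) ** 2 for x in pre)
        return sxy, sxx

    sxy0, sxx0 = within(pre0, post0)
    sxy1, sxx1 = within(pre1, post1)
    b_w = (sxy0 + sxy1) / (sxx0 + sxx1)
    return (fmean(post1) - fmean(post0)) - b_w * (fmean(pre1) - fmean(pre0))


# Hypothetical data: group 1 starts 10 points higher at pretest.
pre0, post0 = [10, 12, 14], [12, 13, 14]
pre1, post1 = [20, 22, 24], [22, 23, 24]
print(change_effect(pre0, post0, pre1, post1))  # 0.0
print(ancova_effect(pre0, post0, pre1, post1))  # 5.0
```

If the groups had equal pretest means, or if b_w were 1, the two estimates would coincide. The paradox requires both a pretest difference and a within-group slope other than 1.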

12.
The logic of comparison is taken as a starting point. It is argued that any cross-cultural comparison presupposes a comparison scale, i.e., a scale that is identical across the populations included in a study. Scale identity can be specified for various levels of measurement. The second section presents a simple classification of inferences about cross-cultural differences derived from psychological measurements. For each category of inference, two questions are asked: whether the inferences are logically feasible and whether they can be validated empirically. The third section discusses the statistical analysis of psychometric conditions for equivalence. The fourth section asks what alternatives for meaningful interpretation a researcher has if the data turn out to lack equivalence. The fifth section raises a conceptual problem: whether the article's basic assumption, that psychological concepts are identical across cultures, is realistic.

13.
We examined equivalence-based N400 effects by comparing EEG data from participants with different histories of equivalence testing. Before completing the priming task used for EEG measurement, Group 1 received only matching-to-sample training trials. Group 2 received both matching-to-sample training trials and equivalence probe trials. We asked whether exposure to the reinforcement contingency was sufficient to produce an N400 effect that might indicate emergent equivalence relations. Alternatively, such a response might depend on experience with equivalence tests. Results showed robust N400 effects in both groups, and experience with equivalence tests did not further increase them. Our findings add confirmatory evidence that equivalence relations may originate from the reinforcement contingency alone. Furthermore, complementary EEG data were collected from priming tasks involving natural-language words. These data showed functional overlap between N400 effects for laboratory-defined equivalence relations and for natural words.

14.
John Worrall, Synthese, 2011, 180(2): 157-172
Are theories ‘underdetermined by the evidence’ in any way that should worry the scientific realist? I argue that no convincing reason has been given for thinking so. A crucial distinction is drawn between data equivalence and empirical equivalence. Duhem showed that it is always possible to produce a data-equivalent rival to any accepted scientific theory. But there is no reason to regard such a rival as equally well supported empirically, and hence it poses no threat to realism. Two theories are empirically equivalent if they share all consequences expressed in purely observational vocabulary. This is a much stronger requirement than has hitherto been recognised, because two such ‘rival’ theories must in fact agree on many claims that are clearly theoretical in nature. Given this, it is unclear how much a demonstration that every accepted theory has an empirically equivalent ‘rival’ would affect realism, even if such a demonstration could be produced. For the version of realism that I defend, structural realism, such a demonstration would certainly have no impact at all. According to structural realism, two empirically equivalent theories are cognitively indistinguishable.

15.
Tracy L. Tylka, Body Image, 2013, 10(3): 415-418
The Body Appreciation Scale (BAS; Avalos et al., 2005) is considered a measure of positive body image. It assesses acceptance of, favorable opinions toward, and respect for the body. The BAS was originally developed for and psychometrically examined with women, but researchers are now administering it to men and making gender comparisons. Tests of measurement equivalence/invariance are therefore needed to determine whether the BAS operates similarly for women and men. In the present study, the BAS's cross-gender configural, factor loading, and intercept invariance were examined among 930 college women and men. The BAS demonstrated measurement equivalence/invariance between women and men, suggesting that gender comparisons can be made with confidence. Additional evidence was accrued for the convergent validity of the male version of the BAS, which was related to men's dissatisfaction with muscularity, body fat, and height. These findings reinforce the structural and construct integrity of the BAS.

16.
Category clustering is a robust finding in the free recall of familiar category members, but it has rarely been studied with artificial categories. In the present study, college students learned artificial categories via stimulus-equivalence methodology. Arbitrary match-to-sample training with nonsense syllables established three interrelated conditional discriminations. For most subjects, unreinforced test trials revealed the emergent stimulus-control relations considered to be evidence of equivalence classes. Free-recall tests revealed significant within-class clustering both before and after equivalence testing, but clustering was more pronounced after the equivalence tests. These findings confirm that classic phenomena such as clustering in free recall can be studied with stimulus-equivalence methodology, allowing experimental control over relevant variables.

17.
Stimulus equivalence is defined as the ability to relate stimuli in novel ways after training in which not all of the stimuli were directly linked to one another. Sidman (2000) suggested that all elements of the conditional discrimination training contingencies that produce equivalence can potentially become class members. Research has demonstrated that samples, comparisons, responses, and reinforcers can all be included in equivalence classes. Given this evidence, the purpose of this study was to determine whether class-specific prompts would also enter their relevant equivalence classes. Experiment 1 investigated the inclusion of prompts in an equivalence class using abstract stimuli with neurotypical students enrolled in higher education courses. Experiment 2 systematically replicated Experiment 1 using meaningful stimuli and individuals diagnosed with autism spectrum disorder. In both experiments, class-specific prompts became part of the equivalence classes along with the other positive elements of the contingency. The results are discussed in terms of class expansion and the potential impact on equivalence-based instruction.
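The definition above rests on derived relations. In Sidman's account, trained conditional relations that form an equivalence class imply untrained ones: symmetry (B→A), transitivity (A→C), and combined equivalence (C→A). The sketch below idealizes this process and is not a model of any participant. It groups stimuli linked by trained sample-comparison pairs using union-find, then lists every untrained ordered pair within each class. Real learners may or may not show all of these relations, which is exactly what unreinforced test trials probe. The sketch also ignores non-stimulus class members such as the prompts, responses, and reinforcers discussed above. Stimulus labels and function names are hypothetical.

```python
from itertools import permutations


def equivalence_classes(trained: list[tuple[str, str]]) -> list[set[str]]:
    """Group stimuli linked by any chain of trained sample-comparison pairs (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for sample, comparison in trained:
        parent[find(sample)] = find(comparison)
    classes: dict[str, set[str]] = {}
    for stimulus in list(parent):
        classes.setdefault(find(stimulus), set()).add(stimulus)
    return list(classes.values())


def emergent_relations(trained: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Untrained ordered pairs within each class (symmetry, transitivity, equivalence).

    Reflexive pairs such as (A1, A1) are omitted for brevity.
    """
    trained_set = set(trained)
    emergent = set()
    for cls in equivalence_classes(trained):
        for a, b in permutations(sorted(cls), 2):
            if (a, b) not in trained_set:
                emergent.add((a, b))
    return emergent


# Hypothetical training: A1-B1, B1-C1 (class 1) and A2-B2, B2-C2 (class 2).
trained = [("A1", "B1"), ("B1", "C1"), ("A2", "B2"), ("B2", "C2")]
print(sorted(emergent_relations(trained)))
# [('A1', 'C1'), ('A2', 'C2'), ('B1', 'A1'), ('B2', 'A2'), ('C1', 'A1'), ('C1', 'B1'), ('C2', 'A2'), ('C2', 'B2')]
```

Four trained pairs yield eight untrained relations, and no relation crosses between the two classes.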

18.
The simultaneous matching-to-sample procedures widely used to study stimulus equivalence in human participants have generally been unsuccessful with animals. However, functional equivalence classes have been demonstrated in pigeons and sea lions using a concurrent repeated reversal discrimination procedure. In this procedure, responding to one set of stimuli is reinforced while responding to a different set is not. The set associated with reinforcement is then reversed many times during the experiment. The experiments reported here were designed to assess whether functional equivalence classes could be demonstrated in rats using similar techniques. Rats were initially trained with two sets of olfactory stimuli (six odors per set). After many reversals, probe reversal sessions were conducted. In these sessions, rats were first exposed to a subset of the members of each set, and the withheld stimuli were introduced later in the session. Responding to these delayed probe trials in accord with the reversed contingencies constituted transfer of function. Experiment 1 produced some evidence of transfer, but the effects were relatively weak and variable. Experiment 2 introduced procedural changes and found strong evidence of transfer of function, consistent with the formation of functional equivalence classes. These procedures offer a promising strategy for studying symbolic behavior in rodents.

19.
Given the growing interest in subjective well-being as a measure of social progress, instruments are needed that produce valid and reliable scores and that can be used within and across countries. The aim of the present study was to analyze the measurement equivalence of the brief version of the Day Reconstruction Method. The analysis used nationally representative samples from Finland, Poland, and Spain, obtained within the COURAGE in Europe project. We assessed the goodness of fit of a two-correlated-factors model and the reliability of the resulting scores. Cross-country invariance was tested with multiple-group confirmatory factor analysis through sequential constraint imposition. Within each country, measurement invariance was also tested across time frames (morning, afternoon, and evening) and days of the week (weekday and weekend). The results supported a two-correlated-factors structure (positive and negative affect), and the reliability of the positive, negative, and net affect scores was adequate. High equivalence was found across the three national samples. All items except one showed strong measurement invariance. This indicates that respondents from Finland, Poland, and Spain attribute the same meaning to the latent construct and that the levels of the underlying items are equal in all three countries. Similar results were found for measurement equivalence across time frames and days of the week. Our findings support the assumption of comparability across the samples considered. In general, positive affect was higher and negative affect lower in Finland, in the evening, and at the weekend.
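Sequential constraint imposition compares nested models. The configural model has the same structure in every group, the metric model adds equal loadings, and the scalar model adds equal intercepts. The sketch below uses a plain chi-square difference test for maximum-likelihood fits and stops at the first significant loss of fit. The abstract does not say which criterion the authors used; a change in CFI or a scaled difference test are common alternatives. The test, the alpha level, the fit values, and the function names are therefore assumptions. The chi-square tail probability is computed from closed forms for integer degrees of freedom, which avoids third-party dependencies.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal-tail term plus a finite series in x / (1*3*...*(2k-1)).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def chi2_difference(free: tuple[float, int], constrained: tuple[float, int]) -> float:
    """p-value of the chi-square difference test between a freer and a nested constrained model."""
    return chi2_sf(constrained[0] - free[0], constrained[1] - free[1])


def highest_invariance_level(models: list[tuple[str, float, int]], alpha: float = 0.05) -> str:
    """Walk models ordered from least to most constrained; stop at the first significant misfit."""
    retained = models[0][0]
    for (_, chi_prev, df_prev), (name, chi_cur, df_cur) in zip(models, models[1:]):
        if chi2_difference((chi_prev, df_prev), (chi_cur, df_cur)) < alpha:
            break
        retained = name
    return retained


# Hypothetical fit statistics (name, chi-square, df), not the study's results.
fits = [("configural", 100.0, 40), ("metric", 104.0, 46), ("scalar", 130.0, 52)]
print(highest_invariance_level(fits))  # metric
```

In this toy case, equal loadings cost little fit, but equal intercepts do not hold. The study reports the opposite pattern, strong (scalar) invariance for all items but one.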

20.
It is well established that humans and other animals may treat two perceptually different cues alike if the cues have been individually paired with a common antecedent or a common consequence. Recently, Molet et al. (Psychon Bull Rev 18:618–623, 2011) reported evidence for a new form of acquired equivalence in human conditional discrimination, namely context-mediated equivalence. In the present research, we used a flavor conditioning procedure to ask whether rats would show similar context-mediated equivalence. Such a result would suggest that this new form of acquired equivalence is a general phenomenon. Rats experienced two flavor cues, A and B. The cues were presented either both in the same context (X) or each in its own distinctive context (X or Y). The rats then experienced B with sucrose in a third context, Z, after which generalization of conditioning to A was assessed. When tested in Context Z, consumption of A was more marked when A and B had both been presented in the same context than when they had been presented in two different contexts. Importantly, this shows that in the absence of the training context, cues that shared a common context at different times came to be treated as equivalent. This represents the first evidence of context-mediated equivalence in a nonhuman species.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417