Similar Documents
1.
Learning reward expectations in honeybees
The aim of this study was to test whether honeybees develop reward expectations. In our experiment, bees first learned to associate colors with a sugar reward in a setting closely resembling a natural foraging situation. We then evaluated whether and how the sequence of the animals’ experiences with different reward magnitudes changed their later behavior in the absence of reinforcement and within an otherwise similar context. We found that the bees that had experienced increasing reward magnitudes during training assigned more time to flower inspection 24 and 48 h after training. Our design and behavioral measurements allowed us to uncouple the signal learning and the nutritional aspects of foraging from the effects of subjective reward values. The animals thus behaved differently not because they had more strongly associated the related predicting signals, nor because they were fed more or faster. Our results document for the first time that honeybees develop long-term expectations of reward; these expectations can guide their foraging behavior after a relatively long pause and in the absence of reinforcement, and further experiments will aim toward an elucidation of the neural mechanisms involved in this form of learning.

2.
Changes in reward magnitude or value have been reported to produce effects on timing behavior, which have been attributed to changes in the speed of an internal pacemaker in some instances and to attentional factors in other cases. The present experiments therefore aimed to clarify the effects of reward magnitude on timing processes. In Experiment 1, rats were trained to discriminate a short (2 s) vs. a long (8 s) signal followed by testing with intermediate durations. Then, the reward on short or long trials was increased from 1 to 4 pellets in separate groups. Experiment 2 measured the effect of different reward magnitudes associated with the short vs. long signals throughout training. Finally, Experiment 3 controlled for satiety effects during the reward magnitude manipulation phase. A general flattening of the psychophysical function was evident in all three experiments, suggesting that unequal reward magnitudes may disrupt attention to duration.
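For readers unfamiliar with the dependent measure: the psychophysical function here is the proportion of "long" responses plotted against signal duration, and "flattening" corresponds to a shallower slope. Below is a minimal sketch using an illustrative logistic form; the function name, parameters (pse, slope), and duration values are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def p_long(durations, pse=4.0, slope=2.0):
    """Proportion of 'long' responses as a logistic function of signal duration.

    pse   -- point of subjective equality (duration judged 'long' 50% of the time)
    slope -- sensitivity to duration; a flatter psychophysical function
             corresponds to a smaller slope (e.g., weaker attention to time).
    """
    return 1.0 / (1.0 + np.exp(-slope * (np.log(durations) - np.log(pse))))

durations = np.array([2.0, 2.5, 3.2, 4.0, 5.0, 6.3, 8.0])  # short ... long anchors
print(p_long(durations, slope=2.0))   # steep function: sharp timing discrimination
print(p_long(durations, slope=0.8))   # 'flattened' function: shallower discrimination
```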

3.
Three experiments investigated the performance of rats on a task involving differential reinforcement of lever-press durations. Experiment 1, which employed a discrete-trials procedure, manipulated deprivation level between subjects and reward magnitude within subjects. The minimum lever-press duration which would result in reward was varied from 0.4 to 6.4 sec. It was found that low deprivation resulted in longer mean durations and less response variability at the higher criterial values than did high deprivation. The magnitude of reward was not found to affect performance. Experiment 2 manipulated reward magnitude between subjects while holding deprivation level constant, and used the same general procedures as in Experiment 1. Small reward resulted in longer mean lever-press durations and less variability in responding than did large reward at the higher criterial values. The intertrial intervals were omitted in Experiment 3, in which deprivation level was varied between subjects and reinforcement was delivered only for response durations extending between 6.0 and 7.6 sec. Low deprivation resulted in longer mean lever-press durations and less response variability than did high deprivation, but the probability of a rewarded press duration did not differ between groups. The results overall are consistent with the hypothesis that low deprivation and small reward magnitude lead to weaker goal-approach responses and, hence, to less competition with lever holding. The deprivation and reward magnitude manipulations did not appear to influence lever holding performance by affecting the ability of animals to form temporal discriminations.

4.
Rats responded on 2 levers delivering brain stimulation reward on concurrent variable interval schedules. Following many successive sessions with unchanging relative rates of reward, subjects adjusted to an eventual change slowly and showed spontaneous reversions at the beginning of subsequent sessions. When changes in rates of reward occurred between and within every session, subjects adjusted to them about as rapidly as they could in principle do so, as shown by comparison to a Bayesian model of an ideal detector. This and other features of the adjustments to frequent changes imply that the behavioral effect of reinforcement depends on the subject's perception of incomes and changes in incomes rather than on the strengthening and weakening of behaviors in accord with their past effects or expected results. Models for the process by which perceived incomes determine stay durations and for the process that detects changes in rates are developed.
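The abstract does not spell out the ideal-detector model itself. As a rough illustration of the benchmark idea only, the sketch below runs a sequential log-likelihood-ratio (CUSUM-style) test for a step change in reward probability; the function name, probabilities, and threshold are illustrative assumptions, not the authors' Bayesian model.

```python
import numpy as np

def detect_rate_change(rewards, p_old, p_new, threshold=2.2):
    """Sequential log-likelihood-ratio test for a step change in reward probability.

    rewards   -- sequence of 0/1 outcomes per time bin
    p_old     -- reward probability before the putative change
    p_new     -- reward probability after the putative change
    threshold -- log-odds criterion for declaring a change (~2.2 is about 9:1 odds)

    Returns the index at which the evidence first favors 'rate changed',
    or None if the criterion is never reached.
    """
    log_lr = 0.0
    for t, r in enumerate(rewards):
        log_lr += np.log((p_new if r else 1 - p_new) /
                         (p_old if r else 1 - p_old))
        log_lr = max(log_lr, 0.0)          # CUSUM-style reset: discard pre-change evidence
        if log_lr > threshold:
            return t
    return None

rng = np.random.default_rng(0)
stream = np.concatenate([rng.random(40) < 0.2, rng.random(40) < 0.6]).astype(int)
print(detect_rate_change(stream, p_old=0.2, p_new=0.6))  # typically flags the change shortly after bin 40
```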

5.
A great deal of behavioral and economic research suggests that the value attached to a reward stands in inverse relation to the amount of effort required to obtain it, a principle known as effort discounting. In the present article, we present the first direct evidence for a neural analogue of effort discounting. We used fMRI to measure neural responses to monetary rewards in the human nucleus accumbens (NAcc), a structure previously demonstrated to encode reference-dependent reward information. The magnitude of accumbens activation was found to vary with both reward outcome and the degree of mental effort demanded to obtain individual rewards. For a fixed level of reward, the NAcc was less strongly activated following a high demand for effort than following a low demand. The magnitude of this effect was noted to correlate with preceding activation in the dorsal anterior cingulate cortex, a region that has been proposed to monitor information-processing demands and to mediate the subjective experience of effort.
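As a hedged illustration of the effort-discounting principle named above (not the model fitted in this paper), subjective value is often written as a hyperbolically decreasing function of required effort. The function, the parameter k, and the numbers below are illustrative assumptions.

```python
def effort_discounted_value(reward, effort, k=0.5):
    """Hyperbolic effort discounting: subjective value falls as required effort rises.

    reward -- objective reward magnitude
    effort -- effort demanded to obtain the reward (arbitrary units)
    k      -- discounting rate; larger k means effort devalues reward more steeply
    (Illustrative functional form only; other studies use linear or sigmoidal costs.)
    """
    return reward / (1.0 + k * effort)

# Same reward, different effort demands: higher demand -> lower subjective value,
# paralleling the weaker NAcc response after high-effort trials reported above.
print(effort_discounted_value(10.0, effort=1.0))   # ~6.67
print(effort_discounted_value(10.0, effort=4.0))   # ~3.33
```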

6.
For rats that bar pressed for intracranial electrical stimulation in a 2-lever matching paradigm with concurrent variable interval schedules of reward, the authors found that the time allocation ratio is based on a multiplicative combination of the ratio of subjective reward magnitudes and the ratio of the rates of reward. Multiplicative combining was observed over a range covering approximately 2 orders of magnitude in the ratio of the rates of reward (from about 1:10 to 10:1) and an order of magnitude change in the size of rewards. After determining the relation between the pulse frequency of stimulation and subjective reward magnitude, the authors were able to predict the subject's time allocation ratio, over a range in which it varied by more than 3 orders of magnitude, from knowledge of the subjective magnitudes of the rewards and the obtained relative rates of reward.
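The multiplicative rule described here can be written as T1/T2 = (M1/M2) x (R1/R2), where T is time allocated to a lever, M is subjective reward magnitude, and R is the obtained rate of reward. A minimal sketch (the variable names are ours, not the authors'):

```python
def time_allocation_ratio(m1, m2, r1, r2):
    """Predicted time-allocation ratio under multiplicative matching:
    T1/T2 = (M1/M2) * (R1/R2),
    where M is subjective reward magnitude and R is the obtained rate of reward.
    """
    return (m1 / m2) * (r1 / r2)

# Equal rates, but lever 1 delivers rewards judged 4x larger:
print(time_allocation_ratio(4.0, 1.0, 1.0, 1.0))   # 4.0 -> 4x more time on lever 1
# 4x larger rewards on lever 1, but delivered at 1/10 the rate:
print(time_allocation_ratio(4.0, 1.0, 0.1, 1.0))   # 0.4 -> more time on lever 2
```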

7.
In humans, the order of receiving sequential rewards can significantly influence the overall subjective utility of an outcome. For example, people subjectively rate receiving a large reward by itself significantly higher than receiving the same large reward followed by a smaller one (Do, Rupert, & Wolford, 2008). This result is called the peak-end effect. A comparative analysis of order effects can help determine the generality of such effects across primates, and we therefore examined the influence of reward-quality order on decision making in three rhesus macaque monkeys (Macaca mulatta). When given the choice between a high-low reward sequence and a low-high sequence, all three monkeys preferred receiving the high-value reward first. Follow-up experiments showed that for two of the three monkeys their choices depended specifically on reward-quality order and could not be accounted for by delay discounting. These results provide evidence for the influence of outcome order on decision making in rhesus monkeys. Unlike humans, who usually discount choices when a low-value reward comes last, rhesus monkeys show no such peak-end effect.

8.
This article centers on the core role of the human orbitofrontal cortex in representing reward information. It reviews and summarizes recent major research advances from several angles: the common and the reward-specific neural representations of reward information in the human orbitofrontal cortex, the relationship between reward processing and the local morphological features of this region, and the temporal dynamics of reward representation in the orbitofrontal cortex. Finally, the article discusses a series of questions that future research will need to address, arising from the diversity of reward types and the complexity of integrating reward information across brain regions.

10.
Emerging evidence from decision neuroscience suggests that although younger and older adults show similar frontostriatal representations of reward magnitude, older adults often show deficits in feedback-driven reinforcement learning. In the present study, healthy adults completed reward-based tasks that did or did not depend on probabilistic learning, while undergoing functional neuroimaging. We observed reductions in the frontostriatal representation of prediction errors during probabilistic learning in older adults. In contrast, we found evidence for stability across adulthood in the representation of reward outcome in a task that did not require learning. Together, the results identify changes across adulthood in the dynamic coding of relational representations of feedback, in spite of preserved reward sensitivity in old age. Overall, the results suggest that the neural representation of prediction error, but not reward outcome, is reduced in old age. These findings reveal a potential dissociation between cognition and motivation with age and identify a potential mechanism for explaining changes in learning-dependent decision making in old adulthood.

11.
Reward signals play an important role in guiding human learning behaviour. Recent studies have provided evidence that reward signals modulate perceptual learning of basic visual features. Typically, the reward effects on perceptual learning have been obtained with consciously presented rewards during the learning process. However, whether an unconsciously presented reward signal, which minimizes the contribution of attentional and motivational factors, can facilitate perceptual learning remains less well understood. We trained human subjects on a visual motion detection task and subliminally delivered a monetary reward for correct responses during training. The results showed a significantly larger learning effect for the high-reward-associated motion direction than for the low-reward-associated motion direction. Importantly, subjects could neither discriminate the relative values of the subliminal monetary rewards nor correctly report the reward-direction contingencies. Our findings suggest that reward signals play an important modulatory role in perceptual learning even when the magnitude of the reward is not consciously perceived.

12.
The orbitofrontal cortex (OBFc) has been suggested to code the motivational value of environmental stimuli and to use this information for the flexible guidance of goal-directed behavior. To examine whether information regarding reward prediction is quantitatively represented in the rat OBFc, neural activity was recorded during an olfactory discrimination “go”/“no-go” task in which five different odor stimuli were predictive of various amounts of reward or an aversive reinforcer. Neural correlates related to both actual and expected reward magnitude were observed. Responses related to reward expectation occurred during the execution of the behavioral response toward the reward site and within a waiting period prior to reinforcement delivery. About one-half of these neurons demonstrated differential firing toward the different reward sizes. These data provide new and strong evidence that reward expectancy, regardless of reward magnitude, is coded by neurons of the rat OBFc, and they are indicative of the representation of quantitative information concerning expected reward. Moreover, neural correlates of reward expectancy appear to be distributed across both motor and nonmotor phases of the task.

13.
This article develops the cognitive-emotional forager (CEF) model, a novel application of a neural network to dynamical processes in foraging behavior. The CEF is based on a neural network known as the gated dipole, introduced by Grossberg, which is capable of representing short-term affective reactions in a manner similar to Solomon and Corbit’s (1974) opponent process theory. The model incorporates a trade-off between approach toward food and avoidance of predation under varying levels of motivation induced by hunger. The results of simulations in a simple patch selection paradigm, using a lifetime fitness criterion for comparison, indicate that the CEF model is capable of nearly optimal foraging and outperforms a run-of-luck rule-of-thumb model. Models such as the one presented here can illuminate the underlying cognitive and motivational components of animal decision making.
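As a rough illustration of the gated-dipole idea that the CEF model builds on (not the CEF model itself), the sketch below simulates an ON channel driven by a phasic input through a habituating transmitter gate and an OFF channel that rebounds when the input is removed. All equations, parameter names, and values are simplified assumptions.

```python
import numpy as np

def gated_dipole(inputs, dt=0.1, A=1.0, B=1.0, epsilon=0.05, tonic=0.5):
    """Discrete-time sketch of Grossberg-style gated dipole opponent dynamics.

    Each channel's signal passes through a slowly recovering transmitter gate
    that depletes with use; when the phasic input to the ON channel is removed,
    the less-depleted OFF gate produces a transient antagonistic rebound
    (the opponent-process signature). All parameters are illustrative.
    """
    z_on, z_off = 1.0, 1.0                      # transmitter gates (deplete with use)
    on_out, off_out = [], []
    for J in inputs:
        s_on, s_off = tonic + J, tonic          # arousal + phasic input vs. arousal only
        z_on += dt * (epsilon * (1.0 - z_on) - A * s_on * z_on)    # depletion / recovery
        z_off += dt * (epsilon * (1.0 - z_off) - A * s_off * z_off)
        x_on, x_off = B * s_on * z_on, B * s_off * z_off           # gated signals
        on_out.append(max(x_on - x_off, 0.0))   # opponent competition
        off_out.append(max(x_off - x_on, 0.0))  # rebound appears here after input offset
    return np.array(on_out), np.array(off_out)

J = np.concatenate([np.zeros(50), np.ones(100), np.zeros(150)])  # input on, then off
on, off = gated_dipole(J)
print(round(on[60], 3), round(off[160], 3))  # ON response during input, OFF rebound after offset
```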

14.
The medial prefrontal cortex (mPFC) and the core region of the nucleus accumbens (AcbC) are key regions of a neural system that subserves risk-based decision making. Here, we examined whether dopamine (DA) signals conveyed to the mPFC and AcbC are critical for risk-based decision making. Rats with 6-hydroxydopamine or vehicle infusions into the mPFC or AcbC were examined in an instrumental task demanding probabilistic choice. In each session, probabilities of reward delivery after pressing one of two available levers were signaled in advance in forced trials, followed by choice trials that assessed the animal's preference. The probabilities of reward delivery associated with the large/risky lever declined systematically across four consecutive blocks but were kept constant within four subsequent daily sessions of a particular block. Thus, in a given session, rats needed to assess the current value associated with the large/risky versus small/certain lever and adapt their lever preference accordingly. Results demonstrate that the assessment of within-session reward probabilities and probability discounting across blocks were not altered in rats with mPFC and AcbC DA depletions, relative to sham controls. These findings suggest that the capacity to evaluate the magnitude and likelihood of rewards associated with alternative courses of action seems not to rely on intact DA transmission in the mPFC or AcbC.
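To make the rats' within-session valuation problem concrete, the sketch below compares the two levers by expected value as the large-reward probability falls across blocks. The reward sizes, probabilities, and decision rule are illustrative assumptions, not the task's actual parameters or the rats' strategy.

```python
def preferred_lever(p_large, large=4.0, small=1.0, p_small=1.0):
    """Compare a large/risky option against a small/certain one by expected value.

    (Illustrative decision rule only; real animals also show risk attitudes
    that a pure expected-value comparison ignores.)
    """
    ev_large, ev_small = p_large * large, p_small * small
    return "large/risky" if ev_large > ev_small else "small/certain"

# Probability of the large reward declines across blocks, as in the task above;
# preference flips once p_large * 4 no longer exceeds 1 (here at p = 0.25).
for p in (1.0, 0.5, 0.25, 0.125):
    print(p, preferred_lever(p))
```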

15.
A recent study found that guppies (Poecilia reticulata) can be trained to discriminate 4 versus 5 objects, a numerical discrimination typically achieved only by some mammals and birds. In that study, guppies were required to discriminate between two patches of small objects on the bottom of the tank that they could remove to find a food reward. It is not clear whether this species possesses exceptional numerical accuracy compared with the other ectothermic vertebrates or whether its remarkable performance was due to a specific predisposition to discriminate between differences in the quality of patches while foraging. To disentangle these possibilities, we trained guppies to the same numerical discriminations with a more conventional two-choice discrimination task. Stimuli were sets of dots presented on a computer screen, and the subjects received a food reward upon approaching the set with the larger numerosity. Though the cognitive problem was identical in the two experiments, the change in the experimental setting led to a much poorer performance as most fish failed even the 2 versus 3 discrimination. In four additional experiments, we varied the duration of the decision time, the type of stimuli, the length of training, and whether correction was allowed in order to identify the factors responsible for the difference. None of these parameters succeeded in increasing the performance to the level of the previous study, although the group trained with three-dimensional stimuli learned the easiest numerical task. We suggest that the different results with the two experimental settings might be due to constraints on learning and that guppies might be prepared to accurately estimate patch quality during foraging but not to learn an abstract stimulus–reward association.

16.
17.
On mental timing tasks, erroneous knowledge of results (KR) leads to incorrect performance accompanied by the subjective judgment of accurate performance. Using the start-stop technique (an analogue of the peak interval procedure) with both reproduction and production timing tasks, the authors analyze which processes erroneous KR alters. KR provides guidance (performance error information) that lowers decision thresholds. Erroneous KR also provides targeting information that alters response durations proportionately to the magnitude of the feedback error. On the production task, this shift results from changes in the reference memory, whereas on the reproduction task this shift results from changes in the decision threshold for responding. The idea that erroneous KR can alter different cognitive processes on related tasks is supported by the authors' demonstration that the learned strategies can transfer from the reproduction task to the production task but not vice versa. Thus the effects of KR are both task and context dependent.
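The contrast drawn here, between a change in the decision threshold and a change in reference memory, can be sketched with a scalar-timing-style response window; the specific rule and the numbers below are illustrative assumptions, not the authors' model.

```python
def start_stop_window(reference, threshold):
    """Scalar-timing-style response window around a remembered target duration.

    reference -- remembered target duration in seconds (reference memory)
    threshold -- relative decision threshold; responding occurs while
                 |t - reference| / reference < threshold
    Returns (start, stop) times. Illustrative formalization only.
    """
    return reference * (1 - threshold), reference * (1 + threshold)

print(start_stop_window(10.0, 0.3))   # baseline: respond from 7.0 to 13.0 s
print(start_stop_window(10.0, 0.5))   # changed threshold -> window width changes (5.0, 15.0)
print(start_stop_window(12.0, 0.3))   # shifted reference memory -> whole window displaced (8.4, 15.6)
```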

18.
To determine the joint effects of partial reward and reward magnitude on acquisition and extinction rates, and on acquisition and extinction asymptotes, 215 Wistar albino rats were trained in a Hunter straight runway. The experimental design was a 4 × 4 × 2 factorial combining four reward magnitudes, four reward percentages, and two experimenters. The data revealed that the acquisition rate was an increasing function of both percentage and magnitude of reward and that neither reward magnitude nor percentage of reward significantly affected acquisition asymptote. For extinction, it was found that, for continuous schedules, the larger the reward magnitude the less the resistance to extinction and, for partial schedules, the larger the reward magnitude the greater the resistance to extinction. These results were interpreted within the framework of the sequential effects hypothesis (Capaldi, 1966).

19.
When feedback follows a sequence of decisions, relationships between actions and outcomes can be difficult to learn. We used event-related potentials (ERPs) to understand how people overcome this temporal credit assignment problem. Participants performed a sequential decision task that required two decisions on each trial. The first decision led to an intermediate state that was predictive of the trial outcome, and the second decision was followed by positive or negative trial feedback. The feedback-related negativity (fERN), a component thought to reflect reward prediction error, followed negative feedback and negative intermediate states. This suggests that participants evaluated intermediate states in terms of expected future reward, and that these evaluations supported learning of earlier actions within sequences. We examine the predictions of several temporal-difference models to determine whether the behavioral and ERP results reflected a reinforcement-learning process.  相似文献   
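As a minimal illustration of the temporal-difference account being tested (a generic TD(0) sketch, not any of the specific models the authors compare), the code below shows how an intermediate state can acquire value from trial feedback and then generate a prediction error for the first decision. States, probabilities, and parameters are assumptions for illustration.

```python
# Minimal TD(0) sketch of credit assignment in a two-step trial: the intermediate
# state acquires value from trial feedback, and that value in turn generates a
# prediction error for the first-step state. All names and numbers are illustrative.
import random

alpha, gamma = 0.1, 1.0
V = {"start": 0.0, "good_intermediate": 0.0, "bad_intermediate": 0.0}

def run_trial():
    # First decision: reach one of two intermediate states (here chosen at random).
    inter = random.choice(["good_intermediate", "bad_intermediate"])
    delta1 = gamma * V[inter] - V["start"]          # prediction error at the intermediate state
    V["start"] += alpha * delta1
    # Second decision is followed by trial feedback; the 'good' state predicts reward.
    reward = 1.0 if (inter == "good_intermediate" and random.random() < 0.8) else 0.0
    delta2 = reward - V[inter]                      # prediction error at feedback
    V[inter] += alpha * delta2
    return delta1, delta2

random.seed(1)
for _ in range(500):
    run_trial()
print({s: round(v, 2) for s, v in V.items()})
# After learning, merely entering the bad intermediate state yields a negative delta1 --
# the kind of signal the fERN to negative intermediate states is interpreted to reflect.
```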

20.
Two rat experiments shed light on how variation in behavior is regulated. Experiment 1 used the peak procedure. On most trials, the 1st bar press more than 40 s after signal onset ended the signal and produced food. Other trials lasted much longer and ended without food. On those trials, the variability of bar-press duration increased greatly after the 1st response more than 40 s after signal onset. In Experiment 2, which asked whether the increase was due to the omission of expected reward or the decrease in reward expectation, reward expectation had a strong effect on response duration, whereas omission of expected reward had little effect. In both experiments, response rate and response duration changed independently, suggesting that they reflect different parts of the underlying mechanism. In Experiment 1, response durations implied that timing of the signal was more accurate than the rate-vs.-time function might suggest. Experiment 2 suggested that lowering reward expectation increases variation in response form.
