Similar Articles
20 similar articles found
1.
Reinforcement learning in the brain
A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and to more recent data from human imaging experiments. We further extend the discussion to aspects of learning not associated with phasic dopamine signals, such as learning of goal-directed responding that may not be dopamine-dependent, and learning about the vigor (or rate) with which actions should be performed that has been linked to tonic aspects of dopaminergic signaling. We end with a brief discussion of some of the limitations of the reinforcement learning framework, highlighting questions for future research.
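The temporal-difference reward prediction error this abstract introduces can be sketched in a few lines. The rule below is the standard TD(0) formulation, delta = r + gamma * V(s') - V(s); the state names and the parameter values (alpha, gamma) are illustrative, not taken from the article.

```python
# Minimal TD(0) sketch of the reward prediction error: delta shrinks as
# the value of the predictive cue converges toward the obtained reward.

def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """One TD(0) update of the state-value table V (a dict); returns delta."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # prediction error
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

# A cue ("light") reliably followed by reward: the prediction error is
# large at first and decays as V("light") converges.
V = {}
errors = [td_update(V, "light", "reward_state", r=1.0) for _ in range(50)]
```

Run repeatedly against a constant reward, the first error equals the full reward and later errors approach zero, mirroring the dopaminergic response pattern reviewed in the article.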

2.
From both functional and biological considerations, it is widely believed that action production, planning, and goal-oriented behaviors supported by the frontal cortex are organized hierarchically [Fuster (1991); Koechlin, E., Ody, C., & Kouneiher, F. (2003). Neuroscience: The architecture of cognitive control in the human prefrontal cortex. Science, 302, 1181-1185; Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt]. However, the nature of the different levels of the hierarchy remains unclear, and little attention has been paid to the origins of such a hierarchy. We address these issues through biologically-inspired computational models that develop representations through reinforcement learning. We explore several different factors in these models that might plausibly give rise to a hierarchical organization of representations within the PFC, including an initial connectivity hierarchy within PFC, a hierarchical set of connections between PFC and subcortical structures controlling it, and differential synaptic plasticity schedules. Simulation results indicate that architectural constraints contribute to the segregation of different types of representations, and that this segregation facilitates learning. These findings are consistent with the idea that there is a functional hierarchy in PFC, as captured in our earlier computational models of PFC function and a growing body of empirical data.

3.
Exercise behavior theories grounded in rational decision making are regarded as the dominant framework for understanding physical activity, supplying valuable information in the form of cognitive constructs related to physical activity. Behavioral interventions designed around social ecological models have drawn attention for their stronger effects. Recent research suggests, however, that neither positive exercise cognitions nor current physical activity environments have succeeded in fostering individual exercise habits, so new theoretical frameworks are needed to explain how such habits form. The newest framework for explaining physical activity is dual-system theory, which, by taking the non-conscious and hedonic determinants of physical activity into account, promises a broader motivational perspective. On the one hand, several representative dual-system models of physical activity, ranging from simple spontaneous pathways, through contextual cues and exercise habits, to complex conceptual models that highlight automatic affective evaluation, clarify the construction of System 1; together with System 2, the focus of traditional exercise behavior theories, they provide a basis for model construction. On the other hand, analyses of the competition, cooperation, and hierarchical-control principles governing the two systems offer guidance for model control. The classical reinforcement learning framework accounts for both the construction and the control of dual-system models: for construction, model-free and model-based reinforcement learning correspond to System 1 and System 2, respectively; for control, the Dyna cooperative architecture and hierarchical reinforcement learning plausibly explain physical activity as a cooperatively coordinated, hierarchically executed combination of actions. Finally, an exerciser-environment interaction model grounded in reinforcement learning is proposed, in an attempt to examine exercise behavior from a new angle.

4.
Hierarchical models of behavior and prefrontal function
The recognition of hierarchical structure in human behavior was one of the founding insights of the cognitive revolution. Despite decades of research, however, the computational mechanisms underlying hierarchically organized behavior are still not fully understood. Recent findings from behavioral and neuroscientific research have fueled a resurgence of interest in the problem, inspiring a new generation of computational models. In addition to developing some classic proposals, these models also break fresh ground, teasing apart different forms of hierarchical structure, placing a new focus on the issue of learning and addressing recent findings concerning the representation of behavioral hierarchies within the prefrontal cortex. In addition to offering explanations for some key aspects of behavior and functional neuroanatomy, the latest models also pose new questions for empirical research.

5.
To provide the first systematic test of whether young children will spontaneously perceive and imitate hierarchical structure in complex actions, a task was devised in which a set of 16 elements can be modelled through either of two different, hierarchically organized strategies. Three-year-old children showed a strong and significant tendency to copy whichever of the two hierarchical approaches they witnessed an adult perform. Responses to an element absent in demonstrations, but present at test, showed that children did not merely copy the chain of events they had witnessed, but acquired hierarchically structured rules to which the new element was assimilated. Consistent with this finding, children did not copy specific sequences of actions at lower hierarchical levels.

6.
Adapting decision making according to dynamic and probabilistic changes in action-reward contingencies is critical for survival in a competitive and resource-limited world. Much research has focused on elucidating the neural systems and computations that underlie how the brain identifies whether the consequences of actions are relatively good or bad. In contrast, less empirical research has focused on the mechanisms by which reinforcements might be used to guide decision making. Here, I review recent studies in which an attempt to bridge this gap has been made by characterizing how humans use reward information to guide and optimize decision making. Regions that have been implicated in reinforcement processing, including the striatum, orbitofrontal cortex, and anterior cingulate, also seem to mediate how reinforcements are used to adjust subsequent decision making. This research provides insights into why the brain devotes resources to evaluating reinforcements and suggests a direction for future research, from studying the mechanisms of reinforcement processing to studying the mechanisms of reinforcement learning.

7.
People encode goal-directed behaviors, such as assembling an object, by segmenting them into discrete actions, organized as goal-subgoal hierarchies. Does hierarchical encoding contribute to observational learning? Participants in 3 experiments segmented an object assembly task into coarse and fine units of action and later performed it themselves. Hierarchical encoding, measured by segmentation patterns, correlated with more accurate and more hierarchically structured performance of the later assembly task. Furthermore, hierarchical encoding increased when participants (a) segmented coarse units first, (b) explicitly looked for hierarchical structure, and (c) described actions while segmenting them. Improving hierarchical encoding always led to improvements in learning, as well as a surprising shift toward encoding and executing actions from the actor's spatial perspective instead of the participants' own. Hierarchical encoding facilitates observational learning by organizing perceived actions into a representation that can serve as an action plan.

8.
Dialogues on prediction errors
The recognition that computational ideas from reinforcement learning are relevant to the study of neural circuits has taken the cognitive neuroscience community by storm. A central tenet of these models is that discrepancies between actual and expected outcomes can be used for learning. Neural correlates of such prediction-error signals have been observed now in midbrain dopaminergic neurons, striatum, amygdala and even prefrontal cortex, and models incorporating prediction errors have been invoked to explain complex phenomena such as the transition from goal-directed to habitual behavior. Yet, like any revolution, the fast-paced progress has left an uneven understanding in its wake. Here, we provide answers to ten simple questions about prediction errors, with the aim of exposing both the strengths and the limitations of this active area of neuroscience research.

9.
The neural substrate for behavioral, cognitive and linguistic actions is hierarchically organized in the cortex of the frontal lobe. In their methodologically impeccable study, Koechlin et al. reveal the neural dynamics of the frontal hierarchy in behavioral action. Progressively higher areas control the performance of actions requiring the integration of progressively more complex and temporally dispersed information. The study substantiates the crucial role of the prefrontal cortex in the temporal organization of behavior.

10.
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

11.
The amygdala and ventromedial prefrontal cortex in morality and psychopathy
Recent work has implicated the amygdala and ventromedial prefrontal cortex in morality and, when dysfunctional, psychopathy. This model proposes that the amygdala, through stimulus-reinforcement learning, enables the association of actions that harm others with the aversive reinforcement of the victims' distress. Consequent information on reinforcement expectancy, fed forward to the ventromedial prefrontal cortex, can guide the healthy individual away from moral transgressions. In psychopathy, dysfunction in these structures means that care-based moral reasoning is compromised and the risk that antisocial behavior is used instrumentally to achieve goals is increased.

12.
An important issue in the field of learning is to what extent one can distinguish between behavior resulting from either belief or reinforcement learning. Previous research suggests that it is difficult or even impossible to distinguish belief from reinforcement learning: belief and reinforcement models often fit the empirical data equally well. However, previous research has been confined to specific games in specific settings. In the present study we derive predictions for behavior in games using the EWA learning model (e.g., Camerer & Ho, 1999), a model that includes belief learning and a specific type of reinforcement learning as special cases. We conclude that belief and reinforcement learning can be distinguished, even in 2×2 games. Maximum differentiation in behavior resulting from either belief or reinforcement learning is obtained in games with pure Nash equilibria with negative payoffs and at least one other strategy combination with only positive payoffs. Our results help researchers to identify games in which belief and reinforcement learning can be discerned easily.
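The distinction this abstract turns on can be illustrated with the two special cases the EWA model nests: cumulative reinforcement learning updates only the chosen strategy's attraction by the payoff received, whereas belief learning (weighted fictitious play) also credits the forgone payoffs of unchosen strategies. The payoff matrix below is a made-up 2×2 example, not one of the games analyzed in the article.

```python
import numpy as np

# Row player's payoffs in an illustrative 2x2 game (rows: own strategies,
# columns: opponent's strategies).
payoff = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

def reinforcement_update(A, own, opp):
    """Roth-Erev style: reinforce only the strategy actually played."""
    A = A.copy()
    A[own] += payoff[own, opp]
    return A

def belief_update(A, own, opp):
    """Fictitious-play style: every strategy is credited with the payoff
    it would have earned against the opponent's observed action."""
    return A + payoff[:, opp]

A0 = np.zeros(2)
# Opponent plays column 0; we played row 0.
A_reinf = reinforcement_update(A0, own=0, opp=0)
A_belief = belief_update(A0, own=0, opp=0)
```

The two rules diverge exactly on the unchosen strategy's attraction, which is why games with the payoff structure the authors identify can pull the models' behavioral predictions apart.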

13.
Individuals from across the psychosis spectrum display impairments in reinforcement learning. In some individuals, these deficits may result from aberrations in reward prediction error (RPE) signaling, conveyed by dopaminergic projections to the ventral striatum (VS). However, there is mounting evidence that VS RPE signals are relatively intact in medicated people with schizophrenia (PSZ). We hypothesized that, in PSZ, reinforcement learning deficits often are not related to RPE signaling per se but rather their impact on learning and behavior (i.e., learning rate modulation), due to dysfunction in anterior cingulate and dorsomedial prefrontal cortex (dmPFC). Twenty-six PSZ and 23 healthy volunteers completed a probabilistic reinforcement learning paradigm with occasional, sudden shifts in contingencies. Using computational modeling, we found evidence of an impairment in trial-wise learning rate modulation (α) in PSZ before and after a reinforcement contingency shift, expressed most in PSZ with more severe motivational deficits. In a subsample of 22 PSZ and 22 healthy volunteers, we found little evidence for between-group differences in VS RPE and dmPFC learning rate signals, as measured with fMRI. However, a follow-up psychophysiological interaction analysis revealed decreased dmPFC-VS connectivity concurrent with learning rate modulation, most prominently in individuals with the most severe motivational deficits. These findings point to an impairment in learning rate modulation in PSZ, leading to a reduced ability to adjust task behavior in response to unexpected outcomes. At the level of the brain, learning rate modulation deficits may be associated with decreased involvement of the dmPFC within a greater RL network.
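Trial-wise learning rate modulation of the kind this study models is often formalized with a Pearce-Hall style rule, in which the learning rate tracks the recent magnitude of prediction errors and therefore rises after a contingency shift. The sketch below is a generic illustration of that idea, not the authors' specific model; all parameter values are assumptions.

```python
# Pearce-Hall style sketch: alpha (the learning rate) decays while outcomes
# are predictable and rebounds when a contingency shift produces a large
# unsigned prediction error.

def pearce_hall_step(v, alpha, outcome, eta=0.3, kappa=1.0):
    """Return updated value estimate and learning rate after one trial."""
    delta = outcome - v                                # signed prediction error
    v_new = v + kappa * alpha * delta                  # value update scaled by alpha
    alpha_new = eta * abs(delta) + (1 - eta) * alpha   # alpha tracks |delta|
    return v_new, alpha_new

# Thirty stable rewarded trials shrink alpha; a sudden contingency shift
# (outcome flips from 1 to 0) drives it back up.
v, alpha = 0.0, 0.5
for _ in range(30):
    v, alpha = pearce_hall_step(v, alpha, outcome=1.0)
alpha_stable = alpha
v, alpha = pearce_hall_step(v, alpha, outcome=0.0)     # contingency shift
alpha_after_shift = alpha
```

An impairment in this modulation, as reported for PSZ, would correspond to alpha failing to rebound after the shift, leaving behavior slow to adjust to the new contingency.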

14.
Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.

15.
During social interaction, humans learn from others' behavior about their traits and intentions and about the social norms operating in specific contexts; such learning is essential for optimizing decisions and maintaining positive social interactions. In recent years, a growing number of studies have combined computational modeling with neuroimaging to examine in depth the cognitive-computational mechanisms of social learning and their neural basis. Existing work shows that human social learning is well captured by reinforcement learning models and Bayesian models, with the main computational processes including the representation of subjective expectations, prediction errors, and uncertainty, as well as the integration of information. These computations are implemented mainly in brain regions associated with reward processing (such as the ventral striatum and ventromedial prefrontal cortex), with social cognition (such as the dorsomedial prefrontal cortex and temporoparietal junction), and with cognitive control (such as the dorsolateral prefrontal cortex). Notably, the mapping between computational processes and brain regions is not one-to-one, suggesting that future research could use multivariate analyses and brain-network techniques to examine, from a systems-neuroscience perspective, how large-scale brain networks implement different computations. In addition, future research should emphasize ecological validity, using hyperscanning to study social learning during real interactions, and pay more attention to the computational and neural mechanisms of implicit social learning.

16.
Three categories of behavior analysis may be called molecular, molar, and unified. Molecular analyses focus on how manual shaping segments moment-to-moment behaving into new, unified, hierarchically organized patterns. Manual shaping is largely atheoretical, qualitative, and practical. Molar analyses aggregate behaviors and then compute a numerical average for the aggregate. Typical molar analyses involve average rate of, or average time allocated to, the aggregated behaviors. Some molar analyses have no known relation to any behavior stream. Molar analyses are usually quantitative and often theoretical. Unified analyses combine automated shaping of moment-to-moment behaving and molar aggregates of the shaped patterns. Unified controlling relations suggest that molar controlling relations like matching confound shaping and strengthening effects of reinforcement. If a molecular analysis is about how reinforcement organizes individual behavior moment by moment, and a molar analysis is about how reinforcement encourages more or less of an activity aggregated over time, then a unified analysis handles both kinds of analyses. Only theories engendered by computer simulation appear to be able to unify all three categories of behavior analysis.

17.
Human risk taking is characterized by a large amount of individual heterogeneity. In this study, we applied resting-state electroencephalography, which captures stable individual differences in neural activity, before subjects performed a risk-taking task. Using a source-localization technique, we found that the baseline cortical activity in the right prefrontal cortex predicts individual risk-taking behavior. Individuals with higher baseline cortical activity in this brain area display more risk aversion than do other individuals. This finding demonstrates that neural characteristics that are stable over time can predict a highly complex behavior such as risk-taking behavior and furthermore suggests that hypoactivity in the right prefrontal cortex might serve as a dispositional indicator of lower regulatory abilities, which is expressed in greater risk-taking behavior.

18.
The feedback negativity (FN), an early neural response that differentiates rewards from losses, appears to be generated in part by reward circuits in the brain. A prominent model of the FN suggests that it reflects learning processes by which environmental feedback shapes behavior. Although there is evidence that human behavior is more strongly influenced by rewards that quickly follow actions, in nonlaboratory settings, optimal behaviors are not always followed by immediate rewards. However, it is not clear how the introduction of a delay between response selection and feedback impacts the FN. Thus, the present study used a simple forced choice gambling task to elicit the FN, in which feedback about rewards and losses was presented after either 1 or 6 s. Results suggest that, at short delays (1 s), participants clearly differentiated losses from rewards, as evidenced in the magnitude of the FN. At long delays (6 s), on the other hand, the difference between losses and rewards was negligible. Results are discussed in terms of eligibility traces and the reinforcement learning model of the FN.
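The eligibility-trace account invoked in this abstract can be sketched simply: the credit a delayed reward can assign to an earlier choice decays with the delay before feedback, so little of the trace survives a 6 s delay compared with a 1 s delay. An exponential decay is one standard choice; the time constant below is an assumption for illustration, not a value estimated from the study.

```python
import math

def trace_remaining(delay_s, tau=1.5):
    """Fraction of an exponentially decaying eligibility trace left
    after `delay_s` seconds (illustrative time constant `tau`)."""
    return math.exp(-delay_s / tau)

credit_short = trace_remaining(1.0)   # feedback after 1 s
credit_long = trace_remaining(6.0)    # feedback after 6 s
```

On this account, the negligible FN difference at the 6 s delay would reflect the near-complete decay of the trace by the time feedback arrives.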

19.
We studied the choice behavior of 2 monkeys in a discrete-trial task with reinforcement contingencies similar to those Herrnstein (1961) used when he described the matching law. In each session, the monkeys experienced blocks of discrete trials at different relative-reinforcer frequencies or magnitudes with unsignalled transitions between the blocks. Steady-state data following adjustment to each transition were well characterized by the generalized matching law; response ratios undermatched reinforcer frequency ratios but matched reinforcer magnitude ratios. We modelled response-by-response behavior with linear models that used past reinforcers as well as past choices to predict the monkeys' choices on each trial. We found that more recently obtained reinforcers more strongly influenced choice behavior. Perhaps surprisingly, we also found that the monkeys' actions were influenced by the pattern of their own past choices. It was necessary to incorporate both past reinforcers and past choices in order to accurately capture steady-state behavior as well as the fluctuations during block transitions and the response-by-response patterns of behavior. Our results suggest that simple reinforcement learning models must account for the effects of past choices to accurately characterize behavior in this task, and that models with these properties provide a conceptual tool for studying how both past reinforcers and past choices are integrated by the neural systems that generate behavior.
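The generalized matching law used to characterize the steady-state data has the form log(B1/B2) = a * log(R1/R2) + log(b), where a sensitivity a < 1 indicates undermatching and b captures bias. The sketch below illustrates that distinction; the sensitivity values and the 4:1 reinforcer ratio are illustrative, not estimates from the monkeys' data.

```python
# Generalized matching law sketch: predicted response ratio B1/B2 as a
# power function of the reinforcer ratio R1/R2.

def response_ratio(reinf_ratio, a, b=1.0):
    """Predicted behavior ratio B1/B2 for reinforcer ratio R1/R2."""
    return b * reinf_ratio ** a

# Undermatching of reinforcer frequency (a < 1) vs. strict matching of
# reinforcer magnitude (a = 1), the qualitative pattern reported above.
freq_pred = response_ratio(4.0, a=0.7)   # undermatches the 4:1 ratio
mag_pred = response_ratio(4.0, a=1.0)    # matches the 4:1 ratio
```

With a < 1 the predicted response ratio falls between indifference (1:1) and strict matching (4:1), which is what "undermatching" denotes.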

20.
Although it is widely known that brain regions such as the prefrontal cortex, the amygdala, and the ventral striatum play large roles in decision making, their precise contributions remain unclear. Here, we used functional magnetic resonance imaging and principles of reinforcement learning theory to investigate the relationship between current reinforcements and future decisions. In the experiment, subjects chose between high-risk (i.e., low probability of a large monetary reward) and low-risk (high probability of a small reward) decisions. For each subject, we estimated value functions that represented the degree to which reinforcements affected the value of decision options on the subsequent trial. Individual differences in value functions predicted not only trial-to-trial behavioral strategies, such as choosing high-risk decisions following high-risk rewards, but also the relationship between activity in prefrontal and subcortical regions during one trial and the decision made in the subsequent trial. These findings provide a novel link between behavior and neural activity by demonstrating that value functions are manifested both in adjustments in behavioral strategies and in the neural activity that accompanies those adjustments.
