Reinforcement learning in the brain   Total citations: 1 (self-citations: 0, citations by others: 1)
A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and to more recent data from human imaging experiments. We further extend the discussion to aspects of learning not associated with phasic dopamine signals, such as learning of goal-directed responding that may not be dopamine-dependent, and learning about the vigor (or rate) with which actions should be performed that has been linked to tonic aspects of dopaminergic signaling. We end with a brief discussion of some of the limitations of the reinforcement learning framework, highlighting questions for future research.
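
The temporal difference reward prediction error named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `run_td_trials`, the five-step trial, the learning rate, and the discount factor are all illustrative assumptions. It uses the textbook tabular TD(0) rule, where the error is `delta = r + gamma * V(next) - V(current)`.

```python
def run_td_trials(n_steps=5, n_trials=200, alpha=0.1, gamma=1.0, reward=1.0):
    """Tabular TD(0) on a trial of n_steps states (hypothetical setup).

    The reward is delivered on the transition out of the last state.
    Returns the learned state values and the per-trial prediction errors.
    """
    # values[n_steps] is the terminal state; it is never updated and stays 0.
    values = [0.0] * (n_steps + 1)
    history = []
    for _ in range(n_trials):
        deltas = []
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0
            # Temporal difference reward prediction error.
            delta = r + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return values, history


values, history = run_td_trials()
first_error_at_reward = history[0][-1]   # reward is fully unexpected
last_error_at_reward = history[-1][-1]   # reward is now predicted
```

Under these assumptions the error at the time of reward is large on the first trial and shrinks toward zero as the reward becomes predicted, while the learned values of earlier states rise toward the reward magnitude.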

I have tried to sketch an approach to the complex phenomena that go by the name of ‘mindfulness’ that both does justice to this complexity and depth, and also offers a way of thinking about mindfulness in evolutionary, ecosocial and neural terms: terms that enable us to ask questions like: where did mindfulness come from? What kind of consciousness is it? What was it for, before it was co-opted by spiritual and therapeutic kinds of discourse and practice? And how do brains do it? In essence, I am suggesting that human brains seem to have developed, for good evolutionary reasons, a degree of facility with imaginative empathy and as-if identification; and that mindfulness capitalises on this to create what is probably a uniquely human form of learning—or rather unlearning.

Memory is composed of several different abilities that are supported by different brain systems. The distinction between declarative (conscious) and nondeclarative (non-conscious) memory has proved useful in understanding the nature of eyeblink classical conditioning – the best understood example of classical conditioning in vertebrates. In delay conditioning, the standard procedure, conditioning depends on the cerebellum and brainstem and is intact in amnesia. Trace conditioning, a variant of the standard procedure, depends additionally on the hippocampus and neocortex and is impaired in amnesia. Recent studies have sharpened the contrast between delay and trace conditioning by exploring the importance of awareness. We discuss these new findings in relation to the brain systems supporting eyeblink conditioning and suggest why awareness is important for trace conditioning but not for delay conditioning.

The presence of complementary information across multiple sensory or motor modalities during learning, referred to as multimodal enrichment, can markedly benefit learning outcomes. Why is this? Here, we integrate cognitive, neuroscientific, and computational approaches to understanding the effectiveness of enrichment and discuss recent neuroscience findings indicating that crossmodal responses in sensory and motor brain regions causally contribute to the behavioral benefits of enrichment. The findings provide novel evidence for multimodal theories of enriched learning, challenge assumptions of longstanding cognitive theories, and provide counterevidence to unimodal neurobiologically inspired theories. Enriched educational methods are likely effective not only because they may engage greater levels of attention or deeper levels of processing, but also because multimodal interactions in the brain can enhance learning and memory.

The experiment reviewed here was an attempt to show that two differential Pavlovian conditioning designs, namely positive and negative patterning, can best be understood as rule learning. First, it is shown that positive patterning is equivalent to the logical rule of conjunction (AND) and that negative patterning is equivalent to the logical rule of exclusive disjunction (XOR). It is assumed that in order to learn both kinds of discrimination subjects learn to use the corresponding rule. If this is the case, the observed differentiation should be independent of the number of reinforcements for each individual stimulus. Second, subjects should be able to transfer the rule to new stimuli. Forty human subjects were randomly divided into four groups (N=10 each). Two factors were manipulated independently between subjects: (1) positive vs negative patterning, and (2) 2 vs 4 pairs of trained stimuli. Second-interval skin conductance responses were measured. During initial acquisition positive as well as negative patterning occurred independently of number of pairs of trained stimuli (with total amount of training kept constant). Furthermore, AND as well as XOR could be transferred to new stimuli.
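
The equivalence between the two patterning designs and the logical rules AND and XOR can be spelled out as a small truth table. This is an illustrative sketch only: the function names and the stimulus labels A and B are assumptions, and it encodes the standard definitions of the designs (positive patterning reinforces only the compound AB; negative patterning reinforces A or B alone but not the compound).

```python
from itertools import product


def positive_patterning(a: bool, b: bool) -> bool:
    """Reinforced only when both stimuli are present: logical AND."""
    return a and b


def negative_patterning(a: bool, b: bool) -> bool:
    """Reinforced when exactly one stimulus is present: logical XOR."""
    return a != b


def reinforcement_table():
    """List (A present, B present, reinforced under AND, reinforced under XOR)."""
    return [
        (a, b, positive_patterning(a, b), negative_patterning(a, b))
        for a, b in product([False, True], repeat=2)
    ]


for a, b, pos, neg in reinforcement_table():
    print(f"A={a!s:5} B={b!s:5} positive={pos!s:5} negative={neg!s:5}")
```

Because each rule depends only on which stimuli are present, and not on the identity of A and B, the same two functions apply unchanged to a new stimulus pair, which is the sense of rule transfer described in the abstract.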

In August 2013, the Western Cape Government adopted an Integrated Provincial Violence Prevention Policy Framework initiated by the provincial Department of Health in response to the unusually high incidence of, and health burden arising from, interpersonal violence. The policy framework encompasses a more comprehensive intersectoral approach to the prevention of violence than the traditional criminal justice and security-centred approach typically promoted in South Africa as the conventional wisdom. It aims to bring coherence and clarity to the government's objectives in the field of violence prevention by way of a whole-of-government approach encompassing all sectors. The Policy Framework attempts to balance short-term evidence-based interventions, such as reducing the availability and harmful use of alcohol, with longer term interventions that require the state and all citizens to take active responsibility in addressing more holistically the complex social norms that support violence. It is consonant with a “whole-of-society” approach current in the South African polity to policy formulation and implementation, and is underpinned by the public health-centred guidelines set out by the international Global Campaign for the Prevention of Violence. The policy framework supports evidence-based approaches for violence prevention and a review and consultation process aimed at aligning existing performance priorities and deliverables across departments. One year after its adoption we review the uptake of this policy and reflect on some of its early successes as well as barriers to its implementation. We identify early resistance arising from its conflict with intra-departmental priorities, the impact of competing policies and directives, and we propose a research agenda to support its uptake.

Decision making is a core competence for animals and humans acting and surviving in environments they only partially comprehend, gaining rewards and punishments for their troubles. Decision-theoretic concepts permeate experiments and computational models in ethology, psychology, and neuroscience. Here, we review a well-known, coherent Bayesian approach to decision making, showing how it unifies issues in Markovian decision problems, signal detection psychophysics, sequential sampling, and optimal exploration, and discuss paradigmatic psychological and neural examples of each problem. We discuss computational issues concerning what subjects know about their task and how ambitious they are in seeking optimal solutions; we address algorithmic topics concerning model-based and model-free methods for making choices; and we highlight key aspects of the neural implementation of decision making.

Lindahl, Maj-Britt. Awareness, conditioning, and information processing in complex learning situations. Scand. J. Psychol., 1973, 14, 121–130. Learning without awareness was studied with a method involving three phases for each of 2 × 10 subjects: (1) a problem solving phase, (2) a phase of applying the solution found in (1), and (3) a test phase. In (2) the solution was applied to the classification of 50 successively presented instances. These were in one condition sampled so that a perceptually easily discriminable basis of classification obtained that was different from and redundant to the one of the solution of phase (1). The results of the test phase showed that this redundant basis of classification influenced the performance of the subjects in this condition despite unawareness of it according to verbal reports. The phenomenon was interpreted as a case of learning without awareness, describable in conditioning terms, that influenced performance. The limitations of this type of learning were discussed, and its relationship to an aware information processing type of learning was outlined.

In the first of two parts in which a general mathematical theory of non-symbolic learning and conditioning is constructed, the sections of the theory dealing with non-symbolic learning and conditioning are presented, and a number of its qualitative implications are compared with available experimental results. In general, the agreement is found to be rather close.

The second of two parts of this article extends a mathematical theory of non-symbolic learning and conditioning to cases where reward and punishment are involved. The preceding results are generalized to the case where stimuli and responses are related psychophysically, thus constituting a theory of transfer, generalization, and discrimination.

Stable leg-flexion CRs were successfully elaborated in cats receiving tone-strong shock pairings, but not in cats receiving tone-weak shock pairings. Both shock USs elicited reliable flexion URs in the presence of the CS, thus satisfying the contiguity requirement basic to the Pavlovian paradigm. Elaboration of the flexion CRs required a large number of trials relative to the conditioned freezing and decelerative heart rate responses which appeared after only a very few trials in the strong-US cats. As with flexion CRs, freezing and heart-rate responses never developed with the weak-shock US. When the weak-US cats were later switched to the strong US, freezing and heart-rate CRs quickly appeared and flexion CRs appeared after fewer strong-US trials than in cats receiving the strong shock originally. The results were interpreted as supporting a reinforcement conception of classical defense conditioning and as indicating the importance of using a US capable of eliciting emotional responses.

Community psychology gained formal recognition in Ghana when a few students were admitted to Wilfrid Laurier University, in Canada, to pursue master's degrees in the early 1990s. In Ghana, community psychology is enacted through the operations of non-governmental organizations (NGOs) and professionals. The university classroom is also being used as the main context for introducing people to the field of community psychology. In comparison with the work of community psychologists in countries such as the United States of America and Canada, the field is still underdeveloped in Ghana. “Small wins,” a term referring to the process of achieving an intervention objective through gradual and incremental successes, are considered examples of “best practice” in Ghana, where religion and superstition are at the heart of almost every activity. Despite the current challenges, community psychology has a promising future in Ghana.

Three experiments investigated the effects of reinforcement magnitude on conditioned key pecking in pigeons. Experiment 1, which included between-groups and within-subject designs, yielded significant effects of unconditioned stimulus (US) magnitude on the within-conditioned stimulus (CS) distribution of key pecks and on choice behavior, but no effect on the overall rate of key pecking. Experiment 2 employed a larger US-magnitude difference in a within-subject design. This manipulation resulted in differential rates of key pecking as well as a significant choice effect and differential within-CS key-peck distributions. A second-order conditioning procedure was used in Experiment 3, in which diffuse, visual stimuli (S1's) served as Pavlovian reinforcers for two key-light S2's. The S1 previously paired with a large US was more effective in conditioning second-order key-peck behavior to an S2 than was the S1 paired with a small US. The results of these experiments demonstrate that the associative effects of US magnitude can be expressed in the strength of CS-directed motor responding. The distinctive within-CS key-peck distributions in first-order conditioning suggest an interaction between CS- and US-directed responses.
