Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Botvinick MM  Niv Y  Barto AC 《Cognition》2009,113(3):262-280
Research on human and animal behavior has long emphasized its hierarchical structure—the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior.  相似文献   

2.
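Neural mechanisms of error processing   Total citations: 2, self-citations: 0, citations by others: 2
Error processing plays a key role in cognitive control and behavioral monitoring. Its main components are monitoring one's own cognition and behavioral outcomes, detecting the discrepancy between the intended response and the actual response (that is, the error), correcting the error, and preventing its recurrence. Compared with ERPs elicited by correct responses or by positive feedback stimuli signaling a correct response, ERPs elicited by erroneous responses or by negative feedback stimuli signaling an erroneous response show a relatively negative-going waveform deflection, termed the error-related negativity (ERN) and the feedback-related negativity (FRN), respectively; both are localized near the anterior cingulate cortex. Research on the neural mechanisms of error processing has focused mainly on the relationship between the anterior cingulate cortex and the ERN, between error awareness and error processing, and between the ERN and the FRN. The main accounts of the functional significance of the anterior cingulate cortex and the ERN are the error detection theory, the conflict monitoring theory, the reinforcement learning theory, and the expectancy violation hypothesis. Current research on the neural mechanisms of error processing still has problems in its content, methods, and theoretical perspectives, and these may point to directions for future research.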

3.
A recent theory holds that the anterior cingulate cortex (ACC) uses reinforcement learning signals conveyed by the midbrain dopamine system to facilitate flexible action selection. According to this position, the impact of reward prediction error signals on ACC modulates the amplitude of a component of the event-related brain potential called the error-related negativity (ERN). The theory predicts that ERN amplitude is monotonically related to the expectedness of the event: It is larger for unexpected outcomes than for expected outcomes. However, a recent failure to confirm this prediction has called the theory into question. In the present article, we investigated this discrepancy in three trial-and-error learning experiments. All three experiments provided support for the theory, but the effect sizes were largest when an optimal response strategy could actually be learned. This observation suggests that ACC utilizes dopamine reward prediction error signals for adaptive decision making when the optimal behavior is, in fact, learnable.  相似文献   

4.
Behavioral studies suggest that two affective dimensions of personality are associated with working memory (WM) function. WM load is known to modulate neural activity in the caudal anterior cingulate cortex (ACC), a brain region critical for the cognitive control of behavior. On this basis, we hypothesized that neural activity in the caudal ACC during a WM task should be associated with personality: correlated negatively with behavioral approach sensitivity (BAS) and positively with behavioral inhibition sensitivity (BIS). Using functional magnetic resonance imaging, we measured brain activity in 14 participants performing a three-back WM task. Higher self-reported BAS predicted better WM performance (r = .27) and lower WM-related activation in the caudal ACC (r = -.84), suggesting personality differences in cognitive control. The data bolster approach-withdrawal (action control) theories of personality and suggest refinements to the dominant views of ACC and personality.  相似文献   

5.
Although a growing number of studies have investigated the neural mechanisms of reinforcement learning, it remains unclear how the brain responds to feedback that is unreliable. A recent theory proposes that the reward positivity (RewP) component of the event-related brain potential (ERP) and frontal midline theta (FMT) power reflect separate feedback-related processing functions of anterior cingulate cortex (ACC). In the present study, the electroencephalogram (EEG) was recorded from participants as they engaged in a time estimation task in which feedback reliability was manipulated across conditions. After each response, they received a cue that indicated that the following feedback stimulus was 100%, 75%, or 50% reliable. The results showed that participants’ time estimates adjusted linearly according to the feedback reliability. Moreover, presentation of the cue indicating 100% reliability elicited a larger RewP-like ERP component than the other cues did, and feedback presentation elicited a RewP of approximately equal amplitude for all of the three reliability conditions. By contrast, FMT power elicited by negative feedback decreased linearly from the 100% condition to 75% and 50% condition, and only FMT power predicted behavioral adjustments on the following trials. In addition, an analysis of Beta power and cross-frequency coupling (CFC) of Beta power with FMT phase suggested that Beta-FMT communication modulated motor areas for the purpose of adjusting behavior. We interpreted these findings in terms of the hierarchical reinforcement learning account of ACC, in which the RewP and FMT are proposed to reflect reward processing and control functions of ACC, respectively.  相似文献   

6.
Inconsistencies in findings between age and perceived locus of control of reinforcement were examined in light of social learning theory. Absence of work was hypothesized to reduce opportunities for reinforcement and thus expectancies. No differences were found in internal-external (I-E) locus of control among nine age groups (20 to 65 years) for subjects (882 school teachers) during the span of their work lives. It seems that I-E depends on the frequency and intensity of expectancies for behavior reinforcement sequences that work affords. Before and after work life there is not only less to control, but many of the nonwork reinforcers are not contingent on one's own behavior. Relinquishing internal control and a shift of focus toward reflection on experience and meaning of life may well be a desirable and natural process for older people.

7.
This study asked whether the concurrent reinforcement of behavioral variability facilitates learning to emit a difficult target response. Sixty students repeatedly pressed sequences of keys, with an originally infrequently occurring target sequence consistently being followed by positive feedback. Three conditions differed in the feedback given to non-target sequences: concurrent positive feedback presented contingent on response variability, positive feedback presented non-contingently, or no reinforcement for any non-target responses (control condition). Contrary to the result of analogous rat studies, if anything, the participants in the control condition more readily learned to emit the target sequence than did the subjects in each of the other two conditions. It is argued that these contradictory findings are primarily caused by procedural differences, such as differences in the density of the reinforcement schedule applied to non-target behavior, rather than reflecting a true species difference.  相似文献   

8.
A number of theories suggest that people behave similarly in similar situations. Social learning theory in particular suggests that people behave similarly in situations perceived to be similar in their pattern of reinforcement contingencies. This study used two measures of perception of behavior similarity and three measures of perception of situation similarity for 20 situations chosen by each of 11 female subjects as being characteristic of her current life. Measures of perceived behavior similarity included paired comparison judgments and analyses of similarity of ratings of behavior probabilities. Measures of perceived situation similarity included paired comparison judgments and analyses of similarity of ratings of outcome or reinforcement contingencies for the specified behaviors, including both internal and external reinforcers. In addition, reliability estimates were obtained on some tasks. Results indicated the following: (1) Generally there was a statistically significant relationship between measures of perceived situation similarity and measures of perceived behavior similarity. The magnitude of the relationship varied considerably from subject to subject. (2) Measures of the same variables did not show better agreement with one another than they did with measures of the different variables, despite evidence of adequate reliability. The data suggested general support for social learning theory but also evidence that factors other than perceived reinforcers in the situation influence how situations are perceived and how people behave in them.

9.
The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of differential probabilities, magnitudes, variabilities, and delay of reinforcement. The model can also produce the violation of independence, preference reversals, and the goal gradient of reinforcement in maze learning. An experiment was conducted to study learning of action sequences in a multistep task. The fit of the model to the data demonstrated its ability to account for complex skill learning. The advantages of incorporating the mechanism into a larger cognitive architecture are discussed.  相似文献   

10.
The importance of cognition in the facilitation and reinforcement of criminal behavior has been highlighted and recognized in numerous offender populations. In particular, professionals have theorized that various offender populations hold offense-supportive schemas or implicit theories that require treatment in therapy. However, the role of cognition in deliberate firesetting has received no focused conceptual or theoretical attention. Using current research evidence and theory relating to general cognition and the characteristics of firesetters, this paper outlines a preliminary conceptual framework of the potential cognitions (in the form of implicit theories) that are likely to characterize firesetters. Five implicit theories are proposed that may be associated with firesetting behavior. The content, structure, and etiological functions of these implicit theories are described as well as the cognitive similarities between firesetters and other offender types. Future research implications and practical implications of the proposed implicit theories are also discussed.  相似文献   

11.
Helplessness, a belief that the world is not subject to behavioral control, has long been central to our understanding of depression, and has influenced cognitive theories, animal models and behavioral treatments. However, despite its importance, there is no fully accepted definition of helplessness or behavioral control in psychology or psychiatry, and the formal treatments in engineering appear to capture only limited aspects of the intuitive concepts. Here, we formalize controllability in terms of characteristics of prior distributions over affectively charged environments. We explore the relevance of this notion of control to reinforcement learning methods of optimising behavior in such environments and consider how apparently maladaptive beliefs can result from normative inference processes. These results are discussed with reference to depression and animal models thereof.  相似文献   

12.
Recent advances have allowed the application of behaviorism's rigor to the control of complex cognitive tasks in animals. This article examines recent research on serially organized behavior in animals. 'Chaining theory', the traditional approach to the study of such behavior, reduces intelligent action to sequences of discrete stimulus-response units in which each overt response is evoked by a particular stimulus. However, such theories are too weak to explain many forms of serially organized cognition, both in humans and animals. By training non-human primates to produce arbitrary sequences that cannot be learned as chains of particular motor responses, the simultaneous chaining paradigm has overcome limitations of chaining theory in experiments on serial expertise, the use of numerical rules, knowledge of ordinal position, and distance and magnitude effects.  相似文献   

13.
Some studies have found that extinction leaves response structures unaltered; others have found that response variability is increased. Responding by Long-Evans rats was extinguished after 3 schedules. In one, reinforcement depended on repetitions of a particular response sequence across 3 operanda. In another, sequences were reinforced only if they varied. In the third, reinforcement was yoked: not contingent upon repetitions or variations. In all cases, rare sequences increased during extinction--variability increased--but the ordering of sequence probabilities was generally unchanged, the most common sequences during reinforcement continuing to be most frequent in extinction. The rats' combination of generally doing what worked before but occasionally doing something very different may maximize the possibility of reinforcement from a previously bountiful source while providing necessary variations for new learning.  相似文献   

14.
An important issue in the field of learning is to what extent one can distinguish between behavior resulting from either belief or reinforcement learning. Previous research suggests that it is difficult or even impossible to distinguish belief from reinforcement learning: belief and reinforcement models often fit the empirical data equally well. However, previous research has been confined to specific games in specific settings. In the present study we derive predictions for behavior in games using the EWA learning model (e.g., Camerer & Ho, 1999), a model that includes belief learning and a specific type of reinforcement learning as special cases. We conclude that belief and reinforcement learning can be distinguished, even in 2×2 games. Maximum differentiation in behavior resulting from either belief or reinforcement learning is obtained in games with pure Nash equilibria with negative payoffs and at least one other strategy combination with only positive payoffs. Our results help researchers to identify games in which belief and reinforcement learning can be discerned easily.  相似文献   

15.
This experiment investigated social referencing as a form of discriminative learning in which maternal facial expressions signaled the consequences of the infant's behavior in an ambiguous context. Eleven 4- and 5-month-old infants and their mothers participated in a discrimination-training procedure using an ABAB design. Different consequences followed infants' reaching toward an unfamiliar object depending on the particular maternal facial expression. During the training phases, a joyful facial expression signaled positive reinforcement for the infant reaching for an ambiguous object, whereas a fearful expression signaled aversive stimulation for the same response. Baseline and extinction conditions were implemented as controls. Mothers' expressions acquired control over infants' approach behavior for all participants. All participants ceased to show discriminated responding during the extinction phase. The results suggest that 4- and 5-month-old infants can learn social referencing via discrimination training.  相似文献   

16.
The theme is proposed that a learning theory of vocational behavior could contribute to both understanding and theory development in the area of vocational decisions. The discussion includes a definition of vocational decisions and specification of three functions of a theory. The functions are explanation, prediction, and control. The developmental, psychoanalytic, and trait approaches to theories of vocational decisions are contrasted with a learning approach in terms of the definition of decisions and the functions of theories. An example of vocational counseling illustrates the application of learning concepts to vocational decision behavior. The major advantages of a learning theory approach are that it has the potential for both accurate prediction and extensive control of decisions.  相似文献   

17.
The numerous mechanisms of behavior change in infant development are sometimes difficult to distinguish. Although it is agreed that elicitation and reinforcement both influence infant learning, the distinction between these two learning mechanisms was clarified in response to K. Bloom's (1984, Journal of Experimental Child Psychology, 38, 93-102) commentary. The theoretical and methodological assumptions of a functional analysis of infant behavior were made explicit in the context of the C. L. Poulson study (1983, Journal of Experimental Child Psychology, 36, 471-489). The rationale for the use of DRO schedules to control for elicitation effects of continuous reinforcement and the inadequacy of noncontingent schedules for this purpose were also discussed.

18.
An experiment with rats examined the roles of demarcating stimuli and differential reinforcement probability on the development of functional response units. It examined the development of units in a probabilistic, free-operant situation in which the presence of demarcating stimuli was manipulated. In all conditions, behavior became organized into two-response sequences framed by changes in local reinforcement probability. A tone demarcating the beginning and end of contingent response sequences facilitated the development of functional response units, as in chunking, but the same units developed slowly in the absence of the tone. Complex functional response units developed even though reinforcement contingencies remained constant. These findings demonstrate that models of operant learning must include a mechanism for changing the response unit as a function of reinforcement history. Markov models may seem to be a natural technique for modeling response sequences because of their ability to predict individual responses as a function of reinforcement history; however, no class of Markov chain can incorporate changing response units in their predictions.

19.
Persistence refers to the extent to which an individual pursues reinforcement that is no longer available. The most common generalization regarding persistence is the partial reinforcement extinction effect, which states that partial, rather than continuous, reinforcement creates the greatest level of persistence. Although the partial reinforcement effect is the most common effect in humans, exceptions exist, namely the generalized and the reversed partial reinforcement effect. Since the 1930s, psychologists have used 2 general paradigms for studying persistence in humans: the experimental paradigm and the cognitive/individual differences paradigm. For the experimental paradigm, the primary independent variable is the schedule of reinforcement used to establish the behavior prior to the removal of reinforcement. Explanations of persistence from the experimental perspective depend on associative principles derived from various theories of learning. By contrast, the cognitive/individual differences paradigm treats persistence as a function of trait variables, including locus of control and self-esteem, or general cognitive processes, such as cognitive dissonance or social cognition. In this article, the author reviews the status of the current literature on persistence and recommends directions for future research.  相似文献   

20.
Self-stimulatory behavior is repetitive, stereotyped, functionally autonomous behavior seen in both normal and developmentally disabled populations, yet no satisfactory theory of its development and major characteristics has previously been offered. We present here a detailed hypothesis of the acquisition and maintenance of self-stimulatory behavior, proposing that the behaviors are operant responses whose reinforcers are automatically produced interoceptive and exteroceptive perceptual consequences. The concept of perceptual stimuli and reinforcers, the durability of self-stimulatory behaviors, the sensory extinction effect, the inverse relationship between self-stimulatory and other behaviors, the blocking effect of self-stimulatory behavior on new learning, and response substitution effects are discussed in terms of the hypothesis. Support for the hypothesis from the areas of sensory reinforcement and sensory deprivation is also reviewed. Limitations of major alternative theories are discussed, along with implications of the perceptual reinforcement hypothesis for the treatment of excessive self-stimulatory behavior and for theoretical conceptualizations of functionally related normal and pathological behaviors.  相似文献   
