Similar Documents
 20 similar documents found.
1.
Botvinick MM, Niv Y, Barto AC. Cognition, 2009, 113(3): 262-280.
Research on human and animal behavior has long emphasized its hierarchical structure: the divisibility of ongoing behavior into discrete tasks, which are composed of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might complement existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior.
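As a concrete, deliberately toy illustration of the "reusable subroutines or skills" idea, the sketch below implements an option in the sense of the options framework commonly used in hierarchical reinforcement learning: a sub-policy packaged with an initiation set and a termination condition, executed as a single temporally extended action. The corridor environment, the Option class, and all values are illustrative assumptions, not taken from the paper.

```python
# A minimal "option" (temporally extended action) sketch; toy example, not the paper's model.
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    can_start: Callable[[int], bool]     # initiation set I(s)
    policy: Callable[[int], int]         # intra-option policy: state -> primitive action
    should_stop: Callable[[int], float]  # termination probability beta(s)

def run_option(state: int, option: Option, step: Callable[[int, int], int]) -> int:
    """Execute the option as one temporally extended action; return the state where it terminates."""
    assert option.can_start(state)
    while True:
        state = step(state, option.policy(state))
        if random.random() < option.should_stop(state):
            return state

# Toy 1-D corridor: states 0..10; primitive actions move +1 or -1.
step = lambda s, a: max(0, min(10, s + a))

# A reusable skill: "walk to the doorway at state 5", whatever the start state.
go_to_door = Option(
    can_start=lambda s: s != 5,
    policy=lambda s: 1 if s < 5 else -1,
    should_stop=lambda s: 1.0 if s == 5 else 0.0,
)
print(run_option(0, go_to_door, step))  # -> 5
```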

2.
Adaptive learning models are used to predict behavior in repeated choice tasks. Predictions can be based on previous payoffs or on previous choices of the player. The current paper proposes a new method for evaluating the degree of reliance on past choices, called equal payoff series extraction (EPSE). Under this method, a simulated player makes exactly the same choices as the actual player but receives equal, constant payoffs from all of the alternatives. Success in predicting this simulated player's next choice therefore relies strictly on mimicry of the actual player's previous choices. This makes it possible to determine the marginal fit of predictions that are not based on the actual task payoffs. To evaluate the reliance on past choices under different models, an experiment was conducted in which 48 participants completed a three-alternative choice task in four task conditions. Two learning rules were evaluated: an interference rule and a decay rule. The results showed that while the predictions of the decay rule relied more on past choices, only the reliance on past payoffs was associated with improved parameter generality. Moreover, we show that the Equal Payoff Series can be used as a criterion for optimizing parameters, resulting in better parameter generalizability.
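To make the EPSE logic concrete, the sketch below feeds a simple decay-rule learner the participant's actual choice sequence but a constant, equal payoff for every alternative; whatever one-step-ahead accuracy remains must come from mimicry of the choice history rather than from payoffs. The decay-rule form, parameter values, and example choice sequence are assumptions for illustration, not the paper's exact model or data.

```python
# EPSE-style check: how well can a decay-rule learner predict choices when all payoffs are equal?
import numpy as np

def epse_accuracy(choices, n_alternatives=3, decay=0.9, weight=1.0, const_payoff=1.0):
    """One-step-ahead accuracy of a decay-rule learner trained on equal, constant payoffs."""
    q = np.zeros(n_alternatives)
    hits = 0
    for t, choice in enumerate(choices):
        if t > 0:
            hits += int(np.argmax(q) == choice)   # predict before seeing the actual choice
        q *= decay                                 # decay all propensities
        q[choice] += weight * const_payoff         # same payoff regardless of what was chosen
    return hits / max(len(choices) - 1, 1)

# Example: a participant who mostly repeats alternative 0 is easy to predict
# from choice history alone, even with no payoff information.
print(epse_accuracy([0, 0, 1, 0, 0, 0, 2, 0, 0, 0]))
```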

3.
Analysis of binary choice behavior in iterated tasks with immediate feedback reveals robust deviations from maximization that can be described as indications of 3 effects: (a) a payoff variability effect, in which high payoff variability seems to move choice behavior toward random choice; (b) underweighting of rare events, in which alternatives that yield the best payoffs most of the time are attractive even when they are associated with a lower expected return; and (c) loss aversion, in which alternatives that minimize the probability of losses can be more attractive than those that maximize expected payoffs. The results are closer to probability matching than to maximization. The best approximation is provided by a model of reinforcement learning among cognitive strategies (RELACS). This model captures the 3 deviations, the learning curves, and the effect of information on uncertainty avoidance. It outperforms other models in fitting the data and in predicting behavior in other experiments.
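As a toy illustration of deviation (b), underweighting of rare events (this is not the RELACS model itself), the simulation below pits a sure payoff of 3 against a risky option paying 32 with probability 0.1 (expected value 3.2). A simple recency-weighted learner relying on its own obtained payoffs chooses the risky, higher-expected-value option only rarely. All parameter values are arbitrary assumptions.

```python
# Illustrative simulation of underweighting rare events in experience-based choice.
import random

def risky_choice_rate(trials=400, alpha=0.3, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0, 0.0]          # 0 = safe: 3 for sure; 1 = risky: 32 with prob 0.1, else 0
    risky = 0
    for _ in range(trials):
        if rng.random() < epsilon or q[0] == q[1]:
            choice = rng.randrange(2)                 # occasional exploration
        else:
            choice = 0 if q[0] > q[1] else 1          # otherwise exploit
        payoff = 3.0 if choice == 0 else (32.0 if rng.random() < 0.1 else 0.0)
        q[choice] += alpha * (payoff - q[choice])     # recency-weighted update
        risky += choice
    return risky / trials

print(f"risky choice rate: {risky_choice_rate():.2f}")  # typically well below 0.5
```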

4.
5.
The purpose of the popular Iowa gambling task is to study decision making deficits in clinical populations by mimicking real-life decision making in an experimental context. Busemeyer and Stout [Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14, 253-262] proposed an “Expectancy Valence” reinforcement learning model that estimates three latent components which are assumed to jointly determine choice behavior in the Iowa gambling task: weighing of wins versus losses, memory for past payoffs, and response consistency. In this article we explore the statistical properties of the Expectancy Valence model. We first demonstrate the difficulty of applying the model at the level of a single participant; we then propose and implement a Bayesian hierarchical estimation procedure to coherently combine information from different participants; and we finally apply the Bayesian estimation procedure to data from an experiment designed to provide a test of specific influence.
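The sketch below is a minimal rendering of the three-component structure described above (an attention weight for losses vs. wins, a recency/updating parameter for memory of past payoffs, and a response-consistency parameter), following the commonly cited Expectancy Valence formulation; the exact equations, parameter values, and toy payoffs are illustrative and may differ from the authors' implementation.

```python
# Sketch of an Expectancy Valence-style model for a 4-deck gambling task.
import numpy as np

def ev_choice_probs(wins, losses, choices, w=0.4, a=0.2, c=0.5):
    """Return trial-by-trial choice probabilities given observed wins, losses, and deck choices."""
    n_trials = len(choices)
    ev = np.zeros(4)                                  # expectancies for decks A-D
    probs = np.zeros((n_trials, 4))
    for t in range(n_trials):
        theta = ((t + 1) / 10.0) ** c                 # response consistency changes with experience
        z = np.exp(theta * (ev - ev.max()))           # stabilized softmax choice rule
        probs[t] = z / z.sum()
        deck = choices[t]
        valence = (1 - w) * wins[t] - w * losses[t]   # weigh wins versus losses
        ev[deck] += a * (valence - ev[deck])          # delta updating: memory for past payoffs
    return probs

# Tiny worked example with made-up payoffs (not real Iowa gambling task data):
wins, losses, choices = [100, 50, 100], [0, 0, 250], [0, 2, 0]
print(ev_choice_probs(wins, losses, choices).round(2))
```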

6.
In this paper, we study the connections between working memory capacity (WMC) and learning in the context of economic guessing games. We apply a generalized version of reinforcement learning, popularly known as the experience-weighted attraction (EWA) learning model, which has connections to specific cognitive constructs such as memory decay, the depreciation of past experience, counterfactual thinking, and choice intensity. Through the estimates of the model, we examine behavioral differences among individuals due to different levels of WMC. In accordance with 'Miller's magic number', which describes the capacity limit of working memory, we consider two different sizes (granularities) of strategy space: a larger (finer) one and a smaller (coarser) one. We find that constraining the EWA models to levels (granules) within the limits of working memory allows for a better characterization of the data based on individual differences in WMC. Using this level-reinforcement version of EWA learning, also referred to as the EWA rule learning model, we find that working memory capacity can significantly affect learning behavior. Our likelihood ratio test rejects the null hypothesis that subjects with high WMC and subjects with low WMC follow the same EWA learning model. In addition, the parameter corresponding to 'counterfactual thinking ability' is found to be reduced when working memory capacity is low.
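For readers unfamiliar with EWA, the sketch below shows the standard attraction update that the paper's model generalizes: delta weights foregone payoffs (capturing counterfactual thinking), phi decays past attractions (memory decay), rho depreciates accumulated experience, and the logit intensity lam governs choice sharpness. The parameter values and three-strategy example are placeholders, not estimates from the paper.

```python
# Standard experience-weighted attraction (EWA) update and logit choice rule (Camerer & Ho style).
import numpy as np

def ewa_step(A, N, chosen, payoffs, delta=0.5, phi=0.9, rho=0.9):
    """One EWA update. payoffs[j] = payoff strategy j would have earned this round."""
    N_new = rho * N + 1.0                              # depreciate and increment experience weight
    reinforcement = np.array([
        (delta + (1 - delta) * (j == chosen)) * payoffs[j]   # foregone payoffs weighted by delta
        for j in range(len(A))
    ])
    A_new = (phi * N * A + reinforcement) / N_new      # decay old attractions, add reinforcement
    return A_new, N_new

def choice_probs(A, lam=2.0):
    z = np.exp(lam * (A - A.max()))                    # logit choice rule (stabilized)
    return z / z.sum()

A, N = np.zeros(3), 1.0
A, N = ewa_step(A, N, chosen=1, payoffs=[2.0, 5.0, 1.0])
print(choice_probs(A).round(3))
```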

7.
We investigated preschoolers’ selective learning from models that had previously appeared to be reliable or unreliable. Replicating previous research, children from 4 years of age selectively learned novel words from reliable over unreliable speakers. Extending previous research, children also selectively learned other kinds of acts – novel games – from reliable actors. More important – and novel to this study – this selective learning was not just based on a preference for one model or one kind of act, but had a normative dimension to it. Children understood the way a reliable actor demonstrated an act not only as the better one, but as the normatively appropriate or correct one, as indicated by both their explicit verbal comments and their spontaneous normative interventions (e.g., protest, critique) in response to third-party acts deviating from the one demonstrated. These findings are discussed in the broader context of the development of children's social cognition and cultural learning.

8.
Reinforcement learning in the brain
A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision making have revealed the existence in the brain of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and to more recent data from human imaging experiments. We further extend the discussion to aspects of learning not associated with phasic dopamine signals, such as learning of goal-directed responding that may not be dopamine-dependent, and learning about the vigor (or rate) with which actions should be performed, which has been linked to tonic aspects of dopaminergic signaling. We end with a brief discussion of some of the limitations of the reinforcement learning framework, highlighting questions for future research.
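The sketch below shows the temporal difference (TD) prediction error that the abstract refers to, in its simplest value-learning form: delta = r + gamma * V(s') - V(s). The chain environment and parameter values are illustrative, not from the paper.

```python
# Minimal TD(0) value learning; delta is the reward prediction error discussed above.
def td0(episodes, gamma=0.95, alpha=0.1, n_states=5):
    V = [0.0] * n_states
    for episode in episodes:                           # each episode: list of (s, r, s_next)
        for s, r, s_next in episode:
            target = r + (gamma * V[s_next] if s_next is not None else 0.0)
            delta = target - V[s]                      # temporal difference reward prediction error
            V[s] += alpha * delta                      # dopamine-like teaching signal updates V(s)
    return V

# A 5-state chain with reward only at the end, visited repeatedly:
chain = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (3, 0, 4), (4, 1, None)]
print([round(v, 2) for v in td0([chain] * 200)])
```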

9.
Foregone payoffs add information about the outcomes of alternatives that are not chosen. The present paper examines the effect of foregone payoffs on the underweighting of rare but possible events in repeated choice tasks. Previous studies have not demonstrated any long-lasting effects of foregone payoffs (following repeated presentation of a task) when foregone payoffs do not add much information. The present paper highlights the conditions and the contributing factors for the occurrence of such long-lasting effects. An experimental study compares the effect of foregone payoffs under different degrees of rarity of the negative payoff. It is demonstrated that foregone payoffs increase selection of risky alternatives with extremely rare and highly negative outcomes, and that this effect does not diminish with repeated presentation of the task. These findings can be summarized using a surprisingly simple reinforcement-learning model. The findings are discussed in the context of the potential long-term effect of social learning.
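As a rough sketch (not necessarily the authors' exact model), foregone payoffs can be folded into a basic reinforcement-learning update by updating the chosen alternative with its obtained payoff and every unchosen alternative with the payoff it would have yielded. Because the bad outcome of the risky alternative is very rare, foregone feedback usually makes that alternative look attractive. The learning rate and example payoffs are assumptions.

```python
# Updating both obtained and foregone payoffs with a simple delta rule.
def update(q, chosen, obtained, foregone, alpha=0.2):
    """q: propensities; foregone[j]: payoff alternative j would have produced this trial."""
    for j in range(len(q)):
        payoff = obtained if j == chosen else foregone[j]
        q[j] += alpha * (payoff - q[j])
    return q

q = [0.0, 0.0]
# The safe option was chosen, but the risky option's (usually good) foregone payoff is observed too.
q = update(q, chosen=0, obtained=3.0, foregone=[3.0, 32.0])
print([round(v, 2) for v in q])
```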

10.
The anterior cingulate cortex (ACC) is commonly associated with cognitive control and decision making, but its specific function is highly debated. To explore a recent theory that the ACC learns the reward values of task contexts (Holroyd & McClure in Psychological Review, 122, 54–83, 2015; Holroyd & Yeung in Trends in Cognitive Sciences, 16, 122–128, 2012), we recorded event-related brain potentials (ERPs) from participants as they played a novel gambling task. The participants were first required to select from among three games in one “virtual casino,” and subsequently they were required to select from among three different games in a different virtual casino; unbeknownst to them, the payoffs for the games were higher in one casino than in the other. Analysis of the reward positivity, an ERP component believed to reflect reward-related signals carried to the ACC by the midbrain dopamine system, revealed that the ACC is sensitive to differences in the reward values associated with both the casinos and the games inside the casinos, indicating that participants learned the values of the contexts in which rewards were delivered. These results highlight the importance of the ACC in learning the reward values of task contexts in order to guide action selection.

11.
Optimal decision criterion placement maximizes expected reward and requires sensitivity to the category base rates (prior probabilities) and payoffs (costs and benefits of incorrect and correct responding). When base rates are unequal, human decision criterion placement is nearly optimal, but when payoffs are unequal, suboptimal decision criterion placement is observed, even when the optimal decision criterion is identical in both cases. A series of studies is reviewed that examines the generality of this finding, and a unified theory of decision criterion learning is described (Maddox & Dodd, 2001). The theory assumes that two critical mechanisms operate in decision criterion learning. One mechanism involves competition between reward and accuracy maximization: the observer attempts to maximize reward, as instructed, but also places some importance on accuracy maximization. The second mechanism involves a flat-maxima hypothesis, which assumes that the observer's estimate of the reward-maximizing decision criterion is determined by the steepness of the objective reward function that relates expected reward to decision criterion placement. The experiments used to develop and test the theory require each observer to complete a large number of trials and to participate in all conditions of the experiment. This provides maximal control over the reinforcement history of the observer and allows a focus on individual behavioral profiles. The theory is applied to decision criterion learning problems that examine category discriminability, payoff matrix multiplication and addition effects, the optimal classifier's independence assumption, and different types of trial-by-trial feedback. In every case the theory provides a good account of the data and, most important, provides useful insights into the psychological processes involved in decision criterion learning.
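The worked sketch below shows the standard signal-detection form of the optimal criterion referred to above, for two equal-variance Gaussian categories: respond "B" whenever the likelihood ratio f(x|B)/f(x|A) exceeds a threshold beta that combines base rates and payoffs. The category means, variance, and payoff numbers are illustrative, not from the reviewed experiments; they demonstrate the abstract's point that a 3:1 base-rate manipulation and a 3:1 payoff manipulation yield the same optimal criterion.

```python
# Optimal (ideal observer) decision criterion for two equal-variance Gaussian categories.
import math

def optimal_criterion(mu_a, mu_b, sigma, p_a, p_b, v_aa, v_ba, v_bb, v_ab):
    """x-value above which responding 'B' maximizes expected reward."""
    beta = (p_a * (v_aa - v_ba)) / (p_b * (v_bb - v_ab))        # base rates x payoff differences
    return (mu_a + mu_b) / 2 + (sigma ** 2) * math.log(beta) / (mu_b - mu_a)

# Unequal base rates (3:1) with symmetric payoffs ...
x_base_rate = optimal_criterion(0, 1, 1, 0.75, 0.25, 1, 0, 1, 0)
# ... versus equal base rates with 3:1 payoffs: the optimal criterion is identical.
x_payoff = optimal_criterion(0, 1, 1, 0.5, 0.5, 3, 0, 1, 0)
print(round(x_base_rate, 3), round(x_payoff, 3))  # both ~1.599
```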

12.
13.
Psychologically based rules are important in human behavior and have the potential to explain equilibrium selection and separatrix crossings to a payoff-dominant equilibrium in coordination games. We show how a rule learning theory can easily accommodate behavioral rules such as aspiration-based experimentation and reciprocity-based cooperation, and how to test for the significance of additional rules. We confront this enhanced rule learning model with experimental data on games with multiple equilibria and separatrix-crossing behavior. Maximum likelihood results do not support aspiration-based experimentation or anticipated reciprocity as significant explanatory factors, but do support a small propensity for non-aspiration-based experimentation by random belief and non-reciprocity-based cooperation.

14.
There is ample evidence that multimedia learning is challenging, and learners often underutilize appropriate cognitive processes. Previous research has applied prompts to promote the use of helpful cognitive processing. However, prompts still require learners to regulate their learning, which may interfere with learning, especially in situations where cognitive demands are already high. As an alternative, implementation intentions (i.e., if-then plans) are expected to help regulate behaviour automatically due to their specific wording, thereby offloading demands. Accordingly, this study investigated whether implementation intentions, compared with prompts, improve learning performance, especially under high cognitive load. Students (N = 120) learned either in a control condition without instructional support, with prompts, or with implementation intentions. Within each condition, half of the participants studied the multimedia instruction under conditions of either high or low cognitive load, which was experimentally manipulated by instructing them to perform one of two secondary tasks. In line with our hypotheses, the results showed that under low cognitive load, both prompts and implementation intentions led to better learning than the control condition. By contrast, under high cognitive load, only implementation intentions promoted learning. Thus, implementation intentions are an efficient means of promoting learning even under challenging circumstances.

15.
Preschool children typically do not learn words from ignorant or unreliable speakers. Here, we examined the mechanism by which these learning failures occur by modifying the comprehension test procedure that measures word learning. Following lexical training by a knowledgeable or ignorant speaker, 48 preschool-aged children were asked either a standard comprehension test question (i.e., “Which one is the blicket?”) or a question about the labeling episode (i.e., “Which one did I say is the blicket?”). Immediately after training, children chose the object labeled by an ignorant speaker when asked the episode question, but not when asked the semantic question. However, the advantage for episode questions disappeared when the same children were asked again after a brief delay. These findings show that children encode their experiences with ignorant speakers but do not form semantic representations on the basis of those experiences.

16.
In this study, 2.5-, 3-, and 4-year-olds (N = 108) participated in a novel noun generalization task in which background context was manipulated. During the learning phase of each trial, children were presented with exemplars in one or multiple background contexts. At test, children were asked to generalize to a novel exemplar in either the same or a different context. The 2.5-year-olds’ performance was supported by matching contexts; otherwise, children in this age group demonstrated context-dependent generalization. The 3-year-olds’ performance was also supported by matching contexts; however, children in this age group were also aided by training in multiple contexts. Finally, the 4-year-olds demonstrated high performance in all conditions. The results are discussed in terms of the relationship between word learning and memory processes; both general memory development and memory developments specific to word learning (e.g., retention of linguistic labels) are likely to support word learning and generalization.

17.
Categorization and concept learning encompass some of the most important aspects of behavior, but historically they have not been central topics in the experimental analysis of behavior. To introduce this special issue of the Journal of the Experimental Analysis of Behavior (JEAB), we define key terms; distinguish between the study of concepts and the study of concept learning; describe three types of concept learning characterized by the stimulus classes they yield; and briefly identify several other themes (e.g., quantitative modeling and ties to language) that appear in the literature. As the special issue demonstrates, a surprising amount and diversity of work is being conducted that either represents a behavior-analytic perspective or can inform or constructively challenge this perspective.

18.
Social learning is considered one of the hallmarks of cognition. Observers learn from demonstrators that a particular behavior pattern leads to a specific consequence or outcome, which may be either positive or negative. In the last few years, social learning has been studied in a variety of taxa, including birds and bony fish. To date, there are few studies demonstrating learning processes in cartilaginous fish. Our study shows that freshwater stingrays (Potamotrygon falkneri), a cartilaginous fish, are capable of social learning, and it isolates the processes involved. Using a task that required animals to learn to remove a food reward from a tube, we found that observers needed significantly (P < 0.01) fewer trials to learn to extract the reward than demonstrators. Furthermore, observers immediately showed a significantly (P < 0.05) higher frequency of the most efficient “suck and undulation” strategy exhibited by the experienced demonstrators, suggesting imitation. Shedding light on social learning processes in cartilaginous fish advances the systematic comparison of cognition between aquatic and terrestrial vertebrates and helps unravel the evolutionary origins of social cognition.

19.
To explain learning, comparative researchers invoke an associative construct by which immediate reinforcement strengthens animals' adaptive responses. In contrast, cognitive researchers freely acknowledge humans' explicit-learning capability to test and confirm hypotheses even without direct reinforcement. We describe a new dissociative framework that may stretch animals' learning toward the explicit pole of cognition. We discuss the neuroscience of reinforcement-based learning and suggest the possibility of disabling a dominant form of reinforcement-based discrimination learning. In that vacuum, researchers may have an opportunity to observe animals' explicit learning strategies (i.e., hypotheses, rules, task self-construals). We review initial research using this framework showing explicit learning by humans and perhaps by monkeys. Finally, we consider why complementary explicit and reinforcement-based learning systems might promote evolutionary and ecological fitness. Illuminating the evolution of parallel learning systems may also tell part of the story of the emergence of humans' extraordinary capacity for explicit-declarative cognition.

20.
The authors explore the division of labor between the basal ganglia-dopamine (BG-DA) system and the orbitofrontal cortex (OFC) in decision making. They show that a primitive neural network model of the BG-DA system slowly learns to make decisions on the basis of the relative probability of rewards but is not as sensitive to (a) recency or (b) the value of specific rewards. An augmented model that explores BG-OFC interactions is more successful at estimating the true expected value of decisions and is faster at switching behavior when reinforcement contingencies change. In the augmented model, OFC areas exert top-down control over the BG and premotor areas by representing reinforcement magnitudes in working memory. The model successfully captures patterns of behavior resulting from OFC damage in decision making, reversal learning, and devaluation paradigms and makes additional predictions about the underlying source of these deficits.
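The toy comparison below is not the authors' network model; it only illustrates the recency/switching aspect of the contrast described above. A slowly integrating value estimate (loosely BG-like) takes many trials to reverse its preference after reinforcement contingencies flip, whereas a high-recency estimate (loosely standing in for OFC holding recent outcomes in working memory) switches much sooner. The bandit task, reward probabilities, and learning rates are all assumptions.

```python
# Reversal-learning toy: low vs. high learning rate determines how quickly preference flips.
import random

def trials_to_switch(alpha, reversal_at=100, trials=200, seed=1):
    rng = random.Random(seed)
    q = [0.0, 0.0]
    switch_trial = None
    for t in range(trials):
        good = 0 if t < reversal_at else 1              # which arm pays off 80% of the time
        for arm in (0, 1):                              # full feedback on both arms, for simplicity
            p = 0.8 if arm == good else 0.2
            r = 1.0 if rng.random() < p else 0.0
            q[arm] += alpha * (r - q[arm])
        if t >= reversal_at and switch_trial is None and q[1] > q[0]:
            switch_trial = t - reversal_at              # trials after reversal until preference flips
    return switch_trial

print("slow integrator switches after", trials_to_switch(alpha=0.02), "post-reversal trials")
print("high-recency learner switches after", trials_to_switch(alpha=0.5), "post-reversal trials")
```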
