Similar Literature
20 similar articles found (search time: 15 ms).
1.
Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from outcomes that confirm their expectations than from outcomes that disconfirm them. Here, we ask whether confirmatory learning is stable across development, or whether it might be attenuated in developmental stages in which exploration is beneficial, such as in adolescence. In a reinforcement learning (RL) task, 77 participants aged 11–32 years (four men, mean age = 16.26) attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Mixed-effect models showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Age was also associated with a greater tendency to stay with an option that had just delivered a reward, more than to switch away from an option that had just delivered a punishment. At the computational level, a confirmation model provided increasingly better fit with age. This model showed that age differences are captured by decreases in noise or exploration, rather than in the magnitude of the confirmation bias. These findings provide new insights into how learning changes during development and could help better tailor learning environments to people of different ages.

Research Highlights

  • Reinforcement learning shows age-related improvement during adolescence, but more in stable learning environments compared with volatile learning environments.
  • People tend to stay with an option after a win more than they shift from an option after a loss, and this asymmetry increases with age during adolescence.
  • Computationally, these changes are captured by a developing confirmatory learning style, in which people learn more from outcomes that confirm rather than disconfirm their choices.
  • Age-related differences in confirmatory learning are explained by decreases in stochasticity, rather than changes in the magnitude of the confirmation bias.
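The confirmation model described above is essentially a Q-learning rule with separate learning rates for outcomes that confirm versus disconfirm the choice, combined with a softmax choice rule whose inverse temperature captures noise/exploration. Below is a minimal sketch of that idea for a two-armed bandit; it is not the authors' implementation, and the parameter names and values (alpha_conf, alpha_disc, beta) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

def simulate_confirmation_agent(p_reward=(0.75, 0.25), n_trials=100,
                                alpha_conf=0.30, alpha_disc=0.10, beta=5.0):
    """Two-armed bandit learner that weights confirmatory outcomes more heavily.

    A positive prediction error on the chosen option confirms the choice and is
    weighted by alpha_conf; a negative one disconfirms it and is weighted by
    alpha_disc (alpha_conf > alpha_disc encodes the confirmation bias; beta
    captures choice noise/exploration).
    """
    q = np.zeros(2)
    choices, rewards = [], []
    for _ in range(n_trials):
        choice = rng.choice(2, p=softmax(q, beta))
        reward = float(rng.random() < p_reward[choice])
        delta = reward - q[choice]
        q[choice] += (alpha_conf if delta > 0 else alpha_disc) * delta
        choices.append(choice)
        rewards.append(reward)
    return np.array(choices), np.array(rewards), q

choices, rewards, q = simulate_confirmation_agent()
print("final Q-values:", q, "| chose the better option on",
      f"{(choices == 0).mean():.0%} of trials")
```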

2.
A new machine learning approach known as motivated learning (ML) is presented in this work. Motivated learning drives a machine to develop abstract motivations and choose its own goals. ML also provides a self-organizing system that controls a machine’s behavior based on competition between dynamically changing pain signals. This provides an interplay of externally driven and internally generated control signals. It is demonstrated that ML not only yields a more sophisticated learning mechanism and system of values than reinforcement learning (RL), but is also more efficient in learning complex relations and delivers better performance than RL in dynamically changing environments. In addition, this paper shows the basic neural network structures used to create abstract motivations, higher level goals, and subgoals. Finally, simulation results show comparisons between ML and RL in environments of gradually increasing sophistication and levels of difficulty.

3.
The Iowa gambling task (IGT) has been used in numerous studies, often to examine decision-making performance in different clinical populations. Reinforcement learning (RL) models such as the expectancy valence (EV) model have often been used to characterize choice behavior in this work, and accordingly, parameter differences from these models have been used to examine differences in decision-making processes between different populations. These RL models assume a strategy whereby participants incrementally update the expected rewards for each option and probabilistically select options with higher expected rewards. Here we show that a formal model that assumes a win-stay/lose-shift (WSLS) strategy—which is sensitive only to the outcome of the previous choice—provides the best fit to IGT data from about half of our sample of healthy young adults, and that a prospect valence learning (PVL) model that utilizes a decay reinforcement learning rule provides the best fit to the other half of the data. Further analyses suggested that the better fits of the WSLS model to many participants’ data were not due to an enhanced ability of the WSLS model to mimic the RL strategy assumed by the PVL and EV models. These results suggest that WSLS is a common strategy in the IGT and that both heuristic-based and RL-based models should be used to inform decision-making behavior in the IGT and similar choice tasks.
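A win-stay/lose-shift strategy of the kind described above conditions only on the previous choice and its outcome. The sketch below shows how a probabilistic WSLS rule assigns choice probabilities over four decks; the parameters p_stay_win and p_shift_loss are illustrative stand-ins, not the values estimated in the study.

```python
def wsls_choice_prob(option, prev_choice, prev_outcome_was_win, n_options=4,
                     p_stay_win=0.9, p_shift_loss=0.8):
    """Probability of choosing `option` under a probabilistic win-stay/lose-shift rule."""
    if prev_outcome_was_win:
        # stay with probability p_stay_win; otherwise spread over the other options
        if option == prev_choice:
            return p_stay_win
        return (1.0 - p_stay_win) / (n_options - 1)
    # after a loss: shift away with probability p_shift_loss
    if option == prev_choice:
        return 1.0 - p_shift_loss
    return p_shift_loss / (n_options - 1)

# Example: after a win on deck 2, deck 2 is repeated with probability 0.9,
# and the remaining probability is spread evenly over the other three decks.
probs = [wsls_choice_prob(k, prev_choice=2, prev_outcome_was_win=True) for k in range(4)]
print(probs, "sum =", sum(probs))
```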

4.
The effectiveness of the differential reinforcement of low rates of responding (DRL) contingency in suppressing response rates of septal rats was investigated by using a multiple DRL-yoked-VI (variable interval) schedule of reinforcement. The yoking procedure equated the interreinforcement times on the two schedules. Each schedule was in effect for half of each session, and the change in schedule was signaled by the presence or absence of a cue light. Schedule order and DRL delay requirement were varied. For both normal and septal rats, the response rates were higher in the VI component than in the DRL component; this effect demonstrates that the responding of septals as well as normals is suppressed by the differential reinforcement of a particular class of IRTs. A sharp difference in the level of responding occurred at the point of transition from one component of the multiple schedule to the other, which provides evidence of a discrimination between the two schedules for both normals and septals. The conclusion is that the responding of septals is suppressed by the DRL contingency and not controlled solely by the density and distribution of reinforcement.

5.
Frank MC, Tenenbaum JB. Cognition, 2011, 120(3): 360-371
Children learning the inflections of their native language show the ability to generalize beyond the perceptual particulars of the examples they are exposed to. The phenomenon of “rule learning”—quick learning of abstract regularities from exposure to a limited set of stimuli—has become an important model system for understanding generalization in infancy. Experiments with adults and children have revealed differences in performance across domains and types of rules. To understand the representational and inferential assumptions necessary to capture this broad set of results, we introduce three ideal observer models for rule learning. Each model builds on the last, allowing us to test the consequences of individual assumptions. Model 1 learns a single rule, Model 2 learns a single rule from noisy input, and Model 3 learns multiple rules from noisy input. These models capture a wide range of experimental results—including several that have been used to argue for domain-specificity or limits on the kinds of generalizations learners can make—suggesting that these ideal observers may be a useful baseline for future work on rule learning.
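Ideal-observer accounts of rule learning of this kind typically score candidate rules with Bayes' rule, using a likelihood that favors rules with smaller extensions (the size principle). The toy sketch below illustrates that inference pattern with made-up syllables and a tiny rule space; it is our own simplification, not the models from the paper.

```python
from itertools import product

syllables = ["ga", "ti", "na", "li"]

# Each candidate rule is represented by the set of three-syllable strings it allows.
rules = {
    "ABA": {(a, b, a) for a, b in product(syllables, repeat=2) if a != b},
    "ABB": {(a, b, b) for a, b in product(syllables, repeat=2) if a != b},
    "any": set(product(syllables, repeat=3)),
}

def posterior(data, rules, prior=None):
    """P(rule | data) with a size-principle likelihood: P(string | rule) = 1/|rule|."""
    prior = prior or {r: 1.0 / len(rules) for r in rules}
    scores = {}
    for name, extension in rules.items():
        like = 1.0
        for s in data:
            like *= (1.0 / len(extension)) if s in extension else 0.0
        scores[name] = prior[name] * like
    z = sum(scores.values())
    return {r: (v / z if z else 0.0) for r, v in scores.items()}

data = [("ga", "ti", "ga"), ("na", "li", "na")]   # two ABA-consistent strings
print(posterior(data, rules))                      # the ABA rule should dominate
```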

6.
An important issue in the field of learning is to what extent one can distinguish between behavior resulting from either belief or reinforcement learning. Previous research suggests that it is difficult or even impossible to distinguish belief from reinforcement learning: belief and reinforcement models often fit the empirical data equally well. However, previous research has been confined to specific games in specific settings. In the present study we derive predictions for behavior in games using the EWA learning model (e.g., Camerer & Ho, 1999), a model that includes belief learning and a specific type of reinforcement learning as special cases. We conclude that belief and reinforcement learning can be distinguished, even in 2×2 games. Maximum differentiation in behavior resulting from either belief or reinforcement learning is obtained in games with pure Nash equilibria with negative payoffs and at least one other strategy combination with only positive payoffs. Our results help researchers to identify games in which belief and reinforcement learning can be discerned easily.
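The EWA attraction update referred to above (Camerer & Ho, 1999) nests reinforcement and belief learning in a single rule. The sketch below follows the standard published form of that update; the variable names and the example payoffs are ours.

```python
import numpy as np

def ewa_update(attractions, n_prev, my_strategy, payoffs_given_opponent,
               phi=0.9, rho=0.9, delta=0.5):
    """One EWA attraction update for one player.

    attractions: attractions A_j(t-1) for each of the player's strategies j
    n_prev: experience weight N(t-1)
    my_strategy: index of the strategy actually played at time t
    payoffs_given_opponent: payoff each strategy j would have earned against
        the opponent's realized action at time t
    """
    n_new = rho * n_prev + 1.0
    played = np.zeros_like(attractions)
    played[my_strategy] = 1.0
    # foregone payoffs are weighted by delta; the chosen strategy's payoff by 1
    weights = delta + (1.0 - delta) * played
    new_attractions = (phi * n_prev * attractions
                       + weights * payoffs_given_opponent) / n_new
    return new_attractions, n_new

# Example: a 2x2 game in which the opponent just played its first strategy,
# so my two strategies would have earned these payoffs on this round.
A = np.zeros(2)                     # initial attractions
N = 1.0                             # initial experience weight
payoffs = np.array([3.0, -1.0])
A, N = ewa_update(A, N, my_strategy=0, payoffs_given_opponent=payoffs)
print(A, N)
```

With delta = 0 the update reinforces only the strategy actually played, recovering a reinforcement-learning rule; with delta = 1 and rho = phi it weights foregone payoffs equally, corresponding to weighted fictitious-play (belief) learning.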

7.
Successfully explaining and replicating the complexity and generality of human and animal learning will require the integration of a variety of learning mechanisms. Here, we introduce a computational model which integrates associative learning (AL) and reinforcement learning (RL). We contrast the integrated model with standalone AL and RL models in three simulation studies. First, a synthetic grid-navigation task is employed to highlight performance advantages for the integrated model in an environment where the reward structure is both diverse and dynamic. The second and third simulations contrast the performances of the three models in behavioral experiments, demonstrating advantages for the integrated model in accounting for behavioral data.

8.
Everitt Tom, Hutter Marcus, Kumar Ramana, Krakovna Victoria. Synthese, 2021, 198(27): 6435-6467

Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles that prevent instrumental goals for two different types of reward tampering (reward function tampering and RF-input tampering). Combined, the design principles can prevent reward tampering from being an instrumental goal. The analysis benefits from causal influence diagrams to provide intuitive yet precise formalizations.


9.
The design of recommendation strategies in adaptive learning systems focuses on utilizing currently available information to provide learners with individual-specific learning instructions. As a critical motivator of human behaviours, curiosity is essentially the drive to explore knowledge and seek information. In a psychologically inspired view, we propose a curiosity-driven recommendation policy within the reinforcement learning framework, allowing for an efficient and enjoyable personalized learning path. Specifically, a curiosity reward from a well-designed predictive model is generated to model one's familiarity with the knowledge space. Given such curiosity rewards, we apply the actor–critic method to approximate the policy directly through neural networks. Numerical analyses with a large continuous knowledge state space and concrete learning scenarios are provided to further demonstrate the efficiency of the proposed method.
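At its core, the approach described above adds an intrinsic curiosity reward (from a predictive model of familiarity) to the extrinsic reward and trains the policy with an actor-critic method. The sketch below is a heavily simplified tabular stand-in, not the authors' neural-network implementation; the toy environment, the familiarity-based bonus, and all parameter values are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 3                 # toy discrete knowledge states and learning actions
theta = np.zeros((n_states, n_actions))    # actor: policy preferences
v = np.zeros(n_states)                     # critic: state values
familiarity = np.zeros(n_states)           # crude predictive model of familiarity
alpha_actor, alpha_critic, gamma, eta = 0.1, 0.2, 0.95, 0.3

def policy(s):
    prefs = np.exp(theta[s] - theta[s].max())
    return prefs / prefs.sum()

def curiosity_bonus(s_next):
    """Curiosity reward: large for unfamiliar states, shrinking as familiarity grows."""
    bonus = 1.0 - familiarity[s_next]
    familiarity[s_next] += eta * (1.0 - familiarity[s_next])
    return bonus

def step(s, a):
    """Toy environment: actions move the learner through knowledge states."""
    s_next = (s + a) % n_states
    extrinsic = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, extrinsic

s = 0
for _ in range(500):
    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next, r_ext = step(s, a)
    r = r_ext + curiosity_bonus(s_next)          # total reward = extrinsic + curiosity
    td_error = r + gamma * v[s_next] - v[s]      # critic evaluates the transition
    v[s] += alpha_critic * td_error
    theta[s, a] += alpha_actor * td_error * (1.0 - p[a])   # actor: policy-gradient-style update
    s = s_next

print("learned state values:", np.round(v, 2))
```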

10.
In Experiment I, groups of rats were trained to press a lever for food reinforcement on differential reinforcement of low rate (DRL) schedules which differed in parameter value. A stimulus which terminated with either a 0.5-mA or 2.0-mA electric shock was then superimposed upon each DRL baseline. In general, the magnitude of conditioned suppression was an inverse function of DRL schedule parameter and a direct function of shock intensity. Experiment II demonstrated that the rate of responding maintained by the DRL component of a multiple DRL-extinction schedule decreased during a stimulus preceding a 0.5-mA shock, whereas the rate of responding maintained by the DRL component of a multiple DRL-variable interval schedule showed little change or increased slightly during a stimulus preceding a 0.5-mA shock.

11.
Previous studies have identified and manipulated collateral behavior to assess the effect of collateral behavior on performance under the differential-reinforcement-of-low-rate (DRL) schedule. However, conclusions could not be applied to subjects not observed to engage in collateral behavior. The present study used a technique that prevented the occurrence of the types of collateral behavior typically observed in the pigeon. This technique did not require the identification of collateral behavior in the subjects. The exclusion of the types of collateral behavior typically observed in pigeons resulted in higher response rates and lower reinforcement rates under large DRL values but had no effect at lower DRL values. It was concluded that collateral behavior is necessary for low response rates and high reinforcement rates under large DRL values.

12.
Spaced responding in multiple DRL schedules
Rats were able to adjust to two different temporal requirements within several multiple DRL schedules of reinforcement, and a slight induction between pairs of components was found. Initial administration of dl-amphetamine differentially disrupted spaced responding in the components of a multiple DRL 36 DRL 18 schedule, but did not eliminate discrimination between the components. After maximum drug effects, the continued administration of dl-amphetamine was accompanied by a progressive recovery of the behavior towards the characteristics of saline control.

13.
Using a questionnaire method with 283 seventh-grade and eleventh-grade students in Wuhan, this study examined the relationships among middle-school students' achievement goal orientations, learning strategies, and academic achievement. The results showed that: (1) mastery goals were significantly correlated with deep-processing, metacognitive, and surface-processing strategies, with a somewhat weaker correlation for surface-processing strategies; performance-approach goals were highly significantly correlated with all three learning strategies, whereas performance-avoidance goals were significantly correlated only with surface strategies; (2) mastery goals and performance-approach goals were conducive to academic achievement, whereas performance-avoidance goals were detrimental to obtaining good grades; (3) compared with seventh graders, eleventh graders reported significantly lower performance-approach and mastery goals, as well as lower use of deep-processing and metacognitive strategies; in both grades, boys adopted mastery goals and metacognitive strategies more than girls did; (4) students with multiple goals used deep-processing and metacognitive strategies more than students with a single goal, but multiple-goal students did not necessarily achieve better grades than single-goal students.

14.
We evaluated the effectiveness of full-session differential reinforcement of low rates of behavior (DRL) on 3 primary school children's rates of requesting attention from their teacher. Using baseline rates of responding and teacher recommendations, we set a DRL schedule that was substantially lower than baseline yet still allowed the children access to teacher assistance. The DRL schedule was effective in reducing children's requests for assistance and approval, and the teacher found the intervention highly useful and acceptable. The possible mechanisms that account for behavior change using full-session DRL schedules are discussed.

15.
Undergraduates were exposed to a series of reinforcement schedules: first, to a fixed-ratio (FR) schedule in the presence of one stimulus and to a differential-reinforcement-of-low-rate (DRL) schedule in the presence of another (multiple FR DRL training), then to a fixed-interval (FI) schedule in the presence of a third stimulus (FI baseline), next to the FI schedule under the stimuli previously correlated with the FR and DRL schedules (multiple FI FI testing), and, finally, to a single session of the multiple FR DRL schedule again (multiple FR DRL testing). Response rates during the multiple FI FI schedule were higher under the former FR stimulus than under the former DRL stimulus. This effect of remote histories was prolonged when either the number of FI-baseline sessions was small or zero, or the time interval between the multiple FR DRL training and the multiple FI FI testing was short. Response rates under these two stimuli converged with continued exposure to the multiple FI FI schedule in most cases, but quickly differentiated when the schedule returned to the multiple FR DRL.

16.
Vocal stereotypy (VS) is often observed in individuals with autism spectrum disorder (ASD) and, at high rates, can interfere with socialization or functioning in structured settings. There are multiple effective interventions available; yet, many procedures target the complete omission of the behavior or are only assessed at short intervals, making it unclear how they will generalize in applied settings. One intervention yet to be assessed as an individual intervention for automatically reinforced VS is differential reinforcement of low rates of behavior (DRL). In the present study, a functional analysis determined that the VS of two female adolescents with ASD was maintained by automatic reinforcement. A DRL procedure was implemented which incorporated: (a) a specified interval for reinforcement; (b) the behavioral expectations; (c) the permissible instances of VS within the interval; (d) learner feedback; and (e) the reset/non-reset aspect of the schedule. As the targeted behavior decreased across sessions, the DRL interval was systematically increased in order to thin the schedule of reinforcement. The intervention reduced VS and increased untargeted task engagement for both participants. Applied and theoretical implications of the study as well as social validity, limitations, and future research are discussed.

17.
In this paper we introduce a novel reinforcement learning algorithm called event-learning. The algorithm uses events, ordered pairs of two consecutive states. We define an event-value function and derive learning rules. Combining our method with a well-known robust control method, the SDS algorithm, we introduce Robust Policy Heuristics (RPH). It is shown that RPH, a fast-adapting non-Markovian policy, is particularly useful for coarse models of the environment and could be useful for some partially observed systems. RPH may be of help in alleviating the ‘curse of dimensionality’ problem. Event-learning and RPH can be used to separate the time scales of learning of value functions and adaptation. We argue that the definition of modules is straightforward for event-learning, and that event-learning makes planning feasible in the RL framework. Computer simulations of a rotational inverted pendulum with coarse discretization are shown to demonstrate the principle.
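As described above, event-learning assigns values to ordered pairs of consecutive states and chooses a desired successor state, leaving a lower-level controller (SDS/RPH in the paper) to realize the transition. The sketch below is a minimal tabular illustration under our own simplifying assumptions: a stochastic stand-in replaces the robust controller, and the update rule is a TD-style rule in the spirit of the method rather than the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states = 6
E = np.zeros((n_states, n_states))     # event values E(s, s_desired)
alpha, gamma, eps = 0.2, 0.9, 0.1

def desired_next(s):
    """Pick the desired successor state epsilon-greedily from the event values."""
    if rng.random() < eps:
        return rng.integers(n_states)
    return int(np.argmax(E[s]))

def try_transition(s, s_desired):
    """Stand-in for the lower-level controller: reaches the desired state most of the time."""
    s_next = s_desired if rng.random() < 0.8 else rng.integers(n_states)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

s = 0
for _ in range(2000):
    s_desired = desired_next(s)
    s_next, r = try_transition(s, s_desired)
    # TD-style update on the experienced event (s, s_next)
    E[s, s_next] += alpha * (r + gamma * E[s_next].max() - E[s, s_next])
    s = 0 if s_next == n_states - 1 else s_next

print(np.round(E, 2))
```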

18.
Memory persistence is a dynamic process involving the reconsolidation of memories after their reactivation. Reconsolidation impairments have been demonstrated for many types of memories in rats, and signaling at N-methyl-d-aspartate (NMDA) receptors often appears to be a critical pharmacological mechanism. Here we investigated the reconsolidation of appetitive pavlovian memories reinforced by natural rewards. In male Lister Hooded rats, systemic administration of the NMDA receptor antagonist (+)-5-methyl-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5,10-imine maleate (MK-801, 0.1 mg/kg i.p.) either before or immediately following a brief memory reactivation session abolished the subsequent acquisition of a new instrumental response with sucrose conditioned reinforcement. However, only when injected prior to memory reactivation was MK-801 effective in disrupting the maintenance of a previously acquired instrumental response with conditioned reinforcement. These results demonstrate that NMDA receptor-mediated signaling is required for appetitive pavlovian memory reconsolidation.

19.
Quantity and quality of motor exploration are proposed to be fundamental for infant motor development. However, it is still not clear what types of motor exploration contribute to learning. To determine whether changes in quantity of leg movement and/or variability of leg acceleration are related to performance in a contingency learning task, twenty 6–8-month-old infants with typical development participated in a contingency learning task. During this task, a robot provided reinforcement when the infant’s right leg peak acceleration was above an individualized threshold. The correlation coefficients between the infant’s performance and the changes from baseline in quantity of right leg movement, linear variability, and nonlinear variability of right leg movement acceleration were calculated. Simple linear regression and multiple linear regression were used to quantify the contribution of each variable to the performance individually and collectively. We found significant correlations between the performance and the change in quantity of right leg movement (r = 0.86, p < 0.001), linear variability (r = 0.71, p < 0.001), and nonlinear variability (r = 0.62, p = 0.004) of right leg movement acceleration, respectively. However, multiple linear regression showed that only quantity and linear variability of leg movements were significant predictors of the performance ratio (p < 0.001, adjusted R2 = 0.94). These results indicate that the quantity of exploration and variable exploratory strategies could be critical for the motor learning process during infancy.
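The reported statistics are Pearson correlations plus a multiple linear regression of performance on the movement measures. The sketch below reproduces those computations on synthetic placeholder data (randomly generated, not the infants' measurements).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20                                   # twenty infants in the study

# Synthetic stand-ins for the three predictors and the performance ratio
quantity = rng.normal(size=n)            # change in quantity of right-leg movement
linear_var = 0.7 * quantity + 0.3 * rng.normal(size=n)
nonlin_var = rng.normal(size=n)
performance = 1.5 * quantity + 0.8 * linear_var + 0.1 * rng.normal(size=n)

# Pearson correlation of each predictor with performance
for name, x in [("quantity", quantity), ("linear var", linear_var), ("nonlinear var", nonlin_var)]:
    r = np.corrcoef(x, performance)[0, 1]
    print(f"r({name}, performance) = {r:.2f}")

# Multiple linear regression via least squares: performance ~ intercept + predictors
X = np.column_stack([np.ones(n), quantity, linear_var, nonlin_var])
beta, *_ = np.linalg.lstsq(X, performance, rcond=None)
pred = X @ beta
ss_res = np.sum((performance - pred) ** 2)
ss_tot = np.sum((performance - performance.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - X.shape[1])   # adjust for 3 predictors + intercept
print("coefficients:", np.round(beta, 2), "adjusted R^2:", round(adj_r2, 2))
```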

20.
Botvinick MM, Niv Y, Barto AC. Cognition, 2009, 113(3): 262-280
Research on human and animal behavior has long emphasized its hierarchical structure—the divisibility of ongoing behavior into discrete tasks, which are composed of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior.
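Hierarchical reinforcement learning of the kind reviewed above is commonly formalized with the options framework, in which a temporally extended action bundles an initiation set, an internal policy, and a termination condition. The sketch below illustrates that data structure on a toy corridor task; the paper is a review and supplies no code, so everything here (the Option class, the go_to_door option, the toy dynamics) is an illustrative assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """A temporally extended action: where it can start, how it acts, when it stops."""
    name: str
    initiation_set: Set[int]                  # states in which the option may be invoked
    policy: Callable[[int], int]              # maps a state to a primitive action
    termination_prob: Callable[[int], float]  # probability of terminating in a state

# Hypothetical subroutine "go_to_door" in a toy corridor with states 0..5
go_to_door = Option(
    name="go_to_door",
    initiation_set={0, 1, 2, 3},
    policy=lambda s: 1,                       # primitive action: step right
    termination_prob=lambda s: 1.0 if s == 4 else 0.0,
)

def run_option(option, s, step):
    """Execute an option until it terminates; return the final state and accumulated reward."""
    total_reward = 0.0
    while True:
        s, r = step(s, option.policy(s))
        total_reward += r
        if random.random() < option.termination_prob(s):
            return s, total_reward

# Toy corridor dynamics: each step right costs a little until the doorway (state 4) is reached.
corridor = lambda s, a: (min(s + a, 5), 1.0 if s + a >= 4 else -0.1)
s_final, ret = run_option(go_to_door, s=0, step=corridor)
print(s_final, round(ret, 2))
```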
