Similar Documents
20 similar documents were retrieved.
1.
How, and how well, do people switch between exploration and exploitation to search for and accumulate resources? We study the decision processes underlying such exploration/exploitation trade-offs using a novel card selection task that captures the common situation of searching among multiple resources (e.g., jobs) that can be exploited without depleting. With experience, participants learn to switch appropriately between exploration and exploitation and approach optimal performance. We model participants' behavior on this task with random, threshold, and sampling strategies, and find that a linear decreasing threshold rule best fits participants' results. Further evidence that participants use decreasing threshold-based strategies comes from reaction time differences between exploration and exploitation; however, participants themselves report non-decreasing thresholds. Decreasing threshold strategies that “front-load” exploration and switch quickly to exploitation are particularly effective in resource accumulation tasks, in contrast to optimal stopping problems such as the Secretary Problem, which require longer exploration.
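To make the strategy concrete, here is a minimal simulation sketch (not the authors' task or model): an agent either explores a new option, revealing a payoff rate drawn uniformly from [0, 1], or exploits the best rate found so far, and it exploits whenever that best rate exceeds a threshold that falls linearly over trials. All names and parameters are illustrative.

```python
import random

def run_decreasing_threshold(n_trials=50, start=0.9, end=0.1, seed=0):
    """Simulate a linear decreasing-threshold strategy on a toy
    explore/exploit task: exploring reveals a new option's payoff rate
    (uniform on [0, 1]); exploiting harvests the best rate found so far."""
    rng = random.Random(seed)
    best_rate = 0.0          # best payoff rate discovered so far
    total_reward = 0.0
    for t in range(n_trials):
        # threshold decreases linearly from `start` to `end` across trials
        threshold = start + (end - start) * t / (n_trials - 1)
        if best_rate >= threshold:
            total_reward += best_rate          # exploit: harvest the known best option
        else:
            new_rate = rng.random()            # explore: sample a new option
            best_rate = max(best_rate, new_rate)
            total_reward += new_rate
    return total_reward

print(run_decreasing_threshold())
```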

2.
When feedback follows a sequence of decisions, relationships between actions and outcomes can be difficult to learn. We used event-related potentials (ERPs) to understand how people overcome this temporal credit assignment problem. Participants performed a sequential decision task that required two decisions on each trial. The first decision led to an intermediate state that was predictive of the trial outcome, and the second decision was followed by positive or negative trial feedback. The feedback-related negativity (fERN), a component thought to reflect reward prediction error, followed both negative feedback and negative intermediate states. This suggests that participants evaluated intermediate states in terms of expected future reward, and that these evaluations supported learning of earlier actions within sequences. We examine the predictions of several temporal-difference models to determine whether the behavioral and ERP results reflect a reinforcement-learning process.
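For readers unfamiliar with how an intermediate state can carry a learning signal back to an earlier decision, a generic, undiscounted TD(0) sketch illustrates the logic; it is not any of the specific models compared in the study, and the state names and learning rate are made up for illustration.

```python
# Generic TD(0) value learning over a two-step trial:
# start -> (decision 1) -> intermediate state -> (decision 2) -> feedback.
# Once the intermediate state acquires value, the prediction error at that
# state (delta1) can train the first decision even though reward only
# arrives after the second decision.
alpha = 0.1                                   # learning rate (illustrative)
V = {"start": 0.0, "good_state": 0.0, "end": 0.0}

def td_update(prev_state, next_state, reward):
    delta = reward + V[next_state] - V[prev_state]   # reward prediction error
    V[prev_state] += alpha * delta
    return delta

for trial in range(20):
    delta1 = td_update("start", "good_state", reward=0.0)  # state-evaluation signal
    delta2 = td_update("good_state", "end", reward=1.0)    # feedback signal
    print(trial, round(delta1, 3), round(delta2, 3))
```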

3.
The bandit problem is a dynamic decision-making task that is simply described, well-suited to controlled laboratory study, and representative of a broad class of real-world problems. In bandit problems, people must choose between a set of alternatives, each with different unknown reward rates, to maximize the total reward they receive over a fixed number of trials. A key feature of the task is that it challenges people to balance the exploration of unfamiliar choices with the exploitation of familiar ones. We use a Bayesian model of optimal decision-making on the task, in which how people balance exploration with exploitation depends on their assumptions about the distribution of reward rates. We also use Bayesian model selection measures that assess how well people adhere to an optimal decision process, compared to simpler heuristic decision strategies. Using these models, we make inferences about the decision-making of 451 participants who completed a set of bandit problems, and relate various measures of their performance to other psychological variables, including psychometric assessments of cognitive abilities and personality traits. We find clear evidence of individual differences in the way the participants made decisions on the bandit problems, and some interesting correlations with measures of general intelligence.
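As one illustration of how a posterior over reward rates can drive the explore/exploit balance in a Bernoulli bandit, the sketch below uses Beta-Bernoulli updating with Thompson sampling. This is a standard textbook rule, not necessarily the optimal model or the heuristics evaluated in the paper; the prior parameters stand in for the "assumptions about the distribution of reward rates" mentioned above.

```python
import random

def thompson_choice(successes, failures, prior_a=1.0, prior_b=1.0, rng=random):
    """One standard posterior-based rule for Bernoulli bandits: sample a
    reward rate for each arm from its Beta posterior and pick the largest.
    The prior parameters (prior_a, prior_b) encode assumptions about how
    reward rates are distributed across arms."""
    samples = [rng.betavariate(prior_a + s, prior_b + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Example: arm 0 looks good so far (3 wins, 1 loss); arm 1 is barely explored.
print(thompson_choice(successes=[3, 0], failures=[1, 1]))
```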

4.
5.
The goal of this research was to further our understanding of how the striatum responds to the delivery of affective feedback. Previously, we had found that the striatum showed a pattern of sustained activation after presentation of a monetary reward, in contrast to a decrease in the hemodynamic response after a punishment. In this study, we tested whether the activity of the striatum could be modulated by parametric variations in the amount of financial reward or punishment. We used an event-related fMRI design in which participants received large or small monetary rewards or punishments after performance in a gambling task. A parametric ordering of conditions was observed in the dorsal striatum according to both magnitude and valence. In addition, an early response to the presentation of feedback was observed and replicated in a second experiment with increased temporal resolution. This study further implicates the dorsal striatum as an integral component of a reward circuitry responsible for the control of motivated behavior, serving to code for such feedback properties as valence and magnitude.

6.
Which stimuli we pay attention to is strongly influenced by learning. Stimuli previously associated with reward outcomes, such as money and food, and stimuli previously associated with aversive outcomes, such as monetary loss and electric shock, automatically capture attention. Social reward (happy expressions) can bias attention towards associated stimuli, but the role of negative social feedback in biasing attentional selection remains unexplored. On the one hand, negative social feedback often serves to discourage particular behaviours. If attentional selection can be curbed much like any other behavioural preference, we might expect stimuli associated with negative social feedback to be more readily ignored. On the other hand, if negative social feedback influences attention in the same way that other aversive outcomes do, such feedback might ironically bias attention towards the stimuli it is intended to discourage selection of. In the present study, participants first completed a training phase in which colour targets were associated with negative social feedback. Then, in a subsequent test phase, these same colour stimuli served as task-irrelevant distractors during a visual search task. The results strongly support the latter interpretation in that stimuli previously associated with negative social feedback impaired search performance.

7.
Memories of past experiences can guide our decisions. Thus, if memories are undermined or distorted, decision making should be affected. Nevertheless, little empirical research has examined the role of memory in reinforcement-based decision making. We hypothesized that if memories guide choices in a conditioning decision-making task, then manipulating these memories would change participants' decision preferences for gaining reward. We manipulated participants' memories by providing false feedback that their memory associations were wrong before they made decisions that could lead them to win money. Participants' memory ratings decreased significantly after receiving false feedback. More importantly, we found that the false feedback caused participants' decision bias to disappear once their memory associations were undermined. Our results suggest that reinforcement-based decision making can be altered by false feedback on memories. The results are discussed in terms of memory mechanisms such as spreading-activation theories.

8.
郑旭涛, 郭文姣, 陈满, 金佳, 尹军. 心理学报 (Acta Psychologica Sinica), 2020, 52(5): 584-596
Using a learning-test two-task paradigm, three experiments examined how the valence of social behaviour influences attentional capture. In the learning phase, participants watched positively valenced helping behaviour (one agent helping another agent climb a hill) and negatively valenced hindering behaviour (one agent hindering another agent's climb), as well as non-interactive behaviours matched to the motion properties of each, with the aim of establishing associations between the agents' colours and the valence of the social behaviour. In the test phase, the attentional-capture effects of the colours of the actor (helper or hinderer) and of the recipient (the helped or hindered agent) were tested separately. The results showed that in negative social behaviour the colours of both the actor and the recipient captured attention more readily, whereas positive social valence did not change the attentional capture of the associated features; moreover, the capture effect was stronger for the actor's colour associated with negative social valence than for the recipient's colour. These results suggest that attentional capture can be driven by the valence of negative social behaviour, that negative valence becomes associated with the features of all individuals involved in the behaviour, and that within this association the actor's physical features receive higher attentional priority. The findings imply that reputation information and a holistic representation of the social interaction may jointly shape attentional selection for social interaction events.

9.
When faced with decisions, rats sometimes pause and look back and forth between possible alternatives, a phenomenon termed vicarious trial and error (VTE). When it was first observed in the 1930s, VTE was theorized to be a mechanism for exploration. Later theories suggested that VTE aided the resolution of sensory or neuroeconomic conflict. In contrast, recent neurophysiological data suggest that VTE reflects a dynamic search and evaluation process. These theories make unique predictions about the timing of VTE on behavioral tasks. We tested these theories of VTE on a T-maze with return rails, where rats were given a choice between a smaller reward available after one delay or a larger reward available after an adjustable delay. Rats showed three clear phases of behavior on this task: investigation, characterized by discovery of task parameters; titration, characterized by iterative adjustment of the delay to a preferred interval; and exploitation, characterized by alternation to hold the delay at the preferred interval. We found that VTE events occurred during adjustment laps more often than during alternation laps. Results were incompatible with theories of VTE as an exploratory behavior, as reflecting sensory conflict, or as a simple neuroeconomic valuation process. Instead, our results were most consistent with VTE as reflecting a search process during deliberative decision making. The pattern of VTE that we observed is reminiscent of current navigational theories proposing a transition from a deliberative to a habitual decision-making mechanism.

10.
We investigated how changes in outcome magnitude affect behavioral variation in human volunteers. Our participants entered strings of characters using a computer keyboard, receiving feedback (gaining a number of points) for any string at least ten characters long. During a “surprise” phase in which the number of points awarded was changed, participants increased their behavioral variability only when the reward value was downshifted, and only when such a shift was novel. Upshifts in reward did not have a systematic effect on variability.

11.
Many theories propose that top-down attentional signals control processing in sensory cortices by modulating neural activity. But who controls the controller? Here we investigate how a biologically plausible neural reinforcement learning scheme can create higher order representations and top-down attentional signals. The learning scheme trains neural networks using two factors that gate Hebbian plasticity: (1) an attentional feedback signal from the response-selection stage to earlier processing levels; and (2) a globally available neuromodulator that encodes the reward prediction error. We demonstrate how the neural network learns to direct attention to one of two coloured stimuli that are arranged in a rank-order. Like monkeys trained on this task, the network develops units that are tuned to the rank-order of the colours and it generalizes this newly learned rule to previously unseen colour combinations. These results provide new insight into how individuals can learn to control attention as a function of reward contingency.
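The two gating factors described above can be written as a single schematic weight update. The sketch below is a single-synapse caricature under assumed variable names, not the authors' full network model.

```python
def gated_hebbian_update(w, pre, post, attention_gate, reward, expected_reward,
                         lr=0.05):
    """Schematic weight change combining (1) Hebbian coactivation of pre- and
    postsynaptic activity, (2) an attentional feedback gate fed back from the
    response-selection stage, and (3) a globally broadcast reward prediction
    error carried by a neuromodulator."""
    delta = reward - expected_reward          # global reward prediction error
    return w + lr * delta * attention_gate * pre * post

# Example: a rewarded, attended coactivation strengthens the synapse.
w = gated_hebbian_update(w=0.2, pre=1.0, post=0.8, attention_gate=1.0,
                         reward=1.0, expected_reward=0.4)
print(w)
```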

12.
In recent years there has been a rapid proliferation of studies demonstrating how reward learning guides visual search. However, most of these studies have focused on feature-based reward, and there has been scant evidence supporting the learning of space-based reward. We raise the possibility that the visual search apparatus is impenetrable to spatial value contingencies, even when such contingencies are learned and represented online in a separate knowledge domain. In three experiments, we interleaved a visual choice task with a visual search task in which one display quadrant produced greater monetary rewards than the remaining quadrants. We found that participants consistently exploited this spatial value contingency during the choice task but not during the search task – even when these tasks were interleaved within the same trials and when rewards were contingent on response speed. These results suggest that the expression of spatial value information is task specific and that the visual search apparatus could be impenetrable to spatial reward information. Such findings are consistent with an evolutionary framework in which the search apparatus has little to gain from spatial value information in most real world situations.

13.
It is well known that observers can implicitly learn the spatial context of complex visual searches, such that future searches through repeated contexts are completed faster than those through novel contexts, even though observers remain at chance at discriminating repeated from new contexts. This contextual-cueing effect arises quickly (within less than five exposures) and asymptotes within 30 exposures to repeated contexts. In spite of being a robust effect (its magnitude is over 100 ms at the asymptotic level), the effect is implicit: Participants are usually at chance at discriminating old from new contexts at the end of an experiment, in spite of having seen each repeated context more than 30 times throughout a 50-min experiment. Here, we demonstrate that the speed at which the contextual-cueing effect arises can be modulated by external rewards associated with the search contexts (not with the performance itself). Following each visual search trial (and irrespective of a participant’s search speed on the trial), we provided a reward, a penalty, or no feedback to the participant. Crucially, the type of feedback obtained was associated with the specific contexts, such that some repeated contexts were always associated with reward, and others were always associated with penalties. Implicit learning occurred fastest for contexts associated with positive feedback, though penalizing contexts also showed a learning benefit. Consistent feedback also produced faster learning than did variable feedback, though unexpected penalties produced the largest immediate effects on search performance.

14.
Probabilistic models in human sensorimotor control
Sensory and motor uncertainty form a fundamental constraint on human sensorimotor control. Bayesian decision theory (BDT) has emerged as a unifying framework to understand how the central nervous system performs optimal estimation and control in the face of such uncertainty. BDT has two components: Bayesian statistics and decision theory. Here we review Bayesian statistics and show how it applies to estimating the state of the world and our own body. Recent results suggest that when learning novel tasks we are able to learn the statistical properties of both the world and our own sensory apparatus so as to perform estimation using Bayesian statistics. We review studies which suggest that humans can combine multiple sources of information to form maximum likelihood estimates, can incorporate prior beliefs about possible states of the world so as to generate maximum a posteriori estimates and can use Kalman filter-based processes to estimate time-varying states. Finally, we review Bayesian decision theory in motor control and how the central nervous system processes errors to determine loss functions and select optimal actions. We review results that suggest we plan movements based on statistics of our actions that result from signal-dependent noise on our motor outputs. Taken together these studies provide a statistical framework for how the motor system performs in the presence of uncertainty.
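As a small illustration of the estimation machinery this review covers, a scalar Kalman filter combines a prediction with a noisy measurement, each weighted by its reliability. The noise parameters and the "hand position" interpretation below are invented for the example; the code only shows the update structure.

```python
def kalman_step(mean, var, measurement, meas_var, process_var):
    """One step of a scalar Kalman filter: predict (the state drifts, so
    uncertainty grows by process_var), then update by weighting the new
    measurement by its reliability relative to the prediction."""
    # Predict: random-walk state model
    var = var + process_var
    # Update: reliability-weighted combination of prediction and measurement
    gain = var / (var + meas_var)
    mean = mean + gain * (measurement - mean)
    var = (1.0 - gain) * var
    return mean, var

# Track a hypothetical time-varying hand position from noisy sensory readings.
mean, var = 0.0, 1.0
for z in [0.9, 1.1, 1.0, 1.2]:
    mean, var = kalman_step(mean, var, z, meas_var=0.5, process_var=0.1)
print(mean, var)
```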

15.
Past studies have shown that the perceived time of actions is retrospectively influenced by post-action events. The current study examined whether rewarding performance feedback (even when false) altered the reported time of action. In Experiment 1, participants performed a speeded button press task and received a monetary reward for a presumed “fast” response or a monetary punishment for a presumed “slow” response. Rewarded trials resulted in the false perception that the response action occurred earlier than on punished trials. In Experiments 2 and 3, the need for a speeded response and reward were independently manipulated in order to decouple the cognitive and reward components in the feedback signal. When tested independently, neither variable affected the judged time of action. We conclude that meaningful feedback (fast or slow) is only used when made salient by reward, to modulate the judged time of an action.

16.
In the current study, we examined the postulation that rumination makes it difficult for depressed individuals to learn the exact probability that different stimuli will be associated with punishment. To do so, we induced rumination or distraction in depressed and never-depressed participants and then measured punishment and reward sensitivity with a probabilistic selection task. In this task, participants first learn the probability that different stimuli will be associated with reward and punishment. During a subsequent test phase in which novel combinations of stimuli are presented, participants’ sensitivity to reward is tested by measuring their tendency to select the stimuli that were most highly rewarded during training, and their sensitivity to punishment is tested by measuring their tendency to not select the stimuli that were most highly punished during training. Compared with distraction, rumination led depressed participants to be less sensitive to the probability that stimuli will be associated with punishment and relatively less sensitive to punishment than reward. Never-depressed participants and depressed participants who were distracted from rumination were as sensitive to reward as they were to punishment. The effects of rumination on sensitivity to punishment may be a mechanism by which rumination can lead to maladaptive consequences.

17.
Watching another person take actions to complete a goal and making inferences about that person's knowledge is a relatively natural task for people. This ability can be especially important in educational settings, where the inferences can be used for assessment, diagnosing misconceptions, and providing informative feedback. In this paper, we develop a general framework for automatically making such inferences based on observed actions; this framework is particularly relevant for inferring student knowledge in educational games and other interactive virtual environments. Our approach relies on modeling action planning: We formalize the problem as a Markov decision process in which one must choose what actions to take to complete a goal, where choices will be dependent on one's beliefs about how actions affect the environment. We use a variation of inverse reinforcement learning to infer these beliefs. Through two lab experiments, we show that this model can recover people's beliefs in a simple environment, with accuracy comparable to that of human observers. We then demonstrate that the model can be used to provide real-time feedback and to model data from an existing educational game.
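A drastically simplified sketch may clarify the inference logic: candidate beliefs imply different action values, a softmax (noisy-optimal) policy turns those values into action probabilities, and the belief that best explains the observed actions is selected. This is an illustration of the general idea, not the authors' model; the belief names, values, and inverse-temperature parameter are hypothetical.

```python
import math

# Toy setting: two candidate beliefs about which action advances the goal.
# Each belief assigns a value (expected future reward) to actions "A" and "B".
beliefs = {
    "thinks_A_works": {"A": 1.0, "B": 0.0},
    "thinks_B_works": {"A": 0.0, "B": 1.0},
}

def action_loglik(observed_actions, action_values, beta=3.0):
    """Log-likelihood of the observed actions under a softmax (noisy-optimal)
    policy over the action values implied by one candidate belief."""
    ll = 0.0
    for a in observed_actions:
        z = sum(math.exp(beta * v) for v in action_values.values())
        ll += beta * action_values[a] - math.log(z)
    return ll

# Infer the learner's belief from what they actually did.
observed = ["A", "A", "B", "A"]
best = max(beliefs, key=lambda b: action_loglik(observed, beliefs[b]))
print(best)   # -> "thinks_A_works"
```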

18.
Traditional models of action understanding emphasise the idea that long-term exposure to a wide array of visual patterns of particular actions allows for effective action anticipation or prediction. More recently, a greater emphasis has been placed on the motor system’s role in the perceptual understanding and prediction of action outcomes. There have been attempts to isolate the contributions of visual and motor experience in action prediction, but to date, these studies have relied on comparisons of motor-visual experience to visual-only (observational) experience. We conducted a learning study in which visual experience was directly manipulated during practice. Novice participants practised throwing darts to 3 specific areas of a dartboard. A group that trained without vision of their action, receiving only feedback about the final landing position, significantly improved in its ability to predict the landing position of a thrown dart from temporally occluded video clips. The performance of this ‘no-vision’ group did not differ from that of a full-vision group and was significantly more accurate than an observation-only and a no-practice control group (with the latter two groups not improving from pre- to post-practice). These results suggest that motor experience specifically modulates the perceptual prediction of action outcomes. This is thought to occur through simulative mechanisms, whereby observed actions are mapped onto the observer’s own motor representations.

19.
Action effects do not occur randomly in time but follow our actions at specific delays. The ideomotor principle (IMP) is widely used to explain how the relation between actions and contingently following effects is acquired, and numerous studies demonstrate robust action-effect learning. Yet, little is known about the acquisition of temporal delays of action effects. Here, we demonstrate that participants learn that action effects occur at specific delays. Participants responded more slowly to action effects that occurred earlier than usual. In addition, participants often responded prematurely, before the effect, when it occurred later than expected. Thus, in contrast to biases of time perception in action contexts (e.g., Haggard, Trends Cogn Sci 9:290–295, 2005; Stetson et al., Neuron 51:651–659, 2006), participants learn and exploit temporal regularities between actions and effects for behavioral control.

20.
A number of prior fMRI studies have focused on the ways in which the midbrain dopaminergic reward system coactivates with the hippocampus to potentiate memory for valuable items. However, another means by which people could selectively remember more valuable to-be-remembered items is to be selective in their use of effective but effortful encoding strategies. To broadly examine the neural mechanisms by which value influences subsequent memory, we used fMRI to assess how differences in brain activity at encoding as a function of value relate to subsequent free recall for words. Each word was preceded by an arbitrarily assigned point value, and participants went through multiple study–test cycles with feedback on their point total at the end of each list, allowing for sculpting of cognitive strategies. We examined the correlation between value-related modulation of brain activity and participants’ selectivity index, which measures how close participants were to their optimal point total, given the number of items recalled. Greater selectivity scores were associated with greater differences in the activation of semantic processing regions, including left inferior frontal gyrus and left posterior lateral temporal cortex, during the encoding of high-value words relative to low-value words. Although we also observed value-related modulation within midbrain and ventral striatal reward regions, our fronto-temporal findings suggest that strategic engagement of deep semantic processing may be an important mechanism for selectively encoding valuable items.
