Similar literature
20 similar articles found (search time: 31 ms)
1.
Oesterheld, Caspar. Synthese, 2019, 198(27): 6491-6504

Decision theorists disagree about how instrumentally rational agents, i.e., agents trying to achieve some goal, should behave in so-called Newcomb-like problems, with the main contenders being causal and evidential decision theory. Since the main goal of artificial intelligence research is to create machines that make instrumentally rational decisions, the disagreement pertains to this field. In addition to the more philosophical question of what the right decision theory is, the goal of AI poses the question of how to implement any given decision theory in an AI. For example, how would one go about building an AI whose behavior matches evidential decision theory’s recommendations? Conversely, we can ask which decision theories (if any) describe the behavior of any existing AI design. In this paper, we study what decision theory an approval-directed agent, i.e., an agent whose goal it is to maximize the score it receives from an overseer, implements. If we assume that the overseer rewards the agent based on the expected value of some von Neumann–Morgenstern utility function, then such an approval-directed agent is guided by two decision theories: the one used by the agent to decide which action to choose in order to maximize the reward and the one used by the overseer to compute the expected utility of a chosen action. We show which of these two decision theories describes the agent’s behavior in which situations.
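As a rough illustration of how the two main contenders named above can come apart, the following sketch computes expected utilities in the standard Newcomb setup. The payoffs (1,000,000 and 1,000), the predictor accuracy, the prior, and all function names are illustrative assumptions that do not come from the paper, and the sketch does not model the agent-overseer setting the paper studies.

```python
# Illustrative sketch only: payoffs, probabilities, and names are assumptions.
ACTIONS = ("one-box", "two-box")
MILLION = 1_000_000   # content of the opaque box if the predictor foresaw one-boxing
THOUSAND = 1_000      # content of the transparent box


def payoff(action, box_full):
    """Money received for an action, given whether the opaque box is full."""
    total = MILLION if box_full else 0
    if action == "two-box":
        total += THOUSAND
    return total


def edt_expected_utility(action, accuracy):
    """Evidential view: the action is evidence about the prediction.

    P(full | one-box) = accuracy, P(full | two-box) = 1 - accuracy.
    """
    p_full = accuracy if action == "one-box" else 1 - accuracy
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_expected_utility(action, prior_full):
    """Causal view: the box content is fixed, so the same prior applies to both actions."""
    return prior_full * payoff(action, True) + (1 - prior_full) * payoff(action, False)


def best_action(expected_utility, **params):
    """Return the action that maximizes the given expected-utility function."""
    return max(ACTIONS, key=lambda a: expected_utility(a, **params))


if __name__ == "__main__":
    print("EDT:", best_action(edt_expected_utility, accuracy=0.99))
    print("CDT:", best_action(cdt_expected_utility, prior_full=0.5))
```

Under these assumptions the evidential calculation favors one-boxing whenever the predictor is reliable enough, while the causal calculation favors two-boxing for any prior, because taking the extra box always adds 1,000.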


2.
Reinforcement learning (RL) models of decision‐making cannot account for human decisions in the absence of prior reward or punishment. We propose a mechanism for choosing among available options based on goal‐option association strengths, where association strengths between objects represent previously experienced object proximity. The proposed mechanism, Goal‐Proximity Decision‐making (GPD), is implemented within the ACT‐R cognitive framework. GPD is found to be more efficient than RL in three maze‐navigation simulations. GPD advantages over RL seem to grow as task difficulty is increased. An experiment is presented where participants are asked to make choices in the absence of prior reward. GPD captures human performance in this experiment better than RL.
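The idea of choosing by goal-option association strength can be sketched roughly as follows. This is not the ACT-R implementation described in the abstract: the decay-based strength formula, the `decay` parameter, and the function names are all assumptions made for illustration.

```python
# Illustrative sketch only: the strength formula and all names are assumptions,
# not the GPD mechanism as implemented in ACT-R.
from collections import defaultdict


def learn_associations(paths, decay=0.5):
    """Build association strengths from experienced paths, without any reward.

    Two places that were experienced k steps apart gain decay**k strength,
    so places experienced close together end up strongly associated.
    """
    strength = defaultdict(float)
    for path in paths:
        for i, a in enumerate(path):
            for j in range(i + 1, len(path)):
                b = path[j]
                weight = decay ** (j - i)
                strength[(a, b)] += weight
                strength[(b, a)] += weight
    return dict(strength)


def choose_option(options, goal, strength):
    """Pick the option with the strongest association to the goal."""
    return max(options, key=lambda option: strength.get((option, goal), 0.0))


if __name__ == "__main__":
    explored = [["start", "left", "goal"], ["start", "right", "dead_end"]]
    strengths = learn_associations(explored)
    print(choose_option(["left", "right"], "goal", strengths))
```

In this toy maze the choice is made without any reward signal: "left" wins only because it was previously experienced next to the goal.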

3.
ABSTRACT

Predictions about human behaviour can be influenced by the presence and status of goals. The purpose of this study was to assess the impact of an active goal and barriers to that goal on predictions about outcomes experienced by agents. Participants read stories describing characters with goals. The extent to which there were barriers to those goals was varied. Participants predicted what happens next in the story, both prior to and after barrier removal. There was support for a goal barrier hypothesis, where the conditions for predicting goal completion involved removing conditions that prevent a goal being achieved (Experiments 1 and 2). At the same time, unachieved goals were more accessible to working memory than completed goals, regardless of a barrier (Experiment 3). These results suggest that participants deliberately decided when it was appropriate to use goal information to predict outcomes of intentional actions conducted by the agents in the stories.

4.

The Pavlovian-Instrumental Transfer (PIT) paradigm examines probabilistic and reinforcement learning. Disruptions in mechanisms that mediate PIT (i.e., cues not triggering adaptive behaviors) are thought to be contributors to psychopathology, making the study of probabilistic and reinforcement learning clinically relevant. The current study evaluated an appetitive PIT effect and its relationship with symptom dimensions spanning depression and anxiety, with a particular focus on anhedonia. Forty young adults ranging in scores across dimensions of depression and anxiety symptoms completed the PIT paradigm and self-report symptom measures. The PIT paradigm consisted of three phases. The instrumental phase consisted of a contingent association in which participants squeezed a handgrip for monetary reward. The Pavlovian phase established a purely predictive association between three visual stimuli (CS+, CS-, baseline) and presence or absence of monetary reward. In the transfer phase, participants’ responses allowed for examination of whether motivational characteristics of Pavlovian predictors influenced the vigor of their handgrip squeezes (instrumental action), which were formerly independent of Pavlovian associations. Analyses revealed a baseline-reward PIT effect, whereby a reward-associated Pavlovian cue enhanced instrumental responding in the transfer phase. However, there were no significant differences between CS+ and CS- or between CS- and baseline cues, suggesting a disrupted interaction of Pavlovian and instrumental learning. Further, the appetitive PIT effect captured in this paradigm was not associated with anhedonia, fears, or general distress. Future work should investigate the influence of mood states using more specific appetitive PIT paradigms to further understanding of the implications of disrupted reflexive and instrumental responding.


5.
Here we attempted to clarify the role of dopamine signaling in reward seeking. In Experiment 1, we assessed the effects of the dopamine D1/D2 receptor antagonist flupenthixol (0.5 mg/kg i.p.) on Pavlovian incentive motivation and found that flupenthixol blocked the ability of a conditioned stimulus to enhance both goal approach and instrumental performance (Pavlovian-to-instrumental transfer). In Experiment 2 we assessed the effects of flupenthixol on reward palatability during post-training noncontingent re-exposure to the sucrose reward in either a control 3-h or novel 23-h food-deprived state. Flupenthixol, although effective in blocking the Pavlovian goal approach, was without effect on palatability or the increase in reward palatability induced by the upshift in motivational state. This noncontingent re-exposure provided an opportunity for instrumental incentive learning, the process by which rats encode the value of a reward for use in updating reward-seeking actions. Flupenthixol administered prior to the instrumental incentive learning opportunity did not affect the increase in subsequent off-drug reward-seeking actions induced by that experience. These data suggest that although dopamine signaling is necessary for Pavlovian incentive motivation, it is not necessary for changes in reward experience, or for the instrumental incentive learning process that translates this experience into the incentive value used to drive reward-seeking actions, and provide further evidence that Pavlovian and instrumental incentive learning processes are dissociable.

6.
Albino rats were trained in a delayed discriminated conditional avoidance response (CAR) to study the possible effects of varying the training parameters, viz., the CS-UCS interval, the stimulus strength, and the inter-trial and intersession intervals, on the acquisition and performance of the CAR. The response latency (RL) was related to the CS-UCS interval in a serial trend, while there were response failures (ER%) at both extremes. The efficiency of the CAR also varied according to the stimulus strength, improving up to 2 mA intensities and declining thereafter. The CAR deteriorated, as reflected in increased magnitudes of RL and ER%, with increasing intertrial intervals of 4 min or more, both in trained and trainee rats, but was not significantly affected by increasing intersession intervals unless they were 7 days or above. These findings are discussed in the light of the known principles of classical conditioning as well as of electrophysiological findings from instrumental animal conditioning studies.

7.
Abstract

In this paper I indicate the reasons why critical theory needs an alternative conception of critique, and then I sketch out what such an alternative should be. The conception of critique I develop involves a time‐responsive redisclosure of the world capable of disclosing new or previously unnoticed possibilities, possibilities in light of which agents can change their self‐understanding and their practices, and change their orientation to the future and the past.

8.
Plural Agents     
Genuine agents are able to engage in activity because they find it worth pursuing, that is, because they care about it. In this respect, they differ from what might be called “mere intentional systems”: systems like chess‐playing computers that exhibit merely goal‐directed behavior mediated by instrumental rationality, without caring. A parallel distinction can be made in the domain of social activity: plural agents must be distinguished from plural intentional systems in that plural agents have cares and engage in activity because of those cares. In this paper, I sketch an account of what it is for an individual to care about things in terms of her exhibiting a certain pattern of emotions. After extending this account to make sense of an individual's caring about other agents, I then show how a certain sort of emotional connectedness among a group of people can make intelligible the group's having cares and thereby constitute that group as a plural agent. Alternative accounts of social action, by ignoring the difference between mere intentional systems and genuine agents, and so by leaving out these emotional entanglements from their accounts of social action, thereby fail to capture a whole range of social phenomena involving plural agents.

9.
What are moral principles? In particular, what are moral principles of the sort that (if they exist) ground moral obligations or, at the very least, particular moral truths? I argue that we can fruitfully conceive of such principles as real, irreducibly dispositional properties of individual persons (agents and patients) that are responsible for and thereby explain the moral properties of (e.g.) agents and actions. Such moral dispositions (or moral powers) are apt to be the metaphysical grounds of moral obligations and of particular truths about what is morally permissible, impermissible, etc. Moreover, they can do other things that moral principles are supposed to do: explain the phenomena “falling within their scope,” support counterfactuals, and ground moral necessities, “necessary connections” between obligating reasons and obligations. And they are apt to be the truthmakers for moral laws, or “lawlike” moral generalizations.

10.
When an anticipated food reward is unexpectedly reduced in quality or quantity, many mammals show a successive negative contrast (SNC) effect, i.e. a reduction in instrumental or consummatory responses below the level shown by control animals that have only ever received the lower-value reward. SNC effects are believed to reflect an aversive emotional state, caused by the discrepancy between the expected and the actual reward. Furthermore, how animals respond to such discrepancy has been suggested to be a sign of animals’ background mood state. However, the occurrence and interpretation of SNC effects are not unequivocal, and there is a relative lack of studies conducted outside of laboratory conditions. Here, we tested two populations of domestic dogs (24 owned pet dogs and 21 dogs from rescue kennels) in a SNC paradigm following the methodology by Bentosela et al. (J Comp Psychol 123:125–130, 2009), using a design that allowed a within-, as well as a between-, subjects analysis. We found no evidence of a SNC effect in either population using a within- or between-subjects design. Indeed, the within-subjects analysis revealed a reverse SNC effect, with subjects in the shifted condition showing a significantly higher level of response, even after they received an unexpected reduction in reward quality. Using a within-, rather than a between-, subjects design may be beneficial in studies of SNC due to higher sensitivity and statistical power; however, order effects on subject performance need to be considered. These results suggest that this particular SNC paradigm may not be sufficiently robust to replicate easily in a range of environmental contexts and populations.

11.
What does it mean to introduce the notion of imagination in the discussion about global justice? What is gained by studying the role of imagination in thinking about global justice? Does a focus on imagination imply that we must replace existing influential principle-centred approaches such as that of John Rawls and his critics?

We can distinguish between two approaches to global justice. One approach is Rawlsian and Kantian in inspiration. Discussions within this tradition typically focus on the question whether Rawls's theory of justice (1971), designed for the national level, can or should be applied to the global level. Can and should Rawls's Difference Principle be globalized, as Thomas Pogge argues? Is this proposal superior to Rawls's Law of Peoples (1999)? Another approach to global justice has been developed by Martha Nussbaum in Cultivating Humanity (1997), Poetic Justice (1995), and other work. I will construct her view and critically examine it by looking at her arguments about the relation between empathy, literature, and global justice.

At first sight, these two approaches seem to be opposed. The former puts an emphasis on principles, universal reason, and the moral aspects of institutions and their policies, whereas the latter is rather concerned with the relation between imagination and justice, with the particular, and with individual moral development. But is this necessarily so? I will show that both approaches could benefit from each other's insights to strengthen their own position. Moreover, I will argue for a middle way between, or an integration of, the two approaches that combines principles and imagination. In this way, we can move towards a more comprehensive account of global justice.

12.
This article reviews the results of experimental studies on imitative behavior reported by various investigators, and then discusses the possible brain mechanisms responsible for this behavior. It was found that human infants in their first hours of life were already capable of spontaneous imitation of simple motor acts demonstrated by an adult, without previous training or reward; these observations suggest that imitative behavior is an innate process that can be considered an unconditional reflex of imitation. It was also found that satiated animals resumed eating when they saw their companions eating. In the latter case, the imitative reflex triggered the previously acquired feeding behavior. Similar mechanisms could be responsible for the phenomenon of eating more in the presence of companions than in their absence, as well as that of preferring the food chosen by companions. When followed by a reward, the imitative act can be learned, that is, transformed into an instrumental conditional response; learning by imitation of simple motor acts was observed in animals, and that of complex motor acts was observed in children who had already achieved a certain developmental stage. In animals, learning complex motor tasks was facilitated by previous observation of a companion performing this task. In this case, the presence of the observer during the session could lead to habituation to the experimental situation and production of associations between this situation and stimuli or emotions related to the reward or punishment, and might result in more efficient learning later. The imitative behavior can be inhibited by stimuli producing responses antagonistic to the act of imitation.

13.
ABSTRACT

The list of proposed addictions has recently grown to include television, videogames, shopping, day trading, kleptomania, and use of the Internet. These activities share with a more established entry, gambling, the property that they require no delivery of a biological stimulus that might be thought to unlock a hardwired brain process. I propose a framework for analyzing that class of incentives that do not depend on the prediction of physically privileged environmental events: people have a great capacity to coin endogenous reward; we learn to cultivate it, and, where it is entrapping, to minimize it, by managing internally generated appetites for it. The basic method of cultivating endogenous reward is to learn cues that predict when best to harvest the reward that has been made possible by the growth of these appetites. This hedonic management occurs in the same motivational marketplace as the instrumental planning that seeks environmental goods in the conventional manner, and presumably obeys the same laws of temporal difference learning; but these laws are no longer limiting. Furthermore, instrumental contingencies often provide the most productive structure for hedonic management as well, for reasons that I discuss; but the needs of hedonic management create incentives both to pursue instrumental goals in a suboptimal manner and to avoid noticing how the hedonic incentive affects this pursuit. The result is the apparent irrationality that is often observed in process addictions.

14.
Reward learning is known to influence the automatic capture of attention. This study examined how the rate of learning, after high- or low-value reward outcomes, can influence future transfers into value-driven attentional capture. Participants performed an instrumental learning task that was directly followed by an attentional capture task. A hierarchical Bayesian reinforcement model was used to infer individual differences in learning from high or low reward. Results showed a strong relationship between high-reward learning rates (or the weight that is put on learning after a high reward) and the magnitude of attentional capture with high-reward colors. Individual differences in learning from high or low rewards were further related to performance differences when high- or low-value distractors were present. These findings provide novel insight into the development of value-driven attentional capture by showing how information updating after desired or undesired outcomes can influence future deployments of automatic attention.
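The notion of a separate learning rate after high and low reward outcomes can be illustrated with a plain delta-rule update. This is only a sketch of that one ingredient: it is not the hierarchical Bayesian model from the study, and the threshold, parameter values, and function names are assumptions.

```python
# Illustrative sketch only: a delta-rule update with outcome-dependent learning
# rates. The threshold, the rates, and all names are assumptions.

def update_value(value, reward, alpha_high, alpha_low, high_threshold=0.5):
    """Move the value estimate toward the reward.

    The step size is alpha_high after a high reward and alpha_low otherwise.
    """
    alpha = alpha_high if reward >= high_threshold else alpha_low
    return value + alpha * (reward - value)


def learn(rewards, alpha_high, alpha_low, initial_value=0.0):
    """Apply the update over a sequence of rewards and return the value trajectory."""
    values = [initial_value]
    for reward in rewards:
        values.append(update_value(values[-1], reward, alpha_high, alpha_low))
    return values


if __name__ == "__main__":
    outcomes = [1.0, 0.0, 1.0, 0.0]
    print(learn(outcomes, alpha_high=0.5, alpha_low=0.1))
```

With a larger `alpha_high` than `alpha_low`, the estimate rises quickly after high rewards and falls only slowly after low ones, which is the kind of asymmetry the abstract relates to attentional capture.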

15.
Summary

The convergence of theory and research on socially shared cognition represents a promising new direction for understanding how to enhance the intellectual growth of individuals. In this article, we draw upon the metaphor of “apprenticeship” to explain how individual cognitive development of children and adults alike can be enhanced by mentoring relationships within a particular educational “culture.” The view advanced here is that computers and related technologies can be instrumental in creating socially interactive and reflective learning communities. Within these communities there is active transmission of knowledge between individuals as they are guided from the periphery through to the center of the learning enterprise. Examples of communities of learners are provided to illustrate the process of socially shared cognition and development of knowledge networks. Principles for the creation of sustainable learning communities apply equally to traditional educational settings and on-line communities. The concept of the “collective zone of proximal development” is advanced here to explain how cognitive growth progressively occurs for community members who are operating within a socially interactive and reflective learning environment. Finally, principles and recommendations are offered on how to design communities so that all individuals can achieve their optimal functioning level through guided social participation.

16.
Autonomous systems such as Connected Autonomous Vehicles (CAVs) and assistive robots are set to improve the way we live. Autonomous systems need to be equipped with capabilities to [...]. Reinforcement Learning (RL) is a type of machine learning where an agent learns by interacting with its environment through trial and error, and it has gained significant interest from the research community for its promise to efficiently learn decision making through abstraction of experiences. However, most current autonomous systems, such as driverless vehicle prototypes or mobile robots, are controlled through supervised learning methods or manually designed rule-based policies. Additionally, many emerging autonomous systems, such as driverless cars, are set in a multi-agent environment, often with partial observability. Learning decision-making policies in multi-agent environments is a challenging problem, because the environment is not stationary from the perspective of a learning agent, and hence the Markov property assumed in single-agent RL does not hold. This paper focuses on learning decision-making policies in multi-agent environments, both in cooperative settings with full observability and in dynamic environments with partial observability. We present experiments in simple, yet effective, new multi-agent environments to simulate policy learning in scenarios that could be encountered by an autonomous navigating agent such as a CAV. The results illustrate how agents learn to cooperate in order to achieve their objectives successfully. Also, it was shown that in a partially observable setting, an agent was capable of learning to roam around its environment without colliding in the presence of obstacles and other moving agents. Finally, the paper discusses how data-driven multi-agent policy learning can be extended to real-world environments by augmenting the intelligence of autonomous vehicles.
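The non-stationarity mentioned above can be made concrete with a toy example. The two-player matching game, the policies, and the function name below are assumptions chosen for illustration and are not taken from the paper's environments.

```python
# Illustrative sketch only: a two-player matching game in which each agent
# earns 1 when both pick the same action and 0 otherwise.

def expected_reward(action, other_policy):
    """Expected payoff of `action` when the other agent samples from `other_policy`.

    `other_policy[i]` is the probability that the other agent picks action i.
    """
    return sum(
        prob * (1.0 if action == other_action else 0.0)
        for other_action, prob in enumerate(other_policy)
    )


if __name__ == "__main__":
    early_policy = [0.9, 0.1]   # the other agent mostly picks action 0
    late_policy = [0.1, 0.9]    # after further learning it mostly picks action 1
    print(expected_reward(0, early_policy))
    print(expected_reward(0, late_policy))
```

From the first agent's point of view nothing about its own action has changed, yet the expected reward for that action drops once the other agent's policy shifts. A single-agent learner that treats the other agent as part of a fixed environment is therefore chasing a moving target.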

17.
While there is much literature on autonomy and the conditions for its attainment, there is less on how those conditions reflect on agents’ ordinary careers. Most people’s careers involve a great deal of subservient activity that would prevent the kind of control over agents’ actions that autonomy would seem to require. Yet, it would seem strange to deny autonomy to every agent who regularly follows orders at work; to do so would make autonomy a futile ideal. Most contemporary autonomy accounts provide purely theoretical analysis without reference to any practical goal that autonomy could serve. These accounts are likely to resolve this issue in one direction: either almost entirely including or excluding subservient workers from autonomy. Either solution would fail to distinguish agents who sufficiently control their lives, in spite of limited subservience, according to their own standards, from agents for whom subservience precludes a fulfilling life. I suggest the solution lies in a return to goal-oriented autonomy accounts, which can use the goal to distinguish when subservience overwhelms autonomy from when subservience and autonomy can coexist. I present an account that anchors autonomy in the happiness that it provides for agents who sufficiently control their lives as determined by their more important prudential standards. On this account, agents in subservient careers can be autonomous if they determine how to make their careers consistent with their happiness.

18.
The Wide-Scope view of instrumental reason holds that you should not intend an end without also intending what you believe to be the necessary means. This, the Wide-Scoper claims, provides the best account of why failing to intend the believed means to your end is a rational failing. But Wide-Scopers have struggled to meet a simple Explanatory Challenge: why shouldn't you intend an end without intending the necessary means? What reason is there not to do so? In the first half of this paper, I argue that the Wide-Scope view struggles to meet this challenge because it takes the principles of instrumental reason to have unlimited application: to apply to all agents, in all circumstances. I then go on to offer a new account of these principles. The new account is very much in the spirit of the Wide-Scope view, and shares its central advantages, but lacks its unlimited application. This view should, therefore, find the Explanatory Challenge more tractable. In the second half of the paper, I argue that this prediction is confirmed. If the requirements of instrumental reason apply only when a means is, or is believed to be, necessary for your end, then plausible independent claims, about reasons, rationality, and intentions, explain why failing to intend the necessary means to your ends is a rational failing.

19.
Behavior‐disordered children (N = 65) competed with a presumed unknown peer on consecutive administrations of an analogue aggression task of instrumental aggression (blocking the opponent’s game) and hostile aggression (sending the opponent a noise). The first administration was a reward‐only, nonpunishment condition. The second administration contained both reward and punishment conditions. Results indicated clear differences on aggressive responding during conditions of reward and punishment. Significant correlations were found between instrumental aggression during reward across the two administrations, whereas correlations between aggression during reward and aggression during punishment were nonsignificant. Teacher ratings of Covert‐Proactive Aggression correlated with analogue task instrumental aggression but not with hostile aggression on both administrations. Aggression during punishment was significantly correlated with Continuous Performance Test inattention and impulsivity scores, suggesting that impulsivity and inattention may play an important role in children’s ability to inhibit aggression during cues for punishment. These data indicate the utility of a laboratory analogue procedure to assess conditions associated with childhood aggression and to further our understanding of childhood aggression subtypes. Aggr. Behav. 27:1–13, 2001. © 2001 Wiley‐Liss, Inc.

20.
Rats exposed to incentive downshift show behavioral deterioration. This phenomenon, called successive negative contrast (SNC), occurs in instrumental and consummatory responses (iSNC, cSNC). Whereas iSNC is related to the violation of reward expectancies retrieved in anticipation of the goal (cued-recall), cSNC involves reward rejection and may require only recognition memory retrieved at consumption. The three within-subject experiments reported here suggest that cued-recall memory can also operate in cSNC under some conditions. A small but significant cSNC effect was obtained when animals were exposed to the conditioning context during an average 90-s interval before the introduction of the incentive (either 16% or 2% sucrose solutions), rather than being given immediate access to the sucrose upon entry into the context (Experiment 1). Neither simultaneous contrast (Experiment 2) nor simple sequential effects (Experiment 3) contribute to this within-subject version of cSNC. These results suggest that cSNC can be shifted to a cued-recall mode with appropriate training parameters.
