manually_patched_in_entries.txt
22 lines (22 loc) · 35.7 KB
"24477102":{"id":24477102,"title":"Training and tracking in robotics","abst":"We explore the use of learning schemes in training and adapting performance on simple coordination tasks. The tasks are 1-D pole balancing. Several programs incorporating learning have already achieved this (1, S, 8): the problem is to move a cart along a short piece of track to at to keep a pole balanced on its end; the pole is hinged to the cart at its bottom, and the cart is moved either to the left or to the right by a force of constant magnitude. The form of the task considered here, after (3), involves a genuinely difficult credit-assignment problem. We use a learning scheme previously developed and analysed (1, 7) to achieve performance through reinforcement, and extend it to include changing and new requirements. For example, the length or mast of the pole can change, the bias of the force, its strength, and so on; and the system can be tasked to avoid certain regions altogether. In this way we explore the learning system's ability to adapt to changes and to profit from a selected training sequence, both of which are of obvious utility in practical robotics applications.The results described here were obtained using a computer simulation of the pole-balancing problem. A movie will be shown of the performance of the system under the various requirements and tasks.","url":"https://dblp.uni-trier.de/db/conf/ijcai/ijcai85.html#SelfridgeSB85","lang":null,"authors":[1997216218,2120464764,2305505430],"fos":[119857082,154945302,67203356,2777275308,2986914688,34413123,41008148],"journals":[],"conferences":[2793897755],"conference_series":[1203999783],"references":[38913427,1488252886,1970185999,2091565802,2138178898,2895239407],"filter_matches":["trainingtrackingrobotics"],"rank":20103,"citation_count":98,"estimated_citation_count":149,"publication_date":"1985-08-18","found_in":1},
"1584313244":{"id":1584313244,"title":"Vision-Based Behavior Acquisition For A Shooting Robot By Using A Reinforcement Learning","abst":"We propose a method which acquires a purposive behavior for a mobile robot to shoot a ball into the goal by using a vision-based reinforcement learning. A mobile robot (an agent) does not need to know any parameters of the 3-D environment or its kinematics/dynamics. Information about the changes of the environment is only the image captured from a single TV camera mounted on the robot. An action-value function in terms of state is to be learned. Image positions of a ball and a goal are used as a state variable which shows the effect of an action previously taken. After the learning process, the robot tries to carry a ball near the goal and to shoot it. Both computer simulation and real robot experiments are shown, and discussion on the role of vision in the context of the vision-based reinforcement learning is given.","url":"http://ci.nii.ac.jp/naid/110003299723","lang":"ja","authors":[229000411,1973789311,2079479084,2111067488],"fos":[39920418,88044701,188116033,90509273,31972630,97541855,19966478,77967617,188888258,154945302,41008148],"journals":[],"conferences":[],"conference_series":[],"references":[169977351,207822505,969698184,1557517019,1594201624,1966089223,2050797564,2059039791,2061361125,2097856935,2149276032,2341171179,3011120880],"filter_matches":["asr"],"rank":20112,"citation_count":27,"estimated_citation_count":27,"publication_date":"1994-01-21","found_in":1},
"2012036715":{"id":2012036715,"title":"Transfer of Learning by Composing Solutions of Elemental Sequential Tasks","abst":"Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focused on single tasks. In this paper I consider a class of sequential decision tasks (SDTs), called composite sequential decision tasks, formed by temporally concatenating a number of elemental sequential decision tasks. Elemental SDTs cannot be decomposed into simpler SDTs. I consider a learning agent that has to learn to solve a set of elemental and composite SDTs. I assume that the structure of the composite tasks is unknown to the learning agent. The straightforward application of reinforcement learning to multiple tasks requires learning the tasks separately, which can waste computational resources, both memory and time. I present a new learning algorithm and a modular architecture that learns the decomposition of composite SDTs, and achieves transfer of learning by sharing the solutions of elemental SDTs across multiple composite SDTs. The solution of a composite SDT is constructed by computationally inexpensive modifications of the solutions of its constituent elemental SDTs. 
I provide a proof of one aspect of the learning algorithm.","url":"https://rd.springer.com/chapter/10.1007/978-1-4615-3618-5_6","lang":"en","authors":[2102570927],"fos":[2992882098,154945302,28006648,2988947689,150899416,11866591,2988486947,119857082,97541855,41008148],"journals":[62148650],"conferences":[],"conference_series":[],"references":[1253821906,1491843047,1500024457,1545148916,1557517019,1586172133,1640247718,1931792391,1979071892,1979500821,1996847178,2001729196,2035446426,2060587510,2091565802,2100677568,2110415190,2135630072,2135995262,2150884987,3011120880,3017143921],"filter_matches":["cse"],"rank":18877,"citation_count":245,"estimated_citation_count":425,"publication_date":"1992-05-01","found_in":1},
"2117629901":{"id":2117629901,"title":"A comparison of direct and model-based reinforcement learning","abst":"This paper compares direct reinforcement learning (no explicit model) and model-based reinforcement learning on a simple task: pendulum swing up. We find that in this task model-based approaches support reinforcement learning from smaller amounts of training data and efficient handling of changing goals.","url":"https://dblp.uni-trier.de/db/conf/icra/icra1997.html#AtkesonS97a","lang":"en","authors":[2161070301,2649737133],"fos":[2994509759,17500928,90509273,196340769,199190896,51632099,133731056,154945302,65244806,110639684,119857082,127413603,97541855],"journals":[],"conferences":[],"conference_series":[1163902177],"references":[1491843047,1585546214,1595634327,1597173708,1616818660,1689445748,1966195676,1972586294,1990005421,2002971752,2009533501,2013232999,2048226872,2080759927,2083143894,2105038027,2107726111,2116039916,2118426468,2119717200,2121832485,2124175081,2131398727,2131600418,2147766102,2463510513],"filter_matches":["cdm"],"rank":19864,"citation_count":161,"estimated_citation_count":243,"publication_date":"1997-04-20","found_in":1},
"1822705290":{"id":1822705290,"title":"Effective control knowledge transfer through learning skill and representation hierarchies","abst":"Learning capabilities of computer systems still lag far behind biological systems. One of the reasons can be seen in the inefficient re-use of control knowledge acquired over the lifetime of the artificial learning system. To address this deficiency, this paper presents a learning architecture which transfers control knowledge in the form of behavioral skills and corresponding representation concepts from one task to subsequent learning tasks. The presented system uses this knowledge to construct a more compact state space representation for learning while assuring bounded optimality of the learned task policy by utilizing a representation hierarchy. Experimental results show that the presented method can significantly outperform learning on a flat state space representation and the MAXQ method for hierarchical reinforcement learning.","url":"https://ijcai.org/papers07/Papers/IJCAI07-331.pdf","lang":null,"authors":[2132793154,2140678636],"fos":[58973888,8038995,119857082,188888258,120822770,199190896,77967617,154945302,28006648,24138899,41008148],"journals":[],"conferences":[2793510348],"conference_series":[1203999783],"references":[1553182805,1801398035,2038694949,2061504687,2102000945,2109910161,2111625828,2121517924,2126565096,2130903752],"filter_matches":["ecr"],"rank":20412,"citation_count":38,"estimated_citation_count":58,"publication_date":"2007-01-06","found_in":1},
"2089561656":{"id":2089561656,"title":"State abstraction for programmable reinforcement learning agents","abst":"Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. This paper explores safe state abstraction in hierarchical reinforcement learning, where learned behaviors must conform to a given partial, hierarchical program. Unlike previous approaches to this problem, our methods yield significant state abstraction while maintaining hierarchical optimality, i.e., optimality among all policies consistent with the partial program. We show how to achieve this for a partial programming language that is essentially Lisp augmented with nondeterministic constructs. We demonstrate our methods on two variants of Dietterich's taxi domain, showing how state abstraction and hierarchical optimality result in faster learning of better policies and enable the transfer of learned skills from one problem to another.","url":"http://dl.acm.org/citation.cfm?id=894129","lang":"en","authors":[2128362942,2805045621],"fos":[119857082,37404715,97541855,124304363,47932503,190883126,154945302,176181172,199190896,41008148],"journals":[],"conferences":[2785605827],"conference_series":[1184914352],"references":[112321980,1488730473,1552562496,1650504995,1777239053,1982678075,2039153121,2107726111,2109910161,2121517924,2156067405,2158548602,2341171179],"filter_matches":["sapr"],"rank":19113,"citation_count":169,"estimated_citation_count":278,"publication_date":"2002-07-28","found_in":1},
"1612195517":{"id":1612195517,"title":"Relativized options: choosing the right transformation","abst":"Relativized options combine model minimization methods and a hierarchical reinforcement learning framework to derive compact reduced representations of a related family of tasks. Relativized options are defined without an absolute frame of reference, and an option's policy is transformed suitably based on the circumstances under which the option is invoked. In earlier work we addressed the issue of learning the option policy online. In this article we develop an algorithm for choosing, from among a set of candidate transformations, the right transformation for each member of the family of tasks.","url":"https://works.bepress.com/andrew_barto/6/","lang":"en","authors":[1956086128,1997216218],"fos":[119857082,97541855,56397880,41008148,74992021,154945302,147764199],"journals":[],"conferences":[2785245801],"conference_series":[1180662882],"references":[79394677,151521611,1533853869,1534331386,1545378070,1598052524,1988217924,2001729196,2058735307,2059677035,2097815751,2107628283,2109910161,2121517924,2143435603,2158548602,2159599017,2168342951],"filter_matches":["opt"],"rank":20165,"citation_count":37,"estimated_citation_count":54,"publication_date":"2003-08-21","found_in":1},
"1607318605":{"id":1607318605,"title":"Proto-transfer Learning in Markov Decision Processes Using Spectral Methods","abst":"In this paper we introduce proto-transfer leaning, a new framework for transfer learning. We explore solutions to transfer learning within reinforcement learning through the use of spectral methods. Proto-value functions (PVFs) are basis functions computed from a spectral analysis of random walks on the state space graph. They naturally lead to the ability to transfer knowledge and representation between related tasks or domains. We investigate task transfer by using the same PVFs in Markov decision processes (MDPs) with different rewards functions. Additionally, our experiments in domain transfer explore applying the Nystrom method for interpolation of PVFs between MDPs of different sizes. 1. Problem Statement The aim of transfer learning is to reuse behavior by using the knowledge learned about one domain or task to accelerate learning in a related domain or task. In this paper we explore solutions to transfer learning within reinforcement learning (Sutton & Barto, 1998) through spectral methods. The new framework of proto-transfer learning transfers representations from one domain to another. This transfer entails the reuse of eigenvectors learned from one graph on another. We explore how to transfer knowledge learned on the source graph to a similar graph by modifying the eigenvectors of the Laplacian of the source domain to be reused for the target domain. Proto-value functions (PVFs) are a natural abstraction since they condense a domain by automatically learning an embedding of the Appearing in the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the author(s)/owner(s). state space based on its topology (Mahadevan, 2005). PVFs lead to the ability to transfer knowledge about domains and tasks, since they are constructed without taking reward into account. 
We define task transfer as the problem of transferring knowledge when the state space remains the same and only the reward differs. For task transfer, taskindependent basis functions, such as PVFs, can be reused from one task to the next without modification. Domain transfer refers to the more challenging problem of the state space changing. This change in state space can be a change in topology (i.e. obstacles moving to different locations) or a change in scale (i.e. a smaller or larger domain of the same shape). For domain transfer, the basis functions may need to be modified to reflect the changes in the state space. (Foster & Dayan, 2002) study the task transfer problem by applying unsupervised, mixture model, learning methods to a collection of optimal value functions of different tasks in order to decompose and extract the underlying structure. In this paper, we investigate task transfer in discrete domains by reusing PVFs in MDPs with different reward functions. For domain transfer, we apply the Nystrom extension for interpolation of PVFs between MDPs of different sizes (Mahadevan et al., 2006). Previous work has accelerated learning when transferring behaviors between tasks and domains (Taylor et al., 2005), but we transfer representation and reuse knowledge to learn comparably on a new task or domain.","url":"http://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1149&context=cs_faculty_pubs","lang":"en","authors":[2010118303,2133778237],"fos":[2777965961,97541855,72434380,150899416,106189395,33923547,154945302,28006648,2776960227,8038995],"journals":[],"conferences":[],"conference_series":[],"references":[194200989,658559791,1515851193,1598748993,2006373179,2032618685,2128905965,2130005627,2143958939,2271263738],"filter_matches":["markovspectral"],"rank":20149,"citation_count":40,"estimated_citation_count":61,"publication_date":"2006-01-01","found_in":1},
"2165792602":{"id":2165792602,"title":"Improving action selection in MDP's via knowledge transfer","abst":"Temporal-difference reinforcement learning (RL) has been successfully applied in several domains with large state sets. Large action sets, however, have received considerably less attention. This paper demonstrates the use of knowledge transfer between related tasks to accelerate learning with large action sets. We introduce action transfer, a technique that extracts the actions from the (near-)optimal solution to the first task and uses them in place of the full action set when learning any subsequent tasks. When optimal actions make up a small fraction of the domain's action set, action transfer can substantially reduce the number of actions and thus the complexity of the problem. However, action transfer between dissimilar tasks can be detrimental. To address this difficulty, we contribute randomized task perturbation (RTP), an enhancement to action transfer that makes it robust to unrepresentative source tasks. We motivate RTP action transfer with a detailed theoretical analysis featuring a formalism of related tasks and a bound on the suboptimality of action transfer. The empirical results in this paper show the potential of RTP action transfer to substantially expand the applicability of RL to problems with large action sets.","url":"http://www.cs.utexas.edu/users/ai-lab/?AAAI05-actions","lang":"en","authors":[183355381,2147180669],"fos":[97541855,119857082,38706069,2776960227,162696548,166109690,154945302,41008148],"journals":[],"conferences":[2785841295],"conference_series":[1184914352],"references":[1515851193,1631187438,1800916125,2041367235,2117341272,2121517924,2121863487,2134153324,2154549708,2155791599,2160279936,3011120880],"filter_matches":["ias"],"rank":19780,"citation_count":62,"estimated_citation_count":93,"publication_date":"2005-07-09","found_in":1},
"1598748993":{"id":1598748993,"title":"Structure in the Space of Value Functions","abst":"Solving in an efficient manner many different optimal control tasks within the same underlying environment requires decomposing the environment into its computationally elemental fragments. We suggest how to find fragmentations using unsupervised, mixture model, learning methods on data derived from optimal value functions for multiple tasks, and show that these fragmentations are in accord with observable structure in the environments. Further, we present evidence that such fragments can be of use in a practical reinforcement learning context, by facilitating online, actor-critic learning of multiple goals MDPs.","url":"https://jhu.pure.elsevier.com/en/publications/structure-in-the-space-of-value-functions-3","lang":"en","authors":[2057359516,2141591103],"fos":[37404715,58973888,91575142,77967617,119857082,8038995,154945302,61224824,97541855,188116033,178980831,33923547],"journals":[62148650],"conferences":[],"conference_series":[],"references":[16046748,125130877,354832773,1488730473,1515851193,1533169541,1568042657,1594216983,1600813180,1610678877,1631187438,1748123235,1756110333,1981814724,1992402718,2020149918,2037210683,2049633694,2061504687,2069317438,2091565802,2098589862,2099111195,2101533993,2102000945,2102409316,2103504761,2110415190,2111874892,2114451917,2117341272,2121863487,2125510930,2151454335,2156067405,2158548602,2160371091,2162837059,2259005268,2951774643,3011120880],"filter_matches":["ssvf"],"rank":20450,"citation_count":47,"estimated_citation_count":72,"publication_date":"2002-11-01","found_in":1},
"1974043469":{"id":1974043469,"title":"Learning domain structure through probabilistic policy reuse in reinforcement learning","abst":"Policy Reuse is a transfer learning approach to improve a reinforcement learner with guidance from previously learned similar policies. The method uses the past policies as a probabilistic bias where the learner chooses among the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. In this work, we demonstrate that Policy Reuse further contributes to the learning of the structure of a domain. Interestingly and almost as a side effect, Policy Reuse identifies classes of similar policies revealing a basis of core-policies of the domain. We demonstrate theoretically that, under a set of conditions to be satisfied, reusing such a set of core-policies allows us to bound the minimal expected gain received while learning a new policy. In general, Policy Reuse contributes to the overall goal of lifelong reinforcement learning, as (i) it incrementally builds a policy library; (ii) it provides a mechanism to reuse past policies; and (iii) it learns an abstract domain structure in terms of core-policies of the 
domain.","url":"https://paperity.org/p/3969703/learning-domain-structure-through-probabilistic-policy-reuse-in-reinforcement-learning","lang":"en","authors":[2108671403,2128007263],"fos":[154945302,139502532,97541855,124101348,56739046,67203356,119857082,47932503,49937458,3019959826,41008148,206588197,150899416],"journals":[2480581173],"conferences":[],"conference_series":[],"references":[26772505,36691172,69737162,114278598,1028811162,1255659923,1258105458,1486747874,1489563120,1494114146,1504212531,1512137381,1556824961,1557798492,1585546346,1598052524,1696410204,1853223271,1964150045,2012036715,2014512216,2056584142,2090170171,2091633639,2097113539,2097381042,2103626435,2104641222,2107726111,2109910161,2114451917,2114580749,2121517924,2121863487,2122451452,2122982548,2128118446,2128905965,2131746053,2133040789,2137375617,2153668164,2156493855,2159666783,2161571887,2164114810,2165792602,2519453022,3008729385,3011120880],"filter_matches":["ppr"],"rank":19885,"citation_count":20,"estimated_citation_count":20,"publication_date":"2013-03-01","found_in":1},
"2042357378":{"id":2042357378,"title":"Multitask Reinforcement Learning on the Distribution of MDPs","abst":"In this paper we address a new problem in reinforcement learning. Here we consider an agent that faces multiple learning tasks within its lifetime. The agent’s objective is to maximize its total reward in the lifetime as well as a conventional return in each task. To realize this, it has to be endowed an important ability to keep its past learning experiences and utilize them for improving future learning performance. This time we try to phrase this problem formally. The central idea is to introduce an environmental class, BV-MDPs that is defined with the distribution of MDPs. As an approach to exploiting past learning experiences, we focus on statistical information (mean and deviation) about the agent’s value tables. The mean can be used as initial values of the table when a new task is presented. The deviation can be viewed as measuring reliability of the mean, and we utilize it in calculating priority of simulated backups. We conduct experiments in computer simulation to evaluate the effectiveness.","url":"https://ui.adsabs.harvard.edu/abs/2003ITEIS.123.1004F/abstract","lang":"en","authors":[2329837719,2685624219],"fos":[188116033,154945302,28006648,188888258,199190896,119857082,97541855,41008148,8038995,77967617,58973888],"journals":[178577447],"conferences":[],"conference_series":[],"references":[1491843047,1557517019,2012036715,2048226872,2100677568,2106639887,2107726111,2121863487],"filter_matches":["rld"],"rank":21731,"citation_count":10,"estimated_citation_count":10,"publication_date":"2003-01-01","found_in":1},
"2169743339":{"id":2169743339,"title":"Multi-task reinforcement learning: a hierarchical Bayesian approach","abst":"We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a hierarchical Bayesian infinite mixture model. For each novel MDP, we use the previously learned distribution as an informed prior for modelbased Bayesian reinforcement learning. The hierarchical Bayesian framework provides a strong prior that allows us to rapidly infer the characteristics of new environments based on previous environments, while the use of a nonparametric model allows us to quickly adapt to environments we have not encountered before. In addition, the use of infinite mixtures allows for the model to automatically learn the number of underlying MDP components. We evaluate our approach and show that it leads to significant speedups in convergence to an optimal policy after observing only a small number of tasks.","url":"https://dl.acm.org/citation.cfm?doid=1273496.1273624","lang":"en","authors":[1993564419,2126535395,2139785505,2440006205],"fos":[61224824,191413810,97541855,178980831,71923881,71983512,2776886580,41008148,119857082,154945302,106189395,107673813],"journals":[],"conferences":[2784560012],"conference_series":[1180662882],"references":[1496855202,1515851193,1582436621,1591803298,2039522160,2071814471,2079247031,2080972498,2106953752,2121863487,2128775537,2132057084],"filter_matches":["hba"],"rank":19011,"citation_count":169,"estimated_citation_count":235,"publication_date":"2007-06-20","found_in":1},
"2106953752":{"id":2106953752,"title":"General game learning using knowledge transfer","abst":"We present a reinforcement learning game player that can interact with a General Game Playing system and transfer knowledge learned in one game to expedite learning in many other games. We use the technique of value-function transfer where general features are extracted from the state space of a previous game and matched with the completely different state space of a new game. To capture the underlying similarity of vastly disparate state spaces arising from different games, we use a game-tree lookahead structure for features. We show that such feature-based value function transfer learns superior policies faster than a reinforcement learning agent that does not use knowledge transfer. Furthermore, knowledge transfer using lookahead features can capture opponent-specific value-functions, i.e. can exploit an opponent's weaknesses to learn faster than a reinforcement learner that uses lookahead with minimax (pessimistic) search against the same opponent.","url":"http://ijcai.org/Proceedings/07/Papers/107.pdf","lang":"en","authors":[2147180669,2162022237],"fos":[167573328,73795354,41008148,154945302,202556891,2780617750,14642086,119857082,102234262,47175762,170828538],"journals":[],"conferences":[2793510348],"conference_series":[1203999783],"references":[47250057,1515851193,2041367235,2099587183,2111572265,2121863487,2126565096,2145943363,2964331425],"filter_matches":["glgt"],"rank":19380,"citation_count":79,"estimated_citation_count":103,"publication_date":"2007-01-06","found_in":1},
"2134153324":{"id":2134153324,"title":"Generalizing plans to new environments in relational MDPs","abst":"A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for direct planning. In this paper, we present an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs). An RMDP can model a set of similar environments by representing objects as instances of different classes. In order to generalize plans to multiple environments, we define an approximate value function specified in terms of classes of objects and, in a multiagent setting, by classes of agents. This class-based approximate value function is optimized relative to a sampled subset of environments, and computed using an efficient linear programming method. We prove that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space. Our experimental results show that our method generalizes plans successfully to new, significantly larger, environments, with minimal loss of performance relative to environment-specific planning. 
We demonstrate our approach on a real strategic computer war game.","url":"http://www.ijcai.org/Past%20Proceedings/IJCAI-2003/PDF/144.pdf","lang":"en","authors":[1988556028,2167404190,2283966317,2290440925],"fos":[2993270172,41008148,106189395,41045048,119857082,154945302,126255220,14646407,177148314],"journals":[],"conferences":[2792991336],"conference_series":[1203999783],"references":[11088338,41895731,1515851193,1557798492,1560550898,1631187438,1654728867,1800916125,1967346767,2020149918,2040766536,2096600060,2096622112,2110111529,2121517924,2121863487,2134779831,2149385746,2158479468],"filter_matches":["per"],"rank":19050,"citation_count":149,"estimated_citation_count":226,"publication_date":"2003-08-09","found_in":1},
"2166798247":{"id":2166798247,"title":"Transfer learning in real-time strategy games using hybrid CBR/RL","abst":"The goal of transfer learning is to use the knowledge acquired in a set of source tasks to improve performance in a related but previously unseen target task. In this paper, we present a multilayered architecture named CAse-Based Reinforcement Learner (CARL). It uses a novel combination of Case-Based Reasoning (CBR) and Reinforcement Learning (RL) to achieve transfer while playing against the Game AI across a variety of scenarios in MadRTSTM, a commercial Real Time Strategy game. Our experiments demonstrate that CARL not only performs well on individual tasks but also exhibits significant performance gains when allowed to transfer knowledge from previous tasks.","url":"https://dl.acm.org/citation.cfm?id=1625275.1625444","lang":"en","authors":[2100123673,2106580818,2109056120,2126228213,2135226159,2251311143],"fos":[97541855,28006648,2781170869,119857082,150899416,67203356,74678566,154945302,41008148],"journals":[],"conferences":[2793510348],"conference_series":[1203999783],"references":[1487906362,1500151553,1515851193,1565097357,1592209052,2100677568,2126385963,2132057084,2140095144,2154549708,2312609093],"filter_matches":["slh"],"rank":19167,"citation_count":110,"estimated_citation_count":183,"publication_date":"2007-01-06","found_in":1},
"2122982548":{"id":2122982548,"title":"Giving advice about preferred actions to reinforcement learners via knowledge-based kernel regression","abst":"We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator. Our new method extends a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), that accepts advice concerning a single action of a reinforcement learner. In KBKR, users can say that in some set of states, an action's value should be greater than some linear expression of the current state. In our new technique, which we call Preference KBKR (Pref-KBKR), the user can provide advice in a more natural manner by recommending that some action is preferred over another in the specified set of states. Specifying preferences essentially means that users are giving advice about policies rather than Q values, which is a more natural way for humans to present advice. We present the motivation for preference advice and a proof of the correctness of our extension to KBKR. In addition, we show empirical results that our method can make effective use of advice on a novel reinforcement-learning task, based on the RoboCup simulator, which we call Breakaway. 
Our work demonstrates the significant potential of advice-giving techniques for addressing complex reinforcement learning problems, while further demonstrating the use of support-vector regression for reinforcement learning.","url":"https://experts.umn.edu/en/publications/giving-advice-about-preferred-actions-to-reinforcement-learners-v","lang":"en","authors":[738944226,2047441381,2079278047,2120363087,2464448550],"fos":[41008148,97541855,67203356,55439883,154945302,200695384,83546350,119857082],"journals":[],"conferences":[2785841295],"conference_series":[1184914352],"references":[193428430,1497976081,1511354083,1515851193,1584120419,1617610651,1976115983,2093404847,2121863487,2134289401,2140584963,2141559645,2145739724,2155791599,3022194887],"filter_matches":["apa"],"rank":19317,"citation_count":82,"estimated_citation_count":118,"publication_date":"2005-07-09","found_in":1},
"1506146479":{"id":1506146479,"title":"Skill acquisition via transfer learning and advice taking","abst":"We describe a reinforcement learning system that transfers skills from a previously learned source task to a related target task. The system uses inductive logic programming to analyze experience in the source task, and transfers rules for when to take actions. The target task learner accepts these rules through an advice-taking algorithm, which allows learners to benefit from outside guidance that may be imperfect. Our system accepts a human-provided mapping, which specifies the similarities between the source and target tasks and may also include advice about the differences between them. Using three tasks in the RoboCup simulated soccer domain, we demonstrate that this system can speed up reinforcement learning substantially.","url":"https://experts.umn.edu/en/publications/skill-acquisition-via-transfer-learning-and-advice-taking","lang":"en","authors":[738944226,2047441381,2079278047,2120363087],"fos":[154945302,68339613,150899416,34413123,132758656,2776960227,28006648,41008148,2779382394,97541855],"journals":[],"conferences":[],"conference_series":[2755314191],"references":[198956113,1515851193,1834252692,1987902506,2012036715,2100677568,2104046064,2121863487,2122982548,2124175081,2126565096,2140584963,2153353285,2155791599,3022194887],"filter_matches":["sat"],"rank":19301,"citation_count":51,"estimated_citation_count":78,"publication_date":"2006-09-18","found_in":1},
"2123995443":{"id":2123995443,"title":"Graph-Based Domain Mapping for Transfer Learning in General Games","abst":"A general game player is an agent capable of taking as input a description of a game's rules in a formal language and proceeding to play without any subsequent human input. To do well, an agent should learn from experience with past games and transfer the learned knowledge to new problems. We introduce a graph-based method for identifying previously encountered games and prove its robustness formally. We then describe how the same basic approach can be used to identify similar but non-identical games. We apply this technique to automate domain mapping for value function transfer and speed up reinforcement learning on variants of previously played games. Our approach is fully implemented with empirical results in the general game playing system.","url":"http://www.cs.utexas.edu/users/ai-lab/?kuhlmann:ecml07","lang":"en","authors":[2147180669,2160943816],"fos":[95940807,97541855,170828538,14642086,2780617750,102234262,73795354,167573328,154945302,41008148],"journals":[],"conferences":[],"conference_series":[2755314191],"references":[95993446,135031542,1515851193,1548156140,1984290203,2077329922,2099587183,2106953752,2110630796,2111572265,2121863487,2145943363,2151259087,2154328025,2312609093,2396715201,2911296969,3020831056],"filter_matches":["gggt","glgt"],"rank":19443,"citation_count":46,"estimated_citation_count":69,"publication_date":"2007-09-17","found_in":1},
"203338875":{"id":203338875,"title":"An experts algorithm for transfer learning","abst":"A long-lived agent continually faces new tasks in its environment. Such an agent may be able to use knowledge learned in solving earlier tasks to produce candidate policies for its current task. There may, however, be multiple reasonable policies suggested by prior experience, and the agent must choose between them potentially without any a priori knowledge about their applicability to its current situation. We present an "experts" algorithm for efficiently choosing amongst candidate policies in solving an unknown Markov decision process task. We conclude with the results of experiments on two domains in which we generate candidate policies from solutions to related tasks and use our experts algorithm to choose amongst them.","url":"https://dl.acm.org/citation.cfm?id=1625275.1625448","lang":null,"authors":[2090935536,2102570927],"fos":[75553542,150899416,154945302,41008148,106189395,119857082,11413529],"journals":[],"conferences":[2793510348],"conference_series":[1203999783],"references":[79394677,1516061453,1517018472,1970041563,1979675141,1998498767,2009551863,2011277999,2028357975,2077902449,2098339418,2105507006,2126565096,2169659168,2489939061],"filter_matches":["eta"],"rank":19737,"citation_count":42,"estimated_citation_count":58,"publication_date":"2007-01-06","found_in":1}
"1234567":{"id":1234567,"title":"Model Transfer for Markov Decision Tasks via Parameter Matching","abst":"","url":"","lang":"en","authors":[],"fos":[],"journals":[],"conferences":[],"conference_series":[],"references":[],"filter_matches":["eta"],"rank":0,"citation_count":26,"estimated_citation_count":0,"publication_date":"2006-01-01","found_in":1}
"7654321":{"id":1234567,"title":"Transferring state abstractions between MDPs","abst":"","url":"","lang":"en","authors":[],"fos":[],"journals":[],"conferences":[],"conference_series":[],"references":[],"filter_matches":["eta"],"rank":0,"citation_count":51,"estimated_citation_count":0,"publication_date":"2006-01-01","found_in":1}