ICML 2016 Reinforcement Learning Papers

Neil Zhu (Jianshu ID: Not_GOD), founder and Chief Scientist of University AI.

ICML 2016: all accepted papers

The ICML 2016 papers related to reinforcement learning are listed below:

Inverse Optimal Control with Deep Networks via Policy Optimization

Chelsea Finn, UC Berkeley; Sergey Levine; Pieter Abbeel, UC Berkeley

http://arxiv.org/abs/1603.00448

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Nan Jiang, University of Michigan; Lihong Li, Microsoft

http://arxiv.org/abs/1511.03722

Smooth Imitation Learning

Hoang Le, Caltech; Andrew Kang; Yisong Yue, Caltech; Peter Carr

PAC Lower Bounds and Efficient Algorithms for The Max K-Armed Bandit Problem

Yahel David, Technion; Nahum Shimkin, Technion

Anytime Exploration for Multi-armed Bandits using Confidence Information

Kwang-Sung Jun, UW-Madison; Robert Nowak

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Yingfei Wang, Princeton University; Chu Wang; Warren Powell

https://arxiv.org/abs/1510.02354

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

Junpei Komiyama, The University of Tokyo; Junya Honda, The University of Tokyo; Hiroshi Nakagawa, The University of Tokyo

https://arxiv.org/abs/1605.01677

Benchmarking Deep Reinforcement Learning for Continuous Control

Yan Duan, University of California, Berkeley; Xi Chen, University of California, Berkeley; Rein Houthooft, Ghent University; John Schulman, University of California, Berkeley; Pieter Abbeel, UC Berkeley

https://arxiv.org/abs/1604.06778

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

Prashanth L.A., University of Maryland; Cheng Jie, University of Maryland, College Park; Michael Fu, University of Maryland, College Park; Steve Marcus, University of Maryland, College Park; Csaba Szepesvari, Alberta

http://arxiv.org/abs/1506.02632

An optimal algorithm for the Thresholding Bandit Problem

Andrea Locatelli, University of Potsdam; Maurilio Gutzeit, Universit?t Potsdam; Alexandra Carpentier

Sequential decision making under uncertainty: Are most decisions easy?

Ozgur Simsek; Simon Algorta; Amit Kothiyal

Opponent Modeling in Deep Reinforcement Learning

He He; Jordan Boyd-Graber; Hal Daumé, Maryland

Softened Approximate Policy Iteration for Markov Games

Julien Pérolat, Univ. Lille; Bilal Piot, Univ. Lille; Matthieu Geist; Bruno Scherrer; Olivier Pietquin, Univ. Lille (CRIStAL, SequeL Team)

Asynchronous Methods for Deep Reinforcement Learning

Volodymyr Mnih, Google DeepMind; Adria Puigdomenech Badia, Google DeepMind; Mehdi Mirza; Alex Graves, Google DeepMind; Timothy Lillicrap, Google DeepMind; Tim Harley, Google DeepMind; David Silver, Google DeepMind; Koray Kavukcuoglu, Google DeepMind

https://arxiv.org/abs/1602.01783

Dueling Network Architectures for Deep Reinforcement Learning

Ziyu Wang, Google Inc.; Nando de Freitas, University of Oxford; Tom Schaul, Google Inc.; Matteo Hessel, Google DeepMind; Hado van Hasselt, Google DeepMind; Marc Lanctot, Google DeepMind

http://arxiv.org/abs/1511.06581

Differentially Private Policy Evaluation

Borja Balle, Lancaster University; Maziar Gomrokchi, McGill University; Doina Precup, McGill

https://arxiv.org/abs/1603.02010

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Philip Thomas, CMU; Emma Brunskill

https://arxiv.org/abs/1604.00923

Hierarchical Decision Making In Electricity Grid Management

Gal Dalal, Technion; Elad Gilboa, Technion; Shie Mannor, Technion

http://arxiv.org/abs/1603.01840

Generalization and Exploration via Randomized Value Functions

Ian Osband, Stanford; Benjamin Van Roy; Zheng Wen, Adobe Research

https://arxiv.org/abs/1402.0635

Scalable Discrete Sampling as a Multi-Armed Bandit Problem

Yutian Chen, University of Cambridge; Zoubin Ghahramani

Abstract:

Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution with a subsampling approach. We make a novel connection between the discrete sampling and Multi-Armed Bandits problems with a finite reward population and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms in both synthetic and real-world large-scale problems.

http://arxiv.org/abs/1506.09039
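The bandit connection sketched in the abstract can be illustrated with the Gumbel-max trick: drawing from a discrete distribution is equivalent to finding the argmax over log-weights perturbed by independent Gumbel noise, which is exactly a best-arm identification problem. The sketch below is illustrative only (the function name and structure are ours, not the authors' code), and it computes the log-weights exactly rather than subsampling them as the paper's algorithms do.

```python
import math
import random


def gumbel_max_sample(log_weights, rng=None):
    """Draw one exact sample from a discrete distribution given
    unnormalized log-weights, via the Gumbel-max trick:
        argmax_i (log w_i + G_i),  G_i ~ Gumbel(0, 1) i.i.d.
    The argmax index is distributed with P(i) proportional to w_i,
    so sampling reduces to identifying the 'best arm'. When each
    log w_i is an expensive sum over data points (as in large-scale
    Bayesian inference), a bandit algorithm can subsample the terms
    and still return the true argmax with high confidence."""
    rng = rng or random
    best_i, best_v = -1, -math.inf
    for i, lw in enumerate(log_weights):
        # Standard Gumbel noise via inverse-CDF of a Uniform(0, 1) draw.
        gumbel = -math.log(-math.log(rng.random()))
        value = lw + gumbel
        if value > best_v:
            best_i, best_v = i, value
    return best_i
```

A quick empirical check: repeatedly sampling with weights (1, 2, 7) should visit index 2 most often and index 0 least often.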

Model-Free Imitation Learning with Policy Optimization

Jonathan Ho, Stanford; Jayesh Gupta, Stanford University; Stefano Ermon

Improving the Efficiency of Deep Reinforcement Learning with Normalized Advantage Functions and Synthetic Experience

Shixiang Gu, University of Cambridge; Sergey Levine, Google; Timothy Lillicrap, Google DeepMind; Ilya Sutskever, OpenAI

http://arxiv.org/abs/1603.00748

Near Optimal Behavior via Approximate State Abstraction

David Abel, Brown University; David Hershkowitz, Brown University; Michael Littman

https://cs.brown.edu/~dabel/papers/abel_approx_abstraction.pdf

Model-Free Trajectory Optimization for Reinforcement Learning of Motor Skills

Riad Akrour, TU Darmstadt; Gerhard Neumann

