ICML 2016 Reinforcement Learning Papers

Neil Zhu (Jianshu ID: Not_GOD), founder and Chief Scientist of University AI.

ICML 2016: all accepted papers

The ICML 2016 papers related to reinforcement learning are listed below:

Inverse Optimal Control with Deep Networks via Policy Optimization

Chelsea Finn, UC Berkeley; Sergey Levine; Pieter Abbeel, UC Berkeley

http://arxiv.org/abs/1603.00448

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Nan Jiang, University of Michigan; Lihong Li, Microsoft

http://arxiv.org/abs/1511.03722

Smooth Imitation Learning

Hoang Le, Caltech; Andrew Kang; Yisong Yue, Caltech; Peter Carr

PAC Lower Bounds and Efficient Algorithms for The Max K-Armed Bandit Problem

Yahel David, Technion; Nahum Shimkin, Technion

Anytime Exploration for Multi-armed Bandits using Confidence Information

Kwang-Sung Jun, UW-Madison; Robert Nowak

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Yingfei Wang, Princeton University; Chu Wang; Warren Powell

https://arxiv.org/abs/1510.02354

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

Junpei Komiyama, The University of Tokyo; Junya Honda, The University of Tokyo; Hiroshi Nakagawa, The University of Tokyo

https://arxiv.org/abs/1605.01677

Benchmarking Deep Reinforcement Learning for Continuous Control

Yan Duan, University of California, Berkeley; Xi Chen, University of California, Berkeley; Rein Houthooft, Ghent University; John Schulman, University of California, Berkeley; Pieter Abbeel, UC Berkeley

https://arxiv.org/abs/1604.06778

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

Prashanth L.A., University of Maryland; Cheng Jie, University of Maryland, College Park; Michael Fu, University of Maryland, College Park; Steve Marcus, University of Maryland, College Park; Csaba Szepesvari, Alberta

http://arxiv.org/abs/1506.02632

An optimal algorithm for the Thresholding Bandit Problem

Andrea Locatelli, University of Potsdam; Maurilio Gutzeit, Universit?t Potsdam; Alexandra Carpentier

Sequential decision making under uncertainty: Are most decisions easy?

Ozgur Simsek; Simon Algorta; Amit Kothiyal

Opponent Modeling in Deep Reinforcement Learning

He He; Jordan Boyd-Graber; Hal Daumé, Maryland

Softened Approximate Policy Iteration for Markov Games

Julien Pérolat, Univ. Lille; Bilal Piot, Univ. Lille; Matthieu Geist; Bruno Scherrer; Olivier Pietquin, Univ. Lille (CRIStAL, SequeL Team)

Asynchronous Methods for Deep Reinforcement Learning

Volodymyr Mnih, Google DeepMind; Adria Puigdomenech Badia, Google DeepMind; Mehdi Mirza; Alex Graves, Google DeepMind; Timothy Lillicrap, Google DeepMind; Tim Harley, Google DeepMind; David Silver, Google DeepMind; Koray Kavukcuoglu, Google DeepMind

https://arxiv.org/abs/1602.01783

Dueling Network Architectures for Deep Reinforcement Learning

Ziyu Wang, Google Inc.; Nando de Freitas, University of Oxford; Tom Schaul, Google Inc.; Matteo Hessel, Google DeepMind; Hado van Hasselt, Google DeepMind; Marc Lanctot, Google DeepMind

http://arxiv.org/abs/1511.06581

Differentially Private Policy Evaluation

Borja Balle, Lancaster University; Maziar Gomrokchi, McGill University; Doina Precup, McGill

https://arxiv.org/abs/1603.02010

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Philip Thomas, CMU; Emma Brunskill

https://arxiv.org/abs/1604.00923

Hierarchical Decision Making In Electricity Grid Management

Gal Dalal, Technion; Elad Gilboa, Technion; Shie Mannor, Technion

http://arxiv.org/abs/1603.01840

Generalization and Exploration via Randomized Value Functions

Ian Osband, Stanford; Benjamin Van Roy; Zheng Wen, Adobe Research

https://arxiv.org/abs/1402.0635

Scalable Discrete Sampling as a Multi-Armed Bandit Problem

Yutian Chen, University of Cambridge; Zoubin Ghahramani

Abstract:

Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution with a subsampling approach. We make a novel connection between the discrete sampling and Multi-Armed Bandits problems with a finite reward population and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms in both synthetic and real-world large-scale problems.

http://arxiv.org/abs/1506.09039
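The bandit connection sketched in the abstract can be illustrated with the Gumbel-max trick: drawing from a discrete distribution is equivalent to finding the argmax over log-weights perturbed by independent Gumbel noise, which is exactly a best-arm identification problem. The sketch below is illustrative only (the function name and structure are ours, not the authors' code), and it computes the log-weights exactly rather than subsampling them as the paper's algorithms do.

```python
import math
import random


def gumbel_max_sample(log_weights, rng=None):
    """Draw one exact sample from a discrete distribution given
    unnormalized log-weights, via the Gumbel-max trick:
        argmax_i (log w_i + G_i),  G_i ~ Gumbel(0, 1) i.i.d.
    The argmax index is distributed with P(i) proportional to w_i,
    so sampling reduces to identifying the 'best arm'. When each
    log w_i is an expensive sum over data points (as in large-scale
    Bayesian inference), a bandit algorithm can subsample the terms
    and still return the true argmax with high confidence."""
    rng = rng or random
    best_i, best_v = -1, -math.inf
    for i, lw in enumerate(log_weights):
        # Standard Gumbel noise via inverse-CDF of a Uniform(0, 1) draw.
        gumbel = -math.log(-math.log(rng.random()))
        value = lw + gumbel
        if value > best_v:
            best_i, best_v = i, value
    return best_i
```

A quick empirical check: repeatedly sampling with weights (1, 2, 7) should visit index 2 most often and index 0 least often.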

Model-Free Imitation Learning with Policy Optimization

Jonathan Ho, Stanford; Jayesh Gupta, Stanford University; Stefano Ermon

Improving the Efficiency of Deep Reinforcement Learning with Normalized Advantage Functions and Synthetic Experience

Shixiang Gu, University of Cambridge; Sergey Levine, Google; Timothy Lillicrap, Google DeepMind; Ilya Sutskever, OpenAI

http://arxiv.org/abs/1603.00748

Near Optimal Behavior via Approximate State Abstraction

David Abel, Brown University; David Hershkowitz, Brown University; Michael Littman

https://cs.brown.edu/~dabel/papers/abel_approx_abstraction.pdf

Model-Free Trajectory Optimization for Reinforcement Learning of Motor Skills

Riad Akrour, TU Darmstadt; Gerhard Neumann

