ChainerRL - Deep Reinforcement Learning Library
ChainerRL, a Chainer-based deep reinforcement learning library, has been released. https://github.com/pfnet/chainerrl
(This post is translated from the original post written by Yasuhiro Fujita.)
ChainerRL contains a set of Chainer implementations of deep reinforcement learning (DRL) algorithms. The following algorithms are implemented and accessible through a unified interface.
- Deep Q-Network (Mnih et al., 2015)
- Double DQN (van Hasselt et al., 2016)
- Normalized Advantage Function (Gu et al., 2016)
- (Persistent) Advantage Learning (Bellemare et al., 2016)
- Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al., 2016)
- SVG(0) (Heess et al., 2015)
- Asynchronous Advantage Actor-Critic (A3C) (Mnih et al., 2016)
- Asynchronous N-step Q-learning (Mnih et al., 2016)
- Actor-Critic with Experience Replay (Wang et al., 2017)
The ChainerRL library comes with many examples, such as playing Atari 2600 video games using A3C and learning to control a humanoid robot using DDPG.
How to use
Here is a brief introduction to ChainerRL.
First, the user must provide an appropriate definition of the problem to be solved with reinforcement learning, called the “environment”. ChainerRL follows the environment format of OpenAI’s Gym (https://github.com/openai/gym), a benchmark toolkit for reinforcement learning, so it can be used either with Gym or with your own environment implementation. Essentially, the environment needs two methods, reset() and step():
```python
env = YourEnv()
# reset() starts a new episode and returns the initial observation
obs = env.reset()
action = 0
# step() sends an action to the environment, then returns a 4-tuple:
# (next observation, reward, whether the episode has ended, additional information)
obs, r, done, info = env.step(action)
```
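To make the two-method interface concrete, here is a minimal sketch of a hand-written environment. The class `CoinFlipEnv`, its reward rule, and its parameters are hypothetical and exist only for illustration; the only thing taken from the text above is the shape of `reset()` and `step()`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment exposing Gym-style reset() and step()."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation (the coin face).
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward 1.0 if the action matches the coin that was just observed.
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        # Flip a new coin; it becomes the next observation.
        self.coin = self.rng.randint(0, 1)
        info = {}
        return self.coin, reward, done, info


env = CoinFlipEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    # Copying the observed coin is the optimal policy in this toy problem.
    obs, r, done, info = env.step(obs)
    total_reward += r
print(total_reward)  # 5.0: one reward per step over a 5-step episode
```

Any object with these two methods can stand in for a Gym environment in the loops shown later in this post.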
In DRL, neural networks represent either a policy, which determines an action given a state, or a value function (V-function or Q-function), which estimates the value of a state or of an action taken in a state. The parameters of these neural network models are updated through training. In ChainerRL, policies and value functions are represented as a Chainer Link object that implements __call__():
```python
import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl


class CustomDiscreteQFunction(chainer.Chain):
    def __init__(self):
        # 100-dimensional observation -> 50 hidden units -> 4 action values
        super().__init__(l1=L.Linear(100, 50),
                         l2=L.Linear(50, 4))

    def __call__(self, x, test=False):
        h = F.relu(self.l1(x))
        h = self.l2(h)
        return chainerrl.action_value.DiscreteActionValue(h)


class CustomGaussianPolicy(chainer.Chain):
    def __init__(self):
        super().__init__(l1=L.Linear(100, 50),
                         mean=L.Linear(50, 4),
                         var=L.Linear(50, 4))

    def __call__(self, x, test=False):
        h = F.relu(self.l1(x))
        mean = self.mean(h)
        # softplus keeps the predicted variance positive
        var = F.softplus(self.var(h))
        return chainerrl.distribution.GaussianDistribution(mean, var)
```
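As a rough illustration of how these two kinds of output can be turned into actions, the following plain-Python sketch picks the greedy action from a list of Q-values and samples a continuous action from a diagonal Gaussian. The helper names `greedy_action` and `sample_gaussian_action` are hypothetical, and the sketch does not use ChainerRL; it only shows the underlying idea.

```python
import math
import random


def greedy_action(q_values):
    # A discrete Q-function yields one value per action;
    # the greedy choice is the index of the largest value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sample_gaussian_action(mean, var, rng):
    # A Gaussian policy yields a mean and a variance per action dimension;
    # each dimension is sampled independently from N(mean_i, var_i).
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]


q_values = [0.1, 0.7, -0.3, 0.2]  # made-up values for 4 discrete actions
print(greedy_action(q_values))  # 1

rng = random.Random(0)
action = sample_gaussian_action(
    mean=[0.0, 1.0, -1.0, 0.5], var=[0.01, 0.01, 0.01, 0.01], rng=rng)
print(len(action))  # 4
```

The four entries mirror the four outputs of the example networks above: four discrete action values in one case, a four-dimensional continuous action in the other.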
Then an “Agent” can be defined from the model, a Chainer optimizer, and algorithm-specific parameters. The agent trains the model through interactions with the environment.
```python
q_func = CustomDiscreteQFunction()
optimizer = chainer.optimizers.Adam()
optimizer.setup(q_func)
agent = chainerrl.agents.DQN(q_func, optimizer, ...)  # other parameters omitted
```
After creating the agent, training can be done either with your own training loop,
```python
# Training
obs = env.reset()
r = 0
done = False
for _ in range(10000):
    while not done:
        action = agent.act_and_train(obs, r)
        obs, r, done, info = env.step(action)
    agent.stop_episode_and_train(obs, r, done)
    obs = env.reset()
    r = 0
    done = False
agent.save('final_agent')
```
or with a predefined training function, as follows.
```python
chainerrl.experiments.train_agent_with_evaluation(
    agent, env, steps=100000, eval_frequency=10000, eval_n_runs=10,
    outdir='results')
```
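The hand-written training loop relies on only two agent methods, act_and_train() and stop_episode_and_train(). To show what such methods might do internally, here is a self-contained sketch in which a tabular Q-learning agent stands in for a ChainerRL agent. `ChainEnv` and `TabularQAgent` are hypothetical classes written for this illustration; they borrow the method names from the loop but are not ChainerRL code, and the hyperparameters are arbitrary.

```python
import random


class ChainEnv:
    """Hypothetical 4-state corridor: start at state 0, reach state 3 for reward 1."""

    def reset(self):
        self.state = 0
        self.t = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (never below state 0).
        self.state = max(0, self.state + (1 if action == 1 else -1))
        self.t += 1
        reached_goal = self.state == 3
        done = reached_goal or self.t >= 20
        return self.state, (1.0 if reached_goal else 0.0), done, {}


class TabularQAgent:
    """Stand-in agent exposing the two training methods used by the loop."""

    def __init__(self, n_states, n_actions, lr=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)
        self.last_obs = None
        self.last_action = None

    def _update(self, obs, reward, terminal):
        # One-step Q-learning update for the previous (state, action) pair.
        if self.last_obs is None:
            return
        target = reward if terminal else reward + self.gamma * max(self.q[obs])
        q_sa = self.q[self.last_obs][self.last_action]
        self.q[self.last_obs][self.last_action] = q_sa + self.lr * (target - q_sa)

    def _greedy(self, obs):
        # Break ties at random so the untrained agent explores both directions.
        best = max(self.q[obs])
        return self.rng.choice([a for a, q in enumerate(self.q[obs]) if q == best])

    def act_and_train(self, obs, reward):
        # `reward` is the result of the previous action; learn from it, then act.
        self._update(obs, reward, terminal=False)
        if self.rng.random() < self.epsilon:
            action = self.rng.randrange(len(self.q[obs]))
        else:
            action = self._greedy(obs)
        self.last_obs, self.last_action = obs, action
        return action

    def stop_episode_and_train(self, obs, reward, done=False):
        # Learn from the final transition, then clear per-episode state.
        self._update(obs, reward, terminal=done)
        self.last_obs = self.last_action = None


env = ChainEnv()
agent = TabularQAgent(n_states=4, n_actions=2)
for _ in range(200):
    obs, r, done = env.reset(), 0, False
    while not done:
        action = agent.act_and_train(obs, r)
        obs, r, done, info = env.step(action)
    agent.stop_episode_and_train(obs, r, done)

# Greedy action in each non-terminal state after training.
greedy = [max(range(2), key=lambda a: agent.q[s][a]) for s in range(3)]
print(greedy)
```

The design point is that the reward passed to act_and_train() belongs to the previous action, so the agent can learn from each transition at the moment it chooses the next action, and stop_episode_and_train() handles the final transition of the episode.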
We also provide a quickstart guide to start playing with ChainerRL.
ChainerRL is currently in beta, so feedback is highly appreciated if you are interested in reinforcement learning. We plan to keep improving ChainerRL by making it easier to use and by adding new algorithms.
Chainer is a Python-based, standalone open source framework for deep learning models. Chainer provides a flexible, intuitive, and high performance means of implementing a full range of deep learning models, including state-of-the-art models such as recurrent neural networks and variational autoencoders.