How to use Chainer for Theano users
As we mentioned on our blog, Theano will stop development in a few weeks. Many aspects of Chainer were inspired by Theano’s clean interface design, so we would like to introduce Chainer to users of Theano. We hope this article assists interested Theano users to move to Chainer easily.
First, let’s summarize the key similarities and differences between Theano and Chainer.
- Python-based library
- Functions can accept NumPy arrays
- CPU/GPU support
- Easy to write various operation as a differentiable function (custom layer)
- Theano compiles the computational graph before run
- Chainer builds the comptuational graph in runtime
- Chainer provides many high-level APIs for neural networks
- Chainer supports distributed learning with ChainerMN
In this post, we assume that the modules below have been imported.
import numpy as np
import theano import theano.tensor as T
import chainer import chainer.functions as F import chainer.links as L
Define a parametric function
A neural network basically has many parametric functions and activation functions, commonly called “layers.” Let’s see the difference between how to create a new parametric function between Theano and Chainer. In this example, to show the way to do the same thing with the two different libraries, we show how to define the 2D convolution function. Chainer has
chainer.links.Convolution2D, so it isn’t necessary to write the code below to use 2D convolution as a building block of a network.
class TheanoConvolutionLayer(object): def __init__(self, input, filter_shape, image_shape): # Prepare initial values of the parameter W spatial_dim = np.prod(filter_shape[2:]) fan_in = filter_shape * spatial_dim fan_out = filter_shape * spatial_dim scale = np.sqrt(3. / fan_in) # Create the parameter W W_init = np.random.uniform(-scale, scale, filter_shape) self.W = theano.shared(W_init.astype(np.float32), borrow=True) # Create the paramter b b_init = np.zeros((filter_shape,)) self.b = theano.shared(b_init.astype(np.float32), borrow=True) # Describe the convolution operation conv_out = T.nnet.conv2d( input=input, filters=self.W, filter_shape=filter_shape, input_shape=image_shape) # Add a bias self.output = conv_out + self.b.dimshuffle('x', 0, 'x', 'x') # Store paramters self.params = [self.W, self.b]
How can we use this class? In Theano, the computation is defined as code using symbols, but doesn’t perform actual computation at that time. Namely, it defines the computational graph before run. To use the defined computational graph, we need to define another operator using
theano.function which takes input variables and output variables.
batchsize = 32 input_shape = (batchsize, 1, 28, 28) filter_shape = (6, 1, 5, 5) # Create a tensor that represents a minibatch x = T.fmatrix('x') input = x.reshape(input_shape) conv = TheanoConvolutionLayer(input, filter_shape, input_shape) f = theano.function([input], conv.output)
conv is the definition of how to compute the output from the first argument
f is the actual operator. You can pass values to
f to compute the result of convolution like this:
x_data = np.random.rand(32, 1, 28, 28).astype(np.float32) y = f(x_data) print(y.shape, type(y))
(32, 6, 24, 24) <class 'numpy.ndarray'>
What about the case in Chainer? Theano is a more general framework for scientific calculation, while Chainer focuses on neural networks. Chainer has many high-level APIs to write the building blocks of neural networks easier. Well, how to write the same convolution operator in Chainer?
class ChainerConvolutionLayer(chainer.Link): def __init__(self, filter_shape): super().__init__() with self.init_scope(): # Specify the way of initialize W_init = chainer.initializers.LeCunUniform() b_init = chainer.initializers.Zero() # Create a parameter object self.W = chainer.Parameter(W_init, filter_shape) self.b = chainer.Parameter(b_init, filter_shape) def __call__(self, x): return F.convolution_2d(x, self.W, self.b)
Actually, Chainer has pre-implemented
chainer.links.Convolution2D class for convolution. So, you don’t need to implement the code above by yourself, but it shows how to do the same thing written in Theano above.
You can create your own parametric function by defining a class inherited from
chainer.Link as shown in the above. The computation that will be applied to the input is described in
Then, how to use this class?
chainer_conv = ChainerConvolutionLayer(filter_shape) y = chainer_conv(x_data) print(y.shape, type(y), type(y.array))
(32, 6, 24, 24) <class 'chainer.variable.Variable'> <class 'numpy.ndarray'>
Chainer provides many functions in
chainer.functions and it takes NumPy arrays or
chainer.Variable objects as inputs. You can write arbitrary layer using those functions to make it differentiable. Note that a
chainer.Variable object contains its actual data in the
You can write the same thing using
L.Convolution2D like this:
conv_link = L.Convolution2D(in_channels=1, out_channels=6, ksize=(5, 5)) y = conv_link(x_data) print(y.shape, type(y), type(y.array))
(32, 6, 24, 24) <class 'chainer.variable.Variable'> <class 'numpy.ndarray'>
Use Theano function as a layer in Chainer
How to port parametric functions written in Theano to
Links in Chainer is shown in the above chapter, but there’s an easier way to port non-parametric functions from Theano to Chainer.
TheanoFunction to wrap a Theano function as a
chainer.Link. What you need to prepare is just the inputs and outputs of the Theano function you want to port to Chainer’s
Link. For example, a convolution function of Theano can be converted to a Chainer’s
Link as follows:
x = T.fmatrix().reshape((32, 1, 28, 28)) W = T.fmatrix().reshape((6, 1, 5, 5)) b = T.fvector().reshape((6,)) conv_out = T.nnet.conv2d(x, W) + b.dimshuffle('x', 0, 'x', 'x') f = L.TheanoFunction(inputs=[x, W, b], outputs=[conv_out])
It converts the Theano computational graph into Chainer’s computational graph! This is differentiable with the Chainer APIs, and easy to use as a building block of a network written in Chainer. But it takes
b as input arguments, so it should be noted that it doesn’t keep those parameters inside.
Anyway, how to use this ported Theano function in a network in Chainer?
class MyNetworkWithTheanoConvolution(chainer.Chain): def __init__(self, theano_conv): super().__init__() self.theano_conv = theano_conv W_init = chainer.initializers.LeCunUniform() b_init = chainer.initializers.Zero() with self.init_scope(): self.W = chainer.Parameter(W_init, (6, 1, 5, 5)) self.b = chainer.Parameter(b_init, (6,)) self.l1 = L.Linear(None, 100) self.l2 = L.Linear(100, 10) def __call__(self, x): h = self.theano_conv(x, self.W, self.b) h = F.relu(h) h = self.l1(h) h = F.relu(h) return self.l2(h)
This class is a Chainer’s model class which is inherited from
chainer.Chain. This is a standard way to define a class in Chainer, but, look! it uses a Theano function as a layer inside
__call__ method. The first layer of this network is a convolution layer, and that layer is Theano function which runs computation with Theano.
The usage of this network is completely same as the normal Chainer’s models:
# Instantiate a model object model = MyNetworkWithTheanoConvolution(f) # And give an array/Variable to get the network output y = model(x_data)
This network takes a mini-batch of images whose shape is
(32, 1, 28, 28) and outputs 10-dimensional vectors for each input image, so the shape of the output variable will be
This network is differentiable and the parameters of the Theano’s convolution function which are defined in the constructer as
self.b can be optimized through Chainer’s optimizers normaly.
t = np.random.randint(0, 10, size=(32,)).astype(np.int32) loss = F.softmax_cross_entropy(y, t) model.cleargrads() loss.backward()
You can check the gradients calculated for the parameters
b used in the Theano function
W_gradient = model.W.grad_var.array b_gradient = model.b.grad_var.array
print(W_gradient.shape, type(W_gradient)) print(b_gradient.shape, type(b_gradient))
(6, 1, 5, 5) <class 'numpy.ndarray'> (6,) <class 'numpy.ndarray'>
While we are familiar with Chainer, it has been longer since we have used Theano. If there are corrections or additional advice we should add to this guide, please let us know.
Chainer is a Python-based, standalone open source framework for deep learning models. Chainer provides a flexible, intuitive, and high performance means of implementing a full range of deep learning models, including state-of-the-art models such as recurrent neural networks and variational autoencoders.
- ChainerX Beta Release
- Released Chainer/CuPy v5.0.0
- ChainerMN on AWS with CloudFormation
- Open source deep learning framework Chainer officially supported by Amazon Web Services
- New ChainerMN functions for improved performance in cloud environments and performance testing results on AWS
- ChainerMN on Kubernetes with GPUs
- Released Chainer/CuPy v4.0.0
- ONNX support by Chainer