Create a neural network
Now let’s look at how to create neural networks in Gluon. In addition to the NDArray package (nd) that we just covered, we will now also import the neural network package nn from gluon.
[1]:
from mxnet import nd
from mxnet.gluon import nn
Create your neural network’s first layer
Let’s start with a dense layer with 2 output units.
[2]:
layer = nn.Dense(2)
layer
[2]:
Dense(None -> 2, linear)
Then initialize its weights with the default initialization method, which draws random values uniformly from \([-0.07, 0.07]\).
[3]:
layer.initialize()
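The default can be overridden by passing an explicit initializer to initialize(). A minimal sketch, assuming the Xavier scheme from mxnet.init (the layer_xavier name is ours, for illustration only):

from mxnet import init
# A fresh layer initialized with Xavier instead of the default uniform
layer_xavier = nn.Dense(2)
layer_xavier.initialize(init=init.Xavier())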
Then we do a forward pass with random data. We create a \((3,4)\) shape random input x and feed it into the layer to compute the output.
[4]:
x = nd.random.uniform(-1,1,(3,4))
layer(x)
[4]:
[[-0.02524132 -0.00874885]
[-0.06026538 -0.01308061]
[ 0.02468396 -0.02181557]]
<NDArray 3x2 @cpu(0)>
As can be seen, the layer’s output size of 2 produced a \((3,2)\) shape output from our \((3,4)\) input. Note that we didn’t specify the input size of layer beforehand (though we can specify it with the argument in_units=4 here); the system automatically infers it the first time we feed in data, then creates and initializes the weights. So we can access the weight after the first forward pass:
[5]:
layer.weight.data()
[5]:
[[-0.00873779 -0.02834515 0.05484822 -0.06206018]
[ 0.06491279 -0.03182812 -0.01631819 -0.00312688]]
<NDArray 2x4 @cpu(0)>
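With the weight in hand, we can also check what Dense(None -> 2, linear) computes: \(xW^\top + b\). A short sketch reusing layer and x from above; layer2 and its explicit in_units argument are ours for illustration:

# Reproduce layer(x) by hand: a linear Dense layer computes x·Wᵀ + b
nd.dot(x, layer.weight.data().T) + layer.bias.data()

# Alternatively, give the input size up front; then the weight shape is
# known right after initialize(), with no forward pass needed
layer2 = nn.Dense(2, in_units=4)
layer2.initialize()
layer2.weight.data().shape  # (2, 4)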
Chain layers into a neural network
Let’s first consider the simple case where a neural network is a chain of layers. During the forward pass, we run the layers sequentially, one by one. The following code implements a famous network called LeNet through nn.Sequential.
[6]:
net = nn.Sequential()
# Add a sequence of layers.
net.add(# Similar to Dense, it is not necessary to specify the input channels
        # by the argument `in_channels`, which will be automatically inferred
        # in the first forward pass. Also, we apply a relu activation on the
        # output. In addition, we can use a tuple to specify a non-square
        # kernel size, such as `kernel_size=(2,4)`
        nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
        # One can also use a tuple to specify non-symmetric pool and stride sizes
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        # The dense layer will automatically reshape the 4-D output of the last
        # max pooling layer into the 2-D shape: (x.shape[0], x.size/x.shape[0])
        nn.Dense(120, activation="relu"),
        nn.Dense(84, activation="relu"),
        nn.Dense(10))
net
net
[6]:
Sequential(
  (0): Conv2D(None -> 6, kernel_size=(5, 5), stride=(1, 1), Activation(relu))
  (1): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (2): Conv2D(None -> 16, kernel_size=(3, 3), stride=(1, 1), Activation(relu))
  (3): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (4): Dense(None -> 120, Activation(relu))
  (5): Dense(None -> 84, Activation(relu))
  (6): Dense(None -> 10, linear)
)
The usage of nn.Sequential is similar to nn.Dense. In fact, both of them are subclasses of nn.Block. The following code shows how to initialize the weights and run the forward pass.
[7]:
net.initialize()
# Input shape is (batch_size, color_channels, height, width)
x = nd.random.uniform(shape=(4,1,28,28))
y = net(x)
y.shape
[7]:
(4, 10)
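Since nn.Sequential is iterable, we can also feed x through the network one layer at a time to watch the shapes change; a quick sketch for inspecting the net above:

out = x
for layer in net:
    out = layer(out)
    print(layer.name, 'output shape:', out.shape)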
We can use [] to index a particular layer. For example, the following accesses the 1st layer’s weight and the 6th layer’s bias.
[8]:
(net[0].weight.data().shape, net[5].bias.data().shape)
[8]:
((6, 1, 5, 5), (84,))
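Indexing reaches one layer at a time; to list every parameter in the network at once, Gluon blocks also provide collect_params(). A small sketch:

# collect_params() gathers all parameters of the block, keyed by name
for name, param in net.collect_params().items():
    print(name, param.shape)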
Create a neural network flexibly
In nn.Sequential, MXNet will automatically construct the forward function that sequentially executes the added layers. Now let’s introduce another way to construct a network with a flexible forward function.

To do it, we create a subclass of nn.Block and implement two methods:

- __init__: create the layers
- forward: define the forward function
[9]:
class MixMLP(nn.Block):
    def __init__(self, **kwargs):
        # Run `nn.Block`'s init method
        super(MixMLP, self).__init__(**kwargs)
        self.blk = nn.Sequential()
        self.blk.add(nn.Dense(3, activation='relu'),
                     nn.Dense(4, activation='relu'))
        self.dense = nn.Dense(5)

    def forward(self, x):
        y = nd.relu(self.blk(x))
        print(y)
        return self.dense(y)

net = MixMLP()
net
[9]:
MixMLP(
  (blk): Sequential(
    (0): Dense(None -> 3, Activation(relu))
    (1): Dense(None -> 4, Activation(relu))
  )
  (dense): Dense(None -> 5, linear)
)
In the sequential chaining approach, we can only add instances of nn.Block subclasses and then run them one after another in the forward pass. Here, by contrast, we were able to call print to inspect the intermediate results and nd.relu to apply a relu activation inside forward. So this approach provides a more flexible way to define the forward function.
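Because forward is plain Python, it can even contain control flow that nn.Sequential cannot express. A minimal sketch; the LoopMLP name and its early-exit rule are made up for illustration:

class LoopMLP(nn.Block):
    def __init__(self, **kwargs):
        super(LoopMLP, self).__init__(**kwargs)
        self.dense = nn.Dense(4)

    def forward(self, x):
        # Apply the same layer repeatedly, stopping early if the
        # activations are (almost) all zero
        for _ in range(3):
            x = nd.relu(self.dense(x))
            if x.norm().asscalar() < 1e-3:
                break
        return x

loop_net = LoopMLP()
loop_net.initialize()
loop_net(nd.random.uniform(shape=(2, 4)))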
The usage of net is similar to before.
[10]:
net.initialize()
x = nd.random.uniform(shape=(2,2))
net(x)
[[0.0000000e+00 0.0000000e+00 6.2900386e-04 7.6445540e-05]
[0.0000000e+00 0.0000000e+00 1.1989386e-03 1.2375204e-03]]
<NDArray 2x4 @cpu(0)>
[10]:
[[-3.8061840e-05 1.5568350e-05 4.3668215e-06 4.2853058e-05
1.8710394e-05]
[-1.8345519e-05 2.6403079e-05 2.4685731e-05 7.7019373e-05
9.7785989e-05]]
<NDArray 2x5 @cpu(0)>
Finally, let’s access a particular layer’s weight:
[11]:
net.blk[1].weight.data()
[11]:
[[-0.0343901 -0.05805862 -0.06187592]
[-0.06210143 -0.00918167 -0.00170272]
[-0.02634858 0.05334064 0.02748809]
[ 0.06669661 -0.01711474 0.01647211]]
<NDArray 4x3 @cpu(0)>
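Parameters are writable as well; set_data replaces a parameter’s value in place. A sketch that zeroes the same weight, purely for illustration:

# set_data writes new values into the parameter in place
net.blk[1].weight.set_data(nd.zeros((4, 3)))
net.blk[1].weight.data()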