mxnet.gluon.rnn.LSTM
class mxnet.gluon.rnn.LSTM(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, **kwargs)
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[
\begin{array}{ll}
i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t * c_{(t-1)} + i_t * g_t \\
h_t = o_t * \tanh(c_t)
\end{array}
\]
where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and output gates, respectively.
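The gate equations above can be traced by hand with a small dependency-free sketch. This is not how MXNet implements the layer (the real operator is fused and works on whole batches); it is one time step of one layer for a single example, written in plain Python. The function names and the parameter dictionary `p` with keys such as `W_ii` and `b_hf` are hypothetical and chosen only to mirror the symbols in the equations.

```python
import math


def sigmoid(z):
    """Logistic function applied to a single float."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """Return W @ x + b for a list-of-rows matrix W and list vectors x, b."""
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of one LSTM layer for a single (unbatched) example.

    `p` is a hypothetical dict holding, for each gate k in "ifgo",
    W_i<k>, b_i<k> (input side) and W_h<k>, b_h<k> (recurrent side).
    """
    pre = {}
    for k in "ifgo":
        from_x = affine(p["W_i" + k], x_t, p["b_i" + k])
        from_h = affine(p["W_h" + k], h_prev, p["b_h" + k])
        pre[k] = [a + b for a, b in zip(from_x, from_h)]

    i_t = [sigmoid(z) for z in pre["i"]]    # input gate
    f_t = [sigmoid(z) for z in pre["f"]]    # forget gate
    g_t = [math.tanh(z) for z in pre["g"]]  # cell candidate
    o_t = [sigmoid(z) for z in pre["o"]]    # output gate

    # c_t = f_t * c_(t-1) + i_t * g_t, element-wise
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # h_t = o_t * tanh(c_t), element-wise
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every sigmoid gate evaluates to 0.5 and the cell candidate to 0, so the step simply halves the previous cell state. That makes a convenient sanity check before plugging in real parameters.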
- Parameters
hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the input-to-hidden bias vector. Passing 'lstmbias' initializes the forget-gate bias to 1 and all other biases to zero.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the hidden-to-hidden (recurrent) bias vector.
projection_size (int, default None) – The number of features after projection.
h2r_weight_initializer (str or Initializer, default None) – Initializer for the projected recurrent weights matrix, used for the linear transformation of the recurrent state to the projected space.
state_clip_min (float or None, default None) – Minimum clip value of LSTM states. This option must be used together with state_clip_max. If None, clipping is not applied.
state_clip_max (float or None, default None) – Maximum clip value of LSTM states. This option must be used together with state_clip_min. If None, clipping is not applied.
state_clip_nan (boolean, default False) – Whether to stop NaN from propagating in state by clipping it to min/max. If the clipping range is not specified, this option is ignored.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
prefix (str or None) – Prefix of this Block.
params (ParameterDict or None) – Shared Parameters for this Block.
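To see how hidden_size, input_size, num_layers and bidirectional interact, the following sketch counts the learnable scalars in a stacked LSTM. It assumes the layout implied by the gate equations (four gates, each with an input-side and a recurrent-side weight matrix and bias vector) and that projection_size is left at None. The helper name `lstm_param_count` is hypothetical and not part of MXNet.

```python
def lstm_param_count(hidden_size, input_size, num_layers=1, bidirectional=False):
    """Count weights and biases of a stacked LSTM without projection."""
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        i2h = 4 * hidden_size * layer_input  # W_ii, W_if, W_ig, W_io
        h2h = 4 * hidden_size * hidden_size  # W_hi, W_hf, W_hg, W_ho
        biases = 2 * 4 * hidden_size         # b_i* and b_h* for the four gates
        total += directions * (i2h + h2h + biases)
        # Deeper layers read the previous layer's output, whose width is
        # hidden_size per direction.
        layer_input = hidden_size * directions
    return total


# Three layers of 100 hidden units on 10 input features.
print(lstm_param_count(100, 10, num_layers=3))  # 206400
```

The first layer is the only one whose size depends on input_size; every later layer sees an input of width hidden_size (or twice that when bidirectional is True).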
- Inputs:
data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, the dimensions are permuted accordingly with the transpose() operator, which adds performance overhead. Consider creating batches in TNC layout during the data batching step.
states: a list of two initial recurrent state tensors. Each has shape (num_layers, batch_size, num_hidden), where num_hidden is hidden_size. If bidirectional is True, the shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as the default begin states.
- Outputs:
out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, the output shape will instead be (sequence_length, batch_size, 2*num_hidden).
out_states: a list of two output recurrent state tensors with the same shape as in states. If states is None, out_states will not be returned.
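The shape rules listed under Inputs and Outputs can be collected into a small helper for checking tensors before they reach the layer. This is a sketch only: `lstm_shapes` is a hypothetical name, it assumes that 'TNC' and 'NTC' are the only accepted layouts, that the state shape does not depend on layout, and that projection_size is None.

```python
def lstm_shapes(seq_len, batch_size, input_size, hidden_size,
                num_layers=1, bidirectional=False, layout="TNC"):
    """Return the expected shapes of data, states and out as plain tuples."""
    if layout not in ("TNC", "NTC"):
        # Assumption: only these two layouts are supported.
        raise ValueError("layout must be 'TNC' or 'NTC', got %r" % (layout,))
    directions = 2 if bidirectional else 1
    dims = {"T": seq_len, "N": batch_size}
    # The C axis carries input_size on the way in and
    # directions * hidden_size on the way out.
    data = tuple(dims.get(axis, input_size) for axis in layout)
    out = tuple(dims.get(axis, directions * hidden_size) for axis in layout)
    state = (directions * num_layers, batch_size, hidden_size)
    return {"data": data, "states": [state, state], "out": out}


shapes = lstm_shapes(5, 3, 10, 100, num_layers=3)
print(shapes["data"])    # (5, 3, 10)
print(shapes["states"])  # [(3, 3, 100), (3, 3, 100)]
print(shapes["out"])     # (5, 3, 100)
```

The two entries in `states` correspond to the hidden state and the cell state, in that order.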
Examples
>>> import mxnet as mx
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> c0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])
__init__(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(hidden_size[, num_layers, layout, …]) – Initialize self.
apply(fn) – Applies fn recursively to every child block as well as self.
begin_state([batch_size, func]) – Initial state for this cell.
cast(dtype) – Cast this Block to use another data type.
collect_params([select]) – Returns a ParameterDict containing the Parameters of this Block and all of its children (default); can also return only the Parameters whose names match the given regular expressions.
export(path[, epoch]) – Export HybridBlock to json format that can be loaded by SymbolBlock.imports, mxnet.mod.Module or the C++ interface.
forward(x, *args) – Defines the forward computation.
hybrid_forward(F, inputs[, states]) – Overrides to construct symbolic graph for this Block.
hybridize([active]) – Activates or deactivates HybridBlocks recursively.
infer_shape(*args) – Infers shape of Parameters from inputs.
infer_type(*args) – Infers data type of Parameters from inputs.
initialize([init, ctx, verbose, force_reinit]) – Initializes Parameters of this Block and its children.
load_parameters(filename[, ctx, …]) – Load parameters from file previously saved by save_parameters.
load_params(filename[, ctx, allow_missing, …]) – [Deprecated] Please use load_parameters.
name_scope() – Returns a name space object managing a child Block and parameter names.
register_child(block[, name]) – Registers block as a child of self.
register_forward_hook(hook) – Registers a forward hook on the block.
register_forward_pre_hook(hook) – Registers a forward pre-hook on the block.
save_parameters(filename) – Save parameters to file.
save_params(filename) – [Deprecated] Please use save_parameters.
state_info([batch_size])
summary(*inputs) – Print the summary of the model’s output and parameters.
Attributes
name – Name of this Block, without ‘_’ in the end.
params – Returns this Block’s parameter dictionary (does not include its children’s parameters).
prefix – Prefix of this Block.