Parameter¶

class mxnet.gluon.Parameter(name, grad_req='write', shape=None, dtype=<class 'numpy.float32'>, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True, stype='default', grad_stype='default')[source]¶

A Container holding parameters (weights) of Blocks.

Parameter holds a copy of the parameter on each Context after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Context:

    ctx = mx.gpu(0)
    x = mx.nd.zeros((16, 100), ctx=ctx)
    w = mx.gluon.Parameter('fc_weight', shape=(64, 100), init=mx.init.Xavier())
    b = mx.gluon.Parameter('fc_bias', shape=(64,), init=mx.init.Zero())
    w.initialize(ctx=ctx)
    b.initialize(ctx=ctx)
    out = mx.nd.FullyConnected(x, w.data(ctx), b.data(ctx), num_hidden=64)
- Parameters

  - name (str) – Name of this parameter.
  - grad_req ({'write', 'add', 'null'}, default 'write') – Specifies how to update the gradient to the grad arrays. 'write' means the gradient is written to the grad NDArray every time. 'add' means the gradient is added to the grad NDArray every time; you need to manually call zero_grad() to clear the gradient buffer before each iteration when using this option. 'null' means gradient is not requested for this parameter; gradient arrays will not be allocated.
  - shape (int or tuple of int, default None) – Shape of this parameter. By default shape is not specified. A Parameter with unknown shape can be used with the Symbol API, but init will throw an error when using the NDArray API.
  - dtype (numpy.dtype or str, default 'float32') – Data type of this parameter. For example, numpy.float32 or 'float32'.
  - lr_mult (float, default 1.0) – Learning rate multiplier. The learning rate will be multiplied by lr_mult when updating this parameter with an optimizer.
  - wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similarly to lr_mult.
  - init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.
  - stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter.
  - grad_stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter's gradient.
grad_req¶

This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don't need the gradient w.r.t. x.

- Type: {'write', 'add', 'null'}
lr_mult¶

Local learning rate multiplier for this Parameter. The actual learning rate is calculated with learning_rate * lr_mult. You can set it with param.lr_mult = 2.0.

- Type: float
wd_mult¶

Local weight decay multiplier for this Parameter.

- Type: float
Get and set parameters¶

| initialize() | Initializes parameter and gradient arrays. |
| data(ctx) | Returns a copy of this parameter on one context. |
| list_data() | Returns copies of this parameter on all contexts, in the same order as creation. |
| list_row_sparse_data(row_id) | Returns copies of the 'row_sparse' parameter on all contexts, in the same order as creation. |
| row_sparse_data(row_id) | Returns a copy of the 'row_sparse' parameter on the same context as row_id's. |
| set_data(data) | Sets this parameter's value on all contexts. |
Get and set gradients associated with parameters¶

| grad(ctx) | Returns a gradient buffer for this parameter on one context. |
| list_grad() | Returns gradient buffers on all contexts, in the same order as creation. |
| zero_grad() | Sets gradient buffer on all contexts to 0. |
Handle device contexts¶

| cast(dtype) | Cast data and gradient of this Parameter to a new data type. |
| list_ctx() | Returns a list of contexts this parameter is initialized on. |
| reset_ctx(ctx) | Re-assign Parameter to other contexts. |
Convert to symbol¶

| var() | Returns a symbol representing this parameter. |