mxnet.optimizer.LBSGD
class mxnet.optimizer.LBSGD(momentum=0.0, multi_precision=False, warmup_strategy='linear', warmup_epochs=5, batch_scale=1, updates_per_epoch=32, begin_epoch=0, num_epochs=60, **kwargs)

The Large Batch SGD optimizer with momentum and weight decay.
The optimizer updates the weight by:
state = momentum * state + lr * rescale_grad * clip(grad, clip_gradient) + wd * weight
weight = weight - state
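To make the update rule concrete, here is a minimal NumPy sketch of a single step. The function name and the interpretation of clip(grad, clip_gradient) as an elementwise clip to [-clip_gradient, clip_gradient] are illustrative assumptions, not the library's internal implementation:

    import numpy as np

    def lbsgd_step(weight, grad, state, lr=0.1, momentum=0.9,
                   wd=0.0001, rescale_grad=1.0, clip_gradient=1.0):
        # clip(grad, clip_gradient): assumed elementwise clip to
        # [-clip_gradient, clip_gradient]
        g = np.clip(grad, -clip_gradient, clip_gradient)
        # The momentum/weight-decay update exactly as stated above.
        state = momentum * state + lr * rescale_grad * g + wd * weight
        weight = weight - state
        return weight, state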
For details of the update algorithm see sgd_update and sgd_mom_update.

In addition to the SGD updates, the LBSGD optimizer uses the LARS (Layer-wise Adaptive Rate Scaling) algorithm to compute a separate learning rate for each layer of the network, which leads to better stability with large batch sizes; a sketch of the idea follows.
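The core of LARS is a per-layer "trust ratio" that scales the global learning rate by the ratio of the weight norm to the (regularized) gradient norm. A minimal NumPy sketch of that ratio, where the eta coefficient and the epsilon guard are illustrative assumptions rather than LBSGD's exact internals:

    import numpy as np

    def lars_local_lr(weight, grad, lr, wd, eta=0.001, eps=1e-9):
        # Per-layer learning rate: scale the global lr by the ratio of
        # the weight norm to the gradient norm plus weight decay.
        w_norm = np.linalg.norm(weight)
        g_norm = np.linalg.norm(grad)
        trust_ratio = eta * w_norm / (g_norm + wd * w_norm + eps)
        return lr * trust_ratio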
This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters
momentum (float, optional) – The momentum value.

multi_precision (bool, optional) – Flag to control the internal precision of the optimizer. False: use the same precision as the weights (default). True: make an internal 32-bit copy of the weights and apply gradients in 32-bit precision even if the actual weights used in the model have lower precision. Turning this on can improve convergence and accuracy when training with float16.

warmup_strategy (string, one of 'linear', 'power2', 'sqrt', 'lars'; default: 'linear') – Strategy used to warm up the learning rate.

warmup_epochs (unsigned, default: 5) – Number of warm-up epochs.

batch_scale (unsigned, default: 1) – Batch scale (batch size * number of workers).

updates_per_epoch (unsigned, default: 32) – Number of updates per epoch; the default might not reflect the true number of batches per epoch. Used for warmup.

begin_epoch (unsigned, default: 0) – Starting epoch.
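As a usage illustration, here is a minimal sketch of constructing the optimizer with these parameters and handing it to a Gluon Trainer. The model, hyperparameter values, and data-parallel setup are placeholders (assumes MXNet 1.x with the Gluon API):

    import mxnet as mx
    from mxnet import gluon

    net = gluon.nn.Dense(10)            # placeholder model
    net.initialize()

    opt = mx.optimizer.LBSGD(
        learning_rate=0.1,
        momentum=0.9,
        multi_precision=False,
        warmup_strategy='linear',
        warmup_epochs=5,
        batch_scale=256,                # batch size * number of workers
        updates_per_epoch=100,          # true batches per epoch, if known
        begin_epoch=0,
        num_epochs=60,
    )
    trainer = gluon.Trainer(net.collect_params(), opt)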
__init__(momentum=0.0, multi_precision=False, warmup_strategy='linear', warmup_epochs=5, batch_scale=1, updates_per_epoch=32, begin_epoch=0, num_epochs=60, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
Methods

__init__([momentum, multi_precision, …])
    Initialize self.

create_optimizer(name, **kwargs)
    Instantiates an optimizer with a given name and kwargs.

create_state(index, weight)
    Creates auxiliary state for a given weight.

create_state_multi_precision(index, weight)
    Creates auxiliary state for a given weight, including an FP32 high-precision copy if the original weight is FP16.

register(klass)
    Registers a new optimizer.

set_learning_rate(lr)
    Sets a new learning rate of the optimizer.

set_lr_mult(args_lr_mult)
    Sets an individual learning rate multiplier for each parameter.

set_lr_scale(args_lrscale)
    [DEPRECATED] Sets lr scale.

set_wd_mult(args_wd_mult)
    Sets an individual weight decay multiplier for each parameter.

update(index, weight, grad, state)
    Updates the given parameter using the corresponding gradient and state.

update_multi_precision(index, weight, grad, …)
    Updates the given parameter using the corresponding gradient and state.
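For illustration, a minimal sketch of driving the optimizer manually through create_state and update on a single NDArray; the parameter index, shapes, and values are arbitrary placeholders, and the in-place behavior noted in the comments is an assumption based on how MXNet's SGD-family optimizers typically apply updates:

    import mxnet as mx

    opt = mx.optimizer.LBSGD(learning_rate=0.1, momentum=0.9)
    weight = mx.nd.ones((3,))
    grad = mx.nd.ones((3,)) * 0.5

    state = opt.create_state(0, weight)   # auxiliary (momentum) state for parameter 0
    opt.update(0, weight, grad, state)    # assumed to apply the update to weight in place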
Attributes
learning_rate
opt_registry