mxnet.optimizer.AdaGrad
class mxnet.optimizer.AdaGrad(eps=1e-07, **kwargs)
AdaGrad optimizer.

This class implements the AdaGrad optimizer described in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, available at http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf.

This optimizer updates each weight by:

    grad = clip(grad * rescale_grad, clip_gradient)
    history += square(grad)
    div = grad / sqrt(history + float_stable_eps)
    weight += (div + weight * wd) * -lr

where float_stable_eps is the value passed as eps.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters
- eps (float, optional) – Small constant added to the history accumulator before the square root (equivalently, the accumulator's initial value). Avoids division by 0.
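The update rule can be traced element by element with a minimal pure-Python sketch. It assumes weights, gradients and the history accumulator are plain lists of floats rather than MXNet NDArrays, and it treats clip_gradient=None as "no clipping". The function name adagrad_step is hypothetical; this is an illustration of the formula, not MXNet's implementation.

```python
import math


def adagrad_step(weight, grad, history, lr=0.01, wd=0.0,
                 rescale_grad=1.0, clip_gradient=None, eps=1e-7):
    """Apply one AdaGrad update, modifying `weight` and `history` in place.

    All three sequences are plain Python lists of floats of equal length.
    clip_gradient=None is taken to mean "no clipping" (an assumption of
    this sketch).
    """
    for i in range(len(weight)):
        g = grad[i] * rescale_grad
        if clip_gradient is not None:
            # clip(x, c): bound x to the interval [-c, c]
            g = max(-clip_gradient, min(clip_gradient, g))
        history[i] += g * g                         # history += square(grad)
        div = g / math.sqrt(history[i] + eps)       # eps acts as float_stable_eps
        weight[i] += (div + weight[i] * wd) * -lr   # weight += (div + weight * wd) * -lr


w = [1.0, -2.0]
h = [0.0, 0.0]
adagrad_step(w, [0.5, -4.0], h, lr=0.1)
print(w, h)
```

With wd=0 and a tiny eps, the first step moves every coordinate by almost exactly lr regardless of the gradient's magnitude, because div is roughly grad / |grad| when the history holds a single squared gradient. Later steps shrink as the history grows.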
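The eps parameter description and the float_stable_eps term in the update rule describe the same denominator. Assuming the accumulator starts at zero (as history += square(grad) suggests), adding eps inside the square root at every step gives sqrt(eps + sum of squared gradients), exactly as if the accumulator had started at eps. The short check below uses plain Python floats, not MXNet code, and compares both forms on a gradient sequence whose first entry is zero.

```python
import math

eps = 1e-7
grads = [0.0, 0.3, -0.2]   # the first gradient is exactly zero

hist_a = 0.0   # form A: start at 0, add eps inside the square root each step
hist_b = eps   # form B: start at eps, add nothing inside the square root
steps_a, steps_b = [], []
for g in grads:
    hist_a += g * g
    hist_b += g * g
    steps_a.append(g / math.sqrt(hist_a + eps))
    steps_b.append(g / math.sqrt(hist_b))

# Without eps the first step would be 0.0 / sqrt(0.0), which raises
# ZeroDivisionError in Python; with eps it is simply 0.0.
print(steps_a)
print(steps_b)
```

The two lists agree up to floating-point rounding, since only the order of additions differs.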
__init__(eps=1e-07, **kwargs)
- Initialize self. See help(type(self)) for accurate signature. 
Methods

- __init__([eps]) – Initialize self.
- create_optimizer(name, **kwargs) – Instantiates an optimizer with a given name and kwargs.
- create_state(index, weight) – Creates auxiliary state for a given weight.
- create_state_multi_precision(index, weight) – Creates auxiliary state for a given weight, including an FP32 high-precision copy if the original weight is FP16.
- register(klass) – Registers a new optimizer.
- set_learning_rate(lr) – Sets a new learning rate of the optimizer.
- set_lr_mult(args_lr_mult) – Sets an individual learning rate multiplier for each parameter.
- set_lr_scale(args_lrscale) – [DEPRECATED] Sets lr scale.
- set_wd_mult(args_wd_mult) – Sets an individual weight decay multiplier for each parameter.
- update(index, weight, grad, state) – Updates the given parameter using the corresponding gradient and state.
- update_multi_precision(index, weight, grad, …) – Updates the given parameter using the corresponding gradient and state.

Attributes

- learning_rate
- opt_registry
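The create_state and update entries describe a two-step protocol: state is created once per weight index and then passed back to update on every step, which mutates both the weight and the state. The sketch below uses a hypothetical pure-Python stand-in, ToyAdaGrad, which is not an MXNet class; it works on lists of floats instead of NDArrays and omits rescale_grad and clip_gradient for brevity.

```python
import math


class ToyAdaGrad:
    """Hypothetical stand-in mimicking create_state(index, weight) and
    update(index, weight, grad, state); not MXNet's implementation."""

    def __init__(self, learning_rate=0.01, wd=0.0, eps=1e-7):
        self.learning_rate = learning_rate
        self.wd = wd
        self.eps = eps

    def create_state(self, index, weight):
        # One zero-initialised history entry per element of this weight.
        return [0.0] * len(weight)

    def update(self, index, weight, grad, state):
        # Mutates weight and state in place; index is unused in this toy.
        for i, g in enumerate(grad):
            state[i] += g * g
            div = g / math.sqrt(state[i] + self.eps)
            weight[i] += (div + weight[i] * self.wd) * -self.learning_rate


opt = ToyAdaGrad(learning_rate=0.1)
weight = [1.0]
state = opt.create_state(0, weight)  # created once, reused on every step
steps = []
for _ in range(3):
    before = weight[0]
    opt.update(0, weight, [1.0], state)  # constant gradient of 1.0
    steps.append(before - weight[0])
print(steps)  # approximately [0.1, 0.0707, 0.0577]
```

For a constant gradient the step size falls off like lr / sqrt(t), which is why the state must persist between calls: creating fresh state on every step would reset the history and keep the step at about lr.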