mxnet.ndarray.BatchNorm
mxnet.ndarray.BatchNorm(data=None, gamma=None, beta=None, moving_mean=None, moving_var=None, eps=_Null, momentum=_Null, fix_gamma=_Null, use_global_stats=_Null, output_mean_var=_Null, axis=_Null, cudnn_off=_Null, out=None, name=None, **kwargs)

Batch normalization.

Normalizes a data batch by mean and variance, and applies a scale gamma as well as an offset beta.

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:
\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]

Then compute the normalized output, which has the same shape as the input, as follows:
\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]

Both mean and var return a scalar by treating the input as a vector.
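The two formulas above can be sketched in NumPy as follows. This is an illustrative reference, not the MXNet implementation: the function name batch_norm_axis1 is hypothetical, and the sketch assumes the variance is the population variance (NumPy's default, ddof=0) and uses the documented default eps of 0.001.

```python
import numpy as np

def batch_norm_axis1(data, gamma, beta, eps=1e-3):
    """Normalize `data` per channel (axis 1) using statistics of the batch itself."""
    # Reduce over every axis except the channel axis (axis 1).
    reduce_axes = tuple(ax for ax in range(data.ndim) if ax != 1)
    data_mean = data.mean(axis=reduce_axes)  # shape (k,)
    data_var = data.var(axis=reduce_axes)    # shape (k,), population variance (assumption)

    # Reshape the (k,) vectors to (1, k, 1, ...) so they broadcast along axis 1.
    bshape = [1] * data.ndim
    bshape[1] = data.shape[1]

    normalized = (data - data_mean.reshape(bshape)) / np.sqrt(data_var.reshape(bshape) + eps)
    out = normalized * gamma.reshape(bshape) + beta.reshape(bshape)
    return out, data_mean, data_var

# Example: a batch of 8 inputs with k = 4 channels and 5x5 spatial size.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))
out, mean, var = batch_norm_axis1(x, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma set to ones and beta to zeros, each channel of out has mean close to zero and variance close to data_var / (data_var + eps).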
Assume the input has size k on axis 1, then both gamma and beta have shape (k,). If output_mean_var is set to true, then the operator outputs both data_mean and the inverse of data_var, which are needed for the backward pass. Note that the gradients of these two outputs are blocked.

Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var, which are k-length vectors. They are global statistics for the whole dataset, which are updated by:

    moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
    moving_var = moving_var * momentum + data_var * (1 - momentum)
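As a small worked illustration of the update rule above, the following NumPy sketch applies it repeatedly with the documented default momentum of 0.9. The helper name update_moving_stats and the sample statistics are hypothetical; when and how the operator itself performs this update is not shown here.

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, data_mean, data_var, momentum=0.9):
    """Apply one exponential-moving-average step to the auxiliary states."""
    new_mean = moving_mean * momentum + data_mean * (1 - momentum)
    new_var = moving_var * momentum + data_var * (1 - momentum)
    return new_mean, new_var

# Hypothetical starting states and per-batch statistics for k = 3 channels.
moving_mean = np.zeros(3)
moving_var = np.ones(3)
batch_mean = np.array([1.0, 2.0, 3.0])
batch_var = np.array([4.0, 4.0, 4.0])

# A single step moves each state 10% of the way toward the batch statistic.
first_mean, first_var = update_moving_stats(moving_mean, moving_var, batch_mean, batch_var)

# Feeding the same batch statistics many times drives the states toward them.
for _ in range(50):
    moving_mean, moving_var = update_moving_stats(moving_mean, moving_var, batch_mean, batch_var)
```

A momentum closer to 1 makes the moving statistics change more slowly, so they average over more batches.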
If use_global_stats is set to true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.

The parameter axis specifies which axis of the input shape denotes the ‘channel’ (separately normalized groups). The default is 1. Specifying -1 sets the channel axis to be the last item in the input shape.

Both gamma and beta are learnable parameters. But if fix_gamma is true, then gamma is set to 1 and its gradient to 0.

Note

When fix_gamma is set to True, no sparse support is provided. If fix_gamma is set to False, the sparse tensors will fall back.

Defined in src/operator/nn/batch_norm.cc:L574
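To make the roles of axis, use_global_stats and fix_gamma concrete, here is a forward-only NumPy sketch of the behaviour described above. The function name batch_norm_reference is hypothetical and this is not MXNet code: it assumes population variance for the batch statistics, and it does not model gradients, sparse inputs, or the moving-statistics update.

```python
import numpy as np

def batch_norm_reference(data, gamma, beta, moving_mean, moving_var,
                         eps=1e-3, axis=1, fix_gamma=True, use_global_stats=False):
    """Forward-only sketch of the documented BatchNorm behaviour."""
    axis = axis % data.ndim  # maps axis=-1 to the last axis
    reduce_axes = tuple(ax for ax in range(data.ndim) if ax != axis)

    # Shape that broadcasts a (k,) vector along the channel axis.
    bshape = [1] * data.ndim
    bshape[axis] = data.shape[axis]

    if use_global_stats:
        # Use the stored statistics instead of the current batch.
        mean, var = moving_mean, moving_var
    else:
        mean, var = data.mean(axis=reduce_axes), data.var(axis=reduce_axes)

    if fix_gamma:
        # gamma is treated as 1, whatever values were passed in.
        gamma = np.ones_like(gamma)

    scale = gamma.reshape(bshape) / np.sqrt(var.reshape(bshape) + eps)
    return (data - mean.reshape(bshape)) * scale + beta.reshape(bshape)

# Channels-last input: 4 channels on the last axis, selected with axis=-1.
rng = np.random.default_rng(1)
x = rng.normal(size=(2, 3, 3, 4))
gamma = np.full(4, 5.0)
beta = np.zeros(4)
stored_mean = np.zeros(4)
stored_var = np.ones(4)

out_fixed = batch_norm_reference(x, gamma, beta, stored_mean, stored_var, axis=-1, fix_gamma=True)
out_free = batch_norm_reference(x, gamma, beta, stored_mean, stored_var, axis=-1, fix_gamma=False)
out_global = batch_norm_reference(x, gamma, beta, stored_mean, stored_var, axis=-1,
                                  use_global_stats=True)
```

In this sketch out_free is five times out_fixed because the passed gamma is only honoured when fix_gamma is false, and out_global depends only on the stored statistics, so it reduces to a per-channel scale and shift of the input.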
- Parameters
data (NDArray) – Input data to batch normalization
gamma (NDArray) – gamma array
beta (NDArray) – beta array
moving_mean (NDArray) – running mean of input
moving_var (NDArray) – running variance of input
eps (double, optional, default=0.001) – Epsilon to prevent div 0. Must be no less than CUDNN_BN_MIN_EPSILON defined in cudnn.h when using cudnn (usually 1e-5)
momentum (float, optional, default=0.9) – Momentum for moving average
fix_gamma (boolean, optional, default=1) – Fix gamma while training
use_global_stats (boolean, optional, default=0) – Whether to use the global moving statistics instead of the local batch statistics. This forces batch-norm to act as a scale-shift operator.
output_mean_var (boolean, optional, default=0) – Output the mean and inverse std
axis (int, optional, default='1') – Specifies which axis of the input shape is the channel axis
cudnn_off (boolean, optional, default=0) – Do not select CUDNN operator, if available
out (NDArray, optional) – The output NDArray to hold the result.
- Returns
out – The output of this function.
- Return type
NDArray or list of NDArrays