mxnet.gluon.data.DataLoader

class mxnet.gluon.data.DataLoader(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None, batchify_fn=None, num_workers=0, pin_memory=False, prefetch=None, thread_pool=False)

Loads data from a dataset and returns mini-batches of data.
Parameters
dataset (Dataset) – Source dataset. Note that numpy and mxnet arrays can be directly used as a Dataset.
batch_size (int) – Size of mini-batch.
shuffle (bool) – Whether to shuffle the samples.
sampler (Sampler) – The sampler to use. Either specify sampler or shuffle, not both.
last_batch ({'keep', 'discard', 'rollover'}) –
How to handle the last batch if batch_size does not evenly divide len(dataset). See the example after this parameter list.
keep - A batch with fewer samples than the previous batches is returned.
discard - The last batch is discarded if it is incomplete.
rollover - The remaining samples are rolled over to the next epoch.
batch_sampler (Sampler) – A sampler that returns mini-batches. Do not specify batch_size, shuffle, sampler, and last_batch if batch_sampler is specified.
batchify_fn (callable) –
Callback function that lets users specify how to merge samples into a batch. Defaults to default_batchify_fn (a custom-padding example follows the Methods table below):
def default_batchify_fn(data):
    # Stack NDArray samples directly along a new batch axis.
    if isinstance(data[0], nd.NDArray):
        return nd.stack(*data)
    # For tuple samples such as (data, label), batchify each field separately.
    elif isinstance(data[0], tuple):
        data = zip(*data)
        return [default_batchify_fn(i) for i in data]
    # Otherwise convert plain Python values through numpy.
    else:
        data = np.asarray(data)
        return nd.array(data, dtype=data.dtype)
num_workers (int, default 0) – The number of multiprocessing workers to use for data preprocessing.
pin_memory (boolean, default False) – If True, the DataLoader copies NDArrays into pinned memory before returning them. Copying from pinned CPU memory to the GPU is faster than from normal CPU memory.
prefetch (int, default is num_workers * 2) – The number of batches to prefetch; only takes effect if num_workers > 0. If prefetch > 0, worker processes prefetch batches before the iterator requests them. A larger value gives smoother performance but consumes more shared memory, while a value that is too small may forfeit the benefit of multiple worker processes; in that case, try reducing num_workers instead.
thread_pool (bool, default False) – If True, use a thread pool instead of a multiprocessing pool. A thread pool avoids shared-memory usage. If the DataLoader is I/O-bound, or the GIL is not a bottleneck, the thread-pool version may achieve better performance than multiprocessing.
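A minimal usage sketch, assuming a toy 10-sample dataset and batch_size=4 (both illustrative, not part of the API above), shows how batch_size, shuffle, and last_batch interact:

from mxnet import nd
from mxnet.gluon.data import ArrayDataset, DataLoader

# Illustrative toy data: 10 samples with 3 features each, plus scalar labels.
X = nd.random.uniform(shape=(10, 3))
y = nd.arange(10)
dataset = ArrayDataset(X, y)

# batch_size=4 does not evenly divide len(dataset)=10, so last_batch applies:
# 'keep' returns a final batch of 2, 'discard' drops it, and 'rollover'
# carries the 2 leftover samples into the next epoch.
loader = DataLoader(dataset, batch_size=4, shuffle=True, last_batch='keep')

for data, label in loader:
    print(data.shape, label.shape)
# (4, 3) (4,)
# (4, 3) (4,)
# (2, 3) (2,)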
__init__(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None, batchify_fn=None, num_workers=0, pin_memory=False, prefetch=None, thread_pool=False)

Initialize self. See help(type(self)) for accurate signature.
Methods

__init__(dataset[, batch_size, shuffle, …])    Initialize self.
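As noted under batchify_fn above, samples that nd.stack cannot batch directly (for example, variable-length sequences) need a custom merge function. The following is a sketch, not part of the documented API; the sequence lengths and zero-padding scheme are illustrative assumptions:

from mxnet import nd
from mxnet.gluon.data import SimpleDataset, DataLoader

# Illustrative variable-length sequences; nd.stack would fail on these.
seqs = SimpleDataset([nd.arange(n) for n in (3, 5, 2, 4)])

def pad_batchify_fn(data):
    # Zero-pad every sequence to the longest length in this batch,
    # then stack into a single (batch, max_len) NDArray.
    max_len = max(d.shape[0] for d in data)
    padded = [nd.concat(d, nd.zeros(max_len - d.shape[0]), dim=0)
              if d.shape[0] < max_len else d
              for d in data]
    return nd.stack(*padded)

loader = DataLoader(seqs, batch_size=2, batchify_fn=pad_batchify_fn)
for batch in loader:
    print(batch.shape)
# (2, 5)
# (2, 4)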