A PyTorch >=1.0 implementation of DenseNets, optimized to save GPU memory.
While DenseNets are fairly easy to implement in deep learning frameworks, most implementations (such as the original) tend to be memory-hungry. In particular, the number of intermediate feature maps generated by batch normalization and concatenation operations grows quadratically with network depth. It is worth emphasizing that this is not a property inherent to DenseNets, but rather to the implementation.
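To make the quadratic growth concrete, the sketch below counts stored channels in a single dense block under a simplified model. This is an illustration under stated assumptions, not a measurement of this code: every layer emits `growth_rate` new channels, a naive implementation keeps the concatenated input of every layer, and a memory-conscious one keeps only the block input and each layer's new output. The function names are hypothetical.

```python
def naive_stored_channels(num_layers, growth_rate, init_channels):
    """Channels kept if every layer's concatenated input is stored."""
    total = 0
    for layer in range(num_layers):
        # Layer `layer` sees the block input plus all earlier layer outputs.
        total += init_channels + layer * growth_rate
    return total


def shared_stored_channels(num_layers, growth_rate, init_channels):
    """Channels kept if only the block input and each new output are stored."""
    return init_channels + num_layers * growth_rate


if __name__ == "__main__":
    # Doubling the number of layers roughly quadruples the naive count,
    # while the shared count only roughly doubles.
    for n in (6, 12, 24):
        print(n, naive_stored_channels(n, 12, 24), shared_stored_channels(n, 12, 24))
```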
This implementation uses a new strategy to reduce the memory consumption of DenseNets. We use checkpointing to compute the batch normalization and concatenation feature maps: these intermediate feature maps are discarded during the forward pass and recomputed for the backward pass. This adds 15-20% time overhead to training, but reduces the memory consumed by feature maps from quadratic to linear in network depth.
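The checkpointing idea can be sketched with `torch.utils.checkpoint`. The class below is a hypothetical, simplified layer written for illustration, not the code in `models/densenet.py`: it wraps the concatenation, batch normalization, ReLU, and 1x1 convolution in one function so that their intermediate outputs are freed after the forward pass and recomputed during the backward pass. It assumes a PyTorch version recent enough to accept the `use_reentrant` argument.

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint as cp


class CheckpointedBottleneck(nn.Module):
    """Hypothetical layer: concat -> BatchNorm -> ReLU -> 1x1 conv."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def _bottleneck(self, *prev_features):
        # The concatenated and normalized tensors are the large intermediates.
        concatenated = torch.cat(prev_features, dim=1)
        return self.conv(self.relu(self.norm(concatenated)))

    def forward(self, *prev_features):
        if any(f.requires_grad for f in prev_features):
            # Intermediates inside _bottleneck are not kept for backward;
            # they are recomputed when gradients are needed.
            return cp.checkpoint(self._bottleneck, *prev_features, use_reentrant=False)
        # No gradients are required, so there is nothing to recompute later.
        return self._bottleneck(*prev_features)
```

Calling `CheckpointedBottleneck(8, 4)` on two 4-channel feature maps produces a 4-channel output; the trade-off is one extra forward computation of the wrapped function per backward pass.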
This implementation is inspired by this technical report, which outlines a strategy for efficient DenseNets via memory sharing.
In your existing project:

There is one file in the `models` folder.

- `models/densenet.py` is an implementation based on the torchvision and project killer implementations.

If you care about speed and memory is not a concern, pass the `efficient=False` argument into the `DenseNet` constructor. Otherwise, pass in `efficient=True`.
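Options:

- `block_config` option: controls the depth of the network (the number of layers in each dense block)
- `efficient=True` uses the memory-efficient version
- `small_inputs=False` is intended for larger inputs such as ImageNet. For CIFAR or SVHN, set `small_inputs=True`.

Running the demo: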
The only extra package you need to install is python-fire:
pip install fire
Single GPU:
CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --data <path_to_folder_with_cifar10> --save <path_to_save_dir>
Multiple GPUs:
CUDA_VISIBLE_DEVICES=0,1,2 python demo.py --efficient True --data <path_to_folder_with_cifar10> --save <path_to_save_dir>
Options:
- `--depth` (int) - depth of the network (number of convolution layers) (default 40)
- `--growth_rate` (int) - number of features added per DenseNet layer (default 12)
- `--n_epochs` (int) - number of epochs for training (default 300)
- `--batch_size` (int) - size of minibatch (default 256)
- `--seed` (int) - manually set the random seed (default None)

A comparison of the two implementations (each is a DenseNet-BC with 100 layers, batch size 64, tested on an NVIDIA Pascal Titan-X):
Implementation | Memory consumption (GB/GPU) | Speed (sec/mini batch) |
---|---|---|
Naive | 2.863 | 0.165 |
Efficient | 1.605 | 0.207 |
Efficient (multi-GPU) | 0.985 | - |
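The `--depth` option and the 100-layer DenseNet-BC above can be related to per-block layer counts with a little arithmetic. The sketch below assumes the usual DenseNet-BC convention: three dense blocks, two convolutions per dense layer (a 1x1 bottleneck followed by a 3x3 convolution), plus one initial convolution, one convolution per transition between blocks, and a final classifier layer. `block_config_for_depth` is a hypothetical helper, and `demo.py` may derive its configuration differently.

```python
def block_config_for_depth(depth, num_blocks=3):
    """Return per-block dense-layer counts for a DenseNet-BC of the given depth."""
    # Layers outside the dense blocks: initial conv, one conv per transition
    # (num_blocks - 1 of them), and the final classifier.
    non_block_layers = num_blocks + 1
    conv_layers_in_blocks = depth - non_block_layers
    # Each dense layer holds two convolutions in the bottleneck (BC) variant.
    convs_per_config_step = 2 * num_blocks
    if conv_layers_in_blocks <= 0 or conv_layers_in_blocks % convs_per_config_step != 0:
        raise ValueError("depth does not fit this DenseNet-BC layout")
    layers_per_block = conv_layers_in_blocks // convs_per_config_step
    return tuple([layers_per_block] * num_blocks)


if __name__ == "__main__":
    print(block_config_for_depth(40))
    print(block_config_for_depth(100))
```

Under these assumptions, a depth of 100 corresponds to 16 dense layers in each of the three blocks, and the default depth of 40 corresponds to 6 per block.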
@article{pleiss2017memory,
title={Memory-Efficient Implementation of DenseNets},
author={Pleiss, Geoff and Chen, Danlu and Huang, Gao and Li, Tongcheng and van der Maaten, Laurens and Weinberger, Kilian Q},
journal={arXiv preprint arXiv:1707.06990},
year={2017}
}