Early Exiting with Compressible Activations for Efficient Neural Network Inference
Cloud-based Machine Learning (ML) inference suffers from high latency and sensitivity to connectivity failures. Both can be mitigated by performing as much of the ML processing as possible on the local device. However, since such low-power devices typically have limited resources, early-exit mechanisms have been used to split a neural network hierarchically across multiple devices, trading computation against communication cost. The intermediate activations transmitted at the split point, however, can be large. In this paper, we present a novel entropy-based technique that learns to compress activations during training and encodes them efficiently during inference. We show that, in an early-exit configuration, entropy regularization combined with Huffman coding saves up to 54% in communication cost while keeping the classification accuracy of MobileNetV2 on CIFAR-10 above 85%.
IEEE, Smart Systems Integration, Bruges, Belgium
Internal research funding by DATAPROG (RES3)
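The abstract's pipeline, entropy regularization skewing the activation distribution so that a variable-length code beats a fixed-length one, can be illustrated with a minimal sketch. This is not the paper's implementation: the quantized activation values, their toy distribution, and the `huffman_code` helper are all illustrative assumptions; only the Huffman construction itself is standard.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman code (symbol -> bitstring) from a frequency map."""
    # Heap entries: (frequency, tie_breaker, tree); a tree is either a
    # symbol (leaf) or a pair of subtrees (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    if len(heap) == 1:
        # Degenerate case: a single symbol gets a 1-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

# Hypothetical 2-bit-quantized activations at the split point. Entropy
# regularization during training is assumed to have concentrated mass on
# a few values, which Huffman coding then exploits at inference time.
activations = [0] * 60 + [1] * 20 + [2] * 12 + [3] * 8
codes = huffman_code(Counter(activations))

raw_bits = len(activations) * 2                    # fixed 2 bits/symbol
coded_bits = sum(len(codes[a]) for a in activations)
# With this skewed toy distribution, coded_bits < raw_bits.
```

The saving grows with the skew of the distribution, which is exactly what the entropy regularizer encourages; a uniform activation distribution would yield no gain over the fixed-length baseline.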