IOS: Inter-Operator Scheduler for CNN Acceleration
Abstract: To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization. However, given the rapid advances in high-performance hardware, a single operator can no longer fully utilize the available parallelism, leaving a large gap between peak and achieved performance. This gap is more severe at smaller batch sizes. In this work, we extensively study parallelism between operators and propose the Inter-Operator Scheduler (IOS), which automatically schedules the parallel execution of multiple operators through a novel dynamic programming algorithm. IOS consistently outperforms state-of-the-art libraries (e.g., TensorRT) by 1.1x to 1.5x on modern CNN benchmarks.
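To illustrate the idea behind the dynamic programming formulation, here is a minimal sketch (not the paper's actual algorithm or cost model): operators in a small dependency DAG are partitioned into stages, ops within a stage run concurrently, and the DP memoizes the best remaining latency for each set of already-scheduled operators. The operator names, latencies, and the "stage takes as long as its slowest op" cost model are all illustrative assumptions.

```python
from functools import lru_cache
from itertools import chain, combinations

# Toy DAG and per-operator latencies (illustrative assumptions, not measured).
latency = {"a": 3.0, "b": 2.0, "c": 2.5, "d": 1.0}
preds = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"]}
ops = tuple(latency)

def ready(done):
    # Operators not yet scheduled whose predecessors have all finished.
    return [o for o in ops if o not in done and all(p in done for p in preds[o])]

def stage_cost(group):
    # Simplified cost model: ops in a stage run fully in parallel,
    # so the stage lasts as long as its slowest operator.
    return max(latency[o] for o in group)

@lru_cache(maxsize=None)
def best(done):
    # done: canonically sorted tuple of scheduled operators (the DP state).
    if len(done) == len(ops):
        return 0.0
    cands = ready(done)
    # Try every nonempty subset of ready ops as the next parallel stage.
    groups = chain.from_iterable(
        combinations(cands, r) for r in range(1, len(cands) + 1)
    )
    return min(
        stage_cost(g) + best(tuple(sorted(set(done) | set(g))))
        for g in groups
    )

print(best(()))  # minimal end-to-end latency under this toy model
```

Even on this toy example the DP prefers running `a` and `b` in one stage and `c` and `d` in the next, rather than executing the four operators sequentially; IOS applies the same principle to real CNN computation graphs with measured operator costs.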
Figure: Overview of Inter-Operator Scheduler
Figure: Two Parallelization Strategies for Parallelizing Operators
@inproceedings{ding2021ios,
  author    = {Yaoyao Ding and Ligeng Zhu and Zhihao Jia and Gennady Pekhimenko and Song Han},
  title     = {{IOS: Inter-Operator Scheduler for CNN Acceleration}},
  booktitle = {Proceedings of Machine Learning and Systems},
  volume    = {3},
  year      = {2021}
}
Acknowledgments: We want to thank Xiaodan (Serina) Tan for help with NVIDIA GPU-related issues and for constructive discussions. This project was supported by the Canada Foundation for Innovation JELF grant, NSERC Discovery grant, AWS Machine Learning Research Award, Facebook Faculty Research Award, MIT-IBM Watson AI Lab, MIT Data Science and AI Lab (DSAIL), NVIDIA, and NSF CAREER Award #1943349.