Yanzhi Wang - Towards the Limits of Energy Efficiency and Performance of Deep Learning Systems
Deep learning systems have achieved unprecedented progress in a number of fields such as computer vision, robotics, game playing, unmanned driving and aerial systems, and other AI-related fields. However, rapidly expanding model sizes pose a significant restriction on both computation and weight storage, for both inference and training, and for both high-performance computing systems and low-power embedded and IoT applications. To overcome these limitations, we propose a holistic framework that incorporates structured matrices into deep learning systems and achieves (i) simultaneous reduction of weight storage and computational complexity, (ii) simultaneous speedup of training and inference, and (iii) generality and fundamentality, allowing adoption in both software and hardware implementations, on different platforms, and across different neural network types, sizes, and scales. Beyond these algorithm-level achievements, our framework has (i) a solid theoretical foundation proving that our approach converges to the same "effectiveness" as deep learning without compression, and demonstrating that it approaches the theoretical limits of computation and storage in deep learning systems; and (ii) platform-specific implementations and optimizations on smartphones, FPGAs, and ASIC circuits. We demonstrate that our smartphone-based implementation achieves speeds similar to GPU and existing ASIC implementations on the same application, while our FPGA-based implementations achieve a 20X speedup and an 80,000X energy efficiency improvement compared with state-of-the-art GPU implementations.
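As a minimal illustration of why structured matrices can reduce storage and computation at the same time (this sketch uses a circulant matrix as one well-known example of a structured weight block; it is not claimed to be the exact construction in the talk): an n x n circulant matrix is fully determined by its first column, so it needs only n parameters instead of n^2, and its matrix-vector product can be computed with FFTs in O(n log n) instead of O(n^2).

```python
import numpy as np

def circulant_matvec_fft(c, x):
    """Multiply the circulant matrix defined by first column c with x,
    using the FFT-based circular convolution: O(n log n) time, O(n) storage."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_matvec_dense(c, x):
    """Reference O(n^2) product using the explicit dense circulant matrix,
    where C[i, j] = c[(i - j) mod n]."""
    n = len(c)
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            C[i, j] = c[(i - j) % n]
    return C @ x

rng = np.random.default_rng(0)
c = rng.standard_normal(8)   # the only stored parameters: n values, not n^2
x = rng.standard_normal(8)
assert np.allclose(circulant_matvec_fft(c, x), circulant_matvec_dense(c, x))
```

Replacing unstructured weight blocks with such structured blocks is what allows both the parameter count and the per-layer arithmetic to shrink simultaneously, for training as well as inference.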
Yanzhi Wang has been an assistant professor at Syracuse University since August 2015. He received his B.S. degree from Tsinghua University in 2009 and his Ph.D. degree from the University of Southern California in 2014, under the supervision of Prof. Massoud Pedram. His research interests include neuromorphic computing, energy-efficient deep learning systems, deep reinforcement learning, embedded systems, and wearable devices. He received best paper awards from the International Symposium on Low Power Electronics and Design 2014 and the International Symposium on VLSI Designs 2014, a top paper award from the IEEE Cloud Computing Conference 2014, and a best paper in track award from ICASSP 2017. He has two popular papers in IEEE Trans. on CAD, and has received multiple best paper nominations from the ACM Great Lakes Symposium on VLSI, IEEE Trans. on CAD, and the Asia and South Pacific Design Automation Conference.