Abstract
With the wide deployment of machine learning algorithms, highly energy-efficient customized machine learning systems have gained popularity, and machine learning compilers are crucial to such systems. The intermediate representation is the key to the programming and compilation environment: it connects high-level programming languages with low-level instruction set architectures. Current state-of-the-art intermediate representations are oriented either to high-level algorithms or to classical scalar processors, so they cannot be mapped effectively onto tensor-based machine learning systems. To address this problem, we propose a tensor intermediate representation for machine learning systems that improves both programming productivity and performance. Concretely, we define a series of tensor types, tensor operations, and tensor memories, and optimize tensor processing based on these definitions. To validate our proposal, we extend the low-level scalar intermediate representation of TVM with the proposed tensor intermediate representation and perform experiments with Tensor Cores on a typical machine learning system. Experimental results show that our approach exposes optimizations that cannot be discovered in the original intermediate representation and achieves a 1.62× to 2.85× performance improvement. In addition, the tensor intermediate representation improves programming efficiency by 5.46× on average.