Morgan Woods

真真夜夜的炼金工坊

AI Compilation Principles 1

Thanks to the uploader ZOMI-chan: https://space.bilibili.com/517221395

Basic Concepts of Compilers#

  1. What is a compiler?
  2. Why do AI frameworks need to introduce compilers?
  3. What is the relationship between AI frameworks and AI compilers?

Compilers and Interpreters#

The biggest difference between a compiler and an interpreter is: the interpreter converts code into machine code at runtime, while the compiler converts code into machine code before the program runs.

JIT and AOT Compilation Methods#

Currently, there are mainly two ways for programs to run: static compilation and dynamic interpretation.

  • Static compiled code programs are entirely translated into machine code before execution, usually referred to as AOT (Ahead of Time), meaning "compiled in advance";
  • Dynamically interpreted programs translate and run the code simultaneously, usually referred to as JIT (Just In Time), meaning "compiled on the fly".

A typical representative of AOT programs is an application developed in C/C++, which must be compiled into machine code before execution and is then handed to the operating system to run; JIT has many representatives, such as JavaScript, Python, and other dynamically interpreted languages.
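The contrast can be sketched inside Python itself. This is only an illustration, not a real AOT toolchain: `compile()` stands in for "translate once, ahead of execution", while passing raw source to `eval()` forces re-translation on every call, the way a naive interpreter would.

```python
# Minimal sketch: "translate once, then run" vs. "translate while running".
import timeit

src = "x * x + 2 * x + 1"
code = compile(src, "<expr>", "eval")   # translated once, before any execution

# Evaluating the pre-compiled code object skips parsing on every call;
# evaluating the raw source string re-parses and re-compiles each time.
precompiled = timeit.timeit(lambda: eval(code, {"x": 3}), number=10_000)
reparsed    = timeit.timeit(lambda: eval(src,  {"x": 3}), number=10_000)

print(eval(code, {"x": 3}))    # 16
print(precompiled < reparsed)  # re-translating on every call is slower
```

The timing gap here is the same trade-off the table below describes: the JIT-style path pays translation cost at run time, the AOT-style path pays it up front.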

| Feature | JIT (Just In Time) | AOT (Ahead of Time) |
| --- | --- | --- |
| Advantages | 1. Can generate optimal machine instructions in real time for the current hardware; 2. Can generate an optimal instruction sequence based on the program's actual execution; 3. Is the only option when a program needs to support dynamic linking; 4. Can adjust code to the process's actual memory situation, making better use of memory | 1. Compiling before the program runs avoids the performance and memory cost of runtime compilation; 2. Reaches peak performance early in execution; 3. Can significantly speed up program execution |
| Disadvantages | 1. Compilation consumes runtime resources, which can cause stuttering during execution; 2. Compilation takes up run time, so some optimizations cannot be fully applied, and smoothness must be traded off against compilation time; 3. Warming up and identifying frequently used methods takes time, so peak performance is not reached at startup | 1. Compiling before the program runs increases installation time; 2. Storing the pre-compiled output takes up more space; 3. Sacrifices some of the consistency of high-level languages |

Differences in AI Frameworks#

Mainstream AI frameworks now ship with a front-end expression layer together with an AI compiler that enables hardware back-ends, so AI frameworks and AI compilers are closely related. Some AI frameworks, such as MindSpore and TensorFlow, include their own AI compiler by default. Since the 2.x release, PyTorch also ships with the Inductor feature, which can interface with several different AI compilers.

| Compilation Method | Description | Typical Representatives |
| --- | --- | --- |
| AOT (Ahead of Time) | Statically compiled programs are translated entirely into machine code before execution; suitable for mobile and embedded deep learning applications. | 1. Inference engines: trained AI models are frozen ahead of time for inference deployment; 2. Static graph generation: the neural network is represented as a unified IR description, and the compiled result is executed at runtime. |
| JIT (Just In Time) | Dynamically interpreted programs translate and run at the same time; suitable for scenarios requiring real-time optimization. | 1. PyTorch JIT: compiles Python code into native machine code on the fly, optimizing and accelerating deep learning models; 2. Jittor: a deep learning framework based on dynamic JIT compilation, using meta-operators and a unified computation graph for efficient execution and automatic optimization. |

Pass and Intermediate Representation IR#

A Pass is a complete scan or processing of the source programming language. In a compiler, a Pass refers to a structured technique used to complete functions such as analysis, optimization, or transformation of the compilation object (IR). The execution of a Pass is the process of the compiler analyzing and optimizing the compilation unit, and the Pass constructs the analysis results needed for these processes.

In the LLVM compiler, Passes are divided into three categories: Analysis Passes, Transform Passes, and Utility Passes.

| Pass Type | Description | Function | Common Examples |
| --- | --- | --- | --- |
| Analysis Pass | Computes higher-level information about the relevant IR units without modifying them; this information can be used by other Passes or for debugging and program visualization. | 1. Extracts and stores information from IR units; 2. Provides query interfaces for other Passes to access; 3. Provides invalidate interfaces to handle information invalidation | Basic Alias Analysis, Scalar Evolution Analysis |
| Transform Pass | Uses the analysis results from Analysis Passes to change and optimize the IR in some way. | 1. Changes instructions and control flow in the IR; 2. May reduce function calls, exposing more optimization opportunities | Inline Pass |
| Utility Pass | Functional utilities that belong to neither Analysis Passes nor Transform Passes. | Performs specific tasks, such as extracting basic blocks | extract-blocks Pass |
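A Transform Pass can be illustrated with a toy constant-folding pass. The IR below is a hypothetical list of tuples invented for this sketch, not LLVM's IR; the point is only the shape of a pass: walk the IR, use known facts, and rewrite instructions.

```python
# Toy Transform Pass: constant folding over a tiny, made-up tuple IR.
# Each instruction is (dest, op, arg1, arg2). This is NOT LLVM's IR.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def constant_fold(ir):
    """Replace operations whose operands are both literals with constants."""
    folded = []
    consts = {}                       # dest -> known constant value
    for dest, op, a, b in ir:
        a = consts.get(a, a)          # propagate already-folded values
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            consts[dest] = OPS[op](a, b)
            folded.append((dest, "const", consts[dest], None))
        else:
            folded.append((dest, op, a, b))
    return folded

ir = [("t1", "+", 2, 3),        # t1 = 2 + 3   -> const 5
      ("t2", "*", "t1", 4),     # t2 = t1 * 4  -> const 20
      ("t3", "+", "x", "t2")]   # t3 = x + t2  -> t3 = x + 20
print(constant_fold(ir))
```

A real Analysis Pass would compute `consts` separately and expose it through a query interface; here analysis and transformation are fused for brevity.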

IR (Intermediate Representation) is a very important data structure in compilers. After completing the front-end work, the compiler first generates its custom IR and then executes various optimization algorithms based on it, finally generating the target code.


As shown in the figure, in compiler theory, compilers are usually divided into front-end and back-end. The front-end performs lexical analysis, syntax analysis, and semantic analysis on the input program, then generates the intermediate representation IR. The back-end optimizes the IR and then generates the target code.
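The front-end stage can be observed directly in Python, whose standard library exposes its own parser: `ast.parse` performs lexical and syntax analysis and returns an abstract syntax tree. This sketch shows only the front-end half of the pipeline, not a full compiler.

```python
# Front-end in miniature: Python's own parser turns source text into an AST.
import ast

tree = ast.parse("y = x + 1")
assign = tree.body[0]              # the single statement: an Assign node
print(type(assign).__name__)       # Assign
print(ast.dump(assign.value))      # a BinOp node: Name 'x', Add, constant 1
```

A back-end would then walk this tree (or an IR lowered from it) to emit target code.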

For example, LLVM separates the front-end and back-end and defines a clear abstract language at the intermediate layer, called the IR. Once the IR is defined, the front-end's task is to generate this IR, the optimizer is responsible for optimizing the generated IR, and the back-end's task is to translate the IR into the target platform's language. LLVM's IR is written in the LLVM assembly language (also called the LLVM language), and the LLVM IR type system refers to the type system of that assembly language.

Thus, the only data structure exchanged between the compiler's front-end, optimizer, and back-end is the IR, which decouples the different modules. Some IRs even have specific names: Open64's IR is usually called WHIRL IR, the Ark Compiler's IR is called MAPLE IR, and LLVM's is simply referred to as LLVM IR.

IR usually serves two purposes: one is analysis and transformation; the other is direct interpreted execution.
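CPython is a familiar example of the second purpose: its bytecode is an IR that is interpreted directly, and the standard `dis` module makes it visible.

```python
# CPython bytecode is an IR meant for direct interpretation.
import dis

def f(x):
    return x + 1

# Prints the bytecode IR: instructions such as LOAD_FAST for the argument,
# an add instruction (BINARY_ADD or BINARY_OP depending on the Python
# version), and a return instruction.
dis.dis(f)
```

The same source is thus "compiled" to an IR first, but that IR is consumed by an interpreter loop rather than lowered further to machine code.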

Analysis and processing based on IR can rely on relatively high-level semantic abstractions in the early stages of compilation, where the IR used is closer to the source code; in the later stages, lower-level semantics closer to the target code are used. Following this high-to-low ordering of abstraction levels, IR can be divided into three layers: high-level HIR, middle-level MIR, and low-level LIR.

| IR Type | Description | Purpose | Features |
| --- | --- | --- | --- |
| HIR (High IR) | Analysis and transformation performed at the level of the source programming language. | Used in IDEs, code translation tools, code generators, etc.; performs high-level optimizations (such as constant folding and inlining). | 1. Accurately expresses the semantics of the source programming language; 2. Can use the AST and symbol tables |
| MIR (Middle IR) | Code analysis and optimization independent of both the source programming language and the hardware architecture. | Used by compilation optimization algorithms; performs general optimizations (such as arithmetic optimization, constant and variable propagation, dead code elimination). | 1. Independent of both the source and target code; 2. Usually based on three-address code (TAC) |
| LIR (Low IR) | Optimization and code generation dependent on the specific underlying hardware architecture. | Performs optimizations tied to the specific hardware architecture and generates machine instructions or assembly code. | 1. Instructions usually correspond one-to-one with machine instructions; 2. Reflects the characteristics of the underlying hardware architecture |

Characteristics of three-address code (TAC): each instruction involves at most three addresses (i.e., variables); the address on the left of the assignment is written, while the right-hand side reads at most two addresses combined by one operator.
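The shape is easy to see by lowering one expression by hand and interpreting the result. The tuple-based mini-IR below is hypothetical, assumed only for this illustration.

```python
# a = (b + c) * (b - c), lowered by hand into three-address code:
#   t1 = b + c
#   t2 = b - c
#   a  = t1 * t2
# Each instruction writes one address and reads at most two, via one operator.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run_tac(tac, env):
    """Interpret a list of (dest, op, arg1, arg2) TAC instructions."""
    for dest, op, a, b in tac:
        env[dest] = OPS[op](env[a], env[b])
    return env

tac = [("t1", "+", "b", "c"),
       ("t2", "-", "b", "c"),
       ("a",  "*", "t1", "t2")]
env = run_tac(tac, {"b": 5, "c": 3})
print(env["a"])    # 16, i.e. (5 + 3) * (5 - 3)
```

Note how the nested source expression becomes a flat sequence of single-operator steps, which is exactly what makes TAC convenient for MIR-level optimizations.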

Multi-layer IR has significant advantages over single-layer IR:

  • Can provide more information about the source programming language
  • IR expression is more flexible and convenient for optimization
  • Makes optimization algorithms and optimization Passes more efficient

In the LLVM compiler, a three-part structure organized by abstraction level from high to low makes it very convenient to add support for a new language or a new target platform, greatly reducing engineering cost. Within this front-end/back-end-separated structure, LLVM IR itself is divided into three layers, playing an important role in connecting the levels and modules inside the compiler, from representations that are easy for developers to understand down to representations that are easy for the hardware to execute.

Summary#

  1. An interpreter is a computer program that converts each high-level program statement into machine code.
  2. A compiler converts high-level language programs into machine code, i.e., converts human-readable code into computer-readable code.
  3. A Pass is a complete scan or processing of the source programming language.
  4. Intermediate Representation IR is a data structure in compilers that connects various levels and modules within the compiler.

02 Development of Traditional Compilers#

Compilers and programming languages have developed almost synchronously, and the development process can be divided into several stages:

  • First stage: In the 1950s, the first compiler program appeared, translating arithmetic formulas into machine code, laying the foundation for the development of high-level languages.
  • Second stage: In the 1960s, various high-level languages and corresponding compilers emerged, such as Fortran, COBOL, LISP, ALGOL, etc., and compilation technology gradually matured and standardized.
  • Third stage: In the 1970s, structured programming methods and modular programming ideas emerged, as well as object-oriented languages and compilers, such as Pascal, C, Simula, etc. Compilation technology began to focus on the readability and maintainability of engineering code.
  • Fourth stage: In the 1980s, parallel computers and distributed systems emerged, along with languages and compilers that support parallel and distributed computing, such as Ada, Prolog, ML, etc. Compilation technology began to consider the parallel and distribution capabilities of programs.
  • Fifth stage: In the 1990s, new platforms such as the Internet and mobile devices emerged, along with languages and compilers that support cross-platform and dynamic features, such as Java, C#, Python, etc. Compilation technology began to focus on the security and efficiency of programs.
  • Sixth stage: In the first decade of the 21st century, the Lua-based Torch framework emerged to serve the explosive growth of AI applications and AI algorithm research, followed by AI frameworks such as TensorFlow, PyTorch, MindSpore, and Paddle. As AI frameworks and the AI industry developed, AI compilers such as AKG and MLIR emerged.

The Competition of Traditional Compilers#


Currently, mainstream classic open-source compilers like LLVM and GCC are usually divided into three parts: front-end (frontEnd), optimizer (Optimizer), and back-end (backEnd).

  1. Front-end: mainly responsible for lexical and syntax analysis, converting source code into an abstract syntax tree; it splits the program into basic components, checks the lexical, syntactic, and semantic correctness of the code, and then generates intermediate code.
  2. Optimizer: optimizes the intermediate code produced by the front-end (for example, removing redundant code and eliminating common sub-expressions) to make it more efficient.
  3. Back-end: generates target code from the optimized intermediate code, translating it into machine code for the target platform.
| Type | GCC | Clang/LLVM |
| --- | --- | --- |
| License | GNU GPL | Apache 2.0 |
| Code Modularity | Monolithic architecture | Modular |
| Supported Platforms | Unix, Windows, macOS | Unix, macOS |
| Code Generation | Efficient, with many compiler options available | Efficient; the LLVM back-end uses SSA form |
| Language-Independent Type System | No | Yes |
| Build Tool | Make-based | CMake |
| Parser | Originally used a Bison LR parser, now a recursive descent parser | Handwritten recursive descent parser |
| Linker | LD | lld |
| Debugger | GDB | LLDB |

Summary#

  • Compilation technology is a gem in computer science, as a core technology in foundational software.
  • Compilers can recognize vocabulary, sentences, and various specific formats and data structures in high-level language program code.
  • The compilation process is the conversion of source code programs into binary code that machines can recognize.
  • Traditional compilers are usually divided into three parts: front-end (frontEnd), optimizer (Optimizer), and back-end (backEnd).