Thank you to the uploader ZOMI-chan: https://space.bilibili.com/517221395
GCC Compilation Process and Principles#
Main Features of GCC
- It is a portable compiler that supports multiple hardware platforms.
- It supports cross-compilation, so code can be built on one platform to run on another.
- It has multiple language frontends for parsing different languages.
- Its modular design allows support for new languages and CPU architectures to be added.
- It is free, open-source software.
GCC Compilation Flow#
The GCC compilation process can be roughly divided into four stages: preprocessing, compilation, assembly, and linking.
Source Code (Text)#
#include <stdio.h>
#define HELLOWORD ("hello world\n")
int main(void){
    printf(HELLOWORD);
    return 0;
}
Preprocessing (cpp)#
Generate file hello.i
gcc -E hello.c -o hello.i
During the preprocessing stage, the preprocessor reads the source code, processes the preprocessing directives and macro definitions it contains, and performs the corresponding text substitutions. It also removes comments from the program. The resulting .i file contains the preprocessed code.
The main preprocessing operations are explained below:
- Header File Expansion:
The preprocessor inserts the contents of each included header file at the position of the corresponding #include directive, so that the functions, variables, macros, and other items declared in the header are available during compilation.
- Macro Replacement:
Wherever a macro defined in the source file is used, the preprocessor replaces the macro name with its defined content. Macros simplify code writing and can improve readability and maintainability.
- Conditional Compilation:
Directives such as #if, #else, and #ifdef determine, before compilation, whether a code segment is passed on to the compiler. This allows code to be included selectively, for example for different platforms and environments.
- Removing Comments:
The preprocessor removes comments from the source file, including single-line comments (//) and multi-line comments (/* ... */). Comments never reach the compiler, so they have no effect on the generated code.
- Adding Line Numbers and File Name Identifiers:
The preprocessor inserts line markers containing a line number and file name into its output, and honors #line directives in the source. These allow error messages and debugging information to refer to the original file and line.
- Retaining #pragma Commands:
The preprocessor keeps directives that begin with #pragma and that it does not handle itself, such as #pragma pack, and passes them on to the compiler, where they control specific compiler behavior. Some pragmas, such as #pragma once, are handled by the preprocessor itself.
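The operations listed above can be tried out on a small file. The following is a minimal sketch, not taken from the original article: the names GREETING, SQUARE, USE_SHORT_GREETING, and the file name virtual.c are hypothetical and chosen only for illustration.

```c
/* Object-like macro: every use is replaced by the string literal. */
#define GREETING "hello world\n"

/* Function-like macro: parenthesized so that SQUARE(a + b) expands safely. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: USE_SHORT_GREETING is a hypothetical switch
   that could be set on the command line with -DUSE_SHORT_GREETING. */
#ifdef USE_SHORT_GREETING
#define ACTIVE_GREETING "hi\n"
#else
#define ACTIVE_GREETING GREETING
#endif

/* After preprocessing, the body reads: return ((a + b) * (a + b)); */
int square_of_sum(int a, int b) {
    return SQUARE(a + b);
}

/* Returns whichever greeting survived conditional compilation. */
const char *active_greeting(void) {
    return ACTIVE_GREETING;
}

/* #line makes the following source line count as line 100 of "virtual.c". */
int reported_line(void) {
#line 100 "virtual.c"
    return __LINE__;
}
```

Running gcc -E on a file containing this code shows the macros already expanded and only one of the two greeting definitions taking effect.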
Part of the hello.i file is shown below; the full file is available at ../code/gcc/hello.i.
int main(void){
    printf(("hello world\n"));
    return 0;
}
In this file, the contents of the header file have been expanded in place, the macro HELLOWORD has been replaced with its definition ("hello world\n"), and any comments have been removed.
Compilation (cc1)#
Here, compilation does not refer to the entire process of converting a program from source file to binary file, but specifically to the step that converts the preprocessed file (hello.i) into an assembly code file (hello.s).
In this process, the preprocessed .i file is the input, and the compiler proper (cc1) generates the corresponding assembly code .s file. cc1 contains the C language frontend together with the optimization passes and the code generator. It performs lexical analysis, syntax analysis, semantic analysis, and various optimizations on the preprocessed code, and finally emits the corresponding assembly code.
Assembly code is stored as plain text, and the generated .s file serves as a bridge between the high-level language code written by programmers and the machine instructions executed by the hardware.
Generate file hello.s:
gcc -S hello.i -o hello.s
hello.s:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 15 sdk_version 10, 15, 6
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $0, -4(%rbp)
leaq L_.str(%rip), %rdi
movb $0, %al
callq _printf
xorl %ecx, %ecx
movl %eax, -8(%rbp) ## 4-byte Spill
movl %ecx, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "hello world\n"
.subsections_via_symbols
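The hello.s file now contains only assembly instructions and assembler directives, which shows that hello.c has been compiled into assembly language. The directives in this listing (for example .build_version macos and the __TEXT,__text section) are specific to macOS; on other platforms the same command produces different directives.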
Assembly (as)#
In this step, the assembly code is converted into machine instructions by the assembler (as). The assembler is not part of GCC's backend; it is a separate program (on GNU systems, part of GNU Binutils) that the gcc driver invokes.
The assembler's job is to convert human-readable assembly code into binary machine code, generating a relocatable object file, typically with a .o file extension. This object file is the input to the subsequent linking step; it cannot be executed directly until it has been linked.
Generate file hello.o
gcc -c hello.s -o hello.o
Linking (ld)#
The linker combines the object file with other object files, library files, and startup files to generate an executable file. During linking, it resolves symbols, performs relocation, and determines the memory layout of the program. For a dynamically linked program, it also records which shared libraries are required; loading those libraries and completing dynamic linking is done by the loader at run time. The result is a program that can be run directly on the target platform.
gcc hello.o -o hello
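Symbol resolution can be illustrated with a function that calls puts without including stdio.h. This is a minimal sketch; the function name say_hello is hypothetical, and the hand-written declaration is assumed to match the standard prototype of puts.

```c
/* Declaration only: it tells the compiler the name and type of puts.
   The header <stdio.h> is left out on purpose to show that no
   definition is needed at compile time. */
int puts(const char *s);

/* In the object file, the call below is an undefined symbol.
   The linker resolves it against the C library. */
int say_hello(void) {
    return puts("hello world");
}
```

Compiling this with gcc -c succeeds even though puts is not defined anywhere in the file. Inspecting the resulting object file with a tool such as nm should list puts as undefined until the linking step supplies it.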
Adding the -v parameter allows you to view the detailed compilation process:
gcc -v hello.c -o hello
- Static Linking means that a copy of each library function needed is included in the executable file during linking. A program linked against static libraries contains all the library code it requires and can run on its own, but the resulting executable tends to be larger.
- Dynamic Linking means that the executable file only records the names of the shared libraries it needs, and the loader locates and loads them at run time. Dynamically linked programs are smaller than statically linked ones, but they cannot run unless the required dynamic libraries are available.
Compilation Methods#
Type | Definition | Example |
---|---|---|
Native (Local) Compilation | The platform that compiles the source code is the same as the platform that executes the compiled program. | Compiling on Intel x86 architecture/Windows platform, the generated program runs on the same Intel x86 architecture/Windows platform. |
Cross Compilation | The platform that compiles the source code is different from the platform that executes the compiled program. | Compiling on Intel x86 architecture/Linux (Ubuntu) platform using a cross-compilation toolchain, the generated program runs on ARM architecture/Linux. |
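With cross compilation, the code is built for the target platform rather than for the machine running the compiler. One way to observe this is through the architecture macros that GCC and Clang predefine for the target. The sketch below is an illustration; the function name target_arch and the returned labels are hypothetical.

```c
/* Returns a label for the architecture this file was compiled for.
   The macros describe the target, not the machine running the compiler. */
const char *target_arch(void) {
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}
```

Built with a native x86-64 compiler, the first branch is selected; built on the same machine with an ARM cross-compilation toolchain, one of the ARM branches is selected instead.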
Differences Between GCC and Traditional Compilation Process#
The traditional three-stage model divides a compiler into a frontend, an optimizer, and a backend, with each part responsible for one kind of work.
In GCC, the compilation process is divided into four stages: preprocessing, compilation, assembly, and linking. The preprocessing stage and the parsing part of the compilation stage correspond to the frontend of the three-stage model. The optimization passes and the generation of assembly code, which also take place in the compilation stage, correspond to the optimizer and the backend.
The assembly and linking stages come after what the three-stage model calls the backend. They are carried out by the assembler and the linker, and their purpose is to turn the backend's output into an executable file.
The four stages of the GCC compilation process therefore overlap with the frontend, optimizer, and backend of the traditional three-stage model rather than mapping onto them one to one. GCC's division follows the tools and intermediate files involved (.i, .s, .o), which makes the role of each stage easy to observe.
Summary#
This section introduced the GCC compilation process, which mainly includes four stages: preprocessing, compilation, assembly, and linking. It also summarized the advantages and disadvantages of GCC:
Advantages of GCC | Disadvantages of GCC |
---|---|
1) Supports many languages, including C, C++, Fortran, and Ada (Java support was removed in GCC 7) | 1) GCC has a high degree of code coupling, making it difficult to integrate into dedicated IDEs and call GCC in a modular way. |
2) GCC supports more platforms | 2) GCC is built as a single monolithic static compiler, making it difficult to use as an API and integrate into other tools. |
3) GCC is more popular, widely used, and fully supported | 3) Over more than 35 years of development (1987 to 2022), the code base has accumulated complexity, and the code quality of later versions is considered poorer. |
4) GCC was originally written in C and could be built with only a C compiler (since GCC 4.8, building it requires a C++ compiler) | 4) GCC has about 15 million lines of code, making it one of the largest free programs in existence. |