Morgan Woods

真真夜夜的炼金工坊

AI Compilation Principles 2

Thank you to the uploader ZOMI-chan: https://space.bilibili.com/517221395

GCC Compilation Process and Principles#

Main Features of GCC

  • It is a portable compiler that supports multiple hardware platforms.
  • It supports cross-compilation across platforms.
  • It has multiple language front-ends for parsing different languages.
  • Modular design allows for the addition of new language and CPU architecture support.
  • It is open-source free software and can be used for free.

GCC Compilation Flow#

The compilation process of GCC can be roughly divided into four stages: preprocessing, compilation, assembly, and linking.


Source Code (Text)#

#include <stdio.h>

#define HELLOWORD ("hello world\n")

int main(void){
    printf(HELLOWORD);
    return 0;
}

Preprocessing (cpp)#

Generate file hello.i

gcc -E hello.c -o hello.i

During the preprocessing stage, the source code is read, and the included preprocessing directives and macro definitions are checked, followed by the corresponding replacement operations. Additionally, the preprocessing stage removes comments and unnecessary whitespace characters from the program. The final generated .i file contains the preprocessed code content.

Preprocessing a source file into a .i file involves header file expansion, macro replacement, conditional compilation, and several other operations, which are explained below:

  1. Header File Expansion:
    During the preprocessing stage, the compiler inserts the contents of the included header files into the corresponding positions in the source file, so that functions, variables, macros, and other contents defined in the header files can be accessed during compilation.
  2. Macro Replacement:
    During the preprocessing stage, the compiler replaces the macros defined in the source file with their defined content when used. This simplifies code writing and improves code readability and maintainability.
  3. Conditional Compilation:
    Through preprocessing directives such as #if, #else, #ifdef, etc., it is determined before compilation whether certain code snippets should be included in the final compilation process. This allows for selective inclusion of code based on conditions, achieving code control for different platforms and environments.
  4. Removing Comments:
    During the preprocessing stage, comments in the source file are removed, including single-line comments (//) and multi-line comments (/* ... */). Each comment is replaced by a single space, so comments never reach the compiler proper and have no effect on the generated code.
  5. Adding Line Numbers and File Name Identifiers:
    The preprocessor inserts line markers (lines of the form # linenum "filename") into its output to record which file and line each piece of code came from; the #line directive can override these values. This makes it easier to locate error messages and debug during compilation.
  6. Retaining #pragma Commands:
    Directives that start with #pragma and are meant for the compiler, such as #pragma pack, are kept in the preprocessed output so that later stages can act on them. Some pragmas, such as #pragma once, are handled by the preprocessor itself. These directives guide specific processing, such as controlling compiler behavior or data layout.
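Several of the operations listed above can be observed from inside a C program. The following is a minimal sketch, not part of the original hello.c: the names GREETING, SQUARE, DEBUG_LEVEL, LOG_PREFIX, PackedRecord, NaturalRecord, and renamed.c are all made up for illustration, and the #pragma pack(push, 1) form is assumed to be supported by the compiler in use (GCC and Clang accept it).

```c
#include <stddef.h>
#include <string.h>

/* Macro replacement: an object-like macro is replaced textually. */
#define GREETING "hello world\n"
/* A function-like macro; the parentheses keep operator precedence
   intact after the textual expansion. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one branch survives preprocessing. */
#define DEBUG_LEVEL 2
#if DEBUG_LEVEL >= 2
#define LOG_PREFIX "[debug] "
#else
#define LOG_PREFIX ""
#endif

/* #pragma is passed on to the compiler: here it removes padding
   between the members of PackedRecord only. */
#pragma pack(push, 1)
struct PackedRecord { char tag; int value; };
#pragma pack(pop)
struct NaturalRecord { char tag; int value; };

size_t greeting_length(void) { return strlen(GREETING); }
int square_of_sum(int a, int b) { return SQUARE(a + b); }
const char *log_prefix(void) { return LOG_PREFIX; }
size_t packed_size(void) { return sizeof(struct PackedRecord); }
size_t natural_size(void) { return sizeof(struct NaturalRecord); }

/* #line overrides the line number and file name that the compiler
   reports from this point on; the next line counts as line 100. */
#line 100 "renamed.c"
int reported_line(void) { return __LINE__; }
```

With these definitions, square_of_sum(1, 2) expands to ((1 + 2) * (1 + 2)), and reported_line() returns the overridden line number rather than the real position in the file. Running gcc -E on such a file shows the expanded text that the compiler proper actually receives.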

Part of the hello.i file is shown below; the full content is in the ../code/gcc/hello.i file.

int main(void){
    printf(("hello world\n"));
    return 0;
}

In this file, the contents of the header file have been inserted, the macro HELLOWORD has been replaced with its definition ("hello world\n"), and comments and unnecessary whitespace characters have been removed.

Compilation (cc1)#

Here, compilation does not refer to the entire process of converting a program from source file to binary file; it refers specifically to converting the preprocessed file (hello.i) into an assembly code file (hello.s) for the target machine.

In this process, the preprocessed .i file is the input, and the compiler proper (cc1) generates the corresponding assembly code .s file. cc1 is the core of GCC: it contains the language front-end, the optimizer, and the code generator for the target. The compilation stage performs lexical analysis, syntax analysis, semantic analysis, and various optimizations on the preprocessed .i file, ultimately generating the corresponding assembly code.
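To make the lexical analysis step more concrete, here is a minimal sketch of a tokenizer for a tiny subset of C. It is not how GCC's lexer is implemented; the names TokenKind, next_token, and count_tokens are hypothetical, and the sketch only recognizes identifiers, decimal integers, string literals, and single-character punctuation.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_STRING, TOK_PUNCT, TOK_END } TokenKind;

/* Scan one token starting at *p, advance *p past it, return its kind. */
TokenKind next_token(const char **p) {
    const char *s = *p;
    TokenKind kind;

    while (isspace((unsigned char)*s)) s++;            /* skip whitespace */
    if (*s == '\0') { *p = s; return TOK_END; }

    if (isalpha((unsigned char)*s) || *s == '_') {     /* identifier or keyword */
        while (isalnum((unsigned char)*s) || *s == '_') s++;
        kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*s)) {           /* decimal integer */
        while (isdigit((unsigned char)*s)) s++;
        kind = TOK_NUMBER;
    } else if (*s == '"') {                            /* string literal */
        s++;
        while (*s != '\0' && *s != '"') {
            if (*s == '\\' && s[1] != '\0') s++;       /* skip escaped char */
            s++;
        }
        if (*s == '"') s++;                            /* closing quote */
        kind = TOK_STRING;
    } else {                                           /* single-char punctuation */
        s++;
        kind = TOK_PUNCT;
    }
    *p = s;
    return kind;
}

/* Count the tokens in a source string. */
size_t count_tokens(const char *src) {
    size_t n = 0;
    while (next_token(&src) != TOK_END) n++;
    return n;
}
```

For the preprocessed statement printf(("hello world\n")); this sketch yields seven tokens: the identifier printf, two opening parentheses, one string literal, two closing parentheses, and the semicolon. Syntax analysis would then arrange such tokens into a tree.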

Assembly code is stored in text form in the .s file produced by this stage. It serves as a bridge between the high-level language code written by programmers and the computer hardware.

Generate file hello.s:

gcc -S hello.i -o hello.s

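hello.s (this listing appears to have been generated on macOS, where the gcc command typically invokes Apple clang, so the output of GNU GCC on other platforms will look different):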

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 15    sdk_version 10, 15, 6
    .globl    _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## %bb.0:
    pushq    %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    movl    $0, -4(%rbp)
    leaq    L_.str(%rip), %rdi
    movb    $0, %al
    callq    _printf
    xorl    %ecx, %ecx
    movl    %eax, -8(%rbp)          ## 4-byte Spill
    movl    %ecx, %eax
    addq    $16, %rsp
    popq    %rbp
    retq
    .cfi_endproc
                                        ## -- End function
    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz    "hello world\n"

.subsections_via_symbols

Now the hello.s file contains only assembly instructions and assembler directives, indicating that the hello.c file has been successfully compiled into assembly language.

Assembly (as)#

In this step, the assembly code is converted into machine instructions by the assembler (as). The assembler is not part of the GCC compiler itself; in the GNU toolchain it is provided by GNU Binutils and is invoked by the gcc driver.

The assembler's job is to convert human-readable assembly code into machine instructions, generating a relocatable object file, typically with a .o file extension. This object file contains the machine code in binary form. It is the input to the subsequent linking step and cannot be executed on its own, because its addresses and its references to external symbols such as printf have not yet been resolved.

Generate file hello.o

gcc -c hello.s -o hello.o
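Unlike hello.i and hello.s, hello.o is a binary file whose first bytes identify its container format. The following is a minimal sketch that checks those leading bytes; the names ObjectFormat and detect_object_format are hypothetical, and only two well-known signatures are covered (ELF, used on Linux, and 64-bit little-endian Mach-O, used on macOS).

```c
#include <stddef.h>
#include <string.h>

typedef enum { FORMAT_UNKNOWN, FORMAT_ELF, FORMAT_MACHO64 } ObjectFormat;

/* Inspect the first four bytes of an object file image. */
ObjectFormat detect_object_format(const unsigned char *bytes, size_t size) {
    /* ELF files start with 0x7F followed by the letters "ELF". */
    static const unsigned char elf_magic[4] = {0x7F, 'E', 'L', 'F'};
    /* 64-bit Mach-O magic 0xFEEDFACF as stored by a little-endian machine. */
    static const unsigned char macho_magic[4] = {0xCF, 0xFA, 0xED, 0xFE};

    if (size < 4) return FORMAT_UNKNOWN;
    if (memcmp(bytes, elf_magic, 4) == 0) return FORMAT_ELF;
    if (memcmp(bytes, macho_magic, 4) == 0) return FORMAT_MACHO64;
    return FORMAT_UNKNOWN;
}
```

Passing the first bytes of a text file such as hello.s returns FORMAT_UNKNOWN, which reflects the difference between the textual output of the earlier stages and the binary output of the assembler.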

Linking (ld)#

During linking, the linker combines the object file with other object files, library files, and startup files to generate an executable file. In the process it resolves symbols, performs relocation, and determines the memory layout of the result; for dynamically linked programs it also records which shared libraries the loader must bring in at runtime. The result is a program that can be executed directly on a specific platform.

gcc hello.o -o hello
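Symbol resolution and relocation can be illustrated with a toy model. The sketch below is a deliberate simplification and does not reflect the layout of real object formats; the types Symbol and Relocation and the functions find_symbol and apply_relocations are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A defined symbol and the address the linker assigned to it. */
typedef struct { const char *name; uint32_t address; } Symbol;
/* A place in the code that must be patched with a symbol's address. */
typedef struct { size_t offset; const char *symbol; } Relocation;

/* Symbol resolution: look a name up in the symbol table. */
const Symbol *find_symbol(const Symbol *table, size_t count, const char *name) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0) return &table[i];
    return NULL;
}

/* Relocation: write each resolved address into the code buffer.
   Returns 0 on success, -1 for an undefined symbol or a bad offset. */
int apply_relocations(uint8_t *code, size_t code_size,
                      const Relocation *relocs, size_t reloc_count,
                      const Symbol *table, size_t symbol_count) {
    for (size_t i = 0; i < reloc_count; i++) {
        const Symbol *sym = find_symbol(table, symbol_count, relocs[i].symbol);
        if (sym == NULL) return -1;                    /* undefined reference */
        if (relocs[i].offset > code_size ||
            code_size - relocs[i].offset < sizeof sym->address) return -1;
        memcpy(code + relocs[i].offset, &sym->address, sizeof sym->address);
    }
    return 0;
}
```

In this model, the call to _printf in hello.o would be a relocation entry whose placeholder bytes are filled in once the linker knows where _printf lives. A name that is missing from every symbol table corresponds to the familiar "undefined reference" link error.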

Adding the -v parameter allows you to view the detailed compilation process:

gcc -v hello.c -o hello

Linking can be static or dynamic:

  • Static Linking means that a copy of each library function the program needs is added to the executable file during linking. A program linked against static libraries contains all the library code it requires and can run on its own. However, statically linked programs tend to be larger in size.
  • Dynamic Linking means that the executable file contains only the names of the libraries it needs, and the loader finds and loads those libraries at runtime. Compared to static linking, dynamically linked programs are smaller in size, but they depend on the required dynamic libraries being present; otherwise, they cannot be executed.

Compilation Methods#

| Type | Definition | Example |
| --- | --- | --- |
| Native Compilation | The platform that compiles the source code is the same as the platform that runs the compiled program. | Compiling on an Intel x86 architecture/Windows 10 platform; the generated program runs on the same Intel x86 architecture/Windows 10 platform. |
| Cross Compilation | The platform that compiles the source code is different from the platform that runs the compiled program. | Compiling on an Intel x86 architecture/Linux (Ubuntu) platform using a cross-compilation toolchain; the generated program runs on ARM architecture/Linux. |
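One way to see the difference between the two methods is through the compiler's predefined macros, which describe the target the code is being generated for rather than the machine the compiler runs on. The following is a minimal sketch; the function name target_arch is hypothetical, and the macro list covers only a few common architectures.

```c
/* Report the architecture this translation unit is being compiled FOR.
   With a native compiler this matches the build machine; with a
   cross-compiler it names the machine that will run the program. */
const char *target_arch(void) {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__arm__) || defined(_M_ARM)
    return "arm";
#else
    return "unknown";
#endif
}
```

Compiled natively on an x86-64 machine, the function returns "x86-64". Compiled on the same machine with an ARM cross toolchain (often invoked under a name such as aarch64-linux-gnu-gcc, although the exact name depends on how the toolchain is packaged), the same source returns "arm64", and the resulting program runs only on the ARM target.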

Differences Between GCC and Traditional Compilation Process#

The traditional three-stage division refers to dividing the compilation process into front-end, optimization, and back-end stages, with each stage having dedicated tools responsible for it.

In GCC, the compilation process is divided into four stages: preprocessing, compilation, assembly, and linking. Preprocessing, together with the parsing and semantic analysis performed at the start of the compilation stage, corresponds to the front-end of the three-stage division. Optimization and target code generation, which correspond to the optimizer and the back-end, also take place inside the compilation stage, whose output is assembly code.

The assembly and linking stages come after the three-stage model: they turn the back-end's assembly output into object files and finally into an executable file. Optimization can also extend into the linking stage when link-time optimization (-flto) is enabled.

The four stages of the GCC compilation process have certain overlaps and correspondences with the front-end, optimization, and back-end stages of the traditional three-stage division, but GCC provides a more detailed and comprehensive division of the compilation process, making the functions of each stage clearer and more independent.
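The front-end, optimization, and back-end division can be sketched as a toy pipeline for sums such as 1+2+3. This is an illustration of the three-stage idea only, not of GCC's internals: the IR layout and the names IrPool, front_end, optimize, and back_end are hypothetical, the input grammar is limited to non-negative integers joined by +, and integer overflow is not handled.

```c
#include <ctype.h>
#include <stdio.h>

enum { MAX_NODES = 64 };
typedef enum { IR_CONST, IR_ADD } IrKind;
typedef struct { IrKind kind; int value; int lhs, rhs; } IrNode; /* lhs/rhs: pool indices */
typedef struct { IrNode nodes[MAX_NODES]; int count; } IrPool;

/* Append a node; returns its index, or -1 when the pool is full. */
static int add_node(IrPool *pool, IrKind kind, int value, int lhs, int rhs) {
    if (pool->count >= MAX_NODES) return -1;
    pool->nodes[pool->count] = (IrNode){kind, value, lhs, rhs};
    return pool->count++;
}

/* Front-end: parse the source text into a left-leaning IR tree.
   Returns the root index, or -1 on a syntax error or a full pool. */
int front_end(IrPool *pool, const char *src) {
    int root = -1;
    for (;;) {
        if (!isdigit((unsigned char)*src)) return -1;
        int value = 0;
        while (isdigit((unsigned char)*src)) value = value * 10 + (*src++ - '0');
        int leaf = add_node(pool, IR_CONST, value, -1, -1);
        if (leaf < 0) return -1;
        root = (root < 0) ? leaf : add_node(pool, IR_ADD, 0, root, leaf);
        if (root < 0) return -1;
        if (*src == '\0') return root;
        if (*src++ != '+') return -1;
    }
}

/* Optimizer: constant folding on the IR, independent of source and target. */
void optimize(IrPool *pool, int node) {
    IrNode *n = &pool->nodes[node];
    if (n->kind != IR_ADD) return;
    optimize(pool, n->lhs);
    optimize(pool, n->rhs);
    const IrNode *l = &pool->nodes[n->lhs];
    const IrNode *r = &pool->nodes[n->rhs];
    if (l->kind == IR_CONST && r->kind == IR_CONST)
        *n = (IrNode){IR_CONST, l->value + r->value, -1, -1};
}

/* Back-end: emit x86-64 AT&T-style text that leaves the result in %eax. */
void back_end(const IrPool *pool, int node, FILE *out) {
    const IrNode *n = &pool->nodes[node];
    if (n->kind == IR_CONST) {
        fprintf(out, "movl $%d, %%eax\n", n->value);
    } else {
        back_end(pool, n->lhs, out);
        /* front_end only builds left-leaning trees, so rhs is a constant leaf. */
        fprintf(out, "addl $%d, %%eax\n", pool->nodes[n->rhs].value);
    }
}
```

For the input 1+2+3, the front-end builds five IR nodes, the optimizer folds them into the single constant 6, and the back-end then emits one instruction, movl $6, %eax. Only front_end knows the source syntax and only back_end knows the target, which is the separation the three-stage design aims for.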

Summary#

This section introduced the GCC compilation process, mainly including the four stages of preprocessing, compilation, assembly, and linking. It also summarized the advantages and disadvantages of GCC:

| Advantages of GCC | Disadvantages of GCC |
| --- | --- |
| 1) Supports Java, Ada, Fortran, and other languages. | 1) GCC's code is tightly coupled, making it difficult to integrate into dedicated IDEs in a modular way. |
| 2) Supports more platforms. | 2) GCC is built as a single static compiler, making it difficult to use as an API or to integrate into other tools. |
| 3) Is more popular, widely used, and fully supported. | 3) After more than 35 years of development (1987 to 2022), later versions have poorer code quality. |
| 4) Was originally written in C and could be built without a C++ compiler (since GCC 4.8, building GCC itself requires a C++ compiler). | 4) Has about 15 million lines of code, making it one of the largest existing free programs. |