Every C++ program we write has to go through a compiler, which converts the source files into an executable file. Basically, the compiler takes each C++ source file in the working directory and compiles it into an object file. The object files are then linked together with libraries to produce an executable file, which is our program.
Note that each source file is compiled into exactly one object file. The picture below shows the compiler converting Main.cpp into Main.obj.
If our program has n source files, the compiler produces n object files as a result. For example, the compiler generates two object files if two source files are provided (Main.obj and Math.obj in the example below).
We can categorise the compiler's work into three main stages. The actual compilation process involves more steps, but the details are not covered in this post, as it is meant to be beginner friendly.
- Stage 1 : Preprocessing
- Stage 2 : Compiling
- Stage 3 : Linking
Stage 1 : Preprocessing
In the first stage, the compiler runs the preprocessor on every source file (only source files; header files are not processed on their own). Each C++ source file becomes a translation unit, which is turned into an object file at a later stage. A translation unit is just a preprocessed source file: an implementation file (.c / .cpp) together with all the headers (.h / .hpp) that it includes. It is usually represented as a file with a .i suffix. (Note that this file is normally never written to disk; the compiler only produces it if we specifically request it.)
Here, the preprocessor goes through all our preprocessor directives and resolves them before the compilation stage.
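To make "resolving directives" a bit more concrete, here is a minimal sketch using two other common directives, #define and #if. The macro and function names (BUFFER_SIZE, SQUARE, kLargeBuffer, ScaledSize, CheckScaledSize) are made up for this illustration and are not part of the example project in this post.

```cpp
#include <cassert>   // only needed for the self-check at the bottom

// Hypothetical macros, chosen only for this illustration.
#define BUFFER_SIZE 16
#define SQUARE(x) ((x) * (x))

#if BUFFER_SIZE > 8
const bool kLargeBuffer = true;    // kept: the condition is true
#else
const bool kLargeBuffer = false;   // discarded before the compiler proper runs
#endif

int ScaledSize()
{
    // After preprocessing, this line reads: return ((16) * (16));
    return SQUARE(BUFFER_SIZE);
}

// Small self-check (hypothetical helper).
void CheckScaledSize()
{
    assert(kLargeBuffer);
    assert(ScaledSize() == 256);
}
```

By the time the compilation stage starts, the macros and the #if / #else branches are gone; the compiler only ever sees the text that survived.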
How the preprocessor resolves #include
The most commonly used preprocessor directive would be #include, and it is crucial for every C++ developer to know how it works.
Let’s take a look at a simple Math.cpp that adds two numbers:
Math.h
Math.cpp
Note that Math.cpp has a preprocessor directive #include that includes Math.h. Here, the preprocessor will open Math.h, read all the contents inside, and paste them into our Math.cpp.
To get a better understanding of how it works, we can ask the compiler to give us the preprocessed source file. Let’s have a look at Math.i:
Math.i
You may have noticed that “int num2 = 2” has been copied from Math.h into Math.cpp. That’s all the preprocessor does; it’s pretty simple.
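The copy-and-paste behaviour can be sketched in a single file. The exact contents of Math.h and Math.cpp are assumptions here, reconstructed from the description above (a header that only holds num2, and an Add function that uses it); CheckAdd is a hypothetical helper added just to exercise the code.

```cpp
#include <cassert>   // only needed for the self-check at the bottom

// Assumed Math.h (a single line):
//     int num2 = 2;
//
// Assumed Math.cpp before preprocessing:
//     #include "Math.h"
//     int Add(int num1) { return num1 + num2; }
//
// What the compiler sees once #include has been replaced by the header's text:

int num2 = 2;   // pasted in from Math.h

int Add(int num1)
{
    return num1 + num2;
}

// Small self-check (hypothetical helper).
void CheckAdd()
{
    assert(Add(1) == 3);
}
```

The point is that #include does no more than textual substitution: the compiler never sees the directive itself, only the text that replaced it.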
Now, let’s assume we have a Main.cpp that prints “Hello World” on screen:
Main.cpp
And if we look at the sizes of the preprocessed C/C++ source files produced:
Note that Main.i (1.34 MB) is much larger than Math.i (269 bytes), even though the two source files have a similar number of lines. That’s because Main.cpp includes <iostream>, which is a huge header that in turn pulls in many others.
Stage 2 : Compiling
After the preprocessor has done its job, the compiler takes our C++ translation units and compiles them into object files. These binary object files contain machine code, which includes instructions and metadata about the addresses of variables and functions (symbols). As we can see from Math.obj below, it contains binary data.
Math.obj
We can also ask the compiler to generate human-readable assembly listing files. The assembly code below is extracted from the generated assembly listing file Math.asm.
Math.asm
We can see that it contains a symbol for the Add function, and that the addition has been converted into assembly instructions. The first instruction moves num1 into the register eax, and the second instruction adds num2 to the value stored in eax and leaves the result in eax.
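As a rough sketch of that mapping, the two instructions can be written as comments next to the C++ line they come from. The shape of Add and num2 is an assumption based on the description above, and the assembly is deliberately simplified with placeholder operands; a real listing depends on the compiler, platform, and optimization settings.

```cpp
#include <cassert>   // only needed for the self-check at the bottom

int num2 = 2;   // assumed to come from Math.h

int Add(int num1)
{
    // Conceptually, an unoptimized x86 build turns the next line into:
    //   mov eax, <num1>    ; copy num1 into register eax
    //   add eax, <num2>    ; eax = eax + num2
    // The value left in eax is what the caller receives as the result.
    return num1 + num2;
}

// Small self-check (hypothetical helper).
void CheckAdd()
{
    assert(Add(40) == 42);
}
```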
Now that all the object files have been generated, each one holds the machine code for its source file along with a record of where its symbols are located. The next stage is to link them together.
Stage 3 : Linking
Object files generated by the compiler are standalone and unable to interact with each other, and it is the job of the linker to connect them. In a nutshell, the linker links all the object files and libraries together and creates an executable file.
To get a better understanding of how the linker works, let’s start with a simple example. Assume that we have an Add function definition in Math.h, which takes two integer parameters and returns their sum. (Of course, in real life we wouldn’t write code this way; this is just an example to show how the compiler and linker work.)
Math.h
And we call the function in Main.cpp as below:
Main.cpp
After compiling the code, you will probably notice a compilation error, C3861, saying that the identifier ‘Add’ was not found. That is expected, because Main.cpp has no idea what Add is.
One way to fix this is to simply copy the function signature into Main.cpp, to tell the compiler that Add is a function that takes two int parameters and returns an int value.
Now the compilation succeeds.
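Why is a bare signature enough for the compiler? A small sketch, assuming Add has the signature described above: the declaration alone lets the compiler type-check a call, even though no function body exists anywhere in the file. The static_assert is just one way to show this at compile time; it is not something the original example uses.

```cpp
#include <type_traits>

// Declaration only: there is deliberately no function body in this sketch.
int Add(int a, int b);

// decltype inspects the call expression without evaluating it, so the
// compiler needs nothing beyond the declaration to know the result type.
static_assert(std::is_same<decltype(Add(1, 2)), int>::value,
              "the compiler knows Add(int, int) returns int");
```

This file compiles into an object file without complaint. The missing body only becomes a problem once something actually calls Add and the linker has to find it.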
Next, let’s try to build it.
Note that we get a linker error, LNK2019, telling us that there is an unresolved external symbol named ?Add@@YAHHH@Z, which is the decorated name of our Add function. This result is expected, since the linker doesn’t know where to find the required function: we only provided the function signature. The linker needs to know where the function definition is located.
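The difference between what the compiler needs and what the linker needs can be sketched in a single file. CallAdd and CheckCallAdd are hypothetical stand-ins for the code in Main.cpp, and the definition is placed in the same file only so that the sketch links on its own; in a real project it would normally live in a separate source file.

```cpp
#include <cassert>   // only needed for the self-check at the bottom

// Declaration: enough for the compiler, which only checks how Add is used.
int Add(int a, int b);

// Stand-in for the call made from main() in Main.cpp (hypothetical helper).
int CallAdd()
{
    return Add(2, 3);   // compiles because the declaration above is visible
}

// Definition: this is what the linker looks for. If it were missing from
// every object file and library, the call above would be left unresolved.
int Add(int a, int b)
{
    return a + b;
}

// Small self-check (hypothetical helper).
void CheckCallAdd()
{
    assert(CallAdd() == 5);
}
```

Deleting the definition from this sketch would leave the call in CallAdd with nothing to resolve to, which is the same kind of unresolved external symbol the linker reported above.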
Now let’s include Math.h instead, which contains the function definition, and build again.
This time the build succeeds, and an executable file (HelloWorld.exe) is generated.