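What is LLVM
LLVM is a framework for building compilers, written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time, while staying open to developers and compatible with existing scripts.
LLVM is currently used by Apple's developer tools (Xcode), Xilinx Vivado, Facebook, Google, and others.
Traditional compiler design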

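- Frontend (compiler front end)
The front end's main job is to parse the source code. It performs lexical analysis, syntax analysis, and semantic analysis, checks the source for errors, and then builds an abstract syntax tree (AST). An LLVM front end additionally generates intermediate representation (IR).
- Optimizer
Applies various optimizations to improve the code's running time, for example by eliminating redundant computations.
- Backend (compiler back end) / CodeGenerator
Maps the code onto the target instruction set: it generates machine code and performs machine-specific optimizations.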
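The three stages above can be sketched as a toy pipeline. This is only an illustration, not how LLVM is implemented: the `Ir` struct, the function names, and the pseudo assembly strings are all hypothetical, and the "language" is a single addition of two integers.

```c
#include <stdio.h>

/* Toy IR (hypothetical): either an addition "lhs + rhs" or a known constant. */
typedef enum { IR_ADD, IR_CONST } IrKind;
typedef struct { IrKind kind; int lhs, rhs, value; } Ir;

/* Front end: parse text of the form "<int> + <int>" into IR.
   Returns 0 on success, -1 on a syntax error. */
int frontend_parse(const char *src, Ir *out) {
    int a, b;
    char op;
    if (sscanf(src, "%d %c %d", &a, &op, &b) != 3 || op != '+') return -1;
    out->kind = IR_ADD;
    out->lhs = a;
    out->rhs = b;
    out->value = 0;
    return 0;
}

/* Optimizer: constant folding. It takes IR in and leaves IR behind. */
void optimize(Ir *ir) {
    if (ir->kind == IR_ADD) {
        ir->value = ir->lhs + ir->rhs;
        ir->kind = IR_CONST;
    }
}

/* Back end: write x86-64-flavoured pseudo assembly for the IR into buf. */
void backend_emit(const Ir *ir, char *buf, size_t n) {
    if (ir->kind == IR_CONST)
        snprintf(buf, n, "movl $%d, %%eax", ir->value);
    else
        snprintf(buf, n, "movl $%d, %%eax; addl $%d, %%eax", ir->lhs, ir->rhs);
}
```

Because the optimizer and the back end only ever see `Ir`, a different front end or a different back end could be swapped in without touching the other stages.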
iOS compiler architecture

Objective-C, C, and C++ use Clang as their compiler front end, Swift uses the Swift compiler's own front end, and all of them use LLVM as the back end.
The design of LLVM

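LLVM's advantages show when a compiler has to support multiple source languages or multiple hardware architectures. It differs from compilers such as GCC: GCC was designed as a monolithic application, which limits its flexibility and the ways it can be reused. The most important part of LLVM's design is a common code representation (IR), the form in which code is represented inside the compiler. As a result, a front end can be written independently for any programming language, and a back end can be written independently for any hardware architecture.
Clang
Clang is a subproject of the LLVM project. It is a lightweight compiler built on LLVM, created to replace GCC and to compile faster. It compiles C, C++, and Objective-C, and within the overall LLVM architecture it serves as the compiler front end.
Walking through the compilation process with Clang
First, create a simple project in Xcode that contains only a main.m file: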
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[]) {
@autoreleasepool {
// insert code here...
NSLog(@"Hello, World!");
}
return 0;
}
- Print the compilation phases of main.m from the command line:
clang -ccc-print-phases main.m
Output:
0: input, "main.m", objective-c
1: preprocessor, {0}, objective-c-cpp-output
2: compiler, {1}, ir
3: backend, {2}, assembler
4: assembler, {3}, object
5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image
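Notes:
0: input: locate the source file
1: preprocessor: expand macros, import header files, and so on
2: compiler: perform lexical and syntax analysis, check for syntax errors, and finally generate IR
3: backend: optimize through a series of passes and finally generate assembly code
4: assembler: generate the object file
5: linker: link the required dynamic and static libraries and generate the executable
6: bind-arch: produce the executable for the target architecture (x86_64 here)
- Run the preprocessor
Preprocessing Foundation.h produces a lot of output, so comment out #import <Foundation/Foundation.h> and change the code to: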
#import <stdio.h>
#define A 10
typedef int ZZ_INT;
int main(int argc, const char * argv[]) {
ZZ_INT B = 15;
printf("result = %d",A + B);
return 0;
}
clang -E main.m
# 1 "main.m"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 375 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.m" 2
# 11 "main.m"
# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 1 3 4
# 64 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 3 4
# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/_stdio.h" 1 3 4
# 68 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/_stdio.h" 3 4
# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/sys/cdefs.h" 1 3 4
# 630 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/sys/cdefs.h" 3 4
// omitted ...
typedef union {
char __mbstate8[128];
long long _mbstateL;
} __mbstate_t;
typedef __mbstate_t __darwin_mbstate_t;
typedef long int __darwin_ptrdiff_t;
// omitted ...
typedef int ZZ_INT;
int main(int argc, const char * argv[]) {
ZZ_INT B = 15;
printf("result = %d",10 + B);
return 0;
}
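The omitted parts are mostly the contents pulled in by #import <stdio.h>. In the printf call inside main, the macro A defined by #define A 10 has been replaced directly with 10. Note that typedef int ZZ_INT; is not substituted: a typedef is resolved later by the compiler proper, not by the preprocessor.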
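The difference between a macro and a typedef can be made visible from inside C itself. The sketch below reuses A and ZZ_INT from the example; the helper macros STR and XSTR and the two function names are hypothetical additions for illustration.

```c
#define A 10
typedef int ZZ_INT;

/* Two-level stringification: XSTR(x) macro-expands x first, then STR turns
   the resulting tokens into a string literal. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* A is a macro, so the preprocessor rewrites it to 10 before stringifying. */
const char *macro_text(void) { return XSTR(A); }

/* ZZ_INT is a typedef name, which the preprocessor knows nothing about,
   so it passes through unchanged. */
const char *typedef_text(void) { return XSTR(ZZ_INT); }
```

Here macro_text() yields "10" while typedef_text() yields "ZZ_INT", which mirrors what clang -E shows for the two names.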
- Compilation stage
-Lexical analysis
Lexical analysis follows preprocessing. It splits the code into individual tokens:
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
Output:
annot_module_include '#import <stdio.h>
#define A 10
typedef int ZZ_INT;
int main(int argc, const char * argv[]) {
ZZ_INT B = 15;
printf("result = %d",A + B);
return 0;
}
?' Loc=<main.m:11:1>
typedef 'typedef' [StartOfLine] Loc=<main.m:14:1>
int 'int' [LeadingSpace] Loc=<main.m:14:9>
identifier 'ZZ_INT' [LeadingSpace] Loc=<main.m:14:13>
semi ';' Loc=<main.m:14:19>
int 'int' [StartOfLine] Loc=<main.m:16:1>
identifier 'main' [LeadingSpace] Loc=<main.m:16:5>
l_paren '(' Loc=<main.m:16:9>
int 'int' Loc=<main.m:16:10>
identifier 'argc' [LeadingSpace] Loc=<main.m:16:14>
comma ',' Loc=<main.m:16:18>
const 'const' [LeadingSpace] Loc=<main.m:16:20>
char 'char' [LeadingSpace] Loc=<main.m:16:26>
star '*' [LeadingSpace] Loc=<main.m:16:31>
identifier 'argv' [LeadingSpace] Loc=<main.m:16:33>
l_square '[' Loc=<main.m:16:37>
r_square ']' Loc=<main.m:16:38>
r_paren ')' Loc=<main.m:16:39>
l_brace '{' [LeadingSpace] Loc=<main.m:16:41>
identifier 'ZZ_INT' [StartOfLine] [LeadingSpace] Loc=<main.m:17:5>
identifier 'B' [LeadingSpace] Loc=<main.m:17:12>
equal '=' [LeadingSpace] Loc=<main.m:17:14>
numeric_constant '15' [LeadingSpace] Loc=<main.m:17:16>
semi ';' Loc=<main.m:17:18>
identifier 'printf' [StartOfLine] [LeadingSpace] Loc=<main.m:18:5>
l_paren '(' Loc=<main.m:18:11>
string_literal '"result = %d"' Loc=<main.m:18:12>
comma ',' Loc=<main.m:18:25>
numeric_constant '10' Loc=<main.m:18:26 <Spelling=main.m:13:11>>
plus '+' [LeadingSpace] Loc=<main.m:18:28>
identifier 'B' [LeadingSpace] Loc=<main.m:18:30>
r_paren ')' Loc=<main.m:18:31>
semi ';' Loc=<main.m:18:32>
return 'return' [StartOfLine] [LeadingSpace] Loc=<main.m:19:5>
numeric_constant '0' [LeadingSpace] Loc=<main.m:19:12>
semi ';' Loc=<main.m:19:13>
r_brace '}' [StartOfLine] Loc=<main.m:20:1>
eof '' Loc=<main.m:20:2>
Lexical analysis splits the code into individual words and punctuation marks, and records the file name, line number, and starting column of each one. For example:
identifier 'main' [LeadingSpace] Loc=<main.m:16:5>
main is in main.m, on line 16, starting at column 5.
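-Syntax analysis
Syntax analysis follows lexical analysis and mainly verifies that the syntax is correct. Building on the token stream, it groups tokens into grammatical phrases such as programs, statements, and expressions, then assembles all of the nodes into an abstract syntax tree (AST). The parser decides whether the source program is structurally correct.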
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
Output:
TranslationUnitDecl 0x7feb9b01cc08 <<invalid sloc>> <invalid sloc> <undeserialized declarations>
|-TypedefDecl 0x7feb9b01d4a0 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7feb9b01d1a0 '__int128'
|-TypedefDecl 0x7feb9b01d510 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7feb9b01d1c0 'unsigned __int128'
|-TypedefDecl 0x7feb9b01d5b0 <<invalid sloc>> <invalid sloc> implicit SEL 'SEL *'
| `-PointerType 0x7feb9b01d570 'SEL *'
| `-BuiltinType 0x7feb9b01d400 'SEL'
|-TypedefDecl 0x7feb9b01d698 <<invalid sloc>> <invalid sloc> implicit id 'id'
| `-ObjCObjectPointerType 0x7feb9b01d640 'id'
| `-ObjCObjectType 0x7feb9b01d610 'id'
|-TypedefDecl 0x7feb9b01d778 <<invalid sloc>> <invalid sloc> implicit Class 'Class'
| `-ObjCObjectPointerType 0x7feb9b01d720 'Class'
| `-ObjCObjectType 0x7feb9b01d6f0 'Class'
|-ObjCInterfaceDecl 0x7feb9b01d7d0 <<invalid sloc>> <invalid sloc> implicit Protocol
|-TypedefDecl 0x7feb9b01db48 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7feb9b01d940 'struct __NSConstantString_tag'
| `-Record 0x7feb9b01d8a0 '__NSConstantString_tag'
|-TypedefDecl 0x7feb9a022a00 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7feb9b01dba0 'char *'
| `-BuiltinType 0x7feb9b01cca0 'char'
|-TypedefDecl 0x7feb9a022ce8 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x7feb9a022c90 'struct __va_list_tag [1]' 1
| `-RecordType 0x7feb9a022af0 'struct __va_list_tag'
| `-Record 0x7feb9a022a58 '__va_list_tag'
|-ImportDecl 0x7feb9a023510 <main.m:11:1> col:1 implicit Darwin.C.stdio
|-TypedefDecl 0x7feb9a023568 <line:14:1, col:13> col:13 referenced ZZ_INT 'int'
| `-BuiltinType 0x7feb9b01cd00 'int'
`-FunctionDecl 0x7feb9a023840 <line:16:1, line:20:1> line:16:5 main 'int (int, const char **)'
|-ParmVarDecl 0x7feb9a0235d8 <col:10, col:14> col:14 argc 'int'
|-ParmVarDecl 0x7feb9a0236f0 <col:20, col:38> col:33 argv 'const char **':'const char **'
`-CompoundStmt 0x7feb9a0c3e60 <col:41, line:20:1>
|-DeclStmt 0x7feb9a0c3c68 <line:17:5, col:18>
| `-VarDecl 0x7feb9a0c3800 <col:5, col:16> col:12 used B 'ZZ_INT':'int' cinit
| `-IntegerLiteral 0x7feb9a0c3868 <col:16> 'int' 15
|-CallExpr 0x7feb9a0c3dd0 <line:18:5, col:31> 'int'
| |-ImplicitCastExpr 0x7feb9a0c3db8 <col:5> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
| | `-DeclRefExpr 0x7feb9a0c3c80 <col:5> 'int (const char *, ...)' Function 0x7feb9a0c3890 'printf' 'int (const char *, ...)'
| |-ImplicitCastExpr 0x7feb9a0c3e18 <col:12> 'const char *' <NoOp>
| | `-ImplicitCastExpr 0x7feb9a0c3e00 <col:12> 'char *' <ArrayToPointerDecay>
| | `-StringLiteral 0x7feb9a0c3cd8 <col:12> 'char [12]' lvalue "result = %d"
| `-BinaryOperator 0x7feb9a0c3d70 <line:13:11, line:18:30> 'int' '+'
| |-IntegerLiteral 0x7feb9a0c3d00 <line:13:11> 'int' 10
| `-ImplicitCastExpr 0x7feb9a0c3d58 <line:18:30> 'ZZ_INT':'int' <LValueToRValue>
| `-DeclRefExpr 0x7feb9a0c3d20 <col:30> 'ZZ_INT':'int' lvalue Var 0x7feb9a0c3800 'B' 'ZZ_INT':'int'
`-ReturnStmt 0x7feb9a0c3e50 <line:19:5, col:12>
`-IntegerLiteral 0x7feb9a0c3e30 <col:12> 'int' 0
CompoundStmt, DeclStmt, CallExpr, and so on are the node types for the corresponding statements and expressions.
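The BinaryOperator subtree in the dump (an IntegerLiteral 10 on the left, a reference to B on the right) can be modelled with a small tree structure. This is a sketch under simplifying assumptions: the `Node` struct and its three kinds are hypothetical and much smaller than Clang's real AST classes.

```c
#include <stddef.h>

typedef enum { NODE_INT_LITERAL, NODE_VAR_REF, NODE_BINARY_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                     /* NODE_INT_LITERAL: the literal value */
    const int *var;                /* NODE_VAR_REF: storage of the variable */
    const struct Node *lhs, *rhs;  /* NODE_BINARY_ADD: the two operands */
} Node;

/* Evaluate a tree by visiting it recursively from the root down. */
int eval(const Node *n) {
    switch (n->kind) {
    case NODE_INT_LITERAL: return n->value;
    case NODE_VAR_REF:     return *n->var;
    case NODE_BINARY_ADD:  return eval(n->lhs) + eval(n->rhs);
    }
    return 0;
}

/* Build the tree for `10 + B` with B = 15 and evaluate it. */
int eval_example(void) {
    int B = 15;
    Node lit = { NODE_INT_LITERAL, 10, NULL, NULL, NULL };
    Node ref = { NODE_VAR_REF, 0, &B, NULL, NULL };
    Node add = { NODE_BINARY_ADD, 0, NULL, &lit, &ref };
    return eval(&add);
}
```

A code generator walks the tree in the same recursive fashion, except that it emits instructions for each node instead of computing a value.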
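-Generating intermediate code (IR, intermediate representation)
After the steps above, IR generation begins: the code generator walks the syntax tree from the top down and translates it step by step into LLVM IR. The following command produces a textual .ll file for inspecting the IR: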
clang -S -fobjc-arc -emit-llvm main.m
Open main.ll:
; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"
@.str = private unnamed_addr constant [12 x i8] c"result = %d\00", align 1
; Function Attrs: noinline optnone ssp uwtable
define i32 @main(i32, i8**) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca i32, align 4
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
store i32 15, i32* %6, align 4
%7 = load i32, i32* %6, align 4
%8 = add nsw i32 10, %7
%9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([12 x i8], [12 x i8]* @.str, i64 0, i64 0), i32 %8)
ret i32 0
}
declare i32 @printf(i8*, ...) #1
attributes #0 = { noinline optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!llvm.ident = !{!8}
!0 = !{i32 2, !"SDK Version", [3 x i32] [i32 10, i32 15, i32 4]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!4 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
!5 = !{i32 1, !"Objective-C Class Properties", i32 64}
!6 = !{i32 1, !"wchar_size", i32 4}
!7 = !{i32 7, !"PIC Level", i32 2}
!8 = !{!"Apple clang version 11.0.3 (clang-1103.0.32.59)"}
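The code here starts to look a bit like assembly. A brief introduction to the notation:
@ global identifier
% local identifier
alloca allocate stack space
align memory alignment
i32 32-bit integer (4 bytes)
store write to memory
load read from memory
call call a function
ret return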
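As a reading aid, the part of the IR that computes the printf argument can be mirrored line by line in C. This is a hand-written approximation, not compiler output; the function name and the variable names slot6, v7, and v8 are hypothetical and chosen only to match the IR registers %6, %7, and %8.

```c
/* Hand-written C mirror of the IR instructions that compute A + B. */
int ir_main_sum(void) {
    int slot6;          /* %6 = alloca i32, align 4          */
    slot6 = 15;         /* store i32 15, i32* %6, align 4    */
    int v7 = slot6;     /* %7 = load i32, i32* %6, align 4   */
    int v8 = 10 + v7;   /* %8 = add nsw i32 10, %7           */
    return v8;          /* the value passed to printf as %8  */
}
```

At -O0 every local variable lives in a stack slot and is stored and reloaded around each use, which is why the unoptimized IR is so verbose.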
IR optimization: LLVM's optimization levels include -O0, -O1, -O2, -O3, and -Os. For example, with -Os:
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll
Bitcode
Since Xcode 7, when Bitcode is enabled Apple can optimize the app further. Bitcode is intermediate code stored in .bc files.
Generate main.bc from the optimized main.ll:
clang -emit-llvm -c main.ll -o main.bc
Result (main.bc viewed as a hex dump):
dec0 170b 0000 0000 1400 0000 d80c 0000
0700 0001 4243 c0de 3514 0000 0700 0000
620c 3024 9696 a6a5 f7d7 7f5d d3b7 4ffb
b7ed e7fd 4f0b 5180 4c01 0000 210c 0000
e602 0000 0b02 2100 0200 0000 1600 0000
0781 2391 41c8 0449 0610 3239 9201 840c
2505 0819 1e04 8b62 8014 4502 4292 0b42
a410 3214 3808 184b 0a32 5288 4870 c421
2344 1287 8c10 4192 0264 c808 b114 2043
4688 20c9 0132 5284 182a 282a 9031 7cb0
5c91 20c5 c800 0000 8920 0000 1000 0000
3222 4809 2064 8504 9322 a484 0493 22e3
84a1 9014 124c 8a8c 0b84 a44c 1040 7304
4832 000a 7304 6040 8048 19c6 2864 aa30
08a1 8180 1c18 34a8 8c00 cc11 8002 0000
5118 0000 5901 0000 1bf8 27f8 ffff ffff
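The first bytes of the dump can be checked programmatically. The sketch below assumes the dump lists bytes in file order and relies on two facts from LLVM's bitcode format: a raw stream starts with the bytes 'B' 'C' 0xC0 0xDE, and a wrapped file starts with the little-endian magic 0x0B17C0DE followed by version, offset, size, and CPU type fields. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BITCODE_WRAPPER_MAGIC 0x0B17C0DEu

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* True if the four bytes at p are the raw bitcode magic 'B' 'C' 0xC0 0xDE. */
static int is_raw_bitcode(const unsigned char *p) {
    return p[0] == 'B' && p[1] == 'C' && p[2] == 0xC0 && p[3] == 0xDE;
}

/* Returns the byte offset at which the raw bitcode stream starts,
   or -1 if buf does not look like LLVM bitcode. */
long bitcode_stream_offset(const unsigned char *buf, size_t len) {
    if (len >= 4 && is_raw_bitcode(buf)) return 0;
    if (len >= 20 && read_le32(buf) == BITCODE_WRAPPER_MAGIC) {
        uint32_t offset = read_le32(buf + 8);  /* third wrapper field */
        if ((size_t)offset + 4 <= len && is_raw_bitcode(buf + offset))
            return (long)offset;
    }
    return -1;
}
```

For the dump above, the wrapper magic de c0 17 0b comes first, the offset field is 0x14, and the bytes 42 43 c0 de do appear at byte 20.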
- Generating assembly code
Generate assembly code from the final main.bc or main.ll:
clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
Output:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 15 sdk_version 10, 15, 4
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $32, %rsp
movl $0, -4(%rbp)
movl %edi, -8(%rbp)
movq %rsi, -16(%rbp)
movl $15, -20(%rbp)
movl -20(%rbp), %eax
addl $10, %eax
leaq L_.str(%rip), %rdi
movl %eax, %esi
movb $0, %al
callq _printf
xorl %ecx, %ecx
movl %eax, -24(%rbp) ## 4-byte Spill
movl %ecx, %eax
addq $32, %rsp
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "result = %d"
.section __DATA,__objc_imageinfo,regular,no_dead_strip
L_OBJC_IMAGE_INFO:
.long 0
.long 64
.subsections_via_symbols
The generated assembly can also be optimized:
clang -Os -S -fobjc-arc main.m -o main.s
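After optimization the file is somewhat smaller. The most visible change is that there are fewer instructions; for instance, 10 + 15 has been folded into the constant 25 (movl $25, %esi):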
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 15 sdk_version 10, 15, 4
.globl _main ## -- Begin function main
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
leaq L_.str(%rip), %rdi
movl $25, %esi
xorl %eax, %eax
callq _printf
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "result = %d"
.section __DATA,__objc_imageinfo,regular,no_dead_strip
L_OBJC_IMAGE_INFO:
.long 0
.long 64
.subsections_via_symbols
- Generating the object file (assembler)
To produce the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file.
clang -fmodules -c main.s -o main.o
Use the nm command to inspect the symbols in main.o:
nm -nm main.o
(undefined) external _printf
0000000000000000 (__TEXT,__text) external _main
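Here _printf is marked undefined external.
undefined means that the symbol _printf cannot yet be found in the current file;
external means that the symbol is visible outside this file.
In other words, the dynamic and static libraries have not been linked yet.
- Generating the executable (linking)
The linker links the .o files produced by compilation with .dylib/.a files and generates a Mach-O file.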
clang main.o -o main
This step produces an executable Mach-O file:

Inspect the symbols again, this time in main:
(undefined) external _printf (from libSystem)
(undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100000f6f (__TEXT,__text) external _main
0000000100002008 (__DATA,__data) non-external __dyld_private
We can see that libSystem.dylib has now been linked in.
Clang plugins
To be added.
Summary
About 90% of this article is excerpted notes; the code sections were tested first-hand.