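Purpose of this article
This article bridges what came before and what comes next. It explains the technical details behind the earlier post "[Homemade Operating System] HelloWorld" and briefly introduces techniques that later posts will use. It will be revised and extended as the exploration goes deeper.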
System V ABI
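The System V ABI (System V Application Binary Interface) is a set of specifications covering the calling convention, the object file format, the executable file format, and many other details. If we develop our operating system with the GNU Compiler Collection (GCC), we had better follow these specifications.
My knowledge is limited, so I can only give a brief introduction to the relevant concepts here; corrections are welcome.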
ELF
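ELF (Executable and Linkable Format) is the standard object file format on Unix and Unix-like systems. It is an extensible format, with different implementations for different hardware platforms and operating systems. Broadly speaking, there are three main kinds of ELF object file:
- Relocatable file
Linked with other object files to produce an executable file or a shared object file.
- Executable file
- Shared object file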
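The fundamental purpose of a file format is to describe the contents of a file. To describe files with different contents, a format usually defines a variety of complex data structures. When these structures are laid out linearly they are typically turned into tables, often nested several levels deep. The class files that hold Java bytecode, for example, use such tables to describe a program. ELF is no exception: it also defines a number of data structures (tables).
Because the ELF format can describe both an executable file and an object file used for linking, it provides two different views of a file, a linking view and an execution view: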

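Because this toy operating system will run on the Intel platform, only the System V ELF format on the Intel architecture is introduced here.
Suppose we have the following kernel.c file: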
int main(){
    return 0xdead;
}
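Compile kernel.c with the cross-compilation toolchain built earlier to produce the object file kernel.o. The command is: i686-elf-gcc -std=gnu99 -ffreestanding -O2 -c kernel.c -o kernel.o
The readelf tool displays the contents of an ELF file. (Note: macOS does not ship with readelf; you need to install the GNU toolchain, as described in an earlier post.)
Like many file formats, an ELF file begins with a few bytes of magic number that identify the format. The header is the first part of an ELF file and describes the file as a whole: the system and hardware platform it targets, summary information about the Program Header Table and the Section Header Table (that is, how to locate the two tables), and some other metadata. The figure below shows the header of the a.out file.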

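As the figure above shows, the a.out file compiled from a.c is not an executable file, so it has no program headers. Instead, a.out uses several sections to describe its contents.
The Section Header Table of an ELF file holds metadata for every section in the file: its name, its type, and the memory information (address, size, permissions, and so on) that applies if the section is loaded into the program image.
All predefined section names begin with a dot (.). The ELF specification predefines several special sections; a few common ones are:
- .bss: uninitialized data
- .data: initialized data
- .dynamic: dynamic linking information
- .strtab: string table
- .symtab: symbol table
- .text: executable code
System V further defines several special sections. Only the ones we care about are introduced here:
- .fini: code executed when the program terminates
- .init: code executed when the program initializes (normally, code that runs before the main function)
- .relname: relocation information, where name is a placeholder for the section the relocations apply to (for example .rel.text)
The figure below shows the sections in kernel.o: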
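The magic number and the file type that readelf reports can also be read straight from the first bytes of the file. The following is a minimal sketch, assuming a little-endian ELF file (which is the case on i386). The function names and the ELF_TYPE_* constants are hypothetical names chosen for this illustration; the constants mirror the e_type values 1, 2, and 3 from the ELF specification.

```c
#include <stddef.h>
#include <string.h>

/* e_type values from the ELF specification (names are ours, not from <elf.h>). */
enum { ELF_TYPE_REL = 1, ELF_TYPE_EXEC = 2, ELF_TYPE_DYN = 3 };

/* Returns 1 if buf starts with the four magic bytes 0x7f 'E' 'L' 'F'. */
int is_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= 4 && memcmp(buf, magic, sizeof magic) == 0;
}

/* Reads the 16-bit e_type field, which follows the 16-byte e_ident array.
   Assumes little-endian encoding. Returns -1 if buf is too short or not ELF. */
int elf_type(const unsigned char *buf, size_t len)
{
    if (len < 18 || !is_elf(buf, len))
        return -1;
    return buf[16] | (buf[17] << 8);
}

/* Maps an e_type value to the three kinds of file listed in this article. */
const char *elf_type_name(int type)
{
    switch (type) {
    case ELF_TYPE_REL:  return "relocatable file";
    case ELF_TYPE_EXEC: return "executable file";
    case ELF_TYPE_DYN:  return "shared object file";
    default:            return "other";
    }
}
```

Given the first 18 bytes of an object file such as kernel.o, elf_type would be expected to return the relocatable value, which matches a file that has sections but no program headers.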

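The compiled kernel.o is not an executable file. It has no Program Header Table, so a program loader cannot know how to load the file into memory or how to transfer control to the program it describes. Instead, it carries information about how to relocate the file, as the figure below shows: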

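To run the program described by kernel.c as a "kernel" on the bochs virtual machine (loaded by grub), we need an executable kernel file.
The boot.s file shown below defines the multiboot header that grub recognizes and the _start function. After calling the main function in kernel.c, _start enters an infinite loop. The command i686-elf-as boot.s -o boot.o assembles this file into the object file boot.o.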
# Declare constants used for creating a multiboot header.
.set ALIGN, 1<<0 # align loaded modules on page boundaries
.set MEMINFO, 1<<1 # provide memory map
.set FLAGS, ALIGN | MEMINFO # this is the Multiboot 'flag' field
.set MAGIC, 0x1BADB002 # 'magic number' lets bootloader find the header
.set CHECKSUM, -(MAGIC + FLAGS) # checksum of above, to prove we are multiboot
# Declare a header as in the Multiboot Standard. We put this into a special
# section so we can force the header to be in the start of the final program.
# You don't need to understand all these details as they are just magic values
# that are documented in the multiboot standard. The bootloader will search for this
# magic sequence and recognize us as a multiboot kernel.
.section .multiboot
.align 4
.long MAGIC
.long FLAGS
.long CHECKSUM
# Currently the stack pointer register (esp) points at anything and using it may
# cause massive harm. Instead, we'll provide our own stack. We will allocate
# room for a small temporary stack by creating a symbol at the bottom of it,
# then allocating 16384 bytes for it, and finally creating a symbol at the top.
.section .bootstrap_stack, "aw", @nobits
stack_bottom:
.skip 16384 # 16 KiB
stack_top:
# The linker script specifies _start as the entry point to the kernel and the
# bootloader will jump to this position once the kernel has been loaded. It
# doesn't make sense to return from this function as the bootloader is gone.
.section .text
.global _start
.type _start, @function
_start:
# Welcome to kernel mode! We now have sufficient code for the bootloader to
# load and run our operating system. It doesn't do anything interesting yet.
# Perhaps we would like to call printf("Hello, World\n"). You should now
# realize one of the profound truths about kernel mode: There is nothing
# there unless you provide it yourself. There is no printf function. There
# is no <stdio.h> header. If you want a function, you will have to code it
# yourself. And that is one of the best things about kernel development:
# you get to make the entire system yourself. You have absolute and complete
# power over the machine, there are no security restrictions, no safe
# guards, no debugging mechanisms, there is nothing but what you build.
# By now, you are perhaps tired of assembly language. You realize some
# things simply cannot be done in C, such as making the multiboot header in
# the right section and setting up the stack. However, you would like to
# write the operating system in a higher level language, such as C or C++.
# To that end, the next task is preparing the processor for execution of
# such code. C doesn't expect much at this point and we only need to set up
# a stack. Note that the processor is not fully initialized yet and stuff
# such as floating point instructions are not available yet.
# To set up a stack, we simply set the esp register to point to the top of
# our stack (as it grows downwards).
movl $stack_top, %esp
# We are now ready to actually execute C code. We cannot embed that in an
# assembly file, so the C entry point lives in kernel.c. That file defines
# a function called main, which we call here.
call main
# This infinite loop will help us debug in bochs more easily. While it is in
# place, the cli/hlt sequence below is never reached.
LoopLabel:
jmp LoopLabel
# In case the function returns, we'll want to put the computer into an
# infinite loop. To do that, we use the clear interrupt ('cli') instruction
# to disable interrupts, the halt instruction ('hlt') to stop the CPU until
# the next interrupt arrives, and jumping to the halt instruction if it ever
# continues execution, just to be safe. We will create a local label rather
# than real symbol and jump to there endlessly.
cli
hlt
.Lhang:
jmp .Lhang
# Set the size of the _start symbol to the current location '.' minus its start.
# This is useful when debugging or when you implement call tracing.
.size _start, . - _start
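The CHECKSUM constant in boot.s is chosen so that the three header fields sum to zero in 32-bit unsigned arithmetic. The sketch below reproduces that arithmetic in C; the macro and function names are hypothetical and exist only for this illustration, while the values are the ones used in boot.s.

```c
#include <stdint.h>

#define MB_ALIGN   (1u << 0)     /* same value as ALIGN in boot.s */
#define MB_MEMINFO (1u << 1)     /* same value as MEMINFO in boot.s */
#define MB_MAGIC   0x1BADB002u   /* same value as MAGIC in boot.s */

/* Computes -(magic + flags) modulo 2^32, as the .set CHECKSUM line does. */
uint32_t multiboot_checksum(uint32_t magic, uint32_t flags)
{
    return (uint32_t)(0u - (magic + flags));
}

/* Returns 1 if the magic matches and the three fields sum to zero modulo 2^32,
   which is the property a bootloader can test to recognize the header. */
int multiboot_header_ok(uint32_t magic, uint32_t flags, uint32_t checksum)
{
    return magic == MB_MAGIC && (uint32_t)(magic + flags + checksum) == 0u;
}
```

With FLAGS equal to ALIGN | MEMINFO, that is 3, the checksum works out to 0xE4524FFB, since 0x1BADB002 + 3 + 0xE4524FFB wraps around to 0.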
Linking boot.o with kernel.o produces an executable file in ELF format. The linker.ld file below directs the linker to link boot.o and kernel.o into the executable kernel file. The command is: i686-elf-gcc -T linker.ld -o kernel -ffreestanding -O0 -nostdlib kernel.o boot.o -lgcc
/* The bootloader will look at this image and start execution at the symbol
designated as the entry point. */
ENTRY(_start)
/* Tell where the various sections of the object files will be put in the final
kernel image. */
SECTIONS
{
/* Begin putting sections at 1 MiB, a conventional place for kernels to be
loaded at by the bootloader. */
. = 1M;
/* First put the multiboot header, as it is required to be put very early
in the image or the bootloader won't recognize the file format.
Next we'll put the .text section. */
.text BLOCK(4K) : ALIGN(4K)
{
*(.multiboot)
*(.text)
}
/* Read-only data. */
.rodata BLOCK(4K) : ALIGN(4K)
{
*(.rodata)
}
/* Read-write data (initialized) */
.data BLOCK(4K) : ALIGN(4K)
{
*(.data)
}
/* Read-write data (uninitialized) and stack */
.bss BLOCK(4K) : ALIGN(4K)
{
*(COMMON)
*(.bss)
*(.bootstrap_stack)
}
/* The compiler may produce other sections, by default it will put them in
a segment with the same name. Simply add stuff here as needed. */
}
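Inspecting the kernel file with readelf shows that, as an executable ELF file, it contains a Program Header Table (see the figure below). The Program Header Table describes what should happen when the loader (grub, in this example) loads the file into memory as an executable: which memory segment each section should be loaded into, and which permissions it should have.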

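As the figure above shows, the Program Header Table is made up of several segments, and each segment contains a number of sections. Sections that belong to the same segment are protected by the same permissions.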
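The permissions of a segment are stored as bits in the p_flags field of its program header: 0x1 for execute, 0x2 for write, and 0x4 for read. As a small sketch, the function below turns such a value into the three-character form that readelf shows in its Flg column. The function name and the SEG_* constants are hypothetical names for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Permission bits of p_flags as defined by the ELF specification. */
enum { SEG_X = 0x1, SEG_W = 0x2, SEG_R = 0x4 };

/* Writes a readelf-style flag string such as "R E" or "RW " into out,
   which must have room for 4 characters including the terminator. */
void segment_flags_to_string(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & SEG_R) ? 'R' : ' ';
    out[1] = (p_flags & SEG_W) ? 'W' : ' ';
    out[2] = (p_flags & SEG_X) ? 'E' : ' ';
    out[3] = '\0';
}
```

A segment holding .text would typically carry read and execute permissions, while one holding .data and .bss would carry read and write permissions.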
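Calling Convention
This part, too, covers only the System V convention on the i386 platform.
- Function call instructions
In assembly language, a function is called with the call instruction. The matching ret instruction pops an address off the top of the stack (the address of the instruction that follows the caller's call) and jumps to it.
- Function return value
The return value is placed in register %eax. If the result is 64 bits wide, its high 32 bits are placed in register %edx.
- Parameter passing
Arguments are passed on the stack, pushed one at a time from right to left.
For example, when we run the kernel above in bochs, once it has entered the infinite loop, register %eax holds the return value of main.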
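The two rules about return values and argument order can be modeled in plain C. The sketch below does not touch real registers or a real stack; it only mimics the layout the convention prescribes, and the struct and function names are hypothetical names for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* A model of the register pair that carries a 64-bit return value on i386. */
struct reg_pair {
    uint32_t eax;   /* low 32 bits */
    uint32_t edx;   /* high 32 bits */
};

/* Splits a 64-bit result the way the convention assigns it to %eax and %edx. */
struct reg_pair split_return_value(uint64_t value)
{
    struct reg_pair r;
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);
    r.edx = (uint32_t)(value >> 32);
    return r;
}

/* Models pushing n arguments from right to left onto a downward-growing stack.
   stack_slots[0] is the lowest address, i.e. the top of the stack after the
   last push, so the first argument ends up closest to the top. */
void push_args_right_to_left(const uint32_t *args, size_t n, uint32_t *stack_slots)
{
    size_t top = n;   /* empty stack: top is one past the highest slot */
    size_t i;
    for (i = n; i > 0; i--) {
        top--;                          /* the stack grows downwards */
        stack_slots[top] = args[i - 1]; /* the rightmost argument goes first */
    }
}
```

For the kernel above, main returns the 32-bit value 0xdead, so only the %eax half applies, which is the value visible in bochs once the loop is reached.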

Related resources
System V ABI (v4.1): http://www.sco.com/developers/devspecs/gabi41.pdf