qcow2 File Format Explained (I)

By castle

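qcow2 is one of the disk image formats supported by the QEMU emulator. Like other image formats, it represents a fixed-size block device as a single file. Compared with a plain raw image, it offers the following features: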

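  • Smaller space usage, even when the file system does not support holes;
  • Copy-on-write (COW) support: the image file records only the changes made to the underlying disk;
  • Snapshot support: one image file can hold the history of multiple snapshots;
  • Optional zlib-based compression;
  • Optional AES encryption.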

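Some Chinese write-ups of the qcow2 format can be found through Baidu, but most of them are vague, so I decided to read QEMU's official documentation myself and note down my understanding here.
I am new to virtualization, so my understanding may be wrong in places; please bear with me.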

What follows is a translation of the official qcow2 documentation, together with some of my own notes.

Original document

Overview

A qcow2 image file is organized in units of constant size, which are called
(host) clusters.

A qcow2 image file is made up of fixed-size units, which are called (host) clusters.

A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.

The cluster is the unit of allocation for everything in the file, both actual guest data and image metadata.

Likewise, the virtual disk as seen by the guest is divided into (guest)
clusters of the same size.

Likewise, the virtual disk that the guest sees is divided into (guest) clusters of the same size.

All numbers in qcow2 are stored in Big Endian byte order.

Every number in a qcow2 file is stored in big-endian byte order.
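
Because every multi-byte field is big-endian, a parser running on a little-endian host has to assemble the bytes itself. The following is a minimal sketch in C; be32_read and be64_read are helper names invented for this illustration and are not QEMU functions.

```c
#include <stdint.h>

/* Assemble a 32-bit big-endian value from 4 bytes (hypothetical helper). */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Assemble a 64-bit big-endian value from 8 bytes (hypothetical helper). */
static uint64_t be64_read(const uint8_t *p)
{
    return ((uint64_t)be32_read(p) << 32) | be32_read(p + 4);
}
```

Reading byte by byte like this works regardless of the host's own endianness and does not require the buffer to be aligned.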

File header

The first cluster of a qcow2 image contains the file header:

The first cluster of a qcow2 image contains the file header. The QEMU source code defines the header as follows:

typedef struct QCowHeader {
    uint32_t magic;
    uint32_t version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size;
    uint32_t cluster_bits;
    uint64_t size; /* in bytes */
    uint32_t crypt_method;
    uint32_t l1_size; /* XXX: save number of clusters instead ? */
    uint64_t l1_table_offset;
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint32_t nb_snapshots;
    uint64_t snapshots_offset;

    /* The following fields are only valid for version >= 3 */
    uint64_t incompatible_features;
    uint64_t compatible_features;
    uint64_t autoclear_features;

    uint32_t refcount_order;
    uint32_t header_length;
} QEMU_PACKED QCowHeader;

The fields of the header structure have the following meanings:

Byte 0 - 3: magic

QCOW magic string ("QFI\xfb")

A fixed 4-byte identifier.

4 - 7 version

Version number (valid values are 2 and 3)

Version number, either 2 or 3.

8 - 15 backing_file_offset

Offset into the image file at which the backing file name is stored (NB: The string is not null terminated). 0 if the image doesn't have a backing file.

Offset, from the start of the image file, of the backing file path string. The string is not null terminated. A value of 0 means the image has no backing file.

I won't explain what a backing file is here; anyone who works with qcow2 already knows.

16 - 19 backing_file_size

Length of the backing file name in bytes. Must not be longer than 1023 bytes. Undefined if the image doesn't have a backing file.

Length of the backing file path string in bytes. It must not exceed 1023 bytes. The value is undefined when the image has no backing file.

20 - 23 cluster_bits

Number of bits that are used for addressing an offset within a cluster (1 << cluster_bits is the cluster size). Must not be less than 9 (i.e. 512 byte clusters). Note: qemu as of today has an implementation limit of 2 MB as the maximum cluster size and won't be able to open images with larger cluster sizes.

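Number of bits used to address an offset within a cluster; 1 << cluster_bits is the cluster size. The value must not be less than 9, so a cluster is at least 512 bytes. Note: QEMU currently has an implementation limit of 2 MB for the cluster size and cannot open images with larger clusters.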
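
The relationship between cluster_bits and the cluster size can be sketched as below. The function names are invented for this illustration, and the upper bound of 21 bits simply mirrors the 2 MB QEMU implementation limit quoted above; it is not a limit of the format itself.

```c
#include <stdint.h>

/* Cluster size in bytes, or 0 if cluster_bits is outside the range this
 * sketch accepts: 9 (512 bytes) up to 21 (2 MB, the QEMU limit). */
static uint64_t cluster_size_from_bits(uint32_t cluster_bits)
{
    if (cluster_bits < 9 || cluster_bits > 21) {
        return 0;
    }
    return (uint64_t)1 << cluster_bits;
}

/* The low cluster_bits bits of an offset address a byte inside a cluster. */
static uint64_t offset_in_cluster(uint64_t offset, uint32_t cluster_bits)
{
    return offset & (((uint64_t)1 << cluster_bits) - 1);
}
```

For example, cluster_bits = 16 gives 64 KB clusters, and the offset 65536 + 5 falls 5 bytes into the second cluster.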

24 - 31 size

Virtual disk size in bytes

Size of the virtual disk in bytes. This is the capacity the guest sees, not the size of the image file itself.

32 - 35 crypt_method

0 for no encryption
1 for AES encryption

That is, 0 means unencrypted and 1 means AES-encrypted.

36 - 39 l1_size

Number of entries in the active L1 table

Number of entries in the active L1 table.
What exactly is the L1 table? I don't understand it yet; more on that later.

40 - 47 l1_table_offset

Offset into the image file at which the active L1 table starts. Must be aligned to a cluster boundary.

Offset of the active L1 table from the start of the image file. It must be aligned to a cluster boundary.

48 - 55 refcount_table_offset

Offset into the image file at which the refcount table starts. Must be aligned to a cluster boundary.

Offset of the refcount table from the start of the image file. It must be aligned to a cluster boundary.
The refcount table is explained later in this article.

56 - 59 refcount_table_clusters

Number of clusters that the refcount table occupies

Number of clusters occupied by the refcount table.

60 - 63 nb_snapshots

Number of snapshots contained in the image

Number of snapshots contained in the image file.

64 - 71 snapshots_offset

Offset into the image file at which the snapshot table starts. Must be aligned to a cluster boundary.

Offset of the snapshot table from the start of the image file. It must be aligned to a cluster boundary.
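
The byte offsets listed so far (0 through 71) are enough to decode a version 2 header. The sketch below copies them into a host-endian structure and checks the magic and version. The names qcow2_hdr_v2, rd32, rd64 and qcow2_parse_hdr_v2 are made up for this example; QEMU's own parsing code is organized differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-endian copy of header bytes 0-71 (names invented for this sketch). */
struct qcow2_hdr_v2 {
    uint32_t magic;
    uint32_t version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size;
    uint32_t cluster_bits;
    uint64_t size;
    uint32_t crypt_method;
    uint32_t l1_size;
    uint64_t l1_table_offset;
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint32_t nb_snapshots;
    uint64_t snapshots_offset;
};

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t rd64(const uint8_t *p)
{
    return ((uint64_t)rd32(p) << 32) | rd32(p + 4);
}

/* Returns 0 on success, -1 on a short buffer, bad magic or unknown version. */
static int qcow2_parse_hdr_v2(const uint8_t *buf, size_t len,
                              struct qcow2_hdr_v2 *h)
{
    if (len < 72) {
        return -1;
    }
    h->magic                   = rd32(buf + 0);
    h->version                 = rd32(buf + 4);
    h->backing_file_offset     = rd64(buf + 8);
    h->backing_file_size       = rd32(buf + 16);
    h->cluster_bits            = rd32(buf + 20);
    h->size                    = rd64(buf + 24);
    h->crypt_method            = rd32(buf + 32);
    h->l1_size                 = rd32(buf + 36);
    h->l1_table_offset         = rd64(buf + 40);
    h->refcount_table_offset   = rd64(buf + 48);
    h->refcount_table_clusters = rd32(buf + 56);
    h->nb_snapshots            = rd32(buf + 60);
    h->snapshots_offset        = rd64(buf + 64);

    if (h->magic != 0x514649fbu) {      /* "QFI\xfb" */
        return -1;
    }
    if (h->version != 2 && h->version != 3) {
        return -1;
    }
    return 0;
}
```

Decoding field by field, instead of casting the buffer to a packed struct, avoids both alignment and endianness problems.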

If the version is 3 or higher, the header has the following additional fields.
For version 2, the values are assumed to be zero, unless specified otherwise
in the description of a field.

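If the version is 3 or higher (3 is currently the highest), the header also contains the fields below. For version 2 images these values are assumed to be 0 unless a field's description says otherwise.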

72 - 79 incompatible_features

Bitmask of incompatible features.
An implementation must fail to open an image if an unknown bit is set.

Bitmask of incompatible features.
If a parser finds an unknown bit set to 1 here, it must refuse to open the image.

Bit 0:

Dirty bit. If this bit is set then refcounts may be inconsistent, make sure to scan L1/L2 tables to repair refcounts before accessing the image.

Dirty bit. If this bit is 1, the refcounts may not match the actual state, so the L1/L2 tables must be scanned to repair the refcounts before the image is accessed.

Bit 1:

Corrupt bit. If this bit is set then any data structure may be corrupt and the image must not be written to (unless for regaining consistency).

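Corrupt bit. If this bit is 1, any data structure may be corrupt and the image must not be written to, except to restore consistency.
Well, if I ever read this bit as 1, I'd rather not deal with it...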

Bits 2-63:

Reserved (set to 0)

Reserved; must be 0.
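
The rules for incompatible_features can be summarized in a small decision function. This is only a sketch under the assumption that bits 0 and 1 are the only known bits, as in the version of the documentation translated here; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define QCOW2_INCOMPAT_DIRTY   (1ULL << 0)
#define QCOW2_INCOMPAT_CORRUPT (1ULL << 1)
/* Assumption: only the two bits described above are known. */
#define QCOW2_INCOMPAT_KNOWN   (QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_CORRUPT)

enum open_decision {
    OPEN_OK,            /* nothing special to do                       */
    OPEN_NEEDS_REPAIR,  /* dirty: rebuild refcounts from L1/L2 first   */
    OPEN_NO_WRITES,     /* corrupt: no writes except for repair        */
    OPEN_REFUSE         /* unknown incompatible bit: must fail to open */
};

static enum open_decision check_incompatible(uint64_t features)
{
    if (features & ~QCOW2_INCOMPAT_KNOWN) {
        return OPEN_REFUSE;
    }
    if (features & QCOW2_INCOMPAT_CORRUPT) {
        return OPEN_NO_WRITES;
    }
    if (features & QCOW2_INCOMPAT_DIRTY) {
        return OPEN_NEEDS_REPAIR;
    }
    return OPEN_OK;
}
```

The unknown-bit test comes first because an implementation cannot reason about the image at all once an incompatible feature it does not understand is in use.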

80 - 87: compatible_features

Bitmask of compatible features. An implementation can safely ignore any unknown bits that are set.

Bitmask of compatible features. A parser can safely ignore any unknown bits that are set here.

Bit 0:

Lazy refcounts bit.
If this bit is set then lazy refcount updates can be used. This means marking the image file dirty and postponing refcount metadata updates.

If this bit is 1, lazy refcount updates may be used. This means that the image file is marked dirty (the dirty bit is set) and the refcount metadata updates are postponed.

Bits 1-63: Reserved (set to 0)

88 - 95: autoclear_features

Bitmask of auto-clear features. An implementation may only write to an image with unknown auto-clear features if it clears the respective bits from this field first.

My understanding: when an implementation does not know the meaning of some bit in autoclear_features, it must first clear that bit to 0 before it writes to the image.

Bit 0:

Bitmaps extension bit
This bit indicates consistency for the bitmaps extension data. It is an error if this bit is set without the bitmaps extension present. If the bitmaps extension is present but this bit is unset, the bitmaps extension data must be considered inconsistent.

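This bit indicates whether the bitmaps extension data is consistent. If the bit is 1 but no bitmaps extension is present, that is an error; if a bitmaps extension is present but the bit is 0, the bitmaps extension data must be treated as inconsistent (i.e. not to be trusted).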

Bits 1-63: Reserved (set to 0)
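
The auto-clear rule amounts to masking the field with the set of bits the implementation understands before the first write. A minimal sketch, with a hypothetical function name and with bit 0 as the only bit assumed to be known:

```c
#include <stdint.h>

#define QCOW2_AUTOCLEAR_BITMAPS (1ULL << 0)

/* Value to store in autoclear_features before writing to the image:
 * every bit this implementation does not understand is cleared. */
static uint64_t autoclear_before_write(uint64_t on_disk, uint64_t known)
{
    return on_disk & known;
}
```

An implementation that knows nothing about bitmaps would pass known = 0, which clears bit 0 and thereby marks any bitmaps extension data as inconsistent for the next reader.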

96 - 99: refcount_order

Describes the width of a reference count block entry (width in bits: refcount_bits = 1 << refcount_order). For version 2 images, the order is always assumed to be 4 (i.e. refcount_bits = 16). This value may not exceed 6 (i.e. refcount_bits = 64).

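Width of a refcount block entry.
Sorry, at the time I wrote this part I did not yet understand what a refcount is, so I can't explain more here. In version 2 the width is fixed at 16 bits anyway, so it should not matter much.
A detailed explanation follows later in this article.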

refcount_bits = 1 << refcount_order

For version 2 the order is fixed at 4, i.e. refcount_bits = 16.
The order must not exceed 6, i.e. refcount_bits is at most 64.
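
As a quick illustration of the rule above, the entry width can be derived as follows. The function name is invented for this sketch, and returning 0 for an out-of-range order is a convention chosen here, not something the documentation prescribes.

```c
#include <stdint.h>

/* Width of one refcount block entry in bits, or 0 if refcount_order is
 * out of range. Version 2 images always use 16 bits (order 4). */
static unsigned refcount_bits_for(uint32_t version, uint32_t refcount_order)
{
    if (version < 3) {
        return 16;
    }
    if (refcount_order > 6) {
        return 0;
    }
    return 1u << refcount_order;
}
```

The possible widths are therefore 1, 2, 4, 8, 16, 32 and 64 bits.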

100 - 103: header_length

Length of the header structure in bytes. For version 2 images, the length is always assumed to be 72 bytes.

Length of the header structure in bytes. For version 2 the length is fixed at 72 bytes.

Header extensions

Directly after the image header, optional sections called header extensions can
be stored. Each extension has a structure like the following:

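Optional header extensions are stored directly after the image header.
QEMU's source code keeps an extension it does not recognize in the structure below. Note that this is the in-memory form: the next list link is not part of the on-disk layout, which consists only of the type, the length, the data and the padding listed afterwards.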

typedef struct Qcow2UnknownHeaderExtension {
    uint32_t magic;
    uint32_t len;
    QLIST_ENTRY(Qcow2UnknownHeaderExtension) next;
    uint8_t data[];
} Qcow2UnknownHeaderExtension;

Each extension is laid out as follows:

Byte 0 - 3: Header extension type:

0x00000000 - End of the header extension area
0xE2792ACA - Backing file format name
0x6803f857 - Feature name table
0x23852875 - Bitmaps extension
other - Unknown header extension, can be safely ignored

These are a handful of fixed extension types; there is not much to add.

4 - 7: Length of the header extension data

Length of the data.

8 - n: Header extension data

The data itself.

n - m: Padding to round up the header extension size to the next multiple of 8.

Padding up to 8-byte alignment.
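
Walking the header extension area then comes down to reading an 8-byte type/length pair, skipping the padded data, and stopping at type 0. The sketch below only counts the extensions in a buffer that starts right after the header. All names are hypothetical, and treating a missing end marker as an error is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t ext_rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Round the data length up to the next multiple of 8. */
static uint64_t ext_padded_len(uint32_t len)
{
    return ((uint64_t)len + 7) & ~(uint64_t)7;
}

/* Counts the extensions in buf[0..size), where buf starts directly after
 * the header. Returns -1 if the data is truncated or the end marker
 * (type 0x00000000) is never reached. */
static int count_header_extensions(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int count = 0;

    while (size - pos >= 8) {
        uint32_t type = ext_rd32(buf + pos);
        uint32_t len = ext_rd32(buf + pos + 4);
        uint64_t padded = ext_padded_len(len);

        if (type == 0) {
            return count;           /* end of the header extension area */
        }
        pos += 8;
        if (padded > size - pos) {
            return -1;              /* data runs past the buffer */
        }
        pos += (size_t)padded;
        count++;
    }
    return -1;
}
```

A backing file format extension holding the 5-byte name "qcow2" occupies 8 + 8 = 16 bytes: 8 for type and length, 5 for the data and 3 for padding.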

Unless stated otherwise, each header extension type shall appear at most once
in the same image.
If the image has a backing file then the backing file name should be stored in
the remaining space between the end of the header extension area and the end of
the first cluster. It is not allowed to store other data here, so that an
implementation can safely modify the header and add extensions without harming
data of compatible features that it doesn't support. Compatible features that
need space for additional data can use a header extension.

Unless stated otherwise, each extension type should appear at most once in an image.

The structures of two extension types, the feature name table and the bitmaps extension, are described below.

Feature name table

The feature name table is an optional header extension that contains the name
for features used by the image. It can be used by applications that don't know
the respective feature (e.g. because the feature was introduced only later) to
display a useful error message.

The number of entries in the feature name table is determined by the length of
the header extension data. Each entry look like this:

Byte       0:   Type of feature (select feature bitmap)
                    0: Incompatible feature
                    1: Compatible feature
                    2: Autoclear feature

           1:   Bit number within the selected feature bitmap (valid
                values: 0-63)

      2 - 47:   Feature name (padded with zeros, but not necessarily null
                terminated if it has full length)
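
Each entry described above is 1 + 1 + 46 = 48 bytes, so the number of entries is the extension data length divided by 48. A sketch of decoding one entry follows; the struct and function names are invented here, and the name is copied into a 47-byte buffer so that it can always be null terminated.

```c
#include <stdint.h>
#include <string.h>

#define FEATURE_ENTRY_SIZE 48   /* 1 + 1 + 46 bytes */

struct feature_name {
    uint8_t type;       /* 0 incompatible, 1 compatible, 2 autoclear */
    uint8_t bit;        /* bit number within that bitmap, 0-63       */
    char    name[47];   /* 46 name bytes plus a terminating NUL      */
};

static size_t feature_table_entries(uint32_t ext_len)
{
    return ext_len / FEATURE_ENTRY_SIZE;
}

static void parse_feature_entry(const uint8_t *e, struct feature_name *out)
{
    out->type = e[0];
    out->bit = e[1];
    memcpy(out->name, e + 2, 46);
    out->name[46] = '\0';   /* on disk a full-length name has no NUL */
}
```

The extra byte in name matters because the on-disk field is zero padded but not necessarily terminated.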

Bitmaps extension

The bitmaps extension is an optional header extension. It provides the ability
to store bitmaps related to a virtual disk. For now, there is only one bitmap
type: the dirty tracking bitmap, which tracks virtual disk changes from some
point in time.

The data of the extension should be considered consistent only if the
corresponding auto-clear feature bit is set, see autoclear_features above.

The fields of the bitmaps extension are:

Byte  0 -  3:  nb_bitmaps
               The number of bitmaps contained in the image. Must be
               greater than or equal to 1.

               Note: Qemu currently only supports up to 65535 bitmaps per
               image.

      4 -  7:  Reserved, must be zero.

      8 - 15:  bitmap_directory_size
               Size of the bitmap directory in bytes. It is the cumulative
               size of all (nb_bitmaps) bitmap headers.

     16 - 23:  bitmap_directory_offset
               Offset into the image file at which the bitmap directory
               starts. Must be aligned to a cluster boundary.
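
The four fields above occupy 24 bytes of extension data. The following sketch decodes them and applies the checks that the text states explicitly (at least one bitmap, reserved field zero, directory offset cluster aligned). The names are hypothetical, and the 65535-bitmap limit is left out because it belongs to QEMU rather than to the format.

```c
#include <stdint.h>

struct bitmaps_ext {
    uint32_t nb_bitmaps;
    uint64_t bitmap_directory_size;
    uint64_t bitmap_directory_offset;
};

static uint32_t bm_rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t bm_rd64(const uint8_t *p)
{
    return ((uint64_t)bm_rd32(p) << 32) | bm_rd32(p + 4);
}

/* Returns 0 on success, -1 if the extension data is malformed. */
static int parse_bitmaps_ext(const uint8_t *d, uint32_t len,
                             uint32_t cluster_bits, struct bitmaps_ext *out)
{
    if (len < 24) {
        return -1;
    }
    out->nb_bitmaps = bm_rd32(d);
    out->bitmap_directory_size = bm_rd64(d + 8);
    out->bitmap_directory_offset = bm_rd64(d + 16);

    if (out->nb_bitmaps < 1 || bm_rd32(d + 4) != 0) {
        return -1;      /* no bitmaps, or reserved field not zero */
    }
    if (out->bitmap_directory_offset & (((uint64_t)1 << cluster_bits) - 1)) {
        return -1;      /* directory not aligned to a cluster boundary */
    }
    return 0;
}
```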

Host cluster management

At this point it seems that there is a distinction between host clusters and guest clusters. I'll take that as given for now and read on.

qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
that it is used, and >= 2 means that it is used and any write access must
perform a COW (copy on write) operation.

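This explains the refcount that has been mentioned repeatedly above. qcow2 keeps a reference count for every host cluster. A refcount of 0 means the cluster is free (unallocated), 1 means it is in use, and >= 2 means it is in use and every write must perform a COW (copy on write) operation.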
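
The three cases map directly onto a small classification function. This is a sketch with invented names, not code taken from QEMU.

```c
#include <stdint.h>

enum cluster_state {
    CLUSTER_FREE,     /* refcount == 0 */
    CLUSTER_IN_USE,   /* refcount == 1: may be written in place */
    CLUSTER_SHARED    /* refcount >= 2: writes must copy first  */
};

static enum cluster_state classify_refcount(uint64_t refcount)
{
    if (refcount == 0) {
        return CLUSTER_FREE;
    }
    return refcount == 1 ? CLUSTER_IN_USE : CLUSTER_SHARED;
}

/* A write to a host cluster needs copy-on-write iff it is shared. */
static int write_needs_cow(uint64_t refcount)
{
    return refcount >= 2;
}
```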

The refcounts are managed in a two-level table. The first level is called
refcount table and has a variable size (which is stored in the header). The
refcount table can cover multiple clusters, however it needs to be contiguous
in the image file.

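Refcounts are managed with a two-level table. The first level is called the refcount table and has a variable size (which is stored in the header). The refcount table itself can span multiple clusters, but it must be stored contiguously in the image file.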

It contains pointers to the second level structures which are called refcount
blocks and are exactly one cluster in size.

The refcount table contains pointers to the second-level structures, which are called refcount blocks. Each refcount block is exactly one cluster in size (in other words, refcount blocks are themselves stored in clusters).

Given a offset into the image file, the refcount of its cluster can be obtained
as follows:

Given an offset into the image file, the reference count of the cluster that contains it can be obtained as follows:

    refcount_block_entries = (cluster_size * 8 / refcount_bits)

    refcount_block_index = (offset / cluster_size) % refcount_block_entries
    refcount_table_index = (offset / cluster_size) / refcount_block_entries

    refcount_block = load_cluster(refcount_table[refcount_table_index]);
    return refcount_block[refcount_block_index];

Note: the variables were all introduced earlier; as a reminder:

cluster_size = 1 << cluster_bits  // at least 512 bytes
refcount_bits = 16 // in version 2

How should this be read?
Take version 2 as an example:

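  1. cluster_size is the number of bytes in a cluster. It is fixed for a given qcow2 file, for example 512 bytes.
  2. refcount_bits is fixed at 16. A refcount block occupies exactly one cluster, so the number of entries that one refcount block can hold is: refcount_block_entries = cluster_size / 2 = 256.
    Each entry of the refcount table points to one refcount block, which is stored in one cluster and holds 256 entries.
    Each refcount block entry is 2 bytes (16 bits) wide and records the reference count of one cluster.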

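So, to find the reference count of the cluster that contains a given offset: first compute offset / cluster_size to get the index of that cluster, then work out which entry of the refcount table covers it, and finally work out which entry of the refcount block behind that table entry holds the reference count.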
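
The index arithmetic of the lookup can be written out in C as below. Only the two indices are computed; loading the refcount block from the file is left out. The struct and function names are invented for this sketch.

```c
#include <stdint.h>

struct refcount_pos {
    uint64_t table_index;   /* which entry of the refcount table   */
    uint64_t block_index;   /* which entry inside that refcount block */
};

static struct refcount_pos refcount_locate(uint64_t offset,
                                           uint32_t cluster_bits,
                                           uint32_t refcount_order)
{
    uint64_t cluster_size = (uint64_t)1 << cluster_bits;
    uint64_t refcount_bits = (uint64_t)1 << refcount_order;
    uint64_t entries = cluster_size * 8 / refcount_bits;
    uint64_t cluster_index = offset / cluster_size;
    struct refcount_pos pos;

    pos.table_index = cluster_index / entries;
    pos.block_index = cluster_index % entries;
    return pos;
}
```

With 512-byte clusters and 16-bit refcounts, one refcount block holds 256 entries, so host cluster number 300 is covered by refcount table entry 1 and is entry 44 inside that block.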

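Because I did not at first distinguish the cluster that contains offset from the cluster that stores the refcount block, understanding the refcount table took me quite some effort. Age really is catching up with my brain.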

Below are the layouts of a refcount table entry and a refcount block entry. Once the explanation above is understood, they are straightforward.

Refcount table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 63:    Bits 9-63 of the offset into the image file at which the
                    refcount block starts. Must be aligned to a cluster
                    boundary.

                    If this is 0, the corresponding refcount block has not yet
                    been allocated. All refcounts managed by this refcount block
                    are 0.
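
Decoding a refcount table entry is a matter of masking off the reserved low 9 bits. The sketch below also checks the alignment requirement; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define REFTABLE_RESERVED_MASK 0x1ffULL                   /* bits 0-8  */
#define REFTABLE_OFFSET_MASK   (~REFTABLE_RESERVED_MASK)  /* bits 9-63 */

/* Returns 1 and stores the refcount block offset on success. An offset
 * of 0 means the block is not allocated and all its refcounts are 0.
 * Returns 0 if reserved bits are set or the offset is not aligned. */
static int reftable_entry_decode(uint64_t entry, uint32_t cluster_bits,
                                 uint64_t *block_offset)
{
    uint64_t offset = entry & REFTABLE_OFFSET_MASK;

    if (entry & REFTABLE_RESERVED_MASK) {
        return 0;
    }
    if (offset & (((uint64_t)1 << cluster_bits) - 1)) {
        return 0;
    }
    *block_offset = offset;
    return 1;
}
```

The 9 reserved bits are always available because the smallest cluster is 512 bytes, so any cluster-aligned offset has its low 9 bits clear.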

Refcount block entry (x = refcount_bits - 1):

    Bit  0 -  x:    Reference count of the cluster. If refcount_bits implies a
                    sub-byte width, note that bit 0 means the least significant
                    bit in this context.
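
Reading one entry out of a loaded refcount block has to handle both sub-byte widths (1, 2 or 4 bits, packed starting from the least significant bit of each byte) and whole-byte widths (stored big-endian). A sketch under those assumptions, with an invented function name:

```c
#include <stdint.h>

/* Returns entry number index from a refcount block held in memory.
 * refcount_order must be in the range 0-6. */
static uint64_t refblock_get(const uint8_t *block, uint64_t index,
                             unsigned refcount_order)
{
    if (refcount_order < 3) {
        /* Sub-byte entries: entry 0 sits in the lowest bits of a byte. */
        unsigned bits = 1u << refcount_order;
        unsigned per_byte = 8 / bits;
        unsigned shift = bits * (unsigned)(index % per_byte);

        return (block[index / per_byte] >> shift) & ((1u << bits) - 1);
    }

    /* Whole-byte entries: 1, 2, 4 or 8 bytes, big-endian. */
    unsigned bytes = 1u << (refcount_order - 3);
    const uint8_t *p = block + index * bytes;
    uint64_t value = 0;

    for (unsigned i = 0; i < bytes; i++) {
        value = (value << 8) | p[i];
    }
    return value;
}
```

For the common 16-bit case (refcount_order = 4), this reduces to reading a big-endian 16-bit integer at byte offset index * 2.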

I'll stop here for now. The important parts on cluster mapping and snapshots still remain; I'll write them up tomorrow if I have the energy.
