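Overview
This article gives a brief introduction to the flag parameters used in dd tests.
The dd man page
First, let's look at which flag parameters dd offers for testing. The dd man page lists them as follows: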
'if=FILE'
Read from FILE instead of standard input.
'of=FILE'
Write to FILE instead of standard output. Unless 'conv=notrunc' is
given, 'dd' truncates FILE to zero bytes (or the size specified
with 'seek=').
'bs=BYTES'
Set both input and output block sizes to BYTES. This makes 'dd'
read and write BYTES per block, overriding any 'ibs' and 'obs'
settings. In addition, if no data-transforming 'conv' option is
specified, input is copied to the output as soon as it's read, even
if it is smaller than the block size.
'count=N'
Copy N 'ibs'-byte blocks from the input file, instead of everything
until the end of the file. If 'iflag=count_bytes' is specified, N
is interpreted as a byte count rather than a block count. Note if
the input may return short reads as could be the case when reading
from a pipe for example, 'iflag=fullblock' will ensure that
'count=' corresponds to complete input blocks rather than the
traditional POSIX specified behavior of counting input read
operations.
'conv=CONVERSION[,CONVERSION]...'
Convert the file as specified by the CONVERSION argument(s). (No
spaces around any comma(s).)
Conversions:
'ascii'
Convert EBCDIC to ASCII, using the conversion table specified
by POSIX. This provides a 1:1 translation for all 256 bytes.
'ebcdic'
Convert ASCII to EBCDIC. This is the inverse of the 'ascii'
conversion.
'ibm'
Convert ASCII to alternate EBCDIC, using the alternate
conversion table specified by POSIX. This is not a 1:1
translation, but reflects common historical practice for '~',
'[', and ']'.
The 'ascii', 'ebcdic', and 'ibm' conversions are mutually
exclusive.
'block'
For each line in the input, output 'cbs' bytes, replacing the
input newline with a space and padding with spaces as
necessary.
'unblock'
Remove any trailing spaces in each 'cbs'-sized input block,
and append a newline.
The 'block' and 'unblock' conversions are mutually exclusive.
'lcase'
Change uppercase letters to lowercase.
'ucase'
Change lowercase letters to uppercase.
The 'lcase' and 'ucase' conversions are mutually exclusive.
'sparse'
Try to seek rather than write NUL output blocks. On a file
system that supports sparse files, this will create sparse
output when extending the output file. Be careful when using
this option in conjunction with 'conv=notrunc' or
'oflag=append'. With 'conv=notrunc', existing data in the
output file corresponding to NUL blocks from the input, will
be untouched. With 'oflag=append' the seeks performed will be
ineffective. Similarly, when the output is a device rather
than a file, NUL input blocks are not copied, and therefore
this option is most useful with virtual or pre zeroed devices.
'swab'
Swap every pair of input bytes. GNU 'dd', unlike others,
works when an odd number of bytes are read--the last byte is
simply copied (since there is nothing to swap it with).
'sync'
Pad every input block to size of 'ibs' with trailing zero
bytes. When used with 'block' or 'unblock', pad with spaces
instead of zero bytes.
The following "conversions" are really file flags and don't affect
internal processing:
'excl'
Fail if the output file already exists; 'dd' must create the
output file itself.
'nocreat'
Do not create the output file; the output file must already
exist.
The 'excl' and 'nocreat' conversions are mutually exclusive.
'notrunc'
Do not truncate the output file.
'noerror'
Continue after read errors.
'fdatasync'
Synchronize output data just before finishing. This forces a
physical write of output data.
'fsync'
Synchronize output data and metadata just before finishing.
This forces a physical write of output data and metadata.
'iflag=FLAG[,FLAG]...'
Access the input file using the flags specified by the FLAG
argument(s). (No spaces around any comma(s).)
'oflag=FLAG[,FLAG]...'
Access the output file using the flags specified by the FLAG
argument(s). (No spaces around any comma(s).)
Here are the flags. Not every flag is supported on every operating
system.
'append'
Write in append mode, so that even if some other process is
writing to this file, every 'dd' write will append to the
current contents of the file. This flag makes sense only for
output. If you combine this flag with the 'of=FILE' operand,
you should also specify 'conv=notrunc' unless you want the
output file to be truncated before being appended to.
'cio'
Use concurrent I/O mode for data. This mode performs direct
I/O and drops the POSIX requirement to serialize all I/O to
the same file. A file cannot be opened in CIO mode and with a
standard open at the same time.
'direct'
Use direct I/O for data, avoiding the buffer cache. Note that
the kernel may impose restrictions on read or write buffer
sizes. For example, with an ext4 destination file system and
a linux-based kernel, using 'oflag=direct' will cause writes
to fail with 'EINVAL' if the output buffer size is not a
multiple of 512.
'directory'
Fail unless the file is a directory. Most operating systems
do not allow I/O to a directory, so this flag has limited
utility.
'dsync'
Use synchronized I/O for data. For the output file, this
forces a physical write of output data on each write. For the
input file, this flag can matter when reading from a remote
file that has been written to synchronously by some other
process. Metadata (e.g., last-access and last-modified time)
is not necessarily synchronized.
'sync'
Use synchronized I/O for both data and metadata.
'nocache'
Discard the data cache for a file. When count=0 all cache is
discarded, otherwise the cache is dropped for the processed
portion of the file. Also when count=0 failure to discard the
cache is diagnosed and reflected in the exit status. Here are
some usage examples:
# Advise to drop cache for whole file
dd if=ifile iflag=nocache count=0
# Ensure drop cache for the whole file
dd of=ofile oflag=nocache conv=notrunc,fdatasync count=0
# Drop cache for part of file
dd if=ifile iflag=nocache skip=10 count=10 of=/dev/null
# Stream data using just the read-ahead cache
dd if=ifile of=ofile iflag=nocache oflag=nocache
'nonblock'
Use non-blocking I/O.
'noatime'
Do not update the file's access time. Some older file systems
silently ignore this flag, so it is a good idea to test it on
your files before relying on it.
'noctty'
Do not assign the file to be a controlling terminal for 'dd'.
This has no effect when the file is not a terminal. On many
hosts (e.g., GNU/Linux hosts), this option has no effect at
all.
'nofollow'
Do not follow symbolic links.
'nolinks'
Fail if the file has multiple hard links.
'binary'
Use binary I/O. This option has an effect only on nonstandard
platforms that distinguish binary from text I/O.
'text'
Use text I/O. Like 'binary', this option has no effect on
standard platforms.
'fullblock'
Accumulate full blocks from input. The 'read' system call may
return early if a full block is not available. When that
happens, continue calling 'read' to fill the remainder of the
block. This flag can be used only with 'iflag'. This flag is
useful with pipes for example as they may return short reads.
In that case, this flag is needed to ensure that a 'count='
argument is interpreted as a block count rather than a count
of read operations.
'count_bytes'
Interpret the 'count=' operand as a byte count, rather than a
block count, which allows specifying a length that is not a
multiple of the I/O block size. This flag can be used only
with 'iflag'.
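The 'fullblock' and 'count_bytes' descriptions above can be tried out with a small sketch. It assumes GNU dd (both flags are GNU extensions) and uses a one-second pause in the writer only to provoke a short read on the pipe.

```shell
#!/bin/sh
# Assumes GNU dd; 'fullblock' and 'count_bytes' are GNU extensions.

# A writer that delivers 6 bytes in two chunks, so the first read() on the pipe is short.
slow_writer() { printf 'abc'; sleep 1; printf 'def'; }

# Without fullblock, count=1 counts one read(), which usually returns only the first chunk.
short=$(slow_writer | dd bs=6 count=1 2>/dev/null)

# With fullblock, dd keeps reading until the 6-byte block is complete.
full=$(slow_writer | dd bs=6 count=1 iflag=fullblock 2>/dev/null)

# With count_bytes, count= is a byte count and need not be a multiple of bs.
bytes=$(dd if=/dev/zero bs=64 count=1000 iflag=count_bytes 2>/dev/null | wc -c)

echo "without fullblock: $short"
echo "with fullblock:    $full"
echo "count_bytes total: $bytes bytes"
```

When the input is a regular file, short reads are rare, so the difference mainly matters for pipes and similar streams.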
We focus here on direct, dsync, and sync. Before introducing them, we first need to understand the Linux I/O stack, shown below:
The Linux I/O stack

The figure above is somewhat complex; it can be simplified to the following figure:

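Linux disk I/O can be divided into the following layers:
Virtual file system layer
File system layer
Cache layer
Generic block layer
I/O scheduler layer
Driver layer
Physical device layer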
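Virtual file system layer
In general, applications do not deal with physical devices directly; they almost always operate on devices through a file system. There are many kinds of file systems, such as the block-device-based ext family and xfs, and network file systems such as nfs, and their interfaces and implementations all differ. This raises a question: must an application special-case every file system? The answer is no, because of the virtual file system. The virtual file system layer sits above the file system layer, hides the differences between file systems, and gives the application layer a single, virtual file system interface. In other words, an application can operate on every file system through one common set of interfaces.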
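File system layer
This layer implements the functionality of each concrete file system on top of the common interface defined by the virtual file system. There are three kinds of file systems:
1. Block-device-based file systems, such as ext2, ext3, ext4, and xfs;
2. Network file systems, such as nfs and cifs;
3. Special file systems, such as /proc and raw device files.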
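Cache layer
Compared with the CPU and memory, disk I/O is slow, so Linux adds a cache layer to speed it up. By default, I/O data is placed in the cache and control returns to the upper layer immediately; the kernel writes the data to the device later, or the upper layer reads the data from the cache. For a write, because the call returns as soon as the data is in the cache, the upper layer considers the I/O finished although the data has not actually reached the disk. If the machine loses power at that moment, the data is lost. An application that must be sure the data has reached the physical device can call a flush interface (such as fsync), which pushes the cached data down to the device. Linux also provides a way to bypass the cache layer: when a file is opened with the direct flag (O_DIRECT), data skips the cache layer and continues down the stack.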
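The buffered-write behavior described above can be observed with a rough sketch like the following. It assumes a Linux host with /proc/meminfo; the scratch file name is hypothetical, and the Dirty figure is system-wide, so other activity on the machine affects it.

```shell
#!/bin/sh
# Hypothetical scratch file in the current directory.
f=./dd_cache_demo.$$

# Print the amount of dirty (not yet written back) page cache, system-wide.
dirty() { awk '/^Dirty:/ {print $2 " " $3}' /proc/meminfo; }

# Buffered write: dd returns once the data is in the page cache.
dd if=/dev/zero of="$f" bs=1M count=32 2>/dev/null
echo "Dirty after buffered write: $(dirty)"

# Ask the kernel to flush dirty pages to the devices.
sync
echo "Dirty after sync:           $(dirty)"

size=$(wc -c < "$f")
rm -f "$f"
```

On an idle machine the first figure is normally larger than the second, which reflects data that dd had already reported as written but that had not yet reached the device.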
The amount of data currently cached can be seen with free; it is the buff/cache column in the figure below:

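Generic block layer
Devices come in many kinds and their interfaces all differ. The generic block layer was added to hide these differences. A file system only needs to talk to the uniform generic layer to communicate with a device and does not need to care how the actual device driver is implemented, which simplifies the file system implementation.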
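I/O scheduler layer
Disk I/O requests arrive in random order, and the disk locations they target are random as well. To reduce disk seeking and raise overall disk throughput, Linux adds an I/O scheduler layer. It uses scheduling algorithms to sort and merge I/O requests more sensibly; the classic example is the elevator algorithm.
Think of disk I/O requests as elevator rides, with requests for floors 3, 2, 6, and 4. Without a scheduling algorithm, the elevator goes from floor 1 to 3, from 3 to 2, from 2 to 6, and then from 6 to 4, wasting elevator capacity. With a scheduling algorithm that sorts the requests sensibly, the elevator stops at floors 2, 3, 4, and 6 in turn, and a single trip from floor 1 to floor 6 serves every request.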
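The elevator analogy above can be worked through numerically. The sketch below is only an illustration of the sorting idea, not the kernel's scheduler; the travel helper is hypothetical and counts how many floors the elevator moves when it starts at floor 1.

```shell
#!/bin/sh
# Hypothetical helper: total floors travelled when serving the given
# requests in order, starting from floor 1.
travel() {
    pos=1
    total=0
    for floor in "$@"; do
        if [ "$floor" -ge "$pos" ]; then
            total=$((total + floor - pos))
        else
            total=$((total + pos - floor))
        fi
        pos=$floor
    done
    echo "$total"
}

# Requests in arrival order: 3, 2, 6, 4.
unsorted=$(travel 3 2 6 4)

# The same requests sorted by floor, as a scheduler would reorder them.
sorted=$(travel $(printf '%s\n' 3 2 6 4 | sort -n))

echo "arrival order: $unsorted floors travelled"
echo "sorted order:  $sorted floors travelled"
```

Serving the requests in arrival order costs 9 floors of travel, while the sorted order costs 5, which is the saving the analogy describes.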

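Driver layer
This layer holds the drivers for the various physical devices and lets the kernel communicate with them. The kernel provides a common driver interface; a device vendor implements a driver against that interface and registers it with the kernel, after which the kernel can talk to the device.
Physical device layer
These are the physical disk devices that provide the actual storage. Traditional mechanical hard disks (HDD) are the slow devices; solid-state drives (SSD), including NVMe drives, are the fast ones. Physical disks also carry their own cache to improve I/O speed. Some disks include capacitors so that, even on power loss, the cached data can still be flushed to the storage media.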
Comparison of common parameters
conv flags
'fdatasync'
Synchronize output data just before finishing. This forces a
physical write of output data.
'fsync'
Synchronize output data and metadata just before finishing.
This forces a physical write of output data and metadata.
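As a rough illustration of the two conv flags quoted above, the following sketch writes the same 8 MiB three times and prints dd's summary line for each run. It assumes GNU dd; the scratch file name is hypothetical, and the absolute numbers depend entirely on the machine and file system.

```shell
#!/bin/sh
# Hypothetical scratch file in the current directory.
f=./dd_conv_demo.$$

# No conv flag: the reported rate mostly reflects copying into the page cache.
dd if=/dev/zero of="$f" bs=1M count=8 2>&1 | tail -n 1

# conv=fdatasync: the data is flushed once, just before dd finishes.
dd if=/dev/zero of="$f" bs=1M count=8 conv=fdatasync 2>&1 | tail -n 1

# conv=fsync: like fdatasync, but the metadata is flushed as well.
dd if=/dev/zero of="$f" bs=1M count=8 conv=fsync 2>&1 | tail -n 1

size=$(wc -c < "$f")
rm -f "$f"
```

Unlike the oflag variants, these flags flush only once at the end of the copy, so they usually cost far less than synchronizing on every write.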
oflag parameters
'direct'
Use direct I/O for data, avoiding the buffer cache. Note that
the kernel may impose restrictions on read or write buffer
sizes. For example, with an ext4 destination file system and
a linux-based kernel, using 'oflag=direct' will cause writes
to fail with 'EINVAL' if the output buffer size is not a
multiple of 512.
'dsync'
Use synchronized I/O for data. For the output file, this
forces a physical write of output data on each write. For the
input file, this flag can matter when reading from a remote
file that has been written to synchronously by some other
process. Metadata (e.g., last-access and last-modified time)
is not necessarily synchronized.
'sync'
Use synchronized I/O for both data and metadata.
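Without oflag
Without oflag, dd opens the output file in the default way, which is buffered I/O. A write returns as soon as the data reaches the cache layer, so this mode is the fastest.
oflag=direct
With this flag the output file is opened for direct I/O. The cache layer is bypassed and a write returns once the data reaches the disk's own cache, so it is slower than the buffered I/O mode above.
oflag=sync
With this flag a write returns only after all the data has reached the disk, so it is slower than the mode above that only writes to the disk cache.
oflag=dsync
With this flag the behavior is the same as sync, except that sync also synchronizes metadata while dsync does not.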
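The four modes compared above can be run side by side with a sketch like this one. It assumes GNU dd; the scratch file name and the 16 MiB size are arbitrary choices for illustration. Run it in a directory on a disk-backed file system, because oflag=direct may be rejected on memory-backed ones, in which case dd's error message is printed instead of a rate.

```shell
#!/bin/sh
# Hypothetical scratch file in the current directory.
out=./dd_oflag_demo.$$
runs=0

for flags in "" "oflag=direct" "oflag=dsync" "oflag=sync"; do
    label=${flags:-buffered}
    echo "== $label =="
    # $flags is left unquoted on purpose so the empty case adds no argument.
    dd if=/dev/zero of="$out" bs=1M count=16 $flags 2>&1 | tail -n 1
    runs=$((runs + 1))
done

rm -f "$out"
```

Only the last line of dd's output (the summary with the transfer rate) is kept, which makes the four results easy to compare.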
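A real-world case
A customer has two identically configured machines running a database workload. Both machines were benchmarked with dd, and the customer initially reported a large performance gap between them, as follows:
Primary server, with the workload running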

The customer also reported that writes to the /tmp file system on the primary server were fast

Standby server, with no workload running.

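Analysis
1. The fast /tmp write used bs=1M together with fsync. That parameter in principle demands a physical write, but the test target was the /tmp file system, which is somewhat special (on many systems /tmp is a memory-backed tmpfs).
For example, the same parameters give different results when run in /tmp and in /home: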

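2. The large gap between the two machines is suspected to be caused mainly by the workload.
3. We therefore recommend testing a file system other than /tmp while no workload is running, and using oflag=direct to exclude the effect of the system cache. The results are as follows: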

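A test along the lines of recommendation 3 could be scripted roughly as follows. This is a sketch under assumptions: it relies on GNU stat and GNU dd, and the BENCH_DIR variable, the scratch file name, and the 64 MiB size are hypothetical choices rather than what was used in the case above.

```shell
#!/bin/sh
# Hypothetical variable: directory to benchmark, defaulting to the current one.
dir=${BENCH_DIR:-.}

# GNU stat: print the name of the file system type that holds $dir.
fstype=$(stat -f -c %T "$dir")

case "$fstype" in
    tmpfs|ramfs)
        # Memory-backed file systems say nothing about disk performance.
        echo "$dir is on $fstype; choose a disk-backed directory instead"
        ;;
    *)
        f=$dir/dd_direct_test.$$
        # oflag=direct bypasses the page cache; bs=1M keeps the buffer size aligned.
        dd if=/dev/zero of="$f" bs=1M count=64 oflag=direct 2>&1 | tail -n 1
        rm -f "$f"
        ;;
esac
```

Checking the file system type first avoids the /tmp pitfall from item 1, where a fast result reflects memory rather than the disk.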