The dd Command Explained

Overview

This article gives a brief introduction to the flag parameters used in dd tests.

The dd man page

First, let's look at which flag parameters are available in dd tests. The dd man page lists the following:

'if=FILE'
     Read from FILE instead of standard input.
'of=FILE'
     Write to FILE instead of standard output.  Unless 'conv=notrunc' is
     given, 'dd' truncates FILE to zero bytes (or the size specified
     with 'seek=').
'bs=BYTES'
     Set both input and output block sizes to BYTES.  This makes 'dd'
     read and write BYTES per block, overriding any 'ibs' and 'obs'
     settings.  In addition, if no data-transforming 'conv' option is
     specified, input is copied to the output as soon as it's read, even
     if it is smaller than the block size.
'count=N'
     Copy N 'ibs'-byte blocks from the input file, instead of everything
     until the end of the file.  If 'iflag=count_bytes' is specified, N
     is interpreted as a byte count rather than a block count.  Note if
     the input may return short reads as could be the case when reading
     from a pipe for example, 'iflag=fullblock' will ensure that
     'count=' corresponds to complete input blocks rather than the
     traditional POSIX specified behavior of counting input read
     operations.
'conv=CONVERSION[,CONVERSION]...'
     Convert the file as specified by the CONVERSION argument(s).  (No
     spaces around any comma(s).)

     Conversions:

     'ascii'
          Convert EBCDIC to ASCII, using the conversion table specified
          by POSIX.  This provides a 1:1 translation for all 256 bytes.

     'ebcdic'
          Convert ASCII to EBCDIC.  This is the inverse of the 'ascii'
          conversion.

     'ibm'
          Convert ASCII to alternate EBCDIC, using the alternate
          conversion table specified by POSIX.  This is not a 1:1
          translation, but reflects common historical practice for '~',
          '[', and ']'.

          The 'ascii', 'ebcdic', and 'ibm' conversions are mutually
          exclusive.

     'block'
          For each line in the input, output 'cbs' bytes, replacing the
          input newline with a space and padding with spaces as
          necessary.

     'unblock'
          Remove any trailing spaces in each 'cbs'-sized input block,
          and append a newline.

          The 'block' and 'unblock' conversions are mutually exclusive.

     'lcase'
          Change uppercase letters to lowercase.

     'ucase'
          Change lowercase letters to uppercase.

          The 'lcase' and 'ucase' conversions are mutually exclusive.

     'sparse'
          Try to seek rather than write NUL output blocks.  On a file
          system that supports sparse files, this will create sparse
          output when extending the output file.  Be careful when using
          this option in conjunction with 'conv=notrunc' or
          'oflag=append'.  With 'conv=notrunc', existing data in the
          output file corresponding to NUL blocks from the input, will
          be untouched.  With 'oflag=append' the seeks performed will be
          ineffective.  Similarly, when the output is a device rather
          than a file, NUL input blocks are not copied, and therefore
          this option is most useful with virtual or pre zeroed devices.

     'swab'
          Swap every pair of input bytes.  GNU 'dd', unlike others,
          works when an odd number of bytes are read--the last byte is
          simply copied (since there is nothing to swap it with).

     'sync'
          Pad every input block to size of 'ibs' with trailing zero
          bytes.  When used with 'block' or 'unblock', pad with spaces
          instead of zero bytes.

     The following "conversions" are really file flags and don't affect
     internal processing:

     'excl'
          Fail if the output file already exists; 'dd' must create the
          output file itself.

     'nocreat'
          Do not create the output file; the output file must already
          exist.

          The 'excl' and 'nocreat' conversions are mutually exclusive.

     'notrunc'
          Do not truncate the output file.

     'noerror'
          Continue after read errors.

     'fdatasync'
          Synchronize output data just before finishing.  This forces a
          physical write of output data.

     'fsync'
          Synchronize output data and metadata just before finishing.
          This forces a physical write of output data and metadata.
'iflag=FLAG[,FLAG]...'
     Access the input file using the flags specified by the FLAG
     argument(s).  (No spaces around any comma(s).)

'oflag=FLAG[,FLAG]...'
     Access the output file using the flags specified by the FLAG
     argument(s).  (No spaces around any comma(s).)

     Here are the flags.  Not every flag is supported on every operating
     system.

     'append'
          Write in append mode, so that even if some other process is
          writing to this file, every 'dd' write will append to the
          current contents of the file.  This flag makes sense only for
          output.  If you combine this flag with the 'of=FILE' operand,
          you should also specify 'conv=notrunc' unless you want the
          output file to be truncated before being appended to.

     'cio'
          Use concurrent I/O mode for data.  This mode performs direct
          I/O and drops the POSIX requirement to serialize all I/O to
          the same file.  A file cannot be opened in CIO mode and with a
          standard open at the same time.

     'direct'
          Use direct I/O for data, avoiding the buffer cache.  Note that
          the kernel may impose restrictions on read or write buffer
          sizes.  For example, with an ext4 destination file system and
          a linux-based kernel, using 'oflag=direct' will cause writes
          to fail with 'EINVAL' if the output buffer size is not a
          multiple of 512.

     'directory'
          Fail unless the file is a directory.  Most operating systems
          do not allow I/O to a directory, so this flag has limited
          utility.

     'dsync'
          Use synchronized I/O for data.  For the output file, this
          forces a physical write of output data on each write.  For the
          input file, this flag can matter when reading from a remote
          file that has been written to synchronously by some other
          process.  Metadata (e.g., last-access and last-modified time)
          is not necessarily synchronized.

     'sync'
          Use synchronized I/O for both data and metadata.

     'nocache'
          Discard the data cache for a file.  When count=0 all cache is
          discarded, otherwise the cache is dropped for the processed
          portion of the file.  Also when count=0 failure to discard the
          cache is diagnosed and reflected in the exit status.  Here are
          some usage examples:

               # Advise to drop cache for whole file
               dd if=ifile iflag=nocache count=0

               # Ensure drop cache for the whole file
               dd of=ofile oflag=nocache conv=notrunc,fdatasync count=0

               # Drop cache for part of file
               dd if=ifile iflag=nocache skip=10 count=10 of=/dev/null

               # Stream data using just the read-ahead cache
               dd if=ifile of=ofile iflag=nocache oflag=nocache

     'nonblock'
          Use non-blocking I/O.

     'noatime'
          Do not update the file's access time.  Some older file systems
          silently ignore this flag, so it is a good idea to test it on
          your files before relying on it.

     'noctty'
          Do not assign the file to be a controlling terminal for 'dd'.
          This has no effect when the file is not a terminal.  On many
          hosts (e.g., GNU/Linux hosts), this option has no effect at
          all.

     'nofollow'
          Do not follow symbolic links.

     'nolinks'
          Fail if the file has multiple hard links.

     'binary'
          Use binary I/O.  This option has an effect only on nonstandard
          platforms that distinguish binary from text I/O.

     'text'
          Use text I/O.  Like 'binary', this option has no effect on
          standard platforms.

     'fullblock'
          Accumulate full blocks from input.  The 'read' system call may
          return early if a full block is not available.  When that
          happens, continue calling 'read' to fill the remainder of the
          block.  This flag can be used only with 'iflag'.  This flag is
          useful with pipes for example as they may return short reads.
          In that case, this flag is needed to ensure that a 'count='
          argument is interpreted as a block count rather than a count
          of read operations.

     'count_bytes'
          Interpret the 'count=' operand as a byte count, rather than a
          block count, which allows specifying a length that is not a
          multiple of the I/O block size.  This flag can be used only
          with 'iflag'.
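
As a quick illustration of a few operands from the excerpt above, here is a minimal sketch. It assumes GNU coreutils dd on Linux; the scratch directory and file names are hypothetical and only serve the example.

```shell
#!/bin/sh
# Hypothetical scratch directory for the example files.
work=$(mktemp -d)

# bs and count: copy exactly 4 blocks of 1024 bytes from /dev/zero.
dd if=/dev/zero of="$work/zeros.bin" bs=1024 count=4 2>/dev/null
size_blocks=$(wc -c < "$work/zeros.bin")

# iflag=count_bytes: count= is a byte count, so it need not be a multiple of bs.
dd if=/dev/zero of="$work/bytes.bin" bs=512 count=1000 iflag=count_bytes 2>/dev/null
size_bytes=$(wc -c < "$work/bytes.bin")

# iflag=fullblock: when reading from a pipe, count= counts full blocks, not reads.
size_pipe=$(head -c 10000 /dev/zero | dd bs=1000 count=5 iflag=fullblock 2>/dev/null | wc -c)

# conv=ucase: change lowercase letters to uppercase while copying.
upper=$(printf 'hello dd\n' | dd conv=ucase 2>/dev/null)

# conv=notrunc: overwrite the start of an existing file without truncating it.
printf 'abcdefgh' > "$work/keep.txt"
printf 'XY' | dd of="$work/keep.txt" conv=notrunc 2>/dev/null
patched=$(cat "$work/keep.txt")

echo "$size_blocks $size_bytes $size_pipe $upper $patched"
rm -rf "$work"
```

Without conv=notrunc, the last dd call would truncate keep.txt and leave only the two new bytes in it.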

We focus on direct, dsync, and sync. Before introducing them, it helps to understand the Linux I/O stack, described below.

The Linux I/O stack

image.png

The diagram above is fairly complex; it can be simplified as follows:


image.png

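Linux disk I/O can be divided into the following layers:

Virtual file system layer

File system layer

Cache layer

Generic block layer

I/O scheduler layer

Driver layer

Physical device layer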

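Virtual file system layer

In general, applications do not talk to physical devices directly; they almost always operate on devices through a file system. There are many kinds of file systems, such as the block-device-based ext family and xfs, and network file systems such as nfs, and each has its own interface and implementation. This raises a question: must applications special-case every file system? They need not, because of the virtual file system (VFS). The VFS layer sits above the file system layer, hides the differences between file systems, and presents the application layer with a single, unified virtual file system interface. In other words, an application can operate on any file system through one set of interfaces.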

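File system layer

Each concrete file system implements its functionality against the unified interface defined by the virtual file system. File systems fall into three categories:

1. Block-device-based file systems, such as ext2/3/4 and xfs;

2. Network file systems, such as nfs and cifs;

3. Special file systems, such as /proc and raw device files.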

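Cache layer

Compared with the CPU and memory, disk I/O is slow, so Linux adds a cache layer to speed it up. By default, I/O data is placed in the cache and control returns to the upper layer immediately; the kernel writes the data to the device later, or the upper layer reads data back out of the cache. For writes, because the call returns as soon as the data is in the cache, the upper layer believes the I/O has finished while the data has not actually reached the disk. If the machine loses power at that moment, the data is lost. An application that needs to be sure the data has reached the physical device can call a flush interface (for example fsync), which writes the cached data out to the device. Linux also provides a way to bypass the cache layer: when a file is opened with the direct flag (O_DIRECT), data skips the cache layer and continues down the stack.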
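
The difference between a buffered write and an explicitly flushed one can be sketched with dd itself. This is a minimal sketch, assuming GNU coreutils on Linux; the scratch directory and file names are hypothetical, and passing a file name to sync assumes coreutils 8.24 or later.

```shell
#!/bin/sh
# Hypothetical scratch directory for the example files.
work=$(mktemp -d)

# Default buffered write: dd can return while the data is still only in the page cache.
dd if=/dev/zero of="$work/buffered.bin" bs=1M count=4 2>/dev/null

# Ask the kernel to flush that file's cached data to the device.
sync "$work/buffered.bin"

# Alternatively, have dd flush data and metadata once, just before it exits.
dd if=/dev/zero of="$work/flushed.bin" bs=1M count=4 conv=fsync 2>/dev/null

buffered_size=$(wc -c < "$work/buffered.bin")
flushed_size=$(wc -c < "$work/flushed.bin")
echo "buffered: $buffered_size bytes, flushed: $flushed_size bytes"
rm -rf "$work"
```

Both files end up with the same contents; what differs is the point at which the data is guaranteed to have left the cache.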

The amount of data currently cached can be seen with free; it is the buff/cache column in the figure below:

image.png
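
The same counters can be read without free. The sketch below assumes a Linux system that exposes /proc/meminfo; the helper name meminfo_kb is hypothetical. The buff/cache column of free roughly corresponds to Buffers plus Cached (recent versions also count reclaimable slab memory), and Dirty is data still waiting to be written back to disk.

```shell
#!/bin/sh
# Print one /proc/meminfo counter in kB (hypothetical helper name).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

buffers_kb=$(meminfo_kb Buffers)
cached_kb=$(meminfo_kb Cached)
dirty_kb=$(meminfo_kb Dirty)

echo "Buffers: ${buffers_kb} kB"
echo "Cached:  ${cached_kb} kB"
echo "Dirty:   ${dirty_kb} kB"
```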

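Generic block layer

Devices come in many kinds with differing interfaces, so the generic block layer was added to hide those differences. A file system only needs to talk to the uniform generic block layer to communicate with a device and does not need to care how the actual device driver is implemented, which simplifies file system implementation.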

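I/O scheduler layer

Disk I/O requests arrive in random order and target random disk locations. To reduce disk seeking and increase overall disk throughput, Linux adds the I/O scheduler layer. It uses scheduling algorithms to sort and merge I/O requests more sensibly; the classic example is the elevator algorithm.

Think of disk I/O requests as elevator rides, with requests for floors 3, 2, 6, and 4. Without a scheduling algorithm, the elevator goes from floor 1 to 3, from 3 to 2, from 2 to 6, and then from 6 to 4, wasting elevator capacity. With a scheduling algorithm that sorts the requests sensibly, the elevator stops at floors 2, 3, 4, and 6 in turn and serves every request in a single trip from floor 1 to floor 6.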

image.png
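
The elevator example can be worked through numerically. The sketch below is only a toy model of the analogy, not how the kernel implements scheduling: sorting stands in for the scheduler, and the function name travel is hypothetical.

```shell
#!/bin/sh
# Total floors travelled when visiting the given floors in order, starting at floor 1.
travel() {
    prev=1
    total=0
    for floor in "$@"; do
        if [ "$floor" -ge "$prev" ]; then
            total=$((total + floor - prev))
        else
            total=$((total + prev - floor))
        fi
        prev=$floor
    done
    echo "$total"
}

# Requested floors in arrival order, taken from the example above.
requests="3 2 6 4"

# Sorting is a toy stand-in for the scheduler; real schedulers also merge
# adjacent requests and limit how long any single request may wait.
sorted=$(echo $(printf '%s\n' $requests | sort -n))

unsorted_cost=$(travel $requests)
sorted_cost=$(travel $sorted)
echo "arrival order $requests: $unsorted_cost floors"
echo "sorted order  $sorted: $sorted_cost floors"
```

In arrival order the elevator travels 9 floors; in sorted order it travels 5.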

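Driver layer

The driver layer for each kind of physical device lets the kernel communicate with the hardware. The kernel provides generic driver interfaces; a device vendor implements a driver against those interfaces and registers it with the kernel, which then allows the kernel and the device to communicate.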

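Physical device layer

Physical disk devices provide the actual storage. Slower devices include traditional mechanical hard disks (HDD); faster ones include solid-state drives (SSD), including NVMe devices. Physical disks also carry their own cache to speed up I/O. Some drives include capacitors so that cached data can still be written to the media after a power loss; drives without such protection can lose the contents of their write cache.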

Comparison of common parameters

conv flags

 'fdatasync'
      Synchronize output data just before finishing.  This forces a
      physical write of output data.

 'fsync'
      Synchronize output data and metadata just before finishing.
      This forces a physical write of output data and metadata.

oflag parameters

 'direct'
      Use direct I/O for data, avoiding the buffer cache.  Note that
      the kernel may impose restrictions on read or write buffer
      sizes.  For example, with an ext4 destination file system and
      a linux-based kernel, using 'oflag=direct' will cause writes
      to fail with 'EINVAL' if the output buffer size is not a
      multiple of 512.

 'dsync'
      Use synchronized I/O for data.  For the output file, this
      forces a physical write of output data on each write.  For the
      input file, this flag can matter when reading from a remote
      file that has been written to synchronously by some other
      process.  Metadata (e.g., last-access and last-modified time)
      is not necessarily synchronized.

 'sync'
      Use synchronized I/O for both data and metadata.

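No oflag
Without oflag, dd opens the output file in the default mode, which is buffered I/O. A write returns as soon as the data reaches the cache layer, so this is the fastest mode.
oflag=direct
The output file is opened with direct I/O, which bypasses the page cache. A write returns once the data has been handed to the disk, where it may still sit in the disk's own cache, so this is slower than the buffered I/O mode above.
oflag=sync
The output file is opened for synchronized I/O, and each write returns only after the data has been written to disk, so this is slower than writing only as far as the disk cache.
oflag=dsync
The same as sync, except that sync also synchronizes metadata while dsync does not necessarily do so.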
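
The modes above can be compared side by side with a small script. This is a sketch under stated assumptions, not a rigorous benchmark: it assumes GNU coreutils dd on Linux, DD_TEST_DIR is a hypothetical environment variable naming a directory on the file system under test (defaulting to the current directory), and the 4 MiB size is kept small so the run is quick.

```shell
#!/bin/sh
# DD_TEST_DIR is a hypothetical variable: a directory on the file system to test.
testfile=$(mktemp "${DD_TEST_DIR:-.}/ddtest.XXXXXX")
modes_run=0

# Run one dd write with the given extra operands and print dd's summary line.
run_mode() {
    label=$1
    shift
    if summary=$(dd if=/dev/zero of="$testfile" bs=1M count=4 "$@" 2>&1); then
        # dd prints its statistics on stderr; the last line has the time and rate.
        echo "$label: $(printf '%s\n' "$summary" | tail -n 1)"
    else
        # For example, oflag=direct fails on file systems without O_DIRECT support.
        echo "$label: failed on this file system"
    fi
    modes_run=$((modes_run + 1))
}

run_mode "no oflag (buffered)"
run_mode "conv=fdatasync" conv=fdatasync
run_mode "oflag=direct" oflag=direct
run_mode "oflag=dsync" oflag=dsync
run_mode "oflag=sync" oflag=sync

rm -f "$testfile"
```

Note that conv=fdatasync flushes once at the end of the copy, while oflag=dsync and oflag=sync flush on every write, so the latter two usually slow down most as the block size shrinks.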

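A real-world case

A customer runs a database workload on two identically configured machines and benchmarked both with dd. The customer initially reported a large gap between the two machines' dd results, as shown below.
Primary server, with the workload running: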


image.png

The customer also reported that writes to the /tmp file system on the primary server were fast:


image.png

Standby server, with no workload running:
image.png

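Analysis

1. The fast /tmp write used bs=1M together with fsync. That option in principle requires a physical write, but the target was the /tmp file system, which is somewhat special (on many systems, for instance, /tmp is a memory-backed tmpfs).
For example, the same parameters give somewhat different results when run under /tmp and under /home: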


image.png
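
Before comparing dd results across directories, it can help to check which file system each directory lives on. The sketch below assumes GNU stat; the helper name fs_type is hypothetical, and whether /tmp is tmpfs on the customer's machines is not stated in the original case.

```shell
#!/bin/sh
# Print the type of the file system that holds a path (GNU stat, file system mode).
fs_type() {
    stat -f -c %T "$1"
}

for dir in /tmp /home; do
    if [ -d "$dir" ]; then
        echo "$dir: $(fs_type "$dir")"
    fi
done

tmp_fs=$(fs_type /tmp)
if [ "$tmp_fs" = "tmpfs" ]; then
    echo "/tmp is memory-backed; dd results there do not measure the disk"
fi
```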

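2. The large gap between the two machines is suspected to be caused mainly by the running workload.
3. The recommendation is therefore to test a file system other than /tmp while no workload is running, and to use oflag=direct to exclude the effect of the system cache. The results are as follows: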


image.png