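Linux pigz: a fast compression command
Purpose: multithreaded file compression, much faster than gzip!
When I write tutorials, I like to link the original sources rather than present only my own interpretation.
- Official site: http://zlib.net/pigz/
- Further reading: https://www.cnblogs.com/kuang17/p/7193124.html
Help documentation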
NAME:
pigz, unpigz - compress or expand files
Pronunciation: pig-zee
SYNOPSIS
pigz [ -cdfhikKlLmMnNqrRtz0..9,11 ] [ -b blocksize ] [ -p threads ] [ -S suffix ] [ name ... ]
unpigz [ -cfhikKlLmMnNqrRtz ] [ -b blocksize ] [ -p threads ] [ -S suffix ] [ name ... ]
Function
- Pigz compresses using threads to make use of multiple processors and cores.
Parameter overview
- -b: change the block size
- The default input block size is 128K, but can be changed with the -b option.
- -p: set the number of threads
- The number of compress threads is set by default to the number of online processors, which can be changed using the -p option. Specifying -p 1 avoids the use of threads entirely.
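- Decompression cannot be parallelized, at least not without deflate streams specially prepared for that purpose. As a result, pigz uses a single thread (the main thread) for decompression, but creates three other threads for reading, writing, and check calculation, which can speed up decompression in some circumstances. Parallel decompression can be turned off by specifying a single process (-dp 1 or -tp 1).
- pigz -d or unpigz restores compressed files to their original form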
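To make the -p and -b options above concrete, here is a minimal sketch. The sample file, the temporary directory, and the values 2 threads / 256 KiB blocks are all made up for illustration, and the gzip fallback is only there so the script still runs on a machine without pigz.

```shell
set -eu

workdir=$(mktemp -d)
# Hypothetical sample input: 200,000 numbered lines.
seq 1 200000 > "$workdir/sample.txt"

if command -v pigz >/dev/null 2>&1; then
  # 2 compression threads, 256 KiB input blocks; -c writes to stdout,
  # so the original file is left in place.
  pigz -p 2 -b 256 -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"
else
  # Fallback (assumption): plain single-threaded gzip.
  gzip -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"
fi

# pigz writes standard gzip format, so gzip -d can read it as well.
if gzip -dc "$workdir/sample.txt.gz" | cmp -s - "$workdir/sample.txt"; then
  roundtrip="identical"
else
  roundtrip="different"
fi
echo "round trip: $roundtrip"
```

On a real data set you would swap in your own file and thread count; leaving out -p lets pigz use every online processor.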
Options in detail
-# --fast --best
Regulate the speed of compression using the specified digit #, where -1 or --fast indicates the fastest compression method (less compression) and -9 or --best indicates the slowest compression method (best compression). -0 is no compression. -11 gives a few percent better compression at a severe cost in execution time, using the zopfli algorithm by Jyrki Alakuijala. The default is -6.
-b --blocksize mmm
Set compression block size to mmmK (default 128KiB).
-c --stdout --to-stdout
Write all processed output to stdout (won’t delete).
# Like gzip's -c: the compressed data is written to stdout, so you can redirect it to a file with a different name and keep the original file.
-d --decompress --uncompress
Decompress the compressed input.
# Like gzip's -d: decompresses the compressed input.
# See the example below; the output can be piped on to further commands.
qmcui 13:20:04 ~
$ cat tmp1.gz|gzip -d
1
2
3
4
5
6
7
8
9
10
-f --force
Force overwrite, compress .gz, links, and to terminal.
-h --help
Display a help screen and quit.
-i --independent
Compress blocks independently for damage recovery.
-k --keep
Do not delete original file after processing.
-K --zip
Compress to PKWare zip (.zip) single entry format.
-l --list
List the contents of the compressed input.
-L --license
Display the pigz license and quit.
-m --no-time
Do not store or restore the modification time. -Nm will store or restore the name, but not the modification time. Note that the order of the options is important.
-M --time
Store or restore the modification time. -nM will store or restore the modification time, but not the name. Note that the order of the options is important.
-n --no-name
Do not store or restore the file name or the modification time. This is the default when decompressing. When the file name is not restored from the header, the name of the compressed file with the suffix stripped is the name of the decompressed file. When the modification time is not restored from the header, the modification time of the compressed file is used (not the current time).
-N --name
Store or restore both the file name and the modification time. This is the default when compressing.
-p --processes n
Allow up to n processes (default is the number of online processors)
# Important: this sets the number of threads.
-q --quiet --silent
Print no messages, even on error.
# The three option names are equivalent.
-r --recursive
Process the contents of all subdirectories.
-R --rsyncable
Input-determined block locations for rsync.
-S --suffix .sss
Use suffix .sss instead of .gz (for compression).
-t --test
Test the integrity of the compressed input.
-v --verbose
Provide more verbose output.
-V --version
Show the version of pigz. -vV also shows the zlib version.
-z --zlib
Compress to zlib (.zz) instead of gzip format.
-- All arguments after "--" are treated as file names (for names that start with "-")
These options are unique to the -11 compression level:
-F --first
Do iterations first, before block split (default is last).
-I, --iterations n
Number of iterations for optimization (default 15).
-J, --maxsplits n
Maximum number of split blocks (default 15).
-O --oneblock
Do not split into smaller blocks (default is block splitting)
Examples
qmcui 13:50:11 ~
$ time pigz -p 4 -c /public/reference/genome/hg38/hg38.fa >~/hg38.fa.pigz.p4.gz 2>hg38.fa.pigz.p4.gz.log
# Timing results
real 2m2.551s
user 8m8.217s
sys 0m2.914s
# Took about 2 min
qmcui 13:52:13 ~
$ time gzip -c /public/reference/genome/hg38/hg38.fa >~/hg38.fa.gz 2>hg38.fa.gz.log
# Timing results
real 4m37.013s
user 4m35.220s
sys 0m1.137s
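# Took about 4.6 min
The first compression (pigz with 4 threads) finished in about 2 min of wall-clock time.
The second compression (gzip) took much longer, about 4.6 min, roughly 2.3 times as long.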
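The comparison can be worked out from the `time` output reported above. This is a small sketch of the arithmetic only; the helper name `to_seconds` is made up, and the three timings are copied from the two runs shown earlier.

```shell
set -eu
export LC_ALL=C   # keep "." as the decimal separator

# Hypothetical helper: convert a `time` value such as 4m37.013s to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN { sub(/s$/, "", t); split(t, part, "m"); printf "%.3f", part[1] * 60 + part[2] }'
}

gzip_real=$(to_seconds 4m37.013s)   # gzip, wall-clock
pigz_real=$(to_seconds 2m2.551s)    # pigz -p 4, wall-clock
pigz_user=$(to_seconds 8m8.217s)    # pigz -p 4, CPU time in user mode

# Wall-clock speedup, and how many cores pigz kept busy on average.
speedup=$(awk -v a="$gzip_real" -v b="$pigz_real" 'BEGIN { printf "%.2f", a / b }')
busy_cores=$(awk -v u="$pigz_user" -v r="$pigz_real" 'BEGIN { printf "%.2f", u / r }')

echo "wall-clock speedup of pigz -p 4 over gzip: ${speedup}x"
echo "average cores kept busy by pigz: $busy_cores"
```

So with 4 threads pigz was about 2.26 times faster in wall-clock terms while keeping close to all 4 cores busy; its total CPU time (user) was higher than gzip's, which is the price of the parallelism.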
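Assessment
- If the file you are compressing is small, say under 3 GB, gzip is fine. If it is very large, for example a huge VCF of 100 GB or more, pigz is the better choice and can save a lot of time.
- I did not test -9, the maximum compression level, but I expect pigz to still be clearly faster than gzip there.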
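The rule of thumb above could be scripted along these lines. This is only a sketch: the function name `choose_compressor` is hypothetical, the 3 GB cutoff is the rough figure from the text rather than a measured break-even point, and the sample file is invented.

```shell
set -eu

# Rough cutoff taken from the rule of thumb above (3 GiB).
threshold_bytes=$((3 * 1024 * 1024 * 1024))

# Hypothetical helper: print "pigz" for large files when pigz is installed,
# otherwise "gzip".
choose_compressor() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -ge "$threshold_bytes" ] && command -v pigz >/dev/null 2>&1; then
    echo pigz
  else
    echo gzip
  fi
}

workdir=$(mktemp -d)
seq 1 1000 > "$workdir/small.txt"   # hypothetical small input

tool=$(choose_compressor "$workdir/small.txt")
echo "small.txt -> $tool"

# -c is understood by both tools, so the same call works either way.
"$tool" -c "$workdir/small.txt" > "$workdir/small.txt.gz"
```

For a tiny file like this one the helper picks gzip; pointing it at a multi-gigabyte file on a machine with pigz installed would select pigz instead.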
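Acknowledgements
Haha, thanks to my bioinformatics buddy wll for telling me how fast this is; I had never noticed that such a nice tool was hiding among cutadapt's subtasks.
- You can also use the xz compression command: tutorial link
- pigz worked examples: https://www.jb51.net/LINUXjishu/194296.html
Conclusions from the link:
1. With its default settings (the default thread count is the number of logical CPUs), pigz can be 5.3 times faster than gzip while consuming 8 times the CPU; the compression ratio is comparable.
2. Going from 4 to 8 threads gave a 41.2% improvement, from 8 to 16 threads 27.9%, and from 16 to 32 threads only 3%.
3. pigz is a very good fit where compression speed matters and a short burst of high CPU usage is acceptable.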
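To check how thread count affects run time on your own machine, a loop like the following could be used. It is a rough sketch: the INPUT variable, the generated sample file, and the thread counts 1, 2, 4, 8 are assumptions, and `date +%s` only has one-second resolution, so a realistically large input is needed to see any difference.

```shell
set -eu

# INPUT is a hypothetical environment variable naming the file to compress;
# without it, a small sample file is generated so the script still runs.
input=${INPUT:-}
if [ -z "$input" ]; then
  workdir=$(mktemp -d)
  input="$workdir/sample.txt"
  seq 1 300000 > "$input"
fi

runs=0
for threads in 1 2 4 8; do
  if command -v pigz >/dev/null 2>&1; then
    start=$(date +%s)
    pigz -p "$threads" -c "$input" > /dev/null
    end=$(date +%s)
    echo "pigz -p $threads: $((end - start)) s"
  else
    echo "pigz -p $threads: skipped (pigz not installed)"
  fi
  runs=$((runs + 1))
done
```

Run it as `INPUT=/path/to/big.file sh script.sh` (path and script name are placeholders); the compressed output is discarded, so only the timing is measured.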
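If you have started improving your pipeline, speeding it up or looking for the best code to analyze your own data instead of blindly repeating someone else's code, congratulations: you have the basic skill of "bioinformatics R&D", which is evaluation.
If you can't follow this, self-study is slow, but plenty of people succeed that way, so keep at it; otherwise, ask the people around you and pick a reliable training course.
On the gzip compression algorithm: https://www.cnblogs.com/kuang17/p/7193124.html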