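I ran minimap2 on a genome larger than 10 Gb with about 100 Gb of reads.
With default parameters
it typically used 20-40 GB of memory;
while writing the output to a file it used 80 GB.
I later realized that, for large genomes, the -I parameter lets you lower memory usage at the cost of a longer run time: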
-I NUM Load at most NUM target bases into RAM for indexing [4G]. If there are more than NUM bases in target.fa,
minimap2 needs to read query.fa multiple times to map it against each batch of target sequences.
NUM may be ending with k/K/m/M/g/G. NB: mapping quality is incorrect given a multi-part index.
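Note: if the genome is larger than the value set with -I, minimap2 builds a multi-part index.
This has two side effects:
(1) Mapping quality becomes inaccurate, so decide whether that trade-off is acceptable for your analysis.
(2) With the -a option (SAM output), the header no longer contains the @SQ lines, such as: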
@SQ SN:C14E LN:145181
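If SAM output is still required, the missing @SQ lines can be rebuilt from the reference FASTA, since each one only needs a sequence name and its length. The following is a minimal sketch, not part of minimap2; the function name `sq_lines` is hypothetical, and it assumes the sequence name is the first whitespace-delimited token of each FASTA header.

```python
def sq_lines(fasta_lines):
    """Yield one SAM @SQ header line per FASTA record.

    Assumption: the sequence name is the first whitespace-delimited
    token after '>' in each header line.
    """
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if name is not None:
                yield f"@SQ\tSN:{name}\tLN:{length}"
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            # Sequence line: count its bases.
            length += len(line)
    if name is not None:
        yield f"@SQ\tSN:{name}\tLN:{length}"
```

For a real reference, pass an open file handle, for example `with open("ref.fasta") as fh: print("\n".join(sq_lines(fh)))`, and prepend the result to the SAM output.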
If you are still using SAM, I suggest switching to PAF: the sequence lengths are included in every PAF record.
PAF: a Pairwise mApping Format
Col  Type    Description
1    string  Query sequence name
2    int     Query sequence length
3    int     Query start (0-based; BED-like; closed)
4    int     Query end (0-based; BED-like; open)
5    char    Relative strand: "+" or "-"
6    string  Target sequence name
7    int     Target sequence length
8    int     Target start on original strand (0-based)
9    int     Target end on original strand (0-based)
10   int     Number of residue matches
11   int     Alignment block length
12   int     Mapping quality (0-255; 255 for missing)
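
The twelve mandatory columns above can be read with a few lines of Python. This is a minimal sketch assuming tab-separated PAF lines; the names `PafRecord` and `parse_paf_line` are hypothetical, and any optional tags after column 12 are ignored.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in file order."""
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_line(line):
    """Parse one tab-separated PAF line; optional tags are dropped."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("a PAF line needs at least 12 columns")
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )
```

The target length that would otherwise come from an @SQ line is available as `tlen` on every record, and `matches / block_len` gives a simple per-alignment identity estimate.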
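The default for -I is 4G, so a genome larger than that is split into several parts that are loaded into memory and aligned one after another.
To reduce memory usage at the cost of alignment time, lower the -I value when building the index: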
minimap2 -I 3G -d ref.mmi ref.fasta
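
To judge how much extra work a smaller -I implies, the number of index parts can be estimated from the genome size. This is a rough sketch with hypothetical helper names (`parse_num`, `estimate_parts`); it assumes the k/m/g suffixes are decimal multipliers (1e3, 1e6, 1e9) and ignores that the real split happens at sequence boundaries, so the true count may differ slightly.

```python
import math

# Assumption: suffixes are decimal multipliers, matched case-insensitively.
_SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}


def parse_num(text):
    """Convert a size string such as '4G' or '500m' to a base count."""
    text = text.strip()
    mult = _SUFFIX.get(text[-1].lower())
    if mult is None:
        return int(float(text) + 0.499)
    return int(float(text[:-1]) * mult + 0.499)


def estimate_parts(genome_bases, batch="4G"):
    """Rough number of index parts for a genome of the given size.

    Each part requires another pass over the query reads, so this is
    also a rough multiplier for how often the reads are re-read.
    """
    return max(1, math.ceil(genome_bases / parse_num(batch)))
```

For example, `estimate_parts(10_000_000_000, "4G")` gives 3 parts for a 10 Gb genome, while `-I 3G` raises the estimate to 4.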