USEARCH fastq_mergepairs command: usage notes copied from the documentation

All of the following information comes from www.drive5.com. I am only using this page as a notebook for my own learning, and I declare no commercial interest in it. Anyone who reads this document should refer to www.drive5.com.

I ran into some problems when I was trying to merge my read data, so I collected the information shown below.


The fastq_mergepairs command merges (assembles) paired-end reads to create consensus sequences and, optionally, consensus quality scores. This command has many features and options so I recommend spending some time browsing the documentation to get familiar with the capabilities of fastq_mergepairs and issues that arise in read merging.


Basic usage

The simplest way to use fastq_mergepairs is to specify the forward and reverse FASTQ filenames and an output FASTQ filename.

usearch -fastq_mergepairs SampleA_R1.fastq -reverse SampleA_R2.fastq -fastqout merged.fq


Automatic R2 filename

If the -reverse option is omitted, the reverse FASTQ filename is constructed by replacing R1 with R2. The following command line is equivalent to the example above.

usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq
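To see which reverse filename the automatic rule would produce for a given forward file, the substitution can be previewed in the shell before running the merge. This is only a sketch: the helper name r2_name is my own, and it assumes that only the first occurrence of R1 in the name is replaced, which the text above does not state explicitly.

```shell
# Preview the reverse-read filename derived from a forward-read filename.
# Assumption: only the first "R1" in the name is replaced with "R2".
r2_name() {
    printf '%s\n' "$1" | sed 's/R1/R2/'
}

r2_name SampleA_R1.fastq   # prints SampleA_R2.fastq
```

If a sample name itself contains R1, the derived name may not be what you expect, and giving -reverse explicitly is the safer choice.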


Merging multiple FASTQ file pairs in a single command

You can specify two or more FASTQ filenames following -fastq_mergepairs. In the following example, SampleA and SampleB are both merged. The R2 filenames are constructed automatically as explained above, or can be given explicitly using the -reverse option.

usearch -fastq_mergepairs SampleA_R1.fastq SampleB_R1.fastq -fastqout merged.fq

usearch -fastq_mergepairs *_R1*.fastq -fastqout merged.fq

(This is what I was using when I had 45 reads.)


Adding sample identifiers to read labels

If multiple samples are combined into a single file as shown in some of the above examples, then you lose track of which read came from which sample. This is addressed by adding a sample identifier to each read label. The simplest method is to use the -sample option, e.g.

usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq -sample SampleA

The string sample=SampleA; will be added at the end of the read label.
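Once the sample=xxx; annotations are in the labels, they can be used to check how many merged reads each sample contributed. The following is a minimal sketch, assuming standard four-line FASTQ records and that the annotation appears somewhere in the header line. The function name count_reads_per_sample is my own and is not part of usearch.

```shell
# Count reads per sample by parsing the sample=xxx; annotation
# in FASTQ header lines (assumes 4-line FASTQ records).
count_reads_per_sample() {
    awk 'NR % 4 == 1 {
             if (match($0, /sample=[^;]*;/)) {
                 # Skip the leading "sample=" (7 chars) and drop the trailing ";".
                 id = substr($0, RSTART + 7, RLENGTH - 8)
                 n[id]++
             }
         }
         END { for (id in n) printf "%s\t%d\n", id, n[id] }' "$1" | sort
}

# Usage: count_reads_per_sample merged.fq
```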


Getting the sample identifier from the FASTQ filename

FASTQ filenames are often based on the sample identifier, e.g. SampleA_R1.fastq. If you specify -relabel @ then fastq_mergepairs gets the sample identifier from the FASTQ filename by truncating at the first underscore (_) or period (.). A period and the read number are added after the sample identifier to make the new read label, which replaces the original label. This differs from the -sample option, which adds the sample= annotation at the end of the label. The usearch_global command understands both of these methods for putting sample identifiers into read labels.

usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq -relabel @
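The truncation rule can be tried out on your own filenames before merging, to confirm that each file yields the sample identifier you expect. This is a rough sketch of the rule as described above, not usearch's actual code. The helper names are hypothetical, and dropping the directory part of the path is my assumption.

```shell
# Sample identifier: the filename truncated at the first "_" or ".".
# Assumption: any directory part of the path is ignored.
sample_from_filename() {
    name=$(basename "$1")
    printf '%s\n' "${name%%[_.]*}"
}

# New read label: sample identifier, a period, then the read number.
new_label() {
    printf '%s.%s\n' "$(sample_from_filename "$1")" "$2"
}

new_label SampleA_R1.fastq 1   # prints SampleA.1
```

Note that a filename such as Sample.A_R1.fastq would be truncated to Sample under this rule, so two different samples could end up with the same identifier.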


Merging multiple files with sample identifiers

By using wildcards and the -relabel @ option you can merge multiple files and add sample identifiers to the read labels, for example:

usearch -fastq_mergepairs *R1*.fastq -fastqout merged.fq -relabel @
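When merging many files with a wildcard, it can help to do a dry run first and confirm that every forward file has a matching reverse file. The sketch below is my own helper (check_pairs is a hypothetical name) and assumes the reverse name is obtained by replacing the first R1 with R2. It does not call usearch at all.

```shell
# Dry run: list each forward file with its derived reverse filename
# and report whether the reverse file exists.
# Assumption: the first "R1" in the name is replaced with "R2".
check_pairs() {
    rc=0
    for r1 in "$@"; do
        r2=$(printf '%s\n' "$r1" | sed 's/R1/R2/')
        if [ -f "$r2" ]; then
            printf 'OK\t%s\t%s\n' "$r1" "$r2"
        else
            printf 'MISSING\t%s\t%s\n' "$r1" "$r2"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_pairs *R1*.fastq
```

The function returns a non-zero status if any reverse file is missing, so it can be used as a guard in front of the real merge command.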


fastq_mergepairs options

Input files

-fastq_mergepairs  Forward FASTQ filename(s).

-reverse  Reverse FASTQ filename(s). If not given, constructed by replacing R1 with R2.

-interleaved  Forward and reverse reads are interleaved in the same file (sometimes produced by SRA fastq-dump).

Output files

-fastqout  FASTQ filename for merged reads.

-fastaout  FASTA filename for merged reads.

-fastqout_notmerged_fwd  FASTQ filename for forward reads which were not merged.

-fastaout_notmerged_fwd  FASTA filename for forward reads which were not merged.

-fastqout_notmerged_rev  FASTQ filename for reverse reads which were not merged.

-fastaout_notmerged_rev  FASTA filename for reverse reads which were not merged.

Reports

-report  Filename for summary report. See "Reviewing a fastq_mergepairs report to check for problems".

-tabbedout  Tabbed text file containing detailed information about the merging process for each pair, including the reason for discarding.

-alnout  Human-readable alignments. Useful for troubleshooting.

Merged read labels

-relabel  Prefix string for output labels. The read number 1, 2, 3... is appended after the prefix.

-relabel @  Relabel using a prefix string constructed from the FASTQ filename; this will be understood as the sample identifier.

-sample xxx  Append sample identifier to read label using sample=xxx; format. This is an alternative method for adding sample ids.

-fastq_eeout  Add ee=xxx; annotation with the number of expected errors in the merged read.

-label_suffix  Suffix to append to merged read label. Can be used e.g. to add sample=xxx; type of sample identifier annotations.
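For reference, the number of expected errors in a read is the sum of the per-base error probabilities, where a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below computes this from a quality string so that an ee= value can be sanity-checked by hand. It assumes Phred+33 encoded qualities, and the function name expected_errors is my own.

```shell
# Expected errors for one quality string: sum over bases of 10^(-Q/10).
# Assumption: qualities are Phred+33 encoded (ASCII code minus 33).
expected_errors() {
    printf '%s\n' "$1" | awk '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            ee = 0
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - 33
                ee += 10 ^ (-q / 10)
            }
            printf "%.4f\n", ee
        }'
}

expected_errors '5555'   # four bases at Q20, prints 0.0400
```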

Filtering

-fastq_maxdiffs  Maximum number of mismatches in the alignment. Default 5. Consider increasing if you have long overlaps.

-fastq_pctid  Minimum %id of alignment. Default 90. Consider decreasing if you have long overlaps.

-fastq_nostagger  Discard staggered pairs. Default is to trim overhangs (non-biological sequence).

-fastq_minmergelen  Minimum length for the merged sequence. See "Filtering artifacts by setting a merge length range".

-fastq_maxmergelen  Maximum length for the merged sequence.

-fastq_minqual  Discard merged read if any merged Q score is less than the given value. (No minimum by default).

-fastq_minovlen  Discard pair if alignment is shorter than given value. Default 16.
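One way to choose values for -fastq_minmergelen and -fastq_maxmergelen is to look at the length distribution of an initial merge made without a length filter. The following is a sketch of my own (merged_length_histogram is a hypothetical helper name), assuming a four-line FASTQ file as input.

```shell
# Print "length<TAB>count" for the sequences in a 4-line FASTQ file,
# sorted by length.
merged_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) printf "%d\t%d\n", len, n[len] }' "$1" | sort -n
}

# Usage: merged_length_histogram merged.fq
```

Lengths far away from the main peak are candidates for exclusion by the merge length range.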

Pre-processing of reads before alignment

-fastq_trunctail  Truncate reads at the first Q score less than or equal to this value. Default 2.

-fastq_minlen  Discard pair if either read is shorter than this, after truncating by -fastq_trunctail if applicable. Default 64.
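To get a feel for how these two options interact, the length a read would have after tail truncation can be estimated from its quality string and then compared with the -fastq_minlen value. This is only an illustration of the rule as worded above. It assumes Phred+33 qualities and that the base at the truncation position is itself dropped, which I have not verified against usearch. The function name truncated_length is hypothetical.

```shell
# Length of a read after truncating at the first Q score <= threshold.
# $1 = quality string, $2 = threshold (defaults to 2).
# Assumptions: Phred+33 encoding; the low-quality base itself is removed.
truncated_length() {
    printf '%s\n' "$1" | awk -v t="${2:-2}" '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            keep = length($0)
            for (i = 1; i <= length($0); i++) {
                if (ord[substr($0, i, 1)] - 33 <= t) { keep = i - 1; break }
            }
            print keep
        }'
}

truncated_length 'IIII#II'   # "#" is Q2, so this prints 4
```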

Multi-threading

-threads  Specifies the number of threads. Default 10, or the number of CPU cores, whichever is less.
