Using the tabbedout file to investigate merging problems

Source: https://www.drive5.com/usearch/manual/merge_tabbed_check.html


If the merge report shows that many reads are failing to merge for a given reason, then you can use the tabbedout file to investigate further. For example, suppose the report says that 70% of the pairs were discarded because of "too many diffs", i.e. mismatches in the alignments.

The simplest way to investigate is to use the -fastqout_notmerged_fwd and -fastqout_notmerged_rev options to get the pairs that did not merge, then (if needed) use fastx_subsample to get a small subset for manual investigation. See trouble-shooting merging for details.

If reads are failing to merge for two or more different reasons, then you can use the tabbedout file to get the subset of reads that is failing for one of those reasons, which may be convenient for further analysis in challenging cases.

The format of the tabbedout file is not documented in detail (and is subject to change in future usearch builds), but is fairly self-explanatory. Each read pair is one line in the file. The read label is the first field (truncated at the first space). Subsequent fields are separated by tabs. Each field reports the results of one step in the merging process, for example:

M00967:15:000000000-A2G1J:1:1101:18083:3926	aln=123-128-121	diffs=15	toomanydiffs	result=notmerged

This shows that the pair failed to merge because there were too many (15) mismatches in the alignment. To get the read labels for all the reads that failed to merge for this reason, you can do this:

grep toomanydiffs tabbedout.txt | cut -f1 > toomanydiffs.labels
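The grep above matches the string anywhere on the line. When reads fail for several reasons, it can help to tally the reasons first and then match the chosen keyword as a whole field. The following is a minimal sketch, not part of the usearch documentation. It runs on a made-up sample file, and it assumes that failure reasons appear as bare keyword fields (fields without "=") and that merged pairs carry a field like result=merged. Both are assumptions about a format that is not documented in detail, so check them against your own tabbedout file.

```shell
# Hypothetical sample standing in for a real tabbedout.txt (tab-separated).
# Only the layout of the first line follows the example in the text; the
# labels, the "result=merged" field and the reason name "otherreason" are
# invented for illustration.
printf 'M1:1101:1\taln=123-128-121\tdiffs=15\ttoomanydiffs\tresult=notmerged\n' >  sample_tabbed.txt
printf 'M1:1101:2\taln=120-130-119\tdiffs=1\tresult=merged\n'                  >> sample_tabbed.txt
printf 'M1:1101:3\taln=90-100-95\tdiffs=22\ttoomanydiffs\tresult=notmerged\n'  >> sample_tabbed.txt
printf 'M1:1101:4\totherreason\tresult=notmerged\n'                            >> sample_tabbed.txt

# Tally the bare keyword fields (no "=") on lines reporting result=notmerged.
# Output: count, tab, keyword, sorted with the most frequent keyword first.
awk -F'\t' '/result=notmerged/ {
    for (i = 2; i <= NF; i++) if ($i !~ /=/) count[$i]++
}
END { for (r in count) print count[r] "\t" r }' sample_tabbed.txt |
    sort -k1,1nr > reason_counts.txt

# Print the label (field 1) of every line where some whole field is exactly
# "toomanydiffs", rather than matching the string anywhere on the line.
awk -F'\t' '{
    for (i = 2; i <= NF; i++) if ($i == "toomanydiffs") { print $1; break }
}' sample_tabbed.txt > sample_toomanydiffs.labels

cat reason_counts.txt
cat sample_toomanydiffs.labels
```

On a real run, replace sample_tabbed.txt with your own tabbedout file and drop the printf lines.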

Then, to get the reads:

usearch -fastx_getseqs myreads_R1.fastq -labels toomanydiffs.labels -trunclabels -fastqout fwd.fq

usearch -fastx_getseqs myreads_R2.fastq -labels toomanydiffs.labels -trunclabels -fastqout rev.fq

The -trunclabels option is needed with typical Illumina reads because otherwise the labels will fail to match due to the suffixes 1:N:0... and 2:N:0... which are added to the labels for the R1 and R2 reads, respectively.
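To see why the full labels do not match, the truncation can be reproduced by hand. This is only an illustration in plain shell of cutting a label at the first space; it does not show how usearch implements -trunclabels. The header strings are hypothetical, and the suffixes are left elided exactly as in the text.

```shell
# Hypothetical FASTQ header lines for one read pair. The part after the
# space is the Illumina suffix, kept elided ("...") as in the text.
r1='@M00967:15:000000000-A2G1J:1:1101:18083:3926 1:N:0...'
r2='@M00967:15:000000000-A2G1J:1:1101:18083:3926 2:N:0...'

# Full labels: the header line without the leading "@".
full1=${r1#@}
full2=${r2#@}

# Truncated labels: everything before the first space. This is the form
# stored in the first field of the tabbedout file.
trunc1=${full1%% *}
trunc2=${full2%% *}

[ "$full1" != "$full2" ] && echo "full labels differ"
[ "$trunc1" = "$trunc2" ] && echo "truncated labels match: $trunc1"
```

Because the labels file holds the truncated form, a comparison against the full R1 or R2 label would never succeed.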

Now you have a test set of read pairs which you can use to investigate further.
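Before working with the test set, it may be worth checking that the forward and reverse files list the same pairs in the same order. The sketch below is one way to do that with standard tools. It assumes 4-line FASTQ records and uses small made-up files in place of fwd.fq and rev.fq. The helper name labels and the intermediate file names are hypothetical.

```shell
# Hypothetical two-pair files standing in for fwd.fq and rev.fq.
printf '@P1 1:N:0\nACGT\n+\nIIII\n@P2 1:N:0\nGGCC\n+\nIIII\n' > sample_fwd.fq
printf '@P1 2:N:0\nTTAA\n+\nIIII\n@P2 2:N:0\nCCGG\n+\nIIII\n' > sample_rev.fq

# Print one label per record: header lines are every 4th line starting at
# line 1 (assumes 4-line FASTQ records). Keep the text up to the first
# whitespace and drop the leading "@".
labels() { awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$1"; }

labels sample_fwd.fq > fwd.labels.txt
labels sample_rev.fq > rev.labels.txt

# cmp -s is silent and succeeds only if the two label lists are identical.
if cmp -s fwd.labels.txt rev.labels.txt; then
    echo "in sync: $(wc -l < fwd.labels.txt | tr -d ' ') pairs"
else
    echo "label mismatch between forward and reverse files"
fi
```

If the check reports a mismatch, running diff on the two label lists shows where the files diverge.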
