日韩伊人网,人妻第三区

引言

本系列將開展全新的CUT&Tag 數(shù)據(jù)處理和分析專欄。想要獲取更多教程內(nèi)容或者生信分析服務(wù)可以添加文末的學(xué)習(xí)交流群或客服QQ:941844452。

重復(fù)去除

CUT&Tag 技術(shù)會將接頭序列插入到抗體連接的 pA-Tn5 附近的 DNA 中，而插入的具體位置會受到周圍 DNA 可及性的影響。因此，那些起始和結(jié)束位置完全相同的片段是比較常見的，但這些所謂的“重復(fù)項”可能并不是由于 PCR 過程中的復(fù)制產(chǎn)生的。實際上，發(fā)現(xiàn)高質(zhì)量的 CUT&Tag 數(shù)據(jù)集的表觀重復(fù)率通常很低，即使是看起來像是“重復(fù)”的片段，也可能是真實的片段。因此，不建議刪除這些重復(fù)項。不過，在實驗樣本量極少，或者懷疑存在 PCR 擴增重復(fù)的情況下，可以考慮刪除重復(fù)項。以下命令展示了如何使用 Picard 來檢查重復(fù)率。

##== linux command ==##
## depending on how you load picard and your server environment, the picardCMD can be different. Adjust accordingly.
picardCMD="java -jar picard.jar"
mkdir -p $projPath/alignment/removeDuplicate/picard_summary

## Sort by coordinate
$picardCMD SortSam I=$projPath/alignment/sam/${histName}_bowtie2.sam O=$projPath/alignment/sam/${histName}_bowtie2.sorted.sam SORT_ORDER=coordinate

## mark duplicates
$picardCMD MarkDuplicates I=$projPath/alignment/sam/${histName}_bowtie2.sorted.sam O=$projPath/alignment/removeDuplicate/${histName}_bowtie2.sorted.dupMarked.sam METRICS_FILE=$projPath/alignment/removeDuplicate/picard_summary/${histName}_picard.dupMark.txt

## remove duplicates
picardCMD MarkDuplicates I=$projPath/alignment/sam/${histName}_bowtie2.sorted.sam O=$projPath/alignment/removeDuplicate/${histName}_bowtie2.sorted.rmDup.sam REMOVE_DUPLICATES=true METRICS_FILE=$projPath/alignment/removeDuplicate/picard_summary/${histName}_picard.rmDup.txt

總結(jié)了明顯的重復(fù)率，并計算出唯一的庫大小而無需重復(fù)。

##=== R command ===## 
## Summarize the duplication information from the picard summary outputs.
dupResult = c()
for(hist in sampleList){
  dupRes = read.table(paste0(projPath, "/alignment/removeDuplicate/picard_summary/", hist, "_picard.rmDup.txt"), header = TRUE, fill = TRUE)
  
  histInfo = strsplit(hist, "_")[[1]]
  dupResult = data.frame(Histone = histInfo[1], Replicate = histInfo[2], MappedFragNum_hg38 = dupRes$READ_PAIRS_EXAMINED[1] %>% as.character %>% as.numeric, DuplicationRate = dupRes$PERCENT_DUPLICATION[1] %>% as.character %>% as.numeric * 100, EstimatedLibrarySize = dupRes$ESTIMATED_LIBRARY_SIZE[1] %>% as.character %>% as.numeric) %>% mutate(UniqueFragNum = MappedFragNum_hg38 * (1-DuplicationRate/100))  %>% rbind(dupResult, .)
}
dupResult$Histone = factor(dupResult$Histone, levels = histList)
alignDupSummary = left_join(alignSummary, dupResult, by = c("Histone", "Replicate", "MappedFragNum_hg38")) %>% mutate(DuplicationRate = paste0(DuplicationRate, "%"))
alignDupSummary

在這些示例數(shù)據(jù)集中，IgG 對照樣本的重復(fù)率比較高。這是因為這些樣本中的數(shù)據(jù)來源于 CUT&Tag 反應(yīng)中的非特異性片段化。因此，在進行下游分析之前，從 IgG 數(shù)據(jù)集中去除重復(fù)項是比較合理的。
估計的文庫大小是根據(jù) Picard 計算的 PE 重復(fù)率來估算文庫中獨特分子數(shù)量的。
估計的文庫大小與目標表位的豐度以及抗體的質(zhì)量成正比，而 IgG 樣本的估計文庫大小通常會很低。
獨特片段數(shù)是通過公式 MappedFragNum_hg38 * (1 - DuplicationRate/100) 計算得出的。

##=== R command ===## 
## generate boxplot figure for the  duplication rate
fig4A = dupResult %>% ggplot(aes(x = Histone, y = DuplicationRate, fill = Histone)) +
    geom_boxplot() +
    geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +
    scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +
    scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +
    theme_bw(base_size = 18) +
    ylab("Duplication Rate (*100%)") +
    xlab("") 

fig4B = dupResult %>% ggplot(aes(x = Histone, y = EstimatedLibrarySize, fill = Histone)) +
    geom_boxplot() +
    geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +
    scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +
    scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +
    theme_bw(base_size = 18) +
    ylab("Estimated Library Size") +
    xlab("") 

fig4C = dupResult %>% ggplot(aes(x = Histone, y = UniqueFragNum, fill = Histone)) +
    geom_boxplot() +
    geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +
    scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +
    scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +
    theme_bw(base_size = 18) +
    ylab("# of Unique Fragments") +
    xlab("")

ggarrange(fig4A, fig4B, fig4C, ncol = 3, common.legend = TRUE, legend="bottom")

本文由mdnice多平臺發(fā)布

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

CUT&Tag 數(shù)據(jù)處理和分析教程（5）

CUT&Tag 數(shù)據(jù)處理和分析教程（5）

引言

重復(fù)去除

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

CUT&Tag 數(shù)據(jù)處理和分析教程（5）

引言

重復(fù)去除

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av