
Corresponding author: Li Ding
Director of Computational Biology, Oncology
Washington University School of Medicine, St. Louis, MO

Sample procurement, sequencing and analysis roadmap.

1. Sequencing strategies

Shift from WES to WGS: WGS data are therefore considered to be the unbiased 'gold standard'.

1.1 Traditional sequencing analyses
In practice, detection of all germline and somatic aberrations is a formidable challenge owing to limitations in current analysis algorithms, as well as to the quantity and quality of sequence data.
1.2 Subclonal analyses
Cancer progression has long been known to be a fundamentally clonal process, and sequence coverage is now becoming sufficiently large to permit detection of the low-prevalence events that are routinely associated with tumour subclones. Multisite and/or multistage sequencing and tumour sectioning experiments have begun to identify founding clones and subclones that contribute to cancer progression.
1.3 Single-cell sequencing
Pioneering work on assessing CNAs in multiple tumour subpopulations was followed by single-cell sequencing using whole-genome amplification (WGA) of DNA extracted from nuclei that were sorted by flow cytometry.
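Some challenges remain, such as amplification bias in degenerate oligonucleotide-primed WGA and in multiple displacement amplification techniques. (In degenerate oligonucleotide-primed WGA, the primers carry a 6 bp random sequence at the 3' end that can anneal randomly to genomic DNA, so the whole genome is amplified. Multiple displacement amplification uses random primers and isothermal amplification to obtain large, high-fidelity DNA fragments, but its main drawbacks are uneven genome coverage, amplification bias, chimeric sequences and nonspecific amplification.) These biases lead to non-uniform coverage and therefore make it difficult to determine somatic changes, including SNVs, CNAs and structural aberrations. Detection sensitivity is most affected by allelic dropout, which is caused by preferential amplification of one of the two alleles; reported allelic dropout rates range from 8% to 40%. Large CNAs can still be detected at low genome coverage (for example, 5-6%), whereas unequal coverage makes the analysis of smaller CNAs and structural variants extremely difficult.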

2. Dissecting genomic changes in cancer

The table below lists computational tools for annotating and interpreting mutations in cancer genomes.

Program	Function	Synopsis	Refs
*SNV and indel detection*
Bassovac	SNV and indel detection	Bayesian approach with tumour or normal impurity and clonality	–
GATK	SNV and indel detection	Analysis framework using MapReduce	23
JointSNVMix	SNV detection	Binomial/multinomial probability with pre-filtering	31
MuTect	SNV and indel detection	Bayesian probability with pre- and post-filtering	28
Pindel	Indel detection	Pattern growth learning method	38
SNVMix	SNV detection	Binomial mixture model	30
SomaticSniper	SNV and indel detection	Bayesian probability with posterior filtering	27
Strelka	SNV and indel detection	Bayesian probability with posterior filtering	29
VarScan	SNV and indel detection	Fisher exact test, filtering and FDR correction	24,25
*Copy-number aberration, structural variant and gene fusion detection*
BreakDancer	Structural variant and indel detection	Kolmogorov–Smirnov test on discordant reads	54
BreakFusion	Gene fusion detection	Alignment-based pipeline for transcriptomic data	68
BreakTrans	Gene fusion mapping	Integration of fusion discovery and breakpoint tools	73
ChimeraScan	Chimeric transcription detection	Discordant read pairs with posterior filtering	67
CREST	Structural variant detection	Heuristics and binomial test on soft-clipped reads	55
deFuse	Gene fusion detection	Dynamic programming split and discordant reads	65
DELLY	Structural variant detection	Integrated method of discordant and split reads	40
GASV-Pro	Structural variant detection	Plane sweep for segment intersection	57
Genome STRiP	Structural variant detection	Depth and split or discordant reads on populations	59
Hydra	Structural variant detection	Discordant reads with assembly validation	139
LUMPY	Structural variant detection	Integrated method of discordant and split reads	167
TIGRA	Structural variant detection	De Bruijn graph-based assembly	42
*Level I annotation and interpretation*
ABSOLUTE	Purity, ploidy and clonality prediction	Optimization of logarithmic scores	148
ANNOVAR	Functional prediction	Annotation-based prediction	74
ASCAT	Purity, ploidy and clonality prediction	Goodness-of-fit ranking of candidate solutions	168
TUSON Explorer	Gene classification	Oncogene or tumour suppressor discovery using mutational signatures	100
CHASM	Functional prediction	Random forest classifier	84,85
MutationAssessor	Functional prediction	Conservation-based prediction (entropy score)	83
PolyPhen2	Functional prediction	Probability model based on structure and alignment	81,169
SciClone	Tumour clonality prediction	Bayesian mixture model	–
SIFT	Functional prediction	Conservation-based prediction	82
SNPeff	Functional prediction	Annotation and coding effect prediction	75
THetA	Purity, ploidy and clonality prediction	Maximum likelihood of mixture composition	151
VEP	Functional prediction	Annotation-based prediction	170
*Level II annotation and interpretation*
Dendrix	Mutation analysis	De novo discovery of mutually exclusive mutations	128
HotNet	Network analysis	Diffusion model for significant networks	119
MEMo	Network analysis	Network modules with mutual exclusivity	122
MuSiC	Mutation analysis	Framework for significance analysis of mutations	92
Multi-Dendrix	Mutation analysis	De novo discovery of multiple sets of exclusive mutations	129
MutSigCV	Mutation analysis	Gene significance with variable background mutation rate	93
NBS	Network analysis	Clustering using non-negative matrix factorization	121
Oncodrive-CIS and OncodriveCLUST	Mutation analysis	Z-statistics for copy numbers of driver genes	171,172
PARADIGM	Gene expression analysis	Network analysis of gene expression	126
PathScan	Pathway analysis	Probability model for mutation-enriched pathways	109
TieDIE	Network analysis	Network diffusion model linking mutations to gene expression	125

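Empirically, candidate events called by multiple independent algorithms are less likely to be false positives than events called by any single algorithm. Multicaller strategies are therefore becoming more common, although requiring agreement between callers also affects sensitivity. The number of possible tool combinations is very large, however, which makes such strategies hard to implement.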
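The multicaller idea above can be sketched as a simple voting scheme. This is only an illustration, not the procedure used in the paper: the caller names, the variant key `(chrom, pos, ref, alt)` and the `min_callers` parameter are all assumptions made for the example.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers=2):
    """Return the variants reported by at least `min_callers` callers.

    calls_by_caller maps a caller name to an iterable of variant keys.
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # set() ensures each caller votes at most once per variant
        support.update(set(variants))
    return {variant for variant, n in support.items() if n >= min_callers}


# Hypothetical calls, keyed as (chrom, pos, ref, alt)
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}
high_confidence = consensus_calls(calls, min_callers=2)
```

Raising `min_callers` trades sensitivity for specificity, which is the tension noted above: the variant seen only by `caller_b` is dropped even if it is real.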

2.1 SNV detection
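SNV detection algorithms: GATK, VarScan, SAMtools, SomaticSniper, MuTect, Strelka, JointSNVMix and SNVMix. The first three can handle both germline and somatic variants; the others call somatic mutations using tumour and matched normal genomic sequences.
Although heterozygous VAFs (variant allele fractions) in germline samples are expected to be 50%, this figure does not apply to somatic mutations in tumours, mainly because of normal tissue contamination and/or tumour heterogeneity. Algorithm development currently focuses on handling somatic mutations across a wide range of VAFs. One example is Bassovac, which accounts for bidirectional impurity and for tumour subclonal structure (that is, heterogeneity) when calling variants.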
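To make the departure from 50% concrete, here is a minimal sketch of the expected VAF of a somatic mutation under a simple mixing model. The model and all parameter names are assumptions for illustration (it is not Bassovac's model): a fraction `purity` of cells are tumour cells, a fraction `ccf` of those carry the mutation on `mutated_copies` copies, and the locus has `tumour_cn` and `normal_cn` total copies in tumour and normal cells.

```python
def expected_vaf(purity, ccf=1.0, mutated_copies=1, tumour_cn=2, normal_cn=2):
    """Expected variant allele fraction under a simple tumour/normal mixture."""
    # Mutant alleles contributed per cell, averaged over the whole sample
    mutant = purity * ccf * mutated_copies
    # All alleles at the locus per cell, averaged over the whole sample
    total = purity * tumour_cn + (1.0 - purity) * normal_cn
    return mutant / total


pure_clonal = expected_vaf(purity=1.0)            # heterozygous, no contamination
contaminated = expected_vaf(purity=0.6)           # 40% normal cells
subclonal = expected_vaf(purity=0.6, ccf=0.5)     # present in half of the tumour cells
```

With these inputs the clonal heterozygous mutation sits at 0.5 only in a pure sample; 40% normal contamination lowers it to about 0.3, and a subclone in half of the tumour cells to about 0.15. The sketch ignores tumour cells contaminating the normal sample, which is the other direction of impurity mentioned above.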
2.2 Indel detection
Indel detection is still challenging, mainly owing both to their lower frequencies than those of SNVs and to mapping difficulties.
By default, most tools allow two mismatches and no gaps in 'seeded' regions (that is, in the first 28 bp in a read), so reads that contain indels fail to align properly. Paired-end mapping helps to find large indels that are flanked by the read ends; gapped alignment, split-read analysis and de novo assembly are currently the common approaches for detecting indels. VarScan and GATK Unified Genotyper are based on heuristics for indel calling using raw statistics such as coverage, number of indel-supporting reads, read mapping qualities and mismatch counts.
Many existing tools perform well on short indels (< 5-8 bp) but lack high true-positive rates. In addition, they usually fail to detect medium-sized indels, including some known 'druggable' and/or prognostic events. Finally, detection in low-complexity regions (such as homopolymers) is especially challenging. SAMtools and Dindel can call short indels. Pindel and DELLY detect indel breakpoints, with Pindel using a pattern growth approach borrowed from protein data analysis; Pindel has relatively high precision. Burrows-Wheeler Aligner (BWA)-MEM allows better discovery of long indels and SVs, and local de novo assembly or multiple alignment can reduce the number of false-positive indels.
2.3 CNA and structural variant detection
Accurate inference of copy number from sequence data requires normalization procedures that consider certain biases inherent to short-read sequencing methods (such as GC content and library biases). Approaches have been implemented for both GC-based coverage normalization and mapping bias.
Finding recurrent CNAs: Genomic identification of significant targets in cancer (GISTIC) and correlation matrix diagonal segmentation (CMDS) have been developed for the identification of recurrent CNAs.
Detecting multiple types of structural change (deletions, tandem or inverted duplications, inversions, insertions and translocations): BreakDancer, CREST (clipping reveals structure), VariationHunter, geometric analysis of structural variants (GASV)-Pro and Genome STRucture In Populations (Genome STRiP).
2.4 Gene fusion detection
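Gene fusion discovery from RNA-seq: TopHat-fusion, deFuse, MapSplice, ChimeraScan and BreakFusion.
Gene fusions can arise from simple translocations that involve only two distal loci or from complex rearrangements that involve multiple distal loci. Comrad and nFuse both align raw WGS and RNA-seq reads and validate the fusion and the genomic breakpoints at the same time.
Comrad and nFuse can account for ambiguous read alignments and thus minimise errors caused by misalignment.
The paper's authors recently developed BreakTrans, which jointly analyses WGS and RNA-seq data to test hypotheses produced by other tools (such as TopHat-fusion, MapSplice, BreakDancer and CREST) and to further characterise the mechanistic components of gene fusions.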

3. Driver mutations and pathways

3.1 Annotations and functional predictions
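Genes and transcripts: RefSeq, Ensembl and GENCODE
Regulatory elements: ENCODE, TransFac and RegulomeDB
Non-coding RNA: NONCODE, BodyMap and miRBase
Protein annotation: Pfam and Interpro
Integrated annotation: ANNOVAR and SNPeff annotate transcript variants; SKIPPY predicts cryptic splicing effectors; VEP, FunSeq and SNPnexus extend support to the annotation of non-coding elements and regulatory features; VAAST (Variant Annotation, Analysis and Search Tool) and GEMINI (genome mining) allow comprehensive analysis and integration of coding variants, non-coding variants, regulatory elements and phenotypes
Deleteriousness: PolyPhen, SIFT, MutationAssessor and Condel
Protein post-translational modification: ActiveDriver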
3.2 Significantly mutated genes
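One way to detect driver mutations is to distinguish them from the background mutation rate (BMR). The BMR is difficult to measure because many factors affect it, including differences in gene length, expression level and replication timing, variation among samples and errors in upstream analyses. The BMR varies not only among patients with the same cancer type but also among cancer types, possibly in relation to environmental and viral factors. Finally, incorrect or biased annotation of mutations can lead to false positives. Insufficient sequence coverage of genes exacerbates these problems. MuSiC and MutSig can address these issues.
Another way to distinguish driver mutations from passenger mutations is to check whether mutations cluster at particular residues in the protein sequence. The '20/20 rule' proposes that if at least 20% of a gene's missense mutations (or identical in-frame indels) fall on a specific residue, the gene should be classified as an oncogene. Conversely, if at least 20% of the mutations are inactivating (that is, nonsense, frameshift, splice-site or stop-codon read-through mutations), the gene can be classified as a tumour suppressor. This approach is now complemented by algorithms that use more rigorous statistical scores to assess patterns of mutation signal, as well as the clustering of mutations in the protein sequence or in the three-dimensional protein structure.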
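The '20/20 rule' as summarised above can be sketched as a small classifier. This is one reading of the rule, not a reference implementation: the denominator (all recorded mutations in the gene), the mutation type labels and the function name are assumptions made for the example, and published formulations differ in detail.

```python
from collections import Counter

# Hypothetical labels for inactivating mutation types
INACTIVATING = {"nonsense", "frameshift", "splice_site", "stop_loss"}


def classify_gene(mutations, threshold=0.20):
    """Classify one gene from a list of (residue_position, mutation_type) tuples."""
    total = len(mutations)
    if total == 0:
        return "unclassified"
    # Missense mutations counted per residue; the most-hit residue is the candidate hotspot
    missense = Counter(pos for pos, kind in mutations if kind == "missense")
    hotspot = max(missense.values(), default=0)
    inactivating = sum(1 for _, kind in mutations if kind in INACTIVATING)
    if hotspot / total >= threshold:
        return "oncogene-like"
    if inactivating / total >= threshold:
        return "tumour-suppressor-like"
    return "unclassified"


hotspot_gene = [(600, "missense")] * 8 + [(10, "missense"), (50, "nonsense")]
truncated_gene = (
    [(i, "nonsense") for i in range(3)]
    + [(i, "frameshift") for i in range(10, 13)]
    + [(i, "missense") for i in range(100, 104)]
)
scattered_gene = [(i, "missense") for i in range(10)]
```

Genes with very few mutations will cross the 20% threshold by chance, so a real analysis would likely also require a minimum mutation count; that refinement is left out here.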
3.3 Pathway and network analyses
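Pathway and network analyses: 1. analysing known pathways, which are represented as gene sets; 2. analysing interaction networks to implicitly build pathways de novo.
Approach 1: a direct way to evaluate combinations of mutated genes is to examine the overlap between a list of mutated genes and predefined gene sets of known biological function, such as those in KEGG, GO and MSigDB. For example, suppose we have a list of mutated genes (M) and want to know whether it contains genes that regulate the cell cycle. From the KEGG database we obtain a list of more than 20 cell-cycle genes (L). Two statistical tests can be used to check whether M and L overlap significantly. First, if M is ranked (for example, by one of the mutation significance scores described above), gene set enrichment analysis (GSEA) can determine whether the genes in L lie near the top of the ranked list M. Second, if M is unranked, a hypergeometric test can be used to assess the overlap between M and L.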
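For the unranked case, the hypergeometric test can be written with the standard library alone. The sketch below computes the upper-tail probability of seeing at least the observed overlap between M and L; the gene counts in the example are made-up numbers, and the choice of background (all genes that could have been called mutated) is an assumption that strongly affects the result.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_set, n_mutated, n_overlap):
    """P(overlap >= n_overlap) when n_mutated genes are drawn at random.

    n_background: genes in the background universe
    n_set:        genes in the predefined gene set L
    n_mutated:    genes in the mutated list M
    n_overlap:    genes shared by M and L
    """
    upper = min(n_set, n_mutated)
    total = comb(n_background, n_mutated)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_mutated - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 background genes, 120 cell-cycle genes,
# 200 mutated genes, 8 of which are cell-cycle genes
p_value = hypergeom_enrichment_p(20_000, 120, 200, 8)
```

Because many gene sets are usually tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.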
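Approach 2: the analyses above have limitations. 1. Human gene annotations and pathway databases remain incomplete, and there is extensive crosstalk between pathways, which implies that decisions regarding the genes that form the boundary of a pathway are arbitrary to some extent. 2. The crosstalk is represented in gene-set and pathway databases by the presence of multiple overlapping gene sets, thus complicating the interpretation of reported enrichments. 3. Finally, signalling and regulatory pathways have a rich topology of activating and inhibitory interactions, and this information is not represented in the list of genes or proteins that are members of the pathway, so activating and inhibitory effects cannot be captured by enrichment analysis. To overcome these limitations, a second approach to analysing combinations of mutations uses biological interaction networks: interaction networks have been used in place of gene sets to identify combinations of mutations that should be evaluated further. However, most biological networks have a non-uniform topology that is characterised by the presence of hubs. HotNet is a method for finding subnetworks of a large interaction network that are mutated in more samples than would be expected by chance. HotNet has been used to identify subnetworks in several cancer types analysed in TCGA, for example mutations involving the Notch signalling pathway in ovarian cancer. Other tools include network-based stratification (NBS), MEMo and Tied Diffusion Through Interacting Events (TieDIE).
Approach 3: a third approach to analysing combinations of mutations is to identify sets of mutually exclusive mutations, which can reveal combinations of driver mutations. MEMo applies this concept to genes that are known to interact; alternatively, one can attempt to discover mutually exclusive gene sets de novo, without restricting the gene sets in advance (Dendrix, Multi-Dendrix and RME).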
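Mutual exclusivity can be scored for a candidate gene set by rewarding the number of samples it covers and penalising samples that carry mutations in more than one of its genes. The sketch below uses a coverage-minus-overlap score in the spirit of the weight used by Dendrix; treat the exact form, the function name and the toy data as assumptions rather than as any tool's actual implementation.

```python
def exclusivity_summary(mutated_samples_by_gene, genes):
    """Summarise coverage and overlap for a candidate gene set.

    mutated_samples_by_gene maps a gene to the set of samples mutated in it.
    """
    sample_sets = [set(mutated_samples_by_gene[g]) for g in genes]
    covered = set().union(*sample_sets)
    # Extra hits beyond the first one in each covered sample
    overlap = sum(len(s) for s in sample_sets) - len(covered)
    return {
        "coverage": len(covered),
        "overlap": overlap,
        "score": len(covered) - overlap,
    }


# Hypothetical mutation data: gene -> samples in which it is mutated
data = {
    "GENE_A": {"s1", "s2"},
    "GENE_B": {"s3", "s4"},
    "GENE_C": {"s1", "s2", "s3"},
}
exclusive_pair = exclusivity_summary(data, ["GENE_A", "GENE_B"])
overlapping_pair = exclusivity_summary(data, ["GENE_A", "GENE_C"])
```

Here `GENE_A` and `GENE_B` cover four samples with no overlap and score 4, whereas `GENE_A` and `GENE_C` cover three samples with two double hits and score 1. De novo methods search over many candidate gene sets for high scores; this sketch only evaluates a set that is already given.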

4. Genome integrity and clonal architectures

4.1 Kataegis, chromothripsis and chromoplexy
One of the most striking findings from TCGA is the existence of genomes with extreme numbers and types of mutations.
Kataegis is the occurrence of an unusually large number of SNVs clustered in a single locus, and was first reported in breast tumours and other cancer types.
Chromothripsis is a phenomenon in which one or more loci undergo a catastrophic event of simultaneous breakage and aberrant repair at multiple breakpoints in a single cell division. It was originally reported in ~2-3% of all cancers but was shown to be particularly common in bone cancers (~25%), and it was later found to be possibly associated with TP53 mutations. Chromoplexy is a similar event that was discovered in prostate cancer.
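Clusters of closely spaced mutations such as kataegis can be flagged by scanning the distances between consecutive mutations on a chromosome. The sketch below is a minimal illustration: the thresholds (`max_gap` of 1,000 bp and `min_mutations` of 6) and the rule that every consecutive gap must be within `max_gap` are assumptions chosen for the example, not a definition taken from the notes above.

```python
def find_mutation_clusters(positions, max_gap=1000, min_mutations=6):
    """Return (start, end, count) for runs of closely spaced mutations.

    positions: mutation coordinates on a single chromosome.
    """
    clusters = []
    run = []
    for pos in sorted(positions):
        if run and pos - run[-1] > max_gap:
            # Gap too large: close the current run and keep it if long enough
            if len(run) >= min_mutations:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_mutations:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


# Hypothetical positions: seven mutations within about 2 kb, plus two isolated ones
positions = [100, 50_000, 50_200, 50_450, 50_900, 51_300, 51_800, 52_100, 900_000]
clusters = find_mutation_clusters(positions)
```

For the positions above, the only cluster returned spans 50,000 to 52,100 and contains seven mutations. Published analyses often add further criteria, such as the mutation context, which this sketch does not consider.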
4.2 Defining clonal architecture in heterogeneous tumours
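All of the genomic alterations discussed above play a role in clonal evolution.
ABSOLUTE adds a best-fit CNA model and a karyotype likelihood model.
PyClone uses hierarchical Bayesian clustering to identify clones.
SciClone uses a Bayesian mixture model to examine multiple samples from a patient across time (using primary and relapse tumour samples) or space (using multiple biopsy samples).
The Tumor Heterogeneity Analysis (THetA) algorithm accounts for the presence of CNAs, which confound the analysis of VAFs.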

5. Conclusion: basic and clinical applications

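In the short time since cancer genomics entered biomedicine, it has made many fundamental contributions:
First, cancer-associated genes and pathways have been identified;
Second, germline susceptibility has been established;
Third, technologies and algorithms have been continually refined;
Fourth, large data sets have been organised and documented;
Finally, knowledge has been catalogued in new databases.
Future challenges:
First, the 'data spectrum' and associated analysis tools are not yet complete, for example for proteomic data;
The second factor is the reality of cost;
The next chapter of cancer research will undoubtedly push further towards clinical application and engage large pharmaceutical companies more closely in the development of new therapeutics.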

Expanding the computational toolbox for mining cancer genomes
