11 papers in Science at once! Decoding key regions of the genome reveals clues to human health and disease

Overview

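From the two-gram bumblebee bat to whales weighing many tons, Earth is home to a rich array of species, humans included, that over long stretches of time have adapted to almost every environment on the planet. Mammals are among the most diverse groups of animals, varying enormously in both size and shape. Since the beginnings of life science research, people have wanted to know when, how, and under what selective pressures this mammalian variation arose. Studying our evolutionary history can also tell us more about human health: genes that are conserved across many species, for example, are likely to be essential for normal function, so changes to them may cause disease.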

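On April 28, 2023, researchers collaborating through the Zoonomia Project, the world's largest comparative genomics resource for mammals, published 11 research papers in Science on the same day. They cataloged the genomic diversity of 240 mammalian species, representing more than 80% of mammalian families. Several of the studies pinpoint parts of the human genome that have remained unchanged over millions of years of evolution, providing information that may shed light on human health and disease.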

Image source: Science

The Zoonomia Project is a large international research effort led by scientists at institutions including MIT and Harvard University. By sequencing a wide range of mammalian genomes and then analyzing hundreds of species together, it opens a new door to understanding mammals, mammalian evolution, and ourselves. Aligning these genomes against one another is an enormous computational task. Using the alignment, the researchers identified key regions of the genome that are the most conserved, or unchanged, across mammalian species and millions of years of evolution.

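The authors hypothesized that although these regions do not produce proteins, they may contain instructions that direct when and how much protein is made, and that mutations in them may play an important role in the origin of disease or in the distinctive traits of mammalian species. Their analyses supported this hypothesis and indicated that at least 10% of the human genome is functional, roughly ten times the share that codes for proteins (about 1%). The results further suggest that genetic variants may play a causal role in both rare and common human diseases, including cancer.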

01

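If something is important to the normal functioning of a species, it tends to be preserved during evolution; this is the concept of evolutionary constraint. Evolutionary constraint is therefore a measure of how much, or how little, a given region of the genome has changed across the evolutionary tree of life. In one study in this Science special issue, Leveraging base-pair mammalian constraint to understand genetic variation and human disease, Sullivan et al. observed that DNA sequences that stay unchanged across many species and long stretches of evolution, as well as sequences that suddenly begin to accumulate mutations in one or a few lineages, both strongly indicate functional relevance and evolutionary forces at work. By studying patients with medulloblastoma, the researchers also found mutations at evolutionarily conserved positions in the human genome, which they believe may cause brain tumors to grow faster or resist treatment. The results suggest that using these data and methods in disease research can make it easier to find genetic changes that increase disease risk.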
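To make the idea of constraint concrete, here is a minimal sketch that scores each column of a toy multiple-sequence alignment by the fraction of species sharing the most common base. The function name `column_conservation` and the four-sequence alignment are hypothetical, invented purely for illustration; the published analyses rely on phylogeny-aware statistics over a whole-genome alignment of 240 species, which this simple fraction does not reproduce.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column by the fraction of sequences
    that share the most common (non-gap) base in that column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        bases = [b for b in column if b != "-"]  # ignore gap characters
        if not bases:
            scores.append(0.0)
            continue
        most_common_count = Counter(bases).most_common(1)[0][1]
        # Divide by the number of species, so gaps lower the score.
        scores.append(most_common_count / len(column))
    return scores


# Hypothetical toy alignment: one row per species, one column per position.
toy_alignment = [
    "ACGTA",
    "ACGAA",
    "ACGCA",
    "ACG-A",
]

scores = column_conservation(toy_alignment)
# Positions where every species carries the same base.
constrained = [i for i, s in enumerate(scores) if s == 1.0]
print(scores, constrained)
```

In this toy case, the fourth column differs in every species and scores low, while the other columns are identical everywhere and would be flagged as candidate constrained positions.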

02

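In Evolutionary constraint and innovation across hundreds of placental mammals, researchers identified parts of the genome associated with some exceptional traits in the mammalian world, such as extraordinary brain size, a superior sense of smell, and the ability to hibernate through winter. The authors also used the genomes to show that estimates of effective population size and diversity can help predict extinction risk for species that are difficult to monitor and sample.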

03

Another study, A genomic timescale for placental mammal evolution, shows that mammals had already begun to change and diverge before the asteroid impact that wiped out the dinosaurs about 65 million years ago.

The timing of placental mammal evolution.

Superordinal mammalian diversification took place in the Cretaceous during periods of continental fragmentation and sea level rise with little phylogenomic discordance (pie charts: left, autosomes; right, X chromosome), which is consistent with allopatric speciation. By contrast, the Paleogene hosted intraordinal diversification in the aftermath of the K-Pg mass extinction event, when clades exhibited higher phylogenomic discordance consistent with speciation with gene flow and incomplete lineage sorting.

04

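Another study, Three-dimensional genome rewiring in loci with human accelerated regions, used Zoonomia data and experimental analysis to examine human accelerated regions (HARs), and linked changes in three-dimensional genome folding around these regions to altered gene regulation in the human lineage.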

Example of HAR enhancer hijacking.

The HAR is nearby and regulates gene A, but not gene B, as the chimpanzee genome folds. An insertion in the human genome brings the HAR closer to gene B, causing expression of gene B. The HAR adapts to being in gene B’s regulatory domain through substitutions to previously conserved nucleotides.

05

The study Comparative genomics of Balto, a famous historic dog, captures lost diversity of 1920s sled dogs offers a genetic explanation for why Balto, a famous sled dog of the 1920s, was able to survive the harsh conditions of Alaska.

Balto, famed 20th-century Alaskan sled dog, shares common ancestry with modern Asian and Arctic canine lineages.

In an unsupervised admixture analysis, Balto’s ancestry, representing 20th-century Alaskan sled dogs, is assigned predominantly to four Arctic lineage dog populations. He had no discernible wolf ancestry. The Alaskan sled dogs (a working population) did not fall into a distinct ancestry cluster but shared about a third of their ancestry with Balto in the supervised admixture analysis. Balto and working sled dogs carried fewer constrained and missense rare variants than modern dog breeds. [Image credit: K. Morrill]

06

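In The functional and evolutionary impacts of human-specific deletions in conserved elements, Xue et al. turn to small-scale changes in genome structure. After identifying human-specific deletions that span only a few bases, they tested the ability of these deletions to regulate gene expression in multiple human cell types and explored whether they might contribute to uniquely human phenotypes. Complex cognitive function again emerged as one of the main beneficiaries of sequence change during human evolution: genes near these small deletions were systematically enriched for roles in brain and neuronal function. After confirming experimentally that the deletions are functional in multiple cell types, the authors also observed that many of them increase gene expression in human cells, a potential driver of newly acquired functions.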

Human-specific deletions that remove nucleotides from regions highly conserved in other animals (hCONDELs).

We assessed 10,032 hCONDELs across diverse, biologically relevant datasets and identified tissue-specific enrichment (top left). The regulatory impact of hCONDELs was characterized by comparing chimp and human sequences in MPRAs (bottom left). The ability of hCONDELs to either improve or perturb activating and repressing gene-regulatory elements was assessed (top right). The deleted chimpanzee sequence was reintroduced back into human cells, causing a cascade of transcriptional differences for an hCONDEL regulating LOXL2 (bottom right).

07

In Relating enhancer genetic variation across mammals to complex phenotypes using machine learning, researchers used machine learning to identify genomic regions associated with brain size.

Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes.

TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]

08

The study Mammalian evolution of human cis-regulatory elements and transcription factor binding sites describes the evolution of regulatory sequences in the human genome.

Mammalian evolution of the human regulatory landscape.

(A) Distribution of human cCREs by the number of genomes they align.
(B) Projection of cCREs by alignments to the other 240 mammalian genomes.
(C) Projection of HNF4A sites (constrained, red; unconstrained, blue).
(D) Heritability enrichment for 69 human traits in partitions of TFBSs ordered by evolutionary constraint.
(E) Heritability enrichment for human traits by subsets of TFBSs.

09

The study Insights into mammalian TE diversity through the curation of 248 genome assemblies examined the transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. It found that although mammals are similar in total TE content and diversity, they differ substantially in recent TE accumulation. At any given time, a mammalian genome tends to accumulate only a few types of TE, with one type dominating. The study also found an association between dietary habits and invasions by DNA transposons.

Boxplots depicting the range of recently accumulated TEs among mammals (by proportion of genome).

Five categories of TE were examined: DNA transposons, long interspersed elements (LINEs), long terminal repeat (LTR) retrotransposons, rolling circle (RC) transposons, and short interspersed elements (SINEs). Species with the highest and lowest proportions for each TE type are indicated by a picture of the organism and its common name. With regard to RC and DNA transposons, we found that most mammalian genome assemblies exhibit essentially zero recent accumulation (RC: 240 of 248 mammals had <0.1%; DNA: 210 of 248 mammals had <0.1%). [Illustrations: Brittany Ann Hale]

10

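The study The contribution of historical processes to contemporary extinction risk in placental mammals examined genetic variation in single genomes from 240 mammalian species. It found that species with historically small populations carry a larger proportion of deleterious alleles, owing to the long-term accumulation and fixation of genetic load, and face a higher risk of extinction.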

Genomic information can help predict extinction risk in diverse mammalian species.

Across 240 mammals, species with smaller historical Ne had lower genetic diversity, higher genetic load, and were more likely to be threatened with extinction. Genomic data were used to train models that predict whether a species is threatened, which can be valuable for assessing extinction risk in species lacking ecological or census data. [Animal silhouettes are from PhyloPic]

11

The study Integrating gene annotation with orthology inference at scale presents TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation with orthology inference. The researchers applied it to 488 placental mammals and 501 birds, creating the largest comparative gene resource to date.

A different paradigm for orthology inference.

Orthologous, but not paralogous, genes have partially aligning intronic and intergenic regions. TOGA uses this principle to infer orthologous gene loci and integrates orthology inference with gene annotation. Using a reference species, TOGA can be applied to hundreds of aligned query genomes to provide rich comparative genomics resources.
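The principle in the caption above can be illustrated with a toy rule: among candidate loci whose exons all align well, prefer the one whose introns and flanking intergenic sequence also align. The sketch below is only an illustration of that idea and not TOGA's implementation; the class `CandidateLocus`, the function names, the threshold `min_support`, and all alignment fractions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateLocus:
    name: str
    exon_aligned: float    # fraction of exonic bases that align (hypothetical)
    intron_aligned: float  # fraction of intronic bases that align (hypothetical)
    flank_aligned: float   # fraction of flanking intergenic bases that align


def noncoding_support(locus):
    """Average alignment coverage of intronic and intergenic sequence."""
    return (locus.intron_aligned + locus.flank_aligned) / 2


def pick_ortholog(candidates, min_support=0.2):
    """Return the candidate with the most non-coding alignment support,
    or None if no candidate reaches the (assumed) minimum support."""
    best = max(candidates, key=noncoding_support, default=None)
    if best is None or noncoding_support(best) < min_support:
        return None
    return best


# Both loci have well-aligned exons, but only locus_1 also aligns
# in introns and flanks, as expected for an ortholog.
candidates = [
    CandidateLocus("locus_1", exon_aligned=0.95, intron_aligned=0.60, flank_aligned=0.40),
    CandidateLocus("locus_2", exon_aligned=0.90, intron_aligned=0.02, flank_aligned=0.00),
]

chosen = pick_ortholog(candidates)
print(chosen.name if chosen else "no ortholog inferred")
```

Exon alignment alone cannot separate the two candidates here, which is why the non-coding context carries the signal; the published tool is considerably more sophisticated than this single threshold.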

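The papers in this Science special issue compare the genomes of 240 mammalian species, many of them threatened or endangered. The DNA samples were collected and provided by more than 50 institutions around the world. The findings illustrate how comparative genomics can not only explain how certain species achieve extraordinary feats, but also help scientists better understand which parts of our genome are functional and how they affect health and disease.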

References:
1. Bogdan M. Kirilenko et al. Integrating gene annotation with orthology inference at scale. Science (2023).

2. Aryn P. Wilder et al. The contribution of historical processes to contemporary extinction risk in placental mammals. Science (2023).

3. Nicole M. Foley et al. A genomic timescale for placental mammal evolution. Science (2023).

4. Austin B. Osmanski et al. Insights into mammalian TE diversity through the curation of 248 genome assemblies. Science (2023).

5. James R. Xue et al. The functional and evolutionary impacts of human-specific deletions in conserved elements. Science (2023).

6. Matthew J. Christmas and Irene M. Kaplow et al. Evolutionary constraint and innovation across hundreds of placental mammals. Science (2023).

7. Katherine L. Moon et al. Comparative genomics of Balto, a famous historic dog, captures lost diversity of 1920s sled dogs. Science (2023).

8. Gregory Andrews et al. Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science (2023).

9. Kathleen C. Keough et al. Three-dimensional genome rewiring in loci with human accelerated regions. Science (2023).

10. Irene M. Kaplow et al. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science (2023).

11. Patrick F. Sullivan et al. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science (2023).

Source: https://mp.weixin.qq.com/s/Nxj92VYnRFNg9KH6L7CTBw
