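Before we begin
This post again draws on several papers and my own accumulated notes, and it focuses on key genes and hub genes. Many people assume that hub genes are key genes, and some even think that differentially expressed genes are key genes. Before the main text, here is a brief explanation of how I understand the relationship among the three; if you see it differently, please leave a comment.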
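- Differentially expressed genes (DEGs) are genes that differ with statistical significance between two groups. Taking microarrays as an example, out of tens of thousands of probes perhaps only about 1,000 are differential (this varies a lot with the thresholds you set).
- Hub genes are genes with high degree, that is, high connectivity in the gene co-expression network; betweenness and similar measures are not involved. Hub selection also depends heavily on human judgment: there is no fixed rule on whether to take the top 5% or the top 10%, although 5% is the usual suggestion. In other words, this is a loose definition.
- Key genes: some people pick the top-ranked hubs, others pick the DEGs with the smallest p-values. So what actually counts as a key gene? Broadly speaking, if knocking the gene down makes the phenotype clearly disappear, it is certainly a key gene. But how do you pick one from bioinformatics analysis alone? No single method can settle this directly. Here I look at it only from the expression-network angle; later I will write a post on screening key genes from multiple angles. The set of key genes is smaller than the set of hubs. A hub is not necessarily key, and a key gene is not necessarily a hub.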
In short, in terms of number or scope:
DEGs > hubs > key genes (candidate genes)
------------------------------------------------
OK, on to the main text.
Hub genes
The WGCNA approach typically deals with the identification of gene modules by using the gene expression levels that are highly correlated across samples. This technique has been successfully utilized to detect gene modules in Arabidopsis, rice, maize and poplar for various biotic and abiotic stresses. Further, this approach also leads to construction of Gene Co-expression Network (GCN), a scale free network, where, genes are represented as nodes and edges depict associations among genes. In such network, highly connected genes are called hub genes, which are expected to play an important role in understanding the biological mechanism of response under stresses/conditions. Identification of hub genes will also help in mitigating the stress in plants through genetic engineering. The existing approaches have mainly focused on hub gene identification, based only on gene connection degrees in the GCN. Moreover, these techniques select such genes empirically without any statistical criteria. Besides, few approaches can be found in the literature for the identification of hub nodes in a scale free network.
From this we can see that hub genes live in a scale-free co-expression network (the GCN) and are defined by degree. Existing methods focus mainly on identifying hub genes from connection degree in the GCN; they select genes empirically, without any statistical criterion. In addition, the literature offers few methods for identifying hub nodes in a scale-free network.
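The purely empirical, degree-based selection described above can be sketched roughly as follows. This is a minimal illustration, not WGCNA itself: it uses a hard correlation threshold instead of WGCNA's soft thresholding, and the function names, the 0.8 threshold, and the toy data are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def coexpression_degrees(expr, threshold=0.8):
    """Degree of each gene = number of other genes with |r| >= threshold.

    expr maps gene name -> expression values across samples.
    The hard threshold is an assumption for illustration only.
    """
    genes = sorted(expr)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                degree[g] += 1
                degree[h] += 1
    return degree

def top_fraction_hubs(degree, fraction=0.05):
    """Return the top `fraction` of genes by degree (at least one gene)."""
    k = max(1, math.ceil(len(degree) * fraction))
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:k]

# Toy expression data (hypothetical): A, B and C are tightly correlated, D is not.
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [3, 1, 4, 1, 5],
}
degree = coexpression_degrees(expr)

# Hypothetical degree table, to show the cutoff step on its own.
hubs = top_fraction_hubs({"g1": 12, "g2": 3, "g3": 7, "g4": 1}, fraction=0.25)
```

Note that `fraction` is a free choice: nothing in this procedure says whether 5% or 10% is right, which is exactly the lack of a statistical criterion the paper points out.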
So the authors proposed an algorithm and wrote a package that assigns a p-value to each hub gene, so that the number of hub genes can be reduced with a p-value criterion.
The package is here
Paper link 1
Paper link 2
It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has been on highly connected proteins (“hubs”). As a complementary notion, it is possible to define bottlenecks as proteins with a high betweenness centrality (i.e., network nodes that have many “shortest paths” going through them, analogous to major bridges and tunnels on a highway map). Bottlenecks are, in fact, key connector proteins with surprising functional and dynamic properties. In particular, they are more likely to be essential proteins. In fact, in regulatory and other directed networks, betweenness (i.e., “bottleneck-ness”) is a much more significant indicator of essentiality than degree (i.e., “hub-ness”). Furthermore, bottlenecks correspond to the dynamic components of the interaction network—they are significantly less well coexpressed with their neighbors than nonbottlenecks, implying that expression dynamics is wired into the network topology.
A network is a graph consisting of a number of nodes with edges connecting them. Recently, network models have been widely applied to biological systems. Here, we are mainly interested in two types of biological networks: the interaction network, where nodes are proteins and edges connect interacting partners; and the regulatory network, where nodes are proteins and edges connect transcription factors and their targets. Betweenness is one of the most important topological properties of a network. It measures the number of shortest paths going through a certain node. Therefore, nodes with the highest betweenness control most of the information flow in the network, representing the critical points of the network. We thus call these nodes the “bottlenecks” of the network. Here, we focus on bottlenecks in protein networks. We find that, in the regulatory network, where there is a clear concept of information flow, protein bottlenecks indeed have a much higher tendency to be essential genes. In this type of network, betweenness is a good predictor of essentiality. Biological researchers can therefore use the betweenness as one more feature to choose potential targets for detailed analysis.
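To make "the number of shortest paths going through a node" concrete, here is a small sketch of betweenness centrality for an unweighted, undirected graph, using Brandes' algorithm. The toy graph (two triangles joined by a single connector node `x`) is a made-up example, not data from the paper.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality for an undirected, unweighted graph.

    adj maps each node to a list of its neighbours.
    Uses Brandes' algorithm: one BFS per source, then back-propagation
    of path dependencies.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:           # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()               # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Two triangles (a, b, c) and (d, e, f) joined only through node x.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
bc = betweenness(adj)
degree = {v: len(adj[v]) for v in adj}
```

In this toy graph `x` has only two neighbours, so it would never be picked as a hub, yet all nine shortest paths between the two triangles pass through it: a bottleneck in the sense used above.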
Figure1.png

Below is an explanation of the difference between hubs and bottlenecks.
Central complex members have a low betweenness and are hub–nonbottlenecks. In other words, members of central complexes have low betweenness and fall into the hub-nonbottleneck class.
Because of the high connectivity inside these complexes, paths can go through them and all their neighbors. On the other hand, hub–bottlenecks tend to correspond to highly central proteins that connect several complexes or are peripheral members of central complexes.
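Hub-bottlenecks tend to correspond to highly central proteins that connect several complexes or sit at the periphery of central complexes. Their high betweenness shows that these proteins are not simply members of large protein complexes (which is what characterizes nonbottleneck-hubs); rather, they link the complex to the rest of the network. In a sense, they are the true connectivity bottlenecks.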
The fact that they have a high betweenness suggests that these proteins are not, however, simply members of large protein complexes (which is true for nonbottleneck–hubs), but are those members that connect the complex to the rest of the graph; in a sense, real connectivity bottlenecks. While hub–nonbottlenecks mainly consist of structural proteins, hub–bottlenecks are more likely to be part of signal transduction pathways.
Hub-nonbottlenecks consist mainly of structural proteins,
while hub-bottlenecks are more likely to be part of signal transduction pathways.
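The four classes discussed here (hub-bottleneck, hub-nonbottleneck, nonhub-bottleneck, nonhub-nonbottleneck) follow from crossing two rankings. A minimal sketch, assuming that "hub" and "bottleneck" each mean "in the top fraction by degree or by betweenness"; the cutoff fraction, the function names, and the example values are all hypothetical:

```python
import math

def top_fraction(scores, fraction):
    """Set of nodes in the top `fraction` by score (at least one node)."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return set(ranked[:k])

def classify_nodes(degree, betweenness, fraction=0.2):
    """Label each node by whether it is a hub and/or a bottleneck.

    `fraction` is an assumed cutoff; it is a free parameter, not a fixed rule.
    """
    hubs = top_fraction(degree, fraction)
    bottlenecks = top_fraction(betweenness, fraction)
    labels = {}
    for node in degree:
        hub_part = "hub" if node in hubs else "nonhub"
        bn_part = "bottleneck" if node in bottlenecks else "nonbottleneck"
        labels[node] = f"{hub_part}-{bn_part}"
    return labels

# Hypothetical degree and betweenness values for eight proteins.
degree = {"n1": 9, "n2": 8, "n3": 2, "n4": 2, "n5": 1, "n6": 1, "n7": 1, "n8": 1}
betw = {"n1": 30, "n2": 1, "n3": 25, "n4": 0, "n5": 0, "n6": 0, "n7": 0, "n8": 0}
labels = classify_nodes(degree, betw, fraction=0.25)
```

With these made-up numbers, `n1` is a hub-bottleneck, `n2` a hub-nonbottleneck (for example, a member of a dense complex), and `n3` a nonhub-bottleneck.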
Furthermore, hub–bottlenecks are (by construction) the most efficient in disrupting the network upon hub removal. This relates nicely to the date/party-hub concept by Han et al.: hub–bottlenecks tend to be date-hubs, whereas hub–nonbottlenecks tend to be party-hubs.
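In addition, among hubs, hub-bottlenecks are the nodes whose removal disrupts the network most effectively. This is very close to Han's hub concept: hub-bottlenecks tend to be date hubs, and hub-nonbottlenecks tend to be party hubs (Han's paper makes this clear: date hubs are more likely to be the organizers and maintainers of the large-scale architecture, the big bosses). Han's view was published in Nature and is summarized below.
The Han et al. Nature paper mentioned above: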
https://www.nature.com/articles/nature02555
In apparently scale-free protein–protein interaction networks, or ‘interactome’ networks [1,2], most proteins interact with few partners, whereas a small but significant proportion of proteins, the ‘hubs’, interact with many partners.
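In scale-free protein interaction networks (also called interactome networks), most proteins interact with only a few partners, and only a small fraction of proteins, the hubs, interact with many partners.
Nonhub-bottlenecks are usually less co-expressed with their neighbors than nonhub-nonbottleneck proteins. This fits the observation that betweenness is an indicator of average correlation with neighboring proteins. Nonhub-bottleneck proteins are rarely members of complexes, and most of them are regulatory proteins and signal transduction machinery.
Any scale-free network, biological or not, is robust to the random removal of nodes but very sensitive to the removal of hubs.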
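The contrast between removing an ordinary node and removing a hub can be seen on a tiny example. The sketch below measures the size of the largest connected component after deleting a node; the toy graph (one hub with six neighbours) is hypothetical and far too small to be scale-free, so it only illustrates the idea.

```python
def largest_component(adj, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Depth-first search over the nodes that remain.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

# One hub linked to six low-degree nodes; l1 and l2 are also linked to each other.
adj = {
    "hub": ["l1", "l2", "l3", "l4", "l5", "l6"],
    "l1": ["hub", "l2"], "l2": ["hub", "l1"],
    "l3": ["hub"], "l4": ["hub"], "l5": ["hub"], "l6": ["hub"],
}
intact = largest_component(adj)
after_leaf_removed = largest_component(adj, removed=["l3"])
after_hub_removed = largest_component(adj, removed=["hub"])
```

Deleting a low-degree node leaves the network almost intact, while deleting the hub shatters it into small pieces.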
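Roughly: in a yeast experiment, knocking out genes that encode hub proteins was about three times more likely to be lethal than knocking out non-hub genes. The authors identified two classes of hubs: party hubs, which interact with most of their partners at the same time, and date hubs, which bind different partners at different times or locations.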

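Thus, hubs in the yeast interaction network can be divided into two classes based on their partners' expression profiles: date hubs and party hubs. This distinction suggests a model of modular organization of the yeast proteome in which modules are connected through regulators, mediators or adaptors; these are the date hubs. Party hubs represent essential components inside distinct modules and are important for the functions those modules mediate (and therefore tend to be essential proteins), but they tend to operate at a lower level of proteome organization. (Roughly speaking, date hubs are the big bosses who handle communication and coordination, while party hubs are the small bosses inside each module.) The authors propose that date hubs are essential to the global organization of biological modules in the whole proteome network and take part in wide-ranging integrative connections (although some date hubs may simply be shared, regulating local functions within or across modules). Key features of such interaction networks, such as genetic robustness and resilience to the external environment, are easier to understand with this modular organization as a framework.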
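Splitting hubs by their partners' expression profiles can be sketched as follows: compute the average Pearson correlation between a hub and each of its interaction partners, then call the hub a party hub if the average is high and a date hub if it is low. The 0.5 cutoff, the function names, and the toy profiles are assumptions for illustration; an appropriate cutoff depends on the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def average_partner_pcc(hub, partners, expr):
    """Mean correlation between a hub's profile and those of its partners."""
    values = [pearson(expr[hub], expr[p]) for p in partners]
    return sum(values) / len(values)

def classify_hub(hub, partners, expr, cutoff=0.5):
    """'party' if the hub is co-expressed with its partners on average, else 'date'.

    The cutoff is a hypothetical value chosen for this example.
    """
    return "party" if average_partner_pcc(hub, partners, expr) >= cutoff else "date"

# Toy expression profiles over six conditions (hypothetical).
expr = {
    "P": [1, 2, 3, 4, 5, 6],      # hub whose partners rise and fall with it
    "p1": [2, 4, 6, 8, 10, 12],
    "p2": [1, 2, 3, 4, 5, 7],
    "D": [1, 2, 3, 3, 2, 1],      # hub uncorrelated with its partners overall
    "d1": [1, 2, 3, 4, 5, 6],
    "d2": [6, 5, 4, 3, 2, 1],
}
party_label = classify_hub("P", ["p1", "p2"], expr)
date_label = classify_hub("D", ["d1", "d2"], expr)
```

Here `P` is expressed together with both partners and comes out as a party hub, while `D` shows no overall co-expression with its partners and comes out as a date hub.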
Therefore, the so-called date hubs are the hubs with high betweenness (hub-bottlenecks),
whereas party hubs are more likely to be hubs with low betweenness (hub-nonbottlenecks).
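This finding may point to a link between the dynamic and topological properties of interaction networks, one that was previously unknown.
The authors believe that, despite current practical difficulties, betweenness will prove to be a very useful tool for many protein networks, especially those with directed edges (regulatory networks).
In summary, the authors provide an integrated analysis of two complementary topological network properties, one that applies to different types of networks. This integrated approach reveals previously unknown links among network topology, protein essentiality and expression dynamics. They believe that integrated approaches like this one will be crucial for future predictive models.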
