Author: Matthew C. Altman 1,2,21 ✉, Darawan Rinchai 3,21 ✉, Nicole Baldwin 4, Mohammed Toufiq 3, Elizabeth Whalen 1, Mathieu Garand 3, Basirudeen Syed Ahamed Kabeer 3, Mohamed Alfaki 3, Scott R. Presnell 1, Prasong Khaenam 1, Aaron Ayllón-Benítez 5, Fleur Mougin 5, Patricia Thébault 6, Laurent Chiche 7, Noemie Jourde-Chiche 8, J. Theodore Phillips 4, Goran Klintmalm 4, Anne O’Garra 9,10, Matthew Berry 11, Chloe Bloom 10, Robert J. Wilkinson 12,13,14, Christine M. Graham 9, Marc Lipman 15, Ganjana Lertmemongkolchai 16, Davide Bedognetti 3, Rodolphe Thiebaut 5, Farrah Kheradmand 17, Asuncion Mejias 18, Octavio Ramilo 18, Karolina Palucka 4,19, Virginia Pascual 4,20, Jacques Banchereau 4,19 & Damien Chaussabel 1,3 ✉
Affiliations:
1Systems Immunology, Benaroya Research Institute, Seattle, WA, USA.
2Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA, USA.
3Research Branch, Sidra Medicine, Doha, Qatar.
4Baylor Institute for Immunology Research, Baylor Research Institute, Dallas, TX, USA.
5Inserm U1219 Bordeaux Population Health Research Center, Bordeaux University, Bordeaux, France.
6LaBRI, CNRS UMR5800, Bordeaux University, Bordeaux, France.
7Department of Internal Medicine, Hôpital Européen, Marseille, France.
8Aix-Marseille University, C2VN, INSERM 1263, INRA 1260 Marseille, France.
9Laboratory of Immunoregulation and Infection, The Francis Crick Institute, London, UK.
10National Heart and Lung Institute, Imperial College London, London, UK.
11Royal Cornwall Hospitals NHS Trust, Truro, UK.
12The Francis Crick Institute, London, UK.
13Department of Infectious Disease, Imperial College, London, UK.
14Wellcome Center for Infectious Diseases Research in Africa and Department of Medicine, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town Observatory, 7925 Cape Town, Republic of South Africa.
15UCL Respiratory, Division of Medicine, University College London, London, UK.
16Centre for Research and Development of Medical Diagnostic Laboratories, Faculty of Associated Medical Sciences, Khon Kaen University, Khon Kaen, Thailand.
17Baylor College of Medicine & Center for Translational Research on Inflammatory Diseases, Michael E. DeBakey VAMC, Houston, TX, USA.
18Abigail Wexner Research Institute at Nationwide Children’s Hospital and the Ohio State University School of Medicine, Columbus, OH, USA.
19The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
20Weill Cornell Medicine, New York, NY, USA.
21These authors contributed equally: Matthew C. Altman, Darawan Rinchai.
✉email: maltman@benaroyaresearch.org; drinchai@sidra.org; dchaussabel@sidra.org
This study developed a new transcriptional module repertoire, BloodGen3, which serves as a stable and reproducible framework for analyzing and interpreting blood transcriptome data. The framework is built on co-clustering patterns observed across 985 blood transcriptome profiles covering a range of immunological and physiological states. Interpretation is supported by several customizable resources, including: module-level analysis workflows, fingerprint grid plot visualizations, interactive web applications and an extensive annotation framework comprising functional profiling reports and reference transcriptional profiles.
Results
Generation of a collection of datasets covering a wide range of immune states.
To build the framework and capture as wide a range of immune responses as possible, 16 datasets comprising 985 samples in total were included.
- TB (23 condition, 11 control, 34 total);
- Staph aureus (99 condition, 44 control, 143 total);
- Sepsis (35 condition, 12 control, 47 total);
- HIV (28 condition, 35 control, 63 total);
- Flu (25 condition, 14 control, 39 total);
- RSV (70 condition, 14 control, 84 total);
- B-cell deficiency (20 condition, 13 control, 33 total);
- Liver Transplant (94 condition, 30 control, 124 total);
- Pregnancy (25 condition, 20 control, 45 total);
- Melanoma Stage IV (22 condition, 5 control, 27 total);
- Kawasaki (21 condition, 23 control, 44 total);
- Juvenile Dermatomyositis (40 condition, 9 control, 49 total);
- COPD (19 condition, 24 control, 43 total);
- MS - untreated (34 condition, 22 control, 56 total);
- Pediatric SLE (55 condition, 14 control, 69 total);
- SoJIA (62 condition, 23 control, 85 total)
Implementation of a stepwise approach to blood transcriptional module repertoire construction.

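a. Sixteen blood transcriptome datasets spanning a wide range of immune and physiological states were collected as the starting point for identifying gene co-expression patterns.
b. Each dataset was clustered independently using k-means.
c. The number of times two genes fell in the same cluster was recorded. This count ranges from 0 to 16 (i.e. the genes co-cluster in none through all 16 of the datasets).
d. The co-clustering records were used as input to build a co-clustering graph in which nodes represent genes and edges represent co-clustering events (at least one), with each edge weighted by the number of datasets in which the two genes co-clustered.
e. Sub-networks were then selected from the full network step by step according to weight, and each was assigned a module ID.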
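Steps b to d can be sketched in R as follows. This is a minimal illustration on random toy data, not the authors' implementation: it uses three toy datasets instead of 16, base R's `kmeans` (whose default is the Hartigan-Wong algorithm) with an arbitrary fixed k, and the names `toy_datasets`, `co_cluster_counts`, `weights` and `edges` are invented for the example.

```r
set.seed(1)

# Toy input: 3 datasets (the study used 16), each a genes x samples matrix
# sharing the same 20 hypothetical genes.
genes <- paste0("gene", 1:20)
toy_datasets <- lapply(1:3, function(i) {
  matrix(rnorm(20 * 8), nrow = 20, dimnames = list(genes, NULL))
})

# Count, for every gene pair, the number of datasets in which both genes
# fall in the same k-means cluster (k = 4 is an arbitrary choice here).
co_cluster_counts <- function(datasets, k = 4) {
  gene_ids <- rownames(datasets[[1]])
  counts <- matrix(0L, length(gene_ids), length(gene_ids),
                   dimnames = list(gene_ids, gene_ids))
  for (expr in datasets) {
    cl <- kmeans(expr, centers = k, nstart = 5)$cluster
    same <- outer(cl, cl, "==")   # TRUE where two genes share a cluster
    counts <- counts + same       # logical values are coerced to 0/1
  }
  diag(counts) <- 0L              # ignore self-pairs
  counts
}

weights <- co_cluster_counts(toy_datasets)

# Edge list of the co-clustering graph: pairs that co-clustered at least once,
# weighted by the number of datasets in which they did.
idx <- which(upper.tri(weights) & weights >= 1, arr.ind = TRUE)
edges <- data.frame(gene_a = rownames(weights)[idx[, 1]],
                    gene_b = colnames(weights)[idx[, 2]],
                    weight = weights[idx])
head(edges[order(-edges$weight), ])
```

With 16 real datasets the same matrix would hold weights from 0 to 16, and the heaviest edges are the ones from which the first modules are extracted.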
Algorithm and pseudocode
- Based on Euclidean distance
- Hartigan’s K-Means clustering algorithm (hierarchical + k-means clustering)
- Optimization settings: If at any k the algorithm creates a cluster whose members’ average Pearson correlation to the mean cluster vector is <0.3, the cluster is deleted and the algorithm begins again at k-1. The ‘ideal’ number of clusters (k) for each dataset was determined within a range of k = 1-100 by means of the jump statistic.
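The coherence rule in the last bullet can be illustrated with a short R sketch. It shows only the check itself (the average Pearson correlation of cluster members to the cluster mean vector, compared with the 0.3 cut-off from the text); the restart at k-1 and the jump statistic are not implemented. The function `cluster_coherence` and the toy data are invented for the example.

```r
set.seed(2)

# Average Pearson correlation of each cluster's members to that cluster's
# mean expression vector. `expr` is genes x samples, `cl` holds one cluster
# label per gene.
cluster_coherence <- function(expr, cl) {
  sapply(split(seq_len(nrow(expr)), cl), function(rows) {
    members <- expr[rows, , drop = FALSE]
    centre <- colMeans(members)
    mean(apply(members, 1, cor, y = centre))
  })
}

# Toy data: 10 genes following a shared pattern plus 10 genes of pure noise.
pattern <- rnorm(12)
coherent <- t(replicate(10, pattern + rnorm(12, sd = 0.3)))
noise <- matrix(rnorm(10 * 12), nrow = 10)
expr <- rbind(coherent, noise)

cl <- kmeans(expr, centers = 2, nstart = 10)$cluster
coherence <- cluster_coherence(expr, cl)
coherence

# Clusters below the 0.3 cut-off would be deleted, triggering a restart at k - 1.
names(coherence)[coherence < 0.3]
```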
The pseudocode for extracting modules from the co-clustering graph is as follows:
Integer nLastQuartile = 4;
Integer nMaxRelaxtion = m_nNumDatasets / 3;
Integer nRelaxtionIncrement = Math.max(1, (nMaxRelaxtion / 3));
Integer nRelaxtion = nMaxRelaxtion;
// Sweep the co-clustering weight threshold from all datasets down to 1
for (int nCliqueThreshold = m_nNumDatasets; nCliqueThreshold >= 1; nCliqueThreshold--)
{
    // Quartile (0-4) of the dataset count in which the current threshold falls
    Integer nQuartile = ((nCliqueThreshold * 100) / m_nNumDatasets) / 25;
    if (nQuartile.equals(nLastQuartile) == false)
    {
        // On entering a new quartile below 75% of the datasets, reduce the relaxation
        if (nQuartile <= 2)
        {
            nRelaxtion = Math.max(0, nRelaxtion - nRelaxtionIncrement);
        }
        nLastQuartile = nQuartile;
    }
    // Relaxed weight threshold, presumably applied when growing the paraclique
    Integer nParacliqueThreshold = nCliqueThreshold - nRelaxtion;
    do
    {
        maximumClique = find maximum clique w co-clustering weight >= nCliqueThreshold
        if (size of maximumClique > 15)
        {
            paraclique = find paraclique in graph
            remove maximumClique and paraclique from graph
        }
    } while (maximumClique is found)
}

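A weighted co-clustering network is used to build the BloodGen3 module repertoire. Specifically, the weights capture how consistently genes are co-expressed across different biological "states". For whole-blood samples, these states are different diseases or physiological phenotypes. In scenario A of the illustrative example, a set of genes is co-expressed in all three disease states, so the network weight is 3 (the edge value is set to 3). In scenarios B and C, co-expression occurs in fewer of the disease states, and the edge weight is set to the number of states in which the genes co-cluster.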
Development of module-level analysis workflows and visualizations.
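The approach above yields gene sets, defined by transcript abundance patterns across a range of physiological states, that serve as fixed modules. Used as a "framework", these modules make it possible to: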
- identify functional convergences among the genes that comprise each set
- summarize changes in overall transcript abundance related to pathological processes or therapeutic interventions.
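The final BloodGen3 repertoire comprises 382 modules, with a mean of 37.1 genes per module, a median of 26.5, and a range of 12 to 169.
Tools used for functional annotation and enrichment analysis of the modules include: GSAn, Literature Lab, IPA, DAVID, KEGG, BioCarta, OMIM, and GOTERM.
The authors also assessed how the BloodGen3 repertoire overlaps with the repertoires from earlier work (Gen1, Gen2) [1].
Module-level analysis determines, for each module, the proportion of constitutive transcripts whose abundance differs between groups (e.g. cases vs. controls; pre-treatment vs. post-treatment). This yields two values per module: the percentage of transcripts that are increased and the percentage that are decreased. The cut-off is chosen by the user (based on statistics, fold changes and/or differences, with or without multiple-testing correction for group comparisons). Module-level differential expression is then visualized as a "fingerprint".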
The development of the BloodGen3 module fingerprint grids
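A second tier of clustering, based on similarity in the patterns of transcript abundance observed across the 16 datasets, groups the 382 modules into 38 "aggregates". This gives two levels of granularity: the module level and the module-aggregate level. The aggregate level is the coarser of the two and keeps the number of variables manageable.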
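The second-tier grouping can be sketched in R as below. The text does not say which clustering algorithm was used for this step, so hierarchical clustering on Euclidean distance is an assumption made purely for illustration, as are the random toy matrix `resp`, the choice of 5 toy aggregates and all variable names.

```r
set.seed(3)

# Toy matrix: 30 hypothetical modules x 16 datasets, holding a signed module
# response (% of transcripts up minus % of transcripts down) in each dataset.
resp <- matrix(runif(30 * 16, min = -100, max = 100), nrow = 30,
               dimnames = list(paste0("M", 1:30), paste0("dataset", 1:16)))

# Group modules with similar patterns across datasets. Hierarchical clustering
# is an assumption here, not necessarily the authors' choice.
tree <- hclust(dist(resp), method = "average")
aggregate_id <- cutree(tree, k = 5)   # 5 toy aggregates (the study has 38)

# Number of modules per aggregate, and the members of each aggregate.
table(aggregate_id)
split(names(aggregate_id), aggregate_id)
```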
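a. Each row of the heatmap corresponds to the change in transcript abundance for a given dataset in a given direction (increase or decrease). With healthy controls as the baseline, increases are shown in red and decreases in blue, so the heatmap has 32 rows in total (16 datasets × 2 directions). The columns correspond to all modules of the BloodGen3 repertoire (N = 382). The colors along the bottom indicate the module aggregate IDs and only serve to illustrate how modules are organized on the fingerprint grid plot. The net effect is that changes in abundance are correlated within each row of the fingerprint grid, which was not the case in earlier iterations of the grid.
After this aggregation step, some degree of functional convergence can be observed within a given row of modules. For example, on the fingerprint grid, row A1 contains several lymphocyte-related modules, row A28 contains six distinct "interferon modules", and rows A33 and A35 contain many modules functionally associated with inflammation.
b. Modules are arranged as follows: the 382 modules are partitioned into 38 clusters (aggregates) according to their similarity across the 16 datasets. The subset of 27 aggregates that contain two or more modules forms the rows of the grid. The length of each arrowed line in panel b represents the number of modules in that cluster.
c. When the BloodGen3Module R package is used for downstream analysis of blood transcriptome data, module-level changes are mapped onto this grid and displayed through color and color intensity.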
Illustrative case of fingerprint grid plot representation
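Reducing the dimensionality of the data makes it easier to interpret. Read vertically, the fingerprint grid shows changes across module aggregates; read horizontally, it shows changes within an aggregate, across all the modules it contains. All analysis workflows and explanations are available on the BloodGen3 web page.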

In-depth functional annotation of fixed transcriptional module repertoires
Functional annotation:
- Approach
  - concurrent ontology, pathway or literature-term profiling analyses
  - determination, for the constitutive genes of each module, of expression patterns in select reference datasets
- Steps
  - Step 1: Functional profiling
    - GO analysis of the 382 modules using DAVID, GOTERM and GSAn.
    - Pathway analysis using KEGG, BioCarta and Ingenuity Pathway Analysis (IPA).
    - Literature-term enrichment using Literature Lab.
    - Identification of the transcription factor binding motifs over-represented in each module using the RcisTarget R package.
    - Finally, the annotations obtained in this step are consolidated to derive a functional title for each module.
  - Step 2: Expression patterns in reference transcriptome datasets (three different transcriptome datasets were used as references to improve the characterization and functional interpretability of the BloodGen3 module repertoire):
    - Novershtern [2]
    - Speake [3]
    - Monaco [4]
Measuring inter-individual variability for the molecular stratification of patient cohorts
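This step characterizes inter-individual variability. Fixed cut-offs are applied to each individual's transcript abundance values (e.g. absolute fold change in expression and absolute difference in expression vs. average of control samples). The percentage of differentially expressed genes is then computed for each individual. These percentages are equivalent to the values obtained from group comparisons, except that they are derived for each sample separately.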
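A minimal R sketch of this per-sample calculation is given below. It is not the package's code: the toy expression values, the module assignment, the cut-off values and the way the fold-change and difference criteria are combined (both must be met) are assumptions chosen for illustration.

```r
# Toy expression matrix (not log-transformed): 6 genes x 5 samples.
expr <- matrix(c(100, 110,  90, 300,  95,
                  50,  55,  45, 160,  20,
                 200, 210, 190, 205,  60,
                  80,  85,  75,  82,  30,
                  10,  12,   8,  40,   9,
                 500, 480, 520, 510, 495),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("gene", 1:6),
                               c("ctrl1", "ctrl2", "ctrl3", "pt1", "pt2")))
module <- c("modA", "modA", "modA", "modB", "modB", "modB")  # hypothetical
controls <- c("ctrl1", "ctrl2", "ctrl3")

fc_cut <- 1.5     # absolute fold change vs. mean of controls
diff_cut <- 10    # absolute difference vs. mean of controls

ctrl_mean <- rowMeans(expr[, controls])
fc <- expr / ctrl_mean      # each row is divided by that gene's control mean
dif <- expr - ctrl_mean

up <- fc >= fc_cut & dif >= diff_cut
down <- fc <= 1 / fc_cut & dif <= -diff_cut

# Percentage of each module's genes that are up or down, per sample.
n_genes <- as.vector(table(module))
pct_up <- 100 * rowsum(up + 0, module) / n_genes
pct_down <- 100 * rowsum(down + 0, module) / n_genes
round(pct_up, 1)
round(pct_down, 1)
```

Each column of `pct_up` and `pct_down` is one sample's module-level profile, which is what allows patients to be stratified individually rather than only as a group.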

Profiling the abundance of A28 interferon-inducible genes at the aggregate level across reference patient cohorts
The authors illustrate how BloodGen3 can be applied. The panels of the corresponding figure are described below:

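a. A heatmap showing the patterns of transcript abundance across the 27 module aggregates (those containing at least two modules) for the 16 health states. At the top level of the hierarchy, acute HIV infection and MS cluster together, and the other 14 health states form a second cluster. This dichotomy arises because modules associated with inflammation and/or myeloid cells (A34–A38) are suppressed, with a concomitant increase in modules associated with lymphocytic responses (A1–A8). This suggests that the dominant signature reflects a shift in the overall proportions of granulocytes and lymphocytes. Notably, although such shifts may well exist and affect overall transcript abundance, the IFN signaling modules show similar increases in HIV and in conditions belonging to the other cluster (e.g. acute HIV infection falls in one cluster, while SLE and influenza fall in the other).
b. Shows the gene composition of the six modules in aggregate A28.
c. Shows the differences in A28 gene expression between patients and controls across different infectious diseases.
d. Shows the differences in expression of A28 aggregate genes in hepatitis C patients treated with IFNα and MS patients treated with IFNβ.
The response to type I IFN consists mainly of a disproportionate increase in the abundance of the transcripts forming M8.3 and M10.1. In contrast, M15.86 changes very little after type I IFN treatment but increases markedly during acute HIV infection and influenza infection, so M15.86 may be associated with IFNγ. Compared with the other infections, RSV elicits a weaker IFN response, whereas TB infection shows a strong IFN response.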
Profiling the abundance of A28 interferon-inducible genes at the module level across reference patient cohorts

Profiling the abundance of A28 interferon-inducible genes at the module level across individual subjects
Development and availability of ancillary resources
Using BloodGen3Module
The workflow of the BloodGen3Module R package is outlined below:

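Using BloodGen3Module involves three steps:

- annotation of the expression matrix with module membership
The gene symbol and the module each gene belongs to are added as the first and second columns of the expression matrix. The remaining columns hold the expression values for each sample.
- determination of differential expression
The first column is the gene symbol and the second is the associated module. The criteria for differential expression are user-defined: for a comparison between two groups, the p-value and fold change (FC) can be used; at the individual level, the difference and FC are used.
- calculation of the percentage of the response
Example for a group-level comparison: the first column is the module, and the second, Total gene, is the number of genes in the module. The third and fourth columns report the up-regulated and down-regulated results of the two-group comparison, each subdivided further. The last column, % Responses, is the overall response rate of the genes within a module. The response percentage of each module is computed as: (up-regulated gene number - down-regulated gene number) / Total gene, expressed as a percentage.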
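The response formula above can be worked through on a small example in R. The per-gene calls, the module names and the column names of the resulting table are hypothetical; only the formula itself comes from the text.

```r
# Hypothetical per-gene calls from a group comparison:
# 1 = up-regulated, -1 = down-regulated, 0 = unchanged.
calls <- data.frame(
  gene   = paste0("gene", 1:8),
  module = c("modA", "modA", "modA", "modA", "modB", "modB", "modB", "modB"),
  call   = c(1, 1, 1, -1, -1, -1, 0, 0)
)

total  <- tapply(calls$call, calls$module, length)
n_up   <- tapply(calls$call == 1, calls$module, sum)
n_down <- tapply(calls$call == -1, calls$module, sum)

# (up-regulated genes - down-regulated genes) / total genes, as a percentage.
response <- data.frame(
  Module       = names(total),
  Total_gene   = as.vector(total),
  Up           = as.vector(n_up),
  Down         = as.vector(n_down),
  Pct_response = 100 * as.vector(n_up - n_down) / as.vector(total)
)
response
```

Here modA has 3 genes up and 1 down out of 4, giving a response of 50%, while modB has 2 genes down out of 4, giving -50%.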
Code
Group comparison analysis
Differences between groups can be tested with a t-test (package function Groupcomparison) or with limma (package function Groupcomparisonlimma).
Code:
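t-test:
# t-test: compare the Sepsis group against the Control group
Group_df <- Groupcomparison(data.matrix,
                            sample_info = sample_ann,
                            FC = 1.5,                     # fold-change cut-off
                            pval = 0.1,                   # p-value cut-off
                            FDR = TRUE,                   # use multiple-testing (FDR) correction
                            Group_column = "Group_test",  # column of sample_ann holding the groups
                            Test_group = "Sepsis",
                            Ref_group = "Control")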
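limma:
# limma: same comparison, using limma instead of a t-test
Group_limma <- Groupcomparisonlimma(data.matrix,
                                    sample_info = sample_ann,
                                    FC = 1.5,                     # fold-change cut-off
                                    pval = 0.1,                   # p-value cut-off
                                    FDR = TRUE,                   # use multiple-testing (FDR) correction
                                    Group_column = "Group_test",  # column of sample_ann holding the groups
                                    Test_group = "Sepsis",
                                    Ref_group = "Control")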
Notes on the variables:
- data.matrix: the gene-level expression matrix, with gene symbols as row.names. It must be pre-processed (e.g. normalized) before calling Groupcomparison or Groupcomparisonlimma. Note: the normalized data must not be log2-transformed.
- sample_ann: the sample annotation table, with row.names set to the sample names that match the column names of data.matrix. The group information (the condition of each sample) is stored in a named column (for example, Group_test).
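The expected shape of the two input objects can be sketched with a toy example. The expression values and sample names below are made up, and the consistency checks are a suggestion rather than something the package is known to require.

```r
# Toy, already-normalized, non-log2 expression values: genes x samples.
data.matrix <- matrix(
  c(120,  95, 310, 280,
     40,  52,  38,  41,
    800, 760, 150, 170),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("IFI27", "ACTB", "CD3E"),   # gene symbols as row.names
                  c("S1", "S2", "S3", "S4"))    # sample names as column names
)

# Sample annotation: row.names match the columns of data.matrix, and a
# named column (here Group_test) holds the group of each sample.
sample_ann <- data.frame(
  Group_test = c("Control", "Control", "Sepsis", "Sepsis"),
  row.names = c("S1", "S2", "S3", "S4")
)

# Consistency checks before handing both objects to the package functions.
stopifnot(identical(colnames(data.matrix), rownames(sample_ann)))
stopifnot(all(c("Control", "Sepsis") %in% sample_ann$Group_test))
table(sample_ann$Group_test)
```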
Fingerprint grid visualization
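This step visualizes changes in transcript abundance at the module level. The percentage of a module's constitutive transcripts that are differentially expressed between the two groups is displayed on the grid as the module response. Each spot on the grid represents a module within its aggregate (positions are fixed), and red or blue indicates that the corresponding percentage of transcripts is increased or decreased in abundance.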
gridplot(Group_df,
         cutoff = 15,
         Ref_group = "Control",
         filename = "Group_comparison_")
Individual sample analysis
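Individual_df <- Individualcomparison(data.matrix,
                                      sample_info = sample_ann,
                                      FC = 1.5,     # fold-change cut-off vs. average of control samples
                                      DIFF = 10,    # difference cut-off vs. average of control samples
                                      Group_column = "Group_test",
                                      Ref_group = "Control")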
Individual fingerprint visualization
fingerprintplot(Individual_df,
                sample_info = sample_ann,
                cutoff = 15,
                rowSplit = TRUE,
                Group_column = "Group_test",
                show_ref_group = FALSE,
                Ref_group = "Control",
                Aggregate = NULL,
                filename = "Gen3_Individual_plot",
                height = NULL,
                width = NULL)
References
1. Li, S. et al. Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nat. Immunol. 15, 195–204 (2014).
2. Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
3. Linsley, P. S., Speake, C., Whalen, E. & Chaussabel, D. Copy number loss of the interferon gene cluster in melanomas is linked to reduced T cell infiltrate and poor patient prognosis. PLoS ONE 9, e109760 (2014).
4. Monaco, G. et al. RNA-Seq signatures normalized by mRNA abundance allow absolute deconvolution of human immune cell types. Cell Rep. 26, 1627–1640.e7 (2019).


