Issue 13

Motivation
It is well known that integrating different data sources is valuable because it can unveil new functionalities of genomic expression that might remain dormant in a single-source analysis. Moreover, several studies have shown that analyses of multi-platform data are more powerful. To this end, we consider the omics profiles of circadian genes, such as copy number changes and RNA-sequencing data, together with the subjects' survival response. We develop a Bayesian structural equation model, coupled with linear regressions and a log-normal accelerated failure-time (AFT) regression, that integrates the information from these two platforms to predict subject survival. We place conjugate priors on the regression parameters and derive a Gibbs sampler from their full conditional distributions.
Results
Our extensive simulation study shows that the integrative model fits the data better than its closest competitor. Analyses of glioblastoma and breast cancer data from TCGA, the largest genomics and transcriptomics database, support our findings.
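Background: Circadian oscillation is a fundamental process that regulates many physiological and metabolic processes. Disruption of circadian rhythms is associated with important physiological consequences, including metabolic disorders and cancer. Rhythm genes are implicated in the pathogenesis of diseases such as glioblastoma and breast cancer. Our study focuses on circadian genes and their effect on patient survival.
Methods: We propose a method that combines Bayesian structural equation modeling with a Bayesian accelerated failure time (AFT) model to integrate RNA-seq and DNA CNV data and predict prognosis.
Results: In simulations, the method performed better than an ordinary regression model. Predictions on TCGA cancer datasets showed that integrating CNV and RNA-seq data fits survival better.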
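The Gibbs step described above can be sketched for the survival component alone. This is a minimal sketch and not the authors' sampler. It covers only a log-normal AFT regression, without the structural equation layer that links CNV and RNA-seq. It assumes conditionally conjugate priors: a normal prior on the coefficients and an inverse-gamma prior on the error variance. Right censoring is handled by imputing censored log-times from a truncated normal. The function name `gibbs_lognormal_aft` and all hyperparameter defaults are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def gibbs_lognormal_aft(X, t, event, n_iter=2000, tau2=100.0, a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler for a log-normal AFT model: log T = X @ beta + sigma * eps.

    Assumed priors: beta ~ N(0, tau2 * I) and sigma^2 ~ Inv-Gamma(a0, b0).
    event[i] == 1 means t[i] is an observed event time; 0 means right-censored.
    Returns an (n_iter, p) array of beta draws.
    """
    rng = np.random.default_rng(seed)
    std_normal = NormalDist()
    n, p = X.shape
    log_t = np.log(t)
    y = log_t.copy()                      # latent log-times; censored ones get imputed
    censored = np.flatnonzero(event == 0)
    beta, sigma2 = np.zeros(p), 1.0
    XtX = X.T @ X
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        # beta | y, sigma2 is normal (conjugate update)
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        V = (V + V.T) / 2                 # symmetrize against round-off
        m = V @ (X.T @ y) / sigma2
        beta = rng.multivariate_normal(m, V)
        # sigma2 | y, beta is inverse-gamma: draw a gamma precision and invert it
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
        # censored log-times | rest: normal truncated below at the censoring time
        sd = np.sqrt(sigma2)
        mu = X @ beta
        for i in censored:
            lo = std_normal.cdf((log_t[i] - mu[i]) / sd)
            u = min(max(rng.uniform(lo, 1.0), 1e-12), 1 - 1e-12)
            y[i] = mu[i] + sd * std_normal.inv_cdf(u)
        draws[it] = beta
    return draws
```

Posterior means are taken over draws after a burn-in period. Imputing the censored times keeps every full conditional in closed form, which is what makes a plain Gibbs sampler possible here.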

Motivation
Matrix factorization is an important way to analyze coregulation patterns in transcriptomic data and can reveal tumor signal perturbation status and subtype classification. However, current matrix factorization methods do not provide a clear bicluster structure. Furthermore, these algorithms assume linear combinations, which may not be sufficient to capture coregulation patterns.
Results
We present a new algorithm for Boolean matrix factorization (BMF) via expectation maximization (BEM). BEM is more closely aligned with the molecular mechanism of transcriptomic coregulation and scales to matrices with over 100 million data points. Synthetic experiments showed that BEM outperformed other BMF methods in terms of reconstruction error. Real-world applications demonstrated that BEM is applicable to all kinds of transcriptomic data, including bulk RNA-seq, single-cell RNA-seq and spatial transcriptomics datasets. Given appropriate binarization, BEM extracted coregulation patterns consistent with disease subtypes, cell types or spatial anatomy.
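Background: Clustering samples can reveal cellular heterogeneity, and co-expression clustering of genes can reveal relationships between transcription factors and their target genes. Previous BMF methods impose many restrictions on the Boolean factors.
Methods: This paper proposes a new BMF algorithm that does not need to assume the size of the Boolean factors.
Results: Applications include breast cancer subtype classification, cell-type deconvolution from single-cell sequencing, and segmentation of spatial transcriptomes.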
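To make the Boolean model concrete, here is a small sketch of the Boolean matrix product and of the reconstruction error used to compare BMF methods. It does not implement BEM's expectation-maximization updates. The helper names are illustrative. The per-gene median binarization rule is also an assumption, because the summary only says that an appropriate binarization is required.

```python
import numpy as np


def boolean_product(U, V):
    """Boolean product: (U o V)[i, j] = OR_k (U[i, k] AND V[k, j]) for 0/1 matrices."""
    return (U.astype(int) @ V.astype(int) > 0).astype(int)


def reconstruction_error(X, U, V):
    """Number of entries where X differs from the Boolean reconstruction U o V."""
    return int(np.sum(X != boolean_product(U, V)))


def binarize(expr, quantile=0.5):
    """Hypothetical binarization: 1 where a gene (row) exceeds its own quantile."""
    thr = np.quantile(expr, quantile, axis=1, keepdims=True)
    return (expr > thr).astype(int)


# Two overlapping Boolean factors: gene 1 belongs to both patterns.
U = np.array([[1, 0], [1, 1], [0, 1]])
V = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
print(boolean_product(U, V))
```

In an ordinary matrix product, overlapping factors add up. In the Boolean product they saturate at 1 (logical OR), and this saturation is the modeling difference from linear-combination factorizations.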

Motivation
Many ordinary differential equation (ODE) models have been introduced to replace linear regression models for inferring gene regulatory relationships from time-course gene expression data. However, the observed data are usually not direct measurements of the gene products, and gene regulation can involve an unknown time lag. For both reasons, directly applying traditional ODE models or linear regression models is problematic.
Results
We introduce a lagged ODE model to infer lagged gene regulatory relationships from time-course measurements, which are modeled as linear transformations of the gene products. A time-course microarray dataset from a yeast cell-cycle study is used both for simulation-based assessment of the methods and for real data analysis. The results show that our method, by accounting for both time lag and measurement scaling, performs much better than other linear and ODE models. This indicates that time lag and measurement scaling need to be modeled explicitly in ODE gene regulatory models.
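Background: Inferring gene regulatory networks is a major task in systems biology, and ODE models are often used to describe dynamic systems. However, studies of gene regulatory networks face two problems: (1) a linear scaling is introduced between the experimental measurements and the underlying quantities, and (2) there is a time lag.
Methods: We apply a stochastic nonlinear regression method to estimate all parameters of the ODE model simultaneously.
Results: On simulated datasets, accuracy improved compared with linear models, with ODE models that do not account for linear scaling, and with ODE models that do not account for continuous time lag. The method also performed well on the real dataset.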
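The interplay of lag and scaling can be illustrated with a toy one-regulator model. Assume dx/dt = β·r(t − τ) − γ·x(t), with observations y = a + b·x. Substituting x = (y − a)/b gives dy/dt = bβ·r(t − τ) − γ·y + γa, which is linear in r(t − τ), y and a constant once the lag is fixed.

The sketch below exploits this by scanning candidate lags and fitting each one with ordinary least squares. The grid search is a simplified stand-in for the paper's stochastic nonlinear regression, not a reproduction of it. The model form, the Euler discretization and the function names are all hypothetical.

```python
import numpy as np


def simulate_lagged(r, beta, gamma, lag, h, x0=0.0):
    """Euler simulation of dx/dt = beta * r(t - lag*h) - gamma * x(t)."""
    x = np.empty(len(r))
    x[0] = x0
    for k in range(len(r) - 1):
        reg = r[k - lag] if k >= lag else r[0]   # hold the regulator before the lag
        x[k + 1] = x[k] + h * (beta * reg - gamma * x[k])
    return x


def fit_lag(y, r, h, max_lag):
    """Pick the lag whose linear fit of dy/dt on [r(k - lag), y(k), 1] has the smallest SSE.

    Returns (lag, b*beta, gamma); beta and the scale b are only identifiable as a product.
    """
    dy = (y[1:] - y[:-1]) / h                    # forward difference, matching Euler
    k = np.arange(max_lag, len(y) - 1)           # same rows for every candidate lag
    best = None
    for lag in range(max_lag + 1):
        A = np.column_stack([r[k - lag], y[k], np.ones(len(k))])
        coef, *_ = np.linalg.lstsq(A, dy[k], rcond=None)
        sse = float(np.sum((A @ coef - dy[k]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, lag, coef)
    _, lag, coef = best
    b_beta, neg_gamma, _gamma_a = coef
    return lag, b_beta, -neg_gamma
```

Only the product bβ can be recovered from scaled measurements. This is one concrete reason the measurement scaling has to be modeled explicitly rather than ignored.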
Issue 11

Motivation
Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging.
Results
To infer clonal evolution in tumors from error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on extended robust principal component analysis, a low-rank matrix decomposition method, and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method that can be applied to both single-cell single nucleotide variation (scSNV) and single-cell copy-number variation (scCNV) data. It is efficient and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods on large-scale data in both accuracy and efficiency. We further validated RobustClone on two scSNV and two scCNV datasets and demonstrated that RobustClone could recover the genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset.
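Background: Understanding the underlying evolutionary mechanisms that drive cancer progression and shape intratumoral heterogeneity can guide the principles for predicting and controlling cancer progression, metastasis and treatment response. Single-cell sequencing characterizes cellular heterogeneity better, but it is error-prone. Its errors include false positives (FP), false negatives (FN), MB and overlapping cells (doublets).
Methods: PCA is commonly used to recover low-rank data from data contaminated by heavy noise. This paper extends PCA to RPCA to add robustness. The method uses scSNV and scCNV data to recover the true genotypes of cells, identify subclones and reconstruct the subclonal evolutionary tree.
Results: On both simulated and real datasets, the method outperforms existing methods in accuracy and efficiency.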
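The core of the RPCA step is to split an observed matrix D into a low-rank part L (the clean genotypes) and a sparse part S (the errors). This is done by minimizing ||L||_* + λ||S||_1 subject to L + S = D.

The sketch below solves this principal component pursuit problem with a standard augmented Lagrangian (ADMM) scheme, using commonly used default choices of λ and μ. It is plain RPCA. RobustClone's extended RPCA (for example, its handling of missing entries) and its subclone and tree reconstruction steps are not reproduced.

```python
import numpy as np


def soft_threshold(M, tau):
    """Elementwise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def singular_value_threshold(M, tau):
    """Shrink singular values: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(D, lam=None, mu=None, tol=1e-7, max_iter=2000):
    """Principal component pursuit: min ||L||_* + lam * ||S||_1  s.t.  L + S = D."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))           # common default weight
    if mu is None:
        mu = m * n / (4.0 * np.abs(D).sum())     # common default penalty parameter
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                         # Lagrange multiplier
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        L = singular_value_threshold(D - S + Y / mu, 1.0 / mu)
        S = soft_threshold(D - L + Y / mu, lam / mu)
        R = D - L - S                            # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) / norm_D < tol:
            break
    return L, S
```

The nuclear norm encourages L to be low rank, which matches the few distinct subclonal genotypes. The l1 penalty lets S absorb sparse, large errors such as false positives and false negatives.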

Motivation
In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder cell composition quantification using cell sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy.
Results
We introduce TOAST/-P and TOAST/+P (TOols for the Analysis of heterogeneouS Tissues), two partial reference-free algorithms for estimating the cell composition of heterogeneous tissues from their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, into the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimates than existing methods.
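Background: Cell composition (cell types and their proportions) can be obtained experimentally with techniques such as immunohistochemistry, flow cytometry and single-cell sequencing. Reference-based (RB) methods require reference data obtained from purified tissues, which serve as predictors. Reference-free (RF) methods need no reference but require a large number of samples. Partial reference-free (PRF) methods use additional data to improve the estimates.
Methods: We propose partial RF deconvolution methods, TOAST/-P and TOAST/+P, which use gene expression data together with cell-type-specific markers and prior knowledge of composition to guide cell composition estimation. TOAST/-P uses no prior cell composition, whereas TOAST/+P uses a prior cell composition.
Results: The methods are more accurate and robust than existing methods.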
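The reference-based setting described in the background can be written as a mixing model y ≈ W·h. Here y is a bulk expression profile, the columns of W are reference profiles from purified cell types, and h holds the cell-type proportions (non-negative, summing to 1).

The sketch below estimates h by projected gradient descent onto the probability simplex. It illustrates only the RB baseline: TOAST/-P and TOAST/+P are partial reference-free and do not require W. The function names are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {h : h >= 0, sum(h) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]     # last index kept positive
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def reference_based_proportions(W, y, n_iter=2000):
    """Minimize ||W h - y||^2 over the simplex by projected gradient descent."""
    step = 1.0 / np.linalg.norm(W, 2) ** 2      # 1 / Lipschitz constant of the gradient
    h = np.full(W.shape[1], 1.0 / W.shape[1])   # start from equal proportions
    for _ in range(n_iter):
        h = project_simplex(h - step * W.T @ (W @ h - y))
    return h
```

The simplex constraint encodes the biological fact that proportions are non-negative and sum to one. An RF or PRF method has to estimate W as well, which is where markers and prior compositions become useful.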

Motivation
Cell-type-specific surface proteins can be exploited as valuable markers for a range of applications, including immunophenotyping live cells, targeted drug delivery and in vivo imaging. Despite their utility and relevance, the unique combination of molecules present at the cell surface is not yet described for most cell types. A significant challenge in analyzing 'omic' discovery datasets is selecting the candidate markers that are most applicable to downstream applications.
Results
Here, we developed GenieScore, a prioritization metric that integrates a consensus-based prediction of cell surface localization with user-input data to rank-order candidate cell-type-specific surface markers. In this report, we demonstrate the utility of GenieScore for analyzing human and rodent data from proteomic and transcriptomic experiments in the areas of cancer, stem cell and islet biology. We also demonstrate that variants of GenieScore, termed IsoGenieScore and OmniGenieScore, can efficiently prioritize co-expressed and intracellular cell-type-specific markers, respectively.
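Background: Cell-type-specific surface proteins can serve as valuable markers for many applications, including immunophenotyping of live cells, targeted drug delivery and in vivo imaging. However, the surface proteins of most cell types are unknown.
Methods: Here, we developed GenieScore, a prioritization metric that integrates a consensus-based prediction of cell surface localization with user-input quantitative data (proteomic or transcriptomic) to rank candidate cell-type-specific surface markers. GenieScore is the product of three terms: whether the protein is a surface protein, whether its abundance differs, and its signal strength (whether it can be detected by a specific antibody).
Results: We developed SurfaceGenie, a web interface that calculates GenieScore and provides ontology annotations.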
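GenieScore is described as a product of three factors, so a toy version is easy to sketch. The concrete components below are assumptions chosen for illustration:

- a user-supplied surface-localization probability;
- the Gini coefficient of abundance across cell types as the differential-abundance term;
- log-scaled maximum abundance as the signal-strength term.

SurfaceGenie's actual definitions may differ, and the function names are hypothetical.

```python
import numpy as np


def gini(x):
    """Gini coefficient of non-negative values: 0 = uniform, (n-1)/n = all in one cell type."""
    x = np.sort(np.asarray(x, dtype=float))
    n, total = len(x), x.sum()
    if total == 0:
        return 0.0
    i = np.arange(1, n + 1)
    return float(2.0 * np.sum(i * x) / (n * total) - (n + 1) / n)


def genie_like_scores(surface_prob, abundance):
    """Product of surface probability, dispersion (Gini) and relative signal strength.

    surface_prob: (p,) values in [0, 1]; abundance: (p, c) proteins x cell types.
    """
    dispersion = np.array([gini(row) for row in abundance])
    signal = np.log10(1.0 + abundance.max(axis=1))
    if signal.max() > 0:
        signal = signal / signal.max()          # scale signal strength to [0, 1]
    return surface_prob * dispersion * signal
```

Because the terms are multiplied, a candidate scores highly only if it is likely on the surface, unevenly distributed across cell types and detectable. A zero in any one factor removes it from the top of the ranking.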

Background
Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. Despite being a popular machine-learning technique across fields, supervised learning has been applied in only a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning performs across different networks and diverse gene classification tasks, and how it compares with label propagation, the widely benchmarked canonical approach to this problem.
Results
In this study, we present a comprehensive benchmark of supervised learning for network-based gene classification. We evaluate this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal of naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows.
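Background: A major challenge of the post-genomic era is to characterize every gene in the genome by the cellular pathways it participates in and the multifactorial traits and diseases associated with it. Computationally predicting associations between genes and pathways, traits or diseases (a task referred to here as "gene classification") is critical to this effort.
Methods: We evaluate supervised learning for network-based gene classification.
Results: Supervised learning outperforms label propagation and node embeddings.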
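The two approaches being compared can be sketched side by side on an adjacency matrix A.

- Label propagation spreads seed labels through the normalized graph, using a Zhou-style iteration F ← αSF + (1 − α)Y.
- The supervised approach treats each gene's full row of A as its feature vector and trains a classifier on it.

The choice of an L2-regularized logistic regression fitted by plain gradient descent is an assumption for illustration, as are all hyperparameters. This is not the benchmark's exact setup.

```python
import numpy as np


def label_propagation(A, seeds, alpha=0.8, n_iter=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y with S = D^-1/2 A D^-1/2."""
    deg = A.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    S = A * inv_sqrt[:, None] * inv_sqrt[None, :]
    f = seeds.astype(float)
    for _ in range(n_iter):
        f = alpha * S @ f + (1 - alpha) * seeds
    return f


def supervised_scores(A, labels, train_idx, lr=0.1, n_iter=500, l2=1e-2):
    """Logistic regression whose features for gene i are row A[i] (full connectivity)."""
    X_train, y_train = A[train_idx], labels[train_idx]
    w, b = np.zeros(A.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
        w -= lr * (X_train.T @ (p - y_train) / len(y_train) + l2 * w)
        b -= lr * np.mean(p - y_train)
    return A @ w + b                             # decision scores for every gene
```

Label propagation uses only positive seeds and the graph topology. The supervised model also learns from negative examples and can weight each neighbor differently, which is one intuition for its advantage.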

Motivation
Cell-to-cell variation has uncovered associations between cellular phenotypes. However, it remains challenging to address the cellular diversity of such associations.
Results
Here, we do not rely on the conventional assumption that the same association holds throughout the entire cell population. Instead, we assume that associations may exist in a certain subset of the cells. We developed CEllular Niche Association (CENA) to reliably predict pairwise associations together with the cell subsets in which the associations are detected. CENA does not rely on predefined subsets but only requires that the cells of each predicted subset share a certain characteristic state. CENA may therefore reveal dynamic modulation of dependencies along cellular trajectories of temporally evolving states. Using simulated data, we show the advantage of CENA over existing methods and its scalability to a large number of cells. Application of CENA to real biological data demonstrates dynamic changes in associations that would otherwise be masked.
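Background: Cell trajectories have been used to describe temporal changes in gene expression, but they have not yet been used to study temporal changes in associations (dependencies and interactions between cellular phenotypes). These associations are masked when the entire cell population is analyzed, so the analysis is performed on specific cell subsets.
Methods: We developed CENA, a new method designed to capture associations that exist mainly within a subset of cells. CENA recasts the original problem of identifying the cell subset in which an association holds as a heavy-subnetwork detection problem.
Results: CENA addresses an important task in single-cell genomics: exploring how associations change across the cell-state space.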
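The masking effect that motivates CENA can be shown with a simpler tool: order cells by a state variable such as pseudotime, then compute the correlation of two features within sliding windows. This is not CENA's heavy-subnetwork formulation, which searches for subsets without predefined windows. It is a hypothetical baseline that only illustrates how an association confined to a subset of cells is diluted in a whole-population correlation.

```python
import numpy as np


def windowed_correlation(x, y, pseudotime, window=100, step=50):
    """Pearson correlation of x and y in sliding windows of cells ordered by pseudotime.

    Returns (window centers in pseudotime, correlation per window).
    """
    order = np.argsort(pseudotime)
    xs, ys, ts = x[order], y[order], pseudotime[order]
    centers, corrs = [], []
    for start in range(0, len(xs) - window + 1, step):
        sl = slice(start, start + window)
        centers.append(np.median(ts[sl]))
        corrs.append(np.corrcoef(xs[sl], ys[sl])[0, 1])
    return np.array(centers), np.array(corrs)


# Toy data: y tracks x only in late cells (pseudotime > 0.8), independent elsewhere.
rng = np.random.default_rng(0)
n = 1000
t = rng.uniform(0, 1, n)
x = rng.normal(size=n)
y = np.where(t > 0.8, x + 0.1 * rng.normal(size=n), rng.normal(size=n))
centers, corrs = windowed_correlation(x, y, t, window=150, step=50)
```

Here the whole-population correlation is weak, while the windows at late pseudotime show a near-perfect association. A fixed window only approximates the subset, whereas CENA infers the subset itself.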