10X Single-Cell (10X Spatial Transcriptomics) Trajectory Analysis (Pseudotime Analysis) with VECTOR: A Paper Walkthrough

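Hello. Yesterday we shared VECTOR's example code in the post "10X Single-Cell (10X Spatial Transcriptomics) Trajectory Analysis (Pseudotime Analysis) with VECTOR". The paper was published in Cell Reports in August 2020. Its principles are worth summarizing carefully, so this short post walks through the paper, picks out the key points, and looks at the tool's features and applications, so that we know what we are working with.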

SUMMARY

A key step in trajectory inference is the determination of starting cells (everyone will know this from experience, which is why cell annotation is needed before any customized analysis), which is typically done by using manually selected marker genes (most cell annotation today still relies on manually chosen markers; similarity-mapping methods still have too many problems). In this study, we find that the quantile polarization (what exactly is that?) of a cell's principal-component values is strongly associated with their respective states in development hierarchy (principal-component values correlate with developmental state), and therefore provides an unsupervised solution for determining the starting cells (this deserves a closer look). Based on this finding, we developed a tool named VECTOR that infers vectors of developmental directions for cells in Uniform Manifold Approximation and Projection (UMAP). In seven datasets of different developmental scenarios, VECTOR correctly identifies the starting cells and successfully infers the vectors of developmental directions. VECTOR is freely available for academic use at https://github.com/jumphone/Vector. (The examples work well, as every paper claims.)

INTRODUCTION

Let's distill the key points here.

The algorithms behind TI methods (Monocle, PAGA, Slingshot, and so on; tools everyone should know well) share two common design components:

  • the use of dimensional reduction, clustering, or graph-building techniques to convert scRNA-seq data into a simplified representation of trajectory, and the ordering of cells along the trajectory. (Dimensionality reduction and clustering; very routine.)
  • because there may be many alternative trajectories to choose from, most TI methods require the use of prior information, such as a set of known marker genes, to determine the starting cells (SCs) of the correct trajectory. (Put bluntly, you need cell annotation to decide where development starts; trajectory analysis without cell annotation is not to be trusted.)

Hand-picking markers is subjective and does introduce substantial error. Recently, a new study found that RNA velocity (velocyto does well on this front, with less manual intervention), the time derivative of gene expression states, could be estimated by modeling the relationship between unspliced and spliced mRNAs, making it possible to deduce the future transcriptional states of cells and consequently the developmental trajectories without the need of prior information for determining SCs (it infers developmental trajectories from splicing dynamics, and high-impact papers use it often). Without any prior information, RNA velocity was used to identify a novel developmental model of neural crest lineage cells, demonstrating its usefulness in developmental lineage analysis.
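For reference, the "relationship between unspliced and spliced mRNAs" is usually written as the kinetic model below. This is the standard formulation from the RNA velocity literature, not notation taken from the VECTOR paper: u and s are the unspliced and spliced abundances of a gene, and α, β, γ are its transcription, splicing, and degradation rates. The velocity of the gene is ds/dt.

```latex
\frac{du}{dt} = \alpha - \beta u, \qquad
\frac{ds}{dt} = \beta u - \gamma s
```

A positive ds/dt means the gene is being up-regulated, a negative one that it is being down-regulated, which is what lets the method extrapolate a cell's future state.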

Now for the drawbacks of RNA velocity:

  • it requires reanalyzing raw sequencing data to determine intron reads for quantifying unspliced mRNAs, which is time-consuming and sometimes may not be possible because of the limitation of the sequencing platforms. (I would not really call this a drawback.)

PCA is indeed indispensable in single-cell analysis today. Cells at different developmental states have been shown to have distinct patterns of PC values. However, the patterns of a cell's PC values have not yet been fully explored in the current TI methods (I have reservations about this claim). In this study, we observed that the averaged polarization of a cell's PC values across a large number of PC subspaces is strongly correlated with their developmental states, with SCs having the most polarized PC values (worth noting: have you ever noticed anything special about the PC values of starting cells? We will check the Methods shortly). We thus provided an unsupervised solution for determining the SCs based on the averaged polarization of a cell's PC values (the starting point is determined from PC values; I would not call this unsupervised, it has to be semi-supervised). The authors' examples look good, of course, but we should take some care when applying the method to our own data.

RESULTS

The first step is to validate the tool's reliability on two single-cell datasets whose cell types are already well defined.

When we run PCA, we usually take the top ten or so PCs for downstream analysis, and Seurat itself computes 50 PCs. The authors instead use 150 PCs here; what that choice is based on is something to look for in the Methods.


From the analysis of these datasets: For both oligodendrocyte and enterocyte lineages, we found that cells at earlier developmental stages tend to have more extreme PC values (either very small or very large, i.e., highly polarized; so that is what "polarization" means), while those at later developmental stages tend to have more intermediate PC values (I had never noticed this pattern; worth trying on our own data). Such patterns were more obvious if we inspected the density of the PC value quantiles at all 150 PC subspaces for cells at different developmental stages. (In the paper's figure the pattern is indeed quite clear.)

To quantify the polarization of the PC value quantiles, we next defined a quantile polarization (QP) score that averages the polarization of the PC value quantile of a given cell across all 150 PC subspaces. (This is the definition of QP; honestly, it is the first time I have seen this approach.) The QP score then correlates strongly with the developmental hierarchy, with cells at the earliest developmental stages having the greatest QP scores.
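To make the QP idea concrete, here is a minimal sketch in Python. It is not the VECTOR implementation (the tool itself is written in R), and the polarization measure is an assumption on my part: each cell's quantile within a PC is scored by its distance from the median quantile 0.5, rescaled to [0, 1]. The function name `quantile_polarization` and the simulated data are purely illustrative.

```python
import numpy as np

def quantile_polarization(pc_values):
    """Return one QP-like score per cell.

    pc_values: (n_cells, n_pcs) array of PC coordinates.
    ASSUMPTION: polarization of a quantile q is |q - 0.5| * 2;
    the exact formula used by VECTOR may differ.
    """
    pc_values = np.asarray(pc_values, dtype=float)
    n_cells = pc_values.shape[0]
    # Rank cells within each PC (0 .. n_cells-1), then map ranks to quantiles in [0, 1].
    ranks = pc_values.argsort(axis=0).argsort(axis=0)
    quantiles = ranks / (n_cells - 1)
    # Extreme quantiles (near 0 or 1) score high, intermediate ones score low.
    polarization = np.abs(quantiles - 0.5) * 2.0
    # Average over all PC subspaces.
    return polarization.mean(axis=1)

# Simulated example: 200 cells x 150 PCs, first 20 cells made more extreme.
rng = np.random.default_rng(0)
pcs = rng.normal(size=(200, 150))
pcs[:20] *= 5.0
qp = quantile_polarization(pcs)
print(qp[:20].mean(), qp[20:].mean())
```

In this toy data the first 20 cells get clearly higher scores, which is the behavior the paper reports for cells at the earliest developmental stages.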

We further experimented with using a different number of PCs, and found that such correlations were robust if the number of PCs used could explain ~20%–80% of the total variance.
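If you want to check where your own data falls in that 20%–80% window, a rough sketch follows. It assumes "total variance" means the variance of the centered input matrix (in practice that would be the scaled expression matrix of your variable genes); `n_pcs_for_variance` is a hypothetical helper, not part of VECTOR or Seurat.

```python
import numpy as np

def n_pcs_for_variance(X, target):
    """Smallest number of PCs whose cumulative explained variance reaches `target`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to per-PC variance.
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2 / np.sum(s**2))
    # First index where the cumulative fraction is >= target, converted to a count.
    k = min(int(np.searchsorted(cumulative, target)) + 1, len(cumulative))
    return k, cumulative

# Simulated matrix: 300 cells x 40 features with decreasing scale per feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40)) * np.linspace(5.0, 0.5, 40)
k20, cum = n_pcs_for_variance(X, 0.20)
k80, _ = n_pcs_for_variance(X, 0.80)
print(f"PCs for 20%: {k20}, PCs for 80%: {k80}")
```

Any number of PCs between `k20` and `k80` would sit inside the range the authors describe as robust.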

Inferring trajectories directly on the UMAP, an approach also used in Monocle 3

In essence, VECTOR treats a two-dimensional UMAP representation of cells as an image and splits it into a number of pixels. After removing those pixels that do not include any cells, VECTOR focuses on the largest connected pixel (LCP) network in UMAP to infer developmental directions. (So the tool infers trajectories directly on the UMAP plot.) By averaging the QP scores of cells inside each pixel, VECTOR identifies the high-scoring pixels that have the greatest QP scores (top 10% by default). (The PCA polarization values are used to infer the starting cells.) The authors also note that this approach may produce false positives. Here, VECTOR considers not only QP scores but also the connectivity of cells in UMAP; from the high-scoring pixels, it selects the largest connected high-scoring pixels as the starting point of development. (The QP scores are combined with the UMAP result to obtain the starting cells.) Those isolated high-scoring pixels that are likely false positives are then filtered out. (There is actually a loophole here.)

For each pixel in the LCP network, VECTOR computes a pseudotime score defined as its network distance to the starting point of development. (Most tools compute it this way.) Finally, for a given target pixel VECTOR computes a vector (with arrow and length) by taking into consideration the information of all pixels in the LCP network, including the direction of the unit vector pointing from a selected pixel to the target pixel, the relative pseudotime score between the target pixel and the selected pixel, and the closeness of the selected pixel to the target pixel in the LCP network, and so on. (The output looks much like an RNA velocity plot.) The arrow direction is the direction of development; arrows are shorter near the starting point and in mid-development, and longer near the developmental end point.
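The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the VECTOR code: pixels are connected through 8-neighbor adjacency, and the "closeness" weight is taken as inverse squared Euclidean distance rather than closeness in the LCP network. All function names (`components`, `pseudotime`, `infer_vectors`) are hypothetical.

```python
from collections import deque
import numpy as np

NEIGHBORS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

def components(pixels):
    """Connected components of a set of (i, j) pixels (ASSUMED 8-neighbor adjacency)."""
    pixels, seen, comps = set(pixels), set(), []
    for start in sorted(pixels):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            i, j = queue.popleft()
            comp.append((i, j))
            for di, dj in NEIGHBORS:
                nb = (i + di, j + dj)
                if nb in pixels and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps

def pseudotime(lcp, starts):
    """Multi-source BFS: network distance from each LCP pixel to the start pixels."""
    dist = {p: 0 for p in starts}
    queue = deque(starts)
    while queue:
        i, j = queue.popleft()
        for di, dj in NEIGHBORS:
            nb = (i + di, j + dj)
            if nb in lcp and nb not in dist:
                dist[nb] = dist[(i, j)] + 1
                queue.append(nb)
    return dist

def infer_vectors(pixel_qp, top_frac=0.10):
    """pixel_qp: dict mapping occupied pixel (i, j) -> mean QP score of its cells."""
    # 1. Largest connected pixel (LCP) network among the occupied pixels.
    lcp = set(max(components(pixel_qp.keys()), key=len))
    # 2. High-scoring pixels: top fraction by mean QP score.
    ranked = sorted(lcp, key=lambda p: pixel_qp[p], reverse=True)
    high = ranked[:max(1, int(round(top_frac * len(ranked))))]
    # 3. Largest connected group of high-scoring pixels is the start; isolated ones are dropped.
    starts = max(components(high), key=len)
    # 4. Pseudotime is the network distance to the start.
    pt = pseudotime(lcp, starts)
    # 5. Sum unit vectors pointing from every other pixel to the target pixel,
    #    weighted by relative pseudotime and (ASSUMED) inverse squared distance.
    vectors = {}
    for t in lcp:
        v = np.zeros(2)
        for s in lcp:
            if s == t:
                continue
            d = np.array(t, dtype=float) - np.array(s, dtype=float)
            norm = np.linalg.norm(d)
            v += (d / norm) * (pt[t] - pt[s]) / norm**2
        vectors[t] = v
    return starts, pt, vectors

# Toy example: a line of 30 pixels with QP decreasing along it, one isolated
# high-scoring pixel inside the line, and one stray pixel outside the LCP.
qp_map = {(i, 0): 1.0 - 0.03 * i for i in range(30)}
qp_map[(20, 0)] = 0.98   # isolated high scorer: a likely false positive
qp_map[(40, 40)] = 5.0   # not connected to the main network
starts, pt, vectors = infer_vectors(qp_map)
print(sorted(starts), pt[(29, 0)], vectors[(10, 0)])
```

In the toy example the stray pixel never enters the LCP network, the isolated high scorer at (20, 0) is filtered out, and the vector at (10, 0) points toward increasing pseudotime. Note that the filter only helps when the false positives are isolated; a false-positive region larger than the true starting region would win instead.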

Application examples

The two annotated datasets from earlier perform well: the developmental starting point and the trajectory are both identified successfully.



Applied to the other example datasets, the results are also good.


Comparing VECTOR with RNA velocity

VECTOR performs better. The RNA velocity result is truncated in places, which may be caused by the lack of intron reads in these cells. Velocyto also has a hard time identifying the developmental starting point.

Next, the tool is applied to data with multiple developmental branches.


The results are good. The software also lets you select the developmental starting point manually.

METHOD

The workflow of VECTOR

Given a two-dimensional UMAP representation of cells, VECTOR treats it as an image and then splits it into a number of pixels. We provide a parameter called "N" for defining the number of pixels in UMAP.
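A minimal sketch of that pixelization step in Python, assuming N is the number of bins along each axis of the UMAP bounding box (the paper only says N defines the number of pixels, so this reading is my assumption). `pixelize` and `occupied_pixels` are illustrative names, not VECTOR functions.

```python
import numpy as np

def pixelize(umap_xy, n=30):
    """Assign each cell to a pixel of an n x n grid over the UMAP bounding box."""
    xy = np.asarray(umap_xy, dtype=float)
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    # Guard against a degenerate axis where all cells share one coordinate.
    span = np.where(hi > lo, hi - lo, 1.0)
    # Scale coordinates to [0, n] and clip so the maximum lands in the last pixel.
    idx = np.floor((xy - lo) / span * n).astype(int)
    return np.clip(idx, 0, n - 1)

def occupied_pixels(idx):
    """Pixels that contain at least one cell, as a set of (i, j) tuples."""
    return {tuple(p) for p in idx.tolist()}

# Simulated UMAP coordinates for 500 cells.
rng = np.random.default_rng(2)
umap_xy = rng.uniform(-10.0, 10.0, size=(500, 2))
idx = pixelize(umap_xy, n=30)
pixels = occupied_pixels(idx)
print(len(pixels), "of", 30 * 30, "pixels contain cells")
```

Empty pixels simply never appear in `pixels`, which corresponds to the step of removing pixels that do not include any cells. A larger N gives finer arrows but leaves fewer cells per pixel to average QP scores over.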


So the method involves not only data processing but also ideas from image processing.

Give it a try.

Life is good, and even better with you.
