L1000 data: key concepts and processing pipeline

L1000 data proceed through a data processing pipeline outlined in the figure below. Briefly, the pipeline captures raw data from Luminex FlexMap 3D scanners as they are generated, deconvolutes 978 transcripts from only 500 Luminex bead colors, normalizes the data based on 80 invariant control genes, infers the expression of the non-measured transcripts, determines which genes are differentially expressed following a perturbation relative to controls, and generates composite signatures across biological replicates. Along the way, the data are subjected to rigorous quality control filters at both the sample and plate level.


Data processing pipeline

Level 1

Level 1 - LXB - Raw fluorescent intensity (FI) values measured for every bead detected by Luminex scanners. The FI is proportional to the amount of amplicon bound to the bead, and hence also proportional to the transcript abundance of the genes that particular bead is interrogating. Each 384-well plate generates 384 LXB files, where each file contains a fluorescent intensity value for each observed bead in the well. Here, the data from each perturbagen treatment is referred to as a profile, experiment, or instance.
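
As a rough illustration of what Level 1 data look like once read into memory, the sketch below groups the bead reads of one well by bead color. It does not parse LXB files; the `(bead color, intensity)` tuples, the color ids, and the function name are hypothetical stand-ins chosen for this example.

```python
from collections import defaultdict


def group_beads_by_color(bead_reads):
    """Group (bead_color, intensity) reads from one well by bead color."""
    by_color = defaultdict(list)
    for color, intensity in bead_reads:
        by_color[color].append(intensity)
    return dict(by_color)


# Hypothetical reads from a single well: (bead color id, fluorescent intensity).
well_reads = [(12, 410.0), (12, 395.5), (47, 1200.0), (12, 402.0), (47, 1180.0)]

grouped = group_beads_by_color(well_reads)
# Number of beads observed per color in this well.
bead_counts = {color: len(values) for color, values in grouped.items()}
```

Each bead color then carries a list of intensities, which is the input that the Level 2 deconvolution step works on.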

Level 2

Level 2 - GEX - Gene expression levels for the 978 landmark genes, deconvoluted from the measured fluorescent intensity values. (See the supplementary information in Subramanian et al., 2017 for details on peak deconvolution.)
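
Because 978 transcripts are read from only 500 bead colors, a single color can carry signal from two genes. The sketch below illustrates the general idea with a plain one-dimensional 2-means split. It is not the pipeline's actual peak deconvolution algorithm (see the supplementary information cited above), and it assumes, beyond what this text states, that the two genes sharing a color are represented by unequal numbers of beads, so the larger cluster can be assigned to one gene and the smaller cluster to the other. The gene names are hypothetical.

```python
from statistics import mean, median


def two_means_1d(values, iterations=50):
    """Split 1-D values into a low and a high cluster with plain 2-means."""
    lo_center, hi_center = min(values), max(values)
    low, high = list(values), []
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo_center) <= abs(v - hi_center)]
        high = [v for v in values if abs(v - lo_center) > abs(v - hi_center)]
        if not high:  # all values identical: nothing to split
            break
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo_center, hi_center):
            break  # assignments have converged
        lo_center, hi_center = new_lo, new_hi
    return low, high


def deconvolute(values, gene_major, gene_minor):
    """Assign the larger cluster to gene_major and the smaller to gene_minor.

    Assumption for this sketch: gene_major contributes more beads than
    gene_minor on the shared bead color.
    """
    low, high = two_means_1d(values)
    if not high:
        level = median(values)
        return {gene_major: level, gene_minor: level}
    major, minor = (low, high) if len(low) >= len(high) else (high, low)
    return {gene_major: median(major), gene_minor: median(minor)}


# Hypothetical intensities for one bead color in one well.
intensities = [100, 105, 98, 102, 101, 99, 400, 410, 395]
result = deconvolute(intensities, "GENE_A", "GENE_B")
```

Here six beads cluster near 100 and three near 400, so the sketch assigns the lower peak to `GENE_A` and the higher peak to `GENE_B`.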

Level 3

Level 3a - NORM - Gene expression values (GEX, Level 2) are normalized to invariant gene set curves and then quantile normalized across each plate.
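
The quantile normalization part of this step can be sketched as follows. This is a minimal textbook version, not the pipeline's code: it omits the invariant gene set scaling entirely, breaks ties by gene order, and uses a hypothetical function name and toy data.

```python
def quantile_normalize(profiles):
    """Quantile normalize a plate given as one list of gene values per well.

    After normalization every well has the same set of values: the mean of
    the k-th smallest value across wells, placed at the gene that held the
    k-th smallest value in that well.
    """
    n_genes = len(profiles[0])
    if any(len(p) != n_genes for p in profiles):
        raise ValueError("all wells must have the same number of genes")
    sorted_profiles = [sorted(p) for p in profiles]
    reference = [
        sum(sp[k] for sp in sorted_profiles) / len(profiles)
        for k in range(n_genes)
    ]
    normalized = []
    for p in profiles:
        # Gene indices ordered from lowest to highest value in this well.
        order = sorted(range(n_genes), key=lambda i: p[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical wells with three genes each.
plate = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(plate)
```

Within each well the ordering of genes is preserved, while the distribution of values becomes identical across wells.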

Level 3b - INF - Values for 11,350 additional genes not directly measured in the L1000 assay are inferred from the normalized values of the 978 landmark genes.
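
One way to picture the inference step is as a weighted combination of landmark values. The sketch below assumes a linear model with one intercept and one weight per landmark gene; the text above only says the values are inferred from the landmarks, so the model form, the function name, and the numbers are assumptions made for illustration.

```python
def infer_expression(landmark_values, weights, intercept):
    """Predict one non-measured gene as intercept + sum(weight * landmark)."""
    if len(landmark_values) != len(weights):
        raise ValueError("need exactly one weight per landmark gene")
    return intercept + sum(w * x for w, x in zip(weights, landmark_values))


# Toy example with three landmarks instead of 978; all numbers are made up.
landmarks = [8.0, 10.0, 6.0]
weights = [0.5, 0.25, 0.0]
inferred = infer_expression(landmarks, weights, intercept=1.0)
```

In a full-size version, each of the 11,350 inferred genes would have its own intercept and its own vector of 978 weights.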

Level 4

Level 4 - ZS - Z-scores for each gene based on Level 3 with respect to the entire plate population. This comparison of profiles to their appropriate population control generates a list of differentially expressed genes.
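
A z-score of one gene against the plate population can be sketched as below. This version uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, which is an assumption on top of the description above; the 1.4826 factor makes the MAD comparable to a standard deviation for normally distributed data, and the `min_mad` floor is a hypothetical guard against division by zero.

```python
from statistics import median


def robust_zscores(values, min_mad=0.1):
    """Z-score one gene across all wells of a plate using median and MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * max(mad, min_mad)
    return [(v - med) / scale for v in values]


# Hypothetical Level 3 values of one gene in five wells of the same plate.
gene_across_plate = [8.0, 9.0, 10.0, 11.0, 20.0]
z = robust_zscores(gene_across_plate)
```

The well with value 20.0 receives a large positive score, while the well sitting at the plate median scores exactly zero.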

Level 5

Level 5 - MODZ - replicate-collapsed z-score vectors based on Level 4. Replicate collapse generates one differential expression vector, which we term a signature. Connectivity analyses are performed on signatures.
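
Replicate collapse can be sketched as a weighted average in which each replicate is weighted by how well it agrees with the others. The version below weights each replicate by its mean Spearman correlation with the remaining replicates, floors the weight at a small positive value, and normalizes the weights to sum to one. The floor value, the function names, and the toy data are assumptions for illustration, and the production pipeline may differ in detail.

```python
from math import sqrt


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def collapse_replicates(replicates, min_weight=0.01):
    """Return (signature, weights) for a list of replicate z-score vectors."""
    n = len(replicates)
    if n == 1:
        return list(replicates[0]), [1.0]
    raw = []
    for i in range(n):
        others = [spearman(replicates[i], replicates[j]) for j in range(n) if j != i]
        raw.append(max(sum(others) / len(others), min_weight))
    total = sum(raw)
    weights = [w / total for w in raw]
    n_genes = len(replicates[0])
    signature = [
        sum(weights[r] * replicates[r][g] for r in range(n)) for g in range(n_genes)
    ]
    return signature, weights


# Three hypothetical replicates over four genes; the third disagrees.
reps = [
    [2.0, -1.0, 0.5, -3.0],
    [1.8, -0.9, 0.4, -2.5],
    [-1.0, 2.0, -0.5, 0.3],
]
signature, weights = collapse_replicates(reps)
```

In this toy example the third replicate correlates negatively with the other two, so its weight falls to the floor and the signature is dominated by the two concordant replicates.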

For levels 1 and 2, values are present for only the 978 landmark features. For levels 3-5, values are present for each of the 12,328 genes (978 landmark plus 11,350 inferred).
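
The feature counts per level can be written down as a small lookup, which is convenient for sanity-checking the shape of a loaded matrix. The constant and function names below are hypothetical; the numbers are the ones quoted above.

```python
N_LANDMARK = 978
N_INFERRED = 11_350
N_TOTAL = N_LANDMARK + N_INFERRED  # landmark plus inferred genes


def expected_feature_count(level):
    """Number of gene features expected at a given data level (1 to 5)."""
    if level in (1, 2):
        return N_LANDMARK
    if level in (3, 4, 5):
        return N_TOTAL
    raise ValueError(f"unknown data level: {level}")
```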

The code for the data processing pipeline is available in the cmapM GitHub repository. The procedure to replicate each step of the pipeline, along with sample data, is detailed here.
