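10X single-cell and spatial communication analysis: a walkthrough of the latest CellphoneDB (v4)

Author: Evil Genius

I recently taught a class on cell-cell communication and noticed that many tools had been updated. Here I share what is new in CellphoneDB v4.0.

Earlier post: Communication analysis that accounts for spatial location: CellphoneDB (v3.0)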

Installation is different now: CellphoneDB is fully packaged, so it can be set up directly in a conda environment from the Linux command line.

conda create -n cpdb python=3.8

conda activate cpdb

pip install cellphonedb

Updates to the analysis methods (three methods to choose from)

  • METHOD 1 simple analysis (>= v1): Here, no statistical analysis is performed. CellphoneDB will output the mean for all the interactions for each cell type pair combination. Note that CellphoneDB will report the means only if all the gene members of the interactions are expressed by at least a fraction of cells in a cell type (threshold). If the condition threshold is not met, the interaction will be ignored in the corresponding cell type pairs.
With method 1, every cell type pair that expresses the ligand and the receptor is reported directly.
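from cellphonedb.src.core.methods import cpdb_analysis_method

out_path = 'results'  # placeholder output directory; set it to your own

means, deconvoluted = cpdb_analysis_method.call(
         cpdb_file_path = 'cellphonedb.zip',     # CellphoneDB database archive
         meta_file_path = 'test_meta.txt',       # cell barcode to cell type mapping
         counts_file_path = 'test_counts.h5ad',  # expression matrix
         counts_data = 'hgnc_symbol',            # gene identifier type used in the counts
         output_path = out_path)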

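The results contain only the means and deconvoluted tables for the receptor-ligand pairs (means.txt and deconvoluted.txt in the naming used in the output description below).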
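The rule behind method 1 can be illustrated with a toy example. This is a sketch, not CellphoneDB's implementation: the expression values, the cluster names and the helpers `fraction_expressing` and `interaction_mean` are all made up for illustration. It reports the mean of the ligand and receptor averages for a cell type pair only when both genes pass the expression-fraction threshold.

```python
from statistics import mean

# Toy per-cell expression values, grouped by cluster (hypothetical data).
expr = {
    "LIG": {"T": [0.0, 2.0, 4.0, 0.0], "B": [0.0, 0.0, 0.0, 0.0]},
    "REC": {"T": [1.0, 0.0, 0.0, 0.0], "B": [3.0, 5.0, 0.0, 4.0]},
}

def fraction_expressing(gene, cluster):
    """Fraction of cells in the cluster with non-zero expression of the gene."""
    values = expr[gene][cluster]
    return sum(v > 0 for v in values) / len(values)

def interaction_mean(ligand, receptor, cluster_a, cluster_b, threshold=0.1):
    """Mean of the ligand average in cluster_a and the receptor average in
    cluster_b, or None when either gene is expressed in too few cells."""
    if fraction_expressing(ligand, cluster_a) < threshold:
        return None
    if fraction_expressing(receptor, cluster_b) < threshold:
        return None
    return mean([mean(expr[ligand][cluster_a]), mean(expr[receptor][cluster_b])])

t_to_b = interaction_mean("LIG", "REC", "T", "B")  # ligand in T, receptor in B
b_to_t = interaction_mean("LIG", "REC", "B", "T")  # ligand in B, receptor in T
print(t_to_b, b_to_t)
```

The two calls differ only in the order of the clusters, which is why each ordered cell type pair gets its own column in the output.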

  • METHOD 2 statistical_analysis (>= v1): This is a statistical analysis that evaluates for significance all the interactions that can potentially occur in your dataset: i.e. between ALL the potential cell type pairs. Here, CellphoneDB uses empirical shuffling to calculate which ligand–receptor pairs display significant cell-type specificity. Specifically, it estimates a null distribution of the mean of the average ligand and receptor expression in the interacting clusters by randomly permuting the cluster labels of all cells. The P value for the likelihood of cell-type specificity of a given receptor–ligand complex is calculated on the basis of the proportion of the means that are as high as or higher than the actual mean.

With method 2, each ligand-receptor pair goes through a hypothesis test.

  • Only receptors and ligands expressed in more than a user-specified threshold percentage of the cells in the specific cluster (threshold default is 0.1) are tested and will get a mean value in the significant.txt output.
  • For the multi-subunit heteromeric complexes, we require that:
    1. all subunits of the complex are expressed by a proportion of cells (threshold), and then
    2. the member of the complex with the minimum expression is used to compute the interaction means and perform the random shuffling.
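Next, all cell types are compared pairwise. First, the cluster labels of all cells are randomly permuted (1,000 times by default), and for each permutation the mean of the average receptor expression in one cluster and the average ligand expression in the interacting cluster is computed. For every receptor-ligand pair in every pairwise comparison between two cell types, this yields a null distribution. The P value for the likelihood of cell-type specificity of a given receptor-ligand complex is the proportion of permuted means that are equal to or higher than the actual mean. Interactions that are highly enriched between cell types are then prioritised by the number of significant pairs, so that biologically relevant interactions can be selected manually.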
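from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

out_path = 'results'  # placeholder output directory; set it to your own

deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
        cpdb_file_path = 'cellphonedb.zip',     # CellphoneDB database archive
        meta_file_path = 'test_meta.txt',       # cell barcode to cell type mapping
        counts_file_path = 'test_counts.h5ad',  # expression matrix
        counts_data = 'hgnc_symbol',            # gene identifier type used in the counts
        output_path = out_path)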
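The label-shuffling test described for method 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's code: the toy data and the helpers `pair_mean` and `permutation_pvalue` are hypothetical, and the expression-fraction threshold is left out to keep the example short.

```python
import random
from statistics import mean

def pair_mean(values, labels, cluster_a, cluster_b):
    """Mean of the ligand average in cluster_a and the receptor average in cluster_b."""
    ligand = mean(lig for (lig, _), lab in zip(values, labels) if lab == cluster_a)
    receptor = mean(rec for (_, rec), lab in zip(values, labels) if lab == cluster_b)
    return mean([ligand, receptor])

def permutation_pvalue(values, labels, cluster_a, cluster_b, iterations=1000, seed=0):
    """Shuffle the cluster labels and count how often the permuted mean
    is as high as or higher than the observed mean."""
    rng = random.Random(seed)
    observed = pair_mean(values, labels, cluster_a, cluster_b)
    shuffled = list(labels)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # cluster sizes are preserved, only membership changes
        if pair_mean(values, shuffled, cluster_a, cluster_b) >= observed:
            hits += 1
    return observed, hits / iterations

# Each cell is (ligand expression, receptor expression); toy values.
values = [(5.0, 0.0)] * 10 + [(0.0, 4.0)] * 10 + [(0.1, 0.1)] * 20
labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 20
observed, pvalue = permutation_pvalue(values, labels, "A", "B")
print(observed, pvalue)
```

Because the ligand is confined to cluster A and the receptor to cluster B in this toy data, almost no shuffle reaches the observed mean, so the P value is small and the pair is called cell-type specific.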
  • METHOD 3 degs_analysis (>= v3): This method is proposed as an alternative to the statistical inference approach. This approach allows the user to design more complex comparisons to retrieve interactions specific to a cell type of interest. This is particularly relevant when your research question goes beyond comparing “one” cell type vs “the rest”. Examples of alternative contrasts are hierarchical comparisons (e.g. you are interested in a specific lineage, such as epithelial cells, and wish to identify the genes changing their expression within this lineage) or comparing disease vs control (e.g. you wish to identify upregulated genes in disease T cells by comparing them against control T cells). For this CellphoneDB method (cpdb_degs_analysis_method), the user provides an input file (degs_file.txt in the call below) indicating which genes are relevant for a cell type (for example, marker genes or significantly upregulated genes resulting from a differential expression analysis (DEG)). CellphoneDB will select interactions where:

    1. all the genes in the interaction are expressed in the corresponding cell type by more than 10% of cells (threshold = 0.1), and

    2. at least one gene-cell type pair is in the provided DEG.tsv file.

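from cellphonedb.src.core.methods import cpdb_degs_analysis_method

out_path = 'results'  # placeholder output directory; set it to your own

deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
         cpdb_file_path = 'cellphonedb.zip',     # CellphoneDB database archive
         meta_file_path = 'test_meta.txt',       # cell barcode to cell type mapping
         counts_file_path = 'test_counts.h5ad',  # expression matrix
         degs_file_path = 'degs_file.txt',       # genes relevant to each cell type
         counts_data = 'hgnc_symbol',            # gene identifier type used in the counts
         threshold = 0.1,                        # minimum fraction of expressing cells
         output_path = out_path)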
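The two selection criteria of method 3 can be written as a small predicate. This is a sketch under assumptions: the gene and cluster names, the expression fractions and the helper `is_relevant` are invented for illustration and do not come from CellphoneDB.

```python
def is_relevant(genes_a, genes_b, cluster_a, cluster_b, fraction, degs, threshold=0.1):
    """Return 1 if every gene is expressed in more than `threshold` of the cells
    of its cluster and at least one (cluster, gene) pair is a DEG, else 0."""
    pairs = [(cluster_a, g) for g in genes_a] + [(cluster_b, g) for g in genes_b]
    all_expressed = all(fraction.get(pair, 0.0) > threshold for pair in pairs)
    any_deg = any(pair in degs for pair in pairs)
    return int(all_expressed and any_deg)

# Hypothetical fraction of expressing cells per (cluster, gene).
fraction = {
    ("Fibro", "LIG1"): 0.60, ("T", "LIG1"): 0.02,
    ("Fibro", "REC1"): 0.01, ("T", "REC1"): 0.30,
    ("T", "REC2A"): 0.20, ("T", "REC2B"): 0.05,
}
# Hypothetical DEG list: (cluster, gene) pairs supplied by the user.
degs = {("T", "REC1"), ("T", "REC2A")}

simple_hit = is_relevant(["LIG1"], ["REC1"], "Fibro", "T", fraction, degs)
reverse = is_relevant(["LIG1"], ["REC1"], "T", "Fibro", fraction, degs)
complex_miss = is_relevant(["LIG1"], ["REC2A", "REC2B"], "Fibro", "T", fraction, degs)
print(simple_hit, reverse, complex_miss)
```

The last case shows that a DEG alone is not enough: one subunit of the two-gene receptor is expressed in too few cells, so the interaction is not marked as relevant.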
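This method lets you design the gene expression comparison freely so that it better matches your research question. With method 2, the null hypothesis (and the background distribution) considers all cell types in the dataset and compares "one" cell type against "the rest". However, you may want a different approach that better reflects your study design. Some examples:

  • The analysis needs to account for technical batches or biological covariates. Here, a better approach is to rely on a differential expression method that models these confounders and to provide the results directly to CellphoneDB.

  • You are interested in specificity within a particular lineage and want to perform a hierarchical differential expression analysis (e.g. you are interested in a specific lineage, such as epithelial cells, and want to identify the genes that change expression within it; research question: which interactions are upregulated in epithelial cell A compared with epithelial cell B?).

  • You want to compare a specific population between disease and control (e.g. identify genes upregulated in disease T cells by comparing them against control T cells; research question: which interactions are upregulated in disease T cells?).

To include spatial information, see the earlier post: Communication analysis that accounts for spatial location: CellphoneDB (v3.0).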
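For custom comparisons like these, the differential expression results have to be converted into the DEG file given to degs_analysis. The sketch below assumes a two-column, tab-separated layout with the cluster first and the gene second; check this against the CellphoneDB documentation for your version. The input rows, the cut-offs and the helper `write_degs` are hypothetical.

```python
import csv
import io

def write_degs(de_rows, handle, max_padj=0.05, min_logfc=0.25):
    """Write one (cluster, gene) row per upregulated gene; return how many were kept."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cluster", "gene"])
    kept = 0
    for row in de_rows:
        # Keep genes that are both significant and upregulated (assumed cut-offs).
        if row["padj"] < max_padj and row["logfc"] > min_logfc:
            writer.writerow([row["cluster"], row["gene"]])
            kept += 1
    return kept

# Hypothetical results of a disease-vs-control test within one cell type.
de_rows = [
    {"cluster": "T_disease", "gene": "GENE1", "logfc": 1.8, "padj": 0.001},
    {"cluster": "T_disease", "gene": "GENE2", "logfc": 0.1, "padj": 0.0001},
    {"cluster": "T_disease", "gene": "GENE3", "logfc": 2.0, "padj": 0.2},
]
buffer = io.StringIO()  # stands in for an open file such as degs_file.txt
n_kept = write_degs(de_rows, buffer)
degs_text = buffer.getvalue()
print(degs_text)
```

The cluster names written here must match the cell type labels in the meta file, otherwise no gene-cell type pair will be found.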

Interpreting the results

Output files

All files (except “deconvoluted.txt”) follow the same structure: rows depict interacting proteins while columns represent interacting cell type pairs.

  • The “means.txt” file contains mean values for each ligand-receptor interaction (rows) for each cell-cell interaction pair (columns).

  • The “pvalues.txt” contains the P values for the likelihood of cell-type specificity of a given receptor–ligand complex (rows) in each cell-cell interaction pair (columns), resulting from the statistical_analysis.

  • The “significant_means.txt” contains the mean expression (same as “means.txt”) of the significant receptor–ligand complexes only. This is the result of crossing “means.txt” and “pvalues.txt”.

  • The “relevant_interactions.txt” contains a binary matrix indicating if the interaction is relevant (1) or not (0). An interaction is classified as relevant if a gene is a DEG in a cluster/cell type (information provided by the user in the DEG.tsv file) and all the participant genes are expressed. Alternatively, the value is set to 0. This file is specific to degs_analysis. Each row corresponds to a ligand-receptor interaction, while each column corresponds to a cell-cell interacting pair.

  • The “deconvoluted.txt” file gives additional information for each of the interacting partners. This is important as some of the interacting partners are heteromers. In other words, multiple molecules have to be expressed in the same cluster in order for the interacting partner to be functional.

See below the meaning of each column in the outputs:

P-value (pvalues.txt), Mean (means.txt), Significant mean (significant_means.txt) and Relevant interactions (relevant_interactions.txt)
  • id_cp_interaction: Unique CellphoneDB identifier for each interaction stored in the database.
  • interacting_pair: Name of the interacting pairs separated by “|”.
  • partner A or B: Identifier for the first interacting partner (A) or the second (B). It could be: UniProt (prefix simple:) or complex (prefix complex:)
  • gene A or B: Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list.
  • secreted: True if one of the partners is secreted.
  • Receptor A or B: True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database.
  • annotation_strategy: Curated if the interaction was annotated by the CellphoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from.
  • is_integrin: True if one of the partners is integrin.
  • rank: Total number of significant p-values for each interaction divided by the number of cell type-cell type comparisons. (Only in significant_means.txt)
  • means: Mean values for all the interacting partners: mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0. (Only in means.txt)
  • p.values: p-values for all the interacting partners: p.value refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. (Only in pvalues.txt)
  • significant_mean: Significant mean calculation for all the interacting partners. If p.value < 0.05, the value will be the mean. Alternatively, the value is set to 0. (Only in significant_means.txt)
  • relevant_interactions: Indicates if the interaction is relevant (1) or not (0). If a gene in the interaction is a DEG (i.e. a gene in the DEG.tsv file), and all the participant genes are expressed, the interaction will be classified as relevant. Alternatively, the value is set to 0. (Only in relevant_interactions.txt)
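The definitions of means, significant_mean and rank in the list above amount to simple arithmetic. The sketch below works through one interaction (one row of the tables) with toy numbers; `total_mean` and `significant_row` are hypothetical helper names, and the column labels are written as "A|B" on the assumption that cell type pairs are joined with "|".

```python
def total_mean(mean_a, mean_b):
    """Mean of the two partner averages; 0 if either partner is not expressed."""
    if mean_a == 0 or mean_b == 0:
        return 0.0
    return (mean_a + mean_b) / 2

def significant_row(means, pvalues, alpha=0.05):
    """Keep the mean where p < alpha (else 0) and compute the rank of the row."""
    significant = {pair: (means[pair] if pvalues[pair] < alpha else 0.0) for pair in means}
    rank = sum(p < alpha for p in pvalues.values()) / len(pvalues)
    return significant, rank

# One interaction across four ordered cell type pairs (toy numbers).
means = {
    "A|A": total_mean(0.5, 1.5),
    "A|B": total_mean(1.0, 2.0),
    "B|A": total_mean(0.0, 2.0),  # one partner absent, so the total mean is 0
    "B|B": total_mean(0.5, 0.5),
}
pvalues = {"A|A": 1.0, "A|B": 0.001, "B|A": 1.0, "B|B": 0.03}
significant, rank = significant_row(means, pvalues)
print(significant, rank)
```

A low rank means the interaction is significant in few cell type pairs, which is one way to spot the more specific interactions.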

Again, remember that the interactions are not symmetric. IL12-IL12 receptor for clusterA clusterB (i.e. the receptor is in clusterB) is not the same as IL12-IL12 receptor for clusterB clusterA (i.e. the receptor is in clusterA).

Deconvoluted (deconvoluted.txt)

  • gene_name: Gene identifier for one of the subunits that are participating in the interaction defined in the “means.txt” file. The identifier will depend on the input of the user list.

  • uniprot: UniProt identifier for one of the subunits that are participating in the interaction defined in the “means.txt” file.

  • is_complex: True if the subunit is part of a complex. Single if it is not, complex if it is.

  • protein_name: Protein name for one of the subunits that are participating in the interaction defined in the “means.txt” file.

  • complex_name: Complex name if the subunit is part of a complex. Empty if not.

  • id_cp_interaction: Unique CellphoneDB identifier for each of the interactions stored in the database.

  • mean: Mean expression of the corresponding gene in each cluster.
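The deconvoluted table matters because a heteromeric complex is limited by its weakest subunit. The sketch below illustrates the rule stated for the multi-subunit complexes (all subunits must pass the threshold, and the least expressed member represents the complex). The subunit names, the numbers and the helper `complex_expression` are hypothetical.

```python
def complex_expression(subunit_mean, subunit_fraction, threshold=0.1):
    """Return the minimum subunit mean, or None if any subunit is expressed
    in too few cells of the cluster."""
    if any(subunit_fraction[gene] <= threshold for gene in subunit_mean):
        return None
    return min(subunit_mean.values())

# Per-cluster mean expression and fraction of expressing cells for each subunit.
receptor_in_t = complex_expression(
    {"SUB_A": 2.0, "SUB_B": 0.5}, {"SUB_A": 0.60, "SUB_B": 0.25})
receptor_in_b = complex_expression(
    {"SUB_A": 3.0, "SUB_B": 0.1}, {"SUB_A": 0.70, "SUB_B": 0.02})
print(receptor_in_t, receptor_in_b)
```

In the second cluster one subunit is highly expressed, but the complex is still discarded because the other subunit is found in only 2% of the cells.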

Interpreting the outputs

How to read and interpret the results?

The key files are significant_means.txt (for statistical_analysis) or relevant_interactions.txt (for degs_analysis), see below. When interpreting the results, we recommend you first define your questions of interest. Next, focus on specific cell type pairs and manually review the interactions prioritising those with lower p-value and/or higher mean expression. For graphical representation we recommend @zktuong repository: ktplots in R and ktplotspy in python.
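The manual review can start from a short ranked list for one cell type pair. The sketch below uses only the standard library and a toy table; it assumes the output is tab-separated and that cell type pair columns are named like "Fibro|T", so verify both against your own files. The interaction names and the helper `top_interactions` are hypothetical.

```python
import csv
import io

def top_interactions(handle, cell_pair, n=10):
    """Return up to n (interacting_pair, mean) tuples for one cell type pair,
    highest mean first, skipping empty or zero entries."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        value = row[cell_pair]
        if value == "":
            continue  # not significant for this pair
        number = float(value)
        if number > 0:  # also false for NaN
            hits.append((row["interacting_pair"], number))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]

# Toy stand-in for a significant_means table with two cell type pair columns.
toy_table = (
    "interacting_pair\trank\tFibro|T\tT|Fibro\n"
    "LIG1_REC1\t0.5\t1.2\t\n"
    "LIG2_REC2\t0.5\t2.5\t0.4\n"
    "LIG3_REC3\t0.0\t\t\n"
)
fibro_to_t = top_interactions(io.StringIO(toy_table), "Fibro|T")
t_to_fibro = top_interactions(io.StringIO(toy_table), "T|Fibro")
print(fibro_to_t, t_to_fibro)
```

With a real file, replace the `io.StringIO` object with an opened file handle.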

CellphoneDB output is high-throughput. CellphoneDB provides all cell-cell interactions that may potentially occur in your dataset, given the expression of the cells. The size of the output may be overwhelming, but if you apply some rationale (which will depend on the design of your experiment and your biological question), you will be able to narrow it down to a few candidate interactions. The new method degs_analysis will allow you to perform a more tailored analysis towards specific cell-types or conditions, while the option microenvs will allow you to restrict the combinations of cell-type pairs to test.

It may be that not all of the cell-types of your input dataset co-appear in time and space. Cell types that do not co-appear in time and space will not interact. For example, you might have cells coming from different in vitro systems, different developmental stages or disease and control conditions. Use this prior information to restrict and ignore infeasible cell-type combinations from the outputs (i.e., columns) as well as their associated interactions (i.e. rows). You can restrict the analysis to feasible cell-type combinations using the option microenvs. Here you can input a two columns file indicating which cell type is in which spatiotemporal microenvironment. CellphoneDB will use this information to define possible pairs of interacting cells (i.e. pairs of clusters co-existing in a microenvironment) ignoring the rest of combinations.
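The effect of a microenvironment file can be pictured as building the set of cell type pairs that share at least one microenvironment. This is a sketch of the idea only, not how CellphoneDB implements it; the cell types, the microenvironment names, the column order (cell type first) and the helper `feasible_pairs` are assumptions.

```python
from collections import defaultdict
from itertools import product

def feasible_pairs(rows):
    """Return the ordered cell type pairs that co-occur in a microenvironment."""
    by_env = defaultdict(set)
    for cell_type, microenvironment in rows:
        by_env[microenvironment].add(cell_type)
    pairs = set()
    for members in by_env.values():
        # Ordered pairs, including a cell type with itself.
        pairs.update(product(members, repeat=2))
    return pairs

# Toy two-column table: cell type, microenvironment.
rows = [
    ("Fibro", "stroma"),
    ("T", "stroma"),
    ("Epi", "lumen"),
    ("T", "lumen"),
]
pairs = feasible_pairs(rows)
print(sorted(pairs))
```

Fibro and Epi never share a microenvironment here, so neither "Fibro, Epi" nor "Epi, Fibro" would be tested, while T cells are paired with both.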

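Most importantly, the result files present interactions as receptor-ligand pairs rather than the usual ligand-receptor pairs.

Just a quick note. Life is good, and better with you.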