scanpy也可以使用harmony,但是其實調(diào)用的Harmonypy這個包,其實使用的話倒是比較簡單,就是下面這些命令,但是我不是很關(guān)心這個,關(guān)鍵是它怎么寫的
Fast, sensitive and accurate integration of single-cell data with Harmony | Nature Methods
import scanpy as sc
import scanpy.external as sce
adata = sc.datasets.pbmc3k()
sc.pp.recipe_zheng17(adata)
sc.tl.pca(adata)
adata.obs['batch'] = 1350*['a'] + 1350*['b']
sce.pp.harmony_integrate(adata, 'batch')
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['batch'], legend_fontsize=8)

harmonypy/harmony.py at master · slowkow/harmonypy (github.com)
obsm就是降維的數(shù)據(jù)
sce.pp.harmony_integrate(adata, 'batch')這句其實就是下面這個調(diào)用下面的語句
import harmonypy
harmony_out = harmonypy.run_harmony(adata.obsm["X_pca"], adata.obs, 'batch')
adata.obsm[adjusted_basis] = harmony_out.Z_corr.T ###obsm就是降維的數(shù)據(jù)
adata.obs

算法1這里講的就是harmnoy通過使用PCA降維后的數(shù)據(jù),不斷重復聚類,收斂的過程
image.png
算法的主體
def harmonize(self, iter_harmony=10, verbose=True):
converged = False
for i in range(1, iter_harmony + 1):
if verbose:
logger.info("Iteration {} of {}".format(i, iter_harmony))
# STEP 1: Clustering
self.cluster()
# STEP 2: Regress out covariates
# self.moe_correct_ridge()
self.Z_cos, self.Z_corr, self.W, self.Phi_Rk = moe_correct_ridge(
self.Z_orig, self.Z_cos, self.Z_corr, self.R, self.W, self.K,
self.Phi_Rk, self.Phi_moe, self.lamb
)
# STEP 3: Check for convergence
converged = self.check_convergence(1)
if converged:
if verbose:
logger.info(
"Converged after {} iteration{}"
.format(i, 's' if i > 1 else '')
)
break
if verbose and not converged:
logger.info("Stopped before convergence")
return 0
算法2是最大優(yōu)化多樣性聚類
Harmony概率性地將細胞分配給cluster,從而使每個cluster內(nèi)數(shù)據(jù)集的多樣性最大化。
image.png
def cluster(self):
# Z_cos has changed
# R is assumed to not have changed
# Update Y to match new integrated data
self.dist_mat = 2 * (1 - np.dot(self.Y.T, self.Z_cos))
for i in range(self.max_iter_kmeans):
# print("kmeans {}".format(i))
# STEP 1: Update Y
self.Y = np.dot(self.Z_cos, self.R.T)
self.Y = self.Y / np.linalg.norm(self.Y, ord=2, axis=0)
# STEP 2: Update dist_mat
self.dist_mat = 2 * (1 - np.dot(self.Y.T, self.Z_cos))
# STEP 3: Update R
self.update_R()
# STEP 4: Check for convergence
self.compute_objective()
if i > self.window_size:
converged = self.check_convergence(0)
if converged:
break
self.kmeans_rounds.append(i)
self.objective_harmony.append(self.objective_kmeans[-1])
return 0
算法3,Mixture of Experts Correct
由于Harmony使用軟聚類,因此可以通過多個因子的線性組合對其A中進行的軟聚類分配進行線性校正,來修正每個單細胞。
image.png
def moe_correct_ridge(Z_orig, Z_cos, Z_corr, R, W, K, Phi_Rk, Phi_moe, lamb):
Z_corr = Z_orig.copy()
for i in range(K):
Phi_Rk = np.multiply(Phi_moe, R[i,:])
x = np.dot(Phi_Rk, Phi_moe.T) + lamb
W = np.dot(np.dot(np.linalg.inv(x), Phi_Rk), Z_orig.T)
W[0,:] = 0 # do not remove the intercept
Z_corr -= np.dot(W.T, Phi_Rk)
Z_cos = Z_corr / np.linalg.norm(Z_corr, ord=2, axis=0)
return Z_cos, Z_corr, W,
(A)Harmony概率性地將細胞分配給cluster,從而使每個cluster內(nèi)數(shù)據(jù)集的多樣性最大化。
(B)Harmony計算每個cluster的所有數(shù)據(jù)集的全局中心,以及特定數(shù)據(jù)集的中心。
(C)在每個cluster中,Harmony基于中心為每個數(shù)據(jù)集計算校正因子。
(D)最后,Harmony使用基于C的特定于細胞的因子校正每個細胞。由于Harmony使用軟聚類,因此可以通過多個因子的線性組合對其A中進行的軟聚類分配進行線性校正,來修正每個單細胞。
重復步驟A到D,直到收斂為止。聚類分配和數(shù)據(jù)集之間的依賴性隨著每一輪的減少而減小。
“harmony”整合不同平臺的單細胞數(shù)據(jù)之旅 - 騰訊云開發(fā)者社區(qū)-騰訊云 (tencent.com)


