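While running SCENIC for transcription factor analysis of single-cell data, I ran into a problem: SCENIC's input requires the reference motif files of a cisTarget database, but the SCENIC website only provides reference databases for human, mouse, and fruit fly. My data come from rat, so I had to build a rat cisTarget database myself.
The SCENIC website provides a workflow for creating cisTarget databases, but that part of the documentation is not very detailed. Special thanks to another Jianshu user who works on Arabidopsis and whose post was a very useful reference. References: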
https://github.com/aertslab/create_cisTarget_databases
http://www.itdecent.cn/p/59db26de0858
https://github.com/weng-lab/cluster-buster

Step 1: Install the software

Create the environment

conda create -n create_cistarget_databases \
    'python=3.10' \
    'numpy=1.21' \
    'pandas>=1.4.1' \
    'pyarrow>=7.0.0' \
    'numba>=0.55.1' \
    'python-flatbuffers'

conda activate create_cistarget_databases

Clone the create_cisTarget_databases repository locally

git clone https://github.com/aertslab/create_cisTarget_databases

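Install Cluster-Buster

Both options below place cbust in the bin folder of the conda environment, so either one is enough.

## Option 1: install the precompiled binary
cd "${CONDA_PREFIX}/bin" # go to the bin folder of the active conda environment
wget https://resources.aertslab.org/cistarget/programs/cbust # download the precompiled binary
chmod a+x cbust # make the file executable

## Option 2: compile cbust from source
git clone -b change_f4_output https://github.com/ghuls/cluster-buster/
cd cluster-buster
make cbust
conda activate create_cistarget_databases
cp -a cbust "${CONDA_PREFIX}/bin/cbust" # copy the compiled binary into the environment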

Install the UCSC tools

conda activate create_cistarget_databases # CONDA_PREFIX is only set once the environment is active
cd "${CONDA_PREFIX}/bin"
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigAverageOverBed
chmod a+x liftOver bigWigAverageOverBed
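
Before moving on, it can help to confirm that the three binaries installed above are actually visible on PATH. The following is a minimal sketch; check_tools is a hypothetical helper name, not part of create_cisTarget_databases.

```shell
# Print the name of every tool that cannot be found on PATH.
# Returns 0 when all tools are found, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Inside the activated environment it would be called as: check_tools cbust liftOver bigWigAverageOverBed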

Step 2: Create the cisTarget database

According to the official create_cisTarget_databases documentation, the following input files are required.

Required files

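1. FASTA file with regulatory regions: the promoter-region sequences of all genes, which can be downloaded from UCSC at https://hgdownload.soe.ucsc.edu/goldenPath/rn7/bigZips/. This post uses upstream2000.fa, as in the final command further down.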
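
A quick look at the FASTA file before building the database can catch a truncated download or repeated region IDs. This is only a sketch: fasta_count and fasta_duplicate_ids are hypothetical helper names, and whether the database script tolerates duplicate IDs is not confirmed here.

```shell
# Count the sequences in a FASTA file (one header line per sequence).
fasta_count() {
    grep -c '^>' "$1"
}

# Print every sequence ID (first word of the header, without ">")
# that occurs more than once. Each duplicated ID is printed once.
fasta_duplicate_ids() {
    awk '/^>/ { id = substr($1, 2); if (seen[id]++ == 1) print id }' "$1"
}
```

For example: fasta_count upstream2000.fa, then fasta_duplicate_ids upstream2000.fa | head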

2. Motif matrix files in Cluster-Buster format

Download the rat motif information from CIS-BP (http://cisbp.ccbr.utoronto.ca/). The part that matters is the PWM folder.

The PWM folder consists of many txt files, one per motif. Opening one of them shows the motif's probability matrix: each row is a base position, each column is a base type, and each number is the frequency or weight of that base at that position.
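
As a sanity check on the downloaded matrices, one can verify that each position's four values add up to roughly 1, as expected for a probability matrix. The sketch below assumes the layout described above (one header row, then position, A, C, G, T). The helper name pwm_bad_rows and the 0.01 tolerance are illustrative choices.

```shell
# Print the position label of every row whose four base values
# do not sum to 1 within a tolerance of 0.01.
pwm_bad_rows() {
    awk 'NR > 1 && NF >= 5 {
        sum = $2 + $3 + $4 + $5
        if (sum < 0.99 || sum > 1.01) print $1
    }' "$1"
}
```

An empty output means every row of that file looks like a probability distribution.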

Note that the motif matrix files in the PWM folder must be converted to the Cluster-Buster motif matrix format!

cd ./pwms_all_motifs

# Step 1: drop the first row and the first column of every txt file
for file in *.txt; do
    awk 'NR>1 { for (i=2; i<=NF; i++) printf "%s\t", $i; printf "\n" }' "$file" > "${file}_temp"
    mv "${file}_temp" "$file"
done

# Step 2: delete the files that are now empty (motifs without matrix rows)
find ./ -type f -empty -delete

# Step 3: use the file name as the motif_id and insert it as the first line of each txt file (GNU sed syntax)
for file in *.txt; do
    motif_id=$(basename "$file" .txt)
    sed -i "1s/^/$motif_id\n/" "$file"
done

# Step 4: prefix the first line of every file with ">"
sed -i '1s/^/>/' *.txt

# Step 5: change the extension of every file to .cb
for file in *.txt; do
    mv "$file" "${file%.txt}.cb"
done
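
The five steps above can also be folded into a single pass per file. This is a sketch under the same assumptions about the input layout (one header row, first column is the position); pwm_to_cb and convert_pwm_dir are hypothetical helper names. Unlike the in-place steps, this version writes the .cb file next to the original txt file and leaves the txt file untouched.

```shell
# Convert one PWM txt file to Cluster-Buster format on stdout.
# The motif ID is taken from the file name without the .txt extension.
pwm_to_cb() {
    local file=$1
    local motif_id
    motif_id=$(basename "$file" .txt)
    awk -v id="$motif_id" '
        NR == 1 { next }                        # skip the header row
        NF < 2  { next }                        # skip blank or malformed rows
        !started { print ">" id; started = 1 }  # header line before the first data row
        {
            line = $2                           # drop the position column
            for (i = 3; i <= NF; i++) line = line "\t" $i
            print line
        }
    ' "$file"
}

# Convert every txt file in a directory and discard empty results.
convert_pwm_dir() {
    local dir=$1
    local f out
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue
        out="${f%.txt}.cb"
        pwm_to_cb "$f" > "$out"
        [ -s "$out" ] || rm -f "$out"           # motif had no matrix rows
    done
}
```

Usage: convert_pwm_dir ./pwms_all_motifs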

After these steps, every .cb file starts with a >motif_id line followed by the rows of the matrix.

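3. Motif list: simply a txt file that lists the names of all files in the PWM folder, without the extension, one per line.

# Redirecting the whole loop overwrites motif_list.txt, so rerunning does not append duplicates
for file in pwms_all_motifs/*.cb; do
    basename "$file" .cb
done > motif_list.txt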
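
Since the motif list and the .cb files have to agree, a small cross-check can save a failed run. The sketch below prints every ID in the list that has no matching file. The helper name missing_motifs is hypothetical.

```shell
# Print every motif ID from the list file that has no <id>.cb file
# in the given directory. Blank lines in the list are ignored.
missing_motifs() {
    local list=$1
    local dir=$2
    local id
    while IFS= read -r id; do
        [ -n "$id" ] || continue
        [ -f "$dir/$id.cb" ] || echo "$id"
    done < "$list"
}
```

Usage: missing_motifs motif_list.txt pwms_all_motifs (no output means the list and the folder match).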

Run create_cistarget_motif_databases.py

cd /home/lwc/scRNA/SCENIC/create_cisTarget_databases
ln -s ~/scRNA/SCENIC/Rattus_cistarget_database/upstream2000.fa
ln -s ~/scRNA/SCENIC/Rattus_cistarget_database/pwms_all_motifs/
ln -s ~/scRNA/SCENIC/Rattus_cistarget_database/motif_list.txt

python create_cistarget_motif_databases.py \
   -f upstream2000.fa \
   -M pwms_all_motifs/ \
   -m motif_list.txt \
   -o ~/scRNA/SCENIC/Rattus_cistarget_database/ \
   -t 22

[Screenshot: create_cistarget_motif_databases.py running]

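Summary

For step 1, just follow the official create_cisTarget_databases documentation to install the software.
The annoying part of step 2 is that the official documentation only says that motif files in Cluster-Buster format are required, without describing the format, and it does not say what the motif list should contain.
The key to creating a cisTarget database is getting the motif PWM files for your species and then converting those matrix files to Cluster-Buster format.
This post uses CIS-BP, the database with the most motif information. Other databases also offer motif PWM files for download, but how to integrate them is another matter.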
