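Synteny and Rearrangement Identifier (SyRI) identifies structural variations (SVs) between genomes. It takes whole-genome alignments as input and can detect several types of structural variation.
Learning resources: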
1. Basic principle
Step 1: Identify syntenic regions and non-syntenic regions (rearrangements).

Step 2: Classify the non-syntenic regions (rearrangements) into inversions, duplications, and translocations.

Step 3: Identify local variants within both the syntenic and the non-syntenic regions.
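As a rough illustration of Steps 1 and 2, the sketch below picks the longest chain of forward-strand alignment blocks whose query coordinates increase along with their reference coordinates and calls that chain syntenic; every other block is flagged as a rearrangement candidate. This is a toy model, not SyRI's actual algorithm: the block tuples, the function names, and the candidate labels are all hypothetical, and Step 3 (local variants) is not covered.

```python
def longest_collinear_chain(blocks):
    """Return indices of the longest chain of '+' blocks whose query starts
    increase together with their reference starts (simple O(n^2) DP)."""
    # Consider only forward-strand blocks, ordered by reference start.
    order = sorted(
        (i for i, b in enumerate(blocks) if b[4] == "+"),
        key=lambda i: blocks[i][0],
    )
    if not order:
        return set()
    best_len = {}
    prev = {}
    for pos, i in enumerate(order):
        best_len[i] = 1
        prev[i] = None
        for j in order[:pos]:
            # Block j can precede block i if it is also earlier on the query.
            if blocks[j][2] < blocks[i][2] and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j
    # Walk back from the block that ends the longest chain.
    end = max(order, key=lambda i: best_len[i])
    chain = set()
    while end is not None:
        chain.add(end)
        end = prev[end]
    return chain


def classify(blocks):
    """Label each block as syntenic or as a rearrangement candidate."""
    chain = longest_collinear_chain(blocks)
    labels = []
    for i, block in enumerate(blocks):
        if i in chain:
            labels.append("syntenic")
        elif block[4] == "-":
            labels.append("inversion candidate")
        else:
            labels.append("translocation/duplication candidate")
    return labels


# Hypothetical blocks: (ref_start, ref_end, qry_start, qry_end, strand)
blocks = [
    (1, 100, 1, 100, "+"),
    (101, 200, 701, 800, "+"),   # same orientation, moved far away in the query
    (201, 300, 200, 101, "-"),   # reverse-strand alignment
    (301, 400, 201, 300, "+"),
    (401, 500, 501, 600, "+"),
]
for block, label in zip(blocks, classify(blocks)):
    print(block, label)
```

A real classifier also has to handle overlapping alignments, duplicated copies, and inverted regions made of several blocks, which this sketch ignores.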
2. Installation
This walkthrough installs v1.4.
Dependencies:
conda install cython numpy scipy pandas=0.23.4 biopython psutil matplotlib=3.0.0
conda install -c conda-forge python-igraph
conda install -c bioconda pysam
# Additionally, if using chroder
conda install -c bioconda longestrunsubsequence
A new conda environment can be created for the installation.
Install SyRI
git clone https://github.com/schneebergerlab/syri.git
cd syri # setup.py is in the repository root
python setup.py install
chmod +x syri/bin/syri syri/bin/chroder syri/bin/plotsr # Make files executable
All executables are in cwd/syri/bin/.
3. Basic usage
The example/ directory of the installation contains a worked pipeline; follow the corresponding steps.
# Using minimap2 for generating alignment. Any other whole genome alignment tool can also be used.
minimap2 -ax asm5 --eqx refgenome qrygenome > out.sam
python3 $PATH_TO_SYRI -c out.sam -r refgenome -q qrygenome -k -F S
# or
samtools view -b out.sam > out.bam
python3 $PATH_TO_SYRI -c out.bam -r refgenome -q qrygenome -k -F B
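The two SyRI invocations above differ only in the alignment file and the value passed to -F (S for SAM, B for BAM). A minimal sketch of a helper that picks the value from the file extension follows; the function name and the extension mapping are hypothetical conveniences, and only the flags already shown above are used.

```python
import shlex
from pathlib import Path

# Assumed mapping from file extension to the -F value used in the commands above.
FORMAT_FLAGS = {".sam": "S", ".bam": "B"}


def build_syri_command(syri_path, alignment, refgenome, qrygenome):
    """Build the SyRI argument list for a SAM or BAM alignment file."""
    suffix = Path(alignment).suffix.lower()
    if suffix not in FORMAT_FLAGS:
        raise ValueError(f"unsupported alignment file type: {alignment}")
    return [
        "python3", syri_path,
        "-c", alignment,
        "-r", refgenome,
        "-q", qrygenome,
        "-k",
        "-F", FORMAT_FLAGS[suffix],
    ]


# shlex.join needs Python 3.8 or newer.
print(shlex.join(build_syri_command("syri", "out.bam", "refgenome", "qrygenome")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.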
Plotting
python3 $PATH_TO_PLOTSR syri.out refgenome qrygenome -H 8 -W 5
The alignment can also be generated with nucmer:
nucmer --maxmatch -c 100 -b 500 -l 50 refgenome qrygenome # Whole genome alignment. Any other alignment can also be used.
delta-filter -m -i 90 -l 100 out.delta > out.filtered.delta # Remove small and lower quality alignments
show-coords -THrd out.filtered.delta > out.filtered.coords # Convert alignment information to a .TSV format as required by SyRI
python3 $PATH_TO_SYRI -c out.filtered.coords -d out.filtered.delta -r refgenome -q qrygenome
python3 $PATH_TO_PLOTSR syri.out refgenome qrygenome -H 8 -W 5
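Note:
- The two genomes being compared must have the same number of chromosomes, and the chromosome IDs must match.
- For whole-genome alignment, contigs that are not anchored to chromosomes can be left out.
- If no chromosome-level assembly is available, the contigs can be scaffolded into pseudo-chromosomes with the bundled chroder utility, similar to RaGOO.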
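The first requirement in the notes above can be checked before running the alignment. The sketch below reads sequence IDs from FASTA headers and reports mismatches; the function names and the report layout are hypothetical, and it assumes plain (uncompressed) FASTA files in which the ID is the first whitespace-delimited token after ">".

```python
def fasta_ids(path):
    """Return the sequence IDs of a FASTA file, in file order."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip empty header lines
                    ids.append(fields[0])
    return ids


def compare_chromosome_ids(ref_ids, qry_ids):
    """Report whether two ID lists agree in count and in names."""
    ref_set, qry_set = set(ref_ids), set(qry_ids)
    return {
        "same_count": len(ref_ids) == len(qry_ids),
        "only_in_ref": sorted(ref_set - qry_set),
        "only_in_qry": sorted(qry_set - ref_set),
    }


# Example with in-memory ID lists; IDs are compared case-sensitively.
report = compare_chromosome_ids(["Chr1", "Chr2"], ["Chr1", "chr2"])
print(report)
```

With real files, the two lists would come from fasta_ids("refgenome") and fasta_ids("qrygenome").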
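4. Output formats
Output is written in two formats: TSV and VCF.
- TSV format specification

The annotation types are explained as follows:

The Parent ID is the unique ID of the annotation block (a syntenic region or a structural rearrangement) that contains the alignment or local variation. For example, if a translocated region (unique ID TRANS1) between Chr1:10 in genome A and Chr2:542 in genome B contains an A -> T SNP (unique ID SNP1), the corresponding entry is: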
Chr1 10 10 A T Chr2 542 542 SNP1 TRANS1 SNP -
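To make the column layout of that entry concrete, here is a minimal parsing sketch. The field names are hypothetical labels inferred from the example line and the Parent ID description, not names defined by SyRI; the sketch also assumes that fields contain no embedded whitespace and that "-" marks an empty value.

```python
# Hypothetical labels for the 12 columns of one syri.out line.
FIELDS = (
    "ref_chr", "ref_start", "ref_end", "ref_seq", "qry_seq",
    "qry_chr", "qry_start", "qry_end",
    "unique_id", "parent_id", "annotation_type", "copy_status",
)
POSITION_FIELDS = ("ref_start", "ref_end", "qry_start", "qry_end")


def parse_syri_line(line):
    """Split one syri.out line into a dict keyed by the labels above."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Convert coordinates to int; "-" (no value) becomes None.
    for key in POSITION_FIELDS:
        record[key] = None if record[key] == "-" else int(record[key])
    return record


record = parse_syri_line("Chr1 10 10 A T Chr2 542 542 SNP1 TRANS1 SNP -")
print(record["unique_id"], "belongs to", record["parent_id"])
```

Grouping records by parent_id then gives, for each syntenic or rearranged block, the local variants found inside it.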
- VCF format
Because VCF records are ordered by reference coordinates, the un-aligned regions of the query genome cannot be represented.
5. Plotting
python /path/to/plotsr syri.out /path/to/refgenome /path/to/qrygenome
positional arguments:
  reg                   syri.out file generated by SyRI
  r                     path to reference genome
  q                     path to query genome

optional arguments:
  -h, --help            show this help message and exit
  -s S                  minimum size of a SR to be plotted
  -R                    Create ribbons
  -f F                  font size
  -H H                  height of the plot
  -W W                  width of the plot
  -o {pdf,png,svg}      output file format (pdf, png, svg)
  -d D                  DPI for the final image
  -b {agg,cairo,pdf,pgf,ps,svg,template}
                        Matplotlib backend to use

20220901
v1.6
Installation
conda create -n syrl -c bioconda syri -y
conda install -c bioconda plotsr