In any directory, create five folders named folder_1 to folder_5, then create folder_1 to folder_5 again inside each of them.
zy@VM-0-15-ubuntu:~/tmp$ mkdir -p folder_{1..5}/folder_{1..5}
zy@VM-0-15-ubuntu:~$ tree tmp
tmp
├── folder_1
│   ├── folder_1
│   ├── folder_2
│   ├── folder_3
│   ├── folder_4
│   └── folder_5
├── folder_2
│   ├── folder_1
│   ├── folder_2
│   ├── folder_3
│   ├── folder_4
│   └── folder_5
├── folder_3
│   ├── folder_1
│   ├── folder_2
│   ├── folder_3
│   ├── folder_4
│   └── folder_5
├── folder_4
│   ├── folder_1
│   ├── folder_2
│   ├── folder_3
│   ├── folder_4
│   └── folder_5
└── folder_5
    ├── folder_1
    ├── folder_2
    ├── folder_3
    ├── folder_4
    └── folder_5
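In every folder created in exercise 5, create the text file me.txt from exercise 2, with the same content.
zy@VM-0-15-ubuntu:~$ vim txt.sh
#!/bin/bash
# Write me.txt into each first-level folder and into each of its subfolders.
content="Go to: http://www.biotrainee.com/ \n I love bioinfomatics.\n And you ?"
for i in {1..5};do
    cd ~/tmp/folder_$i
    echo -e "$content" >me.txt
    for I in {1..5};do
        cd ~/tmp/folder_$i/folder_$I
        echo -e "$content" >me.txt
    done
done
## Press Esc, then type :wq to save and quit
zy@VM-0-15-ubuntu:~$ bash txt.sh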
zy@VM-0-15-ubuntu:~$ tree tmp
tmp
├── folder_1
│   ├── folder_1
│   │   └── me.txt
│   ├── folder_2
│   │   └── me.txt
│   ├── folder_3
│   │   └── me.txt
│   ├── folder_4
│   │   └── me.txt
│   ├── folder_5
│   │   └── me.txt
│   └── me.txt
├── folder_2
│   ├── folder_1
│   │   └── me.txt
│   ├── folder_2
│   │   └── me.txt
│   ├── folder_3
│   │   └── me.txt
│   ├── folder_4
│   │   └── me.txt
│   ├── folder_5
│   │   └── me.txt
│   └── me.txt
├── folder_3
│   ├── folder_1
│   │   └── me.txt
│   ├── folder_2
│   │   └── me.txt
│   ├── folder_3
│   │   └── me.txt
│   ├── folder_4
│   │   └── me.txt
│   ├── folder_5
│   │   └── me.txt
│   └── me.txt
├── folder_4
│   ├── folder_1
│   │   └── me.txt
│   ├── folder_2
│   │   └── me.txt
│   ├── folder_3
│   │   └── me.txt
│   ├── folder_4
│   │   └── me.txt
│   ├── folder_5
│   │   └── me.txt
│   └── me.txt
└── folder_5
    ├── folder_1
    │   └── me.txt
    ├── folder_2
    │   └── me.txt
    ├── folder_3
    │   └── me.txt
    ├── folder_4
    │   └── me.txt
    ├── folder_5
    │   └── me.txt
    └── me.txt
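The same result can be reached without nested `cd` calls by letting `find` visit every directory. This is only a minimal sketch, not the script used above: it works in throwaway `mktemp` directories instead of `~/tmp`, the variable names are made up for illustration, and the message is a trimmed copy of the text above.

```shell
#!/bin/sh
# Scratch directories standing in for ~/tmp (an assumption for this sketch).
workdir=$(mktemp -d)
template_dir=$(mktemp -d)

# Recreate the 5 x 5 layout with plain loops.
for i in 1 2 3 4 5; do
    for j in 1 2 3 4 5; do
        mkdir -p "$workdir/folder_$i/folder_$j"
    done
done

# Write the text once; printf interprets \n without relying on echo -e.
printf 'Go to: http://www.biotrainee.com/\nI love bioinfomatics.\nAnd you ?\n' > "$template_dir/me.txt"

# Copy the template into every directory below the root (5 + 25 directories).
find "$workdir" -mindepth 1 -type d -exec cp "$template_dir/me.txt" {} \;

# Count the copies to confirm that no directory was missed.
file_count=$(( $(find "$workdir" -type f -name me.txt | wc -l) ))
echo "me.txt files written: $file_count"
```

Writing the file once and copying it keeps the content identical everywhere, which is what the exercise asks for.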
# How to view the help text for echo
/bin/echo --help
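The full path matters because `echo` is normally a shell builtin, and the builtin in bash and dash does not treat `--help` as an option (in bash, `help echo` documents the builtin). A small sketch of that behaviour, with illustrative variable names:

```shell
#!/bin/sh
# The builtin echo prints the argument literally instead of showing help text.
builtin_output=$(echo --help)
echo "builtin echo printed: $builtin_output"

# printf is a portable way to get escape handling without relying on echo -e.
two_lines=$(printf 'first\nsecond\n')
line_count=$(( $(printf '%s\n' "$two_lines" | wc -l) ))
echo "printf produced $line_count lines"
```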
Download the file http://www.biotrainee.com/jmzeng/igv/test.bed, then find which line contains H3K4me3 and how many lines the file has in total.
zy@VM-0-15-ubuntu:~/tmp$ wget -c http://www.biotrainee.com/jmzeng/igv/test.bed
zy@VM-0-15-ubuntu:~/tmp$ grep -n H3K4me3 test.bed
8:chr1 9810 10438 ID=SRX387603;Name=H3K4me3%20(@%20HMLE);Title=GSM1280527:%20HMLE%20Twist3D%20H3K4me3%20rep2%3B%20Homo%20sapiens%3B%20ChIP-Seq;Cell%20group=Breast;<br>source_name=HMLE_Twist3D_H3K4me3;cell%20type=human%20mammary%20epithelial%20cells;transfected%20with=Twist1;culture%20type=sphere;chip%20antibody=H3K4me3;chip%20antibody%20vendor=Millipore; 222 . 9810 10438 0,226,255
zy@VM-0-15-ubuntu:~/tmp$ wc -l test.bed
10 test.bed
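When only the numbers are wanted, the two commands above can be trimmed so that they print nothing else. A minimal sketch on a synthetic three-line file (the records are made up and only stand in for test.bed):

```shell
#!/bin/sh
# Synthetic file standing in for test.bed; the records are invented.
bed_file=$(mktemp)
printf 'chr1\t100\t200\tName=H3K27ac\n'  > "$bed_file"
printf 'chr1\t300\t400\tName=H3K4me3\n' >> "$bed_file"
printf 'chr2\t500\t600\tName=CTCF\n'    >> "$bed_file"

# grep -n prefixes each match with "LINE:", so cutting on ":" keeps only the number.
match_line=$(grep -n 'H3K4me3' "$bed_file" | cut -d: -f1)

# Reading from stdin makes wc print the count without the file name.
total_lines=$(( $(wc -l < "$bed_file") ))

echo "H3K4me3 is on line $match_line of $total_lines"
```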
Download the file http://www.biotrainee.com/jmzeng/rmDuplicate.zip, unzip it, and look at the folder structure inside.
zy@VM-0-15-ubuntu:~/tmp$ wget -c http://www.biotrainee.com/jmzeng/rmDuplicate.zip
zy@VM-0-15-ubuntu:~/tmp$ unzip rmDuplicate.zip
zy@VM-0-15-ubuntu:~/tmp$ ls
rmDuplicate rmDuplicate.zip test.bed
zy@VM-0-15-ubuntu:~/tmp$ tree rmDuplicate
rmDuplicate
├── picard
│   ├── paired
│   │   ├── readme.txt
│   │   ├── tmp.header
│   │   ├── tmp.MarkDuplicates.log
│   │   ├── tmp.metrics
│   │   ├── tmp.rmdup.bai
│   │   ├── tmp.rmdup.bam
│   │   ├── tmp.sam
│   │   └── tmp.sorted.bam
│   └── single
│       ├── readme.txt
│       ├── tmp.header
│       ├── tmp.MarkDuplicates.log
│       ├── tmp.metrics
│       ├── tmp.rmdup.bai
│       ├── tmp.rmdup.bam
│       ├── tmp.sam
│       └── tmp.sorted.bam
└── samtools
    ├── paired
    │   ├── readme.txt
    │   ├── tmp.header
    │   ├── tmp.rmdup.bam
    │   ├── tmp.rmdup.vcf.gz
    │   ├── tmp.sam
    │   ├── tmp.sorted.bam
    │   └── tmp.sorted.vcf.gz
    └── single
        ├── readme.txt
        ├── tmp.header
        ├── tmp.rmdup.bam
        ├── tmp.rmdup.vcf.gz
        ├── tmp.sam
        ├── tmp.sorted.bam
        └── tmp.sorted.vcf.gz
Go into the rmDuplicate/samtools/single folder, look at the file with the .sam suffix, and work out how SAM/BAM are defined in bioinformatics.
zy@VM-0-15-ubuntu:~$ cd ~/tmp/rmDuplicate/samtools/single
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/single$ less -S tmp.sam
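# SAM is the standard text format for sequence alignments. BAM is the compressed binary form of SAM and takes up less disk space.
# A SAM file consists of header lines (starting with @) followed by alignment records. Each record has 11 mandatory tab-separated fields, optionally followed by extra tags.
# The mandatory fields are: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL.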
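To make the header/record split concrete, here is a small sketch that builds a hand-written SAM file and inspects it with `grep` and `awk`. The content is invented for illustration and is not taken from tmp.sam; the variable names are arbitrary.

```shell
#!/bin/sh
# Hand-written SAM content for illustration only.
sam_file=$(mktemp)
printf '@HD\tVN:1.6\tSO:coordinate\n' > "$sam_file"
printf '@SQ\tSN:chr1\tLN:1000\n' >> "$sam_file"
printf 'read1\t16\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n' >> "$sam_file"

# Header lines start with "@"; everything else is an alignment record.
header_lines=$(grep -c '^@' "$sam_file")
field_count=$(awk -F'\t' '!/^@/ { print NF; exit }' "$sam_file")

# Label a few of the mandatory fields by their position.
summary=$(awk -F'\t' '!/^@/ { print "QNAME=" $1, "FLAG=" $2, "RNAME=" $3, "POS=" $4, "CIGAR=" $6 }' "$sam_file")

echo "header lines: $header_lines, fields per record: $field_count"
echo "$summary"
```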
Open the file with the .bam suffix and find the command that produced it.
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/single$ samtools view tmp.rmdup.bam
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/single$ ls
readme.txt tmp.header tmp.rmdup.bam tmp.rmdup.vcf.gz tmp.sam tmp.sorted.bam tmp.sorted.vcf.gz
# Look in tmp.header
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/single$ tail -n 3 tmp.header
@SQ SN:chrY LN:57227415
@SQ SN:chrY_KI270740v1_random LN:37240
@PG ID:bowtie2 PN:bowtie2 VN:2.2.9 CL:"/home/jianmingzeng/biosoft/bowtie/bowtie2-2.2.9/bowtie2-align-s --wrapper basic-0 -p 20 -x /home/jianmingzeng/reference/index/bowtie/hg38 -S /home/jianmingzeng/data/public/allMouse/alignment/WT_rep2_Input.sam -U /tmp/41440.unp"
# The CL: field of the @PG line holds the command we are looking for
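Instead of reading the `@PG` line by eye, the `CL:` tag can be pulled out with `awk`. The sketch below runs on a made-up header: the `@PG` line mimics the layout above, but the command inside it is hypothetical. On a real BAM file, `samtools view -H` prints the header that would feed such a filter.

```shell
#!/bin/sh
# Made-up header; the command stored in CL: is hypothetical.
header_file=$(mktemp)
printf '@SQ\tSN:chr1\tLN:1000\n' > "$header_file"
printf '@PG\tID:bowtie2\tPN:bowtie2\tVN:2.2.9\tCL:"bowtie2-align-s -x ref_index -U reads.fq"\n' >> "$header_file"

# Walk the tab-separated tags of each @PG line and print the one that starts with CL:.
command_line=$(awk -F'\t' '$1 == "@PG" {
    for (i = 2; i <= NF; i++)
        if ($i ~ /^CL:/) print substr($i, 4)
}' "$header_file")

echo "$command_line"
```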
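Based on the command above, find out how many chromosomes the reference genome I used, /home/jianmingzeng/reference/index/bowtie/hg38, contains.
# The @SQ lines of the header describe the reference sequences, one line per sequence.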
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/single$ grep -o -E "SN:chr[0-9]+|SN:chr[a-z A-Z]+" tmp.header|sort |uniq
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/single$ grep -o -E "SN:chr[0-9]+|SN:chr[a-z A-Z]+" tmp.header|sort |uniq|wc -l
26
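The pattern above matches name prefixes, so a header that contains `chrUn_*` scaffolds would add a `chrUn` entry to the tally; whether that is the case for tmp.header is not shown here. The sketch below contrasts prefix matching with whole-name matching on a synthetic header. The contig names follow hg38 naming, but the lengths are placeholders.

```shell
#!/bin/sh
# Synthetic header; the lengths are placeholders.
header_file=$(mktemp)
printf '@SQ\tSN:chr1\tLN:1000\n' > "$header_file"
printf '@SQ\tSN:chr1_KI270706v1_random\tLN:100\n' >> "$header_file"
printf '@SQ\tSN:chrUn_KI270302v1\tLN:100\n' >> "$header_file"
printf '@SQ\tSN:chrX\tLN:1000\n' >> "$header_file"
printf '@SQ\tSN:chrM\tLN:100\n' >> "$header_file"

# Prefix matching, as in the command above: chrUn ends up in the tally.
loose_count=$(( $(grep -o -E 'SN:chr[0-9]+|SN:chr[a-z A-Z]+' "$header_file" | sort | uniq | wc -l) ))

# Whole-name matching: keep only chr<number>, chrX, chrY and chrM.
strict_count=$(awk -F'\t' '$1 == "@SQ" {
    name = $2
    sub(/^SN:/, "", name)
    if (name ~ /^chr([0-9]+|X|Y|M)$/) n++
} END { print n + 0 }' "$header_file")

echo "prefix match: $loose_count, whole-name match: $strict_count"
```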
The second column of the BAM file above contains only the two numbers 0 and 16. Count how often each occurs using commands such as cut, sort and uniq.
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/single$ samtools view tmp.rmdup.bam |cut -f 2 |sort -n |uniq -c
16 0
12 16
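The second column is the SAM FLAG, a bit field. Bit 16 (0x10) marks a read aligned to the reverse strand, so in single-end data a flag of 0 means a forward-strand alignment and 16 a reverse-strand one. A minimal sketch of that test, with a helper name chosen only for illustration:

```shell
#!/bin/sh
# Return success when bit 0x10 (decimal 16, read on the reverse strand) is set.
is_reverse() {
    [ $(( $1 & 16 )) -ne 0 ]
}

for flag in 0 16; do
    if is_reverse "$flag"; then
        echo "$flag: reverse strand"
    else
        echo "$flag: forward strand"
    fi
done
```

The same check works on paired-end flags, for example 83 and 147 have bit 16 set while 99 and 163 do not.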
Now open the BAM file in the rmDuplicate/samtools/paired folder, look at the second column again, and count the values.
zy@VM-0-15-ubuntu:~/tmp/rmDuplicate/samtools/paired$ samtools view tmp.rmdup.bam |cut -f 2 |sort -n |uniq -dc
2 83
2 97
8 99
7 147
2 163
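One caveat about `uniq -dc` as used in the paired-end count above: `-d` prints only values that occur more than once, so a flag that appears a single time would be missing from the list. The sketch below shows the difference on a made-up FLAG column.

```shell
#!/bin/sh
# Made-up FLAG column: 99 and 147 occur twice, 83 only once.
flags=$(printf '99\n147\n83\n99\n147\n')

# -c counts every distinct value; adding -d keeps only values seen more than once.
all_values=$(( $(printf '%s\n' "$flags" | sort -n | uniq -c | wc -l) ))
repeated_only=$(( $(printf '%s\n' "$flags" | sort -n | uniq -dc | wc -l) ))

echo "distinct flags with -c: $all_values, with -dc: $repeated_only"
```

Plain `uniq -c` is therefore the safer choice when every value should be reported.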
Download the file http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip, unzip it, and look at the folder structure inside.
zy@VM-0-15-ubuntu:~/tmp$ wget -c http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip
zy@VM-0-15-ubuntu:~/tmp$ ls -lh
total 2.4M
drwxrwxr-x 4 zy zy 4.0K Nov 12 2016 rmDuplicate
-rw-rw-r-- 1 zy zy 103K Nov 12 2016 rmDuplicate.zip
-rw-rw-r-- 1 zy zy 2.3M Oct 6 2016 sickle-results.zip
-rw-rw-r-- 1 zy zy 3.1K May 18 2017 test.bed
Unzip sickle-results/single_tmp_fastqc.zip, go into the extracted folder, find the file fastqc_data.txt, and count how many of its lines start with >>.
zy@VM-0-15-ubuntu:~/tmp/sickle-results$ unzip single_tmp_fastqc.zip
zy@VM-0-15-ubuntu:~/tmp/sickle-results$ cd single_tmp_fastqc/
zy@VM-0-15-ubuntu:~/tmp/sickle-results/single_tmp_fastqc$ ls
fastqc_data.txt fastqc.fo fastqc_report.html Icons Images summary.txt
zy@VM-0-15-ubuntu:~/tmp/sickle-results/single_tmp_fastqc$ grep '^>>' fastqc_data.txt |cat -n
1 >>Basic Statistics pass
2 >>END_MODULE
3 >>Per base sequence quality pass
4 >>END_MODULE
5 >>Per tile sequence quality pass
6 >>END_MODULE
7 >>Per sequence quality scores pass
8 >>END_MODULE
9 >>Per base sequence content fail
10 >>END_MODULE
11 >>Per sequence GC content warn
12 >>END_MODULE
13 >>Per base N content pass
14 >>END_MODULE
15 >>Sequence Length Distribution warn
16 >>END_MODULE
17 >>Sequence Duplication Levels pass
18 >>END_MODULE
19 >>Overrepresented sequences warn
20 >>END_MODULE
21 >>Adapter Content pass
22 >>END_MODULE
23 >>Kmer Content warn
24 >>END_MODULE
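If only the number is needed, `grep -c` gives it directly, and filtering out the `>>END_MODULE` markers gives the number of modules. A small sketch on a two-module excerpt written by hand in the same layout (not the real fastqc_data.txt):

```shell
#!/bin/sh
# Hand-written excerpt in the fastqc_data.txt layout.
fastqc_file=$(mktemp)
printf '>>Basic Statistics\tpass\n' > "$fastqc_file"
printf '#Measure\tValue\n' >> "$fastqc_file"
printf '>>END_MODULE\n' >> "$fastqc_file"
printf '>>Per base sequence quality\tpass\n' >> "$fastqc_file"
printf '>>END_MODULE\n' >> "$fastqc_file"

# Count every line that starts with ">>".
marker_lines=$(grep -c '^>>' "$fastqc_file")

# Drop the closing markers to count the modules themselves.
module_count=$(grep '^>>' "$fastqc_file" | grep -vc '^>>END_MODULE')

echo "lines starting with >>: $marker_lines, modules: $module_count"
```

By the same logic, the 24 lines listed above correspond to 12 modules.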
Download the file http://www.biotrainee.com/jmzeng/tmp/hg38.tss, look up the RefSeq IDs of TP53, BRCA1 or other genes you are interested in at NCBI, then find which line of hg38.tss each one is on. (https://www.ncbi.nlm.nih.gov/gene/7157)
Click NM_001126113.2 to open its record page.
zy@VM-0-15-ubuntu:~/tmp$ grep NM_001126113 hg38.tss
NM_001126113 chr17 7685550 7689550 1
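Since the question asks for the line, it helps to print the line number as well; `grep -n` does that, and `awk` can additionally compare the whole first column so that a longer accession sharing the same prefix cannot match by accident. A minimal sketch on three records copied from this post (so the resulting number applies to this small file only, not to the real hg38.tss):

```shell
#!/bin/sh
# Three records copied from the hg38.tss lines shown in this post, tab-separated.
tss_file=$(mktemp)
printf 'NR_046018\tchr1\t9874\t13874\t0\n' > "$tss_file"
printf 'NR_024540\tchr1\t27370\t31370\t1\n' >> "$tss_file"
printf 'NM_001126113\tchr17\t7685550\t7689550\t1\n' >> "$tss_file"

# Exact match on column 1; NR is awk's current line number.
line_number=$(awk -F'\t' '$1 == "NM_001126113" { print NR }' "$tss_file")

echo "NM_001126113 is on line $line_number"
```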
Parse hg38.tss and count the number of genes on each chromosome.
zy@VM-0-15-ubuntu:~/tmp$ cat hg38.tss |head
NR_046018 chr1 9874 13874 0
NR_024540 chr1 27370 31370 1
NR_104148 chr7 64664083 64668083 0
NR_111960 chrX 44871175 44875175 0
NR_028458 chr14 92104621 92108621 1
NR_028459 chr14 92104621 92108621 1
NR_026818 chr1 34081 38081 1
NR_026820 chr1 34081 38081 1
NR_026822 chr1 34081 38081 1
NM_001005484 chr1 67091 71091 0
zy@VM-0-15-ubuntu:~/tmp$ cat hg38.tss |cut -f2 >chr.txt
zy@VM-0-15-ubuntu:~/tmp$ less chr.txt
zy@VM-0-15-ubuntu:~/tmp$ wc -l chr.txt
zy@VM-0-15-ubuntu:~/tmp$ wc -l chr2.txt
zy@VM-0-15-ubuntu:~/tmp$ less chr2.txt
zy@VM-0-15-ubuntu:~/tmp$ sort chr2.txt |uniq -dc
6157 chr1
2838 chr10
3577 chr11
3014 chr12
1133 chr13
1982 chr14
2377 chr15
2696 chr16
3794 chr17
883 chr18
5880 chr19
4090 chr2
1692 chr20
895 chr21
1410 chr22
3395 chr3
2277 chr4
2821 chr5
5782 chr6
2785 chr7
2221 chr8
2310 chr9
2 chrM
32 chrUn
2561 chrX
414 chrY
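The per-chromosome tally can also be done in a single pass over column 2, without intermediate files. The step that produced chr2.txt is not shown above, so this sketch makes no assumption about it and simply counts the names as they appear; it runs on the ten records printed by `head`, rebuilt with tabs.

```shell
#!/bin/sh
# The ten records from the head output, rebuilt as tab-separated lines.
tss_file=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
    NR_046018 chr1 9874 13874 0 \
    NR_024540 chr1 27370 31370 1 \
    NR_104148 chr7 64664083 64668083 0 \
    NR_111960 chrX 44871175 44875175 0 \
    NR_028458 chr14 92104621 92108621 1 \
    NR_028459 chr14 92104621 92108621 1 \
    NR_026818 chr1 34081 38081 1 \
    NR_026820 chr1 34081 38081 1 \
    NR_026822 chr1 34081 38081 1 \
    NM_001005484 chr1 67091 71091 0 > "$tss_file"

# Tally column 2 in awk, then sort by count (descending) and name.
per_chrom=$(awk -F'\t' '{ count[$2]++ } END { for (c in count) print count[c], c }' "$tss_file" \
    | sort -k1,1nr -k2,2)

echo "$per_chrom"
```

Unlike `uniq -dc`, this also reports chromosomes that occur only once (chr7 and chrX in this sample). On the full file, names such as `chr1_KI270706v1_random` would be counted as separate keys unless they are normalized first.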
Parse hg38.tss, count the sequences whose IDs start with NM and with NR, and learn what the NM and NR prefixes mean.
zy@VM-0-15-ubuntu:~/tmp$ grep -o -E "NM|NR" hg38.tss |sort |uniq -c
51064 NM
15954 NR
# NM_ accessions are RefSeq protein-coding transcripts (mRNA); NR_ accessions are non-coding RNA transcripts
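A slightly stricter way to tally the prefixes is to split the accession on `_` and compare the first piece, so that "NM" or "NR" appearing elsewhere on a line cannot be counted. A sketch on four accessions taken from the `head` output earlier in this post:

```shell
#!/bin/sh
# Accessions taken from the head output earlier in this post.
accessions=$(printf '%s\n' NR_046018 NR_024540 NR_104148 NM_001005484)

# Split on "_" and compare the prefix exactly.
nm_count=$(printf '%s\n' "$accessions" | awk -F'_' '$1 == "NM" { n++ } END { print n + 0 }')
nr_count=$(printf '%s\n' "$accessions" | awk -F'_' '$1 == "NR" { n++ } END { print n + 0 }')

echo "NM: $nm_count, NR: $nr_count"
```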