国产一区二区精彩福利,我慢在线不卡视频

劉小澤記錄于 2019.5.10
將Turner寫的一些好用的單行命令與大家分享，原文還有許多可以去看
https://github.com/stephenturner/oneliners

About Fastq/fasta

fastq sequences length distribution => 得到fq文件中序列長度的分布

$ zcat file.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}'

reverse complement => 反向互補(bǔ)

$ echo 'ATTGCTATGCTNNNT' | rev | tr 'ACTG' 'TGAC'

fastq2fasta

$ zcat file.fastq.gz | paste - - - - | perl -ane 'print ">$F[0]\n$F[1]\n";' | gzip -c > file.fasta.gz

split a multifasta file into single ones with csplit => fasta按>拆分

# * refers to the number of files 可以選擇拆分的文件數(shù)量
$ csplit -z -q -n 4 -f test test.fa /\>/ {*}
# OR use awk 一次性全部按>拆分
$ awk '/^>/{s=++d".fa"} {print > s}' multi.fa

single line fasta to multi-line of 50 characters in each line => 單行fa變多行

$ awk -v FS= '/^>/{print;next}{for (i=0;i<=NF/50;i++) {for (j=1;j<=50;j++) printf "%s", $(i*50 +j); print ""}}' file

# fold -w 50 file

multi-line fasta to one-line => 一個多行fa文件變單行

# 方法一：
$ awk 'BEGIN{RS=">"}NR>1{sub("\n","\t"); gsub("\n",""); print RS$0}' file.fa
# 方法二：
$ cat file.fasta | awk '/^>/{if(N>0) printf("\n"); ++N; printf("%s\t",$0);next;} {printf("%s",$0);}END{printf("\n");}'

Number of reads in a fastq file => 統(tǒng)計fq中序列數(shù)（4行一個序列）

$ cat file.fq | echo $((`wc -l`/4))

print length of each entry in a multifasta file => 打印fa中每個序列的長度

$ awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' file.fa

subsample fastq => 取fq文件的子集（其中0.01是指取出來百分之1的reads）

$ cat file.fq | paste - - - - | awk 'BEGIN{srand(1234)}{if(rand() < 0.01) print $0}' | tr '\t' '\n' > out.fq

About Sam/Bam

bam2bed

samtools view file.bam | perl -F'\t' -ane '$strand=($F[1]&16)?"-":"+";$length=1;$tmp=$F[5];$tmp =~ s/(\d+)[MD]/$length+=$1/eg;print "$F[2]\t$F[3]\t".($F[3]+$length)."\t$F[0]\t0\t$strand\n";' > file.bed

bam2wig

samtools mpileup -BQ0 file.sorted.bam | perl -pe '($c, $start, undef, $depth) = split;if ($c ne $lastC || $start != $lastStart+1) {print "fixedStep chrom=$c start=$start step=1 span=1\n";}$_ = $depth."\n";($lastC, $lastStart) = ($c, $start);' | gzip -c > file.wig.gz

Basis Linux

get all folders' size in the current folder => 當(dāng)前目錄下的所有目錄大小

$ du -h --max-depth=1

exit a dead ssh session => 退出卡死的ssh界面

$ ~.

copy large folders fast => 快速拷貝大文件夾

# copy every file in folder 拷貝目錄下的所有文件
rsync -av from_dir/ to_dir
# skip transferred files 跳過已拷貝的文件
rsync -avhP from_dir/ to_dir

find bam in the current folder recursively and copy them to a new dir with 5 CPUs => 拷貝大文件（如bam）到其他文件夾，并用5個線程

find . -name "*bam" | xargs -P5 -I{} rsync -av {} dest_dir

group files by extensions => 按照后綴的順序排序文件

ll -X

loop through all the names => 循環(huán)語句

for i in {1..22} X Y 
do
  echo $i
done
# 對于{01..22} 的結(jié)果是 01 02 ...

GREP

grep fastq reads containing a pattern but maintain the fastq format => 匹配fq中序列并打印

# 例如要在SP1.fq中找到這段序列的fq格式
# 如果匹配到多個，那么每條序列中間會用--分隔，因此需要用sed去除
$ grep -A 2 -B 1 'TGAGACAACATCT' SP1.fq | sed '/^--$/d' > out.fq

SED

delete with sed => 刪除行

# delete blank lines
sed /^$/d
# delete the last line
sed $d

AWK

awk join two files with common columns => awk連接有共同列的文件（類似于R的merge函數(shù)）

# http://stackoverflow.com/questions/13258604/join-two-files-using-awk
# file_a.bed： 
chr1    123 aa  b   c   d
chr1    234 a   b   c   d
chr1    345 aa  b   c   d
chr1    456 a   b   c   d
# file_b.bed
xxxx    abcd    chr1    123 aa  c   d   e
yyyy    defg    chr1    345 aa  e   f   g
# 現(xiàn)在想在a的基礎(chǔ)上根據(jù)a、b共有列來增加b中的新內(nèi)容
$ awk 'NR==FNR{a[$3,$4,$5]=$1OFS$2;next}{$6=a[$1,$2,$3];print}' OFS='\t' \
file_b.bed file_a.bed

# 結(jié)果
chr1    123 aa  b   c   xxxx    abcd
chr1    234 a   b   c   
chr1    345 aa  b   c   yyyy    defg
chr1    456 a   b   c

Explanation:

NR==FNR NR is the current input line number and FNR the current file's line number. The two will be equal only while the 1st file is being read.

OFS awk set the output field seperator; while set the input seperator is -F

next means to proceed for the next line, rather than execute the following { } code block

awk to compare two different files and print if matches=> 比較兩個文件的指定列，然后打印比對上的行

# https://unix.stackexchange.com/questions/134829/compare-two-columns-of-different-files-and-print-if-it-matches
# 例如 file1
abc|123|BNY|apple|
cab|234|cyx|orange|
def|kumar|pki|bird|
# file2
abc|123|
kumar|pki|
cab|234
# expected
abc|123|BNY|apple|
cab|234|cyx|orange|

$  awk -F'|' 'NR==FNR{a[$1$2]++;next};a[$1$2] > 0' file2 file1

conditional operator => 條件判斷

基本格式：var=condition?condition_if_true:condition_if_false

例如：

# 現(xiàn)在有這個文件 test
a1  ACTGTCTGTCACTGTGTTGTGATGTTG
a2  ACTTTATATAT
a3  ACTTATATATATATA
a4  ACTTATATATATATA
a5  ACTTTATATATT    
# 我想看看每行序列部分是不是大于14個堿基
$ awk '{print (length($2)>14)?$0">14":$0"<=14"}' test

get new line => 在原來的內(nèi)容基礎(chǔ)上增加新內(nèi)容

$ awk 'BEGIN{while((getline k <"test")>0) print "NEW:"k}{print}' test

merge multi-fasta into one single fasta => 合并多個fasta文件到一個文件中

# give a awk script called linearize.awk
$cat >linearize.awk 
# then copy and paste below
/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}

# run the awk script
$ paste <(awk -f linearize.awk file1.fa ) <(awk -f linearize.awk file2.fa  )| tr "\t" "\n" > multi.fa

根據(jù)id輸出序列

while read -r line; do awk -v pattern=$line -v RS=">" '$0 ~ pattern { printf(">%s", $0); }'  Seq.fasta; done < id.txt > output.fa

需要指定Seq.fasta、id.txt

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

與你分享生信好用的單行命令

與你分享生信好用的單行命令

About Fastq/fasta

fastq sequences length distribution => 得到fq文件中序列長度的分布

reverse complement => 反向互補(bǔ)

fastq2fasta

split a multifasta file into single ones with csplit => fasta按>拆分

single line fasta to multi-line of 50 characters in each line => 單行fa變多行

multi-line fasta to one-line => 一個多行fa文件變單行

Number of reads in a fastq file => 統(tǒng)計fq中序列數(shù)（4行一個序列）

print length of each entry in a multifasta file => 打印fa中每個序列的長度

subsample fastq => 取fq文件的子集（其中0.01是指取出來百分之1的reads）

About Sam/Bam

bam2bed

bam2wig

Basis Linux

get all folders' size in the current folder => 當(dāng)前目錄下的所有目錄大小

exit a dead ssh session => 退出卡死的ssh界面

copy large folders fast => 快速拷貝大文件夾

find bam in the current folder recursively and copy them to a new dir with 5 CPUs => 拷貝大文件（如bam）到其他文件夾，并用5個線程

group files by extensions => 按照后綴的順序排序文件

loop through all the names => 循環(huán)語句

GREP

grep fastq reads containing a pattern but maintain the fastq format => 匹配fq中序列并打印

SED

delete with sed => 刪除行

AWK

awk join two files with common columns => awk連接有共同列的文件（類似于R的merge函數(shù)）

awk to compare two different files and print if matches=> 比較兩個文件的指定列，然后打印比對上的行

conditional operator => 條件判斷

get new line => 在原來的內(nèi)容基礎(chǔ)上增加新內(nèi)容

merge multi-fasta into one single fasta => 合并多個fasta文件到一個文件中

根據(jù)id輸出序列

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

與你分享生信好用的單行命令

About Fastq/fasta

fastq sequences length distribution => 得到fq文件中序列長度的分布

reverse complement => 反向互補(bǔ)

fastq2fasta

split a multifasta file into single ones with csplit => fasta按>拆分

single line fasta to multi-line of 50 characters in each line => 單行fa變多行

multi-line fasta to one-line => 一個多行fa文件變單行

Number of reads in a fastq file => 統(tǒng)計fq中序列數(shù)（4行一個序列）

print length of each entry in a multifasta file => 打印fa中每個序列的長度

subsample fastq => 取fq文件的子集（其中0.01是指取出來百分之1的reads）

About Sam/Bam

bam2bed

bam2wig

Basis Linux

get all folders' size in the current folder => 當(dāng)前目錄下的所有目錄大小

exit a dead ssh session => 退出卡死的ssh界面

copy large folders fast => 快速拷貝大文件夾

find bam in the current folder recursively and copy them to a new dir with 5 CPUs => 拷貝大文件（如bam）到其他文件夾，并用5個線程

group files by extensions => 按照后綴的順序排序文件

loop through all the names => 循環(huán)語句

GREP

grep fastq reads containing a pattern but maintain the fastq format => 匹配fq中序列并打印

SED

delete with sed => 刪除行

AWK

awk join two files with common columns => awk連接有共同列的文件（類似于R的merge函數(shù)）

awk to compare two different files and print if matches=> 比較兩個文件的指定列，然后打印比對上的行

conditional operator => 條件判斷

get new line => 在原來的內(nèi)容基礎(chǔ)上增加新內(nèi)容

merge multi-fasta into one single fasta => 合并多個fasta文件到一個文件中

根據(jù)id輸出序列

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

find bam in the current folder recursively and copy them to a new dir with 5 CPUs => 拷貝大文件（如bam）到其他文件夾，并用5個線程

awk to compare two different files and print if matches=> 比較兩個文件的指定列，然后打印比對上的行