Using Python to find the complementary DNA sequence of a known DNA template (day 44 of self-study)

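Given a DNA sequence, ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT, find its complementary DNA sequence. In biology, base pairing can be summarized as: A pairs with T, and C pairs with G. Computing the complement therefore means replacing every A in the sequence with T, every C with G, every T with A, and every G with C, which gives: TGACTAGCTAATGCATATCATAAACGATAGTATGTATATATAGCTACGCAAGTA
Based on this description, I can use the replace() method to do the substitutions: A with T, T with A, C with G, and G with C. The code is as follows: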

my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
# replace A with T
sequence1 = my_dna.replace('A', 'T')
# replace T with A
sequence2 = sequence1.replace('T', 'A')
# replace C with G
sequence3 = sequence2.replace('C', 'G')
# replace G with C
sequence4 = sequence3.replace('G', 'C')
# print the result of the final replacement
print(sequence1)
print(sequence2)
print(sequence3)
print(sequence4)

The output is as follows:

TCTGTTCGTTTTCGTTTTGTTTTTGCTTTCTTTCTTTTTTTTCGTTGCGTTCTT
ACAGAACGAAAACGAAAAGAAAAAGCAAACAAACAAAAAAAACGAAGCGAACAA
AGAGAAGGAAAAGGAAAAGAAAAAGGAAAGAAAGAAAAAAAAGGAAGGGAAGAA
ACACAACCAAAACCAAAACAAAAACCAAACAAACAAAAAAAACCAACCCAACAA
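
Tracing the same kind of chained calls on a very short input makes the problem easier to see. This is only a minimal sketch; the two-letter string `short_dna` is a made-up example and not part of the original exercise. The correct complement of "AT" would be "TA".

```python
# Hypothetical two-letter input, chosen only to make the problem easy to see.
short_dna = "AT"

# Step 1: the original A becomes T, so the string now holds two T's.
step1 = short_dna.replace('A', 'T')

# Step 2: every T is replaced, including the one that used to be an A.
step2 = step1.replace('T', 'A')

print(step1)  # TT
print(step2)  # AA
```

After the second call there is no way to tell which letters were original and which were already replaced.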

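The result is clearly wrong. The error already appears between sequence1 and sequence2: the T's that were produced from A's in sequence1 are replaced again in sequence2, so every original A and T ends up as A. We need a different approach that never replaces a base that has already been replaced. We can try performing each replacement on the original sequence only; the attempted code is as follows: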

my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
# replace A with T
sequence1 = my_dna.replace('A', 'T')
# replace T with A
sequence2 = my_dna.replace('T', 'A')
# replace C with G
sequence3 = my_dna.replace('C', 'G')
# replace G with C
sequence4 = my_dna.replace('G', 'C')
print(sequence1)
print(sequence2)
print(sequence3)
print(sequence4)

The output is as follows:

TCTGTTCGTTTTCGTTTTGTTTTTGCTTTCTTTCTTTTTTTTCGTTGCGTTCTT
ACAGAACGAAAACGAAAAGAAAAAGCAAACAAACAAAAAAAACGAAGCGAACAA
AGTGATGGATTAGGTATAGTATTTGGTATGATAGATATATATGGATGGGTTGAT
ACTCATCCATTACCTATACTATTTCCTATCATACATATATATCCATCCCTTCAT

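This result is clearly wrong as well: each output has only one kind of base replaced, and the four partial results are never combined. So we need to introduce intermediate values and map them back at the end.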


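That is, we introduce four temporary letters, transform each base twice (first into its temporary letter, then into its complement), and print the final result. The code could be: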

my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
sequence1 = my_dna.replace('A', 'H')
sequence2 = sequence1.replace('T', 'J')
sequence3 = sequence2.replace('C', 'K')
sequence4 = sequence3.replace('G', 'L')
sequence5 = sequence4.replace('H', 'T')
sequence6 = sequence5.replace('J', 'A')
sequence7 = sequence6.replace('K', 'G')
sequence8 = sequence7.replace('L', 'C')
print(sequence8)

The result is:
TGACTAGCTAATGCATATCATAAACGATAGTATGTATATATAGCTACGCAAGTA
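
The eight replace() calls can also be folded into a loop over two lists of pairs. The following is only a minimal sketch of that idea; the function name `complement_via_placeholders` and the two pair lists are my own illustrative choices, and it assumes the input contains only the uppercase letters A, T, C, and G.

```python
def complement_via_placeholders(dna):
    """Complement an uppercase DNA string using temporary placeholder letters."""
    # Pass 1: move every base to a placeholder letter that is not itself a base.
    to_placeholder = [('A', 'H'), ('T', 'J'), ('C', 'K'), ('G', 'L')]
    # Pass 2: turn each placeholder into the complement of the base it stands for.
    from_placeholder = [('H', 'T'), ('J', 'A'), ('K', 'G'), ('L', 'C')]
    for old, new in to_placeholder + from_placeholder:
        dna = dna.replace(old, new)
    return dna


my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
result = complement_via_placeholders(my_dna)
print(result)

# Sanity check: complementing the complement should give back the original.
print(complement_via_placeholders(result) == my_dna)
```

The round-trip check at the end is a cheap way to catch a mistyped pair, since a wrong mapping would not return the original sequence.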

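At this point we have the result we wanted, but this method is clearly somewhat cumbersome. We can instead use letter case to do the job: use lowercase letters as the temporary values, and finally call upper() to output the result in uppercase. The code and result are as follows: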

my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
sequence1 = my_dna.replace('A', 't')
print(sequence1)
sequence2 = sequence1.replace('T', 'a')
print(sequence2)
sequence3 = sequence2.replace('C', 'g')
print(sequence3)
sequence4 = sequence3.replace('G', 'c')
print(sequence4)
print(sequence4.upper())

The result is:

tCTGtTCGtTTtCGTtTtGTtTTTGCTtTCtTtCtTtTtTtTCGtTGCGTTCtT
tCaGtaCGtaatCGatatGataaaGCataCtatCtatatataCGtaGCGaaCta
tgaGtagGtaatgGatatGataaaGgatagtatgtatatatagGtaGgGaagta
tgactagctaatgcatatcataaacgatagtatgtatatatagctacgcaagta
TGACTAGCTAATGCATATCATAAACGATAGTATGTATATATAGCTACGCAAGTA
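
One thing to watch with the lowercase trick is that it only works if the input has no lowercase letters to begin with, because a lowercase letter in the input would look like an already-replaced base. A minimal sketch of the same steps wrapped in a function with a guard for that case; the name `complement_via_lowercase` and the choice to raise ValueError are my own assumptions, not something the exercise requires.

```python
def complement_via_lowercase(dna):
    """Complement a DNA string, using lowercase letters as temporary values."""
    # Guard (my own addition): reject anything other than uppercase A, T, C, G,
    # since lowercase input would be mistaken for already-replaced bases.
    unexpected = set(dna) - set("ATCG")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Same four replacements as above, chained, then converted to uppercase.
    return (dna.replace('A', 't')
               .replace('T', 'a')
               .replace('C', 'g')
               .replace('G', 'c')
               .upper())


my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
print(complement_via_lowercase(my_dna))
```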

We now have the complementary DNA sequence. There may well be better, more concise code; feel free to add it in the comments.
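
One more concise option I am aware of is Python's built-in str.maketrans() together with str.translate(), which maps every character in a single pass, so no temporary letters are needed. A minimal sketch, again assuming the input contains only uppercase A, T, C, and G; the variable names `complement_table`, `pairs`, and `complement_by_dict` are my own. The second half shows the same idea with a plain dictionary.

```python
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"

# Build a character-to-character table once: A<->T and C<->G.
complement_table = str.maketrans("ATCG", "TAGC")

# translate() looks up each character of the original string in the table,
# so an already-translated character is never translated again.
complement = my_dna.translate(complement_table)
print(complement)  # TGACTAGCTAATGCATATCATAAACGATAGTATGTATATATAGCTACGCAAGTA

# The same idea with a dictionary and a generator expression.
pairs = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
complement_by_dict = ''.join(pairs[base] for base in my_dna)
print(complement_by_dict == complement)  # True
```

Note that translate() leaves characters that are not in the table unchanged, while the dictionary version raises KeyError for them, so the two differ in how they treat unexpected input.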

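Daily wrap-up:
Although this is only a tiny program, for a beginner like me every upgrade to the original code, even just adding comments after understanding it, feels like a step forward. In short, however small the code, the important thing is to write it yourself! I hope more people learning Python won't be like me, with high ambitions but little practice. Learning to program means: think, type code, think, type code, type code, and type code again!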
