Repeated DNA Sequences

題目來源
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].

看到這道題的最初想法是用哈希,記錄每一個十個字母長度的數(shù)目,如果大于等于2則輸出。然后直接AC了!

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        int n = s.length();
        if (n < 10)
            return res;
        unordered_map<string, int> map;
        for (int i=0; i<=n-10; i++) {
            map[s.substr(i, 10)]++;
            if (map[s.substr(i, 10)] == 2)
                res.push_back(s.substr(i, 10));
        }
        return res;
    }
};

然后看看大神們的做法吧,結(jié)合上位操作。
The main idea is to store the substring as int in map to bypass the memory limits.
There are only four possible character A, C, G, and T, but I want to use 3 bits per letter instead of 2.
Why? It's easier to code.
A is 0x41, C is 0x43, G is 0x47, T is 0x54. Still don't see it? Let me write it in octal.
A is 0101, C is 0103, G is 0107, T is 0124. The last digit in octal are different for all four letters. That's all we need!
We can simply use s[i] & 7 to get the last digit which are just the last 3 bits, it's much easier than lookup table or switch or a bunch of if and else, right?
We don't really need to generate the substring from the int. While counting the number of occurrences, we can push the substring into result as soon as the count becomes 2, so there won't be any duplicates in the result.

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        int n = s.length();
        if (n < 10)
            return res;
        unordered_map<int, int> map;
        int i = 0, t = 0;
        while (i < 9) {
            t = t << 3 | s[i++] & 7;
        }
        while (i < n){
            if (map[t = t << 3 & 0x3FFFFFFF | s[i++] & 7]++ == 1)
                res.push_back(s.substr(i-10, 10));
        }
        return res;
    }
};
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容