最大公共子串和最大公共子序列

1 最大公共子串

問題描述:給定兩個字符串,求出最長公共子字符串及其長度

Python實現(xiàn)

def lcsubstr(s1, s2):
    n1 = len(s2)
    n2 = len(s2)
    m = [[0 for i in range(n1+1)] for j in range(n2+1)]
    mmax = 0
    p = 0
    for i in range(n1):
        for j in range(n2):
            if s1[i] == s2[j]:
                m[i+1][j+1] = m[i][j] + 1
                if m[i+1][j+1] > mmax:
                    mmax = m[i+1][j+1]
                    p = i+1
    return s1[p-mmax:p], mmax

2 最大公共子序列

子序列是指將給定序列中的零個或多個元素去掉之后的結(jié)果。給定兩個子序列X和Y,如果序列Z既是X的子序列也是Y的子序列,那么稱Z是X和Y的公共子序列,其中Z的元素不要求在X和Y中連續(xù)出現(xiàn)。

遞歸式表達如下:
c[i,j]= \begin{cases} 0, & if\ i=0\ or\ j=0 \\ c[i-1,j-1], &if\ i,j>0\ and\ x_i = y_j \\ max(c[i-1,j],c[i,j-1]) & if\ i,j>0\ and\ x_i \ne y_j \end{cases}

為了求最長公共子序列的長度,只需要采用動態(tài)規(guī)劃算法自下而上的填充c[i,j]即可,而為了求出最長公共子序列,按照算法導(dǎo)論書上的方法,維護一張表,記錄尋找最長公共子序列的方向。

不需要求解LCS本身的python實現(xiàn)

def lcs(s1, s2):
    n1 = len(s1)
    n2 = len(s2)
    m = [[0 for i in range(n1+1)] for j in range(n2+1)]
    b = [[0 for i in range(n1+1)] for j in range(n2+1)]
    # According to the list comprehsions, note the maximum index of the row is n2
    # and the maximum index of the column is n1, so 0<= i <=n2 and 0<= j <=n1,
    # otherwise you may cause error "string index out of range".
    for i in range(n2):
        for j in range(n1):
            if s1[j] == s2[i]:
                m[i+1][j+1] = m[i][j] + 1
                b[i+1][j+1] = 'ok'
            elif m[i+1][j] > m[i][j+1]:
                m[i+1][j+1] = m[i+1][j]
                b[i+1][j+1] = 'left'
            else:
                m[i+1][j+1] = m[i][j+1]
                b[i+1][j+1] = 'up'
    return b, m[-1][-1]

需要求解LCS時,

def lcs(s1, s2):
    n1 = len(s1)
    n2 = len(s2)
    m = [[0 for i in range(n1+1)] for j in range(n2+1)]
    b = [[0 for i in range(n1+1)] for j in range(n2+1)]
    # According to the list comprehsions, note the maximum index of the row is n2
    # and the maximum index of the column is n1, so 0<= i <=n2 and 0<= j <=n1,
    # otherwise you may cause error "string index out of range".
    for i in range(n2):
        for j in range(n1):
            if s1[j] == s2[i]:
                m[i+1][j+1] = m[i][j] + 1
                b[i+1][j+1] = 'ok'
            elif m[i+1][j] > m[i][j+1]:
                m[i+1][j+1] = m[i+1][j]
                b[i+1][j+1] = 'left'
            else:
                m[i+1][j+1] = m[i][j+1]
                b[i+1][j+1] = 'up'
    return b, m[-1][-1]

def printcls(b, s2, i, j):
    if not i or not b:
        return
    if b[i][j] == 'ok':
        printcls(b, s2, i-1, j-1)
        print(s2[i-1], end='')
    elif b[i][j] == 'left':
        printcls(b, s2, i, j-1)
    elif b[i][j] == 'up':
        printcls(b, s2, i-1, j)

if __name__ == '__main__':
    s1 = 'helloworld'
    s2 = 'loop'
    b, mmax = lcs(s1, s2)
    print(mmax)
    printcls(b, s2, len(s2), len(s1))

簡直非??拥?,生成列表時候是

m = [[0 for i in range(len(s1)+1)] for j in range(len(s2)+1)]

但是之后s1得用j索引,s2用i索引,i和j分別表示行和列,范圍是[0, n2-1]和[0, n1-1]

對比一下

# In the following code, the matrix represented by numpy is ok but by list comprehsions will cause bug.
import numpy as np
m = np.zeros([4, 10], dtype=np.int)
for i in range(4):
    for j in range(10):
        print(m[i][j])

m = [0 for i in range(4) for j in range(10)]
for i in range(4):
    for j in range(10):
        print(m[i][j])

因為np.zeros([4, 10], dtype=np.int)表示4行10列,而[0 for i in range(4) for j in range(10)]表示10行4列

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

友情鏈接更多精彩內(nèi)容