1 最大公共子串
問題描述:給定兩個字符串,求出最長公共子字符串及其長度
Python實現(xiàn)
def lcsubstr(s1, s2):
n1 = len(s2)
n2 = len(s2)
m = [[0 for i in range(n1+1)] for j in range(n2+1)]
mmax = 0
p = 0
for i in range(n1):
for j in range(n2):
if s1[i] == s2[j]:
m[i+1][j+1] = m[i][j] + 1
if m[i+1][j+1] > mmax:
mmax = m[i+1][j+1]
p = i+1
return s1[p-mmax:p], mmax
2 最大公共子序列
子序列是指將給定序列中的零個或多個元素去掉之后的結(jié)果。給定兩個子序列X和Y,如果序列Z既是X的子序列也是Y的子序列,那么稱Z是X和Y的公共子序列,其中Z的元素不要求在X和Y中連續(xù)出現(xiàn)。
遞歸式表達如下:
為了求最長公共子序列的長度,只需要采用動態(tài)規(guī)劃算法自下而上的填充即可,而為了求出最長公共子序列,按照算法導(dǎo)論書上的方法,維護一張表,記錄尋找最長公共子序列的方向。
不需要求解LCS本身的python實現(xiàn)
def lcs(s1, s2):
n1 = len(s1)
n2 = len(s2)
m = [[0 for i in range(n1+1)] for j in range(n2+1)]
b = [[0 for i in range(n1+1)] for j in range(n2+1)]
# According to the list comprehsions, note the maximum index of the row is n2
# and the maximum index of the column is n1, so 0<= i <=n2 and 0<= j <=n1,
# otherwise you may cause error "string index out of range".
for i in range(n2):
for j in range(n1):
if s1[j] == s2[i]:
m[i+1][j+1] = m[i][j] + 1
b[i+1][j+1] = 'ok'
elif m[i+1][j] > m[i][j+1]:
m[i+1][j+1] = m[i+1][j]
b[i+1][j+1] = 'left'
else:
m[i+1][j+1] = m[i][j+1]
b[i+1][j+1] = 'up'
return b, m[-1][-1]
需要求解LCS時,
def lcs(s1, s2):
n1 = len(s1)
n2 = len(s2)
m = [[0 for i in range(n1+1)] for j in range(n2+1)]
b = [[0 for i in range(n1+1)] for j in range(n2+1)]
# According to the list comprehsions, note the maximum index of the row is n2
# and the maximum index of the column is n1, so 0<= i <=n2 and 0<= j <=n1,
# otherwise you may cause error "string index out of range".
for i in range(n2):
for j in range(n1):
if s1[j] == s2[i]:
m[i+1][j+1] = m[i][j] + 1
b[i+1][j+1] = 'ok'
elif m[i+1][j] > m[i][j+1]:
m[i+1][j+1] = m[i+1][j]
b[i+1][j+1] = 'left'
else:
m[i+1][j+1] = m[i][j+1]
b[i+1][j+1] = 'up'
return b, m[-1][-1]
def printcls(b, s2, i, j):
if not i or not b:
return
if b[i][j] == 'ok':
printcls(b, s2, i-1, j-1)
print(s2[i-1], end='')
elif b[i][j] == 'left':
printcls(b, s2, i, j-1)
elif b[i][j] == 'up':
printcls(b, s2, i-1, j)
if __name__ == '__main__':
s1 = 'helloworld'
s2 = 'loop'
b, mmax = lcs(s1, s2)
print(mmax)
printcls(b, s2, len(s2), len(s1))
簡直非??拥?,生成列表時候是
m = [[0 for i in range(len(s1)+1)] for j in range(len(s2)+1)]
但是之后s1得用j索引,s2用i索引,i和j分別表示行和列,范圍是[0, n2-1]和[0, n1-1]
對比一下
# In the following code, the matrix represented by numpy is ok but by list comprehsions will cause bug.
import numpy as np
m = np.zeros([4, 10], dtype=np.int)
for i in range(4):
for j in range(10):
print(m[i][j])
m = [0 for i in range(4) for j in range(10)]
for i in range(4):
for j in range(10):
print(m[i][j])
因為np.zeros([4, 10], dtype=np.int)表示4行10列,而[0 for i in range(4) for j in range(10)]表示10行4列