#=======================================================================
# Author: Isai Damier
# Title: Longest Common Subsequence
# Project: geekviewpoint
# Package: algorithms
#
# Statement:
# Given two sequences, find the longest common subsequence
# between them.
#
# Time Complexity: O(n*m)
# where m and n are the size of the input lists, respectively
#
# Sample Input: awaited, alpine
# Sample Output: aie
#
# DEFINITION OF SUBSEQUENCE:
# A sequence is a particular order in which related objects follow
# each other (e.g. DNA, Fibonacci). A sub-sequence is a sequence
# obtained by omitting some of the elements of a larger sequence.
#
# SEQUENCE SUBSEQUENCE OMISSION
# [3,1,2,5,4] [1,2] 3,5,4
# [3,1,2,5,4] [3,1,4] 2,5
#
# SEQUENCE NOT SUBSEQUENCE REASON
# [3,1,2,5,4] [4,2,5] 4 should follow 5
#
# STRATEGY:
# An easy way to find a longest common subsequence of characters between
# two words is to first track the lengths of all the common sequences
# and then from those lengths pick a maximum; finally, from that
# maximum length, trace out the actual longest common subsequence
# between the two words.
#
# Let's use a table C (i.e. list C) to track the lengths of all common
# sub-sequences between the two input words. Naturally, C starts with
# all cells set to zero. Then to fill up C, we walk thru the two words,
# comparing characters. Each time we find a match, we mark the
# corresponding cell in C as one plus whatever maximum we had seen so
# far for that particular sub-sequence.
#
# When we are done filling up C, we find a maximum length z in C. Then
# using the indices of z, we trace that actual sequence and return it
# as the answer.
#
# The ensuing code shows the details of how we fill C and then from C
# extract the actual LCS.
#
# ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~
#
# The remaining paragraphs contain details that you may not particularly
# care for. You may skip them and go right to the code.
#
# Finding the LCS of two sequences X and Y, i.e. LCS(X,Y), is similar
# in structure to finding the Fibonacci of some number n, i.e. fib(n).
# Both problems exhibit optimal substructures. Which is to say,
# for instance, the solution to fib(5) contains the solution to fib(4)
# which in turn contains fib(3), and so on. Similarly, the solution
# of LCS(X,Y) contains the solution of LCS(X-1,Y), which in turn
# contains the solution of LCS(X-1,Y-1), and so on. Whenever a problem
# exhibits "optimal substructure", we can use recursion to solve it:
#
# _
# | fib(n-1)+fib(n-2) if n > 1
# fib(n) = | n if n == 1 or n == 0
# |_
#
# For the LCS recurrence relation, a few notes are necessary:
# Let C(i,j) be the length of the prefix of LCS(X,Y) given by
#
# C(i,j) = |LCS(X[1..i],Y[1..j])|
#
# so that if the length of X is m and the length of Y is n,
#
# C(m,n) = |LCS(X,Y)|
#
# Then the recurrence relation for filling C is
# _
# | C(i-1,j-1)+1 if X(i) == Y(j)
# C(i,j) = | max[C(i-1,j),C(i,j-1)] otherwise
# |_
#
# In addition to "optimal substructures", the LCS(X,Y) like the fib(n)
# contains overlapping subproblems. What we mean is this: In solving
# f(5), we meet f(2) both as a subproblem of f(4) and as a subproblem
# of f(3).
#
# fib(5)
# / \
# fib(4) fib(3) :fib(5)=fib(4)+fib(3)
# / \ / \
# fib(3) fib(2) fib(2) fib(1) :fib(5)=fib(3)+fib(2)+fib(2)+fib(1)
#
# Whenever a problem exhibits "overlapping subproblems", if you use
# recursion as a strategy you will end up solving the same subproblems
# more than once. Therefore, it makes more sense to track subproblems
# using a table so that once you solve a subproblem, you can simply
# lookup it's solution next time you need.
#
# CONCLUSION:
# In the foregoing "strategy" section, we essentially described
# dynamic programming.
#
# Dynamic Programming: (aka Dynamic Tabling) is a strategy for solving
# problems, which exhibit optimal substructures and overlapping
# subproblems, by using a table to track the solution of subproblems
# that have already been solved so as not to solve a subproblem more
# than once. This tracking or note taking is usually referred to as
# memoization, meaning writing in a memo.
#=======================================================================
#====================================================================
# Space Complexity: O(n*m)
# where m and n are the size of the input lists, respectively
#
# This solution first computes a table of lengths and then from that
# table computes an actual longest common subsequence.
#====================================================================
def LCS( A, B ):
m = lcsTable( A, B )
return actualLCS( m, A, B )
#=======================================================================
# LCS table of lengths
#=======================================================================
def lcsTable( A, B ):
m = [[0 for y in range( len( B ) + 1 )] for x in range( len( A ) + 1 )]
for x in range( 1, len( A ) + 1 ):
for y in range( 1, len( B ) + 1 ):
if A[x - 1] == B[y - 1]:
m[x][y] = 1 + m[x - 1][y - 1]
else:
m[x][y] = max( m[x][y - 1], m[x - 1][y] )
return m
#=======================================================================
# Extract an actual LCS from the length table, which effectively
# contains all the possible longest common sequences.
#=======================================================================
def actualLCS( m, A, B ):
result = []
x , y = len( A ), len( B )
while x > 0 and y > 0:
if A[x - 1] == B[y - 1]:
result.append( A[x - 1] )
x -= 1
y -= 1
elif m[x][y - 1] > m[x - 1][y]:
y -= 1
else:
x -= 1
return result
import unittest
from dynamic_programming import LongestCommonSubsequence as subsequence
class Test( unittest.TestCase ):
def testLCS( self ):
fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
610, 987]
prime = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
rand = [0, 1, 21, 45, 144, 610, 987]
bad = [102, 104, 106, 108, 110, 112, 114]
exp1 = [89, 13, 5, 3, 2]
exp2 = [987, 610, 144, 21, 1, 0]
# no shared end points
actual = subsequence.LCS( fibonacci, prime )
self.assertEquals( exp1, actual )
# share both end points
actual = subsequence.LCS( fibonacci, rand )
self.assertEquals( exp2, actual )
# share no points
actual = subsequence.LCS( fibonacci, bad )
self.assertEquals( 0, len( actual ) )
# share all points
actual = subsequence.LCS( fibonacci, fibonacci )
actual.reverse()
self.assertEquals( fibonacci, actual )