#=======================================================================
# Author: Isai Damier
# Title: Longest Common Subsequence
# Project: geekviewpoint
# Package: algorithms
#
# Statement:
# Given two sequences, find the longest common subsequence
# between them.
#
# Time Complexity: O(n*m)
# where m and n are the size of the input lists, respectively
#
# Sample Input: awaited, alpine
# Sample Output: aie
#
# DEFINITION OF SUBSEQUENCE:
# A sequence is a particular order in which related objects follow
# each other (e.g. DNA, Fibonacci). A sub-sequence is a sequence
# obtained by omitting some of the elements of a larger sequence.
#
# SEQUENCE SUBSEQUENCE OMISSION
# [3,1,2,5,4] [1,2] 3,5,4
# [3,1,2,5,4] [3,1,4] 2,5
#
# SEQUENCE NOT SUBSEQUENCE REASON
# [3,1,2,5,4] [4,2,5] 4 should follow 5
#
# STRATEGY:
# An easy way to find a longest common subsequence of characters between
# two words is to first track the lengths of all the common sequences
# and then from those lengths pick a maximum; finally, from that
# maximum length, trace out the actual longest common subsequence
# between the two words.
#
# Let's use a table C (i.e. list C) to track the lengths of all common
# sub-sequences between the two input words. Naturally, C starts with
# all cells set to zero. Then to fill up C, we walk thru the two words,
# comparing characters. Each time we find a match, we mark the
# corresponding cell in C as one plus whatever maximum we had seen so
# far for that particular sub-sequence.
#
# When we are done filling up C, we find a maximum length z in C. Then
# using the indices of z, we trace that actual sequence and return it
# as the answer.
#
# The ensuing code shows the details of how we fill C and then from C
# extract the actual LCS.
#
# ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~
#
# The remaining paragraphs contain details that you may not particularly
# care for. You may skip them and go right to the code.
#
# Finding the LCS of two sequences X and Y, i.e. LCS(X,Y), is similar
# in structure to finding the Fibonacci of some number n, i.e. fib(n).
# Both problems exhibit optimal substructures. Which is to say,
# for instance, the solution to fib(5) contains the solution to fib(4)
# which in turn contains fib(3), and so on. Similarly, the solution
# of LCS(X,Y) contains the solution of LCS(X-1,Y), which in turn
# contains the solution of LCS(X-1,Y-1), and so on. Whenever a problem
# exhibits "optimal substructure", we can use recursion to solve it:
#
# _
# | fib(n-1)+fib(n-2) if n > 1
# fib(n) = | n if n == 1 or n == 0
# |_
#
# For the LCS recurrence relation, a few notes are necessary:
# Let C(i,j) be the length of the prefix of LCS(X,Y) given by
#
# C(i,j) = |LCS(X[1..i],Y[1..j])|
#
# so that if the length of X is m and the length of Y is n,
#
# C(m,n) = |LCS(X,Y)|
#
# Then the recurrence relation for filling C is
# _
# | C(i-1,j-1)+1 if X(i) == Y(j)
# C(i,j) = | max[C(i-1,j),C(i,j-1)] otherwise
# |_
#
# In addition to "optimal substructures", the LCS(X,Y) like the fib(n)
# contains overlapping subproblems. What we mean is this: In solving
# f(5), we meet f(2) both as a subproblem of f(4) and as a subproblem
# of f(3).
#
# fib(5)
# / \
# fib(4) fib(3) :fib(5)=fib(4)+fib(3)
# / \ / \
# fib(3) fib(2) fib(2) fib(1) :fib(5)=fib(3)+fib(2)+fib(2)+fib(1)
#
# Whenever a problem exhibits "overlapping subproblems", if you use
# recursion as a strategy you will end up solving the same subproblems
# more than once. Therefore, it makes more sense to track subproblems
# using a table so that once you solve a subproblem, you can simply
# lookup it's solution next time you need.
#
# CONCLUSION:
# In the foregoing "strategy" section, we essentially described
# dynamic programming.
#
# Dynamic Programming: (aka Dynamic Tabling) is a strategy for solving
# problems, which exhibit optimal substructures and overlapping
# subproblems, by using a table to track the solution of subproblems
# that have already been solved so as not to solve a subproblem more
# than once. This tracking or note taking is usually referred to as
# memoization, meaning writing in a memo.
#=======================================================================
#=======================================================================
# Space Complexity: O(min(n,m))
# where m and n are the size of the input lists, respectively.
#
# There are a number of reduced space LCS algorithms. This is just
# one approach.
#=======================================================================
def spaceEfficientLCS_length( A, B ):
if len( A ) > len( B ):
return LCS_length( A, B )
return LCS_length( B, A );
def LCS_length( B, A ):
m, n = len( A ) , len( B )
T = [[0 for y in range( n + 1 )] for x in range( 2 )]
for i in range( m - 1, -1, -1 ):
ndx = i & 1; # odd or even
for j in range( n - 1, -1, -1 ):
if A[i] == B[j]:
T[ndx][j] = 1 + T[1 - ndx][j + 1]
else:
T[ndx][j] = max( T[1 - ndx][j], T[ndx][j + 1] )
return T[0][0]
import unittest
from dynamic_programming import LongestCommonSubsequence as subsequence
class Test( unittest.TestCase ):
def testSpaceEfficientLCS_length( self ):
fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
610, 987]
prime = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
rand = [0, 1, 21, 45, 144, 610, 987]
bad = [102, 104, 106, 108, 110, 112, 114]
exp1, exp2 = 5, 6
# no shared end points
actual = subsequence.spaceEfficientLCS_length( fibonacci, prime )
self.assertEquals( exp1, actual )
# share both end points
actual = subsequence.spaceEfficientLCS_length( fibonacci, rand )
self.assertEquals( exp2, actual )
# share no points
actual = subsequence.spaceEfficientLCS_length( fibonacci, bad )
self.assertEquals( 0, actual )
# share all points
actual = subsequence.spaceEfficientLCS_length( fibonacci, fibonacci )
self.assertEquals( len( fibonacci ), actual )