#
Space Efficient LCS

#======================================================================= # Author: Isai Damier # Title: Longest Common Subsequence # Project: geekviewpoint # Package: algorithms # # Statement: # Given two sequences, find the longest common subsequence # between them. # # Time Complexity: O(n*m) # where m and n are the size of the input lists, respectively # # Sample Input: awaited, alpine # Sample Output: aie # # DEFINITION OF SUBSEQUENCE: # A sequence is a particular order in which related objects follow # each other (e.g. DNA, Fibonacci). A sub-sequence is a sequence # obtained by omitting some of the elements of a larger sequence. # # SEQUENCE SUBSEQUENCE OMISSION # [3,1,2,5,4] [1,2] 3,5,4 # [3,1,2,5,4] [3,1,4] 2,5 # # SEQUENCE NOT SUBSEQUENCE REASON # [3,1,2,5,4] [4,2,5] 4 should follow 5 # # STRATEGY: # An easy way to find a longest common subsequence of characters between # two words is to first track the lengths of all the common sequences # and then from those lengths pick a maximum; finally, from that # maximum length, trace out the actual longest common subsequence # between the two words. # # Let's use a table C (i.e. list C) to track the lengths of all common # sub-sequences between the two input words. Naturally, C starts with # all cells set to zero. Then to fill up C, we walk thru the two words, # comparing characters. Each time we find a match, we mark the # corresponding cell in C as one plus whatever maximum we had seen so # far for that particular sub-sequence. # # When we are done filling up C, we find a maximum length z in C. Then # using the indices of z, we trace that actual sequence and return it # as the answer. # # The ensuing code shows the details of how we fill C and then from C # extract the actual LCS. # # ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ # # The remaining paragraphs contain details that you may not particularly # care for. You may skip them and go right to the code. # # Finding the LCS of two sequences X and Y, i.e. LCS(X,Y), is similar # in structure to finding the Fibonacci of some number n, i.e. fib(n). # Both problems exhibit optimal substructures. Which is to say, # for instance, the solution to fib(5) contains the solution to fib(4) # which in turn contains fib(3), and so on. Similarly, the solution # of LCS(X,Y) contains the solution of LCS(X-1,Y), which in turn # contains the solution of LCS(X-1,Y-1), and so on. Whenever a problem # exhibits "optimal substructure", we can use recursion to solve it: # # _ # | fib(n-1)+fib(n-2) if n > 1 # fib(n) = | n if n == 1 or n == 0 # |_ # # For the LCS recurrence relation, a few notes are necessary: # Let C(i,j) be the length of the prefix of LCS(X,Y) given by # # C(i,j) = |LCS(X[1..i],Y[1..j])| # # so that if the length of X is m and the length of Y is n, # # C(m,n) = |LCS(X,Y)| # # Then the recurrence relation for filling C is # _ # | C(i-1,j-1)+1 if X(i) == Y(j) # C(i,j) = | max[C(i-1,j),C(i,j-1)] otherwise # |_ # # In addition to "optimal substructures", the LCS(X,Y) like the fib(n) # contains overlapping subproblems. What we mean is this: In solving # f(5), we meet f(2) both as a subproblem of f(4) and as a subproblem # of f(3). # # fib(5) # / \ # fib(4) fib(3) :fib(5)=fib(4)+fib(3) # / \ / \ # fib(3) fib(2) fib(2) fib(1) :fib(5)=fib(3)+fib(2)+fib(2)+fib(1) # # Whenever a problem exhibits "overlapping subproblems", if you use # recursion as a strategy you will end up solving the same subproblems # more than once. Therefore, it makes more sense to track subproblems # using a table so that once you solve a subproblem, you can simply # lookup it's solution next time you need. # # CONCLUSION: # In the foregoing "strategy" section, we essentially described # dynamic programming. # # Dynamic Programming: (aka Dynamic Tabling) is a strategy for solving # problems, which exhibit optimal substructures and overlapping # subproblems, by using a table to track the solution of subproblems # that have already been solved so as not to solve a subproblem more # than once. This tracking or note taking is usually referred to as # memoization, meaning writing in a memo. #======================================================================= #======================================================================= # Space Complexity: O(min(n,m)) # where m and n are the size of the input lists, respectively. # # There are a number of reduced space LCS algorithms. This is just # one approach. #======================================================================= def spaceEfficientLCS_length( A, B ): if len( A ) > len( B ): return LCS_length( A, B ) return LCS_length( B, A ); def LCS_length( B, A ): m, n = len( A ) , len( B ) T = [[0 for y in range( n + 1 )] for x in range( 2 )] for i in range( m - 1, -1, -1 ): ndx = i & 1; # odd or even for j in range( n - 1, -1, -1 ): if A[i] == B[j]: T[ndx][j] = 1 + T[1 - ndx][j + 1] else: T[ndx][j] = max( T[1 - ndx][j], T[ndx][j + 1] ) return T[0][0]

import unittest from dynamic_programming import LongestCommonSubsequence as subsequence class Test( unittest.TestCase ): def testSpaceEfficientLCS_length( self ): fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987] prime = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103] rand = [0, 1, 21, 45, 144, 610, 987] bad = [102, 104, 106, 108, 110, 112, 114] exp1, exp2 = 5, 6 # no shared end points actual = subsequence.spaceEfficientLCS_length( fibonacci, prime ) self.assertEquals( exp1, actual ) # share both end points actual = subsequence.spaceEfficientLCS_length( fibonacci, rand ) self.assertEquals( exp2, actual ) # share no points actual = subsequence.spaceEfficientLCS_length( fibonacci, bad ) self.assertEquals( 0, actual ) # share all points actual = subsequence.spaceEfficientLCS_length( fibonacci, fibonacci ) self.assertEquals( len( fibonacci ), actual )