/***********************************************************************
* Author: Isai Damier
* Title: Longest Common Subsequence
* Project: geekviewpoint
* Package: algorithms
*
* Statement:
* Given two sequences, find the longest common subsequence
* between them.
*
* Time Complexity: O(n*m)
* where m and n are the size of the input arrays, respectively
*
* Sample Input: awaited, alpine
* Sample Output: aie
*
* DEFINITION OF SUBSEQUENCE:
* A sequence is a particular order in which related objects follow
* each other (e.g. DNA, Fibonacci). A sub-sequence is a sequence
* obtained by omitting some of the elements of a larger sequence.
*
* SEQUENCE SUBSEQUENCE OMISSION
* {3,1,2,5,4} {1,2} 3,5,4
* {3,1,2,5,4} {3,1,4} 2,5
*
* SEQUENCE NOT SUBSEQUENCE REASON
* {3,1,2,5,4} {4,2,5} 4 should follow 5
*
* STRATEGY:
* An easy way to find a longest common subsequence of characters between
* two words is to first track the lengths of all the common sequences
* and then from those lengths pick a maximum; finally, from that
* maximum length, trace out the actual longest common subsequence
* between the two words.
*
* Let's use a table C (i.e. array C) to track the lengths of all common
* sub-sequences between the two input words. Naturally, C starts with
* all cells set to zero. Then to fill up C, we walk thru the two words,
* comparing characters. Each time we find a match, we mark the
* corresponding cell in C as one plus whatever maximum we had seen so
* far for that particular sub-sequence.
*
* When we are done filling up C, we find a maximum length z in C. Then
* using the indices of z, we trace that actual sequence and return it
* as the answer.
*
* The ensuing code shows the details of how we fill C and then from C
* extract the actual LCS.
*
* ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~
*
* The remaining paragraphs contain details that you may not particularly
* care for. You may skip them and go right to the code.
*
* Finding the LCS of two sequences X and Y, i.e. LCS(X,Y), is similar
* in structure to finding the Fibonacci of some number n, i.e. fib(n).
* Both problems exhibit optimal substructures. Which is to say,
* for instance, the solution to fib(5) contains the solution to fib(4)
* which in turn contains fib(3), and so on. Similarly, the solution
* of LCS(X,Y) contains the solution of LCS(X-1,Y), which in turn
* contains the solution of LCS(X-1,Y-1), and so on. Whenever a problem
* exhibits "optimal substructure", we can use recursion to solve it:
*
* _
* | fib(n-1)+fib(n-2) if n > 1
* fib(n) = | n if n == 1 or n == 0
* |_
*
* For the LCS recurrence relation, a few notes are necessary:
* Let C(i,j) be the length of the prefix of LCS(X,Y) given by
*
* C(i,j) = |LCS(X[1..i],Y[1..j])|
*
* so that if the length of X is m and the length of Y is n,
*
* C(m,n) = |LCS(X,Y)|
*
* Then the recurrence relation for filling C is
* _
* | C(i-1,j-1)+1 if X(i) == Y(j)
* C(i,j) = | max{C(i-1,j),C(i,j-1)} otherwise
* |_
*
* In addition to "optimal substructures", the LCS(X,Y) like the fib(n)
* contains overlapping subproblems. What we mean is this: In solving
* f(5), we meet f(2) both as a subproblem of f(4) and as a subproblem
* of f(3).
*
* fib(5)
* / \
* fib(4) fib(3) :fib(5)=fib(4)+fib(3)
* / \ / \
* fib(3) fib(2) fib(2) fib(1) :fib(5)=fib(3)+fib(2)+fib(2)+fib(1)
*
* Whenever a problem exhibits "overlapping subproblems", if you use
* recursion as a strategy you will end up solving the same subproblems
* more than once. Therefore, it makes more sense to track subproblems
* using a table so that once you solve a subproblem, you can simply
* lookup it's solution next time you need.
*
* CONCLUSION:
* In the foregoing "strategy" section, we essentially described
* dynamic programming.
*
* Dynamic Programming: (aka Dynamic Tabling) is a strategy for solving
* problems, which exhibit optimal substructures and overlapping
* subproblems, by using a table to track the solution of subproblems
* that have already been solved so as not to solve a subproblem more
* than once. This tracking or note taking is usually referred to as
* memoization, meaning writing in a memo.
*
**********************************************************************/
package algorithm.programming.dynamic;
import java.util.ArrayList;
import java.util.List;
public class LongestCommonSubsequence {
/****
* Space Complexity: O(n*m)
* where m and n are the size of the input arrays, respectively
*
* This solution first computes a table of lengths and then from that
* table computes an actual longest common subsequence.
****/
public List<Integer> LCS(int[] A, int[] B) {
int[][] m = lcsTable(A, B);
List<Integer> sequence = actualLCS(m, A, B);
return sequence;
}
/****
* LCS table of lengths
****/
private int[][] lcsTable(int[] A, int[] B) {
int[][] m = new int[A.length + 1][B.length + 1];
for (int x = 1; x <= A.length; x++) {
for (int y = 1; y <= B.length; y++) {
if (A[x - 1] == B[y - 1]) {
m[x][y] = 1 + m[x - 1][y - 1];
} else {
m[x][y] = Math.max(m[x][y - 1], m[x - 1][y]);
}
}
}
return m;
}
/****
* Extract an actual LCS from the length table, which effectively
* contains all the possible longest common sequences.
****/
private List<Integer> actualLCS(int[][] m, int[] A, int[] B) {
List<Integer> result = new ArrayList<Integer>();
int x = A.length, y = B.length;
while (x > 0 && y > 0) {
if (A[x - 1] == B[y - 1]) {
result.add(A[x - 1]);
x--;
y--;
} else if (m[x][y - 1] > m[x - 1][y]) {
y--;
} else {
x--;
}
}
return result;
}
}
package algorithm.programming.dynamic;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;
import static org.junit.Assert.*;
public class LongestCommonSubsequenceTest {
/**
* Test of LCS method, of class LongestCommonSubsequence.
*/
@Test
public void testLCS() {
System.out.println("LCS");
int[] fibonacci = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,
144, 233, 377, 610, 987};
int[] prime = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103};
int[] rand = {0, 1, 21, 45, 144, 610, 987};
int[] bad = {102, 104, 106, 108, 110, 112, 114};
Integer[] exp1 = {89, 13, 5, 3, 2};
Integer[] exp2 = {987, 610, 144, 21, 1, 0};
//no shared end points
LongestCommonSubsequence seq = new LongestCommonSubsequence();
List<Integer> actual = seq.LCS(fibonacci, prime);
assertTrue(Arrays.equals(exp1, actual.toArray(new Integer[0])));
//share both end points
actual = seq.LCS(fibonacci, rand);
assertTrue(Arrays.equals(exp2, actual.toArray(new Integer[0])));
//share no points
actual = seq.LCS(fibonacci, bad);
assertEquals(0, actual.size());
//share all points
actual = seq.LCS(fibonacci, fibonacci);
assertEquals(fibonacci.length, actual.size());
int i = fibonacci.length;
for (int e : actual) {
assertEquals(e, fibonacci[--i]);
}
}
}