Longest Common Subsequence
by Isai Damier, Android Engineer @ Google

Code
Unit Test
Sponsors

/***********************************************************************
 * Author: Isai Damier
 * Title: Longest Common Subsequence
 * Project: geekviewpoint
 * Package: algorithms
 *
 * Statement:
 *   Given two sequences, find the longest common subsequence
 *   between them.
 *
 *  Time Complexity: O(n*m)
 *    where m and n are the size of the input arrays, respectively
 * 
 * Sample Input: awaited, alpine
 * Sample Output: aie
 * 
 * DEFINITION OF SUBSEQUENCE:
 *   A sequence is a particular order in which related objects follow
 *   each other (e.g. DNA, Fibonacci). A sub-sequence is a sequence
 *   obtained by omitting some of the elements of a larger sequence.
 * 
 *   SEQUENCE       SUBSEQUENCE     OMISSION
 *   {3,1,2,5,4}     {1,2}            3,5,4          
 *   {3,1,2,5,4}     {3,1,4}          2,5
 * 
 *   SEQUENCE       NOT SUBSEQUENCE   REASON
 *   {3,1,2,5,4}     {4,2,5}           4 should follow 5
 *     
 * STRATEGY:
 *   An easy way to find a longest common subsequence of characters between
 *   two words is to first track the lengths of all the common sequences
 *   and then from those lengths pick a maximum; finally, from that
 *   maximum length, trace out the actual longest common subsequence
 *   between the two words.
 * 
 *   Let's use a table C (i.e. array C) to track the lengths of all common
 *   sub-sequences between the two input words. Naturally, C starts with
 *   all cells set to zero. Then to fill up C, we walk thru the two words,
 *   comparing characters. Each time we find a match, we mark the
 *   corresponding cell in C as one plus whatever maximum we had seen so
 *   far for that particular sub-sequence.
 * 
 *   When we are done filling up C, we find a maximum length z in C. Then
 *   using the indices of z, we trace that actual sequence and return it
 *   as the answer.
 * 
 *   The ensuing code shows the details of how we fill C and then from C
 *   extract the actual LCS.
 * 
 *          ~~~~    ~~~~    ~~~~    ~~~~    ~~~~    ~~~~    ~~~~
 * 
 *   The remaining paragraphs contain details that you may not particularly
 *   care for. You may skip them and go right to the code.
 * 
 *   Finding the LCS of two sequences X and Y, i.e. LCS(X,Y), is similar
 *   in structure to finding the Fibonacci of some number n, i.e. fib(n).
 *   Both problems exhibit optimal substructures. Which is to say,
 *   for instance, the solution to fib(5) contains the solution to fib(4)
 *   which in turn contains fib(3), and so on. Similarly, the solution
 *   of LCS(X,Y) contains the solution of LCS(X-1,Y), which in turn
 *   contains the solution of LCS(X-1,Y-1), and so on. Whenever a problem
 *   exhibits "optimal substructure", we can use recursion to solve it:
 *   
 *               _
 *              | fib(n-1)+fib(n-2)    if n > 1
 *     fib(n) = | n                    if n == 1 or n == 0
 *              |_
 *              
 *   For the LCS recurrence relation, a few notes are necessary:
 *   Let C(i,j) be the length of the prefix of LCS(X,Y) given by
 * 
 *     C(i,j) = |LCS(X[1..i],Y[1..j])|
 * 
 *   so that if the length of X is m and the length of Y is n,
 *   
 *     C(m,n) = |LCS(X,Y)|
 * 
 *   Then the recurrence relation for filling C is
 *               _
 *              | C(i-1,j-1)+1            if X(i) == Y(j)
 *     C(i,j) = | max{C(i-1,j),C(i,j-1)}  otherwise
 *              |_
 * 
 *   In addition to "optimal substructures", the LCS(X,Y) like the fib(n)
 *   contains overlapping subproblems.  What we mean is this: In solving
 *   f(5), we meet f(2) both as a subproblem of f(4) and as a subproblem
 *   of f(3).
 * 
 *               fib(5)
 *             /        \
 *         fib(4)        fib(3)       :fib(5)=fib(4)+fib(3)
 *         /    \       /     \
 *     fib(3) fib(2)  fib(2)  fib(1)  :fib(5)=fib(3)+fib(2)+fib(2)+fib(1)
 * 
 *   Whenever a problem exhibits "overlapping subproblems", if you use
 *   recursion as a strategy you will end up solving the same subproblems
 *   more than once. Therefore, it makes more sense to track subproblems
 *   using a table so that once you solve a subproblem, you can simply
 *   lookup it's solution next time you need.
 * 
 * CONCLUSION:
 *   In the foregoing "strategy" section, we essentially described
 *   dynamic programming.
 * 
 *   Dynamic Programming: (aka Dynamic Tabling) is a strategy for solving
 *     problems, which exhibit optimal substructures and overlapping
 *     subproblems, by using a table to track the solution of subproblems
 *     that have already been solved so as not to solve a subproblem more
 *     than once. This tracking or note taking is usually referred to as
 *     memoization, meaning writing in a memo.
 * 
 **********************************************************************/ 
 package algorithm.programming.dynamic;

import java.util.ArrayList;
import java.util.List;

public class LongestCommonSubsequence {

  /****
   * Space Complexity: O(n*m)
   *    where m and n are the size of the input arrays, respectively
   * 
   * This solution first computes a table of lengths and then from that
   * table computes an actual longest common subsequence.
   ****/
  public List<Integer> LCS(int[] A, int[] B) {
    int[][] m = lcsTable(A, B);
    List<Integer> sequence = actualLCS(m, A, B);
    return sequence;
  }

  /****
   * LCS table of lengths
   ****/
  private int[][] lcsTable(int[] A, int[] B) {
    int[][] m = new int[A.length + 1][B.length + 1];
    for (int x = 1; x <= A.length; x++) {
      for (int y = 1; y <= B.length; y++) {
        if (A[x - 1] == B[y - 1]) {
          m[x][y] = 1 + m[x - 1][y - 1];
        } else {
          m[x][y] = Math.max(m[x][y - 1], m[x - 1][y]);
        }
      }
    }
    return m;
  }

  /****
   * Extract an actual LCS from the length table, which effectively
   * contains all the possible longest common sequences.
   ****/
  private List<Integer> actualLCS(int[][] m, int[] A, int[] B) {
    List<Integer> result = new ArrayList<Integer>();
    int x = A.length, y = B.length;
    while (x > 0 && y > 0) {
      if (A[x - 1] == B[y - 1]) {
        result.add(A[x - 1]);
        x--;
        y--;
      } else if (m[x][y - 1] > m[x - 1][y]) {
        y--;
      } else {
        x--;
      }
    }
    return result;
  }
}

package algorithm.programming.dynamic;

import java.util.Arrays;
import java.util.List;
import org.junit.Test;
import static org.junit.Assert.*;

public class LongestCommonSubsequenceTest {

  /**
   * Test of LCS method, of class LongestCommonSubsequence.
   */
  @Test
  public void testLCS() {
    System.out.println("LCS");
    int[] fibonacci = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,
                       144, 233, 377, 610, 987};
    int[] prime = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                   43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
                   101, 103};
    int[] rand = {0, 1, 21, 45, 144, 610, 987};
    int[] bad = {102, 104, 106, 108, 110, 112, 114};

    Integer[] exp1 = {89, 13, 5, 3, 2};
    Integer[] exp2 = {987, 610, 144, 21, 1, 0};

    //no shared end points
    LongestCommonSubsequence seq = new LongestCommonSubsequence();
    List<Integer> actual = seq.LCS(fibonacci, prime);
    assertTrue(Arrays.equals(exp1, actual.toArray(new Integer[0])));

    //share both end points
    actual = seq.LCS(fibonacci, rand);
    assertTrue(Arrays.equals(exp2, actual.toArray(new Integer[0])));

    //share no points
    actual = seq.LCS(fibonacci, bad);
    assertEquals(0, actual.size());

    //share all points
    actual = seq.LCS(fibonacci, fibonacci);
    assertEquals(fibonacci.length, actual.size());
    int i = fibonacci.length;
    for (int e : actual) {
      assertEquals(e, fibonacci[--i]);
    }
  }
}

Longest Common Subsequence by Isai Damier, Android Engineer @ Google

Longest Common Subsequence
by Isai Damier, Android Engineer @ Google