Knuth Morris Pratt Prefix Table
by Isai Damier

/*********************************************************************
 * Author: Isai Damier
 * Title: Knuth Morris Pratt String Search Algorithms Prefix Table
 * Project: geekviewpoint
 * Package: algorithm.search.string
 * 
 * This is the portion of the KMP algorithm that pre-processes the
 * needle/pattern, P. It does so by matching the needle against shifts
 * of itself. The resulting skip table S provides information for
 * skipping fruitless shifts of the haystack/text.
 *
 * Let's start by setting up some fundamentals for the demonstration
 * ahead. Say we are looping through P as in
 *
 *   for(int i=0; i<P.length; i++) visit(P[i]);
 *
 * Let "prefix" mean the portion of P we already visited before
 * reaching char P[i].
 *
 * Then for P={abababca}, 
 *  - the prefix X<i> of P when i=0 is X<0> = {} 
 *  - the prefix X<i> of P when i=1 is X<1> = {a}
 *  - the prefix X<i> of P when i=2 is X<2> = {ab} 
 *  - the prefix X<i> of P when i=3 is X<3> = {aba} ...
 *
 * Now for the fun stuff: Let X<i> be the prefix of P with length i.
 * Then we define the overlap between X and P as the longest suffix of
 * X to match a prefix of P. (Note: this algorithm does not treat X as
 * a proper suffix of itself.)
 *
 * For example, the overlap between X<5> and P is aba because 
 * 
 *     X<5>: ab|aba 
 *     P   :    aba|babca
 * 
 * We could not use 
 * 
 *     X<5>: ababa 
 *     P   : ababa|bca 
 * 
 * because this algorithm does not treat X as its own suffix.
 *
 * Following is the filling of the skip table S:
 *
 *   i=0 X={} S={-1} Note: P[0] has no prefix
 * 
 *   i=1 X={a} S={-1,0} Note: X has no suffix
 * 
 *   i=2 X={ab} S={-1,0,0} Note: no match
 * 
 *   i=3 X={aba} S={-1,0,0,1} 
 *     Note:
 *        X<3>: ab|a 
 *        P   :    a|bababca 
 *     match ends after P[0], so start next shift at i=1
 *
 *   i=4 X={abab} S={-1,0,0,1,2}
 *     Note: 
 *        X<4>: ab|ab
 *        P      : ab|ababca 
 *     match ends after P[1], so start next shift at i=2
 *
 *   i=5 X={ababa} S={-1,0,0,1,2,3}
 *     Note:
 *        X<5>: ab|aba
 *        P      : aba|babca
 *     match ends after P[2], so start next shift at i=3
 *
 *   i=6 X={ababab} S={-1,0,0,1,2,3,4}
 *     Note:
 *        X<5>: ab|abab
 *        P   :    abab|abca
 *     match ends after P[3], so start next shift at i=4
 *
 *   i=7 X={abababc} S={-1,0,0,1,2,3,4,0} 
 *     Note: no match so restart at i=0
 *
 * For people with a background in computer engineering or integrated
 * circuit design, this method is essentially sketching a state
 * transition table/diagram for the input P. (as a refresher visit
 * http://teahlab.com/ and browse or use the costume search).
 *
 ********************************************************************/ 
 private void prefixTable(char[] P, int[] T) {
  int pos = 2, m = 0;
  T[0] = -1;
  while (pos < P.length) {
    if (P[pos - 1] == P[m]) {
      T[pos++] = ++m;
    } else if (0 < T[m]) {
      m = T[m];
    } else {
      pos++; //same as T[pos++] = 0;
    }
  }
}