/*********************************************************************
* Author: Isai Damier
* Title: Knuth Morris Pratt String Search Algorithms Prefix Table
* Project: geekviewpoint
* Package: algorithm.search.string
*
* This is the portion of the KMP algorithm that pre-processes the
* needle/pattern, P. It does so by matching the needle against shifts
* of itself. The resulting skip table S provides information for
* skipping fruitless shifts of the haystack/text.
*
* Let's start by setting up some fundamentals for the demonstration
* ahead. Say we are looping through P as in
*
* for(int i=0; i<P.length; i++) visit(P[i]);
*
* Let "prefix" mean the portion of P we already visited before
* reaching char P[i].
*
* Then for P={abababca},
* - the prefix X<i> of P when i=0 is X<0> = {}
* - the prefix X<i> of P when i=1 is X<1> = {a}
* - the prefix X<i> of P when i=2 is X<2> = {ab}
* - the prefix X<i> of P when i=3 is X<3> = {aba} ...
*
* Now for the fun stuff: Let X<i> be the prefix of P with length i.
* Then we define the overlap between X and P as the longest suffix of
* X to match a prefix of P. (Note: this algorithm does not treat X as
* a proper suffix of itself.)
*
* For example, the overlap between X<5> and P is aba because
*
* X<5>: ab|aba
* P : aba|babca
*
* We could not use
*
* X<5>: ababa
* P : ababa|bca
*
* because this algorithm does not treat X as its own suffix.
*
* Following is the filling of the skip table S:
*
* i=0 X={} S={-1} Note: P[0] has no prefix
*
* i=1 X={a} S={-1,0} Note: X has no suffix
*
* i=2 X={ab} S={-1,0,0} Note: no match
*
* i=3 X={aba} S={-1,0,0,1}
* Note:
* X<3>: ab|a
* P : a|bababca
* match ends after P[0], so start next shift at i=1
*
* i=4 X={abab} S={-1,0,0,1,2}
* Note:
* X<4>: ab|ab
* P : ab|ababca
* match ends after P[1], so start next shift at i=2
*
* i=5 X={ababa} S={-1,0,0,1,2,3}
* Note:
* X<5>: ab|aba
* P : aba|babca
* match ends after P[2], so start next shift at i=3
*
* i=6 X={ababab} S={-1,0,0,1,2,3,4}
* Note:
* X<5>: ab|abab
* P : abab|abca
* match ends after P[3], so start next shift at i=4
*
* i=7 X={abababc} S={-1,0,0,1,2,3,4,0}
* Note: no match so restart at i=0
*
* For people with a background in computer engineering or integrated
* circuit design, this method is essentially sketching a state
* transition table/diagram for the input P. (as a refresher visit
* http://teahlab.com/ and browse or use the costume search).
*
********************************************************************/
private void prefixTable(char[] P, int[] T) {
int pos = 2, m = 0;
T[0] = -1;
while (pos < P.length) {
if (P[pos - 1] == P[m]) {
T[pos++] = ++m;
} else if (0 < T[m]) {
m = T[m];
} else {
pos++; //same as T[pos++] = 0;
}
}
}