#
Knuth Morris Pratt Prefix Table

/********************************************************************* * Author: Isai Damier * Title: Knuth Morris Pratt String Search Algorithms Prefix Table * Project: geekviewpoint * Package: algorithm.search.string * * This is the portion of the KMP algorithm that pre-processes the * needle/pattern, P. It does so by matching the needle against shifts * of itself. The resulting skip table S provides information for * skipping fruitless shifts of the haystack/text. * * Let's start by setting up some fundamentals for the demonstration * ahead. Say we are looping through P as in * * for(int i=0; i<P.length; i++) visit(P[i]); * * Let "prefix" mean the portion of P we already visited before * reaching char P[i]. * * Then for P={abababca}, * - the prefix X<i> of P when i=0 is X<0> = {} * - the prefix X<i> of P when i=1 is X<1> = {a} * - the prefix X<i> of P when i=2 is X<2> = {ab} * - the prefix X<i> of P when i=3 is X<3> = {aba} ... * * Now for the fun stuff: Let X<i> be the prefix of P with length i. * Then we define the overlap between X and P as the longest suffix of * X to match a prefix of P. (Note: this algorithm does not treat X as * a proper suffix of itself.) * * For example, the overlap between X<5> and P is aba because * * X<5>: ab|aba * P : aba|babca * * We could not use * * X<5>: ababa * P : ababa|bca * * because this algorithm does not treat X as its own suffix. * * Following is the filling of the skip table S: * * i=0 X={} S={-1} Note: P[0] has no prefix * * i=1 X={a} S={-1,0} Note: X has no suffix * * i=2 X={ab} S={-1,0,0} Note: no match * * i=3 X={aba} S={-1,0,0,1} * Note: * X<3>: ab|a * P : a|bababca * match ends after P[0], so start next shift at i=1 * * i=4 X={abab} S={-1,0,0,1,2} * Note: * X<4>: ab|ab * P : ab|ababca * match ends after P[1], so start next shift at i=2 * * i=5 X={ababa} S={-1,0,0,1,2,3} * Note: * X<5>: ab|aba * P : aba|babca * match ends after P[2], so start next shift at i=3 * * i=6 X={ababab} S={-1,0,0,1,2,3,4} * Note: * X<5>: ab|abab * P : abab|abca * match ends after P[3], so start next shift at i=4 * * i=7 X={abababc} S={-1,0,0,1,2,3,4,0} * Note: no match so restart at i=0 * * For people with a background in computer engineering or integrated * circuit design, this method is essentially sketching a state * transition table/diagram for the input P. (as a refresher visit * http://teahlab.com/ and browse or use the costume search). * ********************************************************************/ private void prefixTable(char[] P, int[] T) { int pos = 2, m = 0; T[0] = -1; while (pos < P.length) { if (P[pos - 1] == P[m]) { T[pos++] = ++m; } else if (0 < T[m]) { m = T[m]; } else { pos++; //same as T[pos++] = 0; } } }