## Saturday, December 12, 2015

### KMP (Knuth-Morris-Pratt) Substring Match - Linear Time Substring Match Algorithm - Java

KMP algorithm needs no buffer to record the data. It can be used to find the substring on-the-fly. It is a linear time algorithm. It uses a DFA to achieve linear time search. The construction of DFA takes O(R*W) time and space where R is the number of different characters used and M is the length of the pattern to be search in a text of length N.

Source:

```public class KMPSubstringMatch {

private int W, R;

private int[][] dfa;
private String s;
private String sub;

public KMPSubstringMatch(String s, String sub, int R){
this.W = sub.length();
this.R = R;
this.sub = sub;
this.s = s;
dfa = new int[R][W];
}

public void constructDFA(){
int X=0;
dfa[sub.charAt(0)][0] = 1;
for(int i=1;i<W;++i){
for(int j=0;j<R;++j){
dfa[j][i] = dfa[j][X];
}
dfa[sub.charAt(i)][i] = i+1;
X = dfa[sub.charAt(i)][X];
}
}

public int find(){
int i, j;
for(i=0, j=0;i<s.length() && j!=W;++i){
j=dfa[s.charAt(i)][j];
}
if(j==W)return i-sub.length();
else return -1;
}

public static void find(String s, String sub){
KMPSubstringMatch kmp = new KMPSubstringMatch(s, sub, 257);
kmp.constructDFA();
System.out.println("Substring \""+sub+"\" found at index "+kmp.find());
}

public static void main(String[] args){
String s = "haystackintheneedleinthehaystack";

find(s, "needle");
find(s, "jatin");
find(s, "ack");
find(s, "a");
find(s, "z");
find(s, "lein");
}
}```

Output:

Substring "needle" found at index 13
Substring "jatin" found at index -1
Substring "ack" found at index 5
Substring "a" found at index 1
Substring "z" found at index -1
Substring "lein" found at index 17