Saturday, December 12, 2015

KMP (Knuth-Morris-Pratt) Substring Match - Linear Time Substring Match Algorithm - Java

KMP algorithm needs no buffer to record the data. It can be used to find the substring on-the-fly. It is a linear time algorithm. It uses a DFA to achieve linear time search. The construction of DFA takes O(R*W) time and space where R is the number of different characters used and M is the length of the pattern to be search in a text of length N.

Source:

public class KMPSubstringMatch {
    
    private int W, R;
    
    private int[][] dfa;
    private String s;
    private String sub;
    
    public KMPSubstringMatch(String s, String sub, int R){
        this.W = sub.length();
        this.R = R;
        this.sub = sub;
        this.s = s;
        dfa = new int[R][W];
    }
        
    public void constructDFA(){  
        int X=0;
        dfa[sub.charAt(0)][0] = 1;
        for(int i=1;i<W;++i){ 
            for(int j=0;j<R;++j){
                dfa[j][i] = dfa[j][X];
            }
            dfa[sub.charAt(i)][i] = i+1;
            X = dfa[sub.charAt(i)][X];
        }
    }
    
    public int find(){
        int i, j;
        for(i=0, j=0;i<s.length() && j!=W;++i){
            j=dfa[s.charAt(i)][j]; 
        }
        if(j==W)return i-sub.length();
        else return -1; 
    } 
    
    public static void find(String s, String sub){
        KMPSubstringMatch kmp = new KMPSubstringMatch(s, sub, 257);
        kmp.constructDFA();
        System.out.println("Substring \""+sub+"\" found at index "+kmp.find());
    }
    
    public static void main(String[] args){
        String s = "haystackintheneedleinthehaystack";
        
        find(s, "needle");
        find(s, "jatin");
        find(s, "ack");
        find(s, "a");
        find(s, "z");
        find(s, "lein");
    }
}

Output:

Substring "needle" found at index 13
Substring "jatin" found at index -1
Substring "ack" found at index 5
Substring "a" found at index 1
Substring "z" found at index -1
Substring "lein" found at index 17

No comments:

Post a Comment