Saturday, December 12, 2015

Boyer-Moore Substring Match Algorithm

Boyer-Moore algorithm can give sublinear(best-case) as well as quadratic time performance(worst-case). Like KMP, it precomputes information necessary to search the subtring in the text.

Substring search with the Boyer-Moore mismatched character heuristic takes about ~(N/M) character compares to search for a pattern of length M in a text of length N. It can be ~MN in the worst case.



(Algorithms Part II - coursera.org)

Source :

import java.util.Arrays;

public class BoyerMoore {
    static String s, sub;
    static int[] pre;
    static int R = 257;
    
    public static int find(){
        pre = new int[R+1];
        
        Arrays.fill(pre, -1);
        precompute();
        
        int W = sub.length();
        int i, skip = 0;
        
        for(i=0;i<s.length()-W;i+=skip){ 
            skip = 0;
            for(int j=W-1;j>=0;j--){ 
                if(sub.charAt(j)!=s.charAt(i+j)){
                    skip = Math.max(1, j-pre[s.charAt(i+j)]);
                    break;
                }
            }
            if(skip==0)return i;
        }
        return -1;
    }
    
    public static void precompute(){
        for(int i=0;i<sub.length();++i){
            pre[sub.charAt(i)] = i;
        }
    }
    
    public static void main(String[] args){
        s = "thereisaneedleinthehaystack";
        sub = "needle";
        System.out.println("Substring found at index = "+find());
        
        sub = "nothing";
        System.out.println("Substring found at index = "+find());
    }
}

Output:

Substring found at index = 8
Substring found at index = -1

No comments:

Post a Comment