Sunday, December 13, 2015

Rabin-Karp Substring Search Algorithm - Java

Works fine in practice (linear time) but the worst case complexity is MN where M is the length of the pattern to be searched and N is the number of characters in the text to be searched.

Choose mod to be a large prime. Probability of a collision = 1/mod. mod should not be too much large to cause overflow.


Source:

import java.math.BigInteger;
import java.util.Random;


public class RabinKarp {
    static String s, sub;
    static long mod = 997;
    static int R=10;
    static long RM;
    static long phash;
     
    private static void setUp(){
        
        mod = BigInteger.probablePrime(32, new Random()).longValue();
        //Compute Hash for Pattern
        
        phash = 0;
        for(int i=0;i<sub.length();++i){
            //Calculation =>> phash = phash*R+(new digit)
            RabinKarp.phash = hash(sub, i, RabinKarp.phash); 
        }
        
        //Precompute RM
        RM = 1;
        for(int i=0;i<sub.length()-1;++i){
            RM=RM*R%mod;
        }
    }
    
    private static long hash(String ss, int i, long lastHash){ 
        if(i>=sub.length())return ((lastHash+ss.charAt(i-sub.length())*(mod-RM))*R+ss.charAt(i))%mod; 
        else return lastHash*R+ss.charAt(i);
    } 
    
    public static int match(String s, String sub){
        RabinKarp.s = s;
        RabinKarp.sub = sub;  
        setUp();
        
        long lastHash = 0;
        int i; 
        for(i=0;i<s.length();++i){
           long cHash = hash(s, i, lastHash); 
           if(cHash == phash)return i-sub.length()+1;
           lastHash = cHash;
       }
       return -1;
    }
    
    public static void main(String[] args){
        s = "findaneedleinahaystack";
        sub="needle";
        System.out.println("Match found at index "+match(s, sub));
    }
}

Output:

Match found at index 5

No comments:

Post a Comment