codebytes: Longest Common Subsequence

Tuesday, January 15, 2019

Longest Common Subsequence

Longest common subsequence is a classic DP problem.

Statement: Given two sequences, find the length of longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, ‘”acefg”, .. etc are subsequences of “abcdefg”. So a string of length n has 2^n different possible subsequences.

Solution:

1. Use DP:

Array : dp[first.length][second.length]
Function: compute(int x, int y)

A dp array of size DP[length(first)][length(second)] stores the state of the current computation.

At every function call compute(x, y) we are considering matching the char at xth index of first string with yth index of second string. If it matches we add 1 to the length of subsequence.

If it does not match, we have to skip it. We do so by just returning the value of sebsequent index matches:

skip current xth index in first string:
compute(x+1, y)

skip current yth index in second string:
compute(x, y+1)

Each of these calls return the max subsequence length starting at their supplied indices.

At current x,y configuration since the characters don't match we consider max return value among these 2 calls and assign the current dp[x][y] state to this value. So subsequent calls to this function compute(x, y) return dp[x][y] value.

This is equivalent to saying that we have already computed what is the max common subsequence length is we start from x in first string and y in second string so just take the value and don't compute again.

This way we avoid a lot of unnecessary computation.

Time complexity:

O(MN)

where M = length(string1) N = length(string2)

For each x we match every y. Since results are stored in dp[][] we don't do the matching for particular x and y more than once.

Practice Problem SPOJ : EELCS

Code:

codebytes

Tuesday, January 15, 2019

Longest Common Subsequence

No comments:

Post a Comment