Find length of the Longest Common Substring

Posted by on Mar 31, 2024

Given two strings s1 and s2, the task is to find the length of the longest common substring.

Unlike subsequences, substrings are required to occupy consecutive positions within the original string.

Input: s1 = "abcdxyz", s2 = "xyzabcd"
Output : 4
Explanation: "abcd"

Input: s1 = "zxabcdezy", s2 = "yzabcdezx"
Output : 6
Explanation: "abcdez"


Solutions

Method 1: Recursion

This Longest Common Substrings (LCSubString) problem is a variation of the Longest Common Subsequence (LCS).

The idea is to generate all substrings of both given strings and find the longest matching substring.

Start from the last elements (n-1, m-1) of both strings and also pass an extra variable "l" to keep track of the current sub-string length:

If s1[n-1] == s2[m-1], this character will be part of the current substring, so add it to the count and pass the increased count in the recur check for the remaining (0...n-2, 0...m-2) characters.

Also check recursively for (0...n-2, 0...m-1) and (0...n-1, 0...m-2) and return the max of these two and the current substring length(l).

If any of the string is exhausted (m ==0, n ==0) return current substring length(l). This is the base condition.

Complexity

The time complexity of the above recursive solution is exponential (2^n).

In addition, O(n) auxiliary space was used for the recursion stack.

Method 2: DP (Memoization)

The Memoization Technique is basically an extension to the recursive approach so that we can overcome the problem of calculating redundant cases and thus decrease time complexity.

We can see that in each recursive call only the value of "n", "m" and "l" changes, so we can store and reuse the result of a function(..n, m.., l..) call using a "n * m*l" 3-D array.

The 3-D array will store a particular state (n, m, l) if we get it the first time.

Now, if we come across the same state (n, m, l) again, instead of calculating it in exponential complexity, we can directly return its result stored in the table in constant time.

Complexity

The time complexity of this solution is (n * m * l).

In addition, O(n * m * l) auxiliary space was used by the table.

Related


Shortest Common Super-sequence (SCS)

What is "Subset Sum Problem" ?

Equal subset sum partition problem

Print Shortest Common Super-sequence (SCS)

Unbounded knapsack (Recursion & Dynamic Programming)

Minimum number of deletions and insertions

Print Longest Common Subsequence (LCS)

Longest Common Subsequence (LCS)

Count number of subsets with given sum

What is the "Rod Cutting Problem" ?