Minimum number of deletions and insertions

Posted by on Mar 31, 2024

Two strings, "s1" and "s2," are given. The task is to perform the minimum number of operations (delete and insert) on s1 so as to transform it into s2.

Input: s1 = abcd, s2 = zgbcf. Output: 5

Input: s1 = heap, s2 = pea. Output: 3


Solutions

Method 1: Recursion

This problem is a variation of the Longest Common Subsequence (LCS).

Start from the last elements (n-1, m-1) of both subsequences:

If s1[n-1] == s2[m-1], this character is common, so remove it from both sequences and check for the remaining (0...n-2, 0...m-2) characters.

If s1[n-1] != s2[m-1], then check recursively for following choices:

1) Remove current character from s1 -> add +1 to operations, n = n-1
2) Add current character from s2 to s1 -> add +1 to operations, s1 + s2.charAt(m - 1), n= n+1 

return the minimum operations of the above two choices.

If n==0, return m (remaining elements from s2 has to be inserted in s1), else if m==0, return n (remaining elements from s1 has to be deleted). This is the base condition.

Complexity

This solution is exponential ((2^n)) in term of time complexity.

In addition, O(n) auxiliary stack space was used for the recursion stack.

Method 2: Memoization

The Memoization Technique is basically an extension to the recursive approach so that we can overcome the problem of calculating redundant cases and thus decrease time complexity.

We can see that in each recursive call only the value of "n" and "m" (length) changes, so we can store and reuse the result of a function(..n, m..) call using a "n * m" 2-D array.

The 2-D array will store a particular state (n, m) if we get it the first time.

Now, if we come across the same state (n, m) again, instead of calculating it in exponential complexity, we can directly return its result stored in the table in constant time.

Complexity

The time complexity of this solution is (n * m).

In addition, O(n * m) auxiliary space was used by the table.

Related


Equal subset sum partition problem

What is the "Rod Cutting Problem" ?

Print Longest Common Subsequence (LCS)

What is "Subset Sum Problem" ?

Count number of subsets with given sum

Shortest Common Super-sequence (SCS)

Find length of the Longest Common Substring

Print Shortest Common Super-sequence (SCS)

Longest Common Subsequence (LCS)

Unbounded knapsack (Recursion & Dynamic Programming)