LeetCode-编辑距离
#javascript #编程 #go #leetcode

问题陈述

给定两个字符串word1word2,返回将Word1转换为Word2 的最小操作数量。

您有以下三个单词的操作:

  • 插入字符
  • 删除字符
  • 更换字符

从:https://leetcode.com/problems/edit-distance

提取的问题声明

示例1:

Input: word1 = 'horse', word2 = 'ros'
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

示例2:

Input: word1 = 'intention', word2 = 'execution'
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

约束:

- 0 <= word1.length, word2.length <= 500
- word1 and word2 consist of lowercase English letters.

解释

s per wikipedia:

编辑距离是一个字符串度量,即一种量化两个字符串(例如单词)的方式的方法,通过计算将一个字符串转换为另一个字符串所需的最小操作数量来衡量。

编辑距离在自然语言处理中查找应用程序,其中自动拼写校正可以通过从词典中选择与所讨论单词较低距离的词典来确定拼写错误单词的候选校正。在生物信息学中,它可用于量化DNA序列的相似性,可以将其视为字母A,C,G和T。

的字符串

编辑距离的不同定义使用不同的字符串操作集。 Levenshtein distance操作是字符串中字符的去除,插入或替换。作为最常见的度量,Levenshtein距离通常与编辑距离互换使用。

在此博客文章中,我们将讨论编辑距离问题的三个解决方案。我们将从蛮力开始,并尝试使用动态编程来优化解决方案。

蛮力方法

蛮方法基于递归。我们考虑可以轻松解决的较小子问题。我们在输入序列上重复出现,并找出哪个操作插入,删除或更换将花费最低。

我们将I和J分别视为Word1和Word2的指示。如果两个字符串的第一个字符都是相同的,则我们会重复出现长度I + 1和j + 1的长度。如果字符不同,我们将对该字符执行所有三个操作。我们计算哪种操作成本最低。

这种方法的C ++片段如下:

int editDistance(string word1, string word2) {
    if (word1.length() == 0)
        return word2.length();

    if (word2.length() == 0)
        return word1.length();

    return match(word1, word2, 0, 0);
}

int match(string s1, string s2, int i, int j) {
    if (s1.length() == i)
         return s2.length() - j;

    if (s2.length() == j)
        return s1.length() - i;

    int result;

    if (s1[i] == s2[j]) {
        result = match(s1, s2, i + 1, j + 1);
    } else {
        int insertChar = match(s1, s2, i, j + 1);
        int deleteChar = match(s1, s2, i + 1, j);
        int substituteChar = match(s1, s2, i + 1, j + 1);
        result = min(insertChar, deleteChar, substituteChar) + 1;
    }

    return result
}

这种方法的时间复杂性是 o(3^n)。空间复杂性是 o(m),其中m是递归树的堆栈空间。

动态编程:回忆方法

如果我们查看蛮力方法中的递归堆栈,我们会发现许多子问题正在重叠。当我们分别计算Word1和Word2的ITH和jth索引的最小值时,我们将值存储在缓存数组中,以便在需要时使用它。

假设我们知道两个字符串的编辑距离,直到i和j索引。然后我们可以写

editDistance(i + 1, j + 1) = 1 + min (editDistance(i, j + 1), editDistance(i + 1, j), editDistance(i, j))
  1. 我们检查word1 [i]的字符是否等于word2 [j],然后查找缓存[i + 1] [j + 1]。

  2. 其他,我们将最少的缓存[i + 1] [j],缓存[i] [j + 1],缓存[i] [j],然后添加1,然后将结果存储在缓存中[I + 1] [J + 1]。

这种方法的C ++片段如下:

int editDistance(string word1, string word2) {
    int cache[word1.length][word2.length];

    for (int i = 0; i < word1.length; i++) {
        for (int j = 0; j < word2.length; j++) {
            cache[i][j] = -1;
        }
    }

    return match(word1, word2, 0, 0, cache);
}

int match(string s1, string s2, int i, int j, int[][] cache){
    if (s1.size() == i)
        return s2.size() - j;

    if (s2.size() == j)
        return s1.size() - i;

    if (cache[i][j] != -1) {
        return cache[i][j];
    }

    if (s1[i] == s2[j]) {
        cache[i][j] = match(s1, s2, i + 1, j + 1, cache);
    } else {
        int insertChar = match(s1, s2, i, j + 1, cache);
        int deleteChar = match(s1, s2, i + 1, j, cache);
        int replaceChar = match(s1, s2, i + 1, j + 1, cache);
        cache[i][j] = min(insertChar, deleteChar, replaceChar) + 1;
    }

    return cache[i][j];
}

这种方法的时间复杂性是 o(n^2)。空间复杂性是 o(n^2)

动态编程:制表方法

与自上而下的回忆方法相比,制表方法以自下而上的方式起作用。使用自下而上的方法,在计算I和J的编辑距离时,我们首先从
中计算最小值

editDistance(i - 1, j)
editDistance(i, j - 1)
editDistance(i - 1, j - 1)

让我们检查一下算法以清楚地理解它。

算法

- set m = word1.length()
      n = word2.length()

- initialize vector<vector<int>> dp(m + 1, vector<int>(n + 1))

- loop for i = 0; i <= m; i++
  - set dp[i][0] = i
- for end

- loop for j = 0; j <= n; j++
  - set dp[0][j] = j
- for end

- loop for i = 1; i <= m; i++
  - loop for j = 1; j <= n; j++
    - if word1[i - 1] == word2[j - 1]
      - set dp[i][j] = dp[i - 1][j - 1]
    - else
      - set dp[i][j] = 1 + min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1])
    - if end
  - for end
- for end

- return dp[m][n]

这种方法的时间复杂性是 o(n^2)。空间复杂性是 o(n^2)

让我们在 c ++ golang javascript 中查看我们的解决方案。

C ++解决方案

class Solution {
public:
    int editDistance(string word1, string word2) {
        int m = word1.length();
        int n = word2.length();

        vector<vector<int>> dp(m + 1, vector<int>(n + 1));

        for (int i = 0; i <= m; i++) {
            dp[i][0] = i;
        }

        for (int j = 0; j <= n; j++) {
            dp[0][j] = j;
        }

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (word1[i - 1] == word2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]});
                }
            }
        }

        return dp[m][n];
    }
};

Golang解决方案

func editDistance(word1 string, word2 string) int {
    m , n := len(word1), len(word2)

    dp := make([][]int, m + 1)

    for i := 0; i < m + 1; i++ {
        dp[i] = make([]int, n + 1)
    }

    for i := 0; i <= m; i++ {
        dp[i][0] = i
    }

    for j := 0; j <= n; j++ {
        dp[0][j] = j
    }

    for i := 1; i <= m; i++ {
        for j := 1; j <= n; j++ {
            if word1[i - 1] == word2[j - 1] {
                dp[i][j] = dp[i - 1][j - 1]
            } else {
                dp[i][j] = 1 + min(min(dp[i][j - 1], dp[i - 1][j]), dp[i - 1][j - 1])
            }
        }
    }

    return dp[m][n]
}

func min(a, b int) int {
    if a < b {
        return a
    }

    return b
}

JavaScript解决方案

var editDistance = function(word1, word2) {
    let m = word1.length;
    let n = word2.length;

    let dp = Array(m + 1).fill().map(() => Array(n + 1));

    for (let i = 0; i <= m; i++) {
        dp[i][0] = i;
    }

    for (let j = 0; j <= n; j++) {
        dp[0][j] = j;
    }

    for (let i = 1; i <= m; i++) {
        for (let j = 1; j <= n; j++) {
            if (word1[i - 1] == word2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = 1 + Math.min(Math.min(dp[i][j - 1], dp[i - 1][j]), dp[i - 1][j - 1]);
            }
        }
    }

    return dp[m][n];
};

让我们干燥算法以获取一些示例以查看解决方案的工作原理。

Input: word1 = 'horse'
       word2 = 'ros'

Step 1: m = word1.length()
          = 5
        n = word2.length()
          = 3

Step 2: vector<vector<int>> dp(m + 1, vector<int>(n + 1))
        dp[6][4]

Step 3: loop for i = 0; i <= m; i++
            dp[i][0] = i
        for end

        dp = [
               [0, 0, 0, 0],
               [1, 0, 0, 0],
               [2, 0, 0, 0],
               [3, 0, 0, 0],
               [4, 0, 0, 0],
               [5, 0, 0, 0],
             ]

Step 4: loop for j = 0; i <= n; i++
            dp[0][j] = j
        for end

        dp = [
               [0, 1, 2, 3],
               [1, 0, 0, 0],
               [2, 0, 0, 0],
               [3, 0, 0, 0],
               [4, 0, 0, 0],
               [5, 0, 0, 0],
             ]

Step 5: loop for i = 1; i <= m
          1 <= 5
          true

          loop for j = 1; j <= n
            1 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[1 - 1] == word2[1 - 1]
               word1[0] == word2[0]
               'h' == 'r'
               false

            else
              dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
              dp[1][1] = 1 + min(dp[1][1 - 1], dp[1 - 1][1], dp[1 - 1][1 - 1])
                       = 1 + min(dp[1][0], dp[0][1], dp[0][0])
                       = 1 + min(1, 1, 0)
                       = 1 + 0
                       = 1

            j++
            j = 2

          loop for j <= n
            2 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[1 - 1] == word2[2 - 1]
               word1[0] == word2[1]
               'h' == 'o'
               false

            else
              dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
              dp[1][2] = 1 + min(dp[1][2 - 1], dp[1 - 1][2], dp[1 - 1][2 - 1])
                       = 1 + min(dp[1][1], dp[0][2], dp[0][1])
                       = 1 + min(1, 2, 1)
                       = 1 + 1
                       = 2

            j++
            j = 3

          loop for j <= n
            3 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[1 - 1] == word2[3 - 1]
               word1[0] == word2[2]
               'h' == 's'
               false

            else
              dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
              dp[1][3] = 1 + min(dp[1][3 - 1], dp[1 - 1][3], dp[1 - 1][3 - 1])
                       = 1 + min(dp[1][2], dp[0][3], dp[0][2])
                       = 1 + min(2, 3, 2)
                       = 1 + 2
                       = 3

            j++
            j = 4

          loop for j <= n
            4 <= 3
            false

          dp = [
                 [0, 1, 2, 3],
                 [1, 1, 2, 3],
                 [2, 0, 0, 0],
                 [3, 0, 0, 0],
                 [4, 0, 0, 0],
                 [5, 0, 0, 0],
               ]

          i++
          i = 2

Step 6: loop for i <= m
          2 <= 5
          true

          loop for j = 1; j <= n
            1 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[2 - 1] == word2[1 - 1]
               word1[1] == word2[0]
               'o' == 'r'
               false

            else
              dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
              dp[2][1] = 1 + min(dp[2][1 - 1], dp[2 - 1][1], dp[2 - 1][1 - 1])
                       = 1 + min(dp[2][0], dp[1][1], dp[1][0])
                       = 1 + min(2, 1, 1)
                       = 1 + 1
                       = 2

            j++
            j = 2

          loop for j <= n
            2 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[2 - 1] == word2[2 - 1]
               word1[1] == word2[1]
               'o' == 'o'
               true

               dp[i][j] = dp[i - 1][j - 1]
               dp[2][2] = dp[2 - 1][2 - 1]
                        = dp[1][1]
                        = 1

            j++
            j = 3

          loop for j <= n
            3 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[2 - 1] == word2[3 - 1]
               word1[1] == word2[2]
               'o' == 's'
               false

            else
              dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
              dp[2][3] = 1 + min(dp[2][3 - 1], dp[2 - 1][3], dp[2 - 1][3 - 1])
                       = 1 + min(dp[2][2], dp[1][3], dp[1][2])
                       = 1 + min(1, 3, 2)
                       = 1 + 1
                       = 2

            j++
            j = 4

          loop for j <= n
            4 <= 3
            false

          dp = [
                 [0, 1, 2, 3],
                 [1, 1, 2, 3],
                 [2, 2, 1, 2],
                 [3, 0, 0, 0],
                 [4, 0, 0, 0],
                 [5, 0, 0, 0],
               ]

          i++
          i = 3

Step 7: loop for i <= m
          3 <= 5
          true

          loop for j = 1; j <= n
            1 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[3 - 1] == word2[1 - 1]
               word1[2] == word2[0]
               'r' == 'r'
               true

               dp[i][j] = dp[i - 1][j - 1]
               dp[3][1] = dp[3 - 1][1 - 1]
                        = dp[2][0]
                        = 2

            j++
            j = 2

          loop for j <= n
            2 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[3 - 1] == word2[2 - 1]
               word1[2] == word2[1]
               'r' == 'o'
               false

            else
               dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
               dp[3][2] = 1 + min(dp[3][2 - 1], dp[3 - 1][2], dp[3 - 1][2 - 1])
                        = 1 + min(dp[3][1], dp[2][2], dp[2][1])
                        = 1 + min(0, 1, 2)
                        = 1 + 1
                        = 2

            j++
            j = 3

          loop for j <= n
            3 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[3 - 1] == word2[3 - 1]
               word1[2] == word2[2]
               'r' == 's'
               false

            else
               dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
               dp[3][3] = 1 + min(dp[3][3 - 1], dp[3 - 1][3], dp[3 - 1][3 - 1])
                        = 1 + min(dp[3][2], dp[2][3], dp[2][2])
                        = 1 + min(0, 1, 1)
                        = 1 + 1
                        = 2

            j++
            j = 4

          loop for j <= n
            4 <= 3
            false

          dp = [
                 [0, 1, 2, 3],
                 [1, 1, 2, 3],
                 [2, 2, 1, 2],
                 [3, 2, 2, 2],
                 [4, 0, 0, 0],
                 [5, 0, 0, 0],
               ]

          i++
          i = 4

Step 8: loop for i <= m
          4 <= 5
          true

          loop for j = 1; j <= n
            1 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[4 - 1] == word2[1 - 1]
               word1[3] == word2[0]
               's' == 'r'
               false

            else
               dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
               dp[4][1] = 1 + min(dp[4][1 - 1], dp[4 - 1][1], dp[4 - 1][1 - 1])
                        = 1 + min(dp[4][0], dp[3][1], dp[3][0])
                        = 1 + min(4, 2, 3)
                        = 1 + 2
                        = 3

            j++
            j = 2

          loop for j <= n
            2 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[4 - 1] == word2[2 - 1]
               word1[3] == word2[1]
               's' == 'o'
               false

            else
               dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
               dp[4][2] = 1 + min(dp[4][2 - 1], dp[4 - 1][2], dp[4 - 1][2 - 1])
                        = 1 + min(dp[4][1], dp[3][2], dp[3][1])
                        = 1 + min(3, 2, 2)
                        = 1 + 2
                        = 3

            j++
            j = 3

          loop for j <= n
            3 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[4 - 1] == word2[3 - 1]
               word1[3] == word2[2]
               's' == 's'
               true

               dp[i][j] = dp[i - 1][j - 1]
               dp[4][3] = dp[4 - 1][3 - 1]
                        = dp[3][2]
                        = 2

            j++
            j = 4

          loop for j <= n
            4 <= 3
            false

          dp = [
                 [0, 1, 2, 3],
                 [1, 1, 2, 3],
                 [2, 2, 1, 2],
                 [3, 2, 2, 2],
                 [4, 3, 3, 2],
                 [5, 0, 0, 0],
               ]

          i++
          i = 5

Step 9: loop for i <= m
          5 <= 5
          true

          loop for j = 1; j <= n
            1 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[5 - 1] == word2[1 - 1]
               word1[4] == word2[0]
               'e' == 'r'
               false

            else
               dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
               dp[5][1] = 1 + min(dp[5][1 - 1], dp[5 - 1][1], dp[5 - 1][1 - 1])
                        = 1 + min(dp[5][0], dp[4][1], dp[4][0])
                        = 1 + min(5, 3, 4)
                        = 1 + 3
                        = 4

            j++
            j = 2

          loop for j = 1; j <= n
            2 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[5 - 1] == word2[2 - 1]
               word1[4] == word2[1]
               'e' == 'o'
               false

            else
               dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
               dp[5][2] = 1 + min(dp[5][2 - 1], dp[5 - 1][2], dp[5 - 1][2 - 1])
                        = 1 + min(dp[5][1], dp[4][2], dp[4][1])
                        = 1 + min(4, 3, 3)
                        = 1 + 3
                        = 4

            j++
            j = 3

          loop for j = 1; j <= n
            3 <= 3
            true

            if word1[i - 1] == word2[j - 1]
               word1[5 - 1] == word2[3 - 1]
               word1[4] == word2[2]
               'e' == 's'
               false

            else
               dp[i][j] = 1 + min({dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]})
               dp[5][3] = 1 + min(dp[5][3 - 1], dp[5 - 1][3], dp[5 - 1][3 - 1])
                        = 1 + min(dp[5][2], dp[4][3], dp[4][2])
                        = 1 + min(4, 2, 3)
                        = 1 + 2
                        = 3

            j++
            j = 4

          loop for j = 1; j <= n
            4 <= 3
            false

          dp = [
                 [0, 1, 2, 3],
                 [1, 1, 2, 3],
                 [2, 2, 1, 2],
                 [3, 2, 2, 2],
                 [4, 3, 3, 2],
                 [5, 4, 4, 3],
               ]

          i++
          i = 6

Step 10: loop for i <= m
           6 <= 5
           false

Step 11: return dp[m][n]
                dp[5][3]
                3

We return the answer as 3.