[LeetCode]Minimum Unique Word Abbreviation

题目描述：

LeetCode 411. Minimum Unique Word Abbreviation

A string such as "word" contains the following abbreviations:

["word", "1ord", "w1rd", "wo1d", "wor1", "2rd", "w2d", "wo2", "1o1d", "1or1", "w1r1", "1o2", "2r1", "3d", "w3", "4"]

Given a target string and a set of strings in a dictionary, find an abbreviation of this target string with the smallest possible length such that it does not conflict with abbreviations of the strings in the dictionary.

Each number or letter in the abbreviation is considered length = 1. For example, the abbreviation "a32bc" has length = 4.

Note:

In the case of multiple answers as shown in the second example below, you may return any one of them.
Assume length of target string = m, and dictionary size = n. You may assume that m ≤ 21, n ≤ 1000, and log2(n) + m ≤ 20.

Examples:

"apple", ["blade"] -> "a4" (because "5" or "4e" conflicts with "blade")

"apple", ["plain", "amber", "blade"] -> "1p3" (other valid answers include "ap3", "a3e", "2p2", "3le", "3l1").

题目大意：

一个字符串"word"可以包含下列缩写：

["word", "1ord", "w1rd", "wo1d", "wor1", "2rd", "w2d", "wo2", "1o1d", "1or1", "w1r1", "1o2", "2r1", "3d", "w3", "4"]

给定一个目标串和一个字符串集合组成的字典，寻找目标串的缩写的最小可能长度，并且该缩写不与字典中的字符串的缩写相互冲突。

缩写中的每一个数字或者字母的长度都认为是1。例如缩写“a32bc”的长度是4。

注意：

在有多种答案时，比如第二组测试用例，你可以任选其一返回。

假设目标串长度 = m，字典长度 = n。你可以假设m ≤ 21, n ≤ 1000, 并且 log2(n) + m ≤ 20。

解题思路：

深度优先搜索（DFS） + 剪枝

复用题目Valid Word Abbreviation的单词缩写检测方法validWordAbbreviation

首先将dictionary中长度与target不同的单词去掉。

DFS从空字符串''出发，逐一增加target中的字母；并尝试越过一些字母，插入数字。

遍历dictionary中的单词，检测冲突，若不存在冲突，则递归，并更新最优解。

剪枝策略：

利用变量length记录当前时刻的最优的单词缩写长度，若DFS分支的长度大于length，则可剪枝。

Python代码：

class Solution(object):
    def minAbbreviation(self, target, dictionary):
        """
        :type target: str
        :type dictionary: List[str]
        :rtype: str
        """
        self.target = target
        self.size = len(target)
        self.dlist = [d for d in dictionary if len(d) == self.size]
        self.ans = target
        self.length = self.size
        self.dfs('', 0, 0)
        return self.ans
    def dfs(self, abbr, length, depth):
        if length >= self.length: return
        if depth == self.size:
            for word in self.dlist:
                if self.validWordAbbreviation(word, abbr):
                    return
            self.ans = abbr
            self.length = length
            return
        self.dfs(abbr + self.target[depth], length + 1, depth + 1)
        if depth == 0 or not abbr[-1].isdigit():
            for x in range(2, self.size - depth + 1):
                self.dfs(abbr + str(x), length + 1, depth + x)
    def validWordAbbreviation(self, word, abbr):
        """
        :type word: str
        :type abbr: str
        :rtype: bool
        """
        size = len(word)
        cnt = loc = 0
        for w in abbr:
            if w.isdigit():
                if w == '0' and cnt == 0:
                    return False
                cnt = cnt * 10 + int(w)
            else:
                loc += cnt
                cnt = 0
                if loc >= size or word[loc] != w:
                    return False
                loc += 1
        return loc + cnt == size

与上述解法同样的思路，可以将单词缩写的冲突检测用位运算实现。

将字典dictionary中的单词word通过如下函数转化为数字：

def toNumber(self, target, word):
    ans = 0
    for x in range(self.size):
        ans <<= 1
        ans += target[x] == word[x]
    return ans

例如假设target='apple'，word='amble'，则word的二进制数字为10011

a p p l e
a m b l e
1 0 0 1 1

单词缩写abbr通过如下原则转化为二进制数字：

缩写中的字母替换为二进制1，数字覆盖的位数替换为二进制0

例如单词manipulation的缩写m2ip6n可以转化为100110000001

m a n i p u l a t i o n
m  2  i p      6      n
1 0 0 1 1 0 0 0 0 0 0 1

检测单词缩写abbr与字典dicitonary中是否存在冲突通过下列函数实现：

def checkNumber(self, number):
    for w in self.wlist:
        if number & w == number:
            return False
    return True

Python代码：

class Solution(object):
    def minAbbreviation(self, target, dictionary):
        """
        :type target: str
        :type dictionary: List[str]
        :rtype: str
        """
        self.size = len(target)
        self.wlist = [self.toNumber(target, d) \
                      for d in dictionary \
                      if len(d) == self.size]
        self.ans = (1 << self.size) - 1
        self.length = self.size
        self.dfs(0, 0, 0)
        return self.toWord(self.ans)
    def dfs(self, number, depth, length):
        if length >= self.length: return
        if depth == self.size:
            if not any(number & w == number for w in self.wlist):
                self.ans = number
                self.length = length
            return
        self.dfs((number << 1) + 1, depth + 1, length + 1)
        if length == 0 or number & 1:
            for x in range(2, self.size - depth + 1):
                self.dfs(number << x, depth + x, length + 1)
    def toNumber(self, target, word):
        ans = 0
        for x in range(self.size):
            ans <<= 1
            ans += target[x] == word[x]
        return ans
    def toWord(self, number):
        ans = ''
        cnt = 0
        for x in range(self.size):
            if number & (1 << self.size - x - 1):
                if cnt:
                    ans += str(cnt)
                    cnt = 0
                ans += target[x]
            else:
                cnt += 1
        return ans + str(cnt or '')

本文链接：http://bookshadow.com/weblog/2016/10/02/leetcode-minimum-unique-word-abbreviation/
请尊重作者的劳动成果，转载请注明出处！书影博客保留对文章的所有权利。

xiaoA 发布于 2017年2月3日 09:21 #

bit的方法太厉害！！完全想不出来
jackalsin 发布于 2017年11月16日 01:11 #

您好，问一下为什么
self.dfs(abbr + str(x), length + 1, depth + x)

这一行是length + 1, 不是 length + len(str(x))

我的理解是这个是代表现在abbreviation的长度，但是如果x是个两位数的话，不应该length + 2。 discuss板块上很多人也这样做了，但是感觉不是很舒服啊

https://discuss.leetcode.com/topic/61573/python-dfs-solution-with-bit-manipulation
在线疯狂发布于 2017年11月16日 18:44 #

题目描述中提到，缩写中的每一个数字或者字母的长度都认为是1。例如缩写“a32bc”的长度是4。

周一	周二	周三	周四	周五	周六	周日
2016年9月				2016年11月
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31