[LeetCode]Substring with Concatenation of All Words

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.

For example, given:
S: "barfoothefoobarman"
L: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).

题目简述：

主字符串S，模式字符串list：L（L中的元素等长，有可能重复），寻找L以任意顺序拼接后得到的字符串在S中的索引。

解题思路：

1. 遍历L中的模式串，建立字典树trie，叶子节点保存模式串的唯一编号num（从1开始），以及该模式串的个数val；并记录不重复模式串的个数tt

2. 遍历母串S，使用字典树生成数组arr，其中arr[i]保存从母串位置i开始匹配到的模式串唯一编号num与该模式串在L中的个数val的元组(num, val)。如果arr[i]为(0, 0)，则表示在母串位置i没有匹配到L中的任何模式串。例如样例中生成的arr = [(2, 1), (0, 0), (0, 0), (1, 1), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (1, 1), (0, 0), (0, 0), (2, 1), (0, 0), (0, 0), (0, 0)]

3. 遍历母串S，采用“滑动窗口法”。数组元素dp[i]保存元组( wordDict , wordSet , accm )：（1）模式串字典wordDict（key为模式串唯一编号num，value为模式串在窗口内的出现次数），（2）出现次数足够的模式串集合wordSet，（3）连续出现的模式串个数accm

Python代码：

class Solution:
    def buildTrie(self, L):
        trie = {}
        t = 0
        for word in L:
            node = trie
            for letter in word:
                child = node.get(letter)
                if child is None:
                    child = {}
                    node[letter] = child
                node = child
            val = node.get('val', 0)
            if val == 0:
                t += 1
                node['num'] = t
            node['val'] = val + 1
        return (trie, t)

    def valTrie(self, string, start, size, trie):
        node = trie
        for i in range(start, start + size):
            if node.get(string[i]) is None:
                return (0, 0) # string[start : start + size] not in trie
            node = node[string[i]]
        return (node['num'], node['val'])

    def findSubstring(self, S, L):
        lenL = len(L)
        size = len(L[0])
        trie, tt = self.buildTrie(L)
        strlen = len(S)
        arr = []
        for i in range(strlen - size + 1):
            arr.append(self.valTrie(S, i, size, trie))
        result = []
        dp = []
        for i in range(strlen - size + 1):
            wordDict = None
            wordSet = set()
            accm = 0
            if arr[i][0] > 0:
                idx = i - size * (lenL - 1)
                if i < size:
                    wordDict = {arr[i][0] : 1}
                    if arr[i][1] == 1:
                        wordSet.add(arr[i][0])
                    if len(wordSet) == tt:
                        result.append(i)
                else:
                    wordDict, wordSet, accm = dp[i - size]
                    if wordDict is None:
                        wordDict = {}
                    accm += 1
                    num = wordDict.get(arr[i][0], 0)
                    wordDict[arr[i][0]] = num + 1
                    if num + 1 >= arr[i][1]:
                        wordSet.add(arr[i][0])
                    if idx >= 0:
                        if len(wordSet) == tt:
                            result.append(idx)
                        if accm >= lenL - 1 and arr[idx][0] > 0:
                            num = wordDict.get(arr[idx][0], 0)
                            if num - 1 < arr[idx][1] and arr[idx][0] in wordSet:
                                wordSet.remove(arr[idx][0])
                            if num >= 1:
                                wordDict[arr[idx][0]] = num - 1
            dp.append((wordDict,wordSet,accm))
        return result

Judge评测结果：Accepted 704ms

本文链接：http://bookshadow.com/weblog/2014/04/24/leetcode-substring-concatenation-all-words/
请尊重作者的劳动成果，转载请注明出处！书影博客保留对文章的所有权利。

周一	周二	周三	周四	周五	周六	周日
2014年3月				2014年5月
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30