[LeetCode]Find Median from Data Stream

题目描述:

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

Examples: 

[2,3,4] , the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) - Add a integer number from the data stream to the data structure.
  • double findMedian() - Return the median of all elements so far.

For example:

add(1)
add(2)
findMedian() -> 1.5
add(3) 
findMedian() -> 2

题目大意:

中位数是指有序整数列表的中间值。如果列表的长度为偶数,此时没有中间值。则其中位数就是两个中间值的平均值。

例如:

[2,3,4], 中位数是 3

[2,3], 中位数是 (2 + 3) / 2 = 2.5

设计一种数据结构支持下面两种操作:

  • void addNum(int num) - 从数据流向数据结构追加一个整数。
  • double findMedian() - 返回截至目前所有元素的中位数。

测试样例如题目描述。

解题思路:

维护大顶堆(MaxHeap) + 小顶堆(MinHeap)

需要满足下面的约束条件:

  1. 大顶堆中存储的元素 均不大于 小顶堆中的元素
  2. MaxHeap.size() == MinHeap.size(),或者 MaxHeap.size() == MinHeap.size() + 1

则有:

  1. 当MaxHeap.size() == MinHeap.size() + 1时,中位数就是MaxHeap的堆顶元素
  2. 当MaxHeap.size() == MinHeap.size()时,中位数就是MaxHeap堆顶元素与MinHeap堆顶元素的均值

使用Python的内置堆算法heapq可以很容易地实现小顶堆,而大顶堆可以通过对元素的值 * -1实现。

参考Python标准库文档:https://docs.python.org/2/library/heapq.html

Python代码(使用heapq):

from heapq import *
class MedianFinder:
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.minHeap = []
        self.maxHeap = []

    def addNum(self, num):
        """
        Adds a num into the data structure.
        :type num: int
        :rtype: void
        """
        heappush(self.maxHeap, -num)
        minTop = self.minHeap[0] if len(self.minHeap) else None
        maxTop = self.maxHeap[0] if len(self.maxHeap) else None
        if minTop < -maxTop or len(self.minHeap) + 1 < len(self.maxHeap):
            heappush(self.minHeap, -heappop(self.maxHeap))
        if len(self.maxHeap) < len(self.minHeap):
            heappush(self.maxHeap, -heappop(self.minHeap))

    def findMedian(self):
        """
        Returns the median of current data stream
        :rtype: float
        """
        if len(self.minHeap) < len(self.maxHeap):
            return -1.0 * self.maxHeap[0]
        else:
            return (self.minHeap[0] - self.maxHeap[0]) / 2.0

# Your MedianFinder object will be instantiated and called as such:
# mf = MedianFinder()
# mf.addNum(1)
# mf.findMedian()

使用heapq的heappushpop操作可以合并执行heappush与heappop操作,进而使代码简化:

参考:https://leetcode.com/discuss/64852/ac-python-two-heap-solution-o-log-n-add-o-1-find-388-ms

Python代码:

import heapq
def __init__(self):
    self.small = []  # the smaller half of the list, min-heap with invert values
    self.large = []  # the larger half of the list, min heap

def addNum(self, num):
    if len(self.small) == len(self.large):
        heapq.heappush(self.large, -heapq.heappushpop(self.small, -num))
    else:
        heapq.heappush(self.small, -heapq.heappushpop(self.large, num))

def findMedian(self):
    if len(self.small) == len(self.large):
        return float(self.large[0] - self.small[0]) / 2.0
    else:
        return float(self.large[0])

# 18 / 18 test cases passed.
# Status: Accepted
# Runtime: 388 ms

下面的Python代码通过手写堆的方式实现。

Python代码(手写堆):

class Heap:
    def __init__(self, cmp):
        self.cmp = cmp
        self.heap = [None]

    def __swap__(self, x, y, a):
        a[x], a[y] = a[y], a[x]

    def size(self):
        return len(self.heap) - 1

    def top(self):
        return self.heap[1] if self.size() else None

    def append(self, num):
        self.heap.append(num)
        self.siftUp(self.size())

    def pop(self):
        top, last = self.heap[1], self.heap.pop()
        if self.size():
            self.heap[1] = last
            self.siftDown(1)
        return top

    def siftUp(self, idx):
        while idx > 1 and self.cmp(idx, idx / 2, self.heap):
            self.__swap__(idx / 2, idx, self.heap)
            idx /= 2

    def siftDown(self, idx):
        while idx * 2 <= self.size():
            nidx = idx * 2
            if nidx + 1 <= self.size() and self.cmp(nidx + 1, nidx, self.heap):
                nidx += 1
            if self.cmp(nidx, idx, self.heap):
                self.__swap__(nidx, idx, self.heap)
                idx = nidx
            else:
                break


class MedianFinder:
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.minHeap = Heap(cmp = lambda x, y, a: a[x] < a[y])
        self.maxHeap = Heap(cmp = lambda x, y, a: a[x] > a[y])

    def addNum(self, num):
        """
        Adds a num into the data structure.
        :type num: int
        :rtype: void
        """
        self.maxHeap.append(num)
        if self.minHeap.top() < self.maxHeap.top() \
          or self.minHeap.size() + 1 < self.maxHeap.size():
            self.minHeap.append(self.maxHeap.pop())
        if self.maxHeap.size() < self.minHeap.size():
            self.maxHeap.append(self.minHeap.pop())

    def findMedian(self):
        """
        Returns the median of current data stream
        :rtype: float
        """
        if self.minHeap.size() < self.maxHeap.size():
            return self.maxHeap.top()
        else:
            return (self.minHeap.top() + self.maxHeap.top()) / 2.0

# Your MedianFinder object will be instantiated and called as such:
# mf = MedianFinder()
# mf.addNum(1)
# mf.findMedian()

对于更加精简的Java/C++/Python代码,可以参考LeetCode Discuss中StefanPochmann的解答:

链接:https://leetcode.com/discuss/64850/short-simple-java-c-python-o-log-n-o-1

以及:https://leetcode.com/discuss/64910/very-short-o-log-n-o-1

本文链接:http://bookshadow.com/weblog/2015/10/19/leetcode-find-median-data-stream/
请尊重作者的劳动成果,转载请注明出处!书影博客保留对文章的所有权利。

如果您喜欢这篇博文,欢迎您捐赠书影博客: ,查看支付宝二维码

Pingbacks已关闭。

暂无评论

张贴您的评论