LeetCode-Hot100-堆

前言

题单LeetCode 热题 100的堆部分，给出解析。

215. Kth Largest Element in an Array

原题链接

题目描述

Given an integer array nums and an integer k, return the k^th largest element in the array.

Note that it is the k^th largest element in the sorted order, not the k^th distinct element.

Can you solve it without sorting?

Example 1:

1 2	Input: nums = [3,2,1,5,6,4], k = 2 Output: 5

Example 2:

1 2	Input: nums = [3,2,3,1,2,4,5,5,6], k = 4 Output: 4

Constraints:

1 <= k <= nums.length <= 10^5
-10^4 <= nums[i] <= 10^4

解题方法

堆排序

要在一个整数数组 nums 中找到第 k 大的元素，且要求不能通过排序的方式来解决，这可以通过使用堆数据结构来实现。

我们可以使用最小堆（Min-Heap）来解决这个问题。具体步骤如下：

创建一个大小为 k 的最小堆。
遍历数组中的元素：
- 如果堆的大小小于 k，将元素加入堆中。
- 如果堆的大小已经是 k，检查当前元素是否大于堆顶元素（即堆中最小的元素）。如果大于堆顶元素，则将堆顶元素弹出，并将当前元素加入堆中。
遍历结束后，堆顶元素即为第 k 大的元素。

快速选择

快速选择算法是基于快速排序 (Quick Sort) 算法的思想发展而来的。它使用了分治 (Divide and Conquer) 的策略，通过一个选定的「基准值」(pivot) 将数组分为两部分，一部分包含所有小于基准值的元素，另一部分包含所有大于基准值的元素。与快速排序不同的是，快速选择不需要完全排序整个数组，而是根据需要找的是第 k 小还是第 k 大的元素，只处理包含目标元素的那部分数组。

快速选择算法的基本执行步骤如下：

从数组中选择一个元素作为基准值 (pivot)。
使用该基准值对数组进行划分，将所有小于基准值的元素放在它的左边，大于基准值的放在右边。
判断基准值的位置与 k 的大小：
1. 如果基准值正好是第 k 个元素，直接返回该基准值。
2. 如果基准值的位置大于 k，在左边的子数组中，继续进行快速选择。
3. 如果基准值的位置小于 k，转到右边的子数组中，继续寻找第 k 大的元素。
重复以上步骤，直到找到第 k 个元素。

代码实现

堆排序

import heapq

class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        # 创建一个大小为 k 的最小堆
        min_heap = []

        for num in nums:
            if len(min_heap) < k:
                heapq.heappush(min_heap, num)
            else:
                if num > min_heap[0]:
                    heapq.heappushpop(min_heap, num)

        # 最小堆的堆顶就是第 k 大的元素
        return min_heap[0]

快速选择

class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        # 交换位置i和位置j的元素
        def swap(i: int, j: int):
            nums[i], nums[j] = nums[j], nums[i]

        # 对[left, right]范围内的元素进行降序快排，找到第k大元素
        def quick_sort_kth_element(k: int, left: int, right: int) -> int:
            mid = (right + left) // 2    # 选取中间元素作为切分值
            swap(mid, right)             # 将切分值放到右边界避免加入元素的划分
            partition, i, j = nums[right], left, right   # 双指针从左右边界开始，分别找到要交换的元素

            while i < j:
                while i < j and nums[i] >= partition: i += 1    # 找到左侧小于切分值的元素
                while j > i and nums[j] <= partition: j -=1    # 找到右侧大于切分值的元素【因为是找大于，即使j从right开始，right也不会被选中】
                if i < j:
                    swap(i, j)     # 将大于元素放到左侧，小于元素放到右侧
            swap(i, right)     # i最后停留的位置一定是右侧首个小于切分值的元素，与切分值交换，则[left, i)都是大于（等于）切分值，[i+1, right]都是小于（等于）切分值
            if i == k - 1: return nums[i]   # 如果切分值就是第k大元素，直接返回
            if i < k - 1: return quick_sort_kth_element(k, i + 1, right)     # 切分值是第k大之前的元素，在右区间搜索第k大
            return quick_sort_kth_element(k, left, i - 1)   # 切分值是第k大之后的元素，在左区间搜索第k大

        return quick_sort_kth_element(k, 0, len(nums) - 1)    # 快排整个区间

复杂度分析

方法	时间复杂度	空间复杂度
堆排序	$O(n)$	$O(n)$
快速选择	$O(n)$	$O(logn)$

347. Top K Frequent Elements

原题链接

题目描述

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order .

Example 1:

1 2	Input: nums = [1,1,1,2,2,3], k = 2 Output: [1,2]

Example 2:

1 2	Input: nums = [1], k = 1 Output: [1]

Constraints:

1 <= nums.length <= 10^5
-10^4 <= nums[i] <= 10^4
k is in the range [1, the number of unique elements in the array].
It is guaranteed that the answer is unique .

Follow up: Your algorithm’s time complexity must be better than O(n log n), where n is the array’s size.

解题方法

要找到给定整数数组 nums 中的 k 个最频繁的元素，我们可以使用以下步骤：

计数元素频率：使用哈希表（或字典）来计算每个元素出现的频率。
构建最小堆：使用最小堆来保持 k 个频率最高的元素。最小堆可以确保我们在添加新元素时，能够轻松移除频率最低的元素。
返回结果：从堆中提取出 k 个元素。

代码实现

from collections import Counter

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # 1. 计数每个元素的频率
        count = Counter(nums)
        
        # 2. 使用最小堆维护 k 个最频繁的元素
        heap = []
        for num, freq in count.items():
            heapq.heappush(heap, (freq, num))
            if len(heap) > k:
                heapq.heappop(heap)  # 保持堆的大小为 k
        
        # 3. 从堆中提取出 k 个元素
        return [num for freq, num in heap]

复杂度分析

时间复杂度：$O(nlogn)$

空间复杂度：$O(n)$

295. Find Median from Data Stream

原题链接

题目描述

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

For example, for arr = [2,3,4], the median is 3.
For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

MedianFinder() initializes the MedianFinder object.
void addNum(int num) adds the integer num from the data stream to the data structure.
double findMedian() returns the median of all elements so far. Answers within 10^-5 of the actual answer will be accepted.

Example 1:

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

Constraints:

-10^5 <= num <= 10^5
There will be at least one element in the data structure before calling findMedian.
At most 5 * 10^4 calls will be made to addNum and findMedian.

Follow up:

If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?

解题方法

这个问题要求我们设计一个 MedianFinder 类，来支持添加数字并能够实时地返回数据流中所有数字的中位数。

解题思路：

维护数据结构：可以通过使用两种数据结构来解决这个问题，分别是最大堆（max heap）和最小堆（min heap）。
- 最大堆：用来存储较小的一半数字。
- 最小堆：用来存储较大的一半数字。
堆的作用：
- 由于堆是一个完全二叉树结构，可以高效地获取最大值（最大堆）和最小值（最小堆）。我们使用两个堆来保持数据的平衡。
- 如果总数为奇数，最大堆会比最小堆多一个元素，返回最大堆的根节点就是中位数。
- 如果总数为偶数，两个堆的大小相同，返回两个堆根节点的平均值。

算法步骤：

初始化两个堆：最大堆和最小堆。
添加数字：
- 先将数字添加到最大堆中（因为最大堆根节点是当前较小的一半数字）。
- 然后将最大堆的根节点移到最小堆，以保证两个堆的元素尽量平衡。
- 如果最小堆的大小大于最大堆，则将最小堆的根节点移回最大堆。
求中位数：
- 如果最大堆的大小比最小堆大，返回最大堆的根节点。
- 如果两个堆的大小相同，则返回两个堆根节点的平均值。

代码实现

class MedianFinder:

    def __init__(self):
        # 最大堆：保存较小的一半数字
        self.max_heap = []  # Python heapq 默认是最小堆，所以存储负数来模拟最大堆
        # 最小堆：保存较大的一半数字
        self.min_heap = []

    def addNum(self, num: int) -> None:
        # 首先加入最大堆（负数以模拟最大堆）
        heapq.heappush(self.max_heap, -num)
        
        # 将最大堆的最大值移到最小堆
        heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        
        # 保证两个堆的平衡，最小堆不允许比最大堆大
        if len(self.min_heap) > len(self.max_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def findMedian(self) -> float:
        # 如果最大堆比最小堆多一个元素
        if len(self.max_heap) > len(self.min_heap):
            return -self.max_heap[0]
        # 否则两个堆的大小相等，返回两个堆的平均值
        return (-self.max_heap[0] + self.min_heap[0]) / 2.0
        


# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()

复杂度分析

每次 addNum 操作的时间复杂度是 $O(log n)$，因为每次加入元素后可能需要调整堆。

每次 findMedian 操作的时间复杂度是 $O(1)$，只需要查看堆的根节点。