Heap in python
Heap,即堆,也就是優(yōu)先隊列。我們可以在這里找到[]維基百科](https://en.wikipedia.org/wiki/Heap_(data_structure))
堆(英語:Heap)是計算機科學(xué)中一類特殊的數(shù)據(jù)結(jié)構(gòu)的統(tǒng)稱。堆通常是一個可以被看做一棵樹的數(shù)組對象。在隊列中,調(diào)度程序反復(fù)提取隊列中第一個作業(yè)并運行,因為實際情況中某些時間較短的任務(wù)將等待很長時間才能結(jié)束,或者某些不短小,但具有重要性的作業(yè),同樣應(yīng)當(dāng)具有優(yōu)先權(quán)。堆即為解決此類問題設(shè)計的一種數(shù)據(jù)結(jié)構(gòu)。
邏輯定義:
n個元素序列{k1,k2...ki...kn},當(dāng)且僅當(dāng)滿足下列關(guān)系時稱之為堆:
(ki <= k2i,ki <= k2i+1)或者(ki >= k2i,ki >= k2i+1), (i = 1,2,3,4...n/2)
性質(zhì):
堆的實現(xiàn)通過構(gòu)造二叉堆(binary heap),實為二叉樹的一種;由于其應(yīng)用的普遍性,當(dāng)不加限定時,均指該數(shù)據(jù)結(jié)構(gòu)的這種實現(xiàn)。這種數(shù)據(jù)結(jié)構(gòu)具有以下性質(zhì)。
- 任意節(jié)點小于(或大于)它的所有后裔,最小元(或最大元)在堆的根上(堆序性)。
- 堆總是一棵完全樹。即除了最底層,其他層的節(jié)點都被元素填滿,且最底層盡可能地從左到右填入。
將根節(jié)點最大的堆叫做最大堆或大根堆,根節(jié)點最小的堆叫做最小堆或小根堆。常見的堆有二叉堆、斐波那契堆等。
支持的基本操作[編輯]
操作 描述 時間復(fù)雜度
build 創(chuàng)建一個空堆 O(n)
insert 向堆中插入一個新元素 O(log n)
update 將新元素提升使其匹配堆的性質(zhì)
get 獲取當(dāng)前堆頂元素的值 O(1)
delete 刪除堆頂元素 O(log n)
heapify 使刪除堆頂元素的堆再次成為堆
某些堆實現(xiàn)還支持其他的一些操作,如斐波那契堆支持檢查一個堆中是否存在某個元素。
示例程序:
為將元素X插入堆中,找到空閑位置,創(chuàng)建一個空穴,若滿足堆序性(英文:heap order),則插入完成;否則將父節(jié)點元素裝入空穴,刪除該父節(jié)點元素,完成空穴上移。直至滿足堆序性。這種策略叫做上濾(percolate up)。[1]
void Insert( ElementType X, PriorityQueue H )
{
int i;
if( IsFull(H) )
{
printf( "Queue is full.\n" );
return;
}
for( i = ++H->Size; H->Element[i/2] > X; i /= 2 )
H->Elements[i] = H->Elements[i/2];
H->Elements[i] = X;
}
以上是插入到一個二叉堆的過程。
DeleteMin,刪除最小元,即二叉樹的根或父節(jié)點。刪除該節(jié)點元素后,隊列最后一個元素必須移動到堆得某個位置,使得堆仍然滿足堆序性質(zhì)。這種向下替換元素的過程叫作下濾。
ElementType
DeleteMin( PriorityQueue H )
{
int i, Child;
ElementType MinElement, LastElement;
if( IsEmpty( H ) )
{
printf( "Queue is empty.\n" );
return H->Elements[0];
}
MinElement = H->Elements[1];
LastElement = H->Elements[H->Size--];
for( i = 1; i*2 <= H->Size; i = Child )
{
// Find smaller child.
Child = i*2;
if( Child != H->Size && H->Elements[Child+1]
< H->Elements[Child] )
Child++;
// Percolate one level.
if( LastElement > H->Elements[Child] )
H->Elements[i] = H->Elements[Child];
else
break;
}
H->Elements[i] = LastElement;
return MinElement;
}
應(yīng)用
堆排序,或者運用堆的排序以選擇優(yōu)先
Python中的heapq模塊 -- 堆排序算法
Purpose:
目的:
The heapq implements a min-heap sort algorithm suitable for use with Python’s lists.
heapq模塊執(zhí)行了一個適用于Python列表的最小堆排序算法。
A heap is a tree-like data structure where the child nodes have a sort-order relationship with the parents. Binary heaps can be represented using a list or array organized so that the children of element N are at positions 2N+1 and 2N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.
堆,是一個類似于樹的數(shù)據(jù)結(jié)構(gòu),其中子節(jié)點們與其父節(jié)點有一個排序的關(guān)系。二叉堆能夠表示為使用一個列表或者數(shù)組來組織,因此元素N的子節(jié)點位于2N+1和2N+2(對于基于0的索引)。這樣的布局允許在原來位置重置堆,因此沒有必要在添加和刪除元素的時候重置過多的內(nèi)存空間。
A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python’s heapq module implements a min-heap.
一個最大堆可以確保父節(jié)點大于或者等于其兩個子節(jié)點。最小堆需要父節(jié)點小于或者等于其子節(jié)點。Python的heapq模塊使用最小堆。
Example Data
示例數(shù)據(jù)
The examples in this section use the data in heapq_heapdata.py.
本節(jié)的這個示例使用heapq_heapdata.py中的data。
heapq_heapdata.py
# This data was generated with the random module.
data = [19, 9, 4, 10, 11]
The heap output is printed using heapq_showtree.py:
heapq_showtree.py使用heap的輸出:
heapq_showtree.py
import math
from io import StringIO
def show_tree(tree, total_width=36, fill=' '):
"""Pretty-print a tree."""
output = StringIO()
last_row = -1
for i, n in enumerate(tree):
if i:
row = int(math.floor(math.log(i + 1, 2)))
else:
row = 0
if row != last_row:
output.write('\n')
columns = 2 ** row
col_width = int(math.floor(total_width / columns))
output.write(str(n).center(col_width, fill))
last_row = row
print(output.getvalue())
print('-' * total_width)
print()
Creating a Heap
創(chuàng)建一個堆
There are two basic ways to create a heap, heappush() and heapify().
想要創(chuàng)建一個堆,有兩種基本方法:heappush() 和 heapify()。
heapq_heappush.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heap = []
print('random :', data)
print()
for n in data:
print('add {:>3}:'.format(n))
heapq.heappush(heap, n)
show_tree(heap)
Using heappush(), the heap sort order of the elements is maintained as new items are added from a data source.
使用heappush(),當(dāng)來自于數(shù)據(jù)源的新元素添加到堆中時,堆排序算法將維護元素的順序。
$ python3 heapq_heappush.py
random : [19, 9, 4, 10, 11]
add 19:
19
add 9:
9
19
add 4:
4
19 9
add 10:
4
10 9
19
add 11:
4
10 9
19 11
If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.
如果data已經(jīng)在內(nèi)存之中,使用heapify()來在列表內(nèi)部重置元素將會更高效。
heapq_heapify.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
The result of building a list in heap order one item at a time is the same as building it unordered and then calling heapify().
按照堆的順序,一次一個元素構(gòu)建一個列表的結(jié)果,與構(gòu)建一個構(gòu)建一個未排序的列表一致,直接使用heapify():
$ python3 heapq_heapify.py
random : [19, 9, 4, 10, 11]
heapified :
4
9 19
10 11
Accessing Contents of a Heap
訪問堆中的內(nèi)容
Once the heap is organized correctly, use heappop() to remove the element with the lowest value.
一旦堆正確的組織,就可以使用heappop()來移除最小的元素值。
heapq_heappop.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
print
for i in range(2):
smallest = heapq.heappop(data)
print('pop {:>3}:'.format(smallest))
show_tree(data)
In this example, adapted from the stdlib documentation, heapify() and heappop() are used to sort a list of numbers.
在本示例中,參考標(biāo)準(zhǔn)庫的文檔,heapify() 和 heappop()用于對于一個數(shù)字列表進(jìn)行排序。
$ python3 heapq_heappop.py
random : [19, 9, 4, 10, 11]
heapified :
4
9 19
10 11
pop 4:
9
10 19
11
pop 9:
10
11 19
To remove existing elements and replace them with new values in a single operation, use heapreplace().
想要用一次操作移除一個已經(jīng)存在的元素,然后使用一個新的元素替代它,可以使用 heapreplace():
heapq_heapreplace.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heapq.heapify(data)
print('start:')
show_tree(data)
for n in [0, 13]:
smallest = heapq.heapreplace(data, n)
print('replace {:>2} with {:>2}:'.format(smallest, n))
show_tree(data)
Replacing elements in place makes it possible to maintain a fixed size heap, such as a queue of jobs ordered by priority.
在原來的位置替換元素,這樣就可以維護一個固定大小的堆,例如按照優(yōu)先級排列的任務(wù)隊列。
$ python3 heapq_heapreplace.py
start:
4
9 19
10 11
replace 4 with 0:
0
9 19
10 11
replace 0 with 13:
9
10 19
13 11
Data Extremes From a Heap
堆中兩端的數(shù)據(jù)
heapq also includes two functions to examine an iterable to find a range of the largest or smallest values it contains.
heapq包含兩個函數(shù)用與迭代查找堆中一定范圍內(nèi)最大或者最小的值。
heapq_extremes.py
import heapq
from heapq_heapdata import data
print('all :', data)
print('3 largest :', heapq.nlargest(3, data))
print('from sort :', list(reversed(sorted(data)[-3:])))
print('3 smallest:', heapq.nsmallest(3, data))
print('from sort :', sorted(data)[:3])
Using nlargest() and nsmallest() are only efficient for relatively small values of n > 1, but can still come in handy in a few cases.
使用 nlargest() 和 nsmallest() 這兩個函數(shù)對于查找 n > 1的較小數(shù)值會顯得更高效,但是在一些場景下更靈活。
$ python3 heapq_extremes.py
all : [19, 9, 4, 10, 11]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
Efficiently Merging Sorted Sequences
高效的合并已排序的序列
Combining several sorted sequences into one new sequence is easy for small data sets.
使用如下所示的方法,對于較小的數(shù)據(jù)集,合并一些已經(jīng)排序的序列到一個新的序列會變的很容易。
list(sorted(itertools.chain(*data)))
For larger data sets, this technique can use a considerable amount of memory. Instead of sorting the entire combined sequence, merge() uses a heap to generate a new sequence one item at a time, and determine the next item using a fixed amount of memory.
對于較大的數(shù)據(jù)集,使用如上所示的代碼將會占用大量的內(nèi)存。與對整個已經(jīng)組合的序列進(jìn)行排序相比,merge()使用了堆來一次性生成新的序列,并且判斷新生成的序列使用固定數(shù)量的內(nèi)存。
heapq_merge.py
import heapq
import random
random.seed(2016)
data = []
for i in range(4):
new_data = list(random.sample(range(1, 101), 5))
new_data.sort()
data.append(new_data)
for i, d in enumerate(data):
print('{}: {}'.format(i, d))
print('\nMerged:')
for i in heapq.merge(*data):
print(i, end=' ')
print()
Because the implementation of merge() uses a heap, it consumes memory based on the number of sequences being merged, rather than the number of items in those sequences.
由于在堆中執(zhí)行merge()函數(shù),所消耗的內(nèi)存取決于所合并的序列數(shù)量,而不是這些序列中的元素數(shù)量。
$ python3 heapq_merge.py
0: [33, 58, 71, 88, 95]
1: [10, 11, 17, 38, 91]
2: [13, 18, 39, 61, 63]
3: [20, 27, 31, 42, 45]
Merged:
10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95
最讓我眼前一亮的應(yīng)用,就是leetcode中的第二十三題目:merge k sorted list,我們可以將問題轉(zhuǎn)換為合并queue:
class ListNode(object):
def __init__(self, x):
self.val = x
self.next = None
def mergeKLists(lists):
current = dummy = ListNode(0)
heap = []
for sorted_list in lists:
if sorted_list:
heapq.heappush(heap, (sorted_list.val, sorted_list))
while heap:
smallest = heapq.heappop(heap)[1]
current.next = smallest
current = current.next
if smallest.next:
heapq.heappush(heap, (smallest.next.val, smallest.next))
return dummy.next
將鏈表的問題轉(zhuǎn)換成堆排序的問題。
1