2018-09-03

Hash Tables

When we talk about?hash tables, we're actually talking about dictionary. While an array can be used to construct hash tables, array indexes its elements using integers. However, if we want to store data and use keys other than integer, such as 'string', we may want to use dictionary.

Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.

We declare an empty dictionary like this:

We declare an empty dictionary like this:

>>> D = {}

Then, we can add its elements:

>>> D['a'] = 1

>>> D['b'] = 2

>>> D['c'] = 3

>>> D

{'a': 1, 'c': 3, 'b': 2}

It's a structure with (key, value) pair:

D[key] = value

The string used to "index" the hash table D is called the "key". To access the data stored in the table, we need to know the key:

>>> D['b']

2

How we loop through the hash table?

>>> for k in D.keys():

...? ? print D[k]

...

1

3

2

If we want to print the (key, value) pair:

>>> for k,v in D.items():

...? ? print k,':',v

...

a : 1

c : 3

b : 2

Hashing from two arrays

Using two Arrays of equal length, create a Hash object where the elements from one array (the keys) are associated with the elements of the other (the values):

>>> keys = ['a', 'b', 'c']

>>> values = [1, 2, 3]

>>> hash = {k:v for k, v in zip(keys, values)}

>>> hash

{'a': 1, 'c': 3, 'b': 2}

Hashing

Here are some hashing samples using built-in?hash?function:

>>> map(hash, [0, 1, 2, 3])

[0, 1, 2, 3]

>>> map(hash, ['0','1','2','3'])

[6144018481, 6272018864, 6400019251, 6528019634]

>>> hash('0')

6144018481

As we can see from the example, Python is using different hash() function depending on the type of data.

Python provides?hashlib?for secure hashes and message digests:

md5(), sha*():

>>> import hashlib>>> hashlib.md5('a')>>> hashlib.md5('a').digest()'\x0c\xc1u\xb9\xc0\xf1\xb6\xa81\xc3\x99\xe2iw&a;'>>> hashlib.md5('a').hexdigest()'0cc175b9c0f1b6a831c399e269772661'>>> hashlib.sha512('a')>>> hashlib.sha512('a').digest()

'\x1f@\xfc\x92\xda$\x16\x94u\ty\xeel\xf5\x82\xf2\xd5\xd7\xd2\x8e\x183]\xe0Z\xbcT\xd0V\x0e\x0fS\x02\x86\x0ce+\xf0\x8dV\x02R\xaa^t!\x05F\xf3i\xfb\xbb\xce\x8c\x12\xcf\xc7\x95{&R;\xfe\x9au'

>>> hashlib.sha512('a').hexdigest()

'1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75'

>>>

Hashing example code

Hashing example code

The following code is a revision from?Sets (union/intersection) and itertools - Jaccard coefficient & shingling to check plagiarism. In this section, we used 64 bit integer (hash value from hash()) for the comparison of shingles instead of directly working on the string.

Output is exactly the same as the one we got using string comparison:

combinations=[(0, 1), (0, 2), (1, 2)]

(0, 1) : jaccard=0.0196078431373

(0, 2) : jaccard=0.0

(1, 2) : jaccard=0.0

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容