本質(zhì)
哈希表的底層數(shù)據(jù)結(jié)構(gòu)是數(shù)組,數(shù)組中每一個(gè)元素存放的是鍵值對。

核心原理
f(key) = index
將key傳入,通過某個(gè)算法f,計(jì)算出索引,如果索引沖突,再通過某個(gè)辦法對索引進(jìn)行+1或-1再算一遍,直到算到不沖突為止。
負(fù)載因子 = 總鍵值對數(shù) / 哈希表總長度
負(fù)載因子用來衡量用來衡量哈希表的空滿程度,一般,當(dāng)負(fù)載因子為0.75~1的值就會(huì)進(jìn)行自動(dòng)擴(kuò)容。
存取過程(以iOS方法緩存列表底層實(shí)現(xiàn)為例)
example:Class內(nèi)部結(jié)構(gòu)中有個(gè)方法緩存列表,當(dāng)objc_msgSend進(jìn)行到搜索方法緩存列表的步驟時(shí),為使查找更高效,緩存列表的底層就是通過哈希表實(shí)現(xiàn)對調(diào)用過的方法的緩存。
1.存
(1)根據(jù)key計(jì)算出索引值
(2)如果該索引中沒有元素,就存入鍵值對
(3)如果該索引中有元素,索引沖突,就對索引進(jìn)行+1或-1進(jìn)行存儲(chǔ)
(4)如果當(dāng)前已存在的所有索引都有元素,就進(jìn)行哈希表的擴(kuò)容
(5)設(shè)置新空間是舊空間的2倍
(6)重新分配內(nèi)存
(7)重新設(shè)置掩碼_mask = newCapacity-1
(8)會(huì)將舊內(nèi)存釋放掉,清空緩存
- objc_cache.mm源碼解析
static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
cacheUpdateLock.assertLocked();
// Never cache before +initialize is done
if (!cls->isInitialized()) return;
// Make sure the entry wasn't added to the cache by some other thread
// before we grabbed the cacheUpdateLock.
if (cache_getImp(cls, sel)) return;
cache_t *cache = getCache(cls);
cache_key_t key = getKey(sel);
// Use the cache as-is if it is less than 3/4 full
mask_t newOccupied = cache->occupied() + 1;
mask_t capacity = cache->capacity();
if (cache->isConstantEmptyCache()) {
// Cache is read-only. Replace it.
cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
}
else if (newOccupied <= capacity / 4 * 3) {
// Cache is less than 3/4 full. Use it as-is.
}
else {
// Cache is too full. Expand it.
cache->expand();
}
// Scan for the first unused slot and insert there.
// There is guaranteed to be an empty slot because the
// minimum size is 4 and we resized at 3/4 full.
bucket_t *bucket = cache->find(key, receiver);
if (bucket->key() == 0) cache->incrementOccupied();
//設(shè)置bucket中的key和imp
bucket->set(key, imp);
}
//擴(kuò)容
void cache_t::expand()
{
cacheUpdateLock.assertLocked();
uint32_t oldCapacity = capacity();
//新的空間是舊的空間的2倍
uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;
if ((uint32_t)(mask_t)newCapacity != newCapacity) {
// mask overflow - can't grow further
// fixme this wastes one bit of mask
newCapacity = oldCapacity;
}
//重新分配內(nèi)存
reallocate(oldCapacity, newCapacity);
}
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
bool freeOld = canBeFreed();
bucket_t *oldBuckets = buckets();
bucket_t *newBuckets = allocateBuckets(newCapacity);
// Cache's old contents are not propagated.
// This is thought to save cache memory at the cost of extra cache fills.
// fixme re-measure this
assert(newCapacity > 0);
assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);
//會(huì)重新設(shè)置最新的_mask = newCapacity - 1;
setBucketsAndMask(newBuckets, newCapacity - 1);
if (freeOld) {
//會(huì)將舊的釋放掉,清空緩存
cache_collect_free(oldBuckets, oldCapacity);
cache_collect(false);
}
}
2.取
(1)通過key得到索引
(2)do-while循環(huán),如果散列表元素的key恰好等于按位與掩碼(&_mask)取出的索引,就直接返回
(3)如果不是,就將索引-1繼續(xù)查找
- objc_cache.mm源碼解析
bucket_t * cache_t::find(cache_key_t k, id receiver)
{
assert(k != 0);
//返回buckets散列表
bucket_t *b = buckets();
mask_t m = mask();
//得到索引
mask_t begin = cache_hash(k, m);
mask_t i = begin;
do {
//如果buket_t的索引的恰好等于通過&_mask取出的索引,就直接返回
if (b[i].key() == 0 || b[i].key() == k) {
return &b[i];
}
} while ((i = cache_next(i, m)) != begin);
//如果不是,直接i-1,如果減到0還不行,就直接是_mask(相當(dāng)于再從最后一位繼續(xù)減)
// hack
Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
cache_t::bad_cache(receiver, (SEL)k, cls);
}
static inline mask_t cache_hash(cache_key_t key, mask_t mask)
{
//按位與mask得到索引
return (mask_t)(key & mask);
}
#if __arm__ || __x86_64__ || __i386__
// objc_msgSend has few registers available.
// Cache scan increments and wraps at special end-marking bucket.
#define CACHE_END_MARKER 1
static inline mask_t cache_next(mask_t i, mask_t mask) {
return (i+1) & mask;
}
#elif __arm64__
// objc_msgSend has lots of registers available.
// Cache scan decrements. No end marker needed.
#define CACHE_END_MARKER 0
static inline mask_t cache_next(mask_t i, mask_t mask) {
//arm64架構(gòu),索引-1
return i ? i-1 : mask;
}
總結(jié)
Objective-C中的實(shí)現(xiàn),優(yōu)缺點(diǎn)并存,但相對于實(shí)際情況而言,還是優(yōu)點(diǎn)大于缺點(diǎn)。因?yàn)閷τ诜椒ň彺媪斜?,一般不?huì)有大量的數(shù)據(jù),擴(kuò)容或許是少數(shù)情況。此時(shí),你可能會(huì)有疑問,當(dāng)數(shù)據(jù)量少的時(shí)候,豈不是犧牲了內(nèi)存?然而,哈希表就是采用了空間換時(shí)間,犧牲內(nèi)存空間,換取存取效率的方法。所以,整體符合需求即可。
優(yōu)點(diǎn):
整體而言,提升了存取效率,時(shí)間復(fù)雜度為O(1),無需遍歷。
缺點(diǎn):
當(dāng)哈希表比較大時(shí),如果擴(kuò)容會(huì)導(dǎo)致瞬間效率降低。
不同編程語言,哈希表的實(shí)現(xiàn)也各有特點(diǎn),但是本質(zhì)和原理不變。
例如:
對于大量的數(shù)據(jù)或者數(shù)據(jù)庫中的存取,Java和Redis也有自己的優(yōu)化方法。
1.Java中,當(dāng)哈希函數(shù)不合理導(dǎo)致鏈表過長時(shí),會(huì)使用紅黑樹來保證插入和查找的效率。
2.Redis 使用增量式擴(kuò)容方法優(yōu)化了這個(gè)缺點(diǎn),同時(shí)還有拉鏈法的實(shí)現(xiàn)(放在鏈表頭部頭插,因?yàn)樾虏迦氲恼{(diào)用概率會(huì)高)。