GoLang - Memory Management

1. Introduction to tcmalloc (references)

Go's memory management closely resembles tcmalloc (thread-caching malloc), so let's first look at how tcmalloc works.

1.1 Overview

tcmalloc is a memory allocator released by Google; other common allocators include glibc's ptmalloc and jemalloc (originally from FreeBSD, widely used at Facebook). Compared with ptmalloc, tcmalloc performs better, especially under high concurrency.

1.2 tcmalloc Allocation Strategy

tcmalloc serves allocations from two places: a global heap cache and a thread-private cache. Small allocations are served from the thread-private cache; when the private cache runs low, it refills a batch from the global cache. Large allocations are served directly from the global cache. The boundary between small and large is 32 KB. Each cache is organized as an array of singly linked lists, where every list holds blocks of one particular size.
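The two-tier strategy can be sketched with a toy model. The 32 KB boundary comes from the text above; the power-of-two size classes below are a simplification (real tcmalloc uses a denser, hand-tuned class table), so treat this as an illustration rather than tcmalloc's actual algorithm.

```go
package main

import "fmt"

const smallLimit = 32 << 10 // 32 KB small/large boundary, per the text

// sizeClass rounds a request up to its (simplified) class size.
func sizeClass(n int) int {
	c := 8
	for c < n {
		c <<= 1
	}
	return c
}

// allocate reports which tier would serve a request of n bytes.
func allocate(n int) string {
	if n <= smallLimit {
		return fmt.Sprintf("thread-local cache, class %d", sizeClass(n))
	}
	return "global heap (whole pages)"
}

func main() {
	fmt.Println(allocate(24))       // small: served by the private cache
	fmt.Println(allocate(40 << 10)) // large: served by the global heap
}
```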

1.2.1 Small Object Allocation

By default, small-object allocation uses roughly 86 distinct block sizes; the exact sizes are not documented and have to be read from the source. The free list for each size is initialized only when first used, i.e. lazily.


Small Object Allocation

1.2.2 Big Object Allocation

Allocations larger than 32 KB are served from the global heap. The global heap is also organized as an array of singly linked lists; the array has 256 entries, corresponding to runs of 1 page, 2 pages, and so on (1 page = 4 KB).


Big Object Allocation

1.2.3 Span

tcmalloc manages memory pages with spans; a span covers one or more contiguous pages. A span is in one of three states: unallocated, allocated as a large object, or allocated (and split up) for small objects.


Span

1.3 Comparison with Golang

Golang's memory allocator is not an exact copy of tcmalloc:

  • The local cache is attached to a P rather than to a process or thread (this ties into Go's goroutine scheduler, described below).
  • Go's GC is stop-the-world; collection is not performed independently by each thread.
  • Span management is more efficient.

2. Basic Concepts

2.1 Golang Runtime Scheduling

Golang has three fundamental scheduling concepts: G, M, and P.

  • G (Goroutine): the coroutine we usually talk about, a user-level lightweight thread; each Goroutine's sched field saves its context.
  • M (Machine): a wrapper around a kernel-level thread; the worker that actually executes code.
  • P (Processor): the scheduling context that links Gs to Ms; the number of Ps can be set via GOMAXPROCS() and defaults to the number of CPU cores.
    Running a Goroutine requires all three parts together: G + P + M.

2.2 Escape Analysis

For escape analysis, see GoLang - Escape Analysis.

3. Key Data Structures

A few key pieces:

  • mcache: a per-P cache; think of it as the local cache.
  • mcentral: the global cache; mcache refills from mcentral when it runs short.
  • mheap: when mcentral is also exhausted, memory is requested from the operating system through mheap.
    Together these form a multi-level memory allocator.
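The refill chain can be sketched as a toy cascade. All stock counts below are made up purely for illustration; real refills move whole spans and batches of objects, not single units.

```go
package main

import "fmt"

// level models one tier of the mcache -> mcentral -> mheap cascade:
// it serves from its own stock and refills from its parent when empty.
type level struct {
	name   string
	stock  int
	parent *level
}

func (l *level) get() string {
	if l.stock > 0 {
		l.stock--
		return l.name
	}
	if l.parent == nil {
		return "OS" // mheap would ask the operating system
	}
	src := l.parent.get() // refill from the level above
	l.stock += 7          // keep a batch cached locally (arbitrary number)
	return src
}

func main() {
	mheap := &level{name: "mheap", stock: 1}
	mcentral := &level{name: "mcentral", stock: 1, parent: mheap}
	mcache := &level{name: "mcache", stock: 1, parent: mcentral}

	for i := 0; i < 3; i++ {
		fmt.Println("allocation", i, "served via", mcache.get())
	}
}
```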

3.1 mcache

Every Goroutine runs bound to a P, and mcache is that P's cache. The benefit of this arrangement is that allocating from it needs no locking. The mcache structure is shown below.

//go version 1.10.8
//file runtime/mcache.go

// Per-thread (in Go, per-P) cache for small objects.
// No locking needed because it is per-thread (per-P).
//
// mcaches are allocated from non-GC'd memory, so any heap pointers
// must be specially handled.
//
//go:notinheap
type mcache struct {
    // The following members are accessed on every malloc,
    // so they are grouped here for better caching.
    next_sample int32   // trigger heap sample after allocating this many bytes
    local_scan  uintptr // bytes of scannable heap allocated

    // Allocator cache for tiny objects w/o pointers.
    // See "Tiny allocator" comment in malloc.go.

    // tiny points to the beginning of the current tiny block, or
    // nil if there is no current tiny block.
    //
    // tiny is a heap pointer. Since mcache is in non-GC'd memory,
    // we handle it by clearing it in releaseAll during mark
    // termination.
    tiny             uintptr
    tinyoffset       uintptr
    local_tinyallocs uintptr // number of tiny allocs not counted in other stats

    // The rest is not accessed on every malloc.

    alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

    stackcache [_NumStackOrders]stackfreelist

    // Local allocator stats, flushed during GC.
    local_nlookup    uintptr                  // number of pointer lookups
    local_largefree  uintptr                  // bytes freed for large objects (>maxsmallsize)
    local_nlargefree uintptr                  // number of frees for large objects (>maxsmallsize)
    local_nsmallfree [_NumSizeClasses]uintptr // number of frees for small objects (<=maxsmallsize)
}

For now we can focus on alloc [numSpanClasses]*mspan. This is an array of pointers to mspan, indexed by span class; numSpanClasses is 134 (= _NumSizeClasses << 1: each of the 67 size classes has a scan and a noscan variant). Each element serves blocks of one particular size: when memory is requested, the object picks the matching element of alloc to allocate from. The 67 class sizes are 0, 8 bytes, 16 bytes, …, which differs slightly from tcmalloc.

//go version 1.10.8
//file runtime/sizeclasses.go

var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}

The alloc array plays the role of a memory pool's freelist array: in a conventional implementation each element would be a linked list of blocks of one specific size. Here every element is uniformly an mspan, which can only mean that the mspan itself records the block size it is carved into. Let's look at the mspan structure.
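The size-to-class mapping implied by the class_to_size table can be sketched with a linear scan. The real runtime uses precomputed lookup tables instead of scanning, but the mapping is the same; classToSize below copies only the first few entries of the table above.

```go
package main

import "fmt"

// First entries of runtime's class_to_size table (Go 1.10),
// copied from runtime/sizeclasses.go.
var classToSize = []int{0, 8, 16, 32, 48, 64, 80, 96, 112, 128}

// sizeToClass finds the smallest class whose blocks fit n bytes.
func sizeToClass(n int) int {
	for c := 1; c < len(classToSize); c++ {
		if classToSize[c] >= n {
			return c
		}
	}
	return -1 // beyond the copied entries (or a large allocation)
}

func main() {
	fmt.Println(sizeToClass(1))  // class 1: the 8-byte class
	fmt.Println(sizeToClass(24)) // class 3: rounded up to 32 bytes
	fmt.Println(sizeToClass(33)) // class 4: rounded up to 48 bytes
}
```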

3.2 mspan

In tcmalloc, the span is the basic unit of memory management. Golang's mspan structure looks like this.

//go version 1.10.8
//file runtime/mheap.go

//go:notinheap
type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // For debugging. TODO: Remove.

    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span

    manualFreeList gclinkptr // list of free objects in _MSpanManual spans

    // freeindex is the slot index between 0 and nelems at which to begin scanning
    // for the next free object in this span.
    // Each allocation scans allocBits starting at freeindex until it encounters a 0
    // indicating a free object. freeindex is then adjusted so that subsequent scans begin
    // just past the newly discovered free object.
    //
    // If freeindex == nelem, this span has no free objects.
    //
    // allocBits is a bitmap of objects in this span.
    // If n >= freeindex and allocBits[n/8] & (1<<(n%8)) is 0
    // then object n is free;
    // otherwise, object n is allocated. Bits starting at nelem are
    // undefined and should never be referenced.
    //
    // Object n starts at address n*elemsize + (start << pageShift).
    freeindex uintptr
    // TODO: Look up nelems from sizeclass and remove this field if it
    // helps performance.
    nelems uintptr // number of object in the span.

    // Cache of the allocBits at freeindex. allocCache is shifted
    // such that the lowest bit corresponds to the bit freeindex.
    // allocCache holds the complement of allocBits, thus allowing
    // ctz (count trailing zero) to use it directly.
    // allocCache may contain bits beyond s.nelems; the caller must ignore
    // these.
    allocCache uint64

    // allocBits and gcmarkBits hold pointers to a span's mark and
    // allocation bits. The pointers are 8 byte aligned.
    // There are three arenas where this data is held.
    // free: Dirty arenas that are no longer accessed
    //       and can be reused.
    // next: Holds information to be used in the next GC cycle.
    // current: Information being used during this GC cycle.
    // previous: Information being used during the last GC cycle.
    // A new GC cycle starts with the call to finishsweep_m.
    // finishsweep_m moves the previous arena to the free arena,
    // the current arena to the previous arena, and
    // the next arena to the current arena.
    // The next arena is populated as the spans request
    // memory to hold gcmarkBits for the next GC cycle as well
    // as allocBits for newly allocated spans.
    //
    // The pointer arithmetic is done "by hand" instead of using
    // arrays to avoid bounds checks along critical performance
    // paths.
    // The sweep will free the old allocBits and set allocBits to the
    // gcmarkBits. The gcmarkBits are replaced with a fresh zeroed
    // out memory.
    allocBits  *gcBits
    gcmarkBits *gcBits

    // sweep generation:
    // if sweepgen == h->sweepgen - 2, the span needs sweeping
    // if sweepgen == h->sweepgen - 1, the span is currently being swept
    // if sweepgen == h->sweepgen, the span is swept and ready to use
    // h->sweepgen is incremented by 2 after every GC

    sweepgen    uint32
    divMul      uint16     // for divide by elemsize - divMagic.mul
    baseMask    uint16     // if non-0, elemsize is a power of 2, & this will get object allocation base
    allocCount  uint16     // number of allocated objects
    spanclass   spanClass  // size class and noscan (uint8)
    incache     bool       // being used by an mcache
    state       mSpanState // mspaninuse etc
    needzero    uint8      // needs to be zeroed before allocation
    divShift    uint8      // for divide by elemsize - divMagic.shift
    divShift2   uint8      // for divide by elemsize - divMagic.shift2
    elemsize    uintptr    // computed from sizeclass or from npages
    unusedsince int64      // first time spotted by gc in mspanfree state
    npreleased  uintptr    // number of pages released to the os
    limit       uintptr    // end of data in span
    speciallock mutex      // guards specials list
    specials    *special   // linked list of special records sorted by offset.
}

From the structure above we can see:

  • next, prev: link pointers; mspans are generally used in linked lists.
  • npages: an mspan's size is an integer multiple of the page size.
  • spanclass (size class and noscan, uint8): derived from a value in the 0 ~ _NumSizeClasses range, which resolves our earlier question. For example, with size class 3 the mspan is carved into 32-byte blocks.
  • elemsize: computable from sizeclass or npages. For example sizeclass = 3 gives elemsize = 32 bytes. Allocations above 32 KB get whole pages, with elemsize = page_size * npages.
  • nelems: the total number of blocks in the span.
  • freeindex: in 0 ~ nelems-1, marking how far allocation has progressed in the span.
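The freeindex/allocCache mechanism described in the comments above can be sketched as follows. This is a simplification: a real mspan slides a 64-object window of allocBits through allocCache starting at freeindex, while here the whole span is treated as a single 64-bit word.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextFree shows how a span locates its next free object: allocCache
// holds the complement of allocBits, so a set bit means "free", and
// ctz (count trailing zeros) jumps straight to the first free slot.
func nextFree(allocBits uint64) int {
	allocCache := ^allocBits // complement, as the runtime comment describes
	if allocCache == 0 {
		return -1 // no free objects in this window
	}
	return bits.TrailingZeros64(allocCache)
}

func main() {
	fmt.Println(nextFree(0b1011))     // objects 0, 1, 3 allocated -> 2
	fmt.Println(nextFree(^uint64(0))) // every object allocated -> -1
}
```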

3.3 mcentral

As mentioned above, when mcache runs short it refills from mcentral, so let's introduce mcentral next.

//go version 1.10.8
//file runtime/mcentral.go

// Central list of free objects of a given size.
//
//go:notinheap
type mcentral struct {
    lock      mutex
    spanclass spanClass
    nonempty  mSpanList // list of spans with a free object, ie a nonempty free list
    empty     mSpanList // list of spans with no free objects (or cached in an mcache)

    // nmalloc is the cumulative count of objects allocated from
    // this mcentral, assuming all spans in mcaches are
    // fully-allocated. Written atomically, read under STW.
    nmalloc uint64
}

// mSpanList heads a linked list of spans.
//
//go:notinheap
type mSpanList struct {
    first *mspan // first span in list, or nil if none
    last  *mspan // last span in list, or nil if none
}

Notes on mcentral:

  • spanclass: mcentral also carries a spanclass member, so is there one mcentral per span class? Yes (the heap holds an array of them).
  • lock: required because multiple Ps may compete for the same mcentral.
  • nonempty: a doubly linked list of mspans that still have free objects, i.e. the usable span list of this mcentral.
  • empty: spans with no free objects left (or currently cached in an mcache); effectively a way of tracking all spans.

So where does mcentral live? Although we presented mcentral and mheap as two separate parts above, as global structures they can be defined together, and in fact that is the case: mcentral is embedded inside mheap.

3.4 mheap

Golang's mheap structure is defined as follows.

//go version 1.10.8
//file runtime/mheap.go

// Main malloc heap.
// The heap itself is the "free[]" and "large" arrays,
// but all the other global data is here too.
//
// mheap must not be heap-allocated because it contains mSpanLists,
// which must not be heap-allocated.
//
//go:notinheap
type mheap struct {
    lock      mutex
    free      [_MaxMHeapList]mSpanList // free lists of given length up to _MaxMHeapList
    freelarge mTreap                   // free treap of length >= _MaxMHeapList
    busy      [_MaxMHeapList]mSpanList // busy lists of large spans of given length
    busylarge mSpanList                // busy lists of large spans length >= _MaxMHeapList
    sweepgen  uint32                   // sweep generation, see comment in mspan
    sweepdone uint32                   // all spans are swept
    sweepers  uint32                   // number of active sweepone calls

    // allspans is a slice of all mspans ever created. Each mspan
    // appears exactly once.
    //
    // The memory for allspans is manually managed and can be
    // reallocated and move as the heap grows.
    //
    // In general, allspans is protected by mheap_.lock, which
    // prevents concurrent access as well as freeing the backing
    // store. Accesses during STW might not hold the lock, but
    // must ensure that allocation cannot happen around the
    // access (since that may free the backing store).
    allspans []*mspan // all spans out there

    // spans is a lookup table to map virtual address page IDs to *mspan.
    // For allocated spans, their pages map to the span itself.
    // For free spans, only the lowest and highest pages map to the span itself.
    // Internal pages map to an arbitrary span.
    // For pages that have never been allocated, spans entries are nil.
    //
    // Modifications are protected by mheap.lock. Reads can be
    // performed without locking, but ONLY from indexes that are
    // known to contain in-use or stack spans. This means there
    // must not be a safe-point between establishing that an
    // address is live and looking it up in the spans array.
    //
    // This is backed by a reserved region of the address space so
    // it can grow without moving. The memory up to len(spans) is
    // mapped. cap(spans) indicates the total reserved memory.
    spans []*mspan

    // sweepSpans contains two mspan stacks: one of swept in-use
    // spans, and one of unswept in-use spans. These two trade
    // roles on each GC cycle. Since the sweepgen increases by 2
    // on each cycle, this means the swept spans are in
    // sweepSpans[sweepgen/2%2] and the unswept spans are in
    // sweepSpans[1-sweepgen/2%2]. Sweeping pops spans from the
    // unswept stack and pushes spans that are still in-use on the
    // swept stack. Likewise, allocating an in-use span pushes it
    // on the swept stack.
    sweepSpans [2]gcSweepBuf

    _ uint32 // align uint64 fields on 32-bit for atomics

    // Proportional sweep
    //
    // These parameters represent a linear function from heap_live
    // to page sweep count. The proportional sweep system works to
    // stay in the black by keeping the current page sweep count
    // above this line at the current heap_live.
    //
    // The line has slope sweepPagesPerByte and passes through a
    // basis point at (sweepHeapLiveBasis, pagesSweptBasis). At
    // any given time, the system is at (memstats.heap_live,
    // pagesSwept) in this space.
    //
    // It's important that the line pass through a point we
    // control rather than simply starting at a (0,0) origin
    // because that lets us adjust sweep pacing at any time while
    // accounting for current progress. If we could only adjust
    // the slope, it would create a discontinuity in debt if any
    // progress has already been made.
    pagesInUse         uint64  // pages of spans in stats _MSpanInUse; R/W with mheap.lock
    pagesSwept         uint64  // pages swept this cycle; updated atomically
    pagesSweptBasis    uint64  // pagesSwept to use as the origin of the sweep ratio; updated atomically
    sweepHeapLiveBasis uint64  // value of heap_live to use as the origin of sweep ratio; written with lock, read without
    sweepPagesPerByte  float64 // proportional sweep ratio; written with lock, read without
    // TODO(austin): pagesInUse should be a uintptr, but the 386
    // compiler can't 8-byte align fields.

    // Malloc stats.
    largealloc  uint64                  // bytes allocated for large objects
    nlargealloc uint64                  // number of large object allocations
    largefree   uint64                  // bytes freed for large objects (>maxsmallsize)
    nlargefree  uint64                  // number of frees for large objects (>maxsmallsize)
    nsmallfree  [_NumSizeClasses]uint64 // number of frees for small objects (<=maxsmallsize)

    // range of addresses we might see in the heap
    bitmap        uintptr // Points to one byte past the end of the bitmap
    bitmap_mapped uintptr

    // The arena_* fields indicate the addresses of the Go heap.
    //
    // The maximum range of the Go heap is
    // [arena_start, arena_start+_MaxMem+1).
    //
    // The range of the current Go heap is
    // [arena_start, arena_used). Parts of this range may not be
    // mapped, but the metadata structures are always mapped for
    // the full range.
    arena_start uintptr
    arena_used  uintptr // Set with setArenaUsed.

    // The heap is grown using a linear allocator that allocates
    // from the block [arena_alloc, arena_end). arena_alloc is
    // often, but *not always* equal to arena_used.
    arena_alloc uintptr
    arena_end   uintptr

    // arena_reserved indicates that the memory [arena_alloc,
    // arena_end) is reserved (e.g., mapped PROT_NONE). If this is
    // false, we have to be careful not to clobber existing
    // mappings here. If this is true, then we own the mapping
    // here and *must* clobber it to use it.
    arena_reserved bool

    _ uint32 // ensure 64-bit alignment

    // central free lists for small size classes.
    // the padding makes sure that the MCentrals are
    // spaced CacheLineSize bytes apart, so that each MCentral.lock
    // gets its own cache line.
    // central is indexed by spanClass.
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [sys.CacheLineSize - unsafe.Sizeof(mcentral{})%sys.CacheLineSize]byte
    }

    spanalloc             fixalloc // allocator for span*
    cachealloc            fixalloc // allocator for mcache*
    treapalloc            fixalloc // allocator for treapNodes* used by large objects
    specialfinalizeralloc fixalloc // allocator for specialfinalizer*
    specialprofilealloc   fixalloc // allocator for specialprofile*
    speciallock           mutex    // lock for special record allocators.

    unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

var mheap_ mheap

mheap_ is a global variable initialized at system startup (in mallocinit()). Let's walk through the mheap structure.

  • allspans []*mspan: all spans are allocated through mheap_, and every mspan ever created is recorded in allspans. The lock field in the struct guarantees concurrent safety. The comment also mentions STW, which relates to Golang's GC and is not covered in detail here.
  • central [numSpanClasses]…: the mcentral introduced earlier, one per span class. pad is byte padding to avoid false sharing; see False Sharing - wikipedia for details.
  • sweepgen, sweepdone: GC related. (Golang's GC is Mark & Sweep; these fields track sweeping, which we won't go into here.)
  • free [_MaxMHeapList]mSpanList: an array of span lists; the mspans in list i consist of i pages, for 1 ~ 127 (_MaxMHeapList - 1) pages. For example, free[3] is a list of mspans of 3 pages each. free holds unallocated spans; busy is its allocated counterpart.
  • freelarge: spans of more than 127 pages; note that in this version it is a treap (mTreap) rather than a plain list. The busy counterpart is busylarge.
  • spans []*mspan: records the mapping from arena page numbers and mspans.
  • arena_start, arena_end, arena_used: to explain these, we first need the arena. The arena is the contiguous virtual address region Golang allocates heap memory from; it is managed by mheap, and all heap allocations come out of it. How is availability tracked? Operating systems commonly use one of two schemes: threading all free memory onto a linked list, or marking blocks free/used in a bitmap. Combined with spans above, the memory layout looks like the following.


    Golang memory layout
  • spanalloc, cachealloc (fixalloc): fixalloc is a free-list allocator for fixed-size objects.
  • The rest is statistics and GC-related state, which we set aside for now.
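A minimal free-list allocator in the spirit of fixalloc can be sketched like this. The real fixalloc carves raw, non-GC'd memory with unsafe pointers and a chunk size; this toy version just recycles byte slices to show the free-list idea.

```go
package main

import "fmt"

// fixAlloc hands out fixed-size blocks and recycles freed ones
// via a simple LIFO free list.
type fixAlloc struct {
	size int
	free [][]byte // stack of recycled blocks
}

func (f *fixAlloc) alloc() []byte {
	if n := len(f.free); n > 0 {
		b := f.free[n-1]
		f.free = f.free[:n-1]
		return b // reuse a previously freed block
	}
	return make([]byte, f.size) // free list empty: grab a fresh block
}

func (f *fixAlloc) freeBlock(b []byte) {
	f.free = append(f.free, b) // push back onto the free list
}

func main() {
	fa := &fixAlloc{size: 64}
	a := fa.alloc()
	fa.freeBlock(a)
	b := fa.alloc() // comes straight back off the free list
	fmt.Println(len(b), &a[0] == &b[0])
}
```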

4. Initialization

At system startup, the structures introduced above are initialized. Let's go straight to the initialization code: mallocinit().

//go version 1.10.8
//file runtime/malloc.go

func mallocinit() {
    if class_to_size[_TinySizeClass] != _TinySize {
        throw("bad TinySizeClass")
    }

    testdefersizes()

    // Copy class sizes out for statistics table.
    for i := range class_to_size {
        memstats.by_size[i].size = uint32(class_to_size[i])
    }

    // Check physPageSize.
    if physPageSize == 0 {
        // The OS init code failed to fetch the physical page size.
        throw("failed to get system page size")
    }
    if physPageSize < minPhysPageSize {
        print("system page size (", physPageSize, ") is smaller than minimum page size (", minPhysPageSize, ")\n")
        throw("bad system page size")
    }
    if physPageSize&(physPageSize-1) != 0 {
        print("system page size (", physPageSize, ") must be a power of 2\n")
        throw("bad system page size")
    }

    // The auxiliary regions start at p and are laid out in the
    // following order: spans, bitmap, arena.
    var p, pSize uintptr
    var reserved bool

    // The spans array holds one *mspan per _PageSize of arena.
    var spansSize uintptr = (_MaxMem + 1) / _PageSize * sys.PtrSize
    spansSize = round(spansSize, _PageSize)
    // The bitmap holds 2 bits per word of arena.
    var bitmapSize uintptr = (_MaxMem + 1) / (sys.PtrSize * 8 / 2)
    bitmapSize = round(bitmapSize, _PageSize)

    // Set up the allocation arena, a contiguous area of memory where
    // allocated data will be found.
    if sys.PtrSize == 8 {
        // On a 64-bit machine, allocate from a single contiguous reservation.
        // 512 GB (MaxMem) should be big enough for now.
        //
        // The code will work with the reservation at any address, but ask
        // SysReserve to use 0x0000XXc000000000 if possible (XX=00...7f).
        // Allocating a 512 GB region takes away 39 bits, and the amd64
        // doesn't let us choose the top 17 bits, so that leaves the 9 bits
        // in the middle of 0x00c0 for us to choose. Choosing 0x00c0 means
        // that the valid memory addresses will begin 0x00c0, 0x00c1, ..., 0x00df.
        // In little-endian, that's c0 00, c1 00, ..., df 00. None of those are valid
        // UTF-8 sequences, and they are otherwise as far away from
        // ff (likely a common byte) as possible. If that fails, we try other 0xXXc0
        // addresses. An earlier attempt to use 0x11f8 caused out of memory errors
        // on OS X during thread allocations.  0x00c0 causes conflicts with
        // AddressSanitizer which reserves all memory up to 0x0100.
        // These choices are both for debuggability and to reduce the
        // odds of a conservative garbage collector (as is still used in gccgo)
        // not collecting memory because some non-pointer block of memory
        // had a bit pattern that matched a memory address.
        //
        // Actually we reserve 544 GB (because the bitmap ends up being 32 GB)
        // but it hardly matters: e0 00 is not valid UTF-8 either.
        //
        // If this fails we fall back to the 32 bit memory mechanism
        //
        // However, on arm64, we ignore all this advice above and slam the
        // allocation at 0x40 << 32 because when using 4k pages with 3-level
        // translation buffers, the user address space is limited to 39 bits
        // On darwin/arm64, the address space is even smaller.
        arenaSize := round(_MaxMem, _PageSize)
        pSize = bitmapSize + spansSize + arenaSize + _PageSize
        for i := 0; i <= 0x7f; i++ {
            switch {
            case GOARCH == "arm64" && GOOS == "darwin":
                p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
            case GOARCH == "arm64":
                p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
            default:
                p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
            }
            p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
            if p != 0 {
                break
            }
        }
    }

    if p == 0 {
        // On a 32-bit machine, we can't typically get away
        // with a giant virtual address space reservation.
        // Instead we map the memory information bitmap
        // immediately after the data segment, large enough
        // to handle the entire 4GB address space (256 MB),
        // along with a reservation for an initial arena.
        // When that gets used up, we'll start asking the kernel
        // for any memory anywhere.

        // We want to start the arena low, but if we're linked
        // against C code, it's possible global constructors
        // have called malloc and adjusted the process' brk.
        // Query the brk so we can avoid trying to map the
        // arena over it (which will cause the kernel to put
        // the arena somewhere else, likely at a high
        // address).
        procBrk := sbrk0()

        // If we fail to allocate, try again with a smaller arena.
        // This is necessary on Android L where we share a process
        // with ART, which reserves virtual memory aggressively.
        // In the worst case, fall back to a 0-sized initial arena,
        // in the hope that subsequent reservations will succeed.
        arenaSizes := []uintptr{
            512 << 20,
            256 << 20,
            128 << 20,
            0,
        }

        for _, arenaSize := range arenaSizes {
            // SysReserve treats the address we ask for, end, as a hint,
            // not as an absolute requirement. If we ask for the end
            // of the data segment but the operating system requires
            // a little more space before we can start allocating, it will
            // give out a slightly higher pointer. Except QEMU, which
            // is buggy, as usual: it won't adjust the pointer upward.
            // So adjust it upward a little bit ourselves: 1/4 MB to get
            // away from the running binary image and then round up
            // to a MB boundary.
            p = round(firstmoduledata.end+(1<<18), 1<<20)
            pSize = bitmapSize + spansSize + arenaSize + _PageSize
            if p <= procBrk && procBrk < p+pSize {
                // Move the start above the brk,
                // leaving some room for future brk
                // expansion.
                p = round(procBrk+(1<<20), 1<<20)
            }
            p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
            if p != 0 {
                break
            }
        }
        if p == 0 {
            throw("runtime: cannot reserve arena virtual address space")
        }
    }

    // PageSize can be larger than OS definition of page size,
    // so SysReserve can give us a PageSize-unaligned pointer.
    // To overcome this we ask for PageSize more and round up the pointer.
    p1 := round(p, _PageSize)
    pSize -= p1 - p

    spansStart := p1
    p1 += spansSize
    mheap_.bitmap = p1 + bitmapSize
    p1 += bitmapSize
    if sys.PtrSize == 4 {
        // Set arena_start such that we can accept memory
        // reservations located anywhere in the 4GB virtual space.
        mheap_.arena_start = 0
    } else {
        mheap_.arena_start = p1
    }
    mheap_.arena_end = p + pSize
    mheap_.arena_used = p1
    mheap_.arena_alloc = p1
    mheap_.arena_reserved = reserved

    if mheap_.arena_start&(_PageSize-1) != 0 {
        println("bad pagesize", hex(p), hex(p1), hex(spansSize), hex(bitmapSize), hex(_PageSize), "start", hex(mheap_.arena_start))
        throw("misrounded allocation in mallocinit")
    }

    // Initialize the rest of the allocator.
    mheap_.init(spansStart, spansSize)
    _g_ := getg()
    _g_.m.mcache = allocmcache()
}

4.1 The arena

// The spans array holds one *mspan per _PageSize of arena.
var spansSize uintptr = (_MaxMem + 1) / _PageSize * sys.PtrSize
spansSize = round(spansSize, _PageSize)
// The bitmap holds 2 bits per word of arena.
var bitmapSize uintptr = (_MaxMem + 1) / (sys.PtrSize * 8 / 2)
bitmapSize = round(bitmapSize, _PageSize)
arenaSize := round(_MaxMem, _PageSize)

// _MaxMem is the maximum heap arena size minus 1.
//
// On 32-bit, this is also the maximum heap pointer value,
// since the arena starts at address 0.
_MaxMem = 1<<_MHeapMap_TotalBits - 1        

First, the _MaxMem variable (a related constant inside it is not listed here). Simply put, _MaxMem is the size the system reserves for the arena: 512 GB on 64-bit systems, and 32 GB on 64-bit Windows. round is an alignment helper that rounds up to a multiple of _PageSize; its implementation is neat, shown below.

// round n up to a multiple of a.  a must be a power of 2.
func round(n, a uintptr) uintptr {
    return (n + a - 1) &^ (a - 1)
}

The bitmap marks the state of each word with two bits, which works out to 16 GB of bitmap for a 512 GB arena. spans records the mapping from arena page numbers to mspan pointers: for an arena address x, the page number is page_num = (x - arena_start) / page_size, and spans[page_num] points to the mspan containing x. With a 512 GB arena, the spans area takes 512 GB / 8 KB * 8 = 512 MB. Note that Golang's memory manager uses an 8 KB virtual page.

_PageSize = 1 << _PageShift
_PageMask = _PageSize - 1

So this contiguous stretch of virtual memory is laid out (on 64-bit) as follows:


Golang virtual memory layout
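The arithmetic above can be checked with a few lines, using the constants from the text: a 512 GB arena, 8 KB runtime pages, 8-byte pointers. The arena start address below is hypothetical, chosen only to demonstrate the page-number mapping.

```go
package main

import "fmt"

// round rounds n up to a multiple of a (a must be a power of 2),
// mirroring the runtime helper shown above.
func round(n, a uint64) uint64 { return (n + a - 1) &^ (a - 1) }

func main() {
	const (
		ptrSize  = uint64(8)
		pageSize = uint64(8 << 10)   // 8 KB runtime page
		maxMem   = uint64(512 << 30) // arena size (really _MaxMem+1)
	)
	spansSize := round(maxMem/pageSize*ptrSize, pageSize)
	bitmapSize := round(maxMem/(ptrSize*8/2), pageSize)
	fmt.Printf("spans area: %d MB\n", spansSize>>20)   // 512 MB
	fmt.Printf("bitmap area: %d GB\n", bitmapSize>>30) // 16 GB

	// spans lookup: map an arena address to its page number.
	arenaStart := uint64(0x00c000000000) // hypothetical arena start
	x := arenaStart + 3*pageSize + 100
	fmt.Println("page number:", (x-arenaStart)/pageSize) // 3
}
```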

4.2 Reserving the Virtual Address Space

The key code is the following.

pSize = bitmapSize + spansSize + arenaSize + _PageSize
for i := 0; i <= 0x7f; i++ {
    switch {
    case GOARCH == "arm64" && GOOS == "darwin":
        p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
    case GOARCH == "arm64":
        p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
    default:
        p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
    }
    p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
    if p != 0 {
        break
    }
}   

At initialization, Golang reserves a contiguous address range from the operating system: the spans + bitmap + arena above. p is the start address of this contiguous range, and its value differs per platform. The reservation uses a different system call depending on the OS: Unix-like systems call mmap (which asks the kernel for a new virtual address range, with a start-address hint and a length), while Windows calls VirtualAlloc (similar to mmap).

//sourced from the internet

//bsd
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    if sys.PtrSize == 8 && uint64(n) > 1<<32 || sys.GoosNacl != 0 {
        *reserved = false
        return v
    }

    p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if uintptr(p) < 4096 {
        return nil
    }
    *reserved = true
    return p
}

//darwin
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    *reserved = true
    p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if uintptr(p) < 4096 {
        return nil
    }
    return p
}

//linux
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    ...
    p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if uintptr(p) < 4096 {
        return nil
    }
    *reserved = true
    return p
}
//windows
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    *reserved = true
    // v is just a hint.
    // First try at v.
    v = unsafe.Pointer(stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_RESERVE, _PAGE_READWRITE))
    if v != nil {
        return v
    }

    // Next let the kernel choose the address.
    return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_RESERVE, _PAGE_READWRITE))
}

4.3 mheap Initialization

As we saw in the mheap structure, spans, bitmap, and arena all live inside mheap; once the address space has been reserved from the OS, the next step is initializing mheap.

    // PageSize can be larger than OS definition of page size,
    // so SysReserve can give us a PageSize-unaligned pointer.
    // To overcome this we ask for PageSize more and round up the pointer.
    p1 := round(p, _PageSize)
    pSize -= p1 - p

    spansStart := p1
    p1 += spansSize
    mheap_.bitmap = p1 + bitmapSize
    p1 += bitmapSize
    if sys.PtrSize == 4 {
        // Set arena_start such that we can accept memory
        // reservations located anywhere in the 4GB virtual space.
        mheap_.arena_start = 0
    } else {
        mheap_.arena_start = p1
    }
    mheap_.arena_end = p + pSize
    mheap_.arena_used = p1
    mheap_.arena_alloc = p1
    mheap_.arena_reserved = reserved

    if mheap_.arena_start&(_PageSize-1) != 0 {
        println("bad pagesize", hex(p), hex(p1), hex(spansSize), hex(bitmapSize), hex(_PageSize), "start", hex(mheap_.arena_start))
        throw("misrounded allocation in mallocinit")
    }

    // Initialize the rest of the allocator.
    mheap_.init(spansStart, spansSize)
    _g_ := getg()
    _g_.m.mcache = allocmcache()

p is the start address of the contiguous virtual address range. It is aligned first, and then the arena, bitmap, and spans addresses are set up. mheap_.init() initializes the fixalloc allocators and related members, as well as each mcentral.
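The alignment step relies on the runtime's round helper. A standalone sketch (the page size matches runtime/malloc.go; the pointer value is invented for illustration):

```go
package main

import "fmt"

// round rounds n up to a multiple of a (a power of two), as the runtime's
// round helper does in mallocinit.
func round(n, a uintptr) uintptr { return (n + a - 1) &^ (a - 1) }

func main() {
	const pageSize = 8 << 10 // Go's _PageSize is 8 KB
	p := uintptr(0x1000_0123) // an unaligned pointer, as SysReserve may return
	fmt.Printf("%#x\n", round(p, pageSize)) // 0x10002000
}
```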

// Initialize the heap.
func (h *mheap) init(spansStart, spansBytes uintptr) {
    h.treapalloc.init(unsafe.Sizeof(treapNode{}), nil, nil, &memstats.other_sys)
    h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
    h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
    h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
    h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)

    // Don't zero mspan allocations. Background sweeping can
    // inspect a span concurrently with allocating it, so it's
    // important that the span's sweepgen survive across freeing
    // and re-allocating a span to prevent background sweeping
    // from improperly cas'ing it from 0.
    //
    // This is safe because mspan contains no heap pointers.
    h.spanalloc.zero = false

    // h->mapcache needs no init
    for i := range h.free {
        h.free[i].init()
        h.busy[i].init()
    }

    h.busylarge.init()
    for i := range h.central {
        h.central[i].mcentral.init(spanClass(i))
    }

    sp := (*slice)(unsafe.Pointer(&h.spans))
    sp.array = unsafe.Pointer(spansStart)
    sp.len = 0
    sp.cap = int(spansBytes / sys.PtrSize)

    // Map metadata structures. But don't map race detector memory
    // since we're not actually growing the arena here (and TSAN
    // gets mad if you map 0 bytes).
    h.setArenaUsed(h.arena_used, false)
}

After mheap is initialized, the current thread, i.e. the M, is initialized.

// get the current G
_g_ := getg()
// allocate an mcache for the M bound to this G
_g_.m.mcache = allocmcache()

4.4 per-P mcache initialization

Note that nothing above initializes the per-P mcache, because at this point the Ps have not been created yet. Let's look at the bootstrap code.

func schedinit() {
    ...
    mallocinit()
    ...
    
    if procs > _MaxGomaxprocs {
        procs = _MaxGomaxprocs
    }
    if procresize(procs) != nil {
        ...
    }
}

mallocinit() was covered above. The Ps are initialized in procresize(); below we look only at the memory-related parts.

//go version 1.10.8
//file runtime/proc.go

func procresize(nprocs int32) *p {
    ...
    // initialize new P's
    for i := int32(0); i < nprocs; i++ {
        pp := allp[i]
        if pp == nil {
            pp = new(p)
            pp.id = i
            pp.status = _Pgcstop
            pp.sudogcache = pp.sudogbuf[:0]
            for i := range pp.deferpool {
                pp.deferpool[i] = pp.deferpoolbuf[i][:0]
            }
            atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
        }
        // initialize the P's mcache
        if pp.mcache == nil {
            if old == 0 && i == 0 {
                if getg().m.mcache == nil {
                    throw("missing mcache?")
                }
                // P[0] takes the bootstrap mcache (used by the main goroutine)
                pp.mcache = getg().m.mcache // bootstrap
            } else {
                // Ps other than P[0] allocate a fresh mcache
                pp.mcache = allocmcache()
            }
        }
        ...
    }
    ...
}

All Ps are stored in the global array allp. procresize() initializes the Ps in allp that will actually be used, and strips the resources of any surplus Ps.

5. Memory Allocation

First, the main flow of allocating memory for an object:

  1. object size > 32 KB: allocate directly from mheap.
  2. object size < 16 bytes and pointer-free: allocate with mcache's tiny allocator. (tiny is really just a pointer; loosely speaking.)
  3. 16 bytes <= object size <= 32 KB: allocate from the matching size class in mcache.
  4. If the mcache span for that size class has no free block, request one from mcentral.
  5. If mcentral has no free block either, request a span from mheap and split it.
  6. If mheap has no suitable span, request memory from the operating system.
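The size-based dispatch in these steps can be sketched as a small helper; the thresholds are the runtime's constants, but allocPath itself is a hypothetical illustration, not runtime code:

```go
package main

import "fmt"

// The runtime's thresholds: tiny objects are < 16 bytes, small objects
// are <= 32 KB; anything larger goes straight to mheap.
const (
	maxTinySize  = 16
	maxSmallSize = 32 << 10
)

// allocPath reports which allocator mallocgc would use for an object of
// the given size; noscan means the object contains no pointers.
func allocPath(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "mheap (large object)"
	case noscan && size < maxTinySize:
		return "mcache tiny allocator"
	default:
		return "mcache size-class span"
	}
}

func main() {
	fmt.Println(allocPath(8, true))       // mcache tiny allocator
	fmt.Println(allocPath(1024, false))   // mcache size-class span
	fmt.Println(allocPath(64<<10, false)) // mheap (large object)
}
```

Note that a pointer-containing 8-byte object still goes through a size-class span, since the tiny path requires noscan.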
Let's look at the functions involved in allocating on the heap, that is, in the arena region.
package main

func foo() *int {
    x := 1
    return &x
}

func main() {
    x := foo()
    println(*x)
}

According to the escape analysis introduced earlier, x in foo() escapes to the heap. Save the code above as main.go and look at the assembly.

$ go build -gcflags '-l' -o main main.go
$ go tool objdump -s "main\.foo" main
TEXT main.foo(SB) /Users/xxx/src/heap/main.go
  main.go:3             0x104aad0               65488b0c25a0080000      MOVQ GS:0x8a0, CX                       
  main.go:3             0x104aad9               483b6110                CMPQ 0x10(CX), SP                       
  main.go:3             0x104aadd               7639                    JBE 0x104ab18                           
  main.go:3             0x104aadf               4883ec18                SUBQ $0x18, SP                          
  main.go:3             0x104aae3               48896c2410              MOVQ BP, 0x10(SP)                       
  main.go:3             0x104aae8               488d6c2410              LEAQ 0x10(SP), BP                       
  main.go:4             0x104aaed               488d052c990000          LEAQ type.*+38976(SB), AX               
  main.go:4             0x104aaf4               48890424                MOVQ AX, 0(SP)                          
  main.go:4             0x104aaf8               e82304fcff              CALL runtime.newobject(SB)              
  main.go:4             0x104aafd               488b442408              MOVQ 0x8(SP), AX                        
  main.go:4             0x104ab02               48c70001000000          MOVQ $0x1, 0(AX)                        
  main.go:5             0x104ab09               4889442420              MOVQ AX, 0x20(SP)                       
  main.go:5             0x104ab0e               488b6c2410              MOVQ 0x10(SP), BP                       
  main.go:5             0x104ab13               4883c418                ADDQ $0x18, SP                          
  main.go:5             0x104ab17               c3                      RET                                     
  main.go:3             0x104ab18               e80389ffff              CALL runtime.morestack_noctxt(SB)       
  main.go:3             0x104ab1d               ebb1                    JMP main.foo(SB)                        
  :-1                   0x104ab1f               cc                      INT $0x3   

Heap allocation calls the runtime package's newobject function.

//go version 1.10.8
//file runtime/malloc.go

// implementation of new builtin
// compiler (both frontend and SSA backend) knows the signature
// of this function
func newobject(typ *_type) unsafe.Pointer {
    return mallocgc(typ.size, typ, true)
}

// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    if gcphase == _GCmarktermination {
        throw("mallocgc called with gcphase == _GCmarktermination")
    }

    if size == 0 {
        return unsafe.Pointer(&zerobase)
    }

    if debug.sbrk != 0 {
        align := uintptr(16)
        if typ != nil {
            align = uintptr(typ.align)
        }
        return persistentalloc(size, align, &memstats.other_sys)
    }

    // assistG is the G to charge for this allocation, or nil if
    // GC is not currently active.
    var assistG *g
    if gcBlackenEnabled != 0 {
        // Charge the current user G for this allocation.
        assistG = getg()
        if assistG.m.curg != nil {
            assistG = assistG.m.curg
        }
        // Charge the allocation against the G. We'll account
        // for internal fragmentation at the end of mallocgc.
        assistG.gcAssistBytes -= int64(size)

        if assistG.gcAssistBytes < 0 {
            // This G is in debt. Assist the GC to correct
            // this before allocating. This must happen
            // before disabling preemption.
            gcAssistAlloc(assistG)
        }
    }

    // Set mp.mallocing to keep from being preempted by GC.
    mp := acquirem()
    if mp.mallocing != 0 {
        throw("malloc deadlock")
    }
    if mp.gsignal == getg() {
        throw("malloc during signal")
    }
    mp.mallocing = 1

    shouldhelpgc := false
    dataSize := size
    c := gomcache()
    var x unsafe.Pointer
    noscan := typ == nil || typ.kind&kindNoPointers != 0
    if size <= maxSmallSize {
        // object size <= 32K
        if noscan && size < maxTinySize {
            // tiny allocation path: pointer-free objects smaller than 16 bytes
            // Tiny allocator.
            //
            // Tiny allocator combines several tiny allocation requests
            // into a single memory block. The resulting memory block
            // is freed when all subobjects are unreachable. The subobjects
            // must be noscan (don't have pointers), this ensures that
            // the amount of potentially wasted memory is bounded.
            //
            // Size of the memory block used for combining (maxTinySize) is tunable.
            // Current setting is 16 bytes, which relates to 2x worst case memory
            // wastage (when all but one subobjects are unreachable).
            // 8 bytes would result in no wastage at all, but provides less
            // opportunities for combining.
            // 32 bytes provides more opportunities for combining,
            // but can lead to 4x worst case wastage.
            // The best case winning is 8x regardless of block size.
            //
            // Objects obtained from tiny allocator must not be freed explicitly.
            // So when an object will be freed explicitly, we ensure that
            // its size >= maxTinySize.
            //
            // SetFinalizer has a special case for objects potentially coming
            // from tiny allocator, it such case it allows to set finalizers
            // for an inner byte of a memory block.
            //
            // The main targets of tiny allocator are small strings and
            // standalone escaping variables. On a json benchmark
            // the allocator reduces number of allocations by ~12% and
            // reduces heap size by ~20%.
            off := c.tinyoffset
            // Align tiny pointer for required (conservative) alignment.
            if size&7 == 0 {
                off = round(off, 8)
            } else if size&3 == 0 {
                off = round(off, 4)
            } else if size&1 == 0 {
                off = round(off, 2)
            }
            if off+size <= maxTinySize && c.tiny != 0 {
                // The object fits into existing tiny block.
                x = unsafe.Pointer(c.tiny + off)
                c.tinyoffset = off + size
                c.local_tinyallocs++
                mp.mallocing = 0
                releasem(mp)
                return x
            }
            // Allocate a new maxTinySize block.
            span := c.alloc[tinySpanClass]
            v := nextFreeFast(span)
            if v == 0 {
                v, _, shouldhelpgc = c.nextFree(tinySpanClass)
            }
            x = unsafe.Pointer(v)
            (*[2]uint64)(x)[0] = 0
            (*[2]uint64)(x)[1] = 0
            // See if we need to replace the existing tiny block with the new one
            // based on amount of remaining free space.
            if size < c.tinyoffset || c.tiny == 0 {
                c.tiny = uintptr(x)
                c.tinyoffset = size
            }
            size = maxTinySize
        } else {
            var sizeclass uint8
            if size <= smallSizeMax-8 {
                sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
            } else {
                sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
            }
            size = uintptr(class_to_size[sizeclass])
            spc := makeSpanClass(sizeclass, noscan)
            span := c.alloc[spc]
            v := nextFreeFast(span)
            if v == 0 {
                v, span, shouldhelpgc = c.nextFree(spc)
            }
            x = unsafe.Pointer(v)
            if needzero && span.needzero != 0 {
                memclrNoHeapPointers(unsafe.Pointer(v), size)
            }
        }
    } else {
        // object size > 32 KB
        var s *mspan
        shouldhelpgc = true
        systemstack(func() {
            s = largeAlloc(size, needzero, noscan)
        })
        s.freeindex = 1
        s.allocCount = 1
        x = unsafe.Pointer(s.base())
        size = s.elemsize
    }

    var scanSize uintptr
    if !noscan {
        // If allocating a defer+arg block, now that we've picked a malloc size
        // large enough to hold everything, cut the "asked for" size down to
        // just the defer header, so that the GC bitmap will record the arg block
        // as containing nothing at all (as if it were unused space at the end of
        // a malloc block caused by size rounding).
        // The defer arg areas are scanned as part of scanstack.
        if typ == deferType {
            dataSize = unsafe.Sizeof(_defer{})
        }
        heapBitsSetType(uintptr(x), size, dataSize, typ)
        if dataSize > typ.size {
            // Array allocation. If there are any
            // pointers, GC has to scan to the last
            // element.
            if typ.ptrdata != 0 {
                scanSize = dataSize - typ.size + typ.ptrdata
            }
        } else {
            scanSize = typ.ptrdata
        }
        c.local_scan += scanSize
    }

    // Ensure that the stores above that initialize x to
    // type-safe memory and set the heap bits occur before
    // the caller can make x observable to the garbage
    // collector. Otherwise, on weakly ordered machines,
    // the garbage collector could follow a pointer to x,
    // but see uninitialized memory or stale heap bits.
    publicationBarrier()

    // Allocate black during GC.
    // All slots hold nil so no scanning is needed.
    // This may be racing with GC so do it atomically if there can be
    // a race marking the bit.
    if gcphase != _GCoff {
        gcmarknewobject(uintptr(x), size, scanSize)
    }

    if raceenabled {
        racemalloc(x, size)
    }

    if msanenabled {
        msanmalloc(x, size)
    }

    mp.mallocing = 0
    releasem(mp)

    if debug.allocfreetrace != 0 {
        tracealloc(x, size, typ)
    }

    if rate := MemProfileRate; rate > 0 {
        if size < uintptr(rate) && int32(size) < c.next_sample {
            c.next_sample -= int32(size)
        } else {
            mp := acquirem()
            profilealloc(mp, x, size)
            releasem(mp)
        }
    }

    if assistG != nil {
        // Account for internal fragmentation in the assist
        // debt now that we know it.
        assistG.gcAssistBytes -= int64(size - dataSize)
    }

    if shouldhelpgc {
        if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
            gcStart(gcBackgroundMode, t)
        }
    }

    return x
}

The whole allocation path can be broken into three cases by object size: size < 16 bytes, 16 bytes <= size <= 32 KB, and size > 32 KB.

5.1 size smaller than 16 bytes

For blocks smaller than 16 bytes, mcache has a dedicated region called tiny; tiny is a pointer to the start address of the current 16-byte block.

func mallocgc(...) {
    ...
            off := c.tinyoffset
            // align the offset to the object size
            if size&7 == 0 {
                off = round(off, 8)
            } else if size&3 == 0 {
                off = round(off, 4)
            } else if size&1 == 0 {
                off = round(off, 2)
            }
            // serve from the current tiny block if the object fits
            if off+size <= maxTinySize && c.tiny != 0 {
                // The object fits into existing tiny block.
                x = unsafe.Pointer(c.tiny + off)
                c.tinyoffset = off + size
                c.local_tinyallocs++
                mp.mallocing = 0
                releasem(mp)
                return x
            }
            // tiny is exhausted: allocate a fresh 16-byte block for it
            span := c.alloc[tinySpanClass]
            v := nextFreeFast(span)
            if v == 0 {
                v, _, shouldhelpgc = c.nextFree(tinySpanClass)
            }
            }
            x = unsafe.Pointer(v)
            // zero the new block
            (*[2]uint64)(x)[0] = 0
            (*[2]uint64)(x)[1] = 0
            // See if we need to replace the existing tiny block with the new one
            // based on amount of remaining free space.
            // if the new block has more free space left than the old tiny block, keep it as tiny; tinyoffset records how much has been used
            if size < c.tinyoffset || c.tiny == 0 {
                c.tiny = uintptr(x)
                c.tinyoffset = size
            }
            size = maxTinySize
}

As shown above, tinyoffset records how far into tiny allocation has progressed, and subsequent allocations are addressed relative to it. The offset is first aligned according to the object size (for example, if size is a multiple of 8, tinyoffset is rounded up to a multiple of 8), and then the allocation is made. If the space remaining in tiny is insufficient, a fresh 16-byte block is requested and the object is allocated from it; any leftover space is recorded back into tiny.
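A runnable sketch of just the alignment rule (round reimplements the runtime's power-of-two round-up; tinyAlign is a hypothetical extraction of the switch above):

```go
package main

import "fmt"

// round rounds n up to a multiple of a, where a is a power of two;
// this mirrors the runtime's round helper.
func round(n, a uintptr) uintptr { return (n + a - 1) &^ (a - 1) }

// tinyAlign applies mallocgc's alignment rule for the tiny block: the
// offset is aligned to 8, 4, or 2 depending on the object size.
func tinyAlign(off, size uintptr) uintptr {
	switch {
	case size&7 == 0:
		return round(off, 8)
	case size&3 == 0:
		return round(off, 4)
	case size&1 == 0:
		return round(off, 2)
	}
	return off
}

func main() {
	fmt.Println(tinyAlign(5, 8)) // 8: an 8-byte object aligns the offset up to 8
	fmt.Println(tinyAlign(5, 4)) // 8: a 4-byte object aligns up to the next multiple of 4
	fmt.Println(tinyAlign(5, 3)) // 5: odd sizes need no alignment
}
```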

5.2 size greater than 32 KB

Allocations larger than 32 KB skip mcache and mcentral entirely and are served directly by mheap.

func mallocgc(...) {
    } else {
        var s *mspan
        shouldhelpgc = true
        systemstack(func() {
            s = largeAlloc(size, needzero, noscan)
        })
        s.freeindex = 1
        s.allocCount = 1
        x = unsafe.Pointer(s.base())
        size = s.elemsize
    }
    ...
}

func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
    ...
    npages := size >> _PageShift
    if size&_PageMask != 0 {
        npages++
    }
    ...
    s := mheap_.alloc(npages, makeSpanClass(0, noscan), true, needzero)
    if s == nil {
        throw("out of memory")
    }
    s.limit = s.base() + size
    heapBitsForSpan(s.base()).initSpan(s)
    return s
}

Allocations larger than 32 KB always get whole pages: shift right by _PageShift to get the page count, then AND with _PageMask to check the low bits and add one more page if there is a remainder.
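The page computation can be tried in isolation (constants copied from runtime/malloc.go for the versions discussed; pagesFor is an illustrative extraction, not a runtime function):

```go
package main

import "fmt"

// Go heap pages are 8 KB; _PageShift is 13 in runtime/malloc.go.
const (
	_PageShift = 13
	_PageSize  = 1 << _PageShift
	_PageMask  = _PageSize - 1
)

// pagesFor reproduces largeAlloc's page computation: divide by the page
// size with a right shift, then add one page if any low bits remain.
func pagesFor(size uintptr) uintptr {
	npages := size >> _PageShift
	if size&_PageMask != 0 {
		npages++
	}
	return npages
}

func main() {
	fmt.Println(pagesFor(8192))  // 1
	fmt.Println(pagesFor(8193))  // 2
	fmt.Println(pagesFor(40000)) // 5
}
```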

5.3 size between 16 bytes and 32 KB

For sizes between 16 bytes and 32 KB, the allocator first computes the appropriate sizeclass and then allocates from alloc[sizeclass] in mcache. If mcache.alloc[sizeclass] cannot satisfy the request, mcache refills from mcentral and then allocates. After serving mcache, mcentral checks whether it needs to replenish itself and, if so, requests spans from mheap.

func mallocgc(...) {
        ...
        } else {
            var sizeclass uint8
            // compute the sizeclass
            if size <= smallSizeMax-8 {
                sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
            } else {
                sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
            }
            size = uintptr(class_to_size[sizeclass])
            spc := makeSpanClass(sizeclass, noscan)
            span := c.alloc[spc]
            // allocate one object from the corresponding span
            v := nextFreeFast(span)
            if v == 0 {
                v, span, shouldhelpgc = c.nextFree(spc)
            }
            x = unsafe.Pointer(v)
            if needzero && span.needzero != 0 {
                memclrNoHeapPointers(unsafe.Pointer(v), size)
            }
          }
}

Let's first look at how the sizeclass is computed. Two arrays are predefined: size_to_class8 and size_to_class128. The i-th entry of size_to_class8 holds the sizeclass for sizes in the interval ((i-1)*8, i*8] (smallSizeDiv = 8); size_to_class128 works the same way with a 128-byte step. Sizes up to smallSizeMax - 8 = 1024 - 8 = 1016 bytes use size_to_class8; larger sizes use size_to_class128. The array begins 0, 1, 2, 3, 3, 4, 4, … For example, to allocate 17 bytes (anything under 16 bytes goes through mcache.tiny), sizeclass = size_to_class8[(17+7)/8] = size_to_class8[3] = 3. This space-for-time lookup noticeably speeds up allocation.
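The index computation for size_to_class8 can be checked directly (class8Index is an illustrative helper, not a runtime function; smallSizeDiv matches the runtime's value):

```go
package main

import "fmt"

// smallSizeDiv is 8 in the runtime; class8Index computes the index used
// to look up size_to_class8, i.e. ceil(size/8).
const smallSizeDiv = 8

func class8Index(size uintptr) uintptr {
	return (size + smallSizeDiv - 1) / smallSizeDiv
}

func main() {
	fmt.Println(class8Index(17)) // 3: the worked example, size_to_class8[3]
	fmt.Println(class8Index(24)) // 3: 24 bytes falls in the same (16, 24] bucket
	fmt.Println(class8Index(25)) // 4
}
```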
Once the sizeclass is known, allocation proceeds from mcache.alloc[sizeclass]; note that this is an *mspan. The actual allocation is done by nextFreeFast(), shown below.

// nextFreeFast returns the next free object if one is quickly available.
// Otherwise it returns 0.
func nextFreeFast(s *mspan) gclinkptr {
    theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
    if theBit < 64 {
        result := s.freeindex + uintptr(theBit)
        if result < s.nelems {
            freeidx := result + 1
            if freeidx%64 == 0 && freeidx != s.nelems {
                return 0
            }
            s.allocCache >>= uint(theBit + 1)
            s.freeindex = freeidx
            s.allocCount++
            return gclinkptr(result*s.elemsize + s.base())
        }
    }
    return 0
}

allocCache is a bitmap recording which slots are available (1 means free); the object address is then computed from the span's freeindex and elemsize.
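A toy model of this bitmap scan, assuming made-up span values and using math/bits in place of sys.Ctz64:

```go
package main

import (
	"fmt"
	"math/bits"
)

// A toy mirror of nextFreeFast's bookkeeping. allocCache is a bitmap of
// slots starting at freeindex (bit value 1 means free); TrailingZeros64
// plays the role of sys.Ctz64. All field values below are invented.
type span struct {
	allocCache uint64
	freeindex  uint64
	nelems     uint64
	elemsize   uint64
	base       uint64
}

func (s *span) nextFree() (addr uint64, ok bool) {
	theBit := uint64(bits.TrailingZeros64(s.allocCache))
	if theBit == 64 {
		return 0, false // cache exhausted
	}
	result := s.freeindex + theBit
	if result >= s.nelems {
		return 0, false
	}
	s.allocCache >>= theBit + 1 // consume bits up to and including this slot
	s.freeindex = result + 1
	return s.base + result*s.elemsize, true
}

func main() {
	s := &span{allocCache: 0b1101, nelems: 64, elemsize: 32, base: 0x1000}
	a, _ := s.nextFree()
	fmt.Printf("%#x\n", a) // 0x1000: slot 0 was free
	a, _ = s.nextFree()
	fmt.Printf("%#x\n", a) // 0x1040: slot 1 was taken, slot 2 is next
}
```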
If mcache.alloc[sizeclass] is exhausted, mcache requests memory from mcentral.

// nextFree returns the next free object from the cached span if one is available.
// Otherwise it refills the cache with a span with an available object and
// returns that object along with a flag indicating that this was a heavy
// weight allocation. If it is a heavy weight allocation the caller must
// determine whether a new GC cycle needs to be started or if the GC is active
// whether this goroutine needs to assist the GC.
func (c *mcache) nextFree(sizeclass uint8) (v gclinkptr, s *mspan, shouldhelpgc bool) {
    s = c.alloc[sizeclass]
    shouldhelpgc = false
    freeIndex := s.nextFreeIndex()
    if freeIndex == s.nelems {
        // The span is full.
        if uintptr(s.allocCount) != s.nelems {
            println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
            throw("s.allocCount != s.nelems && freeIndex == s.nelems")
        }
        systemstack(func() {
            // here mcache refills from mcentral
            c.refill(int32(sizeclass))
        })
        shouldhelpgc = true
        s = c.alloc[sizeclass]
        // after refilling from mcentral, retry the allocation from mcache
        freeIndex = s.nextFreeIndex()
    }

    ...
}

// nextFreeIndex returns the index of the next free object in s at
// or after s.freeindex.
// There are hardware instructions that can be used to make this
// faster if profiling warrants it.
// this function overlaps somewhat with nextFreeFast
func (s *mspan) nextFreeIndex() uintptr {
    ...
}

mcache refills from mcentral; if mcentral runs short, it in turn requests spans from mheap.

//go version 1.10.8

func (c *mcache) refill(sizeclass int32) *mspan {
    ...
    // request a span from mcentral
    s = mheap_.central[sizeclass].mcentral.cacheSpan()
    ...
    return s
}

// Allocate a span to use in an MCache.
func (c *mcentral) cacheSpan() *mspan {
    ...
    // Replenish central list if empty.
    s = c.grow()
}

func (c *mcentral) grow() *mspan {
    npages := uintptr(class_to_allocnpages[c.sizeclass])
    size := uintptr(class_to_size[c.sizeclass])
    n := (npages << _PageShift) / size
    
    // request pages from mheap here
    s := mheap_.alloc(npages, c.sizeclass, false, true)
    ...
    return s
}

If mheap itself runs short, it asks the operating system. Continuing from mheap_.alloc() above:

func (h *mheap) alloc(npage uintptr, sizeclass int32, large bool, needzero bool) *mspan {
    ...
    var s *mspan
    systemstack(func() {
        s = h.alloc_m(npage, sizeclass, large)
    })
    ...
}

func (h *mheap) alloc_m(npage uintptr, sizeclass int32, large bool) *mspan {
    ... 
    s := h.allocSpanLocked(npage)
    ...
}

func (h *mheap) allocSpanLocked(npage uintptr) *mspan {
    ...
    s = h.allocLarge(npage)
    if s == nil {
        if !h.grow(npage) {
            return nil
        }
        s = h.allocLarge(npage)
        if s == nil {
            return nil
        }
    }
    ...
}

func (h *mheap) grow(npage uintptr) bool {
    // Ask for a big chunk, to reduce the number of mappings
    // the operating system needs to track; also amortizes
    // the overhead of an operating system mapping.
    // Allocate a multiple of 64kB.
    npage = round(npage, (64<<10)/_PageSize)
    ask := npage << _PageShift
    if ask < _HeapAllocChunk {
        ask = _HeapAllocChunk
    }

    v := h.sysAlloc(ask)
    ...
}

The full call chain is shown above; at the bottom, sysAlloc issues the system call (mmap or VirtualAlloc, much like the initialization path) to request memory from the operating system.
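The refill cascade can be modeled with a toy three-tier chain; all names, stock counts, and refill sizes here are invented for illustration and have nothing to do with real runtime quantities:

```go
package main

import "fmt"

// A toy chain mirroring mcache -> mcentral -> mheap -> OS: each tier
// serves from local stock and refills from the tier above when empty.
type tier struct {
	name   string
	stock  int
	refill int
	parent *tier
}

func (t *tier) get() string {
	if t.stock == 0 {
		if t.parent == nil {
			// bottom of the chain: ask the operating system
			return t.name + " -> OS"
		}
		from := t.parent.get() // refill from the tier above
		t.stock = t.refill
		t.stock--
		return t.name + " <- " + from
	}
	t.stock--
	return t.name
}

func main() {
	heap := &tier{name: "mheap"}
	central := &tier{name: "mcentral", refill: 4, parent: heap}
	cache := &tier{name: "mcache", stock: 1, refill: 2, parent: central}

	fmt.Println(cache.get()) // mcache
	fmt.Println(cache.get()) // mcache <- mcentral <- mheap -> OS
}
```

After the cascading refill, the next allocation is again served locally from mcache, which is exactly why the fast path stays cheap.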

6. Memory Reclamation

This section only covers some of the simpler reclamation paths.

6.1 mcache reclamation

Reclaiming an mcache has two parts: first, the unused memory in alloc is returned to the corresponding mcentral; then the mcache structure itself is freed back to mheap's cachealloc.

func freemcache(c *mcache) {
    systemstack(func() {
        c.releaseAll()
        ...

        lock(&mheap_.lock)
        purgecachedstats(c)
        mheap_.cachealloc.free(unsafe.Pointer(c))
        unlock(&mheap_.lock)
    })
}

func (c *mcache) releaseAll() {
    for i := 0; i < _NumSizeClasses; i++ {
        s := c.alloc[i]
        if s != &emptymspan {
            mheap_.central[i].mcentral.uncacheSpan(s)
            c.alloc[i] = &emptymspan
        }
    }
    // Clear tinyalloc pool.
    c.tiny = 0
    c.tinyoffset = 0
}

6.2 mcentral reclamation

When an mspan has no allocated objects left (allocCount == 0), it is returned to mheap.

func (c *mcentral) freeSpan(s *mspan, preserve bool, wasempty bool) bool {
    ...
    lock(&c.lock)
    ...
    if s.allocCount != 0 {
        unlock(&c.lock)
        return false
    }

    c.nonempty.remove(s)
    unlock(&c.lock)
    mheap_.freeSpan(s, 0)
    return true
}

6.3 mheap

mheap does not return memory to the operating system on any schedule, but it does maintain its spans, for example by coalescing adjacent free spans.
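A minimal sketch of such coalescing, assuming spans are just (start page, page count) pairs; this is an illustration of the idea, not the runtime's actual free-span bookkeeping:

```go
package main

import (
	"fmt"
	"sort"
)

// span is a simplified free span: a run of npages pages starting at page
// index start.
type span struct{ start, npages int }

// coalesce sorts free spans by start page and merges each span into its
// predecessor when the two are adjacent, the way mheap merges neighbors
// when a span is freed.
func coalesce(spans []span) []span {
	sort.Slice(spans, func(i, j int) bool { return spans[i].start < spans[j].start })
	var out []span
	for _, s := range spans {
		if n := len(out); n > 0 && out[n-1].start+out[n-1].npages == s.start {
			out[n-1].npages += s.npages // adjacent neighbor: merge
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	fmt.Println(coalesce([]span{{0, 2}, {2, 3}, {8, 1}})) // [{0 5} {8 1}]
}
```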

7. Summary

My study of Go's memory management is still ongoing; most of the above collates information available online (the references are listed). Some points I have not yet fully understood, so corrections and discussion are welcome.
