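Common Go Performance Optimizations

[]byte and string

Conversion

  • Avoid converting between []byte and string where possible. A Go string is immutable, and in the standard implementation a conversion in either direction copies the data.
  • In most scenarios where a conversion is unavoidable, the forced (zero-copy) conversion can be preferred.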
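// Forced (zero-copy) conversion
import "unsafe"

// stringToBytes reinterprets the string header (data, len) as a slice
// header (data, len, cap) without copying the underlying bytes.
func stringToBytes(s string) []byte {
   x := (*[2]uintptr)(unsafe.Pointer(&s))
   b := [3]uintptr{x[0], x[1], x[1]}
   return *(*[]byte)(unsafe.Pointer(&b))
}

// bytesToString reinterprets the slice header as a string header;
// the trailing cap word is simply ignored.
func bytesToString(b []byte) string {
   return *(*string)(unsafe.Pointer(&b))
}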

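Use the forced conversion only in read-only scenarios: the result shares memory with the original value, so writing to it breaks the immutability of the string and can crash the program.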
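As an alternative sketch of the same zero-copy idea, Go 1.20 and later provide unsafe.String, unsafe.StringData, unsafe.Slice and unsafe.SliceData, which avoid handling the headers as raw uintptr values. The function names below mirror the listing above. The Go 1.20 requirement and the empty-input handling are assumptions of this sketch, not part of the original article.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s (no copy).
// The result must be treated as read-only.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString returns a string that shares memory with b (no copy).
// b must not be modified while the string is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := stringToBytes("hello")
	fmt.Println(len(b), cap(b), bytesToString(b))
}
```

The read-only restriction applies here exactly as it does to the header-casting version.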

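Memory Allocation

Estimate capacity in advance

  • Initialize slices and maps with a good estimate of their size. This effectively reduces the number of allocations, and the improvement is significant.
  • Avoid append where possible: it copies values, and once the capacity is exceeded it must allocate new memory and move the existing elements, which may cause the slice to escape to the heap (tested on a Mac: when the slice length after append exceeds 8, it is allocated on the heap).
  • If the size cannot be estimated, in typical scenarios consider allocating a sufficiently large buffer, and prefer reusing the slice where the scenario allows.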
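// useCap1 reserves the capacity up front, then appends.
func useCap1() {
   arr := make([]int, 0, 2048)
   for i := 0; i < 2048; i++ {
      arr = append(arr, i)
   }
}

// useCap2 sets the length up front and assigns by index.
func useCap2() {
   arr := make([]int, 2048)
   for i := 0; i < 2048; i++ {
      arr[i] = i
   }
}

// noCap starts from a nil slice, so append has to grow it repeatedly.
func noCap() {
   var arr []int
   for i := 0; i < 2048; i++ {
      arr = append(arr, i)
   }
}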

Benchmark

goos: darwin
goarch: amd64
BenchmarkUseCap1-12       966577              1212 ns/op               0 B/op          0 allocs/op
BenchmarkUseCap2-12      2398420               499 ns/op               0 B/op          0 allocs/op
BenchmarkNoCap-12         192712              6016 ns/op           58616 B/op         14 allocs/op
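The advice about reusing a slice can be sketched as follows: truncating with dst[:0] keeps the backing array, so later rounds append into memory that is already allocated. The helper name squares and the sizes used here are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// squares fills dst with the squares of 0..n-1. It truncates dst to
// length 0 first, so the existing backing array is reused whenever
// its capacity is large enough.
func squares(dst []int, n int) []int {
	dst = dst[:0]
	for i := 0; i < n; i++ {
		dst = append(dst, i*i)
	}
	return dst
}

func main() {
	buf := make([]int, 0, 1024) // allocated once, reused every round
	for round := 0; round < 3; round++ {
		buf = squares(buf, 1000)
		fmt.Println(len(buf), cap(buf))
	}
}
```

The capacity stays at 1024 across rounds because no append ever exceeds it. This pattern only works when the previous contents are no longer needed.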

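The main slice-growth code is shown below. In the usual case the capacity doubles on each growth while the slice holds fewer than 1024 elements, and grows by 25% per step after that; the runtime then rounds the result up to an allocation size class. This matches the benchmark, where noCap() was allocated on the heap and grew 14 times. Note that this is the logic of older runtimes: since Go 1.18 the threshold is 256 and the growth factor decreases more smoothly.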

newcap := old.cap
doublecap := newcap + newcap
if cap > doublecap {
   newcap = cap
} else {
   if old.len < 1024 {
      newcap = doublecap
   } else {
      // Check 0 < newcap to detect overflow
      // and prevent an infinite loop.
      for 0 < newcap && newcap < cap {
         newcap += newcap / 4
      }
      // Set newcap to the requested cap when
      // the newcap calculation overflowed.
      if newcap <= 0 {
         newcap = cap
      }
   }
}
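To see how many times a slice actually grows on a given Go version, one option is to watch cap() change while appending. The sketch below does this for the 2048-element case from noCap(). The helper countGrowths is hypothetical, and the numbers it prints depend on the Go version and platform, so no fixed output is claimed here.

```go
package main

import "fmt"

// countGrowths appends n ints to a nil slice and reports how many
// times the capacity changed, together with the final capacity.
func countGrowths(n int) (growths, finalCap int) {
	var arr []int
	prev := cap(arr)
	for i := 0; i < n; i++ {
		arr = append(arr, i)
		if c := cap(arr); c != prev {
			growths++
			prev = c
		}
	}
	return growths, cap(arr)
}

func main() {
	g, c := countGrowths(2048)
	fmt.Printf("growths=%d final cap=%d\n", g, c)
}
```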

Prefer stack allocation

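import "testing"

// BenchmarkHeap stores pointers: taking &s forces every s to escape to the heap.
func BenchmarkHeap(b *testing.B) {
   m := make([]*string, 1000)
   for i := 0; i < b.N; i++ {
      for j := 0; j < 1000; j++ {
         s := "test"
         m[j] = &s
      }
   }
}

// BenchmarkStack stores values: s never escapes, so the loop allocates nothing.
func BenchmarkStack(b *testing.B) {
   m := make([]string, 1000)
   for i := 0; i < b.N; i++ {
      for j := 0; j < 1000; j++ {
         s := "test"
         m[j] = s
      }
   }
}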

Benchmark

goos: darwin
goarch: amd64
BenchmarkHeap-12           44640         23033 ns/op       16000 B/op       1000 allocs/op
BenchmarkStack-12        4650966           252 ns/op           0 B/op          0 allocs/op
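The same difference can be checked outside a benchmark with testing.AllocsPerRun, which reports the average number of heap allocations per call. The sketch below is a minimal illustration. The names fillPointers, fillValues and measure are hypothetical, and the package-level slices are an assumption made to keep the example short. Running go build -gcflags=-m additionally prints the compiler's escape-analysis decisions.

```go
package main

import (
	"fmt"
	"testing"
)

var (
	ptrs = make([]*string, 1000)
	vals = make([]string, 1000)
)

// fillPointers stores the address of s, so each s escapes to the heap.
func fillPointers() {
	for i := range ptrs {
		s := "test"
		ptrs[i] = &s
	}
}

// fillValues copies the string header by value; s does not escape.
func fillValues() {
	for i := range vals {
		s := "test"
		vals[i] = s
	}
}

// measure returns the average allocations per call of each variant.
func measure() (ptrAllocs, valAllocs float64) {
	return testing.AllocsPerRun(100, fillPointers),
		testing.AllocsPerRun(100, fillValues)
}

func main() {
	p, v := measure()
	fmt.Printf("pointers: %.0f allocs/run, values: %.0f allocs/run\n", p, v)
}
```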

Map/Slice

Avoid pointers for simple structures stored in a map

map[int]*int

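import (
    "fmt"
    "runtime"
    "time"
)

// gcTime forces a full garbage collection and returns how long it took.
func gcTime() time.Duration {
    start := time.Now()
    runtime.GC()
    return time.Since(start)
}

// f1 fills a map whose keys and values contain no pointers.
func f1() {
    s := make(map[int]int, 5e7)
    for i := 0; i < 5e7; i++ {
        s[i] = i
    }
    fmt.Printf("With %T, GC took %s\n", s, gcTime())
    _ = s[0] // keep s reachable while the GC is measured
}

// f2 fills a map whose values are pointers, which the GC has to scan.
func f2() {
    s := make(map[int]*int, 5e7)
    for i := 0; i < 5e7; i++ {
        // Before Go 1.22 every entry points to the same loop variable;
        // the map still holds 5e7 pointers either way.
        s[i] = &i
    }
    fmt.Printf("With %T, GC took %s\n", s, gcTime())
    _ = s[0] // keep s reachable while the GC is measured
}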

Output:

With map[int]int, GC took 31.956029ms
With map[int]*int, GC took 184.174966ms

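A map whose keys and values contain no pointers does not need scanObject during GC.
However, according to the map implementation (search for the keyword bmap), scanObject is needed again once an element value is larger than 128 bytes: in the bucket-based runtime map, values above that size are no longer stored inline but are kept in the bucket as pointers.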

type BigStruct struct {
   C01 int
   C02 int
   //...
   C16 int // 128 bytes: the GC scan threshold
   C17 int // 136 bytes
}

// N (the element count) is not defined in the original listing.
func f3() {
   s := make(map[int]BigStruct, N)
   for i := 0; i < N; i++ {
      s[i] = BigStruct{}
   }
   fmt.Printf("With %T, GC took %s\n", s, gcTime())
   _ = s[0]
}

Output:

With map[int]main.BigStruct, GC took 1.628134832s
With map[int]main.NoBigStruct, GC took 44.708865ms

BigStruct has just one more field (C17) than NoBigStruct, yet the GC time increases dramatically.
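Whether a value type crosses the 128-byte boundary can be checked with unsafe.Sizeof. The sketch below uses stand-in types built from int64 arrays so that the sizes do not depend on the platform. These types, the constant maxInlineElemSize and the helper fitsInline are illustrative assumptions, not the article's actual definitions or runtime identifiers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Stand-ins for the article's structs: 16 and 17 eight-byte fields.
type NoBigStruct struct {
	C [16]int64 // 16 * 8 = 128 bytes
}

type BigStruct struct {
	C [17]int64 // 17 * 8 = 136 bytes
}

// maxInlineElemSize is the 128-byte threshold described in the text.
const maxInlineElemSize = 128

const (
	noBigSize = unsafe.Sizeof(NoBigStruct{})
	bigSize   = unsafe.Sizeof(BigStruct{})
)

// fitsInline reports whether a value of the given size stays at or
// below the threshold.
func fitsInline(size uintptr) bool {
	return size <= maxInlineElemSize
}

func main() {
	fmt.Println(noBigSize, fitsInline(noBigSize)) // 128 true
	fmt.Println(bigSize, fitsInline(bigSize))     // 136 false
}
```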

Comparing []*int, []int and []BigStruct

func f4() {
    s := make([]*int, N)
    for i := 0; i < N; i++ {
        s[i] = &i
    }
    fmt.Printf("With %T, GC took %s\n", s, gcTime())
    _ = s[0]
}

func f5() {
    s := make([]int, N)
    for i := 0; i < N; i++ {
        s[i] = i
    }
    fmt.Printf("With %T, GC took %s\n", s, gcTime())
    _ = s[0]
}

func f6() {
    s := make([]BigStruct, N)
    for i := 0; i < N; i++ {
        s[i] = BigStruct{}
    }
    fmt.Printf("With %T, GC took %s\n", s, gcTime())
    _ = s[0]
}

Output:

With []*int, GC took 137.308395ms
With []int, GC took 211.862μs
With []main.BigStruct, GC took 173.504μs

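A slice that contains pointers needs scanObject in the same way, but a slice without pointers is not affected by element size, and its GC is far more efficient than that of a map.
The optimizations above are subject to many restrictions. For example, map[int]string actually contains pointers (see the definition of string) and therefore cannot benefit from the efficient GC path, which makes the technique look impractical. There is, however, a widely used optimization built on it: convert a large map into a map[int]int (used as an index) plus a slice, which shifts the GC pressure onto the slice (cheaper to collect than a map). bigcache is a typical example.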
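The index-plus-slice idea can be sketched roughly as follows. This is a simplified illustration and not bigcache's actual implementation. The store type, its 4-byte length prefix and the append-only layout are all assumptions made for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// store is a hypothetical pointer-free replacement for map[int]string.
type store struct {
	index map[int]int // key -> offset into data; contains no pointers
	data  []byte      // entries: 4-byte little-endian length, then payload
}

func newStore() *store {
	return &store{index: make(map[int]int)}
}

// Set appends the value to data and records its offset in the index.
// Overwriting a key leaves the old bytes behind (no compaction here).
func (s *store) Set(key int, value string) {
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(value)))
	s.index[key] = len(s.data)
	s.data = append(s.data, hdr[:]...)
	s.data = append(s.data, value...)
}

// Get copies the stored bytes back into a new string.
func (s *store) Get(key int) (string, bool) {
	off, ok := s.index[key]
	if !ok {
		return "", false
	}
	n := int(binary.LittleEndian.Uint32(s.data[off : off+4]))
	return string(s.data[off+4 : off+4+n]), true
}

func main() {
	s := newStore()
	s.Set(1, "hello")
	s.Set(2, "world")
	v, ok := s.Get(2)
	fmt.Println(v, ok) // world true
}
```

The trade-off is that every Get copies the payload and deleted or overwritten entries waste space until the buffer is rebuilt, which is why real caches add eviction and compaction on top.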
