The Kubernetes garbage collector runs inside kube-controller-manager. It is responsible for reclaiming resource objects in Kubernetes: it watches resource object events, updates the dependency relationships between objects, and decides, based on an object's deletion policy, whether to delete its related objects.
More specifically, when an owner is deleted with a cascading deletion policy, the owner's dependent objects are deleted along with it.
As for the dependency relationships, the garbage collector watches resource object events and uses the ownerReferences value of each object to build the relationships between objects, that is, the relationships between owners and dependents.
Example
Take creating a Deployment as an example. After the Deployment is created, kube-controller-manager creates a ReplicaSet for it and automatically records the Deployment in the ReplicaSet's ownerReferences. In the example below, the owner of ReplicaSet test-1-59d7f45ffb is Deployment test-1, and the dependent of Deployment test-1 is ReplicaSet test-1-59d7f45ffb.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...
Similarly, after a ReplicaSet is created, kube-controller-manager creates Pods for it and records the ReplicaSet in each Pod's ownerReferences: the ReplicaSet is the owner of the Pods, and the Pods are the dependents of the ReplicaSet.
An object's ownerReferences value specifies the relationship between owner and dependent.
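Overview
The GarbageCollector controller source code breaks down into the following parts:
- monitors act as producers and put changed resources into the graphChanges queue; meanwhile the resource types in the cluster are checked periodically (refreshing the restMapper) and the monitors are refreshed;
- runProcessGraphChanges takes changed items from the graphChanges queue and, depending on the situation, puts them into the attemptToDelete queue or the attemptToOrphan queue;
- runAttemptToDeleteWorker takes items from the attemptToDelete queue and tries to delete garbage resources;
- runAttemptToOrphanWorker takes items from the attemptToOrphan queue and handles the orphaned resources.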

Architecture

The most important code in the garbage collector lives in garbagecollector.go and graph_builder.go.
The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan):
1 graph
(1) uidToNode: the object dependency graph. It is maintained by GraphBuilder and holds the dependency relationships between all objects. Each k8s object corresponds to one node in the graph, and each node keeps a list of owners and a list of dependents.
Example: given a Deployment A, a ReplicaSet B (owned by Deployment A), and a Pod C (owned by ReplicaSet B), the dependency relationships are as follows:
There are 3 nodes: A, B, and C.
A corresponds to a node with no owner and with B in its dependent list;
B corresponds to a node with A in its owner list and C in its dependent list;
C corresponds to a node with B in its owner list and no dependents.
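The A/B/C example can be sketched in a few lines of Go. This is a minimal illustration, not the collector's real data structure: the `node` and `graph` types, the `insert` method, and the `dependentUIDs` helper are simplified, hypothetical stand-ins for the structures in graph.go.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a simplified stand-in for a graph node: it records the UIDs of
// its owners and pointers to its dependents.
type node struct {
	uid        string
	owners     []string
	dependents map[*node]struct{}
}

// graph maps an object UID to its node, in the spirit of uidToNode.
type graph map[string]*node

// insert adds a node and registers it as a dependent of every owner that
// is already present in the graph.
func (g graph) insert(uid string, owners ...string) {
	n := &node{uid: uid, owners: owners, dependents: map[*node]struct{}{}}
	g[uid] = n
	for _, ownerUID := range owners {
		if owner, ok := g[ownerUID]; ok {
			owner.dependents[n] = struct{}{}
		}
	}
}

// dependentUIDs returns the UIDs of n's dependents in sorted order.
func dependentUIDs(n *node) []string {
	uids := make([]string, 0, len(n.dependents))
	for dep := range n.dependents {
		uids = append(uids, dep.uid)
	}
	sort.Strings(uids)
	return uids
}

func main() {
	g := graph{}
	g.insert("A")      // Deployment A: no owner
	g.insert("B", "A") // ReplicaSet B: owned by A
	g.insert("C", "B") // Pod C: owned by B
	for _, uid := range []string{"A", "B", "C"} {
		fmt.Printf("%s: owners=%v dependents=%v\n", uid, g[uid].owners, dependentUIDs(g[uid]))
	}
}
```

Storing dependents as a set keyed by node pointer mirrors the `dependents map[*node]struct{}` field shown later in the node struct, while owners stay a plain list copied from the object.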

2 processors
(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and, based on each object's ownerReferences value, builds, updates, and deletes entries in the dependency graph, that is, the graph of owner and dependent relationships. It then acts as a producer and puts events into the attemptToDelete or attemptToOrphan queue, triggering GarbageCollector to check whether related objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it reclaims objects.
(2) GarbageCollector: reclaims and deletes objects. GarbageCollector acts as a consumer, taking events from the attemptToDelete and attemptToOrphan queues. If an object has been deleted and its deletion policy is cascading, the related objects are reclaimed, that is, deleting an owner also deletes its dependents.
3 event queues
- graphChanges: events obtained by list/watch against the apiserver; produced by informers, consumed by GraphBuilder;
- attemptToDelete: the cascading-deletion event queue; produced by GraphBuilder, consumed by GarbageCollector;
- attemptToOrphan: the orphan-deletion event queue; produced by GraphBuilder, consumed by GarbageCollector.
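The producer/consumer relationship between the three queues can be sketched with plain Go channels. This is only an illustration under simplifying assumptions: the real collector uses rate-limited workqueues and forwards only some events, and the `event` type, its `orphan` field, and the `route` and `run` functions here are hypothetical.

```go
package main

import "fmt"

// event is a simplified graphChanges item.
type event struct {
	uid    string
	orphan bool // true when the object is being deleted with the Orphan policy
}

// route plays the role of GraphBuilder: it drains graphChanges and forwards
// each event to one of the two output queues, then closes them.
func route(graphChanges <-chan event, attemptToDelete, attemptToOrphan chan<- string) {
	for ev := range graphChanges {
		if ev.orphan {
			attemptToOrphan <- ev.uid
		} else {
			attemptToDelete <- ev.uid
		}
	}
	close(attemptToDelete)
	close(attemptToOrphan)
}

// run feeds events through the pipeline and returns what a consumer
// (GarbageCollector in the text) would read from each queue.
func run(events []event) (deleted, orphaned []string) {
	// Buffers are sized so that route can run synchronously without blocking.
	graphChanges := make(chan event, len(events))
	attemptToDelete := make(chan string, len(events))
	attemptToOrphan := make(chan string, len(events))
	for _, ev := range events {
		graphChanges <- ev
	}
	close(graphChanges)
	route(graphChanges, attemptToDelete, attemptToOrphan)
	for uid := range attemptToDelete {
		deleted = append(deleted, uid)
	}
	for uid := range attemptToOrphan {
		orphaned = append(orphaned, uid)
	}
	return deleted, orphaned
}

func main() {
	deleted, orphaned := run([]event{{"rs-1", false}, {"deploy-1", true}, {"pod-1", false}})
	fmt.Println("attemptToDelete:", deleted)
	fmt.Println("attemptToOrphan:", orphaned)
}
```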
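Parameters
To enable GC, --enable-garbage-collector must be set to true in the startup flags of both kube-apiserver and kube-controller-manager. In version 1.13.2, GC is enabled by default.
The code for the garbage-collector-related startup flags of the kcm (kube-controller-manager) component is as follows: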
// cmd/kube-controller-manager/app/options/garbagecollectorcontroller.go
// AddFlags adds flags related to GarbageCollectorController for controller manager to the specified FlagSet.
func (o *GarbageCollectorControllerOptions) AddFlags(fs *pflag.FlagSet) {
if o == nil {
return
}
fs.Int32Var(&o.ConcurrentGCSyncs, "concurrent-gc-syncs", o.ConcurrentGCSyncs, "The number of garbage collector workers that are allowed to sync concurrently.")
fs.BoolVar(&o.EnableGarbageCollector, "enable-garbage-collector", o.EnableGarbageCollector, "Enables the generic garbage collector. MUST be synced with the corresponding flag of the kube-apiserver.")
}
As the code shows, two kcm startup flags relate to the garbage collector:
(1) enable-garbage-collector: whether to enable the garbage collector; defaults to true;
(2) concurrent-gc-syncs: the number of garbage collector workers allowed to sync concurrently; defaults to 20.
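To see how these two flags and their defaults behave, here is a small sketch that uses Go's standard `flag` package instead of the pflag library used by kube-controller-manager. The `gcOptions` type and `parseGCOptions` function are hypothetical; only the flag names and default values come from the text above.

```go
package main

import (
	"flag"
	"fmt"
)

// gcOptions mirrors the two garbage collector settings described above.
type gcOptions struct {
	ConcurrentGCSyncs      int
	EnableGarbageCollector bool
}

// parseGCOptions parses the two flags from args, starting from the
// defaults stated in the text (20 workers, collector enabled).
func parseGCOptions(args []string) (gcOptions, error) {
	o := gcOptions{ConcurrentGCSyncs: 20, EnableGarbageCollector: true}
	fs := flag.NewFlagSet("gc", flag.ContinueOnError)
	fs.IntVar(&o.ConcurrentGCSyncs, "concurrent-gc-syncs", o.ConcurrentGCSyncs,
		"number of garbage collector workers allowed to sync concurrently")
	fs.BoolVar(&o.EnableGarbageCollector, "enable-garbage-collector", o.EnableGarbageCollector,
		"enables the generic garbage collector")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseGCOptions([]string{"--concurrent-gc-syncs=5"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("workers=%d enabled=%v\n", o.ConcurrentGCSyncs, o.EnableGarbageCollector)
}
```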
The source code analysis of the garbage collector is split into two parts:
- startup analysis;
- core processing logic analysis.
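Startup
The kube-controller-manager entry point, app.NewControllerManagerCommand(), loads the controller manager's default startup parameters and creates a *cobra.Command object. The garbage collector itself is started by startGarbageCollectorController: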
// cmd/kube-controller-manager/app/core.go
func startGarbageCollectorController(ctx ControllerContext) (http.Handler, bool, error) {
if !ctx.ComponentConfig.GarbageCollectorController.EnableGarbageCollector {
return nil, false, nil
}
gcClientset := ctx.ClientBuilder.ClientOrDie("generic-garbage-collector")
discoveryClient := cacheddiscovery.NewMemCacheClient(gcClientset.Discovery())
config := ctx.ClientBuilder.ConfigOrDie("generic-garbage-collector")
metadataClient, err := metadata.NewForConfig(config)
if err != nil {
return nil, true, err
}
// Get an initial set of deletable resources to prime the garbage collector.
deletableResources := garbagecollector.GetDeletableResources(discoveryClient)
ignoredResources := make(map[schema.GroupResource]struct{})
for _, r := range ctx.ComponentConfig.GarbageCollectorController.GCIgnoredResources {
ignoredResources[schema.GroupResource{Group: r.Group, Resource: r.Resource}] = struct{}{}
}
garbageCollector, err := garbagecollector.NewGarbageCollector(
metadataClient,
ctx.RESTMapper,
deletableResources,
ignoredResources,
ctx.ObjectOrMetadataInformerFactory,
ctx.InformersStarted,
)
if err != nil {
return nil, true, fmt.Errorf("failed to start the generic garbage collector: %v", err)
}
// Start the garbage collector.
workers := int(ctx.ComponentConfig.GarbageCollectorController.ConcurrentGCSyncs)
go garbageCollector.Run(workers, ctx.Stop)
// Periodically refresh the RESTMapper with new discovery information and sync
// the garbage collector.
go garbageCollector.Sync(gcClientset.Discovery(), 30*time.Second, ctx.Stop)
return garbagecollector.NewDebugHandler(garbageCollector), true, nil
}
The main logic of startGarbageCollectorController is as follows:
- Decide whether to enable the garbage collector based on EnableGarbageCollector, whose value comes from the kcm startup flag --enable-garbage-collector and defaults to true; if it is disabled, return immediately without going any further;
- Initialize discoveryClient, which is mainly used to discover all resource types in the cluster;
- Call garbagecollector.GetDeletableResources to get all resources in the cluster that the garbage collector needs to handle; a resource that supports the delete, list, and watch verbs is called a deletableResource;
- Call garbagecollector.NewGarbageCollector to initialize the garbage collector;
- Call garbageCollector.Run to start the garbage collector; garbageCollector.Run(workers, ctx.Stop) starts the monitors that watch for resource object changes (processed in a loop by runProcessGraphChanges), plus, by default, 20 deleteWorker goroutines that handle deletable objects and 20 orphanWorker goroutines that handle orphaned objects;
- garbageCollector.Sync(gcClientset.Discovery(), 30*time.Second, ctx.Stop) periodically checks whether new resource types have been added to the cluster and refreshes the monitors so that the new types are watched;
- Expose an HTTP handler that registers a debug endpoint, which serves the relationships between all objects in the cluster as built by GraphBuilder.
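The "deletable resource" rule from the list above (a resource must support delete, list, and watch) can be illustrated with a short filter. This is a sketch under assumptions: the `apiResource` type and both functions are hypothetical simplifications, not the discovery API that GetDeletableResources actually uses.

```go
package main

import "fmt"

// apiResource is a simplified description of a resource type and the
// verbs the API server reports for it.
type apiResource struct {
	Name  string
	Verbs []string
}

// supportsAllVerbs reports whether r supports every verb in required.
func supportsAllVerbs(r apiResource, required ...string) bool {
	have := make(map[string]struct{}, len(r.Verbs))
	for _, v := range r.Verbs {
		have[v] = struct{}{}
	}
	for _, v := range required {
		if _, ok := have[v]; !ok {
			return false
		}
	}
	return true
}

// deletableResources keeps only resources that support delete, list, and watch.
func deletableResources(all []apiResource) []string {
	var out []string
	for _, r := range all {
		if supportsAllVerbs(r, "delete", "list", "watch") {
			out = append(out, r.Name)
		}
	}
	return out
}

func main() {
	resources := []apiResource{
		{Name: "pods", Verbs: []string{"create", "delete", "get", "list", "watch"}},
		{Name: "bindings", Verbs: []string{"create"}},
	}
	fmt.Println(deletableResources(resources))
}
```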
The sections below look at parts of startGarbageCollectorController in a little more detail.
garbagecollector.NewGarbageCollector
NewGarbageCollector initializes the garbage collector. Its main logic is:
(1) initialize the GarbageCollector struct;
(2) initialize the GraphBuilder struct and assign it to the dependencyGraphBuilder field of the GarbageCollector struct.
// pkg/controller/garbagecollector/garbagecollector.go
func NewGarbageCollector(
metadataClient metadata.Interface,
mapper resettableRESTMapper,
deletableResources map[schema.GroupVersionResource]struct{},
ignoredResources map[schema.GroupResource]struct{},
sharedInformers controller.InformerFactory,
informersStarted <-chan struct{},
) (*GarbageCollector, error) {
attemptToDelete := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_delete")
attemptToOrphan := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_orphan")
absentOwnerCache := NewUIDCache(500)
gc := &GarbageCollector{
metadataClient: metadataClient,
restMapper: mapper,
attemptToDelete: attemptToDelete,
attemptToOrphan: attemptToOrphan,
absentOwnerCache: absentOwnerCache,
}
gb := &GraphBuilder{
metadataClient: metadataClient,
informersStarted: informersStarted,
restMapper: mapper,
graphChanges: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_graph_changes"),
uidToNode: &concurrentUIDToNode{
uidToNode: make(map[types.UID]*node),
},
attemptToDelete: attemptToDelete,
attemptToOrphan: attemptToOrphan,
absentOwnerCache: absentOwnerCache,
sharedInformers: sharedInformers,
ignoredResources: ignoredResources,
}
if err := gb.syncMonitors(deletableResources); err != nil {
utilruntime.HandleError(fmt.Errorf("failed to sync all monitors: %v", err))
}
gc.dependencyGraphBuilder = gb
return gc, nil
}
gb.syncMonitors
gb.syncMonitors mainly calls gb.controllerFor to initialize an informer for each deletableResource (a resource that supports the "delete", "list", and "watch" verbs) and to register event handlers (AddFunc, UpdateFunc, and DeleteFunc) for resource changes. Add, update, and delete events are all pushed to the graphChanges queue; gb.processGraphChanges then takes events off the queue and processes them (analyzed in detail in the garbage collector processing logic section).
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) syncMonitors(resources map[schema.GroupVersionResource]struct{}) error {
gb.monitorLock.Lock()
defer gb.monitorLock.Unlock()
toRemove := gb.monitors
if toRemove == nil {
toRemove = monitors{}
}
current := monitors{}
errs := []error{}
kept := 0
added := 0
for resource := range resources {
if _, ok := gb.ignoredResources[resource.GroupResource()]; ok {
continue
}
if m, ok := toRemove[resource]; ok {
current[resource] = m
delete(toRemove, resource)
kept++
continue
}
kind, err := gb.restMapper.KindFor(resource)
if err != nil {
errs = append(errs, fmt.Errorf("couldn't look up resource %q: %v", resource, err))
continue
}
c, s, err := gb.controllerFor(resource, kind)
if err != nil {
errs = append(errs, fmt.Errorf("couldn't start monitor for resource %q: %v", resource, err))
continue
}
current[resource] = &monitor{store: s, controller: c}
added++
}
gb.monitors = current
for _, monitor := range toRemove {
if monitor.stopCh != nil {
close(monitor.stopCh)
}
}
klog.V(4).Infof("synced monitors; added %d, kept %d, removed %d", added, kept, len(toRemove))
// NewAggregate returns nil if errs is 0-length
return utilerrors.NewAggregate(errs)
}
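The bookkeeping in syncMonitors boils down to a three-way comparison between the monitors that are running and the resources that are wanted. The sketch below reproduces only that comparison with plain string sets; the `set` type and the `newSet` and `diffMonitors` functions are hypothetical, and no informers are created or stopped.

```go
package main

import (
	"fmt"
	"sort"
)

// set is a minimal string set.
type set map[string]struct{}

func newSet(items ...string) set {
	s := set{}
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// diffMonitors classifies resources the way syncMonitors does: desired
// resources that already have a monitor are kept, desired resources without
// one are added, and running monitors that are no longer desired (or are
// ignored) are removed.
func diffMonitors(running, desired, ignored set) (kept, added, removed []string) {
	for r := range desired {
		if _, skip := ignored[r]; skip {
			continue
		}
		if _, ok := running[r]; ok {
			kept = append(kept, r)
		} else {
			added = append(added, r)
		}
	}
	for r := range running {
		_, wanted := desired[r]
		_, skip := ignored[r]
		if !wanted || skip {
			removed = append(removed, r)
		}
	}
	sort.Strings(kept)
	sort.Strings(added)
	sort.Strings(removed)
	return kept, added, removed
}

func main() {
	kept, added, removed := diffMonitors(
		newSet("pods", "jobs"),
		newSet("pods", "deployments", "events"),
		newSet("events"),
	)
	fmt.Println("kept:", kept, "added:", added, "removed:", removed)
}
```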
gb.controllerFor
gb.controllerFor initializes the informer for a resource and registers event handlers (AddFunc, UpdateFunc, and DeleteFunc) for its change events. Add, update, and delete events are all pushed to the graphChanges queue.
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) controllerFor(resource schema.GroupVersionResource, kind schema.GroupVersionKind) (cache.Controller, cache.Store, error) {
handlers := cache.ResourceEventHandlerFuncs{
// add the event to the dependencyGraphBuilder's graphChanges.
AddFunc: func(obj interface{}) {
event := &event{
eventType: addEvent,
obj: obj,
gvk: kind,
}
gb.graphChanges.Add(event)
},
UpdateFunc: func(oldObj, newObj interface{}) {
// TODO: check if there are differences in the ownerRefs,
// finalizers, and DeletionTimestamp; if not, ignore the update.
event := &event{
eventType: updateEvent,
obj: newObj,
oldObj: oldObj,
gvk: kind,
}
gb.graphChanges.Add(event)
},
DeleteFunc: func(obj interface{}) {
// delta fifo may wrap the object in a cache.DeletedFinalStateUnknown, unwrap it
if deletedFinalStateUnknown, ok := obj.(cache.DeletedFinalStateUnknown); ok {
obj = deletedFinalStateUnknown.Obj
}
event := &event{
eventType: deleteEvent,
obj: obj,
gvk: kind,
}
gb.graphChanges.Add(event)
},
}
shared, err := gb.sharedInformers.ForResource(resource)
if err != nil {
klog.V(4).Infof("unable to use a shared informer for resource %q, kind %q: %v", resource.String(), kind.String(), err)
return nil, nil, err
}
klog.V(4).Infof("using a shared informer for resource %q, kind %q", resource.String(), kind.String())
// need to clone because it's from a shared cache
shared.Informer().AddEventHandlerWithResyncPeriod(handlers, ResourceResyncTime)
return shared.Informer().GetController(), shared.Informer().GetStore(), nil
}
garbageCollector.Run
garbageCollector.Run starts the garbage collector. Its main logic is:
(1) call gc.dependencyGraphBuilder.Run to start GraphBuilder;
(2) start as many goroutines as the worker count configured by the startup flag, running gc.runAttemptToDeleteWorker and gc.runAttemptToOrphanWorker. Both belong to GarbageCollector's core processing logic and reclaim objects that need to be collected; they are covered in the core processing logic analysis.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) Run(workers int, stopCh <-chan struct{}) {
defer utilruntime.HandleCrash()
defer gc.attemptToDelete.ShutDown()
defer gc.attemptToOrphan.ShutDown()
defer gc.dependencyGraphBuilder.graphChanges.ShutDown()
klog.Infof("Starting garbage collector controller")
defer klog.Infof("Shutting down garbage collector controller")
go gc.dependencyGraphBuilder.Run(stopCh)
if !cache.WaitForNamedCacheSync("garbage collector", stopCh, gc.dependencyGraphBuilder.IsSynced) {
return
}
klog.Infof("Garbage collector: all resource monitors have synced. Proceeding to collect garbage")
// gc workers
for i := 0; i < workers; i++ {
go wait.Until(gc.runAttemptToDeleteWorker, 1*time.Second, stopCh)
go wait.Until(gc.runAttemptToOrphanWorker, 1*time.Second, stopCh)
}
<-stopCh
}
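The worker fan-out in Run can be pictured with a simplified sketch: for each configured worker, one goroutine drains the delete queue and one drains the orphan queue. The assumptions here are significant: channels stand in for the rate-limited workqueues, closing a channel stands in for queue ShutDown, and the `drain` function is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// drain starts `workers` goroutines per queue, feeds the given items into
// the two queues, shuts the queues down, and reports how many items each
// group of workers processed.
func drain(workers int, deleteItems, orphanItems []string) (deleted, orphaned int64) {
	if workers < 1 {
		workers = 1 // at least one consumer per queue, otherwise sends would block forever
	}
	attemptToDelete := make(chan string)
	attemptToOrphan := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(2)
		go func() { // stand-in for runAttemptToDeleteWorker
			defer wg.Done()
			for range attemptToDelete {
				atomic.AddInt64(&deleted, 1)
			}
		}()
		go func() { // stand-in for runAttemptToOrphanWorker
			defer wg.Done()
			for range attemptToOrphan {
				atomic.AddInt64(&orphaned, 1)
			}
		}()
	}
	for _, it := range deleteItems {
		attemptToDelete <- it
	}
	for _, it := range orphanItems {
		attemptToOrphan <- it
	}
	close(attemptToDelete)
	close(attemptToOrphan)
	wg.Wait()
	return deleted, orphaned
}

func main() {
	d, o := drain(3, []string{"pod-1", "pod-2", "rs-1"}, []string{"deploy-1"})
	fmt.Printf("deleted=%d orphaned=%d\n", d, o)
}
```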
gc.dependencyGraphBuilder.Run
gc.dependencyGraphBuilder.Run starts GraphBuilder. Its main logic is:
(1) call gb.startMonitors to start the informers set up in gb.syncMonitors above;
(2) call gb.runProcessGraphChanges in a loop at 1s intervals to perform GraphBuilder's core processing, which is covered in the core processing logic analysis.
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) Run(stopCh <-chan struct{}) {
klog.Infof("GraphBuilder running")
defer klog.Infof("GraphBuilder stopping")
// Set up the stop channel.
gb.monitorLock.Lock()
gb.stopCh = stopCh
gb.running = true
gb.monitorLock.Unlock()
// Start monitors and begin change processing until the stop channel is
// closed.
gb.startMonitors()
wait.Until(gb.runProcessGraphChanges, 1*time.Second, stopCh)
// Stop any running monitors.
gb.monitorLock.Lock()
defer gb.monitorLock.Unlock()
monitors := gb.monitors
stopped := 0
for _, monitor := range monitors {
if monitor.stopCh != nil {
stopped++
close(monitor.stopCh)
}
}
// reset monitors so that the graph builder can be safely re-run/synced.
gb.monitors = nil
klog.Infof("stopped %d of %d monitors", stopped, len(monitors))
}
garbageCollector.Sync
garbageCollector.Sync periodically queries all deletableResources in the cluster and calls gc.resyncMonitors to update GraphBuilder's monitors: it initializes informers and registers event handlers for newly appeared resource types, starts those informers, and tears down the monitors of resource types that have been removed.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) Sync(discoveryClient discovery.ServerResourcesInterface, period time.Duration, stopCh <-chan struct{}) {
oldResources := make(map[schema.GroupVersionResource]struct{})
wait.Until(func() {
// Get the current resource list from discovery.
newResources := GetDeletableResources(discoveryClient)
...
if err := gc.resyncMonitors(newResources); err != nil {
utilruntime.HandleError(fmt.Errorf("failed to sync resource monitors (attempt %d): %v", attempt, err))
return false, nil
}
klog.V(4).Infof("resynced monitors")
...
gc.resyncMonitors
gc.resyncMonitors does two things:
(1) call gc.dependencyGraphBuilder.syncMonitors to initialize informers and register event handlers;
(2) call gc.dependencyGraphBuilder.startMonitors to start the informers.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) resyncMonitors(deletableResources map[schema.GroupVersionResource]struct{}) error {
if err := gc.dependencyGraphBuilder.syncMonitors(deletableResources); err != nil {
return err
}
gc.dependencyGraphBuilder.startMonitors()
return nil
}
garbagecollector.NewDebugHandler
garbagecollector.NewDebugHandler exposes an HTTP handler that registers a debug endpoint, which serves the relationships between all objects in the cluster as built by GraphBuilder.
// pkg/controller/garbagecollector/dump.go
func NewDebugHandler(controller *GarbageCollector) http.Handler {
return &debugHTTPHandler{controller: controller}
}
type debugHTTPHandler struct {
controller *GarbageCollector
}
func (h *debugHTTPHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
if req.URL.Path != "/graph" {
http.Error(w, "", http.StatusNotFound)
return
}
var graph graph.Directed
if uidStrings := req.URL.Query()["uid"]; len(uidStrings) > 0 {
uids := []types.UID{}
for _, uidString := range uidStrings {
uids = append(uids, types.UID(uidString))
}
graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraphForObj(uids...)
} else {
graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraph()
}
data, err := dot.Marshal(graph, "full", "", " ")
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/vnd.graphviz")
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Write(data)
w.WriteHeader(http.StatusOK)
}
Getting the object relationship graph
Get the full object relationship graph:
curl http://{master_ip}:{kcm_port}/debug/controllers/garbagecollector/graph -o {output_file}
Get the relationship graph for a specific uid (the URL is quoted so that the shell does not interpret the ?):
curl "http://{master_ip}:{kcm_port}/debug/controllers/garbagecollector/graph?uid={project_uid}" -o {output_file}
Example:
curl "http://192.168.1.10:10252/debug/controllers/garbagecollector/graph?uid=8727f640-112e-21eb-11dd-626400510df6" -o /home/test
Object deletion policies
Kubernetes has three object deletion policies: Orphan, Foreground, and Background. A deletion policy can be specified when deleting an object. The three policies are described below.
Foreground deletion
Foreground is a cascading deletion policy: the garbage collector deletes all of the object's dependents.
When an object is deleted with the Foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the object's deletion. Once the garbage collector has deleted all of the object's blocking dependents (those with ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from the object's metadata.finalizers, and the owner object is then deleted.
Taking deleting a Deployment as an example, with the Foreground policy deletion proceeds in the order Pod -> ReplicaSet -> Deployment.
Background deletion
Background is a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.
When an object is deleted with the Background policy, no related finalizer is set on it (finalizers are only set for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, triggering GarbageCollector to reclaim them.
Taking deleting a Deployment as an example, with the Background policy deletion proceeds in the order Deployment -> ReplicaSet -> Pod.
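Orphan deletion
Orphan is a non-cascading deletion policy: when an object is deleted, its dependents are not deleted automatically. These dependents are called orphaned objects.
When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the object's deletion until GarbageCollector has removed the owner's entry from the ownerReferences of all its dependents. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only then can the owner object be deleted.
Taking deleting a Deployment as an example, with the Orphan policy only the Deployment is deleted; the corresponding ReplicaSet and Pods are kept.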
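The relationship between the three policies and the finalizer that blocks the owner can be summarized in a small lookup. This is a sketch for illustration: the `finalizerFor` function is hypothetical, and treating any unrecognized policy string as Background is an assumption of the sketch; the finalizer values foregroundDeletion and orphan are the ones named in the text.

```go
package main

import "fmt"

const (
	finalizerDeleteDependents = "foregroundDeletion" // set for Foreground deletion
	finalizerOrphanDependents = "orphan"             // set for Orphan deletion
)

// finalizerFor returns the finalizer that blocks the owner's deletion for
// the given propagation policy, and whether such a finalizer exists.
// Background deletion sets no finalizer, so the owner is removed right away.
func finalizerFor(policy string) (finalizer string, blocks bool) {
	switch policy {
	case "Foreground":
		return finalizerDeleteDependents, true
	case "Orphan":
		return finalizerOrphanDependents, true
	default: // "Background" (anything else is treated the same in this sketch)
		return "", false
	}
}

func main() {
	for _, p := range []string{"Foreground", "Background", "Orphan"} {
		f, blocks := finalizerFor(p)
		fmt.Printf("%-10s finalizer=%q blocksOwner=%v\n", p, f, blocks)
	}
}
```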
Specifying the deletion policy when deleting an object
If no deletion policy is specified when deleting an object, the default policy is used: Background.
- Specify the Background deletion policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' -H "Content-Type: application/json"
- Specify the Foreground deletion policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' -H "Content-Type: application/json"
- Specify the Orphan deletion policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' -H "Content-Type: application/json"
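The request body used in the curl commands can also be produced programmatically. The sketch below builds the same JSON with Go's standard library only; the `deleteOptions` struct and `deleteOptionsBody` function are hypothetical helpers written for this example, not the Kubernetes client types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions models only the three fields sent in the curl examples.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteOptionsBody returns the JSON body for a DELETE request with the
// given propagation policy ("Background", "Foreground", or "Orphan").
func deleteOptionsBody(policy string) (string, error) {
	b, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := deleteOptionsBody("Foreground")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(body)
}
```

The resulting string can be sent as the body of an HTTP DELETE request with the Content-Type: application/json header, just as the curl commands do.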
Garbage collector processing logic
GraphBuilder
Let's look at GraphBuilder first.
GraphBuilder has 2 main functions:
(1) based on the resource events from the informers, maintain the dependency relationships between all objects in its uidToNode field;
(2) process the events in graphChanges and, acting as a producer, put events into the attemptToDelete and attemptToOrphan queues, triggering the consumer, GarbageCollector, to reclaim objects.
GraphBuilder struct
First, a brief look at the GraphBuilder struct. Its most important fields and their roles are:
(1) graphChanges: events observed by the informers are put into graphChanges, and GraphBuilder, acting as a consumer, processes the events in this queue;
(2) uidToNode (the object dependency graph): keyed by object uid, it holds the dependency relationships between all objects, that is, the owner and dependent relationships described earlier. In other words, GraphBuilder maintains a dependency graph of all objects, and GarbageCollector relies on this graph when reclaiming objects;
(3) attemptToDelete and attemptToOrphan: GraphBuilder acts as a producer and puts events into these two queues, and GarbageCollector acts as a consumer and processes them.
// pkg/controller/garbagecollector/graph_builder.go
type GraphBuilder struct {
...
// monitors are the producer of the graphChanges queue, graphBuilder alters
// the in-memory graph according to the changes.
graphChanges workqueue.RateLimitingInterface
// uidToNode doesn't require a lock to protect, because only the
// single-threaded GraphBuilder.processGraphChanges() reads/writes it.
uidToNode *concurrentUIDToNode
// GraphBuilder is the producer of attemptToDelete and attemptToOrphan, GC is the consumer.
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
// pkg/controller/garbagecollector/graph.go
type concurrentUIDToNode struct {
uidToNodeLock sync.RWMutex
uidToNode map[types.UID]*node
}
// pkg/controller/garbagecollector/graph.go
type node struct {
...
dependents map[*node]struct{}
...
owners []metav1.OwnerReference
}
As the struct definitions show, each k8s object corresponds to one node in the dependency graph, and each node keeps a list of owners and a list of dependents.
GraphBuilder-gb.processGraphChanges
Next comes GraphBuilder's processing logic, starting from gb.processGraphChanges as the entry point.
As mentioned earlier, events observed by the informers are put into the graphChanges queue and GraphBuilder consumes them; processGraphChanges is where that consumption happens.
In this method GraphBuilder is therefore both a consumer and a producer: it consumes and classifies all events in graphChanges, then produces events into the attemptToDelete and attemptToOrphan queues for GarbageCollector to consume.
Main logic:
(1) take an event from the graphChanges queue;
(2) read uidToNode to determine whether the object already exists in the dependency graph; the handling below depends on whether it exists and on the event type;
(3) if the node does not exist in uidToNode and the event is an addEvent or updateEvent, create a node for the object and call gb.insertNode to add it to uidToNode and to the dependents of its owners. Then call gb.processTransitions, which checks whether the object is being deleted and, if so, whether it is being deleted in orphan mode or foreground mode (this is determined from the object's finalizers: for example, a Deployment is deleted with a deletion policy, and kube-apiserver adds the corresponding finalizer to it). In orphan mode the node is added to the attemptToOrphan queue; in foreground mode the object and all its dependents are added to the attemptToDelete queue;
(4) if the node exists in uidToNode and the event is an addEvent or updateEvent, call referencesDiffs to check whether the object's OwnerReferences have changed; if they have, update the dependency graph accordingly. Finally call gb.processTransitions;
(5) if the event is a delete event, call gb.removeNode to remove the object from uidToNode and from the dependents of all its owners, then put the object's dependents into the attemptToDelete queue to trigger GarbageCollector. Finally, check all owners of the node: an owner that is deleting its dependents may be blocked waiting for this node to be deleted, so it is added to the attemptToDelete queue to trigger GarbageCollector.
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) runProcessGraphChanges() {
for gb.processGraphChanges() {
}
}
// Dequeueing an event from graphChanges, updating graph, populating dirty_queue.
func (gb *GraphBuilder) processGraphChanges() bool {
item, quit := gb.graphChanges.Get()
if quit {
return false
}
defer gb.graphChanges.Done(item)
event, ok := item.(*event)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect a *event, got %v", item))
return true
}
obj := event.obj
accessor, err := meta.Accessor(obj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access obj: %v", err))
return true
}
klog.V(5).Infof("GraphBuilder process object: %s/%s, namespace %s, name %s, uid %s, event type %v", event.gvk.GroupVersion().String(), event.gvk.Kind, accessor.GetNamespace(), accessor.GetName(), string(accessor.GetUID()), event.eventType)
// Check if the node already exists
existingNode, found := gb.uidToNode.Read(accessor.GetUID())
if found {
// this marks the node as having been observed via an informer event
// 1. this depends on graphChanges only containing add/update events from the actual informer
// 2. this allows things tracking virtual nodes' existence to stop polling and rely on informer events
existingNode.markObserved()
}
switch {
case (event.eventType == addEvent || event.eventType == updateEvent) && !found:
newNode := &node{
identity: objectReference{
OwnerReference: metav1.OwnerReference{
APIVersion: event.gvk.GroupVersion().String(),
Kind: event.gvk.Kind,
UID: accessor.GetUID(),
Name: accessor.GetName(),
},
Namespace: accessor.GetNamespace(),
},
dependents: make(map[*node]struct{}),
owners: accessor.GetOwnerReferences(),
deletingDependents: beingDeleted(accessor) && hasDeleteDependentsFinalizer(accessor),
beingDeleted: beingDeleted(accessor),
}
gb.insertNode(newNode)
// the underlying delta_fifo may combine a creation and a deletion into
// one event, so we need to further process the event.
gb.processTransitions(event.oldObj, accessor, newNode)
case (event.eventType == addEvent || event.eventType == updateEvent) && found:
// handle changes in ownerReferences
added, removed, changed := referencesDiffs(existingNode.owners, accessor.GetOwnerReferences())
if len(added) != 0 || len(removed) != 0 || len(changed) != 0 {
// check if the changed dependency graph unblock owners that are
// waiting for the deletion of their dependents.
gb.addUnblockedOwnersToDeleteQueue(removed, changed)
// update the node itself
existingNode.owners = accessor.GetOwnerReferences()
// Add the node to its new owners' dependent lists.
gb.addDependentToOwners(existingNode, added)
// remove the node from the dependent list of node that are no longer in
// the node's owners list.
gb.removeDependentFromOwners(existingNode, removed)
}
if beingDeleted(accessor) {
existingNode.markBeingDeleted()
}
gb.processTransitions(event.oldObj, accessor, existingNode)
case event.eventType == deleteEvent:
if !found {
klog.V(5).Infof("%v doesn't exist in the graph, this shouldn't happen", accessor.GetUID())
return true
}
// removeNode updates the graph
gb.removeNode(existingNode)
existingNode.dependentsLock.RLock()
defer existingNode.dependentsLock.RUnlock()
if len(existingNode.dependents) > 0 {
gb.absentOwnerCache.Add(accessor.GetUID())
}
for dep := range existingNode.dependents {
gb.attemptToDelete.Add(dep)
}
for _, owner := range existingNode.owners {
ownerNode, found := gb.uidToNode.Read(owner.UID)
if !found || !ownerNode.isDeletingDependents() {
continue
}
// this is to let attempToDeleteItem check if all the owner's
// dependents are deleted, if so, the owner will be deleted.
gb.attemptToDelete.Add(ownerNode)
}
}
return true
}
The code confirms the earlier description of Background deletion: no related finalizer is set on the object (finalizers are only set for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, triggering GarbageCollector to reclaim them.
gb.insertNode
gb.insertNode adds the node to uidToNode and then adds it to the dependents of each of its owners.
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) insertNode(n *node) {
gb.uidToNode.Write(n)
gb.addDependentToOwners(n, n.owners)
}
func (gb *GraphBuilder) addDependentToOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
// Create a "virtual" node in the graph for the owner if it doesn't
// exist in the graph yet.
ownerNode = &node{
identity: objectReference{
OwnerReference: owner,
Namespace: n.identity.Namespace,
},
dependents: make(map[*node]struct{}),
virtual: true,
}
klog.V(5).Infof("add virtual node.identity: %s\n\n", ownerNode.identity)
gb.uidToNode.Write(ownerNode)
}
ownerNode.addDependent(n)
if !ok {
// Enqueue the virtual node into attemptToDelete.
// The garbage processor will enqueue a virtual delete
// event to delete it from the graph if API server confirms this
// owner doesn't exist.
gb.attemptToDelete.Add(ownerNode)
}
}
}
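The "virtual node" branch of addDependentToOwners is easy to miss, so here is a reduced sketch of it. The assumptions: nodes are identified by plain string UIDs, the attemptToDelete queue is a slice, and the `graph`, `newGraph`, and `insertNode` names are simplified stand-ins rather than the real types.

```go
package main

import "fmt"

// node is a simplified graph node; virtual marks an owner that was
// referenced by a dependent before it was ever observed itself.
type node struct {
	uid        string
	virtual    bool
	dependents map[string]struct{}
}

// graph holds the nodes plus a slice standing in for the attemptToDelete queue.
type graph struct {
	nodes           map[string]*node
	attemptToDelete []string
}

func newGraph() *graph {
	return &graph{nodes: map[string]*node{}}
}

// insertNode adds a node and links it to its owners. An owner that is not
// yet in the graph is created as a virtual node and queued so that its
// existence can be verified later.
func (g *graph) insertNode(uid string, owners ...string) {
	g.nodes[uid] = &node{uid: uid, dependents: map[string]struct{}{}}
	for _, ownerUID := range owners {
		owner, ok := g.nodes[ownerUID]
		if !ok {
			owner = &node{uid: ownerUID, virtual: true, dependents: map[string]struct{}{}}
			g.nodes[ownerUID] = owner
		}
		owner.dependents[uid] = struct{}{}
		if !ok {
			g.attemptToDelete = append(g.attemptToDelete, ownerUID)
		}
	}
}

func main() {
	g := newGraph()
	g.insertNode("pod-c", "rs-b") // rs-b has not been observed yet
	fmt.Println("rs-b virtual:", g.nodes["rs-b"].virtual)
	fmt.Println("attemptToDelete:", g.attemptToDelete)
}
```

Queueing the virtual owner lets the consumer confirm against the API server whether the owner really exists, as the comments in addDependentToOwners describe.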
gb.processTransitions
gb.processTransitions checks whether a k8s object is being deleted (its deletionTimestamp is non-nil) and carries the finalizer that corresponds to a deletion policy, and then acts accordingly.
Because finalizers are only set for the Foreground and Orphan policies, this method only handles objects deleted with one of those two policies; objects deleted with the Background policy are not handled here.
If the object's deletionTimestamp is non-nil and it has the finalizer for the Orphan policy, the node is put into the attemptToOrphan queue, triggering GarbageCollector to process it;
If the object's deletionTimestamp is non-nil and it has the finalizer for the Foreground policy, n.markDeletingDependents is called to set the node's deletingDependents field to true, meaning that the node's dependents are being deleted, and the node and its dependents are put into the attemptToDelete queue, triggering GarbageCollector to process them.
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) processTransitions(oldObj interface{}, newAccessor metav1.Object, n *node) {
if startsWaitingForDependentsOrphaned(oldObj, newAccessor) {
klog.V(5).Infof("add %s to the attemptToOrphan", n.identity)
gb.attemptToOrphan.Add(n)
return
}
if startsWaitingForDependentsDeleted(oldObj, newAccessor) {
klog.V(2).Infof("add %s to the attemptToDelete, because it's waiting for its dependents to be deleted", n.identity)
// if the n is added as a "virtual" node, its deletingDependents field is not properly set, so always set it here.
n.markDeletingDependents()
for dep := range n.dependents {
gb.attemptToDelete.Add(dep)
}
gb.attemptToDelete.Add(n)
}
}
func startsWaitingForDependentsOrphaned(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerOrphanDependents)
}
func startsWaitingForDependentsDeleted(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerDeleteDependents)
}
func deletionStartsWithFinalizer(oldObj interface{}, newAccessor metav1.Object, matchingFinalizer string) bool {
// if the new object isn't being deleted, or doesn't have the finalizer we're interested in, return false
if !beingDeleted(newAccessor) || !hasFinalizer(newAccessor, matchingFinalizer) {
return false
}
// if the old object is nil, or wasn't being deleted, or didn't have the finalizer, return true
if oldObj == nil {
return true
}
oldAccessor, err := meta.Accessor(oldObj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access oldObj: %v", err))
return false
}
return !beingDeleted(oldAccessor) || !hasFinalizer(oldAccessor, matchingFinalizer)
}
func beingDeleted(accessor metav1.Object) bool {
return accessor.GetDeletionTimestamp() != nil
}
func hasFinalizer(accessor metav1.Object, matchingFinalizer string) bool {
finalizers := accessor.GetFinalizers()
for _, finalizer := range finalizers {
if finalizer == matchingFinalizer {
return true
}
}
return false
}
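The transition check above is edge-triggered, which is easy to miss when reading the code. The following is a minimal sketch of the same logic on a stripped-down type. objState and startsWith are hypothetical names introduced here for illustration; the real code reads these fields through metav1.Object. The string "foregroundDeletion" is the value of metav1.FinalizerDeleteDependents.

```go
package main

import "fmt"

// objState is a hypothetical stand-in for the metadata that the real code
// reads through metav1.Object.
type objState struct {
	deleting   bool     // true when deletionTimestamp is set
	finalizers []string // finalizers currently on the object
}

// Value of metav1.FinalizerDeleteDependents.
const foregroundFinalizer = "foregroundDeletion"

func hasFinalizer(s objState, name string) bool {
	for _, f := range s.finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// startsWith reports whether the update old -> cur is the first one in which
// the object is both being deleted and carrying the finalizer. A nil old
// state models an add event.
func startsWith(old *objState, cur objState, finalizer string) bool {
	if !cur.deleting || !hasFinalizer(cur, finalizer) {
		return false
	}
	if old == nil {
		return true
	}
	return !old.deleting || !hasFinalizer(*old, finalizer)
}

func main() {
	live := objState{}
	deleting := objState{deleting: true, finalizers: []string{foregroundFinalizer}}

	fmt.Println(startsWith(&live, deleting, foregroundFinalizer))     // true: the transition
	fmt.Println(startsWith(&deleting, deleting, foregroundFinalizer)) // false: already in that state
	fmt.Println(startsWith(nil, deleting, foregroundFinalizer))       // true: first seen while deleting
}
```

Because only the transition returns true, repeated update events for an object that is already waiting on its dependents do not enqueue it again.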
gb.removeNode
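When a delete event is processed, gb.removeNode is called. It removes the object from uidToNode and then removes it from the dependents set of every one of its owners. removeNode itself, shown below, performs only these two steps; the rest is done by the calling code. The caller then adds the object's dependents to the attemptToDelete queue for GarbageCollector to process. Finally it checks all owners of the node: an owner that is itself being deleted may be blocked waiting for this node to go away, so that owner is also added to the attemptToDelete queue.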
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) removeNode(n *node) {
gb.uidToNode.Delete(n.identity.UID)
gb.removeDependentFromOwners(n, n.owners)
}
func (gb *GraphBuilder) removeDependentFromOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
continue
}
ownerNode.deleteDependent(n)
}
}
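To make the bookkeeping concrete, here is a minimal sketch of the same removal on a toy graph. The graph and node types below are simplified, hypothetical versions of the real ones: there is no locking, and owners are plain UID strings rather than metav1.OwnerReference values.

```go
package main

import "fmt"

// node is a simplified graph node: it records the UIDs of its owners and the
// set of nodes that name it as an owner.
type node struct {
	uid        string
	owners     []string
	dependents map[*node]struct{}
}

type graph struct {
	uidToNode map[string]*node
}

// add inserts a node and registers it as a dependent of each known owner.
func (g *graph) add(uid string, owners ...string) *node {
	n := &node{uid: uid, owners: owners, dependents: map[*node]struct{}{}}
	g.uidToNode[uid] = n
	for _, o := range owners {
		if on, ok := g.uidToNode[o]; ok {
			on.dependents[n] = struct{}{}
		}
	}
	return n
}

// remove mirrors removeNode: drop the node from the index, then detach it
// from every owner that is still present in the graph.
func (g *graph) remove(n *node) {
	delete(g.uidToNode, n.uid)
	for _, o := range n.owners {
		if on, ok := g.uidToNode[o]; ok {
			delete(on.dependents, n)
		}
	}
}

func main() {
	g := &graph{uidToNode: map[string]*node{}}
	deploy := g.add("deployment")
	rs := g.add("replicaset", "deployment")
	g.add("pod", "replicaset")

	g.remove(rs)
	fmt.Println(len(deploy.dependents)) // 0: the owner no longer tracks the replicaset
	fmt.Println(len(rs.dependents))     // 1: the pod still hangs off the removed node
}
```

The pod that is still attached to the removed node is exactly what the caller would push into attemptToDelete, since its owner is now gone.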
GarbageCollector
Next, let's look at GarbageCollector.
GarbageCollector has two main responsibilities:
(1) Process the events in the attemptToDelete queue: apply the reclamation logic that matches the object's deletion policy, foreground or background, and delete the associated objects;
(2) Process the events in the attemptToOrphan queue: for the Orphan deletion policy, update all dependents of the owner by removing the owner's entry from their OwnerReferences, then update the owner object itself to remove the finalizer that corresponds to the Orphan policy.
GarbageCollector has two key processing methods:
(1) gc.runAttemptToDeleteWorker: consumes the attemptToDelete queue and reclaims objects deleted with the foreground or background policy;
(2) gc.runAttemptToOrphanWorker: consumes the attemptToOrphan queue and handles objects deleted with the Orphan policy.
GarbageCollector struct
First, a brief look at the GarbageCollector struct. Its most important fields are:
(1) attemptToDelete and attemptToOrphan: GraphBuilder acts as the producer and puts events into these two queues, and GarbageCollector acts as the consumer and processes the events in them.
// pkg/controller/garbagecollector/garbagecollector.go
type GarbageCollector struct {
...
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
GarbageCollector-gc.runAttemptToDeleteWorker
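Now for the processing logic of GarbageCollector, starting from gc.runAttemptToDeleteWorker.
runAttemptToDeleteWorker simply calls attemptToDeleteWorker in a loop until the queue is shut down.
The main logic of attemptToDeleteWorker is:
(1) Take an item from the attemptToDelete queue;
(2) Call gc.attemptToDeleteItem to try to delete the node;
(3) If that fails, put the item back into the attemptToDelete queue, rate limited, so it is retried. An item that was processed without error but has not yet been observed through an informer event is requeued as well, so that a virtual node cannot get stuck in the graph.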
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
for gc.attemptToDeleteWorker() {
}
}
func (gc *GarbageCollector) attemptToDeleteWorker() bool {
item, quit := gc.attemptToDelete.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToDelete.Done(item)
n, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
err := gc.attemptToDeleteItem(n)
if err != nil {
if _, ok := err.(*restMappingError); ok {
// There are at least two ways this can happen:
// 1. The reference is to an object of a custom type that has not yet been
// recognized by gc.restMapper (this is a transient error).
// 2. The reference is to an invalid group/version. We don't currently
// have a way to distinguish this from a valid type we will recognize
// after the next discovery sync.
// For now, record the error and retry.
klog.V(5).Infof("error syncing item %s: %v", n, err)
} else {
utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
}
// retry if garbage collection of an object failed.
gc.attemptToDelete.AddRateLimited(item)
} else if !n.isObserved() {
// requeue if item hasn't been observed via an informer event yet.
// otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
// see https://issue.k8s.io/56121
klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
gc.attemptToDelete.AddRateLimited(item)
}
return true
}
gc.attemptToDeleteItem
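Main logic:
(1) Check whether the node is being deleted. If it is being deleted but is not waiting for its dependents to be deleted, return at once and wait for the final deletion;
(2) Fetch the object that corresponds to the node from the apiserver. If the object is not found, or its UID does not match, enqueue a virtual delete event so that GraphBuilder removes the node from its graph, then return;
(3) Call item.isDeletingDependents, which uses the node's deletingDependents field to tell whether the node is currently deleting its dependents. If so, call gc.processDeletingDependentsItem to process the dependents further: it checks whether the node's blockingDependents have all been deleted. If they have, it removes the relevant finalizer from the node's object; if not, it adds the remaining blockingDependents to the attemptToDelete queue;
As noted in the GraphBuilder analysis, when GraphBuilder processes events from graphChanges, processTransitions calls n.markDeletingDependents to set the node's deletingDependents field to true;
(4) Call gc.classifyReferences to sort the node's owner references into three groups: solid (the owner exists and is not being deleted), dangling (the owner does not exist), and waitingForDependentsDeletion (the owner exists, is being deleted, and is waiting for its dependents to be deleted);
(5) What happens next depends on how many references fall into solid, dangling and waitingForDependentsDeletion;
(6) First case: solid is not empty, that is, the node has at least one owner that exists and is not being deleted. The object must not be reclaimed. Instead, the owners in the dangling and waitingForDependentsDeletion lists are removed from the object's ownerReferences;
(7) Second case: solid is empty, at least one owner is in the waitingForDependentsDeletion state, and the node still has dependents of its own. The node's object is deleted with the foreground deletion policy. Before that, if one of the node's dependents is itself deleting its dependents, the node's ownerReferences are patched to be non-blocking to avoid a potential cycle;
(8) Default case: neither of the above applies (that is, there is no solid owner, and either no owner is waiting for its dependents to be deleted or the node has no dependents). The object is deleted through the apiserver with a policy taken from the finalizers already on it: Orphan if it has the orphan finalizer, Foreground if it has the foreground finalizer, and Background otherwise.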
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
klog.V(2).Infof("processing item %s", item.identity)
// "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
if item.isBeingDeleted() && !item.isDeletingDependents() {
klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
return nil
}
// TODO: It's only necessary to talk to the API server if this is a
// "virtual" node. The local graph could lag behind the real status, but in
// practice, the difference is small.
latest, err := gc.getObject(item.identity)
switch {
case errors.IsNotFound(err):
// the GraphBuilder can add "virtual" node for an owner that doesn't
// exist yet, so we need to enqueue a virtual Delete event to remove
// the virtual node from GraphBuilder.uidToNode.
klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
case err != nil:
return err
}
if latest.GetUID() != item.identity.UID {
klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
}
// TODO: attemptToOrphanWorker() routine is similar. Consider merging
// attemptToOrphanWorker() into attemptToDeleteItem() as well.
if item.isDeletingDependents() {
return gc.processDeletingDependentsItem(item)
}
// compute if we should delete the item
ownerReferences := latest.GetOwnerReferences()
if len(ownerReferences) == 0 {
klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
return nil
}
solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
if err != nil {
return err
}
klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)
switch {
case len(solid) != 0:
klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", item.identity, solid)
if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
return nil
}
klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
// waitingForDependentsDeletion needs to be deleted from the
// ownerReferences, otherwise the referenced objects will be stuck with
// the FinalizerDeletingDependents and never get deleted.
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
_, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
})
return err
case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
deps := item.getDependents()
for _, dep := range deps {
if dep.isDeletingDependents() {
// this circle detection has false positives, we need to
// apply a more rigorous detection if this turns out to be a
// problem.
// there are multiple workers run attemptToDeleteItem in
// parallel, the circle detection can fail in a race condition.
klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
patch, err := item.unblockOwnerReferencesStrategicMergePatch()
if err != nil {
return err
}
if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
return err
}
break
}
}
klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
// the deletion event will be observed by the graphBuilder, so the item
// will be processed again in processDeletingDependentsItem. If it
// doesn't have dependents, the function will remove the
// FinalizerDeletingDependents from the item, resulting in the final
// deletion of the item.
policy := metav1.DeletePropagationForeground
return gc.deleteObject(item.identity, &policy)
default:
// item doesn't have any solid owner, so it needs to be garbage
// collected. Also, none of item's owners is waiting for the deletion of
// the dependents, so set propagationPolicy based on existing finalizers.
var policy metav1.DeletionPropagation
switch {
case hasOrphanFinalizer(latest):
// if an existing orphan finalizer is already on the object, honor it.
policy = metav1.DeletePropagationOrphan
case hasDeleteDependentsFinalizer(latest):
// if an existing foreground finalizer is already on the object, honor it.
policy = metav1.DeletePropagationForeground
default:
// otherwise, default to background.
policy = metav1.DeletePropagationBackground
}
klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
return gc.deleteObject(item.identity, &policy)
}
}
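The three-way switch at the end of attemptToDeleteItem can be condensed into a small pure function. This is a sketch under stated assumptions: decide and the action constants are hypothetical names, the inputs are plain counts rather than the real owner reference slices, and the object is assumed to have at least one ownerReference (the real function returns earlier when it has none). The cycle-breaking patch in the foreground branch is left out. The strings "orphan" and "foregroundDeletion" are the values of metav1.FinalizerOrphanDependents and metav1.FinalizerDeleteDependents.

```go
package main

import "fmt"

type action string

const (
	actionKeep        action = "keep object"
	actionPatchOwners action = "keep object, remove stale ownerReferences"
	actionForeground  action = "delete with Foreground"
	actionOrphan      action = "delete with Orphan"
	actionBackground  action = "delete with Background"
)

func has(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

// decide mirrors the switch in attemptToDeleteItem. solid, dangling and
// waiting are the sizes of the three groups returned by classifyReferences;
// dependents is the number of dependents of the node itself.
func decide(solid, dangling, waiting, dependents int, finalizers []string) action {
	switch {
	case solid != 0:
		if dangling == 0 && waiting == 0 {
			return actionKeep
		}
		return actionPatchOwners
	case waiting != 0 && dependents != 0:
		return actionForeground
	default:
		switch {
		case has(finalizers, "orphan"):
			return actionOrphan
		case has(finalizers, "foregroundDeletion"):
			return actionForeground
		default:
			return actionBackground
		}
	}
}

func main() {
	fmt.Println(decide(1, 0, 0, 3, nil))                // keep object
	fmt.Println(decide(1, 1, 0, 3, nil))                // keep object, remove stale ownerReferences
	fmt.Println(decide(0, 0, 1, 2, nil))                // delete with Foreground
	fmt.Println(decide(0, 1, 0, 0, nil))                // delete with Background
	fmt.Println(decide(0, 1, 0, 0, []string{"orphan"})) // delete with Orphan
}
```

Note the last two calls: an object whose only owner is dangling is deleted in the background unless it already carries a finalizer that asks for a different policy.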
gc.processDeletingDependentsItem
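Main logic: check whether the node's blockingDependents (the dependents that block deletion of their owner) have all been deleted. If they have, remove the relevant finalizer from the node's object; once the finalizer is gone, kube-apiserver deletes the object. If they have not, add the blockingDependents that are not already deleting their own dependents to the attemptToDelete queue.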
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
blockingDependents := item.blockingDependents()
if len(blockingDependents) == 0 {
klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
}
for _, dep := range blockingDependents {
if !dep.isDeletingDependents() {
klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
gc.attemptToDelete.Add(dep)
}
}
return nil
}
item.blockingDependents
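item.blockingDependents returns the dependents that block deletion of the node. Whether a dependent blocks its owner depends on the blockOwnerDeletion field of the matching entry in the dependent's ownerReferences: if it is true, the dependent blocks deletion of that owner. An unset (nil) value does not block.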
// pkg/controller/garbagecollector/graph.go
func (n *node) blockingDependents() []*node {
dependents := n.getDependents()
var ret []*node
for _, dep := range dependents {
for _, owner := range dep.owners {
if owner.UID == n.identity.UID && owner.BlockOwnerDeletion != nil && *owner.BlockOwnerDeletion {
ret = append(ret, dep)
}
}
}
return ret
}
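Because blockOwnerDeletion is a pointer, there are three cases to tell apart: true, false and unset. A minimal sketch of the filter follows, using hypothetical ownerRef and dependent types in place of metav1.OwnerReference and the graph node:

```go
package main

import "fmt"

// ownerRef is a hypothetical, reduced form of metav1.OwnerReference.
type ownerRef struct {
	uid                string
	blockOwnerDeletion *bool // nil means the field was not set
}

type dependent struct {
	name   string
	owners []ownerRef
}

// blocking returns the names of the dependents that block deletion of the
// owner with the given UID.
func blocking(ownerUID string, deps []dependent) []string {
	var ret []string
	for _, d := range deps {
		for _, o := range d.owners {
			if o.uid == ownerUID && o.blockOwnerDeletion != nil && *o.blockOwnerDeletion {
				ret = append(ret, d.name)
			}
		}
	}
	return ret
}

func main() {
	yes, no := true, false
	deps := []dependent{
		{name: "pod-a", owners: []ownerRef{{uid: "rs-1", blockOwnerDeletion: &yes}}},
		{name: "pod-b", owners: []ownerRef{{uid: "rs-1", blockOwnerDeletion: &no}}},
		{name: "pod-c", owners: []ownerRef{{uid: "rs-1"}}}, // field unset
	}
	fmt.Println(blocking("rs-1", deps)) // [pod-a]
}
```

Only pod-a holds up a foreground deletion of rs-1. Once it is gone, the list is empty and processDeletingDependentsItem would remove the finalizer from the owner.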
GarbageCollector-gc.runAttemptToOrphanWorker
gc.runAttemptToOrphanWorker handles nodes that are deleted with the orphan deletion policy.
gc.runAttemptToOrphanWorker simply calls gc.attemptToOrphanWorker in a loop.
The main logic of gc.attemptToOrphanWorker is:
(1) Take an item from the attemptToOrphan queue;
(2) Call gc.orphanDependents: update all dependents of the owner by removing the owner's entry from their OwnerReferences. On failure, put the owner back into the attemptToOrphan queue;
(3) Call gc.removeFinalizer: update the owner object to remove the finalizer that corresponds to the Orphan deletion policy. On failure, the owner is likewise put back into the queue.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
for gc.attemptToOrphanWorker() {
}
}
func (gc *GarbageCollector) attemptToOrphanWorker() bool {
item, quit := gc.attemptToOrphan.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToOrphan.Done(item)
owner, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
// we don't need to lock each element, because they never get updated
owner.dependentsLock.RLock()
dependents := make([]*node, 0, len(owner.dependents))
for dependent := range owner.dependents {
dependents = append(dependents, dependent)
}
owner.dependentsLock.RUnlock()
err := gc.orphanDependents(owner.identity, dependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
return true
}
// update the owner, remove "orphaningFinalizer" from its finalizers list
err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
}
return true
}
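The net effect of a successful orphan pass can be shown on in-memory objects. This is only a sketch: the object type and the orphan and without helpers are hypothetical, and the real code applies these changes through patch requests to the apiserver instead of mutating local structs. The string "orphan" is the value of metav1.FinalizerOrphanDependents.

```go
package main

import "fmt"

// object is a hypothetical, reduced view of an API object's metadata.
type object struct {
	uid        string
	ownerUIDs  []string // UIDs listed in ownerReferences
	finalizers []string
}

// without returns list with every occurrence of drop removed.
func without(list []string, drop string) []string {
	var kept []string
	for _, s := range list {
		if s != drop {
			kept = append(kept, s)
		}
	}
	return kept
}

// orphan performs the two steps of the worker in order: detach the owner from
// each dependent, then release the owner by removing its orphan finalizer.
func orphan(owner *object, dependents []*object) {
	for _, d := range dependents {
		d.ownerUIDs = without(d.ownerUIDs, owner.uid)
	}
	owner.finalizers = without(owner.finalizers, "orphan")
}

func main() {
	rs := &object{uid: "rs-1", finalizers: []string{"orphan"}}
	pod := &object{uid: "pod-a", ownerUIDs: []string{"rs-1", "other"}}

	orphan(rs, []*object{pod})
	fmt.Println(pod.ownerUIDs, len(rs.finalizers)) // [other] 0
}
```

The order matters: the finalizer is removed only after every dependent has been detached, so the owner cannot disappear while a dependent still points at it.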
gc.orphanDependents
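Main logic: update all dependents of the given owner by removing the owner's entry from their OwnerReferences. One goroutine is started per dependent to speed things up. Errors are collected through a buffered channel and returned as a single aggregated error; a dependent that no longer exists (NotFound) is not treated as a failure.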
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
errCh := make(chan error, len(dependents))
wg := sync.WaitGroup{}
wg.Add(len(dependents))
for i := range dependents {
go func(dependent *node) {
defer wg.Done()
// the dependent.identity.UID is used as precondition
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
})
// note that if the target ownerReference doesn't exist in the
// dependent, strategic merge patch will NOT return an error.
if err != nil && !errors.IsNotFound(err) {
errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
}
}(dependents[i])
}
wg.Wait()
close(errCh)
var errorsSlice []error
for e := range errCh {
errorsSlice = append(errorsSlice, e)
}
if len(errorsSlice) != 0 {
return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
}
klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
return nil
}