Android Watchdog Study Notes

Watchdog monitors whether system services have become stuck. If one has, it restarts the system services.

1. Starting Watchdog

SystemServer creates the Watchdog and starts its run() loop:

            final Watchdog watchdog = Watchdog.getInstance();
            watchdog.init(context, mActivityManagerService);
            ...
            Watchdog.getInstance().start();

2. Monitoring services

2.1 Registering services to be monitored

A system service first registers itself with Watchdog:

        Watchdog.getInstance().addMonitor(this);
        Watchdog.getInstance().addThread(mHandler);

The registered objects are stored inside Watchdog:

public void addMonitor(Monitor monitor) {
    // Add the monitor object to the Monitor Checker.
    // As Watchdog's initialization shows, the Monitor Checker is itself a HandlerChecker object.
    mMonitors.add(monitor);
}

public void addThread(Handler thread, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        final String name = thread.getLooper().getThread().getName();
        // Build a HandlerChecker for the Handler; this is in effect a Looper Checker
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}

HandlerChecker calls each Monitor to check the service's state, and the next step depends on the result of that check.

2.2 How Watchdog monitors

Watchdog monitors continuously in its own run() method:

    public void run() {

        while (true) {
            synchronized (this) {
                long timeout = CHECK_INTERVAL;
                // 1. Start monitoring
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                // 2. Give the checkers some time to run (30s)
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    ...
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    ...
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                // 3. Check the completion state of the HandlerCheckers
                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    ...
                    continue;
                } else if (waitState == WAITING) {
                    ...
                    continue;
                } else if (waitState == WAITED_HALF) {
                    ...
                    continue;
                }

                // 4. At least one HandlerChecker has timed out
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }
            ...
            // 5. Save logs and decide whether the system process must be killed
            Slog.w(TAG, "*** GOODBYE!");
            Process.killProcess(Process.myPid());
        }
    }
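
Step 2 of the loop recomputes the remaining timeout after every wait(), so an early wakeup cannot shorten the interval. Below is a minimal plain-Java sketch of that pattern, under stated assumptions: WaitIntervalSketch, waitFullInterval, poke and demo are hypothetical names, and System.nanoTime() stands in for SystemClock.uptimeMillis(), which exists only on Android.

```java
class WaitIntervalSketch {
    private final Object lock = new Object();

    // Blocks until at least intervalMillis has elapsed and returns the elapsed
    // time in milliseconds. An early return from wait() just restarts the wait
    // with whatever time is left.
    long waitFullInterval(long intervalMillis) {
        synchronized (lock) {
            long start = System.nanoTime();
            long timeout = intervalMillis;
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    // The excerpt above logs and keeps waiting; this sketch only keeps waiting.
                }
                timeout = intervalMillis - elapsedMillis(start);
            }
            return elapsedMillis(start);
        }
    }

    // Wakes the waiter early, as a notify() or a spurious wakeup would.
    void poke() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    // Pokes the waiter after about 50 ms and returns how long a 200 ms wait took.
    static long demo() {
        WaitIntervalSketch sketch = new WaitIntervalSketch();
        Thread poker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sketch.poke();
        });
        poker.setDaemon(true);
        poker.start();
        return sketch.waitFullInterval(200);
    }

    public static void main(String[] args) {
        System.out.println("waited " + demo() + " ms");
    }
}
```

Even though the waiter is woken early, the loop goes back to waiting for the remainder, so the measured time should not fall below the requested interval.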

2.3 What happens while a service is monitored

HandlerChecker's scheduleCheckLocked() posts the HandlerChecker itself as a task, so that its own run() is executed on the monitored thread:

        public void scheduleCheckLocked() {
            ...
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);
        }
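
postAtFrontOfQueue places the check ahead of messages already waiting in the queue. The following is a small sketch of that ordering only, with its assumptions stated: FrontOfQueueSketch is a hypothetical stand-in for a Looper's message queue built on ArrayDeque, not the Android implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a Looper's message queue; not an Android class.
class FrontOfQueueSketch {
    private final Deque<Runnable> queue = new ArrayDeque<>();

    // Ordinary work goes to the back of the queue.
    void post(Runnable task) {
        queue.addLast(task);
    }

    // The check jumps ahead of everything already queued.
    void postAtFrontOfQueue(Runnable task) {
        queue.addFirst(task);
    }

    // Runs queued tasks in order until the queue is empty.
    void drain() {
        Runnable task;
        while ((task = queue.pollFirst()) != null) {
            task.run();
        }
    }

    // Queues two pieces of work, then a check at the front, and returns the
    // order in which they ran.
    static List<String> demo() {
        List<String> order = new ArrayList<>();
        FrontOfQueueSketch looper = new FrontOfQueueSketch();
        looper.post(() -> order.add("work-1"));
        looper.post(() -> order.add("work-2"));
        looper.postAtFrontOfQueue(() -> order.add("check"));
        looper.drain();
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One plausible reading of this design is that the check then measures whether the thread is currently stuck, rather than how much work happens to be queued in front of it.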

That run() in turn calls monitor() on each monitored service:

        public void run() {
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }
            // mCompleted is set to true here. If a deadlock or a long wait keeps it from being set in time, the check treats the service as having a problem.
            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }

A service's monitor() simply tries to acquire its lock:

    public void monitor() {
        synchronized (this) { }
    }

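Why does acquiring a lock reveal the service's state? A system service is called by many clients, has to handle multiple threads, and therefore necessarily uses locks. If a deadlock occurs, or some call gets stuck while holding the lock, Watchdog cannot acquire the lock and can conclude that the service is in an abnormal state.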
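
The lock-probe idea can be tried outside Android. This is a minimal sketch under stated assumptions: FakeService and LockProbeDemo are hypothetical names, and a helper thread plus join() stands in for the HandlerChecker and the timed wait in Watchdog.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a system service; not an Android class.
class FakeService {
    // Same shape as the monitor() above: acquire the lock, then release it.
    public void monitor() {
        synchronized (this) { }
    }
}

class LockProbeDemo {
    // Runs service.monitor() on a helper thread and reports whether it
    // returned within timeoutMillis (which must be greater than 0).
    static boolean probe(FakeService service, long timeoutMillis) {
        Thread checker = new Thread(service::monitor, "checker");
        checker.setDaemon(true); // a blocked probe must not keep the JVM alive
        checker.start();
        try {
            checker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !checker.isAlive();
    }

    // Simulates a stuck call: a thread takes the service's lock and keeps it
    // until the returned latch is counted down.
    static CountDownLatch holdLock(FakeService service) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service) {
                held.countDown();
                awaitQuietly(release);
            }
        }, "stuck-call");
        holder.setDaemon(true);
        holder.start();
        awaitQuietly(held); // return only once the lock is really held
        return release;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        FakeService service = new FakeService();
        System.out.println("idle service responds: " + probe(service, 500));
        CountDownLatch release = holdLock(service);
        System.out.println("stuck service responds: " + probe(service, 200));
        release.countDown();
    }
}
```

The probe learns nothing about what the service is doing. It only observes that an empty synchronized block could not be entered in time, which is the signal the paragraph above describes.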

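2.4 Checking state

Watchdog judges each service's state through the Handlers the services registered (stored in Watchdog's mHandlerCheckers). It mainly checks whether the monitor() calls completed; if they did not, it computes how long the check has been waiting.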

    private int evaluateCheckerCompletionLocked() {
        int state = COMPLETED;
        for (int i=0; i<mHandlerCheckers.size(); i++) {
            HandlerChecker hc = mHandlerCheckers.get(i);
            state = Math.max(state, hc.getCompletionStateLocked());
        }
        return state;
    }

        public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }
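
The two methods above can be exercised together in plain Java. This is a sketch under assumptions: CompletionStateSketch, stateOf and worstOf are hypothetical names, the numeric values of the constants are assumed (the code above only needs them to be ordered so that Math.max picks the worst state), and the 60-second limit in main is an illustrative value.

```java
class CompletionStateSketch {
    // Assumed values; only the ordering from best to worst matters here.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Mirrors getCompletionStateLocked(): classify one checker by how long
    // its check has been outstanding relative to its limit.
    static int stateOf(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Mirrors evaluateCheckerCompletionLocked(): the overall state is the
    // worst state among all checkers.
    static int worstOf(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        long waitMax = 60_000; // illustrative limit, in milliseconds
        int a = stateOf(true, 0, waitMax);
        int b = stateOf(false, 45_000, waitMax);
        System.out.println("overall state: " + worstOf(a, b));
    }
}
```

A single slow checker is enough to raise the overall state, which is why the loop in run() only proceeds to the kill path when the worst state is OVERDUE.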

2.5 Handling the abnormal state

An abnormal state is handled in one of two ways:
1. Kill the abnormal process

Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: XXX
Watchdog: XXX
Watchdog: *** GOODBYE!

2. Reboot

Rebooting system because:xxx