ANR相關(guān)-Service ANR流程理解誤區(qū)

前言

Service發(fā)生ANR時(shí)一直存在一個(gè)誤區(qū),一直認(rèn)為

2121 bumpServiceExecutingLocked(r, execInFg, "bind");
2558 bumpServiceExecutingLocked(r, execInFg, "create");
2682 bumpServiceExecutingLocked(r, execInFg, "start");
2808 bumpServiceExecutingLocked(r, false, "bring down unbind");
2908 bumpServiceExecutingLocked(r, false, "destroy");
3004 bumpServiceExecutingLocked(s, false, "unbind");

這些操作每一個(gè)不超過(guò)timeout時(shí)間就行了,后來(lái)發(fā)現(xiàn)其實(shí)不是這樣,以下面的demo為例:

客戶端

class MainHandler extends  Handler{
    public static final int MSG_TEST_1 = 1;
    public static final int MSG_TEST_2 = 2;
    public static final int MSG_TEST_3 = 3;
    public static final int MSG_TEST_4 = 4;
    public static final int MSG_TEST_5 = 5;

    MainHandler(Looper loop){
        super(loop);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what){
            case MSG_TEST_1:
                Log.i("weijuncheng","msg 1 start");
                try {
                    Thread.sleep(3500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                Log.i("weijuncheng","msg 1 end");
                break;
            case MSG_TEST_2:
                Log.i("weijuncheng","msg 2 start");
                Log.i("weijuncheng","msg 2 end");
                break;
            case MSG_TEST_3:
                Log.i("weijuncheng","msg 3 start");
                try {
                    Thread.sleep(3500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                Log.i("weijuncheng","msg 3 end");
                break;
            case MSG_TEST_4:
                Log.i("weijuncheng","msg 4 start");
                Log.i("weijuncheng","msg 4 end");
                break;
            case MSG_TEST_5:
                Log.i("weijuncheng","msg 5 start");
                Log.i("weijuncheng","msg 5 end");
                break;

        }
    }
}


public class MainActivity extends AppCompatActivity {

    private Handler mHandler;


    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        mHandler = new MainHandler(MainActivity.this.getMainLooper());

        Button btn = (Button)findViewById(R.id.btn);
        btn.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                mHandler.post(new Runnable(){
                    @Override
                    public void run() {
                        Log.i("weijuncheng","message 0 start");
                        try {
                            Thread.sleep(3500);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        Log.i("weijuncheng","message 0 end");
                    }});
                //上面的post Message打不出來(lái),會(huì)不會(huì)是post發(fā)送的原因
                mHandler.sendEmptyMessage(MainHandler.MSG_TEST_1);
                mHandler.sendEmptyMessage(MainHandler.MSG_TEST_2);
                mHandler.sendEmptyMessage(MainHandler.MSG_TEST_4);
                mHandler.sendEmptyMessageDelayed(MainHandler.MSG_TEST_5,100);
                startService(new Intent(MainActivity.this, TestAnrService.class));
            }

        });

        mHandler.sendEmptyMessageDelayed(MainHandler.MSG_TEST_3,1000);
    }
}

服務(wù)端

public class TestAnrService extends Service {

    private ITestAnrService.Stub IService = new ITestAnrService.Stub() {
        @Override
        public void basicTypes(int anInt, long aLong, boolean aBoolean, float aFloat, double aDouble, String aString) throws RemoteException {

        }

        @Override
        public void Method1() throws RemoteException {

        }

        @Override
        public void Method2() throws RemoteException {

        }
    };

    public TestAnrService() {
    }


    @Override
    public void onCreate() {
        Log.i("weijuncheng","onCreate start");
        super.onCreate();
        try {
            Thread.sleep(3500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        Log.i("weijuncheng","onCreate end");
    }

    @Override
    public IBinder onBind(Intent intent) {
        // TODO: Return the communication channel to the service.
        return IService;
    }

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        Log.i("weijuncheng","onStartCommand start");
        try {
            Thread.sleep(30000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        Log.i("weijuncheng","onStartCommand end");
        return super.onStartCommand(intent, flags, startId);
    }
}

按照前面的理解,AMS的每個(gè)操作都不超過(guò)timeout時(shí)間,那么就產(chǎn)生問(wèn)題了,按理說(shuō)onCreate前面確實(shí)等了很長(zhǎng)時(shí)間,但是onCreate執(zhí)行完了,system_server中的ActivityMananger線程對(duì)應(yīng)的Handler應(yīng)該就會(huì)把相應(yīng)的超時(shí)消息移除了啊

整理

我們回顧下移除的位置serviceDoneExecutingLocked:

3138    private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
3139            boolean finishing) {
3140        if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "<<< DONE EXECUTING " + r
3141                + ": nesting=" + r.executeNesting
3142                + ", inDestroying=" + inDestroying + ", app=" + r.app);
3143        else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
3144                "<<< DONE EXECUTING " + r.shortName);
3145        r.executeNesting--;
3146        if (r.executeNesting <= 0) {
3147            if (r.app != null) {
3148                if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
3149                        "Nesting at 0 of " + r.shortName);
3150                r.app.execServicesFg = false;
3151                r.app.executingServices.remove(r);
3152                if (r.app.executingServices.size() == 0) {
3153                    if (DEBUG_SERVICE || DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
3154                            "No more executingServices of " + r.shortName);
3155                    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
3156                } else if (r.executeFg) {
3157                    // Need to re-evaluate whether the app still needs to be in the foreground.
3158                    for (int i=r.app.executingServices.size()-1; i>=0; i--) {
3159                        if (r.app.executingServices.valueAt(i).executeFg) {
3160                            r.app.execServicesFg = true;
3161                            break;
3162                        }
3163                    }
3164                }
3165                if (inDestroying) {
3166                    if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
3167                            "doneExecuting remove destroying " + r);
3168                    mDestroyingServices.remove(r);
3169                    r.bindings.clear();
3170                }
3171                mAm.updateOomAdjLocked(r.app, true);
3172            }
3173            r.executeFg = false;
3174            if (r.tracker != null) {
3175                r.tracker.setExecuting(false, mAm.mProcessStats.getMemFactorLocked(),
3176                        SystemClock.uptimeMillis());
3177                if (finishing) {
3178                    r.tracker.clearCurrentOwner(r, false);
3179                    r.tracker = null;
3180                }
3181            }
3182            if (finishing) {
3183                if (r.app != null && !r.app.persistent) {
3184                    r.app.services.remove(r);
3185                    if (r.whitelistManager) {
3186                        updateWhitelistManagerLocked(r.app);
3187                    }
3188                }
3189                r.app = null;
3190            }
3191        }
3192    }

可以看到,取消超時(shí)Message的時(shí)候有一個(gè)條件r.executeNesting <= 0;每執(zhí)行一個(gè)操作會(huì)+1,執(zhí)行完進(jìn)程會(huì)通知AMS,調(diào)用serviceDoneExecutingLocked的executeNesting--;但是就以startService為例,其操作不止一個(gè);當(dāng)客戶端進(jìn)程通過(guò)binder call調(diào)用startService時(shí),system_server一個(gè)binder線程A響應(yīng),如執(zhí)行realStartServiceLocked;在中間會(huì)再調(diào)用binder call對(duì)服務(wù)端進(jìn)行操作;服務(wù)端的一個(gè)binder線程響應(yīng),將相應(yīng)消息發(fā)送到服務(wù)端進(jìn)程的主線程;等到執(zhí)行完畢,通過(guò)binder call調(diào)用AMS中的binder線程B執(zhí)行serviceDoneExecutingLocked,r.executeNesting--;但是system_server的binder線程A不會(huì)等待B的執(zhí)行結(jié)果再繼續(xù)執(zhí)行;那么executeNesting的值很可能為2或者更多,就像上面那種情況;所以說(shuō)Service ANR的是以一個(gè)操作來(lái)判斷是否ANR,其粒度不是create,start,bind這種粒度,而是如startService這一個(gè)操作,需要在第一個(gè)超時(shí)消息真正被執(zhí)行前處理完(removeMessage時(shí)會(huì)將所有what值相同的Message全部移除)

加上debug service log驗(yàn)證:

12-21 11:28:26.309  1000  1487  9392 V ActivityManager: >>> EXECUTING create of ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService} in app ProcessRecord{5519ab8 11495:com.test.weijuncheng.testanr_ipc_server/u0a172}
12-21 11:28:26.311  1000  1487  9392 V ActivityManager: Sending arguments to: ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService} android.content.Intent$FilterComparison@e16fe3c9 args=Intent { cmp=com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService }
12-21 11:28:26.311  1000  1487  9392 V ActivityManager: >>> EXECUTING start of ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService} in app ProcessRecord{5519ab8 11495:com.test.weijuncheng.testanr_ipc_server/u0a172}

12-21 11:28:36.863 10172 11495 11495 I weijuncheng: onCreate end
12-21 11:28:36.868  1000  1487  4648 V ActivityManager: <<< DONE EXECUTING ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService}: nesting=2, inDestroying=false, app=ProcessRecord{5519ab8 11495:com.test.weijuncheng.testanr_ipc_server/u0a172}
12-21 11:28:36.871 10172 11495 11495 I weijuncheng: onStartCommand start

針對(duì)小米機(jī)型的技巧

在小米機(jī)型上,許多機(jī)器都有Message統(tǒng)計(jì)的功能,統(tǒng)計(jì)哪些Message等待時(shí)間,執(zhí)行時(shí)間較長(zhǎng),在/data/anr下有anr_info_processName.txt;上面的demo生成的文件如下:

ANR in com.test.weijuncheng.testanr_ipc_server
PID: 10504
Reason: executing service com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService
 package com.test.weijuncheng.testanr_ipc_server version Code: 1 version Name: 1.0 cur loop is : Looper (main, tid 2) {16ea0c0}
Dump time : 2018-12-14_09:42:55.746
---------- History of long time messages on Looper (main, tid 2) {16ea0c0}----------
#0: { what=114 target=android.app.ActivityThread$H when=2018-12-14_09:42:35.635 latency=+7s8ms processing=+3s518ms }
#1: { what=2 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:35.631 latency=+7s12ms processing=+1ms }
#2: { what=1 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:35.631 latency=+3s509ms processing=+3s503ms }
#3: { callback=com.test.weijuncheng.testanr_ipc_server.MainActivity$1$1 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:35.631 latency=+7ms processing=+3s501ms }
#4: { what=3 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:32.085 latency=+1ms processing=+3s502ms }
#5: { what=159 target=android.app.ActivityThread$H when=2018-12-14_09:42:30.820 latency=+80ms processing=+217ms }
-------------------------- END --------------------------
---------- Dump Current Running Message ----------
{ what=115 target=android.app.ActivityThread$H when=2018-12-14_09:42:35.636 latency=+10s527ms }
-------------------------- END --------------------------

可見(jiàn),其中有三個(gè)關(guān)鍵數(shù)據(jù) when,latency,processing

when-理想中被取出的時(shí)間

b.append(" when=" + DATE_FORMATTER.format(new Date(planCurrentTime)));
long planCurrentTime;     // using the java.lang.System.currentTimeMillis() time-base. 

從表現(xiàn)看是入隊(duì)的真實(shí)時(shí)間 (應(yīng)該說(shuō)是Message期望被取出的時(shí)間,假設(shè)隊(duì)列為空,when = 10s,那么其實(shí)Message早就入隊(duì)了,只不過(guò)需要等10s,期待10s后被取出運(yùn)行;然而實(shí)際被取出的時(shí)間就不一定了)

latency-等待時(shí)間

b.append(" latency=" + TimeUtils.formatDuration(getLatencyMillis()));
324        // unexcepted delay time
325        long getLatencyMillis() {
326            return dispatchTime - planTime; //真正從MessageQueue被取出執(zhí)行的時(shí)間-入隊(duì)時(shí)間;即真正開(kāi)始執(zhí)行前的等待時(shí)間
327        }
#0: { what=114 target=android.app.ActivityThread$H when=2018-12-14_06:43:07.762 latency=+7s9ms processing=+3s518ms }

等待了7s,因?yàn)殛?duì)列中前兩個(gè)Message每個(gè)耗時(shí)3.5s

processing-處理時(shí)間

b.append(" processing=" + TimeUtils.formatDuration(getProcessMillis()));

329        long getProcessMillis() {
330            if (isFinished()) {
331                return finishTime - dispatchTime; //消息真正執(zhí)行的時(shí)常
332            } else {
333                return 0; //如果在dump時(shí)消息還沒(méi)執(zhí)行完成,返回0
334            }
335        }

#0: { what=114 target=android.app.ActivityThread$H when=2018-12-14_06:43:07.762 latency=+7s9ms processing=+3s518ms }

這個(gè)執(zhí)行時(shí)長(zhǎng)3.5s,符合demo的邏輯

那么當(dāng)前消息已執(zhí)行時(shí)間 = Dump time : 2018-12-14_06:43:27.901 - 2018-12-14_06:43:07.762 - latency=+10s527ms = 9.5s 也就是執(zhí)行了9.5s,最后ANR了

---------- History of long time messages on Looper (main, tid 2) {16ea0c0}---------- 

這個(gè)順序是倒過(guò)來(lái),按入隊(duì)時(shí)間從早到晚是#5,#4,#3,#2,#1,#0
打印的閥值是這樣設(shè)置的latency + processing > 200 就記錄
根據(jù)這些信息,就可以判斷出哪個(gè)Message耗時(shí)了

總結(jié)

那么耗時(shí)的點(diǎn)可能在system_server中的ActivityManager線程相關(guān)Handler耗時(shí),ANR進(jìn)程的主線程Handler耗時(shí),binder call耗時(shí)3部分;一般是后兩種情況,第一種一般不太可能

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容