前言
Service發(fā)生ANR時(shí)一直存在一個(gè)誤區(qū),一直認(rèn)為
2121 bumpServiceExecutingLocked(r, execInFg, "bind");
2558 bumpServiceExecutingLocked(r, execInFg, "create");
2682 bumpServiceExecutingLocked(r, execInFg, "start");
2808 bumpServiceExecutingLocked(r, false, "bring down unbind");
2908 bumpServiceExecutingLocked(r, false, "destroy");
3004 bumpServiceExecutingLocked(s, false, "unbind");
這些操作每一個(gè)不超過(guò)timeout時(shí)間就行了,后來(lái)發(fā)現(xiàn)其實(shí)不是這樣,以下面的demo為例:
客戶端
class MainHandler extends Handler{
public static final int MSG_TEST_1 = 1;
public static final int MSG_TEST_2 = 2;
public static final int MSG_TEST_3 = 3;
public static final int MSG_TEST_4 = 4;
public static final int MSG_TEST_5 = 5;
MainHandler(Looper loop){
super(loop);
}
@Override
public void handleMessage(Message msg) {
switch (msg.what){
case MSG_TEST_1:
Log.i("weijuncheng","msg 1 start");
try {
Thread.sleep(3500);
} catch (InterruptedException e) {
e.printStackTrace();
}
Log.i("weijuncheng","msg 1 end");
break;
case MSG_TEST_2:
Log.i("weijuncheng","msg 2 start");
Log.i("weijuncheng","msg 2 end");
break;
case MSG_TEST_3:
Log.i("weijuncheng","msg 3 start");
try {
Thread.sleep(3500);
} catch (InterruptedException e) {
e.printStackTrace();
}
Log.i("weijuncheng","msg 3 end");
break;
case MSG_TEST_4:
Log.i("weijuncheng","msg 4 start");
Log.i("weijuncheng","msg 4 end");
break;
case MSG_TEST_5:
Log.i("weijuncheng","msg 5 start");
Log.i("weijuncheng","msg 5 end");
break;
}
}
}
public class MainActivity extends AppCompatActivity {
private Handler mHandler;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
mHandler = new MainHandler(MainActivity.this.getMainLooper());
Button btn = (Button)findViewById(R.id.btn);
btn.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
mHandler.post(new Runnable(){
@Override
public void run() {
Log.i("weijuncheng","message 0 start");
try {
Thread.sleep(3500);
} catch (InterruptedException e) {
e.printStackTrace();
}
Log.i("weijuncheng","message 0 end");
}});
//上面的post Message打不出來(lái),會(huì)不會(huì)是post發(fā)送的原因
mHandler.sendEmptyMessage(MainHandler.MSG_TEST_1);
mHandler.sendEmptyMessage(MainHandler.MSG_TEST_2);
mHandler.sendEmptyMessage(MainHandler.MSG_TEST_4);
mHandler.sendEmptyMessageDelayed(MainHandler.MSG_TEST_5,100);
startService(new Intent(MainActivity.this, TestAnrService.class));
}
});
mHandler.sendEmptyMessageDelayed(MainHandler.MSG_TEST_3,1000);
}
}
服務(wù)端
public class TestAnrService extends Service {
private ITestAnrService.Stub IService = new ITestAnrService.Stub() {
@Override
public void basicTypes(int anInt, long aLong, boolean aBoolean, float aFloat, double aDouble, String aString) throws RemoteException {
}
@Override
public void Method1() throws RemoteException {
}
@Override
public void Method2() throws RemoteException {
}
};
public TestAnrService() {
}
@Override
public void onCreate() {
Log.i("weijuncheng","onCreate start");
super.onCreate();
try {
Thread.sleep(3500);
} catch (InterruptedException e) {
e.printStackTrace();
}
Log.i("weijuncheng","onCreate end");
}
@Override
public IBinder onBind(Intent intent) {
// TODO: Return the communication channel to the service.
return IService;
}
@Override
public int onStartCommand(Intent intent, int flags, int startId) {
Log.i("weijuncheng","onStartCommand start");
try {
Thread.sleep(30000);
} catch (InterruptedException e) {
e.printStackTrace();
}
Log.i("weijuncheng","onStartCommand end");
return super.onStartCommand(intent, flags, startId);
}
}
按照前面的理解,AMS的每個(gè)操作都不超過(guò)timeout時(shí)間,那么就產(chǎn)生問(wèn)題了,按理說(shuō)onCreate前面確實(shí)等了很長(zhǎng)時(shí)間,但是onCreate執(zhí)行完了,system_server中的ActivityMananger線程對(duì)應(yīng)的Handler應(yīng)該就會(huì)把相應(yīng)的超時(shí)消息移除了啊
整理
我們回顧下移除的位置serviceDoneExecutingLocked:
3138 private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
3139 boolean finishing) {
3140 if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "<<< DONE EXECUTING " + r
3141 + ": nesting=" + r.executeNesting
3142 + ", inDestroying=" + inDestroying + ", app=" + r.app);
3143 else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
3144 "<<< DONE EXECUTING " + r.shortName);
3145 r.executeNesting--;
3146 if (r.executeNesting <= 0) {
3147 if (r.app != null) {
3148 if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
3149 "Nesting at 0 of " + r.shortName);
3150 r.app.execServicesFg = false;
3151 r.app.executingServices.remove(r);
3152 if (r.app.executingServices.size() == 0) {
3153 if (DEBUG_SERVICE || DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
3154 "No more executingServices of " + r.shortName);
3155 mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
3156 } else if (r.executeFg) {
3157 // Need to re-evaluate whether the app still needs to be in the foreground.
3158 for (int i=r.app.executingServices.size()-1; i>=0; i--) {
3159 if (r.app.executingServices.valueAt(i).executeFg) {
3160 r.app.execServicesFg = true;
3161 break;
3162 }
3163 }
3164 }
3165 if (inDestroying) {
3166 if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
3167 "doneExecuting remove destroying " + r);
3168 mDestroyingServices.remove(r);
3169 r.bindings.clear();
3170 }
3171 mAm.updateOomAdjLocked(r.app, true);
3172 }
3173 r.executeFg = false;
3174 if (r.tracker != null) {
3175 r.tracker.setExecuting(false, mAm.mProcessStats.getMemFactorLocked(),
3176 SystemClock.uptimeMillis());
3177 if (finishing) {
3178 r.tracker.clearCurrentOwner(r, false);
3179 r.tracker = null;
3180 }
3181 }
3182 if (finishing) {
3183 if (r.app != null && !r.app.persistent) {
3184 r.app.services.remove(r);
3185 if (r.whitelistManager) {
3186 updateWhitelistManagerLocked(r.app);
3187 }
3188 }
3189 r.app = null;
3190 }
3191 }
3192 }
可以看到,取消超時(shí)Message的時(shí)候有一個(gè)條件r.executeNesting <= 0;每執(zhí)行一個(gè)操作會(huì)+1,執(zhí)行完進(jìn)程會(huì)通知AMS,調(diào)用serviceDoneExecutingLocked的executeNesting--;但是就以startService為例,其操作不止一個(gè);當(dāng)客戶端進(jìn)程通過(guò)binder call調(diào)用startService時(shí),system_server一個(gè)binder線程A響應(yīng),如執(zhí)行realStartServiceLocked;在中間會(huì)再調(diào)用binder call對(duì)服務(wù)端進(jìn)行操作;服務(wù)端的一個(gè)binder線程響應(yīng),將相應(yīng)消息發(fā)送到服務(wù)端進(jìn)程的主線程;等到執(zhí)行完畢,通過(guò)binder call調(diào)用AMS中的binder線程B執(zhí)行serviceDoneExecutingLocked,r.executeNesting--;但是system_server的binder線程A不會(huì)等待B的執(zhí)行結(jié)果再繼續(xù)執(zhí)行;那么executeNesting的值很可能為2或者更多,就像上面那種情況;所以說(shuō)Service ANR的是以一個(gè)操作來(lái)判斷是否ANR,其粒度不是create,start,bind這種粒度,而是如startService這一個(gè)操作,需要在第一個(gè)超時(shí)消息真正被執(zhí)行前處理完(removeMessage時(shí)會(huì)將所有what值相同的Message全部移除)
加上debug service log驗(yàn)證:
12-21 11:28:26.309 1000 1487 9392 V ActivityManager: >>> EXECUTING create of ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService} in app ProcessRecord{5519ab8 11495:com.test.weijuncheng.testanr_ipc_server/u0a172}
12-21 11:28:26.311 1000 1487 9392 V ActivityManager: Sending arguments to: ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService} android.content.Intent$FilterComparison@e16fe3c9 args=Intent { cmp=com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService }
12-21 11:28:26.311 1000 1487 9392 V ActivityManager: >>> EXECUTING start of ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService} in app ProcessRecord{5519ab8 11495:com.test.weijuncheng.testanr_ipc_server/u0a172}
12-21 11:28:36.863 10172 11495 11495 I weijuncheng: onCreate end
12-21 11:28:36.868 1000 1487 4648 V ActivityManager: <<< DONE EXECUTING ServiceRecord{da8991b u0 com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService}: nesting=2, inDestroying=false, app=ProcessRecord{5519ab8 11495:com.test.weijuncheng.testanr_ipc_server/u0a172}
12-21 11:28:36.871 10172 11495 11495 I weijuncheng: onStartCommand start
針對(duì)小米機(jī)型的技巧
在小米機(jī)型上,許多機(jī)器都有Message統(tǒng)計(jì)的功能,統(tǒng)計(jì)哪些Message等待時(shí)間,執(zhí)行時(shí)間較長(zhǎng),在/data/anr下有anr_info_processName.txt;上面的demo生成的文件如下:
ANR in com.test.weijuncheng.testanr_ipc_server
PID: 10504
Reason: executing service com.test.weijuncheng.testanr_ipc_server/.Service.TestAnrService
package com.test.weijuncheng.testanr_ipc_server version Code: 1 version Name: 1.0 cur loop is : Looper (main, tid 2) {16ea0c0}
Dump time : 2018-12-14_09:42:55.746
---------- History of long time messages on Looper (main, tid 2) {16ea0c0}----------
#0: { what=114 target=android.app.ActivityThread$H when=2018-12-14_09:42:35.635 latency=+7s8ms processing=+3s518ms }
#1: { what=2 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:35.631 latency=+7s12ms processing=+1ms }
#2: { what=1 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:35.631 latency=+3s509ms processing=+3s503ms }
#3: { callback=com.test.weijuncheng.testanr_ipc_server.MainActivity$1$1 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:35.631 latency=+7ms processing=+3s501ms }
#4: { what=3 target=com.test.weijuncheng.testanr_ipc_server.MainHandler when=2018-12-14_09:42:32.085 latency=+1ms processing=+3s502ms }
#5: { what=159 target=android.app.ActivityThread$H when=2018-12-14_09:42:30.820 latency=+80ms processing=+217ms }
-------------------------- END --------------------------
---------- Dump Current Running Message ----------
{ what=115 target=android.app.ActivityThread$H when=2018-12-14_09:42:35.636 latency=+10s527ms }
-------------------------- END --------------------------
可見(jiàn),其中有三個(gè)關(guān)鍵數(shù)據(jù) when,latency,processing
when-理想中被取出的時(shí)間
b.append(" when=" + DATE_FORMATTER.format(new Date(planCurrentTime)));
long planCurrentTime; // using the java.lang.System.currentTimeMillis() time-base.
從表現(xiàn)看是入隊(duì)的真實(shí)時(shí)間 (應(yīng)該說(shuō)是Message期望被取出的時(shí)間,假設(shè)隊(duì)列為空,when = 10s,那么其實(shí)Message早就入隊(duì)了,只不過(guò)需要等10s,期待10s后被取出運(yùn)行;然而實(shí)際被取出的時(shí)間就不一定了)
latency-等待時(shí)間
b.append(" latency=" + TimeUtils.formatDuration(getLatencyMillis()));
324 // unexcepted delay time
325 long getLatencyMillis() {
326 return dispatchTime - planTime; //真正從MessageQueue被取出執(zhí)行的時(shí)間-入隊(duì)時(shí)間;即真正開(kāi)始執(zhí)行前的等待時(shí)間
327 }
#0: { what=114 target=android.app.ActivityThread$H when=2018-12-14_06:43:07.762 latency=+7s9ms processing=+3s518ms }
等待了7s,因?yàn)殛?duì)列中前兩個(gè)Message每個(gè)耗時(shí)3.5s
processing-處理時(shí)間
b.append(" processing=" + TimeUtils.formatDuration(getProcessMillis()));
329 long getProcessMillis() {
330 if (isFinished()) {
331 return finishTime - dispatchTime; //消息真正執(zhí)行的時(shí)常
332 } else {
333 return 0; //如果在dump時(shí)消息還沒(méi)執(zhí)行完成,返回0
334 }
335 }
#0: { what=114 target=android.app.ActivityThread$H when=2018-12-14_06:43:07.762 latency=+7s9ms processing=+3s518ms }
這個(gè)執(zhí)行時(shí)長(zhǎng)3.5s,符合demo的邏輯
那么當(dāng)前消息已執(zhí)行時(shí)間 = Dump time : 2018-12-14_06:43:27.901 - 2018-12-14_06:43:07.762 - latency=+10s527ms = 9.5s 也就是執(zhí)行了9.5s,最后ANR了
---------- History of long time messages on Looper (main, tid 2) {16ea0c0}----------
這個(gè)順序是倒過(guò)來(lái),按入隊(duì)時(shí)間從早到晚是#5,#4,#3,#2,#1,#0
打印的閥值是這樣設(shè)置的latency + processing > 200 就記錄
根據(jù)這些信息,就可以判斷出哪個(gè)Message耗時(shí)了
總結(jié)
那么耗時(shí)的點(diǎn)可能在system_server中的ActivityManager線程相關(guān)Handler耗時(shí),ANR進(jìn)程的主線程Handler耗時(shí),binder call耗時(shí)3部分;一般是后兩種情況,第一種一般不太可能