前言
ANR是Android中經(jīng)常遇到的問(wèn)題,常規(guī)的ANR問(wèn)題,一般可以通過(guò)adb日志和trace文件,找到導(dǎo)致ANR的原因,但是有很多偶發(fā)的ANR問(wèn)題,難以采用常規(guī)的手段來(lái)分析的,通過(guò)學(xué)習(xí)字節(jié)跳動(dòng)整治ANR的系列文章,聊聊自己的感悟。
字節(jié)跳動(dòng)技術(shù)團(tuán)隊(duì)整治ANR系列文章
一、工程師的噩夢(mèng)
如下主線程的trace日志,想必這個(gè)日志是工程師的噩夢(mèng),為什么是噩夢(mèng)?因?yàn)橐话阌龅竭@種日志,如果應(yīng)用的日志不多,系統(tǒng)的日志不多,基本就是無(wú)解的存在。
android.os.MessageQueue.nativePollOnce(Native Method)
android.os.MessageQueue.next(MessageQueue.java:337)
android.os.Looper.loopOnce(Looper.java:184)
android.os.Looper.loop(Looper.java:324)
android.app.ActivityThread.main(ActivityThread.java:8140)
java.lang.reflect.Method.invoke(Native Method)
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1009)
二、字節(jié)整治ANR法寶
字節(jié)整治ANR的兩大法寶,Raster和Sliver
2.1 Raster
Raster主要作用:記錄主線程Message歷史信息

2.2 Sliver
Sliver主要作用:記錄線程歷史堆棧調(diào)用信息

小結(jié)
由于這兩個(gè)功能的代碼并沒(méi)有開(kāi)源,我自己簡(jiǎn)單的實(shí)現(xiàn)一下部分功能,來(lái)驗(yàn)證一下,這兩個(gè)法寶能否真正的破局ANR。
三、制造一個(gè)ANR現(xiàn)場(chǎng)
首先制造一個(gè)ANR的現(xiàn)場(chǎng),方便后面驗(yàn)證,代碼也很簡(jiǎn)單,只需要兩個(gè)類。
這個(gè)代碼進(jìn)入界面以后,按兩下back鍵就可以出現(xiàn)ANR
public class MainActivity extends AppCompatActivity {
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
}
@Override
protected void onResume() {
super.onResume();
Intent intent = new Intent(this, Service.class);
this.startService(intent);
}
}
public class Service extends android.app.Service {
@Nullable
@Override
public IBinder onBind(Intent intent) {
return null;
}
@Override
public void onCreate() {
super.onCreate();
try {
Thread.sleep(7000);
} catch (Exception e) {
e.printStackTrace();
}
}
}
四、簡(jiǎn)化版Raster和Sliver設(shè)計(jì)
4.1 簡(jiǎn)化版Raster
保存并打印歷史Message以及待處理的Message
4.2 簡(jiǎn)化版的Sliver
直接調(diào)用如下代碼打印主線程的堆棧。
Looper.getMainLooper().getThread().getStackTrace()
4.3 觸發(fā)的時(shí)間
在am_anr和ANR in的兩個(gè)時(shí)機(jī)同時(shí)觸發(fā)一次簡(jiǎn)化版Raster和Sliver,
am_anr比較接近ANR觸發(fā)點(diǎn),
ANR in其實(shí)是ANR觸發(fā)以后,過(guò)了一段時(shí)間了。
五、實(shí)踐
如下是我使用簡(jiǎn)化版的Raster和Sliver抓到的日志
5.1 am_anr時(shí)候觸發(fā)
-----------HISTORY MSG-----------
Current Time:70022
Working Message:start=64631 delay=278ms took=0ms repeat=0 { when=0 what=114 obj=CreateServiceData{token=android.os.BinderProxy@2d85d09 className=com.kobe.fpstest.Service packageName=com.kobe.fpstest intent=null} target=android.app.ActivityThread$H }
Message 0 :start=64150 delay=5ms took=3ms repeat=0 { when=0 what=162 obj=/data/user/0/com.kobe.fpstest target=android.app.ActivityThread$H }
Message 1 :start=64153 delay=2ms took=1ms repeat=0 { when=0 what=138 obj=Bundle[mParcelledData.dataSize=1172] target=android.app.ActivityThread$H }
Message 2 :start=64154 delay=3ms took=73ms repeat=0 { when=0 what=110 obj=AppBindData{appInfo=ApplicationInfo{a8c29d6 com.kobe.fpstest}} target=android.app.ActivityThread$H }
Message 3 :start=64227 delay=75ms took=151ms repeat=0 { when=0 what=159 obj=android.app.servertransaction.ClientTransaction@5058edb0 target=android.app.ActivityThread$H }
Message 4 :start=64378 delay=225ms took=0ms repeat=0 { when=0 what=159 obj=android.app.servertransaction.ClientTransaction@3f67e3d1 target=android.app.ActivityThread$H }
Message 5 :start=64378 delay=154ms took=1ms repeat=0 { when=0 what=164 obj=com.kobe.fpstest target=android.app.ActivityThread$H }
Message 6 :start=64379 delay=33ms took=0ms repeat=0 { when=0 callback=androidx.core.content.res.ResourcesCompat$FontCallback$2 target=android.os.Handler }
-----------TODO MSG-----------
Message 0 :{ when=-5s670ms what=115 obj=ServiceArgsData{token=android.os.BinderProxy@2d85d09 startId=1 args=Intent { cmp=com.kobe.fpstest/.Service }} target=android.app.ActivityThread$H }
Message 1 :{ when=-5s659ms callback=android.view.ViewRootImpl$4 target=android.view.ViewRootImpl$ViewRootHandler }
Message 2 :{ when=-5s653ms barrier=0 }
Message 3 :{ when=-5s643ms callback=android.view.Choreographer$FrameDisplayEventReceiver target=android.view.Choreographer$FrameHandler }
Message 4 :{ when=-5s91ms what=8 arg1=1 target=android.view.ViewRootImpl$ViewRootHandler }
Message 5 :{ when=-5s91ms what=149 obj=android.os.BinderProxy@fd199f1 target=android.app.ActivityThread$H }
Message 6 :{ when=-4s873ms callback=android.app.ActivityThread$$ExternalSyntheticLambda1 target=android.app.ActivityThread$H }
-----------MAIN THREAD STACK-----------
java.lang.Thread.sleep(Native Method)
java.lang.Thread.sleep(Thread.java:451)
java.lang.Thread.sleep(Thread.java:356)
com.kobe.fpstest.Service.onCreate(Service.java:25)
android.app.ActivityThread.handleCreateService(ActivityThread.java:4664)
android.app.ActivityThread.access$1700(ActivityThread.java:281)
android.app.ActivityThread$H.handleMessage(ActivityThread.java:2212)
android.os.Handler.dispatchMessage(Handler.java:106)
android.os.Looper.loopOnce(Looper.java:231)
android.os.Looper.loop(Looper.java:324)
android.app.ActivityThread.main(ActivityThread.java:8140)
java.lang.reflect.Method.invoke(Native Method)
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1009)
可以看到Working Message的start是64631,Current Time是70022,也就說(shuō)這個(gè)Message目前已經(jīng)運(yùn)行了70022-64631=5391,也就是5.3秒。
正好堆棧抓到了java.lang.Thread.sleep(Native Method),這個(gè)堆棧又是屬于Working Message。
5.2 ANR in時(shí)候觸發(fā)
-----------HISTORY MSG-----------
Current Time:75887
Working Message:null
Message 0 :start=64631 delay=278ms took=7002ms repeat=0 { when=0 what=114 obj=CreateServiceData{token=android.os.BinderProxy@2d85d09 className=com.kobe.fpstest.Service packageName=com.kobe.fpstest intent=null} target=android.app.ActivityThread$H }
Message 1 :start=71633 delay=7280ms took=2ms repeat=0 { when=0 what=115 obj=ServiceArgsData{token=android.os.BinderProxy@2d85d09 startId=1 args=Intent { cmp=com.kobe.fpstest/.Service }} target=android.app.ActivityThread$H }
Message 2 :start=71635 delay=7270ms took=2ms repeat=0 { when=0 callback=android.view.ViewRootImpl$4 target=android.view.ViewRootImpl$ViewRootHandler }
Message 3 :start=71637 delay=7255ms took=137ms repeat=0 { when=0 callback=android.view.Choreographer$FrameDisplayEventReceiver target=android.view.Choreographer$FrameHandler }
Message 4 :start=71774 delay=71774ms took=1ms repeat=0 { when=0 callback=android.view.ViewRootImpl$$ExternalSyntheticLambda12 target=android.view.ViewRootImpl$ViewRootHandler }
Message 5 :start=71775 delay=6841ms took=0ms repeat=0 { when=0 what=8 arg1=1 target=android.view.ViewRootImpl$ViewRootHandler }
Message 6 :start=71775 delay=6841ms took=0ms repeat=0 { when=0 what=149 obj=android.os.BinderProxy@fd199f1 target=android.app.ActivityThread$H }
Message 7 :start=71775 delay=6622ms took=1ms repeat=0 { when=0 callback=android.app.ActivityThread$$ExternalSyntheticLambda1 target=android.app.ActivityThread$H }
Message 8 :start=71776 delay=136ms took=49ms repeat=0 { when=0 callback=androidx.appcompat.app.AppCompatDelegateImpl$2 target=android.view.ViewRootImpl$ViewRootHandler }
Message 9 :start=71826 delay=139ms took=2ms repeat=0 { when=0 what=31 obj=com.android.internal.os.SomeArgs@bd1a441 target=android.view.ViewRootImpl$ViewRootHandler }
Message 10 :start=71832 delay=21ms took=2ms repeat=0 { when=0 callback=android.view.Choreographer$FrameDisplayEventReceiver target=android.view.Choreographer$FrameHandler }
Message 11 :start=71834 delay=8ms took=0ms repeat=0 { when=0 what=6 target=android.view.ViewRootImpl$ViewRootHandler }
Message 12 :start=71834 delay=5ms took=0ms repeat=0 { when=0 what=13 target=android.view.ViewRootImpl$ViewRootHandler }
Message 13 :start=71834 delay=3ms took=0ms repeat=0 { when=0 what=4 arg1=1 obj=false target=android.view.inputmethod.InputMethodManager$H }
Message 14 :start=71840 delay=0ms took=1ms repeat=0 { when=0 what=2 obj=InputBindResult{result=SUCCESS_WITH_IME_SESSION method=com.android.internal.view.IInputMethodSession$Stub$Proxy@394ede6 id=com.google.android.inputmethod.latin/com.android.inputmethod.latin.LatinIME sequence=2 isInputMethodSuppressingSpellChecker=false} target=android.view.inputmethod.InputMethodManager$H }
Message 15 :start=71842 delay=0ms took=0ms repeat=0 { when=0 what=10 target=android.view.inputmethod.InputMethodManager$H }
Message 16 :start=71850 delay=0ms took=1ms repeat=0 { when=0 what=31 obj=com.android.internal.os.SomeArgs@bd1a441 target=android.view.ViewRootImpl$ViewRootHandler }
-----------TODO MSG-----------
-----------MAIN THREAD STACK-----------
android.os.MessageQueue.nativePollOnce(Native Method)
android.os.MessageQueue.next(MessageQueue.java:337)
android.os.Looper.loopOnce(Looper.java:184)
android.os.Looper.loop(Looper.java:324)
android.app.ActivityThread.main(ActivityThread.java:8140)
java.lang.reflect.Method.invoke(Native Method)
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:559)
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1009)
可以發(fā)現(xiàn)這個(gè)堆棧就是前面提到程序員的噩夢(mèng)堆棧,由于發(fā)生anr之后,dump應(yīng)用的線程堆棧,會(huì)有延遲,有時(shí)候dump到的堆棧就是導(dǎo)致ANR的Message運(yùn)行結(jié)束之后,拿到的trace的日志就是這樣子。
單單從堆棧已經(jīng)無(wú)法找到罪魁禍?zhǔn)琢耍强梢詮腍ISTORY MSG可以看到這個(gè)Message運(yùn)行了7002毫秒,這個(gè)就是前面am_anr中打印的Working Message,也就是導(dǎo)致這次ANR的罪魁禍?zhǔn)住?/p>
Message 0 :start=64631 delay=278ms took=7002ms repeat=0 { when=0 what=114 obj=CreateServiceData{token=android.os.BinderProxy@2d85d09 className=com.kobe.fpstest.Service packageName=com.kobe.fpstest intent=null} target=android.app.ActivityThread$H }
小結(jié)
通過(guò)簡(jiǎn)單的實(shí)踐,展示了簡(jiǎn)化版的Raster和Sliver是如何發(fā)揮作用的。
六、總結(jié)
對(duì)于很多偶發(fā)ANR問(wèn)題,其實(shí)都是主線程的代碼,在極端的情況下,無(wú)法高效的被執(zhí)行,同時(shí)手機(jī)狀態(tài)又是處于比較卡頓的狀態(tài),導(dǎo)致dump的堆棧無(wú)法抓到導(dǎo)致ANR的單個(gè)或者M(jìn)essage的有效堆棧,從而導(dǎo)致很多問(wèn)題基本無(wú)解。
其實(shí)Raster和Sliver的作用就是盡可能地還原發(fā)生ANR前,主線程在干什么,為什么運(yùn)行那么久。
順著這個(gè)思路,我在思考排除代碼設(shè)計(jì)層面的,就是假設(shè)你的代碼常規(guī)運(yùn)行沒(méi)有什么問(wèn)題。
在極端情況下,可能運(yùn)行耗時(shí)的原因可能有IO,內(nèi)存,CPU調(diào)度,如果盡可能還原發(fā)生ANR問(wèn)題前,主線程的IO,內(nèi)存,CPU調(diào)度的狀態(tài),以及影響他們的其他進(jìn)程線程,那樣子很多ANR問(wèn)題就可以迎刃而解。
我第一個(gè)想到的就是抓Perfetto Trace,可惜抓trace會(huì)影響一定性能,不能一直抓,等到ANR發(fā)生了再抓,又有可能來(lái)不及,除非可以預(yù)判即將發(fā)生ANR,然后提前抓Trace,就可以解決這個(gè)問(wèn)題。
尾巴
本文是在研究字節(jié)的文章以后,加上自己一些小實(shí)踐以后的感悟,要徹底消滅ANR,目前還沒(méi)有一個(gè)非常牛逼手段。由于字節(jié)的Raster和Sliver并沒(méi)有開(kāi)源,很多信息是我從文章里面推測(cè)出來(lái)的,如果有理解不到位的點(diǎn),請(qǐng)大家多多指正。
希望我的文章對(duì)你破局ANR問(wèn)題有一些啟發(fā)的意義。