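Definition of ANR
- The system uses timeouts to monitor how app components respond to user interaction. An ANR (Application Not Responding) is mainly used to detect that an app process is frozen or responding too slowly.
Causes of ANR
App-side causes:
- Blocking calls: infinite loops, I/O on the main thread, processing large amounts of data.
- Lock problems: the main thread waits for a lock held by a worker thread.
- Memory pressure: the system caps the memory available to each app. Running under memory pressure for a long time causes frequent memory swapping, which makes operations time out.
System-side causes:
- CPU preemption: foreground processes generally take CPU time away from background ones, so a background app can be starved of CPU.
- System services not responding in time: system services are all reached through Binder, and their capacity is limited, so a system service that stays unresponsive for a long time can cause an ANR.
- Other apps using large amounts of memory.
Collecting ANR logs offline
- adb pull /data/anr/
- adb bugreport
Problems when pulling the ANR files:
- adb: error: failed to stat remote object 'data/anr/traces.txt': No such file or directory. This happens because vendors changed how ANR files are stored. Previously every ANR was written to traces.txt, so repeated ANRs overwrote each other. Newer versions generate a separate file for each ANR, named by timestamp, so pull the whole /data/anr/ directory instead.
- adb bugreport exports a zip archive.
Online ANR monitoring
In production, you can use FileObserver to watch the ANR files above for changes and upload a file to your server when it changes.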
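android.os.FileObserver is built on Linux inotify. The C++ sketch below shows that underlying mechanism. It watches a directory for files that were closed after writing (IN_CLOSE_WRITE) or moved in (IN_MOVED_TO) and reports their names.

The function names are hypothetical, and this only illustrates the watching logic. On recent Android versions an ordinary app may not be allowed to read /data/anr at all.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

// Start watching `dir` for files that were closed after writing or moved into it.
// Returns an inotify file descriptor, or -1 on error. (Hypothetical helper.)
int startAnrDirWatch(const char* dir) {
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0) return -1;
    if (inotify_add_watch(fd, dir, IN_CLOSE_WRITE | IN_MOVED_TO) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Wait up to timeoutMs for events and return the names of the files they refer to.
// A real monitor would then read each new file and upload it.
std::vector<std::string> waitForAnrFiles(int fd, int timeoutMs) {
    std::vector<std::string> names;
    pollfd pfd{fd, POLLIN, 0};
    if (poll(&pfd, 1, timeoutMs) <= 0) return names;  // timeout or error
    alignas(inotify_event) char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    // One read can return several variable-length events back to back.
    for (ssize_t off = 0; off + static_cast<ssize_t>(sizeof(inotify_event)) <= n;) {
        const auto* ev = reinterpret_cast<const inotify_event*>(buf + off);
        if (ev->len > 0) names.emplace_back(ev->name);
        off += sizeof(inotify_event) + ev->len;
    }
    return names;
}
```

For example, calling startAnrDirWatch("/data/anr") once and then looping on waitForAnrFiles(fd, -1) blocks until the next file is written, because a negative poll timeout waits indefinitely.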
Main ANR dump flow
The ANR flow runs almost entirely in the system_server process, and what the system process does is hard for an app to observe.
However an ANR is triggered, it always ends up in appNotResponding.
For example, the path for an input dispatch timeout:
ActivityManagerService#inputDispatchingTimedOut → AnrHelper#appNotResponding → AnrConsumerThread#run → AnrRecord#appNotResponding → ProcessRecord#appNotResponding.
```java
// com.android.server.am.ProcessRecord.java
void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
        String parentShortComponentName, WindowProcessController parentProcess,
        boolean aboveSystem, String annotation, boolean onlyDumpSelf) {
    ArrayList<Integer> firstPids = new ArrayList<>(5);
    SparseArray<Boolean> lastPids = new SparseArray<>(20);
    mWindowProcessController.appEarlyNotResponding(annotation, () -> kill("anr",
            ApplicationExitInfo.REASON_ANR, true));
    long anrTime = SystemClock.uptimeMillis();
    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
    }
    final boolean isSilentAnr;
    synchronized (mService) {
        // Note 1
        // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
        // The system is shutting down or rebooting
        if (mService.mAtmInternal.isShuttingDown()) {
            Slog.i(TAG, "During shutdown skipping ANR: " + this + " " + annotation);
            return;
        } else if (isNotResponding()) {
            // Already in the ANR flow
            Slog.i(TAG, "Skipping duplicate ANR: " + this + " " + annotation);
            return;
        } else if (isCrashing()) {
            // The app is crashing
            Slog.i(TAG, "Crashing app skipping ANR: " + this + " " + annotation);
            return;
        } else if (killedByAm) {
            // The app has already been killed by AMS
            Slog.i(TAG, "App already killed by AM skipping ANR: " + this + " " + annotation);
            return;
        } else if (killed) {
            // The app process has already died
            Slog.i(TAG, "Skipping died app ANR: " + this + " " + annotation);
            return;
        }
        // In case we come through here for the same app before completing
        // this one, mark as anring now so we will bail out.
        // Mark the process as not responding
        setNotResponding(true);
        // Log the ANR to the event log.
        EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags,
                annotation);
        // Dump thread traces as quickly as we can, starting with "interesting" processes.
        firstPids.add(pid);
        // Don't dump other PIDs if it's a background ANR or is requested to only dump self.
        // Note 2
        // A "silent" ANR: here it means a background ANR
        isSilentAnr = isSilentAnr();
        if (!isSilentAnr && !onlyDumpSelf) {
            int parentPid = pid;
            if (parentProcess != null && parentProcess.getPid() > 0) {
                parentPid = parentProcess.getPid();
            }
            if (parentPid != pid) firstPids.add(parentPid);
            if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);
            // Note 3: choose which other processes to dump
            for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
                ProcessRecord r = getLruProcessList().get(i);
                if (r != null && r.thread != null) {
                    int myPid = r.pid;
                    if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
                        if (r.isPersistent()) {
                            firstPids.add(myPid);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                        } else if (r.treatLikeActivity) {
                            firstPids.add(myPid);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                        } else {
                            lastPids.put(myPid, Boolean.TRUE);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                        }
                    }
                }
            }
        }
    }
    // ......
    int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
    ArrayList<Integer> nativePids = null;
    if (pids != null) {
        nativePids = new ArrayList<>(pids.length);
        for (int i : pids) {
            nativePids.add(i);
        }
    }
    // For background ANRs, don't pass the ProcessCpuTracker to
    // avoid spending 1/2 second collecting stats to rank lastPids.
    StringWriter tracesFileException = new StringWriter();
    // To hold the start and end offset to the ANR trace file respectively.
    final long[] offsets = new long[2];
    // Note 4
    File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
            isSilentAnr ? null : processCpuTracker, isSilentAnr ? null : lastPids,
            nativePids, tracesFileException, offsets);
    // ......
}
```
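- If the system is shutting down, the process is already in the ANR flow, is crashing, has already been killed by AMS, or has already died, the ANR is not handled and the method returns immediately.
- isSilentAnr tells whether this is a background ANR. Foreground and background ANRs behave differently: a foreground ANR shows the "not responding" dialog, while a background ANR kills the process directly.
- The loop over the LRU process list selects the processes to dump. When an ANR occurs, a lot of information is dumped to the trace file, and the dumped processes fall into 3 groups:
- firstPids: the important processes, dumped first. The ANRing process is always dumped, and dumped first, so it is the first pid added to firstPids.
  - For a silent (background) ANR, or when onlyDumpSelf is requested, nothing else is added.
  - Otherwise more processes are added: the parent process if it differs from the ANRing one, and system_server if the ANRing process is not system_server itself.
  - AMS's LRU process list is then walked, and every persistent process, or process bound with BIND_TREAT_LIKE_ACTIVITY, is added to firstPids.
- extraPids: all other processes in the LRU list are first put into lastPids. The processes with the highest recent CPU usage are then picked from lastPids to form extraPids.
- nativePids: the simplest group, a fixed set of native system processes defined in Watchdog.java.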
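The selection rules above can be modelled as a small pure function. This is a hypothetical C++ sketch that mirrors the Java loop; it is not AOSP code:

- ProcInfo is a made-up stand-in for ProcessRecord.
- systemServerPid plays the role of MY_PID.
- lastPids is a plain vector rather than a SparseArray.

The later CPU-based ranking of lastPids into extraPids is not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a ProcessRecord.
struct ProcInfo {
    int pid;
    bool hasThread;          // r.thread != null
    bool persistent;         // r.isPersistent()
    bool treatLikeActivity;  // r.treatLikeActivity
};

struct DumpSelection {
    std::vector<int> firstPids;
    std::vector<int> lastPids;  // candidates later ranked by CPU usage into extraPids
};

// lru is ordered like getLruProcessList(): most recently used at the end.
// parentPid <= 0 means "no parent process".
DumpSelection selectDumpPids(int anrPid, int parentPid, int systemServerPid,
                             bool silentAnr, bool onlyDumpSelf,
                             const std::vector<ProcInfo>& lru) {
    DumpSelection sel;
    sel.firstPids.push_back(anrPid);  // the ANRing process always comes first
    if (silentAnr || onlyDumpSelf) return sel;
    if (parentPid <= 0) parentPid = anrPid;
    if (parentPid != anrPid) sel.firstPids.push_back(parentPid);
    if (systemServerPid != anrPid && systemServerPid != parentPid)
        sel.firstPids.push_back(systemServerPid);
    // Walk from the end of the list, like the Java loop (i = size - 1 .. 0).
    for (auto it = lru.rbegin(); it != lru.rend(); ++it) {
        const ProcInfo& r = *it;
        if (!r.hasThread || r.pid <= 0) continue;
        if (r.pid == anrPid || r.pid == parentPid || r.pid == systemServerPid) continue;
        if (r.persistent || r.treatLikeActivity) sel.firstPids.push_back(r.pid);
        else sel.lastPids.push_back(r.pid);
    }
    return sel;
}
```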
With the pids of every process to dump in hand, AMS dumps their stacks in the order firstPids, nativePids, extraPids. Let's follow that call:
```java
public static Pair<Long, Long> dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
        ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {
    // Spend at most 20 seconds dumping in total
    long remainingTime = 20 * 1000;
    // (firstPidStart/firstPidEnd are tracked in code omitted from this excerpt)
    // First collect all of the stacks of the most important pids.
    if (firstPids != null) {
        int num = firstPids.size();
        for (int i = 0; i < num; i++) {
            final int pid = firstPids.get(i);
            final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);
            remainingTime -= timeTaken;
            if (remainingTime <= 0) {
                Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + pid
                        + "); deadline exceeded.");
                return firstPidStart >= 0 ? new Pair<>(firstPidStart, firstPidEnd) : null;
            }
        }
    }
    // ......
}
```
One important detail: a single process already has many threads, and here many processes are dumped. The total dump time is therefore capped at 20 seconds, and the method returns once the cap is exceeded, so the ANR dialog can still be shown in time.

The dump then goes through this chain of calls: ActivityManagerService#dumpJavaTracesTombstoned() → Debug#dumpJavaBacktraceToFileTimeout() → android_os_Debug#android_os_Debug_dumpJavaBacktraceToFileTimeout() → android_os_Debug#dumpTraces() → debuggerd_client#dump_backtrace_to_file_timeout() → debuggerd_client#debuggerd_trigger_dump().
```cpp
bool debuggerd_trigger_dump(pid_t tid, DebuggerdDumpType dump_type, unsigned int timeout_ms, unique_fd output_fd) {
  // pid comes from AMS: the process whose stacks should be dumped
  pid_t pid = tid;
  //......
  // Send the signal.
  // Coming from android_os_Debug_dumpJavaBacktraceToFileTimeout, dump_type is kDebuggerdJavaBacktrace
  const int signal = (dump_type == kDebuggerdJavaBacktrace) ? SIGQUIT : BIONIC_SIGNAL_DEBUGGER;
  sigval val = {.sival_int = (dump_type == kDebuggerdNativeBacktrace) ? 1 : 0};
  // sigqueue: queue a signal plus a data value for the given process; returns 0 on success
  if (sigqueue(pid, signal, val) != 0) {
    log_error(output_fd, errno, "failed to send signal to pid %d", pid);
    return false;
  }
  //......
  LOG(INFO) << TAG "done dumping process " << pid;
  return true;
}
```
In other words, AMS (through debuggerd) sends SIGQUIT to each process whose stacks it needs, and that process starts dumping once the signal arrives. In particular, a process that is in an ANR always receives SIGQUIT.
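sigqueue() lets debuggerd attach a small payload to the signal. Per the code above, sival_int is 1 for a native backtrace request and 0 otherwise.

The sketch below runs on ordinary Linux. It queues SIGQUIT with a payload to its own process and receives it synchronously. SIGQUIT is blocked first, so its default action (terminate with a core dump) never runs. queueAndReceive is a hypothetical name, and the sketch assumes a single-threaded caller.

```cpp
#include <cassert>
#include <ctime>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Queue `sig` with an integer payload to this process, then take it back with
// sigtimedwait() and return the payload (-1 on failure). In a multi-threaded process
// another thread that does not block `sig` could receive it instead.
int queueAndReceive(int sig, int payload) {
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, sig);
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0) return -1;  // keep the signal pending
    int result = -1;
    sigval val{};
    val.sival_int = payload;
    if (sigqueue(getpid(), sig, val) == 0) {
        siginfo_t info{};
        timespec timeout{1, 0};  // wait at most 1 second
        if (sigtimedwait(&set, &info, &timeout) == sig && info.si_code == SI_QUEUE)
            result = info.si_value.sival_int;
    }
    pthread_sigmask(SIG_SETMASK, &old, nullptr);  // restore the previous mask
    return result;
}
```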
A note on signal handling: every process except Zygote creates a Signal Catcher daemon thread that catches SIGQUIT and SIGUSR1 and acts on them.
```cpp
// art/runtime/signal_catcher.cc
void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);
  Runtime* runtime = Runtime::Current();
  // Attach this thread to the Android Runtime (CHECK aborts if attaching fails)
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(), !runtime->IsAotCompiler()));
  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }
  SignalSet signals;
  signals.Add(SIGQUIT);  // handle SIGQUIT
  signals.Add(SIGUSR1);  // handle SIGUSR1
  // Loop forever, waiting for either signal to arrive
  while (true) {
    // Wait for a signal; this call blocks
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    // If the catcher has been asked to stop, detach this thread from the runtime and exit
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }
    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();  // dump thread traces
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();  // force a GC
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}
```
The Signal Catcher thread loops in WaitForSignal, waiting for SIGQUIT or SIGUSR1. The signal sent earlier by system_server is received here, and this is where the stack dump starts.
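The same block-then-sigwait pattern can be reproduced outside ART. In the hypothetical C++ sketch below, the signals are blocked before the catcher thread is created, so the thread inherits the mask. The thread then collects them synchronously with sigwait().

Unlike the real Signal Catcher, it only counts the signals. It also stops after maxSignals of them so that it can be joined.

```cpp
#include <cassert>
#include <atomic>
#include <pthread.h>
#include <signal.h>

struct MiniCatcher {
    sigset_t signals;
    std::atomic<int> sigquitCount{0};
    std::atomic<int> sigusr1Count{0};
    int maxSignals = 0;  // the real catcher loops forever; the sketch stops after this many
};

static void* catcherMain(void* arg) {
    auto* c = static_cast<MiniCatcher*>(arg);
    for (int handled = 0; handled < c->maxSignals; ++handled) {
        int sig = 0;
        if (sigwait(&c->signals, &sig) != 0) return nullptr;  // blocks until one is pending
        switch (sig) {
            case SIGQUIT: c->sigquitCount++; break;  // ART: HandleSigQuit() dumps thread traces
            case SIGUSR1: c->sigusr1Count++; break;  // ART: HandleSigUsr1() forces a GC
            default: break;
        }
    }
    return nullptr;
}

// Block SIGQUIT/SIGUSR1 in the calling thread (new threads inherit the mask), then start the catcher.
bool startMiniCatcher(MiniCatcher* c, pthread_t* thread) {
    sigemptyset(&c->signals);
    sigaddset(&c->signals, SIGQUIT);
    sigaddset(&c->signals, SIGUSR1);
    if (pthread_sigmask(SIG_BLOCK, &c->signals, nullptr) != 0) return false;
    return pthread_create(thread, nullptr, catcherMain, c) == 0;
}
```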

Monitoring the SIGQUIT signal
As mentioned above, every process except Zygote has a Signal Catcher thread that keeps waiting for SIGQUIT with sigwait and dumps the thread stacks when the signal arrives. To intercept or listen for SIGQUIT ourselves, we first need to know the signal-handling functions: kill, signal, sigaction, sigwait, pthread_sigmask, and so on.
```cpp
#include <jni.h>
#include <pthread.h>
#include <signal.h>
#include <android/log.h>

void signalHandler(int sig, siginfo_t *info, void *uc) {
    // Note: __android_log_print is not async-signal-safe; keep real handlers minimal.
    __android_log_print(ANDROID_LOG_DEBUG, "xfhy_anr", "Received SIGQUIT, an ANR may have occurred");
    // Dump the main thread stack here
}

extern "C"
JNIEXPORT jboolean JNICALL
Java_com_xfhy_watchsignaldemo_MainActivity_startWatch(JNIEnv *env, jobject thiz) {
    sigset_t set, old_set;
    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    /*
     * SIG_UNBLOCK is needed because a process forked from Zygote inherits the signal mask
     * of Zygote's main thread, and Zygote's main thread blocks SIGQUIT with
     * pthread_sigmask(SIG_BLOCK) during initialization. So we call
     * pthread_sigmask(SIG_UNBLOCK) on our own main thread. As a side effect, the
     * Signal Catcher thread's sigwait no longer receives SIGQUIT, because the signal
     * is now handled by our handler instead.
     */
    int r = pthread_sigmask(SIG_UNBLOCK, &set, &old_set);
    if (0 != r) {
        return JNI_FALSE;
    }
    struct sigaction sa{};
    sa.sa_sigaction = signalHandler;
    sa.sa_flags = SA_ONSTACK | SA_SIGINFO | SA_RESTART;
    return sigaction(SIGQUIT, &sa, nullptr) == 0 ? JNI_TRUE : JNI_FALSE;
}
```
By default Android blocks SIGQUIT, so the signal is only picked up by the Signal Catcher thread's sigwait, and a handler installed with sigaction never receives it. That is why the code above first sets SIGQUIT to UNBLOCK with pthread_sigmask (sigprocmask works too). After that, the next SIGQUIT is delivered to our signalHandler.

Once our sigaction handler takes SIGQUIT, the Signal Catcher thread no longer receives it, so the system's normal stack dump never happens. That is not acceptable, so we re-send the signal to the Signal Catcher thread:
```cpp
int tid = getSignalCatcherThreadId(); // walks /proc/[pid] to find the tid of the Signal Catcher thread
tgkill(getpid(), tid, SIGQUIT);       // re-send SIGQUIT to that thread only
```
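The article does not show getSignalCatcherThreadId(). One plausible implementation scans /proc/self/task/<tid>/comm for the thread name. It assumes that ART names the thread "Signal Catcher", as in the AttachCurrentThread call shown earlier. The function names here are hypothetical.

Forwarding uses the raw tgkill system call, so the signal goes to that one thread. A process-directed signal would instead go to whichever thread the kernel picks.

```cpp
#include <cassert>
#include <cstdlib>
#include <dirent.h>
#include <fstream>
#include <pthread.h>
#include <signal.h>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Return the tid of the first thread of this process whose comm equals `name`, or -1.
int findThreadIdByName(const std::string& name) {
    DIR* dir = opendir("/proc/self/task");
    if (dir == nullptr) return -1;
    int found = -1;
    while (dirent* entry = readdir(dir)) {
        if (entry->d_name[0] < '0' || entry->d_name[0] > '9') continue;  // skip "." and ".."
        std::ifstream comm(std::string("/proc/self/task/") + entry->d_name + "/comm");
        std::string threadName;
        if (std::getline(comm, threadName) && threadName == name) {  // getline drops the '\n'
            found = std::atoi(entry->d_name);
            break;
        }
    }
    closedir(dir);
    return found;
}

// Re-send SIGQUIT to one specific thread of this process. syscall() is used because
// older glibc versions have no tgkill() wrapper.
bool forwardSigquit(int tid) {
    return tid > 0 && syscall(SYS_tgkill, getpid(), tid, SIGQUIT) == 0;
}
```

To try the lookup outside Android, rename the current thread with pthread_setname_np(pthread_self(), "Signal Catcher"). findThreadIdByName("Signal Catcher") should then return the current tid.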
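A process in an ANR always receives SIGQUIT, but a process that receives SIGQUIT is not necessarily in an ANR. For example, it may just be one of the extra processes dumped for another app's ANR.

So we also check whether the main thread is stalled. The quickest way is to read, through reflection, the mMessages field of the main Looper's MessageQueue. This field is the message at the head of the queue, and its when field is the uptime at which that message is due to be dispatched. when minus the current time turns negative once the message is overdue, and its magnitude is how long the message has been waiting. If it has been waiting too long, the main thread is stuck.
```java
import android.os.Looper;
import android.os.Message;
import android.os.MessageQueue;
import android.os.SystemClock;

import java.lang.reflect.Field;

final class MainThreadStuckChecker {
    // Assumed thresholds (not given in the original): negative values, so
    // (when - now) < threshold means the head message is overdue by more than |threshold| ms.
    private static final long FOREGROUND_MSG_THRESHOLD = -2000;
    private static final long BACKGROUND_MSG_THRESHOLD = -10000;
    // Assumed to be kept up to date elsewhere with the app's foreground/background state.
    static volatile boolean foreground;

    static boolean isMainThreadStuck() {
        try {
            MessageQueue mainQueue = Looper.getMainLooper().getQueue(); // API 23+
            // mMessages is the private head of the pending-message list
            Field field = mainQueue.getClass().getDeclaredField("mMessages");
            field.setAccessible(true);
            final Message mMessage = (Message) field.get(mainQueue);
            if (mMessage != null) {
                long when = mMessage.getWhen();
                if (when == 0) {
                    // when == 0: posted with sendMessageAtFrontOfQueue, no real due time
                    return false;
                }
                // Negative once the head message is overdue
                long time = when - SystemClock.uptimeMillis();
                long timeThreshold = foreground ? FOREGROUND_MSG_THRESHOLD : BACKGROUND_MSG_THRESHOLD;
                return time < timeThreshold;
            }
        } catch (Exception e) {
            return false;
        }
        return false;
    }
}
```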
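Getting the ANR trace
The point where the Signal Catcher thread writes out the trace is another boundary we can hook. The hook points below show the two cases:

- On API 27+, the thread connects to tombstoned through the /dev/socket/tombstoned_java_trace socket.
- On older versions, it opens /data/anr/traces.txt directly.

Either way, the trace itself is then written out with write(). If we hook that write, we get exactly the ANR trace the system dumps. The content is very complete: the state, locks, and stacks (including native stacks) of every thread. This is very useful for troubleshooting, especially native issues and deadlocks.

The native hook uses PLT hooking, which is stable; the approach has been proven in production in WeChat.
```cpp
#include <atomic>
#include <cstring>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

// Helpers assumed to exist elsewhere (not shown in this article):
// getApiLevel() returns the device API level, plt_hook() wraps a PLT hook library,
// and printAnrTrace() consumes the captured trace.
int getApiLevel();
int plt_hook(const char *lib, const char *symbol, void *new_func, void **old_func);
void printAnrTrace(const char *content, size_t len);

// Set when the Signal Catcher thread starts writing a trace; cleared once the write is captured.
static std::atomic<bool> isTraceWrite{false};
static std::atomic<pid_t> signalCatcherTid{0};

// API 27+: the trace is handed over via tombstoned, reached through this Unix socket.
int (*original_connect)(int __fd, const struct sockaddr *__addr, socklen_t __addr_length);
int my_connect(int __fd, const struct sockaddr *__addr, socklen_t __addr_length) {
    if (__addr != nullptr && __addr->sa_family == AF_UNIX) {
        const auto *un = reinterpret_cast<const struct sockaddr_un *>(__addr);
        if (strncmp(un->sun_path, "/dev/socket/tombstoned_java_trace", sizeof(un->sun_path)) == 0) {
            isTraceWrite = true;
            signalCatcherTid = gettid();
        }
    }
    return original_connect(__fd, __addr, __addr_length);
}

// Older versions: the trace is written directly to /data/anr/traces.txt.
int (*original_open)(const char *pathname, int flags, mode_t mode);
int my_open(const char *pathname, int flags, mode_t mode) {
    if (pathname != nullptr && strcmp(pathname, "/data/anr/traces.txt") == 0) {
        isTraceWrite = true;
        signalCatcherTid = gettid();
    }
    return original_open(pathname, flags, mode);
}

ssize_t (*original_write)(int fd, const void *buf, size_t count);
ssize_t my_write(int fd, const void *buf, size_t count) {
    // Only the first write() from the thread that opened the trace target is captured.
    if (isTraceWrite && signalCatcherTid == gettid()) {
        isTraceWrite = false;
        signalCatcherTid = 0;
        // buf is not NUL-terminated, so pass the length along
        printAnrTrace(static_cast<const char *>(buf), count);
    }
    return original_write(fd, buf, count);
}

void hookAnrTraceWrite() {
    int apiLevel = getApiLevel();
    if (apiLevel < 19) {
        return;
    }
    if (apiLevel >= 27) {
        plt_hook("libcutils.so", "connect", (void *) my_connect, (void **) (&original_connect));
    } else {
        plt_hook("libart.so", "open", (void *) my_open, (void **) (&original_open));
    }
    if (apiLevel >= 30 || apiLevel == 25 || apiLevel == 24) {
        plt_hook("libc.so", "write", (void *) my_write, (void **) (&original_write));
    } else if (apiLevel == 29) {
        plt_hook("libbase.so", "write", (void *) my_write, (void **) (&original_write));
    } else {
        plt_hook("libart.so", "write", (void *) my_write, (void **) (&original_write));
    }
}
```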
Summary
To sum up, the approach works in three steps:

1. Detect. Listening for SIGQUIT tells us that the current process may be in an ANR. To avoid false positives, we also check whether the process is in the NOT_RESPONDING state and whether the main thread is stalled.
2. Forward. After we register our own SIGQUIT handler, the system's Signal Catcher thread no longer receives the signal, so we forward it to that thread to leave the normal flow intact.
3. Capture. When the Signal Catcher thread dumps stacks, it writes the dumped data out with write(). By hooking that write we get the ANR trace the system produced.

In effect, we change no system behavior and still get what we want. Because the information is captured during the system's normal ANR trace dump, it is very complete. The trade-off is that the system dump itself is expensive and takes a relatively long time.
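The false-positive filter described in the summary could be structured as follows. This is a hypothetical C++ sketch in which both checks are passed in as callbacks:

- In an app, isMainThreadStuck would be the Looper check shown earlier.
- isNotResponding could query ActivityManager#getProcessesInErrorState() on the Java side for a NOT_RESPONDING entry.

The polling interval and the overall wait are assumptions to tune.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Called after SIGQUIT arrives. Report an ANR if the main thread is already stalled,
// or if the process enters the NOT_RESPONDING state within maxWait.
bool confirmAnr(const std::function<bool()>& isMainThreadStuck,
                const std::function<bool()>& isNotResponding,
                std::chrono::milliseconds pollInterval,
                std::chrono::milliseconds maxWait) {
    if (isMainThreadStuck()) return true;
    for (auto waited = std::chrono::milliseconds(0); waited < maxWait; waited += pollInterval) {
        if (isNotResponding()) return true;
        std::this_thread::sleep_for(pollInterval);
    }
    return false;  // SIGQUIT without an ANR, e.g. this process was dumped for someone else's ANR
}
```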
Analyzing the ANR file
```
----- pid 7761 at 2022-11-02 07:02:26 -----
Cmd line: com.xfhy.watchsignaldemo
Build fingerprint: 'HUAWEI/LYA-AL00/HWLYA:10/HUAWEILYA-AL00/10.1.0.163C00:user/release-keys'
ABI: 'arm64'
Build type: optimized
Zygote loaded classes=11918 post zygote classes=729
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/system/app/FeatureFramework/FeatureFramework.apk], no parent
#3 dalvik.system.PathClassLoader: [/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk:/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk!classes2.dex:/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk!classes4.dex:/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk!classes3.dex], parent #1
Done dumping class loaders
Intern table: 44132 strong; 436 weak
JNI: CheckJNI is off; globals=681 (plus 67 weak)
Libraries: /data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/lib/arm64/libwatchsignaldemo.so libandroid.so libcompiler_rt.so libhitrace_jni.so libhiview_jni.so libhwapsimpl_jni.so libiAwareSdk_jni.so libimonitor_jni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so libsoundpool.so libwebviewchromium_loader.so (15)
// Heap size 26MB, of which 2442KB is in use (90% free); 74512 objects currently on the heap
Heap: 90% free, 2442KB/26MB; 74512 objects
Total number of allocations 120222 // total objects allocated since the process started
Total bytes allocated 10MB // total bytes allocated since the process started
Total bytes freed 8173KB // total bytes freed since the process started
Free memory 23MB // memory available without growing the heap
Free memory until GC 23MB // memory available before the next GC
Free memory until OOME 381MB // memory available before an OutOfMemoryError; if this is small, the process is under memory pressure and the app may be using too much memory
Total memory 26MB // current total heap (used + free)
Max memory 384MB // the most heap the process can use
..... // GC-related information omitted
// 17 threads in this process
DALVIK THREADS (17):
// Call stack of the Signal Catcher thread
"Signal Catcher" daemon prio=5 tid=4 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x18c84570 self=0x7252417800
| sysTid=7772 nice=0 cgrp=default sched=0/0 handle=0x725354ad50
| state=R schedstat=( 16273959 1085938 5 ) utm=0 stm=1 core=4 HZ=100
| stack=0x7253454000-0x7253456000 stackSize=991KB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 000000000042f8e8 /apex/com.android.runtime/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+140)
native: #01 pc 0000000000523590 /apex/com.android.runtime/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool, BacktraceMap*, bool) const+508)
native: #02 pc 000000000053e75c /apex/com.android.runtime/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+844)
native: #03 pc 000000000053735c /apex/com.android.runtime/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+504)
native: #04 pc 0000000000536744 /apex/com.android.runtime/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool)+1048)
native: #05 pc 0000000000536228 /apex/com.android.runtime/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+884)
native: #06 pc 00000000004ee4d8 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+196)
native: #07 pc 000000000050250c /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356)
native: #08 pc 0000000000501558 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+268)
native: #09 pc 00000000000cf7c0 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
native: #10 pc 00000000000721a8 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)
(no managed stack frames)
"main" prio=5 tid=1 Sleeping
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73907540 self=0x725f010800
| sysTid=7761 nice=-10 cgrp=default sched=1073741825/2 handle=0x72e60080d0
| state=S schedstat=( 281909898 5919799 311 ) utm=20 stm=7 core=4 HZ=100
| stack=0x7fca180000-0x7fca182000 stackSize=8192KB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x00f895d9> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:443)
- locked <0x00f895d9> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:359)
at android.os.SystemClock.sleep(SystemClock.java:131)
at com.xfhy.watchsignaldemo.MainActivity.makeAnr(MainActivity.kt:35)
at java.lang.reflect.Method.invoke(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:441)
at android.view.View.performClick(View.java:7317)
at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1219)
at android.view.View.performClickInternal(View.java:7291)
at android.view.View.access$3600(View.java:838)
at android.view.View$PerformClick.run(View.java:28247)
at android.os.Handler.handleCallback(Handler.java:900)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:219)
at android.app.ActivityThread.main(ActivityThread.java:8668)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:513)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1109)
... // the remaining threads are omitted
```
```
"Signal Catcher" daemon prio=5 tid=4 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x18c84570 self=0x7252417800
| sysTid=7772 nice=0 cgrp=default sched=0/0 handle=0x725354ad50
| state=R schedstat=( 16273959 1085938 5 ) utm=0 stm=1 core=4 HZ=100
| stack=0x7253454000-0x7253456000 stackSize=991KB
| held mutexes= "mutator lock"(shared held)
```
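Before going through the thread fields, note that the memory lines in the dump above are consistent with each other:

- 2442KB out of 26MB is about 9% used, hence "90% free".
- Max memory 384MB minus the 2442KB in use is about 381.6MB, which matches "Free memory until OOME 381MB".

A small C++ helper for that arithmetic follows. It assumes sizes are printed with B/KB/MB/GB suffixes as in this sample, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Convert a size string such as "2442KB" or "384MB" to bytes (-1 if it cannot be parsed).
int64_t parseArtSize(const std::string& s) {
    size_t pos = 0;
    int64_t value = 0;
    while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') value = value * 10 + (s[pos++] - '0');
    if (pos == 0) return -1;
    const std::string unit = s.substr(pos);
    if (unit == "B") return value;
    if (unit == "KB") return value << 10;
    if (unit == "MB") return value << 20;
    if (unit == "GB") return value << 30;
    return -1;
}

// Headroom before an OOME, estimated as Max memory minus the bytes currently in use.
int64_t oomeHeadroomBytes(const std::string& maxMemory, const std::string& used) {
    int64_t max = parseArtSize(maxMemory);
    int64_t inUse = parseArtSize(used);
    return (max < 0 || inUse < 0) ? -1 : max - inUse;
}
```

With the sample values, oomeHeadroomBytes("384MB", "2442KB") is 400152576 bytes, or 381MB after truncating to whole megabytes.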
"Signal Catcher" daemon prio=5 tid=4 Runnable
- "Signal Catcher" daemon: the thread name; daemon marks it as a daemon thread
- prio: thread priority
- tid: the thread's internal (runtime) id
- Runnable: the thread state
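The header line can be parsed mechanically. This helps when scanning many traces, for example to pull out the main thread's state. Below is a hypothetical C++ sketch that handles the two header shapes shown above, with and without daemon.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Fields of a thread header such as: "Signal Catcher" daemon prio=5 tid=4 Runnable
struct ThreadHeader {
    std::string name;
    bool daemon = false;
    int prio = -1;
    int tid = -1;
    std::string state;
};

// Returns false if the line does not start with a quoted thread name.
bool parseThreadHeader(const std::string& line, ThreadHeader* out) {
    if (line.empty() || line[0] != '"') return false;
    size_t close = line.find('"', 1);
    if (close == std::string::npos) return false;
    ThreadHeader h;
    h.name = line.substr(1, close - 1);  // names may contain spaces, hence the quotes
    std::istringstream rest(line.substr(close + 1));
    std::string token;
    while (rest >> token) {
        if (token == "daemon") h.daemon = true;
        else if (token.rfind("prio=", 0) == 0) h.prio = std::stoi(token.substr(5));
        else if (token.rfind("tid=", 0) == 0) h.tid = std::stoi(token.substr(4));
        else h.state = token;  // the remaining bare word is the state, e.g. Runnable, Sleeping, Native
    }
    *out = h;
    return true;
}
```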
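| group="system" sCount=0 dsCount=0 flags=0 obj=0x18c84570 self=0x7252417800
- group: the thread group the thread belongs to
- sCount: the thread's suspend count
- dsCount: the part of the suspend count requested by the debugger
- obj: the Java Thread object associated with this thread
- self: the address of the native thread object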
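| sysTid=7772 nice=0 cgrp=default sched=0/0 handle=0x725354ad50
- sysTid: the thread's real (Linux) tid
- nice: scheduling priority; the lower the value, the higher the priority
- cgrp: the scheduling group (cgroup) the thread belongs to
- sched: scheduling policy/priority
- handle: the thread's pthread handle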
| state=R schedstat=( 16273959 1085938 5 ) utm=0 stm=1 core=4 HZ=100
- state: thread state (R = running or runnable)
- schedstat: CPU scheduling statistics. The three numbers in parentheses are Running, Runnable, and Switch:
  - Running: time spent running on a CPU, in ns.
  - Runnable: time spent waiting in the run queue, in ns.
  - Switch: the number of times the thread was scheduled.
- utm/stm: CPU time spent in user/kernel mode, in clock ticks
- core: the CPU core the thread last ran on
- HZ: clock frequency, i.e. clock ticks per second
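Applying these definitions to the Signal Catcher line gives the following figures:

- It ran for about 16.3 ms.
- It waited about 1.1 ms in the run queue.
- It was scheduled 5 times.
- With HZ=100 one tick is 10 ms, so stm=1 means roughly 10 ms of kernel time.

A hypothetical C++ parser for this line:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

struct SchedInfo {
    long long runNs = 0;     // time spent running on a CPU (ns)
    long long waitNs = 0;    // time spent runnable, waiting in the run queue (ns)
    long long switches = 0;  // number of times the thread was scheduled
    long utm = 0, stm = 0;   // user/kernel CPU time in clock ticks
    int hz = 0;              // clock ticks per second
};

// Parse "... schedstat=( R W S ) utm=U stm=K core=C HZ=H" from a thread's state line.
bool parseSchedLine(const std::string& line, SchedInfo* out) {
    size_t p = line.find("schedstat=(");
    if (p == std::string::npos) return false;
    SchedInfo s;
    if (std::sscanf(line.c_str() + p, "schedstat=( %lld %lld %lld ) utm=%ld stm=%ld core=%*d HZ=%d",
                    &s.runNs, &s.waitNs, &s.switches, &s.utm, &s.stm, &s.hz) != 6) return false;
    *out = s;
    return true;
}

// utm/stm are in ticks; one tick lasts 1000/HZ milliseconds.
long ticksToMs(long ticks, int hz) { return hz > 0 ? ticks * 1000 / hz : -1; }
```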
| stack=0x7253454000-0x7253456000 stackSize=991KB
- stack: address range of the thread stack
- stackSize: stack size
| held mutexes= "mutator lock"(shared held)
- held mutexes: the mutexes held by the thread, either exclusive or shared. Here the mutator lock is held in shared mode.
ANR case studies
1. CPU preemption
```
CPU usage from 0ms to 10625ms later (2020-03-09 14:38:31.633 to 2020-03-09 14:38:42.257):
543% 2045/com.test.demo: 54% user + 89% kernel / faults: 4608 minor 1 major // look at this line
99% 674/android.hardware.camera.provider@2.4-service: 81% user + 18% kernel / faults: 403 minor
24% 32589/com.wang.test: 22% user + 1.4% kernel / faults: 7432 minor 1 major
......
```
This process used as much as 543% CPU and took most of the CPU resources, which caused the ANR. An ANR like this has nothing to do with our app.
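To spot such a process quickly in a long CPU usage section, parse the per-process lines and pick the largest percentage. The hypothetical C++ sketch below assumes the "<percent>% <pid>/<name>: ..." layout shown above. On multi-core devices these figures can exceed 100%, as the 543% line shows.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

struct CpuEntry {
    double percent = 0;
    int pid = 0;
    std::string name;
};

// Return the heaviest process among lines like
// "543% 2045/com.test.demo: 54% user + 89% kernel / faults: ...".
// Lines that do not match (the header, "......", TOTAL) are skipped.
bool topCpuConsumer(const std::vector<std::string>& lines, CpuEntry* out) {
    bool found = false;
    for (const std::string& line : lines) {
        CpuEntry e;
        char name[256];
        if (std::sscanf(line.c_str(), " %lf%% %d/%255[^:]:", &e.percent, &e.pid, name) != 3) continue;
        e.name = name;
        if (!found || e.percent > out->percent) {
            *out = e;
            found = true;
        }
    }
    return found;
}
```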
2. Memory pressure causing ANR
If the CPU usage and the stacks in an ANR log both look normal, consider memory pressure. Check the memory section of the ANR log. You can also search the logs for onTrimMemory: if such entries appear around the time the ANR log was dumped, memory was probably tight.
```
10-31 22:37:19.749 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:37:33.458 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:00.153 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:58.731 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:39:02.816 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
```
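Level 80 is ComponentCallbacks2.TRIM_MEMORY_COMPLETE. To check "around the time of the ANR" systematically, a hypothetical C++ helper can count onTrimMemory lines in a window before the ANR time. It assumes logcat's "MM-DD HH:MM:SS.mmm" prefix and that both timestamps fall in the same month, since these lines carry no year.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Milliseconds since the start of the month for a "MM-DD HH:MM:SS.mmm" prefix, or -1.
long long logcatMillis(const std::string& line) {
    int mon, day, h, m, s, ms;
    if (std::sscanf(line.c_str(), "%d-%d %d:%d:%d.%d", &mon, &day, &h, &m, &s, &ms) != 6) return -1;
    (void)mon;  // assumed identical for both timestamps
    return (((day * 24LL + h) * 60 + m) * 60 + s) * 1000 + ms;
}

// Count onTrimMemory entries logged within windowMs before (and up to) anrTime.
int trimMemoryNearAnr(const std::vector<std::string>& lines, const std::string& anrTime,
                      long long windowMs) {
    long long anr = logcatMillis(anrTime);
    if (anr < 0) return -1;
    int count = 0;
    for (const std::string& line : lines) {
        if (line.find("onTrimMemory") == std::string::npos) continue;
        long long t = logcatMillis(line);
        if (t >= 0 && t <= anr && anr - t <= windowMs) ++count;
    }
    return count;
}
```

For example, with a hypothetical ANR time of 10-31 22:39:05.000, two of the five lines above fall within the preceding 60 seconds, and all five fall within 120 seconds.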
3. System service timeout causing ANR
A system service timeout usually shows the keyword BinderProxy.transactNative in the stack. Here is a log excerpt:
```
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x727851e8 self=0x78d7060e00
| sysTid=4894 nice=0 cgrp=default sched=0/0 handle=0x795cc1e9a8
| state=S schedstat=( 8292806752 1621087524 7167 ) utm=707 stm=122 core=5 HZ=100
| stack=0x7febb64000-0x7febb66000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0x90/0xc4
kernel: binder_thread_read+0xbd8/0x144c
kernel: binder_ioctl_write_read.constprop.58+0x20c/0x348
kernel: binder_ioctl+0x5d4/0x88c
kernel: do_vfs_ioctl+0xb8/0xb1c
kernel: SyS_ioctl+0x84/0x98
kernel: cpu_switch_to+0x34c/0x22c0
native: #00 pc 000000000007a2ac /system/lib64/libc.so (__ioctl+4)
native: #01 pc 00000000000276ec /system/lib64/libc.so (ioctl+132)
native: #02 pc 00000000000557d4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+252)
native: #03 pc 0000000000056494 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
native: #04 pc 00000000000562d0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+216)
native: #05 pc 000000000004ce1c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
native: #06 pc 00000000001281c8 /system/lib64/libandroid_runtime.so (???)
native: #07 pc 0000000000947ed4 /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
at android.os.BinderProxy.transactNative(Native method) // key line
at android.os.BinderProxy.transact(Binder.java:804)
at android.net.IConnectivityManager$Stub$Proxy.getActiveNetworkInfo(IConnectivityManager.java:1204) // key line
at android.net.ConnectivityManager.getActiveNetworkInfo(ConnectivityManager.java:800)
at com.xiaomi.NetworkUtils.getNetworkInfo(NetworkUtils.java:2)
at com.xiaomi.frameworkbase.utils.NetworkUtils.getNetWorkType(NetworkUtils.java:1)
at com.xiaomi.frameworkbase.utils.NetworkUtils.isWifiConnected(NetworkUtils.java:1)
```
The stack shows that the ANR happened while fetching network information with getActiveNetworkInfo. System services are all served over Binder by a limited pool of threads, so their capacity is limited, and a system service that stays unresponsive for a long time can cause an ANR. If other apps occupy all the Binder threads, the current app can only wait. You can also search for the keyword blockUntilThreadAvailable:
```
at android.os.Binder.blockUntilThreadAvailable(Native method)
```
If a thread's stack contains this line, look further at that stack to find out which system service was being called. This kind of ANR is also a problem of the system environment. If it happens frequently on a particular phone model, the app can consider a workaround.
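Finding the remote call behind a Binder stall can be automated. In a thread's stack, the first Java frame below BinderProxy.transact* is usually the generated Stub$Proxy method, which names the remote call. Below is a hypothetical C++ helper built on that assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the first "at ..." frame after the BinderProxy.transact* frames, without its
// "(File.java:N)" suffix, or an empty string if the thread is not in a Binder call.
std::string blockedBinderCall(const std::vector<std::string>& stack) {
    bool inTransact = false;
    for (const std::string& line : stack) {
        size_t first = line.find_first_not_of(" \t");
        // Only Java frames ("at ...") matter; skip kernel:, native: and "- locked" lines.
        if (first == std::string::npos || line.compare(first, 3, "at ") != 0) continue;
        std::string frame = line.substr(first + 3);
        if (frame.rfind("android.os.BinderProxy.transact", 0) == 0) {
            inTransact = true;
            continue;
        }
        if (inTransact) return frame.substr(0, frame.find('('));
    }
    return "";
}
```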
