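I. Problem description
Ten machines ran an automated test in which one round takes 5 days. One machine stopped before completing the test.
II. Analysis
1. From the log it is quick to establish that system_server crashed, which restarted the Android layer, and that the direct cause was a global reference table overflow. The runtime dumped the following: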
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256] JNI ERROR (app bug): global reference table overflow (max=51200)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256] global reference table dump:
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]  Last 10 entries (of 51200):
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51199: 0x12e3ba60 com.android.server.content.ContentService$ObserverNode$ObserverEntry
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51198: 0x12d93760 com.android.server.am.ServiceRecord
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51197: 0x12d8fa20 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51196: 0x12e391b8 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51195: 0x12c4db58 com.android.server.am.ServiceRecord
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51194: 0x12dc3e78 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51193: 0x12e3b560 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51192: 0x12e38718 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51191: 0x12dc3fc0 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51190: 0x12fc8b60 com.android.server.am.ServiceRecord
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]  Summary:
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    24615 of java.lang.ref.WeakReference (24615 unique instances)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    23622 of com.android.server.content.ContentService$ObserverNode$ObserverEntry (23622 unique instances)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      758 of android.os.Binder (758 unique instances)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      485 of com.android.server.notification.NotificationManagerService$StatusBarNotificationHolder (485 unique instances)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      468 of com.android.server.am.ServiceRecord (468 unique instances)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      319 of java.lang.Class (239 unique instances)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      181 of java.nio.DirectByteBuffer (170 unique instances)
08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      119 of android.os.RemoteCallbackList$Callback (119 unique instances)
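The key points from the dump above: the runtime's global reference table is limited to 51200 entries; the last entry being created was an ObserverEntry; and the two types with the most references are ObserverEntry and WeakReference. The WeakReference entries are suspected to be weak references to binder objects. An ObserverEntry is the framework-side instance of a content observer and is used to call back into the app, which would explain why the ObserverEntry and WeakReference counts are so close. The only way forward is to start from the ObserverEntry objects. Given how ContentService is implemented, the suspicion is that some app repeatedly registers content observers without releasing them at the appropriate time. Confirming this needs a live device, but the state of the machine that reproduced the problem had already been lost, so the only option was to debug on a machine that had finished the test without reproducing the problem and had not been rebooted.
2. We obtained a machine that had not been rebooted after the test. Since there are so many ObserverEntry objects, run dumpsys content to inspect the state of ContentService. For more detail on its internals, read the logic of ContentService's registerContentObserver, unregisterContentObserver and dump functions. We skip the details here and go straight to the result: there is a very large number of observers, and all of them were registered by the process with pid=1266. The ps command or the log shows that pid 1266 is SystemUI.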
……
settings/global/always_on_display_constants:pid=1266 uid=10044 user=-1 target=17dc217
settings/global/always_on_display_constants:pid=1266 uid=10044 user=-1 target=878d604
……
pid 1266: 10532 observers
……
3. At this point we are fairly sure that SystemUI repeatedly registers observers for always_on_display_constants without releasing them. Searching the SystemUI code for the observed setting leads to the logic that registers the always_on_display_constants observers: starting DozeService registers three always_on_display_constants observers in total, and there is no code that unregisters them. This should be the cause of the ObserverEntry leak.
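To make the pattern concrete, here is a minimal sketch of a register-without-unregister leak. It is a toy model, not SystemUI or framework code: FakeRegistry and FakeDozeService are hypothetical stand-ins for the observer table in ContentService and for a short-lived service, and the cycle count of 1000 is an arbitrary illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the leak; none of these classes are real Android APIs. */
final class ObserverLeakSketch {

    /** Hypothetical stand-in for the observer table held by ContentService. */
    static final class FakeRegistry {
        private final Set<Object> observers = new HashSet<>();

        void register(Object observer) { observers.add(observer); }

        void unregister(Object observer) { observers.remove(observer); }

        int size() { return observers.size(); }
    }

    /** Hypothetical stand-in for a service that registers observers on start. */
    static final class FakeDozeService {
        private final FakeRegistry registry;
        private final Set<Object> registered = new HashSet<>();

        FakeDozeService(FakeRegistry registry) { this.registry = registry; }

        void onCreate(int observersPerStart) {
            for (int i = 0; i < observersPerStart; i++) {
                Object observer = new Object();
                registered.add(observer);
                registry.register(observer);
            }
        }

        /** Without the unregister step, the registry keeps every observer. */
        void onDestroy(boolean unregister) {
            if (unregister) {
                for (Object observer : registered) {
                    registry.unregister(observer);
                }
            }
            registered.clear();
        }
    }

    /** Runs start/stop cycles and returns the entries left in the registry. */
    static int simulate(int cycles, int observersPerStart, boolean unregister) {
        FakeRegistry registry = new FakeRegistry();
        for (int i = 0; i < cycles; i++) {
            FakeDozeService service = new FakeDozeService(registry);
            service.onCreate(observersPerStart);
            service.onDestroy(unregister);
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + simulate(1000, 3, false)); // prints "leaky: 3000"
        System.out.println("fixed: " + simulate(1000, 3, true));  // prints "fixed: 0"
    }
}
```

The service object itself is discarded after every cycle, yet the registry on the other side still holds each observer, so the count only grows.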
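4. So when does DozeService start, and when does it exit? Looking at the log again, there are many records of DozeService being started and stopped, with 7609 start records. At 3 leaked observers per start/exit cycle, the visible log alone accounts for 7609 × 3 = 22827 leaked observers, which is very close to the ObserverEntry count (23622) in the reference table dump above.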
08-21 05:41:56.013  955  4704 I PowerManagerService: Going to sleep due to power button (uid 1000)...
......
08-21 05:41:56.713  955  974 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.systemui.doze.DozeService}, isTest=false, canDoze=true, userId=0
......
08-21 05:42:01.530  955  1520 I PowerManagerService: Waking up from dozing (uid=1000 reason=android.policy:POWER)...
......
08-21 05:42:01.829  955  974 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.systemui.doze.DozeService}, isTest=false, canDoze=true, userId=0
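The count and the estimate above can be reproduced with a few lines of Java. This is only a sketch: DreamLogCounter and its method names are hypothetical, and it assumes every DozeService start is logged by DreamController in the same format as the excerpt.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for counting DozeService starts in a logcat capture. */
final class DreamLogCounter {
    // Component name as it appears in the DreamController log lines.
    static final String DOZE_COMPONENT =
            "com.android.systemui/com.android.systemui.doze.DozeService";

    /** Counts "Starting dream" records that refer to DozeService. */
    static long countStarts(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("DreamController: Starting dream:"))
                .filter(line -> line.contains(DOZE_COMPONENT))
                .count();
    }

    /** Estimates leaked observers, assuming a fixed number per start. */
    static long estimateLeaked(long starts, int observersPerStart) {
        return starts * observersPerStart;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "I PowerManagerService: Going to sleep due to power button (uid 1000)...",
                "I DreamController: Starting dream: name=ComponentInfo{" + DOZE_COMPONENT + "}",
                "I DreamController: Stopping dream: name=ComponentInfo{" + DOZE_COMPONENT + "}");
        System.out.println(countStarts(sample));      // prints 1
        System.out.println(estimateLeaked(7609, 3));  // prints 22827
    }
}
```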
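Automated tests usually take a long time, so the problem is hard to verify that way; it is better to find steps that reproduce it directly. In this case that is straightforward: DozeService is started when the screen turns off and exits when the screen turns on. This is framework logic and is not analyzed in detail here. We toggle the screen off and on and use dumpsys content to check whether the guess is right. After one screen-off followed by a screen-on, the observer count increases by 3, and all three new entries are for always_on_display_constants.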
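Comparing two dumpsys content snapshots by eye is tedious, so the check can be scripted. The sketch below is an assumption-laden helper, not part of any Android tool: ObserverCountDiff is a hypothetical name, it relies only on the "pid N: M observers" summary line shown earlier, and the 10535 value in the example is invented to illustrate a growth of 3.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that compares per-pid observer counts in two dumps. */
final class ObserverCountDiff {
    // Matches summary lines such as "pid 1266: 10532 observers".
    private static final Pattern PER_PID =
            Pattern.compile("pid (\\d+): (\\d+) observers");

    /** Extracts pid -> observer count from captured dumpsys content output. */
    static Map<Integer, Integer> parse(String dumpsysOutput) {
        Map<Integer, Integer> counts = new HashMap<>();
        Matcher matcher = PER_PID.matcher(dumpsysOutput);
        while (matcher.find()) {
            counts.put(Integer.parseInt(matcher.group(1)),
                    Integer.parseInt(matcher.group(2)));
        }
        return counts;
    }

    /** Returns how many observers the given pid gained between two dumps. */
    static int growth(String before, String after, int pid) {
        return parse(after).getOrDefault(pid, 0)
                - parse(before).getOrDefault(pid, 0);
    }

    public static void main(String[] args) {
        String before = "pid 1266: 10532 observers\n";
        String after = "pid 1266: 10535 observers\n"; // invented value for illustration
        System.out.println(growth(before, after, 1266)); // prints 3
    }
}
```

The per-observer lines use "pid=1266" rather than "pid 1266:", so the pattern does not match them and only the summary lines are counted.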
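5. We asked the test team whether this automated test case includes screen on/off actions, and the answer was yes: it is a test case required by the customer! This gives us a way to reproduce the problem, which makes the subsequent fix and verification much easier.
III. Fix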
diff --git a/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java b/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java
index debda21..9b69031 100644
--- a/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java
+++ b/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java
@@ -102,6 +102,14 @@
         mSettingsObserver.observe();
     }
 
+    public void clear() {
+        if (mSettingsObserver != null) {
+            //mSettingsObserver.clearObserve();
+            ContentResolver resolver = mContext.getContentResolver();
+            resolver.unregisterContentObserver(mSettingsObserver);
+        }
+    }
+
     private int[] parseIntArray(final String key, final int[] defaultArray) {
         final String value = mParser.getString(key, null);
         if (value != null) {
@@ -130,7 +138,12 @@
                     false, this, UserHandle.USER_ALL);
             update(null);
         }
-
+/*
+        void clearObserve() {
+            ContentResolver resolver = mContext.getContentResolver();
+            resolver.registerContentObserver(this);
+        }
+*/
         @Override
         public void onChange(boolean selfChange, Uri uri) {
             update(uri);
diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java b/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java
index 8ec6afc..7d3dc3f 100644
--- a/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java
+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java
@@ -129,6 +129,13 @@
         mParts = parts;
     }
 
+    /** clear some reference stored in framework-system_server */
+    public void clear() {
+        for (Part p : mParts) {
+            p.clear();
+        }
+    }
+
     /**
      * Requests transitioning to {@code requestedState}.
      *
@@ -348,6 +355,8 @@
          */
         void transitionTo(State oldState, State newState);
 
+        default void clear() {}
+
         /** Dump current state. For debugging only. */
         default void dump(PrintWriter pw) {}
     }
diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java b/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java
index 58f1448..f7f49225 100644
--- a/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java
+++ b/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java
@@ -50,6 +50,13 @@
         }
     }
 
+    @Override
+    public void clear() {
+        if (mPolicy != null) {
+            mPolicy.clear();
+        }
+    }
+
     private void onTimeout() {
         mMachine.requestState(DozeMachine.State.DOZE_AOD_PAUSED);
     }
diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java b/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java
index 4bb4e79..f31d3c6 100644
--- a/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java
+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java
@@ -38,6 +38,7 @@
     private final Sensor mLightSensor;
     private final int[] mSensorToBrightness;
     private final int[] mSensorToScrimOpacity;
+    private final AlwaysOnDisplayPolicy mPolicy;
 
     private boolean mRegistered;
     private int mDefaultDozeBrightness;
@@ -58,16 +59,31 @@
         mDefaultDozeBrightness = defaultDozeBrightness;
         mSensorToBrightness = sensorToBrightness;
         mSensorToScrimOpacity = sensorToScrimOpacity;
+
+        mPolicy = null;
     }
 
     @VisibleForTesting
     public DozeScreenBrightness(Context context, DozeMachine.Service service,
             SensorManager sensorManager, Sensor lightSensor, DozeHost host,
             Handler handler, AlwaysOnDisplayPolicy policy) {
+/*
         this(context, service, sensorManager, lightSensor, host, handler,
                 context.getResources().getInteger(
                         com.android.internal.R.integer.config_screenBrightnessDoze),
                 policy.screenBrightnessArray, policy.dimmingScrimArray);
+*/
+        mContext = context;
+        mDozeService = service;
+        mSensorManager = sensorManager;
+        mLightSensor = lightSensor;
+        mDozeHost = host;
+        mHandler = handler;
+
+        mDefaultDozeBrightness = context.getResources().getInteger(com.android.internal.R.integer.config_screenBrightnessDoze);
+        mSensorToBrightness = policy.screenBrightnessArray;
+        mSensorToScrimOpacity = policy.dimmingScrimArray;
+        mPolicy = policy;
     }
 
     @Override
@@ -94,6 +110,13 @@
     }
 
     @Override
+    public void clear() {
+        if (mPolicy != null) {
+            mPolicy.clear();
+        }
+    }
+
+    @Override
     public void onSensorChanged(SensorEvent event) {
         Trace.beginSection("DozeScreenBrightness.onSensorChanged" + event.values[0]);
         try {
diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java b/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java
index 91cde37..000e47a 100644
--- a/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java
+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java
@@ -133,6 +133,12 @@
         return null;
     }
 
+    public void clear() {
+        if (mProxSensor != null) {
+            mProxSensor.clear();
+        }
+    }
+
     public void setListening(boolean listen) {
         for (TriggerSensor s : mSensors) {
             s.setListening(listen);
@@ -234,6 +240,12 @@
             updateRegistered();
         }
 
+        public void clear() {
+            if (mPolicy != null) {
+                mPolicy.clear();
+            }
+        }
+
         private void updateRegistered() {
             setRegistered(mRequested && !mCooldownTimer.isScheduled());
         }
diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeService.java b/packages/SystemUI/src/com/android/systemui/doze/DozeService.java
index 98b1106..b147f97 100644
--- a/packages/SystemUI/src/com/android/systemui/doze/DozeService.java
+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeService.java
@@ -57,6 +57,16 @@
         mDozeMachine = new DozeFactory().assembleMachine(this);
     }
 
+    /** {@inheritDoc} */
+    @Override
+    public void onDestroy() {
+        Log.d(TAG, "onDestroy() being called when DreamController stop this service");
+        if (mDozeMachine != null) {
+            mDozeMachine.clear();
+        }
+        super.onDestroy();
+    }
+
     @Override
     public void onPluginConnected(DozeServicePlugin plugin, Context pluginContext) {
         mDozePlugin = plugin;
diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java b/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java
index f7a258a..ea6ae4d 100644
--- a/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java
+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java
@@ -212,6 +212,13 @@
         }
     }
 
+    @Override
+    public void clear() {
+        if (mDozeSensors != null) {
+            mDozeSensors.clear();
+        }
+    }
+
     private void checkTriggersAtInit() {
         if (mUiModeManager.getCurrentModeType() == Configuration.UI_MODE_TYPE_CAR
                 || mDozeHost.isPowerSaveActive()
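IV. Verification
Because the problem is fairly clear-cut, we verified the fix directly by running 20000 "sleep/wake" cycles as a comparison. Two machines were prepared: one flashed with the old build and one with the modified build. The old build crashed at around 8000 cycles, while the modified build was still running normally after 20000 cycles. The crash point on the old build fits the leak rate found above: at 3 observers per cycle, 8000 cycles add about 24000 ObserverEntry objects, close to the 23622 seen in the original crash dump.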