Android: a persistent APK that crashes repeatedly sends the device into recovery mode

1. Overview

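Google added this feature in Android 8.0 and named it Rescue Party.

It watches for core system processes that are stuck in a crash loop. When one is detected, Rescue Party runs a series of actions that escalate with the rescue level to try to recover the device. At the highest level it reboots into recovery and prompts the user to wipe user data (factory reset).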

Code:

frameworks/base/services/core/java/com/android/server/RescueParty.java

1. Levels

private static final int LEVEL_NONE = 0;
private static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
private static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
private static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
private static final int LEVEL_FACTORY_RESET = 4;
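The log messages later in this article print the level through levelToString(level). As a minimal sketch of what such a mapping could look like, the stand-alone class below repeats the five constants and turns each one into a readable name. The class name RescueLevels and the exact strings returned are assumptions made for illustration, not a copy of the framework source.

```java
// Hypothetical stand-alone class; only the constant names and values come from the article.
class RescueLevels {
    static final int LEVEL_NONE = 0;
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
    static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
    static final int LEVEL_FACTORY_RESET = 4;

    // Maps a level to a readable name; unknown values fall back to the number itself.
    static String levelToString(int level) {
        switch (level) {
            case LEVEL_NONE: return "NONE";
            case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS: return "RESET_SETTINGS_UNTRUSTED_DEFAULTS";
            case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES: return "RESET_SETTINGS_UNTRUSTED_CHANGES";
            case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS: return "RESET_SETTINGS_TRUSTED_DEFAULTS";
            case LEVEL_FACTORY_RESET: return "FACTORY_RESET";
            default: return Integer.toString(level);
        }
    }

    public static void main(String[] args) {
        for (int level = LEVEL_NONE; level <= LEVEL_FACTORY_RESET; level++) {
            System.out.println(level + " = " + levelToString(level));
        }
    }
}
```

The levels are ordered from least to most destructive, so a larger number always means a more aggressive recovery action.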

2. Trigger conditions:

(1) system_server restarts 5 or more times within 5 minutes: the rescue level is raised by one.

(2) A persistent system app crashes 5 or more times within 30 seconds: the rescue level is raised by one.

2. Analysis

Threshold

The Threshold class implements the crash-counting logic for a monitored process. One instance is created per monitored process, and the process is identified by its uid.

Key fields:

private final int uid;            // uid of the monitored process
private final int triggerCount;   // number of crashes within one window that triggers a rescue
private final long triggerWindow; // length of the counting window, in milliseconds

Key methods:

public abstract int getCount();            // returns the current crash count
public abstract void setCount(int count);  // stores the updated crash count
public abstract long getStart();           // returns the start time of the current counting window
public abstract void setStart(long start); // stores the start time of the current counting window

// Resets the crash count and the window start time.
public void reset() {
    setCount(0);
    setStart(0);
}

// Updates the crash count and reports whether the limit was reached within the current window.
public boolean incrementAndTest() {
    // Milliseconds since boot, not wall-clock time.
    final long now = SystemClock.elapsedRealtime();
    // Age of the current window. While getStart() is still 0 this equals the time since boot,
    // so it exceeds triggerWindow only once the device has been up longer than the window.
    final long window = now - getStart();
    if (window > triggerWindow) {
        // The window has expired: start a new counting window.
        setCount(1);
        setStart(now);
        return false;
    } else {
        int count = getCount() + 1; // one more crash in this window
        setCount(count);
        EventLogTags.writeRescueNote(uid, count, window);
        Slog.w(TAG, "Noticed " + count + " events for UID " + uid + " in last "
                + (window / 1000) + " sec");
        return (count >= triggerCount); // true once the count reaches triggerCount (5)
    }
}
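To make the windowing behaviour easier to try outside the framework, here is a minimal stand-alone sketch of the same counting logic. WindowCounter is a hypothetical name, and the current time is passed in as a parameter instead of being read from SystemClock.elapsedRealtime(), which is an assumption made so the example runs on a plain JVM.

```java
// Stand-alone model of Threshold.incrementAndTest(); not the framework class.
class WindowCounter {
    private final int triggerCount;
    private final long triggerWindowMs;
    private int count;
    private long start;

    WindowCounter(int triggerCount, long triggerWindowMs) {
        this.triggerCount = triggerCount;
        this.triggerWindowMs = triggerWindowMs;
    }

    int getCount() {
        return count;
    }

    // nowMs plays the role of SystemClock.elapsedRealtime().
    boolean incrementAndTest(long nowMs) {
        final long window = nowMs - start;
        if (window > triggerWindowMs) {
            // Window expired: this event opens a new window.
            count = 1;
            start = nowMs;
            return false;
        }
        count++;
        return count >= triggerCount;
    }

    public static void main(String[] args) {
        WindowCounter c = new WindowCounter(5, 30_000L);
        long now = 1_000_000L; // pretend uptime in milliseconds
        for (int i = 1; i <= 5; i++) {
            System.out.println("crash " + i + " -> " + c.incrementAndTest(now));
            now += 1_000L;
        }
    }
}
```

With five crashes one second apart, the first four calls return false and the fifth returns true. A crash that arrives after the window has expired restarts the count at 1.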


As noted above, Rescue Party monitors system_server and persistent processes. The two cases are analyzed separately.

Monitoring the system_server process

The BootThreshold class extends Threshold.

Points to note

(1) The monitored uid is android.os.Process.ROOT_UID = 0, i.e. the zygote process, because a system_server restart always causes zygote to restart as well.

    triggerCount = 5
    triggerWindow = 300 * DateUtils.SECOND_IN_MILLIS

Constructor:

public BootThreshold() {
    super(android.os.Process.ROOT_UID, 5, 300 * DateUtils.SECOND_IN_MILLIS);
}

In short: the counting window is 300 s (5 minutes) and the limit is 5 restarts.

The system_server restart count and the window start time are stored in system properties (they are read and written through SystemProperties, not SettingsProvider):

private static final String PROP_RESCUE_BOOT_COUNT = "sys.rescue_boot_count"; // restart count
private static final String PROP_RESCUE_BOOT_START = "sys.rescue_boot_start"; // start time of the counting window
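The reason for keeping these two values outside the object is that the object itself is lost every time system_server restarts. The sketch below illustrates the idea with a plain Map standing in for the system property store. BootCounterSketch and the Map are assumptions made for illustration; only the two property keys come from the article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: the Map stands in for the property store, which outlives the process.
class BootCounterSketch {
    static final String PROP_RESCUE_BOOT_COUNT = "sys.rescue_boot_count";
    static final String PROP_RESCUE_BOOT_START = "sys.rescue_boot_start";

    private final Map<String, String> props;

    BootCounterSketch(Map<String, String> props) {
        this.props = props;
    }

    // Missing keys read as 0, like a property that was never set.
    int getCount() {
        return Integer.parseInt(props.getOrDefault(PROP_RESCUE_BOOT_COUNT, "0"));
    }

    void setCount(int count) {
        props.put(PROP_RESCUE_BOOT_COUNT, Integer.toString(count));
    }

    long getStart() {
        return Long.parseLong(props.getOrDefault(PROP_RESCUE_BOOT_START, "0"));
    }

    void setStart(long start) {
        props.put(PROP_RESCUE_BOOT_START, Long.toString(start));
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        new BootCounterSketch(store).setCount(3);          // written by one "process"
        BootCounterSketch afterRestart = new BootCounterSketch(store);
        System.out.println("count after restart: " + afterRestart.getCount());
    }
}
```

A fresh BootCounterSketch built on the same store still sees the earlier count, which is the property that lets the counter span several system_server restarts.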

sBoot is a static field, so the BootThreshold instance is created when the RescueParty class is initialized:

private static final Threshold sBoot = new BootThreshold();

Monitoring entry point. Every time system_server starts, the following call is made:

SystemServer.startBootstrapServices
    ==> RescueParty.noteBoot(mSystemContext);


public static void noteBoot(Context context) {
    if (isDisabled()) return;
    if (sBoot.incrementAndTest()) { // true once 5 restarts are counted within 5 minutes
        sBoot.reset();                   // reset the counters first
        incrementRescueLevel(sBoot.uid); // raise the rescue level
        executeRescueLevel(context);     // run the rescue action
    }
}


private static void incrementRescueLevel(int triggerUid) {
    // Each call raises the rescue level by one. The level is kept in the system property
    // "sys.rescue_level"; it defaults to LEVEL_NONE and is capped at LEVEL_FACTORY_RESET.
    final int level = MathUtils.constrain(
            SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE) + 1,
            LEVEL_NONE, LEVEL_FACTORY_RESET);
    SystemProperties.set(PROP_RESCUE_LEVEL, Integer.toString(level));

    EventLogTags.writeRescueLevel(level, triggerUid);
    // PackageManagerService.logCriticalInfo records the level change in the PKMS log file,
    // /data/system/uiderrors.txt.
    PackageManagerService.logCriticalInfo(Log.WARN, "Incremented rescue level to "
            + levelToString(level) + " triggered by UID " + triggerUid);
}
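The key step in incrementRescueLevel is the clamp: the level goes up by one per trigger but never past LEVEL_FACTORY_RESET. A minimal sketch of that arithmetic follows. LevelEscalation and nextLevel are hypothetical names, and constrain is a local helper written here to behave like a three-argument integer clamp, so the example does not depend on Android classes.

```java
// Hypothetical stand-alone model of the level escalation arithmetic.
class LevelEscalation {
    static final int LEVEL_NONE = 0;
    static final int LEVEL_FACTORY_RESET = 4;

    // Clamps amount into the inclusive range [low, high].
    static int constrain(int amount, int low, int high) {
        return amount < low ? low : (amount > high ? high : amount);
    }

    // Level that results from one more trigger at the given current level.
    static int nextLevel(int current) {
        return constrain(current + 1, LEVEL_NONE, LEVEL_FACTORY_RESET);
    }

    public static void main(String[] args) {
        int level = LEVEL_NONE;
        for (int trigger = 1; trigger <= 6; trigger++) {
            level = nextLevel(level);
            System.out.println("trigger " + trigger + " -> level " + level);
        }
    }
}
```

Six triggers in a row walk the level through 1, 2, 3, 4 and then hold it at 4, so repeated failures keep requesting the factory-reset prompt rather than overflowing to an undefined level.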


private static void executeRescueLevel(Context context) {
    final int level = SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE); // current rescue level
    if (level == LEVEL_NONE) return;

    Slog.w(TAG, "Attempting rescue level " + levelToString(level));
    try {
        executeRescueLevelInternal(context, level); // run the action for this level
        EventLogTags.writeRescueSuccess(level);
        PackageManagerService.logCriticalInfo(Log.DEBUG,
                "Finished rescue level " + levelToString(level)); // logged to uiderrors.txt
    } catch (Throwable t) {
        final String msg = ExceptionUtils.getCompleteMessage(t);
        EventLogTags.writeRescueFailure(level, msg);
        PackageManagerService.logCriticalInfo(Log.ERROR,
                "Failed rescue level " + levelToString(level) + ": " + msg);
    }
}


private static void executeRescueLevelInternal(Context context, int level) throws Exception {
    switch (level) {
        // Levels 1-3 reset settings progressively more aggressively. Level 4, the highest,
        // reboots into recovery and asks the user to wipe the data partition.
        case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS:
            // Resets the settings written by non-system processes.
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_DEFAULTS);
            break;
        case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES:
            // For settings from non-system processes: restores those that have a system
            // default and deletes the rest.
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_CHANGES);
            break;
        case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS:
            // For settings from all processes: restores those that have a system default
            // and deletes the rest.
            resetAllSettings(context, Settings.RESET_MODE_TRUSTED_DEFAULTS);
            break;
        case LEVEL_FACTORY_RESET:
            // Reboots into recovery.
            RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
            break;
    }
}


private static void resetAllSettings(Context context, int mode) throws Exception {
    Exception res = null;
    final ContentResolver resolver = context.getContentResolver();
    try { // reset the global (device-wide) settings
        Settings.Global.resetToDefaultsAsUser(resolver, null, mode, UserHandle.USER_SYSTEM);
    } catch (Throwable t) {
        res = new RuntimeException("Failed to reset global settings", t);
    }
    for (int userId : getAllUserIds()) { // with multiple users, reset the secure settings of every user
        try {
            Settings.Secure.resetToDefaultsAsUser(resolver, null, mode, userId);
        } catch (Throwable t) {
            res = new RuntimeException("Failed to reset secure settings for " + userId, t);
        }
    }
    if (res != null) {
        throw res;
    }
}
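resetAllSettings uses a pattern worth noticing: a failure in one reset does not stop the remaining resets, and the error is only rethrown at the end. The sketch below isolates that pattern. ResetAllSketch and runAll are hypothetical names, and the steps are plain Runnable objects standing in for the per-user reset calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "keep going, report at the end" pattern.
class ResetAllSketch {
    // Runs every step even if an earlier one fails, then rethrows the most recent failure.
    static void runAll(List<Runnable> steps) throws Exception {
        Exception res = null;
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
            } catch (Throwable t) {
                res = new RuntimeException("Step " + i + " failed", t);
            }
        }
        if (res != null) {
            throw res;
        }
    }

    public static void main(String[] args) {
        List<Runnable> steps = new ArrayList<>();
        steps.add(() -> { throw new IllegalStateException("boom"); });
        steps.add(() -> System.out.println("second step still runs"));
        try {
            runAll(steps);
        } catch (Exception e) {
            System.out.println("reported at the end: " + e.getMessage());
        }
    }
}
```

As in the original method, only the most recent failure is kept, so earlier failures are overwritten when more than one step throws.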

Persistent process crashes

AppThreshold extends Threshold and monitors persistent app processes.

Points to note

(1) The monitored uid is the uid of the crashing app, passed in by the caller.

    triggerCount = 5
    triggerWindow = 30 * DateUtils.SECOND_IN_MILLIS

In short: the counting window is 30 s and the limit is 5 crashes.

public AppThreshold(int uid) {
    super(uid, 5, 30 * DateUtils.SECOND_IN_MILLIS);
}

The count and the window start time are kept in the object's own count and start fields.

Unlike system_server monitoring, there can be many app processes, so a sparse array maps each uid to its AppThreshold object.

private static SparseArray<Threshold> sApps = new SparseArray<>();


Whenever an app process crashes, the crash is reported to AMS, which calls AppErrors.crashApplicationInner. That method contains the following logic:

// r is the ProcessRecord of the crashed process
if (r != null && r.persistent) { // if the crashed process is persistent, report it together with its uid
    RescueParty.notePersistentAppCrash(mContext, r.uid);
}

public static void notePersistentAppCrash(Context context, int uid) {
    if (isDisabled()) return;
    // Create one AppThreshold per persistent process that has crashed and keep it in sApps.
    Threshold t = sApps.get(uid);
    if (t == null) {
        t = new AppThreshold(uid);
        sApps.put(uid, t);
    }
    // Count against the AppThreshold matched by uid. The counting logic is the same as
    // described for system_server above.
    if (t.incrementAndTest()) {
        t.reset();
        incrementRescueLevel(t.uid);
        executeRescueLevel(context);
    }
}
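The important property of notePersistentAppCrash is that every uid gets its own counter, so crashes of different apps are never added together. The sketch below models that with a HashMap in place of SparseArray so it runs on a plain JVM. PerUidCounters, Counter and noteCrash are hypothetical names, and the current time is passed in as a parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-uid crash counting; not the framework class.
class PerUidCounters {
    static final int TRIGGER_COUNT = 5;
    static final long TRIGGER_WINDOW_MS = 30_000L;

    private static final class Counter {
        int count;
        long start;
    }

    private final Map<Integer, Counter> counters = new HashMap<>();

    // Returns true when this uid reaches TRIGGER_COUNT crashes within one window.
    boolean noteCrash(int uid, long nowMs) {
        Counter c = counters.computeIfAbsent(uid, k -> new Counter());
        if (nowMs - c.start > TRIGGER_WINDOW_MS) {
            c.count = 1;   // window expired: start a new one
            c.start = nowMs;
            return false;
        }
        c.count++;
        if (c.count >= TRIGGER_COUNT) {
            c.count = 0;   // mirrors t.reset() before the rescue action runs
            c.start = 0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PerUidCounters counters = new PerUidCounters();
        long now = 1_000_000L;
        for (int i = 1; i <= 5; i++) {
            System.out.println("uid 1000 crash " + i + " -> " + counters.noteCrash(1000, now));
            now += 1_000L;
        }
        System.out.println("uid 1001 crash 1 -> " + counters.noteCrash(1001, now));
    }
}
```

Here uid 1000 trips the limit on its fifth crash, while the first crash of uid 1001 starts from its own fresh counter.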


When Rescue Party is disabled

(1) PROP_ENABLE_RESCUE is false (the default) and PROP_DISABLE_RESCUE is true

(2) On eng builds

(3) On userdebug builds while a USB connection is active

private static boolean isDisabled() {
    // An explicit enable overrides every check below.
    if (SystemProperties.getBoolean(PROP_ENABLE_RESCUE, false)) {
        return false;
    }

    // eng build?
    if (Build.IS_ENG) {
        Slog.v(TAG, "Disabled because of eng build");
        return true;
    }

    // userdebug build with an active USB connection?
    if (Build.IS_USERDEBUG && isUsbActive()) {
        Slog.v(TAG, "Disabled because of active USB connection");
        return true;
    }

    // manually disabled through a property?
    if (SystemProperties.getBoolean(PROP_DISABLE_RESCUE, false)) {
        Slog.v(TAG, "Disabled because of manual property");
        return true;
    }

    return false;
}
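The order of the checks in isDisabled matters, because the first matching check decides the result. The sketch below restates the method as a pure function of five booleans so the precedence can be tested directly. RescueGate and its parameter names are hypothetical stand-ins for the property reads and build flags.

```java
// Hypothetical pure-function model of the isDisabled() decision order.
class RescueGate {
    static boolean isDisabled(boolean enableProp, boolean isEng, boolean isUserdebug,
                              boolean usbActive, boolean disableProp) {
        if (enableProp) return false;              // explicit enable wins over everything
        if (isEng) return true;                    // eng builds never rescue
        if (isUserdebug && usbActive) return true; // developer is probably attached
        if (disableProp) return true;              // manual opt-out
        return false;
    }

    public static void main(String[] args) {
        System.out.println("user build, USB plugged in: "
                + isDisabled(false, false, false, true, false));
        System.out.println("userdebug build, USB plugged in: "
                + isDisabled(false, false, true, true, false));
        System.out.println("eng build with enable property set: "
                + isDisabled(true, true, false, false, false));
    }
}
```

Two consequences follow from the order: a USB connection only disables Rescue Party on userdebug builds, and setting the enable property re-enables it even on an eng build.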

Other scenarios

The current rescue level is also executed once when SettingsProvider is published. This call runs executeRescueLevel only; it does not raise the level.

/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

installSystemProviders() -> RescueParty.onSettingsProviderPublished(mContext);

public static void onSettingsProviderPublished(Context context) {
    executeRescueLevel(context);
}


Service initialization

void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
        int callingPid, int callingUid) {
    ...
    // If a persistent app is stuck in a crash loop, the device isn't very
    // usable, so we want to consider sending out a rescue party.
    if (r != null && r.persistent) {
        RescueParty.notePersistentAppCrash(mContext, r.uid);
    }
    AppErrorResult result = new AppErrorResult();
    TaskRecord task;
}


Flowchart (figure not included)


How to handle it:

Code path:

    /frameworks/base/services/core/java/com/android/server/RescueParty.java

    To turn Rescue Party off, make isDisabled() return true immediately:

        private static boolean isDisabled() {
            return true;
            // The original checks that follow must be removed, because Java
            // rejects statements placed after an unconditional return.
            ....
        }

    The call that enters recovery:

        private static void executeRescueLevelInternal(Context context, int level) throws Exception {
            ....
            case LEVEL_FACTORY_RESET:
                RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
                break;
        }
