A curious crash: Context.startForegroundService() did not then call Service.startForeground()

First, the stack trace:

android.app.RemoteServiceException

Context.startForegroundService() did not then call Service.startForeground(): ServiceRecord{413a500 u0 com.xx.xx/com.xx.baseUi.service.BackgroundKeepService}

android.app.ActivityThread$H.handleMessage(ActivityThread.java:1796)
android.os.Handler.dispatchMessage(Handler.java:106)
android.os.Looper.loop(Looper.java:213)
android.app.ActivityThread.main(ActivityThread.java:6954)
java.lang.reflect.Method.invoke(Native Method)
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:909)

Roughly, it says that startForegroundService was called but Service.startForeground() was not called afterwards.
Back in the business code, BackgroundKeepService's onCreate method does in fact call it. That was puzzling...

After going back and forth, I started to wonder whether this could be an AOSP bug... When in doubt, read the source!

In the framework's ActiveServices there are two places that produce the "Context.startForegroundService() did not then call Service.startForeground()" error:

   void serviceForegroundTimeout(ServiceRecord r) {
        ProcessRecord app;
        synchronized (mAm) {
            if (!r.fgRequired || !r.fgWaiting || r.destroying) {
                return;
            }

            app = r.app;
            if (app != null && app.isDebugging()) {
                // The app's being debugged; let it ride
                return;
            }

            if (DEBUG_BACKGROUND_CHECK) {
                Slog.i(TAG, "Service foreground-required timeout for " + r);
            }
            r.fgWaiting = false;
            stopServiceLocked(r, false);
        }

        if (app != null) {
            final String annotation = "Context.startForegroundService() did not then call "
                    + "Service.startForeground(): " + r;
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_ANR_MSG);
            SomeArgs args = SomeArgs.obtain();
            args.arg1 = app;
            args.arg2 = annotation;
            msg.obj = args;
            mAm.mHandler.sendMessageDelayed(msg,
                    mAm.mConstants.mServiceStartForegroundAnrDelayMs);
        }
    }

 void serviceForegroundCrash(ProcessRecord app, String serviceRecord,
            ComponentName service) {
        mAm.crashApplicationWithTypeWithExtras(
                app.uid, app.getPid(), app.info.packageName, app.userId,
                "Context.startForegroundService() did not then call " + "Service.startForeground(): "
                    + serviceRecord, false /*force*/,
                ForegroundServiceDidNotStartInTimeException.TYPE_ID,
                ForegroundServiceDidNotStartInTimeException.createExtrasForService(service));
    }

The first one raises an ANR, not a crash, so it is not our target. Let's focus on serviceForegroundCrash. Tracing its caller leads to this code in ActivityManagerService:

         case SERVICE_FOREGROUND_CRASH_MSG: {
                SomeArgs args = (SomeArgs) msg.obj;
                mServices.serviceForegroundCrash(
                        (ProcessRecord) args.arg1,
                        (String) args.arg2,
                        (ComponentName) args.arg3);
                args.recycle();
            } break;

This is a crash, and it matches the stack trace. The message is sent from ActiveServices.bringDownServiceLocked:

private void bringDownServiceLocked(ServiceRecord r, boolean enqueueOomAdj) {
           //....
            mAm.mAppOpsService.finishOperation(AppOpsManager.getToken(mAm.mAppOpsService),
                    AppOpsManager.OP_START_FOREGROUND, r.appInfo.uid, r.packageName, null);
            mAm.mHandler.removeMessages(
                    ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG, r);
            if (r.app != null) {
                Message msg = mAm.mHandler.obtainMessage(
                        ActivityManagerService.SERVICE_FOREGROUND_CRASH_MSG);
                SomeArgs args = SomeArgs.obtain();
                args.arg1 = r.app;
                args.arg2 = r.toString();
                args.arg3 = r.getComponentName();

                msg.obj = args;
                mAm.mHandler.sendMessage(msg);
            }
        }

       //...
    }

Continuing the analysis: ActiveServices calls this method in 9 places. After excluding calls tied to app startup, process kills and similar scenarios, what remains is two calls in private String bringUpServiceLocked() and one call in private final void bringDownServiceIfNeededLocked().

OK, the problem gets simpler. The two calls in private String bringUpServiceLocked() relate to process startup and permissions, so they can be ruled out.
Simpler still: only private final void bringDownServiceIfNeededLocked() needs to be analysed. ActiveServices calls it in three places:

  private void stopServiceLocked(ServiceRecord service, boolean enqueueOomAdj) {
        //...

        bringDownServiceIfNeededLocked(service, false, false, enqueueOomAdj); // here
    }
    boolean stopServiceTokenLocked(ComponentName className, IBinder token,
            int startId) {
        if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "stopServiceToken: " + className
                + " " + token + " startId=" + startId);
        ServiceRecord r = findServiceLocked(className, token, UserHandle.getCallingUserId());
        if (r != null) {
            //...
            r.callStart = false;
            final long origId = Binder.clearCallingIdentity();
            bringDownServiceIfNeededLocked(r, false, false, false); // here
            Binder.restoreCallingIdentity(origId);
            return true;
        }
        return false;
    }
 void removeConnectionLocked(ConnectionRecord c, ProcessRecord skipApp,
            ActivityServiceConnectionsHolder skipAct, boolean enqueueOomAdj) {
                //...
                bringDownServiceIfNeededLocked(s, true, hasAutoCreate, enqueueOomAdj); // here
            }
        }
    }

The last one is connection-related; this Service has no Binder connection, so it can be ruled out. Which means the crash turns out to be related to stopService. Wait, what?!



First I commented out the stopService call in our code and ran Monkey again. Sure enough, the crash was gone.

So why exactly does stopService cause this crash? Let's trace the stopService source, starting with ActivityManagerService.stopService():

   @Override
    public int stopService(IApplicationThread caller, Intent service,
            String resolvedType, int userId) {
        enforceNotIsolatedCaller("stopService");
        // Refuse possible leaked file descriptors
        if (service != null && service.hasFileDescriptors() == true) {
            throw new IllegalArgumentException("File descriptors passed in Intent");
        }

        synchronized(this) {
            return mServices.stopServiceLocked(caller, service, resolvedType, userId);
        }
    }

Next, follow mServices.stopServiceLocked() into ActiveServices' int stopServiceLocked(IApplicationThread caller, Intent service, String resolvedType, int userId) method:

    int stopServiceLocked(IApplicationThread caller, Intent service,
            String resolvedType, int userId) {
        if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "stopService: " + service
                + " type=" + resolvedType);

        final ProcessRecord callerApp = mAm.getRecordForAppLOSP(caller);
        if (caller != null && callerApp == null) {
            throw new SecurityException(
                    "Unable to find app for caller " + caller
                    + " (pid=" + Binder.getCallingPid()
                    + ") when stopping service " + service);
        }

        // If this service is active, make sure it is stopped.
        ServiceLookupResult r = retrieveServiceLocked(service, null, resolvedType, null,
                Binder.getCallingPid(), Binder.getCallingUid(), userId, false, false, false, false);
        if (r != null) {
            if (r.record != null) {
                final long origId = Binder.clearCallingIdentity();
                try {
                    stopServiceLocked(r.record, false); // here
                } finally {
                    Binder.restoreCallingIdentity(origId);
                }
                return 1;
            }
            return -1;
        }

        return 0;
    }

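At this point we can confirm that it lines up with private void stopServiceLocked(ServiceRecord service, boolean enqueueOomAdj) shown earlier, which ends up in bringDownServiceLocked. Looking closely at the logic there: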

        // Check to see if the service had been started as foreground, but being
        // brought down before actually showing a notification.  That is not allowed.
        if (r.fgRequired) {
            Slog.w(TAG_SERVICE, "Bringing down service while still waiting for start foreground: "
                    + r);
            r.fgRequired = false;
            r.fgWaiting = false;
            synchronized (mAm.mProcessStats.mLock) {
                ServiceState stracker = r.getTracker();
                if (stracker != null) {
                    stracker.setForeground(false, mAm.mProcessStats.getMemFactorLocked(),
                            SystemClock.uptimeMillis());
                }
            }
            mAm.mAppOpsService.finishOperation(AppOpsManager.getToken(mAm.mAppOpsService),
                    AppOpsManager.OP_START_FOREGROUND, r.appInfo.uid, r.packageName, null);
            mAm.mHandler.removeMessages(
                    ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG, r);
            if (r.app != null) {
                Message msg = mAm.mHandler.obtainMessage(
                        ActivityManagerService.SERVICE_FOREGROUND_CRASH_MSG);
                SomeArgs args = SomeArgs.obtain();
                args.arg1 = r.app;
                args.arg2 = r.toString();
                args.arg3 = r.getComponentName();

                msg.obj = args;
                mAm.mHandler.sendMessage(msg);
            }
        }

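If r.fgRequired is true, this crash occurs. So what is the logic behind r.fgRequired? Before answering that, let's see what Service.startForeground() actually does. As usual, read the code. Through ActivityManagerService the call eventually lands in our old friend ActiveServices: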

    @GuardedBy("mAm")
    public void setServiceForegroundLocked(ComponentName className, IBinder token,
            int id, Notification notification, int flags, int foregroundServiceType) {
        final int userId = UserHandle.getCallingUserId();
        final long origId = Binder.clearCallingIdentity();
        try {
            ServiceRecord r = findServiceLocked(className, token, userId);
            if (r != null) {
                setServiceForegroundInnerLocked(r, id, notification, flags, foregroundServiceType);
            }
        } finally {
            Binder.restoreCallingIdentity(origId);
        }
    }

Continue into setServiceForegroundInnerLocked:

@GuardedBy("mAm")
    private void setServiceForegroundInnerLocked(final ServiceRecord r, int id,
            Notification notification, int flags, int foregroundServiceType) {
//...
      if (r.fgRequired) {
                if (DEBUG_SERVICE || DEBUG_BACKGROUND_CHECK) {
                    Slog.i(TAG, "Service called startForeground() as required: " + r);
                }
                r.fgRequired = false;
                r.fgWaiting = false;
                alreadyStartedOp = stopProcStatsOp = true;
                mAm.mHandler.removeMessages(
                        ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG, r);
            }
//...
}

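In other words, the line r.fgRequired = false; never ran, which leads to the crash later on. But why did the Service's onCreate() method not run? And what does that have to do with stopping the Service? Could it be that our Service was stopped before onCreate was ever executed? After thinking it over, that was the only option left!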

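Because the system invokes the Service's onCreate method through a cross-process call, this is indeed possible. To verify, I wrote code that calls stopService immediately after startForegroundService, and the expected crash appeared: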

android.app.RemoteServiceException$ForegroundServiceDidNotStartInTimeException: Context.startForegroundService() did not then call Service.startForeground(): ServiceRecord{1e9c4ea u0 com.example.test/.MyService}
    at android.app.ActivityThread.generateForegroundServiceDidNotStartInTimeException(ActivityThread.java:2042)
    at android.app.ActivityThread.throwRemoteServiceException(ActivityThread.java:2013)
    at android.app.ActivityThread.-$$Nest$mthrowRemoteServiceException(Unknown Source:0)
    at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2282)
    at android.os.Handler.dispatchMessage(Handler.java:106)
    at android.os.Looper.loopOnce(Looper.java:201)
    at android.os.Looper.loop(Looper.java:288)
    at android.app.ActivityThread.main(ActivityThread.java:8049)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:942)

The corresponding code:

 startForegroundService(Intent(this@MainActivity,MyService::class.java))
 stopService(Intent(this@MainActivity,MyService::class.java))
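
The ordering problem can be sketched with a small toy model in plain Java. This is not AOSP code: the class ForegroundRaceModel and all of its members are hypothetical names, the fgRequired field only mirrors the flag seen in the excerpts above, and the app's main thread is reduced to a simple queue. The assumption being illustrated is that the system handles stopService() right away, while onCreate() (and therefore startForeground()) still sits in the app's queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the race; hypothetical names, not AOSP code. */
class ForegroundRaceModel {
    // System-side flag, mirroring ServiceRecord.fgRequired in the excerpts above.
    boolean fgRequired;
    // Stands in for SERVICE_FOREGROUND_CRASH_MSG having been sent.
    boolean crashPosted;
    // Work the app's main thread has not run yet (for example Service.onCreate()).
    final Queue<Runnable> appMainThread = new ArrayDeque<>();

    /** System side of startForegroundService(): set the flag, then ask the app to create the service. */
    void startForegroundService() {
        fgRequired = true;
        appMainThread.add(this::startForeground); // onCreate() will call startForeground()
    }

    /** System side of Service.startForeground(): the requirement is satisfied. */
    void startForeground() {
        fgRequired = false;
    }

    /** System side of stopService(): runs immediately, without waiting for the app's queue. */
    void stopService() {
        if (fgRequired) {
            fgRequired = false;
            crashPosted = true;
        }
    }

    /** Let the app's main thread catch up on its pending work. */
    void drainAppQueue() {
        Runnable task;
        while ((task = appMainThread.poll()) != null) {
            task.run();
        }
    }

    /** The stop request arrives before onCreate() has run. */
    static boolean stopBeforeCreate() {
        ForegroundRaceModel m = new ForegroundRaceModel();
        m.startForegroundService();
        m.stopService();
        m.drainAppQueue();
        return m.crashPosted;
    }

    /** The stop request arrives after onCreate() has run. */
    static boolean stopAfterCreate() {
        ForegroundRaceModel m = new ForegroundRaceModel();
        m.startForegroundService();
        m.drainAppQueue();
        m.stopService();
        return m.crashPosted;
    }

    public static void main(String[] args) {
        System.out.println("stop before onCreate -> crash posted: " + stopBeforeCreate());
        System.out.println("stop after onCreate  -> crash posted: " + stopAfterCreate());
    }
}
```

In this model stopBeforeCreate() returns true and stopAfterCreate() returns false, which matches the observation that the crash only shows up when the two calls are issued back to back.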

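Why would these two methods be called back to back? It turns out the app has a keep-alive mechanism: before the app goes to the background it starts a foreground Service so that it is not killed there. Foreground/background detection is implemented with the Application's ActivityLifecycleCallbacks. That is, when the number of visible activities drops to 0 the app calls startForegroundService, and when it rises above 0 it calls stopService. The catch is that the app's theming works by recreating the UI, so when the system switches between dark and light mode the app is recreated. That amounts to a very fast background/foreground switch, which triggers the crash.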
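
As a rough illustration of that trigger, here is a toy counter in plain Java. The class VisibleActivityTracker and its methods are hypothetical stand-ins for the app's real ActivityLifecycleCallbacks implementation, which the article does not show, and the sketch assumes the old activity instance is stopped before the new one is started during a recreation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the foreground/background tracker; hypothetical names. */
class VisibleActivityTracker {
    private int startedCount;
    // Records which service call the tracker would make, in order.
    final List<String> calls = new ArrayList<>();

    /** Stand-in for ActivityLifecycleCallbacks.onActivityStarted(). */
    void onActivityStarted() {
        startedCount++;
        if (startedCount == 1) {
            calls.add("stopService"); // the app is visible again
        }
    }

    /** Stand-in for ActivityLifecycleCallbacks.onActivityStopped(). */
    void onActivityStopped() {
        startedCount--;
        if (startedCount == 0) {
            calls.add("startForegroundService"); // the app went to the background
        }
    }

    /** Replays the recreation of a single visible activity. */
    static List<String> recreateSingleActivity() {
        VisibleActivityTracker t = new VisibleActivityTracker();
        t.onActivityStarted();  // first launch
        t.calls.clear();        // keep only what the recreation triggers
        t.onActivityStopped();  // old instance goes away
        t.onActivityStarted();  // new instance comes up
        return t.calls;
    }

    public static void main(String[] args) {
        System.out.println(recreateSingleActivity());
    }
}
```

Under these assumptions the recreation yields startForegroundService immediately followed by stopService, which is the same pair of calls as in the reproduction above.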

So how do we fix it? Destroy the Service through the Service's own stopSelf() instead of calling stopService from outside. The code:

    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        when (intent?.action) {
            ACTION_UPDATE_INFO -> {
              
            }

            ACTION_STOP_SERVICE -> {
                stopSelf()
            }
        }

        return START_NOT_STICKY
    }

The call site:

val intent = Intent(this, BackgroundKeepService::class.java)
intent.action = BackgroundKeepService.ACTION_STOP_SERVICE
startService(intent)
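
One way to see why this helps is another toy model in plain Java. Again, this is a sketch and not AOSP code: StopViaCommandModel and its members are hypothetical names, and the model ignores everything else startService() does on the system side. The single assumption is that the stop request is now delivered to onStartCommand() through the same app-side queue as onCreate(), so it cannot overtake the startForeground() call made in onCreate().

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of stopping via onStartCommand(); hypothetical names, not AOSP code. */
class StopViaCommandModel {
    // System-side flag, mirroring ServiceRecord.fgRequired in the excerpts above.
    boolean fgRequired;
    // Stands in for SERVICE_FOREGROUND_CRASH_MSG having been sent.
    boolean crashPosted;
    // Pending work for the app's main thread, processed in FIFO order.
    final Queue<Runnable> appMainThread = new ArrayDeque<>();

    /** startForegroundService(): set the flag and queue onCreate(), which calls startForeground(). */
    void startForegroundService() {
        fgRequired = true;
        appMainThread.add(() -> fgRequired = false);
    }

    /** startService() with the stop action: queued behind onCreate(). */
    void startServiceWithStopAction() {
        appMainThread.add(this::stopSelf); // onStartCommand() -> stopSelf()
    }

    /** stopSelf() leads to the service being brought down on the system side. */
    void stopSelf() {
        if (fgRequired) {
            crashPosted = true;
        }
        fgRequired = false;
    }

    /** Runs the pending app-side work in order. */
    void drainAppQueue() {
        Runnable task;
        while ((task = appMainThread.poll()) != null) {
            task.run();
        }
    }

    /** Start and stop requested back to back, with the stop routed through the queue. */
    static boolean startThenStopImmediately() {
        StopViaCommandModel m = new StopViaCommandModel();
        m.startForegroundService();
        m.startServiceWithStopAction();
        m.drainAppQueue();
        return m.crashPosted;
    }

    public static void main(String[] args) {
        System.out.println("crash posted: " + startThenStopImmediately());
    }
}
```

Here startThenStopImmediately() returns false: by the time stopSelf() runs, the queued onCreate() has already cleared the flag.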

With that, the crash has been analysed, reproduced, and the fix verified. The crash no longer shows up in production either.

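Summary: this is a restriction that AOSP imposes on startForeground. If the business logic cannot avoid starting and stopping the Service within a short time, the approach described above avoids the crash.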
