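Symptoms
- Reproduction steps
- Android 7.0 platform (bring-up just completed)
- On the user build, connecting to one specific Wi-Fi network always triggers a native crash in the system_server process.
- The userdebug build does not have this problem.
Analysis
Initial analysis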
-
The tombstone file is as follows:
    ABI: 'x86_64'
    pid: 5891, tid: 7173, name: Thread-8  >>> system_server <<<
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x3400a6
        rax 000000006f528f00  rbx 00007fa0a04f7e30  rcx 00007fa0a04f7e01  rdx 0000000000000001
        rsi 00007fa0a04f7ed4  rdi 000000006f1ed630
        r8  0000000000000002  r9  00007fa0a04f7d38  r10 000000006f1c38d8  r11 0000000000000000
        r12 0000000000200015  r13 000000000034002e  r14 00007fa0a04f7ed8  r15 000000006f528f00
        cs  0000000000000033  ss  000000000000002b
        rip 00007fa0bde2338d  rbp 00007fa0a04f7d80  rsp 00007fa0a04f7be0  eflags 0000000000010206

    backtrace:
        #00 pc 000000000057738d  /system/lib64/libart.so (_ZN3art12InvokeMethodERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectS4_S4_m+125)
        #01 pc 00000000004cfad8  /system/lib64/libart.so (_ZN3artL24Constructor_newInstance0EP7_JNIEnvP8_jobjectP13_jobjectArray+1432)
        #02 pc 00000000006e3d4d  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (java.io.UnixFileSystem.canonicalize0 [DEDUPED]+235)
        #03 pc 0000000000b22caf  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.x509.X500Name.asX500Principal+157)
        #04 pc 0000000000b327a7  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.x509.X509CertInfo.getX500Name+517)
        #05 pc 0000000000b34b45  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.x509.X509CertInfo.get+835)
        #06 pc 0000000000b2fe15  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.x509.X509CertImpl.getSubjectX500Principal+99)
        #07 pc 0000000000a93baf  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.provider.certpath.PolicyChecker.mergeExplicitPolicy+93)
        #08 pc 0000000000a9341e  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.provider.certpath.PolicyChecker.checkPolicy+2652)
        #09 pc 0000000000a98df1  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.provider.certpath.PolicyChecker.check+95)
        #10 pc 0000000000a91f9b  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.provider.certpath.PKIXMasterCertPathValidator.validate+2265)
        #11 pc 0000000000a907d1  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.provider.certpath.PKIXCertPathValidator.validate+2895)
        #12 pc 0000000000a911cc  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.provider.certpath.PKIXCertPathValidator.validate+1546)
        #13 pc 0000000000a9166b  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (sun.security.provider.certpath.PKIXCertPathValidator.engineValidate+233)
        #14 pc 0000000000819f52  /system/framework/x86_64/boot-core-oj.oat (offset 0x660000) (java.security.cert.CertPathValidator.validate+64)
        #15 pc 00000000000b6d14  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.TrustManagerImpl.verifyChain+1266)
        #16 pc 00000000000b5fab  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.TrustManagerImpl.checkTrustedRecursive+2697)
        #17 pc 00000000000b57f1  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.TrustManagerImpl.checkTrustedRecursive+719)
        #18 pc 00000000000b5bb0  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.TrustManagerImpl.checkTrustedRecursive+1678)
        #19 pc 00000000000b5207  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.TrustManagerImpl.checkTrusted+645)
        #20 pc 00000000000b54a7  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.TrustManagerImpl.checkTrusted+421)
        #21 pc 00000000000b7713  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.TrustManagerImpl.checkServerTrusted+273)
        #22 pc 00000000000acf15  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.Platform.checkServerTrusted+323)
        #23 pc 00000000000a4885  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.OpenSSLSocketImpl.verifyCertificateChain+787)
        #24 pc 00000000001a6564  /system/lib64/libart.so (art_quick_invoke_stub+756)
        #25 pc 00000000001b4727  /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+231)
        #26 pc 0000000000575967  /system/lib64/libart.so (_ZN3artL18InvokeWithArgArrayERKNS_33ScopedObjectAccessAlreadyRunnableEPNS_9ArtMethodEPNS_8ArgArrayEPNS_6JValueEPKc+87)
        #27 pc 00000000005771be  /system/lib64/libart.so (_ZN3art35InvokeVirtualOrInterfaceWithVarArgsERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP13__va_list_tag+382)
        #28 pc 000000000046507c  /system/lib64/libart.so (_ZN3art3JNI15CallVoidMethodVEP7_JNIEnvP8_jobjectP10_jmethodIDP13__va_list_tag+860)
        #29 pc 000000000001eb51  /system/lib64/libjavacrypto.so (_ZN7_JNIEnv14CallVoidMethodEP8_jobjectP10_jmethodIDz+161)
        #30 pc 000000000001f3e7  /system/lib64/libjavacrypto.so
        #31 pc 0000000000021468  /system/lib64/libssl.so
        #32 pc 0000000000015fd8  /system/lib64/libssl.so
        #33 pc 00000000000149ab  /system/lib64/libssl.so
        #34 pc 000000000001954b  /system/lib64/libjavacrypto.so
        #35 pc 000000000008061a  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.NativeCrypto.SSL_do_handshake+376)
        #36 pc 00000000000a33aa  /system/framework/x86_64/boot-conscrypt.oat (offset 0x70000) (com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake+1944)
        #37 pc 00000000000986ba  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.Connection.connectTls+488)
        #38 pc 00000000000981d2  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.Connection.connectSocket+192)
        #39 pc 000000000009a17e  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.Connection.connect+860)
        #40 pc 000000000009a4ed  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.Connection.connectAndSetOwner+203)
        #41 pc 00000000000b0ffd  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.OkHttpClient$1.connectAndSetOwner+75)
        #42 pc 00000000000cede7  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.internal.http.HttpEngine.connect+501)
        #43 pc 00000000000d32d5  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.internal.http.HttpEngine.sendRequest+755)
        #44 pc 00000000000dba50  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute+222)
        #45 pc 00000000000dbfe3  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse+145)
        #46 pc 00000000000de079  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.internal.huc.HttpURLConnectionImpl.getInputStream+135)
        #47 pc 00000000000e0648  /system/framework/x86_64/boot-okhttp.oat (offset 0x8f000) (com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getInputStream+54)
        #48 pc 00000000011d9af0  /system/framework/oat/x86_64/services.odex (offset 0xc82000)
-
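Based on the stack, the initial assessment was that the memory the ArtMethod* points to had been corrupted.
Because this is a user build, the tombstone file is all there is, and no further useful information can be extracted from it.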
-
After adjusting the build configuration to capture a core dump, the memory near the fault turned out to contain data related to RSA encryption.
The ArtMethod is computed from the preceding abstract_method, giving m = 0x000000006f528f00.
The memory around that address is filled with encryption-related strings:

    000000006f528f00 004900570034002e 002e003100480054 ..4.W.I.T.H.1...
    000000006f528f10 00340038002e0032 00310031002e0030 2...8.4.0...1.1.
    000000006f528f20 0039003400350033 0031002e0031002e 3.5.4.9...1...1.
    000000006f528f30 000000340031002e 000000006f2139b8 ..1.4....9!o....
    000000006f528f40 1a1723730000000d 0032004100480053 ....s#..S.H.A.2.
    000000006f528f50 0049005700360035 0053005200480054 5.6.W.I.T.H.R.S.
    000000006f528f60 0000000000000041 000000006f2139b8 A........9!o....
    000000006f528f70 968ed33600000017 0032004100480053 ....6...S.H.A.2.
    000000006f528f80 0069005700360035 0053005200680074 5.6.W.i.t.h.R.S.
    000000006f528f90 0063006e00450041 0074007000790072 A.E.n.c.r.y.p.t.
    000000006f528fa0 0000006e006f0069 000000006f2139b8 i.o.n....9!o....
    000000006f528fb0 eeb292290000002e 00360031002e0032 ....)...2...1.6.
    000000006f528fc0 003000340038002e 0031002e0031002e ..8.4.0...1...1.
    000000006f528fd0 0033002e00310030 0032002e0034002e 0.1...3...4...2.
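The dump can be read more directly by decoding the words as UTF-16LE text. The following is a minimal Python sketch, not part of the original investigation; the function name `decode_words` is made up for illustration. It also checks two arithmetic relations between the tombstone registers and the dump. Reading the low 32 bits of the first word as the ArtMethod's declaring-class reference is an assumption about the ART object layout, not something the source states.

```python
import struct

# Four 64-bit words copied from the dump, covering 0x6f528f48..0x6f528f67.
WORDS = [
    0x0032004100480053,
    0x0049005700360035,
    0x0053005200480054,
    0x0000000000000041,
]

# Values copied from the tombstone and from the first dump row.
R13 = 0x34002E
FAULT_ADDR = 0x3400A6
FIRST_WORD_AT_M = 0x004900570034002E


def decode_words(words):
    """Pack little-endian 64-bit words into bytes and decode as UTF-16LE."""
    raw = b"".join(struct.pack("<Q", w) for w in words)
    return raw.decode("utf-16-le").rstrip("\x00")


if __name__ == "__main__":
    print(decode_words(WORDS))  # SHA256WITHRSA
    # The low 32 bits of the word at m equal r13 (assumed to be the bogus
    # "declaring class" reference read out of the fake ArtMethod).
    print(FIRST_WORD_AT_M & 0xFFFFFFFF == R13)  # True
    # The fault address is a small fixed offset from that bogus reference.
    print(hex(FAULT_ADDR - R13))  # 0x78
```

The decoded text is a signature algorithm name. The half-word 0x0000000d just before it equals the string length 13, which would be consistent with m landing inside a run of java.lang.String objects rather than on a real ArtMethod.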
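Differences between user and userdebug builds
- From the ART runtime's point of view, user and userdebug differ in their dexpreopt setting: it is true on user builds and false on userdebug builds.
- With dexpreopt set to false on the user build, the problem no longer occurs.
- With dexpreopt set to true on the userdebug build, the problem always reproduces.
- To make debugging easier, a new image was built from the userdebug configuration with dexpreopt set to true. At this point the evidence shows the problem is related to dexpreopt.
Proving it is not memory corruption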
-
The faulting ArtMethod* = 0x7042af00 falls inside the mapping of the following *.art file:

    0x700cb000 0x705f1000 0x526000 0x0 /data/dalvik-cache/x86_64/system@framework@boot-core-oj.art
-
Inspect the 52 bytes at the ArtMethod* address:

    (gdb) x /52xb 0x7042af00
    0x7042af00: 0x2e 0x00 0x38 0x00 0x34 0x00 0x30 0x00
    0x7042af08: 0x2e 0x00 0x31 0x00 0x2e 0x00 0x31 0x00
    0x7042af10: 0x30 0x00 0x31 0x00 0x2e 0x00 0x33 0x00
    0x7042af18: 0x2e 0x00 0x34 0x00 0x2e 0x00 0x32 0x00
    0x7042af20: 0x2e 0x00 0x34 0x00 0x57 0x00 0x49 0x00
    0x7042af28: 0x54 0x00 0x48 0x00 0x31 0x00 0x2e 0x00
    0x7042af30: 0x32 0x00 0x2e 0x00
-
Inspect the 52 bytes at the corresponding offset in the *.art file:

    $ hexdump -C -s 3538688 -n 52 boot-core-oj.art
    0035ff00  2e 00 38 00 34 00 30 00  2e 00 31 00 2e 00 31 00  |..8.4.0...1...1.|
    0035ff10  30 00 31 00 2e 00 33 00  2e 00 34 00 2e 00 32 00  |0.1...3...4...2.|
    0035ff20  2e 00 34 00 57 00 49 00  54 00 48 00 31 00 2e 00  |..4.W.I.T.H.1...|
    0035ff30  32 00 2e 00                                       |2...|
    0035ff34
-
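The bytes in memory and in the file are identical, so the RSA-related data is already present in boot-core-oj.art itself; it was not written there after the image was loaded into memory.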
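The offset 3538688 passed to hexdump is the faulting address minus the start of the mapping, plus the mapping's file offset. A small Python sketch of that arithmetic follows; the helper names `file_offset` and `read_at` are made up for illustration, and the constants are copied from the mapping line and the gdb command.

```python
# Values copied from the mapping line and the gdb command.
MAP_START = 0x700CB000
MAP_FILE_OFFSET = 0x0
ART_METHOD = 0x7042AF00


def file_offset(addr, map_start, map_file_offset=0):
    """Translate a virtual address inside a file mapping to a file offset."""
    if addr < map_start:
        raise ValueError("address lies below the start of the mapping")
    return addr - map_start + map_file_offset


def read_at(path, offset, length):
    """Read `length` bytes at `offset` from a local copy of the image file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    off = file_offset(ART_METHOD, MAP_START, MAP_FILE_OFFSET)
    print(off, hex(off))  # 3538688 0x35ff00
```

With a local copy of the image, `read_at("boot-core-oj.art", off, 52)` would return the same 52 bytes that hexdump prints, ready to compare against the gdb output.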
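At this point there are two possibilities:
- The ArtMethod address is wrong.
- The content of boot-core-oj.art was already wrong when it was generated (any access to this memory then faults).
In addition, a colleague from the https module analyzed the Java stack and found the call chain abnormal: logically, these calls could never happen. That makes the second possibility the more likely one.
Comparing boards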
-
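Compare against the images generated for other boards on the same branch.
On both arm and x86, the boot images generated for the faulting board look odd.
Judging by file size:
- the file named boot.oat looks like what should be boot-radio_interactor_common.oat
- the file named boot-core-oj.oat looks like what should be boot.oat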

-
Compare the environment variables
- Faulting board

    BOOTCLASSPATH=/system/framework/radio_interactor_common.jar:/system/framework/core-oj.jar:/system/framework/core-libart.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/ims-common.jar:/system/framework/apache-xml.jar:/system/framework/org.apache.http.legacy.boot.jar

- Normal board

    BOOTCLASSPATH=/system/framework/core-oj.jar:/system/framework/core-libart.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/ims-common.jar:/system/framework/apache-xml.jar:/system/framework/org.apache.http.legacy.boot.jar:/system/framework/radio_interactor_common.jar

- The comparison shows that radio_interactor_common.jar sits at a different position in BOOTCLASSPATH: first on the faulting board, last on the normal board.
Reading through the build code turns up the following description:

    # dex preopt on the bootclasspath produces multiple files.  The first dex file
    # is converted into to boot.art (to match the legacy assumption that boot.art
    # exists), and the rest are converted to boot-<name>.art.
    # In addition, each .art file has an associated .oat file.
    LIBART_TARGET_BOOT_ART_EXTRA_FILES := $(foreach jar,$(wordlist 2,999,$(LIBART_TARGET_BOOT_JARS)),boot-$(jar).art boot-$(jar).oat)
    LIBART_TARGET_BOOT_ART_EXTRA_FILES += boot.oat

and

    # The order of PRODUCT_BOOT_JARS matters.
    PRODUCT_BOOT_JARS := \
        core-oj \
        core-libart \
        conscrypt \
        okhttp \
        core-junit \
        bouncycastle \
        ext \
        framework \
        telephony-common \
        voip-common \
        ims-common \
        apache-xml \
        org.apache.http.legacy.boot

So the build system compiles the first entry of PRODUCT_BOOT_JARS into boot.oat/boot.art, and every other entry into boot-${jar}.oat/boot-${jar}.art.
This shows the board configuration is wrong: radio_interactor_common was mistakenly placed at the very front of PRODUCT_BOOT_JARS.
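The naming rule can be modelled in a few lines. This is a rough Python sketch of the rule as described in the makefile comment, not the build system's actual implementation; `boot_image_files` is a made-up name and the jar lists are shortened for illustration.

```python
def boot_image_files(boot_jars):
    """Model the naming rule: the first jar becomes boot.art/boot.oat,
    every later jar becomes boot-<name>.art/boot-<name>.oat."""
    if not boot_jars:
        raise ValueError("boot jar list must not be empty")
    files = ["boot.art", "boot.oat"]
    for jar in boot_jars[1:]:
        files += ["boot-%s.art" % jar, "boot-%s.oat" % jar]
    return files


if __name__ == "__main__":
    # Shortened lists for illustration; the real lists have 14 entries.
    faulty = ["radio_interactor_common", "core-oj", "core-libart"]
    normal = ["core-oj", "core-libart", "radio_interactor_common"]
    print("boot-core-oj.oat" in boot_image_files(faulty))  # True
    print("boot-core-oj.oat" in boot_image_files(normal))  # False
```

With radio_interactor_common first, core-oj ends up in boot-core-oj.oat, which matches the file names in the tombstone backtrace.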
Root Cause
- Check where radio_interactor_common is added:
PRODUCT_BOOT_JARS += radio_interactor_common
- Then check the order of the related inherit-product calls, which finally reveals the root cause:
$(call inherit-product, $(PLATDIR)/common/device.mk)
$(call inherit-product, $(SRC_TARGET_DIR)/product/core_64_bit.mk)
$(call inherit-product, $(SRC_TARGET_DIR)/product/aosp_base_telephony.mk)
$(call inherit-product, $(PLATDIR)/common/proprietories.mk)
- device.mk ultimately includes the PRODUCT_BOOT_JARS += radio_interactor_common line shown above.
- $(SRC_TARGET_DIR)/product/aosp_base_telephony.mk ultimately includes the system default boot classes:
build/target/product/core_minimal.mk
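- On the faulting board, device.mk is inherited before aosp_base_telephony.mk, so radio_interactor_common lands ahead of the defaults. The normal boards all include the system default boot classes first and only then append radio_interactor_common.
Solution
- Verification
- After adjusting the include order of the *.mk files, check the generated image files: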
    $ tree system/framework/x86*
    system/framework/x86
    ├── boot-apache-xml.art
    ├── boot-apache-xml.oat
    ├── boot.art
    ├── boot-bouncycastle.art
    ├── boot-bouncycastle.oat
    ├── boot-conscrypt.art
    ├── boot-conscrypt.oat
    ├── boot-core-junit.art
    ├── boot-core-junit.oat
    ├── boot-core-libart.art
    ├── boot-core-libart.oat
    ├── boot-ext.art
    ├── boot-ext.oat
    ├── boot-framework.art
    ├── boot-framework.oat
    ├── boot-ims-common.art
    ├── boot-ims-common.oat
    ├── boot.oat
    ├── boot-okhttp.art
    ├── boot-okhttp.oat
    ├── boot-org.apache.http.legacy.boot.art
    ├── boot-org.apache.http.legacy.boot.oat
    ├── boot-radio_interactor_common.art
    ├── boot-radio_interactor_common.oat
    ├── boot-telephony-common.art
    ├── boot-telephony-common.oat
    ├── boot-voip-common.art
    └── boot-voip-common.oat
    system/framework/x86_64
    ├── boot-apache-xml.art
    ├── boot-apache-xml.oat
    ├── boot.art
    ├── boot-bouncycastle.art
    ├── boot-bouncycastle.oat
    ├── boot-conscrypt.art
    ├── boot-conscrypt.oat
    ├── boot-core-junit.art
    ├── boot-core-junit.oat
    ├── boot-core-libart.art
    ├── boot-core-libart.oat
    ├── boot-ext.art
    ├── boot-ext.oat
    ├── boot-framework.art
    ├── boot-framework.oat
    ├── boot-ims-common.art
    ├── boot-ims-common.oat
    ├── boot.oat
    ├── boot-okhttp.art
    ├── boot-okhttp.oat
    ├── boot-org.apache.http.legacy.boot.art
    ├── boot-org.apache.http.legacy.boot.oat
    ├── boot-radio_interactor_common.art
    ├── boot-radio_interactor_common.oat
    ├── boot-telephony-common.art
    ├── boot-telephony-common.oat
    ├── boot-voip-common.art
    └── boot-voip-common.oat

- After building the new version, the problem no longer reproduces.
- To avoid falling into the same trap again, add a check to the build system that raises an error when the configuration is wrong, so the problem surfaces during bring-up:
    diff --git a/core/dex_preopt_libart.mk b/core/dex_preopt_libart.mk
    index b469dc0..7145fed 100644
    --- a/core/dex_preopt_libart.mk
    +++ b/core/dex_preopt_libart.mk
    @@ -87,6 +87,9 @@ LIBART_TARGET_BOOT_DEX_FILES := $(foreach jar,$(LIBART_TARGET_BOOT_JARS),$(call
     # is converted into to boot.art (to match the legacy assumption that boot.art
     # exists), and the rest are converted to boot-<name>.art.
     # In addition, each .art file has an associated .oat file.
    +ifneq (core-oj,$(word 1,$(LIBART_TARGET_BOOT_JARS)))
    +$(error "core-oj" must be the first file in <PRODUCT_BOOT_JARS> but now is "$(word 1,$(LIBART_TARGET_BOOT_JARS))")
    +endif
     LIBART_TARGET_BOOT_ART_EXTRA_FILES := $(foreach jar,$(wordlist 2,999,$(LIBART_TARGET_BOOT_JARS)),boot-$(jar).art boot-$(jar).oat)
     LIBART_TARGET_BOOT_ART_EXTRA_FILES += boot.oat