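Before formally explaining what an fd leak is, take a look at the three logs below. Do they look familiar, yet leave you unsure where to start? This article draws on in-depth investigation by colleagues and summarizes a number of real cases, so that fd leaks need not cause panic again.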
log 1: Could not read input channel file descriptors from parcel
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: FATAL EXCEPTION: main
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: Process: com.miui.weather2, PID: 20556
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: java.lang.RuntimeException: Could not read input channel file descriptors from parcel.
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel.nativeReadFromParcel(Native Method)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel.readFromParcel(InputChannel.java:148)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel$1.createFromParcel(InputChannel.java:39)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel$1.createFromParcel(InputChannel.java:37)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.InputBindResult.<init>(InputBindResult.java:68)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:112)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:110)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.IInputMethodManager$Stub$Proxy.startInputOrWindowGainedFocus(IInputMethodManager.java:723)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.inputmethod.InputMethodManager.startInputInner(InputMethodManager.java:1295)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.inputmethod.InputMethodManager.onPostWindowFocus(InputMethodManager.java:1543)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.ViewRootImpl$ViewRootHandler.handleMessage(ViewRootImpl.java:4069)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:106)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.os.Looper.loop(Looper.java:171)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:6642)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:518)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:873)
log 2:Could not allocate JNI Env
06-22 11:59:30.335 2308 2308 E AndroidRuntime: FATAL EXCEPTION: main
06-22 11:59:30.335 2308 2308 E AndroidRuntime: Process: com.xiaomi.bluetooth, PID: 2308
06-22 11:59:30.335 2308 2308 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.nativeCreate(Native Method)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.start(Thread.java:730)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.c.dk(SynchronizedGattCallback.java:54)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.dk(GattPeripheral.java:97)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eN(GattPeripheral.java:227)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eq(GattPeripheral.java:221)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.z.run(PeripheralConnectionManager.java:462)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.handleCallback(Handler.java:754)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:95)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Looper.loop(Looper.java:160)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:6202)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:874)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:764)
log 3:unable to open database file (code 14)
android.database.sqlite.SQLiteCantOpenDatabaseException: unable to open database file (code 14)
at android.database.sqlite.SQLiteConnection.nativeExecuteForChangedRowCount(Native Method)
at android.database.sqlite.SQLiteConnection.executeForChangedRowCount(SQLiteConnection.java:735)
at android.database.sqlite.SQLiteSession.executeForChangedRowCount(SQLiteSession.java:754)
at android.database.sqlite.SQLiteStatement.executeUpdateDelete(SQLiteStatement.java:64)
at android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:1653)
at android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1599)
at com.android.providers.telephony.TelephonyProvider.update(TelephonyProvider.java:2704)
at android.content.ContentProvider$Transport.update(ContentProvider.java:357)
at android.content.ContentResolver.update(ContentResolver.java:1688)
at com.android.internal.telephony.SubscriptionController.setCarrierText(SubscriptionController.java:1202)
at com.android.internal.telephony.SubscriptionControllerInjectorBase$DatabaseHandler.handleMessage(SubscriptionControllerInjectorBase.java:201)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:164)
at android.os.HandlerThread.run(HandlerThread.java:65)
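Most people find this class of problem hard to solve. Some simply wrap the crash site in a try/catch, only to see the crash reported again in the next release. That is because the logs above are usually not the real scene of the crime. Before getting started we need some background on fd leaks. Section 2.3 of "Android Process Series, Part 1: Process Basics" also gives a brief introduction.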
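I. FD Concepts
Concept: fd stands for file descriptor. In Linux, everything can be abstracted as a file: regular files, directories, block devices, character devices, sockets, pipes, and so on. A system call such as open or socket returns an fd (just an integer), and subsequent operations such as read and write are performed on the file that fd refers to.
1. Where does an fd come from?
We said an fd is a number, so how is that number determined? The kernel's per-process structure, task_struct, holds a files pointer to a table of everything the process has open. That table is essentially an array: the array index is the fd, and the element stored there describes the open file: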
struct task_struct {
    ...
    /* Open file information: */
    struct files_struct *files;
    ...
};

/*
 * Open file table structure
 */
struct files_struct {
    /*
     * read mostly part
     */
    atomic_t count;
    bool resize_in_progress;
    wait_queue_head_t resize_wait;
    struct fdtable __rcu *fdt;
    struct fdtable fdtab;
    /*
     * written part on a separate cache line in SMP
     */
    spinlock_t file_lock ____cacheline_aligned_in_smp;
    unsigned int next_fd;
    unsigned long close_on_exec_init[1];
    unsigned long open_fds_init[1];
    unsigned long full_fds_bits_init[1];
    struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};
files_struct maintains an fdtable. The fd member of fdtable is an array of pointers, and each struct file it points to holds the information about one open file.
struct fdtable {
    unsigned int max_fds;
    struct file __rcu **fd; /* current fd array */
    unsigned long *close_on_exec;
    unsigned long *open_fds;
    unsigned long *full_fds_bits;
    struct rcu_head rcu;
};
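To make the "array index is the fd" idea concrete, here is a minimal toy model in Java. It is only a sketch, not kernel code: the class ToyFdTable and its methods are hypothetical names invented for illustration. It mimics two behaviours: open returns the lowest free index, and a full table makes open fail, which is what the kernel reports as "Too many open files".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a per-process fd table (hypothetical; for illustration only). */
class ToyFdTable {
    // Index == fd number; a null slot means the fd is free.
    private final List<String> slots = new ArrayList<>();
    private final int maxFds;

    ToyFdTable(int maxFds) {
        this.maxFds = maxFds;
    }

    /** Returns the lowest free fd, or -1 when the table is full. */
    int open(String description) {
        for (int fd = 0; fd < slots.size(); fd++) {
            if (slots.get(fd) == null) {
                slots.set(fd, description);
                return fd;
            }
        }
        if (slots.size() >= maxFds) {
            return -1; // the real kernel fails with EMFILE here
        }
        slots.add(description);
        return slots.size() - 1;
    }

    /** Frees the slot so that a later open can reuse the number. */
    void close(int fd) {
        if (fd >= 0 && fd < slots.size()) {
            slots.set(fd, null);
        }
    }
}
```

With a table of size 3, three opens return 0, 1 and 2, a fourth fails, and after close(1) the next open returns 1 again. A leak is simply a slot that is never freed.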
2. Limits
By default Linux limits each process to 1024 open fds (soft limit 1024, hard limit 4096). You can check the "Max open files" row in /proc/$pid/limits:

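Once the number of files a process has open reaches the soft limit (1024 here), it cannot open any more, and all sorts of errors follow. As with OOM, the crash stack may be only the last straw that broke the camel's back rather than the real scene of the crime. So close every fd when you are done with it: the number is then returned to the system and can be reused by a later open.
Soft limit vs. hard limit
The hard limit can be lowered by any process at any time, but only the superuser can raise it.
The soft limit is the one the kernel actually enforces. Any process can set its soft limit to any value less than or equal to its hard limit.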
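As a small illustration, the soft and hard values can be read from the "Max open files" row of /proc/<pid>/limits. The sketch below assumes the usual column layout of that file (name, soft limit, hard limit, units); the class OpenFileLimits is a hypothetical helper. Values are kept as strings because the file may say "unlimited".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper that extracts the "Max open files" row of a limits file. */
class OpenFileLimits {
    final String soft;
    final String hard;

    OpenFileLimits(String soft, String hard) {
        this.soft = soft;
        this.hard = hard;
    }

    /** Parses the lines of a limits file; returns null if the row is missing. */
    static OpenFileLimits parse(List<String> lines) {
        final String key = "Max open files";
        for (String line : lines) {
            if (line.startsWith(key)) {
                String[] fields = line.substring(key.length()).trim().split("\\s+");
                if (fields.length >= 2) {
                    return new OpenFileLimits(fields[0], fields[1]);
                }
            }
        }
        return null;
    }

    /** Reads the limits of the current process (Linux/Android only). */
    static OpenFileLimits readSelf() throws IOException {
        return parse(Files.readAllLines(Paths.get("/proc/self/limits")));
    }
}
```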
If the fd limit is too low for your needs, it can be adjusted with the following:
getrlimit(RLIMIT_NOFILE, &rlim);
setrlimit(RLIMIT_NOFILE, &rlim);
ulimit -n 2048
android.system.Os.getrlimit(OsConstants.RLIMIT_NOFILE);
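e.g.:
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

void modifyfdlimit(void) {
    struct rlimit fdLimit;
    fdLimit.rlim_cur = 30000;
    fdLimit.rlim_max = 30000;
    /* Raising the hard limit needs privilege, so this call may fail. */
    if (setrlimit(RLIMIT_NOFILE, &fdLimit) == -1) {
        perror("Set max fd open count failed");
        /* Fall back to raising only the soft limit, up to the current hard
         * limit. Note: system("ulimit -n ...") would only change a child
         * shell, not this process. */
        if (getrlimit(RLIMIT_NOFILE, &fdLimit) == -1) {
            perror("getrlimit(RLIMIT_NOFILE) failed");
            exit(1);
        }
        fdLimit.rlim_cur = fdLimit.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &fdLimit) == -1) {
            perror("Raising the soft fd limit failed");
            exit(1);
        }
    }
}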
3. Viewing open fds
Use ls -la /proc/$pid/fd:
nitrogen:/ # pidof system_server
1956
nitrogen:/ # ls -la /proc/1956/f
fd/ fdinfo/
nitrogen:/ # ls -la /proc/1956/fd
total 0
dr-x------ 2 system system 0 2018-11-02 10:57 .
dr-xr-xr-x 9 system system 0 2018-11-02 10:57 ..
lrwx------ 1 system system 64 2018-11-02 12:26 0 -> /dev/null
lrwx------ 1 system system 64 2018-11-02 12:26 1 -> /dev/null
lr-x------ 1 system system 64 2018-11-02 12:26 10 -> /system/framework/QPerformance.jar
lrwx------ 1 system system 64 2018-11-02 12:26 100 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 101 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 102 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 103 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 104 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 105 -> anon_inode:[eventpoll]
lr-x------ 1 system system 64 2018-11-02 12:26 106 -> anon_inode:inotify
lr-x------ 1 system system 64 2018-11-02 12:26 107 -> pipe:[36681]
l-wx------ 1 system system 64 2018-11-02 12:26 108 -> pipe:[36681]
lrwx------ 1 system system 64 2018-11-02 12:26 109 -> anon_inode:[eventfd]
lr-x------ 1 system system 64 2018-11-02 12:26 11 -> /system/framework/core-oj.jar
lrwx------ 1 system system 64 2018-11-02 12:26 110 -> anon_inode:[eventpoll]
lrwx------ 1 system system 64 2018-11-02 12:26 111 -> socket:[35371]
lrwx------ 1 system system 64 2018-11-02 12:26 112 -> socket:[35372]
lr-x------ 1 system system 64 2018-11-02 12:26 113 -> /system/media/theme/defau
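The same listing can be produced from inside a process by resolving the symlinks under /proc/self/fd. The following is a minimal sketch using only java.nio (available on Android from API 26); the class FdLister is a hypothetical name. The directory is passed as a parameter so the code can also be pointed at /proc/<pid>/fd when permissions allow.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that maps fd numbers to their link targets. */
class FdLister {
    /** Lists the entries of a /proc fd directory as fd number -> target. */
    static Map<Integer, String> list(Path fdDir) throws IOException {
        Map<Integer, String> result = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
            for (Path entry : entries) {
                int fd;
                try {
                    fd = Integer.parseInt(entry.getFileName().toString());
                } catch (NumberFormatException e) {
                    continue; // not an fd entry
                }
                String target;
                try {
                    target = Files.readSymbolicLink(entry).toString();
                } catch (IOException e) {
                    target = "(unreadable)"; // e.g. the fd was closed meanwhile
                }
                result.put(fd, target);
            }
        }
        return result;
    }

    /** Convenience wrapper for the current process. */
    static Map<Integer, String> listSelf() throws IOException {
        return list(Paths.get("/proc/self/fd"));
    }
}
```

Logging the size of this map periodically is a cheap way to notice a steadily growing fd count before the limit is hit.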
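II. Fd Leak Cases
Now let's look at several fd leak cases. Fd leaks have these characteristics:
- The same problem can surface with different stacks, which makes it hard to spot
- Memory may not be short when fds leak, and even a GC will not necessarily reclaim file handles that have already been created
Log keywords: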
ashmem_create_region failed for ‘indirect ref table’: Too many open files
"Too many open files"
"Could not allocate JNI Env"
"Could not allocate dup blob fd"
"Could not read input channel file descriptors from parcel"
"pthread_create"
"InputChannel is not initialized"
"Could not open input channel pair"
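When you see any of the crash stacks above, start thinking in the direction of an fd leak.
1. Resource-related
I/O streams that are never closed can cause problems: FileInputStream, FileOutputStream, FileReader, FileWriter and so on, because every opened file needs an fd. Some input streams also provide an fd-based constructor: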
public FileInputStream(FileDescriptor fdObj) {
    this(fdObj, false /* isFdOwner */);
}
Here is one leak case:
frameworks/base/services/core/java/com/android/server/pm/ResmonWhitelistPackage.java
final class ResmonWhitelistPackage {
    private final File mSystemDir;
    private final File mWhitelistFile;

    final ArrayList<String> mPackages = new ArrayList<String>();

    ResmonWhitelistPackage() {
        mSystemDir = new File("/system/", "etc");
        mWhitelistFile = new File(mSystemDir, "resmonwhitelist.txt");
    }

    void readList() {
        ....
        try {
            /// M: Clear white list record before update it
            mPackages.clear();
            BufferedReader br = new BufferedReader(new FileReader(mWhitelistFile));
            String line = br.readLine();
            while (line != null) {
                mPackages.add(line);
                line = br.readLine();
            }
            br.close();
        } catch (IOException e) {
            //Log.e(PackageManagerService.TAG, "IO Exception happened while reading resmon whitelist");
            e.printStackTrace();
        }
    }
}
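br.close() is not in a finally block, so if readLine() throws, the reader is never closed. There is also a more elegant way to write this: starting with Java 7 build 105, the compiler and runtime support the try-with-resources statement, also known as an ARM (Automatic Resource Management) block.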
private static void customBufferStreamCopy(File source, File target) {
    try (InputStream fis = new FileInputStream(source);
         OutputStream fos = new FileOutputStream(target)) {
        byte[] buf = new byte[8192];
        int i;
        while ((i = fis.read(buf)) != -1) {
            fos.write(buf, 0, i);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The code is clear, and it cannot leak.
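Applied to the whitelist-reading case above, the same pattern might look like the sketch below. WhitelistReader and readLines are hypothetical names, not the actual fix that went into ResmonWhitelistPackage; the point is only that the reader is closed on every path, including when readLine() throws.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical leak-free variant of the whitelist reading loop. */
class WhitelistReader {
    /** Reads all lines of the file; the reader is always closed. */
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes br (and the wrapped FileReader)
        // whether the loop finishes normally or an exception is thrown.
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```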
2. HandlerThread-related
Careless use of HandlerThread can also leak fds. Consider this case.
2.1 Symptom
SystemUI keeps crashing. The affected system version is Android O.
2.2 Initial analysis
pid: 18465, tid: 32737, name: async_sensor >>> com.android.systemui <<<
signal 5 (SIGTRAP), code -32763 (PTRACE_EVENT_STOP), fault addr 0x3e800007fe1
x0 fffffffffffffffc x1 000000735df1ec38 x2 0000000000000010 x3 00000000ffffffff
x4 0000000000000000 x5 0000000000000008 x6 0000007428971000 x7 0000000000bb3876
x8 0000000000000016 x9 7fffffffffffffff x10 000000000000000c x11 0000000000000000
x12 000000735df1ed38 x13 000000005b20a831 x14 002cd0c4e58dc31b x15 0000fdf7aa690a91
x16 00000074245e7498 x17 000000742453bd00 x18 0000000000000004 x19 000000735df1f588
x20 000000738727b708 x21 00000000ffffffff x22 000000735df1f588 x23 000000738727b660
x24 0000000000000028 x25 000000000000000c x26 0000000014a000b0 x27 0000007385815300
x28 00000000710507c8 x29 000000735df1ebe0 x30 000000742453bd38
sp 000000735df1ebc0 pc 00000074245866dc pstate 0000000060000000
v0 00000000000000000000000000000000 v1 00000000000000000000000000000001
v2 00000000000000002065766974616e3c v3 00000000000000000000000000000000
v4 00000000000000008020080200000000 v5 00000000000000004000000000000000
v6 00000000000000000000000000000000 v7 00000000000000008020080280200802
v8 00000000000000000000000000000000 v9 00000000000000000000000000000000
v10 00000000000000000000000000000000 v11 00000000000000000000000000000000
v12 00000000000000000000000000000000 v13 00000000000000000000000000000000
v14 00000000000000000000000000000000 v15 00000000000000000000000000000000
v16 40100401401004014010040140100401 v17 a0080000a00a0000a800aa0040404000
v18 80200800000000008020080200000000 v19 000000000000000000000000ebad8083
v20 000000000000000000000000ebad8084 v21 000000000000000000000000ebad8085
v22 000000000000000000000000ebad8086 v23 000000000000000000000000ebad8087
v24 000000000000000000000000ebad8088 v25 000000000000000000000000ebad8089
v26 000000000000000000000000ebad808a v27 000000000000000000000000ebad808b
v28 000000000000000000000000ebad808c v29 000000000000000000000000ebad808d
v30 000000000000000000000000ebad808e v31 00000000000000000000000041e00000
fpsr 00000013 fpcr 00000000
backtrace:
#00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8)
#01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52)
#02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
#03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
#04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
#05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140)
#06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192)
At first glance it looks as if the process died while handling a message. Reading further through the log reveals:
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.922 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.922 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.922 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.922 1000 2155 2335 E OpenGLRenderer: GL error: GL_INVALID_OPERATION
06-15 22:00:33.922 1000 2155 2335 F OpenGLRenderer: glCopyTexSubImage2D error! GL_INVALID_OPERATION (0x502
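The status bar process had more than 1024 open fds, and the log is full of entries like those above. But is this the real scene of the crime?
2.3 Deeper analysis
On Android O, when a native exception (NE) occurs, the fd information is written to the tombstone file. The fd list shows that the table is indeed full, mostly with anon_inode:[eventfd] and anon_inode:dmabuf entries: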
backtrace:
#00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8)
#01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52)
#02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
#03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
#04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
#05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140)
#06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192)
....
fd 556: anon_inode:[eventpoll]
fd 557: anon_inode:[eventpoll]
fd 558: anon_inode:[eventfd]
fd 559: anon_inode:[eventpoll]
fd 560: anon_inode:[eventfd]
fd 561: anon_inode:[eventpoll]
fd 562: anon_inode:[eventfd]
fd 563: anon_inode:[eventfd]
fd 564: anon_inode:[eventpoll]
fd 565: anon_inode:[eventfd]
fd 566: anon_inode:[eventpoll]
fd 567: /dev/ashmem
..... // about a thousand lines omitted
fd 1022: anon_inode:dmabuf
fd 1023: socket:[3549620]
The trace reveals another key anomaly: the systemui process has a great many async_sensor threads. Why are there so many of them?
pid: 11019, tid: 2301, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2431, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2522, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2542, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2600, name: async_sensor >>> com.android.systemui <<<
..... // several lines omitted
pid: 11019, tid: 5693, name: async_sensor >>> com.android.systemui <<<
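What is async_sensor? Searching the code shows that it is a HandlerThread:

In DozeFactory there is a place that creates a new AsyncSensorManager:

Next, find where the assembleMachine method is called. DozeService calls it:

Going back to the log, DozeService is being started very frequently, so we are getting closer to the truth step by step. The problem appears to have been introduced by the lock-screen team.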
isTest=false, canDoze=true, userId=0
06-15 21:48:11.888 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:13.337 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:24.519 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:33.577 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:50.640 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:56.540 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:28.240 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:29.207 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:30.206 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:33.332 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:37.435 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:50.529 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:50:20.010 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:50:30.148 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
......
2.4 Fix
The issue was finally handed over to the lock-screen team, who fixed it:

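So the root cause of this issue is that DozeService was started frequently, creating a large number of HandlerThreads and leaking fds. Why is HandlerThread related to fd leaks? Tracing the source shows that creating a HandlerThread creates a Looper, and each Looper opens two fds when it is created: an eventfd (mWakeEventFd) and an epoll fd (mEpollFd). This matches the fds printed in the tombstone file.

In summary, if you do not know that each HandlerThread's Looper opens two fds, this kind of leak is hard to analyze.
3. Thread.start-related
Starting threads also carries a risk of fd leaks, although this mistake is not easy to make. If you really do create 1024 threads in a loop, the effect is immediate: the program dies.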

trace1
java.lang.OutOfMemoryError: Could not allocate JNI Env
at java.lang.Thread.nativeCreate(Native Method)
at java.lang.Thread.start(Thread.java:729)
at com.android.server.wifi.WifiNative.startHal(WifiNative.java:1639)
at com.android.server.wifi.WifiStateMachine.setupDriverForSoftAp(WifiStateMachine.java:3970)
at com.android.server.wifi.WifiStateMachine.-wrap9(WifiStateMachine.java)
at com.android.server.wifi.WifiStateMachine$InitialState.processMessage(WifiStateMachine.java:4480)
at com.android.internal.util.StateMachine$SmHandler.processMsg(StateMachine.java:980)
at com.android.internal.util.StateMachine$SmHandler.handleMessage(StateMachine.java:799)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:163)
at android.os.HandlerThread.run(HandlerThread.java:61)
trace2
java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
at java.lang.Thread.nativeCreate(Native Method)
at java.lang.Thread.start(Thread.java:733)
at com.tencent.mm.sdk.f.b$a.start(SourceFile:61)
at com.tencent.mm.am.a.bU(SourceFile:60)
at com.tencent.mm.ui.MMAppMgr$8.tC(SourceFile:315)
at com.tencent.mm.sdk.platformtools.am.handleMessage(SourceFile:69)
at com.tencent.mm.sdk.platformtools.aj.handleMessage(SourceFile:173)
at com.tencent.mm.sdk.platformtools.aj.dispatchMessage(SourceFile:128)
at android.os.Looper.loop(Looper.java:176)
at android.app.ActivityThread.main(ActivityThread.java:6701)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)
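One common way to avoid stacks like these is to stop creating a new thread per task and to run tasks on a bounded pool instead. The sketch below is plain java.util.concurrent, not code from any of the apps in the traces; BoundedWorkers and runAll are hypothetical names. It runs many short tasks and reports how many distinct worker threads were actually used.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: many tasks, but a fixed upper bound on threads. */
class BoundedWorkers {
    /**
     * Runs taskCount short tasks on at most poolSize threads and returns
     * the number of distinct worker threads that executed them.
     */
    static int runAll(int taskCount, int poolSize) throws InterruptedException {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
            }
        } finally {
            // Always shut the pool down, otherwise its threads leak too.
            pool.shutdown();
        }
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return threadNames.size();
    }
}
```

Calling runAll(1024, 4) executes all 1024 tasks while never having more than four worker threads alive, so the per-thread resources stay bounded.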
4. InputChannel-related
InputChannel can also leak fds, as shown below:

4.1 Repeatedly showing Dialogs in an Activity
public class Main2Activity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main2);
    }

    public void onClick(View view) {
        for (int i = 0; i < 1024; i++) {
            AlertDialog.Builder builder = new AlertDialog.Builder(this);
            builder.setTitle("fd").setIcon(R.drawable.ic_launcher_background).create();
            builder.show();
        }
    }
}
Before long the app dies with the following error:
11-02 17:38:22.263 9351-9351/com.example.wangjing.rebootdemo E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.example.wangjing.rebootdemo, PID: 9351
java.lang.IllegalStateException: Could not execute method for android:onClick
at android.view.View$DeclaredOnClickListener.onClick(View.java:5391)
at android.view.View.performClick(View.java:6311)
at android.view.View$PerformClick.run(View.java:24833)
at android.os.Handler.handleCallback(Handler.java:794)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:173)
at android.app.ActivityThread.main(ActivityThread.java:6653)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
Caused by: java.lang.reflect.InvocationTargetException
at java.lang.reflect.Method.invoke(Native Method)
at android.view.View$DeclaredOnClickListener.onClick(View.java:5386)
at android.view.View.performClick(View.java:6311)
at android.view.View$PerformClick.run(View.java:24833)
at android.os.Handler.handleCallback(Handler.java:794)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:173)
at android.app.ActivityThread.main(ActivityThread.java:6653)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
Caused by: java.lang.RuntimeException: Could not read input channel file descriptors from parcel.
at android.view.InputChannel.nativeReadFromParcel(Native Method)
at android.view.InputChannel.readFromParcel(InputChannel.java:148)
at android.view.IWindowSession$Stub$Proxy.addToDisplay(IWindowSession.java:804)
at android.view.ViewRootImpl.setView(ViewRootImpl.java:770)
at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:356)
at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:94)
at android.app.Dialog.show(Dialog.java:330)
at android.app.AlertDialog$Builder.show(AlertDialog.java:1114)
at com.example.wangjing.rebootdemo.Main2Activity.onClick(Main2Activity.java:28)
at java.lang.reflect.Method.invoke(Native Method)
at android.view.View$DeclaredOnClickListener.onClick(View.java:5386)
at android.view.View.performClick(View.java:6311)
at android.view.View$PerformClick.run(View.java:24833)
at android.os.Handler.handleCallback(Handler.java:794)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:173)
at android.app.ActivityThread.main(ActivityThread.java:6653)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
Checking the process's fd list shows a large number of sockets, which are not there under normal conditions:
jason:/ # ps -ef |grep "wangjing"
u0_a161 13134 9462 3 17:43:12 ? 00:00:04 com.example.wangjing.rebootdemo
root 13385 13380 3 17:45:26 pts/1 00:00:00 grep wangjing
jason:/ # ls -la proc/13134/fd/
total 0
dr-x------ 2 u0_a161 u0_a161 0 2018-11-02 17:43 .
dr-xr-xr-x 9 u0_a161 u0_a161 0 2018-11-02 17:43 ..
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 0 -> /dev/null
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 1 -> /dev/null
lr-x------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 10 -> /system/framework/com.nxp.nfc.nq.jar
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 100 -> socket:[17760179]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 101 -> socket:[17753761]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 102 -> socket:[17773237]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 103 -> socket:[17760182]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 104 -> socket:[17760184]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 105 -> socket:[17773239]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 106 -> socket:[17776657]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 107 -> socket:[17774959]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 108 -> socket:[17776659]
......
5. Bitmap-related
Bitmaps also need fds. As in the figure below, if they are not closed, an fd leak may result.

trace1
java.lang.RuntimeException: Could not allocate dup blob fd.
at android.graphics.Bitmap.nativeCreateFromParcel(Native Method)
at android.graphics.Bitmap.access$100(Bitmap.java:36)
at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1528)
at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1520)
at android.widget.RemoteViews$BitmapCache.<init>(RemoteViews.java:954)
at android.widget.RemoteViews.<init>(RemoteViews.java:1820)
at android.widget.RemoteViews.<init>(RemoteViews.java:1812)
at android.widget.RemoteViews.clone(RemoteViews.java:1905)
at android.app.Notification.cloneInto(Notification.java:1534)
at android.app.Notification.clone(Notification.java:1508)
at android.service.notification.StatusBarNotification.clone(StatusBarNotification.java:161)
at com.android.server.notification.NotificationManagerService$NotificationListeners.notifyPostedLocked(NotificationManagerService.java:3557)
at com.android.server.notification.NotificationManagerService$8.run(NotificationManagerService.java:2337)
at android.os.Handler.handleCallback(Handler.java:815)
at android.os.Handler.dispatchMessage(Handler.java:104)
at android.os.Looper.loop(Looper.java:207)
at com.android.server.SystemServer.run(SystemServer.java:410)
at com.android.server.SystemServer.main(SystemServer.java:255)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:933)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:782)
III. Summary
The five cases above cover the common fd leak scenarios. When any of the following appears in the log, suspect an fd leak:
"Too many open files"
"Could not allocate JNI Env"
"Could not allocate dup blob fd"
"Could not read input channel file descriptors from parcel"
"pthread_create"
"InputChannel is not initialized"
"Could not open input channel pair"
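A large number of "anon_inode:[eventpoll]" entries together with "pipe" or "anon_inode:[eventfd]" entries (more than 100 eventpoll fds) usually means that too many HandlerThread/Looper/MessageQueue instances were created: threads were never stopped, or Loopers were never released. Capture an hprof for quick analysis.
For system_server, a large number of open sockets may mean that InputChannels were not closed. Again capture an hprof and check the WindowState objects in system_server.
A large number of open "/dev/ashmem" fds, in a ContentProvider or another app, most likely means that a database was opened and not closed, or that database connections are opened frequently and never closed. In that case inspect the process's maps with cat /proc/<pid>/maps to see the name of the ashmem region, which then points to where the leak is.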
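These rules of thumb amount to counting fd targets by type. Below is a minimal sketch of such a histogram; FdClassifier and its category names are hypothetical, and the input is assumed to be the list of link targets taken from /proc/<pid>/fd or from the "open files" section of a tombstone.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that groups fd link targets into coarse categories. */
class FdClassifier {
    /** Maps one link target to a category name. */
    static String categoryOf(String target) {
        if (target.startsWith("anon_inode:")) {
            return target; // keep the exact kind, e.g. anon_inode:[eventfd]
        }
        if (target.startsWith("socket:")) {
            return "socket";
        }
        if (target.startsWith("pipe:")) {
            return "pipe";
        }
        if (target.startsWith("/dev/ashmem")) {
            return "/dev/ashmem";
        }
        return "file";
    }

    /** Counts how many targets fall into each category. */
    static Map<String, Integer> histogram(List<String> targets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String target : targets) {
            counts.merge(categoryOf(target), 1, Integer::sum);
        }
        return counts;
    }
}
```

A histogram dominated by anon_inode:[eventpoll] and anon_inode:[eventfd] points towards Looper or HandlerThread leaks, one dominated by socket towards InputChannel, and one dominated by /dev/ashmem towards unclosed databases, following the rules above.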
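3.1 Easy to reproduce
1. List the fds: adb shell ls -a -l /proc/<pid>/fd, or lsof
2. Inspect the process's threads: ps -t <pid>, or capture a trace of the process with kill -3 <pid>
3. Capture an hprof to locate resource usage
3.2 Hard to reproduce
1. For an fd leak in your own app that ends in a Java exception (JE), install a custom Thread.UncaughtExceptionHandler that reads /proc/self/fd via readlink when the app crashes, so that the fd information is available the next time the problem occurs
2. From Android O on, the tombstone file of a native exception (NE) contains an "open files" section that lists the open fds
3. Capture the process's ps output or trace
4. If the problem is InputChannel-related, windows may be involved, so check the window state with dumpsys window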
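Item 1 of the hard-to-reproduce list could be sketched as follows. This is a minimal, JDK-only illustration: FdDumpOnCrash, dumpOpenFds and install are hypothetical names, and a real app would write the dump to its own log or crash report rather than to System.err. On Android, android.system.Os.readlink could be used in place of java.nio if older API levels must be supported.

```java
import java.io.File;
import java.nio.file.Files;

/** Hypothetical crash hook that records the open fds of the process. */
class FdDumpOnCrash {
    /** Builds a text dump of /proc/self/fd, one "fd -> target" per line. */
    static String dumpOpenFds() {
        File[] entries = new File("/proc/self/fd").listFiles();
        if (entries == null) {
            return "(fd directory not readable)";
        }
        StringBuilder sb = new StringBuilder();
        for (File entry : entries) {
            String target;
            try {
                target = Files.readSymbolicLink(entry.toPath()).toString();
            } catch (Exception e) {
                target = "?"; // fd closed in the meantime, or not a link
            }
            sb.append(entry.getName()).append(" -> ").append(target).append('\n');
        }
        return sb.toString();
    }

    /** Installs the hook and chains to the previously installed handler. */
    static void install() {
        final Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Open fds at crash time:\n" + dumpOpenFds());
            if (previous != null) {
                previous.uncaughtException(thread, error);
            }
        });
    }
}
```

Chaining to the previous handler keeps the platform's normal crash handling intact, so the dump is an addition to the usual crash report rather than a replacement for it.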