孤舟蓑笠翁,独钓寒江雪

Android 性能优化实战篇 -- ANR 问题分析2

概述

本次分析 ANR 的问题原因是由于通过 Binder 进行数据调用时由于对端被 block 而出现的 anr,涉及到了多个进程。
现象是进入应用后出现了一段事件的白屏,后面就尝试唤出多任务界面,结果就提示系统界面无响应。

分析

先来看一下系统界面的主线程的 trace 日志信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x741842b0 self=0xede4d000
| sysTid=2443 nice=-10 cgrp=default sched=0/0 handle=0xf1e14494
| state=S schedstat=( 10051801161 3897847955 16280 ) utm=783 stm=222 core=5 HZ=100
| stack=0xff543000-0xff545000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/2443/stack)
native: #00 pc 0006146c /system/lib/libc.so (__ioctl+8)
native: #01 pc 00022e63 /system/lib/libc.so (ioctl+30)
native: #02 pc 000405b9 /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+204)
native: #03 pc 00040fc1 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+28)
native: #04 pc 000387fd /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000cabcf /system/lib/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+82)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:1146)
at com.*.*.launcher.IExternalService$Stub$Proxy.setExternalCallback(IExternalService.java:230)
at com.*.systemui.fullscreen.misc.LauncherHelper.setExternalCallback(LauncherHelper.java:75)
at com.*.systemui.fullscreen.SystemUIGestureTouchHanlder.setCallback(SystemUIGestureTouchHanlder.java:160)
at com.*.systemui.recents.FlymeRecentsImpl.dismissStage(FlymeRecentsImpl.java:1148)
at com.*.systemui.recents.FlymeRecentsImpl$7.onDissmis(FlymeRecentsImpl.java:1052)
at com.*.systemui.recents.RecentsWindow.dismissStage(RecentsWindow.java:216)
at com.*.systemui.recents.RecentsWindow.onStop(RecentsWindow.java:354)
at com.*.systemui.recents.RecentsWindow$19$1.run(RecentsWindow.java:1018)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:193)
at android.app.ActivityThread.main(ActivityThread.java:6921)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:937)

可以看到是是多任务在进行 binder 调用时出现了 anr。
从调用栈来看,IExternalService 的服务端是 launcher ,我们需要分析 launcher 卡主的原因。
从抓取的日志文件中,也发现了 launcher 进行的 trace 日志信息。
发现 launcher 的几个 Binder 的 16 个线程都处于 Blocked 状态,在等待 15 号线程释放锁:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
"Binder:6492_1" prio=5 tid=8 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13440438 self=0x763fc16800
| sysTid=6503 nice=0 cgrp=default sched=0/0 handle=0x76308694f0
| state=S schedstat=( 77554719 76120884 394 ) utm=3 stm=3 core=2 HZ=100
| stack=0x763076e000-0x7630770000 stackSize=1009KB
| held mutexes=
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:297)
- waiting to lock <0x0b7ac58a> (a com.meizu.flyme.launcher.LauncherProvider$a) held by thread 15
at com.meizu.flyme.launcher.LauncherProvider.query(SourceFile:202)
at android.content.ContentProvider.query(ContentProvider.java:1140)
at android.content.ContentProvider.query(ContentProvider.java:1232)
at android.content.ContentProvider$Transport.query(ContentProvider.java:258)
at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
at android.os.Binder.execTransact(Binder.java:739)

......

"Binder:6492_F" prio=5 tid=43 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13a818a0 self=0x762a266400
| sysTid=4401 nice=-10 cgrp=default sched=0/0 handle=0x76295ef4f0
| state=S schedstat=( 21914528 36038592 117 ) utm=2 stm=0 core=2 HZ=100
| stack=0x76294f4000-0x76294f6000 stackSize=1009KB
| held mutexes=
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:297)
- waiting to lock <0x0b7ac58a> (a com.meizu.flyme.launcher.LauncherProvider$a) held by thread 15
at com.meizu.flyme.launcher.LauncherProvider.query(SourceFile:202)
at android.content.ContentProvider.query(ContentProvider.java:1140)
at android.content.ContentProvider.query(ContentProvider.java:1232)
at android.content.ContentProvider$Transport.query(ContentProvider.java:258)
at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
at android.os.Binder.execTransact(Binder.java:739)

在来看一下 15 号线程的状态:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
"launcher-loader" prio=5 tid=15 TimedWaiting
| group="main" sCount=1 dsCount=0 flags=1 obj=0x134409a8 self=0x763c81d800
| sysTid=6514 nice=0 cgrp=default sched=0/0 handle=0x762f9364f0
| state=S schedstat=( 2798107390 1594994571 2798 ) utm=215 stm=63 core=0 HZ=100
| stack=0x762f833000-0x762f835000 stackSize=1041KB
| held mutexes=
at java.lang.Object.wait(Native method)
- waiting on <0x09602018> (a java.lang.Object)
at java.lang.Thread.parkFor$(Thread.java:2137)
- locked <0x09602018> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:358)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:230)
at android.database.sqlite.SQLiteConnectionPool.waitForConnection(SQLiteConnectionPool.java:754)
at android.database.sqlite.SQLiteConnectionPool.acquireConnection(SQLiteConnectionPool.java:378)
at android.database.sqlite.SQLiteSession.acquireConnection(SQLiteSession.java:894)
at android.database.sqlite.SQLiteSession.beginTransactionUnchecked(SQLiteSession.java:310)
at android.database.sqlite.SQLiteSession.beginTransaction(SQLiteSession.java:298)
at android.database.sqlite.SQLiteDatabase.beginTransaction(SQLiteDatabase.java:549)
at android.database.sqlite.SQLiteDatabase.beginTransaction(SQLiteDatabase.java:460)
at com.meizu.flyme.launcher.LauncherProvider$a.a(SourceFile:1734)
- locked <0x0b7ac58a> (a com.meizu.flyme.launcher.LauncherProvider$a)
at com.meizu.flyme.launcher.LauncherProvider.a(SourceFile:325)
at com.meizu.flyme.launcher.LauncherModel.a(SourceFile:1808)
at com.meizu.flyme.launcher.LauncherModel$1.run(SourceFile:884)
- locked <0x02452a71> (a java.lang.Object)
at com.meizu.flyme.launcher.LauncherModel.b(SourceFile:510)
at com.meizu.flyme.launcher.LauncherModel.a(SourceFile:970)
at com.meizu.flyme.launcher.LauncherModel$i.run(SourceFile:4669)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:193)
at android.os.HandlerThread.run(HandlerThread.java:65)

看到 launcher-loader 在加载数据的时候卡主。
再来看一下 event log,在anr时间点附近有一段 am_failed_to_pause 的日志。

1
I am_failed_to_pause: [0,326199,com.meizu.flyme.directservice/org.hapjs.LauncherActivity$Launcher0,(none)]

在 main log中,也发现

1
01-23 15:57:07.044  1380  1420 W ActivityManager: Activity stop timeout for ActivityRecord{4fa37 u0 com.meizu.flyme.directservice/org.hapjs.LauncherActivity$Launcher0 t119}

由于发生 anr 之前,我们的应用发生过一段白屏,因此推测,我们的应用启动后在读取桌面数据库的数据,而此时桌面在加载数据库的时候被卡住,此时调起多任务,因为我们应用被卡住,没有正确地pause掉,调起多任务时也需要读取桌面的数据库,会出现前面分析的 binder 现在被 blocked 而出现 anr。
那么现在就需要分析我们应用白屏的原因,是在启动时哪里读取了桌面数据库。发现果然是在主线程读取了数据库,这个地方倒不是犯了一个在主线程读取数据库的错误,因为这里的逻辑需要必须同步地获取桌面数据。后面在这里加个一个超时等待,解决了这个问题。

完整日志参见注释内容: