孤舟蓑笠翁,独钓寒江雪

Android 性能优化实战篇 -- ANR 问题分析1

概述

本次分析 ANR 的问题原因是由于应用内存过低,主线程等待 GC 线程完成而导致的 ANR。

分析

从 /data/anr/ 目录下面拉取到 trace 文件,首先看看 main 线程的情况:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
"main" prio=5 tid=1 WaitingForGcToComplete
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73c16330 self=0xf0d48000
| sysTid=23755 nice=-10 cgrp=default sched=0/0 handle=0xf4ead494
| state=S schedstat=( 66207538885 3504506943 19134 ) utm=6386 stm=234 core=3 HZ=100
| stack=0xff710000-0xff712000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/23755/stack)
native: #00 pc 00019eb0 /system/lib/libc.so (syscall+28)
native: #01 pc 000a6ba3 /system/lib/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+78)
native: #02 pc 0017ba95 /system/lib/libart.so (art::gc::Heap::WaitForGcToCompleteLocked(art::gc::GcCause, art::Thread*)+244)
native: #03 pc 00184b53 /system/lib/libart.so (art::gc::Heap::WaitForGcToComplete(art::gc::GcCause, art::Thread*)+186)
native: #04 pc 0018272f /system/lib/libart.so (art::gc::Heap::AllocateInternalWithGc(art::Thread*, art::gc::AllocatorType, bool, unsigned int, unsigned int*, unsigned int*, unsigned int*, art::ObjPtr<art::mirror::Class>*)+74)
native: #05 pc 003ccb8b /system/lib/libart.so (artAllocObjectFromCodeInitializedRegionTLAB+246)
native: #06 pc 00412d47 /system/lib/libart.so (art_quick_alloc_object_initialized_region_tlab+70)
native: #07 pc 0008c6f5 [anon:dalvik-jit-code-cache] (Java_org_hapjs_component_view_b_a_a__Ljava_lang_String_2Landroid_view_MotionEvent_2Z+196)
at org.hapjs.component.view.b.a.a(SourceFile:375)
at org.hapjs.component.view.b.a.a(SourceFile:359)
at org.hapjs.component.view.b.a.a(SourceFile:335)
at org.hapjs.component.view.a.a.onTouchEvent(SourceFile:164)
at android.view.View.dispatchTouchEvent(View.java:12560)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3024)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2705)
......

主线程正在 WaitingForGcToComplete,等待 GC 结束。
看一下内存使用情况:

1
2
3
4
5
6
7
8
9
10
Heap: 0% free, 255MB/256MB; 8277953 objects
......
Total number of allocations 52646254
Total bytes allocated 1GB
Total bytes freed 1GB
Free memory 488KB
Free memory until GC 488KB
Free memory until OOME 488KB
Total memory 256MB
Max memory 256MB

目前内存只有 488K 的剩余,已经接近 256M 的内存限制,内存对象有5千多万,也就是说GC过程需要扫描这些对象的巨大部分,导致耗时很久,另外内存距离OOM只有 488K ,说明有内存泄漏,或内存使用不合理。

另外查看线程信息时发现,此时有大量的线程在运行,仅其中一种类型的线程 [io]-XX竟然有100多个:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
"[io]-1564" prio=5 tid=79 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x15806e28 self=0xd0c0f600
| sysTid=26683 nice=0 cgrp=default sched=0/0 handle=0xc4c3f970
| state=S schedstat=( 31151045 39657190 904 ) utm=0 stm=3 core=1 HZ=100
| stack=0xc4b3c000-0xc4b3e000 stackSize=1042KB
| held mutexes=
at org.hapjs.render.a.c.f(SourceFile:75)
- waiting to lock <0x007aea59> (a org.hapjs.render.a.a) held by thread 98
at org.hapjs.render.a.c.j(SourceFile:16)
at org.hapjs.render.css.p$c.a(SourceFile:225)
...

......

"[io]-1582" prio=5 tid=98 WaitingForGcToComplete
| group="main" sCount=1 dsCount=0 flags=1 obj=0x15808e90 self=0xcb826800
| sysTid=26701 nice=0 cgrp=default sched=0/0 handle=0xbe0ed970
| state=S schedstat=( 682866530 34270257 465 ) utm=66 stm=1 core=1 HZ=100
| stack=0xbdfea000-0xbdfec000 stackSize=1042KB
| held mutexes=
kernel: (couldn't read /proc/self/task/26701/stack)
native: #00 pc 00019eb0 /system/lib/libc.so (syscall+28)
native: #01 pc 000a6ba3 /system/lib/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+78)
native: #02 pc 0017ba95 /system/lib/libart.so (art::gc::Heap::WaitForGcToCompleteLocked(art::gc::GcCause, art::Thread*)+244)
native: #03 pc 00184b53 /system/lib/libart.so (art::gc::Heap::WaitForGcToComplete(art::gc::GcCause, art::Thread*)+186)
native: #04 pc 0018272f /system/lib/libart.so (art::gc::Heap::AllocateInternalWithGc(art::Thread*, art::gc::AllocatorType, bool, unsigned int, unsigned int*, unsigned int*, unsigned int*, art::ObjPtr<art::mirror::Class>*)+74)
native: #05 pc 003ccb8b /system/lib/libart.so (artAllocObjectFromCodeInitializedRegionTLAB+246)
native: #06 pc 00412d47 /system/lib/libart.so (art_quick_alloc_object_initialized_region_tlab+70)
native: #07 pc 00016e8d [anon:dalvik-jit-code-cache] (Java_org_hapjs_render_a_a_a__ILjava_lang_String_2+276)
at org.hapjs.render.a.a.a(SourceFile:42)
- locked <0x007aea59> (a org.hapjs.render.a.a)
at org.hapjs.render.a.d.b(SourceFile:291)
...

......

"[io]-1721" prio=5 tid=246 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x134418e8 self=0xc7206000
| sysTid=26841 nice=0 cgrp=default sched=0/0 handle=0xb474d970
| state=S schedstat=( 4215470 903438 14 ) utm=0 stm=0 core=4 HZ=100
| stack=0xb464a000-0xb464c000 stackSize=1042KB
| held mutexes=
at org.hapjs.render.a.a.a(SourceFile:-1)
- waiting to lock <0x007aea59> (a org.hapjs.render.a.a) held by thread 98
at org.hapjs.render.a.d.b(SourceFile:291)
...

[io]-1582 也是在等待 GC 完成状态,由于该方法正在执行一个 synchronized 方法,竟然导致有 100 多个线程在等待锁的释放。
根据这个线索,发现执行这些线程是因为前端页面正在创建一个800多个item的列表,复现这个步骤时虽然不会每次都发生 anr,但是界面有明显的卡顿现象。采用分页加载后解决了这个问题。

完成的 trace 日志见注释内容: