Stressing memory on Android
Background
Memory stress issues are less common on Android now that devices ship with considerably more RAM. Your chances of running into them increase when a large share of your app's users are on low-end devices.
Here, the memory stress we are discussing is not related to Java allocations, since the Java heap has a per-process limit. Exceeding this limit leads to an OutOfMemoryError even on high-end devices. The native heap, on the other hand, has fewer restrictions and is not garbage collected, which puts the process at risk if memory is not managed properly on the native side.
LMKD (low memory killer daemon) is responsible for killing processes to reclaim memory when the system runs low on memory. Each process gets an oom_score_adj value, which helps LMKD prioritize running processes. Processes with a higher score are killed first. You can look at your app's score in the proc file system at /proc/<pid>/oom_score_adj.
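As a rough illustration of reading that score over ADB, the sketch below assumes pidof is available in the device shell and that the shell user is allowed to read the process's /proc entry. The function name app_oom_score is made up for this example, and the package name is passed in by the caller.

```shell
#!/bin/sh
# Print the oom_score_adj of a running app process.
# $1: process/package name (for example me.amanjeet.stressapp)
# Assumption: `pidof` exists on the device and /proc/<pid> is readable from the shell.
app_oom_score() {
  pkg="$1"
  # pidof may print several PIDs; keep the first one and strip the CR adb can add.
  pid=$(adb shell pidof "$pkg" | tr -d '\r' | awk '{print $1}')
  if [ -z "$pid" ]; then
    echo "process $pkg is not running" >&2
    return 1
  fi
  adb shell cat "/proc/$pid/oom_score_adj" | tr -d '\r'
}
```

Calling app_oom_score me.amanjeet.stressapp before and after sending the app to the background is a quick way to watch the score change.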
You can add observability around the killing of your processes on Android by tracking trim memory events (the TRIM_MEMORY_* levels): override the onTrimMemory method in your Application class.
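One way to check that an onTrimMemory override is wired up, without waiting for real memory pressure, is to ask the system to deliver a trim level to the process. This is only a sketch: it assumes the am tool on your device supports the send-trim-memory subcommand, and the function name send_trim is made up for this example.

```shell
#!/bin/sh
# Ask the system to deliver a trim level to a running process.
# $1: process/package name
# $2: trim level (defaults to COMPLETE); other values accepted by `am` include
#     HIDDEN, RUNNING_MODERATE, BACKGROUND, RUNNING_CRITICAL, MODERATE, RUNNING_LOW
# Assumption: the device's `am` supports `send-trim-memory`.
send_trim() {
  pkg="$1"
  level="${2:-COMPLETE}"
  adb shell am send-trim-memory "$pkg" "$level"
}
```

For example, send_trim me.amanjeet.stressapp RUNNING_LOW should trigger the callback with the corresponding level if the process is running. The system may reject levels that do not fit the current state of the process.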
Problem
The biggest problem in analyzing any issue related to high memory pressure is reproducing that pressure on your local test devices. Being able to reproduce it lets you write instrumentation to verify your app's behavior under memory pressure, and helps you empathize with users when the device and the process start to throttle.
Recently, while working on such an issue, I needed to increase the memory pressure on a device to verify one of the native crashes in our app.
In this article, I will introduce a tool that I found super useful for reproducing memory pressure conditions on any device.
Introducing: stressapptest 😎
stressapptest is one of the tools that can help you reproduce memory pressure on any device. It lets you define the duration of the stress test and the amount of memory to stress. Combined with instrumentation, it lets you verify any speculation you have about how your system behaves under memory pressure.
The tool is written in native code, and you need ndk-build from the NDK to build it. You need to build it for your test device's architecture (x86, armv7, etc.). I have created a fork of the original stressapptest in this repository, which already contains prebuilt binaries for x86. You can push these binaries via ADB and execute the stress test.
Use the following set of commands to push the binary to the device's data partition, make it executable, and run a 20-second stress test over 990 MB of memory:
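A minimal sketch of those three steps, wrapped in a function so the values are easy to change. The local path to the binary depends on where you built or downloaded it and is an assumption here, as is the function name run_stress. The -C 8 default mirrors the commands used later in this article.

```shell
#!/bin/sh
# Push stressapptest to the device, make it executable, and run it.
# $1: local path to the stressapptest binary (hypothetical, depends on your build)
# $2: duration in seconds (-s), $3: memory in MB (-M), $4: value passed to -C
run_stress() {
  binary="$1"
  seconds="${2:-20}"
  megabytes="${3:-990}"
  threads="${4:-8}"
  adb push "$binary" /data/local/tmp/stressapptest &&
  adb shell chmod 755 /data/local/tmp/stressapptest &&
  adb shell /data/local/tmp/stressapptest -s "$seconds" -M "$megabytes" -C "$threads"
}
```

For example, run_stress ./stressapptest 20 990 pushes the binary from the current directory and stresses 990 MB for 20 seconds.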
Stressful Application Test (or stressapptest) tries to maximize randomized traffic to memory from processor and I/O, with the intent of creating a realistic high load situation. You can go through the internal documentation here.
Observations through stressapptest
SIGABRT due to low memory
Requesting allocations larger than the available free memory while the system is under memory pressure can raise SIGABRT (the abort signal), a fatal native crash for your process. You can verify whether SIGABRT is raised by stressing memory with the stressapptest tool.
Setup
For verifying that SIGABRT is generated, we have done the following setup in our sample app:
1. Script allocate_native.sh with the following content:
The ADB command in the above snippet allocates a number of native objects, each approximately 42 MB in size. It sends a broadcast to NativeAllocationReceiver.kt, which is responsible for calling a native method that allocates the native objects. The number of objects to allocate is configurable through an intent extra, as shown in the above code snippet.
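A minimal sketch of what allocate_native.sh could look like. The package name is taken from the tombstone shown later in this article, but the receiver's component path and the extra key object_count are hypothetical, so adjust them to match your own receiver.

```shell
#!/bin/sh
# Send an explicit broadcast asking the app to allocate N native objects.
# $1: number of objects to allocate (defaults to 3)
# Assumptions: the component path and the extra key "object_count" are
# hypothetical placeholders for whatever your receiver actually declares.
allocate_native() {
  count="${1:-3}"
  adb shell am broadcast \
    -n me.amanjeet.stressapp/.NativeAllocationReceiver \
    --ei object_count "$count"
}
```

The -n flag targets the receiver explicitly, and --ei attaches an integer extra that the receiver can read from the intent.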
2. Script log_memfree.sh with the following content:
The above shell script logs the free memory from the meminfo proc file into the log_free_memory.txt file. The MemFree value in meminfo is the amount of unused memory, in kB, at that moment. It is the sum of LowFree and HighFree: low memory is directly usable by the kernel, while high memory is used for userspace programs and the page cache.
The /proc/meminfo file is part of the proc file system and is useful for getting memory-related information about a device at any instant. You can check out further details about the proc file system here.
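A minimal sketch of what log_memfree.sh could look like, assuming one sample per second is enough. The function name, the sampling interval, and the optional sample count are choices made for this example.

```shell
#!/bin/sh
# Append the MemFree line of /proc/meminfo to a log file at a fixed interval.
# $1: output file (defaults to log_free_memory.txt)
# $2: seconds between samples (defaults to 1)
# $3: number of samples; 0 (the default) means run until interrupted with Ctrl+C
log_memfree() {
  out="${1:-log_free_memory.txt}"
  interval="${2:-1}"
  samples="${3:-0}"
  n=0
  while [ "$samples" -eq 0 ] || [ "$n" -lt "$samples" ]; do
    # Strip the CR adb can add, then keep only the MemFree line.
    adb shell cat /proc/meminfo | tr -d '\r' | grep '^MemFree:' >> "$out"
    n=$((n + 1))
    sleep "$interval"
  done
}
```

Each line in the log then has the form MemFree: <value> kB, which makes it easy to pick out the minimum afterwards.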
3. Script stress_device.sh with the following contents:
chmod +x ./log_memfree.sh
adb shell /data/local/tmp/stressapptest -s 20 -M 900 -C 8 & sh log_memfree.sh
The above shell script first makes log_memfree.sh executable. Line 2 then runs stressapptest in the background, stressing 900 MB of memory for 20 seconds, while log_memfree.sh logs the free memory to log_free_memory.txt.
Steps
Perform the following steps to verify the SIGABRT signal:
- Run stress_device.sh 4-5 times and note the minimum MemFree value recorded each time in the log_free_memory.txt file while the stress test is running. To stop an iteration, interrupt the command with Ctrl+C as soon as the console shows the stress test status as passed.
- Take the average of the minimum MemFree values logged across the iterations.
- For me, the average minimum MemFree while running the stress test is roughly 115 MB. Since each object is approximately 42 MB, I have to create 3 native objects through my broadcast to exceed the free memory while the system is under memory pressure.
- Next, run allocate_native.sh while the stressapptest ADB command from stress_device.sh is running.
- After you run allocate_native.sh, wait for a while and BAM, you will notice a native crash with SIGABRT.
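The arithmetic in the steps above can be sketched as two small helpers. They assume the log contains lines of the form MemFree: <value> kB, as in /proc/meminfo. The function names are made up for this example.

```shell
#!/bin/sh
# Smallest MemFree value (in kB) found in a log of "MemFree: <n> kB" lines.
# $1: path to the log file
min_memfree_kb() {
  awk '/^MemFree:/ { if (min == "" || $2 + 0 < min) min = $2 + 0 }
       END { if (min != "") print min }' "$1"
}

# Smallest number of objects whose combined size strictly exceeds the free memory.
# $1: free memory in MB, $2: size of one object in MB
objects_to_exceed() {
  free_mb="$1"
  object_mb="$2"
  echo $(( free_mb / object_mb + 1 ))
}
```

With the numbers from this article, objects_to_exceed 115 42 prints 3, since two objects (84 MB) still fit in 115 MB while three (126 MB) do not.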
Conclusion
Extract the latest tombstone from the /data/tombstones directory and note the signal name and the abort message:
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/sdk_google_phone_x86/generic_x86:7.1.1/NYC/6695155:userdebug/test-keys'
Revision: '0'
ABI: 'x86'
pid: 5583, tid: 5583, name: njeet.stressapp >>> me.amanjeet.stressapp <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: '/Volumes/Android/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "terminating with uncaught exception of type std::bad_alloc: std::bad_alloc" failed'
eax 00000000 ebx 000015cf ecx 000015cf edx 00000006
esi ab5b258c edi ab5b2534
xcs 00000073 xds 0000007b xes 0000007b xfs 0000003b xss 0000007b
eip ab4de424 ebp bfd31228 esp bfd311cc flags 00000296
The above part of the generated tombstone shows that we attempted allocations that could not be satisfied, and SIGABRT was generated. The abort message also shows an uncaught exception of type std::bad_alloc, which is what led to the abort signal.
Another thing to note is that the same number of allocations does not generate SIGABRT when the system is not under memory pressure.
So we have to be mindful of the number of native allocations our process makes and ensure that memory is managed properly.
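To extract the latest tombstone mentioned above, something like the following sketch may help. It assumes adbd has access to /data/tombstones (for example after adb root on an emulator or userdebug build) and that the device's ls supports -t. The function names are made up for this example.

```shell
#!/bin/sh
# Name of the most recently modified file in /data/tombstones.
# Assumption: the device shell can list the directory and `ls -t` is supported.
latest_tombstone() {
  adb shell ls -t /data/tombstones | tr -d '\r' | head -n 1
}

# Pull the latest tombstone into the current directory and print
# its signal line and abort message.
pull_latest_tombstone() {
  name=$(latest_tombstone)
  if [ -z "$name" ]; then
    echo "no tombstones found" >&2
    return 1
  fi
  adb pull "/data/tombstones/$name" . &&
  grep -E '^(signal|Abort message)' "$name"
}
```

Running pull_latest_tombstone right after the crash should surface the signal 6 (SIGABRT) line without having to open the full file.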
TRIM_MEMORY_EVENTS during stressapptest
This was another interesting thing that I was able to reproduce with the help of stressapptest. I had never seen the trim memory events reach TRIM_MEMORY_COMPLETE on our local test devices. With this stress test, however, you can easily see the system reach TRIM_MEMORY_COMPLETE whenever the app is close to being killed by the system.
The logs captured from logcat show the trim memory events reaching TRIM_MEMORY_COMPLETE, which means the app was close to being killed by the system.
LMKD is active 💪
While running the stress test, you can also verify that LMKD, which reclaims memory when the system comes under memory pressure, has become active and started killing lower-priority apps.
For this verification, first create memory stress of around 1000 MB (~1 GB) with stressapptest. This should be enough to invoke LMKD on an emulator with 1.5 GB of total memory:
adb shell /data/local/tmp/stressapptest -s 20 -M 1000 -C 8
While this runs, you can filter the kernel messages for lowmemorykiller with the following command:
adb shell dmesg | grep lowmemory
You will see logs like the ones below, which show that lowmemorykiller has killed some lower-priority apps, along with the adj score of each process:
[40975.906345] lowmemorykiller: Killing 'd.process.acore' (22130), adj 906,
[40975.941133] lowmemorykiller: Killing 'd.process.acore' (22130), adj 906,
[40975.981826] lowmemorykiller: Killing 'Signal Catcher' (22136), adj 906,
[40976.021413] lowmemorykiller: Killing '.android.dialer' (22337), adj 906,
[40976.044607] lowmemorykiller: Killing 'Gservices' (22351), adj 906,
[40976.047201] lowmemorykiller: Killing 'Gservices' (22360), adj 906,
[40976.116697] lowmemorykiller: Killing '.android.gms.ui' (22387), adj 904,
[40976.163206] lowmemorykiller: Killing 'oid.apps.photos' (22366), adj 904,
[40976.171674] lowmemorykiller: Killing 'Signal Catcher' (22371), adj 904
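If the output gets long, a small filter can reduce each kill line to its adj score and process name. This sketch assumes the log format shown above. The function name lmk_kills is made up for this example.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<adj> <process name>" for every
# lowmemorykiller kill line, ignoring everything else.
# Assumption: lines look like
#   lowmemorykiller: Killing '<name>' (<pid>), adj <score>,
lmk_kills() {
  sed -n "s/.*lowmemorykiller: Killing '\([^']*\)' ([0-9]*), adj \([0-9]*\).*/\2 \1/p"
}
```

For example, adb shell dmesg | lmk_kills | sort | uniq -c gives a count of kills per adj score and process.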
Did you find this tool useful? What other use cases do you think the stressapptest tool would have? Reach out to me at @droid_singh and let me know.