Stressing memory on Android

Background

Memory stress issues are less common on Android now that devices ship with considerably more RAM. You are more likely to run into them when a large share of your app's users are on low-end devices.

The memory stress discussed here is not about Java allocations: the Java heap is capped per process, and exceeding that cap throws an OutOfMemoryError even on high-end devices. The native heap, on the other hand, has fewer restrictions and is not garbage collected, so poor memory management on the native side can put the whole process at risk.

LMKD (low memory killer daemon) is responsible for killing processes to reclaim memory when the system runs low. Every process has an oom_score_adj value, which LMKD uses to prioritize running processes: processes with a higher score are killed first. You can look up your app's score in the proc file system at /proc/<pid>/oom_score_adj.
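
To make the "higher score is killed first" rule concrete, here is a minimal sketch that orders a list of processes by their score (the kernel exposes it as oom_score_adj). The function name kill_order and the input format are assumptions made for illustration, and the real daemon also takes memory pressure thresholds into account, so treat this as a simplification rather than LMKD's actual algorithm.

```shell
# kill_order: read lines of "<oom_score_adj> <process name>" on stdin and
# print them with the highest score first, i.e. the most likely victim on top.
# Hypothetical helper for illustration; this is not how lmkd is implemented.
kill_order() {
  sort -k1,1nr
}
```

For example, feeding it the three lines "0 foreground", "906 cached" and "200 service" (made-up values) lists the cached process first and the foreground process last.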

You can add observability around your process being killed by overriding the onTrimMemory method in your Application class and tracking the trim memory levels (the TRIM_MEMORY_* constants) it receives.

Problem

The biggest problem in analyzing any issue related to high memory pressure is reproducing that pressure on your local test devices. Once you can reproduce it, you can write instrumentation to verify how your app behaves under memory pressure, and you can experience what your users see when the device and the process start to throttle.

Recently, while working on such an issue, I needed to increase the memory pressure on a device to verify one of the native crashes in the app.

In this article, I will introduce you to a tool that I found super useful for reproducing memory pressure conditions on any device.

Introducing: stressapptest 😎

stressapptest is one of the tools that can help you reproduce memory pressure on any device. It lets you define the duration and the memory size of the stress test. Combined with instrumentation, it lets you verify any speculation you have about what happens when your system comes under memory pressure.

The tool is written in native code, and you need ndk-build from the NDK to build it for your test device's architecture (x86, armv7, etc.). I have created a fork of the original stressapptest in this repository, which already includes statically built binaries for x86. You can push these binaries via ADB and execute the stress test.

Use the following commands to push the binary to the device's data partition, make it executable, and run a stress test that uses 990 MB for 20 seconds (-s is the duration in seconds, -M the memory size in MB, and -C the number of CPU stress threads):

adb push stressapptest /data/local/tmp/
adb shell chmod 777  /data/local/tmp/stressapptest
adb shell /data/local/tmp/stressapptest -s 20 -M 990 -C 8

Executing the stressapptest
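
If you run these commands often, they can be wrapped in small functions so the duration, size and thread count become parameters. This is only a sketch: the function names and the ADB override variable are my own assumptions, and it uses chmod 755 instead of 777, which should be enough to make the binary executable.

```shell
# ADB can be overridden (for example ADB=echo) to print the adb arguments
# instead of running them, which is handy for a dry run.
ADB="${ADB:-adb}"
REMOTE_DIR=/data/local/tmp

# push_stressapptest <local binary path>
# Copies the binary to the device and marks it executable.
push_stressapptest() {
  "$ADB" push "$1" "$REMOTE_DIR/" &&
  "$ADB" shell chmod 755 "$REMOTE_DIR/stressapptest"
}

# run_stress <seconds> <megabytes> <cpu threads>
# Runs the stress test with the same -s, -M and -C flags used above.
run_stress() {
  "$ADB" shell "$REMOTE_DIR/stressapptest" -s "$1" -M "$2" -C "$3"
}
```

With these in place, run_stress 20 990 8 issues the same stress command as the snippet above.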

Stressful Application Test (or stressapptest) tries to maximize randomized traffic to memory from processor and I/O, with the intent of creating a realistic high load situation. You can go through the internal documentation here.

Observations through stressapptest

SIGABRT due to low memory

Requesting an allocation larger than the available free memory while the system is under memory pressure can raise an abort signal (SIGABRT), a fatal native crash for your process. You can verify whether SIGABRT is raised by stressing memory with the stressapptest tool.

Setup

To verify that SIGABRT is generated, we set up the following in our sample app:

1. Script allocate_native.sh with the following content:

while :
do
  adb shell am broadcast -a me.amanjeet.stressapp.ALLOCATE_NATIVE --ei NUMBER_OF_OBJECTS 5 -p me.amanjeet.stressapp
  sleep 1
done

Allocate native objects

The ADB command in the snippet above allocates a number of native objects of roughly 42 MB each. It sends a broadcast to NativeAllocationReceiver.kt, which calls a native method that allocates the objects. The number of objects is configurable through the NUMBER_OF_OBJECTS intent extra, as shown in the snippet.

2. Script log_memfree.sh with the following content:

while :
do
  adb shell cat /proc/meminfo | grep MemFree >> log_free_memory.txt
  sleep 2
done

Log free memory in the file

The shell script above appends the MemFree line from /proc/meminfo to log_free_memory.txt every two seconds. MemFree is the amount of unused memory, in kB, at that instant. It is the sum of LowFree and HighFree, which roughly correspond to memory directly usable by the kernel and memory available to userspace, respectively.

/proc/meminfo is part of the proc file system and is useful for getting memory-related information about a device at any instant. You can check out further details about the proc file system here.
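
Since the raw log lines carry the label and the unit, a small helper that extracts only the number can make them easier to compare. The sketch below assumes the usual "MemFree: <number> kB" layout of the meminfo file; the function names are hypothetical, and 1 MB is taken as 1024 kB.

```shell
# memfree_kb: read meminfo-formatted text on stdin and print MemFree in kB.
memfree_kb() {
  awk '$1 == "MemFree:" { print $2 }'
}

# memfree_mb: same, converted to whole megabytes (1 MB taken as 1024 kB).
memfree_mb() {
  awk '$1 == "MemFree:" { print int($2 / 1024) }'
}
```

On a connected device this could be used as adb shell cat /proc/meminfo | memfree_mb.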

3. Script stress_device.sh with the following contents:

chmod +x ./log_memfree.sh
adb shell /data/local/tmp/stressapptest -s 20 -M 900 -C 8 & sh log_memfree.sh

The script above first makes log_memfree.sh executable. Line 2 then uses ADB to run stressapptest with 900 MB for 20 seconds in the background, while log_memfree.sh logs free memory to log_free_memory.txt.

Steps

Perform the following steps to verify the SIGABRT signal:

  1. Run stress_device.sh 4-5 times and note the minimum MemFree value recorded in log_free_memory.txt during each run. To end an iteration, interrupt the command with Ctrl+C as soon as the console shows the stress test status as passed.
  2. Take the average of the minimum MemFree values from all iterations.
  3. For me, the average minimum MemFree during the stress test is roughly 115 MB. To exceed it while the system is under memory pressure, I have to allocate 3 native objects through the broadcast, since each is approximately 42 MB (3 × 42 MB = 126 MB).
  4. Next, run allocate_native.sh while the stressapptest command from stress_device.sh is running.
  5. Shortly after starting allocate_native.sh, BAM, you will notice a native crash with SIGABRT.
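
The arithmetic in steps 1 to 3 can be sketched with a few helpers. This is an illustration only: the function names are made up, and it assumes one log file per iteration (log_free_memory.txt is appended to, so it would need to be cleared or renamed between runs).

```shell
# min_memfree_kb <log file>
# Print the smallest MemFree value (in kB) among "MemFree: <n> kB" lines.
min_memfree_kb() {
  awk '$1 == "MemFree:" && (min == "" || $2 + 0 < min) { min = $2 + 0 }
       END { if (min != "") print min }' "$1"
}

# average: read one integer per line on stdin, print the integer mean.
average() {
  awk '{ sum += $1; n++ } END { if (n > 0) print int(sum / n) }'
}

# objects_to_exceed <free MB> <object size MB>
# Smallest number of objects whose combined size is larger than the free memory.
objects_to_exceed() {
  echo $(( $1 / $2 + 1 ))
}
```

With my numbers, objects_to_exceed 115 42 gives 3, which matches the three objects mentioned in step 3.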

Conclusion

Extract the latest tombstone inside the /data/tombstones directory and notice the signal name and the abort message:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/sdk_google_phone_x86/generic_x86:7.1.1/NYC/6695155:userdebug/test-keys'
Revision: '0'
ABI: 'x86'
pid: 5583, tid: 5583, name: njeet.stressapp  >>> me.amanjeet.stressapp <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: '/Volumes/Android/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "terminating with uncaught exception of type std::bad_alloc: std::bad_alloc" failed'
    eax 00000000  ebx 000015cf  ecx 000015cf  edx 00000006
    esi ab5b258c  edi ab5b2534
    xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000003b  xss 0000007b
    eip ab4de424  ebp bfd31228  esp bfd311cc  flags 00000296

This part of the generated tombstone shows that we attempted allocations that could not be satisfied, and SIGABRT was generated. The abort message also shows an uncaught exception of type std::bad_alloc, which is what triggered the abort.

Another thing to note is that the same number of allocations does not generate SIGABRT when the system is not under memory pressure.

So we have to be mindful of how many native allocations our process makes and ensure that native memory is managed properly.
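
Tombstones are long, so it can help to pull out just the two lines of interest. The following sketch assumes the "signal N (NAME)" and "Abort message:" line formats seen in the tombstone above; the function names are hypothetical.

```shell
# tombstone_signal: read tombstone text on stdin and print the signal name,
# e.g. SIGABRT, taken from the "signal <n> (<NAME>), ..." line.
tombstone_signal() {
  sed -n 's/^signal \([0-9][0-9]*\) (\([A-Z]*\)).*/\2/p'
}

# tombstone_abort_message: print whatever follows "Abort message: ".
tombstone_abort_message() {
  sed -n 's/^Abort message: //p'
}
```

Assuming the tombstone has been copied to a local file, tombstone_signal < tombstone_00 would print the signal name (the file name here is just an example).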

TRIM_MEMORY_EVENTS during stressapptest

This was another interesting behavior that I was able to reproduce with the help of stressapptest. I had never seen onTrimMemory report TRIM_MEMORY_COMPLETE on our local test devices. With this stress test, however, the level easily reaches TRIM_MEMORY_COMPLETE whenever the app is close to being killed by the system.

me.amanjeet.stressapp D/STRESSAPP: 125829392 bytes allocated
me.amanjeet.stressapp D/MyApp: Memory Level 80
me.amanjeet.stressapp D/STRESSAPP: 125829392 bytes allocated
me.amanjeet.stressapp D/MyApp: Memory Level 80
me.amanjeet.stressapp D/STRESSAPP: 125829392 bytes allocated
me.amanjeet.stressapp D/MyApp: Memory Level 80
me.amanjeet.stressapp A/libc: /Volumes/Android/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "terminating with uncaught exception of type std::bad_alloc: std::bad_alloc" failed
me.amanjeet.stressapp A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 5583 (njeet.stressapp)

Trim memory events reaching level 80

The logcat output above shows the trim memory level reaching 80, which is TRIM_MEMORY_COMPLETE and means the process was close to being killed by the system.
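
The numeric levels in these logs map to the TRIM_MEMORY_* constants of ComponentCallbacks2. As a small sketch, the helpers below translate a level into its constant name and pull the levels out of log lines; the function names are assumptions, and the "Memory Level <n>" pattern is simply what the sample app prints.

```shell
# trim_level_name <level>
# Map an onTrimMemory level to its ComponentCallbacks2 constant name.
trim_level_name() {
  case "$1" in
    5)  echo TRIM_MEMORY_RUNNING_MODERATE ;;
    10) echo TRIM_MEMORY_RUNNING_LOW ;;
    15) echo TRIM_MEMORY_RUNNING_CRITICAL ;;
    20) echo TRIM_MEMORY_UI_HIDDEN ;;
    40) echo TRIM_MEMORY_BACKGROUND ;;
    60) echo TRIM_MEMORY_MODERATE ;;
    80) echo TRIM_MEMORY_COMPLETE ;;
    *)  echo UNKNOWN ;;
  esac
}

# logcat_trim_levels: read logcat text on stdin and print each logged level.
logcat_trim_levels() {
  sed -n 's/.*Memory Level \([0-9][0-9]*\).*/\1/p'
}
```

For the logs above, logcat_trim_levels yields 80 for each line, and trim_level_name 80 names it TRIM_MEMORY_COMPLETE.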

LMKD is active 💪

While running the stress test, you can also verify that LMKD, which reclaims memory when the system comes under memory pressure, has become active and started killing lower-priority apps.

To verify this, first create memory stress of around 1000 MB (~1 GB) with stressapptest. This should be enough to invoke LMKD on an emulator with 1.5 GB of total memory:

adb shell /data/local/tmp/stressapptest -s 20 -M 1000 -C 8

While this runs, you can search the kernel log for lowmemorykiller messages with the following command:

adb shell dmesg | grep lowmemory

You will see LMKD logs like the ones below, which show that lowmemorykiller has killed some lower-priority apps, along with the adj score of each process:

[40975.906345] lowmemorykiller: Killing 'd.process.acore' (22130), adj 906,
[40975.941133] lowmemorykiller: Killing 'd.process.acore' (22130), adj 906,
[40975.981826] lowmemorykiller: Killing 'Signal Catcher' (22136), adj 906,
[40976.021413] lowmemorykiller: Killing '.android.dialer' (22337), adj 906,
[40976.044607] lowmemorykiller: Killing 'Gservices' (22351), adj 906,
[40976.047201] lowmemorykiller: Killing 'Gservices' (22360), adj 906,
[40976.116697] lowmemorykiller: Killing '.android.gms.ui' (22387), adj 904,
[40976.163206] lowmemorykiller: Killing 'oid.apps.photos' (22366), adj 904,
[40976.171674] lowmemorykiller: Killing 'Signal Catcher' (22371), adj 904
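
When the kernel log gets long, it can be easier to summarize the kills than to read them line by line. The sketch below assumes the "lowmemorykiller: Killing '<name>' (<pid>), adj <score>" format shown above; both function names are hypothetical.

```shell
# lmk_kills: read dmesg text on stdin and print "<adj> <pid> <name>"
# for every lowmemorykiller kill line.
lmk_kills() {
  sed -n "s/.*lowmemorykiller: Killing '\([^']*\)' (\([0-9]*\)), adj \([0-9]*\).*/\3 \2 \1/p"
}

# lmk_kills_per_adj: print "<adj> <number of kills>" sorted by adj score.
lmk_kills_per_adj() {
  lmk_kills | awk '{ count[$1]++ } END { for (adj in count) print adj, count[adj] }' | sort -n
}
```

On a device this could be used as adb shell dmesg | lmk_kills_per_adj to see how many processes were killed at each adj score.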

Did you find this tool useful? What other use cases do you think stressapptest tool would have? Reach out to me at @droid_singh and let me know.