Stressing the JVM GC on Android

Context

Recently, I stumbled upon this tweet from theapache64, which raised an intriguing question: how would you deliberately build an Android app that performs poorly? Reading through the replies, it became clear to me that we often rush to solutions without considering whether they would actually affect system resources. For instance, the first idea that comes to mind is to add random sleep intervals on the main thread, but this raises a few questions:

  • Does this method truly stress the device's resources, such as memory and CPU?
  • Can it even be a real programming error?

To answer this properly, we should look for strategies that stress one particular system resource. That is when I had the idea of stressing the garbage collector, which should ideally lead to:

  1. Skipping UI frames first
  2. Possible ANRs if you are dispatching input events
  3. Ultimately, the famous OutOfMemoryError

Introducing: GCStress

With the focus narrowed to stressing the garbage collector, I started exploring GitHub, and that's when I found GCStress. It's a Java application that creates a thread named "GCStress Hammer thread" with priority Thread.NORM_PRIORITY that:

  1. Creates a LinkedHashMap that acts as a cache from Integer keys to Object values, with a default capacity of 2,000,000.
  2. Runs in a loop, picking a random integer cache key and allocating a byte array of random size (0-255 bytes) for it if the key is absent.
  3. Removes the entry from the map if the cache key is already present.
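
As a rough sizing of the loop described above, the sketch below simulates the fill-or-remove toggling on a small key space and then estimates how many payload bytes stay reachable. It assumes the parameters stated above (2,000,000 keys, array sizes 0-255). The names simulateOccupancy and estimatedPayloadBytes are hypothetical helpers, not part of GCStress, and the estimate deliberately ignores object headers and map-entry overhead, which vary by runtime.

```kotlin
import java.util.Random

// Fraction of keys that hold an entry after `iterations` random toggles.
// Each pick fills an empty slot or removes an occupied one, as in the loop above.
fun simulateOccupancy(capacity: Int, iterations: Int, seed: Long): Double {
    val rand = Random(seed)
    val present = BooleanArray(capacity)
    repeat(iterations) {
        val key = rand.nextInt(capacity)
        present[key] = !present[key]
    }
    return present.count { it }.toDouble() / capacity
}

// Payload-only estimate of reachable bytes (assumption: no per-object overhead counted).
fun estimatedPayloadBytes(capacity: Int, maxSize: Int, occupancy: Double): Long {
    // nextInt(maxSize) is uniform over 0 until maxSize, so the mean is (maxSize - 1) / 2.
    val averageArraySize = (maxSize - 1) / 2.0
    return (capacity * occupancy * averageArraySize).toLong()
}

fun main() {
    val occupancy = simulateOccupancy(capacity = 10_000, iterations = 1_000_000, seed = 42L)
    println("simulated occupancy: $occupancy")
    val bytes = estimatedPayloadBytes(capacity = 2_000_000, maxSize = 256, occupancy = 0.5)
    println("payload-only estimate: $bytes bytes")
}
```

Under these assumptions roughly half of the keys are occupied once the loop has run for a while, so the byte arrays alone hold about 127.5 million bytes. That is already a large share of the 201326592-byte growth limit that appears in the OOM message later in this post, before any per-object overhead is counted.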

Because we allocate and release objects of random sizes, the heap becomes fragmented, which leads to frequent GC runs.

💡
Fragmentation happens when free memory is split into pieces that are too small to satisfy an allocation contiguously. It can lead to more frequent GC runs that try to compact the heap and recover larger contiguous spaces.
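
To make the note above concrete, here is a toy model of a heap as an array of cells. It is only an illustration: ToyHeap and its methods are hypothetical, and the real Android runtime manages memory very differently. The sketch shows a request failing even though enough total free space exists, and succeeding once the live blocks are compacted.

```kotlin
// Toy heap: each cell holds the id of the block that owns it, or 0 if the cell is free.
class ToyHeap(size: Int) {
    private val cells = IntArray(size)
    private var nextId = 1

    // First-fit allocation: returns the new block id, or null if no hole is large enough.
    fun allocate(length: Int): Int? {
        var runStart = 0
        var runLength = 0
        for (i in cells.indices) {
            if (cells[i] == 0) {
                if (runLength == 0) runStart = i
                runLength++
                if (runLength == length) {
                    val id = nextId++
                    for (j in runStart..i) cells[j] = id
                    return id
                }
            } else {
                runLength = 0
            }
        }
        return null
    }

    fun free(id: Int) {
        for (i in cells.indices) if (cells[i] == id) cells[i] = 0
    }

    fun totalFree(): Int = cells.count { it == 0 }

    // Length of the longest run of consecutive free cells.
    fun largestHole(): Int {
        var best = 0
        var run = 0
        for (c in cells) {
            run = if (c == 0) run + 1 else 0
            if (run > best) best = run
        }
        return best
    }

    // Slide live cells to the start so the free space becomes one contiguous hole.
    fun compact() {
        val live = cells.filter { it != 0 }
        cells.fill(0)
        live.forEachIndexed { i, id -> cells[i] = id }
    }
}

fun main() {
    val heap = ToyHeap(16)
    val ids = List(8) { heap.allocate(2)!! }                      // fill the heap with 2-cell blocks
    ids.filterIndexed { i, _ -> i % 2 == 0 }.forEach(heap::free)  // free every other block
    println("free=${heap.totalFree()} largestHole=${heap.largestHole()}")  // free=8 largestHole=2
    println("allocate(4) before compaction: ${heap.allocate(4)}")          // null
    heap.compact()
    println("allocate(4) succeeds after compaction: ${heap.allocate(4) != null}")
}
```

Eight cells are free in total, yet a four-cell request fails until compaction merges the holes. That extra work is the cost the note above attributes to the collector.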

Integrating GCStress into the Now in Android app

Since GCStress is a Java application, my idea is to add its jar as a dependency of an Android app and invoke the code that causes the GC stress. For this purpose, we are going to use the Now in Android app by Google. Let's look at the setup steps:

  1. The first step is to add the gcstress jar as a dependency. For that, we can include the following in build.gradle.kts:
implementation(fileTree(mapOf("dir" to "gcstress", "include" to listOf("*.jar"))))
  2. To invoke the code, we add a new broadcast receiver, GCStressReceiver, with the action me.amanjeet.nowinandroid.gcstress, so we can trigger it at any time with a single adb command:
adb shell am broadcast -a me.amanjeet.nowinandroid.gcstress -p com.google.samples.apps.nowinandroid.demo.debug
  3. The main part of GCStress is the GCHammer thread, which fills the cache with ByteArrays of random size and ultimately causes the fragmentation:
import java.util.Random

internal class GCHammer(private val capacity: Int, private val maxsize: Int) : Runnable {
    private val map: LinkedHashMapWithCapacity<Int, Any> = LinkedHashMapWithCapacity(capacity)

    // Written by the calling thread and read by the hammer thread, so it must be volatile.
    @Volatile
    private var stop = false

    fun stop() {
        stop = true
    }

    override fun run() {
        val rand = Random()
        while (!stop) {
            val key: Int = rand.nextInt(capacity)
            val size: Int = rand.nextInt(maxsize)
            if (map[key] == null) {
                // If the cache entry is empty, fill it with a new array of random size.
                map[key] = ByteArray(size)
            } else {
                // Otherwise, remove it so the array becomes garbage.
                map.remove(key)
            }
        }
    }
}
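
The receiver from step 2 is not shown in this post, so here is a minimal sketch of what it could look like. Everything in it is an assumption rather than the code from the fork: the class name, the constants, and the choice of parameters (taken from the defaults described earlier). It relies on the GCHammer class above and also assumes the receiver is declared in the app's manifest with the matching action.

```kotlin
import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent

// Hypothetical receiver; the actual class in the fork may differ.
class GCStressReceiver : BroadcastReceiver() {

    override fun onReceive(context: Context, intent: Intent) {
        if (intent.action != ACTION_GC_STRESS) return

        // onReceive runs on the main thread, so the hammer gets its own thread.
        val hammer = GCHammer(capacity = ASSUMED_CAPACITY, maxsize = ASSUMED_MAX_SIZE)
        Thread(hammer, "GCStress Hammer thread").apply {
            priority = Thread.NORM_PRIORITY
            start()
        }
    }

    private companion object {
        const val ACTION_GC_STRESS = "me.amanjeet.nowinandroid.gcstress"
        const val ASSUMED_CAPACITY = 2_000_000 // assumption: default capacity described earlier
        const val ASSUMED_MAX_SIZE = 256       // assumption: array sizes of 0-255 bytes
    }
}
```

This sketch never calls stop() on the hammer, so the stress continues until the process dies. Keeping a reference to the hammer and stopping it on a second broadcast would be a natural extension.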

Here is the fork with this code.

Testing GCStress with the Now in Android app

To test the gcstress setup we are going to:

  1. Send the broadcast to the app with the command:
adb shell am broadcast -a me.amanjeet.nowinandroid.gcstress -p com.google.samples.apps.nowinandroid.demo.debug
  2. Start interacting with the app.
  3. You'll start noticing GC logs in a little while:
WaitForGcToComplete blocked Alloc on Background for 298.522ms
Starting a blocking GC Alloc
Waiting for a blocking GC Alloc
  4. You'll also observe logs for skipped UI frames:
Skipped 69 frames! The application may be doing too much work on its main thread.
  5. Bam! You'll observe an ANR and finally an OutOfMemoryError:
ANR in com.google.samples.apps.nowinandroid.demo.debug (com.google.samples.apps.nowinandroid.demo.debug/com.google.samples.apps.nowinandroid.MainActivity)
PID: 30359
Reason: Input dispatching timed out (76edcc5 com.google.samples.apps.nowinandroid.demo.debug/com.google.samples.apps.nowinandroid.MainActivity (server) is not responding. Waited 5001ms for MotionEvent(deviceId=10, eventTime=29577131277000, source=TOUCHSCREEN | STYLUS, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=30.3, yPrecision=13.7, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (839.0, 904.9)]), policyFlags=0x62000000)
Parent: com.google.samples.apps.nowinandroid.demo.debug/com.google.samples.apps.nowinandroid.MainActivity
ErrorId: 615760f3-3a56-4aea-a4dd-640dc448823d
Frozen: false
Load: 0.2 / 0.08 / 0.08

----- Output from /proc/pressure/memory -----
...

OOM:

java.lang.OutOfMemoryError: Failed to allocate a 256 byte allocation with 1904016 free bytes and 1859KB until OOM, target footprint 201326592, growth limit 201326592; giving up on allocation because <1% of heap free after GC.
     at coil.fetch.HttpUriFetcher.newRequest(HttpUriFetcher.kt:184)
     at coil.fetch.HttpUriFetcher.fetch(HttpUriFetcher.kt:73)
     at coil.intercept.EngineInterceptor.fetch(EngineInterceptor.kt:165)
     at coil.intercept.EngineInterceptor.execute(EngineInterceptor.kt:122)
     at coil.intercept.EngineInterceptor.access$execute(EngineInterceptor.kt:41)
     at coil.intercept.EngineInterceptor$intercept$2.invokeSuspend(EngineInterceptor.kt:75)
     at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
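
To put numbers on the GC and skipped-frame logs quoted above instead of eyeballing them, a small script can tally them from captured logcat text. This is a sketch under assumptions: the patterns are derived only from the two sample lines in this post, durations reported in units other than ms are ignored, and GcStressSummary and summarize are hypothetical names.

```kotlin
// Patterns for the two logcat messages quoted in this post.
val skippedFramesPattern = Regex("""Skipped (\d+) frames!""")
val gcWaitPattern = Regex("""WaitForGcToComplete blocked \w+ on \w+ for ([\d.]+)ms""")

data class GcStressSummary(val skippedFrames: Int, val gcWaitMillis: Double)

// Sums skipped frames and time spent blocked on GC across all matching lines.
fun summarize(lines: Sequence<String>): GcStressSummary {
    var frames = 0
    var waitMillis = 0.0
    for (line in lines) {
        skippedFramesPattern.find(line)?.let { frames += it.groupValues[1].toInt() }
        gcWaitPattern.find(line)?.let { waitMillis += it.groupValues[1].toDouble() }
    }
    return GcStressSummary(frames, waitMillis)
}

fun main() {
    // Reads logcat text from standard input, for example the output of `adb logcat -d`.
    val summary = summarize(generateSequence(::readLine))
    println("skipped frames: ${summary.skippedFrames}, blocked on GC: ${summary.gcWaitMillis} ms")
}
```

Comparing these totals between a run with and without the broadcast gives a simple before-and-after measure of the stress.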

Conclusion

A little while back I wrote an article, Demystifying ANRs Puzzle, which covered ANRs where you'll see the thread state WaitingForGcToComplete. The experiment above shows how frequent GC runs can cause janky frames and ANRs, ultimately leading to OOM errors. It is important to note that this impact is staged: skipped frames come first, then ANRs, and finally the OOM.

So, it's not only OOM reports that indicate the presence of memory leaks. It could be ANRs as well.

So, if you see many ANRs in Android Vitals with the thread state WaitingForGcToComplete, it usually means something is wrong with memory management or you have a lot of leaks. The setup above also gives you a way to reproduce this kind of ANR 🎉

Did you find this blog useful? How would you stress other system resources like memory and CPU? Reach out to me at @droid_singh and let me know.


Photo by Crystal Kwok on Unsplash