Troubleshooting 'System memory must be at least' error

on waitingforcode.com


When unit tests pass on "your machine" but fail on your colleague's, you know something is wrong. And when the failures come not from test assertions but from technical reasons, that "something wrong" turns into "something strange". It can happen with Apache Spark as well.

This post covers 2 different configuration properties that can help you solve problems with unit tests in Apache Spark very quickly. That being said, you should always investigate the real reason for these problems later. In the first part, I will present the error that can happen during unit tests execution. In the next parts, I will present 2 different properties that can help to fix it.

Not enough memory...but for unit tests?!

java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.

That's the error you can encounter when you run unit tests with insufficient memory. A quick analysis of the code shows that the error can come from org.apache.spark.memory.UnifiedMemoryManager or org.apache.spark.memory.StaticMemoryManager. The used manager depends on the spark.memory.useLegacyMode property, and UnifiedMemoryManager is the default. In this post I will focus only on it since the other one is considered legacy.

When Spark creates an instance of the UnifiedMemoryManager, it takes the amount of available memory that way:

    private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
    val minSystemMemory = (reservedMemory * 1.5).ceil.toLong

In that snippet you can see the variable holding the minimal amount of memory reserved by Spark for its internal objects, 300 MB by default. The reserved memory drops to 0 when Spark runs unit tests (the spark.testing property is set) and spark.testing.reservedMemory is not defined manually.
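The arithmetic behind the check can be sketched in plain Scala. The 259522560 value below is simply the heap size from the quoted error message, not something Spark defines:

```scala
// Sketch of Spark's minimum-memory check with its default values
object MemoryCheckSketch {
  // Same constant as RESERVED_SYSTEM_MEMORY_BYTES in UnifiedMemoryManager
  val reservedSystemMemoryBytes: Long = 300L * 1024 * 1024 // 314572800 bytes

  // Minimum heap required when spark.testing is not set
  val minSystemMemory: Long = (reservedSystemMemoryBytes * 1.5).ceil.toLong

  def main(args: Array[String]): Unit = {
    val systemMemory = 259522560L // heap size from the quoted error message
    println(s"minimum required: $minSystemMemory")               // 471859200
    println(s"check passes: ${systemMemory >= minSystemMemory}") // false
  }
}
```

Running it reproduces the two numbers from the error: a heap of 259522560 bytes against a required minimum of 471859200 bytes (300 MB * 1.5).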

The error I quoted before occurs just after the code from the snippet:

    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please increase heap size using the --driver-memory " +
        s"option or spark.driver.memory in Spark configuration.")
    }

So, the error occurs when the JVM running the tests can use less than 471859200 bytes (450 MiB). Spark keeps the 300 MB of reserved memory aside and leaves the rest for its storage and execution purposes.

Setting spark.testing.memory can help

If you set the spark.testing.memory property to more than the minimum, the if condition from the previous part will simply pass and later Spark will use that property to figure out the memory it can allocate to its execution and storage:

    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
    (usableMemory * memoryFraction).toLong

Setting this value in unit tests is fine. The volume of processed data is much smaller than in production workloads, so you will probably never reach these memory limits and provoke an OOM.
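As an illustration, the computation above can be run with concrete numbers. Assuming spark.testing.memory is set to 512 MB (an arbitrary choice above the minimum, not a value mandated by Spark) and the defaults are kept:

```scala
// Sketch: deriving Spark's execution+storage memory from spark.testing.memory
// (512 MB here is an arbitrary choice above the 471859200-byte minimum)
val systemMemory = 512L * 1024 * 1024   // spark.testing.memory = 536870912
val reservedMemory = 300L * 1024 * 1024 // default reserved memory
val memoryFraction = 0.6                // default spark.memory.fraction
val usableMemory = systemMemory - reservedMemory
val sparkMemory = (usableMemory * memoryFraction).toLong

println(usableMemory) // 222298112
println(sparkMemory)  // 133378867 bytes shared by execution and storage
```

So even with the property set just above the minimum, roughly 127 MiB remains for Spark's storage and execution regions, which is plenty for unit-test-sized datasets.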

spark.testing can help as well

Another way to fix the problem is the spark.testing property. When it is enabled, Apache Spark will simply skip the check (the required minimum becomes 0) and, unless your JVM returns a negative amount of available memory for whatever reason, the tests should never fail.

If you opt for this approach, keep in mind that the spark.testing property applies to other things as well. It changes the number of retries for port bindings (100 instead of 16), disables the background thread for application history, and enforces the constraint of having an execution id for the query. These changes probably won't impact you very much.
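Since the flag is read from the configuration (conf.contains("spark.testing")) and, in helper code such as Utils.isTesting, from JVM system properties, a defensive test setup can set it in both places before any SparkContext is created. This is a sketch, not an official recipe:

```scala
// Sketch: making the spark.testing flag visible to Spark in a test suite.
// Spark reads it from SparkConf, and helper code also looks at the
// spark.testing JVM system property, so setting it early is the safer option.
System.setProperty("spark.testing", "true")
println(sys.props("spark.testing")) // true
```

If you create the session through SparkSession.builder(), adding .config("spark.testing", "true") covers the SparkConf side as well.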

In this post I presented 2 possible solutions: if for whatever reason you encounter the error from the first section, you can play with one of the presented properties. But remember to check the real reason for that memory error. Maybe it hides something bigger?
