Scala coding standards

Versions: Scala 2.12.1

Every programming language has its own conventions. Scala is no exception and comes with its own coding style.

This post doesn't list all possible standards. Instead, it discusses only the ones I've met most often when working with engineers switching to Scala from Java. The post is divided into 8 short sections. Each of them covers one coding standard and its applicability.

Braces

The first overused coding practice concerns curly braces. Scala lets us skip them in a lot of places, and some people use that fact to write complicated if-else statements like the one in the following snippet:

if (...)
  // some code here
  // and here
  if (...)
    // some specific code here
  else
    // and here
else 
  // ... 

It gets even worse when such a construction is longer or is placed inside an anonymous block passed to map, foreach, flatMap and so forth. The most common reason I've heard for this style is that it is more "lightweight" to write. Even though it seems readable at a given moment, most of the time it becomes unreadable some commits later. The code is error-prone and it's pretty easy to put a statement in the wrong branch. Thus, a lot of Scala style guides recommend skipping braces only when the whole expression fits on a single line, as below:

def createPrimaryKey(suffix: String, value: String) = s"${suffix}_${value}"
val isRegistered = if (user.account.isDefined && user.id != "") true else false
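As a minimal sketch of that rule (the `RegistrationStatus` object and its method names are hypothetical, introduced only for illustration), a single-line expression drops its braces while a nested, multi-line conditional keeps them:

```scala
object RegistrationStatus {
  // Single-line expression: braces can be skipped safely.
  def isAdult(age: Int): Boolean = age >= 18

  // Multi-line, nested conditional: braces make the branch boundaries explicit.
  def describe(age: Int, hasAccount: Boolean): String = {
    if (hasAccount) {
      if (isAdult(age)) {
        "registered adult"
      } else {
        "registered minor"
      }
    } else {
      "anonymous"
    }
  }
}
```

With the braces in place, adding a statement to one of the branches later cannot silently end up outside of it.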

The above rule is not really debatable. That's not the case for the next one, which concerns curly braces in anonymous functions. Sometimes we meet a guideline telling us to avoid excessive parentheses and curly braces for anonymous functions. It discourages the following style:

list.map(item => {
  ...
})

Instead, it advises this form:

list.map { item =>
  ...
}

This advice is open to discussion. Sometimes, for instance when the mapping function contains an if-else statement, the discouraged form seems more readable:

val stringifiedNumbers = (0 to 3).map(nr => {
  if (nr % 2 == 0) {

  } else {

  }
})

val stringifiedNumbers2 = (0 to 3).map { nr =>
  if (nr % 2 == 0) {

  } else {

  }
}

Parentheses

Another style point, similar to braces, is about parentheses. The Scala convention is to declare and call parameterless methods with parentheses only when they have side effects. Examples of such side-effecting methods are mutators and methods wrapping I/O operations.

Parentheses should be skipped for read-only methods without parameters, such as:

class Person {
  // ... 
  def age = currentYear - bornYear
}

On this occasion we can introduce another important rule: methods exposing a value (aka getters) should be written without the get* prefix so often used in Java. Once again, the compiler doesn't forbid it, but it's a pretty good rule to follow. It exposes values to the users of the class as simple properties, which is much easier to read. Just keep in mind that if such a method involves an intensive computation, it's better to turn it into a lazily evaluated value or, if it requires input parameters, into singleton values.
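The sketch below puts these conventions side by side. The constructor parameters, the `greet()` mutator and the `ageInDays` value are assumptions made up for the example, and the multiplication only stands in for a genuinely expensive computation:

```scala
class Person(val bornYear: Int, currentYear: Int) {
  private var greetingsCount = 0

  // Read-only accessor: no parentheses and no get* prefix.
  def age: Int = currentYear - bornYear

  // Side-effecting method (it mutates internal state): declared and called with parentheses.
  def greet(): String = {
    greetingsCount += 1
    s"Hello number $greetingsCount"
  }

  // Stand-in for an expensive read-only value: evaluated once, on first access.
  lazy val ageInDays: Long = age.toLong * 365
}
```

A caller then writes `person.age` as if it were a field and `person.greet()` with parentheses, so the call site itself signals which call changes state.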

Lambda party

This is one of my favorite anti-patterns. A lot of software engineers coming from Java 6 and older versions were used to writing all mapping and filtering logic inside loops. So when anonymous expressions came to Java, they started to use them at every step. That would be fine if they respected some rules of good sense. The same holds for Scala, where anonymous expressions are also overused and introduce a lot of avoidable noise:

object UserDataApi {
  def getUserData(id: String): String = ""
}
val inputPairs = Seq((1, "a"), (2, "b"), (1, "c"), (1, "d"), (3, "e"), (2, "f"))
inputPairs.groupBy(_._1)
  .map(a => {
    (a._1, a._2.map(x => (x._2, UserDataApi.getUserData(x._2))))
  })

As with the braces skipped in nested if-else statements, the code may look fine at the given moment. But most of the time it becomes unreadable after some weeks of work. Very often it's easy to improve by introducing intermediate variables and calling named functions instead of anonymous blocks:

case class UserData(value: String, data: String)
def enrichUserValues(userValues: Seq[(Int, String)]): Seq[UserData] = {
  userValues
    .map{ case (_, user) => user }
    .map(user => UserData(user, UserDataApi.getUserData(user)))
}
val pairsGroupedById = inputPairs.groupBy(idWithValue => idWithValue._1)
val userDataById = pairsGroupedById.map {
  case (id, values) => (id, enrichUserValues(values))
}

Return statement

At first it's difficult to drop the habit of using the return statement. It's especially true for code using shortcuts, as for instance:

if (...) {
  return "x"
}
// a lot of lines later
"y"

In such a case we can simply add an else branch and remove the return statement. That change is purely stylistic. However, there are also more technical reasons to avoid return in Scala. One of them concerns the eager termination of the current computation. Let's take an example to illustrate that:

describe("return statement") {
  it("should terminate the execution eagerly") {
    def sumWithReturn(numbers: Seq[Int]): Int = {
      numbers.reduce((nr1, nr2) => return nr1 + nr2)
    }
    def sumWithoutReturn(numbers: Seq[Int]): Int = {
      numbers.reduce((nr1, nr2) => nr1 + nr2)
    }

    val resultWithReturn = sumWithReturn(1 to 3)
    resultWithReturn shouldNot equal(6)
    val resultWithoutReturn = sumWithoutReturn(1 to 3)
    resultWithoutReturn shouldEqual 6
  }
}

As you can see, both methods are almost the same: the only difference is the return expression in one of them. And as shown in the test assertions, that method terminates eagerly. A return inside an anonymous function does not end only that function. It ends the enclosing method, here sumWithReturn, and hands the value to its caller. The reduction therefore stops after the first step and the method returns 1 + 2 = 3 instead of 6. It's always possible to rewrite the code without a return statement. The compiler doesn't prohibit its use, but it's legitimate advice to get rid of it because of execution problems like the ones shown in the snippet.
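To illustrate such a rewrite, here is a small sketch with hypothetical names (`EarlyExitSketch`, `firstNegativeWithReturn`, `firstNegative`). The first method relies on a return inside a lambda to leave the enclosing method early, the second one gets the same result from a standard collection method:

```scala
object EarlyExitSketch {
  // `return` inside the foreach lambda exits the whole method, not just the lambda.
  def firstNegativeWithReturn(numbers: Seq[Int]): Option[Int] = {
    numbers.foreach { nr =>
      if (nr < 0) return Some(nr)
    }
    None
  }

  // Same behavior without `return`: find stops at the first matching element.
  def firstNegative(numbers: Seq[Int]): Option[Int] = numbers.find(_ < 0)
}
```

Both versions stop at the first negative number, but the second one expresses the early exit through the collection API instead of through control flow hidden in a lambda.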

Constants

Constants are another point where developers starting with Scala bring rules over from other languages. Very often constants are written in upper snake case, as here:

val TIME_TO_LIVE_HOURS = 48

But the official documentation says otherwise. According to it, constants should be written in upper camel case:

val TimeToLiveHours = 48

Does it mean that all immutable fields of classes should be written this way? Not really, because they're not constants. The rule applies only to constants, that is, to values defined in companion objects (or in objects generally) or in package objects:

class SomeRow(val json: String)
object SomeRow {
  val TimeToLiveHours = 48
}
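One practical consequence of the capitalized name shows up in pattern matching, where an identifier starting with an upper-case letter is compared against the existing value, while a lower-case one is treated as a fresh variable that matches anything. The sketch below is only an illustration; the `TtlSketch` object and both method names are hypothetical:

```scala
object TtlSketch {
  val TimeToLiveHours = 48
  val timeToLiveHours = 48

  // Upper-case identifier: the pattern compares `hours` with the constant.
  def isDefaultTtl(hours: Int): Boolean = hours match {
    case TimeToLiveHours => true
    case _ => false
  }

  // Lower-case identifier: the pattern binds a NEW variable and matches any value.
  def alwaysMatches(hours: Int): Boolean = hours match {
    case timeToLiveHours => true
  }
}
```

Here `isDefaultTtl(12)` is false, whereas `alwaysMatches(12)` is true because the lower-case name shadows the field instead of referring to it.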

Spaces

Spacing is generally not a difficult rule to follow, but sometimes we find pretty illogical styles such as:

def sum(nr1:Int, nr2:Int):Int = nr1 + nr2
def sum(nr1 : Int, nr2 : Int) : Int = nr1 + nr2 

The rule to follow is to put a single space after the colon and none before it. It clearly separates the name of the variable from its type. The two styles shown in the above snippet are more confusing than that.

Generally the code should be indented with 2 spaces. An exception is a method whose parameters don't fit on a single line. Of course, we should avoid such situations as much as possible, but that's not the point here. Let's see a pretty clear guideline for defining such methods:

def doSomethingWithALotOfParameters(param1: Int,
    param2: Int, param3: Int, param4: Int): String = {
  (param1 + param2).toString
}

As you can see, the parameters continued on a new line are indented with 4 spaces, while the function's body is indented with 2 spaces. Visually it's easier to read than a version where every line is indented with the same number of spaces.

Explicit collections

The next best practice comes from the fact that a lot of languages don't distinguish between mutable and immutable collections. Scala does, and that calls for some extra care when using them. First of all, the advised collections are the immutable ones. They help to keep referential transparency and make multi-threaded code easier to write.

Because immutable collections are preferred, we should make the kind of collection explicit, either always or at least whenever a mutable one is used. Importing the scala.collection.mutable package and prefixing mutable collections with it should be preferred over importing the mutable collection classes directly:

import scala.collection.mutable

val mutableLetters = mutable.Seq[String]()
val immutableLetters = Seq[String]()
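A slightly fuller sketch of the same idea follows. The `LettersSketch` object and the `addLetter` helper are hypothetical, and `mutable.ArrayBuffer` is used here only because it supports appending in place:

```scala
import scala.collection.mutable

object LettersSketch {
  // Unqualified Seq(...) builds an immutable List by default.
  val immutableLetters = Seq("a", "b")

  // The mutable. prefix makes the possibility of mutation visible at a glance.
  val mutableLetters = mutable.ArrayBuffer("a", "b")

  // Appends in place; only the prefixed collection can change.
  def addLetter(letter: String): Unit = {
    mutableLetters += letter
    ()
  }
}
```

Every place where a collection can change is then marked by the `mutable.` prefix, which makes such code easy to spot in a review.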

Implicits

When we start working with Scala, we often don't see the usefulness of implicits. But as time passes, we start to appreciate them for their ability to extend existing types (Pimp My Library pattern) or to enhance type safety with constraint evidence. However, at this stage we also tend to overuse them and put them everywhere possible.

At this stage we add an extra parameter list with some implicit declarations to every method, as here:

def doSomething(param1: Int, param2: Int)(implicit someObject: Int) ... 

Here too the rule of "it's fine now but not tomorrow" applies. Even though code using implicits is pretty clear at a given moment, it becomes less obvious with each new modification. It's even truer if such a modification brings new implicits to the code: if it carelessly defines an implicit for a type that already has one, the result may be a mess that is difficult to control.

So here too I pretty much agree with Twitter's guidelines (link in Read also section), which advise using implicits only for: adapting the behavior of dependencies (Pimp My Library), enhancing type safety, typeclassing and writing Manifests. It's worth stressing the advice they give: "If you do find yourself using implicits, always ask yourself if there is a way to achieve the same thing without their help."
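As an illustration of the first accepted use, the sketch below extends String with an extra method through an implicit class. All the names (`StringExtensions`, `PrimaryKeyOps`, `withSuffix`, `ImplicitsSketch`) are hypothetical and serve only the example:

```scala
object StringExtensions {
  // "Pimp My Library": adds a method to String without modifying the class itself.
  implicit class PrimaryKeyOps(value: String) {
    def withSuffix(suffix: String): String = s"${value}_${suffix}"
  }
}

object ImplicitsSketch {
  // The extension is only active where it is explicitly imported.
  import StringExtensions._

  def primaryKey(id: String): String = id.withSuffix("pk")
}
```

Keeping the implicit class in a dedicated object and importing it only where needed limits its scope, which goes in the direction of the quoted advice.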

Coding style may differ completely from one company to another. But very often some common rules of good sense exist. This post explored some of them by explaining the good and bad points of braces, parentheses, anonymous functions, return statements, constants, spaces, collections, and implicits. Please note, however, that all of this is a subjective take on the advised patterns and that they should always be judged pragmatically.