Until Java 8, Google Guava was the main library that let us process collections easily. The arrival of Java 8 brought a serious alternative to it: streams.
As you can imagine, this article describes this new way of dealing with collections. At the beginning we'll describe some basic concepts hidden behind streams. In the second part we'll describe the main features of streams. At the end we'll show how they can be used to work with collections.
What are Streams?
Streams can be thought of as wrappers around collections whose main goal is to process data in a functional way. They can also be seen as a Java counterpart of SQL: like "SELECT...FROM...WHERE", streams let us find specific elements in a collection, and like the "GROUP BY" and "LIMIT" clauses, they help to aggregate and restrict the data. These operations are executed through a stream pipeline, which consists of 3 families of operations:
- source operation: initializes a stream from a collection, an I/O operation, an array or a generator function.
- intermediate operations: they are optional in a stream pipeline. When defined, their main purpose is to transform the data: filtering, mapping, limiting, ordering. Each of them returns a new Stream object.
- terminal operation: the last operation made on a stream. It can return the data as changed by the intermediate operations or process the remaining items one last time. Here we find aggregation operations (existence checks, min, max, sum...) and loop processing (for-each).
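The three families of operations can be sketched on a plain list of strings. This is only a minimal illustration: the class name PipelineSketch, the method namesStartingWith and the sample names are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineSketch {

    // Returns the upper-cased names starting with the given prefix, sorted alphabetically.
    static List<String> namesStartingWith(List<String> names, String prefix) {
        return names.stream()                          // source: a collection
            .filter(name -> name.startsWith(prefix))   // intermediate: filtering
            .map(String::toUpperCase)                  // intermediate: mapping
            .sorted()                                  // intermediate: ordering
            .collect(Collectors.toList());             // terminal: builds the result
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Platini", "Keane", "Peruzzi", "Giggs");
        System.out.println(namesStartingWith(names, "P")); // prints [PERUZZI, PLATINI]
    }
}
```

Nothing is computed until collect() is called; the preceding calls only describe the pipeline.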
To sum up this execution flow, a stream consists of defining input data, processing it and generating an output. The main features of streams are:
- no storage: streams don't store elements themselves; they only convey references to the objects held by the source.
- laziness: intermediate operations are executed only when a terminal operation is called, and in some cases a stream won't iterate through all the references to generate the result. For example, when we want only the first 5 elements, the stream can deal exclusively with those 5 items.
- consumable: once consumed, a stream can't be reused.
- parallelism: streams can be sequential or parallel.
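The features listed above can be observed with a few lines of code. The sketch below is an illustration only: the class StreamTraitsSketch and its methods are hypothetical names, and the filter deliberately records a side effect just to make the laziness visible, which is not something to do in production code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamTraitsSketch {

    // Laziness: records which elements the filter really inspected
    // when the sequential pipeline is cut by limit().
    static List<Integer> inspectedBeforeLimit(List<Integer> numbers, int limit) {
        List<Integer> inspected = new ArrayList<>();
        numbers.stream()
            .filter(number -> { inspected.add(number); return true; })
            .limit(limit)
            .collect(Collectors.toList());
        return inspected;
    }

    // Consumable: a second terminal operation on the same stream is rejected.
    static boolean failsOnSecondUse() {
        Stream<String> stream = Stream.of("a", "b");
        stream.count();
        try {
            stream.count();
            return false;
        } catch (IllegalStateException alreadyConsumed) {
            return true;
        }
    }

    // Parallelism: the same collection can produce both kinds of streams.
    static boolean isParallel(List<Integer> numbers, boolean parallel) {
        return (parallel ? numbers.parallelStream() : numbers.stream()).isParallel();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(inspectedBeforeLimit(numbers, 2).size() < numbers.size()); // prints true
        System.out.println(failsOnSecondUse()); // prints true
        System.out.println(isParallel(numbers, true)); // prints true
    }
}
```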
Streams features
From the code side, streams are implementations of a typed interface called java.util.stream.Stream<T>. One way to construct them is the static factory method of(T... values). Another is to call the stream() or parallelStream() method of the Collection interface.
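The construction methods mentioned here can be compared side by side. This is a minimal sketch in which the class StreamCreationSketch and its method names are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamCreationSketch {

    // Static factory: the elements are passed directly as varargs.
    static List<String> fromValues() {
        return Stream.of("Keane", "Giggs", "Blanc").collect(Collectors.toList());
    }

    // Sequential stream backed by an existing collection.
    static long countSequentially(Collection<String> names) {
        return names.stream().filter(name -> name.length() == 5).count();
    }

    // Parallel stream backed by the same collection: the result is identical,
    // only the way the elements are traversed may differ.
    static long countInParallel(Collection<String> names) {
        return names.parallelStream().filter(name -> name.length() == 5).count();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Keane", "Giggs", "Platini");
        System.out.println(fromValues());             // prints [Keane, Giggs, Blanc]
        System.out.println(countSequentially(names)); // prints 2
        System.out.println(countInParallel(names));   // prints 2
    }
}
```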
In streams we can find several concepts already implemented in Google Guava:
- predicate matching: we can check whether all (allMatch), any (anyMatch) or none (noneMatch) of the elements in the stream match a given predicate.
- limiting and skipping: we can limit the number of returned elements (limit) as well as skip some elements of the returned stream (skip).
- collecting: one of the terminal operation types. It collects stream elements and assembles them into a single container. There are collectors for commonly used collections, such as a list (Collectors.toList()) or a map (Collectors.groupingBy).
- ordering: thanks to the sorted method, we can easily sort the items of a stream.
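Before the complete test class, the four concepts above can be condensed into a short sketch written with lambdas. The class StreamOperationsSketch, its methods and the NAMES list are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamOperationsSketch {

    static final List<String> NAMES = Arrays.asList("Keane", "Giggs", "Blanc", "Platini", "Peruzzi");

    // Predicate matching: true only when every name is longer than the given length.
    static boolean allLongerThan(int length) {
        return NAMES.stream().allMatch(name -> name.length() > length);
    }

    // Predicate matching: true as soon as one name starts with the prefix.
    static boolean anyStartsWith(String prefix) {
        return NAMES.stream().anyMatch(name -> name.startsWith(prefix));
    }

    // Predicate matching: true when no name starts with the prefix.
    static boolean noneStartsWith(String prefix) {
        return NAMES.stream().noneMatch(name -> name.startsWith(prefix));
    }

    // Ordering, skipping and limiting: a zero-based page of the sorted names.
    static List<String> page(int pageNumber, int pageSize) {
        return NAMES.stream()
            .sorted()
            .skip((long) pageNumber * pageSize)
            .limit(pageSize)
            .collect(Collectors.toList());
    }

    // Collecting: names grouped in a map keyed by their length.
    static Map<Integer, List<String>> groupedByLength() {
        return NAMES.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        System.out.println(allLongerThan(4));         // prints true
        System.out.println(page(1, 2));               // prints [Keane, Peruzzi]
        System.out.println(groupedByLength().get(7)); // prints [Platini, Peruzzi]
    }
}
```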
Thanks to primitive specializations, streams can also be used with primitive types. There are three of them: IntStream for int values, LongStream for long values and DoubleStream for double values.
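A short sketch of the primitive specializations, focused on IntStream. The class PrimitiveStreamsSketch and its methods are names invented for this example; the fallback values passed to orElse are an arbitrary choice for empty inputs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class PrimitiveStreamsSketch {

    // IntStream offers numeric terminal operations such as sum() without boxing.
    static int sumUpTo(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    // mapToInt() turns a Stream<String> into an IntStream of lengths.
    static int longestLength(List<String> words) {
        return words.stream().mapToInt(String::length).max().orElse(0);
    }

    // average() returns an OptionalDouble because the stream may be empty.
    static double averageLength(List<String> words) {
        return words.stream().mapToInt(String::length).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Keane", "Platini");
        System.out.println(sumUpTo(5));           // prints 15
        System.out.println(longestLength(words)); // prints 7
        System.out.println(averageLength(words)); // prints 6.0
    }
}
```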
Streams can be closed manually by calling the close() method inherited from Stream's superinterface, BaseStream. BaseStream also extends java.lang.AutoCloseable, so a stream is closed automatically when it is declared in a try-with-resources statement.
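Closing can be observed by registering a handler with onClose(). In practice, streams built from in-memory collections usually don't need to be closed; it matters mostly for streams backed by I/O resources. The sketch below is only an illustration, and the class StreamClosingSketch and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class StreamClosingSketch {

    // Returns the events in the order they happened: the terminal operation
    // runs first, then try-with-resources closes the stream and triggers the handler.
    static List<String> closeEvents() {
        List<String> events = new ArrayList<>();
        try (Stream<String> stream = Stream.of("a", "b").onClose(() -> events.add("closed"))) {
            events.add("count=" + stream.count());
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(closeEvents()); // prints [count=2, closed]
    }
}
```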
Streams examples
The test cases below show several features of streams. You can find there examples of filtering, predicate matching and aggregation:
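// assertThat is assumed to come from AssertJ
import static org.assertj.core.api.Assertions.assertThat;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.junit.Before;
import org.junit.Test;

import com.google.common.collect.ComparisonChain;

// Player is assumed to be a simple bean with firstName, lastName and team properties
public class StreamsTest {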
  private static final String MAN_U = "Manchester United";
  private static final String JUVE = "Juventus";
  private List<Player> players = new ArrayList<>();
  private List<Player> manURemaining = new ArrayList<>();
  @Before
  public void initData() {
    // Manchester United players
    players.add(new Player("Roy", "Keane", MAN_U));
    players.add(new Player("Ryan", "Giggs", MAN_U));
    players.add(new Player("Laurent", "Blanc", MAN_U));
    // Manchester United remaining players
    manURemaining.add(new Player("Peter", "Schmeichel", MAN_U));
    manURemaining.add(new Player("Teddy", "Sheringham", MAN_U));
    manURemaining.add(new Player("Dwight", "Yorke", MAN_U));
    // Juventus FC players
    players.add(new Player("Michel", "Platini", JUVE));
    players.add(new Player("Alessandro", "Del Piero", JUVE));
    players.add(new Player("Angelo", "Peruzzi", JUVE));
  }
  @Test
  public void find_juve_players() {
      List<Player> juvePlayers = players.stream()
          .filter(player -> player.getTeam().equals(JUVE))
          .collect(Collectors.toList());
      assertThat(juvePlayers).extracting("team").containsOnly(JUVE);
  }
  @Test
  public void check_if_only_juve_players() {
    boolean onlyJuve = players.stream()
        .allMatch(new Predicate<Player>() {
            @Override
            public boolean test(Player player) {
                return JUVE.equals(player.getTeam());
            }
        });
    assertThat(onlyJuve).isFalse();
  }
  @Test
  public void check_if_any_man_u_or_juve_players() {
    boolean juveOrManU = players.stream()
        .anyMatch(new Predicate<Player>() {
            @Override
            public boolean test(Player player) {
                return JUVE.equals(player.getTeam()) || MAN_U.equals(player.getTeam());
            }
        });
    assertThat(juveOrManU).isTrue();
  }
  @Test
  public void check_if_no_milan_players() {
    boolean noMilanPlayers = players.stream()
        .noneMatch(new Predicate<Player>() {
            @Override
            public boolean test(Player player) {
                return "AC Milan".equals(player.getTeam()) || "Inter Milan".equals(player.getTeam());
            }
        });
    assertThat(noMilanPlayers).isTrue();
  } 
  @Test
  public void convert_to_only_man_u_players() {
    Iterator<Player> manuRemainingIterator = manURemaining.iterator();
    List<Player> manUPlayers = players.stream()
          // the mapper relies on an external iterator, so it is only safe on a sequential stream
          .map(player -> player.getTeam().equals(JUVE) ? manuRemainingIterator.next() : player)
          .collect(Collectors.toList());
    assertThat(manUPlayers).extracting("team").containsOnly(MAN_U);
    assertThat(manuRemainingIterator.hasNext()).isFalse();
  }
  @Test
  public void convert_to_map_with_players_grouped_by_team() {
    Map<String, List<Player>> playerByTeam = players.stream()
            .collect(Collectors.groupingBy(player -> player.getTeam()));
    assertThat(playerByTeam).hasSize(2);
    assertThat(playerByTeam).containsKeys(JUVE, MAN_U);
    assertThat(playerByTeam.get(JUVE)).hasSize(3);
    assertThat(playerByTeam.get(MAN_U)).hasSize(3);
  }
  @Test
  public void convert_to_ordered_list() {
    List<Player> orderedPlayers = players.stream()
            .sorted(new PlayerComparator())
            .collect(Collectors.toList());
    assertThat(orderedPlayers.get(0).getLastName()).isEqualTo("Blanc");
    assertThat(orderedPlayers.get(1).getLastName()).isEqualTo("Del Piero");
    assertThat(orderedPlayers.get(2).getLastName()).isEqualTo("Giggs");
    assertThat(orderedPlayers.get(3).getLastName()).isEqualTo("Keane");
    assertThat(orderedPlayers.get(4).getLastName()).isEqualTo("Peruzzi");
    assertThat(orderedPlayers.get(5).getLastName()).isEqualTo("Platini");
  }
  @Test
  public void pagination_with_limit_and_skip_functions() {
    // Beware of order of skip() and limit() functions - see next test
    List<Player> orderedPlayers = players.stream()
        .sorted(new PlayerComparator())
        .skip(3)
        .limit(3)
        .collect(Collectors.toList());
    assertThat(orderedPlayers).hasSize(3);
    assertThat(orderedPlayers.get(0).getLastName()).isEqualTo("Keane");
    assertThat(orderedPlayers.get(1).getLastName()).isEqualTo("Peruzzi");
    assertThat(orderedPlayers.get(2).getLastName()).isEqualTo("Platini");
  }
  @Test
  public void failing_pagination_with_inversed_limit_and_skip_calls() {
    // first, we limit players list to only 3-elements sublist, after we skip these 3 elements - at the end we receive an empty list
    List<Player> orderedPlayers = players.stream()
        .sorted(new PlayerComparator())
        .limit(3)
        .skip(3)
        .collect(Collectors.toList());
    assertThat(orderedPlayers).isEmpty();
  }
  @Test
  public void construct_team_with_remaining_players() {
    List<Player> allPlayers = Stream.concat(players.stream(), manURemaining.stream())
        .collect(Collectors.toList());
    assertThat(allPlayers).hasSize(9)
      .extracting("lastName").contains("Blanc", "Del Piero", "Giggs", "Keane", "Peruzzi", "Platini", "Schmeichel", "Sheringham", "Yorke");
  }
  @Test
  public void init_stream_with_builder() {
    List<Player> builtPlayers = Stream.<Player>builder().add(new Player("Ole Gunnar", "Solskjaer", MAN_U))
      .add(new Player("Andy", "Cole", MAN_U))
      .build().collect(Collectors.toList());
    assertThat(builtPlayers).hasSize(2)
      .extracting("lastName").containsOnly("Solskjaer", "Cole");
  }
  @Test
  public void get_distinct_players_by_teams() {
    // distinct() is based on equals() method invocation
    players.add(players.get(0));
    players.add(players.get(1));
    List<Player> distinctPlayers = players.stream()
      .distinct()
      .collect(Collectors.toList());
    assertThat(distinctPlayers).hasSize(6);
  }
  @Test
  public void transfer_all_players_to_man_u() {
    players.stream()
      .forEach(new Consumer<Player>() {
          @Override
          public void accept(Player player) {
              player.setTeam(MAN_U);
          }
      });
    assertThat(players).hasSize(6)
            .extracting("team").containsOnly(MAN_U);
  }
  @Test
  public void reduce_to_get_last_player() {
    Player lastPlayer = players.stream()
      .reduce(new BinaryOperator<Player>() {
          @Override
          public Player apply(Player previousPlayer, Player nextPlayer) {
              return nextPlayer;
          }
      }).get();
    assertThat(lastPlayer.getLastName()).isEqualTo("Peruzzi");
  }
  @Test
  public void reduce_to_compose_multi_name_player() {
    Player multiPlayer = players.stream()
      .reduce(new BinaryOperator<Player>() {
          @Override
          public Player apply(Player previousPlayer, Player nextPlayer) {
              return new Player("", previousPlayer.getLastName() + " " +nextPlayer.getLastName(), "");
          }
      }).get();
    assertThat(multiPlayer.getLastName()).isEqualTo("Keane Giggs Blanc Platini Del Piero Peruzzi");
  }
  private static class PlayerComparator implements Comparator<Player> {
    @Override
    public int compare(Player player1, Player player2) {
      return ComparisonChain.start()
        .compare(player1.getLastName(), player2.getLastName())
        .compare(player1.getFirstName(), player2.getFirstName())
        .compare(player1.getTeam(), player2.getTeam())
        .result(); 
    }
  }
}
This article introduced an alternative to the usual way of dealing with collection data. Thanks to streams we can not only reduce the amount of written code but also improve testability and reusability. We saw that a stream pipeline consists of defining some input data and ending with a terminal operation. In between we can add intermediate operations to, for example, remove unwanted items or change their properties.