Java 8 streams API

Querying a collection has always felt clumsy to me: writing iterative loops, adding nested if conditions and, above all, the hack of maintaining an intermediary collection to store results. On top of that come the brutal indicators from SonarQube, such as cyclomatic complexity. Through all this, I felt there was a common pattern to filtering and sorting, and that there had to be a more expressive way to achieve it (someone said, more like SQL!). With my love for multiple technologies (OK, I accept I am a polyglot coder), I knew .NET already had an answer to this: LINQ. Simply take a collection and write a query to extract data, something like the example below.

// Specify the data source.
int[] scores = new int[] { 97, 92, 81, 60 };

// Define the query expression.
IEnumerable<int> scoreQuery =
	from score in scores
	where score > 80
	select score;

// Execute the query.
foreach (int i in scoreQuery)
{
	Console.Write(i + " ");
}

Isn’t it more readable than writing for loops and if conditions? If this feature has been present in .NET for so long, why not in Java? In the past, I have used a few Java libraries that do the same thing, but something out of the box from Java would be nice. Fortunately, Java 8 came to the rescue, and the Collections framework got a nice addition: streams. Java already had streams for I/O operations, but those work on bytes read from a network, file or buffer. The streams introduced in Java 8 are meant to ease the processing of collections.

Before we delve into the concept, let me show you the magic of streams. Suppose you have a list of student names, and the list should be filtered to keep only the students whose names contain the letter ‘a’.

// Needs: java.util.ArrayList, java.util.Arrays, java.util.List,
//        java.util.stream.Collectors
List<String> students = Arrays.asList("John", "Merry",
       "Nicole", "Roy", "Griffin",
       "Harry", "Sandra", "Patrick");

// Java 7 and before: loop, test, and add to a result list by hand
List<String> filteredStudents = new ArrayList<>();
for (String student : students) {
  if (student.contains("a")) {
    filteredStudents.add(student);
  }
}

// Java 8 - streams: the same filter as one pipeline
List<String> nameWithA = students.stream()
        .filter(s -> s.contains("a"))
        .collect(Collectors.toList());
nameWithA.forEach(System.out::println);

Did you notice the difference? The multi-line nested block collapses into a single statement, without a hand-maintained intermediary collection to build up the results. What if the problem is made more complex by having to skip a few records?

List<String> students = Arrays.asList("John", "Merry",
           "Nicole", "Roy", "Griffin", "Harry",
           "Sandra", "Patrick");

// Java 7 and before: a counter skips the first two matches
List<String> filteredStudents = new ArrayList<>();
int skipCounter = 0;
for (String student : students) {
  if (student.contains("a") && (++skipCounter > 2)) {
    filteredStudents.add(student);
  }
}

// Java 8 - streams: skip(2) drops the first two matches
List<String> nameWithA = students.stream()
      .filter(s -> s.contains("a"))
      .skip(2)
      .collect(Collectors.toList());

nameWithA.forEach(System.out::println);

It feels like you have got the power of Zeus. Well, enough magic; let’s come back to earth and understand a few concepts behind streams.

What are streams

Here is what the official JavaDoc has to say about streams:

Stream operations are divided into intermediate and terminal operations and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.

Based on this definition, streams can be characterised as follows:

  • Stream works on a source (Collection, array, generator or I/O channel).
  • Stream is a pipeline.
  • Stream operations are divided into intermediate and terminal.
  • Bonus: Streams are lazy!!!

Stream works on source

Streams can be created from multiple sources, including an array, a Collection, a generator or an I/O operation. Refer to the examples below.

// Collection stream
List<String> colors = Arrays.asList("Red", "Pink",
           "Blue", "Purple", "Green", "Orange",
           "Yellow", "Cyan");

Stream<String> colorStream = colors.stream();

// Array stream
int[] age = {34, 12, 23, 45, 67, 39, 16};
IntStream intStream = Arrays.stream(age);

// Generator
IntStream.rangeClosed(1, 10).map(i -> i * 2).forEach(System.out::println);

In fact, many Java 8 APIs can hand you a stream directly; String.chars(), BufferedReader.lines() and Files.lines(), for example, all return streams.
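
To make that a bit more concrete, here is a minimal sketch of a few other ways to obtain a stream. The class name StreamSources and its helper methods are made up for illustration; the calls themselves (Stream.of, String.chars, Stream.iterate and BufferedReader.lines) are standard Java 8.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, only for illustration.
public class StreamSources {

    // Stream.of builds a stream directly from the given values.
    static List<String> fromValues() {
        return Stream.of("Red", "Pink", "Blue").collect(Collectors.toList());
    }

    // String.chars() exposes the characters of a string as an IntStream.
    static long countLetter(String text, char letter) {
        return text.chars().filter(c -> c == letter).count();
    }

    // Stream.iterate is an infinite generator; limit() bounds it.
    static List<Integer> powersOfTwo(int howMany) {
        return Stream.iterate(1, n -> n * 2)
                .limit(howMany)
                .collect(Collectors.toList());
    }

    // BufferedReader.lines() streams the lines of an I/O source.
    // A StringReader stands in for a file or socket here.
    static List<String> linesOf(String text) {
        return new BufferedReader(new StringReader(text))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromValues());
        System.out.println(countLetter("Sandra", 'a'));
        System.out.println(powersOfTwo(5));
        System.out.println(linesOf("first\nsecond"));
    }
}
```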

Stream is a pipeline

Streams offer the flexibility to chain operations one after another. Nested loops and conditions can often be expressed within a single statement.

List<String> colors = Arrays.asList("Red", "Pink",
           "Blue", "Purple", "Green", "Orange",
           "Yellow", "Cyan");

long colorCount = colors.stream()
           .filter(s -> s.length() > 4)
           .filter(s -> s.contains("p"))
           .count();

The above code creates a stream using the List’s stream() method. The stream is then passed through two filters, which keep only the colors whose names are longer than 4 characters and contain the letter ‘p’. Note that contains() is case-sensitive, so only “Purple” matches and colorCount is 1.
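
As a rough sketch of how such a pipeline can grow, the class below (ColorPipeline is a made-up name) repeats the count above, shows that the two chained filters behave like a single combined condition, and then adds map() and sorted() steps to the same chain.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, only for illustration.
public class ColorPipeline {

    static final List<String> COLORS = Arrays.asList("Red", "Pink",
            "Blue", "Purple", "Green", "Orange", "Yellow", "Cyan");

    // Two chained filters: an element must pass both to be counted.
    static long countChained() {
        return COLORS.stream()
                .filter(s -> s.length() > 4)
                .filter(s -> s.contains("p"))
                .count();
    }

    // The same condition written as one predicate.
    static long countCombined() {
        return COLORS.stream()
                .filter(s -> s.length() > 4 && s.contains("p"))
                .count();
    }

    // A longer pipeline: filter, transform, sort, then collect.
    static List<String> longNamesUpperCase() {
        return COLORS.stream()
                .filter(s -> s.length() > 4)
                .map(String::toUpperCase)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countChained());       // 1
        System.out.println(countCombined());      // 1
        System.out.println(longNamesUpperCase()); // [GREEN, ORANGE, PURPLE, YELLOW]
    }
}
```

Whether to chain several small filters or write one combined predicate is mostly a readability choice; both forms select the same elements.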

Stream operations – Intermediate and Terminal

filter(), map(), skip(), limit() etc. are intermediate operations. An operation that returns a new stream is considered intermediate, and intermediate operations can be chained one after another.

count(), collect(), findFirst() etc. are terminal operations, as is sum() on the primitive streams such as IntStream. Terminal operations initiate the processing of the stream and terminate the pipeline. The easy way to identify a terminal operation: it doesn’t return a stream object.
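
The return types make the distinction easy to see. Here is a small sketch (OperationKinds and its method names are invented for this example): the intermediate steps hand back a Stream, while each terminal operation hands back something else, such as a List, an Optional or an int.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, only for illustration.
public class OperationKinds {

    static final List<String> COLORS = Arrays.asList("Red", "Pink",
            "Blue", "Purple", "Green", "Orange", "Yellow", "Cyan");

    // Intermediate operations only: the result is still a Stream.
    static Stream<String> shortNames() {
        return COLORS.stream()
                .filter(s -> s.length() <= 4)
                .skip(1)
                .limit(2);
    }

    // collect() is terminal: it returns a List, not a Stream.
    static List<String> shortNamesAsList() {
        return shortNames().collect(Collectors.toList());
    }

    // findFirst() is terminal: it returns an Optional.
    static Optional<String> firstLongName() {
        return COLORS.stream()
                .filter(s -> s.length() > 5)
                .findFirst();
    }

    // sum() lives on IntStream, so mapToInt() converts the stream first.
    static int totalLength() {
        return COLORS.stream()
                .mapToInt(String::length)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(shortNamesAsList());    // [Pink, Blue]
        System.out.println(firstLongName().get()); // Purple
        System.out.println(totalLength());         // 38
    }
}
```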

Streams are lazy!!!

Intermediate operations are lazy: chaining them one after another does not start execution. To start processing the stream, you must invoke one of the terminal operations. Remember that once you invoke a terminal operation, the stream cannot be used again.

// The message is never printed: there is no terminal operation
colors.stream()
      .filter(s -> { System.out.println("Lazy stream was called."); return true; });

// The message is printed once for every element
colors.stream()
      .filter(s -> { System.out.println("Lazy stream was called."); return true; })
      .count();

In the above example, since the first statement doesn’t end with a terminal operation, the message will never be printed. The second statement has a terminal operation, which ensures all the intermediate operations are executed and the stream is processed.
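
The same behaviour can be checked without reading console output. The sketch below (LazyStreams is a made-up class name) counts how often the filter predicate actually runs, and also demonstrates the "cannot be used again" rule: a second terminal operation on the same stream throws an IllegalStateException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical class name, only for illustration.
public class LazyStreams {

    static final List<String> COLORS = Arrays.asList("Red", "Pink", "Blue");

    // No terminal operation, so the predicate is never invoked.
    static int callsWithoutTerminal() {
        AtomicInteger calls = new AtomicInteger();
        COLORS.stream().filter(s -> { calls.incrementAndGet(); return true; });
        return calls.get();
    }

    // findFirst() stops at the first match, so on this sequential stream
    // the predicate only sees "Red" (rejected) and "Pink" (accepted).
    static int callsUntilFirstMatch() {
        AtomicInteger calls = new AtomicInteger();
        COLORS.stream()
                .filter(s -> { calls.incrementAndGet(); return s.length() > 3; })
                .findFirst();
        return calls.get();
    }

    // A stream is single-use: the second terminal operation is rejected.
    static boolean secondUseFails() {
        Stream<String> stream = COLORS.stream().filter(s -> s.length() > 3);
        stream.count(); // first terminal operation consumes the stream
        try {
            stream.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callsWithoutTerminal()); // 0
        System.out.println(callsUntilFirstMatch()); // 2
        System.out.println(secondUseFails());       // true
    }
}
```

If the same data has to be processed twice, the usual approach is to ask the source collection for a fresh stream each time.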

Wait, we forgot an important aspect!!!

All this may sound exciting, and you may be convinced to use streams in your code. But wait, hold your horses. What about performance? How do streams perform against their traditional counterparts, loops and threads? With parallelStream(), you might be under the impression that the native support for processing a Collection performs better than spinning up custom threads. While digging into this topic, I came across nice blogs from Codefx and IBM DeveloperWorks. Here is a snippet from Codefx:

All in all I’d say that there were no big revelations. We have seen that palpable differences between loops and streams exist only with the simplest operations. It was a bit surprising, though, that the gap is closing when we come into the millions of elements. So there is little need to fear a considerable slowdown when using streams.
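
For reference, switching a pipeline from sequential to parallel is a one-method change. The sketch below (ParallelSketch and its method names are made up) only shows that both variants compute the same result; it is not a benchmark and says nothing about which one is faster. Measuring that properly would need a harness such as JMH and realistic data.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class name, only for illustration.
public class ParallelSketch {

    // Sequential pipeline: elements are processed on the calling thread.
    static long sumOfSquaresSequential(List<Integer> numbers) {
        return numbers.stream()
                .mapToLong(n -> (long) n * n)
                .sum();
    }

    // Parallel pipeline: the same operations, but the work may be split
    // across threads of the common fork/join pool.
    static long sumOfSquaresParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .mapToLong(n -> (long) n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(sumOfSquaresSequential(numbers));
        System.out.println(sumOfSquaresParallel(numbers));
    }
}
```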
