Java 8 - Part IV [Stream]
Let's start the fourth post of the Java 8 series about the changes introduced from version 6 to 8. This time we will talk about the new way to manipulate collections: streams.
Summary
What is Stream
Streams are an important addition in Java 8. Before JDK 8, collections were manipulated through explicit iteration (iterators, for or while loops), following the imperative programming paradigm, where you explicitly state how the algorithm should work. To help with that, Java provides the Stream API (package java.util.stream), where the stream operations do the iteration behind the scenes for us. The programmer does not need to worry about how the iteration takes place. In other words, you let the Streams API manage how to process the data.
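As a minimal sketch of that difference, the two methods below do the same job, once with an explicit loop and once with a stream. The class name, method names and sample words are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LoopVersusStreamSketch {

    // Imperative style: we spell out how to iterate and how to accumulate.
    static List<String> shortWordsWithLoop(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (word.length() <= 3) {
                result.add(word.toUpperCase());
            }
        }
        return result;
    }

    // Stream style: we describe what we want, the API handles the iteration.
    static List<String> shortWordsWithStream(List<String> words) {
        return words.stream()
                .filter(word -> word.length() <= 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("java", "is", "fun");
        System.out.println(shortWordsWithLoop(words));   // prints [IS, FUN]
        System.out.println(shortWordsWithStream(words)); // prints [IS, FUN]
    }
}
```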
Another important difference between streams and collections is that a collection holds all of its values in memory, while a stream is a conceptually fixed data structure whose elements are computed on demand. Consequently, a stream is lazily constructed and a collection is eagerly constructed.
A stream in Java is a sequence of data. A stream pipeline is the operations that run on a stream to produce a result. [1]
Reference: OCP8 Stream
Some important features are:
- Streams don't store their elements
- Streams are immutable: operations do not modify the source, they produce a new stream or a result
- Streams are not reusable: once a terminal operation has run, the stream cannot be used again
- Streams don't support indexed access to their elements
- Streams are easily parallelizable
- Stream operations are lazy when possible (values are computed only when a consumer requests them)
- It is not possible for programmers to create their own operators.
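The "not reusable" point can be checked with a small sketch like the one below. The class and method names are hypothetical. A second terminal operation on the same stream object fails with an IllegalStateException.

```java
import java.util.stream.Stream;

public class SingleUseSketch {

    // Returns true if using the same stream twice is rejected.
    static boolean secondUseFails() {
        Stream<String> letters = Stream.of("a", "b", "c");
        letters.forEach(s -> { });          // first terminal operation: fine
        try {
            letters.forEach(s -> { });      // second use of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;                    // the stream was already consumed
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails()); // prints true
    }
}
```

If you need the data again, create a new stream from the source.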
Stream operations usually return a new stream, which makes it possible to chain operations. The complete chain, from the source to the operation that finishes the process and gives a result, is the pipeline. Formally, the pipeline has three parts:
- Source: where the stream is created
- Intermediate Operations: operations that can be connected to form the pipeline and that each result in another stream.
- Terminal Operation: the operation that closes the stream and gives you the result (no more streams).
Reference: Harnessing the Power of Java 8 Streams
The OCP Certified Study Guide gives us a great example to understand how this works.
stream [list] -> intermediate [filter, sorted, limit] -> terminal [forEach]
Java will figure out how best to implement the stream pipeline. In this pipeline, limit signals when it has received enough elements, and sorted runs only once, after it has received all the elements it needs to sort. So:
- Each element is (a) sent from the stream to filter, (b) the filter checks the length of the element, then (c) if the length is not ok the element leaves the assembly line, and if the length is ok the element is (d) sent to sorted.
- When every element has been processed by filter, sorted executes.
- After that, sorted sends the elements, one by one, to limit. The limit checks whether it still needs more elements.
- If it does, limit sends the element on to forEach.
- When limit has received enough elements, Java stops the line and no more processing occurs in the pipeline.
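The steps above can be sketched in code as follows. This is an illustration in the spirit of the study guide's example, not a copy of it. The sample names, the length check of 4 and the limit of 2 are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineOrderSketch {

    // Runs filter -> sorted -> limit -> forEach and returns what forEach received.
    static List<String> firstTwoFourLetterNames(List<String> names) {
        List<String> received = new ArrayList<>();
        names.stream()                         // source
             .filter(n -> n.length() == 4)     // intermediate: keep 4-letter names
             .sorted()                         // intermediate: waits for all filtered elements
             .limit(2)                         // intermediate: lets only two elements through
             .forEach(received::add);          // terminal: starts the whole pipeline
        return received;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Toby", "Anna", "Leroy", "Alex");
        System.out.println(firstTwoFourLetterNames(names)); // prints [Alex, Anna]
    }
}
```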
Create a stream
The interface to use streams is java.util.stream.Stream, and the related primitive specializations are IntStream, LongStream and DoubleStream.
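Here is a short sketch of some common ways to create a stream, using only standard JDK 8 factory methods. The class name, method names and sample values are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class CreateStreamSketch {

    // From explicit values.
    static List<String> fromValues() {
        return Stream.of("a", "b", "c").collect(Collectors.toList());
    }

    // From an existing collection.
    static long fromCollection() {
        List<String> source = Arrays.asList("x", "y");
        return source.stream().count();
    }

    // From a primitive array, using the IntStream specialization.
    static int fromArray() {
        return Arrays.stream(new int[] {1, 2, 3}).sum();
    }

    // From a numeric range (both ends included).
    static int fromRange() {
        return IntStream.rangeClosed(1, 5).sum();
    }

    // From a generator function; the stream is infinite, so limit is needed.
    static List<Integer> fromIterate() {
        return Stream.iterate(1, n -> n * 2).limit(5).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromValues());     // prints [a, b, c]
        System.out.println(fromCollection()); // prints 2
        System.out.println(fromArray());      // prints 6
        System.out.println(fromRange());      // prints 15
        System.out.println(fromIterate());    // prints [1, 2, 4, 8, 16]
    }
}
```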
Intermediate Operations
Intermediate operations return streams, which makes it possible to chain operations, similar to the builder pattern. They are considered lazy because they do not process any elements until a terminal operation is invoked. This allows the intermediate operations to be merged or optimized when the terminal operation runs.
Some operations, such as limit, are called short-circuiting because they do not need to process all the elements of the stream.
- Stateless Operations: each element is processed independently of the others, so no state is kept. Examples: filter, flatMap, map, peek.
- Stateful Operations: state is retained, so processing an element may depend on the elements seen before it. Examples: distinct, limit, skip, sorted.
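A small sketch of these ideas, with hypothetical names and sample data. The first method relies on the short-circuiting behaviour of limit to finish on an infinite stream. The second uses the stateful distinct and sorted, which must remember earlier elements.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IntermediateKindsSketch {

    // filter is stateless; limit is short-circuiting, so the infinite
    // source stops being consumed once three even numbers are found.
    static List<Integer> firstThreeEvens() {
        return Stream.iterate(1, n -> n + 1)
                .filter(n -> n % 2 == 0)
                .limit(3)
                .collect(Collectors.toList());
    }

    // distinct and sorted are stateful: they track the elements already seen.
    static List<String> distinctSorted() {
        return Stream.of("b", "a", "b", "c")
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstThreeEvens()); // prints [2, 4, 6]
        System.out.println(distinctSorted());  // prints [a, b, c]
    }
}
```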
Remember that an intermediate operation does nothing until a terminal operation starts. A pipeline made only of intermediate operations processes no elements.
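One way to observe this laziness is the sketch below, which uses peek to record the elements that actually flow through the pipeline. The class name, method name and sample values are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazySketch {

    // Returns how many elements peek had seen before and after the terminal operation.
    static int[] seenBeforeAndAfterTerminal() {
        List<String> seen = new ArrayList<>();
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .peek(seen::add)                  // records each element that flows through
                .filter(s -> !s.equals("b"));
        int before = seen.size();                 // nothing has been processed yet
        pipeline.collect(Collectors.toList());    // terminal operation: now the elements flow
        int after = seen.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = seenBeforeAndAfterTerminal();
        System.out.println(counts[0] + " then " + counts[1]); // prints 0 then 3
    }
}
```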
Terminal Operations
It's possible to use a terminal operation without any intermediate operations. A terminal operation does not return a stream, so once it finishes, the stream pipeline cannot be used anymore. A stream pipeline can have only one terminal operation.
The *Match methods (anyMatch, allMatch, noneMatch), for example, are short-circuiting: as soon as an element decides the answer, there is no need to continue processing the stream.
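The sketch below illustrates the *Match behaviour described above. Two of the methods run on an infinite stream and still return, because the answer is known after a finite number of elements. The names and predicates are illustrative.

```java
import java.util.stream.Stream;

public class MatchSketch {

    // Stops as soon as the first element greater than 10 is found.
    static boolean anyAboveTen() {
        return Stream.iterate(1, n -> n + 1).anyMatch(n -> n > 10);
    }

    // Stops as soon as one element fails the predicate.
    static boolean allBelowTen() {
        return Stream.iterate(1, n -> n + 1).allMatch(n -> n < 10);
    }

    // On a finite stream with no match, every element is checked.
    static boolean noneNegative() {
        return Stream.of(1, 2, 3).noneMatch(n -> n < 0);
    }

    public static void main(String[] args) {
        System.out.println(anyAboveTen());  // prints true
        System.out.println(allBelowTen());  // prints false
        System.out.println(noneNegative()); // prints true
    }
}
```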
Let's check your understanding of terminal operations. Can you see the problem in the example?
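The original snippet is not reproduced here, so the following is only a hypothetical stand-in for the kind of code the question refers to. The source values and the filter condition are assumptions.

```java
import java.util.stream.Stream;

public class MissingTerminalSketch {

    // Builds a pipeline and turns the pipeline object itself into text.
    static String describe() {
        Stream<String> result = Stream.of("a", "b", "ab")
                .filter(s -> s.contains("b"));   // intermediate operation only
        return String.valueOf(result);           // the stream object, not its elements
    }

    public static void main(String[] args) {
        // Prints an object reference rather than any of the elements.
        System.out.println(describe());
    }
}
```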
There is no terminal operation! So the result is just another stream, and printing it shows something like java.util.stream.ReferencePipeline$3@65ab7765. The stream would contain the elements "b" and "ab", but none of them has been processed yet.
One more concept worth mentioning is reduction. Examples of reduction operations: collect, count, min, max, reduce.
Reductions are a special type of terminal operation where all of the contents of the stream are combined into a single primitive or Object.
Let’s see some examples:
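A minimal sketch of the reduction operations listed above, applied to a small made-up list of numbers. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class ReductionSketch {

    static final List<Integer> NUMBERS = Arrays.asList(3, 1, 4, 1, 5);

    static long count() {
        return NUMBERS.stream().count();
    }

    // min and max return an Optional because the stream could be empty.
    static Optional<Integer> min() {
        return NUMBERS.stream().min(Comparator.naturalOrder());
    }

    static Optional<Integer> max() {
        return NUMBERS.stream().max(Comparator.naturalOrder());
    }

    // reduce with an identity value (0) and an associative operation.
    static int sum() {
        return NUMBERS.stream().reduce(0, Integer::sum);
    }

    // collect gathers the elements into a single result, here a String.
    static String joined() {
        return NUMBERS.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(count());      // prints 5
        System.out.println(min().get());  // prints 1
        System.out.println(max().get());  // prints 5
        System.out.println(sum());        // prints 14
        System.out.println(joined());     // prints 3,1,4,1,5
    }
}
```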
You can see a complete example that compares an implementation using arrays with one using streams.
Parallel streams
Streams can be split into multiple parts that are processed at the same time by different threads, without you writing any multithreaded code. By default, the number of available processor cores determines how many threads are used.
Parallel processing can sometimes be slower than sequential processing. Stateful operations (sorted, limit, distinct, skip), for example, may need to go through the entire stream to produce a result.
A parallel stream can also return an incorrect result, for instance when the operations modify shared state. To guarantee a correct answer you would need to synchronize access to that state. A better option is to use reduce to combine the elements of the stream into a single value.
With parallel streams, the reduce method creates intermediate values and then combines them. This avoids the "ordering" problem while still allowing the stream to be processed in parallel, because the shared state is eliminated and kept inside the reduction process. The only requirement is that the reducing operation must be associative.
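The contrast can be sketched as follows, with hypothetical names. The unsafe variant mutates a shared variable from several threads and may return a wrong total. The reduce variant keeps the partial sums inside the reduction, so it gives the same answer as a sequential run.

```java
import java.util.stream.IntStream;

public class ParallelReduceSketch {

    // Unsafe: total[0] is shared mutable state updated from several threads
    // without synchronization, so updates can be lost.
    static int unsafeSum(int n) {
        int[] total = {0};
        IntStream.rangeClosed(1, n).parallel().forEach(i -> total[0] += i);
        return total[0];
    }

    // Safe: addition is associative, so partial results can be combined in any grouping.
    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    static int sequentialSum(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum(1000)); // prints 500500
        System.out.println(parallelSum(1000));   // prints 500500
        System.out.println(unsafeSum(1000));     // may differ from 500500
    }
}
```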
To work recursively with parallel tasks you can use the fork/join framework. You can see an example here.
The fork/join framework was designed to recursively split a parallelizable task into smaller tasks and then combine the results of each subtask to produce the overall result.
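As a rough sketch of that split-and-combine idea, the task below sums an array by dividing it in halves until the pieces are small enough to add up directly. The class name and the threshold of 1,000 elements are assumptions chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSumSketch extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1_000; // assumed cut-off for splitting

    private final long[] numbers;
    private final int start; // inclusive
    private final int end;   // exclusive

    ForkJoinSumSketch(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            // Small enough: sum this slice sequentially.
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        int mid = start + length / 2;
        ForkJoinSumSketch left = new ForkJoinSumSketch(numbers, start, mid);
        ForkJoinSumSketch right = new ForkJoinSumSketch(numbers, mid, end);
        left.fork();                          // run the left half asynchronously
        long rightResult = right.compute();   // compute the right half in this thread
        long leftResult = left.join();        // wait for the left half
        return leftResult + rightResult;      // combine the subtask results
    }

    static long sum(long[] numbers) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSumSketch(numbers, 0, numbers.length));
    }

    public static void main(String[] args) {
        long[] numbers = new long[10_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i + 1;
        }
        System.out.println(sum(numbers)); // prints 50005000
    }
}
```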
Rules to choose parallel or sequential:
- For a small set of data, sequential streams are almost always the best choice due to the overhead of the parallelism.
- When using parallel streams, avoid stateful (like sorted()) and order-based (like findFirst()) operations.
- Operations that are computationally expensive (considering all the operations in the pipeline) generally perform better with a parallel stream.
Now you can see a practical example that Java In Action gives us.
Conclusion
The main reason to use streams is to make code more readable. Sometimes traditional loops can be faster than a sequential stream (Java performance tutorial, Performance With Java8 Streams, 3 Reasons why You Shouldn’t Replace Your for-loops by Stream.forEach()), but this will probably improve in newer versions.
Debugging stream code can look a little difficult, but you can find some tips in a StackOverflow discussion and in an IBM post.
Streams are an extensive topic. You need to look at each method and test it. Go on and do your homework.
Go deep!!
Reference
- OCP8: Streams
- Harnessing the Power of Java 8 Streams
- OCP8: Parallel Stream
- OCP Oracle Certified Professional Java SE 8 Programmer II - Study Guide
- Introduction to the Stream API
- Java In Action - Github