Java Immutable Singly Linked List

Java lacks a decent immutable singly linked list in the JDK. This is unfortunate because an immutable singly linked list is a very useful container data structure that is optimally suited for certain computationally complex algorithms. Of course there are lots of open source implementations of singly linked lists for Java out on the web, but I couldn’t find one that supported all my needs and wishes. Therefore I made one myself, and I have made it available on GitHub. In this article I describe the merits of immutable singly linked lists, and properties of my implementation of it.

Of course Java 9 supports immutable lists, which are under the hood doubly linked lists. With this one can do functional programming with value semantics to a certain extend. However, many desirable operations are not available on these immutable lists. In Java 9 you can create an immutable List simply with List.of(e1, e2, ..., eN). The result is a indeed an immutable List, but one that does not support the construction of new derived lists without modifying the original list. Performing .add(e) on an immutable List in Java 9 will throw an UnsupportedOperationException instead of creating a new List with the added element and leaving the original (immutable) one unmodified. Such an immutable List is of limited use, and rejecting the forbidden add operation at runtime is a bit late. Strongly typed Java should have done better than that.

The merits of an immutable singly linked list stem from the very useful combination of immutability and single linkedness. Suppose we have the list (1, 2, 3, 4) and we want to remove the first element from this list. Because the list is immutable, the only way to get this result seems to be to create an entire new list (2, 3, 4) that equals the original list with the first element removed. However, since the original list is immutable and singly linked, it is possible to reuse part of the original in the creation of the new list.

Singly linked lists are implemented with what I will call list fragments. A list fragment is an object [value, pointer] with a value (representing the content of the list fragment) and a pointer (pointing to the next list fragment, or nowhere “Nil” if there is no next list fragment). Our list (1, 2, 3, 4) will be represented by four list fragments [1, pointer-1], [2, pointer-2], [3, pointer-3], [4, Nil], where pointer-1 points to [2, pointer-2], and so on. This representation of lists is singly linked because list fragments only have a pointer to the next fragment and not a pointer back to the previous fragment. If there would be a pointer back to the previous fragment, the implementation would be a doubly linked list. If we replace in the notation [value, pointer] the pointer with the fragment pointed to, we can write our list (1, 2, 3, 4) as [1, [2, [3, [4, Nil]]]].

This notation shows that the list we want to construct: (2, 3, 4), also writable as [2, [3, [4, Nil]]], is in fact part of the original list. The only thing to be done therefore to create the new list from the original one is to traverse the pointer in the first object [1, pointer] of the original list. Notice that this construction would not be possible if the original list were a doubly linked list. The construction of a valid doubly linked derivative [2, [3, [4, Nil]]] requires that the back-link between [2, [3, [4, Nil]]] and its surrounding environment [1, pointer] be severed. This mutates the original list and so destroys immutability. The possibility to reuse portions of an immutable singly linked list is therefore because of the blissful ignorance of a list of its surrounding environment.

We see that elementary operations on immutable singly linked lists can be done extremely fast because parts of the original list can be reused safely. Since the reused part will not be mutated, there is no risk of losing data because of other actions that operate on the same data. The operations that can be done very fast are: removal of one or more elements from the front of the list, and addition of one or more elements to the front of the list. Consider for instance the addition of one element 0 to the front of the original list. Using linear notation we want to construct the list (0, 1, 2, 3, 4). Using [data, pointer] notation we want to construct [0, [1, [2, [3, [4, Nil]]]]] and we see immediately that to create our new list we only have to create an object [0, pointer] and let pointer point to the original list, thereby reusing the entire original list.

The operations that are fast on immutable singly linked lists are known as LIFO-operations, since a singly linked list is optimal suited for LIFO (Least In, First Out) operations. Fortunately, many processes and algorithms are based partly or entirely on LIFO-operations. And where there is a mix of operations on a data structure (LIFO and otherwise), an immutable singly linked list might nevertheless be the best choice, since LIFO operations are very fast and often more than compensate for other operations that are less optimal on a singly linked list. There are also clever algorithms that implement FIFO (First In, First Out) behaviour in terms of two singly linked lists, and that are just as fast as LIFO behaviour (that is: on average contant time of FIFO insertion and extraction). Only when an algorithm or process requires a lot of random access to the container, a different container data structure is usually a better choice.

The immutable singly linked list as presented here is of course virtually identical to the central list data structure of the programming language LISP. LISP (LISt Processing) has been and is a very suitable language for the programming of complex algorithms and Artificial Intelligence thanks, among others, to this singly linked list data structure. In LISP the cryptic names “car” and “cdr” are used for the functions to extract “head” and “tail” of the singly linked list; “head” and “tail” are in my opinion the best names. Scala uses head and tail for access to parts of its immutable singly linked list. In LISP the singly linked list is in fact not immutable. LISP has the functions “setcar” and “setcdr” to mutate singly linked lists. Of course you can restrict yourself to not using these operations and be able to safely reuse parts of your data structures.

Containers need a lot of helper functionality to enhance productivity. To start with: it is nowadays no longer acceptable to have to process the contents of a container by accessing it by index or by following explicitly the links of linked lists. Higher order functions like forEach, filter, fold, map and flatMap are indispensable to do basic processing of contents of containers. Besides these higher order functions all kinds of useful utility functions are imaginable to remove elements, to drop parts of the list, to reverse the list, to flatten a nested list, etc. The usability of a container depends heavily on the availability of these higher order functions and general utility functions.

I have made my implementation of an immutable singly linked list very rich in that respect. Virtually all well-known higher order functions for containers are supported by my implementation, using currently well established names. Java 8 (or higher) is therefore a necessity, since Java 8 supports Functional Interfaces and anonymous function construction. This last thing is bit clumsy in Java when compared to what is possible with Scala and JavaScript, but it works. My initial implementation of all utility functions (with a few exceptions) was recursive. Recursive processing of singly linked lists has several advantages. Recursive functions are elegant, short and easy to make work as desired. However, list processing with recursive functions can easily give stack overflow errors when processing large lists. Tail recursion could be a solution for this, but Java does not (yet) support tail recursion optimization, and not all recursive algorithms can be transformed into tail recursion. And when they can, elegance and conciseness is sometimes lost. Therefore I have made a second implementation of all utility functions with iterative instead of recursive language constructs. The recursive implementation was and is useful to test the tests.

In building iterative utility functions for a singly linked list, one quickly discovers the problems of strict immutability. When processing a singly linked list, for instance to implement map-functionality, there is only one direction in which the list can be processed: from head to tail. When one wants to build a new list in the course of traversing the original one, the new list has necessarily reversed order with regard to the original. This is because to construct a strict immutable singly linked list, a front fragment [head-value, tail-list] has to be created in one strike. Both head-value and a completed immutable tail-list have to be available. However, when processing the source list in the only one possible order, a completed immutable tail-list cannot be available because the source elements from which this list should be derived have not yet been seen and processed. A simple solution to this problem is to create a partial result in reverse order in a first pass, and to deliver a final result by reversing the partial result in a second pass. This way of working requires the construction of an intermediate list as lengthy a the source list with the only purpose to reverse the order of a partial result. Needless to say that this is a waste of computation resources that asks for a more efficient way of working. First a few lines of example code to illustrate the points made.

To understand the code, I have to give a short introduction in the structure of my singly linked list implementation. SList is the name of the interface that each list object implements. There are two classes that implement SList: SListImpl and SListNil. SListImpl represents non-empty lists and SListNil represents empty lists. Obviously SListImpl represents the earlier introduced list fragment [value, pointer] and SListNil represents the end of the list. It is not a fragment containing a value, but a “stopper” fragment, that enables us to treat empty lists the same as non-empty lists. Normally you should not be aware of SListImpl and SListNil and always work through the interface SList. Instantiation of the concrete classes is done by static factory methods in the singleton class SL. SL is a class with only static methods. Virtually all utility functions for lists are in SL. SListImpl delegates from its methods to functions in SL. In SListNil the methods are usually implemented directly because these implementations are almost all trivial.

public static <T, R> SList<R> map(SList<T> sList, Function<T, R> f) {
    SList<R> partResult = SL.empty();
    while (!sList.isEmpty()) {
        partResult = partResult.push(f.apply(sList.head()));
        sList = sList.tail();
    }
    return reverse(partResult);
}

public static <T> SList<T> reverse(SList<T> sList) {
    SList<R> result = SL.empty();
    while (!sList.isEmpty()) {
        result = result.push(sList.head());
        sList = sList.tail();
    }
    return result;
}

The list partResult in the function map is only used to construct the final list with results in the right order. After constructing the final list with a call to reverse(), partResult is immediately discarded. It would be nice if we could find a way to bypass the construction of this intermediate result. A straight forward recursive implementation of map looks as follows:

static <T, R> SList<R> map(SList<T> sList, Function<T, R> f) {
    if (sList.isEmpty()) return SL.empty();
    else return SL.cons(f.apply(sList.head()), map(sList.tail(), f));
}

The recursive solution looks elegant and deceptively simple. Make no mistake: the ordering of the result is the same as the ordering of the original and there seems to be no intermediate list to achieve the right ordering. But that’s not entirely true… The function is not tail recursive and the intermediate result that was explicit in partResult in the iterative map function is now implicit on the call stack. The call stack is in fact also a kind of list, and could very well be implemented as a singly linked list. The chain of stack frames corresponding to recursive calls in this solution is in fact a very expensive form of intermediate result. So expensive that the recursive solution will not allow processing of lists longer than a few thousands of elements, unless you explicitly enlarge stack size on your VM. Stack frames contain much more information that only the partial result of applying f to one list element (necessary to compute the result to be returned). Therefore, the straight forward recursive solution is short and elegant, but not suitable for code that needs to be fast and/or processes large lists.

Is there no way to circumvent the construction of an intermediate result list? There is, if we are willing to relax the immutability requirements of our immutable singly linked list a little bit. The problem is that we are not able to construct a list fragment in one strike because the content value and the rest of the list are not available simultaneously. The solution is therefore to allow the construction of list fragments in multiple stages. We need to be able to create list fragments (concrete: instances of SListImpl) without supplying head and/or tail information. The head and/or the tail should be able to be fixed at a later point in time with pseudo-setters fixHead() and fixTail(). With these methods, head and tail can be set only once to a value other than null. Whenever head has a value other than null, a call of fixHead() will lead to an UnsupportedOperationException. Therefore this singly linked list will in practice remain immutable, but during the creation phase of a list, a not yet filled part of a list under construction can be filled in later. I’ve tried to solve this with the Java concept of “final blanks” but that appeared to be impossible. All final blanks have to be assigned a value in each of the constructors, so final blanks cannot be used to construct a list fragment and fill the “blanks” later. Our solution to attain maximum speed in utility methods for our singly linked list is therefore a set of new factory methods SL.part() with which a list fragment can be created with only head, or only tail, or nothing at all. The missing parts can be filled in with pseudo-setters fixHead() and fixTail().

Using these new primitives we can finally write a map function that does not create an expensive intermediate list, but instead creates the final list in one pass, using the ability to extend a created part of the list with a tail part that comes available afterwards. The SL.part() static function creates a list fragment of which some parts have to be filled later. The speed advantage over a solution with an intermediate list is in my tests about 25%.

public static <T, R> SList<R> map(SList<T> sList, Function<T, R> f) {
    SList<R> resultHandle = SL.part();
    SList<R> destWalker = resultHandle;
    while (!sList.isEmpty()) {
        destWalker.fixTail(SL.part(f.apply(sList.head())));
        sList = sList.tail();
        destWalker = destWalker.tail();
    }
    destWalker.fixTail(SL.empty());
    return resultHandle.tail();
}

The new primitives SL.part(), fixHead() and fixTail() have been given package scope, so only the library developer has access to them. To make list building in both directions available to users in a clean way, two builder classes have been added to SL: SL.SListBuilder and SL.SListRevBuilder.