Discussion:
ModuleLayer#layers() and Configuration#configurations() produces non-topological ordering
Luke Hutchison
2018-06-15 03:37:48 UTC
Permalink
Hi, I noticed that the method ModuleLayer#layers() uses the following code
to get the module layers in resolution order:

/**
* Returns an ordered stream of layers. The first element is this layer,
* the remaining elements are the parent layers in DFS order.
*
* @implNote For now, the assumption is that the number of elements will
* be very low and so this method does not use a specialized
spliterator.
*/
Stream<ModuleLayer> layers() {
List<ModuleLayer> allLayers = this.allLayers;
if (allLayers != null)
return allLayers.stream();

allLayers = new ArrayList<>();
Set<ModuleLayer> visited = new HashSet<>();
Deque<ModuleLayer> stack = new ArrayDeque<>();
visited.add(this);
stack.push(this);

while (!stack.isEmpty()) {
ModuleLayer layer = stack.pop();
allLayers.add(layer);

// push in reverse order
for (int i = layer.parents.size() - 1; i >= 0; i--) {
ModuleLayer parent = layer.parents.get(i);
if (!visited.contains(parent)) {
visited.add(parent);
stack.push(parent);
}
}
}

this.allLayers = allLayers =
Collections.unmodifiableList(allLayers);
return allLayers.stream();
}

This code represents a preorder DFS. The resulting order is *not* a
topological sort ordering. This can lead to an ancestral layer being listed
in the final ordering before a descendant layer. For example, consider
these four layer -> parent relationships:

A -> [ B, C ]
B -> [ D ]
C -> [ D ]
D -> [ ]

The ordering code as given in ModuleLayer will produce the order [ A, B, D,
C ], causing the layers D and C to be out of order. The order is not a
toplological sort ordering, which guarantees that all descendants of a node
are listed before the node, and all parents of the node are listed after it.

To fix this, the search order should be changed to a *reversed postorder
DFS*. The current code uses a stack to implement iterative DFS. However,
the recursive postorder DFS algorithm needs to do some work after the
recursive call returns (specifically adding the current node to the
beginning of the output ordering), so in effect, the current iterative code
cannot be directly modified to yield a postorder traversal, without having
two different types of nodes pushed onto the stack ("recurse to parent"
nodes and "return from child" nodes), effectively mimicking Continuation
Passing Style. It is simpler therefore to change the code to make it
recursive.

The list allLayers should be changed to a LinkedList, allowing nodes to be
pushed onto the beginning of the list, so that the ordering doesn't have to
be reversed at the end (as required by topological sorting). The
allLayers.add(layer) line should be moved after the iteration returns (to
give postorder traversal), and should be changed to allLayers.push(layer) (to
reverse the output ordering). Also, the loop through parent nodes should no
longer be inverted, since we're not using a FIFO stack to process parents
anymore.

The fixed algorithm is as follows:

/** Recursively find the topological sort order of ancestral layers. */
private void findLayerOrder(Set<ModuleLayer> visited,
LinkedList<ModuleLayer> allLayersOut) {
if (visited.add(this)) {
for (int i = 0; i < parents.size(); i++) {
ModuleLayer parent = parents.get(i);
parent.findLayerOrder(visited, allLayersOut);
}
allLayersOut.push(this);
}
}

/**
* Returns an ordered stream of layers. The first element is this layer,
* the remaining elements are the parent layers in topological sort
order.
*
* @implNote For now, the assumption is that the number of elements will
* be very low and so this method does not use a specialized
spliterator.
*/
Stream<ModuleLayer> layers() {
if (this.allLayers == null) {
LinkedList<ModuleLayer> allLayers = new LinkedList<>();
findLayerOrder(new HashSet<ModuleLayer>(), allLayers);
this.allLayers = Collections.unmodifiableList(allLayers);
}
return this.allLayers.stream();
}

The same changes should also be made to Configuration#configurations() ,
since it uses the same broken DFS algorithm.

Is there anywhere else in the java.lang.module code that needs to be fixed
to properly perform topological sort?
David Lloyd
2018-06-15 14:25:36 UTC
Permalink
Post by Luke Hutchison
The list allLayers should be changed to a LinkedList, allowing nodes to be
pushed onto the beginning of the list, so that the ordering doesn't have to
A small quibble: LinkedList should normally never be used. You can
push things on to the beginning of an ArrayDeque (it is a double-ended
queue after all).
Post by Luke Hutchison
longer be inverted, since we're not using a FIFO stack to process parents
A stack would be LIFO. But at any rate the above comment applies:
anything LinkedList can do, ArrayDeque can probably do better.
--
- DML
Alan Bateman
2018-06-15 14:50:14 UTC
Permalink
Post by Luke Hutchison
This code represents a preorder DFS. The resulting order is *not* a
topological sort ordering. This can lead to an ancestral layer being listed
in the final ordering before a descendant layer. For example, consider
A -> [ B, C ]
B -> [ D ]
C -> [ D ]
D -> [ ]
The ordering code as given in ModuleLayer will produce the order [ A, B, D,
C ], causing the layers D and C to be out of order.
Multi parent configurations is a somewhat niche area but it is specified
to be DFS. If there someone in spec that has lead to you to assume the
above is incorrect? I'm just wondering if there is wording that need to
be improved or fixed somewhere.

-Alan
Luke Hutchison
2018-06-26 00:00:55 UTC
Permalink
(Sorry Alan for the dup -- I forgot to Reply-All to the list)
Post by Alan Bateman
Post by Luke Hutchison
The ordering code as given in ModuleLayer will produce the order [ A, B,
D,
Post by Luke Hutchison
C ], causing the layers D and C to be out of order.
Multi parent configurations is a somewhat niche area but it is specified
to be DFS. If there someone in spec that has lead to you to assume the
above is incorrect? I'm just wondering if there is wording that need to
be improved or fixed somewhere.
The alternative algorithm I suggested is still DFS -- it is just
postorder+reversed in its output, rather than pretorder.

It is not in reference to the spec that I make this suggestion, it is due
to the logical argument that the existing code is written not just to
handle tree-structured multi-parenting, but fully DAG-structured ancestral
graphs -- and because the existing code appears to attempt to ensure that
(1) a node is listed before its parent in the output, and (2) if there are
multiple parents, those parents are listed in the same relative order in
the output. These two criteria together would turn the partial ordering of
the DAG into a total ordering without any ambiguity. However, criterion (1)
is violated by the current code in DAG-structured ancestral graphs, in all
but the first directed path to the most distant common ancestor node.

I'm assuming that since more than one module may define classes in the same
package, it is possible to have the same class defined multiple times in
different layers. Therefore, if the intent of specifying that DFS should be
used for module resolution is that class definitions in shallower layers
should mask definitions of the same class in deeper layers, then
topological ordering does in fact matter -- because otherwise, as shown in
the example I gave, it is possible for deeper layers to be reached before
shallower layers.
Post by Alan Bateman
Post by Luke Hutchison
The list allLayers should be changed to a LinkedList, allowing nodes to
be
Post by Luke Hutchison
pushed onto the beginning of the list, so that the ordering doesn't have
to
A small quibble: LinkedList should normally never be used. You can
push things on to the beginning of an ArrayDeque (it is a double-ended
queue after all).
Thank you, I always assumed that ArrayDeque had O(size()) time for
insertion at the head -- but I just looked at the implementation and it's
O(1) -- very clever :-)
Post by Alan Bateman
Post by Luke Hutchison
This code represents a preorder DFS. The resulting order is *not* a
topological sort ordering. This can lead to an ancestral layer being
listed
Post by Luke Hutchison
in the final ordering before a descendant layer. For example, consider
A -> [ B, C ]
B -> [ D ]
C -> [ D ]
D -> [ ]
The ordering code as given in ModuleLayer will produce the order [ A, B,
D,
Post by Luke Hutchison
C ], causing the layers D and C to be out of order.
Multi parent configurations is a somewhat niche area but it is specified
to be DFS. If there someone in spec that has lead to you to assume the
above is incorrect? I'm just wondering if there is wording that need to
be improved or fixed somewhere.
-Alan
Luke Hutchison
2018-07-19 01:13:36 UTC
Permalink
Hi Alan, I was just wondering if you could please comment on this issue I
Post by Luke Hutchison
I'm assuming that since more than one module may define classes in the
same package, it is possible to have the same class defined multiple times
in different layers. Therefore, if the intent of specifying that DFS should
be used for module resolution is that class definitions in shallower layers
should mask definitions of the same class in deeper layers, then
topological ordering does in fact matter -- because otherwise, as shown in
the example I gave, it is possible for deeper layers to be reached before
shallower layers.
My specific questions are:

(1) Is it indeed possible for a class to be defined multiple times
(potentially differently) in different module layers, using the same
fully-qualified class name; and

(2) If this is possible, are the intended semantics in this case for the
class definition in descendant (child) layers to mask the class definition
in ancestral (parent) layers -- similarly to how a class definition found
earlier in the traditional classpath will mask other definitions of the
same class later in the classpath?

Loading...