Topological Ordering

Kevin Watkins

Feb 24, 2020 • 2 min read

Topological ordering is ordering dependent items in such a way that items you depend upon come first in the ordering. For example if x depends on y and y depends on z then the topological order would be z, y, x. Maybe dependency ordering would be a better name; it took me a long time to find details of an algorithm online simply because I didn't know it was called 'topological ordering'... My particular use case was making sure types in the language I'm developing were compiled in a sensible order, i.e. if a type X has a field of type Y, compile type Y first.

One way to produce a topological ordering is to think of the items and their dependencies as nodes in a graph. We can utilise a depth first traversal of the graph to help us order things correctly. The deepest node will have no dependencies so should be the first to be returned. The next nodes will all depend directly on the deepest only so are the next to be returned, and so on. We have to be make sure not to return the same node multiple times given a node can have multiple nodes that depend on it. We also need to make sure a node doesn't depend on itself either directly (x depends on x) or indirectly (x depends on y which depends on x) as then we cannot possibly topologically order the nodes. A lazy version in C# might look something like this:

public static IEnumerable<T> TopologicalOrder<T, TKey>(
   this IEnumerable<T> source, 
   Func<T, IEnumerable<T>> dependentOnSelector,
   Func<T, TKey> keySelector,
   IEqualityComparer<TKey> keyComparer)
{
   // Keep track of the nodes we've visited. An entry means we've
   // visited it. If the value stored against the node is true then
   // we are currently processing that node and nodes dependent on
   // it. false means we've seen it already but it's not in the 
   // current path through the graph.
   var visited = new Dictionary<TKey, bool>(keyComparer);

   return source.SelectMany(
      item => Visit(item, dependentOnSelector, keySelector, visited));
}

private static IEnumerable<T> Visit<T, TKey>(
   T node,
   Func<T, IEnumerable<T>> dependentOnSelector,
   Func<T, TKey> keySelector,
   Dictionary<TKey, bool> visited)
{
   var key = keySelector(node);

   // Have we already visited this node?
   if (visited.TryGetValue(key, out var inProcess))
   {
      // Yes we have. Are we currently processing that node
      // or any node dependent on it?
      if (inProcess)
      {
         // Yes we are. That means we have a cycle, as this node
         // is dependent upon itself.
         throw new InvalidOperationException(
            "Cyclic dependency found.");
      }
      
      // If we reach here we have visited the node already so don't
      // need to do anything further.
      yield break;
   }

   // Not yet visited this node. Mark it as currently being
   // processed.
   visited[key] = true;

   // Find all the nodes that depend on this one.
   var nodesDependentOn = dependentOnSelector(node);

   // Topologically order those nodes and return them.
   foreach (var nodeDependentOn in nodesDependentOn.SelectMany(c => Visit(c, dependentOnSelector, keySelector, visited)))
   {
      yield return nodeDependentOn;
   }

   // We have now returned all nodes dependent on this one so we 
   // are safe to return this one.
   yield return node;

   // Mark it as having been visited, but we're not currently
   // processing it or those that depend on it.
   visited[key] = false;
}

That's it. Not super complicated but can be a little tricky to get your head around at first. You can find it in my sample data structures and algorithms repository; the version there has niceties like null checking and including details of the nodes found in a cycle in the exception to make debugging easier.