Today I’ve been coding a solution for a problem we’ve encountered with @ggdaniel (cr0hn) during the development of GoLismero 2.0. It called for an implementation of an algorithm to find connected components in an undirected graph. You can find the source code at the bottom of this post.
A graph algorithm a day keeps the CS doctor away…
Suppose we have an undirected graph (connected by lines rather than arrows) in which we can find one or more “islands” of nodes that form connections to each other, but not to nodes in other “islands”. In graph theory, these “islands” are called connected components. In the image below, we see a graph with three connected components:
Now, suppose we have a set containing all nodes, and we can visit each node to know what are its neighbors, that is, the other nodes it’s connected to. We want to find all the connected components and put their nodes into separate sets. How would we do that?
I’ve been toying with dependency resolution a bit today, since it’s one of the features we plan to add to GoLismero 2.0 plugins with @ggdaniel (cr0hn). So I came up with this short example that’s reasonably fast and doesn’t use recursion at all, unlike many of the examples that I found on the net.
The basic idea is this: given a set of tasks (nodes) and the tasks that need to be performed before them, build a dependency graph and find the sets of tasks that can be run concurrently while satisfying the dependencies. For example, suppose we have tasks A, B, C and D. Task A can be run directly, it has no dependencies. Tasks B and C must be run only after A has completed, so we say B and C depend on A. Then task D depends on B and C, which in turn depend on A.
Dependency graph example
What the algorithm does, instead of traversing the graph recursively, is iteratively finding and removing from the graph all nodes that have no dependencies – that is, no arrows coming out of them. In our example, the first iteration removes node A, the second iteration removes nodes B and C, and the last iteration removes node D. And these are precisely the three batches of tasks that can run concurrently – first task A runs, on completion tasks B and C can run in parallel, and once both are finished task D can be started.
If at some point there are still nodes in the graph but we can’t find any nodes without dependencies, that means we have a circular dependency.
Circular dependency graph example