Breaking Code

March 11, 2013

An example dependency resolution algorithm in Python

Filed under: Programming — Tags: , , — Mario Vilas @ 2:36 pm

I’ve been toying with dependency resolution a bit today, since it’s one of the features we plan to add to GoLismero 2.0 plugins with @ggdaniel (cr0hn). So I came up with this short example that’s reasonably fast and doesn’t use recursion at all, unlike many of the examples that I found on the net.

The basic idea is this: given a set of tasks (nodes) and the tasks that need to be performed before them, build a dependency graph and find the sets of tasks that can be run concurrently while satisfying the dependencies. For example, suppose we have tasks A, B, C and D. Task A can be run directly, it has no dependencies. Tasks B and C must be run only after A has completed, so we say B and C depend on A. Then task D depends on B and C, which in turn depend on A.

Dependency graph example

Dependency graph example

What the algorithm does, instead of traversing the graph recursively, is iteratively finding and removing from the graph all nodes that have no dependencies – that is, no arrows coming out of them. In our example, the first iteration removes node A, the second iteration removes nodes B and C, and the last iteration removes node D. And these are precisely the three batches of tasks that can run concurrently – first task A runs, on completion tasks B and C can run in parallel, and once both are finished task D can be started.

If at some point there are still nodes in the graph but we can’t find any nodes without dependencies, that means we have a circular dependency.

Circular dependency graph example

Circular dependency graph example

Here’s an example of solving a larger graph, and detecting a circular dependency:

        $ ./example-dependencies.py
        A working dependency graph example:
        c -> a
        e -> c
        e -> d
        d -> b
        g -> e
        g -> f
        f -> a
        f -> b
        i -> a
        h -> g
        j -> b

        Batches:
        b, a
        i, d, c, f, j
        e
        g
        h

        A broken dependency graph example:
        a -> i
        c -> a
        e -> c
        e -> d
        d -> b
        g -> e
        g -> f
        f -> a
        f -> b
        i -> a
        h -> g
        j -> b

        Trying to resolve the dependencies will raise an exception:

        Traceback (most recent call last):
          File "example-dependencies.py", line 108, in 
            get_task_batches(nodes)
          File "example-dependencies.py", line 42, in get_task_batches
            raise ValueError(msg)
        ValueError: Circular dependencies found!
        a -> i
        c -> a
        e -> c
        g -> e
        g -> f
        f -> a
        i -> a
        h -> g
        $

Download

example-dependencies.py

Source code

#!/usr/bin/env python

# Dependency resolution example in Python
# By Mario Vilas (mvilas at gmail dot com)

# The graph nodes
class Task(object):
    def __init__(self, name, *depends):
        self.__name    = name
        self.__depends = set(depends)

    @property
    def name(self):
        return self.__name

    @property
    def depends(self):
        return self.__depends

# "Batches" are sets of tasks that can be run together
def get_task_batches(nodes):

    # Build a map of node names to node instances
    name_to_instance = dict( (n.name, n) for n in nodes )

    # Build a map of node names to dependency names
    name_to_deps = dict( (n.name, set(n.depends)) for n in nodes )

    # This is where we'll store the batches
    batches = []

    # While there are dependencies to solve...
    while name_to_deps:

        # Get all nodes with no dependencies
        ready = {name for name, deps in name_to_deps.iteritems() if not deps}

        # If there aren't any, we have a loop in the graph
        if not ready:
            msg  = "Circular dependencies found!\n"
            msg += format_dependencies(name_to_deps)
            raise ValueError(msg)

        # Remove them from the dependency graph
        for name in ready:
            del name_to_deps[name]
        for deps in name_to_deps.itervalues():
            deps.difference_update(ready)

        # Add the batch to the list
        batches.append( {name_to_instance[name] for name in ready} )

    # Return the list of batches
    return batches

# Format a dependency graph for printing
def format_dependencies(name_to_deps):
    msg = []
    for name, deps in name_to_deps.iteritems():
        for parent in deps:
            msg.append("%s -> %s" % (name, parent))
    return "\n".join(msg)

# Create and format a dependency graph for printing
def format_nodes(nodes):
    return format_dependencies(dict( (n.name, n.depends) for n in nodes ))

# The test code
if __name__ == "__main__":

    # An example, working dependency graph
    a = Task("a")
    b = Task("b")
    c = Task("c", "a")
    d = Task("d", "b")
    e = Task("e", "c", "d")
    f = Task("f", "a", "b")
    g = Task("g", "e", "f")
    h = Task("h", "g")
    i = Task("i", "a")
    j = Task("j", "b")
    k = Task("k")
    nodes = (a, b, c, d, e, f, g, h, i, j)

    # Show it on screen
    print "A working dependency graph example:"
    print format_nodes(nodes)
    print

    # Show the batches on screen
    print "Batches:"
    for bundle in get_task_batches(nodes):
        print ", ".join(node.name for node in bundle)
    print

    # An example, *broken* dependency graph
    a = Task("a", "i")
    nodes = (a, b, c, d, e, f, g, h, i, j)

    # Show it on screen
    print "A broken dependency graph example:"
    print format_nodes(nodes)
    print

    # This should raise an exception and show the current state of the graph
    print "Trying to resolve the dependencies will raise an exception:"
    print
    get_task_batches(nodes)

4 Comments »

  1. Has this been made into a library?

    Comment by Hok Shun Poon (@hokshunpoon) — September 23, 2013 @ 3:54 pm

  2. I’m afraid not, but check out NetworkX for an excellent Python library for working with graphs: http://networkx.github.io/

    In case you’re curious, this algorithm ended up in GoLismero here: https://github.com/golismero/golismero/blob/master/golismero/managers/pluginmanager.py

    Comment by Mario Vilas — September 23, 2013 @ 3:57 pm

  3. Hi this is exactly what I need to implement but for a large data set, when I tried your code, this give circular dependencies are found, which is incorrect, since the data set is more of a n-ary dependency tree. I need to batch process the dep tree. Could you please recommend an alternative for large set.

    Comment by petr — August 21, 2016 @ 11:15 am

  4. @petr: for a large dataset I’d recommend to just go with a good known solution. I suppose networkx already supports this. đŸ™‚

    Comment by Mario Vilas — August 21, 2016 @ 3:02 pm


RSS feed for comments on this post. TrackBack URI

Leave a comment

Create a free website or blog at WordPress.com.