GPars good for project?

Marko Rodriguez
Hello,

My name is Marko and I work on Gremlin (http://gremlin.tinkerpop.com). Gremlin is a LINQ-ish, Groovy DSL for traversing graphs. Gremlin syntax is compiled down to Pipes (http://pipes.tinkerpop.com). Pipes is a data flow framework written in Java. A Pipe<S,E> implements Iterator<E> and it maps an object of type S to an object of type E. Besides some handy closure use cases, Groovy simply serves as a way to create dataflow pipelines. An example Gremlin expression is below where g is the graph, and v(1) is the vertex with id 1 in the graph:

        "the names of all of my friends of friends"
        g.v(1).outE('friend').inV.outE('friend').inV.name
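
For readers unfamiliar with the idea, here is a minimal Java sketch of what "a Pipe<S,E> implements Iterator<E>" could look like. It is not the real Pipes API: the names PipeSketch, MapPipe, drain and demo are hypothetical, and only the notion of a pipe as an Iterator<E> fed by a source of S objects comes from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; this is NOT the real Pipes API.
class PipeSketch {

    /** A pipe that lazily maps each S pulled from its source to one E. */
    static class MapPipe<S, E> implements Iterator<E> {
        private final Iterator<S> source;
        private final Function<S, E> step;

        MapPipe(Iterator<S> source, Function<S, E> step) {
            this.source = source;
            this.step = step;
        }

        @Override public boolean hasNext() { return source.hasNext(); }

        @Override public E next() { return step.apply(source.next()); }
    }

    /** Drains an iterator into a list so the result can be inspected. */
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    /** Chains two pipes: name -> length -> doubled length. */
    static List<Integer> demo() {
        Iterator<String> names = Arrays.asList("marko", "vaclav").iterator();
        Iterator<Integer> lengths = new MapPipe<>(names, String::length);
        Iterator<Integer> doubled = new MapPipe<>(lengths, n -> n * 2);
        return drain(doubled);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [10, 12]
    }
}
```

Because each pipe is itself an Iterator, the output of one pipe can be handed to the next as its source, which is the chaining that a Gremlin expression relies on. Nothing is computed until the last pipe is drained.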

I think GPars is a natural fit for Gremlin and could "automagically" parallelize Gremlin without requiring heavy Gremlin refactoring (if any). I was wondering if anyone had any thoughts on the matter and could ask me the questions that I should be asking to know if GPars and Gremlin would work nicely together.

Thank you for your time,
Marko.

http://markorodriguez.com
---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Re: GPars good for project?

Vaclav
Administrator
Hello Marko,

I'm happy to see you exploring GPars. Interestingly, I was briefly looking at Gremlin about a month ago. Let's call it coincidence:)

GPars dataflow was created as an easy-to-use implementation of the dataflow concurrency model with an API designed specifically for Groovy. GPars can also be used from Java, although the code loses some of its beauty. Dataflow networks consist of independent operators/selectors, which are connected by channels with 1:1, 1:n, n:1 or n:m cardinality. The Dataflow section of the User Guide goes into the details (http://gpars.org/guide/guide/7.%20Dataflow%20Concurrency.html). I believe the code samples could help you get a better feel for how well Gremlin and GPars could play together.

I think API compatibility and capability should most likely be the biggest concern at the moment. Unfortunately, I don't know much about Pipes, so I can't judge how closely GPars covers the functionality Gremlin needs from Pipes. GPars certainly may not provide, out of the box, all the customized Pipes that Gremlin may depend on. For example, filtering pipes would have to be implemented, perhaps as a combination of a channel and a filtering operator.
What do you think? Have you tried running Gremlin on top of GPars?
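
To make the "channel plus filtering operator" idea concrete, here is a rough Java sketch that uses only java.util.concurrent. It is not the GPars API. The names FilterOperatorSketch, startFilter and demo are hypothetical, and so is the DONE sentinel used to signal the end of the stream. A BlockingQueue stands in for a 1:1 channel and a thread stands in for the operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

// Hypothetical illustration only; this is NOT the GPars dataflow API.
class FilterOperatorSketch {

    // End-of-stream marker (an assumption of this sketch, not a GPars concept).
    static final Integer DONE = Integer.MIN_VALUE;

    /** Starts an operator thread that forwards only the items matching 'keep'. */
    static Thread startFilter(BlockingQueue<Integer> in,
                              BlockingQueue<Integer> out,
                              Predicate<Integer> keep) {
        Thread operator = new Thread(() -> {
            try {
                while (true) {
                    Integer item = in.take();      // block until input arrives
                    if (item.equals(DONE)) {
                        out.put(DONE);             // propagate end of stream
                        return;
                    }
                    if (keep.test(item)) {
                        out.put(item);             // pass matching items downstream
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        operator.start();
        return operator;
    }

    /** Feeds 1..6 through an "even numbers only" filter and collects the output. */
    static List<Integer> demo() {
        BlockingQueue<Integer> in = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> out = new LinkedBlockingQueue<>();
        startFilter(in, out, n -> n % 2 == 0);
        List<Integer> kept = new ArrayList<>();
        try {
            for (int i = 1; i <= 6; i++) {
                in.put(i);
            }
            in.put(DONE);
            for (Integer item = out.take(); !item.equals(DONE); item = out.take()) {
                kept.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [2, 4, 6]
    }
}
```

The producer, the filter and the consumer only share the two queues, so each stage can run on its own thread. That independence is the property a dataflow network of operators and channels is meant to give.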

Cheers,

Vaclav




On Thu, Feb 17, 2011 at 12:36 AM, Marko Rodriguez <[hidden email]> wrote:
> [...]

--
E-mail: [hidden email]
Blog: http://www.jroller.com/vaclav
Linkedin page: http://www.linkedin.com/in/vaclavpech
Re: GPars good for project?

Marko Rodriguez
Hi Vaclav,

> I'm happy to see you exploring GPars. Interestingly, I was briefly looking at Gremlin about a month ago. Let's call it coincidence:)

Cool. :)

> GPars dataflow was created as an easy-to-use implementation of the dataflow concurrency model with API designed specifically for Groovy...

I saw the dataflow framework of GPars, and it's quite different (at least at the low level) from the Pipes framework. As such, I would like to take a baby step first. I'm a coder that hates to write code :). I did this last night with GPars/Gremlin:

Here is how you calculate the primary eigenvector of a graph using standard Gremlin:

     m = [:];
     g.V.outE.inV.groupCount(m).loop(3) {it.loops < 4}

Here is how you calculate the primary eigenvector of the graph using GPars+Gremlin:

     m = new ConcurrentHashMap();
     g.V.eachParallel {it.outE.inV.groupCount(m).loop(3) {it.loops < 4} >> -1}

The average runtime over a play graph I use was:

REGULAR SERIAL GREMLIN: ~20 seconds
GPARS CONCURRENT GREMLIN: ~15 seconds

In short, for every vertex in the graph, I create an emanating walker. Every time a walker touches a vertex, it increments a counter in a Map. With GPars, I'm able to make each walker work in parallel using eachParallel{}. Large calculations like this appear to be faster with GPars, but for short/simple calculations (where Gremlin is already working in the <20ms range), GPars only slows it down to +100ms.
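
The walker-and-counter pattern can be sketched in plain Java as follows. This is a toy under stated assumptions, not Gremlin or GPars: the three-vertex graph OUT and the names ParallelWalkSketch, walk and countVisits are hypothetical, a parallel stream stands in for eachParallel{}, and the code only counts visits without normalising them into an eigenvector.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; this is neither Gremlin nor GPars.
class ParallelWalkSketch {

    // Adjacency lists of a tiny directed graph (made-up data, read-only after init).
    static final Map<Integer, List<Integer>> OUT = new HashMap<>();
    static {
        OUT.put(1, Arrays.asList(2, 3));
        OUT.put(2, Arrays.asList(3));
        OUT.put(3, Arrays.asList(1));
    }

    /** Follows every outgoing path up to 'stepsLeft' hops, counting each vertex touched. */
    static void walk(int vertex, int stepsLeft, Map<Integer, Integer> counts) {
        if (stepsLeft == 0) {
            return;
        }
        for (int next : OUT.get(vertex)) {
            counts.merge(next, 1, Integer::sum); // atomic update on ConcurrentHashMap
            walk(next, stepsLeft - 1, counts);
        }
    }

    /** Starts one walker per vertex, either one after another or in parallel. */
    static Map<Integer, Integer> countVisits(int steps, boolean parallel) {
        Map<Integer, Integer> counts = new ConcurrentHashMap<>();
        if (parallel) {
            OUT.keySet().parallelStream().forEach(v -> walk(v, steps, counts));
        } else {
            OUT.keySet().forEach(v -> walk(v, steps, counts));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countVisits(3, false));
        System.out.println(countVisits(3, true));
    }
}
```

Since the walkers share nothing but the map, and ConcurrentHashMap.merge is atomic, the serial and parallel runs should end with the same counts. The shared map is also the only point of contention. On a graph this small, the cost of handing work to a thread pool is likely to outweigh any gain, which fits the slowdown seen on short calculations.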

Anywho. That's all I have so far. Any thoughts on the matter, pointers, etc. would be appreciated. From here I will simply read the docs and play away.

Thanks GPars people,
Marko.

http://markorodriguez.com